News & Announcements

    [Seminar] [ISE Seminar] Dec 3 16:00 / E2 1501 / Doubly robust methods for multi-armed bandits / Prof. Garud N. Iyengar / Columbia University
    • Admin
    • 2024.11.26
    The Department of Industrial and Systems Engineering will host a seminar by Professor Garud N. Iyengar of Columbia University. Researchers in related fields are warmly invited to attend.

     

     1. Date & Time: Tuesday, December 3, 2024, 4:00 p.m.

     2. Place: Industrial Engineering Building (E2-2), 1st-floor Lecture Room #1501

     3. Speaker: Professor Garud N. Iyengar (Columbia University, https://www.engineering.columbia.edu/faculty-staff/directory/garud-n-iyengar)

     4. Title: Doubly robust methods for multi-armed bandits
     
     5. Abstract: In a multi-armed bandit (MAB) setting, in each epoch the decision maker chooses one arm and observes the reward of only that arm; the rewards of all other arms are missing. Doubly robust (DR) estimation is a well-known technique in the statistics literature for handling missing data. The pseudo-reward samples generated by the DR technique are unbiased provided either the model for the data or the probability that the data is missing is known. In the MAB setting, the probability that the data for an arm is missing is known, so the DR technique generates unbiased pseudo-rewards for the unselected arms. Whereas conventional MAB methods update only the reward estimate for the chosen arm in each round, the DR estimator imputes the missing rewards and updates the estimates for all arms in every round. Thus, the DR estimates of the rewards of all arms may converge uniformly, independently of the specific arm-selection policy used. This potentially allows us to simultaneously reduce regret and optimize other criteria, e.g., identify the best arm or the arms on a Pareto front. We show that this is indeed possible in many contexts, such as revenue management, Pareto front identification, and sparse linear reinforcement learning, and that it leads to improved theoretical guarantees and empirical performance.
     
     6. Bio: Garud Iyengar is the Avanessians Director of the Data Science Institute at Columbia University and a Professor in the Engineering School. He received his B.Tech. in Electrical Engineering from IIT Kanpur, and an MS and PhD in Electrical Engineering from Stanford University. His research interests are broadly in control, machine learning, and optimization. His current projects focus on large-scale power systems and supply chains, causal inference, distributionally robust decision making, and modeling of cellular processes. He was elected an INFORMS Fellow in 2018.
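    The DR imputation idea in the abstract can be illustrated numerically. The sketch below is not the speaker's implementation; the uniform selection policy, Gaussian rewards, and arm means are assumptions chosen for the demo. For each arm, the pseudo-reward is the plug-in model prediction plus an importance-weighted correction on the observed arm; because the selection probabilities are known, the pseudo-rewards are unbiased for every arm, selected or not.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                                   # number of arms
true_means = np.array([0.2, 0.5, 0.8])  # unknown to the learner
mu_hat = np.zeros(K)                    # plug-in reward model (may be misspecified)
counts = np.zeros(K)
dr_sum = np.zeros(K)                    # running sum of DR pseudo-rewards
T = 20000

for t in range(T):
    probs = np.full(K, 1.0 / K)           # known arm-selection probabilities
    a = rng.choice(K, p=probs)            # choose one arm
    r = rng.normal(true_means[a], 0.1)    # observe that arm's reward only

    # DR pseudo-reward for every arm: model prediction, plus an
    # inverse-probability-weighted correction on the chosen arm.
    chosen = np.zeros(K)
    chosen[a] = 1.0
    pseudo = mu_hat + (chosen / probs) * (r - mu_hat)
    dr_sum += pseudo

    # update the plug-in model for the chosen arm (running mean)
    counts[a] += 1
    mu_hat[a] += (r - mu_hat[a]) / counts[a]

dr_est = dr_sum / T
print(dr_est)  # close to true_means for all arms, not just frequently chosen ones
```

    Note that the correction term vanishes in expectation because the selection probabilities are exact, which is the "missingness mechanism known" side of double robustness that the MAB setting provides for free.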
     