KTH
Wednesday, October 9th, 12:30pm in CS conference room
Abstract:
Bandit optimisation problems constitute the most fundamental
instances of sequential decision problems with an
exploration-exploitation trade-off. They naturally arise in many
contemporary applications found in communication networks, e-commerce
and recommendation systems. In this lecture, we present recent results
on bandit optimisation problems with large strategy sets. For such
problems, the number of possible strategies may not be negligible
compared to the time horizon. However, the average rewards obtained
using the various strategies exhibit some structure that may be
exploited to speed up the learning process and, in turn, improve
performance. We propose simple algorithms, optimally leveraging this
structural property, and discuss their applications in communication
networks and recommendation systems.
Bio: Alexandre Proutiere is an associate professor of Electrical Engineering at KTH, the Royal Institute of Technology. He received an engineering degree from Ecole Nationale Superieure des Telecoms (Paris) and then, from 1998 to 2000, worked in the radio communication department at the Ministry of Foreign Affairs in Paris. He received his PhD in Applied Mathematics from Ecole Polytechnique, Palaiseau, France, in 2003 under the supervision of James Roberts. Following his PhD, he worked as a researcher at Microsoft Research in Cambridge (UK) before joining KTH as an associate professor. Proutiere's research focuses on the design and performance evaluation of computer networks, with a specific interest in resource allocation and control in wireless systems. His work has had significant impact both on the development of theoretical tools and on their practical application, as highlighted by the "Best Paper" awards he has received from top publication venues such as ACM SIGMETRICS and ACM Mobihoc. Additionally, his work has been recognized by the ACM SIGMETRICS Rising Star award, given for outstanding contributions to computer/communication performance evaluation by a researcher not more than 7 years from their PhD.