Combining Strategies Efficiently: High-Quality Decisions from Conflicting Advice
Wouter M. Koolen
In this dissertation we study machine learning: the automated
discovery and exploitation of regularities in data. We may use
regularities identified in objects to explain the past
(e.g. archaeology, justice), as well as regularities found in
processes to predict the future (e.g. weather, stock market) and guide
our actions.
With ubiquitous computational resources, machine learning algorithms
have become pervasive. For example, they manage financial portfolios
and power-saving policy, provide personalised movie recommendations as
well as advertisements, and form the core of state-of-the-art data
compression software.
This dissertation develops the theory of online learning, a branch of
machine learning that investigates sequential decision problems with
immediate feedback. In particular, we study the setting called
prediction with expert advice. Our task is to predict a sequence of
data. Each trial, we may first consult a given set of experts. We then
combine their advice and issue our prediction of the next
outcome. Finally, the next outcome is revealed, and we incur loss
based on the discrepancy between our prediction and the actual outcome.
The goal is to build efficient algorithms with small regret, i.e. the
difference between the incurred cumulative loss and the loss of the
best strategy in hindsight from a fixed reference class. In this
sense, the strategies in the reference class are the patterns, and
achieving small regret means learning which reference strategy best
models the data. The main difference between the learning problems we
consider is the complexity of the reference set. Algorithms for
prediction with expert advice have many applications including
classification, regression, hypothesis testing, model selection, data
compression, gambling and investing in the stock market.
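The protocol above can be made concrete with a standard exponential-weights forecaster, one common instantiation of prediction with expert advice. The sketch below is illustrative only; the learning rate eta and the mixture prediction are our choices, not a specific algorithm from this dissertation.

```python
import math

def hedge(expert_losses, eta=0.5):
    """Exponential-weights forecaster for prediction with expert advice
    (illustrative sketch).

    expert_losses[t][i] is the loss of expert i at trial t.
    Returns the cumulative loss of the weighted mixture and the
    final weight vector."""
    n = len(expert_losses[0])
    weights = [1.0 / n] * n
    total_loss = 0.0
    for losses in expert_losses:
        # Predict by mixing the experts' advice with the current weights.
        total_loss += sum(w * l for w, l in zip(weights, losses))
        # Multiplicative update: shrink each weight by exp(-eta * loss).
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return total_loss, weights
```

When one expert is consistently good, the mixture's cumulative loss stays within a small additive regret of that expert's loss, because the weights concentrate on it exponentially fast.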
In Chapter 2 we give a game-theoretic analysis of the simplest online
learning problem, the prediction of a sequence of binary outcomes
under 0/1 loss with the help of two experts. For this simple problem,
we compute the minimax, i.e. game-theoretically optimal, regret, and
show how to implement the optimal strategy efficiently. We then give
special attention to the case where one of the experts is good. We
conclude with a new result: the optimal algorithm for competing with
the set of meta-experts that switch between the two basic experts.
In Chapter 3 we show how models for prediction with expert advice can
be defined concisely and clearly using hidden Markov models (HMMs);
standard algorithms can then be used to efficiently calculate how the
expert predictions should be weighted. We focus on algorithms for
tracking the best expert. Here the strategies in the reference set
follow the advice of a single expert, but this expert may change
between trials. We cast existing models as HMMs, starting from the
fixed share algorithm, recover the running times and regret bounds for
each algorithm, and discuss how they are related. We also describe
three new models for switching between experts.
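The fixed share algorithm mentioned above admits a compact sketch. Parameter names eta and alpha are ours; in the HMM view of Chapter 3, the mixing step corresponds to one particular choice of transition matrix.

```python
import math

def fixed_share(expert_losses, eta=0.5, alpha=0.05):
    """Fixed share update for tracking the best expert
    (illustrative sketch; parameter names assumed).

    expert_losses[t][i] is the loss of expert i at trial t."""
    n = len(expert_losses[0])
    weights = [1.0 / n] * n
    total_loss = 0.0
    for losses in expert_losses:
        total_loss += sum(w * l for w, l in zip(weights, losses))
        # Loss update, as in the static-expert algorithm.
        v = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        z = sum(v)
        v = [x / z for x in v]
        # Mixing update: share a fraction alpha of each weight equally
        # among the other experts, so no weight can vanish and the
        # algorithm can follow a switch of the best expert.
        weights = [(1 - alpha) * vi + alpha * (1 - vi) / (n - 1) for vi in v]
    return total_loss, weights
```

With alpha = 0 this reduces to the static-expert update, which recovers only slowly after the best expert changes; a small positive alpha keeps every expert's weight bounded away from zero and makes tracking fast.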
In Chapter 4 we extend the setting to tracking the best learning
expert. Whereas vanilla experts can be tapped for advice about the
current trial, learning experts may be queried for advice given each
possible subset of the past data. This additional power is available
to both the algorithm and the reference strategies. Achieving small
regret thus means learning how to partition the trials, and which
learning expert to train and follow within each partition cell. We
give efficient algorithms with small regret for tracking learning
experts that can themselves be formalised using the expert HMMs of
Chapter 3.
In Chapter 5 we consider reference strategies that switch between two
experts based on their cumulative loss instead of on time. This
chapter is formulated in financial terms to make the presentation more
intuitive. We present a simple online two-way trading algorithm that
exploits fluctuations in the unit price of an asset. Rather than
analysing worst-case performance under some assumptions, we prove a
novel, unconditional performance bound that is parameterised either by
the actual dynamics of the price of the asset, or by a simplifying
model thereof. We discuss applications of the results to prediction
with expert advice, data compression and hypothesis testing.
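How trading can profit from price fluctuations alone can be illustrated with a classical constant-rebalanced portfolio between cash and one asset. This is not the algorithm of this chapter, only a minimal sketch of the underlying effect; the fraction parameter is our assumption.

```python
def rebalanced_wealth(prices, frac=0.5):
    """Constant-rebalanced portfolio between cash and a single asset
    (classical scheme, shown for illustration only).

    After every price change, restore a fixed fraction `frac` of
    wealth in the asset; start with wealth 1."""
    wealth = 1.0
    cash = wealth * (1 - frac)
    units = wealth * frac / prices[0]
    for p in prices[1:]:
        wealth = cash + units * p   # mark holdings to market
        cash = wealth * (1 - frac)  # rebalance: sell high, buy low
        units = wealth * frac / p
    return wealth
```

On the round trip of prices 1, 2, 1 a buy-and-hold position ends exactly where it started, while the rebalanced portfolio ends with wealth 1.125: each rebalancing step sells after a rise and buys after a fall.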
In Chapter 6 we consider prediction with structured concepts. Each
round we select a concept, which is composed of components. The loss
of a concept is the sum of the losses of its components. Whereas the
losses of different components are independent, the losses of
different concepts are highly related. We develop an online algorithm,
called Component Hedge, that exploits this dependence, and thereby
avoids the so-called range factor that arises when the dependences are
ignored. We show that Component Hedge has optimal regret bounds for a
large variety of structured concept classes.
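The inefficiency that Component Hedge removes can be seen in a naive baseline that runs Hedge over every concept explicitly, sketched below for the class of k-element subsets of n components. This is an illustration only: Component Hedge instead maintains one weight per component and uses a relative-entropy projection onto the concept polytope, which is beyond this sketch.

```python
import itertools
import math

def expanded_hedge_ksets(component_losses, k, eta=0.5):
    """Naive Hedge over all k-subsets of n components (illustration).

    component_losses[t][i] is the loss of component i at trial t;
    a concept's loss is the sum of its components' losses, so the
    losses of overlapping concepts are highly related.  Enumerating
    all C(n, k) concepts is exponential in general, and concept
    losses range over [0, k], which introduces the range factor."""
    n = len(component_losses[0])
    concepts = list(itertools.combinations(range(n), k))
    weights = [1.0 / len(concepts)] * len(concepts)
    total_loss = 0.0
    for losses in component_losses:
        concept_loss = [sum(losses[i] for i in c) for c in concepts]
        total_loss += sum(w * l for w, l in zip(weights, concept_loss))
        weights = [w * math.exp(-eta * l)
                   for w, l in zip(weights, concept_loss)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return total_loss
```

Even in this tiny form the weight vector has one entry per concept rather than per component, which is what makes the expanded approach infeasible for large structured classes such as paths or permutations.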