Abstract: The current framework of large-scale A/B testing in the tech industry has several drawbacks. Firstly, the tests are often continuously monitored without correcting for the resulting inflation in the false alarm rate. Secondly, the number of samples used in the experiment grows linearly with the number of options being tested, independent of the quality of the options. Lastly, running hundreds or thousands of such tests artificially inflates the apparent number of significant discoveries, and companies have no idea what proportion of their discoveries are spurious.
We propose a new framework, as an alternative to existing setups for controlling false alarms across multiple A/B tests, that tackles all three aforementioned issues. It combines ideas from pure exploration for best-arm identification in multi-armed bandits (MAB) with online false discovery rate (FDR) control. The framework has various applications, ranging from pharmaceutical companies testing a control pill against a few treatment options to internet companies testing their current default webpage (control) against many alternatives (treatments). Our setup allows running a (possibly infinite) sequence of best-arm MAB instances while controlling the overall FDR of the process in a fully online manner. We adapt existing theory from both the MAB and online FDR literature to ensure that our framework comes with strong sample-optimality guarantees, as well as control of the power and of a modified FDR at any time.
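To give a feel for what "online" FDR control means, here is a minimal sketch of one simple rule from the online FDR literature, LOND (Javanmard and Montanari), in which the test level for each new experiment grows with the number of discoveries made so far. This is an illustration only and not necessarily the procedure used in the framework described above; the function name `lond_decisions`, the choice of weights proportional to 1/i^2, and the example p-values are assumptions made for this sketch, and the rule as written assumes independent p-values.

```python
import math


def lond_decisions(p_values, alpha=0.05):
    """Illustrative LOND rule: decide on each p-value as it arrives.

    Assumed setup (not from the abstract): experiment i yields one
    p-value, and the weights beta_i = alpha * (6 / pi^2) / i^2 sum to alpha.
    """
    norm = 6.0 / math.pi ** 2  # normalizes the sum over i of 1/i^2 to 1
    decisions = []
    discoveries = 0  # number of rejections made before the current test
    for i, p in enumerate(p_values, start=1):
        beta_i = alpha * norm / i ** 2
        # The level for test i scales with past discoveries plus one.
        alpha_i = beta_i * (discoveries + 1)
        reject = p <= alpha_i
        decisions.append(reject)
        discoveries += int(reject)
    return decisions


# Hypothetical p-values from four successive experiments.
print(lond_decisions([0.001, 0.5, 0.004, 0.9]))
# The same p-value 0.004 at step 3, but with no earlier discovery.
print(lond_decisions([0.5, 0.5, 0.004]))
```

In the first call the third experiment is declared a discovery because an earlier rejection raised its test level; in the second call the same p-value is not rejected. Each decision depends only on past outcomes, which is what allows the sequence of experiments to be open-ended.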
This talk will incorporate ideas from different joint works with Fanny Yang, Kevin Jamieson, Michael Jordan, Martin Wainwright, Tijana Zrnic, Akshay Balsubramani, Steve Howard, Jas Sekhon and Jon McAuliffe.
Bio: Aaditya Ramdas is a postdoctoral researcher in Statistics and EECS at UC Berkeley, advised by Michael Jordan and Martin Wainwright. He finished his PhD in Statistics and Machine Learning at CMU, advised by Larry Wasserman and Aarti Singh, winning the Best Thesis Award in Statistics. Much of his research focuses on modern aspects of reproducibility in science and technology, involving statistical testing and false discovery rate control in static and dynamic settings.