Experimentation helps product teams make better decisions based on causality instead of correlations. You are able to make statements like “this change caused conversion to increase by 5%.” Without experimentation, a more common approach is to make changes based on domain knowledge or select customer requests. Now, data-driven companies use experimentation to make decision-making more objective. A big component of causality is a statistical analysis of experimentation data.

At Amplitude, we have recently released a fixed horizon T-test in addition to sequential testing, which we have had since the beginning of Experiment. We envision several customers asking “How do I know what test to pick?”

In this technical post, we will explain the pros and cons of the sequential test and the fixed horizon T-test. Note: Throughout this post, when we say T-test, we are referring to the fixed horizon T-test. There are pros and cons for each approach, and it is not a case where one method is always better than the other.

Sequential testing advantages

First, we will explore the advantages of sequential testing.

Peeking several times → end experiment earlier

The advantage of sequential testing is that you can peek several times. The specific version of sequential testing that we use at Amplitude, called the mixture Sequential Probability Ratio Test (mSPRT), allows you to peek as many times as you want. Also, you do not have to decide before the test starts how many times you are going to peek, as you do with a group sequential test. It is similar to the “set it and forget it” approach with target-date funds. The consequence of this is that we can do what all product managers (PMs) want to do, which is “run a test until it is statistically significant and then stop.” In the fixed horizon framework, this should not be done, as you will increase the false positive rate.

By peeking often, we can decrease the experiment duration if the effect size is much bigger than the minimum detectable effect (MDE). Naturally, as humans, we want to keep peeking at the data and roll out features that help our customer base as quickly as possible. Often, a PM will ask a data scientist how an experiment is doing a couple of days after the experiment has started. With fixed horizon testing, the data scientist cannot report anything statistical (confidence intervals or p-values) about the experiment before it ends, and can only report the number of exposed users and the treatment and control means.

Stopping an A/B test early because the results are statistically significant is usually a bad idea. In this post, I will describe a simple procedure for analyzing data in a continuous fashion via sequential sampling. Sequential sampling allows the experimenter to stop the trial early if the treatment appears to be a winner; it therefore addresses the “peeking” problem associated with eager experimenters who use (abuse) traditional fixed-sample methods.

The sequential procedure works like this:

1. At the beginning of the experiment, choose a sample size \(N\).
2. Assign subjects randomly to the treatment and control, with 50% probability each.
3. Track the number of incoming successes from the treatment group (call it \(T\)).
4. Track the number of incoming successes from the control group (call it \(C\)).
5. If \(T - C\) reaches \(2\sqrt{N}\), stop the test.
6. If \(T + C\) reaches \(N\), stop the test.
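To make the stopping rules of the sequential procedure concrete (choose \(N\), count treatment successes \(T\) and control successes \(C\), stop when \(T - C\) reaches \(2\sqrt{N}\) or \(T + C\) reaches \(N\)), here is a minimal Python sketch. The function names, the event format, and the returned labels are assumptions made for illustration and are not part of any library. The text only says to stop at each boundary; reading the first boundary as "treatment looks like a winner" and the second as "no winner" is an interpretation, not something the code decides.

```python
import math
import random
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, bool]  # (group, converted); group is "treatment" or "control"


def run_sequential_test(events: Iterable[Event], n: int) -> str:
    """Apply the two stopping rules to a stream of events.

    Returns (hypothetical labels, chosen for this sketch):
      "difference_boundary" if T - C reached 2 * sqrt(N),
      "total_boundary"      if T + C reached N,
      "running"             if the stream ended before either boundary.
    """
    threshold = 2 * math.sqrt(n)
    t = 0  # successes seen so far in the treatment group
    c = 0  # successes seen so far in the control group
    for group, converted in events:
        if not converted:
            continue  # only successes move the counters
        if group == "treatment":
            t += 1
        else:
            c += 1
        if t - c >= threshold:
            return "difference_boundary"
        if t + c >= n:
            return "total_boundary"
    return "running"


def simulate_events(p_control: float, p_treatment: float, seed: int = 0) -> Iterator[Event]:
    """Endless stream of simulated subjects, assigned 50/50 to each group."""
    rng = random.Random(seed)
    while True:
        if rng.random() < 0.5:
            yield "treatment", rng.random() < p_treatment
        else:
            yield "control", rng.random() < p_control


if __name__ == "__main__":
    # Both groups convert at 10%, so there is no real effect in this run.
    print(run_sequential_test(simulate_events(0.10, 0.10, seed=1), n=1000))
```

Because the check runs after every success, the experimenter can look at the counters at any time without changing the rule, which is the point of the sequential design.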