We bet the value of our tests on baselines.
We wouldn’t consider doing a test without one. They tell us whether the change we are making is improving things or making them worse.
Imagine you think you’ve developed a cure for a disease. How would you test it? You’d give it to people who have the disease and see if they improve. But some people with the disease improve without your cure, too. So how would you know that your cure is responsible for improvements you see? Well, you would have to have a totally separate group of people who also have the disease but who do not take your cure. Then you compare the differences between the groups.
In experimental design, the baseline is set by a control group, which is compared with the group that gets the treatment. The control group gets no treatment, or a placebo in its place. The goal is to rule out other variables that could explain the result. If results are the same for both the treatment and control groups, then the treatment didn’t do anything.
For each test, we put a fully randomized 10% of visitors in the control group. They don’t see the changes we are making. Then we measure the difference between the groups. Once that difference is statistically significant, it tells us what effect the treatment caused.
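Here is a rough sketch of how a 10% holdout and the comparison between groups might look in Python. It is an illustration under assumptions, not our production code: the function names `assign_group` and `two_proportion_z_test` are hypothetical, the hash-based bucketing is just one common way to get a stable pseudo-random split, and the two-proportion z-test is one of several significance tests you could use for a conversion-rate metric.

```python
import hashlib
import math


def assign_group(visitor_id: str, test_name: str, control_share: float = 0.10) -> str:
    """Place a visitor in 'control' or 'treatment' for one test.

    Assumption: hashing the test name plus visitor ID gives a stable,
    effectively random bucket, so a visitor stays in the same group
    on every visit.
    """
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "control" if bucket < control_share else "treatment"


def two_proportion_z_test(conv_control: int, n_control: int,
                          conv_treatment: int, n_treatment: int):
    """Return (difference in conversion rate, two-sided p-value)."""
    rate_control = conv_control / n_control
    rate_treatment = conv_treatment / n_treatment
    diff = rate_treatment - rate_control

    # Pooled rate under the null hypothesis that the treatment did nothing.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if std_err == 0:
        # No variation at all (every visitor converted, or none did).
        return diff, 1.0

    z = diff / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, p_value
```

To use it, you would call `assign_group(visitor_id, "some-test-name")` when a visitor arrives, count visitors and conversions per group, and pass the four counts to `two_proportion_z_test`. A small p-value (0.05 is a common, if arbitrary, cutoff) suggests the difference between the groups is unlikely to be chance alone.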
We tried complicated approaches to baselines in the past. We adjusted ratios dynamically and aggregated up to get significance. But those approaches didn’t work. The best way for us is the old way: simply carving off some of the visitors from the test, creating a control group, and measuring it.
Please hop into the comments section and share your experiences with baselines.