Interpreting test results in a Mida report


Mida employs a sequential testing methodology powered by a frequentist statistical engine, a robust approach to A/B testing that offers several key advantages for decision-making.



What is Sequential Testing? 


Unlike traditional fixed-horizon testing, which requires a predetermined sample size, sequential testing allows results to be monitored continuously as data accumulates (a simple illustration follows the list below). This means you can:

  • Check results at any time during the test
  • Stop the test early when clear winners or losers emerge
  • Continue collecting data when results are inconclusive
  • Maintain statistical validity throughout the monitoring process
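To make this concrete, here is a minimal Python sketch of interim monitoring under a sequential design. It is an illustration only: the interim counts are hypothetical, and it uses a textbook Pocock-style boundary as a stand-in for Mida's internal stopping rule, which this article does not specify.

```python
import math

# Pocock critical value for 5 equally spaced looks at an overall
# two-sided alpha of 0.05 (Pocock, 1977). A stand-in for Mida's
# internal boundary, used here purely for illustration.
POCOCK_Z = 2.413

def two_prop_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference of two conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def monitor(looks):
    """Evaluate the test at each interim look; stop early only when the
    adjusted boundary is crossed, otherwise keep collecting data."""
    for i, (conv_a, n_a, conv_b, n_b) in enumerate(looks, start=1):
        z = two_prop_z(conv_a, n_a, conv_b, n_b)
        if abs(z) > POCOCK_Z:
            verdict = "winner" if z > 0 else "loser"
            return f"look {i}: stop early, variant is a {verdict} (z = {z:.2f})"
    return "all looks done: inconclusive, keep collecting data"

# Hypothetical interim snapshots (5 planned looks):
# (control conversions, control visitors, variant conversions, variant visitors)
looks = [
    (12, 5000, 18, 5000),
    (25, 10000, 44, 10000),
    (40, 15000, 78, 15000),
    (55, 20000, 104, 20000),
    (70, 25000, 130, 25000),
]
print(monitor(looks))  # stops early at the third look in this example
```

Checking repeatedly against a plain 95% threshold would inflate the false-positive rate; an adjusted boundary like the one above is what keeps early stopping statistically valid.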



The Frequentist Framework: 


Mida's statistical engine is built on frequentist principles. As illustrated in the sketch after this list, the engine:

  • Calculates the probability of observing the test results if there were truly no difference between variants
  • Controls the false positive rate (Type I error) in line with your chosen confidence level (typically 95%)
  • Provides confidence intervals to show the range of likely true effect sizes
  • Makes no assumptions about prior probabilities, relying purely on observed data
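As a rough numeric illustration of these principles, the sketch below runs a two-sided two-proportion z-test and builds a Wald confidence interval for the difference in conversion rates. The input counts and the mapping from p-value to a "statistical significance" percentage are assumptions for this example, not Mida's exact internal formulas.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (no SciPy required)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

Z_CRIT = {0.95: 1.960, 0.90: 1.645}  # two-sided critical values

def frequentist_summary(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-sided two-proportion z-test plus a Wald confidence interval
    for the difference in conversion rates (variant minus control)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    p_value = 2.0 * (1.0 - norm_cdf(abs(z)))  # chance of a gap this large if none exists
    significance = (1.0 - p_value) * 100.0    # assumed mapping to a significance %
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = Z_CRIT[confidence]
    ci = ((p_b - p_a) - z_crit * se, (p_b - p_a) + z_crit * se)
    return significance, ci

# Hypothetical counts chosen to echo Case 1 below (Control CR of 0.26%)
sig, (lo, hi) = frequentist_summary(26, 10000, 87, 10000)
print(f"significance: {sig:.2f}%  CI of difference: {lo:.4%} to {hi:.4%}")
```

With these hypothetical counts the significance prints as 100.00% after rounding; real reports show values like the 99.71% discussed in Case 1 below.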


This combination of sequential testing and frequentist statistics ensures:

  1. Efficient resource use by enabling early stopping when appropriate
  2. Protection against false conclusions through rigorous statistical controls
  3. Clear, interpretable results based on observed data
  4. Flexible monitoring without compromising statistical validity



Test Result Cases:


Case 1: Clear Winner


Green: This shows a winning variant. Given that the required confidence level is 95%, this test result is considered statistically significant because the statistical significance value of 99.71% surpasses the required threshold of 95%. Statistical significance refers to the probability that the differences observed in the test are not due to chance. In this context, a 99.71% statistical significance means there is less than a 0.3% likelihood that the results occurred by chance. And with an improvement of 233.04%, this is a meaningful lift.
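The arithmetic behind both statements can be checked directly with the numbers above:

```python
significance = 99.71                 # reported statistical significance, %
chance = 100 - significance          # likelihood the difference is pure chance
print(f"{chance:.2f}% likelihood the result is due to chance")   # 0.29%

control_cr = 0.26                    # Control conversion rate, %
improvement = 233.04                 # reported lift, %
variant_cr = control_cr * (1 + improvement / 100)                # derived, for illustration
print(f"implied variant CR: {variant_cr:.2f}%")                  # ~0.87%
```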


Next, look at the confidence interval of the difference of means. The confidence interval gives the range of expected lift values at the 95% confidence level. In other words, the lower bound is the "worst case" scenario of possible lift, and the upper bound is the "best case" scenario. Here, you see a range from 0.44% to 1.32%. Since both bounds sit well above the Control CR (0.26%), you can feel confident about the change.
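Expressed as a simple check (a sketch using the Case 1 numbers), the decision rule is:

```python
control_cr = 0.26                   # Control CR, %
ci_lower, ci_upper = 0.44, 1.32     # reported 95% interval, %

# If even the worst-case bound beats Control, the variant is a safe call
if ci_lower > control_cr:
    print("even the worst case beats Control -> ship the variant")
else:
    print("interval overlaps Control -> not conclusive on its own")
```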



Case 2: Clear Loser



Red: This shows a losing variant. Given that the required confidence level is 90%, the statistical significance of the losing variant should ideally be less than 10% (100% - 90%). The statistical significance value of Variant 1 is 6.71%, which implies that the probability that the observed difference occurred due to randomness is roughly 6.71%, a figure below the required threshold of 10%. In other words, you can be (100 - 6.71) = 93.29% confident that the losing variant, Variant 1, is indeed inferior to the winning variant, Control.


Looking at the confidence interval, we see a range from 4.69% to 5.29% Conversion Rate (CR). In simpler terms, 4.69% represents the "worst case" scenario and 5.29% represents the "best case" scenario of Variant 1's performance. Since both values are below the 5.38% CR of Control, you can be confident that Control is the winner.
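The same decision logic, sketched with the Case 2 numbers plugged in:

```python
required_confidence = 90.0                     # test setting, %
loser_threshold = 100.0 - required_confidence  # 10%
variant_significance = 6.71                    # reported for Variant 1, %

if variant_significance < loser_threshold:
    confidence_in_loss = 100.0 - variant_significance   # 93.29%
    print(f"Variant 1 is a loser at {confidence_in_loss:.2f}% confidence")

# Confidence interval check: both bounds below Control's CR confirm the call
control_cr = 5.38
ci_lower, ci_upper = 4.69, 5.29
print("Control wins" if ci_upper < control_cr else "inconclusive")
```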



Case 3: Inconclusive Result




Gray: This indicates that the test doesn't have definitive results and hasn't reached statistical significance yet. Depending on what you're trying to achieve with your experiment, here are some options you may want to consider:


1. Let It Run Longer: In some instances, you might need to allow the experiment more time to gather a larger sample size and achieve more accurate results.


2. Simplify Variations: If you have too many variations, consider reducing them. For instance, you might bring four variations down to just two or three.


3. Prioritize Brand Consistency: If the results are similar between two variations, choose the one that aligns best with your brand's guidelines.


4. Repeat the Test: Running the same test again can be beneficial for confirming your initial findings. Keep in mind that factors like the time of year or fluctuations in website traffic may affect the end results.


5. Keep It As Is: Occasionally, your original design or strategy may not need any changes and is the most suitable version.
