```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

n_trials = 10
p = 3/4
size = 1000

## draw 1000 samples from Binomial(n=10, p=0.75)
x = np.random.binomial(n=n_trials, p=p, size=size)
freqs = Counter(x)
##probs = freqs/size
##print(probs)
##sns.distplot(x, kde=True)  ## distplot is deprecated in recent seaborn
sns.histplot(x, kde=False, stat='density', binwidth=1.0, fill=False)
```
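As an illustrative addition (not in the original notes), the normal approximation \(N(np, np(1-p))\) can be overlaid on the empirical density, anticipating the theoretical mean and variance computed further down:

```python
## Illustrative overlay of the normal approximation N(np, np(1-p)).
from scipy.stats import norm

grid = np.linspace(x.min(), x.max(), 200)
plt.plot(grid, norm.pdf(grid, loc=n_trials * p,
                        scale=np.sqrt(n_trials * p * (1 - p))))
plt.show()
```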
Notes from the Udacity A/B Testing course, which I took around the time it first launched. The course is about planning and analyzing A/B tests, not about implementing A/B testing in a specific framework.
Instructors:
- Carrie Grimes Bostock, Googler
- Caroline Buckey, Polaris Googler
- Diane Tang, Googler
Lesson 1: Overview of A/B Testing
The instructors gave the following examples of A/B testing from industry:
- Google tested 41 different shades of blue.
- Amazon initially decided to launch their first personalized product recommendations based on an A/B test showing a huge revenue increase by adding that feature. (See the second paragraph in the introduction.)
- LinkedIn tested whether to use the top slot on a user’s stream for top news articles or an encouragement to add more contacts. (See the first paragraph in “A/B testing with view based JSON” section.)
- Amazon determined that every 100ms increase in page load time decreased sales by 1% (see the “Secondary metrics” section on the last page). Google’s latency results showed a similar impact for a 100ms delay.
- Kayak tested whether notifying users that their payment was encrypted would make users more or less likely to complete the payment.
- Khan Academy tests changes like letting students know how many other students are working on the exercise with them, or making it easier for students to fast-forward past skills they already have. (See the question “What is the most interesting A/B test you’ve seen so far?”)
- Metrics: what is the difference between click-through rate (CTR) and click-through probability (CTP)? (A toy sketch follows below.)
- CTR is used to measure usability, e.g. how easy it is to find the button: \(\frac{\text{clicks}}{\text{page views}}\).
- CTP is used to measure impact: \(\frac{\text{unique visitors who click}}{\text{unique visitors who view the page}}\).
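A toy sketch (illustrative, not from the course) of the difference: one visitor clicking twice moves CTR but not CTP.

```python
## Toy event log: u1 clicks twice, u2 once, u3 never (illustrative values).
views = ['u1', 'u2', 'u3']        ## one page view per visitor
clicks = ['u1', 'u1', 'u2']

ctr = len(clicks) / len(views)             ## 3 clicks / 3 views = 1.0
ctp = len(set(clicks)) / len(set(views))   ## 2 clickers / 3 visitors ~ 0.67
print(f'{ctr=:.2f} {ctp=:.2f}')
```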
- Statistical significance and practical significance
- Statistical significance is about ensuring observed effects are not due to chance.
- Practical significance depends on the industry e.g. medicine vs. internet.
- Statistical significance
- \(\alpha\): the probability of observing the effect in your sample when \(H_0\) is true (a false positive); \(\beta\) is the probability of missing a real effect (a false negative).
- Small sample: \(\alpha\) low, \(\beta\) high.
- Larger sample: \(\alpha\) stays the same, \(\beta\) gets lower.
- Any change larger than your practical significance boundary has a lower \(\beta\), so the significant difference is easier to detect.
- \(1-\beta\) is also called sensitivity (statistical power).
- How to calculate sample size?
- Use this calculator; input the baseline conversion rate, the minimum detectable effect (the smallest effect that will be detected \((1-\beta)\cdot 100\%\) of the time), \(\alpha\), and \(\beta\). (A statsmodels sketch follows below.)
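A sketch of the same calculation with statsmodels (illustrative, not from the course; the baseline and effect values below are assumptions). It solves for the per-group sample size of a two-sided two-proportion z-test, and also shows \(\beta\) falling (power rising) as \(n\) grows, as in the notes above.

```python
## Sample size / power sketch with statsmodels (illustrative inputs).
import statsmodels.stats.api as sms

baseline = 0.10   ## assumed baseline conversion rate
mde = 0.02        ## assumed minimum detectable effect (absolute)
effect = sms.proportion_effectsize(baseline + mde, baseline)  ## Cohen's h

## Required sample size per group at alpha = 0.05, power = 0.8:
n_per_group = sms.NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, ratio=1.0)
print(round(n_per_group))   ## ~3835 per group for these inputs

## At fixed alpha, power (1 - beta) rises with sample size:
for n_obs in (500, 2000, 3835):
    power = sms.NormalIndPower().solve_power(
        effect_size=effect, nobs1=n_obs, alpha=0.05, ratio=1.0)
    print(n_obs, round(power, 2))   ## ~0.17, 0.53, 0.80
```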
Python Modelling
Binomial Distribution
Estimate mean and standard deviation
```python
np.set_printoptions(formatter={'float': "{0:0.2f}".format})
np.set_printoptions(precision=2)

width = 6
mean = np.round(x.mean(), 2)
mean_theoretical = np.round(n_trials * p, 2)                 ## E[X] = np
print(f'mean {mean: <{width}} mean_theoretical {mean_theoretical}')

variance = np.round(x.var(), 2)
variance_theoretical = np.round(n_trials * p * (1 - p), 2)   ## Var[X] = np(1-p)
print(f'var {variance: <{width}} var_theoretical {variance_theoretical}')

sd = np.round(x.std(), 2)
sd_theoretical = np.round(np.sqrt(variance_theoretical), 2)
print(f'sd {sd: <{width}} sd_theoretical {sd_theoretical}')
##TODO can we do it with PyMC, in a tab (a sketch appears after the next section)
```
```
mean 7.47   mean_theoretical 7.5
var 1.93    var_theoretical 1.88
sd 1.39     sd_theoretical 1.37
```
Estimating p from data
```python
size = 10         ## only 10 samples this time
n_trials = 10
p = np.random.uniform(low=0.0, high=1.0)   ## true p, drawn at random
x = np.random.binomial(n=n_trials, p=p, size=size)
p = round(p, 3)
p_est = np.round(x.mean() / n_trials, 3)                ## maximum-likelihood estimate
p_b_est = np.round((x.mean() + 1) / (n_trials + 2), 3)  ## Bayesian (rule-of-succession) estimator
print(f'{p=} {p_est=} {p_b_est=}')
print(f'\t {np.round(np.abs(p - p_est), 3)} {np.round(np.abs(p - p_b_est), 3)}')
```
```
p=0.162 p_est=0.1 p_b_est=0.167
	 0.062 0.005
```
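Regarding the PyMC TODO above, here is a minimal sketch (assuming PyMC >= 5 is installed) of the same model. A Beta(1, 1) prior with a Binomial likelihood is conjugate, so the posterior mean applies the same +1/+2 smoothing as p_b_est, but pooled over all size * n_trials Bernoulli trials.

```python
## Minimal PyMC sketch for the TODO above (assumes PyMC >= 5).
## Exact conjugate posterior: Beta(1 + x.sum(), 1 + size*n_trials - x.sum()),
## with posterior mean (x.sum() + 1) / (size * n_trials + 2).
import pymc as pm

with pm.Model():
    p_rv = pm.Beta('p', alpha=1.0, beta=1.0)            ## uniform prior on p
    pm.Binomial('obs', n=n_trials, p=p_rv, observed=x)  ## 10 draws of 10 trials
    idata = pm.sample(1000, tune=1000, progressbar=False)

print(float(idata.posterior['p'].mean()))   ## ~0.11 for the run above
```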
Estimating Confidence Intervals
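Two standard intervals for a proportion \(\hat{p}\) from \(n\) trials, used in the code below:
- Wald: \(\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}\), where \(z = z_{1-\alpha/2}\).
- Wilson score: \(\frac{\hat{p} + z^2/(2n) \pm z\sqrt{\hat{p}(1-\hat{p})/n + z^2/(4n^2)}}{1 + z^2/n}\).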
```python
from scipy.stats import norm

n = n_trials
confidence = 95/100
alpha = 1 - confidence
z = norm.ppf(1 - alpha/2)   ## two-sided critical value, ~1.96 for 95%
print(f'{alpha=}, z={z:.2f}')

## Wald interval: p_est +/- z*sqrt(p_est(1-p_est)/n); can fall outside [0, 1]
margin = z * np.sqrt(p_est * (1 - p_est) / n)
wald_lb = np.round(p_est - margin, 3)
wald_ub = np.round(p_est + margin, 3)
print(f'[{wald_lb},{wald_ub}] wald ci')

## Wilson score interval: better behaved for small n and extreme p
center = p_est + z*z / (2*n)
spread = z * np.sqrt(p_est * (1 - p_est) / n + z*z / (4*n*n))
lb_wilson = np.round((center - spread) / (1 + z*z/n), 3)
ub_wilson = np.round((center + spread) / (1 + z*z/n), 3)
print(f'[{lb_wilson},{ub_wilson}] wilson ci')
```
```
alpha=0.050000000000000044, z=1.96
[-0.086,0.286] wald ci
[0.018,0.404] wilson ci
```
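As a cross-check (assuming statsmodels is available), proportion_confint computes both intervals directly:

```python
## Cross-check with statsmodels: count = successes, nobs = trials.
from statsmodels.stats.proportion import proportion_confint

count, nobs = 1, 10   ## matches p_est = 0.1 with n = 10 above
print(proportion_confint(count, nobs, alpha=0.05, method='normal'))  ## Wald
print(proportion_confint(count, nobs, alpha=0.05, method='wilson'))  ## Wilson
```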
Resources
- A/B testing article on Wikipedia.
- These notes were influenced by Joanna