TLDR
Is there a port of Excel 2019 for Marketing Statistics ?
Sample Size, Mean, Standard Deviation, and Standard Error of the Mean
Let’s start with some data:
# importing pandas as pd
import pandas as pd
# Creating the data frame
df = pd.DataFrame({
"A":[12, 4, 5, 44, 1],
"B":[5, 2, 54, 3, 2],
"C":[20, 16, 7, 3, 8],
"D":[14, 3, 17, 2, 6],
})
# Print the data frame
dfSample Size
We want a replacement of Excel’s =COUNT() function to get the sample size n - we need to get the row count in the data frame. Pandas also has a count() function but we should use the first item in the shape property.
df.shape[0]pitfall:
df.count() will only return the count of non-NA/NaN rows for each column. ## Mean We want a replacement of Excel’s =AVERAGE() function to calculate the mean \bar X according to: \bar X = \frac{\sum X}{n}
# Print the data frame
df.mean(axis = 0)Standard Deviation
We want a replacement of Excel’s =STDEV() function to calculate the mean \bar X according to:
SD = \sqrt { \frac{\sum {(X-\bar X)^2}}{n-1}}
df.std(axis = 0) Standard Error of the Mean
In this case there is no built in formula in excel for this. It would be calculated with an ugly formula that looks like: =STDEV()\SQRT(COUNT()) to calculate the SE according to: SE = \frac{SD}{\sqrt{n}}
# find standard error of the mean of all the columns
df.sem(axis = 0)Citation
@online{bochman2021,
author = {Bochman, Oren},
title = {Excel 2019 for {Marketing} {Statistics} in Pandas},
date = {2021-09-24},
url = {https://orenbochman.github.io/posts/2021/2021-09-24-statistics-for-marketing-in-python/},
langid = {en}
}