TLDR
Is there a port of Excel 2019 for Marketing Statistics ?
Sample Size, Mean, Standard Deviation, and Standard Error of the Mean
Let’s start with some data:
# importing pandas as pd
import pandas as pd
# Creating the data frame
= pd.DataFrame({
df "A":[12, 4, 5, 44, 1],
"B":[5, 2, 54, 3, 2],
"C":[20, 16, 7, 3, 8],
"D":[14, 3, 17, 2, 6],
})
# Print the data frame
df
Sample Size
We want a replacement of Excel’s =COUNT()
function to get the sample size n - we need to get the row count in the data frame. Pandas also has a count()
function but we should use the first item in the shape property.
0] df.shape[
pitfall:
df.count()
will only return the count of non-NA/NaN rows for each column. ## Mean We want a replacement of Excel’s =AVERAGE()
function to calculate the mean \bar X according to: \bar X = \frac{\sum X}{n}
# Print the data frame
= 0) df.mean(axis
Standard Deviation
We want a replacement of Excel’s =STDEV()
function to calculate the mean \bar X according to:
SD = \sqrt { \frac{\sum {(X-\bar X)^2}}{n-1}}
= 0) df.std(axis
Standard Error of the Mean
In this case there is no built in formula in excel for this. It would be calculated with an ugly formula that looks like: =STDEV()\SQRT(COUNT())
to calculate the SE according to: SE = \frac{SD}{\sqrt{n}}
# find standard error of the mean of all the columns
= 0) df.sem(axis
Citation
@online{bochman2021,
author = {Bochman, Oren},
title = {Excel 2019 for {Marketing} {Statistics} in Pandas},
date = {2021-09-24},
url = {https://orenbochman.github.io/posts/2021/2021-09-24-statistics-for-marketing-in-python/2021-09-24-statistics-for-marketing-in-python.html},
langid = {en}
}