Caution
Section omitted to comply with the Honor Code
Bayesian Statistics: From Concept to Data Analysis
Oren Bochman
Conditional Probability, Bayes’ Law, Homework
Section omitted to comply with the Honor Code
---
title: "Conditional Probability and Bayes' Law - M1L2HW1"
subtitle: "Bayesian Statistics: From Concept to Data Analysis"
categories:
- Bayesian Statistics
keywords:
- Conditional Probability
- Bayes' Law
- Homework
---
::::: {.content-visible unless-profile="HC"}
::: {.callout-caution}
Section omitted to comply with the Honor Code
:::
:::::
::::: {.content-hidden unless-profile="HC"}
\index{dataset! Titanic}
[Titanic data]{.column-margin} I put the data from the Titanic for this problem into a .csv file and loaded it into a data frame as follows:
```{python}
import pandas as pd
df=pd.read_csv("data/titanic.csv",index_col='class')
print(df)
```
::: {#exr-cpb-1}
[Titanic data]{.column-margin} If we randomly select a person's name from the complete list of passengers and crew, what is the probability that this person traveled in 1st class? Round your answer to two decimal places.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q01(df):
return round(df.loc[1].sum() / df.values.sum(),2)
l02q01(df)
```
:::
::: {#exr-cpb-2}
[Titanic data]{.column-margin} What is the probability that a (randomly selected) person survived? Round your answer to two decimal places.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q02(df):
return round(df.survived.sum() / df.values.sum(),2)
l02q02(df)
```
:::
::: {#exr-cpb-3}
[Titanic data]{.column-margin} What is the probability that a (randomly selected) person survived, given that they were in 1st class? Round your answer to two decimal places.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q03(df):
return round(
df.loc[1,'survived'].sum() /
df.loc[1,].sum(),
2)
l02q03(df)
```
:::
::: {#exr-cpb-4}
[Titanic data]{.column-margin} True/False: The events concerning class and survival are statistically independent.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
False
:::
\index{dataset!marbles}
[Marbles data]{.column-margin} You have three bags, labeled A, B, and C. Bag A contains two red marbles and three blue marbles. Bag B contains five red marbles and one blue marble. Bag C contains three red marbles only.
again I put the data for this into a CSV file and loaded it as follows
```{python}
import pandas as pd
mdf=pd.read_csv("data/marbles.csv",index_col='bag')
print(mdf)
```
::: {#exr-cpb-5}
[Marbles data]{.column-margin} If you select from bag B, what is the probability that you will draw a red marble?
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q05(df):
p = {}
p['red n b'] = df.loc['b','red'].sum() / df.loc['b',].sum()
print(f"pr['red n b'] = {df.loc['b','red'].sum()} / {df.loc['b',].sum()} = {p['red n b']}")
l02q05(mdf)
```
:::
::: {#exr-cpb-6}
[Marbles data]{.column-margin} If you randomly select one of the three bags with equal probability, so that $\mathbb{P}r(A)=\mathbb{P}r(B)=\mathbb{P}r(C)=1/3$, and then randomly draw a marble from that bag, what is the probability that the marble will be blue? Round your answer to two decimal places.
:::
::: {.callout-note collapse="true"}
#### Hint
This is the marginal probability $\mathbb{P}r(blue)$. You can obtain this using the `law of total` probability (which appears in the denominator of Bayes' theorem). It is
$$
\begin{aligned}
\mathbb{P}r(blue) &= \mathbb{P}r(blue \cap A) + \mathbb{P}r(blue \cap B) + \mathbb{P}r(blue \cap C)
\\ &= \mathbb{P}r(blue \mid A) \cdot \mathbb{P}r(A)+\mathbb{P}r(blue \mid B)\cdot \mathbb{P}r(B) + \mathbb{P}r(blue \mid C) \cdot \mathbb{P}r(C)
\end{aligned}
$$
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q06(df):
p = {}
for bag in ['a','b','c']:
p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum()
p[bag] = 1/3
p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag]
return round(p['blue n a'] + p['blue n b'] + p['blue n c'],2)
print(f'pr(blue)={l02q06(mdf)}')
```
:::
::: {#exr-cpb-7}
[Marbles data]{.column-margin} Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is blue. What is the probability that the bag you selected this marble from is A? That is, find $\mathbb{P}r(A\mid \text{blue})$.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q07(df):
p = {}
for bag in ['a','b','c']:
p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum()
p[bag] = 1/3
p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag]
p['blue']= p['blue n a'] + p['blue n b'] + p['blue n c']
p['a|blue']=p['blue|a']*p['a']/p['blue']
return p
p=l02q07(mdf)
print(f"{p['a|blue']=:1.2f}")
```
:::
::: {#exr-cpb-8}
[Marbles data]{.column-margin} Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is blue. What is the probability that the bag you selected is `C`? That is, find $\mathbb{P}r(C\mid \text{blue})$.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q08(df):
p = {}
for bag in ['a','b','c']:
p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum()
p[bag] = 1/3
p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag]
p['blue']= p['blue n a'] + p['blue n b'] + p['blue n c']
p['z|blue']=p['blue|a']*p['a']/p['blue']
p['c|blue']=p['blue|c']*p['c']/p['blue']
return p
p=l02q08(mdf)
print(f"{p['c|blue']=:1.2f}")
```
:::
::: {#exr-cpb-9}
[Marbles data]{.column-margin} Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is red. What is the probability that the bag you selected is C? That is, find $\mathbb{P}r(C\mid \text{red})$.
:::
::: {.solution .callout-tip collapse="true"}
#### Solution:
```{python}
def l02q09(df):
p = {}
for bag in ['a','b','c']:
p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum()
p[f'red|{bag}'] = df.loc[bag,'red'] / df.loc[bag,].sum()
p[bag] = 1/3
p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag]
p[f'red n {bag}'] = p[f'red|{bag}'] * p[bag]
#print(f'{p}')
p['blue'] = p['blue n a'] + p['blue n b'] + p['blue n c']
p['red' ] = p['red n a'] + p['red n b'] + p['red n c']
p['z|blue'] = p['blue|a'] * p['a'] / p['blue']
p['c|blue'] = p['blue|c'] * p['c'] / p['blue']
p['c|red' ] = p['red|c'] * p['c'] / p['red']
return p
p=l02q09(mdf)
print(f"{p['c|red']=:1.2f}")
```
:::
:::::