4 Conditional Probability and Bayes’ Law - M1L2HW1

Caution

Section omitted to comply with the Honor Code

--- title: "Conditional Probability and Bayes' Law - M1L2HW1" subtitle: "Bayesian Statistics: From Concept to Data Analysis" categories: - Bayesian Statistics keywords: - Conditional Probability - Bayes' Law - Homework --- ::::: {.content-visible unless-profile="HC"} ::: {.callout-caution} Section omitted to comply with the Honor Code ::: ::::: ::::: {.content-hidden unless-profile="HC"} \index{dataset! Titanic} [Titanic data]{.column-margin} I put the data from the Titanic for this problem into a .csv file and loaded it into a data frame as follows: ```{python} import pandas as pd df=pd.read_csv("data/titanic.csv",index_col='class') print(df) ``` ::: {#exr-cpb-1} [Titanic data]{.column-margin} If we randomly select a person's name from the complete list of passengers and crew, what is the probability that this person traveled in 1st class? Round your answer to two decimal places. ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q01(df): return round(df.loc[1].sum() / df.values.sum(),2) l02q01(df) ``` ::: ::: {#exr-cpb-2} [Titanic data]{.column-margin} What is the probability that a (randomly selected) person survived? Round your answer to two decimal places. ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q02(df): return round(df.survived.sum() / df.values.sum(),2) l02q02(df) ``` ::: ::: {#exr-cpb-3} [Titanic data]{.column-margin} What is the probability that a (randomly selected) person survived, given that they were in 1st class? Round your answer to two decimal places. ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q03(df): return round( df.loc[1,'survived'].sum() / df.loc[1,].sum(), 2) l02q03(df) ``` ::: ::: {#exr-cpb-4} [Titanic data]{.column-margin} True/False: The events concerning class and survival are statistically independent. ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: False ::: \index{dataset!marbles} [Marbles data]{.column-margin} You have three bags, labeled A, B, and C. Bag A contains two red marbles and three blue marbles. Bag B contains five red marbles and one blue marble. Bag C contains three red marbles only. again I put the data for this into a CSV file and loaded it as follows ```{python} import pandas as pd mdf=pd.read_csv("data/marbles.csv",index_col='bag') print(mdf) ``` ::: {#exr-cpb-5} [Marbles data]{.column-margin} If you select from bag B, what is the probability that you will draw a red marble? ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q05(df): p = {} p['red n b'] = df.loc['b','red'].sum() / df.loc['b',].sum() print(f"pr['red n b'] = {df.loc['b','red'].sum()} / {df.loc['b',].sum()} = {p['red n b']}") l02q05(mdf) ``` ::: ::: {#exr-cpb-6} [Marbles data]{.column-margin} If you randomly select one of the three bags with equal probability, so that $\mathbb{P}r(A)=\mathbb{P}r(B)=\mathbb{P}r(C)=1/3$, and then randomly draw a marble from that bag, what is the probability that the marble will be blue? Round your answer to two decimal places. ::: ::: {.callout-note collapse="true"} #### Hint This is the marginal probability $\mathbb{P}r(blue)$. You can obtain this using the `law of total` probability (which appears in the denominator of Bayes' theorem). It is $$ \begin{aligned} \mathbb{P}r(blue) &= \mathbb{P}r(blue \cap A) + \mathbb{P}r(blue \cap B) + \mathbb{P}r(blue \cap C) \\ &= \mathbb{P}r(blue \mid A) \cdot \mathbb{P}r(A)+\mathbb{P}r(blue \mid B)\cdot \mathbb{P}r(B) + \mathbb{P}r(blue \mid C) \cdot \mathbb{P}r(C) \end{aligned} $$ ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q06(df): p = {} for bag in ['a','b','c']: p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum() p[bag] = 1/3 p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag] return round(p['blue n a'] + p['blue n b'] + p['blue n c'],2) print(f'pr(blue)={l02q06(mdf)}') ``` ::: ::: {#exr-cpb-7} [Marbles data]{.column-margin} Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is blue. What is the probability that the bag you selected this marble from is A? That is, find $\mathbb{P}r(A\mid \text{blue})$. ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q07(df): p = {} for bag in ['a','b','c']: p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum() p[bag] = 1/3 p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag] p['blue']= p['blue n a'] + p['blue n b'] + p['blue n c'] p['a|blue']=p['blue|a']*p['a']/p['blue'] return p p=l02q07(mdf) print(f"{p['a|blue']=:1.2f}") ``` ::: ::: {#exr-cpb-8} [Marbles data]{.column-margin} Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is blue. What is the probability that the bag you selected is `C`? That is, find $\mathbb{P}r(C\mid \text{blue})$. ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q08(df): p = {} for bag in ['a','b','c']: p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum() p[bag] = 1/3 p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag] p['blue']= p['blue n a'] + p['blue n b'] + p['blue n c'] p['z|blue']=p['blue|a']*p['a']/p['blue'] p['c|blue']=p['blue|c']*p['c']/p['blue'] return p p=l02q08(mdf) print(f"{p['c|blue']=:1.2f}") ``` ::: ::: {#exr-cpb-9} [Marbles data]{.column-margin} Suppose a bag is randomly selected (again, with equal probability), but you do not know which it is. You randomly draw a marble and observe that it is red. What is the probability that the bag you selected is C? That is, find $\mathbb{P}r(C\mid \text{red})$. ::: ::: {.solution .column-screen-inset-right .callout-tip collapse="true"} #### Solution: ```{python} def l02q09(df): p = {} for bag in ['a','b','c']: p[f'blue|{bag}'] = df.loc[bag,'blue'] / df.loc[bag,].sum() p[f'red|{bag}'] = df.loc[bag,'red'] / df.loc[bag,].sum() p[bag] = 1/3 p[f'blue n {bag}'] = p[f'blue|{bag}'] * p[bag] p[f'red n {bag}'] = p[f'red|{bag}'] * p[bag] #print(f'{p}') p['blue'] = p['blue n a'] + p['blue n b'] + p['blue n c'] p['red' ] = p['red n a'] + p['red n b'] + p['red n c'] p['z|blue'] = p['blue|a'] * p['a'] / p['blue'] p['c|blue'] = p['blue|c'] * p['c'] / p['blue'] p['c|red' ] = p['red|c'] * p['c'] / p['red'] return p p=l02q09(mdf) print(f"{p['c|red']=:1.2f}") ``` ::: :::::