# Bernoulli vs Multinoulli (Categorical) vs Binomial vs Multinomial / Gaussian / Poisson Distributions

Yao Yao on September 9, 2014

## 0. The “Choose” notation

• $n!$ reads “n factorial”
• ${n \choose x} = \frac{n!}{x!(n-x)!}$ reads “n choose x”

$n \choose x$ counts the number of ways of selecting $x$ objects out of $n$ without replacement disregarding the order of the items, i.e. $C_n^x$.

In particular,

${n \choose 0} = {n \choose n} = 1$

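The multinomial coefficient ${n \choose {n_1, \dots, n_c}} = \frac{n!}{n_1! \cdots n_c!}$ generalizes this to $n$ objects of $c$ types, where there are:

• $n_1$ objects of type $1$
• $n_2$ objects of type $2$
• $\dots$
• $n_c$ objects of type $c$
• $\sum_{i=1}^{c} n_i = n$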
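
As a minimal sketch of the counting notation above, both coefficients can be computed directly from factorials in R. The helper names `n_choose_x` and `multinomial_coef` are hypothetical and introduced only for illustration; base R already provides `choose()`.

```r
# Binomial coefficient: ways to pick x of n objects, order ignored.
n_choose_x <- function(n, x) factorial(n) / (factorial(x) * factorial(n - x))

# Multinomial coefficient (hypothetical helper): arrangements of
# sum(counts) objects where counts[i] of them are of type i.
multinomial_coef <- function(counts) {
  factorial(sum(counts)) / prod(factorial(counts))
}

n_choose_x(8, 7)              # 8, same as choose(8, 7)
multinomial_coef(c(7, 1))     # 8: with two types it reduces to "n choose x"
multinomial_coef(c(2, 1, 1))  # 12 = 4! / (2! 1! 1!)
```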

## 1. Bernoulli vs Multinoulli (Categorical) vs Binomial vs Multinomial

• $\operatorname{Bernoulli}(\pi_1)$: toss a coin once
• $\operatorname{Multinoulli}(\boldsymbol{\pi})$: roll a die once; a.k.a. Categorical
• $\operatorname{Binomial}(n, \pi_1)$: toss a coin $n$ times
• $\operatorname{Multinomial}(n, \boldsymbol{\pi})$: roll a die $n$ times

• Strictly speaking, these are 4 RVs rather than 4 distributions, but the two terms have long been used interchangeably
• More precisely, these are 4 discrete RVs
• A lone $\pi_1$ denotes the probability of getting binary outcome $1$ on each toss
• Because the outcome is binary, $\pi_0 = 1 - \pi_1$ is omitted
• $\boldsymbol{\pi}$ is itself a distribution:
• $\boldsymbol{\pi} = \lbrace \pi_1, \dots, \pi_c \rbrace$ ($c$ should be countable)
• $\pi_i$ denotes the probability of getting categorical outcome $i$ on each toss
• $\sum_{i} \pi_i = 1$

• $\operatorname{Multinoulli}(\lbrace \pi_1, 1-\pi_1 \rbrace) \sim \operatorname{Bernoulli}(\pi_1)$
• $\operatorname{Binomial}(1, \pi_1) \sim \operatorname{Bernoulli}(\pi_1)$
• $\operatorname{Multinomial}(1, \boldsymbol{\pi}) \sim \operatorname{Multinoulli}(\boldsymbol{\pi})$
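
The special cases above can be checked numerically. This is a small sketch using base R's `dbinom` and `dmultinom`; the values of `p` and `probs` are arbitrary illustrations, not taken from the text.

```r
p <- 0.3                       # illustrative value for pi_1
x <- c(0, 1)

# Bernoulli(pi_1) PMF written out by hand
bernoulli_pmf <- p^x * (1 - p)^(1 - x)

# Binomial(1, pi_1) has the same PMF as Bernoulli(pi_1)
all.equal(dbinom(x, size = 1, prob = p), bernoulli_pmf)   # TRUE

# Multinomial(1, pi) puts probability pi_i on the i-th one-hot count vector,
# which is exactly the Multinoulli(pi) PMF
probs <- c(0.2, 0.5, 0.3)      # illustrative pi
one_hot <- diag(3)             # rows: (1,0,0), (0,1,0), (0,0,1)
one_toss <- apply(one_hot, 1, function(counts) {
  dmultinom(counts, size = 1, prob = probs)
})
all.equal(one_toss, probs)     # TRUE
```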

If $X \sim \operatorname{Bernoulli}(\pi_1)$:

• $X$ is binary
• $\mathbb{P}(X=1) = \pi_1$
• $\mathbb{P}(X=0) = 1-\pi_1$
• PMF $p_X(x) = \pi_1^x (1-\pi_1)^{1-x}$
• $\mathbb{E}[X] = \pi_1$
• $\operatorname{Var}(X) = \pi_1(1-\pi_1)$
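
The mean and variance above follow directly from the two-point PMF. A short derivation, using $\operatorname{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$:

$\begin{eqnarray*} \mathbb{E}[X] & = 0 \cdot (1-\pi_1) + 1 \cdot \pi_1 = \pi_1 \newline \mathbb{E}[X^2] & = 0^2 \cdot (1-\pi_1) + 1^2 \cdot \pi_1 = \pi_1 \newline \operatorname{Var}(X) & = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \pi_1 - \pi_1^2 = \pi_1 (1-\pi_1) \end{eqnarray*}$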

If $X \sim \operatorname{Multinoulli}(\boldsymbol{\pi})$:

• $X$ is categorical
• $\mathbb{P}(X=1) = \pi_1$
• $\cdots$
• $\mathbb{P}(X=c) = \pi_c$
• PMF $p_X(x) = \prod_{i=1}^c \pi_i^{I(x=i)}$
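
The indicator form of the PMF simply picks out $\pi_x$. A minimal sketch in R, where `multinoulli_pmf` is a hypothetical helper and `probs` is an arbitrary example of $\boldsymbol{\pi}$; base R has no dedicated Multinoulli function, so a single draw is simulated with `sample()`.

```r
probs <- c(0.2, 0.5, 0.3)      # illustrative pi, sums to 1

# PMF as a product over categories with an indicator exponent
multinoulli_pmf <- function(x, probs) {
  prod(probs^(seq_along(probs) == x))
}
pmf_values <- sapply(1:3, multinoulli_pmf, probs = probs)
pmf_values                     # recovers probs

# Simulated tosses: empirical frequencies should be close to probs
set.seed(1)
draws <- sample(seq_along(probs), size = 1e4, replace = TRUE, prob = probs)
freq <- tabulate(draws, nbins = length(probs)) / length(draws)
```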

If $\mathbf{X} \sim \operatorname{Binomial}(n, \pi_1)$:

• $X = (n_1, n_0)$
• $n_1$ is the number of times outcome $1$ occurs
• $n_0$ is the number of times outcome $0$ occurs
• $n_1 + n_0 = n$ (because there are $n$ tosses)
• $\mathbb{P}_X(n_1, n_0) = {n \choose {n_1, n_0}} \pi_1^{n_1} \, (1-\pi_1)^{n_0}$
• For convenience, we can also set $X = n_1 \times 1 + n_0 \times 0 = n_1$, which gives:
• PMF $p_{X}(x) = {n \choose x} \pi_1^x(1-\pi_1)^{n-x}$, where $x = 0,\ldots,n$
• I am not fond of this shortcut
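
The pair form $(n_1, n_0)$ and the scalar form $x = n_1$ describe the same probability. A quick sketch with illustrative values ($n = 8$, $\pi_1 = 0.5$, $n_1 = 7$), using `dmultinom` for the pair and `dbinom` for the scalar:

```r
n <- 8; p <- 0.5               # illustrative values
n1 <- 7; n0 <- n - n1

pair_view   <- dmultinom(c(n1, n0), prob = c(p, 1 - p))  # P(n_1 = 7, n_0 = 1)
scalar_view <- dbinom(n1, size = n, prob = p)            # P(X = 7)

all.equal(pair_view, scalar_view)   # TRUE, both equal 8 * 0.5^8 = 0.03125
```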

If $\mathbf{X} \sim \operatorname{Multinomial}(n, \boldsymbol{\pi})$:

• $X = (n_1, n_2, \dots, n_c)$
• $n_1$ is the number of times outcome $1$ occurs
• $\cdots$
• $n_c$ is the number of times outcome $c$ occurs
• $\sum_{i=1}^c n_i = n$ (because there are $n$ tosses)
• $\mathbb{P}_X(n_1, \dots, n_c) = {n \choose {n_1, \dots, n_c}} \pi_1^{n_1} \dots \pi_c^{n_c}$
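
As a sketch, the multinomial PMF can be evaluated by hand and compared with base R's `dmultinom`. The counts and probabilities below are made up for illustration.

```r
counts <- c(2, 1, 1)           # illustrative (n_1, n_2, n_3), so n = 4
probs  <- c(0.5, 0.3, 0.2)     # illustrative pi

# Multinomial coefficient times the product of pi_i^{n_i}
manual <- factorial(sum(counts)) / prod(factorial(counts)) * prod(probs^counts)
manual                                         # 12 * 0.015 = 0.18

all.equal(manual, dmultinom(counts, prob = probs))   # TRUE
```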

• Multivariate usually refers to a compound RV, e.g. $X = (X_1, X_2)$, where $X_1$ and $X_2$ each have their own distribution and together $X$ has a multivariate distribution
• Multinomial has a strong categorical/count flavor

Exercise:

• Suppose a friend has 8 children, 7 of whom are girls and none of whom are twins
• If each gender has an independent 50% probability for each birth, what’s the probability of getting 7 or more girls out of 8 births?
${8 \choose 7} .5^{7}(1-.5)^{1} + {8 \choose 8} .5^{8}(1-.5)^{0} \approx 0.04$
choose(8, 7) * .5 ^ 8 + choose(8, 8) * .5 ^ 8
##  0.03516

pbinom(6, size = 8, prob = .5, lower.tail = FALSE) ## with lower.tail = TRUE (the default) pbinom returns P(X ≤ x); otherwise it returns P(X > x), so this call returns P(X > 6)
##  0.03516


## 2. Normal (Gaussian) Distribution

### 2.1 Definition

If $X \sim \mathcal{N}(\mu, \sigma^2)$, we say that the RV $X$ follows a normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$:

• PDF $f(x) = \frac{1}{\sqrt{2 \pi \sigma^2} } e^{ - \frac{(x-\mu)^2}{2 \sigma^2}}$
• $E[X] = \mu$ and $Var(X) = \sigma^2$

The distribution of $\mathcal{N}(0, 1)$ is called the standard normal distribution:

• PDF $\phi(x) = \frac{1}{\sqrt{2 \pi} } e^{ - \frac{x^2}{2} }$

Standard normal RVs are often labeled $Z$:

• If $X \sim \mbox{N}(\mu,\sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim \mbox{N}(0,1)$ i.e. $Z$ is standard normal
• If $Z$ is standard normal, then $X = \mu + \sigma Z \sim \mbox{N}(\mu, \sigma^2)$
• The non-standard normal density is $\frac{\phi(\frac{x - \mu}{\sigma})}{\sigma}$
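
The standardization rules above can be verified numerically. A minimal sketch with illustrative parameters (`mu`, `sigma` and `x` are arbitrary choices), using base R's `pnorm` and `dnorm`:

```r
mu <- 1020; sigma <- 50; x <- 1160   # illustrative values
z <- (x - mu) / sigma                # standardized value, 2.8 here

# P(X <= x) equals P(Z <= z)
all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z))           # TRUE

# Non-standard density equals phi((x - mu) / sigma) / sigma
all.equal(dnorm(x, mean = mu, sd = sigma), dnorm(z) / sigma)   # TRUE
```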

Percentiles:

1. Approximately 68%, 95% and 99% of the normal density lies within 1, 2 and 3 standard deviations from the mean, respectively
2. -1.28, -1.645, -1.96 and -2.33 are the $10^{\text{th}}$, $5^{\text{th}}$, $2.5^{\text{th}}$ and $1^{\text{st}}$ percentiles of the standard normal distribution respectively
3. By symmetry, 1.28, 1.645, 1.96 and 2.33 are the $90^{\text{th}}$, $95^{\text{th}}$, $97.5^{\text{th}}$ and $99^{\text{th}}$ percentiles of the standard normal distribution respectively
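
These reference values can be reproduced with `qnorm` and `pnorm`. The sketch below rounds the results, so it also shows how coarse the usual rules of thumb are:

```r
# Lower-tail quantiles of the standard normal
lower_q <- round(qnorm(c(0.10, 0.05, 0.025, 0.01)), 3)
lower_q          # -1.282 -1.645 -1.960 -2.326

# Probability mass within k standard deviations of the mean, k = 1, 2, 3
k <- 1:3
within_k <- round(pnorm(k) - pnorm(-k), 4)
within_k         # 0.6827 0.9545 0.9973
```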

Other properties:

• The normal distribution is symmetric and peaked about its mean, therefore the mean, median and mode are all equal
• A constant times a normally distributed random variable is also normally distributed
• Sums of jointly normally distributed random variables are again normally distributed, even if the variables are dependent
• Sample means of normally distributed random variables are again normally distributed
• The square of a standard normal random variable follows what is called the chi-squared distribution
• The exponential of a normally distributed random variable follows what is called the log-normal distribution
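
Several of these properties can be stated explicitly. The following is a summary under simplifying assumptions that go slightly beyond the list above: $X \sim \mathcal{N}(\mu, \sigma^2)$, $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, independence for the sum, and i.i.d. draws for the sample mean.

$\begin{eqnarray*} aX & \sim \mathcal{N}(a\mu, a^2\sigma^2) \quad (a \neq 0) \newline X_1 + X_2 & \sim \mathcal{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2) \quad (X_1, X_2 \text{ independent}) \newline \bar{X}_n & \sim \mathcal{N}(\mu, \sigma^2 / n) \quad (X_1, \dots, X_n \text{ i.i.d. } \mathcal{N}(\mu, \sigma^2)) \newline Z^2 & \sim \chi^2_1 \quad (Z \sim \mathcal{N}(0, 1)) \newline e^{X} & \sim \operatorname{Lognormal}(\mu, \sigma^2) \end{eqnarray*}$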

### 2.2 Exercise

#### 2.2.1 What is the $95^{\text{th}}$ percentile of a $N(\mu, \sigma^2)$ distribution?

• Quick answer in R: `qnorm(.95, mean = mu, sd = sd)`
• We want the point $x_0$ so that $P(X \leq x_0) = .95$
$\begin{eqnarray*} P(X \leq x_0) & = P \left ( \frac{X - \mu}{\sigma} \leq \frac{x_0 - \mu}{\sigma} \right ) \newline & = P \left ( Z \leq \frac{x_0 - \mu}{\sigma} \right ) = 0.95 \end{eqnarray*}$
• Therefore $\frac{x_0 - \mu}{\sigma} = 1.645$ or $x_0 = \mu + 1.645\sigma$
• In general $x_0 = \mu + z_0 \sigma$ where $z_0$ is the appropriate standard normal quantile
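
A quick numerical check of $x_0 = \mu + z_0 \sigma$, as a sketch with illustrative values for `mu` and `sigma` (the exercise itself leaves them symbolic):

```r
mu <- 1020; sigma <- 50                  # illustrative values

x0_direct <- qnorm(0.95, mean = mu, sd = sigma)   # quantile of N(mu, sigma^2)
x0_manual <- mu + qnorm(0.95) * sigma             # mu + z_0 * sigma

all.equal(x0_direct, x0_manual)          # TRUE
```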

#### 2.2.2 What is the probability that a $\mbox{N}(\mu,\sigma^2)$ RV is more than 2 standard deviations above the mean?

I.e. we want to know

$\begin{eqnarray*} P(X > \mu + 2\sigma) & = P \left ( \frac{X -\mu}{\sigma} > \frac{\mu + 2\sigma - \mu}{\sigma} \right ) \newline & = P(Z \geq 2 ) \newline & \approx 2.5\% \end{eqnarray*}$
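
For reference, the tail probability can be computed directly. The 2.5% above is the usual rule-of-thumb value (it corresponds more precisely to 1.96 standard deviations); the sketch below gives the value at exactly 2.

```r
p_tail <- pnorm(2, lower.tail = FALSE)   # P(Z > 2)
round(p_tail, 4)                         # 0.0228, i.e. about 2.3%
```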

#### 2.2.3 Clicks Problem I

Assume that the number of daily ad clicks for a company is approximately normally distributed with a mean of 1020 and a standard deviation of 50. What is the probability of getting more than 1160 clicks in a day?

• First thought: it is not very likely, since 1160 is 2.8 standard deviations above the mean
pnorm(1160, mean = 1020, sd = 50, lower.tail = FALSE)
##  0.002555

pnorm(2.8, lower.tail = FALSE)
##  0.002555


#### 2.2.4 Clicks Problem II

What number of daily ad clicks is the $75^{\text{th}}$ percentile, i.e. the value below which 75% of days fall?

qnorm(0.75, mean = 1020, sd = 50)
##  1054


## 3. Poisson distribution

### 3.1 Definition

• The Poisson mass function is $$P(X = x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \text{ for } x=0,1,\ldots$$
• The mean of this distribution is $\mu = \lambda$
• The variance of this distribution is $\sigma^2 = \lambda$
• Notice that $x$ ranges over the non-negative integers $0, 1, 2, \ldots$ with no upper bound
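
A minimal sketch checking the mass function, mean and variance numerically. The value of `lambda` is an arbitrary illustration, and the infinite sums are truncated at 200, which is an approximation that is harmless for a rate this small.

```r
lambda <- 2.5                            # illustrative rate
x <- 0:5

# Mass function written out by hand vs. base R's dpois
manual_pmf <- lambda^x * exp(-lambda) / factorial(x)
all.equal(manual_pmf, dpois(x, lambda))  # TRUE

# Mean and variance from (truncated) sums over the support
xs <- 0:200
pois_mean <- sum(xs * dpois(xs, lambda))
pois_var  <- sum((xs - pois_mean)^2 * dpois(xs, lambda))
c(pois_mean, pois_var)                   # both approximately lambda
```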

### 3.2 Some uses for the Poisson distribution

The Poisson distribution applies when:

1. the event is something that can be counted in whole numbers;
2. occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another;
3. the average frequency of occurrence for the time period in question is known;
4. and it is possible to count how many events have occurred, but meaningless to ask how many such events have not occurred.

An example is the number of times a firefly lights up in my garden in a given 5 seconds on some evening.

When $n$ is large and $p$ is small:

• Poisson distribution can be used to approximate binomials

### 3.3 Rates and Poisson random variables

• Poisson random variables are used to model rates
• If $X \sim Poisson(\lambda)$ on 1 unit interval, then $Y \sim Poisson(k\lambda)$ on $k$ unit intervals.
• $\lambda = E[\frac{Y}{k}]$ is the expected count per time unit (i.e. rate)
• $k$ means the total monitoring process takes $k$ time units
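
The scaling rule can be checked for $k = 2$ by convolving two unit-interval counts. This sketch assumes the counts in disjoint intervals are independent, which the Poisson model implies but the bullets above do not state explicitly; `lambda` is an arbitrary illustration.

```r
lambda <- 2.5                            # illustrative rate per unit interval
y <- 0:10

# P(X1 + X2 = t) for independent X1, X2 ~ Poisson(lambda)
two_interval_pmf <- sapply(y, function(t) {
  sum(dpois(0:t, lambda) * dpois(t - 0:t, lambda))
})

# Matches Poisson(2 * lambda), the count over 2 unit intervals
all.equal(two_interval_pmf, dpois(y, 2 * lambda))   # TRUE
```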

### 3.4 Exercise: Rate

The number of people that show up at a bus stop is Poisson with a mean of 2.5 per hour. If watching the bus stop for 4 hours, what is the probability that 3 or fewer people show up for the whole time?

ppois(3, lambda = 2.5 * 4)
##  0.01034


### 3.5 Poisson approximation to the binomial

• When $n$ is large and $p$ is small, the Poisson distribution is an accurate approximation to the binomial distribution
• Notation:
• $X \sim \mbox{Binomial}(n, p)$
• $\lambda = n p$
• Then $X$ is approximately $\mbox{Poisson}(\lambda)$ in the limit where:
• $n$ gets large
• $p$ gets small
• $\lambda$ stays constant
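
A sketch of why this limit holds: substitute $p = \lambda / n$ into the binomial PMF and let $n \to \infty$ with $x$ and $\lambda$ fixed.

$\begin{eqnarray*} P(X = x) & = {n \choose x} \left( \frac{\lambda}{n} \right)^x \left( 1 - \frac{\lambda}{n} \right)^{n-x} \newline & = \frac{\lambda^x}{x!} \cdot \frac{n(n-1)\cdots(n-x+1)}{n^x} \cdot \left( 1 - \frac{\lambda}{n} \right)^{n} \left( 1 - \frac{\lambda}{n} \right)^{-x} \newline & \to \frac{\lambda^x e^{-\lambda}}{x!} \quad \text{as } n \to \infty \end{eqnarray*}$

The middle factor and the last factor both tend to $1$, while $\left( 1 - \frac{\lambda}{n} \right)^{n} \to e^{-\lambda}$.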

### 3.6 Exercise: Poisson approximation to the binomial

We flip a coin with success probability 0.01 five hundred times. What’s the probability of 2 or fewer successes?

pbinom(2, size = 500, prob = .01)
##  0.1234

ppois(2, lambda=500 * .01)
##  0.1247
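
To see how the approximation behaves as $n$ grows with $\lambda = np$ held fixed, here is a small sketch. The grid of `n` values is an arbitrary choice; only $n = 500$ comes from the exercise.

```r
lambda <- 5                              # n * p from the exercise: 500 * 0.01
ns <- c(50, 500, 5000)                   # illustrative grid of n values

# Absolute gap between the exact binomial CDF and the Poisson CDF at 2
gaps <- sapply(ns, function(n) {
  abs(pbinom(2, size = n, prob = lambda / n) - ppois(2, lambda = lambda))
})
data.frame(n = ns, p = lambda / ns, gap = gaps)   # gap shrinks as n grows
```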