Chapter 4 Discrete Distributions

4.1 Bernoulli distribution

A Bernoulli trial is a random experiment with exactly two possible outcomes, which we often call “success” and “failure”. The probability of success is the same every time the experiment is conducted.

For example, flipping a coin is a Bernoulli trial: it has exactly two outcomes (Heads and Tails).

We can use a random variable \(X\) to denote the outcome of a Bernoulli trial.

For example, \(X=1\) if success is observed and \(X=0\) if failure is observed, so that \[ P(X=1)=p \\ P(X=0)=1-p \] In this case, the random variable \(X\) has a Bernoulli distribution and its pmf can be written as \[ P(X=x \mid p)=\left\{ \begin{array}{lcl} p & \mbox{if} & x=1 \\ 1-p & \mbox{if} & x=0 \end{array}\right. \] The expected value of a Bernoulli random variable is \(E(X)=p\) and its variance is \(Var(X)=p(1-p)\).
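
As a quick numerical check, here is a minimal sketch using Python's scipy.stats (the library choice and the value \(p=0.3\) are illustrative assumptions, not part of the text):

```python
from scipy import stats

p = 0.3                 # illustrative success probability (an assumption)
X = stats.bernoulli(p)

print(X.pmf(1))         # P(X = 1) = p       -> 0.3
print(X.pmf(0))         # P(X = 0) = 1 - p   -> 0.7
print(X.mean())         # E(X) = p           -> 0.3
print(X.var())          # Var(X) = p(1 - p)  -> 0.21
```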

4.2 Binomial distribution

If we perform identical Bernoulli trials \(n\) times independently, and the random variable \(X\) is the number of successes out of the \(n\) trials, then \(X\) has a binomial distribution, \[ X\sim Bin(n, p). \] Its pmf can be written as \[ P(X=x \mid n,p)=\left\{ \begin{array}{lcl} {n \choose x} p^x(1-p)^{n-x} & \mbox{if} & x=0, 1, 2, \ldots, n \\ 0 & & \mbox{otherwise} \end{array}\right. \]

If \(X\sim Bin(n, p)\), then \(E(X)=np\) and \(Var(X)=np(1-p)\).

Summary: \(Bin(n, p)\) describes the probability of obtaining \(X\) successes from \(n\) independent identical Bernoulli trials with the probability of success equal to \(p\).
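
A short sketch in the same spirit (scipy again, with \(n=10\) and \(p=0.5\) chosen arbitrarily for illustration):

```python
from scipy import stats

n, p = 10, 0.5          # illustrative parameters (assumptions)
X = stats.binom(n, p)

print(X.pmf(4))                  # C(10,4) p^4 (1-p)^6 ≈ 0.2051
print(X.mean(), n * p)           # E(X) = np -> 5.0
print(X.var(), n * p * (1 - p))  # Var(X) = np(1-p) -> 2.5
```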

4.3 Geometric distribution

If we keep performing identical Bernoulli trials independently until we observe the first success, and denote the total number of trials by \(X\), then \(X\) has a geometric distribution, \[ X \sim Geom(p). \] Its pmf is \[ P(X=x \mid p)=(1-p)^{x-1}p,\; x=1,2,\ldots \] The geometric distribution gets its name because its probabilities follow a geometric sequence with common ratio \(r=1-p\).

If \(X\sim Geom(p)\), then \(E(X)=1/p\) and \(Var(X)=(1-p)/p^2\).

Summary: \(Geom(p)\) describes the number of independent identical Bernoulli trials needed to obtain the first success.

Alternatively, the geometric distribution can also be used to express the probability of the number of failures before the first success. If \(X\) is the number of failures before the first success, then its pmf is \[ P(X=x \mid p)=(1-p)^xp, \; x=0,1,2,\ldots \] In this case, \(E(X)=1/p-1\) and \(Var(X)=(1-p)/p^2\).
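
A sketch covering both parameterizations (assuming scipy; scipy.stats.geom counts trials, so the failure-count version can be obtained by shifting the support with loc=-1, and \(p=0.2\) is an arbitrary illustration):

```python
from scipy import stats

p = 0.2                    # illustrative success probability (an assumption)

# Number of trials up to and including the first success: support 1, 2, 3, ...
X = stats.geom(p)
print(X.pmf(3))            # (1-p)^2 p = 0.128
print(X.mean())            # E(X) = 1/p = 5.0
print(X.var())             # Var(X) = (1-p)/p^2 = 20.0

# Number of failures before the first success: shift the support to 0, 1, 2, ...
Y = stats.geom(p, loc=-1)
print(Y.pmf(2))            # (1-p)^2 p = 0.128
print(Y.mean())            # E(Y) = 1/p - 1 = 4.0
```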

4.4 Negative binomial distribution

We repeat independent identical Bernoulli trials until we obtain \(r\) successes. Let \(X\) denote the number of failures before the \(r\)th success; then \(X\) has a negative binomial distribution, \[ X\sim nb(r, p). \] Its pmf is \[ P(X=x \mid r, p)={x+r-1 \choose r-1}p^r(1-p)^x, \; x=0,1,2,\ldots \] If \(X\sim nb(r, p)\), then \(E(X)=\frac{r(1-p)}{p}\) and \(Var(X)=\frac{r(1-p)}{p^2}\).
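
As a numerical sketch (scipy's nbinom uses this same failure-count parameterization; \(r=3\) and \(p=0.5\) are illustrative assumptions):

```python
from scipy import stats

r, p = 3, 0.5            # illustrative parameters (assumptions)
X = stats.nbinom(r, p)   # number of failures before the r-th success

print(X.pmf(2))          # C(2+r-1, r-1) p^r (1-p)^2 = 6/32 = 0.1875
print(X.mean())          # r(1-p)/p = 3.0
print(X.var())           # r(1-p)/p^2 = 6.0
```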

Binomial distributions, geometric distributions and negative binomial distributions are all about a series of independent identical Bernoulli trials.

4.5 Hypergeometric distribution

The hypergeometric distribution is NOT about a series of independent identical Bernoulli trials.

Suppose that we have a box that contains \(N\) balls in total, among which \(M\) are black and the rest are white. If we randomly select \(n\) balls without replacement, what is the probability of getting exactly \(x\) black balls?

The number of black balls drawn, \(X\), has a hypergeometric distribution, \(X \sim hg(M, N, n)\), with pmf \[ P(X=x\mid M, N, n)=\frac{{M \choose x}{N-M \choose n-x}}{{N \choose n}} \] for \(x\) an integer satisfying \(\max\big(0,\, n-(N-M)\big) \le x \le \min(n, M)\).

Imagine that we draw \(n=10\) balls from the box but there are only \(N-M=3\) white balls inside; then we must get at least \(n-(N-M)=7\) black balls, which explains the lower bound on \(x\).

If \(X \sim hg(M, N, n)\), then \[ E(X)=n\cdot \frac{M}{N} \\ Var(X)=\left(\frac{N-n}{N-1}\right)\cdot n \cdot \frac{M}{N}\cdot \left(1- \frac{M}{N}\right) \] The ratio \(M/N\) is the proportion of black balls in the box. If we replace \(M/N\) with \(p\) in \(E(X)\) and \(Var(X)\), they become \[ E(X)=np \\ Var(X)=\frac{N-n}{N-1}\cdot np(1-p) \] When \(n\) is small relative to \(N\), the factor \((N-n)/(N-1)\) is close to \(1\), so the hypergeometric distribution can be approximated by the binomial distribution \(Bin(n, p)\).
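
A sketch with arbitrary numbers (\(N=20\) balls in total, \(M=7\) black, \(n=5\) draws, all assumptions for illustration); note that scipy.stats.hypergeom takes its arguments in a different order than the text's notation:

```python
from scipy import stats

N, M, n = 20, 7, 5       # illustrative: 20 balls total, 7 black, draw 5

# scipy's argument order is (population size, # of black balls, # of draws),
# which is stats.hypergeom(N, M, n) in the text's notation.
X = stats.hypergeom(N, M, n)

print(X.pmf(2))          # C(7,2) C(13,3) / C(20,5) ≈ 0.3874
print(X.mean())          # n * M/N = 1.75
print(X.var())           # (N-n)/(N-1) * n * (M/N)(1 - M/N) ≈ 0.8980
```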

4.6 Poisson distribution

The Poisson distribution is used to model the number of occurrences in a given time interval, under the assumption that occurrences happen independently at a constant average rate, so that the expected number of occurrences is proportional to the length of the interval.

For example, the number of buses arriving at a station, or the number of customers arriving at a bank.

The Poisson distribution has a single parameter \(\lambda\), called the rate (or intensity) parameter. The rate measures the average number of occurrences per unit of time. For example, if we model the number of customers arriving at a bank in \(60\) minutes, \(\lambda\) is the average number of customers who arrive during a \(60\)-minute period (equivalently, \(\lambda/60\) customers per minute).

The Poisson distribution has pmf \[ P(X=x \mid \lambda)=\frac{e^{-\lambda}\lambda^x}{x!}, \;\; x=0, 1, 2, \ldots \] and \[ E(X)=Var(X)=\lambda . \]
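
A quick sketch (scipy, with \(\lambda=3\) chosen arbitrarily for illustration):

```python
from scipy import stats

lam = 3.0                  # illustrative rate (an assumption)
X = stats.poisson(lam)

print(X.pmf(0))            # e^{-3} ≈ 0.0498
print(X.pmf(2))            # e^{-3} 3^2 / 2! ≈ 0.2240
print(X.mean(), X.var())   # both equal lambda -> 3.0 3.0
```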

Example (Poisson approximation)

A typesetter, on average, makes one error in every \(500\) words typeset. A typical page contains \(300\) words. What is the probability that there will be no more than two errors in five pages (\(1500\) words)?

Here the number of errors \(X\) follows a \(Bin(n=1500, p=1/500)\) distribution. \[ \begin{split} P(\text{no more than two errors}) & = P(X\le 2) \\ & = \sum_{x=0}^2 {1500\choose x}\bigg(\frac{1}{500}\bigg)^x \bigg(\frac{499}{500}\bigg)^{1500-x} \\ &=.4230 \end{split} \] When \(n\) is large and \(p\) is small, the binomial distribution can be approximated by a \(Poisson(\lambda)\) distribution with \(\lambda=np\).

In this example, \(\lambda=1500\times \frac{1}{500}=3\), thus \[ \begin{split} P(X \le 2) & \approx \frac{e^{-\lambda}\lambda^0}{0!} +\frac{e^{-\lambda}\lambda^1}{1!}+\frac{e^{-\lambda}\lambda^2}{2!} \\ &= e^{-3}\bigg(1+3+\frac{3^2}{2}\bigg) \\ &=.4232 \end{split} \]
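
Both numbers can be reproduced in a few lines (a sketch assuming scipy):

```python
from scipy import stats

n, p = 1500, 1 / 500
lam = n * p                            # = 3

exact = stats.binom.cdf(2, n, p)       # P(X <= 2) under Bin(1500, 1/500)
approx = stats.poisson.cdf(2, lam)     # Poisson(3) approximation

print(round(exact, 4))                 # 0.4230
print(round(approx, 4))                # 0.4232
```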