Chapter 3 Random Variable
3.1 Random variable and events
Recall: an event is a set of outcomes from a random experiment.
For example, if we toss a fair coin twice, the sample space is \(\{HH, TT, HT, TH\}\), and we can define three events as \[ \text{Success}=\{HH\} \\ \text{Failure}=\{TT\} \\ \text{Even}=\{HT, TH\}. \] For simplicity, I encode these three events as three numbers:
| Numbers | Events | Outcomes |
|---|---|---|
| \(1\) | Success | \(\{HH\}\) |
| \(0\) | Failure | \(\{TT\}\) |
| \(0.5\) | Even | \(\{HT, TH\}\) |
This can be interpreted as:
- if we see \(\{HH\}\), we get number \(1\);
- if we see \(\{TT\}\), we get number \(0\);
- if we see \(\{HT\}\) or \(\{TH\}\), we get number \(0.5\).
Since the outcome of the experiment is random, the number we eventually get is also random.
If we use a capital letter \(X\) to denote the number that we will get after carrying out the experiment, the value it takes is random, depending on which outcome we observe. We call this uncertain quantity a random variable, abbreviated as r.v.
A random variable is a mapping from the sample space \(\mathcal S\) to the real numbers \(\mathcal R\), which encodes the random outcomes as real numbers.
By doing this, we create a mathematical model for events from a random experiment.
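As a minimal sketch in Python, the encoding above can be written as an ordinary function from outcomes to numbers. The names `sample_space` and `X` are illustrative choices, not notation from the text.

```python
# Sample space for two tosses of a coin.
sample_space = ["HH", "TT", "HT", "TH"]

def X(outcome):
    """Map an outcome to its numeric code: 1 (Success), 0 (Failure), 0.5 (Even)."""
    if outcome == "HH":
        return 1
    if outcome == "TT":
        return 0
    return 0.5  # "HT" or "TH"

# The value of X is only known once an outcome has been observed.
for outcome in sample_space:
    print(outcome, "->", X(outcome))
```

Nothing about `X` itself is random; the randomness comes entirely from which outcome the experiment produces.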
Discrete r.v. versus continuous r.v.:
- If the possible values of a random variable can be listed one by one (a finite or countably infinite set of isolated values), it is a discrete random variable;
- If a random variable can take every value in an interval, it is a continuous random variable.
The random variable in the previous example is discrete.
Since values of a random variable represent events from a sample space, we can assign probabilities to these values, for example
| \(X\) | Outcomes | Probability |
|---|---|---|
| \(1\) | \(\{HH\}\) | \(1/4\) |
| \(0\) | \(\{TT\}\) | \(1/4\) |
| \(0.5\) | \(\{HT, TH\}\) | \(1/2\) |
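The probabilities in this table can be reproduced by adding up the probabilities of the outcomes that map to each value. The sketch below assumes the four outcomes are equally likely (probability \(1/4\) each); the names `outcome_prob`, `code`, and `pmf` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Assumption: each of the four outcomes has probability 1/4.
outcome_prob = {o: Fraction(1, 4) for o in ["HH", "TT", "HT", "TH"]}

# Numeric code assigned to each outcome (same encoding as the table).
code = {"HH": Fraction(1), "TT": Fraction(0),
        "HT": Fraction(1, 2), "TH": Fraction(1, 2)}

# P(X = value) is the total probability of all outcomes mapped to that value.
pmf = defaultdict(Fraction)
for outcome, prob in outcome_prob.items():
    pmf[code[outcome]] += prob

for value, prob in sorted(pmf.items()):
    print(f"P(X = {value}) = {prob}")
```

The value \(0.5\) receives probability \(1/2\) because two outcomes, \(HT\) and \(TH\), are both encoded as \(0.5\).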
3.2 Probability mass function (pmf)
For a discrete random variable \(X\), we define the probability mass function (pmf) \(p(a)\) of \(X\) by \[ p(a)=Prob\{X=a\}, \] which simply describes the probability of the random variable \(X\) taking the value \(a\).
To avoid confusion, I used the letter \(a\) to denote a specific number that the random variable \(X\) can take. However, in most of the literature, including our textbook, a lower case letter, such as \(x\), is used to denote a specific number, while the upper case of the same letter, such as \(X\), is used to denote the random variable itself.
Thus, the pmf is often written as \[ p(x)=Prob\{X=x\}. \]
Example 3.13
A store carries flash drives with either 1 GB, 2 GB, 4 GB, 8 GB, or 16 GB of memory.
The accompanying table gives the distribution of \(Y\) = the amount of memory in a purchased drive.
| \(y\) | \(1\) | \(2\) | \(4\) | \(8\) | \(16\) |
|---|---|---|---|---|---|
| \(p(y)\) | \(.05\) | \(.10\) | \(.35\) | \(.40\) | \(.10\) |
- the upper case \(Y\) is the notation of the random variable, which is a “concept” instead of a specific number;
- the lower case \(y\) is a specific number, such as \(2\).
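A pmf given as a table translates naturally into a lookup structure. The following sketch stores the table of Example 3.13 in a Python dictionary; the names `pmf_Y` and `p` are illustrative.

```python
import math

# pmf of Y from the table in Example 3.13: memory size (GB) -> probability.
pmf_Y = {1: 0.05, 2: 0.10, 4: 0.35, 8: 0.40, 16: 0.10}

def p(y):
    """Return P(Y = y); values not in the table have probability 0."""
    return pmf_Y.get(y, 0.0)

# A valid pmf has non-negative entries that sum to 1.
assert all(prob >= 0 for prob in pmf_Y.values())
assert math.isclose(sum(pmf_Y.values()), 1.0)

print(p(2))  # probability that a purchased drive has 2 GB
print(p(3))  # 3 GB is not a possible value of Y
```

Here the dictionary plays the role of the random variable's distribution, while an argument such as `2` plays the role of the specific number \(y\).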
3.3 Cumulative distribution function (cdf)
The cumulative distribution function (cdf) \(F(a)\) of a discrete random variable \(X\) is the probability that \(X\) is at most \(a\). It can be expressed in terms of the pmf \(p(x)\) by \[ F(a)=P(X \le a)=\sum_{\text{all } x\le a} p(x). \]
Back to Example 3.13: let's first determine \(F(a)\) for each of the five possible values of \(Y\): \[ \begin{split} F(1)&=P(Y \le 1)=P(Y =1)=p(1)=.05 \\ F(2)&=P(Y\le 2)=P(Y=1 \;or\; 2)=p(1)+p(2)=.05+.10=.15 \\ F(4) &= P(Y \le 4) = P(Y = 1 \;or\; 2 \;or\; 4) = p(1) + p(2) + p(4) = .50 \\ F(8) &= P(Y \le 8) =P(Y = 1 \;or\; 2 \;or\; 4 \;or\;8) = p(1) + p(2) + p(4) + p(8) = .90 \\ F(16) &= P(Y \le 16) = 1 \end{split} \] Now for any other number \(a\), \(F(a)\) equals the value of \(F\) at the closest possible value of \(Y\) to the left of \(a\).
For example, if we want to compute \(F(2.7)\), the closest possible value of \(Y\) that is to the left of \(2.7\) is 2, thus, \[ \begin{split} F(2.7) &= P(Y \le 2.7) \\ &= P(Y \le 2) \\ &= F(2) \\ &= .15\\ \end{split} \]
If we want to calculate \(F(7.999)\), the closest possible value of \(Y\) to the left of \(7.999\) is \(4\), so we have \[ \begin{split} F(7.999) &= P(Y \le 7.999) \\ &= P(Y \le 4) \\ &= F(4) \\ &= .50 \end{split} \] If \(a\) is less than 1, \(F(a)=0\) [e.g. \(F(.58) = 0\)], and if \(a\) is at least 16, \(F(a)=1\) [e.g. \(F(25) = 1\)].
The cdf is thus \[ F(a) = \left\{ \begin{array}{rcl} 0 & a < 1 \\ .05 & 1\le a <2\\ .15 & 2\le a <4\\ .50 & 4 \le a <8\\ .90 & 8\le a <16\\ 1 & 16 \le a \end{array}\right. \]
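The piecewise formula can be checked numerically. The sketch below evaluates the cdf of Example 3.13 directly from its definition, summing \(p(y)\) over all possible values \(y \le a\); the names `pmf_Y` and `F` are illustrative.

```python
import math

# pmf of Y from Example 3.13.
pmf_Y = {1: 0.05, 2: 0.10, 4: 0.35, 8: 0.40, 16: 0.10}

def F(a):
    """cdf of Y: add up p(y) over all possible values y <= a."""
    return sum(prob for y, prob in pmf_Y.items() if y <= a)

# Evaluate F at possible values and at points in between.
for a in [0.58, 1, 2.7, 4, 7.999, 16, 25]:
    print(f"F({a}) = {F(a):.2f}")
```

Because no probability is added between two consecutive possible values, `F` stays constant there, which is why \(F(2.7)=F(2)\) and \(F(7.999)=F(4)\).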
The plot of cdf is
Question: can you find the point \(\big(4, F(4)\big)\) from the function plot?
3.4 Steps to obtain cdf for discrete distributions
Given a discrete distribution with pmf,
| \(X\) | \(x_1\) | \(x_2\) | \(\cdots\) | \(x_i\) | \(\cdots\) | \(x_n\) |
|---|---|---|---|---|---|---|
| \(p(x)\) | \(p(x_1)\) | \(p(x_2)\) | \(\cdots\) | \(p(x_i)\) | \(\cdots\) | \(p(x_n)\) |
- The random variable \(X\) can take \(n\) possible values, \(x_1, \ldots, x_n\);
- The probability of \(X=x_i\) is \(p(x_i)\).
Assume \(x_1, \ldots, x_n\) are already sorted in ascending order.
We can obtain the cdf \(F(x)\) of \(X\) with the following steps:
Step 1: Partition the domain of \(F(x)\) into several intervals using \(x_1, \ldots, x_n\) as cutoff points;
| Condition | Interval |
|---|---|
| \(x<x_1\) | \((-\infty, x_1)\) |
| \(x_1 \le x < x_2\) | \([x_1, x_2)\) |
| \(x_2 \le x < x_3\) | \([x_2, x_3)\) |
| \(\cdots\) | \(\cdots\) |
| \(x_{n-1} \le x < x_n\) | \([x_{n-1}, x_n)\) |
| \(x \ge x_n\) | \([x_n, \infty)\) |
Step 2: Compute \(F(x_1), F(x_2), \ldots, F(x_n)\); \[ \begin{split} F(x_1)&=p(x_1) \\ F(x_2)&=p(x_1)+p(x_2) \\ \cdots \\ F(x_i)&=p(x_1)+p(x_2)+\cdots+p(x_i) \\ \cdots \\ F(x_n)&=1 \end{split} \]
Step 3: Find the value of \(F(x)\) on each interval obtained in the first step;
| Intervals | \(F(x)\) |
|---|---|
| \(x<x_1\) | \(0\) |
| \(x_1 \le x < x_2\) | \(F(x_1)\) |
| \(x_2 \le x < x_3\) | \(F(x_2)\) |
| \(\cdots\) | \(\cdots\) |
| \(x_i \le x < x_{i+1}\) | \(F(x_i)\) |
| \(\cdots\) | \(\cdots\) |
| \(x_{n-1} \le x < x_n\) | \(F(x_{n-1})\) |
| \(x \ge x_n\) | \(1\) |
Thus the cdf for the discrete random variable \(X\) can be written as \[ F(x) = \left\{ \begin{array}{ll} 0 & x < x_1 \\ F(x_1) & x_1\le x < x_2\\ F(x_2) & x_2\le x <x_3\\ \cdots & \cdots \\ F(x_{n-1}) & x_{n-1}\le x <x_n\\ 1 & x\ge x_n \end{array}\right. \]
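The three steps above can be sketched in Python as follows. The helper `make_cdf` and the three-point distribution used to exercise it are hypothetical, chosen only for illustration; exact fractions are used to avoid rounding noise.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

def make_cdf(values, probs):
    """Build the cdf F from sorted possible values and their probabilities."""
    # Step 2: running totals F(x_1), ..., F(x_n).
    cumulative = list(accumulate(probs))

    def F(x):
        # Steps 1 and 3: count how many cutoff points x_i satisfy x_i <= x.
        # That count identifies the interval [x_i, x_{i+1}) containing x.
        i = bisect_right(values, x)
        return 0 if i == 0 else cumulative[i - 1]

    return F

# Hypothetical three-point distribution, used only for illustration.
values = [0, 1, 3]
probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
F = make_cdf(values, probs)
print(F(-1), F(0), F(2), F(3), F(10))
```

Using `bisect_right` rather than `bisect_left` matters: it makes each interval closed on the left, so that \(F(x_i)\) already includes the jump \(p(x_i)\).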
3.5 Steps to obtain pmf from cdf
Assume we have a discrete random variable \(X\) and its cdf is \[ F(x) = \left\{ \begin{array}{ll} 0 & x < x_1 \\ F(x_1) & x_1\le x < x_2\\ \cdots & \cdots\\ F(x_{k-1}) & x_{k-1}\le x <x_k\\ 1 & x\ge x_k \end{array}\right. \]
Find its pmf.
Step 1: Extract the possible values that \(X\) can take from the left endpoints of the intervals in the domain of \(F(x)\). For instance, when \(X\) has three possible values (highlighted in red): \[ F(x) = \left\{ \begin{array}{ll} 0 & x < x_1\\ F(x_1) & \color{red}x_1 \color{black}\le x < x_2\\ F(x_2) & \color{red}x_2 \color{black} \le x <x_3\\ 1 & x\ge \color{red} x_3 \end{array}\right. \]
Step 2: Express the pmf in tabular form:
| \(x\) | \(x_1\) | \(x_2\) | \(x_3\) |
|---|---|---|---|
| \(p(x)\) | \(F(x_1)-0\) | \(F(x_2)-F(x_1)\) | \(1-F(x_2)\) |
In general, the pmf evaluated at the possible value \(x_i\) is \[ p(x_i)=F(x_i)-F(x_i^-), \] where \(x_i^-\) is the possible value immediately to the left of \(x_i\). For the smallest possible value \(x_1\), take \(F(x_1^-)=0\).
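As a check on this rule, the sketch below recovers the pmf of Example 3.13 from the values of its cdf at the jump points. The helper name `pmf_from_cdf` is hypothetical, and the cdf values are written as exact fractions to avoid rounding noise.

```python
from fractions import Fraction

def pmf_from_cdf(values, cdf_at_values):
    """Recover p(x_i) = F(x_i) - F(x_{i-1}), taking F to be 0 left of x_1."""
    pmf = {}
    previous = 0  # value of F to the left of the smallest possible value
    for x, F_x in zip(values, cdf_at_values):
        pmf[x] = F_x - previous
        previous = F_x
    return pmf

# Jump points of the cdf in Example 3.13 and the cdf values at those points.
values = [1, 2, 4, 8, 16]
cdf_at_values = [Fraction(5, 100), Fraction(15, 100), Fraction(50, 100),
                 Fraction(90, 100), Fraction(1)]
print(pmf_from_cdf(values, cdf_at_values))
```

The differences are \(.05, .10, .35, .40, .10\), which matches the pmf table of Example 3.13.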
3.6 Expected value and Variance
Suppose a person gambles at a casino. In each round of the game, the probability of winning is \(p=0.3\), and the probability of losing is \(1-p=0.7\). If \(X\) denotes the money he earns from playing the game once, then \(X=\$50\) if he wins and \(X=-\$50\) if he loses. We can write the pmf of \(X\) in tabular form as
| \(X\) | \(50\) | \(-50\) |
|---|---|---|
| \(p(x)\) | \(0.3\) | \(0.7\) |
The question: if he plays \(100\) times, how much money can he expect to earn?
To calculate the money earned, we first look at the number of rounds that he expects to win. For each round, the probability of winning is \(0.3\), so he can expect to win \(30\) times out of \(100\) rounds. The \(30\) rounds he wins will earn him \(30\times \$50=\$1500\), and the remaining \(70\) rounds he loses will cost him \(70\times \$50=\$3500\). Thus, the total money he expects to earn is \(\$1500-\$3500=-\$2000\). In other words, he should expect to lose about \(\$2000\) if he plays the same game \(100\) times.
For each round, how much money can he expect to earn? This can be obtained by dividing the previous value by the number of rounds, which gives us \(-\$2000/100=-\$20\). If we look closer at this value and the pmf table, we can see \[ -\$20=\$50\times 0.3+(-\$50)\times 0.7=\$50 \times p+(-\$50)\times (1-p). \]
This is not a coincidence. The value \(-\$20\) is called the expected value of the random variable \(X\), which is obtained by summing all possible values multiplied by their corresponding probabilities.
The expected value is also called the “mean”, which is the more common name in the literature.
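The casino calculation can be sketched in Python, together with a simulation that illustrates the "long-run average" interpretation. The number of simulated plays and the fixed seed are arbitrary choices made only for reproducibility, and the names `pmf_X`, `expected`, and `winnings` are illustrative.

```python
import random

# pmf of X from the casino example: winnings -> probability.
pmf_X = {50: 0.3, -50: 0.7}

# Expected value: sum of each possible value times its probability.
expected = sum(x * p for x, p in pmf_X.items())
print(expected)

# Simulation check: the average winnings over many plays should be
# close to the expected value (the seed is fixed for reproducibility).
random.seed(0)
n_plays = 100_000
winnings = random.choices(list(pmf_X), weights=list(pmf_X.values()), k=n_plays)
print(sum(winnings) / n_plays)
```

The simulated average will not equal the expected value exactly, but it gets closer as the number of plays grows.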
Definition of Expected Value:
Let \(X\) be a discrete rv with set of possible values \(D\) and pmf \(p(x)\). The expected value or mean value of \(X\), denoted by \(E(X)\) or \(\mu_X\) or just \(\mu\), is
\[
E(X)=\mu_X=\sum_{x\in D} x\cdot p(x)
\]
If the rv \(X\) has a set of possible values \(D\) and pmf \(p(x)\), then the expected value of any function \(h(X)\), denoted by \(E[h(X)]\) or \(\mu_{h(X)}\), is computed by
\[
E[h(X)]=\sum_{x\in D} h(x)\cdot p(x)
\]
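As an illustration of this formula, the sketch below computes \(E(Y)\) and \(E(Y^2)\) for the flash drive distribution of Example 3.13. The helper name `expectation` is hypothetical, and \(h(y)=y^2\) is just one possible choice of \(h\).

```python
import math

def expectation(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) * p(x) over all possible values x."""
    return sum(h(x) * p for x, p in pmf.items())

# pmf of Y from Example 3.13.
pmf_Y = {1: 0.05, 2: 0.10, 4: 0.35, 8: 0.40, 16: 0.10}

mean_Y = expectation(pmf_Y)                          # E(Y)
mean_Y_squared = expectation(pmf_Y, lambda y: y**2)  # E(Y^2)
print(mean_Y, mean_Y_squared)
```

Working by hand gives \(E(Y)=6.45\) and \(E(Y^2)=57.25\). Note that \(E(Y^2)\) is not \([E(Y)]^2=41.6025\): in general \(E[h(X)]\) differs from \(h(E(X))\).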
Rules of Expected Value: for any constants \(a\) and \(b\),
\[
E(aX + b) = a E(X) + b
\]
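One way to see why this rule holds, sketched here for a discrete \(X\) with pmf \(p(x)\) on \(D\) and constants \(a\) and \(b\), is to apply the formula for \(E[h(X)]\) with \(h(x)=ax+b\) and use \(\sum_{x\in D}p(x)=1\): \[ \begin{split} E(aX+b)&=\sum_{x\in D}(ax+b)\cdot p(x) \\ &=a\sum_{x\in D}x\cdot p(x)+b\sum_{x\in D}p(x) \\ &=aE(X)+b. \end{split} \]
As a hypothetical illustration with the casino pmf, where \(E(X)=-\$20\): if every payoff were doubled and a \(\$10\) bonus were paid each round, the expected earning per round would be \(E(2X+10)=2\times(-\$20)+\$10=-\$30\).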
Let \(X\) have pmf \(p(x)\) and expected value \(\mu\). Then the variance of \(X\), denoted by \(V(X)\) or \(\sigma^2_X\), or just \(\sigma^2\), is \[ V(X)=\sum_D (x-\mu)^2\cdot p(x)=E[(X-\mu)^2] \]
The standard deviation (SD) of \(X\) is \(\sigma_X=\sqrt{\sigma^2_X}\).
A Shortcut Formula for Variance: \[ V(X)=\sigma^2=\bigg[\sum_D x^2\cdot p(x)\bigg]-\mu^2=E(X^2)-[E(X)]^2 \]
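The definition and the shortcut formula should give the same number. The sketch below checks this on the casino pmf from the expected value example; the variable names are illustrative.

```python
import math

# pmf of X from the casino example: winnings -> probability.
pmf_X = {50: 0.3, -50: 0.7}

# Mean: E(X) = sum of x * p(x).
mu = sum(x * p for x, p in pmf_X.items())

# Definition: V(X) = sum of (x - mu)^2 * p(x).
var_definition = sum((x - mu) ** 2 * p for x, p in pmf_X.items())

# Shortcut: V(X) = E(X^2) - [E(X)]^2.
var_shortcut = sum(x**2 * p for x, p in pmf_X.items()) - mu**2

# Standard deviation: square root of the variance.
sd = math.sqrt(var_definition)
print(mu, var_definition, var_shortcut, sd)
```

By hand, \(\mu=-20\), so \(V(X)=70^2\times 0.3+(-30)^2\times 0.7=2100\), and the shortcut gives \(2500-(-20)^2=2100\) as well.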
Rules of Variance: \[ V(aX+b)=\sigma^2_{aX+b}=a^2\cdot \sigma_X^2 \]
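The variance rule can be checked numerically by building the pmf of \(aX+b\) explicitly. In the sketch below, the constants \(a=2\) and \(b=10\) are arbitrary, and the helper names `mean` and `variance` are hypothetical.

```python
import math

def mean(pmf):
    """E(X) = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = sum of (x - mu)^2 * p(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# pmf of X from the casino example.
pmf_X = {50: 0.3, -50: 0.7}
a, b = 2, 10  # arbitrary constants chosen for illustration

# pmf of aX + b: the values change, the probabilities do not.
pmf_aXb = {a * x + b: p for x, p in pmf_X.items()}

print(variance(pmf_aXb), a**2 * variance(pmf_X))
```

The shift \(b\) moves every value and the mean by the same amount, so it cancels in \(x-\mu\); only the scale factor \(a\) survives, squared.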