Probability Distributions

Probability Distributions — Study Notes

This chapter is about turning the random outcomes of an experiment into numbers and describing how those numbers are spread out. The two core skills are (a) moving between a probability function and its cumulative form, and (b) computing the mean and variance that summarise a distribution.

1. Random variables: discrete vs. continuous

A random variable \(X\) is a rule that attaches a real number to every outcome of a random experiment. We write capital letters \(X, Y, Z\) for the variable and small letters \(x, y, z\) for the values it can take.

A discrete random variable takes isolated, countable values (you count it) — for example the number of heads in five tosses, or the number of defective bulbs in a box.
A continuous random variable can take any value in an interval (you measure it) — for example the lifetime of a bulb or the time taken to finish a phone call. For a continuous variable, \(P(X=a)=0\) for every single value \(a\); only intervals carry positive probability.

2. Discrete variables: pmf and cdf

The probability mass function (pmf) lists the probability of each value: \(f(x_k)=P(X=x_k)\). A valid pmf must satisfy

\[ f(x_k)\ge 0 \quad\text{for every } k, \qquad \sum_k f(x_k)=1. \]

The cumulative distribution function (cdf) accumulates probability up to a point:

\[ F(x)=P(X\le x)=\sum_{x_k\le x} f(x_k). \]

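For a discrete variable the cdf is a step function: flat between successive values, with a jump at each \(x_k\). The size of the jump at \(x_k\) is exactly the probability there, so with the values listed in increasing order \(x_1<x_2<\dots\) the pmf can be recovered from the cdf by

\[ f(x_i)=F(x_i)-F(x_{i-1}), \qquad\text{with } F(x_0)=0 \text{ taken below the smallest value } x_1. \]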
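The round trip between pmf and cdf can be sketched in a few lines of Python. The distribution used here (number of heads in two fair tosses) and the names `values`, `pmf`, `cdf`, `recovered` and `F` are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical example: number of heads in two fair coin tosses.
values = [0, 1, 2]
pmf = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Validity checks: non-negative probabilities that sum to 1.
assert all(p >= 0 for p in pmf) and sum(pmf) == 1

# cdf at each support point: running total of the pmf.
cdf = list(accumulate(pmf))

# Recover the pmf from the jumps of the cdf,
# taking F = 0 below the smallest value.
recovered = [hi - lo for lo, hi in zip([Fraction(0)] + cdf[:-1], cdf)]


def F(x):
    """Step-function cdf: total probability of all values <= x."""
    return sum((p for v, p in zip(values, pmf) if v <= x), Fraction(0))
```

Exact fractions are used so that the jumps match the pmf with no rounding error. Between support points `F` stays flat, for instance `F(1.5)` equals `F(1)`.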

3. Continuous variables: pdf and cdf

For a continuous variable the pmf is replaced by a probability density function (pdf) \(f(x)\), which must satisfy

\[ f(x)\ge 0 \quad\text{for all } x, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1. \]

Probability is now the area under the curve: \(P(a\le X\le b)=\displaystyle\int_a^b f(x)\,dx\). The cdf is the running integral of the density,

\[ F(x)=\int_{-\infty}^{x} f(u)\,du, \]

and, conversely, the density is the derivative of the cdf wherever that derivative exists:

\[ f(x)=\dfrac{dF(x)}{dx}=F'(x). \]

Because single points have zero probability, the symbols \(\lt\) and \(\le\) may be interchanged freely for a continuous variable.
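As a numerical check of the pdf and cdf relations, the sketch below uses an assumed density \(f(x)=2x\) on \([0,1]\) (so \(F(x)=x^2\) there) and a simple midpoint rule. The density, the helper `integrate` and the step count are illustrative choices only.

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def F(x):
    """cdf as the running integral of the density (f vanishes below 0)."""
    if x <= 0.0:
        return 0.0
    return integrate(f, 0.0, min(x, 1.0))


total = integrate(f, 0.0, 1.0)   # total area, should be close to 1
prob = F(0.5) - F(0.2)           # P(0.2 <= X <= 0.5) = 0.25 - 0.04

# Central difference of the cdf approximates the density at x = 0.5.
h = 1e-3
slope = (F(0.5 + h) - F(0.5 - h)) / (2 * h)
```

Here `prob` should come out close to 0.21 and `slope` close to `f(0.5) = 1`, which matches \(F'(x)=f(x)\).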

4. Mean and variance

The mean (expected value) locates the centre of a distribution:

\[ \mu=E(X)=\sum_k x_k\,f(x_k)\ \ \text{(discrete)}, \qquad \mu=E(X)=\int_{-\infty}^{\infty} x\,f(x)\,dx\ \ \text{(continuous)}. \]

More generally \(E\big(g(X)\big)=\sum_k g(x_k)f(x_k)\) or \(\displaystyle\int g(x)f(x)\,dx\). The variance measures the spread about the mean:

\[ \operatorname{Var}(X)=E\big[(X-\mu)^2\big]=E(X^2)-\big(E(X)\big)^2, \qquad \sigma=\sqrt{\operatorname{Var}(X)}. \]

The shortcut \(E(X^2)-\mu^2\) is almost always faster than working from the definition. Two linear rules are used constantly:

\[ E(aX+b)=aE(X)+b, \qquad \operatorname{Var}(aX+b)=a^2\operatorname{Var}(X). \]

Adding a constant shifts the mean but leaves the variance unchanged, while a multiplier \(a\) scales the variance by \(a^2\) (not by \(a\)).
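The formulas above can be checked on a small case. The sketch below assumes a fair six-sided die and the transformation \(Y=2X+3\); the helper `expect` and the constants `a`, `b` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(g, pmf):
    """E[g(X)]: sum of g(x) * f(x) over the support."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expect(lambda x: x, pmf)                        # E(X)
var_def = expect(lambda x: (x - mean) ** 2, pmf)       # E[(X - mu)^2]
var_short = expect(lambda x: x * x, pmf) - mean ** 2   # E(X^2) - mu^2

# Linear rules for Y = aX + b, computed directly from the definition.
a, b = 2, 3
mean_y = expect(lambda x: a * x + b, pmf)
var_y = expect(lambda x: (a * x + b - mean_y) ** 2, pmf)
```

Both variance routes give \(35/12\), and the direct computation for \(Y\) gives mean \(10=2\cdot\tfrac72+3\) and variance \(35/3=2^2\cdot\tfrac{35}{12}\), so the constant \(b\) drops out of the variance as stated.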

5. Special discrete distributions

One-point distribution: all the probability sits at a single value \(x_0\), so \(P(X=x_0)=1\); then \(\mu=x_0\) and \(\sigma^2=0\).
Two-point distribution: \(X\) takes \(x_1\) with probability \(p\) and \(x_2\) with probability \(q=1-p\); then \(\mu=px_1+qx_2\) and \(\sigma^2=pq\,(x_1-x_2)^2\).
Bernoulli distribution \(\text{Ber}(p)\): a single trial giving success (1) or failure (0), with \(f(x)=p^x(1-p)^{1-x}\) for \(x\in\{0,1\}\); here \(\mu=p\) and \(\sigma^2=pq\).
Binomial distribution \(B(n,p)\): the number of successes in \(n\) independent Bernoulli trials, with

\[ P(X=x)=\binom{n}{x}p^x q^{\,n-x}, \quad x=0,1,2,\dots,n, \qquad \mu=np,\quad \sigma^2=npq. \]

The three conditions for a binomial model are: a fixed number \(n\) of independent trials, exactly two outcomes per trial, and a constant success probability \(p\).
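A short sketch of the binomial formula, using the five-toss example from Section 1 with an assumed fair coin (\(n=5\), \(p=0.5\)). The function name `binomial_pmf` is a hypothetical helper; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * q^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Assumed parameters: five tosses of a fair coin.
n, p = 5, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                         # should be 1
mean = sum(x * fx for x, fx in enumerate(pmf))           # should be n*p
var = sum(x * x * fx for x, fx in enumerate(pmf)) - mean ** 2  # n*p*q

# "At least one head" via the complement.
at_least_one = 1 - pmf[0]
```

Summing over the pmf reproduces \(\mu=np=2.5\) and \(\sigma^2=npq=1.25\), and the complement gives \(P(X\ge 1)=1-(0.5)^5=31/32\).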

6. Common exam traps

Normalise first. If a pmf or pdf contains an unknown constant, find it from \(\sum f=1\) or \(\int f=1\) before computing any probability, mean or variance.
Square the multiplier. In \(\operatorname{Var}(aX+b)\) the factor is \(a^2\); the additive constant \(b\) contributes nothing.
Mean and variance are separate features. A large mean does not imply a large variance: two distributions can share the same mean yet have very different variances, and can share the same variance yet have very different means.
Heavy tails. A density can have a finite mean yet an infinite variance (the second-moment integral may diverge even when the first converges). Always test convergence before declaring that a moment “exists”.
Size-biased vs. uniform selection. Picking an individual at random favours larger groups (a size-biased average), whereas picking a group at random treats every group equally (an ordinary average); the two give different expected values.
Use the complement. For “at least one” it is usually quicker to compute \(P(X\ge 1)=1-P(X=0)\).
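The size-biased trap is easiest to see with numbers. The sketch below assumes three groups of sizes 10, 20 and 30 (made-up values) and compares the expected group size under the two selection schemes.

```python
from fractions import Fraction

# Hypothetical group sizes, e.g. three classes in a school.
sizes = [10, 20, 30]
total = sum(sizes)

# Pick a GROUP uniformly at random: each group has probability 1/3.
group_avg = Fraction(total, len(sizes))

# Pick an INDIVIDUAL uniformly at random and record the size of their
# group: a group of size s is hit with probability s / total.
individual_avg = sum(Fraction(s, total) * s for s in sizes)
```

With these sizes the ordinary average is 20, while the size-biased average is \(70/3\approx 23.3\): larger groups are over-represented when individuals are sampled.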
