Probability Theory

Probability theory provides the mathematical framework for analyzing random phenomena and uncertainty. It forms the foundation of statistics, machine learning, and many areas of science.

1. Probability Spaces

Definition 1.1 (Probability Space).
A probability space is a triple $(\Omega, \mathcal{F}, P)$ where:
  • $\Omega$ is the sample space (the set of all possible outcomes),
  • $\mathcal{F}$ is a $\sigma$-algebra of events,
  • $P: \mathcal{F} \to [0,1]$ is the probability measure.

The probability measure $P$ must satisfy:

  1. $P(\Omega) = 1$ (normalization)
  2. For countably many pairwise disjoint events $A_1, A_2, \ldots$ (countable additivity):
    $$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
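Both axioms can be checked concretely on a finite sample space. The sketch below (plain Python; the die model and all names are illustrative, not from the text) verifies them for a fair six-sided die, where finite additivity is the relevant special case of countable additivity:

```python
from fractions import Fraction

# Finite sample space for a fair six-sided die. The event space is the
# full power set of omega, and P gives each outcome probability 1/6.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Uniform probability measure on subsets of omega."""
    assert event <= omega, "events must be subsets of the sample space"
    return Fraction(len(event), len(omega))

# Axiom 1 (normalization): P(omega) = 1.
assert P(omega) == 1

# Axiom 2 (additivity for disjoint events): evens and odds are disjoint,
# so the probability of their union is the sum of their probabilities.
evens, odds = {2, 4, 6}, {1, 3, 5}
assert evens & odds == set()
assert P(evens | odds) == P(evens) + P(odds)
```

Using exact `Fraction` arithmetic avoids floating-point noise when checking the equalities.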

2. Random Variables

Definition 2.1 (Random Variable).
A random variable $X$ is a measurable function $X: \Omega \to \mathbb{R}$. Its expected value (or mean) is
$$\mathbb{E}[X] = \int_{\Omega} X \, dP = \int_{-\infty}^{\infty} x \, f(x) \, dx,$$
where $f(x)$ is the probability density function (for continuous $X$).
Definition 2.2 (Variance).
The variance of a random variable $X$ measures the spread of its distribution:
$$\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2.$$
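For a discrete random variable both formulas reduce to finite sums over the probability mass function. A minimal sketch (again using a fair die, an assumption for illustration) computes $\mathbb{E}[X]$ and $\text{Var}(X)$ via the second identity:

```python
# Mean and variance of a discrete random variable, computed directly
# from its probability mass function: a fair die, P(X = x) = 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())              # E[X]
second_moment = sum(x**2 * p for x, p in pmf.items())  # E[X^2]
var = second_moment - mean**2                          # E[X^2] - (E[X])^2

print(mean, var)  # E[X] = 3.5, Var(X) = 35/12 ≈ 2.9167
```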

3. Important Distributions

Example 3.1.
The normal (Gaussian) distribution with mean $\mu$ and variance $\sigma^2$ has density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
We write $X \sim \mathcal{N}(\mu, \sigma^2)$.
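The density formula translates directly into code. The function name and defaults below are illustrative choices, not part of the text:

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma2), transcribing the formula in Example 3.1."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The standard normal density peaks at x = 0 with value 1/sqrt(2*pi) ≈ 0.3989,
# and is symmetric about its mean.
print(normal_pdf(0.0))
assert abs(normal_pdf(1.0) - normal_pdf(-1.0)) < 1e-12
```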

4. The Law of Large Numbers

Theorem 4.1 (Strong Law of Large Numbers).
Let $X_1, X_2, \ldots$ be i.i.d. random variables with $\mathbb{E}[X_i] = \mu$. Then, with probability 1,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \mu.$$
Remark. This theorem justifies the intuition that sample averages converge to the true mean as the sample size grows.
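The convergence in the remark is easy to observe by simulation. The sketch below (an illustrative choice: i.i.d. Uniform(0, 1) draws, whose true mean is $\mu = 0.5$, with a fixed seed for reproducibility) prints running sample averages for increasing $n$:

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Simulate the law of large numbers: sample means of n i.i.d.
# Uniform(0, 1) draws approach the true mean mu = 0.5 as n grows.
def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

The deviation from 0.5 shrinks on the order of $1/\sqrt{n}$, which is exactly the scale the Central Limit Theorem below makes precise.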
Theorem 4.2 (Central Limit Theorem).
Let $X_1, X_2, \ldots$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. Then, as $n \to \infty$,
$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1).$$
Corollary 4.1.
For large $n$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ is approximately normally distributed:
$$\bar{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right).$$
Proof.
This follows directly from the Central Limit Theorem: rearranging the standardized sum gives
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i = \mu + \frac{\sigma}{\sqrt{n}} \cdot Z_n,$$
where $Z_n = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)$ by Theorem 4.2. An approximately standard normal variable scaled by $\sigma/\sqrt{n}$ and shifted by $\mu$ is approximately $\mathcal{N}(\mu, \sigma^2/n)$. ∎
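Corollary 4.1 can also be checked empirically. The sketch below (illustrative assumptions: Uniform(0, 1) draws, so $\mu = 0.5$ and $\sigma^2 = 1/12$, with a fixed seed) estimates the standard deviation of the sample mean over many trials and compares it with the predicted $\sigma/\sqrt{n}$:

```python
import math
import random

random.seed(1)  # fixed seed for reproducibility

# Empirical check of Corollary 4.1: the sample mean of n i.i.d.
# Uniform(0, 1) draws (mu = 0.5, sigma^2 = 1/12) should be centered at
# mu with standard deviation close to sigma / sqrt(n).
n, trials = 100, 10_000
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

avg = sum(means) / trials
sd = math.sqrt(sum((m - avg) ** 2 for m in means) / trials)

print(avg, sd, math.sqrt(1 / (12 * n)))  # sd ≈ sigma/sqrt(n) ≈ 0.0289
```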