The Poisson distribution is the limiting case of the binomial distribution when \(n \to \infty\) and \(p \to 0\) while the product \(np = \lambda\) stays fixed. It is used for predicting the number k of independent events that occur rarely but at a known constant average rate \(\lambda = np\) in a fixed interval of time or space.
The Poisson distribution: \(\bbox[bisque]{P(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots}\)
X: Poisson random variable, k: the number of events
Derivation
We want to show that the binomial distribution equation becomes the Poisson distribution equation when p is very small and n is very large.
We need to understand a couple of approximations:
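The two limits usually invoked at this step can be written out as a sketch. They are standard results and are not recovered from the original derivation: \(\frac{n!}{(n-k)!} \approx n^k\) for \(n \gg k\), and \(\left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}\) as \(n \to \infty\). Substituting \(p = \lambda / n\) into the binomial equation gives:

$$ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{(n-k)!\, n^k} \cdot \frac{\lambda^k}{k!} \cdot \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} $$

$$ \lim_{n \to \infty} \frac{n!}{(n-k)!\, n^k} = 1, \qquad \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}, \qquad \lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{-k} = 1 $$

$$ \Rightarrow \quad \lim_{n \to \infty} P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} $$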
Figure P1 shows that as n gets larger and p gets smaller, with the mean \(np\) held at 10, the binomial and the Poisson distributions superimpose. The Python code for Figure P1 follows.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, poisson
# Define the parameters n,p
params = [[20, 0.5], [50, 0.2], [1000, 0.01]]
lmbd = 10
# Create a color map
colors = plt.cm.viridis(np.linspace(0, 1, len(params)))
# Plot the binomial distributions
fig, ax = plt.subplots(figsize=(12,5))
ax.set_facecolor('lightblue')
for i, (n, p) in enumerate(params):
    x = np.arange(0, 25, 1)
    bino_pmf = binom.pmf(x, n, p)
    ax.bar(x, bino_pmf, alpha=0.5, color=colors[i], label=f"binom. p:{p} n:{n} " + r"$\mu$" + f":{n*p:.0f}")
    ax.plot(x, bino_pmf, marker='_', markersize=13, linestyle='-', color=colors[i])
# Plot the Poisson distribution
x = np.arange(0, 25, 1)
pois_pmf = poisson.pmf(x, lmbd)
ax.plot(x, pois_pmf, "o", color='black', label=f"poisson " + r"$\lambda$" + f":{lmbd}")
# Add labels and title
ax.set_xlabel('x = number of successes',fontsize=16)
ax.set_xlim(left=0, right=20)
ax.set_ylabel('y = p(x)', fontsize=16)
ax.set_title('Comparison of the Binomial and the Poisson Distributions')
ax.set_xticks([0,5,10,15,20])
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16)
ax.legend(fontsize=14)
ax.grid(True)
plt.show()
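The convergence that Figure P1 shows graphically can also be checked numerically. The following is a minimal sketch using only the standard library; the helper names `binom_pmf`, `poisson_pmf` and `max_abs_diff` are illustrative and not part of the original code. It reports the largest pointwise gap between each binomial PMF and the Poisson PMF with \(\lambda = 10\).

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return lam**k * exp(-lam) / factorial(k)

def max_abs_diff(n, p, lam, kmax=24):
    """Largest |binomial - Poisson| over k = 0..kmax."""
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(kmax + 1))

lam = 10
params = [(20, 0.5), (50, 0.2), (1000, 0.01)]  # same (n, p) pairs as Figure P1
diffs = [max_abs_diff(n, p, lam) for n, p in params]

for (n, p), d in zip(params, diffs):
    print(f"n={n:5d}  p={p:<5}  max |binom - poisson| = {d:.5f}")
```

The gap shrinks as n grows and p falls, which is the behaviour the derivation predicts.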
Figure P2 shows the shape of the Poisson distribution for \(\lambda = 1, 2, 4, 6, 8\). Note that \(\lambda\) (the expected value) is a positive real number, but k (the number of observations, corresponding to the number of successes in the binomial distribution) can only be a non-negative integer. Therefore the lines in Figure P2 are just for guidance; they do not represent continuous values of p(k).
For example, say you expect to see a mutation that will restore bacterial growth at a rate of 4 mutations per \(10^8\) bacteria. That is, if you plate \(10^8\) bacteria you expect to see 4 colonies on the plate the next day. Thus, \(\lambda=4\); however, you will not necessarily see exactly 4 colonies. If you plate \(10^8\) bacteria on each of 10 plates, the number of colonies k on each plate will follow the distribution shown by the dark green curve (\(\lambda = 4\)) of Figure P2.
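To make the plating example concrete, here is a small sketch that tabulates \(P(X = k)\) for \(\lambda = 4\) and the corresponding expected number of plates out of 10. The variable names are illustrative, and the 10-plate count comes from the example above.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4        # expected colonies per plate
n_plates = 10  # plates in the example

pmf = [poisson_pmf(k, lam) for k in range(13)]
for k, p in enumerate(pmf):
    print(f"k={k:2d}  P(k)={p:.4f}  expected plates: {n_plates * p:.2f}")
```

Seeing exactly 4 colonies has a probability of a little under 0.2, and 3 colonies is equally likely, so most of the 10 plates would be expected to show some other count.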
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson
# Values of lambda to plot, with one color per value
lambda_values = [1, 2, 4, 6, 8]
colors = plt.cm.viridis(np.linspace(0, 1, len(lambda_values)))
# Create the figure
fig, ax = plt.subplots(figsize=(12,5))
ax.set_facecolor('lightblue')
# Plot the Poisson distribution for each lambda
x = np.arange(0, 17, 1)
for i, lmbd in enumerate(lambda_values):
    pois_pmf = poisson.pmf(x, lmbd)
    ax.plot(x, pois_pmf, marker="o", linestyle='-', color=colors[i], label=fr'$\lambda$:{lmbd}')
# Add labels and title
ax.set_xlabel('k', fontsize=16)
ax.set_xlim(left=0, right=16)
ax.set_ylabel('y = p(k)', fontsize=16)
ax.set_title(r'Poisson Distribution p(k) = $\frac{\lambda^k e^{-\lambda}}{k!}$ where k $\in \mathbb{Z}^{0+}$', fontsize=18)
ax.legend(fontsize=16)
ax.set_xticks(np.arange(min(x), max(x)+1, 2))
ax.tick_params(axis='x', labelsize=16)
ax.tick_params(axis='y', labelsize=16)
ax.grid(True)
plt.show()
A study by Kim et al. (1996) examined the probability of capturing a specific region of the human genome in a genomic library. The authors reported:
“We have constructed an arrayed human genomic BAC library with approximately 4x coverage, represented by 96,000 BAC clones with an average insert size of nearly 140 kb. … More than 92% of the probes used to screen the library identified one or more hits. This is close to the 98% frequency predicted by a Poisson distribution for recovery of any marker from a 4x library.”
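The 98% figure in the quote can be reproduced with a one-line calculation. As a sketch, assume the number of clones covering a given marker is Poisson distributed with mean equal to the library coverage; the marker is then missed with probability \(e^{-\lambda}\). The function name below is illustrative.

```python
from math import exp

def prob_marker_recovered(coverage):
    """P(at least one clone contains the marker) = 1 - P(X = 0),
    assuming X ~ Poisson(coverage)."""
    return 1 - exp(-coverage)

p_4x = prob_marker_recovered(4)
print(f"4x library: P(at least one hit) = {p_4x:.4f}")
```

For a 4x library this gives about 0.98, matching the predicted frequency in the quote.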
Consider a scenario where a rare disease occurs at an average rate of 2 cases per 100,000 people annually. The probability of observing \( k \) cases in a year can be modeled using the Poisson distribution with \( \lambda = 2 \).
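As an illustration, the sketch below evaluates a few probabilities for this scenario, assuming a population of 100,000 so that \(\lambda = 2\) applies directly. The function and variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return lam**k * exp(-lam) / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 2  # expected cases per year in a population of 100,000

p_no_cases = poisson_pmf(0, lam)          # a year with no cases
p_exactly_two = poisson_pmf(2, lam)       # exactly the expected number
p_five_or_more = 1 - poisson_cdf(4, lam)  # an unusually bad year

print(f"P(0 cases)   = {p_no_cases:.4f}")
print(f"P(2 cases)   = {p_exactly_two:.4f}")
print(f"P(>=5 cases) = {p_five_or_more:.4f}")
```

The tail probability is computed as one minus the cumulative probability, which avoids summing an infinite series.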
The chi-square distribution is often used in conjunction with the Poisson distribution, particularly in hypothesis testing and in constructing confidence intervals for Poisson rates.
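The chi-square goodness-of-fit test assesses how well the observed data fit the expected frequencies under the Poisson model. The test statistic is calculated as:
$$ \chi^2 = \sum_{i=1}^m \frac{(O_i - E_i)^2}{E_i} $$
where \(O_i\) and \(E_i\) are the observed and expected frequencies in category i, and m is the number of categories.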
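A minimal sketch of the statistic for a Poisson model with a hypothesized \(\lambda = 2\) follows. The observed frequencies are made-up illustrative numbers, not data from any study, and all names are assumptions. The last category pools every count of 5 or more so that the expected frequencies sum to the sample size.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2  # hypothesized rate, fixed in advance (not estimated from the data)

# HYPOTHETICAL observed frequencies for k = 0, 1, 2, 3, 4 and k >= 5
observed = [12, 28, 27, 19, 9, 5]
n_obs = sum(observed)

# Expected frequencies; the last category takes the remaining tail probability
probs = [poisson_pmf(k, lam) for k in range(5)]
probs.append(1 - sum(probs))
expected = [n_obs * p for p in probs]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1  # no parameters were estimated from the data

print(f"chi-square statistic = {chi2_stat:.3f} with {dof} degrees of freedom")
```

The statistic is then compared with a chi-square critical value for `dof` degrees of freedom. If \(\lambda\) were estimated from the same data, one further degree of freedom would be subtracted.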
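When estimating the confidence interval for a Poisson rate from an observed count k, the chi-square distribution provides the critical values:
$$ \text{Confidence Interval for } \lambda: \left( \frac{1}{2}\chi^2_{\alpha/2,2k}, \frac{1}{2}\chi^2_{1-\alpha/2,2(k+1)} \right) $$
Here \(\chi^2_{q,\nu}\) denotes the q-quantile of the chi-square distribution with \(\nu\) degrees of freedom, and the lower limit is taken as 0 when \(k = 0\).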
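The chi-square quantiles in this interval are the closed-form solutions of two Poisson tail equations: the lower limit solves \(P(X \ge k) = \alpha/2\) and the upper limit solves \(P(X \le k) = \alpha/2\). The sketch below swaps in a different technique: it solves those equations by bisection using only the standard library, rather than looking up chi-square quantiles. With SciPy available, `scipy.stats.chi2.ppf` would give the quantiles directly. The function names are illustrative.

```python
from math import exp

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def bisect(f, lo, hi, tol=1e-10):
    """Root of a decreasing function f with f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(k, alpha=0.05):
    """Two-sided interval for lambda given one observed count k.
    Intended for moderate k; exp(-lam) underflows for very large rates."""
    bracket = k + 10 * (k + 1) ** 0.5 + 20  # generous upper search bound
    # Upper limit: P(X <= k; lam) = alpha/2
    upper = bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, bracket)
    if k == 0:
        return 0.0, upper
    # Lower limit: P(X >= k; lam) = alpha/2
    lower = bisect(lambda lam: alpha / 2 - (1 - poisson_cdf(k - 1, lam)),
                   0.0, bracket)
    return lower, upper

lower, upper = exact_poisson_ci(4)
print(f"k = 4: 95% interval for lambda = ({lower:.3f}, {upper:.3f})")
```

For the plating example with 4 observed colonies, the interval is wide and asymmetric around 4, which is typical for small counts.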
In Poisson regression models, which are used to model count data, the deviance (a measure of goodness-of-fit) approximately follows a chi-square distribution in large samples. This allows for hypothesis testing about the predictors in the model.
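For reference, the Poisson deviance is usually written as follows. The symbols are assumptions introduced here: \(y_i\) are the observed counts, \(\hat{\mu}_i\) the fitted means, and the term \(y_i \ln(y_i/\hat{\mu}_i)\) is taken as 0 when \(y_i = 0\).

$$ D = 2 \sum_{i=1}^{N} \left[ y_i \ln\!\left(\frac{y_i}{\hat{\mu}_i}\right) - \left(y_i - \hat{\mu}_i\right) \right] $$

To test a set of predictors, the difference in deviance between two nested models is compared with a chi-square distribution whose degrees of freedom equal the number of parameters dropped:

$$ D_{\text{reduced}} - D_{\text{full}} \ \overset{\text{approx.}}{\sim} \ \chi^2_{\,p_{\text{full}} - p_{\text{reduced}}} $$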