# Poisson distribution

In probability theory and statistics, the **Poisson distribution** (French pronunciation: [pwasɔ̃]), named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.^{[1]} The Poisson distribution can also be used for the number of events in other specified interval types such as distance, area or volume.

For instance, a call center receives an average of 180 calls per hour, 24 hours a day. The calls are independent; receiving one does not change the probability of when the next one will arrive. The number of calls received during any minute has a Poisson probability distribution: the most likely numbers are 2 and 3 but 1 and 4 are also likely and there is a small probability of it being as low as zero and a very small probability it could be 10. Another example is the number of decay events that occur from a radioactive source during a defined observation period.

A discrete random variable *X* is said to have a Poisson distribution with parameter *λ* > 0 if its probability mass function is given by

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$

The positive real number *λ* is equal to the expected value of *X* and also to its variance.^{[3]}
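As a quick numerical check of the call center example above, here is a minimal Python sketch evaluating this mass function at *λ* = 3 calls per minute:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed directly from the definition."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# 180 calls per hour is an average of 3 calls per minute.
lam = 3.0
for k in (0, 1, 2, 3, 4, 10):
    print(f"P({k} calls in a minute) = {poisson_pmf(k, lam):.5f}")
# P(2) and P(3) are the largest (about 0.224 each); P(0) is about 0.05,
# and P(10) is under 0.001, matching the description above.
```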

The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. The number of such events that occur during a fixed time interval is, under the right circumstances, a random number with a Poisson distribution.

The Poisson distribution is an appropriate model if the following assumptions are true:^{[5]}

- *k*, the number of times an event occurs in an interval, can take the values 0, 1, 2, ….
- The occurrence of one event does not affect the probability of a second event.
- The average rate at which events occur is independent of any occurrences.
- Two events cannot occur at exactly the same instant.

If these conditions are true, then k is a Poisson random variable, and the distribution of k is a Poisson distribution.

The Poisson distribution is also the limit of a binomial distribution, for which the probability of success for each trial equals λ divided by the number of trials, as the number of trials approaches infinity (see Related distributions).

Suppose that astronomers estimate that large meteorites (above a certain size) hit the earth on average once every 100 years (*λ* = 1 event per 100 years), and that the number of meteorite hits follows a Poisson distribution. What is the probability of k = 0 meteorite hits in the next 100 years?
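Substituting *λ* = 1 and *k* = 0 into the probability mass function gives:

$$P(k = 0 \text{ meteorite hits in next 100 years}) = \frac{1^{0} e^{-1}}{0!} = \frac{1}{e} \approx 0.37.$$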

Under these assumptions, the probability that no large meteorites hit the earth in the next 100 years is roughly 0.37. The remaining 1 − 0.37 = 0.63 is the probability of 1, 2, 3, or more large meteorite hits in the next 100 years.
Similarly, if overflow floods occur on average once every 100 years (*λ* = 1), then the probability of no overflow floods in 100 years is roughly 0.37, by the same calculation.

In general, if an event occurs on average once per interval (*λ* = 1), and the events follow a Poisson distribution, then *P*(0 events in next interval) = 0.37. In addition, *P*(exactly one event in next interval) = 0.37, since $P(k = 1) = \frac{1^{1} e^{-1}}{1!} = e^{-1}$ as well.

The number of students who arrive at the student union per minute will likely not follow a Poisson distribution, because the rate is not constant (low rate during class time, high rate between class times) and the arrivals of individual students are not independent (students tend to come in groups).

The number of magnitude 5 earthquakes per year in a country may not follow a Poisson distribution if one large earthquake increases the probability of aftershocks of similar magnitude.

Examples in which at least one event is guaranteed are not Poisson distributed but may be modeled using a zero-truncated Poisson distribution.

Count distributions in which the number of intervals with zero events is higher than predicted by a Poisson model may be modeled using a zero-inflated model.

This distribution has been extended to the bivariate case.^{[26]} The generating function for this distribution is

$$g(u, v) = \exp\!\big[ (\theta_{1} - \theta_{12})(u - 1) + (\theta_{2} - \theta_{12})(v - 1) + \theta_{12}(uv - 1) \big].$$

The marginal distributions are Poisson(*θ*_{1}) and Poisson(*θ*_{2}) and the correlation coefficient is limited to the range

$$0 \le \rho \le \min\left\{ \sqrt{\frac{\theta_{1}}{\theta_{2}}},\; \sqrt{\frac{\theta_{2}}{\theta_{1}}} \right\}.$$

This definition of the free Poisson law, as a limit of repeated free convolution, is analogous to one of the ways in which the classical Poisson distribution is obtained from a (classical) Poisson process.

We give values of some important transforms of the free Poisson law; the computation can be found, e.g., in the book *Lectures on the Combinatorics of Free Probability* by A. Nica and R. Speicher.^{[29]}

The Cauchy transform (which is the negative of the Stieltjes transformation) is given by

Since each observation has expectation *λ*, so does the sample mean. Therefore, the maximum likelihood estimate (derived below) is an unbiased estimator of *λ*. It is also an efficient estimator since its variance achieves the Cramér–Rao lower bound (CRLB).^{[citation needed]} Hence it is minimum-variance unbiased. It can also be proven that the sum (and hence the sample mean, as it is a one-to-one function of the sum) is a complete and sufficient statistic for *λ*.

To find the parameter *λ* that maximizes the probability function for the Poisson population, we can use the logarithm of the likelihood function:

$$\ell(\lambda) = \ln \prod_{i=1}^{n} f(k_{i} \mid \lambda) = -n\lambda + \left( \sum_{i=1}^{n} k_{i} \right) \ln \lambda - \sum_{i=1}^{n} \ln(k_{i}!).$$

We take the derivative of $\ell$ with respect to *λ* and compare it to zero:

$$\frac{d\ell}{d\lambda} = -n + \left( \sum_{i=1}^{n} k_{i} \right) \frac{1}{\lambda} = 0.$$

Solving for *λ* gives a stationary point:

$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} k_{i}.$$

So $\hat{\lambda}$ is the average of the *k*_{i} values. Obtaining the sign of the second derivative of $\ell$ at the stationary point will determine what kind of extreme value $\hat{\lambda}$ is.

The second derivative is

$$\frac{\partial^{2} \ell}{\partial \lambda^{2}} = -\frac{1}{\lambda^{2}} \sum_{i=1}^{n} k_{i},$$

which, evaluated at the stationary point $\lambda = \hat{\lambda} = \bar{k}$, equals $-n/\bar{k}$: the negative of *n* times the reciprocal of the average of the *k*_{i}. This expression is negative when the average is positive. If this is satisfied, then the stationary point maximizes the probability function.
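As an illustrative check in Python (the true rate, sample size and seed below are arbitrary choices for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
sample = rng.poisson(lam=4.2, size=10_000)  # n observations k_i

lambda_mle = sample.mean()  # the MLE is the sample mean, as derived above
print(lambda_mle)           # close to the true rate 4.2
```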

The confidence interval for the mean of a Poisson distribution can be expressed using the relationship between the cumulative distribution functions of the Poisson and chi-squared distributions. The chi-squared distribution is itself closely related to the gamma distribution, and this leads to an alternative expression. Given an observation *k* from a Poisson distribution with mean *μ*, a confidence interval for *μ* with confidence level 1 – *α* is

$$\tfrac{1}{2} \chi^{2}(\alpha/2;\, 2k) \;\le\; \mu \;\le\; \tfrac{1}{2} \chi^{2}(1 - \alpha/2;\, 2k + 2),$$

where $\chi^{2}(p; n)$ is the quantile function of the chi-squared distribution with *n* degrees of freedom.

When quantiles of the gamma distribution are not available, an accurate approximation to this exact interval has been proposed (based on the Wilson–Hilferty transformation):^{[32]}

$$k \left( 1 - \frac{1}{9k} - \frac{z_{\alpha/2}}{3\sqrt{k}} \right)^{3} \;\le\; \mu \;\le\; (k + 1) \left( 1 - \frac{1}{9(k + 1)} + \frac{z_{\alpha/2}}{3\sqrt{k + 1}} \right)^{3},$$

where $z_{\alpha/2}$ denotes the standard normal quantile with upper tail area *α*/2.

For application of these formulae in the same context as above (given a sample of *n* measured values *k*_{i} each drawn from a Poisson distribution with mean *λ*), one would set

$$k = \sum_{i=1}^{n} k_{i},$$

calculate an interval for *μ* = *nλ*, and then derive the interval for *λ* by dividing by *n*.
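A sketch of the exact interval in Python, using SciPy's chi-squared quantile function (the observed count and confidence level below are illustrative):

```python
from scipy.stats import chi2

def poisson_ci(k: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact confidence interval for the Poisson mean mu, given one count k."""
    lower = 0.5 * chi2.ppf(alpha / 2, 2 * k) if k > 0 else 0.0
    upper = 0.5 * chi2.ppf(1 - alpha / 2, 2 * k + 2)
    return lower, upper

print(poisson_ci(10))  # 95% interval for mu after observing k = 10
```

For *n* observations one would call this with *k* equal to the sum of the counts and divide both endpoints by *n*.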

In Bayesian inference, the conjugate prior for the rate parameter *λ* of the Poisson distribution is the gamma distribution.^{[33]} Let

$$\lambda \sim \mathrm{Gamma}(\alpha, \beta)$$

denote that *λ* is distributed according to the gamma density *g* parameterized in terms of a shape parameter *α* and an inverse scale parameter *β*:

$$g(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, \lambda^{\alpha - 1} \, e^{-\beta \lambda} \qquad \text{for } \lambda > 0.$$

Then, given the same sample of *n* measured values *k*_{i} as before, and a prior of Gamma(*α*, *β*), the posterior distribution is

$$\lambda \sim \mathrm{Gamma}\!\left( \alpha + \sum_{i=1}^{n} k_{i},\; \beta + n \right).$$

The posterior predictive distribution for a single additional observation is a negative binomial distribution,^{[34]}^{: 53 } sometimes called a gamma–Poisson distribution.
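A minimal sketch of this conjugate update in Python; the prior hyperparameters and data below are illustrative, and SciPy's negative binomial parameterization (`nbinom(r, p)`) is assumed:

```python
import numpy as np
from scipy.stats import nbinom

alpha, beta = 2.0, 1.0         # illustrative Gamma(alpha, beta) prior
k = np.array([3, 5, 4, 6, 2])  # observed counts k_i
n = len(k)

# Conjugate update: the posterior is Gamma(alpha + sum k_i, beta + n).
alpha_post = alpha + k.sum()
beta_post = beta + n

# Posterior predictive for one new count: negative binomial.
r = alpha_post
p = beta_post / (beta_post + 1.0)
print(nbinom(r, p).pmf(np.arange(6)))  # predictive probabilities for 0..5
```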

Applications of the Poisson distribution can be found in many fields.^{[37]}

The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete properties (that is, those that may happen 0, 1, 2, 3, … times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that may be modelled as a Poisson distribution include the number of calls arriving at a call center per minute and the number of decay events per second from a radioactive source, as discussed above.

Gallagher showed in 1976 that the counts of prime numbers in short intervals obey a Poisson distribution,^{[47]} provided a certain version of the unproved prime *r*-tuple conjecture of Hardy–Littlewood^{[48]} is true.

The rate of an event is related to the probability of the event occurring in some small subinterval (of time, space or otherwise). In the case of the Poisson distribution, one assumes that there exists a small enough subinterval for which the probability of an event occurring twice is negligible. With this assumption one can derive the Poisson distribution from the binomial one, given only the information of the expected number of total events in the whole interval.

In this case the binomial distribution converges to what is known as the Poisson distribution by the Poisson limit theorem.
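Concretely, holding *np* = *λ* fixed while *n* grows:

$$\lim_{n \to \infty} \binom{n}{k} \left( \frac{\lambda}{n} \right)^{k} \left( 1 - \frac{\lambda}{n} \right)^{n - k} = \frac{\lambda^{k} e^{-\lambda}}{k!}.$$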

In several of the above examples (such as the number of mutations in a given sequence of DNA) the events being counted are actually the outcomes of discrete trials, and would more precisely be modelled using the binomial distribution, that is

$$X \sim \mathrm{B}(n, p).$$

In such cases *n* is very large and *p* is very small (and so the expectation *np* is of intermediate magnitude). Then the distribution may be approximated by the less cumbersome Poisson distribution^{[citation needed]}

$$X \sim \mathrm{Pois}(np).$$
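A numerical illustration of the approximation in Python (the values of *n* and *p* below are arbitrary; SciPy's `binom` and `poisson` distributions are assumed):

```python
import numpy as np
from scipy.stats import binom, poisson

n, p = 10_000, 0.0003  # large n, small p; np = 3
ks = np.arange(10)

# Compare the two pmfs over the most probable counts.
diff = np.abs(binom.pmf(ks, n, p) - poisson.pmf(ks, n * p))
print(diff.max())  # the maximum pointwise difference is very small
```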

This approximation is sometimes known as the *law of rare events*,^{[49]}^{: 5 } since each of the *n* individual Bernoulli events rarely occurs.

The name "law of rare events" may be misleading because the total count of success events in a Poisson process need not be rare if the parameter *np* is not small. For example, the number of telephone calls to a busy switchboard in one hour follows a Poisson distribution: the events appear frequent to the operator, but they are rare from the point of view of an average member of the population, who is very unlikely to make a call to that switchboard in that hour.

The variance of the binomial distribution, *np*(1 − *p*), is 1 − *p* times that of the approximating Poisson distribution, *np*, so the two are almost equal when *p* is very small.

The word *law* is sometimes used as a synonym of probability distribution, and *convergence in law* means *convergence in distribution*. Accordingly, the Poisson distribution is sometimes called the "law of small numbers" because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen. *The Law of Small Numbers* is a book by Ladislaus Bortkiewicz about the Poisson distribution, published in 1898.^{[40]}^{[50]}

The Poisson distribution arises as the number of points of a Poisson point process located in some finite region. More specifically, if *D* is some region of space, for example Euclidean space **R**^{d}, for which |*D*|, the area, volume or, more generally, the Lebesgue measure of the region, is finite, and if *N*(*D*) denotes the number of points in *D*, then

$$P(N(D) = k) = \frac{(\lambda |D|)^{k} e^{-\lambda |D|}}{k!}.$$

Poisson regression and negative binomial regression are useful for analyses where the dependent (response) variable is the count (0, 1, 2, …) of the number of events or occurrences in an interval.

An everyday example is the graininess that appears as photographs are enlarged; the graininess is due to Poisson fluctuations in the number of reduced silver grains, not to the individual grains themselves. By correlating the graininess with the degree of enlargement, one can estimate the contribution of an individual grain (which is otherwise too small to be seen unaided).^{[citation needed]} Many other molecular applications of Poisson noise have been developed, e.g., estimating the number density of receptor molecules in a cell membrane.

In causal set theory the discrete elements of spacetime follow a Poisson distribution in the volume.

When the probability mass function is evaluated directly, the factorial and power terms can overflow. It can instead be computed as

$$f(k; \lambda) = \exp\!\left( k \ln \lambda - \lambda - \ln \Gamma(k + 1) \right),$$

which is mathematically equivalent but numerically stable. The natural logarithm of the gamma function can be obtained using the `lgamma` function in the C standard library (C99 version) or R, the `gammaln` function in MATLAB or SciPy, or the `log_gamma` function in Fortran 2008 and later.
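In Python, the same computation can be transcribed with `math.lgamma` (a sketch following the formula above):

```python
import math

def poisson_pmf_stable(k: int, lam: float) -> float:
    """Numerically stable Poisson pmf via the log-gamma function."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Works where the naive lam**k / k! form would overflow:
print(poisson_pmf_stable(500, 500.0))
```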

Some computing languages provide built-in functions to evaluate the Poisson distribution, for example `dpois` in R and `scipy.stats.poisson` in Python's SciPy library.

A simple algorithm to generate random Poisson-distributed numbers (pseudo-random number sampling) has been given by Knuth:^{[53]}^{: 137-138 }
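A Python transcription of Knuth's multiplicative method, which draws uniform variates and multiplies them together until the running product drops below *e*^{−λ}:

```python
import math
import random

def knuth_poisson(lam: float) -> int:
    """Sample from Poisson(lam) by Knuth's multiplicative method."""
    L = math.exp(-lam)  # acceptance threshold e^(-lambda)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()  # multiply in a fresh Uniform(0,1) variate
        if p <= L:
            return k - 1
```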

The complexity is linear in the returned value *k*, which is *λ* on average. There are many other algorithms to improve this. Some are given in Ahrens & Dieter; see § References below.

For large values of λ, the value of L = *e*^{−λ} may be so small that it is hard to represent. This can be solved by a change to the algorithm which uses an additional parameter STEP such that *e*^{−STEP} does not underflow.^{[citation needed]}
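A Python sketch of this rescaled variant; the control flow below follows the description (consuming *λ* in chunks of size STEP so the running product stays near 1), with the details being one reasonable reading of it:

```python
import math
import random

def poisson_large_lambda(lam: float, step: float = 500.0) -> int:
    """Knuth-style sampler rescaled so that e**(-lam) is never formed directly."""
    lam_left = lam
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        # Re-scale p upward in chunks of e**step while it is small,
        # consuming the remaining rate lam_left as we go.
        while p < 1.0 and lam_left > 0.0:
            if lam_left > step:
                p *= math.exp(step)
                lam_left -= step
            else:
                p *= math.exp(lam_left)
                lam_left = 0.0
        if p <= 1.0:
            return k - 1
```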

The choice of STEP depends on the threshold of overflow. For double precision floating point format the threshold is near *e*^{700}, so 500 should be a safe *STEP*.

Other solutions for large values of λ include rejection sampling and Gaussian approximation.

Inverse transform sampling is simple and efficient for small values of λ, and requires only one uniform random number *u* per sample. Cumulative probabilities are examined in turn until one exceeds *u*.
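A Python sketch of this sequential search, using the recurrence *P*(*X* = *k*) = *P*(*X* = *k* − 1) · *λ*/*k* to update the cumulative probability:

```python
import math
import random

def poisson_inverse_transform(lam: float) -> int:
    """Sample from Poisson(lam) by inverting the CDF with a sequential search.

    Suitable for small lam; math.exp(-lam) underflows for very large lam.
    """
    u = random.random()
    k = 0
    p = math.exp(-lam)  # P(X = 0)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k    # P(X = k) from P(X = k - 1)
        cdf += p
    return k
```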

The distribution was first introduced by Siméon Denis Poisson (1781–1840) and published together with his probability theory in his work *Recherches sur la probabilité des jugements en matière criminelle et en matière civile* (1837).^{[55]}^{: 205-207 } The work theorized about the number of wrongful convictions in a given country by focusing on certain random variables *N* that count, among other things, the number of discrete occurrences (sometimes called "events" or "arrivals") that take place during a time-interval of given length. The result had already been given in 1711 by Abraham de Moivre in *De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus*.^{[56]}^{: 219 }^{[57]}^{: 14-15 }^{[58]}^{: 193 }^{[7]}^{: 157 } This makes it an example of Stigler's law and it has prompted some authors to argue that the Poisson distribution should bear the name of de Moivre.^{[59]}^{[60]}

In 1860, Simon Newcomb fitted the Poisson distribution to the number of stars found in a unit of space.^{[61]}
A further practical application of this distribution was made by Ladislaus Bortkiewicz in 1898 when he was given the task of investigating the number of soldiers in the Prussian army killed accidentally by horse kicks;^{[40]}^{: 23-25 } this experiment introduced the Poisson distribution to the field of reliability engineering.