# Moment (mathematics)

In mathematics, the **moments** of a function are quantitative measures related to the shape of the function's graph. If the function represents mass density, then the zeroth moment is the total mass, the first moment (normalized by total mass) is the center of mass, and the second moment is the rotational inertia. If the function is a probability distribution, then the first moment is the expected value, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis. The mathematical concept is closely related to the concept of moment in physics.

For a distribution of mass or probability on a bounded interval, the collection of all the moments (of all orders, from 0 to ∞) uniquely determines the distribution (Hausdorff moment problem). The same is not true on unbounded intervals (Hamburger moment problem).

In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the moments of random variables.^{[1]}

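The *n*-th raw moment (i.e., moment about zero) of a random variable *X* with density function *f*(*x*) is defined by^{[2]}

$$
\mu'_n = \operatorname{E}\left[X^n\right] =
\begin{cases}
\displaystyle\sum_i x_i^n f(x_i) & \text{discrete distribution,} \\[1ex]
\displaystyle\int_{-\infty}^{\infty} x^n f(x)\, dx & \text{continuous distribution.}
\end{cases}
$$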

The *n*-th moment about zero of a probability density function *f*(*x*) is the expected value of *X*^{*n*} and is called a *raw moment* or *crude moment*.^{[3]} The moments about the mean *μ* are called *central* moments, $\mu_n = \operatorname{E}\left[(X-\mu)^n\right]$; these describe the shape of the function, independently of translation.
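For a discrete distribution, raw and central moments reduce to finite weighted sums. The following is a minimal sketch, assuming the distribution is given as a Python dictionary mapping each value to its probability (a representation chosen only for this illustration); `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def raw_moment(pmf, n):
    """n-th raw moment: sum of x**n * P(X = x) over the support."""
    return sum(p * x**n for x, p in pmf.items())

def central_moment(pmf, n):
    """n-th central moment: sum of (x - mu)**n * P(X = x), where mu is the mean."""
    mu = raw_moment(pmf, 1)
    return sum(p * (x - mu)**n for x, p in pmf.items())

# A fair six-sided die with exact rational probabilities.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(raw_moment(die, 1))      # 7/2
print(raw_moment(die, 2))      # 91/6
print(central_moment(die, 2))  # 35/12
```

Because the die is symmetric about its mean, `central_moment(die, 3)` is exactly zero.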

If *f* is a probability density function, then the value of the integral above is called the *n*-th moment of the probability distribution. More generally, if *F* is a cumulative probability distribution function of any probability distribution, which may not have a density function, then the *n*-th moment of the probability distribution is given by the Riemann–Stieltjes integral

$$\mu'_n = \operatorname{E}\left[X^n\right] = \int_{-\infty}^{\infty} x^n \, dF(x),$$

where *X* is a random variable with cumulative distribution function *F*.

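The normalised *n*-th central moment, or standardised moment, is the *n*-th central moment divided by *σ*^{*n*}, where *σ* is the standard deviation:

$$\frac{\mu_n}{\sigma^n} = \frac{\operatorname{E}\left[(X-\mu)^n\right]}{\sigma^n}.$$

These normalised central moments are dimensionless quantities, which represent the distribution independently of any linear change of scale.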
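The scale invariance of standardised moments can be checked on data. The sketch below uses population-style sample moments (divisor equal to the number of data points), which is an assumption made for this illustration. It compares a small data set with a copy whose units and origin have been changed.

```python
import math

def standardized_moment(data, n):
    """n-th central sample moment divided by sigma**n (divisor len(data) throughout)."""
    m = len(data)
    mu = sum(data) / m
    mu_n = sum((x - mu) ** n for x in data) / m
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / m)
    return mu_n / sigma ** n

data = [1.0, 2.0, 2.0, 3.0, 7.0]
rescaled = [3.0 * x + 5.0 for x in data]  # change of units and origin

print(standardized_moment(data, 3), standardized_moment(rescaled, 3))
```

Both printed values agree up to rounding. A negative scale factor flips the sign of the odd standardised moments, such as the third, while leaving the even ones unchanged.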

For an electric signal, the first moment is its DC level, and the second moment is proportional to its average power.^{[4]}^{[5]}
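As an illustration, the sketch below samples a hypothetical signal consisting of a 2 V DC offset plus a 1 V-amplitude sine wave over exactly one period. Its first moment recovers the DC level. Its second raw moment (the mean square) divided by an assumed load resistance `R` gives the average power.

```python
import math

# Hypothetical signal: 2 V DC offset plus a 1 V-amplitude sine,
# sampled at N evenly spaced points covering exactly one period.
N = 1000
signal = [2.0 + math.sin(2 * math.pi * k / N) for k in range(N)]

dc_level = sum(signal) / N                    # first moment: the DC level
mean_square = sum(v * v for v in signal) / N  # second raw moment

R = 50.0  # ohms, an illustrative load resistance
avg_power = mean_square / R  # average power is proportional to the second moment
```

Here `dc_level` is 2 and `mean_square` is 4 + 1/2 (DC power plus sine power), up to floating-point rounding.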

The third central moment is a measure of the lopsidedness of the distribution; any symmetric distribution will have a third central moment, if defined, of zero. The normalised third central moment is called the skewness, often denoted *γ*. A distribution that is skewed to the left (the tail of the distribution is longer on the left) will have a negative skewness. A distribution that is skewed to the right (the tail of the distribution is longer on the right) will have a positive skewness.

For distributions that are not too different from the normal distribution, the median will be somewhere near *μ* − *γσ*/6; the mode about *μ* − *γσ*/2.

The fourth central moment is a measure of the heaviness of the tail of the distribution, compared to the normal distribution of the same variance. Since it is the expectation of a fourth power, the fourth central moment, where defined, is always nonnegative; and except for a point distribution, it is always strictly positive. The fourth central moment of a normal distribution is 3*σ*^{4}.
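The value 3*σ*^{4} can be checked numerically. The sketch below applies the trapezoid rule to the integral of *x*^{4} *f*(*x*) for a zero-mean normal density *f*. The cut-off at ±12*σ* and the step count are arbitrary choices for this illustration; the neglected tails are negligible at that width.

```python
import math

def normal_pdf(x, sigma):
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fourth_central_moment_normal(sigma, steps=20000, width=12.0):
    """Trapezoid-rule approximation of the integral of x**4 * f(x)
    over [-width*sigma, width*sigma] for a zero-mean normal density f."""
    a, b = -width * sigma, width * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # endpoints get half weight
        total += weight * x**4 * normal_pdf(x, sigma)
    return total * h

sigma = 2.0
print(fourth_central_moment_normal(sigma), 3 * sigma**4)  # both close to 48
```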

The kurtosis *κ* is defined to be the standardized fourth central moment, $\kappa = \mu_4/\sigma^4$. (Equivalently, the excess kurtosis *κ* − 3 is the fourth cumulant divided by the square of the second cumulant.)^{[6]}^{[7]} If a distribution has heavy tails, the kurtosis will be high (sometimes called leptokurtic); conversely, light-tailed distributions (for example, bounded distributions such as the uniform) have low kurtosis (sometimes called platykurtic).

The kurtosis can be positive without limit, but *κ* must be greater than or equal to *γ*^{2} + 1; equality holds only for binary (two-point) distributions. For unbounded skew distributions not too far from normal, *κ* tends to lie somewhere between *γ*^{2} and 2*γ*^{2}.
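A sample's empirical distribution is itself a probability distribution, so its population-style skewness and kurtosis (divisor equal to the sample size, an assumption of this sketch) must also satisfy *κ* ≥ *γ*^{2} + 1. The minimal sketch below computes both quantities and shows that a two-point sample attains the bound.

```python
def skewness_kurtosis(data):
    """Return (gamma, kappa): standardized third and fourth central moments,
    using population-style moments with divisor len(data)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# A binary (two-point) sample attains the bound kappa = gamma**2 + 1.
g, k = skewness_kurtosis([0, 0, 1])
print(g ** 2 + 1, k)  # both approximately 1.5
```

For a sample with more than two distinct values, such as `[1, 2, 2, 3, 7]`, the inequality is strict.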

Moments of order higher than four are called high-order moments. As with variance, skewness, and kurtosis, they are higher-order statistics, involving non-linear combinations of the data, and can be used for description or estimation of further shape parameters. The higher the moment, the harder it is to estimate, in the sense that larger samples are required to obtain estimates of similar quality. This is due to the excess degrees of freedom consumed by the higher orders. Further, they can be subtle to interpret, often being most easily understood in terms of lower-order moments; compare the higher-order derivatives jerk and jounce in physics. For example, just as the 4th-order moment (kurtosis) can be interpreted as "relative importance of tails as compared to shoulders in contribution to dispersion" (for a given amount of dispersion, higher kurtosis corresponds to thicker tails, while lower kurtosis corresponds to broader shoulders), the 5th-order moment can be interpreted as measuring "relative importance of tails as compared to center (mode and shoulders) in contribution to skewness" (for a given amount of skewness, a higher 5th moment corresponds to higher skewness in the tail portions and little skewness of the mode, while a lower 5th moment corresponds to more skewness in the shoulders).

Mixed moments are moments involving multiple variables; examples include covariance, coskewness and cokurtosis. While there is a unique covariance, there are multiple coskewnesses and cokurtoses.
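The sketch below computes unstandardised sample mixed central moments, the mean of (*x* − x̄)^{*j*}(*y* − ȳ)^{*k*}, with divisor equal to the sample size (an assumption of this illustration). Covariance is the case *j* = *k* = 1. The two third-order cases (2, 1) and (1, 2) generally differ, which is why there is more than one coskewness. Standardised coskewness and cokurtosis further divide by products of standard deviations; conventions vary, and none is implemented here.

```python
def mixed_central_moment(xs, ys, j, k):
    """Sample mixed central moment: mean of (x - x_bar)**j * (y - y_bar)**k."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) ** j * (y - y_bar) ** k for x, y in zip(xs, ys)) / n

xs = [0.0, 0.0, 3.0]
ys = [0.0, 1.0, 2.0]
cov = mixed_central_moment(xs, ys, 1, 1)    # covariance (divisor n): 1.0
co3_a = mixed_central_moment(xs, ys, 2, 1)  # one third-order mixed moment: 1.0
co3_b = mixed_central_moment(xs, ys, 1, 2)  # another, different one: 1/3
```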

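The first raw moment and the second and third *unnormalized central* moments are additive in the sense that if *X* and *Y* are independent random variables then

$$
\begin{aligned}
\mu'_1(X+Y) &= \mu'_1(X) + \mu'_1(Y), \\
\operatorname{Var}(X+Y) &= \operatorname{Var}(X) + \operatorname{Var}(Y), \\
\mu_3(X+Y) &= \mu_3(X) + \mu_3(Y).
\end{aligned}
$$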

(These can also hold for variables that satisfy weaker conditions than independence. The first always holds; if the second holds, the variables are called uncorrelated.)

In fact, these are the first three cumulants and all cumulants share this additivity property.
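The additivity can be verified exactly for small discrete distributions. The sketch below uses two hypothetical distributions and computes the distribution of the sum of independent variables by convolving their probability mass functions. It then compares central moments of orders 2, 3 and 4.

```python
from fractions import Fraction
from itertools import product

def convolve(pmf_x, pmf_y):
    """PMF of X + Y for independent discrete X and Y (dicts value -> probability)."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

def central_moment(pmf, n):
    """n-th central moment of a discrete distribution."""
    mu = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mu) ** n for x, p in pmf.items())

# Two skewed, hypothetical example distributions.
X = {0: Fraction(3, 4), 1: Fraction(1, 4)}
Y = {0: Fraction(1, 2), 1: Fraction(1, 3), 5: Fraction(1, 6)}
S = convolve(X, Y)

for n in (2, 3, 4):
    print(n, central_moment(S, n), central_moment(X, n) + central_moment(Y, n))
```

The two columns agree for *n* = 2 and 3 but not for *n* = 4. For independent variables the fourth central moment of the sum picks up an extra term 6 Var(*X*) Var(*Y*), whereas the fourth cumulant, *μ*_{4} − 3*σ*^{4}, is additive.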

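For all *k*, the *k*-th raw moment of a population can be estimated using the *k*-th raw sample moment

$$\frac{1}{n}\sum_{i=1}^{n} X_i^k$$

applied to a sample *X*_{1}, …, *X*_{n} drawn from the population.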

It can be shown that the expected value of the raw sample moment is equal to the *k*-th raw moment of the population, if that moment exists, for any sample size *n*. It is thus an unbiased estimator. This contrasts with the situation for central moments, whose computation uses up a degree of freedom by using the sample mean. So, for example, an unbiased estimate of the population variance (the second central moment) is given by

$$\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2,$$

where $\bar{X}$ is the sample mean and the denominator *n* has been replaced by the number of degrees of freedom, *n* − 1.
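A minimal sketch of these estimators follows: the *k*-th raw sample moment, the second central sample moment with divisor *n*, and the variance estimate with divisor *n* − 1. The data values are arbitrary.

```python
def raw_sample_moment(data, k):
    """k-th raw sample moment: (1/n) * sum of x**k; unbiased for the population raw moment."""
    return sum(x ** k for x in data) / len(data)

def biased_variance(data):
    """Second central sample moment (divisor n); biased low for the population variance."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def unbiased_variance(data):
    """Variance estimate with divisor n - 1, compensating for the estimated mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(raw_sample_moment(data, 1))  # 5.0
print(biased_variance(data))       # 4.0
print(unbiased_variance(data))     # about 4.571 (= 32/7)
```

Python's standard library offers the same two variance estimators as `statistics.pvariance` (divisor *n*) and `statistics.variance` (divisor *n* − 1).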

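Partial moments are sometimes referred to as "one-sided moments." The *n*-th order lower and upper partial moments with respect to a reference point *r* may be expressed as

$$\mu_n^-(r) = \int_{-\infty}^{r} (r - x)^n \, f(x) \, dx, \qquad \mu_n^+(r) = \int_{r}^{\infty} (x - r)^n \, f(x) \, dx.$$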

If the integral does not converge, the partial moment does not exist.

Partial moments are normalized by being raised to the power 1/*n*. The upside potential ratio may be expressed as a ratio of a first-order upper partial moment to a normalized second-order lower partial moment. Partial moments have been used in the definition of some financial metrics, such as the Sortino ratio, because they focus purely on upside or downside.
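Given a sample of returns instead of a density, a common empirical stand-in replaces each integral by an average over the sample. The sketch below follows that approach (an assumption of this illustration, not a prescribed estimator) and computes the upside potential ratio as the first-order upper partial moment over the square root of the second-order lower partial moment. The return values are hypothetical.

```python
import math

def lower_partial_moment(returns, r, n):
    """Empirical n-th lower partial moment: mean of max(r - x, 0)**n."""
    return sum(max(r - x, 0.0) ** n for x in returns) / len(returns)

def upper_partial_moment(returns, r, n):
    """Empirical n-th upper partial moment: mean of max(x - r, 0)**n."""
    return sum(max(x - r, 0.0) ** n for x in returns) / len(returns)

def upside_potential_ratio(returns, r):
    """First-order upper partial moment over the normalized (power 1/2)
    second-order lower partial moment. Undefined if no return falls below r."""
    return upper_partial_moment(returns, r, 1) / math.sqrt(lower_partial_moment(returns, r, 2))

returns = [-0.02, 0.01, 0.03, -0.01, 0.02]  # hypothetical periodic returns
print(upside_potential_ratio(returns, 0.0))  # approximately 1.2
```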

Let (*M*, *d*) be a metric space, and let B(*M*) be the Borel σ-algebra on *M*, the σ-algebra generated by the *d*-open subsets of *M*. (For technical reasons, it is also convenient to assume that *M* is a separable space with respect to the metric *d*.) Let 1 ≤ *p* ≤ ∞.

The **p-th central moment** of a measure μ on the measurable space (*M*, B(*M*)) about a given point *x*_{0} ∈ *M* is defined to be

$$\int_M d(x, x_0)^p \, \mathrm{d}\mu(x).$$

*μ* is said to have **finite p-th central moment** if the p-th central moment of μ about *x*_{0} is finite for some *x*_{0} ∈ *M*.

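This terminology for measures carries over to random variables in the usual way: if (Ω, Σ, **P**) is a probability space and *X* : Ω → *M* is a random variable, then the **p-th central moment** of *X* about *x*_{0} ∈ *M* is defined to be

$$\int_\Omega d\big(X(\omega), x_0\big)^p \, \mathrm{d}\mathbf{P}(\omega) = \operatorname{E}\left[d(X, x_0)^p\right],$$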

and *X* has **finite p-th central moment** if the p-th central moment of *X* about *x*_{0} is finite for some *x*_{0} ∈ *M*.
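For a measure concentrated on finitely many points, the integral defining the *p*-th central moment reduces to a weighted sum of *p*-th powers of distances, and it is always finite. The sketch below handles only finite *p*. It takes the points as coordinate tuples and defaults to the Euclidean metric (`math.dist`); both are illustrative assumptions, and any other metric can be passed in as `dist`.

```python
import math

def pth_central_moment(points, weights, x0, p, dist=math.dist):
    """p-th central moment about x0 of the discrete measure that puts mass
    weights[i] at points[i]: the sum of weights[i] * dist(points[i], x0)**p."""
    return sum(w * dist(x, x0) ** p for x, w in zip(points, weights))

points = [(0.0, 0.0), (3.0, 4.0)]
weights = [0.5, 0.5]  # a probability measure on two points of the plane
print(pth_central_moment(points, weights, (0.0, 0.0), 2))  # 12.5
```

Swapping in a different metric, for example the Manhattan distance, changes the moment because the moment depends on *d*.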