# Wasserstein metric

*Vasershtein metric*

The "Wasserstein metric" has a colourful history with several quite different fields of applications. It also has various historical sources.

The term "Vasershtein distance" appeared for the first time in [a3]. For probability measures $P$, $Q$ (cf. also Probability measure) on a metric space $( U , d )$, L.N. Vasershtein [a10] had introduced the metric $\operatorname {l} _ { 1 } ( P , Q ) = \operatorname { inf } \{ \mathsf{E} d ( X , Y ) \}$, where the infimum is with respect to all random variables $X$, $Y$ with distributions $P$, $Q$. His work was very influential in ergodic theory in connection with generalizations of the Ornstein isomorphism theorem (see [a5]). In the English literature the Russian name was pronounced typically as "Wasserstein" and the notation $W ( P , Q )$ is common for $\mathbf{l} _ { 1 } ( P , Q )$.

The minimal $L_1$-metric $\mathbf{l}_{1}$ had been introduced and investigated already in 1940 by L.V. Kantorovich for compact metric spaces [a6]. This work was motivated by the classical Monge transportation problem. Subsequently, the transportation distance was generalized to general cost functionals; the special case $c ( x , y ) = d ^ { p } ( x , y )$ leads to the minimal $L _ { p }$-metric $\text{l} _ { p } ( P , Q ) = \operatorname { inf } \{ \| d ( X , Y ) \| _ { p } \}$, with $\| \cdot \| p$ the usual $L _ { p }$-norm. The famous Kantorovich–Rubinshtein theorem [a8] gives a dual representation of $\mathbf{l}_{1}$ in terms of a Lipschitz metric:

\begin{equation*} \operatorname{l} _ { 1 } ( P , Q ) = \operatorname{ sup } \left\{ \int f d ( P - Q ) : \operatorname { Lip } f \leq 1 \right\}. \end{equation*}

From this point of view, the notion of a Kantorovich metric or minimal $L_1$-metric (or minimal $L _ { p }$-metric) seems historically to be also appropriate.

In fact, in 1914, C. Gini, while introducing a "simple index of dissimilarity" , also defined the $\mathbf{l}_{1}$ metric in a discrete setting on the real line and T. Salvemini (the discrete case, [a9]) and G. Dall'Aglio (the general case, [a2]) proved the basic representation

\begin{equation*} \operatorname { l } _ { p } ^ { p } ( P , Q ) = \int _ { 0 } ^ { 1 } | F ^ { - 1 } ( u ) - G ^ { - 1 } ( u ) | ^ { p } d u , p \geq 1, \end{equation*}

where $F$, $G$ are the distribution functions of $P$, $Q$. Gini had given this formula for empirical distributions and $p = 1,2$. This influential work initiated a lot of work on measures with given marginals in the Italian School of probability, while M. Fréchet [a4] explicitly dealt with metric properties of these distances.

C.L. Mallows [a7] introduced the $I_2$-metric in a statistical context. He used its properties for proving a central limit theorem and proved the representation above. Based on Mallows work, P.J. Bickel and D.A. Freedman [a1] described topological properties and investigated applications to statistical problems such as the bootstrap (cf. also Bootstrap method). They introduced the notion of a Mallows metric for $I_2$. This notion is used mainly in the statistics literature and in some literature on algorithms.

So, the minimal $L _ { p }$-metric $\text{I} _ { p }$ was invented historically several times from different perspectives. Maybe historically the name Gini–Dall'Aglio–Kantorovich–Vasershtein–Mallows metric would be correct for this class of metrics. For simplicity reasons it seems preferable to use the notion of a minimal $L _ { p }$-metric, and write it as $\mathbf{l} _ { p } ( P , Q )$.

#### References

[a1] | P.J. Bickel, D.A. Freedman, "Some asymptotic theory for the bootstrap" Ann. Statist. , 9 (1981) pp. 1196–1217 |

[a2] | G. Dall'Aglio, "Sugli estremi dei momenti delle funzioni di ripartizione doppia" Ann. Scuola Norm. Sup. Pisa Cl. Sci. , 3 : 1 (1956) pp. 33–74 |

[a3] | R.L. Dobrushin, "Definition of a system of random variables by conditional distributions" Teor. Verojatnost. i Primenen. , 15 (1970) pp. 469–497 (In Russian) |

[a4] | M. Fréchet, "Les tableaux de corréllation dont les marges sont données" Ann. Univ. Lyon Ser. A , 20 (1957) pp. 13–31 |

[a5] | R.M. Gray, D.L. Neuhoff, P.C. Shields, "A generalization to Ornstein's $d$-distance with applications to information theory" Ann. Probab. , 3 (1975) pp. 315–328 |

[a6] | L.V. Kantorovich, "On one effective method of solving certain classes of extremal problems" Dokl. Akad. Nauk USSR , 28 (1940) pp. 212–215 |

[a7] | C.L. Mallows, "A note on asymptotic joint normality" Ann. Math. Stat. , 43 (1972) pp. 508–515 |

[a8] | L.V. Kantorovich, G.Sh. Rubinstein, "On a space of completely additive functions" Vestnik Leningrad Univ., Ser. Mat. Mekh. i Astron. , 13 : 7 (1958) pp. 52–59 (In Russian) |

[a9] | T. Salvemini, "Sul calcolo degli indici di concordanza tra due caratteri quantitativi" Atti della VI Riunione della Soc. Ital. di Statistica, Roma (1943) |

[a10] | L.N. Wasserstein, "Markov processes over denumerable products of spaces describing large systems of automata" Probl. Inform. Transmission , 5 (1969) pp. 47–52 |

**How to Cite This Entry:**

Wasserstein metric.

*Encyclopedia of Mathematics.*URL: http://encyclopediaofmath.org/index.php?title=Wasserstein_metric&oldid=50083