Misplaced Pages

Kolmogorov–Smirnov test


The Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous (or discontinuous, see Section 2.2), one-dimensional probability distributions that can be used to test whether a sample came from a given reference probability distribution (one-sample K–S test), or to test whether two samples came from the same distribution (two-sample K–S test). Intuitively, the test provides a method to qualitatively answer the question "How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?" or, in the second case, "How likely is it that we would see two sets of samples like this if they were drawn from the same (but unknown) probability distribution?". It is named after Andrey Kolmogorov and Nikolai Smirnov.


The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution (in

A function called a metric or distance function. Metric spaces are the most general setting for studying many of the concepts of mathematical analysis and geometry. The most familiar example of a metric space is 3-dimensional Euclidean space with its usual notion of distance. Other well-known examples are a sphere equipped with the angular distance and the hyperbolic plane. A metric may correspond to

A goodness of fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic (see Test with estimated parameters). Various studies have found that, even in this corrected form,

A metaphorical, rather than physical, notion of distance: for example, the set of 100-character Unicode strings can be equipped with the Hamming distance, which measures the number of characters that need to be changed to get from one string to another. Since they are very general, metric spaces are a tool used in many different branches of mathematics. Many types of mathematical objects have
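The Hamming distance described above is easy to state in code; a minimal sketch (the example strings are illustrative, not from the article):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(c1 != c2 for c1, c2 in zip(a, b))

# "karolin" and "kathrin" differ at positions 2, 3 and 4
d = hamming("karolin", "kathrin")
```

Because it counts differing positions, the function is symmetric, zero exactly on equal strings, and satisfies the triangle inequality, so it is a genuine metric on strings of a fixed length.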

A metric space is an ordered pair (M, d) where M is a set and d is a metric on M, i.e., a function d : M × M → ℝ satisfying the following axioms for all points x, y, z ∈ M: If

A "structure-preserving" map is one that fully preserves the distance function: It follows from the metric space axioms that a distance-preserving function is injective. A bijective distance-preserving function is called an isometry. One perhaps non-obvious example of an isometry between spaces described in this article is the map f : (ℝ², d1) → (ℝ², d∞) defined by f(x, y) = (x + y, x − y). If there
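That this map really is an isometry follows from the identity max(|a + b|, |a − b|) = |a| + |b|; a quick numerical spot-check (random test points are an assumption of this sketch):

```python
import random

def d1(p, q):
    """Taxicab metric on the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def dinf(p, q):
    """Chebyshev (maximum) metric on the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def f(p):
    """The map (x, y) -> (x + y, x - y)."""
    return (p[0] + p[1], p[0] - p[1])

random.seed(1)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(50)]
# dinf(f(p), f(q)) should equal d1(p, q) for every pair
ok = all(abs(dinf(f(p), f(q)) - d1(p, q)) < 1e-12 for p in pts for q in pts)
```

The check passes on every pair because the identity is exact, not approximate; the tolerance only absorbs floating-point rounding.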

A broader and more flexible way. This was important for the growing field of functional analysis. Mathematicians like Hausdorff and Stefan Banach further refined and expanded the framework of metric spaces. Hausdorff introduced topological spaces as a generalization of metric spaces. Banach's work in functional analysis heavily relied on the metric structure. Over time, metric spaces became

A central part of modern mathematics. They have influenced various fields including topology, geometry, and applied mathematics. Metric spaces continue to play a crucial role in the study of abstract mathematical concepts. A distance function is enough to define notions of closeness and convergence that were first developed in real analysis. Properties that depend on the structure of

A certain interpretation all statements of classical formal logic can be formulated as those of intuitionistic logic. In 1929, Kolmogorov earned his Doctor of Philosophy degree from Moscow State University. In 1929, Kolmogorov and Alexandrov, during a long trip, stayed about a month on an island in Lake Sevan in Armenia. In 1930, Kolmogorov went on his first long trip abroad, traveling to Göttingen and Munich and then to Paris. He had various scientific contacts in Göttingen, first with Richard Courant and his students working on limit theorems, where diffusion processes proved to be

A characterization of metrizability in terms of other topological properties, without reference to metrics. Convergence of sequences in Euclidean space is defined as follows: Convergence of sequences in a topological space is defined as follows: In metric spaces, both of these definitions make sense and they are equivalent. This is a general pattern for topological properties of metric spaces: while they can be defined in

A critical value of the test statistic Dα such that P(Dn > Dα) = α, then a band of width ±Dα around Fn(x) will entirely contain F(x) with probability 1 − α. A distribution-free multivariate Kolmogorov–Smirnov goodness of fit test has been proposed by Justel, Peña and Zamar (1997). The test uses a statistic which is built using Rosenblatt's transformation, and an algorithm
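The band construction can be sketched directly. This is a minimal illustration using the asymptotic approximation Dα ≈ sqrt(ln(2/α) / (2n)) for the critical value (that approximation, and the synthetic sample, are assumptions of the sketch, not part of the article):

```python
import math
import numpy as np

def ks_band(sample, alpha=0.05):
    """Asymptotic 1 - alpha confidence band for the true CDF:
    ECDF +/- D_alpha, with D_alpha ~ sqrt(ln(2/alpha) / (2n))."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    ecdf = np.arange(1, n + 1) / n
    d_alpha = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    lower = np.clip(ecdf - d_alpha, 0.0, 1.0)  # CDFs live in [0, 1]
    upper = np.clip(ecdf + d_alpha, 0.0, 1.0)
    return x, lower, upper

# Hypothetical sample: 100 evenly spaced points in [0, 1]
x, lo, hi = ks_band(np.linspace(0.0, 1.0, 100), alpha=0.05)
```

With probability roughly 1 − α, the true F(x) lies between `lo` and `hi` at every observed point simultaneously, which is what distinguishes this band from pointwise confidence intervals.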


A fellow student of Luzin; indeed, several researchers have concluded that the two friends were involved in a homosexual relationship, although neither acknowledged this openly during their lifetimes. Kolmogorov (together with Aleksandr Khinchin) became interested in probability theory. Also in 1925, he published his work in intuitionistic logic, "On the principle of the excluded middle," in which he proved that under

A fit with minimum KS. In this case we should reject H0, which is often the case with MLE, because the sample standard deviation might be very large for T-2 data, but with KS minimization we may still get a KS too low to reject H0. In the Student-T case, a modified KS test with a KS estimate instead of the MLE makes the KS test indeed slightly worse. However, in other cases such a modified KS test leads to slightly better test power. Under

A large bias error on sigma. Using a moment fit or KS minimization instead has a large impact on the critical values, and also some impact on test power. If we need to decide for Student-T data with df = 2 via a KS test whether the data could be normal or not, then an ML estimate based on H0 (data is normal, so using the standard deviation for scale) would give a much larger KS distance than

A metric space are referred to as metric properties. Every metric space is also a topological space, and some metric properties can also be rephrased without reference to distance in the language of topology; that is, they are really topological properties. For any point x in a metric space M and any real number r > 0, the open ball of radius r around x is defined to be

A metric space by measuring distances the same way we would in M. Formally, the induced metric on A is a function dA : A × A → ℝ defined by dA(x, y) = d(x, y). For example, if we take

A natural notion of distance and therefore admit the structure of a metric space, including Riemannian manifolds, normed vector spaces, and graphs. In abstract algebra, the p-adic numbers arise as elements of the completion of a metric structure on the rational numbers. Metric spaces are also studied in their own right in metric geometry and analysis on metric spaces. Many of

A particular interpretation of Hilbert's thirteenth problem. Around this time he also began to develop, and has since been considered a founder of, algorithmic complexity theory, often referred to as Kolmogorov complexity theory. Kolmogorov married Anna Dmitrievna Egorova in 1942. He pursued a vigorous teaching routine throughout his life, both at the university level and also with younger children, as he

A purely topological way, there is often a way that uses the metric which is easier to state or more familiar from real analysis. Informally, a metric space is complete if it has no "missing points": every sequence that looks like it should converge to something actually converges. To make this precise: a sequence (xn) in a metric space M is Cauchy if for every ε > 0 there

A reputation for his wide-ranging erudition. While an undergraduate student in college, he attended the seminars of the Russian historian S. V. Bakhrushin, and he published his first research paper on the fifteenth and sixteenth centuries' landholding practices in the Novgorod Republic. During the same period (1921–22), Kolmogorov worked out and proved several results in set theory and in

A special case of this for the normal distribution. The logarithm transformation may help to overcome cases where the Kolmogorov test data does not seem to fit the assumption that it came from the normal distribution. Using estimated parameters, the question arises which estimation method should be used. Usually this would be the maximum likelihood method, but e.g. for the normal distribution MLE has


A totally unacceptable 7% when n = 10. However, a very simple expedient of replacing x by a corrected value in the argument of the Jacobi theta function reduces these errors to 0.003%, 0.027%, and 0.27% respectively; such accuracy would usually be considered more than adequate for all practical applications. The goodness-of-fit test or

A way of measuring distances between them. Taking the completion of this metric space gives a new set of functions which may be less nice, but nevertheless useful because they behave similarly to the original nice functions in important ways. For example, weak solutions to differential equations typically live in a completion (a Sobolev space) rather than the original space of nice functions for which

is uniformly continuous if for every real number ε > 0 there exists δ > 0 such that for all points x and y in M1 with d1(x, y) < δ, we have d2(f(x), f(y)) < ε. The only difference between this definition and

is K-Lipschitz if d2(f(x), f(y)) ≤ K·d1(x, y) for all x, y ∈ M1. Lipschitz maps are particularly important in metric geometry, since they provide more flexibility than distance-preserving maps, but still make essential use of

is Lebesgue's number lemma, which shows that for any open cover of a compact space, every point is relatively deep inside one of the sets of the cover. Unlike in the case of topological spaces or algebraic structures such as groups or rings, there is no single "right" type of structure-preserving function between metric spaces. Instead, one works with different types of functions depending on one's goals. Throughout this section, suppose that (M1, d1) and (M2, d2) are two metric spaces. The words "function" and "map" are used interchangeably. One interpretation of

is not a topological property, since ℝ is complete but the homeomorphic space (0, 1) is not. This notion of "missing points" can be made precise. In fact, every metric space has a unique completion, which is a complete space that contains the given space as a dense subset. For example, [0, 1] is the completion of (0, 1), and

is (e.g., whether it is normal or not). Again, tables of critical values have been published. A shortcoming of the univariate Kolmogorov–Smirnov test is that it is not very powerful, because it is devised to be sensitive against all possible types of differences between two distribution functions. Some argue that the Cucconi test, originally proposed for simultaneously comparing location and scale, can be much more powerful than

is 1. Fast and accurate algorithms to compute the cdf Pr(Dn ≤ x) or its complement for arbitrary n and x are available from: If either the form or the parameters of F(x) are determined from

is a neighborhood of x (informally, it contains all points "close enough" to x) if it contains an open ball of radius r around x for some r > 0. An open set is a set which is a neighborhood of all its points. It follows that the open balls form a base for a topology on M. In other words, the open sets of M are exactly the unions of open balls. As in any topology, closed sets are

is a continuous bijection whose inverse is also continuous; if there is a homeomorphism between M1 and M2, they are said to be homeomorphic. Homeomorphic spaces are the same from the point of view of topology, but may have very different metric properties. For example, ℝ is unbounded and complete, while (0, 1) is bounded but not complete. A function f : M1 → M2

Kolmogorov–Smirnov test - Misplaced Pages Continue

is an integer N such that for all m, n > N, d(xm, xn) < ε. By the triangle inequality, any convergent sequence is Cauchy: if xm and xn are both less than ε away from the limit, then they are less than 2ε away from each other. If the converse is true (every Cauchy sequence in M converges), then M is complete. Euclidean spaces are complete, as is ℝ² with

is an isometry between the spaces M1 and M2, they are said to be isometric. Metric spaces that are isometric are essentially identical. On the other end of the spectrum, one can forget entirely about the metric structure and study continuous maps, which only preserve topological structure. There are several equivalent definitions of continuity for metric spaces. The most important are: A homeomorphism

is bounded. To see this, start with a finite cover by r-balls for some arbitrary r. Since the subset of M consisting of the centers of these balls is finite, it has finite diameter, say D. By the triangle inequality, the diameter of the whole space is at most D + 2r. The converse does not hold: an example of a metric space that is bounded but not totally bounded is ℝ² (or any other infinite set) with

is defined as Fn(x) = (1/n) × (number of observations Xi ≤ x). The Kolmogorov–Smirnov statistic for a given cumulative distribution function F(x) is Dn = sup x |Fn(x) − F(x)|, where sup x is the supremum of the set of distances. Intuitively, the statistic takes the largest absolute difference between the two distribution functions across all x values. By the Glivenko–Cantelli theorem, if the sample comes from distribution F(x), then Dn converges to 0 almost surely in
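The statistic Dn = sup |Fn − F| can be computed exactly by checking both sides of each ECDF jump, since the supremum of the difference between a step function and a continuous CDF is attained at a jump. A minimal sketch (the uniform reference distribution in the example is an assumption):

```python
import numpy as np

def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)|, evaluated on both sides of
    each jump of the empirical distribution function."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    F = cdf(x)                                   # reference CDF at data points
    d_plus = np.max(np.arange(1, n + 1) / n - F)  # ECDF above F
    d_minus = np.max(F - np.arange(0, n) / n)     # ECDF below F
    return max(d_plus, d_minus)

# Example against the uniform [0, 1] CDF (hypothetical data)
rng = np.random.default_rng(0)
d = ks_statistic(rng.uniform(size=200), lambda t: np.clip(t, 0.0, 1.0))
```

Checking only F at the data points, without the `d_minus` term, would underestimate the supremum whenever the ECDF dips below the reference curve just before a jump.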

is defined by d1((x1, y1), (x2, y2)) = |x2 − x1| + |y2 − y1| and can be thought of as

is developed to compute it in the bivariate case. An approximate test that can be easily computed in any dimension is also presented. The Kolmogorov–Smirnov test statistic needs to be modified if a similar test is to be applied to multivariate data. This is not straightforward because the maximum difference between two joint cumulative distribution functions is not generally the same as the maximum difference of any of

is due to Peacock (see also Gosset for a 3D version) and another to Fasano and Franceschini (see Lopes et al. for a comparison and computational details). Critical values for the test statistic can be obtained by simulations, but depend on the dependence structure in the joint distribution. In one dimension, the Kolmogorov–Smirnov statistic is identical to the so-called star discrepancy D, so another native KS extension to higher dimensions would be simply to use D also for higher dimensions. Unfortunately,

is finite is not very impressive: even when n = 1000, the corresponding maximum error is about 0.9%; this error increases to 2.6% when n = 100 and to

is purely discrete or mixed, implemented in C++ and in the KSgeneral package of the R language. The functions disc_ks_test(), mixed_ks_test() and cont_ks_test() also compute the KS test statistic and p-values for purely discrete, mixed or continuous null distributions and arbitrary sample sizes. The KS test and its p-values for discrete null distributions and small sample sizes are also computed as part of

is rejected at level α if Dn,m > c(α) · sqrt((n + m)/(n·m)), where n and m are the sizes of the first and second sample respectively. The value of c(α) is given in the table below for the most common levels of α, and in general by c(α) = sqrt(−ln(α/2) / 2), so that


is the Brownian bridge. If F is continuous, then under the null hypothesis √n·Dn converges to the Kolmogorov distribution, which does not depend on F. This result may also be known as the Kolmogorov theorem. The accuracy of this limit as an approximation to the exact cdf of K when n
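The Kolmogorov distribution itself has the classical alternating series Pr(K ≤ x) = 1 − 2·Σ_{k≥1} (−1)^(k−1) e^(−2k²x²), which can be evaluated by truncation; a minimal sketch:

```python
import math

def kolmogorov_cdf(x: float, terms: int = 100) -> float:
    """Pr(K <= x) via the alternating series
    1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

p = kolmogorov_cdf(1.358)  # close to 0.95: the familiar 5% critical value
```

The series converges very fast for moderate x, so a hundred terms is far more than needed in practice; for x near zero the truncated sum converges more slowly, which is where the corrected approximations mentioned in the text matter.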

is used. One might require that the result of the test used should not depend on which choice is made. One approach to generalizing the Kolmogorov–Smirnov statistic to higher dimensions which meets the above concern is to compare the cdfs of the two samples with all possible orderings, and take the largest of the set of resulting KS statistics. In d dimensions, there are 2^d − 1 such orderings. One such variation
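A rough two-dimensional sketch of this idea (2² − 1 = 3 orderings) follows; this is an illustration of the max-over-orderings principle, not the exact published algorithm of Peacock or Fasano–Franceschini:

```python
import numpy as np

def ks2d_two_sample(a, b):
    """Rough 2-D two-sample KS sketch: for each of the 2^2 - 1 = 3
    quadrant orderings, compare empirical probabilities between the
    two samples at every data point, keeping the largest gap."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pts = np.vstack([a, b])
    orderings = [(np.less_equal, np.less_equal),
                 (np.less_equal, np.greater),
                 (np.greater, np.less_equal)]
    d = 0.0
    for (x, y) in pts:
        for fx, fy in orderings:
            pa = np.mean(fx(a[:, 0], x) & fy(a[:, 1], y))
            pb = np.mean(fx(b[:, 0], x) & fy(b[:, 1], y))
            d = max(d, abs(pa - pb))
    return d
```

Identical samples give a statistic of exactly 0, and completely separated samples give 1; as the text notes, critical values for such statistics depend on the dependence structure of the joint distribution and are usually obtained by simulation.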

the Cold War. In 1939, he was elected a full member (academician) of the USSR Academy of Sciences. During World War II Kolmogorov contributed to the Soviet war effort by applying statistical theory to artillery fire, developing a scheme of stochastic distribution of barrage balloons intended to help protect Moscow from German bombers during the Battle of Moscow. In his study of stochastic processes, especially Markov processes, Kolmogorov and

the Great Purge in 1936, Kolmogorov's doctoral advisor Nikolai Luzin became a high-profile target of Stalin's regime, in what is now called the "Luzin Affair." Kolmogorov and several other students of Luzin testified against Luzin, accusing him of plagiarism, nepotism, and other forms of misconduct; the hearings eventually concluded that he was a servant to "fascistoid science" and thus an enemy of

the Heine–Cantor theorem states that if M1 is compact, then every continuous map is uniformly continuous. In other words, uniform continuity cannot distinguish any non-topological features of compact metric spaces. A Lipschitz map is one that stretches distances by at most a bounded factor. Formally, given a real number K > 0, the map f : M1 → M2

the surface of the Earth as a set of points. We can measure the distance between two such points by the length of the shortest path along the surface, "as the crow flies"; this is particularly useful for shipping and aviation. We can also measure the straight-line distance between two points through the Earth's interior; this notion is, for example, natural in seismology, since it roughly corresponds to

the two-sample test can also be performed under more general conditions that allow for discontinuity, heterogeneity and dependence across samples. The two-sample K–S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples. The Kolmogorov–Smirnov test can be modified to serve as

the 1990s and other surviving testimonies, that the students of Luzin had initiated the accusations against Luzin out of personal acrimony; there was no definitive evidence that the students were coerced by the state, nor was there any definitive evidence to support their allegations of academic misconduct. Soviet historian of mathematics A. P. Yushkevich surmised that, unlike many of the other high-profile persecutions of

the British mathematician Sydney Chapman independently developed a pivotal set of equations in the field that have been given the name of the Chapman–Kolmogorov equations. Later, Kolmogorov focused his research on turbulence, beginning his publications in 1941. In classical mechanics, he is best known for the Kolmogorov–Arnold–Moser theorem, first presented in 1954 at the International Congress of Mathematicians. In 1957, working jointly with his student Vladimir Arnold, he solved

the Euclidean metric and its subspace the interval (0, 1) with the induced metric are homeomorphic but have very different metric properties. Conversely, not every topological space can be given a metric. Topological spaces which are compatible with a metric are called metrizable and are particularly well-behaved in many ways: in particular, they are paracompact Hausdorff spaces (hence normal) and first-countable. The Nagata–Smirnov metrization theorem gives


the Kolmogorov–Smirnov statistic is Dn,m = sup x |F1,n(x) − F2,m(x)|, where F1,n and F2,m are the empirical distribution functions of the first and the second sample respectively, and sup is the supremum function. For large samples, the null hypothesis

the Kolmogorov–Smirnov test can be constructed by using the critical values of the Kolmogorov distribution. This test is asymptotically valid when n → ∞. It rejects the null hypothesis at level α if √n·Dn > Kα, where Kα is found from Pr(K ≤ Kα) = 1 − α. The asymptotic power of this test

the Kolmogorov–Smirnov test when comparing two distribution functions. Two-sample KS tests have been applied in economics to detect asymmetric effects and to study natural experiments. While the Kolmogorov–Smirnov test is usually used to test whether a given F(x) is the underlying probability distribution of Fn(x), the procedure may be inverted to give confidence limits on F(x) itself. If one chooses

the Soviet people. Luzin lost his academic positions, but curiously he was neither arrested nor expelled from the Academy of Sciences of the Soviet Union. The question of whether Kolmogorov and others were coerced into testifying against their teacher remains a topic of considerable speculation among historians; all parties involved refused to publicly discuss the case for the rest of their lives. Soviet-Russian mathematician Semën Samsonovich Kutateladze concluded in 2013, after reviewing archival documents made available during

the age of five he noticed the regularity in the sum of the series of odd numbers: 1 = 1²; 1 + 3 = 2²; 1 + 3 + 5 = 3², etc. In 1910, his aunt adopted him, and they moved to Moscow, where he graduated from high school in 1920. Later that same year, Kolmogorov began to study at Moscow State University and at

the assumption that F(x) is non-decreasing and right-continuous, with a countable (possibly infinite) number of jumps, the KS test statistic can be expressed as: From the right-continuity of F(x), it follows that F(F⁻¹(t)) ≥ t and F⁻¹(F(x)) ≤ x, and hence,

the basic notions of mathematical analysis, including balls, completeness, as well as uniform, Lipschitz, and Hölder continuity, can be defined in the setting of metric spaces. Other notions, such as continuity, compactness, and open and closed sets, can be defined for metric spaces, but also in the even more general setting of topological spaces. To see the utility of different notions of distance, consider

the complementary distribution functions. Thus the maximum difference will differ depending on which of Pr(X < x ∧ Y < y) or Pr(X < x ∧ Y > y) or any of the other two possible arrangements

the complements of open sets. Sets may be both open and closed as well as neither open nor closed. This topology does not carry all the information about the metric space. For example, the distances d1, d2, and d∞ defined above all induce the same topology on ℝ², although they behave differently in many respects. Similarly, ℝ with

the condition reads Dn,m > sqrt(−ln(α/2) · (n + m)/(2·n·m)). Here, again, the larger the sample sizes, the more sensitive the minimal bound: for a given ratio of sample sizes (e.g. m = n), the minimal bound scales with the size of either of the samples according to its inverse square root. Note that the two-sample test checks whether the two data samples come from the same distribution. This does not specify what that common distribution
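The large-sample threshold and its inverse-square-root scaling can be checked in a few lines (the formula c(α) = sqrt(−ln(α/2)/2) used here is the standard large-sample approximation, stated as an assumption of the sketch):

```python
import math

def two_sample_ks_bound(n: int, m: int, alpha: float = 0.05) -> float:
    """Large-sample rejection threshold for the two-sample KS test:
    c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    c = math.sqrt(-math.log(alpha / 2.0) / 2.0)
    return c * math.sqrt((n + m) / (n * m))

# For m = n the bound is c(alpha) * sqrt(2 / n), so quadrupling the
# sample size halves the threshold:
b100 = two_sample_ks_bound(100, 100)
b400 = two_sample_ks_bound(400, 400)
```

At α = 0.05 the constant c(α) is about 1.358, matching the familiar one-sample 5% critical value of the Kolmogorov distribution.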


the creation of modern probability theory. He also contributed to the mathematics of topology, intuitionistic logic, turbulence, classical mechanics, algorithmic information theory and computational complexity. Andrey Kolmogorov was born in Tambov, about 500 kilometers southeast of Moscow, in 1903. His unmarried mother, Maria Yakovlevna Kolmogorova, died giving birth to him. Andrey

the data Xi, the critical values determined in this way are invalid. In such cases, Monte Carlo or other methods may be required, but tables have been prepared for some cases. Details for the required modifications to the test statistic and for the critical values for the normal distribution and the exponential distribution have been published, and later publications also include the Gumbel distribution. The Lilliefors test represents

the dgof package of the R language. Major statistical packages, among which SAS PROC NPAR1WAY and Stata ksmirnov, implement the KS test under the assumption that F(x) is continuous, which is more conservative if the null distribution is actually not continuous. The Kolmogorov–Smirnov test may also be used to test whether two underlying one-dimensional probability distributions differ. In this case,
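In Python, both variants are available in SciPy (assuming SciPy is installed; the synthetic normal samples below are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=300)   # sample from N(0, 1)
y = rng.normal(0.5, 1.0, size=300)   # shifted sample

one_sample = stats.kstest(x, "norm")  # x against the standard normal CDF
two_sample = stats.ks_2samp(x, y)     # x against y
```

Each call returns the KS statistic and a p-value; note that `kstest` against a named distribution assumes fully specified parameters, so estimating them from the data (as discussed above) invalidates the tabulated critical values.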

the differential equation actually makes sense. A metric space M is bounded if there is an r such that no pair of points in M is more than distance r apart. The least such r is called the diameter of M. The space M is called precompact or totally bounded if for every r > 0 there is a finite cover of M by open balls of radius r. Every totally bounded space

the discrete metric no longer remembers that the set is a plane, but treats it just as an undifferentiated set of points. All of these metrics make sense on ℝⁿ as well as ℝ². Given a metric space (M, d) and a subset A ⊆ M, we can consider A to be

the discrete metric. Compactness is a topological property which generalizes the properties of a closed and bounded subset of Euclidean space. There are several equivalent definitions of compactness in metric spaces: One example of a compact space is the closed interval [0, 1]. Compactness is important for similar reasons to completeness: it makes it easy to find limits. Another important tool

the distance function d(x, y) = |y − x| given by the absolute difference form a metric space. Many properties of metric spaces and functions between them are generalizations of concepts in real analysis and coincide with those concepts when applied to the real line. The Euclidean plane ℝ² can be equipped with many different metrics. The Euclidean distance familiar from school mathematics can be defined by d2((x1, y1), (x2, y2)) = √((x2 − x1)² + (y2 − y1)²). The taxicab or Manhattan distance

the distance you need to travel along horizontal and vertical lines to get from one point to the other, as illustrated at the top of the article. The maximum, L∞, or Chebyshev distance is defined by d∞((x1, y1), (x2, y2)) = max{|x2 − x1|, |y2 − y1|}. This distance does not have an easy explanation in terms of paths in

the distribution of Dn depends on the null distribution F(x), i.e., it is no longer distribution-free as in the continuous case. Therefore, a fast and accurate method has been developed to compute the exact and asymptotic distribution of Dn when F(x)

the era, Stalin did not personally initiate the persecution of Luzin and instead eventually concluded that he was not a threat to the regime, which would explain the unusually mild punishment relative to other contemporaries. In a 1938 paper, Kolmogorov "established the basic theorems for smoothing and predicting stationary stochastic processes", a paper that had major military applications during

the field of non-Euclidean geometry through the use of the Cayley–Klein metric. The idea of an abstract space with metric properties was addressed in 1906 by René Maurice Fréchet, and the term metric space was coined by Felix Hausdorff in 1914. Fréchet's work laid the foundation for understanding convergence, continuity, and other key concepts in non-geometric spaces. This allowed mathematicians to study functions and sequences in

the form of the Kolmogorov–Smirnov test statistic and its asymptotic distribution under the null hypothesis were published by Andrey Kolmogorov, while a table of the distribution was published by Nikolai Smirnov. Recurrence relations for the distribution of the test statistic in finite samples are available. Under the null hypothesis that the sample comes from the hypothesized distribution F(x), √n·Dn converges to sup t |B(F(t))| in distribution, where B(t)

the formula d∞(p, q) ≤ d2(p, q) ≤ d1(p, q) ≤ 2·d∞(p, q), which holds for every pair of points p, q ∈ ℝ². A radically different distance can be defined by setting d(p, q) = 0 if p = q, and d(p, q) = 1 otherwise. Using Iverson brackets, d(p, q) = [p ≠ q]. In this discrete metric, all distinct points are 1 unit apart: none of them are close to each other, and none of them are very far away from each other either. Intuitively,
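The three plane metrics and the sandwiching inequality above are straightforward to spot-check numerically (random test points are an assumption of this sketch):

```python
import math
import random

def d1(p, q):    # taxicab / Manhattan
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d2(p, q):    # Euclidean
    return math.hypot(p[0] - q[0], p[1] - q[1])

def dinf(p, q):  # Chebyshev / maximum
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def ddisc(p, q): # discrete metric: [p != q]
    return 0.0 if p == q else 1.0

random.seed(0)
pairs = [((random.uniform(-9, 9), random.uniform(-9, 9)),
          (random.uniform(-9, 9), random.uniform(-9, 9))) for _ in range(200)]
# d_inf <= d_2 <= d_1 <= 2 * d_inf for every pair
sandwich = all(dinf(p, q) <= d2(p, q) <= d1(p, q) <= 2 * dinf(p, q) + 1e-12
               for p, q in pairs)
```

The small tolerance on the last comparison only guards against floating-point rounding where d1 and 2·d∞ are nearly equal (i.e., when the two coordinate differences have almost the same magnitude).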

The length of time it takes for seismic waves to travel between those two points. The notion of distance encoded by the metric space axioms has relatively few requirements. This generality gives metric spaces a lot of flexibility. At the same time, the notion is strong enough to encode many intuitive facts about what distance means. This means that general results about metric spaces can be applied in many different contexts. Like many fundamental mathematical concepts,

The limit when {\displaystyle n} goes to infinity. Kolmogorov strengthened this result by effectively providing the rate of this convergence (see Kolmogorov distribution). Donsker's theorem provides a yet stronger result. In practice, the statistic requires a relatively large number of data points (in comparison to other goodness-of-fit criteria such as the Anderson–Darling test statistic) to properly reject

The limiting distribution does not depend on the marginal distributions. The Kolmogorov–Smirnov test is implemented in many software programs. Most of these implement both the one-sample and two-sample tests. Metric (mathematics) In mathematics, a metric space is a set together with a notion of distance between its elements, usually called points. The distance is measured by

The limits of discrete random processes, then with Hermann Weyl in intuitionistic logic, and lastly with Edmund Landau in function theory. His pioneering work About the Analytical Methods of Probability Theory was published (in German) in 1931. Also in 1931, he became a professor at Moscow State University . In 1933, Kolmogorov published his book Foundations of the Theory of Probability , laying

The metric d is unambiguous, one often refers by abuse of notation to "the metric space M". By taking all axioms except the second, one can show that distance is always non-negative: {\displaystyle 0=d(x,x)\leq d(x,y)+d(y,x)=2d(x,y)} Therefore

The metric on a metric space can be interpreted in many different ways. A particular metric may not be best thought of as measuring physical distance, but, instead, as the cost of changing from one state to another (as with Wasserstein metrics on spaces of measures ) or the degree of difference between two objects (for example, the Hamming distance between two strings of characters, or the Gromov–Hausdorff distance between metric spaces themselves). Formally,
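As a concrete instance of a non-geometric metric, the Hamming distance mentioned above counts the positions at which two equal-length strings differ. A minimal sketch (`hamming` is a hypothetical helper name):

```python
def hamming(s, t):
    """Hamming distance: the number of positions at which two
    equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(s, t))

# "karolin" and "kathrin" differ in three positions.
dist = hamming("karolin", "kathrin")  # -> 3
```

It is straightforward to check that this satisfies all the metric axioms on the set of strings of a fixed length.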

The metric. For example, a curve in a metric space is rectifiable (has finite length) if and only if it has a Lipschitz reparametrization. Andrey Kolmogorov Andrey Nikolaevich Kolmogorov (Russian: Андре́й Никола́евич Колмого́ров , IPA: [ɐnˈdrʲej nʲɪkɐˈlajɪvʲɪtɕ kəlmɐˈɡorəf] , 25 April 1903 – 20 October 1987) was a Soviet mathematician who played a central role in

The modern axiomatic foundations of probability theory and establishing his reputation as the world's leading expert in this field. In 1935, Kolmogorov became the first chairman of the department of probability theory at Moscow State University. Around the same time (1936), Kolmogorov contributed to the field of ecology and generalized the Lotka–Volterra model of predator–prey systems. During

The null hypothesis. The Kolmogorov distribution is the distribution of the random variable {\displaystyle K=\sup _{t\in [0,1]}|B(t)|,} where B ( t ) is the Brownian bridge. The cumulative distribution function of K is given by {\displaystyle \operatorname {Pr} (K\leq x)=1-2\sum _{k=1}^{\infty }(-1)^{k-1}e^{-2k^{2}x^{2}},} which can also be expressed by the Jacobi theta function {\displaystyle \vartheta _{01}(z=0;\tau =2ix^{2}/\pi )}. Both
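The alternating-series form of this CDF, Pr(K ≤ x) = 1 − 2 Σ_{k≥1} (−1)^{k−1} e^{−2k²x²}, converges very quickly and is easy to evaluate numerically. A sketch (`kolmogorov_cdf` is a hypothetical name, not a library function):

```python
import math

def kolmogorov_cdf(x, terms=100):
    """CDF of the Kolmogorov distribution:
    Pr(K <= x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2).
    The series alternates and its terms decay like exp(-2 k^2 x^2),
    so a modest number of terms gives full double precision for x
    that is not extremely small."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

# The familiar 5% critical value: Pr(K <= 1.36) is approximately 0.95.
p = kolmogorov_cdf(1.36)
```

This is how the asymptotic p-value of the one-sample test is obtained from the scaled statistic √n·D_n for large n.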

The one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In the one-sample case, the distribution considered under the null hypothesis may be continuous (see Section 2 ), purely discrete or mixed (see Section 2.2 ). In the two-sample case (see Section 3 ), the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted. However,

The other metrics described above. Two examples of spaces which are not complete are (0, 1) and the rationals, each with the metric induced from R {\displaystyle \mathbb {R} } . One can think of (0, 1) as "missing" its endpoints 0 and 1. The rationals are missing all the irrationals, since any irrational has a sequence of rationals converging to it in R {\displaystyle \mathbb {R} } (for example, its successive decimal approximations). These examples show that completeness

The plane, but it still satisfies the metric space axioms. It can be thought of similarly to the number of moves a king would have to make on a chess board to travel from one point to another on the given space. In fact, these three distances, while they have distinct properties, are similar in some ways. Informally, points that are close in one are close in the others, too. This observation can be quantified with
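The king-move interpretation can be stated directly: assuming integer board coordinates, each move changes each coordinate by at most 1, so the minimum number of moves equals the Chebyshev distance. A sketch with a hypothetical helper name:

```python
def king_moves(p, q):
    """Minimum number of chess-king moves between squares p and q.
    A king changes each coordinate by at most 1 per move, so the
    answer is the Chebyshev distance max(|dx|, |dy|)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

# Diagonal moves cover both axes at once: (0,0) to (2,2) takes 2 moves,
# while (0,0) to (3,1) takes 3 (the larger coordinate difference).
moves = king_moves((0, 0), (3, 1))  # -> 3
```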

The real line. Arthur Cayley, in his article "On Distance", extended metric concepts beyond Euclidean geometry into domains bounded by a conic in a projective space. His distance was given by the logarithm of a cross-ratio. Any projectivity leaving the conic stable also leaves the cross-ratio constant, so isometries are implicit. This method provides models for elliptic geometry and hyperbolic geometry, and Felix Klein, in several publications, established

The real numbers are the completion of the rationals. Since complete spaces are generally easier to work with, completions are important throughout mathematics. For example, in abstract algebra, the p -adic numbers are defined as the completion of the rationals under a different metric. Completion is particularly common as a tool in functional analysis . Often one has a set of nice functions and

The research vessel Dmitri Mendeleev . He wrote a number of articles for the Great Soviet Encyclopedia . In his later years, he devoted much of his effort to the mathematical and philosophical relationship between probability theory in abstract and applied areas. Kolmogorov died in Moscow in 1987 and his remains were buried in the Novodevichy cemetery . A quotation attributed to Kolmogorov

The same time Mendeleev Moscow Institute of Chemistry and Technology . Kolmogorov writes about this time: "I arrived at Moscow University with a fair knowledge of mathematics. I knew in particular the beginning of set theory . I studied many questions in articles in the Encyclopedia of Brockhaus and Efron , filling out for myself what was presented too concisely in these articles." Kolmogorov gained

The second axiom can be weakened to {\textstyle {\text{If }}x\neq y{\text{, then }}d(x,y)\neq 0} and combined with the first to make {\textstyle d(x,y)=0\iff x=y}. The real numbers with

The set of points that are strictly less than distance r from x: {\displaystyle B_{r}(x)=\{y\in M:d(x,y)<r\}.} This is a natural way to define a set of points that are relatively close to x. Therefore, a set {\displaystyle N\subseteq M}
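A direct rendering of this definition (hypothetical helper `open_ball`); note the strict inequality, so points at distance exactly r are excluded:

```python
def open_ball(center, radius, points, dist):
    """Points of `points` lying in the open ball
    B_r(center) = {y : d(center, y) < r}."""
    return [y for y in points if dist(center, y) < radius]

# Example with the usual absolute-value metric on the real line:
pts = [0.0, 0.5, 0.9, 1.0, 1.5]
inside = open_ball(0.0, 1.0, pts, lambda a, b: abs(a - b))
# 1.0 is at distance exactly 1, so it is excluded: inside == [0.0, 0.5, 0.9]
```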

The star discrepancy is hard to calculate in high dimensions. In 2021, a functional form of the multivariate KS test statistic was proposed, which simplifies estimating the tail probabilities of the statistic needed for the statistical test. For the multivariate case, if F i is the i th continuous marginal from a probability distribution with k variables, then so

The test is less powerful for testing normality than the Shapiro–Wilk test or Anderson–Darling test. However, these other tests have their own disadvantages. For instance, the Shapiro–Wilk test is known not to work well in samples with many identical values. The empirical distribution function F n for n independent and identically distributed (i.i.d.) ordered observations X i
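The empirical distribution function is the right-continuous step function F_n(x) = (number of observations ≤ x) / n. A minimal sketch (`ecdf` is a hypothetical helper name):

```python
import bisect

def ecdf(sample):
    """Empirical distribution function F_n of an i.i.d. sample:
    F_n(x) = (number of observations <= x) / n.
    bisect_right counts observations <= x in the sorted sample,
    which makes the step function right-continuous."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

F = ecdf([3, 1, 2, 2])
# F jumps by 1/n at each distinct observation (2/n at the tied value 2):
# F(0.5) == 0.0, F(2) == 0.75, F(3) == 1.0
```

The one-sample KS statistic is then the largest vertical gap between this step function and the reference CDF.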

The theory of Fourier series . In 1922, Kolmogorov gained international recognition for constructing a Fourier series that diverges almost everywhere . Around this time, he decided to devote his life to mathematics . In 1925, Kolmogorov graduated from Moscow State University and began to study under the supervision of Nikolai Luzin . He formed a lifelong close friendship with Pavel Alexandrov ,

The two-dimensional sphere S as a subset of R 3 {\displaystyle \mathbb {R} ^{3}} , the Euclidean metric on R 3 {\displaystyle \mathbb {R} ^{3}} induces the straight-line metric on S described above. Two more useful examples are the open interval (0, 1) and the closed interval [0, 1] thought of as subspaces of

The ε–δ definition of continuity is the order of quantifiers: the choice of δ must depend only on ε and not on the point x . However, this subtle change makes a big difference. For example, uniformly continuous maps take Cauchy sequences in M 1 to Cauchy sequences in M 2 . In other words, uniform continuity preserves some metric properties which are not purely topological. On the other hand,

Was actively involved in developing a pedagogy for gifted children in literature, music, and mathematics. At Moscow State University, Kolmogorov occupied different positions, including head of several departments: probability, statistics, and random processes; and mathematical logic. He also served as the Dean of the Moscow State University Department of Mechanics and Mathematics. In 1971, Kolmogorov joined an oceanographic expedition aboard

Was presumed to have been killed in the Russian Civil War . Andrey Kolmogorov was educated in his aunt Vera's village school, and his earliest literary efforts and mathematical papers were printed in the school journal "The Swallow of Spring". Andrey (at the age of five) was the "editor" of the mathematical section of this journal. Kolmogorov's first mathematical discovery was published in this journal: at

Was raised by two of his aunts in Tunoshna (near Yaroslavl ) at the estate of his grandfather, a well-to-do nobleman . Little is known about Andrey's father. He was supposedly named Nikolai Matveyevich Katayev and had been an agronomist . Katayev had been exiled from Saint Petersburg to the Yaroslavl province after his participation in the revolutionary movement against the tsars . He disappeared in 1919 and
