An F-test is any statistical test used to compare the variances of two samples or the ratio of variances between multiple samples. The test statistic, a random variable F, is used to determine whether the tested data has an F-distribution under the true null hypothesis and the customary assumptions about the error term (ε). It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor in honour of Ronald Fisher, who initially developed the statistic as the variance ratio in the 1920s.
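As a minimal sketch of the simplest case described above (comparing the variances of two samples), the statistic is the ratio of the two unbiased sample variances, with n − 1 degrees of freedom for each sample. The function name `variance_ratio_f` and the sample data are illustrative assumptions, not part of the source.

```python
from statistics import variance


def variance_ratio_f(x, y):
    """Return (F, d1, d2) for the ratio of the sample variances of x and y."""
    # statistics.variance uses the unbiased (n - 1) denominator.
    f_stat = variance(x) / variance(y)
    return f_stat, len(x) - 1, len(y) - 1


# Hypothetical data chosen so the arithmetic is easy to check by hand.
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]    # sample variance 2.5
sample_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # sample variance 10.0
f_stat, d1, d2 = variance_ratio_f(sample_a, sample_b)
print(f_stat, d1, d2)
```

The resulting value would then be compared against the F-distribution with (d1, d2) degrees of freedom, for instance via a table of critical values.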
114-458: Common examples of the use of F-tests include the study of the following cases. The F-test is sensitive to non-normality. In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance), as
228-521: $\int_a^b x f(x)\,dx = \int_a^b \frac{x}{x^2+\pi^2}\,dx = \frac{1}{2}\ln\frac{b^2+\pi^2}{a^2+\pi^2}.$ The limit of this expression as
342-422: A significantly better fit to the data. One approach to this problem is to use an F-test. If there are n data points to estimate parameters of both models from, then one can calculate the F statistic, given by $F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(p_2 - p_1)}{\mathrm{RSS}_2/(n - p_2)}$, where RSS_i is the residual sum of squares of model i and p_i is its number of parameters. If the regression model has been calculated with weights, then replace RSS_i with χ², the weighted sum of squared residuals. Under
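The nested-model comparison can be sketched in a few lines. The example below assumes the standard form of the statistic, F = ((RSS_1 − RSS_2)/(p_2 − p_1)) / (RSS_2/(n − p_2)), and compares a constant-mean model (p_1 = 1) with a straight-line fit (p_2 = 2). The helper names and the data are hypothetical.

```python
def nested_f_statistic(rss1, rss2, p1, p2, n):
    """F statistic for a restricted model 1 nested within model 2."""
    if not (p1 < p2 < n):
        raise ValueError("need p1 < p2 < n")
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))


def rss_constant(y):
    """Residual sum of squares of the best constant (the sample mean)."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_line(x, y):
    """Residual sum of squares of the ordinary least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


# Hypothetical data: RSS_1 = 10 and RSS_2 = 3.6 for these points.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
rss1, rss2 = rss_constant(y), rss_line(x, y)
f_stat = nested_f_statistic(rss1, rss2, p1=1, p2=2, n=len(y))
print(rss1, rss2, f_stat)
```

The statistic would be referred to the F-distribution with (p_2 − p_1, n − p_2) degrees of freedom, here (1, 3).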
456-401: A weighted average of the x_i values, with weights given by their probabilities p_i. In the special case that all possible outcomes are equiprobable (that is, p_1 = ⋯ = p_k), the weighted average is given by the standard average. In the general case, the expected value takes into account the fact that some outcomes are more likely than others. Informally,
570-421: A → −∞ and b → ∞ does not exist: if the limits are taken so that a = −b, then the limit is zero, while if the constraint 2a = −b is taken, then the limit is ln(2). To avoid such ambiguities, in mathematical textbooks it is common to require that the given integral converges absolutely, with E[X] left undefined otherwise. However, measure-theoretic notions as given below can be used to give
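The dependence of the limit on how a and b go to infinity can be checked numerically. This sketch evaluates the closed form ½·ln((b² + π²)/(a² + π²)) of the truncated integral along the two paths mentioned above; the function name `truncated_integral` and the cutoff 10⁶ are illustrative choices.

```python
import math


def truncated_integral(a, b):
    """Closed form of the integral of x / (x^2 + pi^2) over [a, b]."""
    return 0.5 * math.log((b * b + math.pi ** 2) / (a * a + math.pi ** 2))


big = 1e6
symmetric = truncated_integral(-big, big)          # path a = -b
asymmetric = truncated_integral(-big, 2.0 * big)   # path 2a = -b
print(symmetric, asymmetric, math.log(2))
```

Along the symmetric path the value is zero, while along the second path it approaches ln(2), so no single limit exists.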
684-451: A bias tending towards 0 as the sample size tends towards infinity. Usually, the most important case is distributional robustness - robustness to breaking of the assumptions about the underlying distribution of the data. Classical statistical procedures are typically sensitive to "longtailedness" (e.g., when the distribution of the data has longer tails than the assumed normal distribution). This implies that they will be strongly affected by
798-433: A breakdown point of 0 (or finite-sample breakdown point of $1/n$) because we can make $\overline{x}$ arbitrarily large just by changing any of $x_1, \dots, x_n$. The higher the breakdown point of an estimator,
912-431: A breakdown point of 0, as a single large observation can throw it off. The median absolute deviation and interquartile range are robust measures of statistical dispersion , while the standard deviation and range are not. Trimmed estimators and Winsorised estimators are general methods to make statistics more robust. L-estimators are a general class of simple statistics, often robust, while M-estimators are
1026-490: A convex subset of the set of all finite signed measures on $\Sigma$. We want to estimate the parameter $\theta \in \Theta$ of a distribution $F$ in $A$. Let the functional $T : A \rightarrow \Gamma$ be
1140-521: A general class of robust statistics, and are now the preferred solution, though they can be quite involved to calculate. Gelman et al. in Bayesian Data Analysis (2004) consider a data set relating to speed-of-light measurements made by Simon Newcomb . The data sets for that book can be found via the Classic data sets page, and the book's website contains more information on the data. Although
1254-411: A mathematician, was provoked and determined to solve the problem once and for all. He began to discuss the problem in the famous series of letters to Pierre de Fermat . Soon enough, they both independently came up with a solution. They solved the problem in different computational ways, but their results were identical because their computations were based on the same fundamental principle. The principle
1368-488: A multidimensional random variable, i.e. a random vector X. It is defined component by component, as E[X]_i = E[X_i]. Similarly, one may define the expected value of a random matrix X with components X_ij by E[X]_ij = E[X_ij]. Consider a random variable X with a finite list x_1, ..., x_k of possible outcomes, each of which (respectively) has probability p_1, ..., p_k of occurring. The expectation of X
1482-413: A preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate. Most F -tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares . The test statistic in an F -test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that
1596-416: A random variable X is often denoted by E(X), E[X], or EX, with E also often stylized as $\mathbb{E}$ or E. The idea of the expected value originated in the middle of the 17th century from the study of the so-called problem of points, which seeks to divide the stakes in a fair way between two players, who have to end their game before it
1710-570: A real number $\mu$ if and only if the two surfaces in the $x$-$y$-plane, described by $x \leq \mu,\; 0 \leq y \leq F(x)$ or $x \geq \mu,\; F(x) \leq y \leq 1$ respectively, have
1824-432: A result of the outliers. The MAD is better behaved, and Qn is a little bit more efficient than MAD. This simple example demonstrates that when outliers are present, the standard deviation cannot be recommended as an estimate of scale. Traditionally, statisticians would manually screen data for outliers , and remove them, usually checking the source of the data to see whether the outliers were erroneously recorded. Indeed, in
1938-511: A single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F -test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons . The disadvantage of
2052-455: A small circle of mutual scientific friends in Paris about it. The Dutch mathematician Christiaan Huygens considered the problem of points in his book and presented a solution based on the same principle as the solutions of Pascal and Fermat. Huygens published his treatise on probability theory, "De ratiociniis in ludo aleæ", in 1657 (see Huygens (1657)), just after visiting Paris. The book extended
2166-517: A small univariate data set containing one modest and one large outlier. The estimated standard deviation will be grossly inflated by the large outlier. The result is that the modest outlier looks relatively normal. As soon as the large outlier is removed, the estimated standard deviation shrinks, and the modest outlier now looks unusual. This problem of masking gets worse as the complexity of the data increases. For example, in regression problems, diagnostic plots are used to identify outliers. However, it
2280-611: A systematic definition of E[X] for more general random variables X. All definitions of the expected value may be expressed in the language of measure theory. In general, if X is a real-valued random variable defined on a probability space (Ω, Σ, P), then the expected value of X, denoted by E[X], is defined as the Lebesgue integral $\operatorname{E}[X] = \int_{\Omega} X\,d\operatorname{P}.$ Despite
2394-507: A value in any given open interval is given by the integral of f over that interval. The expectation of X is then given by the integral $\operatorname{E}[X] = \int_{-\infty}^{\infty} x f(x)\,dx.$ A general and mathematically precise formulation of this definition uses measure theory and Lebesgue integration, and
2508-469: A variety of stylizations: the expectation operator can be stylized as E (upright), E (italic), or $\mathbb{E}$ (in blackboard bold), while a variety of bracket notations (such as E(X), E[X], and EX) are all used. Another popular notation is μ_X. ⟨X⟩, ⟨X⟩_av, and $\overline{X}$ are commonly used in physics. M(X)
2622-418: Is added to the dataset, and to test what happens when an extreme outlier replaces one of the existing data points, and then to consider the effect of multiple additions or replacements. The mean is not a robust measure of central tendency . If the dataset is, e.g., the values {2,3,5,6,9}, then if we add another datapoint with value -1000 or +1000 to the data, the resulting mean will be very different from
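The sensitivity of the mean to a single added observation can be illustrated directly with the values {2,3,5,6,9} from the text. The median is included purely as a point of comparison; it is an illustrative addition, not a claim taken from this passage.

```python
from statistics import mean, median

data = [2, 3, 5, 6, 9]

# Add one extreme observation at either end and recompute both statistics.
with_high = data + [1000]
with_low = data + [-1000]

print("original :", mean(data), median(data))
print("add +1000:", mean(with_high), median(with_high))
print("add -1000:", mean(with_low), median(with_low))
```

A single extreme value moves the mean by more than a hundred units in either direction, while the median stays close to the centre of the original five values.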
2736-640: Is a Borel function), we can use this inversion formula to obtain $\operatorname{E}[g(X)] = \frac{1}{2\pi}\int_{\mathbb{R}} g(x)\left[\int_{\mathbb{R}} e^{-itx}\varphi_X(t)\,dt\right]dx.$ If $\operatorname{E}[g(X)]$
2850-466: Is a normal Q–Q plot (panel (b)). The outliers are visible in these plots. Panels (c) and (d) of the plot show the bootstrap distribution of the mean (c) and the 10% trimmed mean (d). The trimmed mean is a simple, robust estimator of location that deletes a certain percentage of observations (10% here) from each end of the data, then computes the mean in the usual way. The analysis was performed in R and 10,000 bootstrap samples were used for each of
2964-573: Is a sample from these variables. $T_n : (\mathcal{X}^n, \Sigma^n) \rightarrow (\Gamma, S)$ is an estimator. Let $i \in \{1, \dots, n\}$. The empirical influence function $EIF_i$ at observation $i$
3078-541: Is any random variable with finite expectation, then Markov's inequality may be applied to the random variable |X−E[X]|² to obtain Chebyshev's inequality $\operatorname{P}(|X - \operatorname{E}[X]| \geq a) \leq \frac{\operatorname{Var}[X]}{a^2},$ where Var
3192-462: Is as in the previous example. A number of convergence results specify exact conditions which allow one to interchange limits and expectations, as specified below. The probability density function $f_X$ of a scalar random variable $X$ is related to its characteristic function $\varphi_X$ by
3306-635: Is called the probability density function of X (relative to Lebesgue measure). According to the change-of-variables formula for Lebesgue integration, combined with the law of the unconscious statistician, it follows that $\operatorname{E}[X] \equiv \int_{\Omega} X\,d\operatorname{P} = \int_{\mathbb{R}} x f(x)\,dx$ for any absolutely continuous random variable X. The above discussion of continuous random variables
3420-704: Is common that once a few outliers have been removed, others become visible. The problem is even worse in higher dimensions. Robust methods provide automatic ways of detecting, downweighting (or removing), and flagging outliers, largely removing the need for manual screening. Care must be taken; initial data showing the ozone hole first appearing over Antarctica were rejected as outliers by non-human screening. Although this article deals with general principles for univariate statistical methods, robust methods also exist for regression problems, generalized linear models, and parameter estimation of various distributions. The basic tools used to describe and measure robustness are
3534-408: Is defined as $\operatorname{E}[X] = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k.$ Since the probabilities must satisfy p_1 + ⋯ + p_k = 1, it is natural to interpret E[X] as
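The finite-outcome definition E[X] = x_1 p_1 + ⋯ + x_k p_k translates directly into code. The sketch below uses exact rational arithmetic so that the check p_1 + ⋯ + p_k = 1 is not disturbed by rounding; the function name `expected_value` and both examples are illustrative assumptions.

```python
from fractions import Fraction


def expected_value(outcomes, probabilities):
    """Probability-weighted average of finitely many outcomes."""
    outcomes = list(outcomes)
    probabilities = list(probabilities)
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have equal length")
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))


# Equiprobable case: a fair six-sided die, where E[X] is the plain average.
fair_die = expected_value(range(1, 7), [Fraction(1, 6)] * 6)

# Unequal weights: outcome 0 with probability 3/4, outcome 10 with 1/4.
skewed = expected_value([0, 10], [Fraction(3, 4), Fraction(1, 4)])
print(fair_die, skewed)
```

Note that the fair-die value 7/2 is not itself a possible outcome of the die.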
F-test - Misplaced Pages Continue
3648-736: Is defined as a vector in the space of the estimator, which is in turn defined for a sample which is a subset of the population: For example, The empirical influence function is defined as follows. Let $n \in \mathbb{N}^*$ and $X_1, \dots, X_n : (\Omega, \mathcal{A}) \rightarrow (\mathcal{X}, \Sigma)$ are i.i.d. and $(x_1, \dots, x_n)$
3762-450: Is defined by: What this means is that we are replacing the i-th value in the sample by an arbitrary value and looking at the output of the estimator. Alternatively, the EIF is defined as the effect, scaled by n+1 instead of n, on the estimator of adding the point $x$ to the sample. Instead of relying solely on the data, we could use the distribution of
3876-407: Is easily obtained by setting $Y_0 = X_1$ and $Y_n = X_{n+1} - X_n$ for $n \geq 1,$ where $X_n$
3990-564: Is equivalent to the representation $\operatorname{E}[X] = \int_0^{\infty} \bigl(1 - F(x)\bigr)\,dx - \int_{-\infty}^{0} F(x)\,dx,$ also with convergent integrals. Expected values as defined above are automatically finite numbers. However, in many cases it
4104-436: Is finite if and only if E[X⁺] and E[X⁻] are both finite. Due to the formula |X| = X⁺ + X⁻, this is the case if and only if E|X| is finite, and this is equivalent to the absolute convergence conditions in the definitions above. As such, the present considerations do not define finite expected values in any cases not previously considered; they are only useful for infinite expectations. The following table gives
4218-626: Is finite, changing the order of integration, we get, in accordance with the Fubini–Tonelli theorem, $\operatorname{E}[g(X)] = \frac{1}{2\pi}\int_{\mathbb{R}} G(t)\varphi_X(t)\,dt,$ where $G(t) = \int_{\mathbb{R}} g(x) e^{-itx}\,dx$
4332-1052: Is fundamental to be able to consider expected values of ±∞. This is intuitive, for example, in the case of the St. Petersburg paradox, in which one considers a random variable with possible outcomes $x_i = 2^i$, with associated probabilities $p_i = 2^{-i}$, for i ranging over all positive integers. According to the summation formula in the case of random variables with countably many outcomes, one has $\operatorname{E}[X] = \sum_{i=1}^{\infty} x_i\,p_i = 2\cdot\frac{1}{2} + 4\cdot\frac{1}{4} + 8\cdot\frac{1}{8} + 16\cdot\frac{1}{16} + \cdots = 1 + 1 + 1 + 1 + \cdots.$ It
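The divergence of the St. Petersburg sum can be made concrete by looking at partial sums. With outcomes 2^i and probabilities 2^(-i), every term contributes exactly 1, so the first k terms sum to k. The function name `partial_expectation` is a hypothetical label for this sketch.

```python
from fractions import Fraction


def partial_expectation(k):
    """Sum of x_i * p_i for i = 1..k, with x_i = 2**i and p_i = 2**(-i)."""
    return sum(Fraction(2 ** i) * Fraction(1, 2 ** i) for i in range(1, k + 1))


for k in (1, 10, 100):
    print(k, partial_expectation(k))
```

Because the partial sums grow without bound, no finite number can serve as the expected value here.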
4446-429: Is known to be asymptotically normal due to the central limit theorem. However, outliers can make the distribution of the mean non-normal, even for fairly large data sets. Besides this non-normality, the mean is also inefficient in the presence of outliers and less variable measures of location are available. The plot below shows a density plot of the speed-of-light data, together with a rug plot (panel (a)). Also shown
4560-467: Is natural to say that the expected value equals +∞ . There is a rigorous mathematical theory underlying such ideas, which is often taken as part of the definition of the Lebesgue integral. The first fundamental observation is that, whichever of the above definitions are followed, any nonnegative random variable whatsoever can be given an unambiguous expected value; whenever absolute convergence fails, then
4674-510: Is of practical interest. The empirical influence function is a measure of the dependence of the estimator on the value of any one of the points in the sample. It is a model-free measure in the sense that it simply relies on calculating the estimator again with a different sample. On the right is Tukey's biweight function, which, as we will later see, is an example of what a "good" (in a sense defined later on) empirical influence function should look like. In mathematical terms, an influence function
4788-425: Is often assumed that the data errors are normally distributed, at least approximately, or that the central limit theorem can be relied on to produce normally distributed estimates. Unfortunately, when there are outliers in the data, classical estimators often have very poor performance, when judged using the breakdown point and the influence function described below. The practical effect of problems seen in
4902-637: Is otherwise available. For example, in the case of a fair die, Chebyshev's inequality says that the probability of rolling between 1 and 6 is at least 53%; in reality, the probability is of course 100%. The Kolmogorov inequality extends the Chebyshev inequality to the context of sums of random variables. The following three inequalities are of fundamental importance in the field of mathematical analysis and its applications to probability theory. The Hölder and Minkowski inequalities can be extended to general measure spaces, and are often given in that context. By contrast,
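The 53% figure for the die can be reproduced with exact arithmetic. This sketch applies Chebyshev's inequality in the form P(|X − E[X]| ≥ a) ≤ Var[X]/a² with a = 5/2, the distance from the mean 7/2 to the faces 1 and 6; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                      # each face of a fair die

mean = sum(x * p for x in faces)                 # 7/2
var = sum((x - mean) ** 2 * p for x in faces)    # 35/12

a = Fraction(5, 2)
# Chebyshev: P(|X - mean| >= a) <= var / a^2, so the complement is bounded below.
chebyshev_lower_bound = 1 - var / a ** 2
actual = sum(p for x in faces if abs(x - mean) <= a)

print(float(chebyshev_lower_bound), float(actual))
```

The bound evaluates to 8/15, about 53%, whereas the true probability is 1, which shows how loose the inequality can be when more is known about the distribution.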
5016-418: Is properly finished. This problem had been debated for centuries. Many conflicting proposals and solutions had been suggested over the years when it was posed to Blaise Pascal by French writer and amateur mathematician Chevalier de Méré in 1654. Méré claimed that this problem could not be solved and that it showed just how flawed mathematics was when it came to its application to the real world. Pascal, being
5130-403: Is that the value of a future gain should be directly proportional to the chance of getting it. This principle seemed to have come naturally to both of them. They were very pleased by the fact that they had found essentially the same solution, and this in turn made them absolutely convinced that they had solved the problem conclusively; however, they did not publish their findings. They only informed
5244-593: Is the Fourier transform of $g(x)$. The expression for $\operatorname{E}[g(X)]$ also follows directly from the Plancherel theorem. The expectation of a random variable plays an important role in a variety of contexts. In statistics, where one seeks estimates for unknown parameters based on available data gained from samples,
5358-478: Is the variance . These inequalities are significant for their nearly complete lack of conditional assumptions. For example, for any random variable with finite expectation, the Chebyshev inequality implies that there is at least a 75% probability of an outcome being within two standard deviations of the expected value. However, in special cases the Markov and Chebyshev inequalities often give much weaker information than
5472-417: Is the unrestricted one. That is, model 1 has p 1 parameters, and model 2 has p 2 parameters, where p 1 < p 2 , and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2. One common context in this regard is that of deciding whether a model fits the data significantly better than does a naive model, in which
5586-1673: Is then natural to define: $\operatorname{E}[X] = \begin{cases} \operatorname{E}[X^+] - \operatorname{E}[X^-] & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] < \infty; \\ +\infty & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] < \infty; \\ -\infty & \text{if } \operatorname{E}[X^+] < \infty \text{ and } \operatorname{E}[X^-] = \infty; \\ \text{undefined} & \text{if } \operatorname{E}[X^+] = \infty \text{ and } \operatorname{E}[X^-] = \infty. \end{cases}$ According to this definition, E[X] exists and
5700-514: Is thus a special case of the general Lebesgue theory, due to the fact that every piecewise-continuous function is measurable. The expected value of any real-valued random variable $X$ can also be defined on the graph of its cumulative distribution function $F$ by a nearby equality of areas. In fact, $\operatorname{E}[X] = \mu$ with
5814-600: Is to provide methods with good performance when there are small departures from a parametric distribution . For example, robust methods work well for mixtures of two normal distributions with different standard deviations ; under this model, non-robust methods like a t-test work poorly. Robust statistics seek to provide methods that emulate popular statistical methods, but are not unduly affected by outliers or other small departures from model assumptions . In statistics, classical estimation methods rely heavily on assumptions that are often not met in practice. In particular, it
5928-458: Is used in Russian-language literature. As discussed above, there are several context-dependent ways of defining the expected value. The simplest and original definition deals with the case of finitely many possible outcomes, such as in the flip of a coin. With the theory of infinite series, this can be extended to the case of countably many possible outcomes. It is also very common to consider
6042-403: Is worth just such a Sum, as wou'd procure in the same Chance and Expectation at a fair Lay. ... If I expect a or b, and have an equal chance of gaining them, my Expectation is worth (a+b)/2. More than a hundred years later, in 1814, Pierre-Simon Laplace published his tract " Théorie analytique des probabilités ", where the concept of expected value was defined explicitly: ... this advantage in
6156-411: The F -distribution with degrees of freedom d 1 = K − 1 {\displaystyle d_{1}=K-1} and d 2 = N − K {\displaystyle d_{2}=N-K} under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if
6270-669: The breakdown point , the influence function and the sensitivity curve . Intuitively, the breakdown point of an estimator is the proportion of incorrect observations (e.g. arbitrarily large observations) an estimator can handle before giving an incorrect (e.g., arbitrarily large) result. Usually, the asymptotic (infinite sample) limit is quoted as the breakdown point, although the finite-sample breakdown point may be more useful. For example, given n {\displaystyle n} independent random variables ( X 1 , … , X n ) {\displaystyle (X_{1},\dots ,X_{n})} and
6384-454: The expected value (also called expectation , expectancy , expectation operator , mathematical expectation , mean , expectation value , or first moment ) is a generalization of the weighted average . Informally, the expected value is the mean of the possible values a random variable can take, weighted by the probability of those outcomes. Since it is obtained through arithmetic, the expected value sometimes may not even be included in
6498-435: The expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F -test can be used to assess whether any of the treatments are on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that
6612-509: The median absolute deviation (MAD) and the Rousseeuw–Croux (Qn) estimator of scale. The plots are based on 10,000 bootstrap samples for each estimator, with some Gaussian noise added to the resampled data ( smoothed bootstrap ). Panel (a) shows the distribution of the standard deviation, (b) of the MAD and (c) of Qn. [REDACTED] The distribution of standard deviation is erratic and wide,
6726-474: The population means of the groups all have the same value. The result of the F test can be determined by comparing calculated F value and critical F value with specific significance level (e.g. 5%). The F table serves as a reference guide containing critical F values for the distribution of the F-statistic under the assumption of a true null hypothesis. It is designed to help determine the threshold beyond which
6840-408: The sample mean serves as an estimate for the expectation, and is itself a random variable. In such settings, the sample mean is considered to meet the desirable criterion for a "good" estimator in being unbiased ; that is, the expected value of the estimate is equal to the true value of the underlying parameter. For a different example, in decision theory , an agent making an optimal choice in
6954-429: The ANOVA F -test is that if we reject the null hypothesis , we do not know which treatments can be said to be significantly different from the others, nor, if the F -test is performed at level α, can we state that the treatment pair with the greatest mean difference is significantly different at level α. Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the restricted model, and model 2
7068-455: the F statistic < the critical F value, fail to reject the null hypothesis. If the F statistic > the critical F value, reject the null hypothesis. Note that when there are only two groups for the one-way ANOVA F-test, $F = t^2$ where t is the Student's $t$ statistic. The F-test in one-way analysis of variance (ANOVA) is used to assess whether
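The two-group identity F = t² can be checked numerically. This sketch assumes the usual one-way ANOVA statistic (between-group mean square over within-group mean square, with K − 1 and N − K degrees of freedom) and the pooled-variance form of Student's t; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, d1, d2) for a one-way layout given as a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return (between / (k - 1)) / (within / (n - k)), k - 1, n - k


def pooled_t(a, b):
    """Student's t statistic for two samples with a pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]
f_stat, d1, d2 = one_way_anova_f([group_a, group_b])
t_stat = pooled_t(group_a, group_b)
print(f_stat, t_stat ** 2, d1, d2)
```

For these data both F and t² equal 2.4, with (1, 4) degrees of freedom for the F statistic.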
7182-403: The F statistic is expected to exceed a controlled percentage of the time (e.g., 5%) when the null hypothesis is accurate. To locate the critical F value in the F table, one needs to utilize the respective degrees of freedom. This involves identifying the appropriate row and column in the F table that corresponds to the significance level being tested (e.g., 5%). How to use critical F values: If
7296-527: The Jensen inequality is special to the case of probability spaces. In general, it is not the case that E [ X n ] → E [ X ] {\displaystyle \operatorname {E} [X_{n}]\to \operatorname {E} [X]} even if X n → X {\displaystyle X_{n}\to X} pointwise. Thus, one cannot interchange limits and expectation, without additional conditions on
7410-398: The Lebesgue theory of expectation is identical to the summation formulas given above. However, the Lebesgue theory clarifies the scope of the theory of probability density functions. A random variable X is said to be absolutely continuous if any of the following conditions are satisfied: These conditions are all equivalent, although this is nontrivial to establish. In this definition, f
7524-469: The asymptotic value of some estimator sequence ( T n ) n ∈ N {\displaystyle (T_{n})_{n\in \mathbb {N} }} . We will suppose that this functional is Fisher consistent , i.e. ∀ θ ∈ Θ , T ( F θ ) = θ {\displaystyle \forall \theta \in \Theta ,T(F_{\theta })=\theta } . This means that at
7638-399: The bulk of the data looks to be more or less normally distributed, there are two obvious outliers. These outliers have a large effect on the mean, dragging it towards them, and away from the center of the bulk of the data. Thus, if the mean is intended as a measure of the location of the center of the data, it is, in a sense, biased when outliers are present. Also, the distribution of the mean
7752-507: The concept of expectation by adding rules for how to calculate expectations in more complicated situations than the original problem (e.g., for three or more players), and can be seen as the first successful attempt at laying down the foundations of the theory of probability . In the foreword to his treatise, Huygens wrote: It should be said, also, that for some time some of the best mathematicians of France have occupied themselves with this kind of calculus so that no one should attribute to me
7866-417: The corresponding realizations x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} , we can use X n ¯ := X 1 + ⋯ + X n n {\displaystyle {\overline {X_{n}}}:={\frac {X_{1}+\cdots +X_{n}}{n}}} to estimate the mean. Such an estimator has
7980-462: The corresponding theory of absolutely continuous random variables is described in the next section. The density functions of many common distributions are piecewise continuous , and as such the theory is often developed in this restricted setting. For such functions, it is sufficient to only consider the standard Riemann integration . Sometimes continuous random variables are defined as those corresponding to this special class of densities, although
8094-490: The distinct case of random variables dictated by (piecewise-)continuous probability density functions , as these arise in many natural contexts. All of these specific definitions may be viewed as special cases of the general definition based upon the mathematical tools of measure theory and Lebesgue integration , which provide these different contexts with an axiomatic foundation and common language. Any definition of expected value may be extended to define an expected value of
8208-527: The expectation of a random variable with a countably infinite set of possible outcomes is defined analogously as the weighted average of all possible outcomes, where the weights are given by the probabilities of realizing each given value. This is to say that E [ X ] = ∑ i = 1 ∞ x i p i , {\displaystyle \operatorname {E} [X]=\sum _{i=1}^{\infty }x_{i}\,p_{i},} where x 1 , x 2 , ... are
8322-492: The expected value can be defined as +∞ . The second fundamental observation is that any random variable can be written as the difference of two nonnegative random variables. Given a random variable X , one defines the positive and negative parts by X = max( X , 0) and X = −min( X , 0) . These are nonnegative random variables, and it can be directly checked that X = X − X . Since E[ X ] and E[ X ] are both then defined as either nonnegative numbers or +∞ , it
8436-494: The expected value operator is not σ {\displaystyle \sigma } -additive, i.e. E [ ∑ n = 0 ∞ Y n ] ≠ ∑ n = 0 ∞ E [ Y n ] . {\displaystyle \operatorname {E} \left[\sum _{n=0}^{\infty }Y_{n}\right]\neq \sum _{n=0}^{\infty }\operatorname {E} [Y_{n}].} An example
8550-488: The expected values of some commonly occurring probability distributions . The third column gives the expected values both in the form immediately given by the definition, as well as in the simplified form obtained by computation therefrom. The details of these computations, which are not always straightforward, can be found in the indicated references. The basic properties below (and their names in bold) replicate or follow immediately from those of Lebesgue integral . Note that
8664-443: The following problems: There are various definitions of a "robust statistic ". Strictly speaking, a robust statistic is resistant to errors in the results, produced by deviations from assumptions (e.g., of normality). This means that if the assumptions are only approximately met, the robust estimator will still have a reasonable efficiency , and reasonably small bias , as well as being asymptotically unbiased , meaning having
8778-401: The honour of the first invention. This does not belong to me. But these savants, although they put each other to the test by proposing to each other many questions difficult to solve, have hidden their methods. I have had therefore to examine and go deeply for myself into this matter by beginning with the elements, and it is impossible for me for this reason to affirm that I have even started from
8892-1193: The indicator function of the event $A.$ Then, it follows that $X_{n}\to 0$ pointwise. But $\operatorname{E}[X_{n}]=n\cdot\Pr\left(U\in\left[0,\tfrac{1}{n}\right]\right)=n\cdot\tfrac{1}{n}=1$ for each $n.$ Hence, $\lim_{n\to\infty}\operatorname{E}[X_{n}]=1\neq 0=\operatorname{E}\left[\lim_{n\to\infty}X_{n}\right].$ Analogously, for a general sequence of random variables $\{Y_{n}:n\geq 0\},$
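The counterexample above can be checked with exact arithmetic. The sketch below assumes U is uniform on [0, 1], so that the probability of the interval [0, 1/n] is 1/n; the function names `x_n` and `expectation_x_n` are illustrative only.

```python
from fractions import Fraction

def x_n(n, u):
    """Value of X_n = n * 1{U in [0, 1/n]} at the outcome U = u."""
    return n if 0 <= u <= Fraction(1, n) else 0

def expectation_x_n(n):
    """E[X_n] = n * P(U in [0, 1/n]) = n * (1/n) for U uniform on [0, 1]."""
    return n * Fraction(1, n)

# The expectation is 1 for every n ...
expectations = [expectation_x_n(n) for n in range(1, 50)]

# ... yet for any fixed outcome u > 0 the sequence X_n(u) is eventually 0.
u = Fraction(1, 10)
tail_values = [x_n(n, u) for n in range(11, 50)]
```

For the fixed outcome u = 1/10, every term with n > 10 vanishes, while the expectations stay at 1.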
9006-399: The infinite sum is a finite number independent of the ordering of summands. In the alternative case that the infinite sum does not converge absolutely, one says the random variable does not have finite expectation. Now consider a random variable X which has a probability density function given by a function f on the real number line . This means that the probability of X taking on
9120-499: The influence function can be studied empirically by examining the sampling distribution of proposed estimators under a mixture model , where one mixes in a small amount (1–5% is often sufficient) of contamination. For instance, one may use a mixture of 95% a normal distribution, and 5% a normal distribution with the same mean but significantly higher standard deviation (representing outliers). Robust parametric statistics can proceed in two ways: Robust estimates have been studied for
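The contamination experiment described above might be sketched as follows. The sample size, number of replications, seed, and outlier standard deviation are all assumptions made for illustration; the mixture is 95% standard normal and 5% normal with the same mean and a standard deviation of 10.

```python
import random
import statistics

def contaminated_sample(rng, n, eps=0.05, outlier_sd=10.0):
    """Draw n points: N(0, 1) with probability 1 - eps, else N(0, outlier_sd)."""
    return [rng.gauss(0.0, outlier_sd if rng.random() < eps else 1.0)
            for _ in range(n)]

def sampling_variance(estimator, n=100, reps=500, seed=0):
    """Empirical variance of an estimator over repeated contaminated samples."""
    rng = random.Random(seed)
    estimates = [estimator(contaminated_sample(rng, n)) for _ in range(reps)]
    return statistics.variance(estimates)

var_mean = sampling_variance(statistics.fmean)
var_median = sampling_variance(statistics.median)
print(f"variance of mean:   {var_mean:.4f}")
print(f"variance of median: {var_median:.4f}")
```

Under this setup the sample mean should show a noticeably wider sampling distribution than the sample median, which is the kind of sensitivity the influence function is meant to capture.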
9234-552: The inversion formula: $f_{X}(x)=\frac{1}{2\pi}\int_{\mathbb{R}}e^{-itx}\varphi_{X}(t)\,dt.$ For the expected value of $g(X)$ (where $g:\mathbb{R}\to\mathbb{R}$
9348-447: The letters "a.s." stand for "almost surely", a central property of the Lebesgue integral. Basically, one says that an inequality like $X\geq 0$ is true almost surely when the probability measure attributes zero mass to the complementary event $\{X<0\}.$ Concentration inequalities control
9462-447: The likelihood of a random variable taking on large values. Markov's inequality is among the best-known and simplest to prove: for a nonnegative random variable X and any positive number a, it states that $\operatorname{P}(X\geq a)\leq\frac{\operatorname{E}[X]}{a}.$ If X
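Markov's inequality can be verified exactly on a small example. The sketch below uses a hypothetical nonnegative discrete distribution `pmf`; the helper names are illustrative only.

```python
from fractions import Fraction

# Hypothetical nonnegative distribution: value -> probability.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

def mean(pmf):
    """E[X] for a finite discrete distribution."""
    return sum(x * p for x, p in pmf.items())

def tail_probability(pmf, a):
    """P(X >= a)."""
    return sum(p for x, p in pmf.items() if x >= a)

def markov_bound(pmf, a):
    """Upper bound E[X] / a from Markov's inequality; requires a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return mean(pmf) / a

for a in (1, 2, 4):
    print(a, tail_probability(pmf, a), markov_bound(pmf, a))
```

The bound is loose for small a (it can exceed 1) and tightens as a grows, which is typical of how the inequality behaves.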
9576-495: The likelihood ratio statistic, the F -test is a likelihood ratio test . Robust statistics Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location , scale , and regression parameters . One motivation is to produce statistical methods that are not unduly affected by outliers . Another motivation
9690-416: The mean of the original data. Similarly, if we replace one of the values with a datapoint of value -1000 or +1000 then the resulting mean will be very different from the mean of the original data. The median is a robust measure of central tendency . Taking the same dataset {2,3,5,6,9}, if we add another datapoint with value -1000 or +1000 then the median will change slightly, but it will still be similar to
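A short sketch of the comparison on the dataset {2,3,5,6,9}, using Python's standard `statistics` module, shows how differently the mean and the median react to one added extreme value.

```python
import statistics

data = [2, 3, 5, 6, 9]

for outlier in (-1000, 1000):
    contaminated = data + [outlier]
    print(
        f"added {outlier:+d}: "
        f"mean {statistics.mean(contaminated):.2f}, "
        f"median {statistics.median(contaminated)}"
    )

print("original:", statistics.mean(data), statistics.median(data))
```

The mean moves by more than a hundred units in either direction, while the median stays within one unit of its original value of 5.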
9804-460: The median has a breakdown point of 0.5. The X% trimmed mean has a breakdown point of X%, for the chosen level of X. Huber (1981) and Maronna et al. (2019) contain more details. The level and the power breakdown points of tests are investigated in He, Simpson & Portnoy (1990) . Statistics with high breakdown points are sometimes called resistant statistics. In the speed-of-light example, removing
9918-410: The median of the original data. If we replace one of the values with a data point of value -1000 or +1000 then the resulting median will still be similar to the median of the original data. Described in terms of breakdown points , the median has a breakdown point of 50%, meaning that half the points must be outliers before the median can be moved outside the range of the non-outliers, while the mean has
10032-521: The model $F$, the estimator sequence asymptotically measures the correct quantity. Let $G$ be some distribution in $A$. What happens when the data do not follow the model $F$ exactly but follow another, slightly different distribution, "going towards" $G$? We're looking at: Expected value In probability theory,
10146-417: The more robust it is. Intuitively, we can understand that a breakdown point cannot exceed 50% because if more than half of the observations are contaminated, it is not possible to distinguish between the underlying distribution and the contaminating distribution; see Rousseeuw & Leroy (1987). Therefore, the maximum breakdown point is 0.5 and there are estimators which achieve such a breakdown point. For example,
10260-432: The newly abstract situation, this definition is extremely similar in nature to the very simplest definition of expected values, given above, as certain weighted averages. This is because, in measure theory, the value of the Lebesgue integral of X is defined via weighted averages of approximations of X which take on finitely many values. Moreover, if given a random variable with finitely or countably many possible values,
10374-415: The null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F-distribution, with $(p_2-p_1,\ n-p_2)$ degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). Since F is a monotone function of
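As a sketch of the computation, the function below evaluates the usual F statistic for two nested least-squares models, $F=\frac{(\mathrm{RSS}_1-\mathrm{RSS}_2)/(p_2-p_1)}{\mathrm{RSS}_2/(n-p_2)}$, which is the form consistent with the $(p_2-p_1,\ n-p_2)$ degrees of freedom stated above. The function name and the numbers in the usage example are hypothetical.

```python
def nested_f_statistic(rss1, rss2, p1, p2, n):
    """F statistic comparing model 1 (p1 parameters) with model 2 (p2 > p1).

    rss1, rss2: residual sums of squares of the two fitted models.
    n: number of data points.
    """
    if not (0 < p1 < p2 < n):
        raise ValueError("require 0 < p1 < p2 < n")
    numerator = (rss1 - rss2) / (p2 - p1)   # improvement per extra parameter
    denominator = rss2 / (n - p2)           # residual variance of model 2
    return numerator / denominator

# Hypothetical values, for illustration only.
f_stat = nested_f_statistic(rss1=120.0, rss2=90.0, p1=2, p2=4, n=34)
print(f_stat)
```

The resulting value would then be compared with the critical value of the F-distribution with (2, 30) degrees of freedom at the chosen significance level.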
10488-483: The one-way ANOVA F-test statistic is $F=\frac{\text{explained variance}}{\text{unexplained variance}},$ or $F=\frac{\text{between-group variability}}{\text{within-group variability}}.$ The "explained variance", or "between-group variability" is $\sum_{i=1}^{K}n_{i}(\bar{Y}_{i\cdot}-\bar{Y})^{2}/(K-1),$ where $\bar{Y}_{i\cdot}$ denotes the sample mean in the i-th group, $n_{i}$ is the number of observations in the i-th group, $\bar{Y}$ denotes
10602-434: The only explanatory term is the intercept term, so that all predicted values for the dependent variable are set equal to that variable's sample mean. The naive model is the restricted model, since the coefficients of all potential explanatory variables are restricted to equal zero. Another common context is deciding whether there is a structural break in the data: here the restricted model uses all data in one regression, while
10716-449: The overall mean of the data, and $K$ denotes the number of groups. The "unexplained variance", or "within-group variability" is $\sum_{i=1}^{K}\sum_{j=1}^{n_{i}}(Y_{ij}-\bar{Y}_{i\cdot})^{2}/(N-K),$ where $Y_{ij}$ is the j-th observation in the i-th out of $K$ groups and $N$ is the overall sample size. This F-statistic follows
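A minimal sketch of the one-way ANOVA statistic, computed as the between-group mean square (with K − 1 degrees of freedom) divided by the within-group mean square (with N − K degrees of freedom). The function name and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA; groups is a list of lists of numbers."""
    k = len(groups)                               # number of groups, K
    n_total = sum(len(g) for g in groups)         # overall sample size, N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group ("explained") variability, K - 1 degrees of freedom.
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, group_means)) / (k - 1)

    # Within-group ("unexplained") variability, N - K degrees of freedom.
    within = sum((y - m) ** 2
                 for g, m in zip(groups, group_means)
                 for y in g) / (n_total - k)

    return between / within

# Hypothetical data, for illustration only.
f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value)
```

For these groups the between-group sum of squares is 26 and the within-group sum of squares is 6, so the statistic is (26/2)/(6/6) = 13.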
10830-497: The possible outcomes of the random variable X and p 1 , p 2 , ... are their corresponding probabilities. In many non-mathematical textbooks, this is presented as the full definition of expected values in this context. However, there are some subtleties with infinite summation, so the above formula is not suitable as a mathematical definition. In particular, the Riemann series theorem of mathematical analysis illustrates that
10944-622: The presence of outliers in the data, and the estimates they produce may be heavily distorted if there are extreme outliers in the data, compared to what they would be if the outliers were not included in the data. By contrast, more robust estimators that are not so sensitive to distributional distortions such as longtailedness are also resistant to the presence of outliers. Thus, in the context of robust statistics, distributionally robust and outlier-resistant are effectively synonymous. For one perspective on research in robust statistics up to 2000, see Portnoy & He (2000) . Some experts prefer
11058-466: The random variables. The approach is quite different from that of the previous paragraph. What we are now trying to do is to see what happens to an estimator when we change the distribution of the data slightly: it assumes a distribution, and measures sensitivity to change in this distribution. By contrast, the empirical influence assumes a sample set, and measures sensitivity to change in the samples. Let A {\displaystyle A} be
11172-672: The random variables. To see this, let $U$ be a random variable distributed uniformly on $[0,1].$ For $n\geq 1,$ define a sequence of random variables $X_{n}=n\cdot\mathbf{1}\left\{U\in\left(0,\tfrac{1}{n}\right)\right\},$ with $\mathbf{1}\{A\}$ being
11286-456: The raw and trimmed means. The distribution of the mean is clearly much wider than that of the 10% trimmed mean (the plots are on the same scale). Also whereas the distribution of the trimmed mean appears to be close to normal, the distribution of the raw mean is quite skewed to the left. So, in this sample of 66 observations, only 2 outliers cause the central limit theorem to be inapplicable. Robust statistical methods, of which
11400-438: The same finite area, i.e. if $\int_{-\infty}^{\mu}F(x)\,dx=\int_{\mu}^{\infty}\big(1-F(x)\big)\,dx$ and both improper Riemann integrals converge. Finally, this
11514-418: The same principle. But finally I have found that my answers in many cases do not differ from theirs. In the mid-nineteenth century, Pafnuty Chebyshev became the first person to think systematically in terms of the expectations of random variables . Neither Pascal nor Huygens used the term "expectation" in its modern sense. In particular, Huygens writes: That any one Chance or Expectation to win any thing
11628-445: The sample data set; it is not the value you would "expect" to get in reality. The expected value of a random variable with a finite number of outcomes is a weighted average of all possible outcomes. In the case of a continuum of possible outcomes, the expectation is defined by integration . In the axiomatic foundation for probability provided by measure theory , the expectation is given by Lebesgue integration . The expected value of
11742-422: The sections below. The outliers in the speed-of-light data have more than just an adverse effect on the mean; the usual estimate of scale is the standard deviation, and this quantity is even more badly affected by outliers because the squares of the deviations from the mean go into the calculation, so the outliers' effects are exacerbated. The plots below show the bootstrap distributions of the standard deviation,
11856-544: The speed-of-light data is 27.43. Removing the two lowest observations and recomputing gives 27.67. The trimmed mean is less affected by the outliers and has a higher breakdown point. If we replace the lowest observation, −44, by −1000, the mean becomes 11.73, whereas the 10% trimmed mean is still 27.43. In many areas of applied statistics, it is common for data to be log-transformed to make them near symmetrical. Very small values become large negative when log-transformed, and zeroes become negatively infinite. Therefore, this example
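The insensitivity of the trimmed mean to a single extreme value can be sketched as follows. The data below are hypothetical (they are not the speed-of-light measurements), and `trimmed_mean` is an illustrative helper that drops the given percentage of points from each end of the sorted sample.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the points from each end."""
    xs = sorted(values)
    k = len(xs) * percent // 100          # points removed per tail
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

# Hypothetical sample of 20 values with two low outliers.
original = [-44, -2, 16, 21, 22, 23, 24, 25, 26, 27,
            28, 29, 30, 31, 32, 33, 34, 36, 37, 40]
# Same sample with the lowest observation replaced by -1000.
corrupted = [-1000] + original[1:]

for name, sample in (("original", original), ("corrupted", corrupted)):
    plain = sum(sample) / len(sample)
    print(f"{name}: mean {plain:.2f}, 10% trimmed mean {trimmed_mean(sample, 10):.2f}")
```

Because the replaced observation lies in the trimmed tail, the 10% trimmed mean is identical for both samples, while the ordinary mean shifts substantially.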
11970-425: The speed-of-light example above, it is easy to see and remove the two outliers prior to proceeding with any further analysis. However, in modern times, data sets often consist of large numbers of variables being measured on large numbers of experimental units. Therefore, manual screening for outliers is often impractical. Outliers can often interact in such a way that they mask each other. As a simple example, consider
12084-409: The statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F -distribution under the null hypothesis, the sums of squares should be statistically independent , and each should follow a scaled χ²-distribution . The latter condition is guaranteed if the data values are independent and normally distributed with a common variance . The formula for
12198-505: The sum hoped for. We will call this advantage mathematical hope. The use of the letter E to denote "expected value" goes back to W. A. Whitworth in 1901. The symbol has since become popular for English writers. In German, E stands for Erwartungswert , in Spanish for esperanza matemática , and in French for espérance mathématique. When "E" is used to denote "expected value", authors use
12312-438: The term resistant statistics for distributional robustness, and reserve 'robustness' for non-distributional robustness, e.g., robustness to violation of assumptions about the probability model or estimator, but this is a minority usage. Plain 'robustness' to mean 'distributional robustness' is common. When considering how robust an estimator is to the presence of outliers, it is useful to test what happens when an extreme outlier
12426-460: The term is used differently by various authors. Analogously to the countably-infinite case above, there are subtleties with this expression due to the infinite region of integration. Such subtleties can be seen concretely if the distribution of X is given by the Cauchy distribution Cauchy(0, π), so that $f(x)=(x^{2}+\pi^{2})^{-1}$. It is straightforward to compute in this case that ∫
12540-411: The theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities. This division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right for
12654-416: The trimmed mean is a simple example, seek to outperform classical statistical methods in the presence of outliers, or, more generally, when underlying parametric assumptions are not quite correct. Whilst the trimmed mean performs well relative to the mean in this example, better robust estimates are available. In fact, the mean, median and trimmed mean are all special cases of M-estimators . Details appear in
12768-464: The two lowest observations causes the mean to change from 26.2 to 27.75, a change of 1.55. The estimate of scale produced by the Qn method is 6.3. We can divide this by the square root of the sample size to get a robust standard error, and we find this quantity to be 0.78. Thus, the change in the mean resulting from removing two outliers is approximately twice the robust standard error. The 10% trimmed mean for
12882-504: The unrestricted model uses separate regressions for two different subsets of the data. This use of the F-test is known as the Chow test . The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives
12996-411: The value of certain infinite sums involving positive and negative summands depends on the order in which the summands are given. Since the outcomes of a random variable have no naturally given order, this creates a difficulty in defining expected value precisely. For this reason, many mathematical textbooks only consider the case that the infinite sum given above converges absolutely , which implies that
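The order dependence described by the Riemann series theorem can be seen numerically with the alternating harmonic series, a standard example that is not taken from this article. The sketch below sums the same terms in two orders; the function names are illustrative only.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in its usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged(n_blocks):
    """Same terms, reordered as two positive terms then one negative term:
    1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ..."""
    total = 0.0
    odd, even = 1, 2
    for _ in range(n_blocks):
        total += 1 / odd + 1 / (odd + 2)   # next two odd-denominator terms
        odd += 4
        total -= 1 / even                  # next even-denominator term
        even += 2
    return total

usual = alternating_harmonic(300_000)
reordered = rearranged(100_000)
print(usual, math.log(2))          # usual order approaches ln 2
print(reordered, 1.5 * math.log(2))  # this reordering approaches (3/2) ln 2
```

The series converges only conditionally, so the two orderings approach different limits. An absolutely convergent sum would give the same value in every order, which is why absolute convergence is required in the definition.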