In statistical hypothesis testing , e-values quantify the evidence in the data against a null hypothesis (e.g., "the coin is fair", or, in a medical context, "this new treatment has no effect"). They serve as a more robust alternative to p-values , addressing some shortcomings of the latter.
In contrast to p-values, e-values can deal with optional continuation: e-values of subsequent experiments (e.g. clinical trials concerning the same treatment) may simply be multiplied to provide a new, "product" e-value that represents the evidence in the joint experiment. This works even if, as often happens in practice, the decision to perform later experiments may depend in vague, unknown ways on
a meta-analysis . The advantage of e-values in this setting is that they allow for optional continuation. Indeed, they have been employed in what may be the world's first fully 'online' meta-analysis with explicit Type-I error control. Informally, optional continuation implies that the product of any number of e-values, E ( 1 ) , E ( 2 ) , … {\displaystyle E_{(1)},E_{(2)},\ldots } , defined on independent samples Y ( 1 ) , Y ( 2 ) , … {\displaystyle Y_{(1)},Y_{(2)},\ldots } ,
231-425: A type I error , or a false positive , is the rejection of the null hypothesis when it is actually true. A type II error , or a false negative , is the failure to reject a null hypothesis that is actually false. Type I error: an innocent person may be convicted. Type II error: a guilty person may be not convicted. Much of statistical theory revolves around the minimization of one or both of these errors, though
308-447: A data-dependent level α ~ {\displaystyle {\widetilde {\alpha }}} is controlled for every choice of the data-dependent significance level. Traditional p-values only satisfy this guarantee for data-independent or pre-specified levels. This stronger guarantee is also called the post-hoc α {\displaystyle \alpha } Type-I error , as it allows one to choose
385-651: A generalization of a level 0 test. This interpretation shows that e-values are indeed fundamental to testing: they are equivalent to tests, thinly veiled by a rescaling. From this perspective, it may be surprising that typical e-values look very different from traditional tests: maximizing the objective E Q [ ε α ] {\displaystyle \mathbb {E} ^{Q}[\varepsilon _{\alpha }]} for an alternative hypothesis H 1 = { Q } {\displaystyle H_{1}=\{Q\}} would yield traditional Neyman-Pearson style tests. Indeed, this maximizes
462-630: A p-value satisfies this guarantee if and only if it is the reciprocal 1 / E {\displaystyle 1/E} of an e-variable E {\displaystyle E} . The interpretation of this guarantee is that, on average, the relative Type-I error distortion P ( p ′ ≤ α ~ ∣ α ~ ) / α ~ {\displaystyle P(p^{\prime }\leq {\widetilde {\alpha }}\mid {\widetilde {\alpha }})/{\widetilde {\alpha }}} caused by using
539-400: A particular hypothesis amongst a "set of alternative hypotheses", H 1 , H 2 ..., it was easy to make an error, [and] these errors will be of two kinds: In all of the papers co-written by Neyman and Pearson the expression H 0 always signifies "the hypothesis to be tested". In the same paper they call these two sources of error, errors of type I and errors of type II respectively. It
616-524: A particular sample may be judged as likely to have been randomly drawn from a certain population": and, as Florence Nightingale David remarked, "it is necessary to remember the adjective 'random' [in the term 'random sample'] should apply to the method of drawing the sample and not to the sample itself". They identified "two sources of error", namely: In 1930, they elaborated on these two sources of error, remarking that in testing hypotheses two considerations must be kept in view, we must be able to reduce
693-629: A product of e-values will ever become larger than 1 / α {\displaystyle 1/\alpha } is bounded by α {\displaystyle \alpha } . Thus if we decide to combine the samples observed so far and reject the null if the product e-value is larger than 1 / α {\displaystyle 1/\alpha } , then our Type-I error probability remains bounded by α {\displaystyle \alpha } . We say that testing based on e-values remains safe (Type-I valid) under optional continuation . Mathematically, this
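A small simulation can illustrate this guarantee. The sketch below is not from the article: it uses an arbitrary Bernoulli(0.5) null with an illustrative alternative rate of 0.7, multiplies per-sample likelihood-ratio e-values, and checks the running product after every observation, i.e. it allows rejection at any intermediate stopping point.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n_runs, n_steps = 0.05, 2000, 50
p0, q = 0.5, 0.7                          # null rate and an illustrative alternative rate

crossed = 0
for _ in range(n_runs):
    x = rng.random(n_steps) < p0          # data generated under the null
    lr = np.where(x, q / p0, (1 - q) / (1 - p0))
    wealth = np.cumprod(lr)               # running product of per-sample e-values
    crossed += wealth.max() >= 1 / alpha  # would we ever reject, at ANY stopping time?
rate = crossed / n_runs
print(rate)                               # empirically stays below alpha = 0.05
```

Even though the stopping rule here is maximally opportunistic (reject the moment the product exceeds 1/α), the observed rejection frequency under the null remains below α, as Ville's inequality guarantees.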
770-413: A single outcome and τ {\displaystyle \tau } a fixed sample size or some stopping time. We shall refer to such Y {\displaystyle Y} , which represent the full sequence of outcomes of a statistical experiment, as a sample or batch of outcomes. But in some cases Y {\displaystyle Y} may also be an unordered bag of outcomes or
847-552: A single outcome. An e-variable or e-statistic is a nonnegative random variable E = E ( Y ) {\displaystyle E=E(Y)} such that under all P ∈ H 0 {\displaystyle P\in H_{0}} , its expected value is bounded by 1: E P [ E ] ≤ 1 {\displaystyle {\mathbb {E} }_{P}[E]\leq 1} . The value taken by e-variable E {\displaystyle E}
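As a minimal numerical check of the defining property E P [ E ] ≤ 1 {\displaystyle \mathbb {E} _{P}[E]\leq 1} , the sketch below (not from the article; the Bernoulli rates 0.5 and 0.7 are illustrative choices) builds a likelihood-ratio e-variable for a fair-coin null and verifies its expectation under the null:

```python
import numpy as np

rng = np.random.default_rng(0)

# Null H0: X ~ Bernoulli(0.5); illustrative alternative density q: Bernoulli(0.7).
# E(X) = q(X)/p0(X) is a likelihood ratio, hence an e-variable.
def e_variable(x):
    p0, q = 0.5, 0.7
    return np.where(x == 1, q / p0, (1 - q) / (1 - p0))

# Check E_P[E] <= 1 under the null (for a likelihood ratio it equals 1 exactly).
x = rng.integers(0, 2, size=200_000)   # draws from H0
print(e_variable(x).mean())            # ≈ 1.0
```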
a statistical model, and w {\displaystyle w} a prior density on Θ {\displaystyle \Theta } , then we can set Q {\displaystyle Q} as above to be the Bayes marginal distribution with density q ( Y ) = ∫ q θ ( Y ) w ( θ ) d θ {\displaystyle q(Y)=\int q_{\theta }(Y)w(\theta )d\theta } and then E = q ( Y ) / p 0 ( Y ) {\displaystyle E=q(Y)/p_{0}(Y)}
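To make the Bayes-marginal construction concrete, the sketch below (an illustration, not from the article: it assumes a fair-coin null and a uniform Beta(1,1) prior over a Bernoulli model) verifies exactly, by summing over all outcome counts, that the resulting E = q ( Y ) / p 0 ( Y ) {\displaystyle E=q(Y)/p_{0}(Y)} has expectation 1 under the null:

```python
from math import comb, factorial

# Bayes-marginal e-variable for the simple null P0 = Bernoulli(1/2)^n,
# with a uniform (Beta(1,1)) prior over the Bernoulli parameter (illustrative choice).
def bayes_marginal_e(k, n):
    # ∫ θ^k (1-θ)^(n-k) dθ = k!(n-k)!/(n+1)!  (Beta integral under the uniform prior)
    q = factorial(k) * factorial(n - k) / factorial(n + 1)
    p0 = 0.5 ** n
    return q / p0

# Exact check that E_{P0}[E] = 1: enumerate all outcome counts k.
n = 10
total = sum(comb(n, k) * 0.5 ** n * bayes_marginal_e(k, n) for k in range(n + 1))
print(total)  # 1.0 up to floating-point rounding
```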
1001-809: A stronger guarantee. In particular, for every possibly data-dependent significance level α ~ > 0 {\displaystyle {\widetilde {\alpha }}>0} , we have E [ P ( p ′ ≤ α ~ ∣ α ~ ) α ~ ] ≤ 1 , {\displaystyle \mathbb {E} \left[{\frac {P(p^{\prime }\leq {\widetilde {\alpha }}\mid {\widetilde {\alpha }})}{\widetilde {\alpha }}}\right]\leq 1,} if and only if E [ 1 / p ′ ] ≤ 1 {\displaystyle \mathbb {E} [1/p^{\prime }]\leq 1} . This means that
1078-430: A ticket for 1 monetary unit, with nonnegative pay-off E = E ( Y ) {\displaystyle E=E(Y)} . The statements " E {\displaystyle E} is an e-variable" and "if the null hypothesis is true, you do not expect to gain any money if you engage in this bet" are logically equivalent. This is because E {\displaystyle E} being an e-variable means that
1155-419: A type II error corresponds to acquitting a criminal. The crossover error rate (CER) is the point at which type I errors and type II errors are equal. A system with a lower CER value provides more accuracy than a system with a higher CER value. In terms of false positives and false negatives, a positive result corresponds to rejecting the null hypothesis, while a negative result corresponds to failing to reject
1232-575: A weighted harmonic average of post-hoc p-values is still a post-hoc p-value. Let H 0 = { P 0 } {\displaystyle H_{0}=\{P_{0}\}} be a simple null hypothesis. Let Q {\displaystyle Q} be any other distribution on Y {\displaystyle Y} , and let E := q ( Y ) p 0 ( Y ) {\displaystyle E:={\frac {q(Y)}{p_{0}(Y)}}} be their likelihood ratio. Then E {\displaystyle E}
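The harmonic merging rule can be verified directly: since post-hoc p-values are reciprocals of e-values, a weighted harmonic average of post-hoc p-values is precisely the reciprocal of the corresponding weighted arithmetic average of e-values, which is again an e-value. The numbers below are arbitrary illustrations:

```python
import numpy as np

e = np.array([4.0, 0.5, 2.5])        # e-values (arbitrary illustration)
w = np.array([0.5, 0.3, 0.2])        # weights summing to 1
p = 1 / e                            # corresponding post-hoc p-values

harmonic_p = 1 / np.sum(w / p)       # weighted harmonic average of the p-values
merged_e = np.sum(w * e)             # weighted arithmetic average of the e-values
print(harmonic_p, 1 / merged_e)      # identical: both equal 1/2.65
```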
1309-412: Is a difference or an association. If the result of the test corresponds with reality, then a correct decision has been made. However, if the result of the test does not correspond with reality, then an error has occurred. There are two situations in which the decision is wrong. The null hypothesis may be true, whereas we reject H 0 {\textstyle H_{0}} . On the other hand,
1386-469: Is a valid p-value. Moreover, the e-value based test with significance level α {\displaystyle \alpha } , which rejects P 0 {\displaystyle P_{0}} if p ′ ≤ α {\displaystyle p^{\prime }\leq \alpha } , has a Type-I error bounded by α {\displaystyle \alpha } . But, whereas with standard p-values
1463-474: Is also a Bayes factor of H 0 {\displaystyle H_{0}} vs. H 1 := Q {\displaystyle H_{1}:={\mathcal {Q}}} . If the null is composite, then some special e-variables can be written as Bayes factors with some very special priors, but most Bayes factors one encounters in practice are not e-variables and many e-variables one encounters in practice are not Bayes factors. Suppose you can buy
1540-465: Is an e-variable. As a consequence, we can truly reject at level p ′ {\displaystyle p^{\prime }} and still retain the post-hoc Type-I error guarantee. For a traditional p-value p {\displaystyle p} , rejecting at level p comes with no such guarantee. Moreover, a post-hoc p-value inherits optional continuation and merging properties of e-values. But instead of an arithmetic weighted average,
1617-694: Is an e-variable. Conversely, any e-variable relative to a simple null H 0 = { P 0 } {\displaystyle H_{0}=\{P_{0}\}} can be written as a likelihood ratio with respect to some distribution Q {\displaystyle Q} . Thus, when the null is simple, e-variables coincide with likelihood ratios. E-variables exist for general composite nulls as well though, and they may then be thought of as generalizations of likelihood ratios. The two main ways of constructing e-variables, UI and RIPr (see below) both lead to expressions that are variations of likelihood ratios as well. Two other standard generalizations of
is an estimate for λ {\displaystyle {\lambda }} , based only on past data X i − 1 = ( X 1 , … , X i − 1 ) {\displaystyle X^{i-1}=(X_{1},\ldots ,X_{i-1})} , and designed to make E i , λ {\displaystyle E_{i,\lambda }} as large as possible in
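A minimal sketch of this product construction with a predictable λ ˘ {\displaystyle {\breve {\lambda }}} follows. The particular plug-in rule used here (the clipped difference between the running past mean and the null mean) is a simplistic illustrative choice, not one of the tuned estimators from the literature, and the Beta-distributed data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 0.5                            # null value of the mean; X_i take values in [0, 1]
x = rng.beta(6, 4, size=500)        # simulated data whose true mean is 0.6

wealth = 1.0
for i, xi in enumerate(x):
    # lambda is chosen from PAST observations only (predictable), and clipped to
    # the range [-1/(1-mu), 1/mu] that keeps each factor 1 + lambda*(X_i - mu) >= 0
    past_mean = x[:i].mean() if i > 0 else mu
    lam = np.clip(past_mean - mu, -1 / (1 - mu) + 0.01, 1 / mu - 0.01)
    wealth *= 1 + lam * (xi - mu)
print(wealth)   # the product e-value; it tends to grow when the null mean is wrong
```

Because each factor uses only past data, the product remains a valid e-variable under the null, while accumulating evidence when the true mean differs from μ.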
is an integral part of hypothesis testing . The test involves choosing between two competing propositions: the null hypothesis , denoted by H 0 {\textstyle H_{0}} , and the alternative hypothesis , denoted by H 1 {\textstyle H_{1}} . This is conceptually similar to the judgement in a court trial. The null hypothesis corresponds to
1848-402: Is called the e-value . In practice, the term e-value (a number) is often used when one is really referring to the underlying e-variable (a random variable, that is, a measurable function of the data). A test for a null hypothesis H 0 {\displaystyle H_{0}} is traditionally modeled as a function ϕ {\displaystyle \phi } from
1925-417: Is chosen so that E ≥ 0 {\displaystyle E\geq 0} a.s. Any e-variable can be written in the 1 + λ U {\displaystyle 1+\lambda U} form although with parametric nulls, writing it as a likelihood ratio is usually mathematically more convenient. The 1 + λ U {\displaystyle 1+\lambda U} form on
2002-544: Is classically conveniently summarized as a function ϕ α {\displaystyle \phi _{\alpha }} from the data to { 0 , 1 } {\displaystyle \{0,1\}} that satisfies E P [ ϕ α ] ≤ α , for every P ∈ H 0 {\displaystyle \mathbb {E} ^{P}[\phi _{\alpha }]\leq \alpha ,{\text{ for every }}P\in H_{0}} . Moreover, this
is important to consider the amount of risk one is willing to take to falsely reject H 0 or accept H 0 . The solution to this question would be to report the p-value or significance level α of the statistic. For example, if the p-value of a test statistic result is estimated at 0.0596, then rejecting H 0 at this value carries a 5.96% probability of a false rejection. Or, if the test is performed at level α, like 0.05, then we allow a 5% probability of falsely rejecting H 0 . A significance level α of 0.05
2156-405: Is itself an e-value, even if the definition of each e-value is allowed to depend on all previous outcomes, and no matter what rule is used to decide when to stop gathering new samples (e.g. to perform new trials). It follows that, for any significance level 0 < α < 1 {\displaystyle 0<\alpha <1} , if the null is true, then the probability that
2233-1100: Is possible to view this as an alternative definition of an e-value. Under this post-hoc Type-I error, the problem of choosing the significance level α {\displaystyle \alpha } vanishes: we can simply choose the smallest data-dependent level at which we reject the hypothesis by setting it equal to the post-hoc p-value: α ~ = p ′ {\displaystyle {\widetilde {\alpha }}=p^{\prime }} . Indeed, at this data-dependent level we have E [ P ( p ′ ≤ p ′ ∣ p ′ ) p ′ ] = E [ 1 p ′ ] ≤ 1 , {\displaystyle \mathbb {E} \left[{\frac {P(p^{\prime }\leq p^{\prime }\mid p^{\prime })}{p^{\prime }}}\right]=\mathbb {E} \left[{\frac {1}{p^{\prime }}}\right]\leq 1,} since 1 / p ′ {\displaystyle 1/p^{\prime }}
is relatively common, but there is no general rule that fits all scenarios. The speed limit of a freeway in the United States is 120 kilometers per hour (75 mph). A device is set to measure the speed of passing vehicles. Suppose that the device will conduct three measurements of the speed of a passing vehicle, recorded as a random sample X 1 , X 2 , X 3 . The traffic police will or will not fine
is shown by first establishing that the product e-variables form a nonnegative discrete-time martingale in the filtration generated by Y ( 1 ) , Y ( 2 ) , … {\displaystyle Y_{(1)},Y_{(2)},\ldots } (the individual e-variables are then the multiplicative increments of this martingale). The results then follow as a consequence of Doob's optional stopping theorem and Ville's inequality .

Type I and type II errors

In statistical hypothesis testing ,
2464-415: Is sometimes called an error of the first kind. In terms of the courtroom example, a type I error corresponds to convicting an innocent defendant. The second kind of error is the mistaken failure to reject the null hypothesis as the result of a test procedure. This sort of error is called a type II error (false negative) and is also referred to as an error of the second kind. In terms of the courtroom example,
2541-406: Is sometimes generalized to permit external randomization by letting the test ϕ α {\displaystyle \phi _{\alpha }} take value in [ 0 , 1 ] {\displaystyle [0,1]} . Here, its value is interpreted as a probability with which one should subsequently reject the hypothesis. An issue with modelling a test in this manner,
2618-427: Is standard practice for statisticians to conduct tests in order to determine whether or not a "speculative hypothesis " concerning the observed phenomena of the world (or its inhabitants) can be supported. The results of such testing determine whether a particular set of results agrees reasonably (or does not agree) with the speculated hypothesis. On the basis that it is always assumed, by statistical convention, that
2695-428: Is that any weighted average of e-values remains an e-value, even if the individual e-values are arbitrarily dependent. This is one of the reasons why e-values have also turned out to be useful tools in multiple testing . E-values can be interpreted in a number of different ways: first, an e-value can be interpreted as rescaling of a test that is presented on a more appropriate scale that facilitates merging them. Second,
2772-402: Is that the traditional decision space { not reject H 0 , reject H 0 } {\displaystyle \{{\text{not reject }}H_{0},{\text{ reject }}H_{0}\}} or { 0 , 1 } {\displaystyle \{0,1\}} does not encode the level α {\displaystyle \alpha } at which
2849-402: Is the solution." As a consequence of this, in experimental science the null hypothesis is generally a statement that a particular treatment has no effect; in observational science, it is that there is no difference between the value of a particular measured variable, and that of an experimental prediction. If the probability of obtaining a result as extreme as the one obtained, supposing that
2926-449: Is to be either nullified or not nullified by the test. When the null hypothesis is nullified, it is possible to conclude that data support the "alternative hypothesis" (which is the original speculated one). The consistent application by statisticians of Neyman and Pearson's convention of representing "the hypothesis to be tested" (or "the hypothesis to be nullified") with the expression H 0 has led to circumstances where many understand
3003-404: Is uncertainty, there is the possibility of making an error. Considering this, all statistical hypothesis tests have a probability of making type I and type II errors. These two types of error rates are traded off against each other: for any given sample set, the effort to reduce one type of error generally results in increasing the other type of error. The same idea can be expressed in terms of
3080-417: Is valid if it is an e-value. In fact, this reveals that e-values bounded to [ 0 , 1 / α ] {\displaystyle [0,1/\alpha ]} are rescaled randomized tests, that are continuously interpreted as evidence against the hypothesis. The standard e-value that takes value in [ 0 , ∞ ] {\displaystyle [0,\infty ]} appears as
3157-800: The X i {\displaystyle X_{i}} are i.i.d. according to a distribution P {\displaystyle P} with mean μ {\displaystyle \mu } ; no other assumptions about P {\displaystyle P} are made. Then we may first construct a family of e-variables for single outcomes, E i , λ := 1 + λ ( X i − μ ) {\displaystyle E_{i,\lambda }:=1+\lambda (X_{i}-\mu )} , for any λ ∈ [ − 1 / ( 1 − μ ) , 1 / μ ] {\displaystyle \lambda \in [-1/(1-\mu ),1/\mu ]} (these are
3234-739: The λ {\displaystyle \lambda } for which E i , λ {\displaystyle E_{i,\lambda }} is guaranteed to be nonnegative). We may then define a new e-variable for the complete data vector Y {\displaystyle Y} by taking the product E := ∏ i = 1 n E i , λ ˘ | X i − 1 {\displaystyle E:=\prod _{i=1}^{n}E_{i,{\breve {\lambda }}|X^{i-1}}} , where λ ˘ | X i − 1 {\displaystyle {\breve {\lambda }}|X^{i-1}}
3311-495: The "e-power" or "GRO" sense (see below). Waudby-Smith and Ramdas use this approach to construct "nonparametric" confidence intervals for the mean that tend to be significantly narrower than those based on more classical methods such as Chernoff, Hoeffding and Bernstein bounds . E-values are more suitable than p-value when one expects follow-up tests involving the same null hypothesis with different data or experimental set-ups. This includes, for example, combining individual results in
the alpha level could increase the analyses' power. A test statistic is robust if the type I error rate is controlled. Varying the threshold (cut-off) value can also make the test either more specific or more sensitive, which in turn elevates the test quality. For example, imagine a medical test, in which an experimenter might measure the concentration of a certain protein in
3465-429: The alternative hypothesis H 1 {\textstyle H_{1}} may be true, whereas we do not reject H 0 {\textstyle H_{0}} . Two types of error are distinguished: type I error and type II error. The first kind of error is the mistaken rejection of a null hypothesis as the result of a test procedure. This kind of error is called a type I error (false positive) and
3542-443: The blood sample. The experimenter could adjust the threshold (black vertical line in the figure) and people would be diagnosed as having diseases if any number is detected above this certain threshold. According to the image, changing the threshold would result in changes in false positives and false negatives, corresponding to movement on the curve. Since in a real experiment it is impossible to avoid all type I and type II errors, it
3619-404: The chance of rejecting a true hypothesis to as low a value as desired; the test must be so devised that it will reject the hypothesis tested when it is likely to be false. In 1933, they observed that these "problems are rarely presented in such a form that we can discriminate with certainty between the true and false hypothesis". They also noted that, in deciding whether to fail to reject, or reject
the complete elimination of either is an impossibility if the outcome is not determined by a known, observable causal process. The knowledge of type I errors and type II errors is widely used in medical science , biometrics and computer science . Type I errors can be thought of as errors of commission (i.e., wrongly including a 'false case'). For instance, consider testing patients for a virus infection. If
the critical region. That is to say, if the recorded average speed of a vehicle is greater than the critical value 121.9, the driver will be fined. However, 5% of drivers whose true speed does not exceed 120 will still be falsely fined, because their recorded average speed exceeds 121.9; this is a type I error. A type II error corresponds to the case where the true speed of a vehicle is over 120 kilometers per hour but
3850-414: The data observed in earlier experiments, and it is not known beforehand how many trials will be conducted: the product e-value remains a meaningful quantity, leading to tests with Type-I error control . For this reason, e-values and their sequential extension, the e-process , are the fundamental building blocks for anytime-valid statistical methods (e.g. confidence sequences). Another advantage over p-values
3927-767: The data to { not reject H 0 , reject H 0 } {\displaystyle \{{\text{not reject }}H_{0},{\text{ reject }}H_{0}\}} . A test ϕ α {\displaystyle \phi _{\alpha }} is said to be valid for level α {\displaystyle \alpha } if P ( ϕ α = reject H 0 ) ≤ α , for every P ∈ H 0 . {\displaystyle P(\phi _{\alpha }={\text{reject }}H_{0})\leq \alpha ,{\text{ for every }}P\in H_{0}.} This
the driver is not fined. For example, if the true speed of a vehicle is μ = 125, the probability that the driver is not fined can be calculated as P ( T < 121.9 ∣ μ = 125 ) = P ( T − 125 2 3 < 121.9 − 125 2 3 ) = Φ ( − 2.68 ) = 0.0036 {\displaystyle P(T<121.9\mid \mu =125)=P\left({\frac {T-125}{\frac {2}{\sqrt {3}}}}<{\frac {121.9-125}{\frac {2}{\sqrt {3}}}}\right)=\Phi (-2.68)=0.0036} which means, if
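This probability can be reproduced with the standard normal CDF; the quick check below writes the CDF via Python's math.erf:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    # Phi(z) expressed through the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

mu_true, c, sd_mean = 125, 121.9, 2 / sqrt(3)   # true speed, critical value, sd of the 3-sample mean
z = (c - mu_true) / sd_mean
print(round(z, 2), round(std_normal_cdf(z), 4))   # -2.68 and 0.0036
```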
4081-400: The drivers depending on the average speed X ¯ {\displaystyle {\bar {X}}} . That is to say, the test statistic T = X 1 + X 2 + X 3 3 = X ¯ {\displaystyle T={\frac {X_{1}+X_{2}+X_{3}}{3}}={\bar {X}}} In addition, we suppose that
4158-408: The expected gain of buying the ticket is the pay-off minus the cost, i.e. E − 1 {\displaystyle E-1} , which has expectation ≤ 0 {\displaystyle \leq 0} . Based on this interpretation, the product e-value for a sequence of tests can be interpreted as the amount of money you have gained by sequentially betting with pay-offs given by
4235-514: The facts a chance of disproving the null hypothesis. In the practice of medicine, the differences between the applications of screening and testing are considerable. Screening involves relatively cheap tests that are given to large populations, none of whom manifest any clinical indication of disease (e.g., Pap smears ). Testing involves far more expensive, often invasive, procedures that are given only to those who manifest some clinical indication of disease, and are most often applied to confirm
4312-582: The individual e-variables and always re-investing all your gains. The betting interpretation becomes particularly visible if we rewrite an e-variable as E := 1 + λ U {\displaystyle E:=1+\lambda U} where U {\displaystyle U} has expectation ≤ 0 {\displaystyle \leq 0} under all P ∈ H 0 {\displaystyle P\in H_{0}} and λ ∈ R {\displaystyle \lambda \in {\mathbb {R} }}
4389-425: The inequality (*) above is usually an equality (with continuous-valued data) or near-equality (with discrete data), this is not the case with e-variables. This makes e-value-based tests more conservative (less power) than those based on standard p-values. In exchange for this conservativeness, the p-value p ′ = 1 / E {\displaystyle p^{\prime }=1/E} comes with
4466-674: The likelihood ratio are (a) the generalized likelihood ratio as used in the standard, classical likelihood ratio test and (b) the Bayes factor . Importantly, neither (a) nor (b) are e-variables in general: generalized likelihood ratios in sense (a) are not e-variables unless the alternative is simple (see below under "universal inference"). Bayes factors are e-variables if the null is simple. To see this, note that, if Q = { Q θ : θ ∈ Θ } {\displaystyle {\mathcal {Q}}=\{Q_{\theta }:\theta \in \Theta \}} represents
4543-853: The main innovation of the e-value compared to traditional testing is to maximize a different power target. For any e-variable E {\displaystyle E} and any 0 < α ≤ 1 {\displaystyle 0<\alpha \leq 1} and all P ∈ H 0 {\displaystyle P\in H_{0}} , it holds that P ( E ≥ 1 α ) = P ( 1 / E ≤ α ) ≤ ( ∗ ) α {\displaystyle P\left(E\geq {\frac {1}{\alpha }}\right)=P(1/E\leq \alpha )\ {\overset {(*)}{\leq }}\ \alpha } . This means p ′ = 1 / E {\displaystyle p^{\prime }=1/E}
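The inequality marked (*) is an instance of Markov's inequality applied to the nonnegative e-variable E {\displaystyle E} , combined with the defining bound E P [ E ] ≤ 1 {\displaystyle \mathbb {E} _{P}[E]\leq 1} :

```latex
P\left(E \geq \tfrac{1}{\alpha}\right)
  \;\leq\; \frac{\mathbb{E}_{P}[E]}{1/\alpha}
  \;=\; \alpha\,\mathbb{E}_{P}[E]
  \;\leq\; \alpha .
```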
the measurements X 1 , X 2 , X 3 are modeled as draws from a normal distribution N(μ, 2). Then, T should follow N(μ, 2/ 3 {\displaystyle {\sqrt {3}}} ) and the parameter μ represents the true speed of the passing vehicle. In this experiment, the null hypothesis H 0 and the alternative hypothesis H 1 should be H 0 : μ=120 against H 1 : μ>120. If we perform
4697-445: The null hypothesis were true, is lower than a pre-specified cut-off probability (for example, 5%), then the result is said to be statistically significant and the null hypothesis is rejected. British statistician Sir Ronald Aylmer Fisher (1890–1962) stressed that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give
the null hypothesis; "false" means the conclusion drawn is incorrect. Thus, a type I error is equivalent to a false positive, and a type II error is equivalent to a false negative. The relations between the truth or falseness of the null hypothesis and the outcome of the test can be tabulated: a true H 0 {\textstyle H_{0}} is correctly not rejected with probability 1 − α {\textstyle 1-\alpha } , and a false H 0 {\textstyle H_{0}} is correctly rejected with probability 1 − β {\textstyle 1-\beta } . A perfect test would have zero false positives and zero false negatives. However, statistical methods are probabilistic, and it cannot be known for certain whether statistical conclusions are correct. Whenever there
4851-519: The other hand is often more convenient in nonparametric settings. As a prototypical example, consider the case that Y = ( X 1 , … , X n ) {\displaystyle Y=(X_{1},\ldots ,X_{n})} with the X i {\displaystyle X_{i}} taking values in the bounded interval [ 0 , 1 ] {\displaystyle [0,1]} . According to H 0 {\displaystyle H_{0}} ,
the patient is not infected with the virus, but the test shows that they are, this is considered a type I error. By contrast, type II errors are errors of omission (i.e., wrongly leaving out a 'true case'). In the example above, if the patient is infected by the virus, but the test shows that they are not, that would be a type II error. In statistical test theory , the notion of a statistical error
5005-428: The position of the defendant: just as he is presumed to be innocent until proven guilty, so is the null hypothesis presumed to be true until the data provide convincing evidence against it. The alternative hypothesis corresponds to the position against the defendant. Specifically, the null hypothesis also involves the absence of a difference or the absence of an association. Thus, the null hypothesis can never be that there
5082-898: The probability under Q {\displaystyle Q} that ε α = 1 / α {\displaystyle \varepsilon _{\alpha }=1/\alpha } . But if we continuously interpret the value of the test ε α {\displaystyle \varepsilon _{\alpha }} as evidence against the hypothesis, then we may also be interested in maximizing different targets such as E Q [ log ε α ] {\displaystyle \mathbb {E} ^{Q}[\log \varepsilon _{\alpha }]} . This yields tests that are remarkably different from traditional Neyman-Pearson tests, and more suitable when merged through multiplication as they are positive with probability 1 under Q {\displaystyle Q} . From this angle,
5159-400: The rate of correct results and therefore used to minimize error rates and improve the quality of hypothesis test. To reduce the probability of committing a type I error, making the alpha value more stringent is both simple and efficient. To decrease the probability of committing a type II error, which is closely associated with analyses' power, either increasing the test's sample size or relaxing
5236-495: The reciprocal of an e-value is a p-value, but not just any p-value: a special p-value for which a rejection `at level p' retains a generalized Type-I error guarantee. Third, they are broad generalizations of likelihood ratios and are also related to, yet distinct from, Bayes factors . Fourth, they have an interpretation as bets. Fifth, in a sequential context, they can also be interpreted as increments of nonnegative supermartingales . Interest in e-values has exploded since 2019, when
5313-424: The significance level after observing the data: post-hoc. A p-value that satisfies this guarantee is also called a post-hoc p-value . As p ′ {\displaystyle p^{\prime }} is a post-hoc p-value if and only if p ′ = 1 / E {\displaystyle p^{\prime }=1/E} for some e-value E {\displaystyle E} , it
5390-424: The speculated hypothesis is wrong, and the so-called "null hypothesis" that the observed phenomena simply occur by chance (and that, as a consequence, the speculated agent has no effect) – the test will determine whether this hypothesis is right or wrong. This is why the hypothesis under test is often called the null hypothesis (most likely, coined by Fisher (1935, p. 19)), because it is this hypothesis that
the statistic level at α=0.05, then a critical value c should be calculated to solve P ( Z ⩾ c − 120 2 3 ) = 0.05 {\displaystyle P\left(Z\geqslant {\frac {c-120}{\frac {2}{\sqrt {3}}}}\right)=0.05} By the change-of-units rule for the normal distribution and referring to the Z-table , we get c − 120 2 3 = 1.645 ⇒ c = 121.9 {\displaystyle {\frac {c-120}{\frac {2}{\sqrt {3}}}}=1.645\Rightarrow c=121.9} Here,
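The critical value can be replicated directly; 1.645 is the upper 5% point of the standard normal:

```python
from math import sqrt

z_alpha = 1.645                     # upper 5% quantile of the standard normal
mu0, sd_mean = 120, 2 / sqrt(3)     # null mean and sd of the 3-measurement average
c = mu0 + z_alpha * sd_mean
print(round(c, 1))  # 121.9
```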
5544-409: The term "the null hypothesis" as meaning "the nil hypothesis" – a statement that the results in question have arisen through chance. This is not necessarily the case – the key restriction, as per Fisher (1966), is that "the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the 'problem of distribution', of which the test of significance
5621-562: The term 'e-value' was coined and a number of breakthrough results were achieved by several research groups. The first overview article appeared in 2023. Let the null hypothesis H 0 {\displaystyle H_{0}} be given as a set of distributions for data Y {\displaystyle Y} . Usually Y = ( X 1 , … , X τ ) {\displaystyle Y=(X_{1},\ldots ,X_{\tau })} with each X i {\displaystyle X_{i}}
5698-974: The test ϕ α {\displaystyle \phi _{\alpha }} rejects. This is odd at best, because a rejection at level 1% is a much stronger claim than a rejection at level 10%. A more suitable decision space seems to be { not reject H 0 , reject H 0 at level α } {\displaystyle \{{\text{not reject }}H_{0},{\text{ reject }}H_{0}{\text{ at level }}\alpha \}} . The e-value can be interpreted as resolving this problem. Indeed, we can rescale from { 0 , 1 } {\displaystyle \{0,1\}} to { 0 , 1 / α } {\displaystyle \{0,1/\alpha \}} and [ 0 , 1 ] {\displaystyle [0,1]} to [ 0 , 1 / α ] {\displaystyle [0,1/\alpha ]} by rescaling
5775-692: The test by its level: ε α = ϕ α / α {\displaystyle \varepsilon _{\alpha }=\phi _{\alpha }/\alpha } , where we denote a test on this evidence scale by ε α {\displaystyle \varepsilon _{\alpha }} to avoid confusion. Such a test is then valid if E P [ ε α ] ≤ 1 , for every P ∈ H 0 {\displaystyle \mathbb {E} ^{P}[\varepsilon _{\alpha }]\leq 1,{\text{ for every }}P\in H_{0}} . That is: it
5852-423: The traffic police do not want to falsely fine innocent drivers, the level α can be set to a smaller value, like 0.01. However, if that is the case, more drivers whose true speed is over 120 kilometers per hour, like 125, would be more likely to avoid the fine. In 1928, Jerzy Neyman (1894–1981) and Egon Pearson (1895–1980), both eminent statisticians, discussed the problems associated with "deciding whether or not
the true speed of a vehicle is 125, the driver has a 0.36% probability of avoiding the fine when the test is performed at level α=0.05, namely the probability that the recorded average speed falls below 121.9. If the true speed is closer to 121.9 than 125, then the probability of avoiding the fine will also be higher. The tradeoffs between type I error and type II error should also be considered. That is, in this case, if