
Cusum

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license. Give it a read and then ask your questions in the chat. We can research this topic together.

In statistical quality control, the CUSUM (or cumulative sum control chart) is a sequential analysis technique developed by E. S. Page of the University of Cambridge. It is typically used for monitoring change detection. CUSUM was announced in Biometrika in 1954, a few years after the publication of Wald's sequential probability ratio test (SPRT).


Cusum may refer to:

CUSUM, a technique in statistical quality control
Cusum (Pannonia), an ancient Roman city in today's Petrovaradin

This disambiguation page lists articles associated with the title Cusum. If an internal link led you here, you may wish to change the link to point directly to

{\displaystyle \mathop {\operatorname {arg\,max} } _{\theta }{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])=\mathop {\operatorname {arg\,max} } _{\theta }{\frac {1}{h}}{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h]),} since {\textstyle h}

{\displaystyle \mathop {\operatorname {arg\,max} } _{\theta }{\frac {1}{h}}{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])=\mathop {\operatorname {arg\,max} } _{\theta }{\frac {1}{h}}\Pr(x_{j}\leq x\leq x_{j}+h\mid \theta )=\mathop {\operatorname {arg\,max} } _{\theta }{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x\mid \theta )\,dx,} where {\textstyle f(x\mid \theta )}

{\displaystyle \mathop {\operatorname {arg\,max} } _{\theta }{\mathcal {L}}(\theta \mid x_{j})=\mathop {\operatorname {arg\,max} } _{\theta }\left[\lim _{h\to 0^{+}}{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])\right]=}

{\displaystyle \mathop {\operatorname {arg\,max} } _{\theta }{\mathcal {L}}(\theta \mid x_{j})=\mathop {\operatorname {arg\,max} } _{\theta }f(x_{j}\mid \theta ),} and so maximizing

{\displaystyle {\begin{aligned}&\mathop {\operatorname {arg\,max} } _{\theta }{\mathcal {L}}(\theta \mid x_{j})=\mathop {\operatorname {arg\,max} } _{\theta }\left[\lim _{h\to 0^{+}}{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])\right]\\[4pt]={}&\mathop {\operatorname {arg\,max} } _{\theta }\left[\lim _{h\to 0^{+}}{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x\mid \theta )\,dx\right]=\mathop {\operatorname {arg\,max} } _{\theta }f(x_{j}\mid \theta ).\end{aligned}}} Therefore,

a conclusion which could only be reached via Bayes' theorem given knowledge about the marginal probabilities {\textstyle P(p_{\text{H}}=0.5)} and {\textstyle P({\text{HH}})}. Now suppose that the coin is not a fair coin, but instead that {\textstyle p_{\text{H}}=0.3}. Then

a density {\textstyle f(x\mid \theta )}, where the sum of all the {\textstyle p}'s added to the integral of {\textstyle f} is always one. Assuming that it is possible to distinguish an observation corresponding to one of the discrete probability masses from one which corresponds to

a given significance level. Numerous other tests can be viewed as likelihood-ratio tests or approximations thereof. The asymptotic distribution of the log-likelihood ratio, considered as a test statistic, is given by Wilks' theorem. The likelihood ratio is also of central importance in Bayesian inference, where it is known as the Bayes factor, and is used in Bayes' rule. Stated in terms of odds, Bayes' rule states that

a parameter {\textstyle \theta }. Then the function {\displaystyle {\mathcal {L}}(\theta \mid x)=f_{\theta }(x),} considered as a function of {\textstyle \theta }, is the likelihood function (of {\textstyle \theta }, given

a parameter {\textstyle \theta }. Then the function {\displaystyle {\mathcal {L}}(\theta \mid x)=p_{\theta }(x)=P_{\theta }(X=x),} considered as a function of {\textstyle \theta },



a parameter of the probability distribution; for example, the mean. He devised CUSUM as a method to determine changes in it, and proposed a criterion for deciding when to take corrective action. When the CUSUM method is applied to changes in mean, it can be used for step detection of a time series. A few years later, George Alfred Barnard developed a visualization method, the V-mask chart, to detect both increases and decreases in {\displaystyle \theta }. As its name implies, CUSUM involves

a process with a mean of 0 and a standard deviation of 0.5. From the {\displaystyle Z} column, it can be seen that {\displaystyle X} never deviates by 3 standard deviations ({\displaystyle 3\sigma }), so simply alerting on a high deviation will not detect a failure, whereas CUSUM shows that the {\displaystyle S_{H}} value exceeds 4 at
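
A comparison of this kind can be sketched in code. The data and the exact update rule below are illustrative assumptions (a common one-sided tabular CUSUM with allowance ω), not the article's own table of 20 observations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: in-control mean 0, sd 0.5, with an upward shift of
# about one standard deviation starting at observation 11.
x = np.concatenate([rng.normal(0.0, 0.5, 10), rng.normal(0.5, 0.5, 10)])

mu0, sigma = 0.0, 0.5      # in-control mean and standard deviation
omega = 0.25               # critical level ("allowance"); assumed value
threshold = 4 * sigma      # alarm level; assumed value

s_h = 0.0
for n, xn in enumerate(x, start=1):
    z = (xn - mu0) / sigma                    # standardized deviation
    s_h = max(0.0, s_h + (xn - mu0) - omega)  # one-sided CUSUM update (assumed form)
    print(f"n={n:2d}  x={xn:+.2f}  z={z:+.2f}  S_H={s_h:.2f}  "
          f"3-sigma alert={abs(z) > 3}  CUSUM alert={s_h > threshold}")
```

With a sustained shift of roughly one standard deviation, individual observations rarely cross the 3σ line, while the accumulated S_H tends to drift upward past the alarm level, mirroring the behaviour described above.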

a related method.

Likelihood function

A likelihood function (often simply called the likelihood) measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the joint probability distribution of the random variable that (presumably) generated the observations. When evaluated on

is negative definite for every {\textstyle \,\theta \in \Theta \,} at which the gradient {\textstyle \;\nabla L\equiv \left[\,{\frac {\partial L}{\,\partial \theta _{i}\,}}\,\right]_{i=1}^{n_{\mathrm {i} }}\;} vanishes, and if

is positive definite and {\textstyle \,\left|\mathbf {I} (\theta )\right|\,} is finite. This ensures that the score has a finite variance. The above conditions are sufficient, but not necessary. That is, a model that does not meet these regularity conditions may or may not have a maximum likelihood estimator of the properties mentioned above. Further, in case of non-independently or non-identically distributed observations additional properties may need to be assumed. In Bayesian statistics, almost identical regularity conditions are imposed on

is assumed that the information matrix, {\displaystyle \mathbf {I} (\theta )=\int _{-\infty }^{\infty }{\frac {\partial \log f}{\partial \theta _{r}}}\ {\frac {\partial \log f}{\partial \theta _{s}}}\ f\ \mathrm {d} z}

is central to likelihoodist statistics: the law of likelihood states that the degree to which data (considered as evidence) supports one parameter value versus another is measured by the likelihood ratio. In frequentist inference, the likelihood ratio is the basis for a test statistic, the so-called likelihood-ratio test. By the Neyman–Pearson lemma, this is the most powerful test for comparing two simple hypotheses at

is defined to be {\displaystyle R(\theta )={\frac {{\mathcal {L}}(\theta \mid x)}{{\mathcal {L}}({\hat {\theta }}\mid x)}}.} Thus, the relative likelihood is the likelihood ratio (discussed above) with

is discussed below). Given a probability density or mass function {\displaystyle x\mapsto f(x\mid \theta ),} where {\textstyle x} is a realization of the random variable {\textstyle X}, the likelihood function is {\displaystyle \theta \mapsto f(x\mid \theta ),} often written {\displaystyle {\mathcal {L}}(\theta \mid x).} In other words, when {\textstyle f(x\mid \theta )}

is given by {\textstyle {\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])}. Observe that {\displaystyle \mathop {\operatorname {arg\,max} } _{\theta }{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])=}



is not directly used in AIC-based statistics. Instead, what is used is the relative likelihood of models (see below). In evidence-based medicine, likelihood ratios are used in diagnostic testing to assess the value of performing a diagnostic test. Since the actual value of the likelihood function depends on the sample, it is often convenient to work with a standardized measure. Suppose that

is often avoided and instead {\textstyle f(x;\theta )} or {\textstyle f(x,\theta )} are used to indicate that {\textstyle \theta } is regarded as a fixed unknown quantity rather than as a random variable being conditioned on. The likelihood function does not specify

is positive and constant. Because {\displaystyle \mathop {\operatorname {arg\,max} } _{\theta }{\frac {1}{h}}{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])=\mathop {\operatorname {arg\,max} } _{\theta }{\frac {1}{h}}\Pr(x_{j}\leq x\leq x_{j}+h\mid \theta )=}

is such that {\textstyle \,\int _{-\infty }^{\infty }H_{rst}(z)\mathrm {d} z\leq M<\infty \;.} This boundedness of the derivatives is needed to allow for differentiation under the integral sign. And lastly, it

is the likelihood function, given the outcome {\textstyle x} of the random variable {\textstyle X}. Sometimes the probability of "the value {\textstyle x} of {\textstyle X} for the parameter value {\textstyle \theta }" is written as P(X = x | θ) or P(X = x; θ). The likelihood

is the index of the discrete probability mass corresponding to observation {\textstyle x}, because maximizing the probability mass (or probability) at {\textstyle x} amounts to maximizing the likelihood of the specific observation. The fact that the likelihood function can be defined in a way that includes contributions that are not commensurate (the density and

is the posterior probability of {\textstyle \theta } given the data {\textstyle x}. Consider a simple statistical model of a coin flip: a single parameter {\textstyle p_{\text{H}}} that expresses the "fairness" of the coin. The parameter is the probability that a coin lands heads up ("H") when tossed. {\textstyle p_{\text{H}}} can take on any value within

is the probability density function, it follows that {\displaystyle \mathop {\operatorname {arg\,max} } _{\theta }{\mathcal {L}}(\theta \mid x\in [x_{j},x_{j}+h])=\mathop {\operatorname {arg\,max} } _{\theta }{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x\mid \theta )\,dx.} The first fundamental theorem of calculus provides that {\displaystyle \lim _{h\to 0^{+}}{\frac {1}{h}}\int _{x_{j}}^{x_{j}+h}f(x\mid \theta )\,dx=f(x_{j}\mid \theta ).} Then

is the probability that a particular outcome {\textstyle x} is observed when the true value of the parameter is {\textstyle \theta }, equivalent to the probability mass on {\textstyle x}; it is not a probability density over the parameter {\textstyle \theta }. The likelihood, {\textstyle {\mathcal {L}}(\theta \mid x)}, should not be confused with {\textstyle P(\theta \mid x)}, which

is this density interpreted as a function of the parameter, rather than the random variable. Thus, we can construct a likelihood function for any distribution, whether discrete, continuous, a mixture, or otherwise. (Likelihoods are comparable, e.g. for parameter estimation, only if they are Radon–Nikodym derivatives with respect to the same dominating measure.) The above discussion of the likelihood for discrete random variables uses


is viewed as a function of {\textstyle x} with {\textstyle \theta } fixed, it is a probability density function, and when viewed as a function of {\textstyle \theta } with {\textstyle x} fixed, it is a likelihood function. In the frequentist paradigm, the notation {\textstyle f(x\mid \theta )}

the counting measure, under which the probability density at any outcome equals the probability of that outcome. The above can be extended in a simple way to allow consideration of distributions which contain both discrete and continuous components. Suppose that the distribution consists of a number of discrete probability masses {\textstyle p_{k}(\theta )} and

the matrix of second partials {\displaystyle \mathbf {H} (\theta )\equiv \left[\,{\frac {\partial ^{2}L}{\,\partial \theta _{i}\,\partial \theta _{j}\,}}\,\right]_{i,j=1,1}^{n_{\mathrm {i} },n_{\mathrm {j} }}\;}

the maximum likelihood estimate for the parameter θ is {\textstyle {\hat {\theta }}}. Relative plausibilities of other θ values may be found by comparing the likelihoods of those other values with the likelihood of {\textstyle {\hat {\theta }}}. The relative likelihood of θ

the outcome {\textstyle X=x}). Again, {\textstyle {\mathcal {L}}} is not a probability density or mass function over {\textstyle \theta }, despite being a function of {\textstyle \theta } given the observation {\textstyle X=x}. The use of

the posterior odds of two alternatives, {\displaystyle A_{1}} and {\displaystyle A_{2}}, given an event {\displaystyle B}, is the prior odds, times the likelihood ratio. As an equation: {\displaystyle O(A_{1}:A_{2}\mid B)=O(A_{1}:A_{2})\cdot \Lambda (A_{1}:A_{2}\mid B).} The likelihood ratio
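
As a small illustration with made-up numbers: if the prior odds are {\textstyle O(A_{1}:A_{2})=1/4} and the event {\textstyle B} is eight times as probable under {\textstyle A_{1}} as under {\textstyle A_{2}} (a likelihood ratio of 8), then {\displaystyle O(A_{1}:A_{2}\mid B)={\tfrac {1}{4}}\cdot 8=2,} i.e., posterior odds of 2:1 in favour of {\textstyle A_{1}}.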

the probability density in specifying the likelihood function above is justified as follows. Given an observation {\textstyle x_{j}}, the likelihood for the interval {\textstyle [x_{j},x_{j}+h]}, where {\textstyle h>0} is a constant,

the 17th observation. Here {\displaystyle \omega } is a critical level parameter (tunable, like the threshold T) that is used to adjust the sensitivity of change detection: a larger {\displaystyle \omega } makes CUSUM less sensitive to the change, and vice versa. Cumulative observed-minus-expected plots are

the actual data points, it becomes a function solely of the model parameters. In maximum likelihood estimation, the argument that maximizes the likelihood function serves as a point estimate for the unknown parameter, while the Fisher information (often approximated by the likelihood's Hessian matrix at the maximum) gives an indication of the estimate's precision. In contrast, in Bayesian statistics,
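
A minimal numerical sketch of that idea, under assumptions not taken from this article (an i.i.d. normal sample, a generic optimizer, and a finite-difference stand-in for the observed Fisher information):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.5, size=200)   # hypothetical sample

def neg_log_likelihood(params, x):
    """Negative log-likelihood of an i.i.d. normal sample (constants dropped)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                      # parameterize sigma > 0
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + x.size * np.log(sigma)

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print("MLE of (mu, sigma):", mu_hat, sigma_hat)

# Curvature of the negative log-likelihood in mu at its minimum, a crude
# stand-in for the observed Fisher information; its inverse square root
# approximates the standard error of mu_hat (roughly sigma_hat / sqrt(n)).
h = 1e-4
f = lambda m: neg_log_likelihood([m, res.x[1]], data)
curvature = (f(mu_hat + h) - 2.0 * f(mu_hat) + f(mu_hat - h)) / h**2
print("approximate standard error of mu_hat:", 1.0 / np.sqrt(curvature))
```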

the calculation of a cumulative sum (which is what makes it "sequential"). Samples from a process {\displaystyle x_{n}} are assigned weights {\displaystyle \omega _{n}}, and summed as follows: When the value of S exceeds a certain threshold value, a change in value has been found. The above formula only detects changes in
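
A commonly cited form of Page's recursion, given here as an assumption rather than a quotation from the article, is {\displaystyle S_{0}=0,\qquad S_{n}=\max(0,\,S_{n-1}+x_{n}-\omega _{n}),} with an alarm raised the first time {\textstyle S_{n}} exceeds the chosen threshold.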


the corresponding likelihood. The result of such calculations is displayed in Figure 1. The integral of {\textstyle {\mathcal {L}}} over [0, 1] is 1/3; likelihoods need not integrate or sum to one over the parameter space. Let {\textstyle X} be a random variable following an absolutely continuous probability distribution with density function {\textstyle f} (a function of {\textstyle x}) which depends on
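
The value 1/3 can be checked directly: for the two-heads outcome the likelihood is {\textstyle {\mathcal {L}}(p_{\text{H}}\mid {\text{HH}})=p_{\text{H}}^{2}}, and {\displaystyle \int _{0}^{1}p_{\text{H}}^{2}\,dp_{\text{H}}={\tfrac {1}{3}}.}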

the density component, the likelihood function for an observation from the continuous component can be dealt with in the manner shown above. For an observation from the discrete component, the likelihood function is simply {\displaystyle {\mathcal {L}}(\theta \mid x)=p_{k}(\theta ),} where {\textstyle k}

the estimate of interest is the converse of the likelihood, the so-called posterior probability of the parameter given the observed data, which is calculated via Bayes' rule. The likelihood function, parameterized by a (possibly multivariate) parameter {\textstyle \theta }, is usually defined differently for discrete and continuous probability distributions (a more general definition

the existence of a Taylor expansion. Second, for almost all {\textstyle x} and for every {\textstyle \,\theta \in \Theta \,} it must be that {\displaystyle \left|{\frac {\partial f}{\partial \theta _{r}}}\right|<F_{r}(x)\,,\quad \left|{\frac {\partial ^{2}f}{\partial \theta _{r}\,\partial \theta _{s}}}\right|<F_{rs}(x)\,,\quad \left|{\frac {\partial ^{3}f}{\partial \theta _{r}\,\partial \theta _{s}\,\partial \theta _{t}}}\right|<H_{rst}(x)} where {\textstyle H}

the existence of a global maximum of the likelihood function is of the utmost importance. By the extreme value theorem, it suffices that the likelihood function is continuous on a compact parameter space for the maximum likelihood estimator to exist. While the continuity assumption is usually met, the compactness assumption about the parameter space is often not, as the bounds of the true parameter values might be unknown. In that case, concavity of

the expense incurred by the scheme when it gives false alarms, i.e., Type I errors (Neyman & Pearson, 1936). On the other hand, for constant poor quality the A.R.L. measures the delay and thus the amount of scrap produced before the rectifying action is taken, i.e., Type II errors. The following example shows 20 observations {\displaystyle X} of

the fixed denominator {\textstyle {\mathcal {L}}({\hat {\theta }})}. This corresponds to standardizing the likelihood to have a maximum of 1. A likelihood region is the set of all values of θ whose relative likelihood is greater than or equal to a given threshold. In terms of percentages, a p% likelihood region for θ
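
Continuing the two-heads coin example as an illustration (this worked case is not part of the original text): the maximum likelihood estimate is {\textstyle {\hat {p}}_{\text{H}}=1}, so {\textstyle R(p_{\text{H}})=p_{\text{H}}^{2}}, and the 25% likelihood region is {\displaystyle \{p_{\text{H}}:p_{\text{H}}^{2}\geq 0.25\}=[0.5,\,1].}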

the intended article.

CUSUM

E. S. Page referred to a "quality number" {\displaystyle \theta }, by which he meant

the likelihood function approaches a constant on the boundary of the parameter space, {\textstyle \;\partial \Theta \;,} i.e., {\displaystyle \lim _{\theta \to \partial \Theta }L(\theta )=0\;,} which may include

the likelihood function in order to prove asymptotic normality of the posterior probability, and therefore to justify a Laplace approximation of the posterior in large samples. A likelihood ratio is the ratio of any two specified likelihoods, frequently written as: {\displaystyle \Lambda (\theta _{1}:\theta _{2}\mid x)={\frac {{\mathcal {L}}(\theta _{1}\mid x)}{{\mathcal {L}}(\theta _{2}\mid x)}}.} The likelihood ratio
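
For instance, with the two-heads coin data used elsewhere in this article, {\displaystyle \Lambda (0.5:0.3\mid {\text{HH}})={\frac {0.25}{0.09}}\approx 2.78,} so the data favour {\textstyle p_{\text{H}}=0.5} over {\textstyle p_{\text{H}}=0.3} by a factor of roughly 2.8.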



the likelihood function plays a key role. More specifically, if the likelihood function is twice continuously differentiable on the k-dimensional parameter space {\textstyle \Theta } assumed to be an open connected subset of {\textstyle \mathbb {R} ^{k}\,,} there exists a unique maximum {\textstyle {\hat {\theta }}\in \Theta } if

the likelihood of observing "HH" assuming {\textstyle p_{\text{H}}=0.5} is {\displaystyle {\mathcal {L}}(p_{\text{H}}=0.5\mid {\text{HH}})=0.25.} This is not the same as saying that {\textstyle P(p_{\text{H}}=0.5\mid HH)=0.25},

the lower "holding barrier" rather than a separate lower "holding barrier". Also, CUSUM does not require the use of the likelihood function. As a means of assessing CUSUM's performance, Page defined the average run length (A.R.L.) metric: "the expected number of articles sampled before action is taken." He further wrote: When the quality of the output is satisfactory the A.R.L. is a measure of
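
The A.R.L. of a particular scheme is usually evaluated numerically. A minimal Monte Carlo sketch (the normal model, the parameter values, and the one-sided recursion are all assumptions, not Page's original calculation):

```python
import numpy as np

def run_length(rng, mean, sd=1.0, omega=0.5, threshold=5.0, max_n=100_000):
    """Number of samples drawn until the one-sided CUSUM exceeds the threshold."""
    s = 0.0
    for n in range(1, max_n + 1):
        s = max(0.0, s + rng.normal(mean, sd) - omega)
        if s > threshold:
            return n
    return max_n  # no alarm within the horizon

rng = np.random.default_rng(42)

# In-control A.R.L.: long runs are desirable, since every alarm is false.
arl_in_control = np.mean([run_length(rng, mean=0.0) for _ in range(1000)])

# Out-of-control A.R.L. after a one-standard-deviation shift:
# short runs are desirable, since the alarm is then a true detection.
arl_shifted = np.mean([run_length(rng, mean=1.0) for _ in range(1000)])

print(f"estimated in-control A.R.L.:     {arl_in_control:.1f}")
print(f"estimated out-of-control A.R.L.: {arl_shifted:.1f}")
```

Runs of this sketch give an in-control A.R.L. in the hundreds and an out-of-control A.R.L. of roughly ten, which is the trade-off Page describes: rare false alarms when quality is satisfactory, and little delay (and scrap) before action is taken when it is not.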

the points at infinity if {\textstyle \,\Theta \,} is unbounded. Mäkeläinen and co-authors prove this result using Morse theory while informally appealing to a mountain pass property. Mascarenhas restates their proof using the mountain pass theorem. In the proofs of consistency and asymptotic normality of the maximum likelihood estimator, additional assumptions are made about

the positive direction. When negative changes need to be found as well, the min operation should be used instead of the max operation, and this time a change has been found when the value of S is below the (negative) value of the threshold. Page did not explicitly say that {\displaystyle \omega } represents the likelihood function, but this is common usage. This differs from SPRT by always using the zero function as
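
Written out under the same assumed notation as the recursion given earlier, the downward-looking statistic is {\displaystyle S_{0}^{-}=0,\qquad S_{n}^{-}=\min(0,\,S_{n-1}^{-}+x_{n}-\omega _{n}),} with a change signalled once {\textstyle S_{n}^{-}} falls below the negative of the threshold.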

the probability densities that form the basis of a particular likelihood function. These conditions were first established by Chanda. In particular, for almost all {\textstyle x}, and for all {\textstyle \,\theta \in \Theta \,,} {\displaystyle {\frac {\partial \log f}{\partial \theta _{r}}}\,,\quad {\frac {\partial ^{2}\log f}{\partial \theta _{r}\partial \theta _{s}}}\,,\quad {\frac {\partial ^{3}\log f}{\partial \theta _{r}\,\partial \theta _{s}\,\partial \theta _{t}}}\,} exist for all {\textstyle \,r,s,t=1,2,\ldots ,k\,} in order to ensure

the probability density at {\textstyle x_{j}} amounts to maximizing the likelihood of the specific observation {\textstyle x_{j}}. In measure-theoretic probability theory, the density function is defined as the Radon–Nikodym derivative of the probability distribution relative to a common dominating measure. The likelihood function

the probability mass) arises from the way in which the likelihood function is defined up to a constant of proportionality, where this "constant" can change with the observation {\textstyle x}, but not with the parameter {\textstyle \theta }. In the context of parameter estimation, the likelihood function is usually assumed to obey certain conditions, known as regularity conditions. These conditions are assumed in various proofs involving likelihood functions, and need to be verified in each particular application. For maximum likelihood estimation,

the probability of two heads on two flips is {\displaystyle P({\text{HH}}\mid p_{\text{H}}=0.3)=0.3^{2}=0.09.} Hence {\displaystyle {\mathcal {L}}(p_{\text{H}}=0.3\mid {\text{HH}})=0.09.} More generally, for each value of {\textstyle p_{\text{H}}}, we can calculate
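
A short sketch of that calculation (the grid and the spot checks are illustrative choices, not taken from the article):

```python
import numpy as np

def likelihood(p_h):
    """L(p_H | HH) = P(HH | p_H) = p_H * p_H for two independent tosses."""
    return p_h ** 2

grid = np.linspace(0.0, 1.0, 101)   # candidate values of the parameter
values = likelihood(grid)

print(round(likelihood(0.5), 4))    # 0.25, as in the text
print(round(likelihood(0.3), 4))    # 0.09, as in the text
print(grid[np.argmax(values)])      # 1.0 -- the maximum-likelihood value for "HH"

# Trapezoidal check that the likelihood integrates to about 1/3 over [0, 1],
# illustrating that a likelihood need not integrate to one.
step = grid[1] - grid[0]
print(((values[:-1] + values[1:]) / 2 * step).sum())   # ~0.3333
```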

the probability that {\textstyle \theta } is the truth, given the observed sample {\textstyle X=x}. Such an interpretation is a common error, with potentially disastrous consequences (see prosecutor's fallacy). Let {\textstyle X} be a discrete random variable with probability mass function {\textstyle p} depending on



the range 0.0 to 1.0. For a perfectly fair coin, {\textstyle p_{\text{H}}=0.5}. Imagine flipping a fair coin twice, and observing two heads in two tosses ("HH"). Assuming that each successive coin flip is i.i.d., then the probability of observing HH is {\displaystyle P({\text{HH}}\mid p_{\text{H}}=0.5)=0.5^{2}=0.25.} Equivalently,
