The Lincoln index is a statistical measure used in several fields to estimate the population size of an animal species. Described by Frederick Charles Lincoln in 1930, it is also sometimes known as the Lincoln–Petersen method after C.G. Johannes Petersen, who was the first to use the related mark and recapture method.
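The Lincoln index itself is a single ratio, $L = E_1 E_2 / S$, where $E_1$ and $E_2$ are the numbers of items (species, words, typographical errors, and so on) found by two independent observers or methods and $S$ is the number found by both. The following is a minimal Python sketch; the function names are hypothetical, and the warning labels simply encode the rule of thumb that the estimate is rough for $S < 10$ and extremely rough for $S < 5$.

```python
def lincoln_index(e1: int, e2: int, s: int) -> float:
    """Lincoln index L = E1 * E2 / S.

    e1, e2: items found by each of two independent observers or methods
    s:      items found by both
    """
    if s <= 0:
        # No overlap at all: the index is formally undefined.
        raise ValueError("the Lincoln index is undefined when S = 0")
    return e1 * e2 / s


def precision_warning(s: int) -> str:
    """Rule of thumb: rough for S < 10, extremely rough for S < 5."""
    if s < 5:
        return "extremely rough"
    if s < 10:
        return "rough"
    return "not flagged"


# Two observers each report 100 species.
print(lincoln_index(100, 100, 5), precision_warning(5))              # 2000.0 rough
print(round(lincoln_index(100, 100, 99), 1), precision_warning(99))  # 101.0 not flagged
```

With only 5 shared species the index suggests about 2000 species in total, while 99 shared species suggest barely more than the 100 each observer already found, which matches the intuition of the two-observer example.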
Consider two observers who separately count the different species of plants or animals in a given area. If they each come back having found 100 species but only 5 of those species were found by both observers, then each observer clearly missed at least 95 species (that is, the 95 that only the other observer found). Thus, we know that both observers miss a lot. On the other hand, if 99 of the 100 species each observer found had been found by both, it
A particular project, making it important to obtain or generate a cost estimate as one of the vital elements of entering into the project. The U.S. Government Accountability Office defines a cost estimate as "the summation of individual cost elements, using established methods and valid data, to estimate the future costs of a program, based on what is known today", and reports that "realistic cost estimating
A passenger with the disease came from such an area, where $q > 0.5$), or low rates (probability $1 - q$). It was posited that only 5 out of 100 of the travelers could be detected, and 10 out of 100 were from the high-risk area. Then the capture probability P was defined as: where the first term refers to the probability of detection (capture probability) in a high-risk zone, and the latter term refers to
A tag or band during the second visit and then are released. Population size can be estimated from as few as two visits to the study area. Commonly, more than two visits are made, particularly if estimates of survival or movement are desired. Regardless of the total number of visits, the researcher simply records the date of each capture of each individual. The "capture histories" generated are analyzed mathematically to estimate population size, survival, or movement. When capturing and marking organisms, ecologists need to consider
Is (n, K, k) = (10, 15, 5). The problem is to estimate N. The Lincoln–Petersen method (also known as the Petersen–Lincoln index or Lincoln index) can be used to estimate population size if only two visits are made to the study area. This method assumes that the study population is "closed". In other words, the two visits to the study area are close enough in time so that no individuals die, are born, or move into or out of
Is an example of linear optimization. In more complex cases, where more than one resource f is devoted to more than two areas, multivariate optimization is often used, through the simplex algorithm or its derivatives. The literature on the analysis of capture-recapture studies has blossomed since the early 1990s. There are very elaborate statistical models available for the analysis of these experiments. A simple model which easily accommodates
Is biased at small sample sizes. An alternative, less biased estimator of population size is given by the Chapman estimator: $\hat{N}_C = \frac{(K+1)(n+1)}{k+1} - 1$. The example (n, K, k) = (10, 15, 5) gives $\hat{N}_C = \frac{16 \cdot 11}{6} - 1 \approx 28.3$. Note that the answer provided by this equation must be truncated, not rounded. Thus, the Chapman method estimates 28 turtles in the lake. Surprisingly, Chapman's estimate was one conjecture from a range of possible estimators: "In practice,
Is counting a small number of examples of something, and projecting that number onto a larger population. An example of estimation would be determining how many candies of a given size are in a glass jar. Because the distribution of candies inside the jar may vary, the observer can count the number of candies visible through the glass, consider the size of the jar, and presume that a similar distribution can be found in
Is derived from the best information available. Typically, estimation involves "using the value of a statistic derived from a sample to estimate the value of a corresponding population parameter". The sample provides information that can be projected, through various formal or informal processes, to determine a range most likely to describe the missing information. An estimate that turns out to be incorrect will be an overestimate if
Is fair to expect that they have found a far higher percentage of the total species that are there to find. The same reasoning applies to mark and recapture. If some animals in a given area are caught and marked, and later a second round of captures is done, the number of marked animals found in the second round can be used to generate an estimate of the total population. Another example arises in computational linguistics for estimating
Is important in business and economics because too many variables exist to figure out how large-scale activities will develop. Estimation in project planning can be particularly significant, because plans for the distribution of labor and purchases of raw materials must be made, despite the inability to know every possible problem that may come up. A certain amount of resources will be available for carrying out
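The capture-probability model discussed in this article is linear in the allocation fraction f, so its maximum over $0 \le f \le 1$ always sits at an endpoint: f = 1 when the slope is positive and f = 0 when it is negative. The minimal Python sketch below assumes the two-term form $P(f) = f\,q\,d_H + (1-f)(1-q)\,d_L$, where $d_H$ and $d_L$ are hypothetical per-zone detection rates chosen for illustration (they are not figures from the text), and all function names are likewise hypothetical.

```python
def capture_probability(f: float, q: float, d_high: float, d_low: float) -> float:
    """Assumed model P(f) = f*q*d_high + (1 - f)*(1 - q)*d_low.

    f:      fraction of the detection resource spent on the high-risk sector
    q:      probability that a case comes from the high-risk sector
    d_high, d_low: hypothetical detection rates in each sector
    """
    return f * q * d_high + (1 - f) * (1 - q) * d_low


def best_allocation(q: float, d_high: float, d_low: float) -> float:
    """P is linear in f, so the optimum on [0, 1] is at f = 0 or f = 1."""
    slope = q * d_high - (1 - q) * d_low  # coefficient of f in P(f)
    # With slope == 0 every f gives the same P; 0.0 is returned arbitrarily.
    return 1.0 if slope > 0 else 0.0


def break_even_q(d_high: float, d_low: float) -> float:
    """q at which the slope vanishes: q* = d_low / (d_high + d_low)."""
    return d_low / (d_high + d_low)


# Hypothetical rates: above q* = 1/11, all effort goes to the high-risk sector.
print(break_even_q(0.5, 0.05))          # about 0.0909
print(best_allocation(0.5, 0.5, 0.05))  # 1.0
print(best_allocation(0.05, 0.5, 0.05)) # 0.0
```

Because only the sign of the slope matters, the whole decision reduces to comparing q against the break-even value, which is the one-variable inequality the text solves for q.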
Is likely to be inaccurate. For example, in trying to guess the number of candies in the jar, if fifty were visible, and the total volume of the jar seemed to be about twenty times as large as the volume containing the visible candies, then one might simply project that there were a thousand candies in the jar. Such a projection, intended to pick the single value that is believed to be closest to
Is merely an estimate. For example, the species in a given area could tend to be either very common or very rare, or tend to be either very hard or very easy to see. Then it would be likely that both observers would find a large share of the common species, and that both observers would miss a large share of the rare ones. Such distributions would throw off the consequent estimate. However, such distributions are unusual for natural phenomena, as suggested by Zipf's law. T. J. Gaskell and B. J. George propose an enhancement of
Is more computationally demanding, but extracts more information from the data, improving parameter and uncertainty estimates.
Estimating
Estimation (or estimating) is the process of finding an estimate or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable. The value is nonetheless usable because it
Is often defined as a two-variable model, in which f is defined as the fraction of a finite resource devoted to detecting the animal or person of interest from a high-risk sector of an animal or human population, and q is the frequency of time that the problem (e.g., an animal disease) occurs in the high-risk versus the low-risk sector. For example, an application of the model in the 1920s was to detect typhoid carriers in London, who were either arriving from zones with high rates of tuberculosis (probability q that
Is too broad to be useful. For example, if one were asked to estimate the percentage of people who like candy, it would clearly be correct that the number falls between zero and one hundred percent. Such an estimate would provide no guidance, however, to somebody who is trying to determine how many candies to buy for a party to be attended by a hundred people. In mathematics, approximation describes
Is used in signal processing, for approximating an unobserved signal on the basis of an observed signal containing noise. For estimation of yet-to-be observed quantities, forecasting and prediction are applied. A Fermi problem, in physics, is one concerning estimation in problems that typically involve making justified guesses about quantities that seem impossible to compute given limited available information. Estimation
The $1 - \alpha/2$ quantile of a standard normal random variable, and $\hat{\sigma}_{0.5} = \sqrt{\frac{1}{k+0.5} + \frac{1}{K-k+0.5} + \frac{1}{n-k+0.5} + \frac{k+0.5}{(n-k+0.5)(K-k+0.5)}}$. The example (n, K, k) = (10, 15, 5) gives
The Jolly–Seber model (used in open populations and for multiple census estimates) and Schnabel estimators (an expansion to the Lincoln–Petersen method for closed populations). These are described in detail by Sutherland. Modelling mark-recapture data is trending towards a more integrative approach, which combines mark-recapture data with population dynamics models and other types of data. The integrated approach
The Lincoln Index that claims to reduce bias.
Mark and recapture
Mark and recapture is a method commonly used in ecology to estimate an animal population's size where it is impractical to count every individual. A portion of the population is captured, marked, and released. Later, another portion will be captured and the number of marked individuals within the sample is counted. Since
The Petersen method, and the Lincoln method. Another major application for these methods is in epidemiology, where they are used to estimate the completeness of ascertainment of disease registers. Typical applications include estimating the number of people needing particular services (e.g. services for children with learning disabilities, services for medically frail elderly living in
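The approximate $100(1-\alpha)\%$ confidence interval for the Lincoln–Petersen setting, $K + n - k - 0.5 + \frac{(K-k+0.5)(n-k+0.5)}{k+0.5}\exp(\pm z_{\alpha/2}\hat{\sigma}_{0.5})$, is easy to check numerically. The following is a minimal Python sketch: the function name is hypothetical, and `statistics.NormalDist().inv_cdf` from the standard library supplies the standard-normal quantile. For the turtle example it should reproduce the quoted interval of about 22 to 65 around the value 30.

```python
import math
from statistics import NormalDist


def lincoln_petersen_ci(n: int, K: int, k: int, alpha: float = 0.05) -> tuple[float, float, float]:
    """Return (lower, centre, upper) of the approximate 100(1 - alpha)% interval.

    n: animals marked on the first visit
    K: animals captured on the second visit
    k: marked animals among the second capture
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) standard-normal quantile
    sigma = math.sqrt(
        1 / (k + 0.5)
        + 1 / (K - k + 0.5)
        + 1 / (n - k + 0.5)
        + (k + 0.5) / ((n - k + 0.5) * (K - k + 0.5))
    )
    base = K + n - k - 0.5
    scale = (K - k + 0.5) * (n - k + 0.5) / (k + 0.5)
    lower = base + scale * math.exp(-z * sigma)
    upper = base + scale * math.exp(z * sigma)
    centre = base + scale  # same expression with the exponent set to 0
    return lower, centre, upper


low, mid, high = lincoln_petersen_ci(10, 15, 5)
print(round(low), mid, round(high))  # 22 30.0 65
```

The 0.5 terms act as continuity adjustments, which keeps every denominator positive even when $k = 0$ or when all marked animals are recaptured.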
The actual value, is called a point estimate. However, a point estimate is likely to be incorrect, because the sample size (in this case, the number of candies that are visible) is too small a number to be sure that it does not contain anomalies that differ from the population as a whole. A corresponding concept is an interval estimate, which captures a much larger range of possibilities, but
The capture probability), whereas for other values of q, for which the slope of the line is negative, all of the detection should be devoted to the low-risk population (f should be set to 0). We can solve the above equation for the values of q for which the slope will be positive to determine the values for which f should be set to 1 to maximize the capture probability: which simplifies to: This
The case where S = 0 (that is, there is no overlap at all), the Lincoln Index is formally undefined. This can arise if the observers only find a small percentage of the actual species (perhaps by not looking hard enough or long enough), if the observers are using methods that are not statistically independent (for example if one looks only for large creatures and the other only for small), or in other circumstances. The Lincoln Index
The community), or with particular conditions (e.g. illegal drug addicts, people infected with HIV, etc.). Typically a researcher visits a study area and uses traps to capture a group of individuals alive. Each of these individuals is marked with a unique identifier (e.g., a numbered tag or band), and then is released unharmed back into the environment. A mark-recapture method was first used for ecological study in 1896 by C.G. Johannes Petersen to estimate plaice (Pleuronectes platessa) populations. Sufficient time should be allowed to pass for
The estimate N ≈ 30 with a 95% confidence interval of 22 to 65. It has been shown that this confidence interval has actual coverage probabilities that are close to the nominal $100(1-\alpha)\%$ level even for small populations and extreme capture probabilities (near to 0 or 1), in which cases other confidence intervals fail to achieve
The estimate exceeds the actual result and an underestimate if the estimate falls short of the actual result. The confidence in an estimate is quantified as a confidence interval, the likelihood that the estimate is in a certain range. Human estimators systematically suffer from overconfidence, believing that their estimates are more accurate than they actually are. Estimation is often done by sampling, which
The first sample (with only two samples, this assumption cannot be tested directly). This implies that the proportion of marked individuals in the second sample ($k/K$) should equal the proportion of the total population that is marked ($n/N$). For example, if half of
The marked individuals to redistribute themselves among the unmarked population. Next, the researcher returns and captures another sample of individuals. Some individuals in this second sample will have been marked during the initial visit and are now known as recaptures. Other organisms captured during the second visit will not have been captured during the first visit to the study area. These unmarked animals are usually given
The marked individuals were recaptured, it would be assumed that half of the total population was included in the second sample. In symbols, $\frac{k}{K} = \frac{n}{N}$. A rearrangement of this gives the formula used for the Lincoln–Petersen method, $\hat{N} = \frac{nK}{k}$. In the example (n, K, k) = (10, 15, 5), the Lincoln–Petersen method estimates that there are 30 turtles in the lake. The Lincoln–Petersen estimator is asymptotically unbiased as sample size approaches infinity, but
The nominal coverage levels. The mean value ± standard deviation is $N \approx \mu \pm \sqrt{\mu\epsilon}$, where $\mu = \frac{(n-1)(K-1)}{k-2}$ and $\epsilon = \frac{(n-k+1)(K-k+1)}{(k-2)(k-3)}$ for $k > 3$. A derivation is given at Talk:Mark and recapture#Statistical treatment. The example (n, K, k) = (10, 15, 5) gives the estimate N ≈ 42 ± 21.5. The capture probability refers to the probability of detecting an individual animal or person of interest, and has been used in both ecology and epidemiology for detecting animal or human diseases, respectively. The capture probability
Lincoln index - Misplaced Pages Continue
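The text quotes N ≈ 42 ± 21.5 for the turtle example. The minimal Python sketch below assumes the closed form $N \approx \mu \pm \sqrt{\mu\epsilon}$ with $\mu = (n-1)(K-1)/(k-2)$ and $\epsilon = (n-k+1)(K-k+1)/((k-2)(k-3))$, which is only defined for more than three recaptures; the function name is hypothetical, and the formula is used as stated rather than re-derived here.

```python
import math


def mean_and_sd(n: int, K: int, k: int) -> tuple[float, float]:
    """Mean and standard deviation of N under the assumed closed form.

    mu  = (n - 1)(K - 1) / (k - 2)
    eps = (n - k + 1)(K - k + 1) / ((k - 2)(k - 3)), valid only for k > 3
    sd  = sqrt(mu * eps)
    """
    if k <= 3:
        raise ValueError("the formula requires more than 3 recaptures (k > 3)")
    mu = (n - 1) * (K - 1) / (k - 2)
    eps = (n - k + 1) * (K - k + 1) / ((k - 2) * (k - 3))
    return mu, math.sqrt(mu * eps)


mu, sd = mean_and_sd(10, 15, 5)
print(f"N ≈ {mu:.0f} ± {sd:.1f}")  # N ≈ 42 ± 21.5
```

The mean (42) sits well above the Lincoln–Petersen point value (30), and the large standard deviation is a reminder of how little five recaptures pin down the population size.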
The number of marked individuals within the second sample should be proportional to the number of marked individuals in the whole population, an estimate of the total population size can be obtained by dividing the number of marked individuals by the proportion of marked individuals in the second sample. The method assumes, rightly or wrongly, that the probability of capture is the same for all individuals. Other names for this method, or closely related methods, include capture-recapture, capture-mark-recapture, mark-recapture, sight-resight, mark-release-recapture, multiple systems estimation, band recovery,
The number of species (or words, or other phenomena) observed by two independent methods, and S is the number of observations in common, then the Lincoln Index is simply $L = \frac{E_1 E_2}{S}$. For values of S < 10, this estimate is rough, and becomes extremely rough for values of S < 5. In
The parts that cannot be seen, thereby making an estimate of the total number of candies that could be in the jar if that presumption were true. Estimates can similarly be generated by projecting results from polls or surveys onto the entire population. In making an estimate, the most useful goal is often to generate a range of possible outcomes that is precise enough to be useful but not so precise that it
The population size N can be obtained as $K + n - k - 0.5 + \frac{(K-k+0.5)(n-k+0.5)}{k+0.5}\exp\left(\pm z_{\alpha/2}\,\hat{\sigma}_{0.5}\right)$, where $z_{\alpha/2}$ corresponds to
The probability of detection in a low-risk zone. Importantly, the formula can be rewritten as a linear equation in terms of f: Because this is a linear function, it follows that for certain values of q for which the slope of this line (the first term multiplied by f) is positive, all of the detection resource should be devoted to the high-risk population (f should be set to 1 to maximize
The process of finding estimates in the form of upper or lower bounds for a quantity that cannot readily be evaluated precisely, and approximation theory deals with finding simpler functions that are close to some complicated function and that can provide useful estimates. In statistics, an estimator is the formal name for the rule by which an estimate is calculated from data, and estimation theory deals with finding estimates with good properties. This process
The study area between visits. The model also assumes that no marks fall off animals between visits to the field site by the researcher, and that the researcher correctly records all marks. Given those conditions, the estimated population size is $\hat{N} = \frac{nK}{k}$. It is assumed that all individuals have the same probability of being captured in the second sample, regardless of whether they were previously captured in
The three-source, or the three-visit, study is to fit a Poisson regression model. Sophisticated mark-recapture models can be fit with several packages for the open-source R programming language. These include "Spatially Explicit Capture-Recapture (secr)", "Loglinear Models for Capture-Recapture Experiments (Rcapture)", and "Mark-Recapture Distance Sampling (mrds)". Such models can also be fit with specialized programs such as MARK or E-SURGE. Other related methods which are often used include
The total vocabulary of a language. Given two independent samples, the overlap between their vocabularies enables a useful estimate of how many more vocabulary items exist but did not happen to show up in either sample. A similar example involves estimating the number of typographical errors remaining in a text, from two proofreaders' counts. The Lincoln Index formalizes this phenomenon. If $E_1$ and $E_2$ are
The welfare of the organisms. If the chosen identifier harms the organism, then its behavior might become irregular. Let N be the number of animals in the population, n the number of animals marked on the first visit, K the number of animals captured on the second visit, and k the number of recaptured animals that were marked. A biologist wants to estimate the size of a population of turtles in a lake. She captures 10 turtles on her first visit to the lake, and marks their backs with paint. A week later she returns to the lake and captures 15 turtles. Five of these 15 turtles have paint on their backs, indicating that they are recaptured animals. This example
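The turtle counts (n = 10 marked, K = 15 caught on the second visit, k = 5 recaptured) can be plugged straight into the two point estimators discussed in this article: the Lincoln–Petersen estimator $\hat{N} = nK/k$ and Chapman's estimator $(K+1)(n+1)/(k+1) - 1$, truncated rather than rounded. This is a minimal Python sketch with hypothetical function names, not a reference implementation.

```python
import math


def lincoln_petersen(n: int, K: int, k: int) -> float:
    """N_hat = n * K / k, from k/K = n/N (closed population, equal catchability)."""
    if k == 0:
        raise ValueError("no recaptures: the Lincoln-Petersen estimate is undefined")
    return n * K / k


def chapman(n: int, K: int, k: int) -> int:
    """Chapman's less biased estimator, truncated (not rounded) to a whole number."""
    return math.trunc((K + 1) * (n + 1) / (k + 1) - 1)


n, K, k = 10, 15, 5  # turtles: marked, caught on second visit, recaptured
print(lincoln_petersen(n, K, k))  # 30.0
print(chapman(n, K, k))           # 28
```

One practical difference shows up at the edge: Chapman's form stays finite when k = 0, whereas the Lincoln–Petersen ratio has no value there.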
The whole number immediately less than (K + 1)(n + 1)/(k + 1) or even Kn/(k + 1) will be the estimate. The above form is more convenient for mathematical purposes." (see footnote, page 144). Chapman also found the estimator could have considerable negative bias for small Kn/N (page 146), but was unconcerned because the estimated standard deviations were large for these cases. An approximate $100(1-\alpha)\%$ confidence interval for
Was imperative when making wise decisions in acquiring new systems". Furthermore, project plans must not underestimate the needs of the project, which can result in delays while unmet needs are fulfilled, nor must they greatly overestimate the needs of the project, or else the unneeded resources may go to waste. An informal estimate when little information is available is called a guesstimate because