Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered:
46-647: Multi-Band Excitation (MBE) is a series of proprietary speech coding standards developed by Digital Voice Systems, Inc. (DVSI). In 1967 Osamu Fujimura (MIT) showed basic advantages of the multi-band representation of speech ("An Approximation to Voice Aperiodicity", IEEE 1968). This work started the development of the "multi-band excitation" method of speech coding, which was patented in 1997 (now expired) by the founders of DVSI as "Multi-Band Excitation" (MBE). All subsequent improvements, known as Improved Multi-Band Excitation (IMBE), Advanced Multiband Excitation (AMBE), AMBE+ and AMBE+2, are based on this MBE method. AMBE
92-856: $\mathrm{var}\left(\hat{A}_2\right) = \mathrm{var}\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right) \overset{\text{independence}}{=} \frac{1}{N^2}\left[\sum_{n=0}^{N-1}\mathrm{var}(x[n])\right] = \frac{1}{N^2}\left[N\sigma^2\right] = \frac{\sigma^2}{N}$ It would seem that
138-469: A discrete uniform distribution $1, 2, \dots, N$ with unknown maximum, the UMVU estimator for the maximum is given by $\frac{k+1}{k}m - 1 = m + \frac{m}{k} - 1$ where $m$ is the sample maximum and $k$
184-1091: A mean of $A$, which can be shown through taking the expected value of each estimator $\mathrm{E}\left[\hat{A}_1\right] = \mathrm{E}\left[x[0]\right] = A$ and $\mathrm{E}\left[\hat{A}_2\right] = \mathrm{E}\left[\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right] = \frac{1}{N}\left[\sum_{n=0}^{N-1}\mathrm{E}\left[x[n]\right]\right] = \frac{1}{N}\left[NA\right] = A$ At this point, these two estimators would appear to perform
230-444: A continuous basis. This has resulted in an open source codec that has progressively increased its robustness and performance – when subjected to some of the most challenging RF and acoustic environments. Speech coding Speech coding is an application of data compression to digital audio signals containing speech . Speech coding uses speech-specific parameter estimation using audio signal processing techniques to model
276-1320: A fixed, unknown parameter corrupted by AWGN. To find the Cramér–Rao lower bound (CRLB) of the sample mean estimator, it is first necessary to find the Fisher information number $\mathcal{I}(A) = \mathrm{E}\left(\left[\frac{\partial}{\partial A}\ln p(\mathbf{x};A)\right]^2\right) = -\mathrm{E}\left[\frac{\partial^2}{\partial A^2}\ln p(\mathbf{x};A)\right]$ and copying from above $\frac{\partial}{\partial A}\ln p(\mathbf{x};A) = \frac{1}{\sigma^2}\left[\sum_{n=0}^{N-1} x[n] - NA\right]$ Taking
322-477: A higher tunable bitrate and is wideband. Parameter estimation For example, it is desired to estimate the proportion of a population of voters who will vote for a particular candidate. That proportion is the parameter sought; the estimate is based on a small random sample of voters. Alternatively, it is desired to estimate the probability of a voter voting for a particular candidate, based on some demographic features, such as age. Or, for example, in radar
368-517: A licensing fee is due for most codecs, DVSI does not disclose software licensing terms. Anecdotal evidence suggests that licensing fees start at between $100,000 and $1 million. For purposes of comparison, licensing fees for use of the MP3 standard started at $15,000. For small-scale use and prototyping, the only option is to purchase a dedicated hardware IC from DVSI. These ICs can be purchased for less than $100 in small quantities. DSP Innovations Inc. offers
414-507: A low-amplitude noise is heard along a low-amplitude speech signal but is masked by a high-amplitude one. Although this would generate unacceptable distortion in a music signal, the peaky nature of speech waveforms, combined with the simple frequency structure of speech as a periodic waveform having a single fundamental frequency with occasional added noise bursts, make these very simple instantaneous compression algorithms acceptable for speech. A wide variety of other algorithms were tried at
460-501: A probability distribution (e.g., Bayesian statistics). It is then necessary to define the Bayesian probability $\pi(\boldsymbol{\theta}).$ After the model is formed, the goal is to estimate the parameters, with the estimates commonly denoted $\hat{\boldsymbol{\theta}}$, where
506-426: A scalable structure, was standardized by ITU-T. The input sampling rate is 16 kHz. Much of the later work in speech compression was motivated by military research into digital communications for secure military radios , where very low data rates were used to achieve effective operation in a hostile radio environment. At the same time, far more processing power was available, in the form of VLSI circuits , than
552-474: A software implementation of APCO P25 Phase 1 (Full-Rate) and Phase 2 (Half-Rate) codecs as well as DMR and dPMR codecs. A technology licence from DVSI is required. The patent for IMBE has expired. Codec2 is an open source alternative which uses half of the bandwidth of AMBE to encode speech of similar quality, created by David Rowe and advocated by Bruce Perens. Codec2 continues to evolve, with additional "modes" being developed, refined and made available on
598-496: A variance of $\frac{1}{k}\frac{(N-k)(N+1)}{(k+2)} \approx \frac{N^2}{k^2}$ for small samples $k \ll N$, so a standard deviation of approximately $N/k$,
644-419: Is a codebook -based vocoder that operates at bitrates of between 2 and 9.6 kbit/s, and at a sampling rate of 8 kHz in 20-ms frames. The audio data is usually combined with up to 7 bit/s of forward error correction data, producing a total RF bandwidth of approximately 2,250 Hz (compared to 2,700–3,000 Hz for an analogue single sideband transmission). Lost frames can be masked by using
690-758: Is available about the properties of speech. As a result, some auditory information that is relevant in general audio coding can be unnecessary in the speech coding context. Speech coding stresses the preservation of intelligibility and pleasantness of speech while using a constrained amount of transmitted data. In addition, most speech applications require low coding delay, as latency interferes with speech interaction. Speech coders are of two classes: The A-law and μ-law algorithms used in G.711 PCM digital telephony can be seen as an early precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 bits of resolution. Logarithmic companding is consistent with human hearing perception in that
736-452: Is often necessary to use channel coding for transmission, to avoid losses due to transmission errors. In order to get the best overall coding results, speech coding and channel coding methods are chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding. The modified discrete cosine transform (MDCT) is used in the LD-MDCT technique used by
782-449: Is the sample size, sampling without replacement. This problem is commonly known as the German tank problem, due to application of maximum estimation to estimates of German tank production during World War II. The formula may be understood intuitively as the sample maximum plus the average gap between observations in the sample, the gap being added to compensate for the negative bias of the sample maximum as an estimator for the population maximum. This has
828-860: Is then squared and the expected value of this squared value is minimized for the MMSE estimator. Commonly used estimators (estimation methods) and topics related to them include: Consider a received discrete signal, $x[n]$, of $N$ independent samples that consists of an unknown constant $A$ with additive white Gaussian noise (AWGN) $w[n]$ with zero mean and known variance $\sigma^2$ (i.e., $\mathcal{N}(0, \sigma^2)$). Since
874-446: Is used for example in the GSM standard. In CELP, the modeling is divided in two stages, a linear predictive stage that models the spectral envelope and a code-book-based model of the residual of the linear predictive model. In CELP, linear prediction coefficients (LPC) are computed and quantized, usually as line spectral pairs (LSPs). In addition to the actual speech coding of the signal, it
920-404: Is used to transmit only data that is relevant to the human auditory system. For example, in voiceband speech coding, only information in the frequency band 400 to 3500 Hz is transmitted but the reconstructed signal retains adequate intelligibility . Speech coding differs from other forms of audio coding in that speech is a simpler signal than other audio signals, and statistical information
966-624: Is widely used for VoIP calls in WhatsApp . The PlayStation 4 video game console also uses Opus for its PlayStation Network system party chat. A number of codecs with even lower bit rates have been demonstrated. Codec2 , which operates at bit rates as low as 450 bit/s, sees use in amateur radio. NATO currently uses MELPe , offering intelligible speech at 600 bit/s and below. Neural vocoder approaches have also emerged: Lyra by Google gives an "almost eerie" quality at 3 kbit/s. Microsoft's Satin also uses machine learning, but uses
1012-600: The AAC-LD format introduced in 1999. MDCT has since been widely adopted in voice-over-IP (VoIP) applications, such as the G.729.1 wideband audio codec introduced in 2006, Apple's FaceTime (using AAC-LD) introduced in 2010, and the CELT codec introduced in 2011. Opus is a free software audio coder. It combines the speech-oriented LPC-based SILK algorithm and the lower-latency MDCT-based CELT algorithm, switching between or combining them as needed for maximal efficiency. It
1058-464: The maximum likelihood estimator. One of the simplest non-trivial examples of estimation is the estimation of the maximum of a uniform distribution. It is used as a hands-on classroom exercise and to illustrate basic principles of estimation theory. Further, in the case of estimation based on a single sample, it demonstrates philosophical issues and possible misunderstandings in the use of maximum likelihood estimators and likelihood functions . Given
1104-532: The natural logarithm of the pdf ln p ( x ; A ) = − N ln ( σ 2 π ) − 1 2 σ 2 ∑ n = 0 N − 1 ( x [ n ] − A ) 2 {\displaystyle \ln p(\mathbf {x} ;A)=-N\ln \left(\sigma {\sqrt {2\pi }}\right)-{\frac {1}{2\sigma ^{2}}}\sum _{n=0}^{N-1}(x[n]-A)^{2}} and
1150-441: The "hat" indicates the estimate. One common estimator is the minimum mean squared error (MMSE) estimator, which utilizes the error between the estimated parameters and the actual value of the parameters e = θ ^ − θ {\displaystyle \mathbf {e} ={\hat {\boldsymbol {\theta }}}-{\boldsymbol {\theta }}} as the basis for optimality. This error term
1196-500: The (population) average size of a gap between samples; compare m k {\displaystyle {\frac {m}{k}}} above. This can be seen as a very simple case of maximum spacing estimation . The sample maximum is the maximum likelihood estimator for the population maximum, but, as discussed above, it is biased. Numerous fields require the use of estimation theory. Some of these fields include: Measured data are likely to be subject to noise or uncertainty and it
1242-597: The AMBE+2 codec, while older Phase 1 radios such as the Motorola XTL and XTS series use the earlier IMBE codec. Newer Phase 1 capable radios such as the APX series radios use the AMBE+2 codec, which is backwards compatible with Phase 1. Digital Mobile Radio (DMR) and Motorola's MOTOTRBO use the AMBE+2 codec. Use of the AMBE standard requires a license from Digital Voice Systems, Inc. While
1288-531: The Fisher information into v a r ( A ^ ) ≥ 1 I {\displaystyle \mathrm {var} \left({\hat {A}}\right)\geq {\frac {1}{\mathcal {I}}}} results in v a r ( A ^ ) ≥ σ 2 N {\displaystyle \mathrm {var} \left({\hat {A}}\right)\geq {\frac {\sigma ^{2}}{N}}} Comparing this to
1334-422: The aim is to find the range of objects (airplanes, boats, etc.) by analyzing the two-way transit timing of received echoes of transmitted pulses. Since the reflected pulses are unavoidably embedded in electrical noise, their measured values are randomly distributed, so that the transit time must be estimated. As another example, in electrical communication theory, the measurements which contain information regarding
1380-430: The continuous probability density function (pdf) or its discrete counterpart, the probability mass function (pmf), of the underlying distribution that generated the data must be stated conditional on the values of the parameters: p ( x | θ ) . {\displaystyle p(\mathbf {x} |{\boldsymbol {\theta }}).\,} It is also possible for the parameters themselves to have
1426-436: The maximum likelihood estimator A ^ = 1 N ∑ n = 0 N − 1 x [ n ] {\displaystyle {\hat {A}}={\frac {1}{N}}\sum _{n=0}^{N-1}x[n]} which is simply the sample mean. From this example, it was found that the sample mean is the maximum likelihood estimator for N {\displaystyle N} samples of
Multi-Band Excitation - Misplaced Pages Continue
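The comparison between the single-sample estimator and the sample mean for a constant $A$ in AWGN can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original derivation; the function name `simulate_estimators` and the values chosen for `A`, `sigma`, `N`, `trials` and `seed` are illustrative assumptions.

```python
import random
import statistics


def simulate_estimators(A=1.0, sigma=2.0, N=25, trials=20000, seed=0):
    """Monte Carlo comparison of two unbiased estimators of A in x[n] = A + w[n].

    All parameter values are illustrative assumptions, not taken from the text.
    """
    rng = random.Random(seed)
    first_sample = []  # A_hat_1 = x[0]
    sample_mean = []   # A_hat_2 = (1/N) * sum of x[n]
    for _ in range(trials):
        # One realisation of N independent samples with Gaussian noise.
        x = [A + rng.gauss(0.0, sigma) for _ in range(N)]
        first_sample.append(x[0])
        sample_mean.append(sum(x) / N)
    return {
        "mean_1": statistics.fmean(first_sample),
        "var_1": statistics.pvariance(first_sample),
        "mean_2": statistics.fmean(sample_mean),
        "var_2": statistics.pvariance(sample_mean),
        "crlb": sigma ** 2 / N,  # Cramér–Rao lower bound sigma^2 / N
    }


result = simulate_estimators()
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions both empirical means should sit close to `A`, the empirical variance of the first-sample estimator should be close to $\sigma^2$, and that of the sample mean should be close to the bound $\sigma^2/N$.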
1472-1399: The maximum likelihood estimator is A ^ = arg max ln p ( x ; A ) {\displaystyle {\hat {A}}=\arg \max \ln p(\mathbf {x} ;A)} Taking the first derivative of the log-likelihood function ∂ ∂ A ln p ( x ; A ) = 1 σ 2 [ ∑ n = 0 N − 1 ( x [ n ] − A ) ] = 1 σ 2 [ ∑ n = 0 N − 1 x [ n ] − N A ] {\displaystyle {\frac {\partial }{\partial A}}\ln p(\mathbf {x} ;A)={\frac {1}{\sigma ^{2}}}\left[\sum _{n=0}^{N-1}(x[n]-A)\right]={\frac {1}{\sigma ^{2}}}\left[\sum _{n=0}^{N-1}x[n]-NA\right]} and setting it to zero 0 = 1 σ 2 [ ∑ n = 0 N − 1 x [ n ] − N A ] = ∑ n = 0 N − 1 x [ n ] − N A {\displaystyle 0={\frac {1}{\sigma ^{2}}}\left[\sum _{n=0}^{N-1}x[n]-NA\right]=\sum _{n=0}^{N-1}x[n]-NA} This results in
1518-454: The negative expected value is trivial since it is now a deterministic constant − E [ ∂ 2 ∂ A 2 ln p ( x ; A ) ] = N σ 2 {\displaystyle -\mathrm {E} \left[{\frac {\partial ^{2}}{\partial A^{2}}}\ln p(\mathbf {x} ;A)\right]={\frac {N}{\sigma ^{2}}}} Finally, putting
1564-504: The openness of amateur radio, as well as usage restriction for being "undisclosed digital code" under FCC rule 97.309(b) and similar national legislation. System Fusion , open specification from Yaesu , also uses AMBE codec with C4FM modulation. The NXDN digital voice and data protocol uses the AMBE+2 codec. NXDN is implemented by Icom in the IDAS system and by Kenwood as NEXEDGE. APCO Project 25 Phase 2 trunked radio systems also use
1610-964: The parameters of interest are often associated with a noisy signal . For a given model, several statistical "ingredients" are needed so the estimator can be implemented. The first is a statistical sample – a set of data points taken from a random vector (RV) of size N . Put into a vector , x = [ x [ 0 ] x [ 1 ] ⋮ x [ N − 1 ] ] . {\displaystyle \mathbf {x} ={\begin{bmatrix}x[0]\\x[1]\\\vdots \\x[N-1]\end{bmatrix}}.} Secondly, there are M parameters θ = [ θ 1 θ 2 ⋮ θ M ] , {\displaystyle {\boldsymbol {\theta }}={\begin{bmatrix}\theta _{1}\\\theta _{2}\\\vdots \\\theta _{M}\end{bmatrix}},} whose values are to be estimated. Third,
1656-533: The parameters of the previous frame to fill in the gap. AMBE is used by the Inmarsat and Iridium satellite telephony systems and certain channels on XM Satellite Radio and is the speech coder for OpenSky Trunked radio systems . AMBE is used in D-STAR amateur radio digital voice communications. It has met criticism from the amateur radio community because the nature of its patent and licensing runs counter to
1702-762: The probability of x {\displaystyle \mathbf {x} } becomes p ( x ; A ) = ∏ n = 0 N − 1 p ( x [ n ] ; A ) = 1 ( σ 2 π ) N exp ( − 1 2 σ 2 ∑ n = 0 N − 1 ( x [ n ] − A ) 2 ) {\displaystyle p(\mathbf {x} ;A)=\prod _{n=0}^{N-1}p(x[n];A)={\frac {1}{\left(\sigma {\sqrt {2\pi }}\right)^{N}}}\exp \left(-{\frac {1}{2\sigma ^{2}}}\sum _{n=0}^{N-1}(x[n]-A)^{2}\right)} Taking
1748-729: The probability of x [ n ] {\displaystyle x[n]} becomes ( x [ n ] {\displaystyle x[n]} can be thought of a N ( A , σ 2 ) {\displaystyle {\mathcal {N}}(A,\sigma ^{2})} ) p ( x [ n ] ; A ) = 1 σ 2 π exp ( − 1 2 σ 2 ( x [ n ] − A ) 2 ) {\displaystyle p(x[n];A)={\frac {1}{\sigma {\sqrt {2\pi }}}}\exp \left(-{\frac {1}{2\sigma ^{2}}}(x[n]-A)^{2}\right)} By independence ,
1794-404: The same. However, the difference between them becomes apparent when comparing the variances. v a r ( A ^ 1 ) = v a r ( x [ 0 ] ) = σ 2 {\displaystyle \mathrm {var} \left({\hat {A}}_{1}\right)=\mathrm {var} \left(x[0]\right)=\sigma ^{2}} and v
1840-671: The sample mean is a better estimator since its variance is lower for every N > 1. Continuing the example using the maximum likelihood estimator, the probability density function (pdf) of the noise for one sample w [ n ] {\displaystyle w[n]} is p ( w [ n ] ) = 1 σ 2 π exp ( − 1 2 σ 2 w [ n ] 2 ) {\displaystyle p(w[n])={\frac {1}{\sigma {\sqrt {2\pi }}}}\exp \left(-{\frac {1}{2\sigma ^{2}}}w[n]^{2}\right)} and
1886-442: The second derivative ∂ 2 ∂ A 2 ln p ( x ; A ) = 1 σ 2 ( − N ) = − N σ 2 {\displaystyle {\frac {\partial ^{2}}{\partial A^{2}}}\ln p(\mathbf {x} ;A)={\frac {1}{\sigma ^{2}}}(-N)={\frac {-N}{\sigma ^{2}}}} and finding
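The estimator for the maximum of a discrete uniform distribution (the German tank problem), $m + \frac{m}{k} - 1$ with sample maximum $m$ and sample size $k$, can be illustrated with a short simulation. This is a hedged sketch only; the helper names `umvu_max_estimate` and `average_estimates`, the population size `N=1000` and the other defaults are assumptions chosen for illustration.

```python
import random


def umvu_max_estimate(sample):
    """Estimate the population maximum N as m + m/k - 1.

    `sample` is assumed to be drawn without replacement from 1..N.
    """
    k = len(sample)
    m = max(sample)
    return m + m / k - 1


def average_estimates(N=1000, k=10, trials=20000, seed=1):
    """Average the corrected estimate and the raw sample maximum over many draws.

    N, k, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    population = range(1, N + 1)
    total_umvu = 0.0
    total_max = 0.0
    for _ in range(trials):
        sample = rng.sample(population, k)  # sampling without replacement
        total_umvu += umvu_max_estimate(sample)
        total_max += max(sample)
    return total_umvu / trials, total_max / trials


avg_umvu, avg_max = average_estimates()
print(f"average corrected estimate: {avg_umvu:.1f}")
print(f"average sample maximum:     {avg_max:.1f}")
```

In this setup the average of the corrected estimate should land near the true maximum, while the average of the raw sample maximum stays below it, which is the negative bias that the added gap term compensates for.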
1932-701: The speech signal, combined with generic data compression algorithms to represent the resulting modeled parameters in a compact bitstream. Common applications of speech coding are mobile telephony and voice over IP (VoIP). The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while the most widely used in VoIP applications are the LPC and modified discrete cosine transform (MDCT) techniques. The techniques employed in speech coding are similar to those used in audio data compression and audio coding where appreciation of psychoacoustics
1978-509: The time, mostly delta modulation variants, but after careful consideration, the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. At the time of their design, their 33% bandwidth reduction for a very low complexity made an excellent engineering compromise. Their audio performance remains acceptable, and there was no need to replace them in the stationary phone network. In 2008, G.711.1 codec, which has
2024-475: The variance is known then the only unknown parameter is A {\displaystyle A} . The model for the signal is then x [ n ] = A + w [ n ] n = 0 , 1 , … , N − 1 {\displaystyle x[n]=A+w[n]\quad n=0,1,\dots ,N-1} Two possible (of many) estimators for the parameter A {\displaystyle A} are: Both of these estimators have
2070-459: The variance of the sample mean (determined previously) shows that the sample mean is equal to the Cramér–Rao lower bound for all values of N {\displaystyle N} and A {\displaystyle A} . In other words, the sample mean is the (necessarily unique) efficient estimator , and thus also the minimum variance unbiased estimator (MVUE), in addition to being
2116-483: Was available for earlier compression techniques. As a result, modern speech compression algorithms could use far more complex techniques than were available in the 1960s to achieve far higher compression ratios. The most widely used speech coding algorithms are based on linear predictive coding (LPC). In particular, the most common speech coding scheme is the LPC-based code-excited linear prediction (CELP) coding, which
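The logarithmic companding idea behind the A-law and μ-law algorithms mentioned in this article can be sketched with the continuous μ-law curve. This is an illustrative approximation only: G.711 itself specifies a segmented encoding, whereas the code below assumes the smooth formula with μ = 255, and the function names `mu_law_compress`, `mu_law_expand`, `quantize` and `codec` are hypothetical.

```python
import math

MU = 255.0  # assumed companding constant for the continuous mu-law curve


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], expanding the resolution of small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: ((1 + mu)**|y| - 1) / mu with the sign of y."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(y, bits=8):
    """Uniformly quantize y in [-1, 1] to 2**bits levels; return the level midpoint."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, int((y + 1.0) / step))
    return -1.0 + (index + 0.5) * step


def codec(x, bits=8):
    """Compress, quantize uniformly, then expand (a toy companded codec)."""
    return mu_law_expand(quantize(mu_law_compress(x), bits))


quiet = 0.01  # a low-amplitude input sample
print("uniform 8-bit error:  ", abs(quantize(quiet) - quiet))
print("companded 8-bit error:", abs(codec(quiet) - quiet))
```

For a quiet sample such as this one, the companded path should show a noticeably smaller reconstruction error than plain uniform 8-bit quantization, which is the sense in which 8 companded bits behave like a higher uniform resolution at low amplitudes.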