Chroma subsampling is the practice of encoding images with less resolution for chroma information than for luma information, taking advantage of the human visual system's lower acuity for color differences than for luminance.
H.262 or MPEG-2 Part 2 (formally known as ITU-T Recommendation H.262 and ISO/IEC 13818-2, also known as MPEG-2 Video) is a video coding format standardised and jointly maintained by ITU-T Study Group 16 Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG), and developed with the involvement of many companies. It is the second part of the ISO/IEC MPEG-2 standard. The ITU-T Recommendation H.262 and ISO/IEC 13818-2 documents are identical. The standard
a Y:Cb:Cr notation, with each part describing the amount of resolution for the corresponding component. It is unspecified whether the resolution reduction happens in the horizontal or vertical direction. Chroma subsampling suffers from two main types of artifacts, which cause degradation that is more noticeable than intended where colors change abruptly. Gamma-corrected signals like Y'CbCr have an issue where chroma errors "bleed" into luma. In those signals,
a video codec. Some video coding formats are documented by a detailed technical specification document known as a video coding specification. Some such specifications are written and approved by standardization organizations as technical standards, and are thus known as a video coding standard. There are de facto standards and formal standards. Video content encoded using a particular video coding format
a "constant luminance" Yc'CbcCrc, which is calculated from linear RGB components and then gamma-encoded. This version does not suffer from the luminance loss by design. Another artifact that can occur with chroma subsampling is that out-of-gamut colors can occur upon chroma reconstruction. Suppose the image consisted of alternating 1-pixel red and black lines and the subsampling omitted the chroma for
an H.264 encoder/decoder a codec shortly thereafter ("open-source our H.264 codec"). A video coding format does not dictate all algorithms used by a codec implementing the format. For example, a large part of how video compression typically works is by finding similarities between video frames (block-matching) and then achieving compression by copying previously-coded similar subimages (such as macroblocks) and adding small differences when necessary. Finding optimal combinations of such predictors and differences
a fast DCT algorithm with C.H. Smith and S.C. Fralick in 1977, and founded Compression Labs to commercialize DCT technology. In 1979, Anil K. Jain and Jaswant R. Jain further developed motion-compensated DCT video compression. This led to Chen developing a practical video compression algorithm, called motion-compensated DCT or adaptive scene coding, in 1981. Motion-compensated DCT later became
a given video coding format from/to uncompressed video are implementations of those specifications. As an analogy, the video coding format H.264 (specification) is to the codec OpenH264 (specific implementation) what the C Programming Language (specification) is to the compiler GCC (specific implementation). Note that for each specification (e.g., H.264), there can be many codecs implementing that specification (e.g., x264, OpenH264, H.264/MPEG-4 AVC products and implementations). This distinction
a lot more computing power than editing intraframe compressed video with the same picture quality. However, this kind of compression is not very effective for audio formats. A video coding format can define optional restrictions to encoded video, called profiles and levels. It is possible to have a decoder which only supports decoding a subset of profiles and levels of a given video format, for example to make
a low chroma actually makes a color appear less bright than one with equivalent luma. As a result, when a saturated color blends with an unsaturated or complementary color, a loss of luminance occurs at the border. This can be seen in the example between magenta and green. This issue persists in HDR video, where gamma is generalized into a transfer function, the "EOTF". A steeper EOTF shows a stronger luminance loss. Some proposed corrections of this issue are: Rec. 2020 defines
a much more efficient form of compression for video coding. The CCITT received 14 proposals for DCT-based video compression formats, in contrast to a single proposal based on vector quantization (VQ) compression. The H.261 standard was developed based on motion-compensated DCT compression. H.261 was the first practical video coding standard, and uses patents licensed from a number of companies, including Hitachi, PictureTel, NTT, BT, and Toshiba, among others. Since H.261, motion-compensated DCT compression has been adopted by all
a number of companies, primarily Mitsubishi, Hitachi and Panasonic. The most widely used video coding format as of 2019 is H.264/MPEG-4 AVC. It was developed in 2003, and uses patents licensed from a number of organizations, primarily Panasonic, Godo Kaisha IP Bridge and LG Electronics. In contrast to the standard DCT used by its predecessors, AVC uses the integer DCT. H.264 is one of
a patent lawsuit due to submarine patents. The motivation behind many recently designed video coding formats such as Theora, VP8, and VP9 has been to create a (libre) video coding standard covered only by royalty-free patents. Patent status has also been a major point of contention for the choice of which video formats the mainstream web browsers will support inside the HTML video tag. The current-generation video coding format
a similar issue that is harder to make a simple example out of. Similar artifacts arise in the less artificial example of gradation near a fairly sharp red/black boundary. It is possible for the decoder to deal with out-of-gamut colors by considering how much chroma a given luma value can hold and distributing it into the 4:4:4 intermediate accordingly, termed "in-range chroma reconstruction" by Glenn Chan. The "proportion" method
a third of the luma sampling rate. In the vertical dimension, both luma and chroma are sampled at the full HD sampling rate (1080 samples vertically). A number of legacy schemes allow different subsampling factors in Cb and Cr, similar to how a different amount of bandwidth is allocated to the two chroma values in broadcast systems such as CCIR System M. These schemes are not expressible in J:a:b notation. Instead, they adopt
is HEVC (H.265), introduced in 2013. AVC uses the integer DCT with 4×4 and 8×8 block sizes, and HEVC uses integer DCT and DST transforms with varied block sizes between 4×4 and 32×32. HEVC is heavily patented, mostly by Samsung Electronics, GE, NTT, and JVCKenwood. It is challenged by the AV1 format, intended to be royalty-free. As of 2019, AVC is by far the most commonly used format for
is a content representation format of digital video content, such as in a data file or bitstream. It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A specific software, firmware, or hardware implementation capable of compression or decompression in a specific video coding format is called
is a complete frame. MPEG-2 supports both options. Digital television requires that these pictures be digitized so that they can be processed by computer hardware. Each picture element (a pixel) is then represented by one luma number and two chroma numbers. These describe the brightness and the color of the pixel (see YCbCr). Thus, each digitized picture is initially represented by three rectangular arrays of numbers. Another common practice to reduce
is a form of lossless video used in some circumstances, such as when sending video to a display over an HDMI connection. Some high-end cameras can also capture video directly in this format. Interframe compression complicates editing of an encoded video sequence. One subclass of relatively simple video coding formats are the intra-frame video formats, such as DV, in which each frame of the video stream
is a separately-compressed version of a single uncompressed (raw) frame. The coding of an I-frame takes advantage of spatial redundancy and of the inability of the eye to detect certain changes in the image. Unlike P-frames and B-frames, I-frames do not depend on data in the preceding or the following frames, and so their coding is very similar to how a still photograph would be coded (roughly similar to JPEG picture coding). Briefly,
is an NP-hard problem, meaning that finding an optimal solution is computationally infeasible in practice. Though the video coding format must support such compression across frames in the bitstream format, by not needlessly mandating specific algorithms for finding such block-matches and other encoding steps, the codecs implementing the video coding specification have some freedom to optimize and innovate in their choice of algorithms. For example, section 0.5 of
is applied to each field (not both fields at once). This solves the problem of motion artifacts and reduces the vertical chroma resolution by half, but it can introduce comb-like artifacts in the image. Original: this image shows a single field; the moving text has some motion blur applied to it. 4:2:0 progressive sampling applied to moving interlaced material: the chroma leads and trails
is available for a fee from the ITU-T and ISO. MPEG-2 Video is very similar to MPEG-1, but also provides support for interlaced video (an encoding technique used in analog NTSC, PAL and SECAM television systems). MPEG-2 video is not optimized for low bit-rates (e.g., less than 1 Mbit/s), but somewhat outperforms MPEG-1 at higher bit rates (e.g., 3 Mbit/s and above), although not by a large margin unless
is commonly expressed as a three-part ratio J:a:b (e.g. 4:2:2), or four parts if an alpha channel is present (e.g. 4:2:2:4), that describe the number of luminance and chrominance samples in a conceptual region that is J pixels wide and 2 pixels high. The parts are (in their respective order): J, the horizontal sampling reference (width of the conceptual region, usually 4); a, the number of chrominance samples (Cr, Cb) in the first row of J pixels; and b, the number of changes of chrominance samples (Cr, Cb) between the first and second rows of J pixels. This notation is not valid for all combinations and has exceptions, e.g. 4:1:0 (where the height of
is compressed independently without referring to other frames in the stream, and no attempt is made to take advantage of correlations between successive pictures over time for better compression. One example is Motion JPEG, which is simply a sequence of individually JPEG-compressed images. This approach is quick and simple, at the expense of the encoded video being much larger than a video coding format supporting inter-frame coding. Because interframe compression copies data from one frame to another, if
is denoted with the prime symbol (′). Gamma-correcting electro-optical transfer functions (EOTF) are used due to the nonlinear response of human vision. The use of gamma improves perceived signal-to-noise in analogue systems, and allows for more efficient data encoding in digital systems. This encoding uses more levels for darker colors than for lighter ones, accommodating human vision sensitivity. The subsampling scheme
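To illustrate the idea, here is a minimal sketch of a pure power-law transfer function (a gamma of 2.2 is an assumed round figure; real curves such as Rec. 709's are piecewise and differ in detail):

```python
# Sketch: pure power-law gamma, an approximation of real transfer curves.
def gamma_encode(linear, gamma=2.2):
    """Map linear light in [0, 1] to a gamma-encoded signal in [0, 1]."""
    return linear ** (1.0 / gamma)

def gamma_decode(encoded, gamma=2.2):
    """Inverse mapping (an idealized EOTF)."""
    return encoded ** gamma

# Half of the code values end up covering only the darkest ~22% of
# linear light, matching human sensitivity to dark detail:
print(gamma_decode(0.5))  # ~0.218
```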
is denoted with the symbol Y'. The luma (Y') of video engineering deviates from the luminance (Y) of color science (as defined by the CIE). Luma is formed as the weighted sum of gamma-corrected (tristimulus) RGB components. Luminance is formed as a weighted sum of linear (tristimulus) RGB components. In practice, the CIE symbol Y is often incorrectly used to denote luma. In 1993, SMPTE adopted Engineering Guideline EG 28, clarifying
is doubled compared to 4:1:1, but as the Cb and Cr channels are only sampled on each alternate line in this scheme, the vertical resolution is halved. The data rate is thus the same. This fits reasonably well with the PAL color encoding system, since PAL has only half the vertical chrominance resolution of NTSC. It would also fit extremely well with the SECAM color encoding system, since, like that format, 4:2:0 only stores and transmits one color channel per line (the other channel being recovered from
is found. Then, the macroblock is treated like an I-frame macroblock. MPEG-2 video supports a wide range of applications from mobile to high-quality HD editing. For many applications, it is unrealistic and too expensive to support the entire standard. To allow such applications to support only subsets of it, the standard defines profiles and levels. A profile defines sets of features such as B-pictures, 3D video, chroma format, etc. The level limits
is in spirit similar to Kornelski's luma-weighted average, while the "spill" method resembles error diffusion. Improving chroma reconstruction remains an active field of research. The term Y'UV refers to an analog TV encoding scheme (ITU-R Rec. BT.470) while Y'CbCr refers to a digital encoding scheme. One difference between the two is that the scale factors on the chroma components (U, V, Cb, and Cr) are different. However,
is normally bundled with an audio stream (encoded using an audio coding format) inside a multimedia container format such as AVI, MP4, FLV, RealMedia, or Matroska. As such, the user normally does not have an H.264 file, but instead has a video file, which is an MP4 container of H.264-encoded video, normally alongside AAC-encoded audio. Multimedia container formats can contain one of several different video coding formats; for example,
is not good practice, as ITU-T Rec. H.273 notes. Chroma subsampling was developed in the 1950s by Alda Bedford for the development of color television by RCA, which developed into the NTSC standard; luma–chroma separation was developed earlier, in 1938, by Georges Valensi. Through studies, he showed that the human eye has high resolution only for black and white, somewhat less for "mid-range" colors like yellows and greens, and much less for colors on
is not consistently reflected terminologically in the literature. The H.264 specification calls H.261, H.262, H.263, and H.264 video coding standards and does not contain the word codec. The Alliance for Open Media clearly distinguishes between the AV1 video coding format and the accompanying codec they are developing, but calls the video coding format itself a video codec specification. The VP9 specification calls
is not quite the same. Next, the quantized coefficient matrix is itself compressed. Typically, one corner of the 8×8 array of coefficients contains only zeros after quantization is applied. By starting in the opposite corner of the matrix, then zigzagging through the matrix to combine the coefficients into a string, then substituting run-length codes for consecutive zeros in that string, and then applying Huffman coding to that result, one reduces
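A sketch of that zigzag-and-run-length idea follows. The scan order below is the conventional JPEG/MPEG zigzag pattern, but the (run, value) token format and "EOB" marker are simplified stand-ins for MPEG-2's actual variable-length codes:

```python
import numpy as np

def zigzag_indices(n=8):
    """Generate (row, col) pairs in zigzag order for an n x n block:
    within each anti-diagonal, direction alternates up/down."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(coeffs):
    """Encode a 1-D coefficient string as (zero_run, value) pairs."""
    out, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append("EOB")  # end-of-block marker; trailing zeros are implied
    return out

# Typical post-quantization block: a few nonzero low-frequency coefficients.
block = np.zeros((8, 8), dtype=int)
block[0, 0], block[0, 1], block[1, 0] = 26, -3, 2
scanned = [block[r, c] for r, c in zigzag_indices()]
print(run_length(scanned))  # [(0, 26), (0, -3), (0, 2), 'EOB']
```

In a real encoder the (run, value) tokens would then be mapped to Huffman (variable-length) codes, so the frequent short tokens cost only a few bits each.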
is that, with intraframe systems, each frame uses a similar amount of data. In most interframe systems, certain frames (such as I-frames in MPEG-2) are not allowed to copy data from other frames, so they require much more data than other frames nearby. It is possible to build a computer-based video editor that spots problems caused when I-frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands
the 480i "NTSC" system, if the luma is sampled at 13.5 MHz, then this means that the Cr and Cb signals will each be sampled at 3.375 MHz, which corresponds to a maximum Nyquist bandwidth of 1.6875 MHz, whereas a traditional "high-end broadcast analog NTSC encoder" would have Nyquist bandwidths of 1.5 MHz and 0.5 MHz for the I/Q channels. However, in most equipment, especially cheap TV sets and VHS/Betamax VCRs,
the 4:4:4 sampling format). This stream of data must be compressed if digital TV is to fit in the bandwidth of available TV channels and if movies are to fit on DVDs. Video compression is practical because the data in pictures is often redundant in space and time. For example, the sky can be blue across the top of a picture and that blue sky can persist for frame after frame. Also, because of
the luma Y′ component. The color difference components are created by subtracting two of the weighted R′G′B′ components from the third. A variety of filtering methods can be used to limit the resolution. Gamma-encoded luma Y′ should not be confused with linear luminance Y. The presence of gamma encoding
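As a sketch of this weighted-sum construction, assuming the Rec. 709 weights and divisors (other standards, such as Rec. 601, use different constants):

```python
def rgb_to_ycbcr(r, g, b):
    """Form luma Y' and color differences Cb, Cr from gamma-corrected
    R'G'B' in [0, 1], using Rec. 709 constants (an assumption here)."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # weighted sum -> luma
    cb = (b - y) / 1.8556                      # scaled B' minus Y'
    cr = (r - y) / 1.5748                      # scaled R' minus Y'
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red: low luma, strong positive Cr
```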
the main and high profiles but not in baseline. A level is a restriction on parameters such as maximum resolution and data rates.

Chroma subsampling

It is used in many video and still image encoding schemes – both analog and digital – including in JPEG encoding. Digital signals are often compressed to reduce file size and save transmission time. Since
the temporal dimension. DCT coding is a lossy block compression transform coding technique that was first proposed by Nasir Ahmed, who initially intended it for image compression, while he was working at Kansas State University in 1972. It was then developed into a practical image compression algorithm by Ahmed with T. Natarajan and K. R. Rao at the University of Texas in 1973, and
the temporal dimension. In 1967, University of London researchers A.H. Robinson and C. Cherry proposed run-length encoding (RLE), a lossless compression scheme, to reduce the transmission bandwidth of analog television signals. The earliest digital video coding algorithms were either for uncompressed video or used lossless compression, both methods being inefficient and impractical for digital video coding. Digital video
the 2×2 "square" of the original input size. With interlaced material, 4:2:0 chroma subsampling can result in motion artifacts if it is implemented the same way as for progressive material. The luma samples are derived from separate time intervals, while the chroma samples would be derived from both time intervals. It is this difference that can result in motion artifacts. The MPEG-2 standard allows for an alternate interlaced sampling scheme, where 4:2:0
the DCT and the fast Fourier transform (FFT), developing inter-frame hybrid coders for them, and found that the DCT is the most efficient due to its reduced complexity, capable of compressing image data down to 0.25 bit per pixel for a videotelephone scene with image quality comparable to a typical intra-frame coder requiring 2 bits per pixel. The DCT was applied to video encoding by Wen-Hsiung Chen, who developed
the H.264 specification says that encoding algorithms are not part of the specification. Free choice of algorithm also allows different space–time complexity trade-offs for the same video coding format, so a live feed can use a fast but space-inefficient algorithm, while a one-time DVD encoding for later mass production can trade long encoding time for space-efficient encoding. The concept of analog video compression dates back to 1929, when R.D. Kell in Britain proposed
the MP4 container format can contain video coding formats such as MPEG-2 Part 2 or H.264. Another example is the initial specification for the file type WebM, which specifies the container format (Matroska) but also exactly which video (VP8) and audio (Vorbis) compression formats are inside the Matroska container, even though Matroska is capable of containing VP9 video, and Opus audio support
the agreements on its requirements. The technology was developed with contributions from a number of companies. Hyundai Electronics (now SK Hynix) developed the first MPEG-2 SAVI (System/Audio/Video) decoder in 1995. The majority of patents that were later asserted in a patent pool to be essential for implementing the standard came from three companies: Sony (311 patents), Thomson (198 patents) and Mitsubishi Electric (119 patents). In 1996, it
the amount of data to be processed is to subsample the two chroma planes (after low-pass filtering to avoid aliasing). This works because the human visual system better resolves details of brightness than details in the hue and saturation of colors. The term 4:2:2 is used for video with the chroma subsampled by a ratio of 2:1 horizontally, and 4:2:0 is used for video with the chroma subsampled by 2:1 both vertically and horizontally. Video that has luma and chroma at
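A minimal sketch of what 4:2:0 does to the chroma planes, assuming a naive 2×2 box average as the low-pass filter (real encoders use better filters and standard-specific sample siting):

```python
import numpy as np

def subsample_420(cb, cr):
    """Average each 2x2 block of a chroma plane, halving its resolution
    both horizontally and vertically (a naive box filter)."""
    def box2x2(plane):
        h, w = plane.shape
        return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return box2x2(cb), box2x2(cr)

cb = np.random.rand(1080, 1920)  # placeholder chroma planes
cr = np.random.rand(1080, 1920)
cb2, cr2 = subsample_420(cb, cr)
print(cb2.shape)  # (540, 960): a quarter as many samples per chroma plane
```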
the appearance of comb-like chroma artifacts. Original still image. 4:2:0 progressive sampling applied to a still image; both fields are shown. 4:2:0 interlaced sampling applied to a still image; both fields are shown. If the interlaced material is to be de-interlaced, the comb-like chroma artifacts (from 4:2:0 interlaced sampling) can be removed by blurring
the bandwidth available in the 2000s. Practical video compression emerged with the development of motion-compensated DCT (MC DCT) coding, also called block motion compensation (BMC) or DCT motion compensation. This is a hybrid coding algorithm, which combines two key data compression techniques: discrete cosine transform (DCT) coding in the spatial dimension, and predictive motion compensation in
the bandwidth is halved compared to no chroma subsampling. Initially, 4:1:1 chroma subsampling of the DV format was not considered to be broadcast quality and was only acceptable for low-end and consumer applications. However, DV-based formats (some of which use 4:1:1 chroma subsampling) have been used professionally in electronic news gathering and in playout servers. DV has also been sporadically used in feature films and in digital cinematography. In
the black pixels. Chroma from the red pixels will be reconstructed onto the black pixels, causing the new pixels to have positive red and negative green and blue values. As displays cannot output negative light (negative light does not exist), these negative values will effectively be clipped, and the resulting luma value will be too high. Other sub-sampling filters (especially the averaging "box") have
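The red/black example can be worked through numerically; this sketch assumes Rec. 601 full-range conversion constants purely for illustration. Averaging the chroma of a red and a black pixel and pairing it with the black pixel's luma reconstructs RGB values outside [0, 1]:

```python
def ycbcr(r, g, b):
    """Rec. 601 full-range RGB -> Y'CbCr, for illustration."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, (b - y) / 1.772, (r - y) / 1.402

def rgb(y, cb, cr):
    """Inverse of the conversion above."""
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

y_red, cb_red, cr_red = ycbcr(1, 0, 0)   # red line
y_blk, cb_blk, cr_blk = ycbcr(0, 0, 0)   # black line
cb_avg, cr_avg = (cb_red + cb_blk) / 2, (cr_red + cr_blk) / 2

# Black pixel reconstructed with the averaged ("bled") chroma:
print(rgb(y_blk, cb_avg, cr_avg))  # r ~ +0.35, g ~ -0.15, b ~ -0.15
```

The positive red with negative green and blue matches the clipping scenario described above: after clipping to zero, the pixel is brighter than the original black line.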
the chroma channels have only 0.5 MHz of bandwidth for both Cr and Cb (or equivalently for I/Q). Thus the DV system actually provides a superior color bandwidth compared to the best composite analog specifications for NTSC, despite having only 1/4 of the chroma bandwidth of a "full" digital signal. Formats that use 4:1:1 chroma subsampling include: In 4:2:0, the horizontal sampling
the chroma vertically. This ratio is possible, and some codecs support it, but it is not widely used. This ratio uses half of the vertical and one-fourth of the horizontal color resolution, with only one-eighth of the bandwidth of the maximum color resolutions used. Uncompressed video in this format with 8-bit quantization uses 10 bytes for every macropixel (which is 4×2 pixels), or 10 bits per pixel. It has
the concept of transmitting only the portions of the scene that changed from frame to frame. The concept of digital video compression dates back to 1952, when Bell Labs researchers B.M. Oliver and C.W. Harrison proposed the use of differential pulse-code modulation (DPCM) in video coding. In 1959, the concept of inter-frame motion compensation was proposed by NHK researchers Y. Taki, M. Hatori and S. Tanaka, who proposed predictive inter-frame video coding in
the data in a previous I-frame or P-frame – a reference frame. To generate a P-frame, the previous reference frame is reconstructed, just as it would be in a TV receiver or DVD player. The frame being compressed is divided into 16-pixel by 16-pixel macroblocks. Then, for each of those macroblocks, the reconstructed reference frame is searched to find a 16 by 16 area that closely matches the content of
the decoder program/hardware smaller, simpler, or faster. A profile restricts which encoding techniques are allowed. For example, the H.264 format includes the profiles baseline, main, and high (and others). While P-slices (which can be predicted based on preceding slices) are supported in all profiles, B-slices (which can be predicted based on both preceding and following slices) are supported in
the encoder takes the difference of all corresponding pixels of the two regions, and on that macroblock difference then computes the DCT and strings of coefficient values for the four 8×8 areas in the 16×16 macroblock as described above. This "residual" is appended to the motion vector and the result is sent to the receiver or stored on the DVD for each macroblock being compressed. Sometimes no suitable match
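A toy sketch of this motion search and residual computation, using exhaustive sum-of-absolute-differences (SAD) matching over integer offsets only (the search range here is arbitrary; real encoders use far faster search strategies and also support the half-integer offsets MPEG allows):

```python
import numpy as np

def best_motion_vector(ref, cur, top, left, size=16, search=8):
    """Exhaustively find the offset into `ref` whose block best matches
    the `size` x `size` macroblock of `cur` at (top, left)."""
    block = cur[top:top + size, left:left + size].astype(int)
    best = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= ref.shape[0] - size and 0 <= x <= ref.shape[1] - size:
                cand = ref[y:y + size, x:x + size].astype(int)
                sad = np.abs(block - cand).sum()  # matching cost
                if sad < best[1]:
                    best = ((dy, dx), sad)
    (dy, dx), _ = best
    # Residual: what remains to be DCT-coded after motion compensation.
    residual = block - ref[top + dy:top + dy + size,
                           left + dx:left + dx + size].astype(int)
    return (dy, dx), residual
```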
the equivalent chrominance bandwidth of a PAL-I or PAL-M signal decoded with a delay-line decoder, and still very much superior to NTSC. This scheme is used by Sony in their HDCAM high-definition recorders (not HDCAM SR). In the horizontal dimension, luma is sampled horizontally at three-quarters of the full HD sampling rate – 1440 samples per row instead of 1920. Chroma is sampled at 480 samples per row,
the horizontal sample rate of luma: the horizontal chroma resolution is halved. This reduces the bandwidth of an uncompressed video signal by one-third, which means that for 8 bits per component without alpha (24 bits per pixel), only 16 bits are enough, as in NV16. Many high-end digital video formats and interfaces use this scheme: In 4:1:1 chroma subsampling, the horizontal color resolution is quartered, and
the human visual system is much more sensitive to variations in brightness than color, a video system can be optimized by devoting more bandwidth to the luma component (usually denoted Y') than to the color difference components Cb and Cr. In compressed images, for example, the 4:2:2 Y'CbCr scheme requires two-thirds the bandwidth of non-subsampled "4:4:4" R'G'B'. This reduction results in almost no visual difference as perceived by
the inverse cosine transform (also with perfect precision). The conversion from 8-bit integers to real-valued transform coefficients actually expands the amount of data used at this stage of the processing, but the advantage of the transformation is that the image data can then be approximated by quantizing the coefficients. Many of the transform coefficients, usually the higher-frequency components, will be zero after
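A sketch of the transform-plus-quantization round trip using SciPy's DCT; the single flat step size below is an illustrative stand-in for MPEG-2's frequency-dependent quantization matrices:

```python
import numpy as np
from scipy.fft import dctn, idctn

block = np.random.randint(0, 256, (8, 8)).astype(float)  # one 8x8 block

coeffs = dctn(block, norm="ortho")              # spatial -> frequency domain
step = 16.0                                     # coarseness chosen by encoder
quantized = np.round(coeffs / step)             # the lossy rounding step
approx = idctn(quantized * step, norm="ortho")  # decoder-side reconstruction

print(np.abs(block - approx).max())  # small error, governed by `step`
```

A coarser `step` zeroes more high-frequency coefficients (better compression, more loss); a finer one preserves more detail.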
the level restrictions. A few common MPEG-2 Profile/Level combinations are presented below, with particular maximum limits noted: Some applications are listed below. The following organizations have held patents for MPEG-2 video technology, as listed at MPEG LA. All of these patents are now expired in the US and most other territories.

Video coding format

A video coding format (or sometimes video compression format)
the macroblock being compressed. The offset is encoded as a "motion vector". Frequently, the offset is zero, but if something in the picture is moving, the offset might be something like 23 pixels to the right and 4-and-a-half pixels up. In MPEG-1 and MPEG-2, motion vector values can represent either integer offsets or half-integer offsets. The match between the two regions will often not be perfect. To correct for this,
the major video coding standards (including the H.26x and MPEG formats) that followed. MPEG-1, developed by the Moving Picture Experts Group (MPEG), followed in 1991, and it was designed to compress VHS-quality video. It was succeeded in 1994 by MPEG-2/H.262, which was developed with patents licensed from a number of companies, primarily Sony, Thomson and Mitsubishi Electric. MPEG-2 became
the matrix to a smaller quantity of data. It is this entropy-coded data that is broadcast or put on DVDs. In the receiver or the player, the whole process is reversed, enabling the receiver to reconstruct, to a close approximation, the original frame. The processing of B-frames is similar to that of P-frames, except that B-frames use the picture in a subsequent reference frame as well as
the memory and processing power needed, defining maximum bit rates, frame sizes, and frame rates. An MPEG application then specifies the capabilities in terms of profile and level. For example, a DVD player may say it supports up to main profile and main level (often written as MP@ML). It means the player can play back any MPEG stream encoded as MP@ML or less. The tables below summarize the limitations of each profile and level, though there are constraints not listed here. Note that not all profile and level combinations are permissible, and scalable modes modify
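A sketch of that capability rule follows. The level limits below follow MPEG-2's published maxima for a few levels, but the table is abbreviated and its structure is hypothetical; real conformance checks involve many more parameters:

```python
# Abbreviated, illustrative level limits (width, height, max bit rate).
LEVEL_LIMITS = {
    "LL":  {"width": 352,  "height": 288,  "mbit_s": 4},
    "ML":  {"width": 720,  "height": 576,  "mbit_s": 15},
    "H14": {"width": 1440, "height": 1152, "mbit_s": 60},
    "HL":  {"width": 1920, "height": 1152, "mbit_s": 80},
}

def can_play(decoder_level, stream):
    """True if a decoder rated for `decoder_level` can accept `stream`."""
    lim = LEVEL_LIMITS[decoder_level]
    return (stream["width"] <= lim["width"]
            and stream["height"] <= lim["height"]
            and stream["mbit_s"] <= lim["mbit_s"])

# An MP@ML DVD player accepts an SD stream but rejects an HD one:
print(can_play("ML", {"width": 720, "height": 480, "mbit_s": 9.8}))   # True
print(can_play("ML", {"width": 1920, "height": 1080, "mbit_s": 25}))  # False
```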
the moving text. This image shows a single field. 4:2:0 interlaced sampling applied to moving interlaced material; this image shows a single field. In the 4:2:0 interlaced scheme, however, vertical resolution of the chroma is roughly halved, since the chroma samples effectively describe an area 2 samples wide by 4 samples tall instead of 2×2. As well, the spatial displacement between both fields can result in
the original frame is simply cut out (or lost in transmission), the following frames cannot be reconstructed properly. Making cuts in intraframe-compressed video while video editing is almost as easy as editing uncompressed video: one finds the beginning and ending of each frame, and simply copies bit-for-bit each frame that one wants to keep, and discards the frames one does not want. Another difference between intraframe and interframe compression
the parts of the bitstream, and other pieces of information. Aside from features for handling fields for interlaced coding, MPEG-2 Video is very similar to MPEG-1 Video (and even quite similar to the earlier H.261 standard), so the entire description below applies equally well to MPEG-1. MPEG-2 includes three basic types of coded frames: intra-coded frames (I-frames), predictive-coded frames (P-frames), and bidirectionally-predictive-coded frames (B-frames). An I-frame
the picture in a preceding reference frame. As a result, B-frames usually provide more compression than P-frames. B-frames are never reference frames in MPEG-2 Video. Typically, every 15th frame or so is made into an I-frame. P-frames and B-frames might follow an I-frame like this, IBBPBBPBBPBB(I), to form a Group of Pictures (GOP); however, the standard is flexible about this. The encoder selects which pictures are coded as I-, P-, and B-frames. P-frames provide more compression than I-frames because they take advantage of
the previous line). However, little equipment has actually been produced that outputs a SECAM analogue video signal. In general, SECAM territories either have to use a PAL-capable display or a transcoder to convert the PAL signal to SECAM for display. Different variants of 4:2:0 chroma configurations are found in: Cb and Cr are each subsampled at a factor of 2 both horizontally and vertically. Most digital video formats corresponding to 576i "PAL" use 4:2:0 chroma subsampling. There are four main variants of 4:2:0 schemes, having different horizontal and vertical sampling siting relative to
the quantization, which is basically a rounding operation. The penalty of this step is the loss of some subtle distinctions in brightness and color. The quantization may be either coarse or fine, as selected by the encoder. If the quantization is not too coarse and one applies the inverse transform to the matrix after it is quantized, one gets an image that looks very similar to the original image but
the raw frame is divided into 8-pixel by 8-pixel blocks. The data in each block is transformed by the discrete cosine transform (DCT). The result is an 8×8 matrix of coefficients that have real number values. The transform converts spatial variations into frequency variations, but it does not change the information in the block; if the transform is computed with perfect precision, the original block can be recreated exactly by applying
the recording, compression, and distribution of video content, used by 91% of video developers, followed by HEVC, which is used by 43% of developers. Consumer video is generally compressed using lossy video codecs, since that results in significantly smaller files than lossless compression. Some video coding formats are designed explicitly for either lossy or lossless compression, and some video coding formats such as Dirac and H.264 support both. Uncompressed video formats, such as Clean HDMI,
the region is not 2 pixels but 4 pixels, so if 8 bits per component are used, the media would be 9 bits per pixel) and 4:2:1. The mapping examples given are only theoretical and for illustration. Also, the diagram does not indicate any chroma filtering, which should be applied to avoid aliasing. To calculate the required bandwidth factor relative to 4:4:4 (or 4:4:4:4), one needs to sum all the factors and divide
the result by 12 (or 16, if alpha is present). Each of the three Y'CbCr components has the same sample rate, thus there is no chroma subsampling. This scheme is sometimes used in high-end film scanners and cinematic post-production. "4:4:4" may instead wrongly refer to R'G'B' color space, which implicitly also does not have any chroma subsampling (except that in JPEG, R'G'B' can be subsampled). Formats such as HDCAM SR can record 4:4:4 R'G'B' over dual-link HD-SDI. The two chroma components are sampled at half
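That calculation can be expressed directly. For example, 4:2:2 gives (4+2+2)/12 = 2/3, matching the bandwidth figure quoted earlier, and 4:2:0 gives 6/12 = 1/2. This helper assumes the standard J:a:b(:alpha) reading and ignores legacy exceptions such as 4:1:0:

```python
from fractions import Fraction

def bandwidth_factor(notation):
    """Bandwidth relative to 4:4:4 (or 4:4:4:4 when alpha is present)."""
    parts = [int(p) for p in notation.split(":")]
    denom = 16 if len(parts) == 4 else 12  # alpha adds a fourth full plane
    return Fraction(sum(parts), denom)

print(bandwidth_factor("4:2:2"))    # 2/3
print(bandwidth_factor("4:2:0"))    # 1/2
print(bandwidth_factor("4:2:2:4"))  # 3/4
```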
the same resolution is called 4:4:4. The MPEG-2 Video document considers all three sampling types, although 4:2:0 is by far the most common for consumer video, and there are no defined "profiles" of MPEG-2 for 4:4:4 video (see below for further discussion of profiles). While the discussion below in this section generally describes MPEG-2 video compression, there are many details that are not discussed, including details involving fields, chrominance formats, responses to scene changes, special codes that label
the standard coding technique for video compression from the late 1980s onwards. The first digital video coding standard was H.120, developed by the CCITT (now ITU-T) in 1984. H.120 was not usable in practice, as its performance was too poor. H.120 used motion-compensated DPCM coding, a lossless compression algorithm that was inefficient for video coding. During the late 1980s, a number of companies began experimenting with discrete cosine transform (DCT) coding,
the standard video format for DVD and SD digital television. Its motion-compensated DCT algorithm was able to achieve a compression ratio of up to 100:1, enabling the development of digital media technologies such as video on demand (VOD) and high-definition television (HDTV). In 1999, it was followed by MPEG-4/H.263, which was a major leap forward for video compression technology. It uses patents licensed from
the term YUV is often used erroneously to refer to Y'CbCr encoding. Hence, expressions like "4:2:2 YUV" always refer to 4:2:2 Y'CbCr, since there simply is no such thing as 4:x:x in analog encoding (such as YUV). Pixel formats used in Y'CbCr can be referred to as YUV too, for example yuv420p, yuvj420p and many others. In a similar vein, the term luminance and the symbol Y are often used erroneously to refer to luma, which
the two fields are displayed alternately, with the lines of one field interleaving between the lines of the previous field; this format is called interlaced video. The typical field rate is 50 (Europe/PAL) or 59.94 (US/NTSC) fields per second, corresponding to 25 (Europe/PAL) or 29.97 (North America/NTSC) whole frames per second. If the video is not interlaced, then it is called progressive scan video and each picture
the two terms. The prime symbol ′ is used to indicate gamma correction. Similarly, the chroma of video engineering differs from the chrominance of color science. The chroma of video engineering is formed from weighted tristimulus components (gamma-corrected, OETF), not linear components. In video engineering practice, the terms chroma, chrominance, and saturation are often used interchangeably to refer to chroma, but it
the video coding format VP9 itself a codec. As an example of conflation, Chromium's and Mozilla's pages listing their supported video formats both call video coding formats, such as H.264, codecs. As another example, in Cisco's announcement of a free-as-in-beer video codec, the press release refers to the H.264 video coding format as a codec ("choice of a common video codec"), but calls Cisco's implementation of
the video encoding standards for Blu-ray Discs; all Blu-ray Disc players must be able to decode H.264. It is also widely used by streaming internet sources, such as videos from YouTube, Netflix, Vimeo, and the iTunes Store, web software such as the Adobe Flash Player and Microsoft Silverlight, and also various HDTV broadcasts over terrestrial (ATSC standards, ISDB-T, DVB-T or DVB-T2), cable (DVB-C), and satellite (DVB-S2). A main problem for many video coding formats has been patents, making them expensive to use or potentially risking
the video is interlaced. All standards-conforming MPEG-2 Video decoders are also fully capable of playing back MPEG-1 Video streams. The ISO/IEC approval process was completed in November 1994. The first edition was approved in July 1995 and published by ITU-T and ISO/IEC in 1996. Didier LeGall of Bellcore chaired the development of the standard, and Sakae Okubo of NTT was the ITU-T coordinator and chaired
the viewer. The human visual system (HVS) processes color information (hue and colorfulness) at about a third of the resolution of luminance (lightness/darkness information in an image). Therefore, it is possible to sample color information at a lower resolution while maintaining good image quality. This is achieved by encoding RGB image data into a composite black-and-white image, with separated color-difference data (chroma). For example, with Y′CbCr, gamma-encoded R′G′B′ components are weighted and then summed together to create
the way the eye works, it is possible to delete or approximate some data from video pictures with little or no noticeable degradation in image quality. A common (and old) trick to reduce the amount of data is to separate each complete "frame" of video into two "fields" upon broadcast/encoding: the "top field", which is the odd-numbered horizontal lines, and the "bottom field", which is the even-numbered lines. Upon reception/decoding,
was extended by two amendments to include the registration of copyright identifiers and the 4:2:2 Profile. ITU-T published these amendments in 1996 and ISO in 1997. There are also other amendments published later by ITU-T and ISO/IEC. The most recent edition of the standard was published in 2013 and incorporates all prior amendments. An HDTV camera with 8-bit sampling generates a raw video stream of 25 × 1920 × 1080 × 3 = 155,520,000 bytes per second for 25 frame-per-second video (using
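Spelled out, the arithmetic behind that raw rate (and the greater-than-1-Gbit/s figure quoted elsewhere in this article) is:

```python
# Raw 1080-line HD at 25 fps, 8 bits per sample, three color components.
frames_per_s, width, height, components = 25, 1920, 1080, 3
bytes_per_s = frames_per_s * width * height * components

print(bytes_per_s)            # 155520000 bytes/s
print(bytes_per_s * 8 / 1e6)  # ~1244 Mbit/s before any compression
```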
was initially limited to intra-frame coding in the spatial dimension. In 1975, John A. Roese and Guner S. Robinson extended Habibi's hybrid coding algorithm to the temporal dimension, using transform coding in the spatial dimension and predictive coding in the temporal dimension, developing inter-frame motion-compensated hybrid coding. For the spatial transform coding, they experimented with different transforms, including
was introduced in the 1970s, initially using uncompressed pulse-code modulation (PCM), requiring high bitrates around 45–200 Mbit/s for standard-definition (SD) video, which was up to 2,000 times greater than the telecommunication bandwidth (up to 100 kbit/s) available until the 1990s. Similarly, uncompressed high-definition (HD) 1080p video requires bitrates exceeding 1 Gbit/s, significantly greater than
was later added to the WebM specification. A format is the layout plan for data produced or consumed by a codec. Although video coding formats such as H.264 are sometimes referred to as codecs, there is a clear conceptual difference between a specification and its implementations. Video coding formats are described in specifications, and software, firmware, or hardware to encode/decode data in
was published in 1974. The other key development was motion-compensated hybrid coding. In 1974, Ali Habibi at the University of Southern California introduced hybrid coding, which combines predictive coding with transform coding. He examined several transform coding techniques, including the DCT, Hadamard transform, Fourier transform, slant transform, and Karhunen–Loève transform. However, his algorithm