Lossy compression

From wiki.gis.com
Original image (lossless PNG, 60.1 KiB; 108.5 KiB uncompressed)
Low compression (9.37 KiB, 84% smaller than the lossless PNG)
Medium compression (4.82 KiB, 92% smaller than the lossless PNG)
High compression (1.14 KiB, 98% smaller than the lossless PNG)

A lossy compression method is one where compressing data and then decompressing it retrieves data that differs from the original but is close enough to be useful in some way. Lossy compression is most commonly used to compress multimedia data (audio, video, still images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression is required for text and data files, such as bank records and text articles. In many cases it is advantageous to make a master lossless file which can then be used to produce compressed files for different purposes; for example, a multi-megabyte file can be used at full size to produce a full-page advertisement in a glossy magazine, while a 10-kilobyte lossy copy is made for a small image on a web page.

Lossy and lossless compression

It is possible to compress many types of digital data in a way which reduces the amount of data stored, and consequently the size of the computer file needed to store it or the bandwidth needed to stream it, with no loss of information. Take, for example, a picture. It is converted to a digital file by considering it to be an array of dots, and specifying the colour and brightness of each dot. If the picture contains an area of the same colour, it can be compressed without loss by saying "200 red dots" instead of "red dot, red dot, ... (197 more times) ... red dot".
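The "200 red dots" idea is usually called run-length encoding. A minimal sketch follows; the function names and the sample row are illustrative assumptions, not part of any particular file format.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    return [value for count, value in pairs for _ in range(count)]

# A hypothetical scanline: 200 red dots followed by 3 blue dots.
row = ["red"] * 200 + ["blue"] * 3
encoded = rle_encode(row)
print(encoded)

# Lossless: decoding gives back exactly what was encoded.
assert rle_decode(encoded) == row
```

The encoded form holds two pairs instead of 203 values, and nothing is lost in the round trip.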

The original contains a certain amount of information; there is a lower limit to the size of file that can carry all the information. As an intuitive example, most people know that a compressed ZIP file is smaller than the original file; but repeatedly compressing the file will not reduce the size to nothing.
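This lower limit can be observed with any lossless compressor. The sketch below uses Python's standard zlib module on an arbitrary, made-up repetitive input; the exact sizes depend on the input and the zlib build, so none are quoted here.

```python
import zlib

# Arbitrary, highly redundant sample input (an assumption for illustration).
data = b"the quick brown fox jumps over the lazy dog. " * 2000

sizes = [len(data)]
blob = data
for _ in range(5):
    blob = zlib.compress(blob, 9)   # compress the previous output again
    sizes.append(len(blob))
print(sizes)

# Undo the five passes to confirm that no information was lost.
restored = blob
for _ in range(5):
    restored = zlib.decompress(restored)
assert restored == data
```

The first pass removes most of the redundancy; after that the size levels off, and it can never reach zero because the output must still carry everything needed to restore the input.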

In many cases files or data streams contain more information than is needed. For example, a picture may have more detail than the eye can distinguish when reproduced at the largest size intended; an audio file does not need a lot of fine detail during a very loud passage. Developing lossy compression techniques as closely matched to human perception as possible is a complex task. In some cases the ideal is a file which provides exactly the same perception as the original, with as much digital information as possible removed; in other cases perceptible loss of quality is considered a valid trade-off for the reduced data size.

Transform coding

More generally, lossy compression can be thought of as an application of transform coding – in the case of multimedia data, perceptual coding: it transforms the raw data to a domain that more accurately reflects the information content. For example, rather than expressing a sound file as the amplitude levels over time, one may express it as the frequency spectrum over time, which corresponds more accurately to human audio perception.
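As a rough illustration of why the frequency domain can reflect the information content more directly, the sketch below applies a naive discrete Fourier transform to a made-up signal built from two tones. The signal, its length, and the threshold are assumptions chosen for the example; real audio codecs use windowed, overlapping transforms rather than a single DFT.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

N = 64
# Two pure tones: 4 and 9 cycles per block, the second at half amplitude.
signal = [
    math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 9 * t / N)
    for t in range(N)
]

spectrum = dft(signal)
# Look only at the positive-frequency half; the other half mirrors it.
strong_bins = [k for k in range(N // 2) if abs(spectrum[k]) > 1e-6]
print(strong_bins)
```

All 64 time-domain samples vary, yet only the bins for the two tones carry energy in the frequency domain, so the same content is described by far fewer significant numbers.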

While data reduction (compression, be it lossy or lossless) is a main goal of transform coding, it also allows other goals: one may represent data more accurately for the original amount of space[1]. For example, in principle, if one starts with an analog or high-resolution digital master, an MP3 file of a given bitrate (e.g. 320 kbit/s) should provide a better representation than raw uncompressed audio in a WAV or AIFF file of the same bitrate. (Uncompressed audio can reach a lower bitrate only by lowering the sampling frequency and/or the sampling resolution.) Further, a transform coding may provide a better domain for manipulating or otherwise editing the data: for example, equalization of audio is most naturally expressed in the frequency domain (boost the bass, for instance) rather than in the raw time domain.

From this point of view, perceptual encoding is not essentially about discarding data, but rather about a better representation of data.

Another use is for backward compatibility and graceful degradation: in color television, encoding color via a luminance-chrominance transform domain (such as YUV) means that black-and-white sets display the luminance, while ignoring the color information.

Another example is chroma subsampling: the use of color spaces such as YIQ, used in NTSC, allows one to reduce the resolution of the components to accord with human perception. Humans have the highest resolution for black-and-white (luma), lower resolution for mid-spectrum colors like yellow and green, and the lowest for reds and blues. Thus NTSC displays approximately 350 pixels of luma per scanline, 150 pixels of yellow vs. green, and 50 pixels of blue vs. red, which are proportional to human sensitivity to each component.
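The two ideas above, a luminance-chrominance transform and reduced chroma resolution, can be sketched together. This example assumes the commonly quoted analog YUV weights (rather than YIQ), a made-up four-pixel scanline, and a simple averaging subsampler; it is an illustration, not the NTSC signal chain.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to luma Y and chroma U, V (commonly quoted weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def subsample(values, factor):
    """Average each group of `factor` neighbouring samples into one sample."""
    groups = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(group) / len(group) for group in groups]

# Hypothetical scanline: two reddish pixels followed by two greenish pixels.
scanline = [(200, 30, 30), (210, 40, 35), (60, 180, 60), (50, 170, 70)]
ys, us, vs = zip(*(rgb_to_yuv(*pixel) for pixel in scanline))

luma = list(ys)                       # kept at full resolution
chroma_u = subsample(list(us), 2)     # half horizontal resolution
chroma_v = subsample(list(vs), 2)
print(len(luma), len(chroma_u), len(chroma_v))
```

A black-and-white receiver would use only `luma`, while a color receiver combines it with the coarser chroma samples.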

Information loss

Lossy compression formats suffer from generation loss: repeatedly compressing and decompressing the file will cause it to progressively lose quality. This is in contrast with lossless data compression.

Information-theoretical foundations for lossy data compression are provided by rate-distortion theory. Much like the use of probability in optimal coding theory, rate-distortion theory draws heavily on Bayesian estimation and decision theory in order to model perceptual distortion and even aesthetic judgment.

Types

There are two basic lossy compression schemes:

  • In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded.
  • In lossy predictive codecs, previous and/or subsequent decoded data is used to predict the current sound sample or image frame. The error between the predicted data and the real data, together with any extra information needed to reproduce the prediction, is then quantized and coded.
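The transform-codec steps in the list above (segment, transform, quantize) can be sketched on a single block. This example assumes a one-dimensional orthonormal DCT, an eight-sample block of made-up values, and one uniform quantizer step for every coefficient; the final entropy-coding stage is omitted.

```python
import math

def _scale(k, n):
    """Normalisation factor that makes the DCT orthonormal."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(block):
    """Orthonormal DCT-II of a 1-D block."""
    n = len(block)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi * (t + 0.5) * k / n)
                           for t, x in enumerate(block))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (t + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for t in range(n)
    ]

def encode(block, step):
    """Transform the block, then quantize each coefficient to an integer level."""
    return [round(c / step) for c in dct(block)]

def decode(levels, step):
    """Rescale the integer levels and transform back to samples."""
    return idct([q * step for q in levels])

block = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical sample values
levels = encode(block, step=10)
approx = decode(levels, step=10)
print(levels)
print([round(a) for a in approx])
```

The quantization in `encode` is the only lossy step: a larger `step` drives more levels to zero, which an entropy coder can store cheaply, at the cost of a larger reconstruction error.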

In some systems the two techniques are combined, with transform codecs being used to compress the error signals generated by the predictive stage.
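The predictive scheme from the list above can be sketched as a simple differential coder. This example assumes the most basic predictor (the previous decoded sample) and a uniform quantizer; the sample values and step size are made up.

```python
def dpcm_encode(samples, step):
    """Quantize the error between each sample and the previous decoded sample."""
    levels = []
    prediction = 0
    for x in samples:
        q = round((x - prediction) / step)
        levels.append(q)
        # Track what the decoder will reconstruct, so errors do not accumulate.
        prediction += q * step
    return levels

def dpcm_decode(levels, step):
    """Rebuild the samples by accumulating the dequantized errors."""
    decoded = []
    prediction = 0
    for q in levels:
        prediction += q * step
        decoded.append(prediction)
    return decoded

samples = [100, 102, 105, 109, 108, 104, 101, 99]   # hypothetical signal
levels = dpcm_encode(samples, step=4)
decoded = dpcm_decode(levels, step=4)
print(levels)
print(decoded)
```

Because the encoder predicts from decoded rather than original data, each reconstructed sample stays within half a quantizer step of the original. In a combined system, the error values would be passed through a transform stage before quantization instead of being quantized directly.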

Lossy versus lossless

The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any lossless method, while still meeting the requirements of the application.

Lossy methods are most often used for compressing sound, images or videos. This is because these types of data are intended for human interpretation where the mind can easily "fill in the blanks" or see past very minor errors or inconsistencies – ideally lossy compression is transparent (imperceptible), which can be verified via an ABX test.

Transparency

When a user acquires a lossily compressed file (for example, to reduce download time), the retrieved file can be quite different from the original at the bit level while being indistinguishable to the human ear or eye for most practical purposes. Many compression methods focus on the idiosyncrasies of human physiology, taking into account, for instance, that the human eye can see only certain wavelengths of light. The psychoacoustic model describes how sound can be highly compressed without degrading perceived quality. Flaws caused by lossy compression that are noticeable to the human eye or ear are known as compression artifacts.

Compression ratio

The compression ratio (that is, the size of the uncompressed file compared to that of the compressed file) of lossy video codecs is nearly always far superior to that of the audio and still-image equivalents.

  • Video can be compressed immensely (e.g. 100:1) with little visible quality loss;
  • Audio can often be compressed at 10:1 with imperceptible loss of quality;
  • Still images are often lossily compressed at 10:1, as with audio, but the quality loss is more noticeable, especially on closer inspection.
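To make ratios such as 10:1 concrete, the sketch below computes them for the file sizes quoted in the image captions at the top of this article. The function names are illustrative, and the ratio is taken as uncompressed size divided by compressed size, matching the "100:1" style used above.

```python
def compression_ratio(uncompressed_size, compressed_size):
    """Ratio N in 'N:1': how many times smaller the compressed file is."""
    return uncompressed_size / compressed_size

def space_saving(reference_size, compressed_size):
    """Fraction of the reference size that compression removed."""
    return 1 - compressed_size / reference_size

UNCOMPRESSED_KIB = 108.5   # uncompressed image size from the captions
PNG_KIB = 60.1             # lossless PNG size from the captions

for label, size in [("low", 9.37), ("medium", 4.82), ("high", 1.14)]:
    ratio = compression_ratio(UNCOMPRESSED_KIB, size)
    saving = space_saving(PNG_KIB, size)
    print(f"{label}: {ratio:.1f}:1, {saving:.0%} smaller than the PNG")
```

The savings agree with the captions' 84%, 92% and 98% figures when measured against the 60.1 KiB PNG rather than the 108.5 KiB uncompressed size.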

Transcoding and editing

An important caveat about lossy compression is that converting (formally, transcoding) or editing lossily compressed files causes digital generation loss from the re-encoding. This can be avoided by only producing lossy files from (lossless) originals, and only editing (copies of) original files, such as images in raw image format instead of JPEG.
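The cost of transcoding can be shown with a toy model. The sketch below assumes a stand-in "codec" that merely snaps samples to multiples of a step size, with two made-up formats A (step 7) and B (step 5); real codecs are far more elaborate, but the effect is analogous.

```python
def lossy_roundtrip(samples, step):
    """Toy lossy codec: snap every sample to the nearest multiple of `step`."""
    return [round(x / step) * step for x in samples]

def total_error(original, decoded):
    """Sum of absolute differences from the original samples."""
    return sum(abs(a - b) for a, b in zip(original, decoded))

original = list(range(100))

# Encode to format B directly from the original.
direct = lossy_roundtrip(original, 5)
# Encode to format A first, then transcode the result to format B.
transcoded = lossy_roundtrip(lossy_roundtrip(original, 7), 5)

print(total_error(original, direct), total_error(original, transcoded))
```

Both results are valid format B data, but the transcoded one is further from the original: the second encoder faithfully preserves errors introduced by the first and then adds its own. For the sample value 11, for instance, direct encoding gives 10, while going through format A gives 14 and then 15.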

Editing of lossy files

Some edits to lossily compressed files can be made without degrading quality, by modifying the compressed data directly rather than decoding and re-encoding it. Editing which reduces the file size as if the data had been compressed to a greater degree, but without more loss than this, is sometimes also possible.

JPEG

The primary programs for lossless editing of JPEGs are jpegtran, and the derived exiftran (which also preserves Exif information), and Jpegcrop (which provides a Windows interface).

These allow the image to be

  • cropped,
  • rotated, flipped, and flopped, or
  • converted to grayscale (by dropping the chrominance channel).

While unwanted information is destroyed, the quality of the remaining portion is unchanged.

JPEGjoin allows different JPEG images which have the same encoding to be joined without re-encoding. (See also: New jpegtran features.)

Some changes can be made to the compression without re-encoding:

  • optimize the compression (to reduce size without change to the decoded image),
  • convert between progressive and non-progressive encoding.
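As a rough sketch, the operations listed above map onto jpegtran switches as follows. The file names are hypothetical, the crop geometry is arbitrary, and switch availability varies between jpegtran builds (cropping in particular was added in later versions), so treat the commands as assumptions to check against the local manual page.

```python
import os
import shutil
import subprocess

def jpegtran_command(src, dst, *options):
    """Build a jpegtran argument list; '-copy all' keeps Exif and other markers."""
    return ["jpegtran", "-copy", "all", *options, "-outfile", dst, src]

# Hypothetical input and output file names.
commands = [
    jpegtran_command("in.jpg", "rotated.jpg", "-rotate", "90"),
    jpegtran_command("in.jpg", "flipped.jpg", "-flip", "horizontal"),
    jpegtran_command("in.jpg", "cropped.jpg", "-crop", "320x240+16+16"),
    jpegtran_command("in.jpg", "gray.jpg", "-grayscale"),
    jpegtran_command("in.jpg", "smaller.jpg", "-optimize"),
    jpegtran_command("in.jpg", "progressive.jpg", "-progressive"),
]

# Run the commands only when both the tool and the input file are present.
if shutil.which("jpegtran") and os.path.exists("in.jpg"):
    for command in commands:
        subprocess.run(command, check=True)
else:
    for command in commands:
        print(" ".join(command))
```

None of these commands decodes the image to pixels and re-encodes it; they rearrange or re-pack the existing compressed coefficients.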

The freeware Windows-only IrfanView has some lossless JPEG operations in its JPG_TRANSFORM plugin.

MP3

Splitting and joining
Mp3splt and Mp3wrap (or AlbumWrap) allow an MP3 file to be split into pieces or joined losslessly. These are analogous to split and cat.[2]
Gain
Various Replay Gain programs such as MP3gain allow the gain (overall volume) of MP3 files to be modified losslessly.

Metadata

Metadata, such as ID3 tags, VorbisComments, or Exif information, can usually be modified or removed without modifying the underlying data.

Downsampling/compressed representation scalability

One may wish to downsample or otherwise decrease the resolution of the represented source signal and the quantity of data used for its compressed representation without re-encoding, as in bitrate peeling, but this functionality is not supported in all designs, as not all codecs encode data in a form that allows less important detail to simply be dropped.
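A toy illustration of a representation in which less important detail can simply be dropped: samples are stored as bit planes, most significant first, so truncating the stored planes yields a coarser signal with no re-encoding. This is a sketch of the general idea under that assumption, not the data layout of any codec named in this article.

```python
def to_bit_planes(samples, bits=8):
    """Split unsigned samples into bit planes, most significant plane first."""
    return [[(s >> b) & 1 for s in samples] for b in range(bits - 1, -1, -1)]

def from_bit_planes(planes, bits=8):
    """Rebuild samples from however many leading planes are available."""
    samples = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = bits - 1 - i
        samples = [s | (bit << shift) for s, bit in zip(samples, plane)]
    return samples

samples = [200, 13, 97, 255, 128]      # hypothetical 8-bit samples
planes = to_bit_planes(samples)

full = from_bit_planes(planes)         # all 8 planes: exact
coarse = from_bit_planes(planes[:4])   # first 4 planes only: half the data
print(full)
print(coarse)
```

Dropping the last four planes halves the stored data and leaves each sample rounded down to a multiple of 16.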

Some well-known designs that have this capability include JPEG 2000 for still images and H.264/MPEG-4 AVC based Scalable Video Coding for video. Such schemes have also been standardized for older designs, such as JPEG images with progressive encoding, and MPEG-2 and MPEG-4 Part 2 video, although those earlier schemes had limited success in terms of adoption into real-world common usage.

Without this capability, as is often the case in practice, producing a representation with lower resolution or lower fidelity than a given one requires either starting from the original source signal and encoding it, or starting from a compressed representation and decompressing and re-encoding it (transcoding), though the latter tends to cause digital generation loss.

Some audio formats feature a combination of a lossy format and a lossless correction which, when combined, reproduce the original signal; the correction can be stripped, leaving a smaller, lossily compressed file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.
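The lossy-plus-correction idea can be sketched in a few lines. This example assumes a trivial lossy layer (rounding to a step size) and made-up integer samples; the formats named above use real perceptual codecs for the lossy part, so this shows only the principle.

```python
def split_layers(samples, step):
    """Return a coarse lossy layer plus the residual needed to undo the loss."""
    base = [round(s / step) * step for s in samples]
    correction = [s - b for s, b in zip(samples, base)]
    return base, correction

def merge_layers(base, correction):
    """Add the correction back to recover the original samples exactly."""
    return [b + c for b, c in zip(base, correction)]

samples = [1003, -250, 17, 4096, -7]   # hypothetical integer audio samples
base, correction = split_layers(samples, step=16)

print(base)         # usable on its own as a lossy approximation
print(correction)   # small values, cheap to store losslessly
assert merge_layers(base, correction) == samples
```

Discarding `correction` leaves the smaller lossy layer, while keeping both reproduces the original signal.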

Methods

Graphics

Image

  • Cartesian Perceptual Compression (CPC)
  • DjVu
  • Fractal compression
  • HAM, hardware compression of color information used in Amiga computers
  • ICER, used by the Mars Rovers: related to JPEG 2000 in its use of wavelets
  • JPEG
  • JPEG 2000, JPEG's successor format that uses wavelets, for lossy or lossless compression
  • JBIG2
  • PGF, Progressive Graphics File (lossless or lossy compression)
  • Wavelet compression
  • S3TC texture compression for 3D computer graphics hardware

Video

  • H.261
  • H.263
  • H.264
  • MNG (supports JPEG sprites)
  • Motion JPEG
  • MPEG-1 Part 2
  • MPEG-2 Part 2
  • MPEG-4 Part 2 and Part 10 (AVC)
  • Ogg Theora (noted for its lack of patent restrictions)
  • Dirac
  • Sorenson video codec
  • VC-1

Audio

Music

  • AAC
  • ADPCM
  • ATRAC
  • Dolby AC-3
  • MP2
  • MP3
  • Musepack
  • Ogg Vorbis (noted for its lack of patent restrictions)
  • WMA

Speech

  • CELP
  • G.711
  • G.726
  • Harmonic and Individual Lines and Noise (HILN)
  • AMR (used by GSM cell carriers, such as T-Mobile)
  • Speex (noted for its lack of patent restrictions)

Other data

Researchers have (semi-seriously) performed lossy compression on text, either by using a thesaurus to substitute short words for long ones or by using generative text techniques[3], although these sometimes fall into the related category of lossy data conversion.

Lowering resolution

A general kind of lossy compression is to lower the resolution of an image, as in image scaling, particularly decimation. One may also remove less informative ("lower information") parts of an image, such as by seam carving.

Many media transforms, such as Gaussian blur, are, like lossy compression, irreversible: the original signal cannot be reconstructed from the transformed signal. However, in general these will have the same size as the original, and are not a form of compression.

See also

  • Compression artifact
  • Rate–distortion theory
  • List of codecs
  • Lenna

Notes

  1. “Although one main goal of digital audio perceptual coders is data reduction, this is not a necessary characteristic. As we shall see, perceptual coding can be used to improve the representation of digital audio through advanced bit allocation.” Masking and Perceptual Coding, Victor Lombardi
  2. Though the wrap programs do more, encoding the divisions between the original files.
  3. I. H. Witten, et al. "Semantic and Generative Models for Lossy Text Compression" (PDF). The Computer Journal. http://compression.ru/download/articles/text/witten_1994cj_lossy_text_compression.pdf. Retrieved 2007-10-13.
