Information theory
Information theory is a branch of the mathematical theory of probability and mathematical statistics that quantifies the concept of information. It is concerned with information entropy, communication systems, data transmission and rate distortion theory, cryptography, data compression, error correction, and related topics. It is not to be confused with library and information science or information technology.
Claude E. Shannon (1916–2001) has been called "the father of information theory". His theory was the first to treat communication as a rigorously stated mathematical problem in statistics, and it gave communications engineers a way to determine the capacity of a communication channel in terms of the common currency of bits. The transmission part of the theory is not concerned with the meaning (semantics) of the message conveyed, though the complementary wing of information theory concerns itself with content through lossy compression of messages subject to a fidelity criterion.
These two wings of information theory are joined together and mutually justified by the information transmission theorems, also called source-channel separation theorems, which establish bits as the universal currency of information in many contexts.
The modern discipline of information theory is generally considered to have begun with the publication of Shannon's article "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948. This work drew on earlier publications by Harry Nyquist and Ralph Hartley. In the process of working out a theory of communication that electrical engineers could apply to design better telecommunications systems, Shannon defined a measure of entropy,

$$H = -\sum_i p_i \log p_i,$$

where $p_i$ is the probability of outcome $i$. When applied to an information source, this measure determines the capacity of the channel required to transmit the source as encoded binary digits. If the logarithm in the formula is taken to base 2, it gives entropy in bits. Shannon's measure of entropy came to be taken as a measure of the information contained in a message, as opposed to the portion of the message that is strictly determined (hence predictable) by inherent structures, such as redundancy in the structure of languages or the statistical properties of a language, for example the frequencies of occurrence of different letter or word pairs, triplets, and so on. See Markov chains.
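The entropy formula $H = -\sum_i p_i \log p_i$ is easy to evaluate directly. The following is a minimal sketch, not Shannon's own procedure: `shannon_entropy` computes $H$ from a list of probabilities, and `empirical_entropy` estimates it for a message by substituting observed symbol frequencies for $p_i$. Both names are hypothetical, and treating frequencies as probabilities of independent symbols is an assumption that ignores the letter-pair structure mentioned above.

```python
import math
from collections import Counter


def shannon_entropy(probs, base=2.0):
    """Return H = -sum_i p_i * log(p_i) in the given base (bits for base 2).

    Zero-probability outcomes are skipped, following the convention 0 * log 0 = 0.
    """
    return sum(-p * math.log(p, base) for p in probs if p > 0)


def empirical_entropy(message, base=2.0):
    """Estimate entropy per symbol of `message` from its symbol frequencies.

    Assumption: symbols are treated as independent draws, so dependencies
    between neighbouring symbols (letter pairs, triplets) are ignored.
    """
    counts = Counter(message)
    n = len(message)
    return shannon_entropy((c / n for c in counts.values()), base)


if __name__ == "__main__":
    print(shannon_entropy([0.5, 0.5]))       # fair coin: 1 bit
    print(shannon_entropy([0.25] * 4))       # four equally likely symbols: 2 bits
    print(empirical_entropy("aaaa"))         # fully predictable: 0 bits
    print(empirical_entropy("abracadabra"))  # between 0 and log2(5) bits
```

Passing `base=math.e` gives the same quantity in nats; only the unit changes, as the text notes for base 2.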
Recently, however, it has emerged that entropy was defined and used during the Second World War by Alan Turing at Bletchley Park. Turing named it 'weight of evidence' and measured it in units called bans and decibans. Turing and Shannon collaborated during the war, but it appears that they created the concept independently. (References are given in Alan Turing: The Enigma by Andrew Hodges.)
Entropy as defined by Shannon is closely related to entropy as defined by physicists. Boltzmann and Gibbs did considerable work on statistical thermodynamics, and this work inspired the adoption of the term entropy in information theory. There are deep relationships between entropy in the thermodynamic and informational senses. For instance, Maxwell's demon needs information to reverse thermodynamic entropy, and obtaining that information exactly balances out the thermodynamic gain that the demon would otherwise achieve.
Among other useful measures of information is mutual information, a measure of the statistical dependence (loosely, the correlation) between two random variables. Mutual information is defined for two random variables $X$ and $Y$ as

$$I(X;Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X,Y),$$

where $H(X,Y)$ is the joint entropy, defined as

$$H(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y),$$

and $H(X \mid Y)$ is the conditional entropy of $X$ conditioned on observing $Y$. As such, mutual information can be intuitively understood as the amount of uncertainty in $X$ that is eliminated by observing $Y$, and vice versa.
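A small numerical check of the identity $I(X;Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X,Y)$ may help. This sketch assumes finite alphabets and a joint distribution given as a nested list `joint[x][y]` $= p(x,y)$; the layout and all function names are hypothetical, and everything is in bits. Independent variables should give 0 bits, and a variable that copies the other should give the full 1 bit of $H(X)$.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits, using the convention 0 * log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def marginals(joint):
    """Return (p(x), p(y)) from a joint table joint[x][y] = p(x, y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def conditional_entropy_x_given_y(joint):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p(x|y) = p(x,y) / p(y)."""
    _, py = marginals(joint)
    return sum(
        -pxy * math.log2(pxy / py[y])
        for row in joint
        for y, pxy in enumerate(row)
        if pxy > 0
    )


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    px, py = marginals(joint)
    h_joint = entropy_bits(p for row in joint for p in row)
    return entropy_bits(px) + entropy_bits(py) - h_joint


if __name__ == "__main__":
    examples = {
        "independent": [[0.25, 0.25], [0.25, 0.25]],  # Y says nothing about X
        "copy": [[0.5, 0.0], [0.0, 0.5]],             # Y always equals X
        "noisy": [[0.4, 0.1], [0.1, 0.4]],            # Y equals X 80% of the time
    }
    for name, joint in examples.items():
        px, _ = marginals(joint)
        mi = mutual_information(joint)
        alt = entropy_bits(px) - conditional_entropy_x_given_y(joint)
        print(name, round(mi, 4), round(alt, 4))  # the two forms agree
```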
Mutual information is closely related to the log-likelihood ratio test for multinomials and to Pearson's χ² test.
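One concrete form of this relationship: for a two-way contingency table of counts with grand total $N$, the log-likelihood ratio (G) statistic for independence equals $2N$ times the mutual information of the empirical joint distribution measured in nats, $G = 2N\,I(X;Y)$. The sketch below checks this numerically and also computes Pearson's $\chi^2$ statistic, which typically comes out close to $G$ when expected counts are large. The table and function names are hypothetical, no statistics library is assumed, and `pearson_chi2` assumes every row and column total is nonzero.

```python
import math


def _totals(counts):
    """Row totals, column totals and grand total of a 2-D table of counts."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    return rows, cols, sum(rows)


def g_statistic(counts):
    """G = 2 * sum O_ij * ln(O_ij / E_ij), with E_ij = row_i * col_j / N."""
    rows, cols, n = _totals(counts)
    g = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            if observed > 0:  # cells with O_ij = 0 contribute nothing
                expected = rows[i] * cols[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def pearson_chi2(counts):
    """Pearson's statistic: sum (O_ij - E_ij)^2 / E_ij."""
    rows, cols, n = _totals(counts)
    return sum(
        (observed - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, row in enumerate(counts)
        for j, observed in enumerate(row)
    )


def mutual_information_nats(counts):
    """I(X;Y) of the empirical distribution p_ij = O_ij / N, natural log."""
    rows, cols, n = _totals(counts)
    return sum(
        (o / n) * math.log((o / n) / ((rows[i] / n) * (cols[j] / n)))
        for i, row in enumerate(counts)
        for j, o in enumerate(row)
        if o > 0
    )


if __name__ == "__main__":
    table = [[30, 10], [20, 40]]  # hypothetical counts
    n = sum(map(sum, table))
    print(g_statistic(table), 2 * n * mutual_information_nats(table))  # equal
    print(pearson_chi2(table))  # similar in size, but not identical
```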
Shannon information is appropriate for measuring uncertainty over an unordered space. Ronald Fisher created an alternative measure of information for measuring uncertainty over an ordered space. For example, Shannon information is used over the space of alphabetic letters, as letters do not have 'distances' between them. For information about the value of a continuous parameter, such as a person's height, Fisher information is used, as estimated heights do have a well-defined distance.
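To make the height example concrete, one can assume heights follow a normal distribution with unknown mean $\mu$ and known standard deviation $\sigma$; this model is an assumption, not something the text specifies. Fisher information is the expected squared score, $\mathcal{I}(\mu) = E\big[(\partial_\mu \log f(X;\mu))^2\big]$, which for this model equals $1/\sigma^2$: the narrower the spread of heights, the more each observation tells us about $\mu$. The sketch approximates that expectation with the trapezoidal rule and compares it with the closed form. The values 170 cm and 7 cm and all function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def score_mu(x, mu, sigma):
    """Derivative of log N(x; mu, sigma^2) with respect to mu."""
    return (x - mu) / sigma ** 2


def fisher_information_mu(mu, sigma, n=20001, width=12.0):
    """Approximate E[score^2] by the trapezoidal rule over mu +/- width*sigma.

    The tails beyond 12 standard deviations contribute a negligible amount.
    """
    lo = mu - width * sigma
    h = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * score_mu(x, mu, sigma) ** 2 * normal_pdf(x, mu, sigma)
    return total * h


if __name__ == "__main__":
    mu, sigma = 170.0, 7.0  # hypothetical heights in cm
    print(fisher_information_mu(mu, sigma), 1.0 / sigma ** 2)  # should agree closely
```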
Differences in Shannon information correspond to a special case of the Kullback–Leibler divergence used in Bayesian statistics, a measure of how far the posterior probability distribution departs from the prior (it is not a true distance, since it is not symmetric).
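A minimal sketch of this correspondence for discrete distributions, assuming a uniform prior over $n$ hypotheses, which is the case where the identity is exact: then $D_{\mathrm{KL}}(\text{posterior} \,\|\, \text{prior}) = \log_2 n - H(\text{posterior})$, so the divergence equals the drop in Shannon entropy from prior to posterior. With a non-uniform prior the divergence no longer reduces to a plain difference of entropies. The posterior values and function names are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits (0 * log 0 taken as 0)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def kl_divergence_bits(q, p):
    """D_KL(q || p) = sum_i q_i * log2(q_i / p_i).

    Assumes p_i > 0 wherever q_i > 0; otherwise the divergence is infinite.
    """
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


if __name__ == "__main__":
    prior = [0.25, 0.25, 0.25, 0.25]  # uniform over 4 hypotheses: 2 bits
    posterior = [0.7, 0.1, 0.1, 0.1]  # hypothetical belief after seeing data
    gain = kl_divergence_bits(posterior, prior)
    drop = entropy_bits(prior) - entropy_bits(posterior)
    print(gain, drop)  # equal, because the prior is uniform
    # Swapping the arguments gives a different value: KL is not symmetric.
    print(kl_divergence_bits(prior, posterior))
```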
A. N. Kolmogorov introduced an information measure based on the length of the shortest algorithm (program) that can reproduce a given object; see Kolmogorov complexity.
References
- Claude E. Shannon and Warren Weaver. The Mathematical Theory of Communication. University of Illinois Press, 1963. ISBN 0252725484.