[8] What about patents on data compression algorithms?
[Note: the appropriate group for discussing software patents is
comp.patents (or misc.legal.computing), not comp.compression.]
All patents mentioned here are US patents, and thus probably
not applicable outside the US. See item 70, "Introduction to data
compression" for the meaning of LZ77, LZ78 or LZW.
(a) Run length encoding
- Tsukiyama has two patents on run length encoding: 4,586,027 and 4,872,009
granted in 1986 and 1989 respectively. The first one covers run length
encoding in its most primitive form: a length byte followed by the
repeated byte. The second patent covers the 'invention' of limiting the
run length to 16 bytes and thus the encoding of the length on 4 bits.
Here is the start of claim 1 of patent 4,872,009, just for pleasure:
1. A method of transforming an input data string comprising a plurality
of data bytes, said plurality including portions of a plurality of
consecutive data bytes identical to one another, wherein said data
bytes may be of a plurality of types, each type representing different
information, said method comprising the steps of: [...]
- O'Brien has patented (4,988,998) run length encoding followed by LZ77.
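For readers unfamiliar with the technique, the two run length schemes described above can be sketched as follows. This is only an illustration of generic run length encoding, not a rendering of the patent claims: the function names, the (length, byte) pair representation and the max_run parameter are assumptions made for this example.

```python
def rle_encode(data: bytes, max_run: int = 255):
    """Return (run_length, byte_value) pairs; no run is longer than max_run.

    max_run=255 corresponds to a length byte followed by the repeated byte;
    max_run=16 corresponds to a run length that fits in 4 bits
    (stored as length - 1). Both values are assumptions for illustration.
    """
    runs = []
    i = 0
    while i < len(data):
        j = i
        # Extend the run while the byte repeats and the limit is not reached
        while j < len(data) and data[j] == data[i] and j - i < max_run:
            j += 1
        runs.append((j - i, data[i]))
        i = j
    return runs


def rle_decode(runs) -> bytes:
    """Expand (run_length, byte_value) pairs back into the original bytes."""
    return b"".join(bytes([value]) * length for length, value in runs)
```

With max_run=16, a run of 40 identical bytes is simply split into runs of 16, 16 and 8.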
(b) LZ77
- Waterworth patented (4,701,745) the algorithm now known as LZRW1,
because Ross Williams reinvented it later and posted it on
comp.compression on April 22, 1991. (See item 5 for the ftp site
with all LZRW derivatives.) The *same* algorithm was later
patented by Gibson & Graybill (see below). The patent office failed
to recognize that the same algorithm was patented twice, even though
the wording used in the two patents is very similar.
The Waterworth patent is now owned by Stac Inc, which won a lawsuit
against Microsoft concerning the compression feature of MS-DOS 6.0.
Damages awarded were $120 million.
- Fiala and Greene obtained in 1990 a patent (4,906,991) on all
implementations of LZ77 using a tree data structure. Claim 1 of the
patent is much broader than the algorithms published by Fiala and
Greene in Comm.ACM, April 89. The patent covers the algorithm
published by Rodeh and Pratt in 1981 (J. of the ACM, vol 28, no 1,
pp 16-24). It also covers the algorithm previously patented by
Eastman-Lempel-Ziv (4,464,650), and the algorithms used in lharc,
lha and zoo.
- Notenboom (from Microsoft) 4,955,066 uses three levels of
compression, starting with run length encoding.
- The Gibson & Graybill patent 5,049,881 covers the LZRW1 algorithm
previously patented by Waterworth and reinvented by Ross Williams.
Claims 4 and 12 are very general and could be interpreted as
applying to any LZ algorithm using hashing (including all variants
of LZ78):
4. A compression method for compressing a stream of input data into
a compressed stream of output data based on a minimum number of
characters in each input data string to be compressed, said
compression method comprising the creation of a hash table, hashing
each occurrence of a string of input data and subsequently searching
for identical strings of input data and if such an identical string
of input data is located whose string size is at least equal to the
minimum compression size selected, compressing the second and all
subsequent occurrences of such identical string of data, if a string
of data is located which does not match to a previously compressed
string of data, storing such data as uncompressed data, and for each
input strings after each hash is used to find a possible previous
match location of the string, the location of the string is stored
in the hash table, thereby using the previously processed data to
act as a compression dictionary.
Claim 12 is identical, with 'method' replaced with 'apparatus'. Since
the 'minimum compression size' can be as small as 2, the claim could
cover any dictionary technique of the LZ family. However the text of the
patent and the other claims make clear that the patent should cover the
LZRW1 algorithm only. (In any case the Gibson & Graybill patent is likely
to be invalid because of the prior art in the Waterworth patent.)
- Phil Katz, author of pkzip, also has a patent on LZ77 (5,051,745)
but the claims only apply to sorted hash tables, and when the hash
table is substantially smaller than the window size.
- IBM patented (5,001,478) the idea of combining a history buffer (the
LZ77 technique) and a lexicon (as in LZ78).
- Stac Inc patented (5,016,009 and 5,126,739) yet another variation of LZ77
with hashing. The '009 patent was used in the lawsuit against Microsoft
(see above). Stac also has patents on LZ77 with parallel lookup in
hardware (4,841,092 and 5,003,307).
- Robert Jung, author of 'arj', has been granted patent 5,140,321
for one variation of LZ77 with hashing. This patent covers the LZRW3-A
algorithm, also previously discovered by Ross Williams. LZRW3-A was posted
on comp.compression on July 15, 1991. The patent was filed two months later
on Sept 4, 1991. (The US patent system allows this because of the
'invention date' rule.)
- Chambers 5,155,484 is yet another variation of LZ77 with hashing.
The hash function is just the juxtaposition of two input bytes;
this is the 'invention' being patented. The hash table is named
'direct lookup table'.
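Most of the patents in this group concern the same basic idea: locate an earlier occurrence of the current string by hashing its first few bytes, then emit either a literal or an (offset, length) pair. A minimal sketch of that idea follows. It is not the method of any particular patent above. The constants, the token format, the single candidate per table slot and the use of a Python dict as the hash table are all assumptions chosen to keep the example short.

```python
MIN_MATCH = 3    # assumed minimum match length
MAX_MATCH = 18   # assumed maximum match length
WINDOW = 4096    # assumed window size


def lz77_hash_compress(data: bytes):
    """Return tokens: ('lit', byte) or ('copy', offset, length)."""
    table = {}   # MIN_MATCH-byte string -> most recent position where it started
    tokens = []
    i = 0
    while i < len(data):
        key = data[i:i + MIN_MATCH]
        candidate = None
        if len(key) == MIN_MATCH:
            candidate = table.get(key)   # possible previous match location
            table[key] = i               # remember the current position
        length = 0
        if candidate is not None and i - candidate <= WINDOW:
            # Count how many bytes match at the candidate position
            while (length < MAX_MATCH and i + length < len(data)
                   and data[candidate + length] == data[i + length]):
                length += 1
        if length >= MIN_MATCH:
            tokens.append(('copy', i - candidate, length))
            i += length
        else:
            tokens.append(('lit', data[i]))
            i += 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    """Rebuild the input from literal and copy tokens."""
    out = bytearray()
    for token in tokens:
        if token[0] == 'lit':
            out.append(token[1])
        else:
            _, offset, length = token
            for _ in range(length):
                out.append(out[-offset])   # byte-wise copy allows overlap
    return bytes(out)
```

A real implementation would normally use a fixed-size array indexed by a hash of the bytes (for example the two-byte juxtaposition mentioned for the Chambers patent) rather than a dict keyed by the bytes themselves.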
(c) LZ78
- One form of the original LZ78 algorithm was patented (4,464,650) by
its authors Lempel, Ziv, Cohn and Eastman.
- The LZW algorithm used in 'compress' is patented by IBM (4,814,746)
and Unisys (4,558,302). It is also used in the V.42bis compression
standard (see question 11 on V.42bis below) and in Postscript Level 2.
(Unisys sells the license to modem manufacturers for a one-time
$25,000 fee.) The IBM patent application was filed three weeks
before that of Unisys, but the US patent office failed to recognize
that they covered the same algorithm. (The IBM patent is more
general, but its claim 7 is exactly LZW.)
- AP coding is patented by Storer (4,876,541). (Get the yabba package
for source code, see question 2 above, file type .Y)
(d) arithmetic coding
- IBM holds many patents on arithmetic coding (4,286,256 4,295,125
4,463,342 4,467,317 4,633,490 4,652,856 4,891,643 4,905,297 4,935,882).
It has patented in particular the Q-coder implementation of arithmetic
coding. The arithmetic coding option of the JPEG standard requires
use of the patented algorithm. No JPEG-compatible method is
possible without infringing the patent, because what IBM actually
claims rights to is the underlying probability model (the heart of
an arithmetic coder). (See the JPEG FAQ for details.)
- AT&T has three patents on arithmetic coding (4,973,961, 5,023,611, 5,025,258).
(e) predictor
- The 'predictor' algorithm was first described in the paper
Raita, T. and Teuhola, J. (1987), "Predictive text compression by hashing",
ACM Conference on Information Retrieval.
This algorithm was patented (5,229,768) by K. Thomas in 1993. It
is used in the Internet Draft "PPP Predictor Compression Protocol" (see
venera.isi.edu:/internet-drafts/draft-ietf-pppext-predictor-00.txt).
As can be seen from the above list, some of the most popular compression
programs (compress, pkzip, zoo, lha, arj) are now covered by patents.
(This says nothing about the validity of these patents.)
Here are some references on data compression patents. A number of them are
taken from the list prep.ai.mit.edu:/pub/lpf/patent-list.
3,914,586
Data compression method and apparatus
filed 10/25/73, granted 10/21/75
General Motors Corporation, Detroit MI
Duane E. McIntosh, Santa Ynez CA
Data compression apparatus is disclosed which is operable in either a bit
pair coding mode or a word coding mode depending on the degree of
redundancy of the data to be encoded.
3,976,844
Data communication system for transmitting data in compressed form
filed Apr. 4, 1975, granted Aug. 24, 1976
inventor Bernard K. Betz, assignee Honeywell Information Systems, Inc.
[encode differences with previous line]
4,021,782
Data compaction system and apparatus
inventor Hoerning
filed 04/30/1975, granted 05/03/1977
[A primitive form of LZ77 with implicit offsets (compare with previous record)]
4,054,951
Data expansion apparatus
inventor R.D. Jackson, assignee IBM
filed Jun. 30, 1976, granted Oct. 18, 1977
[Covers only decompression of data compressed with a variant of LZ77.]
4,087,788
Data compression system
filed 1/14/77, granted 5/2/78
NCR Canada LTD - NCR Canada Ltee, Mississauga CA
Brian J. Johannesson, Waterloo CA
A data compression system is disclosed in which the left hand boundary
of a character is developed in the form of a sequence of Freeman
direction codes, the codes being stored in digital form within a
processor.
4,286,256
Method and means for arithmetic coding using a reduced number of operations.
granted Aug 25, 1981
assignee IBM
4,295,125
A method and means for pipeline decoding of the high to low order pairwise
combined digits of a decodable set of relatively shifted finite number of
strings
granted Oct 13, 1981
assignee IBM
4,412,306
System for minimizing space requirements for storage and transmission of
digital signals
filed May 14, 1981, granted Oct. 25, 1983
inventor Edward W. Moll
4,463,342
A method and means for carry-over control in a high order to low order
combining of digits of a decodable set of relatively shifted finite number
strings.
granted Jul 31, 1984
assignee IBM
4,491,934
Data compression process
filed May 12, 1982, granted Jan. 1, 1985
inventor Karl E. Heinz
4,464,650
Apparatus and method for compressing data signals and restoring the
compressed data signals
inventors Lempel, Ziv, Cohn, Eastman
assignees Sperry Corporation and AT&T Bell Laboratories
filed 8/10/81, granted 8/7/84
A compressor parses the input data stream into segments where each
segment comprises a prefix and the next symbol in the data stream
following the prefix.
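The parsing described in this abstract (each segment is a previously seen prefix plus the next symbol) can be sketched as below. This is a generic illustration of LZ78-style parsing, not the patented apparatus: the function names, the unbounded dictionary and the (prefix_index, symbol) output format are assumptions.

```python
def lz78_parse(data: bytes):
    """Return (prefix_index, next_symbol) pairs; index 0 is the empty prefix."""
    dictionary = {b"": 0}   # phrase -> index, grows without bound in this sketch
    pairs = []
    prefix = b""
    for value in data:
        candidate = prefix + bytes([value])
        if candidate in dictionary:
            prefix = candidate            # keep extending the known prefix
        else:
            pairs.append((dictionary[prefix], value))
            dictionary[candidate] = len(dictionary)
            prefix = b""
    if prefix:
        # Input ended inside a known phrase: emit it as shorter prefix + last symbol
        pairs.append((dictionary[prefix[:-1]], prefix[-1]))
    return pairs


def lz78_unparse(pairs) -> bytes:
    """Rebuild the input by replaying the same dictionary construction."""
    phrases = [b""]
    out = bytearray()
    for index, value in pairs:
        phrase = phrases[index] + bytes([value])
        phrases.append(phrase)
        out += phrase
    return bytes(out)
```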
4,467,317
High-speed arithmetic compression using concurrent value updating.
granted Aug 21, 1984
assignee IBM
4,494,108
Adaptive source modeling for data file compression within bounded memory
filed Jun. 5, 1984, granted Jan. 15, 1985
inventors Glen G. Langdon, Jorma J. Rissanen
assignee IBM
order 1 Markov modeling
4,558,302
High speed data compression and decompression apparatus and method
inventor Welch
assignee Sperry Corporation (now Unisys)
filed 6/20/83, granted 12/10/85
The text for this patent can be ftped from rusmv1.rus.uni-stuttgart.de
(129.69.1.12) in /info/comp.patents/US4558302.Z.
4,560,976
Data compression
filed 6/5/84, granted 12/24/85
Codex Corporation, Mansfield MA
Steven G. Finn, Framingham, MA
A stream of source characters, which occur with varying relative
frequencies, is encoded into a compressed stream of codewords, each
having one, two or three subwords, by ranking the source characters by
their current frequency of appearance, encoding the source characters
having ranks no higher than a first number as one subword codewords,
source characters having ranks higher than the first number but no
higher than a second number as two subword codewords, and the
remaining source characters as three subword codewords.
4,586,027
Method and system for data compression and restoration
inventor Tsukiyama et al.
assignee Hitachi
filed 08/07/84, granted 04/29/86
patents run length encoding
4,597,057
System for compressed storage of 8-bit ASCII bytes using coded strings
of 4-bit nibbles.
inventor Snow, assignee System Development Corporation.
filed 12/31/1981, granted 06/24/1986.
Compression using static dictionary of common words, prefixes and suffixes.
4,612,532
Data compression apparatus and method
inventor Bacon, assignee Telebyte Corporation
filed Jun. 19, 1984, granted Sep. 16, 1986
[Uses followsets as in the pkzip 0.92 'reduce' algorithm, but the
followsets are dynamically updated. This is in effect a sort of order-1
Markov modeling.]
4,622,545
Method and apparatus for image compression and Manipulation
inventor William D. Atkinson
assignee Apple Computer Inc.
filed 9/30/82
granted 11/11/86
4,633,490
Symmetrical adaptive data compression/decompression system.
granted Dec 30, 1986
assignee IBM
4,652,856
A multiplication-free multi-alphabet arithmetic code.
filed Feb 4, 1986, granted Mar 24, 1987
assignee IBM
4,667,649
Data receiving apparatus
filed 4/18/84, granted 6/30/87
inventors Kunishi et al.
assignee Canon Kabushiki Kaisha, Tokyo Japan
compression of Fax images.
4,682,150
Data compression method and apparatus
inventors Mathes and Protheroe
assignee NCR Corporation, Dayton OH
A system and apparatus for compressing redundant and nonredundant
binary data generated as part of an operation of a time and attendance
terminal in which the data represents the time an employee is present
during working hours.
4,701,745
Data compression system
inventor Waterworth John R
assignee Ferranti PLC GB, patent rights now acquired by Stac Inc.
filed 03/03/1986 (03/06/1985 in GB), granted 10/20/1987
Algorithm now known as LZRW1 (see above)
I claim:
1. A data compression system comprising an input store for receiving
and storing a plurality of bytes of uncompressed data from an outside
source, and data processing means for processing successive bytes of
data from the input store;
the data processing means including circuit means operable to check
whether a sequence of successive bytes to be processed is identical with
a sequence of bytes already processed, and including hash generating
means responsive to the application of a predetermined number of
bytes in sequence to derive a hash code appropriate to those bytes, a
temporary store in which the hash code may represent the address of a
storage location, and a pointer counter operable to store in the
temporary store at said address a pointer indicative of the position
in the input store of one of the predetermined number of bytes;
output means operable to apply to a transfer medium each byte of data
not forming part of such an identical sequence; and
encoding means responsive to the identification of such a sequence to
apply to the transfer medium an identification signal which identifies
both the location in the input store of the previous occurrence of the
sequence of bytes and the number of bytes contained in the sequence.
4,730,348
Adaptive data compression system
inventor MacCrisken, assignee Adaptive Computer Technologies
filed Sep. 19, 1986, granted Mar. 8, 1988
[order-1 Markov modeling + Huffman coding + LZ77]
4,758,899
Data compression control device
inventor Tsukiyama, assignee Hitachi
filed 11/20/1985, granted 07/19/1988
Limits compression to ensure that tape drive stays busy.
4,809,350
Data compression system
filed Jan. 30, 1987, granted Feb. 28, 1989
inventors Yair Shimoni and Ron Niv
assignee Elscint Ltd., Haifa, Israel
[Image compression via variable length encoding of differences with
predicted data.]
4,814,746
Data compression method
inventors Victor S. Miller, Mark N. Wegman
assignee IBM
filed 8/11/86, granted 3/21/89
A previous application was filed on 6/1/83, three weeks before the
application by Welch (4,558,302)
Communications between a Host Computing System and a number of remote
terminals is enhanced by a data compression method which modifies the
data compression method of Lempel and Ziv by addition of new character
and new string extensions to improve the compression ratio, and
deletion of a least recently used routine to limit the encoding tables
to a fixed size to significantly improve data transmission efficiency.
4,841,092
continued in 5,003,307
4,853,696
Code converter for data compression/decompression
filed 4/13/87, granted 8/1/89
inventor Amar Mukherjee, Maitland FL
assignee University of Central Florida, Orlando FL
Another hardware Huffman encoder:
A code converter has a network of logic circuits connected in reverse
binary tree fashion with logic paths between leaf nodes and a common
root node.
4,872,009
Method and apparatus for data compression and restoration
inventor Tsukiyama et al.
assignee Hitachi
filed 12/07/87, granted 10/03/89
This patent on run length encoding covers the 'invention' of limiting
the run length to 16 bytes and thus the encoding of the length on 4 bits.
4,876,541
Stem [sic] for dynamically compressing and decompressing electronic data
filed 10/15/87, granted 10/24/89
inventor James A. Storer
assignee Data Compression Corporation
A data compression system for encoding and decoding textual data,
including an encoder for encoding the data and for a decoder for
decoding the encoded data.
4,891,643
Arithmetic coding data compression/de-compression by selectively
employed, diverse arithmetic encoders and decoders.
granted Jan 2, 1990
assignee IBM
4,905,297
Arithmetic coding encoder and decoder system.
granted Feb 27, 1990
assignee IBM
4,906,991
Textual substitution data compression with finite length search window
filed 4/29/1988, granted 3/6/1990
inventors Fiala,E.R., and Greene,D.H.
assignee Xerox Corporation
4,935,882
Probability adaptation for arithmetic coders.
granted Jun 19, 1990
assignee IBM
4,941,193
Barnsley, fractal compression.
4,943,869
Compression Method for Dot Image Data
filed 1988-05-04, granted 1990-07-24
assignee Fuji Photo Film Co.
Lossy and lossless image compression schemes.
4,955,066
Compressing and Decompressing Text Files
filed 10/13/89, granted 09/04/90
inventor Notenboom, L.A.
assignee Microsoft
Now extended as 5,109,433
[Noted in signon screen of Word 5.5 and on the outside of the MS-DOS 5.0
Upgrade.]
A method of compressing a text file in digital form is disclosed.
A full text file having characters formed into phrases is provided by an
author. The characters are digitally represented by bytes. A first pass
compression is sequentially followed by a second pass compression of the
text which has previously been compressed. A third or fourth level of
compression is serially performed on the compressed text. For example, in
a first pass, the text is run-length compressed. In a second pass, the
compressed text is further compressed with key phrase compression. In a
third pass, the compressed text is further compressed with Huffman
compression. The compressed text is stored in a text file having a Huffman
decode tree, a key phrase table, and a topic index. The data is
decompressed in a single pass and provided one line at a time as an output.
Sequential compressing of the text minimizes the storage space required for
the file. Decompressing of the text is performed in a single pass. As a
complete line is decompressed, it is output rapidly, providing full text to
the user.
4,973,961
Method and apparatus for carry-over control in arithmetic coding.
granted Nov 27, 1990
assignee AT&T
4,988,998
Data compression system for successively applying at least two data
compression methods to an input data stream.
inventor O'Brien
assignee Storage Technology Corporation, Louisville, Colorado
filed Sep 5, 1989, granted Jan 29, 1991.
Run length encoding followed by LZ77.
5,001,478
Method of Encoding Compressed Data
filed 12/28/89, granted 03/19/91
inventor Michael E. Nagy
assignee IBM
1. A method of encoding a compressed data stream made up of a sequence of
literal references, lexicon references and history references, which
comprises the steps of:
assigning to each literal reference a literal identifier;
assigning to each history reference a history identifier;
assigning to each lexicon reference a lexicon identifier;
and emitting a data stream with said identifiers assigned to said references.
Gordon Irlam <gordoni@cs.adelaide.edu.au> says:
The invention can probably be best understood by considering the
decompressor. It consists of a history buffer, and a lexicon buffer, both
of which are initially empty. The history buffer contains the last n
symbols emitted. Whenever a history buffer reference is to be output the
string so referenced is subsequently moved to the lexicon buffer for future
reference. Thus the history buffer keeps track of strings that may be
repeated on a very short term basis, while the lexicon buffer stores items
for a longer time. Furthermore a history reference involves specifying
both the offset and length within the history buffer, whereas a lexicon
reference simply specifies a number denoting the string. Both buffers have
a finite size.
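The decompressor described above can be sketched roughly as follows. This is a loose illustration of the description, not of the patent claims: the token format, the buffer sizes, the first-in first-out eviction from the lexicon and the way lexicon entries are numbered are all assumptions, and the sketch expects valid, non-overlapping history references.

```python
from collections import deque


def decode_references(tokens, history_size=4096, lexicon_size=256) -> bytes:
    """Decode ('lit', byte), ('hist', offset, length) and ('lex', index) tokens."""
    out = bytearray()
    lexicon = deque(maxlen=lexicon_size)   # oldest entries fall off when full
    for token in tokens:
        kind = token[0]
        if kind == 'lit':
            out.append(token[1])
        elif kind == 'hist':
            _, offset, length = token
            history = out[-history_size:]  # the last n symbols emitted
            start = len(history) - offset  # offset counts back from the end
            phrase = bytes(history[start:start + length])
            out += phrase
            lexicon.append(phrase)         # move the string to the lexicon
        else:
            # 'lex': a single number denotes the whole stored string
            out += lexicon[token[1]]
    return bytes(out)
```

For example, the tokens lit 'a', lit 'b', lit 'c', hist (offset 3, length 3), lit 'x', lex 0 decode to "abcabcxabc": the history reference copies "abc" and places it in the lexicon, where the later lexicon reference finds it again.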
5,003,307
Data compression apparatus with shift register search means
filed Oct. 6, 1989, granted Mar. 26, 1991
inventors George Glen A, Ivey Glen E, Whiting Douglas L
assignee Stac Inc
continuation of 4,841,092
5,016,009
Data compression apparatus and method
filed 01/13/1989, granted 05/14/1991
inventors George Glen A, Ivey Glen E, Whiting Douglas L
assignee Stac Inc
LZ77 with offset hash table (extended in 5,126,739)
5,023,611
Entropy encoder/decoder including a context extractor.
granted Jun 11, 1991
assignee AT&T
5,025,258
Adaptive probability estimator for entropy encoder/decoder.
granted Jun 18, 1991
assignee AT&T
5,049,881
Apparatus and method for very high data rate-compression incorporating
lossless data compression and expansion utilizing a hashing technique
inventors Dean K. Gibson, Mark D. Graybill
assignee Intersecting Concepts, Inc.
filed 6/18/90, granted 9/17/91
[covers lzrw1, almost identical with Waterworth 4,701,745]
5,051,745
String searcher, and compressor using same
filed 8/21/90, granted 9/24/91
inventor Phillip W. Katz (author of pkzip)
In the string search method and apparatus pointers to the string to be
searched are indexed via a hashing function and organized according to the
hashing values of the string elements pointed to. The hashing function is
also run on the string desired to be found, and the resulting hashing value
is used to access the index. If the resulting hashing value is not in the
index, it is known that the target string does not appear in the string
being searched. Otherwise the index is used to determine the pointers which
correspond to the target hashing value, these pointers pointing to likely
candidates for matching the target string. The pointers are then used to
sequentially compare each of the locations in the string being searched to
the target string, to determine whether each location contains a match to
the target string.
In the method and apparatus for compressing a stream of data symbols, a
fixed length search window, comprising a predetermined contiguous portion
of the symbol stream, is selected as the string to be searched by the
string searcher. If a string to be compressed is found in the symbol
stream, a code is output designating the location within the search window
of the matching string and the length of the matching string.
5,065,447
Barnsley, fractal compression
5,109,433
Compressing and decompressing text files
inventor Notenboom
assignee Microsoft
extension of 4,955,066
5,126,739
Data Compression Apparatus and Method
filed Nov. 27, 1990, granted June 30, 1992.
inventor Whiting et al.
assignee Stac Inc
LZ77 with offset hash table (extension of 5,016,009)
5,140,321
Data compression/decompression method and apparatus
filed 9/4/91, granted 8/18/92
inventor Robert Jung
assignee Prime Computer
5,155,484
Fast data compressor with direct lookup table indexing into history buffer
filed 9/13/1991, granted 10/13/1992
inventor Chambers, IV, Lloyd L., Menlo Park, California
assignee Salient Software, Inc., Palo Alto, California
Uses a 64K hash table indexed by the first two characters of
the input string. Includes several claims on the LZ77 file format
(literal or pair offset,length).
5,179,378
Method and apparatus for the compression and decompression of data
using Lempel-Ziv based techniques.
filed Jul. 30, 1991, granted Jan. 12, 1993
inventor Ranganathan
assignee University of South Florida
[This covers LZ77 hardware compression with a systolic array of
processors working in parallel.]
5,229,768
Adaptive data compression system
granted Jul. 20, 1993
inventor Kasman E. Thomas
assignee Traveling Software, Inc.
A system for data compression and decompression is disclosed. A series of
fixed length overlapping segments, called hash strings, are formed from an
input data sequence. A retrieved character is the next character in the input
data sequence after a particular hash string. A hash function relates a
particular hash string to a unique address in a look-up table (LUT). An
associated character for the particular hash string is stored in the LUT at
the address. When a particular hash string is considered, the content of the
LUT address associated with the hash string is checked to determine whether
the associated character matches the retrieved character following the hash
string. If there is a match, a Boolean TRUE is output; if there is no match,
a Boolean FALSE along with the retrieved character is output. Furthermore, if
there is no match, then the LUT is updated by replacing the associated
character in the LUT with the retrieved character. [...]
[This algorithm is used in the Internet Draft "PPP Predictor
Compression Protocol".]
Japan 2-46275
Coding system
granted Feb 26, 1990
[Patents one form of arithmetic coding.]