[9] The WEB 16:1 compressor.

[WARNING: this topic has generated the greatest volume of news in the
history of comp.compression. Read this before posting on this subject.]

[WARNING 2: it is quite possible that the story is repeating itself
with a compressor called MINC by Premier Research Corporation, Ltd.
They claim a breakthrough in lossless data compression using a recursive
method, that is, applying the compressor to the compressed output of
the previous run.]

[WARNING 3: the OWS program, which also claims incredible compression
ratios, is a hoax. It just remembers the clusters which contained
the data. The data can be recovered only if those clusters are not
used again for another file. Needless to say, never trust such a
lossy program.]

(a) What the press says

April 20, 1992  Byte Week Vol 4. No. 25:

   "In an announcement that has generated high interest - and more than a
   bit of skepticism - WEB Technologies (Smyrna, GA) says it has
   developed a utility that will compress files of greater than 64KB in
   size to about 1/16th their original length.  Furthermore, WEB says its
   DataFiles/16 program can shrink files it has already compressed."
   [...]
   "A week after our preliminary test, WEB showed us the program successfully
   compressing a file without losing any data.  But we have not been able
   to test this latest beta release ourselves."
   [...]
   "WEB, in fact, says that virtually any amount of data can be squeezed 
   to under 1024 bytes by using DataFiles/16 to compress its own output
   multiple times."

June 1992 Byte, Vol 17 No 6:

   [...] According to Earl Bradley, WEB Technologies' vice president of
   sales and marketing, the compression algorithm used by DataFiles/16
   is not subject to the laws of information theory. [...]


(b) First details, by John Wallace <buckeye@spf.trw.com>:

I called WEB at (404)514-8000 and they sent me some product
literature as well as chatting for a few minutes with me on the phone.
Their product is called DataFiles/16, and their claims for it are
roughly those heard on the net.

According to their flier:

"DataFiles/16 will compress all types of binary files to approximately
one-sixteenth of their original size ... regardless of the type of
file (word processing document, spreadsheet file, image file,
executable file, etc.), NO DATA WILL BE LOST by DataFiles/16."
(Their capitalizations; 16:1 compression only promised for files >64K
bytes in length.)

"Performed on a 386/25 machine, the program can complete a
compression/decompression cycle on one megabyte of data in less than
thirty seconds"

"The compressed output file created by DataFiles/16 can be used as the 
input file to subsequent executions of the program.  This feature of 
the utility is known as recursive or iterative compression, and will 
enable you to compress your data files to a tiny fraction of the 
original size.  In fact, virtually any amount of computer data can 
be compressed to under 1024 bytes using DataFiles/16 to compress its 
own output files muliple times.  Then, by repeating in reverse the 
steps taken to perform the recusive compression, all original data 
can be decompressed to its original form without the loss of a single 
bit."

Their flier also claims: 

"Constant levels of compression across ALL TYPES of FILES"
"Convenient, single floppy DATA TRANSPORTATION"

From my telephone conversation, I was was assured that this is an
actual compression program.  Decompression is done by using only the 
data in the compressed file; there are no hidden or extra files.


(c) More information, by Rafael Ramirez <rafael.ramirez@channel1.com>:

   Today (Tuesday, 28th) I got a call from Earl Bradley of Web
who now says that they have put off releasing a software version of
the algorithm because they are close to signing a major contract with
a big company to put the algorithm in silicon.  He said he could not
name the company due to non-disclosure agreements, but that they had
run extensive independent tests of their own and verified that the
algorithm works. [...]

He said the algorithm is so simple that he doesn't want anybody
getting their hands on it and copying it even though he said they
have filed a patent on it. [...] Mr. Bradley said the silicon version
would hold up much better to patent enforcement and be harder to copy.

   He claimed that the algorithm takes up about 4K of code, uses only
integer math, and the current software implementation only uses a 65K
buffer.  He said the silicon version would likely use a parallel
version and work in real-time. [...]


(d) The impossiblity proofs.

It is impossible for a given program to compress without loss *all*
files greater than a certain size by at least one bit. This can be
proven by a simple counting argument. (Many other proofs have been
posted on comp.compression, *please* do not post yet another one.)

Assume that the program can compress without loss all files of size >= N
bits.  Compress with this program all the 2^N files which have
exactly N bits.  All compressed files have at most N-1 bits, so there
are at most (2^N)-1 different compressed files [2^(N-1) files of size
N-1, 2^(N-2) of size N-2, and so on, down to 1 file of size 0]. So at
least two different input files must compress to the same output file.
Hence the compression program cannot be lossless. (Stronger results
about the number of incompressible files can be obtained, but the
proofs are a little more complex.)

This argument applies of course to WEB's case (take N = 64K*8 bits).
Note that no assumption is made about the compression algorithm.
The proof applies to *any* algorithm, including those using an
external dictionary, or repeated application of another algorithm,
or combination of different algorithms, or representation of the
data as formulas, etc... All schemes are subject to the counting argument.
There is no need to use information theory to provide a proof, just
basic mathematics.

This assumes of course that the information available to the decompressor
is only the bit sequence of the compressed data. If external information
such as a file name or a number of iterations is necessary to decompress
the data, the bits providing the extra information must be included in
the bit count of the compressed data. (Otherwise, it would be sufficient
to consider any input data as a number, use this as the iteration
count or file name, and pretend that the compressed size is zero.)
For an example of storing information in the file name, see the program
lmfjyh in the 1993 International Obfuscated C Code Contest, available
on all comp.sources.misc archives (Volume 39, Issue 104).

[See also question 73 "What is the theoretical compression limit?" in
part 2 of this FAQ.]


(e) No software version

Appeared on BIX, reposted by Bruce Hoult <Bruce.Hoult@actrix.gen.nz>:

tojerry/chaos #673, from abailey, 562 chars, Tue Jun 16 20:40:34 1992
Comment(s). 
----------
TITLE: WEB Technology
I promised everyone a report when I finally got the poop on WEB's
16:1 data compression. After talking back and forth for a year
and being put off for the past month by un-returned phone calls,
I finally got hold of Marc Spindler who is their sales manager.
 
_No_ software product is forth coming, period!
 
He began talking about hardware they are designing for delivery
at the end of the year. [...]


(f) Product cancelled

Posted by John Toebes <toebes@bnr.ca> on Aug 10th, 1992:

[Long story omitted, confirming the reports made above about the
original WEB claims.]

10JUL92 - Called to Check Status.  Was told that testing had uncovered a
          new problem where 'four numbers in a matrix were the same
          value' and that the programmers were off attempting to code a
          preprocessor to eliminate this rare case.  I indicated that he
          had told me this story before.  He told me that the
          programmers were still working on the problem.

31JUL92 - Final Call to Check Status.  Called Earl in the morning and
          was told that he still had not heard from the programmers. [...]
          Stated that if they could not resolve the problem then there would
          probably not be a product.

03AUG92 - Final Call.  Earl claims that the programmers are unable to
          resolve the problem.  I asked if this meant that there would
          not be a product as a result and he said yes.


(g) Conclusion

The last report given above should put an end to the WEB story.

[Note from the FAQ maintainer: I intended to remove this story from
the FAQ, but the recent announcement of the MINC compressor has some
similarities with the WEB story so it is useful to keep it a little
longer.]
Go Back Up

Go To Previous

Go To Next