programmatically decompress pdf LZWDecode data?

ben

2003-09-15 21:19:31 UTC

how can i go about programmatically uncompressing some pdf stream data
that's in the LZWDecode compression?

i have the lzw stream of data in question in a variable separated from
the rest of the pdf data. the data in this variable starts in exactly
the right place and continues for the correct length.

now what?

i'm writing my code in c (and objective-c) and am using mac os x which is
unix based.

has anyone got any pointers for decompressing lzw data? should i try
and use a utility that's already on the system? (that's what i was
advised to do - use the unix utility 'uncompress' - but it now turns out
that this doesn't deal with lzw, i think) or should i use some
already written code? if so, what?

any suggestions / pointers would be great.

thanks, ben.

Alex Cherepanov

2003-09-15 21:38:21 UTC

Post by ben
how can i go about programmatically uncompressing some pdf stream data
that's in the LZWDecode compression?

One can easily write such a program in PostScript and
run it on Ghostscript or Distiller.

Post by ben
i have the lzw stream of data in question in a variable separated from
the rest of the pdf data. the data in this variable starts in exactly
the right place and continues for the correct length.
now what?
i'm writing my code in c (and objective-c) and am using mac os x which is
unix based.

I can send you my little program (coded in PostScript) that extracts
and decodes the stream content for a given object.

ben

2003-09-15 23:38:09 UTC

Post by Alex Cherepanov

Post by ben
how can i go about programmatically uncompressing some pdf stream data
that's in the LZWDecode compression?

One can easily write such a program in PostScript and
run it on Ghostscript or Distiller.
I can send you my little program (coded in PostScript) that extracts
and decodes the stream content for a given object.

i don't think that's quite what i'm looking for, but thanks very much
anyway. i've already done a lot of the code that reads the pdf - the
extraction of the bits of pdfs: cross reference table making, page tree
structure making, name tree making. decompression of streams in the
flate compression is ok. but if the stream in question happens to be
LZWDecode i am unable to deal with that stream right now - it's just
that little step that i'm stuck on. so i'm not looking for pdf stream
object extraction as i've already done all that - i'm just looking for
a way to take a stream that i already know the exact details of and
decompress it.

i think the best thing would be a decompressLZW() c function or similar.

how did you give your code the ability to decompress lzw?

thanks, ben

Alex Cherepanov

2003-09-16 13:29:21 UTC

Post by ben
how did you give your code the ability to decompress lzw?

PostScript has built-in decoding filters for LZW and all
other PDF streams.

You can use the Ghostscript sources as a sample implementation.
Don't forget what the GPL says about derived works.
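
For what it's worth, a decoder along the lines of the decompressLZW() function asked for above is not very long. The following is a minimal sketch in C, assuming the rules given for LZWDecode in the PDF Reference: codes are packed most significant bit first, start at 9 bits and grow to at most 12, code 256 clears the table and code 257 marks end of data. The function name lzw_decode_pdf and its parameters are hypothetical (they are not taken from Ghostscript or any library), and early_change is meant to carry the stream's /EarlyChange value (1 by default). Treat it as a starting point, not a finished implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define LZW_CLEAR 256   /* clear-table code */
#define LZW_EOD   257   /* end-of-data code */
#define LZW_MAX   4096  /* codes are at most 12 bits wide */

/* Hypothetical helper: decode a PDF LZWDecode stream.
 * Returns the number of bytes written to out, or -1 if the input is
 * malformed or out_cap is too small. early_change is the /EarlyChange
 * value (1 by default in PDF). Predictors are not handled here. */
long lzw_decode_pdf(const uint8_t *in, size_t in_len,
                    uint8_t *out, size_t out_cap, int early_change)
{
    uint16_t prefix[LZW_MAX];  /* code of the string minus its last byte */
    uint8_t  suffix[LZW_MAX];  /* last byte of the string */
    uint8_t  stack[LZW_MAX];   /* strings are unwound back to front */
    uint32_t bitbuf = 0;
    int nbits = 0, width = 9, next = 258, prev = -1;
    size_t ip = 0, op = 0;

    for (;;) {
        /* Refill: codes are packed most significant bit first. */
        while (nbits < width) {
            if (ip >= in_len)
                return (long)op;   /* tolerate a missing EOD code */
            bitbuf = (bitbuf << 8) | in[ip++];
            nbits += 8;
        }
        int code = (int)((bitbuf >> (nbits - width)) & ((1u << width) - 1u));
        nbits -= width;

        if (code == LZW_EOD)
            break;
        if (code == LZW_CLEAR) {
            width = 9; next = 258; prev = -1;
            continue;
        }

        int cur = code, sp = 0;
        if (code >= next) {
            /* Code not in the table yet: only legal as "previous string
             * plus its own first byte". */
            if (code != next || prev < 0)
                return -1;
            cur = prev;
            sp = 1;                /* stack[0] is filled in below */
        }
        while (cur >= 258) {       /* walk the prefix chain */
            stack[sp++] = suffix[cur];
            cur = prefix[cur];
        }
        uint8_t first = (uint8_t)cur;
        stack[sp++] = first;
        if (code >= next)
            stack[0] = first;

        if (op + (size_t)sp > out_cap)
            return -1;
        while (sp > 0)
            out[op++] = stack[--sp];

        /* New table entry: previous string plus first byte of this one. */
        if (prev >= 0 && next < LZW_MAX) {
            prefix[next] = (uint16_t)prev;
            suffix[next] = first;
            next++;
            if (width < 12 && next + early_change >= (1 << width))
                width++;
        }
        prev = code;
    }
    return (long)op;
}
```

Called on the nine bytes 80 0B 60 50 22 0C 0C 85 01 (the example in the PDF Reference) with early_change set to 1, this should return 10 and produce 45 45 45 45 45 65 45 45 45 66. A /Predictor in the stream's decode parameters would need a separate pass over the output.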

Micha Bieber

2003-09-15 22:58:11 UTC

Post by ben
i'm writing my code in c (and objective-c) and am using mac os x which is
unix based.

zlib is your friend.

http://www.gzip.org/zlib/

Post by ben
thanks, ben.

Micha

Micha Bieber

2003-09-15 23:16:02 UTC

Post by Micha Bieber
zlib is your friend.

Sorry, I was too quick here. zlib is good for the Deflate filter only - not for
LZW-encoded data. Anyway, maybe you can arrange for your PDF sources to use
Deflate instead - its compression is better in most cases.
You mentioned system utilities. Perhaps gunzip instead of uncompress can decode your data.

Hope it helps
Micha

ben

2003-09-16 00:25:25 UTC

Post by Micha Bieber

Post by Micha Bieber
zlib is your friend.

Sorry, I was too quick here. zlib is good for the Deflate filter only - not for LZW
encoded data.

:) yeah, i've already used zlib to inflate - wish lzw required a
similar amount of effort as that (not too much effort that is).

Post by Micha Bieber
Anyway, maybe you can arrange the things in your PDF sources this way - the
compression of Deflate is in most cases better.

i'm looking to read pdfs in general and i want to be able to read as
wide a spectrum of them as possible, and obviously which format pdfs
are saved in all round the world is out of my hands. lzw just happens
to be one of the compressions used in pdfs.

Post by Micha Bieber
You mentioned system utilities. Perhaps gunzip instead of uncompress can decode your data.

good suggestion - i had a look at the manual page for that earlier on:
it does lz77. i think it might do lzh also, but in any case, not lzw. at
least it doesn't say it covers that. someone else suggested that maybe my
conclusion that 'uncompress' does not handle lzw might be incorrect -
there probably isn't one rigid lzw format. there'll be little
differences/nuances between the different lzw formats.

the possible nuances within one single compression format, in this case
lzw obviously, worry me (something i've been ignoring/hoping isn't
the case) - time to go and read the pdf specs more and again. ahh. :)
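
One such nuance is the order in which code bits are packed into bytes: PDF's LZWDecode packs codes most significant bit first, while 'compress' (and GIF) pack them least significant bit first. The sketch below, with function names made up for illustration, reads a code of a given width from a byte buffer both ways, so the same bytes can be seen to yield different codes. It assumes the caller guarantees the buffer is long enough.

```c
#include <stddef.h>
#include <stdint.h>

/* Read `width` bits (1..16) starting at bit offset `pos`, most significant
 * bit first, which is the packing PDF's LZWDecode uses. */
unsigned read_bits_msb(const uint8_t *buf, size_t pos, int width)
{
    unsigned v = 0;
    for (int i = 0; i < width; i++, pos++)
        v = (v << 1) | ((buf[pos >> 3] >> (7 - (pos & 7))) & 1u);
    return v;
}

/* Same, but least significant bit first, which is the packing
 * compress(1) and GIF use. */
unsigned read_bits_lsb(const uint8_t *buf, size_t pos, int width)
{
    unsigned v = 0;
    for (int i = 0; i < width; i++, pos++)
        v |= ((buf[pos >> 3] >> (pos & 7)) & 1u) << i;
    return v;
}
```

For the two bytes 80 0B, read_bits_msb(buf, 0, 9) gives 256 (the clear-table code a PDF LZW stream normally starts with), whereas read_bits_lsb(buf, 0, 9) gives 384.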

thanks for your suggestions, ben.

Micha Bieber

2003-09-16 01:01:43 UTC

Post by ben
lz77. i think it might do lzh also, but in any case, not lzw. at least
it doesn't say it covers that. someone else suggested that maybe my
conclusion that 'uncompress' does not handle lzw might be incorrect -
there probably isn't one rigid lzw format. there'll be little
differences/nuances between the different lzw formats.
the possible nuances within one single compression format, in this case
lzw obviously, worries me (something i've been ignoring/hoping isn't
the case) - time to go and read the pdf specs more and again. ahh. :)

I hope the statement here clarifies things a little:

http://www.faqs.org/faqs/compression-faq/part1/

Search for "It is likely that your Unix system has".

Note also the statement about the .gz format a few lines above it.
The more in-depth information (FAQ, source) on the main site at
http://www.gzip.org may be useful too.

Micha

ben

2003-09-16 11:43:14 UTC

Post by Micha Bieber
http://www.faqs.org/faqs/compression-faq/part1/
Search for "It is likely that your Unix system has".

oh yeah, 'compress', or 'uncompress' (same utility), does use lzw. so
it must be a case of different nuances within the same compression
format. i must need to modify the data somehow before passing it to
'uncompress' in order to give it data in exactly the format it
requires. 'lzw' is obviously too general/vague a description
on its own to decompress something with.
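
Part of the mismatch is visible before any decoding happens. As far as I know, files written by 'compress' start with the two magic bytes 0x1F 0x9D and a flags byte (maximum code width in the low five bits, block mode in the top bit), their codes can grow to 16 bits, and they are packed least significant bit first. A PDF LZWDecode stream has no such header, is limited to 12-bit codes and packs them most significant bit first, so prepending a header is unlikely to be enough on its own. A minimal sketch of the header check, with a made-up struct and function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the fields of a compress(1) file header. */
struct compress_header {
    int max_bits;    /* maximum code width, from the low 5 bits of byte 2 */
    int block_mode;  /* 1 if the 0x80 flag is set (clear code in use) */
};

/* Returns 1 and fills *h if buf begins with a compress(1) header,
 * 0 otherwise (for example for a raw PDF LZWDecode stream). */
int parse_compress_header(const uint8_t *buf, size_t len,
                          struct compress_header *h)
{
    if (len < 3 || buf[0] != 0x1F || buf[1] != 0x9D)
        return 0;
    h->max_bits = buf[2] & 0x1F;
    h->block_mode = (buf[2] & 0x80) != 0;
    return 1;
}
```

A raw PDF LZW stream typically begins with the byte 0x80 (the start of a 9-bit clear code), so this check fails on it.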

Post by Micha Bieber
Notice also the statement about the .gz format some rows above there.
The more in-depth info (FAQ,source) at the top order domain
http://www.gzip.org may be useful too.

yup, thanks v. much for the info

ben.