Text Compression - PowerPoint PPT Presentation

About This Presentation
Title:

Text Compression


Slides: 12
Provided by: venkat3

Transcript and Presenter's Notes

Title: Text Compression


1
Text Compression
  • Programming Assignment 4

ECE573 Data Structures and Algorithms
Electrical and Computer Engineering Dept., Rutgers University
http://www.cs.rutgers.edu/vchinni/dsa/
2
Text Compression
  • ZIP??
  • We can often reduce the disk storage needed for
    a text file by storing a coded version of it
  • Example?
  • Text string: 1000 x's followed by 2000 y's
  • xxxx...xyyyy...y: 3000 bytes
  • Coded version
  • Run-length coding, 1000x2000y: 10 bytes
  • Run length in binary, 1000 x 2000 y: 6 bytes
    (2 + 1 + 2 + 1)
  • (run length stored as an integer in 2 bytes; max
    run length 2^16 - 1 = 65,535)
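A minimal C sketch of the binary run-length scheme above, assuming each run is stored as a 2-byte big-endian count followed by the character. The byte order and the names `rle_encode`/`rle_decode` are illustrative choices, not part of the assignment. Runs longer than 65,535 are split, since that is the largest count 2 bytes can hold.

```c
#include <assert.h>
#include <stddef.h>

/* Encode `in` as (2-byte big-endian count, char) triples.
   `out` must hold 3 * n bytes in the worst case. Returns bytes written. */
size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, k = 0;
    while (i < n) {
        unsigned char c = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == c && run < 65535)
            run++;                              /* cap fits in 2 bytes */
        out[k++] = (unsigned char)(run >> 8);   /* high byte of count */
        out[k++] = (unsigned char)(run & 0xFF); /* low byte of count */
        out[k++] = c;                           /* the repeated character */
        i += run;
    }
    return k;
}

/* Inverse of rle_encode. Returns bytes written to `out`. */
size_t rle_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t k = 0;
    assert(n % 3 == 0);                         /* whole triples only */
    for (size_t i = 0; i < n; i += 3) {
        size_t run = ((size_t)in[i] << 8) | in[i + 1];
        for (size_t j = 0; j < run; j++)
            out[k++] = in[i + 2];
    }
    return k;
}
```

On the slide's example (1000 x's then 2000 y's), `rle_encode` produces the 6 bytes `03 E8 'x' 07 D0 'y'`.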

3
Text Compression
ASCII text file → Compressor → Binary file → Decompressor → ASCII text file
  • Issues
  • What algorithm? LZW method (Lempel, Ziv, and
    Welch)
  • What other parameters to consider?
  • Techniques/Tools
  • C, Hashing, Testing

4
Input/Output
  • Compression
  • Input: ASCII text file
  • Output: binary file
  • Name of input file: accept it from the prompt
  • Code table reinitialization interval (in
    kbytes): read it from the prompt
  • Name of output file: inputfile.zzz
  • Name of the executable: pa4c
  • Note: no additional information is to be written
    to the output file
  • Decompression
  • Input: binary file (compressed as per the LZW
    algorithm)
  • Output: ASCII text file
  • Name of input file: accept it from the prompt
    (it will have a .zzz extension)
  • Code table reinitialization interval (in
    kbytes): read it from the prompt
  • Name of output file: inputfile (without the .zzz
    extension)
  • Name of the executable: pa4d
  • Errors: write them to inputfile.err, also print
    them on the screen, and exit
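The naming rules can be sketched as two small helpers. The names `compressed_name` and `decompressed_name` are hypothetical, and treating an input name without the `.zzz` extension as a decompression error is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char EXT[] = ".zzz";

/* Compressor: output name = input name + ".zzz". 0 on success, -1 if too long. */
int compressed_name(const char *in, char *out, size_t cap)
{
    int len = snprintf(out, cap, "%s%s", in, EXT);
    return (len < 0 || (size_t)len >= cap) ? -1 : 0;
}

/* Decompressor: output name = input name without ".zzz".
   Returns -1 if the extension is missing or `out` is too small. */
int decompressed_name(const char *in, char *out, size_t cap)
{
    size_t n = strlen(in), e = strlen(EXT);
    assert(cap > 0);
    if (n <= e || strcmp(in + n - e, EXT) != 0)
        return -1;                      /* not a .zzz file: report an error */
    if (n - e + 1 > cap)
        return -1;                      /* buffer too small */
    memcpy(out, in, n - e);
    out[n - e] = '\0';
    return 0;
}
```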
5
LZW Compression
  • Input string: aaabbbbbbaabaaba
  • Mapping: strings of text → numeric codes
  • Assign a code to every character that may occur
    in the file
  • a → 0, b → 1
  • Find the longest prefix p of the unencoded part
    of the input file that is in the dictionary and
    output its code
  • If the next character is c, then the string pc
    is assigned the next code
  • The mapping is stored in a dictionary (key and
    code)

Output: 0 2 1 4 5 3 7
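The compression steps above can be traced with a short C sketch. To reproduce the slide's codes, it takes the initial alphabet as a parameter (here `"ab"`, so a → 0 and b → 1) and stores each dictionary string as a (prefix code, last character) pair. It uses a linear search instead of the chained hash table the assignment requires, and it simply stops adding codes once 4096 are in use; both are simplifications for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CODES 4096   /* 12-bit codes */

/* A dictionary string is stored as (code of its prefix, last char);
   single-character entries use prefix -1. */
struct entry { int prefix; unsigned char c; };

static struct entry dict[MAX_CODES];
static int dict_size;

/* Linear search for clarity only; the assignment requires a chained hash table. */
static int find(int prefix, unsigned char c)
{
    for (int i = 0; i < dict_size; i++)
        if (dict[i].prefix == prefix && dict[i].c == c)
            return i;
    return -1;
}

/* LZW-compress n bytes of `in`. The i-th character of `alphabet` gets code i.
   Writes codes to `out` (room for n codes suffices) and returns how many. */
size_t lzw_compress(const char *alphabet, const unsigned char *in, size_t n, int *out)
{
    size_t k = 0;
    dict_size = 0;
    for (const char *a = alphabet; *a && dict_size < MAX_CODES; a++) {
        dict[dict_size].prefix = -1;
        dict[dict_size].c = (unsigned char)*a;
        dict_size++;
    }
    if (n == 0)
        return 0;
    int p = find(-1, in[0]);            /* code of the current prefix p */
    assert(p >= 0);                     /* input chars must be in the alphabet */
    for (size_t i = 1; i < n; i++) {
        int pc = find(p, in[i]);
        if (pc >= 0) {                  /* pc is in the dictionary: keep extending */
            p = pc;
            continue;
        }
        out[k++] = p;                   /* emit the code of the longest prefix */
        if (dict_size < MAX_CODES) {    /* assign pc the next code */
            dict[dict_size].prefix = p;
            dict[dict_size].c = in[i];
            dict_size++;
        }
        p = find(-1, in[i]);            /* restart from the single char c */
        assert(p >= 0);
    }
    out[k++] = p;                       /* flush the final prefix */
    return k;
}
```

Calling `lzw_compress("ab", ...)` on aaabbbbbbaabaaba yields the codes 0 2 1 4 5 3 7 shown on the slide.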
6
Implementation
  • Dictionary organization: chained hash table
    (code, key)
  • A key of length l > 1 has the property that its
    first l - 1 characters are the key of another
    entry in the dictionary
  • Replace the key by a prefix code and a character
  • Example: aa may be 0a; aaba is 3a
  • Use a fixed number of bits for each code: 12 bits
    → 4096 codes
  • A key needs a code and a char, so a 32-bit
    integer works well: the least significant byte
    holds the char and the next 12 bits hold the
    prefix code
  • Hash divisor: 4099
  • Output of code
  • 12 bits long → combine two codes into three chars
  • Compression
  • Begin the dictionary with the 256 character
    codes; 4096 codes max
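One way to realize the dictionary described above is a chained hash table keyed by the packed 32-bit value `(prefix code << 8) | char`, with 4099 buckets as on the slide. This is a sketch: the struct and function names are hypothetical, the hash function is plain `key % 4099`, and single characters are assumed to need no table entries because their code equals their byte value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DIVISOR 4099   /* hash divisor from the slide */

/* Low 8 bits: next char; next 12 bits: prefix code. */
static uint32_t make_key(unsigned prefix, unsigned char c)
{
    assert(prefix < 4096);               /* codes are 12 bits */
    return ((uint32_t)prefix << 8) | c;
}

struct node { uint32_t key; int code; struct node *next; };
struct table { struct node *bucket[DIVISOR]; };

/* Return the code stored for `key`, or -1 if absent. */
int table_find(const struct table *t, uint32_t key)
{
    for (const struct node *n = t->bucket[key % DIVISOR]; n; n = n->next)
        if (n->key == key)
            return n->code;
    return -1;
}

/* Insert (key, code) at the front of its chain; 0 on success, -1 if out of memory. */
int table_insert(struct table *t, uint32_t key, int code)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->code = code;
    n->next = t->bucket[key % DIVISOR];
    t->bucket[key % DIVISOR] = n;
    return 0;
}

/* Free every chain, leaving an empty table (usable for reinitialization). */
void table_clear(struct table *t)
{
    for (int i = 0; i < DIVISOR; i++) {
        struct node *n = t->bucket[i];
        while (n != NULL) {
            struct node *next = n->next;
            free(n);
            n = next;
        }
        t->bucket[i] = NULL;
    }
}
```

Inserting at the head of a chain keeps insertion O(1); a table with static storage starts with every bucket `NULL`, so it needs no explicit initialization.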

7
LZW Decompression
  • Input codes: 0 2 1 4 5 3 7
  • Mapping: numeric codes → text
  • Have to rebuild the code table!!
  • Assign a code to every character that may occur
    in the file
  • Algorithm?

Output: aaabbbbbbaabaaba
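A possible answer to the "Algorithm?" question, sketched in C under the same assumptions as a compressor that takes a caller-supplied alphabet: codes arrive as an `int` array, and the table is stored as (prefix code, char) pairs. The decoder adds one entry per code it reads: the previous string plus the first character of the current one. The subtle case is a code that is not yet in the table. It can only be the code about to be created, whose string is the previous string followed by that string's own first character. The slide's example hits this case for codes 2, 4, 5, and 7. The names `lzw_decompress` and `LZW_BAD` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CODES 4096
#define LZW_BAD ((size_t)-1)   /* returned for a corrupt code stream */

/* Table entry: (code of prefix, last char); prefix -1 marks a single char. */
struct dentry { int prefix; unsigned char c; };
static struct dentry table[MAX_CODES];

/* Write the string for `code` to `out` and return its length. */
static size_t expand(int code, unsigned char *out)
{
    unsigned char rev[MAX_CODES];       /* a string has at most MAX_CODES chars */
    size_t len = 0;
    for (; code >= 0; code = table[code].prefix)
        rev[len++] = table[code].c;     /* walks from last char to first */
    for (size_t i = 0; i < len; i++)
        out[i] = rev[len - 1 - i];
    return len;
}

/* Decode n codes into `out`; returns bytes written or LZW_BAD. */
size_t lzw_decompress(const char *alphabet, const int *codes, size_t n,
                      unsigned char *out)
{
    int size = 0;
    for (const char *a = alphabet; *a && size < MAX_CODES; a++) {
        table[size].prefix = -1;
        table[size].c = (unsigned char)*a;
        size++;
    }
    assert(size > 0);
    if (n == 0)
        return 0;
    if (codes[0] < 0 || codes[0] >= size)
        return LZW_BAD;
    size_t k = expand(codes[0], out);
    int prev = codes[0];
    for (size_t i = 1; i < n; i++) {
        int code = codes[i];
        size_t len;
        if (code >= 0 && code < size) {
            len = expand(code, out + k);          /* code already in the table */
        } else if (code == size && size < MAX_CODES) {
            len = expand(prev, out + k);          /* special case: prev + prev[0] */
            out[k + len] = out[k];
            len++;
        } else {
            return LZW_BAD;
        }
        if (size < MAX_CODES) {                   /* rebuild: prev + first char of current */
            table[size].prefix = prev;
            table[size].c = out[k];
            size++;
        }
        k += len;
        prev = code;
    }
    return k;
}
```

Decoding 0 2 1 4 5 3 7 with the alphabet `"ab"` gives back aaabbbbbbaabaaba.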
8
Implementation
  • Dictionary organization: chained hash table
    (code, key)
  • F(k) k mapping
  • Output of code
  • 12 bits long → combine two codes into three chars
  • Decompression
  • Begin the dictionary with the 256 character
    codes; 4096 codes max
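Combining two 12-bit codes into three chars can be done as below. The bit layout is an assumption: the first code's high 8 bits, then its low 4 bits together with the second code's high 4 bits, then the second code's low 8 bits. Because compressed files are exchanged between different programs during testing, the class would need to agree on one layout and on how a trailing odd code is written. The names `pack2`/`unpack2` are hypothetical.

```c
#include <assert.h>

/* out[0] = high 8 bits of c1
   out[1] = low 4 bits of c1 | high 4 bits of c2
   out[2] = low 8 bits of c2 */
void pack2(unsigned c1, unsigned c2, unsigned char out[3])
{
    assert(c1 < 4096 && c2 < 4096);     /* both must fit in 12 bits */
    out[0] = (unsigned char)(c1 >> 4);
    out[1] = (unsigned char)(((c1 & 0x0F) << 4) | (c2 >> 8));
    out[2] = (unsigned char)(c2 & 0xFF);
}

/* Inverse of pack2: recover both 12-bit codes from 3 bytes. */
void unpack2(const unsigned char in[3], unsigned *c1, unsigned *c2)
{
    *c1 = ((unsigned)in[0] << 4) | (in[1] >> 4);
    *c2 = ((unsigned)(in[1] & 0x0F) << 8) | in[2];
}
```

For example, `pack2(0x123, 0xABC, b)` stores the bytes `12 3A BC`.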

9
Problem
  • Modify the LZW compression and decompression
    programs so that the code table is reinitialized
    after every x kbytes of the text file have been
    compressed/decompressed. Experiment with the
    modified code using text files that are 100 to
    200 kbytes long and x = 10, 20, 30, 40, and 50.
    Which value of x gives the best compression?
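A sketch of the reinitialization, assuming "x kbytes" means x * 1024 input bytes. The input is cut into blocks of that size, and each block is compressed with a fresh dictionary holding only the 256 single-byte codes. Because every block's codes decode to exactly one block of text, the decompressor can reset its table at the same point by counting the bytes it has written. The name `lzw_compress_blocks` is hypothetical, and the linear-search dictionary stands in for the required hash table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CODES 4096

/* Multi-char strings as (prefix code, last char); codes 0..255 are the
   single bytes themselves and are never stored. */
struct entry { int prefix; unsigned char c; };
static struct entry dict[MAX_CODES];
static int next_code;

/* Linear search for brevity; a real solution would use the hash table. */
static int find(int prefix, unsigned char c)
{
    for (int i = 256; i < next_code; i++)
        if (dict[i].prefix == prefix && dict[i].c == c)
            return i;
    return -1;
}

/* Compress one non-empty block with a freshly initialized code table. */
static size_t compress_block(const unsigned char *in, size_t n, int *out)
{
    size_t k = 0;
    next_code = 256;                   /* reinitialize: only the 256 byte codes */
    int p = in[0];
    for (size_t i = 1; i < n; i++) {
        int pc = find(p, in[i]);
        if (pc >= 0) {
            p = pc;
            continue;
        }
        out[k++] = p;
        if (next_code < MAX_CODES) {
            dict[next_code].prefix = p;
            dict[next_code].c = in[i];
            next_code++;
        }
        p = in[i];
    }
    out[k++] = p;
    return k;
}

/* Compress n bytes, reinitializing the code table after every block_bytes
   input bytes (x kbytes -> block_bytes = x * 1024). Returns codes written. */
size_t lzw_compress_blocks(const unsigned char *in, size_t n,
                           size_t block_bytes, int *out)
{
    size_t k = 0;
    assert(block_bytes > 0);
    for (size_t start = 0; start < n; start += block_bytes) {
        size_t len = n - start < block_bytes ? n - start : block_bytes;
        k += compress_block(in + start, len, out + k);
    }
    return k;
}
```

For the experiment, one could call it with `block_bytes = x * 1024` for each x and compare the number of codes (1.5 bytes each at 12 bits) against the input size.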

10
Rules
  • The hash table implementation will be checked by
    the TA
  • The code will also be reviewed (good programming
    practice is a must).

11
Submission Testing
  • The TA will test the code using regular text
    files of different sizes. Both the compression
    and decompression routines will be tested.
  • Files compressed by one student's program will be
    decompressed with other students' decompression
    routines, and vice versa.
  • If you encounter an error while compressing or
    decompressing (for example, the input is not a
    proper binary file or not a proper ASCII file),
    print an error message, save the error message,
    and exit gracefully.
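One reading of "not a proper ascii file" is any byte above 127; the check and error path below are a sketch under that assumption. `fail` prints the message on the screen, saves it to the given error file (for example inputfile.err), and exits. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* 1 if every byte is 7-bit ASCII (0..127), else 0. */
int is_ascii_text(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] > 127)
            return 0;
    return 1;
}

/* Print a printf-style message on the screen, save it to err_path, and exit. */
void fail(const char *err_path, const char *fmt, ...)
{
    va_list ap;
    assert(err_path != NULL && fmt != NULL);

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);

    FILE *f = fopen(err_path, "w");
    if (f != NULL) {
        va_start(ap, fmt);              /* restart the list for a second pass */
        vfprintf(f, fmt, ap);
        va_end(ap);
        fputc('\n', f);
        fclose(f);
    }
    exit(EXIT_FAILURE);
}
```

A compressor might call `fail(err_name, "%s: not a proper ASCII file", in_name)` as soon as `is_ascii_text` returns 0, before writing any output.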