Python gzip – compress decompress

The Python gzip module provides a very simple way to compress and decompress files, and it works in a similar manner to the GNU programs gzip and gunzip.

In this lesson, we will study which classes and functions this module provides for these operations, along with the additional functionality it offers.

Python gzip module

This module provides the GzipFile class along with the module-level convenience functions open(), compress() and decompress().

The advantage of the GzipFile class is that it reads and writes gzip files, compressing and decompressing the data automatically, so that in a program they look just like normal file objects.

It is important to remember that the additional formats which the gzip and gunzip programs can decompress, such as those produced by compress and pack, are not supported by this module.
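The compress() and decompress() functions mentioned above are not used elsewhere in this lesson, so here is a minimal sketch of a round trip on in-memory bytes. The sample payload and the variable names are illustrative assumptions, not part of the original examples.

```python
import gzip

# Sample payload (an assumption for this sketch); repetition compresses well.
original = b'We can write anything in the file here.\n' * 20

# One-shot compression and decompression of in-memory bytes.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

print('Original size:  ', len(original))
print('Compressed size:', len(compressed))
print('Round trip OK:  ', restored == original)
```

These functions are handy when the data is already in memory and no file object is needed.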
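As a rough illustration of what happens with unsupported input, the sketch below passes bytes that are not in gzip format to decompress(). The payload is a made-up example, and catching OSError is a choice made here so that the sketch also works on Python versions older than 3.8, where gzip.BadGzipFile does not exist.

```python
import gzip

# Bytes that do not start with the gzip magic number (0x1f 0x8b).
not_gzip = b'This is plain text, not a gzip stream.'

try:
    gzip.decompress(not_gzip)
    rejected = False
except OSError as err:
    # gzip.BadGzipFile (Python 3.8+) is a subclass of OSError.
    rejected = True
    print('Could not decompress:', err)
```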

Using gzip module

We will now start using the functions we mentioned to perform compression and decompression operations.

Writing Compressed Files with open()

We will start with the open() function, which creates an instance of GzipFile. We open the file in wb mode to write to a compressed file:


import gzip
import io
import os

output_file_name = 'jd_example.txt.gz'
file_mode = 'wb'

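# Open the gzip file in binary write mode; gzip.open() returns a GzipFile.
with gzip.open(output_file_name, file_mode) as output:
    # Wrap the binary stream so that text is encoded to UTF-8 bytes.
    with io.TextIOWrapper(output, encoding='utf-8') as encode:
        encode.write('We can write anything in the file here.\n')

print(output_file_name,
      'contains', os.stat(output_file_name).st_size, 'bytes')
# Ask the Unix `file` utility for the MIME type (not available on Windows).
os.system('file -b --mime {}'.format(output_file_name))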

Let’s see the output for this program:

[Figure: Python gzip write to compressed file]

To write to the compressed file, we first opened it in wb mode and then wrapped the GzipFile instance in a TextIOWrapper from the io module. The wrapper encodes Unicode text to bytes, which is the form suitable for compression.
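As an alternative to wrapping the GzipFile by hand, gzip.open() also accepts the text modes wt and rt together with an encoding argument. The following is a minimal sketch under the assumption that a temporary directory is an acceptable place for the file; the path handling is illustrative only.

```python
import gzip
import os
import tempfile

# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    output_file_name = os.path.join(tmp_dir, 'jd_example.txt.gz')

    # 'wt' opens the compressed file in text mode; gzip.open() adds the
    # TextIOWrapper internally and encodes with the given encoding.
    with gzip.open(output_file_name, 'wt', encoding='utf-8') as output:
        output.write('We can write anything in the file here.\n')

    # Read it back in text mode to confirm the round trip.
    with gzip.open(output_file_name, 'rt', encoding='utf-8') as input_file:
        round_trip = input_file.read()

print(repr(round_trip))
```

Text mode saves one level of nesting, while the explicit TextIOWrapper makes the encoding step visible.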

Writing multiple lines to compressed file

This time, we will use almost the same script as above, but we will write multiple lines to the file. Let’s look at how this can be achieved:


import gzip
import io
import os
import itertools

output_file_name = 'jd_example.txt.gz'
file_mode = 'wb'

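with gzip.open(output_file_name, file_mode) as output:
    with io.TextIOWrapper(output, encoding='utf-8') as enc:
        # writelines() accepts any iterable of strings; repeat() yields
        # the same line ten times.
        enc.writelines(
            itertools.repeat('JournalDev, same line again and again!.\n', 10)
        )

# Print the decompressed content with gzcat (on many Linux systems the
# equivalent command is zcat).
os.system('gzcat jd_example.txt.gz')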

Let’s see the output for this program:

[Figure: Writing multiple lines to compressed file]

Reading Compressed Data

Now that we’re done with writing files, we can read data from the compressed file as well. This time we use another file mode, rb, which opens the file for reading in binary mode.


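import gzip
import io

read_file_name = 'jd_example.txt.gz'
file_mode = 'rb'

# Open the compressed file for reading; the data is decompressed on the fly.
with gzip.open(read_file_name, file_mode) as input_file:
    # Decode the decompressed bytes back to text.
    with io.TextIOWrapper(input_file, encoding='utf-8') as dec:
        print(dec.read())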

Let’s see the output for this program:

[Figure: Read compressed file]

Notice that we did nothing special with gzip here apart from passing it a different file mode. The reading is done by the TextIOWrapper, which uses the file object provided by the gzip module.
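Because the object returned by the gzip module behaves like a normal file object, it can also be iterated line by line instead of being read in one call. The sketch below assumes a small three-line file created in a temporary directory; the file content and names are illustrative.

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, 'jd_example.txt.gz')

    # Create a small compressed file to read from.
    with gzip.open(file_name, 'wt', encoding='utf-8') as output:
        for number in range(1, 4):
            output.write('line {}\n'.format(number))

    # Iterate over the decompressed text one line at a time.
    lines = []
    with gzip.open(file_name, 'rt', encoding='utf-8') as input_file:
        for line in input_file:
            lines.append(line.rstrip('\n'))

print(lines)
```

Iterating this way avoids loading the whole decompressed content into memory at once, which matters for large files.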

Reading Streams

Another advantage of the gzip module is that it can wrap other types of streams so that they can make use of compression too. This is useful, for example, when you want to transmit a lot of data over web sockets.

Let’s see how we can compress and decompress stream data:


import gzip
from io import BytesIO
import binascii

write_mode = 'wb'
read_mode = 'rb'

uncompressed = b'Reiterated line n times.\n' * 8
print('Uncompressed Data:', len(uncompressed))
print(uncompressed)

# Compress into an in-memory buffer instead of a file on disk.
buf = BytesIO()
with gzip.GzipFile(mode=write_mode, fileobj=buf) as file:
    file.write(uncompressed)

compressed = buf.getvalue()
print('Compressed Data:', len(compressed))
print(binascii.hexlify(compressed))

# Decompress by wrapping a new buffer that holds the compressed bytes.
inbuffer = BytesIO(compressed)
with gzip.GzipFile(mode=read_mode, fileobj=inbuffer) as file:
    read_data = file.read(len(uncompressed))

print('\nReading it again:', len(read_data))
print(read_data)

Let’s see the output for this program:

[Figure: Read Stream]

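Notice that while writing, we didn’t have to provide any length parameter. When re-reading the data, we passed the length to the read() function explicitly so that exactly as many bytes as we wrote are read back. In current Python 3 versions this is optional: calling read() without an argument returns all data up to the end of the stream.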
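When the original length is not known in advance, the stream can be read without a length or in fixed-size chunks. The following is a minimal sketch reusing the payload from the example above; the chunk size of 64 bytes is an arbitrary assumption.

```python
import gzip
from io import BytesIO

uncompressed = b'Reiterated line n times.\n' * 8

buf = BytesIO()
with gzip.GzipFile(mode='wb', fileobj=buf) as file:
    file.write(uncompressed)

# read() with no argument returns everything up to the end of the stream.
with gzip.GzipFile(mode='rb', fileobj=BytesIO(buf.getvalue())) as file:
    all_at_once = file.read()

# Reading in fixed-size chunks; an empty bytes object signals the end.
chunks = []
with gzip.GzipFile(mode='rb', fileobj=BytesIO(buf.getvalue())) as file:
    while True:
        chunk = file.read(64)
        if not chunk:
            break
        chunks.append(chunk)

print(all_at_once == uncompressed)
print(b''.join(chunks) == uncompressed)
```

Chunked reading keeps memory use bounded, which suits data arriving over a network connection.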

Conclusion

In this lesson, we studied the Python gzip module, which can be used to read and write compressed files. Its big advantage is that it makes a compressed file look just like a normal file object.

Reference: API Doc
