Python tarfile module


The Python tarfile module is used to read and write tar archives. Python provides excellent tools and modules for managing compressed files, including (but not limited to) compressing files and directories with mechanisms such as gzip, bz2 and lzma.

In this post, we will walk through practical demonstrations of the tarfile module’s functions. The module works much like the Python zipfile module, but for TAR archives. Let’s get started.

Python tarfile module

The tarfile module provides functions to perform operations such as:

  • read and write gzip, bz2 and lzma compressed archives
  • read and write the POSIX.1-1988 (ustar) format
  • read and write the GNU tar format

Apart from these features, we can also handle directories and restore file information such as timestamps, access permissions and ownership.
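As a rough sketch of the compression support listed above, the snippet below writes the same small file into archives using the ‘w:gz’, ‘w:bz2’ and ‘w:xz’ modes, then reopens each one with ‘r:*’ (equivalent to plain ‘r’), which detects the compression on its own. The file name notes.txt and the temporary directory are assumptions for illustration, and bz2/lzma support depends on your Python build including those modules.

```python
import os
import tarfile
import tempfile

# Write modes for each compression type supported by tarfile.open().
MODES = {"gz": "w:gz", "bz2": "w:bz2", "xz": "w:xz"}

names_by_format = {}
with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical input file to archive.
    src = os.path.join(workdir, "notes.txt")
    with open(src, "w") as f:
        f.write("some text to compress\n")

    for ext, mode in MODES.items():
        archive = os.path.join(workdir, "notes.tar." + ext)
        # arcname stores the file without the temporary directory prefix.
        with tarfile.open(archive, mode) as tar:
            tar.add(src, arcname="notes.txt")
        # 'r:*' picks the matching decompressor automatically.
        with tarfile.open(archive, "r:*") as tar:
            names_by_format[ext] = tar.getnames()

for ext, names in names_by_format.items():
    print(ext, names)
```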

Checking validity of TAR files

We will start with the simplest example: checking whether a file is a valid TAR file, using the is_tarfile() function:


import tarfile

for file_name in ['README.txt', 'example.tar.gz']:
    try:
        # is_tarfile() returns True or False; it raises OSError if the file can't be opened
        print(file_name, tarfile.is_tarfile(file_name))
    except OSError as err:
        print(file_name, err)

Let’s run this example and check the output:
[Screenshot: output of the is_tarfile() check]
Note that these files should exist in the directory you run this script in.
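If you don’t have these files at hand, the following self-contained sketch creates both in a temporary directory first: a plain text file and a small gzip-compressed TAR. It also tries a name that doesn’t exist (missing.tar, a hypothetical name chosen for that purpose) to show the OSError branch; OSError is what the old IOError name refers to in Python 3.

```python
import os
import tarfile
import tempfile

results = {}
with tempfile.TemporaryDirectory() as workdir:
    readme = os.path.join(workdir, "README.txt")
    with open(readme, "w") as f:
        f.write("just plain text\n")

    archive = os.path.join(workdir, "example.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(readme, arcname="README.txt")

    for name in ["README.txt", "example.tar.gz", "missing.tar"]:
        path = os.path.join(workdir, name)
        try:
            results[name] = tarfile.is_tarfile(path)
        except OSError as err:
            # A missing path raises FileNotFoundError, a subclass of OSError.
            results[name] = type(err).__name__
        print(name, results[name])
```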

Reading TAR file metadata

In this section, we will read metadata from a TAR file, such as which files it contains, using the open() and getnames() functions:


import tarfile

# Mode 'r' detects the compression (gzip here) automatically
with tarfile.open('example.tar.gz', 'r') as t:
    print("Files in TAR file:")
    print(t.getnames())

Let’s run this example and check the output:
[Screenshot: list of member names in example.tar.gz]
Note that we just put some sample files in this TAR for demonstration.
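To try getnames() without preparing files on disk, here is a sketch that builds a small gzip-compressed archive entirely in memory with io.BytesIO, TarInfo and addfile(), then reads the names back. The member names and their contents are made up for the illustration.

```python
import io
import tarfile

# Hypothetical members: archive path -> file contents.
members = {
    "TarFolder/README.txt": b"read me\n",
    "TarFolder/tarfile_validity.py": b"import tarfile\n",
}

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for name, data in members.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)  # addfile() copies exactly info.size bytes
        tar.addfile(info, io.BytesIO(data))

buffer.seek(0)  # rewind so the archive is read from the start
with tarfile.open(fileobj=buffer, mode="r") as tar:
    names = tar.getnames()

print("Files in TAR file:")
print(names)
```

Closing the TarFile here does not close the BytesIO buffer, which is why it can be rewound and reopened for reading.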

Let’s dig a little deeper into each file’s metadata before moving on to the next example. We will print its modification time, mode, type and size:


import tarfile
import time

with tarfile.open('example.tar.gz', 'r') as t:
    # getmembers() returns one TarInfo object per archive member
    for info in t.getmembers():
        print(info.name)
        print('Modified:', time.ctime(info.mtime))
        print('Mode    :', oct(info.mode))
        print('Type    :', info.type)
        print('Size    :', info.size, 'bytes')

When we run this program, we can see much more information related to the files:
[Screenshot: metadata printed for each member]
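The type printed above is a raw one-byte code, such as b'0' for a regular file or b'5' for a directory. As a sketch, the TarInfo helpers isdir() and isfile(), together with stat.filemode(), give friendlier output. The in-memory archive and its member names are assumptions made so the snippet runs on its own.

```python
import io
import stat
import tarfile

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    # A directory entry (hypothetical name).
    folder = tarfile.TarInfo("TarFolder")
    folder.type = tarfile.DIRTYPE
    folder.mode = 0o755
    tar.addfile(folder)

    # A regular file inside it.
    data = b"read me\n"
    readme = tarfile.TarInfo("TarFolder/README.txt")
    readme.size = len(data)
    readme.mode = 0o644
    tar.addfile(readme, io.BytesIO(data))

buffer.seek(0)
summary = []
with tarfile.open(fileobj=buffer, mode="r") as tar:
    for info in tar.getmembers():
        # This sketch only distinguishes directories from regular files.
        if info.isdir():
            kind, type_bits = "dir", stat.S_IFDIR
        else:
            kind, type_bits = "file", stat.S_IFREG
        # A mode read from the archive holds only permission bits,
        # so add the file-type bits that filemode() expects.
        summary.append((info.name, kind, stat.filemode(type_bits | info.mode)))

for row in summary:
    print(row)
```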

Extracting Files From an Archive

Here, we will read the contents of files in the archive with extractfile(), which returns a file-like object instead of writing anything to disk:


import tarfile

with tarfile.open('example.tar.gz', 'r') as t:
    for file_name in ['TarFolder/README.txt', 'TarFolder/tarfile_validity.py']:
        try:
            # extractfile() raises KeyError for a name that isn't in the archive
            f = t.extractfile(file_name)
        except KeyError:
            print('ERROR: Did not find %s in tar archive' % file_name)
        else:
            # readlines() returns the file's lines as bytes objects
            print(file_name, ':', f.readlines())

Let’s run this example and check the output:
[Screenshot: contents of the two files read with extractfile()]
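extractfile() only reads a member’s data into memory. To write members to disk, TarFile also offers extractall(). Below is a minimal sketch that extracts a made-up archive into a temporary directory. Passing filter='data' asks tarfile to refuse unsafe members, such as ones that would end up outside the destination directory; that argument was added in Python 3.12 and backported to some earlier maintenance releases, so the sketch checks for tarfile.data_filter before using it.

```python
import io
import os
import tarfile
import tempfile

def build_sample_archive(path):
    """Write a small gzip-compressed archive with one hypothetical member."""
    data = b"hello from the archive\n"
    with tarfile.open(path, "w:gz") as tar:
        info = tarfile.TarInfo("TarFolder/README.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "example.tar.gz")
    build_sample_archive(archive)

    dest = os.path.join(workdir, "extracted")
    with tarfile.open(archive, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: apply the safe 'data' extraction filter.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)

    # extractall() creates missing parent folders such as TarFolder/.
    with open(os.path.join(dest, "TarFolder", "README.txt"), "rb") as f:
        extracted = f.read()

print(extracted)
```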

Adding Files to an Archive

Here, we will create an archive and add a file to it with add():


import tarfile

print('creating archive')
out = tarfile.open('example.tar.gz', mode='w')
try:
    print('adding README.txt')
    out.add('README.txt')
finally:
    print('closing tar archive')
    out.close()

print('Contents of archived file:')
with tarfile.open('example.tar.gz', 'r') as t:
    for member in t.getmembers():
        print(member.name)

Let’s run this example and check the output:
[Screenshot: output of adding README.txt]
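Here, it is worth noticing that ‘w’ mode doesn’t preserve the previous contents of the file: it overwrites any existing archive. Note also that plain ‘w’ writes an uncompressed archive even though this file name ends in .gz; use ‘w:gz’ for real gzip compression. We can instead use ‘a’ mode to append files to an archive, but only to an uncompressed one.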
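To see the overwrite behavior described above, this sketch opens the same archive in ‘w’ mode twice, adding a different file each time, then lists what survived. The file names first.txt and second.txt are hypothetical.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    archive = os.path.join(workdir, "example.tar")
    for name in ["first.txt", "second.txt"]:
        path = os.path.join(workdir, name)
        with open(path, "w") as f:
            f.write(name + "\n")
        # Each 'w' open truncates the archive before writing.
        with tarfile.open(archive, mode="w") as out:
            out.add(path, arcname=name)

    with tarfile.open(archive, "r") as t:
        survivors = t.getnames()

print(survivors)
```

Only second.txt remains, because the second ‘w’ open discarded the archive written by the first.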

Appending Files to an Archive

Here, we will open the archive in ‘a’ mode instead of ‘w’ mode, so new files are appended to it:


import tarfile

print('opening archive for appending')
out = tarfile.open('example.tar.gz', mode='a')
try:
    print('adding README.txt')
    out.add('README.txt')
finally:
    print('closing tar archive')
    out.close()

print('Contents of archived file:')
with tarfile.open('example.tar.gz', 'r') as t:
    for member in t.getmembers():
        print(member.name)

Let’s run this example and check the output:
[Screenshot: output after appending README.txt]
After appending README.txt again, the archive now contains two members, both named README.txt.
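Both README.txt entries are kept because a TAR archive may contain several members with the same name. As a sketch with hypothetical contents, the snippet below appends a second version of README.txt in ‘a’ mode, shows that getmember() returns the last occurrence of a name, and shows that ‘a’ refuses a genuinely gzip-compressed archive by raising tarfile.ReadError. The helper add_bytes() is a name made up for this example.

```python
import io
import os
import tarfile
import tempfile

def add_bytes(tar, name, data):
    """Hypothetical helper: add in-memory bytes to an open TarFile."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

with tempfile.TemporaryDirectory() as workdir:
    plain = os.path.join(workdir, "example.tar")
    with tarfile.open(plain, "w") as tar:
        add_bytes(tar, "README.txt", b"version 1\n")
    # 'a' appends after the existing members of an uncompressed archive.
    with tarfile.open(plain, "a") as tar:
        add_bytes(tar, "README.txt", b"version 2\n")

    with tarfile.open(plain, "r") as tar:
        names = tar.getnames()
        # getmember() returns the last member with a matching name.
        latest = tar.extractfile(tar.getmember("README.txt")).read()

    compressed = os.path.join(workdir, "example.tar.gz")
    with tarfile.open(compressed, "w:gz") as tar:
        add_bytes(tar, "README.txt", b"version 1\n")
    try:
        tarfile.open(compressed, "a").close()
        append_error = None
    except tarfile.ReadError as err:
        # Appending works only on uncompressed archives.
        append_error = type(err).__name__

print(names)
print(latest)
print(append_error)
```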

Reference: API Documentation.
