Instagram Web Scraping with Python

In this tutorial, we’ll build a small Instagram scraping tool in Python and use it to collect information about a public account.

Instagram scraping means gathering information that is publicly available on the web. Depending on what an account exposes, you can scrape data such as email addresses, phone numbers, and images.

Code Implementation for Instagram Scraper

To scrape Instagram, we will use a Python library called instaloader, which provides an API for downloading data from Instagram. You can install it by running pip install instaloader in your command prompt. Once the package is on your system, let’s start building the web scraper!
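Before moving on, it can help to confirm that the package is actually importable. Below is a minimal sketch that uses only the standard library. The helper name is_installed is made up for this example and is not part of Instaloader.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("instaloader"):
        print("instaloader is available")
    else:
        print("instaloader is missing; install it with: pip install instaloader")
```

If the check reports the package as missing, run the pip command and try again before continuing.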

Importing Modules

The Instaloader module comes with many functions for scraping Instagram; for example, it can download pictures along with their captions. To learn more about the library, refer to the Instaloader documentation.

import instaloader

Next, we create an Instaloader instance, which we will call bot, to do the scraping for us.

bot = instaloader.Instaloader()

Now, let’s take input from the user to get the username of the account they wish to scrape data for using the code below.

Username = input('Enter the Account Username: ')
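Users often type a handle with a leading @ or stray spaces, so it may be worth cleaning the input before passing it on. The sketch below is one possible approach. The function name normalize_username and the validation rule (1 to 30 letters, digits, periods, or underscores) are assumptions made for this example, not something Instaloader enforces.

```python
import re

# Assumed rule: 1-30 characters made of letters, digits, periods, underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9._]{1,30}")


def normalize_username(raw: str) -> str:
    """Strip whitespace and any leading '@', lowercase, then validate."""
    cleaned = raw.strip().lstrip("@").lower()
    if not USERNAME_PATTERN.fullmatch(cleaned):
        raise ValueError(f"Not a plausible Instagram username: {raw!r}")
    return cleaned
```

With this helper, the input line could be written as Username = normalize_username(input('Enter the Account Username: ')).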

For this tutorial, let’s scrape data for dancejoy, an Instagram account that posts dance videos.

Extracting Profile Pictures

We will start off by downloading the profile picture of the account using the code below.

bot.download_profile(Username, profile_pic_only=True)

A new folder named after the account username is created, and it contains the profile picture of the account. In our case, the profile picture is shown below.

[Image: Dancejoy profile picture]
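To confirm that the download worked, you could list the image files in the new folder. This is a minimal sketch using pathlib. The helper name find_images and the set of image extensions are assumptions for this example, and the folder is assumed to be named after the username as described above.

```python
from pathlib import Path

# Assumed set of extensions; adjust if your downloads use other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder: str) -> list[Path]:
    """Return image files found directly inside `folder`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)


if __name__ == "__main__":
    # "dancejoy" is the folder name used in this tutorial.
    for path in find_images("dancejoy"):
        print(path)
```

An empty result means either the folder does not exist yet or no image was saved into it.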

Extract More Information

Now let’s try to extract some more valuable information for the account using the code below.

profile = instaloader.Profile.from_username(bot.context, Username)

print("Username: ", profile.username)
print("User ID: ", profile.userid)
print("Number of Posts: ", profile.mediacount)
print("Followers: ", profile.followers)
print("Followees: ", profile.followees)
print("Bio: ", profile.biography, profile.external_url)

Let’s have a look at the output. It will look something like this:

Username:  dancejoy
User ID:  3056368980
Number of Posts:  1849
Followers:  1406049
Followees:  161
Bio:  A community of Dancers None
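If you want to keep these values rather than just print them, one option is to collect them into a dictionary and serialize it as JSON. The sketch below uses a stand-in object filled with the sample output above so that it runs offline; the helper name profile_summary is hypothetical. With a real profile object, the same attribute names shown in the tutorial code would be read instead.

```python
import json
from types import SimpleNamespace

# Attribute names taken from the print statements in this tutorial.
FIELDS = (
    "username", "userid", "mediacount",
    "followers", "followees", "biography", "external_url",
)


def profile_summary(profile) -> dict:
    """Copy the selected profile attributes into a plain dictionary."""
    return {name: getattr(profile, name) for name in FIELDS}


# Stand-in for a real profile object, using the sample output values.
sample = SimpleNamespace(
    username="dancejoy",
    userid=3056368980,
    mediacount=1849,
    followers=1406049,
    followees=161,
    biography="A community of Dancers",
    external_url=None,
)

summary = profile_summary(sample)
print(json.dumps(summary, indent=2))
```

A dictionary like this is easy to write to a file or append to a list when you scrape several accounts.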

Downloading Posts from the Profile

To download every post from the profile, we loop over its posts using the code below.

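profile = instaloader.Profile.from_username(bot.context, Username)
posts = profile.get_posts()
try:
    # Save each post into its own folder: dancejoy_1, dancejoy_2, ...
    for index, post in enumerate(posts, 1):
        bot.download_post(post, target=f"{profile.username}_{index}")
except instaloader.exceptions.InstaloaderException as err:
    # Report errors raised by Instaloader instead of exiting silently
    print(f"Download stopped: {err}")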

The output looks something like this:

dancejoy_1/2022-03-17_16-58-28_UTC.jpg [Incredibly mesmerizing 🤯  Cre…] dancejoy_1/2022-03-17_16-58-28_UTC.mp4 json 
dancejoy_2/2022-03-14_21-51-02_UTC.jpg [These two 🔥 🔥  Dancers: @tom1…] dancejoy_2/2022-03-14_21-51-02_UTC.mp4 json 
dancejoy_3/2022-03-12_20-20-03_UTC.jpg [The flexibility on this kid 🤯…] dancejoy_3/2022-03-12_20-20-03_UTC.mp4 json 
dancejoy_4/2022-03-11_00-11-05_UTC.jpg [Catwoman energy 🖤  Dancer: @l…] dancejoy_4/2022-03-11_00-11-05_UTC.mp4 json 
dancejoy_5/2022-03-08_19-01-36_UTC.jpg [Just WOW ❤️‍🔥  Dancer: @camia…] dancejoy_5/2022-03-08_19-01-36_UTC.mp4 json 
dancejoy_6/2022-03-04_21-12-59_UTC.jpg [@ryanmhatch and @ryanwlambert…] dancejoy_6/2022-03-04_21-12-59_UTC.mp4 json 
dancejoy_7/2022-03-02_17-39-34_UTC.jpg [And the Academy Award goes to…] dancejoy_7/2022-03-02_17-39-34_UTC.mp4 json 
dancejoy_8/2022-02-28_19-53-51_UTC.jpg [Monday mood 🔥  💃🏻: @georgiefa…] dancejoy_8/2022-02-28_19-53-51_UTC.mp4 json 
dancejoy_9/2022-02-27_00-56-25_UTC.jpg [@stefannym and @yeeremy_lugo …] dancejoy_9/2022-02-27_00-56-25_UTC.mp4 json 
dancejoy_10/2022-02-24_19-32-46_UTC.jpg [Us knowing it’s Friday tomorr…] dancejoy_10/2022-02-24_19-32-46_UTC.mp4 json 
dancejoy_11/2022-02-10_16-29-34_UTC.jpg [Traditional dance from #Madag…] dancejoy_11/2022-02-10_16-29-34_UTC.mp4 json 
dancejoy_12/2022-02-09_16-34-19_UTC.jpg [The guy in the back😂😆  Via @e…] dancejoy_12/2022-02-09_16-34-19_UTC.mp4 json 

Now, these don’t look like posts, right? If you check the working directory, you will find that each post is saved in its own folder, which contains the actual content of the post, be it a video or an image.
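The download loop walks through every post, which can take a long time for an account with well over a thousand posts. One way to stop after the first few is itertools.islice, which works on any iterator. The sketch below uses a stand-in generator so it runs without a network connection; the names first_n and fake_posts are made up for this example.

```python
from itertools import islice


def first_n(items, limit):
    """Return an iterator of (index, item) pairs for at most `limit` items, counting from 1."""
    return enumerate(islice(items, limit), 1)


def fake_posts():
    """Stand-in for the post iterator, so the sketch runs offline."""
    for name in ("post_a", "post_b", "post_c", "post_d"):
        yield name


# Build the folder names that the first three posts would be saved under.
targets = [f"dancejoy_{index}" for index, _post in first_n(fake_posts(), 3)]
print(targets)
```

In the real loop, the same idea would presumably look like for index, post in first_n(profile.get_posts(), 10), with the call to bot.download_post left unchanged.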

Conclusion

You can play around with the Instaloader library and explore more of its features. You can even use Python Tkinter to build a graphical interface around the scraper.

Thank you for reading!
