Python urllib – Python 3 urllib


The Python urllib package lets us access URL data programmatically.

Python urllib

  • We can use urllib to fetch website content from a Python program.
  • We can also use it to call REST web services.
  • We can make GET and POST HTTP requests.
  • It supports both HTTP and HTTPS requests.
  • We can send request headers and read the response headers.
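
The package is split into submodules: urllib.request opens URLs, urllib.parse encodes and splits them, and urllib.error holds the exceptions. As a rough sketch of how the first two fit together for a GET call with query parameters (the endpoint and parameter names here are assumptions chosen for illustration only):

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameters, used only for illustration.
base_url = 'http://localhost:3000/employees'
params = {'name': 'David', 'salary': '9988'}

# urlencode() turns the dictionary into a percent-encoded query string.
query = urllib.parse.urlencode(params)
url = base_url + '?' + query

# Building a Request does not open a connection; urlopen(request) would send it.
request = urllib.request.Request(url)

print(request.full_url)      # http://localhost:3000/employees?name=David&salary=9988
print(request.get_method())  # GET
```

A Request without data defaults to the GET method, which is why nothing else has to be configured here.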

Python urllib GET example

Let’s start with a simple example that reads the content of the Wikipedia home page.


import urllib.request

response = urllib.request.urlopen('https://www.wikipedia.org')

print(response.read())

The response read() method returns a bytes object, so the above code prints the raw HTML returned by the Wikipedia home page. The output is not easy to read, but we can use an HTML parser to extract useful information from it.
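
Since read() gives us bytes, we usually want to decode them into a string before doing anything else. Below is a minimal sketch of one way to do that. The helper name fetch_text and the UTF-8 fallback are assumptions made for this example, not something urllib provides.

```python
import urllib.request


def fetch_text(url, default_charset='utf-8'):
    """Fetch a URL and return its body decoded to str (illustrative helper)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, exactly as sent by the server
        # Use the charset from the Content-Type header when the server sends one,
        # otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_charset
    # errors='replace' avoids an exception on bytes that do not match the charset.
    return raw.decode(charset, errors='replace')
```

Calling print(fetch_text('https://www.wikipedia.org')) should then print the HTML as regular text instead of a bytes literal.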


Python urllib request with header

Let’s see what happens when we try to run the above program for JournalDev.


import urllib.request

response = urllib.request.urlopen('https://www.journaldev.com')

print(response.read())

We get the following error message.


/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py
Traceback (most recent call last):
  File "/Users/pankaj/Documents/PycharmProjects/BasicPython/urllib/urllib_example.py", line 3, in <module>
    response = urllib.request.urlopen('https://www.journaldev.com')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

This happens because my server blocks requests that do not appear to come from a browser. By default, urllib identifies itself with a User-Agent header such as Python-urllib/3.6, which some servers reject. Usually we can get past this error by sending a browser-like User-Agent header in the request. Let’s look at the modified program for this.


import urllib.request

# Request with Header Data to send User-Agent header
url = 'https://www.journaldev.com'

headers = {}
headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'

request = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(request)

print(resp.read())

We create the request headers as a dictionary and pass it to the Request object. The above program prints the HTML data received from the JournalDev home page.
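
A server may still refuse the request, in which case urlopen() raises urllib.error.HTTPError as in the traceback above. Here is a minimal sketch of catching it instead of letting the program crash. The function name fetch_or_report and its opener parameter are illustrative assumptions (the parameter only exists so the function can be tried without a network connection).

```python
import urllib.error
import urllib.request


def fetch_or_report(url, headers=None, opener=urllib.request.urlopen):
    """Return (status_code, body_bytes); status_code is None if no HTTP reply arrived."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        # HTTPError must be caught first because it is a subclass of URLError.
        return err.code, err.read()
    except urllib.error.URLError as err:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        return None, str(err.reason).encode('utf-8')
```

With this helper, a 403 response comes back as the tuple (403, body) rather than a traceback, so the calling code can decide whether to retry with different headers.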

Python urllib REST Example

REST web services are accessed over HTTP, so we can easily call them using the urllib module. I have a simple JSON-based demo REST web service running on my local machine, created using JSON Server. It’s a great Node module for running dummy JSON REST web services for testing purposes.


import urllib.request

response = urllib.request.urlopen('http://localhost:3000/employees')

print(response.read())


Notice that the console output is the JSON data returned by the service.
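
The JSON still arrives as bytes, so to work with it as Python objects we can decode it and pass it to the json module. The following is a minimal sketch; the helper name fetch_json is an assumption for this example, and it assumes the service returns valid JSON.

```python
import json
import urllib.request


def fetch_json(url):
    """Fetch a URL and parse its body as JSON (illustrative helper)."""
    with urllib.request.urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or 'utf-8'
        text = response.read().decode(charset)
    # json.loads() returns lists and dictionaries for JSON arrays and objects.
    return json.loads(text)
```

For the demo service, fetch_json('http://localhost:3000/employees') would return the parsed data instead of raw bytes, ready to loop over.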

Python urllib response headers

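We can get the response headers by calling the info() function on the response object. It returns an http.client.HTTPMessage object rather than a plain dictionary, but it supports dictionary-style lookup, so we can also extract a specific header from the response. Header names are matched case-insensitively.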


import urllib.request

response = urllib.request.urlopen('http://localhost:3000/employees')

print(response.info())

print('Response Content Type is = ', response.info()["content-type"])

Output:


X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
X-Content-Type-Options: nosniff
Content-Type: application/json; charset=utf-8
Content-Length: 260
ETag: W/"104-LQla2Z3Cx7OedNGjbuVMiKaVNXk"
Date: Wed, 09 May 2018 19:26:20 GMT
Connection: close


Response Content Type is =  application/json; charset=utf-8
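
The object returned by info() also offers a few convenience methods beyond key lookup. Below is a small sketch of some of them; the function name summarize_headers and the X-Does-Not-Exist header are made up for illustration.

```python
import urllib.request


def summarize_headers(response):
    """Collect a few commonly needed values from the response headers."""
    headers = response.info()  # the same object is available as response.headers
    return {
        # Media type without parameters, e.g. 'application/json'
        'content_type': headers.get_content_type(),
        # Charset parameter of Content-Type, or None when it is absent
        'charset': headers.get_content_charset(),
        # get() returns a default instead of None for a missing header
        'missing': headers.get('X-Does-Not-Exist', 'n/a'),
        # items() yields (name, value) pairs for every header
        'all': dict(headers.items()),
    }
```

Passing the response from urlopen('http://localhost:3000/employees') to this function would give a dictionary that is easier to inspect than the printed header block.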

Python urllib POST

Let’s look at an example of a POST method call.


import urllib.request
import urllib.parse

post_url = 'http://localhost:3000/employees'

# POST request data, form-encoded and converted to bytes.
# urlopen() sends it as application/x-www-form-urlencoded by default.
post_data = urllib.parse.urlencode({'name': 'David', 'salary': '9988'}).encode('ascii')

# urlopen() automatically uses the POST method because data is provided
post_response = urllib.request.urlopen(url=post_url, data=post_data)

print(post_response.read())

When we call the urlopen() function with data, it automatically uses the HTTP POST method. The program prints the response of the above POST call from my demo service.
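
The POST example above sends form-encoded data. If the service expects a JSON body instead, we can build a Request object with an explicit Content-Type header. This is a minimal sketch under that assumption; the helper name build_json_post is illustrative and not part of urllib.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build a POST request carrying the payload as a JSON body."""
    body = json.dumps(payload).encode('utf-8')  # request data must be bytes
    headers = {'Content-Type': 'application/json'}
    # method='POST' is explicit here; it would also be implied by data.
    return urllib.request.Request(url, data=body, headers=headers, method='POST')


request = build_json_post('http://localhost:3000/employees',
                          {'name': 'David', 'salary': '9988'})

print(request.get_method())                # POST
# Request stores header names capitalized, hence 'Content-type'.
print(request.get_header('Content-type'))  # application/json
print(request.data)                        # b'{"name": "David", "salary": "9988"}'
```

Nothing is sent until we call urllib.request.urlopen(request). The same method argument can be set to 'PUT' or 'DELETE' for other REST operations.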


You can download the code from my GitHub Repository.

Reference: API Doc
