Hello, readers! In this article, we will look at the Python wikipedia module in detail.
So, let us begin!! 🙂
Understanding Wikipedia module in Python
Reliable information is the starting point for data analysis, scraping, estimation, and similar tasks.
The wikipedia module, a third-party package that can be installed with pip install wikipedia, puts that information at our fingertips. With it, we can pull content from the Wikipedia website into our code with minimal scripting.
The module provides functions that let us access and parse articles from the website itself.
We will be looking at the following functions offered by the wikipedia module:
- Fetch random page headers
- Use the entire Wikipedia article page
- Summarize any article by its title
- Fetch and receive the data in a different language
1. Fetch random page headers
With the wikipedia module, we can easily fetch random article titles. The wikipedia.random() function accepts the number of titles we want as the pages parameter and returns them as a list of strings (or as a single string when pages is 1).
Here, pages is the number of random articles whose titles should be returned.
import wikipedia

print(wikipedia.random(pages=4))
['Kazakhstan national badminton team', 'Hisøya', 'The Jam (comics)', 'Institut Nova Història']
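Because the return type changes when only one title is requested, a small wrapper can make the result predictable. The following is a minimal sketch, not part of the module: random_titles and its wiki parameter are hypothetical names chosen for illustration, and the module is imported lazily so that a stand-in object can be passed in when testing offline.

```python
def random_titles(count, wiki=None):
    """Return `count` random article titles, always as a list."""
    if wiki is None:
        # Third-party package, imported lazily: pip install wikipedia
        import wikipedia as wiki
    result = wiki.random(pages=count)
    # wikipedia.random() hands back a bare string when a single title is returned
    return [result] if isinstance(result, str) else list(result)

# Usage (needs network access):
#   print(random_titles(1))
```

With this wrapper, code that loops over the titles does not need a special case for a single result.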
2. Fetch an entire Wikipedia article
With the wikipedia module, we can also extract the complete content, the categories, and the title of an article.
First, we call the page() function with the title of the article as its argument. It returns a page object that represents the article.
Once we have the page object, we can read the categories of the article from its categories attribute and the full text of the article from its content attribute.
Let us have a look at the example below.
import wikipedia

page = wikipedia.page('Random-access machine')
print(page)
print("TITLE: ",page.original_title)
print("CATEGORIES: ",page.categories)
print("CONTENT: ",page.content)
<WikipediaPage 'Random-access machine'>
TITLE: Random-access machine
CATEGORIES: ['All articles lacking in-text citations', 'All articles that are too technical', 'All articles with style issues', 'Articles lacking in-text citations from December 2017', 'Articles with multiple maintenance issues', 'CS1 errors: dates', 'Register machines', 'Wikipedia articles that are too technical from December 2017', 'Wikipedia articles with style issues from December 2017']
CONTENT: In computer science, random-access machine (RAM) is an abstract machine in the general class of register machines. The RAM is very similar to the counter machine but with the added capability of 'indirect addressing' of its registers. Like the counter machine the RAM has its instructions in the finite-state portion of the machine (the so-called Harvard architecture). The RAM's equivalent of the universal Turing machine – with its program in the registers as well as its data – is called the random-access stored-program machine or RASP. It is an example of the so-called von Neumann architecture and is closest to the common notion of computer. Together with the Turing machine and counter-machine models, the RAM and RASP models are used for computational complexity analysis. Van Emde Boas (1990) calls these three plus the pointer machine "sequential machine" models, to distinguish them from "parallel random-access machine" models.
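Not every title leads straight to an article: page() raises DisambiguationError when the title is ambiguous and PageError when nothing matches. A hedged sketch of how these cases might be handled follows; load_page and its wiki parameter are hypothetical names for illustration, and the lazy import is only there so a stand-in can be supplied for offline testing.

```python
def load_page(title, wiki=None):
    """Return (page, alternatives); page is None when the lookup fails."""
    if wiki is None:
        # Third-party package, imported lazily: pip install wikipedia
        import wikipedia as wiki
    try:
        return wiki.page(title), []
    except wiki.DisambiguationError as err:
        # The title is ambiguous; err.options lists the candidate titles
        return None, list(err.options)
    except wiki.PageError:
        # No article matched the title
        return None, []

# Usage (needs network access):
#   page, alternatives = load_page('Random-access machine')
```

Returning the alternatives instead of re-raising lets the caller decide whether to retry with one of the suggested titles.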
3. Summary of the Wikipedia article
We can fetch the summary of an article with the summary() function. It takes the title of the article and, optionally, the number of sentences to return through the sentences parameter, and gives back that many sentences from the start of the article.
import wikipedia

page = wikipedia.page('Random-access machine')
print(page)
# summary() expects the article title, not the page object
sent = wikipedia.summary(page.title, sentences=2)
print(sent)
<WikipediaPage 'Random-access machine'>
In computer science, random-access machine (RAM) is an abstract machine in the general class of register machines. The RAM is very similar to the counter machine but with the added capability of 'indirect addressing' of its registers.
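Since summary() looks an article up by its title, it helps to be explicit about what is passed to it. Here is a small sketch, under the assumption that we sometimes hold a title string and sometimes a page object; summarize and its wiki parameter are hypothetical names, not part of the module.

```python
def summarize(target, sentences=2, wiki=None):
    """Summarize an article given either a title string or a page object."""
    if wiki is None:
        # Third-party package, imported lazily: pip install wikipedia
        import wikipedia as wiki
    # Page objects carry their title; plain strings are used as they are
    title = target if isinstance(target, str) else target.title
    return wiki.summary(title, sentences=sentences)

# Usage (needs network access):
#   print(summarize('Random-access machine', sentences=2))
```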
4. Fetch and receive data in a different language
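Apart from fetching a summary or the entire content, we can also receive the data in a different language.
The set_lang() function selects the language edition of Wikipedia that later calls will query. The text comes from that edition's own article rather than from a translation of the English one, and the title is looked up in the chosen edition. In the output below, for instance, the English title resolves to the French article on random-access memory (mémoire vive).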
import wikipedia

wikipedia.set_lang("fr")
sent = wikipedia.summary('Random-access machine',sentences=2)
print(sent)
La mémoire vive, parfois abrégée avec l'acronyme anglais RAM (Random Access Memory), est la mémoire informatique dans laquelle peuvent être enregistrées les informations traitées par un appareil informatique. On écrit mémoire vive par opposition à la mémoire morte.
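One thing to keep in mind is that set_lang() changes the language for every later call in the program, not just the next one. A minimal sketch of switching the language for a single lookup and then switching back is shown here; summary_in, default_lang, and wiki are hypothetical names, and the sketch simply assumes the language in use beforehand was English.

```python
def summary_in(lang, title, sentences=2, default_lang="en", wiki=None):
    """Fetch a summary from one language edition, then restore the default."""
    if wiki is None:
        # Third-party package, imported lazily: pip install wikipedia
        import wikipedia as wiki
    wiki.set_lang(lang)
    try:
        return wiki.summary(title, sentences=sentences)
    finally:
        # set_lang() is module-wide, so switch back even if the lookup fails.
        # Assumption: the language in use before this call was default_lang.
        wiki.set_lang(default_lang)

# Usage (needs network access):
#   print(summary_in("fr", "Random-access machine"))
```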
With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.
For more such posts related to Python programming, stay tuned with us.
Till then, Happy Learning!! 🙂