Create word cloud using Python
In this post, you will learn how to create a word cloud using the Python programming language.
A word cloud (or tag cloud) is a visual representation of text data. It shows a list of words, with the importance of each word indicated by its font size or color. This format is useful for quickly spotting the most prominent terms. It is commonly used to depict keyword metadata (tags) on websites, to analyze customer or employee feedback, and to visualize free-form text. Tags are usually single words. You have probably seen various basic and advanced charts for data visualization, such as scatter plots, histograms, and pie charts. A word cloud applies the same idea to text.
The third-party wordcloud module creates word clouds and tag clouds in Python. This module depends on numpy, pillow, and matplotlib.
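To make the link between word frequency and font size concrete, here is a minimal sketch that uses only the standard library. It is an illustration of the idea, not how the wordcloud module computes sizes internally. The function name font_sizes, the tiny stop word set, the size range, and the sample sentence are all assumptions made up for this example.

```python
from collections import Counter
import re

# Hypothetical, very small stop word list used only for this illustration.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def font_sizes(text, min_size=10, max_size=60):
    """Give each remaining word a font size that grows linearly with its count.

    The most frequent word gets max_size; rarer words get smaller sizes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }

sample = "Data science is the study of data. Data drives the science of learning."
sizes = font_sizes(sample)

# Print words from largest to smallest font size.
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word:10s} {size:.0f}")
```

In this sample, "data" appears most often and therefore receives the largest size, while words from the stop word set do not appear at all.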
Install wordcloud module
Open your terminal window and run the following command to install wordcloud using the pip tool.
pip install wordcloud
The above command installs not only wordcloud but also all supporting modules, such as matplotlib, pillow, numpy, and cycler.
Data for Wordcloud
We need text data for the word cloud. The data can come from a CSV file, an Excel file, or scraped text. In this article, we take the text from Wikipedia, which requires installing the wikipedia module.
pip install wikipedia
In the code below, we fetch the text of the 'Data science' page from Wikipedia and store it in a string variable. We will then use this text to generate the word cloud.
import wikipedia
import re
data = wikipedia.page("Data_science")
text = data.content
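# clean data: remove section headings such as "== History =="
text = re.sub(r'==.*?==+', '', text)
# replace newlines with spaces so words on adjacent lines do not run together
final_text = text.replace('\n', ' ')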
print(final_text)
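The cleaning step above can be checked on a small string without calling Wikipedia. The following is a minimal sketch under stated assumptions: the helper name clean_wiki_text and the sample text are hypothetical, and the whitespace handling uses split and join instead of a plain replace so that runs of blank lines collapse into single spaces.

```python
import re

def clean_wiki_text(raw):
    """Drop '== Heading ==' markers and collapse all whitespace to single spaces."""
    # Same heading pattern as in the article's code.
    without_headings = re.sub(r'==.*?==+', '', raw)
    # split() with no arguments splits on any whitespace, including newlines.
    return ' '.join(without_headings.split())

# Hypothetical sample that mimics the layout of Wikipedia page content.
sample = (
    "Data science is a field.\n\n\n"
    "== Foundations ==\n"
    "Statistics matters.\n\n\n"
    "=== Etymology ===\n"
    "The term appeared early."
)

cleaned = clean_wiki_text(sample)
print(cleaned)
```

Both the two-level and three-level headings are removed together with their titles, so heading words do not end up in the word cloud.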
Python wordcloud program
Here are a few lines of Python script that generate and save a word cloud image. A word cloud gives us a quick overview of our text data when we work on data science projects. Stopwords are very common words that carry little meaning on their own, such as 'I', 'we', 'are', 'is', and 'am'. The wordcloud module already removes the most common English stopwords for us. If your text corpus still contains stop words that are not removed, you can add new words to the stopword set.
The WordCloud class accepts parameters for different configuration settings, such as width, height, background color, and font. To demonstrate how easy it is, we have set the background color, limited the maximum number of words, and supplied a mask image (cloud.png) that defines the shape of the cloud.
import wikipedia
import re
from wordcloud import WordCloud, STOPWORDS
import numpy as np
from PIL import Image

data = wikipedia.page("Data_science")
text = data.content

# clean data: remove section headings and join lines with spaces
text = re.sub(r'==.*?==+', '', text)
final_text = text.replace('\n', ' ')

# define function to create word cloud
def word_cloud(data):
    # open the mask image (cloud.png must exist in the working directory)
    array_mask = np.array(Image.open("cloud.png"))
    # set image configurations and generate cloud
    cloud = WordCloud(background_color="white", max_words=200,
                      mask=array_mask, stopwords=set(STOPWORDS))
    cloud.generate(data)
    # save wordcloud to a file
    cloud.to_file("wcloud.png")

word_cloud(final_text)
The mask image is on the left, and the output generated by the above code is on the right.