Lemmatization Python nltk
In this post, you will learn how to lemmatize words in Python using the NLTK library.
Lemmatization is the algorithmic process of finding the lemma of a word based on its meaning. It usually refers to the morphological analysis of words, which aims to remove inflectional endings. It returns the base or dictionary form of a word, which is known as the lemma.
Text preprocessing includes both stemming and lemmatization. A stemming algorithm works by cutting the suffix off a word. More broadly, it chops off either the beginning or the end of the word, without checking whether the result is a real word. Lemmatization is a more powerful operation because it takes the morphological analysis of words into account. It returns the lemma, which is the base form shared by all of a word's inflectional forms. Lemmatization reduces text ambiguity and helps in building better machine learning features.
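To make the contrast concrete, here is a toy sketch in plain Python (no NLTK required). The hypothetical `naive_stem` function blindly strips a few common suffixes, while the hypothetical `toy_lemmatize` function looks words up in a tiny hand-written `LEMMA_TABLE` that stands in for WordNet. None of these names are NLTK APIs; real stemmers such as Porter's are far more elaborate.

```python
# Toy comparison of stemming vs. lemmatization (illustrative only).
# naive_stem, toy_lemmatize and LEMMA_TABLE are hypothetical helpers.

SUFFIXES = ("ing", "ies", "ed", "es", "s")

def naive_stem(word: str) -> str:
    """Chop the first matching suffix off the end of the word."""
    for suffix in SUFFIXES:
        # keep at least a few characters so very short words survive
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A tiny stand-in for a lexical database: inflected form -> lemma.
LEMMA_TABLE = {
    "studies": "study",
    "studying": "study",
    "better": "good",
    "mice": "mouse",
    "was": "be",
}

def toy_lemmatize(word: str) -> str:
    """Return the dictionary form if known, else the word unchanged."""
    return LEMMA_TABLE.get(word, word)

for w in ["studies", "studying", "mice", "better"]:
    print(f"{w:10} stem={naive_stem(w):8} lemma={toy_lemmatize(w)}")
```

The suffix chopper turns "studies" into the non-word "stud" and cannot relate "mice" or "better" to their base forms, whereas the dictionary lookup returns real words ("study", "mouse", "good").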
Applications of lemmatization include -
- Comprehensive retrieval systems such as search engines.
- Compact indexing.
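As a rough sketch of why search and indexing benefit, the snippet below builds a tiny inverted index in plain Python and normalizes both documents and queries through a toy lemma table, so a query for "cars" matches a document that only says "car". The `LEMMAS` dictionary and the `normalize`, `build_index` and `search` helpers are hypothetical; a real system would plug in a proper lemmatizer such as NLTK's WordNetLemmatizer.

```python
from collections import defaultdict

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"cars": "car", "called": "call", "calling": "call", "calls": "call"}

def normalize(text):
    """Lowercase, split on whitespace, and map each token to its lemma."""
    return [LEMMAS.get(tok, tok) for tok in text.lower().split()]

def build_index(docs):
    """Map each lemma to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for lemma in normalize(text):
            index[lemma].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every lemma of the query."""
    lemmas = normalize(query)
    if not lemmas:
        return set()
    result = set(index.get(lemmas[0], set()))
    for lemma in lemmas[1:]:
        result &= index.get(lemma, set())
    return result

docs = {1: "I bought a car", 2: "She called me", 3: "Cars everywhere"}
index = build_index(docs)
print(sorted(search(index, "cars")))     # [1, 3]
print(sorted(search(index, "calling")))  # [2]
```

Because the index stores one entry per lemma rather than one per surface form, it is also smaller than an index over raw tokens.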
In short, lemmatization maps words that share the same core meaning but appear in different inflected forms to a single base form. This reduces the number of distinct words in the text, which saves memory as well as computational cost. For example, "car" and "cars" are both converted to the base word "car". Similarly, "call", "calling", and "called" share the common root "call".
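The reduction in vocabulary size can be checked with a few lines of plain Python. This sketch uses a hand-written `LEMMAS` table (a hypothetical stand-in for WordNetLemmatizer) covering the example words car, cars, call, calling and called, and compares the number of distinct tokens before and after lemmatization.

```python
# Hypothetical lemma table covering the example words.
LEMMAS = {"cars": "car", "calling": "call", "called": "call"}

tokens = ["car", "cars", "call", "calling", "called"]
# unknown words fall back to themselves, as lemmatize() does
lemmas = [LEMMAS.get(t, t) for t in tokens]

print(len(set(tokens)), "distinct tokens before")  # 5 distinct tokens before
print(len(set(lemmas)), "distinct tokens after")   # 2 distinct tokens after
print(sorted(set(lemmas)))                         # ['call', 'car']
```

With NLTK's real WordNetLemmatizer, "calling" and "called" reduce to "call" only when pos="v" is passed; with the default noun POS they come back unchanged.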
The NLTK library provides methods to find the root of a word. In this article, we use the nltk.stem.wordnet module for lemmatization. It lemmatizes using WordNet's built-in morphy function. WordNet is a freely available lexical database for the English language that establishes structured semantic relationships between words. The lemmatize() method returns the lemma of the given word if it is found in WordNet; otherwise, it returns the input word unchanged. It has the following syntax -
lemmatize(word, pos='n')
Here, pos is the part-of-speech (POS) tag of the word/token. It is needed because the same word can have a different lemma depending on how it is used in context. If pos is omitted, it defaults to 'n' (noun).
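To illustrate how a POS-dependent lookup with a fallback can behave, here is a simplified, self-contained sketch. It is loosely modelled on the idea behind WordNet's morphy (an exception list plus suffix-detachment rules whose results are checked against a vocabulary), but the tiny `VOCAB`, `EXCEPTIONS` and `RULES` tables and the `toy_lemmatize` function are hypothetical and far smaller than WordNet's. It is not NLTK's implementation.

```python
# Simplified, hypothetical imitation of a morphy-style lemmatizer.
# Real WordNet has far larger vocabularies and exception lists.

VOCAB = {
    "n": {"copy", "screw", "offer", "calling", "mouse"},
    "v": {"copy", "call", "be", "offer"},
    "a": {"big", "good"},
}
EXCEPTIONS = {
    "n": {"mice": ["mouse"]},
    "v": {"was": ["be"]},
    "a": {"bigger": ["big"], "better": ["good"]},
}
# Suffix-detachment rules per POS: (ending to remove, replacement).
RULES = {
    "n": [("s", ""), ("ies", "y"), ("es", "")],
    "v": [("s", ""), ("ies", "y"), ("ed", ""), ("ing", "")],
    "a": [("er", ""), ("est", "")],
}

def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return the shortest candidate found in VOCAB[pos], else the word itself."""
    if word in EXCEPTIONS[pos]:
        # irregular forms are looked up directly
        candidates = [word] + EXCEPTIONS[pos][word]
    else:
        # regular forms: try every rule whose ending matches
        candidates = [word] + [
            word[: -len(old)] + new
            for old, new in RULES[pos]
            if word.endswith(old)
        ]
    found = [c for c in candidates if c in VOCAB[pos]]
    return min(found, key=len) if found else word

print(toy_lemmatize("copies"))            # copy (noun is the default)
print(toy_lemmatize("calling"))           # calling (a noun in VOCAB)
print(toy_lemmatize("calling", pos="v"))  # call
print(toy_lemmatize("bigger", pos="a"))   # big
print(toy_lemmatize("xyzzy"))             # xyzzy (unknown, returned unchanged)
```

The same surface form ("calling") yields different results for different pos values, which is exactly why the real lemmatize() accepts a pos argument.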
Python Lemmatization Example
Here is the first very basic example of nltk lemmatization -
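# import modules
import nltk
from nltk.stem import WordNetLemmatizer
# the WordNet data must be downloaded once before the lemmatizer can be used
nltk.download("wordnet", quiet=True)
lemmatizer = WordNetLemmatizer()
# with no pos argument, each word is looked up as a noun
print("screws :", lemmatizer.lemmatize("screws"))
print("corporate :", lemmatizer.lemmatize("corporate"))
print("offers:", lemmatizer.lemmatize("offers"))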
Output of the above code -
screws : screw
corporate : corporate
offers: offer
NLTK Lemmatizer POS Example
Here, we pass the pos argument to the lemmatize() method. The values 'n', 'v', and 'a' denote noun, verb, and adjective, respectively ('r' is used for adverbs).
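In practice, the POS tag usually comes from a tagger such as nltk.pos_tag(), which returns Penn Treebank tags (NN, VBD, JJ, RB, ...) rather than WordNet's single-letter codes. A small mapping function can bridge the two. The helper name `penn_to_wordnet` is our own invention; the letters it returns ('n', 'v', 'a', 'r') are the values WordNet uses for noun, verb, adjective and adverb (also exposed as wordnet.NOUN, wordnet.VERB, wordnet.ADJ and wordnet.ADV in nltk.corpus). Falling back to 'n' mirrors lemmatize()'s own default.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank POS tag to a WordNet POS letter.

    Hypothetical helper: adjectives (JJ*) -> 'a', verbs (VB*) -> 'v',
    adverbs (RB*) -> 'r', everything else -> 'n' (lemmatize()'s default).
    """
    if tag.startswith("JJ"):
        return "a"
    if tag.startswith("VB"):
        return "v"
    if tag.startswith("RB"):
        return "r"
    return "n"

# (word, Penn tag) pairs in the shape nltk.pos_tag() returns
tagged = [("The", "DT"), ("cats", "NNS"), ("were", "VBD"),
          ("running", "VBG"), ("quickly", "RB"), ("bigger", "JJR")]
print([(word, penn_to_wordnet(tag)) for word, tag in tagged])
```

With NLTK installed, each pair can then be passed on as lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)).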
# import these modules
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download("wordnet", quiet=True)  # WordNet data, needed once
lemmatizer = WordNetLemmatizer()
# the pos argument tells WordNet which part of speech to look the word up as
print("Copies :", lemmatizer.lemmatize("Copies", pos="n"))
print("copies :", lemmatizer.lemmatize("copies", pos="v"))
print("bigger :", lemmatizer.lemmatize("bigger", pos="a"))
print("best :", lemmatizer.lemmatize("best", pos="a"))
The above code returns the following output -
NLTK Lemmatize Sentence
The below code lemmatizes the given sentence -
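# import modules
import nltk
from nltk.stem import WordNetLemmatizer
# download the tokenizer models and WordNet data (needed once);
# newer NLTK releases look for "punkt_tab" instead of "punkt"
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)
nltk.download("wordnet", quiet=True)
lemmatizer = WordNetLemmatizer()
sentence = "There are more planets than stars in our galaxy. The current count orbiting our star: eight."
punctuations = "?:!.,;"
words = nltk.word_tokenize(sentence)
# remove punctuation tokens by building a new list; calling words.remove()
# inside a loop over words would skip the token after each removal
words = [word for word in words if word not in punctuations]
print("{0:20}{1:20}".format("Word", "Lemma"))
for word in words:
    print("{0:20}{1:20}".format(word, lemmatizer.lemmatize(word)))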
When we execute the above code, it returns the following output -