How to find stop words with NLTK in Python
In this post, you will learn how to find NLTK stop words using the Python programming language.
In computing, stop words are words that are filtered out before or after the processing of natural language data (text). The term "stop words" usually refers to the most common words in a language, although there is no single universal list. Stop words are often removed from text before training deep learning and machine learning models because they occur in abundance and therefore provide little or no distinctive information that can be used for classification or clustering. Some tools deliberately avoid removing stop words in order to support phrase search. On their own, stop words do not add much meaning to a sentence.
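As a rough illustration of the removal step described above, here is a minimal sketch that drops stop words from a tokenized sentence. The set SAMPLE_STOPWORDS and the helper remove_stopwords are hypothetical names made up for this example, and the set is a tiny hand-picked stand-in rather than a real stop word list.

```python
# Hypothetical, hand-picked stop words; real lists (such as NLTK's) are much longer
SAMPLE_STOPWORDS = {"the", "is", "a", "on", "of", "and", "to", "in"}

def remove_stopwords(tokens, stop_words):
    """Return the tokens that are not stop words (case-insensitive check)."""
    return [t for t in tokens if t.lower() not in stop_words]

sentence = "The cat is sleeping on a pile of warm laundry"
kept = remove_stopwords(sentence.split(), SAMPLE_STOPWORDS)
print(kept)  # ['cat', 'sleeping', 'pile', 'warm', 'laundry']
```

The words that remain carry most of the sentence's content, which is why this kind of filtering is a common preprocessing step.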
In SEO terminology, stop words are very common words that many search engines skip during crawling or indexing in order to save database space and processing time when handling large amounts of data.
To find the stop words of a language or of a text file, we use the stopwords corpus that is distributed with NLTK. The Natural Language Toolkit (NLTK) is a Python library for analysing text written in natural languages. A corpus is a collection of texts, often labelled with annotations. Here is the command to install the nltk module-
pip install nltk
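Once you've installed NLTK, start the Python interpreter and download the required data by typing the following two commands at the Python prompt. Calling nltk.download() with no arguments opens the interactive downloader, where you can select the stopwords corpus; alternatively, nltk.download('stopwords') fetches just that corpus. The tokenizer examples later in this post also need the Punkt tokenizer data.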
import nltk
nltk.download()
Find the English stop words
import nltk
from nltk.corpus import stopwords
eng_stopwords = set(stopwords.words('english'))
print("List of english stopwords -")
print(eng_stopwords)
print("Length of stopwords -")
print(len(eng_stopwords))
When we run the above program, we get the following output −
List of english stopwords -
{'ourselves', 'an', 'or', 'nor', 'yourself', 'own', 'their', 'any', "should've", 'to', 'there', 'more', 'mustn', 'shan', 'she', 're', "haven't", 'why', 'few', 'were', 'but', "don't", 'doing', "didn't", 'am', 'won', 'ours', 'down', 'most', 'only', 'how', 'did', 'when', 'couldn', 'as', 'ain', "mustn't", 'aren', 'myself', 'below', 'at', 'me', 'further', 'other', "you're", 'just', 'y', 'same', 'than', 'will', "couldn't", 'has', 'm', "won't", 'll', 'that', 'being', "weren't", 'with', 'until', 'above', 'we', 'such', 'not', 'can', 'during', 'haven', 'where', 'd', "you'll", "hasn't", 'weren', 'from', 'having', 'off', "it's", 'don', 'wouldn', 'theirs', "aren't", 'by', 'they', 'yours', 'your', 'now', 'hadn', 'shouldn', 'for', 'are', 'because', 'what', 'after', 'be', 'in', 'then', "shouldn't", 'is', 'the', 'you', 'which', 'through', 'hasn', "wouldn't", 'out', 'and', 'this', 'should', 'those', 'does', 'wasn', 'whom', 'before', 'o', 'him', 't', 'our', "isn't", "wasn't", 'had', 'herself', 'once', 'i', 've', 'under', 'ma', 'of', 'do', "mightn't", 'himself', 'my', 'yourselves', 'themselves', 'he', 'a', 'on', 'too', 'again', 'was', "hadn't", 'her', "that'll", "doesn't", 'these', 'isn', "you'd", 'both', 'didn', 'here', 'who', 'each', 'them', "needn't", 'while', 'very', 's', 'doesn', "shan't", 'mightn', 'against', 'no', 'about', 'all', 'needn', 'hers', 'its', 'itself', 'into', 'it', 'over', 'his', 'between', 'some', 'have', 'if', "you've", 'been', "she's", 'so', 'up'}
Length of stopwords -
179
Similarly, we can find the stop words of the French language.
import nltk
from nltk.corpus import stopwords
fr_stopwords = set(stopwords.words('french'))
print("List of French stopwords -")
print(fr_stopwords)
print("Length of French stopwords -")
print(len(fr_stopwords))
When we run the above program, we get the following output-
List of French stopwords -
{'auraient', 'ses', 'ta', 'eussent', 'ayante', 'mon', 'avais', 'fussent', 'des', 'que', 'avez', 'sa', 'eûtes', 'pour', 'êtes', 'y', 'je', 'fusses', 'furent', 'j', 'eusses', 'eue', 'sur', 'te', 'étant', 'fussiez', 'ai', 'eu', 'à', 'étions', 'mais', 'étaient', 'avions', 'aurai', 'étée', 'suis', 'serez', 'votre', 'lui', 'était', 'avec', 'et', 'es', 'eusse', 'elle', 'aurez', 'de', 'aurais', 'fusse', 'sont', 'son', 'fussions', 'ayons', 'vos', 'ton', 'aura', 'étées', 'soyez', 'par', 'fut', 'qu', 'étantes', 'soient', 'fûmes', 'sommes', 'même', 'une', 'sera', 'seras', 'aviez', 'serons', 'seraient', 'as', 'tu', 'auront', 'étants', 'qui', 'avait', 'il', 'serais', 'étante', 'fût', 'serions', 'eut', 'du', 'ayant', 'auriez', 's', 'eût', 'aurions', 'on', 'mes', 'ne', 'étés', 'étais', 't', 'auras', 'avaient', 'eux', 'étiez', 'le', 'eussiez', 'eues', 'ils', 'pas', 'soit', 'toi', 'd', 'se', 'vous', 'soyons', 'eus', 'les', 'leur', 'aies', 'nos', 'dans', 'serait', 'eurent', 'notre', 'c', 'ce', 'est', 'aux', 'serai', 'm', 'au', 'ou', 'avons', 'nous', 'été', 'ayants', 'ces', 'ma', 'moi', 'ont', 'ayantes', 'eûmes', 'un', 'ait', 'ayez', 'aient', 'seriez', 'aurons', 'en', 'la', 'n', 'l', 'fus', 'eussions', 'sois', 'aurait', 'me', 'aie', 'seront', 'fûtes', 'tes'}
Length of French stopwords -
157
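Because both results are Python sets, ordinary set operations can be used to compare the two languages. The sketch below uses small excerpts copied from the two printed lists above (not the full lists) so that it runs without any corpus download; the names eng_sample and fr_sample are made up for this example. With NLTK available, the same operators should work on eng_stopwords and fr_stopwords directly.

```python
# Small excerpts copied from the two printed lists above (not the full lists)
eng_sample = {"the", "and", "an", "on", "as", "me", "ma", "y", "s", "t"}
fr_sample = {"le", "la", "et", "les", "on", "as", "me", "ma", "y", "s", "t"}

# Tokens that appear in both excerpts
shared = eng_sample & fr_sample
# Tokens that appear only in the English excerpt
only_english = eng_sample - fr_sample

print(sorted(shared))        # ['as', 'ma', 'me', 'on', 's', 't', 'y']
print(sorted(only_english))  # ['an', 'and', 'the']
```

Sorting before printing also gives a stable order, since a set has no guaranteed ordering of its own.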
Find the stop words in a sentence
Here, we find the English stop words in a short piece of text-
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
data = "The adventure turned into a nightmare. The wildness and adventure that are in fishing still recommended it to me."
stopWords = set(stopwords.words('english'))
# Split the text into word and punctuation tokens
words = word_tokenize(data)
# Collect the tokens that are English stop words (the check is case-sensitive)
wordsFiltered = []
for w in words:
    if w in stopWords:
        wordsFiltered.append(w)
print(wordsFiltered)
The above code returns the following-
['into', 'a', 'and', 'that', 'are', 'in', 'it', 'to', 'me']
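Note that the output above does not contain "The": the NLTK English list is all lowercase and the membership test is case-sensitive, so capitalized tokens are skipped. One way to handle this is to lowercase each token before the check, as in the minimal sketch below. To stay runnable without any corpus download, it uses a small hand-picked set (SAMPLE_STOPWORDS) and a simple regular-expression tokenizer as stand-ins. Both are assumptions for illustration, as is the helper name find_stopwords; in practice you would probably swap in stopwords.words('english') and word_tokenize.

```python
import re
from collections import Counter

# Hypothetical stand-in for set(stopwords.words('english')); not the full NLTK list
SAMPLE_STOPWORDS = {"the", "into", "a", "and", "that", "are", "in", "it", "to", "me"}

def find_stopwords(text, stop_words):
    """Return the stop words found in text, lowercased, in order of appearance."""
    # Simple tokenizer: runs of letters and apostrophes (a rough substitute for word_tokenize)
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t.lower() for t in tokens if t.lower() in stop_words]

data = ("The adventure turned into a nightmare. The wildness and adventure "
        "that are in fishing still recommended it to me.")

found = find_stopwords(data, SAMPLE_STOPWORDS)
counts = Counter(found)
print(found)
# ['the', 'into', 'a', 'the', 'and', 'that', 'are', 'in', 'it', 'to', 'me']
print(counts.most_common(1))  # [('the', 2)]
```

Using a Counter on the result also shows how often each stop word occurs, which the plain list does not make obvious.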
Find the stop words in a text file
Suppose we have a text file named demo.txt.
The code below returns all the stop words found in that file.
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# Read the whole file; the with-block closes it automatically
with open("demo.txt", "r") as f:
    data = f.read()
stopWords = set(stopwords.words('english'))
words = word_tokenize(data)
# Collect the tokens that are English stop words (the check is case-sensitive)
wordsFiltered = []
for w in words:
    if w in stopWords:
        wordsFiltered.append(w)
print(wordsFiltered)
The above code returns the following-
['do', 'what', 'are', 'not', 'to', 'do', 'it', 'were', 'you', 'were']
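For larger files it can be useful to read one line at a time rather than loading everything at once, and to report each distinct stop word with its count. The sketch below is one possible approach under several assumptions: the helper count_stopwords_in_file is a hypothetical name, the stop word set is a small stand-in for the NLTK list, and tokens are produced by splitting on whitespace instead of word_tokenize. It writes its own temporary file with made-up sample text so that it runs without demo.txt.

```python
import os
import string
import tempfile
from collections import Counter

# Hypothetical stand-in for set(stopwords.words('english'))
SAMPLE_STOPWORDS = {"do", "what", "are", "not", "to", "it", "were", "you"}

def count_stopwords_in_file(path, stop_words):
    """Count stop word occurrences in a text file, reading one line at a time."""
    counts = Counter()
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            for token in line.split():
                # Strip surrounding punctuation and lowercase before the lookup
                word = token.strip(string.punctuation).lower()
                if word in stop_words:
                    counts[word] += 1
    return counts

# Create a small temporary file; this sample text is made up for the example
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Do what you can.\nIt is not easy to do.\n")
    sample_path = tmp.name

result = count_stopwords_in_file(sample_path, SAMPLE_STOPWORDS)
os.remove(sample_path)  # clean up the temporary file
print(sorted(result.items()))
# [('do', 2), ('it', 1), ('not', 1), ('to', 1), ('what', 1), ('you', 1)]
```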