How to remove stop words in Python
In this article, you will learn two ways to remove stop words in Python. Several Python libraries ship with ready-made stop word lists.
In SEO terms, stop words (such as "the", "a", "an", "in", "at") are the most common words, which most search engines skip in order to save space and time when processing large amounts of data during crawling or indexing. This helps search engines save space in their databases.
Stop words are usually removed from text before training deep learning and machine learning models, since they occur in abundance and therefore provide little to no unique information that can be used for classification or clustering. A few tools deliberately avoid removing stop words in order to support phrase search. We typically remove stop words when performing tasks such as spam filtering, auto-tag generation, language classification, and so forth.
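To see why stop words carry so little signal, it can help to count word frequencies in a short passage. The sketch below uses only the standard library; the sample text and the tiny stop word set are made up for illustration and are not taken from NLTK or spaCy.

```python
import re
from collections import Counter

# Hypothetical sample text and a tiny hand-picked stop word set,
# used only to illustrate the idea.
sample_text = (
    "The cat sat on the mat. The dog sat on the rug. "
    "A cat and a dog can share the mat and the rug."
)
sample_stop_words = {"the", "a", "an", "on", "and", "can"}

# Lowercase the text and pull out alphabetic words.
tokens = re.findall(r"[a-z]+", sample_text.lower())

# Frequencies with and without the stop words.
all_counts = Counter(tokens)
content_counts = Counter(t for t in tokens if t not in sample_stop_words)

# Fraction of all tokens that are stop words.
stop_share = sum(all_counts[w] for w in sample_stop_words) / len(tokens)

print(all_counts.most_common(3))
print(content_counts.most_common(3))
print(round(stop_share, 2))
```

In this toy passage more than half of the tokens are stop words, yet none of them says anything about what the text is about.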
Removing Stop words with Python's NLTK Library
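Natural Language Toolkit (NLTK) is a Python library for analyzing text written in natural languages. Before running the code below, make sure the nltk library is installed on your system; if it is not, please follow the article How to install NLTK for Python on Windows 64-bit. The example also needs NLTK's stop word corpus and tokenizer data, which can be fetched once with nltk.download('stopwords') and nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab' instead).
Here is the code to remove stop words from a given sentence: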
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Adjacent string literals are concatenated, so each piece needs its own
# trailing space to keep the words apart.
sentence = "Our mission is to provide good educational resources to " \
           "technical students, learner, web developers. " \
           "In this platform, we share our knowledge and experience " \
           "through development resources, tutorials and " \
           "multiple interview questions and exercises."

# NLTK's English stop word list (all lowercase)
stopWords = set(stopwords.words('english'))
words = word_tokenize(sentence)
wordsFiltered = []

# Remove stop words. The comparison is case-sensitive, so capitalized
# tokens such as 'Our' and 'In' are kept.
for w in words:
    if w not in stopWords:
        wordsFiltered.append(w)

print(wordsFiltered)
The above code returns the following:
(env) c:\python37\Scripts\projects>stopwords.py
['Our', 'mission', 'provide', 'good', 'educational', 'resources', 'technical', 'students', ',', 'learner', ',', 'web', 'developers', '.', 'In', 'platform', ',', 'share', 'knowledge', 'experience', 'development', 'resources', ',', 'tutorials', 'multiple', 'interview', 'questions', 'exercises', '.']
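The output above still contains 'Our', 'In' and punctuation tokens, because the membership test is case-sensitive and NLTK's English list holds lowercase words only. One possible refinement, sketched below, lowercases each token before the lookup and skips punctuation-only tokens. Here filter_tokens is a hypothetical helper, and the small stop_words set is an assumption that stands in for set(stopwords.words('english')) so the snippet runs without NLTK data.

```python
import string

def filter_tokens(tokens, stop_words):
    """Drop stop words (case-insensitively) and punctuation-only tokens."""
    kept = []
    for token in tokens:
        if token.lower() in stop_words:
            continue  # stop word, regardless of capitalization
        if all(ch in string.punctuation for ch in token):
            continue  # tokens such as ',' or '.'
        kept.append(token)
    return kept

# Stand-in for set(stopwords.words('english')); an assumption for this sketch.
stop_words = {"our", "is", "to", "in", "this", "we", "and"}

# A shortened token list in the style of word_tokenize output.
tokens = ["Our", "mission", "is", "to", "provide", "good", "resources", ".",
          "In", "this", "platform", ",", "we", "share", "knowledge", "."]

print(filter_tokens(tokens, stop_words))
```

To use it with the NLTK example, pass the result of word_tokenize as tokens and the NLTK stop word set as stop_words.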
Removing Stop words with Python's SpaCy Library
SpaCy is a free, open-source, advanced Python library for Natural Language Processing. It's written in Cython. We can install SpaCy using the Python package manager pip, preferably in a virtual environment. To learn more about the virtual environment and pip, click on the link Install Virtual Environment.
pip install spacy
The example below also loads the small English model en_core_web_sm, which has to be downloaded separately:
python -m spacy download en_core_web_sm
The following code removes all stop words from a given sentence:
import spacy

# Load the small English pipeline
nlp = spacy.load('en_core_web_sm')

text = "Success is walking from failure to failure with no loss of enthusiasm."
words = nlp(text)

# Collect the text of every token
token_list = []
for token in words:
    token_list.append(token.text)

# Keep only the tokens whose vocabulary entry is not flagged as a stop word
wordsFiltered = []
for word in token_list:
    lexeme = nlp.vocab[word]
    if not lexeme.is_stop:
        wordsFiltered.append(word)

print(token_list)
print(wordsFiltered)
The above code returns the following:
['Success', 'is', 'walking', 'from', 'failure', 'to', 'failure', 'with', 'no', 'loss', 'of', 'enthusiasm', '.']
['Success', 'walking', 'failure', 'failure', 'loss', 'enthusiasm', '.']
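Notice that 'no' was dropped from the output, which changes the meaning of "no loss of enthusiasm". When negations matter for the task, one option is to subtract them from the stop word set before filtering. The sketch below shows this with plain Python sets; base_stop_words is a small hypothetical subset rather than spaCy's real list, and keep_words is an illustrative name.

```python
# Hypothetical subset of a stop word list; the real spaCy and NLTK
# lists are much longer.
base_stop_words = {"is", "from", "to", "with", "no", "of"}

# Words that should survive filtering even though they are usually
# treated as stop words.
keep_words = {"no"}

# Set difference gives the customized stop word list.
custom_stop_words = base_stop_words - keep_words

tokens = ["Success", "is", "walking", "from", "failure", "to", "failure",
          "with", "no", "loss", "of", "enthusiasm", "."]

filtered = [t for t in tokens if t.lower() not in custom_stop_words]
print(filtered)
```

With NLTK the same subtraction can be applied to set(stopwords.words('english')). In spaCy, setting nlp.vocab["no"].is_stop = False is a commonly used way to get a similar effect.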