Python Bytes
#276 Tracking cyber intruders with Jupyter and Python
- Author: Various
- Narrator: Various
- Publisher: Podcast
- Duration: 0:45:04
Information:
Synopsis
Watch the live stream on YouTube.

About the show

Sponsored by FusionAuth: pythonbytes.fm/fusionauth

Special guest: Ian Hellen

Brian #1: gensim.parsing.preprocessing

- Problem I’m working on
  - Turn a blog title into a possible URL
  - Example: “Twisted and Testing Event Driven / Asynchronous Applications - Glyph”
  - Would like, perhaps: “twisted-testing-event-driven-asynchronous-applications”
- Sub-problem: remove stop words ← this is the hard part
- I started with an article called Removing Stop Words from Strings in Python
  - It covered how to do this with NLTK, Gensim, and spaCy
- I was most successful with remove_stopwords() from Gensim
  - from gensim.parsing.preprocessing import remove_stopwords
- It’s part of the gensim.parsing.preprocessing package
- I wonder what’s all in there? A treasure trove
- gensim.parsing.preprocessing.preprocess_string is one
  - This function applies filters to a string, with the defaults almost being just what I want:
    - strip_tags()
    - strip_punctuation()
    - strip_multiple_whitespaces()
    - str
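The title-to-URL problem described in these notes can be sketched with the standard library alone. This is a minimal illustration, not Brian's actual solution: the `slugify` function and the tiny `STOP_WORDS` set are hypothetical stand-ins, and Gensim's `remove_stopwords()` uses its own, much larger stop-word list.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
# Gensim's remove_stopwords() relies on its own, far larger list.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "on", "for", "with"})


def slugify(title: str) -> str:
    """Turn a title into a hyphen-separated, lowercase URL slug."""
    # Lowercase, then treat every run of non-alphanumeric characters
    # (spaces, "/", "-", punctuation) as a single word separator.
    words = re.split(r"[^a-z0-9]+", title.lower())
    # Drop empty fragments and stop words, keeping the original order.
    kept = [word for word in words if word and word not in STOP_WORDS]
    return "-".join(kept)


print(slugify("Twisted and Testing Event Driven / Asynchronous Applications - Glyph"))
```

Unlike the wished-for slug in the notes, this sketch keeps the trailing author name ("glyph"); dropping it would need an extra rule, such as cutting the title at the last " - ".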