Speed up millions of regex replacements in Python 3

Question

I'm using Python 3.5.2. I have two lists:

- a list of about 750,000 "sentences" (long strings)
- a list of about 20,000 "words" that I would like to delete from my 750,000 sentences

So I have to loop through 750,000 sentences and perform about 20,000 replacements on each, but ONLY if my words are actually "words" and are not part of a larger string of characters.

I am doing this by pre-compiling my words so that they are flanked by the `\b` metacharacter:

```python
import re

# re.escape() makes each word match literally, so characters such as "."
# or "+" inside a word are not interpreted as regex syntax
compiled_words = [re.compile(r'\b' + re.escape(word) + r'\b') for word in my20000words]
```

Then I loop through my "sentences":

```python
cleaned_sentences = []
for sentence in sentences:
    for pattern in compiled_words:
        # every pattern rescans the whole sentence: ~20,000 passes per sentence
        sentence = pattern.sub("", sentence)
    # put sentence into a growing list
    cleaned_sentences.append(sentence)
```
This nested loop processes about 50 sentences per second, which is nice, but it still takes several hours to process all of my sentences (750,000 sentences at 50 per second is roughly 15,000 seconds, a bit over four hours).

Is there a way to use the `str.replace` method (which I believe is faster) while still requiring that replacements only happen at word boundaries?

Alternatively, is there a way to speed up the `re.sub` method? I have already improved the speed marginally by skipping `re.sub` when the length of my word is greater than the length of my sentence, but it's not much of an improvement.

Thank you for any suggestions.
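To make the word-boundary requirement concrete, here is a small sketch contrasting plain `str.replace` with a `\b`-anchored pattern. The sample sentence and word are made up for illustration: `str.replace` also deletes the word when it is part of a longer token, which is exactly what the `\b` version avoids.

```python
import re

# hypothetical sample data
sentence = "the cat concatenated a catalog"
word = "cat"

# str.replace matches the substring anywhere, including inside longer words
replaced = sentence.replace(word, "")

# a \b-anchored pattern only matches "cat" as a whole word
pattern = re.compile(r'\b' + re.escape(word) + r'\b')
bounded = pattern.sub("", sentence)

print(repr(replaced))  # 'the  conenated a alog'
print(repr(bounded))   # 'the  concatenated a catalog'
```

Note that both versions leave the surrounding spaces in place, so removed words show up as doubled spaces.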

Gitika · Answer

One thing you can try is to compile one single pattern like `"\b(word1|word2|word3)\b"`.

Because `re` relies on C code to do the actual matching, moving the loop over the words into the regex engine can make the savings dramatic.

As @pvg pointed out in the comments, it also benefits from single-pass matching: each sentence is scanned once instead of once per word.

If your words are not regex,
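A minimal sketch of the single-pattern idea, assuming the words are literal strings (hence `re.escape`) rather than regex fragments. The `words` and `sentences` lists are hypothetical placeholders for the real 20,000 words and 750,000 sentences. Two details are choices of this sketch rather than part of the answer: it uses a non-capturing group `(?:...)`, since the match is never referenced, and it sorts the alternatives longest first so that a multi-word entry such as `new york` is tried before any shorter entry that happens to be its prefix.

```python
import re

# hypothetical stand-ins for the real word and sentence lists
words = ["cat", "dog", "new york"]
sentences = ["the cat and the dog", "concatenate a catalog", "new york cats"]

# Build one alternation, longest entries first; re.escape keeps each word literal.
alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
combined = re.compile(r"\b(?:" + alternation + r")\b")

# One pass over each sentence instead of one pass per word.
cleaned = [combined.sub("", s) for s in sentences]
print(cleaned)  # ['the  and the ', 'concatenate a catalog', ' cats']
```

The pattern is compiled once and reused for every sentence. How much faster this is on the real data is best checked with a quick timing run on a sample of the sentences.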


