
You could use the PyEnchant package to get a list of English words. I will assume that fragments which have no meaning on their own, but do together, belong to one word, and use the following code to find words that are split by a single space:

```python
import enchant

d = enchant.Dict("en_US")
text = "int ernational trade is not good for economies"
fixed_text = []
for i in range(len(words := text.split())):
    # If this piece is not a valid word on its own, but appending it to the
    # previous piece produces a valid word, merge the two.
    if fixed_text and not d.check(words[i]) and d.check(compound_word := ''.join([fixed_text[-1], words[i]])):
        fixed_text[-1] = compound_word
    else:
        fixed_text.append(words[i])

print(' '.join(fixed_text))
```

This splits the text on spaces and appends words to fixed_text, merging a piece into the previous one when the pair forms a valid word. However, since a lot of smaller words are valid in the English language, many spaced-out words don't concatenate correctly. This should help sanitize most of the invalid words, but as the comments mentioned, it is sometimes impossible to tell whether two words belong together without performing some sort of lexical analysis.

As suggested by Pranav Hosangadi, here is a modified (and a little more involved) version which can remove multiple spaces in a word by compounding previously added words that are not in the dictionary. When it finds that a previously added word is not in the dictionary, it merges the next piece into it — either because the merge result is a valid word, or because the next piece is itself invalid — so a word split into several fragments is stitched back together:

```python
import enchant

d = enchant.Dict("en_US")
text = "inte rnatio nal trade is not good for ec onom ies"
fixed_text = []
for i in range(len(words := text.split())):
    # Keep compounding while the previously added word is invalid: merge the
    # current piece into it if the result is a valid word, or if the piece
    # itself is not a valid word either.
    if fixed_text and not d.check(fixed_text[-1]) and (
            d.check(compound_word := ''.join([fixed_text[-1], words[i]]))
            or not d.check(words[i])):
        fixed_text[-1] = compound_word
    else:
        fixed_text.append(words[i])

print(' '.join(fixed_text))
```
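If you want to experiment with the compounding idea without installing PyEnchant, the same logic can be sketched against a plain set of known words. The word list below is a toy one I made up purely for illustration; with a real dictionary you would replace the membership tests with dictionary lookups:

```python
# Toy word list, just for illustration; a real application would use a
# proper dictionary (e.g. PyEnchant's en_US word list) instead.
known_words = {"international", "trade", "is", "not", "good", "for", "economies"}

def fix_spaces(text, dictionary):
    """Merge adjacent pieces while the previously kept piece is not a known word."""
    fixed_text = []
    for word in text.split():
        # Compound when the previous entry is invalid and either the merge
        # result is a known word, or the current piece is invalid too.
        if fixed_text and fixed_text[-1] not in dictionary and (
                (compound := fixed_text[-1] + word) in dictionary
                or word not in dictionary):
            fixed_text[-1] = compound
        else:
            fixed_text.append(word)
    return " ".join(fixed_text)

print(fix_spaces("inte rnatio nal trade is not good for ec onom ies", known_words))
```

The same caveat applies: with a full English word list, short fragments that happen to be valid words ("a", "on", "it", ...) will be kept as words instead of being compounded.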
