How to use the wordfreq.iter_wordlist function in wordfreq

To help you get started, we’ve selected a few wordfreq examples, based on popular ways it is used in public projects.
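As a rough orientation for the excerpt that follows, here is a minimal, self-contained sketch of the same idea: walk a word list, weight each word's trigrams by its frequency, and do the same for a mojibaked copy. It uses stand-in data so it runs without wordfreq installed. `SAMPLE_FREQS`, `count_trigrams`, the `" %s "` frame, and this version of `get_trigrams` are assumptions made for illustration, not the definitions used in the ftfy script. In the real code the words come from `wordfreq.iter_wordlist(language)` and the weights from `wordfreq.word_frequency(word, language)`.

```python
from collections import Counter

# Stand-in for wordfreq data (hypothetical values, not real frequencies).
# With wordfreq installed, the words would come from
# wordfreq.iter_wordlist(language) and each frequency from
# wordfreq.word_frequency(word, language).
SAMPLE_FREQS = {"café": 0.5, "the": 0.25}


def get_trigrams(text):
    """Hypothetical helper: return every 3-character window of text."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def count_trigrams(word_freqs, frame=" %s "):
    """Weight each trigram of each padded word by that word's frequency."""
    normal = Counter()
    baked = Counter()
    for word, freq in word_freqs.items():
        padded = frame % word
        for trigram in get_trigrams(padded):
            normal[trigram] += freq
        try:
            # Encode as UTF-8 but decode as Latin-1 to simulate mojibake.
            mojibaked = padded.encode("utf-8").decode("latin-1")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Skip words that this encoding pair cannot represent.
            continue
        if mojibaked != padded:
            for trigram in get_trigrams(mojibaked):
                baked[trigram] += freq
    return normal, baked


normal_freqs, baked_freqs = count_trigrams(SAMPLE_FREQS)
```

Only one encoding pair (UTF-8 read as Latin-1) is tried here to keep the sketch short; the excerpt below loops over pairs of encodings instead.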


Source: LuminosoInsight/python-ftfy, scripts/mojibakery.py (GitHub)
def add_language_trigrams(normal_freqs, baked_freqs, language):
    """
    Collect the trigram frequencies of both correct and mojibaked text, using
    word examples from the given language.
    """
    for baseword in wordfreq.iter_wordlist(language):
        freq = wordfreq.word_frequency(baseword, language)
        for word in {baseword, baseword.upper()}:
            if any(letter.isdigit() for letter in word):
                continue
            for frame in FRAMES:
                padded = frame % word
                for trigram in get_trigrams(padded):
                    normal_freqs[trigram] += freq

                for enc1 in COMMON_ENCODINGS + LANGUAGE_ENCODINGS[language]:
                    for enc2 in COMMON_ENCODINGS + LANGUAGE_ENCODINGS[language]:
                        if enc1 != enc2 and (enc1 not in COMMON_ENCODINGS or enc2 not in COMMON_ENCODINGS):
                            try:
                                mojibaked = padded.encode(enc1).decode(enc2)
                                if mojibaked != padded:
                                    for trigram in get_trigrams(mojibaked):