How to use the langcodes.closest_match function in langcodes

To help you get started, we’ve selected a few langcodes examples, based on popular ways the library is used in public projects.


Source: LuminosoInsight/wordfreq, wordfreq/__init__.py (GitHub)
Because we use the `langcodes` module, we can handle slight
    variations in language codes. For example, looking for 'pt-BR',
    'pt_br', or even 'PT_BR' will get you the 'pt' (Portuguese) list.
    Looking up the alternate code 'por' will also get the same list.
    """
    if match_cutoff is not None:
        warnings.warn(
            "The `match_cutoff` parameter is deprecated",
            DeprecationWarning
        )
    available = available_languages(wordlist)

    # TODO: decrease the maximum distance. This distance is so high just
    # because it allows a test where 'yue' matches 'zh', and maybe the
    # distance between those is high because they shouldn't match.
    best, _distance = langcodes.closest_match(
        lang, list(available), max_distance=70
    )
    if best == 'und':
        raise LookupError("No wordlist %r available for language %r"
                          % (wordlist, lang))

    if best != lang:
        logger.warning(
            "You asked for word frequencies in language %r. Using the "
            "nearest match, which is %r (%s)."
            % (lang, best, langcodes.get(best).language_name('en'))
        )

    return read_cBpack(available[best])
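
The excerpt above follows a simple pattern: ask a matcher for the closest supported tag, treat the sentinel 'und' as "no match", and otherwise use the returned tag. The sketch below isolates that pattern so it can be run without wordfreq's data files. The names `normalize_tag`, `toy_closest_match`, and `resolve_language` are hypothetical and do not exist in wordfreq or langcodes. The toy matcher only mimics the call shape used in the excerpt (a desired tag, a list of supported tags, and a `max_distance` keyword, returning a `(tag, distance)` pair); its distances are made up for illustration and are not the values langcodes computes.

```python
def normalize_tag(tag):
    """Hypothetical helper: lowercase and hyphenate, so 'PT_BR' becomes 'pt-br'."""
    return tag.replace("_", "-").lower()


def toy_closest_match(desired, supported, max_distance=25):
    """Toy stand-in with the same call shape as the matcher in the excerpt.

    Returns (tag, distance). An exact match (after normalization) scores 0,
    a shared primary language subtag scores an arbitrary 10, and anything
    else is rejected. These numbers are illustrative only.
    """
    wanted = normalize_tag(desired)
    candidates = []
    for tag in supported:
        norm = normalize_tag(tag)
        if norm == wanted:
            candidates.append((0, tag))
        elif norm.split("-")[0] == wanted.split("-")[0]:
            candidates.append((10, tag))
    # The toy's own acceptance rule: keep candidates within max_distance.
    candidates = [c for c in candidates if c[0] <= max_distance]
    if not candidates:
        # Same sentinel the excerpt checks for: 'und' means "no match".
        return "und", 1000
    distance, tag = min(candidates)
    return tag, distance


def resolve_language(lang, available, matcher=toy_closest_match, max_distance=70):
    """Pick the best available tag for `lang`, or raise LookupError."""
    best, distance = matcher(lang, list(available), max_distance=max_distance)
    if best == "und":
        raise LookupError("No entry available for language %r" % (lang,))
    return best, distance


if __name__ == "__main__":
    available = {"pt": "pt.bin", "en": "en.bin"}  # hypothetical file names
    print(resolve_language("PT_BR", available))
    print(resolve_language("en", available))
```

Because the matcher is passed in as a parameter, the real function can be swapped in where langcodes is installed, for example `resolve_language("pt-BR", available, matcher=langcodes.closest_match)`, assuming it keeps the `(tag, distance)` return shape shown in the excerpt.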