How to use the langcodes.Language class in langcodes

To help you get started, we’ve selected a few langcodes examples, based on popular ways it is used in public projects.


GitHub: luigigubello / My-Twitter-World / tweets-analysis / tweets_analysis.py
				year_array = copy.deepcopy(year)
				volume_tweet[volume_years[j]+1] = year_array
			j += 1
	if volume_creation != {}:
		volume_years = sorted(volume_creation.keys())
		j = 0
		while j+1 < len(volume_years):
			if volume_years[j+1] - volume_years[j] > 1:
				year_array = copy.deepcopy(creation_year)
				volume_creation[volume_years[j]+1] = year_array
			j += 1
	if language_count != {}:
		lang_keys = list(language_count)
		for item in lang_keys:
			lang_tw = langcodes.standardize_tag(item)
			new_key = langcodes.Language.make(language=lang_tw).language_name()
			language_count[new_key] = language_count[item]
			del language_count[item]
		lang_keys = list(language_count)
		for item in lang_keys:
			if item in language_dictionary:
				language_dictionary[item] += language_count[item]
			else:
				language_dictionary[item] = language_count[item]
	return [volume_tweet, daily_rhytm, volume_creation, user_agent, retweeted_user, hashtag, language_dictionary, account_count]
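
The renaming loop above replaces each language tag with a display name and then sums the counts. A minimal sketch of the same idea follows, assuming it is acceptable to add counts together when two tags map to the same name. The helper names merge_counts_by_name and langcodes_name are hypothetical and do not come from the project above. The sketch also uses Language.get, which parses a full tag, in place of Language.make(language=...); that is a substitution, not the original author's method. Recent langcodes releases may need the optional language_data package before language_name() returns full names.

```python
from collections import Counter


def merge_counts_by_name(counts, to_name):
    """Re-key a {tag: count} mapping by to_name(tag), summing on collisions.

    Hypothetical helper: `to_name` is any callable that maps a tag to the
    key it should be counted under.
    """
    merged = Counter()
    for tag, count in counts.items():
        # Tags that normalize to the same name are added, not overwritten.
        merged[to_name(tag)] += count
    return dict(merged)


def langcodes_name(tag):
    """Map a language tag to a display name using langcodes (hypothetical helper)."""
    # Imported lazily so the merging helper works without the third-party package.
    import langcodes

    standardized = langcodes.standardize_tag(tag)
    return langcodes.Language.get(standardized).language_name()
```

With langcodes installed, merge_counts_by_name(language_count, langcodes_name) would return a new dictionary keyed by language name and leave language_count itself unmodified.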
GitHub: LuminosoInsight / wordfreq / wordfreq / language_info.py
'transliteration': 'sr-Latn', 'az-Latn', or None

        Indicates a type of transliteration that we should use for normalizing
        a multi-script language. 'sr-Latn' means to use Serbian romanization,
        and 'az-Latn' means to use Azerbaijani romanization.

    'lookup_transliteration': 'zh-Hans' or None

        Indicates a lossy transliteration that should not be used for output,
        but should be applied when looking up words in a list. 'zh-Hans' means
        that we should convert Traditional Chinese characters to Simplified.
    """
    # The input is probably a string, so parse it into a Language. If it's
    # already a Language, it will pass through.
    language = Language.get(language)

    # Assume additional things about the language, such as what script it's in,
    # using the "likely subtags" table
    language_full = language.maximize()

    # Start the `info` dictionary with default values, including the 'script'
    # value that we now know from `language_full`.
    info = {
        'script': language_full.script,
        'tokenizer': 'regex',
        'normal_form': 'NFKC',
        'remove_marks': False,
        'dotless_i': False,
        'diacritics_under': None,
        'transliteration': None,
        'lookup_transliteration': None