How to use the langdetect.utils.unicode_block.unicode_block function in langdetect

To help you get started, we’ve selected a few langdetect examples, based on popular ways it is used in public projects.

Mimino666 / langdetect / langdetect / detector.py

def cleaning_text(self):
        '''Cleaning text to detect
        (eliminate URL, e-mail address and Latin sentence if it is not written in Latin alphabet).
        '''
        latin_count, non_latin_count = 0, 0
        for ch in self.text:
            if 'A' <= ch <= 'z':
                latin_count += 1
            elif ch >= six.u('\u0300') and unicode_block(ch) != 'Latin Extended Additional':
                non_latin_count += 1

        if latin_count * 2 < non_latin_count:
            text_without_latin = ''
            for ch in self.text:
                if ch < 'A' or 'z' < ch:
                    text_without_latin += ch
            self.text = text_without_latin
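
The heuristic in the excerpt above can be tried in isolation with a minimal, self-contained sketch. It assumes Python 3, so plain string literals stand in for `six.u(...)`. The names `lookup_block` and `strip_latin_if_minority` and the abbreviated `_BLOCKS` table are hypothetical stand-ins written for this illustration. They are not langdetect's `unicode_block` or its block data, which cover far more of Unicode.

```python
import bisect

# Hypothetical, heavily abbreviated block table: (start, end, name).
# Illustration only; this is not langdetect's own table.
_BLOCKS = [
    (0x0000, 0x007F, 'Basic Latin'),
    (0x0080, 0x00FF, 'Latin-1 Supplement'),
    (0x0100, 0x017F, 'Latin Extended-A'),
    (0x0180, 0x024F, 'Latin Extended-B'),
    (0x0400, 0x04FF, 'Cyrillic'),
    (0x0600, 0x06FF, 'Arabic'),
    (0x1E00, 0x1EFF, 'Latin Extended Additional'),
    (0x2000, 0x206F, 'General Punctuation'),
]
_STARTS = [start for start, _, _ in _BLOCKS]


def lookup_block(ch):
    """Return the block name for ch, or None if it is not in the small table."""
    cp = ord(ch)
    # Find the last block whose start is <= cp, then check its upper bound.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0:
        start, end, name = _BLOCKS[i]
        if start <= cp <= end:
            return name
    return None


def strip_latin_if_minority(text):
    """Drop characters in the 'A'..'z' range when non-Latin characters dominate."""
    latin_count, non_latin_count = 0, 0
    for ch in text:
        if 'A' <= ch <= 'z':
            latin_count += 1
        elif ch >= '\u0300' and lookup_block(ch) != 'Latin Extended Additional':
            non_latin_count += 1

    # Same threshold as the excerpt: non-Latin must exceed twice the Latin count.
    if latin_count * 2 < non_latin_count:
        return ''.join(ch for ch in text if ch < 'A' or 'z' < ch)
    return text


print(repr(strip_latin_if_minority('abc привет мир!')))
print(repr(strip_latin_if_minority('hello мир')))
```

In the first call the nine Cyrillic letters outnumber twice the three Latin ones, so `abc` is dropped. In the second call the Latin letters dominate and the text is returned unchanged. Characters in the Latin Extended Additional block (used for Vietnamese, for example) are deliberately not counted as non-Latin.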

Mimino666 / langdetect / langdetect / utils / ngram.py

def normalize(cls, ch):
        block = unicode_block(ch)
        if block == UNICODE_BASIC_LATIN:
            if ch < 'A' or ('Z' < ch < 'a') or 'z' < ch:
                ch = ' '
        elif block == UNICODE_LATIN_1_SUPPLEMENT:
            if cls.LATIN1_EXCLUDED.find(ch) >= 0:
                ch = ' '
        elif block == UNICODE_LATIN_EXTENDED_B:
            # normalization for Romanian
            if ch == six.u('\u0219'):  # Small S with comma below => with cedilla
                ch = six.u('\u015f')
            if ch == six.u('\u021b'):  # Small T with comma below => with cedilla
                ch = six.u('\u0163')
        elif block == UNICODE_GENERAL_PUNCTUATION:
            ch = ' '
        elif block == UNICODE_ARABIC:
            if ch == six.u('\u06cc'):
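
The excerpt above ends partway through the Arabic branch. The branches that are fully visible can be approximated with a rough, self-contained sketch, assuming Python 3. The function name `normalize_char` is hypothetical, and the block checks use hard-coded code point ranges rather than langdetect's `unicode_block` and `UNICODE_*` constants. The Latin-1 Supplement exclusion list and the Arabic case are left out because their details are not shown in the excerpt.

```python
# Romanian letters with comma below are folded to their cedilla counterparts,
# mirroring the two mappings visible in the excerpt.
_ROMANIAN_FOLD = {
    '\u0219': '\u015f',  # small s with comma below -> small s with cedilla
    '\u021b': '\u0163',  # small t with comma below -> small t with cedilla
}


def normalize_char(ch):
    """Normalize one character (illustrative subset of the excerpt's branches)."""
    cp = ord(ch)
    if cp <= 0x007F:
        # Basic Latin: keep only A-Z and a-z, replace everything else by a space.
        if ch < 'A' or ('Z' < ch < 'a') or 'z' < ch:
            return ' '
        return ch
    if 0x0180 <= cp <= 0x024F:
        # Latin Extended-B: apply the Romanian folding, leave other letters alone.
        return _ROMANIAN_FOLD.get(ch, ch)
    if 0x2000 <= cp <= 0x206F:
        # General Punctuation: always replaced by a space.
        return ' '
    # Every other block is passed through unchanged in this sketch.
    return ch


sample = 'a1\u0219\u021b\u2014z'
print(repr(''.join(normalize_char(c) for c in sample)))
```

Returning early from each branch, instead of reassigning `ch` as the excerpt does, is a stylistic choice for the sketch and does not change the result for the branches covered here.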
