How to use the presidio-analyzer.analyzer.entity_recognizer.EntityRecognizer.CONTEXT_PREFIX_COUNT attribute in presidio-analyzer

To help you get started, we’ve selected a few presidio-analyzer examples, based on popular ways it is used in public projects.


Source: microsoft/presidio on GitHub, presidio-analyzer/analyzer/entity_recognizer.py
        # since the list of tokens is not necessarily aligned
        # with the actual index of the match, we look for the
        # token index which corresponds to the match
        token_index = EntityRecognizer.find_index_of_match_token(
            word,
            start,
            nlp_artifacts.tokens,
            nlp_artifacts.tokens_indices)

        # token_index belongs to the PII entity; take the preceding n words
        # and the succeeding m words into a context string
        context_str = ''
        context_str = \
            self.__add_n_words_backward(token_index,
                                        EntityRecognizer.CONTEXT_PREFIX_COUNT,
                                        nlp_artifacts.lemmas,
                                        lemmatized_keywords,
                                        context_str)
        context_str = \
            self.__add_n_words_forward(token_index,
                                       EntityRecognizer.CONTEXT_SUFFIX_COUNT,
                                       nlp_artifacts.lemmas,
                                       lemmatized_keywords,
                                       context_str)

        self.logger.debug('Context sentence is: %s', context_str)
        return context_str
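
The excerpt above builds a context string from the words surrounding a match, with CONTEXT_PREFIX_COUNT and CONTEXT_SUFFIX_COUNT controlling how far it looks backward and forward. The following is a minimal standalone sketch of that windowing idea, not Presidio's implementation: the function name `extract_context` and the default window sizes are hypothetical, the matched token itself is assumed to be excluded, and the keyword filtering implied by `lemmatized_keywords` in the excerpt is left out.

```python
from typing import List

# Hypothetical window sizes for illustration only; the real values are
# class attributes on EntityRecognizer and may differ.
CONTEXT_PREFIX_COUNT = 5
CONTEXT_SUFFIX_COUNT = 0


def extract_context(
    lemmas: List[str],
    token_index: int,
    prefix_count: int = CONTEXT_PREFIX_COUNT,
    suffix_count: int = CONTEXT_SUFFIX_COUNT,
) -> str:
    """Join up to prefix_count lemmas before the matched token and up to
    suffix_count lemmas after it (the matched token itself is skipped)."""
    # Clamp the window so it never runs past either end of the token list.
    start = max(0, token_index - prefix_count)
    end = min(len(lemmas), token_index + 1 + suffix_count)

    before = lemmas[start:token_index]
    after = lemmas[token_index + 1:end]
    return " ".join(before + after)


lemmas = ["my", "credit", "card", "number", "be", "4111", "thanks"]
# Look three words back and one word forward from the token at index 5.
context = extract_context(lemmas, token_index=5, prefix_count=3, suffix_count=1)
```

A wider prefix window makes it more likely that a supporting keyword such as "card" falls inside the context, at the cost of picking up unrelated words from earlier in the sentence.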
