How to use the sister.tokenizers.SimpleTokenizer function in sister

To help you get started, we’ve selected a few sister examples, based on popular ways it is used in public projects.


GitHub: tofunlp / sister / tests /
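import numpy as np
from unittest.mock import patch

def setUp(self):
        # Patch FasttextEmbedding so `embedding` is a mock, not a real fastText model.
        embedding_patcher = patch('sister.word_embedders.FasttextEmbedding')
        embedding = embedding_patcher.start()(lang='en')
        # Mocked lookups return random 300-dimensional vectors.
        embedding.get_word_vector.return_value = np.random.rand(300)
        embedding.get_word_vectors.side_effect = lambda words: np.random.rand(len(words), 300)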

        self.sentence_embedding = MeanEmbedding(
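
The setUp snippet above relies on two features of unittest.mock: return_value, which fixes what a mocked method returns, and side_effect, which computes the result from the actual call arguments. A minimal standalone sketch of that pattern follows. It assumes nothing from sister: the mock is a bare MagicMock rather than a patched FasttextEmbedding, and plain Python lists stand in for NumPy arrays.

```python
import random
from unittest.mock import MagicMock

DIM = 300  # same vector size as in the snippet above

# Stand-in for the patched embedder (hypothetical; no real model is loaded).
embedding = MagicMock()

# return_value: every call returns this one pre-built vector.
embedding.get_word_vector.return_value = [random.random() for _ in range(DIM)]

# side_effect: invoked with the real arguments, so the output length tracks the input.
embedding.get_word_vectors.side_effect = lambda words: [
    [random.random() for _ in range(DIM)] for _ in words
]

vectors = embedding.get_word_vectors(["hello", "world"])
print(len(vectors), len(vectors[0]))  # prints: 2 300
```

Because return_value is evaluated once, get_word_vector hands back the same vector for every word, while the side_effect lambda produces a fresh batch per call.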
GitHub: tofunlp / sister / sister /
def __init__(
            self,
            lang: str = 'en',
            tokenizer: Tokenizer = None,
            word_embedder: WordEmbedding = None) -> None:
        # Fall back to a per-language default when no tokenizer is passed.
        # A `lang` other than "en", "fr" or "ja" raises KeyError here.
        tokenizer = tokenizer or {"en": SimpleTokenizer(),
                                  "fr": SimpleTokenizer(),
                                  "ja": JapaneseTokenizer()}[lang]
        word_embedder = word_embedder or FasttextEmbedding(lang)
        super().__init__(tokenizer, word_embedder)
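
The constructor above chooses a default tokenizer by language when the caller does not pass one. The sketch below illustrates that fallback pattern in isolation. Every name in it (WhitespaceTokenizer, CharTokenizer, DEFAULT_TOKENIZERS, resolve_tokenizer) is hypothetical and not part of sister, and the tokenizing logic is an assumption made for illustration, not the library's implementation. Unlike the original dict of instances, it maps languages to classes, so only the requested tokenizer is constructed.

```python
from typing import Callable, Dict, List, Optional


class WhitespaceTokenizer:
    """Hypothetical stand-in for a simple tokenizer (not sister's class)."""

    def tokenize(self, sentence: str) -> List[str]:
        return sentence.split()


class CharTokenizer:
    """Hypothetical stand-in for a tokenizer with heavier setup."""

    def tokenize(self, sentence: str) -> List[str]:
        return [ch for ch in sentence if not ch.isspace()]


# Map each language to a factory so that only the requested tokenizer is built.
DEFAULT_TOKENIZERS: Dict[str, Callable[[], object]] = {
    "en": WhitespaceTokenizer,
    "fr": WhitespaceTokenizer,
    "ja": CharTokenizer,
}


def resolve_tokenizer(lang: str = "en", tokenizer: Optional[object] = None):
    """Return the caller's tokenizer, else the default for `lang`."""
    if tokenizer is not None:
        return tokenizer
    try:
        return DEFAULT_TOKENIZERS[lang]()
    except KeyError:
        raise ValueError(f"no default tokenizer for lang={lang!r}") from None


print(resolve_tokenizer("en").tokenize("hello big world"))
# prints: ['hello', 'big', 'world']
```

Passing an explicit tokenizer bypasses the table entirely, which mirrors how the constructor lets a test inject its own tokenizer.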