How to use the sacremoses.chinese.tradify function in sacremoses

To help you get started, we’ve selected a few sacremoses examples, based on popular ways it is used in public projects.

Secure your code as it's written. Use Snyk Code to scan source code in minutes - no build needed - and fix issues immediately.

github alvations / sacremoses / sacremoses / cli.py View on Github external
def convert_chinese(t2s, processes, encoding, quiet):
    convert = simplify if t2s else tradify
    with click.get_text_stream("stdin", encoding=encoding) as fin:
        with click.get_text_stream("stdout", encoding=encoding) as fout:
            # If it's single process, joblib parallization is slower,
            # so just process line by line normally.
            if processes == 1:
                # TODO: Actually moses_normalize(fin.read()) gives the same output
                #       and it's a lot better but it's inconsistent with the other
                #       preprocessing interfaces, so we're doing it line by line here.
                for line in tqdm(fin.readlines()):
                    # Note: not stripping newlines, so don't need end='\n' when printing to stdout.
                    print(convert(line), end="", file=fout)
            else:
                for outline in parallelize_preprocess(convert, fin.readlines(), processes, progress_bar=(not quiet)):
                    # Note: not stripping newlines, so don't need end='\n' when printing to stdout.
                    print(outline, end="", file=fout)
github alvations / sacremoses / sacremoses / cli.py View on Github external
def convert_chinese(t2s, processes, encoding, quiet):
    convert = simplify if t2s else tradify
    with click.get_text_stream("stdin", encoding=encoding) as fin:
        with click.get_text_stream("stdout", encoding=encoding) as fout:
            # If it's single process, joblib parallization is slower,
            # so just process line by line normally.
            if processes == 1:
                # TODO: Actually moses_normalize(fin.read()) gives the same output
                #       and it's a lot better but it's inconsistent with the other
                #       preprocessing interfaces, so we're doing it line by line here.
                for line in tqdm(fin.readlines()):
                    # Note: not stripping newlines, so don't need end='\n' when printing to stdout.
                    print(convert(line), end="", file=fout)
            else:
                for outline in parallelize_preprocess(convert, fin.readlines(), processes, progress_bar=(not quiet)):
                    # Note: not stripping newlines, so don't need end='\n' when printing to stdout.
                    print(outline, end="", file=fout)