How to use the asrtoolkit.alignment.preprocess_txt.parse_transcript function in asrtoolkit

To help you get started, we’ve selected a few asrtoolkit examples, based on popular ways it is used in public projects.

Secure your code as it's written. Use Snyk Code to scan source code in minutes - no build needed - and fix issues immediately.

github finos / greenkey-asrtoolkit / asrtoolkit / align_json.py View on Github external
def align_json(ref_txt, json_file, filename=None):
    """
    CLI for forced alignment tools
    Using a reference txt file and a hypothesis gk json 
        file, this time-aligns the reference txt file 
        and outputs an STM file
    Input
      ref_txt, str - reference text file containing ground truth
      json_file, str - hypothesis gk JSON file
      filename, str - output STM filename
    """

    ref_tokens = preprocess_txt.parse_transcript(ref_txt)
    gk_json = preprocess_gk_json.preprocess_transcript(json_file)
    segments = align(gk_json, ref_tokens)

    if filename is None:
        filename = basename(sanitize(strip_extension(ref_txt))) + ".stm"

    # fix segment filename and speaker
    for seg in segments:
        seg.filename = strip_extension(filename)
        seg.speaker = strip_extension(filename) + "UnknownSpeaker"

    output = time_aligned_text()
    output.segments = segments
    output.write(filename)