The optional `truth_transform` and `hypothesis_transform` arguments apply
pre-processing to the ground truth and hypothesis input, respectively. Note that
each transform should ALWAYS include `SentencesToListOfWords`, because a list of
words is the expected input.
:param truth: the ground-truth sentence(s) as a string or list of strings
:param hypothesis: the hypothesis sentence(s) as a string or list of strings
:param truth_transform: the transformation to apply to the ground-truth input
:param hypothesis_transform: the transformation to apply to the hypothesis input
:return: the WER as a floating-point number (0 is a perfect match; the value
can exceed 1 when the hypothesis contains many insertions)
"""
# deal with old API
if "standardize" in kwargs:
    truth = _standardize_transform(truth)
    hypothesis = _standardize_transform(hypothesis)
if "words_to_filter" in kwargs:
    t = tr.RemoveSpecificWords(kwargs["words_to_filter"])
    truth = t(truth)
    hypothesis = t(hypothesis)
# Apply transforms. By default, it collapses input to a list of words
truth = truth_transform(truth)
hypothesis = hypothesis_transform(hypothesis)
# raise an error if the ground truth is empty
if len(truth) == 0:
    raise ValueError("the ground truth cannot be empty")
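# tokenize each word into an integer: give every unique word an integer id,
# then encode each id as a single character via chr(), so the truth becomes
# a sequence with one character per word
vocabulary = set(truth + hypothesis)
word2char = dict(zip(vocabulary, range(len(vocabulary))))
truth_chars = [chr(word2char[w]) for w in truth]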