How to use the wikiextractor.annotated_wikiextractor.AnnotatedWikiDocument class in wikiextractor

To help you get started, we’ve selected a few wikiextractor examples based on popular ways it is used in public projects.


Source: nournia/wikifier on GitHub, file wikiextractor/annotated_wikiextractor.py
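# ms presumably holds the regex matches for <a href="...">label</a> links in
# wiki_document.text; it is built earlier in the method (not shown here).
for m in ms:
    # Skip links whose target contains an encoded "#" anchor, unless
    # keep_anchors is set (urllib.quote is Python 2; urllib.parse.quote in Python 3).
    if urllib.quote("#") not in m.group(1) or keep_anchors:
        annotations.append({
            "u": m.group(1),                     # link target
            "s": m.group(2),                     # link label (surface form)
            "o": m.start() - deltaStringLength,  # label offset in the link-free text
        })

    # Every link collapses to its label, so later offsets shift left by
    # the length of the removed markup.
    deltaStringLength += len(m.group(0)) - len(m.group(2))

# As a second step, replace all links in the article by their label
wiki_document.text = re.sub('<a href="([^"]+)">([^>]+)</a>', lambda m: m.group(2), wiki_document.text)

# Create a new AnnotatedWikiDocument
annotated_wiki_document = AnnotatedWikiDocument(wiki_document)
annotated_wiki_document.setAnnotations(annotations)

# Return the AnnotatedWikiDocument
return annotated_wiki_document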
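The excerpt above records each link as an annotation whose offset points into the text after links are replaced by their labels. Below is a minimal, self-contained Python 3 sketch of that technique. It assumes links appear exactly as <a href="target">label</a>. The helper name annotate_links is hypothetical, and a plain list of dicts stands in for AnnotatedWikiDocument, whose full API is not shown here. urllib.parse.quote is the Python 3 counterpart of urllib.quote.

```python
import re
from urllib.parse import quote

# Assumed link pattern: group 1 is the link target, group 2 is the label.
LINK_RE = re.compile(r'<a href="([^"]+)">([^>]+)</a>')


def annotate_links(text, keep_anchors=False):
    """Hypothetical helper: return (plain_text, annotations).

    Each annotation is {"u": target, "s": label, "o": offset}, where offset
    is the position of the label in plain_text (the text with links stripped).
    """
    annotations = []
    delta = 0  # characters of markup removed by earlier links
    for m in LINK_RE.finditer(text):
        # Skip targets containing an encoded "#" ("%23"), unless keep_anchors.
        if quote("#") not in m.group(1) or keep_anchors:
            annotations.append({
                "u": m.group(1),
                "s": m.group(2),
                "o": m.start() - delta,
            })
        # Every link, kept or skipped, collapses to its label in plain_text.
        delta += len(m.group(0)) - len(m.group(2))

    plain_text = LINK_RE.sub(lambda m: m.group(2), text)
    return plain_text, annotations


if __name__ == "__main__":
    html = 'See <a href="Python%20(language)">Python</a> and <a href="Guido">Guido</a>.'
    plain, anns = annotate_links(html)
    print(plain)
    for a in anns:
        # Each offset slices the label back out of the plain text.
        assert plain[a["o"]:a["o"] + len(a["s"])] == a["s"]
        print(a)
```

Computing offsets during the single pass over the matches avoids a second search in the stripped text. The running delta is also what keeps offsets correct when the same label occurs more than once.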