How to use the refextract.references.engine.parse_references function in refextract

To help you get started, we’ve selected a few refextract examples, based on popular ways it is used in public projects.

Secure your code as it's written. Use Snyk Code to scan source code in minutes - no build needed - and fix issues immediately.

github inspirehep / refextract / tests / test_engine.py View on Github external
def get_references(ref_line, override_kbs_files=None):
    return parse_references([ref_line], override_kbs_files=override_kbs_files)
github inspirehep / refextract / refextract / references / api.py View on Github external
>>> extract_references_from_string(path, override_kbs_files={'journals': 'my/path/to.kb'})
    """
    docbody = source.split('\n')
    if not is_only_references:
        reflines, dummy, dummy = extract_references_from_fulltext(docbody)
    else:
        refs_info = get_reference_section_beginning(docbody)
        if not refs_info:
            refs_info, dummy = find_numeration_in_body(docbody)
            refs_info['start_line'] = 0
            refs_info['end_line'] = len(docbody) - 1,

        reflines = rebuild_reference_lines(
            docbody, refs_info['marker_pattern'])
    parsed_refs, stats = parse_references(
        reflines,
        recid=recid,
        reference_format=reference_format,
        linker_callback=linker_callback,
        override_kbs_files=override_kbs_files,
    )
    return parsed_refs
github inspirehep / refextract / refextract / references / api.py View on Github external
To override KBs for journal names etc., use ``override_kbs_files``:

    >>> extract_references_from_file(path, override_kbs_files={'journals': 'my/path/to.kb'})

    """
    if not os.path.isfile(path):
        raise FullTextNotAvailableError(u"File not found: '{0}'".format(path))

    docbody = get_plaintext_document_body(path)
    reflines, dummy, dummy = extract_references_from_fulltext(docbody)
    if not reflines:
        docbody = get_plaintext_document_body(path, keep_layout=True)
        reflines, dummy, dummy = extract_references_from_fulltext(docbody)

    parsed_refs, stats = parse_references(
        reflines,
        recid=recid,
        reference_format=reference_format,
        linker_callback=linker_callback,
        override_kbs_files=override_kbs_files,
    )

    if magic.from_file(path, mime=True) == "application/pdf":
        texkeys = extract_texkeys_from_pdf(path)
        if len(texkeys) == len(parsed_refs):
            parsed_refs = [dict(ref, texkey=[key]) for ref, key in zip(parsed_refs, texkeys)]

    return parsed_refs