How to use the refextract.references.regexs.re_tagged_citation.search function in refextract

To help you get started, we’ve selected a few refextract examples, based on popular ways it is used in public projects.

Secure your code as it's written. Use Snyk Code to scan source code in minutes - no build needed - and fix issues immediately.

github inspirehep / refextract / refextract / references / engine.py View on Github external
'publisher')

        elif tag_type == "COLLABORATION":
            identified_citation_element, processed_line, cur_misc_txt = \
                map_tag_to_subfield(tag_type,
                                    processed_line[tag_match_end:],
                                    cur_misc_txt,
                                    'collaboration')

        if identified_citation_element:
            # Append the found tagged data and current misc text
            citation_elements.append(identified_citation_element)
            identified_citation_element = None

        # Look for the next tag in the processed line:
        tag_match = re_tagged_citation.search(processed_line)

    # place any remaining miscellaneous text into the
    # appropriate MARC XML fields:
    cur_misc_txt += processed_line

    # This MISC element will hold the entire citation in the event
    # that no tags were found.
    if len(cur_misc_txt.strip(" .;,")) > 0:
        # Increment the stats counters:
        count_misc += 1
        identified_citation_element = {
            'type': "MISC",
            'misc_txt': cur_misc_txt,
        }
        citation_elements.append(identified_citation_element)
github inspirehep / refextract / refextract / references / engine.py View on Github external
        @param line_marker: (string) The line marker for this single reference line (e.g. [19])
        @param line: (string) The tagged reference line.
        @param identified_dois: (list) a list of dois which were found in this line. The ordering of
        dois corresponds to the ordering of tags in the line, reading from left to right.
        @param identified_urls: (list) a list of urls which were found in this line. The ordering of
        urls corresponds to the ordering of tags in the line, reading from left to right.
        @param which format to use for references,
        roughly "<title> &lt;volume&gt; &lt;page&gt;" or "&lt;title&gt;,&lt;volume&gt;,&lt;page&gt;"
        @return xml_line: (string) the MARC-XML representation of the tagged reference line
        @return count_*: (integer) the number of * (pieces of info) found in the reference line.
    """
    count_misc = count_title = count_reportnum = count_url = count_doi = count_auth_group = 0
    processed_line = line
    cur_misc_txt = u""

    tag_match = re_tagged_citation.search(processed_line)

    # contains a list of dictionary entries of previously cited items
    citation_elements = []
    # the last tag element found when working from left-to-right across the
    # line
    identified_citation_element = None

    while tag_match is not None:
        # While there are tags inside this reference line...
        tag_match_start = tag_match.start()
        tag_match_end = tag_match.end()
        tag_type = tag_match.group(1)
        cur_misc_txt += processed_line[0:tag_match_start]

        # Catches both standard titles, and ibid's
        if tag_type.find("JOURNAL") != -1:</title>