How to use the tld.utils.process_url function in tld

To help you get started, we’ve selected a few tld examples, based on popular ways it is used in public projects.


GitHub: medialab / ural / ural / lru / stems.py
        if ':' in auth:
            user, password = auth.split(':', 1)
        else:
            user = auth

    # Parsing domain & port
    netloc = netloc.split(':', 1)

    if len(netloc) == 2:
        port = netloc[1]
        lru.append('t:' + port)

    # Need to process TLD?
    if tld_aware:
        domain_parts, non_zero_i, _ = process_url(
            url=parsed_url,
            fail_silently=True,
            fix_protocol=False,
            search_public=True,
            search_private=True
        )
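        # Note: with fail_silently=True, domain_parts can be None when no
        # known TLD matches; this version does not guard against that case.
        tld = '.'.join(domain_parts[non_zero_i:])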
        lru.append('h:' + tld)
        for element in reversed(domain_parts[0:non_zero_i]):
            lru.append('h:' + element)

    else:
        for element in reversed(netloc[0].split('.')):
            lru.append('h:' + element)

    # Path
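The slicing in the excerpt above can be isolated in a small stdlib-only sketch. It assumes, as the excerpt does, that `domain_parts` holds the hostname's dot-separated labels and that `non_zero_i` is the index where the matched suffix begins. `split_host` is a hypothetical helper, not part of tld or ural, and the input below is hand-written rather than produced by `process_url`.

```python
def split_host(domain_parts, non_zero_i):
    """Return (suffix, remaining labels in reverse order).

    Hypothetical helper mirroring the slicing used in the excerpt;
    it is not part of tld or ural.
    """
    suffix = '.'.join(domain_parts[non_zero_i:])
    rest = list(reversed(domain_parts[:non_zero_i]))
    return suffix, rest


# Hand-written input standing in for a process_url result (an assumption).
suffix, rest = split_host(['www', 'example', 'co', 'uk'], 2)

# Build the host stems the same way the excerpt does: suffix first,
# then the remaining labels from right to left.
lru = ['h:' + suffix] + ['h:' + label for label in rest]
```

Keeping the suffix as a single stem means a multi-label suffix such as `co.uk` is treated as one unit rather than split into `uk` and `co`.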
GitHub: medialab / ural / ural / lru / stems.py
    hostname = parsed_url.hostname or ''

    lru = []

    if scheme:
        lru.append('s:' + scheme)

    # Handling port
    if parsed_url.port is not None:
        lru.append('t:' + str(parsed_url.port))

    # Need to process TLD?
    should_process_normally = not tld_aware

    if tld_aware:
        domain_parts, non_zero_i, _ = process_url(
            url=parsed_url,
            fail_silently=True,
            fix_protocol=False,
            search_public=True,
            search_private=True
        )

        if domain_parts is None:
            should_process_normally = True

        else:
            tld = '.'.join(domain_parts[non_zero_i:])
            lru.append('h:' + tld)

            for element in reversed(domain_parts[0:non_zero_i]):
                lru.append('h:' + element)
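The excerpt above is cut off before the fallback branch, so the following is only a sketch of the overall flow, using the standard library alone. `lru_host_stems` is a hypothetical function, and its `domain_parts` and `non_zero_i` arguments stand in for the first two values returned by `process_url`. The fallback (splitting the hostname on dots) is borrowed from the `else` branch of the earlier excerpt and is an assumption about what the truncated remainder does.

```python
from urllib.parse import urlsplit


def lru_host_stems(url, domain_parts=None, non_zero_i=None):
    """Build scheme, port and host stems for a URL (hypothetical sketch).

    domain_parts / non_zero_i stand in for a process_url result;
    pass None to simulate a lookup that found no known TLD.
    """
    parsed = urlsplit(url)
    lru = []

    if parsed.scheme:
        lru.append('s:' + parsed.scheme)

    if parsed.port is not None:
        lru.append('t:' + str(parsed.port))

    if domain_parts is None:
        # Fallback (assumed): treat every hostname label as its own stem.
        hostname = parsed.hostname or ''
        labels = hostname.split('.') if hostname else []
        for label in reversed(labels):
            lru.append('h:' + label)
    else:
        # TLD-aware path: the suffix is kept together as one stem.
        lru.append('h:' + '.'.join(domain_parts[non_zero_i:]))
        for label in reversed(domain_parts[:non_zero_i]):
            lru.append('h:' + label)

    return lru


no_tld = lru_host_stems('http://localhost:8000/a')
with_tld = lru_host_stems(
    'https://www.example.co.uk/', ['www', 'example', 'co', 'uk'], 2
)
```

The explicit `None` check matters because a silent lookup failure leaves nothing to slice; falling back to plain label splitting keeps hosts such as `localhost` usable.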
GitHub: medialab / ural / ural / is_url.py
        if allow_spaces_in_path:
            pattern = RELAXED_URL_WITH_PROTOCOL_RE
        else:
            pattern = URL_WITH_PROTOCOL_RE
    else:
        if allow_spaces_in_path:
            pattern = RELAXED_URL
        else:
            pattern = URL_RE

    if not pattern.match(string):
        return False

    if tld_aware:
        domain_parts, non_zero_i, parsed_url = process_url(
            url=string,
            fail_silently=True,
            fix_protocol=not require_protocol,
            search_public=True,
            search_private=True
        )

        if domain_parts is None:
            if not parsed_url:
                return False

            return bool(SPECIAL_HOSTS_RE.match(parsed_url.hostname))

    return True
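The decision made in the `tld_aware` branch above can be sketched on its own. The regular expression below is a hypothetical stand-in for ural's `SPECIAL_HOSTS_RE`, whose actual pattern is not shown in the excerpt; here it is assumed to cover `localhost` and dotted IPv4-style hosts only. `host_is_acceptable` is likewise an illustrative name, and unlike the excerpt it also returns `False` when the hostname is missing.

```python
import re

# Hypothetical stand-in for ural's SPECIAL_HOSTS_RE (real pattern not shown).
SPECIAL_HOSTS = re.compile(r'^(localhost|\d{1,3}(\.\d{1,3}){3})$')


def host_is_acceptable(domain_parts, hostname):
    """Mirror the excerpt's decision for the tld_aware case (sketch).

    domain_parts stands in for the first value returned by process_url;
    None simulates a lookup that found no known TLD.
    """
    if domain_parts is not None:
        # A known TLD was found, so the URL is accepted.
        return True

    if not hostname:
        # Nothing to check against the special-host pattern.
        return False

    return bool(SPECIAL_HOSTS.match(hostname))


results = [
    host_is_acceptable(['example', 'com'], 'example.com'),
    host_is_acceptable(None, 'localhost'),
    host_is_acceptable(None, '127.0.0.1'),
    host_is_acceptable(None, 'example.notatld'),
    host_is_acceptable(None, None),
]
```

The special-host check exists because hosts like `localhost` have no public suffix, so a strict TLD lookup alone would reject URLs that are valid in practice.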

tld

Extract the top-level domain (TLD) from the URL given.
