How to use the tldextract.PUBLIC_SUFFIX_LIST_URLS constant in tldextract

To help you get started, we’ve selected a few tldextract examples, based on popular ways it is used in public projects.

Source: fygrave / dnslyzer / distributed / dgaworker.py (GitHub)
import tldextract
import dnslib
import base64
import ConfigParser as CFG
import redis
import numpy as np
import zmq

import pika
import datetime
import math
import string
import json
from dgascore import DGAScore

tldextract.PUBLIC_SUFFIX_LIST_URLS = ["file:///data/effective_tld_names01.dat", "file:///data/effective_tld_names02.dat"]

dgascore = DGAScore()

def get_date():
    d = datetime.datetime.now()
    return "%s-%s-%s" % (d.day, d.month, d.year)

def cluster_id(domain_label):
    dom = tldextract.extract(domain_label)
    line = "%s.%s." % (dom.domain, dom.suffix)
    domain = dom.domain.encode('utf-8')
    zone = "nozone"
    if len(dom.suffix) > 1:
        zone = dom.suffix.encode('utf-8')
    try:
        zone = domain[domain.rindex('.')+1:]
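
The snippet above points tldextract at local copies of the suffix list by reassigning a module attribute. Depending on the tldextract release, that reassignment may not reach the default extractor, so a more explicit route is to pass the URLs to `tldextract.TLDExtract` through its `suffix_list_urls` argument. The following is a minimal sketch under that assumption; the helper names `local_suffix_list_urls` and `build_extractor` are hypothetical, and the file paths are simply the ones used in the snippet.

```python
from pathlib import Path

try:
    import tldextract  # third-party; the URL helper below works without it
except ImportError:
    tldextract = None


def local_suffix_list_urls(paths):
    """Turn local file paths into file:// URLs (hypothetical helper)."""
    return [Path(p).absolute().as_uri() for p in paths]


def build_extractor(urls):
    """Create an extractor that reads the suffix list from the given URLs."""
    if tldextract is None:
        raise RuntimeError("tldextract is not installed")
    # suffix_list_urls is a constructor argument of TLDExtract;
    # the list is only read when the extractor is first used.
    return tldextract.TLDExtract(suffix_list_urls=urls)


urls = local_suffix_list_urls([
    "/data/effective_tld_names01.dat",
    "/data/effective_tld_names02.dat",
])
```

An extractor built this way is callable like `tldextract.extract`, so `build_extractor(urls)("forums.bbc.co.uk")` would split the host using the local files instead of the module-level default.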

tldextract

Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well.
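
To make the description above concrete, here is a simplified, self-contained sketch of suffix-based splitting. It is not tldextract's implementation: the two suffix sets are tiny hypothetical stand-ins for the real Public Suffix List, wildcard and exception rules are ignored, and `split_host` and `Parts` are names invented for this illustration.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Hypothetical miniature suffix sets; the real PSL has thousands of rules.
ICANN_SUFFIXES = {"com", "org", "uk", "co.uk"}
PRIVATE_SUFFIXES = {"blogspot.com"}


class Parts(NamedTuple):
    subdomain: str
    domain: str
    suffix: str


def split_host(url, include_private=False):
    """Split a URL's host into subdomain, domain, and public suffix."""
    # urlsplit only fills in hostname when a "//" authority marker is present.
    host = urlsplit(url if "//" in url else "//" + url).hostname or ""
    labels = host.rstrip(".").split(".")
    suffixes = ICANN_SUFFIXES | (PRIVATE_SUFFIXES if include_private else set())
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:  # the whole host is itself a suffix
                return Parts("", "", candidate)
            return Parts(".".join(labels[:i - 1]), labels[i - 1], candidate)
    # No known suffix: treat the last label as the domain.
    return Parts(".".join(labels[:-1]), labels[-1], "")
```

With these sets, `split_host("http://forums.bbc.co.uk/path")` yields `forums`, `bbc`, and `co.uk`. Passing `include_private=True` changes how `foo.blogspot.com` is split, because `blogspot.com` is then treated as a suffix and `foo` becomes the domain.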

BSD-3-Clause