How to use extruct.rdflibxml.host.HostLanguage.html5 in extruct

To help you get started, we’ve selected a few extruct examples, based on popular ways it is used in public projects.


GitHub: scrapinghub / extruct / extruct / rdflibxml / __init__.py
self.options.processor_graph.add_http_context(err, 500)
                return copyErrors(graph, self.options)
            except Exception :
                e = sys.exc_info()[1]
                self.http_status = 500
                # Something nasty happened:-(
                if not rdfOutput : raise e
                err = self.options.add_error(str(e), context = name)
                self.options.processor_graph.add_http_context(err, 500)
                return copyErrors(graph, self.options)

            dom = None
            try :
                msg = ""
                parser = None
                if self.options.host_language == HostLanguage.html5 :
                    import warnings
                    warnings.filterwarnings("ignore", category=DeprecationWarning)
                    import html5lib
                    parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("dom"))
                    if self.charset :
                        # Either the HTTP header provided a charset, or the input
                        # is a local file, which is assumed to be UTF-8
                        dom = parser.parse(input, encoding=self.charset)
                    else :
                        # No charset set. The HTMLLib parser tries to sniff into
                        # the file to find a meta header for the charset; if that
                        # works, fine, otherwise it falls back on window-...
                        dom = parser.parse(input)

                    try :
                        if isstring :
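
The comments in the excerpt above describe an order of precedence for the charset: an explicitly supplied charset wins, otherwise a meta header in the document is consulted, otherwise a fallback is used. The following is a minimal standard-library sketch of that decision order only. It is not extruct's or html5lib's code: `resolve_charset`, its parameters, and the regular expression are hypothetical names introduced here for illustration, and the real sniffing done by the parser is more thorough. Because the excerpt truncates the name of the fallback encoding, the sketch leaves the fallback to the caller.

```python
import re

# Hypothetical helper for illustration; not part of extruct or html5lib.
# Matches both <meta charset="..."> and the http-equiv form with "charset=...".
_META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def resolve_charset(raw, declared=None, fallback=None):
    """Return (encoding, origin) for the bytes in `raw`.

    declared: charset known from outside the document (e.g. an HTTP header).
    fallback: caller-chosen encoding used when nothing else is found.
    """
    if declared:
        # An explicitly supplied charset takes precedence over the document.
        return declared, "declared"
    # Only look near the start of the document (1024 bytes is an assumption).
    match = _META_CHARSET.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii"), "meta"
    return fallback, "fallback"


html = b'<html><head><meta charset="utf-8"></head><body>hi</body></html>'
print(resolve_charset(html, declared="iso-8859-1"))
print(resolve_charset(html))
```

In the excerpt itself this choice is delegated to the parser: when `self.charset` is set it is passed to `parser.parse`, and otherwise the parser is called without it and left to sniff the document.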