perl-IO-HTML

Open an HTML file with automatic charset detection

IO::HTML provides an easy way to open a file containing HTML while automatically determining its encoding. It uses the HTML5 encoding sniffing algorithm specified in section 8.2.2.2 of the draft standard. The algorithm as implemented here is: * 1. If the file begins with a byte order mark indicating UTF-16LE, UTF-16BE, or UTF-8, then that is the encoding. * 2. If the first '$bytes_to_check' bytes of the file contain a '<meta>' tag that indicates the charset, and Encode recognizes the specified charset name, then that is the encoding. (This portion of the algorithm is implemented by 'find_charset_in'.) The '<meta>' tag can be in one of two formats: <meta charset="..."> <meta http-equiv="Content-Type" content="...charset=..."> The search is case-insensitive, and the order of attributes within the tag is irrelevant. Any additional attributes of the tag are ignored. The first matching tag with a recognized encoding ends the search. * 3. If the first '$bytes_to_check' bytes of the file are valid UTF-8 (with at least 1 non-ASCII character), then the encoding is UTF-8. * 4. If all else fails, use the default character encoding. The HTML5 standard suggests the default encoding should be locale dependent, but currently it is always 'cp1252' unless you set '$IO::HTML::default_encoding' to a different value. Note: 'sniff_encoding' does not apply this step; only 'html_file' does that.

Ei virallista pakettia saatavilla: openSUSE Leap 15.5

Jakelut

openSUSE Tumbleweed

openSUSE Leap 15.6

openSUSE Leap 15.5

openSUSE Leap 15.4

Ei-tuetut jakelut

Seuraavia jakeluja ei virallisesti tueta. Käytä näitä paketteja omalla vastuullasi.