perl-IO-HTML

Open an HTML file with automatic charset detection

IO::HTML provides an easy way to open a file containing HTML while automatically determining its encoding. It uses the HTML5 encoding sniffing algorithm specified in section 8.2.2.2 of the draft standard.

The algorithm as implemented here is:

1. If the file begins with a byte order mark indicating UTF-16LE, UTF-16BE, or UTF-8, then that is the encoding.

2. If the first '$bytes_to_check' bytes of the file contain a '<meta>' tag that indicates the charset, and Encode recognizes the specified charset name, then that is the encoding. (This portion of the algorithm is implemented by 'find_charset_in'.)

   The '<meta>' tag can be in one of two formats:

       <meta charset="...">
       <meta http-equiv="Content-Type" content="...charset=...">

   The search is case-insensitive, and the order of attributes within the tag is irrelevant. Any additional attributes of the tag are ignored. The first matching tag with a recognized encoding ends the search.

3. If the first '$bytes_to_check' bytes of the file are valid UTF-8 (with at least one non-ASCII character), then the encoding is UTF-8.

4. If all else fails, use the default character encoding. The HTML5 standard suggests the default encoding should be locale dependent, but currently it is always 'cp1252' unless you set '$IO::HTML::default_encoding' to a different value.

   Note: 'sniff_encoding' does not apply this step; only 'html_file' does.
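The four steps above can be illustrated with a minimal Python sketch. This is not the module's Perl implementation: the function name 'sniff', the constant 'DEFAULT_ENCODING', and the simplified regular expressions are assumptions made for illustration, and Python's codec registry stands in for Encode when checking whether a charset name is recognized. The caller is assumed to pass only the leading bytes of the file (the role played by '$bytes_to_check').

```python
import codecs
import re

# Hypothetical constant mirroring the documented default of 'cp1252'.
DEFAULT_ENCODING = "cp1252"

# Simplified patterns (assumptions): any <meta ...> tag, and a charset=NAME
# pair anywhere inside it, which covers both documented tag formats.
_META_RE = re.compile(rb"<meta\b[^>]*>", re.IGNORECASE)
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)


def sniff(data, default=DEFAULT_ENCODING):
    """Guess the encoding of the leading bytes of an HTML file."""
    # Step 1: a byte order mark decides the encoding outright.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # Step 2: the first <meta> tag naming a recognized charset wins.
    for tag in _META_RE.finditer(data):
        match = _CHARSET_RE.search(tag.group(0))
        if match is None:
            continue
        try:
            return codecs.lookup(match.group(1).decode("ascii")).name
        except LookupError:
            continue  # unrecognized name: keep searching later tags

    # Step 3: valid UTF-8 containing at least one non-ASCII byte.
    if any(byte >= 0x80 for byte in data):
        try:
            # final=False tolerates a multi-byte character cut off at the
            # end of the prefix instead of treating it as invalid.
            codecs.getincrementaldecoder("utf-8")().decode(data, final=False)
            return "utf-8"
        except UnicodeDecodeError:
            pass

    # Step 4: fall back to the default encoding.
    return default


if __name__ == "__main__":
    print(sniff(b'<html><head><meta charset="utf-8"></head>'))
    print(sniff(b"<p>plain ASCII</p>"))
```

Unlike the documented 'sniff_encoding', this sketch folds the fallback of step 4 into the same function, so it behaves more like the encoding choice described for 'html_file'.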

There is no official package available for openSUSE Leap 15.5

Distributions

openSUSE Tumbleweed

openSUSE Leap 15.6

openSUSE Leap 15.5

openSUSE Leap 15.4

Unsupported distributions

The following distributions are not officially supported. Use these packages at your own risk.