perl-WWW-RobotRules

database of robots.txt-derived permissions

This module parses _/robots.txt_ files as specified in "A Standard for Robot Exclusion", at <http://www.robotstxt.org/wc/norobots.html>. Webmasters can use the _/robots.txt_ file to forbid conforming robots from accessing parts of their web site. The parsed files are kept in a WWW::RobotRules object, and this object provides methods to check if access to a given URL is prohibited. The same WWW::RobotRules object can be used for one or more parsed _/robots.txt_ files on any number of hosts.

The following methods are provided:

* $rules = WWW::RobotRules->new($robot_name)
  This is the constructor for WWW::RobotRules objects. The first argument given to new() is the name of the robot.
* $rules->parse($robot_txt_url, $content, $fresh_until)
  The parse() method takes as arguments the URL that was used to retrieve the _/robots.txt_ file, and the contents of the file.
* $rules->allowed($uri)
  Returns TRUE if this robot is allowed to retrieve this URL.
* $rules->agent([$name])
  Get/set the agent name. NOTE: Changing the agent name will clear the robots.txt rules and expire times out of the cache.
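A minimal usage sketch of the methods above, assuming WWW::RobotRules and LWP::Simple are installed; the robot name "MyBot/1.0" and the host "some.place" are illustrative placeholders, not values from this package:

```perl
use strict;
use warnings;
use WWW::RobotRules;
use LWP::Simple qw(get);

# Construct a rules object for our robot; the name is matched
# against User-agent lines in robots.txt (illustrative name)
my $rules = WWW::RobotRules->new('MyBot/1.0');

# Fetch a site's /robots.txt and feed it to parse()
my $url        = 'http://some.place/robots.txt';
my $robots_txt = get($url);
$rules->parse($url, $robots_txt) if defined $robots_txt;

# Later, before fetching any URL on that host, check allowed()
if ($rules->allowed('http://some.place/some/path.html')) {
    # safe to retrieve this URL
}
```

The same $rules object can be fed the _/robots.txt_ files of additional hosts via further parse() calls; allowed() then consults the rules for whichever host the queried URL belongs to.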

There is no official package available for openSUSE Leap 15.4

Distributions

openSUSE Tumbleweed

openSUSE Leap 15.4

openSUSE Leap 15.2

SUSE SLE-15-SP2

SUSE SLE-15-SP1

RedHat RHEL-7

Unsupported distributions

The following distributions are not officially supported. Use packages from these repositories at your own risk.