Syntax: one or more user-agent strings, one per line
This is a list of user agents to respect when checking robots.txt on a site. The robots.txt group with the User-agent string that is a case-insensitive substring of the earliest agent listed in Robots.txt Agents will be used; i.e. the Robots.txt Agents should be listed highest-priority first. If multiple robots.txt groups match the same agent, the group with the longest substring-matching User-agent is used. If no agents match, and a group for agent "*" is present, it is used. The default value for this setting is "thunderstonesa".
For example, changing this setting to MyBot and Googlebot, and given this robots.txt file:

User-agent: Google
Disallow: /some/google/dir
User-agent: MyBot
Disallow: /some/other/dir

then the Parametric Search Appliance will not crawl /some/other/dir, but will still crawl /some/google/dir: while both agents substring-match, and Google is a longer substring, MyBot is listed first in Robots.txt Agents and is thus higher priority.
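With the hypothetical select_group sketch above, this first example plays out as:

    groups = {"Google": ["/some/google/dir"], "MyBot": ["/some/other/dir"]}
    select_group(["MyBot", "Googlebot"], groups)  # -> "MyBot"; /some/other/dir is not crawled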
Given this robots.txt with the same setting:

User-agent: Google
Disallow: /some/google/dir
User-agent: Googlebot
Disallow: /some/bot/dir

then the Parametric Search Appliance would not crawl /some/bot/dir, because while both agents substring-match Googlebot, Googlebot is the longer match.
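And the second example, again using the assumed sketch:

    groups = {"Google": ["/some/google/dir"], "Googlebot": ["/some/bot/dir"]}
    select_group(["MyBot", "Googlebot"], groups)  # -> "Googlebot"; /some/bot/dir is not crawled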