Thunderstone Search Appliance Manual

Robots.txt Agents

Syntax: one or more user-agent strings, one per line

This is a list of user agents to respect when checking robots.txt on a site. The robots.txt group whose User-agent string is a case-insensitive substring of the earliest agent listed in Robots.txt Agents is used, so the Robots.txt Agents should be listed highest-priority first. If multiple robots.txt groups match the same agent, the group with the longest substring-matching User-agent is used. If no agents match and a group for agent "*" is present, that group is used. The default value for this setting is "thunderstonesa".

For example, suppose this setting is changed to MyBot and Googlebot (one per line, in that order) and a site has this robots.txt file:

User-agent: Google
Disallow: /some/google/dir

User-agent: MyBot
Disallow: /some/other/dir

then the Parametric Search Appliance will not crawl /some/other/dir, but will still crawl /some/google/dir. Both User-agent strings substring-match an entry in Robots.txt Agents (Google is a substring of Googlebot), and Google is the longer string, but MyBot is listed first in Robots.txt Agents and is thus higher priority.

Given this robots.txt with the same setting:

User-agent: Google
Disallow: /some/google/dir

User-agent: Googlebot
Disallow: /some/bot/dir

then the Parametric Search Appliance would not crawl /some/bot/dir, but would still crawl /some/google/dir. No group matches MyBot, so Googlebot is checked next; both User-agent strings are substrings of Googlebot, and Googlebot is the longer match.
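
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as documented here, not the appliance's actual implementation; the function name select_group and its arguments are hypothetical.

```python
def select_group(configured_agents, robots_agents):
    """Return the robots.txt User-agent whose group would apply, or None.

    configured_agents: the Robots.txt Agents setting, highest priority first.
    robots_agents: the User-agent strings found in a site's robots.txt.
    Hypothetical helper: a sketch of the documented rule only.
    """
    for agent in configured_agents:  # earliest listed agent wins
        # A group matches when its User-agent is a case-insensitive
        # substring of the configured agent ("*" is handled separately).
        matches = [ua for ua in robots_agents
                   if ua != "*" and ua.lower() in agent.lower()]
        if matches:
            # Several groups match the same agent: longest string wins.
            return max(matches, key=len)
    # No configured agent matched: fall back to the "*" group if present.
    return "*" if "*" in robots_agents else None


setting = ["MyBot", "Googlebot"]
print(select_group(setting, ["Google", "MyBot"]))      # first example
print(select_group(setting, ["Google", "Googlebot"]))  # second example
print(select_group(setting, ["Bingbot", "*"]))         # fallback case
```

With the two robots.txt files above this selects the MyBot group and the Googlebot group respectively; the third call shows the "*" fallback when no listed agent matches.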


Copyright © Thunderstone Software     Last updated: Jul 28 2017
