<<Previous: Extra Networks | ↑Up: Advanced Walk Settings | Next>>: Exclusion REX |
Syntax: zero or more regular expressions (REX), separated by space or line break
Restricts a walk to fetching only those URLs that match at least one of the specified regular expressions. The match may occur anywhere in the URL (hostname, path, or query), and the restriction applies only when the Base URL matches.
If a Base URL is matched by an Extra URLs REX, then only URLs that match the Extra URLs REX will be walked on that host. If a Base URL does not match an Extra URLs REX, then it is walked as normal.
This setting is rarely needed. It is most commonly used together with a hostname to fetch matching URLs on an additional host. Links to those pages must still be found during the walk for them to be indexed.
For example, with the following Extra URLs REX:
>>=http://products\.example\.com=!supplierid+supplierid\=BigCo
(which matches a URL that begins with products.example.com and contains supplierid=BigCo), and using the following Base URLs:
http://products.example.com/listProducts.aspx?supplierid=BigCo
http://help.example.com/index.aspx
The Extra URLs REX matches the products.example.com URL, so only pages with supplierid=BigCo will be walked on that host, while all of help.example.com will be walked (following other inclusion/exclusion rules).
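The decision rule described above can be sketched in Python as a rough illustration. This is not how the product implements it: the pattern below is a Python `re` approximation of the REX expression (REX syntax differs from Python regular expressions), and the names `EXTRA_URL_PATTERNS`, `restricted_hosts`, and `should_walk` are hypothetical. Other inclusion/exclusion rules are not modeled.

```python
import re
from urllib.parse import urlsplit

# Assumption: a Python-regex stand-in for the Extra URLs REX in the example.
# It approximates "begins with products.example.com and contains supplierid=BigCo".
EXTRA_URL_PATTERNS = [
    re.compile(r"^http://products\.example\.com.*supplierid=BigCo"),
]

BASE_URLS = [
    "http://products.example.com/listProducts.aspx?supplierid=BigCo",
    "http://help.example.com/index.aspx",
]


def host_of(url):
    """Return the hostname part of a URL."""
    return urlsplit(url).hostname


def restricted_hosts(base_urls, patterns):
    """Hosts whose Base URL is matched by at least one pattern."""
    return {
        host_of(url)
        for url in base_urls
        if any(p.search(url) for p in patterns)
    }


def should_walk(url, base_urls, patterns):
    """Apply only the Extra URLs REX check to a candidate URL.

    Hosts whose Base URL matched a pattern are restricted to matching URLs;
    all other hosts are left to the normal walk rules (treated as allowed here).
    """
    if host_of(url) not in restricted_hosts(base_urls, patterns):
        return True
    return any(p.search(url) for p in patterns)


if __name__ == "__main__":
    candidates = [
        "http://products.example.com/item.aspx?id=7&supplierid=BigCo",
        "http://products.example.com/item.aspx?id=7&supplierid=Other",
        "http://help.example.com/faq.aspx",
    ]
    for url in candidates:
        print(url, should_walk(url, BASE_URLS, EXTRA_URL_PATTERNS))
```

Under these assumptions, the first candidate is walked, the second is skipped because its host is restricted and the URL does not match, and the third is walked because help.example.com is not restricted.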
Available from version 4.3.9.
See also Extra Domains.