Parametric Search Appliance

Thunderstone Search Appliance Manual

Base URL(s)

Syntax: one or more URLs, one per line

This is the address where the web walker starts walking your site. To search the whole site, simply enter your web address, for example "http://www.example.com". To limit the search, specify the address at which to start, or create a page listing the URLs to search. The search will only return information from your web site; no off-site searching is done. Directory URLs should include a trailing forward slash "/", for example "http://www.example.com/mysite/". If you have a virtual domain that simply redirects to another URL, enter the destination URL as your Base URL instead of the virtual domain name.

You may specify multiple base URLs to index multiple sites; the Parametric Search Appliance's idea of a "site" is a single host as identified by the hostname portion of a URL. Therefore http://www.example.com, http://www2.example.com, and http://example.com would all be considered different sites.
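To illustrate the one-host-per-site rule, the following Python sketch groups a list of Base URLs by hostname. It is not the appliance's own code: the helper name group_by_site is hypothetical, and treating hostnames case-insensitively (as urlsplit does) is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_site(base_urls):
    """Group URLs by the hostname portion of each URL (hypothetical helper)."""
    sites = defaultdict(list)
    for url in base_urls:
        # urlsplit().hostname is lowercased and excludes any port or credentials.
        host = urlsplit(url).hostname
        sites[host].append(url)
    return dict(sites)


sites = group_by_site([
    "http://www.example.com",
    "http://www2.example.com",
    "http://example.com",
    "http://www.example.com/mysite/",
])
for host, urls in sites.items():
    print(host, urls)
```

The four URLs fall into three sites, because only the first and last share the hostname www.example.com.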

In version 4.02.1046373961 (Feb 27 2003) and later, the special "protocol" http-post or https-post may be used for a Base URL. The URL is then fetched with the POST method instead of GET, and its query string (which must be URL-encoded) is sent as the POST data. This can be used to start walking at a login form that requires POST instead of GET. Note that the URL stored in the html table will have the -post and the query string removed for security. During a Refresh walk, when a URL is about to be refreshed, the probable Base URL that led to it (i.e. the one with the longest prefix) will also be fetched. This helps ensure that login cookies are properly restored, so that the Parametric Search Appliance has access during the refresh. Example:

"http-post://www.somehost.com/login.asp?user=bigbird&pass=open-sesame"
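As a rough illustration of how such a Base URL breaks down, the Python sketch below separates an http-post URL into the address that would be fetched and the POST data, and picks a probable Base URL by longest prefix. Both helper names are hypothetical, and the plain string-prefix comparison is an assumption about what "longest prefix" means; the appliance's actual logic is not documented here.

```python
from urllib.parse import urlsplit, parse_qsl


def split_post_base_url(base_url):
    """Return (fetch_url, post_body) for a Base URL (hypothetical helper).

    post_body is None when the URL does not use http-post or https-post.
    """
    scheme, sep, rest = base_url.partition("://")
    if not sep or not scheme.endswith("-post"):
        return base_url, None
    real_scheme = scheme[: -len("-post")]  # "http-post" -> "http"
    parts = urlsplit(real_scheme + "://" + rest)
    # Dropping -post and the query string also gives the form the manual
    # says is stored in the html table.
    fetch_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return fetch_url, parts.query  # query string is already URL-encoded


def probable_base_url(url, base_urls):
    """Pick the Base URL that is the longest prefix of url, or None (assumed rule)."""
    matches = [b for b in base_urls if url.startswith(b)]
    return max(matches, key=len) if matches else None


fetch_url, body = split_post_base_url(
    "http-post://www.somehost.com/login.asp?user=bigbird&pass=open-sesame"
)
print(fetch_url)        # http://www.somehost.com/login.asp
print(parse_qsl(body))  # [('user', 'bigbird'), ('pass', 'open-sesame')]
```

For example, given the Base URLs "http://www.somehost.com/" and "http://www.somehost.com/docs/", probable_base_url returns the second for "http://www.somehost.com/docs/a.html", since it is the longer matching prefix.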

In version 5, a username and password may be given in the Base URL. Normally, if only one login is required to access the site to be walked, the username and password should be given in the Login Info walk setting. However, if several different logins are required, the additional logins can be specified as user:password@ prefixed to the hostname in the Base URL. Note that this username and password are used for WWW Basic Authentication. If your site uses a custom or form-based login, use http-post instead. Example:

"http://MyName:MyPassword@www.myhost.com/login.asp"
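To show what the user:password@ prefix amounts to under Basic Authentication, the Python sketch below pulls the credentials out of such a URL and builds the corresponding Authorization header value. The helper name basic_auth_from_url is hypothetical, and the sketch assumes the credentials contain no percent-encoded characters; it illustrates the general scheme rather than the appliance's internals.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_from_url(base_url):
    """Return (url_without_credentials, Authorization header value or None)."""
    parts = urlsplit(base_url)
    if parts.username is None:
        return base_url, None
    # Everything after the last "@" in the netloc is the host (and optional port).
    host_part = parts.netloc.rpartition("@")[2]
    clean_url = parts._replace(netloc=host_part).geturl()
    # Basic Authentication sends base64("user:password") in the header.
    credentials = f"{parts.username}:{parts.password or ''}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return clean_url, "Basic " + token


clean_url, auth_header = basic_auth_from_url(
    "http://MyName:MyPassword@www.myhost.com/login.asp"
)
print(clean_url)    # http://www.myhost.com/login.asp
print(auth_header)  # "Basic " followed by the base64-encoded credentials
```

Because base64 is an encoding rather than encryption, credentials sent this way are only as private as the connection that carries them.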

See also URL file (3.5.7), URL URL (3.5.8), Single page (3.5.9), Page file (3.5.10), and Page URL (3.5.11) for more ways to specify URLs.


Copyright © Thunderstone Software     Last updated: Jul 28 2017
