The default rewalk type, Refresh, updates the existing database and
downloads only files that have been modified or created since the last
walk. Pages that are no longer present on the server are removed from
the database.
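The per-page decision described above can be modeled with a minimal Python sketch. It assumes the database stores the set of URLs from the previous walk and that the server reports a last-modified timestamp for each page (or none at all). The function and parameter names are hypothetical and are not part of the product; pages created since the last walk are not modeled here, since they are found through links rather than from the database.

```python
from typing import Dict, Optional, Set


def plan_refresh(db: Set[str], server: Dict[str, Optional[float]],
                 last_walk: float) -> Dict[str, str]:
    """Hypothetical model of a Refresh pass over URLs stored by the previous walk.

    db:        URLs stored in the database by the previous walk.
    server:    URL -> last-modified timestamp currently reported by the server,
               or None when the server gives no modified date. URLs missing
               from this dict are no longer present on the server.
    last_walk: timestamp of the previous walk.
    """
    actions: Dict[str, str] = {}
    for url in db:
        if url not in server:
            actions[url] = "remove"        # gone from the server: drop from database
        else:
            modified = server[url]
            if modified is None or modified > last_walk:
                actions[url] = "download"  # changed, or no date to prove otherwise
            else:
                actions[url] = "keep"      # unchanged: reuse the stored copy
    return actions
```

For example, `plan_refresh({"/a", "/b", "/c"}, {"/a": 100.0, "/b": 250.0}, 200.0)` keeps `/a`, downloads `/b`, and removes `/c`.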
Here are some other considerations for using Refresh.

Pages that were referenced but missing in the initial walk (the walk
prior to the Refresh), and that were added after the initial walk, will
be missed by Refresh if their parent page has not been modified. If you
change your settings to be more inclusive (e.g., add extensions, ignore
robots, add domains), you should do a New walk once, because a Refresh
is not likely to find the newly allowed data unless all of the pages
leading to this data have been modified.
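To see why an unmodified parent hides a newly available page, here is a simplified Python model. It assumes that Refresh re-parses links only from pages it downloads again (modified pages and pages it discovers along the way), while a New walk follows every link from the start page. The site is a hypothetical link map; `crawl`, `new_walk`, and `refresh_walk` are illustrative names, not product functions.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical link map of the site as it exists now: URL -> linked URLs.
Links = Dict[str, List[str]]


def crawl(seeds: Iterable[str], links: Links, seen: Set[str]) -> Set[str]:
    """Follow links from the seed pages, adding every unseen URL to `seen`."""
    queue = list(seeds)
    while queue:
        url = queue.pop()
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)      # a page not seen before is downloaded...
                queue.append(child)  # ...and parsed for further links
    return seen


def new_walk(start: str, links: Links) -> Set[str]:
    # A New walk starts over and follows every reachable link.
    return crawl([start], links, {start})


def refresh_walk(db: Set[str], links: Links, modified: Set[str]) -> Set[str]:
    # Assumption: only pages that changed are re-downloaded and re-parsed,
    # so new URLs can only be discovered through them.
    return crawl([u for u in db if u in modified], links, set(db))
```

Suppose `/` links to `/a` and `/c`, but `/c` did not exist during the initial walk, so the database holds only `/` and `/a`. With `links = {"/": ["/a", "/c"], "/a": [], "/c": []}`, `refresh_walk({"/", "/a"}, links, set())` never reaches `/c` because `/` is unchanged, while `new_walk("/", links)` finds it. Loosening the walk settings has the same effect in this model: pages that were previously excluded are only reached through a parent that gets parsed again.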
If more than 30% to 50% of your site changes between walks, you may be
better off using a New walk instead of Refresh. Also, many dynamic
content generators do not provide modified dates, which causes every
page to be rewalked. In that case, you should use New instead of
Refresh.
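The two guidelines above can be combined into one rough heuristic, sketched here in Python. It assumes you have a sample of last-modified dates reported by the server, where `None` means no date was given; such pages count as changed because Refresh has to fetch them again anyway. The function name and the default `threshold` of 0.3 (the low end of the 30% to 50% range) are hypothetical choices, not product settings.

```python
from typing import Dict, Optional


def suggest_rewalk_type(dates: Dict[str, Optional[float]], last_walk: float,
                        threshold: float = 0.3) -> str:
    """Hypothetical heuristic for choosing between New and Refresh.

    dates:     URL -> last-modified time reported by the server, or None
               when the server gives no modified date.
    last_walk: timestamp of the previous walk.
    threshold: fraction of rewalked pages above which New is suggested.
    """
    if not dates:
        return "New"  # nothing to compare against: start a fresh walk
    # Pages without a date must be downloaded again, so they count as changed.
    rewalked = sum(1 for t in dates.values() if t is None or t > last_walk)
    return "New" if rewalked / len(dates) > threshold else "Refresh"
```

A site where one page in four changed stays on Refresh, while a site whose generator omits dates on most pages is steered toward New.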