Class bdd.search.EnginePrefs

java.lang.Object
   |
   +----bdd.search.EnginePrefs

public class EnginePrefs
extends Object
Written by Tim Macinta in 1997.
Distributed under the GNU Public License (a copy of which is enclosed with the source).

Encapsulates the preferences for the crawler and the search engine.
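A minimal usage sketch, assuming the no-argument constructor supplies workable defaults (the URL below is only an illustration, and URLAllowed may depend on the rules file having been read first):

  import java.io.File;
  import java.net.URL;
  import bdd.search.EnginePrefs;

  public class PrefsDemo {
      public static void main(String[] args) throws Exception {
          EnginePrefs prefs = new EnginePrefs();   // defaults from the no-arg constructor

          // Locations used by the crawler and the search engine.
          File index = prefs.getMainIndex();
          File work  = prefs.getWorkingDir();
          System.out.println("Index: " + index + "  Working dir: " + work);

          // Ask whether a URL may be indexed under the current rules.
          boolean ok = prefs.URLAllowed(new URL("http://gsd.mit.edu/"));
          System.out.println("Allowed: " + ok);
      }
  }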

Variable Index

 o pause_time
The time to pause between URL fetches (in seconds).
 o port

Constructor Index

 o EnginePrefs()

Method Index

 o getEmailAddress()
 o getFooterFile()
 o getHeaderFile()
 o getMainDir()
 o getMainIndex()
 o getMonitor()
 o getNotFoundFile()
 o getRulesFile()
The rules file contains rules that determine which URLs are allowed and which should be excluded.
 o getStartingFile()
 o getUserAgent()
 o getWorkingDir()
Returns the working directory for use by a crawler.
 o pauseBetweenURLs()
Pauses for the configured time between URL fetches (pause_time seconds).
 o readRobotsDotText(String, int)
Reads the "robots.txt" file on the given host and uses the results to determine what files on "host" are crawlable.
 o readRulesFile()
Causes the inclusion/exclusion rules to be read.
 o URLAllowed(URL)
Returns true if "url" is allowed to be indexed and false otherwise.

Variables

 o pause_time
  public int pause_time
The time to pause between URL fetches (in seconds).
 o port
  public static int port

Constructors

 o EnginePrefs
  public EnginePrefs()

Methods

 o URLAllowed
  public boolean URLAllowed(URL url)
Returns true if "url" is allowed to be indexed and false otherwise.
 o pauseBetweenURLs
  public void pauseBetweenURLs()
Pauses for the configured time between URL fetches (pause_time seconds).
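A sketch of how a crawler loop might combine URLAllowed and pauseBetweenURLs; the queue and the fetch step are placeholders, not part of this class:

  import java.net.URL;
  import java.util.List;
  import bdd.search.EnginePrefs;

  class CrawlLoop {
      static void crawl(EnginePrefs prefs, List<URL> queue) {
          prefs.pause_time = 2;            // seconds between fetches (public field)
          for (URL url : queue) {
              if (!prefs.URLAllowed(url)) {
                  continue;                // skip URLs excluded by the rules
              }
              fetch(url);                  // placeholder for the actual download step
              prefs.pauseBetweenURLs();    // honor the configured pause
          }
      }

      static void fetch(URL url) { /* placeholder */ }
  }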
 o getMainIndex
  public File getMainIndex()
 o getMainDir
  public File getMainDir()
 o getWorkingDir
  public File getWorkingDir()
Returns the working directory for use by a crawler. If more than one crawler is running at the same time, each should be given a different working directory.
 o getHeaderFile
  public File getHeaderFile()
 o getFooterFile
  public File getFooterFile()
 o getNotFoundFile
  public File getNotFoundFile()
 o getStartingFile
  public File getStartingFile()
 o getRulesFile
  public File getRulesFile()
The rules file contains rules that determine which URLs are allowed and which should be excluded. A line of the form:
 include http://gsd.mit.edu/
 
will cause all URLs that start with "http://gsd.mit.edu/" to be included. Similarly, to exclude URLs, use the keyword "exclude" instead of "include". Blank lines and lines starting with "#" are ignored.

When a URL is checked against the inclusion/exclusion rules, the exclusion rules are checked first; if the URL matches an exclusion rule, it is not included. A URL that matches neither kind of rule is not included, unless it is a "file://" URL, in which case it is included by default.
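For example, a rules file along these lines would index the two included sites while skipping one subtree (the hosts shown are only illustrative):

  # Comment lines and blank lines are ignored.
  include http://gsd.mit.edu/
  include http://web.mit.edu/

  # Exclusions are checked first, so this subtree is skipped
  # even though it falls under an "include" line.
  exclude http://web.mit.edu/private/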

 o readRulesFile
  public void readRulesFile() throws IOException
Causes the inclusion/exclusion rules to be read. This method should be called if the rules file is changed.
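If the rules file is edited programmatically, the change can be picked up by calling readRulesFile() afterwards; a sketch (the appended rule is only an example):

  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.PrintWriter;
  import bdd.search.EnginePrefs;

  class ReloadRules {
      static void addRule(EnginePrefs prefs, String rule) throws IOException {
          // Append a rule to the file named by getRulesFile() ...
          PrintWriter out = new PrintWriter(new FileWriter(prefs.getRulesFile(), true));
          out.println(rule);               // e.g. "exclude http://gsd.mit.edu/tmp/"
          out.close();
          // ... then ask the preferences object to re-read the rules.
          prefs.readRulesFile();
      }
  }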
 o readRobotsDotText
  public void readRobotsDotText(String host,
                                int port)
Reads the "robots.txt" file on the given host and uses the results to determine what files on "host" are crawlable.
 o getUserAgent
  public String getUserAgent()
 o getEmailAddress
  public String getEmailAddress()
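If, as the names suggest, getUserAgent() and getEmailAddress() identify the crawler to web servers, they might be attached to an outgoing request roughly like this (an assumption about their intended use, not documented behavior):

  import java.io.IOException;
  import java.net.URL;
  import java.net.URLConnection;
  import bdd.search.EnginePrefs;

  class IdentifiedFetch {
      static URLConnection open(EnginePrefs prefs, URL url) throws IOException {
          URLConnection conn = url.openConnection();
          // Identify the crawler and a contact address to the remote server.
          conn.setRequestProperty("User-Agent", prefs.getUserAgent());
          conn.setRequestProperty("From", prefs.getEmailAddress());
          return conn;
      }
  }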
 o getMonitor
  public Monitor getMonitor()
