
Class bdd.search.spider.Crawler

java.lang.Object
   |
   +----java.lang.Thread
           |
           +----bdd.search.spider.Crawler

public class Crawler
extends Thread
Written by Tim Macinta 1997
Distributed under the GNU General Public License (a copy of which is enclosed with the source).

Calling the Crawler's start() method will cause the Crawler to index all of the sites in its queue and then replace the main index with the updated index when it completes. The Crawler's queue should be filled with the starting URLs before calling start().
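The sequence described above (seed the queue, then call start()) might look like the following sketch. The working directory name, the seed URL, and the no-argument EnginePrefs constructor are assumptions for illustration; consult the EnginePrefs documentation for how it is actually constructed.

```java
import java.io.File;
import java.net.URL;

public class CrawlExample {
    public static void main(String[] args) throws Exception {
        // Directory used only by this Crawler and its Indexer (assumed name).
        File workingDir = new File("crawler_work");

        // Assumed no-arg constructor; see the EnginePrefs class for details.
        EnginePrefs prefs = new EnginePrefs();

        Crawler crawler = new Crawler(workingDir, prefs);

        // Seed the queue BEFORE starting the crawl.
        crawler.addURL(new URL("http://example.com/"));

        // Indexes all queued sites, then replaces the main index on completion.
        crawler.start();
        crawler.join(); // optionally wait for the crawl to finish
    }
}
```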

Constructor Index

 o Crawler(File, EnginePrefs)
"working_dir" should be a directory that only this Crawler and a given Indexer will be accessing.

Method Index

 o addURL(URL)
Takes "url_to_queue" and adds it to this Crawler's queue of URLs.
 o main(File, EnginePrefs)
 o main(File, EnginePrefs, boolean)
 o main(String[])
This is the method that is called when this class is invoked from the command line.
 o run()
This is where the actual crawling occurs.

Constructors

 o Crawler
  public Crawler(File working_dir,
                 EnginePrefs eng_prefs)
"working_dir" should be a directory that only this Crawler and a given Indexer will be accessing. This means that if several Crawlers are running simultaneously, they should all be given different "working_dir" directories. Also, no other threads should write to this directory (except for the selected Indexer).

Methods

 o addURL
  public void addURL(URL url_to_queue)
Takes "url_to_queue" and adds it to this Crawler's queue of URLs. This method should be used to add all of the desired starting URLs to the queue before the Crawler is started. If the URL has already been processed, or if it is a disallowed URL, it is not added.
 o run
  public void run()
This is where the actual crawling occurs.
Overrides:
run in class Thread
 o main
  public static void main(String arg[])
This is the method that is called when this class is invoked from the command line. Calling this method creates and starts a Crawler whose starting URLs are listed in the file named by the first argument (arg[0]). The file should contain one URL per line, with no other content. Blank lines are allowed, and lines beginning with "#" are treated as comments and ignored.
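The URL list format described above can be illustrated with a small self-contained parser. Note that `UrlListParser` is a hypothetical helper written for this example, not part of the bdd.search.spider package; it simply mirrors the stated rules (one URL per line, blank lines skipped, "#" lines ignored).

```java
import java.util.ArrayList;
import java.util.List;

public class UrlListParser {
    // Parses the contents of a Crawler URL list file:
    // one URL per line, blank lines allowed, "#" lines are comments.
    public static List<String> parse(String fileContents) {
        List<String> urls = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blanks and comments
            }
            urls.add(trimmed);
        }
        return urls;
    }

    public static void main(String[] args) {
        String sample = "# seed URLs\nhttp://example.com/\n\nhttp://example.org/\n";
        System.out.println(parse(sample));
    }
}
```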
 o main
  public static void main(File file,
                          EnginePrefs prefs)
 o main
  public static void main(File file,
                          EnginePrefs prefs,
                          boolean exit)
