Class bdd.search.spider.WordExtractor

java.lang.Object
   |
   +----bdd.search.spider.WordExtractor

public class WordExtractor
extends Object
Written by Tim Macinta 1997
Distributed under the GNU Public License (a copy of which is enclosed with the source).

A WordExtractor should be able to extract the words from a given file. This class should be subclassed by classes which understand different document types.
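The subclassing idea might be sketched as follows. This is only an illustration under stated assumptions, not the bdd.search.spider source: ExtractorBaseSketch is a hypothetical stand-in for WordExtractor, the extract(String) method and both subclass names are invented for the example, and the delimiter set is an arbitrary choice. The only thing taken from this page is that subclasses feed words to addWord(String) as they find them.

```java
import java.util.StringTokenizer;
import java.util.Vector;

// Hypothetical stand-in for the documented base class. Only addWord and
// countWords are mirrored here; the real class may store words differently.
class ExtractorBaseSketch {
    private final Vector<String> words = new Vector<String>();

    public void addWord(String word) {
        words.addElement(word);
    }

    public int countWords() {
        return words.size();
    }
}

// Hypothetical subclass that understands plain text.
class PlainTextExtractorSketch extends ExtractorBaseSketch {
    // Assumed delimiter set: whitespace plus common punctuation.
    private static final String DELIMITERS = " \t\r\n.,;:!?\"()";

    // extract(String) is an invented entry point; the real API is not shown here.
    public void extract(String text) {
        StringTokenizer tokens = new StringTokenizer(text, DELIMITERS);
        while (tokens.hasMoreTokens()) {
            addWord(tokens.nextToken());
        }
    }
}

// Hypothetical subclass that understands simple markup: everything between
// '<' and '>' is dropped, and the remaining text is handled as plain text.
class MarkupExtractorSketch extends PlainTextExtractorSketch {
    @Override
    public void extract(String markup) {
        StringBuffer text = new StringBuffer();
        boolean inTag = false;
        for (int i = 0; i < markup.length(); i++) {
            char c = markup.charAt(i);
            if (c == '<') {
                inTag = true;
                text.append(' '); // keep words on either side of a tag apart
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
            }
        }
        super.extract(text.toString());
    }
}

class ExtractorSketchDemo {
    public static void main(String[] args) {
        PlainTextExtractorSketch plain = new PlainTextExtractorSketch();
        plain.extract("Once upon a time, the giant tomato.");
        System.out.println(plain.countWords()); // prints 7

        MarkupExtractorSketch html = new MarkupExtractorSketch();
        html.extract("<p>Once upon <b>a</b> time</p>");
        System.out.println(html.countWords()); // prints 4
    }
}
```

Keeping the document-type knowledge in the subclass and the bookkeeping in the base class means a new format only needs its own parsing logic.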

Constructor Index

 o WordExtractor()

Method Index

 o addWord(String)
Used internally to add each word to the list of words as it is found in the document.
 o countOccurances(String)
Returns the number of times that "word" appears in the document.
 o countWords()
Returns the number of words in this document.
 o firstOccurance(String)
Returns the index of the first occurrence of "word".
 o getWords()
Returns an Enumeration over the words in the document, in no particular order.

Constructors

 o WordExtractor
  public WordExtractor()

Methods

 o getWords
  public Enumeration getWords()
Returns an Enumeration over the words in the document, in no particular order. Each word is returned at most once, regardless of the number of times it appears in the document. Each call to nextElement() returns a String.
 o countWords
  public int countWords()
Returns the number of words in this document.
 o countOccurances
  public int countOccurances(String word)
Returns the number of times that "word" appears in the document.
 o firstOccurance
  public int firstOccurance(String word)
Returns the index of the first occurrence of "word". The index is determined by counting the words in the document until the first occurrence of "word" is found. For instance, firstOccurance("the") would return 5 if the document started like this: "Once upon a time the giant tomato of...", because "the" is the fifth word. Returns -1 if the word is not in the document.
 o addWord
  public void addWord(String word)
Used internally to add each word to the list of words as it is found in the document.
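
The behaviour described for these methods could be modelled roughly as below. This is a minimal sketch, not the actual implementation: WordListSketch is a hypothetical stand-in class, the Vector and Hashtable fields are assumptions, and the sample sentence extends the one in the firstOccurance description with made-up trailing words. The sketch also assumes that countWords() counts every word including repeats and that matching is case-sensitive; this page does not confirm either point.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical stand-in, not the bdd.search.spider source. It only mirrors
// the behaviour described in the method documentation.
class WordListSketch {
    // Every word in document order, duplicates included (assumed storage).
    private final Vector<String> words = new Vector<String>();
    // Distinct word -> number of occurrences (assumed storage).
    private final Hashtable<String, Integer> counts = new Hashtable<String, Integer>();

    public void addWord(String word) {
        words.addElement(word);
        Integer seen = counts.get(word);
        counts.put(word, seen == null ? 1 : seen + 1);
    }

    // Each distinct word once, in no particular order.
    public Enumeration<String> getWords() {
        return counts.keys();
    }

    // Assumption: total number of words, repeats included.
    public int countWords() {
        return words.size();
    }

    public int countOccurances(String word) {
        Integer seen = counts.get(word);
        return seen == null ? 0 : seen;
    }

    // Counts words from 1, so the fifth word has index 5; -1 if absent.
    public int firstOccurance(String word) {
        int position = words.indexOf(word); // zero-based, or -1 if absent
        return position < 0 ? -1 : position + 1;
    }
}

class WordListDemo {
    public static void main(String[] args) {
        WordListSketch doc = new WordListSketch();
        String sample = "Once upon a time the giant tomato of the valley";
        for (String w : sample.split(" ")) {
            doc.addWord(w);
        }
        System.out.println(doc.firstOccurance("the"));    // prints 5
        System.out.println(doc.countOccurances("the"));   // prints 2
        System.out.println(doc.firstOccurance("potato")); // prints -1
        System.out.println(doc.countWords());             // prints 10

        // Walk the distinct words; the order is unspecified.
        for (Enumeration<String> e = doc.getWords(); e.hasMoreElements(); ) {
            System.out.println(e.nextElement());
        }
    }
}
```

Keeping an ordered list alongside a count table makes firstOccurance a simple position lookup while countOccurances stays a constant-time read.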
