| Interface Summary | |
|---|---|
| CharSequenceBuffer | |
| Tag | Tag returned by TagTokenizer. |
| TagProcessorContext | Defines a set of methods that allows TagRules to interact with the TagProcessor. |
| TagRule | User defined rule for processing Tags encountered by the TagProcessor. |
| TagTokenizer.TokenHandler | Handler that will receive callbacks as 'tags' and 'text' are encountered. |
| Class Summary | |
|---|---|
| BasicBlockRule<T> | TagRule helper class for dealing with blocks surrounded by an opening and closing tag. |
| BasicRule | Basic implementation of TagRule. |
| CustomTag | A CustomTag provides a mechanism to manipulate the contents of a Tag. |
| State | Acts as a registry of TagRules to apply whilst the TagProcessor is processing the document in this particular state. |
| StateTransitionRule | |
| TagProcessor | Copies a document from a source to a destination, applying rules on the way to extract content and/or transform the content. |
| TagTokenizer | Splits a chunk of HTML into 'text' and 'tag' tokens, for easy processing. |
| Enum Summary | |
|---|---|
| Tag.Type | Type of tag. |
| TagTokenizer.Token | |
This package is for processing tag-like markup languages (anything with angle brackets): HTML, XHTML, WML, XML and other SGML dialects.
Strengths:
It provides two APIs:
The TagTokenizer scans through a document and fires events as it encounters Tags of interest. Anything that does not qualify as a Tag is treated as a Text token.
This is similar to the approach taken by the SAX API for XML processing.
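To illustrate the callback-driven style, here is a minimal sketch of a single-pass tokenizer that fires a `text` event for each run of characters between tags and a `tag` event for each tag. The names `MiniTokenizer`, `Handler` and `TagKind` are hypothetical and are not this package's `TagTokenizer` / `TagTokenizer.TokenHandler` API. The sketch assumes well-behaved input: it does not handle comments, CDATA sections, or a `>` inside a quoted attribute value.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniTokenizer {

    /** Hypothetical tag kinds, loosely mirroring the idea behind Tag.Type. */
    public enum TagKind { OPEN, CLOSE, EMPTY }

    /** Hypothetical callback interface (not the real TagTokenizer.TokenHandler). */
    public interface Handler {
        void tag(String name, TagKind kind, String raw);
        void text(String text);
    }

    /** Scans the input once, firing text() for runs between tags and tag() for each tag. */
    public static void tokenize(String input, Handler handler) {
        int pos = 0;
        while (pos < input.length()) {
            int lt = input.indexOf('<', pos);
            int gt = lt < 0 ? -1 : input.indexOf('>', lt);
            if (lt < 0 || gt < 0) {
                // No further complete tag: the remainder is plain text.
                handler.text(input.substring(pos));
                return;
            }
            if (lt > pos) {
                handler.text(input.substring(pos, lt));
            }
            String raw = input.substring(lt, gt + 1);
            String body = raw.substring(1, raw.length() - 1).trim();
            TagKind kind = TagKind.OPEN;
            if (body.startsWith("/")) {
                kind = TagKind.CLOSE;
                body = body.substring(1).trim();
            } else if (body.endsWith("/")) {
                kind = TagKind.EMPTY;
                body = body.substring(0, body.length() - 1).trim();
            }
            // The tag name is everything up to the first whitespace.
            String name = body.split("\\s+", 2)[0].toLowerCase();
            handler.tag(name, kind, raw);
            pos = gt + 1;
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        tokenize("<p class=\"x\">Hi<br/>there</p>", new Handler() {
            public void tag(String name, TagKind kind, String raw) { events.add(kind + ":" + name); }
            public void text(String text) { events.add("TEXT:" + text); }
        });
        // prints [OPEN:p, TEXT:Hi, EMPTY:br, TEXT:there, CLOSE:p]
        System.out.println(events);
    }
}
```

As with SAX, the handler never sees a tree: it only receives a flat stream of events, so memory use stays constant regardless of document size.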
The TagProcessor is built on top of the TagTokenizer and acts as a registry for TagRules and TextFilters.
It also supports multiple States, allowing different rules to be applied in different sections of a document.
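The combination of a rule registry and switchable states can be sketched as follows. `MiniProcessor`, `Rule` and `State` are hypothetical names chosen for illustration, not this package's `TagProcessor`, `TagRule` and `State` classes, and tags are found with a simple regular expression rather than a real tokenizer. The processor copies the source to its output; when a tag has a rule registered in the *current* state, the rule decides what to write and may switch the processor into another state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniProcessor {

    /** Hypothetical rule: handles one tag and decides what to write. */
    public interface Rule {
        void process(String rawTag, boolean closing, MiniProcessor ctx);
    }

    /** Hypothetical state: a registry of rules keyed by lower-case tag name. */
    public static class State {
        private final Map<String, Rule> rules = new HashMap<>();

        public State addRule(String tagName, Rule rule) {
            rules.put(tagName, rule);
            return this;
        }

        Rule ruleFor(String tagName) { return rules.get(tagName); }
    }

    // Matches open, close and empty tags; group 1 is "/" for a closing tag, group 2 the name.
    private static final Pattern TAG = Pattern.compile("<(/?)\\s*([A-Za-z][A-Za-z0-9]*)[^>]*>");

    private final StringBuilder out = new StringBuilder(); // use one instance per document
    private State current;

    public MiniProcessor(State initial) { this.current = initial; }

    public void changeState(State next) { this.current = next; }

    public void write(String s) { out.append(s); }

    /** Copies source to the output, letting the current state's rules handle tags. */
    public String process(String source) {
        Matcher m = TAG.matcher(source);
        int pos = 0;
        while (m.find()) {
            write(source.substring(pos, m.start())); // text between tags is copied as-is
            Rule rule = current.ruleFor(m.group(2).toLowerCase());
            if (rule != null) {
                rule.process(m.group(), !m.group(1).isEmpty(), this);
            } else {
                write(m.group()); // no rule in this state: copy the tag unchanged
            }
            pos = m.end();
        }
        write(source.substring(pos));
        return out.toString();
    }

    public static void main(String[] args) {
        State normal = new State();
        State pre = new State();
        // Outside <pre>, rewrite <b> as <strong>; entering <pre> switches state.
        normal.addRule("b", (raw, closing, ctx) -> ctx.write(closing ? "</strong>" : "<strong>"));
        normal.addRule("pre", (raw, closing, ctx) -> {
            ctx.write(raw);
            if (!closing) ctx.changeState(pre);
        });
        // Inside <pre>, <b> has no rule and is left alone; </pre> switches back.
        pre.addRule("pre", (raw, closing, ctx) -> {
            ctx.write(raw);
            if (closing) ctx.changeState(normal);
        });
        // prints <strong>x</strong><pre><b>y</b></pre><strong>z</strong>
        System.out.println(new MiniProcessor(normal).process("<b>x</b><pre><b>y</b></pre><b>z</b>"));
    }
}
```

Keeping rules inside states, rather than in one global map, means a section such as a `<pre>` block can opt out of a transformation without every rule having to check where it is in the document.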