Tokenizer
- class Tokenizer()
A tokenizer that produces Tokenization()s. This is an interface.
- Tokenizer.tokenize(text)
Tokenize a string.
- Arguments
text (AnyString()) – The text to tokenize, either a string or a BiString().
- Returns
Tokenization – A Tokenization() holding the text and its tokens.
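The contract above amounts to a single method that maps text to a tokenization. The following is a minimal sketch of that shape, not the library's own declarations: `SketchToken`, `SketchTokenization`, `SketchTokenizer`, and `CharTokenizer` are hypothetical names, and the sketch accepts plain strings only rather than `AnyString()`.

```typescript
// Hypothetical, simplified stand-ins for the library's token and tokenization types.
interface SketchToken {
  text: string;  // the token's text
  start: number; // offset of the token's first character in the input
  end: number;   // offset just past the token's last character
}

interface SketchTokenization {
  text: string;          // the original input
  tokens: SketchToken[]; // the tokens found in it, in order
}

// The contract: one method from text to a tokenization.
interface SketchTokenizer {
  tokenize(text: string): SketchTokenization;
}

// A toy implementation that emits one token per UTF-16 code unit.
class CharTokenizer implements SketchTokenizer {
  tokenize(text: string): SketchTokenization {
    const tokens: SketchToken[] = [];
    for (let i = 0; i < text.length; i++) {
      tokens.push({ text: text.charAt(i), start: i, end: i + 1 });
    }
    return { text, tokens };
  }
}
```

Any class that provides a compatible `tokenize` method satisfies the interface, which is how the two concrete tokenizers below relate to it.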
- class RegExpTokenizer(pattern)
Breaks text into tokens based on a RegExp().
- Implements
Tokenizer()
Create a RegExpTokenizer.
- Arguments
pattern (RegExp()) – The regex that will match tokens.
- RegExpTokenizer.tokenize(text)
Tokenize a string.
- Arguments
text (AnyString()) – The text to tokenize, either a string or a BiString().
- Returns
Tokenization – A Tokenization() holding the text and its tokens.
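To picture what "the regex that will match tokens" means, here is a standalone sketch that uses only the built-in `RegExp`. It is not the library's implementation: `MatchToken` and `tokenizeByMatch` are hypothetical names, it works on plain strings, and skipping empty matches is an assumption made to keep the loop finite.

```typescript
// Hypothetical token record for this sketch only.
interface MatchToken {
  text: string;
  start: number;
  end: number;
}

// Every non-empty match of `pattern` becomes one token.
function tokenizeByMatch(text: string, pattern: RegExp): MatchToken[] {
  // exec() only advances through the string when the global flag is set.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const matcher = new RegExp(pattern.source, flags);
  const tokens: MatchToken[] = [];
  let match: RegExpExecArray | null;
  while ((match = matcher.exec(text)) !== null) {
    const matched = match[0] ?? "";
    if (matched.length === 0) {
      matcher.lastIndex++; // assumption: ignore empty matches and move on
      continue;
    }
    tokens.push({ text: matched, start: match.index, end: match.index + matched.length });
  }
  return tokens;
}

const words = tokenizeByMatch("The quick, brown fox", /\w+/);
console.log(words.map(t => t.text)); // [ 'The', 'quick', 'brown', 'fox' ]
```

The pattern describes the tokens themselves, so punctuation and whitespace that the pattern does not match are left out of the result.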
- class SplittingTokenizer(pattern)
Splits text into tokens based on a RegExp().
- Implements
Tokenizer()
Create a SplittingTokenizer.
- Arguments
pattern (RegExp()) – A regex that matches the regions between tokens.
- SplittingTokenizer.tokenize(text)
Tokenize a string.
- Arguments
text (AnyString()) – The text to tokenize, either a string or a BiString().
- Returns
Tokenization – A Tokenization() holding the text and its tokens.
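Here the pattern describes the gaps rather than the tokens. The standalone sketch below illustrates that idea with the built-in `RegExp` only. It is not the library's implementation: `GapToken` and `tokenizeBySplit` are hypothetical names, it works on plain strings, and dropping empty tokens at the edges or between adjacent separators is an assumption.

```typescript
// Hypothetical token record for this sketch only.
interface GapToken {
  text: string;
  start: number;
  end: number;
}

// Every non-empty stretch of text between matches of `pattern` becomes one token.
function tokenizeBySplit(text: string, pattern: RegExp): GapToken[] {
  // exec() only advances through the string when the global flag is set.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const separator = new RegExp(pattern.source, flags);
  const tokens: GapToken[] = [];
  let last = 0; // offset where the current token would begin
  let match: RegExpExecArray | null;
  while ((match = separator.exec(text)) !== null) {
    const gap = match[0] ?? "";
    if (gap.length === 0) {
      separator.lastIndex++; // assumption: ignore empty separators and move on
      continue;
    }
    if (match.index > last) {
      tokens.push({ text: text.slice(last, match.index), start: last, end: match.index });
    }
    last = match.index + gap.length;
  }
  if (last < text.length) {
    tokens.push({ text: text.slice(last), start: last, end: text.length });
  }
  return tokens;
}

const parts = tokenizeBySplit("  a b  ", /\s+/);
console.log(parts.map(t => `${t.text}@${t.start}`)); // [ 'a@2', 'b@4' ]
```

Unlike `String.prototype.split`, this sketch keeps each token's offsets in the original text.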