Tokenizer
- class Tokenizer()
A tokenizer that produces Tokenization() objects. This is an interface.
- Tokenizer.tokenize(text)
Tokenize a string.
- Arguments
text (AnyString()) – The text to tokenize, either a string or a BiString().
- Returns
Tokenization – A Tokenization() holding the text and its tokens.
- class RegExpTokenizer(pattern)
Breaks text into tokens based on a RegExp().
- Implements:
Create a RegExpTokenizer.
- Arguments
pattern (RegExp()) – The regex that will match tokens.
- RegExpTokenizer.tokenize(text)
Tokenize a string.
- Arguments
text (AnyString()) – The text to tokenize, either a string or a BiString().
- Returns
Tokenization – A Tokenization() holding the text and its tokens.
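To illustrate the idea of a pattern that matches the tokens themselves, here is a minimal standalone sketch. It does not use the library: `regexTokenize` and the `{ text, start, end }` token shape are hypothetical stand-ins, assumed only for illustration, and the real RegExpTokenizer returns a Tokenization() rather than a plain array.

```javascript
// Hypothetical stand-in, not the library's implementation: collect every
// regex match as a token with its text and [start, end) offsets.
function regexTokenize(text, pattern) {
  // String.prototype.matchAll requires the global flag, so add it if missing.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const matcher = new RegExp(pattern.source, flags);

  const tokens = [];
  for (const match of text.matchAll(matcher)) {
    // Skip zero-width matches so a pattern like /\w*/ yields no empty tokens.
    if (match[0].length === 0) continue;
    tokens.push({
      text: match[0],
      start: match.index,
      end: match.index + match[0].length,
    });
  }
  return tokens;
}

const wordTokens = regexTokenize("The quick, brown fox", /\w+/);
// Logs the four words: The, quick, brown, fox
console.log(wordTokens.map((t) => t.text));
```

Keeping the offsets alongside the text is what lets each token be traced back to its position in the original string.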
- class SplittingTokenizer(pattern)
Splits text into tokens based on a RegExp().
- Implements:
Create a SplittingTokenizer.
- Arguments
pattern (RegExp()) – A regex that matches the regions between tokens.
- SplittingTokenizer.tokenize(text)
Tokenize a string.
- Arguments
text (AnyString()) – The text to tokenize, either a string or a BiString().
- Returns
Tokenization – A Tokenization() holding the text and its tokens.
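The difference from matching tokens directly is that the pattern here describes the separators, and the tokens are whatever lies between them. A minimal standalone sketch of that idea follows. It does not use the library: `splitTokenize` and the `{ text, start, end }` token shape are hypothetical, and dropping empty tokens at the edges is an assumption of this sketch rather than documented behavior.

```javascript
// Hypothetical stand-in, not the library's implementation: treat every regex
// match as a separator and emit the non-empty spans between separators.
function splitTokenize(text, pattern) {
  // String.prototype.matchAll requires the global flag, so add it if missing.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const separator = new RegExp(pattern.source, flags);

  const tokens = [];
  let last = 0; // offset just past the previous separator
  for (const match of text.matchAll(separator)) {
    // Emit the span before this separator, unless it is empty.
    if (match.index > last) {
      tokens.push({ text: text.slice(last, match.index), start: last, end: match.index });
    }
    last = match.index + match[0].length;
  }
  // Emit whatever follows the final separator.
  if (last < text.length) {
    tokens.push({ text: text.slice(last), start: last, end: text.length });
  }
  return tokens;
}

const splitTokens = splitTokenize("  The quick brown  fox ", /\s+/);
// Logs the four words: The, quick, brown, fox
console.log(splitTokens.map((t) => t.text));
```

With a whitespace separator, leading, trailing, and repeated spaces produce no tokens, so the result contains only the words and their offsets.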