The StreamTokenizer class, found in the java.io package, is designed to tokenize an input stream. It is an older class and is not as flexible as the StringTokenizer class discussed in the Using the StringTokenizer class section. An instance of the class is normally created based on a file and will tokenize the text found in the file. Its constructor accepts a Reader, so it can also be used with a string by wrapping the string in a StringReader.
The class uses the nextToken method to return the next token in the stream. The value returned is an integer that reflects the type of the token; the same value is also stored in the ttype field. Based on the token type, the token can be handled in different ways.
The StreamTokenizer class fields are shown in the following table:
Field | Data type | Meaning
nval | double | Contains the value of the number if the current token is a number
sval | String | Contains the token if the current token is a word token
TT_EOF | static int | A constant for the end of the stream
TT_EOL | static int | A constant for the end of the line
TT_NUMBER | static int | A constant indicating a number token
TT_WORD | static int | A constant indicating a word token
ttype | int | The type of token read
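To make the roles of these fields concrete, here is a minimal sketch that reads a short string containing both words and numbers. The class name TokenFieldsDemo, the helper describeTokens, and the sample text are hypothetical names chosen for illustration; only the StreamTokenizer fields and the nextToken method come from the class itself.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenFieldsDemo {

    // Hypothetical helper: labels each token using the ttype, sval, and nval fields
    static List<String> describeTokens(String text) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> result = new ArrayList<>();
        // nextToken returns the token type and also stores it in ttype
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                result.add("WORD:" + tokenizer.sval);     // word text is in sval
            } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                result.add("NUMBER:" + tokenizer.nval);   // numeric value is in nval
            } else {
                // Any other token is a single character held in ttype
                result.add("CHAR:" + (char) tokenizer.ttype);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String description : describeTokens("Total 42 items 3.5")) {
            System.out.println(description);
        }
    }
}
```

Note that nval is always a double, so a token such as 42 is reported as 42.0.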
In the following example, a tokenizer is created, followed by the declaration of the isEOF variable, which is used to terminate the loop. The nextToken method returns the token type. Based on the token type, numeric and string tokens are displayed, and any other token is displayed as a single character:
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

try {
    // Wrap the string in a StringReader so that it can be tokenized
    StreamTokenizer tokenizer = new StreamTokenizer(
        new StringReader("Let's pause, and then reflect."));
    boolean isEOF = false;
    while (!isEOF) {
        // nextToken returns the type of the token that was read
        int token = tokenizer.nextToken();
        switch (token) {
            case StreamTokenizer.TT_EOF:
                isEOF = true;
                break;
            case StreamTokenizer.TT_EOL:
                break;
            case StreamTokenizer.TT_WORD:
                System.out.println(tokenizer.sval);
                break;
            case StreamTokenizer.TT_NUMBER:
                System.out.println(tokenizer.nval);
                break;
            default:
                // Any other token is a single character
                System.out.println((char) token);
        }
    }
} catch (IOException ex) {
    // Handle the exception
}
When executed, we get the following output:
Let
'
This is not what we would normally expect. The problem is that the tokenizer uses apostrophes (single quote characters) and double quotes to denote quoted text. Since there is no matching closing quote, it consumes the rest of the string as quoted text, and the default case of the switch statement displays only the quote character.
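The consumed text is not lost. When a quote character is encountered, nextToken returns that character as the token type and places the body of the quoted text in the sval field. The following sketch illustrates this; the class name QuoteTokenDemo and the helper firstQuotedText are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class QuoteTokenDemo {

    // Hypothetical helper: returns the text captured by the first quote token,
    // or null if the input contains no quote character
    static String firstQuotedText(String text) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        int type;
        while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
            // By default, both ' and " are treated as quote characters
            if (type == '\'' || type == '"') {
                return tokenizer.sval;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // The unmatched apostrophe swallows everything up to the end of the input
        System.out.println(firstQuotedText("Let's pause, and then reflect."));
    }
}
```

For the sample sentence, the captured text is everything after the apostrophe, which is why none of the later words appear as separate tokens.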
We can use the ordinaryChar method to specify which characters should be treated as ordinary characters, that is, characters with no special meaning to the tokenizer. The single quote and comma characters are designated as ordinary characters here:
tokenizer.ordinaryChar('\'');
tokenizer.ordinaryChar(',');
When these statements are added to the previous code and executed, we get the following output:
Let
'
s
pause
,
and
then
reflect.
The apostrophe is not a problem now. These two characters are treated as delimiters and returned as tokens. There is also a whitespaceChars method available that takes a range of character values and specifies that the characters in that range are to be treated as whitespace.
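As a rough sketch of the whitespaceChars approach, the following code marks the apostrophe and the comma as whitespace so that they separate words without being returned as tokens. The class name WhitespaceCharsDemo and the helper words are hypothetical; the whitespaceChars method takes the low and high ends of a character range, so a single character is passed as both arguments.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class WhitespaceCharsDemo {

    // Hypothetical helper: collects word tokens, treating ' and , as whitespace
    static List<String> words(String text) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        // Each call covers a range of characters; here each range is one character
        tokenizer.whitespaceChars('\'', '\'');
        tokenizer.whitespaceChars(',', ',');
        List<String> result = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                result.add(tokenizer.sval);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String word : words("Let's pause, and then reflect.")) {
            System.out.println(word);
        }
    }
}
```

With this configuration the apostrophe and comma no longer appear in the output. The trailing period remains attached to the last word because, by default, the tokenizer treats the period as part of a number and therefore as a valid word constituent.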