Using the StreamTokenizer class

The StreamTokenizer class, found in the java.io package, is designed to tokenize an input stream. It is an older class and is not as flexible as the StringTokenizer class discussed in the Using the StringTokenizer class section. An instance of the class is normally created based on a file and will tokenize the text found in the file. It can also tokenize a string if the string is first wrapped in a StringReader.
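As a minimal sketch of both ways of creating a tokenizer, the following class counts the word tokens supplied by any Reader. The class name TokenizerSources and its method names are hypothetical and chosen only for illustration; the file is read through Files.newBufferedReader, which is one of several reasonable ways to obtain a Reader for a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, used only to illustrate tokenizer creation
public class TokenizerSources {

    // Counts the word tokens produced from any Reader
    static int countWords(Reader reader) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        int words = 0;
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                words++;
            }
        }
        return words;
    }

    // Tokenizes the text found in a file
    static int countWordsInFile(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            return countWords(reader);
        }
    }

    // Tokenizes a string by wrapping it in a StringReader
    static int countWordsInString(String text) throws IOException {
        return countWords(new StringReader(text));
    }
}
```

Because the constructor accepts any Reader, the same counting logic serves both the file and the string case.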

The class uses the nextToken method to read the next token in the stream. The method returns an integer whose value reflects the type of token read; the same value is stored in the ttype field. Based on the token type, the token can be handled in different ways.

The StreamTokenizer class fields are shown in the following table:

Field       Data type    Meaning
nval        double       Contains a number if the current token is a number
sval        String       Contains the token if the current token is a word token
TT_EOF      static int   A constant for the end of the stream
TT_EOL      static int   A constant for the end of the line
TT_NUMBER   static int   A constant indicating a number token
TT_WORD     static int   A constant indicating a word token
ttype       int          The type of token read
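To make the relationship between ttype, sval, and nval concrete, here is a small sketch that labels each token with its type. The class name TokenFieldsSketch, the describe method, and the sample text are hypothetical and used only for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only to illustrate the tokenizer fields
public class TokenFieldsSketch {

    // Returns one label per token, based on the token type
    static List<String> describe(String text) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> result = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            switch (tokenizer.ttype) {
                case StreamTokenizer.TT_WORD:
                    // Word tokens are found in sval
                    result.add("WORD:" + tokenizer.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:
                    // Number tokens are found in nval, always as a double
                    result.add("NUMBER:" + tokenizer.nval);
                    break;
                default:
                    // Any other token is a single ordinary character
                    result.add("CHAR:" + (char) tokenizer.ttype);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Prints [WORD:width, NUMBER:42.0, WORD:height, NUMBER:3.5]
        System.out.println(describe("width 42 height 3.5"));
    }
}
```

Note that nval is a double, so the integer 42 is reported as 42.0.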

 

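In the following example, a tokenizer is created, followed by the declaration of the isEOF variable, which is used to terminate the loop. The nextToken method returns the token type. Based on the token type, numeric and string tokens are displayed. The code assumes that the StreamTokenizer, StringReader, and IOException classes have been imported from the java.io package: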

try { 
    StreamTokenizer tokenizer = new StreamTokenizer( 
          new StringReader("Let's pause, and then reflect.")); 
    boolean isEOF = false; 
    while (!isEOF) { 
        int token = tokenizer.nextToken(); 
        switch (token) { 
            case StreamTokenizer.TT_EOF: 
                isEOF = true; 
                break; 
            case StreamTokenizer.TT_EOL: 
                break; 
            case StreamTokenizer.TT_WORD: 
                System.out.println(tokenizer.sval); 
                break; 
            case StreamTokenizer.TT_NUMBER: 
                System.out.println(tokenizer.nval); 
                break; 
            default: 
                System.out.println((char) token); 
        } 
    } 
} catch (IOException ex) { 
    // Handle the exception 
} 

When executed, we get the following output:

Let
'  

This is not what we would normally expect. The problem is that the tokenizer uses apostrophes (single quote characters) and double quotes to delimit quoted text. Since there is no matching closing apostrophe, the tokenizer consumes the rest of the string as the body of the quoted text.
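The consumed text is not lost. When a quote character starts a token, nextToken returns that character as the token type and places the body of the quoted text in sval. The following sketch shows this; the class name QuoteTokenSketch and the quotedRemainder method are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Hypothetical helper class, used only to illustrate quoted-text handling
public class QuoteTokenSketch {

    // Returns the body of the first apostrophe-quoted token, or null if none
    static String quotedRemainder(String text) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == '\'') {
                // For a quote token, sval holds the text after the quote
                return tokenizer.sval;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Prints: s pause, and then reflect.
        System.out.println(quotedRemainder("Let's pause, and then reflect."));
    }
}
```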

We can use the ordinaryChar method to specify which characters should be treated as ordinary characters, that is, characters with no special meaning to the tokenizer. The single quote and comma characters are designated as ordinary characters here:

tokenizer.ordinaryChar('\''); 
tokenizer.ordinaryChar(','); 

When these statements are added to the previous code and executed, we get the following output:

Let
'
s
pause
,
and
then
reflect.  

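The apostrophe is no longer a problem. The apostrophe and the comma are treated as delimiters and returned as tokens. The period stays attached to the last word because, by default, the tokenizer treats the period as a number constituent, and number constituents may appear inside a word token. There is also a whitespaceChars method available that specifies which characters are to be treated as whitespace.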
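As a sketch of the whitespaceChars method, the following code treats the comma and the period as whitespace so that they are skipped rather than returned as tokens, while the apostrophe remains an ordinary character. The class name WhitespaceCharsSketch and the tokens method are hypothetical, and treating punctuation as whitespace is only one possible design choice.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only to illustrate whitespaceChars
public class WhitespaceCharsSketch {

    // Collects word tokens and ordinary characters as strings
    static List<String> tokens(String text) throws IOException {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.ordinaryChar('\'');        // apostrophe is returned as a token
        tokenizer.whitespaceChars(',', ','); // comma is skipped
        tokenizer.whitespaceChars('.', '.'); // period is skipped
        List<String> result = new ArrayList<>();
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                result.add(tokenizer.sval);
            } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                result.add(String.valueOf(tokenizer.nval));
            } else {
                result.add(String.valueOf((char) tokenizer.ttype));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Prints [Let, ', s, pause, and, then, reflect]
        System.out.println(tokens("Let's pause, and then reflect."));
    }
}
```

The whitespaceChars method takes a low and a high character value, so a single character is specified by passing it twice.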
