Comparing tokenizers

The following table briefly compares the tokenizers from the NLP APIs. The tokens that each tokenizer generates are listed under its name, and all of them are produced from the same text: "Let's pause, and then reflect." Keep in mind that the output reflects a simple use of each class; options not used in the examples may change how the tokens are generated. The intent is simply to show the type of output to expect from the sample code and data:

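| `SimpleTokenizer` | `WhitespaceTokenizer` | `TokenizerME` | `PTBTokenizer` | `DocumentPreprocessor` | `IndoEuropeanTokenizerFactory` |
| --- | --- | --- | --- | --- | --- |
| Let | Let's | Let | Let | Let | Let |
| ' | pause, | 's | 's | 's | ' |
| s | and | pause | pause | pause | s |
| pause | then | , | , | , | pause |
| , | reflect. | and | and | and | , |
| and | | then | then | then | and |
| then | | reflect | reflect | reflect | then |
| reflect | | . | . | . | reflect |
| . | | | | | . |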
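
To make the difference between the first two columns concrete, here is a minimal sketch that uses only the Java standard library. It does not call OpenNLP; the class `TokenizerSketch` and its methods `charClassTokens` and `whitespaceTokens` are hypothetical names introduced for illustration. The sketch assumes that the `SimpleTokenizer` column can be roughly approximated by splitting wherever the character class (letter, digit, whitespace, other) changes, and the `WhitespaceTokenizer` column by splitting on whitespace alone. The real classes may differ in details this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, standard-library-only sketch; it does not use the OpenNLP classes.
class TokenizerSketch {

    // Coarse character classes used to decide where one token ends.
    private enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Split on runs of whitespace only, so punctuation stays attached to words.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new ArrayList<>();
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Start a new token whenever the character class changes; whitespace is dropped.
    static List<String> charClassTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharClass previous = CharClass.WHITESPACE;
        for (char c : text.toCharArray()) {
            CharClass cls = classify(c);
            if (cls != previous && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (cls != CharClass.WHITESPACE) current.append(c);
            previous = cls;
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Let's pause, and then reflect.";
        System.out.println(charClassTokens(text));
        System.out.println(whitespaceTokens(text));
    }
}
```

For the sample sentence, the two lists contain the same tokens as the `SimpleTokenizer` and `WhitespaceTokenizer` columns. The other columns split the contraction as `Let` and `'s`, which needs more than character classes, so the sketch does not attempt them.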
