Parsed document content is iterable, and each element has numerous attributes produced by the processing pipeline. The following sample illustrates how to access these attributes:
- .text: Original word text
- .lemma_: Word root
- .pos_: Basic POS tag
- .tag_: Detailed POS tag
- .dep_: Syntactic relationship or dependency between tokens
- .shape_: The word's shape with respect to capitalization, punctuation, and digits
- .is_alpha: Whether the token consists of alphabetic characters
- .is_stop: Whether the token is on a list of common words for the given language
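Note that the lexical attributes (.text, .shape_, .is_alpha, .is_stop) are available even without a trained model, while .lemma_, .pos_, .tag_, and .dep_ require a trained pipeline such as en_core_web_sm. A minimal sketch using a blank English pipeline to inspect the lexical attributes of a single token:

```python
import spacy

# A blank pipeline provides tokenization and lexical attributes only;
# tagging, parsing, and lemmatization need a trained pipeline.
nlp = spacy.blank('en')
token = nlp('Apple')[0]
print(token.text, token.shape_, token.is_alpha, token.is_stop)
```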
We iterate over each token and assign its attributes to a pd.DataFrame:
import pandas as pd

# doc is the parsed sample sentence produced by the spaCy pipeline
pd.DataFrame([[t.text, t.lemma_, t.pos_, t.tag_, t.dep_, t.shape_, t.is_alpha, t.is_stop] for t in doc],
             columns=['text', 'lemma', 'pos', 'tag', 'dep', 'shape', 'is_alpha', 'is_stop'])
This produces the following output:
| text    | lemma   | pos   | tag | dep      | shape | is_alpha | is_stop |
|---------|---------|-------|-----|----------|-------|----------|---------|
| Apple   | apple   | PROPN | NNP | nsubj    | Xxxxx | True     | False   |
| is      | be      | VERB  | VBZ | aux      | xx    | True     | True    |
| looking | look    | VERB  | VBG | ROOT     | xxxx  | True     | False   |
| at      | at      | ADP   | IN  | prep     | xx    | True     | True    |
| buying  | buy     | VERB  | VBG | pcomp    | xxxx  | True     | False   |
| U.K.    | u.k.    | PROPN | NNP | compound | X.X.  | False    | False   |
| startup | startup | NOUN  | NN  | dobj     | xxxx  | True     | False   |
| for     | for     | ADP   | IN  | prep     | xxx   | True     | True    |
| $       | $       | SYM   | $   | quantmod | $     | False    | False   |
| 1       | 1       | NUM   | CD  | compound | d     | False    | False   |
| billion | billion | NUM   | CD  | pobj     | xxxx  | True     | False   |
We can visualize the syntactic dependencies in a browser or notebook as follows:
from spacy import displacy

displacy.render(doc, style='dep', options=options, jupyter=True)
The result is a dependency tree:
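Outside a notebook, passing jupyter=False makes displacy.render() return the SVG markup as a string, which can be saved to a file. A sketch, using a manually annotated Doc so the example runs without a trained parser (the words, heads, and dependency labels below are supplied by hand for illustration):

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

nlp = spacy.blank('en')
# heads are absolute token indices; the ROOT points to itself
doc = Doc(nlp.vocab, words=['Apple', 'is', 'looking'],
          heads=[2, 2, 2], deps=['nsubj', 'aux', 'ROOT'])

svg = displacy.render(doc, style='dep', jupyter=False)
with open('dep_tree.svg', 'w', encoding='utf-8') as f:
    f.write(svg)
```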
We can get additional insight into the meaning of these attributes using spacy.explain(), as here:
spacy.explain("VBZ")
'verb, 3rd person singular present'
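spacy.explain() works for coarse POS tags and dependency labels as well as for fine-grained tags, and it does not require a loaded model:

```python
import spacy

print(spacy.explain('PROPN'))  # coarse part-of-speech tag
print(spacy.explain('nsubj'))  # dependency label
```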