Extracting relationships

Relationship extraction identifies relationships that exist in text. For example, in the sentence "The meaning and purpose of life is plain to see," the subject is "The meaning and purpose of life." The rest of the sentence relates that subject to a property: it is "plain to see."

Humans are good at determining how things are related to each other, at least at a high level. Determining deeper relationships is more difficult. Using a computer to extract relationships is also challenging. However, computers can process large datasets to find relationships that would not be obvious to a human, or that a human could not find in a reasonable period of time.

Numerous kinds of relationships are possible. These include where something is located, how two people are related to each other, the parts of a system, and who is in charge. Relationship extraction is useful for a number of tasks, including building knowledge bases, performing trend analysis, gathering intelligence, and performing product searches. Finding relationships in this way is sometimes grouped under the broader term text analytics.

There are several techniques that we can use to perform relationship extraction. These are covered in more detail in Chapter 10, Using Parser to Extract Relationships. Here, we will illustrate one technique for identifying relationships within a sentence, using the StanfordCoreNLP class from the Stanford NLP library. This class supports a pipeline where annotators are specified and applied to text. Annotators can be thought of as operations to be performed. When an instance of the class is created, the annotators are specified using a Properties object, a class found in the java.util package.

First, create an instance of the Properties class. Then, assign the annotators as follows:

Properties properties = new Properties();
// Annotators run in the order listed here
properties.setProperty("annotators", "tokenize, ssplit, parse");

We used three annotators, which specify the operations to be performed. In this case, these are the minimum required to parse the text. The first one, tokenize, splits the text into tokens. The ssplit annotator groups the tokens into sentences. The last annotator, parse, performs the syntactic analysis of each sentence.
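
The order of the names in the annotators value matters, because ssplit works on the tokens that tokenize produces, and parse works on the sentences that ssplit produces. The following is a minimal sketch of that idea using only java.util. The AnnotatorOrder class and its prerequisite table are hypothetical and are not part of the Stanford library; the table simply restates the dependencies described above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper for illustration; not part of Stanford CoreNLP.
final class AnnotatorOrder {
    // Assumed prerequisites, restating the description in the text.
    private static final Map<String, List<String>> REQUIRES = new HashMap<>();
    static {
        REQUIRES.put("tokenize", Collections.<String>emptyList());
        REQUIRES.put("ssplit", Arrays.asList("tokenize"));
        REQUIRES.put("parse", Arrays.asList("tokenize", "ssplit"));
    }

    // Splits the "annotators" value into names, preserving their order.
    static List<String> names(Properties properties) {
        List<String> names = new ArrayList<>();
        String value = properties.getProperty("annotators", "");
        for (String name : value.split("[,\\s]+")) {
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // True when each annotator is listed after the annotators it depends on.
    static boolean isValidOrder(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            List<String> needed =
                REQUIRES.getOrDefault(name, Collections.<String>emptyList());
            if (!seen.containsAll(needed)) {
                return false;
            }
            seen.add(name);
        }
        return true;
    }

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty("annotators", "tokenize, ssplit, parse");
        List<String> names = names(properties);
        System.out.println(names + " valid order: " + isValidOrder(names));
    }
}
```

With the value used in this section, the helper reports a valid order. Listing parse first would fail the check, since its prerequisites would not have run yet.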

Next, create an instance of the StanfordCoreNLP class, passing it the properties variable:

StanfordCoreNLP pipeline = new StanfordCoreNLP(properties); 

Then, create an Annotation instance, passing the text to its constructor:

Annotation annotation = new Annotation( 
    "The meaning and purpose of life is plain to see."); 

Call the annotate method on the pipeline object to process the annotation object. Finally, use the prettyPrint method to display the result of the processing:

pipeline.annotate(annotation); 
pipeline.prettyPrint(annotation, System.out); 

The output of this code is shown as follows:

    Sentence #1 (11 tokens):
    The meaning and purpose of life is plain to see.
    [Text=The CharacterOffsetBegin=0 CharacterOffsetEnd=3 PartOfSpeech=DT] [Text=meaning CharacterOffsetBegin=4 CharacterOffsetEnd=11 PartOfSpeech=NN] [Text=and CharacterOffsetBegin=12 CharacterOffsetEnd=15 PartOfSpeech=CC] [Text=purpose CharacterOffsetBegin=16 CharacterOffsetEnd=23 PartOfSpeech=NN] [Text=of CharacterOffsetBegin=24 CharacterOffsetEnd=26 PartOfSpeech=IN] [Text=life CharacterOffsetBegin=27 CharacterOffsetEnd=31 PartOfSpeech=NN] [Text=is CharacterOffsetBegin=32 CharacterOffsetEnd=34 PartOfSpeech=VBZ] [Text=plain CharacterOffsetBegin=35 CharacterOffsetEnd=40 PartOfSpeech=JJ] [Text=to CharacterOffsetBegin=41 CharacterOffsetEnd=43 PartOfSpeech=TO] [Text=see CharacterOffsetBegin=44 CharacterOffsetEnd=47 PartOfSpeech=VB] [Text=. CharacterOffsetBegin=47 CharacterOffsetEnd=48 PartOfSpeech=.] 
    (ROOT
      (S
        (NP
          (NP (DT The) (NN meaning)
            (CC and)
            (NN purpose))
          (PP (IN of)
            (NP (NN life))))
        (VP (VBZ is)
          (ADJP (JJ plain)
            (S
              (VP (TO to)
                (VP (VB see))))))
        (. .)))
    
    root(ROOT-0, plain-8)
    det(meaning-2, The-1)
    nsubj(plain-8, meaning-2)
    conj_and(meaning-2, purpose-4)
    prep_of(meaning-2, life-6)
    cop(plain-8, is-7)
    aux(see-10, to-9)
    xcomp(plain-8, see-10)
  

The first part of the output displays the text along with each token, its character offsets, and its part-of-speech (POS) tag. This is followed by a tree-like structure that shows the phrase structure of the sentence. The last part shows the grammatical relationships between the words. Consider the following example:

prep_of(meaning-2, life-6)  

This shows how the preposition, of, is used to relate the words meaning and life. The number after each word is its position in the sentence. This information is useful for many text-simplification tasks.
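
To work with such a line in code, it can be split into its relation name, the two words, and their positions. The following is a minimal sketch that parses the printed form with a regular expression. The DependencyLine class is hypothetical and is not part of the Stanford library, and it assumes lines shaped like those in the preceding output:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Stanford CoreNLP.
final class DependencyLine {
    // Matches relation(governor-index, dependent-index),
    // for example prep_of(meaning-2, life-6).
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\w+)\\((.+)-(\\d+),\\s*(.+)-(\\d+)\\)\\s*$");

    final String relation;
    final String governor;
    final int governorIndex;
    final String dependent;
    final int dependentIndex;

    private DependencyLine(String relation, String governor, int governorIndex,
                           String dependent, int dependentIndex) {
        this.relation = relation;
        this.governor = governor;
        this.governorIndex = governorIndex;
        this.dependent = dependent;
        this.dependentIndex = dependentIndex;
    }

    // Parses one printed dependency line; throws if the line has another shape.
    static DependencyLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a dependency line: " + line);
        }
        return new DependencyLine(m.group(1),
            m.group(2), Integer.parseInt(m.group(3)),
            m.group(4), Integer.parseInt(m.group(5)));
    }

    // Returns the word folded into the relation name ("of" for prep_of),
    // or an empty string when the name contains no underscore.
    String collapsedWord() {
        int underscore = relation.indexOf('_');
        return underscore < 0 ? "" : relation.substring(underscore + 1);
    }

    public static void main(String[] args) {
        DependencyLine d = parse("prep_of(meaning-2, life-6)");
        // Prints: meaning -[of]-> life
        System.out.println(
            d.governor + " -[" + d.collapsedWord() + "]-> " + d.dependent);
    }
}
```

Parsing the printed text is only a convenience for exploring the output. The same information is held in the annotation object itself, which is the better source when the results need to be processed further.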
