Finding coreference resolution entities

Coreference is the occurrence of two or more expressions in a text that refer to the same person or entity. Coreference resolution is the task of finding these expressions and linking them together. Consider the following sentence:

"He took his cash and she took her change and together they bought their lunch."

There are several coreferences in this sentence. The word his refers to He, and the word her refers to she. In addition, they and their refer to He and she together.

An endophora is an expression that refers to another expression that either precedes or follows it in the text. Endophoras are classified as anaphors, which refer back to an earlier expression, or cataphors, which refer forward to a later one. In the following sentence, the word It is an anaphor that refers to its antecedent, the earthquake:

"Mary felt the earthquake. It shook the entire building."

In the next sentence, she is a cataphor, as it points to the postcedent, Mary:

"As she sat there, Mary felt the earthquake."

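The distinction depends only on whether the referring expression comes after or before the expression it points to. The following is a minimal sketch of that rule and is not part of the Stanford API. The class name EndophoraKind, the classify method, and the use of zero-based token positions are assumptions made for illustration:

```java
public class EndophoraKind {
    // Hypothetical helper: positions are zero-based token offsets in the text.
    public static String classify(int expressionPos, int referentPos) {
        if (expressionPos == referentPos) {
            throw new IllegalArgumentException(
                "An expression cannot refer to itself");
        }
        // An anaphor follows its antecedent; a cataphor precedes its postcedent.
        return expressionPos > referentPos ? "anaphor" : "cataphor";
    }

    public static void main(String[] args) {
        // "Mary felt the earthquake . It shook the entire building ."
        // "earthquake" is token 3 and "It" is token 5.
        System.out.println(classify(5, 3));
        // "As she sat there , Mary felt the earthquake ."
        // "she" is token 1 and "Mary" is token 5.
        System.out.println(classify(1, 5));
    }
}
```

The first call reports an anaphor and the second a cataphor, matching the two example sentences.
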
The Stanford API supports coreference resolution with the StanfordCoreNLP class using the dcoref annotator. We will demonstrate the use of this class with the first example sentence.

We will start with the creation of the pipeline and the use of the annotate method, as shown here:

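import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

String sentence = "He took his cash and she took her change "
    + "and together they bought their lunch.";
Properties props = new Properties();
// dcoref depends on the output of all the annotators listed before it
props.setProperty("annotators",
    "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation annotation = new Annotation(sentence);
pipeline.annotate(annotation);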

The Annotation class's get method, when called with an argument of CorefChainAnnotation.class, returns a Map of CorefChain objects, as shown here. These objects contain information about the coreferences found in the sentence:

Map<Integer, CorefChain> corefChainMap =  
    annotation.get(CorefChainAnnotation.class); 

The CorefChain objects are indexed by integer keys. We can iterate over these objects, as shown in the following code. The key set is obtained and then each CorefChain object is displayed:

Set<Integer> set = corefChainMap.keySet(); 
Iterator<Integer> setIterator = set.iterator(); 
while(setIterator.hasNext()) { 
    CorefChain corefChain =  
        corefChainMap.get(setIterator.next()); 
    System.out.println("CorefChain: " + corefChain); 
} 

The following output is generated:

CorefChain: CHAIN1-["He" in sentence 1, "his" in sentence 1]
CorefChain: CHAIN2-["his cash" in sentence 1]
CorefChain: CHAIN4-["she" in sentence 1, "her" in sentence 1]
CorefChain: CHAIN5-["her change" in sentence 1]
CorefChain: CHAIN7-["they" in sentence 1, "their" in sentence 1]
CorefChain: CHAIN8-["their lunch" in sentence 1]
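
The iterator loop shown earlier can also be written as a for-each loop over the map's entries, which gives the key and the value in one step. The following is a sketch of that pattern only. So that it runs without the Stanford libraries, lists of mention strings stand in for CorefChain objects, and the ChainIteration class and its describe method are hypothetical names:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChainIteration {
    // Formats one line per chain, visiting entries in the map's iteration order.
    public static String describe(Map<Integer, List<String>> chains) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, List<String>> entry : chains.entrySet()) {
            sb.append("Chain ").append(entry.getKey())
              .append(": ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data: mention strings replace the CorefChain objects.
        Map<Integer, List<String>> chains = new LinkedHashMap<>();
        chains.put(1, Arrays.asList("He", "his"));
        chains.put(4, Arrays.asList("she", "her"));
        chains.put(7, Arrays.asList("they", "their"));
        System.out.print(describe(chains));
    }
}
```

With the real map, the same loop would iterate over Map.Entry<Integer, CorefChain> values in place of the stand-in lists.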

We can get more detailed information using methods of the CorefChain and CorefMention classes. The latter class contains information about a specific mention within a chain.

Add the following code sequence to the body of the previous while loop to obtain and display this information. The startIndex and endIndex fields of the CorefMention class give the position of the mention's words in the sentence:

System.out.print("ClusterId: " + corefChain.getChainID());
CorefMention mention = corefChain.getRepresentativeMention();
System.out.println(" CorefMention: " + mention
    + " Span: [" + mention.mentionSpan + "]");

List<CorefMention> mentionList =
    corefChain.getMentionsInTextualOrder();
Iterator<CorefMention> mentionIterator =
    mentionList.iterator();
while(mentionIterator.hasNext()) {
    CorefMention cfm = mentionIterator.next();
    // Use cfm, not the representative mention, for this mention's span
    System.out.println("\tMention: " + cfm
        + " Span: [" + cfm.mentionSpan + "]");
    System.out.print("\tMention Type: "
        + cfm.mentionType + " Gender: " + cfm.gender);
    System.out.println(" Start: " + cfm.startIndex
        + " End: " + cfm.endIndex);
}
System.out.println();

The output is as follows. Only the first and last chains are displayed to conserve space:

    CorefChain: CHAIN1-["He" in sentence 1, "his" in sentence 1]
    ClusterId: 1 CorefMention: "He" in sentence 1 Span: [He]
      Mention: "He" in sentence 1 Span: [He]
      Mention Type: PRONOMINAL Gender: MALE Start: 1 End: 2
      Mention: "his" in sentence 1 Span: [his]
      Mention Type: PRONOMINAL Gender: MALE Start: 3 End: 4
    ...
    CorefChain: CHAIN8-["their lunch" in sentence 1]
    ClusterId: 8 CorefMention: "their lunch" in sentence 1 Span: [their lunch]
      Mention: "their lunch" in sentence 1 Span: [their lunch]
      Mention Type: NOMINAL Gender: UNKNOWN Start: 14 End: 16
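
The Start and End values in this output suggest that token positions are counted from 1 and that the end index is exclusive: He occupies 1 to 2, and the two-word mention their lunch occupies 14 to 16. The following sketch recovers a mention's text from a token list under that assumption. The MentionSpan class, its span method, and the hand-written token list are hypothetical and are not part of the Stanford API:

```java
import java.util.Arrays;
import java.util.List;

public class MentionSpan {
    // Assumes start is a 1-based token index and end is exclusive,
    // as the Start/End values in the output suggest.
    public static String span(List<String> tokens, int start, int end) {
        return String.join(" ", tokens.subList(start - 1, end - 1));
    }

    public static void main(String[] args) {
        // Hand-written tokens for the example sentence; the final period
        // is a token of its own.
        List<String> tokens = Arrays.asList(
            "He", "took", "his", "cash", "and", "she", "took", "her",
            "change", "and", "together", "they", "bought", "their",
            "lunch", ".");
        System.out.println(span(tokens, 1, 2));   // He
        System.out.println(span(tokens, 14, 16)); // their lunch
    }
}
```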
  
