Building a new dataset with the NER annotation tool

There are many annotation tools available in different forms. Some are standalone and can be configured or installed on a local machine, some are cloud-based, some are free, and some are paid. In this section, we will focus on free annotation tools, get an idea of how to use them, and see what we can achieve with annotation.

To see how we can use annotations to create a dataset, we will look at these tools:

  • brat
  • Stanford Annotator

brat stands for brat rapid annotation tool and can be found at http://brat.nlplab.org/index.html. It can be used online or offline. To install it on your local machine, follow the steps listed at http://brat.nlplab.org/installation.html. Once it is installed and running, open it in the browser. Then create a text1.txt file in the data/test directory with the following content:

Joe was the last person to see Fred. He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale. Joe wanted to go to Vermont for the day to visit a cousin who works at IBM, but Sally and he had to look for Fred.
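If you prefer to script this step, the file can be created as in the following minimal sketch. The function name create_brat_document is hypothetical, and the sketch assumes that data/test is resolved relative to the brat installation directory. It also creates an empty text1.ann next to the text file, on the assumption that your brat setup expects an annotation file to exist for each document:

```python
from pathlib import Path

# Sample text from this section, kept as a single line.
SAMPLE_TEXT = (
    "Joe was the last person to see Fred. He saw him in Boston at "
    "McKenzie's pub at 3:00 where he paid $2.45 for an ale. Joe wanted "
    "to go to Vermont for the day to visit a cousin who works at IBM, "
    "but Sally and he had to look for Fred."
)


def create_brat_document(data_dir, name="text1", text=SAMPLE_TEXT):
    """Write <name>.txt and an empty <name>.ann into data_dir.

    Hypothetical helper: data_dir is assumed to be a collection
    directory inside the brat installation, such as data/test.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    txt_path = target / f"{name}.txt"
    ann_path = target / f"{name}.ann"
    txt_path.write_text(text, encoding="utf-8")
    # Create the annotation file only if it is missing, so existing
    # annotations are never overwritten.
    ann_path.touch(exist_ok=True)
    return txt_path, ann_path
```

Calling create_brat_document("data/test") from the brat installation directory would produce data/test/text1.txt and data/test/text1.ann.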

At first, brat shows No document selected; a document can be selected by pressing the Tab key. We will use the text1.txt file created above, which has the same content we used for processing in earlier examples.

It will display the contents of the text1.txt file:

To annotate the document, first we have to log in:

Once logged in, select any word you wish to annotate. This opens the New Annotation window, which lists the configured entity types and event types. These types are preconfigured in the annotation.conf file in the data/test directory, and you can modify the file as per your requirements.
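As an illustration of what such a configuration can look like, the following sketch generates the text of a small annotation.conf. It assumes the usual brat layout of four sections named [entities], [relations], [events], and [attributes], with one type name per line. The helper build_annotation_conf and the type names Person, Location, and Organization are hypothetical choices for the sample text, not the contents of the file shipped with brat:

```python
def build_annotation_conf(entities, relations=(), events=(), attributes=()):
    """Return the text of a minimal annotation.conf.

    Hypothetical helper: each section header is emitted even when the
    section has no entries, and every entry is written on its own line.
    """
    sections = [
        ("entities", entities),
        ("relations", relations),
        ("events", events),
        ("attributes", attributes),
    ]
    lines = []
    for title, entries in sections:
        lines.append(f"[{title}]")
        lines.extend(entries)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Illustrative entity types for the sample text (assumed, not brat defaults).
conf_text = build_annotation_conf(["Person", "Location", "Organization"])
```

The resulting string could then be written to data/test/annotation.conf. Back up the existing file first, since brat reads its type definitions from it.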

Annotations will be displayed on the text as we go on selecting the text:

Once saved, the annotations are stored in a file with the same name as the text file and an .ann extension; here, that is text1.ann.
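To turn the saved annotations into a dataset, the .ann file has to be read back. The following is a minimal sketch that assumes brat's standoff format for text-bound annotations, where each line holds an ID starting with T, then a tab, the type with start and end character offsets, another tab, and the annotated text. The names Entity and parse_text_bound are hypothetical, and discontinuous spans are skipped for simplicity:

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(ann_content: str) -> List[Entity]:
    """Extract text-bound (T) annotations from .ann file content."""
    entities = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # ignore relations, events, notes, and so on
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line
        ann_id, type_and_span, surface = parts[0], parts[1], parts[2]
        label, span = type_and_span.split(" ", 1)
        if ";" in span:
            continue  # discontinuous span, not handled in this sketch
        start, end = (int(value) for value in span.split())
        entities.append(Entity(ann_id, label, start, end, surface))
    return entities


# Illustrative input: the two names in the first sentence of text1.txt.
sample_ann = "T1\tPerson 0 3\tJoe\nT2\tPerson 31 35\tFred\n"
sample_entities = parse_text_bound(sample_ann)
```

Each Entity carries character offsets into text1.txt, so the label can be checked against the original document by slicing the text from start to end.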

The other tool is the Stanford Annotation tool, which can be downloaded from https://nlp.stanford.edu/software/stanford-manual-annotation-tool-2004-05-16.tar.gz. Once downloaded, extract and double-click on annotator.jar, or execute the following command:

> java -jar annotator.jar

It will show the following:

You can either open an existing text file or type your content and save it as a file. We will reuse the text from the previous annotation example to show how the Stanford Annotation tool works.

Once the content is available, the next step is to create the tags. From the Tags menu, select the Add Tag option, which will open the Tag creation window, as shown in the following screenshot:

Enter the tag name and click on OK. You will then be asked to select the color for the tag. It will display the tag in the right-hand pane of the main window, as shown in the following screenshot:

Similarly, we can create as many tags as we want to use. Once a tag is created, the next step is to annotate the text. To annotate a piece of text, say Joe, select it with the mouse and click on the Name tag on the right. It will add markup to the text, as shown here:

In the same way as we did for Joe, we can mark any other text as required and save the file. The tags can also be saved so that they can be reused on other text. The saved files are normal text files and can be viewed in any text editor.
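Because the saved file is plain text with markup, it can be converted into labeled spans with a few lines of code. The sketch below assumes that the markup takes the form of inline tags such as <Name>Joe</Name>; this format is an assumption, so check a saved file before relying on it. The function extract_inline_tags is hypothetical, and nested tags are not handled:

```python
import re

# Assumed markup: <Tag>text</Tag> with matching opening and closing names.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def extract_inline_tags(marked_up):
    """Return (plain_text, entities) for inline-tagged text.

    Each entity is a (label, start, end, text) tuple whose offsets
    refer to the plain text with all tags removed.
    """
    entities = []
    plain_parts = []
    cursor = 0  # position in the marked-up input
    offset = 0  # length of the plain text built so far
    for match in TAG_PATTERN.finditer(marked_up):
        before = marked_up[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        surface = match.group(2)
        entities.append((match.group(1), offset, offset + len(surface), surface))
        plain_parts.append(surface)
        offset += len(surface)
        cursor = match.end()
    plain_parts.append(marked_up[cursor:])
    return "".join(plain_parts), entities


plain, tagged = extract_inline_tags(
    "<Name>Joe</Name> was the last person to see <Name>Fred</Name>."
)
```

Here plain holds the sentence without tags and tagged holds one tuple per annotated span, which is a convenient starting point for building NER training data.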
