186 ◾ Sebastian Blohm et al.
8.1 Introduction
The Semantic Web relies on explicit content captured by means of the RDF(S) and
OWL Semantic Web languages. As most of the content (text in particular) available
on the Web is still unstructured, effective and efficient techniques are needed
to extract and capture the relevant information so that it can be formalized in RDF
and/or OWL, thus becoming accessible to the Semantic Web. The data models of
the Semantic Web (especially RDF) build on the central notion of a triple of the
form (s, p, o) where s is the so-called subject, p the predicate, and o the object. The
extraction of binary relations or tuples (s, o) standing in a relation (predicate or
property) p is thus a crucial task from a Semantic Web perspective. In this chapter
we are concerned with extracting binary relations or tuples from large bodies of
textual data.
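
As a minimal illustration of the triple notion above, the following Python sketch represents (s, p, o) triples and projects out the (s, o) tuples that stand in one relation p. The names `Triple` and `tuples_for`, as well as the `locatedIn` and `bornIn` example data, are hypothetical and are not taken from the chapter.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An RDF-style statement (s, p, o)."""
    s: str  # subject
    p: str  # predicate (relation or property)
    o: str  # object


def tuples_for(triples, predicate):
    """Return the (s, o) tuples standing in the given relation."""
    return [(t.s, t.o) for t in triples if t.p == predicate]


# Illustrative data only; not from the chapter's dataset.
triples = [
    Triple("Karlsruhe", "locatedIn", "Germany"),
    Triple("Paris", "locatedIn", "France"),
    Triple("Goethe", "bornIn", "Frankfurt"),
]

located_in = tuples_for(triples, "locatedIn")
```

Relation extraction, seen this way, is the task of producing lists such as `located_in` from text rather than from an existing triple store.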
Many approaches have been presented to deal with this task. On the one
hand, we can distinguish discriminative techniques that train a statistical
classifier using machine learning techniques to spot mentions of the relation of
interest [5,21,22,23,24]. On the other hand, pattern-based approaches that induce and
match explicit patterns have also been proposed [4,6,8,15,16,18,10].
Patterns can be understood as constraints on the surface form of a text
fragment. We say that a pattern matches a textual fragment if the fragment fulfills all
the constraints defined by the pattern. In this work, we follow the pattern-based
approach and present a new pattern class, Taxonomic Sequential Patterns (TSPs). Patterns
can be thought of as simple crisp classiers that match or do not match a given
text fragment. If they match, we can extract the corresponding tuple by way of
marked argument positions in the pattern. Clearly, the expressiveness of the pat-
tern class considered determines the performance of a pattern-based approach to
relation extraction.
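
The view of a pattern as a crisp classifier with marked argument positions can be sketched as follows. This is a deliberately simple illustration, assuming whitespace-tokenized text and exact token equality as the only kind of constraint; it is not the TSP pattern class introduced in this chapter. The slot markers `<ARG1>` and `<ARG2>` and the function `match` are hypothetical names.

```python
ARG1, ARG2 = "<ARG1>", "<ARG2>"  # marked argument positions (hypothetical notation)


def match(pattern, tokens):
    """Return (arg1, arg2) if the pattern matches the token sequence, else None.

    Every non-slot pattern element is a constraint: the token at that
    position must equal it exactly. Pattern and fragment must have the
    same length.
    """
    if len(pattern) != len(tokens):
        return None
    args = {}
    for element, token in zip(pattern, tokens):
        if element in (ARG1, ARG2):
            args[element] = token  # slot: capture the token as an argument
        elif element != token:
            return None            # constraint violated: no match
    if ARG1 in args and ARG2 in args:
        return (args[ARG1], args[ARG2])
    return None


pattern = [ARG1, "is", "located", "in", ARG2]
hit = match(pattern, "Karlsruhe is located in Germany".split())
miss = match(pattern, "Karlsruhe is famous in Germany".split())
```

Here `hit` is the extracted tuple and `miss` is `None`, which reflects the match-or-no-match behavior described above. A pattern this rigid matches very few fragments, which is why the expressiveness of the pattern class matters.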
Many pattern-based approaches to relation extraction incorporate some sort of
morpho-syntactic or semantic types into the pattern class to yield more general pat-
terns. However, the impact of these features on extraction quality is typically not
assessed. We can expect that at least two dimensions of pattern classes have a major
impact on extraction performance: (1) the pattern language elements that allow for
under-specification (wildcard, skip, disjunction) and (2) the set of features
(morpho-syntactic, semantic types, etc.) taken into account during pattern matching.
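
To make the two dimensions concrete, the sketch below implements a toy matcher with the three under-specification elements named above (wildcard, skip, disjunction) and with constraints on token features instead of surface words only. It is an assumption-laden illustration, not the pattern language evaluated in this chapter: the names `ANY`, `Skip`, `element_matches`, and `matches`, the dictionary-based token representation, and the part-of-speech tags in the example are all hypothetical.

```python
ANY = object()  # wildcard: matches exactly one arbitrary token


class Skip:
    """Matches between 0 and max_tokens arbitrary tokens."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens


def element_matches(element, token):
    """Check a single-token pattern element against a token (a feature dict)."""
    if element is ANY:
        return True
    if isinstance(element, (set, frozenset)):  # disjunction of alternatives
        return any(element_matches(alt, token) for alt in element)
    if isinstance(element, tuple):             # (feature, value) constraint
        feature, value = element
        return token.get(feature) == value
    return token.get("word") == element        # plain string: surface word


def matches(pattern, tokens):
    """Return True if the whole token sequence satisfies the pattern."""
    if not pattern:
        return not tokens
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, Skip):
        # Try skipping 0, 1, ..., max_tokens tokens.
        limit = min(head.max_tokens, len(tokens))
        return any(matches(rest, tokens[k:]) for k in range(limit + 1))
    return (bool(tokens)
            and element_matches(head, tokens[0])
            and matches(rest, tokens[1:]))


# Illustrative tokens with a surface form and a part-of-speech feature.
tokens = [
    {"word": "Karlsruhe", "pos": "NNP"},
    {"word": "is", "pos": "VBZ"},
    {"word": "a", "pos": "DT"},
    {"word": "city", "pos": "NN"},
    {"word": "in", "pos": "IN"},
    {"word": "Germany", "pos": "NNP"},
]

with_skip = [("pos", "NNP"), "is", Skip(2), {"in", "near"}, ("pos", "NNP")]
without_skip = [("pos", "NNP"), "is", {"in", "near"}, ("pos", "NNP")]
with_wildcards = [ANY, "is", ANY, ANY, "in", ANY]

result_skip = matches(with_skip, tokens)
result_no_skip = matches(without_skip, tokens)
result_wildcards = matches(with_wildcards, tokens)
```

In this toy setting, the pattern with a skip element and the pattern with wildcards both match the fragment, while the pattern without any under-specification does not. The feature constraint `("pos", "NNP")` generalizes over all proper nouns, whereas a plain string constrains the surface word.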
8.6.1 Evaluation Protocol
8.6.2 Dataset and Preprocessing
8.6.3 Experimental Setup
8.6.4 Experimental Results
8.7 Related Work
8.8 Conclusions
Acknowledgement
References