4 ◾ Vijayan Sugumaran and Jon Atle Gulla
1.1 Introduction
Since Tim Berners-Lee proposed a global system of interlinked hypertext
documents in 1989, the World Wide Web has grown into the world’s biggest
pool of human knowledge. Over the past few years, the Web has changed the way
people communicate and exchange information. It has created new business
opportunities and obliterated old business practices. As a borderless source of
information, it has been instrumental in globalization and cooperation among people and
nations. Importantly, it has also helped individuals join virtual communities and
take part in social networks that cross physical, cultural, and organizational
barriers. The rapid growth of information on the World Wide Web has, however, created
a new set of challenges and problems.
Information overload—In 1998, the size of the Web was estimated to exceed
300 million pages with a growth rate of about 20 million per month (Baeza-Yates
and Ribeiro-Neto, 1999). The real size of the Web today is difficult to measure,
although Web search indices give a lower bound on the number of unique and
meaningful Web pages. The Google search index was measured at around 500 million
pages in 2000, 8 billion in 2004, and more than 27 billion today. This constitutes
an enormous amount of information about almost any conceivable topic. While the early
Web often suffered from a lack of high-quality relevant pages, the present Web
contains far too many relevant pages for any user to review. As an example, at the
time of this writing, Google is returning about 18.6 million pages for the “World
Wide Web” search phrase. If you fail to mark it as a phrase, an astonishing
113 million pages are found to be relevant and presented on the result page. In addition,
the deeper Web generates information dynamically based on users’ queries.
Poor retrieval and aggregation—The explosion of Web documents and
services would not be so critical if users could easily retrieve and combine the
information needed. Since Web documents are at best semi-structured in simple
natural language text, they are vulnerable to obstacles that prevent efficient content
retrieval and aggregation. An increasing problem is the number of languages used
on the Web. Studies by Langer (2001) suggested that almost 65% of Web pages
were in English in 1999; more recent data from Internet World Stats* indicate a
more balanced use of languages. The English-using population at the end of 2009
constituted only 27.7% of total online users. The plethora of languages now used on
the Web prevents search applications from applying language-specific strategies, and they
still depend on content-independent statistical models. In a similar vein, the many
* www.internetworldstats.com