Search Engines

A search engine is a site that searches the Internet for words that users enter. According to WSJ Market Data Group, Tradeline.com, as reported in “Reality Bytes” in The Wall Street Journal (WSJ.com) on January 8, 2001, 72% of consumers find online shopping information through search engines. Some search engines capitalize on this by charging companies a fee for favorable placement in search results. Search engines such as GoTo.com rely on paid placements: if someone searches for a retail product, the results are listed in descending order of the fee each company has paid to the search engine. Yahoo! charges for listings, but only in its searches for retail products. Some search engines are also paid every time a user clicks on a link to a sponsor's site. Other search engines, such as Northern Light and Google, do not accept paid listings. They do accept advertising, but their rankings are based on the results of their searches and are independent of the advertising.
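Paid placement, as described above, amounts to sorting listings by the fee each company has paid, and pay-per-click billing multiplies that fee by the number of clicks. The sketch below is a minimal illustration under stated assumptions: each listing carries a hypothetical per-click bid in cents, and the names `Listing`, `rank_paid_listings` and `charge_for_click` are invented for the example. It does not describe GoTo.com's actual system.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    # Hypothetical listing: the advertiser's URL and its bid per click, in cents.
    url: str
    bid_cents: int

def rank_paid_listings(listings):
    """Order listings by fee paid, highest first."""
    # sorted() is stable, so listings with equal bids keep their original order.
    return sorted(listings, key=lambda item: item.bid_cents, reverse=True)

def charge_for_click(listing, clicks):
    """Pay-per-click: the engine earns the bid each time a user follows the link."""
    return listing.bid_cents * clicks

listings = [
    Listing("shoes-a.example", 25),
    Listing("shoes-b.example", 40),
    Listing("shoes-c.example", 10),
]
ranked = rank_paid_listings(listings)
print([item.url for item in ranked])  # ['shoes-b.example', 'shoes-a.example', 'shoes-c.example']
print(charge_for_click(ranked[0], 3))  # 120
```

Engines that do not accept paid listings would replace the bid in the sort key with a relevance score computed from the pages themselves.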

Background

Before the advent of the World Wide Web, users who searched for information on the Internet, mainly academics and scientists, relied on specialized programs such as Archie, Gopher, Jughead and Veronica. Archie, short for archives, used File Transfer Protocol (FTP) to build lists of servers containing files on particular topics. Before Archie was available, people searching for a topic had to know the exact IP address of the computer on which the information was located. With Archie, people typed in keywords, and Archie returned locations in UNIX format that identified the host name, the directory name and the file name. Users then used FTP to transfer the files Archie had identified. Gopher, another of these programs, was more menu driven but still relied on knowledge of UNIX commands; it did allow a rudimentary form of bookmark for going directly to a file. Besides being difficult to use, these programs could search only university- and government-sponsored sites, and their results were not context sensitive.

Once the World Wide Web and browsers became available, more user-friendly search engines were developed, and the number of sites that could be searched mushroomed. Early search engines Lycos, Yahoo! and WebCrawler started operating in 1994. In 1995, Excite, AltaVista and Infoseek (now part of Go.com, owned by Disney) debuted. AltaVista was originally developed and owned by Digital Equipment Corporation; Compaq acquired it when it bought Digital and later sold it to CMGI. AltaVista was very fast for its time and was the first search engine to use natural language: it could understand queries containing terms such as what if. It also included instructions for refining searches with Boolean operators such as AND, OR and NOT, terms borrowed from computer logic. (Boolean logic is named after George Boole, a 19th-century English mathematician.)
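Boolean operators can be illustrated by treating each term's matches as a set of documents: AND is set intersection, OR is union, and NOT (read as "and not") is set difference. The following sketch assumes a tiny hypothetical document collection and a flat, left-to-right query syntax with no parentheses or precedence. It is not AltaVista's query parser.

```python
def build_index(docs):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

def boolean_search(index, query):
    """Evaluate a flat query such as 'paris AND hotel NOT cheap' left to right.

    Each operator combines the running result with the term that follows it;
    NOT means "and not", i.e. remove documents containing the term.
    """
    tokens = query.split()
    result = set(index.get(tokens[0].lower(), set()))
    # Operators sit at odd positions and their terms at the following even positions.
    for op, term in zip(tokens[1::2], tokens[2::2]):
        matches = index.get(term.lower(), set())
        if op == "AND":
            result &= matches
        elif op == "OR":
            result |= matches
        elif op == "NOT":
            result -= matches
        else:
            raise ValueError(f"unknown operator: {op}")
    return result

# Hypothetical documents for illustration.
docs = {
    1: "cheap hotel in paris",
    2: "luxury hotel in paris",
    3: "paris museum guide",
    4: "hotel in rome",
}
index = build_index(docs)
print(sorted(boolean_search(index, "paris AND hotel NOT cheap")))  # [2]
print(sorted(boolean_search(index, "rome OR museum")))  # [3, 4]
```

The index is built once, so each query touches only the document sets for its own terms rather than rescanning every document.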

However, many of these search engines returned the names of thousands of sites for each search. Often the listed sites were not relevant to the topic, and the list contained no site descriptions. Newer search engines use sophisticated methods to rank sites and determine their relevance. Inktomi pioneered the use of clustered workstations rather than standalone computers to speed up searches. The servers in a cluster divide the work and operate simultaneously, so a search completes faster than it would on a single server.
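Clustering speeds up searches by splitting the index across machines so that each one searches only its own share at the same time as the others, after which the partial results are merged. The sketch below imitates that scatter-gather pattern on one computer, with threads standing in for cluster nodes. The shard layout and the helpers `search_shard` and `cluster_search` are assumptions for illustration, not Inktomi's design, and because Python threads share one interpreter the example shows the pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard, term):
    """Search one shard (a dict of doc_id -> text) and return (doc_id, count) hits."""
    hits = []
    for doc_id, text in shard.items():
        count = text.lower().split().count(term)
        if count:
            hits.append((doc_id, count))
    return hits

def cluster_search(shards, term, top_k=3):
    """Scatter the query to every shard at once, then gather and merge the hits."""
    term = term.lower()
    # Each worker thread plays the role of one node in the cluster.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(lambda shard: search_shard(shard, term), shards))
    merged = [hit for hits in partial_results for hit in hits]
    # Most occurrences first; ties broken by document ID so the order is stable.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return merged[:top_k]

# Hypothetical index split across two "nodes".
shards = [
    {"d1": "java coffee java", "d2": "tea"},
    {"d3": "java island", "d4": "coffee beans"},
]
print(cluster_search(shards, "java"))  # [('d1', 2), ('d3', 1)]
```

On a real cluster each shard would live on a separate server, and the gather step would collect results over the network instead of from threads.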

Search Engines Today

Popular search engines include Ask Jeeves, Northern Light, HotBot, AltaVista, Google and LookSmart. Many of the largest search engines, such as Excite and Yahoo!, are also portals, offering features such as news, entertainment information and weather services. Most of the major search engines license their technologies to portals and Internet service providers. For example, Yahoo! uses the Google engine, and AOL Search, HotBot and iWon use Inktomi.

There are also information service providers on the Internet that charge for their services and offer more specialized information. For example, Northern Light Technology, Inc. offers search services and news information to businesses, and EoExchange, Inc. offers premium search services on specific topics.

How Search Engines Work

Search engines build databases of either the first page of each Web site or all of a site's pages using automated software programs called spiders or bots (short for robots); the two terms refer to the same kind of program. A spider “crawls” from site to site, following links and looking for key phrases or URLs. The pages it collects are then recorded in indexes (lists). When someone runs a search, the search engine lists the matching pages in an order determined by proprietary algorithms. This order is called ranking. Factors used in ranking include how frequently the search terms are used, where the terms appear in the document (in the title, in headers, etc.) and the number of pages at a site that use the term searched for. External factors may also be used, such as how many other sites link to the site, how often visitors click on a link to the site, and placement fees.

Pay-for-placement, which was pioneered by GoTo.com, is now a factor at America Online, Lycos, AltaVista, parts of Microsoft's MSN and others. LookSmart and Inktomi also charge a fee if companies such as retailers want individual pages listed for particular products.
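The crawling, indexing and ranking steps described above can be sketched end to end on a toy Web held in memory. The example assumes a hypothetical page format (a title, body text and outgoing links) and an illustrative weighting in which a term in the title counts three times as much as one in the body. Real engines use proprietary algorithms and many more factors, including the external signals listed above.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> page with a title, body text and outgoing links.
WEB = {
    "a.example": {"title": "Spain travel", "body": "restaurants in spain", "links": ["b.example", "c.example"]},
    "b.example": {"title": "Restaurants", "body": "best restaurants restaurants in madrid", "links": ["c.example"]},
    "c.example": {"title": "Weather", "body": "spain weather report", "links": ["a.example"]},
}

TITLE_WEIGHT = 3  # assumed weight: a title match counts more than a body match
BODY_WEIGHT = 1

def crawl(start_url, web):
    """Breadth-first crawl: visit each reachable page once by following links."""
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        page = web.get(url)
        if page is None:
            continue  # dead link: nothing to fetch
        pages[url] = page
        for link in page["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index: term -> {url: score combining term frequency and location}."""
    index = {}
    for url, page in pages.items():
        for field, weight in (("title", TITLE_WEIGHT), ("body", BODY_WEIGHT)):
            for term in page[field].lower().split():
                postings = index.setdefault(term, {})
                postings[url] = postings.get(url, 0) + weight
    return index

def rank(index, query):
    """Sum each page's scores over the query terms and list the best pages first."""
    scores = {}
    for term in query.lower().split():
        for url, score in index.get(term, {}).items():
            scores[url] = scores.get(url, 0) + score
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

index = build_index(crawl("a.example", WEB))
print(rank(index, "restaurants"))  # ['b.example', 'a.example']
```

Here b.example outranks a.example for "restaurants" because the term appears in its title and twice in its body, while a.example mentions it once in the body.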

Search engines fall into three types: meta-based, spider-based and natural language search engines.

  • Meta-based search engines, also called metasearch sites, such as Dogpile.com, Mamma, SavvySearch and MetaCrawler, send each query to several search engines and compile the results into a single list for searchers.

    Searching on the Web—Google

    Larry Page and Sergey Brin, the founders of Google Incorporated, met when they were PhD candidates in the computer science department at Stanford University. Page was looking at how sites link to each other and Brin was looking at data mining—what information is available and how to find it. They shared a feeling that Web searching needed to be improved and devised a new approach to sorting results. They took their ideas about searching for information to other search engine companies. However, they found that other companies' strategies were focused on expanding into areas such as media portals and adding functions to search engines to keep people at their sites.

    Brin and Page wanted to concentrate on search techniques and information retrieval, so they founded their own company and successfully sought venture capital funding. Because its searches are fully automated, Google operates with only 200 employees, half of whom are engineers or technical staff; more than 40 have PhDs. Traffic to the site grows 20% per month even though the company does no advertising; according to Google, users hear about it by word of mouth. Google performs 100 million searches per day, over 55% of them for users outside the United States. It currently has sites in Canada, France, Germany, Italy, Japan, Korea and the United Kingdom as well as the United States. Users can restrict a search to Web pages written in any of 26 languages or search using an interface in one of 36 languages.

    Google uses two techniques in its searching: page ranking and mathematical algorithms. The page ranking system looks at the other pages that link to a particular site. For example, if someone queries restaurants in Spain, the Google engine ranks restaurant sites based partly on how many sites link to each one and on the type of sites linking to it: a link from The New York Times counts for more than a link from a personal Web page. A proprietary mathematical equation analyzes the links to a page and also examines the text on the page, such as headlines, bold type and the proximity of words to each other, to judge how relevant the page is. This validation is important because it helps limit the effect of “spam,” in which a site repeats the same content thousands of times to inflate its ranking.

    When users request a search at Google, they are actually searching copies of Web pages stored on Google's 8,000 Linux-based computers. The computers are grouped in clusters at five server farms at Internet hosting sites. Google “crawls” the Web continuously, looking for new and updated Web sites. (Crawling is the use of a software program to search the Web automatically.) Google believes it has the world's largest index, with 1.3 billion uniform resource locators (URLs). (A URL is the Web address of an Internet site.) Its index includes most of the Web pages of useful sites. It does not keep URLs of password-protected or personal Web pages.

    Google, which predicts it will be profitable by year-end 2001, has two sources of revenue: context-sensitive advertising on its site and licensing of its search engine. The context-sensitive advertising displays ads for products and services related to what users search for. Google licenses both site search and Web search products. Its Web search product is licensed to Yahoo!, Vizzavi (part of Vivendi), China's Netease.com (the second largest ISP in China), Sprint, VirginNet and others, and companies such as Cisco license its site search product. Google initially provided its Web search software free of charge to gain visibility at sites such as portals.

    Google focuses on making information universally available. It is developing interfaces so that wireless devices such as Japanese i-mode handsets and the Palm VII can be used for Google searches. A pilot test is being conducted with BMW that uses speech recognition to search by voice. Users can already search PDF documents, and development is underway to search image files such as JPEG files, which users may wish to use in PowerPoint presentations.


  • Spider-based search engines build their indexes using spiders, the automated software programs described above. The Northern Light, Inktomi, HotBot, AltaVista, Google and LookSmart search sites use spiders.

  • Ask Jeeves is a natural language search engine where people can ask questions in everyday sentences or phrases. Ask Jeeves keeps templates of commonly asked questions.
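The link-analysis idea in the Google sidebar, in which a page gains standing when other pages, especially well-regarded ones, link to it, can be illustrated with the PageRank power iteration that Page and Brin published. The sketch below runs it on a hypothetical seven-page link graph with the commonly cited damping factor of 0.85. Google combined link analysis with other proprietary factors, so the scores here are illustrative only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a baseline share, modeling a surfer who jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: each restaurant page has exactly one inbound link,
# but restaurant_a's comes from a page that many others link to.
links = {
    "times": ["restaurant_a"],
    "blog1": ["times"],
    "blog2": ["times"],
    "blog3": ["times"],
    "homepage": ["restaurant_b"],
    "restaurant_a": [],
    "restaurant_b": [],
}
scores = pagerank(links)
print(scores["restaurant_a"] > scores["restaurant_b"])  # True
```

Both restaurant pages have one inbound link, yet restaurant_a scores higher because its link comes from a page that is itself widely linked, much like the sidebar's New York Times example.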
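A metasearch site must merge the ranked lists returned by several engines into one list without duplicates. How Dogpile.com, MetaCrawler and the others actually combined results is not described here, so the sketch below substitutes a well-known published method, reciprocal rank fusion, in which a page scores 1/(k + position) on each list where it appears and the scores are summed. The engine names and result lists are hypothetical, and k = 60 is a conventional default rather than a value from any of these sites.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked URL lists; pages ranked well by many engines rise to the top."""
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + position)
    # Highest fused score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical results from three engines for the same query.
engine_results = {
    "engine_one": ["a.example", "b.example", "c.example"],
    "engine_two": ["b.example", "a.example", "d.example"],
    "engine_three": ["b.example", "c.example"],
}
merged = reciprocal_rank_fusion(engine_results.values())
print(merged)  # ['b.example', 'a.example', 'c.example', 'd.example']
```

Rank positions are used instead of each engine's raw scores because different engines score on incompatible scales; b.example comes first since all three engines list it near the top.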
