7.22 Search Engines and Directories

Some 85% of accessible web pages are found through search engines and directories. Directories list sites hierarchically by topic, and submissions are vetted by humans. By contrast, search engines are automated programs that use complex (secret and changing) algorithms to rank sites, displaying them according to the keywords typed in by search engine users. Search engines are also divided into 'organic' (where ranking is free) and the commercial or pay-per-click variety (where site owners pay for top rankings).

Much material is not reached by the search engines. Even by 2004, a study by NEC Research Institute suggested that total search engine coverage had fallen from 60% to 42%.{10}

Search engines perform three operations. First, they use a 'web crawler' or 'spider' to search the web regularly, finding sites and following links to collect information on all or most pages. Second, they index that information, considering text, text headings, meta tags, graphics labels and links from and to the pages concerned. Third, they sort and store that information in ways that can be readily accessed by users of the search engine. Google stores whole pages, inbound link information and action taken on any AdWord links. AltaVista stores every word.
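The three operations can be illustrated with a minimal sketch. It assumes a hypothetical three-page 'web' held in memory (the names TOY_WEB, crawl and build_index are invented for this example) and is not how any real search engine is implemented: a real crawler fetches pages over the network and indexes far more than plain words.

```python
from collections import defaultdict, deque
import re

# Hypothetical three-page "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("search engines index pages", ["b.example", "c.example"]),
    "b.example": ("directories list pages by topic", ["c.example"]),
    "c.example": ("engines rank pages by links", ["a.example"]),
}

def crawl(start, web):
    """Operation 1: follow links breadth-first from a start page."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = (text, links)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Operations 2 and 3: index words and links, stored for quick lookup."""
    word_index = defaultdict(set)  # word -> URLs whose text contains it
    inbound = defaultdict(set)     # URL -> URLs that link to it
    for url, (text, links) in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            word_index[word].add(url)
        for link in links:
            inbound[link].add(url)
    return word_index, inbound

pages = crawl("a.example", TOY_WEB)
word_index, inbound = build_index(pages)
print(sorted(word_index["engines"]))  # pages containing the word 'engines'
print(sorted(inbound["c.example"]))   # pages linking to c.example
```

Storing the index as a mapping from each word to the pages containing it (an 'inverted index') means a query only has to look up its keywords rather than re-read every page.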

When a user enters search words (also called keywords) into a search engine, the program searches its index and selects the best matches, ranking them by 1. relevance, 2. authority, 3. number and quality of incoming links (Google), 4. semantic clustering of keywords, and 5. statistical analysis of the keywords on the page (distribution, density, etc.). Some search engines (e.g. Ask) also allow searches by whole-sentence queries rather than simply by keywords. Search engines earn revenues from advertising and by incorporating pay-per-click services.
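Two of these ranking factors, keyword density and the number of incoming links, can be combined in a toy scoring function. The sketch below is purely illustrative: real ranking algorithms are secret and far more elaborate, and the page data, the function names and the link_weight value are all assumptions made for this example.

```python
import re

def keyword_density(text, keyword):
    """Share of the words on a page that match the keyword."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def score(text, inbound_links, keyword, link_weight=0.1):
    """Toy score: keyword density plus a bonus per incoming link.

    link_weight is an arbitrary illustrative value, not a real setting.
    """
    return keyword_density(text, keyword) + link_weight * inbound_links

# Hypothetical pages: URL -> (page text, number of incoming links).
PAGES = {
    "p1.example": ("cheap flights cheap hotels", 1),
    "p2.example": ("flights to Rome and Paris", 5),
    "p3.example": ("hotel reviews", 9),
}

def rank(pages, keyword):
    """Return the URLs that contain the keyword, best score first."""
    scores = {
        url: score(text, links, keyword)
        for url, (text, links) in pages.items()
        if keyword_density(text, keyword) > 0
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank(PAGES, "flights"))
```

Here p2.example outranks p1.example for 'flights': its keyword density is slightly lower, but it has more incoming links. The page p3.example is not returned at all because it does not contain the keyword.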

Web directories rely on sites being recommended to them (submissions) and do not generally search the web. They are created and maintained by humans rather than search algorithms, and list sites by category and subcategory. The best known are Yahoo! Directory and the Open Directory Project. The first has a paid submission service. The second is the most prestigious on the web, and is free. In some directories the paid-for inclusions are ranked according to their bid amount.
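The structure of a directory can be sketched as a mapping from a category path to the sites listed under it. The example below is a hypothetical illustration only (the Directory class, its methods and the sample sites are invented here); it shows listings filed by category and subcategory, with paid inclusions ordered by bid amount as some directories do.

```python
from collections import defaultdict

class Directory:
    """Toy directory: sites are filed under a category path."""

    def __init__(self):
        # (category, subcategory, ...) -> list of (site, bid) entries
        self._listings = defaultdict(list)

    def submit(self, category_path, site, bid=0.0):
        """File a submitted site; bid is 0.0 for a free listing."""
        self._listings[tuple(category_path)].append((site, bid))

    def browse(self, category_path):
        """List a category's sites, highest bid first, then by name."""
        entries = self._listings.get(tuple(category_path), [])
        ordered = sorted(entries, key=lambda entry: (-entry[1], entry[0]))
        return [site for site, _bid in ordered]

directory = Directory()
directory.submit(["Arts", "Music"], "free-listing.example")
directory.submit(["Arts", "Music"], "low-bid.example", bid=0.20)
directory.submit(["Arts", "Music"], "high-bid.example", bid=0.75)
directory.submit(["Science", "Physics"], "physics.example")

print(directory.browse(["Arts", "Music"]))
```

In a real directory a human editor would vet each submission before it is filed; that step is omitted from this sketch.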

Directories also supply information to search engines.

Search Engine Popularity

There are many search engines, but only five are now important to the English-speaking world. {4}

 

          USA                     UK
          % Volume    % Visits    % Volume    % Visits
Google    65.55       63.40       90.16       83.68
Yahoo     15.46       12.21        2.94        3.46
Bing      13.97       14.47        4.19        4.94
Ask        2.74        2.15        1.36        1.89
AOL        1.58        1.29        -           0.27
Total     99.30       93.52       98.65       94.24

Search engine use varies by country.

 

Country           National Search Engine(s)    Google    Yahoo     Bing
Brazil            -                            97%       1%        1%
Czech Republic    47.7%                        40.8%     -         -
China             64.7%                        30.9%     -         -
Denmark           15%                          80%       1%        1%
France            -                            89.5%     2.5%      2.8%
Germany           2%                           89%       2%        -
India             c2%                          94%       c2%       c2%
Italy             c3%                          91%       c3%       c3%
Japan             3.8%                         31.3%     56.2%     -
Mexico            -                            c90%      c5%       c5%
Netherlands       1%                           95%       1%        3%
Norway            -                            95%       2%        2%
Portugal          10%                          90%       -         -
Russia            c8%                          68%       23%       1%
Slovakia          -                            98%       1%        1%
South Korea       93%                          4%        3%        -
Spain             4%                           73.4%     17.4%     -
Sweden            -                            71%       14%       c8%
UK                -                            c90%      c5%       c4%
USA                                            c70%      14.4%     9.9%


Some search engines use or incorporate other search engine results. Examples:

      Yippy: also searches the 'deep web', i.e. beyond the range of conventional engines.
      Dogpile: returns results from leading search engines, including Google, Yahoo! and Bing.
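How a metasearch engine might combine result lists can be sketched as follows. This is an assumption-laden illustration, not the method used by Dogpile or any other named service: the engine names and result lists are invented, and the merging rule (summing reciprocal rank positions) is simply one common way to reward pages that several engines place near the top.

```python
from collections import defaultdict

def merge_results(result_lists):
    """Merge ranked lists from several engines into a single ranking.

    Each page scores 1/position for every list it appears in, so a page
    ranked highly by several engines rises to the top. Ties are broken
    alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for results in result_lists.values():
        for position, url in enumerate(results, start=1):
            scores[url] += 1.0 / position
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical results returned by three engines for the same query.
ENGINE_RESULTS = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "x.example"],
    "engine_c": ["y.example", "w.example"],
}

merged = merge_results(ENGINE_RESULTS)
print(merged)
```

The page y.example comes first because all three engines return it, two of them in first place. Duplicates are removed automatically, since each page appears only once in the merged ranking.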

Some search engines specialize in, or are particularly useful for, certain types of information. Examples:

      Webopedia: simple Internet and computer information.
      SearchEngineGuide: search engines and guides listed under nearly 100 categories.
      ResourceShelf: results of web searches by librarians and others.

History

The first tool created to search the Internet was 'Archie' in 1990, followed a year later by the better-known 'Gopher'. In 1994 came 'WebCrawler', the first 'full text' crawler-based search engine, which was followed over the next ten years by search engines attracting considerable interest and investment: Magellan, Excite, Infoseek (now Go), Inktomi, Northern Light, AltaVista and Yahoo! It was a period of intense competition, with search engines incorporating results from other search engines, and sometimes acquiring the companies behind them. Overture, for example, owned AlltheWeb and AltaVista. Yahoo! acquired both Inktomi (2002) and Overture (2003). Microsoft launched MSN Search in 1998 using Inktomi search results, then displayed Looksmart listings, blended in results from Inktomi, switched to Google technology until 2004, developed its own search technology thereafter, launched Bing in 2009, and finally leased that technology to Yahoo Search. Google provided a ranking method based on the number and quality of incoming links, and rose rapidly to prominence after 2000.

The search engine market is now dominated by a few big players. Netmarketshare {5} gave the market share in April 2011 as: Google 84.64%, Yahoo 5.15%, Baidu 4.30%, Bing 3.91%, Ask 0.53%, AOL 0.38%, Excite 0.02%, Lycos 0.01% and AltaVista 0.01%. MSN, Microsoft Live Search and AlltheWeb were shown as 0.00% (presumably less than 0.01%). All markets were global except Baidu (China), Bing and Microsoft Live Search.

Questions

1. What three operations do search engines perform?
2. Provide a short history of search engine development.
3. How do search engines specialize? Give some examples.
4. How do search engines differ from search directories?
5. What are the five most popular search engines in the US? What are the more striking differences in other countries?

Sources and Further Reading

1. History of Search Engines: From 1945 to Google Today by Aaron Wall. Search Engine History. Undated but probably 2008.
2. Web Search Engine. Wikipedia. History and how they work.
3. Search Engine Popularity Statistics. March 2011. Smart Insight. Useful tables and advice.
4. comScore Releases December 2010 US Search Engine Rankings. Comscore. US tables.
5. Total share of search engine. MarketShare April 2011. Search Engine market share.
6. The Webcertain Search and Social Report 2010. Webcertain. Search and social media activity across the world.
7. The Search Engine List. About.Com. Search engines grouped by type or area of search.
8. SearchEngine Colossus. Extensive listing of search engines in 312 countries: for search and site submission.
9. SearchEngine Showdown. Guide to searching with various search engines and directories.
10. Searching the web. Studholme. List of local/national search engines.