How to build a table of search engines
To build this table of search engines I used
many resources available on the Internet. These resources are listed in the
table below.
By using this table I can find out, for example:
- What are the best search engines?
It is enough to run the same query on every engine, using a word whose
documents you know well (I use my name); simply comparing the counts gives a
quick idea of how the search engines compare.
| Search engine | 20 Aug 97 | 20 Oct 97 | 17 Nov 97 | 10 Dec 97 | 9 Feb 98 | 27 May 98 | 11 Jan 99 | 1 Oct 99 | 21 Apr 00 | 16 Jan 01 | 5 Jun 01 | 18 Sep 02 | 7 Nov 03 | 2 Nov 04 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AltaVista | 3290 | 1253 | 1239 | 1199 | 5986 | 8009 | 11862 | 8421 | 13362 | 16489 | 28984 | 36398 | 73000 | 379000 |
| Excite | 2166 | unreachable | 1956 | 2406 | 1594 | 2163 | 1779 | 2297 | ? | 2675 | 2455 | | | |
| HotBot/Inktomi | 2360 | 3711 | 4333 | 6024 | 6845 | 6381 | 3700 | 2220 | ? | 22400 | 19600 | 77600 | 76764 | 71988 |
| Infoseek | 2273 | 1025 | 1406 | 1700 | 1853 | 1863 | 2739 | 2788 | 2360 | 6099 | | | | |
| Lycos | 575 | 418 | 449 | 449 | 336 | 924 | 1888 | ? | 1983 | 19874 | 45336 | 150634 | 279830 | 74821 |
| OpenText | 363 | | 363 | 363 | 363 | dead | | | | | | | | |
| Magellan | 140 | | | 119 | 72 | 106 | | | | | | | | |
| Webcrawler | 90 | | 83 | 83 | 72 | 106 | 53 | 113 | 145 | 145 | | | | |
| Arianna | 416 | | 828 | 891 | 921 | 832 | 1121 | 1816 | 2349 | 2488 | 138 | 402 | 9560 | 10270 |
| Yahoo | 2 | | | | | | | | | | | | | 374000 |
| AltaVista Cern | 717 | | 150 | | | | | | | | | | | |
| Northern Light | 3216 | 3343 | 3766 | 3956 | 4574 | 5167 | 8173 | 12939 | 17263 | 30190 | 32862 | | | |
| AOL NetFind | | 1636 | 1956 | 2406 | | 2163 | 1779 | | | | | | | |
| Planetsearch | | | 479 | 508 | 487 | 694 | | | | | | | | |
| Euroseek | | | 1102 | 812 | 825 | 1038 | 921 | | | | | | | |
| Google | | | | | | | 3492 | 4196 | 10503 | 51200 | 67200 | 119000 | 133000 | 252000 |
| FAST/AllTheWeb | | | | | | | | 13033 | 17420 | 35500 | 40396 | 88534 | 280000 | 260000 |
| WebTop | | | | | | | | | | | 17555 | | | |
| Teoma | | | | | | | | | | | | 21078 | 103000 | 188700 |
| Wisenut | | | | | | | | | | | | 45974 | 51326 | 57586 |
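The comparison by counts can also be done mechanically. The following Python sketch is only an illustration: it copies the 2 Nov 04 column of the table into a dictionary and sorts it; the names `COUNTS_2_NOV_04`, `rank_by_count` and `relative_to_best` are hypothetical and not part of the original page.

```python
# Hit counts copied from the 2 Nov 04 column of the table.
COUNTS_2_NOV_04 = {
    "AltaVista": 379000,
    "HotBot/Inktomi": 71988,
    "Lycos": 74821,
    "Arianna": 10270,
    "Yahoo": 374000,
    "Google": 252000,
    "FAST/AllTheWeb": 260000,
    "Teoma": 188700,
    "Wisenut": 57586,
}


def rank_by_count(counts):
    """Return (engine, count) pairs sorted from most to fewest hits."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def relative_to_best(counts):
    """Express every count as a fraction of the largest count."""
    best = max(counts.values())
    return {engine: count / best for engine, count in counts.items()}


if __name__ == "__main__":
    shares = relative_to_best(COUNTS_2_NOV_04)
    for engine, count in rank_by_count(COUNTS_2_NOV_04):
        print(f"{engine:15} {count:7d}  {shares[engine]:.0%}")
```

A raw count says nothing about relevance: the 2 Nov 2004 report notes that AltaVista and AllTheWeb return many results that are not relevant, so a ranking like this is only a first, rough indicator.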
Reports
- 2 Nov 2004
Big changes from last year. Yahoo now has its own database and is more or less
at the same level as Google, which declares 4,285,000,000 web pages. AltaVista
and AllTheWeb still have separate databases, but both belong
to Yahoo. Teoma belongs to Ask Jeeves and is third in database size (if we
ignore AltaVista and AllTheWeb, whose databases are not in "good shape": their
results are not relevant). Then we have the "old" Inktomi database, which still
seems to power MSN. But Microsoft has announced that it will very soon have its
own database, probably becoming the third great player in the field.
Lycos (which seems to be going out of business) and Wisenut (which is
stationary) follow.
- 7 Nov 2003
Both Google and AllTheWeb claim to have more than 3,000,000,000 pages indexed.
Lycos uses the same database as AllTheWeb. I don't understand the big difference
in the results between Google and AllTheWeb. In any case, Teoma seems to have
reached third place. HotBot doesn't exist anymore, but its database (Inktomi) is
still used by Microsoft and is stationary. AltaVista still manages to survive,
with some increase in its database. WiseNut is stationary.
- 18 Sep 2002
Excite, Infoseek and Northern Light are out of business. The big players are now
Google, Lycos, AllTheWeb and HotBot (based on Inktomi, which is also behind MSN).
Since Google claims to have 2,500,000,000 Web pages in its database, you can
assume that all these search engines have more than 1,000,000,000 pages.
AltaVista still survives but is far behind. The newcomers are Teoma and Wisenut.
- 5 June 2001
First, the big increase for Lycos is due to the fact that it is now
using the FAST database. So the really big databases are now Google, FAST,
AltaVista, HotBot (powered by Inktomi), WebTop and Northern Light. All of these
should have databases of between 500 million and 1,000 million Web pages.
- 16 Jan 2001
Google is for now the absolute winner, with more than 1,500 million Web pages
indexed. This may be 90% of the total Web, but it is very difficult to say how
big the total Web is, since a lot of content is by now stored in databases.
According to some reports, this invisible Web may be 50 times the size of the
visible Web (made up of static pages)! In any case, only FAST and Northern Light
are close behind Google, with around 50% of the total Web indexed.
HotBot, Lycos and AltaVista follow with around 30%. Infoseek and Excite no
longer seem able to catch up. According to browserwatch there should also be a
new major search engine: WebTop. But it seems to be at the same level as these
last two.
Google is also the absolute winner regarding the ranking of results. But
the other search engines are also getting better and better in this regard.
- 21 Apr 2000
The number of indexable Web pages should by now be more than a billion.
AltaVista, Northern Light, FAST and Google have all indexed around 25% of the
Web. Infoseek and Lycos are at 5%. Excite is reported to be at 20%. HotBot,
since it is powered by Inktomi, which has announced 500 million pages indexed,
could be higher than the others, but it is difficult to judge since it gives no
precise results.
- 1 Oct 99
The Web is now 800 million pages. The six big "historic" search engines are
having trouble catching up. The best is Northern Light, which should be at
around 200 million pages, or 25%. But AltaVista and the others still have the
same databases, so AltaVista is now indexing only 20% and the others less than
10%. (HotBot has in fact decreased its database.) There are, however, two new
big search engines. FAST is now indexing slightly more than Northern Light.
Google is indexing slightly more than 10% but is becoming very popular because
of the way it ranks results: it uses an algorithm based on popularity that gives
very good results.
- 11 Jan 99
AltaVista and Northern Light are still at the top, so they should still have
around 40% of the Web classified. HotBot seems to have problems, although its
database is bigger than the others. The other big three (Infoseek, Lycos and
Excite) seem to be catching up, but their databases should
still be at 10%. An interesting new entry is Google, because of its innovative
ranking algorithm. All the other new portals, like go.com,
are instead using the old search engines.
- 27 May 98
As you can see from this study, we now have a precise measurement of the size
of the Web: 275,000,000 pages.
AltaVista, HotBot and Northern Light have each classified around 100,000,000
pages, or 40%; Infoseek and Excite are still at around 30,000,000, or 10%.
Lycos is probably at 100,000,000 but returns fewer hits because it doesn't
classify the whole page.
The study also measures the growth in size: the Web has doubled in the last
9 months.
- 9 Feb 98
No major changes to report. Among the big six (AltaVista, Lycos, Infoseek,
Northern Light, Excite, HotBot), HotBot and Northern Light continue to grow.
For AltaVista there seems to be a big jump forward, which I don't understand.
According to some, AltaVista should be at more than 100 million pages, at the
same level as HotBot. But my search for "fractal" still puts
it at a level of 30-50 million.
- 10 Dec 97
Lycos, OpenText and WebCrawler no longer seem to update their databases.
A search for "fractal" on the big five gives 107,000 for HotBot, 76,000 for
Northern Light, and around 30,000 for AltaVista, Excite and Infoseek.
From these results we can try to make an educated guess at the database sizes:
AltaVista, Infoseek and Excite 30 million; Northern Light 60 million; HotBot
100 million. In any case, the first three seem to have reached their maximum
capacity, whereas Northern Light and HotBot continue to grow.
- 17 Nov 97
To the "magnificent" six (Yahoo, AltaVista, HotBot, Excite, Infoseek, Lycos) we
must add Northern Light.
- 20 Aug 97
You can see that AltaVista, Excite, HotBot and Infoseek are the best in terms of
percentage of the Web classified, and are more or less equal. To these I add
Lycos, whose lower count is due to the fact that it doesn't classify the whole
document. OpenText, Magellan and WebCrawler classify an order of magnitude fewer
documents.
Arianna is reported only for comparison, since it is probably the best search
engine for Italian-only material: it has more or less the same material that
is found in the international search engines. Yahoo, of course, is an outsider,
since its documents are classified by people: so very few documents, but very
good ones.
Note that since these numbers have remained the same for the last year, and they
correspond to around 50 million Web documents, we must assume that the
percentage of Web documents now classified has decreased from 50% to perhaps 20%
of the total.
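The educated guesses in the 10 Dec 97 report and the percentages quoted in the later reports come down to simple arithmetic. The following Python sketch is only an illustration: the ratio of 1,000 pages per "fractal" hit is an assumption read off the HotBot figures (107,000 hits against a guessed 100 million pages), and the names `PAGES_PER_HIT`, `estimate_index_size` and `share_of_web` are hypothetical.

```python
# Assumption: about 1,000 indexed pages per hit for "fractal", the ratio
# implied by the HotBot guess in the 10 Dec 97 report.
PAGES_PER_HIT = 1000


def estimate_index_size(hits, pages_per_hit=PAGES_PER_HIT):
    """Guess the number of indexed pages from the hits for one query."""
    return hits * pages_per_hit


def share_of_web(index_size, web_size):
    """Fraction of the Web covered by an index of the given size."""
    if web_size <= 0:
        raise ValueError("web_size must be positive")
    return index_size / web_size


if __name__ == "__main__":
    # "fractal" hits quoted in the 10 Dec 97 report.
    for engine, hits in [("HotBot", 107000), ("Northern Light", 76000)]:
        print(engine, estimate_index_size(hits))
    # 1 Oct 99 report: Northern Light at 200 million of 800 million pages.
    print(f"{share_of_web(200_000_000, 800_000_000):.0%}")
```

A single linear ratio is only a rough guide: it gives 76 million pages for Northern Light, whereas the report guesses 60 million, so the page's own estimates clearly involve some extra judgment.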
- What percentage of the Web is classified?
AltaVista has (21 August 1997) 48 documents containing "zito" classified on the
intranet cern.ch. The same intranet instead reports 717 documents, so AltaVista
covers only about 7% of it!
- What are the most used search engines?
A visit to the 100 hottest Web sites can answer this question.
- What are the best search engines according to the experts?
It depends on which experts you trust, but you can look at what Netscape,
Microsoft, Hotwired, PC Magazine, etc. recommend.
- How good is a search engine at "concept extraction"?
Search engines that work with robots are trying to make their databases
more helpful by using artificial intelligence techniques. For example,
an engine can classify a document in some way and let the user
look at documents similar (under this classification) to the
document found. A search in a well-known field of knowledge (in my case
"fractals") on the major search engines will make clear how well this feature
is working.
From this point of view (as of September 3, 1997), the best seems to be Excite,
which links each document to similar ones and also suggests other possible
keywords to restrict the search. AltaVista suggests only other possible
keywords.
Infoseek (which normally also returns documents on the requested "subject" that
may not contain the word, i.e. it treats the term as a concept) suggests
some categories in its catalog and also a "best bet".
Lycos and HotBot offer nothing.
- What are the best search engines for Usenet news?
Here, again,
a quick search for zito shows that (Sept. 4, 1997) Dejanews
is the best, with more than 1000 hits from its database of almost two years
of news. Other engines often use Dejanews directly for this service or
have a smaller database (AltaVista 154, covering around 3 months; HotBot 37;
reference.com 32, but it also searches some mailing lists).
- Is it possible to use search engines with Usenet newsgroup search to
read a single newsgroup (with no need for a Usenet news provider)?
- What is the best parallel or metasearch site?
- Are there European sites comparable to the North American ones?
- How do you classify search engines?
- What are the best search engines for news?
- How to get rid of linkrot?
I use the (freeware) program InfoLink to
periodically check all the URLs in the document. In the table you will find a
link to the latest report from this program.
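The kind of periodic check described here can be sketched in a few lines of Python. This is not how InfoLink works internally (the page does not say); it is a minimal illustration using only the standard library, and the names `LinkCollector`, `extract_links`, `http_status` and `find_dead_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the absolute http(s) href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html_text):
    """Return all absolute links found in the HTML text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, malformed URL, ...


def find_dead_links(html_text, status_of=http_status):
    """Return (url, status) for every link that fails or answers with >= 400."""
    dead = []
    for url in extract_links(html_text):
        status = status_of(url)
        if status is None or status >= 400:
            dead.append((url, status))
    return dead
```

The `status_of` parameter lets the check be tried without network access, for example by passing the `get` method of a dictionary that maps URLs to status codes. Some servers reject HEAD requests, so a real checker would probably retry such links with GET before declaring them dead.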
- How fresh is the database?
Of course, if a site doesn't receive any visit
from a search engine for a month, any change won't be reflected in the database.
So, taking into account that one month is the maximum time a search
engine needs to update its data, if the number of items found in the table of
searches for "zito" is always the same, then the search
engine is probably having problems. This has happened, for example, with
OpenText.
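This freshness test is easy to automate. The following Python sketch is only an illustration: it flags an engine when its last few recorded counts are identical, with empty table cells represented as `None`. The function name `looks_stale` and the threshold of three repeated measurements are assumptions, not something the page specifies.

```python
def looks_stale(counts, min_repeats=3):
    """Return True if the last min_repeats recorded counts are identical.

    counts is one table row in chronological order; None marks a date on
    which no count was recorded.
    """
    recorded = [count for count in counts if count is not None]
    if len(recorded) < min_repeats:
        return False  # not enough measurements to judge
    return len(set(recorded[-min_repeats:])) == 1


if __name__ == "__main__":
    # First five columns of the table (20 Aug 97 to 9 Feb 98).
    rows = {
        "OpenText": [363, None, 363, 363, 363],
        "AltaVista": [3290, 1253, 1239, 1199, 5986],
        "Webcrawler": [90, None, 83, 83, 72],
    }
    for engine, counts in rows.items():
        print(engine, "stale" if looks_stale(counts) else "updating")
```

With the default threshold only OpenText is flagged, which matches the example given above. Since the measurements in the table are more than a month apart, even two identical counts in a row are already suspicious, so `min_repeats=2` would be a stricter choice.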