So what can be done to measure the size of the World Wide Web? Has anyone tried? Yes! The Netcraft Web Server Survey is a widely respected survey that attempts to contact each and every website that is accessible on the Internet.
Netcraft's primary goal is to determine web server market share: what percentage of websites run Microsoft's Internet Information Server, versus the Apache web server? But fortunately for us, they also count the websites visited by their automatic web-exploring "spider" software. And in February 2007, the Netcraft Web Server Survey found 108,810,358 distinct websites.
Of course, Netcraft's survey isn't perfect - there may be websites in the world that were not discovered by Netcraft's software. So, as of this writing, "there are more than 108 million websites" is the most accurate statement that can be made.
"But how many web pages are there?"
Large websites can have many thousands of pages. Dynamically generated sites can have a seemingly infinite number - and we must somehow agree not to count all of these. Yet most sites just have a few pages introducing a business or a person, or simply a placeholder home page.
Who could possibly tell us how many web pages there are? There are two obvious candidates: Google and Yahoo, the major search engine companies. Visiting, analyzing and indexing the billions of web pages in the world is their business. Unfortunately, neither company currently publicizes the exact size of its index, and neither has done so since August 2005.
Is there anything we can do to arrive at a realistic estimate of the number of pages on the web? Yes: we can look at those August 2005 web page numbers, divide them by Netcraft's count of websites in August 2005, and arrive at an estimated number of web pages per site. Assuming that the number of web pages per site has not changed drastically in a year and a half, we can then multiply Netcraft's February 2007 count of websites by that per-site figure to arrive at a reasonable projection of the number of web pages in the world as of February 2007.
So let's run the numbers!
Web pages in the world, August 2005: 19.2 billion pages were indexed by Yahoo as of August 2005.
Websites in the world, August 2005: 70,392,567 websites were indexed by Netcraft as of August 2005.
Web pages per website: 19.2 billion pages divided by 70,392,567 websites comes to about 273 (rounding to the nearest whole number).
Web pages in the world, February 2007: multiplying our estimate of the number of web pages per website by Netcraft's February 2007 count of websites, we arrive at 29.7 billion pages on the World Wide Web as of February 2007.
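If you'd like to check the arithmetic yourself, here is a minimal Python sketch of the calculation above. The three input figures come straight from the text; the function and variable names are made up for illustration, and the whole thing rests on the same shaky assumption as the estimate itself, namely that pages-per-site stayed roughly constant between August 2005 and February 2007.

```python
# Figures quoted in the text above.
YAHOO_PAGES_AUG_2005 = 19_200_000_000      # pages indexed by Yahoo, August 2005
NETCRAFT_SITES_AUG_2005 = 70_392_567       # websites counted by Netcraft, August 2005
NETCRAFT_SITES_FEB_2007 = 108_810_358      # websites counted by Netcraft, February 2007


def estimate_pages(pages_then, sites_then, sites_now):
    """Project a page count by assuming pages-per-site has not changed.

    Returns (pages_per_site, projected_pages). The ratio is rounded to the
    nearest whole number first, as in the text.
    """
    pages_per_site = round(pages_then / sites_then)
    return pages_per_site, pages_per_site * sites_now


pages_per_site, pages_feb_2007 = estimate_pages(
    YAHOO_PAGES_AUG_2005, NETCRAFT_SITES_AUG_2005, NETCRAFT_SITES_FEB_2007
)

print(f"Pages per site: {pages_per_site}")
print(f"Projected pages, February 2007: {pages_feb_2007 / 1e9:.1f} billion")
```

Swap in newer site counts and the same function gives you a fresh ballpark figure, for whatever a ballpark figure is worth.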
Before You Complain...
"Hey, your methodology stinks!"
Yeah, no kidding! I have no way of verifying Netcraft's claims, or Yahoo's. Thing is, there's no way to do either without setting up a sophisticated web-exploring system of my own... one with the speed and capacity to explore millions of sites and billions of pages in a timely manner. Something very close to a search engine.
So if you have several million dollars US lying around that you'd like to see applied to researching this problem in an open fashion with a methodology that anyone can examine - feel more than free to send that money my way! But otherwise, let's be realistic and accept that a ballpark estimate is the best we'll be able to come up with. And honestly, who wants a World Wide Web that is so tightly controlled by a central authority that it can be easily measured? The web is growing all the time, no one is in charge, and by the time you've counted a tiny fraction of it, many sites have come and gone. It's a beautiful thing.