Playing In Googlebot's Sandbox With Slurp, Teoma, & MSNbot - Spiders Display Differing Personalities
There has been endless webmaster speculation and worry about the so-called "Google Sandbox" - the indexing time delay for new domain names - rumored to last for at least 45 days from the date of first "discovery" by Googlebot. This recognized listing delay came to be called the "Google Sandbox effect."
Ruminations on the algorithmic elements of this sandbox time delay have ranged widely since the indexing delay was first noticed in spring of 2004. Some believe it comes down to one single element of good search engine optimization, such as linking campaigns. Link building has been the focus of most discussion, but others point to the size of a new site, its internal linking structure, or simply a fixed time delay as the most relevant algorithmic elements.
Rather than contribute to this speculation and further muddy the Sandbox, we'll look at a case study of a site on a new domain name, established May 11, 2005, covering its specific site structure, submission activity, and external and internal linking. We'll see how this plays out in search engine spider activity vs. indexing dates at the top four search engines.
Ready? We'll give dates and crawler action in daily lists and see how this all plays out on this single new site over time.
* May 11, 2005 - Basic text on large site posted on newly purchased domain name, going live by day's end. Search-friendly structure implemented with text linking, making full discovery of all content possible by robots. Home page updated with 10 new text content pages added daily. Submitted site at Google's "Add URL" submission page.
* May 12 - 14 - No visits by Slurp, MSNbot, Teoma or Googlebot. (Slurp is Yahoo's spider and Teoma is from Ask Jeeves.) Posted link on WebSite101 to new domain at Publish101.com.
* May 15 - Googlebot arrives and eagerly crawls 245 pages on new domain after looking for, but not finding, the robots.txt file. Oooops! Gotta add that robots.txt file!
* May 16 - Googlebot returns for 5 more pages and stops. Slurp greedily gobbles 1480 pages and 1892 bad links! Those bad links were caused by our email masking, meant to keep out bad bots. How ironic that Slurp likes these.
* May 17 - Slurp finds 1409 more masking links & only 209 new content pages. MSNbot visits for the first time and asks for robots.txt 75 times during the day, but leaves when it finds that file missing! Finally get around to adding robots.txt by day's end to stop Slurp crawling email masking links and let MSNbot know it's safe to come in!
* May 23 - Teoma spider shows up for the first time and crawls 93 pages. Site gets slammed by BecomeBot, a spider that hits a page every 5 to 7 seconds and strains our resources with 2409 rapid-fire requests for pages. Added BecomeBot to robots.txt exclusion list to keep 'em out.
* May 24 - MSNbot has not shown up for a week, ever since finding the robots.txt file missing. Slurp is showing up every few hours, looking at robots.txt and leaving again without crawling anything now that it is excluded from the email masking links. BecomeBot appears to be honoring the robots.txt exclusion but asks for that file 109 times during the day. Teoma crawls 139 more pages.
* May 25 - We realize that we need to re-allocate server resources and change the database design, and this requires changes to URLs, which means all previously crawled pages are now bad links! Implement subdomains and wonder, what now? Slurp shows up and finds thousands of new email masking links, as the robots.txt was not moved to the new directory structures. Spiders are getting error pages upon new visits. Scrambling to put out fires after wide-ranging changes to the site, we miss this for a week. Spider action is spotty for 10 days until we fix robots.txt.
* June 4 - Teoma returns and crawls 590 pages! No others.
* June 5 - Teoma returns and crawls 1902 pages! No others.
* June 6 - Teoma returns and crawls 290 pages. No others.
* June 7 - Teoma returns and crawls 471 pages. No others.
* June 8-14 - Odd spider behavior, looking at robots.txt only.
* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.
* June 16 - Slurp still thirsty, gulps 1379 pages! No others.
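Daily counts like the ones in the list above can be pulled from a web server access log. The following is only a minimal sketch of one way to do it, not the tooling actually used for this case study. It assumes an Apache-style "combined" log format, and the sample lines, IP addresses and user-agent strings are made up for illustration.

```python
import re
from collections import Counter

# Substrings used to recognize each crawler in the user-agent field.
# Assumption: these match the names used in the article; real agent strings vary.
BOTS = ("Googlebot", "Slurp", "msnbot", "Teoma", "BecomeBot")

# Captures the date, the requested path and the last quoted field (user agent)
# of a combined-format log line.
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "(?:GET|HEAD) (\S+)[^"]*" .* "([^"]*)"$'
)

def tally(lines):
    """Count requests per (day, bot, kind); kind is 'robots.txt' or 'page'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        day, path, agent = match.groups()
        for bot in BOTS:
            if bot.lower() in agent.lower():
                kind = "robots.txt" if path == "/robots.txt" else "page"
                counts[(day, bot, kind)] += 1
                break
    return counts

# Illustrative sample data only, not taken from the real logs.
SAMPLE = [
    '10.0.0.1 - - [17/May/2005:08:00:01 -0700] "GET /robots.txt HTTP/1.0" 404 210 "-" "msnbot/1.0"',
    '10.0.0.1 - - [17/May/2005:09:30:12 -0700] "GET /robots.txt HTTP/1.0" 404 210 "-" "msnbot/1.0"',
    '10.0.0.2 - - [17/May/2005:10:15:40 -0700] "GET /articles/1.html HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '10.0.0.3 - - [17/May/2005:11:02:03 -0700] "GET /index.html HTTP/1.1" 200 4096 "-" "Mozilla/4.0 (ordinary browser)"',
]

if __name__ == "__main__":
    for key, count in sorted(tally(SAMPLE).items()):
        print(key, count)
```

Splitting robots.txt requests from page requests makes patterns like MSNbot's repeated robots.txt checks stand out from real crawling.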
So we'll take a break here at the five-week point and take note of the very different behavior of the top crawlers. Googlebot visits on two consecutive days and looks at a substantial number of pages but doesn't return for over a month. Slurp finds bad links and seems addicted to them, as it stops crawling good pages until it is told to lay off the bad liquor, er, that is, links, by getting robots.txt to slap Slurp to its senses. MSNbot visits looking for that robots.txt and won't crawl any pages until told what NOT to do by the robots.txt file. Teoma just crawls like crazy, takes breaks, then comes back for more.
This behavior may imitate the differing personalities of the software engineers who designed them. Teoma is tenacious and hard working. MSNbot is timid, needs instruction and some reassurance that it is doing the right thing, and picks up pages slowly and carefully. Slurp has an addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether it will be back, and when.
Now let's look at indexing by each engine. As of this writing on July 7, each engine shows differing indexing behavior as well. Google shows no pages indexed although it crawled 250 pages nearly two months ago. Yahoo has three pages indexed in a clear aging routine that doesn't list any of the nearly 8,000 pages it has crawled to date (not all itemized above). MSN has 187 pages indexed while crawling fewer pages than any of the others. Ask Jeeves has crawled more pages to date than any search engine, yet has not indexed a single page.
Each of the engines will show the number of pages indexed if you use the query operator "site:publish101.com" without the quotes. MSN 187 pages, Ask none, Yahoo 3 pages, Google none.
The daily activity in the three weeks since June 16, not listed above, has not varied dramatically, with Teoma crawling a bit more than other engines, Slurp erratically up and down and MSN slowly gathering 30 to 50 pages daily. Google is absent.
The linking campaign has been minimal, with posts to discussion lists, a couple of articles and some blog activity. Looking back over this time, it is apparent that a listing delay is actually quite sensible from the view of the search engines. Our site restructuring and bobbled robots.txt implementation seem to have abruptly stalled crawling, but the indexing behavior of each engine displays a distinctly different policy at each major player.
The sandbox is apparently not just Google's playground, but it is certainly tiresome after nearly two months. I think I'd like to leave for home, have some lunch and take a nap now.
Back to class before we leave for the day, kiddies. What did we learn today? Watch early crawler activity and be certain to implement robots.txt early and adjust often for bad bots. Oh yes, and the sandbox belongs to all search engines.
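For readers who want to check a robots.txt before the spiders do, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules shown are an assumption about what a file like ours could look like, not the actual file from the case study: the `/mail/` path standing in for the email masking links is hypothetical, as are the page URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out BecomeBot entirely and keep every other
# crawler away from the email masking links (the /mail/ path is an assumption).
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow: /mail/
"""

def build_parser(text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

def allowed(parser, agent, url):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    checks = [
        ("BecomeBot", "http://publish101.com/index.html"),
        ("Slurp", "http://publish101.com/mail/someone"),
        ("Slurp", "http://publish101.com/articles/1.html"),
    ]
    for agent, url in checks:
        print(agent, url, allowed(rules, agent, url))
```

Crawlers request robots.txt separately for each host name, so after a move to subdomains like ours, each subdomain needs its own copy of the file and is worth checking the same way.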
Mike Banks Valentine is a search engine optimization specialist who operates http://WebSite101.com and will continue reporting on this case study chronicling search indexing of http://Publish101.com
