Search Engine Robots - How They Work, What They Do (part I)
Automated search engine robots, sometimes called "spiders" or "crawlers", are the seekers of web pages. How do they work? What is it they really do? Why are they important?
You'd think with all the fuss about indexing web pages to add to search engine databases, that robots would be great and powerful beings. Wrong. Search engine robots have only basic functionality like that of early browsers in terms of what they can understand in a web page. Like early browsers, robots just can't do certain things. Robots don't understand frames, Flash movies, images or JavaScript. They can't enter password-protected areas and they can't click all those buttons you have on your website. They can be stopped cold while indexing a dynamically generated URL and slowed to a stop with JavaScript navigation.
How Do Search Engine Robots Work?
Think of search engine robots as automated data retrieval programs, traveling the web to find information and links.
When you submit a web page to a search engine at the "Submit a URL" page, the new URL is added to the robot's queue of websites to visit on its next foray out onto the web. Even if you don't directly submit a page, many robots will find your site because of links from other sites that point back to yours. This is one of the reasons why it is important to build your link popularity and to get links from other topical sites back to yours.
When arriving at your website, the automated robots first check to see if you have a robots.txt file. This file is used to tell robots which areas of your site are off-limits to them. Typically these may be directories containing only binaries or other files the robot doesn't need to concern itself with.
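As a minimal sketch of that first check, the snippet below uses Python's standard `urllib.robotparser` module to decide whether a URL is off-limits. The robots.txt contents, the `ExampleBot` user agent name and the example.com URLs are all made up for illustration; real robots fetch the file from the site root before requesting anything else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are assumptions
# chosen to match the "binaries" example in the text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /downloads/binaries/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a placeholder user agent, not a real robot.
allowed_home = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
allowed_cgi = parser.can_fetch("ExampleBot", "http://www.example.com/cgi-bin/search.cgi")

print("index.html allowed:", allowed_home)
print("cgi-bin/search.cgi allowed:", allowed_cgi)
```

A well-behaved robot would skip any URL for which `can_fetch` returns `False`.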
Robots collect links from each page they visit and later follow those links to other pages. The World Wide Web was built on links, the original idea being that you could follow them from one place to another, and this is how robots get around.
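The link-following behavior can be sketched in a few lines of Python. This is an illustration only, not how any particular search engine implements it: the three pages in `PAGES` are a hypothetical in-memory site standing in for real HTTP requests, and `LinkCollector` and `crawl` are names invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical site used instead of real network access.
PAGES = {
    "http://www.example.com/": '<a href="/about.html">About</a> <a href="/products.html">Products</a>',
    "http://www.example.com/about.html": '<a href="/">Home</a>',
    "http://www.example.com/products.html": '<a href="/about.html">About</a> <a href="/missing.html">Gone</a>',
}


def crawl(start_url, fetch):
    """Visit pages breadth-first, queueing each newly seen link once."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        visited.append(url)
        collector = LinkCollector(url)
        collector.feed(html)
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


visited = crawl("http://www.example.com/", PAGES.get)
print(visited)
```

The `seen` set keeps the robot from revisiting a page that several other pages link to, and the queue plays the role of the list of URLs waiting for the robot's next visit.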
The "smarts" about indexing pages online come from the search engine engineers, who devise the methods used to evaluate the information the search engine robots retrieve. Once that information is added to the search engine database, it is available to searchers querying the search engine. When a user enters a query, the search engine runs a number of quick calculations to select the set of results most relevant to that query.
You can see which pages on your site the search engine robots have visited by looking at your server logs or the results from your log statistics program. Identifying the robots will show you when they visited your website, which pages they visited and how often they visit. Some robots are readily identifiable by their user agent names, like Google's "Googlebot"; others are a bit more obscure, like Inktomi's "Slurp". Still other robots may be listed in your logs that you cannot readily identify; some of them may even appear to be human-powered browsers.
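If you do not have a log statistics program, a short script can pull robot visits out of a raw access log. The sketch below assumes the log uses the common "combined" format, in which the user agent is the last quoted field; your server may be configured differently. The sample log lines, IP addresses and the `robot_visits` helper are invented for illustration.

```python
import re

# Assumed "combined" access log layout; adjust the pattern to your server.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Substrings that identify the robots named in the text.
ROBOT_TOKENS = ("Googlebot", "Slurp")

# Made-up sample lines; the IP addresses are documentation-only values.
SAMPLE_LOG = '''\
192.0.2.10 - - [01/Mar/2005:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
192.0.2.10 - - [01/Mar/2005:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
192.0.2.20 - - [01/Mar/2005:11:30:00 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"
192.0.2.30 - - [01/Mar/2005:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
'''


def robot_visits(log_text):
    """Map each known robot to the list of paths it requested, in log order."""
    visits = {}
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        for token in ROBOT_TOKENS:
            if token.lower() in agent:
                visits.setdefault(token, []).append(match.group("path"))
                break
    return visits


visits = robot_visits(SAMPLE_LOG)
for robot, paths in visits.items():
    print(robot, len(paths), paths)
```

Matching on the user agent string only catches robots that announce themselves; a robot that poses as a regular browser would need to be spotted another way, for example by its IP address.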
Along with identifying individual robots and counting the number of their visits, the statistics can also show you aggressive bandwidth-grabbing robots or robots you may not want visiting your website. In the resources section at the end of this article, you will find sites that list names and IP addresses of search engine robots to help you identify them.
How Do They Read The Pages On Your Website?
When the search engine robot visits your page, it looks at the visible text on the page, the content of the various tags in your page's source code (title tag, meta tags, etc.), and the hyperlinks on your page. From the words and the links that the robot finds, the search engine decides what your page is about. There are many factors used to figure out what "matters" and each search engine has its own algorithm in order to evaluate and process the information. Depending on how the robot is set up through the search engine, the information is indexed and then delivered to the search engine's database.
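To make this concrete, here is a rough sketch of how a program could pull those pieces (title tag, meta tags, visible text and hyperlinks) out of a page with Python's standard `html.parser` module. The `PageReader` class and the sample HTML are hypothetical, and real robots weigh this information with algorithms that are not public.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Gathers the title, named meta tags, links and visible text of a page."""

    SKIPPED = {"script", "style"}  # content in these tags is not visible text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif not self._skip_depth:
            self.text_parts.append(text)


# Made-up page used only to exercise the reader.
SAMPLE_HTML = """
<html><head><title>Acme Widgets</title>
<meta name="description" content="Hand-made widgets.">
<script>var hidden = "ignored";</script></head>
<body><h1>Welcome</h1><p>We sell <a href="/widgets.html">widgets</a>.</p></body></html>
"""

reader = PageReader()
reader.feed(SAMPLE_HTML)
visible_text = " ".join(reader.text_parts)

print("title:", reader.title)
print("meta:", reader.meta)
print("links:", reader.links)
print("text:", visible_text)
```

Notice that the script contents never reach the visible text, which mirrors the point made earlier that robots do not understand JavaScript.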
The information delivered to the databases then becomes part of the search engine and directory ranking process. When the search engine visitor submits their query, the search engine digs through its database to give the final listing that is displayed on the results page.
The search engine databases update at varying times. Once you are in the search engine databases, the robots keep visiting you periodically, to pick up any changes to your pages, and to make sure they have the latest info. The number of times you are visited depends on how the search engine sets up its visits, which can vary per search engine.
Sometimes visiting robots are unable to access the website they are visiting. If your site is down, or you are experiencing huge amounts of traffic, the robot may not be able to access your site. When this happens, the website may not be re-indexed, depending on the frequency of the robot visits to your website. In most cases, robots that cannot access your pages will try again later, hoping that your site will be accessible then.
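The "try again later" behavior might look something like the sketch below. It is an assumption about the general approach, not a description of any real robot: the `crawl_with_retries` function, the limit of three attempts and the fake fetch function that simulates a busy or downed site are all invented for this example.

```python
from collections import deque


def crawl_with_retries(urls, fetch, max_attempts=3):
    """Fetch each URL, moving failures to the back of the queue to retry later."""
    queue = deque((url, 1) for url in urls)
    fetched, gave_up = [], []
    while queue:
        url, attempt = queue.popleft()
        try:
            fetch(url)
        except OSError:  # covers connection errors and timeouts
            if attempt < max_attempts:
                queue.append((url, attempt + 1))
            else:
                gave_up.append(url)
        else:
            fetched.append(url)
    return fetched, gave_up


# Simulated site: busy.html fails once, down.html never responds.
failures_left = {
    "http://www.example.com/busy.html": 1,
    "http://www.example.com/down.html": 99,
}


def fake_fetch(url):
    """Stand-in for a real HTTP request."""
    if failures_left.get(url, 0) > 0:
        failures_left[url] -= 1
        raise ConnectionError(f"cannot reach {url}")
    return "<html></html>"


fetched, gave_up = crawl_with_retries(
    [
        "http://www.example.com/",
        "http://www.example.com/busy.html",
        "http://www.example.com/down.html",
    ],
    fake_fetch,
)
print("fetched:", fetched)
print("gave up:", gave_up)
```

A real robot would typically wait hours or days between attempts rather than retrying immediately, which is why a site that is down for a long stretch can miss being re-indexed.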
Resources
* SpiderSpotting - Search Engine Watch: http://searchenginewatch.com/webmasters/spiders.html
* Robotstxt.org: List of robots and protocols for setting up a robots.txt file. http://www.robotstxt.org/
* Spider-Food: Tutorials, forums and articles about Search Engine spiders and Search Engine Marketing. http://spider-food.net/
* Spiderhunter.com: Articles and resources about tracking Search Engine spiders. http://www.spiderhunter.com/
* Sim Spider Search Engine Robot Simulator: Search Engine World has a spider that simulates what the Search Engine robots read from your website. http://www.searchengineworld.com/cgi-bin/sim_spider.cgi
Daria Goetsch is the founder and Search Engine Marketing Consultant for Search Innovation Marketing, a Search Engine Optimization company serving small businesses. She has specialized in Search Engine Promotion since 1998, including three years as the Search Engine Specialist for O'Reilly Media, Inc., a technical book publishing company.
Copyright © 2002-2005 Search Innovation Marketing. http://www.searchinnovation.com All Rights Reserved.
Permission to reprint this article is granted if the article is reproduced in its entirety, without editing, including the bio information. Please include a hyperlink to http://www.searchinnovation.com when using this article in newsletters or online.
