Crawling
There is a hidden, relentless force that permeates the web and its billions of pages and files, unknown to most of us: search engine crawlers and robots. Every day, hundreds of them scour the web, whether it's Google trying to index the entire web or a spam bot collecting every email address it can find for less than honorable purposes. As site owners, what little control we have over what robots may do when they visit our sites lives in a small file called "robots.txt."
"Robots.txt" is a regular text file that, through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this file, you can instruct robots not to crawl and index certain files or directories within your site, or not to crawl the site at all. For example, you may not want Google to crawl the /images directory of your site, since doing so brings you no benefit and wastes your site's bandwidth. "Robots.txt" lets you tell Google just that.
Caching
Google's cache has appeared in the search results for a long time, yet it is often ignored in SEO strategy and analysis. Used well, it can provide information that helps increase leads, sales, and user satisfaction, and it can even offer clues to existing problems with your website. This article examines Google's cache page in detail and recommends possible ways to use that information in your search engine optimization strategy.
Basic components of Google's cache
Before we dive deeper, a short introduction to Google's cache is helpful. You can view the Google cache of your website or web pages in three different ways.
Method 1: While visiting the URL whose cached copy you want to view, click the Google Toolbar (e.g., in the Firefox browser); in the drop-down, click "Google snapshot of page."
Method 2: In the Google search result that shows the URL you want to view, click the "Cached" link.
Method 3: In the Google search box, type:
cache:www.thisisyourdomain.com/thisisthepage.htm
Or if you are checking for your home page:
cache:www.thisisyourdomain.com/
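If you need to check many pages, the text for Method 3 can be generated instead of typed by hand. The helper below is a minimal sketch; the name cache_query is hypothetical, and it only builds the string to paste into the search box. It does not contact Google.

```python
from urllib.parse import urlsplit


def cache_query(url: str) -> str:
    """Return the "cache:" search text for a page URL.

    Accepts URLs with or without a scheme (http://, https://).
    """
    # urlsplit only recognizes the host when the string has a "//" prefix.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    parts = urlsplit(url)
    path = parts.path or "/"  # a bare domain means the home page
    query = "cache:" + parts.netloc + path
    if parts.query:
        query += "?" + parts.query
    return query


print(cache_query("www.thisisyourdomain.com/thisisthepage.htm"))
# cache:www.thisisyourdomain.com/thisisthepage.htm
print(cache_query("http://www.thisisyourdomain.com"))
# cache:www.thisisyourdomain.com/
```

The scheme is dropped and there is no space after the colon, which matches the form shown in Method 3.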
Read more at
http://www.seochat.com/c/a/Google-Op...fLj0gmlwE2p.99