Site Owners Forums - Webmaster Forums - What is disallow in robots.txt file?

What is disallow in robots.txt file?

Hello friends,

What is disallow in robots.txt file?

Disallow tells search engine crawlers not to visit a particular page or path on the site.

The simplest robots.txt file uses two keywords, User-agent and Disallow.
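To make those two keywords concrete, here is a minimal sketch that checks a two-line robots.txt with Python's standard urllib.robotparser module. The rules, the example.com URLs, and the "ExampleBot" crawler name are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" and the example.com URLs are placeholders, not real crawlers or pages.
private_ok = parser.can_fetch("ExampleBot", "http://example.com/private/report.html")
public_ok = parser.can_fetch("ExampleBot", "http://example.com/index.html")

print("private page crawlable:", private_ok)
print("public page crawlable:", public_ok)
```

User-agent picks which crawler the rule applies to, and Disallow gives the path prefix that crawler is asked to skip.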

It means crawlers should not visit that particular part of the site.

The Robots Exclusion Protocol (REP) is not exactly a complicated protocol and its uses are fairly limited, so it’s usually given short shrift by SEOs. Yet there’s a lot more to it than you might think. Robots.txt has been with us for over 14 years, but how many of us knew that in addition to the disallow directive there was an unofficial noindex directive that Googlebot used to obey (Google stopped honoring it in September 2019)? That noindexed pages don’t end up in the index but disallowed pages can, and the latter can show up in the search results (albeit with less information, since the spiders can’t see the page content)? That disallowed pages still accumulate PageRank? That robots.txt can accept a limited form of pattern matching? That, because of that last feature, you can selectively disallow not just directories but also particular filetypes (well, file extensions to be more exact)? That a robots.txt disallowed page can’t be accessed by the spiders, so they can’t read and obey a meta robots tag contained within the page?
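The limited pattern matching mentioned above can be sketched roughly like this. The semantics are an assumption based on the Google-style extension: "*" matches any run of characters and a trailing "$" anchors the rule to the end of the path. As far as I know, Python's built-in urllib.robotparser only does plain prefix matching, so the helper names here (rule_to_regex, is_disallowed) are hypothetical and hand-rolled.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, and a trailing
    '$' anchors the pattern to the end of the URL path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path, rules):
    # Empty rules are skipped: an empty Disallow value blocks nothing.
    return any(rule_to_regex(r).match(path) for r in rules if r)

# Hypothetical rules: block every PDF, plus everything under /tmp/.
rules = ["/*.pdf$", "/tmp/"]

pdf_blocked = is_disallowed("/docs/guide.pdf", rules)
html_blocked = is_disallowed("/docs/guide.html", rules)
tmp_blocked = is_disallowed("/tmp/cache/file.txt", rules)
print(pdf_blocked, html_blocked, tmp_blocked)
```

A real crawler also weighs Allow rules and rule precedence, which this sketch leaves out.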

A robots.txt file provides critical information for search engine spiders that crawl the web. Before these bots (does anyone say the full word “robots” anymore?) access pages of a site, they check to see if a robots.txt file exists. Doing so makes crawling the web more efficient, because the robots.txt file keeps the bots from accessing pages that the site owner does not want crawled.

Having a robots.txt file is a best practice, if only because some metrics programs will interpret the 404 response to a request for a missing robots.txt file as an error, which could result in erroneous performance reporting. But what goes in that robots.txt file? That’s the crux of it.

Both robots.txt and robots meta tags rely on cooperation from the robots, and are by no means guaranteed to work for every bot. If you need stronger protection from unscrupulous robots and other agents, you should use alternative methods such as password protection. Too many times I’ve seen webmasters naively list sensitive URLs such as administrative areas in robots.txt. You’d better believe robots.txt is one of the hacker’s first ports of call, to see where they should try to break in.
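To see why listing sensitive areas in robots.txt gives them away, note that the file is plain text anyone can read. A rough sketch, with made-up paths and a hypothetical helper name (listed_paths):

```python
def listed_paths(robots_txt):
    """Return every non-empty Disallow value found in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# Hypothetical file in which the webmaster has "hidden" two areas.
SAMPLE = """\
User-agent: *
Disallow: /admin/      # meant to be secret
Disallow: /backup/
"""

exposed = listed_paths(SAMPLE)
print(exposed)
```

Anything that shows up in that list is advertised to every visitor, which is why password protection is the better tool for truly private areas.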

Quote:

Originally Posted by parmodshastri (Post 743666)

Hello friends,

What is disallow in robots.txt file?

You are posting the same question on every forum to gain backlinks. Remember that SEO results depend on many other factors. If your only aim is to get backlinks, you are definitely going to lose out on results.

Robots.txt is a text file webmasters create to instruct web robots (typically search engine robots) how to crawl pages on their website.

A robots.txt file is a file at the root of your site that indicates those parts of your site you don't want accessed by search engine crawlers. The file uses the ... While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL from other places on the web.

The asterisk after “User-agent” means that the robots.txt rule applies to all web robots that visit the site. The slash after “Disallow” tells the robot not to visit any pages on the site.
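As a quick check of the asterisk-and-slash combination, the sketch below compares it with an empty Disallow value, which blocks nothing. It again uses Python's standard urllib.robotparser; the can_crawl helper, the "ExampleBot" name, and the example.com URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Return True if the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Asterisk = all robots; slash = the whole site is off limits.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Same asterisk, but an empty Disallow value blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

url = "http://example.com/any/page.html"  # placeholder URL
blocked_everywhere = not can_crawl(BLOCK_ALL, url)
open_everywhere = can_crawl(ALLOW_ALL, url)
print(blocked_everywhere, open_everywhere)
```

A single character separates "block the entire site" from "block nothing", so it is worth double-checking this line before uploading the file.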