Site Owners Forums - Webmaster Forums

sameerseo 07-24-2012 03:36 AM

Robots.txt
 
Do you know anything about robots.txt?

wyanesmith1987 07-24-2012 05:04 AM

Introduction to "robots.txt"

There is a hidden, relentless force that permeates the web and its billions of web pages and files, unbeknownst to the majority of us sentient beings. I'm talking about search engine crawlers and robots here. Every day hundreds of them go out and scour the web, whether it's Google trying to index the entire web, or a spam bot collecting any email address it can find for less than honorable intentions. As site owners, what little control we have over what robots are allowed to do when they visit our sites exists in a magical little file called "robots.txt."

"Robots.txt" is a regular text file that, through its name, has special meaning to the majority of "honorable" robots on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or the site as a whole. For example, you may not want Google to crawl the /images directory of your site, as it's both meaningless to you and a waste of your site's bandwidth. "Robots.txt" lets you tell Google just that.
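
As a rough illustration of the /images case, here is a minimal Python sketch that feeds a made-up robots.txt to the standard library's urllib.robotparser and asks what each robot may fetch. The domain www.example.com, the file names, and the "SomeOtherBot" agent are assumptions for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only Googlebot is asked to stay out of /images/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is asked not to fetch anything under /images/ ...
googlebot_images = parser.can_fetch("Googlebot", "http://www.example.com/images/logo.png")
# ... but the rest of the site is still open to it.
googlebot_home = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
# A robot that is not named in the file has no restrictions here.
other_images = parser.can_fetch("SomeOtherBot", "http://www.example.com/images/logo.png")

print(googlebot_images, googlebot_home, other_images)  # prints: False True True
```

The rule only applies to the robot named on the User-agent line; to address every robot you would use "User-agent: *" instead.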

If you want to read more, see this link:
http://www.javascriptkit.com/howto/robots.shtml

C.Rebecca 07-25-2012 05:02 AM

Robots.txt is a text (not HTML) file you put in your site's root directory to tell search robots which files to ignore or, alternatively, which files to crawl. It also helps search engines locate the website's sitemap and so crawl the entire site in depth, which can help your rankings and traffic.
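
The sitemap part can be sketched like this in Python. The robots.txt content and the sitemap URL are invented for the example, and the site_maps() method assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one rule group and a Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # prints: ['http://www.example.com/sitemap.xml']
```

The Sitemap line is independent of the User-agent groups, so it can sit anywhere in the file.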

larsen81 08-16-2012 02:32 AM

Robots.txt is a file where you tell search engines which sections or pages of your site not to index.

davikerkrish 08-16-2012 04:25 AM

Robots.txt is a text file that is placed in your root directory. It can allow or disallow spiders or search engines from indexing pages.
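
A small sketch of allowing and disallowing in the same file, again using Python's urllib.robotparser. The /docs/ paths are made up. Note that "Allow" was not part of the original robots.txt convention, although the major search engines support it; the Allow line is placed first here so that parsers which stop at the first matching rule and parsers which prefer the longest match reach the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: open one subfolder, close the rest of /docs/.
ROBOTS_TXT = """\
User-agent: *
Allow: /docs/public/
Disallow: /docs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

public_ok = parser.can_fetch("AnyBot", "http://www.example.com/docs/public/guide.html")
internal_ok = parser.can_fetch("AnyBot", "http://www.example.com/docs/internal/notes.html")
elsewhere_ok = parser.can_fetch("AnyBot", "http://www.example.com/about.html")

print(public_ok, internal_ok, elsewhere_ok)  # prints: True False True
```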

Paul Jameson 08-16-2012 06:05 AM

Robots.txt is a text file that you place in your root directory. Its main function is to guide search engines to the pages on your website that they may crawl.

rajnish240 08-16-2012 11:17 PM

It can be used to keep Google's spider away from pages such as a company's privacy policy, so that they are less likely to show up in search results. The pages themselves remain publicly accessible.

cheskaSEO 08-16-2012 11:55 PM

Quote:

Originally Posted by rajnish240 (Post 230656)
It can be used to keep Google's spider away from pages such as a company's privacy policy, so that they are less likely to show up in search results. The pages themselves remain publicly accessible.

I agree with this post.

Dilli Live 08-17-2012 02:55 AM

Robots.txt is a text file that sits in the root directory of your website. With its help you can keep unwanted links on your site away from search engines.


Thanks,

samaustin141 08-17-2012 03:17 AM

It is a text file that instructs search engine spiders or crawlers on what to do. It tells specific web spiders which web pages they may index. Robots are configured to read the text file, which contains restrictions for web spiders, telling them where they have permission to go. It is like defining rules for search engine spiders (robots) on what to follow and what not to.

AshleyScott 08-18-2012 05:06 AM

A robots.txt file is necessary when you want to tell the crawler which pages it is allowed to crawl.

jiyaalbert 08-20-2012 03:42 AM

The robots.txt file is a set of instructions for visiting robots that index the content of your web site's pages. For those spiders that obey the file, it provides a map of what they can and cannot index. The file must reside in the root directory of your web site.
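
Because the file has to live in the root directory, a robot can work out where to look from any page address. A minimal Python sketch of that step, where robots_url is a hypothetical helper name and the example URLs are made up:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/blog/2012/post.html?x=1"))
# prints: http://www.example.com/robots.txt
```

A robots.txt placed in a subfolder, such as /blog/robots.txt, would never be requested this way, which is why the location matters.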

NateJacobs 08-20-2012 04:48 AM

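The main use of robots.txt is to keep URLs you don't want indexed out of Google. Many website owners also use robots.txt hoping to keep hackers away, but it is a public file that only well-behaved robots obey, so it does not protect a site from being hacked.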

georgemathew 08-21-2012 02:07 AM

Robots.txt is a text file that can allow or disallow spiders or search engines from indexing pages.

nptifitness 08-23-2012 10:59 PM

Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site; it is a request, not a barrier.
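
To make the "request, not a barrier" point concrete, here is a small Python sketch. Both functions are hypothetical stand-ins for the decision a crawler makes before fetching a page, and the robots.txt content is invented. The only difference between the two crawlers is whether they choose to ask.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler_may_visit(agent: str, url: str) -> bool:
    # A well-behaved robot consults robots.txt before every fetch.
    return parser.can_fetch(agent, url)


def rude_crawler_may_visit(url: str) -> bool:
    # A badly behaved robot never looks at robots.txt; nothing in the
    # file itself can stop it from requesting the page.
    return True


url = "http://www.example.com/private/report.html"
print(polite_crawler_may_visit("PoliteBot", url))  # prints: False
print(rude_crawler_may_visit(url))                 # prints: True
```

Pages that really must stay private need a password or other server-side access control, not a robots.txt entry.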

john mathew 08-27-2012 04:43 AM

Hi, I am reading this article. Thanks for sharing this information.

webdesignindia 08-27-2012 05:11 AM

Quote:

Originally Posted by C.Rebecca (Post 227719)
Robots.txt is a text (not HTML) file you put in your site's root directory to tell search robots which files to ignore or, alternatively, which files to crawl. It also helps search engines locate the website's sitemap and so crawl the entire site in depth, which can help your rankings and traffic.

I completely agree with you. Robots.txt allows you to tell search engines not to crawl sensitive data or information, which is its main benefit.

zeilenga569 08-31-2012 04:17 AM

Thanks, great post!

lylevasser12 08-31-2012 07:14 AM

Hello,
It's a text file that instructs search engine spiders or crawlers on what to do. It tells specific web spiders which web pages to index.

blueapple 09-05-2012 05:38 AM

A robots.txt file is a simple text file. The robots.txt file on a website functions as a request that specified robots ignore specified files or directories when crawling the site.

wowmadam 09-05-2012 11:54 PM

Robots.txt is a very useful text file that you upload to the root directory of your site to disallow crawling of the URLs listed in it, so that they are not displayed to users in search results.

Thanks

lawrencehayden 09-27-2012 12:09 AM

The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention for asking cooperating web spiders and other web robots not to access all or part of a website that is otherwise publicly viewable. Spiders are often used by search engines to categorize and archive websites, or by site owners to check their source code.

anshulniet 09-27-2012 01:04 AM

Robots.txt is a text file that you can put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but search engines generally obey what they are asked not to do. The location of robots.txt is very important, as it must be in the main directory.

alluremedspa123 09-27-2012 02:38 AM

A robots.txt is a permissions file that can be used to control which webpages of a website a search engine indexes. The file must be located in the root directory of the website for a search engine's website-indexing program (spider) to reference it.

3idatascraping 09-28-2012 12:33 AM

Robots.txt tells search engines which pages you want crawled and which you do not.

peterraimi 09-28-2012 02:27 AM

Robots.txt is a text (not html) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines but generally search engines obey what they are asked not to do.

Structure of a Robots.txt File:

The structure of a robots.txt is pretty simple (and barely flexible): it is an endless list of user agents and disallowed files and directories. Basically, the syntax is as follows:

User-agent:
Disallow:

"User-agent:" names the search engines' crawlers, and "Disallow:" lists the files and directories to be excluded from indexing. In addition to "User-agent:" and "Disallow:" entries, you can include comment lines: just put the # sign at the beginning of the line:

# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
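
One way to check what the /temp example does is to run it through Python's urllib.robotparser. This is only a sketch: the domain www.example.com, the file names, and the "AnyBot" agent are assumptions, and other parsers may differ in small details.

```python
from urllib.robotparser import RobotFileParser

# The example from this post, written as one record with no blank
# line between the User-agent and Disallow lines.
ROBOTS_TXT = """\
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The comment line is ignored; the rule applies to every robot.
temp_ok = parser.can_fetch("AnyBot", "http://www.example.com/temp/draft.html")
home_ok = parser.can_fetch("AnyBot", "http://www.example.com/index.html")

print(temp_ok, home_ok)  # prints: False True
```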

john mathew 09-28-2012 04:18 AM

Robots.txt tells Google which pages of the website should be crawled.


All times are GMT -7.

