Stopping Robots and Web-Crawlers - preCharge Forums
Old 07-26-2006   #1 (permalink)
mundy
 
Posts: n/a
Default Stopping Robots and Web-Crawlers

I have a few pages on my site that I don't want web crawlers, spiders, or robots to index. I remember reading somewhere about several different ways to do this, but I can't find it again (I'm still searching Google, so I might end up finding my own answer).

Anyone know off the top of their head?


Old 07-26-2006   #2 (permalink)
mundy
 
Posts: n/a
Default Re: Stopping Robots and Web-Crawlers

Ah! Found it!

For those that want to know...

Most legitimate web crawlers (like Google and Yahoo) will look for a file called "robots.txt" in the root directory of your website (more than likely "/home/username/public_html/" on your server) and follow the instructions inside it.
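To illustrate how a cooperating crawler reads those instructions, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the /private/ path, and the example.com URLs are assumptions made up for illustration, not anything from my actual site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is only an example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/private/notes.html",
        "http://www.example.com/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Keep in mind this only shows what a well-behaved crawler decides for itself; nothing in robots.txt actually blocks a request.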

A quick look on Wikipedia taught me everything I needed to know: http://en.wikipedia.org/wiki/Robots.txt

That's all well and good, but what about the crawlers that don't cooperate with robots.txt? You'll need to deny them access with an .htaccess file (which assumes your site runs on Apache). This site tells you how: http://www.clockwatchers.com/robots_bad.html
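As a rough sketch of what such a block list can look like, the Python below generates Apache 2.2-style directives (SetEnvIfNoCase plus Order/Allow/Deny) from a list of User-Agent names. The function name build_htaccess_rules and the bot name "ExampleBadBot" are hypothetical, and the linked page may do it differently; on Apache 2.4 these older access directives need mod_access_compat or a rewrite using Require.

```python
def build_htaccess_rules(bad_agents):
    """Return .htaccess text that denies requests from the given User-Agents.

    Assumption: Apache 2.2-style access control with mod_setenvif enabled.
    """
    lines = []
    for agent in bad_agents:
        # Keep names to plain letters and digits so nothing needs escaping
        # inside the quoted regular expression.
        if not agent.isalnum():
            raise ValueError(f"unsupported characters in agent name: {agent!r}")
        # Case-insensitive match anywhere in the User-Agent header.
        lines.append(f'SetEnvIfNoCase User-Agent "{agent}" bad_bot')
    lines.append("Order Allow,Deny")
    lines.append("Allow from all")
    # Any request tagged with the bad_bot variable is refused.
    lines.append("Deny from env=bad_bot")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "ExampleBadBot" is a placeholder, not a real crawler name.
    print(build_htaccess_rules(["ExampleBadBot"]))
```

You would paste the printed lines into the .htaccess file in your web directory and check that normal visitors can still load the site.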

Hope I helped someone =)

Old 07-28-2006   #3 (permalink)
MmixX
Junior Member
 
Join Date: Jul 2006
Age: 29
Posts: 8
Default Re: Stopping Robots and Web-Crawlers

yeah! did that on my site too :)

btw, thanks for posting the link
__________________
-=(M m i x X)=-
Illuzion Web Worx - Still Under Construction

Old 07-28-2006   #4 (permalink)
mundy
 
Posts: n/a
Default Re: Stopping Robots and Web-Crawlers

Happy to help =)

Old 07-28-2006   #5 (permalink)
spottedhog
Junior Member
 
Join Date: Jul 2006
Posts: 16
Default Re: Stopping Robots and Web-Crawlers

You left out an important part of the "Bad Bots" issue: how to find the "bad bots" that are spidering your website.

There are 2 ways:

1. Look at your access logs every day (a little tedious to do each day).
2. Create a file to catch bad bots.

Here are some links about creating "Bot Traps".

http://www.kloth.net/internet/bottrap.php
http://www.fleiner.com/bots

Bad Bot list and what to do with it:
http://www.javascriptkit.com/howto/htaccess13.shtml

Of course, you can find other bad bots to ban.

The overall concept is to set up your robots.txt file correctly. The bad bots are the ones that do not follow the rules you set in robots.txt. You could set up a fake directory and put some kind of bot trap in it, then add a rule to robots.txt that disallows that directory. A well-behaved crawler will never request anything there, so a bad bot that opens the bot trap alerts you, the site admin, to what happened and who did it. Then you should ban that bot in the .htaccess file.
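The same idea can also be checked from the access logs. Here is a minimal Python sketch that looks for requests to the trap directory, assuming the server writes Apache's combined log format; the /trap/ directory name, the sample log lines, and the find_trap_hits helper are all made up for illustration.

```python
import re

# Matches Apache "combined" log lines: client IP, request path, User-Agent.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample entries; /trap/ stands for the directory that
# robots.txt disallows.
SAMPLE_LOG = [
    '203.0.113.7 - - [28/Jul/2006:10:00:00 +0000] '
    '"GET /trap/index.html HTTP/1.1" 200 512 "-" "ExampleBadBot/1.0"',
    '198.51.100.4 - - [28/Jul/2006:10:00:05 +0000] '
    '"GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]


def find_trap_hits(log_lines, trap_prefix="/trap/"):
    """Return (ip, user_agent) pairs for requests into the trap directory."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        # Lines that do not parse as combined format are skipped.
        if match and match.group("path").startswith(trap_prefix):
            hits.append((match.group("ip"), match.group("agent")))
    return hits


if __name__ == "__main__":
    for ip, agent in find_trap_hits(SAMPLE_LOG):
        print(f"Trap hit from {ip} using {agent!r}")
```

Anything this reports ignored the Disallow rule, so its IP address or User-Agent is a candidate for the ban list in .htaccess.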


Old 07-28-2006   #6 (permalink)
mundy
 
Posts: n/a
Default Re: Stopping Robots and Web-Crawlers

Yeah, I read about those but must have forgotten to mention the link =/

Thanks though ^^ =)

Old 07-29-2006   #7 (permalink)
joomlajon
Banned User
 
joomlajon's Avatar
 
Join Date: Jul 2006
Age: 33
Posts: 5
Default Re: Stopping Robots and Web-Crawlers

Thanks for the info, I will use it soon when I launch. :)
