#1
Posts: n/a
I have a few pages around my site that I don't want any web crawlers, spiders or robots to index. I remember reading somewhere about several different ways to do it, but I can't seem to find it (I'm still searching Google, though, so I might end up finding my answer).
Does anyone know off the top of their head?
#2
Posts: n/a
Ah! Found it!
For those who want to know: most legitimate web crawlers (like Google's and Yahoo's) look for a file called "robots.txt" in your site's root web directory (most likely "/home/username/public_html/") and follow the instructions inside it. A quick look at Wikipedia taught me everything I needed to know: http://en.wikipedia.org/wiki/Robots.txt
That's all well and good, but what about the crawlers that don't cooperate with robots.txt? The file is only a request, so you'll need to deny those crawlers access with an .htaccess file. This site tells you how: http://www.clockwatchers.com/robots_bad.html
Hope I helped someone =)
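To see what a well-behaved crawler actually does with robots.txt, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents and the paths `/private/` and `/bot-trap/` are made-up examples, not taken from any real site. It only shows how a compliant crawler would interpret the file; a crawler that ignores robots.txt is not affected by it at all, which is why the .htaccess step is still needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: "/private/" and "/bot-trap/" are
# example paths made up for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /bot-trap/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network fetch is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/notes.html", "/bot-trap/"):
        url = "http://www.example.com" + path
        print(path, allowed("Googlebot", url))
```

Running it should report `/index.html` as allowed and the two disallowed paths as blocked. Only the `User-agent: *` group is used here, so the same answer applies to any crawler name.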
#3
Junior Member
Yeah! I did that on my site too :)
BTW, thanks for posting the link.
__________________
-=(M m i x X)=-
Illuzion Web Worx - Still Under Construction
#5
Junior Member
Join Date: Jul 2006
Posts: 16
You left out an important part of the "bad bots" issue: how to find the bad bots that are spidering your website.
There are 2 ways:
1. Look at your access logs every day (a little tedious to do daily).
2. Create a file to catch bad bots. Here are some links about creating "bot traps":
http://www.kloth.net/internet/bottrap.php
http://www.fleiner.com/bots
A bad bot list and what to do with it: http://www.javascriptkit.com/howto/htaccess13.shtml
Of course, you can find other bad bots to ban as well.
The overall concept is to set up your robots.txt file correctly, because the bad bots are the ones that don't follow the rules you set in robots.txt. Set up a fake directory and put some kind of bot trap in it, then add a rule to robots.txt that disallows that directory. A well-behaved crawler will never enter it, so a bad bot is the only thing that will open the bot trap, which should alert you, the site admin, to what happened and who did it. Then you should ban that bot in your .htaccess file.
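The bot trap idea can be sketched in Python. Instead of a trap script that alerts you itself (as in the linked pages), this is a swapped-in log-scanning variant: it reads lines in Apache's "combined" access-log format, flags any client that requested the trap directory, and prints Apache 2.2-style `Deny from` lines you could paste into .htaccess. The trap path `/bot-trap/` and the sample log lines are hypothetical, and the regex assumes the combined log format, so adjust both for your own server. On Apache 2.4 the `Order`/`Deny` directives need mod_access_compat (the native 2.4 form is `Require not ip`).

```python
import re
from collections import Counter

# Hypothetical trap directory. It must also appear under "Disallow:" in
# robots.txt, so only robots that ignore the rules ever request it.
TRAP_PREFIX = "/bot-trap/"

# Assumes Apache's "combined" log format:
# IP ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def find_trapped(log_lines):
    """Count (ip, user_agent) pairs that requested anything under the trap."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(TRAP_PREFIX):
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits


def htaccess_rules(hits):
    """Build Apache 2.2-style access rules denying every trapped IP."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for ip in sorted({ip for ip, _agent in hits}):
        lines.append(f"Deny from {ip}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical log lines using documentation-reserved IP addresses.
    sample = [
        '192.0.2.10 - - [22/Jul/2006:20:42:00 -0500] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [22/Jul/2006:20:43:10 -0500] "GET /bot-trap/ HTTP/1.1" 200 128 "-" "BadBot/1.0"',
    ]
    trapped = find_trapped(sample)
    for (ip, agent), count in trapped.items():
        print(f"{ip} ({agent}) hit the trap {count} time(s)")
    print(htaccess_rules(trapped))
```

Banning by IP is the simplest option; since bad bots can change addresses, the user-agent that is also collected here could feed the user-agent blocking described on the linked pages instead.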