Let's discuss robots.txt for SMF. I believe this is a crucial Search Engine Optimization topic.
If you don't know what robots.txt is, first read up on it here. Nutshell version: you use this file to tell the bots what is okay to crawl and what is not. Compliance with your instructions is strictly voluntary on the bots' part, so this in no way "protects" your pages from being viewed (there are other mechanisms that do).
Goal 1 should be preventing the bots from indexing non-content pages. These include basically everything except the forum index, board indexes and topic pages. The great news is that SMF's URL structure makes this dead easy.
Here's my robots.txt:
Code:
User-agent: *
Disallow: /index.php?action=
In one shot, this knocks out all the non-content pages. It even works if you have search-engine friendly URLs on. In vBulletin, it takes a dozen and a half lines to do what this one does.
Note that if your forum root isn't /, you'll have to adjust accordingly. This version works if you've set up a subdomain like http://forums.example.com/ . But if you run the forum in a subdirectory like http://www.example.com/forums/ then you want something like
Code:
User-agent: *
Disallow: /forums/index.php?action=
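If you want to sanity-check either version locally, here's a rough sketch using Python's standard urllib.robotparser module. The sample URLs (action=profile, action=search, topic=1.0, board=1.0) are just illustrative guesses at typical SMF links, not an exhaustive list, and Python's parser won't necessarily behave exactly like every real bot.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Rules for a forum living at the site root (e.g. a forums. subdomain).
ROOT_RULES = """\
User-agent: *
Disallow: /index.php?action=
"""

# Rules for a forum living under /forums/.
SUBDIR_RULES = """\
User-agent: *
Disallow: /forums/index.php?action=
"""

root = build_parser(ROOT_RULES)
subdir = build_parser(SUBDIR_RULES)

# Hypothetical sample URLs; swap in links from your own forum.
samples = [
    (root, "http://forums.example.com/index.php?action=profile"),
    (root, "http://forums.example.com/index.php?topic=1.0"),
    (subdir, "http://www.example.com/forums/index.php?action=search"),
    (subdir, "http://www.example.com/forums/index.php?board=1.0"),
]

for parser, url in samples:
    verdict = "crawl" if parser.can_fetch("*", url) else "skip"
    print(f"{verdict:5}  {url}")
```

The Disallow line is a plain prefix match on the path plus query string, which is why the root version does nothing for a forum installed under /forums/.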
robots.txt goes in the document root of your virtual server. That means its URL should be /robots.txt. If you put it in a subdirectory, no bots will read it.
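To make the location rule concrete, here's a small sketch that works out where bots will look for robots.txt given any forum URL. The helper name robots_url is made up for this example; urljoin is from Python's standard library.

```python
from urllib.parse import urljoin


def robots_url(forum_url: str) -> str:
    """Return the robots.txt URL for the host serving forum_url.

    A leading slash in the second argument makes urljoin drop the
    forum's own path, so the result always sits at the document root.
    """
    return urljoin(forum_url, "/robots.txt")


print(robots_url("http://forums.example.com/"))
print(robots_url("http://www.example.com/forums/"))
```

Both setups end up with the file at the top of the host, never inside the /forums/ directory.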
The free Google sitemaps tool now has a robots.txt checking tool. It lets you validate your file, and even test whether specific URLs would be fetched by Googlebot based on your robots.txt. This is indispensable.
If you have additional robots.txt directives, please share in this thread and post the reasons behind them.