robots.txt - preCharge Forums
Old 07-22-2006   #1 (permalink)
Rob
 
Posts: n/a
robots.txt

Let's discuss robots.txt for SMF. I believe this is a crucial Search Engine Optimization topic.

If you don't know what robots.txt is, first read up on it. Nutshell version: you use this file to tell the bots what is okay to crawl and what is not. Compliance is strictly voluntary on the bots' part, so this in no way "protects" your pages from being viewed (there are other mechanisms that do).

Goal 1 should be keeping the bots from crawling, and so indexing, non-content pages. These include basically everything except the forum index, board indexes and topic pages. The great news is that SMF's URL structure makes this dead easy.

Here's my robots.txt:
Code:
User-agent: *
Disallow: /index.php?action=
In one shot, this knocks out all the non-content pages. It even works if you have search-engine friendly URLs on. In vBulletin, it takes a dozen and a half lines to do what this one does.
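
If you want to sanity-check the rule on your own machine, Python's standard-library urllib.robotparser can evaluate it. This is only a sketch: the host name and the sample URLs are made up for illustration, and Python's parser does simple prefix matching, which may not match every crawler's behaviour in every detail.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, for a forum at the site root.
RULES = """\
User-agent: *
Disallow: /index.php?action=
"""

def is_allowed(url, rules=RULES, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical URLs on an example host, for illustration only.
base = "http://forums.example.com"
for path in ("/index.php", "/index.php?topic=1.0", "/index.php?action=profile"):
    print(path, is_allowed(base + path))
```

The forum index and the topic page should come back as allowed, and the action URL as blocked. For a forum under /forums/, pass the adjusted rules text as the second argument.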

Note that if your forum root isn't /, you'll have to adjust accordingly. The version above works if you've set up a subdomain such as http://forums.example.com/ . But if you run the forum in a subdirectory, like http://www.example.com/forums/ , then you want something like

Code:
User-agent: *
Disallow: /forums/index.php?action=
robots.txt goes in the document root of your virtual server. That means its URL should be /robots.txt. If you put it in a subdirectory, no bots will read it.
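
To make the location rule concrete, here is a small sketch that works out which robots.txt a bot would look at for any given page. The function name robots_url is my own, and the URLs are just the example hosts used in this post.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL that applies to the host serving `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.example.com/forums/index.php?topic=1.0"))
print(robots_url("http://forums.example.com/index.php"))
```

Note that the /forums/ part of the path plays no role: the file is always looked up at the root of the host.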

The free Google Sitemaps service now has a robots.txt checking tool. It lets you validate your file, and even test whether specific URLs would be fetched by Googlebot based on your robots.txt. This is indispensable.

If you have additional robots.txt directives, please share in this thread and post the reasons behind them.


Old 07-22-2006   #2 (permalink)
Rob
 
Posts: n/a
Re: robots.txt

This board's robots.txt is 404! Bad call. Even if you don't want to restrict anything, create the 0-byte /robots.txt file. It will keep the error log from needlessly filling with 404s.
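
For anyone who would rather script it, here is a rough sketch of creating the empty file and confirming that an empty robots.txt restricts nothing, at least as Python's standard-library parser reads it. The helper name create_empty_robots is made up, and a temporary directory stands in for your real document root, whose path depends on your server setup.

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser

def create_empty_robots(docroot):
    """Create a 0-byte robots.txt in `docroot` if none exists; return its path."""
    path = Path(docroot) / "robots.txt"
    path.touch(exist_ok=True)  # leaves an existing file's contents alone
    return path

# A temporary directory stands in for the real document root here.
with tempfile.TemporaryDirectory() as docroot:
    robots = create_empty_robots(docroot)
    parser = RobotFileParser()
    parser.parse(robots.read_text().splitlines())
    allowed = parser.can_fetch("*", "http://example.com/index.php?action=profile")
    print(robots.stat().st_size, allowed)
```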

Old 07-22-2006   #3 (permalink)
tynana
Junior Member
 
Join Date: Jul 2006
Posts: 12
Re: robots.txt

Thanks so much for the info, much needed ;D
