Check out what I found while ego-surfing. (http://216.239.51.100/search?q=cache:E5hVWBCvE5EC:indie-rpgs.com/forum/+seth+blumberg&hl=en&ie=UTF-8)
Can we get a /robots.txt that excludes search engines from the Forums, please? I don't necessarily want a prospective employer to be looking at my Forge postings.
We most certainly can.
Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?
- Clinton
This is for Google, but it may be useful for other engines: http://www.google.com/webmasters/3.html#removed. It's a start.
Quote from: Clinton R Nixon
We most certainly can.
Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?
- Clinton
If I understand it rightly, you do something like this:
Quote
# robots.txt for http://www.yoursite.com
# This file is for restricting access to parts of the web server
# to all robots who use the Robot Exclusion Standard.
User-Agent: *
Disallow: /forum
(Edit: that character after "User-Agent:" should be an asterisk.)
Each "Disallow: /whatever" line names another directory to exclude. Then, you save this file as "robots.txt" in your root web directory.
That's how we do it at the web site I work for.
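For anyone who wants to check rules like these before relying on them, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.yoursite.com is the placeholder from the quoted file, and the helper name is_allowed is made up for illustration. Note that Disallow matches by path prefix, so "/forum" also covers any other path that merely starts with those characters.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above, with the asterisk in place.
ROBOTS_TXT = """\
# robots.txt for http://www.yoursite.com
User-Agent: *
Disallow: /forum
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `is_allowed` is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.yoursite.com/forum/viewforum.php?f=1",
        "http://www.yoursite.com/index.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(url, "->", verdict)
```

This only shows what a well-behaved robot would conclude from the file. It does not remove pages that a search engine has already cached.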
Got it - I've added a /robots.txt file, and added META tags that should prevent the pages from being indexed, as well.
I'll contact Google and ask them to remove www.indie-rpgs.com/forum from their cached files.
Quote from: Clinton R Nixon
Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?
Forgive my ignorance, but don't you just put this in the header:
<META NAME="robots" CONTENT="NOINDEX, NOFOLLOW">
A little bit of code in the source that the php pulls the header from should do the trick, right?
BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?
Fang Langford
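To confirm that a generated page really carries the META tag suggested above, something like the following sketch could be run against the page source. It uses only Python's standard html.parser module. The class name RobotsMetaParser, the function robots_directives, and the sample PAGE string are all made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower()
                for part in content.split(",")
                if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of lowercased robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page source containing the tag from the post above.
PAGE = (
    '<html><head><META NAME="robots" CONTENT="NOINDEX, NOFOLLOW">'
    "</head><body></body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```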
Quote from: Le Joueur
A little bit of code in the source that the php pulls the header from should do the trick, right?
BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?
Yup - it works great for them. I'm glad you like it.
What exactly is the rationale behind this? I mean, it's your site, you can do with it whatever you wish. But this forum is full of interesting material, and I think it's rather strange to ensure that people won't be able to find it.
It's so people's names can't be found while searching the Internet. Don't think we won't still be indexed - the Forge will be. However, the front page of the forums (which is all that would be indexed anyway, for reasons I can explain if you like, but are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it.
Many people would rather their names not pop up when people like prospective employers search the Internet. I can completely understand this, especially since I almost got fired from a job about 5 months ago because I said something on my personal journal that was disparaging to a co-worker.
Quote from: Clinton R Nixon
the front page of the forums (which is all that would be indexed anyway, for reasons I can explain if you like, but are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it
Ah, if that is the only thing which would be indexed, not much is lost. Does the 'technical and boring' reason have anything to do with the fact that the threads themselves aren't static HTML but rather database entries retrieved by your PHP scripts?
Victor,
It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.
If our forum system created URLs like: http://www.indie-rpgs.com/forum/site_discussion/2341, then the individual threads would be indexed.
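The difference between the two URL styles above can be made concrete with Python's standard urllib.parse module. This is only a sketch of how the URLs break down into "web page" and "arguments"; it says nothing about what any particular search engine actually does with them. The function name describe is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def describe(url: str) -> dict:
    """Split a URL into its path and its query-string arguments."""
    parts = urlsplit(url)
    return {
        "path": parts.path,             # the "web page" part
        "args": parse_qs(parts.query),  # the arguments after the "?"
    }


if __name__ == "__main__":
    for url in (
        "http://www.indie-rpgs.com/forum/viewforum.php?f=1",
        "http://www.indie-rpgs.com/forum/site_discussion/2341",
    ):
        print(describe(url))
```

In the first URL the forum number lives in the arguments, so dropping them leaves the same viewforum.php page for every forum. In the second style everything is part of the path, so each thread has a distinct address even with no arguments.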
This is strange, because to test my theory, I tried a Google search for my nickname at http://gathering.tweakers.net. It's a forum with URLs like "http://gathering.tweakers.net/showtopic.php/220386/1/100", without arguments. As I have over 7000 posts there, I should have found quite a lot. But I didn't find a single thing, nothing at all.
Might this have something to do with the fact that the last part of the url (the '/1/100' here) depends on options in the user's profile? (1/100 means: go to the first page, with 100 posts per page.)
Hm, maybe I'd better ask one of the DB admins on the Gathering of Tweakers itself. Never mind.
Quote
It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.
They've got this:
http://www.indie-rpgs.com/forum/viewforum.php?f=2
Paul