Google indexing the Forums?

Started by Seth L. Blumberg, July 26, 2002, 04:48:53 PM

Seth L. Blumberg

Check out what I found while ego-surfing.

Can we get a /robots.txt that excludes search engines from the Forums, please? I don't necessarily want a prospective employer to be looking at my Forge postings.
the gamer formerly known as Metal Fatigue

Clinton R. Nixon

We most certainly can.

Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?

- Clinton
Clinton R. Nixon
CRN Games

Zak Arntson

This is for Google, but it may be useful for other engines: http://www.google.com/webmasters/3.html#removed. It's a start.

Matt Snyder

Quote from: Clinton R Nixon
We most certainly can.

Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?

- Clinton

If I understand it rightly, you do something like this:

Quote
# robots.txt for http://www.yoursite.com
# This file is for restricting access to parts of the web server
# to all robots who use the Robot Exclusion Standard.

User-Agent: *
Disallow: /forum

(Edit: that character after "User-Agent:" should be an asterisk.)

Each "Disallow: /whatever" line excludes another directory. Then you save this file as "robots.txt" in your root web directory.

That's how we do it at the web site I work for.
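
As a rough way to sanity-check rules like the ones quoted above, here is a minimal sketch using Python's standard urllib.robotparser module. The function name is_allowed and the user agent "ExampleBot" are made up for illustration, and the rules are parsed from a string rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

# The two rule lines from the robots.txt quoted above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /forum
"""


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the rules above let user_agent fetch url.

    "ExampleBot" is a hypothetical crawler name; the "*" entry
    applies to any user agent that has no entry of its own.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.indie-rpgs.com/forum/viewforum.php?f=1",
        "http://www.indie-rpgs.com/",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

With these rules, anything whose path starts with /forum is reported as blocked, while the site's front page stays allowed. This only shows what a well-behaved robot should do; robots.txt is advisory and nothing forces a crawler to obey it.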
Matt Snyder
www.chimera.info

"The future ain't what it used to be."
--Yogi Berra

Clinton R. Nixon

Got it - I've added a /robots.txt file, and added META tags that should prevent the pages from being indexed, as well.

I'll contact Google and ask them to remove www.indie-rpgs.com/forum from their cached files.
Clinton R. Nixon
CRN Games

Le Joueur

Quote from: Clinton R Nixon
Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?
Forgive my ignorance, but don't you just put this in the header:

<META NAME="robots" CONTENT="NOINDEX, NOFOLLOW">
A little bit of code in the source that the php pulls the header from should do the trick, right?
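
To confirm that a generated page actually carries the tag, something like the following sketch could be used. It relies only on Python's standard html.parser module; the class name RobotsMetaFinder and the function robots_directives are invented for this example, and the sample page is a stand-in, not the forum's real header.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


if __name__ == "__main__":
    # Hypothetical page header, for illustration only.
    page = (
        "<html><head>"
        '<META NAME="robots" CONTENT="NOINDEX, NOFOLLOW">'
        "</head><body></body></html>"
    )
    print(sorted(robots_directives(page)))
```

For the sample page this finds the directives noindex and nofollow. A page with no robots META tag yields an empty set.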

BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?

Fang Langford
Fang Langford is the creator of Scattershot presents: Universe 6 - The World of the Modern Fantastic.  Please stop by and help!

Clinton R. Nixon

Quote from: Le Joueur
A little bit of code in the source that the php pulls the header from should do the trick, right?

BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?

Yup - it works great for them. I'm glad you like it.
Clinton R. Nixon
CRN Games

Victor Gijsbers

What exactly is the rationale behind this? I mean, it's your site, you can do with it whatever you wish. But this forum is full of interesting material, and I think it's rather strange to ensure that people won't be able to find it.

Clinton R. Nixon

It's so people's names can't be found while searching the Internet. Don't think we won't still be indexed - the Forge will be. However, the front page of the forums (which is all that would be indexed anyway, for reasons I can explain if you like, but are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it.

Many people would rather their names not pop up when people like prospective employers search the Internet. I can completely understand this, especially since I almost got fired from a job about 5 months ago because I said something on my personal journal that was disparaging to a co-worker.
Clinton R. Nixon
CRN Games

Victor Gijsbers

Quote from: Clinton R Nixon
the front page of the forums (which is all that would be indexed anyway, for reasons I can explain if you like, but are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it

Ah, if that is the only thing which would be indexed, not much is lost. Does the 'technical and boring' reason have anything to do with the fact that the threads themselves aren't static HTML but rather database entries retrieved by your PHP scripts?

Clinton R. Nixon

Victor,

It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.

If our forum system created URLs like http://www.indie-rpgs.com/forum/site_discussion/2341, then the individual threads would be indexed.
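
The difference between the two URL styles can be seen by splitting them apart. This is a small sketch with Python's standard urllib.parse module; the function name describe is made up, and the code only shows how the URLs are structured. It says nothing about how any particular search engine treats them.

```python
from urllib.parse import urlsplit, parse_qs


def describe(url):
    """Split a URL into the page path and its query-string arguments."""
    parts = urlsplit(url)
    return {"page": parts.path, "arguments": parse_qs(parts.query)}


if __name__ == "__main__":
    # Page plus arguments, as the forum uses.
    print(describe("http://www.indie-rpgs.com/forum/viewforum.php?f=1"))
    # Path-only style, with no arguments at all.
    print(describe("http://www.indie-rpgs.com/forum/site_discussion/2341"))
```

For the first URL the page is /forum/viewforum.php and the arguments contain f with the value 1. For the second, the whole identifier is part of the path and the arguments are empty.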
Clinton R. Nixon
CRN Games

Victor Gijsbers

This is strange, because to test my theory, I tried a Google search for my nickname at http://gathering.tweakers.net. It's a forum with URLs like "http://gathering.tweakers.net/showtopic.php/220386/1/100", without arguments. As I have over 7000 posts there, I should have found quite a lot. But I didn't find a single thing, nothing at all.

Might this have something to do with the fact that the last part of the URL (the '/1/100' here) depends on options in the user's profile? (1/100 means: go to the first page, with 100 posts per page.)

Hm, maybe I'd better ask one of the DB admins on the Gathering of Tweakers itself. Never mind.

Paul Czege

Quote from: Clinton R Nixon
It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.

They've got this:

http://www.indie-rpgs.com/forum/viewforum.php?f=2 (link: http://216.239.33.100/search?q=cache:UuPbN3uI2sgC:www.indie-rpgs.com/forum/viewforum.php%3Ff%3D2+%22paul+czege%22&hl=en&ie=UTF-8)

Paul
My Life with Master knows codependence.
And if you're doing anything with your Acts of Evil ashcan license, of course I'm curious and would love to hear about your plans