The Forge Archives

General Forge Forums => Site Discussion => Topic started by: Seth L. Blumberg on July 26, 2002, 04:48:53 PM

Title: Google indexing the Forums?
Post by: Seth L. Blumberg on July 26, 2002, 04:48:53 PM
Check out what I found while ego-surfing. (http://216.239.51.100/search?q=cache:E5hVWBCvE5EC:indie-rpgs.com/forum/+seth+blumberg&hl=en&ie=UTF-8)

Can we get a /robots.txt that excludes search engines from the Forums, please? I don't necessarily want a prospective employer to be looking at my Forge postings.
Title: Google indexing the Forums?
Post by: Clinton R. Nixon on July 26, 2002, 04:51:01 PM
We most certainly can.

Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?

- Clinton
Title: Google indexing the Forums?
Post by: Zak Arntson on July 26, 2002, 04:57:30 PM
This is for Google, but it may be useful for other engines: http://www.google.com/webmasters/3.html#removed. It's a start.
Title: Google indexing the Forums?
Post by: Matt Snyder on July 26, 2002, 04:59:52 PM
Quote from: Clinton R Nixon
We most certainly can.

Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?

- Clinton

If I understand it rightly, you do something like this:

Quote
# robots.txt for http://www.yoursite.com
# This file is for restricting access to parts of the web server
# to all robots who use the Robot Exclusion Standard.

User-Agent: *
Disallow: /forum

(Edit: that character after "User-Agent:" should be an asterisk.)

Each "Disallow: /whatever" line names another directory to exclude. Then, you save this file as "robots.txt" in your root web directory.

That's how we do it at the web site I work for.
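
A minimal sketch for checking rules like the ones quoted above before relying on them, assuming Python and its standard urllib.robotparser module. The helper name is_allowed and the user agent "ExampleBot" are made up for illustration, and this only shows how a crawler that honors the Robot Exclusion Standard would read the file, not what any particular search engine actually does.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /forum
"""


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if a crawler honoring ROBOTS_TXT may fetch the URL.

    "ExampleBot" is a hypothetical crawler name; the wildcard rule
    applies to every user agent anyway.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.indie-rpgs.com/forum/viewforum.php?f=1",
        "http://www.indie-rpgs.com/",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Note that the rule is a plain prefix match on the path, so anything whose path begins with "/forum" is excluded, while the rest of the site stays open to crawlers.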
Title: Google indexing the Forums?
Post by: Clinton R. Nixon on July 26, 2002, 05:08:31 PM
Got it - I've added a /robots.txt file, and added META tags that should prevent the pages from being indexed, as well.

I'll contact Google and ask them to remove www.indie-rpgs.com/forum from their cached files.
Title: I'm Not a Web Designer (Nor Do I Play One on TV)
Post by: Le Joueur on July 26, 2002, 05:16:53 PM
Quote from: Clinton R Nixon
Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?
Forgive my ignorance, but don't you just put this in the header:

<META NAME="robots" CONTENT="NOINDEX, NOFOLLOW">
A little bit of code in the source that the php pulls the header from should do the trick, right?
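
A rough sketch of how one might confirm that a generated page really carries that tag, assuming Python and its standard html.parser module. The class RobotsMetaFinder, the function robots_directives, and the sample page are all hypothetical, made up for this example; they are not part of the forum software.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the HTML text."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# A made-up page using the tag from the post above.
SAMPLE_PAGE = """<html><head>
<META NAME="robots" CONTENT="NOINDEX, NOFOLLOW">
<title>Hypothetical forum page</title>
</head><body></body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

An empty result would mean the header template is not emitting the tag on that page.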

BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?

Fang Langford
Title: Re: I'm Not a Web Designer (Nor Do I Play One on TV)
Post by: Clinton R. Nixon on July 26, 2002, 05:22:57 PM
Quote from: Le Joueur
A little bit of code in the source that the php pulls the header from should do the trick, right?

BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?

Yup - it works great for them. I'm glad you like it.
Title: Google indexing the Forums?
Post by: Victor Gijsbers on July 26, 2002, 10:34:32 PM
What exactly is the rationale behind this? I mean, it's your site, you can do with it whatever you wish. But this forum is full of interesting material, and I think it's rather strange to ensure that people won't be able to find it.
Title: Google indexing the Forums?
Post by: Clinton R. Nixon on July 26, 2002, 10:40:10 PM
It's so people's names can't be found while searching the Internet. Don't think we won't still be indexed - the Forge will be. However, the front page of the forums (which is all that would be indexed anyway, for reasons that I can explain if you like, but that are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it.

Many people would rather their names not pop up when people like prospective employers search the Internet. I can completely understand this, especially since I almost got fired from a job about 5 months ago because I said something on my personal journal that was disparaging to a co-worker.
Title: Google indexing the Forums?
Post by: Victor Gijsbers on July 27, 2002, 12:49:26 PM
Quote from: Clinton R Nixon
the front page of the forums (which is all that would be indexed anyway, for reasons I can explain if you like, but are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it

Ah, if that is the only thing which would be indexed, not much is lost. Does the 'technical and boring' reason have anything to do with the fact that the threads themselves aren't static html but rather database entries retrieved by your php-scripts?
Title: Google indexing the Forums?
Post by: Clinton R. Nixon on July 27, 2002, 04:34:15 PM
Victor,

It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.

If our forum system created URLs like: http://www.indie-rpgs.com/forum/site_discussion/2341, then the individual threads would be indexed.
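
To make the "web page + arguments" distinction concrete, here is a small sketch assuming Python and its standard urllib.parse module. The function name describe is made up for illustration. It only shows how the two URL styles break apart and says nothing about what any search engine actually does with them.

```python
from urllib.parse import parse_qs, urlsplit


def describe(url):
    """Split a URL into its page path and its query-string arguments."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    for url in (
        "http://www.indie-rpgs.com/forum/viewforum.php?f=1",
        "http://www.indie-rpgs.com/forum/site_discussion/2341",
    ):
        path, arguments = describe(url)
        print(path, arguments)
```

The first URL splits into the page /forum/viewforum.php plus the argument f=1; the second has no arguments at all, so everything that identifies the thread lives in the path.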
Title: Google indexing the Forums?
Post by: Victor Gijsbers on July 27, 2002, 11:01:10 PM
This is strange, because to test my theory, I tried a Google search on my nickname at http://gathering.tweakers.net. It's a forum with URLs like "http://gathering.tweakers.net/showtopic.php/220386/1/100", without arguments. As I have over 7000 posts there, I should have found quite a lot. But I didn't find a single thing, nothing at all.

Might this have something to do with the fact that the last part of the url (the '/1/100' here) depends on options in the user's profile? (1/100 means: go to the first page, with 100 posts per page.)

Hm, maybe I'd better ask one of the DB admins at the Gathering of Tweakers itself about this. Never mind.
Title: Google indexing the Forums?
Post by: Paul Czege on July 29, 2002, 06:07:43 AM
Quote from: Clinton R Nixon
It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.

They've got this:

http://www.indie-rpgs.com/forum/viewforum.php?f=2

Paul