Author Topic: Google indexing the Forums?  (Read 5377 times)
Seth L. Blumberg
Member

Posts: 303


« on: July 26, 2002, 07:48:53 AM »

Check out what I found while ego-surfing.

Can we get a /robots.txt that excludes search engines from the Forums, please? I don't necessarily want a prospective employer to be looking at my Forge postings.
Logged

the gamer formerly known as Metal Fatigue
Clinton R. Nixon
Member

Posts: 2624


WWW
« Reply #1 on: July 26, 2002, 07:51:01 AM »

We most certainly can.

Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?

- Clinton
Logged

Clinton R. Nixon
CRN Games
Zak Arntson
Member

Posts: 839


WWW
« Reply #2 on: July 26, 2002, 07:57:30 AM »

This is for Google, but it may be useful for other engines: http://www.google.com/webmasters/3.html#removed. It's a start.
Logged

Matt Snyder
Member

Posts: 1380


WWW
« Reply #3 on: July 26, 2002, 07:59:52 AM »

Quote from: Clinton R Nixon
We most certainly can.

Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?

- Clinton


If I understand it rightly, you do something like this:

Quote

# robots.txt for http://www.yoursite.com
# This file is for restricting access to parts of the web server
# to all robots who use the Robot Exclusion Standard.

User-Agent: *
Disallow: /forum


(Edit: that character after "User-Agent:" should be an asterisk.)

Each disallow "/whatever" is another directory to exclude. Then, you save this file as "robots.txt" in your root web directory.

That's how we do it at the web site I work for.
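
If you want to sanity-check rules like these before uploading the file, Python's standard `urllib.robotparser` can read them. This is only a rough sketch: the rules are the ones quoted above, while the `www.yoursite.com` URLs, the `ExampleBot` agent name, and the `is_allowed` helper are placeholders made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the quoted robots.txt; the site itself is a placeholder.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /forum
"""


def is_allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.yoursite.com/forum/viewforum.php?f=1",
        "http://www.yoursite.com/index.html",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Note that a Disallow line is a prefix match, so "/forum" covers everything whose path starts with those characters, not just the one directory.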
Logged

Matt Snyder
www.chimera.info

"The future ain't what it used to be."
--Yogi Berra
Clinton R. Nixon
Member

Posts: 2624


WWW
« Reply #4 on: July 26, 2002, 08:08:31 AM »

Got it - I've added a /robots.txt file, and added META tags that should prevent the pages from being indexed, as well.

I'll contact Google and ask them to remove www.indie-rpgs.com/forum from their cached files.
Logged

Clinton R. Nixon
CRN Games
Le Joueur
Member

Posts: 1367


WWW
« Reply #5 on: July 26, 2002, 08:16:53 AM »

Quote from: Clinton R Nixon
Now, this is a funny question coming from the technical proficiency guy, but: how exactly do we do that?

Forgive my ignorance, but don't you just put this in the header:

Code:
<META NAME="robots" CONTENT="NOINDEX, NOFOLLOW">

A little bit of code in the source that the php pulls the header from should do the trick, right?
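
To confirm a generated page really carries the tag, something like the following could be run against the page source. It is a sketch using Python's standard `html.parser`; the `RobotsMetaParser` class, the `robots_directives` helper, and the sample page are invented for the example and are not part of the forum software.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "NOINDEX, NOFOLLOW" into normalized directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page header containing the tag from the post above.
    page = '<html><head><META NAME="robots" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
    print(sorted(robots_directives(page)))
```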

BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?

Fang Langford
Logged

Fang Langford is the creator of Scattershot presents: Universe 6 - The World of the Modern Fantastic.  Please stop by and help!
Clinton R. Nixon
Member

Posts: 2624


WWW
« Reply #6 on: July 26, 2002, 08:22:57 AM »

Quote from: Le Joueur

A little bit of code in the source that the php pulls the header from should do the trick, right?

BTW, thanks for pointing me at pMachine; have you switched the reviews to it recently?


Yup - it works great for them. I'm glad you like it.
Logged

Clinton R. Nixon
CRN Games
Victor Gijsbers
Acts of Evil Playtesters
Member

Posts: 390


WWW
« Reply #7 on: July 26, 2002, 01:34:32 PM »

What exactly is the rationale behind this? I mean, it's your site, you can do with it whatever you wish. But this forum is full of interesting material, and I think it's rather strange to ensure that people won't be able to find it.
Logged

Clinton R. Nixon
Member

Posts: 2624


WWW
« Reply #8 on: July 26, 2002, 01:40:10 PM »

It's so people's names can't be found while searching the Internet. Don't think we won't still be indexed - the Forge will be. However, the front page of the forums (which is all that would be indexed anyway, for reasons I can explain if you like, but are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it.

Many people would rather their names not pop up when people like prospective employers search the Internet. I can completely understand this, especially since I almost got fired from a job about 5 months ago because I said something on my personal journal that was disparaging to a co-worker.
Logged

Clinton R. Nixon
CRN Games
Victor Gijsbers
Acts of Evil Playtesters
Member

Posts: 390


WWW
« Reply #9 on: July 27, 2002, 03:49:26 AM »

Quote from: Clinton R Nixon
the front page of the forums (which is all that would be indexed anyway, for reasons I can explain if you like, but are technical and boring) doesn't convey much information, and might have my, Seth's, your, or anyone's name on it


Ah, if that is the only thing which would be indexed, not much is lost. Does the 'technical and boring' reason have anything to do with the fact that the threads themselves aren't static html but rather database entries retrieved by your php-scripts?
Logged

Clinton R. Nixon
Member

Posts: 2624


WWW
« Reply #10 on: July 27, 2002, 07:34:15 AM »

Victor,

It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.

If our forum system created URLs like: http://www.indie-rpgs.com/forum/site_discussion/2341, then the individual threads would be indexed.
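
The "web page + arguments" split can be seen with Python's standard `urllib.parse`. This sketch only shows how the two URL styles break apart; it makes no claim about what any search engine actually does with them, and `describe` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def describe(url: str) -> dict:
    """Split a URL into the page path and the arguments sent to it."""
    parts = urlsplit(url)
    return {"page": parts.path, "arguments": parse_qs(parts.query)}


if __name__ == "__main__":
    for url in (
        "http://www.indie-rpgs.com/forum/viewforum.php?f=1",
        "http://www.indie-rpgs.com/forum/site_discussion/2341",
    ):
        print(describe(url))
```

The first URL yields the page /forum/viewforum.php with the argument f=1; the second has no arguments at all, since everything is part of the path.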
Logged

Clinton R. Nixon
CRN Games
Victor Gijsbers
Acts of Evil Playtesters
Member

Posts: 390


WWW
« Reply #11 on: July 27, 2002, 02:01:10 PM »

This is strange, because to test my theory, I tried a Google search on my nickname at http://gathering.tweakers.net. It's a forum with URLs like "http://gathering.tweakers.net/showtopic.php/220386/1/100", without arguments. As I have over 7000 posts there, I should have found quite a lot. But I didn't find a single thing, nothing at all.

Might this have something to do with the fact that the last part of the url (the '/1/100' here) depends on options in the user's profile? (1/100 means: go to the first page, with 100 posts per page.)

Hm, maybe I'd better ask one of the DB admins on the Gathering of Tweakers itself about this. Never mind.
Logged

Paul Czege
Acts of Evil Playtesters
Member

Posts: 2341


WWW
« Reply #12 on: July 28, 2002, 09:07:43 PM »

Quote from: Clinton R Nixon
It's because you access forums and threads with URLs like http://www.indie-rpgs.com/forum/viewforum.php?f=1, which are composed of a web page + arguments sent to that web page. Google, and other indexes, only grab the web page without any arguments.

They've got this:

http://www.indie-rpgs.com/forum/viewforum.php?f=2 (Google cache: http://216.239.33.100/search?q=cache:UuPbN3uI2sgC:www.indie-rpgs.com/forum/viewforum.php%3Ff%3D2+%22paul+czege%22&hl=en&ie=UTF-8)

Paul
Logged

My Life with Master knows codependence.
And if you're doing anything with your Acts of Evil ashcan license, of course I'm curious and would love to hear about your plans