Session ID

eosguy

Joined: 2005-07-27
Posts: 38
Posted: Sat, 2005-09-03 16:19

Does G2 use session IDs for guests? I'm trying to do some SEO on my site, and session IDs for guests don't help search engines at all. Any tips for SEO with G2?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sat, 2005-09-03 16:33

yes it has session ids for guests. but we handle search engine spiders differently.

 
eosguy

Joined: 2005-07-27
Posts: 38
Posted: Sat, 2005-09-03 17:22

So what is the best way for me to get Google and other spiders to crawl my website properly? As it is now, I've removed session IDs from my phpBB installation, and I'm trying to make my other sites URL-friendly in order to get a higher page ranking. :)

Thanks for replying.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 02:19

G2 already does special handling for search engines.
if it's a search engine, it doesn't add the session ID to URLs, and similar stuff.

this code is used to detect search engines (and it works):

    /**
     * Return the id of the search engine currently crawling the site by
     * analyzing the current request.
     *
     * @return string the crawler id, or null if it's a regular user
     */
    function identifySearchEngine() {
        if (!isset($_SERVER['HTTP_USER_AGENT'])) {
            return null;
        }
        $userAgent = $_SERVER['HTTP_USER_AGENT'];
        if (strstr($userAgent, 'Google')) {
            return 'google';
        } else if (strstr($userAgent, 'Yahoo')) {
            return 'yahoo';
        } else if (strstr($userAgent, 'Ask Jeeves')) {
            return 'askjeeves';
        } else if (strstr($userAgent, 'msnbot')) {
            return 'microsoft';
        }

        return null;
    }
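
To make the "doesn't add the session to URLs" part concrete, here is a minimal sketch of how a detected crawler id could be used when building links. It is not G2's actual URL generator: `crawlerIdFromUserAgent()`, `urlForVisitor()` and the `sid` parameter name are all hypothetical, made up for illustration. Only the four substring checks come from the function above.

```php
<?php
// Hypothetical helper (not part of G2): the same substring checks as
// identifySearchEngine(), applied to an explicit user-agent string.
function crawlerIdFromUserAgent($userAgent) {
    $needles = array(
        'Google'     => 'google',
        'Yahoo'      => 'yahoo',
        'Ask Jeeves' => 'askjeeves',
        'msnbot'     => 'microsoft',
    );
    foreach ($needles as $needle => $crawlerId) {
        if (strpos($userAgent, $needle) !== false) {
            return $crawlerId;
        }
    }
    return null;
}

// Hypothetical helper: append a session parameter only for regular visitors.
// The parameter name "sid" is an assumption, not G2's real parameter.
function urlForVisitor($url, $sessionId, $userAgent) {
    if (crawlerIdFromUserAgent($userAgent) !== null) {
        return $url; // detected crawlers get the clean URL
    }
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'sid=' . urlencode($sessionId);
}
```

With this sketch, a request whose user agent contains "Googlebot" gets the URL back unchanged, while an ordinary browser gets the session parameter appended.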
 
djmaze

Joined: 2005-08-28
Posts: 17
Posted: Sun, 2005-09-04 02:28

This isn't bulletproof: there are already numerous agents that "fake" Google, so the above code isn't perfect.

Examples?

x.x.x.x: Mozilla/4.0 (compatible; Google Desktop)

69.93.41.x: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
http://ws.arin.net/cgi-bin/whois.pl?queryinput=69.93.41

Anyway, HTTP_USER_AGENT can be manipulated by anyone and anything (think about the Opera browser).

Secondly, there are thousands of bots, not just 4 ;)

I know it's a pain if you want to log where someone goes.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 03:01

well, should we care?
if someone decides to be identified as google, let him do that...

if you have suggestions / a patch, please let us know.

 
djmaze

Joined: 2005-08-28
Posts: 17
Posted: Sun, 2005-09-04 13:07

Identify a bot by its IP. Here's a list of our findings:
http://dragonflycms.org/cvs/html/includes/classes/cpg_member.php?v=9.26#188
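
A static IP list is one way to do this. A different technique, swapped in here only as a sketch, is a forward-confirmed reverse DNS check: look up the hostname for the client IP, check that it ends in a domain the search engine uses, then resolve that hostname back and confirm it returns the same IP. The function name `isVerifiedCrawlerIp()` and the suffix list are hypothetical; the suffixes you pass in are an assumption you would need to confirm against each search engine's own documentation. The two resolver parameters exist only so the check can be exercised without network access.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check for a client IP.
// $allowedSuffixes (e.g. array('.googlebot.com')) is supplied by the caller
// and is an assumption to verify per search engine. IPv4 only, because
// gethostbyname() resolves IPv4 addresses.
function isVerifiedCrawlerIp($ip, $allowedSuffixes,
                             $reverse = 'gethostbyaddr',
                             $forward = 'gethostbyname') {
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns false on malformed input and the unmodified
    // IP when the lookup fails.
    if ($host === false || $host === $ip) {
        return false;
    }

    $suffixMatches = false;
    foreach ($allowedSuffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffixMatches = true;
            break;
        }
    }
    if (!$suffixMatches) {
        return false;
    }

    // gethostbyname() returns the unmodified hostname on failure, which
    // can never equal the IP, so a failed lookup is rejected here too.
    return call_user_func($forward, $host) === $ip;
}
```

The trade-off is that no list of addresses has to be maintained, but every unverified visitor costs two DNS lookups, so the result would need to be cached per IP in practice.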

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 14:21

and who should maintain such a list? IPs will change sooner or later.

 
eosguy

Joined: 2005-07-27
Posts: 38
Posted: Sun, 2005-09-04 17:11

Interesting... I'm using this tool that says it'll spider like google: http://www.gritechnologies.com/tools/spider.go - you can enter your URL over there and let it spider your site. I get the session IDs in the URLs. :(

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 17:30

just take a look at the URLs on this http://www.gritechnologies.com/tools/spider.go page. obviously they don't imitate google well enough, because G2 doesn't detect it as google.
take a look at your g2data/sessions/ directory; there you'll see that google session ids are of the pattern: google662496410 ("google" . random number).
and yes, our google detection code works. the shortcoming is in this "pseudo google spider" tool.
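
If you want to check the sessions directory without eyeballing it, a rough sketch like the following tallies session file names by crawler prefix. `tallySessionsByCrawler()` is a hypothetical helper, not part of G2. It assumes the other crawlers follow the same "id followed by digits" naming as the google example, and it counts everything else as a regular visitor.

```php
<?php
// Hypothetical helper (not part of G2): count session file names per
// crawler prefix. Assumes crawler sessions are named "<crawler id><digits>",
// as in the google662496410 example; anything else counts as "regular".
function tallySessionsByCrawler($fileNames) {
    $prefixes = array('google', 'yahoo', 'askjeeves', 'microsoft');
    $counts = array('regular' => 0);
    foreach ($fileNames as $name) {
        if ($name === '' || $name[0] === '.') {
            continue; // skip ".", ".." and hidden files
        }
        $label = 'regular';
        foreach ($prefixes as $prefix) {
            if (preg_match('/^' . $prefix . '\d+$/', $name)) {
                $label = $prefix;
                break;
            }
        }
        $counts[$label] = isset($counts[$label]) ? $counts[$label] + 1 : 1;
    }
    return $counts;
}
```

You could feed it the directory listing, for example `print_r(tallySessionsByCrawler(scandir('g2data/sessions')));`, adjusting the path to wherever your g2data directory lives.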

 
eosguy

Joined: 2005-07-27
Posts: 38
Posted: Sun, 2005-09-04 18:27

Google hasn't searched my site yet, if the files in the sessions folder are any indication. I only have files with random numbers and with 'microsoft' in front. If I want to submit a sitemap to Google, should I then just make sure that I'm logged in to Gallery and have the sitemap done?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 18:32

no, just submit your normal web url to google. no special steps are needed.
google will only index what a guest user sees. anything else would a) not make sense and b) be a potential security risk.

 
toastmaster

Joined: 2003-05-01
Posts: 219
Posted: Fri, 2006-06-16 12:12

Hmmm, it ain't working! I'm seeing lots of spidered links like:
http://www.digitaltoast.co.uk/?PHPSESSID=78cc025f308fa1b7d7703ad19a52886c

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sat, 2006-06-17 03:49

that's not G2. that's either WordPress or your PHP configuration.

please post a phpinfo link.

 
ichthyous

Joined: 2006-06-16
Posts: 301
Posted: Mon, 2006-10-16 15:42

I used Xenu to check my gallery site this weekend and I was shocked when all of these session ID pages showed up...basically, every page in the site had three pages. However, Google has been busy indexing my site like crazy and not a single session ID page has shown up in the index yet, so I would be inclined to agree with Valiant that this system works well. Maintaining a database of real IP addresses associated with each bot is both very time-consuming and inaccurate, as they change all the time. I personally don't think Gallery needs it either, as it's enough to detect the major spiders and make sure they don't get session ID pages. I might add a few other smaller bots, but with those four you already have around 95% of search covered. You can use Poodle Predictor to check your site and see how the bots will spider your pages.