Does G2 use session IDs for guests? I'm trying to do some SEO on my site, and session IDs for guests don't help search engines at all. Any tips for SEO with G2?
yes it has session ids for guests. but we handle search engine spiders differently.
eosguy
Joined: 2005-07-27
Posts: 38
Posted: Sat, 2005-09-03 17:22
So what is the best way for me to get Google and other spiders to search my website properly? As it is now, I've removed session IDs from my phpBB installation, trying to make other sites I have URL friendly in order to get higher page ranking.
Thanks for replying.
valiant
Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 02:19
G2 already does special handling for search engines.
if it's a search engine, it doesn't add the session to URLs and similar stuff.
this code is used to detect search engines (and it works):
/**
 * Return the id of the search engine currently crawling the site by
 * analyzing the current request.
 *
 * @return string the crawler id, or null if it's a regular user
 */
function identifySearchEngine() {
    if (!isset($_SERVER['HTTP_USER_AGENT'])) {
        return null;
    }
    $userAgent = $_SERVER['HTTP_USER_AGENT'];
    if (strstr($userAgent, 'Google')) {
        return 'google';
    } else if (strstr($userAgent, 'Yahoo')) {
        return 'yahoo';
    } else if (strstr($userAgent, 'Ask Jeeves')) {
        return 'askjeeves';
    } else if (strstr($userAgent, 'msnbot')) {
        return 'microsoft';
    }
    return null;
}
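A minimal sketch of how the returned crawler id might be used when building links, so that crawlers get clean URLs. The helper name appendSessionParam() and the 'sid' parameter name are assumptions made up for illustration; this is not G2's actual URL generator.

```php
<?php
// Hypothetical helper, not part of G2: append the session id to a URL
// only when the visitor was not identified as a search engine crawler.
function appendSessionParam($url, $sessionId, $crawlerId) {
    if ($crawlerId !== null) {
        // Crawler: leave the URL clean so each page is indexed under one address.
        return $url;
    }
    // 'sid' is a placeholder parameter name chosen for this sketch.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'sid=' . urlencode($sessionId);
}

// In real use $crawlerId would come from identifySearchEngine().
echo appendSessionParam('main.php?itemId=15', 'abc123', null), "\n";
echo appendSessionParam('main.php?itemId=15', 'abc123', 'google'), "\n";
```

The first call is the regular guest case, the second the crawler case.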
djmaze
Joined: 2005-08-28
Posts: 17
Posted: Sun, 2005-09-04 02:28
This isn't bulletproof, as there are already numerous agents that "fake" google, which makes the above code imperfect.
eosguy
Joined: 2005-07-27
Posts: 38
Posted: Sun, 2005-09-04 17:11
Interesting... I'm using this tool that says it'll spider like google: http://www.gritechnologies.com/tools/spider.go - you can enter your URL over there and let it spider your site. I get the session IDs in the URLs.
valiant
Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 17:30
just take a look at the URLs on this http://www.gritechnologies.com/tools/spider.go page. obviously they don't imitate google well enough, because G2 doesn't detect it as google.
take a look at your g2data/sessions/ directory, there you'll see that google session ids are of the pattern: google662496410 ("google" . random number).
and yes, our google detection code works. the shortcoming is in this "pseudo google spider" tool.
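To check which crawlers have visited without eyeballing the directory, the file names can be grouped by prefix. This is only a sketch: it assumes session file names start with the crawler id as described above, and the function name countSessionsByCrawler() and the sample names other than google662496410 are made up.

```php
<?php
// Count session file names by crawler prefix. The prefixes are the ids
// returned by identifySearchEngine(); everything else is counted as 'other'.
function countSessionsByCrawler(array $fileNames) {
    $prefixes = array('google', 'yahoo', 'askjeeves', 'microsoft');
    $counts = array_fill_keys($prefixes, 0);
    $counts['other'] = 0;
    foreach ($fileNames as $name) {
        $matched = false;
        foreach ($prefixes as $prefix) {
            if (strpos($name, $prefix) === 0) {
                $counts[$prefix]++;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $counts['other']++;
        }
    }
    return $counts;
}

// Example with made-up file names; on a real install the list could come
// from scandir() on the g2data/sessions/ directory.
print_r(countSessionsByCrawler(array('google662496410', 'microsoft12345', '8f3a9c')));
```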
eosguy
Joined: 2005-07-27
Posts: 38
Posted: Sun, 2005-09-04 18:27
Google hasn't searched my site yet, if the files in the sessions folder are any indication. I only have files with random numbers and with 'microsoft' in front. If I want to submit a sitemap to Google, should I then just make sure that I'm logged in to Gallery and have the sitemap done?
valiant
Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-09-04 18:32
no, just submit your normal web url to google. no special steps are needed.
google will only index what a guest user sees. anything else would a) not make sense and b) be a potential security risk.
ichthyous
Joined: 2006-06-16
Posts: 301
Posted: Mon, 2006-10-16 15:42
I used Xenu to check my gallery site this weekend and I was shocked when all of these session ID pages showed up...basically, every page in the site appeared as three separate pages. However, Google has been busy indexing my site like crazy and not a single session ID page has shown up in the index yet, so I would be inclined to agree with Valiant that this system works well. Maintaining a database of real IP addresses associated with each bot is both very time-consuming and inaccurate, as they change all the time. I personally don't think gallery needs it either, as it's enough to detect the major spiders and make sure they don't get session ID pages. I might add a few other smaller bots, but with those four you already have around 95% of search covered. You can use Poodle Predictor to check your site to see how the bots will spider your pages.
Posts: 17
This isn't bulletproof, as there are already numerous agents that "fake" google, which makes the above code imperfect.
Examples?
x.x.x.x: Mozilla/4.0 (compatible; Google Desktop)
69.93.41.x: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
http://ws.arin.net/cgi-bin/whois.pl?queryinput=69.93.41
Anyway, HTTP_USER_AGENT can be manipulated by anyone and anything (think about the Opera browser).
Secondly, there are 1000s of bots, not just 4 ;)
I know it's a pain if you want to log where someone goes.
Posts: 32509
well, should we care?
if someone decides to be identified as google, let him do that...
if you have suggestions / a patch, please let us know.
Posts: 17
Identify a bot by its IP. Here's a list of our findings:
http://dragonflycms.org/cvs/html/includes/classes/cpg_member.php?v=9.26#188
Posts: 32509
and who should maintain such a list? IPs will change sooner or later.
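One alternative that avoids maintaining an IP list is a reverse DNS check with a confirming forward lookup. The sketch below is not something G2 does; the function name isVerifiedGooglebot(), the accepted domain suffixes, and the sample host and IP are assumptions for illustration.

```php
<?php
// Verify a claimed Googlebot by reverse DNS plus a confirming forward lookup.
// The two resolver arguments default to PHP's own DNS functions and exist
// only so the logic can be exercised without network access.
function isVerifiedGooglebot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbyname') {
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns the unmodified IP on failure (false on malformed input).
    if ($host === false || $host === $ip) {
        return false;
    }
    // The host must end in one of the assumed crawler domains.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    // The forward lookup must resolve back to the original address.
    return call_user_func($forward, $host) === $ip;
}

// Offline demonstration with stub resolvers and illustrative values.
$fakeReverse = function ($ip) { return 'crawl-66-249-66-1.googlebot.com'; };
$fakeForward = function ($host) { return '66.249.66.1'; };
var_dump(isVerifiedGooglebot('66.249.66.1', $fakeReverse, $fakeForward));
```

A faked user agent coming from an unrelated host fails the domain check, and the two DNS lookups cost time on every unverified request, so the result would normally be cached.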
Posts: 219
Hmmm, it ain't working! I'm seeing lots of spidered links like:
http://www.digitaltoast.co.uk/?PHPSESSID=78cc025f308fa1b7d7703ad19a52886c
Posts: 32509
that's not G2. that's either Wordpress or your php configuration.
please post a phpinfo link.
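PHPSESSID is PHP's default session name, and PHP itself can rewrite it into links when session.use_trans_sid is enabled. A small sketch for checking the relevant settings; the function name sessionUrlSettings() is made up, and what the values are on any particular host is not known here.

```php
<?php
// Report the PHP session settings that control whether the session id
// is rewritten into URLs (e.g. ?PHPSESSID=...).
function sessionUrlSettings() {
    return array(
        'session.use_trans_sid'    => ini_get('session.use_trans_sid'),
        'session.use_only_cookies' => ini_get('session.use_only_cookies'),
        'session.name'             => ini_get('session.name'),
    );
}

foreach (sessionUrlSettings() as $key => $value) {
    echo $key, ' = ', var_export($value, true), "\n";
}
```

If session.use_trans_sid reports an enabled value, the session ids in the URLs are likely coming from PHP rather than from the application.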