Gallery2 and SEO - Tips and suggestions
Joined: 2005-07-16
Posts: 39 |
Posted: Tue, 2005-09-20 00:08 |
Updated Feb-05-2007

I don't know if all of this has been covered (I can't get the new search to work for me), but I have some general suggestions for the SEO (search engine optimizing) of Gallery2. In no particular order:

1) Title bar
The title of each album should be in the title bar, along with the photo title when viewing photos. This can be accomplished by modifying the <title> tag in the appropriate tpl page.

2) Mod_Rewrite
Use dashes (-) instead of underscores (_) so that older search engines (and some new ones) treat the URL as separate words.

3) index.php
Apply the mod to use index.php instead of main.php. This is very important! Redirects kill rankings (especially 302s). Also, without this your site will never get assigned the appropriate PageRank from Google.
Update: This mod doesn't work anymore, so if you are able to, which you most likely are, modify httpd.conf to list main.php before index.php and all should be well with the world once again.

4) Correct error pages
The default error page generates an HTTP 200 response code (OK). The correct codes should be used, or the spiders will just keep indexing the error page.
Update: This was fixed in the Gallery code, I believe. Correct me if I'm wrong.

5) Duplicate content
Having multiple sizes of an image is great, but to a spider it looks like copies of the same page. It's best to modify the tpls to load the full size images in a new window, without the template includes. Duplicate content penalties are common and VERY hard to fix.

6) Album title when viewing photos
Having the album title on the page with the photo title will GREATLY increase keyword density and weighting.

7) <h1> and <h2> tags
Adding the <h1> tag and its formatting to the CSS, and using it for the album and picture titles when viewing albums and photos, will add to keyword weight. May help, may not, but it's worth a shot.

8) Slideshows and other "features"
See duplicate content above.

9) File and album naming
Prepare a naming convention and stick to it throughout your gallery. Photo and album titles should be clear and concise. "Won't someone please think of the spiders?"

10) Preparation is better than perspiration
If you're going to do any of this, do it BEFORE you make your site available to the outside world. Once the search engines latch on to something, it's difficult to make them let go, especially if it's wrong. This is important so that you don't end up with a bunch of non-existent pages in the search engines after you change the names 20 times.

There is probably more that I'm missing here, but that covers the most important things I can think of and have done to my site to aid the little spiders in moving around my gallery. If you have any to add or disagree with any of the above, please post and explain your position.

Added: I've tried to follow this thread for the last year or so, but for some reason it only emails me updates when it feels like it, heh, so here goes some Q&A.

Question: Index.php always showing instead of
Question: Do you have a list of changes, or a patch of some sort to implement?
Question: Do bots pick up a session ID when browsing my site?
Question: Is the same as
Question: If Google picked up some pages that I don't want there anymore ( can I use the removal tool to get Google to drop the listing?
Some quick examples:

And for crying out loud ---- MAKE A BACKUP!! ----

==HighlightId==
Open modules/core/templates/blocks/BreadCrumb.tpl
Remove
    arg3="highlightId=`$theme.parents[parent.index_next].id`"
and remove
    arg3="highlightId=`$`"

==Picture name in the resize link==
Open modules/core/templates/blocks/PhotoSizes.tpl
Find this block:
    <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$`" arg3="imageViewsIndex=`$theme.sourceImageViewIndex`"}">
    {$smarty.capture.fullSize}
    </a>
Add {$theme.item.title|markup} thusly:
    <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$`" arg3="imageViewsIndex=`$theme.sourceImageViewIndex`"}">{$theme.item.title|markup}
    {$smarty.capture.fullSize}
    </a>
Add some text to make it even spiffier, and make it open in a new window without a template, like this:
    <a href="{g->url arg1="view=core.DownloadItem" arg2="itemId=`$`" }" target="_blank">View {$theme.item.title|markup} full size
    {$smarty.capture.fullSize}
    </a>

==Make links out of thumbnail titles==
(This is for Matrix; poke around other themes to see how they handle titles.)
Open album.tpl
Find:
    {if !empty($child.title)}
    <p class="giTitle">
    {if $child.canContainChildren}
    {g->text text="Album: %s" arg1=$child.title|markup}
    {else}
    {$child.title|markup}
    {/if}
    </p>
    {/if}
Change to:
    {if !empty($child.title)}
    <p class="giTitle">
    {if $child.canContainChildren}
    <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$`"}">
    {g->text text="Album: %s" arg1=$child.title|markup}</a>
    {else}
    <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$`"}">
    {$child.title|markup}</a>
    {/if}
    </p>
    {/if}
Just your standard <a href=""></a> and the ol' {g->url}; pretty easy to make it work with other themes.
---Hint, take out Album: and you'll feel better about yourself.

==Peer List==
One thing that annoys me is seeing Snowypicture#13.jpg reduced to sno...13.jpg, so change it!
Open modules/core/templates/blocks/PeerList.tpl
Find entitytruncate:14 (there are 2)
Change the 14 to however many characters you think your file names will grow to.

========
Again, if you see any mistakes, please point them out. I'll continue to keep an eye on the thread and update as necessary. Good luck! |
Joined: 2003-01-04
Posts: 32509 |
Posted: Tue, 2005-09-20 00:27 |
Great, thanks for the contribution. Added it to the how-tos page. Maybe we can get some of the optimizations into G2, but certainly not all, since there are conflicting goals: features vs. duplicate content. |
Joined: 2005-07-16
Posts: 39 |
Posted: Tue, 2005-09-20 00:49 |
Redirects are death in Google. It sees the 302 as spam, even if it's just a naming thing. Their algorithm just doesn't know how to distinguish, yet anyway. It would be extremely beneficial to make index.php the default, especially for less experienced webmasters. PageRank is also URL dependent, which means if all your inbound links point to http:\\ and it redirects to http:\\\main.php, the PR won't fully transfer, if at all. And thanks for the kudos, always happy to help, especially when you guys have given so much! |
Joined: 2005-10-04
Posts: 3 |
Posted: Tue, 2005-10-18 03:55 |
You have the worst technical support ever! No instruction on how to do any of this. This is a joke! No wonder copper is better! |
Joined: 2003-01-04
Posts: 32509 |
Posted: Tue, 2005-10-18 08:45 |
bmsstore @topic: |
Joined: 2005-01-09
Posts: 383 |
Posted: Tue, 2005-10-18 09:14 |
I'm not getting it. Where does it say coppermine is better? A lot of work went into coppermine, I am sure. And a lot of work went into gallery2, but I haven't seen a comparison myself yet. no wonder copper sucks :P

On topic: I am trying to keep my website out of the hands of Google's search. Using robots.txt I am doing fine, but some safeguards won't hurt. So leave the bad ranking features in ;) |
Joined: 2005-10-18
Posts: 4 |
Posted: Tue, 2005-10-18 09:26 |
I really like G2. It looks really good integrated into my WordPress install, but search engine friendly it's not. All of the points in the first post are valid and easily changed, but as long as the URL contains the g2_GALLERYSID part it's never going to be very search engine friendly. I've spent the best part of today trying to find a way to remove it, but my lack of PHP ability really let me down. If anyone has a way to remove the g2_GALLERYSID from the URL, please post it. I don't mind an ugly hack or even a suggestion of where to start. I know this post sounds a bit down, but really it's not. I think G2 is great; it's just that I'm facing having to move back to a substandard solution if I can't get this sorted. |
Joined: 2003-01-04
Posts: 32509 |
Posted: Tue, 2005-10-18 09:42 |
stephen_3 |
Joined: 2005-01-09
Posts: 383 |
Posted: Tue, 2005-10-18 10:17 |
valiant, stephen_3, |
Joined: 2005-10-18
Posts: 4 |
Posted: Tue, 2005-10-18 10:25 |
Valiant, thank you for your reply, although I'm not sure if it's correct. Try the search allinurl:g2_GALLERYSID in Google and it finds 729,000 pages that have been spidered with the session id in them, and not just the front page: whole galleries have been spidered with the session id. I'm watching my logs. One of the major crawlers should hit the new galleries in the next few days and I'll post back with whether it's crawling with session ids or not. Thanks for the help. Stephen. |
Joined: 2005-10-18
Posts: 4 |
Posted: Tue, 2005-10-18 10:32 |
Thanks RwD, I have cookies enabled so that I don't get the session id. The problem is that I don't believe spiders allow you to set cookies, and therefore they get the session id. I have no problem with my users or myself getting sessions, but I don't want Google spidering the same pages multiple times with different session ids. |
Joined: 2003-01-04
Posts: 32509 |
Posted: Tue, 2005-10-18 10:53 |
stephen_3 i don't know yet what could have caused this. seeing your server access logs will be interesting. |
Joined: 2003-08-05
Posts: 565 |
Posted: Tue, 2005-10-18 19:09 |
Rwd---- apropos your comments about keeping robots out... use dynamic <robots.txt> files.

* The static form of the <robots.txt> file is often the wrong flavour; it works the wrong way around. It automatically and indiscriminately lets in every visiting spyder - without fail - every time - all the time. If you've done your homework you will have coded your <robots.txt> file to include all the places that are inappropriate for indexation. So not only does the visiting spyder know that you're letting absolutely everybody and everything into your site, but you've also automatically informed them as to the whereabouts of stuff you'd prefer not to be broadcast. It's all a bit dissatisfying. Wandering through your access logs, as you do, can often leave you with a bad taste in the mouth when you check up the wherewithal of the latest visiting spyder;~/ This usually means then chasing through a myriad of sub-directories rewriting all the .htaccess files - as appropriate. There's NOTHING like KNOWING that you're destined for sessions of repetitive stable door closing to increase your self-esteem!

* After some overlong experiences of spyder hunting/bashing/blocking I turned things around. I coded my site to work the logic the other way. My sites automatically reject new spyders without fail - every time - all the time. They do this without revealing all the sub-directories that should not be spydered even by white-listed spyders. White-listed spyders are provided with an appropriate list of disallowed sub-directories. Thus these robots.txt files can be considered as being dynamic.

* Briefly - use PHP to pretend to run a 'normal' robots.txt but whose content is dynamically rendered.

* In detail - amend your existing .htaccess and robots.txt files thus:

[.htaccess]
    <Files robots.txt>
    ForceType application/x-httpd-php
    </Files>

[robots.txt]
    <?php
    // Sub-directories that whitelisted spyders are asked not to index.
    $disallow = array("/noindex", "/html", "/assets", "/private", "/cgi-bin", "/pdf-noindex");
    // When TRUE, spyders that are not whitelisted are told to stay out of everything.
    $disallow_none = TRUE;
    $agent = 0;
    header('Content-type: text/plain');
    // Whitelisted spyders, matched as substrings of the User-Agent header.
    $robots = array("Yahoo! Slurp", "ConveraCrawler", "Googlebot", "Googlebot-Image", "", "ia_archiver", "IRLbot", "JemmaTheTourist", "", "MJ12bot", "MojeekBot", "Mozdex", "msnbot", "NutchOrg", "pipeLiner", "psbot", "searchengineworld", "", "Speedy Spider", "SurveyBot", "Teoma");
    // Some clients send no User-Agent header at all.
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    foreach ($robots as $rob) {
        // Skip blank entries; substr_count() rejects an empty needle.
        if ($rob === '') {
            continue;
        }
        // Print the disallow list once, for the first whitelisted name that matches.
        if (!$agent && substr_count($ua, $rob)) {
            echo "User-agent: *\n";
            foreach ($disallow as $dis) {
                echo "Disallow: $dis\n";
            }
            $agent++;
        }
    }
    // No whitelisted name matched: shut the visitor out of the whole site.
    if ($agent == 0 && $disallow_none) {
        echo "User-agent: *\n";
        echo "Disallow: /\n";
    }
    ?>

The above <robots.txt> 'file' whitelists a sample bunch of fairly well known spyders. It provides them with the expected list of disallowed sub-directories. All other spyders get a very short file that disallows everything. The onus is on you to check the likely provenance of newcomers for possible white-listing. BTW they ALWAYS come back for another sniff around no matter what...

----best wishes, Robert |
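A possible alternative to the ForceType directive, offered only as a sketch: on servers where PHP is not registered under the application/x-httpd-php type (an assumption about your setup, not something stated above), mod_rewrite can hand requests for robots.txt to a separate PHP script instead. The file name robots.php is hypothetical; it would hold the same PHP code shown in the post.

```apache
# .htaccess in the web root.
# Assumes mod_rewrite is enabled and AllowOverride permits FileInfo directives.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Serve the PHP script whenever a crawler asks for /robots.txt.
    # robots.php is a hypothetical file name, not part of Gallery2.
    RewriteRule ^robots\.txt$ robots.php [L]
</IfModule>
```

With this variant there should be no static robots.txt file handling the request; the crawler still sees the URL /robots.txt and never learns that a script answered it.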
Joined: 2005-01-09
Posts: 383 |
Posted: Tue, 2005-10-18 20:42 |
I was about to go to sleep, caught a flu. Not really comprehending what you mean... In any case, this is my robots.txt:

    User-agent: *
    Disallow: /

I don't want anything to show up on searches, and I do not reveal any info, even though not every piece has to stay de-indexed. Is this a bad method?? Everything I want to keep from the spiders is actually indeed within a subdir :P I just want to keep the stuff from the search; if I wanted it to be unfindable I would not put it online ;) |
Joined: 2003-08-05
Posts: 565 |
Posted: Tue, 2005-10-18 20:57 |
Your file says "to all robots, disallow everything in the root and beyond (except for robots.txt of course)". Properly programmed robots will abide by this and not index anything on your site. Good robots will even de-index everything previously indexed... I accidentally discovered this to my cost with Google after innocently 'protecting' a single blogging page with noindex/nofollow on the actual page. Google promptly delisted my entire site and page rating. Oh and FWIW stay away from migratory birds and chickens!!! ----best wishes, Robert |
Joined: 2003-08-05
Posts: 565 |
Posted: Tue, 2005-10-18 21:04 |
My flavour of programming is for sites that want particular sections to be properly listed in some nominated search engines but want to portray an uncompromising 'go away' attitude to all other and any brand new visiting spyders (until investigated), without incidentally highlighting your nominated areas of interest. It means you don't have to be on perpetual lookout for dubious spyders. It is NOT a strategy for all site administrators;~) ----best wishes, Robert |
Joined: 2003-08-05
Posts: 565 |
Posted: Tue, 2005-10-18 21:24 |
Maybe this will clarify things... 1) whitelisted spyder example 2) unlisted/blacklisted spyder example That's it... automatically;~) ----best wishes, Robert |
Joined: 2005-01-09
Posts: 383 |
Posted: Wed, 2005-10-19 11:11 |
icpix wrote:
Oh and FWIW stay away from migratory birds and chickens!!!

Better a normal flu than this bird version. I heard it is pretty deadly. I just hope it passes over. Negative projections say hundreds of millions of people will die from it within a short time. Luckily those are the negative projections :P |
Joined: 2005-10-18
Posts: 4 |
Posted: Mon, 2005-10-24 00:02 |
Valiant, |
Joined: 2005-10-20
Posts: 3 |
Posted: Mon, 2005-10-24 12:07 |
Hi, looks like you are a real SEO guru. Can you suggest how to solve the most visible SEO problems for G2 v 2.0.1? Maybe some mini-HOWTO or patch to help? |
Joined: 2003-01-04
Posts: 32509 |
Posted: Mon, 2005-10-24 12:28 |
stephen_3:

    /**
     * Return the id of the search engine currently crawling the site by
     * analyzing the current request.
     *
     * @return string the crawler id, or null if it's a regular user
     */
    function identifySearchEngine() {
        if (!isset($_SERVER['HTTP_USER_AGENT'])) {
            return null;
        }
        $userAgent = $_SERVER['HTTP_USER_AGENT'];
        if (strstr($userAgent, 'Google')) {
            return 'google';
        } else if (strstr($userAgent, 'Yahoo')) {
            return 'yahoo';
        } else if (strstr($userAgent, 'Ask Jeeves')) {
            return 'askjeeves';
        } else if (strstr($userAgent, 'msnbot')) {
            return 'microsoft';
        }
        return null;
    }

so no, we don't identify by IP, we parse the userAgent string for google, yahoo, etc. |
Joined: 2005-10-25
Posts: 1 |
Posted: Tue, 2005-10-25 00:37 |
Hi guys, G2 is an absolutely great app, the best piece of software on the net. Lots of kudos to you all. SEO question: is there ANY simple way to use index.html instead of silly main.php? |
Joined: 2002-12-10
Posts: 16504 |
Posted: Tue, 2005-10-25 13:31 |
Yes there is: For your URL Rewrite issue, please start a new topic and include the information requested (you'll see that when you start a new topic). You shouldn't even need to look at a page for the URL Rewrite module to work. If it's an album you see an error with and you are manually entering the URL, you need to put a trailing slash on the URL ( not I'm not a rewrite rule expert, but if you start a new topic with your problem, the URL Rewrite module developer will probably be able to help you. Be sure to be clear in your subject of what your problem is "Gallery doesn't work" doesn't help |
Joined: 2005-11-03
Posts: 15 |
Posted: Sat, 2005-11-19 20:51 |
I have a question about 5) if I can see the duplicate content (large images), but no one else is allowed to. Will that affect that? On the same note, do spiders only crawl along public / anonymous user content? Can they even attempt to index content that's restricted? |
Joined: 2002-12-10
Posts: 16504 |
Posted: Sat, 2005-11-19 21:07 |
Spiders cannot crawl content they cannot access. So if they don't have the username and password to login and crawl the site unrestricted or as that user, then they can't see the content. Example, if you make it so users have to login to see any content a search spider will never know of any of the content you have in your gallery. |
Joined: 2005-07-16
Posts: 39 |
Posted: Sat, 2005-11-19 21:20 |
ryooki wrote:
I have a question about 5) if I can see the duplicate content (large images), but no one else is allowed to. Will that affect that? On the same note, do spiders only crawl along public / anonymous user content? Can they even attempt to index content that's restricted?

No, only publicly accessible content can be spidered. If you are the only one able to access the full image then the spider will not be able to crawl and index it. |
Joined: 2005-11-03
Posts: 15 |
Posted: Sat, 2005-11-19 21:55 |
Thanks for answering so quickly. |
Joined: 2004-06-14
Posts: 243 |
Posted: Mon, 2005-12-05 18:06 |
@index.php @slideshow @fullsize |
Joined: 2005-12-27
Posts: 10 |
Posted: Tue, 2005-12-27 14:41 |
Continental: How did you solve the slideshow problem in your robots.txt file? |
Joined: 2003-08-05
Posts: 565 |
Posted: Tue, 2005-12-27 15:26 |
By increments.

1) amended my dynamic robots.txt file to include...
That should've done it but Googlebot and Googlebot-Image are still parsing shedloads of slideshow URLs.

2) amended /modules/slideshow/templates/local/Header.tpl to show...

3) The above robots.txt stuff should've been interpreted as any URL including the character string of slideshow.html but the google family steadfastly have been ignoring this and, indeed, the per iteration NOINDEX declaration. Mystifying. So, by way of experiment, I have amended my dynamic robots.txt file to show...

Yes, I could use * wildcards and end of line markers $ as per here but I cannot rely on other less involving search engines to cooperate. So, for a few more days, I will experiment with the above amendment. Will update this thread with my results. |
Joined: 2002-12-05
Posts: 573 |
Posted: Wed, 2006-03-29 20:26 |
valiant wrote:
bmsstore |
Joined: 2005-07-27
Posts: 38 |
Posted: Mon, 2006-04-03 01:00 |
I've been reading this thread and still don't quite know what is the best way to get Google to index all the pages in a Gallery powered site. Does anyone have a properly indexed Gallery site? Smugmug users seem to have all their pages indexed by Google, though it's a completely different gallery engine. |
Joined: 2002-12-05
Posts: 573 |
Posted: Sun, 2006-04-09 23:57 |
Most of my album pages are indexed. |
Joined: 2005-07-27
Posts: 38 |
Posted: Wed, 2006-04-12 01:47 |
Google has only indexed my old pages. My site has been revamped and some pages are now gone, and there are many more new pages. Any way to force Googlebot to reindex my site? |
Joined: 2005-07-16
Posts: 39 |
Posted: Wed, 2006-04-12 21:07 |
eosguy wrote:
Google has only indexed my old pages. My site has been revamped and some pages are now gone, and there are many more new pages. Any way to force Googlebot to reindex my site? |
Joined: 2006-08-12
Posts: 7 |
Posted: Sat, 2006-09-23 21:14 |
I don't see how that robots.txt file stops Google from trying to index the slideshow. |
Joined: 2003-08-05
Posts: 565 |
Posted: Sat, 2006-09-23 21:32 |
forumposters----- |
Joined: 2006-09-13
Posts: 59 |
Posted: Tue, 2006-09-26 00:01 |
When you delete an album or some pictures which Google has already picked up on and indexed, it (google) will keep the page linked and gallery returns an error when you go to it. Is there any way of overcoming this? --- |
Joined: 2006-09-13
Posts: 59 |
Posted: Tue, 2006-09-26 00:10 |
Oh, another thought, I guess from what has been said here that Google won't exactly love permalinks from the point of duplication??? --- |
Joined: 2006-06-16
Posts: 324 |
Posted: Mon, 2007-02-12 23:55 |
"When you delete an album or some pictures which Google has already picked up on and indexed, it (google) will keep the page linked and gallery returns an error when you go to it. Is there any way of overcoming this?"

Google retains copies of pages it has indexed for quite a while, sometimes up to a year. The next time Googlebot comes to spider your pages it will get the 404 (Page not found) error from the Gallery app or from your server, and then the page will go supplemental (it will have the word "supplemental" in green under the listing). Supplemental pages are out of date pages or pages Google considers to be duplicate content and perhaps suspect as spam. You'll have to wait until Google finds the new page and indexes it; then it will appear in Google's index of your site. You can help things along by issuing a 301 redirect for every page that's been moved, which tells all the search engines where to find the new page immediately. 301 redirects require specific knowledge of Apache mod_alias and mod_rewrite rules, though. |
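For readers who have not written one before, a 301 redirect with mod_alias can be sketched roughly as follows. Every path and the host www.example.com are made up for illustration; they are not taken from anyone's site in this thread, and you would substitute your own old and new URLs.

```apache
# httpd.conf or .htaccess; assumes mod_alias is loaded.

# Permanently redirect one moved page (hypothetical paths).
Redirect 301 /gallery/old-photo.html http://www.example.com/gallery/new-photo.html

# Permanently redirect everything under a renamed album, keeping the
# rest of the path ($1) intact (hypothetical album names).
RedirectMatch 301 ^/gallery/old-name/(.*)$ http://www.example.com/gallery/new-name/$1
```

Redirect handles the simple one-to-one case; RedirectMatch takes a regular expression, which helps when a whole album was renamed rather than a single page.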
Joined: 2006-06-16
Posts: 324 |
Posted: Sat, 2006-10-07 13:47 |
"Oh, another thought, I guess from what has been said here that Google won't exactly love permalinks from the point of duplication???"

You are correct. Google will index both directories and find duplicate pages, then one or both sets are likely to go supplemental. I turned permalinks off for my site and use custom 301 redirects when content moves. The whole time my site was being built Google was indexing it, even though I had a noindex tag in robots.txt. When Google updated its index there were hundreds of broken URLs, as I had changed the URL structure on my site several times. It took a lot of time to clear it up, as custom 301 redirects had to be written to match every broken URL pattern and rewrite them to the current URL pattern. Andrew |
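A pattern-based 301 of the kind described here might look something like this with mod_rewrite. This is only a sketch: both URL structures are invented for the example and are not Andrew's actual rules.

```apache
# .htaccess in the web root; assumes mod_rewrite is enabled.
RewriteEngine On

# Old structure (hypothetical):  /v/<album>/<photo>.jpg.html
# New structure (hypothetical):  /photos/<album>/<photo>.html
# $1 and $2 carry the album and photo names over to the new URL.
RewriteRule ^v/([^/]+)/([^/]+)\.jpg\.html$ /photos/$1/$2.html [R=301,L]
```

One rule per retired URL pattern is usually enough, which is why it pays to settle the URL structure before the site is crawled.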
Joined: 2005-11-04
Posts: 1642 |
Posted: Sat, 2006-10-07 15:05 |
hollyonline wrote:
When you delete an album or some pictures which Google has already picked up on and indexed, it (google) will keep the page linked and gallery returns an error when you go to it. Is there any way of overcoming this?

I think the webmasters tool on Google has a facility somewhere to remove content. |
Joined: 2005-07-16
Posts: 39 |
Posted: Sat, 2006-10-07 15:38 |
Dayo wrote:
Yeah, but it doesn't actually remove it. It just hides it for 180 days, then it pops back up. You can hose the entire site with that so your best bet is just wait for googlebot to figure it out. |
Joined: 2006-06-16
Posts: 324 |
Posted: Fri, 2008-01-25 23:15 |
I have made quite a few modifications to my site to help search engine compatibility. Here is a short rundown of the changes I made: 1) The biggest one...use rewrite module to create keyword rich URLS This short list of changes will help greatly in SEO of your pages. I have yet to figure out how to implement one final change which would help greatly, namely to change the base URI in gallery to just the domain name and not domain name+index.php or main.php. You can see the various customizations at my website below. Andrew |
Joined: 2005-11-26
Posts: 108 |
Posted: Thu, 2006-11-16 00:20 |
ichthyous, What do you mean by your change number "2" ================================== |
Joined: 2004-10-07
Posts: 560 |
Posted: Wed, 2006-12-20 10:51 |
index.php won't be shown if that's the default file Apache is looking for. |
Joined: 2006-06-16
Posts: 324 |
Posted: Mon, 2007-02-05 23:25 |
I have index.php set as the default for my site, but that's not really what I meant. Gallery app itself uses index.php when it generates the urls. I had to change the actual code to remove that. It's not something that is inherently bad, but since all my incoming links pointed to my domain I didn't want all the internal links pointing to index.php. I had posted about duplicate content issues with Google and slideshows...the new exlide flash slideshow module should take care of that...if you can get it to work. My Photos of Spain Gallery |
Joined: 2004-10-07
Posts: 560 |
Posted: Mon, 2007-02-05 23:41 |
Do you really think that there is an issue with duplicate content & slideshows? 1. I don't think so, google should not be that dumb |
Joined: 2005-07-16
Posts: 39 |
Posted: Tue, 2007-02-06 00:13 |
Yes, GoogleBot is that dumb. |
Joined: 2005-07-16
Posts: 39 |
Posted: Tue, 2007-02-06 00:22 |
Here's a bit of an update for the index.php bit...

Set Apache to look for main.php before index.php - poof, no more redirects. I.e., in httpd.conf:

    DirectoryIndex main.php index.php index.html blah.html

and so on. |
Joined: 2006-06-16
Posts: 324 |
Posted: Fri, 2008-01-25 23:16 |
"Do you really think that there is an issue with duplicate content & slideshows?"

Yes, these days Google is very touchy about dupe content, so why risk it? Flash slideshows are much prettier and have no dupe content issues at all. Once you start seeing a good number of pages going supplemental it's very hard and time consuming to track it and turn it around.

Netscan, I'm not sure I really follow your last post. I don't have any main.php/index.php redirect as I have swapped the files. I did a lot of editing in one of the class files to remove the index.php entirely from my links. All of my internal links point to just the domain name. |