Hi all, i'm working on a new CMS which is fully PHP5 based and was thinking to integrate Gallery2 but have some doubts and issues.
The CMS is fully UTF-8 and XHTML 1.1 based so here are my issues
1. Gallery2 relies on the gettext PHP extension, but gettext isn't thread-safe, so it's not advisable in a multi-threaded environment.
2. gettext relies on setlocale(), but that depends on which languages (locales) are installed on the system, and in this case UTF-8 is a very tricky setting to use.
3. My CMS usernames are fully UTF-8 compliant, which makes strlen() and other byte-based string routines buggy if a plugin (in this case G2) can't handle Unicode and just blindly calls strtolower(), for example.
So is there any future plan to get rid of the slow gettext system and make it UTF-8 compliant, which IS the default Apache behavior anyway?
Posts: 32509
1. restricting the extensions in use etc. to only thread-safe ones was AFAIK not on the agenda when g2 was initially designed. also, it doesn't make much sense, even now, to get such a burden, there are still too many non thread-safe extensions.
2. yes that's true. that's why we have an entry in our faq which explains this fact. installing further languages on the system isn't a difficult task.
but it's true, gettext as an (external) dependency isn't that great. we looked for php based gettext replacements, and there are at least two of them, but they are very slow.
3. internally, g2 works 100% in utf-8. browser input is utf-8, output is utf-8, interactions with the filesystem are translated from/to utf-8.
you're mentioning issues with usernames in utf-8 and non-mb*() functions.
i'm not sure if we actually have such issues. got an example?
have you checked out how g2 embedding works?
g2 and your CMS don't share the same user tables. they are loosely coupled and kept in sync'.
Posts: 17
1. i understand there are many extensions that are not thread-safe, but if you have a heavily visited gallery (in this case inside a CMS) it could get nasty and race conditions could occur
2. php slow? yes that's true but gettext is slow as well, i will try and write a benchmarker on $this->translate() to see how slow it actually is when gettext is installed.
I did throw an exception in PHP5 to get a stack trace:
Will keep you posted on that ;)
3. I didn't check everything yet cos i just started to read the code, however if a variable passed to strlen(), substr(), preg_replace('[\w]'), etc. contains multi-byte UTF-8 characters ("\xE1\x80\x80" for example) they could give wrong results.
However i did notice that user names are stored in utf into the database so no issues there.
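To make the point about byte-based functions concrete, here is a minimal sketch (not taken from G2 or the CMS) using the same "\xE1\x80\x80" sequence. It only uses strlen(), substr() and the PCRE functions with the /u modifier; the variable names are just for illustration.

```php
<?php
// "\xE1\x80\x80" is one character (U+1000) encoded as three UTF-8 bytes.
$name = "\xE1\x80\x80";

// strlen() counts bytes, not characters.
$byteLength = strlen($name);

// Counting matches of "." in UTF-8 mode (/u) counts characters instead.
$charLength = preg_match_all('/./su', $name);

// substr() cuts on byte offsets, so it can split a character in half.
$firstByte = substr($name, 0, 1);

// An empty pattern with /u only matches when the subject is valid UTF-8.
$firstByteIsValidUtf8 = (preg_match('//u', $firstByte) === 1);

printf(
    "bytes=%d chars=%d first byte valid=%s\n",
    $byteLength,
    $charLength,
    $firstByteIsValidUtf8 ? 'yes' : 'no'
);
```

The byte count and the character count differ, and the one-byte substr() result is no longer valid UTF-8, which is the kind of breakage meant above.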
If the G2 embedding just copies user tables and syncs them, it would not be an option. In my vision syncing is something else than embedding or integration.
My CMS uses several tables to manage members and when you get 100 new members a day or just 10 who change their member details the sync fails unless someone adds a monitoring to it.
Don't get me wrong G2 does look nice and the features are great but i've seen many attempts to use utf-8 and many other stuff but mostly half baked solutions.
You all know i work sometimes for coppermine as well but my main goal is a good CMS with easy integrational systems which G2 could be one of.
Posts: 17
Tests done:
Local machine
P4 2.54Ghz
Windows OS
PHP 5.0.4 (no accelerators)
Default new G2 install (completely empty)
With gettext page generated in 0.6541 seconds
Without gettext page generated in 0.5053 seconds
[edit]
My CMS on this machine generates pages in 0.0731 seconds
[/edit]
http://urbanduck.net/gallery/main.php
p4 2.8
Linux OS
PHP 4.4.0
with gettext 5.1533 seconds first time and 1.326 seconds on a page refresh
http://historicalengine.org/gallery/main.php
Dual XEON
Linux OS
PHP 4.4.0
with gettext 0.6209 seconds
Modified main.php is attached below
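For anyone who wants to time a single call rather than a whole page, a rough sketch of a micro-benchmark is below. The benchmark() helper and lookup_example() are hypothetical names, not part of G2 or of the attached main.php; a real test would pass the translation call under test instead of the stand-in.

```php
<?php
// Hypothetical helper: runs $callback $iterations times and returns the
// average wall-clock time per call in seconds.
function benchmark($callback, $iterations = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($callback);
    }
    return (microtime(true) - $start) / $iterations;
}

// Stand-in for a translation call, here just an array lookup.
function lookup_example()
{
    static $strings = array('hello' => 'Hello');
    return $strings['hello'];
}

$average = benchmark('lookup_example', 10000);
printf("%.8f seconds per call\n", $average);
```

Numbers from a loop like this say nothing about first-request cost (see the 5.15 vs 1.33 seconds above), so they complement page timings rather than replace them.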
Posts: 17
i also did a benchmark on coppermine, no worries, same results ;) even 0.03 seconds slower, but coppermine has no caching.
Posts: 32509
djmaze, you work for coppermine?
no, i didn't know that, but i don't really care
cool to be involved in multiple projects 
integration / embedding:
as to our way of integration / embedding...
i did a lot of research in this field last fall / this spring and this was the result. there's this thing i call tight integration (sharing db tables) and the other approach, loose coupling / integration. i tried first the former approach, because it's more obvious and sounds good. i ran into too many issues and realized that the second approach is much cleaner / better than it originally looks like.
modern CMS have events / hooks. e.g. xaraya, drupal, mambo, typo3, wordpress, ... all allow to register some function such that it is executed when such an event occurs.
we use this for user create / update / delete (and also group create / update / delete) events.
each time a user is created in the CMS, our function is called and it automatically creates a corresponding user in G2 and maps the CMS with the G2 user in a mapping table. loosely coupled, but always in sync' and there are no major disadvantages.
so you see, it isn't actually synchronization / periodic sync' stuff, it is a real integration and the two systems don't get out of sync'.
i'll have to write down all the pro's and con's and issues i ran into when playing with the two approaches.
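A rough sketch of the loose coupling described above, for illustration only: EventBus and G2UserBridge are made-up names, the arrays stand in for the real user and mapping tables, and none of this is the actual G2 embedding API.

```php
<?php
// Hypothetical event system, standing in for the hooks a CMS provides.
class EventBus
{
    private $listeners = array();

    public function on($event, $callback)
    {
        $this->listeners[$event][] = $callback;
    }

    public function fire($event, $payload)
    {
        if (empty($this->listeners[$event])) {
            return;
        }
        foreach ($this->listeners[$event] as $callback) {
            call_user_func($callback, $payload);
        }
    }
}

// Hypothetical bridge: keeps a separate G2 user list plus a mapping table.
class G2UserBridge
{
    public $g2Users = array(); // stand-in for G2's own user table
    public $map = array();     // CMS user id => G2 user id
    private $nextId = 1;

    public function onCreate($cmsUser)
    {
        $g2Id = $this->nextId++;
        $this->g2Users[$g2Id] = array('userName' => $cmsUser['name']);
        $this->map[$cmsUser['id']] = $g2Id;
    }

    public function onDelete($cmsUser)
    {
        if (isset($this->map[$cmsUser['id']])) {
            unset($this->g2Users[$this->map[$cmsUser['id']]]);
            unset($this->map[$cmsUser['id']]);
        }
    }
}

$bus = new EventBus();
$bridge = new G2UserBridge();
$bus->on('user.create', array($bridge, 'onCreate'));
$bus->on('user.delete', array($bridge, 'onDelete'));

// The CMS fires the event at the moment it creates the user.
$bus->fire('user.create', array('id' => 42, 'name' => 'djmaze'));
```

Because the G2 user is created inside the same event that creates the CMS user, there is no periodic job that could fall behind.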
gettext
what's your suggested alternative to gettext? we need a very flexible translation, want to support i18n.
Posts: 17
What i do in my current project is just using an array() with language stuff and a language loader.
for example /l10n/en/main.php
All uppercase entries contain sprintf() functionality in any way
All others are just like the entry or shortened with underscores
When i have a add-on it has it's own file like a mail.php or gd.php
A function loads the language file of the specified module in the current language or else the english version.
With another function you can control the language output
function _L($var) {
    global $LNG;
    return (isset($LNG[$var]) ? $LNG[$var] : str_replace('_', ' ', $var));
}
This is just a basic example but should provide enough detail on how it works.
I use arrays since i noticed they're incredibly faster than using defines, and it's a good alternative if you lack the C know-how to write PHP extensions or are missing gettext.
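The loader with English fallback could look roughly like the sketch below. Only the /l10n/<language>/<module>.php layout comes from the post; load_language(), translate() and the convention that each file returns an array are assumptions made for this example.

```php
<?php
// Hypothetical loader: tries the requested language first, then English.
// Assumes each language file ends with "return array(...);".
function load_language($baseDir, $lang, $module)
{
    foreach (array($lang, 'en') as $candidate) {
        $file = $baseDir . '/' . $candidate . '/' . $module . '.php';
        if (is_file($file)) {
            $strings = include $file;
            if (is_array($strings)) {
                return $strings;
            }
        }
    }
    return array();
}

// Same fallback as _L(): unknown keys are shown with underscores as spaces.
function translate($strings, $key)
{
    return isset($strings[$key]) ? $strings[$key] : str_replace('_', ' ', $key);
}

// Demo: build a throwaway l10n tree that only contains an English file.
$baseDir = sys_get_temp_dir() . '/l10n_demo_' . uniqid();
mkdir($baseDir . '/en', 0777, true);
file_put_contents(
    $baseDir . '/en/main.php',
    "<?php return array('welcome' => 'Welcome');"
);

$strings = load_language($baseDir, 'nl', 'main'); // no Dutch file, so English is used
echo translate($strings, 'welcome'), "\n";
echo translate($strings, 'log_in'), "\n";
```

Passing the array around instead of using a global keeps the sketch testable; the global $LNG version works the same way.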
Posts: 7934
I agree that using a pure PHP solution is going to be far faster than using gettext. However, I have yet to see a pure PHP solution that provides all the functionality of gettext and can touch it on speed.
For example, if you want to translate the following message:
"I have %d apple"
you also have to translate the plural form:
"I have %d apples"
In English this is no problem. There are two plural forms, and the rule for them is easy. But how about in Gaelic which has 5 plural forms? Or Polish which has 3 plural forms that are totally different? It is tricky to get this right. Before I started using gettext for G2 I investigated a few other solutions and none of them properly solved the pluralization problem.
If you can derive a localization scheme that properly covers pluralization, we'd certainly consider it.
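To show why this is harder than one-vs-many, here is a small sketch of the Polish case. The index rule follows the formula commonly used in gettext Plural-Forms headers for Polish (nplurals=3); the function names and the "Mam ..." sentence are just for this example.

```php
<?php
// Plural form index for Polish:
//   0 for n == 1,
//   1 for numbers ending in 2-4 (except 12-14),
//   2 for everything else.
function polish_plural_index($n)
{
    if ($n == 1) {
        return 0;
    }
    $mod10 = $n % 10;
    $mod100 = $n % 100;
    if ($mod10 >= 2 && $mod10 <= 4 && ($mod100 < 10 || $mod100 >= 20)) {
        return 1;
    }
    return 2;
}

// "I have %d apple(s)" in its three Polish forms.
function polish_apples($n)
{
    $forms = array('Mam %d jabłko', 'Mam %d jabłka', 'Mam %d jabłek');
    return sprintf($forms[polish_plural_index($n)], $n);
}

foreach (array(1, 2, 5, 12, 22) as $n) {
    echo polish_apples($n), "\n";
}
```

A plain array with a singular and a plural entry cannot express this; some per-language rule has to pick the index.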
There's a reimplementation of gettext in pure PHP here:
http://savannah.nongnu.org/projects/php-gettext/
This is more portable, but from my benchmarks it's far slower, which is to be expected because it's competing against a pure C library.
Posts: 17
i understand plural forms are the toughest part to deal with, especially if you have many lines that use them, but only ngettext() deals with them and not the simple gettext().
The only place where it is used (i can find) is as dngettext() in translateDomain($domain, $data) where $data has $data['many']
Files that use the plurals (G2 minimal install):
AdminCore.inc
ItemAddFromBrowser.inc
So to get this right in a languages array you could use:
But ngettext is also limited, for example what about:
sprintf('You are using Gallery %1$d, please <a href="%3$s" target="_blank">upgrade</a> to Gallery %2$d', $old_num, $new_num, $url);
This string has 2 numbers and 1 string to add. The string is a "whole" line which can be different in languages.
Even tougher: "Next update is within %1$d weeks and %2$d hours"
In these cases ngettext can't handle them unless you feed 2 strings and merge the 2 strings like:
So giving your example:
"I have %d apple and %d banana"
"I have %d apples and %d banana"
"I have %d apple and %d bananas"
"I have %d apples and %d bananas"
I know of situations where this is still an issue in certain languages but it just depends when it ever will happen ;)
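One way to avoid listing all four combinations is to pluralize each count on its own and merge the results, as a sketch. plural_phrase() is a hypothetical two-form (English-style) helper, and this is exactly the kind of merging that can go wrong in languages where the surrounding sentence also changes.

```php
<?php
// Hypothetical helper: picks singular or plural for one count (English rule).
function plural_phrase($forms, $n)
{
    return sprintf($forms[$n == 1 ? 0 : 1], $n);
}

$apple = array('%d apple', '%d apples');
$banana = array('%d banana', '%d bananas');

// The outer string has two slots and no counts of its own, so a translator
// can still reorder the slots with %2$s ... %1$s.
$sentence = sprintf(
    'I have %1$s and %2$s',
    plural_phrase($apple, 1),
    plural_phrase($banana, 3)
);

echo $sentence, "\n";
```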
Posts: 7934
Yup, that's all we need. We specify two plural forms for english (one vs. many) and provide a count. Localizers use a plural forms definition to describe how many plural forms that they have and then create a localization accordingly, and gettext chooses the correct form based on the rule. The rules get quite complicated, eg:
http://www.gnu.org/software/gettext/manual/html_mono/gettext.html#SEC150
This is a little simplistic, since you'd have to apply a rule to get the specific translation here. But yes, something along those lines should do it.
I agree -- this sucks. I haven't found a solution anywhere that handles this properly. We are forced to break it up into two strings with two different counts and then merge the strings together in order to make this work (and that's always error prone).
The other big advantage we get from gettext is that it's a standard format which means that there are lots of tools out there to audit .po files, assist in translation, do reporting on them, etc. We've definitely found that useful in the past.
I'm not married to gettext -- I'll happily use a better solution if one comes along. As you've seen in our code, we have a single place where we make all gettext calls so it would be easy for us to switch to a different localization scheme if there's a better one around. Certainly php-gettext looks like a viable alternative. We'd very much prefer to keep the gettext .po format since we have a considerable amount of localized data in that format, but I imagine we could write a tool to migrate it to a different data structure if a better one came along.
If you can help us dig up a thread safe replacement for gettext that matches it on translation features (a must), has comparable performance, and has some kind of reasonable tools support I'd strongly consider switching. I hate the fact that we're bound to the system locales!
Posts: 17
ok i will keep you all posted when i find one ;)
Posts: 172
I have a question about the plural form. In Chinese there's no difference between singular and plural. Is it possible to specify that in the PO files? Currently I translate both msgstr[0] and msgstr[1] to the same string. It seems to be a waste.
Posts: 172
OK. I think I figured it out. Just remove or comment out msgstr[1] and change the header to "Plural-Forms: nplurals=1; plural=0;"
Updating translation now.
Posts: 7934
Thanks
So what's the link to your new CMS under development? I'll keep tabs on the approach you're taking and see if we can benefit from something along the same lines. And come on by #gallery on freenode if you want to talk about this in real time.
Posts: 7934
Great! Let us know how it goes...
Posts: 172
I tried it on my own site and it seems to work. So I updated and submitted the change. Hope I didn't just break someone's site.
Posts: 32509
stephenju, cool
Can you add some docs on http://codex.gallery2.org/index.php/Gallery2:Translations such that others with the same problem (no plurals needed) know what to do? that would be great!
(everyone can register and contribute on codex.gallery2.org)
Posts: 172
valiant, will do. Have to get the kids to school right now. First day of school. %&*#%^&...
Posts: 17
I'm sorry but i won't say much about the CMS core until there's a good stable base structure, for now you could look at some base parts i committed to cvs at http://cvs.moocms.com/moo/
For now it just uses a multi-dimensional array and a custom template engine that also has language possibilities, like inside template files: {L_([a-z0-9\-_]*?)}
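As an illustration of how such a placeholder could be resolved (this is not the actual template engine, and render_language_tags() is a made-up name), preg_replace_callback() can look each key up in a language array. The closure needs PHP 5.3 or later; on older versions a named callback function would do the same job.

```php
<?php
// Hypothetical renderer: replaces {L_key} with the matching language entry
// and leaves unknown placeholders untouched.
function render_language_tags($template, $strings)
{
    return preg_replace_callback(
        '/\{L_([a-z0-9\-_]*?)\}/',
        function ($match) use ($strings) {
            return isset($strings[$match[1]]) ? $strings[$match[1]] : $match[0];
        },
        $template
    );
}

$html = render_language_tags(
    '<h1>{L_welcome}</h1><p>{L_missing}</p>',
    array('welcome' => 'Welcome')
);
echo $html, "\n";
```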
Also i think there will be not much benefit since the system is full blown PHP5 in OOP.
However i do think the benefit of G2 (or a php5 version) in combination with my CMS could become endless, but i prefer to talk with some devs in private about it.
Posts: 17
Finally i've figured it out and the solution is pretty simple.
i use $LNG array with values like:
$LNG['year'] = array('%d year', '%d years');
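now here's my ngettext equivalent
function nget($var, $n) {
    global $LNG; // needed, otherwise $LNG is undefined inside the function
    $txt = $LNG[$var];
    $i = ($n == 1 ? 0 : 1); // default two-form rule
    if (!empty($txt['plural'])) {
        $max = count($txt) - 2; // highest form index, the 'plural' key is not a form
        $plural = str_replace('n', (string) (int) $n, $txt['plural']);
        // single quotes, so $i is not interpolated before eval() runs
        eval('$i = (int) (' . $plural . ');');
        $i = max(0, min($i, $max));
    }
    return sprintf($txt[$i], $n);
}
Now the trick: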
when your language isn't of the default 2 forms then you add another entry in the array with the key 'plural'
This will result in:
nget('year', 1): 1 year
nget('year', 2): 2 years
nget('year', 10): 10 decades
of course it's just an example, but you get it.
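The array entry with the 'plural' key did not survive in the post, so the sketch below guesses at one that reproduces the three lines of output. It also swaps the eval()'d rule string for a closure (PHP 5.3+), which is a different technique from the one in the post; nget_callable() and the rule itself are assumptions.

```php
<?php
// Guessed entry: the third form and the rule are chosen only to match the
// sample output (1 year, 2 years, 10 decades).
$LNG = array(
    'year' => array(
        '%d year',
        '%d years',
        '%d decades',
        'plural' => function ($n) {
            return $n == 1 ? 0 : ($n < 10 ? 1 : 2);
        },
    ),
);

// Variant of nget() that calls the rule instead of eval()'ing a string.
function nget_callable($strings, $key, $n)
{
    $entry = $strings[$key];
    $rule = isset($entry['plural']) ? $entry['plural'] : null;
    unset($entry['plural']); // keep only the numbered forms
    $i = $rule ? (int) call_user_func($rule, $n) : ($n == 1 ? 0 : 1);
    $i = max(0, min($i, count($entry) - 1)); // clamp to an existing form
    return sprintf($entry[$i], $n);
}

foreach (array(1, 2, 10) as $n) {
    echo nget_callable($LNG, 'year', $n), "\n";
}
```

A callable avoids running translator-supplied text through eval(), at the cost that rules can no longer be plain strings in a language file.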