Hebrew import problems from G1 to G2 - SOLVED!

itamarl

Joined: 2007-06-09
Posts: 16
Posted: Sat, 2007-06-09 16:08

Hi All,

I have searched all over for this, and I apologize if the answer is right under my nose.

I am upgrading my gallery (around 9000 pictures) from G1 to G2.
The problem is that Hebrew text is imported as various numbers preceded by a % sign.

When I enter Hebrew text directly into G2 it works fine.
I have tried various codepages in the import stage, to no avail.

I forgot to mention: it's a Debian Lenny machine with the Debian packages.

Thanks for your help and for this project.

Itamar.

 
itamarl

Joined: 2007-06-09
Posts: 16
Posted: Mon, 2007-07-16 14:38

5 weeks and no word...
Is this a known issue with Gallery2 imports of Hebrew?
I can see that the encoding of the Hebrew text in Gallery 1 is ISO-8859-1 (and there is no import option for ISO-8859-1 Hebrew).
In the first import window (where you choose the albums to import) I see gibberish instead of Hebrew characters. Surprisingly, when I click next I see the Hebrew characters displayed correctly in the list of albums to be imported.

Unfortunately, after the import I see that all the Hebrew was imported as gibberish.

Help! Please!

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2007-07-18 07:00

i thought it's just a matter of selecting the right character encoding on import.
are you sure it's ISO-8859-1 and not ISO-8859-8?
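
To illustrate why the choice of source encoding matters, here is a minimal sketch. It is not part of Gallery: the sample bytes are a made-up example, and it assumes PHP's iconv extension is installed. The same four bytes come out as accented Latin letters when read as ISO-8859-1, and as Hebrew letters when read as ISO-8859-8.

```php
<?php
// Hypothetical sample: a four-letter Hebrew word stored as ISO-8859-8 bytes.
$bytes = "\xF9\xEC\xE5\xED";

// Decode the very same bytes under two different source encodings.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $bytes);
$asHebrew = iconv('ISO-8859-8', 'UTF-8', $bytes);

// The first line prints accented Latin letters, the second prints Hebrew.
echo $asLatin1, "\n";
echo $asHebrew, "\n";
```

If neither encoding produces readable Hebrew, the stored text is probably not raw bytes in either charset.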

--------------
Documentation: Support / Troubleshooting | Installation, Upgrade, Configuration and Usage

 
itamarl

Joined: 2007-06-09
Posts: 16
Posted: Wed, 2007-07-18 09:58

Thank you for your kind reply!

No, sorry.
ISO-8859-8 gives the same results.
The outcome is this:
[img]http://www.itamar.org/albums/General/import.jpg[/img]

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2007-07-18 17:35

1. what mysql version are you using?
2. looks like all your Hebrew characters are HTML entities.
i guess a small modification to the import (migrate) module needs to be done to convert the HTML entities properly to UTF-8 characters.

i have the feeling that this has been discussed in the forums already. but i'll update this thread here as soon as i have some time.
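
For illustration only, the kind of conversion meant in point 2 can be sketched in plain PHP. This is a rough sketch outside Gallery: the caption string is a hypothetical example, and html_entity_decode() is used as a stand-in, so it may behave differently from Gallery's own helper (for instance, it also decodes named entities such as &amp;).

```php
<?php
// Hypothetical G1 caption in which each Hebrew letter is stored as a
// numeric HTML entity instead of as raw bytes.
$caption = '&#1513;&#1500;&#1493;&#1501; album';

// html_entity_decode() turns the entities into real characters in the
// requested output charset (UTF-8 here).
$utf8 = html_entity_decode($caption, ENT_QUOTES, 'UTF-8');

echo $utf8, "\n";
```

Because the Hebrew is stored as entities rather than as bytes, a charset conversion on its own leaves it untouched, which would explain why changing the import encoding made no difference.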


 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2007-07-18 17:44

you'll have to edit this file:
modules/core/classes/helpers/GalleryCharsetHelper_simple.class

find:

    function convertToUtf8($string, $sourceEncoding=null) {
	global $gallery;
	$phpVm = $gallery->getPhpVm();

	if (empty($sourceEncoding)) {
	    $sourceEncoding = GalleryCharsetHelper_simple::detectSystemCharset();
	}
	if (empty($sourceEncoding) || !strcmp($sourceEncoding, 'UTF-8')) {
	    return $string;
	}

	/* Iconv can return false, so try it first.  If it fails, continue */
	if ($phpVm->function_exists('iconv')) {
	    if (($result = $phpVm->iconv($sourceEncoding, 'UTF-8', $string)) !== false) {
		return $result;
	    }
	}

	if ($phpVm->function_exists('mb_convert_encoding')) {
	    return $phpVm->mb_convert_encoding($string, 'UTF-8', $sourceEncoding);
	} else if ($phpVm->function_exists('recode_string')) {
	    return $phpVm->recode_string($sourceEncoding . '..UTF-8', $string);
	} else {
	    GalleryCoreApi::requireOnce(
		'modules/core/classes/helpers/GalleryCharsetHelper_medium.class');
	    $charset =& GalleryCharsetHelper_medium::getCharsetTable($sourceEncoding);
	    if (isset($charset)) {
		return preg_replace('/([\x80-\xFF])/se',
				    '$charset[ord(\'$1\')]',
				    $string);
	    }
	}
	return $string;
    }

replace with:

    function old_convertToUtf8($string, $sourceEncoding=null) {
	global $gallery;
	$phpVm = $gallery->getPhpVm();

	if (empty($sourceEncoding)) {
	    $sourceEncoding = GalleryCharsetHelper_simple::detectSystemCharset();
	}
	if (empty($sourceEncoding) || !strcmp($sourceEncoding, 'UTF-8')) {
	    return $string;
	}

	/* Iconv can return false, so try it first.  If it fails, continue */
	if ($phpVm->function_exists('iconv')) {
	    if (($result = $phpVm->iconv($sourceEncoding, 'UTF-8', $string)) !== false) {
		return $result;
	    }
	}

	if ($phpVm->function_exists('mb_convert_encoding')) {
	    return $phpVm->mb_convert_encoding($string, 'UTF-8', $sourceEncoding);
	} else if ($phpVm->function_exists('recode_string')) {
	    return $phpVm->recode_string($sourceEncoding . '..UTF-8', $string);
	} else {
	    GalleryCoreApi::requireOnce(
		'modules/core/classes/helpers/GalleryCharsetHelper_medium.class');
	    $charset =& GalleryCharsetHelper_medium::getCharsetTable($sourceEncoding);
	    if (isset($charset)) {
		return preg_replace('/([\x80-\xFF])/se',
				    '$charset[ord(\'$1\')]',
				    $string);
	    }
	}
	return $string;
    }

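    /* Temporary wrapper for the G1 import: convert the charset as before, then decode HTML entities */
    function convertToUtf8($string, $sourceEncoding=null) {
        /* The original implementation, renamed to old_convertToUtf8 above */
        $string = GalleryCharsetHelper_simple::old_convertToUtf8($string, $sourceEncoding);

        /* Turn the remaining HTML entities into UTF-8 characters */
        return GalleryUtilities::unicodeEntitiesToUtf8($string);
    }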

then try the import again.
once you're done with the import, revert your change to the above mentioned file (just restore the original file).


 
itamarl

Joined: 2007-06-09
Posts: 16
Posted: Thu, 2007-07-19 06:43

VALIANT!!!!!!!

YOU THE MAN!!

This works immaculately!
Thanks very much (I just donated $10 via PayPal!)

Will this be fixed in future releases?
It won't matter to me once I have migrated, but what about future upgraders who try to migrate a Hebrew G1?

Thanks again Valiant!
Itamar.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2007-07-19 13:20