Export using httrack et al.

JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Fri, 2005-04-01 17:14

We could export our G1 albums using httrack/wget. This does not seem to be possible in G2 - will this be possible in the production release? Will there be other tools?

----

Gallery version: 2Beta1
Webserver apache Mac Apache/1.3.33 (Darwin) configured
Database (with version): MySQL
PHP version (eg 4.2.1):
phpinfo URL (optional):
Graphics Toolkit(s):
Operating system: darwin
Web browser/version: firefox+safari
G1 version (for migration bugs): n/a

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-04-01 17:55

No, this is not possible in G2, because with tools like "wget" you can easily work around the permission system of G1, i.e. even if you don't want the public to see/download certain images, everyone can get them with wget (or a browser), if the url to the image is known.

In G2, you can't access the images/albums directly, thus the permissions you set are respected and can't be worked around.

Alternatives in G2:
- You could write client tools like GalleryRemote to export whole albums.
- You could write G2 modules that let you export whole albums
- The "download zip file" works similarly.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Fri, 2005-04-01 18:17

probably JoergSchulz just wants to do a wget export of publicly visible albums/images... shouldn't this be possible?

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Sat, 2005-04-02 18:21

Yes, this is one of the features I liked with G1: It was possible to burn albums on a cd (old- and new fashioned friends of mine are either offline or have only low-bandwidth connections). The download ZIP does not retain the Gallery functionality.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Sat, 2005-04-02 20:28

if you'd like to help us implement this please do the following:
1) try it.. export your G2 using wget or similar and test it out.. find out what links don't work (for example, login won't work too well...)
2) file a feature request on sourceforge; include the information you find in your tests, as this will help the person who implements the feature for G2.

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Sun, 2005-04-03 10:58

The first thing to do would be to create an offline mode. Otherwise the full gallery will be fetched; my setup contains about 5,000 images, and I cannot test with that many.
I will file a feature request next.

O.T.: how do I set up a second G2 instance on the same machine? That would give me something I can test with.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sun, 2005-04-03 11:21

offline mode? and how would that help only fetching a part of your albums? you know that you can tell wget which links to follow, and which not. wget is very powerful. don't think you need an offline mode. maybe you better explain what you mean by it.

@second G2 instance: you can have a second G2 instance with its own g2data directory and its own database tables (its own database or its own database prefix), but using the same gallery2 application files. just run the installer again and choose "multisite".

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Sun, 2005-04-03 11:33

Offline Mode: G1 had an offline mode that disabled certain links (e.g. login). You are right: in G1, the problem of recursing into other directories had to be solved by disabling the RSS file, not by the offline mode.

I will try to set up a second instance and test wget.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Sun, 2005-04-03 17:10

right, the G1 "offline mode" disables certain links so that the pages wget retrieves are nicely browsable.. based on the results of your tests we can decide if G2 needs such a mode.. for example, "offline mode" in G2 could use GalleryCapabilities to remove login links. it could also disable breadcrumb links for higher-level albums, so that only a subtree of the gallery is retrieved..

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Sun, 2005-04-03 17:16

moved this topic to the dev forum..

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Mon, 2005-04-04 20:34

<OT> OK; my Mac Apache setup is flawed, so no 2nd instance is available. No problem, this will be solved later. Let's turn to Linux; the G2 setup worked flawlessly again, as expected. Compliments. My mini G2 has 3 albums (2 main, 1 sub) with 5 images.</OT>

HTTRACK:
httrack http://regentag/gallery2/main.php/v/Barcelona/
starts to copy ALL photos, not only the selected sub-album.
In the process, httrack gets stuck in an endless loop, collecting the undesirable action items over and over. Bad. I had to interrupt it.
The resulting website copy looks surprisingly good: the pictures are all available and reachable locally (well, more than I wanted: I only wanted one sub-album. We will come to the sub-album tests later).
The interactive links like "login, search, AddToCart, module:link to other directories, ViewCart, MemberList" point to (surprise) the original site.

Obviously, what we need is a set_offline mode that eliminates these buttons and that forbids recursive gets of other directories.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Mon, 2005-04-04 22:09

1) it gets all the albums because the top bar has links to the parent albums.. offline mode in G1 omits these.. i think we can do the same
2) seems like an infinite loop would be an httrack bug, no? unless G2 somehow keeps generating unique urls, but i don't know why that would happen.. maybe sessionid keeps changing?
3) how would you define the things to be disabled? G2 supports modules so you can never know all the things that might be present.. i suppose each module could choose to support "offline mode" and omit the appropriate content..

anyway, collect your findings and file a feature request on sourceforge.. thanks for the research.

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Mon, 2005-04-04 22:23

ad 1) fine..
ad 2) wget has the same problem; the difference: wget starts with these URLs and never gets to the photos.
ad 3) sounds like a bigger change: each module would have to support an offline mode (otherwise a master offline mode should exclude it).

ad sourceforge: I will add the above findings as a comment to the existing request.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Mon, 2005-04-04 22:30

couldn't wget identify itself as google, so that the same google optimizations concerning urls would make it work for wget? alternatively, wget can be configured to accept cookies, right?
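
A minimal sketch of both ideas on the wget side, for anyone who wants to try them. --user-agent, --save-cookies and --keep-session-cookies are standard wget options (and wget already accepts cookies within a single run by default). The host, album, cookie file name and the "Googlebot/2.1" string are placeholders, and whether G2 really serves simpler URLs to that user agent is an assumption to verify, not something established in this thread. The function only prints the command so it can be inspected before running it.

```shell
#!/bin/sh
# Print (do not run) a wget command that sends a custom User-Agent header
# and writes the cookies it received, session cookies included, to a file.
# All arguments are placeholders; adjust them to your own site.
crawl_cmd() {
    url=$1
    agent=${2:-Googlebot/2.1}
    cookies=${3:-g2-cookies.txt}
    printf "wget --recursive --convert-links --user-agent='%s' --save-cookies='%s' --keep-session-cookies '%s'\n" \
        "$agent" "$cookies" "$url"
}

crawl_cmd http://mydomain/gallery2/v/myalbum/
```

As bharat notes further down, the user agent alone may change nothing, because it is the layout that decides which links are shown.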

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Mon, 2005-04-04 22:37

would this keep the look-and-feel of G2?

 
bharat

Joined: 2002-05-21
Posts: 7934
Posted: Tue, 2005-04-05 03:18

Doesn't matter who you identify yourself as right now; it's up to the layout to decide what to show you and layouts don't currently have a concept of "offline mode" so they'll show you the links, etc.

You could write a simplified layout that only shows images and force that on all your albums and use wget on it...

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Tue, 2005-04-05 04:05

i was thinking a particular user-agent string could have a little block in GallerySession that makes GalleryCapabilities calls to restrict what gets shown.. for example, disabling the sidebar and login links would remove many of the problematic things mentioned so far.

 
bharat

Joined: 2002-05-21
Posts: 7934
Posted: Wed, 2005-04-06 05:30

That's a pretty clever idea. I can't think of a reason why it's bad...

 
jmullan

Joined: 2002-07-28
Posts: 974
Posted: Wed, 2005-04-06 07:23
JoergSchulz wrote:
httrack http://regentag/gallery2/main.php/v/Barcelona/
starts to copy ALL photos, not only the selected sub-album.

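in wget, one can use the --no-parent option to prevent crawling upwards. This won't work for non-short-urls, since --no-parent only looks at the URL path and non-short urls identify the item in the query string.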
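
A small sketch of the --no-parent variant, assuming short URLs and a placeholder host and album. wget works out what counts as the "parent" from the URL path, so the start URL should end in a slash; the helper below (a made-up name, not part of wget) adds one if it is missing and then prints the command instead of running it.

```shell
#!/bin/sh
# Make sure a URL ends in "/" so wget treats the last path element
# as a directory rather than as a file name.
with_trailing_slash() {
    case $1 in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Print (do not run) a recursive wget command that never climbs
# above the start URL.
no_parent_cmd() {
    printf "wget --recursive --no-parent --convert-links '%s'\n" \
        "$(with_trailing_slash "$1")"
}

no_parent_cmd http://mydomain/gallery2/v/myalbum
```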

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Wed, 2005-04-06 16:29

jmullan, that's a great idea.. one problem though: i tried that and it only got the album and subalbums, but it didn't get the images, since the /d/{id} url path is a "parent" of the /v/{album} path where i started my wget.. any idea how to make sure it retrieves the actual images?

Edit: the css and static images also still point to the actual site.. hm..

 
jmullan

Joined: 2002-07-28
Posts: 974
Posted: Wed, 2005-04-06 16:37

Shoot. Um. Unless you can find something in the man pages for wget, I don't know.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Wed, 2005-04-06 16:54

yes, along similar lines i think i found something.. makes for a longer command line, but seems to work:

wget --recursive --convert-links --html-extension --include-directories='/gallery2/v/myalbum,/gallery2/d,/gallery2/themes,/gallery2/layouts,/gallery2/templates,/gallery2/images' http://mydomain/gallery2/v/myalbum/

This gets all the items and subalbums in "myalbum" along with images and css.
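
Since the include list is easy to mistype, here is a rough sketch that builds it from the gallery base path and the album name. The function names are made up for illustration, and the list of static directories is simply the one from the command above; it may differ on other installs. The script prints the command rather than running it.

```shell
#!/bin/sh
# Build the comma-separated --include-directories value:
# the album itself, the /d image path, and the static directories.
include_dirs() {
    base=$1    # e.g. /gallery2
    album=$2   # e.g. myalbum
    printf '%s/v/%s' "$base" "$album"
    for dir in d themes layouts templates images; do
        printf ',%s/%s' "$base" "$dir"
    done
    printf '\n'
}

# Print (do not run) the full wget command for one album.
mirror_cmd() {
    host=$1
    base=$2
    album=$3
    printf "wget --recursive --convert-links --html-extension --include-directories='%s' 'http://%s%s/v/%s/'\n" \
        "$(include_dirs "$base" "$album")" "$host" "$base" "$album"
}

mirror_cmd mydomain /gallery2 myalbum
```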

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Thu, 2005-04-07 19:47

nope: The images are not fetched; there are only links to the images in the original gallery.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Thu, 2005-04-07 19:57

with /gallery2/d in the include directories it retrieves all the images for me.

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Thu, 2005-04-07 20:24

ahem - not for me:

cmd Line:
wget --recursive --convert-links --include-directories='/gallery2/v/spaceoddities,/gallery2/d,/gallery2/themes,/gallery2/layouts,/gallery2/templates,/gallery2/images' http://regentag/gallery2/main.php/v/spaceoddities/

similar effect with httrack:
httrack -D http://regentag/gallery2/main.php/v/spaceoddities/
I get NO thumbnails (only links to the original URLs); the intermediate-size images are fetched, but the full-size images are not. The navigation structure is included; the interactive links as well.

Obviously we are on the wrong track. We definitely need an option for G2 that allows image downloads without the interactive links. The wget/httrack options alone don't help.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Thu, 2005-04-07 20:29

maybe that's because you have main.php in your url? if your rewrites are set up that way then /gallery2/main.php/d may be what you're looking for.
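
To make the point concrete, a sketch that derives the /v and /d include paths from the start URL itself, so they carry the same prefix (with or without main.php) as the pages being crawled. The function names are hypothetical and the code assumes the start URL contains /v/; the static directories (themes, layouts, ...) are left out because where they live relative to main.php is not stated in this thread.

```shell
#!/bin/sh
# Path prefix in front of /v/, e.g. /gallery2 or /gallery2/main.php.
url_prefix() {
    path=${1#*://}       # drop the scheme
    path=/${path#*/}     # drop the host, keep the leading slash
    printf '%s\n' "${path%%/v/*}"
}

# Album path after /v/, without the trailing slash.
album_path() {
    album=${1#*/v/}
    printf '%s\n' "${album%/}"
}

# Dynamic directories to list in --include-directories for this start URL.
dynamic_dirs() {
    prefix=$(url_prefix "$1")
    printf '%s/v/%s,%s/d\n' "$prefix" "$(album_path "$1")" "$prefix"
}

dynamic_dirs http://regentag/gallery2/main.php/v/spaceoddities/
```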

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Thu, 2005-04-07 20:51

no - the whole gallery gets loaded then. And the mentioned problem with the recursive fetching of all interactive links reappears. No Way.

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Thu, 2005-04-07 22:04

i don't get the impression you're really trying to figure out how to use wget properly.. i just told you the result of my test, complaining to me about it won't help.
what is the actual hostname for your G2?

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Fri, 2005-04-08 19:42

My gallery is currently only available in an intranet.
I will try to get a grip on httrack/wget->G2 tomorrow night. The idea is to get back to the best result I had so far (look-and-feel available, thumbnails available, full-size pictures not) and find a way to add the URLs of the midsize/full-size images.
No, I won't only complain. I want to and will contribute to the great G2 project.
Let's do it the other way round: do you have a really small test site I can point my wget/httrack at? We can then see the differences between our results.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-04-08 19:53

really small test site: http://nei.ch/test/gallery2/
(26 items total)

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Fri, 2005-04-08 20:52

you can try this patch of modules/core/classes/GallerySession.class which removes the sidebar and login links when you access the gallery with wget:

diff -u -r1.65 GallerySession.class
--- modules/core/classes/GallerySession.class   3 Apr 2005 08:42:05 -0000       1.65
+++ modules/core/classes/GallerySession.class   8 Apr 2005 20:50:08 -0000
@@ -177,7 +177,7 @@
        $forceResendCookie = false;

        /* Verify the remote address to avoid casual session hijacking */
-       $currentRemoteIdentifier = $this->_getRemoteIdentifier();
+       list ($currentRemoteIdentifier, $userAgent) = $this->_getRemoteIdentifier();

        if (!isset($this->_remoteIdentifier) ||
            $this->_compareIdentifiers($this->_remoteIdentifier,
@@ -205,6 +205,13 @@
            }
        }

+       if (!strncasecmp($userAgent, 'wget', 4)) {
+           /* Offline mode */
+           $this->_isUsingCookies = true;
+           GalleryCapabilities::set('login', false);
+           GalleryCapabilities::set('showSidebar', false);
+       }
+
        if (!isset($_COOKIE[SESSION_ID_PARAMETER]) ||
                $_COOKIE[SESSION_ID_PARAMETER] != $this->_sessionId ||
                $forceResendCookie) {
@@ -569,7 +576,7 @@
        $this->_sessionData = array();
        $this->_loadedSessionData = array();
        $this->_creationTime = time();
-       $this->_remoteIdentifier = $this->_getRemoteIdentifier();
+       list ($this->_remoteIdentifier) = $this->_getRemoteIdentifier();
     }

     /**
@@ -583,8 +590,9 @@
      */
     function _getRemoteIdentifier() {
        $httpUserAgent = GalleryUtilities::getServerVar('HTTP_USER_AGENT');
-       return array(GalleryUtilities::getRemoteHostAddress(),
-                    isset($httpUserAgent) ? md5($httpUserAgent) : null);
+       return array(array(GalleryUtilities::getRemoteHostAddress(),
+                          isset($httpUserAgent) ? md5($httpUserAgent) : null),
+                    $httpUserAgent);
     }

     /**

I will probably put this or something similar into cvs after beta-2 is released next week.

 
JoergSchulz

Joined: 2003-05-21
Posts: 31
Posted: Sat, 2005-04-09 20:53

patch didn't work for me (see PM);
further tests postponed to Monday night (don't feel well, sorry)

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Sat, 2005-04-09 21:29

dunno if that should apply with 'patch' or not.. it's small enough i thought it would be a manual job.

 
jnm11

Joined: 2006-12-31
Posts: 1
Posted: Sun, 2006-12-31 11:11

This thread seems to have died but it is an important issue for me and others.
I've been experimenting to try and get this to work.
I used to use wget with the offline flag in Gallery1, which worked very well.
If you try this in Gallery2, however, with the following
wget --recursive --convert-links --html-extension -p 'http://localhost.localdomain/gallery/'
it gets the whole site and creates a directory structure
./localhost.localdomain/gallery/...
but all the pictures end up in ./ so the links to them are broken. I don't understand this problem. I've looked at the wget documentation but can't figure it out.

httrack does a good job in default mode
httrack http://localhost.localdomain/gallery/main.php
will build a mirror in ./
the problem is with the login links, search bars etc., which cause infinite recursion, so httrack never terminates.
This can be fixed by limiting the recursion depth, but this is ugly and will still result in a huge number of unnecessary files.

The best approach seems to be to turn off as many of the interactive buttons as possible in the theme, but it does not seem possible to disable the login button.
A very nice option would be to have this only display in the top gallery frame.

This can be worked around however by using filters

For example
httrack http://localhost.localdomain/gallery/main.php -localhost.localdomain/*search* -localhost.localdomain/*Login*
will follow all links that do not include "search" or "Login" in their URLs.
Note that in httrack syntax a leading - means "exclude anything matching the pattern".

These links to login etc. are then not followed, but are left pointing to the original web page.
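
A rough sketch of how such an exclude list could be generated, in case more words than "search" and "Login" turn out to be needed. The function name is made up, the words are just the two from the example above, and the script prints the httrack command rather than running it. The filters are wrapped in single quotes so that the shell does not try to expand the * wildcards itself.

```shell
#!/bin/sh
# Print (do not run) an httrack command with one exclude filter per word.
# usage: httrack_cmd START_URL WORD...
httrack_cmd() {
    url=$1
    shift
    host=${url#*://}     # drop the scheme
    host=${host%%/*}     # keep only the host name
    cmd="httrack '$url'"
    for word in "$@"; do
        cmd="$cmd '-$host/*${word}*'"
    done
    printf '%s\n' "$cmd"
}

httrack_cmd http://localhost.localdomain/gallery/main.php search Login
```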

Could this perhaps be added to the FAQ as there seems plenty of interest in static copies?

In the longer term it would be very nice to have a "?static=true" option that gives a gallery view without any links to dynamic actions such as comments or logins.

 
jonmccune

Joined: 2007-07-18
Posts: 1
Posted: Wed, 2007-07-18 15:40

Has any more progress been made here? I would really like to be able to quickly and easily make an offline copy of gallery2 albums.

Thanks,
-Jon

 
mindless

Joined: 2004-01-04
Posts: 8601
Posted: Wed, 2007-07-18 22:58

Don't think so, but there are some tips and ideas above.