Is G2 the solution for a large, commercial image repository?

salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Fri, 2006-01-20 02:01

Hello,

We are trying to create an image repository.

I've read nearly every post in this forum and I've installed G2. Before I get too far along the path, I wanted to ask: is G2 the solution for a large, commercial image repository similar to freefoto or istockphoto?

The solution will need to ...

1. Support upwards of millions of images
2. Support upwards of thousands of users
3. Have an efficient categorization and album structure that is scalable.
4. Have an efficient search/sort method that enables photo users to find what they need and ideally contribute to the tagging of images to constantly improve accuracy.
5. Support Digital Asset Management (DAM) including payment methods.

In reading the forum threads, for example, I found that the User Albums module degrades performance after a thousand users or so. Thus, assigning an album to each photographer might not be the way to go.

The site will be targeted toward two groups of users: photographers and photo users.

PHOTOGRAPHERS
1. Will require a DAM similar to istockphoto.com or the like.

PHOTO USER
1. Will need to be able to find and organize images they'd like to use.
2. Will need to be able to acquire the image (ex. via download, email, code snippet, etc.)

The above is a simple overview. Certainly there are details that will present themselves as we proceed.

Any help is appreciated.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2006-01-20 03:32

http://www.care2.com/ uses a modified G2 version and they've built a couple of services around g2 by writing g2 modules.
they have thousands of users and over a million images in their g2.

> 1. Support upwards of millions of images
> 2. Support upwards of thousands of users
that should not be a problem

> 3. Have an efficient categorization and album structure that is scalable.
g2 organizes content hierarchically in albums. there are no categories, just nested albums.
care2 and other websites have added tagging to g2 or relational categories, but these features are not included in a stock-g2.

> 4. Have an efficient search/sort method that enables photo users to find what they need and ideally contribute to the tagging of images to constantly improve accuracy.
care2 added tagging. there's a user contributed module for tagging too, but it's not mature.
g2 has various sort methods. the search functionality still lacks boolean and full-text search.

> 5. Support Digital Asset Management (DAM) including payment methods.
G2 is not specialized for DAM. there are some modules and features that go in the direction of DAM though.

G2's strength is its extensibility, modularity and quality, as well as the increasing number of official and user contributed modules and themes.

@user albums and performance:
yes, we identified this problem about a month ago and we will have to fix it. we already have plans, but not yet the development resources. though, g2.1 will probably have page and template level caching which should improve performance.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Fri, 2006-01-20 03:53

valiant,

Thank you for the feedback. I'll take a look at care2. One quick question: if assigning an album to each user is not the way to go as far as performance, how have companies like care2 overcome the problem? I'm assuming, without having looked at the care2 site yet, that members of the care2 community each have a "space" for their pictures. This "space" is tantamount to an album ... how then does care2 support thousands of such spaces?

Thank you

 
ckdake

Joined: 2004-02-18
Posts: 2258
Posted: Fri, 2006-01-20 04:02

They haven't changed the idea behind G2, just the way a few things work. If I remember right, they removed the image firewall and some of the permissions code among other things. We are working to make it such that people using Gallery 2 for huge systems won't need to make modifications like this.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Mon, 2006-01-23 01:14

ckdake,

Quote:
they removed the image firewall

If I understand this correctly, the image firewall is what prevents images from being viewed directly (or outside of G2). This says to me that the Eiffel tower at this URL http://www.care2.com/c2c/photos/view/168/771394407/Le_Tour_Eiffell/le%20tour%20eiffell.bmp.html would not be viewable directly at http://dingo.care2.com/pictures/c2c/galleries/albums/168/771394407/Le_Tour_Eiffell/le%20tour%20eiffell.bmp if the firewall were in place?

Have I understood you correctly?

 
ckdake

Joined: 2004-02-18
Posts: 2258
Posted: Mon, 2006-01-23 01:23

the image firewall in gallery2 allows you to store images in a folder not accessible by your webserver. for a url like example.com/view/album/image.jpg, the request is actually routed through a php file that verifies permission, then gets the data from the filesystem and sends it to the browser. Removing the image firewall removes this php "guard" file, so that view/album/image.jpg is an actual path to an image on the filesystem and the image is returned by the webserver and not by a php script.
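
To make the "guard" idea concrete, here is a minimal sketch in Python (G2 itself is PHP, and this is not G2's code). The function name `serve_image`, the `allowed_users` set and the status-code return values are all assumptions made up for illustration; the point is only the order of steps: check permission first, then read from a directory the webserver cannot reach on its own.

```python
import os
import tempfile


def serve_image(storage_root, relative_path, user, allowed_users):
    """Hypothetical guard: verify permission, then read the file from a
    directory that the webserver does not serve directly."""
    if user not in allowed_users:
        return 403, b""
    # Resolve the requested path and refuse anything outside the storage root.
    root = os.path.realpath(storage_root)
    full_path = os.path.realpath(os.path.join(root, relative_path))
    if os.path.commonpath([root, full_path]) != root:
        return 403, b""
    if not os.path.isfile(full_path):
        return 404, b""
    with open(full_path, "rb") as handle:
        return 200, handle.read()


# Small demo with a throwaway storage directory (illustrative data only).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "album"))
with open(os.path.join(demo_root, "album", "image.jpg"), "wb") as demo_file:
    demo_file.write(b"data")

allowed = serve_image(demo_root, "album/image.jpg", "alice", {"alice"})
denied = serve_image(demo_root, "album/image.jpg", "bob", {"alice"})
```

Without the guard, the webserver would hand out album/image.jpg to anyone who knows the path, which is the trade-off discussed in this thread.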

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Mon, 2006-01-23 03:14

or in other words:
- without the image firewall, the webserver decides who has access to download (see) a file and who doesn't. since you don't want users to enter their password into an apache authentication popup window, you can only allow downloading to all users (including non-authenticated / non-logged-in users).
- with the image firewall, the application (Gallery 2) can check permissions to see who is allowed / not allowed to download the file.

this makes sense if you want to be absolutely sure that a user has to be authenticated and authorized to see/download your images/files. setting permissions in a web application to control access to files without having an image firewall doesn't make sense, unless you don't really care about permissions.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Mon, 2006-01-23 03:23

Does the presence of the firewall greatly slow down g2? in other words, why would http://www.care2.com/ remove the firewall AND YET STILL have a membership/photo management scheme?

 
ckdake

Joined: 2004-02-18
Posts: 2258
Posted: Mon, 2006-01-23 04:32

It slows down performance enough to matter for the type of volume they push through.

Even without an image firewall, people can't upload/change photos and albums and can only access images that they know the direct URL to. An image firewall is nowhere near the same thing as a member/photo management system.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Mon, 2006-01-23 06:31

valiant and ckdake,

first off, I want to say "Thank You" and "I appreciate your help" as I work through (stumbling at times -- 4 installs of G2 so far) this testing for our large scale repository.

Current question:

Despite the advice that the User Albums module will degrade performance once thousands of users have been created, I have installed the module for now. Why? Because I did not see how else to have separate albums for each photographer that joins.

Do either of you know whether care2.com did their install as a multisite, so that these two users' galleries are in fact 2 installs of G2 using a common codebase: http://www.care2.com/c2c/photos/view/239/239742579/ and http://www.care2.com/c2c/photos/view/35/282929378/ (as examples), each with common themes to maintain the care2 look?

Getting this to work would require automating the normal install process, such as g2data dir., config.php, database and tables, and site admin username and password to name a few things.

And when either of the two gallery users logs in, they are in effect the site admin of that gallery with many normal admin features disabled (ex. there is no way to manage photo permissions).

And then it follows that the care2 photo home page uses the API to 'stitch together' the many galleries into a coherent launch point http://www.care2.com/c2c/photos/.

Am I thinking along the right lines?

 
ckdake

Joined: 2004-02-18
Posts: 2258
Posted: Mon, 2006-01-23 06:39

IIRC, their code is a fork off of G2 Alpha-4. I think that was before multisite happened. Their stuff is heavily customized, but I gave them as an example just to show that G2 has some issues to be addressed with performance but can be huge with the right care and modifications.

Just set up G2 and use it as you like, and as you run into performance problems, come back here and we'll do everything we can to make sure they get fixed. This is the best way for us to solve this kind of problem, as most of us don't have systems huge enough with enough load to see what's going on with some things, and the fix is often as simple as a new index on a column in a table in the database or changing the way a cache works.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Mon, 2006-01-23 07:06

Thank you.

Quick question: All other things being equal, which would have better performance? 1) A single install having 10,000 user albums each created via the User Album module or 2) 1 codebase with 10,000 multisite installs (separate database tables for each) -- this would essentially be the equivalent of an ISP offering G2 to its customers?

Thanks

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Mon, 2006-01-23 11:55

with the current user album / ACL performance issue, a single install with 10000 user albums is expected to be really slow. so 10000 separate installations would be clearly faster. of course, 10000 separate installations are not as easy to maintain as a single one.
there's still no scripted multisite upgrading. when upgrading, you still have to replace the codebase only once, but run the upgrader 10000 times.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Mon, 2006-01-23 22:28

I've been thinking about this scaling issue and had the following idea.

Challenge: Over 1000 or so user albums, the system degrades to an unacceptable level; however, thousands of separate installs would prove unmanageable.

Solution: clusters of single installs using the User Album module, each capped at < 1000 albums. For example, to effectively have 10,000 user albums we might have 10 single installs of G2, each with 1000 user albums:

domain.com/g1/
domain.com/g2/
domain.com/g3/
.
.
.
domain.com/g10/

Each install would use the same set of themes to maintain look and feel. A custom page would reside at domain.com/index.php that would use the various APIs to "stick together" the various galleries into a coherent launch point. Additionally, rather than 10,000 upgrades, only 10 would be needed.
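
As a rough illustration of the cluster idea only (a sketch, not anything G2 provides): the custom launch page would need some rule for deciding which install holds a given user's album. The function name `install_for_user`, the 1-based user numbering and the fixed cap of 1000 are assumptions for this example.

```python
ALBUMS_PER_INSTALL = 1000  # cap taken from the example above; an assumption, not a G2 limit


def install_for_user(user_number, cap=ALBUMS_PER_INSTALL):
    """Return the path of the install that would host this user's album.

    Users are numbered from 1; users 1..cap go to g1, the next cap users
    to g2, and so on (hypothetical layout matching domain.com/g1/ ... g10/).
    """
    if user_number < 1:
        raise ValueError("user numbers start at 1")
    index = (user_number - 1) // cap + 1
    return f"domain.com/g{index}/"


first_install = install_for_user(1)
last_install = install_for_user(10_000)
```

With this rule, user 1000 still lands in g1 and user 1001 is the first user in g2.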

Q: What optimal User Album cap would you suggest?

Perhaps you can comment and point out any "gotchas".

Thank you

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Mon, 2006-01-23 22:36

i fear 1000 user albums is already too much right now. the situation is bad and it's not fixed. also, i don't like such real-world trade-offs just because this part of g2 isn't yet ready for prime-time.
i'll notify another g2 user/dev who has actually a g2 with a lot of useralbums working. maybe he has more insight into the real-world numbers.
be sure that we will eventually fix the problem, but right now we don't have the resources to do it.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Mon, 2006-01-23 22:54

valiant,

I must say again I really appreciate your help here. Your insight and advice has been invaluable.

Further, I respect that you're honest in pointing out that while G2 is a great solution, it doesn't do everything well and is being improved.

Quote:
i don't like such real-world trade-offs

Are you saying that you would not advise I use my "cluster" idea? If not, given what is available, how would you advise I proceed? If you are suggesting that, given the situation, I try the clustering, what gotchas do you see other than lowering the cap?

Thank you

 
joe7rocks

Joined: 2004-10-07
Posts: 560
Posted: Mon, 2006-01-23 23:14

Hi salimjordan!

I guess i'm the 'another g2 user/dev who has actually a g2 with a lot of useralbums working'.. :)
let me be short now:
'lot'=1600+ users currently with 100K+ pics
speed: would be terrible, but i made some modifications to core code (or 'bad tricks', e.g. speedup in most cases=loss of some feature) just to speed it up a bit, to be able to handle the load (what i can currently handle on my setup with 0.9 average load..)

so as valiant wrote, the situation is quite bad for this kinda usage, and it's not _yet_ fixed.
The biggest problem atm is the lack of resources...
What i can tell you is that i have this problem too, i need a fix and i'm gonna fix it anyway :) though i can start to work on it only in febr, so you would have to wait.. weeks anyway.
(the mentioned page and template level caching will also increase performance, so you might be able to handle more users and more pics with less resources.. though -imho- that's not a real fix _for this_)

so to sum it up in a sentence/answer:
yes, g2 gonna be..

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Mon, 2006-01-23 23:45

joe7rocks,

Thank you. It is comforting to know I'm not alone ...

In your opinion, at what number of User Albums does performance become unacceptable (acceptable = 15 to 20 sec page loads)?

Any comment on the "cluster" solution I describe above?

Thank you

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2006-01-24 00:36

you call a 15-20 sec page load acceptable? that's horrible from a user's point of view.
IMO 5 seconds are the absolute maximum. don't you get really annoyed if you have to wait that long when browsing websites?

I don't know the answer to your question. It depends of course also on the load of the server, whether you have a dedicated server for the database etc.
I'd expect the performance to drop drastically for more than 100/200 user albums if there are a lot of requests, but i've never tested it myself.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Tue, 2006-01-24 01:44

joe,

thanks again. to be honest I was pushing the envelope by saying 15 to 20 secs. i totally agree, but I thought 5 secs was just unrealistic under current conditions.

we do have dedicated servers and databases available. we are a hosting company. just looking for the best way to begin to get off on the right foot.

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Tue, 2006-01-24 06:46

Hello again,

As you are limited on resources, what could we provide to further the rapid development of a scalability fix?

Thank you

 
ckdake

Joined: 2004-02-18
Posts: 2258
Posted: Tue, 2006-01-24 08:04

salimjordan:

Just use G2 and let us know what problems you have. The more specific you can be about the problem, the more we can do to fix it. If you file a bug with the details of a specific problem, we can work on solving it, you can verify that the fix works, and each new problem can be a new bug. We have the resources to handle large scale performance issues this way, just not necessarily the resources to independently set up huge test scenarios with lots of load, find problems, and fix them without real world feedback.

 
joe7rocks

Joined: 2004-10-07
Posts: 560
Posted: Tue, 2006-01-24 19:21

What number: mostly the number of requests matters, not just the number of useralbums
+installed modules (like imageblock etc.) matters (quite a lot) too
+hw config in use

in numbers:
while i can currently handle ~200K hits/day with a 0.9 average load (and the mentioned user/album#), i would expect it (unmodified/typical install of g2.0.2) to die (=lots of page loads over 5secs) with no more than 1000 useralbums/this amount of requests/day on any config at about 2.0GHz cpu/1GB mem/etc. (and i guess it wouldn't be much better with any higher hw config either..)

'cluster': my opinion is quite simple: forget that. :)
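
For a sense of scale, the ~200K hits/day figure can be turned into an average request rate. This is plain arithmetic, sketched in Python; it assumes hits are spread evenly over the day, which real traffic is not, so peak rates would be higher. The function name is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(hits_per_day):
    """Average rate, assuming hits are spread evenly over 24 hours."""
    return hits_per_day / SECONDS_PER_DAY


rate = average_requests_per_second(200_000)
print(f"{rate:.2f} requests/second on average")  # prints "2.31 requests/second on average"
```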

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Wed, 2006-01-25 02:01

joe7rocks,

I'd like to take a look at your gallery; would you provide a URL?

Also, would you be willing to talk privately about getting an optimized, large scale gallery up?

Thank you

 
joe7rocks

Joined: 2004-10-07
Posts: 560
Posted: Wed, 2006-01-25 02:20

url, sure: http://gallery.site.hu . (currently 1600 regged users, 110K+ pics, and slightly over average load as i just gave a test run to imageblock module with randompic..)

support: feel free to ask here, or on freenode #gallery-support, but if you really want you can pm me too _relating this_..

 
salimjordan

Joined: 2006-01-20
Posts: 18
Posted: Wed, 2006-01-25 04:11

Gentlemen,

I've started down the road to getting the repository up.

1. Here is a confirmation email I received for my test user

Quote:
Hello Salim Jordan,

You receive this email because you have registered at
http://www.ambercode.com/dev/main.php?g2_navId=x55664b78
Your username is: sjordan

To finish the registration process please click the following link:
http://www.ambercode.com/dev/main.php?g2_controller=register.ConfirmRegistration&g2_username=sjordan&g2_key=G9759812&g2_navId=x55664b78

In case somebody else abused your email address for this registration
please ignore this email. In this case the registration does not become valid
and you will not receive any further emails.

Thank you!

Please explain why the URL for the site at which the gallery is hosted is http://www.ambercode.com/dev/main.php?g2_navId=x55664b78 as opposed to http://www.ambercode.com/dev/main.php (or even http://www.ambercode.com/dev/)?

2. How do I prevent a user from deleting his own User Album? The User Album functions as essentially the doc root for all additional albums and items the user might add. Deleting the doc root doesn't make much sense. But alas the user is able to do it.

Thank you.

 
ckdake

Joined: 2004-02-18
Posts: 2258
Posted: Wed, 2006-01-25 15:31

salimjordan: it would be helpful if you posted a new topic for each of your issues so they don't get lost in this thread and other people searching can easily find them.

1. The g2_navId is essentially keeping track of previous pages. I've never used the registration module so I'm not sure if it's supposed to show up, but if you post a new thread someone else may know.

2. You should be able to remove core.delete for that user from their user album to prevent them from being able to delete it.

 
nmahlin

Joined: 2005-09-26
Posts: 19
Posted: Tue, 2006-03-21 18:40

First off great job on gallery2! I love working with a robust system like gallery and always look forward to your upcoming releases.

We have been using gallery2 for a film archive that contains users' digitized 8mm, super8 and 16mm films. We have 5000+ movies with 4000+ users and are currently using the user album module. As our archive continues to grow I have been noticing performance issues.

Am I correct in saying that it's not only the number of user albums that causes a performance hit but any movie/image that also has user specific permissions? I was wondering if I would see any performance increase if I got rid of user albums and instead used a theme to give the appearance that a user had their own album. For example there would be an icon called "your movies" which would just display every movie they owned.

Thanks for the help

 
joe7rocks

Joined: 2004-10-07
Posts: 560
Posted: Tue, 2006-03-21 19:08

Quote:
Am I correct in saying that it's not only the number of user albums that causes a performance hit but any movie/image that also has user specific permissions?

Correct.

Quote:
I was wondering if I would see any performance increase if I got rid of user albums and instead used a theme to give the appearance that a user had their own album

Not correct. It wouldn't help performance.

2.1 has great speed improvements for sites like yours. (with appropriate settings, it can dramatically increase performance)
I strongly recommend that you wait for the final release, upgrade to 2.1, and see if it helps (it will).

 
markfh

Joined: 2006-01-17
Posts: 12
Posted: Tue, 2006-03-21 21:05

Is there a release date for 2.1 yet? We're currently using 2.1 RC2a and want to go live with our first client as soon as possible.

Great product BTW. We'll be sending money to y'all for each client we market to.

 
joe7rocks

Joined: 2004-10-07
Posts: 560
Posted: Tue, 2006-03-21 22:51

Yes, there is.
It is planned to be released in no more than 2 days :)

 
hix

Joined: 2006-01-25
Posts: 18
Posted: Sun, 2006-04-16 12:16

Is the performance problem (too many user albums) solved? Or does version 2.1 still have problems?

 
joe7rocks

Joined: 2004-10-07
Posts: 560
Posted: Sun, 2006-04-16 14:20

well.. it really depends.
e.g: define 'too many' and define 'problem' :)
but:
- 'too many' useralbums with 2.1 (and caching turned on) is not a problem itself, it should be fast.
- 'too many' useralbums (e.g. 1000+) AND a lot of AND frequent uploads (e.g. new pics in every 5mins)..well yes, that can be a problem. (it can be slower _for logged in_ users..)

We are certainly still improving the product, including this aspect!
Feel free to give us feedback on performance if you try it. we will be happy to see real numbers/results, and that can help us further improve G2.