Support for DB2 databases?

Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Mon, 2006-09-11 14:40

Some preliminary observations...

The trace generated 2 files... "p692254t1.cli.txt" (duration 35 seconds) and "p741440t1.cli.txt" (duration 21 seconds).
I'm not sure why there are 2 trace files generated, but their timestamps do overlap by about 20 seconds.
Based on your "22 second" observation, I gather the "p741440t1.cli.txt" file is the one that matches the
scenario you described.

Unfortunately, the trace parser seems to be choking on that file. :-( It does seem to work with the
35-second trace though, so I took a look at that.

Of the 35 seconds, the most API time was spent in the SQLExecute() API, a total of 3.5 seconds.
10% doesn't sound unreasonable to me, but I found that 2.5 of those seconds was spent executing
1 statement:

Quote:
SELECT g2_PhotoItem.g_width, g2_PhotoItem.g_height, g2_DataItem.g_mimeType, g2_DataItem.g_size, g2_Item.g_canContainChildren, g2_Item.g_description, g2_Item.g_keywords, g2_Item.g_ownerId, g2_Item.g_summary, g2_Item.g_title, g2_Item.g_viewedSinceTimestamp, g2_Item.g_originationTimestamp, g2_FileSystemEntity.g_pathComponent, g2_ChildEntity.g_parentId, g2_Entity.g_id, g2_Entity.g_creationTimestamp, g2_Entity.g_isLinkable, g2_Entity.g_linkId, g2_Entity.g_modificationTimestamp, g2_Entity.g_serialNumber, g2_Entity.g_entityType, g2_Entity.g_onLoadHandlers FROM g2_PhotoItem, g2_DataItem, g2_Item, g2_FileSystemEntity, g2_ChildEntity, g2_Entity WHERE g2_PhotoItem.g_id IN (4592,4595,4598,4601,4604,4607,4610,4613,4616,4619,4622,4625,4628,4631,4634,4637,4640,4643,4646,4649,4652,4655,4658,4661,4664,4667,4670,4673,4676,4679,4682,4685,4688,4691,4694,4697,4700,4703,4706,4709) AND g2_DataItem.g_id=g2_PhotoItem.g_id AND g2_Item.g_id=g2_PhotoItem.g_id AND g2_FileSystemEntity.g_id=g2_PhotoItem.g_id AND g2_ChildEntity.g_id=g2_PhotoItem.g_id AND g2_Entity.g_id=g2_PhotoItem.g_id"

That's a fairly long IN predicate.

I notice in the trace that you are using a 64-bit Instance (I don't remember if you mentioned
that before, I may have forgotten if you did). I don't know if that is significant, but I don't
know if the other DB2 G2 users are running 32- or 64-bit, so comparing their performance to yours
might not be a fair comparison. Also, it adds one more thing to consider in your situation. (And I don't
have a 64-bit env to test on.)

I'll keep you posted.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Mon, 2006-09-11 15:29

There's 40 items in that IN predicate. The number of pictures on each page.
But even on albums where I only have 5 pictures, the thing still stalls out for 10 seconds or so before it starts to do anything. (with db2sysc at the top of the 'top' list taking 18%cpu during the whole time)

That particular query takes 1.7 seconds to connect, query,terminate from a command line. The query itself returns in 0.8 seconds.

The 32-64 thing. Can't remember if I mentioned it or not, sorry. The instance is a 64 bit instance. THe client instance that apache uses to connect is 32. AIX generally doesn't have any problems with 32 to 64 communications. There were some windows issues with talking to a 64bit instance, in some cases you needed a DB2 connect middle man, but in this case, everything's on AIX.

Why it generated 2 trace files? I don't know, I turned it on, hit 'next' to goto the next page containing 40 images, and turned it off as soon as the page loaded, and that's what showed up. g2 is doing 2 different connects? Not sure why. I was just gald to see something show up. :)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Mon, 2006-09-11 16:23

Can you stick that query into the simple PHP test program?

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Mon, 2006-09-11 17:01
 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Mon, 2006-09-11 18:14

OK, let's recap...

Symptom:

- Slow response time in Gallery2.
- High CPU utilization in db2syscs during periods of slow response time.
- Other DB2 apps perform fine.
- No helpful info in "db2diag.log".

DB2 Configuration:

- 64-bit DB2 8.2 server.
- 32-bit DB2 8.1 FP11 client (PHP/Apache, on same AIX system).
- Database shared with other apps.
- Stats are current.

G2 configuration:

- G2.1.2.
- Size of page seems irrelevant... problem occurs with either 40 thumbs/page or albums with only 5 thumbs.

Blow-by-blow:

- Got DB2 Monitor snapshot on database, DB2 forum folks feel the ratio of "rows read to rows selected" (88:1) is high.
- Running various statements in a simple PHP script shows ibm_db2 faster than ODBC, no performance problems evident.
- Got CLI trace, trace parser not working very well, I'll see if I can fix it.
- On another CLI trace file, the parser identified one SELECT stmt that spent 2.5 seconds in SQLExecute(); it runs sub-second in CLP and in simple PHP script.
- DB2 folks asked me to run Index Advisor ("db2advis.exe") on that stmt, but "db2advis" seems to choke on the stmt. No response from IBM on this yet.

Brainstorming:

- It would definitely be helpful to repro this behaviour outside of the GUI environment.
- It might be helpful to get a db2trc while the db2syscs CPU utilization is high, but how to do that in a GUI scenario this size? Maybe even just take stack traces of DB2 processes at certain intervals? If memory serves me, is it "kill -6 $pid", or something like that?
- It might help to get EXPLAINs of all statements, but not sure how to do that in PHP environment.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Mon, 2006-09-11 18:44

Yep. THat about sums it up. THe database server instance is 8.1fp11 as well (which is 8.2fp3 I think?)

I got db2advis to run on that query.
Here's the output:

$ db2advis -d wwwdb -i p.in

Using user id as default schema name. Use -n option to specify schema
execution started at timestamp 2006-09-11-14.41.11.226570
found [1] SQL statements from the input file
Recommending indexes...
total disk space needed for initial set [ 0.000] MB
total disk space constrained to [ 14.913] MB
Trying variations of the solution set.
0 indexes in current solution
[792.0000] timerons (without recommendations)
[792.0000] timerons (with current solution)
[0.00%] improvement

--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- no indexes are recommended for this workload.

--
--
-- RECOMMENDED EXISTING INDEXES
-- ============================
-- RUNSTATS ON TABLE "G2 "."G2_ITEM" FOR INDEX "G2 "."G2_ITEM79_99070" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "G2 "."G2_DATAITEM" FOR INDEX "SYSIBM "."SQL060512013540680" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "G2 "."G2_FILESYSTEMENTITY" FOR INDEX "SYSIBM "."SQL060512013542920" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "G2 "."G2_PHOTOITEM" FOR INDEX "SYSIBM "."SQL060512013545760" ;
-- COMMIT WORK ;

--
--
-- UNUSED EXISTING INDEXES
-- ============================
-- DROP INDEX "G2 "."G2_CHILD93_52718";
-- DROP INDEX "G2 "."G2_ENTIT6F_76255";
-- DROP INDEX "G2 "."G2_ENTIT6F_35978";
-- DROP INDEX "G2 "."G2_ENTIT6F_44738";
-- DROP INDEX "G2 "."G2_ENTIT6F_63025";
-- DROP INDEX "G2 "."G2_ENTIT6F_60702";
-- DROP INDEX "G2 "."G2_FILES56_3406";
-- DROP INDEX "G2 "."G2_ITEM79_21573";
-- DROP INDEX "G2 "."G2_ITEM79_54147";
-- DROP INDEX "G2 "."G2_ITEM79_90059";
-- ===========================
--

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2006-09-12 15:19

I sent my contacts in DB2 Performance the summary I posted above,and they responded:

Quote:
"Got DB2 Monitor snapshot on database, IBM folks feel the ratio of "rows
read to rows selected" (88:1) is high."

That is indeed very bad ... I think you don't want the ratio to be greater
than 10:1. That should point to a bad plan (not sure why clp would be fast
... unless CLI trace is turned ON. It is OFF, right?).

Stack tracebacks are with kill-32 if its AIX.

I've assured them that CLI trace is normally Off. And I've asked them a few questions
about your "db2advis" output, basically it means DB2 does not have any recommendations
for additional indexes... it is already using what it thinks are the best indexes.

Next time you repro the problem, could you please "kill -36" to dump a stack trace for the
"db2syscs" process? That should tell us what that process is doing (files will be in the same
directory as your "db2diag.log"), but it will be a static snapshot... with no context, it might
not be possible to tell whether that activity is unusual, but it's worth a try.

I'm still trying to think of additional tests we can do, that can hopefully be incorporated into Gallery.

Damn I wish we could repro this outside of Gallery. That could make this so much easier. :-(

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Tue, 2006-09-12 16:04

I did a kill -36 on the db2sysc pid that was stalling/hogging CPU during a page load.
Here's what showed up in db2dump directory

http://www.snyderworld.org/ftp/db2stack

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2006-09-12 22:16

OK, that "t331792.000" file is the only one we're interested in.

It contains two stack traces (two threads). Look for "<StackTrace>" in the file.

The first appears to be irrelevant, but the second one says that the component of the DB2 engine
that is currently active is the Optimizer, processing a NLJOIN.

This is a little out of my area of expertise, but I wouldn't think that the Optimizer should take that long.

I'm still discussing this with my DB2 contacts, but in the meantime, does this suggest anything to you?
Is there anything funky with the default optimization level on your server?

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Tue, 2006-09-12 22:41

Doesn't ring a bell with me. I know of the optimizer existance, but don't really know what it does.
Mines at the default of 5. 15seconds
I change to 9, 13 seconds
I change to 1, 15 seconds

If it's some specfic query that's doing it, but you say your implementation runs nice and speedy. Mabey some module I'm using that your not that has a bum (for DB2) query?

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2006-09-12 23:05

I'm working on instrumenting the Gallery code so that I can tell how long each statement takes.
Just scrambling to find the time to do it. :-(

Hopefully if we can identify a specific problematic statement it might be easier to reproduce
outside of Gallery.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Tue, 2006-09-12 23:11

At least I'm making some progress on serendipity. I got it to install. Getting it to work is another story though....

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Wed, 2006-09-13 02:51

Ok, I'm starting to think it's my PHP now. I don't know.
So I added this on the top of the _query function in

shell_exec('date >> /tmp/g2_querys');
               $command="echo \"SQL [".$sql."]\" >> /tmp/g2_querys";
               shell_exec($command);
               $mtime = microtime();
               $mtime = explode(" ",$mtime);
               $mtime = $mtime[1] + $mtime[0];
               $starttime = $mtime;

and this at the bottom:

$mtime = microtime();
                $mtime = explode(" ",$mtime);
                $mtime = $mtime[1] + $mtime[0];
                $endtime = $mtime;
                $totaltime = ($endtime - $starttime);
                $command="echo \"Query took [".$totaltime."]\" >> /tmp/g2_querys";
                shell_exec($command);
                shell_exec('date >> /tmp/g2_querys');
                shell_exec("echo '=================================================' >> /tmp/g2_querys");

Then I tailed the file (unix tail command you can watch the file contents as they get put in) as I hit 'next' to go to the next page (typical 10seconds or so before it does anything)
The whole time the browser is sitting there spinning (not doing anything, db2sysc is at top taking some CPU) querys are being spit out one by one, a couple a second. PHP is busy the whole time. http process is right under db2sysc, but only taking 1% CPU.
22:40:56 is the time where the page actually loaded started to show contents.

(Why so many querys? Seems like alot just to go to the next page)
So, now I'm really confused.

Here's the file:
http://www.snyderworld.org/ftp/g2_querys

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Wed, 2006-09-13 03:33

I just finished doing something very similar. In _query() I took timestamps at the beginning and the end of
every db2_*() call durint execution of an SQL statement. (I notice that the Prepare() and db2_Execute()
functions are never used.) And I just print it to stdout, not to a file.

By doing so, I can report:

- the time taken in each db2_*() call
- the time spent in PHP between the db2_*() calls,
- the total time spent in this stmt execution.

For some reason I'm having difficulty attaching my adodb-ibm_db2.inc.php here. So here's my _query()
function (Note: Like I think I said earlier, I'm not sure that your original copy of that file and mine
were the same):

	function _query($sql,$inputarr=false) 
	{
	GLOBAL $php_errormsg;
		if (isset($php_errormsg)) $php_errormsg = '';
		$this->_error = '';
		
		if ($inputarr) {
			if (is_array($sql)) {
				$stmtid = $sql[1];
			} else {
//LJM
printf ("\n<br>In _query(), calling db2_prepare(), \$sql = $sql...\n<br>");
				$stmtid = db2_prepare($this->_connectionID,$sql);
//LJM
printf ("\n<br>In _query(), returned from db2_prepare().\n<br>");
	
				if ($stmtid == false) {
					$this->_errorMsg = isset($php_errormsg) ? $php_errormsg : '';
					return false;
				}
			}
			
//LJM
printf ("\n<br>In _query(), calling db2_execute(), with \$inputarr, \$sql = $sql...\n<br>");
			if (! db2_execute($stmtid,$inputarr)) {
//LJM
printf ("\n<br>In _query(), returned from db2_execute(), with \$inputarr, failure.\n<br>");
				if ($this->_haserrorfunctions) {
					$this->_errorMsg = db2_errormsg();
					$this->_errorCode = db2_error();
				}
//LJM
printf ("\n<br>In _query(), returned from db2_execute(), with \$inputarr, success.\n<br>");
				return false;
			}
		
		} else if (is_array($sql)) {
			$stmtid = $sql[1];
//LJM
printf ("\n<br>In _query(), calling db2_execute(), with is_array(\$sql), \$sql = $sql...\n<br>");
			if (!db2_execute($stmtid)) {
//LJM
printf ("\n<br>In _query(), returned from db2_execute(), with is_array(\$sql), error.\n<br>");
				if ($this->_haserrorfunctions) {
					$this->_errorMsg = db2_errormsg();
					$this->_errorCode = db2_error();
				}
//LJM
printf ("\n<br>In _query(), returned from db2_execute(), with is_array(\$sql), success.\n<br>");
				return false;
			}
		} else
//LJM
$totalelapsedtime = 0;
$startdb2time = microtime(true);
if (isset($enddb2time))
{
$elapsedPHPtime = $startdb2time - $enddb2time;
$totalelapsedtime += $elapsedPHPtime;
printf ("\n<br>Elapsed time in PHP = %lf.\n<br>", $elapsedPHPtime);
}
printf ("\n<br>$startdb2time: In _query(), calling db2_exec(), \$sql = \"$sql\"...\n<br>");
			$stmtid = db2_exec($this->_connectionID,$sql);
//LJM
$enddb2time = microtime(true);
printf ("\n<br>$enddb2time: In _query(), returned from db2_exec().\n<br>");
$elapseddb2time = $enddb2time - $startdb2time;
printf ("\n<br>Elapsed time in DB2 API = %lf.\n<br>", $elapseddb2time);
$totalelapsedtime += $elapseddb2time;
		
		$this->_lastAffectedRows = 0;
		if ($stmtid) {
//LJM
$startdb2time = microtime(true);
$elapsedPHPtime = $startdb2time - $enddb2time;
printf ("\n<br>Elapsed time in PHP = %lf.\n<br>", $elapsedPHPtime);
$totalelapsedtime += $elapsedPHPtime;
printf ("\n<br>$startdb2time: In _query(), calling db2_num_fields()...\n<br>");
			if (@db2_num_fields($stmtid) == 0) {
//LJM
$enddb2time = microtime(true);
printf ("$enddb2time: In _query(), returned from db2_num_fields(), == 0.\n<br>");
$elapseddb2time = $enddb2time - $startdb2time;
$totalelapsedtime += $elapseddb2time;
printf ("\n<br>Elapsed time in DB2 API = %lf.\n<br>", $elapseddb2time);
$startdb2time = $enddb2time;
$elapsedPHPtime = $startdb2time - $enddb2time;
printf ("\n<br>Elapsed time in PHP = %lf.\n<br>", $elapsedPHPtime);
$totalelapsedtime += $elapsedPHPtime;
printf ("\n<br>$startdb2time: In _query(), calling db2_num_rows()...\n<br>");
				$this->_lastAffectedRows = db2_num_rows($stmtid);
//LJM
$enddb2time = microtime(true);
printf ("\n<br>$enddb2time: In _query(), returned from db2_num_rows(), \$this->_lastAffectedRows = $this->_lastAffectedRows.\n<br>");
$elapseddb2time = $enddb2time - $startdb2time;
$totalelapsedtime += $elapseddb2time;
printf ("\n<br>Elapsed time in DB2 API = %lf.\n<br>", $elapseddb2time);
				$stmtid = true;
			} else {
//LJM
$enddb2time = microtime(true);
printf ("\n<br>$enddb2time: In _query(), returned from db2_num_fields(), != 0.\n<br>");
$elapseddb2time = $enddb2time - $startdb2time;
$totalelapsedtime += $elapseddb2time;
printf ("\n<br>Elapsed time in DB2 API = %lf.\n<br><br>", $elapseddb2time);
				$this->_lastAffectedRows = 0;
			}
			
			if ($this->_haserrorfunctions) {
				$this->_errorMsg = '';
				$this->_errorCode = 0;
			} else
				$this->_errorMsg = isset($php_errormsg) ? $php_errormsg : '';
		} else {
			if ($this->_haserrorfunctions) {
				$this->_errorMsg = db2_stmt_errormsg();
				$this->_errorCode = db2_stmt_error();
			} else
				$this->_errorMsg = isset($php_errormsg) ? $php_errormsg : '';
		}
//LJM
printf ("\n<br>Total (DB2 and PHP) elapsed time this execution = %lf.\n<br><hr><br>", $totalelapsedtime);
		return $stmtid;
	}

Here's an example of the output for one statement:

1158116828.9707: In _query(), calling db2_exec(), $sql = " SELECT g2_AccessSubscriberMap.g_itemId, MAX(G2_BIT_OR (g2_AccessSubscriberMap.g_itemId, g2_AccessMap.g_permission)) FROM g2_AccessMap, g2_AccessSubscriberMap WHERE g2_AccessSubscriberMap.g_itemId IN (25) AND g2_AccessSubscriberMap.g_accessListId = g2_AccessMap.g_accessListId AND g2_AccessMap.g_userOrGroupId IN (5,4) GROUP BY g2_AccessSubscriberMap.g_itemId "... 

1158116828.9892: In _query(), returned from db2_exec(). 

Elapsed time in DB2 API = 0.018488. 

Elapsed time in PHP = 0.000230. 

1158116828.9894: In _query(), calling db2_num_fields()... 

1158116828.9895: In _query(), returned from db2_num_fields(), != 0. 

Elapsed time in DB2 API = 0.000100. 


Total (DB2 and PHP) elapsed time this execution = 0.018818.

As for why there are so many statements for what seems like a simple operation, believe it or not there
is that much info that has to get retrieved from the database: permissions, etc.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Wed, 2006-09-13 04:38

Here's what I get with your _query

http://www.snyderworld.org/ftp/ljm_query

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Wed, 2006-09-13 12:26

I tried installing eAccelerator, but my PHP (5.1.4) is too new for it. I'm not about to go back.

What I might try is the "Zend core for IBM", they supposedly have a pre-configured setup of php/ibm_db2 specially made for DB2, it's free. Mabey something's not right during the build of php.

Were you using any accelerators on your installation?

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Wed, 2006-09-13 13:57

Interesting... Everything is sub-second. So where is the time being spent?

I realized last night in bed that the PHP time I'm reporting is only the time in PHP between:

  • the first db2_*() API call in _query() and
  • the end of _query().

So I'm not actually reporting the time in PHP:

  • in _query() before the first db2_*() API call (this is probably negligible)
  • in functions other than _query() (this could be significant).

I think I could capture all of that by making the 'total' timer variables object-level attributes,
so that they are preserved across _query() calls. I'll play around with that, but:

  • I think the data we currently have does show that the DB2 engine is returning from executing queries
    sub-second. It doesn't, however, take into account the time it takes to return result sets to the
    client, but that too should be negligible unless your system has severe IPC or TCPIP issues (which is not
    likely).
  • It'll probably be easier to modify your code changes to do something similar. At the beginning
    and end of every _query() invocation, record the current time using microtime(true). Use those
    timestamps (previous exit time - current entry time) to report the time spent everywhere other than
    _query()
    .

No, I'm not using any accelerators. Don't really even know what they are. ;-) And I don't really know
much about "Zend Core for IBM"... all I know is that it contains a copy of the ibm_db2 driver.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2006-09-13 14:26

- most php accelerators are PHP op-code caches. meaning that all the .php, .class, .inc files in G2 do not have to be opened, read and parsed again by PHP.
- zend optimizer is not such a op-code cache, AFAIK, it's just a more aggressive php engine optimzing the op-code a little bit in multiple passes. thus, zend optimizer is usually just a tiny little bit faster than the normal zend engine while op-code caches lead to a speed-up of some multiples.
- to profile the performance outside of the db2 extension and the db engine, i suggest you use dbg profiling or xdebug profiling. those are php debug extensions that also provide a profiling mode. note though that php with an enabled dbg or xdebug extension is really, really slow. so you should only do relative comparisons e.g. G2 + MySQL + xdebug vs. G2 + DB2 + xdebug.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-14 02:07

Well, I'm a little more confused now.
I spent some hours today making serendipity work with DB2.(BTW, tell Mr. Scott I used his initial stuff there, and with some many more modificatoins, I got it working almost ok now) ( http://www.snyderworld.org/serendipity/ )
It's Using the ibm_db2 driver. It's performance it's stellar. Quite a few complex selects, it's running excellent (performance wise, still have some issues on certain functions). No slowdowns or anything. Yes, I added it to my wwwdb database along with all my other wwwdb apps. (I did try moving g2 to it's own instance and database, no change, I think I mentioned that).

Mabey, uggg....um....please don't shoot me, the g2 code is just too bulky?

I'm at a loss at this point. I added trace upon trace, I can't find anything that is stalling. EIther php (httpd) is busy or db2sysc is busy. DB2 itself doesn't show any slowdown whatsoever (I think it's just left open waiting for php to finish talking to it). AIX has loads-o-memory left, 0 disk activity, cpu goes up 1/4 way...browser stall...10 seconds...queries being submitted during the whole time, page starts loading..you see pictures.

So tell me again, Larry, your implementation runs good? No stalling? What levels of php/ibm_db2/DB2 are you at?
I've been searching, I don't find much, is there anything detailed to say about the "ratio of "rows read to rows selected" being out of wack? Any recommended course of action there?

THere's gotta be some function, somewhere that's in g2 that's causing it. Man...it's every page too, try to 'Edit photo', stall.6 seconds..then the page start ot load and you see you image with the options to edit...
Hit login, page loads immediatly, type your id,passwd, stall 8 seconds...page starts to load and logs in.
the mystery is killllling me.....
Ahhhhhhh..

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-14 03:00

BTW, thanks a whole heck of alot for all the help. Keeps me sane knowing there's actual people out there that don't think your crazy. Well, mabey you do think I'm crazy, I don't really know. Mabey I am crazy? :)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Thu, 2006-09-14 03:44

I can't speak for how bulky the G2 code is or isn't, but the answer you'd probably get from the core owners
is that "it works fine on other systems". :-)

Are you still going to try the Zend Core for IBM?

My DB2 contacts tell me that their concern with the rows read/selected ratio is an indicator of at least
one case of a bad access plan being chosen by DB2. The reason for that (99.9% of the time) is that some
statistics are not acurate. But if that were the case I would expect to see that reflected in the
timestamps we took, and frankly I just don't see that.

Other things I still wonder about:

  • Your PHP. You mentioned (http://gallery.menalto.com/node/37648?page=17#comment-185282)that there are 2
    ways to add ibm_db2 into your PHP. Since I'm using a pre-built binary of PHP, all I needed to do was
    download the ibm_db2 binary and enable it in my "php.ini". Is that similar to what you did?
  • But the simple script using ibm_db2 shows no degradation with single statements, which suggests that
    PHP, Apache, ibm_db2, and DB2 are all performing well. What's left? Gallery itself?
  • The timings we got from the DB2 API calls seem to support the suggestion that DB2 is performing well.
  • The 64-bit server-to-32-bit-client thing. I could imagine opportunities for problems there.
  • The fact that db2syscs is using CPU during these stalls. The "kill -36" output seemed odd to me that the
    Optimizer should still be active at that point, but I wonder if there was another client active when you
    took that stack dump. Plus that's only a static snapshot; it might be more informative to take a 15-second
    "db2trc" during a stall.
  • I wonder what would happen if we just copied all the SQL from your scenario (captured by setting
    $gallery->setDebug('buffered')) into a CLP file or PHP script and run them outside of Gallery.

My own configuration is on Windows XP, G2.1.2 GA, Apache 2.0.55, and PHP 5.1.4. I recently upgraded from
DB2 V8.2.2 ESE to DB2 9 Express-C. I have about 1,400 photos, and am using the default Matrix theme, pretty
much uncustomized. I've enabled all Modules except Registration, Search, Cart, Zip Download, RSS,
Comments, GetID3, Rating, Archive Upload, Migration, Nokia Image Upload, Publish XP, and Upload Applet.

I'll pass along to Dan your news about Serendipity, I'm sure he'll be happy to hear it's coming along well
for you.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2006-09-14 09:33

@73blazer:
what's the size of your gallery? #entities, #items, #albums, #users, #groups, #rows in the acl / permission tables? is the imageblock module enabled?, ..
the easiest way to compare performance would be to dump your database and g2data/ dir and set up another db2 instance with the same contents on another server.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-14 13:25

5878 Items in Total. (Which is the reason I upgraded from G1, because it was starting to choke with the amount of images it had to cope with)
9 Top level Albums, each have from 1-5 Sub-albums. 1 or 2 third level albums
2 users
Rows in ACL? What's that?
Imageblock is being used (My wife loves the random image) I also use the external image block command embeded into my homepage, there's not a performance degredation there.

What did I do to get ibm_db2? I had to compile it, then re-compile PHP linking with the new ibm_db2 library. (On AIX, it's almost impossible to find a pre-compiled binary of most open source stuff, and if you do find one, it's usally 3-4 versions old) The documentation said there was another way, the add an extention way, where you just compile it (in extension mode) and add it in your php.ini file. SOunds like that is what you have.
The way I did it, I linked with the ibm_db2 library at php compile time, no entries needed in php.ini for that method. It's a perfectly valid method outlined in the PECL docs. It seems, at least to me in theory, that the linking directly method would be better. But that's just a personal opinion.
I did try the extension method at first. See, so here the thing on AIX here. That rarely works without modifications to the core code of the extension your compiling, because, the way AIX links with other libraries is, apperently, very dirrent than on most platforms. I don't completly understand it myself, but I know, (from trying to get other extensoins to work in PHP, and just having been using AIX 12+ years now), that talking amongst libraries on AIX is a chore when you go outside of the AIX linker. So, compile php with the --with-IBM_DB2 keyword (your compiled ibm_db2 library must be accessible), your php make file will use the AIX linker to link with it, and presto, it works. That being said, I could go back and try the add and extension way again to try and make it work .Sometimes you you screw with the make & .la files files at compile time, I can get it to generate a (more) sharable library that works in extenstion mode from php.ini (I was able to get the eAccelerator to work in extension mode, but alas, my php is too new for it, but the library was called and worked and most importantly, didn't coredump). I would have thought the ibm_db2 driver woudl have worked straight away in extenstion mode with PHP, being that it was mostly written by IBM'ers (was my understanding anyway), usally IBM'er written stuff takes account for the difficulties of working on AIX. When this didn't, I just started swearing to myself, and moved on to the link-at-compile-time method.

I think I'll try:
- Disabaling all extraneous modules
- Larry's suggestion about capturing all SQL and running it standalone PHP or command line
- A full db2trc
- Making IBM_DB2 an extenstion to PHP (rather than built into it)
- Setting it up on another machine. (I have a spare windows box, and a couple of older RS6000's?)

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-14 13:26

Ok, I just deactivated most modules, and performance increased 75%.
Still a slight lag there though going from page to page, but CPU goes to 80% for 3 seconds, and the page appears. 3-4 seconds is way better than 18seconds.
So now what? Activate them one by one to see what the offending module is?

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Fri, 2006-09-15 06:51

So, tell me, what's in this Java jar file that was installed? Some DB2 UDF's? That's a possibility. Java performance absolutly blows on AIX.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2006-09-15 09:06

Correct, a Java UDF and they could be a bottleneck indeed. But if they were, wouldn't you see the slowness in your DB2 SQL profiling?

the UDFs are defined in modules/core/classes/GalleryStorage/Db2CreateUdfs.sql
and one of them depends on the Java UDF to create a user defined aggregate function:
http://svn.sourceforge.net/viewvc/gallery/trunk/java/gallery2/g2_db2_bit_or.java?view=markup

but that relates not to PHP/G2 time. that's time spent in DB2.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2006-09-15 09:07

Correct, a Java UDF and they could be a bottleneck indeed. But if they were, wouldn't you see the slowness in your DB2 SQL profiling?

the UDFs are defined in modules/core/classes/GalleryStorage/Db2CreateUdfs.sql
and one of them depends on the Java UDF to create a user defined aggregate function:
http://svn.sourceforge.net/viewvc/gallery/trunk/java/gallery2/g2_db2_bit_or.java?view=markup

but that relates not to PHP/G2 time. that's time spent in DB2.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2006-09-15 09:11

@ACL:
AccessMap and AccessSubscriberMap tables.
ACL = Access Control List. if you're interested: http://en.wikipedia.org/wiki/Access_control_list

how many rows do those 2 tables have?

@profiling:
how much time per HTTP request is spent in total in DB2 as a percentage and in absolute time?
have slow queries been identified?
else:
maybe it's just your server.
you can check that theory by running the same data on an installation on a "normal" x86 based server + same DB2 version.
and you can do PHP profiling on your db2 server to find out where all the time is lost.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2006-09-15 09:19

It contains a java UDF that is required for core Gallery functionality on DB2.

Specifically, DB2 does not have built-in Bitwise functionality (required in Gallery for aggregation
of Access Control Lists (ACLs)), so it had to be implemented as an external UDF. It was done in java
because we did not want to assume that Users had access to a C compiler.

Look at the bottom of "modules/core/classes/GalleryStorage/Db2CreateUdfs.sql" and you'll see the
"CREATE FUNCTION" statement that creates it. The jar file is 'installed' in
"modules/core/classes/GalleryStorage/Db2Storage.class", function "configureStore".

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Fri, 2006-09-15 14:07

g2.g2_AccessMap = 39

g2.g2_AccessSubscriberMap 17382

So what's AccessSubscriberMap for, is that a normal number. Not sure what that is supposed to represent.. I did use the g1->g2 import for 75% of the images currently there.

Yeah, your right the JavaUDF would be reflected in the query times. Java has been vastly improved in the last couple years. IN the late 90's Java apps ran excrutiatingly slow on AIX machines, IBM spent alot of time improving the java runtime recently.

No slow queries have been identified, that I can find anyway.

PHP Profiling.....I'll give it a whirl....

My Server...umm... I'd agree, except everything else PHP/DB2 wise works without drag. Granted nothing as heavy as g2, yet. Well, my vehicle service tracker (uses odbc), one guy has close to 2800 line items on one vehicle, returns in lighting quick time. I run customers full-blown enterprise (non-php) apps with huge (30gb) DB2 databases (some tables have over 1million entries), and the 4 processers are never slowed. (for troubleshooting purposes, these apps aren't running all the time). Sometihng related with php/DB2 I'd buy though...it might be worth a try to set it up on another machine, I need to buy a case and drive for my old board/processor and I'd have a spare x86 machine. (AMD 2200).

Thanks for the help guys.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2006-09-15 14:29

@ACL table cardinality:
your numbers sound about right.
39 rows in AccessMap, probably about 12 different ACL ids for 2 users. that's reasonable and that's small.
17k ACL subscribers sounds also very reasonable.

So this is not the source of your problem.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Sat, 2006-09-16 15:06

I'm playing with trying to adjust the DB2 driver to spit out more information.
The only abnormal thing I see thus far is the amount of connects it does to the database:

DUring a page load I saw my "going for db2_connect" message 42 times? I just put a log statment in the top of function _connect. 42 connects to the database seems a bit much. I think it was one for each picture on the page, plus 2 more. I see now why activate database is nessesary.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sat, 2006-09-16 17:47

g2 does 1 db connect per http request that needs the database.
downloads of public images (thumbnails, resized images, originals) usually don't need a db connection.
so for a normal album page request, you get a total of 1 db connection, none for the image http requests.

if it's a private album, then yes, you can have like 10-20 db connections for a single page, because it needs the database for each image download.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Sat, 2006-09-16 18:14

It's a public album. I've no private albums. So, then from what your saying, that shouldn't be happening?

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Sat, 2006-09-16 20:37

Hmm, I just did the same thing and when I page forward from one page of 9 items to another page of 9
items, I go through _connect() only once.

I put some more debug printfs in ADOdb and I see this happening, but only once:

entering ADONewConnection()
calling ADOLoadCode('db2')
returning from ADOLoadCode('db2')
leaving ADONewConnection()

entering Connect()
$forceNew is set, so:
- it calls _nconnect()
- _nconnect() calls _connect()
- _connect executes $this->connectStmt
- _connect() returns
- _nconnect() returns
leaving Connect()

I have no idea why you are going through it so many times?

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Wed, 2006-09-20 12:57

OK, I just checked again, to ensure I wasn't doing something wrong. I'm going through _connect twice to begin the page, and then once for each item on the page. On a page with 40 thumbnails, that's 42 connects to the database. So, where do I look why that is happening? Valient mentioned that if it's a private album that would happen, but none of my albums are private. Did my items get flagged improperly when I imported from G1 to G2? I did have some problems during the import, namely the utility simply skipped over a few pictures, but that's it really.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2006-09-20 13:23

all public thumbs / resizes / photos have a cache file in g2data/cache/entity/x/y/xyz-fast.inc, e.g. g2data/cache/entity/4/3/437-fast.inc.
disable the url rewrite module if it's active.
browse to your g2 to a album page. right-click on a thumbnail to find out its image location (DownloadItem URL).
from the URL, you got the id of the thumbnail.
check if a -fast.inc file exists for that thumbnail in g2data/cache/entity/.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Wed, 2006-09-20 13:55

Yes, an fast.inc file exists for the thumbnails I checked.
Example:
<?php function GalleryFastDownload() { return $GLOBALS['gallery']->fastDownload('cache/derivative/1/8/18887.dat', 'IMG_6321.JPG', 'Sat, 16 Sep 2006 03:13:09 GMT', 'image/jpeg', 13544);} ?>

I did have URL Re-write enabled, so I could copy pure resized image URL's for use in forums and such.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2006-09-20 14:19

enabling url rewrite with mod_rewrite or isapi rewrite doesn't change the behavior.

the rewrite module with pathinfo leads to a performance hit. i guess it will do a db connection for each download in that case.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Wed, 2006-09-20 22:09

73blazer:

If the "rewrite using pathinfo" is not causing the multi-connect problem, then I would start debugging
in main.php.

Valiant:

Perhaps if he dumps the call stack every time he goes into _connect(), that should tell us the code path
it is taking, from which we might be able to tell *why* it is happening.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-21 12:38

I disabled the mod_rewrite module and re-enabled my connect traces. I only see 2 connects per page then.
Doesn't really seem to be faster either, traces show about 0.5 to 1 second faster with the mod_rewrite module deactivated.. So *those* connects and selects from the database are working speedily.
So...hmm... I need more traces to see where that lag is coming from...
So to load a page, what php files would be good to stuff a trace in? main.php?

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-21 12:55

I've narrowed the stall down to this line in main.php:

list ($ret, $results, $theme) = $view->doLoadTemplate($template);

So, I'll go find doLoadTemplate now..
Any meaning here....

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-21 15:10

I've narrowed it down furthur to, well, a DB query.
In DB2 the query takes 0.2 seconds to return.
From G2, it takes 3 - 5 seconds...
Here's the line
$recordSet = $this->_gs->_db->Execute($query, $targetIds);
in
./modules/core/classes/GalleryStorage/GalleryStorageExtras.class
I put a trace before and after, it takes 3 - 5 seconds to execute this line...

Still looking furthur......

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-21 16:06

Ok,
Here's the skinny.

That query, the one we identified earlier with the long IN predicate, well I took it and put in a php script that runs it with the ibm_db2 driver.
Returns in 0.5 seconds, like I said earlier.
However, I wasn't using prepare and execute, I was just issuing db2_exec.
I re-organized my test script to do a prepare/execute and
db2_prepare, then db2_execute, and it takes 3.5 seconds.

Furthur more, when I re-organized the query to say and (x=1 or x=2 or x=3) instead of x IN (1,2,3) the query returned in 0.5 seconds again.
If I execute that query directly in DB2 command line with the IN it's 0.5 seconds.
I changed the IN to only 1 item in the php test script,using db2_prepare and db2_execute it's 3.2 seconds.

So, it appears the ibm_db2 driver, at least the version I'm on,has a problem with IN predicates with prepared statments.
That query, and others with IN clauses are called a few times during a page load, and with more modules enabled like I had, even more times, thus my 15 seconds stall. Disabling several moduels I wasn't really using I'm down to a 3-4 second stall. So, no so bad.

But the problem is existing in the ibm_db2 driver.
What to do about it,I'm not sure.

Larry, Do you know what version of the ibm_db2 driver you have? I have 1.2.1
You should be able to tell with phpinfo() in the ibm_db2 section.
Although my version says 1.2.0 there, but I've 1.2.1. They left the text as 1.2.0 in the thier code for that version I can see by purusing the .c files.

There's a 1.2.3 version that has a few items in the change log ( http://pecl.php.net/package-changelog.php?package=ibm_db2&release=1.2.3 )
but nothing I can see related to my problem. Perhaps and older version might work better? Mabey I'll try and dig through the ibm_db2 code to see what's the problem...

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2006-09-21 17:20

A 0.5 second query is already pretty long. 3.5 totally unacceptable.
quoting your query:
SELECT g2_PhotoItem.g_width, g2_PhotoItem.g_height, g2_DataItem.g_mimeType, g2_DataItem.g_size, g2_Item.g_canContainChildren, g2_Item.g_description, g2_Item.g_keywords, g2_Item.g_ownerId, g2_Item.g_summary, g2_Item.g_title, g2_Item.g_viewedSinceTimestamp, g2_Item.g_originationTimestamp, g2_FileSystemEntity.g_pathComponent, g2_ChildEntity.g_parentId, g2_Entity.g_id, g2_Entity.g_creationTimestamp, g2_Entity.g_isLinkable, g2_Entity.g_linkId, g2_Entity.g_modificationTimestamp, g2_Entity.g_serialNumber, g2_Entity.g_entityType, g2_Entity.g_onLoadHandlers FROM g2_PhotoItem, g2_DataItem, g2_Item, g2_FileSystemEntity, g2_ChildEntity, g2_Entity WHERE g2_PhotoItem.g_id IN (4592,4595,4598,4601,4604,4607,4610,4613,4616,4619,4622,4625,4628,4631,4634,4637,4640,4643,4646,4649,4652,4655,4658,4661,4664,4667,4670,4673,4676,4679,4682,4685,4688,4691,4694,4697,4700,4703,4706,4709) AND g2_DataItem.g_id=g2_PhotoItem.g_id AND g2_Item.g_id=g2_PhotoItem.g_id AND g2_FileSystemEntity.g_id=g2_PhotoItem.g_id AND g2_ChildEntity.g_id=g2_PhotoItem.g_id AND g2_Entity.g_id=g2_PhotoItem.g_id"

so this is a IN predicate with 40 values. I guess that means that you show album pages with 40 thumbnails, which is fairly large, compared to about 12-16 in the normal case.
But it still shouldn't take 0.5 seconds. Nothing complicated in that query, just joining a few tables on their PK, all conditions matching indices.
So 0.5 seconds for that is IMO very slow.

And yes, good catch. The 3.5 vs 0.5 seconds issue is also huge.

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-21 17:41

Well, the 0.5 seconds in the test script includes a connect, display the results and close. Same from command line (I put connect;select xxxx;terminate; in a script).
If I shift my timer to just around the db2_exec (so after the connect and before the close) in the php script, it's 0.0083010196685791 seconds
The prepare/execute version goes from 3.5 to 3.1278390385883134 seconds.
Yeah, shouldn't include the connect.

It doesn't make much difference how many things are in the IN () clause, like I said I knocked it down to 1, and same stall.

Is that more like what you'd expect?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2006-09-21 18:39

0.0083s is fine I guess. Good. So that's even more dramatic, from 0.083 to 3.13 seconds!

The ball is now in the hands of Larry / DB2 engineers. :)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Thu, 2006-09-21 18:54

My version of 'ibm_db2' is 1.4.10, according to phpinfo().

Gallery uses db2_exec(), not db2_execute(). At least for me. Are you saying that it is using
db2_prepare() and db2_execute() for you?

 
73blazer
73blazer's picture

Joined: 2006-05-12
Posts: 79
Posted: Thu, 2006-09-21 21:50

Hmm..your using an 'alpha' version. I had used the latest 'stable' version when I upgraded to G2 which was before june.
Yeah, I meant the other way around. Sorry, I'm a bit dyslexic-ish.They are running through db2_exec.
I usally prepare everything from the way I call the DB2 API's in some of my C++ apps, it almost always required for what's going on, just the way I'm used to doing it.
My original test script used prepare/execute. It's db2_exec that is slow with IN clauses.
Well, that's good, because in the changelog, 3 newer versions says db2_exec performance increase.
Serendipity has only 1 IN clause, which I hadn't gotton to testing that piece of functionality yet. Sure enough, stall when I picked the feature that had the IN clause.

I'm not really sure why my original g2_querys trace (which trace input/output from _query function) didn't show any long traces, because I re-activated that trace and I see two 2.5/3 second queries going from page to page. Who's knows, I've hacked the code for traces so much at this point, I may have deleted some lines I shouldn't have, I really need to refresh from the source tar file. I've also disabled many modules, and re-enabled some again, so that may effect how many queries and what type of queries are coming through.

I'll try upgrading my ibm_db2 driver, I'll have to compile it, and re-compile php so it links with it.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2006-09-21 22:39

Also, please use the latest nightly snapshot of G2 (or svn) when testing performance.
Not that we miss an obvious reason just because we don't compare performance figures of different versions.