Support for DB2 databases?

Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-11 20:42

I'm not having any luck so far. Maybe Sphericus can help (this is the first time I've seen the MTK, so I'm not really familiar with it and I might be missing something).

First of all, I don't see any 'replacement UDFs' in the MTK. My understanding that the MTK 'generates' code, it does not 'provide additional UDFs'.

What I did with the MTK was:

1) Read the doc :-)
2) Start the MTK app
3) Created a new Project. For the 'From' platform I selected "MS SQL Server". (Other choices were Sybase, Oracle, Informix.)
4) Open up the SQL Translator, copied the query into it:

Quote:
SELECT iam0.g_itemId, COUNT(iam1.g_itemId) FROM g2_ItemAttributesMap AS iam0, g2_ItemAttributesMap AS iam1, g2_AccessSubscriberMap WHERE iam0.g_itemId IN (7) AND iam1.g_parentSequence LIKE iam0.g_parentSequence || iam0.g_itemId || '/%' AND iam1.g_itemId = g2_AccessSubscriberMap.g_itemId AND g2_AccessSubscriberMap.g_accessListId IN (8,9) GROUP BY iam0.g_itemId

5) Clicked on the "Convert" button.

The conversion failed with various 'Unexpected Syntax' errors.

Then I tried setting the 'From' platform to Oracle, and got similar errors.

The end result is that I cannot get the MTK to convert this statement.

Sphericus, do you know if I'm doing something wrong?

Thanks.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-11-11 20:45

- i guess there might be a directory with source code files for UDFs
maybe something like mssql/like/ or mssql/udf/...

- also, maybe use CONCAT instead of || or use a much simpler query.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-11 21:09

There is NO code in the MTK. Take my word for it. :-) All there is is two directories... "doc" (the documentation) and "jre" (the class files which make up the MTK itself).

I've now managed to get a very simple scenario to work.

Quote:
SELECT mycolumn1 FROM mytable WHERE mycolumn2 = 'xyz'

But it does not like either form of concatenation:

Quote:
SELECT mycolumn1 FROM mytable WHERE mycolumn2 LIKE 'Prefix' || '%')

SELECT mycolumn1 FROM mytable WHERE mycolumn2 LIKE CONCAT('Prefix', '%')

Both of them choke with "Unexpected Syntax" on any kind of concatenation.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-11 21:12

FYI, it chokes even on concatenation in a "=" predicate:

Quote:
SELECT mycolumn1 FROM mytable WHERE mycolumn2 = 'Prefix' || 'Suffix'

SELECT mycolumn1 FROM mytable WHERE mycolumn2 = CONCAT('Prefix', 'Suffix')

It seems to me like the MTK doesn't support concatenation.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-11-11 21:20

we'll have to wait for Sphericus for a real answer.

but we're not interested in concat / a replacement for concat.
we're just interested in a replacement for LIKE
so maybe try

SELECT mycolumn1 FROM mytable WHERE mycolumn2 LIKE mycolumn3;

or something like that... yeah. won't help, i'm pretty sure :/

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-11 21:40

Well it translated successfully, but the output statement is exactly the same as the original statement. :-)

 
sphericus

Joined: 2005-11-06
Posts: 55
Posted: Sat, 2005-11-12 04:02

There is UDF code in the MTK package.
All the UDF codes are in the root directory of the MTK installation, typically c:\MTK.
The UDF are in files *.UDF with the executables JAR files with similar names. A number of the LIKE functions are provided by the compiled java they provide. But certain functions hey were able to provide without the java.

Also, Larry, I believe that the output would be very similar or the same as the original as the MTK probably acknowledges that the new LIKE it has can do it without trouble.

If you have a look at those. I found there wasn't any LIKE UDF's for oracle, but there were for Sybase and MS SQL.

Also while we are on the point, I assume you are familiar with the differences between "DB2" and "DB2/400 (a.k.a. DB2 for iSeries)" ?

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Sat, 2005-11-12 04:28

Ah, hiding in plain sight. I was looking only at the navigation tree, which only shows subdirectories, I never actually opened up the C:\MTK folder. I'm normally a command-line bigot, but in recent years am trying to force myself to use GUIs like Windows Explorer. And this is the result. I despise GUIs.

OK, so I'm looking at the UDF files, and will try to figure out how they are used.

Any idea why the SQL Translator doesn't seem to be working?

Yes, I've worked with iSeries (a.k.a. AS/400) a little. DB2 on that platform is significantly different from its Linux/UNIX/Windows cousin.

Thanks, Sphericus.

 
sphericus

Joined: 2005-11-06
Posts: 55
Posted: Sat, 2005-11-12 05:06

No idea why the SQL translator isn't working to be honest.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Sat, 2005-11-12 05:46
Quote:
No idea why the SQL translator isn't working to be honest.

OK, well, that makes two of us. :-)

I now see roughly how the UDF files are used. They provide UDFs in either of two languages... some are implemented in SQL/PL, others in java. The 'LIKE-equivalent' UDFs are unfortunately written in java, which means:

- They depend on an external ".class" file (like I mentioned a couple of days ago), and
- The UDFs don't 'replace' existing UDFs, instead they are cataloged as some other name, e.g., "MS7.isLike1 (...)". Applications would have to invoke the UDF using this other name.

External UDFs mean that extra level of complexity due to the external file dependency. And another drawback of this method (IMHO) is that (AFAICS) they don't seem to provide the java source for these UDFs... they provide only the binary ".jar" file, which makes our problem determination more awkward.

To be honest, I still prefer the solution I proposed ("LOCATE ()"), because:

- It's not UDF-based at all.
- It uses only DB2-provided SQL.
- The implementation of it in the G2 code is exactly the same as other existing platform-specifics... a simple call to $storage->getFunctionSql(). No new layers of complexity being added.

Any concerns if I stick with the LOCATE () solution?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sat, 2005-11-12 16:00

Summary of the LIKE / CONCAT issue:
------------------------------------
DB2's LIKE statement can't deal with COLUMN data on the right side. We use AND iam1.g_parentSequence LIKE iam0.g_parentSequence || iam0.g_itemId || '/%' in two places, a LIKE statement with COLUMN data on the right side.
There are two alternatives to deal with it: a replacement for the whole clause:
1. LOCATE(iam0.g_parentSequence || CHAR(iam0.g_itemId) || '/', iam1.g_parentSequence) = 1
2. replace DB2's LIKE with a user defined function (UDF) "MS7.isLike1", so we'd call $storage->getFunctionSql('LIKE', $likeCriterion)

Locate solution:
------------------
Pro's:
+ it's simple and it works
+ don't have to ship with .jar
Con's:
- it's a ridicoulos level of abstraction replacing a LIKE + CONCAT(column, data) with a new statement. Other DBMS (mysql, oracle, pg) need to replace the CONCAT
actually we have to find every occurrence of a LIKE statement in G2 and make sure the stuff on the right isn't a column. We can't guarantee correct functionality of DB2 with G2 if not each and every module has been inspected for LIKE statements.
- for DB2 we have to replace the whole statement, for other DBMS just the CONCAT, it's more complicated than now

LIKE replacment:
-------------------
Pro's:
+ We'd just have to introduce a getFunctionSql() case for LIKE and it guarantee to work with all LIKE statements, it's much cleaner
Con's:
- Is there a license issue with this .jar file? Can we ship it with our GPL code?
- we'd have to be consequent and replace all LIKE statements with getFunctionSql(), but actually we'd have to replace it only in 2 places

Just to be clear:
- Shipping G2 with an additional .jar isn't a problem. In our install code it would register / declare this UDF

What statements would need to be replaced?
- modules/core/classes/GalleryItemHelper_simple.class LIKE + CONCAT(columns)
- modules/core/classes/GalleryItemHelper_simple.class LIKE + CONCAT(columns) (quite the same

that's it...for now. I searched for grep -r LIKE *, so hopefully no statements used lower-case "like".

Let's also see what bharat thinks about the two alternatives.

 
bharat
bharat's picture

Joined: 2002-05-21
Posts: 7994
Posted: Sun, 2005-11-13 23:05

I spent some time going through the code to see if we could change the way that we do this parent sequencing code to avoid this problem altogether, but I don't see an alternative that we could go to. So if we have to do this, I think that we should aim for the solution that has the lightest footprint on the code so that it's the most maintainable. Therefore I'm in favor of the 2nd (LIKE replacement) solution.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Mon, 2005-11-14 06:14

First, I don't see how solution #2 is lighter in footprint or maintainability. As for the porting effort, that's what I'm doing now. Both solutions require us to use $storage->getFunctionSql().

Second, the licensing question is a good one. I don't think G2 can legally redistribute any part of it. I've pasted the MTK's EULA into a Word doc nd am attaching it here.

Third, we don't even know if it will work yet, because:

1) I haven't done any testing of it yet.

2) There are still any number of unknowns with this UDF. There doesn't even seem to be any documentation on how to use it.

3) There seems to be several different versions of this UDF, labelled:

- mtkms7.udf
- mtkms7ISeries.udf
- mtkms7ISeriesV5R2.udf
- mtksyb.udf
- mtksybISeries.udf
- mtksybISeriesV5R2.udf

Which version should I try to use? I dunno. What's the difference between them? Beats me. Why does iSeries have its own version of each of the UDFs? I have no idea.

4) If it breaks in the future, we don't have the java source. Do you really want to rely on IBM to support this? Read the EULA. ;-)

 
sphericus

Joined: 2005-11-06
Posts: 55
Posted: Mon, 2005-11-14 10:49

Well, I would expect the ms7 one to be like in MS SQL 7, and Sybase, to be like Sybase.
iSeries needs a seperate one as DB/400 a.k.a. DB2 for iSeries, is quite a different kettle of fish.

Thus with the choice of Sybase compatibility UDF's or MS SQL 7 compatibility UDF's I think it would be up to the G2 developers which one they choose. As they may prefer the Sybase or MS SQL commands over the other. Including but not limited to the LIKE predicate.

I will have a word with the IBM/DB2 people about licensing of the MTK and distribution with G2.
I am not overly familiar with the GPL, but can you distribute something that is binary only with a GPL product?
Or what you could do, is like with most linux distributions, is put a link to download the jar file from somewhere, into the release or install notes. As it would only effect DB2 users anyway, I am sure that it would not be a great problem, if you did that. As thats what people running nvidia graphics cards have to do when running SuSE anyway, although now you can download it via YOU, it is still the same issue, whereby they didn't feel that they were able to ship the drivers with their product.

Regardless, I will have a word with IBM about this tomorrow, and see what they say, if we are lucky, they may even be able to see if they can release the source code for that specific UDF to open source.

All the best.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Mon, 2005-11-14 13:26

@License:
Unless the UDF is open source, we'd most probably have to ask during step 5 of the installer (db step) to download the jar and put it into dir so and so if DB2 was selected. then it's not problem for us.

@why is the LIKE replacement better for us?
it's easier to maintain than the LIKE + column / LIKE + CONCAT replacement.
for the LIKE replacement, we'd only have to add the LIKE case to the db abstraction classes and return the same in all existing db classes and return the MS7.LIKE in the DB2 class.
what would be needed for the LIKE + CONCAT replacement?
an example from GalleryItemHelper_simple.class:
we use the following for our current db abstraction:

	list ($ret, $concat) = $storage->getFunctionSql(
	    'CONCAT',
	    array('[GalleryItemAttributesMap=1::parentSequence]',
		  '[GalleryItemAttributesMap=1::itemId]',
		  '\'/%\''));
...
	$query = sprintf('...
          AND
          [GalleryItemAttributesMap=2::parentSequence] LIKE %s
...
        ', ..., $concat, ...);

for DB2, we need something like
[code[
list ($ret, $likeConcat) = $storage->getFunctionSql(
'LIKECONCAT',
array('[GalleryItemAttributesMap=2::parentSequence]',
'[GalleryItemAttributesMap=1::parentSequence]',
'[GalleryItemAttributesMap=1::itemId]',
'\'/\''));
...
$query = sprintf('...
AND
%s
...
', ..., $likeConcat, ...);[/code]
or something like:

	list ($ret, $concat) = $storage->getFunctionSql(
	    'CONCAT',
	    array('[GalleryItemAttributesMap=1::parentSequence]',
		  '[GalleryItemAttributesMap=1::itemId]',
		  '\'/\''));
	list ($ret, $likeConcat) = $storage->getFunctionSql(
	    'LIKECONCAT',
	    array('[GalleryItemAttributesMap=2::parentSequence]',
		  $concat));
...
	$query = sprintf('...
          AND
          %s
...
        ', ..., $likeConcat, ...);

The challenge would be to have a single replacement for DB2 and our existing classes.
we don't want a if DB2 else statement in our application code. so we want something like this:

	list ($ret, $concat) = $storage->getFunctionSql(
	    'CONCAT',
	    array('[GalleryItemAttributesMap=1::parentSequence]',
		  '[GalleryItemAttributesMap=1::itemId]',
		  '\'/\''));
	list ($ret, $likeConcat) = $storage->getFunctionSql(
	    'LIKECONCAT',
	    array('[GalleryItemAttributesMap=2::parentSequence]',
		  $concat));
...
	$query = sprintf('...
          AND
          %s
...
        ', ..., $likeConcat, ...);

and LIKECONCAT in mysql, pg and oracle would append a '%' if possible ('\'/\'' -> '\'/%\'' is not just appending. Maybe just concat it with another '\'%\''. And db2 would use locate() instead. or the other way: your DB2 LIKECONCAT function should check for '%' and remove it. and if there was a % at the beginning, you need to check for > 0 instead of = 1. well that would make your DB2 LIKECONCAT quite a bit more flexible and more attractive.

that would work for our LIKE + CONCAT. Maybe we should call this function LIKECOLUMNPLUS or so :/, our specific replacement is quite unflexible anyway, it's not a genenral like. there could be more and more statements that need a different replacement.

here the code needed if we used the UDF LIKE:

	list ($ret, $concat) = $storage->getFunctionSql(
	    'CONCAT',
	    array('[GalleryItemAttributesMap=1::parentSequence]',
		  '[GalleryItemAttributesMap=1::itemId]',
		  '\'/%\''));
	list ($ret, $like) = $storage->getFunctionSql(
	    'LIKE',
	    array('[GalleryItemAttributesMap=2::parentSequence]',
		  $concat));

...
	$query = sprintf('...
          AND
          %s
...
        ', ..., $likeConcat, ...);

so given that your Locate / likeconcat replacement would do checking for '%' and handle it correctly, the LIKE + CONCAT alternative is about as easy / small in code footprint as the UDF LIKE replacement.

the question is now:
when would the LOCATE() like replacement not work?
- e.g. if we have a like pattern like '%the beginning does not mattter but the last characters must match!' of course the pattern would be a result of concat'ing columns or it would just be a column.
this case could maybe be fixed with a more complicated SQL replacment where we also use = LENGTH(iam1.column) - LENGTH(match)
- e.g. a LIKE pattern like '%somewhere % somewhere later % and so on %'
have fun finding a replacment SQL for this. of course the pattern would be the result of concat or so.

whereas we can expect a UDF LIKE replacement to just work as expected, i.e. work as the LIKE operator in ms sql / oracle, ...

the UDF LIKE replacement would be something that we wouldn't have to care much about. if we find another statement that needs it, just plug it in. if we go with the locate(), we'd have to make sure that the locate() also works for the new case.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Mon, 2005-11-14 20:37

Well, I've now done some preliminary testing of the MTK UDFs, and here's what I've found.

1) There are actually only two replacements for LIKE: ms7.isLike() and syb.isLike().

2) In both cases, they still have the same problem as the regular LIKE predicate... they don't support column names in concatenation.

Quote:
SELECT iam0.g_itemId, COUNT(iam1.g_itemId) FROM g2_ItemAttributesMap AS iam0, g2_ItemAttributesMap AS iam1, g2_AccessSubscriberMap WHERE iam0.g_itemId IN (7) AND SYB.isLike(iam1.g_parentSequence, iam0.g_parentSequence || iam0.g_itemId, '') = 1 AND iam1.g_itemId = g2_AccessSubscriberMap.g_itemId AND g2_AccessSubscriberMap.g_accessListId IN (8,9) GROUP BY iam0.g_itemId
SQL0440N No authorized routine named "||" of type "FUNCTION" having
compatible arguments was found. SQLSTATE=42884

Concatenation of string literals do work.

Quote:
SELECT iam0.g_itemId, COUNT(iam1.g_itemId) FROM g2_ItemAttributesMap AS iam0, g2_ItemAttributesMap AS iam1, g2_AccessSubscriberMap WHERE iam0.g_itemId IN (7) AND SYB.isLike(iam1.g_parentSequence, 'x' || 'y' || '/%', '') = 1 AND iam1.g_itemId = g2_AccessSubscriberMap.g_itemId AND g2_AccessSubscriberMap.g_accessListId IN (8,9) GROUP BY iam0.g_itemId

G_ITEMID 2
----------- -----------

0 record(s) selected.

(And yes, I also tested CONCAT() instead of '||'.)

They also do work with a single column name (no concatenation):

Quote:
SELECT iam0.g_itemId, COUNT(iam1.g_itemId) FROM g2_ItemAttributesMap AS iam0, g2_ItemAttributesMap AS iam1, g2_AccessSubscriberMap WHERE iam0.g_itemId IN (7) AND SYB.isLike(iam1.g_parentSequence, iam0.g_parentSequence, '') = 1 AND iam1.g_itemId = g2_AccessSubscriberMap.g_itemId AND g2_AccessSubscriberMap.g_accessListId IN (8,9) GROUP BY iam0.g_itemId

G_ITEMID 2
----------- -----------

0 record(s) selected.

I grant you that the LOCATE() solution won't be as easy as I thought, but I now don't think the MTK UDFs solution is going to fly at all.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Mon, 2005-11-14 21:18

thanks for testing.
i guess you still could write a SQL/PL / Java UDF on your own for DB2 that does actually work, but i tend to "let your go" and accept the LOCATE replacemnt.

let's just shortly explore how such a PL/SQL function would look like:
- handle special characers _ and % as single character and multicharacter wildcards
http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html
-> _ matches exactly one character
-> % Matches any number of characters, even zero characters

- preferably 100% replaces LIKE (no getSqlFunction call required), but a functin with another name is also fine

from the sql 92 standard: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

Quote:
<like predicate> uses the triadic operator LIKE (or the inverse,
NOT LIKE), operating on three character strings and returning
a Boolean. LIKE determines whether or not a character string
"matches" a given "pattern" (also a character string). The char-
acters '%' (percent) and '_' (underscore) have special meaning when
they occur in the pattern. The optional third argument is a charac-
ter string containing exactly one character, known as the "escape
character", for use when a percent or underscore is required in the
pattern without its special meaning.

then on page 213 section 8.5 there's more.
- we don't need the escape character feature right now, so this wouldn't have to be implemented

<like predicate> ::=
              <match value> [ NOT ] LIKE <pattern>
                [ ESCAPE <escape character> ]

so we have a function with 2 operands, and we iterate through <pattern> character by character with current character c to match current character mo f m of <match value> (c++, m++ means to set c/m = next character in the match/pattern string):

while (c) {
  if c == '_') {
    // doing already everything after if elseif else clause
  } else if (c == '%') {
    if c == last character of <pattern>, return true;
    else { 
      c++;
      do: if new c != m, m++; while (m not end of string)
      if (m end of string) { return false; }
    }
  } else (not _ or %):
    if c != m,
     return false;
  }
 c++;
 m++;
} // end while c

i'm not a PL/SQL guru, but that should be feasible.
if someone knows DB2 SQL/PL and feels like implementing, please do.
else we go with LOCATE.

Larry Menard
is there anything else that needs to be handled, what's holding you off from running g2/db2 right now? any other issues?

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Mon, 2005-11-14 23:20

Yes, that's roughly what I'm thinking too. I'll see if I can come up with our own UDF (in PL/SQL) that handles concatenating strings and column values.

I still don't know what's 'down the road' after this LIKE/CONCAT issue in this porting exercise. I haven't yet gotten the 'main.php' interface to successfully run.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2005-11-15 01:53

Question: will any of the SQL in g2 use more than one '%' wildcard in a LIKE? E.g., WHERE ... LIKE 'The quick % fox jumped %'

Handling multiple '%' wildcards is significantly more difficult than if we limit ourselves to just one.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-11-15 02:23

i don't know but such a limitation would / could only lead to other issues.
i guess my sketch / algorithm from above would work. of course you need to put it into pl/sql and it's getting a little uglier 'cause of that but it sould work.

cd modules/core/classes
egrep "%[^sdim%YHMS ]" *
returns at least:
GalleryCoreSearch.class: $whereData[] = '%' . $criteria . '%';

there might be more.
it would be really cool if you could implement the LIKE function without the "only a single wildcard" limitation!

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2005-11-15 02:38

You're killing me, you know that? :-)

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-11-15 02:42

i'm there for your daily challenge. i guess you must hate me by now :/

got another question:
we're planning to correct a silly limitation of G2. right now we're using bit data for our permissions as you know. 32 bits.
in DB2 you're using "varchar(32) FOR BIT DATA" for that.
we're planning to increase that to 64 and soon 256 bits.
it's a silly question, just want to be super sure, but "varchar(256) FOR BIT DATA" shouldn't be a problem, right? this varchar field probably is good for 2000 chars, right? (searched with google and on db2's homepage, but didn't find it yet)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2005-11-15 03:05

The max size of a VARCHAR in DB2 is 32,762. The fact that it's FOR BIT DATA doesn't change that.

However, to use VARCHAR of more than 4000 will require a bufferpool and tablespace with a page size of more than 4K are created (the current default pagesize is 4K). That's something that could be done within g2, but it's probably much easier to ensure the User does so when they create the database.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-11-15 03:06

thanks :)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2005-11-15 21:08

OK, I think I have a working PL/SQL "g2_like" UDF that handles concatenated parms and multiple wildcards. (Actually the concatenation is done at the time UDF is called... the value received into the UDF is already concatenated.)

It returns 0 if the match is successful, 1 if the match is not successful.

I've tested a number of scenarios:

Quote:
These five are expected to match (and they do):

g2_like ('The quick brown fox jumped over the lazy dog', 'The quick % fox jumped over the lazy dog')
g2_like ('The quick brown fox jumped over the lazy dog', 'The quick % fox jum_ed over the lazy dog')
g2_like ('The quick brown fox jumped over the lazy dog', 'The quick % fox jum_ed %ver the lazy dog')
g2_like ('The quick brown fox jumped over the lazy dog', 'The quick ' || '%' || ' fox jum_ed %ver the lazy dog')
g2_like ('The quick brown fox jumped over the lazy dog', 'The quick ' || '%' || ' fox jum_ed %ver the lazy do%')

These two are expected to NOT match (and they don't):

g2_like ('The quick brown fox jumped over the lazy do', 'The quick ' || '%' || ' fox jum_ed %ver the lazy dog')
g2_like ('The quick brown fox jumped over the lazy d', 'The quick ' || '%' || ' fox jum_ed %ver the lazy dog')

Any concerns about this, or can you suggest any other scenaios I should test?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-11-15 21:39
Quote:
OK, I think I have a working PL/SQL "g2_like" UDF that handles concatenated parms and multiple wildcards.

yay for that! :) most excellent :)

Quote:
It returns 0 if the match is successful, 1 if the match is not successful.

usually 0 == false and 1 == true. why did you invert this?

could you test another few edge cases?
these should also pass:
g2_like ('The quick brown fox jumped over the lazy dog', 'The quick % fox jumped over the lazy dog%')
g2_like ('The quick brown fox jumped over the lazy dog', '%The quick%jumped over the lazy dog')
g2_like ('The quick brown fox jumped over the lazy dog', '%The quick%jumped over the lazy dog%')
g2_like ('The quick brown fox jumped over the lazy dog', 'The quick % fox jumped ___r__he lazy dog%')
g2_like ('The quick brown fox jumped over the lazy dog', '%')
g2_like ('The quick brown fox jumped over the lazy dog', '_%_')

these test should not pass:
g2_like ('The quick brown fox jumped over the lazy dog', '_The quick%jumped over the lazy dog')
g2_like ('The quick brown fox jumped over the lazy dog', 'The quick%jumped over the lazy dog_')
g2_like ('The quick brown fox jumped over the lazy dog', '__')

thanks!!

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2005-11-15 21:57

I think I fell into the "0 = success" trap. I can easily switch the return values.

I'll test the new test scenarios tonight and let you know how it goes.

Thanks for the input.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Tue, 2005-11-15 23:14

The UDF is failing on a few of your testcases. I'm on it. Thanks for the suggestions.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Wed, 2005-11-16 05:42

I'm stumped on something. I've trimmed down my UDF to bare bones to illustrate this, hopefully somebody can tell me what I'm doing wrong.

(Note, DB2 CLP scripts such as the following, "--" indicates comments.)

Quote:
connect to gallery2 user g2user using g2pwd@
create function g2_like (subject varchar(1024), pattern varchar(1024)) returns integer
begin atomic

declare subject_position, pattern_position integer;

-- returns 1 if the pattern is matched
-- returns 0 if the pattern is not matched

set subject_position = 1;
set pattern_position = 1;

while pattern_position <= length(pattern) do

-- -- compare the two characters at the current positions
-- if ((substr(pattern, pattern_position, 1) = '_') or
-- (substr(pattern, pattern_position, 1) = substr(subject, subject_position, 1)))
-- then -- Current two characters match (including '_' wildcard)
-- -- nop, next iteration please;
-- else -- Current two characters do not match

if substr(pattern, pattern_position, 1) = '%'
then -- but the pattern character is '%'
-- removed for simplification
else -- mismatch
return pattern_position;
end if;

-- end if;

set subject_position = subject_position + 1;
set pattern_position = pattern_position + 1;

end while;

return 1;
end@

-- should match ('%' is in column 11 of the pattern)
values g2_like ('The quick brown fox jumped over the lazy dog',
'The quick % fox jumped over the lazy dog')@
connect reset@

With the outer 'if' commented out, the pattern matches successfully:

Quote:
values g2_like ('The quick brown fox jumped over the lazy dog', 'The quick % fox jumped over the lazy dog')

1
-----------
1

1 record(s) selected.

But if we un-comment that outer 'if', all of a sudden it starts to fail, and it suddenly seems to think the '%' is in position 12 (it is in fact in position 11).

Quote:
values g2_like ('The quick brown fox jumped over the lazy dog', 'The quick % fox jumped over the lazy dog')

1
-----------
12

1 record(s) selected.

Can anybody see anything wrong with the logic?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2005-11-16 08:46

are you sure the _ and especially the % do not already have a special meaning in DB2?
so the following code:
if substr(pattern, pattern_position, 1) = '%'
evaluates always to true!
the result (match at character 1) confirms this theory. maybe you need to escape % and _ ... just a thought. wild guesses help sometimes.

and i can't answer the why 12 question without seeing the whole code i guess.
btw: maybe also tests following edge cases (what happens if the pattern is longer etc):

should return true:
g2_like ('', '')
g2_like ('F', '%')

should return false:
g2_like ('The quick brown fox jumped over the lazy dog', '')
g2_like ('F', 'Bar')
g2_like ('', 'Bar')

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Wed, 2005-11-16 21:15

I reworked a number of things and the problem went away.

('%' and '_' are special to DB2 only when used in a LIKE predicate... in a garden-variety PL/SQL UDF they're just plain characters.)

FYI, I'm down to only 4 failing testcases out of 20. I think these will boil down to only 2 problems (if I'm lucky, only 1). Will keep you posted.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Wed, 2005-11-16 21:44

great :)

hope you will GPL the UDF, for the fame and the glory ;)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Thu, 2005-11-17 03:00

OK, all 20 testcases are now successful.

There might be better ways to write it, but I never claimed to be a hardcore Developer. As it is, I think it's fairly clean and understandable.

As this has been a significant piece of work, I'm attaching the UDF source, testcases, and testcase output here for your review.

P.S.: I'm not familiar with the use of the term "GPL" as a verb. Exactly what does that entail? :-)

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2005-11-17 03:23

Kudos! You rock :)

i took a quick look at the code. not as short as i thought, but it looks reasonable. congrats!

to GPL something := to make it available to the public under the GPL license (well, that's my definition).
by contributing this code to G2, you practically publish it under the GPL.
I'll add your name in the sql file as author of the function and i mention you in the cvs commit message. and we'll add you to the contributors list of G2, see: http://gallery.menalto.com/wiki/Development_Team .

so i suggest we add it to G2 as soon as you give your ok.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Thu, 2005-11-17 04:07

OK, feel free to add my name anywhere you wish.

My time is somewhat booked up for tomorrow, but I want to implement this and one more change to see how far that gets me, then I'll send you my current changes on Friday evening.

Thanks.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2005-11-17 05:22

Please also submit your BITAND UDF then we can add everything to g2 and see how far that gets you.

there are a few issues now:
- BITAND:
1. it need to operate on at least 8 bit (integer and not SMALLINT)
2. it needs to AND bit by bit. the arguments X and Y of this function are sequences of Bits. you need to AND bit 0 of X with bit 0 of Y, bit 1 of X with bit 1 of Y, ...

- BIT_OR:
1. this needs to be an aggregate function. just like AVG(), SUM(), .... . it receives a single argument, the column name.
See GalleryPermissionHelper_medium.class as an example:

	SELECT
	    [GalleryAccessSubscriberMap::itemId],
	    BIT_OR([GalleryAccessMap::permission])
	FROM
	    [GalleryAccessMap], [GalleryAccessSubscriberMap]
	WHERE
...
	GROUP BY
	    [GalleryAccessSubscriberMap::itemId]

2. it needs to OR bit by bit just like BITAND. see
User-defined Aggregate Functions in DB2 Universal Database:
http://www-128.ibm.com/developerworks/db2/library/techarticle/0309stolze/0309stolze.html
- looks like user defined aggregate / column functions are not natively supported in DB2
- has this changed since 2003?
- but there are workarounds
- i didn't know that you still had to write this. i looked 15 minutes ago at the DB2 class in g2 and realized it wasn't what we needed.
- maybe you can define a new datatype: bitstring, an array of bits or so
- as aggregate / column function, you could either user MIN, SUM or AVG, per bit, MIN makes sense, MAX = OR. If you could use MAX as aggregate but for each bit separately, that would be great

i guess compared to this user defined aggregate function, the LIKE method was almost easy... :(

i've started changing the db classes, added CreateDb2Udfs.sql etc. but before we don't have this BITOR aggregate, we've got a problem.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Thu, 2005-11-17 15:54

Yes, I will also provide my BIT_AND. It already functions as you describe, bit-by-bit, and the arguments are sequences of 32 bits.

Thanks for the description of BIT_OR, I had assumed it was more similar to BIT_AND (receives 2 sequences of bits).

I have no experience with user-defined aggregate functions, and haven't yet read Knut's article, so before I look into them I don't know if they ever were or are now supported. Yet another thing I have to learn on the fly. :-)

I'll send you my latest changes (since my last CVS UPDATE) tomorrow. (Only things that work.)

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Thu, 2005-11-17 18:34

Great :)
Take your time. I'm aware that this has turned out to be something much more involving than the both of us thought. So if you need to take a break and come back later to work on the BIT_OR, feel free to do so. I'd understand it.
Cool that you already got the BITAND, will review it and get it into cvs as soon as i get it. Of course G2 /DB2 won't work at this point without BIT_OR, but it will be much better than the current CVS version of G2.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-18 00:23

In integrating the G2_LIKE() UDF into the G2 code I've hit a (hopefully minor) snag.

The SQL generated is:

Quote:
AND G2_LIKE(iam1.g_parentSequence, iam0.g_parentSequence || iam0.g_itemId || '/%' ) = 1

But the || fails because iam0.g_itemId is not a string literal. It works if I wrap it in a CAST(... AS CHAR(10)):

Quote:
AND G2_LIKE(iam1.g_parentSequence, iam0.g_parentSequence || CAST(iam0.g_itemId AS CHAR(10)) || '/%' ) = 1

The problem is that I can't always assume that the 3rd arg needs to be cast, can I?

One solution is if I can programmatically test the datatype for each arg in the getFunctionSql() code. Is that possible? Then I could wrap the CAST around only the necessary columns.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-11-18 00:45

yeah, when looking yesterday at the code, i thought too of that issue.

can't you in case 'CONCAT': always cast to CHAR? why would that be bad?

that's what i have in my current version of Db2DatabaseStorage.class:

       case 'CONCAT':
            $sql = 'char(' . implode(') || char(', $args) . ')';
            break;

thus casting every argument to CHAR.

programmatically check each argument for the type? don't think that's easily feasible, quite some changes would be necessary. of course we could just add another arg to concat to let getSqlFuntion know... but only if it's really necessary.

and i'd really prefer if we could just use CHAR($arg[..]) and not CAST as CHAR(10)), we don't know the length...

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-18 01:12

Yeah, good idea. But for some reason the ||s in my SQL don't seem to be going through the case 'CONCAT'. Any idea why that might be happening?

Also, if you have three values concatenated, will you not be wrapping a char() wrapper around another char() wrapper in the 2nd arg? The first time through the code it will wrap char() around $arg[0] and $arg[1]. The second time through the code it will wrap another char() around $arg[1] (which already has a char() wrapper) and a wrapper around $arg[2]. So you'll wind up with char($arg[0]) || char(char($arg[1])) || char($arg[2]). You might want to first test it to see if it's already being cast.

(Re the CAST($arg[x] AS CHAR(n)) versus CHAR($arg[x]), sorry, old habits die hard. :-)

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-11-18 01:26

|| that is not generated with || ?
don't no where that would be.

the above case 'CONCAT': is not yet in your DB2 class, i have it just on my harddisk, maybe that was confusing.

@multiple arguments:
the above implode() works for multiple arguments and you'd get
char($arg[0]) || char($arg[1]) || char($arg[2])
for 3 arguments. however, if one of the arguments was already casted, is that what you mean?, then this argument will be casted again.
don't think that hurts... let's try it with that.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-18 01:42

Never mind, I retract my questions... I found and fixed the problem, and yes I realized how the implode() works. I should have thought about it a bit more before posting. :-)

OK, the update to case 'CONCAT' is now working fine, as is the G2_LIKE(). Thanks. :-)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-18 20:11

I think I know how I can do the aggregate BIT_OR.

I think I will have to write a UDF that uses a 'scratchpad' (a data area that holds persistent data across invocations of the function). For example: In statement "SELECT foo(col1) FROM tb1", function "foo" is invoked for each value in column "col1". Each time it is invoked on a new value, it should:

Quote:
if (scratchpad is empty) then
{
simply copy all 32 bits of current row value into the scratchpad.
}
else
{
for (each bit of the current scratchpad string contents)
{
if (the appropriate bit in the scratchpad contents is already "1") then
{
nop.
}
else
{
if (the corresponding bit in the current row value is "0") then
{
set the appropriate bit in the scratchpad string to "0".
}
else
{
set the appropriate bit in the scratchpad string to the value of the bit in the current row value.
}
}
}
return (current scratchpad string)

The original SQL statement will only see the returned value from the last invocation of the function. (Actually, I'm not 100% sure of that part. ;-)

I expect the G2 SQL might even be able to remain unchanged from what it is currently.

The down side is that only 'external' UDFs are able to use the SCRATCHPAD functionality. 'External' basically means the function resides in a separate library. In this case, I'd write it in java so that the library is binary-portable.

I may have to amend this plan as I proceed, but does that sound OK to you so far?

P.S., I apologize that the above code is so difficult to read. Is there any way to preserve indenting in these forum posts?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-11-18 20:38

@unchanged g2 sql code: that would be great, but from the example for a user defined workaround aggregate function in DB2 from the link, it looks like the returned format of the MAX(BIT_OR(column)) is counterBIT_OR_RESULT, whereas the counter part can't be removed without rewriting the SQL of the whole query (subquery).

the problem is, that you have to print each intermediate result in your user defined function.

i guess i have an idea:

say SELECT BIT_OR(column)
outputs:
00000001INTERMEDIATE_BIT_OR_RESULT
00000002INTERMEDIATE_BIT_OR_RESULT
00000003INTERMEDIATE_BIT_OR_RESULT
...
00009701INTERMEDIATE_BIT_OR_RESULT

so we have a counter in the first x nr of characters. the INTERMEDIATE_BIT_OR_RESULT is a NR_OF_BITS long string of 0's and 1's.

so instead of
SELECT MAX(BIT_OR(column)) FROM FOO
which would return
00009701INTERMEDIATE_BIT_OR_RESULT

we do now:

SELECT MAX(BIT_OR(column)) - (COUNT(*) * 10^NR_OF_BITS) FROM foo
which wil return:
00009701INTERMEDIATE_BIT_OR_RESULT - (9701 * 10^NR_OF_BITS) which DB2 of course will return as
INTERMEDIATE_BIT_OR_RESULT

explanation:
10^NR_OF_BITS is 10 to the power of NR_OF_BITS.
so since the internal counter of your UDF and COUNT(*) will have the same value, count(*) will be 9701. so we can remove the stuff in front of the actual result by subtracting COUNT(*) * 10^NR_OF_BITS.

10^NR_OF_BITS will have to be computed in PHP and we can just write 'COUNT(*) * ' . pow(10, 32) since we're using 32 bits right now.

i really hope that works :)

@java:
that's fine, i guess.

@code formatting in the forums:
you're always using the quote tag. there's also a code tag which is basically a html pre. just replace quote with code in your tags :)

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-18 21:47

> from the example for a user defined workaround aggregate function in DB2 from the link

I'd suggest you don't pay any attention to Knut's article. I find he's made it much more complicated than it needs to be. I turned to Don Chamberlin's book "Understanding DB2 Universal Database"... it's much more understandable by mere humans like me.

The query I'm now working on is:

SELECT g2_AccessSubscriberMap.g_itemId,
 BIT_OR(g2_AccessMap.g_permission)
 FROM g2_AccessMap, g2_AccessSubscriberMap
 WHERE g2_AccessSubscriberMap.g_itemId IN (?)
 AND g2_AccessSubscriberMap.g_accessListId = g2_AccessMap.g_accessListId
 AND (g2_AccessMap.g_userId = ? OR g2_AccessMap.g_groupId IN (?))
 GROUP BY g2_AccessSubscriberMap.g_itemId

There's no MAX() in there, so I don't understand why you keep bringing that up. There's no COUNT() either.

I guess I was misunderstanding what the BIT_OR() was supposed to do. My current understanding is that it is supposed to return a sequence of values (one for each g_permission value) prefixed by a counter. If so, the counter is no problem (I can easily use the scratchpad for that), but what is each g_permission value being ORed against?

(BTW, I'm not a Mathie at heart, so applied math explanations are not as helpful to me as you might think. :-()

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Fri, 2005-11-18 22:39

@MAX():
my MAX() is from Knut's approach. I read parts of his article and he implements UD aggregate functions with UDFs and the help of MAX().
We'd have to replace BIT_OR with getSqlFunction and there we could apply the MAX(BIT_OR(column)) - (COUNT(*) * constant) logic.

@BIT_OR():
BIT_OR() is a built-in aggregate function in mysql and we have our own implementation of it for postgresql and oracle.
So BIT_OR() works just like SUM(), AVG(), MAX(), and other aggregate functions in SQL.
e.g.
SELECT category, AVG(price)
FROM products
GROUP BY category;

for a table product, category, price
would return the average price for each category;
you get a single line per category in the result.

the following is also a fictive query:

SELECT itemId, BIT_OR(permission)
FROM itemPermissions
WHERE user = 'joe'
GROUP BY itemId;

for a table itemId, permission, user
will return for each itemId a single line indicating the sum of all permissions user 'joe' has for the item.
a single line, e.g.
if the table has the rows;

itemId   permisssion   user
---------------------------
     5   10000000      tom
     5   00100000      joe
     5   00000001      joe
    99   00000010      joe
    99   00100001      joe
    99   11000000      joe

it will OR all rows that match the where clause (joe) for each itemId (group by).
that's exactly like SUM(), sum() would some the 3 rows with itemId 5, user = joe,
but BIT_OR ORs these 3 values. here's the result of the query:

itemId   permission
----------------------
     5   00100101
    99   11000011

so, no: our BIT_OR(), what we need as a 1:1 replacement for the mysql BIT_OR, is an aggregate function that just returns a value. not a sequence, not prefixed by a counter.
but since i thought (after reading Knut's article) that this is not possible, we need a workaround.
applying the same scheme as Knut, I'm using a counter and MAX(). the counter will ensure that we pick the last intermediate result, because this last intermediate result is prefixed by the MAX counter value.
the "- (COUNT(*) * constant)" thing is my extension to Knut's article, such that we can do the whole thing with a simple getSqlFunction replacement and don't need to rewrite the original SQL.

@your alternative:
Well, i don't know what's possible in DB2. After reading Knut's article, it seems you have to do it that way. And I was thinking with such limits in mind.
If you found a solution that doesn't need such a workaround, fine. i'm just trying to help to solve the problem.

 
Larry Menard
Larry Menard's picture

Joined: 2005-10-01
Posts: 757
Posted: Fri, 2005-11-18 23:05

I'll try to digest this tonight. In the meantime, here's what I have for delivery.

Quote:
lib/adodb/drivers/adodb-db2.php.inc (SQL_CUR_USE_IF_NEEDED)

modules/core/classes/GalleryStorage/DatabaseStorage/Db2DatabaseStorage.class ('BIT_AND' UDFs, 'G2_BIT_AND' invocation, case 'LIKE', 'G2_LIKE' UDF, add char() wrapper in case 'CONCAT')

modules/core/classes/helpers/GalleryItemHelper_simple.class (use getFunctionSql('LIKE', ...))

Remember, the new LIKE case in getFunctionSql() will need to be added to the other RDBMSs.

This will bring the code up to the same spot I'm at... dying on the BIT_OR.

Let me know when it's checked in, and I'll get a fresh copy of the build and verify it's working as I expect.

Thanks, Valiant.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sat, 2005-11-19 02:12

committed to cvs, get tomorrows nightly snapshot. you can't use anonymous CVS right now, since sf.net is having hardware problems and anonymous cvs is still in the state of a few days back.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Sat, 2005-11-19 02:52

Larry,

bharat is adding another place where he uses BITAND, but this time for PermissionSetMap::flags, which is just 4 bits ling (it's an integer anyway).
since you cast to char 32 in getSqlFunction BITAND , we might have to CAST the resukt back to INT(), but just in this case.
you don't have to bother about this now, just wanted to mention this such that we don't forget.
bharat will add this soon...