Hi list
The SQL parser reports an error on Japanese SJIS words. Some characters contain the byte 0x5c, which is the same code as a backslash.
I cannot execute the SQL below:
INSERT INTO kanji2 VALUES (1, '...0x955c');
error session is : eNqFkU8vA0EYh+/zKd6DBMl27My21Y7TZrOhsd2t3e1WEEnRVKua1UrD0Y4LQgi9+BKcJA7+JEgc +Aw+gYNv4F0uTuY0yczzzPub+dm+7/kCLAYWB8d2BRR0KDDIZ3UShHgiCbGiQMBIaU1Af6sT13v9 Ro92Wis0Xo8NbQCM8kngus4n9OIEKwLLi5wuDA6dxmYR7J0YRkh5N5hzBBiUGzTHM91tUg188AIN zGnbDTWIbJxVa3Wh7C2UHMeEHNVJpWwK4NSgLNNb5aQyU0k5zcM4WYr7UCu5bkgc050W0K5n+u1W n/wMKrmB7Ye4hB5s1LvtFofIdKp2AGNMg9Hh0uHDyaVsyHnZHC6Njk+RfwSOguzIFdlEvLK3KAcX KsVIFUuey2XEf1VLuqgP5LE8UMhZlJ/unhk3FGAunTIvl/HuIxnhpKZCyKfCz6MV4CSCB+/nqwqs gFjymnwlL8n93owCLiJ883jzosCYnoaM8TURBm3gH6rCMvZbEJaTVFUs/8Oq6bTHs9PP2eN9FZmW 9nr1fHd7/faB7DdP8v3z
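The 0x5c problem described above can be reproduced outside phpMyAdmin. The following is a minimal sketch in Python (not PHP, and not the phpMyAdmin parser), assuming that 0x955c in the report denotes the two Shift_JIS bytes 0x95 0x5c, which encode the kanji 表. The function names `naive_string_end` and `sjis_aware_string_end` are hypothetical and exist only for this illustration.

```python
# Assumption: the statement is Shift_JIS encoded and scanned byte by byte.
BACKSLASH = 0x5C
QUOTE = 0x27


def naive_string_end(sql: bytes, start: int) -> int:
    """Find the closing quote of the literal opened at `start`.

    Treats every 0x5c byte as an escape character. Returns -1 if the
    literal appears unterminated.
    """
    i = start + 1
    while i < len(sql):
        if sql[i] == BACKSLASH:
            i += 2  # skip the "escaped" byte
            continue
        if sql[i] == QUOTE:
            return i
        i += 1
    return -1


def is_sjis_lead(byte: int) -> bool:
    """True for bytes that start a two-byte Shift_JIS character."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def sjis_aware_string_end(sql: bytes, start: int) -> int:
    """Same scan, but consume two-byte characters as a unit."""
    i = start + 1
    while i < len(sql):
        if is_sjis_lead(sql[i]):
            i += 2  # the trail byte (possibly 0x5c) is not an escape
            continue
        if sql[i] == BACKSLASH:
            i += 2
            continue
        if sql[i] == QUOTE:
            return i
        i += 1
    return -1


sql = "INSERT INTO kanji2 VALUES (1, '表');".encode("shift_jis")
start = sql.index(b"'")

print(naive_string_end(sql, start))       # -1: the closing quote was skipped
print(sjis_aware_string_end(sql, start))  # index of the real closing quote
```

The naive scan sees the trail byte 0x5c as a backslash, skips the closing quote, and reports an unterminated string. The encoding-aware scan steps over the whole two-byte character first.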
On Mon, Sep 23, 2002 at 03:38:24PM +0900, luc wrote:
> Hi list
> The SQL parser reports an error on Japanese SJIS words. Some characters contain the byte 0x5c, which is the same code as a backslash.
> I cannot execute the SQL below:
> INSERT INTO kanji2 VALUES (1, '...0x955c');
Hi Luc.
I don't know how you are generating the SQL in question, but when I type the U+955C character (镜) [as alt-38236] into my browser, Windows seems to fault and generates a U+005C character instead of the U+955C that I typed. However, if I use Character Map to generate the character and then copy and paste it into PMA, the SQL it generates is:
INSERT INTO `foo` ( `a` ) VALUES ( '镜' );
That is perfectly fine for my case of limited Japanese, but I don't know whether it is a solution for a lot of CJK content.
Decoded bug report (the characters are badly mangled by my mail client):
ERROR: C1 C2 LEN: 80 81 640 STR: .
CVS: $Id: sqlparser.lib.php3,v 1.27 2002/09/19 16:50:32 lem9 Exp $
MySQL: 3.23.52-nt
USR OS, AGENT, VER: Win MOZILLA 5.0
PMA: 2.3.1-rc2
PHP VER,OS: 4.2.3 WINNT
LANG: ja-sjis
SQL:
INSERT INTO kanji2 VALUES (1, '..�...e.X.g.');
INSERT INTO kanji2 VALUES (2, '.l.b.g.X.P�[.v.');
INSERT INTO kanji2 VALUES (3, '.C...^�[.l.b.g.C.N.X.v.�..');
INSERT INTO kanji2 VALUES (4, 'ý�123');
INSERT INTO kanji2 VALUES (5, '.X.^.C...V�[.g');
INSERT INTO kanji2 VALUES (6, '.e.X.g');
INSERT INTO kanji2 VALUES (7, '.�.c');
INSERT INTO kanji2 VALUES (8, '.�.�.�.��H');
INSERT INTO kanji2 VALUES (9, '����');
INSERT INTO kanji2 VALUES (10, '.p�[.V.X.e...g');
INSERT INTO kanji2 VALUES (11, '.l.X.P.U');
INSERT INTO kanji2 VALUES (12, '.l.X.P.U.U');
INSERT INTO kanji2 VALUES (13, '.��K�.');
INSERT INTO kanji2 VALUES (14, 'ȯĽ����');
Hi Robin
Please try to insert my attached text file. I think we need a new parser for multibyte characters that uses multibyte functions like mb_strlen().
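To illustrate the difference between byte-oriented and multibyte-aware length functions, here is a small sketch in Python (not PHP, chosen only for brevity). `byte_length` mimics what a byte-counting function such as PHP's strlen() reports, and `char_length` mimics what mb_strlen() reports when given the right encoding. Both helper names are hypothetical.

```python
def byte_length(data: bytes) -> int:
    """Count bytes, as a byte-oriented string function would."""
    return len(data)


def char_length(data: bytes, encoding: str) -> int:
    """Count characters after decoding with the given encoding."""
    return len(data.decode(encoding))


# Three katakana characters, each two bytes long in Shift_JIS.
sample = "テスト".encode("shift_jis")

print(byte_length(sample))               # 6
print(char_length(sample, "shift_jis"))  # 3
```

Any parser that advances through the string using the byte count while assuming one byte per character will land in the middle of a character.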
Robin Johnson wrote:
> I don't know how you are generating the SQL in question, but when I type the U+955C character (镜) [as alt-38236] into my browser, Windows seems to fault and generates a U+005C character instead of the U+955C that I typed. However, if I use Character Map to generate the character and then copy and paste it into PMA, the SQL it generates is:
> INSERT INTO `foo` ( `a` ) VALUES ( '镜' );
On Tue, Sep 24, 2002 at 12:29:35AM +0900, luc wrote:
> Please try to insert my attached text file. I think we need a new parser for multibyte characters that uses multibyte functions like mb_strlen().
Hi Luc and list. Sorry about the delay, I've been really busy with university, exams and work.
I have a rough preliminary fix for this inside just the SQL Parser, but it made me realize that the same fix should probably be extended to the rest of the system as well.
Problem summary: PHP's string functions work on bytes, so they fail badly on strings containing multibyte (MB) characters. This affects EVERY language that uses Unicode or another multibyte encoding scheme.
Solution: Replace ALL calls to string functions in the code (strlen, substr, strpos, etc.) with calls through a variable $GLOBALS['PMA_MB_functionname']. In addition, a little bit of code puts the correct function name into each of those variables, depending on availability (the PHP version is important here) and on the need for multibyte characters. E.g., either mb_strlen or strlen would go into $GLOBALS['PMA_MB_strlen'], which could then be called directly in the code and handle everything properly.
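The indirection proposed here can be sketched as a lookup table of functions that is filled once at startup. The example below is in Python rather than PHP and is only an illustration of the idea, not the eventual phpMyAdmin code. The table name `PMA_MB`, the selector `select_string_functions`, and the `multibyte_needed` flag are all assumptions made for this sketch.

```python
def byte_strlen(data: bytes, encoding: str) -> int:
    """Byte-oriented length; the encoding argument is ignored."""
    return len(data)


def mb_strlen(data: bytes, encoding: str) -> int:
    """Character-oriented length."""
    return len(data.decode(encoding))


def byte_substr(data: bytes, start: int, length: int, encoding: str) -> bytes:
    """Byte-oriented slice; may cut a multibyte character in half."""
    return data[start:start + length]


def mb_substr(data: bytes, start: int, length: int, encoding: str) -> bytes:
    """Character-oriented slice, re-encoded to the original encoding."""
    return data.decode(encoding)[start:start + length].encode(encoding)


def select_string_functions(multibyte_needed: bool) -> dict:
    """Pick one implementation per function name, once, at startup."""
    if multibyte_needed:
        return {"strlen": mb_strlen, "substr": mb_substr}
    return {"strlen": byte_strlen, "substr": byte_substr}


# Hypothetical global table, analogous to $GLOBALS['PMA_MB_strlen'] etc.
PMA_MB = select_string_functions(multibyte_needed=True)

text = "テスト".encode("shift_jis")
print(PMA_MB["strlen"](text, "shift_jis"))                             # 3
print(PMA_MB["substr"](text, 0, 1, "shift_jis").decode("shift_jis"))  # テ
```

Callers never name `mb_strlen` or `strlen` directly, so the choice between them is made in one place.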
Any comments/suggestions/flames on this?
Looks ok to me, but looks like a lot of work, unless we can automate the source code changes.
Marc
Hi Robin, Marc and list.
Marc Delisle wrote:
> Looks ok to me, but looks like a lot of work, unless we can automate the source code changes.
If that can be done, it looks good to me.