[Phpmyadmin-devel] #4536 - master: import problem (PMA_String)

Marc Delisle marc at infomarc.info
Sun Sep 21 12:20:34 CEST 2014

On 2014-09-20 21:05, Marc Delisle wrote:
> On 2014-09-20 14:16, Hugues Peccatte wrote:
>> Hi everyone,
>> It seems that since commit [1], master has become quite slow at
>> importing data.
>> This seems to be linked to the multi-byte functions, which are much
>> slower than the standard string functions.
>> I tried several variants, but without good results: instead of
>> always using PMA_StringMB, detect the encoding and use
>> PMA_StringNative when possible.
>> To improve this, I cached the encoding to avoid detecting it each time.
>> See [2] (this is not totally safe: for the tests, I removed the
>> mb_* detection).
>> But the result is not as good as before…
>> I also tried something more experimental: convert all the strings used
>> in PMA_String to UTF-8 so that the mb_* functions no longer have to
>> convert them. But that was not effective either.
>> Do you have any ideas on how to improve this?
>> I thought about generalising strpos into a strallpos/strposall (for
>> repeated strpos calls) with an explicit encoding, so that PHP does not
>> have to convert each time.
>> Thanks for your help,
>> [1] https://github.com/phpmyadmin/phpmyadmin/commit/9b77d746aba
>> [2] https://github.com/Tithugues/phpmyadmin/commit/ab6f493449d90e58bd4caa15740d8364c7fd4247
>> Hugues.
> Hi Hugues,
> I have not looked deeply into this logic; it seems that you have become
> the expert in these matters.
> Taking into account that the current master is not acceptable for a
> 4.3.0-alpha release, I see a few choices:
> - remove the mb modifications from the import logic
> - remove the current parser from the import logic, therefore removing
> support for things like a custom delimiter and probably other things
> (import of compressed files?)
> - delay 4.3.0 until we find the correct solution with mb

We could also add yet another (!) custom option to the import dialog:
multi-byte or not.

The multi-byte way is the more correct one for importing files with
multi-byte characters, but it is simply unusable for big files (10 to 15
times slower). So, by default, the option could be set not to use the
multi-byte way.
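A minimal sketch in Python of the two paths such an option would choose between. phpMyAdmin itself is PHP, and this is not its code. `strposall` is modelled on the helper Hugues suggested; the `multibyte` flag and all function names are hypothetical, chosen only for illustration. The multi-byte path decodes the buffer once with an explicit encoding instead of re-detecting it on every call. The byte path searches the raw buffer directly.

```python
from typing import List


def _find_all(haystack, needle) -> List[int]:
    """Return the offset of every non-overlapping occurrence of needle."""
    if not needle:
        raise ValueError("needle must not be empty")
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + len(needle))
    return positions


def strposall(haystack: bytes, needle: str, multibyte: bool,
              encoding: str = "utf-8") -> List[int]:
    """Hypothetical import option: byte path or multi-byte path.

    multibyte=False: search the raw bytes; offsets are byte offsets.
    multibyte=True:  decode once with the explicit encoding, then
                     search the text; offsets are character offsets.
    """
    if multibyte:
        return _find_all(haystack.decode(encoding), needle)
    return _find_all(haystack, needle.encode(encoding))


data = "é;a;b".encode("utf-8")
print(strposall(data, ";", multibyte=False))  # byte offsets
print(strposall(data, ";", multibyte=True))   # character offsets
```

With UTF-8 input, both paths find the same delimiters. The byte path reports byte offsets and the multi-byte path reports character offsets, so any code that slices on those offsets has to use the matching unit.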

A user with a big multi-byte file would have a problem, unless she is
allowed to set the PHP execution time limit to huge values (which not
many sysadmins will allow unless they want their shared server to
perform badly).

Marc Delisle | phpMyAdmin
