[Phpmyadmin-devel] #4536 - master: import problem (PMA_String)

Marc Delisle marc at infomarc.info
Sun Sep 21 12:20:34 CEST 2014


On 2014-09-20 21:05, Marc Delisle wrote:
> On 2014-09-20 14:16, Hugues Peccatte wrote:
>> Hi everyone,
>>
>> It seems that since commit [1], master has been quite slow at
>> importing data.
>> This seems to be linked to the multi-byte functions, which are much
>> slower than the standard string functions.
>>
>> I tried several approaches, none with a good result. Instead of
>> always using PMA_StringMB, I tried detecting the encoding and using
>> PMA_StringNative when possible.
>> To improve this, I cached the encoding to avoid detecting it each time.
>> See [2] (this is not totally safe; for the tests, I removed the
>> mb_* detection).
>> But the result is still not as good as before…
>> I also tried something more experimental: converting all the strings
>> used in PMA_String to UTF-8, so that the mb_* functions no longer need
>> to convert them. But that is not effective either.
>>
>> Do you have any idea how to improve this?
>> I thought about generalising strpos into a strallpos/strposall (for
>> repeated strpos calls) with an explicit encoding, so that PHP does not
>> convert each time.
>>
>> Thanks for your help,
>>
>> [1] https://github.com/phpmyadmin/phpmyadmin/commit/9b77d746aba
>> [2] https://github.com/Tithugues/phpmyadmin/commit/ab6f493449d90e58bd4caa15740d8364c7fd4247
>>
>> Hugues.
> 
> Hi Hugues,
> 
> I have not looked deeply into this logic, so it seems that you have
> become the expert in these matters.
> 
> Taking into account that the current master is not acceptable for a
> 4.3.0-alpha release, I see a few choices:
> 
> - remove the mb modifications from the import logic
> 
> - remove the current parser from the import logic, therefore removing
> support for things like a custom delimiter and probably other things
> (import of compressed files?)
> 
> - delay 4.3.0 until we find the correct solution with mb
> 
> 

We could also add another (!) custom option in the import dialog:
multi-byte or not.

The multi-byte way is the more correct one for importing files with
multi-byte characters, but it is impractical for big files (10 to 15
times slower). So by default, the option could be set to not use the
multi-byte way.

A user with a big multi-byte file would have a problem, unless she is
allowed to set the PHP execution time limit to huge values (which not
many sysadmins will allow unless they want their shared server to
perform badly).
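
To make the trade-off behind such an option concrete, here is a rough
sketch. It is written in Python rather than PHP and is not the actual
PMA_String code; the function name find_all and the multibyte flag are
hypothetical. It finds every occurrence of a delimiter either by bytes
(the default) or by characters after decoding once with an explicit
encoding, similar in spirit to the strposall idea quoted above.

```python
def find_all(haystack: bytes, needle: bytes,
             multibyte: bool = False, encoding: str = "utf-8") -> list:
    """Return the offsets of every non-overlapping occurrence of needle.

    multibyte=False: search the raw bytes and return byte offsets.
    multibyte=True:  decode both arguments once with the given encoding
                     and return character offsets.
    """
    if not needle:
        raise ValueError("needle must not be empty")

    if multibyte:
        # Decode once up front instead of converting on every search.
        text = haystack.decode(encoding)
        pattern = needle.decode(encoding)
    else:
        text, pattern = haystack, needle

    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Continue searching just past the match we found.
        start = text.find(pattern, start + len(pattern))
    return positions


# Hypothetical import data: two statements, one with a two-byte character.
sql = "SELECT 'é';\nSELECT 1;".encode("utf-8")

byte_offsets = find_all(sql, b";")                  # byte mode (default)
char_offsets = find_all(sql, b";", multibyte=True)  # character mode
```

The two modes return different offsets as soon as the data contains a
multi-byte character, so the rest of the parser has to stick to one
kind of offset. For UTF-8 input and an ASCII delimiter the byte search
cannot produce a false match, because ASCII bytes never occur inside a
UTF-8 multi-byte sequence; that guarantee does not hold for every
multi-byte encoding.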

-- 
Marc Delisle | phpMyAdmin



