2014-10-13 12:21 GMT+02:00 Hugues Peccatte <hugues.peccatte@gmail.com>:
2014-10-12 21:06 GMT+02:00 Marc Delisle <marc@infomarc.info>:
On 2014-10-12 12:57, Hugues Peccatte wrote:
2014-10-05 20:36 GMT+02:00 Hugues Peccatte <hugues.peccatte@gmail.com>:
2014-10-04 9:01 GMT+02:00 Hugues Peccatte <hugues.peccatte@gmail.com>:

On 4 Oct 2014 03:22, "Madhura Jayaratne" <madhura.cj@gmail.com> wrote:
>
> On Sat, Oct 4, 2014 at 1:24 AM, Hugues Peccatte <hugues.peccatte@gmail.com> wrote:
>>
>> 2014-10-03 12:26 GMT+02:00 Marc Delisle <marc@infomarc.info>:
>>>
>>> Hi Hugues,
>>> I retested this morning on a laptop, importing a SQL file containing
>>> 10000 employees from the sample employees database. This is a small file
>>> (660 KB).
>>>
>>> Current master: 3 min 25 sec (and ends with a "JSON.parse: unexpected
>>> character" error)
>>>
>>> Current Tithugues/stringFunctions_master: 2 min 10 sec (same JS error)
>>>
>>> Current QA_4_2: 0 min 5 sec
>>>
>>> There has been improvement, but we cannot release 4.3 with this import
>>> speed.
>>>
>>> --
>>> Marc Delisle | phpMyAdmin
>>
>> Hi,
>>
>> I agree… But I'm afraid this is linked to the multibyte functions…
>> Maybe we shouldn't use the multibyte functions everywhere…
>>
>> I'll still try to improve the performance.
>>
>> Hugues.
>
> Indeed, I also think that we should use the mb_* functions only when
> necessary, and the choice to use them should be made on a case-by-case
> basis.
>
> --
> Thanks and Regards,
>
> Madhura Jayaratne

Hi,

I didn't push my commits, but that's what I've started. I replaced the
mb_* calls with standard calls on configuration variables, reserved
words, etc.

Hugues.

Hi,

Out of desperation, I'm trying another algorithm. Instead of buffering
data until the SQL delimiter is found, I'll parse the file line by line.
That way, instead of parsing a 50,000-character buffer 1,000 times, I'll
parse many 500-character buffers fewer than 10 times each. I hope this
will be faster.

Hugues.
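To make the line-by-line idea concrete, here is a minimal sketch of that approach (hypothetical names, not the actual phpMyAdmin import plugin; the file name and `runStatement()` are placeholders):

```php
<?php
// Minimal sketch of the line-based approach described above; not the actual
// phpMyAdmin import code. The file name and runStatement() are placeholders.

function runStatement($sql)
{
    // Placeholder: this is where the statement would be sent to MySQL.
}

$handle = fopen('dump.sql', 'rb');
$statement = '';

while (($line = fgets($handle)) !== false) {
    $statement .= $line;
    // Naive end-of-statement test: the line ends with the delimiter. A real
    // parser must also track quoted strings and comments, where ';' does not
    // terminate the statement.
    if (substr(rtrim($line), -1) === ';') {
        runStatement($statement);
        $statement = '';
    }
}

fclose($handle);
```

The gain is that each short line is scanned a bounded number of times, instead of re-scanning one large, growing buffer on every pass.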
Hi,
The new algorithm is done. There are still some checks to add, but it is usable with the file in this ticket: [1]. You can find my modifications here: [2]
Marc, is it faster for you? It seems I gained ~33% in time. We're still far from 5 seconds… Maybe I'll try using the standard PHP functions to see the difference. If the standard PHP functions are really faster, I'll try to add an option to choose between the mb_* functions and the standard PHP functions, as you said.
[2] https://github.com/Tithugues/phpmyadmin/tree/stringFunctions_useStandardFunc...
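To see the difference in practice, a micro-benchmark along these lines could be used (an illustrative sketch only; the test data is made up, and the needle never occurs, so every call scans the whole buffer):

```php
<?php
// Illustrative micro-benchmark only; absolute timings depend on the machine,
// the PHP build, and the mbstring encoding in use.

$buffer = str_repeat("INSERT INTO t VALUES ('x');\n", 20000); // ~560 KB

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    strpos($buffer, 'DELIMITER'); // byte-based search, never matches
}
printf("strpos:    %.3f s\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    mb_strpos($buffer, 'DELIMITER'); // character-based search, never matches
}
printf("mb_strpos: %.3f s\n", microtime(true) - $start);
```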
Hi Hugues,

Yes, it's faster. With the same testing conditions, the import takes 1 min 20 sec.
--
Marc Delisle | phpMyAdmin
Thanks for your feedback. I'll try yet another improvement to make it faster.
Note to self:
- read X characters but don't restart the search from 0 each time
- search for the closing (unescaped) quote with a lookbehind expression,
  something like `(?<!\\)(\\\\)*'` (see the sketch after this list)
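A rough sketch of both notes combined (hypothetical helper, not the committed code): remember how far the scan already got instead of restarting at offset 0, and use a negative lookbehind so escaped quotes are skipped.

```php
<?php
// Hypothetical sketch, not the final patch. In the pattern, (?<!\\) refuses
// a match starting right after a backslash, and (?:\\\\)* then consumes any
// even run of backslashes, so \' is skipped while \\' is found.

// Returns the byte offset of the next unescaped single quote at or after
// $offset, or false if there is none.
function findUnescapedQuote($buffer, $offset)
{
    $pattern = "/(?<!\\\\)(?:\\\\\\\\)*'/";
    if (preg_match($pattern, $buffer, $m, PREG_OFFSET_CAPTURE, $offset)) {
        // Offset of the quote itself, after any matched backslash pairs.
        return $m[0][1] + strlen($m[0][0]) - 1;
    }
    return false;
}

$sql = "INSERT INTO t VALUES ('it\\'s'), ('ok');";
$offset = 0; // resume point: never reset to 0
while (($pos = findUnescapedQuote($sql, $offset)) !== false) {
    echo "quote at byte $pos\n";
    $offset = $pos + 1; // continue right after this quote
}
```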
Hugues.
Hi,
As asked by Marc, I added an option to import by reading the file as a multibyte string or not. The default configuration won't read it as a multibyte string (because it's too slow…). It seems that the drag-and-drop import doesn't use the default configuration, so whatever you define as the default, it won't be used in this process. Should we create a ticket for this? I think it's possible to fetch it in JavaScript.
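For illustration, such a switch could be wired like this (the class names and the `ReadAsMultibytes` directive are assumptions, not necessarily what the actual patch uses):

```php
<?php
// Illustration only: hypothetical names, not necessarily the merged code.
// The parser asks an injected object for its string primitives, and the
// object is picked once from the (assumed) 'ReadAsMultibytes' setting.

interface StringOps
{
    public function length($s);
    public function pos($haystack, $needle, $offset = 0);
}

// Character-oriented: correct for any multibyte encoding, but slower.
class MultibyteStringOps implements StringOps
{
    public function length($s)                           { return mb_strlen($s); }
    public function pos($haystack, $needle, $offset = 0) { return mb_strpos($haystack, $needle, $offset); }
}

// Byte-oriented: fast, and sufficient for ASCII-safe input such as most dumps.
class NativeStringOps implements StringOps
{
    public function length($s)                           { return strlen($s); }
    public function pos($haystack, $needle, $offset = 0) { return strpos($haystack, $needle, $offset); }
}

// Default is the fast byte-based variant; multibyte reading is opt-in.
$cfg['ReadAsMultibytes'] = false; // assumed configuration directive
$stringOps = $cfg['ReadAsMultibytes']
    ? new MultibyteStringOps()
    : new NativeStringOps();
```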
Hugues.