[Phpmyadmin-devel] Improving Sync for Big Tables

Fri Mar 23 13:03:34 CET 2012

On 2012-03-22 20:00, adeel khan wrote:
> Hi,
> Well, I am interested in the GSoC project on improving the
> synchronization feature, which really has problems with big tables.
> I have already had a bad experience synchronizing huge tables. Last year
> I was testing this feature while researching tools with sync
> capability, and with a huge table (more than 0.1 million rows) the
> app/browser stopped responding. If I remember correctly, the main cause
> was probably a timeout.
> As you wrote in the ideas list that it should be checked first, I gave
> it another try, and it gave me the same result again. I tested it on a
> database with just one table of 20K+ records: the script ran for about
> 5 minutes and then hit the PHP timeout. In this test I was only syncing
> a single source table to the destination.
> 
> I have thought about a solution, and here is what I can come up with
> right now. First, your app generates everything before showing the
> differences to the user, and this is the major cause of the bottleneck.
> In my 20K-row example I was surprised that it didn't give me any timeout
> error on the first dialog. In worst-case scenarios, generating a diff
> for the entire database is simply not a scalable approach. Even if you
> have plenty of time and memory, the user has other things to do and
> will not wait a long time for the sync to finish. I think the whole
> thing should be done in batches. The idea is that the structural diff
> usually takes less time and memory, so it is better to generate it for
> the entire database, but for data, generate just enough that the script
> does not time out (in cases where we have huge tables).
> Every user would then get the whole structural diff, plus as much of the
> data diff as we are able to get through in that pass. The time allowed
> for a pass is bounded by the script execution time, which is 5 minutes
> as far as I can see, so whatever has to be generated and applied must
> fit within that limit. If the user then chooses the "sync all" option,
> that pass would be synchronized without any issue, and he/she would
> land on the same page again with the next differences that were not
> evaluated in the previous pass. Obviously there would be no structural
> diff at that point, as it has already been resolved.
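A minimal Python sketch of the time-bounded pass described above. It assumes the data diff is computed one whole table at a time. All names (`run_sync_pass`, `diff_structure`, `diff_table_data`, `budget_seconds`, `start_table`) are hypothetical and are not part of phpMyAdmin. The structural diff is always computed in full. The data diff stops once the budget is used up, and `next_table` is the index where a following pass would resume. The sketch does not address how to split a single large table across passes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PassResult:
    structure_diff: List[str]         # always the complete structural diff
    data_diffs: Dict[str, List[str]]  # data diff only for tables reached in this pass
    next_table: Optional[int]         # index to resume from; None when all tables are done


def run_sync_pass(
    tables: List[str],
    diff_structure: Callable[[List[str]], List[str]],  # hypothetical callback
    diff_table_data: Callable[[str], List[str]],        # hypothetical callback
    budget_seconds: float,
    start_table: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> PassResult:
    """One pass of a batched sync (sketch): full structure, partial data."""
    deadline = clock() + budget_seconds
    # Assumption: the structural diff is cheap enough to compute for the whole database.
    structure = diff_structure(tables)
    data: Dict[str, List[str]] = {}
    for i in range(start_table, len(tables)):
        # The budget is only checked between tables; one huge table can still overrun it.
        if clock() >= deadline:
            return PassResult(structure, data, i)
        data[tables[i]] = diff_table_data(tables[i])
    return PassResult(structure, data, None)


if __name__ == "__main__":
    # Fake clock that advances one "second" per call, so the cut-off is deterministic.
    ticks = iter(range(1000))
    result = run_sync_pass(
        ["a", "b", "c", "d"],
        diff_structure=lambda ts: [],
        diff_table_data=lambda name: [f"{name}: 1 row differs"],
        budget_seconds=2.5,
        clock=lambda: next(ticks),
    )
    print(list(result.data_diffs), result.next_table)  # ['a', 'b'] 2
```

On the next pass the caller would supply `start_table=result.next_table`. Once the structural changes have been applied, `diff_structure` would simply return an empty list.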
> The remaining issue is what happens if the user wants only a selected
> diff. The user might be interested in the diff of a table that the app
> has not generated because it did not fall within the first pass. The
> method above forces the user to apply the initial pass's diff before
> being able to reach the desired table. This can be resolved by providing
> a skip feature so that the user can skip ahead to the next tables, much
> like pagination.
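The skip feature could work like pagination over the table list. This minimal Python sketch assumes a fixed number of tables per page, which is a simplification: the proposal bounds a pass by time, not by table count. `page_of`, `diff_page` and `diff_table_data` are hypothetical names. Tables on skipped pages are never diffed, so the user can reach a given table without applying the earlier diffs.

```python
from typing import Callable, Dict, List


def page_of(tables: List[str], table: str, page_size: int) -> int:
    """Return the page on which `table` appears (raises ValueError if absent)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return tables.index(table) // page_size


def diff_page(
    tables: List[str],
    diff_table_data: Callable[[str], List[str]],  # hypothetical per-table data diff
    page: int,
    page_size: int,
) -> Dict[str, List[str]]:
    """Compute data diffs only for the tables on `page`; earlier pages are skipped, not applied."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    start = page * page_size
    return {name: diff_table_data(name) for name in tables[start:start + page_size]}


if __name__ == "__main__":
    tables = ["users", "orders", "items", "logs", "audit"]
    diffed: List[str] = []

    def fake_diff(name: str) -> List[str]:
        diffed.append(name)  # record which tables were actually compared
        return []

    # Jump straight to the page containing "logs" without touching earlier tables.
    page = page_of(tables, "logs", page_size=2)
    diff_page(tables, fake_diff, page, page_size=2)
    print(page, diffed)  # 1 ['items', 'logs']
```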
> 
> I was also thinking about profiling the feature and checking its memory
> consumption.

Adeel,
if you design a solution that works in batches, how will you split your
batches to ensure that batch 1 of a table in the source db can be
compared to batch 1 of the same table in the target db?

-- 
Marc Delisle
http://infomarc.info