On 2012-03-22 20:00, adeel khan wrote:
Hi,

Well, I am interested in the GSoC project on improving the synchronization feature, which really does have problems with big tables. I already have first-hand bad experience syncing huge tables: last year, while researching tools with sync capability, I tested this feature on a huge table (> 0.1 million rows) and the app/browser stopped responding. If I remember correctly, the main cause was probably a timeout. Since the ideas list asks to verify this first, I gave it another try and got the same result: on a database with just one table of 20K+ records, the script ran for about 5 minutes and then hit the PHP timeout. Here I was only syncing a single source table to the destination.
I thought about a solution, and here is what I can come up with right now.

First, since the app generates everything before showing the differences to the user, that is the major bottleneck. In my 20K-tuple example I was surprised it did not give a timeout error on the first dialog. In the worst case, generating a diff for the entire database is simply not a scalable approach. Even given plenty of time and memory, the user has other things to do and will not wait a long time for the sync to finish.

I think the whole thing should be done in batches. The idea is that the structural diff usually takes little time and memory, so it is fine to generate it for the entire database; for the data, however, generate only as much as the script can get through before it times out (for the cases where we have huge tables). The user then gets the complete structural diff, plus whatever data diff we managed to compute in that pass. The time allowed per pass is bounded by the script execution time, which is 5 minutes as far as I can see, so whatever is generated and applied has to fit within that budget.

If the user chooses the "synchronize all" option, the sync proceeds without any issue, and he/she lands back on the same page with the next set of differences that were not evaluated in the previous pass. Obviously there will be no structural diff at that point, since those were already resolved.

The remaining issue is selective diffing: the user might be interested in the diff of a table which the app has not generated because it did not fall into the first pass. The method above forces the user to apply the initial pass's diff before being able to reach the desired table. This can be resolved with a skip feature, so the user can jump ahead to later tables, just like pagination.
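To make the batching idea concrete, here is a minimal sketch of a time-bounded diffing pass. Everything here is hypothetical: `diff_table_batch`, `TIME_BUDGET`, and the resume-state dict are illustrative names I am introducing, not real phpMyAdmin APIs, and the 300-second budget simply mirrors the 5-minute PHP timeout mentioned above.

```python
import time

TIME_BUDGET = 300   # seconds; stand-in for PHP's max_execution_time
BATCH_SIZE = 1000   # rows compared per batch (arbitrary for illustration)

def run_pass(tables, diff_table_batch, resume_state=None):
    """Generate data diffs batch by batch until the time budget runs out.

    diff_table_batch(table, offset, size) is assumed to return
    (list_of_diffs, exhausted_flag). Returns (diffs, resume_state);
    resume_state records where the next pass should pick up, or None
    if every table was fully diffed in this pass.
    """
    start = time.monotonic()
    diffs = []
    state = resume_state or {"table_idx": 0, "offset": 0}
    i, offset = state["table_idx"], state["offset"]
    while i < len(tables):
        # Stop before exceeding the budget; the caller shows what we have
        # so far and stores the resume point for the next pass.
        if time.monotonic() - start > TIME_BUDGET:
            return diffs, {"table_idx": i, "offset": offset}
        batch_diffs, exhausted = diff_table_batch(tables[i], offset, BATCH_SIZE)
        diffs.extend(batch_diffs)
        if exhausted:
            i, offset = i + 1, 0   # move on to the next table
        else:
            offset += BATCH_SIZE
    return diffs, None  # all tables covered in this pass
```

The point of the `resume_state` return value is exactly the "land on the same page with the next possible difference" behaviour described above: the next pass starts where the previous one stopped.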
I was thinking about profiling and checking for memory consumption.
Adeel, if you design a solution that works in batches, how will you split your batches to ensure that batch 1 of a table in the source db can be compared to batch 1 of the same table in the target db?
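One possible answer to this question, sketched below under my own assumptions (the table and column names are hypothetical, and SQLite stands in for MySQL): define batches by primary-key ranges (keyset pagination) rather than by row position. A slice bounded by key values yields comparable slices on source and target even when the two sides contain different numbers of rows, which an OFFSET-based slice would not.

```python
import sqlite3

def fetch_batch(conn, table, pk, after_key, size):
    """Return up to `size` rows with primary key > after_key, ordered by pk.

    Because the slice is defined by key values rather than OFFSET, calling
    this with the same `after_key` on the source and target connections
    produces directly comparable batches. `table` and `pk` are assumed to
    be trusted identifiers here, not user input.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {pk} > ? ORDER BY {pk} LIMIT ?",
        (after_key, size),
    )
    return cur.fetchall()
```

The next batch starts after the last key seen, so missing rows on either side shift that side's batch contents but never misalign the key ranges being compared.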