On 2012-03-22 20:00, adeel khan wrote:
Hi,

Well, I am interested in the GSoC project on improving the synchronization feature, which really does have problems with big tables. I already have first-hand bad experience syncing huge tables: last year, while researching tools with sync capability, I tested this feature on a huge table (> 0.1 million rows) and the app/browser stopped responding. If I remember correctly, the main cause was probably a timeout. Since the ideas list asks to verify this first, I gave it another try and got the same result: on a database with just one table of 20K+ records, the script ran for about 5 minutes and then hit the PHP timeout. Here I was only syncing a single source table to the destination.
I thought about a solution, and here is what I can come up with right now.

First, since the app generates everything before showing the differences to the user, that is the major bottleneck. In my 20K-tuple example I was surprised it did not give a timeout error on the first dialog. In the worst case, generating a diff for the entire database is simply not a scalable approach. Even given plenty of time and memory, the user has other things to do and will not wait a long time for the sync to finish.

I think the whole thing should be done in batches. The idea is that the structural diff usually takes little time and memory, so it is fine to generate it for the entire database; for the data, however, generate only as much as the script can get through before it times out (for the cases where we have huge tables). The user then gets the complete structural diff, plus whatever data diff we managed to compute in that pass. The time allowed per pass is bounded by the script execution time, which is 5 minutes as far as I can see, so whatever is generated and applied has to fit within that budget.

If the user chooses the "synchronize all" option, the sync proceeds without any issue, and he/she lands back on the same page with the next set of differences that were not evaluated in the previous pass. Obviously there will be no structural diff at that point, since those were already resolved.

The remaining issue is selective diffing: the user might be interested in the diff of a table which the app has not generated because it did not fall into the first pass. The method above forces the user to apply the initial pass's diff before being able to reach the desired table. This can be resolved with a skip feature, so the user can jump ahead to later tables, just like pagination.
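To make the batching idea concrete, here is a minimal sketch of a time-bounded diffing pass. Everything here is hypothetical: `diff_table_batch`, `TIME_BUDGET`, and the resume-state dict are illustrative names I am introducing, not real phpMyAdmin APIs, and the 300-second budget simply mirrors the 5-minute PHP timeout mentioned above.

```python
import time

TIME_BUDGET = 300   # seconds; stand-in for PHP's max_execution_time
BATCH_SIZE = 1000   # rows compared per batch (arbitrary for illustration)

def run_pass(tables, diff_table_batch, resume_state=None):
    """Generate data diffs batch by batch until the time budget runs out.

    diff_table_batch(table, offset, size) is assumed to return
    (list_of_diffs, exhausted_flag). Returns (diffs, resume_state);
    resume_state records where the next pass should pick up, or None
    if every table was fully diffed in this pass.
    """
    start = time.monotonic()
    diffs = []
    state = resume_state or {"table_idx": 0, "offset": 0}
    i, offset = state["table_idx"], state["offset"]
    while i < len(tables):
        # Stop before exceeding the budget; the caller shows what we have
        # so far and stores the resume point for the next pass.
        if time.monotonic() - start > TIME_BUDGET:
            return diffs, {"table_idx": i, "offset": offset}
        batch_diffs, exhausted = diff_table_batch(tables[i], offset, BATCH_SIZE)
        diffs.extend(batch_diffs)
        if exhausted:
            i, offset = i + 1, 0   # move on to the next table
        else:
            offset += BATCH_SIZE
    return diffs, None  # all tables covered in this pass
```

The point of the `resume_state` return value is exactly the "land on the same page with the next possible difference" behaviour described above: the next pass starts where the previous one stopped.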
I was thinking about profiling and checking for memory consumption.
Adeel, if you design a solution that works in batches, how will you split your batches to ensure that batch 1 of a table in the source db can be compared to batch 1 of the same table in the target db?
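One possible answer to this question, sketched below under my own assumptions (the table and column names are hypothetical, and SQLite stands in for MySQL): define batches by primary-key ranges (keyset pagination) rather than by row position. A slice bounded by key values yields comparable slices on source and target even when the two sides contain different numbers of rows, which an OFFSET-based slice would not.

```python
import sqlite3

def fetch_batch(conn, table, pk, after_key, size):
    """Return up to `size` rows with primary key > after_key, ordered by pk.

    Because the slice is defined by key values rather than OFFSET, calling
    this with the same `after_key` on the source and target connections
    produces directly comparable batches. `table` and `pk` are assumed to
    be trusted identifiers here, not user input.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {pk} > ? ORDER BY {pk} LIMIT ?",
        (after_key, size),
    )
    return cur.fetchall()
```

The next batch starts after the last key seen, so missing rows on either side shift that side's batch contents but never misalign the key ranges being compared.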