Hi,
Well, I am interested in the GSoC project about improving the synchronization feature, which really does have problems with big tables. I already have bad experience with synchronizing huge tables: last year I was testing this feature while researching tools with sync capability, and after using a huge table (> 0.1 million rows) the app/browser stopped responding. If I remember correctly, the main cause was probably a timeout. Since the ideas list asks to check this first, I gave it another try, and it again gives me the same result. I tested it on a db with just one table of 20K+ records; the script executed for about 5 minutes and then there was a PHP timeout. Here I am just syncing the source table (a single table) to the destination.
I thought about the solution, and here is what I can come up with right now. First, since the app generates everything before showing the differences to the user, this is the major cause of the bottleneck. In my 20K-tuple example I was amazed that it did not give me any timeout error on the first dialog. In the worst case, generating a diff for the entire db is simply not a scalable approach; even with plenty of time and memory, the user has other things to do and will not wait that long for the sync to finish.
I think the whole thing should be done in batches. The idea is that the structural diff is usually less time- and memory-consuming, so it is better to generate it for the entire db; for data, generate only as much as the script can handle without timing out (for the cases where we have huge tables). Any user then gets the complete structural diff, plus whatever data diff we managed to work through in that pass. The time allowed for a pass is bounded by the script execution time, which is 5 minutes as far as I can see, so whatever is generated and applied has to fit within that limit. If the user goes for the "synchronize all" option, it syncs without any issue and he/she lands on the same page again with the next set of differences that were not evaluated in the previous pass. Obviously there is no structural diff at that point, as it has already been resolved.
The remaining issue is when the user wants a selected diff. The user might be interested in a diff of a table that the app is not generating because it does not come up in the first pass. The method above forces the user to apply the initial pass before being able to reach the desired table. This can be resolved with a skipping feature, so that the user can skip ahead to the next tables, just like the concept of pagination.
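To make the batching idea a bit more concrete, here is a rough sketch of what a single pass could look like. It is only an illustration: getTimeBudget, generateStructuralDiff, generateDataDiffChunk and tablesFrom are hypothetical helpers that do not exist in phpMyAdmin, and the chunk size of 1000 rows is an arbitrary assumption.

<?php
// Reserve part of the configured execution time for rendering the result page.
function getTimeBudget()
{
    $max = (int) ini_get('max_execution_time');   // 0 means "no limit"
    return ($max > 0) ? (int) ($max * 0.8) : 60;  // fall back to 60 seconds
}

function generateDiffPass($sourceDb, $targetDb, $startTable, $startRow)
{
    $deadline = time() + getTimeBudget();

    // The structural diff is cheap, so it is always produced for the whole db.
    $diff = generateStructuralDiff($sourceDb, $targetDb);

    // The data diff is produced in chunks until the time budget runs out;
    // the position reached is returned so the next pass can resume there.
    foreach (tablesFrom($sourceDb, $startTable) as $table) {
        $row = ($table === $startTable) ? $startRow : 0;
        while (true) {
            if (time() >= $deadline) {
                return array('diff' => $diff, 'resume' => array($table, $row));
            }
            // assumed to report 'exhausted' once the table has no more rows
            $chunk = generateDataDiffChunk($sourceDb, $targetDb, $table, $row, 1000);
            $diff['data'][$table][] = $chunk;
            $row += 1000;
            if ($chunk['exhausted']) {
                break;  // this table is done, move on to the next one
            }
        }
    }
    return array('diff' => $diff, 'resume' => null);  // everything was covered
}

The 'resume' value is what a "skip" or "next" action would carry over to the following pass.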
I was thinking about profiling and checking for memory consumption.
My brief intro: I am currently doing an MS in Computer Science at LUMS, Lahore, Pakistan. Previously I did my BS with a major in CS at FAST-NU. I am interested in systems stuff like OSes, distributed systems, networks, and any good/cool software. I started programming in C/C++ way back in 2004. I also have a little industry experience working on mobile applications (J2ME and Android). I started working with PHP and MySQL in 2008, mostly because of their ease of use and their syntactic closeness to C. My best open-source/development experience is/was working in GSoC 2010 on a synchronization plus visualization feature for pgAdmin (a Postgres client). Other than that, I have fixed bugs and developed a feature for the open-source project "OrangeHRM" (a tool for human resource management).
Regards,
adeel
Hi,
On 3/22/12 8:00 PM, adeel khan wrote:
[[snip]]
I think the whole thing should be done in batches. The idea is that the structural diff is usually less time- and memory-consuming, so it is better to generate it for the entire db; for data, generate only as much as the script can handle without timing out. [[snip]] The time allowed for a pass is bounded by the script execution time, which is 5 minutes as far as I can see.
For what it's worth, the script execution time is configurable and will differ depending on the hosting environment; so we can't count on any specific predefined number.
[[snip]]
On Fri, Mar 23, 2012 at 5:40 AM, Isaac Bennetch bennetch@gmail.com wrote:
Hi,
[[snip]]
For what it's worth, the script execution time is configurable and will differ depending on the hosting environment; so we can't count on any specific predefined number.
But since we can set the timeout to a minimum execution time programmatically, we can always be assured of getting at least that much time for a pass.
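For example (a sketch only; it assumes the host does not disable set_time_limit(), e.g. via safe mode, and the 120-second minimum is just an assumed value):

<?php
$minimum    = 120;                                  // assumed per-pass budget in seconds
$configured = (int) ini_get('max_execution_time');  // 0 means "no limit"
if ($configured > 0 && $configured < $minimum) {
    @set_time_limit($minimum);  // may silently have no effect on restricted hosts
}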
adeel
Hi
On Mon, 26 Mar 2012 02:54:16 +0500 adeel khan ak1733@gmail.com wrote:
But since we can set the timeout to a minimum execution time programmatically, we can always be assured of getting at least that much time for a pass.
This is already used in importing; see libraries/import.lib.php, especially PMA_checkTimeout, so it might be turned into a more generic solution.
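For reference, a rough sketch of what such a generic helper might look like; the real PMA_checkTimeout() in libraries/import.lib.php may well differ in the details, and sync_timeout_passed() below is just an illustrative name:

<?php
$GLOBALS['sync_start_time'] = time();

function sync_timeout_passed($reserve = 5)
{
    $max = (int) ini_get('max_execution_time');
    if ($max <= 0) {
        return false;  // no limit configured, nothing to worry about
    }
    // keep a few seconds in reserve so the partial result can still be rendered
    return (time() - $GLOBALS['sync_start_time']) >= ($max - $reserve);
}

// inside the diff/apply loop:
//     if (sync_timeout_passed()) { save the position reached and end this pass; }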
Hi, I had not checked this mail earlier. Sorry Michal, I did not apply to GSoC this time because of some constraints. I hope someone else will improve the sync feature for big tables by going through this idea; I think it is workable. As you already mentioned, importing uses this scheme to bound script execution time, because the same situation applies to importing data: you need some assurance that the script will run for at least a minimal amount of time, and you can always tell the user (which is not the case in import) how much data has been inserted so far.
adeel
On Thu, Apr 5, 2012 at 6:33 PM, Michal Čihař michal@cihar.com wrote:
Hi
On Mon, 26 Mar 2012 02:54:16 +0500 adeel khan ak1733@gmail.com wrote:
[[snip]]
This is already used in importing; see libraries/import.lib.php, especially PMA_checkTimeout, so it might be turned into a more generic solution.
-- Michal Čihař | http://cihar.com | http://blog.cihar.com
On 2012-03-22 20:00, adeel khan wrote:
[[snip]] I think the whole thing should be done in batches. The idea is that the structural diff is usually less time- and memory-consuming, so it is better to generate it for the entire db; for data, generate only as much as the script can handle without timing out. [[snip]]
Adeel, if you design a solution that works in batches, how will you split your batches to ensure that batch 1 of a table in the source db can be compared to batch 1 of the same table in the target db?
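To make the question concrete, one possible way to keep batches aligned would be to split them by primary-key range rather than by row offset, so a batch is defined by a key interval instead of a position. This is only a sketch of that idea, not something settled in this thread: it assumes an integer primary key, and source_batch/target_batch are made-up helper names.

<?php
function source_batch(mysqli $src, $table, $pk, $lastKey, $size = 1000)
{
    // next $size rows of the source table, ordered by primary key
    $sql = sprintf(
        'SELECT * FROM %s WHERE %s > %d ORDER BY %s LIMIT %d',
        $table, $pk, (int) $lastKey, $pk, $size
    );
    return $src->query($sql);
}

function target_batch(mysqli $dst, $table, $pk, $lowKey, $highKey)
{
    // the matching interval (lowKey, highKey] on the target table
    $sql = sprintf(
        'SELECT * FROM %s WHERE %s > %d AND %s <= %d ORDER BY %s',
        $table, $pk, (int) $lowKey, $pk, (int) $highKey, $pk
    );
    return $dst->query($sql);
}

The highest key seen in the source batch becomes $highKey for the matching target query and $lastKey for the next batch, so "batch 1" always covers the same key range on both sides regardless of how many rows each side actually contains.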