StrataFrame Forum

Speed of Updating Descriptions

http://forum.strataframe.net/Topic26730.aspx

By Ger Cannoll - 4/3/2010

I have a SQL Server table which I am updating from VFP tables. It currently has about 200,000 records and is taking about 2 hours to update.

The SQL table consists of about 20 fields from a VFP table, plus an additional 10 fields which are in effect lookup fields, giving descriptions from a series of VFP lookup tables. If I strip out the 'lookup code' bit, the update takes about 5 minutes, so I am fairly sure the bottleneck is in the way I am updating the descriptions, and I wonder if there is a quicker way to do this update.

I have the following code replicated 10 times, once for each of my lookup fields, within my main loop that updates the main data:

1. I set up an OleDbCommand with the connection string and a parameter for the lookup table

2. I then have a While Rdr.Read loop to update a field with the description

Is there a quicker/better way to update these descriptions?
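In outline, the pattern described is roughly this (a reconstruction, not the poster's actual code; the provider string, table and field names are all made up):

    Imports System.Data
    Imports System.Data.OleDb

    Module SlowLookupSketch
        ' A guess at the per-field pattern described above: every main-table
        ' record costs one round trip to the VFP lookup table, which is the
        ' likely bottleneck at 200,000 rows.
        Sub FillRepDescriptions(mainTable As DataTable, vfpConnStr As String)
            Using cn As New OleDbConnection(vfpConnStr)
                cn.Open()
                Using cmd As New OleDbCommand("SELECT descript FROM reps WHERE repno = ?", cn)
                    cmd.Parameters.Add("repno", OleDbType.Char, 10)
                    For Each row As DataRow In mainTable.Rows
                        cmd.Parameters(0).Value = row("repno")
                        Dim desc As Object = cmd.ExecuteScalar() ' one query per row
                        If desc IsNot Nothing Then row("repdesc") = CStr(desc)
                    Next
                End Using
            End Using
        End Sub
    End Module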

By Charles R Hankey - 4/3/2010

Can you just do the join to a normalized table on the VFP side and then import that table?



I didn't quite get how you are doing the "update" but to transfer data I'd suggest SSIS.



Still, if you have the option of doing your joins first and then just bringing it in as one table, no lookups, you'll probably get that 5-minute speed.
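If the lookups really were plain joins, that could look something like this (an untested sketch; the VFPOLEDB provider is assumed and the table names are invented):

    Imports System.Data
    Imports System.Data.OleDb

    Module VfpJoinSketch
        ' Let the VFP engine do the joins and return one denormalized
        ' result set, ready to be pushed into SQL Server.
        Function LoadJoinedData(vfpConnStr As String) As DataTable
            Dim sql As String = _
                "SELECT m.*, r.descript AS repdesc, a.descript AS areadesc " & _
                "FROM maindata m " & _
                "LEFT JOIN reps r ON r.repno = m.repno " & _
                "LEFT JOIN areas a ON a.areacode = m.areacode"
            Dim result As New DataTable()
            Using da As New OleDbDataAdapter(sql, vfpConnStr)
                da.Fill(result)
            End Using
            Return result
        End Function
    End Module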


By Ger Cannoll - 4/3/2010

Hi Charles

Initially I had considered using joins, but it's not a straight join in the sense that there is some additional code once the join is done, like inserting extra text under certain conditions. One alternative I had considered was to use an ArrayList, i.e. initially store the lookup tables into an ArrayList (there are only a few thousand lookup records) and then do an ArrayList.BinarySearch to insert the descriptions. I would have thought this would be faster, but I have absolutely no experience of using ArrayLists. (I used arrays quite a lot in VFP and know they are extremely fast for lookups etc.) I don't suppose there's anything like an IIF(Seek) in .NET?
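For what it's worth, the binary-search idea would look roughly like this (a sketch with made-up sample data; note the keys must be loaded in sorted order), and VB.NET's If() operator (VB 2008 or later) is about the closest thing to VFP's IIF():

    Imports System.Collections

    Module BinarySearchSketch
        Sub Demo()
            ' Keys must be added in sorted order for BinarySearch to work.
            Dim keys As New ArrayList(New String() {"A10", "A42", "B07"})
            Dim descs As New ArrayList(New String() {"Alpha", "Answer", "Bravo"})
            Dim i As Integer = keys.BinarySearch("A42")
            ' If() here plays the role of VFP's IIF(SEEK(...), ..., ...):
            Dim desc As String = If(i >= 0, CStr(descs(i)), "** unknown **")
            Console.WriteLine(desc) ' prints "Answer"
        End Sub
    End Module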

By Edhy Rijo - 4/4/2010

Hi Gerard,



I am sorry, but I don't quite understand what it is that you are trying to accomplish here; would you mind being more explicit?



Are you using VFP and SQL tables all the time in this application, or is this a one-time import?



You said you are updating an SQL table from a VFP table which has 200,000 records, right?



Also, it looks like the problem is in getting the lookup descriptions, which are in the VFP tables, and that this is taking a lot of time at import. Is this right?



Assuming what I understood is correct, you may want to change your approach to importing the records. It would help if you could post the code you are using in a text file here so we can see clearly what you are doing. I have done some imports from CSV data, and it was really slow using the regular BO.Add or just ADO, so I changed my approach to use SqlBulkCopy, which inserted the records into the SQL table really fast, in seconds. Then, after I had the records in SQL, I did the rest of the process there, which was faster overall. Search the forums for "sqlbulkcopy" and you will find my posts and others.
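The core of the SqlBulkCopy approach looks something like this (an illustrative sketch; connection strings and table names are made up):

    Imports System.Data.OleDb
    Imports System.Data.SqlClient

    Module BulkCopySketch
        ' Stream rows straight from a VFP table into a SQL Server table.
        Sub BulkLoad(vfpConnStr As String, sqlConnStr As String)
            Using src As New OleDbConnection(vfpConnStr)
                src.Open()
                Using cmd As New OleDbCommand("SELECT * FROM maindata", src)
                    Using rdr As OleDbDataReader = cmd.ExecuteReader()
                        Using bcp As New SqlBulkCopy(sqlConnStr)
                            bcp.DestinationTableName = "dbo.MainStaging"
                            bcp.BatchSize = 5000
                            bcp.WriteToServer(rdr) ' bulk insert, far faster than row-by-row
                        End Using
                    End Using
                End Using
            End Using
        End Sub
    End Module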



You could also do the following:

1.- Use the SqlBulkCopy approach to create a temporary SQL table with all the lookup data, and also import your 200K records.

2.- Create a stored procedure to do the lookup assignments and any other updates needed (sketched below).



Keep in mind that using VFP/ADO you will be processing one record at a time with a loop/scan, etc., while in MS-SQL with a stored procedure you can process a whole bunch of records in a single pass. In my case, my process done in SF with BOs took more than an hour to import and do some calculations and updates. With the help of an MS-SQL MVP consultant, I created a stored procedure that now processes 500K records in 1-10 minutes. Of course I had to pay him, but his fees are very competitive, he being an MS-SQL MVP.
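
The heart of such a stored procedure is a set-based UPDATE...JOIN; called from .NET it might look something like this (an illustrative sketch; every object name is made up):

    Imports System.Data.SqlClient

    Module SetBasedUpdateSketch
        ' One set-based UPDATE fills a description column for all the rows
        ' in a single pass instead of looping record by record. In practice
        ' the statement would live inside the stored procedure.
        Sub FillDescriptions(sqlConnStr As String)
            Dim sql As String = _
                "UPDATE m SET repdesc = l.descript " & _
                "FROM dbo.MainStaging m " & _
                "JOIN dbo.RepLookup l ON l.repno = m.repno"
            Using cn As New SqlConnection(sqlConnStr)
                cn.Open()
                Using cmd As New SqlCommand(sql, cn)
                    cmd.CommandTimeout = 300 ' large set-based updates can still take a while
                    cmd.ExecuteNonQuery()
                End Using
            End Using
        End Sub
    End Module
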
By Ger Cannoll - 4/4/2010

Hi Edhy,

To give more background:

This is an ongoing requirement, i.e. a process that will be run every night to get data from VFP tables into a SQL Server database, which then feeds a BI application.

Most of the data is in a few VFP tables, and these all go into one SQL table. On this SQL table there are, say, 10 fields which also need a description, e.g. Rep No, Stock Group, Area, Account Category, etc., each of which needs a description. The SQL table is currently at 200,000 records for a 3-month period, but this will grow to 1.5 million for, say, a 2-year period.

I am happy with the main population of the SQL table (which currently takes about 5 minutes) if I leave out the descriptions.

I then have to go back over the file (or use some other method) and populate each of the six description fields from 6 different VFP tables (lookup tables).

(The description population mainly requires a lookup from the VFP lookup tables, but some additional processing also, so I cannot easily use an outer join to these description tables.)

As I will be doing a lot of this sort of stuff over the coming months, I wanted to find the most efficient way of doing this.

By Edhy Rijo - 4/4/2010

I believe that using a stored procedure will be the way to go, especially if this is a process that needs to be run more than once. Importing the lookup tables should take no time and will allow the SP to do its magic.

By Charles R Hankey - 4/4/2010

My vote is for SSIS. This is exactly the kind of stuff it was designed for, and it is very configurable for finding the best performance. Especially if you are using SQL 2008, I think you will be very pleased to find how flexible your options are using SSIS. Very easy to debug as well.


By Charles R Hankey - 4/4/2010

Should also mention this is not "either-or" with a solution using SPs. SSIS lets you run any SQL queries or SPs as part of any of your transformation processes.

By Ger Cannoll - 4/4/2010

Charles and Edhy, thanks for your very informative replies.

(By the way, what is SSIS ?)

I have done a bit of experimenting and got the update (inclusive of descriptions) down to about 10 minutes for 200,000 records.

I had a look at Arrays and ArrayList, but neither seemed to fit the bill. Then I came across a thing called Dictionary, which allows you to store key and value pairs in a type of array and then do a lookup on these with one command. This seems to have worked extremely well in the limited testing I have done so far.

So what I have done is as follows:

1. Initially, store all my keys and descriptions in Dictionaries. This is an initial hit, but it only takes seconds as I am just storing 2 fields for each lookup.

2. When populating my main table, in order to get the descriptions I just do a lookup on the in-memory Dictionary. This means I do not have any overhead of going out to disk and doing any selects etc.; all that's needed is a lookup, which seems to be very fast.
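A minimal sketch of that two-step pattern (assuming the VFP OLE DB provider; table, field, and function names are all made up):

    Imports System.Collections.Generic
    Imports System.Data
    Imports System.Data.OleDb

    Module DictionaryLookupSketch
        ' Step 1: load a lookup table once into an in-memory dictionary,
        ' e.g. LoadLookup(connStr, "SELECT repno, descript FROM reps").
        Function LoadLookup(vfpConnStr As String, sql As String) As Dictionary(Of String, String)
            Dim map As New Dictionary(Of String, String)
            Using cn As New OleDbConnection(vfpConnStr)
                cn.Open()
                Using cmd As New OleDbCommand(sql, cn)
                    Using rdr As OleDbDataReader = cmd.ExecuteReader()
                        While rdr.Read()
                            map(rdr.GetString(0).Trim()) = rdr.GetString(1).Trim()
                        End While
                    End Using
                End Using
            End Using
            Return map
        End Function

        ' Step 2: while populating the main table, each description is a
        ' hash lookup in memory; no trips to disk, no extra SELECTs.
        Sub FillDescriptions(mainTable As DataTable, repDesc As Dictionary(Of String, String))
            For Each row As DataRow In mainTable.Rows
                Dim desc As String = Nothing
                If repDesc.TryGetValue(CStr(row("repno")).Trim(), desc) Then
                    row("repdesc") = desc
                End If
            Next
        End Sub
    End Module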

By Charles R Hankey - 4/5/2010

Sounds like you have it working. Great.



SSIS is SQL Server Integration Services, the successor to DTS (Data Transformation Services). It is a whole environment built into the Business Intelligence Development Studio (BIDS) that is designed specifically for importing/exporting and transforming data. Very useful for importing text files and munging data of all kinds.

I have a pretty large package I've created that migrates and transforms a whole lot of SQL data from one database to another, which we can run on client data when we migrate them to our .NET version from the VFP app (which used SQL data, but with GUID keys etc.).



Very powerful and worth looking into whether or not you use it on this project. Much more user-friendly than DTS was, and it plays well with .NET.


By Dustin Taylor - 4/5/2010

Glad you got it working! A Dictionary would indeed be a good approach for speed, assuming your lookup codes don't bloat too terribly much. For what it is worth, we have similar requirements in our automated conversion program (it converts all the VFP data in our medical app to the appropriate SQL data/structures), and for that application we used SQL bulk copies, as Edhy mentioned, to get the data into SQL; then we play with it from there.

I wasn't involved with the development of that conversion program, so I couldn't tell you why they opted for bulk copies over SSIS, but SSIS is something I do need to become more familiar with regardless. I've used it several times for one-time deals and have been impressed with the speed and capabilities. I haven't ever followed all the way through to make a re-usable package, so thanks for reminding me to take a look, Charles. :)

By Charles R Hankey - 4/5/2010

I think you went with bulk copy over DTS. Lots of people did, for good reason. DTS was pretty clunky in a lot of ways and very hard to tweak. SSIS 2005 was better, but the tools were still a bit lacking in the IDE. I think with SSIS 2008 it is really worth a look.



It is worth signing up for the $50-a-month access to the LearnDevNow.com videos (the AppDev videos minus the coursebooks etc.) just for the SSIS lessons. I think one trip through the 12 hours of training material there and you pretty much have what you need to figure out what's required for a project.