Efficiency in data processing with BOs


Keith Chisarik
StrataFrame VIP (2.4K reputation)
Group: StrataFrame Users
Posts: 939, Visits: 40K
What do you all consider the most efficient way to process through large amounts of data?



1) Load the BO with all data upfront and then "filter", modify data, move on

2) Load the BO upfront and then "seek" (can you seek for more than one record even?)

3) Load the BO with only the small subset you want to work with, inside a loop (many calls to the database), modify data, move on

4) Don't use a BO at all, use ADO or LINQ to SQL.

5) Other



I have to adjust pricing for a very large volume of records, needing to update or insert 1-6 at a time for a given combination of criteria from a total record count of about 200,000.



Just curious as to your opinions or past experience with this. I know I find myself having to worry about the most efficient way of doing things, since I have never been able to make the .NET/SQL stuff as fast as VFP.

Keith Chisarik
Trent Taylor
StrataFrame Developer (14K reputation)
Group: StrataFrame Developers
Posts: 6.6K, Visits: 7K
What do you all consider the most efficient way to process through large amounts of data?

There is really only ONE best way...especially if you are going to be pulling from a number of different tables and it may require a number of different queries....a CLR stored procedure.  This lets you still program in .NET, but the work takes place at the server.  However, this will only work for SQL Server.  Otherwise you will have to create stored procs in the database language you are using.
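Just to give you a feel for the shape of one (this is purely an illustrative sketch, not code from our product; the class, procedure, table, and column names are all made up), a CLR stored procedure in VB.NET is simply a shared method that runs inside SQL Server and talks to the data over the in-process "context connection":

Imports System.Data.SqlClient
Imports Microsoft.SqlServer.Server

Public Class PricingProcedures

    '-- Hypothetical CLR stored procedure: the query runs entirely on the server,
    '-- and only the final result set is streamed back to the caller.
    <Microsoft.SqlServer.Server.SqlProcedure()> _
    Public Shared Sub GetCustomerSummary(ByVal customerPk As Integer)
        Using conn As New SqlConnection("context connection=true")
            conn.Open()

            Dim cmd As New SqlCommand( _
                "SELECT cs_pk, cs_name FROM Customers WHERE cs_pk = @cs_pk", conn)
            cmd.Parameters.AddWithValue("@cs_pk", customerPk)

            '-- Send the results straight down the pipe to the caller
            SqlContext.Pipe.ExecuteAndSend(cmd)
        End Using
    End Sub

End Class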

Here is an example of the performance increase: we have a program that deals with entering payments that come back on an EOB.  We released our first version sitting on top of VFP and it cratered if the EOB was very large....not to mention slow.  We now have this totally on SQL Server using CLR stored procs and it is basically instantaneous.  We have another scenario where the end-user is posting claims, and there were literally 18+ queries that took place each time in order to check all the exceptions, get the correct fees, and so forth.  This totally died in VFP, but by using CLR stored procs it is literally as fast as you can type....it is incredible!

The last example I will give you relates to our prescription data, there are some tables with millions of records which have to be queried and compared to check for items such as drug-interactions.  Pulling this back to the client would never work, but by using the CLR stored procedure, it is basically as fast as you can type again. 

In the current beta, though there is no documentation yet, there is also a new feature that allows a single query to the server to bring back multiple result sets (assuming the database supports this [i.e. SQL Server]).  I have one example where I have a single query to the server but load 6 business objects...one trip...no stored procs...and FAST!

Dim cmd As New SqlCommand("SELECT * FROM Customers WHERE cs_pk = @cs_pk;" & _
                          "SELECT * FROM CustomerOrders WHERE or_cs_pk = @cs_pk")

'-- Add the parms
cmd.Parameters.AddWithValue("@cs_pk", MyCustomerPk).SqlDbType = SqlDbType.Int

'-- Now execute the query and load the BOs
MicroFour.StrataFrame.Business.BusinessLayer.FillMultipleDataTables(cmd, CustomersBO, CustomerOrdersBO)

I know that you are using DB2 in certain conditions, so this is an issue and will definitely require you to "think out of the box."  DB2 is a more cumbersome and more limiting environment to work within.  I am not as familiar with it so I do not know if it supports the multiple result sets or not. 

All of your questions are valid, but trying to determine whether ADO or LINQ to SQL will help is actually not the issue here.  First, we use ADO.NET within the BO; second, LINQ to SQL is only going to strong-type your queries and really has nothing to do with performance.  The whole issue with large data is generally getting it back from the server, the number of trips, and how the query is formulated.  Also, different scenarios require different solutions, which is why I mentioned several options above.

I have to adjust pricing for a very large volume of records, needing to update or insert 1-6 at a time for a given combination of criteria from a total record count of about 200,000.

In this case, you shouldn't be bringing all 200,000 records back to your side.  If you know what needs to be changed, you are better off working with a much smaller data set (i.e. the pk, code, and fee) and then having the update take place back on the server.  If the end-user has to enter all 200,000 fees, then this is going to happen one or a few at a time, not all 200,000.  So I would imagine that you already know what needs to be changed, so you could create a CLR stored procedure (or TSQL if you want pain :) ) and then have all of this work take place on the server so the data never makes it to the client...just the results and the information that the end-user needs.
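As a rough sketch of what the client side of that looks like (the proc name, parameter names, and connection string below are made up for illustration), the only thing that crosses the wire is the key values and the new fee:

Imports System.Data
Imports System.Data.SqlClient

Public Module PricingClient

    '-- Sketch only: "pricing_UpdateFee" and its parameters are hypothetical.
    '-- The point is that only the key and the new fee travel to the server;
    '-- the 200,000-row pricing table never comes back to the client.
    Public Sub UpdateFee(ByVal recordPk As Integer, ByVal priceCode As String, ByVal newFee As Decimal)
        Using conn As New SqlConnection("<your connection string>")
            conn.Open()

            Dim cmd As New SqlCommand("pricing_UpdateFee", conn)
            cmd.CommandType = CommandType.StoredProcedure
            cmd.Parameters.AddWithValue("@pk", recordPk)
            cmd.Parameters.AddWithValue("@code", priceCode)
            cmd.Parameters.AddWithValue("@fee", newFee)

            cmd.ExecuteNonQuery()
        End Using
    End Sub

End Module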

Whoa...that was a big post, but I hope it gives you some ideas :)

P.S. Using SQL Server versus VFP, we have had in some cases 100 times better performance...SQL Server is WAY faster!  It all comes down to how you deal with the data.

Greg McGuffey
Strategic Support Team Member (4.8K reputation)
Group: Forum Members
Posts: 2K, Visits: 6.6K
Normally I'd consider using stored procedures, which would be the way to go if you were updating all 200,000 at once. But since you aren't...I'd be asking these questions:



- how do you determine what the 1-6 records are? How much user interaction is there? Is the logic complex?

- Should the process be cancellable/within a transaction?

- How fast is the connection to the db?



Greg McGuffey
Strategic Support Team Member (4.8K reputation)
Group: Forum Members
Posts: 2K, Visits: 6.6K
I really need to check if someone else posted before I post...nice info Trent.
Trent Taylor
StrataFrame Developer (14K reputation)
Group: StrataFrame Developers
Posts: 6.6K, Visits: 7K
I really need to check if someone else posted before I post...nice info Trent.

LOL...I do it all the time.  Usually we are both posting at the same time when this happens :D

Keith Chisarik
StrataFrame VIP (2.4K reputation)
Group: StrataFrame Users
Posts: 939, Visits: 40K
I actually will be updating all 200,000 "at once". I have an EDI pricing update file that, from my initial test, updates close to 95% of the existing records and then adds more. So this isn't a user doing entry; it is an automated process that pulls in files (actually many), parses them, and then updates the DB2 pricing tables with the updates and adds for new products or date-sensitive pricing.



I am using DB2 for this so I will have to see what capabilities I have there.



Thanks for the input.



Greg:

- How do you determine what the 1-6 records are? A concatenated key of 3 fields returns the records to be updated.

- How much user interaction is there? Just a button press....hopefully

- Is the logic complex? Not really; pricing is based on three fields.

- Should the process be cancellable/within a transaction? No, I want to be able to run it whenever the user likes, and it just pulls the new data files from the vendors and evaluates what does and does not need to change.

- How fast is the connection to the db? Right now it is a local LAN; eventually it will be via a VPN that isn't that great.

Keith Chisarik
Greg McGuffey
Strategic Support Team Member (4.8K reputation)
Group: Forum Members
Posts: 2K, Visits: 6.6K
Ah, I forgot that you are living the love with DB2.



I'd say that you will likely be better served using DB2 to do the heavy lifting. It appears that DB2 has the equivalent of CLR stored procedures, and they appear to be able to be written in VB.NET or C# too! It also has normal SQL sprocs. The nice thing about the compiled sprocs is that they are efficient at managing complex rules.



Now, the next question is: where are these files, and who processes them?



If the user will be indicating the files to process, then I might try to load the files locally into a BO (doing the parsing along the way), then load that data into a table on DB2, then call the sproc to do the work.
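Something along these lines (PricingStagingBO, its field names, and the file layout are all made up here; this is just to show the shape of the idea):

'-- Parse the vendor file locally into a staging BO...
Dim stagingBO As New PricingStagingBO()

For Each line As String In System.IO.File.ReadAllLines("C:\Imports\pricing.edi")
    Dim parts() As String = line.Split("|"c)

    stagingBO.Add()
    stagingBO.VendorCode = parts(0)
    stagingBO.ProductCode = parts(1)
    stagingBO.Fee = Decimal.Parse(parts(2))
Next

'-- ...one save pushes the staged rows up to a staging table on DB2...
stagingBO.Save()

'-- ...then a single stored procedure call on DB2 merges the staging table
'-- into the live pricing tables, so the heavy lifting stays on the server.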



If the files are in some central repository, you could theoretically get DB2 to do the file parsing work also.



If it takes a while to do the processing, you could of course use another thread.



Just some ideas.



Good luck!
Peter Jones
Advanced StrataFrame User (718 reputation)
Group: Forum Members
Posts: 386, Visits: 2.1K
Hi Trent,

From your response I get the impression that you believe CLR is the way to go with SQL Server. Is this so and, if it is, what made you guys move away from standard SQL stored procs?

Cheers, Peter

Trent Taylor
StrataFrame Developer (14K reputation)
Group: StrataFrame Developers
Posts: 6.6K, Visits: 7K
From your response I get the impression that you believe CLR is the way to go with SQL Server. Is this so and, if it is, what made you guys move away from standard SQL stored procs?

We still use standard TSQL stored procs for INSERTs, UPDATEs, and DELETEs (our CRUD settings).  But for anything past that, or anything that requires customization or custom queries on the server, we use CLR.  We haven't changed any of our standards or logic.  CLR stored procedures are much easier to work within and have more flexibility (especially for .NET developers).  You get all of the benefits of TSQL as it relates to server-side performance (you do take a slight hit using CLR over TSQL, but it is not really noticeable) and get to write all of your code in C# or VB.NET.  Also, the DDT will deploy CLR assemblies for you as well, which just adds to the ease of implementation.

Keith Chisarik
StrataFrame VIP (2.4K reputation)
Group: StrataFrame Users
Posts: 939, Visits: 40K
I am still struggling with speed issues and require some help.



In breaking down the problem, I know that I will probably have to write my own insert sprocs on the DB2 side; for now I am ignoring that.



I am having speed issues before I ever save across the wire, just working with the BO. After much trial, it seems that the call to the add() function of the BO is my first bottleneck. I invested in a code profiler and it clearly shows that to be the issue (see attachment), and I can confirm it by replacing the add() line with just about any other function, and things are fast.



I was under the assumption that working with the BO was essentially the same as working with the underlying datatable, so what am I doing wrong that bo.add() in a loop is taking forever? Adding 87k records to the BO takes about 15 minutes, before BO.save() is ever called. The BO has only 6 fields: 4 strings, a self-incrementing integer PK, and an integer row version field; PrimaryKeyIsAutoIncrement is true and row versioning is set to optimistic row version.
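The loop itself is essentially nothing more than this shape (the field names here are just placeholders, not my real schema):

'-- Roughly the shape of the import loop
For Each parts As String() In parsedPricingLines
    pricingBO.Add()                 '-- this call is the hot spot in the profiler
    pricingBO.VendorCode = parts(0)
    pricingBO.ProductCode = parts(1)
    pricingBO.PriceCode = parts(2)
    pricingBO.ItemDescription = parts(3)
Next

'-- 87k iterations of the above take roughly 15 minutes before BO.save() is ever called.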



Any thoughts?

Keith Chisarik
Attachments
slow.jpg (131 views, 26.00 KB)