StrataFrame Forum

seek() slowness

http://forum.strataframe.net/Topic7826.aspx

By Keith Chisarik - 4/2/2007

I have some code that loops through about a quarter of a million records, and it is taking a very long time (I stopped it after 15 minutes and it wasn't close to done).



I took out all the processing in the loop so that all it was doing was looping with a single Seek(), and it was still slow. I replaced the Seek() with "If 1 = 2 Then" and it ran as expected (about 20 seconds).



I am seek()ing against a single integer field that I have an index on in SQL. Help????



Code:

searchstr = "rent_RPMrecnum = " & "'" & currow("record_number") & "'"

'If Me.RentitemsBO1.Seek(searchstr) Then

‘do nothing

Else

‘still do nothing for testing purposes

End if





Thanks.








By Greg McGuffey - 4/2/2007

I'd tend not to use .NET, or any procedural language, for this. I'd try to figure out how to use set-based SQL to get the job done. Much faster.
By Keith Chisarik - 4/2/2007

"Utt oh", Trent called Ben. I must have done something really dumb Smile



Trent: /points and laughs

Ben:


By StrataFrame Team - 4/2/2007

It doesn't matter that you have an index in SQL Server... it's irrelevant once you have the data in a DataTable.  What you want to do is add an index to your DataTable.  But since you cannot directly add an index, you'll have to change the column that it considers to be the "primary key".  I've never done this before, but the only index that ADO.NET keeps on a DataTable is the one on the primary key column.  That should certainly help speed things up.

table.PrimaryKey = new DataColumn[] {table.Columns["columnname"]};

I'm not entirely sure if you can do this once data is already in the table... if you can't, you'll need to create a new table, set the PK, and then copy the data over.

By Keith Chisarik - 4/2/2007

Greg,



Thanks but "set base SQL" doesn't mean much to me. I used to do this with millions of records in VFP and it didn't blink, I must have something wrong... and seek() and the "wait window" were selling point's for me so I want to know how to make it work like it should. If that falls through i will look up "set base SQL", thanks.
By Keith Chisarik - 4/2/2007

I am seeking against a BO that already has a primary key field defined. I'm confused by your fix, Ben Sad
By StrataFrame Team - 4/2/2007

Ah, the PrimaryKey property of a DataTable isn't really used by us... it's just used by ADO.NET to allow the Find() method to locate records based upon the internal index.  That index is the only index that exists on a DataTable in ADO.NET, so if you want to speed up seeking on a column, you'll need to somehow move your DataTable's index to that column.  Hence, you redefine the PK, which moves the index to that column.

I'm not talking about changing the PK in the database... just the DataTable.PrimaryKey property on the BO.CurrentDataTable itself (just your local copy).

By StrataFrame Team - 4/2/2007

Oh, and my sample was in C#, sorry, here's the VB:

BO.CurrentDataTable.PrimaryKey = New DataColumn() {BO.CurrentDataTable.Columns("columnname")}
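
And in case it turns out you can't reassign the PrimaryKey once the table already has data in it, the "copy it over" route I mentioned would look roughly like this (untested sketch; "columnname" is whatever column you're seeking on):

' Untested sketch: clone the schema, point the lone ADO.NET index at the seek
' column on the empty copy, then import the rows.
Dim source As DataTable = BO.CurrentDataTable
Dim indexed As DataTable = source.Clone()      ' same columns, no rows

indexed.PrimaryKey = New DataColumn() {indexed.Columns("columnname")}

For Each row As DataRow In source.Rows
    indexed.ImportRow(row)                     ' copies values and row state
Next

Once the PrimaryKey points at that column, ADO.NET's Rows.Find() goes straight to the internal index, e.g. Dim hit As DataRow = indexed.Rows.Find(seekValue), where seekValue is the integer you would otherwise pass to Seek().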

By Keith Chisarik - 4/2/2007

Cool, that took it from unusable to just slow (good enough to get me past the hump), thank you. BigGrin



So if I want it faster (VFP fast) I should try to write a stored proc in SQL or something like Greg suggested?



(it has been about 2 minutes and it is 1/2 done)



Thankfully this procedure only has to be done once a day.


By Keith Chisarik - 4/2/2007

Actually I get an error when calling add() on the BO after adding your code. If I remove your code, it runs fine again, but back to a crawl.



The column is set to not allow nulls, and has a default value of 0 defined.





System.Data.NoNullAllowedException was unhandled by user code

Message="Column 'rent_RPMrecnum' does not allow nulls."

Source="System.Data"

StackTrace:

at System.Data.DataColumn.CheckNullable(DataRow row)

at System.Data.DataTable.RaiseRowChanging(DataRowChangeEventArgs args, DataRow eRow, DataRowAction eAction, Boolean fireEvent)

at System.Data.DataTable.SetNewRecordWorker(DataRow row, Int32 proposedRecord, DataRowAction action, Boolean isInMerge, Int32 position, Boolean fireEvent, Exception& deferredException)

at System.Data.DataTable.InsertRow(DataRow row, Int32 proposedID, Int32 pos, Boolean fireEvent)

at System.Data.DataRowCollection.Add(DataRow row)

at MicroFour.StrataFrame.Business.BusinessLayer.NewRow()

at MicroFour.StrataFrame.Business.BusinessLayer.Add(Boolean CheckSecurity)

at MicroFour.StrataFrame.Business.BusinessLayer.Add()

at ladesktop.frmSyncTables.syncrentitems(String p_sqlstring) in V:\2005 Projects\LightActionRPMScan\ladesktop\frmSyncTables.vb:line 564

at ladesktop.frmSyncTables.bgwRentitemsSync_DoWork(Object sender, DoWorkEventArgs e) in V:\2005 Projects\LightActionRPMScan\ladesktop\frmSyncTables.vb:line 103

at System.ComponentModel.BackgroundWorker.OnDoWork(DoWorkEventArgs e)

at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)
By Keith Chisarik - 4/2/2007

and yes, that is the column I set to be the new PK



Me.RentitemsBO1.CurrentDataTable.PrimaryKey = New DataColumn() {Me.RentitemsBO1.CurrentDataTable.Columns("rent_RPMrecnum")}
By Trent L. Taylor - 4/2/2007

Trent: /points and laughs

Not at all!!!  Wink

and yes, that is the column I set to be the new PK

The reason that we have never done this is that there are better ways to work through large data sets.  This is, I believe, what Greg was referring to in his post regarding "set-based SQL."  You are correct that VFP could handle a million-plus records very easily...but SQL can go MUCH bigger.  You are used to working in a "connected" data environment, which is what FoxPro was (and the reason for all of the corrupted memo fields Smile).  With SQL Server and business objects, you are working in a disconnected environment, so you want to work with as little data on the client side as possible.

When we need to work with very large data sets, for example prescription contraindications, which can involve a million-plus records, we create an SQL Project, deploy it to SQL Server, and then execute it through a stored procedure.  The DDT will automatically create the stored procedures required to execute it for you.  What is nice about this is that all of the work is done on the server, like a T-SQL stored procedure, but you are talking ADO.NET or whatever else you need.  You can then send your ADO.NET results back to the BO.  This is the best approach when dealing with large data sets....or create a T-SQL stored procedure, but that is not nearly as fun Smile
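
Just to illustrate the client-side plumbing (this is plain ADO.NET rather than the DDT-generated wrapper, it's untested, and the procedure name, parameter, and connection string are all made up):

' Untested sketch (requires Imports System.Data and System.Data.SqlClient).
' The heavy lifting happens in the stored procedure on the server; the client
' just pulls back the results.
Using cn As New SqlConnection(connectionString)
    Using cmd As New SqlCommand("dbo.usp_SyncRentItems", cn)
        cmd.CommandType = CommandType.StoredProcedure
        cmd.Parameters.AddWithValue("@BatchDate", Date.Today)

        Dim results As New DataTable()
        Using da As New SqlDataAdapter(cmd)
            da.Fill(results)
        End Using

        ' ...hand "results" back to the BO from here
    End Using
End Using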

By Greg McGuffey - 4/2/2007

By "set based" I'm referring to the way SQL does things the most efficiently. It works best on sets of records, not on individual records (or rows). Looping through the BO (in SQL Server, cursors) is row based. Row based manipulation provide lots of power as to what can get done, but is not as fast (generally) as using SQL to manipulate a whole set at once (that is not as fast for SQL server to get the work done. Languages like .NET are more efficient at row based manipulation. Which is fastest would probably depend on all sorts of variables, like efficiency of code, hardward, connectivity, ect.).



Set-based manipulation is great if you have something you want to do to the data based on some criterion. E.g., let's say you have a database containing a table of movie rentals. You want to mark all rentals more than a year old as needing maintenance (like polishing the DVDs). It is more efficient to use an update statement than to loop through the rows:



Update rentals -- table with rentals
Set MaintenanceStatus = 1 -- let's say this means it needs maintenance
Where PurchaseDate <= DateAdd(year, -1, GetDate()) -- gets the date one year ago




Since this sort of thing only needs to be done infrequently, and the user doesn't actually need to see the data to get the job done, a SQL statement is the way to go. You could create a sproc (for max speed, as it would be precompiled in SQL), or you could just create a method in a BO.
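
For example, a method hanging off a BO (or anywhere else) could just push that statement to the server without pulling a single row down. A rough, untested sketch (connection string and method name are made up):

' Untested sketch (requires Imports System.Data.SqlClient): run the set-based
' update entirely on the server.
Public Sub FlagOldRentalsForMaintenance(ByVal connectionString As String)
    Dim sql As String = _
        "UPDATE rentals SET MaintenanceStatus = 1 " & _
        "WHERE PurchaseDate <= DATEADD(year, -1, GETDATE())"

    Using cn As New SqlConnection(connectionString)
        Using cmd As New SqlCommand(sql, cn)
            cn.Open()
            Dim rowsAffected As Integer = cmd.ExecuteNonQuery()
        End Using
    End Using
End Sub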



However, there are times when this won't work too well. If what you need to do is complicated or not data related or the data is in another database or the user needs to see all the data to decide what to do (the data would thus be on the client anyway), then row level processing is needed. It turns out that it isn't actually needed very often though.



I hope this helps. I come from an MS Access background, not a VFP one, so I had to learn a lot about SQL Server as I had it doing lots of work to speed things up. It sounds like VFP was speedy (I always heard that, but I wasn't allowed to use it when I started doing db development... long story). The whole disconnected data thing in .NET is different and I'm still getting used to it. It may be that this world view is now inaccurate, or not as accurate, as it might be more efficient to use a loop in .NET than in SQL Server. I'm watching this post, hoping to learn a lot also Hehe
By Greg McGuffey - 4/2/2007

Man, if I'd just waited a few minutes, I would have seen what Trent posted....Blink



Seems I'm not as much of a dinosaur as I thought w00t
By Ben Hayat - 4/2/2007

The reason that we have never done this is that there are better ways to work through large data sets. This is, I believe, what Greg was referring to in his post regarding "set-based SQL." You are correct that VFP could handle a million-plus records very easily...but SQL can go MUCH bigger. You are used to working in a "connected" data environment, which is what FoxPro was (and the reason for all of the corrupted memo fields [Smile]). With SQL Server and business objects, you are working in a disconnected environment, so you want to work with as little data on the client side as possible.



When we need to work with very large data sets, for example prescription contraindications, which can involve a million-plus records, we create an SQL Project, deploy it to SQL Server, and then execute it through a stored procedure. The DDT will automatically create the stored procedures required to execute it for you. What is nice about this is that all of the work is done on the server, like a T-SQL stored procedure, but you are talking ADO.NET or whatever else you need. You can then send your ADO.NET results back to the BO. This is the best approach when dealing with large data sets....or create a T-SQL stored procedure, but that is not nearly as fun




Hi Trent; This was a very valuable point. Could you please make a note to bring this up in the class to cover it in more detail?



Much appreciated!
By Keith Chisarik - 4/2/2007

Part of the problem is that the volume isn't in SQL exclusively; I have to read a table of roughly a quarter million records from a Dataflex table via ODBC (seems all I do these days is read old data, but that's another story).



I am trying to create a sync method that will mirror the table in SQL for me to program against. So for each of the 250,000 records I need to either add a record or check the fields to see if they have been updated. It is the initial "look for this record" that is taking the time. I think what you are saying here is that I want to create a SQL proc to search for the record, not rely on the BO Seek() method, since ADO wasn't built to do that... do I have it right?



Thanks everyone for your input and time. For the time being I have abandoned doing the whole table at once and gone for only updating the records the user calls for at any given time, but I will eventually need to be able to "refresh" the entire SQL table against the live Dataflex data source for reporting.
By Ben Hayat - 4/2/2007

Dataflex table




Good old DATAFLEX Smile

Are they still around? I used it from 1983 to 1985. Its proprietary data file format (back then) was very solid.

Are they using Btrieve now? Or Pervasive SQL?
By Keith Chisarik - 4/2/2007

I must be reading the old-school stuff: .dat files, a file for each index... stuff like that. I have no clue what they are doing these days; I downloaded Visual Dataflex just for the database tool. I was handed an old application that is no longer supported and has no documentation to work with.
By Keith Chisarik - 4/2/2007

PS - in 1983 I was 10 Smile
By Trent L. Taylor - 4/2/2007

do I have it right?

Basically.  I was not aware that this was the type of database you were working with.  Could you use scalar methods on the database to give you the info you need rather than using the Seek on the ADO.NET DataTable?  Just a thought....
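
Something along these lines is what I mean, just as an untested sketch (I'm guessing at your table name, and your connection handling will differ):

' Untested sketch (requires Imports System.Data.SqlClient): ask the server
' whether the record exists instead of seeking the local DataTable.
Dim exists As Boolean
Using cn As New SqlConnection(connectionString)
    Using cmd As New SqlCommand( _
        "SELECT COUNT(*) FROM rentitems WHERE rent_RPMrecnum = @recnum", cn)
        cmd.Parameters.AddWithValue("@recnum", CInt(currow("record_number")))
        cn.Open()
        exists = (CInt(cmd.ExecuteScalar()) > 0)
    End Using
End Using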

PS - in 1983 I was 10

ROFL....not that I have any room to talk....I was 7 BigGrin

By Greg McGuffey - 4/2/2007

Jeesh, you guys probably never even used a punch card! Tongue
By Ivan George Borges - 4/3/2007

I did!

With FORTRAN. Crazy

By Ben Hayat - 4/3/2007

I did!



With FORTRAN. [Crazy]




IBM 360/370

Univac 1108



Fortran

Cobol

Basic

Assembly

JCL

Algol

.

.

.

Tongue
By StrataFrame Team - 4/3/2007

What's a punch card?  BigGrin
By Steve L. Taylor - 4/3/2007

1983 - Keith was 10, Trent was 7, Ben was 2 -- ok I am a little older than you guys -- by a freaking lot. I was 31 in 1983.

Either y'all are precocious or maybe I am mature.  Actually, I started on punched cards and tape.  At one time I could read Baudot...

I think Ben and Trent are counting the days until the nursing home comes and gets me...

By Ivan George Borges - 4/3/2007

Ben Hayat (04/03/2007)
I did!

With FORTRAN. [Crazy]


IBM 360/370
Univac 1108

Fortran
Cobol
Basic
Assembly
JCL
Algol
.
.
.
Tongue

Hey, Ben.

I walked the same path!

By Keith Chisarik - 4/3/2007

Keith Chisarik (04/02/2007)
Actually I get an error when calling add() on the BO after adding your code. If I remove your code, it runs fine again, but back to a crawl.



The column is set to not allow nulls, and has a default value of 0 defined.





System.Data.NoNullAllowedException was unhandled by user code

Message="Column 'rent_RPMrecnum' does not allow nulls."

Source="System.Data"

StackTrace:

at System.Data.DataColumn.CheckNullable(DataRow row)

at System.Data.DataTable.RaiseRowChanging(DataRowChangeEventArgs args, DataRow eRow, DataRowAction eAction, Boolean fireEvent)

at System.Data.DataTable.SetNewRecordWorker(DataRow row, Int32 proposedRecord, DataRowAction action, Boolean isInMerge, Int32 position, Boolean fireEvent, Exception& deferredException)

at System.Data.DataTable.InsertRow(DataRow row, Int32 proposedID, Int32 pos, Boolean fireEvent)

at System.Data.DataRowCollection.Add(DataRow row)

at MicroFour.StrataFrame.Business.BusinessLayer.NewRow()

at MicroFour.StrataFrame.Business.BusinessLayer.Add(Boolean CheckSecurity)

at MicroFour.StrataFrame.Business.BusinessLayer.Add()

at ladesktop.frmSyncTables.syncrentitems(String p_sqlstring) in V:\2005 Projects\LightActionRPMScan\ladesktop\frmSyncTables.vb:line 564

at ladesktop.frmSyncTables.bgwRentitemsSync_DoWork(Object sender, DoWorkEventArgs e) in V:\2005 Projects\LightActionRPMScan\ladesktop\frmSyncTables.vb:line 103

at System.ComponentModel.BackgroundWorker.OnDoWork(DoWorkEventArgs e)

at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument)





This is the problem; can you help with this? I think my problem is being misunderstood: the BO.Seek() is horribly slow unless the primary key on the BO is changed to the field I am seeking against, as Ben suggested above, BUT when I do this, calling BO.Add() results in the error above even though I have default values defined for all my "NOT NULL" fields.
By Ivan George Borges - 4/3/2007

Steve L. Taylor (04/03/2007)
1983 - Keith was 10, Trent was 7, Ben was 2 -- ok I am a little older than you guys -- by a freaking lot. I was 31 in 1983.

Either y'all are precocious or maybe I am mature.  Actually, I started on punched cards and tape.  At one time I could read Baudot...

I think Ben and Trent are counting the days until the nursing home comes and gets me...

 

I was 19. But I'm not worried... I've decided to live 120 years, so, nothing to worry about. Hehe

By Ben Hayat - 4/3/2007

Steve L. Taylor (04/03/2007)
1983 - Keith was 10, Trent was 7, Ben was 2 -- ok I am a little older than you guys -- by a freaking lot. I was 31 in 1983.



Either y'all are precocious or maybe I am mature. Actually, I started on punched cards and tape. At one time I could read Baudot...



I think Ben and Trent are counting the days until the nursing home comes and gets me...




Steve, it's great to see you active in design and development. And that's what keeps everyone young...Staying on top of things.



Hope to see you when I get to your class!
By Ben Hayat - 4/3/2007

Ivan George Borges (04/03/2007)
Ben Hayat (04/03/2007)
I did!



With FORTRAN. [Crazy]




IBM 360/370

Univac 1108



Fortran

Cobol

Basic

Assembly

JCL

Algol

.

.

.

Tongue




Hey, Ben.



I walked the same path!




We should definitely talk about it when we are having Beer "On" Trent. BigGrin
By StrataFrame Team - 4/3/2007

OK, back to business...

Keith, if you change the column that is the primary key, you'll have to change it back after you are done with your loop in order to add a new record.  Also, when you set the PK column back to where it actually is the primary key, you'll also need to set the auto-increment on the column... a seed of -1 and a step of -1.
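
In code, that switch-back would look roughly like this (untested sketch; "pk_rentitems" is just a placeholder for whatever your real PK column is called):

' Untested sketch: after the loop, restore the real primary key and the
' client-side auto-increment so Add() can hand out temporary keys again.
Dim dt As DataTable = Me.RentitemsBO1.CurrentDataTable
Dim pkColumn As DataColumn = dt.Columns("pk_rentitems")   ' placeholder name

dt.PrimaryKey = New DataColumn() {pkColumn}
pkColumn.AutoIncrement = True
pkColumn.AutoIncrementSeed = -1
pkColumn.AutoIncrementStep = -1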

By Greg McGuffey - 4/3/2007

I'm going to throw out another approach. Is it possible to just create a fill method that returns only the records you want? You must be determining somehow which records to seek, so why not just use that criterion in the Where clause? You could also do a sort (in SQL, not in the BO). Then you just loop through the returned records and do whatever you have to do.
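
Something like this is what I have in mind, as a rough, untested sketch (the table, column, and range variables are made up, and how you hand the command to the BO depends on the fill methods you've built):

' Untested sketch (requires Imports System.Data.SqlClient): pull down only the
' rows you're about to reconcile, already sorted, instead of the whole table.
Dim cmd As New SqlCommand( _
    "SELECT * FROM rentitems " & _
    "WHERE rent_RPMrecnum BETWEEN @first AND @last " & _
    "ORDER BY rent_RPMrecnum")
cmd.Parameters.AddWithValue("@first", firstRecnum)
cmd.Parameters.AddWithValue("@last", lastRecnum)

' ...then execute it with whatever fill method your BO exposes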