By Edhy Rijo - 2/17/2010
Hi guys,
I am currently saving big text files (450,000+ lines) in a Varchar(Max) field. For a while now, the performance of retrieving a record with that much data has been decreasing, and it is noticeable in the browser dialogs and reports.
I am wondering if there is a better way to deal with so much data in a single field. Would it be possible to compress the data in the field somehow?
I know that I could put this field in another table to separate it from the main table and do some magic to make it look like part of the record in the form, but if that is not the best way, I am open to any suggestions.
Please keep in mind that currently I am showing the data in a read-only textbox on the form. I can change the design so the data is not shown when the form loads and add a button so the user can fetch and display it on demand; at least that way it will not slow down the browser dialog.
Thanks!
|
By Teddy Jensen - 2/17/2010
Hi Edhy,
I also have the need to save files (XML files) related to a record. I use a related table, and before saving I compress the file using MicroFour.StrataFrame.IO.Compression.PackageFile. Then on my maintenance form I have a button that will extract the file, transform the XML to HTML via a stylesheet, and then show it in a form with a WebBrowser control through a ChildFormDialog. I know your files are not XML, but just to give some inspiration.
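In case it helps, the transform step is roughly this (the file names are just placeholders):

Imports System.Xml.Xsl

' Rough sketch of the XML-to-HTML step:
Dim xslt As New XslCompiledTransform()
xslt.Load("MyStylesheet.xslt")
xslt.Transform("ExtractedFile.xml", "Output.html")
' ...then point the WebBrowser control in the child form at Output.html

/Teddy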
|
By Russell Scott Brown - 2/18/2010
Do you have to keep the large text file in the table at all? Does the text file change frequently or is it static?
|
By Dustin Taylor - 2/18/2010
Teddy's answer is what I was going to suggest. Try compressing the file and saving it to a VarBinary field instead, and see if it gives you better reliability. If need be, you also have the option of saving using blobs, which is what we use in the My Account area for the bigger files (this basically sends a large file in bite-sized chunks to prevent the connection from crapping out in the middle of a huge transfer).
|
By Trent L. Taylor - 2/18/2010
Edhy,
You are aware that you can update a single field as a blob, or chunk, with events that notify you of the progress, right? In our CRM, when we save a download file, it can be really large and would not always save correctly. The same goes for our medical application, where we allow media to be attached to the patient. You can set the blob size and then even show a progress bar during the save. If this rings a bell, let me know. Otherwise, I can try to post a small sample later (today is going to be crazy, so it might be tomorrow before I get it posted).
I remember posting this on the forum before....I will search.
|
By Trent L. Taylor - 2/18/2010
FYI, the method name on the BO is PushBlobField or PushBlobFieldAsync.
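From memory, the call is shaped something like this; don't hold me to the exact parameter list:

' Hypothetical usage -- the parameters here are from memory, not the docs:
Me.MyBO1.PushBlobFieldAsync("MyVarBinaryField", myByteArray)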
|
By Edhy Rijo - 2/18/2010
Russell Scott Brown (02/18/2010) Do you have to keep the large text file in the table at all? Does the text file change frequently or is it static?
The file is part of an import routine. It could change once or twice during the import process if any error is detected in the file's column format, but once imported it is preserved in the database so it can be used from any workstation instead of from a file location.
Since it is a CSV file, I simply read the file into a string and save the string. Again, this worked just fine until things started to get slow.
|
By Edhy Rijo - 2/18/2010
Teddy Jensen (02/18/2010) ...I know your files are not XML but just to give some inspiration.
Thanks Teddy, this may also work in my case; I will consider it.
|
By Edhy Rijo - 2/18/2010
Trent L. Taylor (02/18/2010) Edhy,
You are aware that you can update a single field as a blob or chunk with events that notify you of the progress, right?
Trent, Dustin,
I was not aware of the blob approach. While working with VFP in the past we always tried to stay away from blob fields for stability reasons, and it did not cross my mind to even think about them in this case. Since these CSV files are really big, I like the idea of compressing them and being able to show a progress bar during the process, so I will take you up on your offer for the sample, in VB if possible (I know you are switching to C#).
|
By Trent L. Taylor - 2/19/2010
I will try to get you a sample...I kind of think I have posted one out here before for Kenneth Langley. Might do a quick search to see if anything comes up. Also, we just called this method blob because some databases (e.g. Oracle) call it a blob; in essence you could do this for any field type. In SQL Server, though, I would use a VarBinary(MAX) field to store the data. So you would just convert your file into a byte array and then save it into the VarBinary field.
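Just to show the shape of it (the BO and field names here are made up):

' Read the file into a byte array and drop it into the BO field that is
' mapped to the VarBinary(MAX) column (names are hypothetical):
Dim fileBytes As Byte() = System.IO.File.ReadAllBytes("C:\Imports\MyFile.csv")
Me.ImportBO1.CsvData = fileBytes
Me.ImportBO1.Save()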
I will try and allocate some time to write this sample...sorry for the delay. I am getting somewhat hammered this morning (and I am not referring to beer drinking contests! )
|
By Edhy Rijo - 2/23/2010
Hi Trent,
Sorry to bother you on this one, but I searched the forums and could not find any sample code on how to do this. When you have some time, please see if you can post something quick so I can get the idea.
Thanks!
|
By Trent L. Taylor - 2/24/2010
I'm sorry Edhy...this slipped off of my radar. I will try and get you something right now.
I've been dealing with bureaucrats the last few days, so I was a bit preoccupied (not to mention frustrated!).
|
By Trent L. Taylor - 2/24/2010
OK, here is an example I posted a while back. Let me know if this is not enough to work off of.
http://forum.strataframe.net/FindPost21474.aspx
|
By Edhy Rijo - 2/24/2010
Thanks a lot, I believe that will be good enough. I will try to implement this today and will post results.
About the bureaucrats, just don't pay too much attention to them.
|
By Charles R Hankey - 2/24/2010
Question: I am currently serializing PDF files to VARBINARY(max) fields. Of course they are not as large as what Edhy is talking about. In the context of this question, do I understand correctly that just doing this doesn't compress the whitespace out of the file, and that is why the more elaborate technique is necessary, or am I just not understanding the issue at all?
For the PDFs I am just using
Me.policiesBO1.PDF_bytes = File.ReadAllBytes(myfile)
|
By Greg McGuffey - 2/24/2010
The technique you're using, Charles, wouldn't do any compression; a space is ASCII 32, data just like anything else. Now, if the file format itself uses some sort of compression algorithm, then you'd get that. JPG, PNG, etc. have compression built into them, so if the PDF file format includes some compression, you'd get that compression. I think it's a pretty simple call to use the zip tools in the framework to compress the resulting byte stream. I'd check out the tools area, if memory serves. I'll check into this later today, as I'm a curious guy.
|
By Edhy Rijo - 2/24/2010
Well Charles, Greg,
Now that you mention compression, yes, that is something I would like. Since my files are basically just text files, I am sure any kind of compression will help.
Since I have not tried any of the techniques, I believe I would have to do the following:
1.- Read the text file into a stream
2.- Compress that stream
3.- Save it to the Varbinary(max) field using Trent's technique.
Am I missing something here?
|
By Greg McGuffey - 2/24/2010
That's what I'd try.
BTW, I took a look and my memory failed me. There is some compression stuff in the StrataFrame.IO.Compression namespace, but it is all related to creating packages (very handy for zipping multiple files into one file) and not what we are talking about (compressing/decompressing a stream).
It turns out that what we want is in the .NET System.IO.Compression namespace. Here are some links related to this:
MS info on this namespace. We're interested in the DeflateStream and GZipStream classes:
http://msdn.microsoft.com/en-us/library/system.io.compression.aspx
This one explains a bit and then provides a utility to handle zip files:
http://blogs.msdn.com/dotnetinterop/archive/2006/04/05/.NET-System.IO.Compression-and-zip-files.aspx
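And here is a quick, untested sketch of what compressing and decompressing a string with GZipStream could look like, just to show the idea:

Imports System.IO
Imports System.IO.Compression
Imports System.Text

Public Module CompressionHelper

    ' Compress a string into a byte array suitable for a VarBinary field.
    Public Function CompressText(ByVal text As String) As Byte()
        Dim raw As Byte() = Encoding.UTF8.GetBytes(text)
        Using output As New MemoryStream()
            Using gzip As New GZipStream(output, CompressionMode.Compress)
                gzip.Write(raw, 0, raw.Length)
            End Using   ' disposing the GZipStream flushes the compressed data
            Return output.ToArray()
        End Using
    End Function

    ' Decompress the byte array back into the original string.
    Public Function DecompressText(ByVal compressed As Byte()) As String
        Using input As New MemoryStream(compressed)
            Using gzip As New GZipStream(input, CompressionMode.Decompress)
                Using output As New MemoryStream()
                    Dim buffer(4095) As Byte
                    Dim bytesRead As Integer = gzip.Read(buffer, 0, buffer.Length)
                    Do While bytesRead > 0
                        output.Write(buffer, 0, bytesRead)
                        bytesRead = gzip.Read(buffer, 0, buffer.Length)
                    Loop
                    Return Encoding.UTF8.GetString(output.ToArray())
                End Using
            End Using
        End Using
    End Function

End Module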
|
By Charles R Hankey - 2/24/2010
Of course, the question is also: is the goal speed or size? Does the trade-off of serializing and deserializing the uncompressed whitespace, resulting in a larger piece of data, pay off in faster in-and-out time versus the time it takes to compress/decompress?
I don't know the answer, obviously, but it seems to be part of the question.
|
By Edhy Rijo - 2/24/2010
Charles R Hankey (02/24/2010) Of course the question is also is the goal speed or size?
In my case I would say it is both. These large text files can be compressed a great deal, and the end user does not need to see the file all the time; once the file is imported it is just there in case it is needed for future reference, so the less space used the better.
Also, I am planning on moving this field to a separate table with a one-to-one relationship, so when using the Browse Dialog this field is not included; only when it is needed by the internal process, or the end user wants to take a peek at the file, will it be shown in a child form.
Greg, thanks for the links and the comment, I will check them out.
|
By Teddy Jensen - 2/26/2010
Edhy Rijo (02/24/2010) Thanks a lot, I believe that will be good enough. I will try to implement this today and will post results.
Edhy - You are normally very fast in posting your results, so I would like to know if you got it working? /Teddy
|
By Edhy Rijo - 2/26/2010
Teddy Jensen (02/26/2010)
Edhy - You are normally very fast in posting your results, so I would like to know if you got it working?
Hmmmm, that must be a bad habit... just kidding.
Seriously, I am still working on this, but I took a different approach based on the application requirements, which are as follows:
1.- The CSV data saved in the varbinary field is used once or twice. The data is imported into a table and once the records are imported this CSV data is not used anymore unless there is a problem with the imported records and the CSV needs to be imported again.
2.- Based on number 1, I don't need this big field value being loaded in any way during normal operations unless I need to read the CSV again to re-import the records.
So this is what I did:
1.- Create a new table related to the main table. This new table will have the varbinary field.
2.- In the main table BO, create a new property to read the varbinary field from the related table. This will free the main table from this field and speed up the loading process of forms using the main table.
3.- I added a new child form dialog to manage the request for getting the varbinary field via the BO property.
During this process, I am also playing with compressing the CSV data once it is converted to a byte array using System.Text.Encoding.UTF8.GetBytes(value). The compression looks pretty good: one of the CSV files is 9.3 million bytes and compresses down to 2.6 million bytes. I am not using the .NET compression classes, since I have a 3rd-party library that handles many things, but I am sure the plain .NET classes could do the same.
I am still refining the custom field property to properly fetch the varbinary field, decompress it, and convert it back to text to be shown in the child form dialog, so when I am done I can post some code.
Trent's suggestion here is pretty good and would work for me if I had to show the CSV data every time the form opens, but that is not my case for now.
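In the meantime, the rough shape of the property I am refining is something like this (the child BO name and fill method are made up, and DecompressText is a GZip-style helper like the ones Greg linked to):

Private _csvFileText As String

' Lazily pull the compressed CSV from the related table and cache the
' decompressed text (child BO name and fill method are hypothetical):
Public ReadOnly Property CsvFileText() As String
    Get
        If _csvFileText Is Nothing Then
            Dim childBO As New CsvDataBO()
            childBO.FillByParentPrimaryKey(Me.ImportPK)
            _csvFileText = DecompressText(childBO.CsvData)
        End If
        Return _csvFileText
    End Get
End Property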
|
By Teddy Jensen - 2/26/2010
OK - thanks, I was just curious about what route you chose. I will probably try to implement Trent's technique later to see if I can improve my form with the XML files. /Teddy
|
By Edhy Rijo - 2/26/2010
Teddy Jensen (02/26/2010) I will probably try to implement Trent's technique later to see if I can improve my form with the XML files.
That technique will give you the impression of loading the data faster, and in fact it will, because all the other fields are loaded first and then it goes and gets the varbinary field's data.
What worried me more was that my main table is used as a parent table in several other forms, and all those forms were suffering from slow loading and record searching: when using the Browse Dialog, all fields would be included in the search, including the varbinary field, even though it was not shown in the dialog. Removing this field from the main table and putting it in a child table fixed my problem.
|
By Trent L. Taylor - 3/1/2010
This is all good stuff. One other thing I thought I would mention is that you can also pull from a VarBinary field asynchronously, so this isn't just for saving, but for retrieving too. This is also what we do in our medical app for images. Most times you don't even see the progress bar come up, but when a large file is being retrieved over a slow connection, the end user will see the progress bar and know that the file is being downloaded. Once we added this functionality a year or two ago, it made a huge difference to our users. Just another item for thought here.
|
By Teddy Jensen - 3/1/2010
OK - I tried to get this to work, but am apparently missing something, because I get this error:
BusinessLayerException: The field could not be truncated.
Source: MicroFour StrataFrame Business
Stack Trace:
at MicroFour.StrataFrame.Business.BusinessLayer.TruncateBlobField(PrimaryKeyValue pk, String fieldName)
at MicroFour.StrataFrame.Business.BusinessLayer.PushBlobFieldThread(Object parameters)
Any hints?
|
By Trent L. Taylor - 3/1/2010
Teddy, are you referring to the PullBlob or PushBlob here?
|
By Teddy Jensen - 3/1/2010
It was the PushBlob, but I have it working now. I had a null value issue on my test BO. It's pretty cool. Thx. /Teddy
|
By Trent L. Taylor - 3/2/2010
Great....glad you got it going! Yeah, it is really nice when you want to save or retrieve data asynchronously. Especially when dealing with larger data saves.
|