Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Files in database

Author  Topic 

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2005-10-27 : 09:58:00
Hi,

I need to create an app which manages large files (15 - 20meg) of different types.

I would like to store the files in the DB. I found this [url]http://forums.asp.net/491642/ShowPost.aspx[/url] post at asp.net. It provides a compelling case for putting the files in the DB.

However, I come here and people say that its best to have them in the file system.

For example, here [url]http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=54764&SearchTerms=files,database[/url] , Kristen writes,

quote:
Getting large files [anything bigger than 8,000 bytes] in/out of SQL Server is more painful than regular data.


Why might this be so?

Does anyone care to comment on the asp.net post?

Cheers, XF

Kristen
Test

22859 Posts

Posted - 2005-10-27 : 14:09:26
"provides a compelling case"

A couple of people are posting that they are convinced that it is better. The arguments that they make against the "store it in the filesystem" people are robustly made, but not compelling to me.

IF you need the images and database to be copied to a number of places (e.g. the laptop of travelling salesmen, or something else that could use replication) its a good option to consider.

On the assumption that a file, once presented, will very rarely change, the backup of the files can be done to "occasional" backups - by that I mean the files are only backed up occasionally.

On the other hand a backup of SQL Server will backup everything. You can use Differential backups, but most people would not leave more than, say, a week between full backups - so once a week [and more commonly once a day] you are going to back up the whole bang shooting match. During that time there will be some additional stress on the SQL application [needing to acquire locks, backup the file "out of sequence" to allow sections to be written to, etc], slowing the performance for application which access it.

The other thing I have an issue with is how long it takes to get files in, and out, of the SQL database. Its a very tortuous process (compared to any other data that you would get in & out of SQL).

Next up is when you separate the application/fileserver server from the SQL server. For example a web box - typically then the Images files can be on the Web box, and the database elsewhere. If you put the Images IN the database then you are using bandwidth between the application box and the database box - and the size of files is going to far outweigh the other data, thus drowning it! (memory that could be used to cache the "skinny" data is going to be run out by the larger file data).

In the case of images on a web server (quite probably not your bag) an HTML tag to:

<IMG SRC="/images/Foo.gif"> is likely to result in that image being cached by proxy servers all the way from your server to the user. A very large proportion of the time Foo.gif will never actually be requested from your web server.

Change that to <IMG SRC="/asp/GetImage.asp?ImageID=1234"> and it probably won't be cached anywhere downstream.

But principally I think it is harder to build an application that uses the database to store files (than storing them in the OS).

Putting the filename, and path, and all the keywords, permission data, and all that stuff, in the database means you can still search by keyword and all the other benefits.

Kristen
Go to Top of Page

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2005-10-27 : 15:20:41
Hi,

quote:
On the assumption that a file, once presented, will very rarely change, the backup of the files can be done to "occasional" backups - by that I mean the files are only backed up occasionally.


Personally I don't think this sounds like a good way of backing up data. Maybe the files will very rarely change but some might! How would you know which ones had changed if you lost the lot? And what about recently added files? If you lost those since the last file backup then there'd be a db record pointing at nothing let alone an out of date version.

Once a web hosting company lost a site I'd been working on's entire set of images. They had a backup but it was hopelessly out of synch with the database. I had to write code to indicate which records had a missing file and which files had a missing record and the client had to upload the missing files again.

Also, the posters on the thread I linked to don't seem too daunted about backing up large databases. They seem to be making out that SQL Server is designed to handle the task.

quote:
The other thing I have an issue with is how long it takes to get files in, and out, of the SQL database. Its a very tortuous process (compared to any other data that you would get in & out of SQL).


Why is it a tortuous process? I've looked up an example of streaming a BLOB from SQL Server and it appears that one just needs to set an ADO.net DataReader instance's command behaviour to CommandBehavior.SequentialAccess and then from then on the data might as well be coming from a file.

quote:
Next up is when you separate the application/fileserver server from the SQL server. For example a web box - typically then the Images files can be on the Web box, and the database elsewhere. If you put the Images IN the database then you are using bandwidth between the application box and the database box - and the size of files is going to far outweigh the other data, thus drowning it! (memory that could be used to cache the "skinny" data is going to be run out by the larger file data).


This sounds like a good point. Maybe SQL Server doesn't bother caching BLOBs? According to one of the posters in the asp.net thread, blob fields are not stored directly in the record, so maybe SQL Server doesn't bother caching BLOBs?

quote:
In the case of images on a web server (quite probably not your bag) an HTML tag to:

<IMG SRC="/images/Foo.gif"> is likely to result in that image being cached by proxy servers all the way from your server to the user. A very large proportion of the time Foo.gif will never actually be requested from your web server.


Yes I agree, not my bag because its probably not a good idea to be caching lots of multimeg files anywhere.

quote:
But principally I think it is harder to build an application that uses the database to store files (than storing them in the OS).


Maybe but if one is going to build a high quality file management database then surely its worth it otherwise perhaps its not a task that one should be undertaking?

XF.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2005-10-27 : 20:23:58
"How would you know which ones had changed"

You backup based on the Archive Attribute in the file system - if the file is changed it gets backed up. You then need a hierarchical backup system that rotates the new files round, so that you have a minimalist route to recovering the whole lot (and a simplistic way to recover the "Latest version of File X")

Of course recovering "Last tuesday's version of File X is not a viable option if the files are stored in the database - but that may not be a likely "need" for you.

"They seem to be making out that SQL Server is designed to handle the task"

Indeed, but a) its a bit all or nothing (you can use considerable complexity to portion off parts of the database for separate backup if you want to, but as I indicated previously "its extra engineering") and b) SQL server is bound to slow down during a massive backup and c) recovery time is longer to get a big database back than a small one.

"Why is it a tortuous process"

Because SQL Server doesn't hold the Blob in the normal way that data is held. So the actual process is "Here's a bit more, can you tag it on the end please" and "I've got the first 2K, can I have the next 2K please" ... the fact that ADO makes it easy doesn't mean its fast!

"Maybe SQL Server doesn't bother caching BLOBs"

Dunno if it does, or not. But if it does its less memory for caching of other stuff. And it your Blobs are 90% of the data then that's 90% of your caching gone - and SQL Server is much better off caching index pages etc. that speed up queries (over simplification because SQL Server allocates different zones to different types of cached data, but it would skew the caching for sure).

"Maybe but if one is going to build a high quality file management database then surely its worth it otherwise perhaps its not a task that one should be undertaking?"

I would have phrased more in terms of "What's the best solution for the job".

But if your mind is made up I should have a go. Ideally you would build a test-rig both ways and see what works best for you. One or other might hit a brick wall and cause you to favour the other one early on.

Kristen
Go to Top of Page

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2005-10-29 : 13:20:07
Thanks for your reply. I'll probably give it a go and see what happens.

Regarding proxy servers not caching images, I think this can be got around by using url-rewriting to give images the standard file type.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2005-10-30 : 02:19:16
"I think this can be got around"

Hadn't thought of that - sounds like a good idea - although you'd then have to cater for delivery of actual file system images via that route too.

Kristen
Go to Top of Page

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2005-10-30 : 11:57:05
Sure. Someone posted a good idea on how to fix this on a blog somewhere but I haven't tried it.

The idea was to make the folder in your web application that contains file system images an application too. And then have this nested application override the routing of images to asp.net so that the web server handles them like normal.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2005-10-30 : 14:46:06
Let me know if you find the URL pls

Kristen
Go to Top of Page
   

- Advertisement -