Is it a bad practice to store large files (10 MB) in a database?

  • I am currently creating a web application that allows users to store and share files, 1 MB - 10 MB in size.

    It seems to me that storing the files in a database will significantly slow down database access.

    Is this a valid concern? Is it better to store the files in the file system and save the file name and path in the database? Are there any best practices related to storing files when working with a database?

    I am working in PHP and MySQL for this project, but is the issue the same for most environments (Ruby on Rails, PHP, .NET) and databases (MySQL, PostgreSQL).

    Do you only retrieve them when you actually need them?

    Surprised that no one posted the MS research done on this issue (for SQL Server 2008): To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem

    *large* is a relative quantity, I ( and many others probably ) don't see `10MB` as large in a modern system.

    This is on-topic according to the FAQ - it fits under the bullets "design patterns" (slash antipatterns) and "software architecture". Why was it closed?

    @Izkata (imho) the close reason is wrong here, the question is on topic but it's also _very_ not constructive.

    B Seven: You are not giving us enough information to go on, and you are asking a couple of very vague questions. Notice how there's an "it depends" element on almost all answers? Well, we really don't want that. You can either edit your question to be a little bit more specific (including some details for your project), _or_ try a do over with a new question. Feel free to ask on [meta] or [chat] for further details on how to improve your question.

    I don't see any vagueness in the question as it is now. I have no idea why it was closed.

    Let us remember those innocent times of 2012, when 10MB was a 'large' file.

  • Thomas

    Thomas Correct answer

    8 years ago

    Reasons in favor of storing files in the database:

    1. ACID consistency including a rollback of an update which is complicated when the files are stored outside the database. This isn't to be glossed over lightly. Having the files and database in sync and able to participate in transactions can be very useful.
    2. Files go with the database and cannot be orphaned from it.
    3. Backups automatically include the file binaries.

    Reason against storing files in the database:

    1. The size of a binary file differs amongst databases. On SQL Server, when not using the FILESTREAM object, for example, it is 2 GB. If users need to store files larger (like say a movie), you have to jump through hoops to make that magic happen.
    2. Increases the size of the database. One general concept you should take to heart: The level of knowledge required to maintain a database goes up in proportion to the size of the database. I.e., large databases are more complicated to maintain than small databases. Storing the files in the database can make the database much larger. Even if say a daily full backup would have sufficed, with a larger database size, you may no longer be able to do that. You may have to consider putting the files on a different file group (if the database supports that), tweak the backups to separate the backup of the data from the backup of the files etc. None of these things are impossible to learn, but do add complexity to maintenance which means cost to the business. Larger databases also consume more memory as they try to stuff as much data into memory as possible.
    3. Portability can be a concern if you use system specific features like SQL Server's FILESTREAM object and need to migrate to a different database system.
    4. The code that writes the files to the database can be a problem. One company for whom I consulted not so many moons ago at some point connected a Microsoft Access frontend to their database server and used Access' ability to upload "anything" using its Ole Object control. Later they changed to use a different control which still relied on Ole. Much later someone changed the interface to store the raw binary. Extracting those Ole Object's was a new level of hell. When you store files on the file system, there isn't an additional layer involved to wrap/tweak/alter the source file.
    5. It is more complicated to serve up the files to a website. In order to do it with binary columns, you have to write a handler to stream the file binary from the database. You can also do this even if you store file paths but you don't have to do this. Again, adding a handler is not impossible but adds complexity and is another point of failure.
    6. You cannot take advantage of cloud storage. Suppose one day you want to store your files in an Amazon S3 bucket. If what you store in the database are file paths, you are afforded the ability to change those to paths at S3. As far as I'm aware, that's not possible in any scenario with any DBMS.

    IMO, deeming the storage of files in the database or not as "bad" requires more information about the circumstances and requirements. Are the size and/or number of files always going to be small? Are there no plans to use cloud storage? Will the files be served up on a website or a binary executable like a Windows application?

    In general, my experience has found that storing paths is less expensive to the business even accounting for the lack of ACID and the possibility of orphans. However, that does not mean that the internet is not legion with stories of lack of ACID control going wrong with file storage but it does mean that in general that solution is easier to build, understand and maintain.

    Why can't you use CDNs? This is a supported scenario with pretty much every CDN I've ever heard of.

    @BillyONeal - You can't use a CDN *and* store the file in the database. Unless you are OK with duplication, you can't have both.

    Erm, the whole point of a CDN is duplication. CDNs merely cache the target of a web address -- the only requirement is that there's an HTTP host serving the content, and that the content changes rarely. (How on earth is the CDN supposed to tell where you pulled the image from anyway?)

    @BillyONeal - However, I think this is bad choice of words on my part and I've adjusted my answer. Specifically, if you want to use *cloud storage* (and then perhaps use a CDN with your cloud storage), you can't do it natively with the database storage solution. You would have to write a synchronization routine to pull the files from the database and then send them to your cloud storage provider.

    @BillyONeal - In a way, your comment was the best answer. You can have all the benefits of DB storage, but none of the problems.

License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM