Developing Matt

My Technical Journal

Programming the filestream

with 12 comments

I attended a class affectionately named ‘Programming the filestream object’ taught by the great Bob Beauchemin.  it was an excellent class.

“Storing large binary objects in databases is traditionally suboptimal.”  If you haven’t read it then you need to read the Whitepapaer ‘To blob or not to blob’ It researched at what point you need to use the file system vs using a blob in sql server.  The results are:

  • In the database optimal up to 256 kb.
  • In the file system optimal over 1 mb

Database memory buffers is the problem.  Updating large objects causes fragmentation in the database but in the file system is a delete and an insert.  The nice thing about filestreams is that it has transactional consistency, an integrated point in time backup and restore, and a single storage and query vehicle.

File stream is a new storage mechanism for varbinary (max) columns. This is the only one you can use for the filestream object and is specified by a filestream keyword on a column in the create/alter table.  The metadata is still stored in database but the data is stored in file system (note: if there is a null value in the column then there is no file)

Filestream setup is in 2 pieces

OS portion of filestream setup

  • Sql server configuration manager
  • Sql server instance properties. Filestream tab.
  • Enable filestream for file i/o streaming access: this is enabling streaming for i/o access, not the buffer type. it’s pretty quick with just turning it on. but if you enable that then it adds a driver over the os. if you allow clients to upload data to your filestream column you have to allow the smb protocol (most of us will).  The includes opening up the smb port (port 445) and enable the service, which is what the remote procedure call uses (i think).
  • Filestream files have to be local to the sql server

Sql server configuration

  • issue sql command: Exec sp_configure ‘filesteam access level’,’2’ reconfigure (1 is transact sql only)
  • issue sql command: Alter database northwind add filegroup filestreamgroup1 contains filestream
  • issue sql command: Alter database northwind add file (name = photos, filename = ‘c:\files\photos’ to filegroup filestreamgroup1
    • The c\files has to exist already. c:\files|photos cannot exist
    • It needs to be local, you can store it on a san, but not on a nas.
    • It needs to be ntfs and can’t be remote storage.
    • The administrator can delete these files which WILL corrupt the database.
  • It creates a filestream header and a filestream log. it is transactional and point and time consistent but it keeps its own log. sql server uses this log (which contains what you inserted, deleted, etc).

Bob (can I call him Bob?).  Mr Beauchemin then created an emplyees2 and added a column “photo varbinary(max) filestream null”. You also have to have a guid for yoru filestream “rowquid (uniqueidentifier) not null rowcol unique default newid()”.  At this point it creates a new folder in the os for the table and another folder for the column. Also a log is updated at this point so that recovery can re-create if needed.  Also note that when you insert records into this column you would assume that it would use this guid.  It derives it’s own filestream names based upon this guid.  He advised that paul randall wrote about how it gets it’s file names from the guid.

Other points

  • Nvarchar(max), xml is not supported (you can’t put a filestream on a nvarchar column)
  • Filestream columns can be larger than 2 gb.
  • DACL’d to administrator and sql server admin
  • Varbinarymax has a write method is unavailable (can’t do incremental updates)

Filestream programming points

  • It is a normal sql column.
  • You can also access this using a win32 i/o api’s and special file handle.
    • Note file must not be access with an ordinary win32 file handle or database corruption can result.
  • This is a file handle from sql server.
  • If you update it will create an extra file which will be removed at checkpoint.
  • To get the path there is a method on the column: Select student.resume.Pathname –resume is the column.
    • pathname is case sensitive by the way
  • If you use a blank string trick then you will get a pathname because it will add a file.
  • These function still work: Substring, Charindex, len , replace, cast, convert, full text indexed filestreams can use contains, freetext. encryption is not supported on filestream data
  • It’s all supported by sql express by the way and doesn’t affect the 4 gig limit.

streaming

To use streaming io you must have a windows login not a sql login)

Methods

  • Opensqlfilestream (method defined in sql native client driver)
  • Oledb
  • odbc
  • Ado.net
  • He gave a good example using the streaming api’s.  I will try to find and post this later.  But he didn’t use .net code so he used a sql import to get the sql api.  But in his code:
    • He went to sql for the transaction token and for the name of the file. and then accessed the file using that information.
    • He stated the different Open options: async, no_buffering, write_through (pushes the data over when you write which is the result), sequential_scan (fastest for reading entire file), random_access (skip to certain portion of the file. this works quite well)
  • Win32 apis supported: opensqlfilestream (retrieves handle), closehandle (frees handle), readfile, writefile, transmit file, setfilepointer, setendoffile, flushfilebufferssupported
  • Sqlfilestream (.net 3.1 and later is needed). system.data.sqltypes (inherits from stream.io.stream), supports asynchronous reads and writes, write-through is not the default (uses client buffering: won’t flush data back to file until you close/save it)
  • He also talked about the access pattern.  I will try to find this as well.  (the order of accessing the filestream data)
  • For an insert you have to insert an empty string first. But you can use output clause so you don’t have to make more than one roundtrip. ‘insert into dbo.student output inserted.resume.pathname, getfilestream_tran
  • Don’t issue a delete through the filestream. just do it in sql
  • You can’t use snapshot on the database with a table inside that has filestream.
  • No special support in linq to sql and entity framework
  • It does support full text searching

He also mentioned and displayed an application called Spatialalbum (written primarily by ????)  that utilizes filestream as well as many other sql2008 features.  (it is a medea driven app with google earth to organize pictures based upon their location.  pretty interesting app.  It will be in codeplex.  if I find it later I’ll include linkage. 

Advertisements

Written by matt

March 24, 2009 at 9:27 am

12 Responses

Subscribe to comments with RSS.

  1. Hi Matt,
    being bleeding edge stuff, there still isn’t a lot of info on Filestreams out there. I’m currently trying to figure the magic out for fetching the PathName within VBA. Would you have any pointers to info on that?

    Thanks

    JK

    Jeff Kesselman

    October 1, 2009 at 11:17 am

  2. Within VBA you would write your select statement to fetch the pathname:
    “SELECT FileStreamColumnName.PathName() FROM yourTable”

    matt

    October 1, 2009 at 12:03 pm

    • I tried:

      strSQL = “SELECT FileData.PathName() FROM dbo_SWF_Files”
      Set rs = db.OpenRecordset(strSQL, dbOpenDynaset, dbSeeChanges)

      That gives me the following error:

      Run-time error ‘3085’
      Undefined function ‘FileData.PathName’

      Its worth noting that if I eliminate the .PathName() I get blob bits back just fine.

      Jeff Kesselman

      October 1, 2009 at 12:53 pm

      • Im wondering if maybe, since this is access, it isnt really running the query against the local shadow which cant handle the PathName() function?

        Do I maybe need to open a secondary connection directly to SQLServer? If so thats annoying but doable i guess. Woudl you happen to have a good pointer as to how to do that in a way that uses what Access already knows and doesnt require me to hardcode the SQL server name in the VBA code?

        Jeff Kesselman

        October 1, 2009 at 12:56 pm

      • according to http://blogs.msdn.com/manisblog/archive/2007/10/21/filestream-data-type-sql-server-2008.aspx

        ” SQL Server restricts access to the FILESTREAM data container(s), except when access is made by using the Transact-SQL transactions and OpenSqlFilestream APIs”

        So you are right, it might be a limitation of your connection.

        You might check the whitepaper on the filestream object. I perused to see if there was anything that might apply to your situion, but didn’t see anthing outright
        http://msdn.microsoft.com/en-us/library/cc949109.aspx

        matt

        October 1, 2009 at 1:16 pm

      • Yeah, Access doesn’t seem to know about Filestreams yet.

        I fixed it by opening my own connection to the SQL server in the VBA method.

        Thanks

        jeff Kesselman

        October 2, 2009 at 11:39 am

      • Hey thanks Jeff for writing back to let me know that we were on the right track.

        matt

        October 2, 2009 at 1:28 pm

      • My pleasure… I’m still stuck now on trying to find the OpenSqlFilestream APi references from VBA so I can actually manipulate the data. If you have any pointers…

        jeff Kesselman

        October 2, 2009 at 1:39 pm

  3. So a stupid question ill probably answer for myself before you do…

    In what References collection are the Opensqlfilestream objects hiding?

    jeff Kesselman

    October 2, 2009 at 11:43 am

  4. Short answer seems to be, you cant get there from here.

    Ill need to write a piece of .NET bridge code.

    I suppose its historical reasons why VBA isnt a .NET language *sigh*

    jeff Kesselman

    October 2, 2009 at 2:55 pm

  5. I am trying to get to the attachments in a column in the sql server db that i have. We have a column called EMLStream that contains an email msg from outlook with attachments. I tried using :
    Select top 100 *, EMLStream.PathName() as path2 from emails with (nolock)
    on Query Analyzer but i get this error msg:
    Cannot find either column “EMLStream” or the user-defined function or aggregate “EMLStream.PathName”, or the name is ambiguous.

    I have an ms access application that i want to be able to view the attachments (xlsx, docx, jpg, gif, pdf,etc…). I have never done this and need help with syntax and code….

    Alex

    November 17, 2011 at 9:32 am

  6. Others have had the same problem using Access with the filestream object. Give this post a try: http://www.codeproject.com/KB/database/AccessBlob.aspx

    matches

    November 17, 2011 at 10:11 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: