SQL Server 2008 Security by don kiely
I’m not very knowledgeable to sql server security, so I was glad to finally get to this class taught by Mr. Kiely. This is one of the classes I took in the 24 hour free technology marathon hosted by sqlpass. Click on the sqlpass24 tag to see the others. I’m very grateful to sql pass for providing this training.
Mr. Kiely is a developer who sees the database world from my perspective. While not a fan of security I’m a fan of him. When you listen to him you get the sense that he is just an average guy with a lot of great useful knowledge and a goofy sense of humor.
When we used to be neighbors in Alaska we used to hang out all the time. We would talk about security after a day of sledding in the snow.
Ok. that’s a complete lie.
Here is the information that I caught from the course. You should probably attend one yourself to catch what he really said, but this is what I picked up:
The big 2008 security changes that I saw
1. Builtin\administrator is no longer automatically sysadmin
2. In 2005 we received encryption and granular permissions. 2008 builds on it.
3. Encryption
ENCRYPTION
The biggest area of enhancement is encryption. Personally, I’m not interested in encryption at all. But as a developer I see it as my responsibility to be aware, as it can be very useful for sensitive data. If anything else, this class made me less afraid of it. It looks like it’s not near the overhead that it used to be and my fear of losing data due to some encryption botch-up is slowly inching itself away from reality.
SLQ 2005 introduced native data encryption at the cell level. There wasn’t any way to encrypt the entire database. In 2008 you can encrypt the entire database (it encrypts the file, the log, and the backup files, and thus is transparent to the application)
With transparent data encryption you cannot attach and read the data from an encrypted database. It queries a certificate to access physical files. Unlike 2005 this encrypts all data as it is written to disk…. decrypted as read from disk…and encrypts and decrypts each 8k page.
Filestream data is not encrypted, by the way.
OTHER SECURITY CONSIDERATIONS
Policy based management: centralized control over configuration settings, object properties, etc across server machines and instances. Can I just say WOW! I had no idea that was there. I might be mistaken, but it looks like I could allow cmd shell based upon a schedule. CRAZY.
Furthermore you can record any action against any object. What? Really? If I wanted to find out who was eating my oreos in the middle of the night I could do it with this. I speak in jest, but this is actually a very powerful feature.
Don, when are you going to write a security book?
Embed Reporting Services by Jessica Moss
I attended ‘Embed Reporting Services into your applications’ by Jessica Moss as part of the 24 hours of training hosted by sqlpass. Click on the sql24pass tag to see the other courses I took. Thank you J. Moss and SqlPass for the time and energy spent for my education. You were there for me.
Ms. Moss was very knowledgeable and has some interesting points on the differences between the different varieties of report services and the benefits therein. Specifically, she discussed local, server, and sharepoint distribution methods.
server mode beneifits
security
subscirptions
snapshots
report history
scalable
local mode benefits
It can be used in asp.net applications, windows form applications. Technically you can use it without a backend source… by passing in the datasource directly to the report (good reuse). By doing this you can get around licensing requirements (read: free). However, you can only export to excel and pdf with the local mode and data has to come from a valid datasource (not sure what that means..I think it has to be a .net object…not sure )
My favorite part of this session was the Q/A time. Jessica Moss had an opportunity to show that she knew what she was talking about with some of the tougher questions:
Q. Can we have access to multiple stored procedures form a single report?
A. On a local report, yes. on a server report no.
Q. Can you discuss differences bw 2005 and 2008 rdl report viewer control.
A. The rdl tablix, gages, and rtf (allowing html). The report viewer with local mode will have a problem with the new rtf capability. Server mode will not have a problem with any of the new features.
Q. Can rdl be converted to rdlc?
A. Yes. you can by editing the file. It’s harder to go from rdlc to rdl because you have to add the dataset information. But it can be done.
Q. Is there access to snapshots when in sharepoint mode?
A. It depends on how you have report services configured. In regular mode, yes. if your server is integrated you don’t have access to report manager…therefore you don’t have access to the snapshot tool with report server. However can do subscriptions but it’s through the sharepoint tool.
Database Compatibility Settings by Don Vilen
Just took a course taught by Don Vilen titled ‘Database Compatibility Settings: What they Really Do and Don’t Do”. Vilen has been a part of the knowledgebase for the different versions for several years (8?) and spoke to the compatibility between the versions. It was interesting to listen to the history and the problems and the work behind keeping the database compatibility issues to a minimum. The key points I took away from this course:
1. test your code
2. watch out for keywords not allowed (merge is a new one for 2008)
3. push vendors to upgrade and stay on top of your servers to upgrade asap.
Does the upgrade advisor catch compatibility? It will catch a lot of them if they are syntactical (can be caught by scanning code) but behavioral differences are hard to detect.
What will happen if i restore a database with a table named revert? it will work but if there aren’t brackets around it then there will be syntactical problems.
Changing your compatibility to 2008 and then back again doesn’t necessarily change it back…it’s kind of a pseudo change when you revert back.
Put it on a schedule and upgrade as soon as possible to take advantage of the newest versions.
Thanks Don for the interesting discussion and for sqlpass for hosting the 24 hours of sql pass. Click the sql24pass tag to see the other classes I took.
SPLIT with CLR
So I decided to take Adam Machanic and Itzik Ben-Gan at their word. The other day I took a course by Adam Machanic regarding whether to use t-sql or clr. And about a year ago I took a course by Itzik where he reviewed the different ways to split a comma separated list of words into a table (this can also be found in a whitepaper titled Arrays and Lists in Sql Server 2005).
Itzik said that it would indeed be best (for performance) to write a clr routine. Machanic himself tested that when just doing a simple charindex versus an indexof (in a clr routine) that clr performed 3x better when the length of characters exceeded 50,000 characters.
So, having never written a clr-anything I decided to give it a try, depending heavily on this article. There’s a new ‘SQL SERVER PROJECT’ option in visual studio 2008, but I didn’t know what it was…so I stayed away from it. Instead I just wrote a class affectionately named ‘fnParseString’ with the code from the article.
I compiled the code.
I turned on the thing that needed to be turned on for my server
sp_configure ‘clr enabled’, 1
GO
RECONFIGURE
I then created an assembly
CREATE ASSEMBLY CLRFunctions FROM ‘C:\tempsql\fnParseString.dll’
When I created a function with the article it didn’t work, telling me that it couldn’t find my UserDefinedFunction. After many iterations I ended up with the below external name (in italics below). Not sure how to rectify this one in the future, but one article I read suggested that it might be due to the fact that I picked just a class project.
CREATE FUNCTION [dbo].fnParseStringCLR(@string [nvarchar](4000), @separator [nchar](1))
RETURNS TABLE (
[StringCol] [nvarchar](max) NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME CLRFunctions.[fnParseString.UserDefinedFunctions].parseStringCLR
Then I called it and compared to the next best thing..the function we use at my company (a function that uses a numbers table) the performance was significant. It was a 50% savings to use CLR. WOW!
—edit 9/16/09 3:45pm
In looking into the security concerns with CLR I found a great article by Don Kiely here. Two things I picked up from this article. CLR doesn’t have any more security concerns than t-sql within the same security context, but it does have possible memory issues with poorly written code…which can be the same problem with some of those nastily written t-sql procedures that you continue to ignore. : )
SQLclr or T-SQL? Performance Options
Adam Machanic, out of Boston, taught this class on the sql 24 pass online training class promoted and supported by sqlpass. THANK YOU sqlpass and THANK YOU Adam Machanic.. I didn’t know that Adam Machanic hosted the http://sqlblog.com site.
First off, some great quotes:
“Performance tuning is an art as well as a science.”
“If you don’t have the common sense, ask someone who does”
”If you have a solution in your software project that isn’t the right solution for the situation, you will feel it.”
”As a developer you have to develop a sense of what is good and what is not good and apply it on a situational basis.”
”Iterative logic shines with SQL-CLR”
SQL-CLR should
- Operate on data either headed for or already in the database.
- Should be used for data-centric operations (churning data), not application or ui-centric
Performance tradeoffs to consider
- Adding indexes improves read performance and hurts write performance
- Rewriting queries may make them more complicated
T-SQL is the only choice for reading data and doing standard data operations. But more elaborate uses like the following should be considered for CLR:
- math or string operations
- linear and time series computations
- heavy xml
- manipulation of different kinds of data (like images)
The question being attempted to answer: Does passing data to CLR carry huge overhead, considering that clr is jitted (compiled down to machine code) and T-SQL is interpreted?
TESTS:
Test 1 Return Len on a string. His first test:He created t-sql and slq clr function that return the length of the input string. he ran it 100K times and passed in different strengths. t-sql did better, but not by much. but when it reached over 50000 characters there was a difference in about 5 seconds.
Test 2 math: he found prime numbers from x to x. clr : 24 seconds. t-sql : 43 minutes, 2 seconds. clr performed better. Simple inline/non-procedure math may not be an issue (counting on a math). bottom line: if the formula needs to be a udf, then consider converting it to an inline procedure or consider clr.
Test 3. Non-math. string operations. He created two functions, one using t-sql charindex and the other (clr) using .net string.indexof. and then he asked to find the character Y..and called it 200 K times. When it was around 100K character they did the same. but when it went above that ..up to 50000 characters then clr performed 3x better.
Test 4. replace. pretty much the same before. created two functions, one using t-sql replace and the other using .net string.replace. ran it 100K again. at the low end around 2500 they performed the same, but at around 50000 characters clr performed 7x better (clr ran at 100 seconds, t-sql ran at 700 seconds)
Test 5. linear and time series (linear means any query where the current value depends on the previous value(s)). His example went something like his..’select t1.y, sum(t2.x) as running_x from t as t1 inner join t as t2 on t1.y >=t2.y group by t1.y) for each input row look at all the proceeding rows once …every single time.
You can use a cursor for this but the fastest guaranteed reliable method is CLR. Use an ado.net datareader (forward only), use an order by so that the data is sorted by the query engine and then loop with the data reader…use local variables to calculate and store running values. Output the data using sqlpipe (allows us to pass a rowset back to the caller…he called them row by row– no temp tables are needed). You can see the code on his blog )search for running sums). The results:
- set based sql: 15:43
- sql cursor: 0:04
- clr stored procedure: 0:01.5
Test 6. xml manipulation. .net has an xml datareader. sql server’s model is iterator based sqlxml model and simply doesn’t perform well. CLR is about 5-7xs faster. and it’s easier. 3 hours on t-sql with someone from microsoft xml ..about an hour on CLR for him to create. (you can download extended events code generator on his blog here)
Q/A
Q. Does CLR add memory overhead to the server?
A. It can. You want to watch memory very closely and use screaming algorithms (use datareader not a datatable for example). memory is the most expensive resource. bottom line: follow best practices and know what you are doing.
Q. Which is better for recursion for a hierarchy.
A. Neither. In 2005 the best way is to use materialized paths used for hierarchy. In 2008 use hierarchy id (which uses sql clr under the hood). Recursion is not the right solution.
Q. How was he calling charindex?
A. He used a clr function calling string.indexof He put it in a while loop and called it 100x. For the sql version he used charindex directly.
Q. We found that udfs are a serious drag on performance vs inline. Are you suggesting that we might increase performance if we use CLR instead?
A. Yes. Multi statement udfs are a drag and not good for performance. Look up Alex’s articles on sql blog (or any search on sqlblog on ‘inline’) to determine if it’s better for inline. Heavy calculations and functions are better in CLR..but generally inline is better.
Q. Why do some DBA’s not like CLR and think it’s not safe.
A. There’s a great acronym FUD: It stands for fear uncertainty and doom. what they don’t know they fear.
Q. Would you recommend using clr to replace a trigger table that receives 20 transactions per second. That trigger includes update and insert statements.
A. If you are just moving the data somewhere else like a log, then clr is not your best choice. CLR is not the best choice for standard data manipulation. If you are moving it to a log file on a disk or across servers then it might be interesting. If you are just using it for a log by the way, check out sql server 08 ‘change data capture’
Wait Stats and File Stats by Andrew Kelly part 2
This is part 2 of a class taught by Andrew Kelly in the sqlpass 24 hour free training hosted by sql pass. Wait Stats and File stats are the core of Andrew Kelly’s tools to check performance.
WAIT STATS
Definition: Any time sql server has to wait to do something it records the type of wait (wait type) and the time it had to wait in milliseconds (wait time). In sql server 2005 there are 194 different wait types and 485 different types in 2008. If it has to read rows…and go to the disk and come back to return your data it will count in milliseconds how long it takes to get this data.
2008 has new waits: PREEMPTIVE waits. These are waits that are waiting on something else. So if it is a procedure that is waiting on another procedure somewhere else then it is tracked as a preemptive wait because it is waiting on an external process.
Wait Stats are cumulative from the last time the database server was restarted (they can be manually reset).
Here’s how the entire things works.
This is the view sys.dm_os_wait_stats (in sql 2000 it was dbcc sqlperf). 21: 58:
When server attempts to do something it goes into a wait state..it starts keeping track at that point, indicated by the red line (T0). It says I’m waiting on ____(x resource)_____ and then it keeps track of the time in milliseconds. The T1 wait time (the signal wait time) starts when the resource becomes available and measures how long that resource takes (the data is ready…lets see how long it takes for the data to return). T0’s entire wait time is inclusive of T1’s wait time.
if you have lots of signal wait time that means you are (in a general sense) cpu bound. You are waiting on the resource. But if your overall wait time is more significant than your signal wait time that means you are waiting on the resource to become available. You are waiting on the disk or i/o of some sort.
Signal wait time means you are ready to do it but you are waiting for it to be dome…usually this means that you don’t have enough processors to do what you need to do.
Other points about Wait Stats
Get a snapshot using sys.dm_os_wait_stats dmv. Get a before and and an after and get the difference between the two (a delta). Take a snapshot at noon and then a snapshot at 2. That delta is the amount of waits between noon and two. To reset the counters you use dbcc sqlperf(‘sys.dm_os_wait_stats’, CLEAR)…although I’m not clear as to why you would want to/need to.
What you are looking for in the end with the delta you can generate a report to show what you are waiting on.
Most common waits :
cxPacket: the result of inefficient parallel processing…multiple threads going to the same process and is waiting on another thread. to fix, adjust MAXDOP at server or query level (hyper-threading). setting it to 1 eliminates parallelism.
lck_m_xx and LCK_M_SCH_xx results from locking and blocking…it is long running transactions and is due to improper ot lack of indexes, whcih reduces the amount of locking, which removes the amount of blocking.
async_networkio: usually indicates that the client is not absorbing the data as fast as sql server can send it. It may be a network but most likely it is just the client holding up the server from being able to pass the data. Client is written poorly or the hardware it is running on is poorly configured or not able to keep up. Sql has sent the data but client can’t keep up.
Disk related: pageiolatch_xx, io_completion, writelog: these are i/o issues. You cannot write or read to disk as fast as you need to. It is a problem with the stored subsystem, not enough disk, a driver, etc. (writelog waits should be kept as low as possible (if you cannot write transaction a to disk, transaction b has to wait, transaction c has to wait, etc).
pagelatch_xx (this is different than pageiolatch. not related to physical io). Can indicate contention for internal resources other than the bufferpooll (if you see pagelatch_up this indicates contention the files, most commonly in tempdb). heaps a lobs can cause latching…heavy inserts into the same pages, or page splits.
demo: he has a job that every once in a while he inserts file stats to a table and analyzed that data. He also had a wait stat job that did the same and then he had a stored procedure that gave the delta for two different time periods.
Q/A:
Q. How can i tie wait time aggregate data to individual queries?
A. There are a number of dmvs that can help with this (replaces sysprocesses in 2000)
Q. How often take a snapshot?
A. When he goes into a situation initially he does it every hour. After a while just once a day unless there is a problem of course.
He also recommended reading the whitepaper on Performance Tuning Waits Queues by Tom Davidson from Microsoft. He goes into how to capture the stat and what each statistic is…and explains some of the process as well as some of the wait types that can be ignored (waitfor is one example). Apparently there is a script that filters some of these out that Davidson has.
Click here or here for that 98 page document. It looks like a great read so far.
(images used by permission in part1 and part2: thank you!)
Wait Stats and File Stats by Andrew Kelly part 1
I attended a class taught by Andrew Kelly in the sqlpass 24 hour free training hosted by sql pass. Wait Stats and File stats are the core of Andrew Kelly’s tools to check performance.
Part 1: FILE STATS
Definition: Any time sql server reads or writes to the physical file (any file: data, log, secondary, etc) it records information. It records how often, how much data, how long (it takes to read and write), overall size. This is tracked for every file.
Mechanism used? sys.dm_io_virtual_file_stats (replaced fn_virtualfilestats() in 2000)
These numbers are cumulative since the last time sql server was restarted (no other way to restart these stats). These are presented as views; they are actually being held in memory. So Autoclose will clear the stats for that database.
These numbers are physical i/o. If your query is already in cache it is using logical i/o and it doesn’t affect these numbers.
Don’t forget Temp db…this is a database that has a lot of physical i/o. (tends to have more than other databases). Pay special attention to this database ( temp tables, sorting, cached joins, etc…these things create a tremendous amount of physical i/o)
You can use the two (file stats and wait stats) together to get more information.
Be aware that backups can skew these results. You have to account for them.
File stats are easy to monitor, so there isn’t any excuse to not see it. Tip #1: Get a baseline and work from there. Without a baseline there is no way to know if there is a problem that is sneaking up on you.
Most common issues that show up in file stats and what to do about it
High stalls on writes. He recommended the following
- add more write back cache to the controller or san
- change the ratio to be 100% write back & 0% read
- Change the raid type
- Add more spindles to disk array
- Always separte log and data files from different arrays (sequential vs. random access)
- Separate tempdb log file
High Stalls on Reads
- You can often get more performance from optimizing queries over by throwing hardware at it.
- Same as 2-6 above (increasing write speed increases read speed)
File Stat columns:
Restoration
Pretty basic stuff…but here are a few things I use when restoring a database including determining who is connected to the database. I always lose this information between the times that I’m not actively restoring a database. As a precursor, make sure it’s not you (close all sessions and apps and windows that might be connected to it)
1. If users are in the database, determined by calling everyone in your company or …
declare @DBName varchar(15) = ‘yourdatabase’
SELECT spid, nt_username, *
FROM master..sysprocesses
WHERE dbid = DB_ID(@DBName) and spid != @@SPID
– @@spid is you
– you can use sp_who but it lists all connections to the database server
2. You can ask them politely to get out or you can violently kill their sessions with…
USE MASTER
DECLARE @DBName varchar(15) = ‘yourdatabase’
DECLARE @spid INT,
@cnt INT,
@sql VARCHAR(255)
SELECT @spid = MIN(spid), @cnt = COUNT(*)
FROM master..sysprocesses
WHERE dbid = DB_ID(@dbname)
AND spid != @@SPID
PRINT ‘Starting to KILL ‘+RTRIM(@cnt)+’ processes.’
WHILE @spid IS NOT NULL
BEGIN
PRINT ‘About to KILL ‘+RTRIM(@spid)
SET @sql = ‘KILL ‘+RTRIM(@spid)
EXEC(@sql)
SELECT @spid = MIN(spid), @cnt = COUNT(*)
FROM master..sysprocesses
WHERE dbid = DB_ID(@dbname)
AND spid != @@SPID
PRINT RTRIM(@cnt)+’ processes remain.’
END
3. If that doesn’t work or they continue to connect then you can always set it to single-user database mode.
Use Master
DECLARE @dbName varchar(15) = ‘yourdatabase’
Alter Database @dbName
SET SINGLE_USER With ROLLBACK IMMEDIATE
–this rolls back any transactions.
4. I still like to use the GUI to restore but you can use this of course. ‘they’ say it is better this way.
RESTORE DATABASE @dbName
FROM DISK = ‘c:\yourdirectory\yourfile.bak’
Proud parent of 32 orphaned dt_ procedures
Have you ever seen these? They mysteriously popped up one afternoon on a newly moved database (to 2008). I knew they didn’t belong, so I deleted them (and one table named dtproperties or something like that). By the end of the next day they showed up again. strange.
A little digging around led to some posts where people said it had something to do with the database diagramming software in ssms (if you use the diagrams in 2005 or 2008 it automatically creates constraints, indexes…and supposedly some procedures), but I didn’t have any diagrams.
I dropped them and a little later that afternoon they showed up again….all cute and giggly.
It wasn’t affecting anything, there wasn’t any data and there were no problems with these newly adopted objects hanging around mother database. The lazy in me was tempted to let them play.
But after digging around a but more I found a lonely soul who had an adp database. Access has this file variety called ADP (Access 2003…It is discontinued in 2007). Apparently, it is an intimate tie to the sql server backend. If you create something in this adp access file it creates it in the back-end. This seemed like the right road to travel to find the home of these orphaned procedures. And sure enough, there it was. All you have to do is click ‘database diagrams’ and viola….a script throws up 32 procedures and a supporting table. There weren’t any diagrams drawn and the user was innocently clicking the different buttons on the left (tables, queries, diagrams, forms, reports). That’s it. That’s all you need to do to find yourself a proud parent of these cute little guys:
dt_addtosourcecontrol
dt_addtosourcecontrol_u
dt_adduserobject
dt_adduserobject_vcs
dt_checkinobject
dt_checkinobject_u
dt_checkoutobject
dt_checkoutobject_u
dt_displayoaerror
dt_displayoaerror_u
dt_droppropertiesbyid
dt_dropuserobjectbyid
dt_generateansiname
dt_getobjwithprop
dt_getobjwithprop_u
dt_getpropertiesbyid
dt_getpropertiesbyid_u
dt_getpropertiesbyid_vcs
dt_getpropertiesbyid_vcs_u
dt_isundersourcecontrol
dt_isundersourcecontrol_u
dt_removefromsourcecontrol
dt_setpropertybyid
dt_setpropertybyid_u
dt_validateloginparams
dt_validateloginparams_u
dt_vcsenabled
dt_verstamp006
dt_verstamp007
dt_whocheckedout
dt_whocheckedout_u
What’s the umbrella for?
I know that I can use sp_helptext to view the text of a single stored procedure, but I would like to know if I ever use the term fadrizzle in any stored procedures. How can I search all stored procedures? fadrizzle?
SELECT Name
FROM sys.procedures
WHERE OBJECT_DEFINITION(OBJECT_ID) LIKE ‘%fadrizzle %’
Turns out this doesn’t work all the time…not sure why yet. Good news though. Antonin Foller (www.foller.cz) wrote a script on how to do this very thing:
http://www.motobit.com/tips/detpg_sql-find-text-stored-procedure/ Title="Special stored procedure to find any text inside a source code of all stored procedures in one database/server."
