How do you keep yourself motivated?
Do you ever get burned out? How do you keep yourself focused on technology that you encounter and work with on a daily basis? How do you keep yourself interested in the new?
Lately, I’ve found myself in a situation where I’m pushing myself to remain on top of things, and the push has become less and less significant. Just last week it turned into a pull.
Don’t get me wrong, I love the technology, but reward sometimes doesn’t outweigh the work involved. I have a 9-5 job that doesn’t require me to be on the up and up, but I know it benefits them when I do. To keep myself current, I’m going to attempt to take at least 2 of the tests required for the database developer certification this year. But the drive wanes.
I don’t have the money or the means to make it to any more conventions, so I can’t depend on that any more. I have plenty of distractions: an active family and an active photography blog to name a couple.
I don’t know how people like Adam Machanic or Kim Tripp and Paul Randal and Itzik keep going..keep posting…keep writing. I don’t know how they continue to keep up the drive. Maybe it is because they get paid for their efforts and maybe that is a copout bent on self-disservice…an excuse that isn’t fair to them. They weren’t always at the media top. They worked to get there.
Regardless, I haven’t written in a while. Simply, I’m waiting for the drive to return. It will. If you have any jump-starting ideas I’m game!
Sql Server Performance 2
Here are a few more takeaways from the performance class taought by Stephen Wynkoop on sql pserformance.
The Performance monitor
Start with task monitor to see if it is actually sql server that is dragging the machine. You might get a better ballpark.
Perfmon is a performance hit so watch update frequency. You can use backspace (after clicking on a counter) to select that counter. He says to start with a baseline and add the counters you want. You can save your workspace and change/update the scaling. Things to check:
- Page splits/second (you want the target less than 100) (under Logical Disk)
- Buffer Cache Hit Ratio (closer to 100 is better…RAM is a good fix for this problem). (this is under sql server general statistics)
- User Connections: good for understanding the load on the box. The more user connections the more hits on the box. (this is under sql server general statistics)
- If you do replication, check replication based counters
- Scan rates
- Device throughput (check disk i/o…can be an early indication of a bad disk)
- %processsor time (he likes to combine processor time, connections, and disk space to determine overall resource utilization on box) (under Processor)
- %Disk time (physical disk time, but beware of SAN. It can give a bad read. A san is an abstracted layer for storage. you might not be able to use this counter) (under Logical Disk)
Another really neat trick that was new to me was to run performance monitor and profiler at the same time to get a really good feel for what is going on with the system. EXCELLENT!
OTHER HELPFUL TOOLS
- Sp_who, sp_who2 (who is doing what)
- Sp_lock (shows locks)
- Sysprocesses (what operations are going on right now)
- sqlIO – capacity of disk subsystem (tool from Microsoft to check out disk subsystem)
Sql Server Performance
I attended a sswug online class taught by Stephen Wynkoop on sql server performance. Here are my takeaways:
Root causes of performance issues
- Indexing (missing, over indexing, wrong indexes)
- Lack of maintenance (stats, fragmentation (should be automated, easy to automate), growth issues)
- T-Sql
- Storage Subsystems (san, disk, etc) ..disk contention issues,
- Hardware (ram, etc)
Three DMVs to show you missing indexes
- Sys.dm_db_missing_index_details
- Sys.dm_db_missing_index_groups
- Sys.dm_db_missing_index_group_stats
Don’t index values that aren’t largely unique (for example a sex column or a country column where all records are in the same country)
Don’t index on 3rd party apps unless you have permission (and keep that documentation close)
He recommended keeping a running log of changes (index additions, removals) and watch indexes closely for fragmentation and utilization. (how do you watch for utilization levels on an index?)
Covering indexes are great and should include the columns you want to return. If you are pulling back lastname, firstname, city, state, zip on a consistent bases. Sql server can return the data back from the index, reducing one i/o. The catch, of course, is that sql server will have to maintain those indexes (which will hurt inserts and updates).
Autoupdate stats is fired when 20% of the data has changed.
Index order should be on selectivity with the most selective first
INDEX FRAGMENTATION TIPS
- The easy way is to use DBCC SHOWCONTIG WITH TABLERESULTS, ALL_INDEXES (or DBCC SHOWCONTIG (tblName) for one table) . The closer the scan density to 100 percent the better the health of the indexes. Also look for avg. pages per extent to be close to 8. I like to place my index stats into a temp table and query with the worst offenders on top
CREATE TABLE #frag (
ObjectName CHAR (255),
ObjectId INT,
IndexName CHAR (255),
IndexId INT,
Lvl INT,
CountPages INT,
CountRows INT,
MinRecSize INT,
MaxRecSize INT,
AvgRecSize INT,
ForRecCount INT,
Extents INT,
ExtentSwitches INT,
AvgFreeBytes INT,
AvgPageDensity INT,
ScanDensity DECIMAL,
BestCount INT,
ActualCount INT,
LogicalFrag DECIMAL,
ExtentFrag DECIMAL)insert into #frag exec (
‘DBCC SHOWCONTIG WITH TABLERESULTS, ALL_INDEXES’)select *, cast(CountPages as numeric)/cast(Extents as numeric) as ‘Avg. Pages per Extent’
from #frag order by ScanDensity, CountPages /Extents desc
- Look for avg_fragmentation_in_prercent from Select * from sys.dm_db_index_physical_stats(null,null,null,null,null). If more than 10% then you should defrag your indexes. The general recommendation is to rebuild over 30% fragmented. Defrag if 30% or less fragmented. Defragging is less effective but it doesn’t lock the table and recreate the index from scratch (which is what the rebuild does). For high impact systems you can also defrag over and over again until it gets to the level of fragmentation that you are comfortable with.
- Other options: rebuild with online= on and reorganize (to defragment)
- Review indexes quarterly
- Check out brent ozar’s discussion on identifying which databases have index fragmentation (http://sqlserverpedia.com/wiki/Index_Maintenance)
T-SQL TIPS
- Watch data type matching between schema and query (convert your parameter early to match column)
- Watch transactions. Don’t leave them wide open. It blocks other operations and creates contention
- Avoid temp tables (tempdb works very hard to keep up). You can replace it with working tables that are static and you can put indexes on them
- Watch naming convention. If it starts with sp_ it will actually query the master database first to try to find the system procedure
- Most of the time you don’t need hints….it is usually better to let sql serer do it
STORAGE SYSTEM TIPS
- Logical disks does not equal performance. You want physical spindles. You need more spindles to get performance because they re different access points to the drive.
- Disk speed is still the biggest contributor to performance issues
HARDWARE TIPS
- Over virtualization: more than one instance of sql server can impact performance
- Nic saturation (the network card is being over utilized…with a bad router, etc)
- Cabinet saturation: if your cabinet is joined with other servers that are spewing out tons of data this can cause problems as well
- Linked servers: performance takes a hit..there is a network between the two. connection speeds, resource utilization, check indexing on other server, etc.
- Database Compatibility Levels: if you are running a 2000 database on a 2008 server then performance will be hit (doesn’t have all the access to the new a native database with native tools and capabilities)
- Virtualization: multiple virtual machines that are fighting for resources.
Don’t forget the Resource Governor in SQL 2008 tool where you can tap down certain access to different applications (you can restrict how much memory it can use)
User has grant access to which procedures?
How to get the list of objects (stored procedures, specifically) for which a user has access
SELECT p.name, permiss.permission_name,permiss.state_desc, princ.name
FROM sys.database_permissions permiss
JOIN sys.database_principals princ
ON permiss.grantee_principal_id = princ.principal_id
JOIN sys.procedures p
ON major_id = p.object_id AND princ.name = ‘Domain\UserName’
where permiss.state_desc = ‘GRANT’
And here’s how to get the list of objects for which a user doesn’t have access. It is simply turning the above on it’s head. Lots of ways to do it, but I kept it this way for readability
with ctePROCS as
(
select p.name
from sys.database_permissions permiss
join sys.database_principals princ
on permiss.grantee_principal_id = princ.principal_id
join sys.procedures p on major_id = p.object_id
and princ.name = ‘Domain\User’
WHERE permiss.state_desc = ‘GRANT’
group by p.name
)
SELECT p.name
FROM
sys.procedures p
left join ctePROCS procs on p.name = procs.name
where procs.name is null
SQL Server 2008 Security by don kiely
I’m not very knowledgeable to sql server security, so I was glad to finally get to this class taught by Mr. Kiely. This is one of the classes I took in the 24 hour free technology marathon hosted by sqlpass. Click on the sqlpass24 tag to see the others. I’m very grateful to sql pass for providing this training.
Mr. Kiely is a developer who sees the database world from my perspective. While not a fan of security I’m a fan of him. When you listen to him you get the sense that he is just an average guy with a lot of great useful knowledge and a goofy sense of humor.
When we used to be neighbors in Alaska we used to hang out all the time. We would talk about security after a day of sledding in the snow.
Ok. that’s a complete lie.
Here is the information that I caught from the course. You should probably attend one yourself to catch what he really said, but this is what I picked up:
The big 2008 security changes that I saw
1. Builtin\administrator is no longer automatically sysadmin
2. In 2005 we received encryption and granular permissions. 2008 builds on it.
3. Encryption
ENCRYPTION
The biggest area of enhancement is encryption. Personally, I’m not interested in encryption at all. But as a developer I see it as my responsibility to be aware, as it can be very useful for sensitive data. If anything else, this class made me less afraid of it. It looks like it’s not near the overhead that it used to be and my fear of losing data due to some encryption botch-up is slowly inching itself away from reality.
SLQ 2005 introduced native data encryption at the cell level. There wasn’t any way to encrypt the entire database. In 2008 you can encrypt the entire database (it encrypts the file, the log, and the backup files, and thus is transparent to the application)
With transparent data encryption you cannot attach and read the data from an encrypted database. It queries a certificate to access physical files. Unlike 2005 this encrypts all data as it is written to disk…. decrypted as read from disk…and encrypts and decrypts each 8k page.
Filestream data is not encrypted, by the way.
OTHER SECURITY CONSIDERATIONS
Policy based management: centralized control over configuration settings, object properties, etc across server machines and instances. Can I just say WOW! I had no idea that was there. I might be mistaken, but it looks like I could allow cmd shell based upon a schedule. CRAZY.
Furthermore you can record any action against any object. What? Really? If I wanted to find out who was eating my oreos in the middle of the night I could do it with this. I speak in jest, but this is actually a very powerful feature.
Don, when are you going to write a security book?
Embed Reporting Services by Jessica Moss
I attended ‘Embed Reporting Services into your applications’ by Jessica Moss as part of the 24 hours of training hosted by sqlpass. Click on the sql24pass tag to see the other courses I took. Thank you J. Moss and SqlPass for the time and energy spent for my education. You were there for me.
Ms. Moss was very knowledgeable and has some interesting points on the differences between the different varieties of report services and the benefits therein. Specifically, she discussed local, server, and sharepoint distribution methods.
server mode beneifits
security
subscirptions
snapshots
report history
scalable
local mode benefits
It can be used in asp.net applications, windows form applications. Technically you can use it without a backend source… by passing in the datasource directly to the report (good reuse). By doing this you can get around licensing requirements (read: free). However, you can only export to excel and pdf with the local mode and data has to come from a valid datasource (not sure what that means..I think it has to be a .net object…not sure )
My favorite part of this session was the Q/A time. Jessica Moss had an opportunity to show that she knew what she was talking about with some of the tougher questions:
Q. Can we have access to multiple stored procedures form a single report?
A. On a local report, yes. on a server report no.
Q. Can you discuss differences bw 2005 and 2008 rdl report viewer control.
A. The rdl tablix, gages, and rtf (allowing html). The report viewer with local mode will have a problem with the new rtf capability. Server mode will not have a problem with any of the new features.
Q. Can rdl be converted to rdlc?
A. Yes. you can by editing the file. It’s harder to go from rdlc to rdl because you have to add the dataset information. But it can be done.
Q. Is there access to snapshots when in sharepoint mode?
A. It depends on how you have report services configured. In regular mode, yes. if your server is integrated you don’t have access to report manager…therefore you don’t have access to the snapshot tool with report server. However can do subscriptions but it’s through the sharepoint tool.
Database Compatibility Settings by Don Vilen
Just took a course taught by Don Vilen titled ‘Database Compatibility Settings: What they Really Do and Don’t Do”. Vilen has been a part of the knowledgebase for the different versions for several years (8?) and spoke to the compatibility between the versions. It was interesting to listen to the history and the problems and the work behind keeping the database compatibility issues to a minimum. The key points I took away from this course:
1. test your code
2. watch out for keywords not allowed (merge is a new one for 2008)
3. push vendors to upgrade and stay on top of your servers to upgrade asap.
Does the upgrade advisor catch compatibility? It will catch a lot of them if they are syntactical (can be caught by scanning code) but behavioral differences are hard to detect.
What will happen if i restore a database with a table named revert? it will work but if there aren’t brackets around it then there will be syntactical problems.
Changing your compatibility to 2008 and then back again doesn’t necessarily change it back…it’s kind of a pseudo change when you revert back.
Put it on a schedule and upgrade as soon as possible to take advantage of the newest versions.
Thanks Don for the interesting discussion and for sqlpass for hosting the 24 hours of sql pass. Click the sql24pass tag to see the other classes I took.
SPLIT with CLR
So I decided to take Adam Machanic and Itzik Ben-Gan at their word. The other day I took a course by Adam Machanic regarding whether to use t-sql or clr. And about a year ago I took a course by Itzik where he reviewed the different ways to split a comma separated list of words into a table (this can also be found in a whitepaper titled Arrays and Lists in Sql Server 2005).
Itzik said that it would indeed be best (for performance) to write a clr routine. Machanic himself tested that when just doing a simple charindex versus an indexof (in a clr routine) that clr performed 3x better when the length of characters exceeded 50,000 characters.
So, having never written a clr-anything I decided to give it a try, depending heavily on this article. There’s a new ‘SQL SERVER PROJECT’ option in visual studio 2008, but I didn’t know what it was…so I stayed away from it. Instead I just wrote a class affectionately named ‘fnParseString’ with the code from the article.
I compiled the code.
I turned on the thing that needed to be turned on for my server
sp_configure ‘clr enabled’, 1
GO
RECONFIGURE
I then created an assembly
CREATE ASSEMBLY CLRFunctions FROM ‘C:\tempsql\fnParseString.dll’
When I created a function with the article it didn’t work, telling me that it couldn’t find my UserDefinedFunction. After many iterations I ended up with the below external name (in italics below). Not sure how to rectify this one in the future, but one article I read suggested that it might be due to the fact that I picked just a class project.
CREATE FUNCTION [dbo].fnParseStringCLR(@string [nvarchar](4000), @separator [nchar](1))
RETURNS TABLE (
[StringCol] [nvarchar](max) NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME CLRFunctions.[fnParseString.UserDefinedFunctions].parseStringCLR
Then I called it and compared to the next best thing..the function we use at my company (a function that uses a numbers table) the performance was significant. It was a 50% savings to use CLR. WOW!
—edit 9/16/09 3:45pm
In looking into the security concerns with CLR I found a great article by Don Kiely here. Two things I picked up from this article. CLR doesn’t have any more security concerns than t-sql within the same security context, but it does have possible memory issues with poorly written code…which can be the same problem with some of those nastily written t-sql procedures that you continue to ignore. : )
SQLclr or T-SQL? Performance Options
Adam Machanic, out of Boston, taught this class on the sql 24 pass online training class promoted and supported by sqlpass. THANK YOU sqlpass and THANK YOU Adam Machanic.. I didn’t know that Adam Machanic hosted the http://sqlblog.com site.
First off, some great quotes:
“Performance tuning is an art as well as a science.”
“If you don’t have the common sense, ask someone who does”
”If you have a solution in your software project that isn’t the right solution for the situation, you will feel it.”
”As a developer you have to develop a sense of what is good and what is not good and apply it on a situational basis.”
”Iterative logic shines with SQL-CLR”
SQL-CLR should
- Operate on data either headed for or already in the database.
- Should be used for data-centric operations (churning data), not application or ui-centric
Performance tradeoffs to consider
- Adding indexes improves read performance and hurts write performance
- Rewriting queries may make them more complicated
T-SQL is the only choice for reading data and doing standard data operations. But more elaborate uses like the following should be considered for CLR:
- math or string operations
- linear and time series computations
- heavy xml
- manipulation of different kinds of data (like images)
The question being attempted to answer: Does passing data to CLR carry huge overhead, considering that clr is jitted (compiled down to machine code) and T-SQL is interpreted?
TESTS:
Test 1 Return Len on a string. His first test:He created t-sql and slq clr function that return the length of the input string. he ran it 100K times and passed in different strengths. t-sql did better, but not by much. but when it reached over 50000 characters there was a difference in about 5 seconds.
Test 2 math: he found prime numbers from x to x. clr : 24 seconds. t-sql : 43 minutes, 2 seconds. clr performed better. Simple inline/non-procedure math may not be an issue (counting on a math). bottom line: if the formula needs to be a udf, then consider converting it to an inline procedure or consider clr.
Test 3. Non-math. string operations. He created two functions, one using t-sql charindex and the other (clr) using .net string.indexof. and then he asked to find the character Y..and called it 200 K times. When it was around 100K character they did the same. but when it went above that ..up to 50000 characters then clr performed 3x better.
Test 4. replace. pretty much the same before. created two functions, one using t-sql replace and the other using .net string.replace. ran it 100K again. at the low end around 2500 they performed the same, but at around 50000 characters clr performed 7x better (clr ran at 100 seconds, t-sql ran at 700 seconds)
Test 5. linear and time series (linear means any query where the current value depends on the previous value(s)). His example went something like his..’select t1.y, sum(t2.x) as running_x from t as t1 inner join t as t2 on t1.y >=t2.y group by t1.y) for each input row look at all the proceeding rows once …every single time.
You can use a cursor for this but the fastest guaranteed reliable method is CLR. Use an ado.net datareader (forward only), use an order by so that the data is sorted by the query engine and then loop with the data reader…use local variables to calculate and store running values. Output the data using sqlpipe (allows us to pass a rowset back to the caller…he called them row by row– no temp tables are needed). You can see the code on his blog )search for running sums). The results:
- set based sql: 15:43
- sql cursor: 0:04
- clr stored procedure: 0:01.5
Test 6. xml manipulation. .net has an xml datareader. sql server’s model is iterator based sqlxml model and simply doesn’t perform well. CLR is about 5-7xs faster. and it’s easier. 3 hours on t-sql with someone from microsoft xml ..about an hour on CLR for him to create. (you can download extended events code generator on his blog here)
Q/A
Q. Does CLR add memory overhead to the server?
A. It can. You want to watch memory very closely and use screaming algorithms (use datareader not a datatable for example). memory is the most expensive resource. bottom line: follow best practices and know what you are doing.
Q. Which is better for recursion for a hierarchy.
A. Neither. In 2005 the best way is to use materialized paths used for hierarchy. In 2008 use hierarchy id (which uses sql clr under the hood). Recursion is not the right solution.
Q. How was he calling charindex?
A. He used a clr function calling string.indexof He put it in a while loop and called it 100x. For the sql version he used charindex directly.
Q. We found that udfs are a serious drag on performance vs inline. Are you suggesting that we might increase performance if we use CLR instead?
A. Yes. Multi statement udfs are a drag and not good for performance. Look up Alex’s articles on sql blog (or any search on sqlblog on ‘inline’) to determine if it’s better for inline. Heavy calculations and functions are better in CLR..but generally inline is better.
Q. Why do some DBA’s not like CLR and think it’s not safe.
A. There’s a great acronym FUD: It stands for fear uncertainty and doom. what they don’t know they fear.
Q. Would you recommend using clr to replace a trigger table that receives 20 transactions per second. That trigger includes update and insert statements.
A. If you are just moving the data somewhere else like a log, then clr is not your best choice. CLR is not the best choice for standard data manipulation. If you are moving it to a log file on a disk or across servers then it might be interesting. If you are just using it for a log by the way, check out sql server 08 ‘change data capture’
Wait Stats and File Stats by Andrew Kelly part 2
This is part 2 of a class taught by Andrew Kelly in the sqlpass 24 hour free training hosted by sql pass. Wait Stats and File stats are the core of Andrew Kelly’s tools to check performance.
WAIT STATS
Definition: Any time sql server has to wait to do something it records the type of wait (wait type) and the time it had to wait in milliseconds (wait time). In sql server 2005 there are 194 different wait types and 485 different types in 2008. If it has to read rows…and go to the disk and come back to return your data it will count in milliseconds how long it takes to get this data.
2008 has new waits: PREEMPTIVE waits. These are waits that are waiting on something else. So if it is a procedure that is waiting on another procedure somewhere else then it is tracked as a preemptive wait because it is waiting on an external process.
Wait Stats are cumulative from the last time the database server was restarted (they can be manually reset).
Here’s how the entire things works.
This is the view sys.dm_os_wait_stats (in sql 2000 it was dbcc sqlperf). 21: 58:
When server attempts to do something it goes into a wait state..it starts keeping track at that point, indicated by the red line (T0). It says I’m waiting on ____(x resource)_____ and then it keeps track of the time in milliseconds. The T1 wait time (the signal wait time) starts when the resource becomes available and measures how long that resource takes (the data is ready…lets see how long it takes for the data to return). T0’s entire wait time is inclusive of T1’s wait time.
if you have lots of signal wait time that means you are (in a general sense) cpu bound. You are waiting on the resource. But if your overall wait time is more significant than your signal wait time that means you are waiting on the resource to become available. You are waiting on the disk or i/o of some sort.
Signal wait time means you are ready to do it but you are waiting for it to be dome…usually this means that you don’t have enough processors to do what you need to do.
Other points about Wait Stats
Get a snapshot using sys.dm_os_wait_stats dmv. Get a before and and an after and get the difference between the two (a delta). Take a snapshot at noon and then a snapshot at 2. That delta is the amount of waits between noon and two. To reset the counters you use dbcc sqlperf(‘sys.dm_os_wait_stats’, CLEAR)…although I’m not clear as to why you would want to/need to.
What you are looking for in the end with the delta you can generate a report to show what you are waiting on.
Most common waits :
cxPacket: the result of inefficient parallel processing…multiple threads going to the same process and is waiting on another thread. to fix, adjust MAXDOP at server or query level (hyper-threading). setting it to 1 eliminates parallelism.
lck_m_xx and LCK_M_SCH_xx results from locking and blocking…it is long running transactions and is due to improper ot lack of indexes, whcih reduces the amount of locking, which removes the amount of blocking.
async_networkio: usually indicates that the client is not absorbing the data as fast as sql server can send it. It may be a network but most likely it is just the client holding up the server from being able to pass the data. Client is written poorly or the hardware it is running on is poorly configured or not able to keep up. Sql has sent the data but client can’t keep up.
Disk related: pageiolatch_xx, io_completion, writelog: these are i/o issues. You cannot write or read to disk as fast as you need to. It is a problem with the stored subsystem, not enough disk, a driver, etc. (writelog waits should be kept as low as possible (if you cannot write transaction a to disk, transaction b has to wait, transaction c has to wait, etc).
pagelatch_xx (this is different than pageiolatch. not related to physical io). Can indicate contention for internal resources other than the bufferpooll (if you see pagelatch_up this indicates contention the files, most commonly in tempdb). heaps a lobs can cause latching…heavy inserts into the same pages, or page splits.
demo: he has a job that every once in a while he inserts file stats to a table and analyzed that data. He also had a wait stat job that did the same and then he had a stored procedure that gave the delta for two different time periods.
Q/A:
Q. How can i tie wait time aggregate data to individual queries?
A. There are a number of dmvs that can help with this (replaces sysprocesses in 2000)
Q. How often take a snapshot?
A. When he goes into a situation initially he does it every hour. After a while just once a day unless there is a problem of course.
He also recommended reading the whitepaper on Performance Tuning Waits Queues by Tom Davidson from Microsoft. He goes into how to capture the stat and what each statistic is…and explains some of the process as well as some of the wait types that can be ignored (waitfor is one example). Apparently there is a script that filters some of these out that Davidson has.
Click here or here for that 98 page document. It looks like a great read so far.
(images used by permission in part1 and part2: thank you!)
