Archive for the ‘Uncategorized’ Category
New photography site
I like to keep my personal and professional life somewhat separate, but there is this area that falls in the gray between those two lives. So I set up another blog to exploit it a little.
I like to take pictures and have started up another site called http://mattclingan.com. Check it out. I will be keeping this one technical, but I just thought I’d throw up a post about it. Thanks for placating.
Index Internals and usage by Kimberly Tripp
If you have heard Kimberly speak, you know she loves to talk about indexes. So this was a class where she could let loose with her love of indexes and run with it. And it was great! Every class i take from her is great. She is passionate about her topics, and that passion is transferred over to the crowd. I learned a lot and have much to learn! There are some major points that she hit below and my notes are here: 01_index-internals-and-usagkimberlytrip. This class might be difficult to read through. It’s a good class to sit through and listen to her explan as she goes through the slides, but hopefully it will help.
The point of optimization is not to find a good plan….but to find a good plan fast. This sometimes means that the best plan is yet to be discovered, so modifying and tweaking and testing your query is a must.
Optimization is about selectivity and what determines selectivity? Statistics. In sql 2000 you had to watch the number of statistics but in 2005 the more stats the better.
Sql server has to know how much data is going to be processed before it goes to the data effeciently.
Stats on Stats
- Stats are stored in sys.stats
- Stats are not valid between releases (use sp_updatestats)
- Turn on auto create stats and auto update stats
- There is a dmv called indexusage
- Consider adding more statistics using sp_createstats
- Also recommended: sp_createstats ‘indexonly’,'fullscan’
Indexes…Pages
- Each page is 8K (8096 if you take away the header(96 bytes)). There are 400 bytes per row which equates to 20 rows per page
- How do you see the page count? SELECT * FROM SYS.DM_DB_INDEX_PHYSICAL_STATS TO VIEW INDEX
- Sp_helpindex to view indexes
- A non clustered index is effectively a mini-table, which means they have a lot of interesting uses (which is why an index on last name, first name, middle initial allows you to search for all three or a combination of)
- Sometimes wider indexes are better than narrower. If you are seeking highly selective index names then the narrow index is selective enough. But it’s useless ot have narrow indexes when your queries are returning 1,2,3% of the data.
- What performance tools are available? She mentioned performance dashboard, the dta utility, and umv’s.
Key points
“Always start with a good clustered index key. If the non-clustered indexes have to look up the data based on the clustereing key that means the clustering key is in every single non-cluster. So make sure it’s narrow. if it’s wide it’s going to make all of your non-clusters unnecessarily wide. And if it’s not static then it has to go to all these places to keep it up to date. If it’s not unique, sql server ‘uniquifies’ which is highly costly (on insert, updates, times, and space). Start with a minimal number of non-clusters. if pk is not the clustered then make the pk the next index. then create your nonclustered unique keys. Then manually create indexes on your foreign keys. Then start adding highly selective search arguments. And then STOP.”
“Narrow indexes have very few uses, so use wider indexes.”
Accidental DBA – Paul Randal, Kimberly Tripp
I took a class titled ‘Accidental DBA’ by Paul Randal and Kimberly Tripp. It was my introduction to this couple and I discovered why people love to attend their classes. It takes a stroke of brilliance to bridge the gap between the inner workings of of a database and translate it into a common language. Someday I hope to have enough understanding to be able to teach a class of my own. They are my role models for now. Soon I will be theirs. : ) When you view the notes the first is not in any kind of outline. It is interesting to note that some instructors are more logical in their presentation. That is, they follow a logical topical path (Paul). Others present a more referential structure. They will hit a topic and apply information from a lot of places that apply to that topic (Kimberly). Neither is correct of corse. The former provides easier outlining structure, the latter provides a more Jackson Pollock paint splattering structure (try it). Just as pretty; just different. I digress. My notes: accidentaldbanotes
Highlights and snapshot overview
- In 2005 manage the temp database. It stores more structures and data than ever before (worktables, sort tables, hashing, userdefined object, table variables, # or ##, triggers) Also see Brandon Leuthke’s post on top 10 dmvs to use to query tempdb. For all these reasons isolate the tempdb, preallocate tempdb.
- How to move databases: see kb 224071 Whitepaper on working with tempdb
- dm_db_file_space_usage tells how space is being used in tempdb
- Autoshrink is evil
- Autoclose should be turned off
- Autogrow is ok, but don’t let it get out of control (2005 is better than 2000). And it can create problems with internal log fragmentation.
- Optimize log file, only one log file, isolate log file, pre-allocate log, defrag drive where log resides
- Anatomy of data modification (how it updates)
- Locks are like breadcrumbs and are OK! (most)
- See kb 271509 to see about queuing and on how to view who is blocking at the head of the chain (sp_blocer_pss80). In kb there is a bunch of code that can be used including a dbcc input buffer (shows last command of that blocking user)
- Nice discussion on transactions and locks and how it works (A transaction is an atomic unit. When the transaction is complete sql server writes logs to transactions, release locks, notifies user).
- Purpose of a checkpoint: reduce roll-forward recovery time…batch i/o to disk….help bring what is closer to current…no guarantee on consistency (only way is with logs)
- Recovery models discussed
- Page types (data, index, text, sort, boot, file header, allocation maps, diff map, MI map)
- How data is stored..the record structure
- How data is organized (heap, clustered index, non-clustered index, lob data, etc)
- How data is accessed (singleton lookup, range scan, allocation order scan)
- The 5 phases of execution
- Fragmentation and corruption (page splits and wasted space are the cause.)
- Schemas that use page splits: guids, updates to rows with variable len columns, random insertion points, mismanaged fill factor (whatever the heck that is)
- Corruption types
- How to catch corruption: (set alert on error types 823,824,825….turn on pagechecksums…checkdb should be run regularly (at least with physical_only))
- What to do with corruption
What’s new in asp.net 3.5 by Steven Smith
The class was a pretty good introduction to the new stuff. Attached are my notes.
There are new controls (listview, datapager to add paging support, linq datasource (bind to any linq enabled data model…(sql right now, but in the future amazone, flickr, text, xml, etc)
The visual studio enhancements are nice (javascript intellisence, multi-targeting support (asp.net 2.0 or 3.5 your choice), control extender support
Other things in the class: asp.net routing engine (similar to url rewriting), asp.net mvc
Regarding Linq: I asked about the overhead of using linq and he didn’t know. It’s basically an object relational mapper and a little better in performance than a dataset. All this translates: I won’t use it. Currently you can’t use edit/update/delete unless you fetc the entire row. (unless you use an anonymous type…you can get one from www.linqpad.net i guess). One good thing. It doesn’t run the query until it needs it (which is why paging support is so much better)
Understanding complications, plan reuse and recompilations of stored procedures – Itzik Ben-Gan
Understanding complications, plan reuse and recompilations of stored procedures – Itzik Ben-Gan
Can I just say WOW! The things I learned in this 1 hour class were phenomenal. Thanks Itzik for giving me a leg up on optimizing queries. My notes are here (02_understanding-compliationsstoredproc_itzikbengan) and his sql scripts are here.
PLAN RESUSE
When is a plan reused and what is the possible problem?
The problem is that the cost of reusing an inadequate plan is higher than the cost of recompilation
Any query over 1% of an index is not as efficient as a full table scan
How to query the existing plan: select * from sys.syscacheobjects where sql like ‘%something unique about stored proc. He used newid() to get an ide and added it as a commented line in the stored proc%’
· Usecounts will show how many times the plan has been used
An example would be to create a stored procedure and then run it for the first time with a highly selective criteria. The plan that is saved might be ineffective if that’s not a typical criteria.
Possible solutions:
Create an index that covers the ctriteria(n))
Note: when it doesn’t matter the selectivity it is called a trivial plan and the same plan is always usedc (the optimizer has a flag for this type of plan)
Add recompile to procedure
Specify certain queries not to optimize (add option (recompile) after where clause
The cost of recompile every time is less than the cost of using a wrong plan
PARAMETER SNIFFING
The problem is centered around the fact that a declaration inside a stored procedure cannot be ‘sniffed’ (read ahead), therefore it cannot be optimized. Great discussion on this read notes for solution
DYNAMIC FILTERS
This was another great example that showed a common <bad> use of dynamic sql. (example….where name like ‘%@parameter@%’ or city like ‘%@city%’ or state like ‘%@state%’) He talked about a way to optimize a query in a clever way. He also spoke of how to prevent sql injection (use sp_executesql with parameters). Excellent information I can use today.
Query Programming and Tuning – Itzik Ben-Gan
Query Programming and Tuning – Itzik Ben-Gan

I can’t express enough how much fun it was to take Itzik’s class. It was my first class with him, and now I know why he has such a huge following. He has a grasp for understanding the structure of the query language and a passion for database logic. I took a 6 hour class and have attempted to encapsulate my learnings in 25 pages of notes here. Also, you can find the sql for this class on the references section of his blog (www.insidetsql.com) or by clicking here.
Highlights:
THE OVER CLAUSE
“The over clause is the single most powerful thing in sql language” because it bridges the gap between iterative logic and the logic of sets. It is optimized well, orders well, and has a group of ranking functions. The different ranking functions are covered in good detail.
DATA AGGREGATION TECHNIQUES
Subqueries: expensive. Using subqueries is a separate scan for each subquery. Multiple joins are better than subqueries as they do one scan fo all.
Cursors: sometimes you have to use (benefits: can control order of elements and doesn’t impose restrictions) However the cost is extensive. It is several times slower than a basic aggregate
Grouping: single pass through data. This is a good solution unless you need a running aggregate. However, the over clause will give us better performance for this case.
Sliding aggregates: similar to grouping but we fixa a low point and an upper point (1 month ago to 1 month ahead)
Custom Aggregates: most common forms: string concatenation, median, mode
Pivot: Using pivot is useful but it’s not any better performance than grouping…it isn’t useful if there are large elements in the group….and it’s not straitforward (for someone else to administer later)
CLR User defined aggregates: 2005 is restricted to only 1 aggregate function with code and are restricted to 8000 bytes. Using clrs is the fastes way that he knows to do sting concatenation, but it won’t be easy to maintain.
AN AUXILILARY TABLE OF NUMBERS (what is it? a single column with a sequence of integers)
TIP: in 2005 you can use a synonym to make it look like a local object.
Use this table To SPLIT ARRAYS (the best way to split an array as far as performance is to use a clr). This is a great section that you should be sure to read as EVERY organization finds a need for this, and most of the time we end up writing functions that use poorly optimized resultsets.
TOP
The problem with top is how it is used: Select top 3 * from sales order by qty Here TOP needs to use ORDER BY for filtering purposes (which 3 rows to pick). The order by clause now has two meanings: display and filtering. There is no guarantee of presentation of the data. Instead use the OVER clause. (lots of examples in notes)
T-SQL VS CLR
If it’s data manipulation, use t-sql
If it’s iterative logic, algorithmic logic,k computational intensive clculationrs, string manipulation, parsing then use .net.



