Developing Matt

My Technical Journal

What Itzik Taught Me Part 1

leave a comment »

 Itzik-Ben-Gan wrote a book titled (and entitled) Microsoft Sql Server 2008: T-SQL Fundamentals (link).  I have much appreciation for this speaker.  The amount of information he provides in his sessions and books is not only wonderful but also graspable.  His style of teaching is concept first, then technique later.  He will lay a conceptual groundwork in his material that he builds on and elaborates.  My first introduction to Itzik was at a pre-session to Dev Connections Conference.  The hotel was beautiful, the food was amazing.  But instead of enjoying it I stayed in my room and reviewed and reviewed and reviewed the session notes until 3 in the morning, badly preparing me for the next day but launching into new heights in my understanding of the sql language.  I enjoyed it the language before; I loved it after.

I’m a very detailed reader, so it takes me a long time to get through books.  Finances keep me from going to his courses, so I am going through the t-sql fundamentals one page at a time and below are my discoveries and take-aways.

1.  I’ve always wondered what the difference was between count(*) and count(column_A).  Count(column_A) only counts the known values.  in other words if columnA has a null it won’t be counted.  Count(*) includes nulls (p.33)

2.  You can count(distinct column_A) values!  (p.33)

3.  When you specify an order you are, in essence, forcing a cursor instead of selecting a table (p.41)

4.  ORDER BY just sorts, unless it is used with TOP.  When this occurs it sorts and filters.  Essentially it runs twice for two purposes.  First it is used with TOP to filter the records.  Then it sorts them for viewing pleasure. (p.43)

5.  When an order by tie occurs (making the query non-deterministic because there is more than one correct answer)..the engine just takes the first one it accesses. (p.44)

7.  When selecting a top with an order by and there are ties you can specify the keyword WITH TIES to return all of the tiebreakers!  (select top (5) with ties) (p.45)

8.  The over clause evaluates a grouped set of records

9.  100 is different than 100.  Did you see the difference?  Let me restate:  100. is different than 100  when doing calculations.  100 is an integer  100. (with dot) is a decimal.  Use the dot for implicit conversions to decimal.  Wow!

10.  The ROW_NUMBER function assigns a rownumber in order of the column  you specify (example: select val, row_number() over(order by val) as rownumber from Sales.OrerValues).  (p.48)

11.  RANK is similar to row number but the numbers can be repeated (for ties).  Row numbers really need to be unique in order to be row numbers (it would kind of defeat the purpose otherwise).  But rank just gives you a ranking based upon the column you specify (example: select val, rank() over (order by val) as ranking from sales.OrderValues)  (p.48)

12.  Said another way, RANK shows you how many values your value is above.  It shows where your value is ranked.  DENSE_RANK adds one more element to this.  It shows how many distinct values are below you.   In other words, if there are two values of $45 below your current value, rank will rank you at 10 where DENSE_RANK willa ctually rank you at a 9 because it ranks you based upon distinct values.  (p.48)

example: select val, RANK() over(order by val) as ranked, DENSE_RANK() over(order by val) as denseranked from Sales.OrderValues

13.  NTILE is a ranking based upon TILES.  That is confusing to me.  I prefer to think of it as TIERS.  One example I can think of is a group of runners running a race.  If you had 10 people running you’d probably just use rank.  But if you had 10,000  you’d probably use some tier (or TILE) methodology.  If you’ve ever run a 5K they often will group people into 3 or 4 tiers (or tiles).  The first tier (or tile) contains people who can run it in 15 minutes.  The second tier (or tile) contains people who can run it in 20 minutes.  The third tier (or tile) are people who can run it in 25 minutes, etc.  So when you  use the NTILE function  you specify how many tiles (or tiers) you watn to tuse as well as the ordering ( just like rank, rownumber, and dense_rank).  So for our query we are stating that I want the query to tell me which tile the row falls into…and I want it broken up into 10 tiles, sorted by val.  (p.49)

example: select val, NTILE(10) over(order by val) as NTILERESULT from Sales.OrderValues

14.  If that isn’t enough, you can partition your ranking functions!!  So if you want you can partition it by another column.  In our example, I could partition my ranking by men vs. women.  (p.51)

15.  When coding calculations of two numbers the datatype with lower precedence will be chosen as the datatype for the final result.  Mr. Ben-Gan gave th example of select 5/2.0 which would be implicitly converted and evaluated as 5.0/2.0 and the final result would be the numeric datatype.  I mentioned that the lowest precedence would be chosen.  Here is the list in order of precedence.  So what do you think when you select 5/2?  The answer is 2.  Why?  Because they are both integers and so the final result is evaluated and returned as  an integer.  To get 2.5 one of them would have to have a dot behind it.  BRILLIANT!  Thanks Itzik!  (p.53)

16.  The order of the when statements is guaranteed of a case statement, which helps when evaluating expressions that might fail  (example divide by 0 errors or other mathematical faux paus). (p.63)

17.  PRESERVED.  This is the word I’ve been looking for all along in trying to describe the different kind of outer joins.  A left outer join PRESERVES all the rows on the left table.  A right outer join PRESERVES all the rows in the right table.  A full outer join PRESERVES the rows in both tables.  (p.114)

18.  When debating over whether to filter your records in the ON clause or the WHERE clause think of it like this: is the filter related to your relationship or do you need it to be a final filter over all the returning data?  Remember how the OUTER joins work.  If you want some sort of final filter on your result, put it in the WHERE clause.  Some behavior that you see can be explained by the order of the query.  (note the next bullet).  In other words, the Where clause comes after the From clause.

19.  Here is a bit of manipulation to get you outside and away from the monitor you’ve been staring at for too long.

Fresh
Wind,
Grass,
Hills, and
Sun
   Oversteps
   Death and
   Transfuses
Order into LIFE!!

I made that up!  Just for you!  My personal goal is to get outside more than I am on the computer.  But what this acronym also represents is the order by which the sql engine processes your query.

From
Where
Group By,
Having
Select
   Over
   Distinct
   Top
Order By

Advertisements

Written by matt

May 28, 2010 at 3:07 pm

Posted in T-Sql

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: