Best sql-server-2008 questions in May 2012

Why can't I perform an aggregate function on an expression containing an aggregate but I can do so by creating a new select statement around it?

10 votes

Why is it that in SQL Server I can't do this:

select  sum(count(id)) as 'count'
from    table

But I can do

select sum(x.count)
from
(
    select  count(id) as 'count'
    from    table   
) x

Are they not essentially the same thing? How am I meant to be thinking about this in order to understand why the first block of code isn't allowed?

SUM() in your example is a no-op - SUM() of a COUNT() means the same as just COUNT(). So neither of your example queries appear to do anything useful.

It seems to me that nesting aggregates would only make sense if you wanted to apply two different aggregations - meaning GROUP BY on different sets of columns. To specify two different aggregations you would need to use the GROUPING SETS feature or SUM() OVER feature. Maybe if you explain what you want to achieve someone could show you how.

T-SQL procedure - filter parameter as Object/CLR/Xml/UDT

9 votes

Cliffs: Is there a known pattern for passing a standard "filter" type to a stored procedure to encapsulate stardate/enddate/pagesize/pagenum parameters?

Not sure the correct place for this question. I'm exploring the idea of passing a filtering object parameter to a stored procedure which encapsulates our common filtering parameters (startdate, enddate, pagenumber, pagesize, list of int's, etc). The reason for this is to reduce the amount of similar parameters and boilerplate SQL spread around our procedures. This would give us a more standard interface and starting point for each procedure right from the start. I haven't been able to find much info on the topic.

Pattern I've noticed - when first building most SP's they start with a single id parameter used in the where clause. At some point later, you may need to add parameters for date range parameters (startdate, enddate or dynamic ranges "ytd, mtd, dtd"). If the data set is large enough you also may need to introduce pagesize/pagenum for server side paging. After some time you may realize that you need results for a list of id's rather than a single id, so you add a CSV or XML parameter to envelope the IDs.

Ultimately many stored procedures end up with a lot of similar boilerplate and (hopefully) identical parameters for handling these standard filtering parameters. I'm trying to research known patterns for passing an encapsulated filter object parameter to my procedures, that ideally would be strongly typed on the C# side. This would be particularly useful when managing a group of procedures that power reports which all require the same filtering options (in addition to the report-specific query parameters).

My goal is to reduce the number of parameters required to the bare minimum needed for the WHERE clause, and create a standard mechanism for passing the generic filtering options into a procedure and using those values while inside a procedure. How could this be achieved through XML or CLR or UDT parameters?

For context of this question, I'm using SQL Server 2008 via ADO.Net from C# 2.0. Unfortunately LINQ/EF is not an option for this project at this point, and we must stick with our existing RDBMS. If there is a known pattern that requires changing technologies I would be interested in hearing about it.

Edit: Appreciate the replies so far. I've added a bounty for 50pts that I'll let run for a few more days to try to promote some more discussion. If my question isn't clear enough just leave a comment..

I personally think that you're overthinking or trying to reduce something that doesn't need to be reduced. You're probably better off leaving stored procedure parameters alone, or trying to create some base classes and helper functions that can append sets of parameters to a command object.

However, that being said, I'll throw a solution to your question out there and see if it fits your needs:

I suggest using TSQL user defined types. Create one or more types. Maybe one for date ranges, and one for paging and sorting. I use a similar process for passing multi-row data to stored procedures. (Some of this code might need to be tweaked a bit, as I'm just modifying some code I've already written and I haven't worked with DataTable fields in quite some time.)

Ultimately, all this does is shorten the list of parameters in the application method and matching stored procedure. The stored procedure would be responsible for extracting or joining the information in the table variable. The classes listed below do provide the ability to keep these parameters strongly typed on the .NET application side.

if not exists (select * from INFORMATION_SCHEMA.DOMAINS where DOMAIN_SCHEMA = 'dbo' and DOMAIN_NAME = 'DateRange' and DATA_TYPE = 'table type')
begin

    create type dbo.DateRange as table 
    (
        StartDate datetime2 null
        ,EndDate datetime2 null
    )

end
go


if not exists (select * from INFORMATION_SCHEMA.DOMAINS where DOMAIN_SCHEMA = 'dbo' and DOMAIN_NAME = 'Paging' and DATA_TYPE = 'table type')
begin

    create type dbo.Paging as table 
    (
        PageNumber int null
        ,PageSize int null
        ,SortField sysname null
        ,SortDirection varchar(4) null
    )

end
go

The SQL user defined types can be represented as strongly typed objects in a .NET application. Start with a base class:

    Imports System
    Imports System.Data
    Imports System.Data.SqlClient
    Imports System.Runtime.Serialization


    Namespace SqlTypes

        <Serializable()> _
        <System.ComponentModel.DesignerCategory("Code")> _
        Public MustInherit Class SqlTableTypeBase
            Inherits DataTable

            Public Sub New()

                MyBase.New()
                Initialize()

            End Sub


            Public Sub New(ByVal tableName As String)

                MyBase.New(tableName)
                Initialize()

            End Sub


            Public Sub New(ByVal tableName As String, ByVal tableNamespace As String)

                MyBase.New(tableName, tableNamespace)
                Initialize()

            End Sub


            Protected Sub New(ByVal info As SerializationInfo, ByVal context As StreamingContext)

                MyBase.New(info, context)

            End Sub


            ''' <summary>
            ''' Implement this method to create the columns in the data table to match the SQL server user defined table type
            ''' </summary>
            ''' <remarks></remarks>
            Protected MustOverride Sub Initialize()


            Public Function CreateParameter(parameterName As String) As SqlParameter

                Dim p As New SqlParameter(parameterName, SqlDbType.Structured)
                p.Value = Me

                Return p

            End Function

        End Class

    End Namespace

Create an implementation for the SQL types:

Imports System
Imports System.Data
Imports System.Runtime.Serialization


Namespace SqlTypes

    <Serializable()> _
    <System.ComponentModel.DesignerCategory("Code")> _
    Public Class DateRange
        Inherits SqlTableTypeBase

        Public Sub New()

            MyBase.New()

        End Sub


        Public Sub New(ByVal tableName As String)

            MyBase.New(tableName)

        End Sub


        Public Sub New(ByVal tableName As String, ByVal tableNamespace As String)

            MyBase.New(tableName, tableNamespace)

        End Sub


        Protected Sub New(ByVal info As SerializationInfo, ByVal context As StreamingContext)

            MyBase.New(info, context)

        End Sub


        'TODO: throw some more overloaded constructors in here...

        Public Sub New(startDate As DateTime?, endDate As DateTime?)

            MyBase.New()

            Me.StartDate = startDate
            Me.EndDate = endDate

        End Sub


        Public Property StartDate As DateTime?
            Get
                Return CType(Me.Rows(0)(0), DateTime?)
            End Get
            Set(value As DateTime?)
                Me.Rows(0)(0) = value
            End Set
        End Property


        Public Property EndDate As DateTime?
            Get
                Return CType(Me.Rows(0)(1), DateTime?)
            End Get
            Set(value As DateTime?)
                Me.Rows(0)(1) = value
            End Set
        End Property


        Protected Overrides Sub Initialize()

            Me.Columns.Add(New DataColumn("StartDate", GetType(DateTime?)))
            Me.Columns.Add(New DataColumn("EndDate", GetType(DateTime?)))

            Me.Rows.Add({Nothing, Nothing})

        End Sub

    End Class

End Namespace

And:

Imports System
Imports System.Data
Imports System.Runtime.Serialization


Namespace SqlTypes

    <Serializable()> _
    <System.ComponentModel.DesignerCategory("Code")> _
    Public Class Paging
        Inherits SqlTableTypeBase

        Public Sub New()

            MyBase.New()

        End Sub


        Public Sub New(ByVal tableName As String)

            MyBase.New(tableName)

        End Sub


        Public Sub New(ByVal tableName As String, ByVal tableNamespace As String)

            MyBase.New(tableName, tableNamespace)

        End Sub


        Protected Sub New(ByVal info As SerializationInfo, ByVal context As StreamingContext)

            MyBase.New(info, context)

        End Sub


        'TODO: throw some more overloaded constructors in here...


        Public Sub New(pageNumber As Integer?, pageSize As Integer?)

            MyBase.New()

            Me.PageNumber = pageNumber
            Me.PageSize = pageSize

        End Sub


        Public Sub New(sortField As String, sortDirection As String)

            MyBase.New()

            Me.SortField = sortField
            Me.SortDirection = sortDirection

        End Sub


        Public Sub New(pageNumber As Integer?, pageSize As Integer?, sortField As String, sortDirection As String)

            Me.New(pageNumber, pageSize)

            Me.SortField = sortField
            Me.SortDirection = sortDirection

        End Sub


        Public Property PageNumber As Integer?
            Get
                Return CType(Me.Rows(0)(0), Integer?)
            End Get
            Set(value As Integer?)
                Me.Rows(0)(0) = value
            End Set
        End Property


        Public Property PageSize As Integer?
            Get
                Return CType(Me.Rows(0)(1), Integer?)
            End Get
            Set(value As Integer?)
                Me.Rows(0)(1) = value
            End Set
        End Property


        Public Property SortField As String
            Get
                Return CType(Me.Rows(0)(2), String)
            End Get
            Set(value As String)
                Me.Rows(0)(2) = value
            End Set
        End Property


        Public Property SortDirection As String
            Get
                Return CType(Me.Rows(0)(3), String)
            End Get
            Set(value As String)
                Me.Rows(0)(3) = value
            End Set
        End Property


        Protected Overrides Sub Initialize()

            Me.Columns.Add(New DataColumn("PageNumber", GetType(Integer?)))
            Me.Columns.Add(New DataColumn("PageSize", GetType(Integer?)))
            Me.Columns.Add(New DataColumn("SortField", GetType(String)))
            Me.Columns.Add(New DataColumn("SortDirection", GetType(String)))

            Me.Rows.Add({Nothing, Nothing, Nothing, Nothing})

        End Sub

    End Class

End Namespace

Instantiate the objects and set the values in the constructor, then simply get the parameter from the object, and append it to the stored procedure command object's parameter collection.

cmd.Parameters.Add(New DateRange(startDate, endDate).CreateParameter("DateRangeParams"))
cmd.Parameters.Add(New Paging(pageNumber, pageSize).CreateParameter("PagingParams"))

EDIT Since this answer revolves around the strong typing, I thought I should add an example of strong typing in the method signature:

'method signature with UDTs
Public Function GetMyReport(customParam1 as Integer, timeFrame as DateRange, pages as Paging) as IDataReader

'method signature without UDTs
Public Function GetMyReport(customParam1 as Integer, startDate as DateTime, endDate as DateTime, pageNumber as Integer, pageSize as Integer)

SQL evaluation of IF clauses

9 votes

Apologies if this is not suitable, but really this is a 'why' rather than a 'how'. Not sure that is suitable, but don't know a better place to ask and I can't think how to phrase a google to get what I am looking for.

    IF 'hell' = 'freezing over'
    BEGIN
    SELECT log(0)
    END

Look at that statement. There is no world in which the IF clause will be true. If I attempt to run it I am expecting SQL to jump past the IF clause and move to the end. Instead I get:

An invalid floating point operation occurred.

This is bizarre. So I guess that is just the way that SQL does it's thing. Except...

    IF 'hell' = 'freezing over'
    BEGIN
    SELECT 1/0
    END

There is no error here. The statement in the IF clause should still generate an error. Could anyone explain why this is not happening?

This came up whilst debuggging a massive set of SQL calcs where EXP(SUM(LOG())) is used to accumulate data within an if clause. I can alter the code to stop that happening again, but why is it evaluating something within an IF clause that is not met.

Cheers.

EDIT: Additional amusement. Try catch? Pffft

    IF 1=2
    BEGIN
        BEGIN TRY
            SELECT SQRT(-1)
        END TRY
        BEGIN CATCH
        END CATCH
    END

Non mathematical:

    IF 1=2
    BEGIN
    SELECT SUBSTRING('hello',-1,-1)
    END

My guess would be that log(0) is effectively evaluated prematurely due to constant-folding whereas 1/0 is not, either due to its cardinality estimation or more likely the fact that the ANSI_WARNINGS setting will affect the desired result of a divide by zero (overflow vs NULL).

SQL filtered indexes: should I always put a filter on an index for optional columns?

8 votes

For 'large' tables, is there any reason not to put a filter on indexes for optional columns?

So for an index on column AAA (because people can search on AAA),
I can set the filter to ([AAA] IS NOT NULL).
This saves storage, so it saves money.

Some more advantages from technet:

  • Improved query performance and plan quality
  • Reduced index maintenance costs
  • Reduced index storage costs

People say that it's good to put a filter on an index for columns that are mostly empty. But why wouldn't I put a filter on indexes for columns that are empty for like 1%? Is there any reason not to do it if it only has advantages?

This is usually a good idea with two gotchas:

  1. The table designer has a bug (only pre Denali!). When it rebuilds a table it deletes all filters.
  2. Be sure that the optimizer can tell statically that your predicate will never allow null rows to be returned. Usually, this is the case because of SQL NULL semantics (semmingly the only case where they help instead of hinder). Example: select distinct col from T will not use the index because a null value might be found. Use this: select distinct col from T where col is not null.

Filtered indexes are vastly underused. They can even be used to make a nullable column unique.

My practical recommendation: Just try it for a few month and learn for yourself if there are additional unforseen problems.

If you are into advanced SQL Server query techniques, also look ad indexed views. THey are a super set of filtered indexes (at least on Enterprise).

Change column types in a huge table

8 votes

I have a table in SQL Server 2008 R2 with close to a billion rows. I want to change the datatype of two columns from int to bigint. Two times ALTER TABLE zzz ALTER COLUMN yyy works, but it's very slow. How can I speed the process up? I was thinking to copy the data to another table, drop, create, copy back and switching to simple recovery mode or somehow doing it with a cursor a 1000 rows a time but I'm not sure if those will actually lead to any improvement.

Depending on what change you are making, sometimes it can be easier to take a maintenance window. During that window (where nobody should be able to change the data in the table) you can:

  1. drop any indexes/constraints pointing to the old column, and disable triggers
  2. add a new nullable column with the new data type (even if it is meant to be NOT NULL)
  3. update the new column setting it equal to the old column's value (and you can do this in chunks of individual transactions (say, affecting 10000 rows at a time using UPDATE TOP (10000) ... SET newcol = oldcol WHERE newcol IS NULL) and with CHECKPOINT to avoid overrunning your log)
  4. once the updates are all done, drop the old column
  5. rename the new column (and add a NOT NULL constraint if appropriate)
  6. rebuild indexes and update statistics

The key here is that it allows you to perform the update incrementally in step 3, which you can't do in a single ALTER TABLE command.

This assumes the column is not playing a major role in data integrity - if it is involved in a bunch of foreign key relationships, there are more steps.

EDIT

Also, and just wondering out loud, I haven't done any testing for this (but adding it to the list). I wonder if page + row compression would help here? If you change an INT to a BIGINT, with compression in place SQL Server should still treat all values as if they still fit in an INT. Again, I haven't tested if this would make an alter faster or slower, or how much longer it would take to add compression in the first place. Just throwing it out there.

Compare multiple columns, but only those having valid values, and create y/n flag if all are equal

7 votes

I want to create a Y/N flag, where Y indicates every valid value in every column in a given row is equal, and N otherwise. I need to exclude from consideration any column that contains nulls, blanks, or all zeroes. Suppose:

CREATE TABLE z_test
(ID INT NOT NULL,
D1 VARCHAR(8)NULL,
D2 VARCHAR(8)NULL,
D3 VARCHAR(8)NULL,
D4 VARCHAR(8)NULL,
DFLAG CHAR(1)NULL)

INSERT INTO z_test VALUES (1,NULL,' ','000000','00000000',NULL)
INSERT INTO z_test VALUES (1,'20120101','0000','20120101','00000000',NULL)
INSERT INTO z_test VALUES (2,'20100101','20100101','20100101','20100101',NULL)
INSERT INTO z_test VALUES (2,'00000000','20090101','0','20090101',NULL)
INSERT INTO z_test VALUES (3,'00000000','20090101',NULL,'20120101',NULL)
INSERT INTO z_test VALUES (3,'20100101',' ',NULL,'20100101',NULL)

The desired output (excluding D1 through D4, though I don't want to drop them) is:

ID       DFLAG
---------------
1        N
1        Y
2        Y
2        Y
3        N
3        Y

Speed is not a concern as this query will not be run very often but it is on a largish table.

Any pointers or suggestions would be very much appreciated!!

SELECT ID, 
       CASE 
         WHEN C = 1 THEN 'Y' 
         ELSE 'N' 
       END AS DFLAG 
FROM   z_test 
       CROSS APPLY (SELECT COUNT(DISTINCT D) C 
                    FROM   (VALUES(D1), 
                                  (D2), 
                                  (D3), 
                                  (D4)) V(D) 
                    WHERE  LEN(D) > 0 /*Excludes blanks and NULLs*/
                         AND D LIKE '%[^0]%'/*Excludes ones with only zero*/) CA 

Is it better to pass large inserts to SQL Server as a table valued parameter, or as a string insert statement?

5 votes

I am writing a .NET application that writes data to SQL Server 2008r2. I have two options for inserting the data, either I can create a large string insert statement, and send it as a text command, or I can collect the data in a .NET DataTable, and pass it as a table valued parameter. What are the benefits and costs of each method?

(I am omitting a good deal of code since I am just asking about the relative benefits, not the specific syntax)

e.g.:

Option 1:

    string insert = @"insert into MyTable (id, val) values
        ( 1, 'a'),(2,'b'),(3,'c'),(4,'d');"

Option 2:

    DataTable dt = new DataTable();
    dt.Columns.Add("id", typeof(int));
    dt.Columns.Add("val", typeof(string));
    ....
    create procedure uspMyProc 
                    @tt ttMyTableType readonly
                as
                begin
                    insert into TestTable1 (id, strValue)
                    select myId, myVal from @tt;
                end"

Thanks for any help.

Option 3: In the first instance I would populate the insert stored procedure with one insert statement's worth of parameters and call it multiple times in a loop from the C# code:

Option 4: If you truly have lots of rows to insert, perhaps you need to look into the SqlBulkCopy class. It consumes either DataTable, DataRow or an IDataReader. You can make an IDataReader from a list of objects using some custom code, a question of this ilk is asked here:

Get an IDataReader from a typed List


I would say it depends.

If you really want to pass many rows of parameters in tabular form, for whatever reason, use a table valued parameter - that's what it's there for.

I have seen Option 1 - some generic DAL code would script out a SQL "batch" of commands to run. It worked, but didn't give any defence against injection attacks. Parameterised SQL does.


All that said, I would favour calling the insert sproc once for each row to be inserted from code - the calls will be fully parameterised and performance is fine. If performance becomes a problem I would favour Option 4.

SQL Server Convert Timestamp DataType to Decimal

5 votes

Been Trying to figure this out and according to Cast and Convert Documentation on MSDN, this should be possible ( http://msdn.microsoft.com/en-us/library/ms187928.aspx )

Have a look at the Conversion table.

enter image description here

I am running the following Code:

CREATE TABLE TableName (
    ID bigint Identity(1,1),
    SomeValue nvarchar(20) not null,
    TimestampColumn timestamp not null)

Insert Into TableName (SomeValue)
values ('testing')

SELECT Convert(decimal, TimeStampColumn) from TableName

However I simply get the following error:

Error converting data type timestamp to numeric.

This should be possible according to the documentation or am I missing something? Note I will also need to convert back to the same timestamp value.

SELECT TimeStampColumn, Convert(timestamp, Convert(decimal, TimeStampColumn)) 
FROM TableName

In the end the above query should render the same value.

Try this; though MSDN says it's a implicit conversion but it actually doesn't work. So, what I am doing is converting it to INT and then to decimal (INT to decimal is implicit anyways)

select val, CAST((CONVERT(bigint, timestampcol)) as decimal) as 'TS as decimal' 
from teststmp

Remove dates contained within other dates?

5 votes

I have the following rows:

CREATE TABLE #TEMP (id int, name varchar(255), startdate datetime, enddate datetime)
INSERT INTO #TEMP VALUES(1, 'John', '2011-01-11 00:00:00.000','2011-01-11 00:01:10.000')
INSERT INTO #TEMP VALUES(2, 'John', '2011-01-11 00:00:20.000','2011-01-11 00:01:05.000')
INSERT INTO #TEMP VALUES(3, 'John', '2011-01-11 00:01:40.000','2011-01-11 00:01:50.000')
INSERT INTO #TEMP VALUES(4, 'Adam', '2011-01-11 00:00:40.000','2011-01-11 00:01:20.000')
INSERT INTO #TEMP VALUES(5, 'Adam', '2011-01-11 00:00:45.000','2011-01-11 00:01:15.000')

SELECT * FROM #TEMP

DROP TABLE #TEMP

I am trying to remove records that have dates contained within other dates to obtain the following:

John 2011-01-11 00:00:00.000 2011-01-11 00:01:10.000
John 2011-01-11 00:01:40.000 2011-01-11 00:01:50.000
Adam 2011-01-11 00:00:40.000 2011-01-11 00:01:20.000

Any suggestions on how to achieve this for a table of about 100K rows?

This gives the desired result:

DELETE T1 FROM #TEMP T1
WHERE EXISTS(
    SELECT NULL FROM #TEMP T2
    WHERE   t1.id <> t2.id
    AND     t1.name = t2.name
    AND     t1.startdate >= t1.startdate
    AND     t1.enddate   <= t1.enddate
)

http://msdn.microsoft.com/en-us/library/ms188336.aspx

Edit: I've just noticed that there's one problem. If there are duplicates (same start- and enddate), both would be deleted (none with John's approach, even with only one equal date). So you need to take this into account:

DELETE T1 FROM #TEMP T1
WHERE EXISTS(
    SELECT NULL FROM #TEMP T2
    WHERE   t1.id <> t2.id
    AND     t1.name = t2.name
    AND     t1.startdate > t2.startdate
    AND     t1.enddate   < t2.enddate
    OR      t1.id <> t2.id
    AND     t1.name = t2.name
    AND     t1.startdate = t2.startdate
    AND     t1.enddate   < t2.enddate
    OR      t1.id <> t2.id
    AND     t1.name = t2.name
    AND     t1.startdate > t2.startdate
    AND     t1.enddate   = t2.enddate
    OR      t1.id > t2.id
    AND     t1.name = t2.name
    AND     t1.startdate = t2.startdate
    AND     t1.enddate   = t2.enddate
)