Best sql-server questions in December 2010

Is assembly running in SQL Server or from a Windows app

10 votes

How can I determine if an assembly is running from a SQL Server CLR stored procedure or if it is running from a Windows app?

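Check out SqlContext.IsAvailable. It is true when the calling code is running inside SQL Server and false when it is running elsewhere, such as in a Windows app.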

How do you maintain multiple versions of Databases?

7 votes

We have many environments

trunk (dev integration) -> devel (team testing) -> qa (regression testing) -> live (customer use)

Each has its own database that works with the code in that environment.

Part of the contents of the database is metadata, and part is data. For example, if you are building a report, the columns you can choose from to build the report are metadata, but the reports that a user has already built are data. Metadata flows up the promotion chain (it is introduced in trunk and goes devel->qa->live), along with the code, as it gets tested. Data, however, does not get promoted. Moreover, data in a given environment must not get erased or corrupted, and it must continue to work after that environment has been promoted to.

What good strategies exist to manage and, equally important, automate such a setup?

We are using .NET/C#/SQL Server but I think this problem is generic and has to be dealt with across the board for any mature application that has a large number of developers working on it, and cares about the data users generate on it.

Another product you may wish to look at is Red Gate's SQL Source Control. We're still in the midst of evaluating it, but it certainly tries to handle at least some of your requirements. The rest of Red Gate's products may also come in handy for you.

Good C#.NET Solution to manage frequent database polling

7 votes

I am currently working on a C# .NET desktop application that will be communicating with a database over the internet via WCF and WCF Data Services. There will be many spots in the application that may need to be refreshed at some interval. The easiest solution would be to just put these areas on a timer and requery the database. However, with thousands of clients connecting to the service layer and hence the database, these operations would be very expensive for the server.

What I have considered is creating an RSS feed that is polled by the client, and lets the client know when these particular areas need to be updated. The RSS feed will be managed by a service that either polls the database for changes, or iterates through a list of items that are queued up by the WCF requests made by the client.

I have also considered creating some direct and continuous connection from the client to the server, but I am not sure what outbound firewall ports would be open from the client. I could probably only count on port 80/443.

So my question is what solutions have people had success implementing to solve this problem? Have people done RSS? Microsoft Sync Services? Two-way communication between client and server over some safe port via WCF?

Any ideas are greatly appreciated.

I think you might want to go with a combination of two approaches. First off, you could use long polling from the client to server so that the server can notify the client as soon as a change occurs that the client is interested in.

Secondly, based on the tags in this question it looks like you're using SQL Server. You could use database notifications on the tables you are interested in to have the DB notify your service when changes occur. This could then trigger the service to notify the client about the changes through the long poll connections. You can do this using the SqlDependency class.

I'm sure there are other ways, but this would probably scale quite well as you'd only have one service getting the notifications and then distributing them out to all the clients.
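
On the database side, SqlDependency relies on query notifications, which need Service Broker to be enabled and the watched query to be written in a restricted form. A minimal sketch of what to check, where YourDb and dbo.Orders are placeholder names that do not come from the question:

```sql
-- Check whether Service Broker is enabled for the current database.
SELECT name, is_broker_enabled
FROM sys.databases
WHERE name = DB_NAME();

-- Enabling it needs exclusive access to the database, so it is left commented out.
-- YourDb is a placeholder name.
-- ALTER DATABASE YourDb SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;

-- Shape of a query that can be registered for notification:
-- a two-part table name and an explicit column list (no SELECT *).
-- dbo.Orders and its columns are hypothetical.
-- SELECT OrderId, Status FROM dbo.Orders WHERE Status = 1;
```

The first statement is safe to run anywhere; the commented statements only show the form and would need adapting to the real schema.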

T-SQL Conditional WHERE Clause

7 votes

Hi Guys,

Found a couple of similar questions here on this, but couldn't figure out how to apply to my scenario.

My function has a parameter called @IncludeBelow. Values are 0 or 1 (BIT).

I have this query:

SELECT p.*
FROM Locations l
INNER JOIN Posts p
on l.LocationId = p.LocationId
WHERE l.Condition1 = @Value1
AND   l.SomeOtherCondition = @SomeOtherValue

If @IncludeBelow is 0, i need the query to be this:

SELECT p.*
FROM Locations l
INNER JOIN Posts p
on l.LocationId = p.LocationId
WHERE l.Condition1 = @Value1
AND   l.SomeOtherCondition = @SomeOtherValue
AND   p.LocationType = @LocationType -- additional filter to only include level.

If @IncludeBelow is 1, that last line needs to be excluded. (i.e don't apply filter).

I'm guessing it needs to be a CASE statement, but can't figure out the syntax.

Here's what i've tried:

SELECT p.*
FROM Locations l
INNER JOIN Posts p
on l.LocationId = p.LocationId
WHERE l.Condition1 = @Value1
AND   l.SomeOtherCondition = @SomeOtherValue
AND (CASE @IncludeBelow WHEN 0 THEN p.LocationTypeId = @LocationType ELSE 1 = 1)

Obviously that's not correct.

What's the correct syntax?

I changed the query to use EXISTS because if there's more than one location associated with a POST, there'd be duplicate POST records that'd require a DISTINCT or GROUP BY clause to get rid of...

The non-sargable version

This will perform the worst of the possible solutions:

SELECT p.*
  FROM POSTS p
 WHERE EXISTS(SELECT NULL
                FROM LOCATIONS l
               WHERE l.LocationId = p.LocationId
                 AND l.Condition1 = @Value1
                 AND l.SomeOtherCondition = @SomeOtherValue)
   AND (@IncludeBelow = 1 OR p.LocationTypeId = @LocationType)
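
To see how the OR predicate switches the filter on and off, here is a small sketch that leaves out the LOCATIONS check and uses a made-up table variable in place of POSTS (the sample rows are assumptions for illustration only):

```sql
-- Hypothetical sample data; the column names follow the question.
DECLARE @Posts TABLE (PostId INT, LocationId INT, LocationTypeId INT);
INSERT INTO @Posts VALUES (1, 10, 1);
INSERT INTO @Posts VALUES (2, 10, 2);
INSERT INTO @Posts VALUES (3, 10, 3);

DECLARE @LocationType INT, @IncludeBelow BIT;
SET @LocationType = 2;

-- @IncludeBelow = 0: the filter applies, so only PostId 2 comes back.
SET @IncludeBelow = 0;
SELECT p.PostId
  FROM @Posts p
 WHERE (@IncludeBelow = 1 OR p.LocationTypeId = @LocationType);

-- @IncludeBelow = 1: the left side of the OR is true for every row,
-- so the filter is effectively switched off and all three posts come back.
SET @IncludeBelow = 1;
SELECT p.PostId
  FROM @Posts p
 WHERE (@IncludeBelow = 1 OR p.LocationTypeId = @LocationType);
```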

The sargable, non-dynamic version

Self-explanatory:

BEGIN
  -- T-SQL's IF does not take a THEN keyword
  IF @IncludeBelow = 0
    -- Filter applied: only posts at the requested level
    SELECT p.*
      FROM POSTS p
     WHERE EXISTS(SELECT NULL
                    FROM LOCATIONS l
                   WHERE l.LocationId = p.LocationId
                     AND l.Condition1 = @Value1
                     AND l.SomeOtherCondition = @SomeOtherValue)
       AND p.LocationTypeId = @LocationType
  ELSE
    -- No filter: posts at every level
    SELECT p.*
      FROM POSTS p
     WHERE EXISTS(SELECT NULL
                    FROM LOCATIONS l
                   WHERE l.LocationId = p.LocationId
                     AND l.Condition1 = @Value1
                     AND l.SomeOtherCondition = @SomeOtherValue)
END

The sargable, dynamic version (SQL Server 2005+):

Love it or hate it, dynamic SQL lets you write the query once. Just be aware that sp_executesql takes parameters, so its cached query plan can be reused across parameter values, which a concatenated string passed to EXEC generally cannot do. I highly recommend reading The Curse and Blessings of Dynamic SQL before considering dynamic SQL on SQL Server...

DECLARE @SQL NVARCHAR(MAX) -- sp_executesql requires a Unicode string
    SET @SQL = N'SELECT p.*
                  FROM POSTS p
                 WHERE EXISTS(SELECT NULL
                                FROM LOCATIONS l
                               WHERE l.LocationId = p.LocationId
                                 AND l.Condition1 = @Value1
                                 AND l.SomeOtherCondition = @SomeOtherValue)'

    -- Append the extra filter to the query (rather than overwriting it)
    SET @SQL = @SQL + CASE 
                 WHEN @IncludeBelow = 0 THEN
                   N' AND p.LocationTypeId = @LocationType '
                 ELSE N''
               END   

BEGIN 

  EXEC sp_executesql @SQL, 
                     N'@Value1 INT, @SomeOtherValue VARCHAR(40), @LocationType INT',
                     @Value1, @SomeOtherValue, @LocationType

END

Sql Transaction - SQL Server or C#?

6 votes

Am I right in saying that from a performance perspective, sql transactions are far better within a stored procedure than code?

At the moment I use most of my transactions in stored procs but sometimes I use code for more complex routines - which obviously I keep to a minimum as much as possible.

It's just that there was a complex routine that required too many "variables" that writing the sql transaction in c# was far easier than using SQL Server. It's a fine line between code readability and performance.

Any ideas?

The performance varies; a SqlTransaction can have less overhead than a TransactionScope, especially if the TransactionScope decides it needs to get entangled with DTC. But I wouldn't expect a vast difference between SqlTransaction and a BEGIN TRAN, except for the extra round trip. However, TransactionScope is still fast, and is the most convenient option for encapsulating multiple operations in a transaction, as the ambient transaction does not need to be manually associated with the command each time.

Perhaps a better (and more significant) factor is the isolation level. TransactionScope defaults to the highest (serializable). Lower isolation levels allow for less blocking (but at the risk of non-repeatable reads, etc.). IIRC a T-SQL transaction defaults to one of the lower levels. But the isolation level can be tweaked for all 3 options.
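
For the T-SQL option, checking and setting the isolation level might look like the sketch below. The SELECT inside the transaction is only a placeholder for real work, and READ COMMITTED is chosen here purely as an example level:

```sql
-- Show the session's current settings
-- (look for the "isolation level" row in the output).
DBCC USEROPTIONS;

-- Choose a level explicitly for the work that follows.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRY
    BEGIN TRANSACTION;

    -- The statements that must succeed or fail together go here.
    SELECT 1 AS placeholder_work;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo everything if any statement in the TRY block failed.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    -- RAISERROR is used because THROW is not available before SQL Server 2012.
    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR('%s', 16, 1, @msg);
END CATCH;
```

Keeping the whole unit of work in one batch like this avoids the extra round trips mentioned above.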

Performing a bitwise sum

6 votes

As part of a complex aggregate I want to know the bitwise sum of some data, i.e. if I have rows with values 1,1,1,2,2,8 the bitwise sum is 11. In this case the values are all exact powers of two (single bits), so I can hack around it by grouping and summing over the groups (obviously this example is a bit tortured compared to the real query):

select SUM(y.test)
from (
    select x.test
    from ( -- garbage test data
        select 1 as [test]
        union all select 1
        union all select 1
        union all select 2
        union all select 2
        union all select 8) x
group by x.test) y

but is there a clean way to perform a bitwise sum in [T]SQL?

If all of your test values are single bits as in your example (1, 2, 8) - simply use SUM(DISTINCT col) in your query.

Hope that helps.

(For reference: http://msdn.microsoft.com/en-us/library/ms187810.aspx)
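
As a quick check against the test data from the question, a minimal sketch using a table variable:

```sql
-- Test data from the question: 1, 1, 1, 2, 2, 8.
DECLARE @x TABLE (test INT);
INSERT INTO @x
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 8;

-- Each distinct value is added once: 1 + 2 + 8 = 11.
SELECT SUM(DISTINCT test) AS bitwise_sum FROM @x;
```

This only matches a bitwise OR while every value is a single bit. With the values 1 and 3, for instance, the OR is 3 but SUM(DISTINCT) gives 4.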

Using SQL, how do I update rows, using their own values?

5 votes

I have the following table which I'll call 'example'

id name       last_name
01 Adam       Adams
02 Bill       Billo
03 Cathy      McCathyson

I need to modify the table and end up with the following:

id name
01 Adam Adams
02 Bill Billo
03 Cathy McCathyson

For a single row, I know how to write this query:

UPDATE example SET name = 
   (SELECT name FROM example WHERE id = 01)+" "
   +(SELECT last_name FROM example WHERE id = 01)
WHERE id = 01;

How do I modify this query such that it updates each row with that row's values, as in the example?

EDIT: I've updated my example since it confused the issue.

UPDATE example SET NAME = NAME + ' ' + last_name

    ID NAME      LAST_NAME

     1 Adam       Adams
     2 Bill       Billo
     3 Cathy      McCathyson

SQL> UPDATE example SET NAME = NAME + ' ' + last_name
  2  /

3 rows updated

SQL> select * from example
  2  /

        ID NAME                    LAST_NAME
---------- -----------------------------------------
         1 Adam Adams              Adams
         2 Bill Billo              Billo
         3 Cathy McCathyson        McCathyson
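
The same statement in T-SQL form, as a self-contained sketch with a table variable standing in for the example table (the column sizes are assumptions):

```sql
DECLARE @example TABLE (id INT, name VARCHAR(50), last_name VARCHAR(50));
INSERT INTO @example
SELECT 1, 'Adam',  'Adams' UNION ALL
SELECT 2, 'Bill',  'Billo' UNION ALL
SELECT 3, 'Cathy', 'McCathyson';

-- Each row's new name is built from that same row's current values;
-- no subquery or WHERE clause is needed.
UPDATE @example SET name = name + ' ' + last_name;

SELECT id, name FROM @example ORDER BY id;
```

Two things to watch for: the name column has to be wide enough for the combined text, and with the default CONCAT_NULL_YIELDS_NULL setting a NULL last_name makes the whole result NULL.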

SQL performance: Is there any performance hit using NVarchar(MAX) instead of NVarChar(200)

5 votes

Hi,

I am wondering if there is any disadvantage on defining a column of type nvarchar(max) instead of giving it a (smaller) maximum size.

I read somewhere that if the column value has more than 4 KB the remaining data will be added to an "overflow" area, which is ok.

I'm creating a table where most of the time the text will be of a few lines, but I was wondering if there's any advantage in setting a lower limit and then adding a validation to avoid breaking that limit.

Is there any restriction on the creation of indexes with nvarchar(max) column, or anything that pays for having to add the restriction on the size limit?

Thanks!

Strictly speaking the MAX types will always be a bit slower than the non-MAX types, see Performance comparison of varchar(max) vs. varchar(N). But this difference is never visible in practice, where it just becomes noise in the overall performance driven by IO.

Your main concern should not be the performance of MAX vs. non-MAX. You should be concerned with the question: is it possible that this column will have to store more than 8000 bytes? If the answer is yes, even if it is a very, very unlikely yes, then the answer is obvious: use a MAX type. The pain of converting this column to a MAX type later is not worth the minor performance benefit of non-MAX types.

Other concerns (possibility to index that column, unavailability of ONLINE index operations for tables with MAX columns) were already addressed by Denis' answer.

BTW, the information about the columns over 4KB having remaining data in an overflow area is wrong. The correct information is in Table and Index Organization:

ROW_OVERFLOW_DATA Allocation Unit

For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8 KB row size limit. When the size limitation is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.

So it is not columns over 4 KB, it is rows that don't fit in the free space on the page; and it is not the 'remaining' data, it is the entire column.
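
On the indexing part of the question, a short sketch of the restriction: a MAX column cannot be an index key column, although it can be an included column. The #Notes table and its columns are hypothetical:

```sql
-- Hypothetical table comparing a bounded column with a MAX column.
CREATE TABLE #Notes
(
    NoteId    INT IDENTITY(1, 1) PRIMARY KEY,
    ShortText NVARCHAR(200),
    LongText  NVARCHAR(MAX)
);

-- Works: NVARCHAR(200) is at most 400 bytes, within the index key size limit.
CREATE INDEX IX_Notes_ShortText ON #Notes (ShortText);

-- Works: a MAX column is allowed as an included (non-key) column.
CREATE INDEX IX_Notes_ShortText_Incl ON #Notes (ShortText) INCLUDE (LongText);

-- Fails: a MAX column is not allowed as a key column.
-- The statement runs through EXEC so that the error can be caught and displayed.
BEGIN TRY
    EXEC ('CREATE INDEX IX_Notes_LongText ON #Notes (LongText);');
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS why_it_failed;
END CATCH;

DROP TABLE #Notes;
```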

How to have a sql_variant parameter for a SQL CLR stored procedure?

5 votes

How can one add a sql_variant parameter to a SQL CLR stored procedure? Using System.Object does not work, and I don't see any attributes that I can use.

[Microsoft.SqlServer.Server.SqlProcedure]
public static void ClearOnePartition(
    SqlString aString
    , /* I want this to be a sql_variant */ object aVariant
)
{
    //do stuff here
}

In Mapping CLR Parameter Data from SQL Books Online, Object is listed as the correct type to use to map sql_variant.

I created a simple SQL Server project and added the following class to it:

public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void StoredProcedure1(object param1)
    {
        // Put your code here
        //Trace.Write(param1);
        SqlContext.Pipe.Send(param1.ToString());

    }
};

I then modified the test.sql file to exercise this stored proc:

DECLARE @thing sql_variant = 'hahahahaha';

EXEC dbo.StoredProcedure1 @thing

This runs as expected and produces the following output:

hahahahaha

No rows affected.

(0 row(s) returned)

Finished running sp_executesql.
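
A sql_variant carries its base type along with the value. From T-SQL this can be inspected with SQL_VARIANT_PROPERTY, which may help when deciding how to handle the object parameter; a small sketch:

```sql
DECLARE @thing sql_variant;

SET @thing = 'hahahahaha';
SELECT SQL_VARIANT_PROPERTY(@thing, 'BaseType') AS base_type;  -- varchar

SET @thing = CAST(42 AS INT);
SELECT SQL_VARIANT_PROPERTY(@thing, 'BaseType') AS base_type;  -- int
```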

select statement performance degradation when using DISTINCT with parameters

5 votes

Note for bounty - START:

PARAMETERS SNIFFING (that is the only "idea" that was reported in pre-bounty questions) is not the issue here, as you can read in the "update" section at the end of the question. The problem is really related to how sql server creates execution plans for a parametrized query when distinct is used. I uploaded a very simple database backup (it works with sql server 2008 R2) here (you must wait 20 seconds before downloading). Against this DB you can try to run the following queries:

-- PARAMETRIZED QUERY

declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50

SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = @IS_ADMINISTRATOR OR  ROL.USER_ID = @USER_ID

-- NON PARAMETRIZED QUERY

SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!! 
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = 1 OR  ROL.USER_ID = 50

Final note: I noticed DISTINCT is the problem; my goal is to achieve the same speed (or at least almost the same speed) in both queries.

Note for bounty - END:


Original question:

I noticed that there is a big difference in performance between

-- Case A
select distinct * from table where id > 1

compared to (this is the sql generated by my Delphi application)

-- Case B1
exec sp_executesql N'select distinct * from table where id > @P1',N'@P1 int',1

that is equivalent to

-- Case B2
declare @P1 int
set @P1 = 1
select distinct * from table where id > @P1

A performs much faster than B1 and B2. The performance becomes the same in case I remove DISTINCT.

Can you comment on this?

Here I posted a trivial query; I noticed this on a query with 3 INNER JOINs. Anyway, it is not a complex query.

Note: I was expecting to have THE EXACT SAME PERFORMANCE, in cases A and B1/B2.

So are there some caveats in using DISTINCT?

UPDATE:

I tried to disable parameter sniffing using DBCC TRACEON (4136, -1) (the flag to disable parameter sniffing) but nothing changes. So in this case the problem is NOT LINKED TO PARAMETERS SNIFFING. Any idea?

The problem isn't that DISTINCT is causing a performance degradation with parameters, it's that the rest of the query isn't being optimized away in the parameterized query because the optimizer won't just optimize away all of the joins using 1=@IS_ADMINISTRATOR like it will with just 1=1. It won't optimize the joins away without distinct because it needs to return duplicates based on the result of the joins.

Why? Because the execution plan tossing out all of the joins would be invalid for any value other than @IS_ADMINISTRATOR = 1. It will never generate that plan regardless of whether you are caching plans or not.

This performs as well as the non parameterized query on my 2008 server:

-- PARAMETRIZED QUERY

declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50

IF 1 = @IS_ADMINISTRATOR 
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = 1
END
ELSE 
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  ROL.USER_ID = @USER_ID
END

What's clear from the query plan I see running your example is that @IS_ADMINISTRATOR = 1 does not get optimized out the same as 1=1. In your non-parameterized example, the JOINS are completely optimized out, and it just returns every id in the DOCUMENTS table (very simple).

There are also different optimizations missing when @IS_ADMINISTRATOR <> 1. For instance, the LEFT OUTER JOINs are automatically changed to INNER JOINs without that OR clause, but they are left as-is with that OR clause.

See also this answer: SQL LIKE % FOR INTEGERS for a dynamic SQL alternative.

Of course, this doesn't really explain the performance difference in your original question, since you don't have the OR in there. I assume that was an oversight.
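
A rough sketch of the dynamic SQL alternative: the WHERE clause is added only for non-administrators, so the administrator query has nothing for the optimizer to work around. The temporary tables and their rows are stand-ins for the question's DOCUMENTS, FOLDERS and ROLES tables, not the real schema:

```sql
-- Hypothetical stand-ins for the question's tables.
CREATE TABLE #DOCUMENTS (DOCUMENT_ID INT, FOLDER_ID INT);
CREATE TABLE #FOLDERS   (FOLDER_ID INT);
CREATE TABLE #ROLES     (FOLDER_ID INT, [USER_ID] INT);

INSERT INTO #FOLDERS   SELECT 1 UNION ALL SELECT 2;
INSERT INTO #DOCUMENTS SELECT 100, 1 UNION ALL SELECT 200, 2;
INSERT INTO #ROLES     SELECT 1, 50;

DECLARE @IS_ADMINISTRATOR INT, @USER_ID INT, @SQL NVARCHAR(MAX);
SET @IS_ADMINISTRATOR = 0; -- 1 for administrator, 0 for normal
SET @USER_ID = 50;

SET @SQL = N'SELECT DISTINCT DOC.DOCUMENT_ID
FROM #DOCUMENTS DOC
LEFT OUTER JOIN #FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID
LEFT OUTER JOIN #ROLES ROL ON FOL.FOLDER_ID = ROL.FOLDER_ID';

-- Only non-administrators get the user filter; administrators get no WHERE clause.
IF @IS_ADMINISTRATOR <> 1
    SET @SQL = @SQL + N'
WHERE ROL.[USER_ID] = @USER_ID';

-- The value is still passed as a parameter, never concatenated into the string.
EXEC sp_executesql @SQL, N'@USER_ID INT', @USER_ID;

DROP TABLE #DOCUMENTS;
DROP TABLE #FOLDERS;
DROP TABLE #ROLES;
```

With the sample rows above and @IS_ADMINISTRATOR = 0, only document 100 is returned; setting it to 1 returns both documents.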

executing a test in sql server 2005

5 votes

When I execute the following ...

EXEC 'DROP TABLE bkp_anish_test'

('DROP TABLE bkp_anish_test' is a dynamically built SQL query)

I get the following error:

Could not find stored procedure 'DROP TABLE bkp_anish_test'.

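You don't need to use EXEC to run the SQL statement. Without parentheses, EXEC treats the string as the name of a stored procedure, which is why you get that error. In the query editor, just run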

DROP TABLE bkp_anish_test

If the table is in the xyz database, try this:

EXEC ('USE xyz ; DROP TABLE bkp_anish_test;');
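
When the statement really is built dynamically, as in the question, the string can be held in a variable and run with the parenthesised form of EXEC. A sketch that creates a throwaway table first so the DROP has something to act on (the dbo schema and the variable names are assumptions):

```sql
-- Throwaway table so that the DROP has something to act on.
CREATE TABLE dbo.bkp_anish_test (id INT);

DECLARE @table SYSNAME, @sql NVARCHAR(400);
SET @table = N'bkp_anish_test';

-- QUOTENAME brackets the name so that it is treated purely as an identifier.
SET @sql = N'DROP TABLE dbo.' + QUOTENAME(@table);

-- Parentheses make EXEC run the string as a batch.
-- Without them, EXEC @sql would look for a stored procedure
-- whose name is the content of @sql.
EXEC (@sql);
```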

Hi

Can anyone brief me how to do Connection Pooling in ADO.Net, I do need to connect to 3 separate databases. 2 of them are in same server and the other in a separate one.

Better with code snippets..

As long as you are strict about disposing your connections, the default (for sql-server at least) is that it will just work automatically. In your example you could well have only 3 underlying connections (one per connection string).

But always ensure your connections are disposed, ideally with using:

using(var conn = new SqlConnection(connectionString)) {
    // use conn
}

then it is released back to the pool (for re-use when the same connection-string is seen next) even when an exception is thrown.

To disable pooling (if you choose), include Pooling=false; in the connection-string.

sql group by only rows which are in sequence

4 votes

Say I have the following table:

MyTable
---------
| 1 | A |
| 2 | A |
| 3 | A |
| 4 | B |
| 5 | B |
| 6 | B |
| 7 | A |
| 8 | A |
---------

I need the sql query to output the following:

---------
| 3 | A |
| 3 | B |
| 2 | A |
---------

Basically I'm doing a group by but only for rows which are together in the sequence. Any ideas?

Note that the database is on sql server 2008. There is a post on this topic however it uses oracle's lag() function.

This is known as the "islands" problem. Using Itzik Ben Gan's approach:

;WITH YourTable AS
(
SELECT 1 AS N, 'A' AS C UNION ALL
SELECT 2 AS N, 'A' AS C UNION ALL
SELECT 3 AS N, 'A' AS C UNION ALL
SELECT 4 AS N, 'B' AS C UNION ALL
SELECT 5 AS N, 'B' AS C UNION ALL
SELECT 6 AS N, 'B' AS C UNION ALL
SELECT 7 AS N, 'A' AS C UNION ALL
SELECT 8 AS N, 'A' AS C
),
T AS
(
SELECT N,
       C,
       ROW_NUMBER() OVER (ORDER BY N) - ROW_NUMBER() OVER (PARTITION BY C ORDER BY N) AS Grp
FROM YourTable
)
SELECT COUNT(*), C FROM T
GROUP BY C,Grp
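
To see why subtracting the two row numbers works, it can help to look at the intermediate values. A sketch over the same data, with a table variable in place of the CTE:

```sql
DECLARE @T TABLE (N INT, C CHAR(1));
INSERT INTO @T
SELECT 1, 'A' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 3, 'A' UNION ALL
SELECT 4, 'B' UNION ALL
SELECT 5, 'B' UNION ALL
SELECT 6, 'B' UNION ALL
SELECT 7, 'A' UNION ALL
SELECT 8, 'A';

SELECT N,
       C,
       ROW_NUMBER() OVER (ORDER BY N)                AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY C ORDER BY N) AS rn_in_c,
       ROW_NUMBER() OVER (ORDER BY N)
         - ROW_NUMBER() OVER (PARTITION BY C ORDER BY N) AS Grp
FROM @T
ORDER BY N;
-- N  C  rn_all  rn_in_c  Grp
-- 1  A  1       1        0
-- 2  A  2       2        0
-- 3  A  3       3        0
-- 4  B  4       1        3
-- 5  B  5       2        3
-- 6  B  6       3        3
-- 7  A  7       4        3
-- 8  A  8       5        3
```

Within an unbroken run both counters advance together, so their difference stays constant. Grp on its own is not unique (the B run and the second A run both get 3), which is why the final query groups by C and Grp together. Adding MIN(N) to the select list and ordering by it returns the islands in sequence order.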

In SQL Server 2008 R2 script data missing on Script Wizard

4 votes

In my SQL Server 2008 R2, on the Script Options screen of the Script Wizard, under the Table/View Options section, I look for the Script Data row so that I can set the option to True, but I can't find it.

I don't see any Script Data option. Why is the Script Data option missing on my SQL Server 2008 R2? Is there any command to generate the insert statements for the data?

Why does this problem arise, and how can I solve it? Thanks in advance.

Also, if you have a lot of data in the tables, you can use the compression option to compress the data, which reduces the size of the scripts and speeds up their execution.

Unique Constraint in Entity Framework Code First

4 votes

Question

Is it possible to define a unique constraint on a property using either the fluent syntax or an attribute? If not, what are the workarounds?

I have a user class with a primary key, but I would like to make sure the email address is also unique. Is this possible without editing the database directly?

Solution (based on Matt's answer)

public class MyContext : DbContext {
    public DbSet<User> Users { get; set; }

    public override int SaveChanges() {
        foreach (var item in ChangeTracker.Entries<IModel>())
            item.Entity.Modified = DateTime.Now;

        return base.SaveChanges();
    }

    public class Initializer : IDatabaseInitializer<MyContext> {
        public void InitializeDatabase(MyContext context) {
            if (context.Database.Exists() && !context.Database.CompatibleWithModel(false))
                context.Database.Delete();

            if (!context.Database.Exists()) {
                context.Database.Create();
                context.Database.SqlCommand("alter table Users add constraint UniqueUserEmail unique (Email)");
            }
        }
    }
}

As far as I can tell, there's no way to do this with Entity Framework at the moment. However, this isn't just a problem with unique constraints... you may want to create indexes, check constraints, and possibly triggers and other constructs too. Here's a simple pattern you can use with your code-first setup, though admittedly it's not database agnostic:

public class MyRepository : DbContext {
    public DbSet<Whatever> Whatevers { get; set; }

    public class Initializer : IDatabaseInitializer<MyRepository> {
        public void InitializeDatabase(MyRepository context) {
            if (!context.Database.Exists() || !context.Database.ModelMatchesDatabase()) {
                context.Database.DeleteIfExists();
                context.Database.Create();

                context.ObjectContext.ExecuteStoreCommand("CREATE UNIQUE CONSTRAINT...");
                context.ObjectContext.ExecuteStoreCommand("CREATE INDEX...");
                context.ObjectContext.ExecuteStoreCommand("ETC...");
            }
        }
    }
}

Another option is if your domain model is the only method of inserting/updating data in your database, you could implement the uniqueness requirement yourself and leave the database out of it. This is a more portable solution and forces you to be clear about your business rules in your code, but leaves your database open to invalid data getting back-doored.
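
For reference, the "CREATE UNIQUE CONSTRAINT..." placeholder above stands for ordinary DDL; in T-SQL a unique constraint is added with ALTER TABLE, as in the question's solution. A sketch of what the database then enforces, using a temporary table as a stand-in for Users (the column definitions are assumptions):

```sql
CREATE TABLE #Users
(
    Id    INT IDENTITY(1, 1) PRIMARY KEY,
    Email NVARCHAR(256) NOT NULL
);

-- Same idea as the statement in the question's solution. The constraint is left
-- unnamed here because named constraints on temporary tables can clash between sessions.
ALTER TABLE #Users ADD UNIQUE (Email);

INSERT INTO #Users (Email) VALUES (N'a@example.com');

BEGIN TRY
    -- The second insert of the same address is rejected by the constraint.
    INSERT INTO #Users (Email) VALUES (N'a@example.com');
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH;

DROP TABLE #Users;
```

Because the check lives in the database, it also covers writes that bypass the domain model, which is the gap the last paragraph above warns about.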

SQL max row of a table

4 votes

Which one is better in SQL Server 2008, or does it not matter because the database optimizes them?

SELECT * FROM table
WHERE datecolumn = (SELECT max(datecolumn) FROM table);


SELECT TOP 1 * FROM table ORDER BY datecolumn DESC;

Looking at the actual execution plans for these queries, they result in exactly the same plan and the same execution cost. However, this might differ depending on the index setup and the collected statistics, so it is best to check for yourself.

One thing to be aware of is that the two queries might not return the same result. The second query will always return one record, whereas the first query might return multiple rows if datecolumn is not unique.
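
The difference shows up as soon as two rows share the latest date. A small sketch with made-up rows, which also shows TOP 1 WITH TIES as a way to make the second form return all tied rows:

```sql
DECLARE @t TABLE (id INT, datecolumn DATETIME);
INSERT INTO @t
SELECT 1, '20101201' UNION ALL
SELECT 2, '20101231' UNION ALL
SELECT 3, '20101231';

-- Returns both rows dated 2010-12-31 (ids 2 and 3).
SELECT * FROM @t
WHERE datecolumn = (SELECT MAX(datecolumn) FROM @t);

-- Returns exactly one of the two tied rows; which one is not guaranteed.
SELECT TOP 1 * FROM @t ORDER BY datecolumn DESC;

-- WITH TIES makes TOP return every row that ties for first place (ids 2 and 3).
SELECT TOP 1 WITH TIES * FROM @t ORDER BY datecolumn DESC;
```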

How to script indexes, keys, foreign keys in SQL Server

4 votes

Hi,

I would like to get the details of all indexes, keys, and foreign keys from a database in SQL Server (2008). How do I do this?

I plan to use this to synchronize those properties across a couple of somewhat similar databases.

I can use SQL Server Management Studio, but I cannot do a full backup of a database because of restrictions set by the web hoster.

--

Secondary question that you do not need to answer:

Why can't there be something similar to the database schema in Mysql that simply lists all of the database structure in text SQL script format?

Thanks,

Assuming you are using at least SQL Server 2005, you can use the Database Publishing Wizard to script your schema.

This can be used to generate scripts for schema only, data only, or both.

It integrates directly into Visual Studio 2005 and/or Visual Web Developer 2005.

If you are using VS2008, version 1.2 of the SQL Publishing Wizard comes pre-installed. You can check out here where to invoke it from.
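
If only the details are needed rather than generated scripts, the catalog views can be queried directly. This is a different technique from the wizard, sketched here under the assumption that you have permission to read metadata in the database:

```sql
-- Indexes, including those that back primary keys and unique constraints.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS table_schema,
       OBJECT_NAME(i.object_id)        AS table_name,
       i.name                          AS index_name,
       i.type_desc,
       i.is_unique,
       i.is_primary_key,
       i.is_unique_constraint,
       c.name                          AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes i
JOIN sys.index_columns ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_schema, table_name, index_name, ic.key_ordinal;

-- Foreign keys with their column pairs.
SELECT fk.name                              AS foreign_key_name,
       OBJECT_NAME(fk.parent_object_id)     AS child_table,
       pc.name                              AS child_column,
       OBJECT_NAME(fk.referenced_object_id) AS parent_table,
       rc.name                              AS parent_column
FROM sys.foreign_keys fk
JOIN sys.foreign_key_columns fkc
  ON fkc.constraint_object_id = fk.object_id
JOIN sys.columns pc
  ON pc.object_id = fkc.parent_object_id AND pc.column_id = fkc.parent_column_id
JOIN sys.columns rc
  ON rc.object_id = fkc.referenced_object_id AND rc.column_id = fkc.referenced_column_id
ORDER BY foreign_key_name, fkc.constraint_column_id;
```

Running the same two queries against each database and comparing the output is one way to spot differences before synchronizing.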