Best sql-server questions in January 2012

Why are the performances of these 2 queries so different?

8 votes

I have a stored proc that searches for products (250,000 rows) using a full text index.

The stored proc takes a parameter that is the full text search condition. This parameter can be null, so I added a null check and the query suddenly started running orders of magnitude slower.

-- This is normally a parameter of my stored proc
DECLARE @Filter VARCHAR(100)
SET @Filter = 'FORMSOF(INFLECTIONAL, robe)'

-- #1 - Runs < 1 sec
SELECT TOP 100 ID FROM dbo.Products
WHERE CONTAINS(Name, @Filter)

-- #2 - Runs in 18 secs
SELECT TOP 100 ID FROM dbo.Products
WHERE @Filter IS NULL OR CONTAINS(Name, @Filter)

Here are the execution plans:

Query #1 Execution plant #1

Query #2 Execution plant #2

I must admit that I am not very familiar with execution plans. The only obvious difference to me is that the joins are different. I would try adding a hint but having no join in my query I am not sure how to do that.

I also do not quite understand why the index called IX_SectionID is used, since it is an index that only contains the column SectionID and that column is not used anywhere.

OR can crush performance, so do it this way:

DECLARE @Filter VARCHAR(100)
SET @Filter = 'FORMSOF(INFLECTIONAL, robe)'

IF @Filter IS NOT NULL
BEGIN
    SELECT TOP 100 ID FROM dbo.Products
    WHERE CONTAINS(Name, @Filter)
END
ELSE
BEGIN
    SELECT TOP 100 ID FROM dbo.Products
END

Look at this article: Dynamic Search Conditions in T-SQL by Erland Sommarskog and this question: SQL Server 2008 - Conditional Query.

Should Join Tables typically be created as Index Organized Tables (Clustered Indexes)?

6 votes

Generally speaking ... should join tables (i.e. associative tables) be created as Index Organized Tables (Oracle) , Clustered Indexes (SQL Server) .... or plain old heap tables (with separate indexes on the 2 columns).

The way I see if, the advantages are:

Speed improvement. You're avoiding a heap table look up.

Space Improvement. You're eliminating the heap table altogether, so you're probably saving ~30% space.

The disadvantages:

Index Skip Scan (only applies to Oracle) .. will be faster then a Full Table Scan, but slower then an Index Scan. So searches on the second column of the compound key will be slightly slower (Oracle), much slower (MSSQL).

A Full Index Scan will be slower then a Full Table Scan - so if most of the time the Cost Based Optimizer is doing Hash Joins (which don't take advantage of Indexes) ... you could expect worse performance. (Assuming that the RDBMS doesn't first filter the tables).

Which makes me question whether any type of indexes are really required for Join Tables, if you're predominately going to be doing Hash Joins.

My personal rule-of-thumb is to create two-table associative entities as index-organized-tables, with the primary key constraint being the access "direction" I expect to be more commonly used. I'll then generally add a unique index to cover reverse order of the keys, so in all cases the optimizer should be able to use unique-scan or range-scan access.

Three-table (or more) associative entities generally require significantly more analysis.

Also, the optimizer will use indexes with hash join operations; generally fast full scans, but indexes nonetheless.

Selecting specific records in a SQL Server database from C#

6 votes

I am currently trying to grab some rows from a SQL Server database using C# that are of the following criteria:

  • From the RamResults database
  • in the Results table
  • where the Date column is equal to the current date

I have the following so far:

// Open the same connection with the same connection string.
using (SqlCeConnection con = new SqlCeConnection(DatabaseControl.conString))
{
   con.Open();
   // Read specific values in the table.
   using (SqlCeCommand com = new SqlCeCommand("SELECT Result FROM RamResults WHERE Date == @Form1.date", con))
   {
      SqlCeDataReader reader = com.ExecuteReader();
      while (reader.Read())
      {
         int resultsoutput = reader.GetInt32(0);
         MessageBox.Show(resultsoutput.ToString());
      }
   }
}

Using SELECT Result FROM RamResults WHERE Date == Form1.date throws an error:

There was an error parsing the query. [ Token line number = 1,Token line offset = 43,Token in error = = ]

Although if I take out the WHERE statement e.g.

SELECT Result FROM RamResults

it works perfectly

Description

2 things

  1. Use = instead of == because this is the right equals operator in T-SQL. Your Query should be like this

    SELECT Result FROM RamResults WHERE Date = @Date

  2. You forget to pass in the parameter.

Sample

// Open the same connection with the same connection string.
using (SqlCeConnection con = new SqlCeConnection(DatabaseControl.conString))
{
    con.Open();
    // Read specific values in the table.
    using (SqlCeCommand com = new SqlCeCommand("SELECT Result FROM RamResults WHERE Date = @Date", con))
    {
        com.Parameters.AddWithValue("@Date", Form1.date);
        SqlCeDataReader reader = com.ExecuteReader();
        while (reader.Read())
        {
            int resultsoutput = reader.GetInt32(0);
            MessageBox.Show(resultsoutput.ToString());
        }
    }
}

NOLOCK with Multithreading

5 votes

I am working on a multithreaded application (C#), and 2 threads are updating a table using NOLOCK at the same time, is that a problem? To be more specific, they are both updating the same records.

The answer is "it depends".

NOLOCK allows 'dirty reads'. This means that within a transaction, you may see uncommitted data from another transaction. If you have multiple threads updating the same row in the same table, you may see the modified value of data that the other thread touched, before that thread commits it's transaction.

For example, take the table account_balances, defined as (account_num int, balance decimal(12,2)). Let's assume the following happens:

// precondition, account #1 has a balance of 10.00

  1. Thread #1 starts a transaction, decrements account #1 by 10
  2. Thread #2 starts a transaction, attempts to read the balance of account #1. It reads a balance of 0.
  3. Thread #2 decrements the account by $5, and issues an overdraft to the customer (their balance is -5)
  4. Thread #1 rolls back it's transaction
  5. Thread #2 commits it's transaction

// The account balance is now -5, even though it should be 5.

What you won't see is some form of inconsistent data within a field- the nolock hint isn't like running multi-threaded code without a lock- individual writes are still atomic.

How should I perform automated testing of SQL ETL Scripts & SPROCs?

5 votes

We have a number of DTSX Packages that Extract-Transform-Load data from a number of legacy Pervasive Databases -> SQL Server 2k8 R2. (via ODBC)

The majority of our ETL process T-SQL scripts

The legacy data is a badly structured mess & most often, the tweaks & changes we make to our ETL scripts are to deal with this malformed data.

What we'd like to do is put some process in place to automate testing of these SPROCS / Scripts that perform these transformations.

What we've come up with ourselves is to:

  • Have some known datasource input (hosted in a local pervasive database)
  • Run the DTSX Packages to generate the output
  • Have some expected & pre-created output somewhere else within SQL Server.
  • Write our own verification scripts to compare the expected with the actual.

Just wondering if there are any other processes/systems/db-testing applications we should be considering also.

Have a look at tSQLt - the Database Unit Testing Framework for SQL Server. It makes writing robust tests fairly easy.

It does not (yet) allow for testing of scripts, but you could wrap those into stored procedures.

How do I tell Entity Framework Function Import that a column returned by a stored procedure is not nullable?

5 votes

I have an SQL Server stored procedure that ressembles this:

CREATE PROCEDURE [jp].[GetFoo]
    @Guid UNIQUEIDENTIFIER
AS

SELECT
    CONVERT(BIT, (CASE WHEN [dbo].[GetBar](T.Col2) = 3 THEN 1 ELSE 0 END)) IsGetBarCol2EqualToThree
FROM 
    [dbo].[MyTable] T
WHERE
    T.Col1 = @Guid

When I do Function Import / Get Column Information in EF, the inferred type of the column IsGetBarCol2EqualToThree is Nullable<bool>. But there is no way this field is going to be null, so I'd like it to be just bool. Is there a way to do this that would be persistent upon updating (ie that does not rely on modifying any generated code)?

The SQL Server version is 2005, I'm using Visual Studio 2010SP1 with EF 4, project is compiled against .net 4.0.

what happens if you do this mod. isnull([dbo].[GetBar](T.Col2), 0)?

put as answer :P

delete top 1000 rows from a table sql server 2008

5 votes

I have a table in SQL I would like to delete 1000 rows from it I tried this, But I deleted all table

delete from [mytab] 
select top 1000 
a1,a2,a3
from [mytab]

Why not

delete top (1000) from [mytab];

C3p0 - APPARENT DEADLOCK on MSSQL, but not PostgreSQL or MySQL

5 votes

We are getting exceptions like this

com.mchange.v2.async.ThreadPoolAsynchronousRunner$DeadlockDetector@5b7a7896 -- APPARENT DEADLOCK!!! Complete Status: 
Managed Threads: 3
Active Threads: 3
Active Tasks: 
    com.mchange.v2.c3p0.stmt.GooGooStatementCache$1StatementCloseTask@55bc5e2a (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#1)
    com.mchange.v2.c3p0.stmt.GooGooStatementCache$1StatementCloseTask@41ca435f (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2)
    com.mchange.v2.c3p0.stmt.GooGooStatementCache$1StatementCloseTask@460d33b7 (com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#0)
Pending Tasks: 

when load testing our application on MSSQL 2008 R2 (jTDS or official MS JDBC doesn't matter). We never get this exception when running the same tests against PostgreSQL or MySQL.

We don't just want to increase the number of helper threads for c3p0 (which solves the problem, but how long?). We want to know what's the problem as it is workings with other DBMS'.

The applications behaves like:

  • Send X requests
  • Wait for a while -> DEADLOCK
  • Send X requests
  • Wait for a while -> DEADLOCK

Does anyone know or has an idea why we have this behavior with MSSQL?

Thanks, Adrian

(Btw. BoneCP works without any problem too.)

SQL Server has a much more restrictive locking strategy compared to PostgreSQL or InnoDB.

Especially it will block SELECTs on rows (tables?) that are updated from a different connection/transaction (in the default installation).

You should make sure that you are not selecting the same rows in one session that are being updated from another.

If you can't change the sequence of your code, you might get away with using "dirty reads" in SQL Server.

If I remember that correctly, this is accomplished by adding WITH NOLOCK to the SELECT statements (but I'm not entirely sure)

Edit
A different possibility (if you are on SQL Server 2005 or later) would be to use the new "snapshot isolation" to avoid blocking selects.

Substring of a DateTime with Linq to Sql extensions

5 votes

I'm trying to get a list of dates from my table, which contains a number of DateTime values in a column called StartTime. My predecessor was using the following SQL:

SELECT DISTINCT SUBSTRING(CONVERT(VARCHAR(50), StartTime, 120), 1, 10)

This results in a distinct list of dates in "yyyy-MM-dd" format for each row in the table. I'm trying to convert this to Linq-to-SQL by doing the following:

query.Select(o => o.StartTime.ToString("yyyy-MM-dd")).Distinct()

However this results in an error "Method 'System.String ToString(System.String)' has no supported translation to SQL."

How can I do this Substring/Convert using Linq-to-SQL?

Thanks!

Instead of relying on string processing, you can handle this via DateTime properties supported in LINQ to SQL:

var results = query.Select(o => o.StartTime.Date).Distinct();

If you want to view this as a string, later, you can use LINQ to Objects to convert the results:

var stringResults = results.AsEnumerable().Select(d => d.ToString("yyyy-MM-dd"));

Calling user defined functions with Create Table

5 votes

I have the following function

CREATE FUNCTION GetIdentity() RETURNS INT AS
BEGIN
  RETURN (IDENT_CURRENT('tblTempPo'))
END
GO

I need to call it with create table

create table tblTempPo
(
ID int null,
 BrickVolume AS
              (
              GetIdentity() 
              )
)

I'm getting the error

'GetIdentity' is not a recognized built-in function name.

How can I solve this?

You need to add dbo (or whatever the schema name is) to properly call the function:

create table tblTempPo
(
    ID int null,
    BrickVolume AS(dbo.GetIdentity())
)

Although, for your example to work, you'd want to do something like this:

CREATE TABLE tblTempPo
(
    ID INT IDENTITY(1,1) NOT NULL,
    AnotherField VARCHAR(10),
    BrickVolume AS (dbo.GetIdentity())
)
GO

INSERT INTO tblTempPo VALUES('a')
INSERT INTO tblTempPo VALUES('b')
INSERT INTO tblTempPo VALUES('c')
SELECT * FROM tblTempPo

The SELECT statement will yield the results:

ID     AnotherField     BrickVolume
-----------------------------------
 1          a               3
 2          b               3
 3          c               3

Can't add index to persisted computed column because it is "of a type that is invalid for use as a key"

5 votes

I have a table like this with a computed column:

CREATE TABLE PhoneNumbers
(
    [PhoneNumberID] int identity(1,1) not null primary key clustered,
    [Number] varchar(20), /* Entire number, like (800) 555-5000 */
    [Digits] AS dbo.RegExReplace(Number, '[^0-9]', '') PERSISTED  /* Like 8005555000 */
)

It's created fine, and the Digits column works great as expected, BUT it doesn't seem to behave as a "PERSISTED" column. When I do a query with Digits in the WHERE clause it's VERY slow. When I try to add an index to the Digits column I get: Column 'Digits' in table 'PhoneNumbers' is of a type that is invalid for use as a key column in an index.

It seems like that column isn't really being treated as PERSISTED and is being recomputed on every query and won't let me add an index.

The RegExReplace is a C# CLR function defined as follows:

[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static SqlString RegExReplace(SqlString expression, SqlString pattern, SqlString replace)

Any ideas on how to get that Digits column to act like a persisted column or allow me to add an index?!

Thank you!

Try a CAST:

CREATE TABLE PhoneNumbers
(
    [PhoneNumberID] int identity(1,1) not null primary key clustered,
    [Number] varchar(20), /* Entire number, like (800) 555-5000 */
    [Digits] AS CAST(dbo.RegExReplace(Number, '[^0-9]', '') AS VARCHAR(20)) PERSISTED  /* Like 8005555000 */
)

I believe that the problem is your CLR function is returning SqlString which ends up being nvarchar(4000) or similar - not indexable.

It's kind of a known "problem" with computed columns that the datatype is inferred from the expression. Mainly an issue with strings and "helper functions" which take varchar(max) and also with decimal operations where precision changes due to the calculations.

I have a little rule where I always CAST - it makes it explicit and avoids any ambiguity. Generally, columns which are known to be small should be explicitly small - varchar(max) seems to have a lot of performance overhead - even if you pass through a function which returns varchar(max) and takes varchar(max), cast back to the size you know, because it will perform a lot better.