Best sql-server questions in July 2011

How to reduce remote SQL Server loads?

8 votes

I want to create a C# application with client and server sides. It will work over a local network. The clients must check for updates on a remote SQL Server. Let's say we've set the update interval to 2 seconds. If I have 20 client applications, each of them will query the remote SQL Server every 2 seconds, which will load the server quite a lot. Is there any way to reduce the server load, or is polling the only way to check for updates?

Sorry for my bad english. Thx in advance

From my point of view, there is no need to let clients connect to the DB server directly. There should be one more tier here, which is the only component that connects to the server and which caches information about the updates. Your clients should connect to this additional tier and work with the cached info.

UPDATE As far as I understand, the problem appears because all your clients ping your DB server every two seconds. The solution is to create a special module that is the only one with access to the DB server and that asks it for updates, for example every two seconds. If an update is ready, the module fetches it from the DB and stores it. This is what I meant by the additional tier.

Now, let's return to your clients. They should be able to communicate with this module and get information from it about a ready update (this information is cached and thus it is really fast to obtain it. Also you needn't ping the server at every client request). If update is ready, fetch it to the client side and work on client side.
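The caching tier described above can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than C#/WCF, and it simplifies the idea: instead of a background poller, the cache refreshes lazily when a client asks and the interval has elapsed. The names UpdateCache and fake_db_query are hypothetical, not part of any library.

```python
import time


class UpdateCache:
    """Middle tier: the only component that queries the database."""

    def __init__(self, fetch_update, poll_interval=2.0, clock=time.monotonic):
        self._fetch_update = fetch_update    # hypothetical callable that runs the real DB query
        self._poll_interval = poll_interval  # seconds between database hits
        self._clock = clock
        self._cached = None
        self._last_poll = None

    def get_update(self):
        """Called by every client; hits the database at most once per interval."""
        now = self._clock()
        if self._last_poll is None or now - self._last_poll >= self._poll_interval:
            self._cached = self._fetch_update()
            self._last_poll = now
        return self._cached


# Simulate 20 clients asking for updates within the same 2-second window.
db_calls = []


def fake_db_query():
    db_calls.append(1)  # count how often the "database" is actually queried
    return {"version": 1}


cache = UpdateCache(fake_db_query, poll_interval=2.0)
results = [cache.get_update() for _ in range(20)]
print(len(db_calls))  # 1
```

With this shape, the number of database queries depends on the polling interval, not on the number of clients.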

As for the communication between this additional tier and the clients: since you are working with .NET, I would suggest taking a look at WCF, which, from my point of view, is becoming a standard approach to inter-process communication in .NET. There is a lot of information about it online; I will post the links shortly.

Here is my favorite WCF book:

Programming WCF Services

MSDN entry:

Windows Communication Foundation

Should I TRUNCATE TABLE before I DROP TABLE to avoid logging overhead?

7 votes

I understand that using TRUNCATE is a minimally logged operation and does not log the deletion of each record while DROP logs delete operations.

So, is it safe to assume that if I want to get rid of a relatively large table and I want this to happen as QUICKLY and with as LITTLE logging overhead as possible I should TRUNCATE TABLE before I DROP TABLE? Does doing this in RECOVERY SIMPLE make any difference?

I should note that this needs to happen in an automated fashion (within pre-written scripts) because this will be deployed to client databases where both downtime and log file growth could be a problem.

While TRUNCATE doesn't log individual rows, it does log for the page/extent. This is why you can rollback a truncate (which not a lot of people know). My guess is that if you just truncate then drop it will actually be slower than a drop on its own. If you commit in between, maybe not, but it would also depend on the log activity, recovery model, when you hit a checkpoint, etc.

Why is the speed important here? It's not like users are using the table if you're about to drop it...

Why don't you test it? Unless someone has run an extensive study about this covering several different variables, I doubt you're going to get much more than quasi-educated guesses.

SQL - order by list order

6 votes

I have the following query that returns rows based on a comma-separated list

Select * from Table where RecordID in (22,15,105,1,65,32)

I would like the results of this query to be returned in the order of the IDs in the list. Is that possible with SQL?

Thanks in advance

If you need the output to appear in a particular order, then you need to specify that order, using something the server can sort. Not knowing which engine you're working against, the general scheme would be to create a temp table or use rowset constructors to pair each record ID with its desired sort order.

E.g. (SQL Server)

declare @T table (RecordID int,Position int)
insert into @T (RecordID,Position)
select 22,1 union all
select 15,2 union all
select 105,3 union all
select 1,4 union all
select 65,5 union all
select 32,6

select * from [Table] t inner join @T t2 on t.RecordID = t2.RecordID order by t2.Position
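The same pairing of each ID with a position can be illustrated outside the database. The sketch below is Python, not SQL, and sorts rows on the client side; the rows list is made-up sample data standing in for the query result.

```python
# Desired order, as given in the IN (...) list.
wanted = [22, 15, 105, 1, 65, 32]

# Pair each RecordID with its position, like the @T table variable does.
position = {record_id: pos for pos, record_id in enumerate(wanted, start=1)}

# Hypothetical rows as the server might return them (no guaranteed order).
rows = [{"RecordID": rid} for rid in (1, 15, 22, 32, 65, 105)]

ordered = sorted(rows, key=lambda row: position[row["RecordID"]])
print([row["RecordID"] for row in ordered])  # [22, 15, 105, 1, 65, 32]
```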

Split a string and return greatest in mssql

6 votes

I need to find a way to get the data with the highest versionNumber.

Here is my database design:

VERSIONNUMBER - varchar(15)
DOWNLOADPATH - varchar(100)

Let's say I have records like:

VERSIONNUMBER -------- DOWNLOADPATH
1.1.2                  a.com
1.1.3                  b.com
2.1.4                  c.com
2.1.5                  d.com
2.2.1                  e.com

I need to get the record with the versionnumber 2.2.1. Need some help with the sql though :)

Thank you for any help

Try this:

with a as
(
    select * from (values
    ('1.1.2'),('1.1.3'),('2.1.4 '), ('2.1.5'), ('2.2.1') ) as b(c)
)
select c, PARSENAME(c,1),PARSENAME(c,2), PARSENAME(c,3)
from a
order by 
convert(int,PARSENAME(c,3)),
convert(int,PARSENAME(c,2)),
convert(int,PARSENAME(c,1))

Inspired by: http://www.sql-server-helper.com/tips/sort-ip-address.aspx

with a as
(
    select * from (values
    ('1.1.2'),('1.1.3'),('2.1.4 '), ('2.1.5'), ('2.2.1') ) as b(c)
),
x as 
(
    select c, 
       convert(int,PARSENAME(c,3)) * 100 
       + convert(int,PARSENAME(c,2)) * 10 
       + convert(int,PARSENAME(c,1)) * 1 as the_value
    from a
)
select c from x where the_value = (select MAX(the_value) from x)

In software development, it is common for a minor version number to have two digits. The parts of a version number are separate integers, not decimal fractions, so version 1.12 is greater than 1.5. To account for that, you must give each part enough room (two digits here):

    -- Use this, the query above is not future-proof :-)
with a as
(
    select * from (values
    ('2.1.4 '), ('2.1.12'), ('2.1.5'), ('2.2.1') ) as b(c)
),
x as 
(
    select c, 
       convert(int,PARSENAME(c,3)) * 100*100*100 
       + convert(int,PARSENAME(c,2)) * 100*100 
       + convert(int,PARSENAME(c,1)) * 100 as the_value
    from a
)
select c, the_value from x   
order by the_value

Output:

2.1.4   2010400
2.1.5   2010500
2.1.12  2011200
2.2.1   2020100

If you don't take that into consideration (as with the following query):

with a as
(
    select * from (values
    ('2.1.4 '), ('2.1.12'), ('2.1.5'), ('2.2.1') ) as b(c)
),
x as 
(
    select c, 
       convert(int,PARSENAME(c,3)) * 100
       + convert(int,PARSENAME(c,2)) * 10
       + convert(int,PARSENAME(c,1)) * 1 as the_value
    from a
)
select c, the_value from x   
order by the_value;


    -- KorsG's answer has a bug too
with a as
(
    select * from (values
    ('2.1.4 '), ('2.1.12'), ('2.1.5'), ('2.2.1') ) as b(c)
),
x as 
(
    select c, 
       CAST(REPLACE(c, '.', '') AS int) as the_value
    from a
)
select c, the_value from x   
order by the_value      

Both queries produce the same (incorrect) ordering. The first one outputs:

c           the_value
2.1.4   214
2.1.5   215
2.2.1   221
2.1.12  222

The values for 2.2.1 and 2.1.12 end up in the wrong order: 2.1.12 gets 222 and 2.2.1 gets 221. The same thing happens when you merely remove the dots and convert the resulting string to int: 2.1.12 becomes two thousand one hundred twelve and 2.2.1 becomes two hundred twenty-one. 2.2.1 is greater than 2.1.12, not less than it.
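The underlying rule, that each part of the version is compared as its own integer, can be checked with a short sketch. This is Python rather than T-SQL, and version_key and padded_value are hypothetical helper names; padded_value mirrors the two-digits-per-part weighting used in the padded query above.

```python
def version_key(version):
    """Turn '2.1.12' into (2, 1, 12) so each part compares as a number."""
    return tuple(int(part) for part in version.strip().split("."))


def padded_value(version):
    """Same weighting as the padded query: two digits reserved per part."""
    major, minor, patch = version_key(version)
    return major * 100 * 100 * 100 + minor * 100 * 100 + patch * 100


versions = ["2.1.4 ", "2.1.12", "2.1.5", "2.2.1"]

ordered = [v.strip() for v in sorted(versions, key=version_key)]
latest = max(versions, key=version_key).strip()

print(ordered)                 # ['2.1.4', '2.1.5', '2.1.12', '2.2.1']
print(latest)                  # 2.2.1
print(padded_value("2.1.12"))  # 2011200
```

Comparing tuples avoids choosing a fixed width per part, whereas the padded integer breaks once any part reaches 100.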

Fastest way to find string by substring in SQL?

5 votes

I have huge table with 2 columns: Id and Title. Id is bigint and I'm free to choose type of Title column: varchar, char, text, whatever. Column Title contains random text strings like "abcdefg", "q", "allyourbasebelongtous" with maximum of 255 chars.

My task is to find strings containing a given substring. Substrings also have random length and can match the start, middle, or end of a string. The most obvious way to do it:

SELECT * FROM t WHERE Title LIKE '%abc%'

I don't care about INSERT, I need only to do fast selects. What can I do to perform search as fast as possible?

I use MS SQL Server 2008 R2, full text search will be useless, as far as I see.

If you want to use less space than Randy's answer and there is considerable repetition in your data, you can create an N-Ary tree data structure where each edge is the next character and hang each string and trailing substring in your data on it. You number the nodes in depth first order. Then you can create a table with up to 255 rows for each of your records, with the Id of your record, and the node id in your tree that matches the string or trailing substring. Then when you do a search, you find the node id that represents the string you are searching for (and all trailing substrings) and do a range search.
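The structure described above can be sketched in a few lines. This is an in-memory Python illustration under simplifying assumptions: a naive character trie (no compression), a sorted list of (node_id, record_id) pairs standing in for the lookup table, and bisect standing in for an indexed range scan. The function names are hypothetical.

```python
import bisect


def build_suffix_index(records):
    """Build a character trie of every suffix and number its nodes depth-first.

    records: dict mapping record id -> title.
    Returns (root, rows); rows is a sorted list of (node_id, record_id).
    """
    root = {"children": {}, "ends": []}
    for record_id, title in records.items():
        for start in range(len(title)):  # the string itself and each trailing substring
            node = root
            for ch in title[start:]:
                node = node["children"].setdefault(ch, {"children": {}, "ends": []})
            node["ends"].append(record_id)

    rows = []
    next_id = 0

    def number(node):
        nonlocal next_id
        node["first"] = next_id  # this node's own id
        next_id += 1
        rows.extend((node["first"], rid) for rid in node["ends"])
        for ch in sorted(node["children"]):
            number(node["children"][ch])
        node["last"] = next_id - 1  # highest id inside this node's subtree

    number(root)
    rows.sort()
    return root, rows


def search(root, rows, needle):
    """Return the ids of records whose title contains needle."""
    node = root
    for ch in needle:
        node = node["children"].get(ch)
        if node is None:
            return set()
    # Every suffix that starts with needle ends inside this node's subtree,
    # so its node id falls in the range [first, last].
    lo = bisect.bisect_left(rows, (node["first"],))
    hi = bisect.bisect_left(rows, (node["last"] + 1,))
    return {record_id for _, record_id in rows[lo:hi]}


titles = {1: "abcdefg", 2: "q", 3: "allyourbase"}
root, rows = build_suffix_index(titles)
print(sorted(search(root, rows, "base")))  # [3]
print(sorted(search(root, rows, "b")))     # [1, 3]
```

In the database version, the range lookup would be a query on the node id column between the subtree's first and last ids.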

What are the differences between Merge Join and Lookup transformations?

5 votes

Hi I'm new to SSIS packages and writing a package and reading up about them at the same time.

I need to convert a DTS package into an SSIS package, and I need to perform a join on two sources from different databases. I was wondering which is the better approach: a Lookup or a Merge Join?

On the surface they seem very similar. The 'Merge Join' requires that the data be sorted beforehand, whereas the 'Lookup' doesn't require this. Any advice would be very helpful. Thank you.

Screenshot #1 shows a few points that distinguish the Merge Join transformation from the Lookup transformation.

Regarding Lookup:

If you want to find rows in source 2 that match the source 1 input, and you know there will be only one match for every input row, then I would suggest using the Lookup transformation. For example, if you have an OrderDetails table and want to find the matching Order Id and Customer Number, then Lookup is the better option.

Regarding Merge Join:

If you want to perform joins like fetching all Addresses (Home, Work, Other) from Address table for a given Customer in the Customer table, then you have to go with Merge Join because the customer can have 1 or more addresses associated with them.

An example to compare:

Here is a scenario to demonstrate the performance differences between Merge Join and Lookup. The data used here is a one to one join, which is the only scenario common between them to compare.

  1. I have three tables named dbo.ItemPriceInfo, dbo.ItemDiscountInfo and dbo.ItemAmount. Create scripts for these tables are provided under SQL scripts section.

  2. Tables dbo.ItemPriceInfo and dbo.ItemDiscountInfo both have 13,349,729 rows. Both tables have ItemNumber as the common column. ItemPriceInfo has price information and ItemDiscountInfo has discount information. Screenshot #2 shows the row count in each of these tables. Screenshot #3 shows the top 6 rows to give an idea of the data present in the tables.

  3. I created two SSIS packages to compare the performance of Merge Join and Lookup transformations. Both the packages have to take the information from tables dbo.ItemPriceInfo and dbo.ItemDiscountInfo, calculate the total amount and save it to the table dbo.ItemAmount.

  4. First package used Merge Join transformation and inside that it used INNER JOIN to combine the data. Screenshots #4 and #5 show the sample package execution and the execution duration. It took 05 minutes 14 seconds 719 milliseconds to execute the Merge Join transformation based package.

  5. The second package used the Lookup transformation with Full cache (which is the default setting). Screenshots #6 and #7 show the sample package execution and the execution duration. It took 11 minutes 03 seconds 610 milliseconds to execute the Lookup transformation based package. You might encounter the warning message Information: The buffer manager has allocated nnnnn bytes, even though the memory pressure has been detected and repeated attempts to swap buffers have failed. Here is a link that talks about how to calculate lookup cache size. During this package execution, even though the Data flow task completed faster, the Pipeline cleanup took a lot of time.

  6. This doesn't mean the Lookup transformation is bad. It just has to be used wisely. I use it quite often in my projects, but then again I don't deal with 10+ million rows for lookup every day. Usually, my jobs handle between 2 and 3 million rows, and for that the performance is really good. Up to 10 million rows, both performed equally well. Most of the time, what I have noticed is that the bottleneck turns out to be the destination component rather than the transformations. You can overcome that by having multiple destinations. Here is an example that shows the implementation of multiple destinations.

  7. Screenshot #8 shows the record count in all the three tables. Screenshot #9 shows top 6 records in each of the tables.

Hope that helps.

SQL Scripts:

CREATE TABLE [dbo].[ItemAmount](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [ItemNumber] [nvarchar](30) NOT NULL,
    [Price] [numeric](18, 2) NOT NULL,
    [Discount] [numeric](18, 2) NOT NULL,
    [CalculatedAmount] [numeric](18, 2) NOT NULL,
CONSTRAINT [PK_ItemAmount] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

CREATE TABLE [dbo].[ItemDiscountInfo](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [ItemNumber] [nvarchar](30) NOT NULL,
    [Discount] [numeric](18, 2) NOT NULL,
CONSTRAINT [PK_ItemDiscountInfo] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

CREATE TABLE [dbo].[ItemPriceInfo](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [ItemNumber] [nvarchar](30) NOT NULL,
    [Price] [numeric](18, 2) NOT NULL,
CONSTRAINT [PK_ItemPriceInfo] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO

Screenshots #1 through #9: [images not available]

Is it bad practice to store SQL queries in resource file?

5 votes

I have a web application that communicates with SQL server. Rather than hard-coding all of the query strings, I have opted to store them in a global resource file. Is that considered bad practice?

On a side note, when I do this, Visual Studio yells at me about the possibility of SQL injection, despite those queries being parameterized (not to mention the "spelling" warnings inside the resource file).

I don't see anything particularly "bad" with doing this. It really isn't much different than hard coding the sql code within your code, and only minorly different than generating the SQL ad-hoc at runtime.

You say that you are using parameterized queries, so you shouldn't have to worry about SQL injection.

If you are storing the SQL in a resource file to adhere to the DRY principle, then you may want to use some kind of DAL for that purpose instead, such as Entity Framework (EF) or LINQ to SQL.

SQL Query - Need to improve performance

5 votes

I have a data load scenario where I create dynamic sql query to pull data and cache in our service. There is 1 table that contains all product data : ProductHistory (47 columns, 200,000 records + and will keep growing)

What I need: Get the latest products by using the maximum id, maximum version and maximum changeid.

First Attempt:

SELECT distinct Product.* FROM ProductHistory product 
WHERE  product.version = 
(SELECT max(version) from ProductHistory p2 where product.Id = p2.Id 
  and product.changeId = 
(SELECT max(changeid) from ProductHistory p3 where p2.changeId = p3.changeId))

This took more than 2.51 minutes.

Other Failed Attempt:

select distinct product.* from ProductHistory product 
where CAST(CAST(id as nvarchar)+'0'+CAST(Version as nvarchar)+'0'+CAST(changeid as nvarchar) as decimal) = 
(select MAX(CAST(CAST(id as nvarchar)+'0'+CAST(Version as nvarchar)+'0'+CAST(changeid as nvarchar) as decimal)) from ProductHistory p2 
where product.Id = p2.Id)

It basically uses the same principle as when you order dates, concatenating the numbers ordered by relevance.

For example 11 Jun 2007 = 20070611
And in our case: Id = 4 , version = 127, changeid = 32   => 40127032
The zeros are there not to mix up the 3 different ids

But this one takes 3.10 minutes !!! :(

So, I basically need a way to make my first attempt query better by any chance. I was also wondering with such amount of data, is this the best speed of retrieval that I should expect ?

  1. I ran sp_helpindex ProductHistory and found out the indexes as below :

    PK_ProductHistoryNew - clustered, unique, primary key located on PRIMARY- Id, Version

  2. I wrapped the first query in a SP but still no change.

So, wondering by what other means we can improve the performance of this operation ?

Thanks, Mani p.s : I am just running these queries in SQL Server Management Studio to see the time.

Run the query from SQL Server Management Studio and look at the query plan to see where the bottleneck is. Anywhere you see a "table scan" or "index scan", the server has to go through all the data to find what it is looking for. If you create appropriate indexes that can be used for these operations, it should improve performance.

SQL Server single query memory usage

5 votes

I would like to find out, or at least estimate, how much memory a single (specific) query uses while executing. There is no point in posting the query here, as I would like to do this for multiple queries and see whether it changes across different databases. Is there any way to get this info?

Using SQL Server 2008 R2

thanks

Gilad.

You might want to take a look into DMV (Dynamic Management Views) and specifically into sys.dm_exec_query_memory_grants. See for example this query (taken from here):

DECLARE @mgcounter INT
SET @mgcounter = 1
WHILE @mgcounter <= 5 -- return data from dmv 5 times when there is data
BEGIN
    IF (SELECT COUNT(*)
      FROM sys.dm_exec_query_memory_grants) > 0
    BEGIN
             SELECT *
             FROM sys.dm_exec_query_memory_grants mg
                         CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) -- shows query text
             -- WAITFOR DELAY '00:00:01' -- add a delay if you see the exact same query in results
             SET @mgcounter = @mgcounter + 1
    END
END

The script above loops until some query holds a memory grant and then collects the memory data. To use it, start the script first and then run the query that you want to monitor.

How to optimize a TSQL query?

5 votes

"activity" is a bit field. I need to set it to true if one of the rows with this client_id has value true

SELECT c.client_id, u.branch_id, a.account_id, activity
FROM Clients c INNER JOIN 
      accounts a ON c.id=a.client_id INNER JOIN uso u ON a.uso_id = u.uso_id,
     (SELECT MAX(CONVERT(int,accounts.activity)) as activity, client_id
       FROM accounts GROUP BY client_id) activ
WHERE activ.client_id = c.id

This query takes about 2 minutes to execute. Please help me to optimize it.

Seems activity field is a BIT and you cannot do a MIN or MAX on it.

Instead of this, use TOP:

SELECT  c.client_id, u.branch_id, a.account_id,
        (
        SELECT  TOP 1 activity
        FROM    accounts ai
        WHERE   ai.client_id = c.id
        ORDER BY
                activity DESC
        )
FROM    clients c
JOIN    accounts a
ON      c.id = a.client_id
JOIN    uso u
ON      a.uso_id = u.uso_id

Create an index on accounts (client_id, activity) for this to work fast.

You may want to read this article:

SQL and Case Insensitive Conditions

5 votes

I have noticed in MySQL a where clause like WHERE x = 'Test' will match regardless of case ('TEST', 'test', etc.)

When using PDO, is it fair to assume that this is the case with most database servers? For example, If I were to use MSSQL or Oracle, would this be the same case?

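It's not the server that it depends on, but the collation. SQL Server and MySQL default to a case-insensitive collation, but Oracle compares strings case-sensitively by default, so it is not safe to assume case-insensitive matching everywhere. Check the collation of the database you are working with; if it doesn't suit you, it can be changed.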

SQL Server 2005 'between' operator behavior - bug or feature?

5 votes

According to the Microsoft documentation, the 'BETWEEN' operator should behave as a closed interval on both the left and the right side. However, the following code:

SELECT CASE WHEN (1 BETWEEN 1 AND 2) THEN 'YEAH' ELSE 'NO' END AS [closed in the left],
       CASE WHEN (1 BETWEEN 2 AND 1) THEN 'YEAH' ELSE 'NO' END AS [closed in the right]

shows otherwise. Can you reproduce it? What do you think?

The minimum value must always be specified first when using BETWEEN. The documentation you cited indicates that:

BETWEEN returns TRUE if the value of test_expression is greater than or equal to the value of begin_expression and less than or equal to the value of end_expression.

so your second CASE is actually testing:

1 >= 2 AND 1 <= 1
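The quoted definition is easy to verify with a small sketch. This is Python, not T-SQL; between follows the documented rule literally, and between_any_order is a hypothetical variant for the case where the bounds may arrive in either order.

```python
def between(value, begin, end):
    """Literal form of the documented rule: value >= begin AND value <= end."""
    return value >= begin and value <= end


def between_any_order(value, bound_a, bound_b):
    """Order-insensitive variant: sort the bounds before testing."""
    return min(bound_a, bound_b) <= value <= max(bound_a, bound_b)


print(between(1, 1, 2))            # True
print(between(1, 2, 1))            # False, because 1 >= 2 fails
print(between_any_order(1, 2, 1))  # True
```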

I need good tool for database unit testing

5 votes

I've tried to test SQL Server stored procedures (writing unit-style tests) using the Visual Studio 2010 testing tools, but it is very inconvenient.
Is there a more convenient tool for testing database stored procedures?

Check out TSQLUnit for an open source, SQL Server unit testing app. And there are several more SQL Server-specific testing apps listed on this page.

Is there a way to pause or wait for a few moments from within a script?

2 votes

I would assume that what I am about to ask would typically fall within the realm of "bad practice", but I want to ask regardless because this functionality is useful for diagnosing problems and benchmarking long-running scripts.

In short, is there a function that is built into SQL Server that I could call that would allow me to 'pause' for a short period of time?

I have a nightly SSIS process that, after exporting data, generates various subsets of statistical data. Whenever I have trouble with execution times of the SSIS package, I have a series of diagnostic stored procedures that I can call to help diagnose which export/bulk insert is taking exceptionally long to execute.

During the second half of my SSIS process, where I generate statistical data, it would be really beneficial if I could analyze where the process may be bogging down or analyze how fast it can modify the data in my data tables. Most of the statistical procedures begin by inserting the data into a table that is cycled through via WHILE loops and updated when necessary conditions are met.

I can easily query my data to capture a specific metric while my statistical stored procedures are executing. What I would like to do is grab that metric, wait exactly 1 second (or some specific amount of time) and grab the metric again so that I can compare them.

By being able to implement a 'pause', I can create more robust diagnostic scripts that output how many inserts, updates or calculations are being executed per second, and can thus estimate which portions of my bigger procedures are taking the longest.

I would assume that there may be some diagnostic tools that I could use from within SQL Server Management Studio. I admit, I haven't tinkered with any of that. However, usually if there is a tool within SSMS much of the back-end functionality can be mimicked by some form of fancy scripting. Another possible limitation to using any built-in diagnostics is that I'd prefer to run these metrics while the SSIS package is running not just while I am testing the stored procedures that execute within the package.

Thanks guys for any solutions or tips! This request isn't 100% necessary, but my nightly SSIS packages are becoming massive and I'm having to start fine-tuning the "squeaky wheels" that I've been able to ignore in the past.

Perhaps you are looking for WAITFOR DELAY. For example, WAITFOR DELAY '00:00:01' pauses the batch for one second.
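The sample, wait, sample again pattern the question describes looks roughly like this. The sketch is Python, not T-SQL, with time.sleep playing the role of the pause; measure_rate and the sample values are hypothetical.

```python
import time


def measure_rate(read_metric, interval_seconds=1.0, sleep=time.sleep):
    """Read a counter, pause, read it again, and return the change per second."""
    before = read_metric()
    sleep(interval_seconds)  # the pause between the two samples
    after = read_metric()
    return (after - before) / interval_seconds


# Hypothetical metric: a row counter that grew from 1000 to 1250 during the pause.
samples = iter([1000, 1250])
rate = measure_rate(lambda: next(samples), interval_seconds=0.25)
print(rate)  # 1000.0
```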

Calling a Function in a Stored Proc Cursors

2 votes

I have a function "Split" which I am calling in a stored proc with cursor.

My a_upload table structure is: (int, varchar(100))

upload_id   allowed_file_extensions
---------------------------------------------------------------------
1           .xls, .doc, .pdf, .docx, .xlsx, .pptx, .txt
2           .xls, .doc, .pdf, .jpeg, .jpg, .docx, .xlsx, .pptx, .txt

table c_file_extensions is: (int, varchar(100), bit)

id  description  is_enabled
---------------------------
1   .xls         1
2   .doc         1
3   .pdf         0
4   .rtf         1

Stored proc:

DECLARE @is_enabled Varchar(10), @allowed_file_extensions Varchar(100)

   SET @is_enabled = 'True';

DECLARE cur CURSOR FORWARD_ONLY FOR 
  SELECT items 
    FROM split((SELECT allowed_file_extensions 
                  FROM a_upload 
                 WHERE upload_id = 1), ',') 

OPEN cur
FETCH NEXT FROM cur INTO @allowed_file_extensions
WHILE @@fetch_status=0
BEGIN

    IF EXISTS(SELECT * 
                FROM c_file_extensions 
               WHERE description = @allowed_file_extensions 
                 AND is_enabled = 0)
      SET @is_enabled = 'False';

FETCH NEXT FROM cur INTO @allowed_file_extensions
END
CLOSE cur
DEALLOCATE cur

SELECT @is_enabled AS Output

function split:

CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))       
  RETURNS @temptable TABLE (items varchar(8000)) AS

BEGIN
    DECLARE @idx int       
    DECLARE @slice varchar(8000)       

    SELECT @idx = 1    

    IF LEN(@String) < 1 OR @String IS NULL RETURN

    WHILE @idx!= 0       
    BEGIN       
        SET @idx = CHARINDEX(@Delimiter,@String)       
        IF @idx != 0       
          SET @slice = LEFT(@String,@idx - 1)       
        ELSE       
          SET @slice = @String       

          IF(LEN(@slice)>0)  
            INSERT INTO @temptable
             (Items) 
            VALUES
             (@slice)       

          SET @String = RIGHT(@String, LEN(@String) - @idx)       
          IF LEN(@String) = 0 BREAK
    END  
RETURN      
END  

For upload_id = 1, I am expecting the output to be 'False', but I always get 'True'. I tried to debug, and found that the "if exists (select * from c_file_extensions where description = @allowed_file_extensions and is_enabled = 0)" is not working properly.

This is happening because you're comparing against items with leading spaces.

Try this:

  select items from split((select allowed_file_extensions from a_upload where upload_id = 1), ',') 

This results in:

.xls
 .doc
 .pdf
 .docx
 .xlsx
 .pptx
 .txt

To solve this, you could:

  • modify your cursor definition:
select  LTRIM(items) from split((select allowed_file_extensions from a_upload where upload_id = 1), ',') 
  • modify your dbo.Split() to remove leading spaces in the statement that shortens @String:
set @String = LTRIM(right(@String,len(@String) - @idx))  
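The effect of the leading spaces can be reproduced outside SQL Server. The sketch below is Python, with the strings taken from the a_upload and c_file_extensions sample data; strip() plays the role of LTRIM.

```python
allowed = ".xls, .doc, .pdf, .docx, .xlsx, .pptx, .txt"
disabled = {".pdf"}  # extensions with is_enabled = 0

raw_items = allowed.split(",")                        # keeps the space after each comma
trimmed_items = [item.strip() for item in raw_items]  # what LTRIM achieves

print(raw_items[2] == ".pdf")                           # False: the item is ' .pdf'
print(any(item in disabled for item in raw_items))      # False, so the output stays 'True'
print(any(item in disabled for item in trimmed_items))  # True, so the output becomes 'False'
```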

Comparing sql query resultsets in java

1 votes

I want to run a query programmatically from Java against a SQL Server database in both our dev environment and our production environment, and compare the results. Would a column-by-column comparison be the best way to compare the result sets, or is there a better way to do it?

Thank You

Comparing resultsets would work. The other option would be to create an object for each resultset and compare that. You could define your equals method to handle this, like any other ordinary Java object.

Much of this will depend on what you want to produce in the end. If you're only concerned about whether resultset 1 differs from resultset 2, either method would work.
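The object-per-row idea can be sketched as follows. This is Python rather than Java: a frozen dataclass gets value-based equality and hashing for free, which corresponds to defining equals and hashCode on a Java class. AddressRow and its sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRow:
    """One row of the result set; equality compares every field."""
    id: int
    city: str


dev_rows = [AddressRow(1, "Oslo"), AddressRow(2, "Rome")]
prod_rows = [AddressRow(1, "Oslo"), AddressRow(2, "Roma")]

only_in_dev = set(dev_rows) - set(prod_rows)
only_in_prod = set(prod_rows) - set(dev_rows)

print(only_in_dev)   # {AddressRow(id=2, city='Rome')}
print(only_in_prod)  # {AddressRow(id=2, city='Roma')}
```

Set differences ignore row order and duplicates; if either matters, compare the two lists element by element instead.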

SQL Server query execution very slow when comparing Primary Keys

1 votes

I have a SQL Server 2008 R2 database table with 12k address records where I am trying to filter out duplicate phone numbers and flag them using the following query

SELECT a1.Id, a2.Id  
  FROM Addresses a1 
    INNER JOIN  Addresses a2 ON a1.PhoneNumber = a2.PhoneNumber
  WHERE a1.Id < a2.Id

Note: I realize that there is another way to solve this problem by using EXISTS, but this is not part of the discussion.

The table has a Primary Key on the ID field with a Clustered Index, the fragmentation level is 0 and the phone number field is not null and has about 130 duplicates out of the 12k records. To make sure it is not a server or database instance issue I ran it on 4 different systems.

Execution of the query takes several minutes, sometimes several hours. After trying almost everything as one of my last steps I removed the Primary Key and ran the query without it and voila it executed in under 1 second. I added the Primary Key back and it still ran in under one second.

Does anybody have an idea what is causing this problem?

Is it possible that the primary key gets somehow corrupted?

EDIT: My apologies I had a couple of typos in the Sql Query

Out-of-date statistics. Dropping and recreating the PK will give you fresh statistics.

Too late now, but I'd have suggested running sp_updatestats to see what happened.

If you back up and restore a database onto different systems, statistics follow the data.

I'd suspect a different plan too after non-indexed (I guess) columns PhoneNumber and CCAPhoneN

Does an index already cover a clustered primary key?

1 votes

Let's say I have a table like this:

CREATE TABLE t(
  [guid] [uniqueidentifier] NOT NULL,
  [category] [nvarchar](400)
  {,...other columns}
  )

Where guid is my primary key, and has a clustered index.

Now, I want an index that covers both category and guid, because I'm rolling up some other stuff related to t by category, and I want to avoid including the t table itself.

Is it sufficient to create index covering category, or do I need to include guid as well?

I would expect SQL Server indexes to point directly to page offsets in t rather than simply referring to a guid primary key value, which means I would need to explicitly include the PK column to avoid hitting t. Is this the case?

Actually your assumption is wrong - all SQL Server non-clustered indices do include the clustering key (single or multiple columns) and do not point directly at some physical page.

This prevents SQL Server from having to reorganize and update lots of index entries when a page needs to be split in two or relocated. So if you are seeking in a non-clustered index and you find a value, then you have the clustering key and SQL Server will need to do a "bookmark lookup" (or key lookup) to retrieve the actual data page (the leaf page in the clustering index) to get the whole set of data belonging to a single row.

That said - if you ever have a situation where it depends on the ordering of the key columns, then you still might need to create an index specifically on (guid, category) - of course, in that case, SQL Server is smart enough to figure out that the clustering key column is already in the index and won't be adding it one more time.

The fact that the clustering key column(s) are included in every single non-clustered index is another strong reason why your clustering keys should be narrow, static and unique. Making them too wide (anything beyond 8 bytes) is a sure recipe for bloat and slow-down.

0 votes
ALTER PROCEDURE [dbo].[getMessages]
    -- Add the parameters for the stored procedure here
    @lastRow int,
    @sort varchar(9)
AS
BEGIN
    -- Insert statements for procedure here
    DECLARE @StartRow INT,@EndRow INT
    SELECT @StartRow = (@lastRow + 1), @EndRow = (@lastRow + 6)

;WITH cte AS (SELECT ROW_NUMBER() OVER (ORDER BY
        CASE WHEN @sort = 'votes1' THEN m.votes END DESC,
        CASE WHEN @sort = 'votes2' THEN m.votes END ASC
      ) AS rows,
      m.message,
      m.messageId,
      TotalCount = COUNT(m.messageId) OVER ( PARTITION BY NULL)
    FROM
      tblMessages m
    WHERE
      m.deleted != 1
      )
     SELECT * 
     FROM cte WHERE ROWS BETWEEN @StartRow AND @EndRow
    ORDER BY rows
END

So this is my proc that I use for paging. On the front end I pass in the last row I saw, and then when I click "load more", it starts from the next row and gets the next 6. Well, now I want the PREVIOUS 6, passing in an id: if you see 6, go to the next six, and then want to see the previous 6 again.

How would I modify this proc to do that?

You don't.

Instead, modify your code. The key in the proc is @lastRow. When you execute it the first time, I'm assuming @lastRow = 0. If they go forward, you are executing it with a value of 6.

To go backward, just pass the current value minus 6. For example, if you're on page 20, @lastRow is 114. Subtract 6 in your code and call the proc again with 108.
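The arithmetic on the calling side can be sketched like this. It is Python standing in for whatever front-end code calls the proc; row_range mirrors the @StartRow/@EndRow calculation, and the function names are hypothetical.

```python
PAGE_SIZE = 6


def row_range(last_row):
    """Mirror of the proc: it returns rows last_row + 1 through last_row + 6."""
    return last_row + 1, last_row + PAGE_SIZE


def next_last_row(current_last_row):
    """Value to pass for "load more"."""
    return current_last_row + PAGE_SIZE


def previous_last_row(current_last_row):
    """Value to pass for "previous 6"; never goes below the first page."""
    return max(current_last_row - PAGE_SIZE, 0)


current = 114  # page 20 was loaded with @lastRow = 114
print(row_range(current))                     # (115, 120)
print(row_range(previous_last_row(current)))  # (109, 114)
print(row_range(next_last_row(current)))      # (121, 126)
```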