Best sql-server questions in June 2012

How to Perform a SQL Query to DataTable Operation That Can Be Cancelled

10 votes

I tried to make the title as specific as possible. Basically, what I have running inside a BackgroundWorker thread now is some code that looks like this:

SqlConnection conn = new SqlConnection(connstring);
SqlCommand cmd = new SqlCommand(query, conn);
conn.Open();
SqlDataAdapter sda = new SqlDataAdapter(cmd);
sda.Fill(Results);
conn.Close();
sda.Dispose();

Here query is a string containing a large, time-consuming query, and conn is the connection object.

My problem now is that I need a Stop button. I've come to realize that killing the BackgroundWorker would be useless, because I still want to keep whatever results were retrieved before the query is cancelled. Plus, it wouldn't be able to check the cancelled state until after the query completes.

What I've come up with so far:

I've been trying to work out how to handle this efficiently without taking too big a performance hit.

My idea was to use a SqlDataReader to read the data from the query a piece at a time, so that I had a "loop" in which to check a flag I could set from the GUI via a button. The problem is that, as far as I know, I can't use the Load() method of a DataTable and still be able to cancel the SqlCommand. If I'm wrong, please let me know, because that would make cancelling slightly easier.

In light of this, I realized I may only be able to cancel the SqlCommand mid-query if I did something like the following (pseudo-code):

while(reader.Read())
{
 //check flag status
 //if it is set to 'kill' fire off the kill thread

 //otherwise populate the datatable with what was read
}

However, it seems to me this would be highly inefficient and possibly costly. Is this the only way to kill a SqlCommand in progress when the results absolutely need to end up in a DataTable? Any help would be appreciated!

There are really two stages where cancelling matters:

  1. Cancelling the initial query execution before the first rows are returned
  2. Aborting the process of reading the rows as they are served

Depending on the nature of the actual SQL statement, either of these stages could account for 99% of the total time, so both should be considered. For example, running SELECT * on a table with a billion rows will take essentially no time to execute but a very long time to read. Conversely, a very complicated join on poorly tuned tables, wrapped in some aggregating clauses, may take minutes to execute but negligible time to read the handful of rows once they are actually returned.

Well-tuned advanced database engines will also cache chunks of rows at a time for complicated queries, so you will see alternating pauses where the engine is executing the query on the next batch of rows and then fast bursts of data as it returns the next batch of results.

Cancelling the query execution

To cancel a query while it is executing, you can start it with one of the overloads of SqlCommand.BeginExecuteReader and call SqlCommand.Cancel to abort it. Alternatively, you can call ExecuteReader() synchronously in one thread and still call Cancel() from another. I'm not including code examples because there are plenty of them in the documentation.
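The question's code is C#, but the "run the query in one thread, cancel it from another" pattern can be sketched with Python's standard sqlite3 module as a runnable stand-in. A worker thread runs the query while the main thread calls Connection.interrupt(), which plays roughly the role that SqlCommand.Cancel() plays in ADO.NET. This is not SqlClient code. SLOW_QUERY and run_with_cancel are hypothetical names, and the cancel_after delay simulates the user pressing Stop.

```python
import sqlite3
import threading

# Hypothetical slow query: a recursive CTE that would count to 10**12.
SLOW_QUERY = """
WITH RECURSIVE n(i) AS (
    SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 1000000000000
)
SELECT count(*) FROM n
"""


def run_with_cancel(conn, sql, cancel_after):
    """Run sql on a worker thread; cancel it if still running after cancel_after seconds."""
    outcome = {}

    def worker():
        try:
            outcome["rows"] = conn.execute(sql).fetchall()
        except sqlite3.OperationalError as exc:
            # sqlite3 reports an interrupted statement as an OperationalError.
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join(cancel_after)  # simulate the user pressing Stop after a delay
    # interrupt() is a no-op when no statement is running yet, so repeat until the worker exits.
    while thread.is_alive():
        conn.interrupt()
        thread.join(0.05)
    return outcome


# check_same_thread=False because the connection is shared between two threads.
conn = sqlite3.connect(":memory:", check_same_thread=False)
print(run_with_cancel(conn, "SELECT 42", cancel_after=1.0))  # {'rows': [(42,)]}
print(run_with_cancel(conn, SLOW_QUERY, cancel_after=0.1))   # an 'error' entry: the query was interrupted
```

Fast queries finish before the deadline and return their rows normally. Only a statement that is still running when Stop is "pressed" gets interrupted.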

Aborting the read operation

Here, a simple boolean flag is probably the easiest approach. And remember that it's really easy to fill a DataTable row using the Rows.Add() overload that takes an object array:

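// cancelFlag is set from the UI thread, so declare it volatile to make the change visible here
object[] buffer = new object[reader.FieldCount];
while(reader.Read()) {
    if(cancelFlag) break;
    reader.GetValues(buffer);
    dataTable.Rows.Add(buffer);   // Rows.Add copies the values, so the buffer can be reused
}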
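The same read loop can be sketched in Python, with sqlite3 standing in for SqlDataReader. A threading.Event serves as the cancel flag that a Stop button would set, and the rows read before cancellation are kept. read_rows and the press_stop_after_ten callback (which simulates a click after ten rows) are hypothetical names for illustration.

```python
import sqlite3
import threading


def read_rows(cursor, cancel, on_row=None):
    """Append rows one at a time until the cursor is exhausted or cancel is set.

    Rows read before cancellation are returned, like a partially filled DataTable.
    """
    rows = []
    for row in cursor:            # rows are fetched lazily, like reader.Read()
        if cancel.is_set():       # the equivalent of checking cancelFlag once per row
            break
        rows.append(row)
        if on_row is not None:
            on_row(len(rows))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [(f"row {i}",) for i in range(1, 1001)])

stop = threading.Event()


def press_stop_after_ten(count):
    """Simulates the user clicking Stop once ten rows have been added."""
    if count == 10:
        stop.set()


partial = read_rows(conn.execute("SELECT id, name FROM t ORDER BY id"), stop,
                    on_row=press_stop_after_ten)
print(len(partial), partial[-1])  # 10 (10, 'row 10')
```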

Cancelling blocking calls to Read()

A sort of mixed case occurs when, as mentioned earlier, a call to reader.Read() causes the database engine to do another batch of intensive processing. As noted in the MSDN documentation, calls to Read() can block in this case even if the original query was executed with BeginExecuteReader. You can still get around this by calling Read() in the one thread that handles all the reading and calling Cancel() from another thread. To know whether your reader is in a blocking Read() call, keep another flag that the reader thread updates and the monitoring thread reads:

...
inRead = true;
while(reader.Read()) {
    inRead = false;
    ...
    inRead = true;
}

// Somewhere else (inRead is written by the reader thread and read here, so declare it volatile):
private void foo_onUITimerTick(...) {
   status.Text = inRead ? "Waiting for server" : "Reading";
}

Regarding performance of Reader vs Adapter

A DataReader is usually faster than using DataAdapter.Fill(). The whole point of a DataReader is to be really, really fast and responsive for reading. Checking some boolean flag once per row would not add a measurable difference in time even over millions of rows.

The limiting factor for a big database query is not the local CPU processing time but the size of the I/O pipe (your network connection for a remote database or your disk speed for a local one) or a combination of the db server's own disk speed and CPU processing time for a complex query. Both a DataAdapter and a DataReader will spend time (perhaps the majority of the time) just waiting for a few nanoseconds at a time for the next row to be served.

One convenience of DataAdapter.Fill() is that it does the magic of dynamically generating the DataTable columns to match the query results, but that's not difficult to do yourself (see SqlDataReader.GetSchemaTable()).
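In the Python sqlite3 stand-in, the rough counterpart of SqlDataReader.GetSchemaTable() is cursor.description, whose first element for each column is the column name. The sketch below builds a minimal table-like structure from it: a list of column names plus one dict per row. fill_table is a hypothetical helper, and unlike a DataTable it carries no column types.

```python
import sqlite3


def fill_table(cursor):
    """Build column names from cursor.description, then read every row into a dict."""
    columns = [d[0] for d in cursor.description]  # d[0] is the column name
    rows = [dict(zip(columns, values)) for values in cursor]
    return columns, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (fruit_id INTEGER, name TEXT, color TEXT)")
conn.executemany("INSERT INTO fruit VALUES (?, ?, ?)",
                 [(1, "apple", "red"), (2, "banana", "yellow")])

columns, rows = fill_table(conn.execute("SELECT fruit_id, name FROM fruit ORDER BY fruit_id"))
print(columns)  # ['fruit_id', 'name']
print(rows[1])  # {'fruit_id': 2, 'name': 'banana'}
```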

Parameter doesn't perform as well as hard coding the value

8 votes

I have a stored procedure that performs terribly. When I declare a variable, set its value and then use it in the where clause the statement takes over an hour to run. When I hard code the variables in the where clause it runs in less than a second.

I started to look into what was wrong with it through execution plans. It looks like when I pass it the declared variables, the execution plan creates a Hash Match, because it selects values from a view that uses a UNION and a common table expression.

/*************   Begin of Stored Procedure ***************/
CREATE PROCEDURE GetFruit
  @ColorId bigint,
  @SeasonId bigint
WITH RECOMPILE
AS
BEGIN

SELECT
    A.Name
FROM
    [Apple_View] A   /* This is the view down below */
    INNER JOIN [Fruit] F
        ON ( F.ColorId = @ColorId
            AND A.FruitId = F.FruitId)          
WHERE
    (A.ColorId = @ColorId
    AND 
    A.SeasonId = @SeasonId)

END
/************* End of Stored Procedure   ***************/

/************* Begin of View   ***************/
WITH Fruits (FruitId, ColorId, SeasonId) AS
(
    -- Anchor member
    SELECT
        F.FruitId
        ,F.ColorId
        ,F.SeasonId
    FROM
        ((  
            SELECT DISTINCT
                EF.FruitId
                ,EF.ColorId
                ,EF.SeasonId
                ,EF.ParentFruitId
            FROM
                ExoticFruit EF
                INNER JOIN Fruit FR
                    ON FR.FruitId = EF.FruitId
        UNION
            SELECT DISTINCT
                SF.FruitId
                ,SF.ColorId
                ,SF.SeasonId
                ,SF.ParentFruitId               
            FROM
                StinkyFruit SF
                INNER JOIN Fruit FR
                    ON FR.FruitId = SF.FruitId
        UNION
            SELECT DISTINCT
                CF.FruitId
                ,CF.ColorId
                ,CF.SeasonId
                ,CF.ParentFruitId
            FROM
                CrazyFruit CF
                INNER JOIN Fruit FR
                    ON FR.FruitId = CF.FruitId

            )) f

    UNION ALL

    -- Recursive Parent Fruit
    SELECT 
        FS.FruitId
        ,FS.ColorId
        ,FS.SeasonId
        ,FS.ParentFruitId
    FROM
        Fruits FS
        INNER JOIN MasterFruit MF
            ON  MF.[ParentFruitId] = fs.[FruitId]
)

SELECT DISTINCT
    FS.FruitId
    ,FS.ColorId
    ,FS.SeasonId
    FROM
        Fruits FS

/************* End of View   ***************/


/* To Execute */
EXEC GetFruit 1,3

If I run the stored procedure with the variable values, it takes over an hour. Here is the execution plan:

[Execution plan image: with variables]

If I run the stored procedure after removing the DECLARE and SET statements and just change the WHERE clause to the following, it runs in less than a second. Here is the execution plan:

WHERE(A.ColorId = 1 AND  A.SeasonId = 3)

[Execution plan image: hard-coded WHERE clause]

Notice how the plan with hard-coded values uses the indexes, while the first one uses a Hash Match. Why is that? Why do hard-coded values in the WHERE clause behave differently from the declared variables?

-------This is what finally performed well, with the help of @user1166147------

I changed the stored procedure to use sp_executesql.

CREATE PROCEDURE GetFruit
  @ColorId bigint,
  @SeasonId bigint
WITH RECOMPILE
AS
BEGIN

DECLARE @SelectString nvarchar(max)

SET @SelectString = N'SELECT
    A.Name
FROM
    [Apple_View] A   /* This is the view down below */
    INNER JOIN [Fruit] F
        ON ( F.ColorId = ' + CONVERT(NVARCHAR(MAX), @ColorId) + '
            AND A.FruitId = F.FruitId)          
WHERE
    (A.ColorId = ' + CONVERT(NVARCHAR(MAX), @ColorId) + '
    AND 
    A.SeasonId = ' + CONVERT(NVARCHAR(MAX), @SeasonId) + ')'

EXEC sp_executesql @SelectString

END

EDIT SUMMARY (per a request from Damien_The_Unbeliever)

The goal is to give SQL Server the best/most information about the variable values BEFORE the plan is created; parameter sniffing generally does this. There may be a reason that parameter sniffing was 'disabled' in this case. Without seeing a better representation of the actual code, we can't really say what the solution is or why the problem exists. Try the things below to force the affected areas to generate plans using actual values.

LONG VERSION WITH MORE DETAIL

Is this your actual stored proc? Do you have default values for your parameters? If so, what are they?

Parameter sniffing can help, but it needs typical parameter values to create a good plan. Otherwise it won't really help, or it will create a bad plan based on the atypical parameter value. So if a variable has a default value of NULL, or a value that is not typical, the first time the procedure is run and the plan is compiled, it creates a bad plan.

If someone else wrote this sproc, they may have intentionally 'disabled' parameter sniffing with the local variables for a reason. Business rules may require these variable structures.

The goal is to give SQL Server the best/most information about the variable values BEFORE the plan is created, and parameter sniffing generally does this. But some things can make it hurt performance, and that may be why it is 'disabled'. Either way, with or without parameter sniffing, it still seems that the plan is being created with atypical parameter values or with too little information.

Try using sp_executesql to execute the affected queries inside the sproc. This forces SQL Server to generate a plan for that part with the actual variable values, so you can see whether it's better. This may be your solution if you have to deal with this sort of irregular parameter value: create stored procs that run the affected parts, and call them from within the main stored procedure after the variable has received a typical value.

Without seeing a better representation of the actual code, it is hard to see what the problem is. Hopefully this info will help.

Database table insert locks from a multi threaded application

7 votes

I have a process that is running multi-threaded.

The process has a thread-safe collection of items to process.

Each thread processes items from the collection in a loop.

Each thread sends each item in the list to a stored procedure, which inserts data into 3 tables in a transaction (in SQL). If one insert fails, all three fail. Note that the scope of the transaction is per item.

The inserts are pretty simple, just inserting one row (foreign key related) into each table, with identity seeds. There is no read, just insert and then move on to the next item.

If I have multiple threads processing their own items, each trying to insert into the same set of tables, will this create deadlocks, timeouts, or any other problems due to transaction locks?

I know I have to use one DB connection per thread; I'm mainly concerned with the lock levels of tables in each transaction. When one thread is inserting rows into the 3 tables, will the other threads have to wait? There is no dependency between rows across tables, except that the auto identity needs to be incremented. If incrementing the identity takes a table-level lock, then I suppose other threads will have to wait. Some inserts may not be fast. If the threads are going to have to wait, does multithreading make sense?

The objective for multithreading is to speed up the processing of items.

Please share your experience.

PS: Identity seed is not a GUID.

In SQL Server, multiple inserts into a single table normally do not block each other on their own. The IDENTITY generation mechanism is highly concurrent, so it does not serialize access. Inserts may block each other if they insert the same key into a unique index (one of them will also hit a duplicate key violation if both attempt to commit). There is also a probability game: lock resources for keys are hashed, so two different keys can occasionally collide on the same lock. This only comes into play in large transactions; see %%LOCKRES%% COLLISION PROBABILITY MAGIC MARKER: 16,777,215. If the transaction inserts into multiple tables, there still shouldn't be conflicts as long as, again, the keys inserted are disjoint (this happens naturally if the inserts are master-child-child).
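The per-item, master-child-child transaction described above can be sketched with Python's sqlite3 as a stand-in. SQLite's locking is far coarser than SQL Server's, so this shows only the shape of the transaction, not the concurrency behavior. The table names master, child_a and child_b and the helper insert_item are hypothetical. Each item gets its own transaction: the child rows reference the identity value generated for the master row, and a failure in any insert rolls back all three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE master  (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
CREATE TABLE child_a (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      master_id INTEGER NOT NULL REFERENCES master(id), value TEXT NOT NULL);
CREATE TABLE child_b (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      master_id INTEGER NOT NULL REFERENCES master(id), value TEXT NOT NULL);
""")


def insert_item(conn, name, a_value, b_value):
    """Insert one master row and its two children atomically; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            master_id = conn.execute("INSERT INTO master (name) VALUES (?)", (name,)).lastrowid
            conn.execute("INSERT INTO child_a (master_id, value) VALUES (?, ?)", (master_id, a_value))
            conn.execute("INSERT INTO child_b (master_id, value) VALUES (?, ?)", (master_id, b_value))
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_item(conn, "item 1", "a1", "b1"))  # True
print(insert_item(conn, "item 2", "a2", None))  # False: NOT NULL violation in child_b
print(conn.execute("SELECT count(*) FROM master").fetchone()[0])  # 1
```

Because the failing item's master and child_a rows are rolled back together, no orphan rows remain.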

That being said, the presence of secondary indexes, and especially foreign key constraints, may introduce blocking and possible deadlocks. Without an exact schema definition, it is impossible to tell whether you are susceptible to deadlocks. Any other workload (reports, reads, maintenance) also adds to contention and can potentially cause blocking and deadlocks.

Really, really high-end deployments (the kind that don't need to ask for advice on forums...) can suffer from insert hot-spot symptoms; see Resolving PAGELATCH Contention on Highly Concurrent INSERT Workloads.

BTW, doing INSERTs from multiple threads is very seldom the correct answer to increasing load throughput. See The Data Loading Performance Guide for good advice on how to solve that problem. One last piece of advice: multiple threads are also seldom the answer to making any program faster. Async programming is almost always the correct answer. See AsynchronousProcessing and BeginExecuteNonQuery.

As a side note:

just inserting one row (foreign key related) into each table, ... There is no read,

This statement actually contradicts itself. Foreign keys imply reads, since they must be validated during writes.

SQL query for asp.net grid pagination

7 votes

I am using iBatis and SQL Server.

What is the best way to use offset and limit for paging queries?

Maybe I could add the column ROW_NUMBER() OVER (ORDER BY Id) AS RowNum, but this will only prevent data access for simple queries. There are some cases where I use a union of selects. How can I optimize these queries?

I don't know anything about iBatis, but I guess you could do this in SQL.

If I understand you correctly, you want to paginate the results of a select statement or a union of a few select statements.

I'd do it the following way. This could be a stored procedure, for example, and there should probably be some sanity checking in there to check that offset is not negative and limit is greater than 0. If you do end up doing something like this, make sure you replace * with your column names too!

Here is an example with a union:

DECLARE @offset INT;
DECLARE @limit INT;

WITH cte
     AS (SELECT t.*,
                Row_number() OVER (ORDER BY Id) AS RowNum
         FROM   (SELECT *
                 FROM   Table1
                 UNION
                 SELECT *
                 FROM   Table2) t)
SELECT *
FROM   cte
WHERE  RowNum > @offset
       AND RowNum <= @offset + @limit
ORDER  BY RowNum

Essentially, what I've done is derive a new table from the union of two queries, as you said could happen in your case. I then add a column with the row number to that result in a CTE, and select only the rows after @offset up to and including @offset + @limit, so you get back only the rows you've asked for.

E.g. setting @offset = 50 and @limit = 50, you'd get back results 51-100 (as ordered by the criteria specified in the ROW_NUMBER() OVER clause).

(I hope this was the sort of thing you were looking for!)

Edit: This will only work in SQL Server 2005 onwards - you haven't mentioned which version you're using!
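Here is a runnable sketch of the same ROW_NUMBER() technique, using Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25 or later is needed for window functions). Table1, Table2 and their Id and Name columns are hypothetical. It uses an exclusive lower bound, RowNum > offset AND RowNum <= offset + limit, so that each page contains exactly limit rows; BETWEEN offset AND offset + limit would return limit + 1 rows.

```python
import sqlite3

PAGE_SQL = """
WITH cte AS (
    SELECT t.Id, t.Name,
           ROW_NUMBER() OVER (ORDER BY t.Id) AS RowNum
    FROM (SELECT Id, Name FROM Table1
          UNION
          SELECT Id, Name FROM Table2) AS t
)
SELECT Id, Name FROM cte
WHERE RowNum > :offset AND RowNum <= :offset + :limit
ORDER BY RowNum
"""


def get_page(conn, offset, limit):
    """Return up to `limit` rows that come after the first `offset` rows."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return conn.execute(PAGE_SQL, {"offset": offset, "limit": limit}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Table2 (Id INTEGER, Name TEXT)")
# Odd Ids in Table1, even Ids in Table2: 200 rows in total after the UNION.
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(i, f"n{i}") for i in range(1, 201, 2)])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", [(i, f"n{i}") for i in range(2, 201, 2)])

page = get_page(conn, offset=50, limit=50)
print(len(page), page[0][0], page[-1][0])  # 50 51 100
```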

SQL Inner Join. ON condition vs WHERE clause

7 votes

I am busy converting a query using the old style syntax to the new join syntax. The essence of my query is as follows :

Original Query

SELECT i.*  
FROM 
  InterestRunDailySum i, 
  InterestRunDetail ird, 
  InterestPayments p
WHERE 
   p.IntrPayCode = 187
   AND i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode
   AND ird.IntRunCode = p.IntRunCode AND ird.ClientCode = p.ClientCode

New Query

SELECT  i.*
  FROM InterestPayments p
    INNER JOIN InterestRunDailySum i 
      ON (i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode)
    INNER JOIN InterestRunDetail ird 
      ON (ird.IntRunCode = p.IntRunCode AND ird.IntRunCode = p.IntRunCode)
  WHERE 
    p.IntrPayCode = 187

In this example, "Original Query" returns 46 rows, whereas "New Query" returns over 800.

Can someone explain the difference to me? I would have assumed that these queries are identical.

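The problem is with your join to InterestRunDetail. You compare IntRunCode twice and never compare ClientCode, so each payment is matched to the detail rows of every client in that interest run, which multiplies the rows.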

The correct query should be:

SELECT  i.*
  FROM InterestPayments p
    INNER JOIN InterestRunDailySum i 
      ON (i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode)
    INNER JOIN InterestRunDetail ird 
      ON (ird.IntRunCode = p.IntRunCode AND ird.ClientCode = p.ClientCode)
  WHERE 
    p.IntrPayCode = 187
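The effect of the duplicated predicate is easy to reproduce with a tiny, made-up data set. The sketch below uses Python's sqlite3 as a stand-in with simplified versions of the three tables (the Amount column and all values are hypothetical). There is one payment for client 1 in run 10, and detail rows for three clients in that run. The corrected join matches one detail row; the version that compares IntRunCode twice matches all three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InterestPayments    (IntrPayCode INTEGER, IntRunCode INTEGER, ClientCode INTEGER);
CREATE TABLE InterestRunDailySum (IntRunCode INTEGER, ClientCode INTEGER, Amount INTEGER);
CREATE TABLE InterestRunDetail   (IntRunCode INTEGER, ClientCode INTEGER);
INSERT INTO InterestPayments    VALUES (187, 10, 1);
INSERT INTO InterestRunDailySum VALUES (10, 1, 100);
INSERT INTO InterestRunDetail   VALUES (10, 1), (10, 2), (10, 3);
""")

JOIN_SQL = """
SELECT i.*
FROM InterestPayments p
INNER JOIN InterestRunDailySum i
    ON i.IntRunCode = p.IntRunCode AND i.ClientCode = p.ClientCode
INNER JOIN InterestRunDetail ird
    ON ird.IntRunCode = p.IntRunCode AND {second_condition}
WHERE p.IntrPayCode = 187
"""

buggy = conn.execute(JOIN_SQL.format(second_condition="ird.IntRunCode = p.IntRunCode")).fetchall()
fixed = conn.execute(JOIN_SQL.format(second_condition="ird.ClientCode = p.ClientCode")).fetchall()
print(len(buggy), len(fixed))  # 3 1
```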

How to remove duplicating rows from union statement

7 votes

OK - I have looked and looked and found a lot of examples but nothing quite meeting my need. Maybe I used the wrong words to search with, but I could use your help. I will provide as much detail as I can.

I need to produce a report that merges fields from two tables, or rather a view and a table, into one table for a report. Here is the statement I am using:

SELECT A.ConfInt, A.Conference, 
        NULL as Ordered,
        NULL as Approved,
        NULL as PickedUp,
        SUM(dbo.Case_Visit_Payments.Qty) AS Qty
FROM         dbo.Conferences as A INNER JOIN
                      dbo.Case_Table ON A.ConfInt = dbo.Case_Table.Conference_ID INNER JOIN
                      dbo.Case_Visit_Payments ON dbo.Case_Table.Case_ID = dbo.Case_Visit_Payments.Case_ID
WHERE     (dbo.Case_Visit_Payments.Item_ID = 15 AND A.ProjectCool = 1)
GROUP BY A.Conference, A.ConfInt
UNION
SELECT  B.ConfInt, 
        B.Conference, 
        SUM(dbo.Cool_Fan_Order.NumberOfFansRequested) AS Ordered, 
        SUM(dbo.Cool_Fan_Order.Qty_Fans_Approved) AS Approved, 
        SUM(dbo.Cool_Fan_Order.Qty_Fans_PickedUp) AS PickedUp, 
        NULL AS Qty
FROM         dbo.Conferences as B LEFT OUTER JOIN
                      dbo.Cool_Fan_Order ON B.ConfInt = dbo.Cool_Fan_Order.Conference_ID
where B.ProjectCool = 1
GROUP BY B.Conference, B.ConfInt

And here are the results:

4   Our Lady        NULL    NULL    NULL    11
4   Our Lady        40      40      40      NULL
7   Holy Rosary     20      20      20      NULL
11  Little Flower   NULL    NULL    NULL    21
11  Little Flower   5       5       20      NULL
19  Perpetual Help  NULL    NULL    NULL    2
19  Perpetual Help  20      20      20      NULL

What I would strongly prefer is not to have the duplicated rows, like this:

4   Our Lady        40      40      40      11
7   Holy Rosary     20      20      20      NULL
11  Little Flower   5       5       20      21
19  Perpetual Help  20      20      20      2

I hope this question was clear enough. Any Suggestions would be greatly appreciated. And I do mark as answered. :)

Gregory

The quick answer is to wrap your query inside another one:

SELECT ConfInt
     , Conference
     , SUM(Ordered) AS Ordered
     , SUM(Approved) As Approved
     , SUM(PickedUp) AS PickedUp
     , SUM(Qty) AS Qty
  FROM (

       <your UNION query here>  

      ) AS u
 GROUP BY ConfInt, Conference

This is not the only way to achieve the result set, but it's the quickest fix that meets the specified requirements.
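The effect of the outer GROUP BY can be checked against the sample output in the question. This sketch uses Python's sqlite3 as a stand-in and feeds it the seven UNION rows directly, so the CTE name u is illustrative. SUM ignores NULLs, so each pair collapses into one row, and a group whose values are all NULL (Holy Rosary's Qty) stays NULL.

```python
import sqlite3

# The seven rows returned by the UNION query, reproduced as a VALUES list.
COLLAPSE_SQL = """
WITH u (ConfInt, Conference, Ordered, Approved, PickedUp, Qty) AS (
    VALUES (4, 'Our Lady', NULL, NULL, NULL, 11),
           (4, 'Our Lady', 40, 40, 40, NULL),
           (7, 'Holy Rosary', 20, 20, 20, NULL),
           (11, 'Little Flower', NULL, NULL, NULL, 21),
           (11, 'Little Flower', 5, 5, 20, NULL),
           (19, 'Perpetual Help', NULL, NULL, NULL, 2),
           (19, 'Perpetual Help', 20, 20, 20, NULL)
)
SELECT ConfInt, Conference,
       SUM(Ordered) AS Ordered, SUM(Approved) AS Approved,
       SUM(PickedUp) AS PickedUp, SUM(Qty) AS Qty
FROM u
GROUP BY ConfInt, Conference
ORDER BY ConfInt
"""

conn = sqlite3.connect(":memory:")
for row in conn.execute(COLLAPSE_SQL):
    print(row)
# (4, 'Our Lady', 40, 40, 40, 11)
# (7, 'Holy Rosary', 20, 20, 20, None)
# (11, 'Little Flower', 5, 5, 20, 21)
# (19, 'Perpetual Help', 20, 20, 20, 2)
```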

As an alternative, I believe these queries will return equivalent results:

We could use a correlated subquery in the SELECT list to get Qty:

;WITH q AS
      ( SELECT B.ConfInt
             , B.Conference
             , SUM(o.NumberOfFansRequested) AS Ordered
             , SUM(o.Qty_Fans_Approved) AS Approved
             , SUM(o.Qty_Fans_PickedUp) AS PickedUp
          FROM dbo.Conferences as B
          LEFT
          JOIN dbo.Cool_Fan_Order o ON o.Conference_ID = B.ConfInt
         WHERE B.ProjectCool = 1
         GROUP BY B.ConfInt, B.Conference
      )
 SELECT q.ConfInt
      , q.Conference
      , q.Ordered
      , q.Approved
      , q.PickedUp
      , ( SELECT SUM(v.Qty)
            FROM dbo.Case_Table t
            JOIN dbo.Case_Visit_Payments v ON v.Case_ID = t.Case_ID
           WHERE t.Conference_ID = q.ConfInt
             AND v.Item_ID = 15
        ) AS Qty
   FROM q
  ORDER BY q.ConfInt, q.Conference

Or, we could use a LEFT JOIN between the two queries, rather than UNION. We know that the query referencing Cool_Fan_Order can be the left side of the outer join, because it returns at least as many rows as the other query. (Basically, we know that the other query can't return values of ConfInt and Conference that aren't in the Cool_Fan_Order query.)

;WITH p AS 
      ( SELECT A.ConfInt
             , A.Conference
             , SUM(v.Qty) AS Qty
          FROM dbo.Conferences as A
          JOIN dbo.Case_Table t ON t.Conference_ID = A.ConfInt
          JOIN dbo.Case_Visit_Payments v ON v.Case_ID = t.Case_ID
         WHERE A.ProjectCool = 1
           AND v.Item_ID = 15
         GROUP BY A.ConfInt, A.Conference
      )
    , q AS
      ( SELECT B.ConfInt
             , B.Conference
             , SUM(o.NumberOfFansRequested) AS Ordered
             , SUM(o.Qty_Fans_Approved) AS Approved
             , SUM(o.Qty_Fans_PickedUp) AS PickedUp
          FROM dbo.Conferences as B
          LEFT
          JOIN dbo.Cool_Fan_Order o ON B.ConfInt = o.Conference_ID
         WHERE B.ProjectCool = 1
         GROUP BY B.ConfInt, B.Conference
      )
 SELECT q.ConfInt
      , q.Conference
      , q.Ordered
      , q.Approved
      , q.PickedUp
      , p.Qty
   FROM q
   LEFT
   JOIN p ON p.ConfInt = q.ConfInt AND p.Conference = q.Conference
  ORDER BY q.ConfInt, q.Conference

The choice between those three (they all return an equivalent result set under all conditions) boils down to readability, maintainability, and performance. On large enough row sets, there may be some observable performance differences between the three statements.

How To Slow Down A SQL Query?

7 votes

As strange as it sounds I need to slow down a SQL query. Currently I'm using Microsoft SQL Server 2008 R2 on an in-house development server with the AdventureWorks database. I'm in the process of testing some code and the queries that I'm running are too fast no matter what I try!

Basically I'm testing a cut-off feature and need a sufficiently long query to be able to cut it off before it completes.

Unfortunately as it is a local installation there isn't a single query or large enough table in the AdventureWorks database to actually give me good data to work with. I've tried

WAITFOR DELAY '01:00'

Which worked great to just test to make sure it was working, but now I need to test to see if I can cut the data stream off mid-read. The WAITFOR statement doesn't do me justice in that respect because I need it to actively be retrieving data back from the server. My first intuition was to use convoluted calculations to slow it down, however even having SQL server multiply all the numerical values in the query by themselves 37 times only slowed down the query by milliseconds. The second thing I tried was embedding the WAITFOR statement in a sub-query but it appears you can't do that. Finally, the only thing I haven't tried is to execute multiple stored procedures and WAITFOR in between them, but I don't think that would work for what I need.

I have to say, I'm impressed at how hard it is to make an absolutely terrible query when you're this close to the server.

Is there any way I can slow down a query easily?

Thank you!

Just do a load of cross joins.

SELECT T1.*
FROM SomeTable T1,  
     SomeTable T2,  
     SomeTable T3,  
     SomeTable T4

For a 1,000-row table, that will generate 1,000^4 = 10^12 (one trillion) rows, which should be plenty slow enough.
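The row count of a self cross join is n^k for k copies of an n-row table, which is easy to check on a small table. The sketch below uses Python's sqlite3 as a stand-in, with SomeTable as in the answer and made-up contents. It also shows that such a result is streamed, so a client can read the first few rows and stop long before the full result exists, which is the mid-read cut-off scenario in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (id INTEGER)")
conn.executemany("INSERT INTO SomeTable VALUES (?)", [(i,) for i in range(10)])

CROSS_JOIN = "SELECT T1.* FROM SomeTable T1, SomeTable T2, SomeTable T3, SomeTable T4"

# 10 rows joined four ways: 10 ** 4 = 10,000 rows.
count = conn.execute(f"SELECT count(*) FROM ({CROSS_JOIN})").fetchone()[0]
print(count)  # 10000

# Grow the table to 1,000 rows: the full result would be 1,000 ** 4 = 10 ** 12 rows,
# but fetching only the first five returns immediately because rows are produced lazily.
conn.executemany("INSERT INTO SomeTable VALUES (?)", [(i,) for i in range(10, 1000)])
first_five = conn.execute(CROSS_JOIN).fetchmany(5)
print(len(first_five))  # 5
```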

How do I select X random rows while guaranteeing that Y certain specific rows are in the result?

6 votes

Assume a table structure:

CREATE TABLE Question
(
ID int PRIMARY KEY,
Category varchar,
Stem varchar,
AnswerA varchar,
...
AnswerD varchar,
Correct char,
isMandatory bit
)

For a given category, there are approximately 50 questions. There can be 1-10 mandatory questions.

I need to select all mandatory questions, and then enough other questions at random to make a question set of 20 questions.

OK, how about this:

select top 20 * from question
where category = @category
order by isMandatory desc, newid()

See the accepted answer to Random record from a database table (T-SQL) for the reasoning behind NEWID().
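The same ordering trick can be tried in Python's sqlite3, used here as a stand-in, where RANDOM() plays the role of NEWID() as a per-row random sort key and LIMIT replaces TOP. The sketch makes up a category of 50 questions, 7 of them mandatory. Because mandatory rows sort first, they all fit in the 20-row set as long as there are no more than 20 of them, which the question guarantees (1-10).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Question (ID INTEGER PRIMARY KEY, Category TEXT, isMandatory INTEGER)")
# Hypothetical data: 50 questions in one category, IDs 1-7 mandatory.
conn.executemany(
    "INSERT INTO Question VALUES (?, ?, ?)",
    [(i, "math", 1 if i <= 7 else 0) for i in range(1, 51)],
)

QUESTION_SET_SQL = """
SELECT ID, isMandatory FROM Question
WHERE Category = ?
ORDER BY isMandatory DESC, RANDOM()
LIMIT 20
"""

picked = conn.execute(QUESTION_SET_SQL, ("math",)).fetchall()
ids = {row[0] for row in picked}
print(len(picked), set(range(1, 8)) <= ids)  # 20 True
```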

Will SQL update a record if the new values are the same?

6 votes

Will SQL update a record if there is no change to the record?

For example, is it more efficient to replace

UPDATE myTable
SET Col1 = ISNULL(Col1,''),
...
    Col30 = ISNULL(Col30,'')

with

UPDATE myTable
SET Col1 = ISNULL(Col1,''),
...
    Col30 = ISNULL(Col30,'')
WHERE
Col1 IS NULL OR
...
Col30 IS NULL

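Yes, it will attempt the overwrite. Every row matched by the UPDATE is processed and counted as affected, even if its values don't actually change, so the version with the WHERE clause avoids touching rows that don't need it.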
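The difference shows up in the rows-affected count. The sketch below uses Python's sqlite3 as a stand-in, with a made-up two-column myTable. SQLite spells the function IFNULL rather than ISNULL. The unfiltered UPDATE touches every row even though only one row contains a NULL, while the filtered version touches only that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Col1 TEXT, Col2 TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)",
                 [("a", "b"), ("c", None), ("e", "f"), ("g", "h")])

unfiltered = conn.execute(
    "UPDATE myTable SET Col1 = IFNULL(Col1, ''), Col2 = IFNULL(Col2, '')"
).rowcount

conn.execute("UPDATE myTable SET Col2 = NULL WHERE Col1 = 'c'")  # restore the NULL
filtered = conn.execute(
    "UPDATE myTable SET Col1 = IFNULL(Col1, ''), Col2 = IFNULL(Col2, '') "
    "WHERE Col1 IS NULL OR Col2 IS NULL"
).rowcount

print(unfiltered, filtered)  # 4 1
```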

How to update conflict resolver when upgrading from SQL-Server 2005 to SQL-Server 2008

6 votes

We have recently upgraded from SQL Server 2005 to SQL Server 2008 (R2, SP1). This upgrade included some publications, where all tables are published with a default conflict resolver based on the "later wins" principle. Its friendly name is 'Microsoft SQL Server DATETIME (Later Wins) Conflict Resolver', and the corresponding DLL file is ssrmax.dll.

As you all know, once a table is published with a conflict resolver, the same conflict resolver must be used in all later publications that use this table. Fair enough, but when we add previously published tables to new publications and specify the very same conflict resolver for the table, we get an error message:

use [myDb]
exec sp_addmergearticle 
    @publication = N'myDb_Pub', 
    @article = N'Tbl_blablabla', 
    @source_owner = N'dbo', 
    @source_object = N'Tbl_blablabla', 
    @type = N'table', 
    @description = N'', 
    @creation_script = N'', 
    @pre_creation_cmd = N'drop', 
    @schema_option = 0x000000000C034FD1, 
    @identityrangemanagementoption = N'none', 
    @destination_owner = N'dbo', 
    @force_reinit_subscription = 1, 
    @column_tracking = N'false', 
    @article_resolver = N'Microsoft SQL Server DATETIME (Later Wins) Conflict Resolver', 
    @subset_filterclause = N'', 
    @resolver_info = N'ddmaj', 
    @vertical_partition = N'false', 
    @verify_resolver_signature = 0, 
    @allow_interactive_resolver = N'false', 
    @fast_multicol_updateproc = N'true', 
    @check_permissions = 0, 
    @subscriber_upload_options = 0, 
    @delete_tracking = N'true', 
    @compensate_for_errors = N'false', 
    @stream_blob_columns = N'false', 
    @partition_options = 0
GO

And this is the error we get:

The article '...' already exists in another publication with a different article resolver.

While trying to understand why the server does not consider the same conflict resolver to be 'the same conflict resolver', I discovered that the registry contained two conflict resolvers with the same name but different versions:

the 2005 version:

  • file ssrmax.dll,
  • version 2005.90.4035.0,
  • cls_id D604B4B5-686B-4304-9613-C4F82B527B10

the 2008 version:

  • file ssrmax.dll,
  • version 2009.100.2500.0,
  • cls_id 77209412-47CF-49AF-A347-DCF7EE481277

And I checked that our 2008 server considers the second one to be the 'available custom resolver' (I got this by running sp_enumcustomresolvers). The problem is that both references are available in the registry, so I guess that old publications refer to the 2005 version, while new publications try to refer to the 2008 version, which is indeed different from the previous one.

So the question is: how can I have the server consider only one of these 2 versions, and this (of course) without having to drop and recreate the existing publications (which would turn our life into hell for the next 2 weeks).

Well... nobody had an answer, but I think I've (finally) got it. Guess what... it is somewhere in the metamodel (as usual)!

  • When adding an item to the subscription, the stored procedure takes the new conflict resolver references from the [distribution].[MSmerge_articleresolver] table
  • But for existing subscriptions, the previous conflict resolver references are stored in the system tables of the publishing database, i.e. [sysmergearticles], [sysmergeextendedarticlesview], and [sysmergepartitioninfoview]

So on one side we have an item initially published with SQL Server 2005, where the publication references the 2005 conflict resolver, as per the publishing database metamodel. On the other side, the server will attempt to add the same item to a new publication, this time with a default reference to the conflict resolver available in the distribution database, which is indeed different from the 2005 one....

To illustrate this, you can check the following:

USE distribution
go
SELECT article_resolver, resolver_clsid
  FROM [MSmerge_articleresolver] WHERE article_resolver like '%Later Wins%'
  GO

Then,

USE myPublicationDatabase
go
SELECT article_resolver, resolver_clsid
  FROM [sysmergearticles] WHERE article_resolver like '%Later Wins%'
  GO
 SELECT article_resolver, resolver_clsid
  FROM [sysmergeextendedarticlesview] WHERE article_resolver like '%Later Wins%'
  GO
 SELECT article_resolver, resolver_clsid
  FROM [sysmergepartitioninfoview] WHERE article_resolver like '%Later Wins%'
  GO  

So it seems that I should update either the references in the distribution database or the references in the publication database. Let's give it a try!

Finding repeated occurrences with ranking functions

6 votes

Please help me build the following query; I've been struggling with it for some time now. Let's say I have a simple table with a month number and information on whether there were any failed events in that particular month.

Below is a script to generate sample data:

WITH DATA(Month, Success) AS
(
    SELECT  1, 0 UNION ALL
    SELECT  2, 0 UNION ALL
    SELECT  3, 0 UNION ALL
    SELECT  4, 1 UNION ALL
    SELECT  5, 1 UNION ALL
    SELECT  6, 0 UNION ALL
    SELECT  7, 0 UNION ALL
    SELECT  8, 1 UNION ALL
    SELECT  9, 0 UNION ALL
    SELECT 10, 1 UNION ALL
    SELECT 11, 0 UNION ALL
    SELECT 12, 1 UNION ALL
    SELECT 13, 0 UNION ALL
    SELECT 14, 1 UNION ALL
    SELECT 15, 0 UNION ALL
    SELECT 16, 1 UNION ALL
    SELECT 17, 0 UNION ALL
    SELECT 18, 0
)

Given the following definition of a "repeated failure":

When event failures occur during at least 4 months in any 6-month period, the last month with such a failure is a "repeated failure". My query should return the following output:

Month   Success RepeatedFailure
1       0   
2       0   
3       0   
4       1   
5       1   
6       0       R1
7       0       R2
8       1   
9       0   
10      1   
11      0       R3
12      1   
13      0   
14      1   
15      0   
16      1   
17      0
18      0       R1

where:

  • R1: 1st repeated failure, in month 6 (4 failures in the last 6 months).
  • R2: 2nd repeated failure, in month 7 (4 failures in the last 6 months).
  • R3: 3rd repeated failure, in month 11 (4 failures in the last 6 months).

R1 again: the 1st repeated failure in month 18, because repeated failures are numbered from the beginning again when a new repeated failure is the first one in the last 6 reporting periods.

Repeated failures are numbered consecutively because, based on that number, I must apply the appropriate multiplier:

  • 1st repeated failure: X2
  • 2nd repeated failure: X4
  • 3rd and subsequent repeated failures: X5
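The rule as stated above can be sketched in plain Python. This is a minimal sketch, not the asker's or the answerer's code. It assumes the months are consecutive integers and that a month is flagged when at least 4 of the 6 months ending with it (itself included) are failures. The helper names label_repeated_failures and multiplier are hypothetical.

```python
# Sample data from the question: (Month, Success); Success = 0 is a failure.
SAMPLE = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 0), (7, 0), (8, 1), (9, 0),
          (10, 1), (11, 0), (12, 1), (13, 0), (14, 1), (15, 0), (16, 1), (17, 0), (18, 0)]

WINDOW = 6        # length of the rolling period, in months
MIN_FAILURES = 4  # failures needed inside one window

def label_repeated_failures(rows):
    """Return {month: 'R<n>' or ''} for rows sorted by month (hypothetical helper)."""
    success = dict(rows)
    flagged = set()
    for month, _ in rows:
        window = [success[m] for m in range(month - WINDOW + 1, month + 1) if m in success]
        # Only complete six-month windows are judged (assumption: no partial windows).
        if len(window) == WINDOW and window.count(0) >= MIN_FAILURES:
            flagged.add(month)
    labels = {}
    for month, _ in rows:
        if month in flagged:
            # Ordinal = flagged months within the six months ending here,
            # so numbering restarts once older repeated failures fall out.
            n = sum(1 for f in flagged if 0 <= month - f < WINDOW)
            labels[month] = f"R{n}"
        else:
            labels[month] = ""
    return labels

def multiplier(label):
    """X2 for R1, X4 for R2, X5 for R3 and above (only call for flagged months)."""
    n = int(label[1:])
    return 2 if n == 1 else 4 if n == 2 else 5
```

On the sample data this flags months 6, 7, 11 and 18 as R1, R2, R3 and R1, matching the expected output table.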

I'm sure this can be improved, but it works. We essentially do two passes - the first to establish repeated failures, the second to establish what kind of repeated failure each is. Note that Intermediate2 can definitely be done away with, I've only separated it out for clarity. All the code is one statement, my explanation is interleaved:

;WITH DATA(Month, Success) AS
-- assuming your data  as defined (with my edit)
,Intermediate AS 
(
SELECT
    Month,
    Success,
    -- next column for illustration only
    (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS SuccessesInLastSixMonths,
    -- next column for illustration only
    6 - (SELECT SUM(Success) 
     FROM DATA hist 
     WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS FailuresInLastSixMonths,
    CASE WHEN 
            (6 - (SELECT SUM(Success) 
                    FROM DATA hist 
                    WHERE curr.Month - hist.Month BETWEEN 0 AND 5)) 
            >= 4 
            THEN 1
            ELSE 0 
    END AS IsRepeatedFailure
FROM DATA curr 
-- No real data until month 6
WHERE curr.Month > 5
)

At this point we have established, for each month, whether it's a repeated failure, by counting the failures in the six months up to and including it.

,Intermediate2 AS
(
SELECT 
    Month,
    Success,
    IsRepeatedFailure,
    (SELECT SUM(IsRepeatedFailure) 
        FROM Intermediate hist 
        WHERE curr.Month - hist.Month BETWEEN 0 AND 5) 
        AS RepeatedFailuresInLastSixMonths
FROM Intermediate curr
)

Now we have counted the number of repeated failures in the six months leading up to now

SELECT
    Month,
    Success,
    CASE IsRepeatedFailure 
        WHEN 1 THEN 'R' + CONVERT(varchar, RepeatedFailuresInLastSixMonths) 
        ELSE '' END
    AS RepeatedFailureText
FROM Intermediate2

so we can say, if this month is a repeated failure, what cardinality of repeated failure it is.

Result:

Month       Success     RepeatedFailureText
----------- ----------- -------------------------------
6           0           R1
7           0           R2
8           1           
9           0           
10          1           
11          0           R3
12          1           
13          0           
14          1           
15          0           
16          1           
17          0           
18          0           R1

(13 row(s) affected)
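The answer's two-pass logic can be checked against the question's sample data. This minimal sketch ports the statement to SQLite through Python's built-in sqlite3 module, so SQLite stands in for SQL Server here. Because of that:

- CONVERT and + are replaced by CAST and ||.
- The two illustration-only columns are dropped.
- The table name Data is hypothetical.

```python
import sqlite3

# Sample data from the question: (Month, Success).
SAMPLE = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 0), (7, 0), (8, 1), (9, 0),
          (10, 1), (11, 0), (12, 1), (13, 0), (14, 1), (15, 0), (16, 1), (17, 0), (18, 0)]

QUERY = """
WITH Intermediate AS (
    -- Pass 1: is this month a repeated failure (>= 4 failures in 6 months)?
    SELECT curr.Month, curr.Success,
           CASE WHEN (6 - (SELECT SUM(hist.Success) FROM Data hist
                           WHERE curr.Month - hist.Month BETWEEN 0 AND 5)) >= 4
                THEN 1 ELSE 0 END AS IsRepeatedFailure
    FROM Data curr
    WHERE curr.Month > 5
),
Intermediate2 AS (
    -- Pass 2: how many repeated failures in the six months up to this one?
    SELECT curr.Month, curr.Success, curr.IsRepeatedFailure,
           (SELECT SUM(hist.IsRepeatedFailure) FROM Intermediate hist
            WHERE curr.Month - hist.Month BETWEEN 0 AND 5) AS RepeatedFailuresInLastSixMonths
    FROM Intermediate curr
)
SELECT Month, Success,
       CASE IsRepeatedFailure
           WHEN 1 THEN 'R' || CAST(RepeatedFailuresInLastSixMonths AS TEXT)
           ELSE '' END AS RepeatedFailureText
FROM Intermediate2
ORDER BY Month
"""

def run_answer_query(rows):
    """Load rows into an in-memory table and run the ported query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Data (Month INTEGER PRIMARY KEY, Success INTEGER NOT NULL)")
        conn.executemany("INSERT INTO Data (Month, Success) VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Like the original, this returns 13 rows (months 6 to 18), with R1, R2, R3 and R1 in months 6, 7, 11 and 18.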

Performance considerations will depend on how much data you actually have.

Main table with hundreds vs few smaller

5 votes

I was wondering which approach is better for designing databases?

I currently have one big table (97 columns per row) with references to lookup tables where I could.

Wouldn't it be better for performance to group some columns into smaller tables and give them key columns for referencing one whole row?

If you split up your table into several parts, you'll need additional joins to get all your columns for a single row - that will cost you time.

97 columns isn't much, really - I've seen way beyond 100.

It all depends on how your data is being used - if your row just has 97 columns, all the time, and you need all 97 columns, then it hardly ever makes sense to split them up into various tables.

It might make sense if:

  • you can move some "large" columns (like XML, VARCHAR(MAX) etc.) into a separate table, if you don't need those all the time -> in that case, your "basic" row becomes smaller and your basic table will perform better - as long as you don't need those extra large columns.

  • you can move away some columns to a separate table that aren't always present, e.g. columns that might be "optional" and only present for e.g. 20% of the rows - in that case, you might save yourself some processing for the remaining 80% of the cases where those columns aren't needed.
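The split described in the two points above can be sketched with Python's built-in sqlite3 module standing in for SQL Server. This is a minimal sketch under stated assumptions: the tables Item and ItemDetail and their columns are hypothetical and not from the question. A narrow "basic" table is paired one-to-one, through a shared primary key, with a side table holding a large or optional column. The common query path skips the join, and a LEFT JOIN brings in the extra column only when it is needed.

```python
import sqlite3

# Hypothetical schema: narrow basic table + 1:1 side table for large/optional data.
SCHEMA = """
CREATE TABLE Item (
    ItemId    INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL,
    Price     REAL NOT NULL
);
CREATE TABLE ItemDetail (
    ItemId    INTEGER PRIMARY KEY REFERENCES Item(ItemId),  -- same key => 1:1
    LongNotes TEXT NOT NULL
);
"""

def build(conn):
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Item VALUES (?, ?, ?)",
                     [(1, "bolt", 0.10), (2, "nut", 0.05)])
    # Only some items have the optional/large data.
    conn.execute("INSERT INTO ItemDetail VALUES (1, 'very long text ...')")

def basic_rows(conn):
    # Common path: narrow rows, no join.
    return conn.execute("SELECT ItemId, Name, Price FROM Item ORDER BY ItemId").fetchall()

def full_rows(conn):
    # Occasional path: pay for the join only when the large column is needed.
    # LEFT JOIN keeps items that have no detail row (LongNotes comes back as NULL).
    return conn.execute(
        "SELECT i.ItemId, i.Name, d.LongNotes FROM Item i "
        "LEFT JOIN ItemDetail d ON d.ItemId = i.ItemId ORDER BY i.ItemId").fetchall()
```

The trade-off is the one stated above: the extra join costs time, so the split only pays off when the side-table columns are rarely read.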

Join tables ON A ∩ B conditions

5 votes

I wish to join two (or more) tables based on a set of columns that are present in all tables that take part in the join. In other words I wish to join tables based on the intersection set of columns. However each table has extra columns which are unique to that table.

Intersection set

Example

#: A number
-: NULL

Table A

+------+------+------+
| Col1 | Col2 | ColA |
+------+------+------+
| A    | A    | #    |
| A    | B    | #    |
+------+------+------+

Table B

+------+------+------+
| Col1 | Col2 | ColB |
+------+------+------+
| A    | A    | #    |
| B    | B    | #    |
+------+------+------+

Result

+------+------+------+------+
| Col1 | Col2 | ColA | ColB |
+------+------+------+------+
| A    | A    | #    | #    |
| A    | B    | #    | -    |
| B    | B    | -    | #    |
+------+------+------+------+

I've come up with a solution, but its performance is horrid, and performance is an issue. I don't want to pollute you with that solution; I'd much rather have a fresh set of eyes on this :)

Looking forward to seeing your solutions. Thank you for your time. It's much appreciated.

UPDATE

Thank you for all the responses. However, it seems I didn't explain the problem well enough. (I haven't tested all the answers yet.)

But note how Table B has a row that's not present in Table A.

Table B
+------+------+------+
| Col1 | Col2 | ColB |
+------+------+------+
| B    | B    | #    |
+------+------+------+

And vice versa for Table A.

The solution I've come up with unions all tables together on the intersection set of columns to generate a skeleton.

Skeleton:    
SELECT Col1, Col2 FROM TableA
UNION
SELECT Col1, Col2 FROM TableB

Once I have the skeleton I LEFT OUTER JOIN for each table.

LEFT OUTER JOIN TableA AS a ON a.Col1=skeleton.Col1 AND a.Col2=skeleton.Col2
LEFT OUTER JOIN TableB AS b ON b.Col1=skeleton.Col1 AND b.Col2=skeleton.Col2

So the final query looks like this:

SELECT s.*, a.ColA, b.ColB 
FROM 
(
    SELECT Col1, Col2
    FROM TableA
    UNION
    SELECT Col1, Col2
    FROM TableB
) s
LEFT OUTER JOIN TableA a ON a.Col1=s.Col1 AND a.Col2=s.Col2
LEFT OUTER JOIN TableB b ON b.Col1=s.Col1 AND b.Col2=s.Col2
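The skeleton query can be checked against the example tables. This minimal sketch runs it on SQLite through Python's built-in sqlite3 module, so SQLite stands in for SQL Server. The "#" placeholders from the question are replaced with made-up numbers purely for illustration, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

SKELETON_QUERY = """
SELECT s.Col1, s.Col2, a.ColA, b.ColB
FROM (
    SELECT Col1, Col2 FROM TableA
    UNION                      -- distinct key combinations from both tables
    SELECT Col1, Col2 FROM TableB
) s
LEFT OUTER JOIN TableA a ON a.Col1 = s.Col1 AND a.Col2 = s.Col2
LEFT OUTER JOIN TableB b ON b.Col1 = s.Col1 AND b.Col2 = s.Col2
ORDER BY s.Col1, s.Col2
"""

def skeleton_join(rows_a, rows_b):
    """rows_a: (Col1, Col2, ColA) tuples; rows_b: (Col1, Col2, ColB) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE TableA (Col1 TEXT, Col2 TEXT, ColA INTEGER)")
        conn.execute("CREATE TABLE TableB (Col1 TEXT, Col2 TEXT, ColB INTEGER)")
        conn.executemany("INSERT INTO TableA VALUES (?, ?, ?)", rows_a)
        conn.executemany("INSERT INTO TableB VALUES (?, ?, ?)", rows_b)
        return conn.execute(SKELETON_QUERY).fetchall()
    finally:
        conn.close()
```

With A = (A, A, 1), (A, B, 2) and B = (A, A, 10), (B, B, 20), this yields the three rows of the Result table, with NULL (None) where a side has no match.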

Just the thing for a full outer join:

select  coalesce(a.Col1, b.Col1) as Col1
,       coalesce(a.Col2, b.Col2) as Col2
,       a.ColA
,       b.ColB
from    A a
full outer join
        B b
on      a.Col1 = b.Col1
        and a.Col2 = b.Col2
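Why the answer needs COALESCE can be seen in this minimal Python sketch of what a full outer join produces. It is an illustration of the semantics, not of how SQL Server executes the join. It pairs matching rows, keeps unmatched rows from either side with the other side missing, and then takes the shared columns from whichever side is present. It assumes the join columns are never NULL: Python's None == None is true, whereas SQL's NULL = NULL is not. The helper names are hypothetical.

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

def full_outer_join(rows_a, rows_b):
    """rows_a: (Col1, Col2, ColA); rows_b: (Col1, Col2, ColB) tuples."""
    b_by_key = {}
    for i, b in enumerate(rows_b):
        b_by_key.setdefault((b[0], b[1]), []).append(i)
    pairs, matched_b = [], set()
    for a in rows_a:
        hits = b_by_key.get((a[0], a[1]), [])
        pairs.extend((a, rows_b[i]) for i in hits)   # matched pairs
        matched_b.update(hits)
        if not hits:
            pairs.append((a, None))                  # A-only row: b.* is NULL
    pairs.extend((None, b) for i, b in enumerate(rows_b)
                 if i not in matched_b)              # B-only rows: a.* is NULL
    # Project like the SELECT list: COALESCE the shared columns.
    return [(coalesce(a and a[0], b and b[0]),
             coalesce(a and a[1], b and b[1]),
             a[2] if a else None,
             b[2] if b else None) for a, b in pairs]
```

Without the COALESCE, the (B, B) row would come back with NULL in Col1 and Col2, because TableA has no such row.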

5 votes

Is there a keyword or metainformation in SQL Server that'll tell you if TOP took effect?

EX:
Select TOP 5 * From Stuff

RESULT: 5 rows

What is the best way to determine if there would have been 6 or more?

I could do:
SELECT TOP 6 count(*) FROM Stuff

But I am concerned about a separate call to retrieve the count, because the actual query is much more complicated than this one and runs against a large table.

Thanks!

Well, you could select the top N+1 (where N in your example is 5, so in your example select the top 6), discard the last row in your client code, and use the presence of a sixth row to determine whether TOP would have had an effect had you used N in the first place. I am not sure there is much value in doing this, however.
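The N+1 trick can be done in a single query. This minimal sketch uses Python's built-in sqlite3 module, where LIMIT plays the role of T-SQL's TOP. In SQL Server the equivalent would be something like SELECT TOP (@n + 1) ... ORDER BY .... The table Stuff with a single Id column is a hypothetical stand-in for the real query.

```python
import sqlite3

def fetch_top_with_flag(conn, n):
    """Return (first n rows, True if at least one more row exists), using one query."""
    rows = conn.execute(
        "SELECT Id FROM Stuff ORDER BY Id LIMIT ?", (n + 1,)  # ask for one extra row
    ).fetchall()
    # The extra row is only a signal; it is discarded before returning.
    return rows[:n], len(rows) > n
```

The ORDER BY matters: without it, which N rows come back (and therefore what is "left over") is not defined.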

SQL SELECT using XML input

4 votes

I've currently got a C# application that responds to HTTP requests. The body of the HTTP request (XML) is passed to SQL Server, at which point the database engine performs the correct instruction. One of the instructions is used to load information about invoices using the ID of the customer (InvoiceLoad):

<InvoiceLoad ControlNumber="12345678901">
   <Invoice>
      <CustomerID>johndoe@gmail.com</CustomerID>
   </Invoice>
</InvoiceLoad>  

I need to perform a SELECT operation against the invoice table (which contains the associated email address).

I've tried using:

SELECT 'Date', 'Status', 'Location' 
FROM Invoices 
WHERE Email_Address = Invoice.A.value(.)  

using an xml.nodes('InvoiceLoad/Invoice/CustomerId') Invoice(A) command.

However, as this query may run THOUSANDS of times per minute, I want to make it as fast as possible. I'm hearing that one way to do this may be to use CROSS APPLY (which I have never used). Is that the solution? If not, how exactly would I go about making this query as fast as possible? Any and all suggestions are greatly appreciated!

I don't see why you would need a call to .nodes() at all - from what I understand, each XML fragment has just a single entry - right?

So given this XML:

<InvoiceLoad ControlNumber="12345678901">
   <Invoice>
      <CustomerID>johndoe@gmail.com</CustomerID>
   </Invoice>
</InvoiceLoad>  

you can use this SQL to get the value of the <CustomerID> node:

DECLARE @xmlvar XML

SET @xmlvar = '<InvoiceLoad ControlNumber="12345678901">
   <Invoice>
      <CustomerID>johndoe@gmail.com</CustomerID>
   </Invoice>
</InvoiceLoad>'

SELECT 
   @xmlvar.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(100)') 

and you can join this against your customer table or whatever you need to do.
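A rough Python analogue of the value('(/InvoiceLoad/Invoice/CustomerID)[1]', ...) call above may help with testing the XML outside the database. It uses the standard-library xml.etree.ElementTree, takes the first matching node's text, and returns None when there is no match (value() returns NULL in that case). The helper name is hypothetical. XML element names are case-sensitive, so a path spelled CustomerId (as in the question's nodes() attempt) will not match a <CustomerID> element.

```python
import xml.etree.ElementTree as ET

def extract_customer_id(xml_text):
    """First /InvoiceLoad/Invoice/CustomerID text, or None if absent (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != "InvoiceLoad":          # the XPath is rooted at /InvoiceLoad
        return None
    node = root.find("Invoice/CustomerID")  # first match only, like [1]
    return node.text if node is not None else None
```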

If you have the XML stored in a table, and you always need to extract that value from <CustomerID>, you could also think about creating a computed, persisted column on that table that would extract that e-mail address into a separate column, which you could then use for easy joining. This requires a little bit of work - a stored function taking the XML as input - but it's really quite a nice way to "surface" certain important snippets of data from your XML.

Step 1: create your function

CREATE FUNCTION dbo.ExtractCustomer (@input XML)
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS BEGIN
    DECLARE @Result VARCHAR(255)

    SELECT 
        @Result = @input.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(255)') 

    RETURN @Result
END

So given your XML, you get the one <CustomerID> node and extract its "inner text" and return it as a VARCHAR(255).

Step 2: add a computed, persisted column to your table

ALTER TABLE dbo.YourTableWithTheXML
ADD CustomerID AS dbo.ExtractCustomer(YourXmlColumnHere) PERSISTED

Now, your table that has the XML column has a new column - CustomerID - which will automagically contain the contents of the <CustomerID> as a VARCHAR(255). The value is persisted, i.e. as long as the XML doesn't change, it doesn't have to be re-computed. You can use that column like any other on your table, and you can even index it to speed up any joins on it!