Best sql-server questions in March 2012

What is the recommended way to build functionality similar to Stackoverflow's "Inbox"?

11 votes

I have an asp.net-mvc website and people manage a list of projects. Based on some algorithm, I can tell if a project is out of date. When a user logs in, i want it to show the number of stale projects (similar to when i see a number of updates in the inbox).

The algorithm to calculate stale projects is kind of slow so if everytime a user logs in, i have to:

  1. Run a query for all project where they are the owner
  2. Run the IsStale() algorithm
  3. Display the count where IsStale = true

My guess is that will be real slow. Also, on everything project write, i would have to recalculate the above to see if changed.

Another idea i had was to create a table and run a job everything minutes to calculate stale projects and store the latest count in this metrics table. Then just query that when users log in. The issue there is I still have to keep that table in sync and if it only recalcs once every minute, if people update projects, it won't change the value until after a minute.

Any idea for a fast, scalable way to support this inbox concept to alert users of number of items to review ??

The first step is always proper requirement analysis. Let's assume I'm a Project Manager. I log in to the system and it displays my only project as on time. A developer comes to my office an tells me there is a delay in his activity. I select the developer's activity and change its duration. The system still displays my project as on time, so I happily leave work.

How do you think I would feel if I receive a phone call at 3:00 AM from the client asking me for an explanation of why the project is no longer on time? Obviously, quite surprised, because the system didn't warn me in any way. Why did that happen? Because I had to wait 30 seconds (why not only 1 second?) for the next run of a scheduled job to update the project status.

That just can't be a solution. A warning must be sent immediately to the user, even if it takes 30 seconds to run the IsStale() process. Show the user a loading... image or anything else, but make sure the user has accurate data.

Now, regarding the implementation, nothing can be done to run away from the previous issue: you will have to run that process when something that affects some due date changes. However, what you can do is not unnecessarily run that process. For example, you mentioned that you could run it whenever the user logs in. What if 2 or more users log in and see the same project and don't change anything? It would be unnecessary to run the process twice.

Whatsmore, if you make sure the process is run when the user updates the project, you won't need to run the process at any other time. In conclusion, this schema has the following advantages and disadvantages compared to the "polling" solution:

Advantages

  • No scheduled job
  • No unneeded process runs (this is arguable because you could set a dirty flag on the project and only run it if it is true)
  • No unneeded queries of the dirty value
  • The user will always be informed of the current and real state of the project (which is by far, the most important item to address in any solution provided)

Disadvantages

  • If a user updates a project and then upates it again in a matter of seconds the process would be run twice (in the polling schema the process might not even be run once in that period, depending on the frequency it has been scheduled)
  • The user who updates the project will have to wait for the process to finish

Changing to how you implement the notification system in a similar way to StackOverflow, that's quite a different question. I guess you have a many-to-many relationship with users and projects. The simplest solution would be adding a single attribute to the relationship between those entities (the middle table):

Cardinalities: A user has many projects. A project has many users

That way when you run the process you should update each user's Has_pending_notifications with the new result. For example, if a user updates a project and it is no longer on time then you should set to true all users Has_pending_notifications field so that they're aware of the situation. Similarly, set it to false when the project is on time (I understand you just want to make sure the notifications are displayed when the project is no longer on time).

Taking StackOverflow's example, when a user reads a notification you should set the flag to false. Make sure you don't use timestamps to guess if a user has read a notification: logging in doesn't mean reading notifications.

Finally, if the notification itself is complex enough, you can move it away from the relationship between users and projects and go for something like this:

Cardinalities: A user has many projects. A project has many users. A user has many notifications. A notifications has one user. A project has many notifications. A notification has one project.

I hope something I've said has made sense, or give you some other better idea :)

Comparing DateTime structs to find free slots

10 votes

I would like to search through the events of all of the users in a list and retrieve all times where every user is free of 30mins or greater between 7AM-7PM.

There is a catch however, if a method is marked as 'recurring', ie the bit recurring is set to 1, then that event recurs for a period of 52 weeks after its beginning (so the time is not available). Retrieval of these events are taken care of in a stored procedure.

My code so far is below. Am I going about writing this procedure the right way? I'm not really sure how to proceed to get the function to return as I would like. Would anyone be able to help me with this?

List<string> usernames = //List of usernames.
DateTime start = //DateTime for start of period you would like to schedule meeting
DateTime end = //DateTime for end of period
//int mins = //duration of meeting (must be 30mins or greater)

foreach (string username in usernames) {
   //retrieve events for this user
    var db = Database.Open("mPlan");
    List<DateTime> startTimes;
    List<DateTime  endTimes;
    // This stored procedure returns all events of a user in a given time period, 
    // including recurring events.
    var record = db.Query("EXEC dbo.GetEvents @0, @1, @2", username, start, end);
    foreach(var record in result) {
          startTimes.Add(record.event_start);
          endTimes.Add(record.event_end);
    }
    // so now I have a list of all start times and end times of events
    // for one user and could save all this data in a list
  }

Edit :

Table structure :

DECLARE @Users TABLE
(    
    UserID   INT IDENTITY(1,1),
    Username VARCHAR(32)
);

DECLARE @Groups TABLE
(
    GroupID   INT IDENTITY(1,1),
    GroupName VARCHAR(32)
);

DECLARE @Membership TABLE
(
    UserID  INT,
    GroupID INT
);

DECLARE @event TABLE
(
    event_id    INT IDENTITY(1,1),
    event_start DATETIME,
    event_end   DATETIME,
    group_id    INT,
    recurring   BIT
);

Example of functionality I would like :

User adds multiple users from the database to a list. User selects a time period over which he would like to have a meeting with all of these users. My algorithm computes all time periods that are free for all users (i.e a times that would be suitable to have a meeting between all users and are >30mins ).

Additional information :

Sample cases :

  • User A attempts to organize a meeting with User B. All timeslots are free. I would like the algorithm to return a DateTime start and DateTime end of all possible combinations of start times and end times that are >30mins and == duration ( a parameter ).

  • Typical case : User A has events planned for all times except 6pm - 7pm. He attempts to organize a meeting with user B for duration of 1 hour. User B has no events organized - the DateTime 6PM and DateTime 7pm are returned to indicate the start and end time of meetings.

  • Recurring case : User A has a recurring event at 5pm-6pm on a Monday. He tries to organize a meeting of 2 hours on a monday in six weeks time. All combinations of DateTime start and DateTime end where there is a difference of 2 hours are returned. The time 5pm-7pm is not returned, since this event is recurring and occurs every week for 52 weeks.

Edit :

Here is the stored procedure which retrieves all of a users events for a set time period (start, end) :

ALTER PROCEDURE dbo.GetEvents @UserName VARCHAR(50), @StartDate DATETIME, @EndDate DATETIME 
AS 

BEGIN 
-- DEFINE A CTE TO GET ALL GROUPS ASSOCIATED WITH THE CURRENT USER 
;WITH Groups AS  
(   SELECT  GroupID  
    FROM    Membership  m 
            INNER JOIN Users u 
                ON m.UserID = u.UserID 
    WHERE   Username = @UserName 
    GROUP BY GroupID 
), 
-- DEFINE A CTE TO GET ALL EVENTS FOR THE GROUPS DEFINED ABOVE 
AllEvents AS 
(   SELECT  e.* 
    FROM    event e 
            INNER JOIN Groups m  
                ON m.GroupID = e.group_id 
    UNION ALL 
    SELECT  e.event_id, e.title, e.description, DATEADD(WEEK, w.weeks, e.event_start), DATEADD(WEEK, w.weeks, e.event_end), e.group_id, e.recurring 
    FROM    event e 
            INNER JOIN Groups m  
                ON m.GroupID = e.group_id 
            CROSS JOIN  
            (   SELECT  ROW_NUMBER() OVER (ORDER BY Object_ID) AS weeks 
                FROM    SYS.OBJECTS 
            ) AS w 
    WHERE  e.recurring = 1 
)    
-- GET ALL EVENTS WHERE THE EVENTS FALL IN THE PERIOD DEFINED 
SELECT  * 
FROM    AllEvents 
WHERE   Event_Start >= @StartDate 
AND     Event_End <= @EndDate 

END 

So imagine some tables:

USE tempdb;
GO

CREATE TABLE dbo.Users
(    
    UserID   INT IDENTITY(1,1),
    Username VARCHAR(32)
);

CREATE TABLE dbo.Groups
(
    GroupID   INT IDENTITY(1,1),
    GroupName VARCHAR(32)
);

CREATE TABLE dbo.Membership
(
    UserID  INT,
    GroupID INT
);

CREATE TABLE dbo.[event]
(
    event_id    INT IDENTITY(1,1),
    event_start DATETIME,
    event_end   DATETIME,
    group_id    INT,
    recurring   BIT
);

And imagine that some sample data weren't so difficult to provide:

INSERT dbo.Users(Username) 
    SELECT 'User A' UNION ALL SELECT 'User B';

INSERT dbo.Groups(GroupName) 
    SELECT 'Group 1' UNION ALL SELECT 'Group 2';

INSERT dbo.Membership(UserID, GroupID)
    SELECT 1,1 UNION ALL SELECT 2,2;


INSERT dbo.[event](event_start, event_end, group_id, recurring)
-- user A, almost all day meeting on a specific date
SELECT '20120313 07:00', '20120313 18:00', 1, 0 

-- user A, recurring meeting every Monday
UNION ALL SELECT '20120312 17:00', '20120312 18:00', 1, 1 

-- user A, recurring meeting every Tuesday (future)
UNION ALL SELECT '20120327 14:00', '20120327 15:00', 1, 1; 
GO

Now we can build this stored procedure:

CREATE PROCEDURE dbo.GetPossibleMeetingTimes
    @AskingUserID INT,
    @TargetUserID INT,
    @Duration     INT,           -- in minutes!
    @StartDate    SMALLDATETIME, -- assumes date, no time!
    @EndDate      SMALLDATETIME  -- again - date, no time!
AS
BEGIN
    SET NOCOUNT ON;

    ;WITH dRange(d) AS
    (
        -- get the actual dates in the requested range
        -- limited to number of rows in sys.objects

        SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate)+1) 
            DATEADD(DAY, n-1, @StartDate)
         FROM (SELECT n = ROW_NUMBER() OVER (ORDER BY [object_id])
          FROM sys.objects) AS x
    ), possible(ds, de) AS
    (
        -- get all the timeslots of @Duration minutes 
        -- between 7:00 AM and 7:00 PM for each day in 
        -- the range - these are all *potential* slots

        SELECT DATEADD(MINUTE, 30*rn, DATEADD(HOUR, 7, dRange.d)),
            DATEADD(MINUTE, 30*rn + @Duration, DATEADD(HOUR, 7, dRange.d))
        FROM (SELECT TOP (720/30) rn = ROW_NUMBER() OVER
        (ORDER BY [object_id])-1 FROM sys.objects) AS x
        CROSS JOIN dRange
    )
    SELECT p.ds, p.de FROM possible AS p 
    WHERE p.de <= DATEADD(HOUR, 19, DATEADD(DAY, DATEDIFF(DAY, 0, p.de), 0)) 
    AND NOT EXISTS 
    (
        SELECT 1 FROM 
        (
            -- filter down to users with events on the days in the range

            SELECT group_id, event_start, event_end
                FROM dbo.[event] 
                WHERE event_start >= @StartDate 
                AND event_start < DATEADD(DAY, 1, @EndDate)
            UNION ALL 

            -- also include users with recurring events on same weekday(s)
            -- normalized to the matching day in the range

            SELECT group_id, 
              event_start = DATEADD(DAY, DATEDIFF(DAY, event_start, p.ds), event_start),
              event_end   = DATEADD(DAY, DATEDIFF(DAY, event_end,   p.ds), event_end)
            FROM dbo.[event]
            WHERE recurring = 1 
            AND event_start <= DATEADD(DAY, 1, @EndDate) -- ignore future events
                    AND event_start >= DATEADD(WEEK, -52, @EndDate) -- 52 weeks out 
            AND DATEDIFF(DAY, event_start, p.ds) % 7 = 0 -- same weekday
        ) AS sub
        WHERE sub.group_id IN 
        (
            -- this checks that events are within previously scheduled times

            SELECT GroupID FROM dbo.Membership
              WHERE UserID IN (@AskingUserID, @TargetUserID)
              AND (p.de > sub.event_start AND p.ds < sub.event_end)
        )
    )
    ORDER BY p.ds, p.de;
END
GO

Example calls:

-- Case 1: User A tries to meet with User B on a day where 
-- both schedules are clear.

EXEC dbo.GetPossibleMeetingTimes
    @AskingUserID = 1,
    @TargetUserID = 2,
    @Duration     = 30,
    @StartDate    = '20120314', -- no events for either user
    @EndDate      = '20120314';

Results:

no events for either user

-- Case 2: User A tries to meet with User B for an hour, on 
-- a day where user A has meetings from 7 AM to 6 PM.

EXEC dbo.GetPossibleMeetingTimes
    @AskingUserID = 1,
    @TargetUserID = 2,
    @Duration     = 60,
    @StartDate    = '20120313', -- user A has an almost all-day event
    @EndDate      = '20120313';

Results:

user A is busy almost all day

-- Case 3: User A tries to meet with User B for two hours, on 
-- a weekday where User A has a recurring meeting from 5-6 PM

EXEC dbo.GetPossibleMeetingTimes
    @AskingUserID = 1,
    @TargetUserID = 2,
    @Duration     = 120,        
    @StartDate    = '20120319', -- user A has a recurring meeting
    @EndDate      = '20120319';

Results:

user A has a recurring meeting

Now note that I took care of several factors you either haven't considered or haven't mentioned (such as a recurring event that starts in the future). On the other hand, I also didn't deal with some other factors (e.g. daylight savings time, if it may impact this at all) and didn't test all possible scenarios (e.g. multiple events on the same day that already exist).

I did test that if you pass in a range (e.g. 2012-03-12 -> 2012-03-14) you will essentially just get a union of the above results with roughly the same time slots made available (these vary based on duration of course). The important part is that the blackout timeslots are honored. I did not test the logic for the case where a recurring event starts in the future and the provided date range includes that weekday both before and after the first instance of the event.

If any case doesn't work for you, then this is exactly why it's important that you show us all your cases using sample data, not word problems and also explain the desired results of the query given that data.

EDIT - to handle more than 2 users, you only need a few changes. If you add a split function as follows:

CREATE FUNCTION dbo.SplitInts( @List VARCHAR(MAX) )
RETURNS TABLE
AS
   RETURN 
   ( SELECT Item = CONVERT(INT, Item) FROM (
      SELECT Item = x.i.value('(./text())[1]', 'INT') FROM (
       SELECT [XML] = CONVERT(XML, '<i>' + REPLACE(@List, ',', '</i><i>') 
         + '</i>').query('.')) AS a CROSS APPLY [XML].nodes('i') AS x(i)) AS y
       WHERE Item IS NOT NULL
   );

Now very minor changes to the stored procedure (I've left out the unchanged bits):

ALTER PROCEDURE dbo.GetPossibleMeetingTimes
    @UserIDList   VARCHAR(MAX),  -- removed other two parameters
    @Duration     INT,           
    @StartDate    SMALLDATETIME, 
    @EndDate      SMALLDATETIME  
AS
...
        WHERE sub.group_id IN -- changed the code within this subquery
        (
            SELECT GroupID FROM dbo.Membership AS m
              INNER JOIN dbo.SplitInts(@UserIDList) AS i
              ON m.UserID = i.Item
              WHERE (p.de > sub.event_start AND p.ds < sub.event_end)
        )
...

So then your call just changes slightly to:

EXEC dbo.GetPossibleMeetingTimes
     @UserIDList = '1,2,3,4,5',
     @Duration   = 30,
     @StartDate  = '20120314',
     @EndDate    = '20120314';

Just make sure that the requester is included in the comma-separated list.

PS this addendum is untested.

How to implement uniqueness where the order of the fields does not matter

8 votes

I think the following example will explain the situation best. Let's say we have the following table structure:

-------------------------------------
Member1   int      NOT NULL (FK)(PK)
Member2   int      NOT NULL (FK)(PK)
-------------------------------------
Statust   char(1)  NOT NULL

Here are the table contents for the table:

Member1    Member2    Status
----------------------------
  100        105        A

My question is how do I implement uniqueness so that the following INSERT statement will FAIL based on that one row already in the table.

INSERT status_table (Member1,Member2,Status) VALUES(105,100,'D');

Basically, I'm trying to model a relationship between two members. The Status field is the same whether we have (100,105) or (105,100).

I know I could use a before_insert and before_update trigger to check the contents in the table. But I was wondering if there was a better way to do it... Should my database model be different...

If you can make sure that all applications/users store the members' IDs in least-to-greatest order (the least MemberID in Member1 and the greatest in Member2), then you could simply add a Check constraint:

ALTER TABLE Status_table
  ADD CONSTRAINT Status_table_Prevent_double_pairs
    CHECK (Member1 < Member2)

If you don't want to do that or you want that extra info to be stored (because you are storing (just an example) that "member 100 invited (liked, killed, whatever) member 150" and not vice versa), then you could use @Tegiri's approach, modified a little (multiplying two big enough integers would be an overflow problem otherwise):

CREATE TABLE Status_table
( Member1 INT NOT NULL
, Member2 INT NOT NULL
, Status CHAR(1) NOT NULL
, MemberOne  AS CASE WHEN Member1 < Member2 THEN Member1 ELSE Member2 END
          --- a computed column
, MemberTwo  AS CASE WHEN Member1 < Member2 THEN Member2 ELSE Member1 END
          --- and another one
, PRIMARY KEY (Member1, Member2)
, UNIQUE (MemberOne, MemberTwo)
, ...                                    --- FOREIGN KEY details, etc 
) ;

Good reference for SQL to use with SQL Server 2008

7 votes

I had been working on Oracle long time back and has exposure to SQL. Now I need to brush up my SQL skills for a project on SQL Server 2008. Could you please recommend a good resource (online/book)?

The "books online" help isn't too bad for SQL2008 i.e. usually several examples, shows the syntax clearly. You can download it separately.

"Timeout expired" error, when executing view in SQL Server 2008

7 votes

I've written a query in SQL Server 2008. The query takes about 4 minutes to execute.
I need this query as a View. So, I've created a view with this query and when I try to execute the view creation script, it shows the following error:

Timeout Expired.
The timeout period elapsed prior to completion of the operation or the server is not responding.

The query is:

SELECT t.jrnno,
       (SELECT SUM(t1.amount)
          FROM dbo.T_sh AS t1
         WHERE (t1.b_or_s = '1') AND (t1.jrnno = t.jrnno)) AS buy,
       (SELECT SUM(t2.amount)
          FROM dbo.T_sh AS t2
         WHERE (t2.b_or_s = '2') AND (t2.jrnno = t.jrnno)) AS sale, 
       SUM(t.amount) AS Total, 
       SUM(t.h_crg) AS Howla, 
       SUM(t.l_crg) AS Laga, 
       SUM(t.taxamt) AS Tax, 
       SUM(t.commsn) AS Commission
  FROM dbo.T_sh AS t
 WHERE (t.tran_type = 'S')  
   AND (t.jrnno NOT IN (SELECT DISTINCT jrnno
                          FROM dbo.T_ledger))
GROUP BY t.jrnno

T_sh and T_ledger both tables have about 100K rows. What could be the possible reason and how can I overcome this?

Update:

select 
   t.jrnno, 
   SUM(CASE WHEN t.b_or_s = 1 THEN t.amount ELSE NULL END) buy,
   SUM(CASE WHEN t.b_or_s = 2 THEN t.amount ELSE NULL END) sale,
   SUM(t.amount) AS Total, 
   SUM(t.h_crg) AS Howla, 
   SUM(t.l_crg) AS Laga, 
   SUM(t.taxamt) AS Tax, 
   SUM(t.commsn) AS Commission
FROM 
   dbo.t_sh t
WHERE  
   t.tran_type = 'S'
   AND NOT EXISTS(SELECT 1 FROM  dbo.T_ledger x where x.jrnno = t.jrnno)
group by 
   t.jrnno

It solved my problem. Thanks everyone for your quick response.

Try this query:

select 
t.jrno, 
SUM(CASE WHEN t1.b_or_s = 1 THEN t.amount ELSE NULL END) buy,
SUM(CASE WHEN t1.b_or_s = 2 THEN t.amount ELSE NULL END) sale,
SUM(t.amount) AS Total, 
SUM(t.h_crg) AS Howla, 
SUM(t.l_crg) AS Laga, 
SUM(t.taxamt) AS Tax, 
SUM(t.commsn) AS Commission
FROM dbo.t_sh t
WHERE  t.tran_type = 'S'
AND NOT EXISTS(SELECT 1 FROM  dbo.T_ledger x x.jrno = t.jrno)

How to write a stored procedure which contains multiple parameters for in query?

7 votes

I want to write a stored procedure like this

Create Proc dbo.GetApplicantsByIDs

as

Select * from Applicants where ID in (1,2,3,4)

How i can pass 1,2,3 as parameters and these ids may be multiple.

You can send your id's as XML to the SP.

create procedure dbo.GetApplicantsByIDs
  @IDList xml
as

-- Table to hold the id's
declare @IDs table(ID int primary key)

-- Fill table with id's  
insert into @IDs(ID)
select X.ID.value('.', 'int')
from @IDList.nodes('/i') as X(ID)

select * 
from Applicants 
where ID in (select ID 
             from @IDs)

The parameter string should look like this:

'<i>1</i><i>2</i><i>3</i>'

Does INSERT INTO ... SELECT ... always match fields by ordinal position?

7 votes

My tests seem to confirm that

INSERT INTO a (x, y) SELECT y, x FROM b

maps b.y to a.x, i.e., the fields are matched only by ordinal position and not by name. Is this always the case, i.e., can I rely on that behaviour? Unfortunately, the documentation does not specify this (or I didn't find it).

That is correct, SQL Server does not try to do any mapping of column names since you can apply any aliases to the source data that you like. It will always reference ordinal position.

T-SQL: What is NOT(1=NULL)?

6 votes

I don't get the simple boolean algebra on my sql-server. According to msdn, the following statement should return "1", but on my server it returns "0". Can you help me?

SET ANSI_NULLS ON
SELECT CASE WHEN NOT(1=NULL) THEN 1 ELSE 0 END

Please have a look at msdn. There it clearly states: "Comparing NULL to a non-NULL value always results in FALSE." - no matter what the ANSI_NULLS-setting is. Thus "1=NULL" should be FALSE and NOT(FALSE) should thus be TRUE and the statement should return "1".

But on my machine, it returns "0"!

One explanation might be, that "1=NULL" evaluates to "UNKNOWN". NOT(UNKNOWN) is still UNKNOWN (msdn), which would force the CASE-Statement into the ELSE.

But then the official documentation of the equals-operator would be wrong. I cannot believe this!

Can anybody explain this behaviour?

Thank you very much for any help!

Edit (2012-03-15):

One thing I just found that might be of interest for some of you:

CREATE TABLE #FooTest (Value INT)
ALTER TABLE #FooTest WITH CHECK ADD CONSTRAINT ccFooTestValue CHECK (Value>1)
PRINT '(NULL>1) = ' + CASE WHEN NULL>1 THEN 'True' ELSE 'False' END
INSERT INTO #FooTest (Value) VALUES (NULL)

The print-Statement writes 'False', but the insertion runs without error. SQL-Server seems to negate the check-constraint in order to search for rows that do not fulfill the constraint-check:

IF EXISTS (SELECT * FROM inserted WHERE NOT(Value>NULL)) <Generate error>

Since the check-constraint evaluates to UNKNOWN, the negation is also UNKNOWN and SqlServer does not find any row violating the check-constraint.

The MSDN page for Equals that you link to definitely appears may be incorrect.

Check the MSDN page for SET ANSI_NULLS.

When SET ANSI_NULLS is ON, all comparisons against a null value evaluate to UNKNOWN.

To get that example SQL statement to work as expected, you should use compare using "IS NULL" or "IS NOT NULL" instead of using the equals operator (=). For example:

SELECT CASE WHEN NOT(1 IS NULL) THEN 1 ELSE 0 END

OR

SELECT CASE WHEN (1 IS NOT NULL) THEN 1 ELSE 0 END

SQL datetime needs to read 00:00:00.000

6 votes

I have the following piece of SQL:

select DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,GETDATE()),0))

which comes through as this format:

2012-02-29 23:59:59.000

I need the exact piece of code with the date the same, however the time part must read 00:00:00.000

Modify* I should have been clear here: I need to have the last day of previous month at any given time (with the time in 00:00:00.000 format of course)

select dateadd(d,datediff(d,0,dateadd(s,-1,dateadd(m,datediff(m,0,getdate()),0))),0)

Turn value into singular row

5 votes

Current

Name   Quantity
---------------
Stella       2
Jennifer     2
Greg         3 

Requested result

Name    Quantity 
---------------
Stella       1
Stella       1
Jennifer     1
Jennifer     1
Greg         1
Greg         1 
Greg         1 

How should I do it?

declare @T table  
(  
  Name varchar(50),  
  Sales int  
)  
insert into @T values  
('Stella',   '2'),  
('Jennifer', '2'), 
('Greg',     '3')

Using recursion and borrowing Michael Fredrickson's setup code:

declare @T table (  
  Name varchar(50),  
  Sales int  
)

insert into @T values ('Stella',   '2')  
insert into @T values ('Jennifer', '2') 
insert into @T values ('Greg',     '3')

-- Recursive verion
;with People (Name, Sales) as
(
    select Name, Sales
    from @T
    union all
    select Name, Sales - 1
    from People
    where Sales - 1 > 0
)
select Name, 1 as Quantity
from People
option (maxrecursion 0) -- Recurse without limit

This seems to run faster on my box (5x faster than Michael Fredrickson's according to query plan, but with many more logical reads), not that it matters much.

Optimal solution for massive number of requests on one database table

5 votes

We have a system where customers are allocated a product on a first come first served basis.

Our products table contains an incrementing primary key that started at zero which we use to keep track of how many products have been allocated i.e. a user reserves a product and gets allocated 1, next user gets 2 etc.

The problem, is that potentially hundreds of thousands of users will access the system in any given hour. All of whom will be hitting this one table.

Since we need to ensure that each customer is only allocated one product and keep track of how many products have been allocated, we use a row lock for each customer accessing the system to ensure they write to the table before the next customer hits the system - i.e. enforcing the first come first served rule.

We are concerned about the bottleneck that is the processing time of each request coming into SQL Server 2008 Enterprise Edition and the row lock.

We can't use multiple servers as we need to ensure the integrity of the primay key so anything that requires replication isn't going to work.

Does anyone know of any good solutions that are particularly efficient at handling a massive number of requests on one database table?

A bit more info: The table in question essentially contains two fields only - ID and CustomerID. The solution is for a free giveaway of a million products - hence the expectation of high demand and why using the incrementing primary key as a key makes sense for us - once the key hits a million, no more customers can register. Also, the products are all different so allocation of the correct key is important e.g. first 100 customers entered receieve a higher value product than the next 100 etc

Thanks for any help.

First, to remove the issue of key generation, I would generate them all in advance. It's only 1m rows and it means you don't have to worry about managing the key generation process. It also means you don't have to worry about generating too many rows accidentally, because once you have the table filled, you will only do UPDATEs, not INSERTs.

One important question here is, are all 1m items identical or not? If they are, then it doesn't matter what order the keys are in (or even if they have an order), so as customers submit requests, you just 'try' to UPDATE the table something roughly like this:

UPDATE TOP(1) dbo.Giveaway -- you can use OUTPUT to return the key value here
SET CustomerID = @CurrentCustomerID
WHERE CustomerID IS NULL

IF @@ROWCOUNT = 0 -- no free items left
PRINT 'Bad luck'
ELSE
PRINT 'Winner'

If on the other hand the 1m items are different then you need another solution, e.g. item 1 is X, items 2-10 are Y, 11-50 are Z etc. In this case it's important to assign customers to keys in the order the requests are submitted, so you should probably look into a queuing system of some kind, perhaps using Service Broker. Each customer adds a request to the queue, then a stored procedure processes them one at a time and assigns them the MAX free key, then returns the details of what they won.

Does VB.NET have functionality similar to TSQL's "IN"?

5 votes

In TSQL..

IF MyVal IN (1, 2, 3, 4, 14) BEGIN ... END

Is there a way to do this in VB.NET?

Is it possible to check for the existence of an integer in a set of integers inline?

Such as:

If MyVal in (1, 2, 3, 4, 14) Then ... End If

Arrays are an implementation of IEnumerable so with the System.Linq import a shorthand version of Tim Schmelter's answer would be:

{1,2,3,4,14}.Contains(MyVal)

Arrays also have an explicit implementation of IList.Contains, so without LINQ a perhaps less elegant alternative is:

DirectCast({1,2,3,4,14}, IList).Contains(MyVal)

SQL Server: OPENXML vs SELECT..FROM when dealing with XML?

4 votes

I have this xml :

DECLARE @x XML
SET @x = 
    '<data>
       <add>a</add>
       <add>b</add>
       <add>c</add>
     </data>';

Task :

I want to list the a,b,c.

approach 1 :

SELECT s.value('.', 'VARCHAR(8000)') AS [ADD]
FROM   @x.nodes('/data/add') AS t(s) 

approach 2:

DECLARE @idoc INT
EXEC sp_xml_preparedocument @idoc OUTPUT, @x

SELECT *
FROM   OPENXML(@idoc, '/data/add', 2)
       WITH ([add] NVARCHAR(MAX) '.')

both of them give me :

enter image description here

question :

which is the preferred way ?

Is there any advantages of the latter vs former ( or vice verse) ?

A simple test shows that your approach 1 takes less time than approach 2. I would not draw any conclusions about it always being the case. It can depend on how your XML is structured and how you need to query the XML.

Stored procedures to test on:

create procedure TestXML
  @X xml
as
set nocount on

select X.N.value('.', 'varchar(8000)')
from @X.nodes('/root/item') as X(N)

go

create procedure TestOpenXML
  @X xml
as
set nocount on

declare @idoc int
exec sp_xml_preparedocument @idoc out, @X

select value
from openxml(@idoc, '/root/item',1) 
  with (value  varchar(8000) '.')

exec sp_xml_removedocument @idoc

Test:

declare @X xml

set @X =
  (
    select number as '*'
    from master..spt_values
    for xml path('item'), root('root'), type
  )

set statistics time on
exec TestXML @X
exec TestOpenXML @X

Result approach 1:

SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 70 ms.

Result approach 2:

SQL Server Execution Times:
   CPU time = 156 ms,  elapsed time = 159 ms.

(Tested on SQL Server 2005.)