Best sql-server questions in October 2010

What exactly does the T-SQL "LineNo" reserved word do?

14 votes

I was writing a query against a table today on a SQL Server 2000 box, and while writing the query in Query Analyzer, to my surprise I noticed the word LineNo was converted to blue text.

It appears to be a reserved word according to MSDN documentation, but I can find no information on it, just speculation that it might be a legacy reserved word that doesn't do anything.

I have no problem escaping the field name, but I'm curious -- does anyone know what "LineNo" in T-SQL is actually used for?

OK, this is completely undocumented, and I had to figure it out via trial and error, but it sets the line number for error reporting. For example:

LINENO 25

SELECT * FROM NON_EXISTENT_TABLE

The above will give you an error message indicating an error at line 27 (instead of line 3, which is what you get if you turn the LINENO line into a single-line comment, e.g., by prefixing it with two hyphens):

Msg 208, Level 16, State 1, Line 27
Invalid object name 'NON_EXISTENT_TABLE'.

This is related to similar mechanisms in programming languages, such as the #line preprocessor directives in Visual C++ and Visual C# (which are documented, by the way).

How is this useful, you may ask? Well, one use of this is to help SQL code generators, which generate code from some higher-level (than SQL) language and/or perform macro expansion, tie generated code lines to user code lines.

P.S., It is not a good idea to rely on undocumented features, especially when dealing with a database.

Update: This explanation is still correct up to and including the current version of SQL Server, which at the time of this writing is SQL Server 2008 R2 Cumulative Update 5 (10.50.1753.0).

SCOPE_IDENTITY in C# - range

8 votes

I've checked the documentation for SCOPE_IDENTITY(), and it says "A scope is a module: a stored procedure, trigger, function, or batch." That is simple when I'm running a query in SSMSE, but in C# I'm using SqlCommand for executing my statements.

The question is: what is the scope there? Is executing subsequent commands under one connection an equivalent of batch? Or maybe every command is in a different scope and I need a transaction for this to work?

I suggest thinking of your C# commands and T-SQL "Batches" as completely separate from one another.

Think of SqlCommand as your execution wrapper only, within which the actual definition of what constitutes a batch is defined and controlled by the T-SQL language.

Your session scope is maintained at the Connection object level.

You will likely find the following MSDN forum post interesting reading. Notice how the initial example executes two separate SQL commands, but the SCOPE_IDENTITY() of the second call can see the result of the previous call. This is because the current scope is visible at the connection level.

SQLCommand With Parameters and Scope_Indentity

For completeness of explanation, the reason why this does not work using parameters, as later demonstrated in the linked example, is that sp_executesql is executed within its own scope and therefore cannot see the scope of the connection.
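
As a rough T-SQL sketch of that last point (the temporary table #ScopeDemo is made up for illustration and is not from the linked post): an identity value generated inside sp_executesql is not visible to a SCOPE_IDENTITY() call made afterwards from the calling batch, but it is visible when the SELECT is part of the same sp_executesql batch.

```sql
-- Hypothetical demo table; not part of the original post.
CREATE TABLE #ScopeDemo
(
    ID int identity(1,1) primary key not null,
    SomeNumericData int not null
);

-- The INSERT runs inside sp_executesql's own scope...
EXEC sp_executesql
    N'INSERT INTO #ScopeDemo (SomeNumericData) VALUES (@v);',
    N'@v int',
    @v = 666;

-- ...so the calling batch does not see the new identity value here.
SELECT SCOPE_IDENTITY() AS FromOuterBatch;

-- Keeping the SELECT in the same parameterized batch returns the new ID.
EXEC sp_executesql
    N'INSERT INTO #ScopeDemo (SomeNumericData) VALUES (@v);
      SELECT SCOPE_IDENTITY() AS FromSameBatch;',
    N'@v int',
    @v = 667;

DROP TABLE #ScopeDemo;
```

In C#, the equivalent would be to put the INSERT and the SELECT SCOPE_IDENTITY() in the CommandText of one parameterized SqlCommand.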

[EDIT]

For the more inquisitive reader, the VB.NET code below provides an example of executing two separate commands on a single Connection, with the second command successfully issuing the SCOPE_IDENTITY() function.

The source code can be executed from within the SCRIPT component of an SSIS package Task. You will also need to edit the connection details for your environment and also create the table object referenced.

Create Table Script:

create table TestTable
(
    ID int identity(1,1) primary key not null,
    SomeNumericData int not null
);

VB.NET Source Listing:

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Windows.Forms
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()
        Dim resultOne As Integer = 0
        Dim resultTwo As Integer = 0

        ' Two separate commands that will run on the same connection.
        Dim sSQL As String = "INSERT INTO TestTable(SomeNumericData) VALUES(666)"
        Dim sSQL2 As String = "SELECT SCOPE_IDENTITY()"

        ' Edit the connection string for your environment.
        Using oCnn As New SqlConnection("Server=ServerName;Database=DatabaseName;Trusted_Connection=true")
            Using oCmd As New SqlCommand(sSQL, oCnn)
                Using oCmd2 As New SqlCommand(sSQL2, oCnn)
                    oCnn.Open()

                    ' Number of rows affected by the INSERT.
                    resultOne = oCmd.ExecuteNonQuery()
                    ' Identity value generated by the INSERT, read by the second command.
                    resultTwo = Convert.ToInt32(oCmd2.ExecuteScalar())
                End Using
            End Using
        End Using

        MessageBox.Show("result1: " & resultOne.ToString() & Environment.NewLine & "result2: " & resultTwo.ToString())

        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class

Aggregate bitwise-OR in a subquery

8 votes

Given the following table:

CREATE TABLE BitValues ( n int )

Is it possible to compute the bitwise-OR of n for all rows within a subquery? For example, if BitValues contains these 4 rows:

+---+
| n |
+---+
| 1 |
| 2 |
| 4 |
| 3 |
+---+

I would expect the subquery to return 7. Is there a way to do this inline, without creating a UDF?

WITH    Bits
          AS ( SELECT   1 AS BitMask
               UNION ALL
               SELECT   2
               UNION ALL
               SELECT   4
               UNION ALL
               SELECT   8
               UNION ALL
               SELECT   16
             )
    SELECT  SUM(DISTINCT BitMask)
    FROM    ( SELECT    1 AS n
              UNION ALL
              SELECT    2
              UNION ALL
              SELECT    3
              UNION ALL
              SELECT    4
              UNION ALL
              SELECT    5
              UNION ALL
              SELECT    6
            ) AS t
            JOIN Bits ON t.n & Bits.BitMask > 0
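
The query above runs against inline sample data. A minimal sketch of the same idea applied to the question's rows, written as a scalar subquery, might look like this. The temporary table #BitValues stands in for BitValues, and the mask list only covers values up to 31, so it would need extending for wider integers.

```sql
-- Stand-in for the question's BitValues table, with its four sample rows.
CREATE TABLE #BitValues ( n int );
INSERT INTO #BitValues (n)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 4 UNION ALL SELECT 3;

-- Each mask that matches at least one row is summed exactly once,
-- which is the same as OR-ing all the values together.
SELECT ( SELECT SUM(DISTINCT b.BitMask)
         FROM   #BitValues AS v
                JOIN ( SELECT 1 AS BitMask
                       UNION ALL SELECT 2
                       UNION ALL SELECT 4
                       UNION ALL SELECT 8
                       UNION ALL SELECT 16
                     ) AS b ON v.n & b.BitMask > 0
       ) AS OrOfN;   -- 7 for the rows 1, 2, 4, 3

DROP TABLE #BitValues;
```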

7 votes

Quite a simple question. In SQL 2008 if I have a stored procedure (see below) do I run the risk of a race condition between the first two statements or does the stored procedure put a lock on the things it touches like transactions do?

ALTER PROCEDURE [dbo].[usp_SetAssignedTo] 
    -- Add the parameters for the stored procedure here
    @Server varchar(50), 
    @User varchar(50),
    @UserPool varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- Find a Free record
    Declare @ServerUser varchar(50)
    (SELECT top 1 @ServerUser = UserName from ServerLoginUsers
    where AssignedTo is null and [TsServer] = @Server)

    --Set the free record to the user
    Update ServerLoginUsers
    set AssignedTo = @User, AssignedToDate = getdate(), SourcePool = @UserPool
    where [TsServer] = @Server and UserName = @ServerUser

    --report record back if it was updated. Null if it was not available.
    select * 
    from ServerLoginUsers 
    where [TsServer] = @Server 
        and UserName = @ServerUser 
        and AssignedTo = @User
END

You could get a race condition.

It can be done in one statement:

  • You can assign in an UPDATE
  • The lock hints allow another process to skip this row
  • The OUTPUT clause returns data to the caller

Try this... (edit: holdlock removed)

UPDATE TOP (1) ServerLoginUsers WITH (ROWLOCK, READPAST)
SET
   AssignedTo = @User, AssignedToDate = getdate(), SourcePool = @UserPool
OUTPUT INSERTED.*
WHERE
   AssignedTo is null and [TsServer] = @Server   -- not needed -> and UserName = @ServerUser

If that does not suit, you can capture the user name in a variable and follow up with a separate SELECT:

Update TOP (1) ServerLoginUsers WITH (ROWLOCK, READPAST)
SET
    -- yes, assign in an update
   @ServerUser = UserName,
   -- write
   AssignedTo = @User, AssignedToDate = getdate(), SourcePool = @UserPool
OUTPUT INSERTED.*
WHERE
   AssignedTo is null and [TsServer] = @Server   -- not needed -> and UserName = @ServerUser

SELECT ...

See this question for more: http://stackoverflow.com/questions/939831/sql-server-process-queue-race-condition

What can cause 'rows affected' to be incorrect?

7 votes

Using Microsoft SQL Server Management Studio 2008. I have done a simple transaction:

BEGIN TRAN

SELECT ko.ID, os.ID AS ID2
FROM table_a AS ko
JOIN table_b AS os ON os.ID=ko.ID
WHERE (ko.the_date IS NOT NULL AND os.the_date IS NULL);

UPDATE table_b SET the_date=ko.the_date
FROM table_a AS ko
JOIN table_b AS os ON os.ID=ko.ID
WHERE (ko.the_date IS NOT NULL AND os.the_date IS NULL);

SELECT ko.ID, os.ID AS ID2
FROM table_a AS ko
JOIN table_b AS os ON os.ID=ko.ID
WHERE (ko.the_date IS NOT NULL AND os.the_date IS NULL);


ROLLBACK

So the SELECT and UPDATE should be the same. And the result should return 0 rows. But the UPDATE affects one row less than the SELECT gets from DB:

(61 row(s) affected)

(60 row(s) affected)

(0 row(s) affected)

What am I missing here?

I'd suspect the most likely reason is that table_a in your example has a row with a duplicate ID in it. This causes an additional row to appear in the join in your first SELECT, but the UPDATE only deals with rows in table_b, so the duplicate row makes no difference to its count. This statement should give you the culprit:

SELECT ko.ID
FROM table_a AS ko
JOIN table_b AS os ON os.ID=ko.ID
WHERE (ko.the_date IS NOT NULL AND os.the_date IS NULL)
GROUP BY ko.ID
HAVING COUNT(*) > 1
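
To see the effect in isolation, here is a small sketch using made-up temporary tables (#table_a and #table_b and their rows are illustrative only): the join returns one row per matching pair, while the UPDATE counts each target row once.

```sql
-- Hypothetical data: ID 1 appears twice in #table_a.
CREATE TABLE #table_a (ID int, the_date datetime);
CREATE TABLE #table_b (ID int, the_date datetime);

INSERT INTO #table_a VALUES (1, '20101001');
INSERT INTO #table_a VALUES (1, '20101002');   -- duplicate ID
INSERT INTO #table_a VALUES (2, '20101003');
INSERT INTO #table_b VALUES (1, NULL);
INSERT INTO #table_b VALUES (2, NULL);

-- Returns 3 rows: the #table_b row with ID 1 joins to both duplicates.
SELECT ko.ID, os.ID AS ID2
FROM #table_a AS ko
JOIN #table_b AS os ON os.ID = ko.ID
WHERE ko.the_date IS NOT NULL AND os.the_date IS NULL;

-- Reports 2 rows affected: each #table_b row is updated only once.
UPDATE os SET the_date = ko.the_date
FROM #table_a AS ko
JOIN #table_b AS os ON os.ID = ko.ID
WHERE ko.the_date IS NOT NULL AND os.the_date IS NULL;

DROP TABLE #table_a;
DROP TABLE #table_b;
```

Note that which of the duplicate dates ends up in the updated row is not something to rely on.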

Transactional And Reporting Databases - How?

7 votes

When building a transactional system with a highly normalized DB, running reporting-style queries, or even queries to display data on a UI, can involve several joins, which in a data-heavy scenario can, and usually does, impact performance. Joins are expensive.

Often, the guidance espoused is that you should never run these queries off your transactional DB model, rather you should use a denormalized flattened model that is tailored for specific UI views or reports which eliminates the need for many joins. Data duplication is not an issue in this scenario.

This concept makes perfect sense, but what I rarely see when experts make these statements is exactly HOW to implement this. For example (and quite frankly I'd appreciate an example using any platform), in a mid-sized system running on a SQL Server back end you have a normalized transactional model. You also have some reports and a website that require queries. So, you create a "reporting" database that flattens the normalized data. How do you keep this in sync? Transaction log shipping? If so, how do you transform the data to fit in the reporting model?

In our shop, we set up a continuous transactional replication from the OLTP system to another DB server used for reporting. You wouldn't want to use log shipping for this purpose as it requires an exclusive lock on the database every time it restores a log, which would prevent your users from running reports.

With the optimizer in SQL Server today, I think the notion that the joins on a normalized database are "too expensive" for reporting is a bit outdated. Our design is fully 3rd normal form, several million rows in our main tables, and we have no problems running any of our reports. Having said that, if push came to shove, you could look into creating some indexed views on your reporting server to help out.

Should I use 'Integrated Security=True' in a production environment?

7 votes

Is it a bad practice to use Integrated Security=True on a production server in ASP.NET?

Nope - perfectly safe*

All you are doing is saying that you are going to use the credentials of (usually) the Windows user that the process is running under in order to authenticate with SQL Server (as opposed to supplying a username and password).

In fact in general using integrated security is considered more secure.

(*) Of course it always depends on your exact situation, but in the general case yes, it's fine.

SQL Server: how to constrain a table to contain a single row ?

7 votes

I want to store a single row in a configuration table for my application. I would like to enforce that this table can contain only one row.

What is the simplest way to enforce the single row constraint?

You make sure one of the columns can only contain one value, and then make that the primary key (or apply a uniqueness constraint):

CREATE TABLE T1(
    Lock char(1) not null,
    /* Other columns */,
    constraint PK_T1 PRIMARY KEY (Lock),
    constraint CK_T1_Locked CHECK (Lock='X')
)

I have a number of these tables in various databases, mostly for storing config. It's a lot nicer knowing that, if the config item should be an int, you'll only ever read an int from the DB.
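
As a rough usage sketch (the table AppConfigDemo, its MaxRetries column and the default constraint are invented for illustration), the first insert succeeds and any further row is rejected by one of the two constraints:

```sql
-- Hypothetical config table following the pattern above.
CREATE TABLE AppConfigDemo (
    [Lock] char(1) NOT NULL CONSTRAINT DF_AppConfigDemo_Lock DEFAULT 'X',
    MaxRetries int NOT NULL,
    CONSTRAINT PK_AppConfigDemo PRIMARY KEY ([Lock]),
    CONSTRAINT CK_AppConfigDemo_Locked CHECK ([Lock] = 'X')
);

-- Succeeds: this is the one and only row.
INSERT INTO AppConfigDemo (MaxRetries) VALUES (3);

-- Fails with a primary key violation: a row with Lock = 'X' already exists.
INSERT INTO AppConfigDemo (MaxRetries) VALUES (5);

-- Fails with a CHECK constraint violation: Lock may only be 'X'.
INSERT INTO AppConfigDemo ([Lock], MaxRetries) VALUES ('Y', 5);

-- Changing configuration is an UPDATE of the single row.
UPDATE AppConfigDemo SET MaxRetries = 5;

DROP TABLE AppConfigDemo;
```

The default on the Lock column is optional; it just saves callers from having to supply 'X' themselves.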

Upgrade Path For Legacy Reporting ("Embedded" Access 2003)?

7 votes

Update: Albert D. Kallal has kindly started the discussion off, and to get some more opinions I'm adding a bounty.

This is a nontrivial question about maintenance of a legacy application that two other developers and I support. We are not the original developers, and the code base is 300,000 lines of MFC and business logic tightly coupled together. We don't know every single line of code 100%.

We do know the code behind the major components, and we know that it's poorly written. Our objective is to refactor the application out of 1995 and into 2010. Between the three of us there is (in aggregate) enough experience in software architecture and database design for us to fix the components that are poorly architected in code or incorrectly modelled in the database, but we don't have a lot of experience with modern reporting systems. Thus my question (once you get to the end of it...) is about reporting systems.

For anybody who reads this entire post, I am appreciative of your time. For anybody who reads this post and replies with solutions, experience (or sympathy!), I am both appreciative and thankful.

At work I have inherited the maintenance of an Access 2003 database that contains approximately 250 reports (and thousands of supporting queries) that acts as a reporting engine for our application.

The reports all have swathes of VBA in them for particular formatting or pulling extra information into the report. For this reason we are entirely locked into the Access platform, we can't use tools like BIDS to import the Access report objects without messing around to make the report display the same without VBA.

So to get ourselves out of this Access solution we need to put some time in going over every single report. Which means we're looking to pick the best longterm solution, since we're going to have to redevelop every report regardless of the platform we choose.

Furthermore our customers have a choice of Microsoft Access or SQL Server as their database. This means that all our SQL has to be written with the lowest common denominator in mind - JET SQL. We've got some wiggle room to drop support for Microsoft Access, but we'd need to build a case for it. If the best reporting system we can identify has strong support for SQL Server but little or no support for Microsoft Access this will accelerate us dropping support for Microsoft Access as a database.

The overall implementation of the report system is quite mediocre. When we want to display reports in our application we start a Microsoft Access process, find its window and reparent it to our application, strip off its window styles and then use the Access.Application COM interface to invoke some VBA that creates linked tables to the database (either a Microsoft Access MDB or a SQL Server database) and then opens up the report we want. Probably the only supported part of the process is using the public COM interfaces; the rest is an ugly hack. The other components in the application are equally underwhelming.

To "fix" our application we've got a new development plan, with development of our application split into (approximately) three parts every year.

  1. 4 months upgrading our application to support the latest government legislation in our industry
  2. 4 months delivering a new major feature
  3. 4 months "consolidation" (fixing what is broken)

We're currently at #3 (for this year), and we really want to take advantage of the downtime to fix up the application, refactoring the major components. We have three developers, and want AppName v5.0 out at the end of 2012 (it's currently AppName v4.12). This gives us 36 months of development effort to apportion between several components (user interface, underlying database structure, reporting, etc.) over the three consolidation periods we will have before then. The sum of the components that we fix will give us v5.0.

We've scoped out what we'd like to do with most of the components except for our reporting engine, and I'm posting on SO in the hope of getting some good ideas, or at least a feel for the work that's required.

I have two ideas for improving our reporting system. Both of them involve a moderate amount of work, and there is one consideration that neither solution addresses completely: in addition to the reports that we develop, our customers also have the opportunity to request bespoke development of reports. They're customer-specific, we take their Access database, augment it with their report and give it back to the customer. There's hundreds of unique reports out there - unusable if we turned the old system off. (And we have to turn the old system off eventually - we don't know how much longer we're going to be able to mess around with the Microsoft Access window to make it look like an embedded report. We already have two distinct code paths for Access 2003 and 2007. What if we can't hack up a code path for Access 2010 and all our customers have to use Access 2007?)

For both ideas, the intention is to stop supporting our current reporting system and let it run for as long as it will without maintenance. Maybe we can hack in Access 2010 and Access 2014 support, and the customer reports that were developed keep putting along for 5 more years. Over time, we'd migrate the most commonly used reports from the old Access database into their new format.

Idea 1: Microsoft.Reporting.WinForms.ReportViewer

The first idea is to write a wrapper around the ReportViewer control as a replacement reporting engine.

We'd need to move the project to C++/CLI (already on the cards), and instead of having to launch an entire process each time we needed to view a report we could simply instantiate this control. A bonus of this is that the RDLC files that contain the reports are much easier to version control in Subversion than the Access 2003 database we currently have (we use Visual SourceSafe because the tools to integrate SVN with Access don't work well with the size of our Access database). The visual designer for RDLC files is also nicely integrated into Visual Studio.

This is more of an evolutionary than a revolutionary change to the way we do reports: the ReportViewer control will take an RDLC file that has the report layout, and our application will take care of querying the data. Because our database might be SQL Server or Microsoft Access, we still have to write simple JET SQL. We're gaining better reporting (drill down looks nice), stronger authoring tools and easier version control, but is this worth the effort?

Idea 2: SQL Server Reporting Services and SharePoint 2010 with Access Services

The second idea is to kill Access as a database platform and migrate all our customers to SQL Server (we have hosted instances of our application for those customers who don't have the skill set to set up their own SQL Server instances). Once they're migrated we would use SQL Server Reporting Services as the reporting engine, with the ReportViewer control in server rendering mode.

In addition to SQL Server Reporting Services, I am curious as to whether SharePoint 2010 with Access Services could be used to rapidly migrate existing Access reports into a more manageable format. We'd take the Access report that the customer uses, convert it to an Access Web Report, then make it available for them on a SharePoint site. This would only be for our hosted customers, but if we find a way to quickly massage the VBA out of customer reports we could churn through the several hundred custom reports our customers have.

I'm also interested in the ability to use an Access Web Navigation Form to act as a portal to all our reports. We'd host a web browser control inside our application which would give customers access to their own reports and to our standard suite.

We'd get all the benefits of Idea #1 plus the ability to write in full Transact-SQL, a reports portal, and (hopefully) a reasonable upgrade path for customers' proprietary reports.

So, my question is: am I going about this the right way? Are these viable solutions for modern reporting systems, or laughable? We have a strong preference for using the ReportViewer control either in client rendering mode where our application processes the data, or in server rendering mode in conjunction with SQL Server - but are there reporting systems like Crystal Reports which offer better reporting and better migration paths for our legacy Access reports?

If you had up to 36 months of developer time, how would you do this?

Well, OK, since no one else is jumping in, I'll give this a go.

It is quite interesting that you are talking about a report writer that is 15+ years old. Back then the Access report writer was beyond state of the art. It was a country mile ahead of everything else in the industry. Even today a lot of competing report writers don't have the concept of sub-reports, which allow modeling of relational data without having to resort to code or even SQL. Throw in programmable VBA, and the result is something very unique and powerful.

For Access 2007, the report writer received some more nice upgrades in terms of layout controls, but that is going to be of little help here.

And for 2010 we can now display reports in a sub-form control. This feature was added to facilitate use of the new Access navigation control. Access 2010 has a new web browser control (it works in forms or reports), and there is also a new navigation control. Your post hints that the new navigation control and the web control are somehow related to each other, but they are completely different features.

Both the new web browser control and the navigation control can be used in web applications or in 100% client-only applications. The navigation control is nice since you can build it by dragging and dropping reports onto it to build up a list of reports to choose from (it is slick, easy and nice). And with this navigation control, we can actually build some nice drill-down types of interfaces for reports.

As you noted, for Access 2010 we now have web publishing of Access reports, and this feature is based on SQL Server Reporting Services (they are RDL reports). However, there are two important issues here. First, no VBA is allowed inside web reports. Second, there is no automatic conversion utility built into Access that will convert existing reports into web-based reports, so to build a report that is going to be published to the web, you have to specifically choose to create a web report. This answers one question of yours, namely whether this will help you convert existing reports to SQL Server: the answer is no. Access will not help you convert existing reports to web-based RDL reports. (As noted, Access uses RDL and SQL Server Reporting Services for those web reports; those reports also render in the Access client without conversion.)

Access has a great path for web-based reports via SharePoint, and Access Web is also coming to Office 365. However, keep in mind that this ability is not going to help much with the existing reports that you have.

In fact, one of the things I would look at if you're going to use the WinForms report viewer is where that existing VBA report code will be moved to. You have not really mentioned this issue. As noted, one really interesting and great feature of those reports is the embedded VBA code. Often that VBA will have been used because SQL and something like RDL will NOT work, since neither of those languages (SQL and RDL) is procedural.

I can't stress enough how important this concept is. It pretty much means that with any replacement report writer, that code will have to live OUTSIDE of the reports and be moved into your application. So keep in mind that when you issue new reports, you will also be issuing new procedural code that is NOT contained in those reports. This code will have to become part of your application (so, to issue new reports, you will also be issuing a new version of your software).

You are not likely to find much that allows procedural code to be embedded inside the report the way you can with Access. So that report code and logic will have to be built and maintained within your main application, outside of the reports.

At the end of the day, I should point out the old adage: if it ain't broke, don't change it. Access has been around for a very long time, but we have seen significant investment in this product from the folks in Redmond during the last few years, so it shows no signs of dying anytime soon.

So, one possible suggestion is to keep the status quo and continue the way it works now. You stated that you have to continue supporting JET anyway, so you are not getting away from a major part of Access: you would just be dumping the report side while still having to use the JET data engine.

However, assuming this decision has been made, I can't really suggest which report writer you should replace the Access one with. Obviously, the next report writer should have a seamless path to the web, even if for NOW the reports are going to be rendered on the desktop. It makes no sense to make a large investment today without web considerations in some fashion.

I do think SQL Server Reporting Services is a good choice due to its web ability. And as Access developers we also have the option to create web-based reports that also render perfectly in the Access client on the desktop (this works when you have no server, and no conversion issues exist when publishing these reports to the web or using them locally on the client). So, even if you don't use Access, do choose something that allows reports to render on both desktop and web, as Access 2010 does.

I would consider building the report system around some .NET tools. This would likely not play too well as an embedded report system inside your existing application, but it would allow you to issue new reports without touching your existing code base for each new report issued. This issue of shipping new reports that have procedural code needs to be resolved. Today you can likely issue new reports without modifying the main application, because those reports can contain code. I would look for something that allows new reports to be built and issued without you having to issue a new edition of your main software. You might not embed the code in the reports anymore, but you need to place it somewhere, and hopefully outside of your main application.

finding the start day (Monday) of the current week.

6 votes

Looking for a SQL query/queries that would determine the start day (Monday) of the current week.

Example: If today is -> then the start of the week is

Sat Oct 09, 2010 -> Start of the week is Monday Oct 04, 2010
Sun Oct 10, 2010 -> Start of the week is Monday Oct 04, 2010
Mon Oct 11, 2010 -> Start of the week is Monday Oct 11, 2010
Tue Oct 12, 2010 -> Start of the week is Monday Oct 11, 2010

I have seen many "solutions" on Google and StackOverflow. They look something like:

SET @pInputDate = CONVERT(VARCHAR(10), @pInputDate, 111)
SELECT DATEADD(DD, 1 - DATEPART(DW, @pInputDate), @pInputDate)

This fails because: Sun Oct 10, 2010 -> start of week Monday Oct 11, 2010 (which is incorrect).

Thanks in advance

Try using DATEFIRST to explicitly set the day of week to be regarded as the 'first'.

set DATEFIRST 1  --Monday
select DATEADD(DD, 1 - DATEPART(DW, @pInputDate), @pInputDate)

This will return the Monday of the week the InputDate falls in.
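
If changing DATEFIRST for the session is not desirable, one alternative (not part of the answer above, offered only as a sketch) is to fold @@DATEFIRST into the arithmetic so that the result does not depend on the current setting. The expression computes the number of days since the most recent Monday and subtracts it. The sample dates are the four from the question.

```sql
-- (DATEPART(DW, d) + @@DATEFIRST) % 7 gives Sat = 0, Sun = 1, Mon = 2, ...
-- regardless of the DATEFIRST setting; adding 5 and taking % 7 again
-- turns that into "days since Monday" (Mon = 0 ... Sun = 6).
SELECT d.InputDate,
       DATEADD(DD,
               -((DATEPART(DW, d.InputDate) + @@DATEFIRST + 5) % 7),
               d.InputDate) AS WeekStart
FROM ( SELECT CAST('20101009' AS datetime) AS InputDate   -- Saturday
       UNION ALL SELECT '20101010'                        -- Sunday
       UNION ALL SELECT '20101011'                        -- Monday
       UNION ALL SELECT '20101012'                        -- Tuesday
     ) AS d;
-- WeekStart: Oct 04, Oct 04, Oct 11, Oct 11 (2010)
```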

SQL And NULL Values in where clause

6 votes

So I have a simple query that returns a listing of products

SELECT     Model, CategoryID
FROM         Products
WHERE     (Model = '010-00749-01') 

This returns

010-00749-01    00000000-0000-0000-0000-000000000000
010-00749-01    NULL

Which is correct. I wanted to get only the products whose CategoryID is not '00000000-0000-0000-0000-000000000000', so I have

SELECT     Model, CategoryID
FROM         Products
WHERE     (Model = '010-00749-01') 
AND (CategoryID <> '00000000-0000-0000-0000-000000000000') 

But this returns no results. So I changed the query to

SELECT     Model, CategoryID
FROM         Products
WHERE     (Model = '010-00749-01') 
AND ((CategoryID <> '00000000-0000-0000-0000-000000000000') OR  (CategoryID  IS NULL))

Which returns expected result

010-00749-01    NULL

Can someone explain this behavior to me? MS SQL Server 2008

Check out the full reference on Books Online. By default ANSI_NULLS is ON, meaning you need to use the approach you have taken. Otherwise, you could switch that setting OFF at the start of the query to reverse the behaviour.

When SET ANSI_NULLS is ON, a SELECT statement that uses WHERE column_name = NULL returns zero rows even if there are null values in column_name. A SELECT statement that uses WHERE column_name <> NULL returns zero rows even if there are nonnull values in column_name.
...
When SET ANSI_NULLS is ON, all comparisons against a null value evaluate to UNKNOWN. When SET ANSI_NULLS is OFF, comparisons of all data against a null value evaluate to TRUE if the data value is NULL.

Here's a simple example to demonstrate the behaviour with regard to comparisons against NULL:

-- This will print TRUE
SET ANSI_NULLS OFF;
IF NULL <> 'A'
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'

-- This will print FALSE
SET ANSI_NULLS ON;
IF NULL <> 'A'
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'

Excluding matches on JOIN fields that are NULL

6 votes

If you do a join that looks like this

SELECT T1.KeyField1, T1.KeyField2, T2.Field3
FROM T1 JOIN T2 ON T1.KeyField1 = T2.KeyField1 AND T1.KeyField2 = T2.KeyField2

Is there a way to not allow NULLS to match similar to the results this query would return

SELECT T1.KeyField1, T1.KeyField2, T2.Field3
FROM T1 JOIN T2 ON T1.KeyField1 = T2.KeyField1 AND T1.KeyField2 = T2.KeyField2
               AND T1.KeyField2 IS NOT NULL AND T2.KeyField2 IS NOT NULL

EDIT

I actually asked the question wrong.... Let me try again.

We are comparing new data to old data and looking for records where the rows are exactly the same.

So both tables defined:

CREATE TABLE [Table](
    [Identifier] [int] IDENTITY(1,1) NOT NULL,
    [Key1] [varchar](50) NOT NULL,
    [Data1] [varchar](50) NULL,
    [Data2] [varchar](50) NULL
)

If I do the query:

DELETE T1
FROM T1 JOIN T2 ON T1.Key1 = T2.Key1 
               AND T1.Data1 = T2.Data1 AND T1.Data2 = T2.Data2

Given

T1 & T2

| Key1 | Data1       | Data2   |
| 1000 | 123 Main St | <NULL>  |
| 1001 | 456 High St | FLOOR 2 |

This would not remove the duplicate record 1000 from T1 since Data2 is NULL.

Outside of making use of a magic value in the join, is there any other way to compare these?

I understand that I should make the consultants rewrite the code to insert all NULLS as '', but this is a huge undertaking at this point. I am also looking at hashing the row to look for differences.

try using this:

SET ANSI_NULLS ON

http://msdn.microsoft.com/en-us/library/aa259229(SQL.80).aspx

EDIT

joining with "magic numbers" like:

ISNULL(T1.Field1, '-9999') = ISNULL(T2.Field2, '-9999') 

is the best you can do in your situation, and will most likely hurt the query performance significantly. I'd say the real issue is a design one, joining on NULLs is just plain strange to me.
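
Applied to the tables from the question, the magic-value join might look roughly like this. The temporary tables #T1 and #T2 are stand-ins built from the sample rows, and '-9999' is assumed never to occur as real data; if it did, it would be treated as equal to NULL.

```sql
-- Stand-ins for T1 and T2, loaded with the question's two sample rows.
CREATE TABLE #T1 (Key1 varchar(50) NOT NULL, Data1 varchar(50) NULL, Data2 varchar(50) NULL);
CREATE TABLE #T2 (Key1 varchar(50) NOT NULL, Data1 varchar(50) NULL, Data2 varchar(50) NULL);

INSERT INTO #T1 VALUES ('1000', '123 Main St', NULL);
INSERT INTO #T1 VALUES ('1001', '456 High St', 'FLOOR 2');
INSERT INTO #T2 VALUES ('1000', '123 Main St', NULL);
INSERT INTO #T2 VALUES ('1001', '456 High St', 'FLOOR 2');

-- NULLs on both sides are replaced by the same magic value, so they compare equal.
DELETE t1
FROM #T1 AS t1
JOIN #T2 AS t2
  ON t1.Key1 = t2.Key1
 AND ISNULL(t1.Data1, '-9999') = ISNULL(t2.Data1, '-9999')
 AND ISNULL(t1.Data2, '-9999') = ISNULL(t2.Data2, '-9999');

-- Both rows, including record 1000 with the NULL Data2, are now gone.
SELECT COUNT(*) AS RemainingRows FROM #T1;   -- 0

DROP TABLE #T1;
DROP TABLE #T2;
```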

SQL Server - pull X random records per state

6 votes

I have a table with records for each zip code in the United States. For the purposes of displaying on a map, I need to select X random records per state. How would I go about doing this?

Use:

WITH sample AS (
 SELECT t.*,
        ROW_NUMBER() OVER (PARTITION BY t.state
                               ORDER BY NEWID()) AS rank
   FROM ZIPCODES t)
SELECT s.*
  FROM sample s
 WHERE s.rank <= 5