Best sql-server questions in December 2011

Multiple INSERT statements vs. single INSERT with multiple VALUES

14 votes

I'm running a performance comparison between using 1000 INSERT statements:

INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('6f3f7257-a3d8-4a78-b2e1-c9b767cfe1c1', 'First 0', 'Last 0', 0)
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('32023304-2e55-4768-8e52-1ba589b82c8b', 'First 1', 'Last 1', 1)
...
INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
   VALUES ('f34d95a7-90b1-4558-be10-6ceacd53e4c4', 'First 999', 'Last 999', 999)

..versus using single INSERT statement with 1000 values:

INSERT INTO T_TESTS (TestId, FirstName, LastName, Age) 
VALUES 
('db72b358-e9b5-4101-8d11-7d7ea3a0ae7d', 'First 0', 'Last 0', 0),
('6a4874ab-b6a3-4aa4-8ed4-a167ab21dd3d', 'First 1', 'Last 1', 1),
...
('9d7f2a58-7e57-4ed4-ba54-5e9e335fb56c', 'First 999', 'Last 999', 999)

To my big surprise, the results are the opposite of what I thought:

  • 1000 INSERT statements: 290 msec.
  • 1 INSERT statement with 1000 VALUES: 2800 msec.

The test is executed directly in MSSQL Management Studio with SQL Server Profiler used for measurement (and I've got similar results running it from C# code using SqlClient, which is even more suprising considering all the DAL layers roundtrips)

Can this be reasonable or somehow explained? How come, a supposedly faster method results in 10 times (!) worse performance?

Thank you.

EDIT: Attaching execution plans for both: Exec Plans

Your plan shows the single inserts are using parameterised procedures (possibly auto parameterised) so parse/compile time for these should be minimal.

I thought I'd look into this a bit more though so set up a loop (script) and tried adjusting the number of VALUES clauses and recording the compile time.

I then divided the compile time by the number of rows to get the average compile time per clause. The results are below

Graph

Up until 250 VALUES clauses present the compile time / number of clauses has a slight upward trend but nothing to get too excited about.

Graph

But then there is a sudden change.

That section of the data is shown below.

+------+----------------+-------------+---------------+---------------+
| Rows | CachedPlanSize | CompileTime | CompileMemory | Duration/Rows |
+------+----------------+-------------+---------------+---------------+
|  245 |            528 |          41 |          2400 | 0.167346939   |
|  246 |            528 |          40 |          2416 | 0.162601626   |
|  247 |            528 |          38 |          2416 | 0.153846154   |
|  248 |            528 |          39 |          2432 | 0.157258065   |
|  249 |            528 |          39 |          2432 | 0.156626506   |
|  250 |            528 |          40 |          2448 | 0.16          |
|  251 |            400 |         273 |          3488 | 1.087649402   |
|  252 |            400 |         274 |          3496 | 1.087301587   |
|  253 |            400 |         282 |          3520 | 1.114624506   |
|  254 |            408 |         279 |          3544 | 1.098425197   |
|  255 |            408 |         290 |          3552 | 1.137254902   |
+------+----------------+-------------+---------------+---------------+

The cached plan size which had been growing linearly suddenly drops but CompileTime increases 7 fold and CompileMemory shoots up. As @Mikael Eriksson points out this is also the moment the plan changes from an auto parametrized one (with 1,0000 parameters) to a non parametrized one. Thereafter it seems to get linearly less efficient (in terms of number of value clauses processed in a given time) so maybe try restricting your batches to 250 and see how that compares.

Not sure why this should be. Presumably when it is compiling a plan for specific literal values it must perform some activity that does not scale linearly (such as sorting) when constructing the internal table of constants.

I tried to look at this in a debugger but the public symbols for my version of SQL Server 2008 don't seem to be available so instead I had to look at the equivalent UNION ALL construction in SQL Server 2005.

A typical stack trace is below

sqlservr.exe!FastDBCSToUnicode()  + 0xac bytes  
sqlservr.exe!nls_sqlhilo()  + 0x35 bytes    
sqlservr.exe!CXVariant::CmpCompareStr()  + 0x2b bytes   
sqlservr.exe!CXVariantPerformCompare<167,167>::Compare()  + 0x18 bytes  
sqlservr.exe!CXVariant::CmpCompare()  + 0x11f67d bytes  
sqlservr.exe!CConstraintItvl::PcnstrItvlUnion()  + 0xe2 bytes   
sqlservr.exe!CConstraintProp::PcnstrUnion()  + 0x35e bytes  
sqlservr.exe!CLogOp_BaseSetOp::PcnstrDerive()  + 0x11a bytes    
sqlservr.exe!CLogOpArg::PcnstrDeriveHandler()  + 0x18f bytes    
sqlservr.exe!CLogOpArg::DeriveGroupProperties()  + 0xa9 bytes   
sqlservr.exe!COpArg::DeriveNormalizedGroupProperties()  + 0x40 bytes    
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x18a bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!COptExpr::DeriveGroupProperties()  + 0x146 bytes   
sqlservr.exe!CQuery::PqoBuild()  + 0x3cb bytes  
sqlservr.exe!CStmtQuery::InitQuery()  + 0x167 bytes 
sqlservr.exe!CStmtDML::InitNormal()  + 0xf0 bytes   
sqlservr.exe!CStmtDML::Init()  + 0x1b bytes 
sqlservr.exe!CCompPlan::FCompileStep()  + 0x176 bytes   
sqlservr.exe!CSQLSource::FCompile()  + 0x741 bytes  
sqlservr.exe!CSQLSource::FCompWrapper()  + 0x922be bytes    
sqlservr.exe!CSQLSource::Transform()  + 0x120431 bytes  
sqlservr.exe!CSQLSource::Compile()  + 0x2ff bytes   

So it seems to spend a lot of time comparing (presumably sorting) strings. No idea to what end though.

It doesn't seem to affect the size of the query plan when I tried a query consisting entirely of duplicate rows and neither affects the order of the output of the table of the constants (and as you are inserting into a heap time spent sorting would be pointless anyway even if it did).

Is varchar(MAX) always preferable?

9 votes

Regarding SQL Server, I understand :

  • var means the memory is lazy allocated, meaning it fits to the data exactly (on insertion).

  • MAX means there is no size restriction\limitation.

Then, is it always preferable to use MAX when using varchar, as we don't allocate the whole size anyhow?

Should we use a constant size only if there is a constraint we want to enforce on this DB column?

There is a very good article on this subject by SO User @Remus Rusanu. Here is a snippit that I've stolen but I suggest you read the whole thing:

The code path that handles the MAX types (varchar, nvarchar and varbinary) is different from the code path that handles their equivalent non-max length types. The non-max types can internally be represented as an ordinary pointer-and-length structure. But the max types cannot be stored internally as a contiguous memory area, since they can possibly grow up to 2Gb. So they have to be represented by a streaming interface, similar to COM’s IStream. This carries over to every operation that involves the max types, including simple assignment and comparison, since these operations are more complicated over a streaming interface. The biggest impact is visible in the code that allocates and assign max-type variables (my first test), but the impact is visible on every operation.

In the article he shows several examples that demonstrate that using varchar(n) typically improves performance.

You can find the entire article here.

What is "*;1" in TADOStoredProc.ProcedureName value in Delphi?

8 votes

You may specify TADOStoredProc.ProcedureName in Delphi with the following value:

MSSQLProcedureName;1

But what does meen ";1" in this value?

Thanks for the help!

This is an optional value that can be used to specify multiple definitions for the same stored procedure name... I think that the original intention was to allow versioning, but I've never seen it used that way in the wild.

When you don't specify the number in the create procedure statement, it defaults to 1. Some of the various data access layers that call SQL Server will explicitly add the ;1 when executing the stored procedure.

From MSDN:

;*number*

Is an optional integer used to group procedures of the same name so they can be dropped together with a single DROP PROCEDURE statement. For example, the procedures used with an application called orders may be named orderproc;1, orderproc;2, and so on. The statement DROP PROCEDURE orderproc drops the entire group. If the name contains delimited identifiers, the number should not be included as part of the identifier; use the appropriate delimiter around procedure_name only.

Tool for detecting copy/pasted code in TSQL?

8 votes

I'm working on a SQL Server 2005 system with a lot of copy-pasted code, lots of repeated calculations instead of functions, many stored procedures that differ only very slightly instead of being parameterized. Also lots of SSIS packages that are very similar.

Moving forward, I'd like to be able to to detect this type of "code smell". There's lots of tools around for this, but I haven't found anything for SQL.

Any ideas on products to use, or best practices to apply regarding copy/pasted code in TSQL?

Many thanks...

EDIT: Thanks for all the feedback. I poked around with a couple of them, but the one that looks most straightforward and easy to use appears to be Atomiq. I had to exported all my objects to .sql files, then had to use a batch file to rename all the files with extension .sql to .cs because it doesn't use .cs. However, I've already gotten some good information on major problems with our code.

You have several for that purpose:

  • Atomiq - commercial
  • Black Duck Suite - commercial (software analyzing suite)
  • CloneDR - commercial (Ada, C, C++, C#, Java, COBOL, Fortran, Python, VB.net, VB6, PHP4/5, PLSQL, SQL2011, XML, many others)
  • ConQAT (Open Source, supports: ABAP, ADA, Cobol, C/C++, C#, Java, PL/I, PL/SQL, Python, Text, Transact SQL, Visual Basic, XML)
  • Simian (software)

Strip special characters in SQL without loop?

7 votes

Is there a way to strip special characters (leaving just alphanumeric) from a string/field in SQL server without a loop / custom function?

So far, the best i've come up with is:

Create Function [dbo].[strip_special](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
    While PatIndex('%[^a-z0-9]%', @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex('%[^a-z0-9]%', @Temp), 1, '')
    Return @TEmp
End

On some servers I don't have the privileges to cread user defined functions so i'd like to be able to achieve the same result without. I also have concerns over the efficiency/performance of the loop (although I guess even a built-in function/method would probably itself use a loop).

Thanks

I assume you have a column that you want replaced, this is how you could do it:

 declare @table table(id int, temp varchar(15))


insert @table values(1, 'abc-.123+')
insert @table values(2, '¤%&(abc-.?=&(/#')

;with t1 as
(
select temp a, id from @table
union all
select cast(replace(a, substring(a, PatIndex('%[^a-z0-9]%', a), 1), '') as varchar(15)), id
from t1
where PatIndex('%[^a-z0-9]%', a) > 0
)
select t2.*, t1.a from t1
join @table t2
on t1.id = t2.id
where PatIndex('%[^a-z0-9]%', a) = 0
option (maxrecursion 0)

Result:

id          temp            a
----------- --------------- ---------------
2           ¤%&(abc-.?=&(/# abc
1           abc-.123+       abc123

Strange behaviour of CASE construction

7 votes

Background : I was trying to get some random 'hex' values while creating dummy data and came up with this construction :

SELECT TOP 100 
   result = (CASE ABS(Binary_Checksum(NewID())) % 16 
                WHEN -1 THEN 'hello'
                WHEN 0 THEN '0' 
                WHEN 1 THEN '1' 
                WHEN 2 THEN '2' 
                WHEN 3 THEN '3'
                WHEN 4 THEN '4' 
                WHEN 5 THEN '5' 
                WHEN 6 THEN '6' 
                WHEN 7 THEN '7'
                WHEN 8 THEN '8' 
                WHEN 9 THEN '9' 
                WHEN 10 THEN 'a' 
                WHEN 11 THEN 'b'
                WHEN 12 THEN 'c' 
                WHEN 13 THEN 'd' 
                WHEN 14 THEN 'e' 
                WHEN 15 THEN 'f' 
                ELSE 'huh'  END)
          FROM sys.objects 

When running this on my SQL Server 2008 R2 instance, I get quite a lot of 'huh' records:

result
------
huh
3
huh
huh
6
8
6

I really don't understand why. What I would expect to happen is :

  • for every record NewID() comes up with a new random value
  • Binary_Checksum() calculates an int based on said value
  • ABS() makes the value positive
  • % 16 returns the remainder of that positive value if it would be divided by 16, which then would be a value between 0 and 15
  • the CASE construction converts the value to a relevant character
  • Since there are WHENs for every value between 0 and 15, the ELSE should never be needed

or at least, that is what I would think should happen... but obviously something goes wrong along the road...

When doing the same thing in a two-step approach (via temp-table), the huh's are gone...

SELECT TOP 100 x = ABS(Binary_Checksum(NewID())) % 16,
               result = 'hello'
  INTO #test
  FROM sys.objects

UPDATE #test 
   SET result = (CASE x WHEN 0 THEN '0' WHEN 1 THEN '1' WHEN 2 THEN '2' WHEN 3 THEN '3'
                        WHEN 4 THEN '4' WHEN 5 THEN '5' WHEN 6 THEN '6' WHEN 7 THEN '7'
                        WHEN 8 THEN '8' WHEN 9 THEN '9' WHEN 10 THEN 'a' WHEN 11 THEN 'b'
                        WHEN 12 THEN 'c' WHEN 13 THEN 'd' WHEN 14 THEN 'e' WHEN 15 THEN 'f' 
                        ELSE 'huh'  END)

SELECT * FROM #test

Anyone who understands this ? As far as I can tell it should give the same result (it IS copy-paste indeed) regardless of me doing it directly or via a temp-table... But obviously something goes wrong if I do it in a single statement.

PS: I don't need a 'fix' for this, I already have a workaround (see below), I merely am hoping someone can explain me why this does what it does.

Workaround :

SELECT TOP 100 result = SubString('0123456789abcdef', 1 + (ABS(Binary_Checksum(NewID())) % 16), 1) 
  FROM sys.objects

I believe that, contrary to the description of the simple CASE expression, that it actually re-evaluates input_expression for each input_expression = when_expression comparison (this would normally be safe, unless, as in this case, there's a non-deterministic function in input_expression)

So, what happens is that it keeps generating different random numbers between 0 and 15 for each comparison, and the huhs come out if, after 16 evaluations/comparisons, it never generated a matching number.


This doesn't generate huhs:

SELECT TOP 100 
    result = (CASE ABS(Binary_Checksum(Value)) % 16 
            WHEN -1 THEN 'hello'
            WHEN 0 THEN '0' 
            WHEN 1 THEN '1' 
            WHEN 2 THEN '2' 
            WHEN 3 THEN '3'
            WHEN 4 THEN '4' 
            WHEN 5 THEN '5' 
            WHEN 6 THEN '6' 
            WHEN 7 THEN '7'
            WHEN 8 THEN '8' 
            WHEN 9 THEN '9' 
            WHEN 10 THEN 'a' 
            WHEN 11 THEN 'b'
            WHEN 12 THEN 'c' 
            WHEN 13 THEN 'd' 
            WHEN 14 THEN 'e' 
            WHEN 15 THEN 'f' 
            ELSE 'huh'  END)
          FROM (select NewID() as Value,* from sys.objects ) so

how to build 2^n changed word from word with length n in sql server

6 votes

I need a function in sql server to build all of changed word in below example; for input word with length n must build 2^n changed word; For example, if the input of the function is

"I"

the output of the function should be

I
-   

the input of the function is

"am"

the output of the function should be

am
-m
a-
--

the input of the function is

"sql"

the output of the function should be

sql
-ql
s-l
sq-
--l
s--
-q-
--- 

You can do this with a numbers table (master..spt_values) and stuff in a loop.

declare @Word varchar(10) = 'sql'

declare @T table
(
  Word varchar(10)
)

insert into @T values (@Word)

while not exists(select *
                 from @T 
                 where Word = replicate('-', len(@Word)))
begin              
  insert into @T(Word)
  select distinct stuff(T.Word, N.number, 1, '-')
  from @T as T
    cross join
       master..spt_values as N
  where N.type = 'P' and
        N.number between 1 and len(@Word) and
        stuff(T.Word, N.number, 1, '-') not in (select Word from @T)
end        

select *
from @T

http://data.stackexchange.com/stackoverflow/q/122334/

Or you can use a reqursive CTE

declare @Word varchar(10) = 'sql'

;with C as
(
  select @Word as Word,
         0 as Iteration
  union all
  select cast(stuff(Word, N.number, 1, '-') as varchar(10)),
         Iteration + 1
  from C
    cross join
       master..spt_values as N
  where N.type = 'P' and
        N.number between 1 and len(@Word) and
        Iteration < len(@Word)
)
select distinct Word
from C

http://data.stackexchange.com/stackoverflow/q/122337/

Update

The recursive CTE version is really slow as pointed out by OP in a comment. Using a word with 7 letters there are 960800 rows returned from the CTE.

What is the correct SQL type to store a .Net Timespan with values > 24:00:00?

5 votes

I am trying to store a .Net TimeSpan in SQL server 2008 R2.

EF Code First seems to be suggesting it should be stored as a Time(7) in SQL.

However TimeSpan in .Net can handle longer periods than 24 hours.

What is the best way to handle storing .Net TimeSpan in SQL server?

I'd store it in the database as a BIGINT and I'd store the number of ticks (eg. TimeSpan.Ticks property).

That way, if I wanted to get a TimeSpan object when I retrieve it, I could just do TimeSpan.FromTicks(value) which would be easy.

Why did creating this new index improve performance so much when existing index included all columns in new index?

5 votes

I have Log and LogItem tables; I'm writing a query to grab some data from both. There are thousands of Logs and each Log can have up to 125 LogItems

The query in question is complicated so I'm skipping it (if someone thinks it's important I can probably post it), but when I ran SSMS Estimated Query plan, it told me a new Non-Clustered index would improve performance up to 100%.

Existing Index: Non-clustered
Key Colums (LogItem): ParentLogID, DateModified, Name, DatabaseModified

Query Plan Recommendation
CREATE NONCLUSTERED INDEX [LogReportIndex]
ON [dbo].[LogItem] ([ParentLogID],[DatabaseModified])

Just for fun, I created this new index and ran the query and much to my surprise, it now takes ~1 second for my query to run, when before it was 10+ seconds.

I assumed that my existing index would cover this new query, so my question is why did creating a new index on the only columns used in my new query improve performance? Should I have an index for each unique combination of columns used in my where clauses?

note: I don't think this is because the SQL Server is caching my results, I ran the query about 25-30 times before I created the index and it consistantly took 10-15 seconds, after the index it is now consistantly ~1 or less.

Order of columns in an index is important. If filtering requires column 1 and 4 from index, the index is not going to help. It's only useful when filtering by the first N consecutive columns.

This is because index is a tree. You can't efficiently select all nodes of the tree where column3 = something, because they are scattered all other the place, belonging to different values of column1 and column2. But if you know column1 and column2 as well, locating the right branch in the tree is a no brainer.

SQL and logical operators and null checks

5 votes

I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:

 ... WHERE the_column IS NULL OR the_column < 10 ...

It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact that's kind-of a bad idea possibly for query optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even if the column value is null. Now, exactly why that'd be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:

 ... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...

(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)

So my questions here are:

  1. Is that really true for SQL Server (or perhaps back-rev SQL Server 2000 or 2005) or am I just nuts?
  2. If so, does the same caveat apply to PostgreSQL? (8.4 if it matters)
  3. What exactly is the issue? Does it have to do with how indexes work or something?

My grounding in SQL is pretty weak.

I don't know SQL Server so I can't speak to that.

Given an expression a L b for some logical operator L, there is no guarantee that a will be evaluated before or after b or even that both a and b will be evaluated:

Expression Evaluation Rules

The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.

Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.

As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan.

As far as an expression of the form:

the_column IS NULL OR the_column < 10

is concerned, there's nothing to worry about since NULL < n is NULL for all n, even NULL < NULL evaluates to NULL; furthermore, NULL is false is a boolean context so

null is null or null < 10

should evaluate to true regardless which sub-expression is evaluated first.

The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:

When it is essential to force evaluation order, a CASE construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:

SELECT ... WHERE x > 0 AND y/x > 1.5;

But this is safe:

SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;

So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE to control the order of evaluation as a CASE is evaluated in order:

Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.

So given this:

case when A then Ra
     when B then Rb
     when C then Rc
     ...

A is guaranteed to be evaluated before B, B before C, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.

In summary, a CASE short-circuits buts neither AND nor OR short-circuit so you only need to use a CASE when you need to protect against side effects.

Round up or down to closest 20

5 votes

As the question suggests, I need to round up or down to closest 20, with the exception of numbers from 0-14, all must round to 20, because no zero values are allowed.The table has not to be updated with new values.For now I have managed only rounding up or down to the nearest 100 as seen below:

 CASE WHEN ROUND(number,-2)=0 THEN CAST(ROUND(number,-2)+100 AS DECIMAL(18, 0)) 
      ELSE CAST(ROUND(number,-2) AS DECIMAL(18,0)) END AS [NUMBER]

Try this:

DECLARE @testValue Int = 35;

SELECT
  CASE
    WHEN @testValue BETWEEN 0 AND 14 THEN 20
    ELSE ROUND((@testValue * 1.0) / 20, 0) * 20
  END MyRoundedValue;

There may be a more optimal approach, but this was the first solution that came to mind. The * 1.0 is to force conversion to a floating point number so that the / 20 will produce fractional results. The result is rounded using ROUND then multiplied back by 20 to get your final "rounded" value.

Edited to take your special case of 0-14 into account...

How can I import into a join table in a set based statement?

5 votes

I am importing data from a flat file into a normalized table structure. I am currently using cursors to do inserts into related tables so I have the primary keys to insert into the join table. Can I do this in a set based way in SQL Server 2008 R2?

I have 3 tables: contacts, phones, and contactPhones. After running the import I'd like there to be 2 contacts in the contact table, 2 in the phones table, and 2 in the contactPhones table. The real import is considerably more complicated, but getting this to work will let me migrate the real import from cursors to a set based solution.

It seems like the merge or output keywords should be able to do what I want but I haven't been able to get the syntax to work.

Here is a code sample trying it with OUTPUT. I got this to almost work, except I couldn't reference import.contactId.

create table import(contactId int  identity, phone varchar(50), name varchar(10))
create table contacts (contactId int identity, name varchar(50))
create table contactPhone (contactId int, phoneId int)
create table Phones (phoneId int identity, number varchar(10))

go
insert into import (phone, name)
    select '1872', 'dave'
    union (select '9110', 'Jordan')

insert into contacts
    select name from import
insert into Phones (number)
    OUTPUT import.contactId, INSERTED.phoneId into contactPhone
    select phone from import

select * from contactPhone

Here is a code sample trying it with merge:

create table import(contactId int  identity, phone varchar(50), name varchar(10))
create table contacts (contactId int identity, name varchar(50))
create table contactPhone (contactId int, phoneId int)
create table Phones (phoneId int identity, number varchar(10))

go
insert into import (phone, name)
    select '1872', 'dave'
    union (select '9110', 'Jordan')

insert into contacts
    select name from import

MERGE phones target
    USING (select import.contactId, import.phone, import.name 
            from import join contacts on import.contactId = contacts.contactId) as source
    ON (target.contactId = source.contactId)
    WHEN MATCHED THEN 
        insert into Phones (number)
            OUTPUT import.contactId, INSERTED.phoneId into contactPhone
            select phone from import
    WHEN NOT MATCHED THEN   
        INSERT (name)
        VALUES (source.Name)
        OUTPUT INSERTED.*;



select * from contactPhone

Use merge on contacts and Phones and store the output in a table variable to be used when you insert into contactPhone.

insert into import (phone, name)
select '1872', 'dave' union all
select '9110', 'Jordan'

declare @ContactIDs table(SourceID int primary key, TargetID int)
declare @PhoneIDs table (SourceID int primary key, TargetID int)

merge contacts as c
using import as i
on 0 = 1
when not matched then
  insert (name) values (i.name)
output i.contactId, inserted.contactId into @ContactIDs;

merge Phones as p
using import as i
on 0 = 1
when not matched then
  insert (number) values (i.phone)
output i.contactId, inserted.phoneId into @PhoneIDs;

insert into contactPhone(contactId, phoneId)
select c.TargetID, p.TargetID
from import as i
  inner join @ContactIDs as c
    on i.contactID = c.SourceID
  inner join @PhoneIDs as p
    on i.contactID = p.SourceID  

Using merge..output to get mapping between source.id and target.id

http://data.stackexchange.com/stackoverflow/qt/122662/

find the nearest location in ms-sql

5 votes

I send these paramaters my script : Latitude : 41.0186 Longitude : 28.964701 (it is sample). i want to find nearest location's name. how to do this? (query's where code's must be changed)

Sql Query :

   SELECT  Name FROM Location 
   WHERE Latitude = 41.0186 AND longitude= 28.964701

Location table likes this: (in real, this is huge table)

Latitude         longitude          Name
41.0200500000   40.5234490000        a
41.0185714000   37.0975924000        b
41.0184913000   34.0373739000        c
41.0166667000   39.5833333000        d
41.0166667000   28.9333333000        e

Use this function

CREATE FUNCTION dbo.DictanceKM(@lat1 FLOAT, @lat2 FLOAT, @lon1 FLOAT, @lon2 FLOAT)
RETURNS FLOAT 
AS
BEGIN

    RETURN ACOS(SIN(PI()*@lat1/180.0)*SIN(PI()*@lat2/180.0)+COS(PI()*@lat1/180.0)*COS(PI()*@lat2/180.0)*COS(PI()*@lon2/180.0-PI()*@lon1/180.0))*6371
END

You may order by this function, BUT on large datasets it will be very slow, so try to prefilter the recordset

UPD:

Using @chopikadze's test data:

declare @lat float, @lng float
select @lat = 41.0186, @lng = 28.964701

declare @Location table(Latitude float, Longtitude float, Name nvarchar(50))
insert into @Location(Latitude, Longtitude, Name) values (41.0200500000, 40.5234490000, 'a')
insert into @Location(Latitude, Longtitude, Name) values (41.0185714000, 37.0975924000, 'b')
insert into @Location(Latitude, Longtitude, Name) values (41.0184913000, 34.0373739000, 'c')
insert into @Location(Latitude, Longtitude, Name) values (41.0166667000, 39.5833333000, 'd')
insert into @Location(Latitude, Longtitude, Name) values (41.0166667000, 28.9333333000, 'e')

SELECT ABS(dbo.DictanceKM(@lat, Latitude, @lng, Longtitude)) DistanceKm, * FROM @Location
ORDER BY ABS(dbo.DictanceKM(@lat, Latitude, @lng, Longtitude))

Assuming that the Earth is NOT a geoid, but the round ball, if you need under 1m exact formula - I can find it, don't have it with me

Why does SQL Server force me to drop the table to be able to alter a field from DateTime to DateTime2(3)?

5 votes

I am trying to alter a table field - with some rows in it - from DateTime to DateTime2(3).

But the SQL Server Management Studio complains that I have drop and re-create the table.

But why?

Isn't DateTime2(3) has more precision than DateTime type? It should be fine, should not it be?

There is a setting in SSMS that will allow you to do what you want.. Menu-Tools-Options-Designers-Prevent saving changes that require table re-creaction.

SSMS has a habit of recreating almost any changes you do. It should be just fine to only alter the column data type with something like this.

alter table TableName alter column ColName datetime2(3)