Best SQL questions in April 2012

Why do SELECT results differ between MySQL and SQLite?

19 votes

I'm re-asking this question in a simplified and expanded manner.

Consider these sql statements:

create table foo (id INT, score INT);

insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Using sqlite, the select statement returns:

id          avg1      
----------  ----------
106         4.5       
107         4.0       

and mysql returns:

+------+--------+
| id   | avg1   |
+------+--------+
|  106 | 4.5000 |
+------+--------+

As far as I can tell, MySQL's result is correct and SQLite's is incorrect. I tried casting to REAL in SQLite, as in the following, but it still returns two records:

select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Why does sqlite return two records?

Quick update:

I ran the statement against the latest sqlite version (3.7.11) and still get two records.

Another update:

I sent an email to sqlite-users@sqlite.org about the issue.

Meanwhile, I've been playing with the VDBE and found something interesting. I split the execution trace into one part per iteration of the NOT EXISTS subquery (one for each avg group).

To have three avg groups, I used the following statements:

create table foo (id VARCHAR(1), score INT);

insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);

PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;

select avg(score) avg1
from foo
group by id
having not exists (
    select avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

We clearly see that somehow what should be r:4.5 has become i:5:

[Figure: VDBE execution trace]

I'm now trying to see why that is.

Final edit:

So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:

http://www.sqlite.org/src/info/430bb59d79

Interestingly, to me at least, it seems that newer versions (released some time after the version I'm using) support inserting multiple records in one statement, as used in a test case added in the aforementioned commit:

CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);  

I tried some variants of the query.

It seems that SQLite mishandles references to previously declared column aliases inside a nested HAVING expression.

In your example, avg1 inside the second HAVING is always equal to 5.0.

Look:

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);

This one returns nothing, but execution of the following query returns both records:

...
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);

I cannot find any similar bug in the SQLite ticket list.

How to optimize this in MySQL?

11 votes

I have the table structure displayed in the first table.

I want to fetch both the male and female counts in a single query, so that only one request goes to the server.

How can I optimize this in MySQL?

This is what you need to do:

select gender,
       count(case when age between 0 and 20 then 1 else null end) Age_0_20,
       count(case when age between 21 and 40 then 1 else null end) Age_21_40
from yourtable
group by gender

Adjust accordingly :)

Update, with clarifications

Note that the COUNT aggregate function only counts non-null values. Thus, the ELSE value in the CASE must be NULL. The WHEN branch returns 1, but it could be any non-null value.

Some people implement this by using SUM:

select gender,
       sum(case when age between 0 and 20 then 1 else 0 end) Age_0_20,
       sum(case when age between 21 and 40 then 1 else 0 end) Age_21_40
from yourtable
group by gender

The result is going to be absolutely the same.
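To check that claim quickly, here is a minimal sketch using Python's built-in sqlite3 module in place of MySQL. The sample rows are made up for illustration, since the original table was not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (gender TEXT, age INTEGER)")
# Hypothetical rows, invented for this example.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [("M", 10), ("M", 25), ("M", 30), ("F", 18), ("F", 19), ("F", 35)],
)

# COUNT skips NULLs, so non-matching rows must map to NULL.
count_sql = """
    SELECT gender,
           COUNT(CASE WHEN age BETWEEN 0 AND 20 THEN 1 ELSE NULL END) AS Age_0_20,
           COUNT(CASE WHEN age BETWEEN 21 AND 40 THEN 1 ELSE NULL END) AS Age_21_40
    FROM yourtable
    GROUP BY gender
    ORDER BY gender
"""
# SUM adds everything up, so non-matching rows must map to 0.
sum_sql = """
    SELECT gender,
           SUM(CASE WHEN age BETWEEN 0 AND 20 THEN 1 ELSE 0 END) AS Age_0_20,
           SUM(CASE WHEN age BETWEEN 21 AND 40 THEN 1 ELSE 0 END) AS Age_21_40
    FROM yourtable
    GROUP BY gender
    ORDER BY gender
"""

count_rows = conn.execute(count_sql).fetchall()
sum_rows = conn.execute(sum_sql).fetchall()
print(count_rows)              # [('F', 2, 1), ('M', 1, 2)]
print(count_rows == sum_rows)  # True
```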

How to log someone trying to make sql injection

10 votes

There are a lot of answers here on how to secure your code against SQL injection attacks. What I need, though, is a way to log an SQL injection attempt so that we can add the attacker to a database of blacklisted users.

What I need here is a function that returns true if the input contains an SQL injection.

<?php
if(isset($_POST['username'])){
// need a function here which will return true if there's
// a sql injection else false
}
?>

You can use PHP-IDS to detect security attacks (not just SQL injection) and add custom behavior. In my case I run PHP-IDS at the start of every request. If an issue is detected, I log to the database, return a generic error message to the user and die().

Be warned though that PHP-IDS will not detect all SQL injection issues. It's not possible to do that automatically. You still need to properly handle your queries.
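As an illustration of what handling queries properly usually means, here is a minimal sketch of a parameterized query. It uses Python's sqlite3 module rather than PHP, and the users table and find_user helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")


def find_user(conn, username):
    # The ? placeholder sends the value separately from the SQL text,
    # so quotes inside the value cannot change the statement.
    cur = conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    )
    return cur.fetchall()


hostile = "x' OR '1'='1"
matches_hostile = find_user(conn, hostile)  # treated as a literal name
matches_alice = find_user(conn, "alice")
print(matches_hostile)  # []
print(matches_alice)    # [('alice',)]
```

With placeholders the hostile input is harmless whether or not a detector flags it, which is why detection works best as an extra layer rather than the only defence.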

Fast(er) method for wildcard searching of 250K+ strings

9 votes

I have an English dictionary in a MySQL database with just over 250K entries, and I'm using a simple ruby front-end to search it using wildcards at the beginning of the strings. So far I've been doing it like this:

SELECT * FROM words WHERE word LIKE '_e__o'

or even

SELECT * FROM words WHERE word LIKE '____s'

I always know the exact length of the word, but all but a single character are potentially unknown.

This is slower than molasses, about fifteen times slower than a similar query without the leading wildcard because the index for the column cannot be used.

I've tried a few methods to narrow the scope of the search. For example, I've added 26 additional columns containing each word's individual letter counts and narrow the search using those first. I've also tried narrowing by word length. These methods made almost no difference, thanks to the inherent inefficiency of leading-wildcard searches. I've experimented with the REGEXP statement, which is even slower.

SQLite and PostgreSQL are just as limited as MySQL, and though I have limited experience with NoSQL systems, my research gives me the impression that they excel at scalability, not performance of the kind I need.

My question then, is where should I look for a solution? Should I continue trying to find a way to optimize my queries or add supplementary columns that can narrow my potential recordset? Are there systems designed specifically to accomplish fast wildcard searching in this vein?

With PostgreSQL 9.1 and the pg_trgm extension you can create indexes that are usable for a like condition you are describing.

For an example see here: http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

I verified it on a table with 300k rows using LIKE '____1' and it does use such an index. It took about 120ms to count the number of rows in that table (on an old laptop). Interestingly enough, the expression LIKE 'd___1' is not faster; it's about the same speed.

It also depends on the number of characters in the search term: the longer it gets, the slower it will be, as far as I can tell.

You would need to check with your data if the performance is acceptable.

What is the meaning of the prefix N in T-SQL statements?

8 votes

I have seen the prefix N in some T-SQL insert queries. Many people put N before a value when inserting it into a table.

I searched, but I was not able to understand the purpose of putting N before strings inserted into a table.

 INSERT INTO Personnel.Employees
 VALUES(N'29730', N'Philippe', N'Horsford', 20.05, 1),

It's declaring the string as the nvarchar data type, rather than varchar.

You may have seen Transact-SQL code that passes strings around using an N prefix. This denotes that the subsequent string is in Unicode (the N actually stands for National language character set). Which means that you are passing an NCHAR, NVARCHAR or NTEXT value, as opposed to CHAR, VARCHAR or TEXT.

http://databases.aspfaq.com/general/why-do-some-sql-strings-have-an-n-prefix.html


If you want to know the difference between these two data types, see this SO post:

What is the difference between varchar and nvarchar?

Delphi: how to pass a list as a parameter to a SQL query?

8 votes

I have a list of integers or of strings and need to pass it as a parameter for a Delphi DataSet. How to do it?

Here is an example. MyQuery is something like:

select * from myTable where intKey in :listParam

I'd set a parameter as a list or array or something else:

MyQuery.ParamByName('listParam').AsSomething := [1,2,3];

and it would result in this query sent to the sql server:

select * from myTable where intKey in (1, 2, 3)

It would be even better if the solution would also work with strings, making this query:

select * from myTable where stringKey in :listParam

become:

select * from myTable where stringKey in ('a', 'b', 'c')

I believe this is a simple question, but "IN" isn't a good keyword for searching the web.

Please answer how I should configure the parameter in the IDE, the query and how to pass the parameters.

I'm using Delphi 7.

Edited: I'm considering the answer to be "it isn't possible to do directly". If someone gives me a non-hackish answer, the accepted answer will be changed.

AFAIK, it is not possible directly.

You'll have to convert the list into a SQL list in plain text.

For instance:

function ListToText(const Args: array of string): string; overload;
var i: integer;
begin
  result := '(';
  for i := 0 to high(Args) do 
    result := result+QuotedStr(Args[i])+',';
  result[length(result)] := ')';
end;


function ListToText(const Args: array of integer): string; overload;
var i: integer;
begin
  result := '(';
  for i := 0 to high(Args) do 
    result := result+IntToStr(Args[i])+',';
  result[length(result)] := ')';
end;

To be used as such:

SQL.Text := 'select * from myTable where intKey in '+ListToText([1,2,3]);
SQL.Text := 'select * from myTable where stringKey in '+ListToText(['a','b','c']);
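A common variation on the same idea is to generate one placeholder per list item, so the values still travel as bound parameters. The sketch below shows it in Python with sqlite3, not Delphi; in_clause is a hypothetical helper and the sample rows are invented.

```python
import sqlite3


def in_clause(values):
    # Builds "(?, ?, ?)" with one placeholder per value.
    if not values:
        raise ValueError("IN () is not valid SQL; handle an empty list separately")
    return "(" + ", ".join("?" for _ in values) + ")"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (intKey INTEGER, stringKey TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

int_keys = [1, 2, 3]
int_rows = conn.execute(
    "SELECT intKey FROM myTable WHERE intKey IN "
    + in_clause(int_keys)
    + " ORDER BY intKey",
    int_keys,
).fetchall()

string_keys = ["a", "c"]
string_rows = conn.execute(
    "SELECT stringKey FROM myTable WHERE stringKey IN "
    + in_clause(string_keys)
    + " ORDER BY stringKey",
    string_keys,
).fetchall()

print(int_rows)     # [(1,), (2,), (3,)]
print(string_rows)  # [('a',), ('c',)]
```

Only the placeholder list is concatenated into the SQL text; the values themselves are never quoted by hand.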

How expensive are MySQL events?

8 votes

In my web app I use two recurring events that "clean up" one of the tables in the database, both executed every 15 minutes or so.

My question is: could this lead to performance problems in the future? I've read somewhere (I don't recall exactly where) that MySQL events are supposed to be scheduled to run once a month or so. The thing is, these same events keep the table fairly small (they delete records older than ~15 minutes), so maybe that compensates for how often they run, right?

Also, is it better to have one big MySQL event or many small ones if they are called at the same frequency?

I don't think the once-a-month figure is a performance limit; it reads more like a suggestion of what events can be used for. So I think you're fine doing your cleanup with events.

After all, the documentation describes events like this:

Conceptually, this is similar to the idea of the Unix crontab (also known as a “cron job”) or the Windows Task Scheduler.

And the concept for those is that you can run a task every minute if you wish to do so.


On the second part of that question:

Serialize them or spread them out. If you split the work into many events that all run at the same time, you will create spikes of possibly very high CPU usage that might slow down the application while the events are processed.

So either pack everything into one event so the steps run in succession, or stagger the individual events so they execute at different times within the 15-minute window. Personally I prefer the first option: packed into a single event, the steps are guaranteed to run in succession, even if one of them runs longer than usual.

The same goes for cron jobs. If you schedule 30 long-running exports at the same time, your application is going to fail miserably during that time slot (I learned that the hard way).

Simple SQL Join Understanding?

8 votes

Possible Duplicate:
Explicit vs implicit SQL joins
Is there a difference using join and select from multi-tables?
SQL Joins: Future of the SQL ANSI Standard (where vs join)?

What is the difference between JOIN and declaring multiple tables in the FROM clause?

Such as:

SELECT *
FROM  table1 AS t1
      , table2 AS t2
WHERE t1.id = t2.id

Compared to:

SELECT *
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t2.id = t1.id

The second version, with the explicit JOIN and join condition, is the ANSI SQL-92 join syntax.

The implicit join syntax with a WHERE clause is still valid SQL, but it is generally considered bad practice, partly because it is easy to forget the WHERE clause and cause a Cartesian product.
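The Cartesian-product risk is easy to demonstrate. The following sketch uses Python's sqlite3 module with two tiny made-up tables, and compares both join styles with the implicit form once its WHERE clause is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER)")
# Hypothetical rows: three in table1, two in table2.
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(2,), (3,)])

implicit = conn.execute(
    "SELECT t1.id, t2.id FROM table1 AS t1, table2 AS t2 "
    "WHERE t1.id = t2.id ORDER BY t1.id"
).fetchall()
explicit = conn.execute(
    "SELECT t1.id, t2.id FROM table1 AS t1 "
    "INNER JOIN table2 AS t2 ON t2.id = t1.id ORDER BY t1.id"
).fetchall()
# Without the WHERE clause the implicit form still runs,
# but returns every pairing of rows.
cartesian = conn.execute(
    "SELECT t1.id, t2.id FROM table1 AS t1, table2 AS t2"
).fetchall()

print(implicit == explicit)  # True
print(len(cartesian))        # 6
```

With INNER JOIN, leaving out the ON condition is a syntax error in most engines (though not all), so the mistake tends to surface immediately.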

SQL query that selects effective costing rate based on charge date

7 votes

SQL novice here. I'm trying to generate a costing query that outputs employee time card information and calculates cost based on an effective employee costing rate.

My question is similar to the one asked here: Retro-active effective date changes with overlapping dates but I'm not dealing with retro-activity or overlapping date ranges.

Table examples (null values in the rate table indicate current rate):

CREATE TABLE Emp_Rate
(
    Emp int,
    Rate money,
    Rate_Start datetime,
    Rate_Exp datetime
)

CREATE TABLE Emp_Time
(
    Emp int,
    Chrg_Date datetime,
    Chrg_Code varchar(10),
    Chrg_Hrs decimal(8, 2)
)

Insert into Emp_Rate (Emp,Rate,Rate_Start,Rate_Exp) Values ('1','20','5/1/09','4/30/10')
Insert into Emp_Rate (Emp,Rate,Rate_Start,Rate_Exp) Values ('1','21','5/1/10','4/30/11')
Insert into Emp_Rate (Emp,Rate,Rate_Start,Rate_Exp) Values ('1','22','5/1/11',NULL)

Insert into Emp_Time (Emp,Chrg_Date,Chrg_Code,Chrg_Hrs) Values ('1','5/10/09','B','8')
Insert into Emp_Time (Emp,Chrg_Date,Chrg_Code,Chrg_Hrs) Values ('1','5/10/10','B','8')
Insert into Emp_Time (Emp,Chrg_Date,Chrg_Code,Chrg_Hrs) Values ('1','5/10/11','B','8')

The query (returns dupes caused by multiple rate entries(obviously)):

Select  Emp_Time.Emp,
        Cast(Emp_Time.Chrg_Date as DATE) as 'Chrg_Date',
        Emp_Time.Chrg_Code,
        Emp_Time.Chrg_Hrs,
        Emp_Rate.Rate,
        Emp_Time.Chrg_Hrs * Emp_Rate.Rate as 'Cost'

From    Emp_Time inner join
        Emp_Rate on Emp_Rate.Emp = Emp_Time.Emp

Order By [Emp],[Chrg_Date]

Desired output:

Emp Chrg_Date   Chrg_Code   Chrg_Hrs    Rate    Cost
1   2009-05-10  B           8.00        20.00   160.00
1   2010-05-10  B           8.00        21.00   168.00
1   2011-05-10  B           8.00        22.00   176.00

I've gone around in circles using the Between operator in a sub query to isolate the correct rate based on the charge date, but have not had any luck.

I appreciate any help!

You didn't specify the DBMS type; the answer below is for SQL Server. I am sure there are other ways to do this, but this way replaces a null Rate_Exp date with the current date.

Select  et.Emp,
        Cast(et.Chrg_Date as DATEtime) as 'Chrg_Date',
        et.Chrg_Code,
        et.Chrg_Hrs,
        er.Rate,
        et.Chrg_Hrs * er.Rate as 'Cost'
From  Emp_Time et
inner join 
(
    SELECT Emp
        , Rate
        , Rate_Start
        , CASE
            WHEN Rate_Exp is Null
            THEN Convert(varchar(10), getdate(), 101)
            ELSE Rate_Exp
          END as Rate_Exp
    FROM Emp_Rate 
)er 
    on er.Emp = et.Emp
WHERE (et.Chrg_Date BETWEEN er.Rate_Start AND er.Rate_Exp)
Order By et.Emp,et.Chrg_Date

OR use the CASE Statement in your WHERE Clause:

Select  et.Emp,
        Cast(et.Chrg_Date as DATEtime) as 'Chrg_Date',
        et.Chrg_Code,
        et.Chrg_Hrs,
        er.Rate,
        et.Chrg_Hrs * er.Rate as 'Cost'
From  Emp_Time et
inner join Emp_Rate er
    on er.Emp = et.Emp
WHERE (et.Chrg_Date 
        BETWEEN er.Rate_Start 
                AND CASE WHEN er.Rate_Exp Is Null 
            THEN  Convert(varchar(10), getdate(), 101)
            ELSE er.Rate_Exp END)
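The same idea can be tried outside SQL Server. The sketch below is a rough equivalent in Python with sqlite3, under two assumptions of mine: dates are stored as ISO text, and COALESCE with SQLite's date('now') stands in for the CASE / getdate() expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp_Rate (Emp INTEGER, Rate REAL, Rate_Start TEXT, Rate_Exp TEXT)"
)
conn.execute(
    "CREATE TABLE Emp_Time (Emp INTEGER, Chrg_Date TEXT, Chrg_Code TEXT, Chrg_Hrs REAL)"
)
# Same sample data as the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO Emp_Rate VALUES (?, ?, ?, ?)",
    [
        (1, 20, "2009-05-01", "2010-04-30"),
        (1, 21, "2010-05-01", "2011-04-30"),
        (1, 22, "2011-05-01", None),  # NULL expiry marks the current rate
    ],
)
conn.executemany(
    "INSERT INTO Emp_Time VALUES (?, ?, ?, ?)",
    [
        (1, "2009-05-10", "B", 8),
        (1, "2010-05-10", "B", 8),
        (1, "2011-05-10", "B", 8),
    ],
)

costs = conn.execute(
    """
    SELECT et.Emp, et.Chrg_Date, et.Chrg_Code, et.Chrg_Hrs, er.Rate,
           et.Chrg_Hrs * er.Rate AS Cost
    FROM Emp_Time et
    INNER JOIN Emp_Rate er
        ON er.Emp = et.Emp
       AND et.Chrg_Date BETWEEN er.Rate_Start
                            AND COALESCE(er.Rate_Exp, date('now'))
    ORDER BY et.Emp, et.Chrg_Date
    """
).fetchall()

for row in costs:
    print(row)
```

Each charge now matches exactly one rate row, so the duplicates from the original query disappear.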

SQL Query for a particular scenario

Asked on Sun, 29 Apr 2012 by jaja sql
7 votes

Considering the sample table below:

Profile_ID  Child_ID

1           1
1           2
1           3

2           1
2           2

3           1
3           2
3           3
3           5

I want to construct an SQL query which gives me the profile IDs which are applicable to children 1 AND 2 AND 3... this implies that profile_ID 2 will not be in the result as profile 2 is only applied to children 1 and 2.. but I expect that profile 3 is included.

My idea is something similar to this

SELECT Profile_ID
FROM table
WHERE Child_ID IN (1 AND 2 AND 3)

Now, I am well aware that IN serves as an OR function... Is there a similar way to achieve similar functionality to the IN clause with an AND rather than an OR?

Thanks in advance for all your help.

Try this:

select profile_id from t
where child_id in (1, 2 ,3)
group by profile_id
having count(distinct child_id) = 3

Here is the fiddle to play with.

NOTE: Make sure 3 matches the number of items in the IN clause.
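One way to keep the count and the list in sync is to derive both from the same list. This is a minimal sketch in Python with sqlite3, using the sample data from the question; the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (profile_id INTEGER, child_id INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3),
     (2, 1), (2, 2),
     (3, 1), (3, 2), (3, 3), (3, 5)],
)

children = [1, 2, 3]
wanted = sorted(set(children))  # duplicates would break the count check
marks = ", ".join("?" for _ in wanted)
sql = f"""
    SELECT profile_id
    FROM t
    WHERE child_id IN ({marks})
    GROUP BY profile_id
    HAVING COUNT(DISTINCT child_id) = ?
    ORDER BY profile_id
"""
# The last parameter is the required number of distinct matches.
profiles = [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
print(profiles)  # [1, 3]
```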

INSTEAD OF TRIGGER, Would it infinitely loop?

6 votes

Would an Insert Statement on a table that has an "INSTEAD OF" trigger cause an infinite "instead, insert" loop of executions?

For example this:

CREATE TRIGGER setDescToUpper ON part_numbers
INSTEAD OF INSERT

AS
BEGIN
    INSERT INTO part_numbers (
        colA,
        colB,
        part_description
    ) SELECT
        colA,
        colB,
        UPPER(part_description)
    FROM
        INSERTED
END
GO

Would the insert statement inside the "instead of" trigger cause a loop?

I don't want to disable recursive triggers.

Do I need to temporarily disable the trigger?

Source: SQL Server - after insert trigger - update another column in the same table

This INSERT will not re-invoke the trigger.

SQL Server will not call INSTEAD OF triggers recursively, for exactly the reason you bring up.

Calculate sum of column for selected Ids in SQL

6 votes

These are my tables:

  • Member: Id, Points
  • CartRegister : Id, Member_Id, CartId, RegisterDate, Point
  • SelectedMembers: Id, Member_Id

Members can register carts in CartRegister, and Member.Points must hold the total of all points a member has earned. So I need to calculate the total points of each member in SelectedMembers and update the Member table, but I don't know how to implement it.

The following script is in my head:

UPDATE [Member]
   SET [Points]=
    (
       SELECT SUM([CR].[Point]) AS [AllPoints]
       FROM  [CartRegister] AS [CR] 
       WHERE [CR].[Member_Id] = --???

    )
    WHERE [Members].[Member].[Id] IN  ( SELECT Member_Id From SelectedMembers  )

So I'm confused about what the WHERE clause in the SELECT SUM(Point) subquery should be. If I use

        WHERE [CR].[Member_Id] IN ( Select Member_Id From SelectedMembers  )

then every member gets the same value, the sum of the points of all selected members. Maybe I need something like a foreach? What is your suggestion?

Check this:

UPDATE [Member]
SET [Points]=
(
   SELECT SUM([CR].[Point]) AS [AllPoints]
   FROM  [CartRegister] AS [CR] 
   WHERE [CR].[Member_Id] = [Member].[Id]

)
WHERE [Members].[Member].[Id] IN  ( SELECT Member_Id From SelectedMembers  )
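To see the correlated subquery at work, here is a small sketch in Python with sqlite3 instead of SQL Server. The rows are invented, and the COALESCE is my own addition: without it, a selected member with no CartRegister rows would get NULL rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (Id INTEGER PRIMARY KEY, Points INTEGER)")
conn.execute(
    "CREATE TABLE CartRegister (Id INTEGER PRIMARY KEY, Member_Id INTEGER, Point INTEGER)"
)
conn.execute(
    "CREATE TABLE SelectedMembers (Id INTEGER PRIMARY KEY, Member_Id INTEGER)"
)
# Hypothetical data: members 1 and 2 are selected, member 3 is not.
conn.executemany("INSERT INTO Member VALUES (?, ?)", [(1, 0), (2, 0), (3, 0)])
conn.executemany(
    "INSERT INTO CartRegister VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 5), (3, 2, 7), (4, 3, 100)],
)
conn.executemany("INSERT INTO SelectedMembers VALUES (?, ?)", [(1, 1), (2, 2)])

# The subquery is evaluated once per updated row, with Member.Id bound to that row.
conn.execute(
    """
    UPDATE Member
    SET Points = (
        SELECT COALESCE(SUM(CR.Point), 0)
        FROM CartRegister AS CR
        WHERE CR.Member_Id = Member.Id
    )
    WHERE Member.Id IN (SELECT Member_Id FROM SelectedMembers)
    """
)

points = conn.execute("SELECT Id, Points FROM Member ORDER BY Id").fetchall()
print(points)  # [(1, 15), (2, 7), (3, 0)]
```

Member 3 keeps its old value because the outer WHERE clause restricts the update to selected members.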

Select photos by multiple tags

6 votes

I have three MySQL tables - photos, tags and tagsphotos - and an m:n relationship between photos and tags.

Photos:     id | filename | ...
Tags:       id | name
Tagsphotos: photo | tag

I want to select all photos with this condition:

(tagged as "dirty" AND tagged as "road") AND (tagged as "light.front" OR tagged as "light.side") AND (tagged as "perspective.two-point")

...which means that I want to find all pictures with dirty road, in two-point perspective and either with side or front light.

How can I do it? Thanks.

I think you're going to have to join the tags table to the photos table four times... pretty ugly.

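-- DISTINCT: a photo tagged with both light.front and light.side would otherwise appear twice
SELECT DISTINCT Photos.*
FROM
  Photos
  JOIN (
    SELECT Tagsphotos.photo, Tags.name
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t1 ON (t1.photo = Photos.id)
  JOIN (
    SELECT Tagsphotos.photo, Tags.name
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t2 ON (t2.photo = Photos.id)
  JOIN (
    SELECT Tagsphotos.photo, Tags.name
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t3 ON (t3.photo = Photos.id)
  JOIN (
    SELECT Tagsphotos.photo, Tags.name
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t4 ON (t4.photo = Photos.id)
WHERE
      (t1.name = 'dirty' AND t2.name = 'road')
  AND (t3.name = 'light.front' OR t3.name = 'light.side')
  AND (t4.name = 'perspective.two-point')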

Subqueries would probably be faster:

SELECT *
FROM Photos
WHERE
  Photos.id IN (
    SELECT Tagsphotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'dirty'
  )
  AND Photos.id IN (
    SELECT Tagsphotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'road'
  )
  AND Photos.id IN (
    SELECT Tagsphotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'light.front' OR Tags.name = 'light.side'
  )
  AND Photos.id IN (
    SELECT Tagsphotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'perspective.two-point'
  )
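The subquery form also lends itself to being generated from a list of tag groups, where the names inside a group are alternatives (OR) and the groups are combined with AND. A minimal sketch in Python with sqlite3 follows; photos_matching is a hypothetical helper and the sample rows are invented.

```python
import sqlite3


def photos_matching(conn, groups):
    # One IN-subquery per group; a photo must satisfy every group.
    clauses, params = [], []
    for names in groups:
        marks = ", ".join("?" for _ in names)
        clauses.append(
            "Photos.id IN (SELECT Tagsphotos.photo FROM Tagsphotos "
            "JOIN Tags ON Tags.id = Tagsphotos.tag "
            f"WHERE Tags.name IN ({marks}))"
        )
        params.extend(names)
    sql = (
        "SELECT Photos.id FROM Photos WHERE "
        + " AND ".join(clauses)
        + " ORDER BY Photos.id"
    )
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Photos (id INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("CREATE TABLE Tags (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Tagsphotos (photo INTEGER, tag INTEGER)")
conn.executemany(
    "INSERT INTO Photos VALUES (?, ?)",
    [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg"), (4, "d.jpg")],
)
conn.executemany(
    "INSERT INTO Tags VALUES (?, ?)",
    [(1, "dirty"), (2, "road"), (3, "light.front"),
     (4, "light.side"), (5, "perspective.two-point")],
)
conn.executemany(
    "INSERT INTO Tagsphotos VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (1, 5),          # photo 1: all conditions met
     (2, 1), (2, 2), (2, 4),                  # photo 2: no perspective tag
     (3, 1), (3, 3), (3, 5),                  # photo 3: no road tag
     (4, 1), (4, 2), (4, 3), (4, 4), (4, 5)], # photo 4: both light tags
)

matches = photos_matching(
    conn,
    [["dirty"], ["road"], ["light.front", "light.side"], ["perspective.two-point"]],
)
print(matches)  # [1, 4]
```

Photo 4 appears once even though it carries both light tags, because IN only tests membership.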

What's the best way to store different images in the database?

5 votes

I don't really know how to title this question.

What's the best way (regarding database design) to store images for different purposes?

Let me give an example. I have a bunch of user photos and I have another 5 different sets of photos (like user photos, but with no connection to them).

Is it best to store all photos in a single database table and try to reference them from within that table? Or is it best to create a different table for each set of photos?

I can see one benefit from creating multiple tables and that's the cascade delete function for removing the photo when the main object is deleted.

Any other aspects to consider?

Another example could be addresses. A user can have an address, but so can a company or a location. Should I create one table for all addresses and use some sort of index tables to record which address belongs to which object, or have different tables and eliminate the problem?

Sorry again for the bad naming of the question.

How to store large blobs in sql server

Storing large chunks of binary data in SQL Server is not a great approach. It makes your database very bulky to back up, and performance is generally not great. Storing files is usually done on the file system. SQL Server 2008 has out-of-the-box support for FILESTREAM. Microsoft recommends FILESTREAM when the following conditions hold:

  • Objects that are being stored are, on average, larger than 1 MB.
  • Fast read access is important.
  • You are developing applications that use a middle tier for application logic.

In your case I think all points are valid.

Enable on Server

To enable FILESTREAM support on the server use the following statement.

EXEC sp_configure filestream_access_level, 2
RECONFIGURE

Configure the Database

To link a FILESTREAM filegroup to your database, run:

ALTER DATABASE ImageDB ADD FILEGROUP ImageGroup CONTAINS FILESTREAM
ALTER DATABASE ImageDB 
  ADD FILE ( NAME = 'ImageStream', FILENAME = 'C:\Data\Images\ImageStream.ndf')
  TO FILEGROUP ImageGroup

Creating the table

The next step is getting your data in the database with filestream storage:

CREATE TABLE Images
(
    [Id] [uniqueidentifier] ROWGUIDCOL NOT NULL PRIMARY KEY, 
    [CreationDate] DATETIME NOT NULL,
    [ImageFile] VARBINARY(MAX) FILESTREAM NULL
)

For Filestream to work you not only need the FILESTREAM property on a field in the table, but also a field which has the ROWGUIDCOL property.

Inserting Data with TSQL

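Now to insert data in this table you can use T-SQL, here sent from C# through ADO.NET:

using (var conn = new SqlConnection(connString))
using (var cmd = new SqlCommand("INSERT INTO Images VALUES (@id, @date, CAST(@image AS varbinary(max)))", conn))
{
     // Object initializers keep each array element a SqlParameter
     // (an assignment expression would evaluate to the assigned value instead).
     cmd.Parameters.AddRange(new[] {
          new SqlParameter("@id", SqlDbType.UniqueIdentifier) { Value = uId },
          new SqlParameter("@date", SqlDbType.DateTime) { Value = creationDate },
          new SqlParameter("@image", SqlDbType.VarBinary) { Value = imageFile }
      });
     conn.Open();
     // An INSERT returns no result set, so ExecuteNonQuery is the appropriate call.
     cmd.ExecuteNonQuery();
}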

Inserting data using SqlFileStream

There is also an approach that writes the file data to disk using Win32 directly. This gives you streaming access, since SqlFileStream inherits from IO.Stream.

Inserting data using Win32 can be done with, for example, the code below:

    public void InsertImage(string connString, Guid uId, DateTime creationDate, byte[] fileContent)
    {
        using (var conn = new SqlConnection(connString))
        // Insert an empty value (0x) rather than NULL: PathName() yields NULL when the
        // FILESTREAM value is NULL. The OUTPUT clause has to come before VALUES.
        using (var cmd = new SqlCommand(@"INSERT INTO Images (Id, CreationDate, ImageFile)
                                          OUTPUT INSERTED.ImageFile.PathName()
                                          VALUES (@id, @date, 0x)", conn))
        {
            conn.Open();

            using (var transaction = conn.BeginTransaction())
            {
                cmd.Transaction = transaction;
                cmd.Parameters.AddRange(
                    new[] {
                         new SqlParameter("@id", SqlDbType.UniqueIdentifier) { Value = uId },
                         new SqlParameter("@date", SqlDbType.DateTime) { Value = creationDate }
                        }
                    );

                // Path of the newly created (empty) FILESTREAM file.
                var path = (string)cmd.ExecuteScalar();

                cmd.CommandText = "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()";

                var context = (byte[])cmd.ExecuteScalar();

                // Stream the content into the file within the same transaction.
                using (var stream = new SqlFileStream(path, context, FileAccess.ReadWrite))
                {
                    stream.Write(fileContent, 0, fileContent.Length);
                }

                transaction.Commit();
            }
        }
    }

How to model a Photo storage database

With the filestream approach to store the images the table is very narrow which is good for performance since many records can be stored per 8K data page. I would use the following model:

    CREATE TABLE Images
    (
        Id uniqueidentifier ROWGUIDCOL NOT NULL PRIMARY KEY, 
        ImageSet INTEGER NOT NULL 
            REFERENCES ImageSets,
        ImageFile VARBINARY(MAX) FILESTREAM NULL
    )

    CREATE TABLE ImageSets
    (  
        ImageSet INTEGER NOT NULL PRIMARY KEY,
        SetName nvarchar(500) NOT NULL,
        Author INTEGER NOT NULL
            REFERENCES Users(UserId)
    )

   CREATE TABLE Users
   (
        UserId integer not null primary key,
        UserName nvarchar(500),
        AddressId integer not null
             REFERENCES Addresses
   )

   CREATE TABLE Organisations
   (
        OrganisationId integer not null primary key,
        OrganisationName nvarchar(500),
        AddressId integer not null
             REFERENCES Addresses
   )

   CREATE TABLE Addresses
   (
       AddressId integer not null primary key,
       Type nvarchar(10), 
       Street nvarchar(500),
       ZipCode nvarchar(50),
       City nvarchar(500)
   )

   CREATE TABLE OrganisationMembers
   (
       OrganisationId integer not null
          REFERENCES Organisations,
       UserId integer not null
          REFERENCES Users,
       PRIMARY KEY (UserId, OrganisationId)
   )
   CREATE NONCLUSTERED INDEX ixOrganisationMembers on OrganisationMembers(OrganisationId)

This translates to the following entity relationship diagram:

[Figure: entity relationship diagram]

  • Performance-wise, the narrow Images table is very good, as it contains only a few bytes of data per record.
  • We can assume that an image is always a member of an image set. The set information could be hidden if there is only one image in it.
  • I assume you want to track which users are members of which organisations, so I added a table to link them (assuming a user can be a member of multiple organisations).
  • The primary key on the OrganisationMembers table has UserId as its first field, since there are normally a lot more users than organisations and you will probably want to show which organisations a user is a member of more often than the inverse.
  • The index on OrganisationId in OrganisationMembers is there to cater for queries where the list of members of a specific organisation needs to be shown.


How can I query a value in SQL Server XML column

5 votes

I have following XML stored in a XML column (called Roles) in a SQL Server database.

<root>
   <role>Alpha</role>
   <role>Beta</role>
   <role>Gamma</role>
</root>

I'd like to list all rows that have a specific role in them. The role is passed as a parameter.

select
  Roles
from
  MyTable
where
  Roles.value('(/root/role)[1]', 'varchar(max)') like 'StringToSearchFor'

These pages will show you more about how to query XML in T-SQL:

Querying XML fields using t-sql

Flattening XML Data in SQL Server

EDIT

After playing with it a little bit more, I ended up with this amazing query that uses CROSS APPLY. This one will search every row (role) for the value you put in your like expression...

Given this table structure:

create table MyTable (Roles XML)

insert into MyTable values
('<root>
   <role>Alpha</role>
   <role>Gamma</role>
   <role>Beta</role>
</root>')

We can query it like this:

select * from 

(select 
       pref.value('(text())[1]', 'varchar(32)') as RoleName
from 
       MyTable CROSS APPLY

       Roles.nodes('/root/role') AS Roles(pref)
)  as Result

where RoleName like '%ga%'

You can check the SQL Fiddle here: http://sqlfiddle.com/#!3/ae0d5/13

SQL to get X number of accounts from DB, which could be variable number of rows

5 votes

I have a SQL Server table AccountAction which is denormalised. It is a flattened version of the Account and Action tables, which I'm hoping should be a lot quicker for reporting queries over millions of rows. One Account can have many Actions, so the table looks similar to:

Account     Action
account1    action1
account1    action2
account1    action10
account2    action5

However I'm having some trouble getting the information back for a restricted subset in a simple stored procedure.

select Account, Action
from AccountAction
where ???

What I'm looking for is to get the first X accounts, with all their actions. So this will be a dynamic number of rows. So using the example table above if I passed in 1, I would get 3 rows (i.e. give me all rows for the first account).

(I don't mind that the account name will be in each row - it is pivoted elsewhere)

Do I need to use a ROWNUM or similar to restrict the rows? I'm sure this must be a simpler issue than I've found so far.

EDIT

The answers using TOP won't work: in the example I'd want 3 rows returned if I said 'give me one (the first) account'. But how do I know there will be 3? It's dynamic. Also, the rows may not be sequential; what if account1's action99 was at position 55 million in the results?

WITH
  SequencedData
AS
(
  SELECT
    DENSE_RANK() OVER (ORDER BY Account) AS account_sequence_id,
    *
  FROM
    AccountAction
)
SELECT
  *
FROM
  SequencedData
WHERE
  account_sequence_id = ???

Or, for multiples...

WHERE
  account_sequence_id BETWEEN 3 AND 5    -- For the 3rd, 4th and 5th accounts.
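To see how DENSE_RANK turns "the first X accounts" into a variable number of rows, here is a small sketch in Python with sqlite3 standing in for SQL Server. It assumes an SQLite build with window-function support (3.25 or later); first_accounts is a hypothetical helper and the rows are invented.

```python
import sqlite3


def first_accounts(conn, n):
    # Every row of the same account gets the same rank, so filtering on the
    # rank keeps whole accounts regardless of how many actions they have.
    sql = """
        WITH SequencedData AS (
            SELECT DENSE_RANK() OVER (ORDER BY Account) AS account_sequence_id,
                   Account, Action
            FROM AccountAction
        )
        SELECT Account, Action
        FROM SequencedData
        WHERE account_sequence_id <= ?
        ORDER BY Account, Action
    """
    return conn.execute(sql, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AccountAction (Account TEXT, Action TEXT)")
# Rows are deliberately not stored in account order.
conn.executemany(
    "INSERT INTO AccountAction VALUES (?, ?)",
    [("account1", "action1"), ("account2", "action5"), ("account1", "action2"),
     ("account3", "action7"), ("account1", "action10"), ("account3", "action8")],
)

one = first_accounts(conn, 1)
two = first_accounts(conn, 2)
print(len(one), len(two))  # 3 4
```

The ordering here is plain text ordering of the Account column; a real table would need whatever ordering defines "first" for the report.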

What is the proper way to manually sequence a column in Postgres?

4 votes

I have a SaaS pet project for invoicing. In it, I want each of my clients to start with ticket number 1001. Clearly, I can't use a simple auto field in Postgres and just add 1000 to the value, because all my clients share the same database and the same tickets table. I've tried using an integer column and querying (pseudo SQL) SELECT LATEST number FROM tickets WHERE client_id = [current client ID] to get the latest number, then using that number + 1 as the next number. The problem is that with concurrency, two tickets can easily end up with the same number this way. I need to be able to do this within Django or with raw SQL (vs using Bash or anything else of the sort).

I'm not looking for a way to force my example to work. I'm just looking for a solution to the problem of needing independently incrementing ticket numbers for each client.

I don't think there is a "cheap" solution to this problem. The only solution that is safe (but not necessarily fast) in a multi-user environment is to have a "counter" table with one row for each customer.

Each transaction has to first lock the customer's entry before inserting a new ticket, something like this:

UPDATE cust_numbers
  SET current_number = current_number + 1
WHERE cust_id = 42
RETURNING current_number;

That will do three things in one step

  1. increase the current "sequential" number for that customer
  2. lock the row so other transactions doing the same will have to wait for a lock
  3. return the new value of that column.

With that new number you can now insert a new ticket. If the transaction is committed, it will also release the lock on the cust_numbers table, thus other transactions "waiting for a number" can proceed.

You could wrap the two steps (update.. returning & the insert) into a single stored function so that the logic behind this is centralized. Your application would only call select insert_ticket(...) without knowing how the ticket number is generated.
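As a rough illustration of that wrapping, the sketch below puts the counter update and the ticket insert into one transaction using Python and sqlite3. Several things here are assumptions of mine: SQLite stands in for PostgreSQL, a follow-up SELECT replaces RETURNING, the tickets columns are invented, and SQLite locks the whole database for writing rather than a single row.

```python
import sqlite3


def insert_ticket(conn, cust_id, subject):
    # "with conn" commits if the block succeeds and rolls back on an exception,
    # so the counter bump and the ticket insert succeed or fail together.
    with conn:
        conn.execute(
            "UPDATE cust_numbers SET current_number = current_number + 1 "
            "WHERE cust_id = ?",
            (cust_id,),
        )
        (number,) = conn.execute(
            "SELECT current_number FROM cust_numbers WHERE cust_id = ?",
            (cust_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO tickets (cust_id, number, subject) VALUES (?, ?, ?)",
            (cust_id, number, subject),
        )
    return number


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_numbers (cust_id INTEGER PRIMARY KEY, current_number INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE tickets (cust_id INTEGER, number INTEGER, subject TEXT, "
    "UNIQUE (cust_id, number))"
)
# Each customer starts at 1000 so that the first ticket is 1001.
conn.executemany("INSERT INTO cust_numbers VALUES (?, ?)", [(42, 1000), (43, 1000)])
conn.commit()

first = insert_ticket(conn, 42, "printer on fire")
second = insert_ticket(conn, 42, "printer still on fire")
other = insert_ticket(conn, 43, "unrelated")
print(first, second, other)  # 1001 1002 1001
```

The UNIQUE constraint on (cust_id, number) is an extra safety net: if some code path bypasses the helper and reuses a number, the insert fails instead of silently creating a duplicate.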

You might also want to create a trigger on the customer table to automatically insert a row into the cust_numbers table when a new customer is created.

The disadvantage of this is that you effectively serialize the transactions that are inserting new tickets for the same customer. Depending on the volume of inserts in your system, this might turn out to be a performance problem.

Edit
Another disadvantage is that nothing forces you to insert tickets this way, which might lead to problems if, for example, a new developer forgets about it.