Best sql questions in February 2012

sql query kevin bacon number = 2

15 votes

NOTE: To avoid any semblence of academic misconduct on my part, I'd prefer if people didn't post complete/exact solutions, but rather pointed out general things that I'm doing wrong/make general suggestions as to how I should go about this.

using the imdb database, I have tables actor, casts, and movie, and I need to select actors with a kevin bacon number of 2. I thought this should do it, but I'm getting 0 rows returned. What is my error?

select fname, lname
from actor join casts on pid=actor.id
where actor.id in (
    select a3.id  --actors who have a kb number of 2
    from casts c3 join actor a3 on c3.pid=a3.id,
    (
     (select c1.mid --actors who have a kb number of 1
     from (casts c1 join actor a1 on c1.pid=a1.id), (casts c2 join actor a2 on c2.pid=a2.id)
     where c1.mid=c2.mid and a2.fname='Kevin' and a2.lname='Bacon')
    )Level1 where c3.mid=Level1.mid
)
and actor.id not in (select a4.id --and only a kb number of 2
     from (casts c4 join actor a4 on c4.pid=a4.id), (casts c5 join actor a5 on c5.pid=a5.id)
     where c4.mid=c5.mid and a5.fname='Kevin' and a5.lname='Bacon');

Here are the table schemas:

ACTOR (id, fname, lname, gender)

MOVIE (id, name, year)

CASTS (pid, mid, role)

mid is a foreign key to a movie id and pid is a foreign key to actor id.

EDIT: I also tried

select count(distinct pid) from casts join actor on pid=actor.id where mid in (
    select mid from casts where pid in (
        select distinct pid from casts where mid in (
            select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))

and pid not in  
    (select distinct pid from casts where mid in (
        select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon'));

which also seems like it should work, but it's not finishing.

NOTE that restrictions on the question prohibit me from using temp tables or recursion-- the query should be done with subselects.

EDIT:

ok, I'm hoping this last iteration is right:

select count(distinct pid) from casts where mid in (
    select mid from casts where pid in (
        select distinct pid from casts where mid in (
            select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon')))

and pid not in  
    (select distinct pid from casts where mid in (
        select mid from casts join actor on pid=actor.id where fname='Kevin' and lname='Bacon'));

the subqueries return sensible answers, at least. But it's taking FOREVER. Each subquery took under 30 seconds, but together they're taking 6 minutes and counting. Why?

To give a sketch of a solution rather than an exact solution I would use this general approach

SELECT *
FROM   ACTOR
WHERE  id IN (
SELECT id 
       /* ... of actors that have worked on a film worked 
         on by actors that have worked on a KB film*/
EXCEPT
SELECT id
 /* ... of all actors that have worked on a KB film
         including KB himself*/ )

Also as you are not allowed to use recursive CTEs anyway here's an answer using those.

WITH RecursiveCTE
     AS (SELECT C.pid,
                C.mid,
                0 as Level
         FROM   CASTS C
                JOIN ACTOR A
                  ON A.id = C.pid
         WHERE  A.fname = 'Kevin'
                and A.lname = 'Bacon'
         UNION ALL
         SELECT c1.pid,
                c2.mid,
                R.Level + 1
         FROM   RecursiveCTE R
                JOIN CASTS c1
                  ON c1.mid = R.mid
                     AND R.Level < 2
                JOIN CASTS c2
                  ON c1.pid = c2.pid)
SELECT *
FROM   ACTOR
WHERE  id IN (SELECT pid
              FROM   RecursiveCTE
              GROUP  BY pid
              HAVING MIN(Level) = 2)  

Designing a questiion-and-answer system that is flexible and efficient

11 votes

I've been working on a dynamic question-and-answers system, but I'm having trouble creating a efficient AND flexible design for this system. I'd love to know if there's an established design pattern or any recommendations for designing this system.

What I'm trying to do

I have a set of questions. After answering them, another set of questions are shown, depending on the answers to the previous set. This repeats, until no more questions are needed.

The question answers are all boolean, multiple-choice, or numeric.

The important part is that most questions are only shown when a specific set of criteria is met, based on the previous answers.
I need the criteria to support mainly boolean logic, such as And, Or, Not, Equals, Greater Than, and Less Than.

For example, let's say I have already received answers to questions such as Age, Gender, and State.
One of the next questions is In School?, but it should ONLY be displayed if: Age < 30 AND Gender=Male AND (State = CA OR State = NY)

Has anyone heard of a similar design pattern? How would you approach this design?


Background Information

I tried Database columns

At first, we only had 3 initial questions, so we just used 3 columns to filter the second set of questions.

However, our business needs grew and we started needing more initial questions, added more columns, and put more logic within those filters.

This quickly became too rigid and cumbersome.

I tried a Logic Interpreter

Our second attempt to make the system more flexible: store the filtering logic as JavaScript, and run a JavaScript interpreter to filter the results.

This worked pretty well for flexibility, but retrieving thousands of rows from the database and interpreting the scripts was extremely inefficient and performed too poorly for production.

I tried a Hybrid

We finally combined the two approaches, and came up with something feasable.
We first filtered our list based on several hard-coded database columns, and further filtered the list with the JavaScript interpreter.

This hybrid system still has many drawbacks:

  • The logic is in 2 different systems (SQL database logic and JavaScript interpreter)
  • Interpreting the JavaScript is slow, and is probably overkill for the simple boolean logic needed
  • The system is very difficult to maintain, especially because the JavaScript logic must always be written by a developer.

I'd really like to hear suggestions on how to improve this design.

Other Info

My database is MS SQL Server, the backend is .NET C#, and the JavaScript interpreter is JINT. The UI implementation is not important, but is a AJAX enabled website used to ask and answer these questions.

We had to do something similar in the past for a medical system and due to its complexity, we resorted to reuse the rule engine that support multi-classification decision tree. I remember that I came across a nice simple design about this and managed to dig out the link.

http://www.javaworld.com/javaworld/javatips/jw-javatip139.html?page=1

The design is loosely coupled from the data storage, so making it easy to fit into your existing solution design.

Nullable DateTime conversion

10 votes

Possible Duplicate:
c# why cant a nullable int be assigned null as a value

Im trying to convert my reader[3] object which is datetime to be null if there is no lastPostDate for a Forum but it says Im missing a conversion. Error:

Type of conditional expression cannot be determined because there is no implicit conversion between <null> and 'System.DateTime'

public class Forums
{
    public List<Forum> GetForums()
    {
        using (SqlConnection conn = new SqlConnection(ConfigurationManager.ConnectionStrings["CMS"].ConnectionString))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand("sproc_Forums_GetForums", conn);
            cmd.CommandType = CommandType.StoredProcedure;
            SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.Default);

            List<Forum> forums = new List<Forum>();
            while (reader.Read())
            {
                var title = reader[6].ToString();
                var threadCount = (int)reader[5];
                var lastPostTitle = reader[4].ToString();
                // below is where im having a problem
                Nullable<DateTime> lastPostDate = (reader[3] == DBNull.Value ? null : Convert.ToDateTime(reader[3])); 
                var lastPostBy = reader[2].ToString();
                var forumGroup = reader[1].ToString();
                var description = reader[0].ToString();

                Forum forum = new Forum(0, "",DateTime.Now,
                    reader["Title"].ToString(),description,
                    0,false,"","",DateTime.Now,true,
                    forumGroup, (int)threadCount, lastPostBy,
                    lastPostDate, lastPostTitle);
                forums.Add(forum);/**/

            }
            return forums;

        }

    }            
}

Below is my class object for Forum with a Nullable lastPostDate

    public class Forum
{
    public Forum(int forumID, string addedBy, DateTime addedDate, string title, string description, int parentID, bool moderated,
        string imageUrl, string updatedBy, DateTime? updatedDate, bool active, string forumGroup, int threadCount, string lastPostBy,
        Nullable<DateTime> lastPostDate, string lastPostTitle)
    {
        this.ForumID = forumID;
        this.AddedBy = addedBy;
        this.AddedDate = addedDate;
        this.Title = title;
        this.Description = description;
        this.ParentID = parentID;
        this.Moderated = moderated;
        this.ImageUrl = imageUrl;
        this.UpdatedBy = updatedBy;
        this.UpdatedDate = updatedDate;
        this.Active = active;
        this.ForumGroup = forumGroup;
        this.ThreadCount = threadCount;
        this.LastPostBy = lastPostBy;
        this.LastPostDate = lastPostDate;
        this.LastPostTitle = lastPostTitle;
    }

    private int _forumID;
    public int ForumID
    {
        get { return _forumID; }
        set { _forumID = value; }
    }
    private string _addedBy;
    public string AddedBy
    {
        get { return _addedBy; }
        set { _addedBy = value; }
    }
    private DateTime _addedDate = DateTime.Now;
    public DateTime AddedDate
    {
        get { return _addedDate; }
        set { _addedDate = value; }
    }
    private string _title = "";
    public string Title
    {
        get { return _title; }
        set { _title = value; }
    }
    private string _description = "";
    public string Description
    {
        get { return _description; }
        set { _description = value; }
    }
    private int _parentID = 0;
    public int ParentID
    {
        get { return _parentID; }
        set { _parentID = value; }
    }
    private bool _moderated = false;
    public bool Moderated
    {
        get { return _moderated; }
        set { _moderated = value; }
    }
    private string _imageUrl = "";
    public string ImageUrl
    {
        get { return _imageUrl; }
        set { _imageUrl = value; }
    }
    private string _updatedBy = "";
    public string UpdatedBy
    {
        get { return _updatedBy; }
        set { _updatedBy = value; }
    }
    private DateTime? _updatedDate = null;
    public DateTime? UpdatedDate
    {
        get { return _updatedDate; }
        set { _updatedDate = value; }
    }
    private bool _active = false;
    public bool Active
    {
        get { return _active; }
        set { _active = value; }
    }
    private string _forumGroup = "";
    public string ForumGroup
    {
        get { return _forumGroup; }
        set { _forumGroup = value; }
    }
    private int _threadCount = 0;
    public int ThreadCount
    {
        get { return _threadCount; }
        set { _threadCount = value; }
    }
    private string _lastPostBy = "";
    public string LastPostBy
    {
        get { return _lastPostBy; }
        set { _lastPostBy = value; }
    }
    private Nullable<DateTime> _lastPosteDate = null;
    public Nullable<DateTime> LastPostDate
    {
        get { return _lastPosteDate; }
        set { _lastPosteDate = value; }
    }
    private string _lastPostTitle = "";
    public string LastPostTitle
    {
        get { return _lastPostTitle; }
        set { _lastPostTitle = value; }
    }
}

You might want to do it like this:

DateTime? lastPostDate =  (DateTime?)(reader.IsDbNull(3) ? null : reader[3]); 

The problem you are having is that the ternary operator wants a viable cast between the left and right sides. And null can't be cast to DateTime.

When executing a stored procedure, what is the benefit of using CommandType.StoredProcedure versus using CommandType.Text?

10 votes

So in C# to use a stored procedure I have code like the following (connection code omitted):

 string sql = "GetClientDefaults";

 SqlCommand cmd = new SqlCommand(sql);
 cmd.CommandType = CommandType.StoredProcedure;    //<-- DO I NEED THIS??
 cmd.Parameters.AddWithValue("@computerName", computerName);

Where sql is the name of a stored procedure. Now, this code seems to work just fine with and without the commented line.

So, do I need this line? Is there some performance (or other) benefit to setting this? Is there a benefit to NOT setting it or setting it to Text?

According to the tests in this blog post SQL Server will do the parameterization for you, by wrapping your statement in sp_executesql, when you use CommandType.Text. But when you use CommandType.StoredProcedure you will parameterize it and thereby saving the database some work. The latter method is faster.

Edit:

Setup

I've done some tests myself and here are the results.

Create this procedure:

create procedure dbo.Test
(
   @Text1 varchar(10) = 'Default1'
  ,@Text2 varchar(10) = 'Default2'
)
as
begin
   select @Text1 as Text1, @Text2 as Text2
end

Add a trace to it using SQL Server Profiler.

And then call it using the following code:

using System;
using System.Data;
using System.Data.SqlClient;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main()
        {
            CallProcedure( CommandType.Text );
            CallProcedure( CommandType.StoredProcedure );
        }

        private static void CallProcedure(CommandType commandType)
        {
            using ( SqlConnection connection = new SqlConnection("Data Source=localhost;Initial Catalog=Test;Integrated Security=SSPI;") )
            {
                connection.Open();
                using ( SqlCommand textCommand = new SqlCommand("dbo.Test", connection) )
                {
                    textCommand.CommandType = commandType;
                    textCommand.Parameters.AddWithValue("@Text1", "Text1");
                    textCommand.Parameters.AddWithValue("@Text2", "Text2");
                    using ( IDataReader reader = textCommand.ExecuteReader() )
                    {
                        while ( reader.Read() )
                        {
                            Console.WriteLine(reader["Text1"] + " " + reader["Text2"]);
                        }
                    }
                }
            }
        }
    }
}

Results

In both cases the calls are made using RPC.

Here's what the trace reveals using CommandType.Text:

exec sp_executesql N'dbo.Test',N'@Text1 nvarchar(5),@Text2 nvarchar(5)',@Text1=N'Text1',@Text2=N'Text2'

And here is the result using CommandType.StoredProcedure:

exec dbo.Test @Text1=N'Text1',@Text2=N'Text2'

As you can see the text-call is wrapped in a call to sp_executesql so that it is properly parameterized. This will of course create a slight overhead, and thus my previous statement that using CommandType.StoredProcedure is faster still stands.

Another noteworthy thing, and which is also kind of a deal breaker here, is that when I created the procedure without default values I got the following error:

Msg 201, Level 16, State 4, Procedure Test, Line 0 Procedure or function 'Test' expects parameter '@Text1', which was not supplied.

The reason for this is how the call to sp_executesql is created, as you can see the parameters are declared and initialized, but they are not used. For the call to work, it should have looked like this:

exec sp_executesql N'dbo.Test @Text1, @Text2',N'@Text1 nvarchar(5),@Text2 nvarchar(5)',@Text1=N'Text1',@Text2=N'Text2'

Meaning, when you're using CommandType.Text you have to add the parameters to the CommandText unless you always want the default values to be used.

So, to answer your question

  1. Using CommandType.StoredProcedure is faster.
  2. If you're using CommandType.Text, then you'll have to add the parameter names to the call to the procedure unless you want the default values to be used.

Timezones in SQL DATE vs java.sql.Date

9 votes

I'm getting a bit confused by the behaviour of the SQL DATE data type vs. that of java.sql.Date. Take the following statement, for example:

select cast(? as date)           -- in most databases
select cast(? as date) from dual -- in Oracle

Let's prepare and execute the statement with Java

PreparedStatement stmt = connection.prepareStatement(sql);
stmt.setDate(1, new java.sql.Date(0)); // GMT 1970-01-01 00:00:00
ResultSet rs = stmt.executeQuery();
rs.next();

// I live in Zurich, which is CET, not GMT. So the following prints -3600000, 
// which is CET 1970-01-01 00:00:00
// ... or   GMT 1969-12-31 23:00:00
System.out.println(rs.getDate(1).getTime());

In other words, the GMT timestamp I bind to the statement becomes the CET timestamp I get back. At what step is the timezone added and why?

Note:

  • I have observed this to be true for any of these databases:

    DB2, Derby, H2, HSQLDB, Ingres, MySQL, Oracle, Postgres, SQL Server, Sybase ASE, Sybase SQL Anywhere

  • I have observed this to be false for SQLite (which doesn't really have true DATE data types)
  • All of this is irrelevant when using java.sql.Timestamp instead of java.sql.Date
  • This is a similar question, which doesn't answer this question, however: java.util.Date vs java.sql.Date

The JDBC specification does not define any details with regards to time zone. Nonetheless, most of us know the pains of having to deal with JDBC time zone discrepencies; just look at all the StackOverflow questions!

Ultimately, the handling of time zone for date/time database types boils down to the database server, the JDBC driver and everything in between. You're even at the mercy of JDBC driver bugs; PostgreSQL fixed a bug in version 8.3 where

Statement.getTime, .getDate, and .getTimestamp methods which are passed a Calendar object were rotating the timezone in the wrong direction.

When you create a new date using new Date(0) (let's assert you are using Oracle JavaSE java.sql.Date, your date is created

using the given milliseconds time value. If the given milliseconds value contains time information, the driver will set the time components to the time in the default time zone (the time zone of the Java virtual machine running the application) that corresponds to zero GMT.

So, new Date(0) should be using GMT.

When you call ResultSet.getDate(int), you're executing a JDBC implementation. The JDBC specification does not dictate how a JDBC implementation should handle time zone details; so you're at the mercy of the implementation. Looking at the Oracle 11g oracle.sql.DATE JavaDoc, it doesn't seem Oracle DB stores time zone information, so it performs its own conversions to get the date into a java.sql.Date. I have no experience with Oracle DB, but I would guess the JDBC implementation is using the server's and your local JVM's time zone settings to do the conversion from oracle.sql.DATE to java.sql.Date.


You mention that multiple RDBMS implementations handle time zone correctly, with the exception of SQLite. Let's look at how H2 and SQLite work when you send date values to the JDBC driver and when you get date values from the JDBC driver.

The H2 JDBC driver PrepStmt.setDate(int, Date) uses ValueDate.get(Date), which calls DateTimeUtils.dateValueFromDate(long) which does a time zone conversion.

Using this SQLite JDBC driver, PrepStmt.setDate(int, Date) calls PrepStmt.setObject(int, Object) and does not do any time zone conversion.

The H2 JDBC driver JdbcResultSet.getDate(int) returns get(columnIndex).getDate(). get(int) returns an H2 Value for the specified column. Since the column type is DATE, H2 uses ValueDate. ValueDate.getDate() calls DateTimeUtils.convertDateValueToDate(long), which ultimately creates a java.sql.Date after a time zone conversion.

Using this SQLite JDBC driver, the RS.getDate(int) code is much simpler; it just returns a java.sql.Date using the long date value stored in the database.

So we see that the H2 JDBC driver is being smart about handling time zone conversions with dates while the SQLite JDBC driver is not (not to say this decision isn't smart, it might suit SQLite design decisions well). If you chase down the source for the other RDBMS JDBC drivers you mention, you will probably find that most are approaching date and time zone in a similar fashion as how H2 does.

Though the JDBC specifications do not detail time zone handling, it makes good sense that RDBMS and JDBC implementation designers took time zone into consideration and will handle it properly; especially if they want their products to be marketable in the global arena. These designers are pretty darn smart and I am not surprised that most of them get this right, even in the absence of a concrete specification.


I found this Microsoft SQL Server blog, Using time zone data in SQL Server 2008, which explains how time zone complicates things:

timezones are a complex area and each application will need to address how you are going to handle time zone data to make programs more user friendly.

Unfortunately, there is no current international standard authority for timezone names and values. Each system needs to use a system of their own choosing, and until there is an international standard, it is not feasible to try to have SQL Server provide one, and would ultimately cause more problems than it would solve.

is "where (ParamID = @ParamID) OR (@ParamID = -1)" a good practice in sql selection

9 votes

i used to write sql statments like

select * from teacher where (TeacherID = @TeacherID) OR (@TeacherID = -1)

read more

and pass @TeacherID value = -1 to select all teachers

now i'm worry about the performance can you tell me is that a good practice or bad one?

many thanks

If TeacherID is indexed and you are passing a value other than -1 as TeacherID to search for details of a specific teacher then this query will end up doing a full table scan rather than the potentially far more efficient option of seeking into the index to retrieve the details of the specific teacher...

... Unless you are on SQL 2008 SP1 CU5 and later and use the OPTION (RECOMPILE) hint. See Dynamic Search Conditions in T-SQL for the definitive article on the topic.

How to implement SkipWhile with Linq to Sql without first loading the whole list into memory?

8 votes

I need to order the articles stored in a database by descending publication date and then take the first 20 records after the article with Id == 100.

This is what I would like to do with Linq:

IQueryable<Article> articles = 
    db.Articles
    .OrderByDescending(a => a.PublicationDate)
    .SkipWhile(a => a.Id != 100)
    .Take(20);

However, this generates a NotSupportedException because SkipWhile is not supported in Linq to Sql (see here).

A possible solution is to execute the query and then apply SkipWhile using Linq to Object:

IEnumerable<ArticleDescriptor> articles = 
    db.Articles
    .OrderByDescending(a => a.PublicationDate)
    .ToList()
    .SkipWhile(a => a.Article.Id != 100)
    .Take(20);

But this means I need to load the whole ordered list into memory first and then take 20 articles after the one with Id == 100.

Is there a way to avoid this huge memory consumption?

More in general, what is the best way to achieve this in SQL?

If, as I'm guessing from the column name, PublicationDate doesn't change, you can do this in two separate queries:

  • Establish the PublicationDate of the Article with Id == 100
  • Retrieve the 20 articles from that date onwards

Something like:

var thresholdDate = db.Articles.Single(a => a.Id == 100).PublicationDate;
var articles = 
    db.Articles
    .Where(a => a.PublicationDate <= thresholdDate)
    .OrderByDescending(a => a.PublicationDate)
    .Take(20);

It might even be that LINQ to SQL can translate this:

var articles = 
    db.Articles
    .Where(a => a.PublicationDate 
             <= db.Articles.Single(aa => aa.Id == 100).PublicationDate)
    .OrderByDescending(a => a.PublicationDate)
    .Take(20);

but that may be too complex for it. Try it and see.

Select a portion from a MySQL Blob Field

7 votes

I have a table containing lots of data and one of them is a blob. I some times needs to look into this blob for data using PHP.

I do:

select `desc` from table where `desc` like '%Nam rhoncus%';

this return the entire data but I don't need it

So if my description is like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. In a tempor velit. Integer sit amet ligula nibh, eu rutrum ante. Mauris cursus, neque eu ultrices pulvinar, purus purus fermentum libero, in eleifend tortor orci quis lectus. Cras luctus nunc ac tortor laoreet eu iaculis libero consectetur. Maecenas iaculis facilisis libero sodales auctor. Donec gravida interdum vehicula. Suspendisse vitae massa eget arcu condimentum mattis. Fusce ut ligula ante, nec placerat felis. Maecenas vel nunc nibh, ut luctus urna. Nunc eu lectus a orci iaculis volutpat eget a lorem. Nunc lobortis porttitor tempor. Nulla ipsum neque, volutpat in viverra sit amet, pharetra non tortor. Phasellus at leo pellentesque nunc ultrices euismod.

Nulla ullamcorper scelerisque leo, eu consequat risus fringilla id. Nulla facilisi. Cras sit amet sem a diam molestie dignissim. Duis interdum, sapien quis laoreet bibendum, dui turpis imperdiet magna, id auctor metus velit sollicitudin dolor. Integer blandit, turpis eget interdum commodo, ante nisl laoreet dui, ac congue purus dui quis nisl. Etiam blandit eleifend tortor at egestas. Vestibulum euismod orci ac nibh consectetur feugiat. Praesent ac libero quam. Morbi elit nulla, gravida ac blandit eu, bibendum vitae lacus. In facilisis pellentesque ipsum aliquam auctor. Nam rhoncus, purus eget fringilla ullamcorper, mauris tellus fermentum lectus, ut tempus tellus arcu vel dolor. Suspendisse eros augue, tincidunt sit amet luctus et, auctor id turpis. Praesent consequat velit ut arcu convallis sodales. Proin pulvinar varius erat, id consequat orci varius sed.

An I am looking for: Nam rhoncus I want to get the first occurence like:

...m auctor. Nam rhoncus, purus eg...

This query:

SELECT 
  CONCAT(
    '...', 
    SUBSTR(`description`, 
      LOCATE('Nam rhoncus', `description`) - 10, 
      (LENGTH('Nam rhoncus') + 20)), 
    '...') AS `description`
FROM table 
WHERE `description` LIKE '%Nam rhoncus%';

(I broke it down like this so it's easier to read)

this will output:

...m auctor. Nam rhoncus, purus eg...

So in your PHP you can do:

<?php
define('CHAR_LEFT', 10);
define('CHAR_RIGHT', 10);
// db stuff
$search = mysql_real_escape_string($search_var);
$query = "SELECT CONCAT('...', SUBSTR(`description`, LOCATE('" . $search . "', `description`) - " . CHAR_LEFT . ", (LENGTH('" . $search . "') + " . (CHAR_LEFT + CHAR_RIGHT) . ")), '...') AS `description` FROM table WHERE `description` LIKE '%" . $search . "%';";
// then your request

NOTE: Ill be careful using mysql reversed words, this is why I use description instead.

Join two queries into one

7 votes

Is there a way to join this two queries into one?

query = "select foo from TABLE where foo like '%foo%'";

if (query.empty())
    query = "select bar from TABLE where bar like '%foo%'"

Update:

select ifnull(foo,bar) from TABLE where foo like 'foo%' or bar like '%foo%';

Thanks to Kamal for the idea

For Oracle

Select NVL(foo,bar) from TABLE where foo like '%foo%' or bar like '%foo%';

Python is slow when iterating over a large list

7 votes

I am currently selecting a large list of rows from a database using pyodbc. The result is then copied to a large list, and then i am trying to iterate over the list. Before I abandon python, and try to create this in C#, I wanted to know if there was something I was doing wrong.

clientItems.execute("Select ids from largetable where year =?", year);
allIDRows = clientItemsCursor.fetchall() #takes maybe 8 seconds.

for clientItemrow in allIDRows:
    aID = str(clientItemRow[0])
    # Do something with str -- Removed because I was trying to determine what was slow
    count = count+1

Some more information:

  • The for loop is currently running at about 5 loops per second, and that seems insanely slow to me.
  • The total rows selected is ~489,000.
  • The machine its running on has lots of RAM and CPU. It seems to only run one or two cores, and ram is 1.72GB of 4gb.

Can anyone tell me whats wrong? Do scripts just run this slow?

Thanks

This should not be slow with Python native lists - but maybe ODBC's driver is returning a "lazy" object that tries to be smart but just gets slow. Try just doing

allIDRows = list(clientItemsCursor.fetchall())

in your code and post further benchmarks.

(Python lists can get slow if you start inserting things in its middle, but just iterating over a large list should be fast)

How do I get sales orders where every line of the sales order is closed?

6 votes

The table has 'Sales_Order_ID', 'Sales_Order_Line_Number', and 'Sales_Order_Line_staus' among other fields. I want to retrieve 'Sales_Order_ID' where each record for that 'Sales_Order_ID' has the same 'Sales_Order_Line_Status'.

So, if every record for sales order X has status of 'closed', then I want to retrieve it. If sales order Y has three records with status 'closed' and one record with status 'open', then I don't want to retrieve it.

I tried:

SELECT DISTINCT s1.so_ID, s1.SO_line_status
FROM sales_order_table s1
INNER JOIN sales_order_table s2
ON s1.so_id = s2.so_id
  AND s1.so_line_status = s2.so_line_status
ORDER BY s1.so_id 

To no success. The following seems to give the opposite of what I want:

SELECT DISTINCT s1.so_ID, s1.SO_line_status
FROM sales_order_table s1
INNER JOIN sales_order_table s2
ON s1.so_id = s2.so_id
  AND s1.so_line_status <> s2.so_line_status
ORDER BY s1.so_id  

So I tried:

SELECT DISTINCT s1.so_ID, s1.SO_line_status
FROM sales_order_table s1
INNER JOIN sales_order_table s2
ON s1.so_id = s2.so_id
  AND NOT s1.so_line_status <> s2.so_line_status
ORDER BY s1.so_id

To no success.

Then I went totally noob and changed the join type around just hoping that it would work. Am I close here or totally going about it the wrong way?

Also, I realize that the queries above do not restrict the results to 'closed' status, but I figured if I could get one that returns only all same status lines, I could then restrict them to 'closed'.

Sorry if this is unclear! If so, I will try to clarify.

SELECT so_ID
FROM sales_order_table 
GROUP BY so_ID
HAVING MAX(SO_line_status) = 'Closed' AND
       MIN(SO_line_status) = 'Closed' AND
       COUNT(CASE WHEN SO_line_status IS NULL THEN 1 END) = 0

You could also use EXCEPT if your RDBMS supports it

SELECT so_ID
FROM sales_order_table 
WHERE SO_line_status = 'Closed'
EXCEPT
SELECT so_ID
FROM sales_order_table 
WHERE SO_line_status IS NULL OR SO_line_status <> 'Closed' 

How to make =NULL work in SQLite?

6 votes

Given the following table:

Table: Comedians
=================

Id    First    Middle    Last
---  -------  --------  -------
 1     Bob      NULL     Sagat
 2    Jerry     Kal      Seinfeld      

I want to make the following prepared query:

SELECT * FROM Comedians WHERE Middle=?

work for all cases. It currently does not work for the case where I pass NULL via sqlite3_bind_null. I realize that the query to actually search for NULL values uses IS NULL, but that would mean that I cannot use the prepared query for all cases. I would actually have to change the query depending on the input, which largely defeats the purpose of the prepared query. How do I do this? Thanks!

You can use the IS operator instead of =.

SELECT * FROM Comedians WHERE Middle IS ?

simplify SQL statement by using CTE

6 votes

I have a query like the following:

SELECT A.a, A.b, B.c,
(CASE WHEN ... THEN ... ELSE ... END) AS CalculatedValue,
B.d
FROM    dbo.TableA A INNER JOIN
        dbo.TableB B ON (...)
WHERE (CASE WHEN ... THEN ... ELSE ... END) BETWEEN @DayStart AND @DayEnd
GROUP BY A.a, (CASE WHEN ... THEN ... ELSE ... END), B.c

to avoid repeating many times the exact same expression: (CASE WHEN ... THEN ... ELSE ... END) I wanted to define a CTE and query such table using in the select, where and group by the expression CalculatedValue

unfortunately this does not work because the select needs to already include the group by when creating the CTE

is there any other way I could use to not repeat the CASE WHEN... so many times?

Use CROSS APPLY, which can be used to define aliased fields and then refer to them:

SELECT A.a, 
       A.b, 
       B.c,
       CalculatedValue,
       B.d
FROM    
       dbo.TableA A 
INNER JOIN
        dbo.TableB B 
        ON (...)
CROSS APPLY 
        (SELECT (CASE WHEN ... THEN ... ELSE ... END)) CxA(CalculatedValue)
WHERE CalculatedValue BETWEEN @DayStart AND @DayEnd
GROUP BY A.a, CalculatedValue, B.c

The CxA is just an alias and you can name it whatever you like.

Another way to improve the SQL Query to avoid union?

6 votes

User can search by Postcode (eg: L14, L15, L16) or Location from a textbox.

If user type in "Liverpool", it will find all the shops that are located in "Liverpool". If User type in the postcode (Eg: L15), it will search all the shops that do delivery in L15 postcode zone.

See the Tables below:

mysql> select * from shops;
+----+----------+-----------+----------+
| id | name     | location  | postcode |
+----+----------+-----------+----------+
|  1 | Shop One | Liverpool | L10      |
|  2 | Shop Two | Liverpool | L16      |
+----+----------+-----------+----------+

-

mysql> select * from shops_delivery_area;
+------------------+---------+----------+---------------+
| delivery_area_id | shop_id | postcode | delivery_cost |
+------------------+---------+----------+---------------+
|                1 |       1 | L10      |          1.50 |
|                2 |       1 | L11      |          0.00 |
|                3 |       1 | L12      |          1.00 |
|                4 |       1 | L13      |          1.00 |
|                5 |       2 | L10      |          2.50 |
|                6 |       2 | L16      |          0.00 |
|                7 |       2 | L28      |          0.00 |
+------------------+---------+----------+---------------+

SQL Query:

SELECT U.* FROM 
   ((SELECT DISTINCT shops.*, DA.delivery_cost, DA.postcode AS AreaPostcode FROM shops
             JOIN shops_delivery_area as DA on (DA.shop_id = shops.id)
   WHERE DA.postcode = "Liverpool")
  UNION
   (SELECT DISTINCT shops.*, DA.delivery_cost, DA.postcode AS AreaPostcode FROM shops
             JOIN shops_delivery_area as DA on  
                              (DA.shop_id = shops.id AND
                              DA.postcode = shops.postcode)
   WHERE shops.location = "Liverpool")) as U

-

Result - by Location (Liverpool):

+----+----------+-----------+----------+---------------+--------------+
| id | name     | location  | postcode | delivery_cost | AreaPostcode |
+----+----------+-----------+----------+---------------+--------------+
|  1 | Shop One | Liverpool | L10      |          1.50 | L10          |
|  2 | Shop Two | Liverpool | L16      |          0.00 | L16          |
+----+----------+-----------+----------+---------------+--------------+

Result - by Postcode (L12):

+----+----------+-----------+----------+---------------+--------------+
| id | name     | location  | postcode | delivery_cost | AreaPostcode |
+----+----------+-----------+----------+---------------+--------------+
|  1 | Shop One | Liverpool | L10      |          1.00 | L12          |
+----+----------+-----------+----------+---------------+--------------+

It appear to be working correctly... Is there other way to improve the SQL query shorter to avoid union or something?

Whatever you choose, be aware that short code is not always optimal code. In many cases, where you have sufficiently divergent logic, unioning the results really is the most optimal (and sometimes most clean, programatically) option.

That said, the following OR in the WHERE clause seems to cover both your cases...

SELECT DISTINCT
  shops.*,
  DA.delivery_cost,
  DA.postcode AS AreaPostcode
FROM
  shops
INNER JOIN
  shops_delivery_area as DA
    ON (DA.shop_id = shops.id)
WHERE
  (DA.postcode = "Liverpool")
OR
  (DA.postcode = shops.postcode AND shops.location = "Liverpool")

using PARSENAME to find the last item in a list

6 votes

I am using Parsename in SQL and would like to extract the last element in a list of items. I am using the following code.

Declare @string as varchar(1000)
set @string = '25.26.27.28'

SELECT PARSENAME(@string, 1)

This works and returns the value 28 as I expect. However if I expand my list past more than 4 items then the result returns a NULL. For example:

Declare @string2 as varchar(1000)
set @string2 = '25.26.27.28.29'

SELECT PARSENAME(@string2, 1)

I would expect this to return a value of 29 however only NULL is returned

I'm sure there is a simple explaination to this can anyone help?

PARSENAME is designed specifically to parse an sql object name. The number of periods in the latter example exempt it from being such a name so the call correctly fails.

Instead

select right(@string2, charindex('.', reverse(@string2), 1) - 1)

Tricky GROUP BY issue on ORACLE

6 votes

I’m currently facing an issue that my Oracle knowledge cannot solve, I’m definitely not DB expert and that’s why I ask you if you have any idea how to solve my SQL query issue.

Here’s my problem, I have two tables, let’s call them DEVICE_TABLE and COUNT_TABLE

COUNT_TABLE looks like :

    DEVICE (Int) PK         |       QUANTITY (Int)
- - - - - - - - - - - - - - - - - - - - - - - - - - -
        1001                |              4
- - - - - - - - - - - - - - - - - - - - - - - - - - -
        1002                |             20
- - - - - - - - - - - - - - - - - - - - - - - - - - - 
        1003                |              1
…

DEVICE_TABLE looks like :

     ID (Int) PK            |      WiFi (String)            |     Email (String)          |   Bluetooth(String)           |   …
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        1001                |             Yes               |               No            |                 No            |   …
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        1002                |             Yes               |               Yes           |                 No            |   …
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        1003                |             Unknown           |               Unknown       |                 Yes           |   …
…

Constraints are :

DEVICE_TABLE.ID = COUNT_TABLE.DEVICE

WiFi, Email, Bluetooth… are Strings that can only be : “Yes”, “No” or “Unknown”

Finally, my SQL request result expected is (based on my example):

         Feature        |            Yes           |              No            |            Unknown          
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
        WiFi            |             24           |                 0          |                 1                  
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
       Email            |             20           |                 4          |                 1                  
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
    Bluetooth           |              1           |                24          |                 0                   
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
…

In few words, aim of this request is to sum all devices count that are compatible with a particular feature.

Thank you in advance if you have any clue on how to achieve this ! (Maybe it is not possible…)

In Oracle 11, you can use the pivot clause together with the unpivot clause:

with 
count_table as (
     select 1001 device_id,  4 quantity from dual union all
     select 1002 device_id, 20 quantity from dual union all
     select 1003 device_id,  1 quantity from dual 
),
device_table as (
     select 1001 id, 'Yes'     wifi, 'No'       email, 'No'  bluetooth from dual union all
     select 1002 id, 'Yes'     wifi, 'Yes'      email, 'No'  bluetooth from dual union all
     select 1003 id, 'Unknown' wifi, 'Unknown'  email, 'Yes' bluetooth from dual 
)
----------------------------------------
select * from (
      select
        feature,
        yes_no_unknown,
        sum(quantity)  quantity
      from 
         count_table  c join 
         device_table d on c.device_id = d.id
      unpivot  ( yes_no_unknown
                 for feature in (wifi, email, bluetooth)
      ) 
      group by 
      feature,
      yes_no_unknown
)  
pivot ( sum (quantity)
        for yes_no_unknown in ('Yes' as yes, 'No' as no, 'Unknown' as unknown)
)
;

Alternatively, you might want to join the two existing tables to a third table that containts the values for the three desired rows. It's probably a bit easier to read, too:

with 
count_table as (
     select 1001 device_id,  4 quantity from dual union all
     select 1002 device_id, 20 quantity from dual union all
     select 1003 device_id,  1 quantity from dual 
),
device_table as (
     select 1001 id, 'Yes'     wifi, 'No'       email, 'No'  bluetooth from dual union all
     select 1002 id, 'Yes'     wifi, 'Yes'      email, 'No'  bluetooth from dual union all
     select 1003 id, 'Unknown' wifi, 'Unknown'  email, 'Yes' bluetooth from dual 
)
----------------------------------------
select
   f.txt,
   sum(case when ( f.txt = 'wifi'      and d.wifi      = 'Yes' ) or
                 ( f.txt = 'email'     and d.email     = 'Yes' ) or
                 ( f.txt = 'bluetooth' and d.bluetooth = 'Yes' ) 
            then   c.quantity
            else   0 end
      ) yes,
   sum(case when ( f.txt = 'wifi'      and d.wifi      = 'No' ) or
                 ( f.txt = 'email'     and d.email     = 'No' ) or
                 ( f.txt = 'bluetooth' and d.bluetooth = 'No' ) 
            then   c.quantity
            else   0 end
      ) no,
   sum(case when ( f.txt = 'wifi'      and d.wifi      = 'Unknown' ) or
                 ( f.txt = 'email'     and d.email     = 'Unknown' ) or
                 ( f.txt = 'bluetooth' and d.bluetooth = 'Unknown' ) 
            then   c.quantity
            else   0 end
      ) unknown
from 
   count_table  c                                   join 
   device_table d on c.device_id = d.id     cross   join
   (
        select 'wifi'      txt from dual union all
        select 'email'     txt from dual union all
        select 'bluetooth' txt from dual
   ) f
group by 
    f.txt;

sql query to produce xml output

5 votes

I have these tables

Table 1 tbl1_site [facilityId] [name]

Table 2 tbl2_applicant [pvid] [facilityId] [npi] [firstname]

FK join key: tbl1_site.facilityId = tbl2_applicant.facilityId

Table 3 tbl3_abstraction [pvid] [patientnumber] [diabetesdiagnosis] [dateofbirth]

FK join key: tbl2_applicant.pvId = tbl3_abstraction.pvId

i have problem to create a sql query to reproduce this xml output.

thanks

<account>
    <metadata />
    <practice-sites>
        <practice-site>
            <metadata>
                <data-element id="name">
                    <value>My Own Diabetes Medical Center</value>
                </data-element>
            </metadata>
            <applicants>
                <metadata />
                <applicant>
                    <metadata>
                        <data-element id="npi">
                            <value>1234567890</value>
                        </data-element>
                        <data-element id="firstname">
                            <value>Joseph</value>
                        </data-element>
                    </metadata>
                    <clinical-abstractions>                           
                        <clinical-abstraction>
                            <data-element id="diabetesdiagnosis">
                                <value>Backward</value>
                            </data-element>
                            <data-element id="dateofbirth">
                                <value>02/01/2009</value>
                            </data-element>
                            <data-element id="patientnumber">
                                <value>1</value>
                            </data-element>
                        </clinical-abstraction>
                    </clinical-abstractions>
                </applicant>
            </applicants>
        </practice-site>
    </practice-sites>
</account>

do you really need all those tags? I mean the "metadata" and "data-element"

try this query, it shows the data on the format you need:

select  t1.name as PracticeSite,
        (SELECT t2.npi as NPI, 
                t2.firstname,
                (SELECT t3.patientnumber, 
                        t3.diabetesdiagnosis, 
                        t3.dateofbirth
                        FROM tbl3_abstraction t3
                        WHERE t3.pvId=t2.pvId
                        FOR XML PATH('clinical-abstraction'), TYPE
                        ) as 'clinical-abstractions'
                FROM tbl2_applicant t2
                WHERE t1.[facilityId]=t2.[facilityId]
                FOR XML PATH('Applicant'), TYPE
        ) AS 'Applicants'
from tbl1_site t1
FOR XML path('PracticeSites'), root('account'), ELEMENTS;

Add row number to this T-SQL query

5 votes

How can I add ROW numbers to this query result?

SELECT DISTINCT
  VehicleSpecs.SubmittedById,
  COUNT(VehicleSpecs.SubmittedById) AS NumCars,
  aspnet_Users.UserName
FROM
  VehicleSpecs
  INNER JOIN aspnet_Users ON VehicleSpecs.SubmittedById = aspnet_Users.UserId
WHERE
  (LEN(VehicleSpecs.SubmittedById) > 0)
GROUP BY
  VehicleSpecs.SubmittedById,
  aspnet_Users.UserName
ORDER BY
  NumCars DESC

Add: ROW_NUMBER() OVER (ORDER BY NumCars)

EDIT:

WITH    t1 AS 
( SELECT DISTINCT
            VehicleSpecs.SubmittedById ,
            COUNT(VehicleSpecs.SubmittedById) AS NumCars ,
            aspnet_Users.UserName
   FROM     VehicleSpecs
            INNER JOIN aspnet_Users ON VehicleSpecs.SubmittedById = aspnet_Users.UserId
   WHERE    ( LEN(VehicleSpecs.SubmittedById) > 0 )
   GROUP BY VehicleSpecs.SubmittedById ,
            aspnet_Users.UserName
)
SELECT  ROW_NUMBER() OVER ( ORDER BY NumCars ), *
FROM    t1
ORDER BY NumCars

How can I summarize rows that occur only once?

5 votes

I have a query which returns the number of rows of a distinct device_type which occur more than once.

SELECT COUNT(*) AS C1,device_type FROM stat 
    WHERE stat_date = '2012-02-08' 
    GROUP BY 2 HAVING C1 > 1 
    ORDER BY 1 DESC

I would like to summarize the remaining (HAVING count = 1) rows as 'others'

How can I add the sum of COUNT(*) and 'others' as second column for the following query?

SELECT COUNT(*) AS C2,device_type FROM stat 
    WHERE stat_date = '2012-02-08' 
    GROUP BY 2 HAVING C2 = 1 
    ORDER BY 1 DESC

Sample data in DB

device_type
dt1
dt1
dt1
dt2
dt2
dt3
dt4
dt5

expected result

3 dt1
2 dt2
3 other

You can also try:

SELECT SUM(C1) AS C1, CASE WHEN C1 = 1 THEN 'other' ELSE device_type END as device_type
FROM (  SELECT  COUNT(*) AS C1,
                device_type 
        FROM stat 
        WHERE stat_date = '2012-02-08' 
        GROUP BY device_type) A
GROUP BY CASE WHEN C1 = 1 THEN 'other' ELSE device_type END

Merging XML in an SQL Server

5 votes

Let's say I have the following two pieces of XML in my database

<!-- XML 1 -->
<pairs>
    <item key="a">xml 1 a value</item>
    <item key="b">xml 1 b value</item>
    <item key="c">xml 1 c value</item>
</pairs>

<!-- XML 2 -->    
<pairs>
    <item key="c">xml 2 c value</item>
    <item key="d">xml 2 d value</item>
    <item key="e">xml 1 e value</item>
</pairs>

This data is stored in two separate tables using the XML datatype, additionally this XML column is linked to a schema that describes the format of expected xml e.g

[PairData] [xml](CONTENT [foo].[Pairs]) NULL

Within a stored procedure / function I would like to merge these two XML structures into the following:

<pairs>
    <item key="a">xml 1 a value</item>
    <item key="b">xml 1 b value</item>
    <item key="c">xml 2 c value</item>
    <item key="d">xml 2 d value</item>
    <item key="e">xml 2 e value</item>
</pairs>

So, from the first piece of xml we have taken items:

a, b

from the second piece of xml we have taken items:

c, d, e  

Notice that the two pieces of XML have a common item with a key of:

c

In this scenario the value from xml 2 should be used in the merged xml (discarding the value from xml 1). Another case is that the XML 1 or 2 could be NULL therefore the merge process should handle this and simply return the other. Or both could be NULL in which case NULL is returned.

As an aside, in our current implementation we are returning both XML documents from the DB and doing the merge in code. However we would prefer to have this merge done within the DB as multiple unrelated processes are calling this proc.

Use:

declare @x1 xml ='<pairs>
    <item key="a">xml 1 a value</item>
    <item key="b">xml 1 b value</item>
    <item key="c">xml 1 c value</item>
</pairs>'

declare @x2 xml ='<pairs>
    <item key="c">xml 2 c value</item>
    <item key="d">xml 2 d value</item>
    <item key="e">xml 2 e value</item>
</pairs>'

select *
from
(
    select isnull(t2.a, t1.a) [@key], isnull(t2.b, t1.b) [text()]
    from
    (
        select t.c.value('@key', 'nvarchar(max)') [a], t.c.value('.', 'nvarchar(max)') [b]
        from @x1.nodes('/*/item') t(c)
    )t1
    full join
    (
        select t.c.value('@key', 'nvarchar(max)') [a], t.c.value('.', 'nvarchar(max)') [b]
        from @x2.nodes('/*/item') t(c)
    )t2 on t2.a = t1.a
)t
for xml path('item'), root('pairs')

Output:

<pairs>
  <item key="a">xml 1 a value</item>
  <item key="b">xml 1 b value</item>
  <item key="c">xml 2 c value</item>
  <item key="d">xml 2 d value</item>
  <item key="e">xml 2 e value</item>
</pairs>

UPDATE:

declare @x1 xml ='<pairs>
    <item key="a">xml 1 a value</item>
    <item key="b">xml 1 b value</item>
    <item key="c">xml 1 c value</item>
</pairs>'

declare @x2 xml ='<pairs>
    <item key="c">xml 2 c value</item>
    <item key="d">xml 2 d value</item>
    <item key="e">xml 2 e value</item>
</pairs>'

declare @t1 table(id int, data xml)
insert @t1 values(1, @x1)

declare @t2 table(id int, data xml)
insert @t2 values(1, @x2)

select isnull(t2.a, t1.a) [@key], isnull(t2.b, t1.b) [text()]
from
(
    select t.c.value('@key', 'nvarchar(max)') [a], t.c.value('.', 'nvarchar(max)') [b]
    from @t1 ta
    cross apply ta.data.nodes('/*/item') t(c)
)t1
full join
(
    select t.c.value('@key', 'nvarchar(max)') [a], t.c.value('.', 'nvarchar(max)') [b]
    from @t2 ta
    cross apply ta.data.nodes('/*/item') t(c)
)t2 on t2.a = t1.a
for xml path('item'), root('pairs')