Best sql questions in December 2010

Useful stock SQL datasets?

15 votes

Does anyone know of any resources that provide good, useful stock datasets? For example, I've downloaded a SQL script that includes all of the U.S. states, cities, and zipcodes. This saved me a lot of time in a recent application where I wanted to be able to do lookups by geography. Are any of you aware of other useful datasets that are freely available for download?

For example:

  • Blacklisted IP addresses
  • Names of colleges/universities
  • Names of corporations/stock symbols

Anyone have any recommendations?

EDIT:

As an example, here is the location where I found a MySQL script containing all of the U.S. zip codes and their corresponding latitude/longitude. Has anyone else found similarly useful datasets in SQL that can be easily imported and used?

http://www.chrissibert.com/blog/wp-content/uploads/2009/06/zipcodes.7z

EDIT 2:

To clarify what type of datasets I'm talking about... I'm referring to datasets that can be immediately useful for applications, can be applied across a variety of scenarios, and typically represent information that is easy to find for small cases but harder to compile for larger data sets. The zip code database is a great example to me. It's not hard to get the lat/long for a single given zip code. But, it's a bit more time consuming to get the values for all valid zip codes in the U.S. This data is also not useful to a single industry or business sector, but can be applied across a range of applications.

Lots of links to open data sets here:

http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php

although I doubt any of them will generate SQL statements for you.

When to use a View instead of a Table?

14 votes

When should a View actually be used over an actual Table? What gains should I expect this to produce?

Overall, what are the advantages of using a view over a table? Shouldn't I design the table in the way the view should look like in the first place?

Oh there are many differences you will need to consider

Views for selection:

  1. Views provide abstraction over tables. You can add/remove fields easily in a view without modifying your underlying schema
  2. Views can model complex joins easily.
  3. Views can hide database-specific stuff from you. E.g. if you need to do some checks using Oracles SYS_CONTEXT function or many other things
  4. You can easily manage your GRANTS directly on views, rather than the actual tables. It's easier to manage if you know a certain user may only access a view.
  5. Views can help you with backwards compatibility. You can change the underlying schema, but the views can hide those facts from a certain client.

Views for insertion/updates:

  1. You can handle security issues with views by using such functionality as Oracle's "WITH CHECK OPTION" clause directly in the view

Drawbacks

  1. You lose information about relations (primary keys, foreign keys)
  2. It's not obvious whether you will be able to insert/update a view, because the view hides its underlying joins from you

How to store extensible metadata in an ORM-friendly way in .NET?

10 votes

It seems I am unable to find any answers to "how to use EAV approach with ORM tools" question, so I'll try my luck here.

Suppose I have an Entities Table:

ID -> int
Name -> nvarchar(50)

An Images Table:

EntityID -> int
Width -> int
Height -> int

And a Songs Table:

EntityID -> int
Duration -> decimal(12,3)

I need to add extensible metadata to the entities (unknown key-value pairs with type info), so that I'm able to issue queries like:

Find me all the Songs that have a Duration longer than 3 minutes, with a Name starting with 'The', with metadata fulfilling these criteria:

  • HasGuitarSolo is set to true
  • GuitarSoloDuration is greater than 30 seconds

And sort the results on GuitarSoloDuration in descending order.

I don't want to create HasGuitarSolo, GuitarSoloDuration, etc. columns in the database, Ideally I would like to store them in an EAV-like schema, or an alternative schema that doesn't require a knowledge of the keys up front.

Add a column to the tables called 'metadata' and put XML in it. SQL server allows you to look at a blob full of XML as if it were additional columns (with limitations).

For ORM, it depends on how your object is structured.

  • Infinitely Customizable metadata items: you put the name-value pairs from the XML in a collection. If your ORM won't allow this, put it straight into a string property, the setter could parse it into an XML doc (or faster object if you need speed). Getter would return the string. Then a separate property 'MetaDataItem(ItemName->string)' that is not ORM'd would read values from the metadata list and update/add them with its setter.
  • Metadeta is hardcoded properties - map them using a query that pulls them from the XML.
  • Hybrid of the two - hardcoded properties for some items - have their setters/getters call MetaDataItem.
  • Reverse hybrid if certain properties need to be directly stored (esp if you are sorting large lists on them): you have to hardcode properties for that metadata with their own private members, but don't ORM those properties. Hardcoded the saving/loading of those values into the string property that is ORM'd, and if you want to be able to update those hardcoded metadata items from the MetaDataItem property as well, hardcode them it that spot, too!

If you have a whole bunch of hardcoded metadata properties, in addition to the infinite amount, you can easy the crud in the XML property and MetaDataItem property with lists and reflection. If all are hardcoded you can still use the XML text property to load/save them, map that one property, not the others.

Sort them with a LINQ query on the object.

I did this with great success and with each bullet coded, stuff worked better and better! 2005/.Net 1.1 so no ORM, LINQ, my first VB.net program etc. But other developers did use SQL server's XML querying to read my XML. Of course I forgot about this, changed it, and tripped them up :-(

Here are snippets. Key of this all is: ORM friendly = ORM some properties, not others; Allow consumers to use other properties, but not some. If your ORM doesn't allow such ala-carte property selection, you might be able to use inheritance or composition to trick it. Sorry I don't have time to post full example for your purpose.

Well I don't have the code sample here, at home. I will edit and paste it in tommorrow.

EDIT as promised, here's the code snippet:

   Public Property ItemType(ByVal stTagName As String) As String
        Get
            Dim obj As Object
            obj = Me.lstMemberList.Item(stTagName)
            If Not obj Is Nothing Then
                Return CType(obj, foDataItem).Type
            End If
        End Get
        Set(ByVal Value As String)
            Dim obj As Object
            obj = Me.lstMemberList.Item(stTagName)
            If Not obj Is Nothing Then
                CType(obj, foDataItem).Type = Value
            End If
        End Set
    End Property

    Public Function ItemExists(ByVal stTagName As String) As Boolean
        Return Me.lstMemberList.ContainsKey(stTagName)
    End Function

    Public Property ItemValue(ByVal stTagName As String, Optional ByVal Type4NewItem As String = "") As String
        Get
            Dim obj As Object
            obj = Me.lstMemberList.Item(stTagName)
            If obj Is Nothing Then
                Dim stInternalKey As String = ""
                Try
                    stInternalKey = Me.InternalKey.ToString
                Catch
                End Try
                If stTagName <> "InternalKey" Then '' // avoid deadlock if internalkey errs!
                    Throw New ApplicationException("Tag '" & stTagName & _
                      "' does not exist in FO w/ internal key of " & stInternalKey)
                End If
            Else
                Return CType(obj, foDataItem).Value
            End If
        End Get
        Set(ByVal Value As String)
            '' // if child variation form...
            If bLocked4ChildVariation Then
                '' // protect properties not in the list of allowed updatable items 
                If Not Me.GetChildVariationDifferentFields.Contains(stTagName) Then
                    Exit Property
                End If
            End If
            '' // WARNING - DON'T FORGET TO UPDATE THIS LIST OR YOU WILL NEVER FIND THE BUG!
            Select Case stTagName
                Case "PageNum"
                    _PageNum = CInt(Value)
                Case "Left"
                    _Left = CInt(Value)
                Case "Top"
                    _Top = CInt(Value)
                Case "Width"
                    _Width = CInt(Value)
                Case "Height"
                    _Height = CInt(Value)
                Case "Type"
                    _Type = String2Type(Value)
                Case "InternalKey"
                    _InternalKey = CInt(Value)
                Case "UniqueID"
                    _UniqueID = Value
            End Select
            Static MyError As frmErrorMessage
            Dim obj As Object
            If Me.lstMemberList.ContainsKey(stTagName) Then
                Dim foi As foDataItem = CType(Me.lstMemberList.Item(stTagName), foDataItem)
                If foi.Type = "Number" Then
                    Value = CStr(Val(Value))
                End If
                If foi.Value <> Value Then
                    If bMonitorRefreshChanges Then
                        LogObject.LoggIt("Gonna Send Update for Change " & stTagName & " from " & _
                          foi.Value & " to " & Value)
                        If Not Me.FormObjectChanged_Address Is Nothing Then
                            FormObjectChanged_Address(Me, stTagName)
                        End If
                    End If
                End If
                foi.Value = Value
            Else
                Me.lstMemberList.Add(stTagName, New foDataItem(Value, Type4NewItem))
                Me.alOrderAdded.Add(stTagName)
            End If
        End Set
    End Property


  Public Function StringVal(ByVal st As String, Optional ByVal stDefault As String = "") As String
        Try
            StringVal = stDefault
            Return CType(Me.ItemValue(st), String)
        Catch ex As Exception
            Dim bThrowError As Boolean = True
            RaiseEvent ConversionError(ex, "String=" & Me.ItemValue(st), Me, st, bThrowError)
            If bThrowError Then
                LogObject.LoggIt("Error setting tag value in fo.StringVal: " & st)
                Throw New Exception("Rethrown Exception getting value of " & Me.ID & "." & st, ex)
            End If
        End Try
    End Function
    Public Function IntVal(ByVal st As String, Optional ByVal iDefault As Integer = 0) As Integer

    ...

 '' // 'native' values - are normal properties instead of XML properties, which 
    '' // actually makes it harder to deal with b/c of extra updates to sync them, BUT,
    '' // worth it - as they are read much more than written (sorts, wizard builds,
    '' // screen redraws, etc) I can afford to be slow when writing to them, PLUS
    '' // retain the benefits of referencing everything else via ItemValue, PLUS
    '' // these are just the more 'popular' items. 
    Private _Top As Integer
    Private _Left As Integer
    Private _Width As Integer
    Private _Height As Integer
    Private _PageNum As Integer
    Private _Type As pfoType
    Private _InternalKey As Integer
    Private _UniqueID As String

    Public Sub SetNativeValuesFromMyXML()
        _Top = CInt(CType(Me.lstMemberList("Top"), foDataItem).Value)
        _Left = CInt(CType(Me.lstMemberList("Left"), foDataItem).Value)
        _Width = CInt(CType(Me.lstMemberList("Width"), foDataItem).Value)
        _Height = CInt(CType(Me.lstMemberList("Height"), foDataItem).Value)
        _PageNum = CInt(CType(Me.lstMemberList("PageNum"), foDataItem).Value)
        _Type = String2Type(CType(Me.lstMemberList("Type"), foDataItem).Value)
        _InternalKey = CInt(CType(Me.lstMemberList("InternalKey"), foDataItem).Value)
        _UniqueID = CType(Me.lstMemberList("UniqueID"), foDataItem).Value
    End Sub

    Public Property Top() As Integer
        Get
            Return _Top '' // CInt(ItemValue("Top"))
        End Get
        Set(ByVal Value As Integer)
            ItemValue("Top") = Value.ToString
        End Set
    End Property

    Public Property Left() As Integer
        Get
            Return _Left '' //CInt(ItemValue("Left"))
        End Get

    ...

Is assembly running in SQL Server or from a Windows app

10 votes

How can I determine if an assembly is running from a SQL Server CLR stored procedure or if it is running from a Windows app?

Check out SqlContext.IsAvailable

How do I stream .flv files from SQL database

9 votes

I want to store .flv files in the database and not in the file system.

This is what I can do right now:
Successfully convert .wmv and .mpeg to .flv with ffmpeg.
Store images in SQL Server and show them on my page with an httphandler.
Same with .avi and .mpeg videos. (It's up to the user's software if he can view it though)
Play .flv files in the browser if the file is located in the file system and not in the database.

What I can't do is:
Stream .flv videos to JW Player directly from the database. (Stored as binary data)

I've searched the internet for two days now but I can't get it to work. It feels as if I'm almost there though. The JW Player opens up and starts to "buffer", but nothing happens.

I know there's no easy answer but if anyone has done this before, or something similar, I'd like to know how you did. I feel I've got too much code to post it all here.

Thanks in advance!

/zalk

EDIT: Solved!
Posted it as an answer instead...

I got it to work but I have no idea as to how efficient it is. Is it better to stream from the file system than from the database in terms of connections, efficency, load etc. I could use some pointers on this!

I'm using JW Player here, hence "swfobject.js" and "player.swf"

httpHandler:

public class ViewFilm : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        try
        {
            // Check if id was given
            if (context.Request.QueryString["id"] != null)
            {
                string movId = context.Request.QueryString["id"];

                // Connect to DB and get the item id
                using (SqlConnection con = new SqlConnection(ConfigurationManager.ConnectionStrings["ApplicationServices"].ConnectionString))
                using (SqlCommand cmd = new SqlCommand("GetItem", con))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    SqlParameter sqlParam = cmd.Parameters.Add("@itemId", SqlDbType.Int);
                    sqlParam.Value = movId;

                    con.Open();
                    using (SqlDataReader dr = cmd.ExecuteReader())
                    {
                        if (dr.HasRows)
                        {
                            dr.Read();
                            // Add HTTP header stuff: cache, content type and length
                            context.Response.Cache.SetCacheability(HttpCacheability.Public);
                            context.Response.Cache.SetLastModified(DateTime.Now);
                            context.Response.AppendHeader("Content-Type", "video/x-flv");
                            context.Response.AppendHeader("Content-Length", ((byte[])dr["data"]).Length.ToString());
                            context.Response.BinaryWrite((byte[])dr["data"]);
                        }
                    }
                }
            }
        }
        catch (Exception ex)
        {
            throw new Exception(ex.ToString());
        }
    }

    public bool IsReusable
    {
        get { return false; }
    }
}

javascript
The function adds a player to <div id="video1"> and can be called e.g when a user clicks a button.

<script type='text/javascript' src='swfobject.js'></script>
<script type="text/javascript" language="javascript">
function vid() {
  var s1 = new SWFObject('player.swf', 'player1', '480', '270', '9');
  s1.addParam('allowfullscreen', 'true');
  s1.addParam('allowscriptaccess', 'always');
  s1.addVariable('file', encodeURIComponent('ViewFilm.ashx?id=10'));
  s1.addVariable('type', 'video');
  s1.write(document.getElementById("video1"));
}
</script>

Where do I sort?

9 votes

Hello all, I have a database, which I query, and I'm unsure of where to perform the sorting of the results, so far I've have the following options.

  • At the MySQL query.
  • At list level(Using a LinkedList)
  • Sorting an unsorted list using comparators before showing the results (basically in the jsp)

The List is composed by ObjectDTO so where would it be more efficient. Any ideas?

You should do the sorting in the database if at all possible.

  • The database can use indexes. If there is a suitable index available then the results can be read from disk already in sorted order, resulting in a performance increase - no extra O(n log(n)) sorting step is required.
  • If you only need the first x results you also minimize data transfer (both reduced network transfer, and also reduced disk access if there is a suitable index).

Is the GROUP BY clause in SQL redundant?

9 votes

Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns, for instance:

SELECT storeid, storename, SUM(revenue), COUNT(*)
FROM Sales 
GROUP BY storeid, storename

It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.

SELECT (2 * (x + y)) / z + 1, MyFunction(x, y), SUM(z)
FROM AnotherTable
GROUP BY (2 * (x + y)) / z + 1, MyFunction(x, y)

If we ever change the SELECT statement, we must remember to make the same change to our GROUP BY clause.

As far as I can see, the GROUP BY clause is redundant, as I believe that in all cases the database could infer the information it contains.

  • If this is indeed the case, then why is there a GROUP BY clause in SQL at all?
  • If this is not the case, then what extra functionality does GROUP BY give us?

Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns

This is not true in general. MySQL for example doesn't require this, and the SQL standard doesn't say this either.

It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.

Also not true in general. MySQL (and perhaps other databases too) allow column aliases to be used in the GROUP BY clause:

SELECT (2 * (x + y)) / z + 1 AS a, MyFunction(x, y) AS b, SUM(z)
FROM AnotherTable
GROUP BY a, b

If this is not the case, then what extra functionality does GROUP BY give us?

The only way of specifying what to group by is to use a GROUP BY clause. You cannot necessarily deduce it from the columns mentioned in the SELECT. In fact you don't even have to select all the columns mentioned in the GROUP BY:

SELECT MAX(col2)
FROM foo
GROUP BY col1
HAVING COUNT(*) = 2

How can I avoid NULLs in my database, while also representing missing data?

8 votes

In SQL and Relational Theory (C.J. Date, 2009) chapter 4 advocates avoiding duplicate rows, and also to avoid NULL attributes in the data we store. While I have no troubles avoiding duplicate rows, I am struggling to see how I can model data without making use of NULL. Take the following, for example - which is a bit from work.

We have an artist table, which has, amongst other columns, a gender column. This is a foreign key to the gender table. However, for some artists, we don't know their gender - for example we've been given a list of new music which has no descriptions of the artist. How, without using NULL is one meant to represent this data? The only solution I see is to add a new gender, "unknown", to the gender table.

While I am thoroughly enjoying this book, I was really disappointed when the chapter concluded with:

Of course, if nulls are prohibited, then missing information will have to be handled by some other means. Unfortunately, those other means are much too complex to be discussed in detail here.

Which is a real shame - because this was the solution I was waiting to read about! There is a reference to read the appendix which has lots of publications to read, but I was hoping for a little bit more of a down to earth summary before I dived into reading these.


I'm getting a few people commenting that they don't understand why I wish to avoid 'NULL' so I will quote the book again. Take the following query:

SELECT s.sno, p.pno
  FROM s, p
 WHERE s.city <> p.city
    OR p.city <> 'Paris'

Now, take the example that s.city is London, and p.city is Paris. In this case, London <> Paris, so the query is true. Now take the case that p.city is not Paris, and is infact xyz. In this case, (London <> xyz) OR (xyz <> Paris) is also True. So, given any data - this query is true. However, if xyz is 'NULL' the scenario changes. In this case both of these expressions are neither True nor False, they are in fact, Unknown. And in this case because the result is unknown you will not get any rows returned.

The move from 2 value logic to 3 value logic can easily introduce bugs like this. Infact, I just introduced one at work which motivated this very post. I wanted all rows where the type != 0 However, this actually ends up matching type == 0 OR type IS NULL - confusing behavior.

Whether or not I model my data with or without NULL in the future is unclear, but I'm very curious what the other solutions are. (I too have always been of the argument that if you don't know, you should use NULL).

Everybody's talking and no one except dportas and Walter can even understand the question. Ok, so 95% of the people on SO do not understand The Null Problem, and feel threatened because their databases are full of Nulls, they want to convert the seeker. Priceless. How is anyone going to learn when they are arguing ?

Good on you, for eliminating Nulls. I have never allowed Nulls in any of my databases.

Of course, if nulls are prohibited, then missing information will have to be handled by some other means. Unfortunately, those other means are much too complex to be discussed in detail here.

Actually it is not so hard at all. There are three alternatives.

  1. Here's a paper on How To Handle Missing Information Without Using NULL by H Darwen, that may help to get your head around the problem.

1.1. Sixth Normal Form is the answer. But you do not have to normalise your entire database to 6NF. For each column that is optional, you need a child table off the main table, with just the PK, which is also the FK, because it is a 1::0-1 relation. Other than the PK, the only column is the optional column.

Look at this Data Model; AssetSerial on page 4 is a classic case: not allAssets have SerialNumbers; but when they do, I want them to store them; more important I want to ensure that they are Unique.

(For the OO people out there, incidentally, that is a three level class diagram in Relational notation, a "Concwete Table Inheritance", no big deal, we've had it fro 30 years.)

1.2. For each such table, use a View to provide the 5NF form of the table. Sure, use Null (or any value that is appropriate for the column) to identify the absence of the column for any row. But do not update via the view.

1.3 Do not use straight joins to grab the 6NF column. Do not use outer joins, either (and have the server fill in a Null for the missing rows). Use a subquery to populate the column, and specify the value that you want returned for a missing value (except if you have Oracle, because its Subquery processing is even worse than its set processing). Eg. and just an eg. you can convert a numeric column to string, and use "Missing" for the missing rows.

When you do not want to go that far (6NF), you have two more options.
.
2. You can use Null substitutes. I use CHAR(0) for character colomns and 0 for numeric. But I do not allow that for FKs. Obviously you need a value that is outside the normal range of data. This does not allow Three Valued Logic.
.
3. In addition to (2), for each Nullable column, you need a boolean Indicator. For the example of the Sex column, the Indicator would be something like SexIsMissing or SexLess (sorry). This allows very tight Three Valued Logic. Many people in that 5% like it because the db remains at 5NF (and less tables); the columns with missing info are loaded with values that are never used; they are only used if the Indicator is false. If you have an enterprise db, you can wrap that in a Function, and always use the UDF, not the raw column.

Of course, in all cases, you can never get away from writing code that is required to handle the missing info. Whether it is ISNULL(), or a subquery for the 6NF column, or an Indicator to check before using the value, or an UDF.

If Null has a specific meaning ... then it is not a Null ! By definition, Null is the Unknown Value.

How do we implement an IS-A Relationship ?

8 votes

We implement an One-to-Many relationship by adding one Table's PK, as FK to the other Table. We implement a Many-to-Many relationship by adding 2 Table's PKs to a third Table.

How do we implement an IS-A Relationship ?

The Entities are TECHNICIAN and ADMINISTRATIVE which both are EMPLOYEE. I could just use an extra field in the Table EMPLOYEE(id, name, surname, role, ...AdminFields..., ...TechFields...)

but i would like to explore the IS-A option.

EDIT: I did as Donnie suggested, but without the role field.

I did as Donnie suggested, but without the role field, because it complicates things. This is the final implementation:

DDL:

CREATE TABLE Employee (
ast VARCHAR(20) not null,
firstname VARCHAR(200) not null,
surname VARCHAR(200) not null,
...
PRIMARY KEY(ast)
);

CREATE TABLE Administrative (
employee_ast VARCHAR(20) not null REFERENCES Employee(ast),
PRIMARY KEY(employee_ast)
);

CREATE TABLE Technical (
employee_ast VARCHAR(20) not null REFERENCES Employee(ast),
...
PRIMARY KEY(employee_ast)
);

ER Diagram:

ERD

In this model there are no Employees of Generic Type. Here, an Employee can only be Administrative or Technical.

Should I use flat tables or a normalized database?

7 votes

I have a web application that I am currently working on that uses a MySQL database for the back-end, and I need to know what is better for my situation before I continue any further.

Simply put, in this application users will be able to construct their own forms with any number fields (they decide) and right now I have it all stored in a couple tables linked by foreign keys. A friend of mine suggests that to keep things "easy/fast" that I should convert each user's form to a flat table so that querying data from them stays fast (in case of large growth).

Should I keep the database normalized with everything pooled into relational tables with foreign keys (indexes, etc) or should I construct flat tables for every new form that a user creates?

Obviously some positives of creating flat tables is data separation (security) and query speeds would be cut down. But seriously how much gain would I get from this? I really don't want 10000 tables and to be dropping, altering, and adding all of the time, but if it will be better than I will do it... I just need some input.

Thank you

Rule of thumb. It's easier to go from normalized to denormalized than the other way around.

Start with a reasonable level of database normalization (by reasonable I mean readable, maintainable, and efficient but not prematurely optimized), then if you hit performance issues as you grow, you have the option of looking into ways in which denormalization may increase performance.

The most useful User-Defined Aggregate Functions.

7 votes

Do you have any aggregate functions that you have implemented because the standard ones were not good enough?

An aggregate function for SQL Server that produces a comma-separated list of values.

Title
-----------------
The Hobbit
The Two Towers       -->   The Hobbit, The Two Towers, Leaf by Niggle
Leaf by Niggle

Here's my implementation: A SQL CLR user-defined aggregate - notes on creating and debugging

What is the most efficient way to count rows in a table in SQLite?

7 votes

I've always just used "SELECT COUNT(1) FROM X" but perhaps this is not the most efficient. Any thoughts? Other options include SELECT COUNT(*) or perhaps getting the last inserted id if it is auto-incremented (and never deleted).

How about if I just want to know if there is anything in the table at all? (e.g., count > 0?)

The best way is to make sure that you run SELECT COUNT on a single column (SELECT COUNT(*) is slower) - but SELECT COUNT will always be the fastest way to get a count of things (the database optimizes the query internally).

If you check out the comments below, you can see arguments for why SELECT COUNT(1) is probably your best option.

T-SQL Conditional WHERE Clause

7 votes

Hi Guys,

Found a couple of similar questions here on this, but couldn't figure out how to apply to my scenario.

My function has a parameter called @IncludeBelow. Values are 0 or 1 (BIT).

I have this query:

SELECT p.*
FROM Locations l
INNER JOIN Posts p
on l.LocationId = p.LocationId
WHERE l.Condition1 = @Value1
AND   l.SomeOtherCondition = @SomeOtherValue

If @IncludeBelow is 0, i need the query to be this:

SELECT p.*
FROM Locations l
INNER JOIN Posts p
on l.LocationId = p.LocationId
WHERE l.Condition1 = @Value1
AND   l.SomeOtherCondition = @SomeOtherValue
AND   p.LocationType = @LocationType -- additional filter to only include level.

If @IncludeBelow is 1, that last line needs to be excluded. (i.e don't apply filter).

I'm guessing it needs to be a CASE statement, but can't figure out the syntax.

Here's what i've tried:

SELECT p.*
FROM Locations l
INNER JOIN Posts p
on l.LocationId = p.LocationId
WHERE l.Condition1 = @Value1
AND   l.SomeOtherCondition = @SomeOtherValue
AND (CASE @IncludeBelow WHEN 0 THEN p.LocationTypeId = @LocationType ELSE 1 = 1)

Obviously that's not correct.

What's the correct syntax?

I changed the query to use EXISTS because if there's more than one location associated with a POST, there'd be duplicate POST records that'd require a DISTINCT or GROUP BY clause to get rid of...

The non-sargable

This will perform the worst of the possible solutions:

SELECT p.*
  FROM POSTS p
 WHERE EXISTS(SELECT NULL
                FROM LOCATIONS l
               WHERE l.LocationId = p.LocationId
                 AND l.Condition1 = @Value1
                 AND l.SomeOtherCondition = @SomeOtherValue)
   AND (@IncludeBelow = 1 OR p.LocationTypeId = @LocationType)

The sargable, non-dynamic version

Self explanitory....

BEGIN
  IF @IncludeBelow = 0 THEN
    SELECT p.*
      FROM POSTS p
     WHERE EXISTS(SELECT NULL
                    FROM LOCATIONS l
                   WHERE l.LocationId = p.LocationId
                     AND l.Condition1 = @Value1
                     AND l.SomeOtherCondition = @SomeOtherValue)
       AND p.LocationTypeId = @LocationType
  ELSE
    SELECT p.*
      FROM POSTS p
     WHERE EXISTS(SELECT NULL
                    FROM LOCATIONS l
                   WHERE l.LocationId = p.LocationId
                     AND l.Condition1 = @Value1
                     AND l.SomeOtherCondition = @SomeOtherValue) 
END

The sargable, dynamic version (SQL Server 2005+):

Love or hate it, dynamic SQL lets you write the query once. Just be aware that sp_executesql caches the query plan, unlike EXEC in SQL Server. Highly recommend reading The Curse and Blessings of Dynamic SQL before considering dynamic SQL on SQL Server...

DECLARE @SQL VARCHAR(MAX)
    SET @SQL = 'SELECT p.*
                  FROM POSTS p
                 WHERE EXISTS(SELECT NULL
                                FROM LOCATIONS l
                               WHERE l.LocationId = p.LocationId
                                 AND l.Condition1 = @Value1
                                 AND l.SomeOtherCondition = @SomeOtherValue)'

    SET @SQL = CASE 
                 WHEN @IncludeBelow = 0 THEN
                   ' AND p.LocationTypeId = @LocationType '
                 ELSE ''
               END   

BEGIN 

  EXEC sp_executesql @SQL, 
                     N'@Value1 INT, @SomeOtherValue VARCHAR(40), @LocationType INT',
                     @Value1, @SomeOtherValue, @LocationType

END

Program Structure -- Simple Commandline To Do List App -- What's the Haskell way?

6 votes

Background: I am working on my first complete program in Haskell, a simple to do list application on the command line.

My question is one of structure and has two levels: (1) What is the best way to do it? and (2) What is the Haskell (functional) way to do it? The reason I phrase it this way is because I suspect that there may be a faster or easier way to do it that ignores the principles of functional programming. I'd like to do it in the more elegant and clear way, as it is more of a learning exercise than anything else.

Keep in mind that I'd (obviously) like the application to be somewhat persistent. The two options on the table right now are to store information in a flat text file or alternately in a Sqlite database.

The first structure that came to mind was something like this, where a ToDoList type object is simply a list of ToDo items:

import Data.List
import Data.Time

data ToDo = ToDo {
        todoId       :: Int,
        todoDue      :: ZonedTime,
        todoCreated  :: UTCTime,
        todoItem     :: String,
        todoPriority :: Priority,
        todoStatus   :: Status
        }
        deriving (Show, Read)

type ToDoList = [ToDo]

data Priority = Low | Medium | High
        deriving (Show, Read, Eq, Ord)

data Status = Complete | InProgress | Open
        deriving (Show, Read, Eq, Ord)

But then I began to wonder how best to store objects of this type. Would this imply that I store them in a flat file? Is there some way to relate objects of highly specified types like this to fields/columns in a database?

When I think about using a Sqlite database, it seems that all of the work will be done in the database calls, and Haskell types will have comparatively little to do with it. This seems bad.

In summary, the question is how best can I model the data structure of my simple to do list application in keeping with the notions of functional programming and Haskell ideals that I am seeking to practice with this project?

The Show/Read combination is a tremendously easy way of serializing and deserializing the internal state of your application, and because of purity it basically always works too. Furthermore, you'll get good practice writing functions that slice and dice lists, since you'll be able to treat the list as if it is entirely loaded into memory (and maybe if you want to play with some more efficient data structures, you can look at ways to optimize different queries.)

For example, if I want to find all items that are due before some date, I can write this using a filter on due:

dueBefore (ToDoList ts) d = ToDoList (filter (\t -> due t <= d) ts)

Some style nitpicks on your pseudocode:

  • Since all accessor functions (id, die. created...) get dumped into the module-wide namespace, it's good style to prefix/suffix them with the name of the record, such as todoId, todoDie. In this particular case, id is a real function, so you shouldn't shadow it!

  • ToDoList is a single constructor datatype with one value; you probably actually just want a newtype or type synonym. (Exercise: rewrite the above snippet to work with a type synonym.)

  • You probably want an Ord instance on Priority and Status

Mysql VIEWS vs. PHP query

6 votes

Hi,

I am working on a web application which involves create list of Restaurants in various lists like "Joe's must visit places". Now for each Restaurant and list, I have display on website which calculates

  • Calculating popularity of a Restaurant
  • Popularity of a list
  • Number of lists a Restaurant is present in

Currently I am using MySQL statements in PHP for this but planning to switch to MySQL VIEWS and do a simple select statement in PHP...

my question is, What is Advantage/Disadvantage of using VIEWS over writing sql queries in PHP?

Using views adds a level of abstraction : you may later change the structure of your tables, and you will not have to change the code that displays the information about the lists, because you will still be querying the view (the view definition may change, though).

The main difference is that views are updated after each insertion, such that the data is "ready" whenever you query the view, whereas using your custom query will have MySQL compute everything each time (there is some caching, of course).

The bottom line is that if your lists are updated less frenquently than they are viewed, you will see some gains in performance in using views.

Performing a bitwise sum

6 votes

As part of a complex aggregate I want to know the bitwise sum of some data, i.e. if I have rows with values 1,1,1,2,2,8 the bitwise sum is 11. In this case the values are all exact powers of two (single bits), so I can hack around it by grouping and summing over the groups (obviously this example is a bit tortured compared to the real query):

select SUM(y.test)
from (
    select x.test
    from ( -- garbage test data
        select 1 as [test]
        union all select 1
        union all select 1
        union all select 2
        union all select 2
        union all select 8) x
group by x.test) y

but is there a clean way to perform a bitwise sum in [T]SQL?

If all of your test values are single bits as in your example (1, 2, 8) - simply use SUM(DISTINCT col) in your query.

Hope that helps.

(For reference: http://msdn.microsoft.com/en-us/library/ms187810.aspx)

The purpose of SQL's EXISTS and NOT EXISTS

6 votes

Every now and then I see these being used, but it never seems to be anything that can't be performed as equally well, if not better, by using a normal join or subquery.

I see them as being misleading (they're arguably harder to accurately visualize compared to conventional joins and subqueries), often misunderstood (e.g. using SELECT * will behave the same as SELECT 1 in the EXISTS/NOT EXISTS subquery), and from my limited experience, slower to execute.

Can someone describe and/or provide me an example where they are best suited or where there is no option other than to use them? Note that since their execution and performance are likely platform dependent, I'm particularly interested in their use in MySQL.

Every now and then I see these being used, but it never seems to be anything that can't be performed as equally well, if not better, by using a normal join or subquery.

This article (though SQL Server related):

may be of interest to you.

In a nutshell, JOIN is a set operation, while EXISTS is a predicate.

In other words, these queries:

SELECT  *
FROM    a
JOIN    b
ON      some_condition(a, b)

vs.

SELECT  *
FROM    a
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    b
        WHERE   some_condition(a, b)
        )

are not the same: the former can return more than one record from a, while the latter cannot.

Their counterparts, NOT EXISTS vs. LEFT JOIN / IS NULL, are the same logically but not performance-wise.

In fact, the former may be more efficient in SQL Server:

MySQL "IN" operator performance on (large?) amount of values

6 votes

Hi,

I have been experimenting with Redis and MongoDB lately and it would seem that there are often cases where you would store an array of id's in either MongoDB or Redis. I'll stick with Redis for this question since I am asking about the MySQL IN operator.

I was wondering how performant it is to do a large amount (300-3000) of id's inside the IN operator, which would look something like this:

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 3000)

Imagine something as simple as a products and categories table which you might normally JOIN together to get the products from a certain category. In the example above you can see that under a given category in Redis ( category:4:product_ids ) I return all the product ids from the category with id 4, and place them in the above SELECT query inside the IN operator.

How performant is this?

Is this an "it depends" situation? Or is there a concreet "this is (un)acceptable" or "fast" or "slow" or should I add a LIMIT 25, or doesn't that help?

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 3000)
LIMIT 25

Or should I trim the array of product id's returned by Redis to limit it to 25 and only add 25 id's to the query rather than 3000 and LIMIT-ing it to 25 from inside the query?

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 25)

Any suggestions/feedback is much appreciated!

Generally speaking, if the IN list gets too large (for some ill-defined value of 'too large' that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need so be to hold the numbers.

If the numbers are a dense set (no gaps - which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000. However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all (unless the gaps are relatively few in number, in which case you could use: WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836 or whatever the gaps are.

Convert SQL to LINQ to SQL

6 votes

Hi

I have the SQL query

with c as (
  select  categoryId,parentId, name,0 as [level]
  from task_Category b 
  where b.parentId is null
    union all
  select b.categoryId,b.parentId,b.name,[level] + 1
  from task_Category b join c on b.parentId = 
    c.categoryId)
select name,[level],categoryId,parentId 
  as item 
from c

and I want to convert it to LINQ to SQL, yet my LINQ skills are not there yet. Could someone please help me convert this. It's the with and union statements that are making this a bit more complex for me.

Any help appreciated.

LINQ-to-SQL only supports basic queries; CTE and recursion are not included in this. Therefore, here's my high-tech translation to LINQ-to-SQL:

var data = ctx.ExecuteQuery<MyStub>(@"
with c as (
  select  categoryId,parentId, name,0 as [level]
  from task_Category b 
  where b.parentId is null
    union all
  select b.categoryId,b.parentId,b.name,[level] + 1
  from task_Category b join c on b.parentId = 
    c.categoryId)
select name,[level],categoryId,parentId 
  as item 
from c").ToList();

with

class MyStub {
    public string name {get;set;}
    public int level {get;set;}
    public int categoryId {get;set;}
    public int parentId {get;set;}
}

Using SQL, how do I update rows, using their own values?

5 votes

I have the following table which I'll call 'example'

id name       last_name
01 Adam       Adams
02 Bill       Billo
03 Cathy      McCathyson

I need to modify the table and end up with the following:

id name
01 Adam Adams
02 Bill Billo
03 Cathy McCathyson

For a single row, I know how to write this query:

UPDATE example SET name = 
   (SELECT name FROM example WHERE id = 01)+" "
   +(SELECT last_name FROM example WHERE id = 01)
WHERE id = 01;

How do I modify this query such that it updates each row with that row's values, as in the example?

EDIT: I've updated my example since it confused the issue.

UPDATE example SET NAME = NAME + ' ' + last_name

    ID NAME      LAST_NAME

     1 Adam       Adams
     2 Bill       Billo
     3 Cathy      McCathyson

SQL> UPDATE example SET NAME = NAME + ' ' + last_name
  2  /

3 rows updated

SQL> select * from example
  2  /

        ID NAME                    LAST_NAME
---------- -----------------------------------------
         1 Adam Adams              Adams
         2 Bill Billo              Billo
         3 Cathy McCathyson        McCathyson