Best sql questions in January 2012

Problems getting LEFT OUTER JOIN to work

17 votes

I thought I understood how left outer joins work, but I have a situation that is not working, and I'm not 100% sure if the way I have my query structured is incorrect, or if it's a data issue.

For background, I have the following MySQL table structures:

mysql> describe achievement;
+-------------+----------------------+------+-----+---------+-------+
| Field       | Type                 | Null | Key | Default | Extra |
+-------------+----------------------+------+-----+---------+-------+
| id          | varchar(64)          | NO   | PRI | NULL    |       |
| game_id     | varchar(10)          | NO   | PRI | NULL    |       |
| name        | varchar(64)          | NO   |     | NULL    |       |
| description | varchar(255)         | NO   |     | NULL    |       |
| image_url   | varchar(255)         | NO   |     | NULL    |       |
| gamerscore  | smallint(5) unsigned | NO   |     | 0       |       |
| hidden      | tinyint(1)           | NO   |     | 0       |       |
| base_hidden | tinyint(1)           | NO   |     | 0       |       |
+-------------+----------------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

and

mysql> describe gamer_achievement;
+----------------+---------------------+------+-----+---------+-------+
| Field          | Type                | Null | Key | Default | Extra |
+----------------+---------------------+------+-----+---------+-------+
| game_id        | varchar(10)         | NO   | PRI | NULL    |       |
| achievement_id | varchar(64)         | NO   | PRI | NULL    |       |
| gamer_id       | varchar(36)         | NO   | PRI | NULL    |       |
| earned_epoch   | bigint(20) unsigned | NO   |     | 0       |       |
| offline        | tinyint(1)          | NO   |     | 0       |       |
+----------------+---------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

As for the data, this is what I have populated here (only pertinent columns included for brevity):

+----+------------+------------------------------+
| id | game_id    | name                         |
+----+------------+------------------------------+
| 1  | 1480656849 | Cluster Buster               |
| 2  | 1480656849 | Star Gazer                   |
| 3  | 1480656849 | Flower Child                 |
| 4  | 1480656849 | Oyster-meister               |
| 5  | 1480656849 | Big Cheese of the South Seas |
| 6  | 1480656849 | Hexic Addict                 |
| 7  | 1480656849 | Collapse Master              |
| 8  | 1480656849 | Survivalist                  |
| 9  | 1480656849 | Tick-Tock Doc                |
| 10 | 1480656849 | Marathon Mogul               |
| 11 | 1480656849 | Millionaire Extraordinaire   |
| 12 | 1480656849 | Grand Pearl Pooh-Bah         |
+----+------------+------------------------------+
12 rows in set (0.00 sec)

and

+----------------+------------+--------------+---------+
| achievement_id | game_id    | earned_epoch | offline |
+----------------+------------+--------------+---------+
| 1              | 1480656849 |            0 |       1 |
| 2              | 1480656849 |            0 |       1 |
| 3              | 1480656849 |            0 |       1 |
| 4              | 1480656849 |   1149789371 |       0 |
| 7              | 1480656849 |   1149800406 |       0 |
| 8              | 1480656849 |            0 |       1 |
| 9              | 1480656849 |   1149794790 |       0 |
| 10             | 1480656849 |   1149792417 |       0 |
+----------------+------------+--------------+---------+
8 rows in set (0.02 sec)

In this particular case, the achievement table is the "master" table and will contain the information that I always want to see. The gamer_achievement table only contains information for achievements that are actually earned. For any particular game for any particular gamer, there can be any number of rows in the gamer_achievement table - including none if no achievements have been earned for that game. For example, in the sample data above, achievements with ids 5, 6, 11, and 12 have not been earned.

What I currently have written is

select a.id,
       a.name,
       ga.earned_epoch,
       ga.offline
from   achievement a 
       LEFT OUTER JOIN gamer_achievement ga 
       ON (a.id = ga.achievement_id and a.game_id = ga.game_id)
where  ga.gamer_id = 'fba8fcaa-f57b-44c6-9431-4ab78605b024' 
       and a.game_id = '1480656849'
order by convert (a.id, unsigned)

but this is only returning the full information for those achievements that have actually been earned - the unearned achievement information from the right side table (gamer_achievement) is not being show with the NULL values as I would expect from this type of query. This is what I am expecting to see:

+----+-------------------------------+--------------+---------+
| id | name                          | earned_epoch | offline |
+----+-------------------------------+--------------+---------+
| 1  | Cluster Buster                |            0 |       1 |
| 2  | Star Gazer                    |            0 |       1 |
| 3  | Flower Child                  |            0 |       1 |
| 4  | Oyster-meister                |   1149789371 |       0 |
| 5  | Big Cheese of the South Seas  |         NULL |    NULL |
| 6  | Hexic Addict                  |         NULL |    NULL |
| 7  | Collapse Master               |   1149800406 |       0 |
| 8  | Survivalist                   |            0 |       1 |
| 9  | Tick-Tock Doc                 |   1149794790 |       0 |
| 10 | Marathon Mogul                |   1149792417 |       0 |
| 11 | Millionaire Extraordinaire    |         NULL |    NULL |
| 12 | Grand Pearl Pooh-Bah          |         NULL |    NULL |
+----+-------------------------------+--------------+---------+
12 rows in set (0.00 sec)

What am I missing here? From what I understand, the basic query LOOKS right to me, but I'm obviously missing some piece of critical information.

Many have answered, but I'll try too and hopefully lend in some more clarification. How I have always interpreted it (and you can check so many other posts I've responded to with LEFT joins), I try to list the table I want everything from first (left side... hence read from left to right). Then left join to the "Other" table (right side) on whatever the criteria is between them... Then, when doing a left join, and there are additional criteria against the right side table, those conditions would stay with that join condition. By bringing them into the "WHERE" clause would imply an INNER JOIN (must always match) which is not what you want... I also try to always show the left table alias.field = right table alias.field to keep the correlation clear... Then, apply the where clause to the basis criteria you want from the first table.. something like

select 
      a.id,
      a.name,
      ga.earned_epoch,
      ga.offline
   from   
      achievement a 
         LEFT OUTER JOIN gamer_achievement ga 
             ON a.id = ga.achievement_id
            AND a.game_id = ga.game_id
            AND ga.gamer_id = 'fba8fcaa-f57b-44c6-9431-4ab78605b024'
   where
      a.game_id = '1480656849'
   order by 
      convert (a.id, unsigned)

Notice the direct relation between "a" and "ga" by the common ID and game ID values, but then tacked on the specific gamer. The where clause only cares at the outer level of achievement based on the specific game.

prevent sql injection in oracle "order by" part

14 votes

To get some data I'm creating an sql query :)
Of course there's some filtering and ordering parts.

To get the result I use "NamedParameterJdbcTemplate" and when I need to add something to the "where" part, I use parameter map, to prevent injection.

But it's different with "order by" part, as there is no automatic escaping (and it's a part of sql). This order part is sometimes filled with data from user (directly), sometimes put some extra sort parameters from inside code. There is one problem: sometimes this sort field contains not only column name, but a sql statement.

Now each parameter for sorting is escaped manually by replacing some characters (like ') to empty string, but some parameters we set for our code is a bit complex to pass this rule.

What is best way to prevent sql injections in sort part of query, when you use jdbc template?

To help guard against SQL injection on the database side, have a look at the DBMS_ASSERT built-in Oracle package: http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_assert.htm

You might find the SIMPLE_SQL_NAME function will help protect against SQL Injection for your ORDER BY clause.

Hope it helps...

How to select single row based on the max value in multiple rows

10 votes

Possible Duplicate:
SQL: Find the max record per group

I have a table with four columns as such:

name   major    minor  revision
p1     0        4      3
p1     1        0      0
p1     1        1      4
p2     1        1      1
p2     2        5      0
p3     3        4      4

This is basically ca table containing records for each version of a program. I want to do a select to get all of the programs and their latest version so the results would look like this:

name   major    minor  revision
p1     1        1      4
p2     2        5      0
p3     3        4      4

I can't just group by the name and get the max of each column because then i would just end up with the highest number from each column, but not the specific row with the highest version. How can I set this up?

The way I try to solve SQL problems is to take things step by step.

  • You want the maximum revision for the maximum minor version corresponding to the maximum major version for each product.

The maximum major number for each product is given by:

SELECT Name, MAX(major) AS Major FROM CA GROUP BY Name;

The maximum minor number corresponding to the maximum major number for each product is therefore given by:

SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
  FROM CA
  JOIN (SELECT Name, MAX(Major) AS Major
          FROM CA
         GROUP BY Name
       ) AS CB
    ON CA.Name = CB.Name AND CA.Major = CB.Major
 GROUP BY CA.Name, CA.Major;

And the maximum revision (for the maximum minor version number corresponding to the maximum major number for each product), therefore, is given by:

SELECT CA.Name, CA.Major, CA.Minor, MAX(CA.Revision) AS Revision
  FROM CA
  JOIN (SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
          FROM CA
          JOIN (SELECT Name, MAX(Major) AS Major
                  FROM CA
                 GROUP BY Name
               ) AS CB
            ON CA.Name = CB.Name AND CA.Major = CB.Major
         GROUP BY CA.Name, CA.Major
       ) AS CC
    ON CA.Name = CC.Name AND CA.Major = CC.Major AND CA.Minor = CC.Minor
 GROUP BY CA.Name, CA.Major, CA.Minor;

Tested - it works and produces the same answer as Andomar's query does.


Performance

I created a bigger volume of data (11616 rows of data), and ran a benchmark timing of Andomar's query against mine - target DBMS was IBM Informix Dynamic Server (IDS) version 11.70.FC2 running on MacOS X 10.7.2. I used the first of Andomar's two queries since IDS does not support the comparison notation in the second one. I loaded the data, updated statistics, and ran the queries both with mine followed by Andomar's and with Andomar's followed by mine. I also recorded the basic costs reported by the IDS optimizer. The result data from both queries were the same (so the queries are both accurate - or equally inaccurate).

Table unindexed:

Andomar's query                           Jonathan's query
Time: 22.074129                           Time: 0.085803
Estimated Cost: 2468070                   Estimated Cost: 22673
Estimated # of Rows Returned: 5808        Estimated # of Rows Returned: 132
Temporary Files Required For: Order By    Temporary Files Required For: Group By

Table with unique index on (name, major, minor, revision):

Andomar's query                           Jonathan's query
Time: 0.768309                            Time: 0.060380
Estimated Cost: 31754                     Estimated Cost: 2329
Estimated # of Rows Returned: 5808        Estimated # of Rows Returned: 139
                                          Temporary Files Required For: Group By

As you can seen, the index dramatically improves the performance of Andomar's query, but it still seems to be more expensive on this system than my query. The index gives a 25% time saving for my query. I'd be curious to see comparable figures for the two versions of Andomar's query on comparable volumes of data, with and without the index. (My test data can be supplied if you need it; there were 132 products - the 3 listed in the question and 129 new ones; each new product had (the same) 90 version entries.)

The reason for the discrepancy is that the sub-query in Andomar's query is a correlated sub-query, which is a relatively expensive process (dramatically so when the index is missing).

Are SQL ANY and SOME keywords synonyms in all SQL dialects?

9 votes

In Postgres, ANY and SOME are synonyms when used on the right hand side of a predicate expression. For example, these are the same:

column = ANY (SELECT ...)
column = SOME (SELECT ...)

This is documented here:

http://www.postgresql.org/docs/9.1/static/functions-subquery.html#FUNCTIONS-SUBQUERY-ANY-SOME

I have observed ANY and SOME to be supported by at least these SQL DBMSs:

  • DB2
  • Derby
  • H2
  • HSQLDB
  • Ingres
  • MySQL
  • Oracle
  • Postgres
  • SQL Server
  • Sybase ASE
  • Sybase SQL Anywhere

Can I safely assume that all of those dialects (and others, too) treat ANY and SOME as synonyms or is there a subtle difference between the two keywords in any/some DBMS?

I have found this in the SQL92 definition:

<quantifier> ::= <all> | <some>
<all> ::= ALL
<some> ::= SOME | ANY

This doesn't say anything about the semantics of ANY and SOME. Later on in the document, only <some> is referenced, not the two keywords. I'm suspecting that there might be a subtle difference in NULL handling, for instance, at least in some DBMSs. Any/some pointer to a clear statement whether this can be assumed or not is welcome.

Few lines after what you're quoting, the SQL92 standard also specifies the semantics for <some>, namely:

c) If the implied <comparison predicate> is true for at least
one row RT in T, then "R <comp op> <some> T" is true.

d) If T is empty or if the implied <comparison predicate> is
false for every row RT in T, then "R <comp op> <some> T" is
false.

e) If "R <comp op> <quantifier> T" is neither true nor false,
then it is unknown.

These rules apply for the <some> token, independent on whether it is the SOME or ANY alternative, so yes, they are synonyms according to the standard

Would relational databases scale as well (or better) than their NoSQL counterparts if we drop the relationships?

8 votes

Disclaimer: This is a broad question, so it could be moved to a different source (if the admins find it appropriate).

All the cool kids seem to be dropping relational databases in favor of their NoSQL counterparts. Everyone will have their reasons, from scaling issues to simply being on the bleeding edge of tech. And, I am not here to question their motives.

However, what I am interested in is whether any NoSQL transitions ever validated the performance (maintenance) gains over a traditional RDBMS when relationships were dropped. Why would we want to use a RDBMS when the core reason it exists is dropped? A few reasons come to mind

  1. 30+ years of academic and work research in developing these systems
  2. A well-known language in Structured Query Language (SQL).
  3. Stable and mature ORM support across technologies (Hibernate, ActiveRecord)

Clearly, in the modern world where horizontal scaling is important, there is a need to make sure that shards are fault tolerant, updated within the time intervals required by the app, etc. However, those needs shouldn't necessarily be the responsibility of a system that stores data (case in point: ZooKeeper).

Also, I acknowledge that research should be dedicated to NoSQL and that time spent in this arena will clearly lead to better more internet worthy technologies. However, a comparison of sorts between NoSQL and traditional RDBMS offerings (minus relationships) would be useful in making business decisions.

UPDATE 1: When I refer to NoSQL databases, I am talking about data stores that may not require fixed table schemas and usually avoid join operations. Hence, the emphasis in the question on dropping the relationships in a traditional SQL RDBMS

I don't find that inter-table relationships are the main limiter for scalability. I use queries with joins regularly and get good scalability if indexes are defined well.

The greater limiter for scalability is the cost of synchronous I/O. The requirements of consistency and durability -- that the DBMS actually and reliably saves data when it tells you it saved data -- is expensive.

Several NoSQL products that are currently in vogue achieve great performance by weakening their consistency and durability guarantees in their default configuration. There are many reports of CouchDB or MongoDB losing data.

There are ways you can configure those NoSQL products to be more strict about durability, but then you sacrifice their impressive performance numbers.

Likewise, you can make an SQL database achieve high performance like the NoSQL products, by disabling the default features that ensure data safety. See RunningWithScissorsDB.

PS: If you think document-oriented databases are "cutting edge", I invite you to read about MUMPS. Everything old is new again. :-)

How to tally wins and losses using SUM and CASE?

7 votes

I am using SQL Server 2008

I am trying to tally the wins and losses for any given bike. Each time a user votes, he casts a vote for one bike (1) and a vote against another bike (0).

My vote table looks like this:

VoteID --- BikeID ---- Vote
1          100         1
2          101         0
3          100         0
4          101         1
5          102         1
6          100         0
7          102         0
8          101         1

I want my results to look like this when I run a query for a specific bike

Wins -- Losses
5       6

Right now, my results look like this:

Wins --- Losses
5        NULL
NULL     6   

My query looks like this:

SELECT  SUM(CASE WHEN Vote = 1 THEN 1 END) AS Wins,
    SUM(CASE WHEN Vote = 0 THEN 1 END) AS Losses
FROM     Votes
WHERE    BikeID = 101
GROUP BY Vote

What do I need to do to get the results on one line?

SELECT  SUM(CASE WHEN Vote = 1 THEN 1 ELSE 0 END) AS Wins,
    SUM(CASE WHEN Vote = 0 THEN 1 ELSE 0 END) AS Losses
FROM     Votes
WHERE    BikeID = 101

The problem is your case statements did not cover all conditions, and so were returning null for the cases where they did not account for.

Also, you did not need the group by vote, since you aren't actually selecting the vote outside the aggregates.

Retrieve an object from entityframework without ONE field

7 votes

I'm using entity framework to connect with the database. I've one little problem:

I've one table which have one varbinary(MAX) column(with filestream).

I'm using SQL request to manage the "Data" part, but EF for the rest(metadata of the file).

I've one code which has to get all files id, filename, guid, modification date, ... of a file. This doesn't need at all the "Data" field.

Is there a way to retrieve a List but without this column filled?

Something like

context.Files.Where(f=>f.xyz).Exclude(f=>f.Data).ToList();

??

I know I can create anonymous objects, but I need to transmit the result to a method, so no anonymous methods. And I don't want to put this in a list of anonymous type, and then create a list of my non-anonymous type(File).

The goal is to avoid this:

using(RsSolutionsEntities context = new RsSolutionsEntities())
{
    var file = context.Files
        .Where(f => f.Id == idFile)
        .Select(f => new {
            f.Id, f.MimeType, f.Size, f.FileName, f.DataType,
            f.DateModification, f.FileId
        }).FirstOrDefault();

    return new File() {
        DataType = file.DataType, DateModification = file.DateModification,
        FileId = file.FileId, FileName = file.FileName, Id = file.Id,
        MimeType = file.MimeType, Size = file.Size
    };
}

(I'm using here the anonymous type because otherwise you will get a NotSupportedException: The entity or complex type 'ProjectName.File' cannot be constructed in a LINQ to Entities query.)

(e.g. this code throw the previous exception:

File file2 = context.Files.Where(f => f.Id == idFile)
  .Select(f => new File() {Id = f.Id, DataType = f.DataType}).FirstOrDefault();

and "File" is the type I get with a context.Files.ToList(). This is the good class:

using File = MyProjectNamespace.Common.Data.DataModel.File;

File is a known class of my EF datacontext:

public ObjectSet<File> Files
{
    get { return _files  ?? (_files = CreateObjectSet<File>("Files")); }
}
private ObjectSet<File> _files;

Is there a way to retrieve a List but without this column filled?

Not without projection which you want to avoid. If the column is mapped it is natural part of your entity. Entity without this column is not complete - it is different data set = projection.

I'm using here the anonymous type because otherwise you will get a NotSupportedException: The entity or complex type 'ProjectName.File' cannot be constructed in a LINQ to Entities query.

As exception says you cannot project to mapped entity. I mentioned reason above - projection make different data set and EF don't like "partial entities".

Error 16 Error 3023: Problem in mapping fragments starting at line 2717:Column Files.Data in table Files must be mapped: It has no default value and is not nullable.

It is not enough to delete property from designer. You must open EDMX as XML and delete column from SSDL as well which will make your model very fragile (each update from database will put your column back). If you don't want to map the column you should use database view without the column and map the view instead of the table but you will not be able to insert data.

As a workaround to all your problems use table splitting and separate the problematic binary column to another entity with 1 : 1 relation to your main File entity.

Selecting specific records in a SQL Server database from C#

6 votes

I am currently trying to grab some rows from a SQL Server database using C# that are of the following criteria:

  • From the RamResults database
  • in the Results table
  • where the Date column is equal to the current date

I have the following so far:

// Open the same connection with the same connection string.
using (SqlCeConnection con = new SqlCeConnection(DatabaseControl.conString))
{
   con.Open();
   // Read specific values in the table.
   using (SqlCeCommand com = new SqlCeCommand("SELECT Result FROM RamResults WHERE Date == @Form1.date", con))
   {
      SqlCeDataReader reader = com.ExecuteReader();
      while (reader.Read())
      {
         int resultsoutput = reader.GetInt32(0);
         MessageBox.Show(resultsoutput.ToString());
      }
   }
}

Using SELECT Result FROM RamResults WHERE Date == Form1.date throws an error:

There was an error parsing the query. [ Token line number = 1,Token line offset = 43,Token in error = = ]

Although if I take out the WHERE statement e.g.

SELECT Result FROM RamResults

it works perfectly

Description

2 things

  1. Use = instead of == because this is the right equals operator in T-SQL. Your Query should be like this

    SELECT Result FROM RamResults WHERE Date = @Date

  2. You forget to pass in the parameter.

Sample

// Open the same connection with the same connection string.
using (SqlCeConnection con = new SqlCeConnection(DatabaseControl.conString))
{
    con.Open();
    // Read specific values in the table.
    using (SqlCeCommand com = new SqlCeCommand("SELECT Result FROM RamResults WHERE Date = @Date", con))
    {
        com.Parameters.AddWithValue("@Date", Form1.date);
        SqlCeDataReader reader = com.ExecuteReader();
        while (reader.Read())
        {
            int resultsoutput = reader.GetInt32(0);
            MessageBox.Show(resultsoutput.ToString());
        }
    }
}

MySQL Query Optimization - inner queries

6 votes

This is the whole query...

SELECT s.*, (SELECT url FROM show_medias WHERE show_id = s.id AND is_primary = 1) AS media_url
FROM (shows As s)
WHERE `s`.`id` IN (
 SELECT DISTINCT st.show_id
 FROM show_time_schedules AS sts
 LEFT JOIN show_times AS st ON st.id = sts.show_time_id
 WHERE sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date)
 )
AND `s`.`is_active` = 1
ORDER BY s.name asc 

If...

SELECT url FROM show_medias WHERE show_id = s.id AND is_primary = 1
(0.0004 sec)

And...

 SELECT DISTINCT st.show_id
 FROM show_time_schedules AS sts
 LEFT JOIN show_times AS st ON st.id = sts.show_time_id
 WHERE sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date)
(0.0061 sec)

Is there an obvious reason....

SELECT s.*, (inner query 1) AS media_url
FROM (shows As s)
WHERE `s`.`id` IN ( inner query 2 )
AND `s`.`is_active` = 1
ORDER BY s.name asc

is taking 5.7245 sec?

EXPLAIN EXTENDED

id  select_type         table       type    possible_keys   key     key_len ref                     rows    filtered    Extra
1   PRIMARY             s           ALL     NULL            NULL    NULL    NULL                    151     100.00      Using where; Using filesort
3   DEPENDENT SUBQUERY  sts         ALL     NULL            NULL    NULL    NULL                    26290   100.00      Using where; Using temporary
3   DEPENDENT SUBQUERY  st          eq_ref  PRIMARY         PRIMARY 4       bvcdb.sts.show_time_id  1       100.00      Using where
2   DEPENDENT SUBQUERY  show_medias ALL     NULL            NULL    NULL    NULL                    159     100.00      Using where

You can always use EXPLAIN or EXPLAIN EXTENDED to see what MySql is doing with a query

You could also write your query a slightly different way, have you tried the following?

SELECT        s.*, 
              sm.url AS media_url 
FROM          shows AS s
INNER JOIN    show_medias AS sm ON s.id = SM.show_id
WHERE `s`.`id` IN ( 
                        SELECT DISTINCT st.show_id 
                        FROM show_time_schedules AS sts 
                        LEFT JOIN show_times AS st ON st.id = sts.show_time_id 
                        WHERE sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date) 
                        ) 
AND            `s`.`is_active` = 1 
AND            sm.is_primary = 1
ORDER BY       s.name asc 

It would be interesting to see what the effect of that is. I would expect it to be faster as, at the moment, I think MySql will be running inner query 1 for each show you have (so that one query will be run many times. A join should be more efficient.)

Replace the INNER JOIN with a LEFT JOIN if you want all shows that don't have a row in show_medias.

EDIT:

I'll take a look at your EXPLAIN EXTENDED shortly, I also wonder if you want to try the following; it removes all of the subqueries:

SELECT        DISTINCT s.*,  
                       sm.url AS media_url  
FROM                   shows AS s 
INNER JOIN             show_medias AS sm ON s.id = SM.show_id
INNER JOIN             show_times AS st ON (s.id = st.show_id)
RIGHT JOIN             show_time_schedules AS sts ON (st.id = sts.show_time_id)

WHERE                  `s`.`is_active` = 1  
AND                    sm.is_primary = 1 
AND                    sts.schedule_date BETWEEN CAST('2012-01-10' AS date) AND CAST('2012-01-14' AS date)  
ORDER BY               s.name asc 

(It would also be good to see the EXPLAIN EXTENDED on these - you could add it to the comments for this one).

Further EDIT:

On your EXPLAIN EXTENDED (a good start on how to read these is here)

The USING FILESORT and USING TEMPORARY are both key indicators. Hopefully, the second query I recommend should remove any TEMPORARY tables (in the subquery). Try then leaving the ORDER BY off to see if that makes a difference (and we can add that to the findings so far :-)

I can also see that the query is potentially missing out on a lot of index lookups; all of your id columns are prime candidates for index matches (with the usual index caveats). I'd also try adding those indexes and then running EXPLAIN EXTENDED again to see what the difference is now (EDIT as we already know from your comment above!)

improving performance of mysql load data infile

6 votes

I'm trying to bulk load around 12m records into a InnoDB table in a (local) mysql using LOAD DATA INFILE (from CSV) and finding it's taking a very long time to complete.

The primary key type is UUID and the keys are unsorted in the data files.

I've split the data file into files containing 100000 records and import it as:

mysql -e 'ALTER TABLE customer DISABLE KEYS;'
for file in *.csv
    mysql -e "SET sql_log_bin=0;SET FOREIGN_KEY_CHECKS=0; SET UNIQUE_CHECKS=0;
    SET AUTOCOMMIT=0;LOAD DATA INFILE '${file}' INTO TABLE table 
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';  COMMIT"

This works fine for the first few hundred thousand records but then the insert time for each subsequent load seems to keep growing (from around 7 seconds to around 2 minutes per load before I killed it.)

I'm running on a machine with 8GB RAM and have set the InnoDB parameters to:

innodb_buffer_pool_size =1024M
innodb_additional_mem_pool_size =512M
innodb_log_file_size = 256M
innodb_log_buffer_size = 256M

I've also tried loading a single CSV containing all rows with no luck - this ran in excess of 2 hours before I killed it.

Is there anything else that could speed this up as this seems like an excessive time to only load 12m records?

Its always hard to tell what is the cause of performance issues but these are my 2 cents: Your key being a uuid is randomly distributed which makes it hard to maintain an index. The reason being that keys are stored by range in a file system block, so having random uuids follow each other makes the OS read and write blocks to the file system without leveraging the cache. I don't know if you can change the key, but you could maybe sort the uuids in the input file and see if that helps. FYI, to understand this issue better I would take a look at this blog post and maybe read this book mysql high performance it has a nice chapter about innodb clustered index. Good Luck!

Oracle 11g DB returning Streams instead of Strings

6 votes

I've got a new database here and it's an upgraded version from Oracle 10g to Oracle 11g - the main problem is with LOB columns and everytime any function returns a LOB as result, the new database won't return strings like the old one did:

Old DB:

["C"]=>
string(23) "3874163,3874197,3874201"

New DB:

["C"]=>
resource(182) of type (stream)

Now when reading the streams sometimes there is an error of a non-existing stream resource beeing referenced and everything fails. I'm guessing the connection closed in the meantime without the stream beeing read and therefore the access is lost.

When changing the statements to include a casting against varchar for example:

CONVERT(VARCHAR, C, 120)

Or like this:

SELECT TO_CHAR(FUNC())

The value is returned as a string again but this is not really an optimal solution as every statement would need to be changed.

Is there any way/option to prevent LOBs from beeing delivered as streams so they are instead delivered as strings like in Oracle 10g?

Edit:
We are using the oci function-set for db access.

Not really an answer as such but a few items that I hope helps.

It looks like that there is a small difference in the way that LOBs are returned between 10g and 11g, under 11g there is some notes about the conversion from btyes to byteStreams when LOBs are over a certain value, in the JDBC reference manual (I understand that this doesnt effect OCI calls as they use a different driver set).

From what I can see in terms of the OCI8 functions within php the default operation of the fetch functions is that the LOBs are returns as a reference and need to be accessed using the ->read() ->load() etc functions (see http://au.php.net/oci_fetch_array - regarding the mode and the default).

Now I dont know if you are using the OCI functions to access your oracle system as it's not specified in your question.

Couple of other items that would help in figuring this out would be if you could let us know if you recompiled php or updated the oracle drivers with the newer client version at all.

I know its not a full solution but if you are using oci_fetch_* to return the row, add a second argument to the call of OCI_RETURN_LOBS, this will cause the fetch to return a string of the LOB field instead of a reference to a stream, or use the $variable["C"]->load() to access this LOB this will cause it to load the full stream and act like a normal string.

Hope this helps.

Is there any 'web-based' sql test environment?

6 votes

In HTML+CSS+JS world, http://jsfiddle.net/ is very helpful tool for asking / making example about web development. And I also saw several browser(javascript)-based programming language compilers and REPLs. But I can't find online / web-based test environment for database operations( especially for RDBMS ).

Is there any open/free database service with web-based interfaces for testing queries?


Added: This tool will be good for this situation; If I'm troubling with complex queries, then create a sample table via web interface and ask it on stackoverflow with the 'sample table URL'. Anyone can access to the URL and test their queries on web site. (Yes, queries are running on 'real' database system) And also the query results can be tracked, then we can even make 'ranking' for it :)

There are free "disposable" database servers like db4free and FreeSQL (offline?) and even MonoQL.

As far as the web-based interfaces and short URLs go, I don't think you'll have much luck.

To manage your data you have to stick to what is provided (usually phpMyAdmin or similar) and there is no short-URL to query mapping. One other caveat of such system is that (without the appropriate user permissions) one user could easily destroy all your test data -- and remember that (relational) database versioning is much more expensive than plain text versioning, so that's pretty much out of the question.

For non-RDBMS, I can think of try.mongodb.org -- but it suffers from the same problems.

Almost forgot, the Stack Exchange Data Explorer, lets you practice T-SQL queries (with permalinks).


PS: As a personal side-note, I think it's a cool idea and I would love to see something like that implemented, perhaps even mashed-up with SchemaBank or similar - that would be just awesome.

Hotel Reservation System Sql Query?

6 votes

I want to build a Hotel Reservation System. For this system; database is also used fro an other program... But i have problem: before the reservation i want to see which number of rooms type are available for my for my reservation..

My table create sql querys

CREATE TABLE oteldb.oda (
oda_id INT (11) NOT NULL auto_increment,
oda_tip_id INT (11) DEFAULT NULL,
oda_adi VARCHAR (20) DEFAULT NULL,
PRIMARY KEY (oda_id)
)
ENGINE = MyISAM
AUTO_INCREMENT = 1
CHARACTER SET utf8
COLLATE utf8_general_ci;

CREATE TABLE oteldb.tip (
tip_id INT (11) NOT NULL auto_increment,
tip_adi VARCHAR (20) DEFAULT NULL,
PRIMARY KEY (tip_id)
)
ENGINE = MyISAM
AUTO_INCREMENT = 1
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = FIXED;

CREATE TABLE oteldb.rezervasyon (
rezervasyon_id INT (11) NOT NULL auto_increment,
rezervasyon_gt DATE DEFAULT NULL,
rezervasyon_ct DATE DEFAULT NULL,
rezervasyon_oda_id INT (11) DEFAULT NULL,
PRIMARY KEY (rezervasyon_id)
)
ENGINE = MyISAM
AUTO_INCREMENT = 1
CHARACTER SET utf8
COLLATE utf8_general_ci;

i try this but not work

SELECT
*
FROM
oteldb.tip
WHERE
IN tip.tip_id
(SELECT
oteldb.oda.oda_tip_id
FROM
oteldb.oda
WHERE
IN oda.oda_id note

(SELECT
oteldb.rezervasyon.rezervasyon_oda_id
FROM
oteldb.rezervasyon
WHERE
"2012-01-03" BETWEEN AND rezervasyon_ct rezervasyon_gt
AND "2012-01-22" AND BETWEEN rezervasyon_gt rezervasyon_ct))

thanks now...

Assuming that available rooms are those that are not already reserved at any time during the query period, and that rezervasyon_gt and rezervasyon_ct are the reservation start and end dates respectively, try:

select t.tip_adi, count(oda.oda_tip_id)
from oteldb.tip t
left join (select oda_tip_id
           from oteldb.oda o
           where not exists
                (select null
                 from oteldb.rezervasyon r
                 where r.rezervasyon_oda_id = o.oda_id and
                       r.rezervasyon_gt <= '2012-01-22' and
                       '2012-01-03' <= r.rezervasyon_ct)
           ) oda on oda.oda_tip_id = t.tip_id
group by t.tip_adi

Tool for making diagram from SQL query

6 votes

I have this complicated SQL query for Oracle that I want to visualize in a diagram to make it understandable for my co-workers. I tried at http://snowflakejoins.com but it just chokes on it.

Has someone a better suggestion? I prefer a web-app on the internet and if not a desktop app for windows.

with 
  logs as (
    select 
      l.job_id, 
      l.subjob,
      sum(l.verwerkt) verwerkt, 
      sum(l.errors) errors, 
      max(l.datum) laatst
     from 
      dinf_monitor_logs l, 
      dinf_monitor_jobs j 
     where 
      l.datum>sysdate-j.dagen
      and j.job_id=l.job_id(+)
     group by 
      l.job_id, 
      l.subjob
  ),
  alllogs as (
    select job_id, subjob, max(datum) laatst from dinf_monitor_logs group by job_id, subjob
  )
  select row_number() over(order by alllogs.job_id, alllogs.subjob) r,
    alllogs.job_id,
    alljobs.naam,
    alllogs.subjob,
    logs.verwerkt, 
    logs.errors, 
    alllogs.laatst datum,
    alljobs.wikilink,
    alljobs.loglink,
    alljobs.contact,
    case 
      when alllogs.laatst is null then 1
      when round(sysdate-(alllogs.laatst+alljobs.dagen))<0 then 0
      else round(sysdate-(alllogs.laatst+alljobs.dagen))
    end overtijd,
    case 
      when logs.errors-alljobs.max_errors>0 then 5
      when logs.verwerkt-alljobs.min_verwerkt<0 then 7
      when round(sysdate-(alllogs.laatst+alljobs.dagen))>0 then 3
      else 11
    end status
  from logs, alllogs, (select job_id, naam, wikilink, loglink, contact, dagen, min_verwerkt, max_errors from dinf_monitor_jobs) alljobs
  where 
    logs.job_id(+)=alllogs.job_id 
    and logs.subjob(+)=alllogs.subjob
    and alllogs.job_id=alljobs.job_id
  order by alllogs.job_id, alllogs.subjob

You can use the "Query Builder" tab of the Oracle's SQL Developer.

The result of your sample query will be:

query in query builder

SELECT COUNT(*) with an ORDER BY

6 votes

Will the following two queries be executed in the same way?

SELECT COUNT(*) from person ORDER BY last_name;

and

SELECT COUNT(*) from person;

Either way they should display the same results, so I was curious if the ORDER BY just gets ignored.

The reason I am asking is because I am displaying a paginated table where I will get 20 records at a time from the database and then firing a second query that counts the total number of records. I want to know if I should use the same criteria that the first query used, or if I should be removing all sorting from the criteria?

According to the execution plan, the two queries are different. For example, the query:

select count(*) from USER

Will give me:

INDEX (FAST FULL SCAN)  3.0 3   453812  3457        1   TPMDBO  USER_PK FAST FULL SCAN  INDEX (UNIQUE)  ANALYZED

As you can see, we hit USER_PK which is the primary key of that table.

If I sort by a non-indexed column:

select count(*) from USER ORDER BY FIRSTNAME --No Index on FIRSTNAME

I'll get:

TABLE ACCESS (FULL) 19.0    19  1124488 3457    24199   1   TPMDBO  USER    FULL    TABLE   ANALYZED    1

Meaning we did a full table scan (MUCH higher node cost)

If I sort by the primary key (which is already index,) Oracle is smart enough to use the index to do that sort:

INDEX (FAST FULL SCAN)  3.0 3   453812  3457    13828   1   TPMDBO  USER_PK FAST FULL SCAN  INDEX (UNIQUE)  ANALYZED

Which looks very similar to the first execution plan.

So, the answer to your question is absolutely not - they are not the same. However, ordering by an index that Oracle is already seeking anyway will probably result in the same query plan.

MS Access: WHERE-EXISTS-clause not working on views?

6 votes

Prerequisites: In MS Access 2010 create the following tables:

CREATE TABLE ATBL(ID INT);
INSERT INTO ATBL(ID) VALUES (1);
INSERT INTO ATBL(ID) VALUES (2);
INSERT INTO ATBL(ID) VALUES (3);

CREATE TABLE BTBL(ID INT);
INSERT INTO BTBL(ID) VALUES (1);
INSERT INTO BTBL(ID) VALUES (2);

Also create a view called BVIEW which uses the following SELECT statement:

SELECT A.ID FROM ATBL AS A WHERE A.ID = 1 OR A.ID = 2

Now BVIEW should have the same contents as BTBL. Nevertheless the following two queries will return different results:

SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM  BTBL AS B WHERE B.ID=A.ID)
SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM BVIEW AS B WHERE B.ID=A.ID)

The first query returns two records (1 and 2), but the second query returns all records from ATBL. What's wrong here? Am I missing something?

A view is really a saved SQL SELECT statement. At least, this is what a saved view in MS Access is. And you use the same inner variables A and B. IMHO, they are getting mixed. The last line really looks as

SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM (SELECT A.ID FROM ATBL AS A WHERE A.ID = 1 OR A.ID = 2) AS B WHERE B.ID=A.ID)

Try to change some inner names, for example:

SELECT AA.ID FROM ATBL AS AA WHERE AA.ID = 1 OR AA.ID = 2

So, the last line will look as

SELECT A.ID FROM ATBL AS A WHERE EXISTS (SELECT 1 FROM (SELECT AA.ID FROM ATBL AS AA WHERE AA.ID = 1 OR AA.ID = 2) AS B WHERE B.ID=A.ID)

So, as we see here, MS Access even doesn't know how to isolate aliases!

Fastest and most efficient way to pre-populate database in Android

5 votes

If you want to pre-populate a database (SQLite) in Android, this is not that easy as one might think.

So I found this tutorial which is often referenced here on Stack Overflow as well.

But I don't really like that way of pre-populating the database since you take the control from the database handler and create the files yourself. I would prefer to not touch the file system and let the database handler do everything on its own.

So what I thought one could do is create the database in the database handler's onCreate() as usual but then load a file (.sql) from /assets which contains the statements to fill in the values:

INSERT INTO testTable (name, pet) VALUES ('Mike', 'Tiger');
INSERT INTO testTable (name, pet) VALUES ('Tom', 'Cat');
...

But calling execSQL() in the handler's onCreate() doesn't really work. It seems that the /assets file must not have more than 1MB and the execSQL() only executes the first statement (Mike - Tiger).

What would you do do pre-populate the database?

I suggest the following:

  1. Wrap all of your INSERT logic into a transaction (BEGIN... COMMIT, or via the beginTransaction()... endTransaction() APIs)
  2. As already suggested, utilize the bind APIs and recycle objects.
  3. Don't create any indexes until after this bulk insert is complete.

Additionally take a look at Faster bulk inserts in sqlite3?

NOLOCK with Multithreading

5 votes

I am working on a multithreaded application (C#), and 2 threads are updating a table using NOLOCK at the same time, is that a problem? To be more specific, they are both updating the same records.

The answer is "it depends".

NOLOCK allows 'dirty reads'. This means that within a transaction, you may see uncommitted data from another transaction. If you have multiple threads updating the same row in the same table, you may see the modified value of data that the other thread touched, before that thread commits it's transaction.

For example, take the table account_balances, defined as (account_num int, balance decimal(12,2)). Let's assume the following happens:

// precondition, account #1 has a balance of 10.00

  1. Thread #1 starts a transaction, decrements account #1 by 10
  2. Thread #2 starts a transaction, attempts to read the balance of account #1. It reads a balance of 0.
  3. Thread #2 decrements the account by $5, and issues an overdraft to the customer (their balance is -5)
  4. Thread #1 rolls back it's transaction
  5. Thread #2 commits it's transaction

// The account balance is now -5, even though it should be 5.

What you won't see is some form of inconsistent data within a field- the nolock hint isn't like running multi-threaded code without a lock- individual writes are still atomic.

How should I perform automated testing of SQL ETL Scripts & SPROCs?

5 votes

We have a number of DTSX Packages that Extract-Transform-Load data from a number of legacy Pervasive Databases -> SQL Server 2k8 R2. (via ODBC)

The majority of our ETL process T-SQL scripts

The legacy data is a badly structured mess & most often, the tweaks & changes we make to our ETL scripts are to deal with this malformed data.

What we'd like to do is put some process in place to automate testing of these SPROCS / Scripts that perform these transformations.

What we've come up with ourselves is to:

  • Have some known datasource input (hosted in a local pervasive database)
  • Run the DTSX Packages to generate the output
  • Have some expected & pre-created output somewhere else within SQL Server.
  • Write our own verification scripts to compare the expected with the actual.

Just wondering if there are any other processes/systems/db-testing applications we should be considering also.

Have a look at tSQLt - the Database Unit Testing Framework for SQL Server. It makes writing robust tests fairly easy.

It does not (yet) allow for testing of scripts, but you could wrap those into stored procedures.

Calling user defined functions with Create Table

5 votes

I have the following function

CREATE FUNCTION GetIdentity() RETURNS INT AS
BEGIN
  RETURN (IDENT_CURRENT('tblTempPo'))
END
GO

I need to call it with create table

create table tblTempPo
(
ID int null,
 BrickVolume AS
              (
              GetIdentity() 
              )
)

I'm getting the error

'GetIdentity' is not a recognized built-in function name.

How can I solve this?

You need to add dbo (or whatever the schema name is) to properly call the function:

create table tblTempPo
(
    ID int null,
    BrickVolume AS(dbo.GetIdentity())
)

Although, for your example to work, you'd want to do something like this:

CREATE TABLE tblTempPo
(
    ID INT IDENTITY(1,1) NOT NULL,
    AnotherField VARCHAR(10),
    BrickVolume AS (dbo.GetIdentity())
)
GO

INSERT INTO tblTempPo VALUES('a')
INSERT INTO tblTempPo VALUES('b')
INSERT INTO tblTempPo VALUES('c')
SELECT * FROM tblTempPo

The SELECT statement will yield the results:

ID     AnotherField     BrickVolume
-----------------------------------
 1          a               3
 2          b               3
 3          c               3