Best database questions in November 2011

"where 1=1" statement

74 votes

Possible Duplicate:
Why would someone use WHERE 1=1 AND <conditions> in a SQL clause?

I saw some people use a statement to query a table in a MySQL database like the following:

select * from car_table where 1=1 and value="TOYOTA"

But what does 1=1 mean here?

It's usually when folks build up SQL statements.

When you add and value = "Toyota" you don't have to worry about whether there is a condition before or just WHERE. The optimiser should ignore it

No magic, just practical


Example Code:

commandText = "select * from car_table where 1=1";

if (modelYear <> 0)     commandText += " and year="+modelYear
if (manufacturer <> "") commandText += " and value="+QuotedStr(manufacturer)
if (color <> "")        commandText += " and color="+QuotedStr(color)
if (california)         commandText += " and hasCatalytic=1"

Otherwise you would have to have a complicated set of logic:

commandText = "select * from car_table"
whereClause = "";
if (modelYear <> 0)
{
   if (whereClause <> "") 
      whereClause = whereClause + " and ";
   commandText += "year="+modelYear;
}
if (manufacturer <> "")
{    
   if (whereClause <> "") 
      whereClause = whereClause + " and ";
   commandText += "value="+QuotedStr(manufacturer)
}
if (color <> "")
{
   if (whereClause <> "") 
      whereClause = whereClause + " and ";
   commandText += "color="+QuotedStr(color)
}
if (california)
{
   if (whereClause <> "") 
      whereClause = whereClause + " and ";
   commandText += "hasCatalytic=1"
}

if (whereClause <> "")
   commandText = commandText + "WHERE "+whereClause;

Using STM and Database transactions together

7 votes

I have been using Haskell's STM library and I really like the ability to compose transactions and the general "you-can't-get-this-wrong" nature of STM.

For good reason, STM does not allow IO actions within a transaction. There is no way to retry an IO action. (insert Launch missiles reference here). Database transactions on the other hand do have some atomicity guarantees that are very similar. Is there an accepted way to use the two together?

Interleaving different sorts of transactions into a single transaction notion is called "transactional boosting" and there isn't a great way to do it in Haskell's STM at the moment. There is, however, a way to build up actions to be performed only on commit or only on retry: http://hackage.haskell.org/package/stm-io-hooks

Additionally, you could try the twilight-stm project, which provides an an explicit "twilight" between transactions ending and actual commit. As far as I know, the code provided is more of a reference implementation than one tuned for performance, however: http://proglang.informatik.uni-freiburg.de/projects/twilight/

Nonetheless, depending on your application, it may turn out to be plenty-fast for your purposes.

Best way to manage thread pool against database queue

6 votes

I have a data table full of summary entries and my software needs to go through and reach out to a web service to get details, then record those details back to the database. Looping through the table synchronously while calling the web service and waiting for the response is too slow (there are thousands of entries) so I'd like to take the results (10 or so at a time) and thread it out so it performs 10 operations at the same time.

My experience with C# threads is limited to say the least, so what's the best approach? Does .NET have some sort of threadsafe queue system that I can use to make sure that the results get handled properly and in order?

Depending on which version of the .NET Framework you have two pretty good options.

You can use ThreadPool.QueueUserWorkItem in any version.

int pending = table.Rows.Count;
var finished = new ManualResetEvent(false);
foreach (DataRow row in table.Rows)
{
  DataRow capture = row; // Required to close over the loop variable correctly.
  ThreadPool.QueueUserWorkItem(
    (state) =>
    {
      try
      {
        ProcessDataRow(capture);
      }
      finally
      {
         if (Interlocked.Decrement(ref pending) == 0) 
         {
           finished.Set();  // Signal completion of all work items.
         }
      }
    }, null);
}
finished.WaitOne(); // Wait for all work items to complete.

If you are using .NET Framework 4.0 you can use the Task Parallel Library.

var tasks = new List<Task>();
foreach (DataRow row in table.Rows)
{
  DataRow capture = row; // Required to close over the loop variable correctly.
  tasks.Add(
    Task.Factory.StartNew(
      () =>
      {
        ProcessDataRow(capture);        
      }));
}
Task.WaitAll(tasks.ToArray()); // Wait for all work items to complete.

There are many other reasonable ways to do this. I highlight the patterns above because they are easy and work well. In the absence of specific details I cannot say for certain that either will be a perfect match for your situation, but they should be a good starting point.

Update:

I had a short period of subpar cerebral activity. If you have the TPL available you could also use Parallel.ForEach as a simpler method than all of that Task hocus-pocus I mentioned above.

Parallel.ForEach(table.Rows,
  (DataRow row) =>
  {
    ProcessDataRow(row);
  });

Why is mySQL query, left join 'considerably' faster than my inner join

5 votes

I've researched this, but I still cannot explain why:

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155

Is significantly slower than:

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155

115ms Vs 478ms. They are both using InnoDB and there are relationships defined. The 'card_legality' contains approx 200k rows, while the 'legality' table contains 11 rows. Here is the structure for each:

CREATE TABLE `card_legality` (
  `card_id` varchar(8) NOT NULL DEFAULT '',
  `legality_id` int(3) NOT NULL,
  `cl_boolean` tinyint(1) NOT NULL,
  PRIMARY KEY (`card_id`,`legality_id`),
  KEY `legality_id` (`legality_id`),
  CONSTRAINT `card_legality_ibfk_2` FOREIGN KEY (`legality_id`) REFERENCES `legality` (`legality_id`),
  CONSTRAINT `card_legality_ibfk_1` FOREIGN KEY (`card_id`) REFERENCES `card` (`card_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

And:

CREATE TABLE `legality` (
  `legality_id` int(3) NOT NULL AUTO_INCREMENT,
  `l_name` varchar(16) NOT NULL DEFAULT '',
  PRIMARY KEY (`legality_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;

I could simply use LEFT-JOIN, but it doesn't seem quite right... any thoughts, please?

UPDATE: As requested, I've included the results of explain for each. I had run it previously, but I dont pretend to have a thorough understanding of it..

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  cl  ALL PRIMARY NULL    NULL    NULL    199747  Using where
1   SIMPLE  l   eq_ref  PRIMARY PRIMARY 4   hexproof.co.uk.cl.legality_id   1   

AND, inner join:

id  select_type table   type    possible_keys   key key_len         ref                         rows    Extra
1   SIMPLE  l   ALL PRIMARY NULL    NULL    NULL    11  
1   SIMPLE  cl  ref PRIMARY,legality_id legality_id 4   hexproof.co.uk.l.legality_id    33799   Using where

It is because of the varchar on card_id. MySQL can't use the index on card_id as card_id as described here mysql type conversion. The important part is

For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:

SELECT * FROM tbl_name WHERE str_col=1;

The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'.

If you change your queries to

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'

and

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'

You should see a huge improvement in speed and also see a different EXPLAIN.

Here is a similar (but easier) test to show this:

> desc id_test;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | varchar(8) | NO   | PRI | NULL    |       |
+-------+------------+------+-----+---------+-------+
1 row in set (0.17 sec)

> select * from id_test;
+----+
| id |
+----+
| 1  |
| 2  |
| 3  |
| 4  |
| 5  |
| 6  |
| 7  |
| 8  |
| 9  |
+----+
9 rows in set (0.00 sec)

> explain select * from id_test where id = 1;
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | id_test | index | PRIMARY       | PRIMARY | 10      | NULL |    9 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)


> explain select * from id_test where id = '1';
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | id_test | const | PRIMARY       | PRIMARY | 10      | const |    1 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

In the first case there is Using where; Using index and the second is Using index. Also ref is either NULL or CONST. Needless to say, the second one is better.

Order of condition execution in MySQL

5 votes

Suppose I have a MySQL query with two conditions:

SELECT * FROM `table` WHERE `field_1` = 1 AND `field_2` LIKE '%term%';

The first condition is obviously going to be a lot cheaper than the second, so I'd like to be sure that it runs first, limiting the pool of rows which will be compared with the LIKE clause. Do MySQL query conditions run in the order they're listed or, if not, is there a way to specify order?

MySQL has an internal query optimizer that takes care of such things in most cases. So, typically, you don't need to worry about it.

But, of course, the query optimizer is not foolproof. So...

Sorry to do this to you, but you'll want to get familiar with EXPLAIN if you suspect that a query may be running less efficiently than it should.

http://dev.mysql.com/doc/refman/5.0/en/explain.html

New Connection fails during Add New Data Source Dialog

4 votes

The Problem

If I have Visual Studio 2010, Service Pack 1 installed, and I go to Data, Add New Data Source and click Database, it brings up the standard Choose Your Data Connection wizard page.

On that page, if I have an existing connection configured, I can then select it, and add that data source to my project, no problems, no issues.

On the other hand, if I click New Connection... it just bombs out, wizard closes, as if I had done nothing. Now keep in mind, Visual Studio doesn't close, and there are no exceptions, errors, or warnings, the dialog window just disappears.

Unsuccessful Attempts to Resolve Issue

I've checked the simple things, using a debugger, and a valid machine config file, not sure what else I might be missing.

  1. Using a Debugger

    I tried starting a 2nd copy of Visual Studio as I did this, and attaching a debugger to the process, with all exceptions checked in my debug menu, and still nothing.

  2. Validating machine.config file is valid

    I backed up my machine.config file, and used another person's working machine's machine.config file, and I'm still having issues.

    I have since reverted back to my own machine.config file.

    I suspect the machine.config might be the issue, but I thought taking a working one should work, unless I'm missing one of the registered providers? I don't know enough about how this wizard works behind the scenes though, or what sections/lines of the machine.config file exactly drive the wizard.

  3. Uninstalling all third-party database providers

    I have recently tried uninstalling Oracle, SQLite, beta versions of SQL Server Compact Edition, and MySQL providers in hopes that one of them set a registry entry or configuration entry that was negatively impacting my ability to create new connections and the error still occurs.

  4. Reinstalling Visual Studio

    I reinstalled Visual Studio, complete uninstall, and reinstall, no extensions, or nothing, and the issue exists, definitely convinced its a configuration error at this point, I could use the expert opinion of someone who knows more about the configuration behind the scenes for this dialog though.

Any help in resolving the issue would be greatly appreciated.

  1. Try using Server Explorer window to add your Connection.

  2. Try defining your Connection String from Settings.settings file

  3. Try moving/deleting all the *.datasource files from the project/solution before adding the connection

  4. Try creating a new, empty solution and see if the problem is solution-indepentent

  5. Try finding out what registry keys and files Visual Studio uses when the issue occurs with Process Monitor from Sysinternals

How to find out if store open or close - dealing with hours?

4 votes

What is the best way to store shop opening and closing time in the database and also how to calculate the time in PHP?

I have come up with this table design:

+----+---------+----------+-----------+------------+
| id | shop_id | week_day | open_hour | close_hour |
+----+---------+----------+-----------+------------+
|  1 |       3 |        1 | 15:00:00  | 23:00:00   |
|  2 |       3 |        2 | 15:00:00  | 23:00:00   |
|  3 |       3 |        3 | 18:00:00  | 02:00:00   |
|  4 |       3 |        4 | 18:00:00  | 02:00:00   |
|  5 |       3 |        5 | 18:00:00  | 03:00:00   |
+----+---------+----------+-----------+------------+

+------------+---------+------+-----+---------+----------------+
| Field      | Type    | Null | Key | Default | Extra          |
+------------+---------+------+-----+---------+----------------+
| id         | int(11) | NO   | PRI | NULL    | auto_increment |
| shop_id    | int(11) | NO   |     | NULL    |                |
| week_day   | int(11) | NO   |     | NULL    |                |
| open_hour  | time    | NO   |     | NULL    |                |
| close_hour | time    | NO   |     | NULL    |                |
+------------+---------+------+-----+---------+----------------+

For example, on Tuesday (week_day = 2) it open at 3PM and close at 11PM (Tuesday).

On Wednesday (`week_day = 2'), it open at 6PM and close after midnight at 2AM which would be Thursday. How should midnight (00:00:00 or after) closing time should be stored in the database?

Let say customer want to place an order (shop_id = 3) at 10PM on Tuesday, they should be able to do so according to the database data. However if customer want to place an order at 1AM on Thursday but the database show that week_day = 3 it close at 02:00:00

How to write in PHP to work out if the shop open or not? it seem complicated!

Do I need to change the the table design so it would much easier to write in PHP?

You can consider making the table information truly correct

+----+---------+----------+-----------+------------+
| id | shop_id | week_day | open_hour | close_hour |
+----+---------+----------+-----------+------------+
|  1 |       3 |        1 | 15:00:00  | 23:00:00   |
|  2 |       3 |        2 | 15:00:00  | 23:00:00   |
|  3 |       3 |        3 | 18:00:00  | 23:59:59   |
|  4 |       3 |        4 | 00:00:00  | 02:00:00   |
|  5 |       3 |        4 | 18:00:00  | 23:59:59   |
|  6 |       3 |        5 | 00:00:00  | 02:00:00   |
|  7 |       3 |        5 | 18:00:00  | 23:59:59   |
|  8 |       3 |        6 | 00:00:00  | 03:00:00   |
+----+---------+----------+-----------+------------+

Then use the following kind of (this query is for Tuesday 10:00PM as you mentioned):

SELECT count(*) FROM `shop` 
 WHERE week_day=3 
 and open_hour<='22:00:00' 
 and close_hour>='22:00:00'

Performance difference between VARCHAR2 to NUMBER

4 votes

If I query my DataBase (Oracle) with unique Index on multiple columns, Will there be any performance difference if I change one of the columns from VARCHAR2 to NUMBER?
If there is, is it significant?

(It's varchar2 because I need '0' at the beginning but I can change it in the presentation layer in my app)

Yes, it should be quicker to use NUMBER. Whether it gives you a significant increase will depend on your data, indexes and queries. If you're having performance problems, this is unlikely to be the magic fix.