Best operators questions in December 2010

Why does >= return false when == returns true for null values?

50 votes

I have two variables of type int? (or Nullable<int> if you will). I wanted to do a greater-than-or-equal (>=) comparison on the two variables but as it turns out, this returns false if both variables are null, while obviously the == operator returns true.

Can someone explain to me why that is logical because the semantical definition of the >= operator contains the word "or"?

There was a huge debate about this oddity when the feature was originally designed back in C# 2.0. The problem is that C# users are completely used to this being meaningful:

if(someReference == null)

When extending equality to nullable value types, you have the following choices.

1) Nullable equality is truly lifted. If one or both of the operands is null then the result is neither true, nor false, but null. In this case you can either:

1a) Make it illegal to have a nullable value type equality in an "if" statement, because the "if" statement needs a bool, not a nullable bool. Instead, require everyone to use "HasValue" if they want to compare to null. This is verbose and irritating.

1b) Automatically convert null to false. The downside of this is that "x==null" returns false if x is null, which is confusing and works against people's understanding of null comparisons with reference types.

2) Nullable equality is not lifted. Nullable equality is either true or false, and comparison to null is a null check. This makes nullable equality inconsistent with nullable inequality.

None of these choices is obviously correct; they all have pros and cons. VBScript chooses 1b, for example. After much debate the C# design team chose #2.

What does "<=>" in MySQL mean?

13 votes

What does <=> in MySQL mean and do?

The manual says it all:

NULL-safe equal. This operator performs an equality comparison like the = operator, but returns 1 rather than NULL if both operands are NULL, and 0 rather than NULL if one operand is NULL.

mysql> select NULL <=> NULL;
+---------------+
| NULL <=> NULL |
+---------------+
|             1 |
+---------------+
1 row in set (0.00 sec)

mysql> select NULL = NULL;
+-------------+
| NULL = NULL |
+-------------+
|        NULL |
+-------------+
1 row in set (0.00 sec)

mysql> select NULL <=> 1;
+------------+
| NULL <=> 1 |
+------------+
|          0 |
+------------+
1 row in set (0.00 sec)

mysql> select NULL = 1;
+----------+
| NULL = 1 |
+----------+
|     NULL |
+----------+
1 row in set (0.00 sec)

mysql> 

MySQL "IN" operator performance on (large?) amount of values

6 votes

Hi,

I have been experimenting with Redis and MongoDB lately and it would seem that there are often cases where you would store an array of id's in either MongoDB or Redis. I'll stick with Redis for this question since I am asking about the MySQL IN operator.

I was wondering how performant it is to do a large amount (300-3000) of id's inside the IN operator, which would look something like this:

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 3000)

Imagine something as simple as a products and categories table which you might normally JOIN together to get the products from a certain category. In the example above you can see that under a given category in Redis ( category:4:product_ids ) I return all the product ids from the category with id 4, and place them in the above SELECT query inside the IN operator.

How performant is this?

Is this an "it depends" situation? Or is there a concreet "this is (un)acceptable" or "fast" or "slow" or should I add a LIMIT 25, or doesn't that help?

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 3000)
LIMIT 25

Or should I trim the array of product id's returned by Redis to limit it to 25 and only add 25 id's to the query rather than 3000 and LIMIT-ing it to 25 from inside the query?

SELECT id, name, price
FROM products
WHERE id IN (1, 2, 3, 4, ...... 25)

Any suggestions/feedback is much appreciated!

Generally speaking, if the IN list gets too large (for some ill-defined value of 'too large' that is usually in the region of 100 or smaller), it becomes more efficient to use a join, creating a temporary table if need so be to hold the numbers.

If the numbers are a dense set (no gaps - which the sample data suggests), then you can do even better with WHERE id BETWEEN 300 AND 3000. However, presumably there are gaps in the set, at which point it may be better to go with the list of valid values after all (unless the gaps are relatively few in number, in which case you could use: WHERE id BETWEEN 300 AND 3000 AND id NOT BETWEEN 742 AND 836 or whatever the gaps are.