Best MySQL questions in June 2012

How to successfully rewrite old mysql-php code with deprecated mysql_* functions?

12 votes

I am still learning, mostly from books I buy, but today I learned that my book on PHP programming is out of date, even though I bought it this year. Now I know that the mysql_* functions in PHP are deprecated and should be replaced with PDO and its more secure and stable prepared statements. So I have set out to rewrite my whole website accordingly, and I may need some advice from you more experienced people on how to do it properly so that it works :)

I will start the rewrite with only the main part (connecting to the server and choosing a database); the rest I can do on my own with Google and the manuals. I will post my old script here and ask whether I am doing everything right and not missing anything. I hope this can also serve as a good guide or answer for other people. So let's start.

So in config I have something like this:

$db = new dbConn('127.0.0.1', 'root', 'pass', 'people', 'animals');

Which should be like this:

$db = new PDO('mysql:host=127.0.0.1;dbname=people;charset=utf8', 'root', 'pass');

Right? But if I need to choose the database later, should I leave out dbname=people;? And how do I choose the database later?

Here is the one and only script I need to rewrite. It is the basis of most web projects, and I hope it will help more people than just me understand how the new PDO system really works:

class dbConn
{
  public function __construct($server, $user, $pass, $db_people, $db_animals)
  {    
    if (!empty($server) && !empty($user) && !empty($pass) && !empty($db_people) && !empty($db_animals))
    {
      $this->server = $server;
      $this->user =  $user;
      $this->pass = $pass;
      $this->db_people = $db_people;  
      $this->db_animals = $db_animals;  
      $this->connect(); 
    }  
    else
    {
      die("Set up connection to db");
    }
  }

  public function connect()
  {
    $this->conn = mysql_connect($this->server, $this->user, $this->pass) or die ('cannot connect to MySQL');
  }

  public function selectDb($database)
  {
    switch($database)
    {
      case 'people':
        mysql_select_db($this->db_people, $this->conn) or die ('cannot connect to database '.$this->db_people.'.');
        mysql_query("SET NAMES 'utf8'");
        break;

      case 'animals':
        mysql_select_db($this->db_animals, $this->conn) or die ('cannot connect to database '.$this->db_animals.'.');
        mysql_query("SET NAMES 'utf8'"); 
    }
  }

  public function __destruct() 
  {
    if (!empty($this->conn))
    {
      mysql_close($this->conn); 
    }
  }  
}

From what I have read on Google and the wiki, methods like public function __construct() and public function __destruct() should no longer be needed, right? The same goes for public function connect(). So the only thing left is public function selectDb($database), but I have no idea how to do this correctly without breaking the whole database connection. In the rest of my code (not shown here) I can easily choose a database with $this->db->selectDb("people");, but with prepared statements I do not know whether this is even possible in an easy way. I hope your advice on this will help me and other users understand the new code better. Other parts of the code you may have are explained in this PDO Tutorial for MySQL Developers. Thank you.

The short and sweet answer: yes, they are not necessary any longer.

Let's review the code to make sure we have not lost anything:

  • __construct - The constructor merely held all the configuration. PDO has a much simpler concept here: a connection string (DSN) containing most of the information:

     mysql:host=127.0.0.1;dbname=people;charset=utf8
    

    PDO also provides its own ready-made constructor, so a second one is unnecessary.

  • connect - The connection function is not necessary any longer either. Instantiating PDO already opens the connection. You can catch the PDOException it throws when the connection fails; the PHP manual has an example on its constructor page.

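  • selectDb - This complicated function is not needed any longer either, because the database is chosen with dbname in the connection string. That is the third function we can simply drop. Much power with so few characters. Cheers! If you also need the animals database, open a second PDO connection whose DSN uses dbname=animals, or refer to its tables as animals.table_name over the same connection.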

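  • __destruct - The destructor. Let's be fair: the old mysql extension did not need this either, because the connection is closed when the script ends. With PDO, the connection is closed when the PDO object is destroyed, so we get this for free without writing a single line of code.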

Looks good! You have migrated from that obscure database class to PDO simply by removing outdated code. Congratulations:

$db = new PDO('mysql:host=127.0.0.1;dbname=people;charset=utf8', 'root', 'pass');

If you are now wondering what to do if you want your own database class: you can have one, because you can extend PDO (yes, that works!):

class DB extends PDO
{
    // ... my super-new-shiny code goes here
}

Why might you want to do that? I have no idea, but maybe it makes your code more fluent. If you're looking for a better code example, I have one at PHP/MySQL Table with Hyperlinks.

Re-indexing huge database (the English Wikipedia) efficiently

11 votes

THE GIST

Before performing a massive 40+ GB import of the English Wikipedia, I had to temporarily remove indexes and auto-increment fields from three tables ('page', 'revision', and 'text') to handle the load. Now I have finally successfully imported the English Wikipedia to my local machine and created a local mirror (MediaWiki API). Yay!

However, I now need to re-create the indexes and auto-increment fields in less than a decade. Luckily, (1) I took plenty of screen-shots of the relevant tables in phpmyadmin before I removed the indexes and fields; (2) I can explain with extreme precision the steps I took before the import; and (3) this shouldn't be too difficult for anyone fluent in MySQL. Unfortunately, I have no expertise in MySQL whatsoever, so "baby steps" explanations would be extremely helpful.

PRECISELY WHAT I DID (PREPARING FOR THE IMPORT):

Steps 1, 2, 3: This image depicts the table page before I modified the field page_id by clicking 'Change' and un-checking 'Auto-Increment' (in preparation for importing). I performed precisely the same modifications for the field rev_id in table revision and old_id in table text but omitted the screen-shots to avoid redundancy.

table 'page' before modification of 'page_id'

Step 4: This image depicts the indexes for the table page before I dropped all of them.

indexes for table 'page' before I dropped them

Step 5: This image depicts the indexes for the table revision before I dropped all of them.

indexes for table 'revision' before I dropped them

Step 6: This image depicts the indexes for the table text before I dropped all of them.

indexes for table 'text' before I dropped them

WHAT I NEED NOW (RESTORING AFTER THE IMPORT):

I just need to restore the original indexes and auto-increment fields without waiting a hundred years.

Set-up details: PHP 5.3.8 (apache2handler), MySQL 5.5.16 (InnoDB), Apache 2.2.21, Ubuntu 12.04 LTS, MediaWiki 1.19.0 (private wiki)

I really like Wikipedia so I'll try to help.

You need to use a lot of ALTER TABLE statements.

Add primary keys

ALTER TABLE page ADD PRIMARY KEY (page_id);
ALTER TABLE revision ADD PRIMARY KEY (rev_id);
ALTER TABLE text ADD PRIMARY KEY (old_id);

Add auto increments back

ALTER TABLE page MODIFY COLUMN page_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT;

I need the table descriptions for all tables before continuing. If rev_id and old_id have the same definition as page_id, then:

ALTER TABLE revision MODIFY COLUMN rev_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE text MODIFY COLUMN old_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT;

Add unique keys

ALTER TABLE page ADD UNIQUE name_title(page_namespace, page_title);
ALTER TABLE revision ADD UNIQUE rev_page_id(rev_page, rev_id);

Other indexes

ALTER TABLE page ADD INDEX page_random(page_random);
ALTER TABLE page ADD INDEX page_len(page_len);
ALTER TABLE page ADD INDEX page_redirect_namespace(page_is_redirect, page_namespace, page_len);
ALTER TABLE revision ADD INDEX rev_timestamp(rev_timestamp);
ALTER TABLE revision ADD INDEX page_timestamp(rev_page, rev_timestamp);
ALTER TABLE revision ADD INDEX user_timestamp(rev_user, rev_timestamp);
ALTER TABLE revision ADD INDEX user_text_timestamp(rev_user_text, rev_timestamp);

Again, your actual column definitions may require changes to these statements. You need to provide the CREATE TABLE info (the output of SHOW CREATE TABLE for each table).
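Since the goal is speed, one possible refinement: as far as I know, in MySQL 5.5 adding a primary key or changing a column definition on an InnoDB table copies the whole table, and a single ALTER TABLE statement may carry several comma-separated clauses. Grouping the clauses per table should therefore mean each 40+ GB table is rebuilt once rather than once per statement. The hypothetical helper below only prints SQL (it does not connect to MySQL); the clauses are copied from the answer above and still need checking against your CREATE TABLE output.

```python
# Hypothetical helper (not part of MediaWiki or MySQL tooling): groups the
# ALTER TABLE clauses from the answer by table, producing one statement per table.

# (table, clause) pairs taken from the answer; verify them against
# SHOW CREATE TABLE before running the generated SQL.
CLAUSES = [
    ("page", "ADD PRIMARY KEY (page_id)"),
    ("page", "MODIFY COLUMN page_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT"),
    ("page", "ADD UNIQUE name_title (page_namespace, page_title)"),
    ("page", "ADD INDEX page_random (page_random)"),
    ("page", "ADD INDEX page_len (page_len)"),
    ("page", "ADD INDEX page_redirect_namespace (page_is_redirect, page_namespace, page_len)"),
    ("revision", "ADD PRIMARY KEY (rev_id)"),
    ("revision", "MODIFY COLUMN rev_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT"),
    ("revision", "ADD UNIQUE rev_page_id (rev_page, rev_id)"),
    ("revision", "ADD INDEX rev_timestamp (rev_timestamp)"),
    ("revision", "ADD INDEX page_timestamp (rev_page, rev_timestamp)"),
    ("revision", "ADD INDEX user_timestamp (rev_user, rev_timestamp)"),
    ("revision", "ADD INDEX user_text_timestamp (rev_user_text, rev_timestamp)"),
    ("text", "ADD PRIMARY KEY (old_id)"),
    ("text", "MODIFY COLUMN old_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT"),
]


def combine(clauses):
    """Return one ALTER TABLE statement per table, keeping clause order."""
    grouped = {}  # dicts preserve insertion order (Python 3.7+)
    for table, clause in clauses:
        grouped.setdefault(table, []).append(clause)
    return [
        "ALTER TABLE {}\n  {};".format(table, ",\n  ".join(parts))
        for table, parts in grouped.items()
    ]


if __name__ == "__main__":
    for statement in combine(CLAUSES):
        print(statement)
        print()
```

ADD PRIMARY KEY and MODIFY ... AUTO_INCREMENT can share one statement because the auto-increment column is indexed by the primary key added in the same ALTER TABLE.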

SQL selecting people you may know

11 votes

The question you're asking appears subjective and is likely to be closed.

I wasn't surprised when I saw the above horrible warning while I was filling in the title field.

I've read almost every thread about friends of friends or mutual friends, but I'm not sure I found the solution I need.

I'm sorry; I'm not good at either English or SQL.

How can I find the right answer when I'm not good at either language?

I decided I have to ask. I won't let down-votes or duplicate warnings discourage me.

Since I want an answer, I'll write this down as carefully as possible so that it can also help with similar problems in the future.

I have a table for friend relations.

FRIEND (TABLE)
-----------------------------------
PLAYER_ID(PK,FK)   FRIEND_ID(PK,FK)
-----------------------------------
1                  2                 // 1 knows 2
2                  1                 // 2 knows 1
1                  3                 // 1 knows 3
2                  3                 // 2 knows 3
2                  4                 // 2 knows 4
2                  5                 // 2 knows 5 // updated
3                  5                 // 3 knows 5 // updated
1                  100
1                  200
1                  300
100                400
200                400
300                400

Both columns of the composite primary key are also foreign keys referencing the PLAYER table.

I previously asked about "people who know each other" and got answers from some very nice people:

SQL view for acquaintance from table.

And I have a view like this.

ACQUAINTANCE (VIEW)
-----------------------------------
PLAYER_ID(PK,FK)   FRIEND_ID(PK,FK)
-----------------------------------
1                  2                 // 1 knows 2
2                  1                 // 2 knows 1

As you may have noticed, the business logic of these relationships has the following two purposes.

  1. One player can say he or she knows someone else.
  2. When both people say they know each other, they are considered acquaintances.

Now I want to know whether there is a good way of

  1. Selecting other PLAYER_IDs
  2. for a given PLAYER (PLAYER_ID), say 1,
  3. each of which is one of the 'friends of the given PLAYER's direct friends',
  4. each of which is not the PLAYER himself (excluding 1 -> 2 -> 1),
  5. each of which is not one of the PLAYER's direct friends (excluding 3 from 1 -> 2 -> 3, because of 1 -> 3),
  6. ordered by the number of mutual friends, if possible.

I think Justin Niessner's answer to "people you may know" sql query is the closest to the path I must follow.

Thanks in advance.

I'll close the thread if this subject really is a duplicate and not needed.

UPDATE --------------------------------------------------------------

In reply to Raphaël Althaus's comment (whose name is the same as my future daughter's; is it a boy's name?):

3 is a candidate for friends of friends of 1 because

1 knows 2
2 knows 3

but excluded because

1 already knows 3

Basically, for the given player I want to suggest

people he or she may know,
who are not himself or herself (this is obvious),
and who are not already known to him or her.

With the above table,

via 1 -> 2 -> 4 and 1 -> 3 -> 5,

4 and 5 can be suggested to 1 as 'people you may know'.

Ordering by the number of mutual friends would be perfect,
but I don't think I could understand it even if someone showed me how. Sorry.

Thank you.

UPDATE ---------------------------------------------------------------------

I think I should try it step by step myself, using what I've learned here from various people, even if it's not the right answer. Please let me know if I'm doing anything wrong.

First, let me self-join the FRIEND table.

SELECT *
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID

prints

+-----------+-----------+-----------+-----------+
| PLAYER_ID | FRIEND_ID | PLAYER_ID | FRIEND_ID |
+-----------+-----------+-----------+-----------+
|         1 |         2 |         2 |         1 |
|         1 |         2 |         2 |         3 |
|         1 |         2 |         2 |         4 |
|         1 |         2 |         2 |         5 |
|         1 |         3 |         3 |         5 |
|         2 |         1 |         1 |         2 |
|         2 |         1 |         1 |         3 |
|         2 |         3 |         3 |         5 |
+-----------+-----------+-----------+-----------+

F2.FRIEND_ID only

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID

prints

+-----------+
| FRIEND_ID |
+-----------+
|         1 |
|         3 |
|         4 |
|         5 |
|         5 |
|         2 |
|         3 |
|         5 |
+-----------+

for 1 only

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1;

prints

+-----------+
| FRIEND_ID |
+-----------+
|         1 |
|         3 |
|         4 |
|         5 |
|         5 |
+-----------+

not 1

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1 
AND F2.FRIEND_ID != 1;

prints

+-----------+
| FRIEND_ID |
+-----------+
|         3 |
|         4 |
|         5 |
|         5 |
+-----------+

not 1's direct friends

SELECT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1);

prints

+-----------+
| FRIEND_ID |
+-----------+
|         4 |
|         5 |
|         5 |
+-----------+

I think I'm getting there.

UPDATE -----------------------------------------------------------------

The following paths were added:

1 -> 100 -> 400
1 -> 200 -> 400
1 -> 300 -> 400

Running the last query again prints:

+-----------+
| FRIEND_ID |
+-----------+
|         4 |
|         5 |
|         5 |
|       400 |
|       400 |
|       400 |
+-----------+

At last, I have the candidates: 4, 5, 400.

Adding DISTINCT works for the primary goal:

SELECT DISTINCT F2.FRIEND_ID
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1);

prints

+-----------+
| FRIEND_ID |
+-----------+
|         4 |
|         5 |
|       400 |
+-----------+

Now I need to order by the number of mutual friends.

Here is the number of mutual friends for each candidate:

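+-----------+--------+---------------------------------------------------+
| FRIEND_ID | MUTUAL | PATHS                                             |
+-----------+--------+---------------------------------------------------+
|         4 |      1 | 1 -> 2 -> 4                                       |
|         5 |      2 | 1 -> 2 -> 5, 1 -> 3 -> 5                          |
|       400 |      3 | 1 -> 100 -> 400, 1 -> 200 -> 400, 1 -> 300 -> 400 |
+-----------+--------+---------------------------------------------------+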

How can I calculate and order by those number of mutual friends?

SELECT F2.FRIEND_ID, COUNT(*)
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1)
GROUP BY F2.FRIEND_ID;

prints

+-----------+----------+
| FRIEND_ID | COUNT(*) |
+-----------+----------+
|         4 |        1 |
|         5 |        2 |
|       400 |        3 |
+-----------+----------+

I got it!

SELECT F2.FRIEND_ID, COUNT(*) AS MFC
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = 1
AND F2.FRIEND_ID != 1
AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = 1)
GROUP BY F2.FRIEND_ID
ORDER BY MFC DESC;

prints

+-----------+-----+
| FRIEND_ID | MFC |
+-----------+-----+
|       400 |   3 |
|         5 |   2 |
|         4 |   1 |
+-----------+-----+

Can anybody please confirm this? Is that query optimal? Are there any possible performance problems if I make it a view?

Thank you.

UPDATE --------------------------------------------------------------------------------------------

I created a view:

CREATE VIEW FOLLOWABLE AS
    SELECT F1.PlAYER_ID, F2.FRIEND_ID AS FOLLOWABLE_ID, COUNT(*) AS MFC
    FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
    WHERE F2.FRIEND_ID != F1.PLAYER_ID
    AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = F1.PLAYER_ID)
    GROUP BY F1.PLAYER_ID, F2.FRIEND_ID
    ORDER BY MFC DESC;

and tested.

mysql> select * from FOLLOWABLE;
+-----------+---------------+-----+
| PlAYER_ID | FOLLOWABLE_ID | MFC |
+-----------+---------------+-----+
|         1 |           400 |   3 |
|         1 |             5 |   2 |
|         2 |           100 |   1 |
|         2 |           200 |   1 |
|         2 |           300 |   1 |
|         1 |             4 |   1 |
+-----------+---------------+-----+
6 rows in set (0.01 sec)

mysql> select * from FOLLOWABLE WHERE PLAYER_ID = 1;
+-----------+---------------+-----+
| PlAYER_ID | FOLLOWABLE_ID | MFC |
+-----------+---------------+-----+
|         1 |           400 |   3 |
|         1 |             5 |   2 |
|         1 |             4 |   1 |
+-----------+---------------+-----+
3 rows in set (0.00 sec)

EDIT: use this:

SELECT `friend_id` AS `possible_friend_id`
FROM `friends`
WHERE `player_id` IN (         -- selecting those who are known
    SELECT `friend_id`         -- by friends of #1
    FROM `friends`
    WHERE `player_id` = 1)
AND `friend_id` NOT IN (       -- but not those who are known by #1
    SELECT `friend_id`
    FROM `friends`
    WHERE `player_id` = 1)
AND NOT `friend_id` = 1        -- and are not #1 himself
                               -- if one is known by multiple people
                               -- he'll be in the list multiple times
GROUP BY `possible_friend_id`  -- so we group
ORDER BY COUNT(*) DESC         -- and order by the number of repetitions
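Both the asker's self-join and the answer's IN-subquery version can be checked on the question's sample data. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the table is named FRIEND as in the question, the answer's query is lightly adapted (!= instead of NOT ... =), and the hard-coded 1s become ? parameters. It is a correctness check only, not a performance test.

```python
import sqlite3

# Sample data from the question's FRIEND table, after all updates.
ROWS = [
    (1, 2), (2, 1), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5),
    (1, 100), (1, 200), (1, 300), (100, 400), (200, 400), (300, 400),
]

# The asker's self-join version.
JOIN_QUERY = """
SELECT F2.FRIEND_ID, COUNT(*) AS MFC
FROM FRIEND F1 INNER JOIN FRIEND F2 ON F1.FRIEND_ID = F2.PLAYER_ID
WHERE F1.PLAYER_ID = ?
  AND F2.FRIEND_ID != ?
  AND F2.FRIEND_ID NOT IN (SELECT FRIEND_ID FROM FRIEND WHERE PLAYER_ID = ?)
GROUP BY F2.FRIEND_ID
ORDER BY MFC DESC
"""

# The answer's IN-subquery version, adapted to the FRIEND table.
IN_QUERY = """
SELECT friend_id, COUNT(*) AS mfc
FROM FRIEND
WHERE player_id IN (SELECT friend_id FROM FRIEND WHERE player_id = ?)
  AND friend_id NOT IN (SELECT friend_id FROM FRIEND WHERE player_id = ?)
  AND friend_id != ?
GROUP BY friend_id
ORDER BY mfc DESC
"""


def suggestions(conn, query, player_id):
    """Return (candidate_id, mutual_friend_count) rows for one player."""
    return conn.execute(query, (player_id, player_id, player_id)).fetchall()


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FRIEND (PLAYER_ID INTEGER, FRIEND_ID INTEGER, "
        "PRIMARY KEY (PLAYER_ID, FRIEND_ID))"
    )
    conn.executemany("INSERT INTO FRIEND VALUES (?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    conn = make_db()
    print(suggestions(conn, JOIN_QUERY, 1))
    print(suggestions(conn, IN_QUERY, 1))
```

On this data both queries return 400, 5 and 4 with 3, 2 and 1 mutual friends, matching the asker's result.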

MySQL PDO - What should be inside the try { block }?

10 votes

So I'm working on learning PDO, and making the transfer from the standard PHP MySQL functions. However, I have a question. Regarding the try {} blocks, what exactly should be in them, and what should go outside it?

Should everything that uses $sth-> ... be inside try {}? Should it just be from when the statement is first prepared all the way to when it is executed? Even less than that?

Any help would be greatly appreciated. :)

Here is an example method I have in a class. Is it organized properly? Notice how I put everything inside try {}. Is that wrong? It feels incorrect to me, but I'm not sure how I should change it.

protected function authorized()
{
    try
    {
        // Attempt to grab the user from the database.
        $sth = $dbh->prepare("
            SELECT COUNT(*) AS num_rows
            FROM users
            WHERE user_id = :user_id
            ");

        $sth->bindParam(':user_id', $this->user_id);
        $sth->execute();

        // Check if user exists in database.
        if ($sth->fetch()->num_rows > 0)
        {
            // User exists in database, and is therefore valid.
            return TRUE;
        }
        else
        {
            // User does not exist in database, and is therefore invalid.
            return FALSE;
        }
    }
    catch (PDOException $e)
    {
        pdo_error($e);
    }
}

The try catch should be outside the function.

<?php

protected function authorized() {
    // Attempt to grab the user from the database.
    // Assumes the PDO handle is stored in $this->dbh.
    $sth = $this->dbh->prepare("
            SELECT COUNT(*) AS num_rows
            FROM users
            WHERE user_id = :user_id
            ");

    $sth->bindParam(':user_id', $this->user_id);
    $sth->execute();

    // Check if user exists in database.
    if ($sth->fetch(PDO::FETCH_OBJ)->num_rows > 0) {
        // User exists in database, and is therefore valid.
        return TRUE;
    }
    else {
        // User does not exist in database, and is therefore invalid.
        return FALSE;
    }
}

...

try {
    authorized();
}
catch (PDOException $e) {
    pdo_error($e);
}

Don't handle exceptions inside of the methods. You try the method and catch the resulting exception if it happens.
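The same structure carries over to other database APIs. Below is a minimal Python analogue using the standard sqlite3 module, where sqlite3.Error plays the role of PDOException; the function names mirror the PHP example but are otherwise hypothetical. The query function contains no try block, and the caller wraps the call.

```python
import sqlite3


def authorized(conn, user_id):
    """Return True if the user exists.

    Database errors are NOT caught here; they propagate to the caller."""
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] > 0


def check_user(conn, user_id):
    """Caller-side handling: one try/except around the call, not inside it."""
    try:
        return authorized(conn, user_id)
    except sqlite3.Error as exc:
        # Stand-in for the question's pdo_error($e): report the failure.
        print("database error:", exc)
        return None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO users VALUES (42)")
    print(check_user(conn, 42))   # True
    print(check_user(conn, 7))    # False
    broken = sqlite3.connect(":memory:")  # no users table here
    print(check_user(broken, 42))  # error reported, then None
```

Keeping the handler at the call site lets each caller decide what a failure means (error page, retry, log), while authorized() stays a plain yes/no query.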

Two duplicate delete queries using primary key causing deadlock

9 votes

I don't understand how two duplicate queries that each delete a single row against a single table using the primary key could have deadlocked. Can anyone explain?

It seems to me like one of the transactions should have gotten the lock and the other one would have to wait.

Here's the deadlock report, with the queries:

Fri Jun 01 2012 13:50:23
*** (1) TRANSACTION:
TRANSACTION 3 1439005348, ACTIVE 0 sec, process no 22419, OS thread id 1166235968 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368
MySQL thread id 125597624, query id 3426379709 node3-int 10.5.1.119 application-devel updating
DELETE FROM `SessData` WHERE `SessKey` = '87EDF1479A275557AC8280DCA78AB886'
AND `Name` = 'CurrentRequestURL'

*** (2) TRANSACTION:
TRANSACTION 3 1439005340, ACTIVE 0 sec, process no 22419, OS thread id 1234073920 starting index read, thread declared inside InnoDB 0
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1216
MySQL thread id 125597622, query id 3426379705 node2-int 10.5.1.118 application-devel updating
DELETE FROM `SessData` WHERE `SessKey` = '87EDF1479A275557AC8280DCA78AB886'
AND `Name` = 'CurrentRequestURL'

*** WE ROLL BACK TRANSACTION (2)

Here's the schema for the table:

CREATE TABLE  `application`.`SessData` (
  `SessKey` varchar(255) NOT NULL default '',
  `Name` varchar(255) NOT NULL default '',
  `Value` varchar(255) default NULL,
  PRIMARY KEY  (`SessKey`,`Name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

A few other details:

MySQL version: 4.1.21
Isolation level: REPEATABLE-READ
Character set on the above columns: latin1

You are using MySQL version 4.1.21. 4.1 is past its end-of-life and 4.1.21 isn't even the latest 4.1 version. (Extended support for MySQL 4.1 ended on December 31, 2009.) You should upgrade to at least 5.0.96, though you might as well come fully up-to-date to 5.5.25. Failing that, an upgrade to 4.1.22 would be the minimum you could do, though that probably won't fix your problem.

If you read the last example in the MySQL 4.1 documentation, you can see how this deadlock could occur if the row being deleted had previously been selected with a shared lock earlier in the transaction. Likewise, you could have acquired shared locks if foreign key constraints are involved. The general problem is:

A acquires a shared lock on x

B waits for an exclusive lock on x. It has to wait because of A's lock.

A waits for an exclusive lock on x. It has to wait because B is ahead of it in the queue for the exclusive lock.

The way InnoDB handles locks, it will not upgrade A's shared lock to exclusive while B is waiting for the same exclusive lock, so this is a deadlock.
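The sequence above can be modelled as a wait-for graph. The sketch below is a toy model written only to follow this answer's description (a request waits behind any other transaction that holds, or is already queued for, an incompatible lock on the same row); it is not how InnoDB is implemented. A cycle in the graph is a deadlock, which InnoDB resolves by rolling back one transaction, as the report's "WE ROLL BACK TRANSACTION (2)" line shows.

```python
# Toy lock model following the answer's description, NOT InnoDB's real code.


def compatible(held, requested):
    """Only shared (S) locks are compatible with each other."""
    return held == "S" and requested == "S"


class RowLock:
    def __init__(self):
        self.granted = []  # (txn, mode) pairs holding the lock
        self.waiting = []  # (txn, mode) pairs queued, in arrival order

    def request(self, txn, mode):
        """Grant the lock or queue it; return the set of blocking txns."""
        blockers = {
            t for t, m in self.granted + self.waiting
            if t != txn and not compatible(m, mode)
        }
        if blockers:
            self.waiting.append((txn, mode))
        else:
            self.granted.append((txn, mode))
        return blockers


def has_cycle(wait_for):
    """Depth-first search for a cycle in a {txn: set_of_txns_it_waits_for} graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in visiting:
            return True  # back edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        if any(visit(n) for n in wait_for.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(n) for n in list(wait_for))


if __name__ == "__main__":
    x = RowLock()
    wait_for = {}
    wait_for["A"] = x.request("A", "S")  # A gets a shared lock on x
    wait_for["B"] = x.request("B", "X")  # B queues for exclusive, blocked by A
    wait_for["A"] = x.request("A", "X")  # A's upgrade is blocked by queued B
    print(wait_for)
    print("deadlock:", has_cycle(wait_for))
```

In this model A waits for B and B waits for A, so has_cycle reports a deadlock; two transactions taking only shared locks never block each other.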

Alternatively, you may be hitting a bug when the two statements are both trying to delete a non-existent row (possibly one just deleted by an immediately preceding third duplicate delete). Possibly related to:

Complex SQL query (newbie)

8 votes

I have a MySQL table containing the points (x/y coordinates) of tracks. Each row contains the TrackID, a Timestamp, and the X and Y Positions for that track at that given point in time.

What I want is a list of all TrackIDs that were active during a given time interval (tmin...tmax), sorted by their start-time, even if that start time is outside the interval.

A little illustration might help:

Illustration

As an example: Track 1 is active from t11 till t12, which means I have many rows in my table with ID=1 and with timestamps ranging from t11 to t12.

The desired output would be:

TrackID | StartTime
--------+-----------
    7   |    t71
    1   |    t11
    2   |    t21
    6   |    t61

I tried something like this:

SELECT TrackID, MIN(Timestamp) AS StartTime FROM Tracks WHERE Timestamp BETWEEN tmin AND tmax GROUP BY TrackID ORDER BY StartTime;

However, in the example above I don't get the real start times for tracks 1 and 7, since all rows with timestamps less than tmin are not considered at all.

Of course I could in a first step just get all active TrackIDs with

SELECT TrackID FROM Tracks WHERE Timestamp BETWEEN tmin AND tmax GROUP BY TrackID;

and then with separate queries find the start times of all these tracks and then sort them in my application code.

But I'm sure there is a way to do this with one SQL query. My table contains millions of rows, so efficiency is an issue here.

I hope someone can show me a solution (and explain it, since I'd like to learn SQL better...)!

One way to think about it is to construct the logic to handle your four special cases in your diagram. These two rules should suffice.

  1. tend > tmin AND
  2. tstart < tmax

If both of these conditions are true, then the track should be included. You will need a list of tracks, as in your second query, with their min and max timestamps, and then perform the comparisons:

SELECT T.TrackID, T.StartTime
  FROM (SELECT TrackID, MIN(Timestamp) AS StartTime, MAX(Timestamp) AS EndTime
        FROM Tracks GROUP BY TrackID) T
 WHERE T.EndTime > tmin AND T.StartTime < tmax
 ORDER BY T.StartTime
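The answer's idea can be tried on a small data set. This minimal sketch uses Python's sqlite3 as a stand-in for MySQL: it runs the derived-table query, selects StartTime and orders by it as the question asks, and passes tmin and tmax as parameters. The sample rows and integer timestamps are made up so that the result reproduces the question's expected ordering (7, 1, 2, 6).

```python
import sqlite3

# Made-up sample points: (TrackID, Timestamp, X, Y), integer timestamps.
POINTS = [
    (1, 5, 0, 0), (1, 12, 1, 1),   # starts before tmin, ends inside
    (2, 11, 0, 0), (2, 14, 1, 1),  # entirely inside the interval
    (3, 1, 0, 0), (3, 4, 1, 1),    # ends before tmin: excluded
    (6, 15, 0, 0), (6, 30, 1, 1),  # starts inside, ends after tmax
    (7, 2, 0, 0), (7, 25, 1, 1),   # spans the whole interval
    (8, 21, 0, 0), (8, 22, 1, 1),  # starts after tmax: excluded
]

QUERY = """
SELECT T.TrackID, T.StartTime
  FROM (SELECT TrackID, MIN(Timestamp) AS StartTime, MAX(Timestamp) AS EndTime
          FROM Tracks GROUP BY TrackID) T
 WHERE T.EndTime > ? AND T.StartTime < ?
 ORDER BY T.StartTime
"""


def active_tracks(conn, tmin, tmax):
    """Tracks overlapping (tmin, tmax), with their real start times."""
    return conn.execute(QUERY, (tmin, tmax)).fetchall()


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Tracks (TrackID INTEGER, Timestamp INTEGER, X REAL, Y REAL)"
    )
    conn.executemany("INSERT INTO Tracks VALUES (?, ?, ?, ?)", POINTS)
    return conn


if __name__ == "__main__":
    print(active_tracks(make_db(), 10, 20))
```

The derived table still groups the whole table, so on millions of rows an index on (TrackID, Timestamp) is likely to help the MIN/MAX grouping; check the plan with EXPLAIN before relying on it.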

Why MySQL does not use indexes in WHERE IF clause?

8 votes

Because of this setting:

mysql> show global variables like '%indexes';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| log_queries_not_using_indexes | ON    | 
+-------------------------------+-------+

The slow query log keeps receiving:

# Time: 120607 16:58:30
# User@Host: xbtit[xbtit] @  [123.30.53.244]
# Query_time: 0  Lock_time: 0  Rows_sent: 1  Rows_examined: 16006
SELECT * FROM xbtit_files WHERE IF(soha_id is null OR soha_id = '', info_hash, soha_id)='6d63dd4ab199190b531752067414d4d6e6568f90';

Running EXPLAIN on this query:

mysql> EXPLAIN SELECT * FROM xbtit_files WHERE IF(soha_id is null OR soha_id = '', info_hash, soha_id)='6d63dd4ab199190b531752067414d4d6e6568f90';
+----+-------------+-------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | xbtit_files | ALL  | NULL          | NULL | NULL    | NULL | 16006 | Using where | 
+----+-------------+-------------+------+---------------+------+---------+------+-------+-------------+

What surprises me is that MySQL is not using the indexes:

mysql> show index from xbtit_files;
+-------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table       | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| xbtit_files |          0 | PRIMARY   |            1 | info_hash   | A         |       16006 |     NULL | NULL   |      | BTREE      |         | 
| xbtit_files |          1 | filename  |            1 | filename    | A         |       16006 |     NULL | NULL   | YES  | BTREE      |         | 
| xbtit_files |          1 | category  |            1 | category    | A         |           1 |     NULL | NULL   |      | BTREE      |         | 
| xbtit_files |          1 | uploader  |            1 | uploader    | A         |          16 |     NULL | NULL   |      | BTREE      |         | 
| xbtit_files |          1 | bin_hash  |            1 | bin_hash    | A         |       16006 |       20 | NULL   |      | BTREE      |         | 
| xbtit_files |          1 | ix_sohaid |            1 | soha_id     | A         |       16006 |     NULL | NULL   | YES  | BTREE      |         | 
+-------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

FORCE INDEX also doesn't work:

mysql> EXPLAIN SELECT * FROM xbtit_files force index (PRIMARY) WHERE IF(soha_id is null OR soha_id = '', info_hash, soha_id)='6d63dd4ab199190b531752067414d4d6e6568f90';
+----+-------------+-------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | xbtit_files | ALL  | NULL          | NULL | NULL    | NULL | 16006 | Using where | 
+----+-------------+-------------+------+---------------+------+---------+------+-------+-------------+

Must I split this query into 2 operations?

In MySQL, you cannot create indexes on expressions, and the optimizer is not smart enough to split your query against two indexes.

Use this:

SELECT  *
FROM    xbtit_files 
WHERE   soha_id = '6d63dd4ab199190b531752067414d4d6e6568f90'
UNION ALL
SELECT  *
FROM    xbtit_files 
WHERE   soha_id = ''
        AND info_hash = '6d63dd4ab199190b531752067414d4d6e6568f90'
UNION ALL
SELECT  *
FROM    xbtit_files 
WHERE   soha_id IS NULL
        AND info_hash = '6d63dd4ab199190b531752067414d4d6e6568f90'

Each query uses its own index.

You can also combine them into a single query:

SELECT  *
FROM    xbtit_files 
WHERE   (
        soha_id = '6d63dd4ab199190b531752067414d4d6e6568f90'
        OR
        (soha_id = '' AND info_hash = '6d63dd4ab199190b531752067414d4d6e6568f90')
        OR
        (soha_id IS NULL AND info_hash = '6d63dd4ab199190b531752067414d4d6e6568f90')
        )

and create a composite index on (soha_id, info_hash) for this to work fast.

MySQL is also able to merge results from two indexes together, using index_merge, so there is a chance you would see this in the plan for the second query even if you don't create a composite index.
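The rewrites can be checked against the original IF() condition on a few rows. This sketch uses Python's sqlite3 as a stand-in for MySQL, with CASE WHEN in place of MySQL's IF() (equivalent here); the rows and hash values are made up. It only checks that the queries return the same rows for non-empty search values (for an empty string the rewrites would also match rows whose soha_id is '', which the original does not); whether MySQL actually uses the indexes has to be confirmed with EXPLAIN on the real server.

```python
import sqlite3

# Made-up rows: (info_hash, soha_id)
ROWS = [
    ("h1", "s1"),   # soha_id set
    ("h2", ""),     # empty soha_id: falls back to info_hash
    ("h3", None),   # NULL soha_id: falls back to info_hash
    ("h4", "h2"),   # soha_id equal to another row's info_hash
]

ORIGINAL = """
SELECT info_hash FROM xbtit_files
WHERE CASE WHEN soha_id IS NULL OR soha_id = '' THEN info_hash ELSE soha_id END = ?
"""

UNION_ALL = """
SELECT info_hash FROM xbtit_files WHERE soha_id = ?
UNION ALL
SELECT info_hash FROM xbtit_files WHERE soha_id = '' AND info_hash = ?
UNION ALL
SELECT info_hash FROM xbtit_files WHERE soha_id IS NULL AND info_hash = ?
"""

COMBINED = """
SELECT info_hash FROM xbtit_files
WHERE soha_id = ?
   OR (soha_id = '' AND info_hash = ?)
   OR (soha_id IS NULL AND info_hash = ?)
"""


def run(conn, query, value):
    """Bind the same search value to every placeholder; return sorted hashes."""
    n = query.count("?")
    return sorted(r[0] for r in conn.execute(query, (value,) * n))


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xbtit_files (info_hash TEXT PRIMARY KEY, soha_id TEXT)")
    conn.execute("CREATE INDEX ix_sohaid ON xbtit_files (soha_id)")
    conn.executemany("INSERT INTO xbtit_files VALUES (?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    conn = make_db()
    for value in ["s1", "h2", "h3"]:
        print(value, run(conn, ORIGINAL, value), run(conn, UNION_ALL, value))
```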

Why do web sites tend to use random IDs on database tables?

8 votes

I wonder why many web sites choose to use random IDs instead of incrementing from 1 on their database tables. I've searched without finding any good reasons; are there any?

Also, which method is best? It seems quite inefficient to check whether an ID already exists before inserting the data (that takes a second query).

Thanks for your help!

Under the hood, it is likely that they are using incremental ids in the database to identify rows, but the value that gets exposed to end users via the URL parameters is often made into a random string to make the sequence of available objects harder to guess.

It is really a matter of security through obscurity. It hinders automated scripts from proceeding through incremental values and attempting attacks via the URL, and it hinders automated scraping of site content.

If YouTube, for example, used incremental IDs instead of values like v=HSsdaX4s, you could download every video by simply starting at v=1 and incrementing that value millions of times.
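On the second part of the question (avoiding a separate existence check), one common approach is to keep the auto-increment key internally, add a UNIQUE column for the random public ID, and simply retry when the database rejects a duplicate. A minimal sketch using Python's sqlite3 and secrets modules; the videos table, its columns, and the helper names are hypothetical.

```python
import secrets
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: sequential internal key plus a random public ID.
    conn.execute(
        "CREATE TABLE videos ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " public_id TEXT NOT NULL UNIQUE,"
        " title TEXT NOT NULL)"
    )
    return conn


def random_token():
    # 8 random bytes as URL-safe base64 without padding: 11 characters.
    return secrets.token_urlsafe(8)


def insert_video(conn, title, make_token=random_token, max_attempts=5):
    """Insert a row under a random public_id and return that ID.

    The UNIQUE constraint rejects the rare collision, so no separate
    SELECT is needed before the INSERT."""
    for _ in range(max_attempts):
        public_id = make_token()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO videos (public_id, title) VALUES (?, ?)",
                    (public_id, title),
                )
            return public_id
        except sqlite3.IntegrityError:
            continue  # public_id already taken: try another one
    raise RuntimeError("could not generate a unique public_id")


if __name__ == "__main__":
    conn = make_db()
    print(insert_video(conn, "first clip"))
```

With 64 random bits per ID the retry branch almost never runs; the unique index performs the existence check as part of the INSERT itself.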

Mysql return more than one row

7 votes

I have this query.

SELECT tickets.id,
       source.name AS 'source_name',
       flow_stage.title AS 'flow status',
       tickets.user_name AS user_created,
       tickets.created,
       tickets.rel_client_id,
       td_doc_nr.value_string AS 'document number',
       source.source_code,
       IF(!ISNULL(td_doc_nr.value_string),
          (SELECT GROUP_CONCAT(product_name SEPARATOR ',')
           FROM documents d
           JOIN document_bundles b ON b.document_id = d.id
           JOIN document_products p ON p.doc_bundle_id = b.id
           JOIN document_product_cstm_fields f ON f.doc_product_id = p.id
           JOIN document_product_cstm_field_data fd ON fd.cstm_field_id = f.id
           WHERE d.doc_nr = td_doc_nr.value_string
             AND value_string = 'auto'),
          NULL) AS test
FROM tickets tickets
JOIN tickets_flow_stages flow_stage ON flow_stage.id = tickets.flow_stage_id
JOIN tickets_sources source ON source.id = tickets.source_id
LEFT JOIN tickets_custom_fields tf_doc_nr ON tf_doc_nr.name = 'document_number' AND tf_doc_nr.rel_entity_id = source.id
LEFT JOIN tickets_custom_field_data td_doc_nr ON td_doc_nr.rel_entity_id = tickets.id
     AND td_doc_nr.field_instance_id = tf_doc_nr.id
WHERE tickets.source_id IN (114,122,125,129,131) AND tickets.status = 1


I added this subquery inside the IF expression:

 SELECT  GROUP_CONCAT(product_name SEPARATOR ',') from documents d 
    join document_bundles b on b.document_id = d.id
    join document_products p on p.doc_bundle_id = b.id
    join document_product_cstm_fields f on f.doc_product_id = p.id
    join document_product_cstm_field_data fd on fd.cstm_field_id = f.id
    where d.doc_nr = 'C9000100781'
    and value_string ='auto'

After this, I added a new column:

 SELECT  GROUP_CONCAT(product_name SEPARATOR ','),GROUP_CONCAT(DISTINCT b.msisdn SEPARATOR ',') from documents d 
    join document_bundles b on b.document_id = d.id
    join document_products p on p.doc_bundle_id = b.id
    join document_product_cstm_fields f on f.doc_product_id = p.id
    join document_product_cstm_field_data fd on fd.cstm_field_id = f.id
    where d.doc_nr = 'C9000100781'
    and value_string ='auto'

It returns two columns. How can I return both columns from the subquery? Is it possible? :) Thanks

In the final query I get the error 'Operand should contain 1 column(s)'.

A subquery used as a scalar expression, such as an argument to IF, must return a single column. You will need to join the subquery into the results as a derived table and pull out the two columns individually:

SELECT ...
    IF(!ISNULL(td_doc_nr.value_string), sub.one, NULL) as one,
    IF(!ISNULL(td_doc_nr.value_string), sub.two, NULL) as two
FROM ...
LEFT JOIN (
    SELECT d.doc_nr, GROUP_CONCAT(product_name SEPARATOR ',') AS one, GROUP_CONCAT(DISTINCT b.msisdn SEPARATOR ',') AS two from documents d
    join document_bundles b on b.document_id = d.id
    join document_products p on p.doc_bundle_id = b.id
    join document_product_cstm_fields f on f.doc_product_id = p.id
    join document_product_cstm_field_data fd on fd.cstm_field_id = f.id
    where value_string ='auto'
    group by d.doc_nr
) sub on sub.doc_nr = td_doc_nr.value_string
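A reduced, runnable version of the same idea, again with Python's sqlite3 standing in for MySQL. The tables are simplified, hypothetical versions of the ones in the question (the custom-field tables and the value_string = 'auto' filter are left out), SQLite has no IF(), and its GROUP_CONCAT takes the separator as a second argument instead of SEPARATOR. The grouped subquery is joined once, and both aggregated columns are read from it. Because it is a LEFT JOIN, a ticket without a document number simply gets NULL in both columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_nr TEXT);
    CREATE TABLE bundles   (id INTEGER PRIMARY KEY, document_id INTEGER, msisdn TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, bundle_id INTEGER, product_name TEXT);
    CREATE TABLE tickets   (id INTEGER PRIMARY KEY, doc_nr TEXT);

    INSERT INTO documents VALUES (1, 'C9000100781');
    INSERT INTO bundles   VALUES (1, 1, '5550001'), (2, 1, '5550002');
    INSERT INTO products  VALUES (1, 1, 'planA'), (2, 2, 'planB');
    INSERT INTO tickets   VALUES (1, 'C9000100781'), (2, NULL);
""")

QUERY = """
SELECT t.id, sub.products, sub.msisdns
FROM tickets AS t
LEFT JOIN (
    -- one row per document, with both aggregates computed together
    SELECT d.doc_nr,
           GROUP_CONCAT(p.product_name, ',') AS products,
           GROUP_CONCAT(DISTINCT b.msisdn)   AS msisdns
    FROM documents AS d
    JOIN bundles  AS b ON b.document_id = d.id
    JOIN products AS p ON p.bundle_id   = b.id
    GROUP BY d.doc_nr
) AS sub ON sub.doc_nr = t.doc_nr
ORDER BY t.id
"""
rows = conn.execute(QUERY).fetchall()
```

GROUP_CONCAT does not promise any particular order of the concatenated values; add an ORDER BY inside the aggregate (supported by MySQL) if the order matters.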

What exactly is "pinning" in relation to indexes in DBMS?

7 votes

As part of a test question, students were asked to define and describe pinning in relation to indexes. When one talks about "pinning" indexes, what exactly does it mean? Is there some other word or term I can search for? Google has not turned up anything useful.

When a table or index is pinned, it means it is kept in memory.

The database has a certain amount of memory allocated to work with. Typically, the database caches recently used data.

When an index is pinned, it means the index is kept in memory permanently, rather than being aged out of the cache.
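To make the idea concrete, here is a toy least-recently-used page cache in Python in which pinned entries are skipped during eviction. The class and method names are hypothetical and do not correspond to any particular database engine's buffer manager:

```python
from collections import OrderedDict

class BufferCache:
    """Toy LRU cache of pages; pinned pages are never chosen for eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> data, least recently used first
        self.pinned = set()

    def pin(self, page_id, data):
        # Mark the page as pinned, then load it like any other page
        self.pinned.add(page_id)
        self.put(page_id, data)

    def get(self, page_id):
        if page_id not in self.pages:
            return None  # cache miss: a real engine would read from disk here
        self.pages.move_to_end(page_id)  # mark as most recently used
        return self.pages[page_id]

    def put(self, page_id, data):
        self.pages[page_id] = data
        self.pages.move_to_end(page_id)
        self._evict()

    def _evict(self):
        while len(self.pages) > self.capacity:
            # Oldest page that is not pinned
            victim = next((p for p in self.pages if p not in self.pinned), None)
            if victim is None:
                break  # only pinned pages remain; allow the cache to overflow
            del self.pages[victim]

cache = BufferCache(capacity=2)
cache.pin("title_index", "index pages")
for n in range(1, 4):
    cache.put(f"row:{n}", f"row {n} data")
```

After the loop, the pinned index page is still cached even though it is the least recently used entry, while row:1 and row:2 have been aged out.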

Consider the following two EXPLAINs:

EXPLAIN SELECT * FROM sales WHERE title != 'The'

id  select_type  table  type  possible_keys  key    key_len  ref    rows   Extra
1   SIMPLE       sales  ALL   title          NULL   NULL     NULL   41707  Using where

And:

EXPLAIN SELECT * FROM sales WHERE title = 'The'

id  select_type  table  type  possible_keys  key    key_len  ref    rows   Extra
1   SIMPLE       sales  ref   title          title  767      const  1      Using where

Why does the != query have a NULL key? Why doesn't it use title? What allows an = condition to use an index but not a != condition?

There is no point in using the index unless title is exactly 'The' in a large share of the rows.

Since almost every row needs to be selected, you don't gain anything from using an index. Using the index can actually be more costly than scanning the table, which is probably what the MySQL optimizer determined, so it chose not to use the index.

Compare the amount of work done in these two situations:

Using the index:

1) Read the entire index tree into memory.
2) Search the index tree for the value 'The' and filter out those entries.
3) Read every row except for the few exceptions (which probably are in the same blocks on the disk as rows that do need to be read, so really the whole table is likely to be read in) from the table into memory.

Without the index:

1) Read every row into memory and, while reading, filter out any rows where title = 'The'.
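The comparison can be put into rough numbers with a toy cost model that counts page reads. This is not MySQL's actual optimizer formula: the 41707 rows come from the EXPLAIN output above, while 100 rows per table page, 500 entries per index page and a B-tree depth of 3 are assumed values chosen only for illustration:

```python
import math

TOTAL_ROWS = 41707            # from the EXPLAIN output
ROWS_PER_PAGE = 100           # assumption
INDEX_ENTRIES_PER_PAGE = 500  # assumption
TREE_DEPTH = 3                # assumption: pages read to reach one key

def full_scan_cost():
    # Read every table page once, sequentially, filtering as we go
    return math.ceil(TOTAL_ROWS / ROWS_PER_PAGE)

def index_cost(index_pages_read, matching_rows):
    # Index pages, plus (worst case) one random table-page fetch per matching row
    return index_pages_read + matching_rows

# title = 'The': descend the tree to one key, then fetch the single matching row
equal_cost = index_cost(TREE_DEPTH, 1)
# title != 'The': walk every index leaf page, then fetch almost every row
not_equal_cost = index_cost(math.ceil(TOTAL_ROWS / INDEX_ENTRIES_PER_PAGE), TOTAL_ROWS - 1)
scan_cost = full_scan_cost()
```

Under these assumptions the index wins easily for = (4 page reads versus 418 for the scan) and loses badly for != (41790 versus 418), which matches the plans shown above.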

How to swap the values of two rows in MySQL without violating a unique constraint?

7 votes

I have a "tasks" table with a priority column, which has a unique constraint.

I'm trying to swap the priority value of two rows, but I keep violating the constraint. I saw this statement somewhere in a similar situation, but it wasn't with MySQL.

UPDATE tasks 
SET priority = 
CASE
    WHEN priority=2 THEN 3 
    WHEN priority=3 THEN 2 
END 

WHERE priority IN (2,3);

This will lead to the error:

Error Code: 1062. Duplicate entry '3' for key 'priority_UNIQUE'

Is it possible to accomplish this in MySQL without using bogus values and multiple queries?

EDIT:

Here's the table structure:

CREATE TABLE `tasks` (
  `id` int(11) NOT NULL,
  `name` varchar(200) DEFAULT NULL,
  `priority` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `priority_UNIQUE` (`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Is it possible to accomplish this in MySQL without using bogus values and multiple queries?

No (none that I can think of).

The problem is how MySQL processes updates. Unlike other DBMSs that implement UPDATE properly, MySQL processes updates in a broken manner: it checks UNIQUE (and other) constraints after every single row is updated, not, as it should, after the whole UPDATE statement completes. That's why you don't have this issue with most other DBMSs.

For some updates, such as incrementing all or some IDs (id = id + 1), this can be worked around with another non-standard feature: an ORDER BY clause in the UPDATE, e.g. UPDATE tasks SET id = id + 1 ORDER BY id DESC.

For swapping the values from two rows, that trick can't help. You'll have to use NULL or a bogus value (that doesn't exist but is allowed in your column) and 2 or 3 statements.
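A minimal sketch of the NULL-based swap, run with Python's sqlite3 against a copy of the question's tasks table (SQLite, like MySQL, allows several NULLs in a UNIQUE column). The three UPDATEs are committed together, so other sessions never see the intermediate state; in MySQL you would wrap the same statements in START TRANSACTION ... COMMIT. The helper name swap_priorities is hypothetical:

```python
import sqlite3

def swap_priorities(conn, p1, p2):
    """Swap the priority values p1 and p2 between their two rows."""
    (id1,) = conn.execute("SELECT id FROM tasks WHERE priority = ?", (p1,)).fetchone()
    with conn:  # the UPDATEs commit together, or all roll back if one fails
        # 1) park the first row on NULL, which the UNIQUE index allows
        conn.execute("UPDATE tasks SET priority = NULL WHERE id = ?", (id1,))
        # 2) move the second row onto the now-free value
        conn.execute("UPDATE tasks SET priority = ? WHERE priority = ?", (p1, p2))
        # 3) give the first row the second row's old value
        conn.execute("UPDATE tasks SET priority = ? WHERE id = ?", (p2, id1))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, name VARCHAR(200),"
    " priority VARCHAR(45) UNIQUE)"
)
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", [(1, "write", "2"), (2, "review", "3")])
conn.commit()

swap_priorities(conn, "2", "3")
result = conn.execute("SELECT id, priority FROM tasks ORDER BY id").fetchall()
```

NULL works as the temporary value here because the priority column is nullable; with a NOT NULL column you would need a placeholder value that can never occur in real data.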

You could also temporarily remove the unique constraint but I don't think that's a good idea really.

Why does the same query give two different results?

6 votes

I have created two tables and inserted values as shown below.

Table 1

create table maxID (myID varchar(4));

insert into maxID values ('A001');
insert into maxID values ('A002');
insert into maxID values ('A004');
insert into maxID values ('A003');

Table 2

create table maxID2 (myID varchar(4) PRIMARY KEY);

insert into maxID2 values ('A001');
insert into maxID2 values ('A002');
insert into maxID2 values ('A004');
insert into maxID2 values ('A003');

When I execute the query

SELECT myId, @rowid:=@rowid+1 as myrow 
FROM maxID, (SELECT @rowid:=0) as init
ORDER BY myrow desc
LIMIT 1;

I get this output:

+------+-------+
| myid | myrow |
+------+-------+
| A003 |     4 |
+------+-------+

AND

When I execute the query

SELECT myId, @rowid:=@rowid+1 as myrow 
FROM maxID2, (SELECT @rowid:=0) as init
ORDER BY myrow desc
LIMIT 1;

I get this output:

+------+-------+
| myid | myrow |
+------+-------+
| A004 |     4 |
+------+-------+

The difference between the two tables is that in the second table myID is the PRIMARY KEY.

You can view the above data and results at www.sqlfiddle.com.

My question is:

Why am I getting two different results when the query is the same?

NOTE: This question is somewhat related to my old question, Getting last record from mysql, where I almost got the answer and Yak informed me that the order of rows is not guaranteed. :(

This happens because when all the selected columns are contained in an index, MySQL reads the data from that index instead of doing a full table scan.

Since an index is stored in sorted order while the raw table data is not, rows read through the index come back in a different order than rows read by a full table scan.

In your case, A004 comes out as the 4th row when the primary key is used, because internally MySQL (like Oracle, SQL Server, ...) keeps the index sorted so it can seek data faster.

Note that you might just as well have obtained the same result from both queries by chance: without an ORDER BY on an actual column, the order of the returned rows is not guaranteed to follow the insertion order, or any other order.

Finally, be aware that if you planned to add an index with a specific order in MySQL (such as described here) so that rows are retrieved in, say, DESC order, you could not do it, since MySQL did not support that at the time (descending indexes only arrived in MySQL 8.0):

An index_col_name specification can end with ASC or DESC. These keywords are permitted for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order.
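The practical fix is to sort on a real column instead of relying on the order in which rows happen to come back. A small check with Python's sqlite3 using the two tables from the question: once ORDER BY myID DESC is used, both tables return the same row. If "last" should mean "most recently inserted" rather than "highest ID", the table needs a column that records that (for example an AUTO_INCREMENT id or a timestamp) to sort on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maxID (myID VARCHAR(4))")
conn.execute("CREATE TABLE maxID2 (myID VARCHAR(4) PRIMARY KEY)")
for table in ("maxID", "maxID2"):
    # Same insertion order as in the question
    conn.executemany(
        f"INSERT INTO {table} VALUES (?)",
        [("A001",), ("A002",), ("A004",), ("A003",)],
    )

def last_id(table):
    # Table names are fixed in this script, not user input.
    # Deterministic: the result depends on the data, not on which index the engine reads.
    return conn.execute(f"SELECT myID FROM {table} ORDER BY myID DESC LIMIT 1").fetchone()[0]

results = {table: last_id(table) for table in ("maxID", "maxID2")}
```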

Use of GROUP BY twice in MySQL

6 votes

My table looks like this.

Location    Head    Id  IntTime
1           AMD     1   1
2           INTC    3   3
3           AMD     2   2
4           INTC    4   4
5           AMD2    1   0
6           ARMH    5   1
7           ARMH    5   0
8           ARMH    6   1
9           AAPL    7   0
10          AAPL    7   1

Location is the primary key. I need to GROUP BY Head and by Id and when I use GROUP BY, I need to keep the row with the smallest IntTime.

After the first GROUP BY Id, I should get this (keeping the smallest IntTime):

Location    Head    Id  IntTime
2           INTC    3   3
3           AMD     2   2
4           INTC    4   4
5           AMD2    1   0
7           ARMH    5   0
8           ARMH    6   1
9           AAPL    7   0

After the second GROUP BY Head, I should get this (keeping the smallest IntTime):

Location    Head    Id  IntTime
2           INTC    3   3
3           AMD     2   2
5           AMD2    1   0
7           ARMH    5   0
9           AAPL    7   0

When I run the following command, I get the smallest IntTime, but the other columns (Location, Head) are not taken from the row that holds that minimum:

SELECT Location, Head, Id, MIN(IntTime) FROM test 
GROUP BY Id

Also, to run the second GROUP BY, I save this result as a new table (test2) and run:

SELECT Location, Head, Id, MIN(IntTime) FROM test2 
GROUP BY Head

Is there a way to combine both commands?

[Edit: clarification] The result should not contain two Head with the same value or two Id with the same value. When deleting those duplicates, the row with the smallest IntTime should be kept.

This query returns exactly the final results you're looking for:

SELECT `final`.*
FROM `tableName` AS `final`
JOIN (
    SELECT `thead`.`Id`, `Head`, MIN(`intTime`) AS `min_intTime`
    FROM `tableName` AS `thead`
    JOIN (
        SELECT `Id`, MIN(intTime) as min_intTime
        FROM `tableName` AS `tid`
        GROUP BY `Id`
    ) `tid`
    ON `tid`.`Id` = `thead`.`Id`
    AND `tid`.`min_intTime` = `thead`.`intTime`
    GROUP BY `Head`
) `thead`
ON `thead`.`Head` = `final`.`Head`
AND `thead`.`min_intTime` = `final`.`intTime`
AND `thead`.`Id` = `final`.`Id`

How it works

The innermost query groups by Id, and returns the Id and corresponding MIN(intTime). Then, the middle query groups by Head, and returns the Head and corresponding Id and MIN(intTime). The final query returns all rows, after being narrowed down. You can think of the final (outermost) query as a query on a table with only the rows you want, so you can do additional comparisons (e.g. WHERE final.intTime > 3).
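One caveat: the middle query selects thead.Id while grouping only by Head. MySQL accepts that only when the ONLY_FULL_GROUP_BY SQL mode is off (it is on by default since MySQL 5.7), and even then the Id it returns is not guaranteed to belong to the row holding the minimum, so a group such as ARMH (Ids 5 and 6) could drop out of the result. A variant that avoids the non-aggregated column computes the per-Head minimum over the rows that survive the per-Id step and joins both steps back to the table. Here it is checked with Python's sqlite3 and the sample data from the question (the table name tableName is kept from the answer):

```python
import sqlite3

# (Location, Head, Id, IntTime) from the question
ROWS = [
    (1, "AMD", 1, 1), (2, "INTC", 3, 3), (3, "AMD", 2, 2), (4, "INTC", 4, 4),
    (5, "AMD2", 1, 0), (6, "ARMH", 5, 1), (7, "ARMH", 5, 0), (8, "ARMH", 6, 1),
    (9, "AAPL", 7, 0), (10, "AAPL", 7, 1),
]

QUERY = """
SELECT final.*
FROM tableName AS final
-- step 1: the row must hold the smallest IntTime for its Id
JOIN (SELECT Id, MIN(IntTime) AS min_intTime
      FROM tableName GROUP BY Id) AS tid
  ON tid.Id = final.Id AND tid.min_intTime = final.IntTime
-- step 2: among step-1 rows, it must hold the smallest IntTime for its Head
JOIN (SELECT t.Head, MIN(t.IntTime) AS min_intTime
      FROM tableName AS t
      JOIN (SELECT Id, MIN(IntTime) AS min_intTime
            FROM tableName GROUP BY Id) AS tid2
        ON tid2.Id = t.Id AND tid2.min_intTime = t.IntTime
      GROUP BY t.Head) AS thead
  ON thead.Head = final.Head AND thead.min_intTime = final.IntTime
ORDER BY final.Location
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableName (Location INTEGER PRIMARY KEY, Head TEXT,"
    " Id INTEGER, IntTime INTEGER)"
)
conn.executemany("INSERT INTO tableName VALUES (?, ?, ?, ?)", ROWS)
locations = [row[0] for row in conn.execute(QUERY)]
```

Every column in each GROUP BY subquery is either grouped or aggregated, so the query is valid with ONLY_FULL_GROUP_BY enabled. If two candidate rows tie on IntTime, both are returned.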

How to write a MYSQL query that will return children nested under parents?

5 votes

I don't know if what I'm asking is even possible, but here's my situation. I have a table structured somewhat like this:

+--------------------------------------------------+
|   id   |   parent_id   |   name   |   category   | ....
+--------------------------------------------------+
|    0   |       -1      |   item0  |      1       |
|    1   |        0      |   item1  |      1       |
|    2   |        0      |   item2  |      1       |
|    3   |        2      |   item3  |      1       | 
|    4   |        2      |   item4  |      1       | 
|    5   |       -1      |   item5  |      1       | 
+--------------------------------------------------+

A parent_id of -1 will mean it is a "base" item with no parent. Each item will have more columns of information. I need to somehow output all items in a category nested like the following:

item0 => item1    
      => item2
            => item3
            => item4  
item5  

I don't know if that makes sense or not, but hopefully it does!

The only way I can think of doing this is to query for all of the "base" items (rows with parent_id = -1), then iterate through each resulting row, querying for rows whose parent_id equals the current row's id, and repeat the process deeper and deeper until a base item has no more children.

Is there a better way?

Thanks!!

It is not possible in pure SQL, at least in MySQL as of this writing (MySQL 8.0 later added recursive common table expressions, WITH RECURSIVE, which can walk such a hierarchy in one query).

SQL is intended to work with relational data, not trees (hierarchical data).

You can represent a tree in an SQL schema, but you won't be able to get the result back as a nested tree the way you intend.

The only way to get a usable result is to make as many joins as there are levels in your tree.

Your current schema may support multiple levels, but it will be very difficult to manage more than one or two levels.

You may be interested in the Nested Set Model, or in the article "Managing Hierarchical Data in MySQL".
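For a feel of how the Nested Set Model works, here is a small Python sketch (with sqlite3 as the store) that converts the question's parent_id rows into lft/rgt boundaries with a depth-first walk, then fetches a whole subtree, already in display order, with one BETWEEN query. The table name items and the columns lft, rgt and depth are hypothetical additions, and keeping the boundaries up to date on insert and delete is left out:

```python
import sqlite3

# (id, parent_id, name) from the question; -1 marks a base item
ITEMS = [(0, -1, "item0"), (1, 0, "item1"), (2, 0, "item2"),
         (3, 2, "item3"), (4, 2, "item4"), (5, -1, "item5")]

def nested_set_bounds(items):
    """Assign (lft, rgt, depth) to every id with a depth-first walk."""
    children = {}
    for item_id, parent_id, _ in items:
        children.setdefault(parent_id, []).append(item_id)
    bounds, counter = {}, 0

    def visit(item_id, depth):
        nonlocal counter
        counter += 1
        lft = counter                      # number assigned on the way down
        for child in children.get(item_id, []):
            visit(child, depth + 1)
        counter += 1                       # number assigned on the way back up
        bounds[item_id] = (lft, counter, depth)

    for root in children.get(-1, []):
        visit(root, 0)
    return bounds

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT,"
             " lft INTEGER, rgt INTEGER, depth INTEGER)")
bounds = nested_set_bounds(ITEMS)
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)",
                 [(i, p, n, *bounds[i]) for i, p, n in ITEMS])

# Every descendant of item0 lies between item0's lft and rgt; ordering by lft gives tree order
subtree = conn.execute(
    "SELECT child.name, child.depth FROM items AS parent"
    " JOIN items AS child ON child.lft BETWEEN parent.lft AND parent.rgt"
    " WHERE parent.name = ? ORDER BY child.lft",
    ("item0",),
).fetchall()
for name, depth in subtree:
    print("    " * depth + name)
```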

There are some implementations of the Nested Set Model, like this one for Doctrine 2.
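With the existing parent_id schema, another option that avoids one query per level is to fetch every row of the category in a single query and assemble the nesting in application code. A Python sketch, with the rows hard-coded in place of the query result (in practice something like SELECT id, parent_id, name FROM items WHERE category = 1, where the table name items is hypothetical); the function names are also hypothetical:

```python
from collections import defaultdict

def build_tree(rows):
    """Group (id, parent_id, name) rows by parent; -1 marks a base item."""
    children = defaultdict(list)
    names = {}
    for item_id, parent_id, name in rows:
        names[item_id] = name
        children[parent_id].append(item_id)
    return names, children

def render(names, children, parent_id=-1, depth=0):
    """Return the items as indented lines, each child directly under its parent."""
    lines = []
    for item_id in children.get(parent_id, []):
        lines.append("    " * depth + names[item_id])
        lines.extend(render(names, children, item_id, depth + 1))
    return lines

rows = [(0, -1, "item0"), (1, 0, "item1"), (2, 0, "item2"),
        (3, 2, "item3"), (4, 2, "item4"), (5, -1, "item5")]
names, children = build_tree(rows)
print("\n".join(render(names, children)))
```

This needs one database round trip regardless of depth and a single pass over the rows to build the parent-to-children map. It assumes the data contains no cycles.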