Best MySQL questions in January 2011

I'm not sure if I have the correct indexes or if I can improve the speed of my query in MySQL?

13 votes

My query has a join, and it looks like it's using two indexes which makes it more complicated. I'm not sure if I can improve on this, but I thought I'd ask.

The query produces a list of records with keywords similar to those of the record being queried.

Here's my query.

SELECT match_keywords.padid,
       COUNT(match_keywords.word) AS matching_words
FROM   keywords current_program_keywords
       INNER JOIN keywords match_keywords
         ON match_keywords.word = current_program_keywords.word
WHERE  match_keywords.word IS NOT NULL
       AND current_program_keywords.padid = 25695
GROUP  BY match_keywords.padid
ORDER  BY matching_words DESC
LIMIT  0, 11  

[EXPLAIN output: image not available]

The word column is VARCHAR(40).

You can start by removing the IS NOT NULL test, which is redundant: a NULL word can never satisfy the equality join, and COUNT on the column ignores NULLs anyway. It also looks like you would want to exclude 25695 itself from match_keywords; otherwise 25695 (or whichever padid you query) would surely show up as the "best" match within your 11-row limit.

SELECT     match_keywords.padid,
           COUNT(match_keywords.word) AS matching_words
FROM       keywords current_program_keywords
INNER JOIN keywords match_keywords
        ON match_keywords.word = current_program_keywords.word
WHERE      current_program_keywords.padid = 25695
AND        match_keywords.padid <> 25695
GROUP BY   match_keywords.padid
ORDER BY   matching_words DESC
LIMIT      0, 11

Next, consider how you would do it as a person.

  • You would start with a padid (25695) and retrieve all the words for that padid
  • From that list of words, go back into the table and, for each matching word, get its padids (assuming there are no duplicates on padid + word)
  • Group the padids together and count them
  • Order the counts and return the highest 11
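The four steps above can be sketched in plain Python to make the data flow concrete. This is a minimal in-memory sketch, assuming the rows arrive as (padid, word) pairs with no duplicate (padid, word) combinations; `similar_pads` is a hypothetical name, and the starting padid is excluded, as suggested earlier in this answer.

```python
from collections import Counter


def similar_pads(rows, padid, limit=11):
    """Rank other padids by how many keywords they share with `padid`.

    rows: iterable of (padid, word) pairs, assumed unique per (padid, word).
    """
    rows = list(rows)
    # Step 1: collect every word of the starting padid.
    words = {w for p, w in rows if p == padid}
    # Steps 2 and 3: for each matching word, take the other padids and count them.
    counts = Counter(p for p, w in rows if w in words and p != padid)
    # Step 4: order by count, highest first, and keep the top `limit`.
    return counts.most_common(limit)


rows = [(1, "sql"), (1, "index"), (2, "sql"), (2, "index"), (3, "index"), (4, "php")]
print(similar_pads(rows, 1))  # [(2, 2), (3, 1)]
```

The database does the same work, but steps 1 and 2 each become index lookups, which is why the choice of indexes matters.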

With your three separate single-column indexes, each of the first two steps (each involving only two columns) has to jump from the index back to the table data to fetch the other column. Covering indexes may help here; create two composite indexes to test:

create index ix_keyword_pw on keywords(padid, word);
create index ix_keyword_wp on keywords(word, padid);

With these composite indexes in place, you can remove the single-column indexes on padid and word, since each is the leftmost prefix of one of these two.
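One way to check that the two composite indexes really cover the query is to ask the planner. The sketch below uses SQLite through Python's sqlite3 module rather than MySQL, so treat it as an illustration of the idea only; the two-column `keywords` table is an assumption. SQLite reports "USING COVERING INDEX" in EXPLAIN QUERY PLAN; in MySQL, the equivalent signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring `keywords`: only the two columns the query touches.
conn.execute("CREATE TABLE keywords (padid INTEGER NOT NULL, word VARCHAR(40) NOT NULL)")
conn.execute("CREATE INDEX ix_keyword_pw ON keywords (padid, word)")
conn.execute("CREATE INDEX ix_keyword_wp ON keywords (word, padid)")

query = """
    SELECT m.padid, COUNT(m.word) AS matching_words
    FROM keywords c
    JOIN keywords m ON m.word = c.word
    WHERE c.padid = ? AND m.padid <> ?
    GROUP BY m.padid
    ORDER BY matching_words DESC
    LIMIT 11
"""
# The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (25695, 25695))]
for step in plan:
    print(step)
```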

Note: you always have to weigh SELECT performance against

  • size of indexes (the more you create, the more there is to store)
  • insert/update performance (the more indexes, the longer each write takes to commit, since it has to update the data and then every index)

SQL LIMIT returns no results where no LIMIT returns results

11 votes
SELECT * FROM mm_tfs 
WHERE product_slug LIKE '%football%' 
    AND schoolid = '8' AND category_id ='21'
LIMIT 4

This returns 4 rows as expected, but the following statement returns 0. Is there a rule about using OR that I'm not familiar with? My assumption is that it should return all of the rows from the first query (or more, if it weren't for the limit).

SELECT * FROM mm_tfs 
WHERE (product_slug LIKE '%football%' 
   AND schoolid = '8' AND category_id ='21') 
OR (product_description LIKE '%football%' 
   AND schoolid = '8' AND category_id ='21') 
LIMIT 4

NOTE by cyberkiwi: the first OR branch of Q2 is exactly the same as the WHERE clause of Q1.

product_description LIKE '%football%' AND schoolid = '8' AND category_id ='21'

On its own (without the OR), this condition does produce the desired result, as long as the query has no LIMIT. When the OR is used together with LIMIT, 0 rows are returned.

SELECT * FROM mm_tfs 
WHERE product_description LIKE '%football%' 
    AND schoolid = '8' AND category_id ='21'
LIMIT 4

^-- That produces 0 results

SELECT * FROM mm_tfs 
WHERE product_description LIKE '%football%' 
    AND schoolid = '8' AND category_id ='21'

^-- That produces results

The strangest part is that all of these queries produce the right result in my phpMyAdmin SQL query window, but not in the application itself.

You are repeating some of the conditions, which is not necessary. Try this instead:

SELECT * FROM mm_tfs 
WHERE 
   (product_slug LIKE '%football%' 
 OR product_description LIKE '%football%')
   AND schoolid = '8' 
   AND category_id ='21'
LIMIT 4
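The factored WHERE clause is logically equivalent to the repeated one, since (A AND C) OR (B AND C) is the same as (A OR B) AND C. A minimal check with Python's sqlite3 (SQLite, not MySQL), assuming a cut-down `mm_tfs` with made-up rows; an ORDER BY is added because LIMIT without ORDER BY returns an unspecified subset of the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down mm_tfs with only the columns used in the WHERE clauses.
conn.execute(
    "CREATE TABLE mm_tfs (id INTEGER PRIMARY KEY, product_slug TEXT,"
    " product_description TEXT, schoolid TEXT, category_id TEXT)"
)
conn.executemany(
    "INSERT INTO mm_tfs (product_slug, product_description, schoolid, category_id)"
    " VALUES (?, ?, ?, ?)",
    [
        ("football-shirt", "", "8", "21"),          # matches on the slug
        ("shirt", "a football shirt", "8", "21"),   # matches on the description
        ("football-cap", "", "9", "21"),            # wrong school
        ("cap", "plain cap", "8", "21"),            # no 'football' at all
    ],
)

repeated = """
    SELECT id FROM mm_tfs
    WHERE (product_slug LIKE '%football%' AND schoolid = '8' AND category_id = '21')
       OR (product_description LIKE '%football%' AND schoolid = '8' AND category_id = '21')
    ORDER BY id
"""
factored = """
    SELECT id FROM mm_tfs
    WHERE (product_slug LIKE '%football%' OR product_description LIKE '%football%')
      AND schoolid = '8' AND category_id = '21'
    ORDER BY id
"""
rows_repeated = [r[0] for r in conn.execute(repeated)]
rows_factored = [r[0] for r in conn.execute(factored)]
print(rows_repeated, rows_factored)  # [1, 2] [1, 2]
```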

UPDATE:

I have created the following table:

create table mm_tfs2 (schoolid varchar(2), categoryid varchar(2), 
                      description varchar(20), slug varchar(20));

And inserted this row 5 times:

insert into mm_tfs2 values (8, 21, '', 'football');

And finally the query:

select * from mm_tfs2 where 
(slug like '%football%' and schoolid = 8 and categoryid = 21) 
or (description like '%football%' and schoolid = 8 and categoryid = 21) 
limit 4;
+----------+------------+-------------+----------+
| schoolid | categoryid | description | slug     |
+----------+------------+-------------+----------+
| 8        | 21         |             | football |
| 8        | 21         |             | football |
| 8        | 21         |             | football |
| 8        | 21         |             | football |
+----------+------------+-------------+----------+
4 rows in set (0.00 sec)

So I'm sorry to say that I'm not able to recreate the problem.

Guides for dealing with Unicode in PHP5?

9 votes

Hey everybody. I'm developing a new site (PHP5/MySQL) and am looking to finally get on the Unicode bandwagon. I'll admit to knowing next to absolutely nothing about supporting Unicode at the moment, but I'm hoping to resolve that with your help.

After desperately flexing my tiny, pathetic excuses for Googlefu-muscles, and scouring over each page that looked promising to my Unicode-newbie eyes, I have come to the conclusion that, while not entirely supported, my precious language of choice (PHP for those that have forgotten) has made at least a half-assed attempt at managing the foreign beast (and from what else I see, succeeding?). I have also come to the conclusion that

<?php header('Content-Type: text/html; charset=utf-8'); ?>

is a great place to start and that I should be looking into supporting UTF-8 since I have plenty of space on my (shared, for the moment) hosting.

However, I'm not sure what this strange functionality known as mb_* means or how to incorporate it into functions such as strlen() and . . . to be honest at this point I don't know what other functionality (that I can't live without) is affected.

So I've come to you SO-ites in search of enlightenment and possibly straightening out my confused (where Unicode is concerned!) brain. I really want to support it but I need serious help.

P.S.: Does Unicode affect mysql_real_escape_string() or any other XSS prevention/security measures? I need to stay on top of this as well!

Thanks ahead of time.

  • Adding JavaScript into the mix, since I'll be using both plain JavaScript and jQuery and know nothing about Unicode support in this language. ;)

  1. Welcome onboard utf8 :)
  2. You should simply use mb_* functions in place of your traditional str* functions
  3. MySQL and its API have long supported utf8; the only requirement is that you specify the encoding when saving data and when connecting. Google for 'SET NAMES utf8'.
  4. Note the 'u' modifier for preg_* functions, which tells them to use Unicode mode.
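The byte-versus-character distinction behind mb_* and the 'u' modifier can be shown in a few lines. This sketch is in Python rather than PHP, so it is only an analogy: encoded byte length plays the role of strlen(), character length the role of mb_strlen($s, 'UTF-8'), and a bytes regex behaves roughly like a preg_* pattern without 'u'.

```python
import re

s = "na\u00efve caf\u00e9"  # "naïve café"

# Byte count of the UTF-8 encoding: what PHP's strlen() reports.
byte_len = len(s.encode("utf-8"))
# Character (code point) count: what mb_strlen($s, 'UTF-8') reports.
char_len = len(s)
print(byte_len, char_len)  # 12 10

# A str pattern is Unicode-aware, roughly like a preg_* pattern with the 'u' modifier.
words_unicode = re.findall(r"\w+", s)
# A bytes pattern works byte by byte, roughly like a preg_* pattern without 'u'.
words_bytes = re.findall(rb"\w+", s.encode("utf-8"))
print(words_unicode)  # ['naïve', 'café']
print(words_bytes)    # [b'na', b've', b'caf']
```

The byte-oriented pattern splits words at every accented letter, which is the kind of silent breakage the 'u' modifier avoids.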

Are dynamic mysql queries with sql escaping just as secure as prepared statements?

8 votes

I have an application that would greatly benefit from using dynamic MySQL queries in combination with mysql_real_escape_string (or mysqli_real_escape_string). If I ran all data received from the user through real escape string, would it be just as secure as using MySQL prepared statements?

Yes, but a qualified yes.

You need to properly escape 100% of the input, and put every escaped value inside a quoted string literal. And you need to properly set the character set (if you're using the C API, you need to call mysql_set_character_set() instead of SET NAMES). If you miss one tiny thing, you're vulnerable. So it's yes, as long as you do everything right...

And that's the reason a lot of people will recommend prepared queries. Not because they are any safer. But because they are more forgiving...
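The "forgiving" point can be illustrated by forgetting to escape once. This sketch uses Python's sqlite3 with `?` placeholders as a stand-in for MySQL prepared statements, with a made-up `users` table: pasted into the SQL text, the input's quote closes the literal; bound as a parameter, the same input is only ever compared as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

attack = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so its quote ends the literal.
unsafe_sql = "SELECT name FROM users WHERE name = '" + attack + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input travels separately as a bound parameter and is only ever data.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (attack,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```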

MySQL Query IN() Clause Slow on Indexed Column

7 votes

I have a MySQL query that is generated by a PHP script; the query looks something like this:

SELECT * FROM Recipe_Data WHERE 404_Without_200 = 0 AND Failures_Without_Success = 0 AND RHD_No IN (10, 24, 34, 41, 43, 51, 57, 59, 61, 67, 84, 90, 272, 324, 402, 405, 414, 498, 500, 501, 510, 559, 562, 595, 632, 634, 640, 643, 647, 651, 703, 714, 719, 762, 765, 776, 796, 812, 814, 815, 822, 848, 853, 855, 858, 866, 891, 920, 947, 956, 962, 968, 1049, 1054, 1064, 1065, 1070, 1100, 1113, 1119, 1130, 1262, 1287, 1292, 1313, 1320, 1327, 1332, 1333, 1335, 1340, 1343, 1344, 1346, 1349, 1352, 1358, 1362, 1365, 1482, 1495, 1532, 1533, 1537, 1549, 1550, 1569, 1571, 1573, 1574, 1596, 1628, 1691, 1714, 1720, 1735, 1755, 1759, 1829, 1837, 1844, 1881, 1919, 2005, 2022, 2034, 2035, 2039, 2054, 2076, 2079, 2087, 2088, 2089, 2090, 2091, 2092, 2154, 2155, 2156, 2157, 2160, 2162, 2164, 2166, 2169, 2171, 2174, 2176, 2178, 2179, 2183, 2185, 2186, 2187, 2201, 2234, 2236, 2244, 2245, 2250, 2255, 2260, 2272, 2280, 2281, 2282, 2291, 2329, 2357, 2375, 2444, 2451, 2452, 2453, 2454, 2456, 2457, 2460, 2462, 2464, 2465, 2467, 2468, 2469, 2470, 2473, 2474, 2481, 2485, 2487, 2510, 2516, 2519, 2525, 2540, 2545, 2547, 2553, 2571, 2579, 2580, 2587, 2589, 2597, 2602, 2611, 2629, 2660, 2662, 2700, 2756, 2825, 2833, 2835, 2858, 2958, 2963, 2964, 3009, 3090, 3117, 3118, 3120, 3121, 3122, 3123, 3126, 3127, 3129, 3130, 3133, 3135, 3137, 3138, 3139, 3141, 3142, 3145, 3146, 3147, 3151, 3152, 3155, 3193, 3201, 3204, 3219, 3221, 3222, 3223, 3224, 3225, 3226, 3227, 3228, 3229, 3231, 3232, 3233, 3234, 3235, 3237, 3239, 3246, 3250, 3253, 3259, 3261, 3291, 3315, 3328, 3377, 3381, 3383, 3384, 3385, 3387, 3388, 3389, 3390, 3396, 3436, 3463, 3465, 3467, 3470, 3471, 3484, 3507, 3515, 3554, 3572, 3641, 3672, 3683, 3689, 3690, 3692, 3693, 3694, 3697, 3698, 3705, 3711, 3713, 3715, 3716, 3717, 3719, 3720, 3722, 3726, 3727, 3732, 3737, 3763, 3767, 3770, 3771, 3772, 3773, 3803, 3810, 3812, 3816, 3846, 3847, 3848, 3851, 3874, 3882, 3902, 3903, 3906, 3908, 3916, 3924, 3967, 3987, 4006, 4030, 4043, 4045, 4047, 4058, 4067, 
4107, 4108, 4114, 4115, 4131, 4132, 4133, 4137, 4138, 4139, 4140, 4141, 4142, 4146, 4150, 4151, 4152, 4153, 4157, 4158, 4160, 4163, 4166, 4167, 4171, 4179, 4183, 4221, 4225, 4242, 4257, 4435, 4437, 4438, 4443, 4446, 4449, 4450, 4451, 4452, 4454, 4460, 4550, 4557, 4618, 4731, 4775, 4804, 4972, 5025, 5026, 5039, 5042, 5294, 5578, 5580, 5599, 5602, 5649, 5726, 5779, 5783, 5931, 5934, 5936, 5939, 5940, 5941, 5978, 6044, 6056, 6113, 6116, 6118, 6122, 6123, 6125, 6127, 6128, 6129, 6130, 6131, 6135, 6141, 6145, 6147, 6150, 6152, 6153, 6154, 6160, 6166, 6169);

The column RHD_No is the primary key for this table, and there are about 400,000 rows in total. The problem is that the query is extremely slow: it often takes around 2 seconds, and I've seen it take as long as 10.

When I run EXPLAIN on the query, everything looks like it should be fine:

+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | Recipe_Data | range | PRIMARY       | PRIMARY | 4       | NULL |  420 | Using where |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+

When I profile the query I get:

mysql> show profile;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000015 |
| checking query cache for query | 0.000266 |
| Opening tables                 | 0.000009 |
| System lock                    | 0.000004 |
| Table lock                     | 0.000006 |
| init                           | 0.000115 |
| optimizing                     | 0.000038 |
| statistics                     | 0.000797 |
| preparing                      | 0.000047 |
| executing                      | 0.000002 |
| Sending data                   | 2.675270 |
| end                            | 0.000007 |
| query end                      | 0.000003 |
| freeing items                  | 0.000071 |
| logging slow query             | 0.000002 |
| logging slow query             | 0.000058 |
| cleaning up                    | 0.000005 |
+--------------------------------+----------+

I've been working on this problem for a long time and I haven't been able to find a solution. Is there anything overtly wrong with this query? I don't see how looking at 420 rows should take 2+ seconds.

You are accessing 420 rows by primary key, which will probably lead to an index access path. This could access 2 index pages and one data page per key. If these are in cache, the query should run fast. If not, every page access that goes to disk incurs the usual disk latency. If we assume 5 ms disk latency and 80% cache hits, we arrive at 420*3*0.2*5ms = 1.26 seconds, which is on the order of what you see.
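The estimate is easy to rerun with different assumptions. In this small sketch, the 3 pages per key, the 80% hit rate and the 5 ms latency are the answer's assumptions, not measurements from the asker's server.

```python
rows = 420               # keys looked up through the PRIMARY KEY range
pages_per_key = 3        # assumed: 2 index pages + 1 data page
miss_rate = 0.2          # assumed: 80% of page reads hit the cache
disk_latency_s = 0.005   # assumed: 5 ms per uncached page read

uncached_reads = rows * pages_per_key * miss_rate
estimate_s = uncached_reads * disk_latency_s
print(f"{uncached_reads:.0f} uncached reads, about {estimate_s:.2f} s")  # 252 uncached reads, about 1.26 s
```

Changing `miss_rate` shows how sensitive the result is to caching: at a 5% miss rate, for example, the same formula gives roughly a third of a second.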

Hamming distance on binary strings in SQL

7 votes

I have a table in my DB where I store SHA256 hashes in a BINARY(32) column. I'm looking for a way to compute the Hamming distance of the entries in the column to a supplied value, i.e. something like:

SELECT * FROM table 
  ORDER BY HAMMINGDISTANCE(hash, UNHEX(<insert supplied sha256 hash here>)) ASC 
  LIMIT 10

(in case you're wondering, the Hamming distance of strings A and B is defined as BIT_COUNT(A^B), where ^ is the bitwise XOR operator and BIT_COUNT returns the number of 1s in the binary string).

Now, I know that both the ^ operator and the BIT_COUNT function only work on INTEGERs, so I'd say that probably the only way to do it would be to break up the binary strings into substrings, cast each binary substring to an integer, compute the Hamming distance substring-wise and then add them up. The problem with this is that it sounds terribly complicated, inefficient and definitely not elegant. My question therefore is: could you suggest a better way? (Please note that I'm on shared hosting and therefore I can't modify the DB server or load libraries.)

edit(1): Obviously loading the whole table in PHP and doing the computations there would be possible but I'd rather avoid it because this table will probably grow quite large.

edit(2): The DB server is MySQL 5.1

edit(3): My answer below contains the code that I just described above.

edit(4): I just found out that using 4 BIGINTs to store the hash instead of a BINARY(32) yields massive speed improvements (more than 100 times faster). See the comments to my answer below.

FWIW, this is the code I was hinting at while explaining the problem. Better ways to accomplish the same thing are welcome (I especially don't like the binary > hex > decimal conversions):

CREATE FUNCTION HAMMINGDISTANCE(A BINARY(32), B BINARY(32))
RETURNS INT DETERMINISTIC
RETURN 
  BIT_COUNT(
    CONV(HEX(SUBSTRING(A, 1,  8)), 16, 10) ^ 
    CONV(HEX(SUBSTRING(B, 1,  8)), 16, 10)
  ) +
  BIT_COUNT(
    CONV(HEX(SUBSTRING(A, 9,  8)), 16, 10) ^ 
    CONV(HEX(SUBSTRING(B, 9,  8)), 16, 10)
  ) +
  BIT_COUNT(
    CONV(HEX(SUBSTRING(A, 17, 8)), 16, 10) ^ 
    CONV(HEX(SUBSTRING(B, 17, 8)), 16, 10)
  ) +
  BIT_COUNT(
    CONV(HEX(SUBSTRING(A, 25, 8)), 16, 10) ^ 
    CONV(HEX(SUBSTRING(B, 25, 8)), 16, 10)
  );
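For checking the stored function's results on a few rows, a reference implementation outside the database can help. This is a Python sketch, not part of the original answer, that uses the same split as the SQL function and the 4-BIGINT layout from edit(4): four 8-byte (64-bit) chunks, XORed and popcounted, then summed. `hamming_distance` is a hypothetical name.

```python
import hashlib


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two 32-byte values (e.g. SHA-256 digests)."""
    if len(a) != 32 or len(b) != 32:
        raise ValueError("expected two 32-byte values")
    total = 0
    # Same split as the SQL function: four 8-byte (64-bit) chunks.
    for offset in range(0, 32, 8):
        x = int.from_bytes(a[offset:offset + 8], "big")
        y = int.from_bytes(b[offset:offset + 8], "big")
        total += bin(x ^ y).count("1")  # popcount, like MySQL's BIT_COUNT()
    return total


h1 = hashlib.sha256(b"hello").digest()
h2 = hashlib.sha256(b"hello!").digest()
print(hamming_distance(h1, h1))  # 0
print(hamming_distance(h1, h2))
```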

table structure for personal messages

7 votes

Hi.

What is the best table structure for storing dialogs between users in private messages? Each user can send a personal message to many recipients. Each message has a flag for the sender: whether the message is deleted or not. Each message has a flag for the receiver: whether the message is unread, read or deleted. Each message can be deleted (by setting the 'deleted' flag).

PrivateMessages' main page should look like this:

E.g. User1 sends Message1 to User2 and User3. On the private messages page I have to show the same message twice:

  1. sent Message1 to user2
  2. sent Message1 to user3

Next step: User2 replies with Message2, and I'll see the following on the same page:

  1. received Message2 from user2 (reply to Message1)
  2. sent Message1 to user3

Next step: I answer with Message3, and I'll see:

  1. sent Message3 to user2
  2. sent Message1 to user3

and so on.

Can anyone provide a table-structure? I'm using MySQL 5.5

Main question: how can I get only the last non-deleted message of each dialog?

Update:

On the main page I need to see the list of dialogs between the current user and other users (with pagination, sorted by date DESC).

I will answer your main question first, then show the table structure I will use for this.

To get only the last non-deleted message of a particular dialog:

select
    Message.Id
   ,Message.Subject
   ,Message.Content
from Message
join Junc_Message_To on Fk_Message = Message.Id
where Junc_Message_To.Fk_User =  {RECIPIENT_ID}
  and Message.Fk_User__From   =  {SENDER_ID}
  and Junc_Message_To.Deleted is null
order by Junc_Message_To.Sent desc
limit 1

A simple three-table structure could be used.

Table 1 stores user records: one record per user.

Table 2 stores message records: one record per message, with a foreign key to the user who sent the message.

Table 3 stores the correlation between messages and the users they were sent to.

[Table diagram for User, Message and Junc_Message_To: image not available]

Here is the SQL that is used to create the above table diagram:

create table `User` (
  `Id`            int          not null auto_increment ,
  `Username`      varchar(32)  not null ,
  `Password`      varchar(32)  not null ,
  primary key     (`Id`) ,
  unique index     `Username_UNIQUE` (`Username` ASC) )
engine = InnoDB;

create table `Message` (
  `Id`            int          not null auto_increment ,
  `Fk_User__From` int          not null ,
  `Subject`       varchar(256) not null ,
  `Content`       text         not null ,
  primary key   (`Id`) ,
  index          `Fk_Message_User__From` (`Fk_User__From` ASC) ,
  constraint     `Fk_Message_User__From`
    foreign key (`Fk_User__From` )
    references   `User` (`Id` )
    on delete cascade
    on update cascade)
engine = InnoDB;

create table `Junc_Message_To` (
  `Fk_Message`    int          not null ,
  `Fk_User`       int          not null ,
  `Sent`          datetime     not null ,
  `Read`          datetime     null ,
  `Deleted`       datetime     null ,
  PRIMARY KEY    (`Fk_Message`, `Fk_User`) ,
  INDEX           `Fk_Junc_Message_To__Message` (`Fk_Message` ASC) ,
  INDEX           `Fk_Junc_Message_To__User` (`Fk_User` ASC) ,
  constraint      `Fk_Junc_Message_To__Message`
    foreign key  (`Fk_Message` )
    references    `Message` (`Id` )
    on delete cascade
    on update cascade,
  constraint      `Fk_Junc_Message_To__User`
    foreign key  (`Fk_User` )
    references    `User` (`Id` )
    on delete cascade
    on update cascade)
engine = InnoDB;
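To see the "last non-deleted message" query work against this design, here is a trimmed-down run in Python's sqlite3. It is SQLite rather than MySQL, the Content and Password columns are dropped, the sample rows are made up, and Read/Deleted are nullable because the query's "Deleted is null" test needs them to be. ISO-8601 text timestamps sort correctly, so ORDER BY Sent DESC picks the newest message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (Id INTEGER PRIMARY KEY, Username TEXT NOT NULL UNIQUE);
    CREATE TABLE Message (Id INTEGER PRIMARY KEY,
                          Fk_User__From INTEGER NOT NULL REFERENCES User(Id),
                          Subject TEXT NOT NULL);
    CREATE TABLE Junc_Message_To (Fk_Message INTEGER NOT NULL REFERENCES Message(Id),
                                  Fk_User INTEGER NOT NULL REFERENCES User(Id),
                                  Sent TEXT NOT NULL, "Read" TEXT, Deleted TEXT,
                                  PRIMARY KEY (Fk_Message, Fk_User));
    INSERT INTO User VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
    INSERT INTO Message VALUES (1, 1, 'Message1'), (2, 2, 'Message2'),
                               (3, 1, 'Message3'), (4, 1, 'Message4');
    INSERT INTO Junc_Message_To VALUES
        (1, 2, '2011-01-01 10:00:00', NULL, NULL),
        (1, 3, '2011-01-01 10:00:00', NULL, NULL),
        (2, 1, '2011-01-01 11:00:00', NULL, NULL),
        (3, 2, '2011-01-01 12:00:00', NULL, NULL),
        (4, 2, '2011-01-01 13:00:00', NULL, '2011-01-01 13:05:00');
""")

LAST_MESSAGE = """
    SELECT Message.Id, Message.Subject
    FROM Message
    JOIN Junc_Message_To ON Junc_Message_To.Fk_Message = Message.Id
    WHERE Junc_Message_To.Fk_User = ?
      AND Message.Fk_User__From = ?
      AND Junc_Message_To.Deleted IS NULL
    ORDER BY Junc_Message_To.Sent DESC
    LIMIT 1
"""
last = conn.execute(LAST_MESSAGE, (2, 1)).fetchone()  # recipient user2, sender user1
print(last)  # (3, 'Message3'): Message4 is newer but was deleted
```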

Preventing SQL injection using ONLY php

7 votes

The thing is that I have a complete, working website with many calls to the MySQL server, and while doing some research on this site I saw that by writing my queries in this form:

$query = sprintf("SELECT * FROM users WHERE user='%s' AND password='%s'",
            mysql_real_escape_string($user),
            mysql_real_escape_string($password));

I can solve the security issue. But, as I said, I have many calls to the MySQL server, and the best way (in my case) to solve the problem is to go directly to the variables I'm passing to the query, without using a MySQL function, because I'm outside the query. Let me explain; I have this:

mysql_query("SELECT * FROM `post` WHERE id=" . $_GET['edit']);

I can't modify this query because I have a lot of these throughout my code; instead, I prefer to check the variable $_GET['edit'] for injections.

How can I check the variables used in my queries for SQL injection using pure PHP? Like:

$_GET['edit']=freehack($_GET['edit']);

Don't do it this way. By replacing the value of your $_GET parameters with "safe" versions, you are contaminating your input data, which you may need for other places.

Only escape data when you need to use it on the database layer. It will only take you a little time to fix your queries, and will save you a ton of headache in the long run.

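In any case, what you are doing is still not secure! In your id= query the value is not inside quotes, so escaping it would not stop input such as 1 OR 1=1. See: PHP: Is mysql_real_escape_string sufficient for cleaning user input?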

You really should be using prepared queries with PDO. Also, you should be checking your user input for validity before using it in a query.
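The advice above (leave the input untouched, validate it, then bind it at the database layer) might look like this. The sketch uses Python's sqlite3 as a stand-in for PDO; `parse_edit_id`, `load_post` and the `post` rows are hypothetical, and the validator accepts only ASCII digits because `id` is numeric.

```python
import re
import sqlite3


def parse_edit_id(raw):
    """Hypothetical validator: accept only a string of ASCII digits, return an int."""
    if not isinstance(raw, str) or not re.fullmatch(r"[0-9]+", raw):
        raise ValueError(f"invalid id: {raw!r}")
    return int(raw)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO post (id, title) VALUES (?, ?)", [(1, "first"), (2, "second")])


def load_post(raw_id):
    # Validate at the point of use and bind the value; the raw input is left untouched.
    post_id = parse_edit_id(raw_id)
    return conn.execute("SELECT id, title FROM post WHERE id = ?", (post_id,)).fetchone()


print(load_post("2"))  # (2, 'second')
```

Input such as "1 OR 1=1" is rejected before it reaches the query, and even a value that slipped through would still be bound as data.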

Mysql Slow Insert

7 votes

Hi,

I have the following InnoDB table:

+-----------+-----------+------+-----+-------------------+----------------+
| Field     | Type      | Null | Key | Default           | Extra          |
+-----------+-----------+------+-----+-------------------+----------------+
| id        | int(11)   | NO   | PRI | NULL              | auto_increment |
| doc_id    | char(32)  | NO   |     | NULL              |                |
| staff     | char(18)  | NO   |     | NULL              |                |
| timestamp | timestamp | NO   | MUL | CURRENT_TIMESTAMP |                |
+-----------+-----------+------+-----+-------------------+----------------+

With this keys:

+--------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name        | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| staff_online |          0 | PRIMARY         |            1 | id          | A         |      277350 |     NULL | NULL   |      | BTREE      |         |
| staff_online |          1 | timestamp       |            1 | timestamp   | A         |      277350 |     NULL | NULL   |      | BTREE      |         |
| staff_online |          1 | staff_timestamp |            1 | timestamp   | A         |      277350 |     NULL | NULL   |      | BTREE      |         |
| staff_online |          1 | staff_timestamp |            2 | staff       | A         |      277350 |     NULL | NULL   |      | BTREE      |         |
+--------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

I just noticed that mysql-slow.log sometimes contains an INSERT query on this table that takes more than 1 second:

INSERT INTO `staff_online` (`doc_id`, `staff`, `timestamp`) VALUES ('150b60a0ab8c5888bdbbb80bd8b7f8a2', 'asia', '2011-01-29 16:52:54')

I'm really puzzled about why it takes so long. How can I speed it up?

BTW: each day there are about 80 slow INSERTs and 40 slow UPDATEs like this.

Sometimes it is not the query itself that causes a slowdown: another query operating on the table can easily cause inserts to slow down due to transaction isolation and locking. Your slow queries might simply have been waiting for other transactions to complete. This is fairly common on a busy table, or if your server is executing long or complex transactions.

Another significant factor is the overall performance of your database: how your my.cnf file is tuned, how the server itself is tuned, what else the server has running on it, and of course what hardware the server is running on.

The Linux tool mytop and the query SHOW ENGINE INNODB STATUS\G can help you spot possible trouble areas. General Linux performance tools can also show how busy your disks are, etc.

Given the nature of this table, have you considered an alternative way to keep track of who is online? In MySQL, I have used a MEMORY table for such purposes in the past. A NoSQL data store might also be good for this type of information. Redis could store this as a sorted set (score == timestamp) with much success.
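The sorted-set idea keeps one entry per staff member, with the last-seen time as its score, instead of inserting a new row on every hit. The sketch below is an in-memory Python stand-in, not Redis itself; `OnlineTracker` is hypothetical, and the comments name the Redis commands each method roughly corresponds to.

```python
class OnlineTracker:
    """In-memory stand-in for a Redis sorted set: member = staff, score = last-seen time."""

    def __init__(self):
        self._last_seen = {}  # staff name -> Unix time of last activity

    def touch(self, staff, now):
        # Like ZADD: re-adding an existing member only updates its score,
        # so there is one entry per staff member instead of one row per hit.
        self._last_seen[staff] = now

    def online(self, now, window):
        # Like ZRANGEBYSCORE with min = now - window and max = +inf.
        cutoff = now - window
        active = [(t, m) for m, t in self._last_seen.items() if t >= cutoff]
        return [m for t, m in sorted(active)]

    def prune(self, now, window):
        # Like ZREMRANGEBYSCORE for every score below now - window.
        cutoff = now - window
        self._last_seen = {m: t for m, t in self._last_seen.items() if t >= cutoff}


tracker = OnlineTracker()
tracker.touch("asia", 100)
tracker.touch("bob", 150)
tracker.touch("asia", 170)
print(tracker.online(now=200, window=60))  # ['bob', 'asia']
```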


INT vs VARCHAR in search

6 votes

Which one of the following queries will be faster and more optimal (and why):

  1. SELECT * FROM items WHERE w = 320 AND h = 200 (w and h are INT)

  2. SELECT * FROM items WHERE dimensions = '320x200' (dimensions is VARCHAR)

Here are some actual measurements. (Using SQLite; may try it with MySQL later.)

Data = All 1,000,000 combinations of w, h ∈ {1...1000}, in randomized order.

CREATE TABLE items (id INTEGER PRIMARY KEY, w INTEGER, h INTEGER)

Average time (of 20 runs) to execute SELECT * FROM items WHERE w = 320 and h = 200 was 5.39±0.29 µs.

CREATE TABLE items (id INTEGER PRIMARY KEY, dimensions TEXT)

Average time to execute SELECT * FROM items WHERE dimensions = '320x200' was 5.69±0.23 µs.

There is no significant difference, efficiency-wise.

But

There is a huge difference in terms of usability. For example, if you want to calculate the area and perimeter of the rectangles, the two-column approach is easy:

SELECT w * h, 2 * (w + h) FROM items

Try to write the corresponding query for the other way.
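Here is what "the corresponding query for the other way" ends up looking like, run with Python's sqlite3 (SQLite, like the measurements above) on one made-up row per layout. The text column has to be split and cast before any arithmetic; in MySQL, SUBSTRING_INDEX(dimensions, 'x', 1) would be the usual way to take the width.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items_int (id INTEGER PRIMARY KEY, w INTEGER, h INTEGER)")
conn.execute("CREATE TABLE items_txt (id INTEGER PRIMARY KEY, dimensions TEXT)")
conn.execute("INSERT INTO items_int (w, h) VALUES (320, 200)")
conn.execute("INSERT INTO items_txt (dimensions) VALUES ('320x200')")

# Two integer columns: the arithmetic is direct.
two_col = conn.execute("SELECT w * h, 2 * (w + h) FROM items_int").fetchall()

# One text column: every query has to split and cast the string first.
one_col = conn.execute("""
    SELECT w * h, 2 * (w + h)
    FROM (
        SELECT CAST(substr(dimensions, 1, instr(dimensions, 'x') - 1) AS INTEGER) AS w,
               CAST(substr(dimensions, instr(dimensions, 'x') + 1) AS INTEGER) AS h
        FROM items_txt
    ) AS parsed
""").fetchall()

print(two_col, one_col)  # [(64000, 1040)] [(64000, 1040)]
```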

Implementing a live voting system

6 votes

I'm looking at implementing a live voting system on my website. The website provides a live stream, and I'd like to be able to prompt viewers to select an answer during a vote initiated by the caster. I understand how to store the data in a MySQL database and how to process the answers. However:

How would I initially start the vote on the client-side and display it? Should a script be running every few seconds on the page, checking another page to see if a question is available for the user?

Are there any existing examples of a real-time polling system such as what I'm looking at implementing?

You would have to query the server for a new question every few seconds.

The alternative is to hold the connection open until the server sends more data or it times out, which reduces (but does not eliminate) the number of server hits. This is called "long polling". http://en.wikipedia.org/wiki/Push_technology
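The server side of long polling comes down to "block until something newer exists, or give up after a timeout". This is an in-process Python sketch using `threading.Condition`; `QuestionBoard` and its methods are hypothetical. On a real site, `wait_for_update` would run inside an HTTP request handler, and the browser would send its next request as soon as each response (or timeout) arrives.

```python
import threading


class QuestionBoard:
    """Hypothetical server-side state for a live vote: current question plus a version."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._question = None

    def publish(self, question):
        # Called when the caster starts a vote; wakes every waiting client.
        with self._cond:
            self._version += 1
            self._question = question
            self._cond.notify_all()

    def wait_for_update(self, seen_version, timeout):
        # Long poll: return at once if there is something newer than seen_version,
        # otherwise block until publish() runs or the timeout expires. On timeout
        # return None, and the client simply issues its next poll.
        with self._cond:
            if not self._cond.wait_for(lambda: self._version > seen_version, timeout):
                return None
            return self._version, self._question


board = QuestionBoard()
threading.Timer(0.1, board.publish, args=("Which map next?",)).start()
print(board.wait_for_update(seen_version=0, timeout=5.0))  # (1, 'Which map next?')
```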

MySQL query - possible to include this clause?

6 votes

I have the following query, which retrieves 4 adverts from certain categories in a random order.

At the moment, if a user has more than 1 advert, then potentially all of those ads might be retrieved - I need to limit it so that only 1 ad per user is displayed.

Is this possible to achieve in the same query?

SELECT      a.advert_id, a.title, a.url, a.user_id, 
            FLOOR(1 + RAND() * x.m_id) 'rand_ind' 

FROM        adverts AS a
INNER JOIN  advert_categories AS ac
ON          a.advert_id = ac.advert_id,
(
            SELECT MAX(t.advert_id) - 1 'm_id' 
            FROM adverts t
)           x

WHERE       ac.category_id IN 
(
            SELECT category_id
            FROM website_categories
            WHERE website_id = '8'
)
AND         a.advert_type = 'text'

GROUP BY    a.advert_id
ORDER BY    rand_ind 
LIMIT       4

Note: The solution is the last query at the bottom of this answer.

Test Schema and Data

create table adverts (
    advert_id int primary key, title varchar(20), url varchar(20), user_id int, advert_type varchar(10))
;
create table advert_categories (
    advert_id int, category_id int, primary key(category_id, advert_id))
;
create table website_categories (
    website_id int, category_id int, primary key(website_id, category_id))
;
insert website_categories values
    (8,1),(8,3),(8,5),
    (1,1),(2,3),(4,5)
;
insert adverts (advert_id, title, user_id) values
    (1, 'StackExchange', 1),
    (2, 'StackOverflow', 1),
    (3, 'SuperUser', 1),
    (4, 'ServerFault', 1),
    (5, 'Programming', 1),
    (6, 'C#', 2),
    (7, 'Java', 2),
    (8, 'Python', 2),
    (9, 'Perl', 2),
   (10, 'Google', 3)
;
update adverts set advert_type = 'text'
;
insert advert_categories values
    (1,1),(1,3),
    (2,3),(2,4),
    (3,1),(3,2),(3,3),(3,4),
    (4,1),
    (5,4),
    (6,1),(6,4),
    (7,2),
    (8,1),
    (9,3),
   (10,3),(10,5)
;

Data properties

  • each website can belong to multiple categories
  • for simplicity, all adverts are of type 'text'
  • each advert can belong to multiple categories. If a website has multiple categories that are matched multiple times in advert_categories for the same advert_id, this causes the advert_id's to show twice when using a straight join between the 3 tables in the next query.

This query joins the 3 tables together (notice that ids 1, 3 and 10 each appear twice)

select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
inner join adverts a on a.advert_id = ac.advert_id and  a.advert_type = 'text'
where wc.website_id='8'
order by a.advert_id

To make each advert show only once, this is the core query, which lists all eligible ads, each only once:

        select *
        from adverts a
        where a.advert_type = 'text'
          and exists (
            select *
            from website_categories wc
            inner join advert_categories ac on wc.category_id = ac.category_id
            where wc.website_id='8'
              and a.advert_id = ac.advert_id)
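The duplicate-versus-EXISTS difference can be checked against the test data. This is a minimal sketch in Python's sqlite3 (SQLite, not MySQL), assuming the title and url columns can be dropped, since neither query uses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adverts (advert_id INTEGER PRIMARY KEY, user_id INTEGER, advert_type TEXT);
    CREATE TABLE advert_categories (advert_id INTEGER, category_id INTEGER,
                                    PRIMARY KEY (category_id, advert_id));
    CREATE TABLE website_categories (website_id INTEGER, category_id INTEGER,
                                     PRIMARY KEY (website_id, category_id));
    INSERT INTO website_categories VALUES (8,1),(8,3),(8,5),(1,1),(2,3),(4,5);
    INSERT INTO adverts VALUES (1,1,'text'),(2,1,'text'),(3,1,'text'),(4,1,'text'),
        (5,1,'text'),(6,2,'text'),(7,2,'text'),(8,2,'text'),(9,2,'text'),(10,3,'text');
    INSERT INTO advert_categories VALUES (1,1),(1,3),(2,3),(2,4),(3,1),(3,2),(3,3),(3,4),
        (4,1),(5,4),(6,1),(6,4),(7,2),(8,1),(9,3),(10,3),(10,5);
""")

# Straight three-way join: an advert appears once per matching category.
joined = [r[0] for r in conn.execute("""
    SELECT a.advert_id
    FROM website_categories wc
    JOIN advert_categories ac ON wc.category_id = ac.category_id
    JOIN adverts a ON a.advert_id = ac.advert_id AND a.advert_type = 'text'
    WHERE wc.website_id = 8
    ORDER BY a.advert_id""")]

# Core query: EXISTS only asks whether a match exists, so each advert appears once.
eligible = [r[0] for r in conn.execute("""
    SELECT a.advert_id
    FROM adverts a
    WHERE a.advert_type = 'text'
      AND EXISTS (SELECT 1
                  FROM website_categories wc
                  JOIN advert_categories ac ON wc.category_id = ac.category_id
                  WHERE wc.website_id = 8 AND a.advert_id = ac.advert_id)
    ORDER BY a.advert_id""")]

print(joined)    # [1, 1, 2, 3, 3, 4, 6, 8, 9, 10, 10]
print(eligible)  # [1, 2, 3, 4, 6, 8, 9, 10]
```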

The next query retrieves all the advert_id's to be shown

select advert_id, user_id
from (
    select
        advert_id, user_id,
        @r := @r + 1 r
    from (select @r:=0) r
    cross join 
    (
        # core query -- vvv
        select a.advert_id, a.user_id
        from adverts a
        where a.advert_type = 'text'
          and exists (
            select *
            from website_categories wc
            inner join advert_categories ac on wc.category_id = ac.category_id
            where wc.website_id='8'
              and a.advert_id = ac.advert_id)
        # core query -- ^^^
        order by rand()
    ) EligibleAdsAndUserIDs
) RowNumbered
group by user_id
order by r
limit 2

There are 3 levels to this query

  1. aliased EligibleAdsAndUserIDs: core query, sorted randomly using order by rand()
  2. aliased RowNumbered: row number added to core query, using MySQL side-effecting @variables
  3. the outermost query forces MySQL to collect the rows as numbered randomly in the inner queries, and group by user_id is relied on to retain only the first row for each user_id (MySQL does not formally guarantee which row a non-aggregated column in a GROUP BY comes from). order by r with limit 2 then returns the two user_ids whose retained rows were numbered first.
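The three levels can also be written out in Python, where "keep the first row per user" is explicit rather than left to GROUP BY. This is a sketch of the same shuffle, number and keep-first idea, not the answer's SQL; `pick_one_ad_per_user` is a hypothetical name, and the eligible (advert_id, user_id) pairs come from the test data for website 8.

```python
import random


def pick_one_ad_per_user(eligible, limit, rng=random):
    """eligible: (advert_id, user_id) pairs, e.g. the rows of the core query."""
    # Level 1: random order, like ORDER BY RAND().
    shuffled = list(eligible)
    rng.shuffle(shuffled)
    # Levels 2 and 3: number the rows (like @r := @r + 1) and keep the
    # first-numbered row for each user_id.
    first_per_user = {}
    for r, (advert_id, user_id) in enumerate(shuffled, start=1):
        first_per_user.setdefault(user_id, (r, advert_id))
    # ORDER BY r LIMIT n over the kept rows.
    kept = sorted(first_per_user.values())[:limit]
    return [advert_id for r, advert_id in kept]


# Eligible ads for website 8 from the test data, with their user_ids.
eligible = [(1, 1), (2, 1), (3, 1), (4, 1), (6, 2), (8, 2), (9, 2), (10, 3)]
print(pick_one_ad_per_user(eligible, 2, random.Random(7)))
```

A user with more eligible ads is more likely to have one of them shuffled near the front, which is the proportional effect described below.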

This is the final query, which takes the advert_id's from the previous query and joins them back to table adverts to retrieve the required columns. The result:

  1. shows only one ad per user_id
  2. features users with more ads proportionally (statistically) to the number of eligible ads they have

Note: Point (2) works because the more ads you have, the more likely you will hit the top placings in the row numbering subquery

select a.advert_id, a.title, a.url, a.user_id
from
(
    select advert_id
    from (
        select
            advert_id, user_id,
            @r := @r + 1 r
        from (select @r:=0) r
        cross join 
        (
            # core query -- vvv
            select a.advert_id, a.user_id
            from adverts a
            where a.advert_type = 'text'
              and exists (
                select *
                from website_categories wc
                inner join advert_categories ac on wc.category_id = ac.category_id
                where wc.website_id='8'
                  and a.advert_id = ac.advert_id)
            # core query -- ^^^
            order by rand()
        ) EligibleAdsAndUserIDs
    ) RowNumbered
    group by user_id
    order by r
    limit 2
) Top2
inner join adverts a on a.advert_id = Top2.advert_id;

Get total hours worked in a day mysql

6 votes

Hi all,

I have a MySQL table where employee login and logout times are recorded. In the in_out column, 1 represents login and 0 represents logout.

  [id]   [User_id]           [Date_time]                 [in_out]
    1       1          2011-01-20 09:30:03                  1
    2       1          2011-01-20 11:30:43                  0
    3       1          2011-01-20 11:45:12                  1
    4       1          2011-01-20 12:59:56                  0
    5       1          2011-01-20 13:33:11                  1
    6       1          2011-01-20 15:38:16                  0
    7       1          2011-01-20 15:46:23                  1
    8       1          2011-01-20 17:42:45                  0

Is it possible to retrieve the total hours worked in a day by a user using a single query? I tried a lot, but all in vain. I can do this in PHP using an array, but I'm unable to do it in a single query.

Thanks in advance...

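SELECT `User_id`,
       DATE(`Date_time`) AS work_day,
       SEC_TO_TIME(SUM(UNIX_TIMESTAMP(`Date_time`) * (1 - 2 * `in_out`))) AS hours_worked
  FROM `whatever_table`
 GROUP BY `User_id`, DATE(`Date_time`);

The (1-2*`in_out`) term gives every login event a -1 factor and every logout event a +1 factor, so the sum adds up (logout time - login time) for every session. UNIX_TIMESTAMP() converts each Date_time to seconds first; multiplying the DATETIME directly would use its numeric form (YYYYMMDDHHMMSS), whose differences are not real durations. SEC_TO_TIME() turns the total number of seconds back into a time, and GROUP BY `User_id`, DATE(`Date_time`) produces one total per user per day. This assumes every login has a matching logout on the same day.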
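To check the signed-sum idea outside MySQL, here is a small Python sketch (not part of the original answer) using the sample rows from the question. It converts each timestamp to seconds, the role UNIX_TIMESTAMP() would play (time zones ignored), and also computes what the sum gives when each DATETIME is treated as the number YYYYMMDDHHMMSS.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Sample rows from the question: (User_id, Date_time, in_out); 1 = login, 0 = logout
EVENTS = [
    (1, "2011-01-20 09:30:03", 1),
    (1, "2011-01-20 11:30:43", 0),
    (1, "2011-01-20 11:45:12", 1),
    (1, "2011-01-20 12:59:56", 0),
    (1, "2011-01-20 13:33:11", 1),
    (1, "2011-01-20 15:38:16", 0),
    (1, "2011-01-20 15:46:23", 1),
    (1, "2011-01-20 17:42:45", 0),
]


def hours_worked(events) -> dict[tuple[int, date], timedelta]:
    """Signed-sum trick: add each logout time, subtract each login time, per user per day."""
    totals = defaultdict(int)
    for user_id, stamp, in_out in events:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        seconds = int((ts - EPOCH).total_seconds())  # seconds since 1970, like UNIX_TIMESTAMP()
        totals[(user_id, ts.date())] += seconds * (1 - 2 * in_out)
    return {key: timedelta(seconds=s) for key, s in totals.items()}


def naive_numeric_sum(events) -> int:
    """The same signed sum, but with each DATETIME read as the number YYYYMMDDHHMMSS."""
    return sum(
        int(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").strftime("%Y%m%d%H%M%S")) * (1 - 2 * in_out)
        for _, stamp, in_out in events
    )


print(hours_worked(EVENTS)[(1, date(2011, 1, 20))])  # 7:16:51
print(naive_numeric_sum(EVENTS))                     # 71611
```

The seconds-based total is 26211 s (7:16:51). The YYYYMMDDHHMMSS version sums to 71611, which reads as 07:16:11, 40 seconds short, because plain decimal subtraction does not borrow correctly across minutes and hours.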

How to append data from SQL to an existing file

6 votes

MySQL has the option to dump data into a file using the INTO OUTFILE option, for example:

SELECT * FROM FishReport INTO OUTFILE './FishyFile'

The problem is that this command only works if the file does not already exist: it creates the file and then writes the data. Is there any way to append data to a file this way?

As the MySQL page on SELECT syntax suggests:

http://dev.mysql.com/doc/refman/5.0/en/select.html

the alternative to this is to issue the SELECT from the MySQL client:

However, if the MySQL client software is installed on the remote machine,
you can instead use a client command such as mysql -e "SELECT ..." > file_name 
to generate the file on the client host. 

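which, in your case, would become the following (put the database name after the query; -N, short for --skip-column-names, stops a header row being written on every append):

mysql -N -e "SELECT * FROM FishReport" your_database >> file_name

The >> redirection simply appends to the file.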

From your Tcl script, you could simply issue this as an exec command:

http://www.tcl.tk/man/tcl/tutorial/Tcl26.html
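If the calling script were Python rather than Tcl, the same `>>` behaviour could be reproduced by opening the target file in append mode and pointing the client's stdout at it. This is a sketch under assumptions: `mysql_append_command` and `append_command_output` are hypothetical helper names, and "fish_db" is a placeholder database name. The self-check uses a stand-in command so it runs without a MySQL server.

```python
import os
import subprocess
import sys
import tempfile


def mysql_append_command(query: str, database: str) -> list[str]:
    # -N (--skip-column-names) stops a header row being repeated on every append
    return ["mysql", "-N", "-e", query, database]


def append_command_output(cmd: list[str], path: str) -> None:
    """Run cmd and append its stdout to path, like `cmd >> path` in a shell."""
    with open(path, "a", encoding="utf-8") as out:  # "a" appends, creating the file if missing
        subprocess.run(cmd, stdout=out, check=True)


# Hypothetical real usage:
# append_command_output(mysql_append_command("SELECT * FROM FishReport", "fish_db"), "file_name")

if __name__ == "__main__":
    # Self-check with a stand-in command instead of mysql
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "FishyFile")
        for _ in range(2):
            append_command_output([sys.executable, "-c", "print('row')"], target)
        with open(target, encoding="utf-8") as f:
            print(f.read(), end="")  # two "row" lines: the second run appended
```

Opening the file once per call in append mode means every run adds to the end instead of failing or overwriting, which is exactly what INTO OUTFILE refuses to do.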

Is it ever a good idea to not have an 'id' primary key for a table?

6 votes

It seems to me like it's always a good idea, but is there ever a case where you are better off not having one in a table?

In my experience, almost never. (For a "speed matters, I'm just inserting and don't really care about retrieval at this point" style of application, perhaps.)

Whilst you might conceivably never use the ID field, it's nearly always wise to have one happily AUTO_INCREMENTing away, because one day you might need it. (You could of course simply do an 'ALTER..' to add one later, but that's beside the point.)

MySql second-smallest element in each group

6 votes

Hi all, I have a table similar to the following:

    date    |   expiry
-------------------------    
2010-01-01  | 2010-02-01
2010-01-01  | 2010-03-02
2010-01-01  | 2010-04-04
2010-02-01  | 2010-03-01
2010-02-01  | 2010-04-02

In the table, each date may have multiple 'expiry' values. I need a query that returns the n-th smallest expiry for each date. For example, for n = 2, I would expect:

     date    |   expiry
-------------------------       
2010-01-01  | 2010-03-02
2010-02-01  | 2010-04-02

My trouble is that AFAIK, there is no aggregate function which returns the n'th largest/smallest element, so I can't use 'GROUP BY'. More specifically, if I had a magical MIN() aggregate that accepts a second parameter 'offset', I would write:

SELECT MIN(expiry, 1) FROM table WHERE date IN ('2010-01-01', '2010-02-01') GROUP BY date

Any suggestions?

One hack is to use GROUP_CONCAT: group by the date, concatenate the expiry dates in ascending order, and use the SUBSTRING_INDEX function to fetch the nth value.

mysql> select * from expiry;
+------------+------------+
| date       | expiry     |
+------------+------------+
| 2010-01-01 | 2010-02-01 |
| 2010-01-01 | 2010-03-02 |
| 2010-01-01 | 2010-04-04 |
| 2010-02-01 | 2010-03-01 |
| 2010-02-01 | 2010-04-02 |
+------------+------------+
5 rows in set (0.00 sec)

mysql> SELECT mdate,
       Substring_index(Substring_index(edate, ',', 2), ',', -1) AS exp_date
FROM   (SELECT `date`               AS mdate,
               GROUP_CONCAT(expiry order by expiry asc separator ",") AS edate
        FROM   expiry
        GROUP  BY mdate) e1;  
+------------+------------+
| mdate      | exp_date   |
+------------+------------+
| 2010-01-01 | 2010-03-02 |
| 2010-02-01 | 2010-04-02 |
+------------+------------+
2 rows in set (0.00 sec)

In the example here the sub-query gives the following output:

+------------+----------------------------------+
| mdate      | edate                            |
+------------+----------------------------------+
| 2010-01-01 | 2010-02-01,2010-03-02,2010-04-04 |
| 2010-02-01 | 2010-03-01,2010-04-02            |
+------------+----------------------------------+

substring_index(edate,',',2) returns the first 2 comma-separated elements (for the nth element, replace 2 with n).

+------------+------------------------------+
| mdate      | substring_index(edate,',',2) |
+------------+------------------------------+
| 2010-01-01 | 2010-02-01,2010-03-02        |
| 2010-02-01 | 2010-03-01,2010-04-02        |
+------------+------------------------------+

We then run another substring_index on that output to get only the 2nd element (the last element of the intermediate result), using substring_index(substring_index(edate,',',2),',',-1):

+------------+------------------------------------------------------+
| mdate      | substring_index(substring_index(edate,',',2),',',-1) |
+------------+------------------------------------------------------+
| 2010-01-01 | 2010-03-02                                           |
| 2010-02-01 | 2010-04-02                                           |
+------------+------------------------------------------------------+

If there are too many values to concatenate, the result can be truncated at group_concat_max_len (default 1024 bytes, but it can be set higher).

UPDATE: The SQL above returns a value even when a group has fewer than n elements (in that case it returns the group's last, i.e. largest, expiry). To return NULL instead, the SQL can be modified as:

SELECT mdate,
       IF(cnt >= 2,Substring_index(Substring_index(edate, ',', 2), ',', -1),NULL) AS exp_date
FROM   (SELECT `date`               AS mdate,
               count(expiry) as cnt,
               GROUP_CONCAT(expiry order by expiry asc separator ",") AS edate
        FROM   expiry
        GROUP  BY mdate) e1;  
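The GROUP_CONCAT / SUBSTRING_INDEX pipeline can be traced step by step in Python. This is a sketch for checking the logic, not MySQL itself: `substring_index` is a hand-written stand-in for the SQL function, and the `guard` flag (a hypothetical parameter) switches the IF(cnt >= n, ..., NULL) check from the updated query on or off.

```python
from collections import defaultdict

# Sample rows from the question: (date, expiry); ISO date strings sort chronologically
ROWS = [
    ("2010-01-01", "2010-02-01"),
    ("2010-01-01", "2010-03-02"),
    ("2010-01-01", "2010-04-04"),
    ("2010-02-01", "2010-03-01"),
    ("2010-02-01", "2010-04-02"),
]


def substring_index(s: str, delim: str, count: int) -> str:
    """Python stand-in for MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # first `count` elements (whole string if fewer)
    if count < 0:
        return delim.join(parts[count:])  # last `-count` elements
    return ""


def nth_smallest_per_group(rows, n: int, guard: bool = True) -> dict:
    groups = defaultdict(list)
    for day, expiry in rows:
        groups[day].append(expiry)
    result = {}
    for day, expiries in groups.items():
        edate = ",".join(sorted(expiries))  # GROUP_CONCAT(expiry ORDER BY expiry)
        cnt = len(expiries)                 # COUNT(expiry)
        value = substring_index(substring_index(edate, ",", n), ",", -1)
        result[day] = value if (cnt >= n or not guard) else None  # IF(cnt >= n, ..., NULL)
    return result


print(nth_smallest_per_group(ROWS, 2))  # {'2010-01-01': '2010-03-02', '2010-02-01': '2010-04-02'}
print(nth_smallest_per_group(ROWS, 3))  # {'2010-01-01': '2010-04-04', '2010-02-01': None}
```

With `guard=False` and n = 3, the 2010-02-01 group yields '2010-04-02' (its largest value) instead of None, which is the behaviour the UPDATE fixes.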

Use PHP to detect which htaccess user signed in?

5 votes

Okey dokey, I'm building an upload page so people I know can send me files securely and with ease. I want to design it so that when one of my friends signs in with their login (it's an .htaccess login), I can identify them in PHP and log their file into a database associated with their account.

In short, I need PHP to be able to detect who is signed in so I can pass that data to a database.

Is there any possible way of doing that?

You should be able to get the user name the user signed in with from the $_SERVER['REMOTE_USER'] variable after they've successfully signed in.