Best mysql questions in March 2012

SQL select some rows in a table so that they sum up to certain value

9 votes

How can I select just some rows in the following table so that they sum up to certain value?

Table
-----
id | qty1 | qty2 | qty3 | qty4
------------------------------
1  | 0.0  | 0.0  | 10   | 20
2  | 1.5  | 0.0  | 7.5  | 18
3  | 1.0  | 2.0  | 7.5  | 18
4  | 0.0  | 0.5  | 5    | 13

Let's say, the top value I want is 57...

So I need to select the rows from the previous table such that qty1+qty2+qty3+qty4 of each row, get until that 57 value, and discard the other rows. In this example, I would get the following:

id | qty1 | qty2 | qty3 | qty4
------------------------------
1  | 0.0  | 0.0  | 10   | 20
2  | 1.5  | 0.0  | 7.5  | 18

Because 10+20+1.5+7.5+18 = 57, so I discard rows 3 & 4...

Now I wish that the top value is 50, then I should get:

id | qty1 | qty2 | qty3 | qty4
------------------------------
1  | 0.0  | 0.0  | 10   | 20
2  | 1.5  | 0.0  | 7.5  | 11

Since these values sum up to 50, and the 7 from row2,qty4 is left out... (BTW the rows are ordered in this particular way because that's the order in which I wish to account for the sums of qtys... It's not valid to sum up first row1, then 3, then 2 then 4, for example... They should always be sumed in the order 1,2,3,4...)

What if I would like the complement of this? I mean, the other two rows I didn't got in the last result.

First case:

id | qty1 | qty2 | qty3 | qty4
------------------------------
3  | 1.0  | 2.0  | 7.5  | 18
4  | 0.0  | 0.5  | 5    | 13

Second case:

id | qty1 | qty2 | qty3 | qty4
------------------------------
2  | 0.0  | 0.0  | 0.0  | 7
3  | 1.0  | 2.0  | 7.5  | 18
4  | 0.0  | 0.5  | 5    | 13

(If the second case is too complicated, how about obtaining:

id | qty1 | qty2 | qty3 | qty4
------------------------------
1  | 0.0  | 0.0  | 10   | 20

Because adding up the original qtys of row 2 would surpass the 50 value, I discard it... The complement in this case should just be:

id | qty1 | qty2 | qty3 | qty4
------------------------------
2  | 1.5  | 0.0  | 7.5  | 18
3  | 1.0  | 2.0  | 7.5  | 18
4  | 0.0  | 0.5  | 5    | 13

)

Let's put it this way: If SQL was a religion I'd go to hell for providing this solution. SQL is not meant to solve this kind of problems, so any solution would be horrible. Mine is no exception :)

set @limitValue := 50;
select id, newQty1, newQty2, newQty3, newQty4 from (
  select id,
  if(@limitValue - qty1 > 0, qty1, greatest(@limitValue, 0)) newQty1,
  @limitValue := @limitValue - qty1 Total1,
  if(@limitValue - qty2 > 0, qty2, greatest(@limitValue, 0)) newQty2,
  @limitValue := @limitValue - qty2 Total2,
  if(@limitValue - qty3 > 0, qty3, greatest(@limitValue, 0)) newQty3,
  @limitValue := @limitValue - qty3 Total3,
  if(@limitValue - qty4 > 0, qty4, greatest(@limitValue, 0)) newQty4,
  @limitValue := @limitValue - qty4 Total4
  from (
    select id, qty1, qty2, qty3, qty4,
      @rowTotal < @limitValue Useful,
      @previousRowTotal := @rowTotal PreviousRowTotal,
      @rowTotal := @rowTotal + qty1 + qty2 + qty3 + qty4 AllRowsTotal,
      @rowTotal - @previousRowTotal CurrentRowTotal 
    from t,
    (select @rowTotal := 0, @previousRowTotal := 0) S1
  ) MarkedUseful
  where useful = 1
) Final

For the provided data, this results in:

+----+---------+---------+---------+---------+
| ID | NEWQTY1 | NEWQTY2 | NEWQTY3 | NEWQTY4 |
+----+---------+---------+---------+---------+
|  1 | 0       |       0 | 10      |      20 |
|  2 | 1.5     |       0 | 7.5     |      11 |
+----+---------+---------+---------+---------+

And the complement:

set @limitValue := 50;
select t1.id,
  coalesce(t1.qty1 - newQty1, t1.qty1) newQty1,
  coalesce(t1.qty2 - newQty2, t1.qty2) newQty2,
  coalesce(t1.qty3 - newQty3, t1.qty3) newQty3,
  coalesce(t1.qty4 - newQty4, t1.qty4) newQty4
from t t1 left join (
    select id,
    if(@limitValue - qty1 > 0, qty1, greatest(@limitValue, 0)) newQty1,
    @limitValue := @limitValue - qty1 Total1,
    if(@limitValue - qty2 > 0, qty2, greatest(@limitValue, 0)) newQty2,
    @limitValue := @limitValue - qty2 Total2,
    if(@limitValue - qty3 > 0, qty3, greatest(@limitValue, 0)) newQty3,
    @limitValue := @limitValue - qty3 Total3,
    if(@limitValue - qty4 > 0, qty4, greatest(@limitValue, 0)) newQty4,
    @limitValue := @limitValue - qty4 Total4
    from (
      select id, qty1, qty2, qty3, qty4,
        @rowTotal < @limitValue Useful,
        @previousRowTotal := @rowTotal PreviousRowTotal,
        @rowTotal := @rowTotal + qty1 + qty2 + qty3 + qty4 AllRowsTotal,
        @rowTotal - @previousRowTotal CurrentRowTotal 
      from t,
      (select @rowTotal := 0, @previousRowTotal := 0) S1
    ) MarkedUseful
    where useful = 1
) Final
on t1.id = final.id
where Total1 < 0 or Total2 < 0 or Total3 < 0 or Total4 < 0 or final.id is null

For the provided data, this results in:

+----+---------+---------+---------+---------+
| ID | NEWQTY1 | NEWQTY2 | NEWQTY3 | NEWQTY4 |
+----+---------+---------+---------+---------+
|  2 |       0 | 0       | 0       |       7 |
|  3 |       1 | 2       | 7.5     |      18 |
|  4 |       0 | 0.5     | 5       |      13 |
+----+---------+---------+---------+---------+

Enjoy!

Is the PDO Library faster than the native MySQL Functions?

9 votes

I have read several questions regarding this but I fear they may be out of date as newer versions of the PDO libraries have been released since these questions were answered.

I have written a MySQL class that builds queries and escapes parameters, and then returns results based on the query. Currently this class is using the built-in mysql functions.

I am well aware of the advantages of using the PDO Library, e.g. it is compatible with other databases, stored procedures are easier to execute etc... However, what I would like to know is simply; is using the PDO Library faster then using the mysql built-in functions?

I have just written the equivalent class for MsSQL, so rewriting it to work with all databases would not take me long at all. Is it worth it or is the PDO library slower?

I found PDO in many situation/projects to be even faster than the more native modules.
Mainly because many patterns/building blocks in a "PDO-application" require less php script driven code and more code is executed in the compiled extension and there is a speed penalty when doing things in the script. Simple, synthetic tests without data and error handling often do not cover this part, which is why (amongst other problems like e.g. measuring inaccuracies) I think "10000x SELECT x FROM foo took 10ms longer" conclusions are missing the point more often than not .
I can't provide you with solid benchmarks and the outcome depends on how the surrounding application handles the data but even synthetic tests usually only show differences so negligible that you better spend your time on optimizing your queries, the MySQL server, the network, ... instead of worrying about PDO's raw performance. Let alone security and error handling ...

MySQL How do I create this subquery?

9 votes

I have the following tables

Table Farm

+---------+--------+-------------------+-----------+------------+
| FARM_ID |Stock_ID| FARM_TITLE        | Size      | FARM_VALUE |
+---------+--------+-------------------+-----------+------------+
|       2 |      1 | AgriZone          | M         |        202 |
|       3 |      1 | Cow Mill          | L         |         11 |
|       4 |      2 | Beef Farm         | H         |        540 |
|       5 |      2 | CattleOne         | M         |       1080 |
|       6 |      2 | FarmOne           | L         |        455 |
|       7 |      3 | Perdue            | H         |        333 |
|       8 |      4 | Holstein          | M         |        825 |
|      10 |      1 | Dotterers         | H         |         98 |
+---------+--------+-------------------+-----------+------------+

Table Gate

+---------+---------+------------+
| GATE_ID | FARM_ID | FARM_VALUE |
+---------+---------+------------+
|       1 |       2 |          0 |
|       1 |       3 |          0 |
|       1 |       4 |        540 |
|       2 |       4 |        550 |
|       3 |       4 |        560 |
|       4 |       4 |        570 |
|       5 |       4 |        580 |
|       6 |       4 |        590 |
|       1 |       5 |       1080 |
|       2 |       5 |       1100 |
|       3 |       5 |       1120 |
|       4 |       5 |       1140 |
|       5 |       5 |       1160 |
|       6 |       5 |       1180 |
|       1 |       6 |        455 |
|       2 |       6 |        536 |
|       3 |       6 |        617 |
|       4 |       6 |        698 |
|       5 |       6 |        779 |
|       6 |       6 |        860 |
|       1 |       7 |          0 |
|       1 |       8 |          0 |
|       1 |      10 |          0 |
+---------+---------+------------+

Table Origin

+--------+----------+
| ORI_ID | ORI_NAME |
+--------+----------+
|      1 |   US     |
|      2 |   CA     |
|      3 |   MX     |
+--------+----------+

Table Stock

+--------+--------+-------------------+
|Stock_ID| ORI_ID | Stock_TITLE       |
+--------+--------+-------------------+
|      1 |      1 | P1                |
|      2 |      2 | P3                |
|      3 |      3 | Q4                |
|      4 |      3 | B3                |
+--------+--------+-------------------+

Table Results

+-----------+---------+---------+------------+------------+
| RESULT_ID | FARM_ID | GATE_ID | FARM_VALUE |    Score%  |
+-----------+---------+---------+------------+------------+
|         1 |       7 |       1 |        333 |        100 |
|         2 |       8 |       1 |        825 |        100 |
|         3 |       6 |       1 |        455 |         40 |
|         4 |       6 |       2 |        536 |          0 |
|         5 |       6 |       3 |        617 |          0 |
|         6 |       6 |       4 |        698 |        100 |
|         7 |       6 |       5 |        779 |          0 |
|         8 |       6 |       6 |        860 |         10 |
|         9 |       4 |       1 |        540 |        100 |
|        10 |       4 |       2 |        550 |         90 |
|        11 |       4 |       3 |        560 |          0 |
|        12 |       4 |       4 |        570 |        100 |
|        13 |       4 |       5 |        580 |         10 |
|        14 |       4 |       6 |        590 |          0 |
|        15 |       5 |       1 |       1080 |          0 |
|        16 |       5 |       2 |       1100 |          0 |
|        17 |       5 |       3 |       1120 |          0 |
|        18 |       5 |       4 |       1140 |         50 |
|        19 |       5 |       5 |       1160 |          0 |
|        20 |       5 |       6 |       1180 |        100 |
|        21 |       3 |       1 |         11 |        100 |
|        22 |      10 |       1 |         98 |         90 |
|        23 |       2 |       1 |        202 |        100 |
+-----------+---------+---------+------------+------------+

Annotated Result table: Same as above ^

+-----------+---------+---------+------------+------------+
| RESULT_ID | FARM_ID | GATE_ID | FARM_VALUE |   Score%   |
+-----------+---------+---------+------------+------------+

+-----------+---------+---------+------------+------------+
|         1 |       7 |       1 |        333 |        100 | <--|H-Case {H}
+-----------+---------+---------+------------+------------+    

+-----------+---------+---------+------------+------------+     
|         2 |       8 |       1 |        825 |        100 | <--|M-Case {M}
+-----------+---------+---------+------------+------------+

+-----------+---------+---------+------------+------------+
|         3 |       6 |       1 |        455 |         40 |
|         4 |       6 |       2 |        536 |          0 |
|         5 |       6 |       3 |        617 |          0 |
|         6 |       6 |       4 |        698 |        100 |  <--|L
|         7 |       6 |       5 |        779 |          0 |     |
|         8 |       6 |       6 |        860 |         10 |     |
+-----------+---------+---------+------------+------------+     |
|         9 |       4 |       1 |        540 |        100 |     |
|        10 |       4 |       2 |        550 |         90 |     |
|        11 |       4 |       3 |        560 |          0 |     |
|        12 |       4 |       4 |        570 |        100 |  <--+M-case {H,M,L}
|        13 |       4 |       5 |        580 |         10 |     |
|        14 |       4 |       6 |        590 |          0 |     |
+-----------+---------+---------+------------+------------+     |
|        15 |       5 |       1 |       1080 |          0 |     |
|        16 |       5 |       2 |       1100 |          0 |     |
|        17 |       5 |       3 |       1120 |          0 |     |
|        18 |       5 |       4 |       1140 |         50 |  <--|H
|        19 |       5 |       5 |       1160 |          0 |
|        20 |       5 |       6 |       1180 |        100 |
+-----------+---------+---------+------------+------------+

+-----------+---------+---------+------------+------------+
|        21 |       3 |       1 |         11 |        100 | <--|L
|        22 |      10 |       1 |         98 |         90 | <--+H-case {H,M,L}
|        23 |       2 |       1 |        202 |        100 | <--|M
+-----------+---------+---------+------------+------------+

Computations required:

  • Type can have only three values at most: {H, M, L};
  • When all values are present, they are graded as followed: H=70 M=20 L=10
  • ALL unique casese are

  • Case {H,M} : H=80 M=20

  • Case {M,L} : M=60 L=40
  • Case {H,L} : H=90 L=10
  • Case {H} : H=100
  • Case {M} : M=100
  • Case {L} : L=100
  • Case {H,M,L} : H=70 M=20 L=10

Further Explanation

  • Only Stock with with atleast one GATE, fully satisfied can get 100 points max
    1. Example: Q4 has 3 sets of 6 GATES; Only one GATE set has to be satsified (have a score present).
    2. The points present must be Multiplied against the particular case it pertains Example: Q4 has case {H,M,L} which means H=70; M=20; L=10 This would results in (70*100%)+(20*50%)+(10*100%)=90 (look back above at the Result table annotations)
    3. 2.
  • Points should still be considered and accounted for even when a gate hasn't been fully satisfied. The gate with MAX points earned should be kept when no gates have been fully satisfied. (Will provide further explanation if not understood)

If we Perform a query to make sense of the tables and data it will look like below

+---------+-----------+---------------+-----------+---------+-----------+---------+
| Origin  | Stock     | Farm Title    | Farm Value|   Gate  |  Size     |  Score  |
+---------+-----------+---------------+-----------+---------+-----------+---------+
| US      | P1        | Perdue        |       333 |       1 | H         |     100 |
| US      | P3        | Holstein      |       825 |       1 | M         |     100 |
| CA      | Q4        | FarmOne       |       455 |       1 | L         |      40 |
| CA      | Q4        | FarmOne       |       536 |       2 | L         |       0 |
| CA      | Q4        | FarmOne       |       617 |       3 | L         |       0 |
| CA      | Q4        | FarmOne       |       698 |       4 | L         |     100 |
| CA      | Q4        | FarmOne       |       779 |       5 | L         |       0 |
| CA      | Q4        | FarmOne       |       860 |       6 | L         |      10 |
| CA      | Q4        | Beef Farm     |       540 |       1 | H         |       0 |
| CA      | Q4        | Beef Farm     |       550 |       2 | H         |      90 |
| CA      | Q4        | Beef Farm     |       560 |       3 | H         |       0 |
| CA      | Q4        | Beef Farm     |       570 |       4 | H         |     100 |
| CA      | Q4        | Beef Farm     |       580 |       5 | H         |      10 |
| CA      | Q4        | Beef Farm     |       590 |       6 | H         |       0 |
| CA      | Q4        | CattleOne     |      1080 |       1 | M         |       0 |
| CA      | Q4        | CattleOne     |      1100 |       2 | M         |       0 |
| CA      | Q4        | CattleOne     |      1120 |       3 | M         |       0 |
| CA      | Q4        | CattleOne     |      1140 |       4 | M         |      50 |
| CA      | Q4        | CattleOne     |      1160 |       5 | M         |     100 |
| CA      | Q4        | CattleOne     |      1180 |       6 | M         |       0 |
| MX      | B3        | Cow Mill      |        11 |       1 | L         |     100 |
| MX      | B3        | Dotterers     |        98 |       1 | H         |      90 |
| MX      | B3        | AgriZone      |       202 |       1 | M         |     100 |
+---------+-----------+---------------+-----------+---------+-----------+---------+

Desire Results

+---------+-------------------+-------+
| Origin  |  Stock            | score |
+---------+-------------------+-------+
| US      |   P1              |   100 |
| US      |   P3              |   100 |
| CA      |   Q4              |    90 |
| MX      |   B3              |    93 |
+---------+-------------------+-------+

Explanation

Since origin has a stock which consists of 3 different farms and those farms have 6 gates each. As long as one gate-set (numerically matching gates) is scored to SOME value we can consider the entire STOCK found completely. This is the only way a stock can be considered 100.

Moreover and to reiterate, STOCK Q4 has case: {H,M,L} and all of gate (4) was found to some degree. gate 4 has the score (100% * H) + (50% * M) + (100% * L) which equals (70*100%) + (20*50%) + (10*100%) = 90

Hence: (Taken from above)

  | CA      |   Q4              |    90 |

QED

So what I need help with is creating the subquery/subselect to do make this computation work. I set up everything in the scenario above (along with a query in progress that I've been working with) in the SQL fiddle link below.

Thanks greatly stackoverflow community.

> The above problem in SqlFiddle can be found here <

Here is the query I've been working on. However, the results are slighly different from the ones you've posted in your question:

select o.origin_name, s.stock_title, sum(
  case f.size
    when 'H' then
      case
        when sizes = 'H,L,M' then 70
        when sizes = 'H,M' then 80
        when sizes = 'H,L' then 90
        when sizes = 'H' then 100
        else 0
      end
    when 'M' then
      case
        when sizes = 'H,L,M' then 20
        when sizes = 'H,M' then 20
        when sizes = 'L,M' then 60
        when sizes = 'M' then 100
        else 0
      end
    else
      case
        when sizes = 'H,L,M' then 10
        when sizes = 'L,M' then 40
        when sizes = 'H,L' then 10
        when sizes = 'L' then 100
        else 0
      end
  end * r.score / 100) FinalScore
from farm f
join (
  select f.stock_id, group_concat(distinct f.size order by f.size) sizes
  from farm f
  join results r on f.farm_id = r.farm_id
  group by f.stock_id
) stockSizes on f.stock_id = stockSizes.stock_id
join results r on f.farm_id = r.farm_id
join (
  select f.stock_id, r.gate_id
  from results r
  join farm f on r.farm_id = f.farm_id
  group by f.stock_id, r.gate_id
  having sum(r.score = 0) = 0
) FullGates
on FullGates.stock_id = f.stock_id and FullGates.gate_id = r.gate_id
join stock s on s.stock_id = f.stock_id
join origin o on o.origin_id = s.origin_id
group by o.origin_id, s.stock_id

Result:

+-------------+-------------+------------+
| ORIGIN_NAME | STOCK_TITLE | FINALSCORE |
+-------------+-------------+------------+
| US          | P1          |         93 |
| CA          | P3          |         90 |
| MX          | Q4          |        100 |
| MX          | B3          |        100 |
+-------------+-------------+------------+

Let me know if this did the trick.

System design for matching closest registered store based on zip code?

8 votes

I have the following problem.

I am creating a restaurant delivery system.

So restaurants choose the zipcodes they want to deliver. So in Boston, they might choose Either all of Boston or Back Bay (a specific area of Boston with several zip codes .... ).

Basically, the restaurant confirms the areas they are willing to serve by ticking boxes that are described as follows:

- Cambridge (ZIP CODE)
- Boston (all of Boston)
--- Back Bay (covers zip codes: 02...., 02.., 02..)
--- North Boston (covers zip codes: 02145, 021..., 02..., 02..)

Users type in their zipcodes, and I match them to the areas that Restaurants specified.

What is the best way to design such a system? I don't think I am going in the right direction...

is this only for Boston or would it be global? Will you be looking at exact zipcode matches? What if someone enters a zipcode you don't have but it's within the delivery range. I would recommend using longitude / latitude lookups.

This might be a good place to start: https://developers.google.com/maps/articles/phpsqlsearch

BTW: I'm looking to do something very similar and will most likely use the article referenced above :) thanks for helping me too.

MySQL. Creating an index for "OR" queries

8 votes

I have a table of 200k entries with columns of INT's. I want to create an index to make queries faster.

This is the query I would like to execute: SELECT A,B,C,D,E FROM table WHERE A=23 and (B=45 or C=43).

I created the following INDEX's: B, ACD, C, ABC

With the EXPLAIN command I found that MySQL chooses the INDEX: ACD. So I kept populating the table with more values and I realized that MySQL was switching between the INDEX's above (not always the same one).

Since there are many inserts, having various INDEX's will cause performance issues and we can assume that this table is accessed by other queries that require different columns where every INDEX makes sense.

I am aware of the USE INDEX(), but I would like to understand if we should trust MySQL to choose the right index.

Because of the OR in the SQL statement, MySQL is simply getting the first index that includes A, which is ACD.

I came to the conclusion that the way to solve this issue using an INDEX is to make two separate queries. SELECT A,B,C,D,E FROM table WHERE A=23 AND B=45 which will use the INDEX ABC and then SELECT A,B,C,D,E FROM table WHERE A=23 AND C=43 which will use INDEX ACD. This can all be done in one step with (...) UNION (...), which turns out to be quicker and only uses INDEX's.

Why can't MySQL optimize this query?

7 votes

I have a query that is giving me problems and I can't understand why MySQL's query optimizer is behaving the way it is. Here is the background info:

I have 3 tables. Two are relatively small and one is large.

Table 1 (very small, 727 rows):

CREATE TABLE ipa (
ipa_id int(11) NOT NULL AUTO_INCREMENT,
ipa_code int(11) DEFAULT NULL,
ipa_name varchar(100) DEFAULT NULL,
payorcode varchar(2) DEFAULT NULL,
compid int(11) DEFAULT '2'
PRIMARY KEY (ipa_id),
KEY ipa_code (ipa_code) ) ENGINE=MyISAM

Table 2 (smallish, 59455 rows):

CREATE TABLE assign_ipa (
assignid int(11) NOT NULL AUTO_INCREMENT,
ipa_id int(11) NOT NULL,
userid int(11) NOT NULL,
username varchar(20) DEFAULT NULL,
compid int(11) DEFAULT NULL,
PayorCode char(10) DEFAULT NULL
PRIMARY KEY (assignid),
UNIQUE KEY assignid (assignid,ipa_id),
KEY ipa_id (ipa_id)
) ENGINE=MyISAM

Table 3 (large, 24,711,730 rows):

CREATE TABLE master_final (
IPA int(11) DEFAULT NULL,
MbrCt smallint(6) DEFAULT '0',
PayorCode varchar(4) DEFAULT 'WC',
KEY idx_IPA (IPA)
) ENGINE=MyISAM DEFAULT

Now for the query. I'm doing a 3-way join using the first two smaller tables to essentially subset the big table on one of it's indexed values. Basically, I get a list of IDs for a user, SJOnes and query the big file for those IDs.

mysql> explain
SELECT master_final.PayorCode, sum(master_final.Mbrct) AS MbrCt
FROM master_final
INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
WHERE assign_ipa.username = 'SJones'
GROUP BY master_final.PayorCode, master_final.ipa\G;
************* 1. row *************
id: 1
select_type: SIMPLE
table: master_final
type: ALL
possible_keys: idx_IPA
key: NULL
key_len: NULL
ref: NULL
rows: 24711730
Extra: Using temporary; Using filesort
************* 2. row *************
id: 1
select_type: SIMPLE
table: ipa
type: ref
possible_keys: PRIMARY,ipa_code
key: ipa_code
key_len: 5
ref: wc_test.master_final.IPA
rows: 1
Extra: Using where
************* 3. row *************
id: 1
select_type: SIMPLE
table: assign_ipa
type: ref
possible_keys: ipa_id
key: ipa_id
key_len: 4
ref: wc_test.ipa.ipa_id
rows: 37
Extra: Using where
3 rows in set (0.00 sec)

This query takes forever (like 30 minutes!). The explain statement tells me why, it's doing a full table scan on the big table even though there is a perfectly good index. It's not using it. I don't understand this. I can look at the query and see that it's only needs to query a couple of IDs from the big table. If I can do it, why can't MySQL's optimizer do it?

To illustrate, here are the IDs associated with 'SJones':

mysql> select username, ipa_id from assign_ipa where username='SJones';
+----------+--------+
| username | ipa_id |
+----------+--------+
| SJones | 688 |
| SJones | 689 |
+----------+--------+
2 rows in set (0.02 sec)

Now, I can rewrite the query substituting the ipa_id values for the username in the where clause. To me this is equivalent to the original query. MySQL sees it differently. If I do this, the optimizer makes use of the index on the big table.

mysql> explain
SELECT master_final.PayorCode, sum(master_final.Mbrct) AS MbrCt
FROM master_final
INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
*WHERE assign_ipa.ipa_id in ('688','689')*
GROUP BY master_final.PayorCode, master_final.ipa\G;
************* 1. row *************
id: 1
select_type: SIMPLE
table: ipa
type: range
possible_keys: PRIMARY,ipa_code
key: PRIMARY
key_len: 4
ref: NULL
rows: 2
Extra: Using where; Using temporary; Using filesort
************* 2. row *************
id: 1
select_type: SIMPLE
table: assign_ipa
type: ref
possible_keys: ipa_id
key: ipa_id
key_len: 4
ref: wc_test.ipa.ipa_id
rows: 37
Extra: Using where
************* 3. row *************
id: 1
select_type: SIMPLE
table: master_final
type: ref
possible_keys: idx_IPA
key: idx_IPA
key_len: 5
ref: wc_test.ipa.ipa_code
rows: 34953
Extra: Using where
3 rows in set (0.00 sec)

The only thing I've changed is a where clause that doesn't even directly hit the big table. And yet, the optimizer uses the index 'idx_IPA' on the big table and the full table scan is no longer used. The query when re-written like this is very fast.

OK, that's a lot of background. Now my question. Why should the where clause matter to the optimizer? Either where clause will return the same result set from the smaller table, and yet I'm getting dramatically different results depending on which one I use. Obviously, I want to use the where clause containing the username rather than trying to pass all associated IDs to the query. As written though, this is not possible?

  1. Can someone explain why this is happening?
  2. How might I rewrite my query to avoid the full table scan?

Thanks for sticking with me. I know its a very longish question.

Not quite sure if I'm right, but I think the following is happening here. This:

WHERE assign_ipa.username = 'SJones'

may create a temporary table, since it requires a full table scan. Temporary tables have no indexes, and they tend to slow down things down a lot.

The second case

INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
WHERE assign_ipa.ipa_id in ('688','689')

on the other hand allows for joining of indexes, which is fast. Additionally, it can be transformed to

SELECT .... FROM master_final WHERE IDA IN (688, 689) ...

and I think MySQL is doing that, too.

Creating an index on assign_ipa.username may help.

Edit

I rethought the problem and now have a different explanation.

The reason of course is the missing index. This means that MySQL has no clue how large the result of querying assign_ipa would be (MySQL does not store counts), so it starts with the joins first, where it can relay on keys.

That's what row 2 and 3 of explain log tell us.

And after that, it tries to filter the result by assign_ipa.username, which has no key, as stated in row 1.

As soon as there is an index, it filters assign_ipa first, and joins afterwards, using the according indexes.

Can I/O latency cause a simple UPDATE to take seconds in MySQL?

7 votes

My MySQL application is experiencing slow performance when running some UPDATE, INSERT and DELETE queries. In this question, I will only discuss one particular UPDATE, because it's enough to demonstrate the problem:

UPDATE projects SET ring = 5 WHERE id = 1

This UPDATE is usually fast enough, around 0.2ms, but every now and then (enough to be a problem) it takes several seconds. Here's an excerpt from the log (look at the 4th line):

 ~ (0.000282) UPDATE `projects` SET `ring` = 5 WHERE `id` = 1
 ~ (0.000214) UPDATE `projects` SET `ring` = 6 WHERE `id` = 1
 ~ (0.000238) UPDATE `projects` SET `ring` = 7 WHERE `id` = 1
 ~ (3.986502) UPDATE `projects` SET `ring` = 8 WHERE `id` = 1
 ~ (0.000186) UPDATE `projects` SET `ring` = 9 WHERE `id` = 1
 ~ (0.000217) UPDATE `projects` SET `ring` = 0 WHERE `id` = 1
 ~ (0.000162) UPDATE `projects` SET `ring` = 1 WHERE `id` = 1

projects is an InnoDB table with 6 columns of types INT and VARCHAR, 17 rows and an index on id. It happens with other tables too, but here I'm focusing on this one. When trying to solve the problem, I ensured that the queries were all sequential, so this is not a lock issue. The UPDATE above is executed in the context of a transaction. Other information on the server:

  • VPS with 4GB RAM (was 1GB), 12GB free disk space
  • CentoOS 5.8 (was 5.7)
  • MySQL 5.5.10 (was 5.0.x)

The "was" bit above means it didn't work before or after the upgrade.

What I've tried so far, to no avail:

  • Setting innodb_flush_log_at_trx_commit to 0, 1 or 2
  • Setting innodb_locks_unsafe_for_binlog on or off
  • Setting timed_mutexes on or off
  • Changing innodb_flush_method from the default to O_DSYNC or O_DIRECT
  • Increasing innodb_buffer_pool_size from the default to 600M and then to 3000M
  • Increasing innodb_log_file_size from the default to 128M
  • Compiling MySQL from source
  • Running SHOW PROCESSLIST, which informs me that the state is "updating"
  • Running SHOW PROFILE ALL, which says that almost all the time was spent on "updating", and that, within that step, not so much time was spent on CPU cycles and there were many voluntary context switches (like 30)
  • Monitoring SHOW STATUS for changes in Innodb_buffer_pool_pages_dirty. There may be some relation between dirty pages being flushed and the slow queries, but the correlation isn't clear.

Then I decided to check the system's I/O latency with ioping. This is my first VPS, so I was surprised to see this result:

4096 bytes from . (vzfs /dev/vzfs): request=1 time=249.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=2 time=12.3 ms
4096 bytes from . (vzfs /dev/vzfs): request=3 time=110.5 ms
4096 bytes from . (vzfs /dev/vzfs): request=4 time=232.8 ms
4096 bytes from . (vzfs /dev/vzfs): request=5 time=294.4 ms
4096 bytes from . (vzfs /dev/vzfs): request=6 time=704.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=7 time=1115.0 ms
4096 bytes from . (vzfs /dev/vzfs): request=8 time=209.7 ms
4096 bytes from . (vzfs /dev/vzfs): request=9 time=64.2 ms
4096 bytes from . (vzfs /dev/vzfs): request=10 time=396.2 ms

Pretty erratic, I would say.

Having said all of that, I ask:

  1. Can the I/O latency be occasionally killing MySQL performance? I always thought that, when you ran an UPDATE, the thread taking care of that connection wasn't going to flush data to disk or wait for such a flush; it would return immediately and the flushing would be done by another thread at another time.

  2. If it can't be disk I/O, is there anything else I can possibly try, short of renting a dedicated server?

I'm replying to my own question with additional data that I collected based on your answers.

I used two notebooks connected by means of a wireless network. On notebook A, I mounted a directory of notebook B using sshfs. Then on notebook A I started up MySQL specifying that mounted directory as its data directory. This should provide MySQL with a very slow I/O device. MySQL was started with innodb_flush_log_at_trx_commit = 0.

I defined 3 sets of queries, each set consisting of an update and a select query repeated 10,000 times, without explicit transactions. The experiments were:

  • US1SID: update and select on a specific row of the same table. The same row was used in all iterations.
  • US1MID: update and select on a specific row of the same table. The row was a different one in each iteration.
  • US2MID: update and select on rows of different tables. In this case, the table being read by the select didn't change at all during the experiment.

Each set was run twice using a shell script (hence timings are slower than those in my original question), one under normal conditions and the other after executing the following command:

tc qdisc replace dev wlan0 root handle 1:0 netem delay 200ms

The command above adds a mean delay of 200ms when transmitting packets through wlan0.

First, here's the mean time of the top 99% fastest updates and selects, and the bottom 1% updates and selects.

          |        Delay: 0ms        |       Delay: 200ms       |
          | US1SID | US1MID | US2MID | US1SID | US1MID | US2MID |
| top99%u | 0.0064 | 0.0064 | 0.0064 | 0.0063 | 0.0063 | 0.0063 |
| top99%s | 0.0062 | 0.0063 | 0.0063 | 0.0062 | 0.0062 | 0.0062 |
| bot01%u | 1.1834 | 1.2239 | 0.9561 | 1.9461 | 1.7492 | 1.9731 |
| bot01%s | 0.4600 | 0.5391 | 0.3417 | 1.4424 | 1.1557 | 1.6426 |

As is clear, even with really, really poor I/O performance, MySQL manages to execute most queries really fast. But what concerns me the most is the worst cases, so here's another table, showing the 10 slowest queries. A "u" means it was an update, an "s" a select.

|          Delay: 0ms         |          Delay: 200ms          |
| US1SID  | US1MID  | US2MID  | US1SID   | US1MID   | US2MID   |
| 5.443 u | 5.946 u | 5.315 u | 11.500 u | 10.860 u | 11.424 s |
| 5.581 u | 5.954 s | 5.466 u | 11.649 s | 10.995 u | 11.496 s |
| 5.863 s | 6.291 u | 5.658 u | 12.551 s | 11.020 u | 12.221 s |
| 6.192 u | 6.513 u | 5.685 u | 12.893 s | 11.370 s | 12.599 u |
| 6.560 u | 6.521 u | 5.736 u | 13.526 u | 11.387 u | 12.803 u |
| 6.562 u | 6.555 u | 5.743 u | 13.997 s | 11.497 u | 12.920 u |
| 6.872 u | 6.575 u | 5.869 u | 14.662 u | 12.825 u | 13.625 u |
| 6.887 u | 7.908 u | 5.996 u | 19.953 u | 12.860 u | 13.828 s |
| 6.937 u | 8.100 u | 6.330 u | 20.623 u | 14.015 u | 16.292 u |
| 8.665 u | 8.298 u | 6.893 u | 27.102 u | 22.042 s | 17.131 u |

Conclusions:

  1. Poor I/O performance can indeed slow MySQL to a crawl. It's not clear why or when exactly, but it does happen.

  2. The slowing down applies to both selects and updates, with updates suffering more.

  3. For some reason, even selects on a table that wasn't involved in any changes, and which had recently been populated, were also slowed down, as is clear from US2MID above.

  4. As for the test cases proposed by mentatkgs, it seems that updating different rows instead of the same ones does help a little, but doesn't solve the problem.

I guess I will either adapt my software to tolerate such delays or try to move to another provider. Renting a dedicated server is too expensive for this project.

Thank you all for the comments.

Retrieving a specific row depending on a date variable?

6 votes

I have 7 columns that which contain the information of closing times, each for one day. (It goes like VENUE_CLOSE_T_MO, VENUE_CLOSE_T_TU... etc)

How would I, for example choose one of those columns depending on a date variable ($somevariable) which contains a specific date?

For example, if the date variable was Sunday, March 18 22:00, it would choose column VENUE_CLOSE_T_SU.

Thanks for the help everyone!

EDIT (Solution given by TEEZ that solved the issue)

My Date variable is $Start.

And this is the code:

$day_name=strtoupper(date('D',$start));
$day_name=substr($day_name,0,2);
$selectcolumn='VENUE_CLOSE_T_'.$day_name;

So in this case $selectcolumn = VENUE_CLOSE_T_SU

And the echo is then this:

$row[$selectcolumn]

Thanks for all your help again Teez!

first get day name from variable ($somevariable)

$day_name=strtoupper(date('D',$somevariable));

then make query like below for getting column according to day in $somevariable

select concat('VENUE_CLOSE_T_',left($day_name,2)) as datecolumnname  from tableame

EDIT:

OR

you don't need to do this in query if you taking all column in query. just add these lines in php code where you printing data in we page under date column

$day_name=strtoupper(date('D',$somevariable));
$day_name=substr($day_name,0,2);
$selectcolumn='venues.VENUE_CLOSE_T_'.$day_name; 
echo $row[$selectcolumn];

SQL GROUP BY: intervals in continuity?

6 votes

The idea is that say you have the following table.

-------------
| oID | Area|
-------------
| 1 | 5     |
| 2 | 2     |
| 3 | 3     |
| 5 | 3     |
| 6 | 4     |
| 7 | 5     |
-------------

If grouping by continuity is possible this pseudo query

SELECT SUM(Area) FROM sample_table GROUP BY CONTINUITY(oID)

would return

-------------
| SUM(Area) |
-------------
|  10       |
|  12       |
-------------

Where the continuity break arises at oID or rather the lack thereof an entry representing oID 4.

Does such functionality exist within the standard functions of Sql?

There is no such functionality in "standard functions of SQL", but it is possible to get the desired result set by using some tricks.

With the subquery illustrated below we create a virtual field which you can use to GROUP BY in the outer query. The value of this virtual field is incremented each time when there is a gap in the sequence of oID. This way we create an identifier for each of those "data islands":

SELECT  SUM(Area), COUNT(*) AS Count_Rows
FROM    (
        /* @group_enumerator is incremented each time there is a gap in oIDs continuity */
        SELECT  @group_enumerator := @group_enumerator + (@prev_oID != oID - 1) AS group_enumerator,
                @prev_oID := oID AS prev_oID,
                sample_table.*
        FROM    (
                SELECT  @group_enumerator := 0,
                        @prev_oID := -1
                ) vars,
                sample_table
        /* correct order is very important */
        ORDER BY
                oID
        ) q
GROUP BY
        group_enumerator

Test table and data generation:

CREATE TABLE sample_table (oID INT auto_increment, Area INT, PRIMARY KEY(oID));
INSERT INTO sample_table (oID, Area) VALUES (1,5), (2,2), (3,3), (5,3), (6,4), (7,5);

I need to thank Quassnoi for pointing out this trick in my related question ;-)

UPDATE: added test table and data and fixed duplicate column name in example query.

How to find out the tables that take up maximum memory in database?

6 votes

Hi I am new to databases. I am working on huge database and trying to clear up the mess. I want to start by finding the top ten tables that take up highest memory in the whole database. I cannot go by finding memory of each table since there are too many tables. I need the top 10 or 20 tables that take up the maximum space. Any help would be much appreciated. Thank you.

Maybe something like this:

SELECT CONCAT(table_schema, '.', table_name),
       CONCAT(ROUND(table_rows / 1000000, 2), 'M')                                    rows,
       CONCAT(ROUND(data_length / ( 1024 * 1024 * 1024 ), 2), 'G')                    DATA,
       CONCAT(ROUND(index_length / ( 1024 * 1024 * 1024 ), 2), 'G')                   idx,
       CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 * 1024 ), 2), 'G') total_size,
       ROUND(index_length / data_length, 2)                                           idxfrac
FROM   information_schema.TABLES
ORDER  BY data_length + index_length DESC
LIMIT  10;

Reference here

Dynamically creating date periods using MySQL

5 votes

I trying to get the Grape count from dates March 1 - 3.

enter image description here

You will notice that on March 2 - there are no grapes inserted..

I'st possible to show a query from dates March 1, 2 and 3 but showing 0 count for March 2 enter image description here

In this image above only shows dates where there are grapes..

Here is mySQL query

SELECT  `fruitDate` ,  `fruitName` , COUNT( * ) 
FROM  `tbl_fruits` 
WHERE  `fruitName` =  "Grapes"
GROUP BY  `fruitDate

UPDATE 2:

Using this query:

SELECT f.fruitDate, f.fruitName, f1.count FROM tbl_fruits f
    LEFT JOIN (SELECT fruitDate, COUNT(*) as count from tbl_fruits d WHERE d.fruitName='Grapes' GROUP BY d.fruitDate) as f1 ON (f.fruitDate = f1.fruitDate) 
    GROUP BY f.fruitDate

I got this result..but its dsplaying diffrent fruit..something wrong with my query?

enter image description here

Remember there is a dynamically (and a bit ugly) solution to creating a date range that does not require creating a table:

select aDate from (
  select @maxDate - interval (a.a+(10*b.a)+(100*c.a)+(1000*d.a)) day aDate from
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) a, /*10 day range*/
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) b, /*100 day range*/
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) c, /*1000 day range*/
  (select 0 as a union all select 1 union all select 2 union all select 3
   union all select 4 union all select 5 union all select 6 union all
   select 7 union all select 8 union all select 9) d, /*10000 day range*/
  (select @minDate := '2001-01-01', @maxDate := '2002-02-02') e
) f
where aDate between @minDate and @maxDate

Depending on the length of the date range you can reduce the amount of dynamically generated results (10000 days means over 27 years of records each representing one day) by removing tables (d, c, b and a) and removing them from the upper formula. Setting the @minDate and @maxDate variables will allow you to specify the dates between you want to filter the results.

Edit:

I see you're still looking for a solution. Try this:

select c.date, f.fruitName, count(f.fruitName = 'Grapes')
from tbl_calendar c
left join tbl_fruits f
on c.date = f.fruitDate and f.fruitName = 'Grapes'
group by c.date, f.fruitName

If you also want to filter the extra dates from the created table, use this query:

select c.date, f.fruitName, count(f.fruitName = 'Grapes')
from tbl_calendar c
left join tbl_fruits f
on c.date = f.fruitDate and f.fruitName = 'Grapes'
group by c.date, f.fruitName
having c.date between
  (select min(fruitDate) from tbl_fruits) and
  (select max(fruitDate) from tbl_fruits)

Breaking out of a loop when a condition occurs, and avoiding the usage of its preset db value

5 votes

Assuming a student take 6 courses in a semester. All those couses have coures units(int), and depending on the score in each course there are points..

 so a score >=70 will have a point of 5

 <70 and >=60 will have a ponit of 4

and so on. For each course unit and point are multipied together, down the column for each column. Now when the score of a course is not found the grade is 'AR'. Now what i want is for the loops to omit the occurence of AR..i.e not adding the course unit of the course having a grade of 'AR'. But when i run my queries above the units still add to the total course units.

Query4 is used to generate some rows of course_unit and Score

  $query4 = mysql_query("SELECT  c.course_unit, m.score
  FROM    maintable AS m
  INNER JOIN students AS s ON
  m.matric_no = s.matric_no
  INNER JOIN courses AS c ON
  m.course_code = c.course_code
  WHERE m.matric_no = '".$matric_no."'
  AND m.level = '".$level."'")
  or die (mysql_error());

Query3 is used for the summation of the course_units

 $query3 = mysql_query("SELECT  SUM(c.
 course_unit) AS 'TOTAL'
 FROM    maintable AS m
 INNER JOIN students AS s ON
 m.matric_no = s.matric_no
 INNER JOIN courses AS c ON
 m.course_code = c.course_code
 WHERE m.matric_no = '".$matric_no."'
 AND m.level = '".$level."'")
 or die (mysql_error());

Grades in Respect to Score

 while ($row8 = mysql_fetch_assoc
 ($query8)) {
            if ($row8['score'] >= 70) {
              $grade = 'A';
            }
            elseif ($row8['score'] >= 60) {
               $grade = 'B';
            }elseif ($row8['score'] >= 50) {
               $grade = 'C';
            }elseif ($row8['score'] >= 45) {
               $grade = 'D';
            }elseif($row8['score'] >= 40) {
               $grade = 'E';
            }elseif($row8['score'] >= 0) &&
            ($row8['score'] < 40){
               $grade = 'F';
            }else{
               $grade = 'AR';
            }   
     }   

Calculation of the Grade Point

      $grade_point = 0;
      while ($row4 = mysql_fetch_assoc($query4)) {
         if ($row4['score'] >= 70) {
            $score = 5;
          }
          elseif ($row4['score'] >= 60) {
             $score = 4;
          }elseif ($row4['score'] >= 50) {
             $score = 3;
          }elseif ($row4['score'] >= 45) {
             $score = 2;
          }elseif($row4['score'] >= 40) {
             $score = 1;
          }elseif($row4['score'] >= 0 AND                       $row4['score'] < 40) {
             $score = 0;
          }else{
             $score = 0;
          } 

          $grade_point += $score * $row4['course_unit'];

      }

I have added

  if ( $grade == 'AR' )
  {
       continue;
  }

But the calculations are still the same. It adds the course_unit value of any course having

$grade == 'AR' .

I'll be most delighted with you answers. Thanks very much.

UPDATE

I have being able to solve the grade piont part by adding

     elseif($row4['score'] >= 0 AND                       $row4['score'] < 40) {
             $score = 0;
          }else{
             $score = 0;
          }

This sets both the occurences of a score between 0 and 39 to zero and also the default score of <0 (i.e AR) to zero. But it still set's the value of the courses having a grade of AR and a score of -1 to the default respective values of the course_unit.

I think this problem is being cause due to the fact that the course_unit are preloaded from the database. Any help?

Courses Table Stucture
=================

course_id
course_code
course_title
course_unit

I'll be most delighted with your answers. Thank you in anticipation.

Is it as simple as adding "AND NOT 'AR'" to your SELECT SUM statement?

Or... if your DB values are coming in as AR, why can't you use PHP is_int() in your loop? That would allow you to still assign 0 for F, and just skip over any non integer values being sent from your DB.