Best mysql questions in April 2012

Why does SELECT results differ between mysql and sqlite?

19 votes

I'm re-asking this question in a simplified and expanded manner.

Consider these sql statements:

create table foo (id INT, score INT);

insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Using sqlite, the select statement returns:

id          avg1      
----------  ----------
106         4.5       
107         4.0       

and mysql returns:

+------+--------+
| id   | avg1   |
+------+--------+
|  106 | 4.5000 |
+------+--------+

As far as I can tell, mysql's results are correct, and sqlite's are incorrect. I tried to cast to real with sqlite as in the following but it returns two records still:

select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
    select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

Why does sqlite return two records?

Quick update:

I ran the statement against the latest sqlite version (3.7.11) and still get two records.

Another update:

I sent an email to sqlite-users@sqlite.org about the issue.

Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).

To have three avg groups, I used the following statements:

create table foo (id VARCHAR(1), score INT);

insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);

PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;

select avg(score) avg1
from foo
group by id
having not exists (
    select avg(T2.score) avg2
    from foo T2
    group by T2.id
    having avg2 > avg1);

We clearly see that somehow what should be r:4.5 has become i:5:

enter image description here

I'm now trying to see why that is.

Final edit:

So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:

http://www.sqlite.org/src/info/430bb59d79

Interestingly, to me at least, it seems that the newer versions (some times after the version I'm using) supports inserting multiple records as used in a test case added in the aforementioned commit:

CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);  

I tried to mess with some variants of query.

It seems, like sqlite has errors in using of previous declared fields in a nested HAVING expressions.

In your example avg1 under second having is always equal to 5.0

Look:

select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);

This one returns nothing, but execution of the following query returns both records:

...
having not exists (
    SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);

I can not find any similar bug at sqlite tickets list.

PHP: How to find connections between users so I can create a closed friend circle?

14 votes

Hi all,

First of all, I'm not trying to create a social network, facebook is big enough! (comic) I've chosen this question as example because it fits exactly on what I'm trying to do.

Imagine that I have in MySQL a users table and a user_connections table with 'friend requests'. If so, it would be something like this:

Users Table:

userid  username
1       John
2       Amalia
3       Stewie
4       Stuart
5       Ron
6       Harry
7       Joseph
8       Tiago
9       Anselmo
10      Maria


User Connections Table:

userid_request  userid_accepted
2               3
7               2
3               4
7               8
5               6
4               5
8               9
4               7
9               10
6               1
10              7
1               2

Now I want to find circles between friends and create a structure array and put that circle on the database (none of the arrays can include the same friends that another has already).

Return Example:

    // First Circle of Friends
    Circleid => 1
    CircleStructure => Array(
        1 => 2,
        2 => 3,
        3 => 4,
        4 => 5,
        5 => 6,
        6 => 1,
    )
    // Second Circle of Friends
    Circleid => 2
    CircleStructure => Array(
        7 => 8,
        8 => 9,
        9 => 10,
        10 => 7,
    )

I'm trying to think of an algorithm to do that, but I think it will take a lot of processing time because it would randomly search the database until it 'closes' a circle.

PS: The minimum structure length of a circle is 3 connections and the limit is 100 (so the daemon doesn't search the entire database)

EDIT:

I've think on something like this:

function browse_user($userget='random',$users_history=array()){

    $user = user::get($userget);

    $users_history[] = $user['userid'];

    $connections = user::connection::getByUser($user['userid']);
    foreach($connections as $connection){
        $userid = ($connection['userid_request']!=$user['userid']) ? $connection['userid_request'] : $connection['userid_accepted'];

        // Start the circle array
        if(in_array($userid,$users_history)) return array($user['userid'] => $userid);

        $res = browse_user($userid, $users_history);

        if($res!==false){
            // Continue the circle array
            return $res + array($user['userid'] => $userid);
        }
    }

  return false;
}

while(true){

    $res = browse_user();

    // Yuppy, friend circle found!
    if($res!==false){
            user::circle::create($res);
    }

    // Start from scratch again!
}

The problem with this function is that it could search the entire database without finding the biggest circle, or the best match.

I've tried all the solutions here (even mine at my question) and neither of them was good. On large amount of data they simply hang because they exceed the memory limit (my script spended 128MB with a 3 level limit).

After a few hours thinking, i've made a new script from scratch to search connections that uses 1MB of memory and with 10 levels limit, it analyses all the circles, remove duplicated circles and it didn't take a complete 1 second !!!! (I WAS LIKE: I'm a Genious!)

Once my program is complete (i'm awaiting the patenting situation) i will put here the code of that part of searching.

Thanks to All!

How to optimize this in MySQL?

11 votes

I have table structure as displayed in first table.

And want to fetch Both Male and Female Counts in a single query so that request will go only for one time onto the server.

How to optimize this in MySQL ?

This is what you need to do:

select gender,
       count(case when age between 0 and 20 then 1 else null end) Age_0_20,
       count(case when age between 21 and 40 then 1 else null end) Age_21_40
from yourtable
group by gender

Adjust accordingly :)

Update, with clarifications

Note that COUNT aggregate function only counts non-null values. Thus, the else values in the case must be NULL. The When value returns 1 but it could just be any non-null value.

Some people implement this by using SUM:

select gender,
       sum(case when age between 0 and 20 then 1 else 0 end) Age_0_20,
       sum(case when age between 21 and 40 then 1 else 0 end) Age_21_40
from yourtable
group by gender

The result is going to be absolutely the same.

Mysql help needed to optimize group by sub query

8 votes

Hi I seem to be a little stuck. Its quite a straight forward query.

If I run the quieries seperately it is not that slow but when I combine them its very slow.

I'm not sure how to optimise it. Any help would be much appreciated. Im basically only wanting to show multiple refunds. So where faultid exists more than once.

SELECT 
  r.* 
FROM faultrefunds_v2 r
WHERE r.id IN 
( 
    SELECT
    r1.id 
    FROM 
    faultrefunds_v2 r1 
    GROUP BY faultid
    HAVING count(r1.faultid) > 1
);

The results from explain are have been attached as an image

enter image description here

I guess, this qualifies rather as a re-writing than as an optimisation, but this is what I would try instead, anyway:

SELECT 
  r.* 
FROM faultrefunds_v2 r
WHERE EXISTS (
  SELECT *
  FROM faultrefunds_v2 r1 
  WHERE r1.faultid = r.faultid
    AND r1.id <> r.id
);

mysql - if value = 0 shorten where statement

8 votes

I wonder if it's possible to shorten query depending on some variable value in elegant way.

For example: I have value named $var = 0 and I would like to send a query that looks like this:

$query = "SELECT id, name, quantity FROM products WHERE quantity > 100";

But whan the $var != 1 I'd like to send a query like this:

$query = "SELECT id, name, quantity FROM products WHERE quantity > 100 AND id = '$var'";

So depending on value of $var I want to execute one of queries. They differ only with last expression. I found two possible solutions but they are not elegant and I dont like them at all.

One is made in php:

if ( $var == 0 ) {
  $query_without_second_expression
} else {
  $query_with_second_expression
}

Second is made in mysql:

SELECT WHEN '$var' <> 0 THEN id, name, quantity 
FROM products WHERE quantity > 100 AND id = '$var' ELSE id, name, quantity 
FROM products WHERE quantity > 100 END

but i dont like it - each idea doubles queries in some whay. Can I do something like this?

SELECT id, name, quantity 
FROM products WHERE quantity > 100 
CASE WHEN $var <> 0 THEN AND id = '$var' 

It's much shorter, and adds part of query if needed. Of course real query is much more complicated and shorter statement would be really expected. Anyone has an idea?

If I understand well..

$query = "SELECT id, name, quantity FROM products WHERE quantity > 100";
if ( $var != 0 ) {
  $query .= " AND id = '$var'";
}

do you like it?

Best practice for storing database password

8 votes

I am developing a custom server application that will access a database. I need to decide where I will store the credentials (and to address) to that server.

A common solution is to put the credential in a config file. However, I do not want a compromised server to mean that the hacker has access to the DB (which is hosted on a separate server).

I could store the credentials in the environment, but that is just security through obscurity. Mr. Evil can just look in the environment to find it.

Someone suggested encryption. However, if I store the key in the executable, a quick de-compile (we are using Java) and I am still doomed.

I also want to avoid having to enter a paraphrase every time I start the server.

Any suggestions? I feel like I'm missing something simple.

Thanks

I don't think you're missing something simple. Either the server in question can connect to the database without your help, in which case it has to have the credentials; or it cannot connect without your supplying them. You can take various steps like the ones you've listed to make it harder for a compromised server to reveal the credentials to the database, but at the end of the day, if it has to have those credentials and supply them to the DB server to connect, they'll have to be stored on it somewhere — or at least, it will have to have some means of getting them, and so will be hackable in that sense.

Your best bet is to focus on finding out about intrusions (compromised servers) as quickly as possible, keeping good off-site, off-line backups for the worst case, putting up lots of barriers to intrusion in the first place, etc.

How expensive are MySQL events?

8 votes

In my web app I use two recurring events that "clean up" one of the tables in the database, both executed every 15 minutes or so.

My question is, could this lead to problems in performance in the future? Because I've read somewhere -I don't recall where exactly- that MySQL events are supposed to be scheduled to run once a month or so. Thing is, this same events keep the table in a pretty reduced size (as they delete records older than 15~ minutes), maybe this compensates the frequency of their execution, right?

Also, is it better to have one big MySQL event or many small ones if they are be called in the same frequency?

I don't think there's a performance indication in the monthly base just more of a suggestion of what to do with it. So i think you're ok with doing your cleanup using the events.

In the end the documentation suggets that the events are

Conceptually, this is similar to the idea of the Unix crontab (also known as a “cron job”) or the Windows Task Scheduler.

And the concept for those is that you can run a task every minute if you wish to do so.


On the second part of that question:

Serialize or spread it up. If you split them up into many events that will run at the same time you will create spikes of possibly very high cpu usage that might slow down the application while processing the events.

So either pack everything into one event so it runs in succession or spread the single events up so they execute on different times during the 15 minutes timeframe. Personally i think the first one is to be preferred, pack them up into a single event as then they are guaranteed to run in succession, even if a single one of them keeps running longer than usual.

The same goes for cronjobs. If you shedule 30 long-running exports at a single time your application is going to fail miserably during that timeslot (learned that the hard way).

Simple SQL Join Understanding?

8 votes

Possible Duplicate:
Explicit vs implicit SQL joins
Is there a difference using join andselect from multi-tables?
SQL Joins: Future of the SQL ANSI Standard (where vs join)?

What is the difference between JOIN and declaring multiple tables in the FROM clause?

Such as:

SELECT *
FROM  table1 AS t1
      , table2 AS t2
WHERE t1.id = t2.id

Compared to:

SELECT *
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t2.id = t1.id

The second version, with the explicit JOIN and join condition is standardized SQL.

The implicit join syntax with a WHERE clause is deprecated syntax (or, rather, considered bad) - partially because it is easy to forget the WHERE clause and cause a Cartesian product.

Are quotes around tables and columns in a MySQL query really necessary?

8 votes

I have a short question about mysql query.

What is correct?

SELECT * FROM Persons WHERE Year='1965'

Or

SELECT * FROM `Persons` WHERE `Year` = '1965'

Is this a personal choice or is this something what is really wrong?

What if you have a table named table, or a column named where. These are reserved keywords. If you used those in your queries without backticks, they'd produce an invalid query (Of course, using reserved keywords is bad practice).

SELECT something FROM table WHERE where = 1;

vs.

SELECT something FROM `table` WHERE `where` = 1;

MySQL - find events that occured x-times during given period

7 votes

Let's say I have following table:

CREATE TABLE `occurences` (
  `object_id` int(10) NOT NULL,
  `seen_timestamp` int(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

which contains ID of object (not unique, it repeats) and timestamp when this object ID has been observed.

Observation is running 24/7 and inserts every occurrence of object ID with current timestamp.

Now I want to write query to select all object IDs which has been seen during any 10 minute period at least 7 times.

It should function like detection of intrusion.

Similar algorithm is used in denyhost script which checks for invalid SSH logins. If find configured number of occurrences during configured time period, it blocks IP.

Any good suggestion?

This should work:

SET @num_occurences = 7; -- how many occurences should occur in the interval
SET @max_period = 10; -- your interval in seconds

SELECT offset_start.object_id FROM 
(SELECT @rownum_start := @rownum_start+1 AS idx, object_id, seen_timestamp 
 FROM occurences, (SELECT @rownum_start:=0) r ORDER BY object_id ASC, seen_timestamp ASC) offset_start
JOIN
(SELECT @rownum_end := @rownum_end + 1 AS idx, object_id, seen_timestamp 
 FROM occurences, (SELECT @rownum_end:=0) r ORDER BY object_id ASC, seen_timestamp ASC) offset_end
   ON offset_start.object_id = offset_end.object_id 
  AND offset_start.idx + @num_occurences - 1 = offset_end.idx
  AND offset_end.seen_timestamp - offset_start.seen_timestamp <= @max_period
GROUP BY offset_start.object_id;

You can move @num_occurences and @num_occurences to your code and set these as parameters of your statement. Depending on your client you can also move the the initialisation of @rownum_start and @rownum_end in front of the query, that might improve the query performance (you should test that nontheless, just a gut feeling looking at the explain of both versions)

Here's how it works:

It selects the entire table twice and joins each row of offset_start with the row in offset_end which has an offset of @num_occurences. (This is done using the @rownum_* variables to create the index of each row, simulating row_number() functionality known from other rdbms).
Then it just checks whether the two rows refer to the same object_id and satisfy the period requirements.
Since this is done for every occurence row, the object_id would be returned multiple times if the number of occurences is actually larger than @max_occurences, so it's grouped in the end to make the returned object_ids unique

1.3M queries/Hour. How would you construct the queries?

7 votes

I have an online iphone turnbased game, with lots of games running at the same time. I'm in the process of optimizing the code, since both me and the server have crashed today.

This is the setup:

Right now I have one table, "matches" (70 fields of data for each row. The structure), that keep track of all the active matches. Every 7 seconds, the iphone will connect, download all the matches in the "matches" table that he/she is active in, and update the UI in the iphone.

This worked great until about 1,000 people downloaded the game and played. The server crashed.

So to optimize, I figure I can create a new table called "matches_needs_update". This table have 2 rows; name and id. The "id" is the same as the match in the "matches" table. When a match is updated, it's put in this table.

Now, instead for search through the whole "matches" table, the query just check if the player have any matches that need to be updated, and then get those matches from the "matches" table.

My question is twofold:

  1. Is this the optimal solution?
  2. If a player is active in, say 10 matches, is there a good way to get those 10 matches from the "matches" table at the same time, or do I need a for loop doing 10 queries, one for each match:

    "SELECT * FROM matches WHERE id = ?"

Thanks in advance

I suggest APC...

...as you're on PHP, and I assume you're doing this from a single mysql database,

It's easy to install, and will be default from PHP 6 onwards.

Keep this 1 table in memory and it will fly.

Php mysql, pdo and query: why multiple 'query' don't work (in this case)?

7 votes

Here's my sql request:

$sql
    = 'CREATE TEMPORARY TABLE tmp '
    . 'SELECT * FROM '.$table.' '
    . 'WHERE id=:id; '
    . 'ALTER TABLE tmp drop ID; '
    . 'INSERT INTO '.$table.' '
    . 'SELECT 0,tmp.* FROM tmp; '
    . 'SET @last=LAST_INSERT_ID(); '
    . 'DROP TABLE tmp;'
    . 'SELECT @last; ';
    $stmt = $this->bd->execQuery($sql, array(':id'=>101));
    echo "1 -> = "; var_export($stmt); echo "\n";
    $stmt = $stmt->fetch(PDO::FETCH_OBJ);
    echo "2 -> = "; var_export($stmt); echo "\n";

The dump talk by itself: the query works (I've checked).

sql =
'CREATE TEMPORARY TABLE tmp SELECT * FROM categorie WHERE id=:id; ALTER TABLE tmp drop ID; INSERT INTO categorie SELECT 0,tmp.* FROM tmp; SET @last=LAST_INSERT_ID(); DROP TABLE tmp;SELECT @last; '
params = array (
  ':id' => 101,
)
1 -> = PDOStatement::__set_state(array(
   'queryString' => 'CREATE TEMPORARY TABLE tmp SELECT * FROM categorie WHERE id=:id; ALTER TABLE tmp drop ID; INSERT INTO categorie SELECT 0,tmp.* FROM tmp; SET @last=LAST_INSERT_ID(); DROP TABLE tmp;SELECT @last; ',
))
2 -> = false

If I do it "by hand" on the console line it works too (sorry for the looong line of code):

mysql> CREATE TEMPORARY TABLE tmp SELECT * FROM categorie WHERE id=101; ALTER TABLE tmp drop ID; INSERT INTO categorie SELECT 0,tmp.* FROM tmp; SET @last=LAST_INSERT_ID(); DROP TABLE tmp;SELECT @last; 
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

Query OK, 1 row affected (0.07 sec)
Records: 1  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

+-------+
| @last |
+-------+
|   141 |
+-------+
1 row in set (0.00 sec)

mysql> 

Here's my code that is being executed.

public function execQuery($sql, $tab=array())
{
    $stmt = self::$_pdo->prepare($sql);
    if ($stmt===false) {
        throw new Exception(
            'Erreur prepare '.$sql.
            ' = '.var_export(self::$_pdo->errorInfo(), true)
        );
    }
    foreach ($tab as $key=>$valeur) {
        $stmt->bindValue($key, $valeur);
    }
    if ($stmt->execute()===false) {
        throw new Exception(
            "Erreur execution de la requete :\n\"".$sql."\"\n".
            "Paramètres de la requete :\n\"".var_export($tab, true)."\"\n".
            "Details de l'erreur : \n".var_export(self::$_pdo->errorInfo(), true)
        );
    }
    return $stmt;
}

How can I do to get the last inserted value in one shot (= make what I did work)?

As mentioned in my comment above, whilst it doesn't answer your question of how you issue multiple SQL commands in one query from PHP, one workaround would be to put your SQL in a stored procedure using a prepared statement:

DELIMITER ;;

CREATE PROCEDURE copyRecord(TableName VARCHAR(20), id INT) BEGIN
  -- prevent SQL injection
  SET TableName = CONCAT('`', REPLACE(TableName, '`', '``'), '`');
  SET @id = id;

  SET @sql = CONCAT('
    CREATE TEMPORARY TABLE tmp SELECT * FROM ', TableName, ' WHERE id = ?
  ');
  PREPARE stmt FROM @sql;
  EXECUTE stmt USING @id;
  DEALLOCATE PREPARE stmt;

  ALTER TABLE tmp drop ID;

  SET @sql = CONCAT('
    INSERT INTO ', TableName, ' SELECT 0,tmp.* FROM tmp
  ');
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DEALLOCATE PREPARE stmt;

  DROP TABLE tmp;

  SET @sql = NULL;
  SET @id  = NULL;
  SELECT LAST_INSERT_ID();
END;;

DELIMITER ;

From PHP you would then simply invoke the SQL command CALL copyRecord('categorie', 101).

Writing a MySQL database find/replace script in PHP

6 votes

I'm currently in the process of redeveloping my bespoke website to a WordPress-driven CMS.

The website I've been working on is simply my existing URL plus /dev/, i.e. http://my.website.com/dev/.

I'll be moving this website to http://my.website.com over the weekend, and as such will need to remove all references to the /dev/ URL.

What I'd like to do is, basically, a "find and replace" for /dev on my database. I can see exactly which tables have this value in, but naturally as per a WordPress install, quite a lot of these fields are serialized data - which a straightforward dump > open with notepad++ > find & replace will break.

The code I've developed for this purpose is here:

<?php

$look_at[] = array( "table" => "wp_options", "fields" => array( "option_value" ), "id_field" => "option_id" );
$look_at[] = array( "table" => "wp_postmeta", "fields" => array( "meta_value" ), "id_field" => "meta_id" );
$look_at[] = array( "table" => "wp_posts", "fields" => array( "post_content", "guid" ), "id_field" => "ID" );
$look_at[] = array( "table" => "wp_sfmeta", "fields" => array( "meta_value" ), "id_field" => "meta_id" );
$look_at[] = array( "table" => "wp_sfoptions", "fields" => array( "option_value" ), "id_field" => "option_id" );
$look_at[] = array( "table" => "wp_sferrorlog", "fields" => array( "error_text" ), "id_field" => "id" );

for ( $i = 0; $i < sizeof ( $look_at ); $i++ ) {
    foreach( $look_at[$i]["fields"] as $field ) {

        $sql = 'SELECT `' . $field . '`, `' . $look_at[$i]["id_field"] . '` FROM  `' . $look_at[$i]["table"] . '`;';
        $res = mysql_query( $sql );

        while ( $row = mysql_fetch_assoc( $res ) ) {

            $table = $look_at[$i]["table"];
            $id_field = $look_at[$i]["id_field"];
            $old_val = $row[$field];
            $id = $row[$id_field];

            $unserialized_value = @unserialize( $old_val );

            if ( $old_val === 'b:0;' || $unserialized_value !== false )
                $new_val = serialize( str_replace( array( "/dev/", "/dev" ), array( "/", "" ), $unserialized_value ) );
            else
                $new_val = str_replace( array( "/dev/", "/dev" ), array( "/", "" ), $old_val );

            $update_array[] = array( "id_field" => $id_field, "id" => $id, "table" => $table, "key" => $key, "old_val" => $old_val, "new_val" => $new_val );

        }

    }
}

for ( $i = 0; $i < sizeof( $update_array ); $i++ ) {
    if ( $update_array[$i]["old_val"] !== $update_array[$i]["new_val"] )
        $updated_sql .= 'UPDATE ' . $update_array[$i]["table"] . ' SET `' . $update_array[$i]["key"] . '` = \'' . $update_array[$i]["new_val"] . '\' WHERE `' . $update_array[$i]["id_field"] . '` = \'' . $update_array[$i]["id"] . '\';';
}

mysql_query( $updated_sql );

?>

An example of the serialized data:

a:6:{s:5:"width";s:3:"400";s:6:"height";s:3:"530";s:14:"hwstring_small";s:22:"height='96' width='72'";s:4:"file";s:30:"2011/12/Amazonas-English-1.jpg";s:5:"sizes";a:13:{s:9:"thumbnail";a:3:{s:4:"file";s:30:"Amazonas-English-1-125x165.jpg";s:5:"width";s:3:"125";s:6:"height";s:3:"165";}s:6:"medium";a:3:{s:4:"file";s:30:"Amazonas-English-1-339x450.jpg";s:5:"width";s:3:"339";s:6:"height";s:3:"450";}s:5:"large";s:0:"";s:14:"post-thumbnail";a:3:{s:4:"file";s:30:"Amazonas-English-1-125x165.jpg";s:5:"width";s:3:"125";s:6:"height";s:3:"165";}s:23:"indexleft-species-thumb";a:3:{s:4:"file";s:30:"Amazonas-English-1-200x265.jpg";s:5:"width";s:3:"200";s:6:"height";s:3:"265";}s:13:"species-thumb";a:3:{s:4:"file";s:30:"Amazonas-English-1-288x381.jpg";s:5:"width";s:3:"288";s:6:"height";s:3:"381";}s:17:"indexheader-thumb";a:5:{s:4:"file";s:30:"Amazonas-English-1-400x300.jpg";s:5:"width";s:3:"400";s:6:"height";s:3:"300";s:4:"path";s:38:"2011/12/Amazonas-English-1-400x300.jpg";s:3:"url";s:88:"http://www.xxxxxxxxxxx.com/dev/wp-content/uploads/2011/12/Amazonas-English-1-400x300.jpg";}s:14:"random-thumb-1";a:3:{s:4:"file";s:28:"Amazonas-English-1-56x75.jpg";s:5:"width";s:2:"56";s:6:"height";s:2:"75";}s:14:"random-thumb-2";a:3:{s:4:"file";s:29:"Amazonas-English-1-75x100.jpg";s:5:"width";s:2:"75";s:6:"height";s:3:"100";}s:14:"random-thumb-3";a:3:{s:4:"file";s:29:"Amazonas-English-1-94x125.jpg";s:5:"width";s:2:"94";s:6:"height";s:3:"125";}s:14:"random-thumb-4";a:3:{s:4:"file";s:30:"Amazonas-English-1-113x150.jpg";s:5:"width";s:3:"113";s:6:"height";s:3:"150";}s:14:"random-thumb-5";a:3:{s:4:"file";s:30:"Amazonas-English-1-132x175.jpg";s:5:"width";s:3:"132";s:6:"height";s:3:"175";}s:13:"d4p-bbp-thumb";s:0:"";}s:10:"image_meta";a:10:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";}}

ADDITIONAL EDIT

Unfortunately, there are other instances of /dev/ in other serialized arrays, such as this example:

'a:1:{i:0;a:5:{s:4:"type";s:5:"image";s:3:"loc";s:107:"/home/xxxxx/domains/xxxxxxxxx.com/public_html/dev/wp-content/sp-resources/forum-image-uploads/matt/2012/01/";...

Or,

a:1:{i:0;a:5:{s:4:"data";s:88:"Your search - <b>link:http://www.xxxxxxxxx.com/dev/</b> - did not match any documents. ";...

As such, I don't think a simple preg_replace (or callback) will do the trick, but I guess an advanced one might?


My questions are:

  1. Is there a simpler way of doing this?!
  2. Will the above code run into any problems?

I'm awful at forseeing problems with my code (bad programmer, I do apologise) and as such a little apprehensive about running tests with this code.


FINAL EDIT: WORKING CODE

Because my SQL dump was nearly 100mb, I had to use WAMP with unlimited memory.

<?php
    error_reporting(E_ALL);
    ini_set('display_errors', 'On');
    ini_set('memory_limit', '-1');

    $handle = @fopen("amend-this.sql", "r");
    if ($handle) {
        while (($buffer = fgets($handle, 4096)) !== false) {
          $newLine = preg_replace_callback('@s:(\d+)(:\\\"[^"]*www.seriouslyfish\.com)/dev@', create_function('$matches', 'return \'s:\'.($matches[1] - 4).$matches[2];'), $buffer);
          $newLine = preg_replace_callback('@s:(\d+)(:\\\"[^\\\"]*/home/sfish/domains/seriouslyfish\.com/public_html)/dev@', create_function('$matches', 'return \'s:\'.($matches[1] - 4).$matches[2];'), $newLine);
          $newLine = str_replace('http://dunc.seriouslyfish.com/dev/', 'http://www.seriouslyfish.com/', $newLine);
          $newLine = str_replace('http://www.seriouslyfish.com/dev/', 'http://www.seriouslyfish.com/', $newLine);
          $newLine = str_replace('/dev', '', $newLine);
          file_put_contents( "amended.sql", $newLine, FILE_APPEND );
        }
        fclose($handle);
    }
?>

This code put my new SQL file into the same directory (X:\wamp\www) for me to manipulate further.

I had a few issues with data-repetition, and there were 67 instances of /dev still in the file for some reason but I used Notepad++ and WinMerge to sort all of this out and in the end it took me around 45 minutes to search/replace a database of over 90 million characters.

When I had the same problem I ran a mysqldump of the database, then opened in a text editor and just search/replaced the values, before using the SQL to create the new database. Quite simple, surprisingly fast,especially for a one off.

As pointed out, you have the problem with serialized data, so you could do a similar thing with a simple PHP file:

<?php
$handle = @fopen("/tmp/dump.sql", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
      $newLine = preg_replace_callback('@s:(\d+)(:\\\"[^"]*xxxxxxxxxxx\.com)/dev@', create_function('$matches', 'return \'s:\'.($matches[1] - 4).$matches[2];'), $buffer);
      $newLine = preg_replace_callback('@s:(\d+)(:\\\"[^\\\"]*xxxxxxxxxxx\.com/public_html)/dev@', create_function('$matches', 'return \'s:\'.($matches[1] - 4).$matches[2];'), $newLine);
      $newLine = str_replace('http://www.xxxxxxxxxxx.com/dev/', 'http://www.xxxxxxxxxxx.com/', $newLine);
      echo $newLine;
    }
    fclose($handle);
}
?>

Note: this works on a mysqldump, if you're testing, you'll need to remove the \\\ before the "s in the preg_replace_callbacks - this is just mysqldump escaping quotes.

Also Note: There are two preg replaces (one for normal URLs and one for server paths), and one str replace for standard URLs left over.

Select photos by multiple tags

6 votes

I have three MySQL tables - photos, tags and tagsphotos - and m:n relationship between photos and tags.

Photos:     id | filename | ...
Tags:       id | name
Tagsphotos: photo | tag

I want to select all photos with this condition:

(tagged as "dirty" AND tagged as "road") AND (tagged as "light.front" OR tagged as "light.side") AND (tagged as "perspective.two-point")

...which means that I want to find all pictures with dirty road, in two-point perspective and either with side or front light.

How can I do it? Thanks.

I think you're going to have to join the tags table to the photos table four times... pretty ugly.

SELECT Photos.*
FROM
  Photos
  JOIN (
    Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t1 ON (t1.photo = Photos.id)
  JOIN (
    Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t2 ON (t2.photo = Photos.id)
  JOIN (
    Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t3 ON (t3.photo = Photos.id)
  JOIN (
    Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
  ) t4 ON (t4.photo = Photos.id)
WHERE
      (t1.name = 'dirty' AND t2.name = 'road')
  AND (t3.name = 'light.front' OR t3.name = 'light.side')
  AND (t4.name = 'perspective.two-point')

Subqueries would probably be faster:

SELECT *
FROM Photos
WHERE
  Photos.id IN (
    SELECT Tagspohotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'dirty'
  )
  AND Photos.id IN (
    SELECT Tagspohotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'road'
  )
  AND Photos.id IN (
    SELECT Tagspohotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'light.front' OR Tags.name = 'light.side'
  )
  AND Photos.id IN (
    SELECT Tagspohotos.photo
    FROM Tagsphotos JOIN Tags ON (Tags.id = Tagsphotos.tag)
    WHERE Tags.name = 'perspective.two-point'
  )

MySQL: why varchar(254) and not varchar(255)?

5 votes

Why do so many developers set varchar to 254 and not 255 when creating MySQL tables?

Proof that this happens: mysql varchar 254

varchar fields require n+1 bytes for fields less than or equal to 255 and required n+2 bytes for fields > 255

http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

It should be set to 255, I'm assuming developers think they will save an extra byte from 254, but 255 is the standard

Database build process management

5 votes

What options exists to manage database scripts and do a new development for database:

For example, the database used by a number of applications and there are a number of developers working with database, what will be the best options to maintain database up to date with the last changes and what should be the process of deployment changes to production

I see two options:

  1. Microsoft visual studio has a database project, so all database scripts should be add in the project and database can be rebuild from visual studio
  2. Restore database from backup and apply only new scripts to database

What another options exists? How can I manage database development, what is the best practices? what will be advantages and disadvantages of options I write above? How to maintain new sql scripts?

I understand then source control system should be used, but with DB scripts it's not so easy as with application.

I believe it will be no universal solution, but at least I am interesting in DB developers opinion how it's implemented in your company.

Liquibase is IMHO the best tool. It's brutally simple in its approach, which is one of the reasons it works so well.

You can read up on the site how it works, but basically it creates and manages a simple table that stores a hash of each script to determine if it has run a script of not. There's pre- and post- sql too, and you can bypass on conditions... it does pretty much everything you'd want or need. It also has maven integration, so it can seamlessly become part of your build.

I used it very successfully on a large (8 developers) project and now I wouldn't use anything else.

And it's free!

Cannot connect to mysql with 127.0.0.1

5 votes

With the following code I can connect to mysql: mysql_connect("localhost","username","");

But if I change localhost to 127.0.0.1 I get the following error:

Can't connect to MySQL server on '127.0.0.1' (13)

Why doesn't it work with 127.0.0.1?

localhost is special cased and uses UNIX sockets instead of TCP/IP. 127.0.0.1 doesn't get that special handling.

See the documentation:

On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a --port or -P option is given to specify a port number. To ensure that the client makes a TCP/IP connection to the local server, use --host or -h to specify a host name value of 127.0.0.1, or the IP address or name of the local server. You can also specify the connection protocol explicitly, even for localhost, by using the --protocol=TCP option.

If it doesn't work when you use TCP/IP then the database probably isn't listening on the network. This is generally a good thing as it enhances security (not that listening on 127.0.0.1 exposes any problems, but listening on all interfaces gives more opportunity for attacks).

If you really want to allow connections via the network, then see skip-networking.

Is it harmful to store javascript and css in a database?

5 votes

Ok so I'm trying to store user custom css and javascript in mysql database to be used later the head section on the page, so is it a good idea to store css and javascript in a database ? and if not what is the safe way to do this ?

I'm using wordpress and using esc_js() on javascript code I noticed it adds a backslash before quotes, and adds \n instead of new lines, so is it enough for javascript ?

and thanks in advance.

Short answer: Yes, it's fine as long as it's people you trust.

Long answer:

Make sure you 100% trust your users. There's a lot of exploits that can be done through CSS/JS, so make sure you're only allowing people you completely trust to do this. Even if you tried to implement filtering of some kind, people would find ways around.

Depending on how easily you could figure out the file name, I might consider storing the data in a file. After all, the database is unnecessary overhead if you can readily determine the file name. If you can't determine the file name easily, I just go ahead and store it in the database. (Or if you're planning on storing any meta data or version information or anything, I would go ahead and go for a DB.)

Performance bulk-loading data from an XML file to MySQL

3 votes

Should an import of 80GB's of XML data into MySQL take more than 5 days to complete?

I'm currently importing an XML file that is roughly 80GB in size, the code I'm using is in this gist and while everything is working properly it's been running for almost 5 straight days and its not even close to being done ...

The average table size is roughly:

Data size: 4.5GB
Index size: 3.2GB
Avg. Row Length: 245
Number Rows: 20,000,000

Let me know if more info is needed!

Server Specs:

Note this is a linode VPS

Intel Xeon Processor L5520 - Quad Core - 2.27GHZ 4GB Total Ram

XML Sample

https://gist.github.com/2510267

Thanks!


After researching more regarding this matter this seems to be average, I found this answer which describes ways to improve the import rate.

One thing which will help a great deal is to commit less frequently, rather than once-per-row. I would suggest starting with one commit per several hundred rows, and tuning from there.

Also, the thing you're doing right now where you do an existence check -- dump that; it's greatly increasing the number of queries you need to run. Instead, use ON DUPLICATE KEY UPDATE (a MySQL extension, not standards-compliant) to make a duplicate INSERT automatically do the right thing.

Finally, consider building your tool to convert from XML into a textual form suitable for use with the mysqlimport tool, and using that bulk loader instead. This will cleanly separate the time needed for XML parsing from the time needed for database ingestion, and also speed the database import itself by using tools designed for the purpose (rather than INSERT or UPDATE commands, mysqlimport uses a specialized LOAD DATA INFILE extension).