Best mysql questions in October 2010

How does Facebook achieve good performance?

43 votes

Almost everyone has a Facebook account, even people who are not familiar with the Internet. With millions people actively using Facebook, updating their status, replying to messages, uploading photos and so on, how is Facebook's page still loading very fast?

I was told that Facebook was built using only PHP and MySQL, so how can Facebook's performance be so good?

Note: This needs to be updated.

  1. Facebook uses HipHop, which converts PHP into C++ code (which is then compiled into much more efficient machine code than actual PHP).

  2. Facebook has data distributed across many, many servers. For example, they also use Hadoop clusters for some of their data storage.

  3. memcached :)

Node.JS and MySQL drivers

9 votes

Is there a Node.JS Driver for MySQL that is commonly used other than http://github.com/masuidrive/node-mysql ? This driver is unstable (says the creator). It seems like there is not much activity with node.js database drivers. Is there a reason for this or is it just because Node.JS is so young?

Here are some options:

What is wrong with this MySQL Query?

8 votes

It's 12:30am and I have been coding for 9 hours straight. I really need to get this project done, but MySQL is messing with my deadline. Could you examine this snippet for me and see if you can find out what is wrong?

PHP/MySQL Query

$q = $this->db->query("SELECT * FROM bans WHERE ip='".$ip."'");

Keeps returning the following error...

MYSQL Error [Oct 6th, 2010 11:31pm CDT]
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '* FROM bans WHERE ip='206.53.90.231'' at line 1 (1064)

I do not see anything wrong with the query. I've even tried different methods of including the variable $ip but with no avail.

EDIT:
Just to add in here, the ip column in my database is a varchar(255).

EDIT 2:
Here is the whole affected code. Keep in mind that this is all in a class. If I'm missing something, let me know.

Line from another Function

if($this->isBanned($_SERVER['REMOTE_ADDR'])===true) { return json_encode(array('error'=>'You are banned from this ShoutBox.')); }

Affected Function

function isBanned($ip) {
    $q = $this->db->query("SELECT * FROM bans WHERE ip='".$ip."'"); $num = $this->db->affected_rows;
    if($num>0) { $row = $this->db->fetch_array($q); if(($row['expires'] < time()) && ($row['expires'] !== 0)) { $this->unbanUser($ip,'internal'); return false; } return true; } return false;
}

unbanUser function

function unbanUser($ip,$t='box') {
    $q = $this->db->query("SELECT * FROM bans WHERE ip='".$ip."'"); $num = $this->db->affected_rows; if($num>0) { $q = $this->db->query("DELETE * FROM bans WHERE ip='".$ip."'"); 
    return (($t=='box') ? json_encode(array('status'=>'removed')) : true); } else { return (($t=='box') ? json_encode(array('error'=>'Unable to locate the user.')) : true); }
}

I think it may be It is your DELETE statement which is causing the error.

Remove the * after the DELETE and it should be fine.

Need help understanding MySQL injection

8 votes

From http://www.tizag.com/mysqlTutorial/mysql-php-sql-injection.php I got:

SQL injection refers to the act of someone inserting a MySQL statement to be run on your database without your knowledge. Injection usually occurs when you ask a user for input, like their name, and instead of a name they give you a MySQL statement that you will unknowingly run on your database.

I read the whole article but I still have some major issues understand what it is and how can it be done.

In the first example, what will they actually see?

As far as i understood, if I actually echo $name, the will see all the names because it will always "be true" am I correct?

The other thing I dont understand is whether THE MySQL injection problem is solved with mysql_real_escape_string(), there has to be more to it.

What I really dont get is that mysql_real_escape_string() is made to solve that issue, why isn“t this done automatically, I mean is there a reason you have to add every time mysql_real_escape_string(), is there cases when you should use it and thats why they dont make this automatic?

I hope the question is clear enough, maybe my luck of understanding of the topic makes the question confusing so please ask for any clarification if necessary!

Thanks in advance!!

MySQL won't escape automatically, because you build the query string yourself. For example:

$query = 'SELECT * FROM users WHERE name="' . $name . '"';

You just pass the raw string stored in $query, which is open to SQL injection. For example, if $name is [something" OR "1=1] your query string ends up being:

$query = 'SELECT * FROM users WHERE name="something" OR "1=1"

That would return every user from the user table. Which is why you need to escape values. However, if you use PDO, it is done for you if you use the binding functionality. It's a 2 step process, preparing the querying, then "binding" the data/variables to the placeholders. In PDO, your query string would look something like this:

$query = 'SELECT * FROM users WHERE name=":name"';
$bindings = array('name'=>'something');
prepare($query);
execute($bindings);

Then, things are automatically escaped for you.

Does the order of fields in a WHERE clause affect performance in MySQL?

8 votes

I have two indexed fields in a table - type and userid (individual indexes, not a composite).

types field values are very limited (let's say it is only 0 or 1), so 50% of table records have the same type. userid values, on the other hand, come from a much larger set, so the amount of records with the same userid is small.

Will any of these queries run faster than the other:

select * from table where type=1 and userid=5
select * from table where userid=5 and type=1

Also if both fields were not indexed, would it change the behavior?

SQL was designed to be a declarative language, not a procedural one. So the query optimizer should not consider the order of the where clause predicates in determining how to apply them.

I'm probably going to waaaay over-simplify the following discussion of an SQL query optimizer. I wrote one years ago, along these lines (it was tons of fun!). If you really want to dig into modern query optimization, see Dan Tow's SQL Tuning, from O'Reilly.

In a simple SQL query optimizer, the SQL statement first gets compiled into a tree of relational algebra operations. These operations each take one or more tables as input and produce another table as output. Scan is a sequential scan that reads a table in from the database. Sort produces a sorted table. Select produces a table whose rows are selected from another table according to some selection condition. Project produces a table with only certain columns of another table. Cross Product takes two tables and produces an output table composed of every conceivable pairing of their rows.

Confusingly, the SQL SELECT clause is compiled into a relational algebra Project, while the WHERE clause turns into a relational algebra Select. The FROM clause turns into one or more Joins, each taking two tables in and producing one table out. There are other relational algebra operations involving set union, intersection, difference, and membership, but let's keep this simple.

This tree really needs to be optimized. For example, if you have:

select E.name, D.name 
from Employee E, Department D 
where E.id = 123456 and E.dept_id = D.dept_id

with 5,000 employees in 500 departments, executing an unoptimized tree will blindly produce all possible combinations of one Employee and one Department (a Cross Product) and then Select out just the one combination that was needed. The Scan of Employee will produce a 5,000 record table, the Scan of Department will produce a 500 record table, the Cross Product of those two tables will produce a 2,500,000 record table, and the Select on E.id will take that 2,500,000 record table and discard all but one, the record that was wanted.

[Real query processors will try not to materialize all of these intermediate tables in memory of course.]

So the query optimizer walks the tree and applies various optimizations. One is to break up each Select into a chain of Selects, one for each of the original Select's top level conditions, the ones and-ed together. (This is called "conjunctive normal form".) Then the individual smaller Selects are moved around in the tree and merged with other relational algebra operations to form more efficient ones.

In the above example, the optimizer first pushes the Select on E.id = 123456 down below the expensive Cross Product operation. This means the Cross Product just produces 500 rows (one for each combination of that employee and one department). Then the top level Select for E.dept_id = D.dept_id filters out the 499 unwanted rows. Not bad.

If there's an an index on Employee's id field, then the optimizer can combine the Scan of Employee with the Select on E.id = 123456 to form a fast index Lookup. This means that only one Employee row is read into memory from disk instead of 5,000. Things are looking up.

The final major optimization is to take the Select on E.dept_id = D.dept_id and combine it with the Cross Product. This turns it into a relational algebra Equijoin operation. This doesn't do much by itself. But if there's an index on Department.dept_id, then the lower level sequential Scan of Department feeding the Equijoin can be turned into a very fast index Lookup of our one employee's Department record.

Lesser optimizations involve pushing Project operations down. If the top level of your query just needs E.name and D.name, and the conditions need E.id, E.dept_id, and D.dept_id, then the Scan operations don't have to build intermediate tables with all the other columns, saving space during the query execution. We've turned a horribly slow query into two index lookups and not much else.

Getting more towards the original question, let's say you've got:

select E.name 
from Employee E 
where E.age > 21 and E.state = 'Delaware'

The unoptimized relational algebra tree, when executed, would Scan in the 5,000 employees and produce, say, the 126 ones in Delaware who are older than 21. The query optimizer also has some rough idea of the values in the database. It might know that the E.state column has the 14 states that the company has locations in, and something about the E.age distributions. So first it sees if either field is indexed. If E.state is, it makes sense to use that index to just pick out the small number of employees the query processor suspects are in Delaware based on its last computed statistics. If only E.age is, the query processor likely decides that it's not worth it, since 96% of all employees are 22 and older. So if E.state is indexes, our query processor breaks the Select and merges the E.state = 'Delaware' with the Scan to turn it into a much more efficient Index Scan.

Let's say in this example that there are no indexes on E.state and E.age. The combined Select operation takes place after the sequential "Scan" of Employee. Does it make a difference which condition in the Select is done first? Probably not a great deal. The query processor might leave them in the original order in the SQL statement, or it might be a bit more sophisticated and look at the expected expense. From the statistics, it would again find that the E.state = 'Delaware' condition should be much more highly selective, so it would reverse the conditions and do that first, so that there are only 126 E.age > 21 comparisons instead of 5,000. Or it might realize that string equality comparisons are much more expensive than integer compares and leave the order alone.

At any rate, all this is very complex and your syntactic condition order is very unlikely to make a difference. I wouldn't worry about it unless you have a real performance problem and your database vendor uses the condition order as a hint.

8 votes

Is their any GUI-interface for MySQL, like Microsoft SQL Server 2005 Server Management Studio provides? Because basically I am operating MySQL with command prompt interface.

Mysql query browser:

http://dev.mysql.com/downloads/gui-tools/5.0.html

How to efficiently find the closest locations nearby a given location.

7 votes

I'm sorry for the vague question title... its going to take a bit of explaining...

Basically I'm making a script where a load of business are loaded into a mySQL database with a latitude and longitude. Then I am supplying that script with a latitude an longitude (of the end user) and the script has to calculate the distance from the supplied lat/long to EACH of the entries it gets from the database and order them in order of nearest to furthest.

I only realistically need about 10 or 20 "nearest" results, but I can't think of anyway to do this other than to get all the results from the database and run the function on each of them and then array sort.

This is what I have already:

<?php

function getDistance($point1, $point2){

    $radius      = 3958;      // Earth's radius (miles)
    $pi          = 3.1415926;
    $deg_per_rad = 57.29578;  // Number of degrees/radian (for conversion)

    $distance = ($radius * $pi * sqrt(
                ($point1['lat'] - $point2['lat'])
                * ($point1['lat'] - $point2['lat'])
                + cos($point1['lat'] / $deg_per_rad)  // Convert these to
                * cos($point2['lat'] / $deg_per_rad)  // radians for cos()
                * ($point1['long'] - $point2['long'])
                * ($point1['long'] - $point2['long'])
        ) / 180);

    $distance = round($distance,1);
    return $distance;  // Returned using the units used for $radius.
}

include("../includes/application_top.php");

$lat = (is_numeric($_GET['lat'])) ? $_GET['lat'] : 0;
$long = (is_numeric($_GET['long'])) ? $_GET['long'] : 0;

$startPoint = array("lat"=>$lat,"long"=>$long);

$sql = "SELECT * FROM mellow_listings WHERE active=1"; 
$result = mysql_query($sql);

while($row = mysql_fetch_array($result)){
    $thedistance = getDistance($startPoint,array("lat"=>$row['lat'],"long"=>$row['long']));
    $data[] = array('id' => $row['id'],
                    'name' => $row['name'],
                    'description' => $row['description'],
                    'lat' => $row['lat'],
                    'long' => $row['long'],
                    'address1' => $row['address1'],
                    'address2' => $row['address2'],
                    'county' => $row['county'],
                    'postcode' => strtoupper($row['postcode']),
                    'phone' => $row['phone'],
                    'email' => $row['email'],
                    'web' => $row['web'],
                    'distance' => $thedistance);
}

// integrate google local search
$url = "http://ajax.googleapis.com/ajax/services/search/local?";
$url .= "q=Off+licence";    // query
$url .= "&v=1.0";           // version number
$url .= "&rsz=8";           // number of results
$url .= "&key=ABQIAAAAtG"
        ."Pcon1WB3b0oiqER"
        ."FZ-TRQgsWYVg721Z"
        ."IDPMPlc4-CwM9Xt"
        ."FBSTZxHDVqCffQ2"
        ."W6Lr4bm1_zXeYoQ"; // api key
$url .= "&sll=".$lat.",".$long;

// sendRequest
// note how referer is set manually
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, /* url */);
$body = curl_exec($ch);
curl_close($ch);

// now, process the JSON string
$json = json_decode($body, true);

foreach($json['responseData']['results'] as $array){

    $thedistance = getDistance($startPoint,array("lat"=>$array['lat'],"long"=>$array['lng']));
    $data[] = array('id' => '999',
                    'name' => $array['title'],
                    'description' => '',
                    'lat' => $array['lat'],
                    'long' => $array['lng'],
                    'address1' => $array['streetAddress'],
                    'address2' => $array['city'],
                    'county' => $array['region'],
                    'postcode' => '',
                    'phone' => $array['phoneNumbers'][0],
                    'email' => '',
                    'web' => $array['url'],
                    'distance' => $thedistance);

}

// sort the array
foreach ($data as $key => $row) {
$id[$key] = $row['id'];
$distance[$key] = $row['distance'];
}

array_multisort($distance, SORT_ASC, $data); 

header("Content-type: text/xml"); 


echo '<?xml version="1.0" encoding="UTF-8"?>'."\n";
echo '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">'."\n";
echo '<plist version="1.0">'."\n";
echo '<array>'."\n";

for($i = 0; isset($distance[$i]); $i++){
    //echo $data[$i]['id']." -> ".$distance[$i]."<br />";
    echo '<dict>'."\n";
        foreach($data[$i] as $key => $val){
            echo '<key><![CDATA['.$key.']]></key>'."\n";
            echo '<string><![CDATA['.htmlspecialchars_decode($val, ENT_QUOTES).']]></string>'."\n";
        }
    echo '</dict>'."\n";
}

echo '</array>'."\n";
echo '</plist>'."\n";
?>

Now, this runs fast enough with only 2 or 3 businesses in the database, but I'm currently loading 5k businesses into the database and I'm worried that its going to be incredibly slow running this for EACH entry? What do you think?

Its not the kind of data I could cache either, as the likelihood of two users having the same lat/long is liable to be incredibly rare, and therefore wouldn't help.

What can I do about this?

Thanks for any help and any suggestions. They're all much appreciated.

Option 1: Do the calculation on the database by switching to a database that supports GeoIP.

Option 2: Do the calculation on the database: you're using MySQL, so the following stored procedure should help

CREATE FUNCTION distance (latA double, lonA double, latB double, LonB double)
    RETURNS double DETERMINISTIC
BEGIN
    SET @RlatA = radians(latA);
    SET @RlonA = radians(lonA);
    SET @RlatB = radians(latB);
    SET @RlonB = radians(LonB);
    SET @deltaLat = @RlatA - @RlatB;
    SET @deltaLon = @RlonA - @RlonB;
    SET @d = SIN(@deltaLat/2) * SIN(@deltaLat/2) +
    COS(@RlatA) * COS(@RlatB) * SIN(@deltaLon/2)*SIN(@deltaLon/2);
    RETURN 2 * ASIN(SQRT(@d)) * 6371.01;
END//

EDIT

If you have an index on latitude and longitude in your database, you can reduce the number of calculations that need to be calculated by working out an initial bounding box in PHP ($minLat, $maxLat, $minLong and $maxLong), and limiting the rows to a subset of your entries based on that (WHERE latitude BETWEEN $minLat AND $maxLat AND longitude BETWEEN $minLong AND $maxLong). Then MySQL only needs to execute the distance calculation for that subset of rows.

FURTHER EDIT (as an explanation for the previous edit)

If you're simply using the SQL statement provided by Jonathon (or a stored procedure to calculate the distance) then SQL still has to look through every record in your database, and to calculate the distance for every record in your database before it can decide whether to return that row or discard it.

Because the calculation is relatively slow to execute, it would be better if you could reduce the set of rows that need to be calculated, eliminating rows that will clearly fall outside of the required distance, so that we're only executing the expensive calculation for a smaller number of rows.

If you consider that what you're doing is basically drawing a circle on a map, centred on your initial point, and with a radius of distance; then the formula simply identifies which rows fall within that circle... but it still has to checking every single row.

Using a bounding box is like drawing a square on the map first with the left, right, top and bottom edges at the appropriate distance from our centre point. Our circle will then be drawn within that box, with the Northmost, Eastmost, Southmost and Westmost points on the circle touching the borders of the box. Some rows will fall outside that box, so SQL doesn't even bother trying to calculate the distance for those rows. It only calculates the distance for those rows that fall within the bounding box to see if they fall within the circle as well.

Within PHP, we can use a very simple calculation that works out the minimum and maximum latitude and longitude based on our distance, then set those values in the WHERE clause of your SQL statement. This is effectively our box, and anything that falls outside of that is automatically discarded without any need to actually calculate its distance.

There's a good explanation of this (with PHP code) on the Movable Type website that should be essential reading for anybody planning to do any GeoPositioning work in PHP.

Can I save credit card secret code in the database ?

7 votes

I need to save the credit card numbers and secret codes of users in the database in plain text ( consensus behind obviously ) for automatic operation made from the server.

Is there some problems ?

What do I need to be aware of?

Most credit card processing agreements that I have seen do not allow you to store the code from the back of the card.

There are other security implications of storing plain text credit card numbers, but storing the code is usually specifically disallowed by your agreement. You will need to read yours to make sure you can do that.

As for storing the credit card number, that is also usually a very bad idea. If your database is compromised, you will be held liable and it could cost you a lot of money.

Unless you have a very good reason to store the credit card number and have a very good team working on security, I would not recommend storing any credit card data.

phpmyadmin alternative (ajax, excel-like)

7 votes

(if i searched for "phpmyadmin and excel", i get answers like "hot wo import excel-files"....) i searched for a phpmyadmin-alternative that works faster with ajax and i want to edit once fields directly like EXCEL! click n edit via ajax!! i dont like to click the row and say "now edit", then edit, and click "ok", return and new loading to edit only one field :-(

i hope you have answers for me :-)

Thnaks comod!

Adminer (previously known as phpMinAdmin) is not bad. They also have a specialized DB editing interface called Adminer Editor.

Implementing a random chat among people

6 votes

My idea is to make a website, where people could registry and search for a people to talk. They can choose people from certain country, genre, with certain age and so on.

Yeah, I know there is a lot of websites like this, but I want to implement this, because it looks really challenging.

Can you give me ideas how could I implement this using PHP + MYSQL + Jquery(Ajax)? I am neither a beginner nor advanced with these things.

So, how should this work? One person clicks search button, this person is put in database that he searches for somebody to talk, so what's next? I also want to be able to allow people to talk with a few people at the same moment.

I am not asking for a code or something, just ideas how to code it, no code needed.

Thank you.

I don't think that a synchronous, blocking programming language like PHP is the right platform for such an application. It were much wiser to choose an asynchronous, non-blocking language like JavaScript. This has the great advantage that you may use Long Polling which will improve the chatting experience in your application.

Thus I recommend implementing this using NodeJS. You may want to look at an implementation of a simple chat in node.

SQL statement to get all customers with no orders

6 votes

I have a typical Persons table and an Orders table defined in such a way that I can do JOIN query as the following to return Orders for all Persons.

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo
FROM Persons
INNER JOIN Orders
ON Persons.id=Orders.Person_id

The question is, how do I write a statement that would return all Persons with NO Orders?

I'm using mysql.

Thank all in advance.

You may want to use LEFT JOIN and IS NULL:

SELECT     Persons.LastName, Persons.FirstName
FROM       Persons
LEFT JOIN  Orders ON Persons.id = Orders.Person_id
WHERE      Orders.Person_id IS NULL;

The result of a left join always contains all records of the "left" table (Persons), even if the join-condition does not find any matching record in the "right" table (Orders). When there is no match, the columns of the "right" table will NULL in the result set.

High Number of MySQL Temporary Disk Tables

6 votes

We have noticed that MySQL is reporting a very high number of temporary disk tables (over 10,000) this is reported by Server Density. Were trying to understand a bit more about this.

  • Why are temporary disk tables created by MySQL?
  • What impact do they have on performance?
  • Are they ever removed by MySQL or will this number just increase?

Temp tables can be created for lots of reasons. Any select operation which has a large data set and requires sorting will be written into one. Actual temp tables created by queries directly (TEMPORARY table type) are done on a per-connection basis, so if you've got a script with 50 connections each doing the same temporary table, thats 50 sets of on-disk temp files for them.

Disk-based i/o is the most expensive part of a DBMS, generally, so if these tables are for large data sets, you're probably limiting DB performance to that of your i/o system. But generally, by just existing, they're only chewing up disk space and not much else.

Temp tables for sorting purposes should clean themselves up when the query completes. Temp table of the 'TEMPORARY' type will clean themselves up when the connection they're attached to is closed. If you're using persistent connections, then the TEMPORARY tables will stick around until you (or a program) DROPs them manually.

How can I find the Unix process that owns a local Sleeping MySQL connection?

5 votes

I'm fighting a 'Too many connections' problem with my MySQL process and I've got to the point when mysqladmin processlist -uroot -pXXXXX results in:

+------+------------+-----------+------------+---------+------+-------+------------------+
| Id   | User       | Host      | db         | Command | Time | State | Info             |
+------+------------+-----------+------------+---------+------+-------+------------------+
| 842  | svcControl | localhost | svcObjects | Sleep   | 1772 |       |                  |
| 875  | svcControl | localhost | svcObjects | Sleep   | 1773 |       |                  |
| 884  | svcControl | localhost | svcObjects | Sleep   | 1770 |       |                  |
| 896  | svcControl | localhost | svcObjects | Sleep   | 1540 |       |                  |
| 951  | svcControl | localhost | svcObjects | Sleep   | 709  |       |                  |
| 1023 | svcControl | localhost | svcObjects | Sleep   | 708  |       |                  |
| 1063 | svcControl | localhost | svcObjects | Sleep   | 708  |       |                  |
| 1069 | svcControl | localhost | svcObjects | Sleep   | 708  |       |                  |
| 1075 | svcControl | localhost | svcObjects | Sleep   | 707  |       |                  |
| 1083 | svcControl | localhost | svcObjects | Sleep   | 707  |       |                  |
| 1091 | svcControl | localhost | svcObjects | Sleep   | 706  |       |                  |
| 1097 | svcControl | localhost | svcObjects | Sleep   | 706  |       |                  |
| 1107 | svcControl | localhost | svcObjects | Sleep   | 705  |       |                  |
| 1112 | svcControl | localhost | svcObjects | Sleep   | 702  |       |                  |
| 1120 | svcControl | localhost | svcObjects | Sleep   | 704  |       |                  |
| 1127 | svcControl | localhost | svcObjects | Sleep   | 704  |       |                  |
| 1136 | svcControl | localhost | svcObjects | Sleep   | 704  |       |                  |
| 1141 | svcControl | localhost | svcObjects | Sleep   | 644  |       |                  |
| 1202 | svcControl | localhost | svcObjects | Sleep   | 268  |       |                  |
| 1222 | svcView    | localhost | svcObjects | Sleep   | 8    |       |                  |
| 1224 | svcControl | localhost | svcObjects | Sleep   | 280  |       |                  |
| 1225 | svcControl | localhost | svcObjects | Sleep   | 8    |       |                  |
| 1265 | root       | localhost |            | Query   | 0    |       | show processlist |
+------+------------+-----------+------------+---------+------+-------+------------------+

So, clearly, I'm leaking connections somewhere, but I can't figure out where. I've been looking for sources that tell me how to get from a connection ID to a Unix process ID, but most of the approaches expect the connections to be from remote servers, allowing you to use netstat to debug them. Either that, or they require you to modify your codebase to log each attempted connection, which seems a little like overkill.

So... are there any approaches that anyone knows of? Any files that this sleeping connection will have open, so that I can use fuser, for instance? Or... is it possible that these are zombie connections that have no owning process?

In response to answers below

If I run ps -ef | grep mysql, I see:

root      5960     1  0 Oct14 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql     6007  5960  0 Oct14 ?        00:24:12 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/var/lib/mysql/mysql.sock

Running the suggest commands, I get:

# fuser -u /var/lib/mysql/mysql.sock
/var/lib/mysql/mysql.sock:  6007(mysql)

# lsof -p 5960
COMMAND    PID USER   FD   TYPE DEVICE     SIZE    NODE NAME
mysqld_sa 5960 root  cwd    DIR    8,1     4096       2 /
mysqld_sa 5960 root  rtd    DIR    8,1     4096       2 /
mysqld_sa 5960 root  txt    REG    8,1   735004  117216 /bin/bash
mysqld_sa 5960 root  mem    REG    8,1    50848   90123 /lib/libnss_files-2.5.so
mysqld_sa 5960 root  mem    REG    8,1   129832   87812 /lib/ld-2.5.so
mysqld_sa 5960 root  mem    REG    8,1  1689388   90858 /lib/libc-2.5.so
mysqld_sa 5960 root  mem    REG    8,1    20668   90874 /lib/libdl-2.5.so
mysqld_sa 5960 root  mem    REG    8,1    13276   91815 /lib/libtermcap.so.2.0.8
mysqld_sa 5960 root  mem    REG 253,18 56418144  656494 /usr/lib/locale/locale-archive
mysqld_sa 5960 root  mem    REG 253,18    25462 1573763 /usr/lib/gconv/gconv-modules.cache
mysqld_sa 5960 root    0r   CHR    1,3             1432 /dev/null
mysqld_sa 5960 root    1w   CHR    1,3             1432 /dev/null
mysqld_sa 5960 root    2w   CHR    1,3             1432 /dev/null
mysqld_sa 5960 root  255r   REG 253,18    13077 1181028 /usr/bin/mysqld_safe

lsof -p 6007
COMMAND  PID  USER   FD   TYPE     DEVICE      SIZE     NODE NAME
mysqld  6007 mysql  cwd    DIR     253,19      4096    33005 /var/lib/mysql
mysqld  6007 mysql  rtd    DIR        8,1      4096        2 /
mysqld  6007 mysql  txt    REG     253,18   7413588  1998921 /usr/libexec/mysqld
mysqld  6007 mysql  mem    REG        8,1   1296932    91720 /lib/libcrypto.so.0.9.8e
mysqld  6007 mysql  mem    REG     253,18    612180  1514570 /usr/lib/libkrb5.so.3.3
mysqld  6007 mysql  mem    REG        8,1    293108    91722 /lib/libssl.so.0.9.8e
mysqld  6007 mysql  mem    REG        8,1   3200288    87758 /lib/libnss_ldap-2.5.so
mysqld  6007 mysql  mem    REG        8,1     21948    90121 /lib/libnss_dns-2.5.so
mysqld  6007 mysql  mem    REG        8,1    129832    87812 /lib/ld-2.5.so
mysqld  6007 mysql  mem    REG        8,1   1689388    90858 /lib/libc-2.5.so
mysqld  6007 mysql  mem    REG        8,1    216544    90877 /lib/libm-2.5.so
mysqld  6007 mysql  mem    REG        8,1     20668    90874 /lib/libdl-2.5.so
mysqld  6007 mysql  mem    REG        8,1    137908    90866 /lib/libpthread-2.5.so
mysqld  6007 mysql  mem    REG     253,18     75284  1514584 /usr/lib/libz.so.1.2.3
mysqld  6007 mysql  mem    REG        8,1     48156    90867 /lib/librt-2.5.so
mysqld  6007 mysql  mem    REG        8,1     50848    90123 /lib/libnss_files-2.5.so
mysqld  6007 mysql  mem    REG        8,1    245376    90881 /lib/libsepol.so.1
mysqld  6007 mysql  mem    REG        8,1     93508    91676 /lib/libselinux.so.1
mysqld  6007 mysql  mem    REG        8,1     46636    90847 /lib/libgcc_s-4.1.2-20080825.so.1
mysqld  6007 mysql  mem    REG     253,18    936908  1514611 /usr/lib/libstdc++.so.6.0.8
mysqld  6007 mysql  mem    REG        8,1    109740    90873 /lib/libnsl-2.5.so
mysqld  6007 mysql  mem    REG        8,1     80636    90879 /lib/libresolv-2.5.so
mysqld  6007 mysql  mem    REG        8,1      7748    91684 /lib/libcom_err.so.2.1
mysqld  6007 mysql  mem    REG     253,18    190712  1514583 /usr/lib/libgssapi_krb5.so.2.2
mysqld  6007 mysql  mem    REG     253,18    157304  1514569 /usr/lib/libk5crypto.so.3.1
mysqld  6007 mysql  mem    REG        8,1      8072    90878 /lib/libkeyutils-1.2.so
mysqld  6007 mysql  mem    REG     253,18     33712  1509918 /usr/lib/libkrb5support.so.0.1
mysqld  6007 mysql  mem    REG        8,1     45432    90876 /lib/libcrypt-2.5.so
mysqld  6007 mysql    0r   CHR        1,3               1432 /dev/null
mysqld  6007 mysql    1w   REG     253,19    251337   163863 /var/log/mysqld.log
mysqld  6007 mysql    2w   REG     253,19    251337   163863 /var/log/mysqld.log
mysqld  6007 mysql    3u  IPv4   26825288                TCP monstermunch.ssd.hursley.ibm.com:59850->hurgsa.hursley.uk.ibm.com:ldap (CLOSE_WAIT)
mysqld  6007 mysql    4uW  REG     253,19 161480704    33142 /var/lib/mysql/ibdata1
mysqld  6007 mysql    5u   REG     253,16         0       13 /tmp/ibmydRNj (deleted)
mysqld  6007 mysql    6u   REG     253,16        20       14 /tmp/ib59yuxj (deleted)
mysqld  6007 mysql    7u   REG     253,16         0       15 /tmp/ib0Vp8gj (deleted)
mysqld  6007 mysql    8u   REG     253,16         0       16 /tmp/ibjkCG1i (deleted)
mysqld  6007 mysql    9uW  REG     253,19   5242880    33143 /var/lib/mysql/ib_logfile0
mysqld  6007 mysql   10uW  REG     253,19   5242880    33144 /var/lib/mysql/ib_logfile1
mysqld  6007 mysql   11u  IPv4   26825294                TCP *:mysql (LISTEN)
mysqld  6007 mysql   12u   REG     253,16         0       17 /tmp/ibKpCJ1i (deleted)
mysqld  6007 mysql   13u  unix 0xea432900           26825295 /var/lib/mysql/mysql.sock
mysqld  6007 mysql   14u   REG     253,19     30720    65557 <DB Table>
mysqld  6007 mysql   15u  unix 0xf5188c80           30430332 /var/lib/mysql/mysql.sock
mysqld  6007 mysql   16u   REG     253,19      3072    98468 <DB Table>
mysqld  6007 mysql   17u  unix 0xf5925680           30433174 /var/lib/mysql/mysql.sock
mysqld  6007 mysql   18u   REG     253,19      1024    65605 <DB Table>
mysqld  6007 mysql   19u  unix 0xf5188880           30430336 /var/lib/mysql/mysql.sock
mysqld  6007 mysql   20u   REG     253,19        40    98469 <DB Table>
mysqld  6007 mysql   21u  unix 0xf5925280           30433176 /var/lib/mysql/mysql.sock
mysqld  6007 mysql   22u   REG     253,19     39344    65558 <DB Table>
mysqld  6007 mysql   23u   REG     253,19      1024    65623 <DB Table>
mysqld  6007 mysql   24u   REG     253,19         0    65624 <DB Table>
mysqld  6007 mysql   25u  unix 0xf51f6880           30430339 /var/lib/mysql/mysql.sock

In the output above, <DB Table> is my edit.

I may be missing something, but I don't see anything there that is a process of mine holding open a connection... unless those deleted /tmp files are key.

If the connections were made over a local socket, you might have some luck with sudo fuser -u /tmp/mysql.sock. You could also find mysql's process ID (let's call it $MYSQL_PID) and then run sudo lsof -p $MYSQL_PID, which will include all open sockets.

How to migrate a CSV file to Sqlite3 (or MySQL)? - Python

5 votes

I'm using Python in order to save the data row by row... but this is extremely slow!

The CSV contains 70million lines, and with my script I can just store 1thousand a second.


This is what my script looks like

reader = csv.reader(open('test_results.csv', 'r'))
for row in reader:
    TestResult(type=row[0], name=row[1], result=row[2]).save()

I reckon that for testing I might have to consider MySQL or PostgreSQL.

Any idea or tips? This is the first time I deal with such massive volumes of data. :)

For MySQL imports:

mysqlimport [options] db_name textfile1 [textfile2 ...]

For SQLite3 imports:

ref http://stackoverflow.com/questions/1045910/how-can-i-import-load-a-sql-or-csv-file-into-sqlite