My MySQL application is experiencing slow performance when running some
DELETE queries. In this question, I will only discuss one particular
UPDATE, because it's enough to demonstrate the problem:
UPDATE projects SET ring = 5 WHERE id = 1
UPDATE is usually fast enough, around 0.2ms, but every now and then (enough to be a problem) it takes several seconds. Here's an excerpt from the log (look at the 4th line):
~ (0.000282) UPDATE `projects` SET `ring` = 5 WHERE `id` = 1 ~ (0.000214) UPDATE `projects` SET `ring` = 6 WHERE `id` = 1 ~ (0.000238) UPDATE `projects` SET `ring` = 7 WHERE `id` = 1 ~ (3.986502) UPDATE `projects` SET `ring` = 8 WHERE `id` = 1 ~ (0.000186) UPDATE `projects` SET `ring` = 9 WHERE `id` = 1 ~ (0.000217) UPDATE `projects` SET `ring` = 0 WHERE `id` = 1 ~ (0.000162) UPDATE `projects` SET `ring` = 1 WHERE `id` = 1
projects is an InnoDB table with 6 columns of types
VARCHAR, 17 rows and an index on
id. It happens with other tables too, but here I'm focusing on this one. When trying to solve the problem, I ensured that the queries were all sequential, so this is not a lock issue. The
UPDATE above is executed in the context of a transaction. Other information on the server:
- VPS with 4GB RAM (was 1GB), 12GB free disk space
- CentoOS 5.8 (was 5.7)
- MySQL 5.5.10 (was 5.0.x)
The "was" bit above means it didn't work before or after the upgrade.
What I've tried so far, to no avail:
innodb_flush_log_at_trx_committo 0, 1 or 2
innodb_locks_unsafe_for_binlogon or off
timed_mutexeson or off
innodb_flush_methodfrom the default to
innodb_buffer_pool_sizefrom the default to 600M and then to 3000M
innodb_log_file_sizefrom the default to 128M
- Compiling MySQL from source
SHOW PROCESSLIST, which informs me that the state is "updating"
SHOW PROFILE ALL, which says that almost all the time was spent on "updating", and that, within that step, not so much time was spent on CPU cycles and there were many voluntary context switches (like 30)
SHOW STATUSfor changes in
Innodb_buffer_pool_pages_dirty. There may be some relation between dirty pages being flushed and the slow queries, but the correlation isn't clear.
Then I decided to check the system's I/O latency with
ioping. This is my first VPS, so I was surprised to see this result:
4096 bytes from . (vzfs /dev/vzfs): request=1 time=249.2 ms 4096 bytes from . (vzfs /dev/vzfs): request=2 time=12.3 ms 4096 bytes from . (vzfs /dev/vzfs): request=3 time=110.5 ms 4096 bytes from . (vzfs /dev/vzfs): request=4 time=232.8 ms 4096 bytes from . (vzfs /dev/vzfs): request=5 time=294.4 ms 4096 bytes from . (vzfs /dev/vzfs): request=6 time=704.7 ms 4096 bytes from . (vzfs /dev/vzfs): request=7 time=1115.0 ms 4096 bytes from . (vzfs /dev/vzfs): request=8 time=209.7 ms 4096 bytes from . (vzfs /dev/vzfs): request=9 time=64.2 ms 4096 bytes from . (vzfs /dev/vzfs): request=10 time=396.2 ms
Pretty erratic, I would say.
Having said all of that, I ask:
Can the I/O latency be occasionally killing MySQL performance? I always thought that, when you ran an
UPDATE, the thread taking care of that connection wasn't going to flush data to disk or wait for such a flush; it would return immediately and the flushing would be done by another thread at another time.
If it can't be disk I/O, is there anything else I can possibly try, short of renting a dedicated server?
I'm replying to my own question with additional data that I collected based on your answers.
I used two notebooks connected by means of a wireless network. On notebook A, I
mounted a directory of notebook B using
sshfs. Then on notebook A I started up
MySQL specifying that mounted directory as its data directory. This should
provide MySQL with a very slow I/O device. MySQL was started with
innodb_flush_log_at_trx_commit = 0.
I defined 3 sets of queries, each set consisting of an update and a select query repeated 10,000 times, without explicit transactions. The experiments were:
- US1SID: update and select on a specific row of the same table. The same row was used in all iterations.
- US1MID: update and select on a specific row of the same table. The row was a different one in each iteration.
- US2MID: update and select on rows of different tables. In this case, the table being read by the select didn't change at all during the experiment.
Each set was run twice using a shell script (hence timings are slower than those in my original question), one under normal conditions and the other after executing the following command:
tc qdisc replace dev wlan0 root handle 1:0 netem delay 200ms
The command above adds a mean delay of 200ms when transmitting packets through wlan0.
First, here's the mean time of the top 99% fastest updates and selects, and the bottom 1% updates and selects.
| Delay: 0ms | Delay: 200ms | | US1SID | US1MID | US2MID | US1SID | US1MID | US2MID | | top99%u | 0.0064 | 0.0064 | 0.0064 | 0.0063 | 0.0063 | 0.0063 | | top99%s | 0.0062 | 0.0063 | 0.0063 | 0.0062 | 0.0062 | 0.0062 | | bot01%u | 1.1834 | 1.2239 | 0.9561 | 1.9461 | 1.7492 | 1.9731 | | bot01%s | 0.4600 | 0.5391 | 0.3417 | 1.4424 | 1.1557 | 1.6426 |
As is clear, even with really, really poor I/O performance, MySQL manages to execute most queries really fast. But what concerns me the most is the worst cases, so here's another table, showing the 10 slowest queries. A "u" means it was an update, an "s" a select.
| Delay: 0ms | Delay: 200ms | | US1SID | US1MID | US2MID | US1SID | US1MID | US2MID | | 5.443 u | 5.946 u | 5.315 u | 11.500 u | 10.860 u | 11.424 s | | 5.581 u | 5.954 s | 5.466 u | 11.649 s | 10.995 u | 11.496 s | | 5.863 s | 6.291 u | 5.658 u | 12.551 s | 11.020 u | 12.221 s | | 6.192 u | 6.513 u | 5.685 u | 12.893 s | 11.370 s | 12.599 u | | 6.560 u | 6.521 u | 5.736 u | 13.526 u | 11.387 u | 12.803 u | | 6.562 u | 6.555 u | 5.743 u | 13.997 s | 11.497 u | 12.920 u | | 6.872 u | 6.575 u | 5.869 u | 14.662 u | 12.825 u | 13.625 u | | 6.887 u | 7.908 u | 5.996 u | 19.953 u | 12.860 u | 13.828 s | | 6.937 u | 8.100 u | 6.330 u | 20.623 u | 14.015 u | 16.292 u | | 8.665 u | 8.298 u | 6.893 u | 27.102 u | 22.042 s | 17.131 u |
Poor I/O performance can indeed slow MySQL to a crawl. It's not clear why or when exactly, but it does happen.
The slowing down applies to both selects and updates, with updates suffering more.
For some reason, even selects on a table that wasn't involved in any changes, and which had recently been populated, were also slowed down, as is clear from US2MID above.
As for the test cases proposed by mentatkgs, it seems that updating different rows instead of the same ones does help a little, but doesn't solve the problem.
I guess I will either adapt my software to tolerate such delays or try to move to another provider. Renting a dedicated server is too expensive for this project.
Thank you all for the comments.