Best mysql questions in April 2011

Need help to optimize MySQL query

16 votes

I have 6 tables:

CREATE TABLE IF NOT EXISTS `sbpr_groups` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `active` tinyint(1) DEFAULT '0',
  `dnd` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=32 ;

CREATE TABLE IF NOT EXISTS `sbpr_newsletter` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `from` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `mail` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `subject` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `body` text COLLATE utf8_unicode_ci,
  `attach1` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `attach2` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `attach3` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=14;

CREATE TABLE IF NOT EXISTS `sbpr_news_groups` (
  `newsletter_id` int(11) NOT NULL,
  `groups` int(11) NOT NULL,
  KEY `fk_sbpr_news_groups_sbpr_newsletter1` (`newsletter_id`),
  KEY `fk_sbpr_news_groups_sbpr_groups1` (`groups`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `sbpr_news_recs` (
  `newsletter_id` int(11) NOT NULL,
  `recipients` int(11) NOT NULL,
  KEY `fk_sbpr_news_recs_sbpr_newsletter1` (`newsletter_id`),
  KEY `fk_sbpr_news_recs_sbpr_recipients1` (`recipients`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `sbpr_recipients` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `mail` varchar(160) DEFAULT NULL,
  `date_reg` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `active` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=3008 ;

CREATE TABLE IF NOT EXISTS `sbpr_rec_groups` (
  `rec_id` int(11) NOT NULL,
  `group` int(11) NOT NULL,
  KEY `fk_sbpr_rec_groups_sbpr_recipients` (`rec_id`),
  KEY `fk_sbpr_rec_groups_sbpr_groups1` (`group`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

With these foreign keys:

ALTER TABLE `sbpr_news_groups`
  ADD CONSTRAINT `fk_sbpr_news_groups_sbpr_groups1` 
    FOREIGN KEY (`groups`) REFERENCES `sbpr_groups` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION,
  ADD CONSTRAINT `fk_sbpr_news_groups_sbpr_newsletter1` 
    FOREIGN KEY (`newsletter_id`) REFERENCES `sbpr_newsletter` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION;

ALTER TABLE `sbpr_news_recs`
  ADD CONSTRAINT `fk_sbpr_news_recs_sbpr_newsletter1` 
    FOREIGN KEY (`newsletter_id`) REFERENCES `sbpr_newsletter` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION,
  ADD CONSTRAINT `fk_sbpr_news_recs_sbpr_recipients1` 
    FOREIGN KEY (`recipients`) REFERENCES `sbpr_recipients` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION;

ALTER TABLE `sbpr_rec_groups`
  ADD CONSTRAINT `fk_sbpr_rec_groups_sbpr_groups1` 
    FOREIGN KEY (`group`) REFERENCES `sbpr_groups` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION,
  ADD CONSTRAINT `fk_sbpr_rec_groups_sbpr_recipients` 
    FOREIGN KEY (`rec_id`) REFERENCES `sbpr_recipients` (`id`) 
    ON DELETE CASCADE ON UPDATE NO ACTION;

Visual structure of the tables: [image]

I want to select all rows from the sbpr_newsletter table and add to each row the number of rows from sbpr_recipients whose id is referenced in sbpr_news_recs or, through the foreign keys, in sbpr_rec_groups.

In other words, I want the count of all recipients of the current newsletter that are listed in sbpr_news_recs or belong to a group through sbpr_rec_groups, plus the count of those recipients that are active.

I have working SQL:

SELECT d.id,  d.subject , d.created_date,
    (SELECT count(*) FROM sbpr_recipients r
      LEFT JOIN sbpr_news_recs nr ON nr.recipients = r.id
      LEFT JOIN sbpr_rec_groups g ON g.rec_id = r.id
      LEFT JOIN sbpr_news_groups ng ON ng.groups = g.group
      WHERE nr.newsletter_id = d.id OR ng.newsletter_id = d.id) AS repicients,

    (SELECT count(*) FROM sbpr_recipients r
      LEFT JOIN sbpr_news_recs nr ON nr.recipients = r.id
      LEFT JOIN sbpr_rec_groups g ON g.rec_id = r.id
      LEFT JOIN sbpr_news_groups ng ON ng.groups = g.group
      WHERE (nr.newsletter_id = d.id OR ng.newsletter_id = d.id) 
      AND r.active = 1) AS active_repicients
FROM sbpr_newsletter d
ORDER BY d.id ASC, d.id

EXPLAIN output for this SQL: [image]

Question: How can I optimize my sql?

One approach to optimizing this: the two SELECT subqueries are moved into the JOIN clause:

SELECT d.id
     , d.subject
     , d.created_date
     , count(if(nr_newsletter_id is not null or ng_newsletter_id is not null, 1, null)) repicients
     , count(if((nr_newsletter_id is not null or ng_newsletter_id is not null) and t.active = 1, 1, null)) active_repicients
FROM
  sbpr_newsletter d
LEFT JOIN (
  SELECT nr.newsletter_id nr_newsletter_id
       , ng.newsletter_id ng_newsletter_id
       , r.active
  FROM
    sbpr_recipients r
  LEFT JOIN sbpr_news_recs nr
    ON nr.recipients = r.id
  LEFT JOIN sbpr_rec_groups g
    ON g.rec_id = r.id
  LEFT JOIN sbpr_news_groups ng
    ON ng.groups = g.group
  ) t
ON nr_newsletter_id = d.id OR ng_newsletter_id = d.id
GROUP BY
  d.id;

I rewrote your query a little; it is not tested, but give it a try.
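One thing to watch in both versions: a recipient who is listed directly and also belongs to a linked group (or to several linked groups) produces more than one joined row, so COUNT(*) counts them more than once. If each recipient should be counted once, a UNION over the two sources removes the duplicates before counting. The sketch below is only an illustration under that assumption; it uses Python's sqlite3 module as a stand-in for MySQL, with trimmed-down tables and invented sample rows (in MySQL the reserved column names would be quoted with backticks rather than double quotes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sbpr_newsletter (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE sbpr_recipients (id INTEGER PRIMARY KEY, active INTEGER);
CREATE TABLE sbpr_news_recs (newsletter_id INTEGER, recipients INTEGER);
CREATE TABLE sbpr_rec_groups (rec_id INTEGER, "group" INTEGER);
CREATE TABLE sbpr_news_groups (newsletter_id INTEGER, "groups" INTEGER);

INSERT INTO sbpr_newsletter VALUES (1, 'April news'), (2, 'Empty');
INSERT INTO sbpr_recipients VALUES (1, 1), (2, 0), (3, 1);
-- recipient 1 is addressed directly AND through group 10
INSERT INTO sbpr_news_recs VALUES (1, 1);
INSERT INTO sbpr_rec_groups VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO sbpr_news_groups VALUES (1, 10);
""")

QUERY = """
SELECT d.id, d.subject,
       COUNT(r.id) AS recipients,
       COALESCE(SUM(r.active = 1), 0) AS active_recipients
FROM sbpr_newsletter d
LEFT JOIN (
    -- UNION (not UNION ALL) drops duplicate (newsletter, recipient) pairs
    SELECT newsletter_id, recipients AS rec_id FROM sbpr_news_recs
    UNION
    SELECT ng.newsletter_id, g.rec_id
    FROM sbpr_news_groups ng
    JOIN sbpr_rec_groups g ON g."group" = ng."groups"
) t ON t.newsletter_id = d.id
LEFT JOIN sbpr_recipients r ON r.id = t.rec_id
GROUP BY d.id, d.subject
ORDER BY d.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Newsletter 1 reaches recipients 1 and 2 (one of them active) even though recipient 1 is reachable by two routes; newsletter 2 has no recipients and still appears with zero counts.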

Inefficient SQL Query

14 votes

I'm building a simple web app at the moment that I'll one day open source. As it stands at the moment, the nav is generated on every page load (which will change to be cached one day) but for the moment, it's being made with the code below. Using PHP 5.2.6 and MySQLi 5.0.7.7, how much more efficient can the code below be? I think joins might help, but I'm after advice. Any tips would be greatly appreciated.

<?php
    $navQuery = $mysqli->query("SELECT id,slug,name FROM categories WHERE live=1 ORDER BY name ASC") or die(mysqli_error($mysqli));
    while($nav = $navQuery->fetch_object()) {
        echo '<li>';
            echo '<a href="/'. $nav->slug .'">'. $nav->name .'</a>';
            echo '<ul>';
                $subNavQuery = $mysqli->query("SELECT id,name FROM snippets WHERE category='$nav->id' ORDER BY name ASC") or die(mysqli_error($mysqli));
                while($subNav = $subNavQuery->fetch_object()) {
                    echo '<li>';
                        echo '<a href="/'. $nav->slug .'/'. $subNav->name .'">'. $subNav->name .'</a>';
                    echo '</li>';
                }
            echo '</ul>';
        echo '</li>';
    }
?>

You can run this query:

SELECT c.id AS cid, c.slug AS cslug, c.name AS cname,
    s.id AS sid, s.name AS sname
FROM categories AS c
    LEFT JOIN snippets AS s ON s.category = c.id
WHERE c.live=1
ORDER BY c.name, s.name

Then iterate through the results to create the proper headings, like this:

// last category ID
$lastcid = 0;
while ($r = $navQuery->fetch_object()) {

    if ($r->cid != $lastcid) {
        // new category

        // close the previous category's list (if any)
        if ($lastcid)
            printf('</ul></li>');

        // save current category
        $lastcid = $r->cid;

        // display category and open its snippet list
        printf('<li><a href="/%s">%s</a><ul>', $r->cslug, $r->cname);
    }

    // display snippet; sid is NULL when the LEFT JOIN found no snippets
    if ($r->sid !== null)
        printf('<li><a href="/%s/%s">%s</a></li>', $r->cslug, $r->sname, $r->sname);
}

// close the last open category (if any)
if ($lastcid)
    printf('</ul></li>');

Note that I used printf here, but you should use your own function that wraps printf and runs htmlspecialchars over the parameters (all except the first, of course).

Disclaimer: I do not necessarily encourage such use of <ul>s.

This code is just here to show the basic idea of processing hierarchical data fetched with a single query.
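The same single-pass idea can be sketched in Python for illustration. The render_nav helper and the sample rows are hypothetical, and html.escape plays the role that htmlspecialchars would in PHP. It assumes the rows arrive sorted by category and then by snippet, as the ORDER BY in the query above would deliver them.

```python
from html import escape
from itertools import groupby


def render_nav(rows):
    """Build nested list markup from rows of (cid, cslug, cname, sid, sname).

    sid and sname are None when the LEFT JOIN found no snippet for a category.
    """
    parts = []
    # groupby only merges adjacent rows, hence the sorted-input assumption
    for (cid, cslug, cname), group in groupby(rows, key=lambda r: r[:3]):
        parts.append('<li><a href="/%s">%s</a><ul>' % (escape(cslug), escape(cname)))
        for _, _, _, sid, sname in group:
            if sid is not None:
                parts.append('<li><a href="/%s/%s">%s</a></li>'
                             % (escape(cslug), escape(sname), escape(sname)))
        parts.append('</ul></li>')
    return ''.join(parts)


rows = [
    (1, 'css', 'CSS', 10, 'reset'),
    (1, 'css', 'CSS', 11, 'sticky'),
    (2, 'php', 'PHP', None, None),   # category without snippets
]
markup = render_nav(rows)
print(markup)
```

Every category opens and closes exactly one nested list, including a category that has no snippets at all.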

version of mysql2 (0.3.2) doesn't ship with the ActiveRecord adapter bundled anymore as it's now part of Rails 3.1

12 votes

Hi, I am using Rails version 3.0.7. When I run rails generate model task name:string I get the following warning:

WARNING: This version of mysql2 (0.3.2) doesn't ship with the ActiveRecord adapter bundled anymore as it's now part of Rails 3.1
WARNING: Please use the 0.2.x releases if you plan on using it in Rails <= 3.0.x
/usr/lib/ruby/gems/1.8/gems/activerecord-3.0.7/lib/active_record/connection_adapters/abstract/connection_specification.rb:71:in `establish_connection': Please install the mysql2 adapter: `gem install activerecord-mysql2-adapter` (no such file to load -- active_record/connection_adapters/mysql2_adapter) (RuntimeError)
    from /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.7/lib/active_record/connection_adapters/abstract/connection_specification.rb:60:in `establish_connection'
    from /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.7/lib/active_record/connection_adapters/abstract/connection_specification.rb:55:in `establish_connection'
    from /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.7/lib/active_record/railtie.rb:59
    from /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.7/lib/active_support/lazy_load_hooks.rb:36:in `instance_eval'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.7/lib/active_support/lazy_load_hooks.rb:36:in `execute_hook'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.7/lib/active_support/lazy_load_hooks.rb:43:in `run_load_hooks'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.7/lib/active_support/lazy_load_hooks.rb:42:in `each'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-3.0.7/lib/active_support/lazy_load_hooks.rb:42:in `run_load_hooks'
    from /usr/lib/ruby/gems/1.8/gems/activerecord-3.0.7/lib/active_record/base.rb:1904
    from /home/sun/railsapp/dog/vendor/plugins/attribute_fu/init.rb:1
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/plugin.rb:81
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/initializable.rb:25:in `instance_exec'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/initializable.rb:25:in `run'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/initializable.rb:50:in `run_initializers'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/initializable.rb:49:in `each'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/initializable.rb:49:in `run_initializers'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/application.rb:134:in `initialize!'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/application.rb:77:in `send'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/application.rb:77:in `method_missing'
    from /home/sun/railsapp/dog/config/environment.rb:5
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/application.rb:103:in `require'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/application.rb:103:in `require_environment!'
    from /usr/lib/ruby/gems/1.8/gems/railties-3.0.7/lib/rails/commands.rb:16
    from script/rails:6:in `require'
    from script/rails:6

When I try to install activerecord-mysql2-adapter with gem install activerecord-mysql2-adapter, I get: ERROR: Could not find a valid gem 'activerecord-mysql2-adapter' (>= 0) in any repository

Please help. Thanks.

The problem is that you are trying to use the latest version of mysql2, which is incompatible with Rails 3.0.x.

So, in your Gemfile, change the mysql2 line to this:

gem 'mysql2', '< 0.3'

Then run the bundle command.

Once the older mysql2 gem is installed (I think it is 0.2.7), the problem should be solved.

A Select Statement that would do the following

11 votes

I am just learning how to wrap my head around sql and php. I have 4 tables structured as follows

+-----------+    +------------+    +---------+    +----------+
|  Project  |    | Slide      |    | Shape   |    |  Points  |
+-----------+    +------------+    +---------+    +----------+
|    id     |    |  id        |    | id      |    | id       |
+-----------+    | project_id |    | cont_id |    | shape_id |
                 +------------+    +---------+    | x        |
                                                  | y        |
                                                  +----------+

As you can see the tables are linked by id all the way down to points meaning a project will contain a number of slides that contain a number of shapes that contain a number of points.

I have a SQL query

SELECT slide.`id`, shape.`id`, points.`x_point`, points.`y_point` 
FROM `project`, `slide`, `shape`, `points` 
WHERE 1 = slide.`project_id` 
   AND slide.`id` = shape.`slide_id` 
   AND shape.`id` = points.`shape_id`

What I want is to take the results of this query that look like this

[0] => stdClass Object
     (
         [id] => 27
         [x] => 177
         [y] => 177
     )

 [1] => stdClass Object
     (
         [id] => 27
         [x] => 178
         [y] => 423
     )

 [2] => stdClass Object
     (
         [id] => 27
         [x] => 178
         [y] => 419
     )

 [3] => stdClass Object
     (
         [id] => 27
         [x] => 178
         [y] => 413
     )

 [4] => stdClass Object
     (
         [id] => 27
         [x] => 181
         [y] => 399
     )

 [5] => stdClass Object
     (
         [id] => 27
         [x] => 195
         [y] => 387
     )

 [6] => stdClass Object
     (
         [id] => 27
         [x] => 210
         [y] => 381
     )

 [7] => stdClass Object
     (
         [id] => 27
         [x] => 231
         [y] => 372
     )

 [8] => stdClass Object
     (
         [id] => 27
         [x] => 255
         [y] => 368
     )

 [9] => stdClass Object
     (
         [id] => 27
         [x] => 283
         [y] => 368
     )
... AND CONTINUED FOR A LONG TIME

What I want is to convert this beastly array of crap into something that more resembles this

[9] => stdClass Object
         (
             [id] => ID OF LIKE SHAPES
             [x] => Array(ALL THE X POINTS)
             [y] => ARRAY(ALL THE Y Points)
         )

I cannot for the life of me figure out how to convert this to such an array.

If it cannot be done with the query I designed, is there a better query? Maybe one that grabs the points and then puts them into an array of points... I think I just got an idea...


New Info,

So I added an answer to this question, I don't know if that's the standard way. To help out other answers if mine is not a good solution I will add my thought process here as well.

Check out my answer below for more info.

Also, how does an ORM compare to my algorithm below?

Using an ORM like Doctrine, you would simply model it like

/**
 * @Entity
 */
class Project
{
    /**
     * @Id @GeneratedValue
     * @Column(type="integer")
     */
    private $id;

    /**
     * @OneToMany(targetEntity="Slide", mappedBy="project")
     */
    private $slides;

    public function __construct()
    {
        $this->slides = new \Doctrine\Common\Collections\ArrayCollection;
    }
}

/**
 * @Entity
 */
class Slide
{
    /**
     * @Id @GeneratedValue
     * @Column(type="integer")
     */
    private $id;

    /**
     * @ManyToOne(targetEntity="Project", inversedBy="slides")
     * @JoinColumn(name="project_id", referencedColumnName="id")
     */
    private $project;

    /**
     * @OneToMany(targetEntity="Shape", mappedBy="slide")
     */
    private $shapes;
}

And so on...

See http://www.doctrine-project.org/docs/orm/2.0/en/reference/association-mapping.html#one-to-many-bidirectional

Of course, there's a fair amount of setup and processing overhead involved but you'll appreciate an ORM as your domain model becomes more complex.
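Without an ORM, the reshaping the question asks for can also be done in application code once the flat result set is back. The following is a rough sketch in Python rather than PHP; the group_points helper and the sample rows are made up for illustration, and the rows are assumed to be (shape id, x, y) tuples.

```python
from collections import OrderedDict


def group_points(rows):
    """Collapse (shape_id, x, y) rows into one entry per shape."""
    shapes = OrderedDict()  # keeps shapes in the order they first appear
    for shape_id, x, y in rows:
        entry = shapes.setdefault(shape_id, {"id": shape_id, "x": [], "y": []})
        entry["x"].append(x)
        entry["y"].append(y)
    return list(shapes.values())


rows = [(27, 177, 177), (27, 178, 423), (28, 10, 20)]
grouped = group_points(rows)
print(grouped)
```

Each shape ends up as a single record holding parallel lists of its x and y values, which is the structure sketched in the question.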

Improving MySQL tables with Indexes

11 votes

I am very new to indexes in MySQL. I know I should probably have learnt about them earlier, but most projects have been small enough for me to get away without them ;)

So, now I am testing it. I did my test by running EXPLAIN on a query:

Query:

EXPLAIN SELECT a.*
FROM `tff__keywords2data` AS a
LEFT JOIN `tff__keywords` AS b ON a.keyword_id = b.id
WHERE (
b.keyword = 'dog' || b.keyword = 'black' || b.keyword = 'and' || b.keyword = 'white'
)
GROUP BY a.data_id
HAVING COUNT( a.data_id ) =4 

First, without indexes I got these results:

[image: EXPLAIN output without indexes]

Then, with indexes on data_id and keyword_id, I got this:

[image: EXPLAIN output with indexes on data_id and keyword_id]

So as I understand it, the number of rows MySQL has to search goes from 61k down to 10k, which must be good, right?

So my question is, am I correct here? And is there anything else I could think about when trying to optimize?

UPDATE:

Furthermore, after some help from AJ and Piskvor, who pointed out that the keyword column in my other table had no index, I got this:

[image: EXPLAIN output after indexing keyword]

Great improvement! Right?

As you see, the key used for table b is still NULL. You may want to add an index on b.keyword and match with

WHERE b.keyword IN ('dog','black','and','white')

This is written differently from your WHERE clause, but it returns the same results.

From the look of it, you may also be interested in full-text searching.
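To make the suggestion concrete, here is a small sketch of the indexed lookup with an IN list. It uses Python's sqlite3 module as a stand-in for MySQL; the index names and sample rows are assumptions, and with MySQLdb the placeholders would be %s instead of ?.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tff__keywords (id INTEGER PRIMARY KEY, keyword TEXT);
CREATE TABLE tff__keywords2data (keyword_id INTEGER, data_id INTEGER);
-- hypothetical index names
CREATE INDEX idx_keyword ON tff__keywords (keyword);
CREATE INDEX idx_k2d ON tff__keywords2data (keyword_id, data_id);

INSERT INTO tff__keywords VALUES
    (1, 'dog'), (2, 'black'), (3, 'and'), (4, 'white'), (5, 'cat');
-- data item 100 has all four keywords, item 200 only two of them
INSERT INTO tff__keywords2data VALUES
    (1, 100), (2, 100), (3, 100), (4, 100), (1, 200), (5, 200), (2, 200);
""")

wanted = ["dog", "black", "and", "white"]
placeholders = ",".join("?" * len(wanted))
sql = f"""
SELECT a.data_id
FROM tff__keywords2data AS a
JOIN tff__keywords AS b ON a.keyword_id = b.id
WHERE b.keyword IN ({placeholders})
GROUP BY a.data_id
HAVING COUNT(*) = ?
"""
# A data item matches only if it carries every wanted keyword
matches = [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
print(matches)
```

Building the IN list from the keyword array also keeps the HAVING count in step with the number of keywords searched for.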

How many rows will be locked by SELECT ... ORDER BY xxx LIMIT 1 FOR UPDATE?

10 votes

Hello.

I have a query with the following structure:

SELECT ..... WHERE status = 'QUEUED' ORDER BY position ASC LIMIT 1 FOR UPDATE;

It's a single-table SELECT statement on InnoDB table. Field position (INT NOT NULL) has an index on it. status is ENUM and is also indexed.

The SELECT ... FOR UPDATE manual page says that it locks all rows it reads. Do I understand correctly that in this case only one row will be locked? Or will it lock the whole table?

Is it possible to determine which rows will be locked with an EXPLAIN query? If yes, how? EXPLAIN for the query on an empty table shows the following:

1;'SIMPLE';'job';'index';<null>;'index_position';[34,...];<null>;1;'Using where'

This is a great question. InnoDB is a row level locking engine, but it has to set additional locks to ensure safety with the binary log (used for replication; point in time recovery). To start explaining it, consider the following (naive) example:

session1> START TRANSACTION;
session1> DELETE FROM users WHERE is_deleted = 1; # 1 row matches (user_id 10), deleted.
session2> START TRANSACTION;
session2> UPDATE users SET is_deleted = 1 WHERE user_id = 5; # 1 row matches.
session2> COMMIT;
session1> COMMIT;

Because statements are only written to the binary log once committed, on the slave session#2 would apply first, and would produce a different result, leading to data corruption.

So what InnoDB does is set additional locks. If is_deleted is indexed, then before session1 commits nobody else will be able to modify or insert into the range of records where is_deleted=1. If there are no indexes on is_deleted, then InnoDB needs to lock every row in the entire table to make sure the replay is in the same order. You can think of this as locking the gap, which is a different concept from locking individual rows.

In your case, with that ORDER BY position ASC, InnoDB needs to make sure that no new rows can be inserted between the lowest key value and a "special" lowest possible value. If you did something like ORDER BY position DESC... well, then nobody could insert into this range.

So here comes the solution:

  • Statement based binary logging sucks. I really look forward to a future where we all switch to row based binary logging (available from MySQL 5.1, but not on by default).

  • With Row-based replication, if you change the isolation level to read-committed, then only the one row that matches needs to be locked.

  • If you want to be a masochist, you can also turn on innodb_locks_unsafe_for_binlog with statement-based replication.


Update 22 April: To copy + paste my improved version of your testcase (it was not searching 'in the gap'):

session1> CREATE TABLE test (id int not null primary key auto_increment, data1 int, data2 int, INDEX(data1)) engine=innodb;
Query OK, 0 rows affected (0.00 sec)

session1> INSERT INTO test VALUES (NULL, 1, 2), (NULL, 2, 1), (5, 2, 2), (6, 3, 3), (3, 3, 4), (4, 4, 3);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

session1> start transaction;
Query OK, 0 rows affected (0.00 sec)

session1> SELECT id FROM test ORDER BY data1 LIMIT 1 FOR UPDATE;
+----+
| id |
+----+
|  1 |
+----+
1 row in set (0.00 sec)

session2> INSERT INTO test values (NULL, 0, 99); # blocks - 0 is in the gap between the lowest value found (1) and the "special" lowest value.

# At the same time, from information_schema:

localhost information_schema> select * from innodb_locks\G
*************************** 1. row ***************************
    lock_id: 151A1C:1735:4:2
lock_trx_id: 151A1C
  lock_mode: X,GAP
  lock_type: RECORD
 lock_table: `so5694658`.`test`
 lock_index: `data1`
 lock_space: 1735
  lock_page: 4
   lock_rec: 2
  lock_data: 1, 1
*************************** 2. row ***************************
    lock_id: 151A1A:1735:4:2
lock_trx_id: 151A1A
  lock_mode: X
  lock_type: RECORD
 lock_table: `so5694658`.`test`
 lock_index: `data1`
 lock_space: 1735
  lock_page: 4
   lock_rec: 2
  lock_data: 1, 1
2 rows in set (0.00 sec)

# Another example:
select * from test where id < 1 for update; # blocks

MySQL -- Joins Between Databases On Different Servers Using Python?

9 votes

In MySQL, I have two different databases -- let's call them A and B.

Database A resides on server server1, while database B resides on server server2.

Both servers {A, B} are physically close to each other, but are on different machines and have different connection parameters (different username, different password etc).

In such a case, is it possible to perform a join between a table that is in database A, to a table that is in database B?

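If so, how do I go about it programmatically in Python? (I am using Python's MySQLdb to interact with each of the databases separately.)

Try the FEDERATED storage engine. It lets a table on one server act as a local proxy for a table on the other server, so the join can run over a single connection.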
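If FEDERATED is not an option, the join can also be done in Python itself: fetch the rows you need from each server over its own connection and match them in memory. The sketch below uses two in-memory sqlite3 databases as stand-ins for server1 and server2; with MySQLdb each would be a separate MySQLdb.connect(...) call with its own credentials. The table and column names are invented for the example.

```python
import sqlite3

# Stand-ins for the two servers; in practice, two MySQLdb connections.
conn_a = sqlite3.connect(":memory:")
conn_b = sqlite3.connect(":memory:")

conn_a.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
""")
conn_b.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO orders VALUES (10, 1, 9.5), (11, 1, 20.0), (12, 3, 5.0);
""")


def join_across(conn_a, conn_b):
    """Hash join: index one side by key, then stream the other side past it."""
    names = dict(conn_a.execute("SELECT id, name FROM customers"))
    joined = []
    for order_id, customer_id, total in conn_b.execute(
            "SELECT id, customer_id, total FROM orders ORDER BY id"):
        if customer_id in names:  # inner join: unmatched orders are dropped
            joined.append((names[customer_id], order_id, total))
    return joined


result = join_across(conn_a, conn_b)
print(result)
```

This works best when one side is small enough to hold in a dictionary; select only the columns and rows you need on each server before joining.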

Handling unread posts in PHP / MySQL

8 votes

For a personal project, I need to build a forum using PHP and MySQL. It is not possible for me to use an already-built forum package (such as phpBB).

I'm currently working through the logic needed to build such an application, but it's been a long day and I'm struggling with the concept of handling unread posts for users. One solution I had was to have a separate table which essentially holds all post IDs and user IDs, to determine if they've been read:

tbl_userReadPosts: user_id, post_id, read_timestamp

Obviously, if a user's ID appears in this table, we know they've read the post. This is great, except if we have thousands of posts per day (which is more than possible in the system which is being proposed), and thousands of users. This table would become huge within a matter of days, if not hours.

Another option would be to track the user's last activity as a timestamp, and then retrieve all posts made after their last activity was updated. This works in theory, but let's say a user is writing an extremely long post, and in the meantime several members also start new threads or reply to posts in other threads. When the user submits his new post, his last activity would be updated, and thus not match those made in the meantime.

Does anyone have experience with this, and how did you tackle it?

I've checked in phpBB and it seems that the system assigns a custom session to each user, and works on that basis, but the documentation is pretty sparse as to how this deals with unread posts.

Thoughts and opinions gratefully received, as always.

Sorry for the quick answer, but I only have a second. You definitely do not want to store the read information in the database; as you've already deduced, this table would become gigantic.

Something in between what you've already suggested: store the user's last activity, and combine it with information about what they've seen, kept in a cookie, to determine which threads/posts they've already read.

This offloads the storage to the client side cookie, which is far more efficient.
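A rough sketch of how the two pieces could combine, written in Python for illustration (the forum itself would be PHP). The cookie format, a comma-separated list of post IDs, and both helper names are assumptions, not part of any existing forum package.

```python
def parse_read_cookie(value):
    """Decode a cookie value such as "12,15,19" into a set of post IDs."""
    return {int(part) for part in value.split(",") if part.strip().isdigit()}


def is_unread(post_id, posted_at, last_activity, read_ids):
    """Unread means: newer than the last activity and not opened since."""
    return posted_at > last_activity and post_id not in read_ids


posts = [(12, 100), (15, 205), (19, 210)]   # (post_id, unix timestamp)
read_ids = parse_read_cookie("15")          # post 15 was opened this session
unread = [pid for pid, ts in posts if is_unread(pid, ts, 200, read_ids)]
print(unread)
```

Since browsers cap cookies at roughly 4 KB, the ID list only needs to hold posts newer than the stored last-activity time; anything older is treated as read by the timestamp check alone.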

Web app - slider showing days of the months feature

8 votes

Hi.

For a school project we have to build a web app. I'll be creating something where people can keep track of their classes, their homework, and their free time. A planner/calendar. (I'm making it sound really lame here but hey, I'm tired and English isn't my first language ;) )

I'll be working in CodeIgniter for the PHP logic, combined with the usual.. CSS, jQuery, MySQL. PHP is a requirement for the course; I chose to do this in CI because well.. I wanted to learn the framework. We kind of have to show off what we can do at this point of our 'school career'.

Anyway, I would like to ask for some insights regarding a feature I want to implement. At the top of my page, I would like to show a bar which contains the days of the month. Below the day number, I would be showing how many tasks are added on that day by means of some dots. When the user clicks previous or next, I want to show the previous/next month's days. I also want some sort of slider underneath this box which the user can use to slide left and right, and cycle through the days that way. I hope that made sense?

EDIT 2: I want the slider to be dynamic. If the user slides to the previous or next months, or clicks the buttons, I want it to load the days of the previous/next months and show those. Also, say we're at the 26th of a month, the slider would have to show something like 10-31 of this month AND 1-10 of the next month. I suppose I'll also have to change my month indication (not like in the image here) so a user knows when another month starts (I'll show them the name of the month).

Here's a picture (don't mind the day numbers being messed up; I was too lazy to do them correctly in Photoshop and will fix that tomorrow): [image: Day bar, don't mind the numbers]

I've been looking at the jQuery UI sliders. I suppose I'd have to grab the number of days from a database or by using PHP? I guess the cal_days_in_month function could come in handy here. When the user clicks on the arrows or slides left or right, I don't want the page to refresh. Should I go with ajax calls there? I'm not quite sure how to implement this, to be honest. The numbers are also links to a calendar type of view which shows underneath this bar. Could I possibly use the CI Calendar class for this? Or is it more for full-fledged google calendar-type of calendars? I thought this screencast could perhaps be useful?

If possible, could someone please provide some insights on how to start working on this and which plugins/etc I could perhaps use? I'm not sure where to start, to be honest. I'm sure I can work this out somehow, but I guess it'd be nice to get a kickstart by means of some help here. The main problem I'm seeing is the slider/next/previous thing and loading in the previous/next month's days.

Thanks in advance.

EDIT: I realise some people might say/think 'OMG, why don't you just use the skills you have instead of trying something you have to ask us about!'. Well, this is because I actually want to learn something while doing this project. Keep in mind, I'm not asking for lines of code here, I'm just asking for some insight on where to start and what stuff to use; perhaps little snippets that can help me out. Thanks.

UPDATE:

I got a very basic 'day bar' working. Still without a slider, nor do the previous and next buttons work, but hey.. at least it fills it in dynamically. It shows the 5 days previous to the current day, then this month till the end. Whatever is left to fill in gets filled with days of the next month. Quite basic. However, I do have a couple of questions!

Since someone told me yesterday that I was breaking design patterns by doing some stuff the way I was doing it, I'm extremely paranoid about the way I'm working now and I would really like some feedback from 'CodeIgniter pro's'. To fill in the 'day bar', I created a helper with a couple of methods. (One method to dynamically fill that 'month year' thing you see in the picture, another method init() which loads the list of the days, like I explained before). I loaded this helper in the controller and I'm now using the methods in my view:

    <ul>
        <?php
            init($current_day_of_month, $current_month, 
                          $current_year, $days_in_current_month, $show_history);
        ?>
    </ul>

The helper then echoes my day values in my view. Is this good or bad practice? I kept thinking the wrong way when I wanted to start writing the code for this.. I wanted to have a function somewhere in my controller and then call it from the view, but I read that I shouldn't be doing it like that.. that I had to reverse my logic. I find it hard to wrap my head around the fact that I have to do this by sending arrays of data to my view (from my controller), so I opted for creating the helper. Good? Bad? Any tips, resources I should read, screencasts I should watch? Thanks a bunch.

The key thing if you want your system to be dynamic is to make the data transmission short. So using Ajax, as icchanobot says, send the request for a specific month. Use get:

'some_controller?m=' + month + '&y=' + year

or even:

'some_controller?next' // or previous

The controller has to get data for the correct month, but not send back the whole month - only the data needed for your display, in a format as tight as possible. You could query how many events run on which days of that month:

SELECT day, count(event) FROM event_table WHERE DATE BETWEEN 'yyyy-mm-01' AND 'yyyy-mm-31' GROUP BY day ORDER BY day;

The query needs adapting to your data structure: use a function to get the day from a complete date, and maybe use indexes so that the query returns the data fast.

Then the controller returns a string as short as you can make it, of the relevant data sorted in day order:

1=3,15=1,29=2

That would mean "1st = 3 events, 15th = 1 event, 29th = 2 events". If you don't want the number of events, then "1,15,29" is enough. Empty days aren't transmitted.

The data is received by an Ajax event handler on your web page; you parse it using split, then populate the slider in a loop.
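The compact format is easy to produce and to parse. Here is a minimal sketch in Python of both directions; on the page itself the parsing would be the equivalent split calls in JavaScript, and the function names are invented for the example.

```python
def encode_counts(counts):
    """Serialize {day: events} as "1=3,15=1,29=2", skipping empty days."""
    return ",".join(f"{day}={n}" for day, n in sorted(counts.items()) if n)


def decode_counts(payload):
    """Parse "1=3,15=1,29=2" back into {1: 3, 15: 1, 29: 2}."""
    if not payload:
        return {}  # a month without events is sent as an empty string
    pairs = (item.split("=") for item in payload.split(","))
    return {int(day): int(n) for day, n in pairs}


payload = encode_counts({15: 1, 1: 3, 29: 2, 7: 0})
counts = decode_counts(payload)
print(payload, counts)
```

Days without events never appear in the string, so the receiving side should treat a missing day as zero.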

Your biggest drag, in a very dynamic application, is if it slows down when you repeatedly ask for the next month and the next. A few tricks:

  • Update the display while waiting for data; you send your query, and while it is being processed, you can slide the month into view, with the correct number of days, looking disabled so that the user knows immediately that they will get their data, and that it is in progress. Then when the data comes, populate and highlight. It will feel instant though it isn't.
  • Avoid processing information the user doesn't want anymore. If somebody clicks "next" three times, they want the data for July, not May, June and July. Don't process what you don't display.
  • Cache data you've already asked for, unless you want the system to return dynamically to the server for the latest state of the calendar. You've asked for the data for May and June, but not displayed it; when the user hits "back", don't ask for that data again.
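The last two tricks can be sketched together: keep what has already been fetched, and remember which month the user currently wants so that late responses for other months can be ignored. This is a simplified, synchronous illustration in Python; MonthCache and fake_fetch are hypothetical names, and in the browser the same bookkeeping would live in the Ajax callbacks.

```python
class MonthCache:
    """Remember fetched months and track which month the user wants now."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical callable: (year, month) -> data
        self._data = {}
        self.wanted = None    # most recently requested (year, month)

    def get(self, year, month):
        self.wanted = (year, month)
        if (year, month) not in self._data:
            self._data[(year, month)] = self._fetch(year, month)
        return self._data[(year, month)]

    def is_stale(self, year, month):
        """True if a response for this month is no longer what the user wants."""
        return (year, month) != self.wanted


calls = []

def fake_fetch(year, month):
    calls.append((year, month))   # record each simulated server round trip
    return {1: 3, 15: 1}

cache = MonthCache(fake_fetch)
cache.get(2011, 5)
cache.get(2011, 6)
cache.get(2011, 5)   # served from the cache, no second fetch
print(calls)
```

Only two round trips are made for the three requests, and a response for June arriving after the user has gone back to May would be flagged as stale.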

Good luck!

Which is more efficient for a one or two-character string: CHAR(2) or VARCHAR(2)?

8 votes

A really quick one - is it more efficient to store data which might be one or two characters as CHAR(2) or VARCHAR(2) in MySql?

Thanks!

In terms of storage space required, you're better off with CHAR(2), because the VARCHAR(2) type requires one extra byte to store the length (the figures below assume a single-byte character set):

Value   CHAR(2)  Storage Required  VARCHAR(2)  Storage Required
''      '  '     2 bytes           ''          1 byte          
'a'     'a '     2 bytes           'a'         2 bytes         
'ab'    'ab'     2 bytes           'ab'        3 bytes         

See 10.4.1. The CHAR and VARCHAR Types for more details.

More Information: What's the difference between VARCHAR and CHAR?

Count number of rows that are not within 10 seconds of each other

7 votes

Hello all,

I track web visitors. I store the IP address as well as the timestamp of the visit.

ip_address    time_stamp
180.2.79.3  1301654105
180.2.79.3  1301654106
180.2.79.3  1301654354
180.2.79.3  1301654356
180.2.79.3  1301654358
180.2.79.3  1301654366
180.2.79.3  1301654368
180.2.79.3  1301654422

I have a query to get total tracks:

SELECT COUNT(*) AS tracks FROM tracking

However, I now want to disregard repeat visits from the same user that fall within 10 seconds of each other. I don't consider these separate visits; they're still part of the first visit.

When the ip_address is the same, check timestamp and only count those rows that are 10 seconds away from each other.

I am having difficulty in putting this into a SQL query form, I would appreciate any help on this!

Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.

180.2.79.3   2011-01-01 08:00:00
180.2.79.3   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:20
180.2.79.3   2011-01-01 08:00:23
180.2.79.3   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:40
180.2.79.4   2011-01-01 08:00:00
180.2.79.4   2011-01-01 08:00:13
180.2.79.4   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:25
180.2.79.4   2011-01-01 08:00:27
180.2.79.4   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:50

If I understand you correctly, you want to count these like this.

180.2.79.3   3
180.2.79.4   3

You can do that for each ip_address by selecting the maximum timestamp that is both

  • greater than the current row's timestamp, and
  • less than or equal to 10 seconds greater than the current row's timestamp.

Taking these two criteria together will introduce some nulls, which turn out to be really useful.

select ip_address, 
       t_s.time_stamp, 
       (select max(t.time_stamp) 
        from t_s t 
        where t.ip_address = t_s.ip_address 
          and t.time_stamp > t_s.time_stamp
          and t.time_stamp <= t_s.time_stamp + interval 10 second) next_page
from t_s 
group by ip_address, t_s.time_stamp
order by ip_address, t_s.time_stamp;

ip_address   time_stamp            next_page
180.2.79.3   2011-01-01 08:00:00   2011-01-01 08:00:09
180.2.79.3   2011-01-01 08:00:09   <null>
180.2.79.3   2011-01-01 08:00:20   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:23   2011-01-01 08:00:25
180.2.79.3   2011-01-01 08:00:25   <null>
180.2.79.3   2011-01-01 08:00:40   <null>
180.2.79.4   2011-01-01 08:00:00   <null>
180.2.79.4   2011-01-01 08:00:13   2011-01-01 08:00:23
180.2.79.4   2011-01-01 08:00:23   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:25   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:27   2011-01-01 08:00:29
180.2.79.4   2011-01-01 08:00:29   <null>
180.2.79.4   2011-01-01 08:00:50   <null>

The timestamp that marks the end of a visit has a null for its own next_page. That's because no later timestamp for that ip_address falls within 10 seconds of that row's time_stamp.

To get a count, I'd probably create a view from the query above (called t_s_visits here) and count the nulls.

select ip_address, count(*)
from t_s_visits 
where next_page is null
group by ip_address

180.2.79.3   3
180.2.79.4   3
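The same counting rule can be checked outside the database. The sketch below is a plain Python restatement of "count the rows whose next_page is null", not the answer's own code; `count_visits` is a hypothetical name, and timestamps are given as integer seconds (here, seconds after 08:00:00) to match the Unix timestamps in the question.

```python
from collections import defaultdict


def count_visits(hits, gap=10):
    """Count visits per IP address.

    hits is an iterable of (ip_address, timestamp_in_seconds) pairs. A hit
    ends a visit when no later hit from the same address falls within `gap`
    seconds, which mirrors counting the rows whose next_page is null.
    """
    by_ip = defaultdict(list)
    for ip, ts in hits:
        by_ip[ip].append(ts)

    visits = {}
    for ip, stamps in by_ip.items():
        stamps = sorted(set(stamps))  # like the GROUP BY: one row per timestamp
        ends = 0
        for i, ts in enumerate(stamps):
            is_last = i + 1 == len(stamps)
            # Only the nearest later hit matters: if it is too far away,
            # every other later hit is too.
            if is_last or stamps[i + 1] - ts > gap:
                ends += 1
        visits[ip] = ends
    return visits


sample = [("180.2.79.3", s) for s in (0, 9, 20, 23, 25, 40)]
sample += [("180.2.79.4", s) for s in (0, 13, 23, 25, 27, 29, 50)]
print(count_visits(sample))
```

Because the rows are sorted per address, the check only needs the immediately following timestamp, which avoids the correlated subquery's repeated scans.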

What effects does using a binary collation have?

7 votes

While answering this question, I became uncertain about something that I didn't manage to find a sufficient answer to.

What are the practical differences between using the binary utf8_bin and the case insensitive utf8_general_ci collations?

I can see three:

  1. Both have a different sorting order; _bin's sorting order is likely to put any umlauts to the end of the alphabet, because byte values are compared (right?)

  2. Only case sensitive searches in _bin

  3. No A = Ä equality in _bin

Are there any other differences or side-effects to be aware of?

A binary collation compares strings exactly as strcmp() in C would: characters that differ at all, whether only in case or in diacritics, compare as different. The downside is that the sort order is not natural.

An example of unnatural sort order (as in a binary collation): A, B, a, b. A natural sort order would be, for example: A, a, B, b (lowercase and uppercase variants of the same letter are sorted next to each other).

The practical advantage of a binary collation is speed, since string comparison is very simple and fast. In the general case, indexes with a binary collation might not produce the expected sort order, but they can be useful for exact matches.
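The two orderings can be imitated in a few lines of Python. This is only an analogy: sorting by encoded bytes stands in for a _bin collation, and sorting by lowercased text is a rough stand-in for a case-insensitive collation, not MySQL's actual collation rules.

```python
ascii_words = ["b", "a", "B", "A"]
words = ascii_words + ["\u00c4"]  # \u00c4 is the letter A with an umlaut

# Binary order: compare the UTF-8 encoded bytes, as strcmp() would.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Rough "natural" order: compare lowercased text first, then the original
# string so that the result is deterministic.
natural_order = sorted(ascii_words, key=lambda w: (w.lower(), w))

print(binary_order)   # uppercase first, then lowercase, umlaut last
print(natural_order)  # ['A', 'a', 'B', 'b']
```

The umlaut lands at the very end of the binary order because its UTF-8 encoding starts with a byte larger than any ASCII letter.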

Live chat with PHP and jQuery. Where to store information? Mysql or file?

7 votes

There is a 1-on-1 live chat. Two solutions:

1) I store every message in the database, and with jQuery's help I check every second whether there is a new message in the database. Of course I use a cache too. If there is one, we deliver that message.

2) I store every message in one HTML file, and every second jQuery loads and shows that file again.

Which is better? Or is there a third option? And in general, which is better for this kind of project, MySQL or a file?

Thank you very much.

P.S. The most important question is: which is more efficient and which will eat fewer resources?

Edit: And is it, nowadays, very bad for many chats (let's say 2,500 chats, which means 5,000 users) to use long polling and check every second through JavaScript when the file was edited? I use methods very similar to this chat: http://css-tricks.com/jquery-php-chat/ Will it kill my hosting?

Everyone has given a wide range of opinions but I don't think anyone has really hit the nail on the head.

When it comes down to storing data, the amount of data, the rate it is to be accessed, and several other factors all determine what's the best storage platform.

Some people have suggested using memcached. Now although this is a valid answer (you can use it), I don't think that this is a good idea, solely based on the fact that memcached stores data within your server's memory.

Your memory is not for data storage, it's for use of the actual applications, operating system, shared libraries, etc.

Storing data within the memory can cause a lot of issues with other applications currently running. If you store too much data in your RAM your applications would not be able to complete operations assigned to them.

Although this is faster than a disk-based storage platform such as MySQL, it's not as reliable.

I would personally use MySQL as your storage engine server-side. This would reduce the number of problems you would come across and also makes the data very manageable.

To speed up the responses to your clients I would look at running node on your server.

This is because it's event driven and non-blocking.

What does that mean?

Well, when Client A requests some data that is stored on the hard drive, traditionally PHP might say to the C++, 'fetch me this chunk of data stored on this sector of the hard drive'. C++ would say 'ok, no problem', and while it goes off to get the information, PHP would sit and wait for the data to be read and returned before it continues its operations, blocking all other clients in the meantime.

With node, it's slightly different. Node will say to the kernel, 'fetch me this chunk of information and when you're done, give me a call', and then it continues to take requests from other clients that may not need disk access.

So suddenly, because we have assigned a callback to the kernel, we do not have to wait :), happy days.

Take a look at this image: Node Event Loop

This really could be the answer you're looking for; please see the following for more detailed information on how node could be the right choice for you:

PHP/MySQL - find items that have similar or matching properties

7 votes

I'm trying to develop a way of taking an entity with a number of properties and searching for similar entities in the database (matching as many of the properties in the correct order as possible). The idea is that it would then return a % of how similar it is.

The order of the properties should also be taken into account, so the properties at the beginning are more important than the ones at the end.

For example:

Item 1 - A, B, C, D, E

Item 2 - A, B, C, D, E

Would be a 100% match

Item 1 - A, B, C, D, E

Item 2 - B, C, A, D, E

This wouldn't be a perfect match as the properties are in a different order

Item 1 - A, B, C, D, E

Item 2 - F, G, H, I, A

Would be a low match as only one property is the same and it is in position 5

This algorithm will run for thousands and thousands of records so it needs to be high performing and efficient. Any thoughts as to how I could do this in PHP/MySQL in a fast and efficient manner?

I was considering levenshtein but as far as I can tell that would also look at the distance between two completely different words in terms of spelling. Doesn't appear to be ideal for this scenario unless I'm just using it in the wrong way..

It might be that it could be done solely in MySQL, perhaps using a full text search or something.

This seems like a nice solution, though not designed for this scenario. Perhaps binary comparison could be used in some way?

what i'd do is encode the order and property value into a number. numbers have the advantage of fast comparisons.

this is a general idea and may still need some work but i hope it would help in some way.

calculate a number (some form of hash) for each property and multiply it by a number representing the property's order of appearance within the item.

say item1 has 3 properties A, B and C.

hash(A) = 123, hash(B) = 345, hash(C) = 456

then multiply that by the order of appearance given that we have a known number of properties:

(hash(A) * 1,000,000) + (hash(B) * 1,000) + (hash(C) * 1) = someval

magnitude of the multiplier can be tweaked to reflect your data set. you'll have to identify the hash function. soundex maybe?

the problem is now reduced to a question of uniqueness due to hash collisions but we can be pretty sure about properties that don't match.

also, this would have the advantage of relative ease of checking if a property appears in another item in different order by using the magnitude of the multiplier to extract the hash value from the number generated.

HTH.

edit: example for checking matches

given item1(a b c) and item2(a b c). the computed hash of items would be equal. this is a best case scenario. no further computations are required.

given item1(a b c) and item2(d e a). computed hash of items are not equal. proceed to breaking down property hashes...

say a hash table for properties a = 1, b = 2, c = 3, d = 4, e = 5 with 10^n for multiplier. computed hash for item1 is 123 and item2 is 451, break down the computed hash for each property and compare for all combinations of properties one for each item1 (which becomes item1(1 2 3) ) and item2 (which becomes item2(4 5 1) ). then compute the score.

another way of looking at it would be comparing the properties one by one, except this time, you're playing with numbers instead of the actual string values
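a rough sketch of the idea in Python, using the single-digit example above (a = 1 ... e = 5, multiplier 10^n). `encode` and `decode` follow the description; the `score` function is entirely hypothetical, since the answer leaves the scoring open: it gives full weight to a property in the same position, half weight to one found in another position, and weighs earlier positions more.

```python
def encode(props, codes, base=10):
    """Pack an ordered property list into one integer, first property in the
    most significant position. Every code must be smaller than `base`."""
    value = 0
    for prop in props:
        value = value * base + codes[prop]
    return value


def decode(value, count, base=10):
    """Split a packed integer back into its `count` per-position codes."""
    digits = []
    for _ in range(count):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits[::-1]


def score(value1, value2, count, base=10):
    """Hypothetical position-weighted similarity between 0.0 and 1.0."""
    if value1 == value2:
        return 1.0  # best case: identical packed values, nothing to break down
    first, second = decode(value1, count, base), decode(value2, count, base)
    weights = range(count, 0, -1)  # earlier positions weigh more
    earned = 0.0
    for weight, a, b in zip(weights, first, second):
        if a == b:
            earned += weight        # same property in the same position
        elif a in second:
            earned += weight / 2    # property present, but in another position
    return earned / sum(weights)


codes = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
item1 = encode(["a", "b", "c"], codes)  # 123
item2 = encode(["d", "e", "a"], codes)  # 451
print(item1, item2, score(item1, item2, 3))
```

with real hashes the base would have to be larger than the largest hash value, and collisions between property hashes would still need handling.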

How to select part of a Timestamp in a SQL Query

7 votes

Hi all,

In the DB I am working with, I want to select only the year from a specific TimeStamp field. In particular, I'm hoping to select the unique years from this database column.

For instance, if all of the timestamps in the field "ModifyTimeStamp" are either from the year 2002 or the year 2006, I would like returned simply a result of '2002' and '2006'. If this is impossible, I'd be content with getting a result of a bunch of '2002's mixed with '2006's and would parse it later.

All I've been able to get working so far is "Select ModifyTimeStamp from Table" - all my attempts to parse have failed. I started reading about the extract command for SQL, but I believe it's only for PostgreSQL.

Any advice greatly appreciated!

Edit: Got the answer, thanks a lot datagod and Marc. The code I was looking for ended up being:

"Select distinct YEAR(ModifyTimeStamp) from Table"

You don't specify which RDBMS (database server) you're using, but most databases do have date handling functions built-in:

  • MySQL/SQL Server:

    select YEAR(modifytimestamp) from yourtable
    
  • Access/SQL Server:

    select DatePart(year, modifytimestamp) from yourtable
    
  • Oracle:

    select TO_CHAR(modifytimestamp, 'YYYY') from yourtable
    

Is there any downside to giving extra size/length to your database columns?

6 votes

I usually set all my varchars to 255 to be safe. Does it make any difference in terms of the disk space or anything else? Is there any downside to having bigger varchars/ints/other fields than you would mostly need?

It takes more time and more disk transfers to load larger data items into memory. Defining large maximum sizes for columns increases the size of table rows. For many DBMS servers, table rows are the items transferred. So defining columns that are too fat does slow things down.

This effect is minimal for VARCHAR items. But VARCHAR is quite a bit slower than data types like integers. Eight byte integers take four times as much time to transfer as two byte integers. So, if a database is being designed for ultimate performance, limiting data columns to the range actually required will speed things up. The extent of this effect depends on whether the disk channel is a bottleneck or not.

Another possible bottleneck is the channel that links the server with the client, often a network channel. Bottlenecking in this channel can be reduced by queries that don't ask for data that will never be used, but there's a trade off here between asking for data only when you need it and making too many round trips.

There's also a trade-off between designing for optimal performance and over-designing in anticipation of changing requirements.

jQuery, ajax, php, mysql: auto-suggest form input

5 votes

I have a mysql db table that has every U.S. city, state, zip, lat, long. On my site page I have a form input for cities. As the user types, I'd like to suggest city, state. What current practices/techniques are good? (Limitations are jQuery, PHP, MYSQL)

Solution

jQuery UI has a nice autocomplete feature.

JQuery UI Autocomplete

It is highly configurable. You should send the data in JSON format from the server; Autocomplete will handle most of the rest. Check out the docs.
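A minimal sketch of the server side, in Python for illustration only (the question's stack would do the same thing in PHP). It assumes the widget sends what the user has typed and accepts a JSON array of strings in reply; `suggest`, the sample rows and the result limit are all made up for this example.

```python
import json


def suggest(term, cities, limit=10):
    """Return a JSON array of "City, ST" strings whose city starts with term.

    cities is a list of (city, state) tuples; in the real application these
    rows would come from a MySQL prefix query (city LIKE 'term%') with a
    LIMIT, against whatever table and column names the site uses.
    """
    prefix = term.strip().lower()
    if not prefix:
        return "[]"
    matches = [f"{city}, {state}" for city, state in cities
               if city.lower().startswith(prefix)]
    return json.dumps(matches[:limit])


rows = [("Springfield", "IL"), ("Springfield", "MA"),
        ("Spokane", "WA"), ("Boston", "MA")]
print(suggest("spr", rows))
```

Capping the number of suggestions keeps each response small, which matters because a request is made as the user types.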

Get last record of each month in MySQL....?

5 votes

Hello All, I have a problem in writing a query for MySQL. I have following fields in the DB

id     created_on            status
1      2011-02-15 12:47:09    1 
2      2011-02-24 12:47:09    1
3      2011-02-29 12:47:09    1
4      2011-03-11 12:47:09    1
5      2011-03-15 12:47:09    1
6      2011-03-22 12:47:09    1
7      2011-04-10 12:47:09    1
8      2011-04-11 12:47:09    1

I need to select the last record of each month. That is, record #3 for February, record #6 for March and record #8 for April.

Please help me.....

Thanks in advance.....

SELECT * FROM table 
WHERE created_on in 
(select DISTINCT max(created_on) from table 
GROUP BY YEAR(created_on), MONTH(created_on))

Performance Testing Various Databases

5 votes

I am currently testing a few different relational database management systems (MySQL, PostgreSQL, Oracle and Firebird SQL), using a Java application to do so.

I was debating what tests I should run to distinguish the performances of each.

The obvious ones would be insert, select, delete and drop.

I would love to hear your opinions, and to make this fit the question-and-answer format I shall ask for the 5 most appropriate tests to indicate performance differences. In an ideal world I would like to mimic real-world use.

Thanks to all who answer.

I think that any of them would probably be fine. However, your configuration of the different databases for what you are trying to do would be different based on your application.

Suggested place to start: look for apps similar to yours. See what they are using, if you can. Then start testing the different databases with similar configurations and see what works for you.

Personally I've used Oracle, MySql, and Postgres over the last 11 years and they've all worked well. It's really all in your configuration, which is where a good DBA comes in handy.

Is there a mySQL frontend that has an interface for joins?

4 votes

Are there any mySQL frontends, like phpmyadmin, that have a graphical interface for joins?

I know you can run saved queries (which may include joins) in phpmyadmin, but I am looking for a user-friendly way of how other frontends are tackling the problem. I don't actually need a frontend, I just want to see how others are doing it.

If none are available, what would be a good way of approaching creating a join interface?

I am currently thinking, given a student and enrollment table (as a super simple example), such that

student table
+---------------------------+
| id   | name   | number    |
+------+--------+-----------+
| 2    | Joe    | 04567843  |
| 3    | Jim    | 43243254  |
| 4    | Jack   | 23145671  |
+------+--------+-----------+

and..

enrollment 
+------+--------------+------------+-----------+
| id   | student_id   | course_id  | score     |
+------+--------------+------------+-----------+
| 1    | 2            | ma001      | 86%       |
| 2    | 2            | en001      | 46%       |
| 3    | 3            | ma001      | 78%       |
+------+--------------+------------+-----------+

The interface could allow you to select a primary table, and the fields you want, then a secondary table, and the fields you want. And finally, a JOIN fieldset, where you choose the join type and the fields connecting it (see image).

The image is a mockup using firebug manipulating phpmyadmin to show what I mean (hopefully)

[image: mockup of the proposed join interface]

I realise this is kind of 2 questions, but highly linked to each other, but to summarise, does a front end like this exist? And if not, would the above approach work?

If you want to look at how others do it, play around with Microsoft Access a bit.