Best mysql questions in May 2012

Migrating from MySQL to SQL Server, issues with constraints

11 votes

I created a web app that uses a MySQL database, but I have to migrate the database to Microsoft SQL Server 2008 R2 and I'm using the SQL Server Migration Assistant (SSMA).

I'm getting errors in my report for some tables that use foreign keys.

1. Self-referencing foreign keys

I have one table that has a parent-child relationship between rows; map table:

| map_id | map_title           | latitude  | longitude  | map_zoom | map_parent |
|:------:|:-------------------:|:---------:|:----------:|:--------:|:----------:|
| 1      | My Parent Map       | 50.364829 | -52.635623 | 17       | NULL       |
| 2      | Some Child Map      | 50.366916 | -52.634718 |          | 1          |
| 3      | Another Child Map   | 50.364898 | -52.634543 |          | 1          |
| 4      | My Last Example Map | 50.361986 | -52.638891 |          | 3          |

The report generated by SQL Server Migration Assistant (SSMA) shows the SQL that would be used to create a table in SQL Server.

MySQL (source):

CREATE TABLE `map` (
    `map_id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
    `map_title` varchar(50) DEFAULT NULL,
    `latitude` varchar(12) DEFAULT NULL,
    `longitude` varchar(12) DEFAULT NULL,
    `map_zoom` varchar(5) NOT NULL,
    `map_parent` int(11) UNSIGNED DEFAULT NULL,
    PRIMARY KEY (`map_id`),
    KEY `map_parent` (`map_parent`),
    CONSTRAINT `map_ibfk_2` FOREIGN KEY (`map_parent`) REFERENCES `map` (`map_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB AUTO_INCREMENT = 12 DEFAULT CHARSET = utf8;

SQL Server (target, SQL generated by SSMA):

CREATE TABLE dbo.map
(
    map_id bigint NOT NULL IDENTITY(12, 1),
    map_title nvarchar(50) NULL DEFAULT NULL,
    latitude nvarchar(12) NULL DEFAULT NULL,
    longitude nvarchar(12) NULL DEFAULT NULL,
    map_zoom nvarchar(5) NOT NULL,
    map_parent bigint NULL DEFAULT NULL,
    CONSTRAINT PK_map_map_id PRIMARY KEY (map_id),
    /*
    *   SSMA error messages:
    *   M2SS0040: ON DELETE CASCADE|SET NULL|SET DEFAULT action was changed to NO ACTION to avoid circular references of cascaded foreign keys.

    CONSTRAINT map$map_ibfk_2 FOREIGN KEY (map_parent) REFERENCES dbo.map (map_id)
         ON DELETE NO ACTION
        /*
        *   SSMA error messages:
        *   M2SS0036: ON UPDATE CASCADE|SET NULL|SET DEFAULT action was changed to NO ACTION to avoid circular references of cascaded foreign keys.

         ON UPDATE NO ACTION
        */

    */

)
GO
CREATE NONCLUSTERED INDEX map_parent
    ON dbo.map (map_parent ASC)
GO

As you can see, it gives an error indicating that it changed my ON UPDATE CASCADE and ON DELETE CASCADE to NO ACTION in order "to avoid circular references of cascaded foreign keys."

2. Many-to-many tables

I have two tables that got an error for "multiple paths" and similarly were changed to NO ACTION.

asset_property table:

| asset_id | property_id | property_value  |
|:--------:|:-----------:|:---------------:|
| 933      | 1           | Joseph          |
| 933      | 2           | Green           |
| 936      | 1           | Jacob           |
| 936      | 2           | Yellow          |
| 942      | 1           | Susan           |
| 942      | 2           | Blue            |

MySQL (source):

CREATE TABLE `asset_property` (
    `asset_id` int(11) NOT NULL,
    `property_id` int(11) NOT NULL,
    `property_value` varchar(100) DEFAULT NULL,
    PRIMARY KEY (`asset_id`, `property_id`),
    KEY `asset_id` (`asset_id`),
    KEY `property_id` (`property_id`),
    CONSTRAINT `asset_property_ibfk_1` FOREIGN KEY (`asset_id`) REFERENCES `asset` (`asset_id`) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `asset_property_ibfk_2` FOREIGN KEY (`property_id`) REFERENCES `property` (`property_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB DEFAULT CHARSET = utf8;

SQL Server (target, SQL generated by SSMA):

CREATE TABLE dbo.asset_property
(
    asset_id int NOT NULL,
    property_id int NOT NULL,
    property_value nvarchar(100) NULL DEFAULT NULL,
    CONSTRAINT PK_asset_property_asset_id PRIMARY KEY (asset_id, property_id),
    /*
    *   SSMA error messages:
    *   M2SS0041: ON DELETE CASCADE|SET NULL|SET DEFAULT action was changed to NO ACTION to avoid multiple paths in cascaded foreign keys.

    CONSTRAINT asset_property$asset_property_ibfk_1 FOREIGN KEY (asset_id) REFERENCES dbo.asset (asset_id)
         ON DELETE NO ACTION
        /*
        *   SSMA error messages:
        *   M2SS0037: ON UPDATE CASCADE|SET NULL|SET DEFAULT action was changed to NO ACTION to avoid multiple paths in cascaded foreign keys.

         ON UPDATE NO ACTION
        */

    */

,
    CONSTRAINT asset_property$asset_property_ibfk_2 FOREIGN KEY (property_id) REFERENCES dbo.property (property_id)
         ON DELETE CASCADE
         ON UPDATE CASCADE
)
GO
CREATE NONCLUSTERED INDEX asset_id
    ON dbo.asset_property (asset_id ASC)
GO
CREATE NONCLUSTERED INDEX property_id
    ON dbo.asset_property (property_id ASC)
GO

I've only found one article that talks about these errors. The article's solution for the self-referencing table error doesn't seem to apply, and the many-to-many error solution is to just remove the constraint "because the application or user shouldn’t be modifying these values."

Thanks for any help!!


[Image: db diagram]

I am not experienced with SSMA; I have used SSIS for migrating databases. After reading your question, I think I can give you some suggestions.

You have created a circular dependency in your database. When a database has circular dependencies and there is data in both dependent tables, you have to disable the key constraints in the destination database in order to migrate the data. To avoid the second problem, avoid the cascading option and use a stored procedure or trigger instead.

You can take a look at this link:

http://blogs.msdn.com/b/ssma/archive/2011/03/19/mysql-to-sql-server-migration-method-for-correcting-schema-issues.aspx

Improving a friends list query: counting the mutual friends

11 votes

I have two tables in my database: one keeps the users' info (users_table) and the other keeps track of the friends.

users_table:

id    username      avatar  
1         max       max.jpg  
2         jack      jack.jpg  

friends_table :

id    u1_id      u2_id  
1         1          2  
2         1          3  

In every user profile I show his/her friends list.

Here is my query:

select u.id,
    u.username,
    u.avatar
from friends_table f
join users_table u on f.u1_id = u.id || f.u2_id = u.id
where u.id <> $profile_id
    and (f.u1_id = $profile_id || f.u2_id = $profile_id)

This query selects the friends of the profile owner ($profile_id) and joins them with the users table to get each friend's username and avatar.

Now I want to count the mutual friends between each friend and the profile owner. Is it possible to do this in one query, or should I run some long and probably slow query like this for each friend found (it's just an example and it might have some syntax errors):

foreach ($friends_list_query_result as $qr) {
    $friend_id = $qr['id'];

    // Count the friendships of $friend_id whose other member
    // is also a friend of $profile_id.
    $mutual_count = mysql_query(
        "select count(*) from friends_table where
           (u1_id = $friend_id and u2_id in (
              select u1_id from friends_table where u2_id = $profile_id
              union
              select u2_id from friends_table where u1_id = $profile_id))
           or
           (u2_id = $friend_id and u1_id in (
              select u1_id from friends_table where u2_id = $profile_id
              union
              select u2_id from friends_table where u1_id = $profile_id))"
    );
}

I've decided to add two rows for each friend relation to the table.

id    u1_id      u2_id  
1         10         20  
2         20         10

It makes the process easier and faster.
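
With both directions stored, the mutual-friend count for every friend can come from a single query. The following is only a sketch of that idea: it uses Python's built-in sqlite3 module instead of MySQL so that it runs on its own, the sample friendships are made up, and the helper names (add_friendship, mutual_friend_counts) are hypothetical. It also assumes each (u1_id, u2_id) pair is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends_table ("
    " u1_id INTEGER, u2_id INTEGER, PRIMARY KEY (u1_id, u2_id))"
)


def add_friendship(a, b):
    # Store the relation in both directions, as described above.
    conn.executemany("INSERT INTO friends_table VALUES (?, ?)", [(a, b), (b, a)])


# Made-up sample data: user 1 is friends with 2, 3 and 4; 2 and 3 know each other.
for a, b in [(1, 2), (1, 3), (2, 3), (1, 4)]:
    add_friendship(a, b)


def mutual_friend_counts(profile_id):
    """Return {friend_id: mutual friend count} for every friend of profile_id."""
    sql = """
        SELECT f.u2_id AS friend_id,
               (SELECT COUNT(*)
                  FROM friends_table a          -- the friend's friends
                  JOIN friends_table b          -- the profile owner's friends
                    ON b.u2_id = a.u2_id
                 WHERE a.u1_id = f.u2_id
                   AND b.u1_id = :pid) AS mutual
          FROM friends_table f
         WHERE f.u1_id = :pid
    """
    return dict(conn.execute(sql, {"pid": profile_id}).fetchall())


counts = mutual_friend_counts(1)
print(counts)
```

The correlated subquery should carry over to MySQL largely unchanged, but that has not been tested here.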

How to set a MySQL row to READ-ONLY?

10 votes

I have a row in a table that I do not want to be changed (ever).

Is it possible to set a MySQL row to READ-ONLY so that it cannot be updated in any way? If so, how?

If not, is it possible to set a permanent value in one of the columns of that row so that it cannot be changed? If so, how?

Thanks.

You can create a BEFORE UPDATE trigger that raises an error if a "locked" record is about to be updated (e.g. by calling a non-existent procedure); since an error occurs before the operation is undertaken, MySQL ceases to proceed with it. If you also want to prevent the record from being deleted, you'd need to create a similar trigger BEFORE DELETE.

To determine whether a record is "locked", you could create a boolean locked column:

ALTER TABLE my_table ADD COLUMN locked BOOLEAN NOT NULL DEFAULT FALSE;

UPDATE my_table SET locked = TRUE WHERE ...;

DELIMITER ;;

CREATE TRIGGER foo_upd BEFORE UPDATE ON my_table FOR EACH ROW
IF OLD.locked THEN CALL raise_error;
END IF;;

CREATE TRIGGER foo_del BEFORE DELETE ON my_table FOR EACH ROW
IF OLD.locked THEN CALL raise_error;
END IF;;

DELIMITER ;

In your case, as you want to permanently lock a very specific record, you can do without the locked column and instead hard-code the test into your trigger; for example, to "lock" the record with id_column = 1234:

DELIMITER ;;

CREATE TRIGGER foo_upd BEFORE UPDATE ON my_table FOR EACH ROW
IF OLD.id_column <=> 1234 THEN CALL raise_error;
END IF ;;

CREATE TRIGGER foo_del BEFORE DELETE ON my_table FOR EACH ROW
IF OLD.id_column <=> 1234 THEN CALL raise_error;
END IF ;;

DELIMITER ;
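
To see the locking behaviour end to end without a MySQL server, here is a rough sketch of the same idea using Python's built-in sqlite3 module. This is SQLite, not MySQL: the trigger syntax differs (SQLite uses a WHEN clause and RAISE(ABORT, ...) rather than calling a missing procedure), and the val column, sample rows and try_sql helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        id_column INTEGER PRIMARY KEY,
        val       TEXT,
        locked    INTEGER NOT NULL DEFAULT 0
    );

    -- SQLite counterpart of the BEFORE UPDATE / BEFORE DELETE triggers:
    -- abort the statement whenever the existing row is flagged as locked.
    CREATE TRIGGER foo_upd BEFORE UPDATE ON my_table
    FOR EACH ROW WHEN OLD.locked
    BEGIN
        SELECT RAISE(ABORT, 'row is locked');
    END;

    CREATE TRIGGER foo_del BEFORE DELETE ON my_table
    FOR EACH ROW WHEN OLD.locked
    BEGIN
        SELECT RAISE(ABORT, 'row is locked');
    END;

    INSERT INTO my_table VALUES (1, 'free', 0);
    INSERT INTO my_table VALUES (2, 'frozen', 1);
""")


def try_sql(sql):
    """Run a statement; return True if it succeeded, False if a trigger aborted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False


updated_free = try_sql("UPDATE my_table SET val = 'changed' WHERE id_column = 1")
updated_locked = try_sql("UPDATE my_table SET val = 'changed' WHERE id_column = 2")
deleted_locked = try_sql("DELETE FROM my_table WHERE id_column = 2")
print(updated_free, updated_locked, deleted_locked)
```

Note that, as with the MySQL triggers, a locked row cannot be unlocked again through a plain UPDATE, since that update is itself rejected.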

InnoDB Bottleneck: Relaxing ACID to Improve Performance

9 votes

After noticing that our database has become a major bottleneck on our live production systems, I decided to construct a simple benchmark to get to the bottom of the issue.

The benchmark: I time how long it takes to increment the same row in an InnoDB table 3000 times, where the row is indexed by its primary key, and the column being updated is not part of any index. I perform these 3000 updates using 20 concurrent clients running on a remote machine, each with its own separate connection to the DB.

I'm interested in learning why the different storage engines I benchmarked, InnoDB, MyISAM, and MEMORY, have the profiles that they do. I'm also hoping to understand why InnoDB fares so poorly in comparison.

InnoDB (20 concurrent clients): Each update takes 0.175s. All updates are done after 6.68s.

MyISAM (20 concurrent clients): Each update takes 0.003s. All updates are done after 0.85s.

Memory (20 concurrent clients): Each update takes 0.0019s. All updates are done after 0.80s.

Thinking that the concurrency could be causing this behavior, I also benchmarked a single client doing 100 updates sequentially.

InnoDB: Each update takes 0.0026s.

MyISAM: Each update takes 0.0006s.

MEMORY: Each update takes 0.0005s.

The actual machine is an Amazon RDS instance (http://aws.amazon.com/rds/) with mostly default configurations.

I'm guessing that the answer will be along the following lines: InnoDB fsyncs after each update (since each update is an ACID-compliant transaction), whereas MyISAM does not, since it doesn't even support transactions. MyISAM is probably performing all updates in memory and regularly flushing to disk, which is how its speed approaches that of the MEMORY storage engine. If this is so, is there a way to use InnoDB for its transaction support, but perhaps relax some constraints (via configuration) so that writes are done faster at the cost of some durability?

Also, any suggestions on how to improve InnoDB's performance as the number of clients increases? It is clearly scaling worse than the other storage engines.

Update

I found https://blogs.oracle.com/MySQL/entry/comparing_innodb_to_myisam_performance, which is precisely what I was looking for. Setting innodb-flush-log-at-trx-commit=2 allows us to relax ACID constraints (flushing to disk happens once per second) for the case where a power failure or server crash occurs. This gives us a similar behavior to MyISAM, but we still get to benefit from the transaction features available in InnoDB.

Running the same benchmarks, we see a 10x improvement in write performance.

InnoDB (20 concurrent clients): Each update takes 0.017s. All updates are done after 0.98s.

Any other suggestions?

Is it possible to merge two records by using "delete from" statement?

9 votes

I have some networked equipment that's attached to multiple networks/VLans (A, B & C), and other equipment that's just connected to one of the networks. When I remove or replace a network, I need to update my database to reflect what the equipment is attached to so I'm trying to write a mysql statement to do that, but I'm running into various road blocks.

My table only has two fields and there cannot be duplicate records. My data example is

deviceID network
1        A
1        B
1        C
2        B
2        C
3        A
4        A
5        B

How can I merge network A into network B so the above table would look like...

deviceID network
1        B
1        C
2        B
2        C
3        B
4        B
5        B

My initial attempt was to just set network = 'B' where network = 'A', followed by a DELETE network 'A' statement but that would create duplicates, which isn't allowed for that table - even though the duplicates would be brief. Using alternate methods, I just keep running into failed mysql statements by using WHERE EXISTS and various FROM (SELECT) statements. Is it possible to do in a single mysql statement? Do I need two?

Any help is appreciated.

You could use UPDATE IGNORE with your update statement - this would skip any updates that caused duplicates. You would then follow this with a DELETE to clear the rows that had been skipped. For example:

UPDATE IGNORE mytable SET network = 'B' WHERE network = 'A';
DELETE FROM mytable WHERE network = 'A';

From the documentation:

With the IGNORE keyword, the update statement does not abort even if errors occur during the update. Rows for which duplicate-key conflicts occur are not updated. Rows for which columns are updated to values that would cause data conversion errors are updated to the closest valid values instead.
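
As a quick check of the two-statement approach against the data in the question, here is a small sketch using Python's built-in sqlite3 module. SQLite spells the first statement UPDATE OR IGNORE rather than MySQL's UPDATE IGNORE, and the table name mytable plus the merge_network helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    " deviceID INTEGER, network TEXT, PRIMARY KEY (deviceID, network))"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "B"), (2, "C"), (3, "A"), (4, "A"), (5, "B")],
)


def merge_network(old, new):
    """Merge network `old` into `new` without ever creating a duplicate row."""
    with conn:  # run both statements in one transaction
        # Rows that would collide with an existing (deviceID, new) row are skipped...
        conn.execute(
            "UPDATE OR IGNORE mytable SET network = ? WHERE network = ?", (new, old)
        )
        # ...and those skipped rows are then removed.
        conn.execute("DELETE FROM mytable WHERE network = ?", (old,))


merge_network("A", "B")
merged = conn.execute(
    "SELECT deviceID, network FROM mytable ORDER BY deviceID, network"
).fetchall()
print(merged)
```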

SQL: Repeat a result row multiple times, and number the rows

8 votes

I have a SQL query with a result like this:

value | count
------+------
foo   |     1
bar   |     3
baz   |     2

Now I want to expand this so that each row with a count larger than 1 occurs multiple times. I also need these rows to be numbered. So I would get:

value | count | index
------+-------+------
foo   |     1 |     1
bar   |     3 |     1
bar   |     3 |     2
bar   |     3 |     3
baz   |     2 |     1
baz   |     2 |     2

I have to make this work on all the major databases (Oracle, SQL Server, MySQL, PostgreSQL, and maybe more). So a solution that works across different databases would be ideal, but clever ways to make it work on any database are appreciated.

For MySQL, use the poor man's generate_series, which is done via views. Among the big four RDBMSs, MySQL is the only one that has no CTE feature.

Actually, you can use this technique on any database that supports views, which is virtually all of them.

Generator technique sourced here: http://use-the-index-luke.com/blog/2011-07-30/mysql-row-generator#mysql_generator_code

The only minor modification we made is to replace the bitwise operations of the original technique (shift left and bitwise OR) with plain multiplication and addition, respectively, because SQL Server and Oracle have no shift-left operator.

This abstraction is 99% guaranteed to work on all databases except Oracle. Oracle's SELECT can't function without a table, so you need to select from a dummy table; Oracle already provides one, called DUAL. Database portability is a pipe dream :-)

Here are the abstracted views that work on all RDBMSs, devoid of bitwise operations (which are not really a necessity in this scenario anyway) and of feature nuances (we removed OR REPLACE from CREATE VIEW, since only PostgreSQL and MySQL support it).

Oracle caveat: just put FROM DUAL after each SELECT expression.

CREATE VIEW generator_16
AS SELECT 0 n UNION ALL SELECT 1  UNION ALL SELECT 2  UNION ALL 
   SELECT 3   UNION ALL SELECT 4  UNION ALL SELECT 5  UNION ALL
   SELECT 6   UNION ALL SELECT 7  UNION ALL SELECT 8  UNION ALL
   SELECT 9   UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
   SELECT 12  UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL 
   SELECT 15;

CREATE VIEW generator_256
AS SELECT ( ( hi.n * 16 ) + lo.n ) AS n
     FROM generator_16 lo, generator_16 hi;

CREATE VIEW generator_4k
AS SELECT ( ( hi.n * 256 ) + lo.n ) AS n
     FROM generator_256 lo, generator_16 hi;

CREATE VIEW generator_64k
AS SELECT ( ( hi.n * 256 ) + lo.n ) AS n
     FROM generator_256 lo, generator_256 hi;

CREATE VIEW generator_1m
AS SELECT ( ( hi.n * 65536 ) + lo.n ) AS n
     FROM generator_64k lo, generator_16 hi;

Then use this query:

SELECT t.value, t.cnt, i.n
FROM tbl t
JOIN generator_64k i 
ON i.n between 1 and t.cnt
order by t.value, i.n

Postgresql: http://www.sqlfiddle.com/#!1/1541d/1

Oracle: http://www.sqlfiddle.com/#!4/26c05/1

Sql Server: http://www.sqlfiddle.com/#!6/84bee/1

MySQL: http://www.sqlfiddle.com/#!2/78f5b/1
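
For a self-contained run of the same technique, here is a sketch using Python's built-in sqlite3 module (so yet another database, not one of the four above). It only builds the first two generator views, which is enough for counts up to 255, and the table tbl with columns value and cnt is assumed from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE VIEW generator_16 AS
        SELECT 0 AS n UNION ALL SELECT 1  UNION ALL SELECT 2  UNION ALL
        SELECT 3      UNION ALL SELECT 4  UNION ALL SELECT 5  UNION ALL
        SELECT 6      UNION ALL SELECT 7  UNION ALL SELECT 8  UNION ALL
        SELECT 9      UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
        SELECT 12     UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL
        SELECT 15;

    -- 16 x 16 combinations give the numbers 0..255.
    CREATE VIEW generator_256 AS
        SELECT ((hi.n * 16) + lo.n) AS n
          FROM generator_16 lo, generator_16 hi;

    CREATE TABLE tbl (value TEXT, cnt INTEGER);
    INSERT INTO tbl VALUES ('foo', 1);
    INSERT INTO tbl VALUES ('bar', 3);
    INSERT INTO tbl VALUES ('baz', 2);
""")

# Each row of tbl is repeated once per generated number between 1 and cnt.
rows = conn.execute("""
    SELECT t.value, t.cnt, i.n
      FROM tbl t
      JOIN generator_256 i ON i.n BETWEEN 1 AND t.cnt
     ORDER BY t.value, i.n
""").fetchall()

for row in rows:
    print(row)
```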

Slow MySQL Remote Connection

8 votes

Currently our site is running on 1 server (Ubuntu 10.04 - Rackspace is our host), and in order to be able to handle traffic spikes, we are currently using the highest option that Rackspace offers (30 GB RAM and an 8-core CPU).

CPU is our bottleneck, so I would like to put MySQL on its own server. I have tried doing this, but unfortunately it's adding 9 seconds to the page load time. PHP / MySQL is connecting from 1 server to the other through Rackspace's ServiceNet (local ip address). Before using the ServiceNet ip address, the page load was ridiculously slow (over 40 seconds).

I have added "skip-name-resolve" to my.cnf, and this did not seem to improve performance at all.

I am just wondering what options I have to reduce the remote MySQL connection time. It seems like there has to be something I'm missing because an extra 9 seconds is way too much.

Cloned server running MySQL from the other server: http://173.45.255.52/index.php?action=browse_members

Live server running MySQL locally: http://bros4bros.com/index.php?action=browse_members


UPDATE:

I did a simple query, and my page load time was very fast (270 ms), so I did a much larger query (SELECT * FROM cities WHERE 1... this table has almost 3,000,000 records), and the response time is still not much different from the live site: http://173.45.255.52/simple_query.php

I'm kind of stumped. How can it be adding 9 seconds to the site load time just by running MySQL remotely?

A waterfall shows it's just waiting for 9 seconds.

[Image: waterfall chart, waiting]

The live site's waterfall is below:

[Image: live site waterfall chart]


UPDATE 2:

Using mysql_pconnect instead of mysql_connect makes no difference. There is still an extra 8 - 9 seconds of waiting. Pinging one server to the other is less than 1 ms. I'm tearing my hair out now.

Temporarily disabling the firewall has no effect.

Using "mysqli_connect" instead of "mysql_connect" successfully connects but queries return no results... SCRATCHES HEAD


UPDATE 3:

Successfully updated the code to use "mysqli_connect" instead of "mysql_connect", and there was no performance benefit. I had to change my "mysql_query" statements to "mysqli_query", etc.


UPDATE 4:

Yeah, that sounds like it has to be the problem (in response to Jens' comment below). We're displaying 60 members on the page, so if we're doing two small queries per member displayed, that's 120 queries. I guess they're just adding up. Time to hit the code to find a more elegant solution to minimize queries... Interestingly enough I destroyed the second server, cloned the first server again, and now I'm only seeing an extra 2.5 seconds. If I develop a more elegant solution to minimize queries, I've got to be able to get that to negligible.

Reduce the number of MySQL queries!

I am in the process of converting a lot of small queries into fewer larger queries, and it is working. The site is now very fast when running MySQL from the other server.
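
To illustrate what "fewer, larger queries" can look like, here is a hedged sketch using Python's built-in sqlite3 module. The members and photos tables are hypothetical stand-ins for whatever per-member lookups the page does; the point is only the contrast between one query per member and one joined query. Over a network connection every query costs a round trip, which an in-memory database does not show, so the sketch counts queries instead of timing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos  (member_id INTEGER, url TEXT);
""")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)", [(i, "member%d" % i) for i in range(1, 61)]
)
conn.executemany(
    "INSERT INTO photos VALUES (?, ?)", [(i, "p%d.jpg" % i) for i in range(1, 61)]
)


def photos_one_by_one():
    """One query for the member list, then one more query per member."""
    result, queries = {}, 1
    for (member_id,) in conn.execute("SELECT id FROM members").fetchall():
        row = conn.execute(
            "SELECT url FROM photos WHERE member_id = ?", (member_id,)
        ).fetchone()
        result[member_id] = row[0] if row else None
        queries += 1
    return result, queries


def photos_batched():
    """The same data fetched with a single JOIN."""
    rows = conn.execute(
        "SELECT m.id, p.url FROM members m"
        " LEFT JOIN photos p ON p.member_id = m.id"
    ).fetchall()
    return dict(rows), 1


slow, slow_queries = photos_one_by_one()
fast, fast_queries = photos_batched()
print(slow_queries, fast_queries, slow == fast)
```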

Average posts per hour on MySQL?

8 votes

I have a number of posts saved in an InnoDB table on MySQL. The table has the columns "id", "date", "user", "content". I wanted to make some statistics graphs, so I ended up using the following query to get the number of posts per hour for yesterday:

SELECT HOUR(FROM_UNIXTIME(`date`)) AS `hour`, COUNT(date)  from fb_posts 
WHERE DATE(FROM_UNIXTIME(`date`)) = CURDATE() - INTERVAL 1 DAY GROUP BY hour

This outputs the following data:

[Image: table data]

I can edit this query to get any day I want. But what I want now is the AVERAGE of each hour of every day, so that if on Day 1 at 00 hours I have 20 posts and on Day 2 at 00 hours I have 40, I want the output to be "30". I'd like to be able to pick date periods as well if it's possible.

Thanks in advance!

You can use a sub-query to group the data by day/hour, then take the average by hour across the sub-query.

Here's an example to give you the average count by hour for the past 7 days:

select the_hour,avg(the_count)
from
(
  select date(from_unixtime(`date`)) as the_day,
    hour(from_unixtime(`date`)) as the_hour, 
    count(*) as the_count
  from fb_posts
  where `date` >= unix_timestamp(current_date() - interval 7 day)
  and `date` < unix_timestamp(current_date())
  group by the_day,the_hour
) s
group by the_hour
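
Here is a small runnable sketch of the same two-level grouping, using Python's built-in sqlite3 module. SQLite has no FROM_UNIXTIME, so date(..., 'unixepoch') and strftime('%H', ..., 'unixepoch') are used instead; the sample posts and the ts helper are made up. Like the query above, it averages only over days that have at least one post in the given hour.

```python
import calendar
import sqlite3


def ts(year, month, day, hour):
    """Unix timestamp (UTC) for the start of the given hour."""
    return calendar.timegm((year, month, day, hour, 0, 0))


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE fb_posts (id INTEGER PRIMARY KEY, "date" INTEGER)')

# Made-up sample: 2 posts on day 1 at 00h, 4 posts on day 2 at 00h, 1 post on day 1 at 05h.
sample = (
    [ts(2012, 5, 1, 0) + 60 * i for i in range(2)]
    + [ts(2012, 5, 2, 0) + 60 * i for i in range(4)]
    + [ts(2012, 5, 1, 5)]
)
conn.executemany('INSERT INTO fb_posts ("date") VALUES (?)', [(t,) for t in sample])


def avg_posts_per_hour(start_ts, end_ts):
    """Average number of posts per hour of day within [start_ts, end_ts)."""
    sql = """
        SELECT the_hour, AVG(the_count)
          FROM (
                SELECT date("date", 'unixepoch') AS the_day,
                       CAST(strftime('%H', "date", 'unixepoch') AS INTEGER) AS the_hour,
                       COUNT(*) AS the_count
                  FROM fb_posts
                 WHERE "date" >= ? AND "date" < ?
                 GROUP BY the_day, the_hour
               ) s
         GROUP BY the_hour
         ORDER BY the_hour
    """
    return conn.execute(sql, (start_ts, end_ts)).fetchall()


averages = avg_posts_per_hour(ts(2012, 5, 1, 0), ts(2012, 5, 3, 0))
print(averages)
```

Passing the period boundaries as parameters is one way to get the "pick date periods" behaviour asked for in the question.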

How can I pass an array of PDO parameters yet still specify their types?

7 votes

$sql = "SELECT * FROM table WHERE id LIKE CONCAT('%', :id, '%')
LIMIT :limit1, :limit2";

I want to still use the array input like this:

$stmt->execute($array);

Otherwise I cannot reuse the same method for executing my queries.

At the same time, :limit1 and :limit2 don't work unless they are bound like this:

$stmt->bindParam(':limit1', $limit1, PDO::PARAM_INT);

I tried to do both but it doesn't execute with the bindParams:

$stmt->bindParam(':limit2', $limit2, PDO::PARAM_INT);
$stmt->execute($array);

What is the way around it?

I thought I could extend PDOStatement and add a new method "bindLimit" or something but I can't figure out what internal method PDO uses to bind parameters to a variable.

If you turn off PDO::ATTR_EMULATE_PREPARES, it will work. I just found out that this setting is on by default for MySQL, which means you never actually use prepared statements: PHP internally builds dynamic SQL for you, quoting the values and replacing the placeholders. Because every value passed to execute() is quoted as a string, the LIMIT arguments end up quoted too, which MySQL rejects.

$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $pdo->prepare($sql);
$stmt->execute(array(5)); //works!

The prepares are emulated by default for performance reasons.

See as well PDO MySQL: Use PDO::ATTR_EMULATE_PREPARES or not?

Pairing content on an external website with entries in an mySQL database

7 votes

tl;dr: I'm looking for a way to find entries in our database which are missing information, getting that information from a website and adding it to the database entry.


We have a media management program which uses a MySQL table to store the information. When employees download media (video files, images, audio files) and import it into the media manager, they are supposed to also copy the description of the media (from the source website) and add it to the description in the Media Manager. However, this has not been done for thousands of files.

The file name (eg. file123.mov) is unique and the details page for that file can be accessed by going to a URL on the source website:

website.com/content/file123

The information we want to scrape from that page has an element ID which is always the same.

In my mind the process would be:

  1. Connect to database and Load table
  2. Filter: "format" is "Still Image (JPEG)"
  3. Filter: "description" is "NULL"
  4. Get first result
  5. Get "FILENAME" (without extension)
  6. Load the URL: website.com/content/FILENAME
  7. Copy contents of the element "description" (on website)
  8. Paste contents into the "description" (SQL entry)
  9. Get 2nd result
  10. Rinse and repeat until last result is reached

My question(s) are:

  1. Is there software that could perform such a task or is this something that would need to be scripted?
  2. If scripted, what would be the best type of script (eg could I achieve this using AppleScript or would it need to be made in java or php etc.)

I too am not aware of any existing software packages that will do everything you're looking for. However, Python can connect to your database, make web requests easily, and handle dirty HTML. Assuming you already have Python installed, you'll need three packages: MySQLdb (to connect to MySQL), requests (to make the web requests), and BeautifulSoup (to parse the HTML).

You can install these packages with pip commands or Windows installers. Appropriate instructions are on each site. The whole process won't take more than 10 minutes.

import MySQLdb as db
import os.path
import requests
from bs4 import BeautifulSoup

# Connect to the database. Fill in these fields as necessary.

con = db.connect(host='hostname', user='username', passwd='password',
                 db='dbname')

# Create and execute our SELECT sql statement.

select = con.cursor()
select.execute('SELECT filename FROM table_name \
                WHERE format = %s AND description IS NULL',
               ('Still Image (JPEG)',))

while True:
    # Fetch a row from the result of the SELECT statement.

    row = select.fetchone()
    if row is None: break

    # Use Python's built-in os.path.splitext to split the extension
    # and get the url_name.

    filename = row[0]
    url_name = os.path.splitext(filename)[0]
    url = 'http://www.website.com/content/' + url_name

    # Make the web request. You may want to rate-limit your requests
    # so that the website doesn't get angry. You can slow down the
    # rate by inserting a pause with:
    #               
    # import time   # You can put this at the top with other imports
    # time.sleep(1) # This will wait 1 second.

    response = requests.get(url)
    if response.status_code != 200:

        # Don't worry about skipped urls. Just re-run this script
        # on spurious or network-related errors.

        print('Error accessing: %s SKIPPING' % url)
        continue

    # Parse the result. BeautifulSoup does a great job handling
    # mal-formed input.

    soup = BeautifulSoup(response.content, 'html.parser')
    description = soup.find('div', {'id': 'description'}).get_text().strip()

    # And finally, update the database with another query.

    update = con.cursor()
    update.execute('UPDATE table_name SET description = %s \
                    WHERE filename = %s',
                   (description, filename))

    # MySQLdb does not autocommit by default, so save each update.
    con.commit()

I'll warn that I've made a good effort to make that code "look right" but I haven't actually tested it. You'll need to fill in the private details.

Cascade delete of data table and search table at same time

6 votes

I'm storing file system hierarchies of directories and files.

In an innodb table, I store the details of each directory/file and maintain the parent child relationship with a foreign key constraint that will cascade on delete.

A myisam table is used to search these directories/files with a full text search. It contains the names and id's of each row.

Any rows in the data table (innodb table) will have a corresponding row in the search table (myisam table) and adding or removing rows from the data table must be reflected in the search table.

I'm trying to find the best solution to maintain data consistency between the two tables when deleting a parent directory. The innodb table is fine. I delete the parent, the delete cascades through the children until they are all deleted. Deleting the corresponding rows from the myisam table is more difficult.

My first thought was to use an on-delete trigger on the innodb table. When a row is deleted, it deletes the corresponding row from myisam table. However, since MySQL does not activate triggers during a cascade delete (a known bug for 7 years that was fixed by mentioning the lack of support in the manual), that is not an option.

My second thought was put a parent child relationship in the search table, but it is a myisam table to support the full text search functionality, and so it does not support foreign key constraints.

I had heard that InnoDB now supports full-text search, so I thought maybe I could change the search table's engine, but it's only available in the lab release.

My last thought was to abandon foreign key constraints and use only triggers to maintain data consistency. On delete, delete from both innodb and myisam table where parent = OLD.id. However, to prevent endless loops that could corrupt all data in the table, MySQL does not support manipulating the data in the same table that activated the trigger.

I have resorted to programmatically retrieving all children under the parent directory through a loop of requests, however, I feel there has got to be a better option. Is there any other work around that would be more efficient? At this point, the only two options I can think of are waiting for one of the above approaches to be fixed or changing to a different RDBMS like PostgreSQL that does support firing triggers from a cascade delete.

Any other ideas would be greatly appreciated.

These sorts of headaches are exactly the thing that made me move away from MySQL where possible.

... I feel there has got to be a better option ...

Sadly there isn't. The simple problem is that you can't use a cascading delete and have MySQL know what it just deleted. Therefore your only option is to find out what it's about to delete before it does (this is the algorithm you suggested at the end).

Since cascading will break your data, you should not use an ON DELETE CASCADE key, so that attempting to delete a parent directory without deleting its children will fail.

I would advise that you create a procedure to do the heavy lifting (deleting) for you. This will prevent a large amount of I/O between your app and the DB as it recurses through all the directories. It will also provide common code for doing so if you ever access the same DB through a different app (or you just want to do something manually).

As I stated first, I use postgresql mostly these days. This is one example of why.
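
For what "find out what it's about to delete before it does" can look like, here is a rough application-side sketch using Python's built-in sqlite3 module. It is not the stored procedure recommended above, only an illustration of the same algorithm with one query per directory level rather than one per directory. The table names files and files_search, their columns, and the delete_tree helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files        (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER);
    CREATE TABLE files_search (id INTEGER PRIMARY KEY, name TEXT);
""")
# Made-up tree: 1 -> (2, 3), 2 -> 4, and an unrelated root 5.
tree = [(1, "root", None), (2, "docs", 1), (3, "pics", 1), (4, "a.txt", 2), (5, "other", None)]
conn.executemany("INSERT INTO files VALUES (?, ?, ?)", tree)
conn.executemany("INSERT INTO files_search VALUES (?, ?)", [(i, n) for i, n, _ in tree])


def delete_tree(root_id):
    """Collect root_id and all of its descendants, then delete them from both tables."""
    to_delete = [root_id]
    frontier = [root_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            "SELECT id FROM files WHERE parent IN (%s) ORDER BY id" % marks, frontier
        ).fetchall()
        frontier = [r[0] for r in rows]  # next level down
        to_delete.extend(frontier)
    with conn:  # one transaction for the whole subtree
        # Deepest levels first, so a parent never disappears before its children.
        for node_id in reversed(to_delete):
            conn.execute("DELETE FROM files_search WHERE id = ?", (node_id,))
            conn.execute("DELETE FROM files WHERE id = ?", (node_id,))
    return to_delete


deleted = delete_tree(1)
print(deleted)
```

In MySQL the second table is MyISAM and therefore not transactional, so the single-transaction guarantee shown here would only cover the InnoDB side.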

Shift values down the chain in a table

6 votes

Provided that I have the following result set from a mysql database table:

+----+------+-------+
| ID | type | value |
+----+------+-------+
|  8 |    A |  1435 |
|  9 |    B |  7348 | 
| 10 |    A |  1347 | 
| 11 |    A |  3478 | 
| 12 |    A |  4589 | 
| 13 |    B |  6789 |
+----+------+-------+

I would like to delete row ID 8 and push the values in the field 'value' down, in such a way that every row has the value of previous entry, but affecting only those where the field 'type' is the same as the row being deleted ('A' in this case).

That is to say, deleting row id 8 should eventually yield the following:

+----+------+-------+
| ID | type | value |
+----+------+-------+
|  - |    - |    -  | *
|  9 |    B |  7348 |   |
| 10 |    A |  1435 | * |
| 11 |    A |  1347 | * |
| 12 |    A |  3478 | * |
| 13 |    B |  6789 |   V
+----+------+-------+

ID 10 has inherited the value from ID 8, then ID 11 inherits from ID 10, and so on. Notice however how rows having type 'B' are unaffected.

So the question: Is there any way to perform this "shift" of values without having to query and update each row one by one? In an ideal world I would do one query to do shift and then another to delete the row, but I'm not quite sure if this is possible at all.

(Also, I would rather not use triggers, since I intend to encapsulate all the application logic within the application itself.)

SET @remove_id = 8;

SELECT ID, type, value FROM (
  SELECT   ID,
           type,
           CAST(IF(type <> @type OR ISNULL(@val), value, @val) AS UNSIGNED)
             AS value,
           @type := IF(ID   = @remove_id, type, @type),
           @val  := IF(type = @type, value, @val)
  FROM     my_table JOIN (SELECT @type := NULL, @val := NULL) AS z
  ORDER BY ID ASC
) AS t
WHERE ID <> @remove_id

See it on sqlfiddle.


UPDATE

I hadn't realised you actually wanted to update the underlying table. For that, you can use some slight hackery to effectively do the same thing in an UPDATE statement (one can't assign to user variables directly, so instead assign to a column the concatenation of its new value and a null string formed from taking the first 0 characters of the newly assigned user variable):

SET @remove_id = 8, @type = NULL, @val = NULL;

UPDATE my_table SET
  value = IF(
    type <> @type OR ISNULL(@val),
    value,
    CONCAT(@val, LEFT(@val := value, 0))
  ),
  type = CONCAT(type, LEFT(
    @type := IF(
      ID <> @remove_id,
      @type,
      CONCAT(type, LEFT(@val := value, 0))
    )
  , 0))
ORDER BY ID ASC;

DELETE FROM my_table WHERE ID = @remove_id;

See it on sqlfiddle.
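
If the user-variable logic is hard to follow, the same walk over the rows can be written out in plain Python. This is only an illustration of what the statements above compute, not a replacement for them; the shift_values function is hypothetical and the rows are the ones from the question.

```python
def shift_values(rows, remove_id):
    """Return rows (id, type, value) without remove_id, with values of the
    removed row's type shifted down by one position in ID order."""
    result = []
    target_type = None  # plays the role of @type
    carried = None      # plays the role of @val
    for row_id, row_type, value in sorted(rows):
        if row_id == remove_id:
            # Remember the type and value of the row being removed.
            target_type, carried = row_type, value
            continue
        if target_type is not None and row_type == target_type:
            # Hand the carried value to this row and carry its own value on.
            result.append((row_id, row_type, carried))
            carried = value
        else:
            result.append((row_id, row_type, value))
    return result


rows = [
    (8, "A", 1435), (9, "B", 7348), (10, "A", 1347),
    (11, "A", 3478), (12, "A", 4589), (13, "B", 6789),
]
shifted = shift_values(rows, 8)
for row in shifted:
    print(row)
```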

Best practice for storing usernames & password in MySQL Databases

5 votes

Possible Duplicate:
Secure hash and salt for PHP passwords

I am making a system that stores user credentials (email, username and password) in a MySQL database and have seen conflicting views on using encryption, salting and encryption types.

What are the best methods you would recommend? Encoding in MD5 or SHA1? Salting or not salting? Encrypting just the password or all 3 elements?

For the password hash, use PBKDF2; it's NIST approved. You should use a random, non-secret salt for each password and a nontrivial (over 1000) iteration count.

For the username and email, probably not worth encrypting.
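
As a minimal sketch of that advice, here is PBKDF2 with a per-password random salt using only Python's standard library (hashlib.pbkdf2_hmac). The choice of SHA-256, the 16-byte salt, the iteration count of 100,000 and the function names are all assumptions for illustration, not recommendations from the answer beyond "over 1000".

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # assumed value; the answer only asks for "over 1000"


def hash_password(password, iterations=ITERATIONS):
    """Return (salt, iterations, digest); store all three next to the user row."""
    salt = os.urandom(16)  # random, non-secret, unique per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(digest, expected_digest)


salt, iterations, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, iterations, digest))
print(verify_password("wrong guess", salt, iterations, digest))
```

Storing the iteration count with each hash makes it possible to raise the count later without invalidating existing passwords.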

Limiting integer data type field lengths

5 votes

I am trying to limit the number of numbers that an integer field can contain. For example, I want the field to contain a number no more than 5 long, so 99999 would be the highest valid entry.

Is this possible to do in MySQL? I have looked at the documentation but haven't found my answer.

Unfortunately, neither the CHECK constraint nor user-defined types are implemented in MySQL. Maybe this will change in future versions.

Until then you can use a trigger to correct the input if that is a way to go for you:

delimiter //
CREATE TRIGGER trigger_check BEFORE INSERT ON your_table
FOR EACH ROW 
BEGIN 
    IF NEW.NUM > 99999 THEN 
        SET NEW.NUM = 0; 
    END IF; 
END
//
delimiter ;