Best database questions in September 2011

Pizza & Food - database design

8 votes

I want to create a website that allow customer to order food from the website.

There are two food type:

  • regular food/drink (eg: burger, donner kebab, chip, coke, pepsi etc)
  • Pizza food (eg: Margherita Pizza, meat Pizza, etc)

If they select pizza from the list - they may need to select Base (thin crust, dep crust), Extras, and the pizza size/Option.

How to design the tables in this situation?

Note: Each item have 1 or more options. An option may have extra (1 or more) or without extra. If item is pizza type - then it may have Base (crust)

Screenshots prototype

See two screenshots I am trying to implement, I on the right path on the database design or what could have done better?

Pizza customize:

enter image description here

Beef Burger customize:

enter image description here

Extras functionality (Dropdown / tickboxes)

On the extras, sometime I need to add multiple extras for the dropdown instead tickboxes.. That mean customer can only choose 1 from 1, 2 or 3 dropdowns .. how do I implement like this into database? Example Below:

enter image description here

Here what I have came up with:

categories Table:

+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| cat_id   | int(11)      | NO   | PRI | NULL    | auto_increment |
| cat_name | varchar(100) | NO   |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+

items Table:

+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| item_id   | int(11)      | NO   | PRI | NULL    | auto_increment |
| cat_id    | int(11)      | NO   |     | NULL    |                |
| item_name | varchar(100) | NO   |     | NULL    |                |
| item_type | int(11)      | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+

- item_type (0 = normal, 1 = pizza, 2 = set meal)

item_options Table:

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| option_id   | int(11)      | NO   | PRI | NULL    | auto_increment |
| item_id     | int(11)      | NO   |     | NULL    |                |
| option_name | varchar(100) | NO   |     | NULL    |                |
| price       | decimal(6,2) | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

item_extras Table: (Do you think should have separate tables for pizza toppin and extras?)

+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| extra_id  | int(11)      | NO   | PRI | NULL    | auto_increment |
| option_id | int(11)      | NO   |     | NULL    |                |
| name      | varchar(50)  | NO   |     | NULL    |                |
| cost      | decimal(6,2) | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+

item_pizza_base Table:

+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| base_id   | int(11)      | NO   | PRI | NULL    | auto_increment |
| option_id | int(11)      | NO   |     | NULL    |                |
| base_name | varchar(50)  | NO   |     | NULL    |                |
| cost      | decimal(6,2) | NO   |     | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+

SQL Result:

mysql> select * from categories;
+--------+----------+
| cat_id | cat_name |
+--------+----------+
|      1 | Pizzas   |
|      2 | Burgers  |
+--------+----------+

mysql> select * from items;
+---------+--------+------------------+-----------+
| item_id | cat_id | item_name        | item_type |
+---------+--------+------------------+-----------+
|       1 |      1 | Vegetarian Pizza |         1 |
|       2 |      2 | Beef Burger      |         0 |
+---------+--------+------------------+-----------+

mysql> select * from item_options;
+-----------+---------+-------------+-------+
| option_id | item_id | option_name | price |
+-----------+---------+-------------+-------+
|         1 |       1 | 12 Inches   |  5.60 |
|         2 |       1 | 14 Inches   |  7.20 |
|         3 |       2 | 1/4lb       |  1.80 |
|         4 |       2 | 1/2lb       |  2.50 |
+-----------+---------+-------------+-------+

mysql> select * from item_extras;
+----------+-----------+-----------+------+
| extra_id | option_id | name      | cost |
+----------+-----------+-----------+------+
|        1 |         1 | Mushroom  | 1.00 |
|        2 |         1 | Pepperoni | 1.00 |
|        3 |         2 | Mushroom  | 1.00 |
|        4 |         2 | Pepperoni | 1.00 |
|        5 |         3 | Chips     | 0.50 |
|        6 |         4 | Chips     | 0.50 |
+----------+-----------+-----------+------+

As you can see extras from burger and pizza in 1 table.. should it be separated?

mysql> select * from item_pizza_base;
+---------+-----------+------------+------+
| base_id | option_id | base_name  | cost |
+---------+-----------+------------+------+
|       1 |         1 | Thin Crust | 0.00 |
|       2 |         1 | Deep Crust | 0.00 |
|       3 |         2 | Thin Crust | 0.00 |
|       4 |         2 | Deep Crust | 0.00 |
+---------+-----------+------------+------+

keep in mind, price extras for each item is not always the same. For example: Pizza size 10" will cost 1.00 for each extra but 0.50 for 12" pizzas. Also there will be a case for each pizza will have different cost of extras.

Is the database design correct or what could be improved?

Im stuck with Extras functionality - how to design table and fields for the dropdown extras? See my question "Extras functionality (Dropdown / tickboxes)"

If you need to make a dropdown, put the values for the dropdown in a table.
It's OK to put extra values in there, as long as you can easily separate out the values you need in the dropdown.

Example

table option_labels
-------------------
id    integer auto_increment PK
name  varchar(40)

table toppings
--------------
id               integer  auto_increment PK
option_label_id  integer  foreign key references option_labels(id)
item_id          integer  foreign key references items(item_id)
price            decimal(10,2)

As soon as you know the item, you can populate the dropdown using:

SELECT ol.name, t.price FROM toppings t
INNER JOIN option_labels ol ON (t.option_label_id = ol.id)
WHERE t.item_id = '$item_id'

Normalize those tables
This table has a major flaw:

mysql> select * from item_extras;
+----------+-----------+-----------+------+
| extra_id | option_id | name      | cost |
+----------+-----------+-----------+------+
|        1 |         1 | Mushroom  | 1.00 |
|        2 |         1 | Pepperoni | 1.00 |
|        3 |         2 | Mushroom  | 1.00 |
|        4 |         2 | Pepperoni | 1.00 |
|        5 |         3 | Chips     | 0.50 |
|        6 |         4 | Chips     | 0.50 |
+----------+-----------+-----------+------+

It's not normalized, put the names in a separate labels table, just like in the example above. If a row (excluding the id) is not unique then your data is not normalized and you're doing it wrong.

Because you are using a lot of joins, it's IMHO best to use InnoDB, it has some cool features to speed up joins that use PK's a lot.

No matter what people say
Don't denormalize until slowness starts.

Use indexes
Put an index on all fields named *_id.
Also put an index on alls fields that are used in where clauses often.
Do not put indexes on fields with low cardinality, so no index on a boolean or enum field!
Indexes slow down inserts and speed up selects.

processing a large number of database entries with paging slows down with time

7 votes

I am trying to process millions of records from my table (size is about 30 GB) and I am currently doing it using paging (mysql 5.1.36). The query I use in my for loop is

select blobCol from large_table 
where name= 'someKey' and city= 'otherKey' 
order by name
LIMIT <pageNumber*pageSize>, <pageSize>

This works perfectly fine for about 500K records. I have a page size of 5000 that I am using and after page 100, the queries start slowing down dramatically. The first ~80 pages are extracted in a 2-3 seconds but after around page 130, each page takes about 30 seconds to retrieve, at least until page 200. One of my queries has about 900 pages and that would take too long.

The table structure is (type is MyISAM)
    name char(11)
    id int // col1 & col2 is a composite key
    city varchar(80) // indexed
    blobCol longblob

what can i do to speed it up? The explain for the query shows this

select_type: SIMPLE
possible_keys: city
key : city
type: ref
key_len: 242
ref: const
rows: 4293720
Extra: using where; using filesort

In case it helps, the my.cnf for my server (24 GB ram, 2 quad core procs) has these entries

  key_buffer_size = 6144M
  max_connections = 20
  max_allowed_packet = 32M
  table_open_cache = 1024
  sort_buffer_size = 256M
  read_buffer_size = 128M
  read_rnd_buffer_size = 512M
  myisam_sort_buffer_size = 128M
  thread_cache_size = 16
  tmp_table_size = 128M
  max_heap_table_size = 64M

Here is what I did, and reduced the total execution time by a factor of 10.

What I realized form the execution plan of my original query was that it was using filesort for sorting all results and ignoring the indexes. That is a bit of a waste.

My test database: 5 M records, 20 GB size. table structure same as in the question

Instead of getting blobCol directly in the first query, i first get the value of 'name' for beginning of every page. Run this query indefinitely until it returns 0 results. Every time, add the result to a list

SELECT name
FROM my_table
where id = <anyId> // I use the id column for partitioning so I need this here
order by name
limit <pageSize * pageNumber>, 1

Sine page number is not previously known, start with value 0 and keep incrementing until the query returns null. You can also do a select count(*) but that itself might take long and will not help optimize anything. Each query took about 2 seconds to run once the page number exceeded ~60.

For me, the page size was 5000 so I got a list of 'name' strings at position 0, 5001, 10001, 15001 and so on. The number of pages turned out to be 1000 and storing a list of 1000 results in memory is not expensive.

Now, iterate through the list and run this query

SELECT blobCol
FROM my_table
where name >= <pageHeader>
and name < <nextPageHeader>
and city="<any string>"
and id= 1

This will run N times, where N = size of list obtained previously. Since 'name' is the primary key col, and 'city' is also indexed, EXPLAIN shows that this calculation is performed in memory using the index.

Now, each query takes 1 second to run, instead of the original 30-40. So combining the pre-processing time of 2 seconds per page, total time per page is 3-4 seconds instead of 30-40.

If anyone has a better solution or if there is something glaringly wrong with this one, please let me know

Are there plans to support "type providers" for Scala's SIQ (ScalaIntegratedQuery) like in F#?

7 votes

The current state of SIQ was presented by Christopher Vogt at ScalaDays 2011. It was shown how queries would work and look like, but as far as I remember there was no notion about how those types would be represented, e. g. if it is still necessary to write boilerplate code to explain the database structure to Scala.

F# 3.0 adds type providers (PDC talk by Don Syme: video; GOTO Copenhagen talk by Tomas Petricek: video, slides, blog post), which make it basically unnecessary to manually write or generate mappings between a DB (or another data provider) and the language/type system, because the language can query structural information from the database itself directly with type providers.

Will ScalaIntegratedQuery be able to provide the types/relationships of a database to the Scala compiler without requiring the user to write boilerplate code themselves?

The answer is YES. We will offer an automatic solution, without manual boilerplate. The current SIQ prototype uses code-generation, but this aspect is not very developed yet, which is why I did not show it at Scala Days. Work on SIQ resumes, when Stefan Zeiger and I start our jobs in Lausanne on 3rd of October. We will review F# type providers for further inspiration. Thank you for the hint!

How to universally skip database touches when precompiling assets on Heroku

7 votes

I'm deploying a Rails 3.1 app to Heroku's Cedar stack. With Heroku Cedar and Rails 3.1, you can compile the assets yourself locally, let Heroku compile them when you push (during "slug compilation"), or have them be compiled just-in-time while the app is running. I want to do the middle option, letting Heroku precompile the assets.

When Heroku runs the assets:precompile task, it errors with "could not connect to server" because the app is trying to connect to the database but no database is available at that stage of slug compilation. The lack of database connection is expected and unavoidable at this point. I'm looking for a way to move past it, because a database connection isn't crucial to asset precompilation.

The part of my app that's trying to connect to the database is Devise. There's a devise_for :users line in routes.rb that wants to look at the User model.

I could just write a rake task that stubs out devise_for and make it a prereq of assets:precompile. I think that would solve my problem, but I'm looking for a more universal solution that I could use on any Rails 3.1 app with this problem on Heroku.

Is there anything out there, or can you conceive of anything that silences database connection errors while still running the app enough to have route and asset path generation?

Obviously if an app needs to read/write data during startup, we can't stub that, but can we fake every ActiveRecord model automatically?

Not quite a universal stubbing but devise has added a check now to fix this particular problem . See the issue and fix on Github. By providing a RAILS_ASSETS_PRECOMPILE environment config devise should skip building the routes

Oracle - How to create a materialized view with FAST REFRESH and JOINS

7 votes

So I'm pretty sure Oracle supports this, so I have no idea what I'm doing wrong. This code works:

CREATE MATERIALIZED VIEW MV_Test
  NOLOGGING
  CACHE
  BUILD IMMEDIATE 
  REFRESH FAST ON COMMIT 
  AS
    SELECT V.* FROM TPM_PROJECTVERSION V;

If I add in a JOIN, it breaks:

CREATE MATERIALIZED VIEW MV_Test
  NOLOGGING
  CACHE
  BUILD IMMEDIATE 
  REFRESH FAST ON COMMIT 
  AS
    SELECT V.*, P.* FROM TPM_PROJECTVERSION V
    INNER JOIN TPM_PROJECT P ON P.PROJECTID = V.PROJECTID

Now I get the error:

ORA-12054: cannot set the ON COMMIT refresh attribute for the materialized view

I've created materialized view logs on both TPM_PROJECT and TPM_PROJECTVERSION. TPM_PROJECT has a primary key of PROJECTID and TPM_PROJECTVERSION has a compound primary key of (PROJECTID,VERSIONID). What's the trick to this? I've been digging through Oracle manuals to no avail. Thanks!

To start with, from the Oracle Database Data Warehousing Guide:

Restrictions on Fast Refresh on Materialized Views with Joins Only

...

  • Rowids of all the tables in the FROM list must appear in the SELECT list of the query.

This means that your statement will need to look something like this:

CREATE MATERIALIZED VIEW MV_Test
  NOLOGGING
  CACHE
  BUILD IMMEDIATE 
  REFRESH FAST ON COMMIT 
  AS
    SELECT V.*, P.*, V.ROWID as V_ROWID, P.ROWID as P_ROWID 
    FROM TPM_PROJECTVERSION V,
         TPM_PROJECT P 
    WHERE P.PROJECTID = V.PROJECTID

Another key aspect to note is that your materialized view logs must be created as with rowid.

Below is a functional test scenario:

CREATE TABLE foo(foo NUMBER, CONSTRAINT foo_pk PRIMARY KEY(foo));

CREATE MATERIALIZED VIEW LOG ON foo WITH ROWID;

CREATE TABLE bar(foo NUMBER, bar NUMBER, CONSTRAINT bar_pk PRIMARY KEY(foo, bar));

CREATE MATERIALIZED VIEW LOG ON bar WITH ROWID;

CREATE MATERIALIZED VIEW foo_bar
  NOLOGGING
  CACHE
  BUILD IMMEDIATE
  REFRESH FAST ON COMMIT  AS SELECT foo.foo, 
                                    bar.bar, 
                                    foo.ROWID AS foo_rowid, 
                                    bar.ROWID AS bar_rowid 
                               FROM foo, bar
                              WHERE foo.foo = bar.foo;

Oracle combine multiple rows into one with distinct title

6 votes

I have this statement that combines multiple rows into one and then outputs it in the way I need the data:

 SELECT COURSES_ID, REQUISITE_TYPE_TITLE
            , RTRIM
           ( xmlagg (xmlelement (c, CONDITION_TITLE || '' || REQ_TEXT || ''  ) order by   ORDER_NUM).extract ('//text()')
           , ',' ) AS REQTexts
    FROM   COS_REQUISITES
    WHERE COURSES_ID = '1175'
    AND REQUISITE_TYPE_TITLE !=  'Corequisite'
    GROUP BY COURSES_ID, REQUISITE_TYPE_TITLE;

Result:

╔═══════════╦════════════╦═════════════════════════════════════╦═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ ORDER_NUM ║ COURSES_ID ║        REQUISITE_TYPE_TITLE         ║                                                                                                                                                           REQ_TEXT                                                                                                                                                            ║
╠═══════════╬════════════╬═════════════════════════════════════╬═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║         1 ║       1175 ║ Limitation on Enrollment            ║   With minimum grade of &quot;C&quot;                                                                                                                                                                                                                                                                                         ║
║         2 ║       1175 ║ Advisory on Recommended Preparation ║ MATH 200 or equivalent college course with &quot;C&quot; or better  or equivalent college course with &quot;C&quot; or better or MATH 205 or equivalent college course with &quot;C&quot; or better or   or equivalent college course with &quot;C&quot; or better  or equivalent college course with &quot;C&quot; or better ║
╚═══════════╩════════════╩═════════════════════════════════════╩═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Structure:

╔═══════════╦════════════╦═══════════╦═════════════════════════════════════╦══════════════════════════════════════════════════════════╗
║ ORDER_NUM ║ COURSES_ID ║ CONDITION ║        REQUISITE_TYPE_TITLE         ║                         REQ_TEXT                         ║
╠═══════════╬════════════╬═══════════╬═════════════════════════════════════╬══════════════════════════════════════════════════════════╣
║      1164 ║       1175 ║           ║ Advisory on Recommended Preparation ║ MATH 200 or equivalent college course with "C" or better ║
║      1165 ║       1175 ║           ║ Advisory on Recommended Preparation ║   or equivalent college course with "C" or better        ║
║      1166 ║       1175 ║  or       ║ Advisory on Recommended Preparation ║ MATH 205 or equivalent college course with "C" or better ║
║      1167 ║       1175 ║  or       ║ Advisory on Recommended Preparation ║   or equivalent college course with "C" or better        ║
║      1168 ║       1175 ║           ║ Advisory on Recommended Preparation ║   or equivalent college course with "C" or better        ║
║      1169 ║       1175 ║           ║ Limitation on Enrollment            ║   With minimum grade of "C"                              ║
╚═══════════╩════════════╩═══════════╩═════════════════════════════════════╩══════════════════════════════════════════════════════════╝

Wanted result:

╔═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ Advisory on Recommended Preparation MATH 200 or equivalent college course with &quot;C&quot; or better  or equivalent college course with &quot;C&quot; or better or MATH 205 or equivalent college course with &quot;C&quot; or better or equivalent college course with &quot;C&quot; or better  or equivalent college course with &quot;C&quot; or better Limitation on Enrollment With minimum grade of &quot;C&quot; ║
╚═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

I'm trying to avoid having to put this to a temp table or multiple views that would output this into one section.

I know it has bad data but I was told Garbage In, Garbage Out. Any help would be greatly appreciated.

Even with LISTAGG which doesn't allow for distinct as far as I can tell:

SELECT COURSES_ID, LISTAGG(REQUISITE_TYPE_TITLE || ' ' || CONDITION_TITLE || ' ' || REQ_TEXT, ' ') WITHIN GROUP (ORDER BY ORDER_NUM) AS Title FROM COS_REQUISITES WHERE COURSES_ID = '1175' GROUP BY COURSES_ID;

Result:

╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                                                                                                                                                                                                                                             TITLE                                                                                                                                                                                                                                                              ║
╠════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ Advisory on Recommended Preparation  MATH 200 or equivalent college course with "C" or better Advisory on Recommended Preparation    or equivalent college course with "C" or better Advisory on Recommended Preparation  or  MATH 205 or equivalent college course with "C" or better Advisory on Recommended Preparation  or    or equivalent college course with "C" or better Advisory on Recommended Preparation    or equivalent college course with "C" or better Limitation on Enrollment    With minimum grade of "C" ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Notice the REQUISITE_TYPE_TITLE repeats, I only want it to show up once like the wanted result. I can't modify the table to have them null due to this is a table being loaded dynamically and updated by someone else daily.

a bit messy but should yield what you asked for

SELECT 
COURSES_ID, 
RTRIM ( xmlagg (xmlelement (c, REQUISITE_TYPE_TITLE || '' || REQTexts || ''  ) order by   mino).extract ('//text()'), ',' ) AS REQTexts
FROM
(
SELECT 
COURSES_ID, REQUISITE_TYPE_TITLE, MIN (ORDER_NUM) mino, 
RTRIM ( xmlagg (xmlelement (c, CONDITION_TITLE || '' || REQ_TEXT || ''  ) order by   ORDER_NUM).extract ('//text()') , ',' ) AS REQTexts
FROM COS_REQUISITES
WHERE COURSES_ID = '1175'
AND REQUISITE_TYPE_TITLE !=  'Corequisite'
GROUP BY COURSES_ID, REQUISITE_TYPE_TITLE
) 
GROUP BY COURSES_ID;

Is it *really* worth to use integer over varchar for a set of data?

6 votes

For example if I have a table User, I want to store gender or sex, I'll add a column like sex.

Is it really worth to use an integer and then map it in my favorite programming language?

Like 1 => 'Male' and 2 => 'Female'

Is there any performance reason to do that?

Or could I safely use a varchar which more meaning with 'female' or 'male' almost like I was using mysql ENUM ?

Edit: I here and there that it is sometimes better, sometimes it doesn't matter, so I more looking for benchmark or something over a "it is better" answer.

I mean I think using varchar is actually more meaningfull than an integer, and I would use an integer only if performance are more than 0.3% or something.

Ortiginal Answer:
I would suggest storing it in a CHAR(1) column as M or F
It is expressive enough for the specific purpose AND has the speed benefit of being a single character comparison

Update 4 (fixed benchmark):
All previous benchmarks had a fatal flaw that one (the CHAR(1)) table was MyISAM and all other were InnoDB. So I recreated the database with all tables using the MyISAM and the results make much more sense now.

The error creeped in as I used the MySQLWorkbench's wizard to create the tables and forgot to change the database engine in the other tables and it defaulted to InnoDB (I have MySQL 5.5)

So the corrected results are as follows, (I have removed all my previous benchmarks as they were invalid) :

// select queries
$query['char'] = "select count(*) from test_table where gender = 'M'";
$query['char_lower'] = "select count(*) from test_table where LOWER(gender) = 'm'";
$query['varchar'] = "select count(*) from test_table_2 where gender = 'Male'";
$query['varchar_lower'] = "select count(*) from test_table_2 where LOWER(gender) = 'male'";
$query['tinyint'] = "select count(*) from test_table_3 where gender = 1";

// benchmark result
array
  'char' => float 0.35457420349121
  'char_lower' => float 0.44702696800232
  'varchar' => float 0.50844311714172
  'varchar_lower' => float 0.64412498474121
  'tinyint' => float 0.26296806335449

New Conclusion : TINYINT Is fastest. But my recommendation would be still yo use CHAR(1) as it would be easier for future developers to understand the database.

If you do use TINYINT, my recommendation would be name the column ismale instead of sex and store 0 => Female and 1 => male thus making it a little more easy to understand in raw database.

The table structure for benchmark is this:

CREATE TABLE `test_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `gender` char(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM

Only the type of the gender column is different in the 3 tables, the types are:

CHAR(1), VARCHAR(6), TINYINT

All 3 tables have 10000 entries.

Magic: The Gathering database design

6 votes

I would like to create a database for MTG cards I own. What would the design be? I would like to store the following information about each card:

1. Name of card.
2. Set the card belongs to.
3. Condition of card.
4. Price it sold for.
5. Price I bought it for.

Here is a little info about MTG cards in general:

1. Every card has a name. 
2. Every card belongs to a set.
3. A card may have a foil version of itself. 
4. Card name, set it belongs to, and whether it's foil or not makes it unique. 
5. A card may be in multiple sets.
6. A set has multiple cards. 

The gimmick is that in my collection I may have several copies of the same card but with different conditions, or purchased price, or sold price may be different.

I will have another collection of mtg cards that have been sold on eBay. This collection will have the price/condition/date/whether it was a "Buy It Now" or Bid, etc.

The idea is to find out what price I should sell my cards based on the eBay collection.

It's not a programming question, it's a modeling question. Anyone who is programming but not modeling, is a coder, not a programmer. That's just a step above data entry. Modeling is a fundamental aspect of programming as it deals directly with abstraction, and abstraction is the real genius of computer programming.

Normalization and database design is an excellent way for someone to become better at programming in general as normalization is also an abstraction process.

Abstraction is arguably the most difficult aspect of computer programming, particularly since computer programming requires a person to both be especially pedantic and literal (in order to properly work with the steadfast stubbornness and idiocy that is a computer) as well as handle and work in a very high level and abstract space.

For example, the arguments in design meetings are not over language syntax.

So, that said. I have updated the schema in minor ways to address the changes.

create table card (
    card_key numeric not null primary key,
    name varchar(256) not null,
    foil varchar(1) not null); -- "Y" if it's foil, "N" if it is not.

create table set (
    set_key numeric not null primary key,
    name varchar(256) not null);

create table cardset (
    card_key numeric not null references card(card_key),
    set_key numeric not null references set(set_key));

create table condition (
    condition_key numeric not null primary key,
    alias varchar(64),
    description varchar(256));

create table saletype (
    saletype_key numeric not null primary key,
    alias varchar(64),
    description varchar(256));

create table singlecard (
    singlecard_key numeric not null primary key,
    card_key numeric not null references card(card_key),
    condition_key numeric not  null references condition(condition_key),
    purchase_date date,
    purchase_price numeric,
    saletype_key numeric references saletype(saletype_key),
    sell_date date,
    sell_price numeric,
    notes varchar(4000));

A more detailed explanation.

The card table is the concept of the card vs an actual card. You can have a card row without having any actual cards in hand. It models any details of the card that are common to all cards. Obviously MTG cards have very many details (color text as some one mentioned), but these are likely not important to this kind of model, since this is to track actual cards for the sake of collecting and sale. But if there was any desire to add any other attributes, like card rarity, the 'card' table would be the place to put them.

The set table is for the sets. I don't know what a set is, only what is posited here (there is also casual reference to a series, I don't know if they are related or not). Sets have a name, and are used to group cards. So, we have a 'set' table.

The cardset table is the many-to-many joiner table. Since a set can have several cards, and a card can belong to several sets, the model needs something to represent that relationship. This is a very common pattern in relational databases, but it is also non-obvious to novices.

There are two simple lookup tables, the condition and saletype table. These two tables are here for normalization purposes and let the user standardize their terms for these two categories of data. They each have an 'alias' and a 'description'. The alias is the short english version: 'Good', 'Poor', 'Auction', 'Buy it now', while the description is the longer english text 'Poor cards show sign of wear, bending, and rub marks'. Obviously someone doing this for their own purpose likely do not need the description, but it's there as a habit.

Finally, the meat of the system is the singlecard table. The singlecard table represents an actual, in your hand card. It models all of the characteristic that make each actual card different from each other. An individual card is not a member of a set (at least not from the description), rather that's a higher level concept (such as how it was published -- all "Hero: Bartek the Axe Wielder" cards are part of the "Dark Mysteries" and "Clowns of Death" sets, or whatever). So, the single card needs only reference its parent card table, with the actual common card characteristics.

This single card has the references to the card's condition and how it was sold via the foreign keys to the appropriate tables. It also has the other data, such as the dates and prices that were necessary.

Based on what was given, this should meet the basic need.

It would be a good exercise to remodel this yourself. Start with the your most basic needs, and the best model that you understand to make. Then contrast it to what I've written here, and then use that book to perhaps try and understand how whatever your simple design may have been becomes this design.

Note that there is no way to actually enforce that a card is a member of ANY set, or that a set has any cards. That will be an application logic problem. This is the one of this issues with many-to-many joiner tables. It can model the relationship, but it can not enforce it.

Fastest Way to Count Distinct Values in a Column, Including NULL Values

5 votes

The Transact-Sql Count Distinct operation counts all non-null values in a column. I need to count the number of distinct values per column in a set of tables, including null values (so if there is a null in the column, the result should be (Select Count(Distinct COLNAME) From TABLE) + 1.

This is going to be repeated over every column in every table in the DB. Includes hundreds of tables, some of which have over 1M rows. Because this needs to be done over every single column, adding Indexes for every column is not a good option.

This will be done as part of an ASP.net site, so integration with code logic is also ok (i.e.: this doesn't have to be completed as part of one query, though if that can be done with good performance, then even better).

What is the most efficient way to do this?


Update After Testing

I tested the different methods from the answers given on a good representative table. The table has 3.2 million records, dozens of columns (a few with indexes, most without). One column has 3.2 million unique values. Other columns range from all Null (one value) to a max of 40K unique values. For each method I performed four tests (with multiple attempts at each, averaging the results): 20 columns at one time, 5 columns at one time, 1 column with many values (3.2M) and 1 column with a small number of values (167). Here are the results, in order of fastest to slowest

  1. Count/GroupBy (Cheran)
  2. CountDistinct+SubQuery (Ellis)
  3. dense_rank (Eriksson)
  4. Count+Max (Andriy)

Testing Results (in seconds):

   Method          20_Columns   5_Columns   1_Column (Large)   1_Column (Small)
1) Count/GroupBy      10.8          4.8            2.8               0.14       
2) CountDistinct      12.4          4.8            3                 0.7         
3) dense_rank        226           30              6                 4.33 
4) Count+Max          98.5         44             16                12.5        

Notes:

  • Interestingly enough, the two methods that were fastest (by far, with only a small difference in between then) were both methods that submitted separate queries for each column (and in the case of result #2, the query included a subquery, so there were really two queries submitted per column). Perhaps because the gains that would be achieved by limiting the number of table scans is small in comparison to the performance hit taken in terms of memory requirements (just a guess).
  • Though the dense_rank method is definitely the most elegant, it seems that it doesn't scale well (see the result for 20 columns, which is by far the worst of the four methods), and even on a small scale just cannot compete with the performance of Count.

Thanks for the help and suggestions!

SELECT COUNT(*)
FROM (SELECT ColumnName
      FROM TableName
      GROUP BY ColumnName) AS s;

GROUP BY selects distinct values including NULL. COUNT(*) will include NULLs, as opposed to COUNT(ColumnName), which ignores NULLs.

Custom Fields for a Form representing an object

5 votes

I have an architectural question concerning custom fields in a view for an object. Let's say you have a User Object with some basic information like firstname, lastname, ... that can be used by all customers.

Now, often we get a question from a customer to add couple of custom fields typical for their domain. Our solution now is an xml data column where key value pairs are stored. This has been ok so far, but now we'll have to find a more architectural solution.

For instance, now, a customer wants a dropdown where it can select the value for its custom field. We could still store the selected value in the xml data column, but where do we store all those dropdown values...

I know that in sharepoint you can also add custom fields like dropdowns and I was wondering how to deal with this best. I want to avoid creating custom tables for customers, or having a table with 90 columns (10 basic and then 10 for each customer), ...

You get the idea, it should be generic and be able to deal with all sorts of problems in the future.

What I was thinking about is a Table UserConfiguration where each record has a Foreign Key to the Customer (Channel in our database), then a column FieldName, a column FieldType and a column Values. The column values should be an xml type column, because for a dropdown, we'll need to add multiple values. Also, each value can have extra data attached to it (not just a name). The other problem then is how to store the selected value. I don't like the idea of having foreign keys to xml in my database (read somewhere that Azure can't handle this all to well). Do you just store the name of the value (what if the value were to disappear out of the xml?)?

Any documentation, links on this kind of problems would also be great. I'm trying to find a design pattern that deals with this kind of problem in the database.

I want to answer your question in two parts:
1) Implementing custom fields in a database server
2) Restricting custom fields to an enumeration of values


Although common solutions to 1) are discussed in the question referenced by @Simon, maybe you are looking for a bit of discussion on what the problem is and why it hasn't been solved for us already.

  • databases are great for structured, typed data
  • custom fields are inherently less structured
  • therefore, custom fields are more difficult to work with in a database
  • some or many of the advantages of using a database are lost
    • some queries may be more difficult or impossible
    • type safety may be lost (in the database)
    • data integrity may no longer be enforced (by the database)
    • it's a lot more work for the implementers and maintainers

As discussed in the other question, there's no perfect solution.
But these benefits/features still need to be implemented somewhere, and so often the application becomes responsible for data integrity and type safety.
For situations like these, people have created Object-Relation Mapping tools, although, as Jeff Atwood says, even using an ORM could create more problems than it solved. However, you mentioned that it 'should be generic and be able to deal with all sorts of problems in the future' -- this makes me think an ORM might be your best bet.

So, to sum up my answer, this is a known problem with known solutions, none of which are completely satisfactory (because it's so hard). Pick your poison.


To answer the second part of (what I think is) your question:
As mentioned in the linked question, you could implement Entity-Attribute-Value in your database for custom fields, and then add an extra table to hold the legal values for each entity. Then, the attribute/value of the EAV table is a foreign key into the attribute-value table.

For example,

CREATE TABLE `attribute_value` ( -- enumerations go in this table
    `attribute` varchar(30), 
    `value` varchar(30), 
    PRIMARY KEY (`attribute`, `value`)
);

CREATE TABLE `eav` ( -- now the values of attributes are restricted
    `entityid` int, 
    `attribute` varchar(30), 
    `value` varchar(30), 
    PRIMARY KEY (`entityid`, `attribute`), 
    FOREIGN KEY (`attribute`, `value`) REFERENCES `attribute_value`(`attribute`, `value`)
);

Of course, this solution isn't perfect or complete -- it's only supposed to illustrate the idea. For instance, it uses varchars, and lacks a type column. Also, who gets to decide what the possible values for each attribute are? Can these be changed at any time by the user?

Oracle - Why is SELECT * FROM Foo; so slow?

5 votes

I'm working on some Oracle performance issues with our web app. One thing I've noticed that's seems to obfuscate any sort of tests is that simple queries that return a lot of results are still very slow. One example is:

select * from TPM_PROJECTWORKGROUPS;

When I run it, I get:

 5825 record(s) selected [Fetch MetaData: 0ms] [Fetch Data: 59s] 

 [Executed: 9/22/2011 1:52:38 PM] [Execution: 203ms] 

If I understand this correctly, it means the actual query took 203ms to run but it took 59 seconds for that data to be returned to the client, is that was "Fetch" means in this case?

I don't have access to connect to the database machine directly and run the query locally, but is it safe to assume that the culprit is the actual network bandwidth itself? This makes sense since I'm in Seattle and the server is in New York, but still a minute for 5800 rows seems like pretty slow through-put.

Is there any quick advice for a) confirming that network bandwidth is indeed the problem and b) any "gotchas" or things to check for why serializing the data over the wire is so slow? Thanks!

A Few Updates Based On Comments:

SELECT COUNT(*) FROM (select * from TPM_PROJECTWORKGROUPS) t;

Results:

 1 record(s) selected [Fetch MetaData: 0ms] [Fetch Data: 0ms] 

 [Executed: 9/22/2011 2:16:08 PM] [Execution: 219ms] 

And if I try selecting only one column:

SELECT PROJECTID FROM TPM_PROJECTWORKGROUPS;

Results:

5825 record(s) selected [Fetch MetaData: 0ms] [Fetch Data: 1m 0s]

[Executed: 9/22/2011 2:17:20 PM] [Execution: 203ms]

Table schema:

PROJECTID (NUMBER) WORKGROUPID (NUMBER)

What API are you using to interact with the database (SQL*Plus, JDBC, ODBC, etc.)? Any API will have some function that specifies how many rows (or how much data) to fetch in a single network round-trip. For example, in SQL*Plus, it's set arraysize N. In JDBC, it's setFetchSize. Other APIs will have similar functions. If you're on a WAN, you generally want to minimize how chatty your application is by increasing the number of rows fetched with each network round trip.

Along the same lines, you'll probably benefit from moving less data over the network and pushing more logic to the server. Do you actually display a grid with 5800 rows of data to the user? Or do you do fetch that data and then perform some processing in the application (i.e. order the data and display the first 100 rows)? If you can push that processing to the database and reduce the amount of data that has to be transmitted over the database, you'll be much better off.

Oracle has options to configure the SDU and TDU as well as a few other networking parameters in SQL*Net. I wouldn't start looking at those options, though, until you've optimized the fetch size and ensured that you're pulling back the least amount of data possible.

Possible hacking attempt. How to tell if my db has been compromised

5 votes

I have the following in my log file with seconds apart. I'm assuming something was trying to find my database or an admin page or something, but i'm not sure.

Should I be worried about this and how can I tell if my db has been compromised?

ERROR - 2011-09-23 20:51:42 --> 404 Page Not Found --> muieblackcat
ERROR - 2011-09-23 20:51:46 --> 404 Page Not Found --> PMA
ERROR - 2011-09-23 20:51:46 --> 404 Page Not Found --> admin
ERROR - 2011-09-23 20:51:47 --> 404 Page Not Found --> dbadmin
ERROR - 2011-09-23 20:51:48 --> 404 Page Not Found --> mysql
ERROR - 2011-09-23 20:51:48 --> 404 Page Not Found --> myadmin
ERROR - 2011-09-23 20:51:48 --> 404 Page Not Found --> phpmyadmin2
ERROR - 2011-09-23 20:51:49 --> 404 Page Not Found --> phpMyAdmin2
ERROR - 2011-09-23 20:51:49 --> 404 Page Not Found --> phpMyAdmin-2
ERROR - 2011-09-23 20:51:50 --> 404 Page Not Found --> php-my-admin
ERROR - 2011-09-23 20:51:50 --> 404 Page Not Found --> phpMyAdmin-2.2.3
ERROR - 2011-09-23 20:51:51 --> 404 Page Not Found --> phpMyAdmin-2.2.6
ERROR - 2011-09-23 20:51:52 --> 404 Page Not Found --> phpMyAdmin-2.5.1
ERROR - 2011-09-23 20:51:52 --> 404 Page Not Found --> phpMyAdmin-2.5.4
ERROR - 2011-09-23 20:51:53 --> 404 Page Not Found --> phpMyAdmin-2.5.5-rc1
ERROR - 2011-09-23 20:51:53 --> 404 Page Not Found --> phpMyAdmin-2.5.5-rc2
ERROR - 2011-09-23 20:51:54 --> 404 Page Not Found --> phpMyAdmin-2.5.5
ERROR - 2011-09-23 20:51:54 --> 404 Page Not Found --> phpMyAdmin-2.5.5-pl1
ERROR - 2011-09-23 20:51:55 --> 404 Page Not Found --> phpMyAdmin-2.5.6-rc1
ERROR - 2011-09-23 20:51:58 --> 404 Page Not Found --> phpMyAdmin-2.5.6
ERROR - 2011-09-23 20:51:59 --> 404 Page Not Found --> phpMyAdmin-2.5.7
ERROR - 2011-09-23 20:51:59 --> 404 Page Not Found --> phpMyAdmin-2.5.7-pl1
ERROR - 2011-09-23 20:52:00 --> 404 Page Not Found --> phpMyAdmin-2.6.0-alpha
ERROR - 2011-09-23 20:52:00 --> 404 Page Not Found --> phpMyAdmin-2.6.0-alpha2
ERROR - 2011-09-23 20:52:04 --> 404 Page Not Found --> phpMyAdmin-2.6.0-beta2
ERROR - 2011-09-23 20:52:04 --> 404 Page Not Found --> phpMyAdmin-2.6.0-rc1
ERROR - 2011-09-23 20:52:05 --> 404 Page Not Found --> phpMyAdmin-2.6.0-rc2
ERROR - 2011-09-23 20:52:05 --> 404 Page Not Found --> phpMyAdmin-2.6.0-rc3
ERROR - 2011-09-23 20:52:09 --> 404 Page Not Found --> phpMyAdmin-2.6.0-pl1
ERROR - 2011-09-23 20:52:09 --> 404 Page Not Found --> phpMyAdmin-2.6.0-pl2
ERROR - 2011-09-23 20:52:10 --> 404 Page Not Found --> phpMyAdmin-2.6.0-pl3
ERROR - 2011-09-23 20:52:10 --> 404 Page Not Found --> phpMyAdmin-2.6.1-rc1
ERROR - 2011-09-23 20:52:11 --> 404 Page Not Found --> phpMyAdmin-2.6.1-rc2
ERROR - 2011-09-23 20:52:11 --> 404 Page Not Found --> phpMyAdmin-2.6.1
ERROR - 2011-09-23 20:52:15 --> 404 Page Not Found --> phpMyAdmin-2.6.1-pl2
ERROR - 2011-09-23 20:52:15 --> 404 Page Not Found --> phpMyAdmin-2.6.1-pl3
ERROR - 2011-09-23 20:52:16 --> 404 Page Not Found --> phpMyAdmin-2.6.2-rc1
ERROR - 2011-09-23 20:52:16 --> 404 Page Not Found --> phpMyAdmin-2.6.2-beta1
ERROR - 2011-09-23 20:52:17 --> 404 Page Not Found --> phpMyAdmin-2.6.2-rc1
ERROR - 2011-09-23 20:52:17 --> 404 Page Not Found --> phpMyAdmin-2.6.2
ERROR - 2011-09-23 20:52:18 --> 404 Page Not Found --> phpMyAdmin-2.6.2-pl1
ERROR - 2011-09-23 20:52:18 --> 404 Page Not Found --> phpMyAdmin-2.6.3
ERROR - 2011-09-23 20:52:19 --> 404 Page Not Found --> phpMyAdmin-2.6.3-rc1
ERROR - 2011-09-23 20:52:19 --> 404 Page Not Found --> phpMyAdmin-2.6.3
ERROR - 2011-09-23 20:52:20 --> 404 Page Not Found --> phpMyAdmin-2.6.3-pl1
ERROR - 2011-09-23 20:52:20 --> 404 Page Not Found --> phpMyAdmin-2.6.4-rc1
ERROR - 2011-09-23 20:52:21 --> 404 Page Not Found --> phpMyAdmin-2.6.4-pl1
ERROR - 2011-09-23 20:52:21 --> 404 Page Not Found --> phpMyAdmin-2.6.4-pl2
ERROR - 2011-09-23 20:52:22 --> 404 Page Not Found --> phpMyAdmin-2.6.4-pl3
ERROR - 2011-09-23 20:52:22 --> 404 Page Not Found --> phpMyAdmin-2.6.4-pl4
ERROR - 2011-09-23 20:52:23 --> 404 Page Not Found --> phpMyAdmin-2.6.4
ERROR - 2011-09-23 20:52:23 --> 404 Page Not Found --> phpMyAdmin-2.7.0-beta1
ERROR - 2011-09-23 20:52:24 --> 404 Page Not Found --> phpMyAdmin-2.7.0-rc1
ERROR - 2011-09-23 20:52:24 --> 404 Page Not Found --> phpMyAdmin-2.7.0-pl1
ERROR - 2011-09-23 20:52:25 --> 404 Page Not Found --> phpMyAdmin-2.7.0-pl2
ERROR - 2011-09-23 20:52:25 --> 404 Page Not Found --> phpMyAdmin-2.7.0
ERROR - 2011-09-23 20:52:26 --> 404 Page Not Found --> phpMyAdmin-2.8.0-beta1
ERROR - 2011-09-23 20:52:26 --> 404 Page Not Found --> phpMyAdmin-2.8.0-rc1
ERROR - 2011-09-23 20:52:27 --> 404 Page Not Found --> phpMyAdmin-2.8.0-rc2
ERROR - 2011-09-23 20:52:27 --> 404 Page Not Found --> phpMyAdmin-2.8.0
ERROR - 2011-09-23 20:52:28 --> 404 Page Not Found --> phpMyAdmin-2.8.0.1
ERROR - 2011-09-23 20:52:34 --> 404 Page Not Found --> phpMyAdmin-2.8.0.4
ERROR - 2011-09-23 20:52:35 --> 404 Page Not Found --> phpMyAdmin-2.8.1-rc1
ERROR - 2011-09-23 20:52:35 --> 404 Page Not Found --> phpMyAdmin-2.8.1
ERROR - 2011-09-23 20:52:36 --> 404 Page Not Found --> phpMyAdmin-2.8.2
ERROR - 2011-09-23 20:52:36 --> 404 Page Not Found --> sqlmanager
ERROR - 2011-09-23 20:52:38 --> 404 Page Not Found --> mysqlmanager
ERROR - 2011-09-23 20:52:38 --> 404 Page Not Found --> p
ERROR - 2011-09-23 20:52:39 --> 404 Page Not Found --> PMA2005
ERROR - 2011-09-23 20:52:39 --> 404 Page Not Found --> pma2005
ERROR - 2011-09-23 20:52:40 --> 404 Page Not Found --> phpmanager
ERROR - 2011-09-23 20:52:40 --> 404 Page Not Found --> php-myadmin
ERROR - 2011-09-23 20:52:41 --> 404 Page Not Found --> phpmy-admin
ERROR - 2011-09-23 20:52:41 --> 404 Page Not Found --> webadmin
ERROR - 2011-09-23 20:52:42 --> 404 Page Not Found --> sqlweb
ERROR - 2011-09-23 20:52:42 --> 404 Page Not Found --> websql
ERROR - 2011-09-23 20:52:42 --> 404 Page Not Found --> webdb
ERROR - 2011-09-23 20:52:43 --> 404 Page Not Found --> mysqladmin
ERROR - 2011-09-23 20:52:43 --> 404 Page Not Found --> mysql-admin
ERROR - 2011-09-23 20:52:50 --> 404 Page Not Found --> dbadmin
ERROR - 2011-09-23 20:52:50 --> 404 Page Not Found --> myadmin
ERROR - 2011-09-23 20:52:54 --> 404 Page Not Found --> mysqladmin
ERROR - 2011-09-23 20:52:54 --> 404 Page Not Found --> phpadmin
ERROR - 2011-09-23 20:52:55 --> 404 Page Not Found --> phpMyAdmin
ERROR - 2011-09-23 20:52:55 --> 404 Page Not Found --> phpmyadmin
ERROR - 2011-09-23 20:52:56 --> 404 Page Not Found --> phpmyadmin1
ERROR - 2011-09-23 20:52:56 --> 404 Page Not Found --> phpmyadmin2
ERROR - 2011-09-23 20:52:57 --> 404 Page Not Found --> pma
ERROR - 2011-09-23 20:52:57 --> 404 Page Not Found --> databaseadmin
ERROR - 2011-09-23 20:52:58 --> 404 Page Not Found --> admm
ERROR - 2011-09-23 20:52:58 --> 404 Page Not Found --> admn
ERROR - 2011-09-23 20:52:59 --> 404 Page Not Found --> _myadmin
ERROR - 2011-09-23 20:52:59 --> 404 Page Not Found --> phpMyA
ERROR - 2011-09-23 20:53:03 --> 404 Page Not Found --> admin
ERROR - 2011-09-23 20:53:04 --> 404 Page Not Found --> mysql2
ERROR - 2011-09-23 20:53:04 --> 404 Page Not Found --> phpmyadm
ERROR - 2011-09-23 20:53:05 --> 404 Page Not Found --> php1
ERROR - 2011-09-23 20:53:05 --> 404 Page Not Found --> php2
ERROR - 2011-09-23 20:53:09 --> 404 Page Not Found --> sqladm
ERROR - 2011-09-23 20:53:09 --> 404 Page Not Found --> myAdmin
ERROR - 2011-09-23 20:53:10 --> 404 Page Not Found --> pmabd
ERROR - 2011-09-23 20:53:10 --> 404 Page Not Found --> mydb
ERROR - 2011-09-23 20:53:11 --> 404 Page Not Found --> mysql_administrator
ERROR - 2011-09-23 20:53:11 --> 404 Page Not Found --> pma_mydb
ERROR - 2011-09-23 20:53:12 --> 404 Page Not Found --> webmail2
ERROR - 2011-09-23 20:53:12 --> 404 Page Not Found --> myphp
ERROR - 2011-09-23 20:53:16 --> 404 Page Not Found --> phpas
ERROR - 2011-09-23 20:53:16 --> 404 Page Not Found --> _pma
ERROR - 2011-09-23 20:53:17 --> 404 Page Not Found --> /scripts
ERROR - 2011-09-23 20:53:20 --> 404 Page Not Found --> _dbadmin
ERROR - 2011-09-23 20:53:24 --> 404 Page Not Found --> _admin
ERROR - 2011-09-23 20:53:27 --> 404 Page Not Found --> _phpMyAdmin
ERROR - 2011-09-23 20:53:34 --> 404 Page Not Found --> sql
ERROR - 2011-09-23 20:53:34 --> 404 Page Not Found --> _sql
ERROR - 2011-09-23 20:53:35 --> 404 Page Not Found --> my-php
ERROR - 2011-09-23 20:53:35 --> 404 Page Not Found --> My-php

Something (probably a bot) is scanning your web server for those pages, which do not exist since they are receiving 404 errors. The scanning is very common -- usually scripts are looking for vulnerabilities.

We can't tell if your database has been compromised. Although the log contents you posted do not indicate that you have been compromised, just scanned.

How to join multiple tables related by other tables

5 votes

I'm developing a site where people can publicate their houses for rent. I'm using php 5.2.0 and MySQL 5+

the publications are stored in a table like this

ta_publications
+---+-------------+------+
|id |    name     | date |
+---+-------------+------+
| 1 | name_001    |  ... |
| 2 | name_002    |  ... |
| 3 | name_003    |  ... |
+---+-------------+------+

I have diferent publications, which have "features" such as "internet", "made service", "satellite tv", etc.

These features might change in the future, and I want to be able to add/remove/modify them, so I store them in the database in a table.

ta_features
+---+-------------+
|id | name        |
+---+-------------+
| 1 | Internet    |
| 2 | Wi-Fi       |
| 3 | satelital tv|
+---+-------------+

which are related to the publications using the following table

ta_publication_features
+---+-------------+----------------+
|id |   type_id   | publication_id |
+---+-------------+----------------+
| 1 |      1      |       1        |
| 2 |      2      |       1        |
| 3 |      3      |       1        |
+---+-------------+----------------+

I think it's pretty easy to understand; There is a publication called name_001 which have internet, wi-fi and satellite tv.

I have the same data-schema for the images, I store them in this table

ta_images
+---+-------------+
|id | src         |
+---+-------------+
| 1 | URL_1       |
| 2 | URL_2       |
| 3 | URL_3       |
+---+-------------+

And use the following table to relate them to the publications

ta_publication_images
+---+-------------+----------------+----------+
|id |  img_id     | publication_id |   order  |
+---+-------------+----------------+----------+
| 1 |      1      |       1        |    0     |
| 2 |      2      |       1        |    1     |
| 3 |      3      |       1        |    2     |
+---+-------------+----------------+----------+

the column order gives the order in wich publications should be displayed when listing a single publication.

Philipp Reichart provided me with a query that will search and get all the publications that have certain features. It works for listing the publications, I can't modified it to return me the data I need.

So I figured I'll run that query and get all of the publications that pass the search criteria and then use another query to list them.

The listing of these publications shall include all publication data (everything on ta_publications)+ all of it's features + the most important (order 0) image src.

I could, for every publication, have two simple querys wich will return, separately, the most important image and all the features it has, but when listing 25 publications per page, it'll be 1 search query + (2 querys per publication * 25 publications) = 51 different querys, clearly not very efficient.

EDIT:

My question is, how can I create a SQL query that, given some publication ids, will return: all publication data (everything on ta_publications) + all of it's features + the most important (order 0) image src

You'll get redundant publication and image data with this one, but here is a way to do it with one query:

   SELECT p.id, p.name, p.date,
           f.id, f.name,
           i.id, i.src
    FROM ta_publications p
    JOIN ta_publication_features pf ON p.id = pf.publication_id
    JOIN ta_features f ON f.id = pf.type_id
    JOIN ta_publication_images pi ON p.id = pi.publication_id 
         AND pi.order = 0
    JOIN ta_images i ON i.id = pi.img_id
    WHERE p.id IN (  -- list of publication ids );

Odd behaviour when doing LIKE with wildcards searching for backslash in MySQL

5 votes

I've encountered a very unusual problem with MySQL, involving backslashes. Basically when I do a wildcard match with LIKE for \n which is in the database as text rather than an actual newline, it will only return a match if I have just a right hand wildcard:

SELECT * 
FROM  `tmptest` 
WHERE  `a` LIKE  '\\\\n%'

Now, if I query like this, it will not return anything:

SELECT *  
FROM `tmptest` 
WHERE `a` LIKE '%\\\\n%'

As you can see from the data I have in the table both queries should be matching. I'm not sure if it's something that I'm missing, or I'm incorrectly escaping the newline but it doesn't make sense for the first query to work and the second not.

Table structure:

CREATE TABLE IF NOT EXISTS `tmptest` (
`a` varchar(22) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Sample data:

INSERT INTO `tmptest` (`a`) VALUES
('\\n'),
('\\n\\ndfsdfsdfs\\n');

Thanks for taking time to read this.

It works for me with 6 backslashes when using the left side wildcard:

mysql> SELECT * FROM `tmptest` WHERE `a` LIKE '%\\\\\\n%';
+-----------------+
| a               |
+-----------------+
| \n              |
| \n\ndfsdfsdfs\n |
+-----------------+
2 rows in set (0.00 sec)

Using mysqld Ver 5.1.49


@Karolis as far as i understand the expression for the LIKE operator should be parsed twice, hence \\\\ turns into \ when used with LIKE.

But how to explain this (using the expression "back\slash"):

SELECT 'back\\slash' LIKE '%back\\\\slash%'; -> TRUE (normal behaviour)

SELECT 'back\\slash' LIKE '%back\\\\\slash%'; -> TRUE (5 backslashes?)

SELECT 'back\\slash' LIKE '%back\\\\\\slash%'; -> TRUE (6 backslashes?)

SELECT 'back\\slash' LIKE '%back\\\\\\\slash%'; -> TRUE (7 backslashes?)

SELECT 'back\\slash' LIKE '%back\\\\\\\\slash%'; -> FALSE (normal behaviour, I guess..)

Nevertheless if searching only for the "\":

mysql> SELECT 'back\\slash' LIKE '%\\\\%'; --> FALSE (but should work)

mysql> SELECT 'back\\slash' LIKE '%\\\\\%'; --> TRUE (5 backslashes)

mysql> SELECT 'back\\slash' LIKE '%\\\\\\%'; --> TRUE (6 backslashes)

mysql> SELECT 'back\\slash' LIKE '%\\\\\\\%'; --> FALSE (7 backslashes)

For this particular question, one could use a different escape character | and bypass the problem altogether (if no | character occurs):

mysql> SELECT 'back\\slash' LIKE '%\\%' ESCAPE '|'; --> TRUE

So maybe some mysql guru out there can explain this. I simply can't. also tested with mysql 5.1.53 on a different machine. Same behaviour was observed. As i started by commenting, its a rather interesting question.