Best database-design questions in October 2010

Database Design: replace a boolean column with a timestamp column?

7 votes

Earlier I have created tables this way:

create table workflow (
    id number primary key,
    name varchar2(100 char) not null,
    is_finished number(1) default 0 not null,
    date_finished date
);

Column is_finished indicates whether the workflow finished or not. Column date_finished is when the workflow was finished.

Then I had the idea "I don't need is_finished as I can just say: where data_finished is not null", and I designed without is_finished column:

create table workflow (
    id number primary key,
    name varchar2(100 char) not null,
    date_finished date
);

(We use Oracle 10)

Is it a good or bad idea? I've heard you can not have an index on a column with NULL values, so where data_finished is not null will be very slow on big tables.

There is a right way to index null values, and it doesn't use a FBI. Oracle will index null values, but it will NOT index null LEAF values in the tree. So, you could eliminate the column is_finished and create the index like this.

CREATE INDEX ON workflow (date_finished, 1);

Then, if you check the explain plan on this query:

SELECT count(*) FROM workflow WHERE date_finished is null;

You might see the index being used (if the optimizer is happy).

Back to the original question: looking at the variety of answers here, I think there is no right answer. I may have a personal preference to eliminate a column if it is unnecessary, but I also don't like overloading the meaning of columns either. There are two concepts here:

  1. The record has finished. is_finished
  2. The record finished on a particular date. date_finished

Maybe you need to keep these separate, maybe you don't. When I think about eliminating the is_finished column, it bothers me. Down the road, the situation may arise where the record finished, but you don't know precisely when. Perhaps you have to import data from another source and the date is unknown. Sure, that's not in the business requirements now, but things change. What do you do then? Well, you have to put some dummy value in the date_finished column, and now you've compromised the data a bit. Not horribly, but there is a rub there. The little voice in my head is shouting YOU'RE DOING IT WRONG when I do things like that.

My advice, keep it separate. You're talking about a tiny column and a very skinny index. Storage should not be an issue here.

Rule of Representation: Fold knowledge into data so program logic can be stupid and robust.

-Eric S. Raymond

SQL Server: how to constrain a table to contain a single row ?

7 votes

I want to store a single row in a configuration table for my application. I would like to enforce that this table can contain only one row.

What is the simplest way to enforce the single row constraint ?

You make sure one of the columns can only contain one value, and then make that the primary key (or apply a uniqueness constraint)

CREATE TABLE T1(
    Lock char(1) not null,
    /* Other columns */,
    constraint PK_T1 PRIMARY KEY (Lock),
    constraint CK_T1_Locked CHECK (Lock='X')
)

I have a number of these tables in various databases, mostly for storing config. It's a lot nicer knowing that, if the config item should be an int, you'll only ever read an int from the DB.

How to store "same as" data?

6 votes

I've got one model with 3 addresses: pickup, dropoff, and billing. I figure the billing address will usually be either the pickup or drop-off address, so from a UI perspective, I should have a "same as" option. But from a DB perspective, should I save the "same as" field, or should I duplicate the data?

You should have the same Id of a row from an Address table in two different columns, PickUp and DropOff. This way, you do not duplicate the address, do not use some sentinel address, and can easily query to see if the PickUp address is the same as the DropOff. If one of these changes in the future, you can always modify the Id value stored in its respective column to a new address.