Earlier I have created tables this way:
create table workflow ( id number primary key, name varchar2(100 char) not null, is_finished number(1) default 0 not null, date_finished date );
Column is_finished indicates whether the workflow finished or not. Column date_finished is when the workflow was finished.
Then I had the idea "I don't need is_finished as I can just say: where data_finished is not null", and I designed without is_finished column:
create table workflow ( id number primary key, name varchar2(100 char) not null, date_finished date );
(We use Oracle 10)
Is it a good or bad idea? I've heard you can not have an index on a column with NULL values, so
where data_finished is not null will be very slow on big tables.
There is a right way to index null values, and it doesn't use a FBI. Oracle will index null values, but it will NOT index null LEAF values in the tree. So, you could eliminate the column
is_finished and create the index like this.
CREATE INDEX ON workflow (date_finished, 1);
Then, if you check the explain plan on this query:
SELECT count(*) FROM workflow WHERE date_finished is null;
You might see the index being used (if the optimizer is happy).
Back to the original question: looking at the variety of answers here, I think there is no right answer. I may have a personal preference to eliminate a column if it is unnecessary, but I also don't like overloading the meaning of columns either. There are two concepts here:
- The record has finished.
- The record finished on a particular date.
Maybe you need to keep these separate, maybe you don't. When I think about eliminating the
is_finished column, it bothers me. Down the road, the situation may arise where the record finished, but you don't know precisely when. Perhaps you have to import data from another source and the date is unknown. Sure, that's not in the business requirements now, but things change. What do you do then? Well, you have to put some dummy value in the
date_finished column, and now you've compromised the data a bit. Not horribly, but there is a rub there. The little voice in my head is shouting YOU'RE DOING IT WRONG when I do things like that.
My advice, keep it separate. You're talking about a tiny column and a very skinny index. Storage should not be an issue here.
Rule of Representation: Fold knowledge into data so program logic can be stupid and robust.
-Eric S. Raymond