Best database-design questions in February 2012

Designing a questiion-and-answer system that is flexible and efficient

11 votes

I've been working on a dynamic question-and-answers system, but I'm having trouble creating a efficient AND flexible design for this system. I'd love to know if there's an established design pattern or any recommendations for designing this system.

What I'm trying to do

I have a set of questions. After answering them, another set of questions are shown, depending on the answers to the previous set. This repeats, until no more questions are needed.

The question answers are all boolean, multiple-choice, or numeric.

The important part is that most questions are only shown when a specific set of criteria is met, based on the previous answers.
I need the criteria to support mainly boolean logic, such as And, Or, Not, Equals, Greater Than, and Less Than.

For example, let's say I have already received answers to questions such as Age, Gender, and State.
One of the next questions is In School?, but it should ONLY be displayed if: Age < 30 AND Gender=Male AND (State = CA OR State = NY)

Has anyone heard of a similar design pattern? How would you approach this design?


Background Information

I tried Database columns

At first, we only had 3 initial questions, so we just used 3 columns to filter the second set of questions.

However, our business needs grew and we started needing more initial questions, added more columns, and put more logic within those filters.

This quickly became too rigid and cumbersome.

I tried a Logic Interpreter

Our second attempt to make the system more flexible: store the filtering logic as JavaScript, and run a JavaScript interpreter to filter the results.

This worked pretty well for flexibility, but retrieving thousands of rows from the database and interpreting the scripts was extremely inefficient and performed too poorly for production.

I tried a Hybrid

We finally combined the two approaches, and came up with something feasable.
We first filtered our list based on several hard-coded database columns, and further filtered the list with the JavaScript interpreter.

This hybrid system still has many drawbacks:

  • The logic is in 2 different systems (SQL database logic and JavaScript interpreter)
  • Interpreting the JavaScript is slow, and is probably overkill for the simple boolean logic needed
  • The system is very difficult to maintain, especially because the JavaScript logic must always be written by a developer.

I'd really like to hear suggestions on how to improve this design.

Other Info

My database is MS SQL Server, the backend is .NET C#, and the JavaScript interpreter is JINT. The UI implementation is not important, but is a AJAX enabled website used to ask and answer these questions.

We had to do something similar in the past for a medical system and due to its complexity, we resorted to reuse the rule engine that support multi-classification decision tree. I remember that I came across a nice simple design about this and managed to dig out the link.

http://www.javaworld.com/javaworld/javatips/jw-javatip139.html?page=1

The design is loosely coupled from the data storage, so making it easy to fit into your existing solution design.

Are primary keys passé?

9 votes

What unique functionality do Primary Keys provide?

While i titled the question with tongue firmly planted in cheek, my question is serious. Before any flames start, I'm not saying build a database without constraints or referential integrity. As far I can tell, however, SQL Server could do away with the primary key key word.

  • Unique indexes cover, well, uniqueness
  • Column based Non-nullability covers the non-nullability requirement for PKs
  • PK's don't have to be clustered, so that's not it
  • Foreign keys can, and often are, implemented with unique indexes, rather than PKs
  • Even MSDN states that a unique index is created to enforce the PK's uniqueness

I do agree that logically a Primary Key coveys a bit of intention about a data model, but is that it? [sarcasm]Oh, and we do get that little Key icon SSMS shows when designing a table! [/sarcasm]


EDIT

From the comments, it seems clear I didn't ask this question as clearly as I thought. I agree that primary keys are important from a logical perspective.

I'm not asking:

  • should i choose an int or a varchar for my PK
  • do PK's have to be clustered, or how do i identify what should be clustered
  • how do i uniquely identify rows

My intention was to ask "what features do PK's provide that cannot reasonably be implemented using other features?" I'm not suggesting going crazy here -- like using a trigger to enforce uniqueness instead of unique constraints/indexes. Reasonable is a key word here -- and using a unique index/constraint seems very similar to defining a PK.

A completely different perspective :

SQL is a language that is defined by an ISO standard. That standard has "mandatory" features and "optional conformance" features.

If you build a DBMS with some data manipulation language, then you are entitled to call your language "SQL" only if :

(a) you have implemented ALL of the syntax prescribed by the standard ("mandatory" features) , and (b) all of the language features that you have implemented (all the mandatory ones as a minimum, but also the "optional" ones you "opted in" for), expose exactly the behaviour as defined/Described in the standard.

The "PRIMARY KEY" syntax is a very old feature, and it's not unlikely that it is one of those "mandatory" ones. Ditching the word from your language means you can no longer legitimately call your language SQL. Big commercial vendors are not likely going to make such a move any time soon.

Too many fields in MySQL?

6 votes

I developed a stats site for a game as a learning project a few years back. It's still used today and I'd like to get it cleaned up a bit.

The database is one area that needs improvement. I have a table for the game statistics, which has GameID, PlayerID, Kills, Deaths, DamageDealt, DamageTaken, etc. In total, there are about 50 fields in that single table and many more that could be added in the future. At what point are there too many fields? It currently has 57,341 rows and is 153.6 MiB by itself.

I also have a few fields that stores arrays in a BLOB in this same table. An example of the array is Player vs Player matchups. The array stores how many times that player killed another player in the game. These are the bigger fields in filesize. Is storing an array in a BLOB advised?

The array looks like:

        [Killed] => Array
            (
                [SomeDude] => 13
                [GameGuy] => 10
                [AnotherPlayer] => 8
                [YetAnother] => 7
                [BestPlayer] => 3
                [APlayer] => 9
                [WorstPlayer] => 2
            )

These tend to not exceed more than 10 players.

I prefer to not have one table with an undetermined number of columns (with more to come) but rather to have an associated table of labels and values, so each user has an id and you use that id as a key into the table of labels and values. That way you only store the data you need per user. I believe this approach is called EAV (as per Triztian's comment) and it's how medical databases are kept, since there are SO many potential fields for an individual patient, even while any given patient only has a very small number of those fields with actual data.

so, you'd have

user:
id | username | some_other_required_field

user_data:
id | user_id | label | value

Now you can have as many or as few user_data rows as you need per user.

[Edit]

As to your array, I would treat this with a relational table as well. Something like:

player_interraction:
id | player_id | player_id | interraction_type

here you would store the two players who had an interaction and what type of interaction it was.