I have been asked to create something which tracks the daily cost to collect on accounts, and I am trying to figure out a database table schema that would support this.
Here's what I know:
- Company has over 2.5 million accounts
- Of these, they currently work an average of 200,000 per month (that changes with staffing levels, which are currently low)
- They have 13 different cost types they'd like to track, and they have warned that they might add more in the future
- They want the costs to be tracked daily
- Costs are not spread across the entire inventory. A cost is either split evenly across the accounts worked that month (~200,000), applied to a group of accounts selected by entering account identifiers, or applied to specific accounts named by the user.
My first thought was a normalized database:
AccountId Date CostTypeId Amount
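As a sketch, the normalized design might look like this (table, column, and constraint names are hypothetical; DATETIME is used because SQL Server 2005 has no DATE type):

```sql
-- Hypothetical sketch of the normalized design: one row per account, day, and cost type.
CREATE TABLE dbo.AccountDailyCost (
    AccountId  INT      NOT NULL,
    CostDate   DATETIME NOT NULL,  -- no DATE type until SQL Server 2008
    CostTypeId TINYINT  NOT NULL,  -- 13 types today; room for more without schema changes
    Amount     MONEY    NOT NULL,
    CONSTRAINT PK_AccountDailyCost
        PRIMARY KEY CLUSTERED (CostDate, AccountId, CostTypeId)
);
```

Leading the clustered key with the date keeps each nightly load appending to the end of the table and keeps date-range reports reading contiguous pages.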
My issue with this is the math: the table is going to get huge quickly. Assuming all 13 cost types get applied to all worked accounts for the current month, that's 200k * 13 * N days in the month, which is somewhere around 75-80 million records per month, or close to a billion records per year.
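The row-count arithmetic for both designs can be checked directly (assuming a 30-day month for illustration):

```sql
-- Sanity check of the estimates in the question, using a 30-day month.
SELECT 200000 * 13 * 30      AS NormalizedRowsPerMonth,  -- 78,000,000
       200000 * 13 * 30 * 12 AS NormalizedRowsPerYear,   -- 936,000,000
       200000 * 30           AS WideRowsPerMonth,        -- 6,000,000
       200000 * 30 * 12      AS WideRowsPerYear;         -- 72,000,000
```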
My second thought was to denormalize it a bit:
AccountId Date TotalCost CostType1 CostType2 CostType3 CostType4 CostType5 CostType6 CostType7 CostType8 CostType9 CostType10 CostType11 CostType12 CostType13
This method is more denormalized and creates up to 6 million records per month (200k * N days in the month), or about 72 million per year. That's a lot less than the first method; however, if the company adds a new cost type in the future, another database column will need to be added.
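A sketch of the denormalized design, again with hypothetical names, makes the trade-off concrete: one row per account per day, but every new cost type means an ALTER TABLE and changes to any code that reads the columns:

```sql
-- Hypothetical sketch of the denormalized design: one row per account per day.
CREATE TABLE dbo.AccountDailyCostWide (
    AccountId  INT      NOT NULL,
    CostDate   DATETIME NOT NULL,
    TotalCost  MONEY    NOT NULL,
    CostType1  MONEY    NULL,   -- NULL when that cost type was not applied that day
    CostType2  MONEY    NULL,
    CostType3  MONEY    NULL,
    CostType4  MONEY    NULL,
    CostType5  MONEY    NULL,
    CostType6  MONEY    NULL,
    CostType7  MONEY    NULL,
    CostType8  MONEY    NULL,
    CostType9  MONEY    NULL,
    CostType10 MONEY    NULL,
    CostType11 MONEY    NULL,
    CostType12 MONEY    NULL,
    CostType13 MONEY    NULL,
    CONSTRAINT PK_AccountDailyCostWide
        PRIMARY KEY CLUSTERED (CostDate, AccountId)
);
-- Adding a 14th cost type later requires:
-- ALTER TABLE dbo.AccountDailyCostWide ADD CostType14 MONEY NULL;
```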
Of the two methods, which do you prefer? Why? Is there another alternative that you can think of which would handle this better?
I am most interested in reporting performance, for both summarized and detailed reports. The job that spreads the costs over accounts will run nightly when no one is around. A secondary concern is database size: the existing database is already almost 300GB, and I believe the space on disk is around 500GB.
The database is SQL Server 2005.
A billion records a year isn't much.
With partitioning (per CostType, maybe, or by date) and archiving it is manageable.
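As a sketch of monthly date partitioning on the normalized table (hypothetical names and boundary dates; this syntax works on SQL Server 2005, though table partitioning requires Enterprise Edition there):

```sql
-- Hypothetical monthly range partitioning for the cost table.
CREATE PARTITION FUNCTION pfCostMonth (DATETIME)
    AS RANGE RIGHT FOR VALUES ('20120101', '20120201', '20120301');

-- All partitions on PRIMARY here for simplicity; in practice you might
-- map them to separate filegroups.
CREATE PARTITION SCHEME psCostMonth
    AS PARTITION pfCostMonth ALL TO ([PRIMARY]);

-- The cost table would then be created ON psCostMonth (CostDate), and an old
-- month can be moved to an archive table almost instantly with
-- ALTER TABLE ... SWITCH PARTITION, keeping the hot table small.
```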
The number of data items to store is still 200k * 13 * N either way. As columns, you'll get fewer rows per page, and it will take more space than as rows. You may gain if "CostType1" is not a fixed-length datatype, but it's marginal.
"KISS" as they say