The Cost of Calculated Columns In SQL Database

You are not supposed to put a calculated column in a table in a pure SQL database. And as the guardian of pure SQL, I should oppose this practice. Too bad the real world is not as nice as the theoretical world.

Calculated Columns

There are many types of calculated columns. The first are columns which derive their values from outside the database itself. The most common examples are timestamps, user identifiers, and other values generated by the system or the application program. This type of calculated column is fine and presents no problems for the database.
The second type is values calculated from columns in the same row. In the days when we used punch cards, you would take a deck of cards, run them thru a machine that would do the multiplications and addition, then punch the results in the right hand side of the cards. For example, the total cost of a line in an order could be described as price times quantity.
cost of calculated columns

The reason for this calculation was simple; the machines that processed punch cards had no secondary storage, so the data had to be kept on the cards themselves. There is truly no reason for doing this today; it is much faster to re-calculate the data than it is to read the results from secondary storage.
The third type of calculated data uses data in the same table, but not always in the same row in which it will appear. The fourth type uses data in the same database.
These last two types are used when the cost of the calculation is higher than the cost of a simple read. In particular, data warehouses love to have this type of data in them to save time.
When and how you do something is important in SQL. Here is an example, based on a thread in a SQL Server discussion group. I am changing the table around a bit, and not telling you the names of the guilty parties involved, but the idea still holds.
It records the final selling price of many different stocks. The trend column is +1 if the price increased from the last reported selling price, 0 if it stayed the same and -1 if it dropped in price. The trend column is the problem, not because it is hard to compute, but because it can be done several different ways. Let's look at the methods for doing this calculation.

Triggers

You can write a trigger which will fire after the new row is inserted. While there is an ISO Standard SQL/PSM language for writing triggers, the truth is that every vendor has a proprietary trigger language and they are not compatible. In fact, you will find many different features from product to product and totally different underlying data models. If you decide to use triggers, you will be using proprietary, nonrelational code and have to deal with several problems.

INSERT INTO Statement

This is not as bad as you first think. The innermost subquery finds the sale just before the current sale, then returns its price. If the old price minus the new price is positive negative or zero, the SIGN() function can computer the value of TREND. Yes, I was showing off a little bit with this query.
The problem with this is much the same as the triggers. What if I delete a row or add a new row between two existing rows? This statement will not do a thing about changing the other rows.
But there is another problem; this stored procedure is good for only one row at a time. That would mean that at the end of the business day, I would have to write a loop that put one row at a time into the StockHistory table.

UPDATE the Table

You already have a default value of 0 in the trend column, so you could just write an UPDATE statement based on the same logic we have been using.
While this statement does the job, it will re-calculate trend column for the entire table. What if we only looked at the columns that had a zero? Better yet, what if we made the trend column NULL-able and used the NULLs as a way to locate the rows that need the updates?

Use a VIEW

This approach will handle the insertion and deletion of any number of rows, in any order. The trend column will be computed from the existing data each time. The primary key is also a covering index for the query, which helps performance. A covering index is one which contains all of the columns used the WHERE clause of a query. The major objection to this approach is that the VIEW can be slow to build each time, if StockHistory is a large table

No comments

Post Top Ad