Summary: An easy way to tell how efficient your database schema is and how to tell whether it should be flatter or more relational is to simply export your records, compress them, and look at the compression ratio; a higher compression ratio means you have more repeated data – and, if it the ratio is large enough, you might consider redesigning your database schema to be more efficient, if efficient usage of storage space is a priority to you.
We are all quite aware of how important the concept and implementation of a database is to modern technological life. Designing a database schema deserves more than a bit of thought in order to hit a good ratio between space and time complexity. (Of course, the target ratio varies with respect to circumstance.)
Layman’s explanation: If you are selling widgets in a store, you can either record each sale as customer X bought widget “Kalamazoo”, writing the name of the widget sold in each case – or you can write down that widget Kalamazoo has code K and writing that customer X bought widket K, being more concise in your recrods, but requiring more thought to figure out who bought what, and this trade-off is something to consider when deciding how to keep records.
It is easy to come up with examples of poorly designed databases that are either too flat or too relational. However, in practice, how can one gauge how efficiently one’s database stores one’s data?
Layman’s explanation: Continuing the example in the preceding layman’s explanation, the question is: how can you balance conciseness and clarity so as to have the most useful and efficient records?
We were recently performing a server migration for a client, and this server migration required migrating MySQL databases. That usually entails exporting the current server’s MySQL database and importing it into the target server’s MySQL database. In order to expedite the upload to the new server, we compressed the database before upload. We were surprised to discover that the compression ratio of the database was 4%.
Layman’s explanation: suppose you had the string of letters ABCDABCDABCD and you wanted to represent it more compactly. You could define that “AB” will be represented by “0″ and “CD” will be represented by “1″. Then, you could rewrite “ABCDABCDABCD” as “010101″, which is shorter to write down. “010101″ is 50% as many letters/numbers as “ABCDABCD”, which means that you rewrote your original records using 50% as much writing, saving you some trouble. Now imagine that you could find a way to keep your records using 5% as much writing as normal – that would mean that your original records could have been much more concise, to a significant degree.
Honestly, we had never seen a compression ratio of 4% before. This suggested that the database we were migrating had far too much repeated data and could be redesigned to be more relational in order to be more efficient.
Layman’s explanation: The database we were looking at was rewriting the same names of products (or whatever else) so many times that it was extremely inefficient and inconcise, and therefore the record-keeping system could be significantly improved.
We have thus adopted a new protocol to gauge whether a given database has an efficient schema: compress it and look at its compression ratio. If the database is large and has an extreme compression ratio, then it is time to consider a redesign of the database schema.
Layman’s explanation: if you are using a run-of-the-mill database system to store your records, we have a very easy way to tell whether there is a way to make your record-keeping more efficient (i.e., saving you time and effort) ; it would take literally a few minutes to tell how concise your record keeping is, based on theoretical concepts, and you would have a concrete answer as to whether it is worth investing in upgrading your database system.
Therefore, we suggest an easy and simple way to gauge how relational your data is (and, therefore, in some cases, either how efficient or how inefficient your database schema is) by simply exporting your database, compressing it and looking at the compression ratio.
A technical comment: this discussion is under the premise that the big-O of lookup is insignificant with respect to the application at hand and that spacial complexity is the main priority, such as can often be the case with a small business dealing with the growing pains of being stuck with a poor database system. If time complexity is the priority, then the consideration raised above loses value – but this is a case outside the scope of the discussion we intended.