Today, more and more engineers are offering and pushing for use of NoSQL technologies. It has better performance, infinite scaling and it is much cheaper. But after some time spent in the same debate I think the question is completely wrong. The real question is are we aware that we cannot use databases as anything else but data storage and also that for connected layer (direct applications use) we can start forgetting about joins and relations between tables.

SQL technology started in 1970-es and it was revolutionary in every point: it allowed structured storage of data, it created a system for not repeating of data (normalization) and it allowed security and easy maintenance. Today, we have large established SQL systems as Microsoft SQL Server, Oracle, MySQL, Postgres, and many more. But with invention of NoSQL technology solved important problems and created grim future for SQL as an approach of storing of data.

What SQL databases were missing? Well, with SQL everything is good if you have small data. If you have a bit more of data you have to start archiving old data and that is the reason your bank account cannot show all data from the past but only predominantly last 12 months of activity. We are all used to it and banks decided to archive everything older to allow new data speed and good performance. And after that you have second example as e-mail accounts that can store infinite number of e-mails from the future (as Google, Yahoo, Hotmail, etc.). Same with Facebook. You do not see disclaimer that “only posts from the last 12 months will be visible”. Why? Because they use NoSQL.

But can we conclude that we have to use NoSQL because it is “better”? Yes and no. My opinion is that NoSQL brought different architecture of data storage, but if you start thinking about why NoSQL can scale indefinitely you will see that SQL can mimic that benefits, but it has to renounce “what is good in SQL”. The first to be shoot is normalized data structure and constraints. Why? Because if data is normalized you cannot store different parts on different servers. Especially if you have strict constraints validation. For example, if you have table of doctors with 100k doctors and other table of patients with 1m patients and if they have relationship you have to keep them together if you want to have any performance. If not, adding a new patient will demand search through all locations of doctors table data to check if that doctor exists which will make whole system really bad performance.

To conclude, I think the real answer is to reduce or even eliminate data normalization wherever it is possible. Also, you have to stop using databases as an integration layer for different systems and different management studios of databases as UIs to change data (as many companies does). As I always say: you can violate all these rules, but if you go with highly normalized SQL it will be really few times more expensive, demand more people for maintenance and will need some compromises for user experience. Every of two technologies should be selected based on use cases: reason, not emotions.

« »