Monday, October 07, 2013

Perils of Stored Procedures



Introduction

The topic will seem to be a sacrilege to quite a few members of the architect and designer community. Stored Procedures have been considered to be a panacea for all performance problems in the RDBMS world. Of late they also have been considered as the panacea to prevent SQL injection in applications. Given this background it will not be surprising to see many architects and designers react with shock and awe looking at the heading of the topic.

Advantages/Benefits of Stored Procedures

First let us explore some of the advantages of Stored Procedures.
1. Usage of Stored Procedures prevents SQL injection, unless one uses dynamic queries within the Procedures themselves.
2. The data is processed within the database server and only the data that is really required to be sent to the client travels over the network. This optimizes the network traffic between the Database Server and Client and makes the performance better.
3. It is easier to leverage features like cursors and temporary tables to optimise the processing of the data.
4. All the applications that can connect to the database server can execute the business processes by executing the stored procedures in the database. This leads to the belief that stored procedures eases in reusability of code across applications.

The Past

Given these advantages it is not surprising that many consider stored procedures to be a panacea to many of the problems that they face. These benefits are(read were) genuine and were a boon in the days past.
Till a decade ago the scalability requirements of the applications were limited. A bulk of the processing happened in batch processes and the users of the Online Transaction Processing transactions were limited. Applications typically followed the client server paradigm where the client managed the User Interactions and the Database managed the Business Processes.
There was a lot going in the favour of stored procedures in this scenario. A big sized server could manage the load that was generated by the online transaction processing.

The Present

The scale of operations in today's applications has drastically changed. The applications today are not exposed to a limited number of users like the applications of the bygone era. Today's applications are exposed on the internet and are accessed via browers, tablets and mobile phones. If one considers a banking application, in the past, this access and usage of this application was limited to the employees of the bank. Today the same banking application is exposed on the internet to the customers of the bank. A variety of functionality of the application are exposed through the internet. The customers can enquire their balance, they can make a payment to a vendor via net banking, they can transfer funds to another person and for a variety of other features are available to the customers. This has tremendously increased the load on the application.
Before we proceed it is important to remember the fact that stored procedures execute within the database server and can leverage resources in that server.
If now, in an application a majority of the business logic is in stored procedures, every user who will be using the application will be using the resources in the database server. So as the number of users and correspondingly the number of transactions in the applications go up the load in the database servers go up. Soon we will reach a stage where the database server reaches its capacity. The only way now to scale the application will be to increase the sizing of the server on which the database is running. Databases scale only VERTICALLY and not HORIZONTALLY (Yes, I hear you, we will come to Oracle RAC in a short while)
Instead of having all the logic or majority of the logic in the databases, one were to put them in the "Application Servers" and leverage features like distributed caching it will be possible to scale the application horizontally. The database will be used primarily for storing and retrieving the data required for the data. To scale the application it will be possible to increase the number of "Application Servers" on which the application runs. In this way one can almost scale indefinitely.
The retrieval of data from the database server to the application server will add some inefficiency of the processing, but this will offset by the fact that one will be able to scale the application support a much larger load.
The cost of ownership also comes down as one can use normal servers as "Application Servers". One does not require high cost specialized servers for this purpose.
To get the efficiency of a stored procedure one can use PreparedStatement in the Java world and Prepared Commands in the .Net world. Using PreparedStatements and Prepared Commands, the right way, one can also shield the application from SQL injection.

Other perils of stored procedures

1. If one makes a heavy usage of stored procedures then one will be stuck with the particular database for the life of the application. Migrating to another database will be an expensive proposition.
2. This becomes an even bigger limitation in case of products. The customers wish to have the ability to be able to run the product using a database of their choice rather than the vendor of the application deciding the underlying database. If we use stored procedures in the product we will end up maintaining stored procedures per database and this will be an expensive proposition.
3. The code in stored procedures is not conducive for efficient documentation even if one uses good coding practices and we end up having bad code smells which are not easy to wish or whiff off.
4. The code in stored procedures is not conducive for refactoring which means that the code will continue to smell.
5. The newer techniques of using interfaces and/or web services to expose features of an application to another application are a much better way of reusing the business logic rather than using the stored procedures from multiple applications. These mechanisms give a more elegant mechanism to loosely couple applications than stored procedures.

Note on Oracle RAC

Oracle RAC has the capability of scaling horizontally. But it comes with its own set of complexities and limitations. Databases leverage the memory on the servers to cache data to give improved performance. This is possible and it is very efficient when the database server is running on a single server as all the changes to the data pass through the same process and it is possible to update the cache with the changed data or invalidate the right section of the cache.
When one starts running the database servers on more than one physical server it becomes important to ensure that any change to the data is conveyed to all the caches on the various servers or the corresponding caches are invalidated on all the servers or before using the data in the cache one will need to validate if the data has changed through the process in the other server. This is not easy and adds to the overhead of processing thus reducing the benefit that ones gets from the caching mechanisms used by the traditional databases.

A note on NoSQL

Today’s applications have reached a scale where even the most powerful RDBMS is unable to scale to the levels to traffic experienced in these applications. Even if it scales, the cost of the hardware required to support these loads, is prohibitive.
Google, Facebook, Twitter and many sites which deal with large data (read Big Data) all use RDBMS only to a limited extent. They instead use a new breed of database servers classified as NoSQL. The term NoSQL itself has no expansion. It was used as a twitter hashtag to denote the set of database applications which do not use SQL to provide access to the data that is stored. Also they do not store the data in the traditional way where the data is organized in tables and columns. NoSQLs will be a topic for another post sometime in future.
This further strengthens the reasons to use stored procedures only in really exceptional situations.

Conclusion

There are situations where stored procedures are still a possible solution. But given the scale of operations of today's applications it is advisable to consider stored procedures only in really, really exceptional situations. It is time we understand and learn better ways of developing applications that perform and are secure rather than sticking to the older practice of using stored procedures to achieve performance.
In the CMG meet at TCS, Mumbai Mr. B. Gopal on 4th October used an apt phrase “Conservation of Minimum Complexity” to mean that any system has a minimum level of complexity and it not possible to reduce this complexity any further. The aim should be distribute the complexity across the various layers of the system. When one makes excessive usage of stored procedures one is increasing the complexity of the database layer and this will is not be good in the long run both from a performance as well as from a maintenance perspective.

 

No comments: