What are the best database design strategies for MSSQL?

Developing databases might seem relatively simple. However, Microsoft SQL Server database architects and administrators have many choices to make if they want their applications to perform well right from the start. Many attempt a “rapid development” approach by jumping straight to coding. Unfortunately, taking shortcuts often leads to performance problems that are considerably more difficult and costly to overcome once the application is deployed in a production environment. Thinking through all the options during database development might increase the time it takes to complete your project, but it will pay off in performance dividends. Taking the time to choose appropriate data models, data types, indexing strategies and code modules can save a lot of time and effort tuning the database down the road.

What’s the purpose of your database?

The first step in database development is to consider the type of application you’re about to build. There is a right tool for each job, and using the wrong tool can have unpleasant side effects. Keeping that in mind, databases generally have one of the following three purposes:

Online transaction processing (OLTP):
OLTP systems are used to collect data. For example, you could build an OLTP system so your customers can order items from an online bookstore. Data models for OLTP systems are highly normalized: each data element is recorded in only one place. By eliminating, or at least minimizing, data duplication, you make transactions faster and reduce the amount of storage space needed.
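To make the idea concrete, here is a minimal sketch of such a normalized model for the bookstore example. All table and column names are hypothetical, invented only for illustration:

    -- Hypothetical normalized OLTP schema: each fact (customer name, book
    -- title, price) is stored in exactly one place; orders reference
    -- customers and books by key instead of duplicating their details.
    CREATE TABLE dbo.Customers (
        CustomerID INT IDENTITY(1,1) PRIMARY KEY,
        Name       NVARCHAR(100) NOT NULL,
        Email      VARCHAR(255)  NOT NULL UNIQUE
    );

    CREATE TABLE dbo.Books (
        BookID INT IDENTITY(1,1) PRIMARY KEY,
        Title  NVARCHAR(200) NOT NULL,
        Price  DECIMAL(9,2)  NOT NULL
    );

    CREATE TABLE dbo.Orders (
        OrderID    INT IDENTITY(1,1) PRIMARY KEY,
        CustomerID INT NOT NULL REFERENCES dbo.Customers (CustomerID),
        OrderDate  DATETIME NOT NULL DEFAULT GETDATE()
    );

    CREATE TABLE dbo.OrderItems (
        OrderID  INT NOT NULL REFERENCES dbo.Orders (OrderID),
        BookID   INT NOT NULL REFERENCES dbo.Books (BookID),
        Quantity SMALLINT NOT NULL,
        PRIMARY KEY (OrderID, BookID)
    );

Because a book's title and price live only in dbo.Books, an order line stores just the integer BookID, which keeps each transaction small and fast.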

Decision support system (DSS):
DSS applications are used to generate detailed, transaction-level reports about data collected by OLTP systems. Because OLTP systems attempt to minimize data duplication, creating reports against them requires joining numerous tables, which generally does not perform well. DSS applications mold the OLTP model into one that is much easier to report against by de-normalizing tables and introducing data duplication.

Data warehouse:
Data warehouse systems are used to consolidate all enterprise data into a single data store and make it possible to correlate data from multiple OLTP systems. Data warehouses allow senior executives to examine a global picture of business performance and to detect areas of weakness for making strategic decisions. Data warehouses are built using star or snowflake schema data models, which consist of fact and dimension tables. These models are well suited to pre-aggregating data and to examining the business from various perspectives. For example, an executive could review car rental history by date, make or model of vehicle, customer demographics or store location. Although you can use an OLTP system for reporting or a data warehouse for collecting data, you're bound to encounter performance problems if you do so.
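As an illustration, here is a minimal star schema sketch for that car rental example. The fact and dimension tables below are hypothetical, chosen only to mirror the perspectives mentioned above:

    -- Hypothetical star schema: one fact table joined to dimension tables
    -- on lean integer surrogate keys.
    CREATE TABLE dbo.DimDate (
        DateKey  INT PRIMARY KEY,       -- e.g. 20240115
        FullDate DATETIME NOT NULL,
        [Year]   SMALLINT NOT NULL,
        [Month]  TINYINT NOT NULL
    );

    CREATE TABLE dbo.DimVehicle (
        VehicleKey INT IDENTITY(1,1) PRIMARY KEY,
        Make       VARCHAR(50) NOT NULL,
        Model      VARCHAR(50) NOT NULL
    );

    CREATE TABLE dbo.DimStore (
        StoreKey INT IDENTITY(1,1) PRIMARY KEY,
        City     VARCHAR(50) NOT NULL,
        Region   VARCHAR(50) NOT NULL
    );

    CREATE TABLE dbo.FactRental (
        DateKey      INT NOT NULL REFERENCES dbo.DimDate (DateKey),
        VehicleKey   INT NOT NULL REFERENCES dbo.DimVehicle (VehicleKey),
        StoreKey     INT NOT NULL REFERENCES dbo.DimStore (StoreKey),
        RentalDays   SMALLINT NOT NULL,
        RentalAmount DECIMAL(9,2) NOT NULL
    );

    -- Examining the business "from various perspectives" becomes a join
    -- plus a GROUP BY on whichever dimensions interest the executive:
    SELECT v.Make, d.[Year], SUM(f.RentalAmount) AS TotalRevenue
    FROM dbo.FactRental AS f
    JOIN dbo.DimVehicle AS v ON v.VehicleKey = f.VehicleKey
    JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
    GROUP BY v.Make, d.[Year];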

Choose appropriate data types:


  • Use the smallest data type possible
  • Use a fixed-length data type when values mostly remain the same size

Here is a simple yet often neglected principle: use the smallest and most appropriate data type for each column. For example, you could use about a dozen different data types for storing a person's age, such as BIGINT, INT, SMALLINT, TINYINT, DECIMAL, FLOAT, CHAR(3), VARCHAR(3) and others. Because most of us won't live more than 255 years, it is safe to use the data type that requires the least overhead and still supports the necessary range of values: TINYINT, which supports whole numbers between 0 and 255. Similarly, in a data warehousing scenario, you could use a character identifier for a vehicle rental instance to join this dimension to the rental fact. However, for best performance, you should create a surrogate key with an integer data type to make joins between fact and dimension tables as fast as possible.

Using inappropriate data types can also affect your indexing strategies. SQL Server index keys are limited to 900 bytes. If you create a VARCHAR column wider than 900 bytes, you cannot reliably index it. Furthermore, if your columns are wider than necessary, the indexes created on them will also be wider than necessary. SQL Server can scan, or seek through, lean indexes faster than fat indexes.
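The following short sketch shows both points: the right-sized age column and the index key limit. The table is hypothetical; the 900-byte figure is the documented index key-size limit for the SQL Server versions this article covers:

    -- Hypothetical table demonstrating data type choices.
    CREATE TABLE dbo.Person (
        PersonID INT IDENTITY(1,1) PRIMARY KEY,
        Age      TINYINT NOT NULL,    -- 1 byte covers ages 0-255;
                                      -- BIGINT would spend 8 bytes per row
        Notes    VARCHAR(2000) NULL   -- wider than the 900-byte key limit
    );

    -- SQL Server creates this index with a warning; any row whose Notes
    -- value actually exceeds 900 bytes will then be rejected on insert
    -- or update:
    CREATE INDEX IX_Person_Notes ON dbo.Person (Notes);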

Decide on the best indexing strategies:

Appropriate indexes can make a world of difference in performance. SQL Server supports only two index types for most data types: clustered and nonclustered. SQL Server also supports full-text indexes and XML indexes, but those are relevant only for specific data types.

It is crucial to choose the appropriate column or set of columns for your clustered index, because the table's data is physically sorted by the values in the clustered index column or columns. You can create only a single clustered index on each table. Nonclustered indexes reference the clustered index keys (data values) to determine the physical location of each record.

It is recommended that you create the clustered index on columns that do not change often, are highly selective and have lean data types. In many cases, the clustered index on an identity column is the best choice because identity values are highly selective (each record has a unique identity value), are never updated and are built using SMALLINT, INT or BIGINT data types. However, it is not uncommon to find a table that is never queried based on its identity column. If so, carefully consider how the data is commonly retrieved, perhaps by a foreign key to another table or by a character column. Often, you can improve performance by creating the clustered index on the column or set of columns most frequently used for retrieving the data.

Some developers like to create composite clustered indexes, which span several columns whose combination uniquely identifies each record. This might sound like a good practice because the identity column has no business meaning, whereas other columns, such as hire date, department name and vehicle identification number, definitely translate into something immediately known by application users. However, from a performance perspective, you should avoid composite clustered indexes. Once again, the leaner the index, the faster SQL Server can scan or seek through it. You might find that for a small data set composite indexes perform relatively well, but as the number of users grows, you're bound to encounter problems.

After you see performance benefits from building appropriate indexes, you might think your work is finished. But as data is added, modified and removed from tables, the respective indexes become fragmented, and the higher the fragmentation, the less effective your indexes become. You'll need a plan for removing fragmentation from your indexes to ensure they remain effective. With prior versions of SQL Server, removing fragmentation from large indexes (tables with many millions of rows) often required downtime. Fortunately, SQL Server 2005 supports online index rebuilds that make your life considerably easier. Keep in mind, however, that rebuilding indexes still requires system resources and space in the tempdb database. If possible, schedule index maintenance during periods of minimum user activity. A sketch of both steps, creating the indexes and maintaining them, follows.
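Here is a minimal sketch under an assumed dbo.Rentals table: a lean clustered index on the identity key, a nonclustered index on the commonly queried column, then fragmentation checks with the sys.dm_db_index_physical_stats dynamic management function and an online rebuild:

    -- Hypothetical table: cluster on the lean, never-updated identity key;
    -- give the commonly filtered column a nonclustered index.
    CREATE TABLE dbo.Rentals (
        RentalID   INT IDENTITY(1,1) NOT NULL,
        CustomerID INT NOT NULL,
        RentalDate DATETIME NOT NULL
    );

    CREATE UNIQUE CLUSTERED INDEX CIX_Rentals_RentalID
        ON dbo.Rentals (RentalID);

    CREATE NONCLUSTERED INDEX IX_Rentals_CustomerID
        ON dbo.Rentals (CustomerID);

    -- Measure fragmentation...
    SELECT i.name, s.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.Rentals'), NULL, NULL, 'LIMITED') AS s
    JOIN sys.indexes AS i
        ON i.object_id = s.object_id
       AND i.index_id  = s.index_id;

    -- ...and rebuild online when it climbs too high. ONLINE = ON is the
    -- SQL Server 2005 feature mentioned above (Enterprise Edition only);
    -- the rebuild still consumes system resources and tempdb space.
    ALTER INDEX CIX_Rentals_RentalID ON dbo.Rentals
        REBUILD WITH (ONLINE = ON);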
