All ETL speeds are limited by something. Luckily, these factors are rather limited. The purpose of this article is to help you find your areas of opportunity to increase your ETL speeds using Microsoft SQL Server Integration Services 2012.
Careful timing and analysis of your ETL procedures can help you find ways to improve performance. Typically, the four types of improvement that exist are:
- Source System Speed
- Network speed
- Transformation Speed
- Load speed
While many solutions exist to improve performance, here a few techniques you might want to consider:
Source System Speed
ETL procedures can often put a great deal of stress on source systems, especially if you are using complex queries. Make sure your source system is properly indexed. Sometimes these systems are out of your control. Consider breaking up your ETL process by making copies of the source tables in your staging environment and then performing whatever queries you need to run.
While uncommon in the vast majority of modern applications, there still exist applications where network speeds and availability limit ETL speeds. In these cases, consider breaking out loads into smaller pieces, or using compressed flat files to transfer data.
There are many ways to decrease transformation times. The vast majority of these involve using your data flow components wisely, making the most of set-based operations. While designing your Data Flows, carefully consider your approach. Make sure you’re not using unnecessary components and that your most common lookups are properly cached. Sometimes, using a Cache Transform can greatly improve lookup speeds for large datasets.
Loading large tables can often be cumbersome. When working with a large data set, you need to ensure that your data can be loaded quickly and efficiently. Consider modifying the number of rows per batch on your destination component. Also, the effective use of table partitions can improve your load times. If you have a lot of late arriving facts and find yourself updating large amounts of data, consider using the Merge command.
When optimizing your packages, the best advice is to always go for the low-hanging fruit. This is almost always going to be your Data Flow Task.