This blog originally started as a discrete series of “how to” videos. Each video explained how to use specific elements of SQL Server Integration Services (SSIS). However, after producing a couple dozen videos, its probably timely to write a post that ties together the various videos to produce a cohesive syllabus or plan of attack for folks wanting to learn SSIS. This post will be broken up into 3 sections.
- Why use SQL Server Integration Services (SSIS?)
- Building blocks of a SSIS solution.
- Managing Errors & Getting help
1) Why SQL Server Integration Services (SSIS?)
SQL Server Integration Services (SSIS) is an ETL tool. ETL stands for Extract, Transform and Load. The primary purpose of SSIS is to 1) Extract data from one or more sources files (or databases). 2) Then Transform the data by aggregating, filtering or manipulating it in some way. 3) Then Loading the data into one or destinations in as a specific file format.
Therefore SSIS helps solve one of the most common problems that businesses face, the integration & manipulation of data stored in different locations.
One of the advantages of SSIS is that most solutions can be built with the Graphical user interface (GUI). The GUI means that an analyst without any programing skills can build a fairly comprehensive ETL solution (known as a SSIS package). An example of SSIS package might be a process that “cleanses customer leads” that would involve the following elements.
a) Connecting to a daily leads file & importing the leads into SSIS.
b) Filtering out leads that don’t meet the qualifying criteria (such as non-qualifying segments etc.).
c) Loading the washed “clean leads” into a destination file (such as an CSV file) so that they can be loaded into a CRM system and agents can start calling the leads.
Because SSIS skills are fairly easy to develop it means that a number of analysts within a company can easily edit or update an ETL solution built by another analyst. This eliminates a single point of failure, such as if an analysts leaves the business.
The other good thing about Microsoft SQL Server in general is that its a great Business Intelligence tool full of features. This is reflected in the 2011 results of Gartner’s Magic quadrant analysis of Business Intelligence vendors & tools. Microsoft’s SQL Server ranks very well against other BI tools for both “Ability to Execute” & “Completeness of Vision”. So if you are looking at developing Business Intelligence skills, you can’t go too far wrong with SQL Server for tradition BI.
To give you a taste of what integration services is about, here is a 15 minute video clip that provides a basic overview of SSIS. It demonstrates how to build a simple SSIS package.
If you don’t have a copy of SQL Server, here is a video overview of 1) how to get a copy of SQL server 2) what version of SQL to use & 3) how to install it. http://www.youtube.com/watch?v=piY2_X019BQ.
If you want to buy a copy of SQL Server Developer Edition to learn how to use SQL you can get a copy from Amazon (by clicking on this link) – last time I looked they cost about $50.
2) Building blocks of a SSIS solution.
As mentioned above, SSIS is an ETL tool and performs 3 functions: Extracting, Transforming and Loading data. Here is more information on each of these steps.
a) Extract (Data Flow Sources). This step extracts data from 1 or more files. Different “Data flow sources” are used to connect to different file types. For example a “flat file connection manager” is used to connect to CSV or Text files, an “Excel connection manager” is used to connect to Excel files, an ADO.net connection manager for connecting to a SQL database tables etc.
b) Transform (Data flow transformations): There are dozens of transformations that can be used in SSIS. The following 8 are perhaps the commonly used. Therefore most solutions can be built using the following transformations.The list below contains links to YouTube videos demonstrating how to use each transformation. Some of the links do cover multiple transformations.
- Data Conversion (covered in the same video as the aggregate function).
- Derived Column (Covered in the standardizing date formats video)
- Merge Join
- Conditional Split (Covered in the 3 part 15 minute Overview video above).
- Sort (Covered in the Merge Join video).
- Aggregate (Covered in the Aggregate and Data conversion Video)
- Union ALL
An explanation of each these 8 transformations are provided in Table 1 below.
c) Load (Data Flow Destinations). The function is the same as a source connection manager, but rather than piping data into SSIS, it can pipe pipes data off to a range of different file types and destinations.
Table 1. Key data flow transformations.
Transformation Description Example of use
Merge Join Joins two data sources using a common field (join key). Prior to joining the join key needs to be sorted. Linking a list of customer leads to a master customer database using the customer's phone number.
Conditional Split Accepts an input and determines which destination to pipe the data into based on the result of an expression. Filtering a customer’s location or product holdings to restrict a dataset.
Sort Sorts data in ascending or descending order. Primarily used before a "Merge Join” transformation. The join key in both data sources need to be first sorted for a Merge Join to function to work.
Data Conversion Converts input columns from one data type to another. For example Text to Numeric formats. This transformation is useful when importing CSV or .txt files where, by default, all of the imported fields are in a string format. Converting currency data in a CSV file into currency format so that one currency field can be added to another.
Multicast Transformation that pipes the entire dataset to multiple destinations. Enables one copy of a customer leads file to be loaded into a CRM System, and another copy into a customer leads database
Derived Column Calculates new column value based on an existing column or multiple columns. Removing leading and trailing spaces from a column. Add title of courtesy (Mr, Mrs, Dr, etc.) to the name. Restructure dates into the correct format.
Aggregate Calculates aggregations such as SUM, COUNT, AVG, MIN and MAX based on the values of a given numeric column. Note MIN, MAX, & AVG functions don't work on columns formatted as text. Often data is converted into numeric or currency format prior to an AGGREGATE transformation. Calculating the SUM of purchases that a customer has made.
Lookup Performs a similar function as a Merge Join, but is more efficient & doesn't require pre-sorted inputs. Merging an Excel data source with a SQL Table
Union ALL Appends two or more datasets together. I.e. it pastes one dataset below one or more other datasets. Appending 5 days of customer’s sales or leads into a single file.
3) Managing Errors & Getting help
a) Truncation Errors in SSIS
Like all BI tools, so will get some errors in SSIS that you will need to fix. The most common error I have found is a truncation error. This is a problem stemming from the connection manger. The error is first usually identified when a connection manager turns red (see Figure 1). Truncation errors are easy to fix and a video demonstrating the simple fix is provided here: http://www.youtube.com/watch?v=8gBYOdI8gfI
b) Getting help with SSIS
You will eventually strike a problem you can’t figure out. When this happens the Microsoft MSDN forum for SQL Server Integration Server is a great resource. It is a free community based forum and most questions are answered within a few hours or even a few minutes. The forum address is
If a solution is provided, but the answer is too technical for me, I tend to use Elance.com for outsourcing small technical jobs. A demonstration of how I use a combination of the MSDN forum & Elance.com is provided here:
Well thats it- an overview of how to learn SQL Server Integration Services. Get hold of a copy of SQL and start building!