Analytics in Action

Learning SSIS

This blog originally started as a discrete series of “how to” videos. Each video explained how to use specific elements of SQL Server Integration Services (SSIS). However, after producing a couple dozen videos, its probably timely to write a post that ties together the various videos to produce a cohesive syllabus or plan of attack for folks wanting to learn SSIS. This post will be broken up into 3 sections.

  1. Why use SQL Server Integration Services (SSIS?)
  2. Building blocks of a SSIS solution.
  3. Managing Errors & Getting help

 

1)    Why SQL Server Integration Services (SSIS?)

SQL Server Integration Services (SSIS) is an ETL tool. ETL stands for Extract, Transform and Load. The primary purpose of SSIS is to 1) Extract data from one or more sources files (or databases). 2) Then Transform the data by aggregating, filtering or manipulating it in some way. 3) Then Loading the data into one or destinations in as a specific file format.

Therefore SSIS helps solve one of the most common problems that businesses face, the integration & manipulation of data stored in different locations.

One of the advantages of SSIS is that most solutions can be built with the Graphical user interface (GUI). The GUI means that an analyst without any programing skills can build a fairly comprehensive ETL solution (known as a SSIS package).  An example of SSIS package might be a process that “cleanses customer leads” that would involve the following elements.

a)     Connecting to a daily leads file & importing the leads into SSIS.

b)     Filtering out leads that don’t meet the qualifying criteria (such as non-qualifying segments etc.).

c)    Loading the washed “clean leads” into a destination file (such as an CSV file) so that they can be loaded into a CRM system and agents can start calling the leads.

Because SSIS skills are fairly easy to develop it means that a number of analysts within a company can easily edit or update an ETL solution built by another analyst. This eliminates a single point of failure, such as if an analysts leaves the business.

The other good thing about Microsoft SQL Server in general is that its a great Business Intelligence tool full of features. This is reflected in the 2011 results of Gartner’s Magic quadrant analysis of Business Intelligence vendors & tools. Microsoft’s  SQL Server ranks very well against other BI tools for both “Ability to Execute” & “Completeness of Vision”.  So if you are looking at developing Business Intelligence skills, you can’t go too far wrong with SQL Server for tradition BI.

To give you a taste of what integration services is about, here is a 15 minute video clip that provides a basic overview of SSIS. It demonstrates how to build a simple SSIS package.

Part 1 http://www.youtube.com/watch?v=tPdUYpi-m10&feature=related

Part 2 http://www.youtube.com/watch?v=mVrqYFkyoGo&feature=related

Part 3 http://www.youtube.com/watch?v=GVdKBs4Albc&feature=related

If you don’t have a copy of SQL Server, here is a video overview of 1) how to get a copy of SQL server 2) what version of SQL to use & 3) how to install it. http://www.youtube.com/watch?v=piY2_X019BQ.

If you want to buy a copy  of SQL Server Developer Edition to learn how to use  SQL you can get a copy from Amazon (by clicking on this link) – last time I looked they cost about $50.

2)   Building blocks of a SSIS solution.

As mentioned above, SSIS is an ETL tool and performs 3 functions: Extracting, Transforming and Loading data. Here is more information on each of these steps.

a)  Extract (Data Flow Sources). This step extracts data from 1 or more files. Different “Data flow sources” are used to connect to different file types. For example a “flat file connection manager” is used to connect to CSV or Text files, an “Excel connection manager” is used to connect to Excel files, an ADO.net connection manager for connecting to a SQL database tables etc.

 

b) Transform (Data flow transformations): There are dozens of transformations that can be used in SSIS. The following  8 are perhaps the commonly used. Therefore most solutions can be built using the following transformations.The list below contains links to YouTube videos demonstrating how to use each transformation. Some of the links do cover multiple transformations.

  1. Data Conversion (covered in the same video as the aggregate function).
  2. Derived Column (Covered in the standardizing date formats video)
  3. Merge Join
  4. Conditional Split (Covered in the 3 part 15 minute Overview video above).
  5. Sort (Covered in the Merge  Join video).
  6. Multicast
  7. Aggregate (Covered in the Aggregate and Data conversion Video)
  8. Union ALL
  9. Lookup

An  explanation of each these 8 transformations are provided in Table 1 below.

c)  Load (Data Flow Destinations). The function is the same as a source connection manager, but rather than piping data into SSIS, it can pipe pipes data off to a range of different file types and destinations.

Table 1. Key data flow transformations.

TransformationDescriptionExample of use
Merge JoinJoins two data sources using a common field (join key). Prior to joining the join key needs to be sorted.Linking a list of customer leads to a master customer database using the customer's phone number.
Conditional SplitAccepts an input and determines which destination to pipe the data into based on the result of an expression.Filtering a customer’s location or product holdings to restrict a dataset.
SortSorts data in ascending or descending order.Primarily used before a "Merge Join” transformation. The join key in both data sources need to be first sorted for a Merge Join to function to work.
Data ConversionConverts input columns from one data type to another. For example Text to Numeric formats. This transformation is useful when importing CSV or .txt files where, by default, all of the imported fields are in a string format. Converting currency data in a CSV file into currency format so that one currency field can be added to another.
MulticastTransformation that pipes the entire dataset to multiple destinations.Enables one copy of a customer leads file to be loaded into a CRM System, and another copy into a customer leads database
Derived ColumnCalculates new column value based on an existing column or multiple columns.Removing leading and trailing spaces from a column. Add title of courtesy (Mr, Mrs, Dr, etc.) to the name. Restructure dates into the correct format.
AggregateCalculates aggregations such as SUM, COUNT, AVG, MIN and MAX based on the values of a given numeric column. Note MIN, MAX, & AVG functions don't work on columns formatted as text. Often data is converted into numeric or currency format prior to an AGGREGATE transformation.Calculating the SUM of purchases that a customer has made.
LookupPerforms a similar function as a Merge Join, but is more efficient & doesn't require pre-sorted inputs.Merging an Excel data source with a SQL Table
Union ALLAppends two or more datasets together. I.e. it pastes one dataset below one or more other datasets.Appending 5 days of customer’s sales or leads into a single file.

3)   Managing Errors & Getting help

 

a)    Truncation Errors in SSIS

Like all BI tools, so will get some errors in SSIS  that you will need to fix. The most common error I have found  is a truncation error. This  is a problem stemming from the connection manger. The error is first usually identified when a connection manager turns red (see Figure 1). Truncation errors are easy to fix and a video demonstrating the simple fix is provided here: http://www.youtube.com/watch?v=8gBYOdI8gfI

Figure 1

b)    Getting help with SSIS

You will eventually strike a problem you can’t figure out. When this happens the Microsoft MSDN forum for SQL Server Integration Server is a great resource. It is a free community based forum and most questions are answered within a few hours or even a few minutes.  The forum address is

http://social.msdn.microsoft.com/forums/en-us/sqlintegrationservices/threads/

If a solution is provided, but the answer is too technical for me, I tend to use  Elance.com for outsourcing small technical  jobs.  A demonstration of how I use a combination of the MSDN forum & Elance.com is provided here:

http://www.youtube.com/watch?v=ytYbG1lUOVQ

Well thats it- an overview of how to learn SQL Server Integration Services. Get hold of a copy of SQL and start building!

25 Responses to “Learning SSIS”

  1. Sadik

    Thanks SteveFox, Information is userful

  2. thank so much it very good learning

  3. kumar

    Awesome!!!Really helpful

  4. Sivakumar

    Thanks a lot steveFox. You are great person

  5. Your Name

    Very helpful. Easy to learn.
    Thank you SteveFox.

  6. Satyam From India

    great material on BI thanx. Easy to grab !

    • admin

      Thanks – Glad you found the videos useful.
      Steve

  7. Thanks..it’s very useful information for beginners(like me) It’s really appreciated.

    • admin

      Thanks for the comment, great to hear that the videos helped.

  8. sunita

    Thank you that videos are really helpful.

  9. Leo

    Found your site to be very helpful. Very well executed. Looking forward to visiting your often. Thank you.

    • admin

      Thanks Leo, Glad you found the site useful.

      Cheers
      Steve

  10. Aishya

    Thanks for the videos. Very very useful.

  11. Joseph

    Steve, I found this to be very useful and insightful. I have been wanting to learn more about SQL server reporting for a long time, but only recently got the time. I am pretty knowledgeable on Access and can see similar functionality between the querying, joins, reporting, etc tools in SSIS and Accesss. I saw simialr functionality in the Cognos Reporting Studio as well. I have seen the same functionality in other BI tools too. What in your mind sets SSIS apart from the others? To me, it seems like this is more flexible and someone with less IT knowledge can utilize it, but I was interested in your thoughts on the subject. Thanks again. Love the videos.

    • admin

      Hi – yes you are right, much of the functionality of SSIS (joins etc) is similar to other BI tools. However, there are some advantages of a specialist ETL (extract transform load) tools like SSIS over tools such as MS Access. These include

      1) Better auditing of errors /mistakes. A simple example is the each time data passes through a separate transformation a count of rows is shown. This helps identify stages where an error may be stopping data data from flowing through the package. Another example is the redirection of errors to say a separate file. Also if an error occurs a transformation will turn red, this identifies at what stage a problem or error occurred. There are lots of similar examples.

      2) SSIS integrates natively with other SQL server based tools/services (such as Reporting services, Analysis services, SQL Server Agent, the core SQL Data base). Often solutions I design also use these other services. So for example we might use SSIS to extract data out of a Telco database and load it into our SQL Server Analysis Services data warehouse. This load is scheduled via SQL Server Agent to occur at 3 am when the server load is low. We also use SQL Server Analysis Services to build a predictive / machine learning model to predict which Telco customer in the dataset will churn. All of these tools are packed rather economically into a std edition of SQL that has a starting cost of say c. $1200

      3) Scale. MS access is limited to something like 2 GB of storage. No size limitations exist with SSIS /SQL.

      4) SSIS can be easily extended with code (via script transformations etc.)

      Hope this helps
      Regards
      Steve

  12. great executed task

  13. Hi ,
    I am new to SSIS ..I am really surprised i have seen 1000’s of ssis videos in web no one provided valuble information like you some are there but not as you .Here I got the valuble information thanks alot

    • admin

      Thanks for the comment. Great that you found the information useful. Cheers Steve

  14. Cliff Hardy

    I found you site to be very easyto understand and informative.

  15. ayja

    After you create the package how do you use that package to insert new rows into the db table?

    • admin

      Hi
      It will depend a lot on the situation. If you run the package again, the package will append data directly to the same table. So this can be used to load new rows. However, in most situation this would result in duplicate rows in your DB. If you need to apply some more advanced logic, such as deleting out the old data from the DB each time you want to run your package, you will probably need to use “T-SQL”. A “TRUNCATE” SQL statement could be used to remove all of the old rows in the DB, prior to re-running the package.

      An easy way to use T-SQL in SSIS is to use the “Execute SQL Task” in the “control flow”. If you don’t have experience with T-SQL you may find my post on T-SQL useful. Here is the link http://www.analyticsinaction.com/tsql/

      Hope this helps
      Regards
      Steve

  16. Jéfte Reis

    Awesome job. I´m taking B.I classes offered from Microsoft for free. I passed several tests in order to take these classes for free. Out of 3000 people I´m one of finalists 20 people left. Just wanna say that your tutorials are the most clear to understand I need to build a final project and I´m taking loads of useful information from you. Thank you so much keep up what you´re doing for the I.T community.
    From Brazil. Cheers.

    • admin

      Hi Jefte, Thanks for the very kind words. All the best with your final project. Cheers Steve