One of the biggest challenges for analysts is to cleanse, transform & join multiple datasets so that the data can be used for reporting, analysis or modelling. This process is known as a data integration or in more technical terms as an Extract, Transform, and Load (ETL) process. Without the right software ETL is time consuming and painful, particularly when the data is required on a scheduled basis such as for monthly reporting.
SQL Server Integration Services (SSIS) is one of the more common & user friendly ETL tools. Its a common tool as SQL server databases are probably one of the most common databases used in business. As SSIS is a free component of the Standard & Enterprise editions many businesses have the tool. Therefore its a logical ETL tool for many businesses. I am, however, continually surprised how many businesses have data integrating challenges, but don’t realize the already own a tool that can help solve their pain.
Many analysts perceive Microsoft SQL Server to be a specialist tool, that only developers and users comfortable writing coding can use. This may have had some truth with the very early releases (7 & 2000) , but now its a very user friendly graphical user interface (GUI). Much of the functionality in SSIS is no more difficult than similar tasks in Excel. The advantage of a specialist ETL tool such as SSIS over a tool such as excel is that the process becomes much more flexible, scalable, auditable & repeatable.
The video associated with this post demonstrates how to build a basic SSIS ETL process using the “drag & drop” graphical user interface. The video shows how to
- Open SQL Server Integration Services (SSIS).
- How to navigate the Integration Development Environment (IDE) where you can create a simple ETL (Extract Transform Load) process.
- Build “Connection Managers” that connect Integration services to data sources. This is the Extract part.
- Filter data with a Conditional Split transformation. This is the Transform Part
- Send data to a destination location. This is the Load part.
Subsequent posts explore additional “data transformation” tasks such as filtering, aggregating, and deriving data.