You come across all sorts of data when working with SSIS. If you can get data from a relational source or a simple data file format your work is greatly simplified. But sometimes you get file extracts from legacy or proprietary systems you don’t have much control over. One such format is the Header-Detail-Trailer type of file.
Here I have a Pizza store’s daily sales extract (pizza.txt), a pipe delimited file with a header, details and footer.
Header H has a date representing the sales date and the name of the file.
Details D have the store id, type of pizza followed by volume sales and prices for S M L size pizzas.
Trailer T has a count of detail rows. |
 |
So how do we get this source data loaded into a destination table on SQL Server as defined below? The LoadDate is only in the 1st header row, we don’t want to load the header or trailer rows and don’t want the D identifier for detail rows.
CREATE TABLE LoadPizza(
LoadDate DATE,
StoreId INT,
PizzaName NVARCHAR(30),
SoldSmall INT,
SoldMed INT,
SoldLarge INT,
PriceSmall MONEY,
PriceMed MONEY,
PriceLarge MONEY)
Open BIDS (Visual Studio 2008) and create a new Integration Services Project, lets rename the default package to Pizza.dtsx. Drop a Data Flow Task on the Control Flow. The Data Flow is where all the work is performed. The final solution will look something like this

You can see that the 18 rows from pizza.txt (1 header, 16 details, 1 footer) loads the data (ie the detail rows) into the table LoadPizza, obtaining LoadDate from the Header record, followed by only the detail record data. Why the use of 3 Row Count data flow items (DetailRows, HeaderRows, TrailerRows)? Without these the header and footer would also appear in the destination table as rows (nulls allowed in all columns) since they all flow through the pipeline to the OLE DB destination unless they are somehow redirected in the flow.

Start by dragging a Flat File Source onto the Data Flow design surface. The first thing to do is set up a New connection to point to the pizza.txt file (right click Edit) that contains the data source via the Browse button and name the Connection manager.
Select the format Ragged right, each row being delimited by a CR/LF and we do not want to skip any rows. We could skip the 1st row which is the header but it contains the Date representing the pizza sales we will need to insert to our target table. If we skip this row it will never be available in the pipeline. SSIS gives a warning that Columns are not defined for this connection manager. That’s fine we’ll do this later.

Click on Columns. For Ragged Right formats you can actually click on the “Ruler” at points in the file to create columns. We cant do this here as Header, Trailer and detail rows have different structures, we will use a Script component later to split the row up.

|
In Advanced options, rename the column to PizzaRow and increase the size of the OutputColumnWidth to 100 |
 |
|
If we preview the Rows, we can see each file row is represented in one column, all the way down to the trailer T|16| record. |
 |
We need to drag a Script Component to the design surface to break this up. When asked to Select Script Component Type, choose Transformation – the script is a transformation in the data flow and operates on data from input columns and provides output columns. Join the Flat File Source Path to the Script Component. You get an error icon as no code yet exists.
Add 3 Row Count items under the Script Component to the design surface and name as shown by using the Properties pane. Add an OLE DB Destination to the design surface. We could use an SQL Server Destination as well for better performance if the package runs on the same server as the destination table, otherwise OLE DB is a safer choice.
Link the Script Component SRC_Pizzas to the first Row Count item - DetailRows. When you create a Script component as a transformation, by default it has a single input and output flow.
You will only see one output flow since currently there is only 1 output path. We’ll add more Output paths later so we can link the Header and Trailer Row Count items. |
 |
|
Since Row Count items need to store count values (or you’ll get an error), define 3 count variables via menu SSIS / Variables
|
 |
Edit the Script Component. Configuring the input and outputs for the component is one of the steps that you must complete in metadata design mode, by using the Script Transformation Editor, before you write your custom script code.
|
Input Columns – check PizzaRow as the Available Input Column. Remember this is what we named the single column in the Advanced properties of SRC_Pizzas.
|
 |
|
Input and Outputs - The initial view looks like this.
Click and Rename Input 0 to PizzaRows via the properties window that appears.
Click and Rename Output 0 to DetailRows.
|
 |
Expand DetailRows, highlight Output Columns and click the Add Column button. Do not mistakenly click Add Output, here we want to add all the columns this script component will send “Out” in its data flow when we write the script code, we are just defining the ‘column structure’ to match what our target destination expects. For each column you add, make sure the ‘folder’ Output Columns is highlighted in the tree view first, then click Add Column to ensure they are added in order.
Once all the DetailRow output columns are added, click each column and set the data type as defined in the SQL Server table, eg LoadDate is set to DT_DATE

In particular the PizzaName is set to Unicode string DT_WSTR and the prices to currency DT_CY. SSIS will do the type conversion for you but we’ll need to convert the Date ourselves since our header has it as 20110606 yyyyMMdd format.
|
So we have just laid out the structure of what will flow OUT of the Script Component SCR_Pizzas INTO the DetailRows output. But what about HeaderRow and TrailerRow – they don’t have the ability to join any Paths to them?
|
 |
Lets get rid of the 3 errors first in the Row Count items – “The variable (null) specified by VariableName property is not a valid variable. Need a valid variable name to write to”
This is done by clicking each Row Count item and selecting the appropriate User:variable name we setup earlier.
|
 |
If you’re still with me and haven’t yet fallen asleep or died or boredom, grab a coffee, fun stuff ahead…
Edit the Script task ‘Inputs and Outputs’ again. We now add two new Outputs, HeaderRow and TrailerRow. Click Add Output, rename it to HeaderRow and set its SynchronousInputId, via the drop down, to PizzaRows. Add another Output and rename it to TrailerRow, set the SynchronousInputId as above. We could of used only one combined HeaderTrailer output as well, instead of individual outputs.

Add a string Output column to each Output (click Output Columns then Add Column). These will become visible inside the script code and contain our header and trailer data. Set the data type of these columns to string.

Set all Output Rows (Detail, Header, and Trailer) to have a non-zero Exclusion Group (set them all to 1)

See http://msdn.microsoft.com/en-us/library/ms136114.aspx for more information about Synchronous Transformation and Exclusion Groups. Basically this will provide a method in the script code to redirect rows based on conditions you can check eg Row.DirectRowToHeaderRows(); Without specifying the Exclusion Group, these methods are not available inside the code window.
|
Click OK, a new arrow appears under SRC_Pizzas, connect this to HeaderRow.
It asks you to make a connection since you defined 2 new Output flows, pick HeaderRows
|
 |

|
Click SCR_Pizzas again, another arrow appears, join this to TrailerRow (no prompt appears as it’s the last available Output flow). The design surface should look like this
|
 |
Let’s take a break from Script task, we haven’t written any code yet but we’ll get there. Join the arrow from DetailRows Row Count item to OLEDB Pizza. Edit the OLE DB Destination to point to the target table in a database of your choice, see below (of course the table must exist in your DB).

Now when you click Mappings you will see the Output columns from the DetailRow Count appear as Available Input columns that line up to the target table columns. SSIS can derive this by matching column names. Notice PizzaRow in Available Input Columns, our large row string, goes nowhere. This is what we want as in a minute we will break up this row to setup the new columns we added manually in the Script Transformation Editor.

|
Finally we write some code to remove the final warning in SRC_Pizza. So far we have defined ‘new’ output columns in SRC_Pizza and have 3 Output “paths” we will send Header, Detail and Trailer rows.
|
 |
|
Right click SRC_Pizza and select Edit. Depending on your default language you could have Visual Basic 2008 or C#. Click the Edit Script button.
This will launch the code editor.
|
 |
Declare some variables up top
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
DateTime processDate;
string[] token_arr;
bool trailer;
const string DELIM = "|";
Most of the code goes here
public override void PizzaRows_ProcessInputRow(PizzaRowsBuffer Row)
{
if (processDate.Year == 1)
{
// We have not initialise the date yet (default 01/01/0001)
ProcessHeader(Row.PizzaRow);
Row.DirectRowToHeaderRows();
}
else if (!trailer)
{
// We are now on subsequent rows
token_arr = Row.PizzaRow.Split(DELIM.ToCharArray());
if (token_arr[0] == "T")
{
trailer = true;
Row.DirectRowToTrailerRows();
}
else
{
// Detail Row assumed (could validate later)
Row.LoadDate = processDate;
Row.StoreId = Convert.ToInt32(token_arr[1]);
Row.PizzaName = token_arr[2];
Row.SoldSmall = Convert.ToInt32(token_arr[3]);
Row.SoldMed = Convert.ToInt32(token_arr[4]);
Row.SoldLarge = Convert.ToInt32(token_arr[5]);
Row.PriceSmall = Convert.ToDecimal(token_arr[6]);
Row.PriceMed = Convert.ToDecimal(token_arr[7]);
Row.PriceLarge = Convert.ToDecimal(token_arr[8]);
Row.DirectRowToDetailRows();
}
}
}
private void ProcessHeader(string input)
{
token_arr = input.Split(DELIM.ToCharArray());
if (token_arr[0] == "H")
{
// Date is 2nd array value
processDate = DateTime.ParseExact(token_arr[1], "yyyyMMdd", null);
}
}
Since you have defined the output columns with their corresponding data types, intellisense knows about the columns and their data types in the script code.
PizzaRows_ProcessInputRow is fired for each row in the buffer, so you don’t need any looping yourself. if (processDate.Year == 1) relates to the header row since at this iteration, the processDate has not been set and defaults to the year 0001 for a DateTime variable. We grab the date off the header by splitting the row string into an array and converting the 2nd subscript (or [1]) to a date. We are done with the Header as it is redirected to the HeaderRow flow and will not flow through to the DetailRows flow, avoiding insertion into the destination table.
The next iterations could be details or trailer. If we haven’t yet found the trailer, we check for it and if found set a module level flag trailer = true and redirect the flow.
Otherwise the row must be a Detail Row, which we have again split into an array and assign the relevant subscripts to the Output columns we defined for DetailRows in the Script Transformation Editor.
Running the package gives the results shown at the beginning of this post. The package could of course be improved with the use of variables for filenames and connection managers and better error handling logic but the intention was to focus on how you might approach a solution with a data file like the sample pizza.txt header, detail, trailer rows.
Code sample here (21Kb)