SAS Transport File Parser

June 30, 2009

In my current consulting project, I’m working with a lot of SAS transport files (about 120, for a total of 30 GB). I need to get the data into a format that is usable by dnAnalytics and QuantPro. I usually use DBMS/Copy or StatTransfer to convert SAS files to CSV and then go from there. In this case, I thought I would just write a SAS transport file parser and add it to QuantPro. This would allow me to import the SAS data directly into QuantPro.

The XPORT file format is really simple, but it has some minor gotchas. First, the data is stored in big-endian byte order. Second, all floating point numbers are stored in an old IBM mainframe format (not IEEE 754). This format is not supported by .NET, but it was simple enough to create a converter (though I probably didn’t do it in the most efficient manner).
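To make the second gotcha concrete, here is a minimal sketch of such a conversion. It is not the converter that ships in QuantPro; the `IbmFloat` class and `ToDouble` method are names made up for illustration. It assumes a full 8-byte field in IBM hexadecimal floating point: one sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction, all in big-endian order.

```csharp
using System;

// Hypothetical helper for illustration; not part of QuantPro.
public static class IbmFloat
{
    // Converts 8 big-endian bytes in IBM hexadecimal floating point
    // format to an IEEE 754 double.
    public static double ToDouble(byte[] bytes)
    {
        if (bytes == null || bytes.Length != 8)
            throw new ArgumentException("Expected exactly 8 bytes.", "bytes");

        // Bytes 1..7 hold the 56-bit fraction, most significant byte first.
        ulong fraction = 0;
        for (int i = 1; i < 8; i++)
            fraction = (fraction << 8) | bytes[i];

        // A zero fraction means zero. Assumption: this sketch does not
        // detect SAS missing values, which would need a separate check
        // on the first byte before calling this method.
        if (fraction == 0)
            return 0.0;

        int sign = (bytes[0] & 0x80) != 0 ? -1 : 1;
        int exponent = (bytes[0] & 0x7F) - 64; // power of 16

        // value = sign * (fraction / 2^56) * 16^exponent
        //       = sign * fraction * 2^(4 * exponent - 56)
        return sign * (double)fraction * Math.Pow(2.0, 4 * exponent - 56);
    }
}
```

For example, the bytes `41 10 00 00 00 00 00 00` decode to 1.0, and `C2 76 A0 00 00 00 00 00` decode to -118.625. The 56-bit fraction is wider than the 53 bits a double can hold, so the lowest bits may be rounded away.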

The parser has been added to the QuantPro.IO.Sas namespace. It can be used as follows to copy variables from a SAS XPORT file into a QuantPro data store:

var reader = new TransportFileReader("data.xpt");
foreach( var member in reader.Members ){
  // one QuantPro store per SAS dataset in the transport file
  using( var qpFile = new ZipFileStore(member.DatasetName) ){
    var variables = reader.GetVariables(member.DatasetName);
    foreach( var variable in variables ){
      qpFile.Add(variable);
    }
  }
}

The parser supports multiple data members but ignores format labels. Data in SAS transport files is stored as either a string or a floating point number. Format labels can then be used to convert the floating point numbers into integers or dates. The parser only returns string or double variables; it is up to the user to do any conversion. I’ll be adding support for format labels sometime in the future.
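Until format labels are supported, the conversion has to be done by hand. The sketch below shows roughly what that could look like. `SasValueConverter` and its methods are hypothetical names, not part of QuantPro; the code relies on the SAS convention that dates count days, and datetimes count seconds, from January 1, 1960.

```csharp
using System;

// Hypothetical helper for illustration; not part of QuantPro.
public static class SasValueConverter
{
    // SAS dates and datetimes are counted from January 1, 1960.
    private static readonly DateTime SasEpoch = new DateTime(1960, 1, 1);

    // A SAS date value is a number of days since the epoch.
    public static DateTime FromSasDate(double days)
    {
        return SasEpoch.AddDays(days);
    }

    // A SAS datetime value is a number of seconds since the epoch.
    public static DateTime FromSasDateTime(double seconds)
    {
        return SasEpoch.AddSeconds(seconds);
    }

    // Rounds to the nearest integer; throws OverflowException if the
    // value does not fit in an Int32.
    public static int ToInt32(double value)
    {
        return checked((int)Math.Round(value));
    }
}
```

For example, `SasValueConverter.FromSasDate(18078)` gives June 30, 2009. Which conversion applies to a given variable still has to come from knowledge of the dataset, since the parser does not expose the format labels.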

A little aside about DBMS/Copy. It looks like the utility is no longer available. SAS bought out Conceptual (the company that produced DBMS/Copy), but I cannot find it on the SAS site. That is a shame, since it was probably the best tool for converting data from one format to another. Maybe it’s time to create QuantPro Transfer, a free tool with functionality similar to DBMS/Copy and StatTransfer.


Tinkering with NDepend

June 12, 2009

I was curious about NDepend so I thought I would give it a spin. Wow – it could do much more than I thought. Three features I found very useful are project dependencies, build comparisons, and static analysis (NDepend has many more features). Project dependencies let you see how every detail of [...]

Read the full article →

Generic Matrices with Pluggable Storage

June 3, 2009

[Update: The code below will perform roughly the same as a non-generic version on the x86 CLR, but is about 30% slower on the x64 CLR. This is due to the fact that the x64 CLR doesn't inline methods with struct parameters.]
I’ve spent the last several days trying to create a generic matrix class, Matrix<T>. [...]

Read the full article →

ERF Time Series Page Requirements

May 14, 2009

This project has been pushed back a bit. I’ve been busy working on the dnAnalytics parallel Map code and the QuantPro data storage. I’ll get back to the ERF rewrite the first week of June. I wanted to start hashing out the site’s requirements. Users are most interested in viewing, charting, [...]

Read the full article →

ERF Data Model Update

May 5, 2009

While writing the data acquisition programs, I ran into a couple of issues with the current model. Most notably, NHibernate will not persist null values for a list. This looks like an NHibernate optimization, since it keeps the values ordered with an index column. So when recreating the list, it knows where to put [...]

Read the full article →