Archive for September 17th, 2007



Perl ETL: Surrogate Key Lookup Caching

Here’s a tip I picked up from a co-worker (thanks Matt H.) a few years back during a big ETL rewrite he and I were doing for the company I worked for. First off, I don’t recommend doing ETL in Perl. Perl is interpreted, which makes it a little slower, and it also can become a maintenance nightmare. I love Perl but coding ETL in Perl is exhausting. I’d recommend using something like DataStage, Informatica, or Data Integrator instead. Those programs are expensive though, and if your company is on a tight budget they might not spring for any of those tools. Perl is free and that really is one of the cheapest ways to go about this. We’ll proceed under the premise that you’re left with no other options.

This code snippet can be used for a fact table load job. I’m presuming a working knowledge of Perl, ETL, SQL, and dimensional modeled (star schema) databases here. This also presumes you have knowledge …read more »