Slowly Changing Dimension Lookup
Looks up records in a slowly changing dimension table
Processing
For every input row the step retrieves a record corresponding to given business keys and effective date from the dimension table.
Settings
Name | Type | Description |
---|---|---|
Connection |
dict |
The database connection to use. Evaluated for each input row |
Schema |
string |
The database schema of the table. Evaluated for each input row |
Table |
string |
Name of the dimension table in the database. Evaluated for each input row |
Effective Date |
datetime |
The effective date for which to look up the dimension record. Evaluated for each input row |
Fields | ||
Primary Key |
string |
The name of the primary key field in the database table. Evaluated once when step initializes |
Business Keys |
The fields and values making up the business keys of a record. These fields are used to identify the business entity to retrieve from the dimension table. Evaluated for each input row |
|
Dimension Fields |
These fields are fetched from the database in addition to the primary key of the dimension record. They are available as Evaluated once when step initializes |
|
Date Effective Field |
string |
The name of the table field holding the effective date of a record. Evaluated once when step initializes |
Date Expired Field |
string |
The name of the table field holding the expiry date of a record. Evaluated once when step initializes |
Cache | ||
Use Cache |
boolean |
Use a in-memory cache to hold table rows. Evaluated once when step initializes |
Cache Size |
long |
Number of records cached
Evaluated once when step initializes |
Pre-load Cache |
boolean |
If Evaluated once when step initializes |
Pre-load SQL Filter |
string |
If non-empty, this query text is appended as an AND condition to the cache pre-loading query. It should contain a predicate to append to the where clause of the query, for example: This is useful to selectively pre-warm the cache with a specific subset of records. Evaluated once when step initializes |
Pre-load SQL Parameters |
list |
If the pre-load cache filter contains Evaluated once when step initializes |
Bloom Filter | ||
Use Bloom Filter |
boolean |
If During processing, the bloom filter is then used to determine whether any records with given business keys are present in the table. The filter is most useful when few records are expected to actually exist in the dimension table. It allows skipping the corresponding lookup query. Evaluated once when step initializes |
Bloom Filter Size |
long |
The expected number of distinct business keys the bloom filter should be able to handle. This number is used to ensure the bloom filter operates with an expected false positive rate of 5%. That is 5% of the time the filter might indicate that a set of business keys might be present in the table, when in fact it is not. This number should be close to the number of distinct business keys in the dimension table. Evaluated once when step initializes |
Results
Name | Type | Description |
---|---|---|
key |
long |
Primary key for looked up dimension record. If no dimension record was found for the given business key and effective date, this value is |
record |
dict |
All fields looked up in dimension record. This value contains the primary key, business keys, and any fields specified as dimension fields. If no dimension record was found for the given business key and effective date, this value is |