Diff on Sorted Keys
Diffs two sorted streams of rows
Processing
The step accepts two streams of rows, coming from two input gates: `in` and `reference`.
Input rows are matched to reference rows based on key fields. The step uses a diffing algorithm that requires both row streams to be sorted on the key fields in ascending order, with nil values, if any, last.
- if an input row has a matching reference row, it is either `identical` or `changed`, depending on whether the data fields match as well
- if an input row has no matching row in the reference stream, it is considered `new`
- if a reference row has no matching row in the input stream, it is considered `deleted`

If the key fields in either stream are not sorted or contain duplicates, the behaviour of the step is undefined.
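
The matching rules above can be sketched as a single merge pass over both sorted streams. This is a minimal illustration in Python, not the step's actual implementation: the function names `sort_key` and `diff_sorted`, the dict-based row representation, and the `(diff, input_row, reference_row)` output shape are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Any]


def sort_key(row: Row, key_fields: List[str]) -> Tuple:
    """Comparison key: ascending order, with None (nil) sorting last."""
    # (is_none, value) pairs: False < True, so non-nil values come first.
    # The 0 is only a placeholder and is never compared to a real value.
    return tuple(
        (row.get(f) is None, 0 if row.get(f) is None else row.get(f))
        for f in key_fields
    )


def diff_sorted(
    input_rows: Iterable[Row],
    reference_rows: Iterable[Row],
    key_fields: List[str],
    data_fields: List[str],
) -> Iterator[Tuple[str, Optional[Row], Optional[Row]]]:
    """Yield (diff, input_row, reference_row) for two key-sorted streams."""
    inputs, refs = iter(input_rows), iter(reference_rows)
    row, ref = next(inputs, None), next(refs, None)
    while row is not None or ref is not None:
        if ref is None or (
            row is not None
            and sort_key(row, key_fields) < sort_key(ref, key_fields)
        ):
            # Input key has no counterpart in the reference stream.
            yield "new", row, None
            row = next(inputs, None)
        elif row is None or sort_key(ref, key_fields) < sort_key(row, key_fields):
            # Reference key has no counterpart in the input stream.
            yield "deleted", None, ref
            ref = next(refs, None)
        else:
            # Keys match: compare the data fields.
            same = all(row.get(f) == ref.get(f) for f in data_fields)
            yield ("identical" if same else "changed"), row, ref
            row, ref = next(inputs, None), next(refs, None)


input_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 4, "v": "d"}]
reference_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}, {"id": 3, "v": "c"}]
labels = [d for d, _, _ in diff_sorted(input_rows, reference_rows, ["id"], ["v"])]
print(labels)
```

Because each stream is only ever advanced forward, the sketch produces meaningless results when keys are unsorted or duplicated, which is consistent with the undefined behaviour noted above.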
Settings
Name | Type | Description |
---|---|---|
Key Fields | N/A | Key fields to match on. Both the input and reference streams must be sorted by the key fields specified, nil values last. |
Data Fields | N/A | Data fields to check for changes. |
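
The sorting requirement on the key fields can be met, or checked, before rows reach the step. The following Python sketch assumes rows are dicts and uses the hypothetical helper names `nil_last_key`, `sort_for_diff`, and `is_valid_diff_input`; it only illustrates the ordering rule (ascending, nil last, no duplicate keys) and is not part of the step itself.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def nil_last_key(row: Row, key_fields: List[str]) -> Tuple:
    """Sort key that orders values ascending and places None (nil) last."""
    return tuple(
        (row.get(f) is None, 0 if row.get(f) is None else row.get(f))
        for f in key_fields
    )


def sort_for_diff(rows: List[Row], key_fields: List[str]) -> List[Row]:
    """Return rows sorted the way the diff expects its inputs."""
    return sorted(rows, key=lambda r: nil_last_key(r, key_fields))


def is_valid_diff_input(rows: List[Row], key_fields: List[str]) -> bool:
    """True if keys are strictly ascending: sorted and free of duplicates."""
    keys = [nil_last_key(r, key_fields) for r in rows]
    return all(a < b for a, b in zip(keys, keys[1:]))


rows = [{"id": 3}, {"id": None}, {"id": 1}]
ordered = sort_for_diff(rows, ["id"])
print([r["id"] for r in ordered])
print(is_valid_diff_input(ordered, ["id"]))
```

Checking for strictly ascending keys covers both conditions at once, since a duplicate key makes two neighbouring keys equal rather than increasing.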
Results
Name | Type | Description |
---|---|---|
diff | string | Can take one of the following values: `identical`, `changed`, `new`, `deleted`. |
ref | dict | Reference row matched to current input row. |
changes | dict | Records detected changes in data fields in case an input row is detected to be `changed`. The structure is: |