Diff on Sorted Keys
Diffs two sorted streams of rows
Processing
The step accepts two streams of rows, arriving through two input gates: `in` and `reference`.
Input rows are matched to reference rows based on key fields. The step uses a diffing algorithm that requires both row streams to be sorted on the key fields in ascending order, with nil values, if any, last.
- if an input row has a matching reference row, it is either `identical` or `changed`, depending on whether the data fields match as well
- if an input row has no matching row in the reference stream, it is considered `new`
- if a reference row has no matching row in the input stream, it is considered `deleted`
If the key fields in either stream are not sorted or contain duplicates, the behaviour of the step is undefined.
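
The matching rules above can be illustrated with a merge-style walk over both streams. This is a minimal sketch in Python, not the step's actual implementation: it assumes rows are plain dicts and nil is represented by `None`. The names `diff_sorted` and `sort_key` and the output keys `row` and `changed_fields` are hypothetical, chosen only for this illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def sort_key(row: Row, key_fields: List[str]) -> Tuple:
    # (is_nil, value) pairs compare non-nil values in ascending order
    # and place nil (None) after every non-nil value.
    return tuple((row[f] is None, row[f]) for f in key_fields)


def diff_sorted(
    inp: Iterable[Row],
    ref: Iterable[Row],
    key_fields: List[str],
    data_fields: List[str],
) -> Iterator[Dict[str, Any]]:
    """Classify rows of two streams sorted on key_fields (hypothetical helper)."""
    inp_it, ref_it = iter(inp), iter(ref)
    a = next(inp_it, None)  # current input row
    b = next(ref_it, None)  # current reference row
    while a is not None or b is not None:
        if b is None or (
            a is not None and sort_key(a, key_fields) < sort_key(b, key_fields)
        ):
            # Input key is not present in the reference stream.
            yield {"diff": "new", "row": a}
            a = next(inp_it, None)
        elif a is None or sort_key(b, key_fields) < sort_key(a, key_fields):
            # Reference key is not present in the input stream.
            yield {"diff": "deleted", "ref": b}
            b = next(ref_it, None)
        else:
            # Keys match: compare the data fields.
            changed = [f for f in data_fields if a[f] != b[f]]
            yield {
                "diff": "changed" if changed else "identical",
                "row": a,
                "ref": b,
                "changed_fields": changed,  # illustrative shape only
            }
            a = next(inp_it, None)
            b = next(ref_it, None)
```

Because each stream is consumed exactly once and only the two current rows are held at a time, this kind of walk depends on both inputs being sorted and free of duplicate keys. With unsorted or duplicated keys the sketch silently misclassifies rows, which mirrors the undefined behaviour noted above.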
Settings
| Name | Type | Description |
|---|---|---|
| Key Fields | N/A | Key fields to match on. Both input and reference inputs must be sorted by the key fields specified, nil values last. |
| Data Fields | N/A | Data fields to check for changes. |
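
The ordering required of both inputs (ascending on the key fields, nil values last, no duplicate keys) can be prepared and checked before rows reach the step. The following is a hedged sketch in Python under the assumption that rows are dicts and nil is `None`. The helper names `nil_last_key`, `sort_rows`, and `is_strictly_sorted` are hypothetical and not part of the step.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def nil_last_key(row: Row, key_fields: List[str]) -> Tuple:
    # The leading boolean sorts nil (None) after all non-nil values.
    return tuple((row[f] is None, row[f]) for f in key_fields)


def sort_rows(rows: List[Row], key_fields: List[str]) -> List[Row]:
    # Return a new list sorted ascending on the key fields, nil values last.
    return sorted(rows, key=lambda r: nil_last_key(r, key_fields))


def is_strictly_sorted(rows: List[Row], key_fields: List[str]) -> bool:
    # Strictly ascending keys imply both sortedness and absence of duplicates.
    keys = [nil_last_key(r, key_fields) for r in rows]
    return all(a < b for a, b in zip(keys, keys[1:]))
```

For example, `sort_rows([{"id": None}, {"id": 3}, {"id": 1}], ["id"])` orders the rows as 1, 3, nil, and `is_strictly_sorted` returns `False` for a list in which the same key appears twice.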
Results
| Name | Type | Description |
|---|---|---|
| diff | string | Classification of the current row. Can take one of the following values: `identical`, `changed`, `new`, `deleted`. |
| ref | dict | Reference row matched to current input row. |
| changes | dict | Records detected changes in data fields in case an input row is detected to be `changed`. |