Sub Flow
Sends rows through a persistent data flow
Processing
The step runs a data flow with the given parameters. The configured data flow accepts input rows through its interface input steps and returns output rows through its interface output steps.
Sub flows are kept alive, and rows are directed to their interface input steps as soon as they arrive. Output rows are extracted from the sub flow’s interface output steps and made available as results that generate output rows.
Row passing happens concurrently to maximize throughput.
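
To picture the behaviour described above, the following minimal Python sketch models a sub flow as a long-lived worker that is fed through a queue. It is an illustration under stated assumptions, not the product's implementation: `PersistentSubFlow`, `send`, `close`, and the upper-casing transformation are hypothetical names invented for this example.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


class PersistentSubFlow:
    """Hypothetical stand-in for a sub flow that stays alive between rows."""

    def __init__(self, transform):
        self._transform = transform
        self._inbox = queue.Queue()
        self._outbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Consume records as soon as they arrive, until the sentinel shows up.
        while True:
            record = self._inbox.get()
            if record is _DONE:
                self._outbox.put(_DONE)
                return
            self._outbox.put(self._transform(record))

    def send(self, record):
        """Hand one record to the running worker without waiting for its result."""
        self._inbox.put(record)

    def close(self):
        """Signal end of input and return every result produced so far."""
        self._inbox.put(_DONE)
        results = []
        while True:
            item = self._outbox.get()
            if item is _DONE:
                break
            results.append(item)
        self._worker.join()
        return results


flow = PersistentSubFlow(lambda rec: {"name": rec["name"].upper()})
for name in ["ada", "grace"]:
    flow.send({"name": name})
results = flow.close()
```

The worker is started once and reused for every record, which mirrors the idea of a sub flow that is kept alive rather than launched per row.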
Passing rows in and out
A sub data flow indicates which fields - if any - it wants to operate on through its interface input step. The main flow maps input rows to dicts and passes them on to the interface input step of the data flow. The interface input step then extracts the keys it is interested in into its own row stream.
Similarly, a sub data flow indicates which fields - if any - it returns to the main flow through its interface output step. The interface output step constructs a dict containing the keys it passes back. The main flow makes this dict available in results and generates an output flow.
By convention, a sub data flow should declare a field named `_` that it both accepts and returns. The entire input row arriving at the sub flow step is mapped into this field as a dict. The sub data flow would by convention preserve whatever data comes through the `_` field and output it unaltered.
Using this method it is possible to carry the entire input row through a sub data flow.
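
The dict hand-off and the `_` convention can be sketched in plain Python. This is only a model of the data shapes involved, not the product's API: `interface_input`, `interface_output`, `sub_flow`, and the fields `id`, `amount`, and `amount_with_tax` are hypothetical, and treating a missing key as `None` is an assumption of this sketch.

```python
def interface_input(record, fields):
    """Pick the declared keys out of the incoming dict.

    Assumption of this sketch: a key missing from the record becomes None.
    """
    return {key: record.get(key) for key in fields}


def interface_output(row, fields):
    """Build the dict that is passed back to the main flow."""
    return {key: row[key] for key in fields}


def sub_flow(record):
    # The sub flow declares `amount` and `_` as the fields it accepts.
    row = interface_input(record, ["amount", "_"])
    # It does its own work on the fields it asked for.
    row["amount_with_tax"] = round(row["amount"] * 1.2, 2)
    # It returns its result together with the untouched `_` dict.
    return interface_output(row, ["amount_with_tax", "_"])


# Main flow side: map the input row to a dict, with the whole row under `_`.
input_row = {"id": 7, "amount": 10.0}
record = {"amount": input_row["amount"], "_": dict(input_row)}

output = sub_flow(record)

# Restore the original fields from `_` and add the newly computed one.
restored = {**output["_"], "amount_with_tax": output["amount_with_tax"]}
```

Because the sub flow never touches the contents of `_`, fields it knows nothing about (here `id`) survive the round trip.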
I/O Multiplicity
If a sub data flow does not contain any interface input steps, all input rows are discarded.
If a sub data flow contains more than one interface input step, they read input rows concurrently as independent consumers of a single row stream.
If a sub data flow does not contain any interface output steps, the sub flow step does not generate any output rows.
If a sub data flow contains more than one interface output step, they generate output rows concurrently as independent producers. The sub flow step reads rows from all producers in round-robin fashion.
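
As a rough illustration of round-robin reading from several producers, the sketch below interleaves rows from multiple sources and drops a source once it is exhausted. It is sequential and only shows the interleaving order; the real step works concurrently. The `round_robin` helper and the sample rows are hypothetical.

```python
from collections import deque


def round_robin(*producers):
    """Yield rows from several producers in turn, skipping exhausted ones."""
    pending = deque(iter(p) for p in producers)
    while pending:
        producer = pending.popleft()
        try:
            row = next(producer)
        except StopIteration:
            continue  # this producer is done; do not re-queue it
        pending.append(producer)
        yield row


# Two hypothetical interface output steps producing different numbers of rows.
output_a = [{"src": "a", "n": 1}, {"src": "a", "n": 2}, {"src": "a", "n": 3}]
output_b = [{"src": "b", "n": 1}]

merged = list(round_robin(output_a, output_b))
order = [(row["src"], row["n"]) for row in merged]

# With no producers at all, nothing is generated.
nothing = list(round_robin())
```

The empty case corresponds to a sub data flow without interface output steps: there is nothing to read from, so no output rows appear.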
Settings
Name | Type | Description |
---|---|---|
Sub Flow | string | The path of the sub data flow to run. Each distinct path spawns a distinct instance of a data flow. Evaluated for each input row. |
Parameters | dict | Parameters to set when launching the sub flow. Each distinct set of parameters spawns a fresh instance of the sub flow. Evaluated for each input row. |
Mode | string | Reflects whether the step should attempt to preserve row structure by looking at output results, and trying to restore fields from a dict at key The sub data flow must accommodate this strategy by providing a Static configuration |
Input | dict | Input record dict to send into the sub flow. Evaluated for each input row. |
Results
Name | Type | Description |
---|---|---|
output | dict | An output row as generated by the sub flow’s interface output step. |