Data flows perform tasks in parallel. A data flow begins executing all of its steps concurrently. Rows generated by input steps, such as database query or CSV reading steps, are passed to connected downstream steps and processed as soon as they arrive.
Above is a basic data flow structure. Records are read from a database, enriched with master data coming from a file, and written back to the database. All steps execute concurrently, and only a subset of all processed records is in memory at any given time.
Imports bring libraries and values from tweakflow modules into flow scope. Imported libraries are available throughout the flow and can be used in all expressions.
An example import section might look like this:
# import the standard library
import core, data, strings, time, math, fun, locale, regex, bin, decimals from 'std';
# import additional json library
import json from 'tweakstreet/json';
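Once imported, a library's functions can be referenced by library name in any flow expression. As a small illustration using functions from the standard library imported above:

```tweakflow
# imported standard library functions used in expressions
strings.upper_case("order-42")  # upper-cases the string
data.size([1, 2, 3])            # number of elements in the list
```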
Parameters are named expression values that can be passed in when executing a flow. Parameter values are available in the entire flow. Each parameter is declared with a default value, which is used when the flow is invoked without specifying a value for that parameter.
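Parameter defaults are ordinary tweakflow expressions. A minimal sketch, where the parameter names `batch_size` and `run_date` are purely illustrative:

```tweakflow
# hypothetical parameter declarations: name and default value expression
batch_size: 1000        # illustrative numeric default
run_date: time.epoch    # std time library constant: 1970-01-01T00:00:00Z
```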
Flow variables are named expression values that are available in the entire flow. They are typically used to specify flow-wide constants. They are also a good place to validate parameters, or calculate derived values from parameters.
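As a sketch of parameter validation in a flow variable, assuming a parameter named `batch_size` is in scope, a variable could pass the value through or fail the flow with a descriptive error:

```tweakflow
# hypothetical flow variable validating the batch_size parameter
valid_batch_size:
  if batch_size > 0
  then batch_size
  else throw "batch_size must be positive, got: " .. batch_size
```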
Services are named expression values that are available in the entire flow. They describe various kinds of resources or configuration such as database connections, server credentials, etc.
Declaring these items in the services section of the flow makes them easy to define in one place and to reference from any step that uses them.
Flows provide data about themselves in additional flow variables.
Data flows start executing all their steps at once. There is no dedicated starting point. If a step has no predecessor step that feeds data into it, it is kickstarted with an empty row as input.
When a step executes, it performs its task and potentially generates output rows, passing them through its output gates to any connected steps. There are no semantic limitations on how many rows a step produces or which output gates they are sent to.
A data flow finishes once all steps have finished processing rows.
Data rows are dicts that are carried along the execution path. Steps processing them can read, add, remove, and replace fields.
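Since rows are dicts, the kind of field manipulation a step performs can be pictured with the standard `data` library. A sketch, assuming a row with illustrative `:id` and `:tmp` fields:

```tweakflow
# add a :name field and remove the :tmp field from a row dict
let {
  row: {:id 1, :tmp "x"};
}
data.delete(data.put(row, :name, "Alice"), :tmp)
# result: a dict with :id and :name fields, :tmp removed
```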
A data flow finishes successfully when all steps finish processing without error.
Data flows that finish successfully can provide a return value called the result. By default the result value is nil. The Set Flow Result step can set the result value explicitly. When running a flow through the Run Flow step, the flow result value is available as one of the step results.
A flow that fails does not provide a result value; its result is always nil.