ORC Output
Writes records into an ORC file.
Processing
For each incoming row, the step writes a record into an ORC file.
Settings
Name | Type | Description |
---|---|---|
General | | |
Hadoop | dict | Determines the Hadoop environment to use. The ORC file format is deeply tied to the Hadoop/Hive ecosystem and requires the Hadoop libraries of your distribution to be present. Evaluated once when the step initializes. |
Output File | string | The path of the output file. Evaluated for each input row. |
If File Exists | string | Determines what to do if the output file already exists. Evaluated for each new file created. |
Columns | | How the output file structure is established: either specified at design time or data driven. |
Format | | |
Compression | string | The compression to use for the file. Evaluated for each new file created. |
Block Size (KB) | long | If compression is used, the size of a compression block in KiB. Evaluated for each new file created. |
Stripe Size (MB) | long | The size of stripes in the ORC file in MiB. Evaluated for each new file created. |
Create Index | boolean | If true, an index is created in the file. Evaluated for each new file created. |
Index Stride | long | Number of rows between index entries. Evaluated for each new file created. |
Table Properties | dict | Additional table properties set for the created file. Refer to the ORC Table Properties documentation for details. Evaluated for each new file created. |
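The Format settings use KiB and MiB, while Hive-style ORC table properties are expressed in bytes. The following minimal sketch shows how the settings could translate into such properties. The property keys are standard Hive ORC table properties, but the function name, the mapping itself, and the rule that explicit Table Properties take precedence are assumptions made for illustration, not a description of the step's internals.

```python
# Hypothetical helper: maps the Format settings onto Hive-style ORC table
# property keys. That the step performs exactly this mapping is an assumption.
def format_settings_to_properties(compression, block_size_kb, stripe_size_mb,
                                  create_index, index_stride,
                                  table_properties=None):
    props = {
        "orc.compress": compression.upper(),
        # Block Size is given in KiB, the property expects bytes
        "orc.compress.size": str(block_size_kb * 1024),
        # Stripe Size is given in MiB, the property expects bytes
        "orc.stripe.size": str(stripe_size_mb * 1024 * 1024),
        "orc.create.index": "true" if create_index else "false",
        "orc.row.index.stride": str(index_stride),
    }
    # Assumption: explicitly supplied table properties win over derived ones
    props.update(table_properties or {})
    return props


props = format_settings_to_properties(
    compression="zlib",
    block_size_kb=256,
    stripe_size_mb=64,
    create_index=True,
    index_stride=10000,
    table_properties={"orc.bloom.filter.columns": "id"},
)
for key in sorted(props):
    print(key, "=", props[key])
```

With these inputs, a 256 KiB block size becomes 262144 bytes and a 64 MiB stripe size becomes 67108864 bytes.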
ORC Columns - fixed
Defines the data written to the ORC file.
Evaluated for each input row.
- Column Type: The Hive data type used for the column.
- Name: The name of the column.
- Value: The value to write into the column.
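To illustrate how fixed columns fit together, the sketch below models each column as a type, a name, and a value that is computed per input row. It also derives an ORC-style `struct<...>` schema string from the names and types. The dict keys, the helper names, and the use of Python lambdas for the value are hypothetical stand-ins chosen for this example; the step itself is configured through its settings, not through this code.

```python
# Hypothetical model of fixed column definitions: type, name, and a per-row value.
columns = [
    {"type": "bigint", "name": "id", "value": lambda row: row["id"]},
    {"type": "string", "name": "label", "value": lambda row: row["label"].strip()},
    {"type": "double", "name": "total", "value": lambda row: row["qty"] * row["price"]},
]


def orc_schema(columns):
    """Build an ORC struct schema string from column names and types."""
    fields = ",".join(f"{c['name']}:{c['type']}" for c in columns)
    return f"struct<{fields}>"


def build_record(columns, row):
    """Evaluate every column value against one input row."""
    return {c["name"]: c["value"](row) for c in columns}


schema = orc_schema(columns)
record = build_record(columns, {"id": 7, "label": " widget ", "qty": 3, "price": 2.5})
print(schema)
print(record)
```

The schema is fixed once by the column names and types, while the values are recomputed for every row, which mirrors the "Evaluated for each input row" note above.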
ORC Columns - data driven
When output columns are data driven, the step determines the file structure at runtime.
Name | Type | Description |
---|---|---|
CSV Columns | list | A list defining the columns written to the file. Each entry in the list is a dict describing one column. Evaluated for each new file. |
Column Values | list or dict | A list or dict defining the data to write to the file. If the data structure is a list, the value for each column is extracted in order. If the data structure is a dict, the values for columns are extracted by using column names as keys. Evaluated for each input row. |
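The difference between list and dict column values can be sketched as follows. This is an illustration under stated assumptions: the column definition keys `name` and `type`, the function `extract_values`, and the choice to fill missing values with `None` are hypothetical and not taken from the step's documentation.

```python
# Hypothetical three-column definition; the keys "name" and "type" are assumed.
columns = [
    {"name": "id", "type": "bigint"},
    {"name": "label", "type": "string"},
    {"name": "total", "type": "double"},
]


def extract_values(columns, values):
    """Return one value per column, in column order."""
    names = [c["name"] for c in columns]
    if isinstance(values, dict):
        # dict: look up each value by column name
        return [values.get(name) for name in names]
    # list: take values by position; missing trailing values become None (assumption)
    values = list(values)
    return [values[i] if i < len(values) else None for i in range(len(names))]


from_list = extract_values(columns, [7, "widget", 7.5])
from_dict = extract_values(columns, {"total": 7.5, "id": 7, "label": "widget"})
print(from_list)
print(from_dict)
```

Both calls yield the same row. With a list the order of values must match the column order, whereas with a dict the key order does not matter.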
Results
This step does not provide any results.