ORC Output

Writes records into an ORC file.

Processing

For each incoming row, the step writes a record into an ORC file.

Settings

Name	Type	Description
General
Hadoop	dict	Determines the Hadoop environment to use. The ORC file format is closely tied to the Hadoop/Hive ecosystem and requires the Hadoop libraries of your distribution to be present. Evaluated once when the step initializes.
Output File	string	The path of the output file. Evaluated for each input row.
If File Exists	string	Determines what to do if the output file already exists. replace: replaces the file with a new one; error: throws an error. Evaluated for each new file created.
Columns		Determines how the output file structure is established. fixed: the columns of the output file are static and are specified directly; data driven: the columns of the output file are established at runtime. Specified at design time.
Format
Compression	string	The compression to use for the file. NONE: no compression is used; ZLIB: zlib compression is used; SNAPPY: snappy compression is used. Evaluated for each new file created.
Block Size (KB)	long	If compression is used, this is the size of a compression block in KiB. Evaluated for each new file created.
Stripe Size (MB)	long	The size of stripes in the ORC file in MiB. Evaluated for each new file created.
Create Index	boolean	If `true`, inline indexes are created in the file. Evaluated for each new file created.
Index Stride	long	The number of rows between index entries. Evaluated for each new file created.
Table Properties	dict	Additional table properties set for the created file. Refer to the ORC Table Properties documentation for details. Evaluated for each new file created.
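
The size settings above are given in KiB and MiB, while ORC table properties are usually expressed in bytes. The following Python sketch shows that arithmetic and how the Format settings could line up with the standard Hive ORC table properties (`orc.compress`, `orc.compress.size`, `orc.stripe.size`, `orc.create.index`, `orc.row.index.stride`). The helper name `format_settings_to_properties` and the mapping itself are assumptions for illustration; this page does not describe how the step passes the settings on internally.

```python
def format_settings_to_properties(compression, block_size_kb, stripe_size_mb,
                                  create_index, index_stride):
    """Hypothetical helper: express the Format settings as Hive-style ORC table properties."""
    # The step documents exactly these three compression values.
    if compression not in ("NONE", "ZLIB", "SNAPPY"):
        raise ValueError(f"unsupported compression: {compression}")
    return {
        "orc.compress": compression,
        # KiB -> bytes
        "orc.compress.size": str(block_size_kb * 1024),
        # MiB -> bytes
        "orc.stripe.size": str(stripe_size_mb * 1024 * 1024),
        "orc.create.index": "true" if create_index else "false",
        "orc.row.index.stride": str(index_stride),
    }


# Example values chosen for illustration only.
props = format_settings_to_properties("ZLIB", 256, 64, True, 10000)
for key, value in props.items():
    print(key, "=", value)
```

With these example values, a 256 KiB block size corresponds to 262144 bytes and a 64 MiB stripe size to 67108864 bytes.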

ORC Columns - fixed

Defines the data written to the ORC file. Each column is specified by the fields listed below.

Evaluated for each input row

Column Type: The Hive data type used for the column.
Name: The name of the column.
Value: The value to write into the column.

ORC Columns - data driven

When output columns are data driven, the step determines the file structure at runtime.


CSV Columns

list

A list defining the columns written to file.

Each entry in the list is a dict with the following keys:

:name - The name of the column
:orcType - The Hive type of the column

To specify three output columns you would supply a definition such as:

[
  {:name 'id',   :orcType 'bigint'},
  {:name 'name', :orcType 'string'},
  {:name 'sku',  :orcType 'string'}
]

Evaluated for each new file
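
To make the relationship between such a column list and an ORC schema concrete, here is a minimal Python sketch that renders the definition in the textual `struct<...>` notation used by ORC and Hive. The helper name `orc_schema_string` is hypothetical, the Python dict keys simply mirror `:name` and `:orcType`, and whether the step builds its schema this way is an assumption.

```python
def orc_schema_string(columns):
    """Hypothetical helper: render column definitions as an ORC struct type string."""
    fields = []
    for col in columns:
        # Each field is written as name:type, in the order the columns are listed.
        fields.append(f"{col['name']}:{col['orcType']}")
    return "struct<" + ",".join(fields) + ">"


columns = [
    {"name": "id", "orcType": "bigint"},
    {"name": "name", "orcType": "string"},
    {"name": "sku", "orcType": "string"},
]

schema = orc_schema_string(columns)
print(schema)  # prints struct<id:bigint,name:string,sku:string>
```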

Column Values

list or dict

A list or dict defining the data to write to file.

If the data structure is a list, the value for each column is extracted in order.

If the data structure is a dict, the values for columns are extracted by using column names as keys.

For example, consider the following column definition:

[
  {:name 'id',   :orcType 'bigint'},
  {:name 'name', :orcType 'string'}
]

We can supply column values as a list:

[123, "Sherlock Holmes"]

We can also supply column values as a dict:

{
  :id 123,
  :name "Sherlock Holmes"
}

Evaluated for each input row
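
The two lookup rules can be modeled in a few lines of Python. This is a sketch of the documented behavior, not the step's implementation: the helper name `resolve_row` is hypothetical, and treating a missing list position or dict key as a null value is an assumption that this page does not confirm.

```python
def resolve_row(columns, values):
    """Hypothetical helper: return one value per column, in column order."""
    names = [col["name"] for col in columns]
    if isinstance(values, dict):
        # Dict input: look up each value by column name.
        # Assumption: a missing key yields None.
        return [values.get(name) for name in names]
    # List input: take values by position.
    # Assumption: a list that is too short yields None for the remaining columns.
    return [values[i] if i < len(values) else None for i in range(len(names))]


columns = [
    {"name": "id", "orcType": "bigint"},
    {"name": "name", "orcType": "string"},
]

from_list = resolve_row(columns, [123, "Sherlock Holmes"])
from_dict = resolve_row(columns, {"name": "Sherlock Holmes", "id": 123})
print(from_list == from_dict)  # prints True
```

Both inputs resolve to the same record, which is why the dict form is convenient when the order of values in the incoming data is not guaranteed.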

Results

This step does not provide any results.

Tweakstreet v1.22.6
