XML Input
Parses XML and puts elements in rows
Processing
For each input row, the step parses an XML payload and generates parsing events as rows.
The step supports low-level parsing events such as begin element and end element. In addition, the step supports tree events, which generate a DOM-style data structure for a sub-tree of the XML.
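The event model described above can be illustrated outside the step with Python's standard library. This is only a sketch of the idea, not the step's implementation: the function name `parse_events`, the row keys `name` and `attributes`, and the `end_element` label are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

def parse_events(xml_text):
    """Yield one row (a dict) per low-level parsing event."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        # Hypothetical row layout; the step's real event dicts may differ.
        event_type = "start_element" if kind == "start" else "end_element"
        yield {
            "event_type": event_type,
            "name": elem.tag,
            "attributes": dict(elem.attrib),
        }

rows = list(parse_events('<library><book id="1">Dune</book></library>'))
for row in rows:
    print(row)
```

Each element contributes one row when it opens and one when it closes, so a consumer can process a large document without holding all of it in memory.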
Settings
Name | Type | Description |
---|---|---|
General | ||
XML Source |
string |
Source of the XML text.
Evaluated for each input row |
Filesystem |
dict |
Only used if XML Source is set to file. The filesystem to use. If the filesystem is
Evaluated for each input row |
File |
string |
Only used if XML Source is set to file. The path of the XML file to read. Relative paths are interpreted as relative to the flow file.
Evaluated for each input row |
XML |
string |
Only used if XML Source is set to setting. The XML text to parse.
Evaluated for each input row |
Generate Events for |
list |
A list governing which tags to generate events for - a whitelist. If empty or If non-empty, the list contains matcher strings. At least one of the strings in the list must match a tag in order for the tag to generate events. Matching of matcher string
Evaluated for each input row |
Omit Events for |
list |
A list governing which tags to not generate events for - a blacklist. Only applied if a tag has successfully passed the ‘Generate Events for’ whitelist. If empty or If non-empty, the list contains matcher strings. If any of the strings in the list match a tag it does not generate events. Matching of matcher string
Evaluated for each input row |
Trim text nodes |
boolean |
If true, text nodes are trimmed.
Evaluated for each input row |
Sub-Trees | ||
Extract Sub-Trees for |
list |
A list governing which tags to generate sub-trees for - a whitelist. Only applied if a tag has successfully passed the ‘Generate Events for’ whitelist and ‘Omit Events for’ blacklist. If empty or If non-empty, the list contains matcher strings. At least one of the strings in the list must match a tag in order for the tag to generate a sub-tree. Matching of matcher string
When a sub-tree is generated, the step parses the entire content of the element and places it into a tree structure. The contents of a sub-tree are always complete and are not subject to the whitelist and blacklist rules.
Evaluated for each input row |
Tree Element Placement |
string |
Governs how elements are placed in a sub-tree:
Evaluated for each input row |
Lists for |
list |
Only used if ‘Tree Element Placement’ is set to index by name. A list governing which tags to always place into lists, even if there is only one child. If non-empty, the list contains matcher strings. At least one of the strings in the list must match a tag for the tag to always be placed into a list. Matching of matcher string
Example use: When working with a structure like:
it is convenient to always put book items into a list.
Otherwise, single-book libraries generate a structure containing just the single book item directly under the library.
Evaluated for each input row |
Output Structure | ||
Element Names |
string |
Governs how element names are generated:
Evaluated for each input row |
Attribute Names |
string |
Governs how attribute names are generated:
Evaluated for each input row |
Attribute Placement |
string |
Governs how attributes are placed:
Evaluated for each input row |
Text Nodes |
string |
Governs how text nodes are processed:
This setting does not apply to tree events. Sub-Trees always contain text information in
This setting is intended for recovering text information from documents with rich-text structure such as
Evaluated for each input row |
Misc Events | ||
Start Document |
string |
Governs whether to generate a start document event:
Evaluated for each input row |
End Document |
string |
Governs whether to generate an end document event:
Evaluated for each input row |
End Element |
string |
Governs whether to generate end element events:
Evaluated for each input row |
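The motivation for the ‘Lists for’ setting can be sketched in Python. This is an illustration under stated assumptions, not the step's sub-tree format: the helper `to_tree`, its `always_list` parameter, and the `library`/`book` document are hypothetical, attributes are ignored, and tags are compared by exact name.

```python
import xml.etree.ElementTree as ET

def to_tree(elem, always_list=()):
    """Convert an element into a dict that indexes child elements by tag name."""
    node = {}
    for child in elem:
        # Leaf elements become their text; elements with children recurse.
        value = to_tree(child, always_list) if len(child) else (child.text or "")
        if child.tag in always_list:
            # Tags named in always_list go into a list even when they occur once.
            node.setdefault(child.tag, []).append(value)
        elif child.tag in node:
            # A repeated tag is promoted to a list on its second occurrence.
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                node[child.tag] = [existing, value]
        else:
            node[child.tag] = value
    return node

single = ET.fromstring("<library><book>Dune</book></library>")
many = ET.fromstring("<library><book>Dune</book><book>Emma</book></library>")

print(to_tree(single))                        # {'book': 'Dune'}
print(to_tree(many))                          # {'book': ['Dune', 'Emma']}
print(to_tree(single, always_list={"book"}))  # {'book': ['Dune']}
```

Without the always-list rule, the shape of `book` depends on how many books the document happens to contain, which forces downstream code to handle both a single value and a list.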
Results
Name | Type | Description |
---|---|---|
event_type |
string |
The type of event emitted:
|
event |
dict |
A dict whose structure depends on the event type
|
parent |
dict |
The parent element’s start_element value, or nil if the parent element did not generate a start_element event. |
ancestors |
list |
The start_element values of all ancestor elements, or [] if no ancestor element generated a start_element event. The closest ancestor is at the beginning of the list. |
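How `parent` and `ancestors` interact with the whitelist and blacklist can be sketched with a stack of open elements. This is a minimal Python illustration, not the step's code. It assumes that an empty whitelist lets every tag through and that matcher strings are compared by exact tag name; the function `start_events`, its parameters, and the `{"name": ...}` event dict are hypothetical, and Python's `None` stands in for nil.

```python
import io
import xml.etree.ElementTree as ET

def start_events(xml_text, generate_for=(), omit_for=()):
    """Yield start_element rows with parent and ancestors fields."""
    def emits(tag):
        # Assumption: empty whitelist passes all tags; matching is exact equality.
        allowed = not generate_for or tag in generate_for
        return allowed and tag not in omit_for

    # One entry per open element: its event dict, or None if it emitted nothing.
    stack = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        if kind == "end":
            stack.pop()
            continue
        event = {"name": elem.tag} if emits(elem.tag) else None
        if event is not None:
            yield {
                "event_type": "start_element",
                "event": event,
                # None when the direct parent generated no start_element event.
                "parent": stack[-1] if stack else None,
                # Closest ancestor first, skipping ancestors that emitted nothing.
                "ancestors": [e for e in reversed(stack) if e is not None],
            }
        stack.append(event)

all_rows = list(start_events("<a><b><c/></b></a>"))
filtered = list(start_events("<a><b><c/></b></a>", omit_for={"b"}))
print(all_rows[2]["ancestors"])  # [{'name': 'b'}, {'name': 'a'}]
print(filtered[1]["parent"])     # None
```

In the filtered run, `c` has no `parent` value because `b` was omitted, but `a` still appears in its `ancestors` list.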