Regex Parser Transform

The Vector regex_parser transform parses a log field's value with a Regular Expression.

Configuration

[transforms.my_transform_id]
# General
type = "regex_parser" # required
inputs = ["my-source-or-transform-id", "prefix-*"] # required
drop_failed = false # optional, default
drop_field = true # optional, default
field = "message" # optional, default
patterns = ['^(?P<timestamp>[\w\-:\+]+) (?P<level>\w+) (?P<message>.*)$'] # required
# Types
types.status = "int" # example
types.duration = "float" # example
types.success = "bool" # example
types.timestamp_iso8601 = "timestamp|%F" # example
types.timestamp_custom = "timestamp|%a %b %e %T %Y" # example
types.timestamp_unix = "timestamp|%s" # example
types.parent.child = "int" # example
  • common, optional, bool

    drop_failed

    Whether to drop the event if parsing fails.

    • Default: false
  • common, optional, bool

    drop_field

    Whether to drop (remove) the specified field after parsing.

    • Default: true
  • common, optional, string

    field

    The log field to parse.

    • Syntax: literal
    • Default: "message"
  • optional, bool

    overwrite_target

    If target_field is set and the log contains a field of the same name as the target, it will only be overwritten if this is set to true.

    • Default: true
  • common, required, [string]

    patterns

    The Regular Expressions to apply. Do not include the leading or trailing / in any of the expressions.

  • optional, string

    target_field

    If this setting is present, the parsed fields will be inserted into the log as a sub-object with this name. If a field with the same name already exists and overwrite_target is false, the parser will fail and produce an error.

    • Syntax: literal
  • common, optional, table

    types

    Key/value pairs representing mapped log field names and types. This is used to coerce log fields from strings into their proper types. The available types are listed in the Types list below.

    Timestamp coercions must be prefixed with timestamp|, for example "timestamp|%F". Timestamp specifiers can use either of the following:

    1. One of the built-in formats listed in the Timestamp Formats table below.
    2. The time format specifiers from Rust's chrono library.

    Types

    • array
    • bool
    • bytes
    • float
    • int
    • map
    • null
    • timestamp (see the table below for formats)

    Timestamp Formats

    Format | Description | Example
    %F %T | YYYY-MM-DD HH:MM:SS | 2020-12-01 02:37:54
    %v %T | DD-Mmm-YYYY HH:MM:SS | 01-Dec-2020 02:37:54
    %FT%T | ISO 8601/RFC 3339 (https://tools.ietf.org/html/rfc3339) format without time zone | 2020-12-01T02:37:54
    %a, %d %b %Y %T | RFC 822/2822 without time zone | Tue, 01 Dec 2020 02:37:54
    %a %d %b %T %Y | date command output without time zone | Tue 01 Dec 02:37:54 2020
    %a %b %e %T %Y | ctime format | Tue Dec 1 02:37:54 2020
    %s | UNIX timestamp | 1606790274
    %FT%TZ | ISO 8601/RFC 3339 UTC | 2020-12-01T09:37:54Z
    %+ | ISO 8601/RFC 3339 UTC with time zone | 2020-12-01T02:37:54-07:00
    %a %d %b %T %Z %Y | date command output with time zone | Tue 01 Dec 02:37:54 PST 2020
    %a %d %b %T %z %Y | date command output with numeric time zone | Tue 01 Dec 02:37:54 -0700 2020
    %a %d %b %T %#z %Y | date command output with numeric time zone (minutes can be missing or present) | Tue 01 Dec 02:37:54 -07 2020

    Note: the examples in this table are for 54 seconds after 2:37 am on December 1st, 2020 in Pacific Standard Time. See Named Captures for more info.
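
As a minimal sketch of how target_field and overwrite_target might be combined, the following configuration nests the parsed fields under a sub-object. The transform ID, the pattern, and the field name "parsed" are hypothetical and chosen only for illustration; the option names are the ones documented above.

```toml
# Hypothetical transform ID and pattern; only the option names come from this page.
[transforms.parse_into_subobject]
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"
patterns = ['^(?P<level>\w+) (?P<status>\d+) (?P<path>.*)$']
# Insert the captured fields under "parsed" instead of at the event root.
target_field = "parsed"
# Do not overwrite a "parsed" field that already exists on the event.
overwrite_target = false
```

With overwrite_target left at its default of true, an existing "parsed" field would be replaced instead.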

Telemetry

This component provides the following metrics that can be retrieved through the internal_metrics source. See the metrics section in the monitoring page for more info.

  • counter

    processing_errors_total

    The total number of processing errors encountered by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • error_type - The type of the error.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    events_in_total

    The total number of events accepted by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    processed_events_total

    The total number of events processed by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • file - The file that produced the error.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    events_out_total

    The total number of events emitted by this component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.

  • counter

    processed_bytes_total

    The total number of bytes processed by the component. This metric includes the following tags:

    • component_kind - The Vector component kind.

    • component_name - The Vector component ID.

    • component_type - The Vector component type.

    • instance - The Vector instance identified by host and port.

    • job - The name of the job producing Vector metrics.
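
One way to inspect these counters is to wire an internal_metrics source to a sink. The sketch below assumes the console sink and its encoding.codec option, neither of which is covered on this page, so check both against your Vector version; the component IDs are hypothetical.

```toml
# Collect Vector's own metrics, including those emitted by this transform.
[sources.vector_metrics]
type = "internal_metrics"

# Assumption: print the collected metrics to stdout as JSON for inspection.
[sinks.metrics_console]
type = "console"
inputs = ["vector_metrics"]
encoding.codec = "json"
```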

Examples

Given the following Vector log event:

{
"message": "5.86.210.12 - zieme4647 5667 [19/06/2019:17:20:49 -0400] \"GET /embrace/supply-chains/dynamic/vertical\" 201 20574"
}

And the following configuration:

vector.toml
[transforms.regex_parser]
type = "regex_parser"
field = "message"
patterns = ['^(?P<host>[\w\.]+) - (?P<user>[\w]+) (?P<bytes_in>[\d]+) \[(?P<timestamp>.*)\] "(?P<method>[\w]+) (?P<path>.*)" (?P<status>[\d]+) (?P<bytes_out>[\d]+)$']
types.bytes_in = "int"
types.timestamp = "timestamp|%d/%m/%Y:%H:%M:%S %z"
types.status = "int"
types.bytes_out = "int"

The following Vector log event will be output:

{
"bytes_in": 5667,
"host": "5.86.210.12",
"user": "zieme4647",
"timestamp": "2019-06-19T17:20:49-0400",
"method": "GET",
"path": "/embrace/supply-chains/dynamic/vertical",
"status": 201,
"bytes_out": 20574
}

How It Works

Failed Parsing

By default, if the input message text does not match any of the configured regular expression patterns, this transform will log an error message but leave the log event unchanged. If you instead wish to have this transform drop the event, set drop_failed = true.
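
For instance, a configuration along these lines would discard events that match none of its patterns. The transform ID and both patterns are hypothetical log formats used only for illustration.

```toml
[transforms.parse_or_drop]
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"
# Two hypothetical formats; an event only needs to match one of them.
patterns = [
  '^(?P<level>\w+): (?P<text>.*)$',
  '^\[(?P<level>\w+)\] (?P<text>.*)$',
]
# Drop events that match none of the patterns instead of passing them through unchanged.
drop_failed = true
```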

Flags

Regex flags can be toggled with the (?flags) syntax. The available flags are:

Flag | Description
i | case-insensitive: letters match both upper and lower case
m | multi-line mode: ^ and $ match begin/end of line
s | allow . to match \n
U | swap the meaning of x* and x*?
u | Unicode support (enabled by default)
x | ignore whitespace and allow line comments (starting with #)

For example, to enable the case-insensitive flag you can write:

(?i)Hello world

More info can be found in the Regex grouping and flags documentation.
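
In a configuration, the flag goes inside the pattern string itself. The following is a sketch with a hypothetical transform ID and pattern:

```toml
[transforms.parse_case_insensitive]
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"
# (?i) at the start applies to the whole pattern, so "ERROR", "Error", and "error" all match.
patterns = ['(?i)^(?P<level>debug|info|warn|error) (?P<text>.*)$']
```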

Named Captures

You can name Regex captures with the (?P<name>...) syntax. For example:

^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$

This pattern captures timestamp, level, and message. All values are extracted as strings; to convert them to other types, coerce them with the types table.

More info can be found in the Regex grouping and flags documentation.
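
To illustrate, the pattern above could be placed in a transform and one of its captures coerced through the types table. This is a sketch only: the transform ID is hypothetical, and it assumes the first token of each message is a UNIX timestamp such as 1606790274.

```toml
[transforms.parse_levels]
type = "regex_parser"
inputs = ["my-source-or-transform-id"]
field = "message"
# A TOML literal (single-quoted) string, so the backslashes need no extra escaping.
patterns = ['^(?P<timestamp>\w*) (?P<level>\w*) (?P<message>.*)$']
# Without this line, timestamp would remain a string.
types.timestamp = "timestamp|%s"
```

The level and message captures are left as strings because no entry in types names them.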

Regex Debugger

If your regular expression is not matching the text you expect, try debugging your patterns at Regex 101, which includes a regular expression tester and debugger. The regular expression engine used by Vector is most similar to the "Go" implementation, so make sure that is selected in the "Flavor" menu.

Regex Syntax

Vector uses the Rust standard regular expression engine (the regex crate) for pattern matching. Its syntax shares most of the features of Perl-style regular expressions, with a few exceptions: notably, look-around and backreferences are not supported. You can find examples of patterns in the Rust regex module documentation.

State

This component is stateless, meaning its behavior is consistent across each input.