Remap with VRL

Modify your observability data as it passes through your topology using Vector Remap Language (VRL)

status: beta, egress: stream, state: stateless

The remap transform is the recommended transform for parsing, shaping, and transforming data in Vector. It implements the Vector Remap Language (VRL), an expression-oriented language designed for processing observability data (logs and metrics) in a safe and performant manner.

Please refer to the VRL reference when writing VRL scripts.

Configuration

Example configurations

{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ". = parse_json!(.message)\n.new_field = \"new value\"\n.status = to_int!(.status)\n.duration = parse_duration!(.duration, \"s\")\n.new_name = del(.old_name)",
      "file": "./my/program.vrl"
    }
  }
}
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
. = parse_json!(.message)
.new_field = "new value"
.status = to_int!(.status)
.duration = parse_duration!(.duration, "s")
.new_name = del(.old_name)"""
file = "./my/program.vrl"
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: |-
      . = parse_json!(.message)
      .new_field = "new value"
      .status = to_int!(.status)
      .duration = parse_duration!(.duration, "s")
      .new_name = del(.old_name)      
    file: ./my/program.vrl
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ". = parse_json!(.message)\n.new_field = \"new value\"\n.status = to_int!(.status)\n.duration = parse_duration!(.duration, \"s\")\n.new_name = del(.old_name)",
      "file": "./my/program.vrl",
      "drop_on_error": false,
      "drop_on_abort": true
    }
  }
}
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
. = parse_json!(.message)
.new_field = "new value"
.status = to_int!(.status)
.duration = parse_duration!(.duration, "s")
.new_name = del(.old_name)"""
file = "./my/program.vrl"
drop_on_abort = true
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: |-
      . = parse_json!(.message)
      .new_field = "new value"
      .status = to_int!(.status)
      .duration = parse_duration!(.duration, "s")
      .new_name = del(.old_name)      
    file: ./my/program.vrl
    drop_on_error: false
    drop_on_abort: true

drop_on_abort

optional bool
Drop the event if the VRL program is manually aborted through the abort statement.
default: true
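
A minimal sketch of how this option interacts with the abort statement, assuming a hypothetical .level field on incoming log events (the transform ID drop_debug is also illustrative):

```toml
[transforms.drop_debug]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
drop_on_abort = true
source = """
# .level is a hypothetical field used only for illustration
if .level == "debug" {
  abort
}
"""
```

With drop_on_abort left at true, events that reach abort are discarded. Setting it to false should instead let those events continue downstream in their original state.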

drop_on_error

optional bool
Drop the event if the VRL program returns an error at runtime.
default: false
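
As an illustration, the following sketch drops any event whose message field cannot be parsed as JSON. The transform ID strict_json is a placeholder; parse_json! is the same call used in the example configurations.

```toml
[transforms.strict_json]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
drop_on_error = true
source = """
# parse_json! raises a runtime error on invalid JSON;
# with drop_on_error = true the failing event is dropped
. = parse_json!(.message)
"""
```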

file

common optional string literal

File path to the Vector Remap Language (VRL) program to execute for each event.

If a relative path is provided, its root is the current working directory.

Required if source is missing.
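
Since file is only required when source is missing, a transform that loads its program from disk can omit source entirely. A minimal sketch, reusing the placeholder IDs and path from the example configurations:

```toml
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
# Relative paths are resolved against the current working directory
file = "./my/program.vrl"
```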

Examples
"./my/program.vrl"

inputs

required [string]

A list of upstream source or transform IDs. Wildcards (*) are supported.

See configuration for more info.

Array string literal
Examples
[
 "my-source-or-transform-id",
 "prefix-*"
]

source

common optional string remap_program

The Vector Remap Language (VRL) program to execute for each event.

Required if file is missing.

Examples
". = parse_json!(.message)\n.new_field = \"new value\"\n.status = to_int!(.status)\n.duration = parse_duration!(.duration, \"s\")\n.new_name = del(.old_name)"

Telemetry

Metrics

component_received_event_bytes_total

counter
The number of event bytes accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
container_name optional
The name of the container from which the event originates.
file optional
The file from which the event originates.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the event originates.
peer_path optional
The pathname from which the event originates.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the event originates.
uri optional
The sanitized URI from which the event originates.

component_received_events_total

counter
The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
container_name optional
The name of the container from which the event originates.
file optional
The file from which the event originates.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the event originates.
peer_path optional
The pathname from which the event originates.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the event originates.
uri optional
The sanitized URI from which the event originates.

component_sent_event_bytes_total

counter
The total number of event bytes emitted by this component.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

component_sent_events_total

counter
The total number of events emitted by this component.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

events_in_total

counter
The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins. This metric is deprecated and will be removed in a future version. Use component_received_events_total instead.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
container_name optional
The name of the container from which the event originates.
file optional
The file from which the event originates.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the event originates.
peer_path optional
The pathname from which the event originates.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the event originates.
uri optional
The sanitized URI from which the event originates.

events_out_total

counter
The total number of events emitted by this component. This metric is deprecated and will be removed in a future version. Use component_sent_events_total instead.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

processed_bytes_total

counter
The number of bytes processed by the component.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
container_name optional
The name of the container from which the bytes originate.
file optional
The file from which the bytes originate.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the bytes originate.
peer_path optional
The pathname from which the bytes originate.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the bytes originate.
uri optional
The sanitized URI from which the bytes originate.

processed_events_total

counter
The total number of events processed by this component. This metric is deprecated in favor of the component_received_events_total and component_sent_events_total metrics.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

processing_errors_total

counter
The total number of processing errors encountered by this component.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
error_type required
The type of the error.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

utilization

gauge
A ratio from 0 to 1 of the load on a component. A value of 0 indicates a completely idle component that is simply waiting for input. A value of 1 indicates a component that is never idle. This value is updated every 5 seconds.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_scope required
The Vector component scope.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

Examples

Parse Syslog logs

Given this event...
{
 "log": {
  "message": "<102>1 2020-12-22T15:22:31.111Z vector-user.biz su 2666 ID389 - Something went wrong"
 }
}
...and this configuration...
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = ". |= parse_syslog!(.message)"
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: . |= parse_syslog!(.message)
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ". |= parse_syslog!(.message)"
    }
  }
}
...this Vector event is produced:
{
 "log": {
  "appname": "su",
  "facility": "ntp",
  "hostname": "vector-user.biz",
  "message": "Something went wrong",
  "msgid": "ID389",
  "procid": 2666,
  "severity": "info",
  "timestamp": "2020-12-22T15:22:31.111Z",
  "version": 1
 }
}

Parse key/value (logfmt) logs

Given this event...
{
 "log": {
  "message": "@timestamp=\"Sun Jan 10 16:47:39 EST 2021\" level=info msg=\"Stopping all fetchers\" tag#production=stopping_fetchers id=ConsumerFetcherManager-1382721708341 module=kafka.consumer.ConsumerFetcherManager"
 }
}
...and this configuration...
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = ". = parse_key_value!(.message)"
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: . = parse_key_value!(.message)
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ". = parse_key_value!(.message)"
    }
  }
}
...this Vector event is produced:
{
 "log": {
  "@timestamp": "Sun Jan 10 16:47:39 EST 2021",
  "id": "ConsumerFetcherManager-1382721708341",
  "level": "info",
  "module": "kafka.consumer.ConsumerFetcherManager",
  "msg": "Stopping all fetchers",
  "tag#production": "stopping_fetchers"
 }
}

Parse custom logs

Given this event...
{
 "log": {
  "message": "2021/01/20 06:39:15 +0000 [error] 17755#17755: *3569904 open() \"/usr/share/nginx/html/test.php\" failed (2: No such file or directory), client: xxx.xxx.xxx.xxx, server: localhost, request: \"GET /test.php HTTP/1.1\", host: \"yyy.yyy.yyy.yyy\""
 }
}
...and this configuration...
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
. |= parse_regex!(.message, r'^(?P<timestamp>\\d+/\\d+/\\d+ \\d+:\\d+:\\d+ \\+\\d+) \\[(?P<severity>\\w+)\\] (?P<pid>\\d+)#(?P<tid>\\d+):(?: \\*(?P<connid>\\d+))? (?P<message>.*)$')

# Coerce parsed fields
.timestamp = parse_timestamp(.timestamp, "%Y/%m/%d %H:%M:%S %z") ?? now()
.pid = to_int!(.pid)
.tid = to_int!(.tid)

# Extract structured data
message_parts = split(.message, ", ", limit: 2)
structured = parse_key_value(message_parts[1], key_value_delimiter: ":", field_delimiter: ",") ?? {}
.message = message_parts[0]
. = merge(., structured)"""
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: |-
      . |= parse_regex!(.message, r'^(?P<timestamp>\d+/\d+/\d+ \d+:\d+:\d+ \+\d+) \[(?P<severity>\w+)\] (?P<pid>\d+)#(?P<tid>\d+):(?: \*(?P<connid>\d+))? (?P<message>.*)$')

      # Coerce parsed fields
      .timestamp = parse_timestamp(.timestamp, "%Y/%m/%d %H:%M:%S %z") ?? now()
      .pid = to_int!(.pid)
      .tid = to_int!(.tid)

      # Extract structured data
      message_parts = split(.message, ", ", limit: 2)
      structured = parse_key_value(message_parts[1], key_value_delimiter: ":", field_delimiter: ",") ?? {}
      .message = message_parts[0]
      . = merge(., structured)
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ". |= parse_regex!(.message, r'^(?P<timestamp>\\d+/\\d+/\\d+ \\d+:\\d+:\\d+ \\+\\d+) \\[(?P<severity>\\w+)\\] (?P<pid>\\d+)#(?P<tid>\\d+):(?: \\*(?P<connid>\\d+))? (?P<message>.*)$')\n\n# Coerce parsed fields\n.timestamp = parse_timestamp(.timestamp, \"%Y/%m/%d %H:%M:%S %z\") ?? now()\n.pid = to_int!(.pid)\n.tid = to_int!(.tid)\n\n# Extract structured data\nmessage_parts = split(.message, \", \", limit: 2)\nstructured = parse_key_value(message_parts[1], key_value_delimiter: \":\", field_delimiter: \",\") ?? {}\n.message = message_parts[0]\n. = merge(., structured)"
    }
  }
}
...this Vector event is produced:
{
 "log": {
  "client": "xxx.xxx.xxx.xxx",
  "connid": "3569904",
  "host": "yyy.yyy.yyy.yyy",
  "message": "open() \"/usr/share/nginx/html/test.php\" failed (2: No such file or directory)",
  "pid": 17755,
  "request": "GET /test.php HTTP/1.1",
  "server": "localhost",
  "severity": "error",
  "tid": 17755,
  "timestamp": "2021-01-20T06:39:15Z"
 }
}

Multiple parsing strategies

Given this event...
{
 "log": {
  "message": "<102>1 2020-12-22T15:22:31.111Z vector-user.biz su 2666 ID389 - Something went wrong"
 }
}
...and this configuration...
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
structured =
  parse_syslog(.message) ??
  parse_common_log(.message) ??
  parse_regex!(.message, r'^(?P<timestamp>\\d+/\\d+/\\d+ \\d+:\\d+:\\d+) \\[(?P<severity>\\w+)\\] (?P<pid>\\d+)#(?P<tid>\\d+):(?: \\*(?P<connid>\\d+))? (?P<message>.*)$')
. = merge(., structured)"""
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: |-
      structured =
        parse_syslog(.message) ??
        parse_common_log(.message) ??
        parse_regex!(.message, r'^(?P<timestamp>\d+/\d+/\d+ \d+:\d+:\d+) \[(?P<severity>\w+)\] (?P<pid>\d+)#(?P<tid>\d+):(?: \*(?P<connid>\d+))? (?P<message>.*)$')
      . = merge(., structured)
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": "structured =\n  parse_syslog(.message) ??\n  parse_common_log(.message) ??\n  parse_regex!(.message, r'^(?P<timestamp>\\d+/\\d+/\\d+ \\d+:\\d+:\\d+) \\[(?P<severity>\\w+)\\] (?P<pid>\\d+)#(?P<tid>\\d+):(?: \\*(?P<connid>\\d+))? (?P<message>.*)$')\n. = merge(., structured)"
    }
  }
}
...this Vector event is produced:
{
 "log": {
  "appname": "su",
  "facility": "ntp",
  "hostname": "vector-user.biz",
  "message": "Something went wrong",
  "msgid": "ID389",
  "procid": 2666,
  "severity": "info",
  "timestamp": "2020-12-22T15:22:31.111Z",
  "version": 1
 }
}

Modify metric tags

Given this event...
{
 "metric": {
  "counter": {
   "value": 102
  },
  "kind": "incremental",
  "name": "user_login_total",
  "tags": {
   "email": "vic@vector.dev",
   "host": "my.host.com",
   "instance_id": "abcd1234"
  }
 }
}
...and this configuration...
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
.tags.environment = get_env_var!("ENV") # add
.tags.hostname = del(.tags.host) # rename
del(.tags.email)"""
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: |-
      .tags.environment = get_env_var!("ENV") # add
      .tags.hostname = del(.tags.host) # rename
      del(.tags.email)      
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ".tags.environment = get_env_var!(\"ENV\") # add\n.tags.hostname = del(.tags.host) # rename\ndel(.tags.email)"
    }
  }
}
...this Vector event is produced:
{
 "metric": {
  "counter": {
   "value": 102
  },
  "kind": "incremental",
  "name": "user_login_total",
  "tags": {
   "environment": "production",
   "hostname": "my.host.com",
   "instance_id": "abcd1234"
  }
 }
}

Emitting multiple logs from JSON

Given this event...
{
 "log": {
  "message": "[{\"message\": \"first_log\"}, {\"message\": \"second_log\"}]"
 }
}
...and this configuration...
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = ". = parse_json!(.message) # sets `.` to an array of objects"
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: ". = parse_json!(.message) # sets `.` to an array of objects"
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ". = parse_json!(.message) # sets `.` to an array of objects"
    }
  }
}
...this Vector event is produced:
[{"log":{"message":"first_log"}},{"log":{"message":"second_log"}}]

Emitting multiple non-object logs from JSON

Given this event...
{
 "log": {
  "message": "[5, true, \"hello\"]"
 }
}
...and this configuration...
[transforms.my_transform_id]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = ". = parse_json!(.message) # sets `.` to an array"
---
transforms:
  my_transform_id:
    type: remap
    inputs:
      - my-source-or-transform-id
    source: ". = parse_json!(.message) # sets `.` to an array"
{
  "transforms": {
    "my_transform_id": {
      "type": "remap",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "source": ". = parse_json!(.message) # sets `.` to an array"
    }
  }
}
...this Vector event is produced:
[{"log":{"message":5}},{"log":{"message":true}},{"log":{"message":"hello"}}]

How it works

Emitting multiple log events

Multiple log events can be emitted from remap by assigning an array to the root path (.). One log event is emitted for each element of the array.

If an array element isn’t an object, the log event created for it stores the element’s value under the message key. For example, 123 is emitted as:

{
  "message": 123
}

Event Data Model

You can use the remap transform with both log and metric events.

Log events in the remap transform correspond directly to Vector’s log schema, which means that the transform has access to the whole event.

With metric events the remap transform has:

  • read-only access to the event’s .type
  • read/write access to kind, but it can only be set to one of incremental or absolute and cannot be deleted
  • read/write access to name, but it cannot be deleted
  • read/write/delete access to namespace, timestamp, and keys in tags
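
The access rules above can be sketched as a small program for metric events. The transform ID and all field values here are hypothetical and chosen only for illustration:

```toml
[transforms.tidy_metric]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
.name = "logins_total"      # name: writable, but cannot be deleted
.namespace = "auth"         # namespace: read/write/delete
.tags.region = "us-east-1"  # keys in tags: read/write/delete
del(.timestamp)             # timestamp: may be deleted
"""
```

The sketch leaves .type untouched because it is read-only, and it does not change .kind, which accepts only incremental or absolute.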

Lazy Event Mutation

When you make changes to an event through VRL’s path assignment syntax, the change isn’t immediately applied to the actual event. If the program fails to run to completion, any changes made until that point are dropped and the event is kept in its original state.

If you want to make sure your event is changed as expected, you have to rewrite your program to never fail at runtime (the compiler can help you with this).

Alternatively, if you want to ignore/drop events that caused the program to fail, you can set the drop_on_error configuration value to true.
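
To make the difference concrete, here is a sketch of the same program in a fallible and an infallible form. The .seen field and the transform IDs are hypothetical; the ?? {} fallback follows the pattern used in the Parse custom logs example.

```toml
# Fallible: if parse_syslog! errors, the program stops and the earlier
# assignment to .seen is discarded, so the event keeps its original state
# (or is dropped when drop_on_error = true).
[transforms.fallible]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
.seen = true
. |= parse_syslog!(.message)
"""

# Infallible: the ?? fallback supplies an empty object when parsing fails,
# so the program always runs to completion and .seen is always applied.
[transforms.infallible]
type = "remap"
inputs = [ "my-source-or-transform-id" ]
source = """
.seen = true
structured = parse_syslog(.message) ?? {}
. = merge(., structured)
"""
```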

Learn more about runtime errors in the Vector Remap Language reference.

Vector Remap Language

The Vector Remap Language (VRL) is a restrictive, fast, and safe language we designed specifically for mapping observability data. It avoids the need to chain together many fundamental Vector transforms to accomplish rudimentary reshaping of data.

The intent is to offer the same robustness as a full language runtime (e.g., Lua) without paying the performance or safety penalty.

Learn more about Vector’s Remap Language in the Vector Remap Language reference.

State

This component is stateless, meaning its behavior is consistent across each input.