Tag cardinality limit

Limit the cardinality of tags on metric events as a safeguard against cardinality explosion

status: beta, egress: stream, state: stateful
Limits the cardinality of tags on metric events, protecting against accidental high-cardinality usage that can disrupt the stability of metrics storage systems.

Configuration

Example configurations

{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "limit_exceeded_action": "drop_tag",
      "mode": "exact",
      "value_limit": 500
    }
  }
}
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
limit_exceeded_action = "drop_tag"
mode = "exact"
value_limit = 500
---
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    limit_exceeded_action: drop_tag
    mode: exact
    value_limit: 500
{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "cache_size_per_tag": 5120000,
      "limit_exceeded_action": "drop_tag",
      "mode": "exact",
      "value_limit": 500
    }
  }
}
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
cache_size_per_tag = 5_120_000
limit_exceeded_action = "drop_tag"
mode = "exact"
value_limit = 500
---
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    cache_size_per_tag: 5120000
    limit_exceeded_action: drop_tag
    mode: exact
    value_limit: 500

cache_size_per_tag

optional uint
The size, in bytes, of the cache used to detect duplicate tag values. The larger the cache, the less likely a false positive, that is, a case where a new value for a tag is allowed even after the configured limit has been reached.
default: 5120000 (bytes)
Relevant when: mode = "probabilistic"

inputs

required [string]

A list of upstream source or transform IDs. Wildcards (*) are supported.

See configuration for more info.

Array string literal
Examples
[
  "my-source-or-transform-id",
  "prefix-*"
]

limit_exceeded_action

common optional string literal enum
Controls what should happen when a metric comes in with a tag that would exceed the configured limit on cardinality.
Enum options string literal
Option      Description
drop_event  Drop any metric events that contain tags that would exceed the configured limit.
drop_tag    Remove tags that would exceed the configured limit from the incoming metric.
default: drop_tag

mode

required string literal enum
Controls what approach is used internally to keep track of previously seen tags and determine when a tag on an incoming metric exceeds the limit.
Examples
"exact"
"probabilistic"
Enum options string literal
Option         Description
exact          Has higher memory requirements than probabilistic, but never falsely outputs metrics with new tags after the limit has been hit.
probabilistic  Has lower memory requirements than exact, but may occasionally allow metric events to pass through the transform even when they contain new tags that exceed the configured limit. The rate at which this happens can be controlled by changing the value of cache_size_per_tag.

value_limit

common optional uint
How many distinct values to accept for any given key.
default: 500

Telemetry

Metrics

component_received_event_bytes_total

counter
The number of event bytes accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
container_name optional
The name of the container from which the data originated.
file optional
The file from which the data originated.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the data originated.
peer_path optional
The pathname from which the data originated.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the data originated.
uri optional
The sanitized URI from which the data originated.

component_received_events_total

counter
The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
container_name optional
The name of the container from which the data originated.
file optional
The file from which the data originated.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the data originated.
peer_path optional
The pathname from which the data originated.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the data originated.
uri optional
The sanitized URI from which the data originated.

component_sent_event_bytes_total

counter
The total number of event bytes emitted by this component.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

component_sent_events_total

counter
The total number of events emitted by this component.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

events_in_total

counter
The number of events accepted by this component either from tagged origins like file and uri, or cumulatively from other origins. This metric is deprecated and will be removed in a future version. Use component_received_events_total instead.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
container_name optional
The name of the container from which the data originated.
file optional
The file from which the data originated.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the data originated.
peer_path optional
The pathname from which the data originated.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the data originated.
uri optional
The sanitized URI from which the data originated.

events_out_total

counter
The total number of events emitted by this component. This metric is deprecated and will be removed in a future version. Use component_sent_events_total instead.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

processed_bytes_total

counter
The number of bytes processed by the component.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
container_name optional
The name of the container from which the bytes originate.
file optional
The file from which the bytes originate.
host required
The hostname of the system Vector is running on.
mode optional
The connection mode used by the component.
peer_addr optional
The IP from which the bytes originate.
peer_path optional
The pathname from which the bytes originate.
pid required
The process ID of the Vector instance.
pod_name optional
The name of the pod from which the bytes originate.
uri optional
The sanitized URI from which the bytes originate.

processed_events_total

counter
The total number of events processed by this component. This metric is deprecated in favor of the component_received_events_total and component_sent_events_total metrics.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

tag_value_limit_exceeded_total

counter
The total number of events discarded because a tag was rejected after reaching the configured value_limit.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

utilization

gauge
A ratio from 0 to 1 of the load on a component. A value of 0 indicates a completely idle component that is simply waiting for input. A value of 1 indicates a component that is never idle. This value is updated every 5 seconds.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

value_limit_reached_total

counter
The total number of times new values for a key have been rejected because the value limit has been reached.
component_id required
The Vector component ID.
component_kind required
The Vector component kind.
component_name required
Deprecated, use component_id instead. The value is the same as component_id.
component_type required
The Vector component type.
host required
The hostname of the system Vector is running on.
pid required
The process ID of the Vector instance.

Examples

Drop high-cardinality tag

Given these events...
[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_2"}}}]
...and this configuration...
[transforms.my_transform_id]
type = "tag_cardinality_limit"
inputs = [ "my-source-or-transform-id" ]
mode = "exact"
value_limit = 1
limit_exceeded_action = "drop_tag"
---
transforms:
  my_transform_id:
    type: tag_cardinality_limit
    inputs:
      - my-source-or-transform-id
    mode: exact
    value_limit: 1
    limit_exceeded_action: drop_tag
{
  "transforms": {
    "my_transform_id": {
      "type": "tag_cardinality_limit",
      "inputs": [
        "my-source-or-transform-id"
      ],
      "mode": "exact",
      "value_limit": 1,
      "limit_exceeded_action": "drop_tag"
    }
  }
}
...these Vector events are produced:
[{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{"user_id":"user_id_1"}}},{"metric":{"counter":{"value":2},"kind":"incremental","name":"logins","tags":{}}}]
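The behavior in this example can be modeled with a short Python sketch. This is a simplified illustration of exact mode under stated assumptions, not Vector's actual implementation: the function name limit_tag_cardinality and the reduced event shape (a dict with a "tags" mapping) are hypothetical, and the handling of drop_event (check every tag first, then record values only for events that pass) is an assumption.

```python
from collections import defaultdict


def limit_tag_cardinality(events, value_limit=500, limit_exceeded_action="drop_tag"):
    """Simplified model of mode = "exact"; not Vector's actual implementation."""
    accepted = defaultdict(set)  # tag key -> distinct values accepted so far
    output = []

    def exceeds(key, value):
        # A value exceeds the limit if it is new and the key is already full.
        return value not in accepted[key] and len(accepted[key]) >= value_limit

    for event in events:
        tags = dict(event["tags"])
        if limit_exceeded_action == "drop_event":
            if any(exceeds(k, v) for k, v in tags.items()):
                continue  # discard the whole metric event
        else:
            # drop_tag: keep the event but strip the offending tags
            tags = {k: v for k, v in tags.items() if not exceeds(k, v)}
        for k, v in tags.items():
            accepted[k].add(v)  # remember every value that was let through
        output.append({**event, "tags": tags})
    return output


events = [
    {"name": "logins", "tags": {"user_id": "user_id_1"}},
    {"name": "logins", "tags": {"user_id": "user_id_2"}},
]
result = limit_tag_cardinality(events, value_limit=1)
```

With value_limit set to 1, the first user_id value is accepted and the second event keeps its metric data but loses the user_id tag, matching the output shown above. Passing limit_exceeded_action="drop_event" to the same sketch discards the second event entirely.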

How it works

Intended Usage

This transform is intended as a protection mechanism against upstream mistakes, such as a developer accidentally adding a request_id tag. When this happens, fix the upstream error as soon as possible, because Vector’s cardinality cache is held in memory and is erased when Vector is restarted. After a restart, new tag values pass through until the cardinality limit is reached again. In normal usage this should not be a common problem, since Vector processes are normally long-lived.

Memory Utilization

This transform stores in memory a copy of the key for every tag on every metric event seen by this transform. In mode exact, a copy of every distinct value for each key is also kept in memory, until value_limit distinct values have been seen for a given key, at which point new values for that key will be rejected. So to estimate the memory usage of this transform in mode exact you can use the following formula:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `value_limit` * average length of the values of tags for your
metrics)

In mode probabilistic, rather than storing all values seen for each key, each distinct key has a bloom filter which can probabilistically determine whether a given value has been seen for that key. The formula for estimating memory usage in mode probabilistic is:

(number of distinct field names in the tags for your metrics * average length of
the field names for the tags) + (number of distinct field names in the tags of
your metrics * `cache_size_per_tag`)
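The two formulas above translate directly into code. The following Python sketch is only an estimate helper: the function and parameter names are hypothetical, lengths are treated as byte counts, and the sample inputs (20 distinct tag keys, 12-byte keys, 16-byte values) are made up for illustration.

```python
def exact_mode_bytes(distinct_keys, avg_key_len, value_limit, avg_value_len):
    """Estimated memory for mode = "exact": keys plus up to value_limit values per key."""
    return distinct_keys * avg_key_len + distinct_keys * value_limit * avg_value_len


def probabilistic_mode_bytes(distinct_keys, avg_key_len, cache_size_per_tag=5_120_000):
    """Estimated memory for mode = "probabilistic": keys plus one bloom filter per key."""
    return distinct_keys * avg_key_len + distinct_keys * cache_size_per_tag


# Hypothetical workload: 20 distinct tag keys, 12-byte keys, 16-byte values.
exact_estimate = exact_mode_bytes(20, 12, value_limit=500, avg_value_len=16)
probabilistic_estimate = probabilistic_mode_bytes(20, 12)
```

Comparing the two estimates for your own workload is a quick way to check whether a given cache_size_per_tag is appropriately sized relative to value_limit and the typical length of your tag values.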

The cache_size_per_tag option controls the size of the bloom filter used for storing the set of acceptable values for any single key. The larger the bloom filter, the lower the false positive rate, which in our case means the less likely we are to allow a new tag value that would otherwise violate a configured limit. If you want to know the exact false positive rate for a given cache_size_per_tag and value_limit, there are many free online bloom filter calculators that can answer this. The formula is generally presented in terms of ‘n’, ‘p’, ‘k’, and ‘m’, where ‘n’ is the number of items in the filter (value_limit in our case), ‘p’ is the probability of false positives (what we want to solve for), ‘k’ is the number of hash functions used internally, and ‘m’ is the number of bits in the bloom filter. You should be able to provide values for just ‘n’ and ‘m’ and get back the value for ‘p’ with an optimal ‘k’ selected for you. Remember when converting from cache_size_per_tag to the ‘m’ value to plug into the calculator that cache_size_per_tag is in bytes, while ‘m’ is usually presented in bits (there are 8 bits in a byte).
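As an alternative to an online calculator, the standard bloom filter approximation p = (1 - e^(-k*n/m))^k with the optimal k = (m/n) * ln 2 can be evaluated in a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the cache size is taken to be in bytes as described for cache_size_per_tag, and the number of hash functions Vector actually uses internally may differ from the optimal k assumed here.

```python
import math


def bloom_false_positive_rate(cache_size_per_tag, value_limit):
    """Approximate false positive rate, assuming an optimal number of hash functions."""
    m = cache_size_per_tag * 8  # the option is in bytes; the formula expects bits
    n = value_limit             # number of items stored in the filter
    k = max(1, round((m / n) * math.log(2)))  # optimal hash count (assumption)
    return (1.0 - math.exp(-k * n / m)) ** k


# A deliberately small 1000-byte cache with the default value_limit of 500.
small_cache_rate = bloom_false_positive_rate(1_000, 500)

# The default 5120000-byte cache with the default value_limit of 500.
default_cache_rate = bloom_false_positive_rate(5_120_000, 500)
```

Under this approximation, increasing cache_size_per_tag for a fixed value_limit lowers the estimated rate, which is the trade-off described above.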

Restarts

This transform’s cache is held in memory, and therefore, restarting Vector will reset the cache. This means that new values will be passed through until the cardinality limit is reached again. See intended usage for more info.

State

This component is stateful, meaning its behavior changes based on previous inputs (events). State is not preserved across restarts, therefore state-dependent behavior will reset between restarts and depend on the inputs (events) received since the most recent restart.