Monitoring and observing Vector

Use logs and metrics generated by Vector itself in your Vector topology

Although Vector is primarily used to handle observability data from a wide variety of sources, we also strive to make Vector highly observable itself. To that end, Vector provides two sources, internal_logs and internal_metrics, that you can use to handle logs and metrics produced by Vector just like you would logs and metrics from any other source.

Logs

Vector provides clear, informative, well-structured logs via the internal_logs source. This section shows you how to use them in your Vector topology.

The log level, which defaults to info, determines which logs Vector pipes through the internal_logs source.

Accessing logs

You can access Vector’s logs by adding an internal_logs source to your topology. Here’s an example configuration that takes Vector’s logs and pipes them to the console as plain text:

[sources.vector_logs]
type = "internal_logs"

[sinks.console]
type = "console"
inputs = ["vector_logs"]
# Emit each log as plain text (the console sink needs an encoding)
encoding.codec = "text"

Using Vector logs

Once Vector logs enter your topology through the internal_logs source, you can treat them like logs from any other system: you can transform them and send them off to any number of sinks. The configuration below, for example, transforms Vector’s logs using the remap transform and Vector Remap Language and then stores those logs in ClickHouse:

[sources.vector_logs]
type = "internal_logs"

[transforms.modify]
type = "remap"
inputs = ["vector_logs"]

# Reformat the timestamp to Unix time
source = '''
  .timestamp = to_unix_timestamp!(to_timestamp!(.timestamp))
'''

[sinks.database]
type = "clickhouse"
inputs = ["modify"]
host = "http://localhost:8123"
table = "vector-log-data"
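
Because internal logs behave like any other log events, you can also filter them before they reach a sink. The following is a minimal sketch rather than a definitive recipe: it assumes that events from the internal_logs source carry their severity in a metadata.level field with uppercase values such as WARN and ERROR, so check the output of your Vector version before relying on it. The component names (important_only, console) are purely illustrative.

```toml
[sources.vector_logs]
type = "internal_logs"

# Keep only warnings and errors.
# Assumption: the severity lives in .metadata.level and is uppercase.
[transforms.important_only]
type = "filter"
inputs = ["vector_logs"]
condition = '.metadata.level == "WARN" || .metadata.level == "ERROR"'

# Print the surviving events as JSON so the metadata stays visible
[sinks.console]
type = "console"
inputs = ["important_only"]
encoding.codec = "json"
```

Filtering inside the topology leaves Vector's global log level untouched, so other sinks can still receive the full stream from the same source.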

Configuring logs

Levels

Vector logs at the info level by default. You can set a different level when starting up your instance using either command-line flags or the LOG environment variable. The table below details these options:

| Method | Description |
| --- | --- |
| -v flag | Drops the log level to debug |
| -vv flag | Drops the log level to trace |
| -q flag | Raises the log level to warn |
| -qq flag | Raises the log level to error |
| -qqq flag | Disables logging |
| LOG=<level> environment variable | Sets the log level. Must be one of trace, debug, info, warn, error, off. |

Stack traces

You can enable full error backtraces by setting the RUST_BACKTRACE=full environment variable. See the Troubleshooting guide for more on this.

Metrics

You can monitor metrics produced by Vector using the internal_metrics source. As with Vector’s internal logs, you can configure an internal_metrics source and use the piped-in metrics however you wish.
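
As a minimal sketch of such a setup, the configuration below wires an internal_metrics source to a prometheus_exporter sink so that Prometheus can scrape Vector's own metrics. The component names (vector_metrics, prometheus) and the listen address are illustrative assumptions, not values taken from this guide, so adjust them for your deployment.

```toml
# Collect the metrics that Vector produces about itself
[sources.vector_metrics]
type = "internal_metrics"

# Expose those metrics over HTTP for Prometheus to scrape
[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["vector_metrics"]
address = "0.0.0.0:9598" # assumed listen address
```

Any other sink that accepts metric events could take the place of the exporter here.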

Metrics catalogue

The table below provides a list of internal metrics provided by Vector. See the docs for the internal_metrics source for more detailed information about the available metrics.

| Name | Description | Data type |
| --- | --- | --- |
| adaptive_concurrency_averaged_rtt | The average round-trip time (RTT) for the current window. | histogram |
| adaptive_concurrency_in_flight | The number of outbound requests currently awaiting a response. | histogram |
| adaptive_concurrency_limit | The concurrency limit that the adaptive concurrency feature has decided on for this current window. | histogram |
| adaptive_concurrency_observed_rtt | The observed round-trip time (RTT) for requests. | histogram |
| aggregate_events_recorded_total | The number of events recorded by the aggregate transform. | counter |
| aggregate_failed_updates | The number of failed metric updates, incremental adds, encountered by the aggregate transform. | counter |
| aggregate_flushes_total | The number of flushes done by the aggregate transform. | counter |
| api_started_total | The number of times the Vector GraphQL API has been started. | counter |
| checkpoint_write_errors_total | The total number of errors writing checkpoints. | counter |
| checkpoints_total | The total number of files checkpointed. | counter |
| checksum_errors_total | The total number of errors identifying files via checksum. | counter |
| collect_completed_total | The total number of metrics collections completed for this component. | counter |
| collect_duration_seconds | The time spent collecting metrics for this component. | histogram |
| command_executed_total | The total number of times a command has been executed. | counter |
| command_execution_duration_seconds | The command execution duration in seconds. | histogram |
| communication_errors_total | The total number of errors stemming from communication with the Docker daemon. | counter |
| config_load_errors_total | The total number of errors loading the Vector configuration. | counter |
| connection_errors_total | The total number of connection errors for this Vector instance. | counter |
| connection_established_total | The total number of times a connection has been established. | counter |
| connection_failed_total | The total number of times a connection has failed. | counter |
| connection_read_errors_total | The total number of errors reading datagrams. | counter |
| connection_send_ack_errors_total | The total number of protocol acknowledgement errors for this Vector instance for source protocols that support acknowledgements. | counter |
| connection_send_errors_total | The total number of errors sending data via the connection. | counter |
| connection_shutdown_total | The total number of times the connection has been shut down. | counter |
| consumer_offset_updates_failed_total | The total number of failures to update a Kafka consumer offset. | counter |
| container_metadata_fetch_errors_total | The total number of errors encountered when fetching container metadata. | counter |
| container_processed_events_total | The total number of container events processed. | counter |
| containers_unwatched_total | The total number of times Vector stopped watching for container logs. | counter |
| containers_watched_total | The total number of times Vector started watching for container logs. | counter |
| decode_errors_total | The total number of decode errors seen when decoding data in a source component. | counter |
| encode_errors_total | The total number of errors encountered when encoding an event. | counter |
| events_discarded_total | The total number of events discarded by this component. | counter |
| events_failed_total | The total number of failures to read a Kafka message. | counter |
| events_in_total | The number of events accepted by this component either from tagged origin like file and uri, or cumulatively from other origins. | counter |
| events_out_total | The total number of events emitted by this component. | counter |
| file_delete_errors_total | The total number of failures to delete a file. | counter |
| file_watch_errors_total | The total number of errors encountered when watching files. | counter |
| files_added_total | The total number of files Vector has found to watch. | counter |
| files_deleted_total | The total number of files deleted. | counter |
| files_resumed_total | The total number of times Vector has resumed watching a file. | counter |
| files_unwatched_total | The total number of times Vector has stopped watching a file. | counter |
| fingerprint_read_errors_total | The total number of times Vector failed to read a file for fingerprinting. | counter |
| glob_errors_total | The total number of errors encountered when globbing paths. | counter |
| http_bad_requests_total | The total number of HTTP 400 Bad Request errors encountered. | counter |
| http_client_response_rtt_seconds | The round-trip time (RTT) of HTTP requests, tagged with the response code. | histogram |
| http_client_responses_total | The total number of HTTP requests, tagged with the response code. | counter |
| http_client_rtt_seconds | The round-trip time (RTT) of HTTP requests. | histogram |
| http_error_response_total | The total number of HTTP error responses for this component. | counter |
| http_request_errors_total | The total number of HTTP request errors for this component. | counter |
| http_requests_total | The total number of HTTP requests issued by this component. | counter |
| invalid_record_bytes_total | The total number of bytes from invalid records that have been discarded. | counter |
| invalid_record_total | The total number of invalid records that have been discarded. | counter |
| k8s_docker_format_parse_failures_total | The total number of failures to parse a message as a JSON object. | counter |
| k8s_event_annotation_failures_total | The total number of failures to annotate Vector events with Kubernetes Pod metadata. | counter |
| k8s_format_picker_edge_cases_total | The total number of edge cases encountered while picking format of the Kubernetes log message. | counter |
| k8s_reflector_desyncs_total | The total number of desyncs for the reflector. | counter |
| k8s_state_ops_total | The total number of state operations. | counter |
| k8s_stream_chunks_processed_total | The total number of chunks processed from the stream of Kubernetes resources. | counter |
| k8s_stream_processed_bytes_total | The number of bytes processed from the stream of Kubernetes resources. | counter |
| k8s_watch_requests_failed_total | The total number of watch requests failed. | counter |
| k8s_watch_requests_invoked_total | The total number of watch requests invoked. | counter |
| k8s_watch_stream_failed_total | The total number of watch streams failed. | counter |
| k8s_watch_stream_items_obtained_total | The total number of items obtained from a watch stream. | counter |
| k8s_watcher_http_error_total | The total number of HTTP error responses for the Kubernetes watcher. | counter |
| kafka_consumed_messages_bytes_total | Total number of message bytes (including framing) received from Kafka brokers. | counter |
| kafka_consumed_messages_total | Total number of messages consumed, not including ignored messages (due to offset, etc.), from Kafka brokers. | counter |
| kafka_produced_messages_bytes_total | Total number of message bytes (including framing, such as per-Message framing and MessageSet/batch framing) transmitted to Kafka brokers. | counter |
| kafka_produced_messages_total | Total number of messages transmitted (produced) to Kafka brokers. | counter |
| kafka_queue_messages | Current number of messages in producer queues. | gauge |
| kafka_queue_messages_bytes | Current total size of messages in producer queues. | gauge |
| kafka_requests_bytes_total | Total number of bytes transmitted to Kafka brokers. | counter |
| kafka_requests_total | Total number of requests sent to Kafka brokers. | counter |
| kafka_responses_bytes_total | Total number of bytes received from Kafka brokers. | counter |
| kafka_responses_total | Total number of responses received from Kafka brokers. | counter |
| logging_driver_errors_total | The total number of logging driver errors encountered caused by not using either the jsonfile or journald driver. | counter |
| memory_used_bytes | The total memory currently being used by Vector (in bytes). | gauge |
| metadata_refresh_failed_total | The total number of failed efforts to refresh AWS EC2 metadata. | counter |
| metadata_refresh_successful_total | The total number of AWS EC2 metadata refreshes. | counter |
| open_connections | The number of current open connections to Vector. | gauge |
| parse_errors_total | The total number of errors parsing metrics for this component. | counter |
| processed_bytes_total | The number of bytes processed by the component. | counter |
| processed_events_total | The total number of events processed by this component. This metric is deprecated in favor of the events_in_total and events_out_total metrics. | counter |
| processing_errors_total | The total number of processing errors encountered by this component. | counter |
| protobuf_decode_errors_total | The total number of Protocol Buffers errors thrown during communication between Vector instances. | counter |
| quit_total | The total number of times the Vector instance has quit. | counter |
| recover_errors_total | The total number of errors caused by Vector failing to recover from a failed reload. | counter |
| reload_errors_total | The total number of errors encountered when reloading Vector. | counter |
| reloaded_total | The total number of times the Vector instance has been reloaded. | counter |
| request_automatic_decode_errors_total | The total number of request errors for this component when it attempted to automatically discover and handle the content-encoding of incoming request data. | counter |
| request_duration_seconds | The total request duration in seconds. | histogram |
| request_errors_total | The total number of request errors for this component. | counter |
| request_read_errors_total | The total number of request read errors for this component. | counter |
| requests_completed_total | The total number of requests completed by this component. | counter |
| requests_received_total | The total number of requests received by this component. | counter |
| send_errors_total | The total number of errors sending messages. | counter |
| sqs_message_delete_failed_total | The total number of failures to delete SQS messages. | counter |
| sqs_message_delete_succeeded_total | The total number of successful deletions of SQS messages. | counter |
| sqs_message_processing_failed_total | The total number of failures to process SQS messages. | counter |
| sqs_message_processing_succeeded_total | The total number of SQS messages successfully processed. | counter |
| sqs_message_receive_failed_total | The total number of failures to receive SQS messages. | counter |
| sqs_message_receive_succeeded_total | The total number of times SQS messages were successfully received. | counter |
| sqs_message_received_messages_total | The total number of received SQS messages. | counter |
| sqs_s3_event_record_ignored_total | The total number of times an S3 record in an SQS message was ignored (for an event that was not ObjectCreated). | counter |
| stale_events_flushed_total | The number of stale events that Vector has flushed. | counter |
| started_total | The total number of times the Vector instance has been started. | counter |
| stdin_reads_failed_total | The total number of errors reading from stdin. | counter |
| stopped_total | The total number of times the Vector instance has been stopped. | counter |
| tag_value_limit_exceeded_total | The total number of events discarded because the tag has been rejected after hitting the configured value_limit. | counter |
| timestamp_parse_errors_total | The total number of errors encountered parsing RFC 3339 timestamps. | counter |
| uptime_seconds | The total number of seconds the Vector instance has been up. | gauge |
| utf8_convert_errors_total | The total number of errors converting bytes to a UTF-8 string in UDP mode. | counter |
| value_limit_reached_total | The total number of times new values for a key have been rejected because the value limit has been reached. | counter |
| windows_service_does_not_exist_total | The total number of errors raised due to the Windows service not existing. | counter |
| windows_service_install_total | The total number of times the Windows service has been installed. | counter |
| windows_service_restart_total | The total number of times the Windows service has been restarted. | counter |
| windows_service_start_total | The total number of times the Windows service has been started. | counter |
| windows_service_stop_total | The total number of times the Windows service has been stopped. | counter |
| windows_service_uninstall_total | The total number of times the Windows service has been uninstalled. | counter |
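
To focus on a handful of the metrics listed above, you can narrow the stream with a filter transform. The sketch below keeps only the two throughput counters and prints them to the console. It assumes that metric events expose their name as .name in Vector Remap Language conditions, and the component names (throughput_only, console) are hypothetical, so treat it as a starting point rather than a reference configuration.

```toml
[sources.vector_metrics]
type = "internal_metrics"

# Keep only the throughput counters from the catalogue.
# Assumption: the metric name is available as .name in the condition.
[transforms.throughput_only]
type = "filter"
inputs = ["vector_metrics"]
condition = '.name == "events_in_total" || .name == "events_out_total"'

# Print the selected metrics as JSON for inspection
[sinks.console]
type = "console"
inputs = ["throughput_only"]
encoding.codec = "json"
```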

Troubleshooting

For more information, see our troubleshooting guide.

How it works

Event-driven observability

Vector employs an event-driven observability strategy that ensures consistent and correlated telemetry data. You can read more about our approach in RFC 2064.

Log rate limiting

Vector rate limits log events in the hot path. This gives you granular insight without the risk of saturating I/O and disrupting the service. The trade-off is that repeated log messages are suppressed rather than logged every time.