Kubernetes Logs Source
The Vector kubernetes_logs source collects all log data for Kubernetes Nodes,
automatically enriching data with Kubernetes metadata via the Kubernetes API.
Requirements
Setup
This component is part of a larger setup strategy for the Kubernetes platform.
Configuration
[sources.my_source_id]
type = "kubernetes_logs" # required
- annotation_fields (optional, table)
  Configuration for how the events are annotated with Pod metadata.
  - container_image (optional, string)
    Event field for Container image.
    - Syntax: literal
    - Default: "kubernetes.container_image"
  - container_name (optional, string)
    Event field for Container name.
    - Syntax: literal
    - Default: "kubernetes.container_name"
  - pod_ip (optional, string)
    Event field for Pod IPv4 Address.
    - Syntax: literal
    - Default: "kubernetes.pod_ip"
  - pod_ips (optional, string)
    Event field for Pod IPv4 and IPv6 Addresses.
    - Syntax: literal
    - Default: "kubernetes.pod_ips"
  - pod_labels (optional, string)
    Event field for Pod labels.
    - Syntax: literal
    - Default: "kubernetes.pod_labels"
  - pod_name (optional, string)
    Event field for Pod name.
    - Syntax: literal
    - Default: "kubernetes.pod_name"
  - pod_namespace (optional, string)
    Event field for Pod namespace.
    - Syntax: literal
    - Default: "kubernetes.pod_namespace"
  - pod_node_name (optional, string)
    Event field for Pod node_name.
    - Syntax: literal
    - Default: "kubernetes.pod_node_name"
  - pod_uid (optional, string)
    Event field for Pod uid.
    - Syntax: literal
    - Default: "kubernetes.pod_uid"
- auto_partial_merge (optional, bool)
  Automatically merge partial messages into a single event. Here, "partial" refers to messages that were split by the Kubernetes Container Runtime log driver.
  - Default: true
- data_dir (optional, string)
  The directory used to persist file checkpoint positions. By default, the global data_dir option is used. Make sure Vector has write permissions to this directory. See Checkpointing for more info. This field accepts a valid file system path.
  - Syntax: file_system_path
- exclude_paths_glob_patterns (optional, [string])
  A list of glob patterns to exclude from reading the files. See Filtering for more info.
  - Default: []
- extra_field_selector (optional, string)
  Specifies the field selector to filter Pods with, to be used in addition to the built-in Node filter. See Filtering for more info.
  - Syntax: literal
  - Default: ""
- extra_label_selector (optional, string)
  Specifies the label selector to filter Pods with, to be used in addition to the built-in vector.dev/exclude filter. See Filtering for more info.
  - Syntax: literal
  - Default: ""
- self_node_name (optional, string)
  The name of the Kubernetes Node this Vector instance runs at. Configured to use an env var by default, to be evaluated to a value provided by Kubernetes at Pod deploy time.
  - Syntax: literal
  - Default: "${VECTOR_SELF_NODE_NAME}"
Output
This component outputs log events with the following fields:
{
  "file": "/var/log/pods/pod-namespace_pod-name_pod-uid/container/1.log",
  "kubernetes.container_image": "busybox:1.30",
  "kubernetes.container_name": "coredns",
  "kubernetes.pod_ip": "192.168.1.1",
  "kubernetes.pod_ips": "192.168.1.1",
  "kubernetes.pod_name": "coredns-qwertyuiop-qwert",
  "kubernetes.pod_namespace": "kube-system",
  "kubernetes.pod_node_name": "minikube",
  "kubernetes.pod_uid": "ba46d8c9-9541-4f6b-bbf9-d23b36f2f136",
  "message": "53.126.150.246 - - [01/Oct/2020:11:25:58 -0400] \"GET /disintermediate HTTP/2.0\" 401 20308",
  "mylabel": "myvalue",
  "source_type": "kubernetes_logs",
  "stream": "stdout",
  "timestamp": "2020-10-10T17:07:36+00:00"
}
- file (common, required, string)
  The absolute path of the originating file.
  - Syntax: literal
- kubernetes.container_image (common, optional, string)
  Container image.
  - Syntax: literal
- kubernetes.container_name (common, optional, string)
  Container name.
  - Syntax: literal
- kubernetes.pod_ip (common, optional, string)
  Pod IPv4 address.
  - Syntax: literal
- kubernetes.pod_ips (common, optional, string)
  Pod IPv4 and IPv6 addresses.
  - Syntax: literal
- kubernetes.pod_labels (common, optional, table)
  Pod labels.
- kubernetes.pod_name (common, optional, string)
  Pod name.
  - Syntax: literal
- kubernetes.pod_namespace (common, optional, string)
  Pod namespace.
  - Syntax: literal
- kubernetes.pod_node_name (common, optional, string)
  Pod node name.
  - Syntax: literal
- kubernetes.pod_uid (common, optional, string)
  Pod uid.
  - Syntax: literal
- message (common, required, string)
  The raw line from the Pod log file.
  - Syntax: literal
- source_type (common, required, string)
  The name of the source type.
  - Syntax: literal
- stream (common, required, string)
  The name of the stream the log line was submitted to.
  - Syntax: literal
- timestamp (common, required, timestamp)
  The exact time the event was ingested into Vector.
Telemetry
This component provides the following metrics that can be retrieved through
the internal_metrics source. See the metrics section in the monitoring page
for more info.
- k8s_format_picker_edge_cases_total (counter)
  The total number of edge cases encountered while picking format of the Kubernetes log message. This metric includes the following tags:
  - component_kind: The Vector component kind.
  - component_name: The Vector component ID.
  - component_type: The Vector component type.
  - instance: The Vector instance identified by host and port.
  - job: The name of the job producing Vector metrics.
- k8s_docker_format_parse_failures_total (counter)
  The total number of failures to parse a message as a JSON object. This metric includes the following tags:
  - component_kind: The Vector component kind.
  - component_name: The Vector component ID.
  - component_type: The Vector component type.
  - instance: The Vector instance identified by host and port.
  - job: The name of the job producing Vector metrics.
- k8s_event_annotation_failures_total (counter)
  The total number of failures to annotate Vector events with Kubernetes Pod metadata. This metric includes the following tags:
  - component_kind: The Vector component kind.
  - component_name: The Vector component ID.
  - component_type: The Vector component type.
  - instance: The Vector instance identified by host and port.
  - job: The name of the job producing Vector metrics.
- processed_bytes_total (counter)
  The total number of bytes processed by the component. This metric includes the following tags:
  - component_kind: The Vector component kind.
  - component_name: The Vector component ID.
  - component_type: The Vector component type.
  - instance: The Vector instance identified by host and port.
  - job: The name of the job producing Vector metrics.
- events_out_total (counter)
  The total number of events emitted by this component. This metric includes the following tags:
  - component_kind: The Vector component kind.
  - component_name: The Vector component ID.
  - component_type: The Vector component type.
  - instance: The Vector instance identified by host and port.
  - job: The name of the job producing Vector metrics.
- processed_events_total (counter)
  The total number of events processed by this component. This metric includes the following tags:
  - component_kind: The Vector component kind.
  - component_name: The Vector component ID.
  - component_type: The Vector component type.
  - file: The file that produced the error.
  - instance: The Vector instance identified by host and port.
  - job: The name of the job producing Vector metrics.
Examples
Given the following input:
F1015 11:01:46.499073 1 main.go:39] error getting server version: Get "https://10.96.0.1:443/version?timeout=32s": dial tcp 10.96.0.1:443: connect: network is unreachable
And the following configuration:
[sources.kubernetes_logs]
type = "kubernetes_logs"
The following Vector log event will be output:
{
  "file": "/var/log/pods/kube-system_storage-provisioner_93bde4d0-9731-4785-a80e-cd27ba8ad7c2/storage-provisioner/1.log",
  "kubernetes.container_image": "gcr.io/k8s-minikube/storage-provisioner:v3",
  "kubernetes.container_name": "storage-provisioner",
  "kubernetes.pod_ip": "192.168.1.1",
  "kubernetes.pod_ips": ["192.168.1.1", "::1"],
  "kubernetes.pod_labels": {
    "addonmanager.kubernetes.io/mode": "Reconcile",
    "gcp-auth-skip-secret": "true",
    "integration-test": "storage-provisioner"
  },
  "kubernetes.pod_name": "storage-provisioner",
  "kubernetes.pod_namespace": "kube-system",
  "kubernetes.pod_node_name": "minikube",
  "kubernetes.pod_uid": "93bde4d0-9731-4785-a80e-cd27ba8ad7c2",
  "message": "F1015 11:01:46.499073 1 main.go:39] error getting server version: Get \"https://10.96.0.1:443/version?timeout=32s\": dial tcp 10.96.0.1:443: connect: network is unreachable",
  "source_type": "kubernetes_logs",
  "stream": "stderr",
  "timestamp": "2020-10-15T11:01:46.499555308Z"
}
How It Works
Checkpointing
Vector checkpoints the current read position after each
successful read. This ensures that Vector resumes where it left
off if restarted, preventing data from being read twice. The
checkpoint positions are stored in the data directory, which is
specified via the global data_dir option but can be overridden
via the data_dir option on the kubernetes_logs source directly.
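As a minimal sketch of the override described above, the fragment below sets a global data_dir and a separate one for this source. The source ID my_source_id and both paths are illustrative assumptions; the directories are assumed to exist and to be writable by Vector.

```toml
# Global data directory, used for checkpoints by default (assumed path).
data_dir = "/var/lib/vector"

[sources.my_source_id]
type = "kubernetes_logs"
# Per-source override (hypothetical path); checkpoints for this source go here.
data_dir = "/var/lib/vector/kubernetes_logs"
```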
Container exclusion
The kubernetes_logs source can skip the logs from the individual containers
of a particular Pod. Add an annotation vector.dev/exclude-containers to the
Pod, and enumerate the names of all the containers to exclude in the value of
the annotation like so:
vector.dev/exclude-containers: "container1,container2"
This annotation will make Vector skip logs originating from container1 and
container2 of the Pod marked with the annotation, while logs from other
containers in the Pod will still be collected.
Context
By default, the kubernetes_logs source will augment events with helpful
context keys as shown in the "Output" section.
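The names of these context keys can be changed through the annotation_fields option listed under Configuration. The following is a sketch only: the source ID and the target field names (k8s.pod and so on) are hypothetical, and any option left out is assumed to keep its documented default.

```toml
[sources.my_source_id]
type = "kubernetes_logs"

# Rename selected context keys; omitted options keep their defaults.
[sources.my_source_id.annotation_fields]
pod_name = "k8s.pod"             # hypothetical field name
pod_namespace = "k8s.namespace"  # hypothetical field name
container_name = "k8s.container" # hypothetical field name
```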
Enrichment
Vector will enrich data with Kubernetes context. A comprehensive
list of fields can be found in the kubernetes_logs source output docs.
Filtering
Vector provides rich filtering options for Kubernetes log collection:
- Built-in Pod and container exclusion rules.
- The exclude_paths_glob_patterns option allows you to exclude Kubernetes log files by the file name and path.
- The extra_field_selector option specifies the field selector to filter Pods with, to be used in addition to the built-in Node filter.
- The extra_label_selector option specifies the label selector to filter Pods with, to be used in addition to the built-in vector.dev/exclude filter.
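The three options can be combined in one source definition. The fragment below is a hedged sketch: the source ID, the container name istio-proxy, the label app=noisy-app and the choice of namespace are all made-up values, and the glob assumes the container name appears as a directory in the log path, as it does in the Output example.

```toml
[sources.my_source_id]
type = "kubernetes_logs"

# Skip log files of any container named "istio-proxy" (hypothetical name).
exclude_paths_glob_patterns = ["**/istio-proxy/**"]

# Kubernetes label selector: skip Pods labelled app=noisy-app (hypothetical label).
extra_label_selector = "app!=noisy-app"

# Kubernetes field selector: skip Pods in the kube-system namespace.
extra_field_selector = "metadata.namespace!=kube-system"
```

Both selectors use standard Kubernetes selector syntax and apply in addition to the built-in filters, so Pods labelled vector.dev/exclude: "true" are still skipped.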
Kubernetes API access control
Vector requires access to the Kubernetes API.
Specifically, the kubernetes_logs source uses the /api/v1/pods
endpoint to "watch" the pods from all namespaces.
Modern Kubernetes clusters run with the RBAC (role-based access control)
scheme. RBAC-enabled clusters require some configuration to grant Vector
the authorization to access the Kubernetes API endpoints. As RBAC is
currently the standard way of controlling access to the Kubernetes API,
we ship the necessary configuration out of the box: see the ClusterRole,
ClusterRoleBinding and ServiceAccount in our kubectl YAML config, and the
rbac configuration in the Helm chart.
If your cluster doesn't use any access control scheme and doesn't restrict access to the Kubernetes API, you don't need to do any extra configuration; Vector will just work.
Clusters using the legacy ABAC scheme are not officially supported
(although Vector might work if you configure access properly);
we encourage switching to RBAC. If you use a custom access control
scheme, make sure the Vector Pod/ServiceAccount is granted access to
the /api/v1/pods resource.
Kubernetes API communication
Vector communicates with the Kubernetes API to enrich the data it collects with Kubernetes context. Therefore, Vector must be able to communicate with the Kubernetes API server. If Vector is running in a Kubernetes cluster, then Vector will connect to that cluster using the Kubernetes-provided access information.
In addition to access, Vector implements proper desync handling to ensure communication is safe and reliable. This ensures that Vector will not overwhelm the Kubernetes API or compromise its stability.
Partial message merging
Vector, by default, will merge partial messages that are
split due to the Docker size limit. For everything else, it
is recommended to use the reduce transform, which offers
the ability to handle custom merging of things like
stack traces.
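A rough sketch of pairing the built-in merge with a reduce transform follows. The transform ID merge_stacktraces is hypothetical, and the starts_when condition assumes a Vector version in which conditions accept VRL expressions; older releases use a different condition syntax, so check the reduce transform docs for your version. The rule itself (a new event starts at any line that does not begin with whitespace) is only one possible heuristic for stack traces.

```toml
[sources.my_source_id]
type = "kubernetes_logs"
auto_partial_merge = true # the default, shown for clarity

# Hypothetical transform: fold indented continuation lines
# (such as stack trace frames) into the preceding event.
[transforms.merge_stacktraces]
type = "reduce"
inputs = ["my_source_id"]
# Merge only lines that come from the same log file.
group_by = ["file"]
# Join the merged message lines with newlines.
merge_strategies.message = "concat_newline"
# Assumed VRL condition: a line not starting with whitespace begins a new event.
starts_when = "match(string!(.message), r'^[^\\s]')"
```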
Pod exclusion
By default, the kubernetes_logs source will skip logs from the Pods that
have a vector.dev/exclude: "true" label.
You can configure additional exclusion rules via label or field selectors,
see the available options.
Pod removal
To ensure all data is collected, Vector will continue to collect logs from the
Pod for some time after its removal. This ensures that Vector obtains some of
the most important data, such as crash details.
Resource limits
Vector recommends the following resource limits.
Agent resource limits
If you deploy Vector as an agent (collecting data for each of your Nodes), then we recommend the following limits:
resources:
  requests:
    memory: "64Mi"
    cpu: "500m"
  limits:
    memory: "1024Mi"
    cpu: "6000m"
As with all Kubernetes resource limit recommendations, use these as a reference point and adjust as necessary. If your configured Vector pipeline is complex, you may need more resources. If your pipeline is simple, you may need less.
State
This component is stateless, meaning its behavior is consistent across each input.
State management
Agent state management
For the agent role, Vector stores its state at the host-mapped dir with a static path, so if it's redeployed it'll continue from where it was interrupted.
Testing & reliability
Vector is tested extensively against Kubernetes. In addition to Kubernetes being Vector's most popular installation method, Vector implements a comprehensive end-to-end test suite for all minor Kubernetes versions starting with 1.14.