Kubernetes, also known as k8s, is an open-source container-orchestration system for automating application deployment, scaling, and management. This page will cover installing and managing Vector on the Kubernetes platform.
The agent role is designed to collect all Kubernetes log data on each Node. Vector runs as a DaemonSet and tails the logs of every Pod on the Node, automatically enriching them with Kubernetes metadata via the Kubernetes API. Collection is handled automatically, and you are expected to adjust your pipeline as necessary using Vector's sources, transforms, and sinks.
Add the Vector repo:

helm repo add timberio https://packages.timber.io/helm/latest
Check the available Helm chart configuration options:

helm show values timberio/vector-agent
Configure Vector:

cat <<-'VALUES' > values.yaml
# The Vector Kubernetes integration automatically defines a
# kubernetes_logs source that is made available to you.
# You do not need to define a log source.
sinks:
  # Adjust as necessary. By default we use the console sink
  # to print all data. This allows you to see Vector working.
  # /docs/reference/sinks/
  stdout:
    type: console
    inputs: ["kubernetes_logs"]
    target: "stdout"
    encoding: "json"
VALUES
Install Vector:

helm install --namespace vector --create-namespace vector timberio/vector-agent --values values.yaml
Observe Vector:

kubectl logs --namespace vector daemonset/vector-agent
Vector is an end-to-end observability data pipeline designed to deploy under various roles. You mix and match these roles to create topologies. The intent is to make Vector as flexible as possible, allowing you to fluidly integrate Vector into your infrastructure over time. The deployment section demonstrates common Vector pipelines:
Restart Vector:

kubectl rollout restart --namespace vector daemonset/vector-agent

Observe Vector:

kubectl logs --namespace vector daemonset/vector-agent

Upgrade Vector:

helm repo update && helm upgrade --namespace vector vector timberio/vector-agent --reuse-values

Uninstall Vector:

helm uninstall --namespace vector vector
How it works
Checkpointing

Vector checkpoints the current read position after each successful read. This ensures that Vector resumes where it left off if restarted, preventing data from being read twice. The checkpoint positions are stored in the data directory, which is specified via the global data_dir option but can be overridden via the data_dir option in the file source directly.
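For instance, the global data directory can be set at the top level of the Vector configuration; the path below is illustrative, so adjust it to your deployment:

```yaml
# Global data directory; checkpoint positions are stored here.
# The path is an example, not a required value.
data_dir: "/var/lib/vector"
```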
Container exclusion

The kubernetes_logs source can skip the logs from individual containers of a particular Pod. Add a vector.dev/exclude-containers annotation to the Pod, and enumerate the names of all the containers to exclude in the value of the annotation like so:
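A minimal sketch of such an annotation on a Pod manifest; the Pod and container names are hypothetical, and the annotation value is a comma-separated list of container names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod                       # hypothetical Pod name
  annotations:
    # Vector will skip logs from these containers.
    vector.dev/exclude-containers: "sidecar,istio-proxy"
spec:
  containers:
    - name: app                      # logs from this container are collected
      image: my-app:latest           # hypothetical image
    - name: sidecar                  # excluded
      image: my-sidecar:latest
    - name: istio-proxy              # excluded
      image: istio/proxyv2
```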
By default, the kubernetes_logs source will augment events with helpful context keys, as shown in the "Output" section. Vector enriches the data with Kubernetes context; a comprehensive list of fields can be found in the kubernetes_logs source output docs.
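As an illustration (the field values below are hypothetical; consult the output docs for the authoritative field list), an enriched event might look like:

```json
{
  "message": "GET /healthz 200",
  "kubernetes": {
    "pod_name": "my-app-6d9f7c9b4-x2x2v",
    "pod_namespace": "default",
    "pod_node_name": "node-1",
    "container_name": "app",
    "pod_labels": { "app": "my-app" }
  },
  "timestamp": "2020-10-10T17:07:36.452332Z"
}
```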
Vector provides rich filtering options for Kubernetes log collection:

The exclude_paths_glob_patterns option allows you to exclude Kubernetes log files by file name and path.
The extra_field_selector option specifies the field selector to filter Pods with, to be used in addition to the built-in Node filter.
The extra_label_selector option specifies the label selector to filter Pods with, to be used in addition to the built-in vector.dev/exclude filter.
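A sketch of these options on the kubernetes_logs source; the selector values and glob patterns below are illustrative, not recommendations:

```yaml
sources:
  kubernetes_logs:
    type: kubernetes_logs
    # Skip log files matching these glob patterns (example pattern).
    exclude_paths_glob_patterns:
      - "/var/log/pods/kube-system_*/**"
    # Additional field selector, applied on top of the built-in filters.
    extra_field_selector: "metadata.namespace!=kube-system"
    # Additional label selector, applied on top of the built-in filters.
    extra_label_selector: "my.example/exclude!=true"
```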
Kubernetes API access control
Vector requires access to the Kubernetes API. Specifically, the kubernetes_logs source uses the /api/v1/pods endpoint to "watch" the Pods from all Namespaces.
Modern Kubernetes clusters run with an RBAC (role-based access control) scheme. RBAC-enabled clusters require some configuration to grant Vector the authorization to access the Kubernetes API endpoints. As RBAC is currently the standard way of controlling access to the Kubernetes API, we ship the necessary configuration out of the box: see the ClusterRole, ClusterRoleBinding, and ServiceAccount in our kubectl YAML config, and the rbac configuration of the Helm chart.
If your cluster doesn't use any access control scheme and doesn't restrict access to the Kubernetes API, you don't need to do any extra configuration: Vector will just work.
Clusters using the legacy ABAC scheme are not officially supported (although Vector might work if you configure access properly); we encourage switching to RBAC. If you use a custom access control scheme, make sure the Vector ServiceAccount is granted access to the /api/v1/pods resource.
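A minimal sketch of RBAC objects granting this access; the object names and namespace are illustrative, and the shipped Helm chart and kubectl config already include equivalents:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector                  # illustrative name
rules:
  - apiGroups: [""]
    resources: ["pods"]         # the kubernetes_logs source watches Pods
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector
subjects:
  - kind: ServiceAccount
    name: vector                # must match the ServiceAccount the DaemonSet uses
    namespace: vector
```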
Kubernetes API communication
Vector communicates with the Kubernetes API to enrich the data it collects with Kubernetes context. Therefore, Vector must have access to communicate with the Kubernetes API server. If Vector is running in a Kubernetes cluster then Vector will connect to that cluster using the Kubernetes provided access information.
In addition to access, Vector implements proper desync handling to ensure communication is safe and reliable. This ensures that Vector will not overwhelm the Kubernetes API or compromise its stability.
Metrics

Our Helm chart deployments provide quality of life around the setup and maintenance of metrics pipelines in Kubernetes. Each of the Helm charts provides a prometheus sink out of the box. Agent deployments also expose host metrics via the same sink.

Charts come with options to enable Prometheus integration via annotations, or Prometheus Operator integration via PodMonitor. Thus, the Prometheus node_exporter agent is not required when the host_metrics source is enabled.
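As an illustration, Prometheus Operator integration might be switched on through the chart values; the exact keys below are an assumption, so confirm them against the output of helm show values:

```yaml
# Assumed Helm values keys; verify with `helm show values timberio/vector-agent`.
podMonitor:
  enabled: true   # have the chart create a PodMonitor for Prometheus Operator
```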
Partial message merging

Vector, by default, will merge partial messages that are split due to the Docker size limit. For everything else, it is recommended to use the reduce transform, which offers the ability to handle custom merging of things like stacktraces.
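A sketch of custom merging with the reduce transform; the transform name is illustrative, and the condition is a simplistic heuristic that treats indented lines as continuations:

```yaml
transforms:
  merge_stacktraces:              # illustrative name
    type: reduce
    inputs: ["kubernetes_logs"]
    # Begin a new aggregated event whenever a line does NOT start with
    # whitespace; indented lines are merged into the previous event.
    starts_when: "match(string!(.message), r'^\\S')"
    merge_strategies:
      # Join the message lines of merged events with newlines.
      message: concat_newline
```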
Pod exclusion

By default, the kubernetes_logs source will skip logs from the Pods that have a vector.dev/exclude: "true" label. You can configure additional exclusion rules via label or field selectors; see the available options.
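For example, a Pod can opt out of log collection with that label; the Pod name below is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: noisy-job                # hypothetical Pod name
  labels:
    vector.dev/exclude: "true"   # Vector skips this Pod's logs
```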
To ensure all data is collected, Vector will continue to collect logs from the
Pod for some time after its removal. This ensures that Vector obtains some of
the most important data, such as crash details.
Vector recommends the following resource limits.
Agent resource limits
If you deploy Vector as an agent (collecting data for each of your Nodes), then we recommend the following limits:
resources:
  requests:
    memory: "64Mi"
    cpu: "500m"
  limits:
    memory: "1024Mi"
    cpu: "6000m"
As with all Kubernetes resource limit recommendations, use these as a reference point and adjust as necessary. If your configured Vector pipeline is complex, you may need more resources; if you have a simple pipeline, you may need less.
This component is stateless, meaning its behavior is consistent across each input.
Agent state management
For the agent role, Vector stores its state in a host-mapped directory with a static path, so if it is redeployed it will continue from where it was interrupted.
Testing & reliability
Vector is tested extensively against Kubernetes. In addition to Kubernetes being Vector's most popular installation method, Vector implements a comprehensive end-to-end test suite for all minor Kubernetes versions starting with 1.14.