Concepts
To understand Vector, you must first understand its fundamental concepts. The following concepts are ordered progressively, starting with the individual unit of data (events) and broadening all the way to Vector's deployment topologies.
Events
"Events" represent the individual units of data in Vector. They must fit into one of the following types.
Logs
A "log" event is a generic key/value representation of an event.
Metrics
A "metric" event is a first-class representation of numerical operation performed on a time series. Vector's metric events are fully interoperable.
Components
"Component" is the generic term we use for sources, transforms, and sinks. Components ingest, transform, and route events. You compose components to create topologies.
Sources
Vector wouldn't be very useful if it couldn't ingest data. A "source" defines where Vector should pull data from, or how it should receive data pushed to it. A topology can have any number of sources, and as they ingest data they normalize it into events (see Events above). This sets the stage for easy and consistent processing of your data. Examples of sources include file, syslog, StatsD, and stdin.
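For example, a file source might be configured as below. The component name and path are placeholders:

```toml
# A minimal sketch of a file source; the name "app_logs"
# and the include path are placeholders.
[sources.app_logs]
type    = "file"
include = ["/var/log/app/*.log"]
```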
Transforms
A "transform" is responsible for mutating events as they are transported by Vector. This might involve parsing, filtering, sampling, or aggregating. You can have any number of transforms in your pipeline and how they are composed is up to you.
Sinks
A "sink" is a destination for events. Each sink's
design and transmission method is dictated by the downstream service it is
interacting with. For example, the socket
sink will
stream individual events, while the aws_s3
sink will
buffer and flush data.
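A socket sink, for example, might look like the sketch below. The address and component names are placeholders:

```toml
# A hypothetical socket sink; names and address are placeholders.
[sinks.downstream]
type    = "socket"
inputs  = ["sampled"]
mode    = "tcp"
address = "127.0.0.1:9000"
encoding.codec = "json"
```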
Pipeline
A "Pipeline" is a directed acyclic graph of components. Each component is a node on the graph with directed edges. Data must flow in one direction, from sources to sinks. Components can produce zero or more events.
Roles
A "role" refers to a deployment role that Vector fills in order to create end-to-end pipelines.
Agent
The "agent" role is designed for deploying Vector to the edge, typically for data collection.
Aggregator
The "aggregator" role is designed to collect and process data from multiple upstream sources. These upstream sources could be other Vector agents or non-Vector agents such as Syslog-ng.
Topology
A "topology" refers to the end result of deploying Vector into your infrastructure. A topology may be as simple as deploying Vector as an agent, or it may be as complex as deploying Vector as an agent and routing data through multiple Vector aggregators.