A primer on the OpenTelemetry collector

Combining tracing, metrics, and logs into a single stream of data.
March 10, 2021

The OpenTelemetry project is an observability system that combines tracing, metrics, and logs into a single stream of data.

It has two major components: the clients you install in your applications to generate data, and the collector. Let’s go over the basics of what the collector is, and how it works.

The collector is a stand-alone service for processing and transmitting observations. Architecturally, the collector is a framework for chaining together plugins, which can be configured to form observability pipelines that buffer data, scrub it, and send it on to various backends in multiple different formats.

Collectors are configured via YAML files. Documentation and details on the configuration format can be found in the OpenTelemetry collector documentation. The following are the three major types of plugins:

  • Receivers. Receivers ingest data from a variety of popular sources and formats, such as Zipkin and Prometheus. Running multiple types of receivers can help with mixed deployments, allowing for a seamless transition to OpenTelemetry from older systems.
  • Processors. Processors allow tracing, metrics, and resources to be manipulated in a variety of ways: for example, batching data before export, or scrubbing sensitive attributes.
  • Exporters. Collector exporters are the same as client exporters. They take completed data from OpenTelemetry’s in-memory buffer and flush it to various endpoints in a variety of formats. Exporters can be run in parallel, and data may be efficiently fanned out to multiple endpoints at once – for example, sending your trace data to both Lightstep and Jaeger, while sending your metrics data to Prometheus.
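
To make the three plugin types concrete, here is a minimal sketch of a collector configuration that wires them into pipelines. The backend names and endpoints are hypothetical placeholders, and which receivers and exporters are available depends on the collector distribution and version you run, so treat this as an illustration of the shape rather than a drop-in file.

```yaml
# Sketch only: backend names and endpoints below are hypothetical.
receivers:
  otlp:              # data sent by OpenTelemetry clients
    protocols:
      grpc:
      http:
  zipkin:            # data from services still instrumented with Zipkin

processors:
  batch:             # groups telemetry into batches before export

exporters:
  otlp/backend_a:
    endpoint: backend-a.example.com:4317
  otlp/backend_b:
    endpoint: backend-b.example.com:4317
  prometheus:
    endpoint: 0.0.0.0:8889   # metrics are exposed here for Prometheus to scrape

service:
  pipelines:
    traces:
      receivers: [otlp, zipkin]
      processors: [batch]
      exporters: [otlp/backend_a, otlp/backend_b]   # fan-out to two backends
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```

Each pipeline under `service` lists the receivers, processors, and exporters it uses by name, so one component definition can be shared across several pipelines.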

Using the collector to improve operations

OpenTelemetry clients are like any other component you install in your application. That means they come with some operational overhead, much of which can be minimized by running a collector.

Avoid rebooting applications

As with most libraries, changing the configuration of your OpenTelemetry clients requires rebooting your application processes. To mitigate this, you can run the OpenTelemetry clients in as close to default mode as possible, pointed at a collector. Configuration changes to your observability pipeline can then be made by rebooting the collector, not the application process. A locally-running collector can also measure system metrics such as CPU and memory usage for your application.
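
As a rough sketch of this setup, the local collector below accepts data from clients left on their default OTLP settings and also scrapes CPU and memory metrics. It assumes a distribution that includes the `hostmetrics` receiver, which reports host-level rather than per-process figures, and the backend endpoint is a hypothetical placeholder.

```yaml
# Sketch only: the backend endpoint is hypothetical.
receivers:
  otlp:
    protocols:
      grpc:            # 4317 is the default OTLP gRPC port
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:

exporters:
  otlp:
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp, hostmetrics]
      exporters: [otlp]
```

Changing where the data goes then means editing this file and rebooting the collector, while the application keeps sending to the same local endpoint.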

Avoid stealing resources

When plugins that process and transform data are installed in the clients, they may take system resources away from your application. To manage telemetry at scale, a pool of data-processing collectors can be run on separate machines on the same private network. This prevents OpenTelemetry from slowing down the application and lets you manage resource provisioning for your observability pipeline by spinning up more collectors as needed.
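
One way a collector in such a pool might be configured is sketched below: the processing work (limiting memory, scrubbing an attribute, batching) happens on the pool machine instead of inside the application. The attribute key, memory limit, and endpoint are assumptions chosen for illustration, and the `attributes` processor may not be present in every distribution.

```yaml
# Sketch only: attribute key, limits, and endpoint are hypothetical.
receivers:
  otlp:
    protocols:
      grpc:

processors:
  memory_limiter:        # protects the collector itself from running out of memory
    check_interval: 1s
    limit_mib: 1500
  attributes/scrub:
    actions:
      - key: user.email  # example of a sensitive attribute to drop
        action: delete
  batch:

exporters:
  otlp:
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, attributes/scrub, batch]
      exporters: [otlp]
```

Processors run in the order listed, so the memory limiter is placed first and batching last.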

Avoid hanging on shutdown

When there are networking problems, flushing the remaining data may slow down or hang application shutdown. Flushing to a collector running on the local network ensures that the data leaves your application quickly, which prevents issues such as network egress or backend availability from affecting the shutdown of your application.
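
Once the data has reached the collector, the collector can absorb backend trouble on the application's behalf. The fragment below sketches queueing and retry settings on an exporter; the values and the endpoint are illustrative assumptions, not recommendations, and defaults vary between collector versions.

```yaml
# Fragment only: values and endpoint are hypothetical.
exporters:
  otlp:
    endpoint: backend.example.com:4317
    sending_queue:
      enabled: true
      queue_size: 5000         # data held in memory while the backend is slow
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s   # give up on a batch after five minutes
```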

Lambda support

When running on Lambda, the collector can be installed as an extension to help deal with the sudden shutdowns and freezing which can occur in serverless environments.

And that’s the basic design and operation of the OpenTelemetry collector! Hopefully, this high-level advice helps you to understand the project components and helps you decide how to best set up your own OpenTelemetry deployment.
