The OpenTelemetry GenAI Spec Vendors Don't Tell You About

[Figure: vendor-locked proprietary traces compared with a standardized, vendor-neutral OpenTelemetry GenAI span]

Key Takeaways:
  • Vendor Lock-In Prevention: Standardizing on OpenTelemetry GenAI attributes prevents your data from being siloed within proprietary ecosystems.
  • The v1.30 Reality: While vendors pitch custom tracing, the official gen_ai.* spec has stabilized critical request and response tracking primitives.
  • Unified Cost Metrics: You can finally append custom token-cost telemetry directly to standard traces for unified billing dashboards.
  • Ecosystem Compatibility: Major open-source players natively ingest these schemas, reducing the friction of migrating your backend infrastructure.

Vendors want you locked into their proprietary tracing dashboards, which is why most observability SDKs quietly ignore the stable OpenTelemetry GenAI semantic conventions.

They sell you on one-click integrations, but the moment you need to correlate a prompt latency spike with your Kubernetes pod metrics, you hit a paywall.

If you have consulted our overarching AI agent observability playbook, you already know that decoupling your instrumentation from your storage backend is a non-negotiable architectural requirement.

This deep dive exposes the specific span schemas, stable attributes, and implementation tactics you need to achieve true vendor-neutral observability in 2026.

Breaking the Proprietary Tracing Lock-In

The biggest observability mistake engineering teams make today is allowing their LLM instrumentation to dictate their APM choice. When you use a vendor's native SDK without a standard abstraction, your trace data is fundamentally incompatible with your broader enterprise stack.

Adopting the OpenTelemetry GenAI semantic conventions transforms your LLM calls from opaque black boxes into standardized telemetry. This means your APM treats a complex LLM chain with the same diagnostic rigor as a standard database query.

When configuring exactly how to trace this stack, prioritizing vendor-neutral instrumentation ensures you retain total sovereignty over your telemetry data.

The gen_ai Attributes You Need to Know

The OpenTelemetry specification defines a strict naming scheme for generative AI telemetry. At the core of this schema are the gen_ai.request.*, gen_ai.response.*, and gen_ai.usage.* attribute groups. These prefixes organize the metadata generated during a single model invocation: what was asked for, what came back, and how many tokens it consumed.

Essential Request Attributes:

  • gen_ai.system: Identifies the underlying provider (e.g., openai, anthropic). Newer spec releases rename this attribute to gen_ai.provider.name, so check the version you pin.
  • gen_ai.request.model: Specifies the exact model version being invoked.
  • gen_ai.request.temperature: Logs the specific temperature setting for prompt debugging.
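
As a minimal sketch of the list above, the helper below collects the request-side keys into a plain dictionary. The function name build_request_attributes and the model name are illustrative assumptions, not part of the spec or of any SDK.

```python
from typing import Optional, Union

AttrValue = Union[str, int, float, bool]


def build_request_attributes(
    provider: str,
    model: str,
    temperature: Optional[float] = None,
) -> dict[str, AttrValue]:
    """Return the request-side gen_ai.* attributes for one model call."""
    attrs: dict[str, AttrValue] = {
        # Key as used in this article; newer spec releases call it
        # gen_ai.provider.name, so match the version you pin.
        "gen_ai.system": provider,
        "gen_ai.request.model": model,
    }
    # Record only parameters that were actually set: None is not a valid
    # OpenTelemetry attribute value.
    if temperature is not None:
        attrs["gen_ai.request.temperature"] = float(temperature)
    return attrs


# Illustrative values only.
attrs = build_request_attributes("openai", "gpt-4o-mini", temperature=0.2)
for key, value in attrs.items():
    print(f"{key} = {value}")
```

With a real span in hand, the whole dictionary can be attached in one call through span.set_attributes(attrs).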

Stable vs. Experimental Fields in 2026

It is crucial to understand which fields are safe for production and which are subject to breaking changes. In 2026, the core token counting attributes, specifically gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, are the ones to rely on for accurate cost aggregation. Confirm their stability marker in the semantic conventions release you pin, because the spec assigns a stability level to each attribute individually.

Conversely, experimental attributes often relate to highly specific, emerging multi-modal embeddings. Stick to the stable core when instrumenting mission-critical production pipelines.

Implementing OpenTelemetry LLM Tracing in Python

Applying the span schema for AI requires explicitly mapping standard OpenAI API responses to OpenTelemetry attributes.

You should wrap your LLM calls within a standard tracer.start_as_current_span() block. Inside this context, manually extract the token usage metadata from the API response payload.

Once extracted, use span.set_attribute() to append the standardized gen_ai keys to the active trace context before the span closes.
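
The steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: fake_chat_completion is a hypothetical stand-in that returns a dictionary shaped like an OpenAI chat response (the real SDK returns objects with the same field names), and the tracer is passed in so the wrapper works with any object exposing start_as_current_span().

```python
def get_default_tracer():
    """Return a real tracer; requires the opentelemetry-api package."""
    from opentelemetry import trace

    return trace.get_tracer("genai.example")


def fake_chat_completion(model, messages):
    """Hypothetical stand-in for an OpenAI chat call."""
    return {
        "model": model,
        "choices": [{"finish_reason": "stop", "message": {"content": "hi"}}],
        "usage": {"prompt_tokens": 12, "completion_tokens": 5},
    }


def traced_chat(tracer, model, messages, call=fake_chat_completion):
    """Run one chat call inside a span and copy token usage onto it."""
    # Span name follows the "{operation} {model}" pattern.
    with tracer.start_as_current_span(f"chat {model}") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", model)

        response = call(model, messages)

        # Map the provider's usage fields to the gen_ai.usage.* keys
        # before the span closes.
        usage = response["usage"]
        span.set_attribute("gen_ai.usage.input_tokens", usage["prompt_tokens"])
        span.set_attribute("gen_ai.usage.output_tokens", usage["completion_tokens"])
        return response
```

Calling traced_chat(get_default_tracer(), "gpt-4o-mini", messages) produces a real span once an SDK and exporter are configured; with only the API package installed, the tracer is a no-op and the call still succeeds.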

Differentiating Request and Response Telemetry

Accurate debugging requires cleanly separating what you asked the model from what it returned. The gen_ai.request attributes should be set before the API call executes. This ensures that if the request times out, you still have a record of the intended parameters.

The response-side attributes, such as gen_ai.response.finish_reasons and the gen_ai.usage.* token counts, must be set only after a successful network response is received.
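
A rough sketch of that ordering is shown below. The function name traced_call and the dictionary-shaped response are assumptions made for illustration; tracer can be any OpenTelemetry tracer, for example one obtained from opentelemetry.trace.get_tracer(). The error.type key is OpenTelemetry's general-purpose error attribute.

```python
def traced_call(tracer, model, temperature, call):
    """Set request attributes first, response attributes only on success."""
    with tracer.start_as_current_span(f"chat {model}") as span:
        # Request side: recorded before the network call, so a timeout
        # still leaves the intended parameters on the span.
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.request.temperature", temperature)

        try:
            response = call(model=model, temperature=temperature)
        except Exception as exc:
            # No response attributes here: nothing came back.
            span.set_attribute("error.type", type(exc).__qualname__)
            # Re-raise so the caller sees the failure; a real tracer also
            # records the exception on the span as it propagates.
            raise

        # Response side: only reached after a successful call.
        span.set_attribute("gen_ai.response.model", response["model"])
        span.set_attribute(
            "gen_ai.response.finish_reasons",
            [choice["finish_reason"] for choice in response["choices"]],
        )
        usage = response["usage"]
        span.set_attribute("gen_ai.usage.input_tokens", usage["prompt_tokens"])
        span.set_attribute("gen_ai.usage.output_tokens", usage["completion_tokens"])
        return response
```

Keeping the response block after the try/except means a failed call can never leave half-filled token counts behind, which matters when those counts feed billing.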

If you are exploring cost-effective backend alternatives that natively ingest these OTel traces, consider reviewing our self-hosted Langfuse production deployment guide.

Vendor-Neutral Observability for Multi-Agent Workflows

While basic LLM tracing is straightforward, multi-agent frameworks introduce asynchronous complexities. Does OpenTelemetry GenAI cover multi-agent and tool call spans? Yes, but it requires diligent trace context propagation.

Every sub-agent execution must extract the active trace context (the trace ID plus the parent span ID) from its parent process. When a tool is called, a child span must be created beneath the specific sub-agent, using the gen_ai.tool.* semantic attributes to define the tool's name and input parameters.
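
One way this nesting might look in Python is sketched below. The helper names are hypothetical, and app.tool.arguments is a custom key chosen for this example because the spec's attribute for tool inputs depends on the release you pin. Within one process, nesting the with blocks is enough to make the tool span a child of the agent span; across processes, the two helpers at the bottom use the opentelemetry.propagate module to carry the context in headers.

```python
import json


def run_tool(tracer, tool_name, call_id, tool_fn, arguments):
    """Execute one tool call in its own span."""
    with tracer.start_as_current_span(f"execute_tool {tool_name}") as span:
        span.set_attribute("gen_ai.operation.name", "execute_tool")
        span.set_attribute("gen_ai.tool.name", tool_name)
        # call_id is assumed to come from the model's tool-call response.
        span.set_attribute("gen_ai.tool.call.id", call_id)
        # Custom key for this sketch, not a spec attribute.
        span.set_attribute("app.tool.arguments", json.dumps(arguments))
        return tool_fn(**arguments)


def run_sub_agent(tracer, agent_name, tool_name, call_id, tool_fn, arguments):
    """Run a sub-agent whose tool call becomes a child span."""
    with tracer.start_as_current_span(f"invoke_agent {agent_name}") as span:
        span.set_attribute("gen_ai.operation.name", "invoke_agent")
        span.set_attribute("gen_ai.agent.name", agent_name)
        # The agent span is current here, so the tool span nests under it.
        return run_tool(tracer, tool_name, call_id, tool_fn, arguments)


def outgoing_headers():
    """Parent side: serialize the current trace context into headers."""
    from opentelemetry.propagate import inject  # needs opentelemetry-api

    headers = {}
    inject(headers)
    return headers


def remote_agent_span(tracer, agent_name, incoming_headers):
    """Child side: continue the parent's trace from received headers."""
    from opentelemetry.propagate import extract  # needs opentelemetry-api

    parent_context = extract(incoming_headers)
    return tracer.start_as_current_span(
        f"invoke_agent {agent_name}", context=parent_context
    )
```

remote_agent_span returns a context manager, so the receiving process would use it as with remote_agent_span(tracer, "researcher", headers) as span: and create its tool spans inside that block.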

About the Author: Sanjay Saini

Sanjay Saini is an Enterprise AI Strategy Director specializing in digital transformation and AI ROI models. He covers high-stakes news at the intersection of leadership and sovereign AI infrastructure.

Frequently Asked Questions (FAQ)

What is the OpenTelemetry GenAI semantic convention in 2026?

The OpenTelemetry GenAI semantic convention is a standardized, vendor-neutral schema for logging and tracing generative AI operations. It defines exact attribute names—like gen_ai.system and gen_ai.request.model—ensuring AI telemetry is universally readable across different APM platforms.

Which gen_ai.* span attributes are stable versus experimental?

Core execution attributes, such as gen_ai.system, gen_ai.request.model, and base token counts (gen_ai.usage.input_tokens and gen_ai.usage.output_tokens), are considered stable in 2026. Experimental attributes typically encompass newer multi-modal specific inputs, streaming chunks, and highly granular, vendor-specific tool interaction metadata.

How do I instrument an OpenAI API call with OTel GenAI conventions in Python?

Wrap your OpenAI execution within a tracer.start_as_current_span() block. Manually extract properties from the API response object and use span.set_attribute() to map them to the official gen_ai.* keys, ensuring your telemetry adheres to the standard schema.

Does OpenTelemetry GenAI cover multi-agent and tool call spans?

Yes, the specification includes structural support for multi-agent logic through standard trace context propagation and specific gen_ai.tool.* attributes. This allows you to track intricate workflows by associating autonomous tool executions as child spans of the primary agent.

How do I forward GenAI traces to Datadog, New Relic, and Honeycomb?

Because the OTel GenAI spec standardizes the payload, you simply configure the standard OpenTelemetry Collector with the appropriate OTLP exporter endpoints. This allows you to route the exact same vendor-neutral traces to Datadog, New Relic, or Honeycomb simultaneously without altering code.

What is the difference between gen_ai.request and gen_ai.response attributes?

The gen_ai.request.* attributes capture the initial inputs, model selection, and hyperparameter configurations sent to the provider. The gen_ai.response.* attributes record what came back, such as the response ID, the model that actually served the request, and the finish reasons. Token counts sit alongside them under gen_ai.usage.*.

How do I add cost-per-token as a custom OTel metric?

You can create a custom OpenTelemetry Counter metric within your application code. After retrieving the token usage (the same values you record as gen_ai.usage.input_tokens and gen_ai.usage.output_tokens), calculate the cost based on the model pricing, and record the dollar amount to the metric, tagging it with the model name.
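
A minimal sketch of that calculation follows. The price table, the model name example-model, and the metric name app.gen_ai.cost are all hypothetical placeholders; substitute your provider's published rates and your own naming scheme.

```python
# Hypothetical USD prices per one million tokens; not real provider rates.
PRICES_PER_MILLION = {
    "example-model": {"input": 2.00, "output": 8.00},
}


def token_cost_usd(model, input_tokens, output_tokens, prices=PRICES_PER_MILLION):
    """Convert token counts into a dollar amount for one call."""
    rate = prices[model]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


def create_cost_counter():
    """Create the real counter; requires the opentelemetry-api package."""
    from opentelemetry import metrics

    meter = metrics.get_meter("genai.cost")
    return meter.create_counter(
        "app.gen_ai.cost",  # custom metric name chosen for this sketch
        unit="USD",
        description="Estimated spend on model calls",
    )


def record_cost(counter, model, input_tokens, output_tokens):
    """Add one call's cost to the counter, tagged with the model name."""
    cost = token_cost_usd(model, input_tokens, output_tokens)
    counter.add(cost, attributes={"gen_ai.request.model": model})
    return cost
```

For example, record_cost(create_cost_counter(), "example-model", 1000, 500) adds 0.006 USD under these placeholder rates. Tagging by model keeps the metric's cardinality low while still letting a dashboard break spend down per model.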

Are OTel GenAI conventions compatible with LangSmith and Langfuse?

Langfuse actively champions OpenTelemetry and natively ingests standard gen_ai traces out of the box. LangSmith primarily relies on its proprietary LangChain callbacks, requiring custom integration or specialized OTLP translation layers to fully utilize generic OpenTelemetry spans.

How do I sample GenAI traces without losing error spans?

Implement tail-based sampling at the OpenTelemetry Collector level rather than head-based sampling in the SDK. This ensures that all traces are evaluated after completion, allowing you to sample successful 200 OK responses at 10% while retaining 100% of traces containing an error or exception.
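
In production this policy lives in the Collector's sampling configuration rather than in application code. The Python below only illustrates the kind of decision a tail-based sampler makes once a trace is complete; the function name keep_trace and the plain string statuses are assumptions for the sketch.

```python
def keep_trace(trace_id_hex, span_statuses, success_percent=10):
    """Decide whether to keep a finished trace.

    trace_id_hex: the trace ID as a hex string.
    span_statuses: status of every span in the trace, e.g. "OK" or "ERROR".
    """
    # Any failing span keeps the whole trace, regardless of sampling rate.
    if any(status == "ERROR" for status in span_statuses):
        return True
    # Otherwise keep a fixed share of traces. Deriving the bucket from the
    # trace ID makes the decision repeatable for the same trace.
    bucket = int(trace_id_hex, 16) % 100
    return bucket < success_percent


# A successful trace that falls outside the 10% bucket is dropped,
# but the same trace with one failing span is always kept.
trace_id = "0" * 30 + "32"
print(keep_trace(trace_id, ["OK", "OK"]))
print(keep_trace(trace_id, ["OK", "ERROR"]))
```

A head-based sampler cannot apply the first rule, because it decides before any span has had a chance to fail.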

What is the recommended exporter for a self-hosted OTel GenAI stack?

For self-hosted, high-volume AI telemetry, the recommended architecture has the OpenTelemetry Collector receive OTLP from your services and write to a ClickHouse database through the ClickHouse exporter in the Collector's contrib distribution. ClickHouse provides strong compression and query speed for the large JSON payloads typical of GenAI traces.