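Request Tracing and Distributed Tracing: Building Observable Microservice Systems

1. Distributed Tracing Overview

1.1 Why Tracing Is Needed

In a microservice architecture, a single request may involve several services working together. Without tracing, four problems follow:

- Faults are hard to locate: when something breaks, it is difficult to tell quickly which service is responsible.
- Performance bottlenecks are unclear: there is no view of latency across the whole request path.
- Dependencies are complex: call relationships between services are hard to untangle.
- Call paths are opaque: the complete route of a request cannot be followed.

1.2 Core Concepts

| Concept | Description |
|---|---|
| Trace | The complete path of one request, identified by a trace ID |
| Span | One unit of work within a trace |
| Annotation | An event marked at a point in time |
| Baggage | Context data that travels with the request |

1.3 Tracing Architecture

```text
┌──────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐
│  Client  │────▶│ Service A │────▶│ Service B │────▶│ Service C │
└────┬─────┘     └─────┬─────┘     └─────┬─────┘     └─────┬─────┘
     │                 │                 │                 │
     ▼                 ▼                 ▼                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                          Trace Context                           │
│ traceId: abc123 | spanId: 1 | parentSpanId: null | sampled: true │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │    Collector    │
                        │ (Zipkin/Jaeger) │
                        └────────┬────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │     Storage     │
                        │   (ES/MySQL)    │
                        └─────────────────┘
```

2. Spring Cloud Sleuth Configuration

Spring Cloud Sleuth targets Spring Boot 2.x. On Spring Boot 3 it is replaced by Micrometer Tracing, which is configured through `management.tracing.*` properties.

2.1 Basic Dependencies

```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

<!-- Optional: add OpenTelemetry support -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-tracing</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
```

2.2 Sleuth Configuration

```yaml
spring:
  application:
    name: user-service
  sleuth:
    sampler:
      probability: 1.0   # sampling probability, 0 to 1
      rate: 100          # maximum traces sampled per second
    propagation:
      type: B3
      w3c:
        enabled: true
    baggage:
      remote-fields:
        - user-id
        - request-id
      correlation-enabled: true
      header-names:
        user-id: X-User-Id
    instrument:
      web:
        enabled: true
      reactor:
        enabled: true
      mongo:
        enabled: true
      redis:
        enabled: true
    logs:
      enabled: true
```

2.3 Creating Spans Manually

The example uses the Sleuth 3.x API in `org.springframework.cloud.sleuth`, where a span is placed in scope with `tracer.withSpan(span)` and finished with `span.end()`.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    private static final Logger log = LoggerFactory.getLogger(UserService.class);

    @Autowired
    private Tracer tracer;

    @Autowired
    private UserRepository userRepository;

    public User getUserById(Long id) {
        // Start a span for this method; it becomes a child of the current span, if any
        Span span = tracer.nextSpan().name("getUserById").start();
        try (Tracer.SpanInScope inScope = tracer.withSpan(span)) {
            log.info("Getting user by id: {}", id);

            // Child span for the database call
            Span dbSpan = tracer.nextSpan().name("queryDatabase").start();
            try (Tracer.SpanInScope dbScope = tracer.withSpan(dbSpan)) {
                dbSpan.tag("db.system", "mysql");
                dbSpan.tag("db.statement", "SELECT * FROM users WHERE id = ?");
                return userRepository.findById(id).orElse(null);
            } finally {
                dbSpan.end();
            }
        } finally {
            span.end();
        }
    }
}
```

3. Jaeger Integration

3.1 Jaeger Server Configuration

```yaml
version: "3.8"
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"     # UI
      - "6831:6831/udp"   # Jaeger Thrift (compact)
      - "14250:14250"     # gRPC
      - "4318:4318"       # OTLP over HTTP, needed by the endpoint in section 3.2
    environment:
      - COLLECTOR_OTLP_ENABLED=true
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ports:
      - "9200:9200"
```

3.2 Integrating Jaeger with Spring Boot

The `management.tracing.*` and `management.otlp.*` properties below belong to Spring Boot 3 with Micrometer Tracing.

```yaml
spring:
  application:
    name: user-service
  autoconfigure:
    exclude:
      - org.springframework.cloud.sleuth.autoconfig.SleuthReactorInstrumentationAutoConfiguration
management:
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces
      headers:
        Authorization: Bearer your-token
  tracing:
    sampling:
      probability: 1.0
    propagation:
      type: w3c
    exclusions:
      - /actuator/**
      - /health
```

3.3 Custom Jaeger Configuration

`Configurer`, `Propagation`, and `TracingClientHttpRequestInterceptor` stand for the builder callback, codec enum, and HTTP interceptor of whichever Jaeger client integration is in use; adjust the names to match your library version.

```java
@Configuration
public class JaegerConfig {

    // Customizes the Jaeger tracer builder: span logging, B3 propagation, 50% sampling
    @Bean
    public Configurer samplerConfigurer() {
        return builder -> builder
                .withLogSpans(true)
                .withCodec(Propagation.B3)
                .withSampler(new ProbabilisticSampler(0.5));
    }

    // Adds a tracing interceptor to every RestTemplate built through RestTemplateBuilder
    @Bean
    public RestTemplateCustomizer jaegerRestTemplateCustomizer(Tracer tracer) {
        return restTemplate -> {
            List<ClientHttpRequestInterceptor> interceptors =
                    new ArrayList<>(restTemplate.getInterceptors());
            interceptors.add(new TracingClientHttpRequestInterceptor(tracer));
            restTemplate.setInterceptors(interceptors);
        };
    }
}
```

4. Zipkin Integration

4.1 Zipkin Server Configuration

```yaml
# docker-compose.yml
version: "3.8"
services:
  zipkin:
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"
    environment:
      - STORAGE_TYPE=elasticsearch
      - ES_HOSTS=http://elasticsearch:9200
      - RABBIT_URI=amqp://guest:guest@rabbit:5672
    depends_on:
      - elasticsearch
```

4.2 Integrating Zipkin with Spring Boot

```yaml
spring:
  application:
    name: user-service
  zipkin:
    base-url: http://localhost:9411
    sender:
      type: web            # or rabbit / kafka
    locator:
      discovery:
        enabled: true      # discover the Zipkin server through Eureka
  sleuth:
    sampler:
      probability: 1.0     # sampling probability
```

4.3 Asynchronous Sending

```yaml
spring:
  zipkin:
    sender:
      type: rabbit
    rabbitmq:
      queue: zipkin
      connection-name: zipkin-sender
  rabbitmq:
    host: localhost
    port: 5672
    username: guest
    password: guest
management:
  metrics:
    export:
      zipkin:
        enabled: true
```

5. OpenTelemetry Integration

5.1 OpenTelemetry SDK Configuration

```yaml
spring:
  application:
    name: user-service
otel:
  exporter:
    otlp:
      endpoint: http://localhost:4317
      headers:
        api-key: your-api-key
  service:
    name: ${spring.application.name}
    version: 1.0.0
  traces:
    exporter: otlp
  metrics:
    exporter: otlp
  logs:
    exporter: otlp
  sampler:
    ratio: 1.0
    parent-based: true
```

5.2 Custom Span Configuration

The interceptor uses the same Sleuth `Tracer` abstraction as section 2.3. It stores the span and its scope on the request so that `afterCompletion` can close both; opening a scope without closing it leaks the span into later requests on the same thread.

```java
@Component
public class TracingInterceptor implements HandlerInterceptor {

    private static final String SPAN_ATTR = "currentSpan";
    private static final String SCOPE_ATTR = "currentSpanScope";

    private final Tracer tracer;

    public TracingInterceptor(Tracer tracer) {
        this.tracer = tracer;
    }

    @Override
    public boolean preHandle(HttpServletRequest request,
                             HttpServletResponse response,
                             Object handler) {
        Span span = tracer.nextSpan()
                .name(request.getMethod() + " " + request.getRequestURI())
                .tag("http.method", request.getMethod())
                .tag("http.url", request.getRequestURL().toString())
                .tag("http.host", request.getServerName())
                .start();
        // Keep the span and its scope so afterCompletion can close both
        request.setAttribute(SPAN_ATTR, span);
        request.setAttribute(SCOPE_ATTR, tracer.withSpan(span));
        return true;
    }

    @Override
    public void afterCompletion(HttpServletRequest request,
                                HttpServletResponse response,
                                Object handler,
                                Exception ex) {
        Span span = (Span) request.getAttribute(SPAN_ATTR);
        Tracer.SpanInScope scope = (Tracer.SpanInScope) request.getAttribute(SCOPE_ATTR);
        if (span == null) {
            return;
        }
        span.tag("http.status_code", String.valueOf(response.getStatus()));
        if (ex != null) {
            span.error(ex); // records the exception on the span
        }
        if (scope != null) {
            scope.close();
        }
        span.end();
    }
}
```

5.3 Database Tracing

The decorator below extends Spring's `DelegatingDataSource` and wraps the real `DataSource` when the bean is defined. Its span covers connection acquisition only, because the span ends before the connection is returned. Tracing individual statements requires a `Connection` proxy that opens its own spans, and pool metrics such as the active connection count can be added as tags if the pool exposes them.

```java
public class TracingDataSourceDecorator extends DelegatingDataSource {

    private final Tracer tracer;

    public TracingDataSourceDecorator(DataSource delegate, Tracer tracer) {
        super(delegate);
        this.tracer = tracer;
    }

    @Override
    public Connection getConnection() throws SQLException {
        Span span = tracer.nextSpan().name("db.getConnection").start();
        try (Tracer.SpanInScope inScope = tracer.withSpan(span)) {
            span.tag("db.system", "mysql");
            return super.getConnection();
        } catch (SQLException | RuntimeException e) {
            span.error(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
```

6. Request Context Propagation

6.1 Context Propagation Configuration

With Brave, the default tracer behind Sleuth, a baggage field can be mirrored into the logging MDC with a `CorrelationScopeCustomizer`:

```java
import brave.baggage.BaggageField;
import brave.baggage.CorrelationScopeConfig.SingleCorrelationField;
import brave.baggage.CorrelationScopeCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ContextPropagationConfig {

    // Baggage field carried across services (listed under remote-fields in section 2.2)
    private static final BaggageField USER_ID = BaggageField.create("user-id");

    @Bean
    public BaggageField userIdField() {
        return USER_ID;
    }

    // Copy the field into the SLF4J MDC and refresh it whenever the value changes
    @Bean
    public CorrelationScopeCustomizer userIdToMdc() {
        return builder -> builder.add(
                SingleCorrelationField.newBuilder(USER_ID).flushOnUpdate().build());
    }
}
```

6.2 MDC Integration

The filter removes only the keys it added, so MDC entries set by other components survive.

```java
@Component
public class MdcTracingFilter extends OncePerRequestFilter {

    private static final String TRACE_ID = "traceId";
    private static final String SPAN_ID = "spanId";

    @Autowired
    private Tracer tracer;

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain)
            throws ServletException, IOException {
        Span currentSpan = tracer.currentSpan();
        if (currentSpan != null) {
            MDC.put(TRACE_ID, currentSpan.context().traceId());
            MDC.put(SPAN_ID, currentSpan.context().spanId());
        }
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.remove(TRACE_ID);
            MDC.remove(SPAN_ID);
        }
    }
}
```

6.3 Passing Context Across Services

A `RestTemplate` bean is instrumented by Sleuth automatically, so manual injection is only needed for clients that Sleuth does not manage. The example uses Sleuth's `Propagator` to write the current trace context into the outgoing headers.

```java
@Service
public class UserServiceClient {

    private final RestTemplate restTemplate;
    private final Tracer tracer;
    private final Propagator propagator;

    public UserServiceClient(RestTemplate restTemplate, Tracer tracer, Propagator propagator) {
        this.restTemplate = restTemplate;
        this.tracer = tracer;
        this.propagator = propagator;
    }

    public User getUserById(Long id) {
        HttpHeaders headers = new HttpHeaders();

        // Inject the current span's context into the outgoing HTTP headers
        Span span = tracer.currentSpan();
        if (span != null) {
            propagator.inject(span.context(), headers,
                    (carrier, key, value) -> carrier.set(key, value));
        }

        HttpEntity<Void> entity = new HttpEntity<>(headers);
        ResponseEntity<User> response = restTemplate.exchange(
                "http://user-service/api/users/{id}",
                HttpMethod.GET,
                entity,
                User.class,
                id
        );
        return response.getBody();
    }
}
```

7. Trace Analysis

`SpanData`, `SpanRepository`, `CallGraph`, `Node`, `Path`, and `ServiceDependencyGraph` in this section are application-defined types backed by the trace store; they are not part of Sleuth.

7.1 Slow Query Analysis

```java
@Service
public class SlowQueryAnalyzer {

    private static final Logger log = LoggerFactory.getLogger(SlowQueryAnalyzer.class);
    private static final long SLOW_THRESHOLD_MS = 1000; // spans longer than 1 second

    @Autowired
    private Tracer tracer;

    @Autowired
    private SpanRepository spanRepository;

    public void analyze() {
        Span currentSpan = tracer.currentSpan();
        if (currentSpan == null) {
            return;
        }
        String spanId = currentSpan.context().spanId();

        // Child spans of the current span that exceed the threshold, slowest first
        List<SpanData> slowSpans = spanRepository
                .findByTraceId(currentSpan.context().traceId()).stream()
                .filter(span -> spanId.equals(span.getParentSpanId()))
                .filter(span -> span.getDurationMs() > SLOW_THRESHOLD_MS)
                .sorted(Comparator.comparingLong(SpanData::getDurationMs).reversed())
                .collect(Collectors.toList());

        if (!slowSpans.isEmpty()) {
            log.warn("Slow spans detected: {}", slowSpans);
        }
    }
}
```

7.2 Call Chain Analysis

```java
@Service
public class TraceAnalyzer {

    @Autowired
    private SpanRepository spanRepository;

    public CallGraph buildCallGraph(String traceId) {
        List<SpanData> spans = spanRepository.findByTraceId(traceId);
        CallGraph graph = new CallGraph();

        for (SpanData span : spans) {
            Node node = new Node(
                    span.getSpanId(),
                    span.getOperationName(),
                    span.getDurationMs()
            );
            graph.addNode(node);

            // Every span except the root links back to its parent
            if (span.getParentSpanId() != null) {
                graph.addEdge(span.getParentSpanId(), span.getSpanId());
            }
        }
        return graph;
    }

    public List<Path> findCriticalPath(String traceId) {
        CallGraph graph = buildCallGraph(traceId);
        return graph.findLongestPath();
    }
}
```

7.3 Dependency Analysis

```java
@Service
public class DependencyAnalyzer {

    @Autowired
    private SpanRepository spanRepository;

    public ServiceDependencyGraph buildDependencyGraph() {
        List<SpanData> allSpans = spanRepository.findAll();
        Map<String, Set<String>> dependencies = new HashMap<>();

        for (SpanData span : allSpans) {
            String service = span.getServiceName();
            // Tags prefixed with "peer." describe the remote side of a call
            span.getTags().forEach((key, value) -> {
                if (key.startsWith("peer.")) {
                    String peerService = extractPeerService(value);
                    if (peerService != null) {
                        dependencies.computeIfAbsent(service, k -> new HashSet<>())
                                .add(peerService);
                    }
                }
            });
        }
        return new ServiceDependencyGraph(dependencies);
    }

    // Returns the trimmed tag value, or null when the tag is empty
    private String extractPeerService(String value) {
        return (value == null || value.trim().isEmpty()) ? null : value.trim();
    }
}
```

8. Alerting

8.1 Error Rate Alerts

```yaml
# Prometheus alerting rules
groups:
  - name: tracing-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(spring_sleuth_spans{tag_error="true"}[5m])) by (service)
          /
          sum(rate(spring_sleuth_spans_count[5m])) by (service)
          > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate in {{ $labels.service }}"
          description: "Error rate is {{ $value | humanizePercentage }}"

      - alert: SlowResponseTime
        expr: |
          histogram_quantile(0.95,
            sum(rate(spring_sleuth_spans_duration_seconds_bucket[5m])) by (le, service)
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow response time in {{ $labels.service }}"
          description: "95th percentile is {{ $value | humanizeDuration }}"
```

8.2 Latency Alerts

The rule fires when the 5-minute average latency exceeds 1.5 times the average over the past hour.

```yaml
      - alert: LatencyIncrease
        expr: |
          sum(rate(spring_sleuth_spans_duration_seconds_sum[5m])) by (service)
          /
          sum(rate(spring_sleuth_spans_duration_seconds_count[5m])) by (service)
          > 1.5 * avg_over_time(
            (
              sum(rate(spring_sleuth_spans_duration_seconds_sum[1h])) by (service)
              /
              sum(rate(spring_sleuth_spans_duration_seconds_count[1h])) by (service)
            )[1h:5m]
          )
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Latency increased in {{ $labels.service }}"
```

9. Grafana Dashboards

9.1 Tracing Panels

```json
{
  "title": "Request Trace Overview",
  "panels": [
    {
      "title": "Request Rate by Service",
      "type": "graph",
      "targets": [
        {
          "expr": "sum(rate(spring_sleuth_spans_count[5m])) by (service)",
          "legendFormat": "{{ service }}"
        }
      ]
    },
    {
      "title": "Error Rate",
      "type": "graph",
      "targets": [
        {
          "expr": "sum(rate(spring_sleuth_spans{tag_error=\"true\"}[5m])) by (service)",
          "legendFormat": "{{ service }}"
        }
      ]
    },
    {
      "title": "P99 Latency",
      "type": "graph",
      "targets": [
        {
          "expr": "histogram_quantile(0.99, sum(rate(spring_sleuth_spans_duration_seconds_bucket[5m])) by (le, service))",
          "legendFormat": "{{ service }}"
        }
      ]
    }
  ]
}
```

10. Best Practices

10.1 Sampling Strategies

| Strategy | When to use | Configuration |
|---|---|---|
| Full sampling | Development and debugging | probability: 1.0 |
| Probabilistic sampling | Routine production traffic | probability: 0.1-0.5 |
| Head-based sampling | One sampling decision at the request entry point | sampler: HeadBased |
| Adaptive sampling | Dynamic adjustment | Raise the sampling rate when errors occur |

10.2 Performance Recommendations

- Asynchronous sending: send trace data through Kafka or RabbitMQ.
- Sampling strategy: adjust the sampling rate dynamically according to traffic.
- Data compression: enable compression for trace data.
- Batch sending: aggregate several spans and send them together.
- Storage optimization: choose a suitable storage backend and indexing strategy.

10.3 Security Considerations

```yaml
# Filter sensitive data
spring:
  sleuth:
    instrument:
      exclude:
        - org.springframework.web.servlet.Filter
    propagation:
      type: w3c
    baggage:
      correlation-enabled: false  # disable automatic MDC correlation
  data:
    redis:
      customizers:
        - tracing-repository-customizer
```

11. Summary

Tracing is a core component of microservice observability. This article covered:

- Tracing overview: the core concepts of Trace, Span, and Annotation
- Spring Cloud Sleuth: the basic building block for distributed tracing
- Jaeger integration: a CNCF-hosted tracing system
- Zipkin integration: the tracing system open-sourced by Twitter
- OpenTelemetry: a cross-language tracing standard
- Context propagation: passing trace context between services
- Trace analysis: slow queries, call chains, and dependencies
- Alerting: Prometheus-based alert rules
- Grafana dashboards: visualizing trace data

With a complete tracing setup you can locate faults quickly, optimize performance, understand system behavior, and build microservice systems that are truly observable.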
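The `propagation.type: w3c` setting used in several configurations above selects the W3C Trace Context format, in which the trace context from section 6.3 travels in a `traceparent` HTTP header. As a rough illustration of what crosses the wire between services, the sketch below parses and formats that header. The class name `TraceParent` and its fields are illustrative and not part of Sleuth or OpenTelemetry, and validation is simplified (for example, all-zero IDs are not rejected and only the sampled flag is preserved).

```java
/** Illustrative parser for the W3C "traceparent" header (version 00). */
final class TraceParent {
    final String traceId;   // 32 lowercase hex characters
    final String spanId;    // 16 lowercase hex characters: the caller's span
    final boolean sampled;  // bit 0 of the trace-flags byte

    TraceParent(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    /** Parses "00-<trace-id>-<parent-id>-<flags>" or throws IllegalArgumentException. */
    static TraceParent parse(String header) {
        String[] parts = header.trim().split("-");
        if (parts.length != 4
                || !parts[0].equals("00")
                || !parts[1].matches("[0-9a-f]{32}")
                || !parts[2].matches("[0-9a-f]{16}")
                || !parts[3].matches("[0-9a-f]{2}")) {
            throw new IllegalArgumentException("Invalid traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) != 0;
        return new TraceParent(parts[1], parts[2], sampled);
    }

    /** Formats the context as a header value for the next hop. */
    String format() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    public static void main(String[] args) {
        TraceParent tp = parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
        System.out.println(tp.traceId + " " + tp.spanId + " " + tp.sampled);
        System.out.println(tp.format());
    }
}
```

A downstream service keeps the same `traceId`, records the incoming `spanId` as its parent, and writes its own span ID into the header it sends onward.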
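Section 7.2 calls `graph.findLongestPath()` without showing how such a path might be computed. One simple heuristic, sketched below, starts at the root span and follows the slowest child at every level. This is an assumption about what "critical path" means here, not the article's implementation; `SpanInfo` is a hypothetical stand-in for `SpanData`, and the input is assumed to be a single tree with one root.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CriticalPathSketch {

    /** Minimal stand-in for the article's SpanData (hypothetical shape). */
    static final class SpanInfo {
        final String spanId;
        final String parentSpanId; // null for the root span
        final long durationMs;

        SpanInfo(String spanId, String parentSpanId, long durationMs) {
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.durationMs = durationMs;
        }
    }

    /** Returns span IDs from the root down, following the slowest child at each level. */
    static List<String> criticalPath(List<SpanInfo> spans) {
        Map<String, List<SpanInfo>> children = new HashMap<>();
        SpanInfo root = null;
        for (SpanInfo span : spans) {
            if (span.parentSpanId == null) {
                root = span;
            } else {
                children.computeIfAbsent(span.parentSpanId, k -> new ArrayList<>()).add(span);
            }
        }

        List<String> path = new ArrayList<>();
        SpanInfo current = root;
        while (current != null) {
            path.add(current.spanId);
            SpanInfo slowest = null;
            for (SpanInfo child : children.getOrDefault(current.spanId, List.of())) {
                if (slowest == null || child.durationMs > slowest.durationMs) {
                    slowest = child;
                }
            }
            current = slowest; // null once a leaf is reached
        }
        return path;
    }

    public static void main(String[] args) {
        List<SpanInfo> spans = List.of(
                new SpanInfo("a", null, 120),
                new SpanInfo("b", "a", 30),
                new SpanInfo("c", "a", 80),
                new SpanInfo("d", "c", 70));
        System.out.println(criticalPath(spans)); // prints [a, c, d]
    }
}
```

Following the slowest child ignores overlap between sibling spans that run in parallel, so it is a starting point for investigation, not an exact latency attribution.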
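The probabilistic strategy in section 10.1 is commonly implemented by deriving the decision from the trace ID, so that every service reaches the same answer for the same trace without coordination. The sketch below shows one way to do that under stated assumptions: the class name `TraceIdRatioSampler` is illustrative, trace IDs are assumed to be hex strings, and only the low 64 bits are used. It is not the sampler that Sleuth, Jaeger, or OpenTelemetry ship.

```java
/** Deterministic probabilistic sampling keyed on the trace ID (illustrative sketch). */
final class TraceIdRatioSampler {

    private final long threshold; // IDs whose low 63 bits fall below this value are sampled

    TraceIdRatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        // The double-to-long cast saturates at Long.MAX_VALUE, so ratio 1.0 is handled explicitly.
        this.threshold = ratio == 1.0 ? Long.MAX_VALUE : (long) (ratio * Long.MAX_VALUE);
    }

    boolean shouldSample(String traceIdHex) {
        // Use the last 16 hex characters (low 64 bits) of the trace ID
        String low = traceIdHex.length() > 16
                ? traceIdHex.substring(traceIdHex.length() - 16)
                : traceIdHex;
        // Clear the sign bit so the value lies in [0, 2^63)
        long value = Long.parseUnsignedLong(low, 16) & Long.MAX_VALUE;
        return threshold == Long.MAX_VALUE || value < threshold;
    }

    public static void main(String[] args) {
        TraceIdRatioSampler half = new TraceIdRatioSampler(0.5);
        System.out.println(half.shouldSample("4bf92f3577b34da6a3ce929d0e0e4736"));
        System.out.println(half.shouldSample("0000000000000000ffffffffffffffff"));
    }
}
```

Because the decision depends only on the trace ID and the ratio, a trace is either kept in full or dropped in full, provided all services are configured with the same ratio.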