Tag

#monitoring

13 posts tagged monitoring.

tooling

Federated Learning in Production: What Substra Actually Does for Privacy-Preserving ML

Owkin's Substra framework keeps training data local while sharing only model weights — but federated architectures break standard MLOps assumptions around
June 12, 2026
monitoring

OpenAI Tops Gartner's Coding-Agent Quadrant. Now You Own a Production ML System.

Gartner named OpenAI a Leader in its first Magic Quadrant for Enterprise AI Coding Agents. The operational story is the part the press release skips: a
June 2, 2026
monitoring

The ML Monitoring Metrics Taxonomy: Drift, Data Quality, and Model Decay

A reference taxonomy of the signals that actually tell you a production ML system is failing — input drift, prediction drift, concept drift, data quality
May 22, 2026
monitoring

OpenTelemetry GenAI Semantic Conventions: Instrument LLM Apps

How the OpenTelemetry GenAI semantic conventions standardize spans, metrics, and events for LLM apps, what they skip, and how to instrument without rework.
May 22, 2026
mlops

LLM Benchmarks in 2026: Which Still Discriminate, and How to Run

Static benchmarks like MMLU and HumanEval have saturated for frontier models. Here's which LLM benchmarks still produce signal, why contamination is worse
May 13, 2026
monitoring

Watermarking Should Be Treated as a Monitoring Primitive

A new paper reframes LLM watermarking from an adversarial evasion problem into a monitoring infrastructure question.
May 13, 2026
monitoring

LLM Testing: A Guide to Evals, Metrics, and Production Monitoring

LLM testing spans offline evals, CI gate checks, and live production monitoring — three distinct jobs that need different tools.
May 11, 2026
mlops

LLM Benchmarks Explained: What the Numbers Mean and Miss

A practical guide to the major LLM benchmarks — MMLU, HumanEval, GPQA Diamond, SWE-bench — what they actually test, why saturation makes most scores
May 10, 2026
mlops

LLM Fine Tuning in Production: A Practical MLOps Guide

When to use LLM fine tuning over RAG, how LoRA and QLoRA cut GPU costs, and what to monitor after you ship a fine-tuned model — for ML engineers who own
May 10, 2026
mlops

Machine Learning Pipeline: Stages, Failure Points, and Monitoring

A practitioner's guide to the machine learning pipeline — from data ingestion to production monitoring — covering common failure points, drift types, and
May 10, 2026
mlops

ML Model Deployment: A Guide to Shipping Models That Stay Healthy

ML model deployment fails far more often than it should — typically before the model ever serves traffic. Here's what breaks, which deployment patterns
May 10, 2026
mlops

MLOps Best Practices: What Keeps Models Running in Production

A practitioner's guide to MLOps best practices, from CI/CD pipeline automation and model versioning to drift detection and continuous retraining, based
May 10, 2026
mlops

MLOps Tools: A Practitioner's Map of the Production Stack

A category-by-category breakdown of MLOps tools — experiment tracking, orchestration, feature stores, serving, and monitoring — with honest tradeoffs for
May 10, 2026