The MURE LOG #10: routing, context, benchmarks
The data systems log, curated by humans
Hey, welcome back!
This issue is about what it actually takes to make AI work in a data context: agent architectures, context layers, query routing, and benchmarks that cut through the noise. Less hype, more decisions you can act on, plus lots of announcements.
Our feed
01 / I built a (very small) agent swarm - Tristan Handy
On building a (micro) agent swarm: what he built and what he learned.
02 / Beyond the Semantic Layer: Building a Context Layer for the Agentic Era - Simon Späti
Building the context layer with ktx.
03 / Routing Multiple Query Engines with Iceberg - Rob M
How to route queries across Trino, Spark, DuckDB, Snowflake, Athena, and Flink on shared Iceberg tables. Great write-up.
04 / How Anthropic enables self-service data analytics with Claude - Anthropic
Tips and approaches to maximizing Claude’s ability to drive self-serve business insights.
05 / LLM Research Papers: The 2026 List (January to May) - Sebastian Raschka
A list of LLM research papers.
06 / TPC-H for less than a cent: ClickHouse Cloud vs. Snowflake, Databricks, BigQuery, and Redshift - Tom Schreiber et al.
TPC-H benchmark results. All benchmark scripts, queries, and result files are available in a public GitHub repository.
07 / Writing Agent Skills for an Open Source Project: Lessons from DataFusion Python - Tim Saucer
Great write-up: who a skill is for, where a skill should live, how to keep it honest as your API evolves, and how to evaluate it against a real workload.
08 / How OpenAI Built Its Data Agent - Alex Xu
Featuring Emma Tang, Head of Data Platform Engineering at OpenAI, on how the agent works, its architecture, use cases, and more.
What's new
Anthropic - acquired Stainless
Apache Hudi - announced v1.2, towards multimodal data
Columnar - introduced databow, a CLI to query multiple DBs
Databricks - introduced cross-engine ABAC
dbt Labs - dbt Core v2
DuckDB - DuckDB Labs became DuckLabs
LangChain - built SmithDB, the data layer for agent observability
LinkedIn Labs - introduced crosscheck, AI model benchmarking
Microsoft - Microsoft Build happened last week; here is a summary of Fabric Analytics at Build 2026 and the Fabric June update
Polars - launched distributed engine on Kubernetes
Snowflake - Snowflake Summit happened last week; here is a great summary of Snowflake Summit 2026 by Hugo Lu
Tools & demos
Jack Vanlightly - dimster, a performance benchmarking tool for Apache Kafka
Julien Hurault - boring-ui, a workbench the agent can control and reshape
LakeOps - queryflu, a multi-engine SQL query router in Rust
Microsoft - rayfin, data apps on Fabric
Peer Grønnerup - fabricstack.dev, tools catalog for Microsoft Fabric & Power BI
Raki Rahman - spark-devcontainer and spark-sandbox with a focus on Microsoft Fabric
Tim Hiebenthal - duckbrain, a DuckDB-backed MCP memory server for Obsidian vaults
Tobias Müller - quacklake, a DuckLake data catalog based on quack, deployed to Cloudflare
Upcoming Events
Databricks Data + AI Summit · June 15 / SF
EuroPython 2026 · July 13 / Kraków, Poland
DataEngBytes · July 13 / Melbourne + July 28 / Sydney
Ai4 · Aug 4 / Las Vegas
dbt Summit · Sep 15 / Las Vegas
Big Data LDN · Sep 23 / London
Microsoft Fabric Community Conference Europe · Sep 28 / Barcelona, Spain
J On the Beach · Oct 29 / Malaga, Spain
OSA CON · Nov 2 / SF
Microsoft Ignite · Nov 2 / SF
AWS re:Invent · Nov 30 / Las Vegas
Bonus: call for data speakers link
That’s all for now — we’ll be back in your inbox in two weeks.

