Links | Steven Engelhardt

Wednesday 2025-05-28 Assorted Links

Assorted Links links

Published: 2025-05-28

Assorted links for Wednesday, May 28:

Unlocking the Power of Regex in SQL Server
Announcing the General Availability (GA) of JSON data type & JSON aggregates…

We are excited to announce the General Availability (GA) of the native JSON data type and JSON aggregates – JSON_OBJECTAGG & JSON_ARRAYAGG. You can use the JSON data type and JSON aggregates to integrate and work with JSON documents more efficiently in the database. This functionality is available in Azure SQL Database and in Azure SQL Managed Instance with the Always-up-to-date update policy.
C# 14 – Exploring extension members
Evaluating content safety in your .NET AI applications

We are excited to announce the addition of the Microsoft.Extensions.AI.Evaluation.Safety package to the Microsoft.Extensions.AI.Evaluation libraries! This new package provides evaluators that help you detect harmful or sensitive content — such as hate speech, violence, copyrighted material, insecure code, and more — within AI-generated content in your Intelligent Applications.
.NET Aspire 9.3 is here and enhanced with GitHub Copilot!

Tuesday 2025-05-27 Assorted Links

Published: 2025-05-27

Assorted links for Tuesday, May 27:

Fast, Accurate, and Affordable: Sharded DiskANN for Multitenant Vector Search in Azure Cosmos DB

Azure Cosmos DB now includes Sharded DiskANN, a powerful capability optimized for large-scale multitenant apps that splits a DiskANN index into smaller, more performant pieces.
General Availability for Data API in vCore-based Azure Cosmos DB for MongoDB

The Data API is a RESTful HTTPS interface that enables developers to interact with their MongoDB data hosted in vCore-based Azure Cosmos DB directly from their applications—no drivers or complex query logic required. It offers a simple, efficient way to connect your MongoDB data with web apps, mobile apps, and tools like Power BI.
Elevating Azure Cosmos DB Resilience with Per Partition Automatic Failover

Today, we’re excited to announce the preview of Per Partition Automatic Failover (PPAF) for Azure Cosmos DB, a significant improvement to our single-region write accounts that boosts availability and resilience.
Boost Query Performance with Global Secondary Indexes in Azure Cosmos DB

Global secondary indexes for Azure Cosmos DB—now in Public Preview—make it easier to query data efficiently, especially as your datasets grow.
New Generally Available and Preview Search Capabilities in Azure Cosmos DB for NoSQL

Monday 2025-05-26 Assorted Links

Published: 2025-05-26

Assorted links for Monday, May 26:

How Meta understands data at scale
- Managing and understanding large-scale data ecosystems is a significant challenge for many organizations, requiring innovative solutions to efficiently safeguard user data. Meta’s vast and diverse systems make it particularly challenging to comprehend its structure, meaning, and context at scale.
- To address these challenges, we made substantial investments in advanced data understanding technologies, as part of our Privacy Aware Infrastructure (PAI). Specifically, we have adopted a “shift-left” approach, integrating data schematization and annotations early in the product development process. We also created a universal privacy taxonomy, a standardized framework providing a common semantic vocabulary for data privacy management across Meta’s products that ensures quality data understanding and provides developers with reusable and efficient compliance tooling.
Efficiently and Elegantly Modeling Embeddings in Azure SQL and SQL Server

Storing and querying text embeddings in a database might seem challenging, but with the right schema design, it’s not only possible, it’s powerful. Whether you’re building AI-powered search, semantic filtering, or recommendation features, embeddings, and thus vectors, are now a first-class data type. So how do you model them well inside a database like SQL Server and Azure SQL?
Avoid T-SQL anti-patterns with the free T-SQL analysis tool

T-SQL Analyzer is a free, open-source, cross-platform command-line tool for identifying and reporting anti-patterns and design issues in SQL Server T-SQL scripts.
Surprising Scalability of Multitenancy

Roughly speaking, the cost of a system scales with its (short-term) peak traffic, but for most applications the value the system generates scales with the (long-term) average traffic.

The gap between “paying for peak” and “earning on average” is critical to understanding how the economics of large-scale cloud systems differ from traditional single-tenant systems.
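The peak-versus-average argument can be made concrete with a little arithmetic. The sketch below is not from the linked article: the tenant names and hourly request counts are invented for illustration, and it assumes capacity must be provisioned for the peak of whatever traffic shares a system.

```python
def peak_to_average(traffic):
    """Return peak load divided by mean load for a series of request counts."""
    return max(traffic) / (sum(traffic) / len(traffic))


# Hypothetical hourly request counts for three tenants whose peaks
# fall in different hours (illustrative numbers, not measured data).
tenant_a = [10, 10, 100, 10]
tenant_b = [100, 10, 10, 10]
tenant_c = [10, 100, 10, 10]
tenants = [tenant_a, tenant_b, tenant_c]

# Single-tenant deployments: each system is sized for its own peak.
single_tenant_capacity = sum(max(t) for t in tenants)

# Multitenant deployment: one system is sized for the peak of the sum.
combined = [sum(hour) for hour in zip(*tenants)]
multitenant_capacity = max(combined)

print("per-tenant peak/average:", round(peak_to_average(tenant_a), 2))
print("combined peak/average:  ", round(peak_to_average(combined), 2))
print("capacity needed:", single_tenant_capacity, "vs", multitenant_capacity)
```

With these made-up numbers the total traffic served is identical in both setups, but the shared system needs far less peak capacity because the tenants' peaks do not coincide. If the peaks were perfectly correlated, the benefit would disappear.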
FAST ’23 - Building and Operating a Pretty Big Storage System (My Adventures in Amazon S3)

Monday 2025-05-05 Assorted Links

Published: 2025-05-05

Assorted links for Monday, May 5:

How we ended up rewriting NuGet Restore in .NET 9

This is the story of how team members across NuGet, Visual Studio, and .NET embarked on a journey to fully rewrite the NuGet Restore algorithm to achieve break-through scale and performance. Written from the perspective of several team members, this entry provides a deep dive into the internals of NuGet, as well as strategies to identify and address performance issues. We hope that you enjoy it!
How Netflix Accurately Attributes eBPF Flow Logs

In a previous blog post, we described how Netflix uses eBPF to capture TCP flow logs at scale for enhanced cloud network insights. In this post, we delve deeper into how Netflix solved a core problem: accurately attributing flow IP addresses to workload identities.
Going beyond singleton, scoped, and transient lifetimes—tenant, pooled, and drifter

In this post I first briefly describe the standard lifetimes available in the .NET DI container. I then briefly describe the three hypothetical lifetimes described in the podcast. Finally, I show how you could implement one of these lifetimes in practice. In the next post I show a possible implementation for the remaining lifetime.
Multi-tenancy in ASP.NET Core 8 - Dependency Injection & Tenant Specific Services

This post discusses how we can have tenant specific services in a multi-tenant ASP.NET Core 8 application.
Enhancing Search Capabilities in SQL Server and Azure SQL with Hybrid Search and RRF Re-Ranking

In today’s data-driven world, delivering precise and contextually relevant search results is critical. SQL Server and Azure SQL Database now enable this through Hybrid Search—a technique that combines traditional full-text search with modern vector similarity search. This allows developers to build intelligent, AI-powered search experiences directly inside the database engine.
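For readers unfamiliar with the "RRF" in the title: Reciprocal Rank Fusion merges several ranked result lists by giving each document a score of 1 / (k + rank) per list and summing. The following is a minimal sketch of that formula in plain Python rather than T-SQL. The document IDs are hypothetical, k = 60 is the constant commonly used with RRF, and nothing here is taken from the linked post's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a full-text query and a vector similarity query.
full_text_hits = ["doc3", "doc1", "doc2"]
vector_hits = ["doc1", "doc4", "doc3"]

fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and because only ranks are used, the full-text and vector scores never need to be put on a common scale.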

Wednesday 2025-04-09 Assorted Links

Published: 2025-04-09

Assorted links for Wednesday, April 9:

A large scale analysis of hundreds of in-memory cache clusters at Twitter

In this work, we significantly further the understanding of real-world cache workloads by collecting production traces from 153 in-memory cache clusters at Twitter, sifting through over 80 TB of data, and sometimes interpreting the workloads in the context of the business logic behind them.
Durability: Linux File APIs

[Y]ou should assume your data is corrupt between when a write is issued until after a flush or force unit access write completes. However, most programs use system calls to write data. This article looks at the guarantees provided by the Linux file APIs. It seems like this should be simple: a program calls write() and after it completes, the data is durable. However, write() only copies data from the application into the kernel’s cache in memory. To force the data to be durable you need to use some additional mechanism.
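One such additional mechanism is fsync(). The sketch below is a minimal illustration, assuming a Linux system and a local file system. The helper name durable_write is made up for this example, and the linked article discusses further options and caveats that are not shown here.

```python
import os


def durable_write(path, data):
    """Write bytes to path and ask the kernel to flush them to stable storage.

    Hypothetical helper for illustration; assumes Linux semantics.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # At this point the data may exist only in the kernel's page cache.
        os.fsync(fd)
    finally:
        os.close(fd)

    # If the file was newly created, its directory entry must also be flushed,
    # which is done by calling fsync on the containing directory.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling durable_write("journal.bin", b"record") returns only after both fsync calls have completed. Whether the device itself honors the flush depends on the hardware and its configuration.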
How Netflix Scales its API with GraphQL Federation (Part 1)

As we’ve grown the number of developers and increased our domain complexity, developing the API aggregation layer has become increasingly hard.

In order to address this rising problem, we’ve developed a federated GraphQL platform to power the API layer.
Traefik: canary deployments with weighted load balancing

Traefik is “The Cloud Native Edge Router”, or rather yet another reverse proxy and load balancer. Omitting all the Cloud Native buzzwords, what really makes Traefik different from Nginx, HAProxy, and the like is the automatic and dynamic configurability it provides out of the box. And the most prominent part of it is probably its ability to do automatic service discovery.
Rust vs Go in 2025

Which is better, Rust or Go—and does that question even make sense? Which language should you choose for your next project in 2025, and why? How does Rust compare with Go in areas like performance, simplicity, safety, features, scale, and concurrency?

Tuesday 2025-04-08 Assorted Links

Published: 2025-04-08

Assorted links for Tuesday, April 8:

Building a ubiquitous shared infrastructure using Twine

Twine is our homegrown cluster management system, which has been running in production for the past decade. A cluster management system allocates workloads to machines and manages the life cycle of machines, containers, and workloads. Kubernetes is a prominent example of an open source cluster management system. Twine has helped convert our infrastructure from a collection of siloed pools of customized machines dedicated to individual workloads to a large-scale ubiquitous shared infrastructure in which any machine can run any workload.
Ensuring data reaches disk

The purpose of this document is to describe the path data takes from the application down to the storage, concentrating on places where data is buffered, and to then provide best practices for ensuring data is committed to stable storage so it is not lost along the way in the case of an adverse event. The main focus is on the C programming language, though the system calls mentioned should translate fairly easily to most other languages.
Revolutionizing Money Movements at Scale with Strong Data Consistency

As one of the underlying engines, Uber Money fulfills some of the most important aspects of people’s engagement in the Uber experience. A system like this should not only be robust, but should also be highly available with zero tolerance for downtime, in line with our success mantra: “To collect and disburse on-time, accurately and in-compliance”.

While we expand to multiple lines of business, and strategize the next best, the engineers in Uber Money also thrive on building the next generation’s Payments Platform which extends Uber’s growth. In this blog, we introduce you to this platform and provide insights into our learnings. This includes migrating hundreds of millions of customers between two asynchronous systems while maintaining data consistency with a goal of zero impact on our users.
How 30 Lines of Code Blew Up a 27-Ton Generator

A secret experiment in 2007 proved that hackers could devastate power grid equipment beyond repair—with a file no bigger than a GIF.
Avoiding overload in distributed systems by putting the smaller service in control

Within AWS, a common pattern is to split the system into services that are responsible for executing customer requests (the data plane), and services that are responsible for managing and vending customer configuration (the control plane). In this article, I discuss a number of different ways the data plane and the control plane interact with each other to avoid system overload. In many of these architectures the larger data plane fleet calls the smaller control plane fleet, but I also want to share the success we’ve had at Amazon when we put the smaller fleet in control.

Monday 2025-04-07 Assorted Links

Published: 2025-04-07

Assorted links for Monday, April 7:

The Lost Xperf Documentation–Disk Usage

As I’ve lamented previously, the documentation for xperf (Windows Performance Toolkit) is a bit light. The names of the columns in the summary tables can be exquisitely subtle, and I have never found any documentation for them. But, I’ve talked to the xperf authors, and I’ve used xperf a lot, and I’ve done some experiments, and here I share some more results, this time for the Disk Usage summary table.
The macro problem with microservices

In just 20 years, software engineering has shifted from architecting monoliths with a single database and centralized state to microservices where everything is distributed across multiple containers, servers, data centers, and even continents. Distributing things solves scaling concerns, but introduces a whole new world of problems, many of which were previously solved by monoliths.
Learnings From Two Years of Kubernetes in Production
FioSynth: A representative I/O benchmark and data visualizer for data center workloads

FioSynth is a benchmark tool used to automate the execution of storage workload suites and to parse results. It contains a base set of block level storage workloads, synthesized from production I/O traces, that simulate a diverse range of Facebook production services. It is useful for predicting how a storage device will perform in realistic production environments and for assisting with performance tuning.
Azure Container Registry Adds Teleportation

Project Teleport removes the cost of download and decompression by SMB mounting pre-expanded layers from the Azure Container Registry to Teleport enabled Azure container hosts.

Friday 2025-04-04 Assorted Links

Published: 2025-04-04

Assorted links for Friday, April 4:

Quake’s 3-D Engine: The Big Picture
Reasoning about colors

In July 2020 I went on a color-scheme vision quest. This led to some research on various color spaces and their utility, some investigation into the styling guidelines outlined by the base16 project, and the color utilities that ship within the GNU Emacs text editor. This article will be a whirlwind tour of things you can do to individual colors, and at the end how I put these blocks together.
Modern storage is plenty fast. It is the APIs that are bad.

In this article I will demonstrate that while hardware changed dramatically over the past decade, software APIs have not, or at least not enough. Riddled with memory copies, memory allocations, overly optimistic read ahead caching and all sorts of expensive operations, legacy APIs prevent us from making the most of our modern devices.
How io_uring and eBPF Will Revolutionize Programming in Linux

[eBPF and io_uring] may look evolutionary, but they are revolutionary in the sense that they will — we bet — completely change the way applications work with and think about the Linux Kernel.
An ex-Googler’s guide to dev tools

I thought it would be helpful to write a guide to dev tools outside of Google for the ex-Googler, written with an eye toward pragmatism and practicality. No doubt many ex-Googlers wish they could simply clone the Google internal environment to their new company, but you can’t boil the ocean. Here is my take on where you should start and a general path I think ex-Googlers can take to find the tools that will make them - and their new teams - as productive as possible.

Thursday 2025-04-03 Assorted Links

Published: 2025-04-03

Assorted links for Thursday, April 3:

A Critique of Snapshot Isolation

There have been recent attempts to enrich large-scale data stores, such as HBase and BigTable, with transactional support. Not surprisingly, inspired by traditional database management systems, serializability is usually compromised for the benefit of efficiency. For example, Google Percolator implements lock-based snapshot isolation on top of BigTable. We show in this paper that this compromise is not necessary in lock-free implementations of transactional support. We introduce write-snapshot isolation, a novel isolation level that has a performance comparable with that of snapshot isolation, and yet provides serializability.
Making Snapshot Isolation Serializable

This article develops a theory that characterizes when nonserializable executions of applications can occur under [snapshot isolation].
Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions

This thesis presents the first implementation-independent specifications of existing ANSI isolation levels and a number of levels that are widely used in commercial systems, e.g., Cursor Stability, Snapshot Isolation. It also specifies a variety of guarantees for predicate-based operations in an implementation-independent manner. Two new levels are defined that provide useful consistency guarantees to application writers; one is the weakest level that ensures consistent reads, while the other captures some useful consistency properties provided by pessimistic implementations.
What Does Write Skew Look Like?

This post is about gaining intuition for Write Skew, and, by extension, Snapshot Isolation. Snapshot Isolation is billed as a transaction isolation level that offers a good mix between performance and correctness, but the precise meaning of “correctness” here is often vague. In this post I want to break down and capture exactly when the thing called “write skew” can happen.
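A toy simulation can help build that intuition. The sketch below is not taken from the linked post. It uses the commonly cited "at least one doctor must stay on call" scenario, models snapshot isolation as "every transaction reads the same starting snapshot, and a transaction aborts only if its write set overlaps an already committed one", and all function names are made up for illustration.

```python
def run_serially(db, transactions):
    """Run transactions one after another; each sees the previous one's writes."""
    for txn in transactions:
        db.update(txn(dict(db)))
    return db


def run_under_snapshot_isolation(db, transactions):
    """Run concurrent transactions that all read the same initial snapshot.

    A transaction aborts only on a write-write conflict (first committer wins).
    """
    snapshot = dict(db)
    committed_keys = set()
    for txn in transactions:
        writes = txn(snapshot)
        if committed_keys.intersection(writes):
            continue  # overlapping write set: abort this transaction
        db.update(writes)
        committed_keys.update(writes)
    return db


def go_off_call(doctor):
    """Build a transaction: leave the rota only if someone else stays on call."""
    def txn(view):
        on_call = [name for name, is_on in view.items() if is_on]
        if len(on_call) >= 2:
            return {doctor: False}
        return {}
    return txn


requests = [go_off_call("alice"), go_off_call("bob")]

serial_result = run_serially({"alice": True, "bob": True}, requests)
si_result = run_under_snapshot_isolation({"alice": True, "bob": True}, requests)

print("serial:            ", serial_result)
print("snapshot isolation:", si_result)
```

Run serially, the second request sees that only one doctor remains and does nothing. Under the simulated snapshot isolation both transactions read the same snapshot and write disjoint keys, so both commit and nobody is left on call: the two writes never conflict, yet together they break an invariant that each transaction checked.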
Constructing human-grade parsers

Here are some observations on how parsers can be constructed in a way that makes it easier to recover from parse errors, produce multiple diagnostics in one pass, and provide partial results for further analysis even in the face of errors, providing a better experience for user-driven command line tools and interactive environments.

Wednesday 2025-04-02 Assorted Links

Published: 2025-04-02

Assorted links for Wednesday, April 2:

Build systems à la carte: Theory and practice

Build systems are awesome, terrifying – and unloved. They are used by every developer around the world, but are rarely the object of study. In this paper, we offer a systematic, and executable, framework for developing and comparing build systems, viewing them as related points in a landscape rather than as isolated phenomena. By teasing apart existing build systems, we can recombine their components, allowing us to prototype new build systems with desired properties.
Erasure Coding in Windows Azure Storage

In this paper we introduce a new set of codes for erasure coding called Local Reconstruction Codes (LRC). LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low.
Reproducible Builds

Reproducible builds are a set of software development practices that create an independently-verifiable path from source to binary code.
Getting to Deterministic Builds on Windows

This is a set of notes on getting to deterministic builds in C, C++ and Rust on Windows.
Our Software Dependency Problem

Software dependencies carry with them serious risks that are too often overlooked. The shift to easy, fine-grained software reuse has happened so quickly that we do not yet understand the best practices for choosing and using dependencies effectively, or even for deciding when they are appropriate and when not. My purpose in writing this article is to raise awareness of the risks and encourage more investigation of solutions.