Links | Steven Engelhardt

Tuesday 2025-04-01 Assorted Links

Assorted Links links

Published: 2025-04-01

Assorted links for Tuesday, April 1:

Cloudflare turns AI against itself with endless maze of irrelevant facts

On Wednesday, web infrastructure provider Cloudflare announced a new feature called “AI Labyrinth” that aims to combat unauthorized AI data scraping by serving fake AI-generated content to bots. The tool will attempt to thwart AI companies that crawl websites without permission to collect training data for large language models that power AI assistants like ChatGPT.
War story: the hardest bug I ever debugged

All of a sudden, without any ostensible cause, Google Docs was flooded with errors. How it took me 2 days and a coworker to solve the hardest bug I ever debugged.
Introducing Styrolite: Building a Linux Container Runtime from Scratch

Edera Protect is a suite of offerings bridging the gap between modern cloud native computing and virtualization-based security techniques. To power this platform, we’ve built our own container runtime designed to operate as a microservice, allowing it to run containers in a fully programmatic way—similar to how the Kubernetes Container Runtime Interface (CRI) enables container management through microservices.
High-Performance PNG Codec

I would like to announce a new high-performance PNG codec, which is much faster than other available codecs written in C, C++, and other programming languages.
Image creation and testing with HashiCorp Packer and ServerSpec

Monday 2025-03-31 Assorted Links

Published: 2025-03-31

Assorted links for Monday, March 31:

In S3 simplicity is table stakes

Today, on Pi Day (S3’s 19th birthday), I’m sharing a post from Andy Warfield, VP and Distinguished Engineer of S3. Andy takes us through S3’s evolution from simple object store to sophisticated data platform, illustrating how customer feedback has shaped every aspect of the service. It’s a fascinating look at how we maintain simplicity even as systems scale to handle hundreds of trillions of objects.
The xz attack shell script
Context is all you need: Better AI results with custom instructions

Earlier this month, we announced the general availability of custom instructions in Visual Studio Code. Custom instructions are how you give Copilot specific context about your team’s workflow, your particular style preferences, libraries the model may not know about, etc.

In this post we’ll dive into what custom instructions are, how you can use them today to drastically improve your results with GitHub Copilot, and even a brand new preview feature called “prompt files” that you can try today.
OpenAI adopts rival Anthropic’s standard for connecting AI models to data

In a post on X on Wednesday, OpenAI CEO Sam Altman said that OpenAI will add support for Anthropic’s Model Context Protocol, or MCP, across its products, including the desktop app for ChatGPT. MCP is an open source standard that helps AI models produce better, more relevant responses to certain queries.
Microsoft announces security AI agents to help overwhelmed humans

Microsoft’s six security agents will be available in preview next month, and are designed to do things like triage and process phishing and data loss alerts, prioritize critical incidents, and monitor for vulnerabilities.

Tuesday 2025-03-18 Assorted Links

Published: 2025-03-18

Assorted links for Tuesday, March 18:

Researchers astonished by tool’s apparent success at revealing AI’s “hidden objectives”

In a new paper published Thursday titled “Auditing language models for hidden objectives,” Anthropic researchers described how custom AI models trained to deliberately conceal certain “motivations” from evaluators could still inadvertently reveal secrets, due to their ability to adopt different contextual roles they call “personas.” The researchers were initially astonished by how effectively some of their interpretability methods seemed to uncover these hidden training objectives, although the methods are still under research.
Why SNES hardware is running faster than expected—and why it’s a problem

After significant research and testing on dozens of actual SNES units, the TASBot team now thinks that a cheap ceramic resonator used in the system’s Audio Processing Unit (APU) is to blame for much of this inconsistency. While Nintendo’s own documentation says the APU should run at a consistent rate of 24.576 MHz (and the associated Digital Signal Processor sample rate at a flat 32,000 Hz), in practice, that rate can vary just a bit based on heat, system age, and minor physical variations that develop in different console units over time.
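
The two nominal figures in that excerpt are related by simple arithmetic, sketched below. It assumes the sample rate is a fixed integer division of the resonator clock, which the nominal numbers imply but the excerpt does not state. The 0.1% drift is a made-up value for illustration, not a measurement from the TASBot team.

```python
# Nominal figures quoted in the excerpt.
NOMINAL_APU_HZ = 24_576_000    # 24.576 MHz
NOMINAL_SAMPLE_HZ = 32_000     # 32,000 Hz

# Assumption: the sample rate is the clock divided by a fixed integer.
cycles_per_sample = NOMINAL_APU_HZ // NOMINAL_SAMPLE_HZ


def sample_rate(apu_hz: float) -> float:
    """Sample rate implied by a given resonator frequency."""
    return apu_hz / cycles_per_sample


# Hypothetical resonator running 0.1% fast (illustrative value only).
hypothetical_apu_hz = NOMINAL_APU_HZ * 1.001
drifted_rate = sample_rate(hypothetical_apu_hz)
```

Under that assumption any relative error in the resonator shows up as the same relative error in the audio sample rate, which is why a small hardware variation is audible to tools that depend on exact timing.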
The Defer Technical Specification: It Is Time

Time for me to write this blog post and prepare everyone for the implementation blitz that needs to happen to make defer a success for the C programming language.
Introducing support for SLNX, a new, simpler solution file format in the .NET CLI

Solution files have been a part of the .NET and Visual Studio experience for many years now, and they’ve had the same custom format the whole time. Recently, the Visual Studio solution team has begun previewing a new, XML-based solution file format called SLNX. Starting in .NET SDK 9.0.200, the dotnet CLI supports building and interacting with these files in the same way as it does with existing solution files.
Hello HybridCache! Streamlining Cache Management for ASP.NET Core Applications

HybridCache is a new .NET 9 library available via the Microsoft.Extensions.Caching.Hybrid package and is now generally available! HybridCache, named for its ability to leverage both in-memory and distributed caches like Redis, ensures that data storage and retrieval is optimized for performance and security, regardless of the scale or complexity of your application.

Monday 2025-03-17 Assorted Links

Published: 2025-03-17

Assorted links for Monday, March 17:

Faster Go maps with Swiss Tables

Like sorting algorithms, hash table data structures continue to see improvements. In 2017, Sam Benzaquen, Alkis Evlogimenos, Matt Kulukundis, and Roman Perepelitsa at Google presented a new C++ hash table design, dubbed “Swiss Tables”. In 2018, their implementation was open sourced in the Abseil C++ library.

Go 1.24 includes a completely new implementation of the built-in map type, based on the Swiss Table design.
Harden-Runner detection: tj-actions/changed-files action is compromised

We are investigating a critical security incident involving the popular tj-actions/changed-files GitHub Action. We want to alert you immediately so that you can take prompt action. This post will be updated as new information becomes available.
Highlights from Git 2.49
1. Faster packing with name-hash v2
2. Backfill historical blobs in partial clones
Life Altering Postgresql Patterns: Many of these apply to all SQL, not just PostgreSQL
1. Use UUID primary keys
2. Give everything created_at and updated_at
3. On update restrict on delete restrict
4. Use schemas
5. Enum Tables
6. Name your tables singularly
7. Mechanically name join tables
8. Almost always soft delete
9. Represent statuses as a log
10. Mark special rows with a system_id
11. Use views sparingly
12. JSON Queries
cppmatch

A header-only C++ library that offers exceptionless error handling and type-safe enums, bringing Rust-inspired error propagation with the ? operator and the match operator to C++.

Friday 2025-03-14 Assorted Links

Published: 2025-03-14

Assorted links for Friday, March 14:

Agentic AI is the New Web App, and Your AI Strategy Must Evolve

Two years into the generative AI revolution, the LLMs that power tools like ChatGPT and Claude have become startlingly powerful. However, according to Salesforce CEO Marc Benioff, they may be reaching their limits. Per Benioff, the next evolution is not necessarily more intelligent LLMs but autonomous AI agents that leverage LLMs to execute tasks independently.
Title Launch Observability at Netflix Scale
Performance of the Python 3.14 tail-call interpreter

About a month ago, the CPython project merged a new implementation strategy for their bytecode interpreter. The initial headline results were very impressive, showing a 10-15% performance improvement on average across a wide range of benchmarks across a variety of platforms.

Unfortunately, as I will document in this post, these impressive performance gains turned out to be primarily due to inadvertently working around a regression in LLVM 19.
Model Context Protocol

MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.
Traversal-resistant file APIs

A path traversal vulnerability arises when an attacker can trick a program into opening a file other than the one it intended. This post explains this class of vulnerability, some existing defenses against it, and describes how the new os.Root API added in Go 1.24 provides a simple and robust defense against unintentional path traversal.
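
To make the vulnerability class concrete, here is a minimal Python sketch of one common application-level defense: resolve the requested path and refuse it if the result falls outside an allowed root. This is not a port of Go's os.Root. The helper name `open_within` and the file names are hypothetical. The check and the open are two separate steps, so the approach is still racy if an attacker can swap a symlink in between.

```python
import tempfile
from pathlib import Path


def open_within(root: Path, untrusted: str):
    """Open `untrusted` for reading only if it resolves inside `root`."""
    root = root.resolve()
    # resolve() collapses ".." components and follows symlinks.
    candidate = (root / untrusted).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"path escapes root: {untrusted!r}")
    return candidate.open("rb")


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "uploads"
    base.mkdir()
    (base / "report.txt").write_bytes(b"ok")
    (Path(tmp) / "secret.txt").write_bytes(b"secret")

    # A well-behaved request stays inside the root.
    with open_within(base, "report.txt") as f:
        allowed = f.read()

    # A traversal attempt is rejected before any file is opened.
    try:
        open_within(base, "../secret.txt")
        blocked = False
    except PermissionError:
        blocked = True
```

An absolute path such as `/etc/passwd` is rejected the same way, because joining it to the root discards the root and the containment check then fails.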

Thursday 2025-03-13 Assorted Links

Published: 2025-03-13

Assorted links for Thursday, March 13:

Instrumenting Apache Spark Structured Streaming jobs using OpenTelemetry

Monitoring Apache Spark structured streaming data workloads is challenging because the data is continuously processed as it arrives. Because of this always-on nature of stream processing, it is harder to troubleshoot problems during development and production without real-time metrics, alerting and dashboards. Traces complement metrics, and since Spark doesn’t include them by default, we integrate them using OpenTelemetry.
Protecting user data through source code analysis at scale

Meta’s Anti Scraping team focuses on preventing unauthorized scraping as part of our ongoing work to combat data misuse. In order to protect Meta’s changing codebase from scraping attacks, we have introduced static analysis tools into our workflow. These tools allow us to detect potential scraping vectors at scale across our Facebook, Instagram, and even parts of our Reality Labs codebases.
We’ve figured out the basics of a shape-shifting, T-1000-style material

Campàs and his team drew inspiration from processes called fluidization and convergent extension—mechanisms that cells in embryos use to coordinate their behavior when forming tissues and organs in a developing organism. The team built a robotic collective where each robotic unit behaved like an embryonic cell. As a collective, the robots behaved like a material that could change shape and switch between solid and liquid states, just like the T-1000.
Cross-Modal Retrieval: Why It Matters for Multimodal AI

With its ability to simultaneously process different data types (think text, image, audio, video and more), the continuing development of multimodal AI represents the next step that would help to further enhance a wide range of tools — including those for generative AI and autonomous agentic AI.
The Deployment Bottleneck No One Talks About

Most applications rely on cloud SDKs to connect to services like message brokers, queues, databases, APIs and more.

Rather than working directly with cloud SDKs, a better approach is to introduce a standardized layer between applications and cloud services. This allows developers to interact with essential resources without being tightly coupled to a specific provider’s SDKs. A framework like Dapr helps achieve this by providing a uniform API for interacting with cloud resources.

Wednesday 2025-03-12 Assorted Links

Published: 2025-03-12

Assorted links for Wednesday, March 12:

Zen and the Art of Microcode Hacking

The root cause of the EntrySign vulnerability is that the AMD Zen microcode signature verification algorithm uses the CMAC function as a hash function; however, CMAC is a message authentication code and does not necessarily provide the same security guarantees as a cryptographic hash function.

The weakness of using CMAC as a hash function is that anyone who has the encryption key is able to observe the intermediate values of the encryption and calculate a way to “correct” the difference so that the final output remains the same, even if the inputs are completely different.
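
The "correction" idea can be sketched with a toy example. Everything here is a simplification and an assumption: the cipher is a 4-round Feistel network built from SHA-256 rather than AES, the MAC is plain CBC-MAC rather than real CMAC, and the key and messages are invented. Real CMAC also mixes a key-derived subkey into the last block, which someone holding the key can compute as well. This is not the EntrySign exploit, only the underlying property.

```python
import hashlib

BLOCK = 8  # toy block size in bytes (AES uses 16)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Toy round function; stands in for a real block cipher's internals.
    return hashlib.sha256(key + bytes([i]) + half).digest()[: BLOCK // 2]


def encrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_mac(key: bytes, message: bytes) -> bytes:
    """Chain blocks through the cipher; the final state is the tag."""
    assert len(message) % BLOCK == 0
    state = bytes(BLOCK)
    for i in range(0, len(message), BLOCK):
        state = encrypt(key, _xor(state, message[i:i + BLOCK]))
    return state


def forge(key: bytes, target_tag: bytes, prefix: bytes) -> bytes:
    """Append one correction block so prefix gets the target tag."""
    state = cbc_mac(key, prefix)  # intermediate value, visible with the key
    correction = _xor(decrypt(key, target_tag), state)
    return prefix + correction


key = b"known-key"  # hypothetical key that the attacker has obtained
original = b"genuine update".ljust(16, b".")
tag = cbc_mac(key, original)
forged = forge(key, tag, b"tampered update!")
```

`forged` differs from `original` but produces the same tag, because the last block was chosen to cancel the difference in the chained state. A cryptographic hash function is designed so that no such shortcut exists, even for someone who knows every input to the computation.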
Thousands of websites hit by four backdoors in 3rd party JavaScript attack

While analyzing threats targeting WordPress frameworks, we found an attack where a single 3rd party JavaScript file was used to inject four separate backdoors into 1,000 compromised websites using cdn.csyndication[.]com/.

Creating four backdoors gives the attackers multiple points of re-entry should one be detected and removed. This is a unique case we haven’t seen before, and it introduces another type of attack made possible by abusing websites that don’t monitor 3rd party dependencies in the browsers of their users.
How to debug code with GitHub Copilot

GitHub Copilot can streamline your debugging process by troubleshooting in your IDE, analyzing pull requests, and more, helping you tackle issues faster and more robustly.
Finding leaked passwords with AI: How we built Copilot secret scanning

Passwords are notoriously difficult to detect with conventional programming approaches. AI can help us find passwords better because it understands context. This blog post will explore the technical challenges we faced with building the feature and the novel and creative ways we solved them.
Monads

If you understand what a functor is, it should be easy to grasp the idea of a monad. It’s a functor you can flatten.
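
A minimal sketch of that idea, using Python lists as the functor. The function names `fmap`, `flatten`, `bind`, and `neighbours` are illustrative choices, not taken from the linked post.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def fmap(f: Callable[[A], B], xs: list[A]) -> list[B]:
    """Functor map for lists: apply f to every element."""
    return [f(x) for x in xs]


def flatten(xss: list[list[A]]) -> list[A]:
    """Collapse exactly one level of nesting."""
    return [x for xs in xss for x in xs]


def bind(xs: list[A], f: Callable[[A], list[B]]) -> list[B]:
    """Monadic bind built from the two pieces above: map, then flatten."""
    return flatten(fmap(f, xs))


def neighbours(n: int) -> list[int]:
    # Hypothetical helper: each number yields a small list of results.
    return [n - 1, n + 1]


nested = fmap(neighbours, [10, 20])  # [[9, 11], [19, 21]]
flat = bind([10, 20], neighbours)    # [9, 11, 19, 21]
```

Mapping a list-returning function leaves a list of lists. Being able to flatten that result back to a single list is what lets such functions be chained.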

Tuesday 2025-03-11 Assorted Links

Published: 2025-03-11

Assorted links for Tuesday, March 11:

How Generative AI Is Reshaping the SDLC

Amazon Q shows how GenAI is helping developers at all stages of code creation and delivery, said Srini Iragavarapu of AWS in this episode of Makers.
.NET AI Template Now Available in Preview

Want to get started with AI development, but not sure where to start? I’ve got a treat for you – we have a new AI Chat Web App template now in preview.
Rethinking System Architecture: The Rise of Distributed Intelligence with eBPF

With eBPF, we can process, filter, and act on data as it flows through the system — directly at the kernel level. This architecture approach flips the centralized model on its head by embedding decision-making directly into the system at the point where data is generated. This means that instead of forwarding vast amounts of raw data for centralized processing, we can use intelligent, kernel-embedded programs to analyze, process, and act on data exactly where it was generated in real-time. By doing this, eBPF enables a shift from centralized, reactive decision-making to distributed, proactive intelligence.
When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds: This is the AI alignment problem, which has been explored extensively in science fiction.

When sensing defeat in a match against a skilled chess bot, [advanced AI models] don’t always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game.
Unlock new possibilities for AI Evaluations for .NET

The Microsoft.Extensions.AI.Evaluations library is designed to simplify the integration of AI evaluation processes into your applications. It provides a robust framework for evaluating your AI applications and automating the assessment of their performance.

Monday 2025-03-10 Assorted Links

Published: 2025-03-10

Assorted links for Monday, March 10:

Vector Databases: The Foundation of AI Agent Innovation
NVMe-oF Substantially Reduces Data Access Latency

NVMe-oF is a network protocol that extends the parallel access and low latency features of Nonvolatile Memory Express (NVMe) protocol across networked storage. Originally designed for local storage and common in direct-attached storage (DAS) architectures, NVMe delivers high-speed data access and low latency by directly interfacing with solid-state disks. NVMe-oF allows these same advantages to be achieved in distributed and clustered environments by enabling external storage to perform as if it were local.
Why Observability Needs To Go Headless

Many enterprises generate terabytes of log data every day, resulting in high costs to ingest, store and analyze that data. Even worse, many observability platforms are walled gardens, making it hard to use log data for use cases beyond observability, such as business intelligence, data science and machine learning.

To solve both of these problems, it’s time for headless observability, a fresh approach that decouples the frontend (visualization, querying and analytics) from the backend (data ingestion and storage) — all while keeping operations simple.
The Million-Dollar Problem of Slow Microservices Testing

By shifting integration tests from the slow outer loop into the rapid inner loop, organizations can fundamentally transform their development process.
Strobelight: A profiling service built on open source technology
- We’re sharing details about Strobelight, Meta’s profiling orchestrator.
- Strobelight combines several technologies, many open source, into a single service that helps engineers at Meta improve efficiency and utilization across our fleet.
- Using Strobelight, we’ve seen significant efficiency wins, including one that has resulted in an estimated 15,000 servers' worth of annual capacity savings.

Friday 2025-02-28 Assorted Links

Published: 2025-02-28

Assorted links for Friday, February 28:

5 Frameworks That Embrace Declarative State Management
The Staging Bottleneck: Microservices Testing in FinTech

A sandbox is a lightweight, isolated, production-like testing setup created dynamically from a shared baseline environment. Designed to replicate production conditions at a fraction of the cost and complexity, sandboxes effectively transform a single staging environment into multiple independent environments. By multiplexing the baseline staging setup, sandboxes provide tailored environments for individual engineers or QA teams without adding compliance risks or increasing maintenance burdens, as they inherit the same compliance and configuration frameworks as production.
Why AI Agents Need an Operational Database
Database Scalability and the Giant Flea: A Lesson in Complexity
How GitHub uses CodeQL to secure GitHub