Links | Steven Engelhardt

Thursday 2025-02-27 Assorted Links

Assorted Links links

Published: 2025-02-27

Assorted links for Thursday, February 27:

The Engineer’s Guide to Controlling Configuration Drift
Key techniques:
- Infrastructure as Code (IaC)
- Policy as Code (PaC)
- Compliance as Code
- Application Configuration Management
- Configuration Checklist
- Credential Management
- Centralized Configuration Management
- Environment Parity
How Precision Time Protocol handles leap seconds
Observability Isn’t Enough. It’s Time To Federate Log Data

With data federation, you can query data across many different sources without moving it. With this approach, no additional pipeline is needed; there are no egress costs and none of the security risks that come with migrating data.
Data logs: The latest evolution in Meta’s access tools

We created data logs as a solution to provide users who want more granular information with access to data stored in Hive. In this context, an individual data log entry is a formatted version of a single row of data from Hive that has been processed to make the underlying data transparent and easy to understand.
Open Source Redefines Data Platforms

Wednesday 2025-02-26 Assorted Links

Published: 2025-02-26

Assorted links for Wednesday, February 26:

What is observability 2.0?
Key differences between traditional observability and observability 2.0
- Data handling:
  - Traditional: Relies on separate tools for metrics, logs, and traces, creating silos and requiring manual correlation.
  - 2.0: Unifies telemetry data into a single platform, offering a comprehensive view of system health.
- Problem detection:
  - Traditional: Uses static thresholds and alerts that are often reactive and miss subtle issues.
  - 2.0: Employs AI and machine learning to identify anomalies in real-time, enabling proactive issue resolution.
- Focus on context:
  - Traditional: Provides raw technical data without linking it to broader business outcomes.
  - 2.0: Maps telemetry data to business metrics, ensuring decisions align with organizational goals.
- Scalability and adaptability:
  - Traditional: Struggles with dynamic environments like Kubernetes and serverless, often requiring custom setups.
  - 2.0: Designed for dynamic scaling, adapts with ease to changes in cloud-native architectures.
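
The "problem detection" contrast above can be illustrated with a small sketch. This is not the article's method: a trailing-window z-score stands in for the AI and machine learning it describes, and the function names, window size, and limits are assumptions chosen for the example.

```python
from statistics import mean, stdev

def static_alerts(values, threshold):
    """Indices of samples above a fixed threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

def zscore_alerts(values, window=5, z_limit=3.0):
    """Indices of samples that deviate sharply from the trailing window."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts

# Made-up latency samples: one spike that stays below the static limit.
latency_ms = [100, 102, 98, 101, 99, 100, 180, 101, 100, 99]
static_hits = static_alerts(latency_ms, threshold=500)  # assumed 500 ms limit
zscore_hits = zscore_alerts(latency_ms)
```

With these samples the static check reports nothing, while the z-score check flags the 180 ms spike at index 6.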
Cloud Native Computing Foundation Announces CubeFS Graduation

CubeFS is an open source distributed storage system that supports access protocols such as POSIX, HDFS, S3, and its own REST API. It can be used in many scenarios, including big data, AI/LLMs, container platforms, separation of storage and computing for databases and middleware, data sharing, and more. Key features of CubeFS include a highly scalable metadata service with strong consistency and multi-tenancy support for better resource utilization and tenant isolation.
What Developers Need to Know About Telemetry Pipelines

A telemetry pipeline is a system that collects, processes and routes telemetry data (logs, metrics and traces) from various sources to the right monitoring and analysis tools. Instead of managing separate agents or collectors for different signals, a telemetry pipeline unifies data handling, making observability more efficient and scalable.
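
As a rough illustration of the collect, process, and route stages described above, here is a minimal in-memory sketch. The event shape, the `token` field, and the destination names are all hypothetical; a real pipeline would read from agents and write to actual backends.

```python
from collections import defaultdict

def process(event):
    """Drop debug logs and redact a hypothetical 'token' field."""
    if event["signal"] == "log" and event.get("level") == "debug":
        return None
    cleaned = dict(event)
    if "token" in cleaned:
        cleaned["token"] = "[REDACTED]"
    return cleaned

# Hypothetical routing table: signal type -> destination name.
ROUTES = {"log": "log-store", "metric": "metrics-db", "trace": "trace-backend"}

def run_pipeline(events):
    """Collect events, process each one, and route it by signal type."""
    destinations = defaultdict(list)
    for event in events:
        processed = process(event)
        if processed is not None:
            destinations[ROUTES[processed["signal"]]].append(processed)
    return dict(destinations)

events = [
    {"signal": "log", "level": "debug", "msg": "cache probe"},
    {"signal": "log", "level": "error", "msg": "login failed", "token": "abc123"},
    {"signal": "metric", "name": "cpu", "value": 0.73},
    {"signal": "trace", "span": "GET /orders", "ms": 42},
]
routed = run_pipeline(events)
```

All three signal types pass through the same `run_pipeline` function, which is the point of the unified approach: filtering and redaction rules live in one place instead of in per-signal agents.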
What Are Linux Namespaces and How Are They Used?

Namespaces restrict resources that a containerized process can see so that one process can’t see the resources being used by another. This feature is crucial to the likes of containers and orchestration tools such as Kubernetes because, otherwise, one deployed container would be able to access or view resources used by another.
System Operators to Timekeepers: What Will Replace Leap Seconds?

Earth’s rotation, for thousands of years, has mostly slowed, the biggest driver being the changing tides that come with the gravitational tug of the moon. Currents in the planet’s outer core, which scientists are still trying to figure out, also have slowed the spin. But the core can speed up the spin, too, which may be what’s been happening recently. Additional leap seconds have become a lot less frequent in the past two decades.

Tuesday 2025-02-25 Assorted Links

Published: 2025-02-25

Assorted links for Tuesday, February 25:

Exploring the Inevitable Future of Data-Dependent Applications

Instead of combining technologies like MongoDB, Redis, Kafka, and application servers, why not skip system fragmentation and use a single technology platform? Only code makes these systems run, so why not code for just one system?
Simplify AI Development with Machine Learning Containers
Attacks on Maven proxy repositories

All tested [repository managers] not only store and serve artifacts, but also perform complex parsing and indexing operations on them. Therefore, a specially crafted artifact can be used to attack the repository manager that processes it. This opens a possibility for XSS, XXE, archive expansion, and path traversal attacks.
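
To make the path traversal risk concrete, the sketch below checks a ZIP-style archive for entries that would land outside the extraction directory. It is an illustrative defensive check, not something the article says any repository manager implements; `unsafe_members` and the `/tmp/repo-cache` directory are hypothetical.

```python
import io
import os
import zipfile

def unsafe_members(archive_bytes, dest_dir):
    """Return entry names that would resolve outside dest_dir if extracted."""
    root = os.path.realpath(dest_dir)
    bad = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in zf.namelist():
            target = os.path.realpath(os.path.join(root, name))
            if os.path.commonpath([root, target]) != root:
                bad.append(name)
    return bad

# Build a small in-memory archive with one benign and one hostile entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lib/app.jar", b"ok")
    zf.writestr("../../etc/cron.d/evil", b"payload")

flagged = unsafe_members(buf.getvalue(), "/tmp/repo-cache")
```

Only the entry whose name climbs out of the destination with `..` is flagged; nothing is written to disk, because the check works on names alone.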
Modernizing legacy code with GitHub Copilot: Tips and examples
Open source AI is already finding its way into production

In our survey of 2,000 enterprise respondents on software development teams across the US, Germany, India, and Brazil, nearly everyone said they had experimented with open source AI models at some point.

Monday 2025-02-24 Assorted Links

Published: 2025-02-24

Assorted links for Monday, February 24:

shared_ptr overuse in C++
The pitfall can be divided in two:
- Passing smart pointers to functions that don’t have to deal with ownership
- Improper use of shared_ptr in non-owning objects
.NET 9 Networking Improvements
In this release, we made two impactful performance improvements in HTTP connection pooling.
- We added opt-in support for multiple HTTP/3 connections.
- We also addressed lock contention in HTTP 1.1 connection pooling (dotnet/runtime#70098).
One of the main pain points when debugging HTTP traffic of applications using earlier versions of .NET is that the application doesn’t react to changes in Windows proxy settings. This issue was mitigated in dotnet/runtime#103364, where the HttpClient.DefaultProxy is set to an instance of Windows proxy that listens for registry changes and reloads the proxy settings when notified.
What’s New for ASP.NET Core & Blazor in .NET 9
1. Making Static Files Lightning Fast with MapStaticAssets
2. Blazor Gets Even More Interactive and Hybrid-Friendly
3. Simplified Authentication State Management
Simplify your .NET data transfers with the new Azure Storage Data Movement library
New York Times goes all-in on internal AI tools

The New York Times is greenlighting the use of AI for its product and editorial staff, saying that internal tools could eventually write social copy, SEO headlines, and some code.

Friday 2025-02-21 Assorted Links

Published: 2025-02-21

Assorted links for Friday, February 21:

How to Store a Knowledge Graph in a Database

A knowledge graph represents information as a set of nodes and the relationships between those nodes.

When your source data consists of assets like technical documentation, research publications, or highly interconnected websites, a knowledge graph returns better results than a simple vector search. That’s because a knowledge graph search can traverse links between nodes, finding semantically relevant results two or more steps away from the first node.
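
The "two or more steps away" idea can be sketched with a breadth-first traversal over a tiny graph. The node names and the `links_to` relationship are made up for the example, and a plain dictionary stands in for whatever database the article has in mind.

```python
from collections import deque

# Hypothetical graph: node -> list of (relationship, neighbor) pairs.
GRAPH = {
    "install-guide": [("links_to", "config-reference")],
    "config-reference": [("links_to", "tls-settings")],
    "tls-settings": [("links_to", "cert-rotation")],
    "cert-rotation": [],
}

def neighbors_within(graph, start, max_hops):
    """Breadth-first search returning {node: hop_count} up to max_hops away."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen

related = neighbors_within(GRAPH, "install-guide", max_hops=2)
```

Starting from `install-guide`, the traversal reaches `tls-settings` two hops away, a result that a similarity search over the first node alone would have no link to follow to.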
AI Agents Are About to Blow Up the Business Process Layer

Agentic AI is all about autonomy (think self-driving cars), employing a system of agents to constantly adapt to dynamic environments and independently create, execute and optimize results.

When agentic AI is applied to business process workflows, it can replace fragile, static business processes with dynamic, context-aware automation systems.
Storing, querying and keeping embeddings updated: options and best practices
Database and AI: solutions for keeping embeddings updated
- Using a Database Trigger
- Using Change Tracking
- Using an Azure Function Sql Trigger binding
- Using Azure Logic Apps
- Using Change Data Capture
- Using the new Change Event Stream
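
All of the options above answer the same question: which rows changed since their embedding was last computed? The sketch below shows that idea in plain Python using a content hash. This is a swapped-in technique for illustration, not any of the SQL features listed, and the row data and function names are hypothetical.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of the text an embedding was computed from."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def rows_needing_embedding(rows, stored_hashes):
    """Return ids whose text is new or differs from the stored fingerprint."""
    return [
        row_id
        for row_id, text in rows.items()
        if stored_hashes.get(row_id) != content_hash(text)
    ]

# Current table contents (id -> text) and hashes saved at last embedding time.
rows = {1: "reset your password", 2: "billing FAQ (updated)", 3: "new onboarding guide"}
stored_hashes = {
    1: content_hash("reset your password"),
    2: content_hash("billing FAQ"),
}
stale = rows_needing_embedding(rows, stored_hashes)
```

Row 2 is flagged because its text changed, and row 3 because it has no stored hash yet; row 1 is skipped, so its embedding is not recomputed.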
Introducing Change Event Streaming: Join the Azure SQL Database Private Preview for Change Data Streaming

Thursday 2025-02-20 Assorted Links

Published: 2025-02-20

Assorted links for Thursday, February 20:

Data Infrastructure, Not AI Models, Will Drive IT Spend in 2025

As organizations race to implement Artificial Intelligence (AI) initiatives, they’re encountering an unexpected bottleneck: the massive cost of data infrastructure required to support AI applications.

I’m seeing organizations address these challenges through innovative architectural approaches. One promising direction is the adoption of leaderless architectures combined with object storage. This approach eliminates the need for expensive data movement by leveraging cloud-native storage solutions that simultaneously serve multiple purposes.

Another key strategy involves rethinking how data is organized and accessed. Rather than maintaining separate infrastructures for streaming and batch processing, companies are moving toward unified platforms that can efficiently handle both workloads. This reduces infrastructure costs and simplifies data governance and access patterns.
Object Store Apps: Cloud Native’s Freshest Architecture

An increasing number of start-ups and end-users find that using cloud object storage as the persistence layer saves money and engineering time that would otherwise be needed to ensure consistency.
The Feds Push for WebAssembly Security Over eBPF

According to a National Institute of Standards and Technology (NIST) paper, “A Data Protection Approach for Cloud-Native Applications” (authors: Wesley Hales from LeakSignal; Ramaswamy Chandramouli, a supervisory computer scientist at NIST), WebAssembly could and should be integrated across the cloud native service mesh sphere in particular to enhance security.
Deep Dive Into DeepSeek-R1: How It Works and What It Can Do

During DeepSeek-R1’s training process, it became clear that by rewarding accurate and coherent answers, nascent model behaviors like self-reflection, self-verification, long-chain reasoning and autonomous problem-solving point to the possibility of emergent reasoning that is learned over time, rather than overtly taught — thus possibly paving the way for further breakthroughs in AI research.
Use Azure Cosmos DB as a Docker container in CI/CD pipelines

The Linux-based Azure Cosmos DB emulator is available as a Docker container and can run on a variety of platforms, including ARM64 architectures like Apple Silicon. It allows local development and testing of applications without needing an Azure subscription or incurring service costs. You can easily run it as a Docker container, and use it for local development and testing.

Wednesday 2025-02-19 Assorted Links

Published: 2025-02-19

Assorted links for Wednesday, February 19:

AI used to design a multi-step enzyme that can digest some plastics

A new paper today describes a success in making a brand-new enzyme with the potential to digest plastics. But it also shows how even a simple enzyme may have an extremely complex mechanism—and one that’s hard to tackle, even with the latest AI tools.
3 takeaways from red teaming 100 generative AI products
1. Generative AI systems amplify existing security risks and introduce new ones
2. Humans are at the center of improving and securing AI
3. Defense in depth is key for keeping AI systems safe
AIs and Robots Should Sound Robotic

We have a simple proposal: all talking AIs and robots should use a ring modulator. In the mid-twentieth century, before it was easy to create actual robotic-sounding speech synthetically, ring modulators were used to make actors’ voices sound robotic.
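
For readers unfamiliar with the technique, a ring modulator multiplies the input signal by a carrier wave. The sketch below applies that to a synthetic tone; the 30 Hz carrier, the 8 kHz sample rate, and the 200 Hz stand-in for a voice are assumptions for the example, not values taken from the essay.

```python
import math

def ring_modulate(samples, sample_rate, carrier_hz=30.0):
    """Multiply each input sample by a sine carrier at carrier_hz."""
    return [
        s * math.sin(2 * math.pi * carrier_hz * n / sample_rate)
        for n, s in enumerate(samples)
    ]

sample_rate = 8000
# One second of a 200 Hz tone standing in for a voice signal.
voice = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
robotic = ring_modulate(voice, sample_rate)
```

Because the product of two sines equals half the difference of two cosines at the difference and sum frequencies, the 200 Hz tone is replaced by components at 170 Hz and 230 Hz. Applied to every frequency in a voice, that shift is what produces the metallic, robotic quality.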
2025 OWASP Top 10 for LLM Applications: A Quick Guide
1. LLM01: Prompt injection
2. LLM02: Sensitive information disclosure
3. LLM03: Supply chain
4. LLM04: Data and model poisoning
5. LLM05: Improper output handling
6. LLM06: Excessive agency
7. LLM07: System prompt leakage
8. LLM08: Vector and embedding weaknesses
9. LLM09: Misinformation
10. LLM10: Unbounded consumption
Cloud vs. On-Prem: Which Is Better for Your Kubernetes Cluster?

Cloud solutions offer unparalleled flexibility and ease of scaling, while on-premises setups provide unmatched control and security for sensitive workloads.

Tuesday 2025-02-18 Assorted Links

Published: 2025-02-18

Assorted links for Tuesday, February 18:

A brief and incomplete comparison of memory corruption detection tools

ASAN detects a lot more types of memory errors, but it requires that you recompile everything. This can be limiting if you suspect that the problem is coming from a component you cannot recompile (say because you aren’t set up to recompile it, or because you don’t have the source code). Valgrind and AppVerifier have the advantage that you can turn them on for a process without requiring a recompilation.
Why Mocks Fail: Real-Environment Testing for Microservices
- Use mocks for edge cases and scenarios requiring controlled inputs.
- Leverage real environments to validate integration flows, complex API behaviors and performance characteristics against real dependencies.
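
The first recommendation, mocks for edge cases that need controlled inputs, might look like the sketch below. `fetch_price` and the `get_price` client method are hypothetical; the point is that a mock can force a timeout sequence that would be hard to reproduce on demand against a real dependency.

```python
from unittest.mock import Mock

def fetch_price(client, sku, retries=2):
    """Return the price for sku, retrying when the client times out."""
    for attempt in range(retries + 1):
        try:
            return client.get_price(sku)
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries: surface the timeout

# Controlled input: the dependency times out twice, then succeeds.
client = Mock()
client.get_price.side_effect = [TimeoutError(), TimeoutError(), 19.99]

price = fetch_price(client, "sku-42")
calls = client.get_price.call_count
```

The mock verifies the retry logic in isolation. Whether the real pricing service behaves the way the mock assumes is exactly what the second recommendation, testing in a real environment, is meant to cover.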
Emerging Patterns in Building GenAI Products:
- Direct Prompting: Send prompts directly from the user to a Foundation LLM
- Embeddings: Transform large data blocks into numeric vectors so that embeddings near each other represent related concepts
- Evals: Evaluate the responses of an LLM in the context of a specific task
- Hybrid Retriever: Combine searches using embeddings with other search techniques
- Query Rewriting: Use an LLM to create several alternative formulations of a query and search with all the alternatives
- Reranker: Rank a set of retrieved document fragments according to their usefulness and send the best of them to the LLM
- Retrieval Augmented Generation (RAG): Retrieve relevant document fragments and include these when prompting the LLM
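
Several of these patterns compose naturally. The toy sketch below combines embeddings, a hybrid retriever, and RAG prompt assembly. The three-dimensional vectors are hand-made rather than produced by a model, the 50/50 score weighting is an arbitrary assumption, and no LLM is actually called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query words that appear in the text."""
    words = query.lower().split()
    text_words = set(text.lower().split())
    return sum(w in text_words for w in words) / len(words)

# Hand-made "embeddings"; a real system would get these from a model.
DOCS = [
    {"text": "rotate tls certificates every 90 days", "vec": [0.9, 0.1, 0.0]},
    {"text": "configure log retention for audit trails", "vec": [0.1, 0.9, 0.1]},
    {"text": "certificate errors usually mean an expired chain", "vec": [0.8, 0.2, 0.1]},
]

def hybrid_retrieve(query, query_vec, docs, k=2, alpha=0.5):
    """Rank docs by a weighted mix of embedding and keyword scores."""
    scored = []
    for d in docs:
        score = alpha * cosine(query_vec, d["vec"]) + (1 - alpha) * keyword_score(query, d["text"])
        scored.append((score, d["text"]))
    scored.sort(reverse=True)
    return [text for _score, text in scored[:k]]

def build_prompt(query, fragments):
    """RAG step: prepend the retrieved fragments to the question."""
    context = "\n".join(f"- {f}" for f in fragments)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

query = "how often to rotate tls certificates"
top = hybrid_retrieve(query, [1.0, 0.0, 0.0], DOCS)
prompt = build_prompt(query, top)
```

The certificate rotation fragment ranks first because it scores well on both signals. The expired-chain fragment has no exact keyword overlap but still ranks second on embedding similarity alone, which is the case a keyword-only search would miss.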
How Meta discovers data flows via lineage at scale

In order to build high-quality data lineage, we developed different techniques to collect data flow signals across different technology stacks: static code analysis for different languages, runtime instrumentation, and input and output data matching, etc.
Sam Altman lays out roadmap for OpenAI’s long-awaited GPT-5 model

GPT-5 will be a system that brings together features from across OpenAI’s current AI model lineup, including conventional AI models, SR models, and specialized models that do tasks like web search and research.

Monday 2025-02-17 Assorted Links

Published: 2025-02-17

Assorted links for Monday, February 17:

Time Bandit ChatGPT jailbreak bypasses safeguards on sensitive topics

A ChatGPT jailbreak flaw, dubbed “Time Bandit,” allows you to bypass OpenAI’s safety guidelines when asking for detailed instructions on sensitive topics, including the creation of weapons, information on nuclear topics, and malware creation.
A US Treasury Threat Intelligence Analysis Designates DOGE Staff as ‘Insider Threat’

An internal email reviewed by WIRED calls DOGE staff’s access to federal payments systems “the single greatest insider threat risk the Bureau of the Fiscal Service has ever faced.”
Optimizing for Developer Productivity Creates a Winning DevEx

Developer productivity is not about having 50 tools. It’s about improving experience, speed and productivity with the right kinds of tools.
Microsoft.Testing.Platform: Now Supported by All Major .NET Test Frameworks

Microsoft.Testing.Platform is a lightweight and portable alternative to VSTest for running tests in all contexts, including continuous integration (CI) pipelines, CLI, Visual Studio Test Explorer, and VS Code Test Explorer. The Microsoft.Testing.Platform is embedded directly in your test projects, and there are no other app dependencies, such as vstest.console or dotnet test, needed to run your tests.
OpenAI’s secret weapon against Nvidia dependence takes shape

OpenAI is entering the final stages of designing its long-rumored AI processor with the aim of decreasing the company’s dependence on Nvidia hardware, according to a Reuters report released Monday. The ChatGPT creator plans to send its chip designs to Taiwan Semiconductor Manufacturing Co. (TSMC) for fabrication within the next few months, but the chip has not yet been formally announced.

Tuesday 2025-01-21 Assorted Links

Published: 2025-01-21

Assorted links for Tuesday, January 21: