
Assorted links for Monday, May 26:
- How Meta understands data at scale
Managing and understanding large-scale data ecosystems is a significant challenge for many organizations, requiring innovative solutions to efficiently safeguard user data. Meta’s vast and diverse systems make it particularly challenging to comprehend its structure, meaning, and context at scale.
To address these challenges, we made substantial investments in advanced data understanding technologies, as part of our Privacy Aware Infrastructure (PAI). Specifically, we have adopted a “shift-left” approach, integrating data schematization and annotations early in the product development process. We also created a universal privacy taxonomy, a standardized framework providing a common semantic vocabulary for data privacy management across Meta’s products that ensures quality data understanding and provides developers with reusable and efficient compliance tooling.
- Efficiently and Elegantly Modeling Embeddings in Azure SQL and SQL Server
Storing and querying text embeddings in a database might seem challenging, but with the right schema design, it’s not only possible, it’s powerful. Whether you’re building AI-powered search, semantic filtering, or recommendation features, embeddings, and thus vectors, are now a first-class data type. So how do you model them well inside a database like SQL Server and Azure SQL?
- Avoid T-SQL anti-patterns with the free T-SQL analysis tool
T-SQL Analyzer is a free, open-source, cross-platform command-line tool for identifying and reporting the presence of anti-patterns and design issues in SQL Server T-SQL scripts.
- Surprising Scalability of Multitenancy
Roughly speaking, the cost of a system scales with its (short-term) peak traffic, but for most applications the value the system generates scales with the (long-term) average traffic.
The gap between “paying for peak” and “earning on average” is critical to understanding how the economics of large-scale cloud systems differ from traditional single-tenant systems.
- FAST ’23 - Building and Operating a Pretty Big Storage System (My Adventures in Amazon S3)