CS294-252: Architectures and Systems for Hyperscale Cloud Datacenters in the Era of Agentic AI
Fall 2025, UC Berkeley
Location: Tuesdays and Thursdays from 2pm-3:30pm in 405 Soda
Course Overview: Warehouse-Scale Computers (WSCs) host hyperscale cloud services relied on by billions of daily users and power the latest advances in AI/ML, data processing, and web services. While classical WSCs were built as homogeneous collections of servers and networking hardware, modern hardware scaling trends and exponential increases in demand for AI/ML compute have necessitated the introduction of specialized hardware in datacenter environments, including ML accelerators and ML “supercomputer pods”, SmartNICs, GPUs, and custom server SoCs. The challenge of designing these HW/SW systems is vast and ever-growing, but also critical to enabling the continued advancement of AI-powered applications.
This graduate-level course will explore two major themes:
- How do we architect hardware-software systems at scale to support efficient, practical, AI-powered application pipelines, end-to-end? (i.e. more than just the math)
- How can AI help us wrangle complexity in designing these HW/SW systems to meet exponential demand, from chip to datacenter and beyond?
Prerequisites: Students must satisfy the following requirements to enroll:
Completion of at least one of: CS252, CS262, CS268, EECS251.
OR
Completion of at least two of: CS152, EECS151, CS162, CS168.
Additionally, if you are an undergraduate, 5th-year master’s, or concurrent enrollment student, please fill out the following form to be considered for enrollment: https://forms.gle/qWfFdmeVUGK2PpJT9.
Calendar / Reading List
- August 28
- Intro to Warehouse-Scale Computers
- Reading 1
- L. Barroso, et al. The Datacenter as a Computer, Third Edition.
- September 2
- Datacenter-Wide Trends and Workloads
- Reading 1
- S. Kanev, et al. Profiling a Warehouse-Scale Computer.
- Reading 2
- W. Su, et al. DCPerf: An Open-Source, Battle-Tested Performance Benchmark Suite for Datacenter Workloads.
- September 4
- Power Management
- Reading 1
- V. Sakalkar, et al. Data Center Power Oversubscription with a Medium Voltage Power Plane and Priority-Aware Capping.
- Reading 2
- P. Patel, et al. Characterizing Power Management Opportunities for LLMs in the Cloud.
- September 9
- WSC Networking
- Reading 1
- L. Poutievski, et al. Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking.
- Reading 2
- D. Firestone, et al. Azure Accelerated Networking: SmartNICs in the Public Cloud.
- September 11
- Datacenter-Wide Trends and Workloads, Pt. 2
- Reading 1
- J. Dean, et al. The Tail at Scale. + L. Barroso, et al. Attack of the Killer Microseconds.
- Reading 2
- K. Seemakhupt, et al. A Cloud-Scale Characterization of Remote Procedure Calls.
- September 16
- Accelerators in WSCs, Pt. 1
- Reading 1
- I. Magaki, et al. ASIC Clouds: Specializing the Datacenter.
- Reading 2
- N. Jouppi, et al. TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.
- September 18
- Accelerators in WSCs, Pt. 2
- Reading 1
- A. Putnam, et al. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services.
- Reading 2
- C. Zhao, et al. Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures.
- September 23
- Accelerators in WSCs, Pt. 3
- Reading 1
- M. D. Hill, et al. Accelerator-Level Parallelism. + A. Saidi. Powering Amazon EC2: Deep dive on the AWS Nitro System.
- Reading 2
- S. Karandikar, et al. A Hardware Accelerator for Protocol Buffers.
- September 25
- Agile Hardware Design at Scale
- Reading 1
- P. Ranganathan, et al. Warehouse-scale video acceleration: co-design and deployment in the wild.
- Reading 2
- S. Karandikar, et al. FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud.
- September 30
- Memory and Disaggregation, Pt. 1
- Reading 1
- J. Weiner, et al. TMO: transparent memory offloading in datacenters.
- Reading 2
- K. Zhao, et al. Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters.
- October 2
- Sustainability, Pt. 1
- Reading 1
- B. Acun, et al. Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters.
- Reading 2
- I. Schneider, et al. Life-Cycle Emissions of AI Hardware: A Cradle-To-Grave Approach and Generational Trends.
- October 7
- Project Proposal Presentations
- October 9
- Project Proposal Presentations, Pt. 2
- October 14
- Silent Data Corruption
- Reading 1
- H. D. Dixit, et al. Silent Data Corruptions at Scale.
- Reading 2
- P. H. Hochschild, et al. Cores that don’t count.
- October 16
- Memory and Disaggregation, Pt. 2
- Reading 1
- P. Duraisamy, et al. Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale.
- Reading 2
- D. Berger, et al. Octopus: Scalable Low-Cost CXL Memory Pooling.
- October 21
- Server Design
- Reading 1
- G. Ayers, et al. Memory Hierarchy for Web Search.
- Reading 2
- A. Sriraman, et al. SoftSKU: optimizing server architectures for microservice diversity @scale.
- October 23
- Sustainability, Pt. 2
- Reading 1
- C. Elsworth, et al. Measuring the environmental impact of delivering AI at Google Scale.
- Reading 2
- J. Wang, et al. Designing Cloud Servers for Lower Carbon.
- October 28
- Workloads, Pt. 2
- Reading 1
- M. Ferdman, et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware.
- Reading 2
- Y. Gan, et al. An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems.
- October 30
- Data Analytics, Pt. 1
- Reading 1
- A. Gonzalez, et al. Profiling Hyperscale Big Data Processing.
- November 4
- Project Lightning Status Updates
- November 6
- Data Analytics, Pt. 2
- Reading 1
- L. Wu, et al. Q100: The Architecture and Design of a Database Processing Unit.
- Reading 2
- D. B. Johnston and A. Caldwell. AWS Redshift reimagined / AQUA Accelerator.
- November 11
- Holiday/No Class
- November 13
- Operating Systems
- Reading 1
- J. T. Humphries, et al. ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling.
- Reading 2
- J. T. Humphries, et al. A case against (most) context switches.
- November 18
- Attend IAP/Berkeley AI Workshop
- November 20
- Cluster Management
- Reading 1
- A. Verma, et al. Large-scale cluster management at Google with Borg.
- Reading 2
- C. Tang, et al. Twine: A Unified Cluster Management System for Shared Infrastructure.
- November 25
- Project Lightning Status Updates
- Remote attendance/presentation OK.
- November 27
- Holiday/No Class
- December 2
- Feedback-Directed Optimization
- Reading 1
- G. Ayers, et al. AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers.
- Reading 2
- Y. Zhang, et al. OCOLOS: Online COde Layout OptimizationS.
- December 4
- Performance Monitoring
- Reading 1
- M. Chow, et al. ServiceLab: Preventing Tiny Performance Regressions at Hyperscale through Pre-Production Testing.
- Reading 2
- D. Y. Yoon, et al. FBDetect: Catching Tiny Performance Regressions at Hyperscale through In-Production Monitoring.
- December 8 to 12
- N/A (RRR Week)
- December TBD
- Final Project Presentations (Finals Week)
Weekly Schedule
- Lecture/Discussion: Tuesdays and Thursdays from 2pm-3:30pm in 405 Soda
- Weekly Reading Reviews: See Ed for submission links.
- Due Mondays @ noon pacific for Tuesday lecture papers.
- Due Wednesdays @ noon pacific for Thursday lecture papers.
- Weekly Student Presenter Slides: Check your email for submission instructions.
- Due Fridays @ noon pacific for Tuesday lecture presentations.
- Due Tuesdays @ noon pacific for Thursday lecture presentations.
Assignments and Grading
The course workload will consist of the following:
- 25% of grade: For each class, students must read and submit a review of that day's two papers, then attend and participate in the class discussion.
- Reviews for up to two classes may be dropped, no questions asked.
- After project proposal presentations (week of Oct 7 and 9), students are required to read and submit a review for only one paper per class by the usual pre-class deadline; the second review is submitted during class, based on the in-class discussion.
- 25% of grade: Each student will lead the discussion of a few papers during the semester.
- 50% of grade: Students will complete a semester-long research project, in groups of 2 or 3, related to the course material.
