CS294-252: Architectures and Systems for Hyperscale Cloud Datacenters in the Era of Agentic AI
Fall 2025, UC Berkeley
Location: Tuesdays and Thursdays from 2pm-3:30pm in 405 Soda
Course Overview: Warehouse-Scale Computers (WSCs) host hyperscale cloud services relied on by billions of daily users and power the latest advances in AI/ML, data processing, and web services. While classical WSCs were built as homogeneous collections of servers and networking hardware, modern hardware scaling trends and exponential increases in demand for AI/ML compute have necessitated the introduction of specialized hardware in datacenter environments, including ML accelerators and ML “supercomputer pods”, SmartNICs, GPUs, and custom server SoCs. The challenge of designing these HW/SW systems is vast and ever-growing, but also critical to enabling the continued advancement of AI-powered applications.
This graduate-level course will explore two major themes:
- How do we architect hardware-software systems at scale to support efficient, practical, AI-powered application pipelines, end-to-end? (i.e. more than just the math)
- How can AI help us wrangle complexity in designing these HW/SW systems to meet exponential demand, from chip to datacenter and beyond?
Prerequisites: Students must satisfy the following requirements to enroll:
Completion of at least one of: CS252, CS262, CS268, EECS251.
OR
Completion of at least two of: CS152, EECS151, CS162, CS168.
Additionally, if you are an undergraduate, 5th-year master’s, or concurrent enrollment student, please fill out the following form to be considered for enrollment: https://forms.gle/qWfFdmeVUGK2PpJT9.
Calendar / Reading List
- August 28
- Intro to Warehouse-Scale Computers
- Reading 1
- L. Barroso, et al. The Datacenter as a Computer, Third Edition.
- September 2
- Datacenter-Wide Trends and Workloads
- Reading 1
- S. Kanev, et al. Profiling a Warehouse-Scale Computer.
- Reading 2
- W. Su, et al. DCPerf: An Open-Source, Battle-Tested Performance Benchmark Suite for Datacenter Workloads.
- September 4
- Power Management
- Reading 1
- V. Sakalkar, et al. Data Center Power Oversubscription with a Medium Voltage Power Plane and Priority-Aware Capping.
- Reading 2
- P. Patel, et al. Characterizing Power Management Opportunities for LLMs in the Cloud.
- September 9
- WSC Networking
- Reading 1
- L. Poutievski, et al. Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking.
- Reading 2
- D. Firestone, et al. Azure Accelerated Networking: SmartNICs in the Public Cloud.
- September 11
- Datacenter-Wide Trends and Workloads, Pt. 2
- Reading 1
- J. Dean, et al. The Tail at Scale. + L. Barroso, et al. Attack of the Killer Microseconds.
- Reading 2
- K. Seemakhupt, et al. A Cloud-Scale Characterization of Remote Procedure Calls.
- September 16
- Accelerators in WSCs, Pt. 1
- Reading 1
- I. Magaki, et al. ASIC Clouds: Specializing the Datacenter.
- Reading 2
- N. Jouppi, et al. TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings.
- September 18
- Accelerators in WSCs, Pt. 2
- Reading 1
- A. Putnam, et al. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services.
- Reading 2
- C. Zhao, et al. Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures.
- September 23
- Accelerators in WSCs, Pt. 3
- Reading 1
- M. D. Hill, et al. Accelerator-Level Parallelism. + A. Saidi. Powering Amazon EC2: Deep dive on the AWS Nitro System.
- Reading 2
- S. Karandikar, et al. A Hardware Accelerator for Protocol Buffers.
- September 25
- Agile Hardware Design at Scale
- Reading 1
- P. Ranganathan, et al. Warehouse-scale video acceleration: co-design and deployment in the wild.
- Reading 2
- S. Karandikar, et al. FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud.
- September 30
- Memory and Disaggregation, Pt. 1
- Reading 1
- J. Weiner, et al. TMO: transparent memory offloading in datacenters.
- Reading 2
- K. Zhao, et al. Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters.
- October 2
- Sustainability, Pt. 1
- Reading 1
- B. Acun, et al. Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters.
- Reading 2
- I. Schneider, et al. Life-Cycle Emissions of AI Hardware: A Cradle-To-Grave Approach and Generational Trends.
- October 7
- Project Proposal Presentations
- October 9
- Project Proposal Presentations, Pt. 2
- October 14
- Silent Data Corruption
- Reading 1
- H. D. Dixit, et al. Silent Data Corruptions at Scale.
- Reading 2
- P. H. Hochschild, et al. Cores that don’t count.
- October 16
- Memory and Disaggregation, Pt. 2
- Reading 1
- P. Duraisamy, et al. Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale.
- Reading 2
- D. Berger, et al. Octopus: Scalable Low-Cost CXL Memory Pooling.
- October 21
- Server Design
- Reading 1
- G. Ayers, et al. Memory Hierarchy for Web Search.
- Reading 2
- A. Sriraman, et al. SoftSKU: optimizing server architectures for microservice diversity @scale.
- October 23
- Sustainability, Pt. 2
- Reading 1
- C. Elsworth, et al. Measuring the environmental impact of delivering AI at Google Scale.
- Reading 2
- J. Wang, et al. Designing Cloud Servers for Lower Carbon.
- October 28
- Workloads, Pt. 2
- Reading 1
- M. Ferdman, et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware.
- Reading 2
- Y. Gan, et al. An Open-Source Benchmark Suite for Microservices and Their Hardware-Software Implications for Cloud & Edge Systems.
- October 30
- Data Analytics, Pt. 1
- Reading 1
- A. Gonzalez, et al. Profiling Hyperscale Big Data Processing.
- November 4
- Project Lightning Status Updates
- November 6
- Data Analytics, Pt. 2
- Reading 1
- L. Wu, et al. Q100: The Architecture and Design of a Database Processing Unit.
- Reading 2
- D. B. Johnston and A. Caldwell. AWS Redshift reimagined / AQUA Accelerator.
- November 11
- Holiday/No Class
- November 13
- Operating Systems
- Reading 1
- J. T. Humphries, et al. ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling.
- Reading 2
- J. T. Humphries, et al. A case against (most) context switches.
- November 18
- Attend IAP/Berkeley AI Workshop
- November 20
- Cluster Management
- Reading 1
- A. Verma, et al. Large-scale cluster management at Google with Borg.
- Reading 2
- C. Tang, et al. Twine: A Unified Cluster Management System for Shared Infrastructure.
- November 25
- Project Lightning Status Updates
- Remote attendance/presentation OK.
- November 27
- Holiday/No Class
- December 2
- Feedback-Directed Optimization
- Reading 1
- G. Ayers, et al. AsmDB: understanding and mitigating front-end stalls in warehouse-scale computers.
- Reading 2
- Y. Zhang, et al. OCOLOS: Online COde Layout OptimizationS.
- December 4
- Performance Monitoring
- Reading 1
- M. Chow, et al. ServiceLab: Preventing Tiny Performance Regressions at Hyperscale through Pre-Production Testing.
- Reading 2
- D. Y. Yoon, et al. FBDetect: Catching Tiny Performance Regressions at Hyperscale through In-Production Monitoring.
- December 8 to 12
- N/A (RRR Week)
- December TBD
- Final Project Presentations (Finals Week)
Weekly Schedule
- Lecture/Discussion: Tuesdays and Thursdays from 2pm-3:30pm in 405 Soda
- Weekly Reading Reviews: See Ed for submission links.
- Due Mondays @ noon pacific for Tuesday lecture papers.
- Due Wednesdays @ noon pacific for Thursday lecture papers.
- Weekly Student Presenter Slides: Check your email for submission instructions.
- Due Fridays @ noon pacific for Tuesday lecture presentations.
- Due Tuesdays @ noon pacific for Thursday lecture presentations.
Assignments and Grading
The course workload will consist of the following:
- 25% of grade: For each class, students must read and submit a review of that day's two papers, then attend and participate in the class discussion.
- Reviews for up to two classes may be dropped, no questions asked.
- After project proposal presentations (week of Oct 7 and 9), students are required to read and submit a review for only one paper per class by the usual pre-class deadline; the second review is submitted during class, based on the in-class discussion.
- 25% of grade: Each student will lead the discussion of a few papers during the semester.
- 50% of grade: Students will complete a semester-long research project, in groups of 2 or 3, related to the course material.
