
Organizers
General Chair: Bruce Jacob, Naval Academy
Program Chair: Abdel-Hameed Badawy, New Mexico State University
Publication Chair: Wendy Elsasser, Rambus
Publicity Chair: Chen Ding, University of Rochester
Web Chair: Matthias Jung, University of Würzburg
Program Committee
- Bruce Jacob (Naval Academy)
- Abdel-Hameed Badawy (NMSU)
- Atanu Barai (LANL)
- Jonathan Beard (Google)
- Vito Giovanni Castellana (PNNL)
- Bruce Christenson (Intel)
- Emanuele Confalonieri (Micron)
- Chen Ding (U. Rochester)
- David Donofrio (TCL)
- Ronald Dreslinski (U. Michigan)
- Wendy Elsasser (Rambus, Inc.)
- Dietmar Fey (U. Erlangen-Nuremberg)
- Maya Gokhale (LLNL)
- Simon Hammond (DOE/NNSA)
- Michael Jantz (U. Tennessee)
- Matthias Jung (U. Würzburg)
- John Leidel (TCL)
- Andres Marquez (PNNL)
- Dorin Patru (RIT)
- Ivy Bo Peng (KTH)
- Nirmal Prajapati (LANL)
- Petar Radojković (BSC)
- Marc Reichenbach (U. Rostock)
- Arun Rodrigues (SNL)
- Galen Shipman (LANL)
- Abhishek Singh (Samsung)
- Chirag Sudarshan (FZ Jülich)
- Robert Trout (Sadram)
- Thomas Vogelsang (Rambus)
- Norbert Wehn (RPTU)
- Kenneth Wright (AMD)
- Ke Zhang (ICT)
Keynotes
Keynote 1: Algorithm-Driven Codesign of Specialized Architectures for Energy Efficient AI and HPC
John Shalf, Lawrence Berkeley National Laboratory
John Shalf is the department head for computer science and computer architecture research at Lawrence Berkeley National Laboratory and a Distinguished Lecturer for the IEEE Electronics Packaging Society. Before joining Berkeley Lab 25 years ago, he worked at the National Center for Supercomputing Applications (NCSA) in Illinois and was a visiting scientist at the Albert Einstein Institute in Potsdam, Germany, where he co-created the Cactus Computational Toolkit for applications in general relativity.
We are entering an era in which improvements in the energy efficiency of microelectronics are slowing down even as demand for AI computing accelerates, and the resulting energy crisis is taxing the electric power grid. This talk and its associated paper explore options for continuing performance growth while maintaining energy efficiency in the next generation of AI and HPC systems. As computing engines have advanced in speed and heterogeneity, memory has increasingly become the primary bottleneck and a key determinant of scalability and performance for HPC and AI applications. Emerging memory technologies show promise, but we lack holistic hardware-software codesign tools to harness those innovations across a wide range of applications. With Moore's Law and other traditional sources of performance scaling in decline, the computing industry is turning to heterogeneous accelerators and memory systems to extract specialization-driven gains. This extreme heterogeneity, however, challenges current design, programming, and application methodologies, demanding new approaches to managing complexity. By adopting such approaches, future hardware and software can better support multiscale simulations and other demanding workloads, delivering greater performance, scalability, and energy efficiency for scientific computing.
Keynote 2: Tales from the Front Line of the AI Wars … a fireside chat
Kenneth Wright, AMD
Kenneth Wright is Senior Technical Director of System Design Engineering at AMD, where he leads end-to-end design and deployment of Instinct™ GPU-based platforms for large-scale AI and HPC. Over three decades he has bridged silicon, systems, and software, holding earlier technical leadership roles at IBM and Rambus Labs and turning paper architectures into reliable production fleets that balance performance, resilience, and cost. Ken's recent work spans multiple continents and environments, including KT/Moreh in Korea, ENI's HPC6 on the TOP500 in Italy, sovereign-AI initiatives in the UAE and Saudi Arabia, and a new AMD-powered AI facility in Grenoble, France, as well as LUMI in Finland and Pawsey's Setonix in Australia. He holds more than 125 patents, has published across industry and academia, and actively mentors rising technical leaders. A long-time member of the MEMSYS community, attending every year since 2017 and serving on the program committee, Ken brings a perspective that is unapologetically memory-first: making Flash → DRAM → HBM work in concert (with "memory in the network" along the path) is the difference between theoretical peak and delivered throughput.
In a seated conversation with Bruce Jacob, Kenneth Wright maps a minibatch's journey through a modern AMD-based AI cluster: cloud/object landing → parallel file system on NVMe flash (VDURA Data Platform or WEKA) → front-end fabric → AMD EPYC™ host DRAM → AMD Instinct™ HBM → back-end fabric → checkpoints back to the PFS, showing why memory is the real limiter at scale. Expect pragmatic takeaways on tiering strategy, tokenizer locality in DRAM, separating storage and compute fabrics, checkpoint cadence that doesn't crater step time, and how deep switch buffers ("memory in the network") help keep HBM busy. Framed around an AMD Infinity Storage-centric data path, the chat draws examples from KT/Moreh, ENI HPC6, LUMI, Pawsey, and sovereign-AI rollouts across the Middle East and Europe.
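To make the cadence point concrete: the fraction of wall-clock time lost to checkpointing is roughly the checkpoint write time divided by the interval between checkpoints. The sketch below works that arithmetic through; every number in it (state size, PFS bandwidth, step time, and the helper name `checkpoint_overhead`) is an illustrative assumption, not a figure from the talk.

```python
# Back-of-the-envelope model of checkpoint cadence vs. lost step time.
# All parameters are illustrative assumptions, not numbers from the talk.

def checkpoint_overhead(state_tib: float, pfs_write_gibps: float,
                        step_time_s: float, steps_between_ckpts: int) -> float:
    """Fraction of wall-clock time spent writing checkpoints, assuming a
    synchronous full-state write to the parallel file system (no async
    drain, no incremental checkpointing)."""
    ckpt_time_s = state_tib * 1024 / pfs_write_gibps   # TiB -> GiB -> seconds
    interval_s = steps_between_ckpts * step_time_s     # compute time per interval
    return ckpt_time_s / (interval_s + ckpt_time_s)

# Hypothetical cluster: 4 TiB of model + optimizer state, 500 GiB/s aggregate
# PFS write bandwidth, 2-second training steps.
for steps in (50, 200, 1000):
    ovh = checkpoint_overhead(4.0, 500.0, 2.0, steps)
    print(f"checkpoint every {steps:4d} steps -> {ovh:.1%} of time in I/O")
```

Under these assumed numbers, checkpointing every 50 steps loses roughly 8% of wall-clock time to I/O, while stretching the cadence to 1000 steps costs well under 1%; asynchronous draining or faster storage tiers shrink the numerator, which is where the tiering and fabric-separation discussion comes in.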