
Organizers
General Chair: Bruce Jacob, Naval Academy
Program Chair: Abdel-Hameed Badawy, New Mexico State University
Publication Chair: Wendy Elsasser, Rambus
Publicity Chair: Chen Ding, University of Rochester
Web Chair: Matthias Jung, University of Würzburg
Program Committee
- Bruce Jacob (Naval Academy)
- Abdel-Hameed Badawy (NMSU)
- Atanu Barai (LANL)
- Jonathan Beard (Google)
- Vito Giovanni Castellana (PNNL)
- Bruce Christenson (Intel)
- Emanuele Confalonieri (Micron)
- Chen Ding (U. Rochester)
- David Donofrio (TCL)
- Ronald Dreslinski (U. Michigan)
- Wendy Elsasser (Rambus, Inc.)
- Dietmar Fey (U. Erlangen-Nuremberg)
- Maya Gokhale (LLNL)
- Simon Hammond (DOE/NNSA)
- Michael Jantz (U. Tennessee)
- Matthias Jung (U. Würzburg)
- John Leidel (TCL)
- Andres Marquez (PNNL)
- Dorin Patru (RIT)
- Ivy Bo Peng (KTH)
- Nirmal Prajapati (LANL)
- Petar Radojković (BSC)
- Marc Reichenbach (U. Rostock)
- Arun Rodrigues (SNL)
- Galen Shipman (LANL)
- Abhishek Singh (Samsung)
- Chirag Sudarshan (FZ Jülich)
- Robert Trout (Sadram)
- Thomas Vogelsang (Rambus)
- Norbert Wehn (RPTU)
- Kenneth Wright (AMD)
- Ke Zhang (ICT)
Keynotes
Keynote 1: Algorithm-Driven Codesign of Specialized Architectures for Energy Efficient AI and HPC
John Shalf, Lawrence Berkeley National Laboratory
John Shalf is the department head for computer science and computer architecture research at Lawrence Berkeley National Laboratory and a Distinguished Lecturer for the IEEE Electronics Packaging Society. Before joining Berkeley Lab 25 years ago, he worked at the National Center for Supercomputing Applications (NCSA) in Illinois and was a visiting scientist at the Albert Einstein Institute in Potsdam, Germany, where he co-created the Cactus Computational Toolkit for applications in general relativity.
We are entering an era in which improvements in the energy efficiency of microelectronics are slowing down even as demand for AI computing accelerates, and the resulting energy crisis is taxing the electric power grid. This talk and its associated paper explore options for continuing performance growth while maintaining energy efficiency in the next generation of AI and HPC systems. As computing engines have advanced in speed and heterogeneity, memory has increasingly become the primary bottleneck and a key determinant of scalability and performance for HPC and AI applications. Emerging memory technologies show promise, but we lack holistic hardware-software codesign tools to harness those innovations across a wide range of applications. With Moore's Law and other traditional sources of performance scaling in decline, the computing industry is turning to heterogeneous accelerators and memory systems to extract specialization-driven gains. This extreme heterogeneity, however, challenges current design, programming, and application methodologies, demanding new approaches to managing complexity. By adopting such approaches, future hardware and software can better support multiscale simulations and other demanding workloads, delivering greater performance, scalability, and energy efficiency for scientific computing.
Keynote 2: Tales from the Front Line of the AI Wars … a fireside chat
Kenneth Wright, AMD
Kenneth Wright is Senior Technical Director of System Design Engineering at AMD, where he leads end-to-end design and deployment of Instinct™ GPU-based platforms for large-scale AI and HPC. Over three decades he has bridged silicon, systems, and software, holding earlier technical leadership roles at IBM and Rambus Labs and turning paper architectures into reliable production fleets that balance performance, resilience, and cost. Ken's recent work spans multiple continents and environments, including KT/Moreh in Korea, ENI's HPC6 on the TOP500 in Italy, sovereign-AI initiatives in the UAE and Saudi Arabia, and a new AMD-powered AI facility in Grenoble, France, as well as LUMI in Finland and Pawsey's Setonix in Australia. He holds more than 125 patents, has published across industry and academia, and actively mentors rising technical leaders. A long-time member of the MEMSYS community, attending every year since 2017 and serving on the program committee, Ken brings a perspective that is unapologetically memory-first: making Flash → DRAM → HBM work in concert (with "memory in the network" along the path) is the difference between theoretical peak and delivered throughput.
In a seated conversation with Bruce Jacob, Kenneth Wright maps a minibatch's journey through a modern AMD-based AI cluster: cloud/object landing → parallel file system on NVMe flash (VDURA Data Platform or WEKA) → front-end fabric → AMD EPYC™ host DRAM → AMD Instinct™ HBM → back-end fabric → checkpoints back to the PFS, showing why memory is the real limiter at scale. Expect pragmatic takeaways on tiering strategy, tokenizer locality in DRAM, separating storage and compute fabrics, checkpoint cadence that doesn't crater step time, and how deep switch buffers ("memory in the network") help keep HBM busy. Framed around an AMD Infinity Storage-centric data path, the chat draws examples from KT/Moreh, ENI HPC6, LUMI, Pawsey, and sovereign-AI rollouts across the Middle East and Europe.
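To make the cadence point concrete: the fraction of wall-clock time lost to checkpointing is roughly the checkpoint write time divided by the interval between checkpoints. The sketch below works that arithmetic through; every number in it (state size, PFS bandwidth, step time, and the helper name `checkpoint_overhead`) is an illustrative assumption, not a figure from the talk.

```python
# Back-of-the-envelope model of checkpoint cadence vs. lost step time.
# All parameters are illustrative assumptions, not numbers from the talk.

def checkpoint_overhead(state_tib: float, pfs_write_gibps: float,
                        step_time_s: float, steps_between_ckpts: int) -> float:
    """Fraction of wall-clock time spent writing checkpoints, assuming a
    synchronous full-state write to the parallel file system (no async
    drain, no incremental checkpointing)."""
    ckpt_time_s = state_tib * 1024 / pfs_write_gibps   # TiB -> GiB -> seconds
    interval_s = steps_between_ckpts * step_time_s     # compute time per interval
    return ckpt_time_s / (interval_s + ckpt_time_s)

# Hypothetical cluster: 4 TiB of model + optimizer state, 500 GiB/s aggregate
# PFS write bandwidth, 2-second training steps.
for steps in (50, 200, 1000):
    ovh = checkpoint_overhead(4.0, 500.0, 2.0, steps)
    print(f"checkpoint every {steps:4d} steps -> {ovh:.1%} of time in I/O")
```

Under these assumed numbers, checkpointing every 50 steps loses roughly 8% of wall-clock time to I/O, while stretching the cadence to 1000 steps costs well under 1%; asynchronous draining or faster storage tiers shrink the numerator, which is where the tiering and fabric-separation discussion comes in.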