Program

Keynote 2: Challenges and Opportunities in Memory Systems for AI Accelerators

Demand for processors with very high-bandwidth memory systems has exploded in concert with the rapid advances in deep learning and artificial intelligence. Within a decade, we can expect processors that require a memory system capable of delivering 100 terabytes per second from over 1 terabyte of capacity in less than 1 kilowatt. This simultaneous need to push the envelope for very high bandwidth, at very low per-access energy, to a large pool of data creates many challenges. This talk will detail some of these difficulties and discuss some of the approaches architects and memory designers might take to address them.
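As a rough back-of-the-envelope illustration (our arithmetic, not part of the abstract), those targets imply a total memory-system energy budget of a little over a picojoule per bit transferred:

\[
\frac{1\ \text{kW}}{100\ \text{TB/s}} \;=\; \frac{10^{3}\ \text{J/s}}{8 \times 10^{14}\ \text{bits/s}} \;\approx\; 1.25\ \text{pJ/bit}
\]

That budget has to cover the entire access path, including the DRAM array access, on-die data movement, and the off-chip interface, which is why delivering high bandwidth and low per-access energy at the same time is the central tension the talk addresses.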

Mike O’Connor manages the Memory Architecture Research Group at NVIDIA. His group is responsible for future DRAM and memory system architecture research. In a prior role at NVIDIA, he was the memory system architecture lead for several generations of NVIDIA GPUs. Mike’s career has also included positions at AMD, Texas Instruments, Silicon Access Networks (a network-processor startup), Sun Microsystems, and IBM. At AMD, he drove much of the architectural definition for the High Bandwidth Memory (HBM) specification. Mike has a BSEE from Rice University and an MSEE and PhD from the University of Texas at Austin.

Here you can see the final program:

Monday, September 30th

19:00  Welcome Reception and Poster Session

Tuesday, October 1st

8:00   Breakfast
8:50   Opening Remarks
9:00   Keynote 1: TBA
10:00  Break

Session 1: Processing in Memory
10:30  PIM-Potential: Broadening Acceleration Reach of PIM Architectures
10:50  Pimacolaba: Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures
11:10  PIMSys: A Virtual Prototype for Processing in Memory
11:30  Sadram Arithmetic in C++

12:00  Lunch

Session 2: Architecture
13:00  Characterization and Design of 3D-Stacked Memory for Image Signal Processing on AR/VR Devices
13:20  Data Prefetching on Processors with Heterogeneous Memory
13:40  UpDown: A Novel Architecture for Unlimited Memory Parallelism
14:00  SMS: Solving Many-sided RowHammer

14:20  Break

Session 3: Non-Volatile Memories
14:40  PROLONG: Priority based Write Bypassing Technique for Longer Lifetime in STT-RAM based LLC
15:00  CARDR: DRAM Cache Assisted Ransomware Detection and Recovery in SSDs
15:20  ZipCache: A DRAM/SSD Cache with Built-in Transparent Compression

15:40  Break

Session 4: Caches I
16:00  Measuring Data Access Latency in Large CPU Caches
16:20  Implementation of a Two-Level Programmable Cache Emulation and Test System

17:00  Spirited Discussion
19:00  TPC Dinner

Wednesday, October 2nd

8:00   Breakfast
9:00   Keynote 2: Challenges and Opportunities in Memory Systems for AI Accelerators (Michael O’Connor, NVIDIA)
10:00  Break

Session 5: CXL
10:30  Contention aware DRAM caching for CXL-enabled pooled memory
10:50  Performance Study of CXL Memory Topology
11:10  Synchronization for CXL Based Memory
11:30  Programming the Future: the Essential Role of System Topology Awareness in Heterogeneous Disaggregated Environments

12:00  Lunch

Session 6: HPC and Accelerators
13:00  Using Isoefficiency as a Metric to Assess Disaggregated Memory Systems for High Performance Computing
13:20  Studying CPU and memory utilization of applications on Fujitsu A64FX and Nvidia Grace Superchip
13:40  A comparison of modern memory management schemes in HPC
14:00  To Cache or not to Cache? Exploring the Design Space of Tunable, HLS-generated Accelerators

14:20  Break

Session 7: Applications
14:40  A Workflow for the Synthesis of Irregular Memory Access Microbenchmarks
15:00  Static Reuse Profile Estimation for Stencil Applications
15:20  Memory Efficiency Oriented Fine-Grain Representation and Optimization of FFT

15:40  Break

Session 8: Caches II
16:00  Hybrid Cache Design Under Varying Power Supply Stability – A Comparative Study
16:20  MemFriend: Understanding Memory Performance with Spatial-Temporal Affinity

17:00  Spirited Discussion
19:00  Conference Dinner

Thursday, October 3rd

8:00   Breakfast
8:50   Closing Remarks and Award Ceremony