IBM at 2025 Symposium on VLSI Technology and Circuits
- Kyoto, Japan
About
The VLSI Symposium is a premier international conference on semiconductor technology and circuits, held from June 8th to 12th, 2025 in Kyoto, Japan. It brings together technologists and circuit designers at one venue. It offers an opportunity to interact and synergize on topics spanning the range from process technology to system-on-chip.
Why attend
We look forward to meeting you at the event and telling you more about our latest work. Our team will be presenting a series of papers in the main conference, short course lectures and workshop lectures within the conference.
Agenda
- Description:
Abstract High Performance Computing (HPC) systems and AI accelerators, while having different computing characteristics and applications, share a common need for higher performance, improved power efficiency, and flexible scalability. Chiplet and heterogeneous integration are gaining attention in both areas as technologies to meet these requirements. Chiplets optimize and partition functional blocks to manufacture chips, which improves chip yield and enables modularization of designs. On the other hand, heterogeneous integration, which integrates these separate chips in a single package at high density, enables flexible functional configurations by integrating heterogeneous devices and contributes to future system scalability. This presentation will start with the background and focus on the progress of 3D integration, which is the core of heterogeneous integration, and review the process, implementation issues, and experimental results of hybrid bonding technology, which enables higher density and lower power in die-to-die interconnections in particular.
Speaker: Katsuyuki Sakuma
- Description:
Abstract To achieve system-level benefits, compute-in-memory tiles need to be integrated into heterogeneous architectures alongside general and application-specific digital compute cores, together with a high-bandwidth and reconfigurable on-chip routing fabric that can deliver the right vectors to the right locations for just-in-time DNN compute. In the first part of my talk, I will review some of IBM’s work in developing weight-stationary analog compute cores with a focus on the design choices and optimizations for high tile efficiency. I will then provide a brief introduction to heterogeneous architectures for CIM systems, followed by architectural studies of DNNs that identify the auxiliary operations that bottleneck performance. Finally, I will highlight the issue of achieving true weight-stationarity in large models such as Mixture-of-Experts (MoE) Transformer models, and the system-level benefits that such an architecture can achieve.
Speaker: Pritish Narayanan
- Description:
Abstract The data-intensive and highly parallel compute demands of AI models have driven the integration of specialized Neural Processing Units (NPUs) into System-on-Chip devices for edge AI applications. Analog In-Memory Computing (AIMC) offers a promising approach by co-locating memory and computation, enabling notable energy efficiency improvements. This talk will present an embedded NPU architecture for deep learning inference, tailored to meet the stringent energy, area, and cost constraints of edge AI. The heterogeneous architecture combines digital and analog accelerator nodes to support diverse operation types and precision requirements. AIMC tiles leveraging Phase-Change Memory (PCM) are employed for energy-efficient matrix-vector multiplications while supporting a high non-volatile on-chip weight capacity. Complementing this, a digital data path and programmable software cluster provide flexibility and enable end-to-end inference across multiple precision levels. The discussion will also address the challenge of preserving high accuracy in AIMC-based acceleration, focusing on offline training techniques and efficient mapping strategies.
Speaker: Irem Boybat
- Description:
Abstract The advent of large language models and generative AI has ushered in enormous demand for hardware accelerators to perform AI training, fine-tuning, and inference. The design of such accelerators depends on holistic optimization of technology, circuits, and systems, but also fundamentally upon the models and use cases that this hardware needs to serve. Achieving the proper balance of compute vs. communication to optimize latency and throughput in AI workloads will require tradeoffs across the hardware/software stack to reconcile the long development cycles needed to build chips and systems with the torrid pace of innovation in AI models and algorithms. This talk will provide an overview of the landscape for AI hardware accelerators and discuss research roadmaps to improve both compute efficiency and communication bandwidth, particularly as Generative AI evolves towards Agentic AI and smaller, fit-for-purpose models.
Speaker: Leland Chang, Principal Research Staff Member & Sr Manager, AI Hardware Design, IBM Research
- Description:
Abstract A 6+ GHz multi-port 10T Ground Rule Clean (GRC) compact cache is implemented in the recently announced IBM Telum II processor. It features a multi-port design (2 read and 1 write) with a fine-grained banked architecture that minimizes read and write collisions. The design is functional across various corner conditions without read and write assist circuits.
Authors: Rajiv Joshi, J. Davis, G. Fredeman, B. Yavoich, U. Srinivasan, R. Hayes, A. Pelella, Z. Chen, P. Bunce, D. Lee, I. Cervantes, G. Tverskoy, D. Leu, J. Pille, B. Huott and S. Kim
- Description:
Speaker: Dan Friedman
- Description:
Abstract NanoStack is a sequential stacking CMOS transistor architecture featuring flexible placement of top and bottom nanosheet channels, thermally stable bottom FET gate stack, thin dielectric bonding and more. We project NanoStack with 4-track base cells to deliver ~50% area scaling, ~50% iso-power performance improvement or ~70% iso-performance power reduction with respect to the 2nm node, fulfilling fundamental requirements for a competitive multi-node CMOS architecture beyond nanosheet. We demonstrate here for the first time a manufacturable sequential integration of multi-channel nanosheet-on-nanosheet NanoStack CMOS featuring ultra-scaled vertical inter-FET isolation.
Authors: Shay Reboh, C. Zhang, T. Yamashita, L. Wai Kin, R. Xie, T. Ando, U. Bajpai, J. Mazza, N. Lanzillo, E. Cho, H. Zhou, J. Strane, E. Miller, J. Satterlee, S. Fan, Y. Sulehria, M. Sankarapandian, S. Mochizuki, N. Shanker, R. Johnson, A. Chu, S. Khan, M. Malley, W. T. Tseng, R. Pujari, E. Stuckert, L. Tierney, J. Li, M. Belyansky, M. Nasseri, N. Putnam, A. Hubbard, D. Durrant, J. Fulham, D. Canaperi, S. Skordas, S.-C. Seo, O. Gluschenkov, M. Sherony, J. Wang, Y. Zhu, J. Arnold, J. Wynne, L. Meli, B. Peethala, J. Zhang, J. Tolbert, D. Dechene, G. Shahidi, D. Edelstein, R. Ramachandran, D. Guo, V. Narayanan, N. Felix, T. Standaert, H. Jagannathan, D.-K. Sohn, H. Bu and M. Khare
- Description:
Abstract In this work, we have developed a compressive SiN (c-SiN) diffusion break (DB) dielectric stressor for gate-all-around nanosheet (GAA-NS) transistors to improve the pFET device and reduce the intrinsic nFET/pFET performance offset in this technology. A significant amount of stress in short channel (SC) Si pFET devices is induced through DB gate replacement using c-SiN with customized treatment. We report an additional stress in SC NS pFET devices post channel release of ~700 MPa induced by c-SiN DB stressors, leading to a corresponding Ieff-Ioff performance benefit of 25% on pFET logic devices at scaled CPP with no degradation of the short channel effects and reliability. As expected, the improvement is greater as the devices are closer to the c-SiN DB stressor, with a strong dependence on the active length.
Authors: S. Hung, Shogo Mochizuki, A. Pal, X. He, C. Zhao, E. Bazizi, H. Zhou, J. Li, A. Tariq, A. Gasasira, V. Chen, H. Chen, A. Londono Calderon, B. Peethala, P. Anekal, N. Loubet, B. Colombeau and B. Haran
- Description:
Abstract A novel class of mechanically and electrically robust advanced low-k (ALK) dielectrics has been developed. These have far lower plasma-induced damage (PID), excellent built-in Cu oxidation and diffusion barrier performance, and fundamentally more reliable TDDB. One ALK recipe has been fully evaluated as the next generation low-k interlevel dielectric (ILD) for 2 nm and beyond Cu and post Cu dual damascene BEOL. The dense ALK (k=3.2) and lightly porous ALK (k=2.8-3.0) films have high modulus (E ~ 15-33 GPa), and from 1/2 to 1/10 the PID of typical dense SiCOH (k=2.7-3.2) or pSiCOH (k=2.4-2.55). The ALK Cu and diffusion barrier properties enable further scaling of metal barriers, to increase Cu line volumes, reducing R, RC while actually improving TDDB and EM. This is confirmed for 2 nm node Cu dual damascene and future subtractive Ru/airgap scheme.
Authors: Son Nguyen, . Huang, A. Jog, M. Shoudy, N. Lanzillo, K. Luedders, T. Cabrera, Y. Yao, D. Metzler, C. Meagher, Y. Mignot, T. Nogami, M. Silvestre, A. Simon, L. Wangoh, K. Motoyama, C. Penny, D. Edelstein, K. Choi, S. Ghosh, V. Narayanan, A. Dutta and S. Choi
- Description:
Abstract The first CMOS current reference with a measured temperature coefficient across cryogenic temperatures is reported. Implemented in a 14 nm FinFET technology, occupying 0.14 mm², and drawing 38 µA from a 1.4 V supply, the reference uses mutual compensation between a MOSFET gate-source voltage and thin-film resistance (a circuit technique that improves as cryogenic temperatures are approached) to achieve a temperature coefficient of 128 ppm/K over 5.6-100 K, as averaged over 5 dice from 3 wafers. The cryogenic supply sensitivity, at 0.06%/V, is 6x lower than the lowest reported among cryo-CMOS references, either current or voltage. Finally, cryogenic low-frequency noise is measured for the first time among cryo-CMOS references, either current or voltage.
Authors: Subhajit Ray, D. Frank, J. Bulzacchelli, B. Sadhu, K. Tien, M. Yeck, S. Lukashov, J. Timmerwilke, R. Robertazzi, D. Underwood, B. Gaucher and D. Friedman
- Description:
Abstract In this work we demonstrate cryogenic In HEMTs with highly scaled gate footprints, down to 380 × 40 nm² for a single gate finger, and investigate the impact of footprint scaling on device performance. The 80% In channel devices show f_MAX together with a noise indication factor at 4 K, which is a record-high combination of high-frequency and low-noise performance. The performance is enabled by heterostructure engineering, resulting in ultra-low SS < 10 mV/decade. These results show that cryogenic III-V HEMT technology can provide excellent performance at scaled footprints for readout in future high-density quantum systems.
Authors: Alberto Ferraris, E. Cha, A. Olziersky, M. Sousa, H.-C. Han, E. Charbon, K. Moselund, and C. Zota