SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding
The article introduces SwiftKV Attention, an efficient algorithm designed for low-latency attention inference on edge accelerators, which processes each token in a single pass and avoids resource-intensive operations. It also presents SwiftKV-MHA, an accelerator that supports high-precision attention and low-precision GEMV, enabling fast multi-head parallel decoding.
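The single-pass property described above is commonly achieved with an online (streaming) softmax, which folds the score maximum, the normalizer, and the weighted value sum into one loop over the KV cache. The sketch below illustrates that general technique for one decode-step query; it is an assumption-laden illustration, not the paper's exact SwiftKV algorithm, and all names are hypothetical.

```python
import numpy as np

def single_pass_attention(q, K, V):
    """Attention for one query token over cached keys/values in a single
    pass, using an online softmax: no full score vector is materialized
    and no separate normalization pass is needed.
    Hypothetical sketch, not the paper's SwiftKV implementation."""
    d = q.shape[0]
    scale = 1.0 / np.sqrt(d)
    m = -np.inf                                   # running max of scores
    s = 0.0                                       # running softmax denominator
    acc = np.zeros(V.shape[1], dtype=np.float64)  # running weighted value sum
    for k, v in zip(K, V):
        score = float(q @ k) * scale
        m_new = max(m, score)
        # Rescale previous partial sums when the running max increases.
        correction = np.exp(m - m_new) if m != -np.inf else 0.0
        w = np.exp(score - m_new)
        s = s * correction + w
        acc = acc * correction + w * v
        m = m_new
    return acc / s
```

A two-pass baseline would first scan all scores to compute the softmax, then scan again to accumulate values; the streaming update above matches its output while touching each cached key/value exactly once, which suits memory-constrained edge accelerators.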