Performance benchmarks

577 articles

ArXiv Hardware

SwiftKV: An Edge-Oriented Attention Algorithm and Multi-Head Accelerator for Fast, Efficient LLM Decoding

Jan 18, 2026

The article introduces SwiftKV Attention, an algorithm for low-latency attention inference on edge accelerators that processes each token in a single pass, avoiding resource-intensive operations. It also presents the SwiftKV-MHA accelerator, which supports high-precision attention and low-precision GEMV for fast multi-head parallel decoding.
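The paper's exact algorithm is not reproduced in this summary; as a rough illustration of what "single-pass per-token attention" can mean, the sketch below uses the standard online-softmax recurrence to compute one decode step over a cached key/value memory in one loop, without materializing the full score vector. The function name, shapes, and use of NumPy are illustrative assumptions, not details from the paper.

    import numpy as np

    def single_pass_attention(q, K, V):
        """One decode step of attention in a single pass over the KV cache.

        q: (d,) query vector for the current token
        K: (n, d) cached keys
        V: (n, d_v) cached values
        Returns the (d_v,) attention output using the online-softmax
        recurrence, so scores are never stored as a full vector.
        """
        scale = 1.0 / np.sqrt(q.shape[0])
        m = -np.inf                     # running maximum score (numerical stability)
        s = 0.0                         # running softmax denominator
        acc = np.zeros(V.shape[1])      # running weighted sum of values

        for k_i, v_i in zip(K, V):
            x = float(q @ k_i) * scale  # score for this cached position
            m_new = max(m, x)
            alpha = np.exp(m - m_new)   # rescale previous accumulators
            p = np.exp(x - m_new)
            s = s * alpha + p
            acc = acc * alpha + p * v_i
            m = m_new

        return acc / s

    # Example usage with hypothetical dimensions.
    rng = np.random.default_rng(0)
    d, n = 64, 128
    q = rng.standard_normal(d)
    K = rng.standard_normal((n, d))
    V = rng.standard_normal((n, d))
    out = single_pass_attention(q, K, V)   # shape (64,)

Because each cached position is visited exactly once and only three running scalars/vectors are kept, this style of decoding maps well to memory-constrained edge hardware; how SwiftKV itself organizes the computation is described in the paper.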
