Android On-device AI Memory Management: Model Loading Peaks, Tensor Lifetimes, and KV Cache Reclaim

A practical memory-management path for deploying LLMs on Android devices, covering mmap-based model loading, tensor lifetime reclamation, sliding-window KV cache, layer-wise decay, and surviving the low-memory killer (LMK).