KV Cache Articles
Android On-device AI Memory Management: Model Loading Peaks, Tensor Lifetimes, and KV Cache Reclaim
A practical approach to memory management for on-device LLM deployment on Android, covering mmap-based model loading, tensor lifetime reclamation, sliding-window KV cache, layer-wise decay, and surviving the Low Memory Killer (LMK).