CVE-2025-46570
Description
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.
Predictions
Heuristic predictions, AS-IS, for prioritization only.
Mitigations
No mitigations published for this CVE yet.
The vendor-content worker queues fetches as references arrive (check back in a few minutes). Or โ if you've already worked around this in production โ publish your fix to the community-verified tier.
โ Propose a mitigation on Community โ Mitigations published via the community go through AI scoring + 2 human reviewers + 7-day silent objection window before landing here withsource_tier=community-verified.
References
- https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r
- https://nvd.nist.gov/vuln/detail/CVE-2025-46570
- https://github.com/vllm-project/vllm/pull/17045
- https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f
- https://github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-53.yaml
- https://github.com/vllm-project/vllm
Community-verified mitigations for this CVE will appear above when contributors publish them.
Verify integrity in audit chain (admin only). AS-IS.