CVE-2026-7482 critical Patched AI Draft

Ollama GGUF Model Loader Heap Out-of-Bounds Read — Bleeding Llama (CVE-2026-7482)

CVE: CVE-2026-7482

Platform: Ollama (all versions before 0.17.1)

Type: Heap out-of-bounds read / remote memory disclosure

Severity: CRITICAL

Status: Patched

Zero-Day: No

Disclosed: May 4, 2026

Patched: May 4, 2026

Researcher: Cyera Research

CISA KEV: Not Listed

CVE-2026-7482 is a heap out-of-bounds read vulnerability (CWE-125) in Ollama’s GGUF model loader, publicly named Bleeding Llama by Cyera Research. An unauthenticated attacker can submit a specially crafted GGUF file through the /api/create API endpoint, triggering an out-of-bounds heap read during quantization, and can then potentially exfiltrate the leaked heap contents (which can include environment variables, API keys, system prompts, and in-flight conversation data) through the /api/push endpoint. Ollama 0.17.1 addresses the vulnerability. No exploitation in the wild has been confirmed: the SSVC assessment published by CISA as a CVE Program Authorized Data Publisher (ADP) rates exploitation as none, and the vulnerability is not listed in the CISA Known Exploited Vulnerabilities catalog at time of publication.

Severity Assessment

Exploitability: 8/10
Impact: 8/10
Weaponization Risk: 6/10
Patch Urgency: 8/10
Detection Coverage: 4/10

Exploitability (8/10): Exploitation requires no authentication and no user interaction; an attacker sends HTTP API requests directly to the Ollama server. Attack complexity is low. The primary access constraint is network reachability: default Ollama installations bind to 127.0.0.1 and are not accessible over the network. The score is elevated because the documented OLLAMA_HOST=0.0.0.0 configuration, common in shared-inference, containerized, and hosted deployments, exposes the endpoint on a network interface with no built-in authentication. The CVSS v3.1 score is 9.1 CRITICAL and the CVSS v4.0 score is 8.8 HIGH, per CVE.org and the National Vulnerability Database.

Impact (8/10): Successful exploitation discloses heap memory contents from the Ollama server process. Exposed data can include environment variables carrying API keys, OAuth tokens, and other secrets; system prompts from loaded models; and in-flight conversation data from concurrent user sessions. The vulnerability does not directly yield code execution, but credential exposure can enable further compromise of integrated services. CVSS v3.1 confidentiality impact is rated High; integrity and availability impacts are rated None.

Weaponization Risk (6/10): Cyera Research published a detailed technical write-up including the GGUF shape-field manipulation primitive and the three-request exfiltration sequence via /api/push, providing sufficient technical grounding for targeted exploitation. The unauthenticated HTTP vector and absence of user interaction requirements lower the operational barrier for network-exposed targets. No public proof-of-concept exploit has been confirmed at time of publication.

Patch Urgency (8/10): Ollama 0.17.1 is available and addresses the vulnerability. Organizations running Ollama in network-exposed configurations are at elevated risk of credential and conversation data disclosure. The sensitivity of potentially exposed data—API keys, system prompts, and multi-user conversation content—warrants prompt upgrade. Default single-host deployments binding to 127.0.0.1 are not directly reachable from the network without separate exposure.

Detection Coverage (4/10): The out-of-bounds read occurs within normal GGUF processing code paths. Malicious /api/create requests are structurally similar to legitimate model upload operations; distinguishing them at the network layer requires inspection of GGUF tensor metadata for offset and size anomalies. No public detection signatures or behavioral rules specific to CVE-2026-7482 exploitation have been published at time of writing.

Summary

CVE-2026-7482 affects all Ollama releases before version 0.17.1. Ollama is an open-source local inference server that loads, quantizes, and serves large language models encoded in GGUF format. The GGUF format stores model tensor data alongside metadata fields that declare each tensor's name, shape, element type, and byte offset within the file; a tensor's size in bytes follows from its shape and type.

The vulnerability arises from insufficient validation of tensor offset and size values in the GGUF ingestion path. When the /api/create endpoint processes a submitted GGUF file, it passes tensor metadata to a quantization pipeline in ggml/gguf.go and server/quantization.go. If the file declares tensor offsets or sizes that exceed the actual file length, the WriteTo() operation in the quantization path reads past the end of the heap-allocated GGUF buffer—a CWE-125 out-of-bounds read. Heap contents adjacent to the GGUF buffer, which may include environment variable strings, loaded API keys, system prompt data, and in-flight conversation data from concurrent sessions, are incorporated into the quantized model artifact produced by this operation. An attacker can then push this artifact to an attacker-controlled Ollama registry using the /api/push endpoint and extract the embedded heap data from the retrieved artifact.
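
The validation described as missing reduces to an overflow-safe range comparison. The Go sketch below is illustrative only and is not the code from PR #14406; it assumes the loader already knows the file length, the absolute offset at which the tensor-data section begins, and each tensor's relative offset and computed byte size. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// tensorWithinFile reports whether the byte range a tensor declares,
// [dataStart+offset, dataStart+offset+size), lies entirely inside a file of
// fileLen bytes. Each addition is checked for uint64 wraparound so that huge
// declared values cannot wrap around to a small, in-range number.
func tensorWithinFile(fileLen, dataStart, offset, size uint64) bool {
	start, carry := bits.Add64(dataStart, offset, 0)
	if carry != 0 {
		return false
	}
	end, carry := bits.Add64(start, size, 0)
	if carry != 0 {
		return false
	}
	return end <= fileLen
}

func main() {
	const fileLen = 4096
	fmt.Println(tensorWithinFile(fileLen, 1024, 0, 2048))           // fits inside the file
	fmt.Println(tensorWithinFile(fileLen, 1024, 2048, 2048))        // ends 1024 bytes past EOF
	fmt.Println(tensorWithinFile(fileLen, 1024, 0, ^uint64(0)-512)) // would wrap without the carry check
}
```

Checking the additions for wraparound matters because a comparison such as `offset+size <= fileLen` on unchecked uint64 values can be defeated by values large enough to wrap back into range.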

The /api/create endpoint accepts requests without authentication. Default Ollama installations bind to 127.0.0.1, restricting access to the local host. Network exposure arises when operators configure OLLAMA_HOST=0.0.0.0 or bind the server to another network interface, a documented and commonly used configuration for shared or hosted inference deployments.
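
Operators auditing many hosts can triage OLLAMA_HOST values with a helper along these lines. It is a hypothetical sketch, not Ollama's own parsing logic: it assumes an unset variable means the default 127.0.0.1 bind, accepts an optional http:// or https:// prefix, and conservatively treats a bare `:port` value and any hostname other than localhost as exposed.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// exposedBeyondLoopback estimates whether an OLLAMA_HOST value makes the API
// listen on a non-loopback interface. Hypothetical audit helper.
func exposedBeyondLoopback(ollamaHost string) bool {
	v := strings.TrimSpace(ollamaHost)
	if v == "" {
		return false // unset: default bind is 127.0.0.1
	}
	if _, rest, ok := strings.Cut(v, "://"); ok {
		v = rest // drop an optional scheme
	}
	host := v
	if h, _, err := net.SplitHostPort(v); err == nil {
		host = h // value carried an explicit port
	}
	host = strings.Trim(host, "[]")
	if host == "" {
		return true // ":port" form; assume every interface
	}
	if strings.EqualFold(host, "localhost") {
		return false
	}
	if ip := net.ParseIP(host); ip != nil {
		return !ip.IsLoopback() // 0.0.0.0, ::, and LAN addresses count as exposed
	}
	return true // other hostnames may resolve to a routable address
}

func main() {
	for _, v := range []string{"", "127.0.0.1:11434", "0.0.0.0", "http://0.0.0.0:11434", "[::1]:11434", ":11434"} {
		fmt.Printf("%-24q exposed=%v\n", v, exposedBeyondLoopback(v))
	}
}
```

Flagged hosts are the ones where the unauthenticated /api/create and /api/push endpoints may be reachable by other machines.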

Affected versions:

Ollama: all releases before 0.17.1

Fixed version:

Ollama 0.17.1 (fix commit 88d57d0483cca907e0b23a968c83627a20b21047, PR #14406)

Exploitation status: No exploitation in the wild has been confirmed. The SSVC assessment published by the CISA Authorized Data Publisher (ADP) rates exploitation as none. The vulnerability is not included in the CISA Known Exploited Vulnerabilities catalog at time of publication.

Exploit Chain

Stage 1: Crafted GGUF File Construction

An attacker constructs a GGUF model file with deliberately malformed tensor metadata. The GGUF format stores each tensor's shape (a dimension count followed by the dimension values), element type, and data offset as integer fields in the file header, and the tensor's byte size follows from its shape and type; the attacker sets one or more of these fields so that the declared tensor extent exceeds the file’s actual byte length. In affected Ollama versions, the GGUF loader does not validate that declared tensor extents fall within the file boundary before initiating quantization. Per Cyera Research, placing an abnormally large value in the shape field causes the tensor conversion loop to iterate past the end of the allocated buffer, reading heap memory beyond the GGUF data region.
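
The effect of an abnormally large shape value shows up in the arithmetic that turns dimensions into a byte count. This hypothetical Go helper covers only the unquantized F32 and F16 element types (block-quantized ggml types size their data per block, not per element) and compares the size against the upload length alone; a real loader must also add the tensor's data offset before comparing.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// bytesPerElement covers two unquantized ggml types for illustration only.
var bytesPerElement = map[string]uint64{"F32": 4, "F16": 2}

// declaredTensorBytes multiplies a tensor's declared dimensions by its element
// size, failing if any step overflows uint64. Hypothetical helper.
func declaredTensorBytes(shape []uint64, typ string) (uint64, error) {
	size, ok := bytesPerElement[typ]
	if !ok {
		return 0, fmt.Errorf("unsupported type %q", typ)
	}
	for _, d := range shape {
		hi, lo := bits.Mul64(size, d)
		if hi != 0 {
			return 0, errors.New("declared tensor size overflows uint64")
		}
		size = lo
	}
	return size, nil
}

func main() {
	const fileLen = 1 << 20 // a 1 MiB upload
	shapes := [][]uint64{{512, 512}, {512, 1 << 40}, {1 << 40, 1 << 40}}
	for _, shape := range shapes {
		n, err := declaredTensorBytes(shape, "F16")
		fmt.Println(shape, n, err, "fits:", err == nil && n <= fileLen)
	}
}
```

Rejecting multiplication overflow separately keeps a shape whose true product exceeds 2^64 from wrapping to a small, plausible-looking size.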

Stage 2: Out-of-Bounds Read via `/api/create`

The attacker submits the crafted GGUF file to the Ollama server’s /api/create endpoint. This endpoint accepts a model file or blob reference and initiates quantization. During quantization, the WriteTo() function in ggml/gguf.go and server/quantization.go processes tensor data according to the header-declared metadata. Because the declared offset and size values exceed the actual file length, the operation reads past the end of the heap-allocated GGUF buffer. Heap contents adjacent to this buffer—which may include environment variable strings, API keys accessible to the Ollama process, system prompts from loaded model configurations, and conversation data from concurrently active user sessions—are incorporated into the quantized model artifact. Cyera Research notes that the F16-to-F32 quantization conversion used in this path preserves byte values rather than discarding them, so data read from adjacent heap memory is carried through the conversion into the output artifact intact.

```text
Attacker → POST /api/create (crafted GGUF with oversized tensor offsets)
                        |
           GGUF tensor metadata parsed
           Declared offset/size > file length
                        |
           quantization WriteTo() reads past
           heap buffer boundary (CWE-125)
                        |
           Adjacent heap contents embedded
           in quantized model artifact
           (env vars, API keys, prompts,
           conversation data)
```
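
Why widening does not scrub leaked bytes can be shown independently of Ollama: IEEE 754 binary16 to binary32 conversion is exact, so every 16-bit pattern (including NaN payloads) maps to a distinct float32 that can be narrowed back to the original bits. The sketch below is a standalone bit-level converter, not Ollama or ggml code; it assumes leaked bytes are reinterpreted as little-endian F16 values, and the helper names and sample secret are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// widenF16 converts IEEE 754 binary16 bits to the binary32 bits of the same value.
func widenF16(h uint16) uint32 {
	sign := uint32(h>>15) << 31
	exp := uint32(h>>10) & 0x1f
	mant := uint32(h) & 0x3ff
	switch {
	case exp == 0 && mant == 0: // signed zero
		return sign
	case exp == 0: // subnormal: shift until the implicit leading bit appears
		e := uint32(127 - 15 + 1)
		for mant&0x400 == 0 {
			mant <<= 1
			e--
		}
		return sign | e<<23 | (mant&0x3ff)<<13
	case exp == 0x1f: // infinity or NaN: keep the payload bits
		return sign | 0xff<<23 | mant<<13
	default: // normal: rebias the exponent from 15 to 127
		return sign | (exp+127-15)<<23 | mant<<13
	}
}

// narrowF32 inverts widenF16; it is exact only for values produced by widenF16.
func narrowF32(f uint32) uint16 {
	sign := uint16(f>>16) & 0x8000
	exp := int32(f>>23) & 0xff
	mant := f & 0x7fffff
	switch {
	case exp == 0xff:
		return sign | 0x7c00 | uint16(mant>>13)
	case exp == 0:
		return sign
	case exp-127+15 >= 1:
		return sign | uint16(exp-127+15)<<10 | uint16(mant>>13)
	default: // originally a binary16 subnormal
		shift := uint32(1 - (exp - 127 + 15))
		return sign | uint16((mant|0x800000)>>(13+shift))
	}
}

// widenBytes reinterprets leaked bytes as little-endian F16 values and widens them.
func widenBytes(leaked []byte) []uint32 {
	out := make([]uint32, len(leaked)/2)
	for i := range out {
		out[i] = widenF16(binary.LittleEndian.Uint16(leaked[2*i:]))
	}
	return out
}

// recoverBytes narrows F32 bit patterns back into the original byte stream.
func recoverBytes(widened []uint32) []byte {
	out := make([]byte, 2*len(widened))
	for i, w := range widened {
		binary.LittleEndian.PutUint16(out[2*i:], narrowF32(w))
	}
	return out
}

func main() {
	// Exhaustively confirm that all 65,536 binary16 bit patterns round-trip.
	for h := 0; h <= 0xffff; h++ {
		if narrowF32(widenF16(uint16(h))) != uint16(h) {
			panic(fmt.Sprintf("round-trip failed for %#04x", h))
		}
	}
	secret := []byte("API_KEY=not-a-real-key") // hypothetical heap contents
	fmt.Printf("%s\n", recoverBytes(widenBytes(secret)))
}
```

Because the mapping is one-to-one, converting the data to a wider float type changes its encoding but not its information content.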

Stage 3: Exfiltration via `/api/push`

The attacker directs the Ollama server to push the quantized artifact containing embedded heap data to an attacker-controlled model registry using the /api/push endpoint. The attacker retrieves the artifact from that registry and extracts the heap contents from it. The full sequence (craft the file, create the model, push the artifact) requires no authentication on network-exposed deployments and no interaction from legitimate users of the server.

Detection Guidance

| Signal | Indicator | Confidence |
|---|---|---|
| `/api/push` to unfamiliar or external registry hostnames | Ollama API log entries for `/api/push` targeting registry hostnames outside the operator’s known allowlist | HIGH |
| GGUF files with anomalous tensor metadata on `/api/create` | Inbound GGUF uploads with declared tensor offsets or sizes that exceed file length, or shape fields with abnormally large integer values | HIGH |
| Rapid sequential `/api/create` followed by `/api/push` | Short-interval pairing of a model creation request with an immediate push to an external registry, particularly from an unfamiliar source IP | MEDIUM |
| Outbound connections from Ollama process to novel hosts | Network sessions initiated by the Ollama server process to external IP addresses not associated with known model registries or update endpoints | MEDIUM |
| Quantization errors or unexpected heap growth on `/api/create` | Error output from the quantization pipeline or anomalous memory consumption during model creation, which may indicate boundary-probing attempts | LOW |

Enable access logging for the Ollama API, particularly the /api/create and /api/push endpoints. Review push destinations against a registry allowlist. For network-exposed deployments, placing the Ollama API behind an authenticated reverse proxy eliminates the unauthenticated access condition that makes remote exploitation directly reachable. Upgrade to Ollama 0.17.1 or later as the primary remediation.
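
Reviewing push destinations against an allowlist requires pulling the registry host out of each pushed model name. This sketch assumes the model name from each /api/push request is available (for example, from a logging reverse proxy) and that names follow Ollama's host/namespace/model:tag layout, where a name with fewer than three slash-separated parts resolves to the default registry, assumed here to be registry.ollama.ai. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultRegistry is assumed to be the host used for names without an explicit host.
const defaultRegistry = "registry.ollama.ai"

// pushRegistryHost extracts the registry host (including any port) from a model name.
func pushRegistryHost(model string) string {
	name := model
	if i := strings.LastIndex(name, ":"); i > strings.LastIndex(name, "/") {
		name = name[:i] // drop the tag
	}
	if _, rest, ok := strings.Cut(name, "://"); ok {
		name = rest // drop an explicit scheme such as https://
	}
	parts := strings.Split(name, "/")
	if len(parts) < 3 {
		return defaultRegistry
	}
	// Everything before namespace/model is treated as the host, so unusual
	// multi-segment prefixes fail the allowlist instead of slipping through.
	return strings.ToLower(strings.Join(parts[:len(parts)-2], "/"))
}

// allowedPush reports whether a push target's registry is on the allowlist.
func allowedPush(model string, allow map[string]bool) bool {
	return allow[pushRegistryHost(model)]
}

func main() {
	allow := map[string]bool{"registry.ollama.ai": true, "models.internal.example": true}
	for _, m := range []string{
		"library/llama3:latest",
		"models.internal.example/team/chat:v2",
		"attacker.example.net/x/leak:latest",
	} {
		fmt.Printf("%-40s host=%s allowed=%v\n", m, pushRegistryHost(m), allowedPush(m, allow))
	}
}
```

Allowlist entries for registries on non-default ports must include the port, because the helper keeps it as part of the host.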

Indicators of Compromise

No confirmed exploitation indicators—malware families, attacker infrastructure, or observed attack campaigns—have been published for CVE-2026-7482 at time of publication. The following behavioral indicators are consistent with exploitation or exploitation-attempt activity:

Unexpected model push operations — /api/push requests targeting unfamiliar or external registry hostnames, particularly following a recent /api/create request with an external or unknown GGUF source.
GGUF uploads with out-of-bounds tensor metadata — incoming model files presenting declared tensor offsets or sizes that exceed the file’s actual byte length; this structural condition is the prerequisite for triggering the out-of-bounds read.
Outbound Ollama process connections to novel hosts — network sessions from the Ollama server process to external addresses not normally associated with model downloads or registry operations.
Quantization pipeline errors or anomalous memory events — unexpected error output, heap-growth anomalies, or service instability during /api/create processing that does not correspond to legitimate model loading activity.

Recommended mitigations (in priority order):

Upgrade to Ollama 0.17.1 or later — Fix commit 88d57d0 (PR #14406) adds bounds checking to the GGUF loader that rejects tensor offset and size declarations exceeding the actual file length before quantization begins.
Restrict network access to the Ollama API — If OLLAMA_HOST=0.0.0.0 or another network-binding configuration is required, place the Ollama API behind an authenticated reverse proxy and restrict access using firewall rules to trusted source IP ranges. The Ollama API provides no built-in authentication mechanism.
Audit and allowlist push registry destinations — Review and restrict the registries to which your Ollama deployment is permitted to push; monitor push activity for destinations outside the approved set.
Rotate credentials accessible to the Ollama process — API keys and tokens present in the environment of a network-exposed, unpatched Ollama instance may have been exposed if the service processed attacker-supplied GGUF files. Rotate affected credentials after upgrading.
Enable API access logging — Enable and retain Ollama access logs for /api/create and /api/push operations; review for requests involving externally sourced GGUF files or pushes to unfamiliar registries.

Disclosure Timeline

2026-02-02: Private report to the Ollama project. Cyera Research reports the heap out-of-bounds read vulnerability to the Ollama project through coordinated disclosure channels.
2026-04-28: CVE assignment. CVE-2026-7482 is assigned by Echo, a third-party CVE Numbering Authority, in coordination with the disclosure process.
2026-05-04: CVE publication and patch release. CVE-2026-7482 is published to the CVE Program and the National Vulnerability Database with CVSS v3.1 score 9.1 CRITICAL and CVSS v4.0 score 8.8 HIGH. Ollama 0.17.1, incorporating fix commit 88d57d0 from PR #14406, is released concurrently. Cyera Research publishes the Bleeding Llama technical analysis under coordinated disclosure.

Sources & References

CVE-2026-7482 record. CVE Program, 2026-05-04.
CVE-2026-7482. National Vulnerability Database, 2026-05-04.
GHSA-x8qc-fggm-mpqg. GitHub Advisory Database, 2026-05-04.
Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama. Cyera, 2026-05-04.
v0.17.1 release notes. Ollama Project, 2026-05-04.
PR #14406: Validate GGUF tensor bounds before quantization. Ollama Project, 2026-05-04.