CVE-2025-23247
An integer overflow vulnerability exists in the ELF Section Parsing functionality of NVIDIA cuobjdump 12.8.55. A specially crafted fatbin file can lead to an out-of-bounds write. An attacker can provide a malicious file to trigger this vulnerability.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
NVIDIA cuobjdump 12.8.55
cuobjdump - https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#cuobjdump
7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE-190 - Integer Overflow or Wraparound
cuobjdump is a command-line utility included in the CUDA Toolkit provided by NVIDIA. Similar to the standard objdump utility, it parses CUDA executable files and displays information like PTX disassembly, section headers, relocations etc. cuobjdump takes fatbin files as input. A CUDA binary file (cubin) is a custom ELF-like binary format that contains code compiled by the NVIDIA CUDA compiler (nvcc) for a specific CUDA architecture. fatbin files can contain multiple cubin files for compatibility with different CUDA devices.
These ELF-like cubin files may contain custom sections like .nv_info, .nv_constant, .nv_debug_info etc. that provide extra metadata needed for the CUDA runtime. In order to see all the information in the headers of the embedded ELF files, the --dump-elf flag can be passed to cuobjdump.
$ cuobjdump --dump-elf poc.elf
Fatbin elf code:
================
arch = sm_309
code version = [256,280]
host = linux
compile_size = 64bit
32bit elf: type=71, abi=0, sm=102, toolkit=2304, flags = 0x80066
Sections:
Index Offset Size ES Align Type Flags Link Info Name
1 0 0 190100 0 UNKNOWN 600 1 0 .nv_debug_source
2 440 e4 0 0 STRTAB 0 0 0 ource
3 0 32 ab 0 UNKNOWN 0 1 119 .nv_debug_source
4 0 0 2b2 400 UNKNOWN 0 1 0 .nv_debug_source
5 10 0 1 0 UNKNOWN 0 b9000100 0 .nv_debug_source
6 0 0 0 b9000100 UNKNOWN 0 0 0 .nv_debug_source
7 0 0 1010101 1010000 UNKNOWN 0 0 0 .nv_debug_source
cannot get section: .strtab
cannot get section: .symtab
.section .nv_debug_source
Version: 256
Section size: 0
Number of sources: 4
File name:
Extended information size: 1
File format: text
File contents:
.nv_debug_source is a custom section of a cubin ELF file and stores information regarding the source files used for compilation. After locating the standard ELF magic number and extracting the ELF file from the fatbin, we can analyze it with the readelf utility:
$ readelf --sections poc.elf
There are 8 section headers, starting at offset 0x3f:
readelf: Warning: Section 5 has an out of range sh_link value of 3103785216
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] .nv_debug_source NULL 00000000 0000f6 0003e8 00 0 0 0
Here we see that the offset of the .nv_debug_source section is 0xf6 from the start of the ELF file embedded in the fatbin. After reverse-engineering cuobjdump, we can deduce a basic structure for the section:
struct nv_debug_source_section {
    uint16_t version;
    uint64_t size;
    uint16_t nsources;
    uint8_t unknown[20];
    struct source_entry1 {
        int16_t filename_size;     /* signed 16-bit length of the file name */
        uint64_t filename_offset;  /* offset where the file name resides */
        uint8_t unknown[38];
    };
    ...
};
In summary, a .nv_debug_source section has an entry for every embedded source file that holds the file name size and the offset where the file name resides.
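The deduced layout can also be written as a compilable C declaration. The sketch below assumes the fields are tightly packed with no padding, an assumption drawn from the disassembly that follows, where the file name size and offset are read 2 bytes apart (rbx-0x30 and rbx-0x2e). The type names nv_debug_source_header and nv_debug_source_entry are hypothetical, and the unknown fields remain unknown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type names; the packing is an assumption inferred from the
 * instruction offsets, not something confirmed by NVIDIA documentation. */
#pragma pack(push, 1)
struct nv_debug_source_entry {
    int16_t  filename_size;    /* signed 16-bit length of the file name */
    uint64_t filename_offset;  /* offset where the file name resides */
    uint8_t  unknown[38];
};

struct nv_debug_source_header {
    uint16_t version;
    uint64_t size;
    uint16_t nsources;
    uint8_t  unknown[20];
    /* one entry per embedded source file follows the header */
};
#pragma pack(pop)

/* Compile-time checks of the assumed packed layout. */
_Static_assert(offsetof(struct nv_debug_source_entry, filename_offset) == 2,
               "offset field sits 2 bytes after the size field");
_Static_assert(sizeof(struct nv_debug_source_entry) == 0x30,
               "packed entry is 0x30 bytes");
_Static_assert(sizeof(struct nv_debug_source_header) == 32,
               "packed header is 32 bytes");
```

With this packing an entry is 0x30 bytes, which would be consistent with the size field being read at rbx-0x30 if rbx points just past the current entry, though that reading of rbx is itself an inference.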
During the parsing of a .nv_debug_source section, the code reads the size of the source file name as a 2-byte value from the input file at (1) into the ebp register, sign-extending it to a 4-byte value. At (2) an 8-byte value is copied to r15. Then at (3) the code copies the ebp value to the r13d register. At (4), ebp is incremented by 1, and it is sign-extended to an 8-byte value at (5).
00441292 movsx ebp, word [rbx-0x30] (1)
00441296 mov r15, qword [rbx-0x2e] (2)
...
004412a3 mov r13d, ebp (3)
004412a6 add ebp, 0x1 (4)
004412a9 movsxd rbp, ebp (5)
Note that if the 2-byte value taken from the input is 0xffff, it is sign-extended to 0xffffffff at (1), so an integer overflow occurs at (4), setting the value of ebp, and consequently rbp, to 0.
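The arithmetic at (1), (4), and (5) can be reproduced in C. This is a minimal sketch, not code from cuobjdump: the function name memset_length is hypothetical, and the narrowing casts assume the two's-complement conversion behavior of GCC and Clang.

```c
#include <stdint.h>

/* Hypothetical helper mirroring instructions (1), (4) and (5). */
uint64_t memset_length(uint16_t raw_size)
{
    /* (1) movsx ebp, word [...]: sign-extend 16 -> 32 bits */
    uint32_t ebp = (uint32_t)(int32_t)(int16_t)raw_size;

    /* (4) add ebp, 0x1: 32-bit addition that wraps around */
    ebp += 1u;

    /* (5) movsxd rbp, ebp: sign-extend 32 -> 64 bits */
    return (uint64_t)(int64_t)(int32_t)ebp;
}
```

Here memset_length(0xffff) returns 0, memset_length(0xfff0) returns 0xfffffffffffffff1, and an ordinary value such as memset_length(5) returns 6.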
The code then proceeds to call memset() at (7) in order to initialize a buffer with 0. For the size parameter in the rdx register at (6), the code uses the rbp value calculated earlier.
004412bc mov rdx, rbp (6)
004412bf xor esi, esi {0x0}
004412c1 mov rdi, r12
004412c4 call memset (7)
Note that when rbp is 0, memset() simply returns early without writing any memory. For a different input value, say 0xfff0, rbp would instead be 0xfffffffffffffff1 and execution would end in a segmentation fault, since memset() would eventually write to unmapped memory. However, due to the integer overflow at (4) and the input value of 0xffff, rbp is 0, so execution continues.
The code then proceeds to copy data to a buffer using memcpy() at (10). For the source argument in rsi at (8), the code uses r15 as an offset, which we saw being read from the input file at (2) earlier. At (9), the code sign-extends the 2-byte value in r13w to 8 bytes to get the size parameter for the memcpy() in rdx.
004412c9 lea rsi, [r14+r15] (8)
004412cd movsx rdx, r13w (9)
004412d1 mov rdi, r12
004412d4 call memcpy (10)
Since r13d was assigned at (3), it holds the value of ebp from before the integer overflow. As a result r13w is 0xffff and the sign-extended size in rdx is effectively 0xffffffffffffffff. Calling memcpy() with such a huge size would normally crash the application with a segmentation fault when it inevitably tries to access unmapped memory.
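The size computed at (9) can be modeled in the same way. The sketch below only reproduces the register arithmetic of the movsx instruction; the function name memcpy_length is hypothetical, and the cast from uint16_t to int16_t assumes the two's-complement conversion behavior of GCC and Clang.

```c
#include <stdint.h>

/* Hypothetical helper mirroring (9): movsx rdx, r13w.
 * The 16-bit size saved before the increment is sign-extended to 64 bits. */
uint64_t memcpy_length(uint16_t raw_size)
{
    return (uint64_t)(int64_t)(int16_t)raw_size;
}
```

For the value 0xffff this returns 0xffffffffffffffff, while a small value such as 0x0010 stays 16. Any 16-bit value with the top bit set turns into a very large unsigned length.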
During testing, however, we noticed that the application did not crash and in fact continued execution after the memcpy() call. Deeper analysis showed that, since the binary was compiled with optimizations enabled, the default glibc uses a highly optimized implementation of memmove() instead of memcpy(), which allows a size like 0xffffffffffffffff to be used without touching unmapped memory.
In essence, the highly optimized memmove() implementation performs the copy backwards, from end to start, copying in chunks of 16 bytes in a loop. By adding 0xffffffffffffffff to the destination pointer to calculate the end, effectively adding -1, the code performs a buffer underwrite before the destination pointer. Then, at the end of the loop, since the current pointer is before the destination pointer, the code exits having copied 64 bytes before the intended buffer.
For more information, one can consult the source code of glibc [1] and the relevant research on a similar exploitation primitive in [2].
To reiterate, we can now write 64 bytes before the destination buffer. As for the source buffer, we saw that at (8) rsi is calculated with an offset provided by the input file at (2), which lets an attacker copy from any relevant offset in memory. Since r14 points to input data from the file, we can use a small offset in r15 to easily control the contents of the buffer underwrite and perform memory corruption with controlled data.
Since the program has a structure that contains pointers before the destination buffer, after the memcpy() call we can force the program to use an arbitrary pointer provided by the input and eventually achieve remote code execution.
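A parser can reject this class of input by validating the length and offset before any size arithmetic or copy takes place. The following is a hedged sketch of such a check, not NVIDIA's actual patch; the function name entry_is_valid and the section_size parameter are assumptions introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: the file name must lie entirely inside the section. */
bool entry_is_valid(int16_t filename_size,
                    uint64_t filename_offset,
                    uint64_t section_size)
{
    /* A negative length covers raw values such as 0xffff and 0xfff0. */
    if (filename_size < 0)
        return false;

    /* The offset must point inside the section. */
    if (filename_offset > section_size)
        return false;

    /* The subtraction cannot wrap because offset <= section_size. */
    return (uint64_t)filename_size <= section_size - filename_offset;
}
```

With a check of this kind, an entry whose size field is 0xffff (interpreted as -1) is rejected before the memset() and memcpy() lengths are derived from it.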
1: https://sources.debian.org/src/glibc/2.28-10/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S/#L225
2: https://www.fort-knox.org/files/thesis.pdf#page=73
$ gdb --args /usr/local/cuda/bin/cuobjdump --dump-elf ./poc
...
Program received signal SIGSEGV, Segmentation fault.
0x000000000041d8a7 in ?? ()
> x/i $pc
0x41d8a7 sub QWORD PTR [rax+0x8], rbp
> i r $rax
$rax : 0x41414141414141
Vendor advisory: https://nvidia.custhelp.com/app/answers/detail/a_id/5643
2025-02-13 - Vendor Disclosure
2025-05-27 - Vendor Patch Release
2025-06-02 - Public Release
Discovered by Dimitrios Tatsis of Cisco Talos.