Talos Vulnerability Report

TALOS-2025-2151

NVIDIA cuobjdump ELF Section Parsing Integer Overflow Vulnerability

June 2, 2025

CVE Number

CVE-2025-23247

SUMMARY

An integer overflow vulnerability exists in the ELF Section Parsing functionality of NVIDIA cuobjdump 12.8.55. A specially crafted fatbin file can lead to an out-of-bounds write. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

NVIDIA cuobjdump 12.8.55

PRODUCT URLS

cuobjdump - https://docs.nvidia.com/cuda/cuda-binary-utilities/index.html#cuobjdump

CVSSv3 SCORE

7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CWE

CWE-190 - Integer Overflow or Wraparound

DETAILS

cuobjdump is a command-line utility included in the NVIDIA CUDA Toolkit. Similar to the standard objdump utility, it parses CUDA executable files and displays information such as PTX disassembly, section headers, and relocations.

cuobjdump takes fatbin files as input. A CUDA binary file (cubin) is a custom ELF-like binary format that contains code compiled by the NVIDIA CUDA compiler (nvcc) for a specific CUDA architecture. A fatbin file can contain multiple cubin files for compatibility with different CUDA devices.

These ELF-like cubin files may contain custom sections like .nv_info, .nv_constant, .nv_debug_info etc. that provide extra metadata needed for the CUDA runtime. In order to see all the information in the headers of the embedded ELF files, the --dump-elf flag can be passed to cuobjdump.

    $ cuobjdump --dump-elf poc.elf

    Fatbin elf code:
    ================
    arch = sm_309
    code version = [256,280]
    host = linux
    compile_size = 64bit

    32bit elf: type=71, abi=0, sm=102, toolkit=2304, flags = 0x80066
    Sections:
    Index Offset   Size ES Align        Type        Flags Link     Info Name
        1      0      0 190100  0           UNKNOWN     600    1        0 .nv_debug_source
        2    440     e4  0  0            STRTAB       0    0        0 ource
        3      0     32 ab  0           UNKNOWN       0    1      119 .nv_debug_source
        4      0      0 2b2 400           UNKNOWN       0    1        0 .nv_debug_source
        5     10      0  1  0           UNKNOWN       0 b9000100        0 .nv_debug_source
        6      0      0  0 b9000100           UNKNOWN       0    0        0 .nv_debug_source
        7      0      0 1010101 1010000           UNKNOWN       0    0        0 .nv_debug_source
    cannot get section: .strtab
    cannot get section: .symtab

    .section .nv_debug_source
      Version:                     256
      Section size:                0
      Number of sources:           4
      File name:
      Extended information size:   1
      File format:                 text
      File contents:

.nv_debug_source is a custom section of a cubin ELF file that stores information about the source files used for compilation. After locating the standard ELF magic number in the fatbin and extracting the embedded ELF file, we can analyze it with the readelf utility:

    $ readelf --sections poc.elf
    There are 8 section headers, starting at offset 0x3f:
    readelf: Warning: Section 5 has an out of range sh_link value of 3103785216

    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0] .nv_debug_source  NULL            00000000 0000f6 0003e8 00      0   0  0

Here we see that the .nv_debug_source section starts at offset 0xf6 from the beginning of the ELF file embedded in the fatbin. After reverse-engineering cuobjdump, we can deduce a basic structure for the section:

    struct nv_debug_source_section {
        uint16_t version;
        uint64_t size;
        uint16_t nsources;
        uint8_t unknown[20];

        struct source_entry1 {
            int16_t filename_size;
            uint64_t filename_offset;
            uint8_t unknown[38];
        };
        ...
    };

In summary, a .nv_debug_source section has an entry for every embedded source file, and each entry holds the size of the file name and the offset where the file name resides.
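To make the layout concrete, the following is a minimal decoding sketch for one source entry. It assumes the structure is packed without padding and stored little-endian, which matches the disassembly in this report (a 2-byte read at rbx-0x30 followed by an 8-byte read at rbx-0x2e). The names parsed_entry, parse_source_entry and ENTRY_SIZE are hypothetical and do not come from cuobjdump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed packed entry size: 2 + 8 + 38 bytes, no padding. */
#define ENTRY_SIZE 48u

/* Hypothetical decoded view of the two fields of interest. */
struct parsed_entry {
    int16_t  filename_size;   /* signed: the parser loads it with movsx */
    uint64_t filename_offset; /* taken directly from the input file */
};

/* Decode one entry from raw bytes (little-endian assumed).
 * Returns 0 on success, -1 if fewer than ENTRY_SIZE bytes are available. */
int parse_source_entry(const uint8_t *buf, size_t len, struct parsed_entry *out)
{
    assert(buf != NULL && out != NULL);
    if (len < ENTRY_SIZE)
        return -1;

    uint16_t raw_size = (uint16_t)(buf[0] | (buf[1] << 8));
    out->filename_size = (int16_t)raw_size;   /* 0xffff becomes -1 */

    uint64_t off = 0;
    for (int i = 0; i < 8; i++)
        off |= (uint64_t)buf[2 + i] << (8 * i);
    out->filename_offset = off;
    return 0;
}
```

With the bytes ff ff at the start of an entry, filename_size decodes to -1, which is the value that matters for the rest of the analysis.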

During the parsing of a .nv_debug_source section, the code reads the size of the source file name as a 2-byte value from the input file at (1) to the ebp register, doing a sign extension to a 4-byte value. At (2) an 8-byte value is copied to r15. Then at (3) the code copies the ebp value to the r13d register. At (4), ebp is incremented by 1, and promoted to an 8-byte value at (5).

    00441292  movsx   ebp, word [rbx-0x30]          (1)
    00441296  mov     r15, qword [rbx-0x2e]         (2)
    ...
    004412a3  mov     r13d, ebp                     (3)
    004412a6  add     ebp, 0x1                      (4)
    004412a9  movsxd  rbp, ebp                      (5)

Note that if the 2-byte value taken from the input is 0xffff, it is sign-extended to 0xffffffff at (1), and the increment at (4) then wraps around, setting the value of ebp, and consequently rbp, to 0.
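The arithmetic of instructions (1), (4) and (5) can be modeled in a few lines of C. This is only a sketch of the register arithmetic, not code from cuobjdump; the function names are hypothetical, and the signed conversions rely on the two's complement behavior of GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Models (1), (4) and (5): returns the value that ends up in rbp. */
uint64_t memset_len_from_raw(uint16_t raw)
{
    int32_t ebp = (int16_t)raw;            /* (1) movsx ebp, word [...] */
    /* (4) add ebp, 0x1, done in unsigned arithmetic so wraparound is defined */
    ebp = (int32_t)((uint32_t)ebp + 1u);
    int64_t rbp = ebp;                     /* (5) movsxd rbp, ebp */
    return (uint64_t)rbp;
}

/* A few worked values. */
void memset_len_examples(void)
{
    assert(memset_len_from_raw(0x0004) == 5);                      /* benign size */
    assert(memset_len_from_raw(0xffff) == 0);                      /* wraps to 0 */
    assert(memset_len_from_raw(0xfff0) == 0xfffffffffffffff1ULL);  /* huge size */
}
```

Only the input 0xffff produces a length of 0; every other negative 16-bit value produces a length close to 2^64.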

The code then proceeds to call memset() at (7), in order to initialize a buffer with 0. For the size parameter in the rdx register at (6), the code will use the rbp calculated earlier.

    004412bc  mov     rdx, rbp            (6)
    004412bf  xor     esi, esi  {0x0}
    004412c1  mov     rdi, r12
    004412c4  call    memset              (7)

Note that when rbp is 0, memset() simply returns without writing any memory. For a different input value, say 0xfff0, rbp would be 0xfffffffffffffff1 and execution would end in a segmentation fault, since memset() would eventually write to unmapped memory. However, due to the integer overflow at (4) and the input value of 0xffff, rbp is 0, so execution continues.

The code then proceeds to copy data to a buffer using memcpy() at (10). For the source argument in rsi at (8), the code uses r15 as an offset, which we saw being read from the input file at (2) earlier. At (9), the code does an 8-byte sign extension of the value of r13w to get the size parameter for the memcpy() in rdx.

    004412c9  lea     rsi, [r14+r15]      (8)
    004412cd  movsx   rdx, r13w           (9)
    004412d1  mov     rdi, r12
    004412d4  call    memcpy              (10)

Since r13d was loaded at (3) with the value of ebp before the increment, r13w still holds 0xffff, and the sign extension at (9) sets the size in rdx to 0xffffffffffffffff. Calling memcpy() with such a huge size would normally crash the application with a segmentation fault when it inevitably tries to access unmapped memory.
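The copy length can be modeled the same way as the memset() length. The sketch below follows instructions (1), (3) and (9) and is not code from cuobjdump; the function names are hypothetical, and the signed conversions assume the two's complement behavior of GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Models (1), (3) and (9): returns the value passed to memcpy() in rdx. */
uint64_t memcpy_len_from_raw(uint16_t raw)
{
    /* (1) + (3): r13d receives the sign-extended size before the increment */
    uint32_t r13d = (uint32_t)(int32_t)(int16_t)raw;
    int16_t r13w = (int16_t)(r13d & 0xffffu);   /* low 16 bits of r13d */
    return (uint64_t)(int64_t)r13w;             /* (9) movsx rdx, r13w */
}

/* A few worked values. */
void memcpy_len_examples(void)
{
    assert(memcpy_len_from_raw(0x0004) == 4);            /* benign size */
    assert(memcpy_len_from_raw(0xffff) == UINT64_MAX);   /* 0xffffffffffffffff */
}
```

For the input 0xffff, the buffer is therefore zeroed for 0 bytes while the copy is asked to move 2^64 - 1 bytes.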

During testing, however, we noticed that the application did not crash and in fact continued execution after the memcpy() call. After deeper analysis, it was evident that, since the binary was compiled with optimizations enabled, the default glibc uses a highly optimized implementation of memmove() instead of memcpy(), which allows a size like 0xffffffffffffffff to be used without touching unmapped memory.

In essence, the highly-optimized memmove() implementation performs the copy backwards, from end to start, performing copies in chunks of 16 bytes in a loop. So by adding 0xffffffffffffffff to the destination pointer to calculate the end, effectively adding -1, the code performs a buffer underwrite before the destination pointer. Then at the end of the loop, since the current pointer is before the destination pointer, the code exits having copied 64 bytes before the intended buffer.
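The pointer wraparound described above can be illustrated with plain 64-bit integer arithmetic. The sketch does not reproduce the glibc implementation; it only shows why an end pointer computed as destination plus size lands below the destination. The function names and the example address 0x1000 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* End address of an n-byte copy to dst, using modulo-2^64 arithmetic
 * like the CPU does for pointer additions. */
uint64_t copy_end(uint64_t dst, uint64_t n)
{
    return dst + n;
}

/* True if a non-empty copy "ends" at or before its own start, which is
 * only possible when dst + n wraps around. */
int end_precedes_start(uint64_t dst, uint64_t n)
{
    return n != 0 && copy_end(dst, n) <= dst;
}

/* Worked values for a hypothetical destination at 0x1000. */
void copy_end_examples(void)
{
    assert(copy_end(0x1000, 16) == 0x1010);           /* normal copy */
    assert(copy_end(0x1000, UINT64_MAX) == 0x0fff);   /* dst - 1 */
    assert(end_precedes_start(0x1000, UINT64_MAX));
}
```

A copy routine that starts writing from the computed end and works backwards therefore begins at addresses below the destination buffer.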

For more information, one can consult the source code of glibc [1] and the relevant research for a similar exploitation primitive in [2].

To reiterate, we can now copy 64 bytes before the destination buffer. As for the source buffer, we saw that at (8), rsi is calculated with an offset provided by the input file at (2), giving an attacker enough options to copy from any relevant offset in memory. Since r14 holds input data from the file, we can put a small offset for r15 so we can control the contents of the buffer underwrite easily and perform memory corruption with controlled data.

Since the program keeps a structure that contains pointers before the destination buffer, after the memcpy() call we can force the program to use an arbitrary pointer provided by the input and eventually achieve arbitrary code execution.

References

1: https://sources.debian.org/src/glibc/2.28-10/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S/#L225
2: https://www.fort-knox.org/files/thesis.pdf#page=73

Crash Information

    $ gdb --args /usr/local/cuda/bin/cuobjdump --dump-elf ./poc
    ...
    Program received signal SIGSEGV, Segmentation fault.
    0x000000000041d8a7 in ?? ()
    > x/i $pc
    0x41d8a7                  sub    QWORD PTR [rax+0x8], rbp
    > i r $rax
    $rax   : 0x41414141414141

VENDOR RESPONSE

Vendor advisory: https://nvidia.custhelp.com/app/answers/detail/a_id/5643

TIMELINE

2025-02-13 - Vendor Disclosure
2025-05-27 - Vendor Patch Release
2025-06-02 - Public Release

Credit

Discovered by Dimitrios Tatsis of Cisco Talos.