Talos Vulnerability Report

TALOS-2024-2132

Catdoc utilities OLE Document DIFAT Parser integer underflow vulnerability

June 2, 2025

CVE Number

CVE-2024-54028

SUMMARY

An integer underflow vulnerability exists in the OLE Document DIFAT Parser functionality of catdoc 0.95. A specially crafted malformed file can lead to heap-based memory corruption. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

catdoc 0.95

PRODUCT URLS

catdoc - http://wagner.pp.ru/~vitus/software/catdoc/

CVSSv3 SCORE

8.4 - CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

CWE

CWE-191 - Integer Underflow (Wrap or Wraparound)

DETAILS

The catdoc utilities are a set of open-source command-line tools that extract plain text from Microsoft Office documents. The catdoc program handles Microsoft Word files, xls2csv handles Microsoft Excel files, and catppt handles Microsoft PowerPoint files. Several filesystem indexers, such as KDE’s Baloo, use this suite internally to extract the textual content of documents stored on a filesystem so that users can run full-text searches over them.

When processing a document, the catdoc utility first opens a handle to the file and passes it to the analyze_format function at [1]. The analyze_format function begins by reading the first 4 bytes at [2] in order to fingerprint the file format. At [3], the next 4 bytes are read so that the first 8 bytes of the file can be compared against the signature of the OLE compound document format. If the signature matches, the ole_init function is called at [4] to parse the header and allocation tables of the file. The streams in the document are then enumerated, and the "WordDocument" stream is processed at [5].

src/catdoc.c:38-185
int main(int argc, char **argv) {
    ...
            f=fopen(argv[i],"rb");
            if (!f) {
                c=1;
                perror("catdoc");
                continue;
            }
            if (input_buffer) {
                if (setvbuf(f,input_buffer,_IOFBF,FILE_BUFFER)) {
                    perror(argv[i]);
                }
            }
            c=analyze_format(f);                                                        // [1] analyze the opened file
            fclose(f);
        }
    ...
    return c;
}

src/analyze.c:26-83
int analyze_format(FILE *f) {
    unsigned char buffer[129];
    long offset=0;
    FILE *new_file, *ole_file;
    int ret_code=69;
...
    catdoc_read(buffer,4,1,f);                                                          // [2] read first 4 bytes to verify the format
    buffer[4]=0;
...
    fread(buffer+4,1,4,f);                                                              // [3] read next 4 bytes to check signature of document
    if (strncmp((char *)&buffer,ole_sign,8)==0) {
        if ((new_file=ole_init(f, buffer, 8)) != NULL) {                                // [4] open up the file to decode the OLE structured storage document
            set_ole_func();
            while((ole_file=ole_readdir(new_file)) != NULL) {
                int res=ole_open(ole_file);
                if (res >= 0) {
                    if (strcmp(((oleEntry*)ole_file)->name , "WordDocument") == 0) {
                        offset=catdoc_read(buffer, 1, 128, ole_file);
                        ret_code=parse_word_header(buffer,ole_file,-offset,offset);     // [5] process the "WordDocument" stream
                    }
                } 
                ole_close(ole_file);
            }
            set_std_func();
            ole_finish();
...
    return ret_code;
}   
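
The signature comparison at [3] can be sketched in isolation. This is a minimal illustration rather than catdoc's code: the helper name has_ole_signature is hypothetical, and the eight signature bytes are assumed from the header value 0xd0cf11e0a1b11ae1 shown in the proof-of-concept dump later in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Eight-byte OLE compound document signature (assumed from the header
 * value 0xd0cf11e0a1b11ae1 shown in the proof-of-concept dump). */
static const unsigned char OLE_SIGNATURE[8] = {
    0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1
};

/* Hypothetical helper, not part of catdoc: returns 1 if the first eight
 * bytes of buf match the OLE signature, 0 otherwise. */
int has_ole_signature(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    if (len < sizeof OLE_SIGNATURE) {
        return 0;   /* too short to hold the signature */
    }
    return memcmp(buf, OLE_SIGNATURE, sizeof OLE_SIGNATURE) == 0;
}
```

The sketch uses memcmp because the signature is binary data; the excerpt above uses strncmp, which behaves the same for this particular signature since it contains no zero bytes.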

The ole_init function is directly responsible for parsing the contents of an OLE compound document. Once the size of the file has been determined, the function reads the first sector (512 bytes) into the oleBuf array at [6]. This array holds the header of the structured storage document as defined by the file format specification. Immediately after reading the header, the function parses the header fields it needs in order to access the document's contents.

src/ole.c:54-285
FILE* ole_init(FILE *f, void *buffer, size_t bufSize)  {
    unsigned char oleBuf[BBD_BLOCK_SIZE];
    unsigned char *tmpBuf;
    FILE *newfile;
    int ret=0, i;
    long int bbdSize;
    long int sbdMaxLen, sbdCurrent, propMaxLen, propCurrent, mblock, msat_size;
    oleEntry *tEntry;
...
    fseek(newfile,0,SEEK_END);
    fileLength=ftell(newfile);
/* 	fprintf(stderr, "fileLength=%ld\n", fileLength); */
    fseek(newfile,0,SEEK_SET);
    ret=fread(oleBuf,1,BBD_BLOCK_SIZE,newfile);             // [6] read header of file
...
    return newfile;
}

At [7], the implementation reads two 16-bit integers that determine the size of a sector and of a minisector within the file. Each value is used as the exponent of a power of 2, and neither is validated before use. It is worth noting that according to Microsoft’s Compound Binary File Format specification, the sector shift should be either 0x9 or 0xC, and the minisector shift should be 0x6. Once both sector sizes have been computed, the function reads the number of sectors required for the file allocation table at [8] and uses it to allocate a buffer large enough to hold the entire table. At [9], the fields describing the master (indirect) file allocation table are read in the same way: the first 109 entries are copied out of the header, followed by the starting sector and the number of sectors. This 32-bit sector count is read directly from the file and stored in the msat_size variable.

src/ole.c:54-285
FILE* ole_init(FILE *f, void *buffer, size_t bufSize)  {
    unsigned char oleBuf[BBD_BLOCK_SIZE];
    unsigned char *tmpBuf;
    FILE *newfile;
    int ret=0, i;
    long int bbdSize;
    long int sbdMaxLen, sbdCurrent, propMaxLen, propCurrent, mblock, msat_size;
    oleEntry *tEntry;
...
    sectorSize = 1<<getshort(oleBuf,0x1e);                  // [7] read number of bits for sector
    shortSectorSize=1<<getshort(oleBuf,0x20);               // [7] read number of bits for minisector

/* Read BBD into memory */
    bbdNumBlocks = getulong(oleBuf,0x2c);                   // [8] read number of sectors for file allocation table
    bbdSize = bbdNumBlocks * sectorSize;
    if (bbdSize > fileLength) {
        /* broken file, BBD size greater than entire file*/
        return NULL;
    }

    if((BBD=malloc(bbdNumBlocks*sectorSize)) == NULL ) {    // [8] allocate space for file allocation table
        return NULL;
    }

    if((tmpBuf=malloc(MSAT_ORIG_SIZE)) == NULL ) {
        return NULL;
    }
    memcpy(tmpBuf,oleBuf+0x4c,MSAT_ORIG_SIZE);              // [9] read first 109 entries of indirect file allocation table
    mblock=getlong(oleBuf,0x44);                            // [9] read starting sector for indirect file allocation table
    msat_size=getlong(oleBuf,0x48);                         // [9] read number of sectors for indirect file allocation table
    if (msat_size * sectorSize > fileLength) {
        free(tmpBuf);
        return NULL;
    }

/* 	fprintf(stderr, "msat_size=%ld\n", msat_size); */

...
    return newfile;
}
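
The kind of validation that is absent at [7] can be sketched as follows. This is an illustrative check under stated assumptions, not the vendor's fix: the helper names read_le16 and checked_sector_size are hypothetical, the field offsets 0x1e and 0x20 come from the excerpt above, and the accepted values (sector shift 0x9 or 0xC, minisector shift 0x6) are taken from Microsoft's specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 16-bit value at offset off. */
static uint16_t read_le16(const unsigned char *buf, size_t off) {
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/* Hypothetical check, not catdoc's code: validate both shift fields of a
 * 512-byte header before using them as exponents. Returns the sector size
 * in bytes, or -1 if either field holds an unexpected value. */
long checked_sector_size(const unsigned char *header) {
    assert(header != NULL);
    uint16_t sector_shift = read_le16(header, 0x1e);
    uint16_t mini_shift = read_le16(header, 0x20);

    if (sector_shift != 0x9 && sector_shift != 0xc) {
        return -1;  /* sector size would not be 512 or 4096 bytes */
    }
    if (mini_shift != 0x6) {
        return -1;  /* minisector size would not be 64 bytes */
    }
    return 1L << sector_shift;
}
```

With a check of this shape, a header whose sector shift is 1 would be rejected before any sector size smaller than 4 could reach the later offset arithmetic.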

After reading the fields that describe the indirect FAT, the following loop is executed to read its contents. At [10], the loop uses the 32-bit integer i as a counter and continues while it is less than the number of sectors that was read earlier and stored in the msat_size variable. At [11], the counter is used to grow the allocation that stores the sectors used for the document’s file allocation table. At [12], the counter is used to calculate an offset into the allocated buffer, and a sector is read from the document to that location. The offset is calculated by subtracting 4 from sectorSize, multiplying the result by the counter, and adding MSAT_ORIG_SIZE.

src/ole.c:54-285
FILE* ole_init(FILE *f, void *buffer, size_t bufSize)  {
    unsigned char oleBuf[BBD_BLOCK_SIZE];
    unsigned char *tmpBuf;
    FILE *newfile;
    int ret=0, i;
    long int bbdSize;
    long int sbdMaxLen, sbdCurrent, propMaxLen, propCurrent, mblock, msat_size;
    oleEntry *tEntry;
...

    i=0;
    while((mblock >= 0) && (i < msat_size)) {                                           // [10] loop until msat_size
        unsigned char *newbuf;
/* 		fprintf(stderr, "i=%d mblock=%ld\n", i, mblock); */
        if ((newbuf=realloc(tmpBuf, sectorSize*(i+1)+MSAT_ORIG_SIZE)) != NULL) {        // [11] allocate memory based on sector size
            tmpBuf=newbuf;
        } else {
            perror("MSAT realloc error");
            free(tmpBuf);
            ole_finish();
            return NULL;
        }

        fseek(newfile, 512+mblock*sectorSize, SEEK_SET);                                // seek to location in file
        if(fread(tmpBuf+MSAT_ORIG_SIZE+(sectorSize-4)*i,                                // [12] integer underflow
                         1, sectorSize, newfile) != sectorSize) {
            fprintf(stderr, "Error read MSAT!\n");
            ole_finish();
            return NULL;
        }

        i++;
        mblock=getlong(tmpBuf, MSAT_ORIG_SIZE+(sectorSize-4)*i);                        // read the index of the next sector.
    }

...
    return newfile;
}

If sectorSize is less than 4, the subtraction at [12] underflows and produces a negative stride. As the counter grows, the computed offset eventually points before the start of the heap allocation, causing the fread function to store data from the file outside the bounds of the allocation. This memory corruption can allow for code execution in the context of the binary. The issue affects catdoc as well as the other utilities in the suite (xls2csv and catppt).
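
The offset arithmetic can be worked through with a small sketch. It assumes that MSAT_ORIG_SIZE is 436 (109 entries of 4 bytes each, per the comment at [9]) and that the expression is evaluated with signed long arithmetic; the function names are hypothetical and only mirror the expression at [12].

```c
#include <assert.h>

#define MSAT_ORIG_SIZE 436  /* assumed: 109 four-byte entries */

/* Mirrors the offset expression at [12] using signed arithmetic. */
long msat_offset(long sector_size, long i) {
    return MSAT_ORIG_SIZE + (sector_size - 4) * i;
}

/* Smallest counter value at which the offset points before the start of
 * the buffer. Only meaningful when sector_size is less than 4, because
 * otherwise the stride is not negative and the loop would never end. */
long first_negative_index(long sector_size) {
    assert(sector_size < 4);
    long i = 0;
    while (msat_offset(sector_size, i) >= 0) {
        i++;
    }
    return i;
}
```

For a regular 512-byte sector the stride is 508 bytes, so msat_offset(512, 1) is 944. For a sector size of 2 the stride is -2, and the offset first becomes negative at a counter value of 219, where it equals -2.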

Crash Information

=================================================================
==445143==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x51800000d47e at pc 0x00000040cad0 bp 0x7ffc717fa430 sp 0x7ffc717fa428
READ of size 1 at 0x51800000d47e thread T0
    #0 0x40cacf in getlong /.../catdoc/src/numutils.c:22
    #1 0x40e1f9 in ole_init /.../catdoc/src/ole.c:158
    #2 0x404cb0 in analyze_format /.../catdoc/src/analyze.c:58
    #3 0x401dc1 in main /.../catdoc/src/catdoc.c:180
    #4 0x7f8fda60f247 in __libc_start_call_main (/lib64/libc.so.6+0x3247) (BuildId: b6c381bfdcb5e08ea82c1c39cf16580181fb6cfc)
    #5 0x7f8fda60f30a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x330a) (BuildId: b6c381bfdcb5e08ea82c1c39cf16580181fb6cfc)
    #6 0x402534 in _start (/.../catdoc/src/catdoc+0x402534) (BuildId: 80cf459f76f8708266c3e4e2709e374feee94951)

0x51800000d47e is located 2 bytes before 874-byte region [0x51800000d480,0x51800000d7ea)
allocated by thread T0 here:
    #0 0x7f8fda8c17d8 in realloc.part.0 (/lib64/libasan.so.8+0xc17d8) (BuildId: 5294bd2731fcae07af92dfea7808576c57d53bc9)
    #1 0x40e17d in ole_init /.../catdoc/src/ole.c:140
    #2 0x404cb0 in analyze_format /.../catdoc/src/analyze.c:58
    #3 0x401dc1 in main /.../catdoc/src/catdoc.c:180
    #4 0x7f8fda60f247 in __libc_start_call_main (/lib64/libc.so.6+0x3247) (BuildId: b6c381bfdcb5e08ea82c1c39cf16580181fb6cfc)
    #5 0x7f8fda60f30a in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x330a) (BuildId: b6c381bfdcb5e08ea82c1c39cf16580181fb6cfc)
    #6 0x402534 in _start (/.../catdoc/src/catdoc+0x402534) (BuildId: 80cf459f76f8708266c3e4e2709e374feee94951)

SUMMARY: AddressSanitizer: heap-buffer-overflow /.../catdoc/src/numutils.c:22 in getlong
Shadow bytes around the buggy address:
  0x51800000d180: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x51800000d200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x51800000d280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x51800000d300: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x51800000d380: fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa
=>0x51800000d400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
  0x51800000d480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x51800000d500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x51800000d580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x51800000d600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x51800000d680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==445143==ABORTING
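
The figures in the log above can be cross-checked against the loop. The sketch below is a hedged reconstruction, assuming a sector size of 2 and an MSAT_ORIG_SIZE of 436; both function names are hypothetical.

```c
#include <assert.h>

#define MSAT_ORIG_SIZE 436  /* assumed: 109 four-byte entries */

/* The realloc at [11] requests sector_size*(i+1)+MSAT_ORIG_SIZE bytes.
 * Given the size of the region, recover the value of i+1. */
long iterations_for_region(long region_size, long sector_size) {
    assert(sector_size > 0);
    return (region_size - MSAT_ORIG_SIZE) / sector_size;
}

/* Offset passed to getlong once the counter has been incremented
 * to the value count. */
long next_read_offset(long sector_size, long count) {
    return MSAT_ORIG_SIZE + (sector_size - 4) * count;
}
```

Under these assumptions, the 874-byte region corresponds to an i+1 value of 219, and the following getlong call reads at offset -2, which agrees with the report that the faulting address is 2 bytes before the region.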

Exploit Proof of Concept

The proof-of-concept requires Python 3. To generate a document that triggers the vulnerability, pass the output filename as a parameter. Once the document has been created, it can be opened with catdoc to trigger the crash.

$ python3 poc.py3.zip /path/to/filename.doc
...
$ src/catdoc /path/to/filename.doc

The header for the document has the following format. At offset 0x1E of the file are two 16-bit integers, each holding the exponent of 2 that determines the size of a regular sector and of a minisector, respectively.

>>> store
<class storage.File> 'unnamed_7f743a085970' {unnamed=True}
[0] <instance storage.Header 'Header'> (little) 0xd0cf11e0a1b11ae1 version=3.62 clsid={00000000-0000-0000-0000-000000000000}
[1e] <instance storage.HeaderSectorShift 'SectorShift'> uSectorShift=1 (0x2) uMiniSectorShift=6 (0x40)
[22] <instance ptype.block 'reserved'> (6) "\x00\x00\x00\x00\x00\x00"
[28] <instance storage.HeaderFat 'Fat'> sectDirectory=ENDOFCHAIN(0xfffffffe) csectDirectory=0 csectFat=0 dwTransaction=0x00000000
[38] <instance storage.HeaderMiniFat 'MiniFat'> ulMiniSectorCutoff=4096 sectMiniFat=ENDOFCHAIN(0xfffffffe) csectMiniFat=2147483648
[44] <instance storage.HeaderDiFat 'DiFat'> sectDifat=0x00000003 csectDifat=2147483647
[4c] <instance storage.DIFAT 'Table'> storage.DIFAT.IndirectPointer[109] .........................................................

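Specifically, if the 16-bit integer at 0x1E is 0 or 1 (resulting in a sector size of 1 or 2), the described integer underflow will be triggered. A value of 2 yields a sector size of 4, for which the stride is zero rather than negative. The generated proof-of-concept uses a sector shift of 1, which gives a sector size of 2.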

>>> store['sectorshift']
<class storage.HeaderSectorShift> 'SectorShift'
[1e] <instance storage.USHORT 'uSectorShift'> 0x0001 (1)
[20] <instance storage.USHORT 'uMiniSectorShift'> 0x0006 (6)

TIMELINE

2025-01-07 - Initial Vendor Contact
2025-01-14 - Initial Vendor Contact
2025-01-16 - Vendor Disclosure
2025-06-02 - Public Release

Credit

Discovered by a member of Cisco Talos.