CVE-2024-52035
An integer overflow vulnerability exists in the OLE Document File Allocation Table Parser functionality of catdoc 0.95. A specially crafted malformed file can lead to heap-based memory corruption. An attacker can provide a malicious file to trigger this vulnerability.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
catdoc 0.95
catdoc - http://wagner.pp.ru/~vitus/software/catdoc/
8.4 - CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
CWE-190 - Integer Overflow or Wraparound
The catdoc command-line utility is used by several filesystem indexers in order to extract the textual content from a Microsoft Word document. The purpose of this is to enable searching through the contents of documents that are stored on a filesystem.
The catdoc utilities are a set of open-source command-line tools designed to extract plain text from various Microsoft Office documents. The catdoc program handles Microsoft Word files, xls2csv handles Microsoft Excel, and catppt handles Microsoft PowerPoint. The suite is used internally by several filesystem indexers, such as KDE’s Baloo, to provide users with full-text search over the contents of their documents.
When processing a document, the catdoc utility first opens a handle to the file and passes it to the analyze_format function at [1]. The analyze_format function begins by reading the first 4 bytes at [2] to fingerprint the file format. At [3], the next 4 bytes are read so that the first 8 bytes of the file can be compared against the signature of the OLE compound document format. Once the signature matches, the ole_init function is called at [4] to parse the header and allocation tables of the file.
src/catdoc.c:38-185
```c
int main(int argc, char **argv) {
    ...
        f=fopen(argv[i],"rb");
        if (!f) {
            c=1;
            perror("catdoc");
            continue;
        }
        if (input_buffer) {
            if (setvbuf(f,input_buffer,_IOFBF,FILE_BUFFER)) {
                perror(argv[i]);
            }
        }
        c=analyze_format(f); // [1] analyze the opened file
        fclose(f);
    }
    ...
    return c;
}
```
src/analyze.c:26-83
```c
int analyze_format(FILE *f) {
    unsigned char buffer[129];
    long offset=0;
    FILE *new_file, *ole_file;
    int ret_code=69;
    ...
    catdoc_read(buffer,4,1,f); // [2] read first 4 bytes to verify the format
    buffer[4]=0;
    ...
    fread(buffer+4,1,4,f); // [3] read next 4 bytes to check signature of document
    if (strncmp((char *)&buffer,ole_sign,8)==0) {
        if ((new_file=ole_init(f, buffer, 8)) != NULL) { // [4] open up the file to decode the OLE structured storage document
            set_ole_func();
            while((ole_file=ole_readdir(new_file)) != NULL) {
                int res=ole_open(ole_file);
                if (res >= 0) {
                    if (strcmp(((oleEntry*)ole_file)->name , "WordDocument") == 0) {
                        offset=catdoc_read(buffer, 1, 128, ole_file);
                        ret_code=parse_word_header(buffer,ole_file,-offset,offset); // [5] process the "WordDocument" stream
                    }
                }
                ole_close(ole_file);
            }
            set_std_func();
            ole_finish();
    ...
    return ret_code;
}
```
The ole_init function is directly responsible for parsing the contents of an OLE compound document. After determining the length of the file, the function reads the first sector (512 bytes) into the oleBuf array at [6]. This array holds the header of the structured storage document as defined by the file format specification. Immediately after reading the header, the function parses the fields it needs in order to access the document's contents.
src/ole.c:54-285
```c
FILE* ole_init(FILE *f, void *buffer, size_t bufSize) {
    unsigned char oleBuf[BBD_BLOCK_SIZE];
    unsigned char *tmpBuf;
    FILE *newfile;
    int ret=0, i;
    long int bbdSize;
    long int sbdMaxLen, sbdCurrent, propMaxLen, propCurrent, mblock, msat_size;
    oleEntry *tEntry;
    ...
    fseek(newfile,0,SEEK_END);
    fileLength=ftell(newfile);
    /* fprintf(stderr, "fileLength=%ld\n", fileLength); */
    fseek(newfile,0,SEEK_SET);
    ret=fread(oleBuf,1,BBD_BLOCK_SIZE,newfile); // [6] read header of file
    ...
    return newfile;
}
```
At [7], the implementation reads two 16-bit integers that determine the sector and mini sector sizes used within the file. Each value is used as an exponent of 2 (a shift count) without any bounds check. According to Microsoft’s Compound File Binary File Format specification ([MS-CFB]), the sector shift should be either 0x9 or 0xC, and the mini sector shift should be 0x6. Once both sector sizes have been read, the function reads the number of sectors used by the file allocation table at [8] and uses it to allocate a buffer for the entire table. At [9], the function copies the first 109 entries of the master file allocation table from the header, then reads the starting sector and the sector count of the remainder of that table. This 32-bit count is read directly from the file and stored in the msat_size variable.
src/ole.c:54-285
```c
FILE* ole_init(FILE *f, void *buffer, size_t bufSize) {
    unsigned char oleBuf[BBD_BLOCK_SIZE];
    unsigned char *tmpBuf;
    FILE *newfile;
    int ret=0, i;
    long int bbdSize;
    long int sbdMaxLen, sbdCurrent, propMaxLen, propCurrent, mblock, msat_size;
    oleEntry *tEntry;
    ...
    sectorSize = 1<<getshort(oleBuf,0x1e); // [7] read number of bits for sector
    shortSectorSize=1<<getshort(oleBuf,0x20); // [7] read number of bits for minisector
    /* Read BBD into memory */
    bbdNumBlocks = getulong(oleBuf,0x2c); // [8] read number of sectors for file allocation table
    bbdSize = bbdNumBlocks * sectorSize;
    if (bbdSize > fileLength) {
        /* broken file, BBD size greater than entire file*/
        return NULL;
    }
    if((BBD=malloc(bbdNumBlocks*sectorSize)) == NULL ) { // [8] allocate space for file allocation table
        return NULL;
    }
    if((tmpBuf=malloc(MSAT_ORIG_SIZE)) == NULL ) {
        return NULL;
    }
    memcpy(tmpBuf,oleBuf+0x4c,MSAT_ORIG_SIZE); // [9] read first 109 entries of indirect file allocation table
    mblock=getlong(oleBuf,0x44); // [9] read starting sector for indirect file allocation table
    msat_size=getlong(oleBuf,0x48); // [9] read number of sectors for indirect file allocation table
    if (msat_size * sectorSize > fileLength) {
        free(tmpBuf);
        return NULL;
    }
    /* fprintf(stderr, "msat_size=%ld\n", msat_size); */
    ...
    return newfile;
}
```
After reading the sector sizes from the header, ole_init uses them in the following code to read the contents of the file allocation table. At [10], the number of FAT sectors is read from the header as a 32-bit integer. This number is multiplied by the previously read sector size, and the result is used at [11] to allocate a heap buffer for the file allocation table. If the product of the sector count and the sector size does not fit in 32 bits, the multiplication wraps around and the allocation is undersized. The same wrapped product is also stored in bbdSize, so the earlier comparison against fileLength does not catch the problem. On each iteration of the loop at [12], a sector index is read from the master file allocation table and validated against the file length, and fread is then used at [13] to read a full sector into the allocated buffer.
src/ole.c:54-285
```c
FILE* ole_init(FILE *f, void *buffer, size_t bufSize) {
    unsigned char oleBuf[BBD_BLOCK_SIZE];
    unsigned char *tmpBuf;
    FILE *newfile;
    int ret=0, i;
    long int bbdSize;
    long int sbdMaxLen, sbdCurrent, propMaxLen, propCurrent, mblock, msat_size;
    oleEntry *tEntry;
    ...
    bbdNumBlocks = getulong(oleBuf,0x2c); // [10] read number of FAT sectors from header
    bbdSize = bbdNumBlocks * sectorSize;
    ...
    if((BBD=malloc(bbdNumBlocks*sectorSize)) == NULL ) { // [11] allocate memory for reading file allocation table
        return NULL;
    }
    ...
    for(i=0; i< bbdNumBlocks; i++) { // [12] read sector from indirect file allocation table
        long int bbdSector=getlong(tmpBuf,4*i);
        if (bbdSector >= fileLength/sectorSize || bbdSector < 0) {
            fprintf(stderr, "Bad BBD entry!\n");
            ole_finish();
            return NULL;
        }
        fseek(newfile, 512+bbdSector*sectorSize, SEEK_SET);
        if ( fread(BBD+i*sectorSize, 1, sectorSize, newfile) != sectorSize ) { // [13] read into underallocated memory
            fprintf(stderr, "Can't read BBD!\n");
            free(tmpBuf);
            ole_finish();
            return NULL;
        }
    }
    free(tmpBuf);
    ...
    return newfile;
}
```
Because the integer overflow leaves the destination buffer of fread undersized, the fread call can write past the end of the allocated memory. The resulting heap corruption can allow code execution in the context of the process. Only 32-bit builds of catdoc, xls2csv, and catppt are affected, since the multiplication is performed with 32-bit integers on those builds.
```
=================================================================
==445514==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xf5001071 at pc 0xf793b7ec bp 0xffcc6128 sp 0xffcc5d00
WRITE of size 65024 at 0xf5001071 thread T0
    #0 0xf793b7eb in fread.part.0 (/lib/libasan.so.8+0x4c7eb) (BuildId: 4750f09fdf739c5f6973660b631b7f5960de4d51)
    #1 0x8056d80 in ole_init /.../catdoc/src/ole.c:171
    #2 0x804d1df in analyze_format /.../catdoc/src/analyze.c:58
    #3 0x804a009 in main /.../catdoc/src/catdoc.c:180
    #4 0xf76f2d42 in __libc_start_call_main (/lib/libc.so.6+0x2d42) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
    #5 0xf76f2e07 in __libc_start_main@@GLIBC_2.34 (/lib/libc.so.6+0x2e07) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
    #6 0x804a967 in _start (/.../catdoc/src/catdoc+0x804a967) (BuildId: 4f51ecdc1415eaa1741ac429598beb3648914d48)
0xf5001071 is located 0 bytes after 1-byte region [0xf5001070,0xf5001071)
allocated by thread T0 here:
    #0 0xf79bea8f in malloc (/lib/libasan.so.8+0xcfa8f) (BuildId: 4750f09fdf739c5f6973660b631b7f5960de4d51)
    #1 0x8056afd in ole_init /.../catdoc/src/ole.c:119
    #2 0x804d1df in analyze_format /.../catdoc/src/analyze.c:58
    #3 0x804a009 in main /.../catdoc/src/catdoc.c:180
    #4 0xf76f2d42 in __libc_start_call_main (/lib/libc.so.6+0x2d42) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib/libasan.so.8+0x4c7eb) (BuildId: 4750f09fdf739c5f6973660b631b7f5960de4d51) in fread.part.0
Shadow bytes around the buggy address:
  0xf5000d80: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
  0xf5000e00: fa fa 02 fa fa fa 03 fa fa fa 02 fa fa fa 03 fa
  0xf5000e80: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
  0xf5000f00: fa fa 03 fa fa fa 03 fa fa fa 02 fa fa fa 03 fa
  0xf5000f80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0xf5001000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa[01]fa
  0xf5001080: fa fa 04 fa fa fa 03 fa fa fa 02 fa fa fa 03 fa
  0xf5001100: fa fa 04 fa fa fa 02 fa fa fa 02 fa fa fa 03 fa
  0xf5001180: fa fa 03 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
  0xf5001200: fa fa 05 fa fa fa 03 fa fa fa 02 fa fa fa 03 fa
  0xf5001280: fa fa 03 fa fa fa 03 fa fa fa 03 fa fa fa 02 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==445514==ABORTING
```
The proof-of-concept requires Python 3. To generate a document that triggers the vulnerability, pass the output filename as a parameter. The resulting document can then be opened with a 32-bit build of catdoc to trigger the crash.
```
$ python3 poc.py3.zip /path/to/filename.doc
...
$ src/catdoc /path/to/filename.doc
```
The header of the generated document has the following format. The structure at offset 0x28 describes the file allocation table.
>>> store
<class storage.File> 'unnamed_7fa99c135970' {unnamed=True,underload=True}
[0] <instance storage.Header 'Header'> (little) 0xd0cf11e0a1b11ae1 version=3.62 clsid={00000000-0000-0000-0000-000000000000}
[1e] <instance storage.HeaderSectorShift 'SectorShift'> uSectorShift=16 (0x10000) uMiniSectorShift=1 (0x2)
[22] <instance ptype.block 'reserved'> (6) "\x00\x00\x00\x00\x00\x00"
[28] <instance storage.HeaderFat 'Fat'> sectDirectory=ENDOFCHAIN(0xfffffffe) csectDirectory=0 csectFat=65536 dwTransaction=0x00000000
[38] <instance storage.HeaderMiniFat 'MiniFat'> ulMiniSectorCutoff=4096 sectMiniFat=ENDOFCHAIN(0xfffffffe) csectMiniFat=0
[44] <instance storage.HeaderDiFat 'DiFat'> sectDifat=ENDOFCHAIN(0xfffffffe) csectDifat=0
[4c] <instance storage.DIFAT 'Table'> storage.DIFAT.IndirectPointer[109] +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[200] <instance ptype.block 'padding(Table)'> ...
[200] <instance FileSectors 'Data'> {underload=True} _object_[1] "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 ...total 65024 bytes... \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
Within this structure, the 32-bit integer at offset 0x2C holds the number of sectors used by the file allocation table. The sector size is derived from the 16-bit shift at offset 0x1E (1 << uSectorShift). If the product of the sector count and the sector size does not fit in 32 bits, the vulnerability is triggered. In the generated proof-of-concept, the sector shift is 16, giving a sector size of 0x10000, and the number of FAT sectors is also 0x10000, so the product (0x100000000) wraps to 0 in 32-bit arithmetic.
>>> store['fat']
<class storage.HeaderFat> 'Fat'
[28] <instance storage.DWORD 'csectDirectory'> 0x00000000 (0)
[2c] <instance storage.DWORD 'csectFat'> 0x00010000 (65536)
[30] <instance storage.SECT(Pointer._object_, SECT._calculate_) 'sectDirectory'> ENDOFCHAIN(0xfffffffe)
[34] <instance storage.DWORD 'dwTransaction'> 0x00000000 (0)
2025-01-07 - Initial Vendor Contact
2025-01-14 - Initial Vendor Contact
2025-01-16 - Vendor Disclosure
2025-06-02 - Public Release
Discovered by a member of Cisco Talos.