Talos Vulnerability Report

TALOS-2024-2128

Catdoc xls2csv utility Shared String Table Record Parser memory corruption vulnerability

June 2, 2025
CVE Number

CVE-2024-48877

SUMMARY

A memory corruption vulnerability exists in the Shared String Table Record Parser implementation in xls2csv utility version 0.95. A specially crafted malformed file can lead to a heap buffer overflow. An attacker can provide a malicious file to trigger this vulnerability.

CONFIRMED VULNERABLE VERSIONS

The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.

xls2csv 0.95

PRODUCT URLS

xls2csv - http://wagner.pp.ru/~vitus/software/catdoc/

CVSSv3 SCORE

8.4 - CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

CWE

CWE-680 - Integer Overflow to Buffer Overflow

DETAILS

The xls2csv command-line utility is used by several filesystem indexers in order to extract the textual content from a Microsoft Excel document. The purpose of this is to enable searching through the contents of office documents that are stored on a filesystem.

The xls2csv program is an open-source application that comes with the catdoc suite of command-line utilities. These utilities are designed to extract plaintext from the contents of various Microsoft Office documents. These programs support file formats from Microsoft Word, Excel, and PowerPoint. They are used internally by several filesystem indexers such as KDE’s Baloo in order to provide full-text search of document contents to the user.

The xls2csv application is responsible for extracting plaintext from Microsoft Excel documents. At [1], the binary will open a handle to the document that is to be parsed, and then process the OLE compound document header using the call to ole_init at [2]. After processing the header, a loop will be entered to iterate through each directory entry in the compound document. If a directory entry has the name “Book” or “Workbook” (compared case-insensitively), the do_table function at [3] will be used to parse its contents.

src/xls2csv.c:40-180
char *input_buffer, *output_buffer;
int main(int argc, char *argv[])
{
...
    for (i=optind;i<argc;i++) {
       filename = argv[i];
       input=fopen(filename,"rb");                              // [1] open up handle to document
       if (!input) {
           perror(filename);
           exit(1);
       }
       if ((new_file=ole_init(input, NULL, 0)) != NULL) {       // [2] read the OLE compound document header
           set_ole_func();
           while((ole_file=ole_readdir(new_file)) != NULL) {    // [2] traverse each directory entry
               int res=ole_open(ole_file);
...
               if (res >= 0) {
                   if (strcasecmp(((oleEntry*)ole_file)->name , "Workbook") == 0
                           || strcasecmp(((oleEntry*)ole_file)->name,"Book") == 0) {
                           do_table(ole_file,filename);         // [3] process the excel "Workbook" stream
                   }
               } 
               ole_close(ole_file);
           }
           set_std_func();
...
    return 0;
}

The following function is the implementation of do_table. This function reads each individual record within the stream and processes it. At [4] and [5], the function will read the 16-bit record type and length from the stream. Once this information has been read, both fields will be passed, along with the record contents, to the process_item function at [6].

src/xlsparse.c:35-124
void do_table(FILE *input,char *filename) {    
    long rectype;
    long reclen,build_year=0,build_rel=0,offset=0;
    int eof_flag=0;
    int itemsread=1;
    date_shift=25569.0; /* Windows 1900 date system */
    CleanUpFormatIdxUsed();
...
    while(itemsread){
        unsigned char buffer[2];
...
        rectype=getshort(buffer,0);                         // [4] read 16-bit record type
        itemsread = catdoc_read(buffer, 2, 1, input);
        if(itemsread == 0)
            break;
        reclen=getshort(buffer,0);                          // [5] read 16-bit record length
        if (reclen && reclen <MAX_MS_RECSIZE &&reclen >0){
            itemsread = catdoc_read(rec, 1, reclen, input);
            rec[reclen] = '\0';
        }
...
        process_item(rectype,reclen,rec);                   // [6] process the record
...
    }
    return;
}

The process_item function will use the record type and length to parse each individual record within the “Workbook” stream. The two record types that are associated with this vulnerability are the “SST” and “CONTINUE” records. At [7], the record length will be used to allocate a buffer for the “SST” record, and the record’s contents will then be copied into it. At [8], the record length will be used to grow this buffer so that the contents of each subsequent “CONTINUE” record can be appended to it. Once process_item receives a record that is not a “CONTINUE” record while the previously processed record was an “SST”, the parse_sst function at [9] will be used to process the accumulated data.

src/xlsparse.c:137-416
void process_item (int rectype, int reclen, unsigned char *rec) {
    if (rectype != CONTINUE && prev_rectype == SST) {
...
        parse_sst(sstBuffer,sstBytes);                  // [9] Parse the collected SST contents
    }	
    switch (rectype) {
...
    case SST: {
        /* Just copy SST into buffer, and wait until we get
         * all CONTINUE records
         */
...
        /* If exists first SST entry, then just drop it and start new*/
        if (sstBuffer != NULL) 
            free(sstBuffer);
        if (sst != NULL)
            free(sst);
...
        sstBuffer=(unsigned char*)malloc(reclen);       // [7] allocate memory for SST record
        sstBytes = reclen;
...
        memcpy(sstBuffer,rec,reclen);                   // [7] write contents of SST to allocation
        break;
    }	
    case CONTINUE: {
        if (prev_rectype != SST) {
            return; /* to avoid changing of prev_rectype;*/
        }    
        sstBuffer=realloc(sstBuffer,sstBytes+reclen);   // [8] resize SST buffer for CONTINUE record
...
        memcpy(sstBuffer+sstBytes,rec,reclen);          // [8] write contents of CONTINUE record
        sstBytes+=reclen;
        return;
    }			   
...
}  

When parsing the combined contents of the “SST” and “CONTINUE” records, the following parse_sst function will be used. At [10], this function reads a 32-bit integer from offset 4 of the accumulated buffer (the “cstUnique” field of the SST record). This count is then multiplied by the size of a pointer and used to allocate an array of pointers at [11]. On 32-bit platforms, where both size_t and pointers are 4 bytes wide, this multiplication can overflow. Immediately afterwards at [12], a loop will be entered to populate the newly allocated array, storing a reference to each parsed string at [13]. Because this loop is bounded by the attacker-controlled count and by the end of the buffer rather than by the actual size of the allocation, an undersized array results in the loop writing outside of its bounds.

src/xlsparse.c:758-779
void parse_sst(unsigned char *sstbuf,int bufsize) {
    int i; /* index into sst */
    unsigned char *curString; /* pointer into unparsed buffer*/
    unsigned char *barrier=(unsigned char *)sstbuf+bufsize; /*pointer to end of buffer*/
    unsigned char **parsedString;/*pointer into parsed array*/ 
                    
    sstsize = getlong(sstbuf+4,0);                                  // [10] read 32-bit integer
    sst=(unsigned char **)malloc(sstsize*sizeof(unsigned char *));  // [11] allocate memory for string references
...
    memset(sst,0,sstsize*sizeof(char *));
    for (i=0,parsedString=sst,curString=sstbuf+8;                   // [12] loop to read each string from SST
             i<sstsize && curString<barrier; i++,parsedString++) {
...
        *parsedString = copy_unicode_string(&curString);            // [13] assign reference to string into allocation
    }       
...
}	

Crash Information

=================================================================
==446418==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xf5000ff0 at pc 0x0804ce54 bp 0xff8b4c38 sp 0xff8b4c2c
WRITE of size 4 at 0xf5000ff0 thread T0
    #0 0x804ce53 in parse_sst /.../catdoc/src/xlsparse.c:775
    #1 0x804d55f in process_item /.../catdoc/src/xlsparse.c:142
    #2 0x804f1a1 in do_table /.../catdoc/src/xlsparse.c:116
    #3 0x804a0bb in main /.../catdoc/src/xls2csv.c:167
    #4 0xf75e6d42 in __libc_start_call_main (/lib/libc.so.6+0x2d42) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
    #5 0xf75e6e07 in __libc_start_main@@GLIBC_2.34 (/lib/libc.so.6+0x2e07) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
    #6 0x804a887 in _start (/.../catdoc/src/xls2csv+0x804a887) (BuildId: b35845f47ccf2f2aec66798eb0192a603c879079)

0xf5000ff1 is located 0 bytes after 1-byte region [0xf5000ff0,0xf5000ff1)
allocated by thread T0 here:
    #0 0xf7989f0b in calloc (/lib/libasan.so.8+0xcef0b) (BuildId: 4750f09fdf739c5f6973660b631b7f5960de4d51)
    #1 0x804cd46 in parse_sst /.../catdoc/src/xlsparse.c:765
    #2 0x804d55f in process_item /.../catdoc/src/xlsparse.c:142
    #3 0x804f1a1 in do_table /.../catdoc/src/xlsparse.c:116
    #4 0x804a0bb in main /.../catdoc/src/xls2csv.c:167
    #5 0xf75e6d42 in __libc_start_call_main (/lib/libc.so.6+0x2d42) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)

SUMMARY: AddressSanitizer: heap-buffer-overflow /.../catdoc/src/xlsparse.c:775 in parse_sst
Shadow bytes around the buggy address:
  0xf5000d00: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
  0xf5000d80: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
  0xf5000e00: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 02 fa
  0xf5000e80: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 03 fa
  0xf5000f00: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 02 fa
=>0xf5000f80: fa fa fa fa fa fa fa fa fa fa 01 fa fa fa[01]fa
  0xf5001000: fa fa fd fa fa fa 01 fa fa fa 01 fa fa fa 04 fa
  0xf5001080: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 04 fa
  0xf5001100: fa fa 02 fa fa fa 02 fa fa fa 03 fa fa fa 03 fa
  0xf5001180: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 05 fa
  0xf5001200: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 03 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==446418==ABORTING

Exploit Proof of Concept

The included proof-of-concept depends on Python 3 and is used to generate a Microsoft Excel document. To use it, pass the desired filename as a parameter. Once the process has completed, the generated file can be passed as a parameter to the xls2csv binary.

$ python3 poc.py3.zip /path/to/filename.xls
...
$ src/xls2csv /path/to/filename.xls

The “Workbook” stream in the generated document has 1 worksheet that is composed of 6 records.

>>> store
<class excel.File> 'unnamed_7fe164ba77d0' {unnamed=True,overcommit=True,size=8288} excel.BiffSubStream[1]
[0] excel.BiffSubStream[0] : 0(version=8) : 6 records : excel.BOF5[0:+14] "\x09\x08\x08\x00\x00\x00\x00\x00\x00" ... excel.EOF[2054:+4] "\x0a\x00\x00\x00"

>>> store[0]
<class excel.BiffSubStream> '0' {overcommit=True,document-type='0x0000 (0)',document-version='0.0',document-year=0,document-build=0} excel.RecordGeneral[6]
[0] excel.BiffSubStream[0] : excel.BOF5[0809] : vers=0x0000 (0) rupBuild=0x0000 (0) rupYear=0x0000 (0) dt=0x0000 (0) flags=(0xffffffff00080204,64) :> reserved2=1048575 verLastXLSaved=15 verLowestBiff=255 reserved1 fOOM fBeta
[c] excel.BiffSubStream[1] : 2 * excel.BIFF8.unknown<0204>[0204] : \xff\xff\xff\xff\x00\x00\x00\x00\x01\x00\x01\x00\x00\x00\x00\x00
[24] excel.BiffSubStream[3] : excel.SST[00fc] : "\x01\x00\x00\x00\x00\x00\x00\x40"
[30] *excel.BiffSubStream[4] : excel.Continue[003c] : ...
[2054] excel.BiffSubStream[5] : excel.EOF[000a] : ...

The first record in this worksheet must be a BOF record of type 0x0809.

>>> store[0][0]
<class excel.RecordGeneral> '0'
[0] <instance excel.RecordGeneral.Header 'header'> type=0x0809 length=0x8(8)
[4] <instance excel.BOF5 'data'> vers=0x0000 (0) rupBuild=0x0000 (0) rupYear=0x0000 (0) dt=0x0000 (0) flags=(0xffffffff00080204,64) :> reserved2=1048575 verLastXLSaved=15 verLowestBiff=255 reserved1 fOOM fBeta
[14] <instance dynamic.block(0) 'extra'> (0) ""

The fourth record (index 3) is the SST record of type 0x00fc that is directly responsible for this vulnerability.

>>> store[0][3]
<class excel.RecordGeneral> '3'
[24] <instance excel.RecordGeneral.Header 'header'> type=0x00fc length=0x8(8)
[28] <instance excel.SST 'data'> "\x01\x00\x00\x00\x00\x00\x00\x40"
[30] <instance dynamic.block(0) 'extra'> (0) ""

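In this record, the 32-bit “cstUnique” field is used to size the array of pointers. If the product of this integer and the size of a pointer does not fit in 32 bits, the multiplication wraps and this vulnerability is triggered. With the value 0x40000000 shown below and 4-byte pointers, the product is 0x100000000, which truncates to 0.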

>>> store[0][3]['data']
<class excel.SST> 'data'
[28] <instance office.sint4 'cstTotal'> +0x00000001 (1)
[2c] <instance office.sint4 'cstUnique'> +0x40000000 (1073741824)
[30] <instance dynamic.blockarray(excel.XLUnicodeRichExtendedString, 0) 'rgb'> excel.XLUnicodeRichExtendedString[0] ""

TIMELINE

2025-01-07 - Initial Vendor Contact
2025-01-14 - Initial Vendor Contact
2025-01-16 - Vendor Disclosure
2025-06-02 - Public Release

Credit

Discovered by a member of Cisco Talos.