CVE-2023-39235,CVE-2023-39234
Multiple out-of-bounds write vulnerabilities exist in the VZT vzt_rd_process_block autosort functionality of GTKWave 3.3.115. A specially crafted .vzt file can lead to arbitrary code execution. A victim would need to open a malicious file to trigger these vulnerabilities.
The versions below were either tested or verified to be vulnerable by Talos or confirmed to be vulnerable by the vendor.
GTKWave 3.3.115
GTKWave - https://gtkwave.sourceforge.net
7.8 - CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
CWE-129 - Improper Validation of Array Index
GTKWave is a wave viewer, often used to analyze FPGA simulations and logic analyzer captures. It includes a GUI to view and analyze traces, as well as convert across several file formats (.lxt, .lxt2, .vzt, .fst, .ghw, .vcd, .evcd) either by using the UI or its command line tools. GTKWave is available for Linux, Windows and MacOS. Trace files can be shared within teams or organizations, for example to compare results of simulation runs across different design implementations, to analyze protocols captured with logic analyzers or just as a reference when porting design implementations.
GTKWave registers MIME types for its supported extensions. So, for example, it’s enough for a victim to double-click on a wave file received by e-mail to cause the gtkwave program to be executed and load a potentially malicious file.
VZT (Verilog Zipped Trace) files are parsed by the functions found in vzt_read.c. These functions are used in the vzt2vcd file conversion utility, in vztminer, and by the GUI portion of GTKWave. Thus all of them are affected by the issue described in this report.
To parse VZT files, the function vzt_rd_init_smp is called:
struct vzt_rd_trace *vzt_rd_init_smp(const char *name, unsigned int num_cpus) {
[1] struct vzt_rd_trace *lt = (struct vzt_rd_trace *)calloc(1, sizeof(struct vzt_rd_trace));
...
[2] if (!(lt->handle = fopen(name, "rb"))) {
vzt_rd_close(lt);
lt = NULL;
} else {
vztint16_t id = 0, version = 0;
...
[3] if (!fread(&id, 2, 1, lt->handle)) {
id = 0;
}
if (!fread(&version, 2, 1, lt->handle)) {
id = 0;
}
if (!fread(&lt->granule_size, 1, 1, lt->handle)) {
id = 0;
}
...
At [1] the lt structure is initialized. This is the structure that will contain all the information about the input file.
The input file is opened [2] and 3 fields are read [3] to make sure the input file is a supported VZT file.
...
rcf = fread(&lt->numfacs, 4, 1, lt->handle);
[4] lt->numfacs = rcf ? vzt_rd_get_32(&lt->numfacs, 0) : 0;
...
rcf = fread(&lt->numfacbytes, 4, 1, lt->handle);
lt->numfacbytes = rcf ? vzt_rd_get_32(&lt->numfacbytes, 0) : 0;
rcf = fread(&lt->longestname, 4, 1, lt->handle);
lt->longestname = rcf ? vzt_rd_get_32(&lt->longestname, 0) : 0;
rcf = fread(&lt->zfacnamesize, 4, 1, lt->handle);
lt->zfacnamesize = rcf ? vzt_rd_get_32(&lt->zfacnamesize, 0) : 0;
rcf = fread(&lt->zfacname_predec_size, 4, 1, lt->handle);
lt->zfacname_predec_size = rcf ? vzt_rd_get_32(&lt->zfacname_predec_size, 0) : 0;
rcf = fread(&lt->zfacgeometrysize, 4, 1, lt->handle);
lt->zfacgeometrysize = rcf ? vzt_rd_get_32(&lt->zfacgeometrysize, 0) : 0;
rcf = fread(&lt->timescale, 1, 1, lt->handle);
...
Several fields are then read from the file [4]:
numfacs: the number of facilities (elements in facnames)
numfacbytes: unused
longestname: keeps the longest length of all defined facilities’ names
zfacnamesize: compressed size of facnames
zfacname_predec_size: decompressed size of facnames
zfacgeometrysize: compressed size of facgeometry
Then, the facnames and facgeometry structures are extracted. They can be compressed with either gzip, bzip2 or lzma, depending on the first 2 bytes within the structure buffer.
Right after these two structures, there’s a sequence of blocks that can be arbitrarily long.
for (;;) {
...
[5] b = calloc(1, sizeof(struct vzt_rd_block));
b->last_rd_value_idx = ~0;
[6] rcf = fread(&b->uncompressed_siz, 4, 1, lt->handle);
b->uncompressed_siz = rcf ? vzt_rd_get_32(&b->uncompressed_siz, 0) : 0;
rcf = fread(&b->compressed_siz, 4, 1, lt->handle);
b->compressed_siz = rcf ? vzt_rd_get_32(&b->compressed_siz, 0) : 0;
rcf = fread(&b->start, 8, 1, lt->handle);
b->start = rcf ? vzt_rd_get_64(&b->start, 0) : 0;
rcf = fread(&b->end, 8, 1, lt->handle);
b->end = rcf ? vzt_rd_get_64(&b->end, 0) : 0;
pos = ftello(lt->handle);
...
if ((b->uncompressed_siz) && (b->compressed_siz) && (b->end)) {
/* fprintf(stderr, VZT_RDLOAD"block [%d] %lld / %lld\n", lt->numblocks, b->start, b->end); */
fseeko(lt->handle, b->compressed_siz, SEEK_CUR);
lt->numblocks++;
if (lt->numblocks <= lt->pthreads) {
vzt_rd_pthread_mutex_init(lt, &b->mutex, NULL);
vzt_rd_decompress_blk_pth(lt, b); /* prefetch first block */
}
[7] if (lt->block_curr) {
b->prev = lt->block_curr;
lt->block_curr->next = b;
lt->block_curr = b;
lt->end = b->end;
} else {
lt->block_head = lt->block_curr = b;
lt->start = b->start;
lt->end = b->end;
}
} else {
free(b);
break;
}
pos += b->compressed_siz;
}
At [5] the block structure is initialized. At [6] some fields are extracted. Finally, the block is saved inside a linked list [7].
From this code we can see the file structure for a block as follows:
uncompressed_siz - unsigned big endian 32-bit
compressed_siz - unsigned big endian 32-bit
start_time - unsigned big endian 64-bit
end_time - unsigned big endian 64-bit
followed by compressed_siz bytes of compressed block data
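The block header layout listed above can be sketched as a small parser. This is only an illustration under stated assumptions: struct blk_hdr, be32, be64 and parse_blk_hdr are hypothetical names that do not appear in GTKWave, and the sketch reads from a memory buffer rather than from a FILE handle as vzt_rd_init_smp does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the on-disk block header (not a GTKWave type). */
struct blk_hdr {
    uint32_t uncompressed_siz;
    uint32_t compressed_siz;
    uint64_t start_time;
    uint64_t end_time;
};

#define BLK_HDR_LEN 24u /* 4 + 4 + 8 + 8 bytes */

/* Decode a big-endian 32-bit value from 4 bytes. */
static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a big-endian 64-bit value from 8 bytes. */
static uint64_t be64(const unsigned char *p) {
    return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Returns 0 on success, -1 if fewer than BLK_HDR_LEN bytes are available. */
static int parse_blk_hdr(const unsigned char *buf, size_t len, struct blk_hdr *out) {
    if (len < BLK_HDR_LEN) return -1;
    out->uncompressed_siz = be32(buf);
    out->compressed_siz = be32(buf + 4);
    out->start_time = be64(buf + 8);
    out->end_time = be64(buf + 16);
    return 0;
}
```

All four fields come straight from the file, so a reader built this way has to treat every one of them as untrusted input.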
Upon return from the current vzt_rd_init_smp function, the blocks are parsed inside vzt_rd_iter_blocks.
Eventually, a call to vzt_rd_decompress_blk decompresses the compressed contents of the block and sets b->mem to point to the contents of the decompressed data.
Once b->mem is set, we reach a call to vzt_rd_block_vch_decode that parses the compressed block contents.
static void vzt_rd_block_vch_decode(struct vzt_rd_trace *lt, struct vzt_rd_block *b) {
vzt_rd_pthread_mutex_lock(lt, &b->mutex);
if ((!b->times) && (b->mem)) {
vztint64_t *times = NULL;
vztint32_t *change_dict = NULL;
vztint32_t *val_dict = NULL;
unsigned int num_time_ticks, num_sections, num_dict_entries;
[8] unsigned char *pnt = b->mem;
vztint32_t i, j, m, num_dict_words;
/* vztint32_t *block_end = (vztint32_t *)(pnt + b->uncompressed_siz); */
vztint32_t *val_tmp;
unsigned int num_bitplanes;
uintptr_t padskip;
[9] num_time_ticks = vzt_rd_get_v32(&pnt);
...
[10] num_sections = vzt_rd_get_v32(&pnt);
num_dict_entries = vzt_rd_get_v32(&pnt);
padskip = ((uintptr_t)pnt) & 3;
[11] pnt += (padskip) ? 4 - padskip : 0; /* skip pad to next 4 byte boundary */
...
val_dict = (vztint32_t *)pnt;
pnt = (char *)(val_dict + (num_dict_words = num_dict_entries * num_sections));
bpcalc:
[12] num_bitplanes = vzt_rd_get_byte(pnt, 0) + 1;
pnt++;
b->multi_state = (num_bitplanes > 1);
padskip = ((uintptr_t)pnt) & 3;
[13] pnt += (padskip) ? 4 - padskip : 0; /* skip pad to next 4 byte boundary */
b->vindex = (vztint32_t *)(pnt);
...
pnt = (char *)(b->vindex + num_bitplanes * lt->total_values);
...
num_dict_words = (num_sections * num_dict_entries) * sizeof(vztint32_t);
[14] change_dict = malloc(num_dict_words ? num_dict_words : sizeof(vztint32_t)); /* scan-build */
m = 0;
for (i = 0; i < num_dict_entries; i++) {
vztint32_t pbit = 0;
for (j = 0; j < num_sections; j++) {
vztint32_t k = val_dict[m];
[15] vztint32_t l = k ^ ((k << 1) ^ pbit);
change_dict[m++] = l;
pbit = k >> 31;
}
}
[16] b->val_dict = val_dict;
b->change_dict = change_dict;
b->times = times;
b->num_time_ticks = num_time_ticks;
b->num_dict_entries = num_dict_entries;
b->num_sections = num_sections;
}
At [8] pnt is set to point to the decompressed block data.
At [9] num_time_ticks is extracted as a 32-bit varint from pnt and times are extracted from the block.
At [10] num_sections and num_dict_entries are extracted.
At [11] a pointer to val_dict is extracted, ensuring it’s aligned to a 4-byte boundary. This array contains wave values.
At [12] num_bitplanes is extracted, 1 byte.
At [13] a pointer to vindex is extracted, ensuring it’s aligned to a 4-byte boundary.
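The 4-byte alignment step used at [11] and [13] can be isolated into a helper. This is a minimal sketch; align4 is a hypothetical name, and the arithmetic is the same padskip computation shown in the excerpt.

```c
#include <stdint.h>

/* Hypothetical helper: advance p to the next 4-byte boundary,
 * or return it unchanged if it is already aligned. */
static const unsigned char *align4(const unsigned char *p) {
    uintptr_t padskip = ((uintptr_t)p) & 3;
    return p + (padskip ? 4 - padskip : 0);
}
```

As in the excerpt, nothing here checks that the advanced pointer still lies inside the decompressed buffer.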
At [14] change_dict is allocated, and it is then filled from val_dict. We can see at [15] that each change_dict word marks the signal transitions in the val_dict integer array, which is treated as a bit-stream: a bit is set wherever a value bit differs from the bit before it. This is important to note, because it means the contents of change_dict are fully controlled by the file contents.
Finally at [16] the extracted values are saved inside the block structure b.
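The transition computation at [15] can be tried in isolation. The sketch below is a standalone rewrite of that loop under assumptions: build_change_dict is a hypothetical name, and fixed-width uint32_t stands in for vztint32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills change[] from val[]; both hold num_entries rows of num_sections words.
 * Bit n of each output word is set when bit n of the value differs from
 * bit n-1, with the top bit of the previous word in the row carried in. */
static void build_change_dict(const uint32_t *val, uint32_t *change,
                              size_t num_entries, size_t num_sections) {
    size_t m = 0;
    for (size_t i = 0; i < num_entries; i++) {
        uint32_t pbit = 0; /* carry resets at the start of each row */
        for (size_t j = 0; j < num_sections; j++) {
            uint32_t k = val[m];
            change[m++] = k ^ ((k << 1) ^ pbit);
            pbit = k >> 31;
        }
    }
}
```

Because the output is a pure function of val_dict, whoever controls val_dict also chooses which bits are set in change_dict.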
Back inside vzt_rd_process_block at [17], the function loops over all facilities:
int vzt_rd_process_block(struct vzt_rd_trace *lt, struct vzt_rd_block *b) {
unsigned int i, i2;
vztint32_t idx;
char *pnt = lt->value_current_sector, *pnt2 = lt->value_previous_sector;
char buf[32];
char *bufpnt;
struct vzt_ncycle_autosort **autosort;
struct vzt_ncycle_autosort *deadlist = NULL;
struct vzt_ncycle_autosort *autofacs = calloc(lt->numrealfacs ? lt->numrealfacs : 1, sizeof(struct vzt_ncycle_autosort)); /* fix for scan-build on lt->numrealfacs */
vzt_rd_block_vch_decode(lt, b);
[17] vzt_rd_pthread_mutex_lock(lt, &b->mutex);
[18] autosort = calloc(b->num_time_ticks, sizeof(struct vzt_ncycle_autosort *));
for (i = 0; i < b->num_time_ticks; i++) autosort[i] = NULL;
deadlist = NULL;
[19] for (idx = 0; idx < lt->numrealfacs; idx++) {
int process_idx = idx / 8;
int process_bit = idx & 7;
...
i2 = vzt_rd_next_value_chg_time(lt, b, i, idx);
if (i2) {
struct vzt_ncycle_autosort *t = autosort[i2];
autofacs[idx].next = t;
[22] autosort[i2] = autofacs + idx;
} else {
struct vzt_ncycle_autosort *t = deadlist;
autofacs[idx].next = t;
deadlist = autofacs + idx;
}
}
}
[20] for (i = 1; i < b->num_time_ticks; i++) {
struct vzt_ncycle_autosort *t = autosort[i];
if (t) {
while (t) {
struct vzt_ncycle_autosort *tn = t->next;
idx = t - autofacs;
vzt_rd_fac_value(lt, b, i, idx, pnt);
if (!(lt->flags[idx] & (VZT_RD_SYM_F_DOUBLE | VZT_RD_SYM_F_STRING))) {
lt->value_change_callback(&lt, &b->times[i], &idx, &pnt);
} else {
if (lt->flags[idx] & VZT_RD_SYM_F_DOUBLE) {
bufpnt = buf;
vzt_rd_double_xdr(pnt, buf);
[21] lt->value_change_callback(&lt, &b->times[i], &idx, &bufpnt);
} else {
unsigned int spnt = vzt_rd_make_sindex(pnt);
char *msg = ((!i) && (b->prev)) ? "UNDEF" : b->sindex[spnt];
[21] lt->value_change_callback(&lt, &b->times[i], &idx, &msg);
}
}
i2 = vzt_rd_next_value_chg_time(lt, b, i, idx);
if (i2 != i) {
struct vzt_ncycle_autosort *ta = autosort[i2];
autofacs[idx].next = ta;
[23] autosort[i2] = autofacs + idx;
} else {
struct vzt_ncycle_autosort *ta = deadlist;
autofacs[idx].next = ta;
deadlist = autofacs + idx;
}
t = tn;
}
}
}
...
At a high level, this function sorts the wave value changes by looping over all facilities [19] and time ticks [20] and eventually emits VCD syntax accordingly [21].
At [18], autosort is allocated with a size of b->num_time_ticks * sizeof(void *). Entries of the autosort array are then written with pointers into the autofacs array, using i2 as the index. This index should always be smaller than the number of elements in autosort. However, this is not guaranteed, as i2 can be arbitrarily controlled, and there are no checks that make sure the writes at [22] and [23] happen within the bounds of the autosort array.
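The missing check can be illustrated with a guarded version of the list insertion performed at [22] and [23]. This is a hypothetical sketch and not the vendor's patch: struct node stands in for struct vzt_ncycle_autosort, and bucket_link_checked is an invented helper name.

```c
#include <stddef.h>
#include <stdint.h>

struct node { struct node *next; }; /* stand-in for struct vzt_ncycle_autosort */

/* Links n at the head of buckets[i2] only if i2 is a valid bucket index.
 * Returns 1 if linked, 0 if i2 was rejected as out of range. */
static int bucket_link_checked(struct node **buckets, uint32_t num_buckets,
                               uint32_t i2, struct node *n) {
    if (i2 >= num_buckets) return 0;
    n->next = buckets[i2];
    buckets[i2] = n;
    return 1;
}
```

A caller would pass b->num_time_ticks as num_buckets. How a rejected index should then be handled, for example by aborting the parse, is a design choice this sketch leaves open.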
i2 is returned by vzt_rd_next_value_chg_time:
vztint32_t vzt_rd_next_value_chg_time(struct vzt_rd_trace *lt, struct vzt_rd_block *b, vztint32_t time_offset, vztint32_t facidx) {
unsigned int i;
vztint32_t len = lt->len[facidx];
vztint32_t vindex_offset = lt->vindex_offset[facidx];
vztint32_t vindex_offset_x = vindex_offset + lt->total_values;
vztint32_t old_time_offset = time_offset;
int word = time_offset / 32;
int bit = (time_offset & 31) + 1;
int row_size = b->num_sections;
vztint32_t *valpnt, *valpnt_x;
vztint32_t change_msk;
if ((time_offset >= (b->num_time_ticks - 1)) || (facidx > lt->numrealfacs)) return (time_offset);
time_offset &= ~31;
for (; word < row_size; word++) {
if (bit != 32) {
change_msk = 0;
if (!(lt->flags[facidx] & VZT_RD_SYM_F_SYNVEC)) {
if (b->multi_state) {
for (i = 0; i < len; i++) {
valpnt = b->change_dict + (b->vindex[vindex_offset + i] * row_size + word);
valpnt_x = b->change_dict + (b->vindex[vindex_offset_x + i] * row_size + word);
change_msk |= *valpnt;
change_msk |= *valpnt_x;
}
} else {
for (i = 0; i < len; i++) {
valpnt = b->change_dict + (b->vindex[vindex_offset + i] * row_size + word);
change_msk |= *valpnt;
}
}
} else {
if (b->multi_state) {
for (i = 0; i < len; i++) {
if ((facidx + i) >= lt->numfacs) break;
vindex_offset = lt->vindex_offset[facidx + i];
vindex_offset_x = vindex_offset + lt->total_values;
valpnt = b->change_dict + (b->vindex[vindex_offset] * row_size + word);
valpnt_x = b->change_dict + (b->vindex[vindex_offset_x] * row_size + word);
change_msk |= *valpnt;
change_msk |= *valpnt_x;
}
} else {
for (i = 0; i < len; i++) {
if ((facidx + i) >= lt->numfacs) break;
vindex_offset = lt->vindex_offset[facidx + i];
valpnt = b->change_dict + (b->vindex[vindex_offset] * row_size + word);
change_msk |= *valpnt;
}
}
}
change_msk >>= bit;
if (change_msk) {
[24] return ((change_msk & 1 ? 0 : vzt_rd_tzc(change_msk)) + time_offset + bit);
}
}
time_offset += 32;
bit = 0;
}
return (old_time_offset);
}
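In this function change_msk is calculated based on change_dict, which is arbitrarily controlled by values inside blocks [15]. At [24], the function returns the position of the lowest set bit in change_msk (vzt_rd_tzc counts trailing zeros), added to time_offset and bit. The search loop is bounded by b->num_sections, which is read from the file independently of b->num_time_ticks, so the result is not limited to the size of autosort. As this is controlled by file contents, this issue can be used to control i2, which in turn leads to writing out-of-bounds in the heap at [22] and [23].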
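To see how the returned index can exceed num_time_ticks, the search can be reduced to a simplified model. The sketch below is an assumption-laden reduction, not GTKWave code: next_change_model and tzc32 are hypothetical names, it scans a single row of change words, and it drops the per-facility vindex lookups. The loop is bounded by row_size (num_sections in the original), not by the number of time ticks.

```c
#include <stdint.h>

/* Portable trailing-zero count; x must be non-zero. */
static uint32_t tzc32(uint32_t x) {
    uint32_t n = 0;
    while (!(x & 1u)) { x >>= 1; n++; }
    return n;
}

/* Simplified model: scan one row of row_size change words for the next
 * set bit after time_offset and return its bit position. */
static uint32_t next_change_model(const uint32_t *row, uint32_t row_size,
                                  uint32_t num_time_ticks, uint32_t time_offset) {
    uint32_t old_time_offset = time_offset;
    uint32_t word = time_offset / 32;
    uint32_t bit = (time_offset & 31) + 1;
    if (time_offset >= num_time_ticks - 1) return time_offset;
    time_offset &= ~31u;
    for (; word < row_size; word++) {
        if (bit != 32) {
            uint32_t change_msk = row[word] >> bit;
            if (change_msk)
                return ((change_msk & 1u) ? 0 : tzc32(change_msk)) + time_offset + bit;
        }
        time_offset += 32;
        bit = 0;
    }
    return old_time_offset;
}
```

With a three-word row whose only set bit is the top bit of the last word, and num_time_ticks set to 4, the model returns 95, far past the end of a four-entry autosort array.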
With careful heap manipulation, this issue can be used to achieve arbitrary code execution.
At [22], while looping over lt->numrealfacs, autosort can be written out-of-bounds in the heap, which can in turn lead to arbitrary code execution.
At [23], while looping over b->num_time_ticks, autosort can be written out-of-bounds in the heap, which can in turn lead to arbitrary code execution.
Fixed in version 3.3.118, available from https://sourceforge.net/projects/gtkwave/files/gtkwave-3.3.118/
2023-08-02 - Vendor Disclosure
2023-12-31 - Vendor Patch Release
2024-01-08 - Public Release
Discovered by Claudio Bozzato of Cisco Talos.