CVE-2016-4331
HDF5 is a file format maintained by a non-profit organization, The HDF Group. It is designed for the storage and organization of large amounts of scientific data and is used to exchange data structures between applications in industries such as GIS, via libraries such as GDAL and OGR, or as part of software like ArcGIS.
The vulnerability exists when the library decodes data out of a dataset encoded with the H5Z_NBIT filter. When calculating the precision that a BCD number is encoded with, the library fails to ensure that the precision is within the bounds of the datatype size. Due to this, the library calculates an index outside the bounds of the space allocated for the BCD number. While decoding this data, the library then writes outside the bounds of the buffer, leading to a heap-based buffer overflow. This can lead to code execution under the context of the application using the library.
hdf5-1.8.16.tar.bz2
tools/h5ls: Version 1.8.16
tools/h5stat: Version 1.8.16
tools/h5dump: Version 1.8.16
http://www.hdfgroup.org/HDF5/
http://www.hdfgroup.org/HDF5/release/obtainsrc.html
http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.16.tar.bz2
8.6 - CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
The HDF file format is intended to be a general, self-describing file format for the various types of data structures used in the scientific community [1]. These data structures are stored in two types of objects, Datasets and Groups. Paralleling the file format to a file system, a Dataset can be interpreted as a file, and a Group can be interpreted as a directory that can contain other Datasets or Groups. Associated with each entry is metadata containing user-defined, named attributes that can be used to describe the dataset.
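To make the Dataset/Group/attribute relationship concrete, the following minimal sketch uses the public HDF5 C API to create a group containing one dataset with a named attribute. All file and object names here are hypothetical and chosen for illustration only.
#include "hdf5.h"

int main(void)
{
    hsize_t dims[1] = { 12 };
    int     data[12] = { 0 };

    hid_t file  = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t group = H5Gcreate2(file, "/sensors", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT); /* a Group acts as a "directory" */

    /* a Dataset acts as a "file": typed, n-dimensional raw data */
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(group, "readings", H5T_NATIVE_INT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    /* named attributes carry the user-defined metadata describing the dataset */
    hid_t aspace = H5Screate(H5S_SCALAR);
    hid_t attr   = H5Acreate2(dset, "scale_factor", H5T_NATIVE_INT, aspace, H5P_DEFAULT, H5P_DEFAULT);
    int scale = 10;
    H5Awrite(attr, H5T_NATIVE_INT, &scale);

    H5Aclose(attr);  H5Sclose(aspace);
    H5Dclose(dset);  H5Sclose(space);
    H5Gclose(group); H5Fclose(file);
    return 0;
}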
When reading a dataset out of the file, the HDF5 library will call the following function, H5Dread. After allocating space for the buffer, the library will call an internal function H5D__read which will eventually call into H5D__chunk_lock. This function is responsible for reading the contents of the dataset into a cache that the application will later be able to access.
src/H5Dio.c:125
herr_t
H5Dread(hid_t dset_id, hid_t mem_type_id, hid_t mem_space_id,
hid_t file_space_id, hid_t plist_id, void *buf/*out*/)
{
...
/* read raw data */
if(H5D__read(dset, mem_type_id, mem_space, file_space, plist_id, buf/*out*/) < 0)
HGOTO_ERROR(H5E_DATASET, H5E_READERROR, FAIL, "can't read data")
src/H5Dio.c:373
herr_t
H5D__read(H5D_t *dataset, hid_t mem_type_id, const H5S_t *mem_space,
const H5S_t *file_space, hid_t dxpl_id, void *buf/*out*/)
{
...
/* Invoke correct "high level" I/O routine */
if((*io_info.io_ops.multi_read)(&io_info, &type_info, nelmts, file_space, mem_space, &fm) < 0)
    HGOTO_ERROR(H5E_DATASET, H5E_READERROR, FAIL, "can't read data")
...
src/H5Dchunk.c:1873
/* Lock the chunk into the cache */
if(NULL == (chunk = H5D__chunk_lock(io_info, &udata, FALSE)))
HGOTO_ERROR(H5E_IO, H5E_READERROR, FAIL, "unable to read raw data chunk")
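For context, an application typically reaches this code path through the public API along the following lines (a minimal sketch; the file name, dataset path, and element type are hypothetical):
#include "hdf5.h"

int main(void)
{
    int buf[12];    /* destination buffer supplied by the application */

    /* open an (untrusted) file and one of its datasets */
    hid_t file = H5Fopen("untrusted.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/dset", H5P_DEFAULT);

    /* this call drives H5D__read, H5D__chunk_lock and, for nbit-encoded
       chunks, H5Z_filter_nbit as described below */
    if(H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf) < 0)
        return 1;

    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}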
Once inside H5D__chunk_lock, the library will call into its filter pipeline to determine how to decode the data. This happens by calling H5Z_pipeline. Inside H5Z_pipeline, the library will determine which filter to use and then call the "filter" method from a data structure that contains the methods, or handlers, that deal with the specific encoding type.
src/H5Dchunk.c:2808
void *
H5D__chunk_lock(const H5D_io_info_t *io_info, H5D_chunk_ud_t *udata,
hbool_t relax)
{
if(H5Z_pipeline(pline, H5Z_FLAG_REVERSE, &(udata->filter_mask), io_info->dxpl_cache->err_detect,
        io_info->dxpl_cache->filter_cb, &chunk_alloc, &chunk_alloc, &chunk) < 0)
    HGOTO_ERROR(H5E_PLINE, H5E_CANTFILTER, NULL, "data pipeline read failed")
H5_CHECKED_ASSIGN(udata->nbytes, uint32_t, chunk_alloc, size_t);
src/H5Z.c:1285
herr_t
H5Z_pipeline(const H5O_pline_t *pline, unsigned flags,
unsigned *filter_mask/*in,out*/, H5Z_EDC_t edc_read,
H5Z_cb_t cb_struct, size_t *nbytes/*in,out*/,
size_t *buf_size/*in,out*/, void **buf/*in,out*/)
{
tmp_flags=flags|(pline->filter[idx].flags);
tmp_flags|=(edc_read== H5Z_DISABLE_EDC) ? H5Z_FLAG_SKIP_EDC : 0;
new_nbytes = (fclass->filter)(tmp_flags, pline->filter[idx].cd_nelmts, // XXX: calls H5Z_filter_nbit
        pline->filter[idx].cd_values, *nbytes, buf_size, buf);
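The dispatch through fclass->filter uses the same callback shape that third-party filters register through the public H5Z interface; the nbit handler invoked here, H5Z_filter_nbit, matches this prototype. The following sketch of a hypothetical pass-through filter (not part of the library) illustrates that structure:
#include "hdf5.h"

/* hypothetical no-op filter with the same prototype as H5Z_filter_nbit */
static size_t
passthrough_filter(unsigned int flags, size_t cd_nelmts, const unsigned int cd_values[],
                   size_t nbytes, size_t *buf_size, void **buf)
{
    (void)flags; (void)cd_nelmts; (void)cd_values; (void)buf_size; (void)buf;
    return nbytes;    /* size of the "decoded" data; 0 signals failure */
}

static const H5Z_class2_t PASSTHROUGH_CLASS = {
    H5Z_CLASS_T_VERS,                         /* version of this struct            */
    (H5Z_filter_t)(H5Z_FILTER_RESERVED + 42), /* arbitrary, illustrative filter id */
    1, 1,                                     /* encoder/decoder present flags     */
    "passthrough",                            /* filter name                       */
    NULL, NULL,                               /* optional can_apply/set_local      */
    passthrough_filter                        /* the "filter" method called above  */
};

/* H5Zregister(&PASSTHROUGH_CLASS) adds the filter to the table that
   H5Z_pipeline consults when resolving fclass for a chunk */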
When handling data encoded with the nbit encoding type, the library will call H5Z_filter_nbit. This function takes inputs from the file and uses them to calculate the amount of space required to decode the encoded data. This is done by taking the number of elements and multiplying it by the size of each element. With the provided proof of concept, the size is 4 and the number of elements is 12. This results in a buffer size of 48 bytes.
src/H5Znbit.c:865
static size_t
H5Z_filter_nbit(unsigned flags, size_t cd_nelmts, const unsigned cd_values[],
size_t nbytes, size_t *buf_size, void **buf)
{
/* copy a filter parameter to d_nelmts */
d_nelmts = cd_values[2]; // XXX: number of elements
/* input; decompress */
if(flags & H5Z_FLAG_REVERSE) {
size_out = d_nelmts * cd_values[4]; /* cd_values[4] stores datatype size */ // XXX: size of elements
/* allocate memory space for decompressed buffer */
if(NULL == (outbuf = (unsigned char *)H5MM_malloc(size_out))) // XXX: seems to be 0x30
    HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, 0, "memory allocation failed for nbit decompression")
/* decompress the buffer */
H5Z_nbit_decompress(outbuf, d_nelmts, (unsigned char *)*buf, cd_values); // XXX: decompress data into outbuf
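For the provided proof of concept, the allocation arithmetic works out as follows (a worked sketch; only the element count of 12 and the datatype size of 4 come from the PoC, the variable names are illustrative):
unsigned d_nelmts   = 12;                     /* cd_values[2]: number of elements     */
unsigned dtype_size = 4;                      /* cd_values[4]: datatype size in bytes */
size_t   size_out   = d_nelmts * dtype_size;  /* 12 * 4 = 48 (0x30) bytes for outbuf  */
/* nothing ties the precision later read from cd_values[6] to dtype_size, so the
   decode loop can compute byte indices well past these 48 bytes */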
When entering the H5Z_NBIT_ATOMIC case, the library copies input from the file into a structure that gets passed to H5Z_nbit_decompress_one_atomic. The loop below iterates once for each element stored in the dataset. The precision field, which is later used to write outside the buffer allocated in the prior snippet, determines the precision of a binary-coded decimal number.
src/H5Znbit.c:1140
static void
H5Z_nbit_decompress(unsigned char *data, unsigned d_nelmts, unsigned char *buffer,
const unsigned parms[])
{
switch(parms[3]) {
case H5Z_NBIT_ATOMIC:
/* set the index before goto function call */
p.size = parms[4];
p.order = parms[5];
p.precision = parms[6]; // XXX: used later
p.offset = parms[7];
for(i = 0; i < d_nelmts; i++) {
H5Z_nbit_decompress_one_atomic(data, i*p.size, buffer, &j, &buf_len, p);
}
break;
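Note that none of the copied parameters are validated against the datatype size before the loop runs. A check of the following kind would reject the malformed parameters; this is an illustrative sketch under the same parameter meanings as above, not the upstream patch:
/* hypothetical helper: size in bytes, precision and offset in bits */
static int
nbit_atomic_params_valid(unsigned size, unsigned precision, unsigned offset)
{
    unsigned datatype_len = size * 8;      /* bits available in the datatype */

    if(precision == 0 || precision > datatype_len)
        return 0;                          /* precision must fit the type    */
    if(offset >= datatype_len)
        return 0;
    if(precision + offset > datatype_len)
        return 0;                          /* encoded field must fit as well */
    return 1;
}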
Once inside H5Z_nbit_decompress_one_atomic, the library will use the value of p.precision to calculate the index into the buffer that was allocated. Due to a lack of bounds checking, this index allows the loop that is executed later to write outside the bounds of the buffer: if the precision is larger than datatype_len, the index can be made to point well outside the allocation.
src/H5Znbit.c:1012
static void
H5Z_nbit_decompress_one_atomic(unsigned char *data, size_t data_offset,
unsigned char *buffer, size_t *j, int *buf_len, parms_atomic p)
{
/* begin_i: the index of byte having first significant bit
end_i: the index of byte having last significant bit */
int k, begin_i, end_i, datatype_len;
datatype_len = p.size * 8;
if(p.order == H5Z_NBIT_ORDER_BE) { /* big endian */
/* calculate begin_i and end_i */
begin_i = (datatype_len - p.precision - p.offset) / 8; // XXX: p.precision is used here to calculate begin_i
if(p.offset % 8 != 0)
end_i = (datatype_len - p.offset) / 8;
else
end_i = (datatype_len - p.offset) / 8 - 1;
for(k = begin_i; k <= end_i; k++)
H5Z_nbit_decompress_one_byte(data, data_offset, k, begin_i, end_i, // XXX: k == begin_i
buffer, j, buf_len, p, datatype_len);
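To put numbers on this (an illustrative sketch, not library code): with p.size = 4 each element occupies 4 bytes of the decompressed buffer, yet the byte index handed to H5Z_nbit_decompress_one_byte is derived solely from the file-supplied precision and offset, and with the proof of concept it reaches 0x2003, as seen in the debugger session below.
size_t element_bytes = 4;                  /* p.size                               */
size_t outbuf_bytes  = 12 * element_bytes; /* the 48-byte allocation from above    */
size_t observed_k    = 0x2003;             /* begin_i at the crash (gdb, below)    */
/* the first write lands at data[data_offset + observed_k], roughly 8 KiB
   beyond the end of the 48-byte outbuf */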
It is within H5Z_nbit_decompress_one_byte that the library writes outside the bounds of the buffer. Because begin_i points outside the buffer, this can be used to trigger a buffer overflow and overwrite adjacent data. This is a heap-based buffer overflow and can lead to code execution within the context of the application using the library.
src/H5Znbit.c:943
static void
H5Z_nbit_decompress_one_byte(unsigned char *data, size_t data_offset, int k,
int begin_i, int end_i, unsigned char *buffer, size_t *j, int *buf_len,
parms_atomic p, int datatype_len)
{
data[data_offset + k] = // XXX: data_offset + k points outside of data
    ((val & ~(~0 << *buf_len)) << (dat_len - *buf_len)) << uchar_offset;
$ gdb --directory ~/build/hdf5-1.8.16/release/bin -q --args ~/build/hdf5-1.8.16/release/bin/h5dump 000158.hdf
Reading symbols from /home/vrt/build/hdf5-1.8.16/release/bin/h5dump...done.
(gdb) bp H5Znbit.c:896
Breakpoint 3 at 0x8269e94: file ../../src/H5Znbit.c, line 896.
(gdb) bp H5Znbit.c:1162
Breakpoint 4 at 0x8269f09: file ../../src/H5Znbit.c, line 1162.
(gdb) bp H5Znbit.c:1030
Breakpoint 5 at 0x8269549: file ../../src/H5Znbit.c, line 1030.
(gdb) r
Breakpoint 3, H5Z_filter_nbit (flags=0x101, cd_nelmts=0x8, cd_values=0x848db88, nbytes=0x1f, buf_size=0xbfffbc7c, buf=0xbfffbc38) at ../../src/H5Znbit.c:896
896 if(NULL == (outbuf = (unsigned char *)H5MM_malloc(size_out)))
(gdb) p size_out
$6 = 0x0
(gdb) c
Continuing.
Breakpoint 4, H5Z_nbit_decompress (parms=0x848db88, buffer=0x8498530 ":\252\263Ê«>", d_nelmts=0xc, data=0x8498558 "") at ../../src/H5Znbit.c:1162
1162 p.precision = parms[6];
(gdb) c
Continuing.
Breakpoint 5, H5Z_nbit_decompress_one_atomic (data=data@entry=0x8498558 "", data_offset=0x0, buffer=buffer@entry=0x8498530 ":\252\263Ê«>", j=j@entry=0xbfffbb08, buf_len=buf_len@entry=0xbfffbb0c, p=...) at ../../src/H5Znbit.c:1030
1030 for(k = begin_i; k >= end_i; k--)
(gdb) p k
$7 = 0x2003
(gdb) c
Continuing.
Catchpoint 2 (signal SIGSEGV), 0x082695ff in H5Z_nbit_decompress_one_byte (datatype_len=0x20, buf_len=0xbfffbb0c, j=0xbfffbb08, buffer=0x8498530 ":\252\263Ê«>", end_i=0x0, begin_i=0x2003, k=0x557, data_offset=<optimized out>, data=<optimized out>, p=...) at ../../src/H5Znbit.c:983
983 ((val >> (*buf_len - dat_len)) & ~(~0 << dat_len)) << uchar_offset;
### Crash Analysis (Address Sanitizer)
=================================================================
==2374==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xb3519a53 at pc 0xb74003af bp 0xbfcd6c38 sp 0xbfcd6c30
WRITE of size 1 at 0xb3519a53 thread T0
#0 0xb74003ae in H5Z_nbit_decompress_one_byte /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Znbit.c:975
#1 0xb73f7a71 in H5Z_nbit_decompress_one_atomic /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Znbit.c:1031
#2 0xb73e96ac in H5Z_nbit_decompress /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Znbit.c:1165
#3 0xb73e783f in H5Z_filter_nbit /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Znbit.c:900
#4 0xb73ccfd6 in H5Z_pipeline /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Z.c:1360
#5 0xb568fc12 in H5D__chunk_lock /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Dchunk.c:2903
#6 0xb5673334 in H5D__chunk_read /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Dchunk.c:1874
#7 0xb57bb68d in H5D__read /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Dio.c:550
#8 0xb57b5813 in H5Dread /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Dio.c:172
#9 0x81e9cf3 in h5tools_dump_simple_dset /home/vrt/build/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5tools_dump.c:1619
#10 0x81e543d in h5tools_dump_dset /home/vrt/build/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5tools_dump.c:1790
#11 0x821316a in h5tools_dump_data /home/vrt/build/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5tools_dump.c:3859
#12 0x81045ca in dump_dataset /home/vrt/build/hdf5-1.8.16/memcheck/tools/h5dump/../../../tools/h5dump/h5dump_ddl.c:1053
#13 0x80f5da3 in dump_all_cb /home/vrt/build/hdf5-1.8.16/memcheck/tools/h5dump/../../../tools/h5dump/h5dump_ddl.c:358
#14 0xb5c1669a in H5G_iterate_cb /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Gint.c:782
#15 0xb5c71a72 in H5G__node_iterate /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Gnode.c:1026
#16 0xb5465c85 in H5B_iterate_helper /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5B.c:1175
#17 0xb54636db in H5B_iterate /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5B.c:1220
#18 0xb5cca773 in H5G__stab_iterate /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Gstab.c:565
#19 0xb5c95af2 in H5G__obj_iterate /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Gobj.c:707
#20 0xb5c14ba9 in H5G_iterate /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Gint.c:843
#21 0xb5fcdb47 in H5Literate /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5L.c:1182
#22 0x80f1609 in link_iteration /home/vrt/build/hdf5-1.8.16/memcheck/tools/h5dump/../../../tools/h5dump/h5dump_ddl.c:608
#23 0x8100596 in dump_group /home/vrt/build/hdf5-1.8.16/memcheck/tools/h5dump/../../../tools/h5dump/h5dump_ddl.c:890
#24 0x80d3345 in main /home/vrt/build/hdf5-1.8.16/memcheck/tools/h5dump/../../../tools/h5dump/h5dump.c:1542
#25 0xb5021a82 (/lib/i386-linux-gnu/libc.so.6+0x19a82)
#26 0x80cec04 in _start (/home/vrt/build/hdf5-1.8.16/memcheck/bin/h5dump+0x80cec04)
AddressSanitizer can not describe address in more detail (wild memory access suspected).
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/vrt/build/hdf5-1.8.16/memcheck/src/../../src/H5Znbit.c:975 H5Z_nbit_decompress_one_byte
2016-05-17 - Vendor Notification
2016-11-15 - Public Disclosure
[1] https://en.wikipedia.org/wiki/Hierarchical_Data_Format
[2] http://www.hdfgroup.org/HDF5/
Discovered by Cisco Talos