CVE-2016-4333
HDF5 is a file format that is maintained by a non-profit organization, The HDF Group. HDF5 is designed for the storage and organization of large amounts of scientific data and is used to exchange data structures between applications in industries such as the GIS industry, via libraries such as GDAL and OGR or as part of software such as ArcGIS.
The vulnerability exists because the library allocates space for an array using a value from the file and then, inside the loop that initializes the array, allows a value from the file to modify the loop’s terminating condition. As a result, an attacker can cause the loop’s index to point outside the bounds of the array while it is being initialized. This is a heap-based buffer overflow and can lead to code execution in the context of the application using the library.
hdf5-1.8.16.tar.bz2
tools/h5ls: Version 1.8.16
tools/h5stat: Version 1.8.16
tools/h5dump: Version 1.8.16
http://www.hdfgroup.org/HDF5/
http://www.hdfgroup.org/HDF5/release/obtainsrc.html
http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.16.tar.bz2
8.6 – CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
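The score can be re-derived from the vector string. The following is a minimal sketch that assumes the metric weights and base-score equations published in the CVSS v3.0 specification; the function names are illustrative and do not belong to any library. Only the scope-changed branch that this vector needs is implemented, and the result is returned in tenths to avoid floating-point comparisons.

```c
#include <assert.h>

/* Metric weights from the CVSS v3.0 specification for
 * AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H. */
static const double AV_LOCAL = 0.55, AC_LOW = 0.77, PR_NONE = 0.85,
                    UI_REQUIRED = 0.62, CIA_HIGH = 0.56;

/* x raised to a small non-negative integer power (avoids linking libm). */
static double ipow(double x, int n)
{
    double r = 1.0;
    while (n-- > 0)
        r *= x;
    return r;
}

/* Round up to one decimal place, returned in tenths (86 means 8.6). */
static int roundup_tenths(double x)
{
    int t = (int)(x * 10.0);
    return ((double)t < x * 10.0) ? t + 1 : t;
}

/* Base score in tenths for the vector above (scope changed). */
int cvss30_base_tenths(void)
{
    double iss = 1.0 - ipow(1.0 - CIA_HIGH, 3);
    double impact = 7.52 * (iss - 0.029) - 3.25 * ipow(iss - 0.02, 15);
    double exploitability = 8.22 * AV_LOCAL * AC_LOW * PR_NONE * UI_REQUIRED;
    double raw = 1.08 * (impact + exploitability);

    if (impact <= 0.0)
        return 0;
    if (raw > 10.0)
        raw = 10.0;
    return roundup_tenths(raw);
}
```

Calling cvss30_base_tenths() gives the score in tenths, which can be compared against the 8.6 stated in the vector line.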
The HDF file format is intended to be a general, self-describing file format for the various types of data structures used in the scientific community [1]. These data structures are stored in two types of objects, Datasets and Groups. Drawing a parallel between the file format and a file system, a Dataset can be interpreted as a file, and a Group can be interpreted as a directory that is able to contain other Datasets or Groups. Associated with each entry is metadata containing user-defined named attributes that can be used to describe the dataset.
Within the HDF file format, paths are specified in the ‘/’-separated POSIX format. When reading a dataset, the library opens the object using H5D__open_oid. Inside this function, the library reads the type and its location, and then passes the H5O_DTYPE_ID value along with that location on to H5O_msg_read.
src/H5Dint.c:1221
static herr_t
H5D__open_oid(H5D_t *dataset, hid_t dapl_id, hid_t dxpl_id)
{
...
/* Open the dataset object */
if(H5O_open(&(dataset->oloc)) < 0)
HGOTO_ERROR(H5E_DATASET, H5E_CANTOPENOBJ, FAIL, "unable to open")
/* Get the type and space */
if(NULL == (dataset->shared->type = (H5T_t *)H5O_msg_read(&(dataset->oloc), H5O_DTYPE_ID, NULL, dxpl_id))) // XXX: \
HGOTO_ERROR(H5E_DATASET, H5E_CANTINIT, FAIL, "unable to load type info from dataset header")
\
src/H5Omessage.c:463
void *
H5O_msg_read(const H5O_loc_t *loc, unsigned type_id, void *mesg,
hid_t dxpl_id)
{
H5O_t *oh = NULL; /* Object header to use */
void *ret_value; /* Return value */
...
/* Get the object header */
if(NULL == (oh = H5O_protect(loc, dxpl_id, H5AC_READ)))
HGOTO_ERROR(H5E_OHDR, H5E_CANTPROTECT, NULL, "unable to protect object header")
/* Call the "real" read routine */
if(NULL == (ret_value = H5O_msg_read_oh(loc->file, dxpl_id, oh, type_id, mesg))) // XXX: read the message from the object header
HGOTO_ERROR(H5E_OHDR, H5E_READERROR, NULL, "unable to read object header message")
Inside H5O_msg_read_oh, the application uses the type_id argument to determine the message type. The message type in turn determines which callback is used to handle the message. This dispatch occurs within the macro H5O_LOAD_NATIVE at H5Omessage.c:545.
src/H5Omessage.c:517
void *
H5O_msg_read_oh(H5F_t *f, hid_t dxpl_id, H5O_t *oh, unsigned type_id,
void *mesg)
{
const H5O_msg_class_t *type; /* Actual H5O class type for the ID */
unsigned idx; /* Message's index in object header */
void *ret_value = NULL;
...
for(idx = 0; idx < oh->nmesgs; idx++)
if(type == oh->mesg[idx].type)
break;
...
H5O_LOAD_NATIVE(f, dxpl_id, 0, oh, &(oh->mesg[idx]), NULL)
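The lookup-then-dispatch pattern in the excerpt above can be sketched in isolation. This is a simplified illustration, not the library's code: msg_class, msg, read_message and the two decoders are hypothetical stand-ins for H5O_msg_class_t, the oh->mesg[] entries and the per-class decode callbacks. The sketch also returns an error when no message matches, which is an assumption about reasonable behaviour rather than a statement about what H5O_msg_read_oh does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for H5O_msg_class_t: an id plus a decode callback. */
typedef struct msg_class {
    unsigned id;
    int (*decode)(const unsigned char *raw);
} msg_class;

/* Hypothetical stand-in for an entry in oh->mesg[]. */
typedef struct msg {
    const msg_class *type;
    const unsigned char *raw;
} msg;

static int decode_first_byte(const unsigned char *raw) { return raw[0]; }
static int decode_second_byte(const unsigned char *raw) { return raw[1]; }

static const msg_class CLASS_A = { 1, decode_first_byte };
static const msg_class CLASS_B = { 3, decode_second_byte };
static const msg_class CLASS_UNUSED = { 9, decode_first_byte };

/* Find the first message of the wanted class and run its decoder.
 * Returns -1 when no message matches, instead of indexing past the array. */
int read_message(const msg *msgs, size_t nmesgs, const msg_class *wanted)
{
    size_t idx;

    for (idx = 0; idx < nmesgs; idx++)
        if (msgs[idx].type == wanted)
            break;
    if (idx == nmesgs)
        return -1;
    return msgs[idx].type->decode(msgs[idx].raw);
}

/* Small driver: a two-message header, looked up by class. */
int demo_read(const msg_class *wanted)
{
    static const unsigned char raw[] = { 0x16, 0x03 };
    const msg msgs[] = { { &CLASS_A, raw }, { &CLASS_B, raw } };

    return read_message(msgs, 2, wanted);
}
```

The class pointer doubles as the lookup key and as the function-pointer table, which is why a single comparison in the loop is enough to select the decoder.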
Inside the H5O_LOAD_NATIVE macro, the application will select a structure containing function pointers out of the msg->type field. This structure contains various functions that are used to decode the message. When decoding a msg of type H5O_DTYPE_ID, the library will dispatch into the H5O_dtype_shared_decode function. This function will eventually call H5O_dtype_decode. Inside H5O_dtype_decode, the application will then call H5O_dtype_decode_helper which is responsible for decoding the data types.
src/H5Oshared.h:50
static H5_INLINE void *
H5O_SHARED_DECODE(H5F_t *f, hid_t dxpl_id, H5O_t *open_oh, unsigned mesg_flags,
unsigned *ioflags, const uint8_t *p)
{
...
/* Decode native message directly */
if(NULL == (ret_value = H5O_SHARED_DECODE_REAL(f, dxpl_id, open_oh, mesg_flags, ioflags, p))) // XXX: \
HGOTO_ERROR(H5E_OHDR, H5E_CANTDECODE, NULL, "unable to decode native message")
} /* end else */
\
src/H5Odtype.c:1091
static void *
H5O_dtype_decode(H5F_t *f, hid_t H5_ATTR_UNUSED dxpl_id, H5O_t H5_ATTR_UNUSED *open_oh, unsigned H5_ATTR_UNUSED mesg_flags,
unsigned *ioflags/*in,out*/, const uint8_t *p)
{
...
/* Allocate datatype message */
if(NULL == (dt = H5T__alloc()))
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, NULL, "memory allocation failed")
/* Perform actual decode of message */
if(H5O_dtype_decode_helper(f, ioflags, &p, dt) < 0)
HGOTO_ERROR(H5E_DATATYPE, H5E_CANTDECODE, NULL, "can't decode type")
Inside H5O_dtype_decode_helper, the library reads a dword from the file and uses the bottom 4 bits to determine the datatype class. If the class is H5T_COMPOUND (6), the library enters the case at src/H5Odtype.c:260. At the beginning of this case, the library masks the flags field to obtain the number of members and allocates space for that many members.
src/H5Odtype.c:133
static htri_t
H5O_dtype_decode_helper(H5F_t *f, unsigned *ioflags/*in,out*/, const uint8_t **pp, H5T_t *dt)
{
...
case H5T_COMPOUND:
{
...
dt->shared->u.compnd.nmembs = flags & 0xffff;
if(dt->shared->u.compnd.nmembs == 0)
HGOTO_ERROR(H5E_DATATYPE, H5E_BADVALUE, FAIL, "invalid number of members: %u", dt->shared->u.compnd.nmembs)
dt->shared->u.compnd.nalloc = dt->shared->u.compnd.nmembs; // XXX: proof-of-concept sets this to 3
dt->shared->u.compnd.memb = (H5T_cmemb_t *)H5MM_calloc(dt->shared->u.compnd.nalloc * sizeof(H5T_cmemb_t)); // XXX: buffer that's later written to
dt->shared->u.compnd.memb_size = 0;
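The bit manipulation described above can be sketched as follows. The sketch assumes the layout given in the HDF5 file format specification for the datatype message: class in bits 0-3, version in bits 4-7, and the class bit field in the following 24 bits, whose low 16 bits hold the member count for compound types. The type dtype_header, both functions and the sample bytes are hypothetical and are not taken from the proof-of-concept file.

```c
#include <assert.h>
#include <stdint.h>

typedef struct dtype_header {
    unsigned cls;      /* datatype class, e.g. 6 for compound */
    unsigned version;  /* datatype message version */
    unsigned nmembs;   /* member count; only meaningful for compound types */
} dtype_header;

/* Decode the first four bytes of a datatype message (little-endian). */
dtype_header decode_dtype_header(const uint8_t p[4])
{
    uint32_t word = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    dtype_header h;

    h.cls = word & 0x0f;             /* bottom 4 bits select the class */
    h.version = (word >> 4) & 0x0f;  /* next 4 bits are the version */
    h.nmembs = (word >> 8) & 0xffff; /* low 16 bits of the class bit field */
    return h;
}

/* Hypothetical sample: a version 1 compound type with three members. */
dtype_header demo_header(void)
{
    static const uint8_t bytes[4] = { 0x16, 0x03, 0x00, 0x00 };

    return decode_dtype_header(bytes);
}
```

With these sample bytes the member count is 3, the same count the annotation above attributes to the proof-of-concept, so the allocation would hold three H5T_cmemb_t entries.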
Immediately afterwards, the library enters a loop whose bound is the number of members from the prior snippet. On each iteration, the library reads a number of dimensions that is passed to the function H5T__array_create. Although the library checks that the number of dimensions is bounded by 4, the check is done via an assertion. When the library is built in production mode[3], this assertion is removed by the preprocessor.
src/H5Odtype.c:282
for(i = 0; i < dt->shared->u.compnd.nmembs; i++) { // XXX: u.array.ndims
unsigned ndims = 0; /* Number of dimensions of the array field */
htri_t can_upgrade; /* Whether we can upgrade this type's version */
hsize_t dim[H5O_LAYOUT_NDIMS]; /* Dimensions of the array */
H5T_t *array_dt; /* Temporary pointer to the array datatype */
H5T_t *temp_type; /* Temporary pointer to the field's datatype */
...
if(version == H5O_DTYPE_VERSION_1) {
/* Decode the number of dimensions */
ndims = *(*pp)++; // XXX: ndims can be changed within the loop
HDassert(ndims <= 4); // XXX: assertion, if ndims > 4 then H5T_array_create will read oob
*pp += 3; /*reserved bytes */
...
} /* end if */
...
if(version == H5O_DTYPE_VERSION_1) {
...
if((array_dt = H5T__array_create(temp_type, ndims, dim)) == NULL) { // XXX: ndims is passed here
...
} /* end if */
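The difference between an assertion and a run-time check can be shown with a small sketch. RELEASE_ASSERT below is a hypothetical macro that models what an assertion expands to in a production build (the standard assert() behaves this way when NDEBUG is defined); it is not the definition of HDassert. The second function shows one possible way to reject the value instead, and is an illustration rather than the vendor's fix.

```c
#include <assert.h>

#define MAX_DIMS 4

/* Models an assertion macro in a production build: it expands to nothing. */
#define RELEASE_ASSERT(cond) ((void)0)

/* Returns the dimension count it would go on to use. */
int accept_ndims_assert_only(unsigned ndims)
{
    RELEASE_ASSERT(ndims <= MAX_DIMS); /* no code is generated for this */
    return (int)ndims;
}

/* Returns the dimension count, or -1 when the value is out of range. */
int accept_ndims_checked(unsigned ndims)
{
    if (ndims > MAX_DIMS)
        return -1; /* reject malformed input at run time */
    return (int)ndims;
}
```

Passing 0x80, the ndims value visible in the debugger session, goes through the assertion-only variant unchanged, while the checked variant rejects it.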
Inside H5T__array_create, the library uses the ndims value as the bound of a loop that calculates the size of the array. Because the index can run out of bounds of the 4-element array, the loop can assign an arbitrary value to u.array.ndims and u.array.nelem. These fields are part of a union within the structure that they are written to, and can therefore be used to change the length of the loop after the space has already been allocated.
src/H5Tarray.c:179
H5T_t *
H5T__array_create(H5T_t *base, unsigned ndims, const hsize_t dim[/* ndims */])
{
H5T_t *ret_value; /* new array data type */
unsigned u; /* local index variable */
...
/* Build new type */
if(NULL == (ret_value = H5T__alloc()))
HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, NULL, "memory allocation failed")
ret_value->shared->type = H5T_ARRAY;
...
/* Set the array parameters */
ret_value->shared->u.array.ndims = ndims; // XXX: writes to u.compnd.nmembs
/* Copy the array dimensions & compute the # of elements in the array */
for(u = 0, ret_value->shared->u.array.nelem = 1; u < ndims; u++) {
H5_CHECKED_ASSIGN(ret_value->shared->u.array.dim[u], size_t, dim[u], hsize_t);
ret_value->shared->u.array.nelem *= (size_t)dim[u]; // XXX: multiply using uninitialized values. writes to u.compnd.nalloc
} /* end for */
/* Set the array's size (number of elements * element datatype's size) */
ret_value->shared->size = ret_value->shared->parent->shared->size * ret_value->shared->u.array.nelem; // XXX
...
FUNC_LEAVE_NOAPI(ret_value)
} /* end H5T__array_create */
The structures that overlap are located within the H5T_shared_t definition in src/H5Tpkg.h:288. In this structure, the “u” field is a union that contains both an H5T_array_t and an H5T_compnd_t, both of which are used within the loop explained in the prior snippet.
src/H5Tpkg.h:288
typedef struct H5T_shared_t {
hsize_t fo_count; /* number of references to this file object */
...
struct H5T_t *parent;/*parent type for derived datatypes */
union {
H5T_atomic_t atomic; /* an atomic datatype */
H5T_compnd_t compnd; /* a compound datatype (struct) */
H5T_enum_t enumer; /* an enumeration type (enum) */
H5T_vlen_t vlen; /* a variable-length datatype */
H5T_opaque_t opaque; /* an opaque datatype */
H5T_array_t array; /* an array datatype */
} u;
} H5T_shared_t;
In these structures, H5T_array_t.nelem occupies the same storage as H5T_compnd_t.nalloc, and H5T_array_t.ndims occupies the same storage as H5T_compnd_t.nmembs (on the 32-bit build used here, where size_t and unsigned are both 4 bytes). Both structures are defined below. The fields that are used to control the allocation and the loop are marked.
src/H5Tpkg.h:273
typedef struct H5T_array_t {
size_t nelem; /* total number of elements in array */ // XXX: modified using elements outside of the dims variable
unsigned ndims; /* member dimensionality */ // XXX: modified inside H5T__array_create
size_t dim[H5S_MAX_RANK]; /* size in each dimension */
} H5T_array_t;
src/H5Tpkg.h:217
typedef struct H5T_compnd_t {
unsigned nalloc; /*num entries allocated in MEMB array*/ // XXX: used to control the allocation
unsigned nmembs; /*number of members defined in struct*/ // XXX: used to terminate the loop
H5T_sort_t sorted; /*how are members sorted? */
hbool_t packed; /*are members packed together? */
H5T_cmemb_t *memb; /*array of struct members */
size_t memb_size; /*total of all member sizes */
} H5T_compnd_t;
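The field overlap can be checked with a reduced model of the two structures. The sketch below keeps only the two leading fields of each and uses uint32_t for both size_t and unsigned, which assumes a 32-bit (ILP32) build like the one in the debugger output; on a 64-bit build the offsets would differ. All type and function names are hypothetical, and the sketch demonstrates only the layout overlap within a single union instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified 32-bit models of the leading fields of H5T_array_t and
 * H5T_compnd_t; uint32_t stands in for both size_t and unsigned. */
typedef struct array_model {
    uint32_t nelem;
    uint32_t ndims;
} array_model;

typedef struct compnd_model {
    uint32_t nalloc;
    uint32_t nmembs;
} compnd_model;

typedef union type_model {
    array_model array;
    compnd_model compnd;
} type_model;

/* Store through the array view, then read back through the compound view. */
uint32_t nmembs_after_array_write(uint32_t nmembs, uint32_t ndims)
{
    type_model t;

    t.compnd.nalloc = nmembs; /* allocation sized for nmembs entries */
    t.compnd.nmembs = nmembs; /* loop bound as originally decoded */
    t.array.ndims = ndims;    /* same bytes as compnd.nmembs */
    return t.compnd.nmembs;
}
```

After a store of 0x80 through the array view, the compound view reports 0x80 members even though the allocation was sized for 3.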
Referring back to the loop, these two fields control when the loop terminates. Since u.array.ndims lets the library modify the value of u.compnd.nmembs, the code at line 391 will write outside the bounds of the allocation. This is a heap-based buffer overflow and can lead to code execution in the context of the application using the library.
src/H5Odtype.c:282
for(i = 0; i < dt->shared->u.compnd.nmembs; i++) { // XXX: u.array.ndims
... src/H5Odtype.c:391 ...
/* Member size */
dt->shared->u.compnd.memb[i].size = temp_type->shared->size; // XXX: writes outside of bounds of loop.
dt->shared->u.compnd.memb_size += temp_type->shared->size;
/* Set the field datatype (finally :-) */
dt->shared->u.compnd.memb[i].type = temp_type;
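A loop of this shape stays inside its allocation only if the index is checked against the allocated count rather than against a bound that can change. The following is a minimal sketch of such a check under simplified assumptions: member is a hypothetical one-field stand-in for H5T_cmemb_t, fill_members is not a library function, and the check is an illustration rather than the vendor's patch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct member {
    unsigned size;
} member;

/* Fill up to `nmembs` entries of an array allocated for `nalloc` entries.
 * Returns the number of entries written, or -1 if allocation fails or the
 * loop bound has outgrown the allocation. */
int fill_members(unsigned nalloc, unsigned nmembs)
{
    member *memb = calloc(nalloc, sizeof *memb);
    unsigned i;

    if (memb == NULL)
        return -1;
    for (i = 0; i < nmembs; i++) {
        if (i >= nalloc) { /* the check the vulnerable loop lacks */
            free(memb);
            return -1;
        }
        memb[i].size = i;
    }
    free(memb);
    return (int)i;
}
```

With an allocation of 3 entries, a bound of 3 completes normally, while a bound of 0x80 is rejected on the fourth iteration instead of writing past the buffer.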
$ gdb -q --args bin/h5stat poc.hdf
1542 ../../../tools/h5stat/h5stat.c: No such file or directory.
(gdb) bp src/H5Odtype.c:278
Breakpoint 4 at 0xb6b04b3f: file ../../src/H5Odtype.c, line 278.
(gdb) bp src/H5Odtype.c:312
Breakpoint 5 at 0xb6b07356: file ../../src/H5Odtype.c, line 312.
(gdb) bp src/H5Odtype.c:352
Breakpoint 6 at 0xb6b091f7: file ../../src/H5Odtype.c, line 352.
(gdb) bp src/H5Odtype.c:392
Breakpoint 7 at 0xb6b0a852: file ../../src/H5Odtype.c, line 392.
(gdb) r
Starting program: $HOME/hdf5-1.8.16/release/bin/h5stat poc.hdf
Filename: poc.hdf
Breakpoint 3, H5O_dtype_decode_helper (f=f@entry=0x83f0e48, ioflags=ioflags@entry=0xbfffed6c, pp=pp@entry=0xbfffed1c, dt=dt@entry=0x83df358) at ../../src/H5Odtype.c:278
278 dt->shared->u.compnd.memb = (H5T_cmemb_t *)H5MM_calloc(dt->shared->u.compnd.nalloc * sizeof(H5T_cmemb_t));
(gdb) p dt->shared->u.compnd.nalloc * sizeof(H5T_cmemb_t)
$1 = 0x30
(gdb) n
279 dt->shared->u.compnd.memb_size = 0;
(gdb) p dt->shared->u.compnd.memb
$2 = (H5T_cmemb_t *) 0x83f4070
(gdb) ba dt->shared->u.compnd.memb + dt->shared->u.compnd.nalloc * sizeof(H5T_cmemb_t)
Hardware watchpoint 7: *(dt->shared->u.compnd.memb + dt->shared->u.compnd.nalloc * sizeof(H5T_cmemb_t))
(gdb) c
Continuing.
Hardware watchpoint 7: *(dt->shared->u.compnd.memb + dt->shared->u.compnd.nalloc * sizeof(H5T_cmemb_t))
Old value = 0x0
New value = <unreadable>
H5T__array_create (base=base@entry=0x83df448, ndims=ndims@entry=0x80, dim=dim@entry=0xbfffebc8) at ../../src/H5Tarray.c:206
206 ret_value->shared->u.array.nelem *= (size_t)dim[u];
(gdb) c
Continuing.
Breakpoint 6, H5O_dtype_decode_helper (f=f@entry=0x83f0e48, ioflags=ioflags@entry=0xbfffed6c, pp=pp@entry=0xbfffed1c, dt=dt@entry=0x83df358) at ../../src/H5Odtype.c:392
392 dt->shared->u.compnd.memb[i].size = temp_type->shared->size;
(gdb) n
Catchpoint 2 (signal SIGSEGV), 0x08148372 in H5O_dtype_decode_helper (f=f@entry=0x83f0e48, ioflags=ioflags@entry=0xbfffed6c, pp=pp@entry=0xbfffed1c, dt=dt@entry=0x83df358) at ../../src/H5Odtype.c:392
392 dt->shared->u.compnd.memb[i].size = temp_type->shared->size;
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb)
==2061==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xb2b20758 at pc 0xb699e18c bp 0xbfa0e618 sp 0xbfa0e610
WRITE of size 4 at 0xb2b20758 thread T0
#0 0xb699e18b in H5T__array_create $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Tarray.c:205
#1 0xb629b2e4 in H5O_dtype_decode_helper $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Odtype.c:352
#2 0xb628d881 in H5O_dtype_decode $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Odtype.c:1108
#3 0xb6259fd8 in H5O_dtype_shared_decode $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Oshared.h:84
#4 0xb6335a5c in H5O_msg_read_oh $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Omessage.c:554
#5 0xb63338a6 in H5O_msg_read $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Omessage.c:483
#6 0xb57d3b96 in H5D__open_oid $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Dint.c:1245
#7 0xb57d0df7 in H5D_open $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Dint.c:1153
#8 0xb56763f9 in H5Dopen2 $HOME/hdf5-1.8.16/memcheck/src/../../src/H5D.c:368
#9 0x80e0ecd in dataset_stats $HOME/hdf5-1.8.16/memcheck/tools/h5stat/../../../tools/h5stat/h5stat.c:473
#10 0x80d1d39 in obj_stats $HOME/hdf5-1.8.16/memcheck/tools/h5stat/../../../tools/h5stat/h5stat.c:685
#11 0x81d307d in traverse_cb $HOME/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5trav.c:237
#12 0xb5c6a66a in H5G_visit_cb $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gint.c:939
#13 0xb5cbea72 in H5G__node_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gnode.c:1026
#14 0xb54b2c85 in H5B_iterate_helper $HOME/hdf5-1.8.16/memcheck/src/../../src/H5B.c:1175
#15 0xb54b06db in H5B_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5B.c:1220
#16 0xb5d17773 in H5G__stab_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gstab.c:565
#17 0xb5ce2af2 in H5G__obj_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gobj.c:707
#18 0xb5c67be2 in H5G_visit $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gint.c:1174
#19 0xb6022f7d in H5Lvisit_by_name $HOME/hdf5-1.8.16/memcheck/src/../../src/H5L.c:1378
#20 0x81bed2e in traverse $HOME/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5trav.c:310
#21 0x81c9df5 in h5trav_visit $HOME/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5trav.c:1164
#22 0x80cf9e3 in main $HOME/hdf5-1.8.16/memcheck/tools/h5stat/../../../tools/h5stat/h5stat.c:1623
#23 0xb506ea82 (/lib/i386-linux-gnu/libc.so.6+0x19a82)
#24 0x80cde74 in _start ($HOME/hdf5-1.8.16/memcheck/bin/h5stat+0x80cde74)
0xb2b20758 is located 0 bytes to the right of 168-byte region [0xb2b206b0,0xb2b20758)
allocated by thread T0 here:
#0 0x80b6b8e in calloc ($HOME/hdf5-1.8.16/memcheck/bin/h5stat+0x80b6b8e)
#1 0xb6093d5b in H5MM_calloc $HOME/hdf5-1.8.16/memcheck/src/../../src/H5MM.c:107
#2 0xb6982712 in H5T__alloc $HOME/hdf5-1.8.16/memcheck/src/../../src/H5T.c:3462
#3 0xb699d08c in H5T__array_create $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Tarray.c:192
#4 0xb629b2e4 in H5O_dtype_decode_helper $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Odtype.c:352
#5 0xb628d881 in H5O_dtype_decode $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Odtype.c:1108
#6 0xb6259fd8 in H5O_dtype_shared_decode $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Oshared.h:84
#7 0xb6335a5c in H5O_msg_read_oh $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Omessage.c:554
#8 0xb63338a6 in H5O_msg_read $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Omessage.c:483
#9 0xb57d3b96 in H5D__open_oid $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Dint.c:1245
#10 0xb57d0df7 in H5D_open $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Dint.c:1153
#11 0xb56763f9 in H5Dopen2 $HOME/hdf5-1.8.16/memcheck/src/../../src/H5D.c:368
#12 0x80e0ecd in dataset_stats $HOME/hdf5-1.8.16/memcheck/tools/h5stat/../../../tools/h5stat/h5stat.c:473
#13 0x80d1d39 in obj_stats $HOME/hdf5-1.8.16/memcheck/tools/h5stat/../../../tools/h5stat/h5stat.c:685
#14 0x81d307d in traverse_cb $HOME/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5trav.c:237
#15 0xb5c6a66a in H5G_visit_cb $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gint.c:939
#16 0xb5cbea72 in H5G__node_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gnode.c:1026
#17 0xb54b2c85 in H5B_iterate_helper $HOME/hdf5-1.8.16/memcheck/src/../../src/H5B.c:1175
#18 0xb54b06db in H5B_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5B.c:1220
#19 0xb5d17773 in H5G__stab_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gstab.c:565
#20 0xb5ce2af2 in H5G__obj_iterate $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gobj.c:707
#21 0xb5c67be2 in H5G_visit $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Gint.c:1174
#22 0xb6022f7d in H5Lvisit_by_name $HOME/hdf5-1.8.16/memcheck/src/../../src/H5L.c:1378
#23 0x81bed2e in traverse $HOME/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5trav.c:310
#24 0x81c9df5 in h5trav_visit $HOME/hdf5-1.8.16/memcheck/tools/lib/../../../tools/lib/h5trav.c:1164
#25 0x80cf9e3 in main $HOME/hdf5-1.8.16/memcheck/tools/h5stat/../../../tools/h5stat/h5stat.c:1623
#26 0xb506ea82 (/lib/i386-linux-gnu/libc.so.6+0x19a82)
SUMMARY: AddressSanitizer: heap-buffer-overflow $HOME/hdf5-1.8.16/memcheck/src/../../src/H5Tarray.c:205 H5T__array_create
2016-05-08 - Discovery
2016-05-17 - Vendor Notification
2016-11-15 - Public Disclosure
[1] https://en.wikipedia.org/wiki/Hierarchical_Data_Format
[2] http://www.hdfgroup.org/HDF5/
Discovered by Cisco Talos.