The Dynamic Array is an excellent all purpose container class. It can be used for many purposes where you might use regular arrays or linked lists. A dynamic array is just an array that reallocates its own memory as it grows. In C a normal array is allocated 'statically', where the amount of memory allocated for the array does not change. Often you don't know ahead of time how much memory will be needed. The simplest solution is to allocate a buffer bigger than you are likely to need. This has two problems. You are likely to waste memory. If the amount is small this won't matter much, but allowing some slack in even a 1 meg buffer, may mean wasting 10 megs of core. Secondly there is no way to recover if you run out of memory. The program will just overwrite whatever comes after the array in memory.

A better solution is to use a dynamic container like a linked list or a dynamic array. A linked list is a series of small memory structures where each structure has a pointer to the next, and each structure has enough memory to store one or more objects:

struct LINK { LINK* ptr; ------> struct LINK FOO_OBJECT object; { } LINK* ptr; ------> FOO_OBJECT object; }

The list ends when the pointer is set to a 'magic' value, typically NULL. To add more objects to the list you need only allocate another LINK, and add it to the end of the list. One great strengths of linked lists is that if you need to add objects in a particular order, the new links can be inserted anywhere in the list almost as easily as they can be inserted at the head or the tail of the list.

Linked lists though have several drawbacks and tend to be overused. First they waste a lot of memory. It may seem that because they allocate blocks of memory exactly as big as they need each time, that the linked list is very efficient with memory. But this is not true because most system memory managers allocate more actual memory than they need for each block that is requested of them. For each block allocated by the system, most memory managers add a header with their bookkeeping information. Most memory managers only allocate memory down to some multiple of the machine's word size. So if your machine has a 4 byte word size, the memory manager may only round the amount of memory up top the nearest 4 or 16 or 32 bytes. Now these may not seem like a big waste but linked lists are often used to hold say a single int or pointer in each link. So for a 4 byte int, you need four more bytes for the link pointer, then if you want a back pointer, which is very useful, that's four more, for 24 byes, now add in a 32 byte header for malloc, and round that up to the nearest 16, for 64 bytes to store each 4 byte int. Many memory managers only allocate memory in blocks that are powers of 2.

Another problem with linked lists is that they are very slow. To read through each element you must follow a pointer to a new random location in memory. Linked list intensive programs usually spend a good amount of their time just following these pointers. Linked lists are also slow to be created, especially if you make calls to the system's malloc. ( directly or with new ). If you don't know which memory manager you are using and how long each malloc takes... you should. And if you can't find out, or your program will be used on systems where you cannot predict which memory manager will be used, assume the manager will be unacceptably slow, and will get slower as the number of calls to malloc increase, because on many systems this is the case.

For most purposes a dynamic array is a better generic container class. In a dynamic array, a small array is allocated, with a count of the number of items stored in the array. Whenever an item is added to the array the count increases, whenever an item is removed the count decrements. Should the original array get filled up, a new larger buffer is allocated and the contents are copied into the new array, the old smaller buffer is then deallocated. How big the original array is and how much it is increased by each time it is filled can be tailored to the application.

A good general purpose setting is the make the original array small, and double the array's size each time more space is needed. At first this may seem a little reckless. One could imagine the array growing out of control easily, and wasting a huge amount of memory. In practice it is actually very well behaved. With an array that doubles in size each time, the original array can be set very small and it will get to any desired size in a small number of increases. ( to go from 32 bytes to one megabyte would be 15 doublings ) If the array is always increasing and decreasing in size, it will reach its maximum in a few iterations, and will then be unlikely to need to allocate more memory. A linked list by contrast would need to call malloc continuously. A doubling dynamic array will only tend to waste about 30% of the allocated memory. Since all the memory is in one large buffer, the memory manager's own overhead for each block will tend to be negligible. Each entry will take up only as many bytes as it needs in the array. The only waste will be in the part of the pre-allocated array that is unused. Right after a doubling the waste will be about 50%, but this is only an upper bound. Typically the array will continue to fill up to some middling range, where the waste will average around 30%. Now many memory managers only allocate memory in blocks that are powers of two anyway, so they would have wasted this memory anyway, so asking for power of two blocks ( or in some cases power of two minus the block header size ) at least makes this memory potentially useful.

The C++ implementation is quite straightforward. The only tricky part of this code is the template syntax. Note that this complete implementation fits in a single header file, which is a good idea since it uses templates and many compilers do not handle template instantiation in the source files very well. ( They need to resort to tricks like hidden files and directories, and with secret copies of machine altered source code, and dependencies in the order of .cc compilation, and #pragma etc. ):

#ifndef dblarray_h #define dblarray_h #include #include #include #include #ifndef DBLARRAY_SIZE #define DBLARRAY_SIZE 16 #endif template class DBLARRAY { public: DBLARRAY() { _num = 0; _size = 0; _list = 0; } DBLARRAY( int size ) { _num = 0; _size = size; _list = new T[_size]; } ~DBLARRAY(){ delete _list; } int Add( T& obj ) { if ( !_size ) { _size = DBLARRAY_SIZE; _list = new T[_size]; } if ( _size == _num ) { _size <<= 1; T* tmp = new T[_size]; memcpy( tmp, _list, ( _num * sizeof( T ))); delete (void*)_list; _list = tmp; } _list[ _num ] = (T&)obj; memset( &obj, sizeof(T), 0 ); return _num++; } T Remove() { if ( !_num ){ cerr << "DBLARRAY Underflow (@" << (void*)this << ")\n"; exit(0); } return _list[ --_num ]; } T Remove( int i ) { if ( i > _num ) { cerr << "DBLARRAY attempt to remove object " << i << " from array with only " << _num << " objects\n"; exit(0); } T t = _list[i]; memmove( _list+i, _list+i+1, (( _num - i )*sizeof(T))); _num--; return t; } void Clear(){ _num = 0; } T& operator[]( int i ){ return _list[i]; } T operator[]( int i ) const { return _list[i]; } int IsEmpty(){ return ( _num == 0 ); } int Size() const { return _size; } int Num() const { return _num; } protected: T* _list; int _size; int _num; };

In this example the array only holds on to pointers to the objects it 'holds'. It does not assume any responsibility for deleting the objects when the container is deleted, since it cannot know ahead of time if anyone else is also holding a pointer to its client objects. It is up to the programmer to remember to delete them. Of course this is in keeping with the philosophy that the module which allocates a block of memory should be responsible for freeing it also.

The above example is also missing a few useful features, like generic file i/o routines, an 'add unique' function which would only add items that are not already in the array, and various types of housekeeping functions ( 'donate' the _list member to a caller, and allocate a new _list member, resize the _list member to a given size, or to the minimum required etc. ). All of these functions are straightforward to add.

The dynamic array object can be extended easily to implement other common containers like stacks, queues and even linked lists. To implement a stack you need only add a 'push' and 'pop' member, which work just like the existing 'Add()' and 'Remove()' functions respectively. They can be worked into a derived class for a clearer interface:

template class DBLSTACK : public DBLARRAY< T > { public: DBLSTACK( int size = DBLARRAY_SIZE ): DBLARRAY< T >( size ){} void Push( T& obj ){ Add( obj ); } T Pop(){ if ( !_num ) { cerr << "DBLSTACK underflow (@" << (void*)this << ")\n"; exit(0);} return _list[ --_num ]; } };

A queue is very similar except you want to retrieve objects in the order that they come in:

template class DBLQUEUE : public DBLARRAY< T > { public: DBLQUEUE( int size = DBLARRAY_SIZE ): DBLARRAY< T >( size ) { next_idx = 0; } T Next() { if ( next_idx == _num ) next_idx = 0; return _list[ _next_idx++ ]; } private: int next_idx; };

In the previous example, the 'Next()' operation just returns the next queue item without removing it from the list, unlike the 'DBL_STACK' variant which actually removes the pop'd item. The particular new application should drive the design of new variants.

A linked list can be simulated with two arrays, one which holds the objects themselves and another with indexes into the first array which indicate the next item on the list. This simulated linked list has some advantages over actual lists with pointers; it uses less memory, it can be constructed faster and it has easy random access, while still having many of the linked list advantages like easy re-ordering, and easy ordered insertion.