Here is what the code for training a two layer back propagation net would look like:
Below is the documentation for tensor:
Introduction
This is a document on how to use the tensor library 'tensor.cc'. This is a very small linear algebra library designed primarily for coding up neural nets quickly. Basically it implements a 'tensor multiplication' algorithm and a few other things.
If you don't know very much linear algebra you won't understand
what this code does, or be able to use it. In an emergency you can learn
almost all the linear algebra you need from chapter 9 of Rumelhart
and MacClelland's Parallel Distributed Processing vol 1.
Concept
A fully connected layer in a feed forward net is just a vector multiplied by a matrix, yielding another vector. This new vector holds the activation values for the output units of that layer. ( Many nets will then pass all of these through a non-linear function ). This output vector can be used as an input to a lower layer. Simply multiply the output vector by the lower matrix to get the final output vector for the next layer.
Single layer net:
vec1 * Matrix1 = vec2
Two layer net:
vec1 * Matrix1 = vec2, vec2 * Matrix2 = vec3
'tensor.cc' implements a handy matrix multiply, which allows you to string together matrices and vectors quickly to make custom nets.
How the Library Works
A 'tensor' is like a matrix except it can have more ( or fewer ) than two dimensions.
Here's the concept, a vector is an array of numbers, and a matrix can be thought of as an array of vectors. It is easy to extend this idea to include cubic structures of numbers that are arrays of matrices, or hyper-cubic structures that are 'arrays' of the previous cubes. This whole class of objects is what the 'tensor' objects in this library try to represent.
A vector has one dimension, its length can range from one to an arbitrarily large number. A matrix can be measured along two dimensions, its rows and columns. Cubic structures will have three measurable dimensions, hyper-cubic ones will have four etc. 'Tensor' objects will represent any of these. So we define a property of tensor objects called the 'dimension' of the object. Vectors are tensors of dimension 1, matrices are of dimension 2.
For a given tensor object, with n dimensions, there is an array of n numbers that represent the length of each of its dimensions. We will call this array the 'extensions' of a tensor. So a vector of length 7 can be thought of as a tensor of dimension (1) with a single extension, (7). A 4 X 2 matrix is a tensor of dimension (2) with extensions ( 4, 2 )
Imagine a matrix which happens to have only one row, for example, a ( 5, 1 ) matrix. It has the same structure as a vector. Likewise a vector of length 5 could be interpreted as a matrix ( 5, 1). In fact, it is always possible to take a tensor of lower dimension and make its dimension higher by padding its extensions with 1's. So a tensor of dimension (1), extension ( 5 ) has the same structure as one of dimension 5, extensions ( 5, 1, 1, 1, 1 ). The 1's on the end of an extension work just like leading zeroes on a number. A scalar value, is a tensor of dimension 0, with extensions ( null ), which is the same structure as a vector, dimension 1, extensions ( 1 ).
Tensors of a higher dimension can always be brought to a lower dimension if any of their extensions are 1. So a tensor d ( 4 ), e ( 2, 1, 3, 1 ), has the same structure as one d( 2 ), e( 2, 3 ).
Thinking of vectors and matrices this way allows you to define a single multiply operation between all tensors instead of multiple ones for scalars and vectors, vectors and matrices, matrices and matrices etc.
Here's how it works. When you multiply a vector by a matrix the vector must be the same length as one of the dimensions of the matrix. So you can multiply a vector of length n only by a matrix of size n X m. The result is always a vector of length m. Likewise a matrix may only be multiplied by another matrix if one of the dimensions of each matrix match. The product is a matrix with dimensions equal to the non-matching dimensions of the original matrices.
(n X m) * (n X p) = (m X p)
It is also possible if two matrices have both dimensions matching to multiply them such that the result is a scalar, the 'inner' product.
(n X m) . (n X m) = (1 X 1)
For the same matrices the 'outer' product would have been:
(n X m ) * (n X m) = (m X m)
A pattern is emerging, one that will let us define a multiply operation that works on any two tensors.
A matrix multiply is just a summation of multiplied terms. For example the inner product of two matrices A and B both of size (m X n ) can be expressed as ( using E for sigma ):
Emn Amn Bmn = C
Their outer product is either:
Em Amn Bmo = Cno
or:
En Amn Bon = Cmo
Two matrices can be multiplied different ways depending on how many 'indexes' ( extensions ) are summed over. More generally for any two tensor objects there are as many different multiply operations as there are ways to collect and sum over the extensions.
We can organize this by first defining the 'depth' of a multiply as the number of extensions which will be summed over. Of course the summed over extensions must match in both tensors. It is also possible to sum over no extensions. The result of such a multiply always has all of the extensions that were not summed over.
Multiply two tensors by first reordering the 'extensions' such that the extensions that are to be summed over are at the beginning of the list, and that these extensions match for both tensors. Then specify the depth. This can only be as large as there are matching extensions. Reordering the the extensions amounts to doing the equivalent of a transpose on a matrix. So when the extensions are changed you should imagine the tensor object being oriented in 'space'.
So two matrices (n X m ) have a product of depth 2 ( the inner product ) which is a tensor d ( 0 ) ( a scalar ). They have a product of depth 1 ( the outer product ) which is a tensor of d( 2 ), ( a matrix either m X m or n X n depending on which extension was summed over ). They also have a product of depth 0 , a tensor d ( 4 ), e ( n, m, n ,m ).
In some neural net implementations it is possible to have sums with more than two elements that have indexes. For example the feed forward dynamics of a second order recurrent net can be expressed as (from C.L. Giles et al., Second Order Recurrent Neural Networks, Neural Computation Vol 4 Num 3 ):
oi = Ejk Wijk Sj Ik
To express this as the multiplication of tensors you just need to remember not to sum ( that is 'sum' with depth 0 ) until you do the last multiply. So first you multiply ( S, I, 0 ) to get a tensor T, d ( 2 ), e ( j ,k ). Then multiply ( W, T, 2 ) ( with W's extensions oriented j,k,i and T's oriented j,k ) to get a tensor O, d( 1 ), e ( i ).
Note that for this kind of network the connection weights ( W ) are
represented by a tensor of dimension 3. This would be inconvenient for
neural net libraries based only on vector and matrix objects.
Implementation
A 'Tensor' class object has the following members:
int dimension /* the dimension of the tensor, also the length of
extensions array */
int *extensions /* an array containing the extensions */
double *values /* an array containing the 'body' of the tensor */
int size /* this is the length of the values array. It is
redundant because it is the product of all of the
extensions, but useful and save recomputing it */
char *name /* a string to identify the tensor, useful for
diagnostics */
A tensor object can be declared using the constructor 'tensor'. a tensor can be declared with no arguments in which case a tensor of d(0), e( null ), with no name is allocated. A tensor with specific attributes can be declared by providing the tensor constructor with the dimension and an array of extensions:
int ext_array[4] = { 4, 3, 2, 1 };
tensor my_tensor( 4, ext_array, "My Tensor" );
tensor temp( "Temporary" )
Here two tensors are declared, one named "My Tensor", d( 4 ) , e( 4, 3, 2, 1 ). The other named "Temporary", is d( 0 ), e ( null ). It is also possible to duplicate a tensor using the constructor:
tensor new_tensor( my_tensor, "copy of my_tensor" );
This makes a new tensor, 'new_tensor', which is identical to 'my_tensor' except that the name field is different.
The Tensor class has the following member functions:
tensor& transpose( int e1=0, int e2=1 );
This function is used to 'orient' a tensor. It does the equivalent
of a matrix transpose. It takes two numbers as arguments. These are
indexes to the extensions array of the tensor. It switches these
extensions in the array and move all the values in the values array
to the proper place, such that a value that was accessed in the value
array with one set of indexes will now be accessible with the two
indexes in question transposed. Calling transpose with two equal
arguments is a big waste of time.
tensor& widen( int dim );
Takes a tensor of dimension less than 'dim' and raises it to
dimension 'dim', lengthens the extensions array appropriately and
pads it with 1's. The values array is not touched.
tensor& narrow( int dim = 0 );
Takes a tensor with some extensions that are 1's and removes
as many as necessary to get the tensor down to 'dim' dimensions.
If there are not enough 1's it removes all that exist. Values array
is not touched.
tensor& print();
Prints the tensor object to stdout.
tensor& add_name( char *nam );
Fills in the name field in a tensor. If the tensor already has a name
it is replaced.
tensor& load( double *val );
Copies the contents of the array 'val' to the 'values' array of the
tensor. Care should be taken to make sure that 'val' is of the right
length. If it is too short whatever garbage is in memory at the end
of the array may be silently copied into the tensor.
tensor& resize( int dim, int *ext );
Gives a tensor object a new dimension and extensions. The values array
is discarded and initialized to zeroes. This call basically replaces
the current tensor with a new one. It is used mostly by other
functions ( like cross_mult ) who if they are passed a tensor of the
wrong size for the result must 'resize' it.
tensor& subtensor( tensor &t1, int *from_ext, int *to_ext );
given tensor t1, makes a smaller tensor which is only a part of the
original ( t1 ). The resulting tensor will have in it's values array
only values that can be indexed by indexes between the numbers
specified by from_ext and to_ext. The subtensor is of the same
dimension as the original and has extensions equal to the difference
between the values in to_ext and from_ext. To return only the third
and fourth columns in a matrix:
int array1[ big_tensor.t_dimension() ] = { 0, 2 };
int array2[ big_tensor.t_dimension() ] = { 0, 3 };
small_tensor.subtensor( big_tensor, array1, array2 );
tensor& add( tensor &t1, tensor &t2 );
element-wise addition of the values arrays of two tensors. Both
tensors must have equal 'size'.
tensor& subtract( tensor &t1, tensor &t2 );
element-wise subtraction of the values arrays of two tensors. Both
tensors must have equal 'size'.
tensor& serial_mult( tensor &t1, tensor &t2 );
element-wise multiplication of the values arrays of two tensors.
Both tensors must have equal 'size'.
tensor& cross_product( tensor &t1, tensor &t2, int dim );
Given two tensors and a number that expresses the 'depth' of the
operation. The first 'depth' members of the extensions array must
match. The best way to understand the operation is to just carefully
read the code.
tensor& scalar_add( double s );
Adds the scalar value 's' to each element of the 'values' array.
tensor& scalar_mult( double s );
Mulitplies each member of the 'values' array by 's'.
tensor& normalize() { return scalar_mult( 1/length()); }
Takes a tensor and returns one whose 'direction' is the same
but whose 'magnitude' is 1. For a one dimensional tensor
( a vector ) this operation makes the Euclidean distance between
the origin and the point specified by the vector equal to 1 while
the 'direction' of the vector is unchanged.
tensor& concatenate( tensor &t1, tensor &t2 );
Takes the values arrays of two tensors and concatenates them.
Both tensors must be of the same dimension and have extensions that
match all except for the last extension. The output tensor will have
it's last extension equal the sum of the last extensions of the input
tensors.
tensor& randomize();
Initializes the elements of the 'values' array to small
( -1.0 > n < 1.0 ) random values. Useful for 'symmetry breaking'
in deterministic nets.
tensor& identity( int dim, int ext );
Given the number of desired dimensions and the length of one
extension, creates a tensor of 'dim' dimensions, all of whose
extensions are equal to 'ext', and where all of the elements
of the 'values' array are equal to zero except those values
that can be indexed by the same value for all extensions are equal
to one. Essentially this creates an identity tensor that when
multiplied by a tensor of the same dimensions and extensions will
output that tensor itself. ( i.e. an identity matrix is a square
matrix that has all ones across its diagonal and zeroes elsewhere ).
An identity tensor can be used to model Kronecker's delta in
equations.
double length ();
In one dimensional tensors returns the euclidean distance between
the origin and a point specified by the values of the vector.
double distance ( tensor &t1 );
Distance between two tensors. For a one dimensional tensor this
is the Euclidean distance between the points specified by the
values of the two vectors. Distance is only defined between two
tensors of equal dimensions and matching extensions. Often used
as an error measure.
double angle ( tensor &t1 );
With one dimensional tensors returns the angle, in radians, between
two vectors with the same origin. Often used as an error measure.
double projection ( tensor &t1 );
With one dimensional tensors returns the length of one vector
projected on another.
tensor& squash( PFDD );
Applies the input function ( a function of type "double (*)(double)"
- that is a function from type double to type double ) to each value
of the 'values' array. This is an easy way to apply a non-linear or
squashing function to the activation values of a vector. A logistic
function is provided for this purpose but any function of the right
type will do. In most neural net models it is only required that this
function be continuous and differentiable. Here is an example:
double sigmoid( double x ); my_tensor.squash( sigmoid );
tensor& bp_update_layer( tensor &error_t, tensor &output_t );
Updates the mapping matrix for a layer of a multilayer feed-forward
neural net using the 'back-propagation' learning rule. Takes
the mapping matrix as the implicit argument and two vectors, the
output vector of the layer above, and an error vector and updates
the mapping matrix. The amount of learning can be changed by setting
the values of the global 'learn_rate'.
tensor& bp_update_layer_momentum( tensor &error_t, tensor &output_t, tensor &momentum_t );
Just like 'bp_update_layer', except that it also take a momentum
vector as an argument. This momentum vector represents a 'sum'
of the previous changes done to the mapping matrix. The momentum
term can be scaled by the global 'momentum'.
tensor& bp_get_hidden_error( tensor &l_map_t, tensor &l_error_t, tensor &d_squashed_net_t );
Calculates the error vector for a hidden layer. Takes
the mapping matrix of the layer below the hidden layer, the error
vector for the lower layer, and the output of the hidden layer
squashed by the derivative of the squashing function, and the mapping
matrix of the hidden layer ( as the implicit argument ) and returns
the error vector for the hidden layer.
int t_dimension() { return dimension; }
int* t_extensions() { return extensions; }
double* t_values() { return values; }
int t_size() { return size; }
char* t_name() { return name; }These five functions are 'access functions'. Use these to directly access the elements of the tensor object.
my_tensor.t_dimension(); my_tensor.t_extensions()[4];
The first example returns the dimension of 'my_tensor', the second return the fifth element of the extensions array (array's being numbered from zero ).
The general philosophy of many of these functions is that the output tensor object is always passed as the implicit argument ( *this ). No checking is done on whether the implicit argument is of the right size to hold the result. If it is the wrong size the tensor object is simply resized. In an iterative system like a neural net most objects will only get resized once. On the second pass the output tensors will already be the right size.
So to multiply two tensors t1 and t2 and have the result be stored in t3 the following call may be used:
t3.cross_mult( t1, t2, 2 );
Some checking is done on the arguments. Most often both of the explicit
arguments must have some properties in common, being of the same dimension,
having matching extensions, or having equal 'size'. If the checks fail
the operation is not performed, an error message is printed and the
output tensor is returned untouched. The program will not come to a halt.
Non Tensor Class Functions
There are two functions included that are not part of any class:
double sigmoid( double x );
Maps a logistic function from doubles to doubles.
double sigmoid_derivative( double x );
When composed with the 'sigmoid' function will return the derivative of the logisitic function. That is to get the derivative of the logisitic function first pass the value through sigmoid then through sigmoid_derivative:
double x; x = sigmoid( x ); /* x = f(x) */ x = sigmoid_derivative( x ); /* x = f'(x) */
This is usually a convenient way of saving on computation. Calculating the derivative of the logisitic function can be expensive especially if the tensor is large. Since most of that work is done when you calculate the logisitic itself, and you usually have to calculate the logistic first ( in the feed-forward stage ) before you need to calculate it's derivative ( in the learning stage ), you can just send the already 'squashed' values through sigmoid_derivative to get the correct values. It must be stressed that by itself sigmoid_derivative is not a very useful function don't make the mistake of using it on vectors that have not already been 'squashed'.
Coef_counter Objects
In using the tensor library there is never a need to use coef_counter
objects directly. They are used internally by tensor class member functions
to save on computation involved in calculating the index into the 'values'
array specified by indexes in the extensions. They are purely an optimization.
Tensor objects would run fine without them ( just slower ).