Tensor is a small library for building neural nets. With it you can implement many standard neural nets, like multilayer feed-forward nets with back propagation, Jordan nets, Giles nets etc.

Here is what the code for training a two layer back propagation net would look like:


/* feed forward to get output given the input */ hidden.cross_product( input, upper, 1 ).squash( s_func ); h_net.load( hidden.t_values()).squash( s_func_der ); output.cross_product( hidden, lower, 1).squash( s_func ); o_net.load( output.t_values()).squash( s_func_der ); /* get the error vector */ error.subtract( target, output ); error.serial_mult( error, o_net ); /* back propagate and modify weights of mapping matrixes */ h_err.bp_get_hidden_error( lower, error, h_net ); lower.bp_update_layer_momentum( error, hidden, o_mom ); upper.bp_update_layer_momentum( h_err, input, h_mom );

Below is the documentation for tensor:

Introduction


This is a document on how to use the tensor library 'tensor.cc'. This is a very small linear algebra library designed primarily for coding up neural nets quickly. Basically it implements a 'tensor multiplication' algorithm and a few other things.

If you don't know very much linear algebra you won't understand what this code does, or be able to use it. In an emergency you can learn almost all the linear algebra you need from chapter 9 of Rumelhart and MacClelland's Parallel Distributed Processing vol 1.

Concept


A fully connected layer in a feed forward net is just a vector multiplied by a matrix, yielding another vector. This new vector holds the activation values for the output units of that layer. ( Many nets will then pass all of these through a non-linear function ). This output vector can be used as an input to a lower layer. Simply multiply the output vector by the lower matrix to get the final output vector for the next layer.

Single layer net:

vec1 * Matrix1 = vec2

Two layer net:

vec1 * Matrix1 = vec2,
vec2 * Matrix2 = vec3

'tensor.cc' implements a handy matrix multiply, which allows you to string together matrices and vectors quickly to make custom nets.



How the Library Works


A 'tensor' is like a matrix except it can have more ( or fewer ) than two dimensions.

Here's the concept, a vector is an array of numbers, and a matrix can be thought of as an array of vectors. It is easy to extend this idea to include cubic structures of numbers that are arrays of matrices, or hyper-cubic structures that are 'arrays' of the previous cubes. This whole class of objects is what the 'tensor' objects in this library try to represent.

A vector has one dimension, its length can range from one to an arbitrarily large number. A matrix can be measured along two dimensions, its rows and columns. Cubic structures will have three measurable dimensions, hyper-cubic ones will have four etc. 'Tensor' objects will represent any of these. So we define a property of tensor objects called the 'dimension' of the object. Vectors are tensors of dimension 1, matrices are of dimension 2.

For a given tensor object, with n dimensions, there is an array of n numbers that represent the length of each of its dimensions. We will call this array the 'extensions' of a tensor. So a vector of length 7 can be thought of as a tensor of dimension (1) with a single extension, (7). A 4 X 2 matrix is a tensor of dimension (2) with extensions ( 4, 2 )

Imagine a matrix which happens to have only one row, for example, a ( 5, 1 ) matrix. It has the same structure as a vector. Likewise a vector of length 5 could be interpreted as a matrix ( 5, 1). In fact, it is always possible to take a tensor of lower dimension and make its dimension higher by padding its extensions with 1's. So a tensor of dimension (1), extension ( 5 ) has the same structure as one of dimension 5, extensions ( 5, 1, 1, 1, 1 ). The 1's on the end of an extension work just like leading zeroes on a number. A scalar value, is a tensor of dimension 0, with extensions ( null ), which is the same structure as a vector, dimension 1, extensions ( 1 ).

Tensors of a higher dimension can always be brought to a lower dimension if any of their extensions are 1. So a tensor d ( 4 ), e ( 2, 1, 3, 1 ), has the same structure as one d( 2 ), e( 2, 3 ).

Thinking of vectors and matrices this way allows you to define a single multiply operation between all tensors instead of multiple ones for scalars and vectors, vectors and matrices, matrices and matrices etc.

Here's how it works. When you multiply a vector by a matrix the vector must be the same length as one of the dimensions of the matrix. So you can multiply a vector of length n only by a matrix of size n X m. The result is always a vector of length m. Likewise a matrix may only be multiplied by another matrix if one of the dimensions of each matrix match. The product is a matrix with dimensions equal to the non-matching dimensions of the original matrices.

(n X m) * (n X p) = (m X p)

It is also possible if two matrices have both dimensions matching to multiply them such that the result is a scalar, the 'inner' product.

(n X m) . (n X m) = (1 X 1)

For the same matrices the 'outer' product would have been:

(n X m ) * (n X m) = (m X m)

A pattern is emerging, one that will let us define a multiply operation that works on any two tensors.

A matrix multiply is just a summation of multiplied terms. For example the inner product of two matrices A and B both of size (m X n ) can be expressed as ( using E for sigma ):

Emn Amn Bmn = C

Their outer product is either:

Em Amn Bmo = Cno

or:

En Amn Bon = Cmo

Two matrices can be multiplied different ways depending on how many 'indexes' ( extensions ) are summed over. More generally for any two tensor objects there are as many different multiply operations as there are ways to collect and sum over the extensions.

We can organize this by first defining the 'depth' of a multiply as the number of extensions which will be summed over. Of course the summed over extensions must match in both tensors. It is also possible to sum over no extensions. The result of such a multiply always has all of the extensions that were not summed over.

Multiply two tensors by first reordering the 'extensions' such that the extensions that are to be summed over are at the beginning of the list, and that these extensions match for both tensors. Then specify the depth. This can only be as large as there are matching extensions. Reordering the the extensions amounts to doing the equivalent of a transpose on a matrix. So when the extensions are changed you should imagine the tensor object being oriented in 'space'.

So two matrices (n X m ) have a product of depth 2 ( the inner product ) which is a tensor d ( 0 ) ( a scalar ). They have a product of depth 1 ( the outer product ) which is a tensor of d( 2 ), ( a matrix either m X m or n X n depending on which extension was summed over ). They also have a product of depth 0 , a tensor d ( 4 ), e ( n, m, n ,m ).

In some neural net implementations it is possible to have sums with more than two elements that have indexes. For example the feed forward dynamics of a second order recurrent net can be expressed as (from C.L. Giles et al., Second Order Recurrent Neural Networks, Neural Computation Vol 4 Num 3 ):

oi = Ejk Wijk Sj Ik

To express this as the multiplication of tensors you just need to remember not to sum ( that is 'sum' with depth 0 ) until you do the last multiply. So first you multiply ( S, I, 0 ) to get a tensor T, d ( 2 ), e ( j ,k ). Then multiply ( W, T, 2 ) ( with W's extensions oriented j,k,i and T's oriented j,k ) to get a tensor O, d( 1 ), e ( i ).

Note that for this kind of network the connection weights ( W ) are represented by a tensor of dimension 3. This would be inconvenient for neural net libraries based only on vector and matrix objects.

Implementation


A 'Tensor' class object has the following members:

int dimension     /* the dimension of the tensor, also the length of
                     extensions array */
int *extensions   /* an array containing the extensions */
double *values    /* an array containing the 'body' of the tensor */

int size          /* this is the length of the values array. It is
                     redundant because it is the product of all of the
                     extensions, but useful and save recomputing it */
char *name        /* a string to identify the tensor, useful for
                     diagnostics */

A tensor object can be declared using the constructor 'tensor'. a tensor can be declared with no arguments in which case a tensor of d(0), e( null ), with no name is allocated. A tensor with specific attributes can be declared by providing the tensor constructor with the dimension and an array of extensions:

int ext_array[4] = { 4, 3, 2, 1 };
tensor my_tensor( 4, ext_array, "My Tensor" );

tensor temp( "Temporary" )

Here two tensors are declared, one named "My Tensor", d( 4 ) , e( 4, 3, 2, 1 ). The other named "Temporary", is d( 0 ), e ( null ). It is also possible to duplicate a tensor using the constructor:

tensor new_tensor( my_tensor, "copy of my_tensor" );

This makes a new tensor, 'new_tensor', which is identical to 'my_tensor' except that the name field is different.

The Tensor class has the following member functions:

tensor& transpose( int e1=0, int e2=1 );
This function is used to 'orient' a tensor. It does the equivalent of a matrix transpose. It takes two numbers as arguments. These are indexes to the extensions array of the tensor. It switches these extensions in the array and move all the values in the values array to the proper place, such that a value that was accessed in the value array with one set of indexes will now be accessible with the two indexes in question transposed. Calling transpose with two equal arguments is a big waste of time.

tensor& widen( int dim );
Takes a tensor of dimension less than 'dim' and raises it to dimension 'dim', lengthens the extensions array appropriately and pads it with 1's. The values array is not touched.

tensor& narrow( int dim = 0 );
Takes a tensor with some extensions that are 1's and removes as many as necessary to get the tensor down to 'dim' dimensions. If there are not enough 1's it removes all that exist. Values array is not touched.

tensor& print();
Prints the tensor object to stdout.

tensor& add_name( char *nam );
Fills in the name field in a tensor. If the tensor already has a name it is replaced.

tensor& load( double *val );
Copies the contents of the array 'val' to the 'values' array of the tensor. Care should be taken to make sure that 'val' is of the right length. If it is too short whatever garbage is in memory at the end of the array may be silently copied into the tensor.

tensor& resize( int dim, int *ext );
Gives a tensor object a new dimension and extensions. The values array is discarded and initialized to zeroes. This call basically replaces the current tensor with a new one. It is used mostly by other functions ( like cross_mult ) who if they are passed a tensor of the wrong size for the result must 'resize' it.

tensor& subtensor( tensor &t1, int *from_ext, int *to_ext );
given tensor t1, makes a smaller tensor which is only a part of the original ( t1 ). The resulting tensor will have in it's values array only values that can be indexed by indexes between the numbers specified by from_ext and to_ext. The subtensor is of the same dimension as the original and has extensions equal to the difference between the values in to_ext and from_ext. To return only the third and fourth columns in a matrix:

int array1[ big_tensor.t_dimension() ] = { 0, 2 };
int array2[ big_tensor.t_dimension() ] = { 0, 3 };
small_tensor.subtensor( big_tensor, array1, array2 );

tensor& add( tensor &t1, tensor &t2 );
element-wise addition of the values arrays of two tensors. Both tensors must have equal 'size'.

tensor& subtract( tensor &t1, tensor &t2 );
element-wise subtraction of the values arrays of two tensors. Both tensors must have equal 'size'.

tensor& serial_mult( tensor &t1, tensor &t2 );
element-wise multiplication of the values arrays of two tensors. Both tensors must have equal 'size'.

tensor& cross_product( tensor &t1, tensor &t2, int dim );
Given two tensors and a number that expresses the 'depth' of the operation. The first 'depth' members of the extensions array must match. The best way to understand the operation is to just carefully read the code.

tensor& scalar_add( double s );
Adds the scalar value 's' to each element of the 'values' array.

tensor& scalar_mult( double s );
Mulitplies each member of the 'values' array by 's'.

tensor& normalize() { return scalar_mult( 1/length()); }
Takes a tensor and returns one whose 'direction' is the same but whose 'magnitude' is 1. For a one dimensional tensor ( a vector ) this operation makes the Euclidean distance between the origin and the point specified by the vector equal to 1 while the 'direction' of the vector is unchanged.

tensor& concatenate( tensor &t1, tensor &t2 );
Takes the values arrays of two tensors and concatenates them. Both tensors must be of the same dimension and have extensions that match all except for the last extension. The output tensor will have it's last extension equal the sum of the last extensions of the input tensors.

tensor& randomize();
Initializes the elements of the 'values' array to small ( -1.0 > n < 1.0 ) random values. Useful for 'symmetry breaking' in deterministic nets.

tensor& identity( int dim, int ext );
Given the number of desired dimensions and the length of one extension, creates a tensor of 'dim' dimensions, all of whose extensions are equal to 'ext', and where all of the elements of the 'values' array are equal to zero except those values that can be indexed by the same value for all extensions are equal to one. Essentially this creates an identity tensor that when multiplied by a tensor of the same dimensions and extensions will output that tensor itself. ( i.e. an identity matrix is a square matrix that has all ones across its diagonal and zeroes elsewhere ). An identity tensor can be used to model Kronecker's delta in equations.

double length ();
In one dimensional tensors returns the euclidean distance between the origin and a point specified by the values of the vector.

double distance ( tensor &t1 );
Distance between two tensors. For a one dimensional tensor this is the Euclidean distance between the points specified by the values of the two vectors. Distance is only defined between two tensors of equal dimensions and matching extensions. Often used as an error measure.

double angle ( tensor &t1 );
With one dimensional tensors returns the angle, in radians, between two vectors with the same origin. Often used as an error measure.

double projection ( tensor &t1 );
With one dimensional tensors returns the length of one vector projected on another.

tensor& squash( PFDD );
Applies the input function ( a function of type "double (*)(double)" - that is a function from type double to type double ) to each value of the 'values' array. This is an easy way to apply a non-linear or squashing function to the activation values of a vector. A logistic function is provided for this purpose but any function of the right type will do. In most neural net models it is only required that this function be continuous and differentiable. Here is an example:

double sigmoid( double x );
my_tensor.squash( sigmoid );

tensor& bp_update_layer( tensor &error_t, tensor &output_t );
Updates the mapping matrix for a layer of a multilayer feed-forward neural net using the 'back-propagation' learning rule. Takes the mapping matrix as the implicit argument and two vectors, the output vector of the layer above, and an error vector and updates the mapping matrix. The amount of learning can be changed by setting the values of the global 'learn_rate'.

tensor& bp_update_layer_momentum( tensor &error_t, tensor &output_t, tensor &momentum_t );
Just like 'bp_update_layer', except that it also take a momentum vector as an argument. This momentum vector represents a 'sum' of the previous changes done to the mapping matrix. The momentum term can be scaled by the global 'momentum'.

tensor& bp_get_hidden_error( tensor &l_map_t, tensor &l_error_t, tensor &d_squashed_net_t );
Calculates the error vector for a hidden layer. Takes the mapping matrix of the layer below the hidden layer, the error vector for the lower layer, and the output of the hidden layer squashed by the derivative of the squashing function, and the mapping matrix of the hidden layer ( as the implicit argument ) and returns the error vector for the hidden layer.

int t_dimension() { return dimension; }
int* t_extensions() { return extensions; }
double* t_values() { return values; }
int t_size() { return size; }
char* t_name() { return name; }

These five functions are 'access functions'. Use these to directly access the elements of the tensor object.

my_tensor.t_dimension();
my_tensor.t_extensions()[4];

The first example returns the dimension of 'my_tensor', the second return the fifth element of the extensions array (array's being numbered from zero ).

The general philosophy of many of these functions is that the output tensor object is always passed as the implicit argument ( *this ). No checking is done on whether the implicit argument is of the right size to hold the result. If it is the wrong size the tensor object is simply resized. In an iterative system like a neural net most objects will only get resized once. On the second pass the output tensors will already be the right size.

So to multiply two tensors t1 and t2 and have the result be stored in t3 the following call may be used:

t3.cross_mult( t1, t2, 2 );

Some checking is done on the arguments. Most often both of the explicit arguments must have some properties in common, being of the same dimension, having matching extensions, or having equal 'size'. If the checks fail the operation is not performed, an error message is printed and the output tensor is returned untouched. The program will not come to a halt.

Non Tensor Class Functions


There are two functions included that are not part of any class:

double sigmoid( double x );

Maps a logistic function from doubles to doubles.

double sigmoid_derivative( double x );

When composed with the 'sigmoid' function will return the derivative of the logisitic function. That is to get the derivative of the logisitic function first pass the value through sigmoid then through sigmoid_derivative:

double x;
x = sigmoid( x );       /* x = f(x)  */
x = sigmoid_derivative( x );    /* x = f'(x) */

This is usually a convenient way of saving on computation. Calculating the derivative of the logisitic function can be expensive especially if the tensor is large. Since most of that work is done when you calculate the logisitic itself, and you usually have to calculate the logistic first ( in the feed-forward stage ) before you need to calculate it's derivative ( in the learning stage ), you can just send the already 'squashed' values through sigmoid_derivative to get the correct values. It must be stressed that by itself sigmoid_derivative is not a very useful function don't make the mistake of using it on vectors that have not already been 'squashed'.



Coef_counter Objects


In using the tensor library there is never a need to use coef_counter objects directly. They are used internally by tensor class member functions to save on computation involved in calculating the index into the 'values' array specified by indexes in the extensions. They are purely an optimization. Tensor objects would run fine without them ( just slower ).