######################################################################
######################################################################
##                                                                  ##
##                                                                  ##
##                          dynet                                   ##
##                                                                  ##
##                a recurrent neural network                        ##
##              for modelling dynamical systems                     ##
##                                                                  ##
##                           by                                     ##
##                                                                  ##
##                    Coryn Bailer-Jones                            ##
##                                                                  ##
##                         22/05/98                                 ##
##                                                                  ##
##             email: calj@mpia-hd.mpg.de                           ##
##               www: http://www.mpia-hd.mpg.de/stars/calj          ##
##                                                                  ##
##                                                                  ##
##        see the README file for disclaimer and warranty           ##
##        see the dynet_manual file for operational details         ##
##                                                                  ##
##        This file is copyright 1998 by C.A.L. Bailer-Jones        ##
##                                                                  ##
##                                                                  ##
######################################################################
######################################################################


FILE:		dynet_manual
DESCRIPTION:    operations manual for dynet
AUTHOR:		Coryn Bailer-Jones
LAST MOD DATE:	02.03.99 


######################################################################
######################################################################
##                                                                  ##
##                   dynet operations manual                        ##
##                                                                  ##
######################################################################
######################################################################


This file provides the information required to use the dynet
software. It assumes an understanding of the principles behind dynet.
This manual is relevant to software versions 1.19 and up, although
some of the program files were rearranged for version 1.21.

Contents
--------
1.  What dynet is
2.  Other sources of information
3.  The dynet files
4.  How to run dynet
5.  The specfile
6.  File formats
7.  The synthetic problem
8.  Tips on using dynet
9.  Modifications since the previous release version
10. The dynet program



######################################################################
#                            What dynet is                           #
######################################################################

dynet is a recurrent neural network for modelling dynamical systems by
means of discrete time measurement of temporal patterns produced by
the dynamical system. Its learning routine is fully recurrent, and can
be viewed as performing temporal interpolation of one or more temporal
patterns.




######################################################################
#                    Other sources of information                    #
######################################################################

The paper "A Recurrent Neural Network for Modelling Dynamical
Systems", by Bailer-Jones, MacKay & Withers 1998, Network: Computation
in Neural Systems, 9, 531-547, (hereafter BJMW) is available from my
web page: 
http://www.mpia-hd.mpg.de/stars/calj/dynet_paper.html 
I strongly recommend you read this paper, as this manual does not
explain the principles behind dynet, only use of the software.

The source code for dynet is reasonably well documented.

The conjugate gradient optimizer "macopt" has its own source of
documentation, which can be obtained from
http://wol.ra.phy.cam.ac.uk/mackay/c/macopt.html

Once you have read the BJMW paper and this manual, further enquiries
can be addressed to the author at the email address at the head of
this file.  Related publications and other information are available
from the dynet web page
http://www.mpia-hd.mpg.de/stars/calj/dynet.html




######################################################################
#                        The dynet files                             #
######################################################################

Makefile	the makefile
README		disclaimer and warranty. Please read this before you
		proceed.
ansi/		a directory which contains macopt (David MacKay's
		conjugate gradient optimizer) and its ancillary
		files. This is the standard macopt distribution
		and has not been modified in any way.
dynet		the dynet executable
dynet.c		main dynet program
dynet.h		header file for dynet
dynet_manual	this file
netsubs.c	collection of subroutines
netsubs.h	header file for netsubs
ran1.c		random number generator
syn/		a directory containing the data files for a synthetic
		problem (see section "synthetic problem" below)
syn.26.spec	the specfile for the synthetic problem




######################################################################
#                           How to run dynet                         #
######################################################################

First unpack the tar file, dynet.tar. Type

tar xvf dynet.tar

This will create a directory called "dynet" which will contain all of
the files listed above.

dynet is written in ANSI C and was developed on SunOS, Solaris and
Linux platforms. The gcc compiler has been used.  The executable
included in this release is for SunOS 5.5.  To get it to run under any
other flavour of UNIX you'll need to compile it yourself. To do this,
first compile macopt and then dynet:

cd dynet/ansi/
make test_macII
cd ../
make dynet

This will work for SunOS 5.5. For other UNIX platforms you may need to
adjust the makefile. (For some reason it compiles under the IS machine
jura (Linux) but crashes at run time.) The compilation will give a few
warnings, but these are not important and can be ignored.  Note that
in one place dynet uses a UNIX-specific command via the C function
"system" (although this only prints the date, so just delete that line
from dynet.c if it causes problems).

To run dynet, type

dynet specfile

where specfile is the only command line argument which dynet reads.
Typing "dynet -v" will give the version number only.
See the section "synthetic problem" for an example application.

When running dynet, most of the screen output is diagnostic
information, in particular warnings and error messages. You can reduce
it by lowering the verbosity value in the specfile. As all the
information you need is currently given in the output files, the
screen dump will not be explained here. Some of the screen dump is
from macopt: see the documentation on that code for details.



######################################################################
#                            The specfile                            #
######################################################################

The specfile contains all of the relevant information on the
temporal pattern (tp) files, network architecture and training
details. specfiles should be given the ".spec" suffix.

The specfile is read by searching for strings, such as
"train_network?_(yes/no)"; these strings must therefore not be
changed, or they will be ignored. The relevant input for each
string must be the next item on the same line, e.g.

train_network?_(yes/no)				yes

The items in parentheses "(yes/no)" indicate the choices available.
Note that any line in the specfile preceded by a "#" symbol will be
ignored. Inputs which disobey the required type or range will be
flagged as errors by dynet. The only exception is when real values are
specified instead of integers, in which case only the integer part of
the number will be used (ANSI C %f to %d conversion). The order of the
input strings is arbitrary, although there are a few restrictions,
most of which are obvious and are indicated below.
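As an illustration of this search-based parsing, here is a minimal
sketch in C. The function name and details are hypothetical, not taken
from the dynet source; it only shows the idea of scanning for a key
string and reading the next item on the line.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the string-search specfile parsing (illustrative only;
   not from the dynet source). Scan whitespace-separated tokens,
   skipping comment lines that begin with '#'; when the key string is
   found, read the next token as its value. Returns 1 on success,
   0 if the key is absent. */
int spec_find_string(FILE *fp, const char *key, char *value)
{
    char token[256];
    rewind(fp);
    while (fscanf(fp, "%255s", token) == 1) {
        if (token[0] == '#') {                /* comment: skip the rest of the line */
            int c;
            while ((c = getc(fp)) != EOF && c != '\n')
                ;
            continue;
        }
        if (strcmp(token, key) == 0)          /* key found: read its value */
            return fscanf(fp, "%255s", value) == 1;
    }
    return 0;                                 /* key not present */
}
```

This also shows why the key strings must not be altered: a misspelled
key simply never matches, so the option silently keeps its default.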

The various input strings in the specfile are described below, along
with the possible choices (given in round parentheses if not part of
the relevant string already) and the default value (in square
brackets). Note that some strings do not have default values: their
values must be specified or dynet will exit with an error
message. Note also that a few of the options have not yet been
implemented or fully de-bugged: these are flagged below.  Most of the
input strings are prefixed by a three letter code indicating to what
part of the code they are relevant:

NET  network architecture
TRN  training of network
GRD  gradient descent
MAC  macopt (conjugate gradient optimizer)
APP  application of network

Note that in earlier versions of this manual the state variables were
often referred to as "recurrent inputs".  Although I now use the term
"state variable", there may still be some reference to "recurrent
inputs" in the program source code. The terms are synonymous.

The possible entries in the specfile are now listed and discussed.


verbosity_level_(0/1/2/3/4)			[2]
 - Amount of output from the program, ranging from 0 (nothing apart
   from error messages) to 4 (lots of diagnostic stuff).  Level 2 is
   appropriate for normal running. Level 1 will reduce this to
   only errors and warnings; level 0 will only report errors.

train_network?_(yes/no)				[no]
 - do you want to learn the weights from a given set of data.

apply_network?_(yes/no)				[no]
 - do you want to apply the network to a set of data.

NET:number_of_state_variables_(V)		(+ve integer) [no default]
 - total number of state variables

NET:number_of_measured_state_variables_(Vm)	(+ve integer) [no default]
 - number of measured state variables. Must have V >= Vm.  If V-Vm > 0,
   then V-Vm is the number of "unmeasured" state variables (see BJMW).

NET:number_of_external_inputs_(X)		(integer >= 0) [no default]
 - number of external inputs (excludes input-hidden bias, which is
   automatically included)

NET:number_of_hidden_nodes_(H)			(integer >= 0) [no default]
 - number of hidden nodes in the one and only hidden layer (excludes
   hidden-output bias, which is automatically included). Note that you
   can set H=0, although I have no idea why you may want to do this.

NET:data_scaling_(none/var/maxmin/netsize)	[var]
 - the external inputs and state variables can, and should, be
   scaled. "var" separately scales each input and state variable to
   have zero mean and unit standard deviation (this is the recommended
   option). If all values for a given input or state variable node
   are the same, then the standard deviation is forced to 1.0 to prevent 
   divide by zero. "maxmin" is not yet implemented. The "netsize" option
   ensures that the summed input to the hidden layer does not grow with
   the number of inputs. The input-hidden transfer function is H =
   tanh(Hlam*S), where S is the sum over the product of each input and
   its associated input-hidden weight.  If the "netsize" option is
   used, Hlam is set to 1/sqrt(Xsize+1+Vsize). This scaling of Hlam
   is also included as part of the "var" scaling option.

NET:input_weight_file				(file name) [no default]
 - if dynet has already been trained and a weight file produced, this
   weight file can be read in using this option. This is used to
   continue training from a given set of weights, or if you just want
   to use these weights to evaluate state variable sequences for a
   given sequence of external inputs. Often you will train and apply
   in a single run, in which case this field does not need to be
   set. Weight files should be given the suffix ".wt".  See section
   below for details of file format. If you are using the data scaling
   option and continue training from a set of weights using a new set
   of training data, new scale values will not be calculated: the 
   scale values in the input weight file will be used.

TRN:output_weight_file				(file name) [dynet.wt]
 - The default weight file name is only there in case you forget to
   specify it yourself. If you just apply the network with a set of
   weights which you read in (using 'NET:input_weight_file') then this
   option will be ignored, i.e. the weights file will not be
   re-written.  Weight files should be given the suffix ".wt".  See
   section below for details of file format.

TRN:error_file					(file name) [no default]
 - name of the file in which to write the error values at each training
   iteration.  See section below for details of file format.

TRN:form_of_weight_init_(uniform/gaussian)	[uniform]
 - initial weights for the network are drawn from a uniform or a
   Gaussian distribution. (Actually, they can only be drawn from a
   uniform distribution as the Gaussian option has not been
   implemented).

TRN:initial_weight_range			(real value) [0.1]
 - scale of random distribution from which initial weights are
   drawn. If "uniform" distribution has been chosen, it will
   range from -wtrng to +wtrng, where wtrng is the value specified here.

TRN:random_number_seed				(integer value) [731]
 - used to seed selection of initial weights. 
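The interaction of these three options (distribution, range, seed) can
be sketched as follows. This is illustrative only; dynet itself uses
the ran1 generator supplied in ran1.c, not the rand() shown here.

```c
#include <stdlib.h>

/* Sketch of uniform weight initialization: each weight is drawn
   uniformly from [-wtrng, +wtrng], reproducibly given the seed.
   Illustrative only; dynet uses ran1, not rand(). */
void init_weights(double *w, int n, double wtrng, unsigned seed)
{
    int i;
    srand(seed);
    for (i = 0; i < n; i++)
        w[i] = wtrng * (2.0 * rand() / (double)RAND_MAX - 1.0);
}
```

Running twice with the same seed gives identical initial weights, which
is useful when comparing training runs.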

TRN:optimization_method_(grd/macopt)		[macopt]
 - the weights can be optimised using the gradient descent method
   (grd) or a conjugate gradient optimizer (macopt, written by David
   MacKay).  Both are implemented, but grd has not been fully tested.

TRN:update_method_(1/3/4)			[4]
 - With gradient descent, the weights can be updated in one of three
   ways:
   1. after each epoch of each temporal pattern file (this is like
      on-line learning or Real Time Recurrent Learning extended to
      multiple temporal patterns)
   3. after all epochs of each temporal pattern file
   4. after all epochs of all temporal pattern files (total batch)
   Although with macopt these options could also apply, only method
   no.4 has been implemented.

TRN:weight_decay_(none/default/list)		[none]
 - weight decay can be used to regularize the training procedure.
   1/sqrt(alpha) can be thought of as the standard deviation of the
   Gaussian prior over the weights (with zero mean).  The alpha
   parameters of the weight decay (see BJMW) can be set using the
   "list" option, or the default values can be used.  Note that the
   current version of dynet cannot learn the optimum alpha values from
   the data. If the list option is used, the following four lines must
   be specified:
TRN:alpha_VH					(real >=0)
TRN:alpha_XH					(real >=0)
TRN:alpha_bH					(real >=0)
TRN:alpha_HY					(real >=0)
 - These are the alpha parameters for the state variable to hidden,
   external input to hidden, input bias to hidden, and hidden to
   output weights respectively. alpha values can be set to zero
   (regularisation turned off), e.g. if you only want to apply weight 
   decay to some sets of weights. The default value for all alphas
   is 0.04.
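For one weight group, the weight decay contribution to the error is the
usual quadratic penalty. A sketch (illustrative only; see BJMW for
dynet's exact definition):

```c
#include <math.h>

/* Sketch of the weight-decay (regularization) error for one weight
   group: werr = 0.5 * alpha * sum_i w_i^2, i.e. the negative log of a
   zero-mean Gaussian prior with variance 1/alpha, up to a constant.
   The total werr sums such a term over the four groups (VH, XH, bH,
   HY). Illustrative only; see BJMW for the exact definition. */
double wdecay_err(const double *w, int n, double alpha)
{
    double sumsq = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sumsq += w[i] * w[i];
    return 0.5 * alpha * sumsq;
}
```

Setting alpha to zero for a group makes its term vanish, which is
exactly the "regularisation turned off" case described above.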

TRN:use_beta_parameters?_(yes/no)		[no]
 - beta is the coefficient of the error term for each state variable.
   beta sets the level of modelling precision which you want to achieve
   for each state variable. In general this is limited by the noise in
   the data, so 1/sqrt(beta) should be considered as the standard 
   deviation of the noise in the state variable. If this option is set 
   to "no", all of the beta values are set to the default value, 6.0. 
   Otherwise, the user must specify the beta values for the V state 
   variables on the next V lines. Thus if V=2 the next two lines 
   would be:
TRN:beta					(real >=0)
TRN:beta					(real >=0)
   for the first and second state variables respectively. If the
   number of beta values specified is fewer than V, the remainder will
   be set to the last value of beta given (I don't recommend you use
   this, but it's useful if you're experimenting with different numbers
   of unmeasured state variables and forget to alter the number of
   betas). If any beta value for any state variable is set to zero, 
   then that state variable will not contribute anything to the error
   function. You can think of this as saying that the noise on this 
   variable is infinite, so you don't care what its value is. I don't 
   know why you may want to do this, but the option is there. Note that 
   if you use scaling, then beta is on the scale of the scaled 
   variables, not the raw values in the temporal pattern files (see the 
   "Tips" section).
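In the same spirit as the alpha terms, the beta-weighted data misfit
for one state variable can be sketched as follows (illustrative only;
see BJMW for the exact error function):

```c
#include <math.h>

/* Sketch of the beta-weighted misfit for one state variable k:
   lerr_k = 0.5 * beta_k * sum_t (v_k(t) - target_k(t))^2, summed only
   over the epochs at which a target is specified. beta_k = 0 removes
   the variable from the error entirely. Illustrative only; see BJMW. */
double misfit_err(const double *v, const double *target,
                  const int *has_target, int n, double beta)
{
    double sumsq = 0.0;
    int t;
    for (t = 0; t < n; t++)
        if (has_target[t]) {            /* skip epochs with no target */
            double d = v[t] - target[t];
            sumsq += d * d;
        }
    return 0.5 * beta * sumsq;
}
```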

TRN:number_of_temporal_pattern_files		(+ve integer) [0]
 - if the value specified is P, then the next P lines must be the
   names of the P temporal pattern files to be used for training the
   network. The required format of these files is specified in the
   next section. The files should have the suffix ".tpin1".

GRD:number_of_iterations			(+ve integer) [0]
 - if using gradient descent, this is the total number of training
   iterations which will be performed

GRD:learning_rate				(+ve real) [0.0]
 - if using gradient descent, this is the learning rate (eta).

MAC:convergence_tolerance_gradient		(+ve real) [no default]
 - if using macopt, this is the gradient convergence tolerance.
   Training will stop once the modulus of the gradient of the total
   error function is less than this value. The total error is defined
   in BJMW. This value is crucial and depends upon:
   (1) the scale of the training data
   (2) the total number of target values (error calculations)
   (3) the beta terms
   (4) the alpha terms (if using weight decay)

MAC:maximum_number_of_iterations		(+ve integer) [no default]
 - if using macopt, this is the maximum number of training iterations
   which will be performed.

MAC:perform_maccheckgrad?_(yes/no)		[no]
 - if using macopt, you can check that dynet is evaluating the
   gradient correctly by using a routine in macopt which compares the
   analytic gradient with one calculated using first differences. This
   should only be necessary when debugging, but may be worth checking
   if dynet appears to be going wild.

MAC:maccheckgrad_tolerance			(real >=0) [0.000001]
 - the tolerance at which to check the gradient.

APP:plot_file_name				(file name) [no default]
 - when dynet is applied to a new set of data, the values of the
   state variable at the last epoch for each temporal pattern file
   are written to this file. See section below for details of file format.
   File should be given the ".dat" extension.

APP:include_v(t=0)_in_plot_file?_(yes/no)	[no]
 - allows you to also have the initial v values written to the ".dat"
   file specified by "APP:plot_file_name".

APP:write_tper_files?_(yes/no)			[no]
 - Select whether or not you want an error file for each temporal
   pattern. See section below for details of file contents.

APP:number_of_temporal_pattern_files		(+ve integer) [no default]
 - if the value specified is P, then the next P lines must be the
   names of the P temporal pattern files to which dynet is applied to
   give temporal sequences. The required format of these files is
   specified in the next section. The files should have the suffix
   ".tpin1".



Obsolete options
----------------

The following specfile options are retained for compatibility with
previous versions. Do not use one of these in addition to its
replacement option! (If you do, whichever appears last in the specfile
will apply.)

MAC:convergence_tolerance			(+ve real) [no default]
 - if using macopt, this is the gradient convergence tolerance.
   In other words, once the *square* of the gradient is less than
   this value, training will halt.
   replaced with "MAC:convergence_tolerance_gradient"

NET:number_of_recurrent_inputs_(V)		(+ve integer) [no default]
 - replaced with NET:number_of_state_variables_(V)

NET:number_of_measured_recurrent_inputs_(Vm)	(+ve integer) [no default]
 - replaced with NET:number_of_measured_state_variables_(Vm)

TRN:number_of_temporal_process_files		(+ve integer)
 - replaced with TRN:number_of_temporal_pattern_files

APP:number_of_temporal_process_files		(+ve integer)
 - replaced with APP:number_of_temporal_pattern_files



######################################################################
#                           File formats                             #
######################################################################

Input files (user written):   tp input files: ".tpin1" (or ".tpin2")
Output files (dynet written): error files: ".err"
			      weight files: ".wt"
			      tp output files: ".tpot"
			      tp error files: ".tper"
			      final epoch files: ".dat"



Temporal Pattern input files (.tpin1 .tpin2)
--------------------------------------------

The temporal pattern input files contain the time series (temporal
patterns) which you wish dynet to model.  You may give them any
suffix you like, but using a name "XXX.tpin1" means that dynet will
recognise the suffix and generate corresponding output files
"XXX.tpot".


An example of the file is as follows.

# dynet tpin file  - do not add or remove lines
# #############################################
# Vm (meas rec), X (ext input), epochs:
  2              2              11
# Data (epoch/recurrent/external):
  0  0.00   0.00000   0.00000  -0.74179   0.32059
  1  0.49   x         x         0.01926   0.53881
  2  0.34   x         x        -0.83042   0.50905
  3  0.24   x         x        -0.94573  -0.51070
  4  0.27   x         x        -0.68900  -0.14798
  5  0.18   x         x         0.88249  -0.59774
  6  0.03   x         x        -0.79563   0.91946
  7  0.21   x         x         0.37771  -0.45139
  8  0.14   x         x         0.32615  -0.88162
  9  0.30   x         x         0.16759   0.60141
 10  0.49   2.24016  -0.16262   0.85671   0.37204

The header must consist of five lines. The first three are comment
lines. The fourth line has three fields:
1. number of measured state variables, Vm
2. number of external inputs, X
3. number of epochs, N
The fifth line is a comment line.
The next N lines are the data at the N epochs. The first epoch sets
the initial conditions. There are 2+Vm+X columns.
1. epoch label, t (can be any single string, e.g. consecutive integers 
   to number lines or total elapsed time)
2. time step between the current epoch and the previous epoch, dt.
   It follows from this definition that dt at the initial epoch is not
   used by dynet, but it is convenient to set this to zero for clarity.
3. The next Vm columns are the values of the Vm state variables.  When
   training dynet, these form the target values used to define the
   error which is to be minimized. (The exception is the initial values
   of v, i.e. v(t=0).)  When applying dynet, we will typically only
   have v(t=0). If v is specified at additional epochs, these will not
   be used by dynet in the application phase. Whenever a v is not
   specified an "x" or "X" should be written. If ever any v at t=0 is
   not specified (i.e. an "x" written), dynet will set it to the
   default value VINITDEF, which is zero. This is necessary as we must
   always have an initial condition for a dynamical system.
   I RECOMMEND THAT YOU ALWAYS SPECIFY THE INITIAL VALUES OF V.
   (The VINITDEF value will not be subject to any scaling you are
   using in dynet, that is, *within the network* the initial values
   will be set to VINITDEF. If scaling is used these initial values
   will translate to other values in the output files. While I think
   the code will handle all this properly, I still strongly recommend
   for your own sanity to specify v(t=0), whether or not you used
   scaling.  Your specified values will, of course, be subject to any
   scaling you use.)
   In training dynet we will often only specify v at
   the initial and final epochs. However, if v is specified at
   intermediate epochs these values will also be used to define the
   minimization error, and thus help to learn the weights.
4. The next X columns are the values of the external inputs.  These
   must be defined at every epoch (a future development of dynet is
   intended in which this restriction will be lifted).
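The handling of unspecified v values described above can be sketched
as follows (illustrative only, not the dynet source; only VINITDEF is
taken from the manual):

```c
#include <string.h>
#include <stdlib.h>

#define VINITDEF 0.0   /* default used when v(t=0) is given as "x" */

/* Sketch of reading one v field from a tpin data line: "x" or "X"
   marks an unspecified value. Returns 1 and stores the value if one
   is specified, else returns 0. Illustrative only. */
int read_v_field(const char *token, double *v)
{
    if (strcmp(token, "x") == 0 || strcmp(token, "X") == 0)
        return 0;                 /* no target at this epoch */
    *v = atof(token);
    return 1;
}
```

At t=0 a return of 0 would then mean "fall back to VINITDEF"; at later
epochs it simply means "no target contributes to the error here".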

Note that the data at a single epoch is *not* an input--target pair.
The whole point of the dynamic model in dynet (which is a discrete
approximation of a first-order differential equation) is that x and v
values at time t-1
produce a v at time t. Therefore, the target for x(t-1) and v(t-1) is
v(t), i.e. the value on the next line. It follows then that the
external inputs at the final epoch are not used by dynet. However,
they should not just be left as blanks. Just write in some number,
e.g. 0.  In a future version I'll allow you to write the conventional
"x" or "X".

The data can be specified in floating point (746.35) or exponential
(7.4635e+02 or 7.4635E+02) format.

The file suffix ".tpin2" is used for files in which the complete
state variable sequence is specified, and the suffix ".tpin1" is used
when only the initial and final state variables are specified.  Of
course, you are free to specify whatever targets you like in these
files.

You can, of course, use log values as the inputs and state
variables. However, linear (non-logged) values must be used for the dt
steps. This is on account of the first-order Taylor expansion which
evaluates the next state variables on the basis of the network output
(the derivative) and the previous state variables.



Error files (.err)
------------------

This is a dump of the network error function and the error surface
gradient as a function of iteration number. It is currently only
produced when using macopt for training. The file name is specified by
the "TRN:error_file" string in the specfile.

The file has 7 columns:

1. training iteration number
2. the likelihood error, lerr
3. the fractional contribution of lerr to the total error,
   i.e. lerr/toterr
4. the weight decay (regularization) error, werr
5. the fractional contribution of werr to the total error,
   i.e. werr/toterr
6. the total error, toterr = lerr + werr
7. the gradient, g. g = sqrt(gg), where gg is the squared gradient
   written by macopt (and written to STDOUT when verbosity>=2).

Note that the errors scale with the total number of targets defined in
the training data. The errors are also in terms of the scaled
variables internal to the program. The gradient has similar
dependencies. The data in this file is really intended to give a
qualitative indication of how training proceeds, or for making
comparisons between different network models trained with identical
data sets.


Weight files (.wt)
------------------

The weight file is written by dynet after training, the file name
being specified by the specfile string "TRN:output_weight_file".  The
weight files can also be read in by dynet using the string
"NET:input_weight_file". A typical weight file is:

# dynet weights file - do not add or remove lines
# ###############################################
# V (tot state), Vm (meas state), X (ext input), H (hidden): (exc biases)
  2              2                2              8
# scaling type: 
var
# V (state variables) mean and stdev scaling factors:
  6.17275e-01  8.04589e-01
 -1.55049e+00  3.46766e+00
# X (external input) mean and stdev scaling factors:
 -6.07996e-01  6.13417e-01
  5.71975e-01  6.80077e-01
# Lambda scale parameter for hidden layer:
  0.44721
# wtVH (state-hidden weights):
 -0.56010  -1.29233   0.45621   0.66428   2.30999   0.48269  -0.15339  -1.21327 
 -1.02396   1.33431   1.21793   1.34536   2.67156  -1.26362   0.94542   0.34031 
# wtXH (input-hidden weights):
 -1.99026   1.60542   0.50664   1.57786   4.08565  -0.11660   2.82143  -0.85219 
  0.06155   1.16499   2.59686  -0.45088  -1.31578  -2.89263   0.31608  -0.22411 
  2.50053   1.29021   1.24943   0.87688  -2.49171  -2.27946   1.72011   1.44805 
# wtHY (hidden-output weights):
  2.07704   2.22658 
  0.56307   0.22885 
  1.47584  -2.00067 
 -0.10726   0.80079 
  1.28100   3.81347 
  1.06695   1.72487 
 -2.92203  -0.39190 
  2.28566   1.23744 
  0.84620  -0.79857

The first three lines are comments. The fourth contains four fields:
1. total number of state variables, V
2. number of measured state variables, Vm
3. number of external inputs, X
4. number of hidden nodes, H
The fifth line is a comment line.
The sixth line specifies the data scaling type. If this is "var", then
the next lines specify the scaling factors, as shown in the above
example.
If "var" or "netsize" scaling is given, the Hlam parameter is also
specified. 
The weights themselves are specified in three groups:
1. wtVH (state variable-hidden weights)
2. wtXH (input-hidden weights)
3. wtHY (hidden-output weights)



Temporal Pattern output files (.tpot)
-------------------------------------

For each temporal pattern input file to which dynet is applied, a
temporal pattern output file is produced. If the input file name is
TPFILE, the corresponding output file is called TPFILE.tpot.  The
exception is when TPFILE has the suffix ".tpin1" or ".tpin2", in which
case this suffix is *replaced* with ".tpot". dynet tells you when it
runs exactly what the tpot files will be called, e.g.

syn.04.200.tpin1 -> syn.04.200.tpot

A typical tpot file is

# dynet temporal pattern output file
# ##################################
# input   file = syn.04.600.tpin1
# weights file = syn.04d.wt
# V (tot state), Vm (meas state), epochs:
  2              2                11
# State variables (epoch/measured/unmeasured):
  0  -0.00000  -0.00000 
  1   0.46486  -0.19967 
  2   0.36306  -0.32085 
  3   0.57170  -0.39786 
  4   0.98769  -0.36904 
  5   1.20681  -0.35651 
  6   1.17218  -0.36505 
  7   1.32371  -0.34871 
  8   1.27403  -0.36157 
  9   1.29278  -0.28753 
 10   2.10707  -0.46257 

The first five lines are comment lines, which tell you what the
corresponding tpin file and weight files are. The sixth line has three
fields:
1. number of state variables, V
2. number of measured state variables, Vm
3. number of epochs, N
The seventh line is a comment line.

The next N lines are the values of the V state variables at the N
epochs. The first epoch is the initial conditions.
There are V+1 columns:
1. epoch label, t, numbering from 0 to N-1 inclusive.
The next Vm columns are the Vm measured state variables.
The next V-Vm columns are the unmeasured state variables.



Temporal Pattern error files (.tper)
------------------------------------

These are very similar to the tpot files, but with the targets and
some additional error information added.  The tper files are only
written if "APP:write_tper_files?_(yes/no)" is set to "yes" in the
specfile.  See the documentation above on the format of the tpot
files. A typical tper file is:

# dynet temporal pattern error file
# ##################################
# input   file = syn.04.600.tpin1
# weights file = syn.04d.wt
# V (tot state), Vm (meas state), epochs:
  2              2                11
# Measured (state/target/state-target/|diff/target|):
  0  -0.00000 -0.00000  0.00    0.00  -0.00000 -0.00000  0.00    0.00 
  1   0.46486  -------  0.42805 ----  -0.19967  ------- -0.28822 ---- 
  2   0.36306  -------  0.32624 ----  -0.32085  ------- -0.40940 ---- 
  3   0.57170  -------  0.53489 ----  -0.39786  ------- -0.48641 ---- 
  4   0.98769  -------  0.95088 ----  -0.36904  ------- -0.45759 ---- 
  5   1.20681  -------  1.17000 ----  -0.35651  ------- -0.44506 ---- 
  6   1.17218  -------  1.13536 ----  -0.36505  ------- -0.45360 ---- 
  7   1.32371  -------  1.28690 ----  -0.34871  ------- -0.43726 ---- 
  8   1.27403  -------  1.23721 ----  -0.36157  ------- -0.45012 ---- 
  9   1.29278  -------  1.25597 ----  -0.28753  ------- -0.37608 ---- 
 10   2.10707  2.24016 -0.13309 0.06  -0.46257 -0.16262 -0.29995 1.84 

The header (first seven lines) is the same as in the corresponding
tpot file. The last N lines consist of 1+4*Vm columns. The first
column is the epoch label, t. There are then four columns for each
measured state variable:
1. The state variable at that epoch (as given in the tpot file), v
2. The corresponding target, tar (as given in the tpin file). If no
   target was specified in the tpin file then "-------" will appear.
3. diff = v - tar
4. |diff/tar|
If tar=0, column 4 will show "Div0", to flag a divide by zero.
If tar=diff=0, column 4 will show "0.00".



Final Epoch files (.dat)
------------------------

Intended for making a plot of predicted vs. measured for the final
epoch for all tp files. It consists of P lines, where P is specified
in the specfile with the string "APP:number_of_temporal_pattern_files".
There are 1+(2*Vm) columns.
The first column labels the patterns from 1 to P inclusive.
There are then two columns for each state variable:
1. The state variable at the final epoch (as given in the tpot file)
2. The corresponding target (as given in the tpin file). If no
   target was specified in the tpin file then "-------" will appear.

If "APP:include_v(t=0)_in_plot_file?_(yes/no)" was set to "yes" in the
specfile, then an additional Vm columns will be added on the right
which give the initial conditions for each pattern. If initial
conditions for any pattern were not specified then the default value,
VINITDEF, will be written.




######################################################################
#                        Tips on using dynet                         #
######################################################################

1. Unless you have good reason to do otherwise, always use the "var"
scaling option. Remember that the value of beta is then in terms of
the scaled variables. As the var scaling option gives unit standard
deviation, 1/SQRT(beta_k) can be roughly interpreted as a percentage
uncertainty in the k^th variable.  This can be useful as it means that
you don't have to change beta whenever you change the training data
files. (The default beta_k value of 6.0 corresponds to a standard
deviation of 0.4. Thus if the data are variance scaled and roughly
normally distributed, 95% of the data will lie in the range -2 to +2,
so this standard deviation corresponds to about a 10% uncertainty.)

2. If you use regularisation, the appropriate alpha value for the
hidden-output weights depends on the size of the delta_t terms in your
temporal pattern files. For example, if delta_t is very small, then
the outputs, and hence the hidden-output weights, need to be large,
but this will be inhibited if alpha_HY is too small. The appropriate
alpha can be calculated from the scale of the state variables (easy if
the variance scaling option is used) and the size of the delta_t
terms. However, if your time series data have a wide range of delta_t
(time separation between successive epochs), I recommend turning off
the regularisation of the hidden-output weights, i.e. setting
"TRN:alpha_HY" to zero.

3. Successful training is crucially dependent on correct settings of
the alpha and beta parameters, and the macopt convergence tolerance
parameter. Unfortunately there is no good robust criterion for setting
the latter. However, I suggest the following procedure:

(a) Set the beta parameters to appropriate values based on your
knowledge of the assumed error in the target state variables (or the
degree of precision you wish to achieve in the case of zero
noise). Remember that beta is specified in terms of the scaled state
variables, if scaling is being used.

(b) Set the values of alpha to give a suitable amount of
regularisation (use the defaults if in doubt). This means getting a
suitable ratio between the data and weight decay error terms in the
error equation (eqn. 24 of BJMW).

(c) dynet will converge once the gradient of the total error is less
than the specified tolerance.  Evaluate the order of magnitude of the
total error near a minimum.  Assuming that some small change in the
weights is unimportant (e.g. 0.01, although this depends upon alpha)
you can estimate the error gradient tolerance. However, this will only
be a rough estimate so you may need to do some trial and error. In
particular, I recommend plotting a graph of error vs. training
iteration (data in the .err file) and ensuring that the error has
leveled out before training is stopped.  Of course, you must also
ensure that the maximum number of iterations is sufficiently
large. Remember that changing the total number of training patterns
changes the total error, so the convergence tolerance may need to be
changed.



######################################################################
#          Modifications since the previous release version          #
######################################################################

Latest version: v1.22, 03.02.99

Modifications between v1.19 and v1.22
------------------------------------- 

Version 1.22 is completely compatible with all files (in particular
specfiles, temporal pattern files and weight files) from version 1.19.

1. Missing values in the external inputs of the temporal pattern files
are now trapped by dynet if they are written as "x" or "X". Note that
dynet cannot cope with missing values, but this modification means
that dynet exits gracefully rather than crashing.  External inputs at
the final epoch are not used by dynet, and no longer have to be
specified. Previously you had to give them a numeric value anyway, but
now you can use the "no data" symbol, "x" or "X".

2. The dynetsubs.c program has been rearranged and renamed netsubs.c
(as it is general to more than just dynet) and the stuff in dynet.h
specific to netsubs.c has been put into the new file netsubs.h. The
subroutine wderr() has also been moved from netsubs.c to dynet.c.

3. Default beta changed from 1 to 6.0. This is roughly a 10% error
when using variance scaling.

4. Default alpha_HY changed from 0.04 to 0.0, i.e. no regularisation
on the hidden-output weights.

5. If using scaling and all inputs or state variables are the same for
a given node, then the standard deviation is forced to 1.0 to avoid a
divide by zero in datascale(). Although this error was trapped
previously, dynet did not allow you to proceed.

6. The option "MAC:convergence_tolerance_gradient" is now used to
specify the convergence tolerance *gradient* rather than the square of
the gradient. However, the old version, "MAC:convergence_tolerance"
is retained for backwards compatibility.

7. Set a few more defaults and trapped a few more non-set parameters
in specfile.

8. Error training file (.err file) bug corrected so that it doesn't
write a bogus first line, and writes error at final epoch.

9. Minor improvements to the manual, and the addition of two new
sections, "Tips on using dynet" and "Modifications since the previous
release version".



######################################################################
#                        The synthetic problem                       #
######################################################################

The directory syn/ contains files for a synthetic problem.  It is the
same problem as discussed in Bailer-Jones, MacKay & Withers 1998.  It
consists of two external inputs, x1 and x2, and two state variables,
v1 and v2.  The problem is:

dv1/dt = x1 - 2*v1 + 8*v2 - x1*v1
dv2/dt = x2 - 5*v1 +   v2 - x2*v2

The autonomous part of this dynamical system (that with the external
inputs set to zero) is a decaying harmonic oscillator, with period 1.0
and e^-1 damping timescale 2.0.

The files syn.26.000.tpin2 to syn.26.099.tpin2 are 100 instantiations
of this dynamical system (i.e. 100 temporal patterns). In all cases
the x input sequences were generated from constrained random walks: x1
(x2) changes with a probability per unit time of 0.65 (0.999) by a
random amount uniformly distributed between -0.5 and +0.5 (-1 and +1).
The modulus of x is then taken to ensure a positive sequence.  The
initial v values were randomly selected from a uniform distribution
between -1 and +1. The sequences were simulated numerically between
t=0 and t=8 inclusive, and sampled with a constant epoch spacing of
dt=0.1.  Thus the files contain 81 lines.

As explained above, the corresponding .tpin1 files
(i.e. syn.26.000.tpin1 to syn.26.099.tpin1) are the same files but
with the state variable data removed at all but the initial and final
epochs. The files syn.26.200.tpin1 to
syn.26.299.tpin1 are the same temporal patterns but with the v1 state
variable removed completely. These can be used to test the performance
of dynet with an unmeasured state variable.

The specfile provided, syn.26a.spec, is set up to apply dynet to the
syn.26.000.tpin1 to syn.26.099.tpin1 files.  The weights file,
syn/syn.26a.wt, is the result of having trained dynet on the 50
syn.26.000.tpin1 to syn.26.049.tpin1 files: the screen dump from this
run can be seen in the file syn/syn.26a.out; the error file is
syn/syn.26a.err. Running dynet on the specfile as it stands will
produce syn/syn.26a.dat and the .tpot and .tper files for the temporal
patterns.  I suggest that you play with this synthetic problem (in
particular, retraining dynet) to get a feel for dynet and to get
familiar with the specfile. (N.B. Don't expect to get *exactly* the
same output files if you re-train dynet, as the supplied files were
produced with slightly different specs!)




######################################################################
#                        The dynet program                           #
######################################################################

The advanced user may find the following notes useful, although they
have not been written with the user in mind and are far from
comprehensive.


Subroutines
-----------

The subroutines within dynet are ordered in the following manner:

Principal Control Routines	 dynettrain, dynetapply
Forward Pass Routines		 dynetloop
Gradient Descent Routines	 graddescent, updatewt
Macopt-relevant Routines	 callmacopt, dymacint, dymacfn
Gradient Evaluation Routines	 ederiv, cumederivs
Initialisation Routines		 dynetinit, dynetloopinit, dysyswtinit
Scaling Routines		 scalecalc, datascale, unscale
Input/Output Routines		 specread, dataread, writeweights,
				 evtpnewname

The non-dynet-specific subroutines, in particular memory
allocation/deallocations, and the transfer and error functions, are in
the netsubs.c file.

scalecalc(): Evaluates the scaling for the data.  Further options may
be implemented, but the current procedure is to scale the data to
have zero mean and unit variance. Although dynet deals with different
patterns which may show behaviour over very different time scales, we
still expect a given input variable to be of the same type for all
patterns. In particular we expect any input to be of the same scale
for all patterns. Therefore we only need one mean and one variance
parameter for any one input.

scalecalc(): This routine always calculates Hlam based on the size of
the network.  Therefore if you choose var or maxmin data scaling Hlam
will also be set.

datascale(): Scales the data using scaling factors calculated by
scalecalc() or read in from a file. This scales both the external
inputs and the state variables. Note that scaling of the latter
automatically scales the outputs (Y). No matter what size the
delta_time terms are, the Y values can accommodate this and keep the V
values in scale. This is because of the linear output transfer
function which gives the Y values an arbitrarily large dynamic
range. This has implications for weight decay as we don't want weight
decay to penalize large hidden-output weights just because, for
example, the time_deltas are very small (thus requiring the Ys and
hence the wtHY weights to be large).

What I call a single pattern is a single temporal pattern, i.e.  a
time sequence of external inputs and corresponding state variables.  I
deal with one pattern at a time, i.e. I evaluate the derivatives at
all of the epochs of a given pattern before moving onto the next
pattern. The point at which I update the weights is decided in
graddescent(), where there are three update methods available. When
using macopt() only the batch update method is implemented.

dysysinit(): Initializes the dynamic system. Initialisation must occur
when a new pattern is presented to the network for training.  All of
the _prev variables are initialized to zero.  If they need to be
changed then the default values are set in the header file.  However
this would surely only apply to v_prev and y_prev, as I cannot see why
the initial weight derivatives should be anything but zero. The
effects of the data scaling must be considered when initialising the
system.

Pointer arithmetic is done in callmacopt and dymacint to accommodate
the use of single-offset vectors in macopt. This potentially makes the
code less robust as it would screw up if the types for the weight
(wtvec) and error gradient (wt_grad) were ever changed (from the
present double type). However, the change required would simply be to
define wtvec and wt_grad as floats rather than doubles, which is
the kind of change one should make anyway if the types of a dependent
subroutine change.


Bayesian Aspects
----------------

The network incorporates basic Bayesian features via the noise terms
(beta parameters) and weight decay terms (alpha parameters).

There is a separate noise term for each state variable.
There are four classes of weight decay terms:

alpha[0]  state variable to hidden weights (VH)
alpha[1]  external input to hidden weights (XH)
alpha[2]  input bias to hidden weights
alpha[3]  hidden to output weights (inc. bias) (HY)

Given the argument in the "Scaling" section you may not want to use
the alpha[3] term: the scale of the delta(t) terms will influence your
choice of alpha[3].

Both the alpha and beta terms are used in the ederiv() subroutine for
calculating the total error derivative. Thus the ed terms returned by
this subroutine contain both the likelihood term (beta) and the prior
(alpha).

Note that the scale of the gradient which the network evaluates
depends on the scale of the following:

1. the input data
2. the weights
3. the Bayesian parameters alpha and beta

Provided you use some kind of data scaling, the first two are taken
care of. However, the third is not. In particular, using values of
alpha and beta which differ significantly from unity will mean that
the macopt convergence tolerance will have to be changed.
Assuming beta >> alpha, then increasing beta by a factor of x will
require MAC:convergence_tolerance to be increased by a factor of x too.



Variable names
--------------

Vsize, Xsize, Hsize and Ysize are numbers of nodes in the state
variable layer, external input layer, hidden layer and output
layer respectively. These numbers do not include biases.
Corresponding vectors (v,x,h,y) start at 0.
Input layer bias is x[Xsize] and is introduced to the network by
adding an extra constant input to the input vectors.
Hidden layer bias is h[Hsize] and is set to HBIAS.

counting variables with dedicated use:

p - pattern
t - epoch
k - state variable (V) node
l - external input (X) node
m - hidden node
n - output node
i,j - general node values in ederiv()

tar is generally a 3D array of type targets. Variable ordering is:

tar[p][t][k] (pattern,epoch,node)  1 <= p <= Npats	
				   0 <= t <  ntsteps[p]
				   0 <= k <  Vsize
tar is a structure with two fields. The first, .def, specifies whether
or not a target is specified. The second, .val, gives the value. If
tar[p][t][k].def is zero, then the target has not been specified by
the user, and you cannot expect tar[p][t][k].val to be meaningful.
If you use scaling, the values stored in the tar array will be
changed. If the initial tar values, tar[p][0][k], are not defined,
we use the default value VINITDEF to initialise the sequence. Note,
however, that this value is not written into tar[p][0][k], nor is
tar[p][t][k].def set to 1.
