Dear Andrea and George,
Thank you for your efforts on the Kalman filter. I would like to regroup
our efforts along the following lines:
1) Kalman filter C++:
I would still like to be able to compare with an implementation that
removes all object instantiation inside the loop: the lowest level
object should be Kalman filter and all necessary work space should be
allocated once and for all outside the main loop of the filter. The
multiplications should be done using the optimized BLAS routine DGEMM. George
should continue along this line. I'm suspicious of timings taken in
Windows, and we should double-check them under Linux. For this reason as
well, the Makefile should also provide for building the tests under Linux.
George and Andrea should have accounts on karaba. Sebastien, can you
take care of that?
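
To make the "allocate once, outside the loop" idea concrete, here is a minimal sketch of the prediction step only (x <- A x, P <- A P A' + Q). The class name KalmanPredictor, the column-major std::vector storage and the plain loops are assumptions for illustration; in the real code the two matrix products are where the DGEMM calls would go.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: all work space is allocated in the constructor,
// so predict() performs no allocation and creates no temporary objects.
// Matrices are n x n, column-major: M(i,j) is stored at M[i + j*n].
class KalmanPredictor {
public:
    KalmanPredictor(std::size_t n, std::vector<double> A, std::vector<double> Q)
        : n_(n), A_(std::move(A)), Q_(std::move(Q)), AP_(n * n), xnew_(n) {
        assert(A_.size() == n * n && Q_.size() == n * n);
    }

    // x <- A x,  P <- A P A' + Q, using only the preallocated buffers.
    void predict(std::vector<double>& x, std::vector<double>& P) {
        assert(x.size() == n_ && P.size() == n_ * n_);
        const std::size_t n = n_;

        // xnew = A * x
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < n; ++j) s += A_[i + j * n] * x[j];
            xnew_[i] = s;
        }
        x.swap(xnew_);  // exchanges buffers, no allocation

        // AP = A * P   (a DGEMM call in the optimized version)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < n; ++i) {
                double s = 0.0;
                for (std::size_t k = 0; k < n; ++k) s += A_[i + k * n] * P[k + j * n];
                AP_[i + j * n] = s;
            }

        // P = AP * A' + Q   (a second DGEMM call in the optimized version)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < n; ++i) {
                double s = Q_[i + j * n];
                for (std::size_t k = 0; k < n; ++k) s += AP_[i + k * n] * A_[j + k * n];
                P[i + j * n] = s;
            }
    }

private:
    std::size_t n_;
    std::vector<double> A_, Q_;  // model matrices, fixed for the whole run
    std::vector<double> AP_;     // work space for A * P
    std::vector<double> xnew_;   // work space for A * x
};
```

The filter loop then constructs one KalmanPredictor before the first observation and calls predict() at every step.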
2) Unless I'm missing something, there are only two explanations for not
being able to beat DGEMM:
a) our compilers don't optimize as well as the one used by Matlab
b) there is a problem in the algorithm.
In order to rule out (a), or at least quantify its importance, I
would like Andrea to write a standalone test for both operations:
- A*x
- A*P*A'
calling DGEMM (from Matlab's BLAS library) and comparing it with his own
routines.
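
A possible skeleton for such a standalone test is sketched below. The function names are hypothetical, and the reference products use plain loops only so that the sketch compiles without any library; in the actual test, ax_reference and apat_reference would be replaced by the DGEMM-based versions, and Andrea's routines would be passed to the same timing and comparison helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // n x n, column-major: M(i,j) = M[i + j*n]

// y = A * x. Stand-in for the library version.
void ax_reference(std::size_t n, const Matrix& A, const std::vector<double>& x,
                  std::vector<double>& y) {
    assert(A.size() == n * n && x.size() == n && y.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < n; ++j) s += A[i + j * n] * x[j];
        y[i] = s;
    }
}

// out = A * P * A', with tmp as caller-provided work space for A * P.
// Stand-in for two DGEMM calls.
void apat_reference(std::size_t n, const Matrix& A, const Matrix& P,
                    Matrix& tmp, Matrix& out) {
    assert(A.size() == n * n && P.size() == n * n);
    assert(tmp.size() == n * n && out.size() == n * n);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += A[i + k * n] * P[k + j * n];
            tmp[i + j * n] = s;
        }
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += tmp[i + k * n] * A[j + k * n];
            out[i + j * n] = s;
        }
}

// Average wall-clock seconds per call over `reps` repetitions.
template <class F>
double seconds_per_call(F&& f, int reps) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}

// Largest element-wise difference, to check that two routines agree.
double max_abs_diff(const std::vector<double>& X, const std::vector<double>& Y) {
    assert(X.size() == Y.size());
    double d = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) d = std::max(d, std::fabs(X[i] - Y[i]));
    return d;
}
```

Each routine should be checked for agreement with max_abs_diff before it is timed, and timed with enough repetitions that the clock resolution does not matter.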
A minimal modification would be to take DGEMM as it is and modify the
loops so as to accommodate a quasi-triangular matrix instead of a general
matrix. In this exercise, all lower level BLAS functions should be
linked with Matlab's BLAS library.
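
As an illustration of what "modifying the loops" could mean, here is a sketch of C = T * B written in the column-oriented loop order of the reference DGEMM, with only the inner loop bound changed. It assumes T is quasi-upper-triangular, i.e. T(i,k) = 0 for i > k+1, as in a real Schur form; the function name and the std::vector storage are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = T * B for n x n column-major matrices, where T is assumed to be
// quasi-upper-triangular: entries below the first subdiagonal are zero
// and are never read.
void quasi_triangular_times_general(std::size_t n,
                                    const std::vector<double>& T,
                                    const std::vector<double>& B,
                                    std::vector<double>& C) {
    assert(T.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < n; ++i) C[i + j * n] = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            const double b = B[k + j * n];
            // Column k of T is nonzero only in rows 0..k+1, so the inner
            // loop stops there instead of running over all n rows.
            const std::size_t iend = std::min(k + 2, n);
            for (std::size_t i = 0; i < iend; ++i)
                C[i + j * n] += T[i + k * n] * b;
        }
    }
}
```

Compared with the general loop, this skips roughly half of the multiply-adds for large n; whether that translates into a real speed-up over optimized DGEMM is exactly what the standalone test should measure.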
We will make a new attempt to integrate Andrea's routines in the Kalman
filter only when we obtain satisfactory standalone test results.
I think it is important to spend more time on these tests, because we
are evaluating how much speed improvement we can expect from using
compiled code instead of Matlab.
All the best,
Michel