Dear Andrea and George,
Thank you for your efforts on the Kalman filter. I would like to regroup efforts along the following lines:
1) Kalman filter C++: I would still like to be able to compare with an implementation that removes all object instantiation inside the loop: the lowest-level object should be the Kalman filter itself, and all necessary work space should be allocated once and for all outside the main loop of the filter. The multiplications should be done using the optimized BLAS routine DGEMM. George should continue along this line. I'm suspicious of timings taken under Windows and we should double-check them under Linux. For this reason as well, the Makefile should also provide for building the tests under Linux. George and Andrea should have accounts on karaba. Sebastien, can you take care of that?
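To make point 1) concrete, here is a rough sketch of what I mean by allocating the work space once. It only covers the prediction step (x = A*x, P = A*P*A' + Q). The names KalmanPredictor, gemm, tmp_x and tmp_AP are made up for illustration, and the hand-written gemm is only a stand-in for the real DGEMM call, so take it as an assumption-laden outline rather than the design.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for DGEMM on square column-major matrices (hypothetical helper):
// C = A * B when transB is false, C = A * B' when transB is true.
static void gemm(bool transB, std::size_t n,
                 const double* A, const double* B, double* C) {
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                s += A[i + k * n] * (transB ? B[j + k * n] : B[k + j * n]);
            C[i + j * n] = s;
        }
}

// All storage, including scratch space, is allocated in the constructor.
struct KalmanPredictor {
    std::size_t n;
    std::vector<double> A, Q;           // transition matrix, noise covariance
    std::vector<double> x, P;           // state mean and covariance
    std::vector<double> tmp_x, tmp_AP;  // work space, reused at every step

    KalmanPredictor(std::size_t n_, std::vector<double> A_, std::vector<double> Q_)
        : n(n_), A(A_), Q(Q_), x(n_, 0.0), P(n_ * n_, 0.0),
          tmp_x(n_, 0.0), tmp_AP(n_ * n_, 0.0) {
        assert(A.size() == n * n && Q.size() == n * n);
    }

    // One prediction step; no allocation happens in here.
    void predict() {
        for (std::size_t i = 0; i < n; ++i) {      // tmp_x = A * x
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += A[i + k * n] * x[k];
            tmp_x[i] = s;
        }
        for (std::size_t i = 0; i < n; ++i) x[i] = tmp_x[i];

        gemm(false, n, A.data(), P.data(), tmp_AP.data());  // tmp_AP = A * P
        gemm(true, n, tmp_AP.data(), A.data(), P.data());   // P = tmp_AP * A'
        for (std::size_t i = 0; i < n * n; ++i) P[i] += Q[i];
    }
};
```

The main loop of the filter would then only call predict() (and the corresponding update step) on an object constructed beforehand.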
2) Unless I'm missing something, there are only two explanations for not being able to beat DGEMM: a) our compilers don't optimize as well as the one used by Matlab; b) there is a problem in the algorithm.
In order to isolate (a) and quantify its importance, I would like Andrea to write a standalone test for both operations:
- A*x
- A*P*A'
calling DGEMM (from Matlab's BLAS library) and to compare the results and timings with those of his own routines.
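The standalone test could be organised roughly as below. This is only a sketch under assumptions: the names Kernel, seconds_per_call, max_abs_diff and naive_apat are invented here, and the plain triple-loop naive_apat is a placeholder for the slot where the DGEMM-based reference would be plugged in. It handles A*P*A'; the A*x case would follow the same pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Matrix = std::vector<double>;  // square, column-major storage

// Common signature for the reference and the candidate: C = A * P * A'.
// 'work' is caller-provided scratch space of the same size as C.
using Kernel = std::function<void(std::size_t, const Matrix&, const Matrix&,
                                  Matrix&, Matrix&)>;

// Placeholder reference (hypothetical); the real test would call DGEMM twice.
void naive_apat(std::size_t n, const Matrix& A, const Matrix& P,
                Matrix& C, Matrix& work) {
    assert(A.size() == n * n && P.size() == n * n);
    for (std::size_t j = 0; j < n; ++j)        // work = A * P
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += A[i + k * n] * P[k + j * n];
            work[i + j * n] = s;
        }
    for (std::size_t j = 0; j < n; ++j)        // C = work * A'
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += work[i + k * n] * A[j + k * n];
            C[i + j * n] = s;
        }
}

// Average wall-clock time of one call, after a warm-up call.
double seconds_per_call(const Kernel& kernel, std::size_t n,
                        const Matrix& A, const Matrix& P, int reps) {
    Matrix C(n * n), work(n * n);
    kernel(n, A, P, C, work);
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) kernel(n, A, P, C, work);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}

// Largest element-wise discrepancy between two kernels on the same input.
double max_abs_diff(const Kernel& ref, const Kernel& cand, std::size_t n,
                    const Matrix& A, const Matrix& P) {
    Matrix C1(n * n), C2(n * n), work(n * n);
    ref(n, A, P, C1, work);
    cand(n, A, P, C2, work);
    double d = 0.0;
    for (std::size_t i = 0; i < n * n; ++i)
        d = std::max(d, std::fabs(C1[i] - C2[i]));
    return d;
}
```

Checking max_abs_diff first and only then comparing seconds_per_call for the two kernels keeps the correctness and the speed questions separate.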
A minimal modification would be to take DGEMM as it is and modify the loops so as to accommodate a quasi-triangular matrix instead of a general matrix. In this exercise, all lower-level BLAS functions should be linked with Matlab's BLAS library.
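To illustrate the kind of loop change I have in mind, here is a small sketch that keeps the column-oriented loop order of the reference DGEMM (no-transpose case) and simply shortens the innermost loop. It assumes, and this is my assumption, that the quasi-triangular matrix T is upper quasi-triangular, i.e. zero below the first subdiagonal, as one gets from a real Schur form. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// C = T * B for n x n column-major matrices, where T is assumed to be
// upper quasi-triangular: T(i, l) == 0 whenever i > l + 1.
// Entries stored below the first subdiagonal of T are never read.
void quasi_tri_times_general(std::size_t n, const double* T,
                             const double* B, double* C) {
    assert(T != nullptr && B != nullptr && C != nullptr);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < n; ++i) C[i + j * n] = 0.0;
        for (std::size_t l = 0; l < n; ++l) {
            const double temp = B[l + j * n];
            // Column l of T has possibly nonzero entries in rows 0 .. l+1 only.
            const std::size_t last = std::min(l + 1, n - 1);
            for (std::size_t i = 0; i <= last; ++i)
                C[i + j * n] += temp * T[i + l * n];
        }
    }
}
```

Compared with the general loops, the innermost loop does roughly half the work for large n, which is the gain we would hope to see in the standalone timings.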
We will make a new attempt to integrate Andrea's routines in the Kalman filter only when we obtain satisfactory standalone test results.
I think it is important to spend more time on these tests, because we are evaluating how much speed improvement we can expect from using compiled code instead of Matlab.
All the best,
Michel