Re: [DynareDev] C++ Kalman filter

4 Jul 2009

      Hi all,
I will keep working on this following Michel and George suggestions. h
I agree with George that at first look the public code of dgemm looks
quite ordinary.
In my successive attempt  I have tried to specialized a matrix times
matrix (or vector)  having in mind the special case of KF.
Just counting the number of operations there is a reduction.
I will do the various test (in the line of George and Michel)  and we will see.
I will off for a week or so, but I will do some work in the spare
time. No much chance to have Internet connection though
Best
Andrea
On Sat, Jul 4, 2009 at 1:07 PM, G.
Perendiageorge@perendia.orangehome.co.uk wrote:
...
Dear Michel (et al)

I will continue refactoring KF.

When you say
"The multiplications should be done using optimized Lapack DGEM"
do you mean stand-alone ATLAS or Matlab  DGEMM (I suspect the Matlab one)?

Another possible reason we can not beat dgemm is that Matlab dgemm

function is not just a compilation of the publicly available BLAS dgemm.f ,
highly optimised by a compiler, but a function with the same name and
interface which has been rewritten by Mathworks specially for Matlab, a
system which justifies its commercial existence on fast matrix manipulation.
It is possible they rewrote few of the most commonly used routines to boost
the performance even further. E.g. It may use direct memory manipulation,
multi-threading, blocking/subdivision and poss. some other size reduction
features for large or even special handlers for QT and symmetric matrices
that are not present in the public domain version of dgemm.f (or in the
BOOST library).
E.g., public version of dgemm.f  uses classic nested loop such as
IF (BETA.EQ.ZERO) THEN
DO 20 J = 1,N
   DO 10 I = 1,M
       C(I,J) = ZERO
....
whilst memcpy() is  a much quicker way to (re-)initialise a memory range to
a values such as 0  respectively.
Hence, a test for our compilers, QT and the possible additional boosting
features of Matlab dgemm would be the comparison of timing of our test
routines being linked to:
a) our own optimised  compilation of the publicly available dgemm/dgemv in
the way we do it for QT routines, and see how that scores against both, the
same tests when using and linked to
b) our QT routines on one and to
c) the Matlab dgemm/dgemv  on another hand.
Best regards
George
artilogica@btconnect.com
----- Original Message -----
From: "Michel Juillard" michel.juillard@ens.fr
To: "G. Perendia" george@perendia.orangehome.co.uk; "'Andrea Pagano'"
andrea.pagano@jrc.it; "List for Dynare developers" dev@dynare.org
Sent: Friday, July 03, 2009 8:07 PM
Subject: C++ Kalman filter
...
Dear Andrea and George,
than you for your efforts on the Kalman filter. I would like to regroup
efforts along the following lines:

Kalman filter C++:

I would still like to be able to compare with an implementation that
removes all object instantiation inside the loop: the lowest level
object should be Kalman filter and all necessary work space should be
allocated once for ever outside the main loop of the filter. The
multiplications should be done using optimized Lapack DGEM. George
should continue along this line. I'm suspicious of timings taken in
Windows and we should double check them under Linux. For this reason as
well, Makefile should also provide for building of tests under Linux.
George and Andrea should have accounts on karaba.  Sebastien, can you
take care of that?

Unless I'm missing something, there are only two explanations for not

being able to beat DGEMM
a) our compilers don't optimize as well as the one used by Matlab
b) there is a problem in the algorithm.
In order to eliminate and be able to quantify the importance of (a), I
would like Andrea to write a standalone test for both operations:

A*x
A*P*A'

calling DGEMM (from Matlab's Lapack library) and compare it with his own
routines.
A minimum modification would be to take DGEMM as it is and modify the
loops so as to accomodate a quasi-triangular matrix instead of a general
matrix. In this exercise, all lower level BLAS functions should be
linked with Matlab's BLAS library.
We will make a new attempt to integrate Andrea's routines in the Kalman
filter only when we obtain satisfactory standalone test results.
I think it is important to spend more time on these tests, because we
are evaluating how much speed improvement we can expect from using
compiled code instead of Matlab.
All the best,
Michel
-- 
Andrea Pagano
via Veratti VARESE
tel. +3903321691261
cell.+393403804397

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

Re: [DynareDev] C++ Kalman filter