Dear Michel
1) Re: Small model performance:
a) Running 10,000 loops of the Matlab KF again recently, on the same small, fast-converging model but this time with convergence turned off(*), I got the following results for matlabKF_time:
1st run: 277.6900, 2nd run: 283.9890, 3rd (today): 316.3650(**)
which is much higher than the matlabKF_time (normal, with convergence working) of around 48.9600 measured initially.
b) Initially, calling Kalman_filters.DLL in the 10,000 loop (with the H and Z matrices prepared in each loop) gave total_dll_time = 202.7320, and 161.7530 on a rerun, which is actually faster than Matlab without convergence!
Running the same tests with Kalman_filters.DLL called in the 10,000 loop after the two stages of refactoring, on 11th June, gave total_dll_times = 128.0240 and 117.9300.
(*) The P convergence and other shortcuts were switched off by setting the kalman and riccati tolerances (or just the riccati one) to -1.
(**) Matlab execution times vary greatly from run to run.
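The effect of footnote (*) can be illustrated with a minimal scalar sketch. It assumes, for illustration only, that the Riccati shortcut stops recomputing P and the gain once successive P values differ by less than the tolerance; `run_filter`, `FilterStats` and `riccati_tol` are hypothetical names, and Dynare's actual filter works on matrices. Because the absolute change in P is never negative, a tolerance of -1 can never be met, so every iteration pays for the full covariance update.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Counts how much work a (hypothetical) scalar filter run performed.
struct FilterStats {
    int riccati_updates;  // iterations that recomputed P and the gain K
    double final_P;       // last predicted state variance
};

// Local-level model: y_t = a_t + eps_t (var H), a_{t+1} = a_t + eta_t (var Q).
// Once |P_new - P| < riccati_tol the gain is frozen (steady state).
// riccati_tol < 0 can never be satisfied, so the shortcut is disabled.
FilterStats run_filter(const std::vector<double>& y, double H, double Q,
                       double P0, double riccati_tol) {
    assert(H > 0.0 && Q >= 0.0 && P0 >= 0.0);
    double a = 0.0, P = P0, K = 0.0;
    bool steady = false;
    int updates = 0;
    for (double yt : y) {
        double v = yt - a;                     // one-step prediction error
        if (!steady) {
            double F = P + H;                  // prediction-error variance
            K = P / F;                         // Kalman gain
            double P_new = P * (1.0 - K) + Q;  // Riccati recursion
            ++updates;
            if (std::fabs(P_new - P) < riccati_tol) steady = true;
            P = P_new;
        }
        a += K * v;                            // state update (frozen K once steady)
    }
    return {updates, P};
}
```

For example, on 10,000 observations with H = 1 and Q = 0.1, `run_filter(y, 1.0, 0.1, 10.0, 1e-12)` performs well under a hundred Riccati updates, while the same call with a tolerance of -1 performs all 10,000.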
2) OK, I will start integrating Andrea's library now.
3) Re GPROF output: You are right. A few functions used inside the C++KF loop work as copy-constructors, e.g. A = B*C + D is a copy-constructor for A too, while on occasion a matrix is first constructed inside the KF loop (e.g. F = H as F(H)) before it is used as the host (and target) of a complex embedded operation. Some of these could be the subject of the next stage of refactoring, which is what I initially thought of doing next (i.e. before the integration).
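The temporaries described in point 3 can be made visible with a toy sketch. `Mat` below is a hypothetical column-major class that counts buffer allocations, not Dynare++'s GeneralMatrix; evaluating `A = B*C + D` through overloaded operators allocates at least one new buffer per intermediate result, whereas the hypothetical `mult_add(A, B, C, D)` writes straight into storage that `A` already owns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy column-major matrix (hypothetical) that counts buffer allocations.
struct Mat {
    static int allocations;  // number of buffers created so far
    std::size_t rows, cols;
    std::vector<double> data;

    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) { ++allocations; }
    Mat(const Mat& o) : rows(o.rows), cols(o.cols), data(o.data) { ++allocations; }
    Mat& operator=(const Mat& o) {
        // Assignment between equal-sized matrices reuses the existing buffer.
        assert(rows == o.rows && cols == o.cols);
        std::copy(o.data.begin(), o.data.end(), data.begin());
        return *this;
    }
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};
int Mat::allocations = 0;

// Operator style: each call returns a freshly allocated temporary.
Mat operator*(const Mat& b, const Mat& c) {
    assert(b.cols == c.rows);
    Mat r(b.rows, c.cols);
    for (std::size_t j = 0; j < c.cols; ++j)
        for (std::size_t k = 0; k < b.cols; ++k)
            for (std::size_t i = 0; i < b.rows; ++i)
                r(i, j) += b(i, k) * c(k, j);
    return r;
}

Mat operator+(const Mat& x, const Mat& y) {
    assert(x.rows == y.rows && x.cols == y.cols);
    Mat r(x);  // copy-construction of the result
    for (std::size_t i = 0; i < r.data.size(); ++i) r.data[i] += y.data[i];
    return r;
}

// In-place style: A <- B*C + D in A's existing storage.
// A must not alias B, C or D.
void mult_add(Mat& a, const Mat& b, const Mat& c, const Mat& d) {
    assert(b.cols == c.rows && a.rows == b.rows && a.cols == c.cols);
    assert(d.rows == a.rows && d.cols == a.cols);
    std::copy(d.data.begin(), d.data.end(), a.data.begin());
    for (std::size_t j = 0; j < c.cols; ++j)
        for (std::size_t k = 0; k < b.cols; ++k)
            for (std::size_t i = 0; i < b.rows; ++i)
                a(i, j) += b(i, k) * c(k, j);
}
```

The in-place form gives up the readable operator syntax in exchange for an explicit output argument, which is what lets the caller allocate `A` once outside the loop.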
Best regards
George
----- Original Message ----- From: "Michel Juillard" michel.juillard@ens.fr To: "List for Dynare developers" dev@dynare.org Sent: Thursday, June 11, 2009 8:33 PM Subject: Re: [DynareDev] Kalman Filter
Thanks George,
I understand that for sw_euro_3 the C++ and Matlab times are about the same. For the small model, is C++ "significantly better" than Matlab, or than before?
It seems a good time to try Andrea's code, but we need to be able to carefully measure its contribution. So it is necessary to time the operations that this code performs in standard and in the improved implementation.
In the GPROF output, I'm surprised at the number -- and therefore time consumption -- of calls to matrix constructors and destructors. It looks as if matrices were constructed inside the filter loop. It would seem more efficient to allocate the necessary space once and use it over and over again. I suspect that this has to do with the very high modularity of the current implementation, and that we will need to rewrite it basically from scratch in a more integrated manner.
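The "allocate once, use over and over" idea can be sketched for the covariance prediction step P <- T P T' + Q. This assumes plain column-major `std::vector<double>` storage rather than Dynare++'s GeneralMatrix, and `PredictWorkspace` and `matmul` are hypothetical names: the scratch buffer for T*P is sized once when the workspace is built before the filter loop, so calling `predict()` inside the loop performs no heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// out = a * b for column-major n x n matrices (out must not alias a or b).
void matmul(const std::vector<double>& a, const std::vector<double>& b,
            std::vector<double>& out, std::size_t n) {
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += a[i + k * n] * b[k + j * n];
            out[i + j * n] = s;
        }
}

// Hypothetical workspace: scratch storage sized once, reused by every predict().
struct PredictWorkspace {
    std::size_t n;
    std::vector<double> TP;  // holds T * P between the two products
    explicit PredictWorkspace(std::size_t n_) : n(n_), TP(n_ * n_) {}

    // P <- T * P * T' + Q, using only preallocated storage.
    void predict(const std::vector<double>& T, const std::vector<double>& Q,
                 std::vector<double>& P) {
        assert(T.size() == n * n && Q.size() == n * n && P.size() == n * n);
        matmul(T, P, TP, n);  // TP = T * P
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < n; ++i) {
                // (TP * T')(i, j) = sum_k TP(i, k) * T(j, k)
                double s = Q[i + j * n];
                for (std::size_t k = 0; k < n; ++k) s += TP[i + k * n] * T[j + k * n];
                P[i + j * n] = s;
            }
    }
};
```

In a filter loop the workspace would be constructed once before the first iteration and `ws.predict(T, Q, P)` called at every step; the address returned by `ws.TP.data()` stays the same throughout.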
Best,
Michel
G. Perendia wrote:
Dear Michel
- With the 2nd cut of refactoring we achieved another substantial performance improvement, about 25-30% for the basic KF over all models, i.e. the small one (both dll and exe) and the larger sw_euro_3 (using either the inner-loop dll or calling the dll in the loop). The times for the larger model are now similar to, if not marginally better than, those for the Matlab Dynare KF loops (i.e. 95.6 sec for the new C++ compared to 97.5 sec for the Matlab KF loop), while those for the small model are now significantly better.
The main change made in the 2nd cut was replacing the member-by-member GeneralMatrix copy() (also used by the constructor) with a memcpy() version in the Dynare++ sylv/cc/GeneralMatrix.h and .cpp files. Together with Vector.h/.cpp and only 3 other (utility) headers from that directory, these files are also used by C++KF. I added the small subset of sylv files needed for KF to the new sylv/cc subdirectory of mex/sources/kalman (see NOTE (*) below).
Note, however, that the same performance improvement could possibly be applied to the main Dynare++ sylv, and to the (similar) mex/sources/gensylv versions of those files too!
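A sketch of the kind of change described above, assuming column-major storage with a leading dimension `ld` as in Dynare++'s GeneralMatrix; the `View` struct and the `copy_elementwise`/`copy_memcpy` functions are hypothetical and not the sylv API. When both matrices are contiguous (`ld == rows`) a single memcpy() moves the whole block; when either is a sub-matrix view, one memcpy() per column still replaces the inner element loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical column-major view: element (i, j) lives at base[i + j * ld].
struct View {
    double* base;
    std::size_t rows, cols, ld;
};

// Member-by-member copy, one element at a time.
void copy_elementwise(View dst, const View& src) {
    assert(dst.rows == src.rows && dst.cols == src.cols);
    for (std::size_t j = 0; j < src.cols; ++j)
        for (std::size_t i = 0; i < src.rows; ++i)
            dst.base[i + j * dst.ld] = src.base[i + j * src.ld];
}

// memcpy() version: one call if both are contiguous, else one call per column.
// Source and destination must not overlap (memcpy requirement).
void copy_memcpy(View dst, const View& src) {
    assert(dst.rows == src.rows && dst.cols == src.cols);
    if (src.rows == 0 || src.cols == 0) return;
    if (dst.ld == dst.rows && src.ld == src.rows) {
        std::memcpy(dst.base, src.base, sizeof(double) * src.rows * src.cols);
    } else {
        for (std::size_t j = 0; j < src.cols; ++j)
            std::memcpy(dst.base + j * dst.ld, src.base + j * src.ld,
                        sizeof(double) * src.rows);
    }
}
```

Since memcpy() is only defined for non-overlapping buffers, a copy within the same buffer would need memmove() instead.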
- I will start devising a method to compare subtask execution times as you suggested, but that may be a bit tricky.
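One possible starting point for timing sub-tasks is a small accumulator of wall-clock time per named section, sketched below with std::chrono. `SectionTimer` and its nested `Scope` guard are hypothetical helpers, not part of the KF code: the idea is to wrap the same steps (e.g. multiplication, inversion) with the same labels in the standard and in the improved implementation and compare the accumulated totals over several runs.

```cpp
#include <chrono>
#include <map>
#include <string>

// Accumulates elapsed time and call counts per named sub-task (hypothetical helper).
class SectionTimer {
public:
    struct Entry {
        double seconds = 0.0;
        long calls = 0;
    };

    // RAII guard: measures from construction to destruction.
    class Scope {
    public:
        Scope(SectionTimer& t, const std::string& name)
            : timer_(t), name_(name), start_(std::chrono::steady_clock::now()) {}
        ~Scope() {
            std::chrono::duration<double> d = std::chrono::steady_clock::now() - start_;
            Entry& e = timer_.entries_[name_];
            e.seconds += d.count();
            ++e.calls;
        }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        SectionTimer& timer_;
        std::string name_;
        std::chrono::steady_clock::time_point start_;
    };

    const std::map<std::string, Entry>& entries() const { return entries_; }

private:
    std::map<std::string, Entry> entries_;
};
```

For example, `{ SectionTimer::Scope s(timer, "invert"); /* inversion */ }` adds one call and its elapsed time to the "invert" entry.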
However, I would like to try a few more things that can be done to improve the performance of the existing C++ code, that is, without implementing major changes at this stage, such as adding Andrea's quasi-triangular matrix multiplication library.
As can be seen from the enclosed profile file, taken from running the optimised executable with the inner loop, the top 5 CPU-time-consuming sub-tasks are now the productive gemm, the matrix constructor (still) and the matrix inverter (i.e. besides the main KalmanTask::filterNonDiffuse and the _Unwind_SjLj_Register exception controller, about which little can be done).
NOTE: (*) There are a few small differences between the gensylv directory in Dynare mex and the sylv directory in Dynare++, and those differences (e.g. GeneralMatrix.isZero() is missing in gensylv, etc.) still prevent the kalman filter from compiling successfully against gensylv. I would therefore need either to modify gensylv (and I am afraid of breaking it) or to keep a copy of the small required subset of the Dynare++ sylv directory specifically associated with the kalman filter. I would suggest the latter, as a few more changes may be needed for KF, and a merge could then possibly be performed at a later stage.
Best regards
George