Thanks George,
I understand that for sw_euro_3 the C++ and Matlab times are about the same. For the small model, is C++ "significantly better" than Matlab, or than before?
It seems a good time to try Andrea's code, but we need to be able to measure its contribution carefully. So it is necessary to time the operations that this code performs in both the standard and the improved implementation.
In the GPROF output, I'm surprised at the number (and therefore the time consumption) of calls to matrix constructors and destructors. It looks as if matrices were constructed inside the filter loop. It would seem more efficient to allocate the necessary space once and reuse it over and over again. I suspect this has to do with the very high modularity of the current implementation, and that we will need to rewrite it basically from scratch in a more integrated manner.
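The "allocate once, reuse in every iteration" pattern can be sketched as follows. This is only an illustration under stated assumptions: `Mat`, `multiply_into`, `FilterWorkspace` and `propagate` are hypothetical names, not the Dynare++ or C++ KF classes, and the loop only propagates a covariance as P <- T P T' (no Q, no measurement update) so that the allocation pattern stays visible. All scratch matrices are created before the loop, and each step writes into them instead of constructing temporaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal column-major matrix (not the Dynare++ GeneralMatrix class).
struct Mat {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// C = A * B written into caller-supplied storage: no allocation happens here.
void multiply_into(const Mat& A, const Mat& B, Mat& C) {
    assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
    for (std::size_t j = 0; j < C.cols; ++j)
        for (std::size_t i = 0; i < C.rows; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < A.cols; ++k) s += A(i, k) * B(k, j);
            C(i, j) = s;
        }
}

// Scratch space allocated once, before the filter loop, and reused by every step.
struct FilterWorkspace {
    Mat TP, TPTt;  // hold T*P and T*P*T'
    explicit FilterWorkspace(std::size_t n) : TP(n, n), TPTt(n, n) {}
};

// Hypothetical toy loop: P <- T * P * T' repeated `steps` times.
void propagate(const Mat& T, const Mat& Tt, Mat& P, int steps) {
    FilterWorkspace ws(P.rows);             // the only allocation for the whole loop
    for (int t = 0; t < steps; ++t) {
        multiply_into(T, P, ws.TP);         // TP = T * P
        multiply_into(ws.TP, Tt, ws.TPTt);  // TPTt = T * P * T'
        P.data.swap(ws.TPTt.data);          // P takes the result by swapping buffers, no copy
    }
}
```

With this structure the number of constructor and destructor calls is independent of the number of filter steps, which is the property the GPROF counts suggest is currently missing.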
Best,
Michel
G. Perendia wrote:
Dear Michel
- With the 2nd cut of refactoring we achieved another substantial performance improvement, about 25-30% for the basic KF over all models, i.e. the small one (both the DLL and the EXE) and the larger sw_euro_3 (using either the inner-loop DLL or calling the DLL in the loop). The times for the larger model are now similar to, if not marginally better than, those for the Matlab Dynare KF loop (95.6 sec for the new C++ compared to 97.5 sec for the Matlab KF loop), whilst those for the small model are now significantly better.
The main change made in the 2nd cut was replacing the member-by-member GeneralMatrix copy() (also used by the constructor) with a memcpy() version in the Dynare++ sylv/cc/GeneralMatrix.h and .cpp files. Together with Vector.h/.cpp and only 3 other (utility) headers from that directory, these files are also used by the C++ KF. I added the small subset of sylv files needed for KF to the new sylv/cc subdirectory of mex/sources/kalman (see NOTE (*) below).
Note also, however, that the same performance improvement could possibly be applied to the main Dynare++ sylv as well as to the (similar) mex/sources/gensylv versions of those files.
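The difference between a member-by-member copy and a memcpy() copy can be illustrated with a small sketch. `DenseMatrix`, `copyElementwise` and `copyBulk` are hypothetical names; this is not the actual sylv GeneralMatrix code. The bulk copy is only valid because this toy class stores every element in one contiguous, identically laid out block. If the real class supports sub-matrix views whose leading dimension exceeds the row count, a single memcpy() would only be safe when the leading dimensions match the row counts; otherwise one memcpy() per column would be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical dense column-major matrix with contiguous storage, used only to
// contrast the two copy strategies.
class DenseMatrix {
public:
    DenseMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& get(std::size_t i, std::size_t j) { return data_[i + j * rows_]; }
    double get(std::size_t i, std::size_t j) const { return data_[i + j * rows_]; }

    // Member-by-member copy: one indexed read and one indexed write per element.
    void copyElementwise(const DenseMatrix& m) {
        assert(rows_ == m.rows_ && cols_ == m.cols_);
        for (std::size_t j = 0; j < cols_; ++j)
            for (std::size_t i = 0; i < rows_; ++i)
                get(i, j) = m.get(i, j);
    }

    // Bulk copy of the whole contiguous block in a single memcpy() call.
    // Valid only because both matrices have the same shape and no padding.
    void copyBulk(const DenseMatrix& m) {
        assert(rows_ == m.rows_ && cols_ == m.cols_);
        if (!data_.empty())
            std::memcpy(data_.data(), m.data_.data(), data_.size() * sizeof(double));
    }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};
```

Both methods produce the same result; the memcpy() version lets the C library move the block with optimised bulk transfers instead of doing per-element index arithmetic.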
- I will start devising a method to compare subtask execution times as you suggested, but that may be a bit tricky.
However, I would like to try a few more things to improve the performance of the existing C++ code, that is, without implementing major changes at this stage, such as adding Andrea's quasi-triangular matrix multiplication library.
As can be seen from the enclosed profile file, taken from running the optimised executable with the inner loop, the top 5 CPU-time-consuming sub-tasks are now the productive gemm, the matrix constructor (still) and the matrix inverter, besides the main KalmanTask filterNonDiffuse and the _Unwind_SjLj_Register exception handler, about which little can be done.
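One possible way to compare subtask execution times between the standard and an improved implementation is a scoped timer that accumulates wall-clock time and call counts per named subtask. This is a sketch under assumptions: `SubtaskTimer`, `SubtaskStats` and the labels used with them are hypothetical, not an existing Dynare facility. The idea is to wrap the same operations (e.g. the multiplication or the inversion) with the same labels in both implementations and compare the per-label totals, complementing the GPROF flat profile.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated statistics for one named subtask.
struct SubtaskStats {
    double seconds = 0.0;  // total wall-clock time spent in the subtask
    long calls = 0;        // number of timed invocations
};

class SubtaskTimer {
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    std::map<std::string, SubtaskStats> stats_;

public:
    // RAII guard: measures the time between its construction and destruction
    // and adds it to the totals of the given label.
    class Scope {
    public:
        Scope(SubtaskTimer& owner, std::string label)
            : owner_(owner), label_(std::move(label)), start_(Clock::now()) {}
        ~Scope() {
            std::chrono::duration<double> elapsed = Clock::now() - start_;
            SubtaskStats& s = owner_.stats_[label_];
            s.seconds += elapsed.count();
            ++s.calls;
        }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        SubtaskTimer& owner_;
        std::string label_;
        Clock::time_point start_;
    };

    const std::map<std::string, SubtaskStats>& stats() const { return stats_; }
};
```

A timed region would look like `{ SubtaskTimer::Scope s(timer, "gemm"); /* multiplication call */ }`. Since each Clock::now() call has its own overhead, it is probably best to time whole operations inside the filter loop rather than individual element accesses.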
NOTE (*): There are a few small differences between the gensylv directory in Dynare mex and the sylv directory in Dynare++, and those differences (e.g. GeneralMatrix.isZero() is missing in gensylv) still prevent the Kalman filter from compiling against gensylv. I would therefore need either to modify gensylv (and I am afraid of breaking it) or to keep a copy of the small required subset of the Dynare++ sylv directory specifically for the Kalman filter. I would suggest the latter, as a few more changes may be needed for KF, and a merge could then be performed at a later stage.
Best regards
George