Thanks for the update. Let me know how it works out.
Best
Michel
G. Perendia wrote:
Dear Michel
re 1) The Matlab Dynare KF tests comparable to the C++ KF are those with convergence turned off, since the C++ version does not have convergence built in yet.
re 2 & 3)
I started looking into 2) and, on the way, made a small change, replacing one of the copy constructors (F) with the improved assignment operator. Though the performance impact was negligible, the profile (enclosed) shows much less time spent in GeneralMatrix constructors (i.e. they are replaced by copy(), which the assignment uses).
The profile also points to gemm as the main CPU time consumer in total (16.4%), though not per call: it had 8 times more calls than the inversion multInvRight, which accounts for 5.8% in total and thus takes, on average, nearly 3 times longer per call than gemm.
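The per-call comparison can be checked directly from the two profile shares. A short worked calculation, assuming n stands for the (unstated) number of multInvRight calls, so that gemm has 8n calls, and t for the average time per call (both symbols are introduced here for illustration only):

```latex
\[
\frac{t_{\text{multInvRight}}}{t_{\text{gemm}}}
  = \frac{5.8\% / n}{16.4\% / (8n)}
  = \frac{8 \times 5.8}{16.4}
  \approx 2.8
\]
```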
There is, however, still a very large number of copy constructions (though proportionally not very time-consuming, 1.78% in total) used for recasting from GeneralMatrix to ConstGeneralMatrix, done in order to use member functions defined for the latter but not the former. Those can be optimised by overloading the member functions for GeneralMatrix so as to reduce the number of recasts required.
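A minimal sketch of the overloading idea, under stated assumptions: Mat and ConstMat are hypothetical stand-ins, not the real GeneralMatrix and ConstGeneralMatrix, and the conversion here deliberately copies the data and counts itself so that its cost is visible. The real classes may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mutable matrix (column-major storage).
struct Mat {
    std::size_t rows, cols;
    std::vector<double> data;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
};

// Hypothetical read-only counterpart. Converting from Mat copies the data
// and bumps a counter, which stands in for the recasting seen in the profile.
struct ConstMat {
    std::size_t rows, cols;
    std::vector<double> data;
    static int conversions;
    ConstMat(const Mat& m) : rows(m.rows), cols(m.cols), data(m.data) { ++conversions; }
};
int ConstMat::conversions = 0;

// Shared implementation, so the two overloads cannot drift apart.
template <class M>
double traceImpl(const M& m) {
    assert(m.rows == m.cols);
    double t = 0.0;
    for (std::size_t i = 0; i < m.rows; ++i)
        t += m.data[i * m.rows + i];
    return t;
}

// Originally only this version exists, so a Mat argument is recast first.
double trace(const ConstMat& m) { return traceImpl(m); }

// Added overload: a Mat argument now binds here and no ConstMat is built.
double trace(const Mat& m) { return traceImpl(m); }
```

With the second overload present, trace(a) for a Mat a no longer goes through the conversion, while explicit ConstMat arguments keep working as before.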
I will thus now continue to analyse the best way to improve gemm by integrating Andrea's library, though it seems that I should first re-optimise at least the most relevant multiplications used for a and P, as I would need to use those for the f90 code too.
Best regards
George
----- Original Message -----
From: "Michel Juillard" michel.juillard@ens.fr
To: "List for Dynare developers" dev@dynare.org
Sent: Friday, June 12, 2009 5:46 PM
Subject: Re: [DynareDev] Kalman Filter
Dear George,
concerning
1) from now on, you should only look at the tests where the computations are exactly the same in Matlab and C++. It becomes hard to make sense of all the tests. At some point, we should run the tests on a Linux machine, which gives more consistent time measures than Windows.
2) perfect, thanks
3) the objective here is to remove every call to a constructor within the main loop of the Kalman filter. You are right that this may change how you integrate Andrea's code, so you may need to analyze the removal of constructors first. I'm afraid that actually implementing the changes will push back testing of Andrea's code too far.
All the best,
Michel
G. Perendia wrote:
Dear Michel
- Re: Small model performance:
a) Running 10,000 loops again recently with the Matlab KF on the same small, fast-converging model, but this time with convergence turned off (*), I got the following results for matlabKF_time:
1st run: 277.6900, 2nd run: 283.9890, 3rd run (today): 316.3650 (**)
which is much higher than the matlabKF_time of around 48.9600 measured initially (normal run, with convergence working).
b) Initially, calling Kalman_filters.DLL in the 10,000 loop (with preparation of the H and Z matrices in each loop) gave total_dll_time = 202.7320, and a rerun gave 161.7530, which is actually faster than Matlab without convergence!
Running the same tests with Kalman_filters.DLL called in the 10,000 loop after the two stages of refactoring, on 11th June: total_dll_time = 128.0240 and 117.9300
(*) the P convergence and other shortcuts were switched off by setting the kalman and riccati tol. (or just riccati) to -1
(**) Matlab execution times vary greatly from run to run
OK, I will start integrating Andrea's library now.
Re GPROF output:
You are right, a few functions used inside the C++ KF loop work as copy constructors, e.g. A = B*C+D is a copy constructor for A too, whilst on occasion a matrix is constructed inside the KF loop first (e.g. F=H as F(H)) before it is used as a host (and target) of a complex embedded operation. Some of these could be subject to the next stage of refactoring, which is what I initially thought of doing next (i.e. before the integration).
Best regards
George
----- Original Message -----
From: "Michel Juillard" michel.juillard@ens.fr
To: "List for Dynare developers" dev@dynare.org
Sent: Thursday, June 11, 2009 8:33 PM
Subject: Re: [DynareDev] Kalman Filter
Thanks George,
I understand that for sw_euro_3 the times of C++ and Matlab are about the same. For the small model, is C++ "significantly better" than Matlab or than before?
It seems a good time to try Andrea's code, but we need to be able to carefully measure its contribution. So it is necessary to time the operations that this code performs in the standard and in the improved implementation.
In the GPROF output, I'm surprised at the number, and therefore the time consumption, of calls to matrix constructors and destructors. It looks as if matrices were constructed inside the filter loop. It would seem more efficient to allocate the necessary space once and use it over and over again. I suspect that it has to do with the very high modularity of the current implementation and that we will need to rewrite it basically from scratch in a more integrated manner.
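A minimal sketch of the "allocate once, reuse in the loop" idea, under stated assumptions: Workspace, multiplyInto and iterate are hypothetical names, matrices are plain column-major std::vector<double> buffers rather than the KF classes, and the loop body is a placeholder update P <- T*P, not the actual filter recursion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scratch space, allocated once before the loop starts.
struct Workspace {
    std::size_t n;
    std::vector<double> tmp;  // n x n scratch buffer, column-major
    explicit Workspace(std::size_t n_) : n(n_), tmp(n_ * n_, 0.0) {}
};

// C = A * B for n x n column-major matrices, written into an existing buffer.
// C must not alias A or B.
void multiplyInto(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                s += A[i + k * n] * B[k + j * n];
            C[i + j * n] = s;
        }
}

// Placeholder loop: P <- T * P, repeated `steps` times.
// No matrix is constructed or destroyed inside the loop.
void iterate(const std::vector<double>& T, std::vector<double>& P,
             Workspace& ws, int steps) {
    for (int t = 0; t < steps; ++t) {
        multiplyInto(T, P, ws.tmp, ws.n);  // result goes into the scratch buffer
        P.swap(ws.tmp);                    // O(1) buffer exchange, no allocation
    }
}
```

The swap exchanges the two buffers in constant time, so the result ends up in P while the old contents of P become the scratch space for the next iteration.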
Best,
Michel
G. Perendia wrote:
Dear Michel
- With the 2nd cut of refactoring we achieved another substantial performance improvement, about 25-30% for the basic KF over all models, i.e. the small one (dll and exe) and the larger sw_euro_3, using either the inner-loop dll or calling the dll in the loop. The times for the larger model are now similar to, if not marginally better than, those for the Matlab Dynare KF loops (i.e. 95.6 sec for the new C++ compared to 97.5 for the Matlab KF loop), whilst those for the small model are now significantly better.
The main change made in the 2nd cut was replacing the member-by-member GeneralMatrix copy() (used by the constructor too) with a memcpy() version in the Dynare++ sylv/cc/GeneralMatrix.h and .cpp files. Together with Vector.h/.cpp and only 3 other (utility) headers from that directory, they are also used by the C++ KF. I added the small subset of sylv files needed for the KF to the new sylv/cc subdirectory of mex/sources/kalman (see NOTE (*) below).
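A minimal sketch of the kind of change described above, under stated assumptions: Matrix is a hypothetical column-major type with a leading dimension ld, not the real GeneralMatrix, and copyElementwise and copyFast are illustrative names. A single memcpy is only valid when both matrices are stored contiguously (ld == rows); otherwise the sketch falls back to one memcpy per column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical column-major matrix; element (i, j) lives at data[i + j * ld].
struct Matrix {
    std::size_t rows, cols, ld;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, std::size_t ld_)
        : rows(r), cols(c), ld(ld_), data(ld_ * c, 0.0) { assert(ld_ >= r); }
    double& at(std::size_t i, std::size_t j) { return data[i + j * ld]; }
    double at(std::size_t i, std::size_t j) const { return data[i + j * ld]; }
};

// Member-by-member copy: one indexed load and store per element.
void copyElementwise(Matrix& dst, const Matrix& src) {
    assert(dst.rows == src.rows && dst.cols == src.cols);
    for (std::size_t j = 0; j < src.cols; ++j)
        for (std::size_t i = 0; i < src.rows; ++i)
            dst.at(i, j) = src.at(i, j);
}

// memcpy-based copy: one call for contiguous storage, else one per column.
void copyFast(Matrix& dst, const Matrix& src) {
    assert(dst.rows == src.rows && dst.cols == src.cols);
    if (src.rows == 0 || src.cols == 0) return;  // nothing to copy
    if (dst.ld == dst.rows && src.ld == src.rows) {
        std::memcpy(dst.data.data(), src.data.data(),
                    src.rows * src.cols * sizeof(double));
    } else {
        for (std::size_t j = 0; j < src.cols; ++j)
            std::memcpy(dst.data.data() + j * dst.ld,
                        src.data.data() + j * src.ld,
                        src.rows * sizeof(double));
    }
}
```

Both functions produce the same result; the second simply hands the bulk copy to the C library instead of looping over individual elements.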
Note also, however, that the same performance improvement may possibly be applied in the main Dynare++ sylv as well as in the (similar) mex/sources/gensylv versions of those files!
- I will start devising a method to compare subtask execution times as you suggested, but that may be a bit tricky.
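One possible way to compare subtask execution times, sketched under stated assumptions: SubtaskTimes and ScopedTimer are hypothetical helpers, not part of the existing code, and they measure wall-clock time with std::chrono::steady_clock, so the figures still depend on machine load.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulator: total seconds and call count per named subtask.
class SubtaskTimes {
public:
    void add(const std::string& name, double seconds) {
        total_[name] += seconds;
        ++calls_[name];
    }
    double total(const std::string& name) const {
        auto it = total_.find(name);
        return it == total_.end() ? 0.0 : it->second;
    }
    long calls(const std::string& name) const {
        auto it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }
private:
    std::map<std::string, double> total_;
    std::map<std::string, long> calls_;
};

// Hypothetical RAII timer: records the lifetime of the enclosing scope.
class ScopedTimer {
    using clock = std::chrono::steady_clock;
public:
    ScopedTimer(SubtaskTimes& sink, std::string name)
        : sink_(sink), name_(std::move(name)), start_(clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double> elapsed = clock::now() - start_;
        sink_.add(name_, elapsed.count());
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
private:
    SubtaskTimes& sink_;
    std::string name_;
    clock::time_point start_;
};
```

Wrapping a subtask in a block that declares a ScopedTimer, in both the standard and the improved implementation, would give per-subtask totals and call counts that can be compared side by side. The timer itself adds some overhead per call, which matters for very short subtasks.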
However, I would like to try a few more things that can be done to improve the performance of the existing C++ code, that is, without implementing major changes at this stage yet, such as adding Andrea's quasi-triangular matrix multiplication library.
As can be seen from the enclosed profile file taken from running the optimised executable with the inner loop, the top 5 CPU time "spending" sub-tasks are now the productive gemm, the matrix constructor (still) and the matrix inverter (i.e. besides the main KalmanTask -filterNonDiffuse and the _Unwind_SjLj_Register exception controller, two about which little can be done).
NOTE: (*) I.e. there are a few small differences between the gensylv directory in Dynare mex and the sylv directory in Dynare++, and those differences (e.g. the missing GeneralMatrix.isZero() in gensylv, etc.) still prevent successful compilation of the Kalman filter using gensylv. I would therefore need either to modify gensylv (and I am afraid of breaking it) or to keep a copy of the small required subset of the Dynare++ sylv directory specially associated with the Kalman filter. I would suggest the latter, as a few more changes may be needed for the KF, and a merge may then possibly be performed at a later stage.
Best regards
George
Dev mailing list Dev@dynare.org http://www.dynare.org/cgi-bin/mailman/listinfo/dev