Dear Michel and Andrea
Using the updated version of QTV, QTV_1.f90, we got roughly a 10% improvement
on large matrices:
Running 1,000,000 loops with 100x100 matrix and vector:
Dgemv : 26 sec
QTV high level subroutine: 59 sec
QTV_1 (2nd version of high level subroutine): 53 sec
QT modular: 240 sec
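
For reference, the saving a QT routine is after can be sketched in a few
lines of C++. This is only an illustration under stated assumptions, not the
code in QTV.f90 or QTV_1.f90: it assumes an upper quasi-triangular T (zero
below the first sub-diagonal) stored row-major, and the names
make_qt_example, dense_matvec, qt_matvec and time_loops are made up for the
sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical test matrix: n x n, row-major, upper quasi-triangular,
// i.e. T(i,j) = 0 whenever j < i-1 (nothing below the first sub-diagonal).
std::vector<double> make_qt_example(std::size_t n) {
    std::vector<double> T(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = (i == 0 ? 0 : i - 1); j < n; ++j)
            T[i * n + j] = 1.0 + static_cast<double>(i + 2 * j);
    return T;
}

// Reference product y = T*a that visits all n*n entries, zeros included.
void dense_matvec(const std::vector<double>& T, const std::vector<double>& a,
                  std::vector<double>& y) {
    const std::size_t n = a.size();
    assert(T.size() == n * n && y.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < n; ++j) s += T[i * n + j] * a[j];
        y[i] = s;
    }
}

// Same product, but row i starts at column max(i-1, 0), so the structural
// zeros below the first sub-diagonal are never read or multiplied.
void qt_matvec(const std::vector<double>& T, const std::vector<double>& a,
               std::vector<double>& y) {
    const std::size_t n = a.size();
    assert(T.size() == n * n && y.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (std::size_t j = (i == 0 ? 0 : i - 1); j < n; ++j)
            s += T[i * n + j] * a[j];
        y[i] = s;
    }
}

// Hypothetical helper: wall-clock seconds spent calling f() `loops` times.
template <class F>
double time_loops(F&& f, long loops) {
    const auto t0 = std::chrono::steady_clock::now();
    for (long k = 0; k < loops; ++k) f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Called as time_loops([&] { qt_matvec(T, a, y); }, 1000000), the QT loop does
a little over half of the multiplications of the dense loop. Whether that is
enough to beat an optimised dgemv is what the figures above are measuring.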
Best regards
George
Mob. +44(0)7951415480
artilogica(a)btconnect.com
----- Original Message -----
From: "G. Perendia" <george(a)perendia.orangehome.co.uk>
To: "Michel Juillard" <michel.juillard(a)ens.fr>
Cc: "andrea pagano" <pagano.andrea(a)gmail.com>; "List for Dynare developers"
<dev(a)dynare.org>
Sent: Wednesday, June 24, 2009 2:19 PM
Subject: Re: Quasi triangular matrices in Kalman filters
> Dear Michel and Andrea
>
> I ran some initial standalone .exe tests with 2 matrix sizes and 3
> optimised programs:
>
> dgemv_exe - calling Matlab Lapack/Blas dgemv,
> qtvmv_exe - calling new high level QT library subroutine QTV.f90, and,
> qtamv_exe - calling 4 different low level QT library subroutines in a
> highly modular fashion and performing vector addition at the end,
>
> Whilst the new high-level subroutine is marginally faster than dgemv for
> smaller matrices (11x11), both QT subroutines are, however, slower than
> dgemv for the larger matrices (100x100).
>
>
> QT, Matlab and GM T*a stand-alone exe calculations
>
> Running 10,000,000 loops with 11x11 matrix and vector:
>
> Dgemv : 10 sec
>
> QTV high level subroutine: 9 sec
>
> QT modular: 32 sec
>
> Running 1,000,000 loops with 100x100 matrix and vector:
>
> Dgemv : 26 sec
>
> QTV high level subroutine: 59 sec
>
> QT modular: 240 sec
>
>
>
> It is quite obvious that the modular approach is a no-go: it appears to
> spend too much time calling 4 f90 subroutines and passing data between
> C++ and f90. An overall optimised (new) high level subroutine (QTV.f90)
> is a much (~4 times?) better solution.
>
>
>
> NOTES:
>
> The QT matrices are from qz decomposition of two rand(size) matrices
>
> The elapsed time has been very consistent across several repeated runs!
>
> The elapsed execution time was measured as the difference between the unix
> date after and before the .exe runs, e.g. as:
>
> $ date;./qtamvm_exe QT100t_tab.txt aa100_dat 100 1000000;date
>
>
>
> PS: For correct results, one needs to supply a tab-delimited file
> containing the transpose of the QT matrices at entry, because the FORTRAN
> matrix orientation (column-major) differs from that of C++ (row-major).
>
>
> Shall I upload the data files too?
>
> Best regards
>
> George
> Mob. +44(0)7951415480
> artilogica(a)btconnect.com
>
> ----- Original Message -----
> From: "Michel Juillard" <michel.juillard(a)ens.fr>
> To: "G. Perendia" <george(a)perendia.orangehome.co.uk>
> Cc: "andrea pagano" <pagano.andrea(a)gmail.com>; "List for Dynare developers"
> <dev(a)dynare.org>
> Sent: Monday, June 22, 2009 7:39 PM
> Subject: Re: Quasi triangular matrices in Kalman filters
>
>
> >
> > Andrea and George,
> >
> > 1) please write a standalone test/timing of the QT code so that we can
> > profile it using standard tools
> > 2) Compare with a call to dgemv() using Lapack + optimized Blas (possibly
> > from a Matlab distribution).
> > 3) Upload the code to SVN, so we can test it on other machines. The huge
> > variability between 2 runs reported by George may be due to Windows and
> > is usually less pronounced under Linux.
> >
> > All the best,
> >
> > Michel
> >
> >
> > G. Perendia wrote:
> > > Dear Andrea
> > >
> > > Thanks for the new libraries.
> > >
> > > I ran some initial performance tests today for the simple T*a
> > > matrix*vector multiplication with 2 different QT matrix sizes; in
> > > summary, this is what I am getting:
> > >
> > > 10000 iteration loop with 100x100 random QT matrix (from qz
> > > decomposition) and a vector:
> > >
> > > 1st & 2nd run (after restart)
> > >
> > > Native matlab matrix multiplication in a loop
> > >
> > > Ta_time = 0.3010 & 0.6610
> > >
> > > Calling dgemv() using Sylv Vector and General Matrix is faster than
> > > the Matlab loop:
> > >
> > > GMcppTaInnrLoop_time = 0.1600 & 0.3310
> > >
> > > Calling QT f90 library using Sylv Vector and General Matrix:
> > >
> > > QTcppTaInnerLoop_time = 8.5730 & 20.5300
> > >
> > > Calling QT f90 library without use of Sylv Vector and General Matrix,
> > > using only pure C/C++ double arrays, is only marginally faster:
> > >
> > > QTcpp_noSylv_TaInnerLoop_time = 8.4420
> > >
> > >
> > >
> > > 1000 iteration loop with 10x10 random QT matrix and vector:
> > >
> > > For a 10x10 matrix, calling the QT f90 library takes about twice the
> > > time the Matlab loop does, and dgemv is still the fastest.
> > >
> > > Matlab: 0.0400
> > >
> > > GMcppTaInnrLoop_time = 0.0300
> > >
> > > QTcppTaInnerLoop_time = 0.0800
> > >
> > >
> > >
> > > It is, however, possible that the MinGW f95 I am using is not the best
> > > optimising compiler available and/or that the tests for TPT', which I
> > > am planning to do next, may turn out better.
> > >
> > >
> > >
> > > What are your thoughts? Do you think that we may be able to improve
> > > the performance of this multiplication somehow?
> > >
> > > I wonder whether making many cross-language calls may be rather
> > > detrimental, and whether we may improve performance if we reduce this
> > > high level of modularisation and calling, e.g. by using a higher level
> > > subroutine that will perform all operations within f90, passing back
> > > only the final Ta?
> > >
> > >
> > >
> > > NOTES:
> > >
> > > After a restart, Matlab appears to be much slower than later!
> > >
> > > Also, Matlab multiplication reports both the real and the imaginary
> > > part of the result, which appears complex, but the real part matches
> > > the QT and dgemv outputs.
> > >
> > >
> > >
> > > Best regards
> > >
> > >
> > > George
> > > artilogica(a)btconnect.com
> > >
> > > ----- Original Message -----
> > > From: "Michel Juillard" <michel.juillard(a)ens.fr>
> > > To: "andrea pagano" <pagano.andrea(a)gmail.com>
> > > Cc: "G. Perendia" <george(a)perendia.orangehome.co.uk>
> > > Sent: Friday, June 19, 2009 1:33 PM
> > > Subject: Re: Quasi triangular matrices in Kalman filters
> > >
> > >
> > >
> > >> Thanks Andrea
> > >>
> > >> amities
> > >>
> > >> Michel
> > >>
> > >> andrea pagano wrote:
> > >>
> > >>> Hi all
> > >>> I would go for subroutines. I will do it over the weekend while
> > >>> looking at other possibilities, such as Fortran pointers.
> > >>>
> > >>>
> > >>> Best
> > >>>
> > >>> Andrea
> > >>>
> > >>>
> > >>> On Fri, Jun 19, 2009 at 10:04 AM, G.
> > >>> Perendia<george(a)perendia.orangehome.co.uk> wrote:
> > >>>
> > >>>
> > >>>> Dear Andrea
> > >>>>
> > >>>> Problem:
> > >>>>
> > >>>> I have encountered a problem integrating KalmanFilter with the
> > >>>> f90 QT library - passing the QT result arrays back to C++.
> > >>>>
> > >>>> QT Fortran routines have been written in standard Fortran FUNCTION
> > >>>> format (i.e., not SUBROUTINE), so that they return a one- or
> > >>>> two-dimensional array (the one they are named by) by value (not by
> > >>>> reference). However, as it appears, only simple, single variables
> > >>>> (e.g. INT or REAL) can be passed from Fortran FUNCTIONs back to C++.
> > >>>>
> > >>>> On the other hand, NAG, BLAS and LAPACK routines have all been
> > >>>> written as Fortran SUBROUTINEs and they can be integrated with C more
> > >>>> easily: they receive parameters and return their results through the
> > >>>> variables passed as calling parameters, by reference.
> > >>>>
> > >>>> For example, dgemv.f from the BLAS library gets Y by reference and
> > >>>> returns the modified Y through that same calling parameter.
> > >>>>
> > >>>> SUBROUTINE DGEMV(TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY)
> > >>>> ....
> > >>>> * Y - DOUBLE PRECISION array of DIMENSION at least
> > >>>> ...
> > >>>> * Before entry .... the incremented array Y
> > >>>> * must contain the vector y. On exit, Y is overwritten by the
> > >>>> * updated vector y.
> > >>>> ....
> > >>>>
> > >>>> Poss. Solutions:
> > >>>>
> > >>>> I could not find any references on how to get arrays from a Fortran
> > >>>> FUNCTION as a return value back to C. Do you or anyone around you know
> > >>>> how to do it, if at all possible? In any case, passing an array by
> > >>>> value is also not recommended as it is rather uneconomical, especially
> > >>>> for larger matrices.
> > >>>>
> > >>>> One alternative I can think of is the less explored option of
> > >>>> returning a Fortran pointer to the resulting array from the QT
> > >>>> functions instead of the array by value. I think I can work that out,
> > >>>> but suggestions are more than welcome.
> > >>>>
> > >>>> I can see a few options:
> > >>>>
> > >>>> a) to rewrite the QT library as SUBROUTINE instead of FUNCTION
> > >>>> routines, or,
> > >>>>
> > >>>> b) try to use Fortran pointers and, if we can,
> > >>>> then also rewrite the QT library to return pointers, or,
> > >>>>
> > >>>> c) write 3 or more high level f90 shell SUBROUTINEs calling the
> > >>>> existing and unmodified QT functions and performing all the operations
> > >>>> needed to construct the resulting Ta and TPT' (for both cases 1 and 2)
> > >>>> instead of doing low-level QT manipulation in C++.
> > >>>>
> > >>>> This way the QT library need not be changed and those new SUBROUTINEs
> > >>>> will also act as the interface with C++. I think this is the most
> > >>>> productive and optimal alternative of the three since those
> > >>>> combination utilities would have to be written anyway, except that it
> > >>>> seems easier to do that now in Fortran than in C++.
> > >>>>
> > >>>> If you like and/or are busy, I can develop the Ta and the first case
> > >>>> of TPT' SUBROUTINEs by Monday, whilst the second case may need more
> > >>>> thinking and a more granular approach to take advantage of multiple
> > >>>> processors.
> > >>>>
> > >>>>
> > >>>> Please let me know your thoughts on this issue and whether you have
> > >>>> time to make the needed changes or additions in the f90 files.
> > >>>>
> > >>>> Best regards
> > >>>>
> > >>>> George
> > >>>> artilogica(a)btconnect.com
> > >>>>
> > >>>> ----- Original Message -----
> > >>>> From: "andrea pagano" <pagano.andrea(a)gmail.com>
> > >>>> To: <george(a)perendia.orangehome.co.uk>
> > >>>> Cc: "Michel Juillard" <Michel.Juillard(a)ens.fr>
> > >>>> Sent: Monday, June 01, 2009 7:47 PM
> > >>>> Subject: Quasi triangular matrices in Kalman filters
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> Hi all
> > >>>>>
> > >>>>> I am sending you a set of Fortran routines to calculate the matrix
> > >>>>> expressions in the Kalman filter, together with some explanations.
> > >>>>>
> > >>>>> Hope they can be a starting point in optimizing the overall
> > >>>>> computation.
> > >
> > >>>>> Best
> > >>>>>
> > >>>>> Andrea
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Andrea Pagano
> > >>>>> via Veratti VARESE
> > >>>>> tel. +3903321691261
> > >>>>> cell.+393403804397
> > >>>>>