Andrea and George,
1) please write a standalone test/timing of the QT code so that we can
profile it using standard tools
2) Compare with a call to dgemv() using Lapack + optimized Blas (possibly
from a Matlab distribution).
3) Upload the code to SVN, so we can test it on other machines. The huge
variability between the 2 runs reported by George may be due to Windows and
is usually less pronounced under Linux.
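
For item 1, a minimal sketch of what a standalone timing test could look like in C++. It is only a starting point under stated assumptions, not the actual QT code: the names idx, dense_matvec, qt_matvec, make_qt and time_loop are hypothetical, the matrix is stored column-major as Fortran would store it, and "QT" is assumed to mean upper quasi-triangular (zeros below the first subdiagonal). The dense loop merely stands in for dgemv(); a real comparison would call the optimized Blas routine instead.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Column-major index of element (i, j) in an n x n matrix (Fortran layout).
inline std::size_t idx(int i, int j, int n) {
    return static_cast<std::size_t>(i) + static_cast<std::size_t>(j) * n;
}

// Dense product y = T * a; no structure of T is exploited.
void dense_matvec(int n, const std::vector<double>& T,
                  const std::vector<double>& a, std::vector<double>& y) {
    assert(T.size() == static_cast<std::size_t>(n) * n);
    assert(a.size() == static_cast<std::size_t>(n));
    y.assign(n, 0.0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            y[i] += T[idx(i, j, n)] * a[j];
}

// Same product, assuming T is upper quasi-triangular:
// column j has nonzeros only in rows 0 .. j+1, so the rest is skipped.
void qt_matvec(int n, const std::vector<double>& T,
               const std::vector<double>& a, std::vector<double>& y) {
    assert(T.size() == static_cast<std::size_t>(n) * n);
    assert(a.size() == static_cast<std::size_t>(n));
    y.assign(n, 0.0);
    for (int j = 0; j < n; ++j) {
        const int last = (j + 1 < n) ? j + 1 : n - 1;
        for (int i = 0; i <= last; ++i)
            y[i] += T[idx(i, j, n)] * a[j];
    }
}

// Deterministic upper quasi-triangular test matrix (hypothetical test data).
std::vector<double> make_qt(int n) {
    std::vector<double> T(static_cast<std::size_t>(n) * n, 0.0);
    for (int j = 0; j < n; ++j) {
        const int last = (j + 1 < n) ? j + 1 : n - 1;
        for (int i = 0; i <= last; ++i)
            T[idx(i, j, n)] = 1.0 + 0.01 * ((i * 7 + j * 3) % 11);
    }
    return T;
}

// Runs f() iters times and returns the elapsed wall-clock time in seconds.
template <class F>
double time_loop(int iters, F f) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int k = 0; k < iters; ++k) f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling time_loop(10000, ...) once around each of the two products with n = 100 would mirror the 10000 iteration test reported by George. Because the result is a plain executable, it can be run under standard profilers without Matlab in the loop.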
All the best,
Michel
G. Perendia wrote:
> Dear Andrea
>
> Thanks for the new libraries.
>
> I ran some initial performance tests today for the simple T*a matrix*vector
> multiplication with 2 different QT matrix sizes. In summary, this is
> what I am getting:
>
> 10000 iteration loop with 100x100 random QT matrix (from qz decomposition)
> and a vector:
>
> 1st & 2nd run (after restart)
>
> Native matlab matrix multiplication in a loop
>
> Ta_time = 0.3010 & 0.6610
>
> Calling dgemv() using Sylv Vector and General Matrix is faster than Matlab
> loop:
>
> GMcppTaInnrLoop_time = 0.1600 & 0.3310
>
> Calling the QT f90 library using Sylv Vector and General Matrix:
>
> QTcppTaInnerLoop_time = 8.5730 & 20.5300
>
> Calling QT f90 library without use of Sylv Vector and General Matrix but
> using only pure C/C++ double arrays is only marginally faster:
>
> QTcpp_noSylv_TaInnerLoop_time = 8.4420
>
>
>
> 1000 iteration loop with 10x10 random QT matrix and vector:
>
> For a 10x10 matrix, calling the QT f90 library takes about twice as long as
> the Matlab loop, while dgemv is still the fastest.
>
> Matlab: 0.0400
>
> GMcppTaInnrLoop_time = 0.0300
>
> QTcppTaInnerLoop_time = 0.0800
>
>
>
> It is, however, possible that the MinGW f95 I am using is not the best
> optimising compiler that can be used and/or that tests for TPT', which I am
> planning to do next, may be better.
>
>
>
> What are your thoughts? Do you think that we may be able to improve the
> performance of this multiplication somehow?
>
> I wonder if making many cross-language calls may be rather detrimental, and
> whether we may improve performance if we reduce this high level of
> modularisation and calling, e.g. by using a higher level subroutine that
> will perform all operations within f90, passing back only the final Ta?
>
>
>
> NOTES:
>
> After a restart, Matlab appears to be much slower than later!
>
> Also, the Matlab multiplication reports both the real and the imaginary part
> of the result, which appears complex, but the real part matches the QT and
> dgemv outputs.
>
>
>
> Best regards
>
>
> George
> artilogica(a)btconnect.com
>
> ----- Original Message -----
> From: "Michel Juillard" <michel.juillard(a)ens.fr>
> To: "andrea pagano" <pagano.andrea(a)gmail.com>
> Cc: "G. Perendia" <george(a)perendia.orangehome.co.uk>
> Sent: Friday, June 19, 2009 1:33 PM
> Subject: Re: Quasi triangular matrices in Kalman filters
>
>
>
>> Thanks Andrea
>>
>> amities
>>
>> Michel
>>
>> andrea pagano wrote:
>>
>>> Hi all
>>> I would go for subroutines. I will do it over the weekend while
>>> looking at other possibilities such as Fortran pointers.
>>>
>>>
>>> Best
>>>
>>> Andrea
>>>
>>>
>>> On Fri, Jun 19, 2009 at 10:04 AM, G.
>>> Perendia<george(a)perendia.orangehome.co.uk> wrote:
>>>
>>>
>>>> Dear Andrea
>>>>
>>>> Problem:
>>>>
>>>> I have encountered a problem integrating KalmanFilter with the
>>>> f90 QT library - passing the QT result arrays back to C++.
>>>>
>>>> QT Fortran routines have been written in standard Fortran FUNCTION
>>>> format (i.e., not SUBROUTINE), so that they return the two- or
>>>> one-dimensional array they are named by, by value (not by reference).
>>>> However, as it appears, only simple, single variables (e.g. INT or REAL)
>>>> can be passed from Fortran FUNCTIONs back to C++.
>>>>
>>>> On the other hand, NAG, BLAS and LAPACK routines have all been written as
>>>> Fortran SUBROUTINEs and they can be integrated with C more easily: they
>>>> receive parameters and return their results through the variables passed
>>>> as calling parameters, by reference.
>>>>
>>>> For example, dgemv.f from the BLAS library gets Y by reference and returns
>>>> the modified Y through that same calling parameter.
>>>>
>>>> SUBROUTINE DGEMV(TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY)
>>>> ....
>>>> * Y - DOUBLE PRECISION array of DIMENSION at least
>>>> ...
>>>> * Before entry .... the incremented array Y
>>>> * must contain the vector y. On exit, Y is overwritten by the
>>>> * updated vector y.
>>>> ....
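
To make the by-reference convention concrete, here is a small C++ sketch of how such a SUBROUTINE-style interface looks from the C++ side. It is a stand-in under stated assumptions, not the BLAS routine itself: ref_dgemv_n and times_vec are hypothetical names, only the TRANS = 'N' case with unit increments is covered, and the matrix is column-major. Every argument is a pointer because Fortran passes arguments by reference, and the result comes back through y, which the caller owns.

```cpp
#include <cassert>

// Stand-in with a reduced DGEMV-like argument list (hypothetical name):
// computes y := alpha*A*x + beta*y for a column-major m x n matrix A.
// All scalars arrive as pointers, mirroring Fortran pass-by-reference.
extern "C" void ref_dgemv_n(const int* m, const int* n, const double* alpha,
                            const double* a, const int* lda,
                            const double* x, const double* beta, double* y) {
    assert(*lda >= *m);
    // Scale (or clear) y first; with beta == 0 the old contents are ignored.
    for (int i = 0; i < *m; ++i)
        y[i] = (*beta == 0.0) ? 0.0 : (*beta) * y[i];
    // Accumulate column by column, which walks A contiguously in memory.
    for (int j = 0; j < *n; ++j) {
        const double t = (*alpha) * x[j];
        for (int i = 0; i < *m; ++i)
            y[i] += t * a[i + j * (*lda)];
    }
}

// C++-side convenience wrapper for a square T: y is overwritten with T*a.
void times_vec(int n, const double* T, const double* a, double* y) {
    const double one = 1.0, zero = 0.0;
    ref_dgemv_n(&n, &n, &one, T, &n, a, &zero, y);
}
```

With the real library, the declaration would carry the full DGEMV argument list quoted above, and the symbol name seen from C++ (often lower case with a trailing underscore) depends on the Fortran compiler, so that part has to be checked against the toolchain in use.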
>>>>
>>>> Possible solutions:
>>>>
>>>> I could not find any references on how to get arrays from a Fortran
>>>> FUNCTION as a return value back to C - do you or anyone around you know
>>>> how to do it, if at all possible? In any case, passing arrays by value is
>>>> also not recommended as it is rather uneconomical, especially for larger
>>>> matrices.
>>>>
>>>> One alternative I can think of is the less explored option of returning a
>>>> Fortran pointer to the resulting array from the QT functions instead of
>>>> the array by value. I think I can work that one out, but suggestions are
>>>> more than welcome.
>>>>
>>>> I can see a few options:
>>>>
>>>> a) rewrite the QT library as SUBROUTINE instead of FUNCTION routines, or,
>>>>
>>>> b) try to use Fortran pointers and, if we can,
>>>> then also rewrite the QT library to return pointers, or,
>>>>
>>>> c) write 3 or more high level f90 shell SUBROUTINES calling the existing
>>>> and unmodified QT functions and performing all the operations needed to
>>>> construct the resulting Ta and TPT' (for both cases 1 and 2) instead of
>>>> doing low-level QT manipulation in C++.
>>>>
>>>> This way the QT library need not be changed and those new SUBROUTINEs will
>>>> also act as the interface with C++. I think this is the most productive
>>>> and optimal alternative of the three since those combination utilities
>>>> would have to be written anyway, except that it seems easier to do that
>>>> now in Fortran than in C++.
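
The shape of option c) can be sketched as follows. The thread proposes writing the shell in f90; C++ is used here only to illustrate the pattern, and every name (Mat, mat_mul, mat_transpose, tpt_shell) is hypothetical. The inner helpers return whole arrays by value, as the existing QT FUNCTIONs do, while the outer shell exposes a flat by-reference signature and hands back only the final TPT' through its output argument. The dense products below ignore the quasi-triangular structure and are placeholders for the real QT routines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Mat;  // column-major n x n matrix

// Placeholder for a low-level routine returning its result by value: C = A*B.
Mat mat_mul(int n, const Mat& A, const Mat& B) {
    assert(A.size() == static_cast<std::size_t>(n) * n);
    assert(B.size() == static_cast<std::size_t>(n) * n);
    Mat C(static_cast<std::size_t>(n) * n, 0.0);
    for (int j = 0; j < n; ++j)
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < n; ++i)
                C[i + static_cast<std::size_t>(j) * n] +=
                    A[i + static_cast<std::size_t>(k) * n] *
                    B[k + static_cast<std::size_t>(j) * n];
    return C;
}

// Placeholder for a low-level routine returning its result by value: A'.
Mat mat_transpose(int n, const Mat& A) {
    assert(A.size() == static_cast<std::size_t>(n) * n);
    Mat At(static_cast<std::size_t>(n) * n, 0.0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            At[i + static_cast<std::size_t>(j) * n] =
                A[j + static_cast<std::size_t>(i) * n];
    return At;
}

// Shell routine: one cross-language call computes out := T*P*T'.
// All arguments are passed by reference; the caller allocates out (n*n doubles).
extern "C" void tpt_shell(const int* n, const double* T, const double* P,
                          double* out) {
    const std::size_t len = static_cast<std::size_t>(*n) * (*n);
    const Mat t(T, T + len), p(P, P + len);
    const Mat r = mat_mul(*n, mat_mul(*n, t, p), mat_transpose(*n, t));
    for (std::size_t k = 0; k < len; ++k) out[k] = r[k];
}
```

The point of the design is that the by-value returns never cross the language boundary: the caller makes a single call per TPT' and only sees a plain array it allocated itself.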
>>>>
>>>> If you like and/or are busy, I can develop the Ta and the first case of
>>>> TPT' SUBROUTINES by Monday, whilst the second case may need more thinking
>>>> and a more granular approach to take advantage of multiple processors.
>>>>
>>>>
>>>> Please let me know your thoughts on this issue and whether you have time
>>>> to make the needed changes or additions in the f90 files.
>>>>
>>>> Best regards
>>>>
>>>> George
>>>> artilogica(a)btconnect.com
>>>>
>>>> ----- Original Message -----
>>>> From: "andrea pagano" <pagano.andrea(a)gmail.com>
>>>> To: <george(a)perendia.orangehome.co.uk>
>>>> Cc: "Michel Juillard" <Michel.Juillard(a)ens.fr>
>>>> Sent: Monday, June 01, 2009 7:47 PM
>>>> Subject: Quasi triangular matrices in Kalman filters
>>>>
>>>>
>>>>
>>>>
>>>>> Hi all
>>>>>
>>>>> I am sending you a set of Fortran routines to calculate the matrix
>>>>> expressions in the Kalman filter, together with some explanations.
>>>>>
>>>>> Hope they can be a starting point in optimizing the overall computation.
>>>>>
>>>>> Best
>>>>>
>>>>> Andrea
>>>>>
>>>>>
>>>>> --
>>>>> Andrea Pagano
>>>>> via Veratti VARESE
>>>>> tel. +3903321691261
>>>>> cell.+393403804397