Note on performance and vectorization
This note explains the design choices made with a view to running a simulation pipeline with a realistic duration and number of sources.
Orbits
Orbits evolve along two dimensions, nSample x nLink, with nLink << nSample.
Travel times are scalars, whereas positions are 3D vectors.
The most efficient way to handle orbits is to loop implicitly over time samples using ndarrays (vectorized NumPy operations) rather than explicit Python loops.
In the orbit object, time input and output arguments are always treated as a collection of time samples, never as a single value.
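A minimal sketch of this convention, assuming a toy geometry (a rigid equilateral triangle whose centre moves on a circular 1 AU orbit) rather than the analytic orbits used by the pipeline. `ToyOrbits`, `ARM` and the (receiver, emitter) link labelling are hypothetical. Time is always promoted to a 1D array, so even a scalar input yields arrays with an nSample axis.

```python
import numpy as np

C = 299792458.0          # speed of light [m/s]
AU = 1.495978707e11      # astronomical unit [m]
YEAR = 365.25 * 86400.0  # Julian year [s]
ARM = 2.5e9              # hypothetical arm length [m]

# Hypothetical labelling of the 6 links as (receiver, emitter) pairs
LINKS = [(r, e) for r in range(3) for e in range(3) if r != e]
RECEIVERS = [r for r, _ in LINKS]
EMITTERS = [e for _, e in LINKS]


class ToyOrbits:
    """Hypothetical toy orbits: rigid equilateral triangle on a 1 AU circle.

    Time arguments are always collections of samples: scalars are promoted
    to 1D arrays, and every output has a leading nSample axis.
    """

    def positions(self, t):
        t = np.atleast_1d(np.asarray(t, dtype=float))     # (nSample,)
        alpha = 2 * np.pi * t / YEAR                        # orbital phase
        centre = AU * np.stack(
            [np.cos(alpha), np.sin(alpha), np.zeros_like(alpha)], axis=-1
        )                                                   # (nSample, 3)
        beta = 2 * np.pi * np.arange(3) / 3                 # fixed vertex angles
        offsets = ARM / np.sqrt(3) * np.stack(
            [np.cos(beta), np.sin(beta), np.zeros(3)], axis=-1
        )                                                   # (3 spacecraft, 3)
        return centre[:, None, :] + offsets[None, :, :]     # (nSample, 3, 3)

    def travel_times(self, t):
        pos = self.positions(t)
        # Instantaneous distance / c: the light-time correction is ignored
        sep = pos[:, RECEIVERS, :] - pos[:, EMITTERS, :]    # (nSample, nLink, 3)
        return np.linalg.norm(sep, axis=-1) / C             # (nSample, nLink)
```

For instance, one day at 5 s, `ToyOrbits().travel_times(np.arange(0.0, 86400.0, 5.0))`, returns an array of shape (17280, 6), and a scalar time still returns shape (1, 6).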
Current software performance:

| | Analytic |
|---|---|
| duration | 1 yr |
| time step | 5 s |
| position CPU time | 0.6 s |
| orbits peak memory | 800 MB |
| orbits memory | 500 MB |
| number of spacecraft | 3 |
| number of links | 6 |
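A back-of-the-envelope estimate of the raw size of the orbit arrays at this time step, assuming a 365.25-day year and dense float64 arrays. The helper names are hypothetical. The measured memory in the table also includes temporaries and other arrays, so this estimate is not expected to reproduce it.

```python
YEAR = 365.25 * 86400.0  # assumed Julian year [s]
FLOAT64_BYTES = 8


def n_samples(duration_s, dt_s):
    """Number of samples on a regular time grid of step dt_s."""
    return int(round(duration_s / dt_s))


def array_bytes(*shape, itemsize=FLOAT64_BYTES):
    """Raw size of a dense array, ignoring NumPy object overhead."""
    size = 1
    for dim in shape:
        size *= dim
    return size * itemsize


n = n_samples(YEAR, 5.0)          # 6_311_520 samples for 1 yr at 5 s
pos_bytes = array_bytes(n, 3, 3)  # 3 spacecraft x 3D positions: 454_429_440 bytes
tt_bytes = array_bytes(n, 6)      # 6 link travel times: 302_952_960 bytes
```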
Alpha and its cosine and sine are cached in memory to save CPU time, since they are used three times when computing positions. They are also reused for each link during projection.
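A minimal sketch of the caching idea, assuming a toy circular orbit. `CachedPhase` and its methods are hypothetical and do not reproduce the pipeline's analytic orbit formulas. The trigonometric arrays are evaluated once per time grid and then read by every consumer instead of calling `np.cos` and `np.sin` again.

```python
import numpy as np


class CachedPhase:
    """Hypothetical helper: evaluates alpha(t), cos(alpha), sin(alpha) once."""

    def __init__(self, t, omega, phase0=0.0):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        self.alpha = omega * t + phase0
        self.cos_alpha = np.cos(self.alpha)  # computed once, reused below
        self.sin_alpha = np.sin(self.alpha)  # computed once, reused below

    def circular_position(self, radius):
        # Toy circular orbit in the ecliptic plane, built from the cached arrays
        x = radius * self.cos_alpha
        y = radius * self.sin_alpha
        z = np.zeros_like(self.alpha)
        return np.stack([x, y, z], axis=-1)  # (nSample, 3)

    def rotate(self, vectors):
        # Illustrative per-link step (rotation by alpha about z) reusing the cache
        # vectors: (nSample, 3)
        c, s = self.cos_alpha, self.sin_alpha
        x, y, z = vectors[:, 0], vectors[:, 1], vectors[:, 2]
        return np.stack([c * x - s * y, s * x + c * y, z], axis=-1)
```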
HpHc
HpHc depends on source parameters and time, i.e. it has dimensions nSource x nSample.
When possible (galactic binaries), waveforms are computed on a 2D ndarray spanning both dimensions to avoid an explicit loop over sources; this option is not used in the simulation pipeline, however.
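A minimal sketch of the 2D layout, assuming a simplified monochromatic signal (no frequency derivative, no detector response) rather than the pipeline's galactic-binary model. `toy_gb_hphc` and its parameters are hypothetical. Source parameters are reshaped to column vectors so that NumPy broadcasting builds the (nSource, nSample) arrays without a Python loop.

```python
import numpy as np


def toy_gb_hphc(t, amp, freq, cos_inc, phi0):
    """Hypothetical monochromatic binary evaluated for all sources at once.

    t: (nSample,); amp, freq, cos_inc, phi0: (nSource,).
    Returns hp, hc with shape (nSource, nSample).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Column vectors (nSource, 1) broadcast against the row vector (1, nSample)
    phase = 2 * np.pi * freq[:, None] * t[None, :] + phi0[:, None]
    hp = amp[:, None] * 0.5 * (1 + cos_inc[:, None] ** 2) * np.cos(phase)
    hc = amp[:, None] * cos_inc[:, None] * np.sin(phase)
    return hp, hc
```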
The object provides methods to merge or split HpHc objects, so that a single HpHc object can hold one or several sources of the same kind.
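A minimal sketch of merge and split, assuming a hypothetical container `ToyHpHc` that stores hp and hc as (nSource, nSample) arrays on a shared time grid. The method names are illustrative, not the pipeline's API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyHpHc:
    """Hypothetical container: hp, hc of shape (nSource, nSample) on a shared grid t."""
    t: np.ndarray
    hp: np.ndarray
    hc: np.ndarray

    @property
    def n_source(self):
        return self.hp.shape[0]

    @classmethod
    def merge(cls, items):
        # Sources of the same kind can be stacked only if they share the time grid
        t = items[0].t
        if not all(np.array_equal(it.t, t) for it in items):
            raise ValueError("cannot merge HpHc objects with different time grids")
        return cls(t,
                   np.concatenate([it.hp for it in items], axis=0),
                   np.concatenate([it.hc for it in items], axis=0))

    def split(self):
        # One object per source; slicing with i:i + 1 keeps the 2D layout
        return [ToyHpHc(self.t, self.hp[i:i + 1], self.hc[i:i + 1])
                for i in range(self.n_source)]
```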
Below is a summary of assumptions and current software performance.
| | MBHB | EMRI | GB | SOBBH |
|---|---|---|---|---|
| duration | 1 yr | 1 yr | 2 yr | 2 yr |
| time step | 5 s | 15 s | 15 s | 5 s |
| hphc CPU time | 8 s | 68 s | 0.3 s | 15 s |
| interpolator CPU time | 1.3 s | 0.5 s | 1.9 s | 5.4 s |
| hphc peak memory | 1.4 GB | 800 MB | 400 MB | 1.2 GB |
| hphc memory | 400 MB | 200 MB | 250 MB | 600 MB |
| number of sources | 100 (?) | 10-100 (?) | 30e6 | 21e3 |
Numbers are given for a single source. MBHB memory consumption is high because of the Fourier-domain sampling (5 s).
Projected strains
Projected strain has dimensions nSource x nLink x nSample.
It combines quantities computed by Orbits, which depend on time and link, with HpHc, which depends on time and source.
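A minimal sketch of how the two sets of dimensions combine through NumPy broadcasting, assuming hypothetical link-dependent factors `fp`, `fc` of shape (nLink, nSample) derived from the orbits. The real projection also evaluates hp and hc at link-dependent delayed times, which this plain product leaves out.

```python
import numpy as np


def toy_projection(fp, fc, hp, hc):
    """Hypothetical projection by broadcasting.

    fp, fc: (nLink, nSample) link factors from the orbits.
    hp, hc: (nSource, nSample) from HpHc.
    Returns an array of shape (nSource, nLink, nSample).
    """
    # Insert a source axis into the orbit factors and a link axis into hp, hc
    return fp[None, :, :] * hp[:, None, :] + fc[None, :, :] * hc[:, None, :]
```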
We plan to distribute the workload over sources, since they can be processed independently. The cost is that orbit computations are duplicated across workers. A compromise is to process sources in batches.
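A minimal sketch of the batching compromise, with hypothetical placeholder callables `compute_orbits` and `project`. In a real deployment each batch would go to its own worker; here the batches run sequentially so that the number of orbit evaluations can be counted. Orbits are recomputed once per batch rather than once per source.

```python
import numpy as np


def run_in_batches(n_source, batch_size, compute_orbits, project):
    """Process sources in batches, computing orbits once per batch.

    compute_orbits() -> orbit quantities (hypothetical placeholder).
    project(orbits, source_ids) -> array with one row per source (hypothetical).
    Returns the concatenated results and the number of orbit evaluations.
    """
    n_batches = -(-n_source // batch_size)  # ceiling division
    results, n_orbit_calls = [], 0
    # Balanced batches, each holding at most batch_size sources
    for ids in np.array_split(np.arange(n_source), n_batches):
        orbits = compute_orbits()  # duplicated once per batch, not per source
        n_orbit_calls += 1
        results.append(project(orbits, ids))
    return np.concatenate(results, axis=0), n_orbit_calls
```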
The ProjectedStrain object therefore uses implicit loops over time samples, like the Orbits and HpHc objects.
An open question is how to handle the loop over arms; for now, it is an explicit loop.
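Both options can be compared on a single source with hypothetical link factors `fp`, `fc`, using the same simplification as a plain product without delays. The first function keeps an explicit loop over links with time vectorized; the second broadcasts over links as well. They produce the same array.

```python
import numpy as np


def project_loop(fp, fc, hp, hc):
    """Explicit loop over links, implicit loop over time samples.

    fp, fc: (nLink, nSample); hp, hc: (nSample,) for one source.
    """
    out = np.empty_like(fp)
    for link in range(fp.shape[0]):
        out[link] = fp[link] * hp + fc[link] * hc  # vectorized over time
    return out


def project_vectorized(fp, fc, hp, hc):
    """Broadcast over links too; allocates (nLink, nSample) temporaries."""
    return fp * hp[None, :] + fc * hc[None, :]
```

With only 6 iterations, the interpreter overhead of the loop is small compared with the per-iteration work on nSample elements, whereas the broadcast version holds full (nLink, nSample) temporaries and so has a higher peak memory.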
Below is a summary of assumptions and current software performance.
| | MBHB | EMRI | GB | SOBBH |
|---|---|---|---|---|
| duration | 1 yr | 1 yr | 2 yr | 2 yr |
| time step | 5 s | 15 s | 15 s | 5 s |
| proj CPU time (cpu/comput) | 100 s | > 5 min | 8 s | 3 min |
| proj CPU time (cpu/interp) | 21 s | 72 s | 11 s | 52 s |
| proj peak memory | 2.25 GB | 1.2 GB | 1.2 GB | 4 GB |
| number of sources | 100 (?) | 10-100 (?) | 30e6 | 21e3 |
| number of links | 6 | 6 | 6 | 6 |
Numbers are given for a single source projected onto the 6 links. Two options are studied:
- cpu/comput: hp and hc are computed twice for each link.
- cpu/interp: hp and hc are computed once, on a slightly extended time range, and then interpolated twice for each link. The interpolators for hp and hc are built only once.
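A minimal sketch of the cpu/interp option, assuming linear interpolation with `np.interp` as a stand-in for the pipeline's interpolator, and a hypothetical array `delays` holding the two evaluation delays of each link. The waveform functions are called once on a grid extended by `margin` on both sides, so every delayed time stays inside the tabulated range.

```python
import numpy as np


class LinearInterpolator:
    """Hypothetical interpolator: built once from tabulated samples, called many times."""

    def __init__(self, t_grid, values):
        self.t_grid = np.asarray(t_grid, dtype=float)
        self.values = np.asarray(values, dtype=float)

    def __call__(self, t):
        # Linear interpolation; np.interp requires an increasing t_grid
        return np.interp(t, self.t_grid, self.values)


def interp_option(t, delays, hp_func, hc_func, margin):
    """Evaluate hp, hc twice per link from interpolators built once.

    t: (nSample,) regular grid; delays: (nLink, 2) hypothetical delays [s].
    Returns shape (nLink, 2 evaluations, 2 polarizations, nSample).
    """
    if np.any(np.abs(delays) > margin):
        raise ValueError("margin must cover every delay")
    dt = t[1] - t[0]
    # Slightly extended time range, sampled at the same step
    t_ext = np.arange(t[0] - margin, t[-1] + margin + dt, dt)
    hp_interp = LinearInterpolator(t_ext, hp_func(t_ext))  # hp computed once
    hc_interp = LinearInterpolator(t_ext, hc_func(t_ext))  # hc computed once
    out = np.empty((delays.shape[0], 2, 2, t.size))
    for link in range(delays.shape[0]):
        for k in range(2):  # two interpolations per link
            t_k = t - delays[link, k]
            out[link, k, 0] = hp_interp(t_k)
            out[link, k, 1] = hc_interp(t_k)
    return out
```

Compared with the cpu/comput option, the waveform is evaluated once instead of twice per link; each link then pays only for interpolation.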