Running

Running the test pipeline

  • cd test
  • snakemake -j 1
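
Before launching, the jobs that snakemake plans to execute can be previewed with a dry run (see the Dry run section below):

snakemake -n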

Setting a working directory

Choose a location for your pipeline; all data products, log files and configuration files will be saved there.

  • cd sangria
  • runsim --init path/to/your/working/directory
  • cd path/to/your/working/directory
  • runsim --info

The runsim --info command gives a summary of the pipeline settings.

Ancillary data

GW sources are taken from ancillary catalogs, which need to be downloaded beforehand.

For now there are 4 catalogs, corresponding to the 4 source types included in the pipeline:

  • VGB.h5 for the 17 verification binaries;
  • Q3d_complete for the 692 MBHBs;
  • 007_SeBa_r105_ag_wdwd_pop_highres_P025g70_8col.npy for the 26 million detached galactic binaries;
  • AMCVn_GWR_MLDC_bulgefix_opt.npy for the 3 million interacting galactic binaries.

They can be retrieved on request from the LDC working group.
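
Once retrieved, the catalogs can be gathered in a single data directory, for example (path illustrative):

ls /path/to/your/data/directory
VGB.h5  Q3d_complete  007_SeBa_r105_ag_wdwd_pop_highres_P025g70_8col.npy  AMCVn_GWR_MLDC_bulgefix_opt.npy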

If the pipeline is to be run with Singularity, the data location must be bound into the container:

  • export SINGULARITY_BINDPATH="/path/to/your/data/directory:/data"
  • runsim --data /data
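
Putting these commands together with the containerized execution described in the Run with singularity section below, a typical session might look like this (paths illustrative):

export SINGULARITY_BINDPATH="/path/to/your/data/directory:/data"
runsim --data /data
snakemake --cores 2 --use-singularity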

Instrumental configuration

The instrumental configuration is given by the config.yml file.

Advanced users can also change the LISANode configuration by editing the lisanode_config.py file.

GW source configuration

The list of sources included in the simulation is given by the sources parameter in the config.yml file. For each source type, a dedicated parameter file is provided and can be adjusted; in particular, it gives the location of the input source catalog and the number of sources to include.
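
For illustration only, such a sources entry might look as follows; the key layout and the source names are assumptions, not the pipeline's actual defaults:

sources:          # GW source types to include in the simulation
  - vgb           # verification binaries
  - mbhb          # massive black hole binaries
  - dgb           # detached galactic binaries
  - igb           # interacting galactic binaries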

The galactic binary sources receive special handling because of their very large number. They are built by dedicated subworkflows, located in the dgb-pipeline and igb-pipeline subdirectories, and are therefore configured through their own config.yml files. In particular, the sampling can be lowered to increase the computing speed (dt parameter), and the number of batches can be set to obtain a high level of parallelization (nbatch parameter). These subworkflows are run automatically by the main one.
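
As a hypothetical example, a subworkflow config.yml might contain entries such as the following; only the parameter names dt and nbatch come from this documentation, and the values are illustrative:

dt: 15            # sampling parameter; adjust to trade accuracy for computing speed
nbatch: 100       # number of batches, i.e. the degree of parallelization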

Pipeline configuration

The pipeline is configured through parameters set in the config.yml files (a minimal sketch follows the list):

  • nbatch: the number of projections that can be computed independently; it corresponds to the maximum number of parallel jobs desired. The actual number of jobs used is set on the command line.

  • dirname: the name of the subdirectory in which the data products will be saved.
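
A hypothetical sketch of these two entries, with illustrative values:

nbatch: 16        # maximum number of independent projection jobs
dirname: run-01   # subdirectory where the data products are saved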

Running

The pipeline execution is controlled by snakemake commands; see snakemake --help.

For more documentation, see the official Snakemake documentation.

Dry run

It is useful to systematically check what snakemake will compute before running. To do so:

snakemake -nr

Snakemake checks the last modification date of the files used by all stages (rules) of the pipeline. If one of the input files has changed, the stage is (re)processed.

It is sometimes necessary to bypass this behavior by manually setting an early date for a given file:

touch --date='2020-03-01 10:00:00' config.yml
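
A related alternative is Snakemake's built-in --touch option, which marks existing output files as up to date without re-running any rule:

snakemake --touch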

Full run

By default, only the noisy TDI containing all GW sources together is produced. To also get an independent, noise-free TDI data set for each GW source type, run snakemake full.

Run on a laptop

snakemake --cores 2

The number of workers that can run in parallel is set by the --cores option (or -j).

Run on a cluster

snakemake --cores 2 --cluster "qsub -l nodes=1:ppn=1,walltime=96:00:00"

Job properties are given by the --cluster option. A specific amount of memory is set for the last stage of the pipeline (see l. 116 of the Snakefile). A default amount of memory can be set for the other rules by using:

snakemake --cores 2 --default-resources "mem_mb=10000" --cluster "qsub -l nodes=1:ppn=1,mem={resources.mem_mb}mb,walltime=96:00:00"

Run with singularity

In your working directory, download the Singularity image (see the README for instructions), then run:

snakemake --cores 2 --use-singularity

More on the pipeline execution

Log files are systematically produced by snakemake, and saved in the log subdirectory.

Snakemake can build a report on the run, containing plots of the GW strain as well as CPU statistics:

snakemake --report report.html

A graphical representation of the underlying workflow can be obtained via:

snakemake --dag | dot -Tpng > dag.png
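
For runs with many batches, the rule-level graph is often more readable than the full job DAG:

snakemake --rulegraph | dot -Tpng > rulegraph.png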

[Figure: pipeline DAG]