Null Simulation

Motivation

The goal of the Null Generating Process is to result in the theoretical null hypothesis of "no effect". The main rationale behind this simulation is that most of the genome is still believed to be irrelevant. It is thus of particular importance that the proposed semi-parametric estimators are not too sensitive to noise in order to control the false discovery rate. Since we are interested in the effects of genetic variants on traits, the data generating process must satisfy: $∀ j, Y ⫫ V_j$, where $Y$ is a given trait. In practice, we enforce an even stronger condition, where all variables are pairwise independent. This is done by drawing independently $n$ samples with replacement from the empirical marginal distribution of each variable. The only exception is that $(PCs, C)$ are sampled jointly. This generating process hence preserves many characteristics of the original dataset, while resulting in the Null hypothesis.

Null Simulation

Running the Workflow

To run the null simulation, the NULL_SIMULATION entry should be added to the Nextflow command-line as follows

nextflow run https://github.com/TARGENE/targene-pipeline/ -r v0.11.1 -entry NULL_SIMULATION

Output

The output is a "null_simulation_results.hdf5" file (see Description of Simulations Outputs).