nf-core/sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
Version 3.0.2. The latest stable release is 3.6.0.
Define where the pipeline should find input data and save output data.
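
As a rough illustration of the input and output options in this section, the sketch below checks a samplesheet path against the \.csv$ pattern documented for the input file and tests whether an output directory is absolute. The function names, and the rule that any scheme:// URI counts as absolute, are assumptions made for this example and are not taken from the pipeline itself.

```python
import posixpath
import re

# Pattern documented in this section for the samplesheet path.
SAMPLESHEET_PATTERN = re.compile(r"\.csv$")

# Assumption for this sketch: a value such as s3://bucket/run counts as absolute.
URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def is_valid_samplesheet(path: str) -> bool:
    """Return True when the path ends in .csv, as the documented pattern requires."""
    return SAMPLESHEET_PATTERN.search(path) is not None


def is_absolute_outdir(outdir: str) -> bool:
    """Return True for URIs with a scheme or for absolute POSIX paths."""
    return bool(URI_SCHEME.match(outdir)) or posixpath.isabs(outdir)


if __name__ == "__main__":
    print(is_valid_samplesheet("samplesheet.csv"))
    print(is_absolute_outdir("s3://my-bucket/results"))
```

A check like this could run before launching a job on cloud storage, where the text asks for absolute paths.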

Starting step.
  type: string

Path to comma-separated file containing information about the samples in the experiment.
  type: string; pattern: \.csv$

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
  type: string

Most common options used for the pipeline
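
To make the tool-list pattern documented in this section more concrete, here is a minimal sketch that tests a comma-separated tool string against it. The helper names are hypothetical. The set-based check is an extra assumption of this example: because the documented pattern ends in [^,]+, a value such as strelka,foo still matches even though foo is not a listed tool.

```python
import re
from typing import List

# Tool names copied from the pattern documented for the tools option.
KNOWN_TOOLS = (
    "ascat", "cnvkit", "controlfreec", "deepvariant", "freebayes",
    "haplotypecaller", "manta", "merge", "mpileup", "msisensorpro",
    "mutect2", "snpeff", "strelka", "tiddit", "vep",
)

# Rebuild the documented pattern from the tool names.
TOOLS_PATTERN = re.compile(r"^((" + "|".join(KNOWN_TOOLS) + r")?,?)*[^,]+$")


def matches_documented_pattern(tools: str) -> bool:
    """Return True when the string satisfies the documented pattern."""
    return TOOLS_PATTERN.search(tools) is not None


def unknown_tools(tools: str) -> List[str]:
    """Hypothetical stricter check: list entries that are not known tool names."""
    return [name for name in tools.split(",") if name not in KNOWN_TOOLS]


if __name__ == "__main__":
    for value in ("strelka,mutect2", "strelka,", "strelka,foo"):
        print(value, matches_documented_pattern(value), unknown_tools(value))
```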

Specify how many reads each split of a FastQ file contains. Set to 0 to turn off splitting entirely.
  type: integer; default: 50000000

Enable when exome or panel data is provided.
  type: boolean

Path to a target BED file (for whole-exome or targeted sequencing) or an intervals file.
  type: string

Estimate interval size.
  type: number; default: 1000

Disable usage of intervals.
  type: boolean

Tools to use for variant calling and/or for annotation.
  type: string; pattern: ^((ascat|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|manta|merge|mpileup|msisensorpro|mutect2|snpeff|strelka|tiddit|vep)?,?)*[^,]+$

Disable specified tools.
  type: string; pattern: ^((baserecalibrator|baserecalibrator_report|bcftools|documentation|fastqc|markduplicates|markduplicates_report|mosdepth|multiqc|samtools|vcftools|versions)?,?)*[^,]+$

Trim FastQ files or handle UMIs

Run FastP for read trimming.
  type: boolean

Remove bp from the 5’ end of read 1.
  type: integer

Remove bp from the 5’ end of read 2.
  type: integer

Remove bp from the 3’ end of read 1.
  type: integer

Remove bp from the 3’ end of read 2.
  type: integer

Removing poly-G tails.
  type: integer

Save trimmed FastQ file intermediates.
  type: boolean

Specify UMI read structure.
  type: string

Default strategy with UMI.
  type: string; default: Adjacency

If set, publishes split FASTQ files. Intended for testing purposes.
  type: boolean

Configure preprocessing tools

Specify aligner to be used to map reads to reference genome.
  type: string

Save mapped BAMs.
  type: boolean

Saves output from Markduplicates & Baserecalibration as BAM file instead of CRAM.
  type: boolean

Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration.
  type: string; pattern: ^((baserecalibrator|markduplicates)?,?)*[^,]+$

Configure variant calling tools

If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.
  type: boolean

Turn on the joint germline variant calling for GATK haplotypecaller.
  type: boolean

Overwrite ASCAT min base quality required for a read to be counted.
  type: number; default: 20

Overwrite ASCAT minimum depth required in the normal for a SNP to be considered.
  type: number; default: 10

Overwrite ASCAT min mapping quality required for a read to be counted.
  type: number; default: 35

Overwrite ASCAT ploidy.
  type: number

Overwrite ASCAT purity.
  type: number

Specify a custom chromosome length file.
  type: string

Overwrite Control-FREEC coefficientOfVariation.
  type: number; default: 0.05

Overwrite Control-FREEC contaminationAdjustment.
  type: boolean

Design known contamination value for Control-FREEC.
  type: number

Minimal sequencing quality for a position to be considered in BAF analysis.
  type: number

Minimal read coverage for a position to be considered in BAF analysis.
  type: number

Genome ploidy used by Control-FREEC.
  type: string; default: 2

Overwrite Control-FREEC window size.
  type: number

Panel-of-normals VCF (bgzipped) for GATK Mutect2.
  type: string

Index of PON panel-of-normals VCF.
  type: string

Do not analyze soft clipped bases in the reads for GATK Mutect2.
  type: boolean

Allow usage of fasta file for annotation with VEP.
  type: boolean

Enable the use of the VEP dbNSFP plugin.
  type: boolean

Path to dbNSFP processed file.
  type: string

Path to dbNSFP tabix indexed file.
  type: string

Consequence to annotate with.
  type: string

Fields to annotate with.
  type: string; default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF

Enable the use of the VEP LOFTEE plugin.
  type: boolean

Enable the use of the VEP SpliceAI plugin.
  type: boolean

Path to spliceai raw scores snv file.
  type: string

Path to spliceai raw scores snv tabix indexed file.
  type: string

Path to spliceai raw scores indel file.
  type: string

Path to spliceai raw scores indel tabix indexed file.
  type: string

Enable the use of the VEP SpliceRegion plugin.
  type: boolean

Path to snpEff cache.
  type: string

Path to VEP cache.
  type: string

VEP output-file format.
  type: string

Reference genome related files and options required for the workflow.
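
The FASTA genome file in this section carries the pattern \.fn?a(sta)?(\.gz)?$. As a small sketch of which file names that pattern accepts, the snippet below applies it to a few made-up names. The function name and the example file names are assumptions for illustration only.

```python
import re

# Pattern documented in this section for the FASTA genome file.
FASTA_PATTERN = re.compile(r"\.fn?a(sta)?(\.gz)?$")


def has_fasta_extension(path: str) -> bool:
    """Return True when the path ends in an extension the documented pattern accepts."""
    return FASTA_PATTERN.search(path) is not None


if __name__ == "__main__":
    # Hypothetical file names, chosen to exercise each optional part of the pattern.
    for name in ("genome.fasta", "genome.fa", "genome.fna.gz", "genome.fas", "genome.fa.fai"):
        print(name, has_fasta_extension(name))
```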

Name of iGenomes reference.
  type: string; default: GATK.GRCh38

ASCAT genome.
  type: string

Path to ASCAT allele zip file.
  type: string

Path to ASCAT loci zip file.
  type: string

Path to ASCAT GC content correction file.
  type: string

Path to ASCAT RT (replication timing) correction file.
  type: string

Path to BWA mem indices.
  type: string

Path to bwa-mem2 mem indices.
  type: string

Path to chromosomes folder used with Control-FREEC.
  type: string

Path to dbsnp file.
  type: string

Path to dbsnp index.
  type: string

Label string for VariantRecalibration (haplotypecaller joint variant calling).
  type: string

Path to FASTA dictionary file.
  type: string

Path to dragmap indices.
  type: string

Path to FASTA genome file.
  type: string; pattern: \.fn?a(sta)?(\.gz)?$

Path to FASTA reference index.
  type: string

Path to GATK Mutect2 Germline Resource File.
  type: string

Path to GATK Mutect2 Germline Resource Index.
  type: string

Path to known indels file.
  type: string

Path to known indels file index.
  type: string
  If you use AWS iGenomes, this has already been set for you appropriately.

1st label string for VariantRecalibration (haplotypecaller joint variant calling).
  type: string
  If you use AWS iGenomes, this has already been set for you appropriately.

Path to known snps file.
  type: string

Path to known snps file index.
  type: string
  If you use AWS iGenomes, this has already been set for you appropriately.

Label string for VariantRecalibration (haplotypecaller joint variant calling).
  type: string

Path to Control-FREEC mappability file.
  type: string

snpEff DB version.
  type: string

snpEff genome.
  type: string

snpEff version.
  type: string

VEP genome.
  type: string

VEP species.
  type: string

VEP cache version.
  type: number

VEP version.
  type: string

Save built references.
  type: boolean

Directory / URL base for iGenomes references.
  type: string; default: s3://ngi-igenomes/igenomes/

Do not load the iGenomes reference config.
  type: boolean

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.
  type: string; default: master

Base directory for Institutional configs.
  type: string; default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.
  type: string

Institutional config description.
  type: string

Institutional config contact information.
  type: string

Institutional config URL link.
  type: string

Sequencing center information to be added to read group (CN field).
  type: string

Sequencing platform information to be added to read group (PL field).
  type: string; default: ILLUMINA

Set the top limit for requested resources for any single job.
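
The memory and time limits in this section are strings constrained by regular expressions. The sketch below copies those two patterns and tests a few candidate values against them, which may help when choosing a value other than the documented defaults of 128.GB and 240.h. The function names and the non-default test values are assumptions of this example.

```python
import re

# Patterns copied from the memory and time limits documented in this section.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")


def is_valid_memory(value: str) -> bool:
    """Return True when the value satisfies the documented memory pattern."""
    return MEMORY_PATTERN.search(value) is not None


def is_valid_time(value: str) -> bool:
    """Return True when the value satisfies the documented time pattern."""
    return TIME_PATTERN.search(value) is not None


if __name__ == "__main__":
    for value in ("128.GB", "8 GB", "128G"):
        print(value, is_valid_memory(value))
    for value in ("240.h", "1h 30m", "2d"):
        print(value, is_valid_time(value))
```

Note that the memory pattern requires the trailing B, and the time pattern spells out day rather than accepting d.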

Maximum number of CPUs that can be requested for any single job.
  type: integer; default: 16

Maximum amount of memory that can be requested for any single job.
  type: string; default: 128.GB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Maximum amount of time that can be requested for any single job.
  type: string; default: 240.h; pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Less common options for the pipeline, typically set in a config file.

Display help text.
  type: boolean

Method used to save pipeline results to output directory.
  type: string

Email address for completion summary.
  type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Email address for completion summary, only when pipeline fails.
  type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.
  type: boolean

File size limit when attaching MultiQC reports to summary emails.
  type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.
  type: boolean

MultiQC report title. Printed as page header, used for filename if not otherwise specified.
  type: string

Custom config file to supply to MultiQC.
  type: string

Directory to keep pipeline Nextflow logs and reports.
  type: string; default: ${params.outdir}/pipeline_info

Whether to validate parameters against the schema at runtime.
  type: boolean; default: true

Show all params when using --help.
  type: boolean

Run this workflow with Conda. You can also use ‘-profile conda’ instead of providing this parameter.
  type: boolean