Alignment of DNA-seq reads
align_dna(threads = 1, output.dir = ".", hisat2 = "hisat2", samtools = "samtools", sambamba = "sambamba", species = c("human", "mouse"), idx = NULL, reads1 = NULL, reads2 = NULL, fastq = TRUE, fasta = FALSE, softClipPenalty = NULL, noSoftClip = FALSE, tmo = FALSE, secondary = FALSE, maxAlign = NULL, nomixed = FALSE, nodiscordant = FALSE, rgid = NULL, quiet = FALSE, non_deterministic = FALSE, maxInsert = NULL, memory = "1G", remove.mitochondrial = "MT", remove.duplicates = TRUE, hash_table = 262144, overflow_size = 2e+05, io_buffer = 128)
threads | an integer value indicating the number of parallel threads to be used by FastQC. [DEFAULT = maximum number of available threads - 1]. |
---|---|
output.dir | character string specifying the directory to which results will be saved. If the directory does not exist, it will be created. |
hisat2 | a character string specifying the path to the hisat2 executable. [DEFAULT = "hisat2"]. |
samtools | a character string specifying the path to the samtools executable. [DEFAULT = "samtools"]. |
sambamba | a character string specifying the path to the sambamba executable. [DEFAULT = "sambamba"]. |
species | character string specifying the name of the species. Only
|
idx | character vector specifying the basename of the index for the reference genome.
The basename is the name of any of the index files up to but not including the final
.1.ht2, etc. If |
reads1 | Character vector of mate1 reads. If specified, then reads.dir must be NULL. |
reads2 | Character vector of mate2 reads. If specified, then reads.dir must be NULL. Must be the same length as mate1. If single-end sequencing, then should be left as NULL. |
fastq | Logical indicating if reads are FASTQ files. |
fasta | Logical indicating if reads are FASTA files. |
softClipPenalty | Sets the maximum (MX) and minimum (MN) penalties for soft-clipping per base, both integers. Must be given in the format "MX,MN". |
noSoftClip | Logical indicating whether to disallow soft-clipping. |
tmo | Logical indicating whether to report only those reads aligning to known transcripts. |
secondary | Logical indicating whether to report secondary alignments. |
maxAlign | Integer indicating the maximum number of distinct primary alignments to search for each read. |
nomixed | By default, when hisat2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. If TRUE, this option disables that behavior. |
nodiscordant | By default, hisat2 looks for discordant alignments if it cannot find any concordant alignments. If true, this option disables that behavior. |
rgid | Character string, to which the read group ID is set. |
quiet | If TRUE, print nothing except alignments and serious errors. |
non_deterministic | When set to TRUE, HISAT2 re-initializes its pseudo-random generator for each read using the current time. |
maxInsert | The maximum fragment length for valid paired-end alignments. This option is valid only with noSplice = TRUE. |
memory | String specifying maximum memory per thread; suffix K/M/G recognized. |
remove.mitochondrial | Character string. If set, this will remove reads mapping to the mitochondrial genome. The string should match the reference name for the mitochindrial genome in the alignment file. Examples include "ChrM", "M" and "MT". |
remove.duplicates | If TRUE, duplicate reads will be removed. |
hash_table | Size of hash table for finding read pairs (default is 262144 reads); will be rounded down to the nearest power of two. For best performance should be > (average coverage) * (insert size). |
overflow_size | Size of the overflow list where reads, thrown out of the hash table, get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created. |
io_buffer | Controls sizes of the two buffers (in MB) used for reading and writing BAM during the second pass (default is 128). |
Raw and filtered BAM files