Aligns reads to a reference genome using HISAT2. This requires an existing index, which can be created with HISAT2 itself. Commonly used genome indices can also be downloaded from the HISAT2 homepage.
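As a minimal sketch of the index-building step mentioned above, the helper below calls the `hisat2-build` program that ships with HISAT2, passing the reference FASTA and the index basename. The helper name `build_index` and the file names in the usage note are hypothetical and not part of this package. The helper does nothing if `hisat2-build` is not found on the PATH.

```r
# Hypothetical helper (not part of this package): build a HISAT2 index
# only if the hisat2-build executable can be located.
build_index <- function(fasta, basename, hisat2_build = "hisat2-build") {
  exe <- Sys.which(hisat2_build)
  if (!nzchar(exe)) {
    message("hisat2-build not found on PATH; skipping index build")
    return(invisible(NULL))
  }
  # hisat2-build takes the reference FASTA followed by the index basename
  status <- system2(exe, args = c(shQuote(fasta), shQuote(basename)))
  invisible(status)
}
```

For example, `build_index("genome.fa", "genome_index")` would be expected to write files such as `genome_index.1.ht2`, and `"genome_index"` would then be the value passed as `idx`.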
run_hisat2(hisat2 = "hisat2", idx = NULL, mate1 = NULL, mate2 = NULL,
           fastq = TRUE, fasta = FALSE, softClipPenalty = NULL,
           noSoftClip = FALSE, noSplice = FALSE, knownSplice = NULL,
           strand = NULL, tmo = FALSE, maxAlign = NULL, secondary = FALSE,
           minInsert = NULL, maxInsert = NULL, nomixed = FALSE,
           nodiscordant = FALSE, threads = 1, rgid = NULL, quiet = FALSE,
           non_deterministic = FALSE)
Argument | Description |
---|---|
hisat2 | Path to hisat2 (if using WSL, then this should be the full path on the linux subsystem) |
idx | The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.ht2, etc. |
mate1 | Comma-separated list of files containing mate 1s (filename usually includes _1) |
mate2 | Comma-separated list of files containing mate 2s (filename usually includes _2). Sequences specified with this option must correspond file-for-file and read-for-read with those specified in mate1. |
fastq | Logical indicating if reads are FASTQ files. |
fasta | Logical indicating if reads are FASTA files. |
softClipPenalty | Sets the maximum (MX) and minimum (MN) penalties for soft-clipping per base, both integers. Must be given in the format "MX,MN". |
noSoftClip | Logical indicating whether to disallow soft-clipping. |
noSplice | Logical indicating whether to switch off spliced alignment, e.g., for DNA-seq analysis. |
knownSplice | Path to text file containing known splice sites. |
strand | Specify strand-specific information. Default is unstranded. |
tmo | Logical indicating whether to report only those reads aligning to known transcripts. |
maxAlign | Integer indicating the maximum number of distinct primary alignments to search for each read. |
secondary | Logical indicating whether to report secondary alignments. |
minInsert | The minimum fragment length for valid paired-end alignments. This option is valid only with noSplice = TRUE. |
maxInsert | The maximum fragment length for valid paired-end alignments. This option is valid only with noSplice = TRUE. |
nomixed | By default, when hisat2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. If TRUE, this option disables that behavior. |
nodiscordant | By default, hisat2 looks for discordant alignments if it cannot find any concordant alignments. If TRUE, this option disables that behavior. |
threads | An integer indicating the number of workers to use. If NULL, one less than the maximum number of cores is used. [DEFAULT = 1, per the usage signature]. |
rgid | Character string to use as the read group ID. |
quiet | If TRUE, print nothing except alignments and serious errors. |
non_deterministic | When set to TRUE, HISAT2 re-initializes its pseudo-random generator for each read using the current time. |
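The `idx`, `mate1`, and `mate2` arguments all expect specially formatted strings, so a short sketch of how they might be prepared may help. The helpers below are hypothetical and not part of this package. The `.ht2l` suffix handled in the pattern is the extension HISAT2 uses for large indices. The pairing check only compares file counts; it cannot confirm that the files match read-for-read.

```r
# Hypothetical helpers, not part of this package.

# Strip the trailing ".<n>.ht2" (or ".<n>.ht2l" for large indices)
# from an index file name to recover the basename expected by idx.
index_basename <- function(index_file) {
  sub("\\.[0-9]+\\.ht2l?$", "", index_file)
}

# Collapse a character vector of file paths into one comma-separated string.
mate_list <- function(files) {
  paste(files, collapse = ",")
}

# Check that mate1 and mate2 name the same number of files.
same_file_count <- function(mate1, mate2) {
  n1 <- length(strsplit(mate1, ",", fixed = TRUE)[[1]])
  n2 <- length(strsplit(mate2, ",", fixed = TRUE)[[1]])
  n1 == n2
}

idx   <- index_basename("UCSC.hg19.1.ht2")
mate1 <- mate_list(c("A_1.fastq.gz", "B_1.fastq.gz"))
mate2 <- mate_list(c("A_2.fastq.gz", "B_2.fastq.gz"))
stopifnot(same_file_count(mate1, mate2))
```

The sample file names are placeholders. The resulting `idx`, `mate1`, and `mate2` strings could then be passed to `run_hisat2()`.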
Alignment file in SAM format
# NOT RUN {
run_hisat2(hisat2 = "hisat2",
           idx = "../prana/data-raw/index/UCSC.hg19",
           mate1 = "../prana/data-raw/seqFiles/HB1_sample_1.fastq.gz",
           mate2 = "../prana/data-raw/seqFiles/HB1_sample_2.fastq.gz",
           fastq = TRUE,
           fasta = FALSE,
           softClipPenalty = NULL,
           noSoftClip = FALSE,
           noSplice = FALSE,
           knownSplice = NULL,
           strand = NULL,
           tmo = FALSE,
           maxAlign = NULL,
           secondary = FALSE,
           minInsert = NULL,
           maxInsert = NULL,
           nomixed = FALSE,
           nodiscordant = FALSE,
           threads = (parallel::detectCores() - 1),
           rgid = NULL,
           quiet = FALSE,
           non_deterministic = TRUE)
# }
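The example passes `parallel::detectCores() - 1` as `threads`. That expression evaluates to 0 on a single-core machine and to `NA` when the core count cannot be determined. A guarded variant might look like the sketch below. The helper name `safe_threads` is hypothetical and not part of this package.

```r
# Hypothetical helper (not part of this package): choose a thread count
# that leaves `reserve` cores free but never drops below 1.
safe_threads <- function(reserve = 1L) {
  cores <- parallel::detectCores()  # may return NA on some platforms
  # na.rm = TRUE discards an NA core count, so the result falls back to 1
  max(1L, cores - reserve, na.rm = TRUE)
}

n_threads <- safe_threads()
```

The value of `n_threads` could then be supplied as `threads = n_threads` in the call above.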