Run the fastp tool
trim_fastq(fastp = "fastp", fastq1, fastq2 = NULL, dest.dir = NULL, disable_adapter_trimming = FALSE, adapter_sequence = NULL, adapter_sequence_r2 = NULL, trim_front1 = 0, trim_front2 = 0, trim_tail1 = 0, trim_tail2 = 0, trim_poly_g = FALSE, poly_g_min_len = 10, trim_poly_x = FALSE, poly_x_min_len = 10, cut_by_quality5 = FALSE, cut_by_quality3 = FALSE, cut_window_size = 4, cut_mean_quality = 20, disable_quality_filtering = FALSE, qualified_quality_phred = 15, unqualified_percent_limit = 40, n_base_limit = 5, disable_length_filtering = FALSE, length_required = 15, length_limit = 0, low_complexity_filter = FALSE, complexity_threshold = 30, filter_by_index1 = NULL, filter_by_index2 = NULL, filter_by_index_threshold = 0, correction = FALSE, overlap_len_require = 30, overlap_diff_limit = 5, overrepresentation_analysis = FALSE, overrepresentation_sampling = 20, threads = NULL)
fastp | a character string specifying the path to the fastp executable. [DEFAULT = "fastp"]. |
---|---|
fastq1 | a character vector indicating the read files to be trimmed. |
fastq2 | (optional) a character vector indicating read files to be trimmed. If specified, it is assumed the reads are paired, and this vector MUST be in the same order as those listed in |
dest.dir | a character string specifying the output directory. If NULL a directory named "TRIMMED_FASTQC" is created in the current working directory. [DEFAULT = NULL]. |
disable_adapter_trimming | logical, if TRUE adapter trimming is disabled. [DEFAULT = FALSE] |
adapter_sequence | character string, specifying the adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not overlapped. [DEFAULT = NULL] |
adapter_sequence_r2 | character string, the adapter for read2 (PE data only). This is used if R1/R2 are found not overlapped. If not specified, it will be the same as <adapter_sequence>. [DEFAULT = NULL] |
trim_front1 | integer specifying number of bases to trim at 5' end for read1 [DEFAULT = 0] |
trim_front2 | integer specifying number of bases to trim at 5' end for read2 [DEFAULT = 0] |
trim_tail1 | integer specifying number of bases to trim at 3' end for read1 [DEFAULT = 0] |
trim_tail2 | integer specifying number of bases to trim at 3' end for read2 [DEFAULT = 0] |
trim_poly_g | logical, if TRUE, force polyG tail trimming [DEFAULT = FALSE]. |
poly_g_min_len | integer specifying the minimum length to detect polyG in the read tail. [DEFAULT = 10] |
trim_poly_x | logical, if TRUE, enable polyX trimming in 3' ends. [DEFAULT = FALSE] |
poly_x_min_len | integer specifying the minimum length to detect polyX in the read tail. [DEFAULT = 10] |
cut_by_quality5 | logical, if TRUE enable per read cutting by quality at 5' end (WARNING: this will interfere with deduplication for both PE/SE data) [DEFAULT = FALSE] |
cut_by_quality3 | logical, if TRUE enable per read cutting by quality at 3' end (WARNING: this will interfere with deduplication for SE data) [DEFAULT = FALSE] |
cut_window_size | integer specifying the base pair size of the sliding window for sliding window trimming [DEFAULT = 4] |
cut_mean_quality | integer specifying the mean phred quality threshold within a sliding window for removing bases [DEFAULT = 20] |
disable_quality_filtering | logical, if TRUE then quality filtering is disabled. [DEFAULT = FALSE] |
qualified_quality_phred | integer specifying the base quality threshold. [DEFAULT = 15] |
unqualified_percent_limit | numeric specifying the percentage of bases allowed to be below the threshold before a read/pair is discarded. [DEFAULT = 40] |
n_base_limit | integer specifying the number of allowable uncallable bases (N) before a read/pair is discarded. [DEFAULT = 5] |
disable_length_filtering | logical, if TRUE then length filtering is disabled. [DEFAULT = FALSE] |
length_required | integer specifying the length below which reads will be discarded. [DEFAULT = 15] |
length_limit | integer specifying the length above which reads will be discarded; if 0 then no limit applied. [DEFAULT = 0] |
low_complexity_filter | logical, if TRUE then enable low complexity filter. The complexity is defined as the percentage of bases that differ from the next base (base[i] != base[i+1]). [DEFAULT = FALSE] |
complexity_threshold | numeric specifying the threshold for the low complexity filter (0~100). [DEFAULT = 30] |
filter_by_index1 | character string specifying a file containing a list of barcodes of index1 to be filtered out, one barcode per line. [DEFAULT = NULL] |
filter_by_index2 | character string specifying a file containing a list of barcodes of index2 to be filtered out, one barcode per line. [DEFAULT = NULL] |
filter_by_index_threshold | the allowed difference of index barcode for index filtering; 0 means completely identical. [DEFAULT = 0] |
correction | logical, if TRUE then enable base correction in overlapped regions (only for PE data). [DEFAULT = FALSE] |
overlap_len_require | integer specifying the minimum length of the overlapped region for overlap analysis based adapter trimming and correction. [DEFAULT = 30] |
overlap_diff_limit | integer specifying the maximum difference of the overlapped region for overlap analysis based adapter trimming and correction. [DEFAULT = 5] |
overrepresentation_analysis | logical, if TRUE then enable overrepresented sequence analysis. [DEFAULT = FALSE] |
overrepresentation_sampling | integer specifying how many reads will be sampled for overrepresentation analysis, e.g. if set to 20, then 1 in 20 reads will be sampled. May range from 1 to 10000; smaller is slower. [DEFAULT = 20] |
threads | an integer value indicating the number of workers to be used. If NULL then one less than the maximum number of cores will be used. [DEFAULT = NULL]. |
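The complexity measure used by low_complexity_filter can be illustrated with a few lines of R. This is a minimal sketch, not the fastp implementation: `sequence_complexity` is a hypothetical helper, and dividing by the number of adjacent base pairs (read length minus one) is an assumption made for this illustration.

```r
# Hypothetical helper for illustration; not part of this package or of fastp.
# Returns the percentage of positions i where base[i] != base[i+1].
# Assumption: the denominator is the number of adjacent pairs (length - 1).
sequence_complexity <- function(seq) {
  bases <- strsplit(seq, "", fixed = TRUE)[[1]]
  n <- length(bases)
  if (n < 2) {
    return(0)  # fewer than two bases: no adjacent pair to compare
  }
  100 * sum(bases[-n] != bases[-1]) / (n - 1)
}

sequence_complexity("AAAAAAAAAA")  # homopolymer: no base differs from the next
sequence_complexity("ACGTACGTAC")  # every base differs from the next
```

Under this definition a homopolymer run scores 0 and a read with no repeated neighbouring bases scores 100, so with complexity_threshold = 30 the first read would count as low complexity and the second would not.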
This script runs the fastp tool and requires installation of fastp. Pre-compiled binaries and installation instructions may be found at https://github.com/OpenGene/fastp
If the executable is in $PATH, then the default value for paths ("fastp") will work. If it is not in $PATH, then the absolute path should be given. If using Windows 10, it is assumed that fastp has been installed in WSL, and the same rules apply.
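Whether the default will work can be checked from R before calling the function. The sketch below uses base R's `Sys.which()`; `find_executable` is a hypothetical helper, not part of this package. It only inspects the PATH of the current R session, so it is an assumption that this matches what the function itself sees, and it does not look inside WSL.

```r
# Hypothetical helper: return the resolved path to an executable, or stop
# with a message asking for an absolute path.
find_executable <- function(path = "fastp") {
  found <- unname(Sys.which(path))
  if (is.na(found) || !nzchar(found)) {
    stop("Executable not found: '", path, "'. Supply the absolute path.")
  }
  found
}

# NA if fastp is not on the PATH of this R session, otherwise its full path.
fastp_path <- tryCatch(find_executable("fastp"),
                       error = function(e) NA_character_)
```

If `fastp_path` is NA, pass the absolute path of the executable as the fastp argument instead of relying on the default.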
Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu (2018): fastp: an ultra-fast all-in-one FASTQ preprocessor. BioRxiv 274100; https://doi.org/10.1101/274100
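For paired-end data, the two file vectors must list the samples in the same order. The following sketch builds both vectors from a mixed file listing and checks the pairing before calling trim_fastq. The file names and the `_R1`/`_R2` naming convention are hypothetical, and the call itself is guarded so that it only runs when trim_fastq is loaded, fastp is on the PATH and the files exist.

```r
# Hypothetical file names, for illustration only.
fastq_files <- c("sampleB_R2.fastq.gz", "sampleA_R1.fastq.gz",
                 "sampleB_R1.fastq.gz", "sampleA_R2.fastq.gz")

# Split into mates and sort so both vectors follow the same sample order.
# Assumption: mates are distinguished by an _R1 / _R2 suffix.
fastq1 <- sort(grep("_R1\\.fastq\\.gz$", fastq_files, value = TRUE))
fastq2 <- sort(grep("_R2\\.fastq\\.gz$", fastq_files, value = TRUE))

# Check that the i-th entry of each vector belongs to the same sample.
sample1 <- sub("_R1\\.fastq\\.gz$", "", fastq1)
sample2 <- sub("_R2\\.fastq\\.gz$", "", fastq2)
stopifnot(identical(sample1, sample2))

# Run only if trim_fastq() is available, fastp is found and the files exist.
if (exists("trim_fastq") && nzchar(Sys.which("fastp")) &&
    all(file.exists(c(fastq1, fastq2)))) {
  trim_fastq(fastp = "fastp", fastq1 = fastq1, fastq2 = fastq2,
             dest.dir = "TRIMMED_FASTQC", threads = 2)
}
```

Deriving the sample names and comparing them catches a missing or misnamed mate before any trimming starts.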