An R-based wrapper for fastp

Run the fastp tool

trim_fastq(fastp = "fastp", fastq1, fastq2 = NULL, dest.dir = NULL,
  disable_adapter_trimming = FALSE, adapter_sequence = NULL,
  adapter_sequence_r2 = NULL, trim_front1 = 0, trim_front2 = 0,
  trim_tail1 = 0, trim_tail2 = 0, trim_poly_g = FALSE,
  poly_g_min_len = 10, trim_poly_x = FALSE, poly_x_min_len = 10,
  cut_by_quality5 = FALSE, cut_by_quality3 = FALSE,
  cut_window_size = 4, cut_mean_quality = 20,
  disable_quality_filtering = FALSE, qualified_quality_phred = 15,
  unqualified_percent_limit = 40, n_base_limit = 5,
  disable_length_filtering = FALSE, length_required = 15,
  length_limit = 0, low_complexity_filter = FALSE,
  complexity_threshold = 30, filter_by_index1 = NULL,
  filter_by_index2 = NULL, filter_by_index_threshold = 0,
  correction = FALSE, overlap_len_require = 30,
  overlap_diff_limit = 5, overrepresentation_analysis = FALSE,
  overrepresentation_sampling = 20, threads = NULL)
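A minimal sketch of a paired-end call follows. It assumes that the package providing `trim_fastq` is loaded. The file names are hypothetical, and the option values are illustrative rather than recommendations. Building the read 2 names from the sorted read 1 names is one simple way to guarantee that `fastq2` is in the same order as `fastq1`. The call is guarded, so it only runs when `trim_fastq` exists, fastp is found on $PATH, and the input files are present.

```r
# Hypothetical paired-end inputs that follow a <sample>_R1 / <sample>_R2 naming scheme.
fq1 <- sort(c("sampleB_R1.fastq.gz", "sampleA_R1.fastq.gz"))
fq2 <- sub("_R1", "_R2", fq1, fixed = TRUE)

# fastq2 must list the mates in the same order as fastq1: check that
# stripping the read suffix gives identical sample names in both vectors.
stopifnot(length(fq1) == length(fq2),
          identical(sub("_R1", "", fq1, fixed = TRUE),
                    sub("_R2", "", fq2, fixed = TRUE)))

# Only run when the wrapper, the fastp executable and the input files are available.
if (exists("trim_fastq") && nzchar(Sys.which("fastp")) &&
    all(file.exists(c(fq1, fq2)))) {
  trim_fastq(fastq1 = fq1, fastq2 = fq2,
             trim_poly_g = TRUE,                       # force polyG tail trimming
             cut_by_quality3 = TRUE,                   # sliding-window cutting at 3' end
             cut_window_size = 4, cut_mean_quality = 20,
             length_required = 36,                     # illustrative minimum length
             threads = 2)
}
```

If `fastq2` is omitted, the same call trims `fq1` as single-end data.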

Arguments

fastp	a character string specifying the path to the fastp executable. [DEFAULT = "fastp"].
fastq1	a character vector indicating the read files to be trimmed.
fastq2	(optional) a character vector indicating the read 2 files to be trimmed. If specified, the reads are assumed to be paired, and this vector MUST be in the same order as the files listed in `fastq1`. If `NULL`, the reads are assumed to be single-end. [DEFAULT = NULL]
dest.dir	a character string specifying the output directory. If NULL, a directory named "TRIMMED_FASTQC" is created in the current working directory. [DEFAULT = NULL].
disable_adapter_trimming	logical, if TRUE adapter trimming is disabled. [DEFAULT = FALSE]
adapter_sequence	character string specifying the adapter for read1. For SE data, the adapter is auto-detected if it is not specified. For PE data, this is used if R1 and R2 are found not to overlap. [DEFAULT = NULL]
adapter_sequence_r2	character string specifying the adapter for read2 (PE data only). This is used if R1 and R2 are found not to overlap. If not specified, it is the same as `adapter_sequence`. [DEFAULT = NULL]
trim_front1	integer specifying the number of bases to trim at the 5' end of read1 [DEFAULT = 0]
trim_front2	integer specifying the number of bases to trim at the 5' end of read2 [DEFAULT = 0]
trim_tail1	integer specifying the number of bases to trim at the 3' end of read1 [DEFAULT = 0]
trim_tail2	integer specifying the number of bases to trim at the 3' end of read2 [DEFAULT = 0]
trim_poly_g	logical, if TRUE, force polyG tail trimming [DEFAULT = FALSE].
poly_g_min_len	integer specifying the minimum length to detect polyG in the read tail. [DEFAULT = 10]
trim_poly_x	logical, if TRUE, enable polyX trimming at the 3' end. [DEFAULT = FALSE]
poly_x_min_len	integer specifying the minimum length to detect polyX in the read tail. [DEFAULT = 10]
cut_by_quality5	logical, if TRUE enable per-read cutting by quality at the 5' end (WARNING: this will interfere with deduplication for both PE and SE data) [DEFAULT = FALSE]
cut_by_quality3	logical, if TRUE enable per-read cutting by quality at the 3' end (WARNING: this will interfere with deduplication for SE data) [DEFAULT = FALSE]
cut_window_size	integer specifying the size, in bases, of the sliding window used for quality cutting [DEFAULT = 4]
cut_mean_quality	integer specifying the mean phred quality threshold within a sliding window for removing bases [DEFAULT = 20]
disable_quality_filtering	logical, if TRUE then quality filtering is disabled. [DEFAULT = FALSE]
qualified_quality_phred	integer specifying the phred quality at or above which a base is considered qualified. [DEFAULT = 15]
unqualified_percent_limit	numeric specifying the percentage of bases allowed to be unqualified (below `qualified_quality_phred`) before a read/pair is discarded. [DEFAULT = 40]
n_base_limit	integer specifying the number of uncalled bases (N) allowed before a read/pair is discarded. [DEFAULT = 5]
disable_length_filtering	logical, if TRUE then length filtering is disabled. [DEFAULT = FALSE]
length_required	integer specifying the length below which reads will be discarded. [DEFAULT = 15]
length_limit	integer specifying the length above which reads will be discarded; if 0 then no limit applied. [DEFAULT = 0]
low_complexity_filter	logical, if TRUE then enable the low complexity filter. The complexity is defined as the percentage of bases that are different from the next base (base[i] != base[i+1]). [DEFAULT = FALSE]
complexity_threshold	numeric specifying the threshold for the low complexity filter (0~100). [DEFAULT = 30]
filter_by_index1	character string specifying a file containing a list of barcodes of index1 to be filtered out, one barcode per line. [DEFAULT = NULL]
filter_by_index2	character string specifying a file containing a list of barcodes of index2 to be filtered out, one barcode per line. [DEFAULT = NULL]
filter_by_index_threshold	integer specifying the number of differences allowed between an index and a listed barcode for index filtering; 0 means they must be identical. [DEFAULT = 0]
correction	logical, if TRUE then enable base correction in overlapped regions (PE data only). [DEFAULT = FALSE]
overlap_len_require	integer specifying the minimum length of the overlapped region for overlap analysis based adapter trimming and correction. [DEFAULT = 30]
overlap_diff_limit	integer specifying the maximum difference of the overlapped region for overlap analysis based adapter trimming and correction. [DEFAULT = 5]
overrepresentation_analysis	logical, if TRUE then enable overrepresented sequence analysis. [DEFAULT = FALSE]
overrepresentation_sampling	integer specifying how many reads are sampled for overrepresentation analysis, e.g. if set to 20, then 1 in 20 reads will be sampled. May range from 1 to 10000; smaller values are slower. [DEFAULT = 20]
threads	an integer value indicating the number of workers to be used. If NULL then one less than the maximum number of cores will be used. [DEFAULT = NULL].
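To make the `low_complexity_filter` definition concrete, the sketch below computes the complexity of a single read in R and compares it with `complexity_threshold`. The helper names `read_complexity` and `passes_complexity` are hypothetical and are not part of this package. Dividing by the number of adjacent base pairs (read length minus 1) is an assumption about how fastp normalises the count. This is an illustration of the definition, not fastp's own code.

```r
# Percentage of bases that differ from the next base (base[i] != base[i+1]).
# Assumption: the denominator is the number of adjacent pairs, nchar(seq) - 1.
read_complexity <- function(seq) {
  bases <- strsplit(seq, "")[[1]]
  n <- length(bases)
  if (n < 2) return(0)  # a read of length 0 or 1 has no adjacent pairs
  100 * sum(bases[-n] != bases[-1]) / (n - 1)
}

# A read passes the filter when its complexity reaches the threshold (0~100).
passes_complexity <- function(seq, complexity_threshold = 30) {
  read_complexity(seq) >= complexity_threshold
}

read_complexity("AAAAAAAAAA")    # 0: no base differs from its neighbour
read_complexity("ACGTACGTAC")    # 100: every base differs from the next
passes_complexity("AAAAAAAAAC")  # FALSE: 1 of 9 pairs differs (about 11%)
```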

Details

This function runs the fastp tool and requires fastp to be installed. Pre-compiled binaries and installation instructions may be found at https://github.com/OpenGene/fastp

fastp path

If the executable is in $PATH, then the default value of `fastp` ("fastp") will work. If it is not in $PATH, then the absolute path should be given. If using Windows 10, it is assumed that fastp has been installed in WSL, and the same rules apply.
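A minimal sketch of resolving the `fastp` argument before calling `trim_fastq` follows. It uses base R's `Sys.which()`, which returns an empty string when the executable is not on $PATH. The fallback path `/opt/fastp/fastp` is purely hypothetical and should be replaced with the real absolute path on your system. The sketch does not cover the WSL setup on Windows.

```r
# Look for fastp on $PATH; Sys.which() returns "" if it is not found.
fastp_path <- unname(Sys.which("fastp"))

if (!nzchar(fastp_path)) {
  # Hypothetical absolute location; replace with where fastp is actually installed.
  fastp_path <- "/opt/fastp/fastp"
}

fastp_path
```

The result can then be passed as `trim_fastq(fastp = fastp_path, fastq1 = ...)`.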

References

Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu (2018): fastp: an ultra-fast all-in-one FASTQ preprocessor. BioRxiv 274100; https://doi.org/10.1101/274100
