Wrapper script to mark duplicates in a BAM file, and optionally remove them, using Sambamba.
run_sambambadup(sambamba = "sambamba", bamfile = NULL, outfile = NULL, remove = FALSE, threads = 1, hash_table = 262144, overflow_size = 2e+05, io_buffer = 128)
| Argument | Description |
|---|---|
| sambamba | Path to the Sambamba executable. |
| bamfile | Character vector of paths to the input BAM file(s). |
| outfile | Name of the output file. If left as NULL, the suffix _markdup or _dedup is appended to the input name, indicating marking only or removal of duplicates, respectively. |
| remove | Boolean. If TRUE, duplicate reads are removed; if FALSE, they are only marked. |
| threads | Number of threads to use. |
| hash_table | Size of the hash table for finding read pairs (default is 262144 reads); rounded down to the nearest power of two. For best performance this should exceed (average coverage) * (insert size). |
| overflow_size | Size of the overflow list, where reads thrown out of the hash table get a second chance to meet their pairs (default is 200000 reads); increasing it reduces the number of temporary files created. |
| io_buffer | Size (in MB) of each of the two buffers used for reading and writing BAM during the second pass (default is 128). |
A BAM file in which duplicate reads have been marked or removed.
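Under the hood, a wrapper like this presumably assembles and runs a `sambamba markdup` command from its arguments. A minimal shell sketch of that command line, assuming the standard `sambamba markdup` flags (`--remove-duplicates`, `-t`, `--hash-table-size`, `--overflow-list-size`, `--io-buffer-size`); the file names and the REMOVE toggle here are illustrative placeholders, and the command is echoed rather than executed:

```shell
# Illustrative sketch only: the sambamba markdup invocation such a wrapper
# would build. File names and the REMOVE toggle are placeholder assumptions.
SAMBAMBA="sambamba"
BAMFILE="HB1_sample.bam"
OUTFILE="HB1_sample_markdup.bam"
REMOVE="FALSE"          # mirrors remove = FALSE (mark only)
THREADS=1
HASH_TABLE=262144
OVERFLOW_SIZE=200000
IO_BUFFER=128

# Add --remove-duplicates only when duplicates should be dropped, not just marked
DUPFLAG=""
[ "$REMOVE" = "TRUE" ] && DUPFLAG="--remove-duplicates"

CMD="$SAMBAMBA markdup $DUPFLAG -t $THREADS \
--hash-table-size=$HASH_TABLE \
--overflow-list-size=$OVERFLOW_SIZE \
--io-buffer-size=$IO_BUFFER \
$BAMFILE $OUTFILE"

# Echo rather than execute, since sambamba may not be on PATH
echo "$CMD"
```

With remove = TRUE, the same sketch would simply splice `--remove-duplicates` into the command.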
# NOT RUN {
run_sambambadup(
  sambamba = "sambamba",
  bamfile = "HB1_sample.bam",
  outfile = "HB1_sample_markdup.bam",
  remove = FALSE,
  threads = parallel::detectCores() - 1,
  hash_table = 1000000,
  overflow_size = 1000000
)
# }