Wrapper script to mark duplicates in a BAM file, and optionally remove them, using Sambamba.
run_sambambadup(sambamba = "sambamba", bamfile = NULL, outfile = NULL, remove = FALSE, threads = 1, hash_table = 262144, overflow_size = 2e+05, io_buffer = 128)
| Argument | Description |
|---|---|
| sambamba | Path to the Sambamba executable. |
| bamfile | Character vector of paths to the input BAM file(s). |
| outfile | Name of the output file. If left as NULL, the suffix _markdup or _dedup is appended to the input name, indicating marking only or removal of duplicates, respectively. |
| remove | Boolean. If TRUE, duplicate reads are removed; if FALSE, they are only marked. |
| threads | Number of threads to use. |
| hash_table | Size of the hash table for finding read pairs (default is 262144 reads); rounded down to the nearest power of two. For best performance this should exceed (average coverage) * (insert size). |
| overflow_size | Size of the overflow list, where reads thrown out of the hash table get a second chance to meet their pairs (default is 200000 reads); increasing it reduces the number of temporary files created. |
| io_buffer | Size (in MB) of each of the two buffers used for reading and writing BAM during the second pass (default is 128). |
A BAM file in which duplicate reads have been marked or removed.
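Under the hood, a wrapper like this presumably assembles and runs a `sambamba markdup` command from its arguments. A minimal shell sketch of that command line, assuming the standard `sambamba markdup` flags (`--remove-duplicates`, `-t`, `--hash-table-size`, `--overflow-list-size`, `--io-buffer-size`); the file names and the REMOVE toggle here are illustrative placeholders, and the command is echoed rather than executed:

```shell
# Illustrative sketch only: the sambamba markdup invocation such a wrapper
# would build. File names and the REMOVE toggle are placeholder assumptions.
SAMBAMBA="sambamba"
BAMFILE="HB1_sample.bam"
OUTFILE="HB1_sample_markdup.bam"
REMOVE="FALSE"          # mirrors remove = FALSE (mark only)
THREADS=1
HASH_TABLE=262144
OVERFLOW_SIZE=200000
IO_BUFFER=128

# Add --remove-duplicates only when duplicates should be dropped, not just marked
DUPFLAG=""
[ "$REMOVE" = "TRUE" ] && DUPFLAG="--remove-duplicates"

CMD="$SAMBAMBA markdup $DUPFLAG -t $THREADS \
--hash-table-size=$HASH_TABLE \
--overflow-list-size=$OVERFLOW_SIZE \
--io-buffer-size=$IO_BUFFER \
$BAMFILE $OUTFILE"

# Echo rather than execute, since sambamba may not be on PATH
echo "$CMD"
```

With remove = TRUE, the same sketch would simply splice `--remove-duplicates` into the command.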
# NOT RUN {
run_sambambadup(
  sambamba = "sambamba",
  bamfile = "HB1_sample.bam",
  outfile = "HB1_sample_markdup.bam",
  remove = FALSE,
  threads = parallel::detectCores() - 1,
  hash_table = 1000000,
  overflow_size = 1000000
)
# }