...
As an alternative to BlueBee, we recommend the publicly available UMI-Tools package, which is hosted on GitHub. Detailed documentation is available on Read the Docs. The following command extracts the 6 nt UMI sequence from the start of each read and removes the adjacent 4 nt TATA spacer:
Code Block | |||
---|---|---|---|
| ||
umi_tools extract --extract-method=regex --bc-pattern "(?P<umi_1>.{6})(?P<discard_1>.{4}).*" -L "/path/to/my_outputlog.txt" -I "/path/to/my_input.fastq.gz" -S "/path/to/my_output.fastq.gz" |
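To make the pattern concrete, the following Python sketch applies the same named groups to a single read. It is only an illustration of what the regex captures, not UMI-Tools' own code: `extract_umi` is a hypothetical helper, the example read is made up, and the sketch assumes the default behaviour of appending the UMI to the read name with an underscore.

```python
import re

# Same layout as the --bc-pattern above: a 6 nt UMI followed by a 4 nt spacer.
PATTERN = re.compile(r"(?P<umi_1>.{6})(?P<discard_1>.{4}).*")


def extract_umi(name, seq, qual):
    """Hypothetical helper mimicking the effect of the extract step on one read.

    Returns (new_name, trimmed_seq, trimmed_qual), or None if the read is too
    short to contain both the UMI and the spacer.
    """
    match = PATTERN.match(seq)
    if match is None:
        return None
    umi = match.group("umi_1")
    # Everything up to the end of the spacer is removed from the read.
    keep_from = match.end("discard_1")
    # Assumption: the UMI is appended to the read name with "_".
    return f"{name}_{umi}", seq[keep_from:], qual[keep_from:]


# Made-up read: UMI "ACGTGA", spacer "TATA", insert "GGCCTTAA".
new_name, new_seq, new_qual = extract_umi(
    "read1", "ACGTGATATAGGCCTTAA", "I" * 18
)
print(new_name)  # read1_ACGTGA
print(new_seq)   # GGCCTTAA
```

Because the spacer group is written as `.{4}`, any four bases at that position are discarded, not only `TATA`.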
After alignment, reads can be deduplicated with the following command:
Code Block | ||
---|---|---|
| ||
umi_tools dedup -I example.bam --output-stats=deduplicated -S deduplicated.bam |
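The idea behind UMI-based deduplication can be sketched in a few lines of Python. The example below is a deliberately simplified exact-match version: reads count as duplicates only if they share chromosome, position, strand, and an identical UMI. It is not the method UMI-Tools uses by default, which additionally accounts for sequencing errors in the UMI. The `Read` tuple, the `dedup_exact` function, and the example reads are all hypothetical, and the sketch assumes the UMI is the last underscore-separated field of the read name.

```python
from collections import namedtuple

# Hypothetical minimal read record; a real workflow would read these from a BAM file.
Read = namedtuple("Read", "name chrom pos strand mapq")


def dedup_exact(reads):
    """Keep one read per (chrom, pos, strand, UMI), preferring higher MAPQ."""
    best = {}
    for read in reads:
        # Assumption: the UMI was appended to the read name as "<name>_<UMI>".
        umi = read.name.rsplit("_", 1)[-1]
        key = (read.chrom, read.pos, read.strand, umi)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return list(best.values())


reads = [
    Read("r1_ACGTGA", "chr1", 100, "+", 30),
    Read("r2_ACGTGA", "chr1", 100, "+", 42),  # duplicate of r1 with higher MAPQ
    Read("r3_TTGCAA", "chr1", 100, "+", 42),  # same position, different UMI
    Read("r4_ACGTGA", "chr1", 250, "+", 42),  # same UMI, different position
]
print([r.name for r in dedup_exact(reads)])
# ['r2_ACGTGA', 'r3_TTGCAA', 'r4_ACGTGA']
```

Only `r1` is dropped here: `r3` and `r4` differ from it in UMI or position and are therefore treated as distinct molecules.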
...
The deduplication method of UMI-Tools has been published here.
NOTE: The current implementation of this method can take some time and can consume significant memory. If you experience issues with run time or memory usage, please refer to these FAQs.
...