SubSample Pipeline Description
FASTQ SubSampling and Filtering Pipeline Overview
- Accepts paired-end FASTQ files (.fastq.gz format)
- Automatically pairs R1 and R2 files based on naming convention
- Supports multiple sample processing in batch
- Validates input file sizes and optimizes compression
- Bowtie2 alignment against M. tuberculosis H37Rv reference genome
- Local alignment with dovetail option for better sensitivity
- Extracts only M.tb-specific sequences
- Outputs reads that do not align to the reference (non-M.tb sequences) separately
- Maintains paired-end read integrity
- Automatic file size monitoring (98 MB threshold)
- Seqtk-based random subsampling with fixed seed (1987)
- Adaptive read count reduction to keep output files under the size threshold
- Maintains paired-end read synchronization
- 7z compression for efficient storage
- Processed FASTQ files with consistent naming
- Optional M.tb-filtered sequences
- Unaligned reads (when filtering is applied)
- Compressed output for efficient transfer
- Results delivered via email notification
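
The automatic R1/R2 pairing listed above could look roughly like the sketch below. The overview does not state the actual naming convention, so the `_R1`/`_R2` token (optionally followed by a numeric suffix such as `_001`) is an assumption, and `pair_fastq_files` is a hypothetical helper name.

```python
import re
from pathlib import Path

# ASSUMPTION: files are named "<sample>_R1[_NNN].fastq.gz" and "<sample>_R2[_NNN].fastq.gz".
# The real pipeline's convention is not documented in the overview.
_READ_TOKEN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?P<suffix>(?:_\d+)?)\.fastq\.gz$"
)


def pair_fastq_files(paths):
    """Group FASTQ paths into {sample: (r1_path, r2_path)} and list unpaired samples."""
    found = {}
    for path in map(Path, paths):
        match = _READ_TOKEN.match(path.name)
        if match is None:
            continue  # not a recognised paired-end FASTQ name; ignore it
        key = match["sample"] + match["suffix"]
        found.setdefault(key, {})[match["read"]] = path

    # A sample is usable only when both mates are present.
    pairs = {s: (r["1"], r["2"]) for s, r in found.items() if len(r) == 2}
    unpaired = sorted(s for s, r in found.items() if len(r) != 2)
    return pairs, unpaired
```

Samples that appear in `unpaired` have only one mate and would need to be skipped or reported before batch processing starts.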
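
As an illustration of the Bowtie2 filtering step, the sketch below only builds a command line; it does not run it. `--local`, `--dovetail`, `--al-conc-gz`, and `--un-conc-gz` are existing Bowtie2 options, but the overview does not say which options the pipeline actually uses to separate M.tb reads from the rest. Splitting on concordant alignment, the index prefix, the thread count, and the output names are all assumptions.

```python
def bowtie2_filter_command(index_prefix, r1, r2, out_prefix, threads=4):
    """Build a Bowtie2 command that splits read pairs into aligned and unaligned files.

    ASSUMPTION: pairs are split by concordant alignment to the H37Rv index.
    The pipeline's real options and file names are not given in the overview.
    """
    return [
        "bowtie2",
        "--local",      # local alignment mode
        "--dovetail",   # accept mates that extend past each other as concordant
        "-p", str(threads),
        "-x", index_prefix,  # prefix of a prebuilt Bowtie2 index (hypothetical name)
        "-1", r1,
        "-2", r2,
        # Bowtie2 replaces "%" with 1 or 2 to name the per-mate files.
        "--al-conc-gz", f"{out_prefix}_mtb_R%.fastq.gz",
        "--un-conc-gz", f"{out_prefix}_unaligned_R%.fastq.gz",
        "-S", "/dev/null",   # discard the SAM output; only the FASTQ files are kept
    ]
```

Writing both mates through the `-conc` options keeps R1 and R2 in step, since a pair is always routed to the same output set.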
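
The size-driven subsampling described above might be sketched as follows. The 98 MB threshold and the seed 1987 come from the overview; the proportional scaling rule, the 5% safety margin, the binary interpretation of "MB", and both function names are assumptions made for illustration only.

```python
import math

SIZE_LIMIT_BYTES = 98 * 1024 * 1024  # 98 MB threshold (ASSUMPTION: binary megabytes)
SEED = 1987                          # fixed seed named in the overview


def next_read_count(current_reads, current_size_bytes,
                    limit=SIZE_LIMIT_BYTES, safety=0.95):
    """Return a reduced read count when the file exceeds the limit.

    ASSUMPTION: file size scales roughly linearly with read count, so the
    count is scaled by limit/size with a small safety margin.
    """
    if current_size_bytes <= limit:
        return current_reads  # already small enough; no subsampling needed
    scaled = math.floor(current_reads * (limit / current_size_bytes) * safety)
    # Always make progress, and never request fewer than one read.
    return max(1, min(scaled, current_reads - 1))


def seqtk_sample_command(fastq, n_reads, seed=SEED):
    """Build a `seqtk sample` command; seqtk writes the sampled FASTQ to stdout."""
    return ["seqtk", "sample", "-s", str(seed), fastq, str(n_reads)]
```

Running `seqtk sample` with the same seed and the same read count on R1 and R2 selects the same records from both files, which is what keeps the mates synchronized. If the compressed result is still above the threshold, `next_read_count` can be applied again to the new size.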