Dear all
If the report from FastQScreen is desirable and works, then it might be possible to swap bowtie2 for minimap2 (as the output is in the same format). This would make it handle ONT data more smoothly, and it would work with Illumina data too (and faster).
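As a rough illustration of that swap (a sketch only, not something FastQScreen supports today), the mapper call could be assembled like this. The function name and the platform labels are hypothetical; `-a`, `-x` and `-t` are standard minimap2 options, and `map-ont` and `sr` are its presets for Nanopore and short reads.

```python
def minimap2_command(reference, reads, platform="ont", threads=4):
    """Build a minimap2 command line that writes SAM, like bowtie2 does.

    'platform' is a hypothetical label used only in this sketch to pick
    a minimap2 preset: map-ont for Nanopore reads, sr for short reads.
    """
    presets = {"ont": "map-ont", "illumina": "sr"}
    return [
        "minimap2",
        "-a",                    # output SAM instead of the default PAF
        "-x", presets[platform], # preset matching the read type
        "-t", str(threads),      # number of threads
        reference,
        *reads,                  # one FASTQ, or two for paired-end reads
    ]


if __name__ == "__main__":
    # File names are placeholders for illustration.
    print(" ".join(minimap2_command("screen_db.fa", ["reads.fastq.gz"])))
```

The resulting list can be passed to `subprocess.run` unchanged; whether FastQScreen's own parsing would accept minimap2's SAM without further changes is an assumption that would need checking.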
If the main issue is bacterial contamination, i.e. more than one colony was sequenced, then Malte has built a new tool that not only detects mixed samples of different species, but also allows detection of same-species mixes (e.g. two different strains of E. coli mixed together).
Kraken2 might work with the newer R10.4.1 flowcells, but I have not heard any reports of it yet. There might be problems with the long k-mers and with the purely mapping-based approach without alignment.
The assembly approach would work, but the computational requirements here will of course be higher. For KCRI this should not be a problem however, as you have a pretty decent HPC setup (larger than our servers at DTU).
Best,
Philip Thomas Lanken Conradsen Clausen
Postdoc
National Food Institute
plan@food.dtu.dk
Kemitorvet Building 204
2800 Kgs. Lyngby
www.food.dtu.dk
________________________________
From: Marco van Zwetselaar io@zwets.it
Sent: 13 May 2024 17:01
To: bioinfo-list@seqshare.org
Cc: Philip Thomas Lanken Conradsen Clausen plan@food.dtu.dk
Subject: Long read alternative for FastQScreen
Dear all,
In KCRI's Illumina Basecall workflow, I always used to run FastQScreen (with GRCh38 for human, and UniVec_Core for contaminants) to detect contamination.
It's slow but it gives easily interpretable output (including graphs and MultiQC integration), and if desired you can very easily make it "bucket" the reads into the database(s) that they map to.
When I copied the job into the Nanopore basecall workflow I was in for a surprise. The jobs took forever or ran out of requested HPC memory (if I remember well, even 96G didn't cut it for some runs).
Clearly, like FastQC, FastQScreen is an oldie, and especially read mapping (which is effectively what it does) has since been optimised a lot. My intuition would be that e.g. KMA would do this in a fraction of the time. (cc-ing Philip in case he's not on this list)
The only thing that FastQScreen effectively adds to the mapping is the user-friendly table and graph.
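To make that concrete, here is a minimal sketch of how such a table could be tallied from any mapper's output. It assumes one minimap2 PAF file per screening database (PAF column 1 is the read name, column 12 the mapping quality); the function name, the "unique"/"shared" labels and the `min_mapq` parameter are hypothetical and simpler than FastQScreen's real categories.

```python
from collections import defaultdict


def screen_table(paf_by_db, total_reads, min_mapq=0):
    """Tally reads per screening database from PAF alignment lines.

    paf_by_db:   dict mapping a database name to an iterable of PAF lines
    total_reads: number of reads that were screened
    Returns (table, no_hits), where table[db] counts reads that hit only
    that database ("unique") or that database and others ("shared").
    """
    hits = defaultdict(set)  # read name -> set of databases it mapped to
    for db, lines in paf_by_db.items():
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # skip malformed or truncated lines
            if int(fields[11]) >= min_mapq:
                hits[fields[0]].add(db)

    table = {db: {"unique": 0, "shared": 0} for db in paf_by_db}
    for dbs in hits.values():
        key = "unique" if len(dbs) == 1 else "shared"
        for db in dbs:
            table[db][key] += 1

    no_hits = total_reads - len(hits)
    return table, no_hits
```

Reads in the "unique" column for the human database would be the obvious candidates for removal, and the per-read database sets in `hits` are what the "bucketing" step would be built on.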
Anyone keen to add a newer mapper to FastQScreen (it now has BWA and Bowtie2 as options, if I remember well)? Alternatively, suggestions for an alternative to FastQScreen?
During the CoP meeting, more suggestions were made:
- Use Kraken (with the added advantage of getting much more information than the "yes/no contamination" from FastQScreen), including quantification of cross-sample/species contamination
- Why detect contamination on reads, when it's much easier to do this on the assembly, contaminants will come "falling out" anyway
- Also: for assemblies there are tools such as CheckM (for quantifying both completeness and contamination, including within-species)
Marco