Dear all

If the FastQScreen report is what you want and it works for you, it might be possible to swap Bowtie2 for minimap2 (as the output format is the same). That would let it handle ONT data more smoothly, and it would work with Illumina data too, only faster.

If the main issue is bacterial contamination, i.e. more than one colony was sequenced, then Malte has built a new tool that not only detects mixed samples of different species, but also detects same-species mixes (e.g. two different strains of E. coli mixed together).

Kraken2 might work with data from the newer R10.4.1 flowcells, but I have not heard any reports of it yet. There might be problems with the long k-mers, and with the purely mapping-based approach without alignment.

The assembly approach would work, but the computational requirements will of course be higher. For KCRI this should not be a problem, however, as you have a pretty decent HPC setup (larger than our servers at DTU).

Best,

Philip Thomas Lanken Conradsen Clausen
Postdoc
National Food Institute
Kemitorvet
Building 204
2800 Kgs. Lyngby

From: Marco van Zwetselaar <io@zwets.it>
Sent: 13 May 2024 17:01
To: bioinfo-list@seqshare.org <bioinfo-list@seqshare.org>
Cc: Philip Thomas Lanken Conradsen Clausen <plan@food.dtu.dk>
Subject: Long read alternative for FastQScreen
 
Dear all,

In KCRI's Illumina Basecall workflow, I always used to run FastQScreen
(with GRCh38 for human, and UniVec_Core for contaminants) to detect
contamination.

It's slow, but it gives easily interpretable output (including graphs and
MultiQC integration), and if desired you can very easily make it "bucket"
the reads into the database(s) that they map to.

When I copied the job into the Nanopore basecall workflow I was in for a
surprise. The jobs took forever or ran out of requested HPC memory (if I
remember well, even 96G didn't cut it for some runs).

Clearly, like FastQC, FastQScreen is an oldie, and especially read
mapping (which is effectively what it does) has since been optimised a
lot. My intuition would be that e.g. KMA would do this in a fraction of
the time. (cc-ing Philip in case he's not on this list)

The only thing that FastQScreen effectively adds to the mapping is the
user-friendly table and graph.
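
To make that concrete, here is a minimal sketch of the table step in
Python. It assumes the reads were mapped once per database with a mapper
that writes PAF (minimap2's default output format), and it is not how
FastQScreen itself does it. The names reads_hit and screen_table are
made up for illustration, as is the min_mapq cut-off.

```python
from collections import defaultdict


def reads_hit(paf_lines, min_mapq=0):
    """Return the set of read names with at least one alignment.

    PAF is tab-separated; column 1 is the read name and column 12 is
    the mapping quality. min_mapq is a hypothetical filter.
    """
    hits = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if int(fields[11]) >= min_mapq:
            hits.add(fields[0])
    return hits


def screen_table(paf_by_db, total_reads, min_mapq=0):
    """Summarise hits per database, roughly in the spirit of a screen report.

    paf_by_db maps a database label to the PAF lines from mapping the
    same reads against that database. Returns (table, no_hit), where
    table[db] holds the number of reads hitting db, how many of those
    hit no other database, and the percentage of all reads.
    """
    per_db = {db: reads_hit(lines, min_mapq) for db, lines in paf_by_db.items()}

    # Count in how many databases each read has a hit.
    db_count = defaultdict(int)
    for names in per_db.values():
        for name in names:
            db_count[name] += 1

    table = {}
    for db, names in per_db.items():
        unique = sum(1 for name in names if db_count[name] == 1)
        pct = round(100 * len(names) / total_reads, 2) if total_reads else 0.0
        table[db] = {"hits": len(names), "unique": unique, "pct": pct}

    no_hit = total_reads - len(db_count)
    return table, no_hit
```

The "unique" column is the number of reads that hit only that one
database, which is the figure that matters most when deciding whether a
sample is contaminated. Graphs and MultiQC integration are not covered.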

Anyone keen to add a newer mapper to FastQScreen (it now has BWA and
Bowtie2 as options, if I remember well)? Alternatively, suggestions for
an alternative to FastQScreen?

During the CoP meeting, more suggestions were made:
  - Use Kraken, with the added advantage of getting much more
information than the "yes/no contamination" from FastQScreen, including
quantification of cross-sample/species contamination
  - Why detect contamination on reads when it's much easier to do this
on the assembly? Contaminants will come "falling out" anyway
  - Also: for assemblies there are tools such as CheckM (for quantifying
both completeness and contamination, including within-species)

Marco