May 2024 - Bioinfo List - SeqShare Mailing Lists

Fwd: Reminder to register | Genomic surveillance of pathogens | Frontiers Forum webinar
by Marco van Zwetselaar 29 May '24

Dear all,

FYI. Registration is open: https://fro.ntiers.in/znQ6

Kind regards,
Marco

-------- Forwarded Message --------
Subject: Reminder to register | Genomic surveillance of pathogens | Frontiers Forum webinar
Date: Wed, 29 May 2024 09:44:59 +0000
From: Laure Sonnier | Frontiers Forum [<fsci(a)events.frontiersin.org>](mailto:fsci@events.frontiersin.org)

Dear Marco,

Don't miss this complimentary Frontiers Forum Deep Dive webinar on 5 June to discuss technologies and enablers for global surveillance of infectious diseases and antimicrobial resistance.

The event is led by the former Chief Microbiologist at the European Centre for Disease Prevention and Control (ECDC), Prof Marc Struelens. Marc is joined by expert panel members Prof Vitali Sintchenko, Prof Guido Werner, Prof Margaret Ip, Dr Stephen A Morse, and Dr Josefina Campos.

[Register](http://email.events.frontiersin.org/c/eJxszL9q6zAUgPGnkTabo_PHtgY…

This interactive webinar builds on a recent Frontiers in Science article advocating for the integration of pathogen genomic data and epidemiological data into globally interconnected One Health surveillance systems.

[Read the article](http://email.events.frontiersin.org/c/eJxczjGO6yAQgOHTQBc0zAwGCorX…

The article authors and panelists will explore the latest advances in whole genome sequencing of pathogens, and the next steps for deploying real-time genomic surveillance to protect us from future pandemics. You can pose questions to the panel during the audience Q&A.

I look forward to the discussion on 5 June.

Kind regards,
Laure Sonnier

Frontiers Forum Deep Dive sessions bring researchers, policy experts, and innovators together from around the world, to discuss a specific area of transformational science published in Frontiers' flagship, multidisciplinary journal, [Frontiers in Science](http://email.events.frontiersin.org/c/eJxcyrFywyAMANCvgS0-kASYQUMX…, and explore next steps for the field.

Soapbox talk for June meeting
by Marco van Zwetselaar 26 May '24

Dear all,

Who among you would be up for a brief talk in the June CoP Bioinformatics meeting? The idea of the "soap box" is that you get to talk (for 10-20 min) about something you've been working on / looking into / essentially anything that could be of interest to fellow bioinformaticians.

The meeting will be on 6 June at noon UTC, so there's time to think something up! :-)

Cheers
Marco

WHO bacterial priority pathogens list 2024
by Marco van Zwetselaar 23 May '24

The WHO bacterial priority pathogens list 2024 has been issued: https://www.who.int/publications/i/item/9789240093461 Greetings, Marco

AMRViz enables seamless genomics analysis and visualization of antimicrobial resistance
by Marco van Zwetselaar 20 May '24

Another candidate for the AMR genomics toolbox. Nice graphics for sure. And @Christa: they claim it works for Nanopore reads as well. :)

Le, D.Q., Nguyen, S.H., Nguyen, T.T. et al. AMRViz enables seamless genomics analysis and visualization of antimicrobial resistance. BMC Bioinformatics 25, 193 (2024). https://doi.org/10.1186/s12859-024-05792-9

Cheers,
Marco

Concordance of genotypic AMR prediction and lab AST
by Marco van Zwetselaar 16 May '24

Dear all,

Does anyone have pointers to studies on the concordance of (our) genotypic AMR predictions and actual lab AST results?

When I was at KCRI, we did one such analysis, but only for our small collection of Acinetobacters[1]. What I would be interested in is a systematic, large-scale analysis. As the SeqAfrica consortium (and with other partners in & outside Africa), we should have plenty of data to do such a study.

My interest in this used to be due to the obvious reason that we need to know how good our predictions are, but there's a much more compelling reason now to gather an extensive collection of *false negatives*. I'll explain later.

Marco

[1] https://academic.oup.com/jac/article/74/6/1484/5370329
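For anyone who wants to try this on their own collection, here is a minimal sketch of how such a concordance tally might look. The record layout (isolate ID, lab AST result, genotypic prediction) and the isolate names are made up for illustration; real input would have to be assembled per antibiotic from lab AST results and the prediction tool's output.

```python
from collections import Counter


def concordance(records):
    """Tally agreement between lab AST and genotypic prediction.

    records: iterable of (isolate_id, ast_resistant, predicted_resistant)
    tuples. This layout is a hypothetical one chosen for the sketch.
    """
    counts = Counter()
    false_negatives = []
    for isolate_id, ast_resistant, predicted_resistant in records:
        if ast_resistant and predicted_resistant:
            counts["TP"] += 1
        elif ast_resistant:
            # Phenotypically resistant, but nothing predicted from the genome
            counts["FN"] += 1
            false_negatives.append(isolate_id)
        elif predicted_resistant:
            counts["FP"] += 1
        else:
            counts["TN"] += 1

    total = sum(counts.values())
    resistant = counts["TP"] + counts["FN"]
    susceptible = counts["TN"] + counts["FP"]
    return {
        "counts": dict(counts),
        "agreement": (counts["TP"] + counts["TN"]) / total if total else None,
        "sensitivity": counts["TP"] / resistant if resistant else None,
        "specificity": counts["TN"] / susceptible if susceptible else None,
        "false_negatives": false_negatives,
    }


# Toy data, invented purely to exercise the function
toy_records = [
    ("iso1", True, True),
    ("iso2", True, False),
    ("iso3", False, False),
    ("iso4", False, True),
    ("iso5", True, True),
]
result = concordance(toy_records)
print(result["counts"], result["false_negatives"])
```

The `false_negatives` list is the part relevant to the question above: those are the isolates with positive AST but no genotypic prediction.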

Are we missing novel resistance genes?
by Marco van Zwetselaar 14 May '24

Dear all,

I think with this final question I have covered all the open ends from the CoP meeting. Please drop anything I missed on the list!

This interesting question was brought up by several people in the meeting: to what extent are we able to detect novel resistance genes (or virulence genes, or plasmids)? Given that our current tools are all based on mapping sequence data on reference databases, to what extent does this generalise beyond relatively similar sequences?

An obvious direction to look in is deep learning models, which appear to have a remarkable capacity to generalise beyond their training data. But could they actually predict genes coding for a hitherto unknown resistance mechanism, or even predict resistance for a highly diverged gene, while never having seen the phenotypic "ground truth"?

This explains my interest in the concordance of genotypic prediction and phenotypic AMR, and in collecting the false negatives: isolates with positive AST but negative ResFinder prediction are precisely the ones we'd like to be able to generalise to!

In my very limited experience (we did one study in 2021, looking at GPlas / MLPlasmid and Deeplasmid for predicting plasmids), the tools essentially predicted (poorly) what was already in the PlasmidFinder database. Clearly though there has been a lot of development since, and I would be very interested to hear of your experiences!

Marco

Parameter settings question
by Marco van Zwetselaar 14 May '24

Dear all,

With internet connectivity seemingly returning to East Africa, I'll spam the list with some more questions that came up in the CoP meeting. Here's the next.

When running tools like ResFinder and VirulenceFinder, I generally rely on their default cut-off settings (usually: 60% coverage, 90% identity). Is there an empirical basis for these settings or did someone just pull them out of thin air?

Do the default settings suffice or should we be more judicious? Perhaps use different settings for different species, for different AMR genes, or when testing on reads vs assemblies?

Marco
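To make the question concrete, this is roughly what such cut-offs amount to when applied to a list of hits, with room for per-gene overrides. It is only a sketch: the `Hit` record, the `overrides` mapping and the gene names are hypothetical and do not reflect how ResFinder or VirulenceFinder implement their filtering.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str
    identity: float  # percent identity to the reference gene
    coverage: float  # percent of the reference gene covered


def filter_hits(hits, min_identity=90.0, min_coverage=60.0, overrides=None):
    """Keep hits passing the identity and coverage cut-offs.

    overrides: optional dict mapping gene name to a
    (min_identity, min_coverage) pair that replaces the defaults
    for that gene. This is a hypothetical extension for the sketch.
    """
    overrides = overrides or {}
    kept = []
    for hit in hits:
        ident_cut, cov_cut = overrides.get(hit.gene, (min_identity, min_coverage))
        if hit.identity >= ident_cut and hit.coverage >= cov_cut:
            kept.append(hit)
    return kept


# Toy hits with placeholder gene names
toy_hits = [
    Hit("geneA", identity=99.5, coverage=100.0),
    Hit("geneB", identity=92.0, coverage=55.0),   # fails coverage
    Hit("geneC", identity=85.0, coverage=100.0),  # fails identity
]
print([h.gene for h in filter_hits(toy_hits)])
print([h.gene for h in filter_hits(toy_hits, overrides={"geneC": (80.0, 60.0)})])
```

With the defaults only geneA survives; relaxing the identity cut-off for geneC alone lets it through as well, which is the kind of per-gene tuning asked about above.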

Long read alternative for FastQScreen
by Marco van Zwetselaar 14 May '24

Dear all,

In KCRI's Illumina Basecall workflow, I always used to run FastQScreen (with GRCh38 for human, and UniVec_Core for contaminants) to detect contamination. It's slow but it gives easily interpretable output (including graphs and MultiQC integration), and if desired you can very easily make it "bucket" the reads into the database(s) that they map on.

When I copied the job into the Nanopore basecall workflow I was in for a surprise. The jobs took forever or ran out of requested HPC memory (if I remember well, even 96G didn't cut it for some runs).

Clearly, like FastQC, FastQScreen is an oldie, and especially read mapping (which is effectively what it does) has since been optimised a lot. My intuition would be that e.g. KMA would do this in a fraction of the time. (cc-ing Philip in case he's not on this list)

The only thing that FastQScreen effectively adds to the mapping is the user-friendly table and graph. Anyone keen to add a newer mapper to FastQScreen (it now has BWA and Bowtie2 as options, if I remember well)? Alternatively, suggestions for an alternative to FastQScreen?

During the CoP meeting, more suggestions were made:

- Use Kraken (with the added advantage of getting much more information than the "yes/no contamination" from FastQScreen), including quantification of cross-sample/species contamination
- Why detect contamination on reads, when it's much easier to do this on the assembly? Contaminants will come "falling out" anyway
- Also: for assemblies there are tools such as CheckM (for quantifying both completeness and contamination, including within-species)

Marco

Tip: FastQC crashing on large input files
by Marco van Zwetselaar 13 May '24

Dear all,

On the topic of FastQC: if you see FastQC failing on large fastq files, here's the fix.

The issue is that FastQC crashes as it runs out of memory, but then exits with a zero exit code, and without logging an error. The reason why it runs out of memory is that its startup script sets the JVM max memory parameter (-Xmx) to 250MB. That number stems straight from the dark ages, when 2GB was a lot already. :-)

Two fixes:

1. Edit the fastqc startup script, look for '-Xmx250m' and change this to e.g. '-Xmx8g'. Downside: you must remember to redo this after reinstallation or upgrade.
2. Run fastqc with '-t 32' (or some other large number). In theory this makes it multithread on 32 threads, but in practice it still runs on a single CPU[1]. The upside (despite the lack of speed benefits) is that the startup script now allocates 32 x 250MB memory.

Cheers
Marco

[1] Presumably it uses Java (software) threads, rather than system processes / threads.
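The second fix can be wrapped in a small helper that works out how many threads to request for a given memory target and, because the exit code cannot be trusted, checks afterwards that a report was actually written. This is a sketch under assumptions: the 250MB-per-thread figure is the one quoted above and may differ between FastQC versions, and the report name (`<sample>_fastqc.html` next to the input file) is assumed from FastQC's usual default behaviour when no output directory is given.

```python
import math
import subprocess
from pathlib import Path

MB_PER_THREAD = 250  # figure quoted in the post; may differ per FastQC version


def threads_for_memory(target_mb, per_thread_mb=MB_PER_THREAD):
    """Smallest thread count whose combined allocation reaches target_mb."""
    return max(1, math.ceil(target_mb / per_thread_mb))


def fastqc_command(fastq_path, target_mb):
    """Build the fastqc command line, using '-t' only to raise memory."""
    return ["fastqc", "-t", str(threads_for_memory(target_mb)), str(fastq_path)]


def expected_report(fastq_path):
    """Assumed default report location: <sample>_fastqc.html beside the input."""
    path = Path(fastq_path)
    name = path.name
    for suffix in (".gz", ".bz2"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    for suffix in (".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return path.parent / (name + "_fastqc.html")


def run_and_verify(fastq_path, target_mb=8000):
    """Run FastQC, then fail loudly if no report appeared."""
    subprocess.run(fastqc_command(fastq_path, target_mb), check=True)
    report = expected_report(fastq_path)
    if not report.exists():
        raise RuntimeError(f"FastQC exited cleanly but wrote no report: {report}")
    return report


print(fastqc_command("reads.fastq.gz", 8000))
```

For an 8000MB target this gives '-t 32', matching the example in the post.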

Long read alternative for FastQC?
by Marco van Zwetselaar 13 May '24

Dear all,

FastQC has long been the standard choice for screening Illumina reads / runs. What do you use instead for Nanopore reads?

In my experience, it's always worthwhile going over the PDF report produced by MinKNOW after a run. Just like the SAV on Illumina, it gives pretty deep insight, esp. when things have gone wrong. :)

But what to use if you only have the reads? nanoQC[1] was suggested in the meeting. From the MultiQC docs it appears that it can integrate NanoPlot output[2] (from the maker of nanoQC), which suggests nanoQC integrates in MultiQC as well. (I like MultiQC as it makes it easy to bundle QC outputs for the "customer".)

On a separate note: is there a conventional Q cutoff for reporting ONT read quality? I picked "percentage Q17+ bases" as my analogue for the typical "pct Q30+" on Illumina, because Q30 made little sense (esp. in the early Nanopore days), and Q17 corresponds to ~98% accuracy, a decent target for MinION reads. I've always wondered if there isn't a generally agreed Q-cutoff value?

Marco

[1] https://github.com/wdecoster/nanoQC
[2] https://github.com/MultiQC/MultiQC/issues/1074
[3] https://github.com/wdecoster/NanoPlot
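For reference, the Q17 ~ 98% figure follows from the standard Phred definition, Q = -10 log10(p), where p is the per-base error probability. Below is a small sketch of that conversion, plus a "fraction of Q17+ bases" calculation on a FASTQ quality string, assuming the usual Phred+33 encoding; the function names are made up for illustration.

```python
def phred_to_accuracy(q):
    """Per-base accuracy implied by Phred score q (Q = -10 * log10(p_error))."""
    return 1.0 - 10 ** (-q / 10)


def fraction_at_or_above(quality_string, q_min=17, offset=33):
    """Fraction of bases with Phred score >= q_min.

    Assumes Phred+33 encoding, i.e. score = ord(character) - 33.
    """
    if not quality_string:
        return 0.0
    scores = [ord(char) - offset for char in quality_string]
    return sum(score >= q_min for score in scores) / len(scores)


for q in (10, 17, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.4f}")

# '2' encodes Q17 and '!' encodes Q0 in Phred+33
print(fraction_at_or_above("2222!!!!"))
```

Q17 comes out at about 0.980 and Q30 at 0.999, which is why Q30 is a natural threshold for Illumina and Q17 a plausible one for older MinION data.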
