Dear all,
Does anyone have pointers to studies on the concordance of (our) genotypic AMR predictions and actual lab AST results?
When I was at KCRI, we did one such analysis but only for our small collection of Acinetobacters[1]. What I would be interested in is a systematic, large-scale analysis.
As the SeqAfrica consortium (and with other partners in & outside Africa), we should have plenty of data to do such a study.
My interest in this used to stem from the obvious need to know how good our predictions are, but there is now a much more compelling reason to gather an extensive collection of *false negatives*. I'll explain later.
Marco
Hi Marco, and everyone.
I don’t know if this will be helpful, but please check the attached pre-print: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4652526. We compared 3104 AMR genomic results with phenotypic AST results and got the attached table (please do not distribute; the work is under review).
As an example, the two related papers linked below compare phenotypic and genomic results for penicillin resistance in S. pneumoniae. Their findings are incorporated in the S. pneumoniae pipeline that we use, and similar definitions are adapted in the above pre-print.
https://pubmed.ncbi.nlm.nih.gov/27302760/ - Penicillin-binding protein transpeptidase signatures for tracking and predicting β-lactam resistance levels in Streptococcus pneumoniae
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-4017-7 - Validation of β-lactam minimum inhibitory concentration predictions for pneumococcal isolates with newly encountered penicillin binding protein (PBP) sequences
I'm happy to chat more about this.
Best, Cebile
On 14/05/2024, 11:14, "Marco van Zwetselaar via Bioinfo List" <bioinfo-list@seqshare.org> wrote:
[1] https://academic.oup.com/jac/article/74/6/1484/5370329
_______________________________________________
Bioinfo List mailing list -- bioinfo-list@seqshare.org
To unsubscribe send an email to bioinfo-list-leave@seqshare.org
Thanks Cebile,
On 14/05/2024 12:33, Cebile Lekhuleni wrote:
> I don’t know if this will be helpful, but please check the attached pre-print: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4652526. We compared 3104 AMR genomic results with phenotypic AST results and got the attached table (please do not distribute; the work is under review).
This is great, and (from a quick read) it seems you got very high concordance numbers. With that number of isolates, it's a treasure trove. Looking forward to taking a closer look!
Thanks again, Marco
PS: I have suppressed the attachments in the list archives (the list is archived publicly!) so they don't pop up in searches. Can undo once they've passed review.
If you are interested in enterics, we have some of this concordance data on https://wwwn.cdc.gov/narmsnow/
There is also the AMRFinder validation paper from NCBI that does phenotype to genotype correlations https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6811410/
For false negatives, do you mean phenotypically resistant but AMR gene or mutation not identified?
Thanks,
Heather
-----Original Message-----
From: Marco van Zwetselaar via Bioinfo List bioinfo-list@seqshare.org
Sent: Tuesday, May 14, 2024 5:13 AM
To: bioinfo-list@seqshare.org
Subject: [Bioinfo-list] Concordance of genotypic AMR prediction and lab AST
Thanks Heather,
On 14/05/2024 15:50, Carleton, Heather (CDC/NCEZID/DFWED/EDLB) via Bioinfo List wrote:
> If you are interested in enterics, we have some of this concordance data on https://wwwn.cdc.gov/narmsnow/
> There is also the AMRFinder validation paper from NCBI that does phenotype to genotype correlations https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6811410/
Great, will definitely look into those!
> For false negatives, do you mean phenotypically resistant but AMR gene or mutation not identified?
Yes. The issue I see with the machine learning papers is this. They do split the data into a training set (the data the model is allowed to see and learn from; usually 80% of the available data) and a validation set (the remaining 20%, which the model has never seen and is asked to predict in order to assess its validity). But given the nature of the data we're dealing with, obtaining a high validation rate is (a) almost inevitable and (b) not helpful for assessing generalisability.
What I mean is this: if I train a deep neural net on 80% of the ResFinder database, while keeping 20% (randomly selected) away from it, then - due to orthology - I suspect it will classify most of the unseen 20% correctly, simply because it had seen a sufficiently similar sequence in the training set.
It's cool that we can train a model to learn this, of course, but we could already do that the classical way: just blast the unknown sequence against the known 80%. Effectively, we've trained a model to measure sequence similarity.
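A minimal sketch of this effect, under loud assumptions: the data are synthetic "gene families" (a random ancestor plus lightly mutated copies), the class label is a hypothetical one fixed per family, and a nearest-neighbour lookup on k-mer Jaccard similarity stands in for BLAST. None of this uses ResFinder or any published model; it only illustrates why a random 80/20 split rewards plain similarity matching, while holding out whole families does not.

```python
import random

K = 8  # k-mer length; an arbitrary choice for this toy example


def kmers(seq, k=K):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (crude stand-in for a BLAST score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mutate(seq, rate, rng):
    """Replace each base by a randomly drawn base with probability `rate`."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else base for base in seq)


def make_families(n_families, members, length, rate, rng):
    """Simulate families: one random ancestor plus mutated copies per family."""
    data = []
    for fam in range(n_families):
        ancestor = "".join(rng.choice("ACGT") for _ in range(length))
        label = fam % 2  # hypothetical class label, fixed per family
        data.extend((mutate(ancestor, rate, rng), label, fam) for _ in range(members))
    return data


def nearest_neighbour_accuracy(train, test):
    """Predict each test label from the most similar training sequence."""
    indexed = [(kmers(seq), label) for seq, label, _ in train]
    correct = 0
    for seq, label, _ in test:
        query = kmers(seq)
        predicted = max(indexed, key=lambda item: jaccard(query, item[0]))[1]
        correct += predicted == label
    return correct / len(test)


rng = random.Random(0)
data = make_families(n_families=20, members=10, length=200, rate=0.03, rng=rng)

# Random 80/20 split: relatives of almost every test sequence sit in the training set.
shuffled = data[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
random_acc = nearest_neighbour_accuracy(shuffled[:cut], shuffled[cut:])

# Family hold-out: the test sequences have no relatives in the training set.
held_out = {16, 17, 18, 19}
train = [row for row in data if row[2] not in held_out]
test = [row for row in data if row[2] in held_out]
family_acc = nearest_neighbour_accuracy(train, test)

print(f"random split accuracy:    {random_acc:.2f}")
print(f"family hold-out accuracy: {family_acc:.2f}")
```

On data built this way, the random split should score close to 100%, while the family hold-out has nothing to match against and should hover around chance. The labels here carry no signal beyond family membership by construction, so this says nothing about what a real model could learn from real sequences.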
I'm being a bit unfair here: what is remarkable is that even if you thin out the data so that no two sequences share more than 90% mutual similarity (as in the ARGnet paper), the models still attain a high validation rate. Presumably DNNs somehow accommodate rearrangements that we don't capture with similarity metrics. They are especially strong at predicting non-linear relationships and detecting structure in high-dimensional spaces - typically the problem area for GWAS. Maybe individual resistance genes aren't quite the right "problem level" for DNNs.
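For illustration only, thinning a dataset to a maximum pairwise similarity can be sketched as a greedy filter. This is not the procedure from the ARGnet paper; the function names are hypothetical, and identity is computed as a simple per-position match on equal-length toy sequences rather than from a real alignment.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def thin(sequences, max_identity=0.9):
    """Greedily keep a sequence only if it is at most `max_identity`
    identical to every sequence kept so far."""
    kept = []
    for seq in sequences:
        if all(identity(seq, other) <= max_identity for other in kept):
            kept.append(seq)
    return kept


toy = [
    "A" * 20,             # kept: first sequence seen
    "A" * 19 + "T",       # 95% identical to the first: dropped
    "A" * 10 + "C" * 10,  # 50% identical to the first: kept
    "C" * 20,             # 0% and 50% identical to the kept ones: kept
]
thinned = thin(toy)
print(thinned)
```

A greedy filter like this depends on input order: whichever member of a similar group comes first becomes the representative.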
Generalisability is a separate issue: no matter how well a model predicts the validation set, what we are interested in is predicting _novel_ resistance. But how do we know the model is correct? This is biology: we first need to see it happen in real life.
That's where the false negatives come in! Isolates that we know are resistant, but that have no matches with the known ARG databases. Those will be interesting cases to challenge deep learning models with.
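As a rough sketch of how such a collection could be pulled out of paired results: the snippet below tallies genotype-versus-phenotype agreement and lists the false negatives (phenotypically resistant, no determinant found). The record layout, isolate names and drug calls are all made up for illustration; real data would come from the lab AST results and whichever AMR prediction tool was run.

```python
from collections import Counter


def concordance(records):
    """Tally genotype vs phenotype calls.

    Each record is (isolate, drug, phenotype_resistant, genotype_resistant).
    Returns the confusion counts, summary rates and the false negatives.
    """
    counts = Counter()
    false_negatives = []
    for isolate, drug, phenotype_r, genotype_r in records:
        if phenotype_r and genotype_r:
            counts["TP"] += 1
        elif phenotype_r:
            counts["FN"] += 1  # resistant in the lab, nothing found in the genome
            false_negatives.append((isolate, drug))
        elif genotype_r:
            counts["FP"] += 1
        else:
            counts["TN"] += 1

    resistant = counts["TP"] + counts["FN"]
    susceptible = counts["TN"] + counts["FP"]
    total = resistant + susceptible
    rates = {
        "sensitivity": counts["TP"] / resistant if resistant else None,
        "specificity": counts["TN"] / susceptible if susceptible else None,
        "agreement": (counts["TP"] + counts["TN"]) / total if total else None,
    }
    return counts, rates, false_negatives


# Hypothetical paired results, for illustration only.
records = [
    ("iso1", "ciprofloxacin", True, True),
    ("iso2", "ciprofloxacin", True, False),
    ("iso3", "ciprofloxacin", False, False),
    ("iso4", "ciprofloxacin", False, True),
    ("iso5", "meropenem", True, True),
    ("iso6", "meropenem", False, False),
    ("iso7", "meropenem", True, True),
    ("iso8", "meropenem", False, False),
]

counts, rates, false_negatives = concordance(records)
print(dict(counts))
print(rates)
print(false_negatives)
```

The `false_negatives` list is the part of interest here: those isolate and drug pairs are the candidates to challenge a model with.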
Cheers Marco