How Are New Peptides Discovered? A Clear Guide - OP Labs (Formerly Oxford Peptides)

In 2024, a single machine-learning study screened the world’s microbial genomes and predicted 863,498 previously unknown candidate antimicrobial peptides in one pass (Santos-Júnior et al., 2024). A decade earlier, that scale would have been unthinkable. Peptide discovery has quietly become one of the fastest-moving corners of molecular biology — and the methods behind it are stranger and more inventive than you might guess.

So how does a peptide actually go from “doesn’t exist in any database” to a known, characterised sequence?

What is a peptide, really?

A peptide is a short chain of amino acids — typically fewer than fifty, where a protein runs to hundreds or thousands. That shorter length is what makes them workable: they can be synthesised in the lab, sequenced quickly, and searched computationally. If the term is new to you, our guide to peptides vs proteins vs amino acids covers the basics.

“Discovering” a new peptide means identifying a sequence and function that wasn’t previously known — and the methods for doing so have changed more in the last five years than in the previous forty. So where do researchers actually look first?

Where do scientists go looking for new peptides?

The oldest approach is also one of the most productive: look at what living systems already make. Venoms, frog and insect defences, marine organisms and the human microbiome are rich sources of peptides that evolution spent millions of years optimising (Torres et al., 2024).

What’s changed is the scale. Instead of extracting one molecule at a time from a tissue sample, researchers now sequence whole genomes, or entire environmental samples containing thousands of species, and scan them computationally for peptide-encoding signatures. The 2024 study above mined 63,410 metagenomes and surfaced 863,498 candidate antimicrobial peptides (Santos-Júnior et al., 2024).
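To make “scanning for peptide-encoding signatures” concrete, here is a minimal Python sketch of the simplest such signature: a short open reading frame, meaning a start codon followed by a stop codon a few dozen codons later. It is an illustration only, not the pipeline used in the cited study. The function names, the 5 to 50 residue window and the example DNA string are assumptions made for this sketch, and only the forward strand is scanned. Real workflows add gene prediction, both strands and machine-learning filters on top.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_small_orfs(dna, min_aa=5, max_aa=50):
    """Return (frame, peptide) pairs for short ORFs on the forward strand.

    A hit starts at the first methionine after the previous stop codon
    and must end at a stop codon. min_aa and max_aa are illustrative.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        segments = translate(dna[frame:]).split("*")
        # The final segment never reached a stop codon, so skip it.
        for segment in segments[:-1]:
            start = segment.find("M")
            if start == -1:
                continue
            peptide = segment[start:]
            if min_aa <= len(peptide) <= max_aa:
                hits.append((frame, peptide))
    return hits


# Made-up example: ATG AAA TTT GGG TGC TAA sits in reading frame 2.
print(find_small_orfs("CCATGAAATTTGGGTGCTAAGG"))  # [(2, 'MKFGC')]
```

A scan like this only nominates candidates. Deciding which of the resulting short sequences are plausibly antimicrobial is a separate prediction step.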

The hunt has even reached extinct organisms. A deep-learning system called APEX was trained to search the “extinctome” — the reconstructed proteins of extinct species — and identified novel antimicrobial candidates from organisms including the woolly mammoth (Wan et al., 2024). Earlier work by the same group demonstrated the broader “molecular de-extinction” concept across ancient hominin proteomes (Maasch et al., 2023).

These are predictions, not finished products. Each candidate still has to be synthesised and tested experimentally. But finding the candidates at all used to be the bottleneck — and that bottleneck has now substantially shifted.

If nature gives us so many candidates already, why does anyone need to build artificial peptide libraries?

How does phage display screen billions of sequences at once?

Because sometimes researchers don’t yet know which sequence they want. The trick, in that case, is to make billions of variants and let selection reveal the winners.

The foundational technique here is phage display, first described in 1985 (Smith, 1985). Short random peptides are attached to the surface of bacteriophages — viruses that infect bacteria — with each particle carrying both the displayed peptide and the DNA that encodes it. A single library can hold around a billion variants. Researchers expose the library to a target, wash away whatever doesn’t bind, then amplify and re-screen the survivors over several rounds of “biopanning”. Because each phage carries its own genetic code, the winning sequences can simply be sequenced. The technique earned its inventor a share of the 2018 Nobel Prize in Chemistry.
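The reason a few rounds of biopanning are enough can be shown with a toy calculation. The Python sketch below tracks the expected share of each clone through repeated bind, wash and amplify rounds. It is a simplified model under stated assumptions, not a description of any laboratory protocol: the clone labels, starting frequencies and retention probabilities are all hypothetical, and amplification is assumed to preserve the proportions of whatever survives the wash.

```python
def biopanning(frequencies, retention, rounds=3):
    """Expected clone frequencies after repeated bind-wash-amplify rounds.

    frequencies: clone -> starting share of the library (sums to 1).
    retention:   clone -> probability of surviving one wash step
                 (hypothetical values; at least one must be non-zero).
    """
    freqs = dict(frequencies)
    for _ in range(rounds):
        # Wash: each clone keeps only the fraction that stays bound.
        survivors = {clone: f * retention[clone] for clone, f in freqs.items()}
        # Amplify: regrow the pool, which renormalises the shares.
        total = sum(survivors.values())
        freqs = {clone: s / total for clone, s in survivors.items()}
    return freqs


# Hypothetical library: a rare binder hidden in sticky background phage.
library = {"strong_binder": 0.001, "background": 0.999}
retention = {"strong_binder": 0.5, "background": 0.001}

for n in range(4):
    share = biopanning(library, retention, rounds=n)["strong_binder"]
    print(f"after {n} rounds: binder share = {share:.4f}")
```

In this model the binder-to-background ratio is multiplied by the same factor every round (here 0.5 / 0.001 = 500), so even a one-in-a-thousand clone dominates the pool after three rounds.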

Related platforms have extended the same idea. mRNA display links each peptide to its own messenger RNA, allowing even larger and chemically more diverse libraries. DNA-encoded libraries tag each molecule with a DNA “barcode” so that pooled screening plus sequencing reveals which structures bound. And affinity-selection mass spectrometry screens synthetic peptide mixtures with diversity comparable to a phage library, identifying the binders directly from their mass spectra.

Selection works when you have a target to screen against. But what if you simply want to know which peptides are present in a real biological sample?

Can mass spectrometry read a peptide directly?

Yes — and this is the foundation of an entire field called peptidomics, which catalogues the peptides a biological system already produces.

A mass spectrometer ionises the peptides in a sample and separates the ions by mass-to-charge ratio. By fragmenting each peptide and measuring the masses of the pieces, software can reconstruct the original sequence. There are two main modes. Database searching matches fragmentation patterns against predicted spectra from known proteins: fast, but blind to anything genuinely novel. De novo sequencing reads the sequence directly from the fragment masses with no reference, which is exactly what’s required to identify peptides that don’t appear in any database (Frank & Pevzner, 2005).
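The core idea of de novo sequencing is that the gap between two neighbouring fragment masses equals the mass of one amino acid residue. The Python sketch below shows this for an idealised, noise-free fragment ladder. It is a teaching example and not the PepNovo algorithm, which uses a probabilistic model to cope with noisy spectra and missing peaks. The function names and the 0.01 Da tolerance are assumptions made for this sketch. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def ideal_ladder(peptide):
    """Cumulative residue masses for a peptide: a perfect prefix ladder."""
    masses = [0.0]
    for aa in peptide:
        masses.append(masses[-1] + RESIDUE_MASS[aa])
    return masses


def read_ladder(prefix_masses, tol=0.01):
    """Name the residue behind each gap between consecutive ladder masses.

    Gaps that fit several residues are joined with '/', and gaps that
    fit none are reported as '?'.
    """
    calls = []
    for lighter, heavier in zip(prefix_masses, prefix_masses[1:]):
        gap = heavier - lighter
        fits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        calls.append("/".join(fits) if fits else "?")
    return calls


print(read_ladder(ideal_ladder("GASP")))  # ['G', 'A', 'S', 'P']
print(read_ladder(ideal_ladder("KL")))    # ['K', 'L/I']
```

The second example shows a genuine limit of the method: leucine and isoleucine have identical masses, so a mass ladder alone cannot tell them apart. Lysine and glutamine differ by about 0.036 Da, so separating them depends on the instrument’s mass accuracy.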

Modern de novo algorithms are a big reason truly new peptides can now be pulled from complex samples such as blood, tissue or microbial cultures. For more on the natural origins of these molecules, see our article “Where do peptides come from?”

All three approaches above search for something that already exists somewhere — in nature, in a synthetic library, or in a biological sample. Could you skip the search entirely?

Can AI design peptides that nature never made?

This is the newest direction in the field, and the one moving fastest. Rather than searching, researchers use machine learning to predict and invent promising sequences before anything is physically made.

The work splits into two complementary jobs. Predictive models act as filters: trained on databases of active and inactive peptides, they learn to score whether a candidate sequence is likely to have a given property such as antimicrobial activity (Veltri et al., 2018). Generative models go further and propose entirely new sequences. HydrAMP, a conditional variational autoencoder developed at the University of Warsaw, generated peptides whose activity was confirmed in wet-lab tests against five bacterial strains (Szymczak et al., 2023). AMPGAN v2 uses a generative adversarial network to similar ends (Van Oort et al., 2021).

The typical workflow is a closed loop: a generator proposes candidates, a predictor ranks them, and only the strongest reach the lab. Researchers also increasingly favour explainable models, because understanding why a sequence scores well turns a guess into a reusable design rule (Wang et al., 2025).
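The shape of that loop can be sketched in a few lines of Python. Everything in the example below is a stand-in chosen for illustration: the “generator” is a random single-residue mutator rather than a trained generative model such as HydrAMP or AMPGAN v2, and the “predictor” is a crude hypothetical heuristic (net positive charge plus roughly half hydrophobic residues) rather than a validated classifier. Only the control flow, propose, score, keep the best, repeat, reflects the workflow described above.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")


def toy_score(peptide):
    """Hypothetical stand-in for a trained predictor, not a real model."""
    charge = sum(peptide.count(a) for a in "KR") - sum(
        peptide.count(a) for a in "DE"
    )
    hydrophobic_fraction = sum(a in HYDROPHOBIC for a in peptide) / len(peptide)
    return charge - 10 * abs(hydrophobic_fraction - 0.5)


def propose(parent, rng):
    """Stand-in 'generator': mutate one random position of the parent."""
    pos = rng.randrange(len(parent))
    return parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:]


def design_loop(seed, rounds=5, proposals=50, keep=5, rng_seed=0):
    """Propose variants, rank them, and carry the best into the next round."""
    rng = random.Random(rng_seed)
    pool = [seed]
    for _ in range(rounds):
        candidates = set(pool)  # keep current best so scores never regress
        for _ in range(proposals):
            candidates.add(propose(rng.choice(pool), rng))
        # Sort by score, breaking ties alphabetically for reproducibility.
        ranked = sorted(candidates, key=lambda p: (-toy_score(p), p))
        pool = ranked[:keep]
    return pool


for peptide in design_loop("GGGGGGGGGG"):
    print(peptide, round(toy_score(peptide), 2))
```

The sequences this prints are artefacts of the toy scoring function and carry no biological meaning. In a real programme the shortlist would go on to synthesis and testing, and those results would be fed back to retrain the models.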

One limitation is worth stating plainly: the evidence base for AI-designed peptides remains dominated by computational and in vitro results. Even candidates that a model flags with high confidence often fail to hold up under experimental testing, which is precisely why prediction doesn’t replace experiments; it prioritises which experiments to run.

So whether the candidate came from nature, a library, a mass spec readout or a neural network — what happens next?

How is a new peptide confirmed?

No peptide is genuinely “discovered” until it has been synthesised and tested. The promising sequence is chemically made and run through functional assays appropriate to its predicted role — for an antimicrobial candidate, that means measuring whether it inhibits bacterial growth, at what concentration, and how it affects healthy cells in culture. Initial hits are often weak, so a round of optimisation (random mutagenesis or structure-guided redesign) tunes affinity, stability and selectivity.
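For an antimicrobial candidate, “at what concentration” is usually summarised as a minimum inhibitory concentration (MIC): the lowest tested concentration that prevents bacterial growth. The Python sketch below shows one simple way to read an MIC from a dilution series. The function name, the 10% growth cutoff and the optical-density readings are all assumptions invented for this example. Real assays follow standardised protocols with replicates and controls.

```python
def minimum_inhibitory_concentration(od_by_conc, control_od, cutoff=0.1):
    """Lowest tested concentration at and above which growth is suppressed.

    od_by_conc: concentration -> optical density after incubation.
    control_od: optical density of the untreated growth control.
    cutoff:     fraction of control growth still counted as 'no growth'
                (an illustrative choice, not a standard).
    Returns None if even the highest concentration fails to inhibit.
    """
    mic = None
    # Walk down from the highest dose and stop at the first well that grows.
    for conc in sorted(od_by_conc, reverse=True):
        if od_by_conc[conc] <= cutoff * control_od:
            mic = conc
        else:
            break
    return mic


# Made-up readings for a two-fold dilution series (arbitrary units).
readings = {1: 0.82, 2: 0.80, 4: 0.75, 8: 0.40, 16: 0.05, 32: 0.04, 64: 0.04}
print(minimum_inhibitory_concentration(readings, control_od=0.85))  # 16
```

Walking down from the highest concentration means a single clear well in the middle of an otherwise growing series is not mistaken for inhibition.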

It’s worth being precise about the kinds of evidence at each stage. In vitro studies test peptides on isolated cells in a laboratory dish. In vivo studies test them in living organisms, usually mice. A small fraction of candidates progress to human clinical trials. A result in a petri dish is not a result in a person, and a result in a mouse is not a result in a person either — a distinction that matters as much in peptide research as in any other branch of biology.

The whole pipeline is best understood as a design–build–test–learn cycle. Each loop feeds data back into the next, and discovery becomes a tightening spiral rather than a single eureka moment.

What’s next for peptide discovery?

What unites these methods is a shift from luck toward systematic search. Phage display industrialised trial and error (Smith, 1985). Mass spectrometry made the invisible readable. Genome mining scaled the hunt to entire ecosystems (Santos-Júnior et al., 2024). And machine learning is now exploring sequence space at a scale never possible before, including regions evolution itself never visited (Wan et al., 2024).

The next frontier is closing the gap between in silico promise and in vivo reality, where most candidates still fail. Expect the most interesting work over the coming years to sit precisely at that junction.

Frequently asked questions

What are the main methods used to discover new peptides?

The four main methods are mining peptides from natural sources (genome and microbiome mining), screening molecular libraries via techniques such as phage display, identifying peptides directly with mass spectrometry, and designing new sequences with machine learning. Most discovery programmes combine several of these approaches (Wan et al., 2024).

What is phage display in peptide discovery?

Phage display is a technique where short peptides are presented on the surface of bacteriophages, with each particle carrying both the displayed peptide and the DNA that encodes it. Researchers screen a library of around a billion variants against a target, then sequence the survivors. It was first described by Smith in 1985 and earned a share of the 2018 Nobel Prize in Chemistry.

How is AI used to discover peptides?

AI is used in two ways: predictive models score whether a candidate peptide is likely to have a desired property (Veltri et al., 2018), and generative models design entirely new sequences (Szymczak et al., 2023). Candidates are then synthesised and tested in the laboratory — computational predictions are a starting point, not a substitute for experiments.

How long does it take to go from peptide discovery to a validated sequence?

It depends on the method. Computational screening can flag candidates in hours, but laboratory synthesis, binding assays and activity testing typically add weeks to months. Most discovery programmes run the full cycle from initial hit to a characterised, validated sequence over several months to a year or more.

This content is for educational and informational purposes only. It does not constitute medical advice and should not be used to inform clinical decisions. Our products are not licensed medicines. Please consult a qualified healthcare professional before beginning any new protocol.

References

Frank, A. & Pevzner, P. (2005). PepNovo: De novo peptide sequencing via probabilistic network modeling. Analytical Chemistry, 77(4), 964–973. https://doi.org/10.1021/ac048788h
Maasch, J. R. M. A., Torres, M. D. T., Melo, M. C. R. & de la Fuente-Nunez, C. (2023). Molecular de-extinction of ancient antimicrobial peptides enabled by machine learning. Cell Host & Microbe, 31(8), 1260–1274.e6. https://doi.org/10.1016/j.chom.2023.07.001
Santos-Júnior, C. D., Torres, M. D. T., Duan, Y. et al. (2024). Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell, 187(14), 3761–3778.e16. https://doi.org/10.1016/j.cell.2024.05.013
Smith, G. P. (1985). Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science, 228(4705), 1315–1317. https://doi.org/10.1126/science.4001944
Szymczak, P. et al. (2023). Discovering highly potent antimicrobial peptides with deep generative model HydrAMP. Nature Communications, 14, 1453. https://doi.org/10.1038/s41467-023-36994-z
Torres, M. D. T. et al. (2024). Mining human microbiomes reveals an untapped source of peptide antibiotics. Cell, 187(19), 5453–5467.e15. https://doi.org/10.1016/j.cell.2024.07.027
Van Oort, C. M. et al. (2021). AMPGAN v2: Machine learning-guided design of antimicrobial peptides. Journal of Chemical Information and Modeling, 61(5), 2198–2207. https://doi.org/10.1021/acs.jcim.0c01441
Veltri, D., Kamath, U. & Shehu, A. (2018). Deep learning improves antimicrobial peptide recognition. Bioinformatics, 34(16), 2740–2747. https://doi.org/10.1093/bioinformatics/bty179
Wan, F., Torres, M. D. T., Peng, J. & de la Fuente-Nunez, C. (2024). Deep-learning-enabled antibiotic discovery through molecular de-extinction. Nature Biomedical Engineering, 8(7), 854–871. https://doi.org/10.1038/s41551-024-01201-x
Wang, B., Lin, P., Zhong, Y. et al. (2025). Explainable deep learning and virtual evolution identifies antimicrobial peptides with activity against multidrug-resistant human pathogens. Nature Microbiology, 10, 332–347. https://doi.org/10.1038/s41564-024-01907-3