molecular biology

Some AI tools don’t understand biology yet


A collection of new studies on gene activity shows that AI tools aren’t very good at predicting it.

Gene activity appears to remain beyond the abilities of AI at the moment. Credit: BSIP

Biology is an area of science where AI and machine-learning approaches have seen some spectacular successes, such as designing enzymes to digest plastics and proteins to block snake venom. But in an era of seemingly endless AI hype, it might be easy to think that we could just set AI loose on the mounds of data we’ve already generated and end up with a good understanding of most areas of biology, allowing us to skip a lot of messy experiments and the unpleasantness of research on animals.

But biology involves a whole lot more than just protein structures. And it’s extremely premature to suggest that AI can be equally effective at handling all aspects of biology. So we were intrigued to see a study comparing a set of AI software packages designed to predict how active genes will be in cells exposed to different conditions. As it turns out, the AI systems couldn’t manage to do any better than a deliberately simplified method of predicting.

The results serve as a useful caution that biology is incredibly complex, and developing AI systems that work for one aspect of it is not an indication that they can work for biology generally.

AI and gene activity

The study was conducted by a trio of researchers based in Heidelberg: Constantin Ahlmann-Eltze, Wolfgang Huber, and Simon Anders. They note that a handful of additional studies have been released while their work was on a pre-print server, all of them coming to roughly the same conclusions. But these authors’ approach is pretty easy to understand, so we’ll use it as an example.

The AI software they examined attempts to predict changes in gene activity. While every cell carries copies of the roughly 20,000 genes in the human genome, not all of them are active in a given cell (“active” in this case meaning they are producing messenger RNAs). Some provide an essential function and are active at high levels at all times. Others are only active in specific cell types, like nerves or skin. Still others are activated under specific conditions, like low oxygen or high temperatures.

Over the years, we’ve done many studies examining the activity of every gene in a given cell type under different conditions. These studies can range from using gene chips to determine which messenger RNAs are present in a population of cells to sequencing the RNAs isolated from single cells and using that data to identify which genes are active. But collectively, they can provide a broad, if incomplete, picture that links the activity of genes with different biological circumstances. It’s a picture you could potentially use to train an AI that would make predictions about gene activity under conditions that haven’t been tested.

Ahlmann-Eltze, Huber, and Anders tested a set of what are called single-cell foundation models that have been trained on this sort of gene activity data. The “single cell” portion indicates that these models have been trained on gene activity obtained from individual cells rather than a population average of a cell type. “Foundation model” means that they have been trained on a broad range of data but will require additional training before they’re deployed for a specific task.

Underwhelming performance

The task in this case is predicting how gene activity might change when genes are altered. When an individual gene is lost or activated, it’s possible that the only messenger RNA that is altered is the one made by that gene. But some genes encode proteins that regulate a collection of other genes, in which case you might see changes in the activity of dozens of genes. In other cases, the loss or activation of a gene could affect a cell’s metabolism, resulting in widespread alterations of gene activity.

Things get even more complicated when two genes are involved. In many cases, the genes will do unrelated things, and you get a simple additive effect: the changes caused by the loss of one, plus the changes caused by the loss of the other. But if there’s some overlap between the functions, you can get an enhancement of some changes, suppression of others, and other unexpected changes.

To start exploring these effects, researchers have intentionally altered the activity of one or more genes using the CRISPR DNA editing technology, then sequenced every RNA in the cell afterward to see what sorts of changes took place. This approach (termed Perturb-seq) is useful because it can give us a sense of what the altered gene does in a cell. But for Ahlmann-Eltze, Huber, and Anders, it provides the data they need to determine if these foundation models can be trained to predict the ensuing changes in the activity of other genes.

Starting with the foundation models, the researchers conducted additional training using data from an experiment where either one or two genes were activated using CRISPR. This training used the data from 100 individual gene activations and another 62 where two genes were activated. Then, the AI packages were asked to predict the results for another 62 pairs of genes that were activated. For comparison, the researchers also made predictions using two extremely simple models: one that always predicted that nothing would change and a second that always predicted an additive effect (meaning that activating genes A and B would produce the changes caused by activating A plus the changes caused by activating B).
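The two simple baselines can be written out in a few lines. What follows is a minimal sketch rather than the researchers' code: the function names, the toy expression values, and the use of Euclidean distance as the error measure are all assumptions made for illustration.

```python
from math import sqrt

def no_change_baseline(control):
    """Predict that the perturbation leaves every gene at its control level."""
    return list(control)

def additive_baseline(control, single_a, single_b):
    """Predict a double activation as control plus the two single-gene shifts."""
    return [c + (a - c) + (b - c) for c, a, b in zip(control, single_a, single_b)]

def prediction_error(predicted, observed):
    """Euclidean distance between predicted and observed expression (assumed metric)."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)))

# Toy log-expression values for four hypothetical genes (made-up numbers).
control = [1.0, 2.0, 3.0, 4.0]
single_a = [1.5, 2.0, 3.0, 4.0]     # activating A alone raises gene 1
single_b = [1.0, 2.0, 2.0, 4.0]     # activating B alone lowers gene 3
observed_ab = [1.5, 2.0, 2.0, 5.0]  # the pair also raises gene 4, a synergy

additive = additive_baseline(control, single_a, single_b)
unchanged = no_change_baseline(control)

print("additive prediction:", additive)
print("additive error:", prediction_error(additive, observed_ab))
print("no-change error:", prediction_error(unchanged, observed_ab))
```

In this toy case, the additive baseline misses only the synergistic change in the fourth gene. That is the kind of deviation a trained model would need to predict in order to beat it.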

The foundation models didn’t work. “All models had a prediction error substantially higher than the additive baseline,” the researchers concluded. The result held when the researchers used alternative measurements of the accuracy of the AI’s predictions.

The gist of the problem seemed to be that the trained foundation models weren’t very good at predicting when the alterations of pairs of genes would produce complex patterns of changes—when the alteration of one gene synergized with the alteration of a second. “The deep learning models rarely predicted synergistic interactions, and it was even rarer that those predictions were correct,” the researchers concluded. In a separate test that looked specifically at these synergies between genes, it turned out that none of the models were better than the simplified system that always predicted no changes.

Not there yet

The overall conclusions from the work are pretty clear. “As our deliberately simple baselines are incapable of representing realistic biological complexity yet were not outperformed by the foundation models,” the researchers write, “we conclude that the latter’s goal of providing a generalizable representation of cellular states and predicting the outcome of not-yet-performed experiments is still elusive.”

It’s important to emphasize that “still elusive” doesn’t mean we’re incapable of ever developing an AI that can help with this problem. It also doesn’t mean that this applies to all cellular states (the results are specific to gene activity), much less all of biology. At the same time, the work provides a valuable caution at a time when there’s a lot of enthusiasm for the idea that AI’s success in a couple of areas means we’re on the cusp of a world where it can be applied to anything.

Nature Methods, 2025. DOI: 10.1038/s41592-025-02772-6

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.

DNA-based bacterial parasite uses completely new DNA-editing method

Top row: individual steps in the reaction process. Bottom row: cartoon diagram of the top, showing the position of each DNA and RNA strand.

Hiraizumi et al.

While CRISPR is probably the most prominent gene-editing technology, there are a variety of others, some developed before, others since. And people have been developing CRISPR variants to perform more specialized functions, like altering specific bases. In all of these cases, researchers are trying to balance a number of competing factors: convenience; flexibility; specificity and precision for the editing; low error rates; and so on.

So, having additional options for editing can be a good thing, enabling new ways of balancing those different needs. On Wednesday, a pair of papers in Nature describe a DNA-based parasite that moves itself around bacterial genomes through a mechanism that hasn’t been previously described. It’s nowhere near ready for use in humans, but it may have some distinctive features that make it worth further development.

Going mobile

Mobile genetic elements, commonly called transposons, are quite common in many species—they make up nearly half the sequences in the human genome, for example. They are indeed mobile, showing up in new locations throughout the genome, sometimes by cutting themselves out and hopping to new locations, other times by sending a copy out to a new place in the genome. For any of this to work, they need to have an enzyme that cuts DNA and specifically recognizes the right transposon sequence to insert into the cut.

Gene editing needs both of those features: the cutting of DNA and the specificity that ensures the system only inserts new copies of itself. That makes these systems worth understanding better.

Bacterial genomes tend to have very few transposons—the extra DNA isn’t really in keeping with the bacterial reproduction approach of “copy all the DNA as quickly as possible when there’s food around.” Yet bacterial transposons do exist, and a team of scientists based in the US and Japan identified one with a rather unusual feature. As an intermediate step in moving to a new location, the two ends of the transposon (called IS110) are linked together to form a circular piece of DNA.

In its circular form, the DNA sequences at the junction act as a signal (termed a “promoter”) that tells the cell to make an RNA copy of nearby DNA. When linear, each of the two bits of DNA on either side of the junction lacks the ability to act as a signal; it only works when the transposon is circular. And the researchers confirmed that there is in fact an RNA produced by the circular form, although the RNA does not encode any proteins.

So, the research team looked at over 100 different relatives of IS110 and found that they could all produce similar non-protein-coding RNAs, all of which shared some key features. These included stretches where nearby sections of the RNA could base-pair with each other, leaving an unpaired loop of RNA in between. Two of these loops contained sequences that base-paired either with the transposon itself or with the sites in the E. coli genome where it inserted.

That suggests that the RNA produced by the circular form of the transposon helped to act as a guide, ensuring that the transposon’s DNA was specifically used and only inserted into precise locations in the genome.

Editing without precision

To confirm this was right, the researchers developed a system where the transposon would produce a fluorescent protein when it was properly inserted into the genome. They used this to show that mutations in the loop that recognized the transposon would stop it from being inserted into the genome—and that it was possible to direct it to new locations in the genome by changing the recognition sequences in the second loop.

To show this was potentially useful for gene editing, the researchers blocked the production of the transposon’s own RNA and fed it a replacement RNA that worked. So, you could potentially use this system to insert arbitrary DNA sequences into arbitrary locations in a genome. It could also be used with targeting RNAs that caused specific DNA sequences to be deleted. All of this is potentially very useful for gene editing.

Emphasis on “potentially.” The problem is that the targeting sequences in the loops are quite short, with the insertion site targeted by a recognition sequence that’s only four to seven bases long. At the short end of this range, you’d expect that a random string of bases would have an insertion site about once every 250 bases.
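The “once every 250 bases” figure follows from simple counting: with four possible bases, a specific four-base sequence is one of 4^4 = 256 equally likely possibilities. The short sketch below works through the whole four-to-seven-base range. It assumes a random sequence in which all four bases are equally frequent and independent, which real genomes only approximate, and the function name is hypothetical.

```python
def expected_spacing(site_length, alphabet_size=4):
    """Average number of bases between chance matches to a site of this length,
    assuming random sequence with equally frequent, independent bases."""
    return alphabet_size ** site_length

# Recognition sequences of four to seven bases, as described above.
for n in range(4, 8):
    print(f"{n}-base site: about one chance match every {expected_spacing(n):,} bases")
```

Even at seven bases, a chance match would be expected roughly every 16,000 bases under these assumptions, so a genome millions of bases long would contain many candidate sites.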

That relatively low specificity showed. At the high end, various experiments could see an insertion accuracy ranging from a close-to-being-useful 94 percent down to a positively threatening 50 percent. For deletion experiments, the low end of the range was a catastrophic 32 percent accuracy. So, while this has some features of an interesting gene-editing system, there’s a lot of work to do before it could fulfill that potential. It’s possible that these recognition loops could be made longer to add the sort of specificity that would be needed for editing vertebrate genomes, but we simply don’t know at this point.

What turns a fungal scavenger into a killer?

Fungus feeding

Beware the sticky, tricky genetic weapons of a fungal carnivore.

The fungus’ favorite food.

Some of the scariest monsters are microscopic. The carnivorous fungus Arthrobotrys oligospora doesn’t seem like much while it’s eating away at rotting wood. But when it senses a live worm, it will trap its victim and consume it alive—pure nightmare fuel.

Until now, not much was known about how the attack of the killer fungus happens on a molecular level. Researchers from Academia Sinica in Taiwan have finally found out how the gene activity of the fungus changes when a nematode creeps too close to A. oligospora. Led by molecular biologist Hung-Che Lin, the research team discovered that the fungus synthesizes a sort of worm adhesive and additional trapping proteins to get ahold of its meal. It then produces enzymes that break down the worm so it can start feasting.

Caught in a trap

A. oligospora lives in the soil and is mostly saprotrophic, meaning it feeds on decaying organic matter. But that can quickly change if it finds itself deprived of nutrients or senses a tempting nematode nearby. This is when it goes into carnivore mode.

Lin and his colleagues wanted to see what happened when the fungus, low on nutrients, was introduced to the nematode Caenorhabditis elegans. The fungus showed a significant increase in DNA replication when it sensed the worm. This resulted in trap cells having additional copies of the genome. The trap cells reside in fungal filaments, or hyphae, and produce a specialized worm adhesive that would allow those hyphae to stick to the worm once it was caught in the trap.

What may be the most important process in helping the fungus create a trap out of hyphae is ribosome biogenesis, which enables increased protein production. Ribosomes are where proteins are made, so their biogenesis (literally the creation of more ribosomes) controls cell growth and also determines how much protein is synthesized.

The researchers also identified a new group of proteins, now known as Trap Enriched Proteins (TEPs), which were the most commonly produced proteins in fungal trap cells. These seemed to contribute to trap function rather than formation.

“Given TEP protein localization to the surface of trap cells, we hypothesized that TEPs may be critical for the function of the traps,” they said in a study recently published in PLOS Biology. “Adding C. elegans… leads to their immediate capture.”

As the fungus put more effort into creating a trap and forming worm adhesive, it deprioritized activities that are not really involved in the process. Segments of DNA that usually help A. oligospora digest dead matter were down-regulated, meaning there was lower gene activity on these segments in response to the fungus sensing the worm. At the same time, when a worm came close to A. oligospora, the fungus showed an up-regulation of genes that produce proteases, or enzymes that break down proteins.

Can’t get out

Other genes didn’t see changes in activity until the worm was already caught. Once C. elegans entered the trap that A. oligospora had set with a sticky net of hyphae, the team noticed an increase in the production of proteins that weaken prey. These proteins are able to manipulate the cells of their prey so those cells function differently, potentially providing a way for the pathogen to break in and take over. The fungus then uses proteases to digest nematodes that get stuck in its hyphae.

A. oligospora has over 400 genes that encode proteins that control its interactions with other organisms. When the introduction of a nematode made the fungus go carnivorous, more than half of these started to behave differently. These proteins weaken C. elegans through a variety of mechanisms. To give one example, some of them fight off antimicrobial peptides produced by the nematode.

The adhesive synthesized by the fungus, now thought to have a close association with TEP proteins, may have no effect on humans but is a superglue for worms that binds hyphae to their flesh. The worms have no way of worming their way out of being eaten alive.

This experiment might have been ghastly for the nematodes involved, but it was a breakthrough for Lin’s team. They have now identified an entire new group of genes that make a fungal trap function. Their findings with A. oligospora could be compared to the gene activity of other pathogenic fungi, including those that destroy crops, so an improved generation of antifungals might someday be influenced by this microscopic horror movie.

PLOS Biology, 2023. DOI: 10.1371/journal.pbio.3002400
