Bonobos, great apes that live in the Democratic Republic of the Congo and are closely related to us and to chimpanzees, communicate with vocal calls including peeps, hoots, yelps, grunts, and whistles. Now, a team of Swiss scientists led by Melissa Berthet, an evolutionary anthropologist at the University of Zurich, has discovered that bonobos can combine these basic sounds into larger semantic structures. In these combinations, the meaning is more than just the sum of the individual calls—a trait known as non-trivial compositionality, which we once thought was uniquely human.
To do this, Berthet and her colleagues built a database of 700 bonobo calls and deciphered them using methods drawn from distributional semantics, the methodology we’ve relied on in reconstructing long-lost languages like Etruscan or Rongorongo. For the first time, we have a glimpse into what bonobos mean when they call to each other in the wild.
Context is everything
The key idea behind distributional semantics is that when words appear in similar contexts, they tend to have similar meanings. To decipher an unknown language, you need to collect a large corpus of words and turn those words into vectors—mathematical representations that let you place them in a multidimensional semantic space. The second thing you need is context data, which tells you the circumstances in which these words were used (that gets vectorized, too). When you map your word vectors onto context vectors in this multidimensional space, what usually happens is that words with similar meaning end up close to each other. Berthet and her colleagues wanted to apply the same trick to bonobos’ calls. That seemed straightforward at first glance, but proved painfully hard to execute.
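To make the idea concrete, here is a minimal sketch of the general approach rather than Berthet's actual pipeline: each call type gets a vector counting the contexts it was heard in, and calls that show up in similar contexts end up with similar vectors. The call names and context labels below are invented for illustration.

```python
# Minimal sketch of the distributional-semantics idea (not the study's actual
# pipeline): each call type is represented by a vector of counts over the
# contexts it was recorded in, and calls used in similar contexts end up with
# similar vectors. Call names and context labels are invented examples.
import numpy as np

contexts = ["predator_nearby", "feeding", "grooming", "neighbor_group", "travel"]

# Hypothetical counts: how often each call type was heard in each context.
observations = {
    "peep":    [1, 20, 15, 2, 5],
    "yelp":    [18, 1, 0, 12, 3],
    "whistle": [2, 3, 2, 4, 25],
    "grunt":   [0, 22, 18, 1, 4],
}

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Calls used in similar contexts (here, "peep" and "grunt" around feeding and
# grooming) score close to 1; calls used in different contexts score lower.
for name, vec in observations.items():
    print(name, round(cosine(observations["peep"], vec), 2))
```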
“We worked at a camp in the forest, got up super early at 3:30 in the morning, walked one or two hours to get to the bonobos’ nest. At [the] time they would wake up, I would switch my microphone on for the whole day to collect as many vocalizations as I could,” Berthet says. Each recorded call then had to be annotated with a horribly long list of contextual parameters. Berthet had a questionnaire filled with queries like: is there a neighboring group around; are there predators around; is the caller feeding, resting, or grooming; is another individual approaching the caller, etc. There were 300 questions that had to be answered for each of the 700 recorded calls.
Thanks to past work, we’ve already identified the brain structure that controls the activity of the key vocal organ, the syrinx, which sits at the base of the bird’s trachea. The new study, done by Zetian Yang and Michael Long of New York University, managed to place fine electrodes into this area of the brain in both species and track the activity of neurons there while the birds were awake and going about normal activities. This allowed them to associate neural activity with any vocalizations made by the birds. For the budgerigars, they had an average of over 1,000 calls from each of the four birds carrying the implanted electrodes.
For the zebra finch, neural activity during song production showed a pattern that was based on timing; the same neurons tended to be most active at the same point in the song. You can think of this as working a bit like a player piano, with timing as the central organizing principle that determines when different notes get played. “Different configurations [of neurons] are active at different moments, representing an evolving population ‘barcode,’” as Yang and Long describe this pattern.
That is not at all what was seen with the budgerigars. Here, instead, they saw patterns where the same populations of neurons tended to be active when the bird was producing a similar sound. They broke the warbles down into parts that they characterized on a scale that ranged from harmonic to noisy. They found that some groups of neurons tended to be more active whenever the warble was harmonic, while different groups tended to spike when it got noisy. Those observations led them to identify a third population, which was active whenever the budgerigars produced a low-frequency sound.
In addition, Yang and Long analyzed the pitch of the vocalizations. Only about half of the neurons in the relevant region of the brain were linked to pitch. However, within that half, small groups of neurons fired during the production of a relatively narrow range of pitches. The researchers could use the activity of as few as five individual neurons to accurately predict the pitch of the vocalizations at the time.
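As a rough illustration of that kind of decoding, the sketch below fits a simple linear model that predicts pitch from the firing rates of five simulated, pitch-tuned neurons. The tuning curves, noise levels, and model choice are assumptions made for the sake of the example, not the authors' analysis.

```python
# Toy pitch-decoding example on simulated data (not the study's recordings):
# each "neuron" fires most strongly around a preferred pitch, and a simple
# linear model recovers pitch from the five firing rates.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_pitch = rng.uniform(200, 800, size=500)      # Hz, simulated vocalizations
preferred = np.linspace(250, 750, 5)              # five neurons, each tuned to a pitch
rates = np.exp(-((true_pitch[:, None] - preferred) ** 2) / (2 * 80.0**2))
rates += rng.normal(0, 0.05, rates.shape)         # measurement noise

model = LinearRegression().fit(rates[:400], true_pitch[:400])
predicted = model.predict(rates[400:])
print("mean absolute error (Hz):",
      round(float(np.mean(np.abs(predicted - true_pitch[400:]))), 1))
```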
Utterances like um, wow, and mm-hmm aren’t garbage—they keep conversations flowing.
Interjections—one-word utterances that aren’t part of a larger sentence—used to be dismissed as irrelevant linguistic detritus. But some linguists now think they play an essential role in regulating conversations. Credit: Daniel Garcia/Knowable Magazine
Listen carefully to a spoken conversation and you’ll notice that the speakers use a lot of little quasi-words—mm-hmm, um, huh? and the like—that don’t convey any information about the topic of the conversation itself. For many decades, linguists regarded such utterances as largely irrelevant noise, the flotsam and jetsam that accumulate on the margins of language when speakers aren’t as articulate as they’d like to be.
But these little words may be much more important than that. A few linguists now think that far from being detritus, they may be crucial traffic signals to regulate the flow of conversation as well as tools to negotiate mutual understanding. That puts them at the heart of language itself—and they may be the hardest part of language for artificial intelligence to master.
“Here is this phenomenon that lives right under our nose, that we barely noticed,” says Mark Dingemanse, a linguist at Radboud University in the Netherlands, “that turns out to upend our ideas of what makes complex language even possible in the first place.”
For most of the history of linguistics, scholars have tended to focus on written language, in large part because that’s what they had records of. But once recordings of conversation became available, they could begin to analyze spoken language the same way as writing.
When they did, they observed that interjections—that is, short utterances of just a word or two that are not part of a larger sentence—were ubiquitous in everyday speech. “One in every seven utterances are one of these things,” says Dingemanse, who explores the use of interjections in the 2024 Annual Review of Linguistics. “You’re going to find one of those little guys flying by every 12 seconds. Apparently, we need them.”
Many of these interjections serve to regulate the flow of conversation. “Think of it as a tool kit for conducting interactions,” says Dingemanse. “If you want to have streamlined conversations, these are the tools you need.” An um or uh from the speaker, for example, signals that they’re about to pause, but aren’t finished speaking. A quick huh? or what? from the listener, on the other hand, can signal a failure of communication that the speaker needs to repair.
That need seems to be universal: In a survey of 31 languages around the world, Dingemanse and his colleagues found that all of them used a short, neutral syllable similar to huh? as a repair signal, probably because it’s quick to produce. “In that moment of difficulty, you’re going to need the simplest possible question word, and that’s what huh? is,” says Dingemanse. “We think all societies will stumble on this, for the same reason.”
Other interjections serve as what some linguists call “continuers,” such as mm-hmm—signals from the listener that they’re paying attention and the speaker should keep going. Once again, the form of the word is well suited to its function: Because mm-hmm is made with a closed mouth, it’s clear that the signaler does not intend to speak.
Sign languages often handle continuers differently, but then again, two people signing at the same time can be less disruptive than two people speaking, says Carl Börstell, a linguist at the University of Bergen in Norway. In Swedish Sign Language, for example, listeners often sign yes as a continuer for long stretches, but to keep this continuer unobtrusive, the sender tends to hold their hands lower than usual.
Different interjections can send slightly different signals. Consider, for example, one person describing to another how to build a piece of Ikea furniture, says Allison Nguyen, a psycholinguist at Illinois State University. In such a conversation, mm-hmm might indicate that the speaker should continue explaining the current step, while yeah or OK would imply that the listener is done with that step and it’s time to move on to the next.
Wow! There’s more
Continuers aren’t merely for politeness—they really matter to a conversation, says Dingemanse. In one classic experiment from more than two decades ago, 34 undergraduate students listened as another volunteer told them a story. Some of the listeners gave the usual “I’m listening” signals, while others—who had been instructed to count the number of words beginning with the letter t—were too distracted to do so. The lack of normal signals from the listeners led to stories that were less well crafted, the researchers found. “That shows that these little words are quite consequential,” says Dingemanse.
Nguyen agrees that such words are far from meaningless. “They really do a lot for mutual understanding and mutual conversation,” she says. She’s now working to see if emojis serve similar functions in text conversations.
Storytellers depend on feedback such as mm-hmm and other interjections from their listeners. In this experiment, some listeners were told to count the number of times the storyteller used a word starting with t—a challenging task that prevented them from giving normal feedback. The quality of storytelling declined significantly, with problems like abrupt endings, rambling on, uneven or choppy pacing and overexplaining or justifying the point. Credit: Knowable Magazine
The role of interjections goes even deeper than regulating the flow of conversation. Interjections also help in negotiating the ground rules of a conversation. Every time two people converse, they need to establish an understanding of where each is coming from: what each participant knows to begin with, what they think the other person knows and how much detail they want to hear. Much of this work—what linguists call “grounding”—is carried out by interjections.
“If I’m telling you a story and you say something like ‘Wow!’ I might find that encouraging and add more detail,” says Nguyen. “But if you do something like, ‘Uh-huh,’ I’m going to assume you aren’t interested in more detail.”
A key part of grounding is working out what each participant thinks about the other’s knowledge, says Martina Wiltschko, a theoretical linguist at the Catalan Institution for Research and Advanced Studies in Barcelona, Spain. Some languages, like Mandarin, explicitly differentiate between “I’m telling you something you didn’t know” and “I’m telling you something that I think you knew already.” In English, that task falls largely on interjections.
One of Wiltschko’s favorite examples is the Canadian eh? “If I tell you you have a new dog, I’m usually not telling you stuff you don’t know, so it’s weird for me to tell you,” she says. But ‘You have a new dog, eh?’ eliminates the weirdness by flagging the statement as news to the speaker, not the listener.
Other interjections can indicate that the speaker knows they’re not giving the other participant what they sought. “If you ask me what’s the weather like in Barcelona, I can say ‘Well, I haven’t been outside yet,’” says Wiltschko. The well is an acknowledgement that she’s not quite answering the question.
Wiltschko and her students have now examined more than 20 languages, and every one of them uses little words for negotiations like these. “I haven’t found a language that doesn’t do these three general things: what I know, what I think you know and turn-taking,” she says. They are key to regulating conversations, she adds: “We are building common ground, and we are taking turns.”
Details like these aren’t just arcana for linguists to obsess over. Using interjections properly is a key part of sounding fluent in speaking a second language, notes Wiltschko, but language teachers often ignore them. “When it comes to language teaching, you get points deducted for using ums and uhs, because you’re ‘not fluent,’” she says. “But native speakers use them, because it helps! They should be taught.” Artificial intelligence, too, can struggle to use interjections well, she notes, making them the best way to distinguish between a computer and a real human.
And interjections also provide a window into interpersonal relationships. “These little markers say so much about what you think,” she says—and they’re harder to control than the actual content. Maybe couples therapists, for example, would find that interjections afford useful insights into how their clients regard one another and how they negotiate power in a conversation. The interjection oh often signals confrontation, she says, as in the difference between “Do you want to go out for dinner?” and “Oh, so now you want to go out for dinner?”
Indeed, these little words go right to the heart of language and what it is for. “Language exists because we need to interact with one another,” says Börstell. “For me, that’s the main reason for language being so successful.”
Dingemanse goes one step further. Interjections, he says, don’t just facilitate our conversations. In negotiating points of view and grounding, they’re also how language talks about talking.
“With huh? you say not just ‘I didn’t understand,’” says Dingemanse. “It’s ‘I understand you’re trying to tell me something, but I didn’t get it.’” That reflexivity enables more sophisticated speech and thought. Indeed, he says, “I don’t think we would have complex language if it were not for these simple words.”
Knowable Magazine explores the real-world significance of scholarly work through a journalistic lens.
There’s a difference between knowing a word and knowing a concept.
Large language models like ChatGPT display conversational skills, but the problem is that they don’t really understand the words they use. They are systems that interact with data about the real world rather than with the real world itself. Humans, on the other hand, associate language with experiences. We know what the word “hot” means because we’ve been burned at some point in our lives.
Is it possible to get an AI to achieve a human-like understanding of language? A team of researchers at the Okinawa Institute of Science and Technology built a brain-inspired AI model comprising multiple neural networks. The AI was very limited—it could learn a total of just five nouns and eight verbs. But their AI seems to have learned more than just those words; it learned the concepts behind them.
Babysitting robotic arms
“The inspiration for our model came from developmental psychology. We tried to emulate how infants learn and develop language,” says Prasanna Vijayaraghavan, a researcher at the Okinawa Institute of Science and Technology and the lead author of the study.
The idea of teaching AIs the same way we teach little babies is not new—researchers have applied it to standard neural nets that associated words with visuals. Researchers also tried teaching an AI using a video feed from a GoPro strapped to a human baby. The problem is that babies do way more than just associate items with words when they learn. They touch everything—grasp things, manipulate them, throw stuff around—and this way, they learn to think and plan their actions in language. An abstract AI model couldn’t do any of that, so Vijayaraghavan’s team gave one an embodied experience: their AI was trained in an actual robot that could interact with the world.
Vijayaraghavan’s robot was a fairly simple system with an arm and a gripper that could pick objects up and move them around. Vision was provided by a simple RGB camera feeding video at a somewhat crude 64×64 pixel resolution.
The robot and the camera were placed in a workspace, in front of a white table with blocks painted green, yellow, red, purple, and blue. The robot’s task was to manipulate those blocks in response to simple prompts like “move red left,” “move blue right,” or “put red on blue.” All that didn’t seem particularly challenging. What was challenging, though, was building an AI that could process all those words and movements in a manner similar to humans. “I don’t want to say we tried to make the system biologically plausible,” Vijayaraghavan told Ars. “Let’s say we tried to draw inspiration from the human brain.”
Chasing free energy
The starting point for Vijayaraghavan’s team was the free energy principle, a hypothesis that the brain constantly makes predictions about the world based on internal models, then updates these predictions based on sensory input. The idea is that we first think of an action plan to achieve a desired goal, and then this plan is updated in real time based on what we experience during execution. This goal-directed planning scheme, if the hypothesis is correct, governs everything we do, from picking up a cup of coffee to landing a dream job.
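In code, that predict-then-update loop can be sketched in a few lines. The numbers and dynamics below are invented and deliberately trivial; the point is only the shape of the computation: predict the next sensory reading, compare it to what actually arrives, and nudge the internal estimate by a fraction of the error.

```python
# Bare-bones sketch of a predict-and-update loop in the spirit of the free
# energy principle (invented numbers, not the study's model): keep an internal
# estimate, predict the next sensory reading, and correct the estimate by a
# fraction of the prediction error.
import numpy as np

rng = np.random.default_rng(1)
true_position = 0.0      # where the gripper actually is
estimate = 0.5           # the agent's internal belief (starts off wrong)
learning_rate = 0.3

for step in range(10):
    true_position += 0.1                                  # the action moves the gripper
    observation = true_position + rng.normal(0, 0.02)     # noisy sensory input
    prediction = estimate + 0.1                           # internal model predicts the move
    error = observation - prediction                      # prediction error
    estimate = prediction + learning_rate * error         # update the belief
    print(f"step {step}: belief {estimate:.3f}, actual {true_position:.3f}")
```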
All that is closely intertwined with language. Neuroscientists at the University of Parma found that motor areas in the brain got activated when the participants in their study listened to action-related sentences. To emulate that in a robot, Vijayaraghavan used four neural networks working in a closely interconnected system. The first was responsible for processing visual data coming from the camera. It was tightly integrated with a second neural net that handled proprioception: all the processes that ensured the robot was aware of its position and the movement of its body. This second neural net also built internal models of actions necessary to manipulate blocks on the table. Those two neural nets were additionally hooked up to visual memory and attention modules that enabled them to reliably focus on the chosen object and separate it from the image’s background.
The third neural net was relatively simple and processed language using vectorized representations of those “move red right” sentences. Finally, the fourth neural net worked as an associative layer and predicted the output of the previous three at every time step. “When we do an action, we don’t always have to verbalize it, but we have this verbalization in our minds at some point,” Vijayaraghavan says. The AI he and his team built was meant to do just that: seamlessly connect language, proprioception, action planning, and vision.
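A rough structural sketch of such a four-module system might look like the following. The layer sizes, input formats, and wiring are assumptions made for illustration only; this is not the published architecture.

```python
# Illustrative four-module network: vision, proprioception, language, and an
# associative layer that binds them. Sizes and wiring are invented; this is a
# sketch of the idea, not the model described in the paper.
import torch
import torch.nn as nn

class RobotLanguageModel(nn.Module):
    def __init__(self, vocab_size=16, embed_dim=32, hidden=64):
        super().__init__()
        self.vision = nn.Sequential(                  # processes 64x64 RGB frames
            nn.Conv2d(3, 8, 5, stride=2), nn.ReLU(),
            nn.Conv2d(8, 16, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 13 * 13, hidden))
        self.proprioception = nn.GRU(4, hidden, batch_first=True)   # joint states over time
        self.language = nn.Sequential(                # three-word commands as token IDs
            nn.Embedding(vocab_size, embed_dim), nn.Flatten(),
            nn.Linear(embed_dim * 3, hidden))
        self.associative = nn.Linear(hidden * 3, hidden)  # binds the three streams

    def forward(self, frame, joints, command):
        v = self.vision(frame)
        p, _ = self.proprioception(joints)
        l = self.language(command)
        return self.associative(torch.cat([v, p[:, -1], l], dim=-1))

model = RobotLanguageModel()
out = model(torch.zeros(1, 3, 64, 64),                 # one camera frame
            torch.zeros(1, 10, 4),                     # ten time steps of joint readings
            torch.zeros(1, 3, dtype=torch.long))       # a three-token command
print(out.shape)  # torch.Size([1, 64])
```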
When the robotic brain was up and running, they started teaching it some of the possible combinations of commands and sequences of movements. But they didn’t teach it all of them.
The birth of compositionality
In 2016, Brenden Lake, a professor of psychology and data science, published a paper in which his team named a set of competencies machines need to master to truly learn and think like humans. One of them was compositionality: the ability to compose or decompose a whole into parts that can be reused. This reuse lets them generalize acquired knowledge to new tasks and situations. “The compositionality phase is when children learn to combine words to explain things. They [initially] learn the names of objects, the names of actions, but those are just single words. When they learn this compositionality concept, their ability to communicate kind of explodes,” Vijayaraghavan explains.
The AI his team built was made for this exact purpose: to see if it would develop compositionality. And it did.
Once the robot learned how certain commands and actions were connected, it also learned to generalize that knowledge to execute commands it had never heard before, recognizing the names of actions it had not performed and then performing them on combinations of blocks it had never seen. Vijayaraghavan’s AI figured out the concept of moving something to the right or the left or putting an item on top of something. It could also combine words to name previously unseen actions, like putting a blue block on a red one.
While teaching robots to extract concepts from language has been done before, those efforts were focused on making them understand how words were used to describe visuals. Vijayaraghavan’s team built on that to include proprioception and action planning, basically adding a layer that integrated sense and movement into the way his robot made sense of the world.
But some issues have yet to be overcome. The AI had a very limited workspace. There were only a few objects, and all had a single, cubical shape. The vocabulary included only names of colors and actions, so no modifiers, adjectives, or adverbs. Finally, the robot had to learn around 80 percent of all possible combinations of nouns and verbs before it could generalize well to the remaining 20 percent. Its performance was worse when those ratios dropped to 60/40 and 40/60.
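The 80/20 requirement is easy to picture as a held-out split over noun-verb combinations. The command inventory below is a stand-in rather than the study's exact set, but the splitting logic is the same: train on most combinations, then ask the system to execute the ones it has never seen.

```python
# Sketch of holding out noun-verb combinations to test compositional
# generalization (the command inventory is a stand-in, not the study's):
# train on roughly 80 percent of combinations and test on the rest.
import itertools
import random

colors = ["red", "blue", "green", "yellow", "purple"]
templates = ["move {} left", "move {} right", "put {} on {}"]

commands = []
for template in templates:
    if template.count("{}") == 1:
        commands += [template.format(c) for c in colors]
    else:
        commands += [template.format(a, b) for a, b in itertools.permutations(colors, 2)]

random.seed(0)
random.shuffle(commands)
split = int(0.8 * len(commands))
train, test = commands[:split], commands[split:]
print(len(train), "training commands, e.g.", train[:2])
print(len(test), "held-out commands the model never sees, e.g.", test[:2])
```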
But it’s possible that just a bit more computing power could fix this. “What we had for this study was a single RTX 3090 GPU, so with the latest generation GPU, we could solve a lot of those issues,” Vijayaraghavan argued. The team also hopes that adding more words and more actions won’t require a dramatic jump in computing power. “We want to scale the system up. We have a humanoid robot with cameras in its head and two hands that can do way more than a single robotic arm. So that’s the next step: using it in the real world with real world robots,” Vijayaraghavan said.
Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.
Human Neuron, Digital Light Microscope. Credit: BSIP/Universal Images Group via Getty Images
“Language is a huge field, and we are novices in this. We know a lot about how different areas of the brain are involved in linguistic tasks, but the details are not very clear,” says Mohsen Jamali, a computational neuroscience researcher at Harvard Medical School who led a recent study into the mechanism of human language comprehension.
“What was unique in our work was that we were looking at single neurons. There is a lot of studies like that on animals—studies in electrophysiology, but they are very limited in humans. We had a unique opportunity to access neurons in humans,” Jamali adds.
Probing the brain
Jamali’s experiment involved playing recorded sets of words to patients who, for clinical reasons, had implants that monitored the activity of neurons located in their left prefrontal cortex—the area that’s largely responsible for processing language. “We had data from two types of electrodes: the old-fashioned tungsten microarrays that can pick the activity of a few neurons; and the Neuropixel probes which are the latest development in electrophysiology,” Jamali says. The Neuropixels were first inserted in human patients in 2022 and could record the activity of over a hundred neurons.
“So we were in the operation room and asked the patient to participate. We had a mixture of sentences and words, including gibberish sounds that weren’t actual words but sounded like words. We also had a short story about Elvis,” Jamali explains. He said the goal was to figure out if there was some structure to the neuronal response to language. Gibberish words were used as a control to see if the neurons responded to them in a different way.
“The electrodes we used in the study registered voltage—it was a continuous signal at 30 kHz sampling rate—and the critical part was to dissociate how many neurons we had in each recording channel. We used statistical analysis to separate individual neurons in the signal,” Jamali says. Then, his team synchronized the neuronal activity signals with the recordings played to the patients down to a millisecond and started analyzing the data they gathered.
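A heavily simplified version of that separation step, usually called spike sorting, can be sketched on simulated data: detect threshold crossings in a 30 kHz voltage trace, cut out the waveform around each crossing, and cluster the waveforms so that spikes from different putative neurons fall into different groups. Nothing below reflects the clinical recordings themselves.

```python
# Simplified spike-sorting sketch on a simulated 30 kHz trace (not the
# clinical data): find threshold crossings, extract waveform snippets, and
# cluster the snippets so spikes from two fake "units" separate.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
fs = 30_000                                    # 30 kHz sampling rate
trace = rng.normal(0, 1, fs)                   # one second of background noise
spike_times = np.sort(rng.choice(np.arange(100, fs - 100), 60, replace=False))
for i, t in enumerate(spike_times):            # two fake units, different amplitudes
    trace[t:t + 30] += (8 if i % 2 else 14) * np.hanning(30)

threshold = 5 * np.std(trace)
crossings = np.where((trace[1:] > threshold) & (trace[:-1] <= threshold))[0]
snippets = np.array([trace[c - 10:c + 20] for c in crossings])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(snippets)
print("detected spikes:", len(crossings), "| cluster sizes:", np.bincount(labels))
```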
Putting words in drawers
“First, we translated words in our sets to vectors,” Jamali says. Specifically, his team used Word2Vec, a technique used in computer science to find relationships between words contained in a large corpus of text. What Word2Vec can do is tell whether certain words have something in common—whether they are synonyms, for example. “Each word was represented by a vector in a 300-dimensional space. Then we just looked at the distance between those vectors and if the distance was close, we concluded the words belonged in the same category,” Jamali explains.
Then the team used these vectors to identify words that clustered together, which suggested they had something in common (something they later confirmed by examining which words were in a cluster together). They then determined whether specific neurons responded differently to different clusters of words. It turned out they did.
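The pipeline (embed words as vectors, cluster the vectors, then inspect which words land together) can be illustrated compactly. The toy corpus below is invented and far too small to yield reliable semantic clusters; the study used 300-dimensional vectors derived from a large text corpus, so its clusters are far more trustworthy than these.

```python
# Compact illustration of the embed-then-cluster pipeline described above.
# The toy corpus is invented and the resulting clusters are only suggestive;
# the study relied on 300-dimensional vectors trained on a large corpus.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

sentences = [
    ["the", "dog", "chased", "the", "cat"],
    ["a", "cat", "and", "a", "dog", "are", "animals"],
    ["rain", "and", "snow", "are", "weather"],
    ["the", "rain", "fell", "while", "snow", "melted"],
    ["my", "mother", "and", "father", "are", "family"],
    ["her", "father", "called", "her", "mother"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200, seed=0)
words = ["dog", "cat", "rain", "snow", "mother", "father"]
vectors = np.array([model.wv[w] for w in words])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)
for cluster in range(3):
    print(cluster, [w for w, lab in zip(words, labels) if lab == cluster])
```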
“We ended up with nine clusters. We looked at which words were in those clusters and labeled them,” Jamali says. It turned out that each cluster corresponded to a neat semantic domain. Specialized neurons responded to words referring to animals, while other groups responded to words referring to feelings, activities, names, weather, and so on. “Most of the neurons we registered had one preferred domain. Some had more, like two or three,” Jamali explained.
The mechanics of comprehension
The team also tested if the neurons were triggered by the mere sound of a word or by its meaning. “Apart from the gibberish words, another control we used in the study was homophones,” Jamali says. The idea was to test if the neurons responded differently to the word “sun” and the word “son,” for example.
It turned out that the response changed based on context. When the sentence made it clear the word referred to a star, the sound activated the neurons that responded to weather phenomena. When it was clear that the same sound referred to a person, it activated the neurons that responded to words for relatives. “We also presented the same words at random without any context and found that it didn’t elicit as strong a response as when the context was available,” Jamali claims.
But the language processing in our brains will need to involve more than just different semantic categories being processed by different groups of neurons.
“There are many unanswered questions in linguistic processing. One of them is how much a structure matters, the syntax. Is it represented by a distributed network, or can we find a subset of neurons that encode structure rather than meaning?” Jamali asked. Another thing his team wants to study is what the neural processing looks like during speech production, in addition to comprehension. “How are those two processes related in terms of brain areas and the way the information is processed,” Jamali adds.
The last thing—and according to Jamali the most challenging thing—is using the Neuropixel probes to see how information is processed across different layers of the brain. “The Neuropixel probe travels through the depths of the cortex, and we can look at the neurons along the electrode and say like, ‘OK, the information from this layer, which is responsible for semantics, goes to this layer, which is responsible for something else.’ We want to learn how much information is processed by each layer. This should be challenging, but it would be interesting to see how different areas of the brain are involved at the same time when presented with linguistic stimuli,” Jamali concludes.
Whales use complex communication systems we still don’t understand, a trope exploited in sci-fi shows like Apple TV’s Extrapolations. That show featured a humpback whale (voiced by Meryl Streep) discussing Mahler’s symphonies with a human researcher via some AI-powered inter-species translation app developed in 2046.
We’re a long way from that future. But a team of MIT researchers has now analyzed a database of Caribbean sperm whales’ calls and found that there really is a contextual and combinatorial structure in there. But does it mean whales have a human-like language and we can just wait for ChatGPT 8.0 to figure out how to translate from English to Sperm-Whaleish? Not really.
One-page dictionary
“Sperm whales communicate using clicks. These clicks occur in short packets we call codas that typically last less than two seconds, containing three to 40 clicks,” said Pratyusha Sharma, a researcher at the MIT Computer Science and Artificial Intelligence Laboratory and the lead author of the study. Her team argues that codas are analogues of words in human language and are further organized in coda sequences that are analogues of sentences. “Sperm whales are not born with this communication system; it’s acquired and changes over the course of time,” Sharma said.
Seemingly, sperm whales have a lot to communicate about. Earlier observational studies revealed that they live a fairly complex social life revolving around family units forming larger structures called clans. They also have advanced hunting strategies and do group decision-making, seeking consensus on where to go and what to do.
Despite this complexity in behavior and relationships, their vocabulary seemed surprisingly sparse.
Sharma’s team sourced a record of codas from the dataset of the Dominica Sperm Whale Project, a long-term study on sperm whales that recorded and annotated 8,719 individual codas made by EC-1, a sperm whale clan living in East Caribbean waters. Those 8,719 recorded codas, according to earlier research on this database, were really just 21 coda types that the whales were using over and over.
A set of 21 words didn’t look like much of a language. “But this [number] is exactly what we found was not true,” Sharma said.
Fine-grained changes
“People doing those earlier studies were looking at the calls in isolation… They were annotating these calls, taking them out of context, shuffling them up, and then tried to figure out what kind of patterns were recurring,” Sharma explained. Her team, by contrast, analyzed the same calls in their full context, basically looking at entire exchanges rather than at separate codas. “One of the things we saw was fine-grained changes in the codas that other whales participating in the exchange were noticing and reacting to. If you looked at all these calls out of context, all these fine-grained changes would be lost; they would be considered noise,” Sharma said.
The first of those newly recognized fine-grained changes was termed “rubato,” borrowed from music, where it means introducing slight variations in the tempo of a piece. Communicating sperm whales could stretch or shrink a coda while keeping the same rhythm (where rhythm describes the spacing between the clicks in a coda).
The second feature the researchers discovered was ornamentation. “An ornament is an extra click added at the end of the coda. And when you have this extra click, it marks a critical point, and the call changes. It either happens toward the beginning or at the end of the call,” said Sharma.
The whales could individually manipulate rubato and ornamentation, as well as the previously identified rhythm and tempo features. By combining these variations, they can produce a very large variety of codas. “The whales produce way more combinations of these features than 21—the information-carrying capacity of this system is a lot more capable than that,” Sharma said.
Her team identified 18 types of rhythm, three variants of rubato, five types of tempo, and the option of adding an ornament or not in the sperm whale’s communication system. That adds up to 540 possible codas, roughly 150 of which the whales frequently used in real life. Not only were sperm whales’ calls built from distinctive units at the coda level (meaning they were combinatorial), but they were also compositional in that a call contained multiple codas.
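The 540 figure is just the product of those feature inventories, which a quick enumeration confirms.

```python
# The 540 figure follows from multiplying the reported feature counts:
# 18 rhythms x 5 tempos x 3 rubato variants x 2 ornament options.
from itertools import product

rhythms, tempos, rubatos, ornament = range(18), range(5), range(3), (True, False)
codas = list(product(rhythms, tempos, rubatos, ornament))
print(len(codas))  # 540
```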
But does that get us any closer to decoding the whale’s language?
“The combinatoriality at the word level and compositionality at the sentence level in human languages is something that looks very similar to what we found,” Sharma said. But the team didn’t determine whether meaning was being conveyed, she added. And without evidence of meaning, we might be barking up the wrong tree entirely.
With all the technological advances humans have made, it may seem like we’ve lost touch with nature—but not all of us have. People in some parts of Africa use a guide more effective than any GPS system when it comes to finding beeswax and honey. This is not a gizmo, but a bird.
The Greater Honeyguide (highly appropriate name), Indicator indicator (even more appropriate scientific name), knows where all the beehives are because it eats beeswax. The Hadza people of Tanzania and Yao people of Mozambique realized this long ago. Hadza and Yao honey hunters have formed a unique relationship with this bird species by making distinct calls, and the honeyguide reciprocates with its own calls, leading them to a hive.
Because the Hadza and Yao calls differ, zoologist Claire Spottiswoode of the University of Cambridge and anthropologist Brian Wood of UCLA wanted to find out if the birds respond generically to human calls, or are attuned to their local humans. They found that the birds are much more likely to respond to a local call, meaning that they have learned to recognize that call.
Come on, get that honey
To see which sound the birds were most likely to respond to, Spottiswoode and Wood played three recordings, starting with the local call. The Yao honeyguide call is what the researchers describe as “a loud trill followed by a grunt (‘brrrr-hm’),” while the Hadza call is more of “a melodic whistle,” as they say in a study recently published in Science. The second recording they would play was the foreign call, which would be the Yao call in Hadza territory and vice versa.
The third recording was an unrelated human sound meant to test whether the human voice alone was enough for a honeyguide to follow. Because Hadza and Yao voices sound similar, the researchers would alternate among recordings of honey hunters speaking words such as their names.
So which sounds were the most effective cues for honeyguides to partner with humans? In Tanzania, local Hadza calls were three times more likely to initiate a partnership with a honeyguide than Yao calls or human voices. Local Yao calls were also the most successful in Mozambique, where, in comparison to Hadza calls and human voices, they were twice as likely to elicit a response that would lead to a cooperative effort to search for a beehive. Though honeyguides did sometimes respond to the other sounds, and were often willing to cooperate when hearing them, it became clear that the birds in each region had learned a local cultural tradition that had become just as much a part of their lives as those of the humans who began it.
Now you’re speaking my language
There is a reason that honey hunters in both the Hadza and Yao tribes told Wood and Spottiswoode that they have never changed their calls and will never change them. If they did, they’d be unlikely to gather nearly as much honey.
How did this interspecies communication evolve? Other African cultures besides the Hadza and Yao have their own calls to summon a honeyguide. Why do the types of calls differ? The researchers do not think these calls came about randomly.
Both the Hadza and Yao people have their own unique languages, and sounds from them may have been incorporated into their calls. But there is more to it than that. The Hadza often hunt animals while they search for honey. Therefore, the Hadza don’t want their calls to be recognized as human, or else the prey they are after might sense a threat and flee. This may be why they use whistles to communicate with honeyguides—by sounding like birds, they can both attract the honeyguides and stalk prey without being detected.
In contrast, the Yao do not hunt mammals, relying mostly on agriculture and fishing for food. This, along with the fact that they try to avoid potentially dangerous creatures such as lions, rhinos, and elephants, can explain why they use recognizably human vocalizations to call honeyguides. Human voices may scare these animals away, so Yao honey hunters can safely seek honey with their honeyguide partners. These findings show that cultural diversity has had a significant influence on calls to honeyguides.
While animals might not literally speak our language, the honeyguide is just one of many species that has its own way of communicating with us. They can even learn our cultural traditions.
“Cultural traditions of consistent behavior are widespread in non-human animals and could plausibly mediate other forms of interspecies cooperation,” the researchers said in the same study.
Honeyguides start guiding humans as soon as they begin to fly, and this knack, combined with learning to answer traditional calls and collaborate with honey hunters, works well for both human and bird. Maybe they are (in a way) speaking our language.