Computer science

Flour, water, salt, GitHub: The Bread Code is a sourdough baking framework

One year ago, I didn’t know how to bake bread. I just knew how to follow a recipe.

If everything went perfectly, I could turn out something plain but palatable. But should anything change—temperature, timing, flour, Mercury being in Scorpio—I’d turn out a partly poofy pancake. I presented my partly poofy pancakes to people, and they were polite, but those platters were not particularly palatable.

During a group vacation last year, a friend made fresh sourdough loaves every day, and we devoured them. He gladly shared his knowledge, his starter, and his go-to recipe. I took it all home, tried it out, and made a naturally leavened, artisanal pancake.

I took my confusion to YouTube, where I found Hendrik Kleinwächter’s “The Bread Code” channel and his video promising a course on “Your First Sourdough Bread.” I watched and learned a lot, but I couldn’t quite translate 30 minutes of intensive couch time to hours of mixing, raising, slicing, and baking. Pancakes, part three.

It felt like there had to be more to this. And there was—a whole GitHub repository more.

The Bread Code gave Kleinwächter a gratifying second career, and it’s given me bread I’m eager to serve people. This week alone, I’m making sourdough Parker House rolls, a rosemary olive loaf for Friendsgiving, and then a za’atar flatbread and standard wheat loaf for actual Thanksgiving. And each of us has learned more about perhaps the most important aspect of coding, bread, teaching, and lots of other things: patience.

Hendrik Kleinwächter on his Bread Code channel, explaining his book.

Resources, not recipes

The Bread Code is centered around a book, The Sourdough Framework. It’s an open source codebase that self-compiles into new LaTeX book editions and is free to read online. It has one real bread loaf recipe, if you can call a 68-page middle-section journey a recipe. It has 17 flowcharts, 15 tables, and dozens of timelines, process illustrations, and photos of sourdough going both well and terribly. Like any cookbook, there’s a bit about Kleinwächter’s history with this food, and some sourdough bread history. Then the reader is dropped straight into “How Sourdough Works,” which is in no way a summary.

“To understand the many enzymatic reactions that take place when flour and water are mixed, we must first understand seeds and their role in the lifecycle of wheat and other grains,” Kleinwächter writes. From there, we follow a seed through hibernation, germination, photosynthesis, and, through humans’ grinding of these seeds, exposure to amylase and protease enzymes.

I had arrived at this book with these specific loaf problems to address. But first, it asks me to consider, “What is wheat?” This sparked vivid memories of Computer Science 114, in which a professor, asked to troubleshoot misbehaving code, would instead tell students to “Think like a compiler,” or “Consider the recursive way to do it.”

And yet, “What is wheat” did help. Having a sense of what was happening inside my starter, and my dough (which is really just a big, slow starter), helped me diagnose what was going right or wrong with my breads. Extra-sticky dough and tightly arrayed holes in the bread meant I had let the bacteria win out over the yeast. I learned when to be rough with the dough to form gluten and when to gently guide it into shape to preserve its gas-filled form.

I could eat a slice of each loaf and get a sense of how things had gone. The inputs, outputs, and errors could be ascertained and analyzed more easily than in my prior stance, which was, roughly, “This starter is cursed and so am I.” Using hydration percentages, measurements relative to protein content, a few tests, and troubleshooting steps, I could move closer to fresh, delicious bread. Framework: accomplished.

I have found myself very grateful lately that Kleinwächter did not find success with 30-minute YouTube tutorials. Strangely, so has he.

Sometimes weird scoring looks pretty neat. Credit: Kevin Purdy

The slow bread of childhood dreams

“I have had some successful startups; I have also had disastrous startups,” Kleinwächter said in an interview. “I have made some money, then I’ve been poor again. I’ve done so many things.”

Most of those things involve software. Kleinwächter is a German full-stack engineer, and he has founded firms and worked at companies related to blogging, e-commerce, food ordering, travel, and health. He tried to escape the boom-bust startup cycle by starting his own digital agency before one of his products was acquired by hotel booking firm Trivago. After that, he needed a break—and he could afford to take one.

“I went to Naples, worked there in a pizzeria for a week, and just figured out, ‘What do I want to do with my life?’ And I found my passion. My passion is to teach people how to make amazing bread and pizza at home,” Kleinwächter said.

Kleinwächter’s formative bread experiences—weekend loaves baked by his mother, awe-inspiring pizza from Italian ski towns, discovering all the extra ingredients in a supermarket’s version of the dark Schwarzbrot—made him want to bake his own. Like me, he started with recipes, and he wasted a lot of time and flour turning out stuff that produced both failures and a drive for knowledge. He dug in, learned as much as he could, and once he had his head around the how and why, he worked on a way to guide others along the path.

Bugs and syntax errors in baking

When using recipes, there’s a strong, societally reinforced idea that there is one best, tested, and timed way to arrive at a finished food. That’s why we have America’s Test Kitchen, The Food Lab, and all manner of blogs and videos promoting food “hacks.” I should know; I wrote up a whole bunch of them as a young Lifehacker writer. I’m still a fan of such things, from the standpoint of simply getting food done.

As such, the ultimate “hack” for making bread is to use commercial yeast, i.e., dried “active” or “instant” yeast. A manufacturer has done the work of selecting and isolating yeast at its prime state and preserving it for you. Get your liquids and dough to a yeast-friendly temperature and you’ve removed most of the variables; your success should be repeatable. If you just want bread, you can make the iconic no-knead bread with prepared yeast and very little intervention, and you’ll probably get bread that’s better than you can get at the grocery store.

Baking sourdough—or “naturally leavened,” or with “levain”—means a lot of intervention. You are cultivating and maintaining a small ecosystem of yeast and bacteria, unleashing them onto flour, water, and salt, and stepping in after they’ve produced enough flavor and lift—but before they eat all the stretchy gluten bonds. What that looks like depends on many things: your water, your flours, what you fed your starter, how active it was when you added it, the air in your home, and other variables. Most important is your ability to notice things over long periods of time.

When things go wrong, debugging can be tricky. I was able to personally ask Kleinwächter what was up with my bread, because I was interviewing him for this article. There were many potential answers, including:

  • I should recognize, first off, that I was trying to bake the hardest kind of bread: Freestanding wheat-based sourdough
  • You have to watch—and smell—your starter to make sure it has the right mix of yeast to bacteria before you use it
  • Using less starter (lower “inoculation”) would make it easier not to over-ferment
  • Eyeballing my dough rise in a bowl was hard; try measuring a sample in something like an aliquot tube
  • Winter and summer call for very different dough timings, even with modern indoor climate control

But I kept with it. I was particularly susceptible to wanting things to go quicker and demanding to see a huge rise in my dough before baking. This ironically leads to the flattest results, as the bacteria eat all the gluten bonds. When I slowed down, changed just one thing at a time, and looked deeper into my results, I got better.

The Bread Code YouTube page and the ways in which one must cater to algorithms. Credit: The Bread Code

YouTube faces and TikTok sausage

Emailing and trading video responses with Kleinwächter, I got the sense that he, too, has learned to go the slow, steady route with his Bread Code project.

For a while, he was turning out YouTube videos, and he wanted them to work. “I’m very data-driven and very analytical. I always read the video metrics, and I try to optimize my videos,” Kleinwächter said. “Which means I have to use a clickbait title, and I have to use a clickbait-y thumbnail, plus I need to make sure that I catch people in the first 30 seconds of the video.” This, however, is “not good for us as humans because it leads to more and more extreme content.”

Kleinwächter also dabbled in TikTok, making videos in which, leaning into his German heritage, “the idea was to turn everything into a sausage.” The metrics and imperatives on TikTok were similar to those on YouTube but hyperscaled. He could put hours or days into a video, only for 1 percent of his 200,000 YouTube subscribers to see it unless he caught the algorithm wind.

The frustrations inspired him to slow down and focus on his site and his book. With his community’s help, The Bread Code has just finished its second Kickstarter-backed printing run of 2,000 copies. There’s a Discord full of bread heads eager to diagnose and correct each other’s loaves and occasional pull requests from inspired readers. Kleinwächter has seen people go from buying what he calls “Turbo bread” at the store to making their own, and that’s what keeps him going. He’s not gambling on an attention-getting hit, but he’s in better control of how his knowledge and message get out.

“I think homemade bread is something that’s super, super undervalued, and I see a lot of benefits to making it yourself,” Kleinwächter said. “Good bread just contains flour, water, and salt—nothing else.”

A test loaf of rosemary olive sourdough bread. An uneven amount of olive bits ended up on the top and bottom, because there is always more to learn. Credit: Kevin Purdy

You gotta keep doing it—that’s the hard part

I can’t say it has been entirely smooth sailing ever since I self-certified with The Bread Code framework. I know what level of fermentation I’m aiming for, but I sometimes get home from an outing later than planned, arriving at dough that’s trying to escape its bucket. My starter can be very temperamental when my house gets dry and chilly in the winter. And my dough slicing (scoring), being the very last step before baking, can be rushed, resulting in some loaves with weird “ears,” not quite ready for the bakery window.

But that’s all part of it. Your sourdough starter is a collection of organisms that are best suited to what you’ve fed them, developed over time, shaped by their environment. There are some modern hacks that can help make good bread, like using a pH meter. But the big hack is just doing it, learning from it, and getting better at figuring out what’s going on. I’m thankful that folks like Kleinwächter are out there encouraging folks like me to slow down, hack less, and learn more.


Qubit that makes most errors obvious now available to customers


Can a small machine that makes error correction easier upend the market?

A graphic representation of the two resonance cavities that can hold photons, along with a channel that lets the photon move between them. Credit: Quantum Circuits

We’re nearing the end of the year, and there is typically a flood of announcements regarding quantum computers around now, in part because some companies want to live up to promised schedules. Most of these involve evolutionary improvements on previous generations of hardware. But this year, we have something new: the first company to market with a new qubit technology.

The technology is called a dual-rail qubit, and it is intended to make the most common form of error trivially easy to detect in hardware, thus making error correction far more efficient. And, while tech giant Amazon has been experimenting with them, a startup called Quantum Circuits is the first to give the public access to dual-rail qubits via a cloud service.

While the tech is interesting on its own, it also provides us with a window into how the field as a whole is thinking about getting error-corrected quantum computing to work.

What’s a dual-rail qubit?

Dual-rail qubits are variants of the hardware used in transmons, the qubits favored by companies like Google and IBM. The basic hardware unit links a loop of superconducting wire to a tiny cavity that allows microwave photons to resonate. This setup allows the presence of microwave photons in the resonator to influence the behavior of the current in the wire and vice versa. In a transmon, microwave photons are used to control the current. But there are other companies that have hardware that does the reverse, controlling the state of the photons by altering the current.

Dual-rail qubits use two of these systems linked together, allowing photons to move from one resonator to the other. Using the superconducting loops, it’s possible to control the probability that a photon will end up in the left or right resonator. The actual location of the photon will remain unknown until it’s measured, allowing the system as a whole to hold a single bit of quantum information—a qubit.

This has an obvious disadvantage: You have to build twice as much hardware for the same number of qubits. So why bother? Because the vast majority of errors involve the loss of the photon, and that’s easily detected. “It’s about 90 percent or more [of the errors],” said Quantum Circuits’ Andrei Petrenko. “So it’s a huge advantage that we have with photon loss over other errors. And that’s actually what makes the error correction a lot more efficient: The fact that photon losses are by far the dominant error.”

Petrenko said that, without doing a measurement that would disrupt the storage of the qubit, it’s possible to determine if there is an odd number of photons in the hardware. If that isn’t the case, you know an error has occurred—most likely a photon loss (gains of photons are rare but do occur). For simple algorithms, this would be a signal to simply start over.

But it does not eliminate the need for error correction if we want to do more complex computations that can’t make it to completion without encountering an error. There’s still the remaining 10 percent of errors, which are primarily something called a phase flip that is distinct to quantum systems. Bit flips are even more rare in dual-rail setups. Finally, simply knowing that a photon was lost doesn’t tell you everything you need to know to fix the problem; error-correction measurements of other parts of the logical qubit are still needed to fix any problems.
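The detect-and-restart idea can be sketched as a toy simulation. This is not Quantum Circuits' software stack or a model of its hardware: the function names, the 10 percent per-run error probability, and the run count are all assumptions made for illustration. Only the roughly 90 percent share of errors due to photon loss comes from Petrenko's description. Each run starts with one photon shared between two resonators, and a photon-number parity check flags any run in which the count is no longer odd.

```python
import random


def run_once(rng, p_error, loss_fraction):
    """Simulate one shot of a toy dual-rail qubit.

    Returns (error, flagged): the error that occurred (or None) and
    whether the photon-number parity check flagged the shot.
    """
    photons = [1, 0]  # exactly one photon shared by the two resonators
    error = None
    if rng.random() < p_error:
        if rng.random() < loss_fraction:
            photons = [0, 0]  # the photon leaked out of the hardware
            error = "photon_loss"
        else:
            error = "phase_flip"  # photon number is unchanged
    flagged = sum(photons) % 2 == 0  # an odd photon count is expected
    return error, flagged


def summarize(n_runs=10_000, p_error=0.1, loss_fraction=0.9, seed=0):
    """Compare the raw error rate with the rate among unflagged runs."""
    rng = random.Random(seed)
    errors = losses = flagged_runs = undetected = 0
    for _ in range(n_runs):
        error, flagged = run_once(rng, p_error, loss_fraction)
        errors += error is not None
        losses += error == "photon_loss"
        flagged_runs += flagged
        undetected += error is not None and not flagged
    kept = n_runs - flagged_runs  # runs that passed the parity check
    return {
        "raw_error_rate": errors / n_runs,
        "photon_loss": losses,
        "flagged": flagged_runs,
        "undetected_rate": undetected / kept,
    }


if __name__ == "__main__":
    print(summarize())
```

In this toy model, every photon loss is flagged and every phase flip slips through, which is why discarding flagged runs lowers the error rate without removing the need for error correction.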

The layout of the new machine. Each qubit (gray square) involves a left and right resonance chamber (blue dots) that a photon can move between. Each of the qubits has connections that allow entanglement with its nearest neighbors. Credit: Quantum Circuits

In fact, the initial hardware that’s being made available is too small to even approach useful computations. Instead, Quantum Circuits chose to link eight qubits with nearest-neighbor connections in order to allow it to host a single logical qubit that enables error correction. Put differently: this machine is meant to enable people to learn how to use the unique features of dual-rail qubits to improve error correction.

One consequence of having this distinctive hardware is that the software stack that controls operations needs to take advantage of its error detection capabilities. None of the other hardware on the market can be directly queried to determine whether it has encountered an error. So, Quantum Circuits has had to develop its own software stack to allow users to actually benefit from dual-rail qubits. Petrenko said that the company also chose to provide access to its hardware via its own cloud service because it wanted to connect directly with the early adopters in order to better understand their needs and expectations.

Numbers or noise?

Given that a number of companies have already released multiple revisions of their quantum hardware and have scaled them into hundreds of individual qubits, it may seem a bit strange to see a company enter the market now with a machine that has just a handful of qubits. But amazingly, Quantum Circuits isn’t alone in planning a relatively late entry into the market with hardware that only hosts a few qubits.

Having talked with several of them, I can see a logic to what they’re doing. What follows is my attempt to convey that logic in a general form, without focusing on any single company’s case.

Everyone agrees that the future of quantum computation is error correction, which requires linking together multiple hardware qubits into a single unit termed a logical qubit. To get really robust, error-free performance, you have two choices. One is to devote lots of hardware qubits to the logical qubit, so you can handle multiple errors at once. Or you can lower the error rate of the hardware, so that you can get a logical qubit with equivalent performance while using fewer hardware qubits. (The two options aren’t mutually exclusive, and everyone will need to do a bit of both.)

The two options pose very different challenges. Improving the hardware error rate means diving into the physics of individual qubits and the hardware that controls them. In other words, getting lasers that have fewer of the inevitable fluctuations in frequency and energy. Or figuring out how to manufacture loops of superconducting wire with fewer defects or handle stray charges on the surface of electronics. These are relatively hard problems.

By contrast, scaling qubit count largely involves being able to consistently do something you already know how to do. So, if you already know how to make good superconducting wire, you simply need to make a few thousand instances of that wire instead of a few dozen. The electronics that will trap an atom can be made in a way that will make it easier to make them thousands of times. These are mostly engineering problems, and generally of similar complexity to problems we’ve already solved to make the electronics revolution happen.

In other words, within limits, scaling is a much easier problem to solve than errors. It’s still going to be extremely difficult to get the millions of hardware qubits we’d need to error correct complex algorithms on today’s hardware. But if we can get the error rate down a bit, we can use smaller logical qubits and might only need 10,000 hardware qubits, which will be more approachable.
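The trade-off between error rate and qubit count can be made concrete with a back-of-the-envelope sketch. It assumes a surface code and a commonly quoted rule of thumb, neither of which comes from the companies discussed here: the logical error rate is taken as roughly 0.1 × (p / p_th)^((d + 1) / 2) for code distance d, with a threshold p_th of about 1 percent, and each logical qubit is taken to need 2d² − 1 hardware qubits. The function names, the prefactor, the threshold, and the targets are all illustrative assumptions, so treat the outputs as orders of magnitude at best.

```python
def distance_needed(p_phys, target, p_th=1e-2, prefactor=0.1, max_d=199):
    """Smallest odd code distance d whose estimated logical error rate,
    prefactor * (p_phys / p_th) ** ((d + 1) / 2), is at or below target."""
    if p_phys >= p_th:
        raise ValueError("above threshold: adding qubits does not help")
    for d in range(3, max_d + 1, 2):
        # d is odd, so (d + 1) // 2 is an exact integer exponent
        if prefactor * (p_phys / p_th) ** ((d + 1) // 2) <= target:
            return d
    raise ValueError("target not reachable within max_d")


def hardware_qubits(p_phys, target, n_logical):
    """Return (distance, total hardware qubits) under the assumed model."""
    d = distance_needed(p_phys, target)
    per_logical = 2 * d * d - 1  # data qubits plus measurement qubits
    return d, n_logical * per_logical


if __name__ == "__main__":
    for p_phys in (1e-3, 1e-4):
        d, total = hardware_qubits(p_phys, target=1e-12, n_logical=100)
        print(f"p = {p_phys:g}: distance {d}, about {total} hardware qubits")
```

Under these assumptions, a tenfold drop in the hardware error rate roughly halves the required code distance, and because the qubit count grows with the square of the distance, the hardware bill shrinks by considerably more than half.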

Errors first

And there’s evidence that even the early entries in quantum computing have reasoned the same way. Google has been working on iterations of the same chip design since its 2019 quantum supremacy announcement, focusing on understanding the errors that occur on improved versions of that chip. IBM made hitting the 1,000 qubit mark a major goal but has since been focused on reducing the error rate in smaller processors. Someone at a quantum computing startup once told us it would be trivial to trap more atoms in its hardware and boost the qubit count, but there wasn’t much point in doing so given the error rates of the qubits on the then-current generation machine.

The new companies entering this market now are making the argument that they have a technology that will either radically reduce the error rate or make handling the errors that do occur much easier. Quantum Circuits clearly falls into the latter category, as dual-rail qubits are entirely about making the most common form of error trivial to detect. The former category includes companies like Oxford Ionics, which has indicated it can perform single-qubit gates with a fidelity of over 99.9991 percent. Or Alice & Bob, which stores qubits in the behavior of multiple photons in a single resonance cavity, making them very robust to the loss of individual photons.

These companies are betting that they have distinct technology that will let them handle error rate issues more effectively than established players. That will lower the total scaling they need to do, and scaling will be an easier problem overall—and one that they may already have the pieces in place to handle. Quantum Circuits’ Petrenko, for example, told Ars, “I think that we’re at the point where we’ve gone through a number of iterations of this qubit architecture where we’ve de-risked a number of the engineering roadblocks.” And Oxford Ionics told us that if they could make the electronics they use to trap ions in their hardware once, it would be easy to mass manufacture them.

None of this should imply that these companies will have it easy compared to a startup that already has experience with both reducing errors and scaling, or a giant like Google or IBM that has the resources to do both. But it does explain why, even at this stage in quantum computing’s development, we’re still seeing startups enter the field.

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.


Microsoft and Atom Computing combine for quantum error correction demo


New work provides a good view of where the field currently stands.

The first-generation tech demo of Atom’s hardware. Things have progressed considerably since. Credit: Atom Computing

In September, Microsoft made an unusual combination of announcements. It demonstrated progress with quantum error correction, something that will be needed for the technology to move much beyond the interesting demo phase, using hardware from a quantum computing startup called Quantinuum. At the same time, however, the company also announced that it was forming a partnership with a different startup, Atom Computing, which uses a different technology to make qubits available for computations.

Given that, it was probably inevitable that the folks in Redmond, Washington, would want to show that similar error correction techniques would also work with Atom Computing’s hardware. It didn’t take long, as the two companies are releasing a draft manuscript describing their work on error correction today. The paper serves as both a good summary of where things currently stand in the world of error correction and a good look at some of the distinct features of computation using neutral atoms.

Atoms and errors

While we have various technologies that provide a way of storing and manipulating bits of quantum information, none of them can be operated error-free. At present, errors make it difficult to perform even the simplest computations that are clearly beyond the capabilities of classical computers. More sophisticated algorithms would inevitably encounter an error before they could be completed, a situation that would remain true even if we could somehow improve the hardware error rates of qubits by a factor of 1,000—something we’re unlikely to ever be able to do.

The solution to this is to use what are called logical qubits, which distribute quantum information across multiple hardware qubits and allow the detection and correction of errors when they occur. Since multiple qubits get linked together to operate as a single logical unit, the hardware error rate still matters. If it’s too high, then adding more hardware qubits just means that errors will pop up faster than they can possibly be corrected.

We’re now at the point where, for a number of technologies, hardware error rates have passed the break-even point, and adding more hardware qubits can lower the error rate of a logical qubit based on them. This was demonstrated using neutral atom qubits by an academic lab at Harvard University about a year ago. The new manuscript demonstrates that it also works on a commercial machine from Atom Computing.

Neutral atoms, which can be held in place using a lattice of laser light, have a number of distinct advantages when it comes to quantum computing. Every single atom will behave identically, meaning that you don’t have to manage the device-to-device variability that’s inevitable with fabricated electronic qubits. Atoms can also be moved around, allowing any atom to be entangled with any other. This any-to-any connectivity can enable more efficient algorithms and error-correction schemes. The quantum information is typically stored in the spin of the atom’s nucleus, which is shielded from environmental influences by the cloud of electrons that surround it, making these relatively long-lived qubits.

Operations, including gates and readout, are performed using lasers. The way the physics works, the spacing of the atoms determines how the laser affects them. If two atoms are a critical distance apart, the laser can perform a single operation, called a two-qubit gate, that affects both of their states. Anywhere outside this distance, and a laser only affects each atom individually. This allows a fine control over gate operations.

That said, operations are relatively slow compared to some electronic qubits, and atoms can occasionally be lost entirely. The optical traps that hold atoms in place are also contingent upon the atom being in its ground state; if any atom ends up stuck in a different state, it will be able to drift off and be lost. This is actually somewhat useful, in that it converts an unexpected state into a clear error.

Image of a grid of dots arranged in sets of parallel vertical rows. There is a red bar across the top, and a green bar near the bottom of the grid.

Atom Computing’s system. Rows of atoms are held far enough apart so that a single laser sent across them (green bar) only operates on individual atoms. If the atoms are moved to the interaction zone (red bar), a laser can perform gates on pairs of atoms. Spaces where atoms can be held can be left empty to avoid performing unneeded operations. Credit: Reichardt, et al.

The machine used in the new demonstration hosts 256 of these neutral atoms. Atom Computing has them arranged in sets of parallel rows, with space in between to let the atoms be shuffled around. For single-qubit gates, it’s possible to shine a laser across the rows, causing every atom it touches to undergo that operation. For two-qubit gates, pairs of atoms get moved to the end of the row and moved a specific distance apart, at which point a laser will cause the gate to be performed on every pair present.

Atom’s hardware also allows a constant supply of new atoms to be brought in to replace any that are lost. It’s also possible to image the atom array in between operations to determine whether any atoms have been lost and if any are in the wrong state.

It’s only logical

As a general rule, the more hardware qubits you dedicate to each logical qubit, the more simultaneous errors you can identify. This identification can enable two ways of handling the error. In the first, you simply discard any calculation with an error and start over. In the second, you can use information about the error to try to fix it, although the repair involves additional operations that can potentially trigger a separate error.

For this work, the Microsoft/Atom team used relatively small logical qubits (meaning they used very few hardware qubits), which meant they could fit more of them within the 256 total hardware qubits the machine made available. They also checked the error rate of both error detection with discard and error detection with correction.

The research team did two main demonstrations. One was placing 24 of these logical qubits into what’s called a cat state, named after Schrödinger’s hypothetical feline. This is when a quantum object simultaneously has non-zero probability of being in two mutually exclusive states. In this case, the researchers placed 24 logical qubits in an entangled cat state, the largest ensemble of this sort yet created. Separately, they implemented what’s called the Bernstein-Vazirani algorithm. The classical version of this algorithm requires individual queries to identify each bit in a string of them; the quantum version obtains the entire string with a single query, so is a notable case of something where a quantum speedup is possible.
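The query-count difference in Bernstein-Vazirani can be shown with a small classical simulation. This is a sketch of the textbook algorithm on bare, error-free qubits, not the logical-qubit implementation from the manuscript; the function names and the five-bit secret are made up for illustration. The classical routine asks the oracle about one bit at a time, while the simulated quantum routine applies the oracle once to a superposition of all inputs and reads the whole string off after a second round of Hadamard gates.

```python
def dot_mod2(a, b):
    """Bitwise dot product of two integers, modulo 2."""
    return bin(a & b).count("1") % 2


def make_oracle(secret):
    """Classical oracle f(x) = secret . x (mod 2) that counts its calls."""
    calls = {"count": 0}

    def oracle(x):
        calls["count"] += 1
        return dot_mod2(secret, x)

    return oracle, calls


def classical_recover(oracle, n_bits):
    """Recover the secret one bit per query by probing each position."""
    secret = 0
    for i in range(n_bits):
        if oracle(1 << i):
            secret |= 1 << i
    return secret


def hadamard_all(state):
    """Apply a Hadamard gate to every qubit (fast Walsh-Hadamard transform)."""
    state = list(state)
    size = len(state)
    half = 1
    while half < size:
        for start in range(0, size, 2 * half):
            for j in range(start, start + half):
                a, b = state[j], state[j + half]
                state[j], state[j + half] = a + b, a - b
        half *= 2
    norm = size ** 0.5
    return [amp / norm for amp in state]


def quantum_recover(secret, n_bits):
    """Simulate the quantum circuit: H on all qubits, ONE oracle call, H again."""
    size = 1 << n_bits
    state = [0.0] * size
    state[0] = 1.0  # start in |00...0>
    state = hadamard_all(state)
    # Single application of the phase oracle: |x> -> (-1)^(secret . x) |x>
    state = [-amp if dot_mod2(secret, x) else amp for x, amp in enumerate(state)]
    state = hadamard_all(state)
    probabilities = [amp * amp for amp in state]
    return max(range(size), key=probabilities.__getitem__)


if __name__ == "__main__":
    hidden = 0b10110
    oracle, calls = make_oracle(hidden)
    print(bin(classical_recover(oracle, 5)), "after", calls["count"], "queries")
    print(bin(quantum_recover(hidden, 5)), "after 1 simulated oracle call")
```

The simulation itself takes exponential memory in the number of bits, which is the usual cost of mimicking quantum hardware classically; the point is only that the circuit touches the oracle once.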

Both of these showed a similar pattern. When done directly on the hardware, with each qubit being a single atom, there was an appreciable error rate. By detecting errors and discarding those calculations where they occurred, it was possible to significantly improve the error rate of the remaining calculations. Note that this doesn’t eliminate errors, as it’s possible for multiple errors to occur simultaneously, altering the value of the qubit without leaving an indication that can be spotted with these small logical qubits.

Discarding has its limits; as calculations become increasingly complex, involving more qubits or operations, it will inevitably mean every calculation will have an error, so you’d end up wanting to discard everything. Which is why we’ll ultimately need to correct the errors.

In these experiments, however, the process of correcting the error—taking an entirely new atom and setting it into the appropriate state—was also error-prone. So, while it could be done, it ended up having an overall error rate that was intermediate between the approach of catching and discarding errors and the rate when operations were done directly on the hardware.
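The ordering of those three error rates can be reproduced with a classical stand-in. The sketch below uses a three-copy repetition code rather than the quantum codes in the manuscript, and every number in it (a 5 percent flip probability per copy, a 10 percent chance that a repair goes wrong) is an assumption chosen for illustration. It compares a bare bit, detection with discard, and detection with an imperfect repair.

```python
def raw_error(p):
    """Error rate of a single unprotected bit that flips with probability p."""
    return p


def detect_and_discard(p):
    """Three copies; throw away any run in which the copies disagree.

    Returns (error rate among kept runs, fraction of runs kept).
    An error survives only when all three copies flip together.
    """
    kept = (1 - p) ** 3 + p ** 3
    return p ** 3 / kept, kept


def detect_and_correct(p, p_repair):
    """Three copies with majority-vote repair and no discarding.

    Toy model: the result is wrong if two or more copies flipped, or if
    exactly one copy flipped and the repair step itself failed.
    """
    majority_wrong = 3 * p ** 2 * (1 - p) + p ** 3
    single_flip = 3 * p * (1 - p) ** 2
    return majority_wrong + single_flip * p_repair


if __name__ == "__main__":
    p = 0.05
    discard_rate, kept = detect_and_discard(p)
    print(f"raw:     {raw_error(p):.5f}")
    print(f"correct: {detect_and_correct(p, p_repair=0.1):.5f}")
    print(f"discard: {discard_rate:.5f} (keeping {kept:.1%} of runs)")
```

With these made-up numbers, correction lands between the raw rate and the discard rate, mirroring the pattern reported above. The price of the discard approach shows up in the fraction of runs kept, which shrinks as calculations get longer.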

In the end, the current hardware has an error rate that’s good enough that error correction actually improves the probability that a set of operations can be performed without producing an error. But not good enough that we can perform the sort of complex operations that would lead quantum computers to have an advantage in useful calculations. And that’s not just true for Atom’s hardware; similar things can be said for other error-correction demonstrations done on different machines.

There are two ways to go beyond these current limits. One is simply to improve the error rates of the hardware qubits further, as fewer total errors make it more likely that we can catch and correct them. The second is to increase the qubit counts so that we can host larger, more robust logical qubits. We’re obviously going to need to do both, and Atom’s partnership with Microsoft was formed in the hope that it will help both companies get there faster.


IBM boosts the amount of computation you can get done on quantum hardware

By making small adjustments to the frequency that the qubits are operating at, it’s possible to avoid these problems. This can be done when the Heron chip is being calibrated before it’s opened for general use.

Separately, the company has done a rewrite of the software that controls the system during operations. “After learning from the community, seeing how to run larger circuits, [we were able to] almost better define what it should be and rewrite the whole stack towards that,” Gambetta said. The result is a dramatic speed-up. “Something that took 122 hours now is down to a couple of hours,” he told Ars.

Since people are paying for time on this hardware, that’s good for customers now. However, it could also pay off in the longer run, as some errors can occur randomly, so less time spent on a calculation can mean fewer errors.

Deeper computations

Despite all those improvements, errors are still likely during any significant calculations. While it continues to work toward developing error-corrected qubits, IBM is focusing on what it calls error mitigation, which it first detailed last year. As we described it then:

“The researchers turned to a method where they intentionally amplified and then measured the processor’s noise at different levels. These measurements are used to estimate a function that produces similar output to the actual measurements. That function can then have its noise set to zero to produce an estimate of what the processor would do without any noise at all.”

The problem here is that using the function is computationally difficult, and the difficulty increases with the qubit count. So, while it’s still easier to do error mitigation calculations than simulate the quantum computer’s behavior on the same hardware, there’s still the risk of it becoming computationally intractable. But IBM has taken the time to optimize that, too. “They’ve got algorithmic improvements, and the method that uses tensor methods [now] uses the GPU,” Gambetta told Ars. “So I think it’s a combination of both.”
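The amplify-and-extrapolate idea in the quoted description can be illustrated with a toy calculation. The sketch below is not IBM's method, which the article says relies on tensor techniques; it assumes a made-up observable whose measured value falls off linearly as noise is scaled up, fits a straight line by ordinary least squares, and reads off the fitted value at zero noise. The function name and the data points are hypothetical.

```python
def extrapolate_to_zero_noise(noise_scales, measured_values):
    """Fit measured = slope * scale + intercept by least squares; return intercept.

    The intercept is the fitted value at noise scale 0, i.e. an estimate of
    what a noiseless processor would produce if the linear model held.
    """
    n = len(noise_scales)
    if n < 2 or n != len(measured_values):
        raise ValueError("need at least two (scale, value) pairs")
    mean_x = sum(noise_scales) / n
    mean_y = sum(measured_values) / n
    sxx = sum((x - mean_x) ** 2 for x in noise_scales)
    if sxx == 0:
        raise ValueError("noise scales must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(noise_scales, measured_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Hypothetical data: scale 1.0 is the native noise level; 2.0 and 3.0 are amplified.
scales = [1.0, 2.0, 3.0]
values = [0.80, 0.60, 0.40]  # invented expectation values, not real measurements
zero_noise_estimate = extrapolate_to_zero_noise(scales, values)
```

With these invented numbers the line drops by 0.2 per unit of noise, so the fitted value at zero noise comes out at about 1.0. A real mitigation pipeline would use a richer noise model than a straight line.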


Google identifies low noise “phase transition” in its quantum processor


Noisy, but not that noisy

Benchmark may help us understand how quantum computers can operate with low error.

Image of a chip above iridescent wiring.

Google’s Sycamore processor. Credit: Google

Back in 2019, Google made waves by claiming it had achieved what has been called “quantum supremacy”—the ability of a quantum computer to perform operations that would take a wildly impractical amount of time to simulate on standard computing hardware. That claim proved to be controversial, in that the operations were little more than a benchmark that involved getting the quantum computer to behave like a quantum computer; separately, improved ideas about how to perform the simulation on a supercomputer cut the time required down significantly.

But Google is back with a new exploration of the benchmark, described in a paper published in Nature on Wednesday. It uses the benchmark to identify what it calls a phase transition in the performance of its quantum processor and uses it to identify conditions where the processor can operate with low noise. Taking advantage of that, they again show that, even giving classical hardware every potential advantage, it would take a supercomputer a dozen years to simulate things.

Cross entropy benchmarking

The benchmark in question involves the performance of what are called quantum random circuits, which involve performing a set of operations on qubits and letting the state of the system evolve over time, so that the output depends heavily on the stochastic nature of measurement outcomes in quantum mechanics. Each qubit will have a probability of producing one of two results, but unless that probability is one, there’s no way of knowing which of the results you’ll actually get. As a result, the output of the operations will be a string of truly random bits.

If enough qubits are involved in the operations, then it becomes increasingly difficult to simulate the performance of a quantum random circuit on classical hardware. That difficulty is what Google originally used to claim quantum supremacy.

The big challenge with running quantum random circuits on today’s hardware is the inevitability of errors. And there’s a specific approach, called cross-entropy benchmarking, that relates the performance of quantum random circuits to the overall fidelity of the hardware (meaning its ability to perform error-free operations).
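One common form of this estimator, linear cross-entropy benchmarking, is short enough to write down. The sketch assumes you already have the ideal output probability of every bitstring from a classical simulation (the expensive part) and scores the hardware as 2^n times the average ideal probability of the bitstrings it actually returned, minus 1. The article does not say which variant Google's paper uses, so treat that choice as an assumption; the function name and the two-qubit toy distribution are invented for illustration.

```python
def linear_xeb_fidelity(measured_bitstrings, ideal_probs, n_qubits):
    """Estimate fidelity as 2**n * mean(ideal probability of each sample) - 1."""
    if not measured_bitstrings:
        raise ValueError("need at least one measured bitstring")
    mean_p = (sum(ideal_probs[b] for b in measured_bitstrings)
              / len(measured_bitstrings))
    return (2 ** n_qubits) * mean_p - 1.0


# Toy 2-qubit "ideal" distribution (made up; a real one comes from simulating the circuit).
ideal = {"00": 0.5, "01": 0.3, "10": 0.15, "11": 0.05}

# A fully noisy device returns every bitstring equally often, so it scores 0.
uniform_samples = ["00", "01", "10", "11"] * 25
noisy_score = linear_xeb_fidelity(uniform_samples, ideal, n_qubits=2)

# A device whose samples follow the ideal distribution scores 2**n * sum(p**2) - 1,
# which is above 0 whenever the ideal distribution is not uniform.
ideal_samples = ["00"] * 50 + ["01"] * 30 + ["10"] * 15 + ["11"] * 5
ideal_score = linear_xeb_fidelity(ideal_samples, ideal, n_qubits=2)
```

The key property is that pure noise scores zero while faithful sampling scores higher, so the number tracks how much of the circuit's structure survived the errors.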

Google Principal Scientist Sergio Boixo likened performing quantum random circuits to a race between trying to build the circuit and errors that would destroy it. “In essence, this is a competition between quantum correlations spreading because you’re entangling, and random circuits entangle as fast as possible,” he told Ars. “We use two qubit gates that entangle as fast as possible. So it’s a competition between correlations or entanglement growing as fast as you want. On the other hand, noise is doing the opposite. Noise is killing correlations, it’s killing the growth of correlations. So these are the two tendencies.”

The focus of the paper is using the cross-entropy benchmark to explore the errors that occur on the company’s latest generation of Sycamore chip and using that to identify the transition point between situations where errors dominate and what the paper terms a “low noise regime,” where the probability of errors is minimized—where entanglement wins the race. The researchers likened this to a phase transition between two states.

Low noise performance

The researchers used a number of methods to identify the location of this phase transition, including numerical estimates of the system’s behavior and experiments using the Sycamore processor. Boixo explained that the transition point is related to the errors per cycle, with each cycle involving performing an operation on all of the qubits involved. So, the total number of qubits being used influences the location of the transition, since more qubits means more operations to perform. But so does the overall error rate on the processor.

If you want to operate in the low noise regime, then you have to limit the number of qubits involved (which has the side effect of making things easier to simulate on classical hardware). The only way to add more qubits is to lower the error rate. While the Sycamore processor itself had a well-understood minimal error rate, Google could artificially increase that error rate and then gradually lower it to explore Sycamore’s behavior at the transition point.
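A back-of-the-envelope version of that trade-off: if each qubit suffers an error with some small probability in a cycle, the expected number of errors per cycle grows roughly in proportion to the number of qubits. The sketch below assumes independent errors and a hypothetical cap on errors per cycle that marks the edge of the low noise regime. The real transition point depends on details the article doesn't give, so every number here is a placeholder.

```python
import math


def errors_per_cycle(n_qubits, error_rate_per_qubit):
    """Expected number of errors in one cycle, assuming independent errors."""
    return n_qubits * error_rate_per_qubit


def max_qubits_in_low_noise(error_rate_per_qubit, threshold_errors_per_cycle):
    """Largest qubit count whose expected errors per cycle do not exceed the cap."""
    if error_rate_per_qubit <= 0:
        raise ValueError("error rate must be positive")
    return math.floor(threshold_errors_per_cycle / error_rate_per_qubit)


HYPOTHETICAL_THRESHOLD = 0.45  # placeholder, not a figure from the paper

# Lowering the per-qubit error rate is the only way to fit more qubits under the cap.
qubits_at_base_rate = max_qubits_in_low_noise(0.004, HYPOTHETICAL_THRESHOLD)
qubits_at_lower_rate = max_qubits_in_low_noise(0.0016, HYPOTHETICAL_THRESHOLD)
```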

The low noise regime wasn’t error free; each operation still has the potential for error, and qubits will sometimes lose their state even when sitting around doing nothing. But this error rate could be estimated using the cross-entropy benchmark to explore the system’s overall fidelity. That wasn’t the case beyond the transition point, where errors occurred quickly enough that they would interrupt the entanglement process.

When this occurs, the result is often two separate, smaller entangled systems, each of which was subject to the Sycamore chip’s base error rates. The researchers simulated this by creating two distinct clusters of entangled qubits that could be entangled with each other by a single operation, allowing them to turn entanglement on and off at will. They showed that this behavior allowed a classical computer to spoof the overall behavior by breaking the computation up into two manageable chunks.

Ultimately, they used their characterization of the phase transition to identify the maximum number of qubits they could keep in the low noise regime given the Sycamore processor’s base error rate and then performed a million random circuits on them. While this is relatively easy to do on quantum hardware, even assuming that we could build a supercomputer without bandwidth constraints, simulating it would take roughly 10,000 years on an existing supercomputer (the Frontier system). Allowing all of the system’s storage to operate as secondary memory cut the estimate down to 12 years.

What does this tell us?

Boixo emphasized that the value of the work isn’t really based on the value of performing random quantum circuits. Truly random bit strings might be useful in some contexts, but he emphasized that the real benefit here is a better understanding of the noise level that can be tolerated in quantum algorithms more generally. Since this benchmark is designed to make it as easy as possible to outperform classical computations, you would need to beat the best standard computers here to have any hope of beating them to the answer for more complicated problems.

“Before you can do any other application, you need to win on this benchmark,” Boixo said. “If you are not winning on this benchmark, then you’re not winning on any other benchmark. This is the easiest thing for a noisy quantum computer compared to a supercomputer.”

Knowing how to identify this phase transition, he suggested, will also be helpful for anyone trying to run useful computations on today’s processors. “As we define the phase, it opens the possibility for finding applications in that phase on noisy quantum computers, where they will outperform classical computers,” Boixo said.

Implicit in this argument is an indication of why Google has focused on iterating on a single processor design even as many of its competitors have been pushing to increase qubit counts rapidly. If this benchmark indicates that you can’t get all of Sycamore’s qubits involved in the simplest low-noise regime calculation, then it’s not clear whether there’s a lot of value in increasing the qubit count. And the only way to change that is to lower the base error rate of the processor, so that’s where the company’s focus has been.

All of that, however, assumes that you hope to run useful calculations on today’s noisy hardware qubits. The alternative is to use error-corrected logical qubits, which will require major increases in qubit count. But Google has been seeing similar limitations due to Sycamore’s base error rate in tests that used it to host an error-corrected logical qubit, something we hope to return to in future coverage.

Nature, 2024. DOI: 10.1038/s41586-024-07998-6


IBM opens its quantum-computing stack to third parties

Image of a large collection of copper-colored metal plates and wires, all surrounding a small, black piece of silicon.

The small quantum processor (center) surrounded by cables that carry microwave signals to it, and the refrigeration hardware.

As we described earlier this year, operating a quantum computer will require a significant investment in classical computing resources, given the number of measurements and control operations that need to be executed and interpreted. That means that operating a quantum computer will also require a software stack to control and interpret the flow of information from the quantum side.

But software also gets involved well before anything gets executed. While it’s possible to execute algorithms on quantum hardware by defining the full set of commands sent to the hardware, most users are going to want to focus on algorithm development, rather than the details of controlling any single piece of quantum hardware. “If everyone’s got to get down and know what the noise is, [use] performance management tools, they’ve got to know how to compile a quantum circuit through hardware, you’ve got to become an expert in too much to be able to do the algorithm discovery,” said IBM’s Jay Gambetta. So, part of the software stack that companies are developing to control their quantum hardware includes software that converts abstract representations of quantum algorithms into the series of commands needed to execute them.

IBM’s version of this software is called Qiskit (although it was made open source and has since been adopted by other companies). Recently, IBM made a couple of announcements regarding Qiskit, both benchmarking it in comparison to other software stacks and opening it up to third-party modules. We’ll take a look at what software stacks do before getting into the details of what’s new.

What’s the software stack do?

It’s tempting to view IBM’s Qiskit as the equivalent of a compiler. And at the most basic level, that’s a reasonable analogy, in that it takes algorithms defined by humans and converts them to things that can be executed by hardware. But there are significant differences in the details. A compiler for a classical computer produces code that the computer’s processor converts to internal instructions that are used to configure the processor hardware and execute operations.

Even when using what’s termed “machine language,” programmers don’t directly control the hardware; programmers have no control over where on the hardware things are executed (ie, which processor or execution unit within that processor), or even the order instructions are executed in.

Things are very different for quantum computers, at least at present. For starters, everything that happens on the processor is controlled by external hardware, which typically acts by generating a series of laser or microwave pulses. So, software like IBM’s Qiskit or Microsoft’s Q# acts by converting the code it’s given into commands that are sent to hardware that’s external to the processor.

These “compilers” must also keep track of exactly which part of the processor things are happening on. Quantum computers act by performing specific operations (called gates) on individual or pairs of qubits; to do that, you have to know exactly which qubit you’re addressing. And, for things like superconducting qubits, where there can be device-to-device variations, which hardware qubits you end up using can have a significant effect on the outcome of the calculations.

As a result, most things like Qiskit provide the option of directly addressing the hardware. If a programmer chooses not to, however, the software can transform generic instructions into a precise series of actions that will execute whatever algorithm has been encoded. That involves the software stack making choices about which physical qubits to use, what gates and measurements to execute, and what order to execute them in.
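To make the "which physical qubits to use" decision concrete, here is a toy sketch of one such choice: picking the pair of connected hardware qubits with the lowest combined error estimate for a two-qubit gate. It is not how Qiskit or any real stack works internally. The function name, the error figures, and the coupling list are all invented, and a production compiler weighs many more factors.

```python
def pick_best_pair(coupling, qubit_error, gate_error):
    """Return the connected (a, b) pair with the lowest estimated total error.

    coupling    : list of (a, b) physical-qubit pairs that can share a two-qubit gate
    qubit_error : dict mapping physical qubit -> per-qubit error estimate
    gate_error  : dict mapping (a, b) -> two-qubit gate error estimate
    """
    def cost(pair):
        a, b = pair
        return qubit_error[a] + qubit_error[b] + gate_error[pair]

    return min(coupling, key=cost)


# Hypothetical 4-qubit device laid out in a line: 0 - 1 - 2 - 3
coupling = [(0, 1), (1, 2), (2, 3)]
qubit_error = {0: 0.020, 1: 0.010, 2: 0.012, 3: 0.030}
gate_error = {(0, 1): 0.015, (1, 2): 0.008, (2, 3): 0.009}

best = pick_best_pair(coupling, qubit_error, gate_error)
```

On this made-up device the middle pair wins because both its qubits and the gate between them have the lowest error estimates, which is the kind of device-to-device variation the software has to account for.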

The role of the software stack, however, is likely to expand considerably over the next few years. A number of companies are experimenting with hardware qubit designs that can flag when one type of common error occurs, and there has been progress with developing logical qubits that enable error correction. Ultimately, any company providing access to quantum computers will want to modify its software stack so that these features are enabled without requiring effort on the part of the people designing the algorithms.


We can now watch Grace Hopper’s famed 1982 lecture on YouTube

Amazing Grace —

The lecture featured Hopper discussing future challenges of protecting information.

Rear Admiral Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part One, 1982).

The late Rear Admiral Grace Hopper was a gifted mathematician and undisputed pioneer in computer programming, honored posthumously in 2016 with the Presidential Medal of Freedom. She was also very much in demand as a speaker in her later career. Hopper’s famous 1982 lecture on “Future Possibilities: Data, Hardware, Software, and People,” has long been publicly unavailable because of the obsolete media on which it was recorded. The National Archives and Records Administration (NARA) finally managed to retrieve the footage for the National Security Agency (NSA), which posted the lecture in two parts on YouTube (Part One embedded above, Part Two embedded below).

Hopper earned undergraduate degrees in math and physics from Vassar College, followed by a master’s degree (1930) and a PhD (1934) in math from Yale. She returned to Vassar as a professor, but when World War II broke out, she sought to enlist in the US Naval Reserve. She was initially denied on the basis of her age (34) and low weight-to-height ratio, and also because her expertise elsewhere made her particularly valuable to the war effort. Hopper got an exemption, and after graduating first in her class, she joined the Bureau of Ships Computation Project at Harvard University, where she served on the Mark I computer programming staff under Howard H. Aiken.

She stayed with the lab until 1949 and was next hired as a senior mathematician by Eckert-Mauchly Computer Corporation to develop the Universal Automatic Computer, or UNIVAC, one of the first commercial computers. Hopper championed the development of a new programming language based on English words. “It’s much easier for most people to write an English statement than it is to use symbols,” she reasoned. “So I decided data processors ought to be able to write their programs in English and the computers would translate them into machine code.”

Her superiors were skeptical, but Hopper persisted, publishing papers on what became known as compilers. When Remington Rand took over the company, she created her first A-0 compiler. This early achievement would one day lead to the development of COBOL for data processors, which is still in wide use today.

“Grandma COBOL”

In November 1952, the UNIVAC was introduced to America by CBS news anchor Walter Cronkite as the presidential election results rolled in. Hopper and the rest of her team had worked tirelessly to input voting statistics from earlier elections and write the code that would allow the calculator to extrapolate the election results based on previous races. National pollsters predicted Adlai Stevenson II would win, while the UNIVAC group predicted a landslide for Dwight D. Eisenhower. UNIVAC’s prediction proved to be correct: Eisenhower won over 55 percent of the popular vote with an electoral margin of 442 to 89.  

Hopper retired at age 60 from the Naval Reserve in 1966 with the rank of commander but was subsequently recalled to active duty for many more years, thanks to congressional special approval allowing her to remain beyond the mandatory retirement age. She was promoted to commodore in 1983, a rank that was renamed “rear admiral” two years later, and Rear Admiral Grace Hopper finally retired permanently in 1986. But she didn’t stop working: She became a senior consultant to Digital Equipment Corporation and “goodwill ambassador,” giving public lectures at various computer-related events.

One of Hopper’s best-known lectures was delivered to NSA employees in August 1982. According to a National Security Agency press release, the footage had been preserved in a defunct media format—specifically, two 1-inch AMPEX tapes. The agency asked NARA to retrieve that footage and digitize it for public release, and NARA did so. The NSA described it as “one of the more unique public proactive transparency record releases… to date.”

Hopper was a very popular speaker not just because of her pioneering contributions to computing, but because she was a natural raconteur, telling entertaining and often irreverent war stories from her early days. And she spoke plainly, as evidenced in the 1982 lecture when she drew an analogy between using pairs of oxen to move large logs in the days before large tractors, and pairing computers to get more computer power rather than just getting a bigger computer—“which of course is what common sense would have told us to begin with.” For those who love the history of computers and computation, the full lecture is very much worth the time.

Grace Hopper on Future Possibilities: Data, Hardware, Software, and People (Part Two, 1982).

Listing image by Lynn Gilbert/CC BY-SA 4.0


People game AIs via game theory

Games inside games —

They reject more of the AI’s offers, probably to get it to be more generous.

A judge's gavel near a pile of small change.

In the experiments, people had to judge what constituted a fair monetary offer.

In many cases, AIs are trained on material that’s either made or curated by humans. As a result, it can become a significant challenge to keep the AI from replicating the biases of those humans and the society they belong to. And the stakes are high, given we’re using AIs to make medical and financial decisions.

But some researchers at Washington University in St. Louis have found an additional wrinkle in these challenges: The people doing the training may change their behavior when they know it can influence the future choices made by an AI. And, in at least some cases, they carry the changed behaviors into situations that don’t involve AI training.

Would you like to play a game?

The work involved getting volunteers to participate in a simple form of game theory. Testers gave two participants a pot of money—$10, in this case. One of the two was then asked to offer some fraction of that money to the other, who could choose to accept or reject the offer. If the offer was rejected, nobody got any money.

From a purely rational economic perspective, people should accept anything they’re offered, since they’ll end up with more money than they would have otherwise. But in reality, people tend to reject offers that deviate too much from a 50/50 split, as they have a sense that a highly imbalanced split is unfair. Their rejection allows them to punish the person who made the unfair offer. While there are some cultural differences in terms of where the split becomes unfair, this effect has been replicated many times, including in the current work.
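The payoff rules are simple enough to state in a few lines. This sketch models the responder with a single hypothetical parameter, the smallest share of the pot they will accept. The researchers' actual analysis is statistical and is not reproduced here.

```python
def play_round(pot, offer, min_acceptable_share):
    """Return (proposer_payout, responder_payout) for one round.

    pot                  : total money on the table (e.g. 10 dollars)
    offer                : amount the proposer offers to the responder
    min_acceptable_share : fraction of the pot below which the responder rejects
    """
    if not 0 <= offer <= pot:
        raise ValueError("offer must be between 0 and the pot")
    if offer >= min_acceptable_share * pot:
        return pot - offer, offer  # accepted: the pot is split as proposed
    return 0, 0                    # rejected: nobody gets any money


# A purely "rational" responder accepts any offer...
rational = play_round(pot=10, offer=1, min_acceptable_share=0.0)
# ...while a responder who wants at least 40 percent rejects a 70/30 split.
fairness_minded = play_round(pot=10, offer=3, min_acceptable_share=0.4)
```

The second call shows the cost of punishment: the responder gives up $3 to make sure the proposer loses $7.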

The twist with the new work, performed by Lauren Treiman, Chien-Ju Ho, and Wouter Kool, is that they told some of the participants that their partner was an AI, and the results of their interactions with it would be fed back into the system to train its future performance.

This takes something that’s implicit in a purely game-theory-focused setup—that rejecting offers can help partners figure out what sorts of offers are fair—and makes it highly explicit. Participants, or at least the subset involved in the experimental group that are being told they’re training an AI, could readily infer that their actions would influence the AI’s future offers.

The question the researchers were curious about was whether this would influence the behavior of the human participants. They compared this to the behavior of a control group who just participated in the standard game theory test.

Training fairness

Treiman, Ho, and Kool had pre-registered a number of multivariate analyses that they planned to perform with the data. But these didn’t always produce consistent results between experiments, possibly because there weren’t enough participants to tease out relatively subtle effects with any statistical confidence and possibly because the relatively large number of tests would mean that a few positive results would turn up by chance.

So, we’ll focus on the simplest question that was addressed: Did being told that you were training an AI alter someone’s behavior? This question was asked through a number of experiments that were very similar. (One of the key differences between them was whether the information regarding AI training was displayed with a camera icon, since people will sometimes change their behavior if they’re aware they’re being observed.)

The answer to the question is a clear yes: people will in fact change their behavior when they think they’re training an AI. Through a number of experiments, participants were more likely to reject unfair offers if they were told that their sessions would be used to train an AI. In a few of the experiments, they were also more likely to reject what were considered fair offers (in US populations, the rejection rate goes up dramatically once someone proposes a 70/30 split, meaning $7 goes to the person making the proposal in these experiments). The researchers suspect this is due to people being more likely to reject borderline “fair” offers such as a 60/40 split.

This happened even though rejecting any offer exacts an economic cost on the participants. And people persisted in this behavior even when they were told that they wouldn’t ever interact with the AI after training was complete, meaning they wouldn’t personally benefit from any changes in the AI’s behavior. So here, it appeared that people would make a financial sacrifice to train the AI in a way that would benefit others.

Strikingly, in two of the three experiments that did follow up testing, participants continued to reject offers at a higher rate two days after their participation in the AI training, even when they were told that their actions were no longer being used to train the AI. So, to some extent, participating in AI training seems to have caused them to train themselves to behave differently.

Obviously, this won’t affect every sort of AI training, and a lot of the work that goes into producing material that’s used in training something like a large language model won’t have been done with any awareness that it might be used to train an AI. Still, there are plenty of cases where humans do get more directly involved in training, so it’s worthwhile being aware that this is another route that can allow biases to creep in.

PNAS, 2024. DOI: 10.1073/pnas.2408731121


Lightening the load: AI helps exoskeleton work with different strides

One model to rule them all —

A model trained in a virtual environment does remarkably well in the real world.

Image of two people using powered exoskeletons to move heavy items around, as seen in the movie Aliens.

Right now, the software doesn’t do arms, so don’t go taking on any aliens with it.

20th Century Fox

Exoskeletons today look like something straight out of sci-fi. But the reality is they are nowhere near as robust as their fictional counterparts. They’re quite wobbly, and it takes long hours of handcrafting software policies, which regulate how they work—a process that has to be repeated for each individual user.

To bring the technology a bit closer to Avatar’s Skel Suits or Warhammer 40k power armor, a team at North Carolina State University’s Lab of Biomechatronics and Intelligent Robotics used AI to build the first one-size-fits-all exoskeleton that supports walking, running, and stair-climbing. Critically, its software adapts itself to new users with no need for any user-specific adjustments. “You just wear it and it works,” says Hao Su, an associate professor and co-author of the study.

Tailor-made robots

An exoskeleton is a robot you wear to aid your movements—it makes walking, running, and other activities less taxing, the same way an e-bike adds extra watts on top of those you generate yourself, making pedaling easier. “The problem is, exoskeletons have a hard time understanding human intentions, whether you want to run or walk or climb stairs. It’s solved with locomotion recognition: systems that recognize human locomotion intentions,” says Su.

Building those locomotion recognition systems currently relies on elaborate policies that define what actuators in an exoskeleton need to do in each possible scenario. “Let’s take walking. The current state of the art is we put the exoskeleton on you and you walk on a treadmill for an hour. Based on that, we try to adjust its operation to your individual set of movements,” Su explains.

Building handcrafted control policies and doing long human trials for each user makes exoskeletons super expensive, with prices reaching $200,000 or more. So, Su’s team used AI to automatically generate control policies and eliminate human training. “I think within two or three years, exoskeletons priced between $2,000 and $5,000 will be absolutely doable,” Su claims.

His team hopes these savings will come from developing the exoskeleton control policy using a digital model, rather than living, breathing humans.

Digitizing robo-aided humans

Su’s team started by building digital models of a human musculoskeletal system and an exoskeleton robot. Then they used multiple neural networks that operated each component. One was running the digitized model of a human skeleton, moved by simplified muscles. The second neural network was running the exoskeleton model. Finally, the third neural net was responsible for imitating motion—basically predicting how a human model would move wearing the exoskeleton and how the two would interact with each other. “We trained all three neural networks simultaneously to minimize muscle activity,” says Su.

One problem the team faced is that exoskeleton studies typically use a performance metric based on metabolic rate reduction. “Humans, though, are incredibly complex, and it is very hard to build a model with enough fidelity to accurately simulate metabolism,” Su explains. Luckily, according to the team, reducing muscle activations is rather tightly correlated with metabolic rate reduction, so it kept the digital model’s complexity within reasonable limits. The training of the entire human-exoskeleton system with all three neural networks took roughly eight hours on a single RTX 3090 GPU. And the results were record-breaking.

Bridging the sim-to-real gap

After the neural networks developed the controllers for the digital exoskeleton model in simulation, Su’s team simply copy-pasted the control policy to a real controller running a real exoskeleton. Then, they tested how an exoskeleton trained this way would work with 20 different participants. The averaged metabolic rate reduction was over 24 percent in walking, over 13 percent in running, and 15.4 percent in stair climbing—all record numbers, meaning their exoskeleton beat every other exoskeleton ever made in each category.

This was achieved without needing any tweaks to fit it to individual gaits. But the neural networks’ magic didn’t end there.

“The problem with traditional, handcrafted policies was that it was just telling it ‘if walking is detected do one thing; if walking faster is detected do another thing.’ These were [a mix of] finite state machines and switch controllers. We introduced end-to-end continuous control,” says Su. What this continuous control meant was that the exoskeleton could follow the human body as it made smooth transitions between different activities—from walking to running, from running to climbing stairs, etc. There was no abrupt mode switching.
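The difference between mode switching and continuous control can be shown with a deliberately simplified toy. Nothing here reflects the team's actual controller, which is a trained neural network: the speed thresholds, torque values, and the linear mapping are invented purely to show why choosing among discrete modes produces jumps while a continuous mapping does not.

```python
def switched_assist(speed_m_per_s):
    """Finite-state style: pick a fixed assist level per detected mode."""
    if speed_m_per_s < 1.0:
        return 5.0    # "slow walk" mode (made-up torque value)
    if speed_m_per_s < 2.0:
        return 10.0   # "fast walk" mode
    return 18.0       # "run" mode


def continuous_assist(speed_m_per_s):
    """Continuous style: assist varies smoothly with speed (made-up linear map)."""
    clamped = max(0.0, min(speed_m_per_s, 4.0))
    return 2.0 + 5.0 * clamped


# Crossing the fast-walk/run boundary: the switched controller jumps,
# while the continuous one changes only slightly.
before, after = 1.99, 2.01
switched_jump = switched_assist(after) - switched_assist(before)
continuous_jump = continuous_assist(after) - continuous_assist(before)
```

A tiny change in speed flips the switched controller from one fixed output to another, while the continuous mapping moves by an amount proportional to the change in speed.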

“In terms of software, I think everyone will be using this neural network-based approach soon,” Su claims. To improve the exoskeletons in the future, his team wants to make them quieter, lighter, and more comfortable.

But the plan is also to make them work for people who need them the most. “The limitation now is that we tested these exoskeletons with able-bodied participants, not people with gait impairments. So, what we want to do is something they did in another exoskeleton study at Stanford University. We would take a one-minute video of you walking, and based on that, we would build a model to individualize our general model. This should work well for people with impairments like knee arthritis,” Su claims.

Nature, 2024.  DOI: 10.1038/s41586-024-07382-4


Researchers describe how to tell if ChatGPT is confabulating


Aurich Lawson | Getty Images

It’s one of the world’s worst-kept secrets that large language models give blatantly false answers to queries and do so with a confidence that’s indistinguishable from when they get things right. There are a number of reasons for this. The AI could have been trained on misinformation; the answer could require some extrapolation from facts that the LLM isn’t capable of; or some aspect of the LLM’s training might have incentivized a falsehood.

But perhaps the simplest explanation is that an LLM doesn’t recognize what constitutes a correct answer but is compelled to provide one. So it simply makes something up, a habit that has been termed confabulation.

Figuring out when an LLM is making something up would obviously have tremendous value, given how quickly people have started relying on them for everything from college essays to job applications. Now, researchers from the University of Oxford say they’ve found a relatively simple way to determine when LLMs appear to be confabulating that works with all popular models and across a broad range of subjects. And, in doing so, they develop evidence that most of the alternative facts LLMs provide are a product of confabulation.

Catching confabulation

The new research is strictly about confabulations, and not instances such as training on false inputs. As the Oxford team defines them in their paper describing the work, confabulations are where “LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed.”

The reasoning behind their work is actually quite simple. LLMs aren’t trained for accuracy; they’re simply trained on massive quantities of text and learn to produce human-sounding phrasing through that. If enough text examples in its training consistently present something as a fact, then the LLM is likely to present it as a fact. But if the examples in its training are few, or inconsistent in their facts, then the LLM synthesizes a plausible-sounding answer that is likely incorrect.

But the LLM could also run into a similar situation when it has multiple options for phrasing the right answer. To use an example from the researchers’ paper, “Paris,” “It’s in Paris,” and “France’s capital, Paris” are all valid answers to “Where’s the Eiffel Tower?” So, statistical uncertainty, termed entropy in this context, can arise either when the LLM isn’t certain about how to phrase the right answer or when it can’t identify the right answer.

This means it’s not a great idea to simply force the LLM to return “I don’t know” when confronted with several roughly equivalent answers. We’d probably block a lot of correct answers by doing so.

So instead, the researchers focus on what they call semantic entropy. This approach looks at all the statistically likely answers the LLM generates and determines how many of them are semantically equivalent. If a large number all have the same meaning, then the LLM is likely uncertain about phrasing but has the right answer. If not, then it is presumably in a situation where it would be prone to confabulation and should be prevented from doing so.
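A minimal Python sketch can show why grouping answers by meaning matters. It is an illustration only, not the researchers' implementation: `meaning_key` is a hypothetical keyword lookup standing in for a real semantic-equivalence test, and entropy is estimated from simple counts of sampled answers. The sketch compares entropy over raw strings with entropy over meaning clusters.

```python
import math
from collections import Counter


def meaning_key(answer: str) -> str:
    """Hypothetical stand-in for a semantic-equivalence check.

    Two answers count as equivalent if they mention the same city from a
    small hard-coded list. A real system would need a far stronger test.
    """
    text = answer.lower()
    for city in ("paris", "rome", "berlin", "madrid"):
        if city in text:
            return city
    return text.strip()


def entropy(counts) -> float:
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    counts = list(counts)
    total = sum(counts)
    return sum((c / total) * math.log(total / c) for c in counts)


def naive_entropy(answers) -> float:
    """Entropy over exact strings: different phrasings look like disagreement."""
    return entropy(Counter(answers).values())


def semantic_entropy(answers) -> float:
    """Entropy over meaning clusters: rephrasings of one answer collapse together."""
    return entropy(Counter(meaning_key(a) for a in answers).values())


if __name__ == "__main__":
    same_meaning = ["Paris", "It's in Paris", "France's capital, Paris"]
    different_meaning = ["Paris", "Rome", "Berlin"]
    for name, sample in (("same meaning", same_meaning),
                         ("different meaning", different_meaning)):
        print(name, round(naive_entropy(sample), 3), round(semantic_entropy(sample), 3))
```

For the three Eiffel Tower phrasings, the string-level entropy is as high as it is for three different cities, but the meaning-level entropy drops to zero. Only the sample that names three different cities keeps a high score, which is the case that would be flagged.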


Exploration-focused training lets robotics AI immediately handle new tasks

Exploratory —

Maximum Diffusion Reinforcement Learning focuses training on end states, not process.

A woman performs maintenance on a robotic arm.

boonchai wedmakawand

Reinforcement-learning algorithms in systems like ChatGPT or Google’s Gemini can work wonders, but they usually need hundreds of thousands of shots at a task before they get good at it. That’s why it’s always been hard to transfer this performance to robots. You can’t let a self-driving car crash 3,000 times just so it can learn crashing is bad.

But now a team of researchers at Northwestern University may have found a way around it. “That is what we think is going to be transformative in the development of the embodied AI in the real world,” says Thomas Berrueta, who led the development of Maximum Diffusion Reinforcement Learning (MaxDiff RL), an algorithm tailored specifically for robots.

Introducing chaos

The problem with deploying most reinforcement-learning algorithms in robots starts with the built-in assumption that the data they learn from is independent and identically distributed. Independence, in this context, means the value of one variable does not depend on the value of another variable in the dataset. When you flip a coin two times, getting tails on the second attempt does not depend on the result of your first flip. Identical distribution means that every data point is drawn from the same probability distribution, so the probability of seeing any specific outcome does not change from one sample to the next. In the coin-flipping example, the probability of getting heads is 50 percent on every flip.

In virtual, disembodied systems, like YouTube recommendation algorithms, getting such data is easy because most of the time it meets these requirements right off the bat. “You have a bunch of users of a website, and you get data from one of them, and then you get data from another one. Most likely, those two users are not in the same household, they are not highly related to each other. They could be, but it is very unlikely,” says Todd Murphey, a professor of mechanical engineering at Northwestern.

The problem is that, if those two users were related to each other and were in the same household, it could be that the only reason one of them watched a video was that their housemate watched it and told them to watch it. This would violate the independence requirement and compromise the learning.

“In a robot, getting this independent, identically distributed data is not possible in general. You exist at a specific point in space and time when you are embodied, so your experiences have to be correlated in some way,” says Berrueta. To solve this, his team designed an algorithm that pushes robots to be as randomly adventurous as possible to get the widest set of experiences to learn from.
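The correlation Berrueta describes is easy to see in a toy example. The Python sketch below is an assumption-laden illustration, not anything from the study: it draws independent coin-flip steps, then accumulates them into the positions of a simulated walker on a line, and measures how strongly each value predicts the next one (the lag-1 autocorrelation). The seed and sample size are arbitrary.

```python
import random


def lag1_autocorrelation(xs):
    """Sample correlation between each value and the value that follows it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


rng = random.Random(0)  # arbitrary seed so the run is repeatable

# Independent, identically distributed data: each step is a fresh coin flip.
steps = [rng.choice((-1, 1)) for _ in range(5000)]

# Embodied data: the walker's position at each moment depends on where it just was.
positions = []
position = 0
for step in steps:
    position += step
    positions.append(position)

if __name__ == "__main__":
    print("coin-flip steps:", round(lag1_autocorrelation(steps), 3))
    print("walker positions:", round(lag1_autocorrelation(positions), 3))
```

The coin flips show an autocorrelation close to zero, while the positions, which are built from exactly the same flips, come out close to one. That second kind of data is what a robot collects simply by being somewhere.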

Two flavors of entropy

The idea itself is not new. Nearly two decades ago, people in AI figured out algorithms, like Maximum Entropy Reinforcement Learning (MaxEnt RL), that worked by randomizing actions during training. “The hope was that when you take as diverse a set of actions as possible, you will explore more varied sets of possible futures. The problem is that those actions do not exist in a vacuum,” Berrueta claims. Every action a robot takes has some kind of impact on its environment and on its own condition, and disregarding those impacts completely often leads to trouble. To put it simply, an autonomous car that was teaching itself how to drive using this approach could elegantly park in your driveway but would be just as likely to hit a wall at full speed.

To solve this, Berrueta’s team moved away from maximizing the diversity of actions and went for maximizing the diversity of state changes. Robots powered by MaxDiff RL did not flail their robotic joints at random to see what that would do. Instead, they conceptualized goals like “can I reach this spot ahead of me” and then tried to figure out which actions would take them there safely.

Berrueta and his colleagues achieved that through something called ergodicity, a mathematical concept that says that a point in a moving system will eventually visit all parts of the space that the system moves in. Basically, MaxDiff RL encouraged the robots to achieve every available state in their environment. And the results of the first tests in simulated environments were quite surprising.
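The gap between diverse actions and diverse states can be shown with a deliberately simple Python sketch. It is not MaxDiff RL or MaxEnt RL: both agents below are invented for illustration, the world is just the integer number line, and the second agent uses a basic visit-count rule as a stand-in for "seek out states you have not seen." The step budget and seeds are arbitrary.

```python
import random
from collections import Counter

STEPS = 200  # hypothetical step budget for both agents


def random_actions(rng: random.Random) -> int:
    """Agent that maximizes action randomness: step left or right by coin flip."""
    pos = 0
    visited = {pos}
    for _ in range(STEPS):
        pos += rng.choice((-1, 1))
        visited.add(pos)
    return len(visited)


def seek_new_states(rng: random.Random) -> int:
    """Agent that prefers whichever neighbouring state it has visited least."""
    pos = 0
    visits = Counter({pos: 1})
    for _ in range(STEPS):
        options = [pos - 1, pos + 1]
        rng.shuffle(options)  # break ties at random
        pos = min(options, key=lambda p: visits[p])
        visits[pos] += 1
    return len(visits)


if __name__ == "__main__":
    print("distinct states, random actions:", random_actions(random.Random(0)))
    print("distinct states, state-seeking:", seek_new_states(random.Random(0)))
```

The action-randomizing agent keeps doubling back over ground it has already covered, while the state-seeking agent reaches a new position on every step. The real algorithm has to manage this in environments where actions have consequences, which this toy ignores.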

Racing pool noodles

“In reinforcement learning there are standard benchmarks that people run their algorithms on so we can have a good way of comparing different algorithms on a standard framework,” says Allison Pinosky, a researcher at Northwestern and co-author of the MaxDiff RL study. One of those benchmarks is a simulated swimmer: a three-link body resting on the ground in a viscous environment that needs to learn to swim as fast as possible in a certain direction.

In the swimmer test, MaxDiff RL outperformed two other state-of-the-art reinforcement learning algorithms (NN-MPPI and SAC). These two needed several resets to figure out how to move the swimmers. To complete the task, they were following a standard AI learning process divided into a training phase, where an algorithm goes through multiple failed attempts to slowly improve its performance, and a testing phase, where it tries to perform the learned task. MaxDiff RL, by contrast, nailed it, immediately adapting its learned behaviors to the new task.

The earlier algorithms ended up failing to learn because they got stuck trying the same options and never progressing to where they could learn that alternatives work. “They experienced the same data repeatedly because they were locally doing certain actions, and they assumed that was all they could do and stopped learning,” Pinosky explains. MaxDiff RL, on the other hand, continued changing states, exploring, getting richer data to learn from, and finally succeeded. And because, by design, it seeks to achieve every possible state, it can potentially complete all possible tasks within an environment.

But does this mean we can take MaxDiff RL, upload it to a self-driving car, and let it out on the road to figure everything out on its own? Not really.


High-speed imaging and AI help us understand how insect wings work

Black and white images of a fly with its wings in a variety of positions, showing the details of a wing beat.

A time-lapse showing how an insect’s wing adopts very specific positions during flight.

Florian Muijres, Dickinson Lab

About 350 million years ago, our planet witnessed the evolution of the first flying creatures. They are still around, and some of them continue to annoy us with their buzzing. While scientists have classified these creatures as pterygotes, the rest of the world simply calls them winged insects.

There are many aspects of insect biology, especially their flight, that remain a mystery for scientists. One is simply how they move their wings. The insect wing hinge is a specialized joint that connects an insect’s wings with its body. It’s composed of five interconnected plate-like structures called sclerites. When the underlying muscles shift these plates, the insect’s wings flap.

Until now, it has been tricky for scientists to understand the biomechanics that govern the motion of the sclerites even using advanced imaging technologies. “The sclerites within the wing hinge are so small and move so rapidly that their mechanical operation during flight has not been accurately captured despite efforts using stroboscopic photography, high-speed videography, and X-ray tomography,” Michael Dickinson, Zarem professor of biology and bioengineering at the California Institute of Technology (Caltech), told Ars Technica.

As a result, scientists have been unable to visualize exactly what’s going on at the micro-scale within the wing hinge as insects fly, preventing them from studying insect flight in detail. However, a new study by Dickinson and his team finally revealed the workings of the sclerites and the insect wing hinge. They captured the wing motion of fruit flies (Drosophila melanogaster) and analyzed 72,000 recorded wing beats using a neural network to decode the role individual sclerites played in shaping insect wing motion.

Understanding the insect wing hinge

The biomechanics that govern insect flight are quite different from those of birds and bats. This is because wings in insects didn’t evolve from limbs. “In the case of birds, bats, and pterosaurs we know exactly where the wings came from evolutionarily because all these animals fly with their forelimbs. They’re basically using their arms to fly. In insects, it’s a completely different story. They evolved from six-legged organisms and they kept all six legs. However, they added flapping appendages to the dorsal side of their body, and it is a mystery as to where those wings came from,” Dickinson explained.

Some researchers suggest that insect wings came from gill-like appendages present in ancient aquatic arthropods. Others argue that wings originated from “lobes,” special outgrowths found on the legs of ancient crustaceans, which were ancestors of insects. This debate is still ongoing, so the wing’s evolutionary history can’t tell us much about how the hinge and the sclerites operate.

Understanding the hinge mechanics is crucial because this is what makes insects efficient flying creatures. It enables them to fly at impressive speeds relative to their body sizes (some insects can fly at 33 mph) and to demonstrate great maneuverability and stability while in flight.

“The insect wing hinge is arguably among the most sophisticated and evolutionarily important skeletal structures in the natural world,” according to the study authors.

However, imaging the activity of four of the five sclerites that form the hinge has been impossible due to their size and the speeds at which they move. Dickinson and his team employed a multidisciplinary approach to overcome this challenge. They designed an apparatus equipped with three high-speed cameras that recorded the activity of tethered fruit flies at 15,000 frames per second using infrared light.

They also used a calcium-sensitive protein to track changes in the activity of the steering muscles of the insects as they flew (calcium helps trigger muscle contractions). “We recorded a total of 485 flight sequences from 82 flies. After excluding a subset of wingbeats from sequences when the fly either stopped flying or flew at an abnormally low wingbeat frequency, we obtained a final dataset of 72,219 wingbeats,” the researchers note.

Next, they trained a convolutional neural network (CNN) using 85 percent of the dataset. “We used the CNN model to investigate the transformation between muscle activity and wing motion by performing a set of virtual manipulations, exploiting the network to execute experiments that would be difficult to perform on actual flies,” they explained.
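For a sense of scale, the short Python sketch below works out what an 85 percent split of the 72,219 wingbeats looks like. The two numbers come from the article; everything else is an assumption, since the article does not say how the split was drawn or how the remaining 15 percent was used. A shuffled random split with an arbitrary seed is used here purely as an example.

```python
import random

TOTAL_WINGBEATS = 72_219  # final dataset size quoted from the study
TRAIN_FRACTION = 0.85     # share of the dataset used for training, per the article

# Assumption: a simple shuffled split. The seed is arbitrary.
indices = list(range(TOTAL_WINGBEATS))
random.Random(0).shuffle(indices)

n_train = int(TOTAL_WINGBEATS * TRAIN_FRACTION)
train_idx = indices[:n_train]      # wingbeats the network would learn from
held_out_idx = indices[n_train:]   # remaining wingbeats, assumed held out

if __name__ == "__main__":
    print("training wingbeats:", len(train_idx))
    print("remaining wingbeats:", len(held_out_idx))
```

Under these assumptions, the network would see 61,386 wingbeats during training, leaving 10,833 that it never saw.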

In addition to the CNN, they developed an encoder-decoder neural network (an architecture used in machine learning) and fed it data related to steering muscle activity. While the CNN model could predict wing motion, the encoder-decoder could predict the action of individual sclerite muscles during the movement of the wings. Now, it was time to check whether the data they predicted was accurate.
