A.I.’s Latest Challenge: the Math Olympics

For four years, the computer scientist Trieu Trinh has been consumed with something of a meta-math problem: how to build an A.I. model that solves geometry problems from the International Mathematical Olympiad, the annual competition for the world’s most mathematically attuned high-school students.

Last week Dr. Trinh successfully defended his doctoral dissertation on this topic at New York University; this week, he described the result of his labors in the journal Nature. Named AlphaGeometry, the system solves Olympiad geometry problems at nearly the level of a human gold medalist.

While developing the project, Dr. Trinh pitched it to two research scientists at Google, and they brought him on as a resident from 2021 to 2023. AlphaGeometry joins Google DeepMind’s fleet of A.I. systems, which have become known for tackling grand challenges. Perhaps most famously, AlphaZero, a deep-learning algorithm, conquered chess in 2017. Math is a harder problem, as the number of possible paths toward a solution is sometimes infinite; chess is always finite.

“I kept running into dead ends, going down the wrong path,” said Dr. Trinh, the lead author and driving force of the project.

The paper’s co-authors are Dr. Trinh’s doctoral adviser, He He, at New York University; Yuhuai Wu, known as Tony, a co-founder of xAI (formerly at Google) who in 2019 had independently started exploring a similar idea; Thang Luong, the principal investigator, and Quoc Le, both from Google DeepMind.

Dr. Trinh’s perseverance paid off. “We’re not making incremental improvement,” he said. “We’re making a big jump, a big breakthrough in terms of the result.”

“Just don’t overhype it,” he said.

Dr. Trinh presented the AlphaGeometry system with a test set of 30 Olympiad geometry problems drawn from 2000 to 2022. The system solved 25; historically, over that same period, the average human gold medalist solved 25.9. Dr. Trinh also gave the problems to a system developed in the 1970s that was known to be the strongest geometry theorem prover; it solved 10.

Over the last few years, Google DeepMind has pursued a number of projects investigating the application of A.I. to mathematics. And more broadly in this research realm, Olympiad math problems have been adopted as a benchmark; OpenAI and Meta AI have achieved some results. For extra motivation, there’s the I.M.O. Grand Challenge, and a new challenge announced in November, the Artificial Intelligence Mathematical Olympiad Prize, with a $5 million pot going to the first A.I. that wins Olympiad gold.

The AlphaGeometry paper opens with the contention that proving Olympiad theorems “represents a notable milestone in human-level automated reasoning.” Michael Barany, a historian of mathematics and science at the University of Edinburgh, said he wondered whether that was a meaningful mathematical milestone. “What the I.M.O. is testing is very different from what creative mathematics looks like for the vast majority of mathematicians,” he said.

Terence Tao, a mathematician at the University of California, Los Angeles — and the youngest-ever Olympiad gold medalist, when he was 12 — said he thought that AlphaGeometry was “nice work” and had achieved “surprisingly strong results.” Fine-tuning an A.I.-system to solve Olympiad problems might not improve its deep-research skills, he said, but in this case the journey may prove more valuable than the destination.

As Dr. Trinh sees it, mathematical reasoning is just one type of reasoning, but it holds the advantage of being easily verified. “Math is the language of truth,” he said. “If you want to build an A.I., it’s important to build a truth-seeking, reliable A.I. that you can trust,” especially for “safety critical applications.”

AlphaGeometry is a “neuro-symbolic” system. It pairs a neural net language model (good at artificial intuition, like ChatGPT but smaller) with a symbolic engine (good at artificial reasoning, like a logical calculator, of sorts).

And it is custom-made for geometry. “Euclidean geometry is a nice test bed for automatic reasoning, since it constitutes a self-contained domain with fixed rules,” said Heather Macbeth, a geometer at Fordham University and an expert in computer-verified reasoning. (As a teenager, Dr. Macbeth won two I.M.O. medals.) AlphaGeometry “seems to constitute good progress,” she said.

The system has two especially novel features. First, the neural net is trained only on algorithmically generated data — a whopping 100 million geometric proofs — using no human examples. The use of synthetic data made from scratch overcame an obstacle in automated theorem-proving: the dearth of human-proof training data translated into a machine-readable language. “To be honest, initially I had some doubts about how this would succeed,” Dr. He said.

Second, once AlphaGeometry was set loose on a problem, the symbolic engine started solving; if it got stuck, the neural net suggested ways to augment the proof argument. The loop continued until a solution materialized, or until time ran out (four and a half hours). In math lingo, this augmentation process is called “auxiliary construction.” Add a line, bisect an angle, draw a circle — this is how mathematicians, student or elite, tinker and try to gain purchase on a problem. In this system, the neural net learned to do auxiliary construction, and in a humanlike way. Dr. Trinh likened it to wrapping a rubber band around a stubborn jar lid in helping the hand get a better grip.

“It’s a very interesting proof of concept,” said Christian Szegedy, a co-founder at xAI who was formerly at Google. But it “leaves a lot of questions open,” he said, and is not “easily generalizable to other domains and other areas of math.”

Dr. Trinh said he would attempt to generalize the system across mathematical fields and beyond. He said he wanted to step back and consider “the common underlying principle” of all types of reasoning.

Stanislas Dehaene, a cognitive neuroscientist at the Collège de France who has a research interest in foundational geometric knowledge, said he was impressed with AlphaGeometry’s performance. But he observed that “it does not ‘see’ anything about the problems that it solves” — rather, it only takes in logical and numerical encodings of pictures. (Drawings in the paper are for the benefit of the human reader.) “There is absolutely no spatial perception of the circles, lines and triangles that the system learns to manipulate,” Dr. Dehaene said. The researchers agreed that a visual component might be valuable; Dr. Luong said it could be added, perhaps within the year, using Google’s Gemini, a “multimodal” system that ingests both text and images.

In early December, Dr. Luong visited his old high school in Ho Chi Minh City, Vietnam, and showed AlphaGeometry to his former teacher and I.M.O. coach, Le Ba Khanh Trinh. Dr. Lê was the top gold medalist at the 1979 Olympiad and won a special prize for his elegant geometry solution. Dr. Lê parsed one of AlphaGeometry’s proofs and found it remarkable yet unsatisfying, Dr. Luong recalled: “He found it mechanical, and said it lacks the soul, the beauty of a solution that he seeks.”

Dr. Trinh had previously asked Evan Chen, a mathematics doctoral student at M.I.T. — and an I.M.O. coach and Olympiad gold medalist — to check some of AlphaGeometry’s work. It was correct, Mr. Chen said, and he added that he was intrigued by how the system had found the solutions.

“I would like to know how the machine is coming up with this,” he said. “But, I mean, for that matter, I would like to know how humans come up with solutions, too.”