Strategic Communications and Marketing News Bureau

Software teaches computers to translate words to math

CHAMPAIGN, Ill. – If Johnny has five apples and seven oranges, and he wants to share them with three of his friends, can a computer understand the text to figure out how many pieces of fruit each person gets?

Thanks to new software developed at the University of Illinois, machines can now learn to understand mathematical reasoning expressed in language, which could greatly improve search engines and access to data, as well as boost mathematics education.

U. of I. computer science professor Dan Roth and graduate student Subhro Roy published their work in the journal Transactions of the Association for Computational Linguistics.

“There is a lot of data available in news archives and public records, but it cannot be accessed in a meaningful way,” Roth said. “For example, if people want to know what percentage of a state’s budget has been spent on education over the past 20 years, a query like that won’t give the desired result with a keyword search performed today in a search engine like Google. But if the engine were able to do quantitative reasoning, it would infer from the text the type of information the user is looking for. It can find the numbers, then calculate the percentages and addition required to do this.”

The first and biggest hurdle was teaching the computer to identify quantities and units in text regardless of how they are expressed, something humans do unconsciously when reading. Second, the software has to decide what to do with the numbers it has identified.
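
As a rough illustration of that first step, the Python sketch below pulls numbers, written either as digits or as number words, out of a sentence and attaches the word that follows as a naive unit. It is not the researchers' method, which learns from data; the `NUMBER_WORDS` table, the `Quantity` class and the `extract_quantities` function are hypothetical names chosen for this example, and the pattern covers only simple phrasings.

```python
import re
from dataclasses import dataclass

# Hypothetical number-word table; real text needs far broader coverage.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12,
}

@dataclass
class Quantity:
    value: float
    unit: str  # naive guess: the word right after the number

# A number (digits or a known number word), optionally followed by
# "of his/her/their/the" and then a word treated as the unit.
_QUANTITY = re.compile(
    r"\b(\d+(?:\.\d+)?|" + "|".join(NUMBER_WORDS) + r")\b"
    r"(?:\s+(?:of\s+(?:his|her|their|the)\s+)?([a-z]+))?",
    re.IGNORECASE,
)

def extract_quantities(text: str) -> list[Quantity]:
    """Return each number found in `text` with its (naive) unit."""
    quantities = []
    for m in _QUANTITY.finditer(text):
        token = m.group(1).lower()
        value = NUMBER_WORDS[token] if token in NUMBER_WORDS else float(token)
        quantities.append(Quantity(float(value), (m.group(2) or "").lower()))
    return quantities

print(extract_quantities("Johnny has five apples and seven oranges, and he "
                         "wants to share them with three of his friends."))
# [Quantity(value=5.0, unit='apples'), Quantity(value=7.0, unit='oranges'),
#  Quantity(value=3.0, unit='friends')]
```

Hand-written patterns like this break easily on phrasings such as "a dozen eggs," which hints at why this step is the hard part.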

In the problem with Johnny, for example, the computer has to understand that both apples and oranges are fruit; it has to know that the words five, seven and three are equivalent to the numerical values 5, 7 and 3; and it has to determine what kind of operation(s) the question requires (in this case, addition and division) and in which order to conduct those operations. Once the program has converted the text into an equation, it can easily compute that Johnny and his friends each get three pieces of fruit: 12 pieces divided among four people.
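
Once the quantities and operations are known, the equation can be held as a small expression tree and evaluated. This minimal Python sketch builds the tree for the Johnny problem by hand, assuming Johnny counts as one of the people sharing, so the fruit is divided by 3 + 1. The `Op` class and the `evaluate` and `render` helpers are hypothetical, and the sketch says nothing about how the software chooses the operations.

```python
import operator
from dataclasses import dataclass
from fractions import Fraction
from typing import Union

# Map each operator symbol to the arithmetic it performs.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

@dataclass
class Op:
    symbol: str      # one of the keys of OPS
    left: "Expr"
    right: "Expr"

Expr = Union[int, Op]

def evaluate(e: Expr) -> Fraction:
    """Evaluate the tree exactly, using fractions to avoid rounding."""
    if isinstance(e, int):
        return Fraction(e)
    return OPS[e.symbol](evaluate(e.left), evaluate(e.right))

def render(e: Expr) -> str:
    """Write the tree back out as a fully parenthesized equation."""
    if isinstance(e, int):
        return str(e)
    return f"({render(e.left)} {e.symbol} {render(e.right)})"

# Johnny's problem: (5 apples + 7 oranges) shared by (3 friends + Johnny).
johnny = Op("/", Op("+", 5, 7), Op("+", 3, 1))
print(render(johnny), "=", evaluate(johnny))  # ((5 + 7) / (3 + 1)) = 3
```

Using exact fractions keeps answers like 12/5 from turning into rounded decimals.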

The computer also has to recognize when a subtle change in wording calls for a different equation. If, for example, the text had said that Johnny wanted to split the fruit among his three friends, instead of sharing the fruit with them, each friend would receive four pieces of fruit and Johnny would keep none for himself.
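
The share-versus-split distinction can be mimicked, very crudely, with cue phrases. In this hypothetical Python sketch, `recipients` counts Johnny among the people receiving fruit only when the sentence uses "share ... with." The two regular expressions are hand-written assumptions mirroring the two phrasings in the example, whereas the software described here learns such distinctions rather than relying on fixed rules.

```python
import re
from fractions import Fraction

# Hypothetical hand-written cues mirroring the two phrasings in the text:
#   "share X with N friends"  -> the sharer also gets a portion (N + 1 people)
#   "split X among N friends" -> only the friends get portions (N people)
_SPLIT_AMONG = re.compile(r"\bsplit\b.*\bamong\b", re.IGNORECASE)
_SHARE_WITH = re.compile(r"\bshare\b.*\bwith\b", re.IGNORECASE)

def recipients(sentence: str, friends: int) -> int:
    """Number of people who receive fruit, judged from the verb phrase."""
    if _SPLIT_AMONG.search(sentence):
        return friends
    if _SHARE_WITH.search(sentence):
        return friends + 1
    raise ValueError("no recognized sharing phrase")

def fruit_each(total: int, sentence: str, friends: int) -> Fraction:
    """Divide the total fruit by however many people the wording implies."""
    return Fraction(total, recipients(sentence, friends))

print(fruit_each(12, "He wants to share them with three of his friends.", 3))    # 3
print(fruit_each(12, "He wants to split the fruit among his three friends.", 3))  # 4
```

Fixed cues like these fail on paraphrases ("hand out," "divvy up"), which is exactly the kind of variation a learned model has to absorb.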

Such sophisticated reasoning is required for search queries as well. When reading a text about financial earnings, for instance, the computer has to identify whether an amount is exact or approximate, static or changing, a single value or a range, or expressed in relation to something else, along with all the other contextual cues a reader would understand intuitively.
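
A structured record can make some of these distinctions explicit. The Python sketch below is a minimal illustration under assumed conventions that captures two of them: exact versus approximate (via a small, hypothetical list of hedge words) and single value versus range (via "between X and Y"). The `Amount` class and `parse_amount` function are invented for this example; cues such as change over time or amounts stated relative to something else are not handled.

```python
import re
from dataclasses import dataclass

# Hypothetical hedge words that mark an amount as approximate.
HEDGES = {"about", "around", "roughly", "nearly", "approximately"}

_NUM = r"(\d+(?:\.\d+)?)"
_RANGE = re.compile(rf"between\s+{_NUM}\s+and\s+{_NUM}\s+(\w+)", re.IGNORECASE)
_SINGLE = re.compile(rf"{_NUM}\s+(\w+)")

@dataclass
class Amount:
    low: float
    high: float        # equals low when the text gives a single value
    unit: str          # naive: the word right after the number
    approximate: bool

    @property
    def is_range(self) -> bool:
        return self.low != self.high

def parse_amount(phrase: str) -> Amount:
    """Parse phrases like 'about 40 percent' or 'between 5 and 7 percent'."""
    approximate = bool(HEDGES & set(phrase.lower().split()))
    if m := _RANGE.search(phrase):
        return Amount(float(m[1]), float(m[2]), m[3].lower(), approximate)
    if m := _SINGLE.search(phrase):
        value = float(m[1])
        return Amount(value, value, m[2].lower(), approximate)
    raise ValueError(f"no amount found in {phrase!r}")

print(parse_amount("about 40 percent"))
# Amount(low=40.0, high=40.0, unit='percent', approximate=True)
```

Storing a range as a low/high pair lets a single value and a range be compared with the same code.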

“The computer reads two pounds; two pounds of what? Or is it referencing currency? What about monetary conversions?” Roth said. “If you talk about dates, it gets even harder. I could say, the week after Thanksgiving, or the first week in December, or Dec. 3. To you and me it means the same thing, but a keyword search can’t equate them.”
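
To see why dates are hard, each expression can be mapped to a span of calendar days and the spans compared. The Python sketch below does this under stated assumptions: Thanksgiving is the U.S. holiday on the fourth Thursday of November, "the week after Thanksgiving" is read as the following Monday through Sunday, and "the first week in December" as Dec. 1 through 7. The `normalize` and `overlaps` helpers are hypothetical and recognize only these three phrasings.

```python
import datetime as dt
import re

def thanksgiving(year: int) -> dt.date:
    """U.S. Thanksgiving: the fourth Thursday of November."""
    nov1 = dt.date(year, 11, 1)
    to_first_thursday = (3 - nov1.weekday()) % 7  # Monday == 0, Thursday == 3
    return nov1 + dt.timedelta(days=to_first_thursday + 21)

def normalize(expr: str, year: int) -> tuple[dt.date, dt.date]:
    """Map one of three hard-coded expressions to an inclusive (start, end) span."""
    e = expr.strip().lower()
    if e == "the week after thanksgiving":
        monday = thanksgiving(year) + dt.timedelta(days=4)  # Thursday + 4 days
        return monday, monday + dt.timedelta(days=6)
    if e == "the first week in december":
        return dt.date(year, 12, 1), dt.date(year, 12, 7)
    if m := re.fullmatch(r"dec\.?\s+(\d{1,2})", e):
        day = dt.date(year, 12, int(m[1]))
        return day, day
    raise ValueError(f"unrecognized date expression: {expr!r}")

def overlaps(a: tuple[dt.date, dt.date], b: tuple[dt.date, dt.date]) -> bool:
    """True if two inclusive date spans share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]

phrases = ["the week after Thanksgiving", "the first week in December", "Dec. 3"]
spans = [normalize(p, 2015) for p in phrases]
print(all(overlaps(a, b) for a in spans for b in spans))  # True
```

Under these readings the three spans overlap in 2015, when Thanksgiving fell on Nov. 26. In 2018, however, Thanksgiving fell on Nov. 22 and the week after it ends on Dec. 2, so whether the expressions line up depends on both the year and how "week" is interpreted.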

The researchers tested their software’s ability to identify and normalize quantities in text, to perform searches involving monetary currencies, and to understand and solve elementary-school-level math word problems. They found that the software performed well on all three tasks. It even outperformed the average elementary-level student on standardized word problems, Roth said, getting 87 percent of answers correct.

Roth hopes that the ability to understand numbers in context will help make information more accessible to all, from researchers looking for correlations to investors looking for clear financial data to citizens seeking to form educated opinions. He also hopes that using technology to break down mathematical concepts can help students improve their own quantitative reasoning abilities.

“As we move forward and want to help kids understand math, it makes sense to use technology,” Roth said. “If you search the Web today, you see tons of Web pages that help kids and parents with math homework, so we know this is a challenge for people. If a program were able to understand text and word problems to the extent that you can see what the variables are and what you should focus on in the problem, that could help people learn better. This shows that computers could help people learn in ways that could not be done before.”

The Army Research Laboratory and the Defense Advanced Research Projects Agency supported this work.

To reach Dan Roth, call (217) 244-7068; email danr@illinois.edu.

The paper, “Reasoning about Quantities in Natural Language,” is available online.

