CHAMPAIGN, Ill. - If Johnny has five apples and seven oranges, and he wants to share them with three of his friends, can a computer understand the text to figure out how many pieces of fruit each person gets?
Thanks to new software developed at the University of Illinois, machines now can learn to understand mathematical reasoning expressed in language, which could greatly improve search engines and access to data as well as boost mathematics education.
U. of I. computer science professor Dan Roth and graduate student Subhro Roy published their work in the journal Transactions of the Association for Computational Linguistics.
"There is a lot of data available in news archives and public records, but it cannot be accessed in a meaningful way," Roth said. "For example, if people want to know what percentage of a state's budget has been spent on education over the past 20 years, a query like that won't give the desired result with a keyword search performed today in a search engine like Google. But if the engine were able to do quantitative reasoning, it would infer from the text the type of information the user is looking for. It can find the numbers, then calculate the percentages and addition required to do this."
The first and biggest hurdle was teaching the computer to identify quantities and units in text regardless of how they are expressed, something humans do unconsciously when reading. The second was getting the software to decide what to do with the numbers it identifies.
In the problem with Johnny, for example, the computer has to understand that both apples and oranges are fruit; it has to know that the words five, seven and three are equivalent to the numerical values 5, 7 and 3; it has to determine which operations the question requires - in this case, addition and division - and in which order to perform them. Once the program has converted the text into an equation, it can easily compute that 12 pieces of fruit divided among four people leaves Johnny and his friends with three pieces each.
The computer also has to arrive at a different equation when the wording changes. If, for example, the text had said that Johnny wanted to split the fruit among his three friends, instead of sharing the fruit with them, the subtle change in the language implies that each of the friends would receive four pieces of fruit, with Johnny keeping none for himself.
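The two readings of the Johnny problem can be sketched in a few lines of Python. This is only a hand-written illustration, not the researchers' software: the lookup tables `NUMBER_WORDS` and `FRUIT`, the function `fruit_per_person` and the rule that keys on the word "share" are all hypothetical, and a learned system would not rely on rules this brittle.

```python
# Hypothetical lookup tables, written by hand for this one example.
NUMBER_WORDS = {"three": 3, "five": 5, "seven": 7}
FRUIT = {"apples", "oranges"}

SHARE = ("Johnny has five apples and seven oranges, and he wants to "
         "share them with three of his friends.")
SPLIT = ("Johnny has five apples and seven oranges, and he wants to "
         "split them among his three friends.")


def fruit_per_person(text: str) -> int:
    """Turn the word problem into (apples + oranges) / recipients."""
    tokens = text.lower().replace(",", "").replace(".", "").split()
    total_fruit = 0
    friends = 0
    for i, word in enumerate(tokens):
        if word not in NUMBER_WORDS:
            continue
        value = NUMBER_WORDS[word]      # "five" -> 5, and so on
        window = tokens[i + 1:i + 4]    # the next few words name the unit
        if any(t in FRUIT for t in window):
            total_fruit += value        # apples and oranges are both fruit
        elif "friends" in window:
            friends = value
    # Assumed rule: "share with" includes Johnny, "split among" does not.
    recipients = friends + 1 if "share" in tokens else friends
    return total_fruit // recipients


print(fruit_per_person(SHARE))  # 12 pieces among 4 people
print(fruit_per_person(SPLIT))  # 12 pieces among 3 people
```

The addition has to happen before the division in both cases; only the divisor changes with the verb.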
Such sophisticated reasoning is required for search queries as well. When reading a text about financial earnings, for instance, the computer has to identify whether an amount is exact or approximate, static or dynamic, a single value or a range, or stated in relation to something else, along with all of the other contextual cues that a reader would inherently understand.
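One way to picture what "normalizing" an amount means is a small record that keeps the bounds, the unit and an approximate flag. The Python sketch below is an assumption-laden illustration, not the representation used in the published work: the `Quantity` class, the `parse_dollar_amount` function, the list of cue words and the restriction to U.S. dollar amounts are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quantity:
    low: float         # lower bound of the amount
    high: float        # upper bound (equal to low for a single value)
    unit: str
    approximate: bool  # True when the text hedges with a cue word


# Hypothetical scale words and cue words; real text uses many more.
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
PATTERN = re.compile(
    r"(?:(?P<cue>about|around|roughly|nearly|approximately)\s+)?"
    r"\$(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s+to\s+\$(?P<high>\d+(?:\.\d+)?))?"
    r"(?:\s+(?P<scale>thousand|million|billion))?",
    re.IGNORECASE,
)


def parse_dollar_amount(text: str) -> Optional[Quantity]:
    """Return the first dollar amount in text, or None if there is none."""
    m = PATTERN.search(text)
    if m is None:
        return None
    factor = SCALE[m.group("scale").lower()] if m.group("scale") else 1.0
    low = float(m.group("low")) * factor
    high = float(m.group("high")) * factor if m.group("high") else low
    return Quantity(low, high, "USD", approximate=m.group("cue") is not None)


print(parse_dollar_amount("Earnings rose to about $2.5 million"))
print(parse_dollar_amount("Analysts expect $2 to $3 million"))
```

A pattern like this covers only one currency and a handful of phrasings, which is a reminder of why the researchers treat quantity recognition as a learning problem rather than a list of rules.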
"The computer reads two pounds; two pounds of what? Or is it referencing currency? What about monetary conversions?" Roth said. "If you talk about dates, it gets even harder. I could say, the week after Thanksgiving, or the first week in December, or Dec. 3. To you and me it means the same thing, but a keyword search can't equate them."
The researchers tested their software's ability to identify and normalize quantities in text, to perform searches involving currencies, and to understand and solve elementary-school-level math word problems. They found that the software performed well in all tasks. It even outperformed the average elementary-level student on standardized word problems, Roth said, getting 87 percent of answers correct.
Roth hopes that the ability to understand numbers in context will help make information more accessible to all, from researchers looking for correlations to investors looking for clear financial data to citizens seeking to form educated opinions. He also hopes that using technology to break down mathematical concepts can help students improve their own quantitative reasoning abilities.
"As we move forward and want to help kids understand math, it makes sense to use technology," Roth said. "If you search the Web today, you see tons of Web pages that help kids and parents with math homework, so we know this is a challenge for people. If a program were able to understand text and word problems to the extent that you can see what the variables are and what you should focus on in the problem, that could help people learn better. This shows that computers could help people learn in ways that could not be done before."
The Army Research Laboratory and the Defense Advanced Research Projects Agency supported this work.