The team at Microsoft Research Asia reached the human parity milestone using the “Stanford Question Answering Dataset”, known among researchers as “SQuAD”. It’s a machine reading comprehension dataset made up of questions about a set of Wikipedia articles. According to the SQuAD leaderboard, Microsoft submitted a model that reached a score of 82.650 on the exact match portion.
The human performance on the same set of questions and answers is 82.304. On January 5, researchers with the Chinese e-commerce company Alibaba submitted a score of 82.440, also about the same as a human. “The two companies are currently tied for first place on the SQuAD ‘leaderboard’ which lists the results of research organisations’ efforts,” the post read.
With machine reading comprehension, researchers say computers also would be able to quickly parse through information found in books and documents and provide people with the information they need most in an easily understandable way. That would let drivers more easily find the answer they need in a dense car manual, saving time and effort in tense or difficult situations.
“These tools also could let doctors, lawyers and other experts more quickly get through the drudgery of things like reading through large documents for specific medical findings or rarified legal precedent,” Linn said. According to Ming Zhou, Assistant Managing Director of Microsoft Research Asia, the “SQuAD” dataset results are an important milestone, but overall, people are still much better than machines at comprehending the complexity and nuance of language.
“Natural language processing is still an area with lots of challenges that we all need to keep investing in and pushing forward,” Zhou said. “This milestone is just a start.”