Yoav is an associate professor of computer science at Bar-Ilan University and the research director of AI2 Israel. His research interests include language understanding technologies with real-world applications, combining symbolic and neural representations, uncovering latent information in text, syntactic and semantic processing, and interpretability and foundational understanding of deep learning models for text and sequences. He authored a textbook on deep learning techniques for natural language processing, was among the IEEE's AI Top 10 to Watch in 2018, and received the Krill Prize in Science in 2017, as well as multiple best-paper and outstanding-paper awards at major NLP conferences. He received his Ph.D. in Computer Science from Ben-Gurion University and spent time at Google Research as a post-doc.
Reading between the lines: a useful task and a challenging benchmark for reading comprehension by machines.
What does it mean to "understand" a text? Many recent NLP works take a question-answering (QA) perspective on this question, under which understanding a text means the ability to accurately answer questions about it. While this approach is useful, it can also be misleading: some questions are easier than others, some questions can be answered without access to the text at all, and, furthermore, questions often leak information about the answer. Determining understanding based on question-answering ability alone is therefore not sufficient. Indeed, many QA datasets are now considered "solved" by deep learning systems, while it should be clear that these systems do not really understand the text.
In this talk I focus on a task that human readers perform almost subconsciously when reading a text: establishing links between the entities in it. Some of these links are mentioned explicitly, while others are implicit and must be inferred. We formulate this as a concrete NLP task which we call Text-based NP Enrichment (TNE). The TNE task centers on considering all pairs of non-recursive noun phrases in a text and attempting to connect them via prepositions.
As a simple example, given the sentence pair "I entered the house. The walls were white.", a system should infer that "the walls" are linked to "the house" via the preposition "of": the walls of the house.
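To make the task format concrete, the short Python sketch below shows one possible way to represent a TNE instance and its gold links for this example. The data structures, field names, and the placeholder predict function are illustrative assumptions for exposition, not the schema or API of the actual released dataset.

    from dataclasses import dataclass
    from itertools import permutations

    @dataclass(frozen=True)
    class NP:
        """A non-recursive noun phrase, identified by its token span."""
        text: str
        start: int  # offset of the first token
        end: int    # offset just past the last token

    # The two NPs from the example text above.
    house = NP("the house", 2, 4)
    walls = NP("The walls", 5, 7)
    nps = [house, walls]

    # Gold annotation: each ordered NP pair is either linked by a
    # preposition or not linked at all (label None).
    gold_links = {
        (walls, house): "of",   # "the walls (of the house)"
        (house, walls): None,   # no natural prepositional link
    }

    # A TNE system must produce a label for every ordered NP pair.
    def predict(anchor: NP, complement: NP) -> str | None:
        """Placeholder model: always predicts 'no link'."""
        return None

    for anchor, complement in permutations(nps, 2):
        print(anchor.text, "->", complement.text,
              "| gold:", gold_links[(anchor, complement)],
              "| predicted:", predict(anchor, complement))

Note that the links are directed: framing the task over ordered NP pairs lets each direction of the same pair carry a different preposition, or none at all.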
I argue that TNE is a foundational task: humans perform it almost subconsciously as they read, and it surfaces much of the "hidden" structure in the text. As a consequence, the ability to perform it well would be useful to a large range of NLP applications. Furthermore, I argue that, while it is not the ultimate test of text understanding, it is a much stronger and much better motivated test of text understanding than QA is.
I advocate for the use of the TNE task and its accompanying dataset both as a benchmark for text understanding by machines and as a generic, useful NLP component.