SQUASH: Generating Question-Answer Hierarchies
(ACL 2019)


SQUASH, a new text generation task and an alternate way to read documents.

More than two thousand years ago, the Greek philosopher Socrates encouraged his students to learn about the world by questioning everything. More recently, the process of knowledge acquisition has been viewed as a question-answer game between a student and a teacher, in which the student typically starts by asking broad, open-ended questions before drilling down into specifics (Hakkarainen and Sintonen, 2002).

In this project, we introduce a novel text generation task, Specificity-controlled Question-Answer Hierarchies (SQUASH for short), which aims to convert a sequence of input paragraphs into a hierarchy of question-answer pairs about those paragraphs. The hierarchy is determined by the specificity of each question. The higher level of the hierarchy holds broader questions (for example, Why did Frodo leave the Fellowship?), whereas the lower level holds related but more specific questions (for example, Who did Frodo leave with?). To tackle this task, we classify questions in existing reading comprehension datasets (such as SQuAD, CoQA and QuAC) according to their specificity, using a question taxonomy loosely based on Lehnert (1978). These specificity-labelled datasets are used to train a specificity- and context-conditioned neural question generation model, which forms part of a larger pipeline to SQUASH paragraphs.
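To make the shape of the output concrete, here is a minimal sketch of how a two-level question-answer hierarchy could be represented and printed. It is an illustration only, not the project's code: the names `QAPair`, `HierarchyNode`, `build_hierarchy` and `render`, the label strings "general" and "specific", and the caller-supplied `parent_of` mapping are all assumptions made for this example, and the answers are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAPair:
    question: str
    answer: str
    specificity: str  # assumed labels for this sketch: "general" or "specific"


@dataclass
class HierarchyNode:
    general: QAPair                                       # broad, top-level question
    specific: List[QAPair] = field(default_factory=list)  # related, narrower questions


def build_hierarchy(pairs: List[QAPair], parent_of: Dict[int, int]) -> List[HierarchyNode]:
    """Group specific QA pairs under general ones.

    parent_of maps the index of a specific pair to the index of its general
    parent. How that assignment is made is outside this sketch; here the
    caller simply supplies it.
    """
    nodes = {
        i: HierarchyNode(general=pair)
        for i, pair in enumerate(pairs)
        if pair.specificity == "general"
    }
    for child_idx, parent_idx in parent_of.items():
        nodes[parent_idx].specific.append(pairs[child_idx])
    return [nodes[i] for i in sorted(nodes)]


def render(nodes: List[HierarchyNode]) -> str:
    """Return the hierarchy as indented text, specific questions nested below."""
    lines = []
    for node in nodes:
        lines.append(f"Q: {node.general.question}")
        lines.append(f"A: {node.general.answer}")
        for child in node.specific:
            lines.append(f"    Q: {child.question}")
            lines.append(f"    A: {child.answer}")
    return "\n".join(lines)


pairs = [
    QAPair("Why did Frodo leave the Fellowship?", "<answer span>", "general"),
    QAPair("Who did Frodo leave with?", "<answer span>", "specific"),
]
hierarchy = build_hierarchy(pairs, parent_of={1: 0})
print(render(hierarchy))
```

Nesting the specific questions under a general parent mirrors the reading experience described above: a reader can skim the broad questions first and expand only the ones of interest.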


Along with the academic publication and codebase, we are releasing a web demonstration you can play with! The demonstration contains an improved version of the original model and has customizable generation and filtering hyperparameters. A technical note describes the modifications made. In a nutshell, we have leveraged language model pretraining in the question generation module (via GPT-2 small) and the question answering module (via BERT). All our question generation now uses top-p random sampling (Holtzman et al., 2019). The dataset has been modified to resolve coreferences in the QuAC questions via huggingface/neuralcoref. Finally, the filtering process has been simplified.
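For readers unfamiliar with top-p (nucleus) sampling, the idea is to sample the next token only from the smallest set of most-probable tokens whose cumulative probability reaches a threshold p. The sketch below shows that idea over a toy next-token distribution in plain Python. It is not the demo's implementation: the function names `top_p_filter` and `sample_top_p`, the toy vocabulary and the value p = 0.7 are assumptions chosen for illustration.

```python
import random
from typing import Dict, Optional


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of highest-probability tokens whose total
    probability reaches p, then renormalise the kept probabilities."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    total = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:  # nucleus is complete once the mass reaches p
            break
    return {token: prob / total for token, prob in kept}


def sample_top_p(probs: Dict[str, float], p: float,
                 rng: Optional[random.Random] = None) -> str:
    """Draw one token from the renormalised nucleus."""
    rng = rng or random.Random()
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution over the first word of a question (made-up numbers).
toy_probs = {"who": 0.5, "what": 0.3, "why": 0.15, "when": 0.05}
nucleus = top_p_filter(toy_probs, p=0.7)
print(sorted(nucleus))  # the tokens that survive the cut
print(sample_top_p(toy_probs, p=0.7, rng=random.Random(0)))
```

With these toy numbers, "who" alone covers 0.5 of the mass, so "what" is added to reach the 0.7 threshold, and the low-probability tail ("why", "when") can never be sampled. Lowering p makes generation more conservative; raising it toward 1 approaches sampling from the full distribution.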


If you find this paper relevant, please cite us:
Author = {Kalpesh Krishna and Mohit Iyyer},
Booktitle = {Association for Computational Linguistics},
Year = "2019",
Title = {Generating Question-Answer Hierarchies}

website credits - Rowan Zellers' website on HellaSwag.