Explanation Bank

Our long-term interest is in building inference algorithms capable of answering complex questions and producing human-readable explanations by aggregating information from multiple sources and knowledge bases (sometimes called “multi-hop” inference). The resources provided here in the Explanation Bank are intended to help others with similar interests in explanation-centered inference, and will be updated regularly as we generate more resources.

EntailmentBank (EMNLP 2021)
University of Arizona & Allen Institute for Artificial Intelligence
A large set of 1,840 expert-authored tree-structured explanations for science exam questions, from the paper Explaining Answers with Entailment Trees. Each explanation takes the form of an entailment tree, where a given fact (parent node) is entailed by its children (more atomic facts), providing an explicit step-by-step description of the reasoning path required to arrive at an inference (a representational sketch follows the links below).
[ EntailmentBank @ AllenAI.org ]
[ EntailmentBank as a book of explanation graphs (PDF) ] NEW!
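For readers new to the format, here is a minimal Python sketch of how an entailment tree might be represented and traversed. The class, fields, and fact text are illustrative only, not the corpus's actual schema or contents:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EntailmentNode:
    """One node in an entailment tree: a fact entailed by its child facts."""
    fact: str
    children: List["EntailmentNode"] = field(default_factory=list)

def print_reasoning_path(node: EntailmentNode, depth: int = 0) -> None:
    """Print leaves first, so the output reads as a step-by-step derivation."""
    for child in node.children:
        print_reasoning_path(child, depth + 1)
    print("  " * depth + node.fact)

# Invented facts in the spirit of the corpus (not a real corpus entry):
tree = EntailmentNode(
    "heating an ice cube causes it to melt into liquid water",
    children=[
        EntailmentNode("an ice cube is water in the solid state"),
        EntailmentNode("heating causes a solid to change into a liquid"),
    ],
)
print_reasoning_path(tree)
```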

Expert Science Teacher Relevance Ratings for WorldTree Explanation Corpus (EMNLP 2021)
University of Arizona
This is the dataset for the paper On the Challenges of Evaluating Compositional Explanations in Multi-Hop Inference: Relevance, Completeness, and Expert Ratings. It substantially expands the WorldTree V2 corpus: the data includes approximately a quarter-million expert-generated (science teacher) relevance ratings covering 126k facts in the WorldTree V2 explanation corpus, increasing the number of relevant facts per question by approximately 4x over the original gold annotation while providing a graded notion of relevance (rated on a 0-3 scale; a loading sketch follows the download link below). The data also includes completeness ratings for 50 questions for each of the three models (ranking, generative, and schema) evaluated in the paper.
[ emnlp-2021-teacher-ratings.zip ]
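As a rough illustration of consuming graded ratings like these, the sketch below assumes a tab-separated file with questionID, factID, and rating columns; the actual file names and column headers in the archive may differ:

```python
import csv
from collections import defaultdict
from typing import Dict

def load_ratings(path: str) -> Dict[str, Dict[str, int]]:
    """Map each questionID to {factID: rating}, with rating on the 0-3 scale."""
    ratings: Dict[str, Dict[str, int]] = defaultdict(dict)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            ratings[row["questionID"]][row["factID"]] = int(row["rating"])
    return ratings

# Under a graded reading, facts rated >= 1 count as at least somewhat relevant:
# relevant = {fid for fid, r in load_ratings("ratings.tsv")["Q1"].items() if r >= 1}
```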

Worldtree Corpus (V2.1) of Explanation Graphs and Inference Patterns supporting Multi-hop Inference (February 2020 snapshot)
University of Arizona & Allen Institute for Artificial Intelligence
This is the February 2020 snapshot of the Worldtree corpus of explanation graphs, explanatory role ratings, and associated tablestore, from the paper WorldTree V2: A Corpus of Science-Domain Structured Explanations and Inference Patterns supporting Multi-Hop Inference (LREC 2020). WorldTree is one of the most detailed multi-hop question answering/explanation datasets: questions require combining between 1 and 16 facts (6 on average) to generate detailed explanations supporting question-answering inference. Explanation graphs for approximately 4,400 questions, and 9,000 tablestore rows across 81 semi-structured tables, are provided. New in V2, this corpus also includes a set of 355 “inference patterns” that define high-level multi-hop inference patterns and problem-solving methods. A larger raw set of 5,100 explanations is provided, including duplicate questions (to study interannotator agreement). This data is intended to be paired with the AI2 Mercury Licensed questions, which require agreeing to an accompanying EULA before use. A separate Scala API is also provided to ease parsing and use of the corpus (a Python loading sketch also appears after the links below). Version 2 also includes a new 1,500-page book of explanation graphs (PDF) for the training corpus, providing a gentle introduction to this explanation corpus.
[ WorldtreeExplanationCorpusV2.1_Feb2020.zip ]
[ WorldTree Explanation Corpus Desk Reference (PDF) ] — Book with over 2,000 human-readable explanation graphs!
[ Worldtree-api Github Repository (from WorldTree V1) ]
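Independent of the provided Scala API, a minimal Python loading sketch might look like the following, under the assumption that each semi-structured table ships as a tab-separated file with a header row (consult the archive's documentation for the actual layout):

```python
import csv
from pathlib import Path
from typing import Dict, Tuple

def load_tablestore(table_dir: str) -> Dict[Tuple[str, int], dict]:
    """Index every tablestore row by (table name, row number within table)."""
    rows: Dict[Tuple[str, int], dict] = {}
    for table_path in sorted(Path(table_dir).glob("*.tsv")):
        with open(table_path, newline="", encoding="utf-8") as f:
            for i, row in enumerate(csv.DictReader(f, delimiter="\t")):
                rows[(table_path.stem, i)] = row
    return rows

# e.g. rows = load_tablestore("tablestore/tables")  # hypothetical path
```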

[Figure] Example explanation graph (ice melting)

ScienceExamCER: A Common-Entity Recognition (CER) dataset and tagger for science domain text (LREC 2020, November 2019 snapshot)
University of Arizona
This is the dataset and code for the paper ScienceExamCER: A High-Density Fine-Grained Science-Domain Corpus for Common Entity Recognition. The common-entity recognition (CER) task aims to provide nearly every word in a text with a detailed semantic class label. The release includes: (a) a set of 4,239 science exam questions tagged with 133k mentions, covering approximately 96% of all content words in the corpus, drawn from a detailed taxonomy of 601 semantic classes; (b) the detailed typology of 601 fine-grained semantic classes constructed from a data-driven analysis of the domain; and (c) code, including a pre-trained model, for performing the CER tagging task on plain text (an illustrative output sketch follows the links below).
[ ScienceExamCER_data.zip ]
[ Pretrained BERT-NER model ]
[ Github Repository for CER Tagger ]
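As a rough sketch of the kind of output a CER tagger produces, the example below pairs each token with a list of semantic class labels; the class names shown are invented, and the released tagger's actual I/O format may differ:

```python
from typing import Dict, List

def tag_tokens(tokens: List[str],
               labels: List[List[str]]) -> List[Dict[str, object]]:
    """Pair each token with its (possibly multiple) semantic class labels."""
    return [{"token": t, "classes": c} for t, c in zip(tokens, labels)]

# Hypothetical class names; the released typology defines the real 601 labels.
tagged = tag_tokens(
    ["Ice", "melts", "when", "heated"],
    [["MatterState"], ["ChangeOfState"], [], ["EnergyTransfer"]],
)
for entry in tagged:
    print(entry["token"], entry["classes"])
```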

Explanatory Inference Patterns for Matter Science Exam Questions (COIN workshop, November 2019 snapshot)
University of Arizona
This is the dataset for the paper Extracting Common Inference Patterns from Semi-Structured Explanations (COIN 2019). The data contains: (a) a selection of 67 common inference patterns extracted from 42 “matter” science questions in the WorldTree explanation corpus, and (b) a graph-based visualization showing both the curated knowledge graph produced in this paper and the high-level inference patterns that were extracted (one possible in-memory representation is sketched after the links below).
[ thiem_jansen_coin2019_inferencepatterns.zip ]
[ Online Graph Visualization ]
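One plausible in-memory representation of such a pattern, with templated facts and shared-variable constraints, is sketched below; the field names and example pattern are illustrative, not taken from the release:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InferencePattern:
    """A set of templated facts plus constraints tying their variable slots."""
    name: str
    fact_templates: List[str]
    shared_variables: List[Tuple[int, int, str]] = field(default_factory=list)
    # each entry: (template index, template index, shared variable name)

pattern = InferencePattern(
    name="change-of-state",  # invented pattern, not from the release
    fact_templates=[
        "[SUBSTANCE] is in the solid state",
        "melting means changing from a solid into a liquid by adding heat",
        "heating [SUBSTANCE] causes it to melt",
    ],
    shared_variables=[(0, 2, "SUBSTANCE")],
)
```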

Question Classification Labels for Science Questions (LREC 2020, August 2019 snapshot)
University of Arizona & Allen Institute for Artificial Intelligence
This is the dataset for the paper Multi-class Hierarchical Question Classification for Multiple Choice Science Exams (arXiv). To the best of our knowledge, this is the largest and most fine-grained question classification corpus available: it is larger than TREC-50 and contains nearly an order of magnitude more classification labels. This dataset also includes classification labels for each question in the WorldTree explanation corpus (below). The data contains: (a) a taxonomy of 462 problem domains for 3rd-to-9th-grade standardized science exams, organized into 6 levels of granularity (working with hierarchical labels is sketched after the links below), and (b) classification labels for all 7,787 science exam questions in the AI2 Reasoning Challenge (ARC) corpus. Precomputed predictions from the BERT-QC model are also available for download. Code is available through the Github Repository.
[ arc-questionclassificationdata-2019.zip ]
[ bertqc-precomputed-predictions.zip ]
[ BERT-QC Github Repository ]
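As a small illustration of working with a 6-level hierarchical taxonomy, the sketch below assumes labels are serialized as separator-joined taxonomy paths, so truncating a path yields the coarser-grained labels; the actual serialization in the release may differ:

```python
def label_at_level(label: str, level: int, sep: str = "_") -> str:
    """Truncate a serialized taxonomy path to its first `level` components."""
    return sep.join(label.split(sep)[:level])

# Hypothetical label string; the release documents the actual serialization.
full = "MATTER_CHANGES-OF-STATE_MELTING"
assert label_at_level(full, 1) == "MATTER"
assert label_at_level(full, 2) == "MATTER_CHANGES-OF-STATE"
```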

Worldtree Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-hop Inference (September 2017 snapshot)
University of Arizona & Allen Institute for Artificial Intelligence
This is the September 2017 snapshot of the Worldtree corpus of explanation graphs, explanatory role ratings, and associated tablestore, from the paper Worldtree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-hop Inference (LREC 2018). Explanation graphs for 1,680 questions, and 4,950 tablestore rows across 62 semi-structured tables, are provided. This data is intended to be paired with the AI2 Mercury Licensed questions, which require agreeing to an accompanying EULA before use. Two versions are available: one with the Mercury questions (and a large PDF of the Worldtree visualization), and one without. A separate Scala API is also provided to ease parsing and use of the corpus, as well as to replicate the analyses and visualizations in the LREC 2018 paper.
[ Worldtree_Explanation_Corpus_V1_Sept2017_noMercury.zip ]
[ Worldtree_Explanation_Corpus_V1_Sept2017_withMercury.zip ]
[ TextGraphs2019 Explanation Reconstruction Shared Task Data (Worldtree Corpus + Explanation Graph Visualizations) ]
[ Worldtree-api Github Repository ]
[ Explanation Authoring Tool — Coming Soon! ]
[ Talk on this project ]

Ratings to determine Lexical Connection Quality for Information Aggregation (TextGraphs 2018)
University of Arizona
This is the supplementary dataset for the paper Multi-hop Inference for Sentence-level TextGraphs: How Challenging is Meaningfully Combining Information for Science Question Answering?. These ratings were used to generate the average utility ratings in Tables 2 and 3 of the paper. The archive also contains an expanded copy of the paper with an appendix showing additional analyses for the 2-sentence aggregation scenario.
[ textgraphs2018_data.zip ]

Common Explanatory Patterns (AKBC 2017)
University of Arizona
This is the supplementary dataset for the paper A Study of Automatically Acquiring Explanatory Inference Patterns from Corpora of Explanations: Lessons from Elementary Science (AKBC 2017). The dataset contains: (a) common explanatory patterns (i.e., patterns found more than once) in the first 800 questions of the September 2017 WorldTree corpus, described in Section 3.1, and (b) a fine-grained characterization of reconstruction quality by the number of edges in the gold graph, expanding on Figure 4.
[ AKBC2017_ExplanatoryPatterns_Nov2017.zip ]

Explanations for Science Questions (COLING 2016)
University of Arizona, Stony Brook University & Allen Institute for Artificial Intelligence
This is the dataset for the paper What’s in an Explanation? Characterizing Knowledge and Inference Requirements for Elementary Science Exams (COLING 2016). The data contains: (a) 1,363 gold explanation sentences supporting 363 science questions, (b) relation annotation for a subset of those explanations, and (c) a graphical annotation tool with annotation guidelines.
[ COLING2016_Explanations_Oct2016.zip ]