Natural Language Reasoning and Structured Explanations Workshop

August @ ACL 2024. Bangkok, Thailand.

With recent scaling of large pre-trained Transformer language models (LLMs), the scope of feasible NLP tasks has broadened. Significant recent work has focused on tasks that require some kind of natural language reasoning. A trajectory in question answering has led us from extraction-oriented datasets like SQuAD to “multi-hop” reasoning datasets like HotpotQA and StrategyQA. Although LLMs have shown remarkable performance on most NLP tasks, it is often unclear why their answers follow from what they know. To address this gap, a new class of explanation techniques has emerged that plays an integral part in structuring the reasoning necessary to solve these datasets. For example, the chain-of-thought paradigm leverages explanations as vehicles for LLMs to mimic human reasoning processes. Entailment trees offer a way to ground multi-step reasoning in a collection of verifiable steps. Frameworks like SayCan bridge high-level planning in language with low-level action trajectories. As a result, we see a confluence of methods blending explainable machine learning/NLP, classical AI (especially theorem proving), and cognitive science (how do humans structure explanations?). This workshop aims to bring together a diverse set of perspectives from these different traditions and attempt to establish common ground for how these various kinds of explanation structures can tackle a broad class of reasoning problems in natural language and beyond.


Yejin Choi
University of Washington
Sherry Tongshuang Wu
Carnegie Mellon University
Karthik Narasimhan
Princeton University
Thomas Icard
Stanford University

Call for Papers

We welcome submissions on all topics related to natural language reasoning or structured explanations, which might include:

  • Multi-step natural language reasoning;
  • Structured explanations;
  • Foundations of natural language reasoning;
  • Knowledge retrieval for multi-step reasoning;
  • Reasoning in interactive environments;
  • Applications of natural language reasoning;
  • Reasoning as programs;
  • Neuro-symbolic reasoning.

With recent scaling of large pre-trained Transformer language models (LLMs), the scope of feasible NLP tasks has broadened, including tasks requiring increasingly complex reasoning. Although LLMs have shown remarkable performance, it is still unclear how to best elicit this reasoning and to what extent the answers that models give follow from what they “know.” This workshop aims to bring together a diverse set of perspectives and attempts to establish common ground for how various kinds of explanation structures can tackle a broad class of reasoning problems in natural language and beyond. As such, the workshop welcomes and covers a wide range of topics, including (non-exclusively):

  • Multi-step natural language reasoning: Solving reasoning problems, such as those involving abstract manipulations, has been a long-standing challenge in the field of artificial intelligence. Large language models have recently achieved new state-of-the-art performance on many reasoning benchmarks, often with approaches that require only prompting. Current research frontiers are exploring what kinds of explanation formats are most effective, how reasoning is most effectively broken down, how to get language models to plan their reasoning, and what resources can be used to improve the reasoning capabilities of language models. Tasks include mathematical reasoning, logical reasoning, commonsense reasoning, and more.
  • Structured explanations: Explanations for these complex tasks are typically composed of two or more facts that are used to help guide the reasoning process while also providing a record of the path taken to arrive at an inference. What representations can best be used by inference algorithms to construct large explanations? Frontiers of research include search algorithms over such representations, how to represent annotations at scale, and continual learning models.
  • Foundations of natural language reasoning: Does the structured reasoning constitute a plausible (interpretable to humans) and faithful (true to the model's processes) explanation? Does perturbing the reasoning lead to correctly modified behavior?
  • Knowledge retrieval for multi-step reasoning: It has been shown that LLMs can store factual knowledge implicitly in their parameters; however, their ability to access and manipulate knowledge is still limited. Future avenues of research include effective methods to combine parametric and non-parametric knowledge for complex reasoning, conditioning retrieval on intermediate reasoning context, and retrieving better provenance for structured explanations.
  • Reasoning in interactive environments: Interactive environments, in which an agent observes the environment and then takes steps in it to accomplish some goal, are becoming an increasingly popular method for evaluating reasoning. Here, manner (i.e., how-to) explanations take the form of the list of actions the agent needs to accomplish some goal, e.g., "how to boil water in a kitchen", "how to grow an apple tree", "how to book a flight and a hotel in LA".
  • Applications of natural language reasoning: New QA settings, language grounding, explainable diagnosis systems, theorem provers using natural language, reasoning for scientific discovery, and more.
  • Reasoning as programs: Another body of work within computational cognitive science and AI has formalized reasoning as inference over programs, building on classical views of human reasoning in a symbol-like language of thought and linguistic semantics with logical languages. Using language models of code to produce structured reasoning for commonsense problems, and other similar approaches, are all in scope here.
  • Neuro-symbolic reasoning: Pockets of contemporary work have proposed reformulating natural language reasoning as proceeding via modular neuro-symbolic systems. Here LLMs operate as declarative programmers, “translating” natural language into a formal specification, such as one accepted by a satisfiability solver, and explicit inference is offloaded to classical symbolic algorithms for planning, constraint satisfaction, or probabilistic simulation.

Submission Guidelines

We welcome three types of papers: archival regular workshop papers, non-archival regular workshop papers, and non-archival cross-submissions. Only archival regular workshop papers will be included in the workshop proceedings. Regular workshop submissions (both archival and non-archival) should be in PDF format and made through the OpenReview website set up for this workshop (link). In line with the ACL main conference policy, camera-ready versions of regular workshop papers will be given one additional page of content. Non-archival cross-submissions should be made through the form (link).

  • Archival regular workshop papers: Authors should submit a paper up to 8 pages (both short and long papers are welcome), with unlimited pages for references, following the ACL author guidelines. The reported research should be substantially original. All submissions will be reviewed in a single track, regardless of length. Accepted papers will be presented as posters by default, and best papers may be given the opportunity for a brief talk to introduce their work. Reviewing will be double-blind, and thus no author information should be included in the papers; self-reference that identifies the authors should be avoided or anonymised. Accepted papers will appear in the workshop proceedings. Preference for oral presentation slots in the workshop will be given to archival papers.
  • Non-archival regular workshop papers: This is the same as the option above, but these papers will not appear in the proceedings and will typically only receive poster presentation slots. Non-archival submissions in this category will still undergo the review process. This is appropriate for nearly finished work that is intended for submission to another venue at a later date.
  • Non-archival cross-submissions: We also solicit cross-submissions, i.e., papers on relevant topics that have already appeared in other venues (e.g., workshop or conference papers at NLP, ML, or cognitive science venues, among others). Accepted papers will be presented at the workshop, with an indication of original venue, but will not be included in the workshop proceedings. Cross-submissions are ideal for related work which would benefit from exposure to the NLReasoning audience. Papers in this category do not need to follow the ACL format, and the submission length is determined by the original venue. The paper selection will be solely determined by the organizing committee in a non-blind fashion. These papers will typically receive poster presentation slots.

In addition, we welcome papers on relevant topics that are under review or to be submitted to other venues (including the ACL 2024 main conference). These papers must follow the regular workshop paper format and will not be included in the workshop proceedings. Papers in this category will be reviewed by workshop reviewers.

Note to authors: For archival and non-archival regular workshop submissions, when you submit your paper through OpenReview (link), please select the appropriate "Submission Type" based on the guidelines. For cross-submissions, please fill out this form (link) and do NOT submit through OpenReview.

For questions about the submission guidelines, please contact workshop organizers via

Important Dates

Paper Submission Deadline: May 17, 2024 (All deadlines are 11:59 PM AoE time.)
Decision Notifications: June 17, 2024
Camera Ready Paper Deadline: July 1, 2024
Workshop Date: TBD (ACL 2024 will take place in Bangkok, Thailand from 11th to 16th August, 2024)


Greg Durrett
University of Texas, Austin
Bhavana Dalvi
Allen Institute for AI
Peter Jansen
University of Arizona
Danilo Ribeiro
Amazon AWS
Xi Ye
University of Texas, Austin
Wenting Zhao
Cornell University