The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) was run as a hybrid conference in Seattle, WA and online.
After ACL 2022, this was my second in-person conference. Like ACL, NAACL experimented with a new reviewing process based on ACL Rolling Review (ARR). Overall, 442 papers were accepted for the main proceedings and 209 for publication in the "Findings of the Association for Computational Linguistics: NAACL 2022".
In addition to various oral and poster sessions, NAACL offered very interesting panels, keynotes, tutorials, and workshops. The first conference day kicked off with six tutorials, and on the last two days, 23 workshops covered a wide range of topics.
This blog post summarizes NAACL papers and talks that I found particularly interesting.
Panel: The Place of Linguistics and Symbolic Structures (w/ Emily Bender, Dilek Hakkani-Tür, Chitta Baral, and Chris Manning)
A very interesting session at the conference was the panel on “The Place of Linguistics and Symbolic Structures” with Emily Bender, Dilek Hakkani-Tür, Chitta Baral, and Chris Manning. Each panellist opened with a short pitch on the topic. Chitta Baral argued for the importance of symbolic structures if we want to go beyond solving individual datasets and concentrate on the underlying task; symbolic knowledge and structure can help in creating better datasets for that purpose. While there are many open challenges involving symbolic aspects, current research concentrates on only a few of them, for example focusing on common-sense facts even though common-sense reasoning involves many more challenges. Emily Bender called for a partnership between NLP and linguistics that draws on the deep scholarship available in linguistics research. For example, prior work in sociolinguistics can be useful for reasoning about the potential harms of today’s language technologies. Dilek Hakkani-Tür emphasized challenges related to dialogue systems and how symbolic structure and knowledge grounding can help address them. Finally, Chris Manning highlighted the difference between language as a symbolic structure and the human brain as a processor of these symbols, which is not itself implemented as a physical symbol system. He also noted that fundamental concepts from linguistics are becoming more important in deep learning research in general and are being used to understand human intelligence.
During the session, panellists repeatedly mentioned grounded learning and social learning (motivated by how humans acquire language) as interesting directions for future research.
Multimodality
Like at ACL, multimodality and grounded language learning were quite present, and the conference likewise offered a tutorial and a workshop on multimodal machine learning.
The tutorial by Louis-Philippe Morency, Paul Pu Liang, and Amir Zadeh started with a discussion of terms and concepts central to multimodal research. Louis-Philippe provided a historical overview of multimodal research and of the tasks the community has focused on in the past five years.
During the rest of the tutorial, the presenters discussed six core challenges of multimodal ML (a short code sketch of the first two challenges follows the list):
Representation learning: Learning multimodal representations that capture cross-modal interaction between data points of different modalities.
Alignment: Aligning data points such that interconnection and dependencies between different modalities become apparent, e.g. which object relates to which word in image captioning.
Reasoning: Using the multimodal knowledge acquired through representation learning and alignment for reasoning in a multistep fashion.
Generation: Producing a raw modality that captures information from other modalities. Example generation tasks are summarization, translation, and creation.
Summarization: Summarizing information content from multiple modalities into a smaller, compressed set of data and modalities.
Translation: Aiming to maintain the information content while translating from one modality to another.
Creation: Expanding the information content, e.g. going from latent representation to image.
Transfer: Transferring knowledge between different modalities to overcome noisy or limited resources by using data from other modalities.
Quantification: Encapsulating the previous challenges by studying heterogeneity in data, cross-modal interactions, and multimodal learning.
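To make the representation and alignment challenges more concrete, here is a minimal sketch of contrastive cross-modal representation learning in the style of CLIP (Radford et al., 2021). This is my own illustration rather than tutorial material; the batch size, embedding dimension, and temperature are arbitrary, and the random tensors stand in for real encoder outputs.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # Normalize embeddings so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarities: logits[i, j] = sim(image_i, text_j).
    logits = image_emb @ text_emb.t() / temperature

    # Matched image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0))

    # Pull matched pairs together, push mismatched pairs apart, in both directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random "encoder outputs" (batch of 8, 512-dim).
image_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
print(contrastive_loss(image_emb, text_emb))
```

The symmetric cross-entropy pulls matched image-text pairs together in a shared embedding space while pushing mismatched pairs apart, which is one common way of learning aligned multimodal representations.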
Tutorial on Self-Supervised Representation Learning for Speech Processing
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer (Zhao et al., 2022)
Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction (Chen et al., 2022)
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation (Liu et al., 2022)
Exposing the Limits of Video-Text Models through Contrast Sets (Park et al., 2022)
Explainability
On the Diversity and Limits of Human Explanations (Tan et al., 2022)
This paper provides an overview of work that uses human-generated explanations as the gold standard for evaluating models’ explanation generation.
It discusses the limitations of human explanations and argues that these must be considered when explanations are used as uniform ground-truth labels.
Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2022)
Can Rationalization Improve Robustness? (Chen et al., 2022)
Explaining Toxic Text via Knowledge Enhanced Text Generation (Sridhar et al., 2022)
Data collection and benchmarking
Panel on the future of data collection @DADC workshop
I attended the DADC workshop and especially enjoyed the panel on the future of data collection with Anna Rogers, Jordan Boyd-Graber, Sam Bowman, Sherry Tongshuang Wu, Lora Aroyo, Douwe Kiela, and Swabha Swayamdipta.
Starting with the status quo in data collection and evaluation, Lora emphasized that the goal of data collection should be to capture the natural expressions, perception, and diversity of humans instead of aiming for answers that fit well into models. She currently sees a lack of evaluation approaches that ensure we reach this goal.
Swabha highlighted that a common pitfall of adversarial data collection is collecting samples that are no longer meaningful for the task at hand. Since adversarial examples can be very diverse, we as a community need to take a step back and consider what adversarial examples actually are. We also need to define terms that are commonly used in adversarial data collection but only vaguely defined.
Douwe raised the interesting point that a key step in data collection is understanding our data better, mentioning the data cartography work by Swayamdipta et al. (2020). Moreover, from an academic perspective, it is useful to ask what kind of data we want to collect and work on as a community in order to measure our progress.
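As a rough sketch of the data cartography idea (my own simplification, not the authors’ code): for each training example, one records the model’s probability of the gold label after every epoch, then plots mean confidence against variability to separate easy-to-learn, ambiguous, and hard-to-learn examples.

```python
import numpy as np

def data_map_stats(gold_probs):
    """Compute data-map coordinates from per-epoch gold-label probabilities.

    gold_probs: array of shape (num_examples, num_epochs); entry (i, e) is the
    model's probability of example i's gold label after training epoch e.
    """
    confidence = gold_probs.mean(axis=1)   # high mean = easy to learn
    variability = gold_probs.std(axis=1)   # high std = ambiguous
    return confidence, variability

# Toy usage: three examples tracked over four epochs.
gold_probs = np.array([
    [0.90, 0.95, 0.97, 0.98],  # easy to learn: high confidence, low variability
    [0.20, 0.80, 0.30, 0.70],  # ambiguous: high variability
    [0.10, 0.05, 0.10, 0.08],  # hard to learn: low confidence, low variability
])
print(data_map_stats(gold_probs))
```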
Douwe and Sherry also both mentioned interactive data collection, with humans and different models in the loop, as an effective method that can help make the collected data more model-agnostic. Sam criticized one aspect of adversarial data collection, namely that the data becomes biased towards the model used during collection.
The panel also had an interesting discussion on the role of experts and crowdworkers in data collection, emphasizing diversity in annotator groups. By concentrating only on expert datasets (with experts from our own or neighbouring research communities), we might miss things due to a lack of annotator diversity along some dimensions. Using experts, crowdworkers, and models jointly in data collection would lead to more diverse data.
Diagnosing Vision-and-Language Navigation: What Really Matters (Zhu et al., 2022)
Transparent Human Evaluation for Image Captioning (Kasai et al., 2022)
Tutorial on Human-Centered Evaluation of Explanations
Bias & Fairness
Measuring Fairness with Biased Rulers: A Comparative Study on Bias Metrics for Pre-trained Language Models (Delobelle et al., 2022)
This paper surveys metrics for measuring bias and fairness in pretrained language models and evaluates their compatibility. The authors find that measures proposed in the literature are difficult to compare, as they strongly depend on design choices such as probing templates, target seeds, and which embeddings are probed.
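As a toy illustration of this template sensitivity (my own sketch, not from the paper): even two paraphrases of the same probe can assign noticeably different scores to the same target words when querying a masked language model. The model and templates below are arbitrary choices.

```python
from transformers import pipeline

# Fill-mask probe: how does template wording shift the scores of target words?
unmasker = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The nurse said that [MASK] would be late.",
    "[MASK] works as a nurse.",
]
targets = ["he", "she"]

for template in templates:
    # Restrict predictions to the target words and collect their scores.
    scores = {r["token_str"]: r["score"] for r in unmasker(template, targets=targets)}
    print(template, scores)
```

If the two templates rank or weight the targets differently, a bias score computed from them will differ too, which is exactly the kind of design-choice dependence the paper describes.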
Benchmarking Intersectional Biases in NLP (Lalor et al., 2022)
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection (Sap et al., 2022)
Some further papers I enjoyed reading!
Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models (McDonald et al., 2022)
While new pretrained models trained on ever more data are released every few days, the carbon footprint of these models has recently become a subject of discussion.
This paper evaluates techniques for reducing energy consumption while maintaining model performance and computation time. For example, one of the proposed methods is power capping, which limits the maximum power consumption of GPUs, resulting in a 15% reduction in energy consumption.
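As a sketch of what power capping can look like in practice (assuming an NVIDIA GPU, the pynvml bindings, and sufficient permissions; this is not the paper’s code, and the 80% cap is an arbitrary choice):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

# Current and allowed power limits, reported in milliwatts.
current = pynvml.nvmlDeviceGetPowerManagementLimit(handle)
minimum, maximum = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
print(f"limit: {current / 1000:.0f} W (allowed {minimum / 1000:.0f}-{maximum / 1000:.0f} W)")

# Cap the GPU at, e.g., 80% of its maximum power (usually requires root).
pynvml.nvmlDeviceSetPowerManagementLimit(handle, int(0.8 * maximum))

pynvml.nvmlShutdown()
```

The same effect can be achieved from the command line with `nvidia-smi -pl <watts>`.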
Do Prompt-Based Models Really Understand the Meaning of Their Prompts? (Webson et al., 2022)
Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models (Stanczak et al., 2022)
FNet: Mixing Tokens with Fourier Transforms (Lee-Thorp et al., 2022)
I hope you enjoyed this blog post!
Feel free to contact me with feedback, suggestions for future editions, or your thoughts on the discussed papers: mubashara.akhtar@kcl.ac.uk or @akhtarmubashara.