📚 Poems, Flowers, and Dragons at EMNLP 2022
The EMNLP conference is a highly regarded event in natural language processing, where researchers come together to share and discuss their latest findings. This year’s conference took place from December 7th to December 11th in Abu Dhabi. Of the many papers presented, I want to highlight three that stood out to me. They may not be the most practical or well-known, but I believe they are worth mentioning. Two were presented as posters, while the third was a full talk. My favorite of the three is PoeLM.
PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation
- Paper: Ormazabal et al., 2022
- Organizations: University of the Basque Country, Meta AI, University of Copenhagen
- Code: https://github.com/aitorormazabal/poetry_generation, though at the time of writing it contains only the dataset-creation code.
- Main idea: Generating Spanish and Basque formal verse poems through control codes with a language model trained on non-poetic texts.
Motivation
Can modern language models write poems? Of course, they can. You can quickly test it with ChatGPT. The challenges arise when trying to impose specific constraints, such as a fixed number of syllables or a specific rhyme or rhythm scheme.
How can we force language models to generate formal verse poems? One option is to modify the decoding algorithm, which is tricky with modern language models because they operate on sub-word units that are neither words nor syllables. This paper describes another approach. For it to work, you need a regular text corpus and a system capable of analyzing syllables and rhymes.
Training a language model
Here is what you need to do:
- Get a regular, non-poetic corpus, and split it into phrases.
- Group the phrases into blocks of N, where N is randomly sampled.
- Augment each group with a structure descriptor (a prefix) encoding the number of syllables and the rhyme ending of each phrase.
- Train a classic transformer language model with structure descriptors treated as ordinary tokens.
For example, a structure descriptor might look like this:
<PREF>
<LEN:11><END:echo>
<LEN:11><END:ura>
<LEN:11><END:ura>
<LEN:11><END:echo>
</PREF>
This descriptor encodes four lines of 11 syllables each; the first and last lines end with “echo”, and lines 2 and 3 end with “ura”. The model learns to exploit these codes, since generating text with such hints is easier than without them.
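To make the recipe concrete, here is a minimal sketch of the data-preparation step in Python. The count_syllables and rhyme_ending functions are crude placeholders standing in for the paper's language-specific analyzers, and the grouping logic is my own reconstruction of the steps above, not the authors' code.

```python
import random
import re

def count_syllables(phrase: str) -> int:
    # Naive placeholder: count vowel groups. The paper relies on a proper
    # syllabifier for Spanish and Basque.
    return max(1, len(re.findall(r"[aeiouáéíóúü]+", phrase.lower())))

def rhyme_ending(phrase: str) -> str:
    # Naive placeholder: last three characters of the final word. The real
    # system extracts the ending starting at the last stressed vowel.
    return phrase.strip().split()[-1].lower()[-3:]

def make_training_example(phrases: list[str]) -> str:
    # Prepend a structure descriptor (the <PREF>...</PREF> prefix) to a group.
    specs = [f"<LEN:{count_syllables(p)}><END:{rhyme_ending(p)}>" for p in phrases]
    return "\n".join(["<PREF>", *specs, "</PREF>", *phrases])

def build_corpus(sentences: list[str], max_group: int = 6) -> list[str]:
    # Split a plain, non-poetic corpus into groups of N phrases, N sampled randomly.
    examples, i = [], 0
    while i < len(sentences):
        n = random.randint(2, max_group)
        examples.append(make_training_example(sentences[i:i + n]))
        i += n
    return examples
```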
Generation
- Choose a rhyming scheme and number of syllables.
- Generate a structure descriptor. The authors build it from the chosen scheme by sampling each rhyme sound independently from the five most common rhyme sounds in the training corpus.
- Optionally, provide the first line of the poem.
- Generate a lot of poem candidates using the trained language model.
- Filter out candidates that do not satisfy the structure descriptor (wrong number of lines, syllable counts, or rhyme endings).
- Re-rank the remaining candidates by general fluency, scoring them with the same language model but without the structure descriptor, and output the highest-scoring one.
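Putting the steps together, the pipeline might look roughly like the sketch below. Here lm_generate and lm_score are hypothetical stand-ins for sampling from and scoring with the trained model, and count_syllables / rhyme_ending are the placeholder analyzers from the earlier sketch; none of this is the authors' actual code.

```python
import re

def satisfies(descriptor: str, poem: str) -> bool:
    # Check each generated line against its <LEN:n><END:e> specification.
    specs = re.findall(r"<LEN:(\d+)><END:(\w+)>", descriptor)
    lines = [l for l in poem.strip().split("\n") if l]
    return len(lines) == len(specs) and all(
        count_syllables(line) == int(n) and rhyme_ending(line) == end
        for line, (n, end) in zip(lines, specs)
    )

def generate_poem(descriptor: str, n_candidates: int = 100,
                  first_line: str | None = None) -> str:
    prompt = descriptor + ("\n" + first_line if first_line else "")
    candidates = [lm_generate(prompt) for _ in range(n_candidates)]  # step 4
    valid = [c for c in candidates if satisfies(descriptor, c)]      # step 5
    # Step 6: re-rank by fluency, scored without the structure descriptor.
    return max(valid, key=lm_score)
```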
How well does it work?
The filtering rate in step 5 is 30.9% for Spanish poems and 23.4% for Basque poems. When comparing poems that share the same first line, 37.3% of human judges preferred the automatically generated poems over those written by renowned poets.
Can you do the same in your language?
Reliable syllabification and rhyme detection are necessary to apply the described algorithm. Such tools may already exist for some languages, while others have additional features, such as rhythm, that need to be considered. In those cases the structure descriptors can be extended with additional components, as illustrated below.
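For instance, a descriptor extended with a hypothetical stress-pattern tag (my invention for illustration, not something from the paper) could look like this:
<PREF>
<LEN:11><STRESS:0101010101><END:echo>
<LEN:11><STRESS:0101010101><END:ura>
</PREF>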
Why is it important to me?
Six years ago, Daniil Anastasyev and I developed rupo, a system for Russian poem generation. It was an LSTM-based language model with some unusual features: it predicted text from right to left, modeled the normal forms of words separately from their grammatical features, and relied on finite-state acceptors. Natural language processing has advanced significantly since then, so building a similar system today would likely be much easier.
Draw Me a Flower: Processing and Grounding Abstraction in Natural Language
- Paper: Lachmy et al., 2022
- Organizations: Bar-Ilan University, AI2
- Code: https://github.com/OnlpLab/Hexagons, but there are no baselines yet, only the dataset itself.
- Main idea: Creating a benchmark for grounded abstractions in natural language with instruction-based pattern drawing on a hexagonal grid.
Motivation
We know large language models can’t count correctly or perform back-of-the-envelope calculations. Even a simple spatial reasoning task is a problem (chain-of-thought helps, though). But what about abstraction? When you command your hypothetical AI assistant, “order three pizzas, one BBQ, one Pepperoni, and one Margherita, first two large, the last medium, at 5 pm”, it should be able to understand you. It’s not only about ellipsis but also conditions, iterations, functional decomposition, recursion, and other mechanisms.
To measure the extent to which a model can grasp abstract concepts, we can ground language in a virtual world. Here, the authors used a hexagonal board with 10x18 tiles and eight colors as the basis for grounding abstractions.
Dataset
The dataset for this study was gathered through crowd-sourcing efforts. While the authors provided the starting images, crowd workers also contributed by drawing additional patterns. The annotation process was divided into two phases: in the first phase, a group of annotators wrote instructions based on the images, and in the second phase, another group attempted to recreate the images based on the instructions. Any discrepancies or disagreements were resolved through manual inspection. The resulting dataset has 175 unique images, 620 instruction sets, and 4177 instruction steps.
Experiments
Two types of models were tested: classification-based and generation-based. DeBERTa was used for classification, predicting the state of every tile; for generation, T5 produced a sequence of drawing actions. The models were tested under settings that varied in how much history and current-board information was available to them: no history, one previous step, full history, predicted board, and oracle board. The results indicate that the models performed significantly worse than humans and could only handle the most basic abstractions, even with access to an oracle board and full history.
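As a rough sketch of the classification setup (my reconstruction, not the authors' code), one could flatten the 10x18 board and predict a state per tile; treating "empty" as a ninth state alongside the eight colors is my assumption:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

BOARD_TILES = 10 * 18   # the paper's hexagonal grid, flattened
NUM_STATES = 8 + 1      # eight colors plus an assumed "empty" state

class TileClassifier(nn.Module):
    def __init__(self, encoder_name: str = "microsoft/deberta-v3-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # One linear head predicts a state for every tile at once.
        self.head = nn.Linear(self.encoder.config.hidden_size,
                              BOARD_TILES * NUM_STATES)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        logits = self.head(hidden[:, 0])   # first-token pooling
        return logits.view(-1, BOARD_TILES, NUM_STATES)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
batch = tokenizer(["Draw a red flower in the middle of the board."],
                  return_tensors="pt")
model = TileClassifier()
states = model(batch["input_ids"], batch["attention_mask"]).argmax(-1)
print(states.shape)   # torch.Size([1, 180]): one predicted state per tile
```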
Why is it important?
It is a great visual demonstration of how challenging this problem is for natural language models. The benchmark makes it possible to quickly identify which abstraction mechanisms these models lack. I suspect code-based models would perform better on this task and am interested in testing this hypothesis.
Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence
- Paper: Callison-Burch et al., 2022
- Organizations: University of Pennsylvania, Google Research
- Code: not yet released at the time of writing.
- Main idea: Creating a challenge for dialogue systems based on D&D conversations, where the tasks are to generate the next conversational turn in the game and predict the state of the game, given the dialogue history.
Motivation
Dungeons & Dragons is a fantasy tabletop role-playing game. Characters embark upon adventures within a fantasy setting. A Dungeon Master serves as the game’s referee and storyteller while maintaining the setting in which the adventures occur, and playing the role of the game world’s inhabitants, also referred to as non-player characters (NPCs). The characters form a party and interact with the setting’s inhabitants and each other. Together they solve dilemmas, engage in battles, explore, and gather treasure and knowledge. In the process, the characters earn experience points to rise in levels and become increasingly powerful over a series of separate gaming sessions. — Wikipedia
Many natural language processing datasets are highly specialized, focusing on a specific task. Dungeons and Dragons (D&D) is a human activity that requires a high level of language comprehension from all participants. It involves a range of skills such as text generation, knowledge base lookup, multi-party dialogue, goal setting, common sense reasoning, intent detection, state tracking, and question answering, making it an ideal testbed for evaluating the capabilities of NLP models.
Other applications of AI for D&D include character photo creation and, of course, the famous AI Dungeon.
Dataset
The authors scraped Play-by-Post data from the D&D Beyond web forum, where people play by taking turns posting their moves. It is not the only possible source of D&D sessions; the CRD3 dataset, for instance, used transcripts of the Critical Role show.
Rule-based heuristics built on regular expressions and NER were used to extract game-state information from the posts, with a CNN text classifier as a fallback for cases where the heuristics failed to extract anything. The dataset includes not only in-character texts but also out-of-character posts.
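To give a flavor of what such heuristics might look like, here is a hedged sketch with illustrative patterns of my own; these are not the paper's actual rules, and the CNN fallback is not shown.

```python
import re

DICE_ROLL = re.compile(r"\b\d+d\d+(?:\s*[+-]\s*\d+)?", re.I)  # e.g. "1d20+5"
DAMAGE    = re.compile(r"\btakes?\s+(\d+)\s+(?:points? of\s+)?damage\b", re.I)
LEVEL     = re.compile(r"\blevel\s+(\d+)\b", re.I)

def extract_state(post: str) -> dict:
    # Pull coarse game-state signals out of a single forum post.
    state = {}
    if m := DICE_ROLL.search(post):
        state["roll"] = m.group(0)
    if m := DAMAGE.search(post):
        state["damage_taken"] = int(m.group(1))
    if m := LEVEL.search(post):
        state["level"] = int(m.group(1))
    return state

print(extract_state("The goblin rolls 1d20+4 and Anya takes 7 damage."))
# {'roll': '1d20+4', 'damage_taken': 7}
```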
Experiments
LaMDA, Google’s large language model similar to GPT-3, was used to tackle two tasks: game-state tracking and response generation. The authors experimented with various fine-tuning setups, including using states from the current or previous turns as control features. To evaluate the model’s performance, six professional raters with an interest in the fantasy genre and prior experience with D&D, including three who had served as Dungeon Masters, were recruited for a manual assessment.
The evaluation shows that domain adaptation is beneficial, while the impact of the control features is less clear-cut. These features do, however, let the model take on specific roles within the game, which could make it a useful stand-in for a Dungeon Master or a player in actual D&D games.
The results for the game-state tracking task were weaker. The model was fed all previous dialogue turns with their corresponding state variables, plus the text of the current turn, and was expected to output the correct state variables for the current turn. Its joint accuracy was 58%. This suggests that a large language model alone is not sufficient for this task and that further modifications are needed to improve performance.
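For reference, joint accuracy, the standard metric in dialogue state tracking, counts a turn as correct only if every state variable is predicted correctly, which makes 58% harder to achieve than it sounds. The variable names below are made up for illustration:

```python
def joint_accuracy(predicted: list[dict], gold: list[dict]) -> float:
    # A turn counts as correct only if *all* its state variables match exactly.
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

gold = [{"hp": 12, "location": "cave"}, {"hp": 9, "location": "cave"}]
pred = [{"hp": 12, "location": "cave"}, {"hp": 9, "location": "forest"}]
print(joint_accuracy(pred, gold))  # 0.5: one wrong variable fails the whole turn
```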
Conclusion
In conclusion, the research discussed above highlights the ongoing challenges in the field and the areas where models still fall short. It is also worth considering the value of non-mainstream papers: they may offer unique insights and approaches that are easily overlooked in the rush to keep up with more widely recognized work.