

Poems, Flowers, and Dragons at EMNLP 2022


Source: towardsdatascience.com

“poems, flowers, dungeons and dragons united, digital art --ar 3:2 --v 4”, Midjourney

The EMNLP conference is a highly regarded event in the field of natural language processing, where researchers come together to share and discuss the latest findings in the field. This year’s conference took place from December 7th to December 11th in Abu Dhabi. Of the many papers presented at the conference, I wanted to highlight three that stood out to me. These papers may not necessarily be the most practical or well-known, but I believe they are worth mentioning. Two papers were presented as posters, while the third was a full talk. My favorite of the three is PoeLM.

PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised Poetry Generation

Motivation

Can modern language models write poems? Of course, they can. You can quickly test it with ChatGPT. The challenges arise when trying to impose specific constraints, such as a fixed number of syllables or a specific rhyme or rhythm scheme.

How can we force language models to generate formal verse poems? One way is to modify the decoding algorithm, which is complicated with modern language models as they operate with sub-words, which are neither words nor syllables. This paper describes another way to do it. For this to work, you will need a regular text corpus and a system capable of analyzing syllables and rhymes.

Training a language model

Figure from the paper, a proposed method.

Here is what you need to do:

  1. Get a regular, non-poetic corpus, and split it into phrases.
  2. Group the text in blocks of N phrases, where N is randomly sampled.
  3. Augment groups with structure descriptors (=prefixes) to include the number of syllables and rhyme endings for each phrase.
  4. Train a classic transformer language model with structure descriptors treated as ordinary tokens.

Figure from the paper. A formal verse poem and its associated structure descriptor.

A structure descriptor from the figure above is

<PREF>
<LEN:11><END:echo>
<LEN:11><END:ura>
<LEN:11><END:ura>
<LEN:11><END:echo>
</PREF>

This descriptor means four lines; each has 11 syllables; the first and last lines end with “echo”, and lines 2 and 3 end with “ura”. The model will learn how to use these codes, as generating texts using such hints is easier than without them.
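
To make the training-data recipe concrete, here is a minimal sketch of how blocks of phrases could be prefixed with structure descriptors. Only the `<PREF>`/`<LEN>`/`<END>` tag layout comes from the example above. The function names, the block-size range, and especially the toy syllable counter and rhyme extractor are assumptions for illustration; a real system would plug in a proper syllabification and rhyme analyzer for the target language.

```python
import random
import re
from typing import Callable, List


def group_phrases(phrases: List[str], min_n: int = 2, max_n: int = 6,
                  seed: int = 0) -> List[List[str]]:
    """Split phrases into consecutive blocks whose size N is sampled at random.

    The size range is an assumption; the last block may be shorter than min_n.
    """
    rng = random.Random(seed)
    blocks, i = [], 0
    while i < len(phrases):
        n = rng.randint(min_n, max_n)
        blocks.append(phrases[i:i + n])
        i += n
    return blocks


def make_training_example(phrases: List[str],
                          count_syllables: Callable[[str], int],
                          rhyme_ending: Callable[[str], str]) -> str:
    """Prefix a block of phrases with its structure descriptor."""
    tags = [f"<LEN:{count_syllables(p)}><END:{rhyme_ending(p)}>" for p in phrases]
    return "\n".join(["<PREF>", *tags, "</PREF>", *phrases])


def toy_syllables(phrase: str) -> int:
    """Crude stand-in: count groups of consecutive vowels."""
    return len(re.findall(r"[aeiouy]+", phrase.lower()))


def toy_rhyme(phrase: str) -> str:
    """Crude stand-in: last three letters of the last word."""
    return phrase.rstrip(".,!?").split()[-1][-3:].lower()


corpus = ["the cat sat", "on a red mat", "and looked at me", "for quite a while"]
for block in group_phrases(corpus):
    print(make_training_example(block, toy_syllables, toy_rhyme))
    print()
```

The resulting strings can then be fed to an ordinary language-model training loop, with the descriptor tags treated as regular tokens.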

Generation

  1. Choose a rhyming scheme and number of syllables.
  2. Generate a structure descriptor. The authors build it from the given scheme by sampling each rhyming sound independently from the five most common rhyme sounds in the training corpus.
  3. Optionally, provide the first line of the poem.
  4. Generate a lot of poem candidates using the trained language model.
  5. Filter out candidates that do not match the structure descriptor, i.e. those with the wrong number of lines, syllables, or rhyme endings.
  6. Re-rank remaining candidates by general fluency using the trained language model without a structure descriptor and output the one with the highest score.
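
The final stages of this pipeline boil down to a constraint filter followed by a re-ranking pass. The sketch below illustrates that idea under stated assumptions: `satisfies` and `pick_best` are hypothetical names, the syllable and rhyme analyzers are toy stand-ins, and `fluency` is a placeholder for the score that the language model would assign to a poem without its structure descriptor.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple

# One (syllable count, rhyme ending) pair per line of the poem.
Spec = Sequence[Tuple[int, str]]


def satisfies(poem: str, spec: Spec,
              count_syllables: Callable[[str], int],
              rhyme_ending: Callable[[str], str]) -> bool:
    """Check that every line has the requested length and rhyme ending."""
    lines = poem.splitlines()
    if len(lines) != len(spec):
        return False
    return all(
        count_syllables(line) == n and rhyme_ending(line) == end
        for line, (n, end) in zip(lines, spec)
    )


def pick_best(candidates: List[str], spec: Spec,
              count_syllables: Callable[[str], int],
              rhyme_ending: Callable[[str], str],
              fluency: Callable[[str], float]) -> Optional[str]:
    """Drop candidates that violate the spec, then return the most fluent one."""
    kept = [c for c in candidates if satisfies(c, spec, count_syllables, rhyme_ending)]
    return max(kept, key=fluency) if kept else None


# Toy analyzers, for illustration only.
def toy_syllables(line: str) -> int:
    return len(re.findall(r"[aeiouy]+", line.lower()))


def toy_rhyme(line: str) -> str:
    return line.split()[-1][-2:].lower()


candidates = [
    "good day\nmay stay",     # fits the spec
    "hello there\nmay stay",  # first line is too long
    "blue sky\nsome way",     # first line has the wrong ending
    "new day\nold way",       # fits the spec
]
spec = [(2, "ay"), (2, "ay")]

# Placeholder fluency score: shorter is "better". A real system would use
# the language model's own score here.
best = pick_best(candidates, spec, toy_syllables, toy_rhyme, lambda p: -len(p))
print(best)
```

Passing the analyzers and the scorer in as callables keeps the filtering logic independent of the language, which matches the spirit of the method: only the syllable and rhyme tools are language-specific.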

How well does it work?

Table from the paper. Percentage of times that system S1 is ranked ahead of S2 in the human evaluation.

The filtering rate from step 5 is 30.9% for Spanish poems and 23.4% for Basque poems. In the human evaluation, the generated poems were ranked ahead of those written by renowned poets 37.3% of the time when comparing poems with the same first line.

Can you do the same in your language?

Reliable syllabification and rhyme detection are necessary to use the described algorithm. While such tools may already exist for some languages, other languages may have more complex features, such as rhythm, that need to be considered. In these cases, the structure descriptors can be extended with additional components.

Why is it important to me?

Six years ago, Daniil Anastasyev and I developed rupo, a system for Russian poem generation. It was an LSTM-based language model with some unique features: it predicted texts from right to left, it predicted the normal forms of words and their grammatical features separately, and it was based on finite-state acceptors. Since then, natural language processing technologies have advanced significantly, so a similar system would likely be easier to build today.

Draw Me a Flower: Processing and Grounding Abstraction in Natural Language

  • Paper: Lachmy et al., 2022
  • Organizations: Bar-Ilan University, AI2
  • Code: https://github.com/OnlpLab/Hexagons, but there are no baselines yet, only the dataset itself.
  • Main idea: Creating a benchmark for grounded abstractions in natural language with instruction-based pattern drawing on a hexagonal grid.

Figure from the paper, levels of abstraction in natural language

Motivation

We know large language models can’t count correctly or perform back-of-the-envelope calculations. Even a simple spatial reasoning task is a problem (chain-of-thought helps, though). But what about abstraction? When you command your hypothetical AI assistant, “order three pizzas, one BBQ, one Pepperoni, and one Margherita, first two large, the last medium, at 5 pm”, it should be able to understand you. It’s not only about ellipsis but also conditions, iterations, functional decomposition, recursion, and other mechanisms.

To measure the extent to which a model can grasp abstract concepts, we can ground it in various virtual worlds. In this case, the authors used a hexagonal board with 10x18 tiles and eight colors as the basis for grounding abstractions.

Dataset

The dataset for this study was gathered through crowd-sourcing efforts. While the authors provided the starting images, crowd workers also contributed by drawing additional patterns. The annotation process was divided into two phases: in the first phase, a group of annotators wrote instructions based on the images, and in the second phase, another group attempted to recreate the images based on the instructions. Any discrepancies or disagreements were resolved through manual inspection. The resulting dataset has 175 unique images, 620 instruction sets, and 4177 instruction steps.

Figure from the paper, a gallery sample.

Experiments

Two types of models were tested: classification-based and generation-based. For classification, DeBERTa predicted the state of every tile. For generation, T5 produced a set of actions. The models were tested under settings that varied in how much history and current board information was available to them: no history, one previous step, full history, predicted board, and oracle board. The results indicate that the models performed significantly worse than humans and could only handle the most basic abstractions, even with access to an oracle board and full history.
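
The two model families differ mainly in their output format: a full board state versus a list of drawing actions. The following sketch shows how the two views relate on a 10x18 board. It is an illustration only: the row/column order, the `(row, column, color)` action tuple, and the color names are assumptions, not the dataset's actual schema.

```python
from typing import List, Tuple

ROWS, COLS = 10, 18   # board size from the paper; row/column order is assumed
BLANK = "blank"       # assumed label for an unpainted tile

Board = List[List[str]]
Action = Tuple[int, int, str]  # (row, column, color), a hypothetical format


def empty_board() -> Board:
    return [[BLANK] * COLS for _ in range(ROWS)]


def apply_actions(board: Board, actions: List[Action]) -> Board:
    """Generation view: paint tiles one action at a time on a copy of the board."""
    new_board = [row[:] for row in board]
    for r, c, color in actions:
        new_board[r][c] = color
    return new_board


def diff_actions(before: Board, after: Board) -> List[Action]:
    """Classification view: recover actions from per-tile states that changed."""
    return [
        (r, c, after[r][c])
        for r in range(ROWS)
        for c in range(COLS)
        if before[r][c] != after[r][c]
    ]


start = empty_board()
step = [(0, 0, "red"), (0, 1, "red"), (1, 0, "blue")]
result = apply_actions(start, step)
print(diff_actions(start, result))
```

Because the two representations can be converted into each other, both model types can be scored with the same action-based metrics.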

Table from the paper. Results for both types of models on the test set, actions-based metrics.
Table from the paper. Dataset evaluation, human performance.

Why is it important?

It is a great visual demonstration of how challenging this problem is for natural language models. The benchmark makes it possible to quickly identify which abstraction mechanisms these models lack. I suspect code-based models would perform better on this task, and I am interested in testing this hypothesis.

Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence

  • Paper: Callison-Burch et al., 2022
  • Organizations: University of Pennsylvania, Google Research
  • Code: not yet released, should be here
  • Main idea: Creating a challenge for dialogue systems based on D&D conversations, where the tasks are to generate the next conversational turn in the game and predict the state of the game, given the dialogue history.

“robots playing D&D, digital art, futuristic --ar 3:2 --v 4”, Midjourney

Motivation

Dungeons & Dragons is a fantasy tabletop role-playing game. Characters embark upon adventures within a fantasy setting. A Dungeon Master serves as the game’s referee and storyteller while maintaining the setting in which the adventures occur, and playing the role of the game world’s inhabitants, also referred to as non-player characters (NPCs). The characters form a party and interact with the setting’s inhabitants and each other. Together they solve dilemmas, engage in battles, explore, and gather treasure and knowledge. In the process, the characters earn experience points to rise in levels and become increasingly powerful over a series of separate gaming sessions. — Wikipedia

Many natural language processing datasets are highly specialized, focusing on a specific task. Dungeons and Dragons (D&D) is a human activity that requires a high level of language comprehension from all participants. It involves a range of skills such as text generation, knowledge base lookup, multi-party dialogue, goal setting, common sense reasoning, intent detection, state tracking, and question answering, making it an ideal testbed for evaluating the capabilities of NLP models.

Other applications of AI for D&D include character photo creation and, of course, the famous AI Dungeon.

Dataset

Figure from the paper. Example of 3 turns in the D&D Beyond play-by-post forum.

The authors scraped play-by-post data from the D&D Beyond web forum, where people play by taking turns posting on the forum to describe their moves. It isn’t the only possible source of D&D sessions. For instance, the CRD3 dataset used transcripts from the Critical Role show.

Table from the paper, dataset statistics.

Rule-based heuristics were used to extract game state information from texts using regular expressions and NER. In addition, a CNN classifier for texts was used in cases where heuristics failed to extract anything. The dataset includes not only in-character texts but also out-of-character posts.
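
As a rough idea of what a regular-expression heuristic for such posts could look like, the sketch below extracts dice rolls written in standard D&D notation (for example `1d20+5`). The pattern and the output format are assumptions for illustration, not the authors' actual rules. An empty result marks the kind of case where a fallback classifier would be needed.

```python
import re
from typing import Dict, List

# Standard dice notation: <count>d<sides> with an optional +/- modifier.
DICE = re.compile(r"\b(\d+)d(\d+)(?:\s*([+-])\s*(\d+))?")


def extract_dice_rolls(post: str) -> List[Dict[str, int]]:
    """Return every dice expression found in a forum post."""
    rolls = []
    for m in DICE.finditer(post):
        sign = -1 if m.group(3) == "-" else 1
        modifier = sign * int(m.group(4)) if m.group(4) else 0
        rolls.append({
            "count": int(m.group(1)),
            "sides": int(m.group(2)),
            "modifier": modifier,
        })
    return rolls


post = "I attack the goblin! Rolling 1d20+5 to hit, then 2d6 damage."
print(extract_dice_rolls(post))
print(extract_dice_rolls("We should rest before entering the cave."))
```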

Experiments

LaMDA, Google’s large language model similar to GPT-3, was used to tackle two tasks: game state tracking and response generation. The authors experimented with several fine-tuning variants of the model, including using states from the current or previous turns as control features. To evaluate the model’s performance, six professional raters with an interest in the fantasy genre and prior experience with D&D, including three who had served as Dungeon Masters, were recruited for a manual assessment.

Table from the paper. Average human evaluators’ scores for systems and human-written gold responses.

The evaluation results show that domain adaptation is beneficial, but the impact of control features is less clear. However, these features enable the model to take on specific roles within the game, which could make it a valuable substitute for a Dungeon Master or a player in actual D&D games.

Table from the paper. Average accuracy for GST compared to a majority class baseline.

The results for the game state tracking task were less encouraging. The model was fed all previous dialog turns and their corresponding state variables, as well as the text of the current turn, and was expected to output the correct state variables for the current turn. The joint accuracy for the model was 58%. These results suggest that a large language model alone is not sufficient for this task and that further modifications may be necessary to improve performance.
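
To clarify what joint accuracy measures, here is a minimal sketch that assumes the usual dialogue-state-tracking definition: a turn counts as correct only if every state variable matches the gold annotation. The paper's exact computation may differ, and the variable names in the example are hypothetical.

```python
from typing import Dict, List

State = Dict[str, str]


def joint_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns where the whole predicted state equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def slot_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of individual state variables predicted correctly."""
    total = sum(len(g) for g in gold)
    if total == 0:
        return 0.0
    correct = sum(p.get(k) == v for p, g in zip(predicted, gold) for k, v in g.items())
    return correct / total


# Hypothetical state variables, for illustration only.
gold = [
    {"in_combat": "yes", "action": "attack"},
    {"in_combat": "no", "action": "talk"},
]
predicted = [
    {"in_combat": "yes", "action": "attack"},
    {"in_combat": "no", "action": "move"},
]
print(joint_accuracy(predicted, gold), slot_accuracy(predicted, gold))
```

Joint accuracy is the stricter of the two numbers: a single wrong variable makes the whole turn count as incorrect, which is one reason scores on this metric tend to look low.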

Conclusion

The three papers above each expose a gap in current language models: enforcing formal constraints on generated text, grounding abstract instructions, and tracking state in multi-party dialogue. It is worth paying attention to non-mainstream papers, as they may offer unique insights and approaches that are easily overlooked in the rush to keep up with more widely recognized work.


Poems, Flowers, and Dragons at EMNLP 2022 was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
