This AI paper proposes a recursive memory generation method to enhance long-term conversational coherence in large language models


Interest and research in chatbots and other open-domain dialogue systems have grown in recent years. Long-term conversation is a challenging setting because it requires understanding and remembering important points from earlier exchanges.

Large language models (LLMs) such as ChatGPT and GPT-4 have shown encouraging results on many recent natural language tasks. As a result, open-domain and task-oriented chatbots are now built by prompting LLMs. However, in long conversations, even ChatGPT can lose track of context and give inconsistent answers.

Researchers from the Chinese Academy of Sciences and the University of Sydney investigate whether LLMs can be used efficiently in long-term conversations without labeled data or additional tools. Inspired by memory-augmented approaches, they use the LLM itself to generate recursive summaries that serve as memory, preserving the important information from the ongoing conversation. In practice, the LLM is first given a short dialogue context and asked to summarize it. It is then asked to combine the previous memory with the subsequent utterances to produce a new summary, which becomes the updated memory. Finally, the LLM is asked to generate a response based on the most recent memory.
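The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` argument is a hypothetical callable that maps a prompt string to a completion string, the prompt wording is invented for the example, and the function names `update_memory`, `respond`, and `chat_over_sessions` do not come from the paper.

```python
from typing import Callable, List, Tuple

# Assumption: any function that takes a prompt and returns the model's text.
LLM = Callable[[str], str]


def update_memory(llm: LLM, memory: str, new_turns: List[str]) -> str:
    """Ask the model to fold the newest turns into the running summary."""
    prompt = (
        "Previous memory:\n" + (memory or "(empty)") + "\n\n"
        "New dialogue:\n" + "\n".join(new_turns) + "\n\n"
        "Write an updated memory that keeps the important facts."
    )
    return llm(prompt)


def respond(llm: LLM, memory: str, user_message: str) -> str:
    """Ask the model for a reply conditioned on the latest memory only."""
    prompt = (
        "Memory of the conversation so far:\n" + memory + "\n\n"
        "User: " + user_message + "\nAssistant:"
    )
    return llm(prompt)


def chat_over_sessions(
    llm: LLM, sessions: List[List[str]], user_message: str
) -> Tuple[str, str]:
    """Recursively summarize each session, then answer from the final memory."""
    memory = ""  # the first update summarizes the first session on its own
    for turns in sessions:
        # Each new memory is built from the previous memory plus the new turns.
        memory = update_memory(llm, memory, turns)
    return memory, respond(llm, memory, user_message)
```

In this sketch every prompt contains only the current memory plus the newest turns, so its size does not grow with the full history as long as the model keeps the summary short.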


The proposed scheme could serve as a viable way to let existing LLMs model extremely long contexts (conversation sessions) and long-term discourse without a costly extension of the maximum context length.

The applicability of the proposed scheme is demonstrated experimentally on a public long-term dialogue dataset using the ChatGPT and text-davinci-003 LLM APIs. Moreover, the study shows that using a single labeled sample can significantly improve the performance of the proposed strategy.
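One plausible way to use a single labeled sample is as an in-context demonstration placed before the real input. The sketch below assumes that setup; the article does not spell out the prompt format, so the function names `format_case` and `build_memory_prompt`, the prompt wording, and the `Demo` tuple layout are all hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: a demonstration is (previous memory, new dialogue, reference memory).
Demo = Tuple[str, List[str], str]


def format_case(memory: str, turns: List[str]) -> str:
    """Render one memory-update case, ending where the model should continue."""
    return (
        "Previous memory: " + (memory or "(empty)") + "\n"
        "New dialogue:\n" + "\n".join(turns) + "\n"
        "Updated memory: "
    )


def build_memory_prompt(
    memory: str, new_turns: List[str], demo: Optional[Demo] = None
) -> str:
    """Build a zero-shot prompt, or a one-shot prompt if a demo is supplied."""
    parts = ["Update the memory so it keeps the important facts from the dialogue."]
    if demo is not None:
        demo_memory, demo_turns, demo_summary = demo
        # The labeled sample shows the model one fully worked case.
        parts.append(format_case(demo_memory, demo_turns) + demo_summary)
    # The real case is left open for the model to complete.
    parts.append(format_case(memory, new_turns))
    return "\n\n".join(parts)
```

Calling `build_memory_prompt` without `demo` gives the unlabeled setting; passing one hand-written `(memory, dialogue, summary)` triple gives the single-sample setting with no other change to the pipeline.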

The researchers ask an arbitrary large language model to perform two tasks: memory management and response generation. The former iteratively summarizes the important details of the ongoing conversation, and the latter uses that memory to produce appropriate responses.

In this study, the team used only automatic metrics to judge the effectiveness of the proposed method, which may not be optimal for open-domain chatbots. In real-world applications, the cost of calling very large models cannot be ignored, and their solution does not take it into account.

In the future, the researchers plan to test the effectiveness of their approach on other long-context tasks, including story generation. They also plan to improve the summarization capabilities of their method by using a locally fine-tuned (supervised) LLM instead of expensive online APIs.

Check out the paper. All credit for this research goes to the researchers on this project.


Dhanashree Shenwai is a Computer Science Engineer with a keen interest in the applications of AI and has good experience in FinTech companies covering Financial, Cards & Payments and Banking domains. She is passionate about discovering new technologies and advancements that make everyone’s life easier in today’s developing world.
