DeepMind has found that LLMs can optimize their own prompts


When people build new deep learning models — which can themselves learn to focus on the right features of the data — the vast majority rely on optimization algorithms, or optimizers, to push the models toward high accuracy. But one of the most commonly used classes of optimizers — derivative-based optimizers — has difficulty handling real-world applications where gradient information is unavailable.

In a new paper, DeepMind researchers propose a new approach: Optimization by PROmpting (OPRO), a method that uses large language models (LLMs) as optimizers. The unique aspect of this approach is that the optimization task is defined in natural language rather than through formal mathematical definitions.

The researchers write, “Instead of formally defining an optimization problem and obtaining an update step with a programmed solver, we describe the optimization problem in natural language, then instruct the LLM to iteratively generate new solutions based on the problem description and previously found solutions.”

The technique is highly adaptable. By simply changing the problem description or adding specific instructions, the LLM can be guided to solve a wide range of problems.

The researchers found that, on small-scale optimization problems, LLMs can generate effective solutions through prompting alone, sometimes matching or even surpassing the performance of expert-designed heuristic algorithms. However, OPRO’s true potential lies in its ability to optimize the prompts given to LLMs to obtain maximum accuracy from the models.

How Optimization by PROmpting works

OPRO’s process starts with a “meta-prompt” as input. This meta-prompt includes a natural language description of the task at hand, a few example problems, placeholders for prompt instructions, and previously found solutions with their scores.
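To make the structure concrete, the meta-prompt can be thought of as a template that stitches these pieces together. The sketch below assembles one such prompt; the function name, wording, and field layout are illustrative assumptions, not the exact template from the paper.

```python
def build_meta_prompt(task_description, scored_solutions, example_problems):
    """Assemble an OPRO-style meta-prompt: a task description, prior
    solutions with scores (ascending, so the best appear last), and a
    few example problems. Wording is a paraphrase, not the paper's template."""
    trajectory = "\n".join(
        f"text: {solution}\nscore: {score}"
        for solution, score in sorted(scored_solutions, key=lambda p: p[1])
    )
    examples = "\n".join(example_problems)
    return (
        f"{task_description}\n\n"
        f"Below are previous solutions with their scores:\n{trajectory}\n\n"
        f"Example problems:\n{examples}\n\n"
        "Write a new solution that is different from the ones above "
        "and achieves a higher score."
    )
```

Listing earlier solutions in ascending score order (so the best appear last, closest to the generation point) mirrors how OPRO presents the optimization trajectory to the model.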

As the optimization process unfolds, the LLM generates candidate solutions based on the problem description and the previous solutions included in the meta-prompt.

OPRO then evaluates these candidate solutions, assigning each a quality score. The best solutions and their scores are added to the meta-prompt, enriching the context for the next round of solution generation. This iterative process continues until the model stops proposing better solutions.
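The generate–score–append loop described above can be sketched generically. Everything here is a simplified illustration: `generate`, `score`, and `meta_prompt_fn` are hypothetical stand-ins for the LLM call, the evaluator, and the meta-prompt builder, and the toy demo replaces the LLM with a deterministic stub.

```python
def opro_loop(generate, score, meta_prompt_fn, seed_solutions,
              steps=10, keep=5):
    """Score the seeds, then repeatedly: build a meta-prompt from the
    current scored solutions, ask the generator for candidates, score
    them, keep the top-k, and stop once no better solution appears."""
    scored = [(s, score(s)) for s in seed_solutions]
    for _ in range(steps):
        best_before = max(v for _, v in scored)
        candidates = generate(meta_prompt_fn(scored))
        scored += [(c, score(c)) for c in candidates]
        scored = sorted(scored, key=lambda p: p[1])[-keep:]  # keep top-k
        if max(v for _, v in scored) <= best_before:
            break  # no improvement this round; stop iterating
    return max(scored, key=lambda p: p[1])

# Toy demo: "optimize" an integer toward 7. A real generator would call
# an LLM with the meta-prompt; this stub just perturbs the largest value
# it can parse out of the prompt.
TARGET = 7

def toy_score(x):
    return -abs(x - TARGET)  # higher is better, 0 is optimal

def toy_generate(meta_prompt):
    best = max(int(tok) for tok in meta_prompt.split())
    return [best - 1, best + 1]

def toy_meta_prompt(scored):
    return " ".join(str(s) for s, _ in scored)

best_solution, best_score = opro_loop(
    toy_generate, toy_score, toy_meta_prompt, seed_solutions=[0]
)
```

The stopping rule here (break when a round yields no improvement) is one simple reading of “continues until the model stops proposing better solutions”; a production system might instead use a fixed budget or patience window.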

“The main advantage of LLMs for optimization is their ability to understand natural language, which allows people to describe their optimization tasks without formal specifications,” the researchers explain.

This means users can specify target metrics such as “accuracy” and also provide other suggestions. For example, they may request that the model produce solutions that are concise and broadly applicable.

OPRO also capitalizes on LLMs’ ability to recognize in-context patterns. This enables the model to identify a promising optimization direction from the examples included in the meta-prompt. The researchers noted, “Including the optimization trajectory in the meta-prompt allows the LLM to identify similarities of solutions with high scores, encouraging the LLM to build upon existing good solutions to construct potentially better ones without the need of explicitly defining how the solution should be updated.”

To verify OPRO’s effectiveness, the researchers tested it on two well-known mathematical optimization problems: linear regression and the “traveling salesman problem.” Although OPRO is not the most efficient way to solve these problems, the results are promising.

“On both tasks, we observe that LLMs correctly capture optimization directions on small-scale problems based only on past optimization trajectories provided in the meta-prompt,” the researchers report.
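For the linear-regression case, the trajectory in the meta-prompt is just a list of previously tried parameter values and their objective values, from which the LLM proposes a better pair. The snippet below builds such a prompt; the numbers and the exact wording are illustrative assumptions, not figures from the paper.

```python
# Illustrative meta-prompt for the linear-regression task: the LLM sees
# past (w, b) pairs with their loss and is asked for a better pair.
past_pairs = [(2, 1, 120.0), (3, 2, 58.5), (4, 3, 11.2)]  # (w, b, loss)

trajectory = "\n".join(
    f"w={w}, b={b}, value={loss}"
    for w, b, loss in sorted(past_pairs, key=lambda t: -t[2])  # worst first
)

meta_prompt = (
    "Below are some previous (w, b) pairs and their function values, "
    "where lower is better.\n"
    f"{trajectory}\n"
    "Give a new (w, b) pair that is different from all pairs above and "
    "has a function value lower than any of the above."
)
```

Seeing that the loss shrinks as w and b grow is exactly the kind of “optimization direction” the researchers report the LLM picking up from the trajectory alone.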

Optimizing the LLM prompt with OPRO

Experiments show that prompt engineering can dramatically affect the output of the model. For example, adding the phrase “let’s think step-by-step” to the prompt can make the model more logical, allowing it to outline the steps needed to solve the problem. This can often lead to more accurate results.

However, it is important to remember that this does not mean that LLMs have human-like reasoning abilities. Their responses are highly dependent on the wording of the prompt, and semantically equivalent prompts can produce widely different results. “Optimal prompt formats can be model-specific and task-specific,” the DeepMind researchers write.

The real potential of Optimization by PROmpting lies in its ability to optimize prompts for LLMs like OpenAI’s ChatGPT and Google’s PaLM, guiding these models toward the prompts that maximize task accuracy.

“OPRO enables the LLM to gradually generate new prompts that improve the task accuracy throughout the optimization process, where the initial prompts have low task accuracies,” they write.

To illustrate this, consider the task of finding the optimal prompt for solving word-math problems. An “optimizer LLM” is provided with a meta-prompt that includes instructions and examples, with placeholders for the optimized instruction (e.g., “Let’s think step-by-step”). The model generates a set of different candidate instructions and sends them to a “scorer LLM.” This scorer tries each instruction on example problems and evaluates the results. The best instructions, along with their scores, are added to the beginning of the meta-prompt, and the process repeats.
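This optimizer/scorer split can be sketched as two cooperating functions. The function names, the scoring-by-exact-match rule, and the stubbed model behavior below are all assumptions for illustration; a real setup would call two actual LLM endpoints.

```python
def optimize_prompt(optimizer_llm, scorer_llm, train_examples,
                    seed_instruction, rounds=3, samples=4):
    """Optimizer LLM proposes instruction strings; scorer LLM answers the
    training problems under each instruction; accuracy is the score."""
    def accuracy(instruction):
        hits = sum(scorer_llm(f"{q}\n{instruction}") == answer
                   for q, answer in train_examples)
        return hits / len(train_examples)

    scored = [(seed_instruction, accuracy(seed_instruction))]
    for _ in range(rounds):
        # Best-scoring instructions appear last, as in OPRO's trajectory.
        meta = "\n".join(f"text: {p}\nscore: {s:.2f}"
                         for p, s in sorted(scored, key=lambda x: x[1]))
        proposals = optimizer_llm(meta, n=samples)
        scored += [(p, accuracy(p)) for p in proposals]
        scored = sorted(scored, key=lambda x: x[1])[-8:]  # keep the top few
    return max(scored, key=lambda x: x[1])[0]
```

In this sketch the scorer’s accuracy on a small training set plays the role of the quality score from the general OPRO loop, and the returned string is the instruction that scored highest.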

The researchers evaluated this technique using several LLMs from the PaLM and GPT families. They found that “all LLMs in our evaluation are able to act as optimizers, consistently improving the performance of generated prompts through iterative optimization until convergence.”

For example, when testing OPRO with PaLM-2 on GSM8K, a benchmark of grade school math word problems, the model produced interesting results. It started with the prompt “Let’s solve the problem” and generated other strings, such as “Let’s think about the problem carefully and solve it together,” “Let’s break it down,” “Let’s calculate a way to solve it,” and finally “Let’s do the math,” which provided the highest accuracy.

In another experiment, the most accurate results were produced when the string “take a deep breath and work through this problem step by step” was added before the LLM’s answer.

These results are both fascinating and somewhat disturbing. To a human, all these instructions would mean the same thing, yet they triggered very different behavior in the LLMs. It serves as a caution against anthropomorphizing LLMs and highlights how much we still have to learn about their inner workings.

However, the advantage of OPRO is clear. It provides a systematic way to explore the vast space of possible LLM prompts and find the one that works best for a particular type of problem. How it will hold up in real-world applications remains to be seen, but this research could be a step toward understanding how LLMs work.
