Understanding the Chain of Draft Framework in AI LLM Prompting
- Alick Mouriesse
- Mar 12
- 3 min read
Updated: Mar 13
Thinking Faster By Writing Less
We are diving into the rapidly evolving world of AI, and it's essential to stay ahead of the curve with innovative prompting strategies.

One of the latest advancements is the Chain of Draft framework, which redefines how large language models (LLMs) process information. At University 365, we recognize the importance of such innovations in equipping our students and professionals with the skills necessary to thrive in the AI-driven job market.
What is the Chain of Draft Framework?
The Chain of Draft is a novel prompting strategy that significantly enhances the efficiency of thinking models compared to its predecessor, Chain of Thought. While Chain of Thought has been instrumental in advancing AI capabilities, it often requires extensive computational resources and produces verbose outputs, leading to increased latency. The Chain of Draft, developed by researchers from Zoom Communications, aims to streamline this process by encouraging models to produce concise, dense outputs, closely mirroring human cognitive processes.
Why Shift from Chain of Thought to Chain of Draft?
Chain of Thought enables models to break down problems step-by-step, reflecting a structured reasoning process similar to human thinking. However, this approach can be inherently slow and resource-intensive due to the high number of tokens processed before arriving at a final output. The Chain of Draft addresses these challenges by promoting a more efficient way of thinking, allowing LLMs to generate essential insights without unnecessary elaboration.
How Does Chain of Draft Work?
Instead of generating verbose intermediate steps, Chain of Draft encourages LLMs to focus on critical information, enabling them to advance towards solutions more efficiently. This method allows for a more human-like approach to problem-solving, where only the most pertinent information is captured, significantly reducing computational overhead.
For example, consider a simple math problem: "Jason had 20 lollipops. After giving some to Denny, he now has 12. How many did he give to Denny?"
Using Chain of Thought, the model might elaborate extensively on the steps involved, resulting in a lengthy output. In contrast, Chain of Draft would succinctly capture the essence of the problem:
20 - x = 12
x = 20 - 12
x = 8
This approach not only arrives at the correct answer but does so in a fraction of the tokens used by the Chain of Thought method.
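The contrast between the two styles comes down to the instruction given to the model. The sketch below builds a prompt for the lollipop problem in either style; the exact instruction wording is an illustrative assumption (loosely modeled on published Chain of Draft prompts), not an official specification, and `build_prompt` is a hypothetical helper.

```python
# Minimal sketch contrasting Chain of Thought and Chain of Draft prompting.
# The instruction strings are assumptions for illustration.

def build_prompt(question: str, style: str) -> str:
    """Return a single-turn prompt for the chosen reasoning style."""
    if style == "chain_of_thought":
        instruction = "Think step by step to answer the question."
    elif style == "chain_of_draft":
        instruction = (
            "Think step by step, but keep only a minimum draft for each "
            "thinking step, with 5 words at most. Return the final answer "
            "after a '####' separator."
        )
    else:
        raise ValueError(f"unknown style: {style}")
    return f"{instruction}\n\nQ: {question}\nA:"

question = (
    "Jason had 20 lollipops. After giving some to Denny, he now has 12. "
    "How many did he give to Denny?"
)
print(build_prompt(question, "chain_of_draft"))

# A model following the draft instruction might respond with something like:
# 20 - x = 12; x = 8  #### 8
```

The only difference between the two prompts is the instruction line, which is exactly why the technique requires no model changes.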
Comparative Performance: Chain of Draft vs. Chain of Thought
Research indicates that Chain of Draft performs on par with or exceeds the Chain of Thought method in terms of accuracy while drastically reducing the number of tokens and latency. For instance, in tests with the GPT-4 model, Chain of Thought achieved 95% accuracy but used 200 tokens with a latency of 4.2 seconds. In contrast, Chain of Draft maintained an impressive 91.1% accuracy, utilizing only 43 tokens and completing the task in just one second.
Challenges with the Chain of Thought
While Chain of Thought has proven effective, it can lead to overthinking, especially with simpler tasks. This often results in unnecessary resource consumption. Techniques like streaming have been introduced to alleviate perceived latency by showing intermediate steps, but they do not fully resolve the inherent inefficiencies.
Moreover, other evolving techniques such as the Skeleton of Thought aim to streamline the process further but still face limitations in computational costs and applicability to complex tasks.
Implementing the Chain of Draft Framework
The beauty of the Chain of Draft framework lies in its simplicity. It does not require extensive model updates or fine-tuning; instead, it focuses on modifying the prompting strategy. By instructing the model to provide minimal drafts for each thinking step, users can guide it to deliver concise yet effective outputs. For instance, prompts can specify a word limit for each step, ensuring brevity and clarity.
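In practice, this amounts to a system prompt with a per-step word limit plus a small amount of parsing. The sketch below assumes an OpenAI-style message format; `COD_SYSTEM_PROMPT` and both helper functions are hypothetical names introduced here for illustration, and the actual model call is left as a placeholder.

```python
# A minimal sketch of applying Chain of Draft purely at the prompt level,
# with no fine-tuning. The message schema mirrors common chat APIs; the
# prompt wording and helper names are assumptions for illustration.

COD_SYSTEM_PROMPT = (
    "Think step by step, but keep only a minimum draft for each thinking "
    "step, with at most {max_words} words per step. "
    "Return the final answer after a '####' separator."
)

def make_cod_messages(question: str, max_words: int = 5) -> list[dict]:
    """Build a chat-style message list that enforces concise drafts."""
    return [
        {"role": "system", "content": COD_SYSTEM_PROMPT.format(max_words=max_words)},
        {"role": "user", "content": question},
    ]

def extract_answer(response_text: str) -> str:
    """Pull the final answer out of a '####'-delimited response."""
    return response_text.rsplit("####", 1)[-1].strip()

messages = make_cod_messages(
    "Jason had 20 lollipops. He now has 12. How many did he give to Denny?"
)
# response_text = your_chat_client(messages)  # placeholder for any LLM call
```

Because the word limit lives in the system prompt, tightening or loosening it is a one-line change, which makes it easy to experiment with how much brevity a given model tolerates before accuracy drops.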