
LM-Guided Chain-of-Thought

Overview

Lee et al. (2024) propose LM-guided chain-of-thought, a method that uses a small language model to improve reasoning in LLMs. The goal is to strengthen reasoning without the compute cost of fine-tuning the large model.

Methodology

Knowledge Distillation Approach

The framework first applies knowledge distillation to a small LM, training it on rationales generated by the large LM, with the aim of narrowing the gap in reasoning capability between the two models.
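
As a rough illustration of the distillation step, the sketch below collects rationales from a teacher model and pairs them with their prompts to form a training set for the small LM. The function name `build_distillation_set`, the prompt template, and the record fields are all assumptions made for this example, not the paper's actual data format. The teacher is any callable that maps a prompt to text.

```python
from typing import Callable, Iterable


def build_distillation_set(
    examples: Iterable[dict], teacher: Callable[[str], str]
) -> list:
    """Pair each prompt with a teacher-written rationale (hypothetical format)."""
    records = []
    for ex in examples:
        # Assumed prompt template; the paper's exact wording is not reproduced here.
        prompt = (
            f"Context: {ex['context']}\n"
            f"Question: {ex['question']}\n"
            "Rationale:"
        )
        # The teacher's output becomes the target the small LM learns to imitate.
        records.append({"input": prompt, "target": teacher(prompt)})
    return records


# Stand-in for the large LM so the sketch runs without model weights.
def toy_teacher(prompt: str) -> str:
    return "Step 1: find the director. Step 2: look up the birthplace."


train_set = build_distillation_set(
    [{"context": "(passages)", "question": "Where was the director born?"}],
    toy_teacher,
)
```

The resulting records could then be fed to any standard sequence-to-sequence fine-tuning loop for the small LM.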

Task Decomposition Strategy

Essentially, the rationale is generated by the lightweight LM and the answer prediction is then left for the frozen large LM. This resource-efficient approach avoids the need to fine-tune the large model and instead offloads the rationale generation to the small language model.
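
The two-stage split described above can be sketched as follows. This is a minimal illustration, assuming both models are exposed as plain text-in, text-out callables. The function `lm_guided_cot`, the prompt wording, and the toy models are hypothetical and stand in for real model calls.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to generated text.
TextGenerator = Callable[[str], str]


def lm_guided_cot(
    question: str, context: str, small_lm: TextGenerator, large_lm: TextGenerator
) -> dict:
    """Small LM writes the rationale; the frozen large LM only predicts the answer."""
    # Stage 1: rationale generation by the lightweight (trainable) model.
    rationale = small_lm(
        f"Context: {context}\nQuestion: {question}\nRationale:"
    )
    # Stage 2: answer prediction by the large model, conditioned on the rationale.
    answer = large_lm(
        f"Question: {question}\nRationale: {rationale}\nAnswer:"
    )
    return {"rationale": rationale, "answer": answer.strip()}


# Toy stand-ins so the sketch runs without any model weights.
def toy_small_lm(prompt: str) -> str:
    return "The film was directed by X, who was born in Paris."


def toy_large_lm(prompt: str) -> str:
    return " Paris" if "born in Paris" in prompt else " unknown"


result = lm_guided_cot(
    "Where was the director of the film born?",
    "(passages)",
    toy_small_lm,
    toy_large_lm,
)
```

Because the large model is only called for inference, it can sit behind an API while all training happens on the small model.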

Reinforcement Learning Optimization

The knowledge-distilled LM is further optimized with reinforcement learning using several rationale-oriented and task-oriented reward signals.
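
One simple way to combine the two kinds of signal is a weighted sum, sketched below. This is an assumption for illustration only: the weighting scheme, the `alpha` parameter, the exact-match task reward, and the idea of a single rationale score in [0, 1] are not taken from the paper, which may define and combine its rewards differently.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so answers compare fairly."""
    return " ".join(text.lower().split())


def task_reward(predicted: str, gold: str) -> float:
    """Task-oriented signal (assumed): 1.0 on exact match after normalization."""
    return 1.0 if normalize(predicted) == normalize(gold) else 0.0


def combined_reward(
    rationale_score: float, predicted: str, gold: str, alpha: float = 0.5
) -> float:
    """Blend a rationale-quality score with the task reward (hypothetical scheme)."""
    if not 0.0 <= rationale_score <= 1.0:
        raise ValueError("rationale_score is assumed to lie in [0, 1]")
    # alpha weights rationale quality; (1 - alpha) weights answer correctness.
    return alpha * rationale_score + (1.0 - alpha) * task_reward(predicted, gold)


reward_correct = combined_reward(0.8, "Paris", " paris ")
reward_wrong = combined_reward(0.8, "Lyon", "Paris")
```

A scalar like this could serve as the reward for a policy-gradient update of the small LM, with the rationale score supplied by whatever quality evaluator is available.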

Research Reference

"LM-Guided Chain-of-Thought"

Source: https://arxiv.org/pdf/2404.03414.pdf

Performance Results

Multi-hop Question Answering

The framework is tested on multi-hop extractive question answering and outperforms all baselines in answer prediction accuracy. Reinforcement learning improves the quality of the generated rationales, which in turn improves question-answering performance.

Comparison with Other Methods

The LM-guided CoT prompting approach proposed in this paper outperforms both standard prompting and CoT prompting. Self-consistency decoding also enhances performance.
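
Self-consistency decoding is commonly implemented as a majority vote over several sampled answers. The sketch below shows that voting step in isolation. The function `self_consistent_answer` and the toy sampler are hypothetical; in practice each sample would come from a full rationale-then-answer pass with sampling enabled.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(
    sample_answer: Callable[[], str], n_samples: int = 5
) -> Tuple[str, float]:
    """Draw several answers and return the most frequent one with its vote share."""
    # Normalize lightly so trivially different strings count as the same answer.
    answers = [sample_answer().strip().lower() for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples


# Toy sampler that replays a fixed sequence instead of calling a model.
_samples = iter(["Paris", "paris ", "Lyon", "Paris", "Lyon"])


def toy_sampler() -> str:
    return next(_samples)


best, share = self_consistent_answer(toy_sampler, n_samples=5)
```

The vote share gives a rough confidence signal: a low share suggests the sampled reasoning paths disagree.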

Key Insights

This approach makes clever use of a small language model for rationale generation. The results are notable because rationale generation is usually assigned to larger models, which are generally preferred over smaller ones for this capability.

Practical Implications

Task Decomposition Strategy

Decomposing tasks in this way is something developers should think deeply about. Not everything needs to be done by the large models.

Fine-tuning Considerations

When fine-tuning, it's useful to think about what exact aspect you want to optimize and test to see if a small language model can do it for you.

Key Benefits

  1. Resource Efficiency: Avoids fine-tuning large models
  2. Improved Performance: Outperforms standard and CoT prompting
  3. Scalable Architecture: Leverages small models for specific tasks
  4. Cost Effective: Reduces computational requirements
  5. Flexible Design: Allows task-specific optimization

Technical Details

  • Architecture: Small LM for rationale generation, frozen large LM for answer prediction
  • Training: Knowledge distillation + reinforcement learning
  • Reward Signals: Rationale-oriented and task-oriented
  • Decoding: Enhanced with self-consistency