LM-Guided Chain-of-Thought
Overview
A new paper by Lee et al. (2024) proposes to improve reasoning in LLMs using small language models. This approach introduces an innovative method for enhancing reasoning capabilities while maintaining computational efficiency.
Methodology
Knowledge Distillation Approach
It first applies knowledge distillation to a small LM with rationales generated by the large LM with the hope of narrowing the gap in reasoning capabilities.
Task Decomposition Strategy
Essentially, the rationale is generated by the lightweight LM and the answer prediction is then left for the frozen large LM. This resource-efficient approach avoids the need to fine-tune the large model and instead offloads the rationale generation to the small language model.
Reinforcement Learning Optimization
The knowledge-distilled LM is further optimized with reinforcement learning using several rational-oriented and task-oriented reward signals.
Research Reference
"LM-Guide Chain-of-Thought"
Source: https://arxiv.org/pdf/2404.03414.pdf
Performance Results
Multi-hop Question Answering
The framework is tested on multi-hop extractive question answering and outperforms all baselines in terms of answer prediction accuracy. RL helps to improve the quality of generated rationales which further improves question-answering performance.
Comparison with Other Methods
The LM-guided CoT prompting approach proposed in this paper outperforms both standard prompting and CoT prompting. Self-consistency decoding also enhances performance.
Key Insights
This approach shows a clever use of small language models for rationale generation. The results are remarkable given that larger language models are preferred for this capability over smaller ones.
Practical Implications
Task Decomposition Strategy
Decomposing tasks in this way is something developers should think deeply about. Not everything needs to be done by the large models.
Fine-tuning Considerations
When fine-tuning, it's useful to think about what exact aspect you want to optimize and test to see if a small language model can do it for you.
Key Benefits
- Resource Efficiency: Avoids fine-tuning large models
- Improved Performance: Outperforms standard and CoT prompting
- Scalable Architecture: Leverages small models for specific tasks
- Cost Effective: Reduces computational requirements
- Flexible Design: Allows task-specific optimization
Technical Details
- Architecture: Small LM for rationale generation, frozen large LM for answer prediction
- Training: Knowledge distillation + reinforcement learning
- Reward Signals: Rational-oriented and task-oriented
- Decoding: Enhanced with self-consistency
