Fixing Gemini & Claude: Chain-of-Thought Issue
Have you ever tried to use the chain-of-thought feature with models like gemini-claude-sonnet-4-5-thinking and found it just doesn't work? You're not alone! This article dives into a specific issue where the chain-of-thought (thinking) capability isn't functioning as expected across various APIs (OpenAI, Gemini, and Claude). We'll break down the problem, explore the technical analysis, and offer a solution to get these models thinking step by step.
The Problem: Chain-of-Thought Isn't Working
The core issue is that the gemini-claude-sonnet-4-5-thinking model fails to return any chain-of-thought content when using OpenAI-compatible, Gemini-compatible, or Claude Code-compatible APIs. When you send a request that includes parameters like reasoning_effort or the Anthropic thinking block, you only receive the direct answer. There's no step-by-step reasoning or thought process included in the API response, despite the model providing a regular output. This can be frustrating because the chain-of-thought is important for understanding how the model arrives at its answer, especially for complex tasks.
Why is Chain-of-Thought Important?
Chain-of-thought prompting is a technique that enhances the reasoning capabilities of large language models (LLMs). It encourages the model to break down a complex problem into smaller, more manageable steps, mirroring human thought processes. By explicitly showing its reasoning, the model not only arrives at a solution but also provides a transparent and understandable path to that solution. This is crucial for:
- Debugging: Understanding the model's reasoning helps identify potential errors or biases in its thought process.
- Trust: Seeing the steps taken to reach a conclusion builds trust in the model's output.
- Learning: Chain-of-thought can be used as a teaching tool, demonstrating problem-solving strategies.
- Complex Problem Solving: For intricate tasks, chain-of-thought allows the model to tackle each sub-problem individually, leading to more accurate and reliable results.
Unfortunately, with the current configuration, the gemini-claude-sonnet-4-5-thinking model skips this vital step, hindering its potential for more sophisticated applications. The lack of chain-of-thought output limits the model's ability to explain its reasoning, making it challenging to verify the correctness of its answers, particularly in scenarios where transparency and auditability are essential.
Technical Analysis: Diving into the Code
This isn't just a matter of an incorrect payload or wrong parameters; the problem lies deeper, in the model's configuration. Analysis shows that every API mode disregards the 'reasoning' and 'thinking' settings for this specific model, consistently returning only the final answer without any of the chain-of-thought content.
To pinpoint the root cause, the project's source code was examined, revealing a crucial check: the util.ModelSupportsThinking(modelName) function in the request handling code (OpenAI, Claude, Gemini, etc.). This function determines whether a model is capable of generating and streaming thinking blocks. However, for gemini-claude-sonnet-4-5-thinking, ModelSupportsThinking() returns false. This is because there's no Thinking configuration associated with this model in the model registry (internal/registry/model_definitions.go).
This absence of a Thinking configuration acts as a gatekeeper, preventing the model from engaging in chain-of-thought reasoning. As a result, regardless of the client payload or API route used, the chain-of-thought fields are bypassed. Even attempts to force the configuration using the payload in config.yaml are futile, as the fundamental logic within the code prevents the feature from activating.
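To make the gatekeeper behavior concrete, here is a minimal sketch of how such a check could work. The `ThinkingSupport` and `ModelInfo` types, the `registry` map, and the lowercase `modelSupportsThinking` helper are simplified stand-ins inferred from the identifiers discussed above, not the project's exact code:

```go
package main

import "fmt"

// ThinkingSupport mirrors the registry's thinking configuration
// (field names taken from the snippet discussed in this article).
type ThinkingSupport struct {
	Min            int
	Max            int
	ZeroAllowed    bool
	DynamicAllowed bool
}

// ModelInfo is a simplified stand-in for a model registry entry.
type ModelInfo struct {
	Name     string
	Thinking *ThinkingSupport // nil => no chain-of-thought support
}

// registry is a toy stand-in for internal/registry/model_definitions.go.
var registry = map[string]ModelInfo{
	// Before the fix: no Thinking block, so the gate returns false.
	"gemini-claude-sonnet-4-5-thinking": {Name: "gemini-claude-sonnet-4-5-thinking"},
	// A model that already carries a Thinking block.
	"some-thinking-model": {
		Name:     "some-thinking-model",
		Thinking: &ThinkingSupport{Min: 1024, Max: 100000, DynamicAllowed: true},
	},
}

// modelSupportsThinking plays the role of util.ModelSupportsThinking:
// a model supports chain-of-thought only if its registry entry carries
// a non-nil Thinking configuration.
func modelSupportsThinking(name string) bool {
	m, ok := registry[name]
	return ok && m.Thinking != nil
}

func main() {
	fmt.Println(modelSupportsThinking("gemini-claude-sonnet-4-5-thinking")) // false before the fix
	fmt.Println(modelSupportsThinking("some-thinking-model"))               // true
}
```

With a gate shaped like this, no payload trick can help: until the registry entry gains a `Thinking` value, every request path short-circuits before any reasoning fields are emitted.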
The Missing Piece: Model Registry Configuration
The core of the issue resides in the internal/registry/model_definitions.go file. This registry acts as a central directory, defining the capabilities and configurations of various language models. Within this file, each model is assigned a set of properties that dictate its behavior. For gemini-claude-sonnet-4-5-thinking, the crucial Thinking configuration is missing. This omission effectively tells the system that the model doesn't support chain-of-thought, regardless of its underlying potential.
This configuration typically includes parameters that govern the chain-of-thought process, such as the minimum and maximum number of tokens allowed for reasoning, whether zero tokens are permitted, and if dynamic allocation of tokens is enabled. Without this configuration, the model lacks the necessary instructions and resources to generate step-by-step reasoning, leading to the observed lack of chain-of-thought output.
The Solution: Adding the Thinking Config
The suggested solution is straightforward: add a Thinking/reasoning configuration block to both gemini-claude-sonnet-4-5-thinking and claude-sonnet-4-5-thinking in the model registry (internal/registry/model_definitions.go). This involves adding a code snippet similar to the following:
Thinking: &ThinkingSupport{Min: 1024, Max: 100000, ZeroAllowed: false, DynamicAllowed: true},
This configuration block defines the parameters for chain-of-thought generation. Min and Max specify the minimum and maximum number of tokens that can be used for reasoning, while ZeroAllowed indicates whether zero tokens are permissible. DynamicAllowed determines if the model can dynamically allocate tokens for reasoning based on the complexity of the input. By adding this configuration, the logic paths will correctly recognize the model's ability to support thought/reasoning.
Once this configuration is in place, the APIs will be able to output thinking deltas/chain-of-thought, provided that the client sends the correct parameters in their requests. This seemingly small change unlocks a significant feature, allowing users to leverage the full potential of these models.
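To illustrate what these four fields govern, the sketch below shows a hypothetical helper that normalizes a client-requested thinking budget against the model's limits. The `clampBudget` function and its conventions (a negative value as a "dynamic" marker, raising or capping out-of-range budgets) are illustrative assumptions, not the project's actual validation logic:

```go
package main

import "fmt"

// ThinkingSupport mirrors the configuration block added to the registry.
type ThinkingSupport struct {
	Min            int  // smallest reasoning budget, in tokens
	Max            int  // largest reasoning budget, in tokens
	ZeroAllowed    bool // may the client disable thinking with a 0 budget?
	DynamicAllowed bool // may the model size its own budget (negative marker)?
}

// clampBudget is a hypothetical helper: it normalizes a client-requested
// thinking budget against a model's ThinkingSupport limits.
func clampBudget(ts ThinkingSupport, requested int) (int, error) {
	switch {
	case requested == 0:
		if !ts.ZeroAllowed {
			return 0, fmt.Errorf("zero thinking budget not allowed for this model")
		}
		return 0, nil
	case requested < 0:
		if !ts.DynamicAllowed {
			return 0, fmt.Errorf("dynamic thinking budget not allowed for this model")
		}
		return requested, nil // pass the dynamic marker through unchanged
	case requested < ts.Min:
		return ts.Min, nil // raise too-small budgets to the floor
	case requested > ts.Max:
		return ts.Max, nil // cap oversized budgets at the ceiling
	default:
		return requested, nil
	}
}

func main() {
	ts := ThinkingSupport{Min: 1024, Max: 100000, ZeroAllowed: false, DynamicAllowed: true}
	got, _ := clampBudget(ts, 500)
	fmt.Println(got) // 1024: raised to Min
	got, _ = clampBudget(ts, 250000)
	fmt.Println(got) // 100000: capped at Max
}
```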
Step-by-Step Implementation
- Locate the Model Registry: Navigate to the internal/registry/model_definitions.go file within the project's source code.
- Find the Model Definitions: Locate the definitions for gemini-claude-sonnet-4-5-thinking and claude-sonnet-4-5-thinking.
- Add the Thinking Configuration: Within each model's definition, add the Thinking configuration block shown above.
- Save and Deploy: Save the changes and deploy the updated code.
- Test the Implementation: Send requests with the appropriate parameters (e.g., reasoning_effort or an Anthropic thinking block) and verify that the API responses now include chain-of-thought output.
Impact: Unleashing Chain-of-Thought Capabilities
Currently, users are only seeing the final result, missing the crucial step-by-step thoughts and reasoning. This defeats the purpose of the thinking/chain-of-thought modes for these models. By implementing the suggested solution, the models will be able to provide a much richer and more transparent output, displaying the reasoning process behind their answers.
This has a significant impact on the usability and trustworthiness of these models. Users will be able to:
- Understand the Model's Reasoning: Gain insights into how the model arrives at its conclusions, fostering a deeper understanding and trust in its outputs.
- Debug and Improve Results: Identify potential errors or biases in the reasoning process and refine prompts accordingly.
- Leverage Chain-of-Thought for Complex Tasks: Utilize the chain-of-thought capabilities for more intricate problem-solving scenarios where step-by-step reasoning is crucial.
- Enhance Educational Applications: Employ the models as educational tools, demonstrating effective problem-solving strategies.
Testing and Expected Output
To verify the fix, a sample test prompt can be used: "9.11 vs 9.8, which is bigger? Please reason step by step."
- Expected Output: The API should return one or more reasoning/thinking content blocks as part of the response when the thinking mode is enabled and enough budget tokens are set.
- Actual Output (Before Fix): Only a direct answer block is returned, with no chain-of-thought or internal thoughts.
After implementing the solution, the expected output should include a detailed step-by-step explanation of the model's reasoning process, in addition to the final answer. This allows users to follow the model's thought process and verify the correctness of its conclusions.
Conclusion
The lack of chain-of-thought in gemini-claude-sonnet-4-5-thinking models is a significant limitation, hindering their ability to provide transparent and understandable reasoning. By adding the Thinking configuration block to the model registry, we can unlock the full potential of these models, enabling them to generate step-by-step explanations and enhancing their usability for complex tasks. This simple fix can greatly improve the transparency, trustworthiness, and educational value of these powerful language models.