Open WebUI: Image Editing Issue With Model-Generated Images
Experiencing issues with image editing in Open WebUI, especially when it comes to model-generated images? You're not alone! This article dives deep into a specific problem encountered while using Open WebUI with models like 'gemini-3-pro-image-preview' and 'gemini-2.5-flash-image' via OpenRouter. We'll break down the issue, explore the expected vs. actual behavior, provide clear steps to reproduce the problem, and analyze potential causes. If you're struggling with image editing in Open WebUI, this is the guide for you.
Understanding the Issue
The core issue revolves around how Open WebUI handles image editing, specifically when dealing with images generated by the model itself or images uploaded by the user. The expectation is that the model should be able to take a previously generated or edited image as input for further modifications. However, the reality is a bit different, leading to frustrating results.
Expected Behavior
When using models like 'gemini-3-pro-image-preview' or 'gemini-2.5-flash-image' in a direct chat within Open WebUI, the following behavior is anticipated:
- Model-Generated Images: If you prompt the model to generate an image, that image should then be usable as a base for subsequent edits. In other words, if you ask the model to create a picture of a cat, and then ask it to add a hat to the cat, it should modify the generated cat image.
- Uploaded Images (Initial Edit): When you upload an image to the chat and ask for an edit, the model should correctly apply the requested changes. This part generally works as expected – for the first edit.
- Uploaded Images (Subsequent Edits): This is where the problem arises. If you ask for a second edit, the model should use the edited image (the result of the first edit) as the input. However, it instead reverts to the originally uploaded image, effectively ignoring the previous modification.
Actual Behavior
Unfortunately, the actual behavior deviates from the expected in the following ways:
- Model-Generated Images: Images generated by the model in response to a prompt are not used when asking for edits. Instead of modifying the generated image, the model returns a completely new image, disregarding the initial output.
- Uploaded Images: While the first edit on an uploaded image works correctly, subsequent edits always operate on the original uploaded image. The model doesn't seem to remember or utilize the previously edited version, leading to repetitive modifications of the same starting point.
This behavior significantly hinders the iterative image editing process, making it impossible to progressively refine an image through multiple edit requests. It forces users to re-upload the edited image after each modification, which is a cumbersome and inefficient workaround.
Reproducing the Issue: A Step-by-Step Guide
To demonstrate this issue, you can follow these steps. These steps are based on the original issue report and provide a clear, reproducible scenario. This is essential for developers to understand and address the problem effectively.
Prerequisites
- Docker: You'll need Docker installed and running on your system.
- Open WebUI: You should have Open WebUI set up using Docker.
- OpenRouter Account: You'll need an account with OpenRouter to access models like 'gemini-3-pro-image-preview' and 'gemini-2.5-flash-image'.
Steps
- Configure Docker Environment: Set the following environment variables for the Open WebUI container. Going by their names, one enables base64 image URL conversion for chat responses, and the other raises the maximum buffer size for streamed response chunks so larger image data can be handled:
  CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE=10485760
  ENABLE_CHAT_RESPONSE_BASE64_IMAGE_URL_CONVERSION=true
- Model-Generated Image Issue:
- Start a chat with either 'gemini-3-pro-image-preview' or 'gemini-2.5-flash-image' via OpenRouter.
- Ask the model to generate an image using a prompt like, "Generate a picture of a futuristic cityscape."
- Once the image is generated, ask for an edit. For example, "Add a flying car to the image."
- Expected: The model should modify the previously generated cityscape image by adding a flying car.
- Actual: The model generates a completely new image, likely a different cityscape, instead of editing the original.
- Uploaded Image Issue:
- Start a chat with the same model ('gemini-3-pro-image-preview' or 'gemini-2.5-flash-image').
- Upload an image to the chat. You can use any image for this purpose.
- Ask for an edit. For example, "Make the sky in the image more blue."
- The model correctly edits the image, making the sky bluer; this first edit works as expected.
- Now, ask for another edit. For example, "Add a rainbow to the sky."
- Expected: The model should take the edited image (with the bluer sky) and add a rainbow to it.
- Actual: The model reverts to the original uploaded image and adds a rainbow to the original sky, ignoring the previous edit. (A sketch of what this follow-up request would need to contain at the API level appears after these steps.)
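To make this concrete at the API level, here is a minimal sketch of what the follow-up edit request would need to contain when talking to an OpenAI-compatible endpoint such as OpenRouter's. Everything in it is illustrative: the API key, the model slug, the truncated data URL, and the way the earlier turns are summarised are assumptions, not Open WebUI's actual code. The point it demonstrates is that the previously edited image must be re-attached as an image part of the new request; if only the originally uploaded image is sent, the model has nothing else to work from.

```python
# Illustrative sketch, not Open WebUI code. Assumes the OpenAI-compatible
# chat completions format that OpenRouter exposes; the API key, model slug,
# and data URL below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

# Base64 data URL of the image returned by the *previous* edit, i.e. the
# picture with the bluer sky, not the originally uploaded file.
previous_edit_data_url = "data:image/png;base64,iVBORw0KGgo..."  # placeholder, truncated

messages = [
    # Earlier turns, summarised as plain text for brevity.
    {"role": "user", "content": "Make the sky in the image more blue."},
    {"role": "assistant", "content": "Here is the edited image."},
    # New turn: the previously edited image travels with the new prompt,
    # so the model edits its own last result rather than the original upload.
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Add a rainbow to the sky."},
            {"type": "image_url", "image_url": {"url": previous_edit_data_url}},
        ],
    },
]

response = client.chat.completions.create(
    model="google/gemini-2.5-flash-image",  # assumed OpenRouter slug; check the current catalog
    messages=messages,
)
print(response.choices[0].message)
```

If a payload like this is built with the original upload in place of previous_edit_data_url, the behavior described above is exactly what you would expect to see.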
By following these steps, you can reliably reproduce the image editing issues within Open WebUI. This clear reproduction path is vital for developers to investigate and fix the problem.
Analyzing the Problem: Potential Causes and Solutions
Pinpointing the exact cause of this issue requires a deeper dive into Open WebUI's code and its interaction with the underlying models. However, based on the observed behavior, we can speculate on some potential causes and suggest possible solutions.
Potential Causes
- Context Management: One possibility is that Open WebUI isn't correctly managing the context of the chat session. To use a previously generated or edited image as input for a later request, that image has to be carried forward in the conversation sent to the model; if it is dropped or never re-attached, the model treats each edit request as a fresh start. (A sketch of this idea follows this list.)
- Image Handling: The way Open WebUI handles image data, particularly when converting it to and from the model's expected format, could be a factor. If the image data isn't being stored or retrieved correctly, the model might not receive the correct input for editing.
- API Interaction: There might be an issue in how Open WebUI interacts with the OpenRouter API or the specific APIs of the models being used. The image data might not be correctly included in the API requests, or the responses might not be parsed correctly.
- Buffering Issues: The CHAT_STREAM_RESPONSE_CHUNK_MAX_BUFFER_SIZE variable suggests that streaming responses are being used. If there are issues with buffering or handling these streamed responses, it could lead to incomplete or corrupted image data being passed for editing.
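To illustrate the context-management hypothesis, the sketch below shows one way the request could be rebuilt before each call so that the newest image, whether uploaded or model-generated, is the one that accompanies the next edit prompt. The function names and the shape of the stored history are hypothetical and do not correspond to Open WebUI's internal data model; the sketch only captures the rule that the most recent image should win.

```python
# Illustrative sketch only; function and field names are hypothetical and do
# not correspond to Open WebUI's internal data structures.
from typing import Optional


def latest_image_data_url(history: list[dict]) -> Optional[str]:
    """Return the most recent image in the chat, searching newest turn first.

    Each turn is assumed to look like:
      {"role": "user" | "assistant", "text": "...", "image_data_url": "data:image/png;base64,..." or None}
    """
    for turn in reversed(history):
        if turn.get("image_data_url"):
            return turn["image_data_url"]
    return None


def build_edit_request(history: list[dict], new_prompt: str) -> list[dict]:
    """Build an OpenAI-style messages list where the newest image, not the
    originally uploaded one, accompanies the new edit prompt."""
    image_url = latest_image_data_url(history)
    content: list[dict] = [{"type": "text", "text": new_prompt}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return [{"role": "user", "content": content}]
```

A small unit test around a helper like this would also make regressions in the "second edit" path easy to catch.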
Possible Solutions
- Context Tracking: Ensure that Open WebUI correctly tracks the conversation history and the state of the images within a chat session. This might involve storing image data or references to images within the session context.
- Image Data Management: Review the image data handling mechanisms within Open WebUI. Verify that images are being correctly stored, retrieved, and converted between different formats as needed.
- API Request/Response Inspection: Carefully examine the API requests and responses between Open WebUI and OpenRouter (or the model's API). Ensure that the image data is being included correctly and that the responses are being parsed accurately.
- Streaming Response Handling: Investigate the handling of streamed responses, particularly in relation to image data. Ensure that the buffering and processing of these responses are robust and don't lead to data loss or corruption.
- Debugging and Logging: Add more detailed logging within Open WebUI to track the flow of image data and the interactions with the model's API, as sketched below. This will help pinpoint where the issue is occurring.
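For the logging suggestion specifically, a small helper along these lines could be dropped in front of the code that sends the request. The logger name and the assumed OpenAI-style payload shape are illustrative, and data URLs are truncated so the log output stays readable while still showing which image was actually attached.

```python
# Hypothetical debug helper; the payload is assumed to be an OpenAI-style
# messages list. Base64 image data is truncated so logs stay readable.
import copy
import logging

log = logging.getLogger("openwebui.image_edit_debug")


def log_outbound_messages(messages: list[dict]) -> None:
    """Log which images are attached to an outgoing request, with data URLs truncated."""
    redacted = copy.deepcopy(messages)
    for msg in redacted:
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        for part in content:
            if part.get("type") == "image_url":
                url = part["image_url"]["url"]
                part["image_url"]["url"] = url[:48] + f"... ({len(url)} chars)"
    log.debug("Outbound messages: %s", redacted)
```

Comparing the logged attachment across the first and second edit requests should immediately show whether the original upload is being resent.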
Conclusion
The issue of model-generated image editing not working correctly in Open WebUI is a significant impediment to a smooth user experience. By understanding the problem, following the steps to reproduce it, and considering potential causes and solutions, we can work towards resolving this issue and unlocking the full potential of image editing within Open WebUI.
If you're experiencing this problem, be sure to contribute to the discussion on the Open WebUI platform and provide any additional information that might be helpful. Collaboration is key to finding effective solutions.
For further information on Large Language Models and image editing, you can check out resources like the Hugging Face blog for insights and updates in the field.