Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration

Source: arXiv AI Papers

Maestro uses specialized multimodal large language model (MLLM) agents as critics: they assess the quality of generated images, identify where the prompt is under-specified, and propose interpretable edits. A verifier agent then integrates these suggestions while keeping the refinements aligned with the user’s original intent, which makes image generation more efficient and easier to use.
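The critique loop described above can be sketched as follows. This is a minimal, hypothetical sketch, not Maestro's implementation. The `Edit`, `verify`, and `refine_prompt` names are invented for illustration. The critics are hard-coded stubs standing in for MLLM agents, and the verifier's intent check is a toy keyword rule.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Edit:
    critic: str      # name of the critic that proposed the edit
    suggestion: str  # phrase to add to the prompt


# A critic inspects a (prompt, image) pair and returns edit suggestions.
# In Maestro the critics are MLLM agents; here they are hypothetical stubs.
Critic = Callable[[str, object], List[Edit]]


def verify(user_intent: str, edits: List[Edit],
           conflicts: Callable[[str, str], bool]) -> List[Edit]:
    """Hypothetical verifier: drop duplicates and edits that contradict the intent."""
    accepted, seen = [], set()
    for edit in edits:
        if edit.suggestion in seen or conflicts(user_intent, edit.suggestion):
            continue
        seen.add(edit.suggestion)
        accepted.append(edit)
    return accepted


def refine_prompt(prompt: str, image: object, critics: List[Critic],
                  user_intent: str, conflicts: Callable[[str, str], bool]) -> str:
    """One critique round: gather edits from all critics, verify them, apply them."""
    edits = [edit for critic in critics for edit in critic(prompt, image)]
    accepted = verify(user_intent, edits, conflicts)
    if not accepted:
        return prompt
    return prompt + ", " + ", ".join(e.suggestion for e in accepted)


# --- Hypothetical stub critics (a real system would query an MLLM with the image) ---
def lighting_critic(prompt: str, image: object) -> List[Edit]:
    # Flags an under-specified prompt that never mentions lighting.
    return [] if "lighting" in prompt else [Edit("lighting", "soft golden-hour lighting")]


def style_critic(prompt: str, image: object) -> List[Edit]:
    # Always proposes two style edits; one of them contradicts a "colorful" intent.
    return [Edit("style", "black and white"), Edit("style", "detailed fur texture")]


def color_conflict(intent: str, suggestion: str) -> bool:
    # Toy rule: a "colorful" request must not be pushed to monochrome.
    return "colorful" in intent and "black and white" in suggestion


if __name__ == "__main__":
    new_prompt = refine_prompt("a fox in a forest", None,
                               [lighting_critic, style_critic],
                               "a colorful fox in a forest", color_conflict)
    print(new_prompt)
```

Running the script prints `a fox in a forest, soft golden-hour lighting, detailed fur texture`. The verifier rejects the monochrome suggestion because it conflicts with the request for a colorful image.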

Maestro’s self-evolution component drives creativity by comparing images from successive iterations against each other. These comparisons filter out lower-quality outputs and yield refined prompts that better reflect what the user wants. Extensive experiments indicate that Maestro significantly outperforms existing automated methods, and that its advantage grows when more capable MLLM components are used.
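The self-evolution step can likewise be sketched as repeated head-to-head comparisons. The source does not say how Maestro schedules its comparisons. The single-elimination bracket used here is an assumption, as are all names (`knockout`, `evolve`, `keyword_judge`, `add_detail`). In the real system the judge compares generated images. This toy version instead scores prompt text by counting hypothetical intent keywords.

```python
import random
from typing import Callable, List

# A judge takes two candidates and returns the one it prefers.
# In Maestro the comparison is between generated images; here, as a
# hypothetical stand-in, candidates are prompt strings scored directly.
Judge = Callable[[str, str], str]


def knockout(candidates: List[str], judge: Judge) -> str:
    """Pick one winner via single-elimination pairwise comparisons."""
    pool = list(candidates)
    while len(pool) > 1:
        winners = [judge(pool[i], pool[i + 1]) for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # odd candidate advances with a bye
        pool = winners
    return pool[0]


def evolve(seed: str, mutate: Callable[[str], str], judge: Judge,
           generations: int = 3, population: int = 4) -> str:
    """Each generation: propose variants of the current best, keep the bracket winner."""
    best = seed
    for _ in range(generations):
        # Keep the incumbent in the pool so that, under a judge that always
        # prefers the higher-scoring candidate, a generation cannot regress.
        candidates = [best] + [mutate(best) for _ in range(population - 1)]
        best = knockout(candidates, judge)
    return best


# --- Hypothetical stand-ins for the MLLM judge and the prompt rewriter ---
INTENT_KEYWORDS = {"vivid", "misty", "sharp"}
DETAILS = ["vivid colors", "misty morning", "low angle", "sharp focus"]


def score(prompt: str) -> int:
    """Count intent keywords present (a toy proxy for image quality)."""
    return sum(word in prompt for word in INTENT_KEYWORDS)


def keyword_judge(a: str, b: str) -> str:
    # Ties go to the first candidate.
    return a if score(a) >= score(b) else b


rng = random.Random(0)  # seeded so runs are reproducible


def add_detail(prompt: str) -> str:
    # Hypothetical mutation: append one randomly chosen detail.
    return f"{prompt}, {rng.choice(DETAILS)}"


if __name__ == "__main__":
    result = evolve("a lighthouse on a cliff", add_detail, keyword_judge)
    print(result, score(result))
```

Keeping the current best prompt in every bracket is a deliberate design choice. As long as the judge consistently prefers the higher-scoring candidate, the winner cannot get worse from one generation to the next.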

By reducing the need for manual intervention in text-to-image (T2I) generation, the system could broaden access to high-quality image-creation tools. However, relying on an MLLM to make artistic decisions raises questions about creative autonomy and about the subjective nature of image quality.
