Beyond Diffusion: What is Personalized Image Generation and How Can You Customize Image Synthesis?
Personalized Image Generation by Fine-Tuning the Stable Diffusion Models
In this article you will learn about the customization and personalization of diffusion-based image generation. More specifically, you will learn about Textual Inversion and DreamBooth. This article builds upon the concepts of Autoencoders, Stable Diffusion (SD) models, and Transformers, so if you would like to know more about those concepts, feel free to check out my earlier posts on these topics.
Text-to-image generators based on diffusion models are one of the major developments in the fields of deep learning and image generation. Such generators (e.g., Stable Diffusion [1]) are robust and can accurately render a wide range of concepts across diverse backgrounds and contexts, which has opened a whole new area of research and innovation. However, the generation cannot be customized to personal taste: one is limited to the concepts the network was already trained on. For instance, it is not possible to prompt SD with a word or image from your own personal life (e.g., the unique name of your pet or cartoon character, or a photo of a personal toy) and have the model modify its pose or context.
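To make this limitation concrete, here is a minimal sketch of plain text-to-image generation, assuming the Hugging Face diffusers library and a CUDA-capable GPU. The model checkpoint, the prompts, and the pet name "Pixel" are illustrative assumptions of mine, not something defined by the methods discussed later:

```python
# A minimal sketch of text-to-image generation with Stable Diffusion,
# assuming the Hugging Face diffusers library and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative SD checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# A concept the model was trained on: generation works well.
image = pipe("a photo of a corgi surfing on a beach").images[0]
image.save("corgi.png")

# A personal concept the model has never seen: "Pixel" (a hypothetical
# pet name) carries no meaning for the model, so the output will be a
# generic cat, not *your* cat.
image = pipe("a photo of Pixel the cat wearing a wizard hat").images[0]
image.save("pixel.png")
```

The second prompt illustrates the gap: the model falls back on its generic notion of "cat" because nothing in its training ties the token to your specific pet. Textual Inversion and DreamBooth, covered next, close exactly this gap by teaching the model a new concept from a handful of your own images.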