Semantically Controlled Texture Synthesis by Diffusion Model

Abstract

Texture synthesis is a versatile task applicable across various domains, from entertainment to medical imaging. Traditionally, texture synthesis generates one type of texture per image. However, real-world objects and scenes often comprise multiple materials. Current image synthesis research largely focuses on generative neural networks. One possible control mechanism for such methods is an input semantic mask that specifies the layout of individual objects. This approach is also beneficial for synthesizing combined textures, as it avoids stitching together regions of different textures with additional post-processing.
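
As a rough illustration of mask-based conditioning, the minimal PyTorch sketch below shows a denoiser that takes the noisy image concatenated channel-wise with a one-hot semantic mask, so the texture class at each pixel steers the predicted noise. All names and layer choices here are illustrative assumptions; the paper's actual architecture (adapted from semantic image synthesis) differs, and timestep conditioning is omitted for brevity.

```python
import torch
import torch.nn as nn


class MaskConditionedDenoiser(nn.Module):
    """Toy denoiser conditioned on a one-hot semantic mask.

    The noisy image is concatenated channel-wise with the mask, so the
    texture class at each pixel steers the predicted noise. Timestep
    conditioning and the paper's actual backbone are omitted for brevity.
    """

    def __init__(self, num_classes: int, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_classes, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, 3, 3, padding=1),  # predicted noise
        )

    def forward(self, noisy_image, onehot_mask):
        return self.net(torch.cat([noisy_image, onehot_mask], dim=1))


# Example: a two-region layout (left half class 0, right half class 1).
x_t = torch.randn(1, 3, 256, 256)   # noisy image at some diffusion step
mask = torch.zeros(1, 4, 256, 256)  # 4 hypothetical texture classes, one-hot
mask[:, 0, :, :128] = 1.0
mask[:, 1, :, 128:] = 1.0
predicted_noise = MaskConditionedDenoiser(num_classes=4)(x_t, mask)
```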

In this study, we propose a method for semantically controlled texture synthesis with a diffusion model. The architecture of the proposed method is based on an adapted semantic image synthesis solution, which we tuned to improve quality on the texture synthesis task and to speed up sampling and training. The proposed method also includes a specialized training approach that reduces sharp seams at the boundaries between different regions in multi-texture semantic synthesis, even though the training dataset contains only single-region texture images. Additionally, we experimented with an approach for synthesizing texture transitions/interpolations during the sampling process. Evaluation shows that the proposed method improves texture synthesis quality compared to existing semantically controlled synthesis solutions.
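
One plausible way to train for seam-free multi-texture synthesis when the dataset holds only single-region images is to composite multi-region training pairs on the fly. The helper below is a hypothetical sketch of that idea; the split layout, names, and procedure are assumptions, not the paper's exact training approach.

```python
import torch


def composite_training_pair(tex_a, tex_b, label_a, label_b, num_classes):
    """Hypothetical helper: build a two-region sample from two
    single-texture images (each of shape [3, H, W]).

    A random vertical split stands in for an arbitrary region layout;
    exposing the model to such composites during training is one way to
    teach it to handle boundaries between textures.
    """
    _, h, w = tex_a.shape
    split = torch.randint(w // 4, 3 * w // 4, (1,)).item()

    # Composite image: texture A on the left, texture B on the right.
    image = tex_a.clone()
    image[:, :, split:] = tex_b[:, :, split:]

    # Matching one-hot semantic mask for the two regions.
    mask = torch.zeros(num_classes, h, w)
    mask[label_a, :, :split] = 1.0
    mask[label_b, :, split:] = 1.0
    return image, mask
```

A softened (e.g., blurred) boundary in such a mask would be an analogous starting point for the texture transition/interpolation experiments mentioned above, though the paper's actual mechanism may differ.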

Dataset

You can download the raw images from the following link – texture_dataset.tar.gz (11 GB)

Code

GitHub page

Citation

  • Kollár, M., Hudec, L., Benesova, W. (2024). Semantically Controlled Texture Synthesis by Diffusion Model. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1. FTC 2024. Lecture Notes in Networks and Systems, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-031-73110-5_26