Peng Wang
Published: 2025-05-06

Title: Understanding Distribution Learning of Diffusion Models via Low-Dimensional Modeling

Report time: May 9th, 2025 (Friday) 14:30-16:00

Report location: Conference Room 201, Mathematics Museum, ECNU

Inviter: Peng Wang


Abstract: 

Recent empirical studies have demonstrated that diffusion models can effectively learn the image distribution and generate new samples. Remarkably, these models can achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging two key empirical observations: (i) the low intrinsic dimensionality of image datasets and (ii) the low-rank property of the denoising autoencoder in trained diffusion models. These observations motivate us to model the underlying data distribution as a mixture of low-rank Gaussians and to parameterize the denoising autoencoder as a low-rank model. Under this setup, we rigorously show that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem over the training samples. This insight carries practical implications for training and controlling diffusion models. Specifically, it allows us to characterize precisely the minimal number of samples necessary for correctly learning the low-rank data support, shedding light on the phase transition from memorization to generalization. Moreover, we empirically establish a correspondence between the subspaces and the semantic representations of image data, facilitating image editing. We corroborate these results with experiments on both simulated distributions and image datasets.
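
For readers unfamiliar with the data model named in the abstract, the following is a minimal sketch of sampling from a mixture of low-rank Gaussians. It is an illustration only, not the speaker's implementation: the function name `sample_low_rank_mixture`, the zero-mean components, the equal mixing weights, the random orthonormal bases, and all dimensions are assumptions made here for the example.

```python
import numpy as np


def sample_low_rank_mixture(n_samples, ambient_dim, subspace_dim,
                            n_components, seed=0):
    """Draw samples from a mixture of zero-mean low-rank Gaussians.

    Assumed setup (hypothetical, for illustration): each component k has
    covariance U_k U_k^T, where U_k is a random orthonormal basis of a
    subspace_dim-dimensional subspace, and components are equally likely.
    """
    rng = np.random.default_rng(seed)
    # One orthonormal basis per component, obtained from a reduced QR.
    bases = [
        np.linalg.qr(rng.standard_normal((ambient_dim, subspace_dim)))[0]
        for _ in range(n_components)
    ]
    # Pick a component for every sample, then draw subspace coefficients.
    labels = rng.integers(0, n_components, size=n_samples)
    coeffs = rng.standard_normal((n_samples, subspace_dim))
    # Each sample is x = U_k a with a ~ N(0, I), so it lies in subspace k.
    samples = np.stack([bases[k] @ a for k, a in zip(labels, coeffs)])
    return samples, labels, bases


samples, labels, bases = sample_low_rank_mixture(
    n_samples=200, ambient_dim=50, subspace_dim=3, n_components=2
)
# Samples of a single component span only a subspace_dim-dimensional space,
# even though they live in a 50-dimensional ambient space.
rank_of_first_component = np.linalg.matrix_rank(samples[labels == 0])
```

Under these assumptions, recovering the subspace of each component from unlabeled samples is an instance of the subspace clustering problem the abstract refers to.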


Biography:

Peng Wang is currently a postdoctoral research fellow in the Department of Electrical Engineering and Computer Science at the University of Michigan. Before that, he received his Ph.D. degree in Systems Engineering and Engineering Management from The Chinese University of Hong Kong, advised by Professor Anthony Man-cho So. His research interests lie at the intersection of optimization, machine learning, and data science. Currently, he is devoted to studying the foundations of deep learning models, especially diffusion models and large language models.


Software Engineering Institute
Dishui Lake International Software Engineering Institute

Address: North Zhongshan Road 3663, Shanghai
         Nanmu Road 111, Shanghai

E-mail: office@sei.ecnu.edu.cn | Tel: 021-62232550