Perception and Evaluation of Text-to-Image Generative AI Models: A Comparative Study of DALL-E, Google Imagen, Grok, and Stable Diffusion

Generative artificial intelligence (AI) models are a class of AI capable of producing high-quality images from textual inputs. These models use natural language processing (NLP) and computer vision techniques to interpret textual descriptions and then generate images that align with those descriptions. This study evaluates four prominent text-to-image generative models, DALL-E, Google Imagen, Stable Diffusion, and Grok, with an emphasis on text-to-image diffusion models. To assess image quality and realism across datasets collected from these AI platforms, we apply three quantitative metrics: Fréchet Inception Distance (FID), Structural Similarity Index (SSIM), and Peak Signal-to-Noise Ratio (PSNR). We also conduct human evaluations to compare the perceived realism of AI-generated images with these metric scores. Our findings show varying degrees of perceived realism across the models, with DALL-E and Imagen generally perceived as more realistic than Stable Diffusion and Grok. As this study examines, human evaluation remains the current gold standard for text-to-image evaluation; however, mathematically based metrics also show promise and value. Our findings contribute to the advancement of text-to-image synthesis, and our results support FID as the gold-standard metric-based evaluation method, since it most closely matched human evaluation.
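The abstract names three metrics without defining them. As a rough illustration only, and not the authors' implementation, the sketch below computes each one with NumPy. Several simplifications are assumptions: the function names are hypothetical; SSIM is computed over a single global window rather than the usual averaged local (e.g. Gaussian-weighted) windows; and FID is computed from precomputed feature vectors, whereas in practice those features come from a pretrained Inception-v3 network, which is not shown here.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images have no noise
    return float(10.0 * np.log10(max_value ** 2 / mse))


def global_ssim(x: np.ndarray, y: np.ndarray, max_value: float = 255.0) -> float:
    """Simplified SSIM using one window covering the whole image (assumption)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_value) ** 2  # standard stabilizing constants K1=0.01, K2=0.03
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


def fid(features_real: np.ndarray, features_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples).

    FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    """
    mu_r, mu_g = features_real.mean(axis=0), features_gen.mean(axis=0)
    sigma_r = np.cov(features_real, rowvar=False)
    sigma_g = np.cov(features_gen, rowvar=False)
    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues of
    # S_r S_g, which are real and non-negative for covariance matrices; clip
    # tiny negative values caused by floating-point error.
    eigenvalues = np.linalg.eigvals(sigma_r @ sigma_g).real
    trace_sqrt = np.sum(np.sqrt(np.clip(eigenvalues, 0.0, None)))
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)
```

Higher PSNR and SSIM indicate closer agreement with a reference image, while lower FID indicates that the generated feature distribution is closer to the real one. Note that PSNR and SSIM require a paired reference image, whereas FID compares whole sets of images.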

Suhaima Jamal
Georgia Southern University
United States

Hayden Wimmer
Georgia Southern University
United States

Carl Rebman
University of San Diego
United States