An anonymous Discord user posted alleged leaks of OpenAI’s upcoming upgrade to its AI image generator, stunning AI fans worldwide. Later, the artificial intelligence news website The Decoder described its reported upgrades. Besides better visuals, DALL-E 3 can reportedly understand user prompts and render text within images better than the previous version.
The best part about the AI race is that the companies involved innovate relentlessly. Arguably, tech firms have never released product and service upgrades with such speed and impact. Now, OpenAI may stun the world again with a significant upgrade to its AI image generator. We will soon see its impact as more people use AI tools daily.
This article will discuss DALL-E 3’s alleged improvements over its predecessor. Later, I will contrast it with its competitors to illustrate its technological advancement.
DALL-E 3 vs. DALL-E 2
You may know OpenAI for its world-famous ChatGPT text generation tool. However, it has other artificial intelligence projects like DALL-E, its proprietary AI image generator.
As the name suggests, this program produces pictures based on user prompts. At the time of writing, the current version is DALL-E 2. Its improvements resembled those of its competitors.
For example, DALL-E 2’s biggest advantage over the original DALL-E is creating more visually appealing images. Meanwhile, other tools like Midjourney and Stable Diffusion have followed that trend.
It seems OpenAI is preparing to break new ground with its upcoming DALL-E upgrade. The Decoder reported that a leaker shared the news on a Discord channel in May.
He claimed he was part of an alpha test for OpenAI’s new AI image model. He shared numerous samples to support his claim, eventually attracting online attention.
The leaker said the test version did not filter its output, so its images could depict violence, nudity, and copyrighted material. Fortunately, YouTuber MattVidPro AI contacted this person and shared safe-for-work examples.
He called the upgrade “extremely exciting.” He added, “This blows anything we’ve seen before out of the water. It’s insane.” The program raises the bar for visual quality. As expected, it can render higher-quality pictures than its predecessor.
Moreover, it reportedly solves problems specific to AI image generators, such as properly rendering hands, fingers, and toes. Older models often show too many or too few fingers, or fingers merged together. DALL-E 3 can also understand user prompts better than DALL-E 2.
DALL-E 3 vs. other AI image generators
DALL-E 3’s significant upgrades are hard to appreciate without illustrations. Let’s start by comparing images created with the second and third versions based on this prompt:
“A painting of a pink jester giving a high five to a panda while in a cycling competition. The bikes are made of cheese, and the ground is very muddy. They are driving in a foggy forest. The panda is angry.”
The image on top is from DALL-E 2, and the other is from DALL-E 3. The former depicts cheese-colored bikes and a jester and panda touching hands while riding. However, the hands look distorted and conjoined.
Conversely, the bottom image conveys the panda’s anger. You can see the muddy path and another cyclist, which more accurately portrays a cycling competition.
Their bikes look as if they are made of cheese. More importantly, DALL-E 3 properly depicts the jester’s hand and the panda’s paw in a high five. Meanwhile, here’s the same prompt depicted by Midjourney:
It shows a concrete path, a happy bear, no jester, and no cheese bicycles. In other words, Midjourney falls well short of DALL-E 3 on this prompt. DALL-E 3 can also include text in its images. For example, check out how this tool interpreted the following prompt:
“An image of an angel holding the sun and moon. above the angel, it says, “BE NOT AFRIAD” in the background is the entire universe. fantasy art, 8k reoslution, beautiful, emotional.”
The leaker intentionally included the spelling and grammatical errors to show that DALL-E 3 can work around them. Despite the deliberate errors, it created an angel with the correctly spelled words “Be not afraid” above it.
DALL-E 3 also avoids concept spillover, which occurs when an image model blends separate concepts from a prompt together. For example, here’s how this AI model interpreted this prompt:
“A group of farm animals (cows, sheep, and pigs) made out of cheese and ham on a wooden board. There is a dog in the background eyeing the board hungrily.”
Tools like Stable Diffusion and Midjourney would likely render every animal in this image, including the dog, as food. However, the new OpenAI program keeps the dog in the background realistic.
Conclusion
An unknown person leaked images allegedly created by OpenAI’s upcoming AI image generator, DALL-E 3. Based on these samples, it appears significantly better than its predecessor and competitors.
It can understand prompts better and deliver higher-quality pictures. However, other tools have a more powerful advantage: they are available to the public.
At the time of writing, OpenAI hasn’t confirmed it is developing DALL-E 3 or a similar upgrade. Check out other digital tips and trends at Inquirer Tech.