The DALL-E Mini software from a group of open-source developers isn’t perfect, but sometimes it actually provides images that match people’s text descriptions.
If you’ve been scrolling through your social media feeds lately, there’s a good chance you’ve noticed illustrations accompanied by captions. They are popular now.
The images you see are probably made possible by a text-to-image program called DALL-E. Before posting the illustrations, people type in words, which artificial intelligence models then turn into images.
For example, a Twitter user posted a tweet that read, “To be or not to be, Rabbi holding avocado, marble sculpture.” The attached image, which is very elegant, shows a marble statue of a bearded man in a robe and bowler hat grasping an avocado.
The AI models come from Google’s Imagen software and from OpenAI, a Microsoft-backed start-up that developed DALL-E 2. On its website, OpenAI calls DALL-E 2 “a new AI system capable of creating realistic images and artwork with a natural language description.”
But most of what happens in this space comes from a relatively small group of people who share their images and in some cases generate high engagement. That’s because Google and OpenAI haven’t made the technology widely available.
Many of the early adopters of OpenAI are friends and relatives of collaborators. If you seek access, you must join a waiting list and indicate whether you are a professional artist, developer, academic researcher, journalist, or online creator.
“We’re working hard to speed up access, but it will likely take time for us to reach everyone. As of June 15, we invited 10,217 people to try DALL-E,” OpenAI’s Joanne Jang wrote on a help page on the company’s website.
One publicly available system is DALL-E Mini. It relies on open-source code from a loosely organized team of developers and is often overloaded with demand. Attempts to use it may be met with a dialog box that says “Too much traffic, please try again.”
It’s a bit reminiscent of Google’s Gmail service, which offered unlimited email storage in 2004. Early adopters could initially join only by invitation, leaving millions to wait. Now Gmail is one of the most popular email services in the world.
Creating images from text may never be as ubiquitous as email. But the technology is certainly having a moment, and part of its appeal lies in the exclusivity.
Private research lab Midjourney requires people to fill out a form if they want to experiment with its image-generation bot from a channel on the Discord chat app. Only a select group of people are using Imagen and posting images from it.
Text-to-picture services are sophisticated, identifying the most important parts of a user’s prompts and then guessing how best to illustrate those terms. Google trained its Imagen model with hundreds of its in-house AI chips on 460 million internal image-text pairs, in addition to external data.
The interfaces are simple. There is generally a text box, a button to start the generation process, and an area below to view images. To give credit, Google and OpenAI add watermarks in the lower right corner of images from DALL-E 2 and Imagen.
The companies and groups that build the software are rightly concerned about everyone storming the gates at once. Handling web requests to run queries with these AI models can get expensive. More importantly, the models aren’t perfect and don’t always produce results that accurately represent the world.
Engineers trained the models using extensive collections of words and images from around the web, including photos posted to Flickr.
San Francisco-based OpenAI recognizes the potential for harm that could come from a model that learned how to create images by essentially crawling the web. To counter the risk, employees have removed violent content from training data, and filters are in place to prevent DALL-E 2 from generating images when users send prompts that may violate company policy on nudity, violence, conspiracy, or political content.
“There is an ongoing process to improve the security of these systems,” said Prafulla Dhariwal, an OpenAI researcher.
Bias in the results is also important to understand, and it poses a broader problem for AI. Boris Dayma, a developer from Texas, and others who worked on DALL-E Mini described the problem in a statement accompanying their software.
“Occupations with a higher level of education (such as engineers, doctors or scientists) or with high manual labor (such as in the construction industry) are represented mainly by white males,” they wrote. “Nursing staff, secretaries or assistants, on the other hand, are typically women, often white.”
Google described similar shortcomings in its Imagen model in a scientific paper.
Despite the risks, OpenAI is excited about the things the technology can make possible. Dhariwal said it can bring creative opportunities to individuals and help with commercial applications for interior design or website design.
Results should continue to improve over time. DALL-E 2, unveiled in April, spits out more realistic images than the original version OpenAI announced last year, and the company’s text generation model, GPT, has become more sophisticated with each generation.
“That’s to be expected with a lot of these systems,” Dhariwal said.