Caption-driven illustrations have become popular online. They are produced by text-to-image programs, in which people type a prompt that artificial intelligence models such as DALL-E convert into an image.
One Twitter user, for instance, posted the prompt "To be or not to be, rabbi holding avocado, marble sculpture" alongside a sophisticated image of a marble statue of a bearded man in a robe and a bowler hat, grasping an avocado.
These artificial intelligence (AI) models include Google's Imagen software and DALL-E 2, built by OpenAI, a start-up backed by Microsoft. According to OpenAI, DALL-E 2 can generate realistic images and art from natural language descriptions.
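To give a concrete sense of what "natural language descriptions" means in practice, here is a minimal sketch using OpenAI's public Python SDK. It is an assumption that the reader has API access; the public Images endpoint was opened to developers separately from the gated preview described in this article, and the call shown uses the 0.x version of the openai package.

```python
# Minimal sketch of prompt-to-image generation, assuming the openai 0.x Python
# SDK and an API key with image-generation access; this is not the gated
# DALL-E 2 research preview described in the article.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in the environment

response = openai.Image.create(
    prompt="To be or not to be, rabbi holding avocado, marble sculpture",
    n=1,                # number of images to generate
    size="1024x1024",   # supported sizes include 256x256, 512x512, 1024x1024
)

print(response["data"][0]["url"])  # temporary URL of the generated image
```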
Those trying to access DALL-E 2 must join a waiting list and indicate whether they are professional artists, developers, academic researchers, journalists, or online creators.
However, Google and OpenAI have not made the technology broadly available to the public, so only a small group of users can share their pictures and generate engagement. Most of the first users have been friends and relatives of employees.
Nevertheless, a version called DALL-E Mini, which draws on open-source code from a loosely organized team of developers, is publicly available. It is often overloaded with demand, so those who try to use it may be met with the message "Too much traffic, please try again."
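The "too much traffic" message is simply a capacity signal, and scripts that call an overloaded public demo typically handle it by waiting and retrying. The sketch below is hypothetical: the endpoint URL and response format are placeholders, not the actual DALL-E Mini service.

```python
# Hypothetical sketch of retrying an overloaded text-to-image endpoint.
# The URL and JSON shape are placeholders, not the real DALL-E Mini API.
import time
import requests

ENDPOINT = "https://example.com/generate"  # placeholder endpoint

def generate(prompt: str, max_retries: int = 5) -> bytes:
    """Request an image, backing off when the service reports it is overloaded."""
    delay = 2.0
    for _ in range(max_retries):
        resp = requests.post(ENDPOINT, json={"prompt": prompt}, timeout=60)
        if resp.status_code == 200:
            return resp.content                # raw image bytes
        if resp.status_code in (429, 503):     # "too much traffic" style errors
            time.sleep(delay)
            delay *= 2                         # exponential backoff before retrying
            continue
        resp.raise_for_status()
    raise RuntimeError("Service still overloaded after retries")

if __name__ == "__main__":
    image = generate("rabbi holding avocado, marble sculpture")
    with open("output.png", "wb") as f:
        f.write(image)
```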
The limited rollout is similar to that of Google's Gmail, whose early users could only get in by invitation, leaving millions waiting. Gmail is now one of the most widely used email services in the world.
Joanne Jang of OpenAI wrote on the company's help page that it is working hard to expand access, but that this is likely to take time. As of June 15, OpenAI had invited 10,217 people to try its text-to-image program.
Google trained its Imagen model on hundreds of its in-house AI chips, using 460 million internal image-text pairs along with outside data. Models like these identify the essential parts of a user's prompt and predict how best to illustrate them, which has produced increasingly refined text-to-image services.
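To make "image-text pairs" concrete: the training data for models like these is essentially a very long list of images, each paired with a caption. A minimal sketch, assuming a simple CSV manifest; this is illustrative only and not Google's internal data pipeline.

```python
# Illustrative only: what an image-text training pair looks like, assuming a
# CSV manifest with image_path and caption columns. Not Google's actual
# Imagen data pipeline.
import csv
from dataclasses import dataclass

@dataclass
class ImageTextPair:
    image_path: str   # where the image file lives
    caption: str      # the text description paired with it

def load_pairs(manifest_csv: str) -> list[ImageTextPair]:
    """Read (image_path, caption) rows from a manifest file."""
    with open(manifest_csv, newline="", encoding="utf-8") as f:
        return [ImageTextPair(row["image_path"], row["caption"])
                for row in csv.DictReader(f)]
```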
For all that refinement, the interface is simple: a text box, a button to start generation, and an area that displays the resulting images. Google and OpenAI add watermarks to the bottom-right corner of pictures from Imagen and DALL-E 2 to identify their source.
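The bottom-right watermark is a simple image-compositing step. Here is a minimal sketch using the Pillow library; the file names and margin are assumptions, and this is not the companies' actual watermarking code.

```python
# Minimal sketch of stamping a small watermark into the bottom-right corner
# of a generated picture, using Pillow. File names and the margin are
# assumptions, not Google's or OpenAI's actual watermarking code.
from PIL import Image

def add_watermark(picture_path: str, logo_path: str, out_path: str, margin: int = 8) -> None:
    picture = Image.open(picture_path).convert("RGBA")
    logo = Image.open(logo_path).convert("RGBA")

    # Position the logo flush with the bottom-right corner, inset by a margin.
    x = picture.width - logo.width - margin
    y = picture.height - logo.height - margin

    picture.alpha_composite(logo, dest=(x, y))   # respects the logo's transparency
    picture.convert("RGB").save(out_path)

add_watermark("generated.png", "logo.png", "generated_watermarked.png")
```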
The companies and groups building the software are justifiably concerned about everyone storming the gates at once. San Francisco-based OpenAI recognizes the harm that a model which learned to make images by scouring the web could cause.
Engineers trained the AI models on extensive collections of words and pictures from the web. To manage the danger, employees removed violent content from the training data, and filters stop DALL-E 2 from generating images that would violate the company's policy against nudity, violence, conspiracies, or political content.
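As a rough illustration of what a prompt-level filter looks like, here is a toy keyword check. It is not OpenAI's actual moderation system, and the blocked terms are placeholders.

```python
# Toy illustration of a prompt-level content filter: reject prompts that
# mention blocked categories before sending them to the image model.
# This is NOT OpenAI's actual moderation system; the terms are placeholders.
BLOCKED_TERMS = {"nudity", "gore", "violence"}   # placeholder category keywords

def is_allowed(prompt: str) -> bool:
    """Return False if the prompt contains any blocked term."""
    words = set(prompt.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)

for prompt in ["marble sculpture of a rabbi holding an avocado",
               "a scene of graphic violence"]:
    print(prompt, "->", "allowed" if is_allowed(prompt) else "blocked")
```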
Results continue to improve over time. DALL-E 2, introduced in April, produces more realistic images than the original DALL-E, which OpenAI announced last year.
The company's text-generation model, the Generative Pre-trained Transformer (GPT), has likewise become more sophisticated with each generation. So, despite the risks, OpenAI is excited about what the technology can enable.