An astronaut riding a horse on the Moon or a soup monster knitted from wool: DALL-E 2 neural network released

OpenAI has released DALL-E 2, a new version of its DALL-E neural network that draws pictures from a text description. The new model generates larger images with lower latency and, unlike its predecessor, can edit existing pictures, The Verge reports.

“Astronaut riding a horse in a photorealistic style”

Image editing is one of the main new features: users can upload a picture, select a region in it, and have the model replace that region. For example, you can upload a photo of a room, restrict the neural network to a single painting on the wall, and tell it to swap the painting for another or erase it entirely. The new model understands not only the objects themselves but also the details that accompany them, such as the shadows they cast.

Another feature generates variations that resemble an uploaded original. DALL-E 2 can also create an image based on two others, borrowing elements from both. It is worth noting separately that the generated images are 1024 × 1024 pixels, a significant improvement over the 256 × 256 pixels of the first DALL-E.

“A bowl of soup that looks like a monster knitted out of wool”

The updated neural network is based on the CLIP computer vision system. Researcher Prafulla Dhariwal from OpenAI commented:

In DALL-E 1, we simply took our approach from GPT-3 and applied it to image generation: we compressed the image into a series of words and just learned to predict what comes next.

But word matching did not always capture what people considered most important, and the prediction process limited the realism of the images. The CLIP system was designed to analyze images and briefly describe their content the way a human would. OpenAI reversed this process to create unCLIP, which, roughly speaking, starts from a description and works back to an image. In other words, DALL-E 2 generates images using diffusion.
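The diffusion idea mentioned above can be illustrated with a toy sketch (this is not OpenAI's code; all names and values here are illustrative assumptions): a clean image is gradually corrupted with Gaussian noise, and generation amounts to running that process in reverse, denoising step by step. If the model could predict the added noise perfectly, the original image would be recovered exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(x0, alpha_bar, rng):
    """Noise a clean 'image' x0 to level alpha_bar (0 < alpha_bar <= 1)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

def denoise_estimate(xt, eps_hat, alpha_bar):
    """Estimate x0 from the noisy xt given a noise prediction eps_hat."""
    return (xt - np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)

x0 = rng.standard_normal((8, 8))         # stand-in for a tiny image
xt, eps = forward_noise(x0, 0.5, rng)    # noised version of the image
x0_hat = denoise_estimate(xt, eps, 0.5)  # with the true noise, x0 comes back

print(np.allclose(x0, x0_hat))  # True
```

In the real model, a neural network predicts the noise from the noisy image and the text embedding, so the reverse process is repeated over many small steps rather than solved in one shot as here.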

For now, DALL-E 2's capabilities are shown only on the official website; the developers do not plan to make the model publicly available. Interested researchers can only apply to test the preview version of the neural network.

Source: Trash Box
