Building the Future: An Overview of DALL-E 2

Perhaps the most visually impressive example of the incredible power of modern Artificial Intelligence (AI) models is OpenAI’s DALL-E.

Named for Spanish artist Salvador Dalí and the main character from Pixar’s WALL-E, DALL-E is a generative AI model that creates images. Using plain text input from a user, the model can generate entirely new images and art in a variety of styles.

Founded in 2015 by a group of backers that included Elon Musk, OpenAI made waves in 2020 with the release of GPT-3, a neural network language model capable of producing human-like text. OpenAI adapted some of GPT-3's architecture for its DALL-E project.

The GPT-3 neural network demonstrated an incredible ability to replicate language patterns, though without human-level understanding of what it writes. This capability was invaluable for DALL-E, a neural network that also has to interpret plain text in order to generate original images.

Like all AI models, DALL-E needs training data to be effective. In this case, DALL-E was trained on hundreds of millions of images from the internet and their associated text, such as “alt text”: a short description that makes an image accessible to screen readers and helps search engines identify it.
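To make the idea of image and alt text pairs concrete, here is a minimal sketch of how such pairs could be pulled out of a web page using Python's standard library. It is only an illustration: the class name, the helper function and the sample HTML are assumptions made for this example, and OpenAI's actual data collection pipeline is not described in this article.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (image URL, alt text) pairs from <img> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # Attributes without a value arrive as None, so fall back to "".
        alt = (attributes.get("alt") or "").strip()
        # Keep only images that have both a source and a non-empty caption.
        if src and alt:
            self.pairs.append((src, alt))


def collect_pairs(html):
    """Return every (src, alt) pair found in an HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.pairs


# Hypothetical page fragment used only for this example.
SAMPLE_HTML = """
<p>Gallery</p>
<img src="cat.jpg" alt="A tabby cat sleeping on a sofa">
<img src="spacer.gif" alt="">
<img src="dog.jpg" alt="A dog catching a frisbee">
"""

pairs = collect_pairs(SAMPLE_HTML)
for src, alt in pairs:
    print(f"{src}: {alt}")
```

Images with an empty description, like the spacer image above, are skipped because they give a model nothing to associate with the picture.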

The actual internal workings of DALL-E are incredibly complicated and not completely intuitive. In summary, the system treats pixels as data points and constructs a multi-dimensional model of where this data lies in space. Certain groups of pixels correspond with certain colors, shapes, textures and other characteristics.

The DALL-E model has no human-level understanding of color, shape, texture, etc. However, the relative position of these data points in space is indicative of certain shared characteristics. The model can begin to associate those characteristics with words from the corresponding alt text captions of images in the training set. In this way, the model learns to associate certain words with collections of pixels, effectively allowing it to generate images based on plain text inputs.
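The association between words and image characteristics can be sketched with a deliberately tiny example. Here each "image" is reduced to a single average color, and each word is given the average color of every image whose caption contains it. The dataset, the function names and the averaging rule are all assumptions chosen for illustration; DALL-E itself relies on large neural networks rather than anything this simple.

```python
import math
from collections import defaultdict

# Hypothetical toy dataset: each "image" is reduced to an average (R, G, B)
# color and paired with a caption standing in for its alt text.
TRAINING_PAIRS = [
    ((0.9, 0.1, 0.1), "red apple"),
    ((0.8, 0.2, 0.1), "red car"),
    ((0.1, 0.2, 0.9), "blue car"),
    ((0.2, 0.1, 0.8), "blue sky"),
    ((0.1, 0.8, 0.2), "green apple"),
]


def learn_word_vectors(pairs):
    """Give each word the mean feature vector of the images it captions."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for features, caption in pairs:
        for word in caption.split():
            counts[word] += 1
            for i, value in enumerate(features):
                sums[word][i] += value
    return {w: tuple(v / counts[w] for v in sums[w]) for w in sums}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def closest_word(features, word_vectors):
    """Return the learned word whose vector points most nearly the same way."""
    return max(word_vectors, key=lambda w: cosine(features, word_vectors[w]))


def prompt_to_color(prompt, word_vectors):
    """Blend the vectors of the known words in a prompt into one color."""
    known = [word_vectors[w] for w in prompt.split() if w in word_vectors]
    if not known:
        raise ValueError("no known words in prompt")
    return tuple(sum(channel) / len(known) for channel in zip(*known))


word_vectors = learn_word_vectors(TRAINING_PAIRS)
print(closest_word((1.0, 0.0, 0.0), word_vectors))
print(prompt_to_color("red apple", word_vectors))
```

Nothing in this sketch knows what "red" means. The word simply ends up near reddish images because that is where it appeared in the captions, which is the same kind of statistical association described above, at a vastly smaller scale.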

The latest version of DALL-E progressively refines image outputs through a process known as “diffusion”. On receiving a text input, the model begins with a collection of random pixels and refines them step by step until recognizable shapes, patterns and objects appear that correspond with the requested inputs.
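The "start from random pixels and keep refining" loop can be illustrated with a toy sketch. One important caveat: in this example the refinement step is allowed to look at a known target image, which a real diffusion model never sees. In a real system a trained neural network, guided by the text input, predicts how to clean up the noise at each step. The function names, the step count and the update rule here are assumptions chosen purely for illustration.

```python
import random


def mean_abs_error(pixels, target):
    """Average absolute difference between two equally sized pixel lists."""
    return sum(abs(p - t) for p, t in zip(pixels, target)) / len(target)


def refine(target, steps=50, rate=0.2, seed=0):
    """Start from random pixels and nudge them toward `target` step by step.

    The known target stands in for the prediction a trained model would
    make; it is a simplification, not how a real model obtains its guidance.
    """
    rng = random.Random(seed)
    pixels = [rng.random() for _ in target]  # pure noise to begin with
    history = [mean_abs_error(pixels, target)]
    for _ in range(steps):
        # Move every pixel a small fraction of the way toward the target.
        pixels = [p + rate * (t - p) for p, t in zip(pixels, target)]
        history.append(mean_abs_error(pixels, target))
    return pixels, history


# A hypothetical 4x4 grayscale "image" flattened into 16 brightness values.
TARGET = [0.0, 0.0, 1.0, 1.0] * 4

final_pixels, history = refine(TARGET)
print(f"error at start: {history[0]:.3f}")
print(f"error at end:   {history[-1]:.6f}")
```

Each pass removes a little more of the randomness, so the error shrinks steadily instead of the picture appearing all at once. That gradual sharpening is the behavior the paragraph above describes.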

DALL-E 2

The success of the original DALL-E has led to an even more capable iteration, DALL-E 2. This second version is capable of generating images with 4x the resolution of the original and editing these outputs in real-time.

OpenAI states that it is currently focused on mitigating some of the complications that have arisen from the latest version of DALL-E. This work includes reducing harmful image generations and preventing users from requesting harmful content. Addressing these issues is no easy task, as thousands of images likely need to be identified and removed from the training dataset.

DALL-E 2 was released in April 2022, but access was limited and the waiting list was long. As of September 2022, DALL-E 2 is available to anyone who completes the DALL-E 2 sign-up. Now anyone can experiment with the potential of AI-generated art and AI-assisted photo editing.

Complications

For generative AI like DALL-E, training data can be a bit of a double-edged sword. In just two short years, these models have evolved from generating small, highly pixelated outputs to seemingly real-life images. Without access to millions of diverse high-quality training images and their alt text, none of this technology would be possible.

However, the need for millions of training images to make these AI models effective has unintended consequences, especially since inappropriate content can easily slip through the cracks without quality control protocols.

For now, OpenAI is working to address these issues as part of its larger mission to build AI solutions and applications that benefit humanity, inspire creative expression, and help the public understand the power of these technologies.

In the meantime, several substitutes for DALL-E 2 from independent developers are floating around the internet. Compared to the original, these models produce fairly crude results, but they still show the incredible capabilities of image-generating AI.

The Dura Digital Takeaway

DALL-E 2 is just another example of the incredible power of AI systems and technologies. It has opened up new ideas about generating one form of output from a different mode of input. One can only imagine that at some point this form of technology will be extended to video, music and other digital modes. It certainly is an exciting time to be working in AI!

At Dura Digital, we continually invest in learning new technologies so that we can provide you, our customers, with broad-scale insights and awareness that help you transform your business. Contact us for more details on how we can help you advance your business.