

Fauvist painting by DALL-E 3 via ChatGPT
DALL-E
Legacy of AI Image Generation
I heard about this new AI image generator called "DALL-E" shortly after its public release in October 2022. It piqued my curiosity, but I was apprehensive. What kind of art could a machine possibly make, and what would I do with it? A few months later, determined to learn about ChatGPT (public release: November 2022) for my teaching, I joined OpenAI's Discord community. One day, I strolled over to the DALL-E channels out of curiosity. Shocked, and then inspired, by the quality and variety of images I saw, I decided to give it a try. The first set of images I created was stunningly bad (see slideshow below). Why did my images look so crude and simplistic compared to the ones I saw on Discord? That was the beginning of my accidental journey.
I soon noticed the trick was the prompt. Experienced users each had a unique way of structuring their prompts and a favorite list of art terms, and I started incorporating their techniques into my own. Then I caught on to Bing Image Creator, a Microsoft platform powered by DALL-E 2 with a backend prompt-enhancement mechanism. I generated dozens of images every day, spent hours reading up on different art styles and techniques, and could feel my prompting skills developing by leaps and bounds. Little by little, I began to create images that reflected my vision.
After a couple of weeks, ideas started pouring out of my brain non-stop. I experimented with different art styles and started mixing them up to see what would happen. I also recognized the storytelling potential of AI-generated images: the ability to visualize a whole world in a single picture. Once teased by an art teacher because I couldn't paint inside the lines, I became excited about creative expression for the first time. My first month with DALL-E was among the most impactful experiences of my life.

One of the first images I created with DALL-E 2

Starting to get a better handle on prompting (Bing Image Creator)

Daily theme: "soup." Others focused on a bowl of soup; I wanted to depict a scene. (Bing Image Creator)

Slideshow: Early Creations with DALL-E 2


Both images made with DALL-E 2, via the OpenAI Discord server. Left: Fauvist oil painting of a girl in a blue dress. Right: Post-Impressionist painting of a bearded man in a red sweater.
While later models are better at following prompts and excel at realistic rendering, DALL-E 2 has its own quirky charm. It can produce painterly images full of character, which many OG users have sorely missed since OpenAI's online Labs closed last year.
Recently, OpenAI brought back this legacy model for limited use (5 images/day) via their Discord server. If you are already a member, look for "#image-bot" in the channel list and follow the instructions. To be added to the OpenAI Discord server, head to Discord to request an invitation. To join, you will need a ChatGPT account and a Discord account (both free).
Then Came the Change
In October 2023, about a month after I started posting my DALL-E-generated images on Discord, I was invited to participate in early testing of the much-anticipated new model, DALL-E 3. The first images I produced with the test version of DALL-E 3 knocked my socks off. Compared to a DALL-E 2 image created with the same prompt, the realism of the DALL-E 3 image was stunning. DALL-E 2 had a hard time including logos in an image and often misspelled words; DALL-E 3 improved significantly in this regard (though some errors still happen), suggesting the possibility of AI-generated graphic design. A choice of square, tall, and wide formats (DALL-E 2 offered only a square format) also expanded the range of compositions.
For all its strengths, DALL-E 3 fell short in some respects. Experienced users quickly noticed a bias toward highly saturated color palettes and "plasticky" textures. Its compositions tended to be highly symmetrical, crowded, and busy. I almost cried when DALL-E 3 butchered my tried-and-true prompt for a Nihonga painting and turned it into kitschy souvenir art. These perceived deficiencies were moderated over the next few weeks, both by adjustments in user prompts and by fine-tuning on the system side. Nevertheless, they were indicative of DALL-E 3's innate characteristics, which signaled a significant departure from the painterly style of DALL-E 2.
This early experience taught me an important lesson about AI image generation: much of one's prompt-engineering work can become worthless when, inevitably, major technological updates take place, and depending on frontier technology like DALL-E for creative work puts us on a constant learning curve. Changes and innovations happen all the time in traditional art, too, and artists must recalibrate their approaches accordingly. But the speed and extent of change are much more drastic in the world of AI, and to keep creating, we must also adapt quickly and make the most of what new technology offers, as I discovered over the last two years.

In this side-by-side comparison of a "macrophotographic image of a hummingbird hovering over prickly pear blossoms," the strength of DALL-E 3 (above) in photorealism is clearly demonstrated. The "bokeh background" in the prompt was beautifully executed with prized blurred lights. The DALL-E 2 rendering (right) looks rather painterly, without the cinematic depth of DALL-E 3.












