AI And Artistry: Conjuring A Magical Universe With Midjourney
Updated: Sep 27
Lessons In Selecting 60 Illustrations Out Of 1,000s Of Generative-AI Images
When I read a story, I imagine it in my mind. An illustration breathes life into the text of a storybook. After creating a story using ChatGPT, my next step toward a personalized storybook for my wife was to paint its fictional universe. I wanted to make a vivid storybook beyond the 6,000 words in it.
In the previous articles, I started referring to ChatGPT as C. C was not a person, but it was more than a tool: C was a collaborator. Still, I'll give its full name once in each section, because a reader may skip straight to one.
In this article, I share my vibrant but challenging experience creating AI-generated art with Midjourney. AI helped me build my own magical universe, one image at a time.
Choosing The Perfect Illustration
My initial plan with Midjourney was to generate visuals for each key scene. I envisioned one for the shopping district, one for the ice cream parlor, one for the auction house, and one for a mysterious potion lab. But why restrict myself to just 4 or 5 visuals? With Midjourney, I could create so many more.
My dream was to cast medieval watercolor renditions of my spouse and me as the central characters in the story, engaged in the story's activities. But generating such images with generative AI, without resorting to Photoshop, proved challenging. So I turned to abstract art to illustrate parts of the story. Instead of portraying a woman casting a spell with her wand, I illustrated a wand releasing sparkling magical waves.
For image generation, I created an order of priority:
Location of the Scene: Based on Matthew Dicks’ advice, the setting is pivotal in helping readers visualize the story.
Food: Since my wife and I are enthusiastic foodies.
Magic: My wife and I are drawn to magical stories.
Characters: My goal was to show abstract or silhouette versions of characters, given the complexity.
Selecting The Artistic Medium And Inspiration
C (ChatGPT) and I collaborated to identify the artistic medium, historical era, and other specifics to refine the prompts for Midjourney. My aim was to align with the ambiance of the Harry Potter universe. After experimenting with various styles in Midjourney, I zeroed in on using medieval, magical, and watercolor.
what artistic medium, historical periods, location, etc. should I append to midjourney prompts to get illustrations that will suit scenes set in the harry potter universe? Let’s Think Step by Step
C is not good at creating image generation prompts. C tends to create detailed and lengthy prompts. This is a problem because Midjourney does not parse grammar.
C’s example prompt:
Fabian Fortescue, with lines of worry on his face, converses with Sylvia and Hector. Overhead, a magical owl flies, possibly carrying news. Lanterns hanging from the shops cast dim lights onto the street, adding to the ambience. Medieval, Magical, Watercolor.
Midjourney itself shortens C's prompt by cutting out unnecessary words:
Fabian Fortescue, his, owl, news. Lanterns hanging, shops cast dim lights onto the street, adding, Medieval, Watercolor
My chosen prompt for the scene:
concerned happy man old, wizard, eccentric dress, medieval, magical, watercolor, night, dark
On one hand, my prompt emphasized only one part of the overall scene. On the other hand, when we add a lot of keywords, Midjourney's output favors inconsequential aspects. It also does not know to deprioritize proper nouns, unless they are metonymic names.
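To make the contrast concrete, here is a rough Python sketch of the idea of reducing a verbose scene description to keyword form. Midjourney's actual trimming logic is not public, so the stopword list and the `keywordize` helper are purely my illustrative assumptions:

```python
# Rough sketch: reduce a verbose scene description to keyword form.
# Midjourney's real shortening logic is not public; this simply drops
# common function words, an assumption meant only to illustrate the idea.
STOPWORDS = {
    "a", "an", "the", "with", "of", "on", "onto", "to", "from", "his",
    "her", "and", "in", "into", "possibly", "adding", "overhead",
}

def keywordize(prompt: str) -> str:
    # Strip punctuation, then keep only content-bearing words.
    words = prompt.replace(",", " ").replace(".", " ").split()
    kept = [w for w in words if w.lower() not in STOPWORDS]
    return ", ".join(kept)

verbose = ("Lanterns hanging from the shops cast dim lights onto the street, "
           "adding to the ambience. Medieval, Magical, Watercolor.")
print(keywordize(verbose))
# → Lanterns, hanging, shops, cast, dim, lights, street, ambience, Medieval, Magical, Watercolor
```

The point is not the specific word list but the shape of the result: a flat keyword sequence, which is closer to what Midjourney actually responds to.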
Crafting Midjourney Prompts For Illustrating The Storybook
I have divided my writing of Midjourney prompts into two areas:
abstract images and
the puppet method.
Here is a roughly one-minute video walkthrough of all the images I generated using Midjourney:
Four Learnings In Creating Abstract Images
I had four learnings:
Consistent artistic style
Simplifying scenes with multi-prompts
Using existing artwork as a base
Editing parts of an image
1 - I stuck to an artistic style of 'watercolor', 'medieval', and 'magical' in my prompts for consistency across the images.
2 - Midjourney does not understand grammar. A detailed scene, for instance, a young Indian man with specific attributes enjoying a magical ice cream in a specific setting, becomes convoluted for it. My workaround was to lower my expectations by simplifying the goal and providing independent, digestible prompts. These are called multi-prompts.
When I gave a single plain prompt, the output did not match my expectations:
inside Ice Cream Parlour. illustrations on wall. leprechaun design on wall. victorian, magical, fantasy, watercolor.
Whereas, with a multi-prompt, I got closer to my desired result by adjusting the emphasis of each component. The leprechaun design still did not appear on the walls, and I was hesitant to increase its weight because doing so distorted the people's faces.
kids dressed as witches wizards eating ice cream, watercolor, medieval:: chairs tables in ice cream parlour, victorian, magical, fantasy, watercolor::2 leprechauns wall design watercolor, medieval::0.5
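Assembling multi-prompts by hand gets error-prone, so here is a minimal Python sketch of a helper that joins parts with Midjourney's `::` separator and appends non-default weights. The helper itself is hypothetical; only the `::` weight syntax comes from Midjourney:

```python
# Sketch of a small helper to assemble Midjourney multi-prompts.
# Each part is a (text, weight) pair; parts are joined with the "::"
# separator, and a weight other than the default 1 is appended after it.
def multi_prompt(parts):
    pieces = []
    for text, weight in parts:
        if weight == 1:
            pieces.append(f"{text}::")          # default weight: bare separator
        else:
            pieces.append(f"{text}::{weight}")  # explicit weight after "::"
    return " ".join(pieces)

prompt = multi_prompt([
    ("kids dressed as witches wizards eating ice cream, watercolor, medieval", 1),
    ("chairs tables in ice cream parlour, victorian, magical, fantasy, watercolor", 2),
    ("leprechauns wall design watercolor, medieval", 0.5),
])
print(prompt)
```

Running it on those three parts reproduces the ice cream parlour prompt above, with the wall design de-emphasized at weight 0.5.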
3 - I found that using existing artwork was a workaround for not knowing the picture-perfect prompt: I could provide the artwork as an image input to Midjourney and iterate from there. For example, I started with the Diagon Alley image from this fandom page and edited it to be rainy and in a similar aesthetic, by adding that image as an input to this prompt:
wizarding alley::1 dusk, evening::2 floating lanterns::0.5 scattered rain, watercolor, magical, medieval::2
4 - Sometimes, I edited parts of an image that did not work for me. I once had an image of Sylvia, the protagonist, eating something, and I wanted to change it so she was eating ice cream instead. It was hard, but small tweaks helped: by varying the region around her hands so she held a spoon of ice cream, I got a better result.
The Puppet Method For Characters
I discussed my project with the Rands Leadership community, Micah Freedman, Michael Shostack, and Zachary Cohn. Michael had created a locally trained virtual model from a person's images, which could be used for a personalized greeting. Zachary had created an illustrated children's storybook as a personal project using gen-AI and Photoshop. Micah shared steps and templates he had found for a method that keeps character images consistent across a story.
I learned about the puppet method: you use a single body outline and adjust it for varied poses to get visual uniformity. Tech N Trendz and Haoshuo have explanations, examples, and a template. After reviewing those blogs and more on YouTube, I tested this method on a few images. I tried starting from a photo of my wife or of me, and I tried starting from a text prompt to Midjourney. When I used my photo with a long prompt, Midjourney created a person who, to my eyes, looked nothing like me.
On the other hand, when I started with a text-based prompt, further variations drifted in useless directions. For example, in the grid below, you will see my attempts to add some facial hair, put on a disguise, make him hold a wand, give him a happy expression, add a glimmer to his eyes, and drench him in rain. Midjourney fixated on the circular light behind him and on his somber expression. It also often turned him into a girl.
young indian man, rugged look, medieval, magical, wizard, watercolor:: wand in hand:: light brown skin:: toussled black hair:: rimless round spectacles::
But I was able to improve the output when I fed the puppet image after removing its background using remove.bg.
Comparing Generative AI: 2 Differences In Image Vs Text
1 - Feedback Loop
I can use C (ChatGPT), a text gen-AI tool, to generate some text and then independently ask it to evaluate that text. Midjourney cannot evaluate images. It can describe an image fed to it, but I know of no way to have it judge one. I need to find a way to use the 'describe' command and C together to evaluate image generations.
I will lay out a potential way to use describe for a feedback loop. I provide a text prompt to Midjourney, which then creates an image. After selecting an image, I will ask Midjourney to describe it. Then, I will request ChatGPT to compare the differences between my original text and the description. Additionally, I can use the description as a new input for Midjourney.
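As a sketch of the comparison step in that loop, here is a minimal Python stand-in. In the outline above, ChatGPT would do the comparison; this hypothetical `missing_keywords` helper substitutes a simple keyword-overlap check, reporting which prompt keywords the /describe output failed to echo:

```python
# Sketch of the comparison step in the proposed feedback loop: given the
# original prompt and Midjourney's /describe output for the generated
# image, report which prompt keywords the description did not echo.
# (ChatGPT would do this comparison in practice; a plain keyword-overlap
# check stands in for it here.)
def missing_keywords(prompt: str, description: str) -> list:
    desc_words = set(description.lower().replace(",", " ").split())
    keywords = [k.strip() for k in prompt.lower().split(",")]
    # A keyword is "missing" if any of its words is absent from the description.
    return [k for k in keywords
            if not all(part in desc_words for part in k.split())]

prompt = "wizard, eccentric dress, medieval, watercolor, night"
description = "a medieval wizard in a watercolor painting at night"
print(missing_keywords(prompt, description))
# → ['eccentric dress']
```

The missing keywords could then be fed back into the next Midjourney prompt with a higher weight, closing the loop.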
2 - Scaffolding
I wrote the story in multiple steps with ChatGPT. One way to summarize it is in three steps: I built a story scaffold with C, then generated the story, and lastly reviewed the output. In contrast, no image generation tool can assist with crafting the visual prompts, and none can explain the steps needed to generate images that meet a goal. I wrote more about scaffolding in my earlier article here.
From Illustrations And Text To Paper
After getting the images and the text ready, the next step is to lay them out together on the page. I talk more about it in the next article:
Drafting Book Metadata with ChatGPT: How AI can speed up the ancillary tasks like writing the perfect book metadata.
Clocking the Hours: Tracking time spent on this project.
My Toolbox for an Illustrated Storybook: The software tools I used and gave up during the creation process.
And I’ll end the next article by sharing the finished storybook.