Stable Diffusion is an amazing tool for marketers to generate marketing collateral.
I recently had a client who had only received a render of their product, which meant they couldn’t organize a real photoshoot.
I realized it would be more effective to use Stable Diffusion to generate a model and then Photoshop the product into the model’s hands.
Here’s the result.
General steps to generate this image
- Generate image
- Upscale if necessary
- Inpaint face, hands, feet, etc
- Place product in character’s hands
- Clone/heal areas that aren’t relevant to the image
One big challenge involves getting Stable Diffusion to pose your character properly.
In the old days, which means a few months ago in the AI world, how you worded your prompt mattered the most. You could also throw Stable Diffusion a reference image and see what happened.
Generally, I found that Stable Diffusion can comply with your needs, especially if you generate enough images. There will be a hit somewhere in there.
Using Stable Diffusion in the first part of the process
In this case, I used a “prompt-and-pray” method. I generated these images with the following prompt:
photorealistic asian man smiling holding something on a white background
Steps: 40, Sampler: DPM2 Karras, CFG scale: 7, Seed: 3640075990, Size: 512×512, Model hash: 44bf0551
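If you would rather script the generation than click through the UI, here is a minimal sketch of the same settings as a request to the AUTOMATIC1111 web UI API. It assumes the web UI was started with the `--api` flag and is listening on the default local address. Whether the nine images came from batch size or batch count is my assumption, and sampler names differ between web UI versions, so treat this as a starting point rather than a record of how the images above were made.

```python
import json
import urllib.request


def build_txt2img_payload(prompt, seed=-1, batch_count=1, batch_size=1):
    """Build a txt2img request body that mirrors the settings listed above."""
    return {
        "prompt": prompt,
        "steps": 40,
        "sampler_name": "DPM2 Karras",  # name may differ in newer web UI versions
        "cfg_scale": 7,
        "seed": seed,                   # -1 asks the web UI for a random seed
        "width": 512,
        "height": 512,
        "n_iter": batch_count,          # "Batch count" in the UI
        "batch_size": batch_size,       # "Batch size" in the UI
    }


def txt2img(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload and return the generated images as base64 strings."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["images"]


payload = build_txt2img_payload(
    "photorealistic asian man smiling holding something on a white background",
    seed=3640075990,
    batch_size=9,  # assumption: nine images in a single batch
)
# images = txt2img(payload)  # uncomment once the web UI is running with --api
```

The checkpoint is whichever model is currently loaded in the web UI, so the model hash is not part of the payload.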
There are a few other hopefuls in these nine images, but the only one that fits in terms of lighting and hand position is the guy in the blue shirt. His fingers are also rendered cleanly, which helps.
With the image chosen, the next step is to fix the guy’s face as it’s not realistic enough. Onto the inpainting!
Note: if you need to upscale the image, I’d do it before inpainting the face.
Once everything is done in Stable Diffusion, you can move on to the Photoshop part.
Using Photoshop to finalize the image
As you can see, the light on the product is coming from both the left and the right.
If your batch count is high enough, you would probably have an image that fits the lighting scenario.
If not, you could take the image into Photoshop and paint in some shadows and regenerate using img2img with the same prompt.
How did I get from the before to the after?
Upon reviewing the layers of the PSD, it seems that I placed the product in the guy’s hands, pasted the new face onto the guy, matched the luminance of the product to the photo and, finally, inpainted the areas of the guy’s shirt that were covered by the box.
Use this action to blend your product with your AI generated image
In order to blend two images taken in different settings, you need to ensure their saturation, luminance and hue match.
I used an action provided by Unmesh Dinda for his PiXimperfect channel. The tutorial below will teach you how to use it.
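To make the idea of "matching" concrete outside Photoshop, here is a rough sketch in plain Python. It is not what the Photoshop action does internally. It is a simplified stand-in that shifts the product's average lightness and scales its average saturation toward the background's, and leaves hue untouched. Pixels are assumed to be `(r, g, b)` tuples in the 0-255 range, and the function names are my own.

```python
import colorsys


def mean_lightness_saturation(pixels):
    """Return the average HLS lightness and saturation of 0-255 RGB pixels."""
    total_l = total_s = 0.0
    for r, g, b in pixels:
        _, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        total_l += l
        total_s += s
    return total_l / len(pixels), total_s / len(pixels)


def match_to_background(product, background):
    """Move the product's average lightness and saturation toward the background's."""
    target_l, target_s = mean_lightness_saturation(background)
    source_l, source_s = mean_lightness_saturation(product)
    l_shift = target_l - source_l
    # Scale saturation instead of shifting it so neutral greys stay neutral.
    s_scale = target_s / source_s if source_s > 0 else 1.0

    matched = []
    for r, g, b in product:
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        l = min(1.0, max(0.0, l + l_shift))
        s = min(1.0, max(0.0, s * s_scale))
        matched.append(tuple(round(c * 255) for c in colorsys.hls_to_rgb(h, l, s)))
    return matched
```

In practice you would sample the background pixels from the area around the hands instead of the whole image, since that is the lighting the product has to sit in.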
If you can, photograph your own product image
Of course, if you can photograph your own image, that would be ideal. That way, you can manipulate the lighting so that it matches the generated image.
Take a look at the following image:
In the course of writing this article, I challenged myself to see whether it’s possible to use stock images to create images that marketers would use.
I’d say the image above is a good image. Since an ad like this will probably get about a second of view time, most people won’t notice the difference in lighting between the beer bottle and the woman.
But yes, if you are picky with your own work, that imperfect lighting can grind your gears.
Anyway, below are the source images. A few faults that I cleaned up in Photoshop included a missing finger (I cloned a finger) and the water’s horizon being mismatched. I mirrored the image because I wanted the beer to be the focus.
The same techniques I applied above were applied to this image.
Getting Stable Diffusion to do a specific pose
With the prompt-and-pray method, you might never get what you want. Or you spend a ton of time generating samples so you can see how Stable Diffusion responds to your prompts.
Here’s how I’d go about guiding Stable Diffusion, ordered from the simplest method to the most complex. The quality of the results has a positive correlation with the complexity.
At the most basic, find an image with a pose you want and then img2img
Prior to Controlnet, I would find an image online and use img2img to ensure Stable Diffusion generated a pose similar to the one in the image.
You can also do a sketchbook of images in order to further guide Stable Diffusion with your idea.
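For anyone scripting this step, here is a minimal sketch of an img2img request body for the AUTOMATIC1111 web UI API (the `/sdapi/v1/img2img` endpoint, available when the web UI runs with `--api`). The denoising strength of 0.6 is an illustrative guess, not a value from my own runs. Lower values stay closer to the reference image, and higher values give Stable Diffusion more freedom to drift from the pose.

```python
import base64


def build_img2img_payload(image_bytes, prompt, denoising_strength=0.6):
    """Build an img2img request body from raw image bytes and a prompt."""
    return {
        # The API expects init images as base64-encoded strings.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "denoising_strength": denoising_strength,  # illustrative default
        "steps": 40,
        "cfg_scale": 7,
    }


# Stand-in bytes keep the sketch self-contained. In practice, read your
# reference image from disk, e.g. Path("pose_reference.png").read_bytes(),
# where the file name is a placeholder.
reference_bytes = b"stand-in for the bytes of a reference image"
payload = build_img2img_payload(
    reference_bytes,
    "photorealistic asian man smiling holding something on a white background",
)
```

The same payload shape works for the "paint in some shadows and regenerate" trick: export the edited image from Photoshop and send it back through img2img with the original prompt.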
Take the same image, Controlnet it
If you have the sample image you found online, you can use it in Controlnet. I have found openpose to be good along with depth. Canny is good for specific things where details are lost with openpose or depth, such as fingers or toes.
Generate your own poses using PoseMyArt then Controlnet
For example, if I wanted to create the beer image again, I would use a 3D rendering tool to create a mannequin that is in the pose I want.
I used PoseMyArt to create this:
I sent it through Controlnet’s “preview annotator” to see the results. Sometimes you get no results, especially in non-upright poses. Also, make sure to enable the Controlnet module.
By just using a single Controlnet (depth), I got decent results:
By combining depth and canny (for the hand only, see below), I got much more consistent results:
When I saw this method in action, I was amazed. The fingers were in the exact position I wanted them to be!
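If you drive the web UI through its API, stacking a depth unit and a canny unit might look something like the sketch below. It assumes the Controlnet extension is installed and that it reads its units from `alwayson_scripts`. Field names such as `input_image` have changed between extension versions, and the model names here are placeholders that must be replaced with the exact strings your install shows in the model dropdown, so check both against your own setup.

```python
import base64


def controlnet_unit(image_bytes, module, model, weight=1.0):
    """Describe one Controlnet unit: a guide image plus preprocessor and model."""
    return {
        "input_image": base64.b64encode(image_bytes).decode("ascii"),
        "module": module,  # preprocessor; "none" if the map is already rendered
        "model": model,    # placeholder name, copy the real one from your UI
        "weight": weight,
    }


def with_controlnet(payload, units):
    """Return a copy of a txt2img or img2img payload with Controlnet units attached."""
    combined = dict(payload)
    combined["alwayson_scripts"] = {"controlnet": {"args": list(units)}}
    return combined


# Stand-in bytes so the sketch runs on its own. In practice these are the
# full-body pose render and the hand-only crop.
body_render = b"stand-in for the full-body render"
hand_render = b"stand-in for the hand-only render"

payload = with_controlnet(
    {"prompt": "your prompt here", "steps": 40},
    [
        controlnet_unit(body_render, "depth", "control_sd15_depth"),
        controlnet_unit(hand_render, "canny", "control_sd15_canny"),
    ],
)
```

Multiple units also need to be enabled in the extension's settings, otherwise only the first one is used.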
This video is a good tutorial on how to use these tools.
Creating an ad using the images above
Here are the images I used to create an image of a woman being shocked at receiving a gift.
For some reason, I wondered if we would be better off just doing a face swap with Stable Diffusion.
I used Controlnet to create the pose above. This is what I used:
And the result:
Using Blender to pose
A tool by toyxyz, named “Character bones that look like Openpose for blender”, might help you further.
Instead of using an online tool, you could just use Blender. The benefits of this method aren’t clear to me.
One thing I noticed is that you’d be working directly on the openpose model, so it’s a bit abstract compared to using PoseMyArt where you can actually see the human body.
The tool is quite intimidating, but this tutorial really helped. Here are the relevant timestamps:
- 3:17 Getting into pose mode to manipulate character
- 4:55 Manipulating hands to make a peace sign
- 6:23 Moving camera to export specific portion
- 7:50 How to export either Canny or Depth map
- 9:07 How to export hands in Canny output
- 10:20 Export openpose
- 11:20 using exports in automatic1111
You might also have to install Autorig, which comes in toyxyz’s download. Here’s a video of how to install it:
Results using Blender to pose for Stable Diffusion
I exported three images from Blender:
Openpose, hands-only canny, hands-only depth
I wanted the generated image to have the “rock on” sign. Here’s what I got with the following prompt:
a woman smiling with showing a rock-and-roll hand sign outdoors
Negative prompt: monochrome, black-and-white
The rock-and-roll sign was perfect but I wondered what happened with the woman’s left hand.
As you can see from this image, the left arm and hand are supposed to be more or less straight down.
Oh well, this is Stable Diffusion. Lots to learn all the time.
Nota bene for prompting
Since it’s generally advisable to produce images that reflect the people you’re marketing to, it’s important to be obvious when prompting, even if it would be impolite in regular speech.
If you don’t specify the facial features of the character, you will likely get a Caucasian person.
There is a rumour that Caucasian and East Asian women are heavily represented in the images these models were trained on.
There is a difference between “woman” and “girl” as well as “man” and “boy”. If you want to be specific about the character’s age, use “woman” or “man” and add adjectives like “old” or “middle-aged”, or specify an age such as “in his sixties”.
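If you generate a lot of variations, it can help to assemble prompts from explicit parts so you never forget to state who the character is. Here is a small sketch. The helper and its parameter names are my own invention, not part of any Stable Diffusion tool.

```python
def build_prompt(subject, descriptors=(), age=None, action=None, setting=None):
    """Join explicit character details into a single prompt string."""
    parts = [*descriptors, subject]
    # Optional details are appended in a fixed order and skipped when missing.
    for extra in (age, action, setting):
        if extra:
            parts.append(extra)
    return " ".join(parts)


prompt = build_prompt(
    "man",
    descriptors=["photorealistic", "asian"],
    age="in his sixties",
    action="smiling holding something",
    setting="on a white background",
)
# prompt == "photorealistic asian man in his sixties smiling holding something on a white background"
```

Swapping a single argument then gives you a controlled comparison between batches, instead of rewording the whole prompt each time.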
The poses that produce the best images are upright standing poses. Standing characters will get you the most hits per batch.
Generating your own images can be rather complicated and time consuming.
If you find a stock image that you like, but the person isn’t representative of your market, try a face swap because it’s extremely quick and effective.