
Turn Selfie Into Cartoon With Stable Diffusion

If you can't draw but want a comic-book, cartoon-style interpretation of yourself, you have two options.

  • Hire someone to do a digital illustration
  • Use Stable Diffusion to generate an AI image

Ultimately, the second method is a lot cheaper and you can do it in the comfort of your own home. Let’s find out how to make a Stable Diffusion cartoon.

This tutorial assumes you're familiar with common AI image generation terms, have Automatic1111 installed, and have a general understanding of Controlnet.

General steps

  • Take a photo
  • Choose an appropriate AI model
  • Prompt and pray
  • Use Controlnet for further refinement
  • Plot different parameters to see how they affect the outcome
  • Generate a large batch and find a winner

Generating a cartoon using img2img

If you're looking for perfection, you might want to train a textual inversion embedding of yourself using Automatic1111's built-in training tab, or use Dreambooth. A LoRA is much more accurate, but it takes a lot of effort.

However, I found that using img2img gave me pretty good results, and I want to share my method.

Photograph an input image for img2img

Take your camera and photograph yourself on a white background.

As img2img is sensitive to colour, it's ideal if you can get an evenly lit face. Try standing by a window on a cloudy day.

The background also matters: if the original photo has a coloured background, or there are shadows, switches or other items on it, the generated image will have them too.

That said, even with a less-than-perfect photo, you can get pretty good results. Here’s what I took:

[Photo: Xuyun Zeng on a white background, ready for img2img in Stable Diffusion]

I photographed this image on my phone, and cropped it to 512x512px.

Artificial intelligence image generation is a huge resource hog and I only have an Nvidia RTX 2070 with 8GB of VRAM, so I can't be too ambitious.
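If you'd rather do the square crop in code than in a photo editor, here's a minimal Pillow sketch (the filenames are placeholders, not the ones I used):

```python
# Minimal sketch: centre-crop a phone photo to a square, then resize to 512x512.
from PIL import Image

img = Image.open("selfie.jpg")  # placeholder filename

side = min(img.size)            # shorter edge of the photo
left = (img.width - side) // 2
top = (img.height - side) // 2
square = img.crop((left, top, left + side, top + side))

square.resize((512, 512), Image.LANCZOS).save("selfie_512.png")
```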

I placed this into img2img.

[Screenshot: the Automatic1111 img2img tab generating a cartoon portrait]

Select the right model

In my experimentation, the model you use has a major effect on the style of the AI-generated image.

[Image: the TFM American Cartoons Model as hosted on Civitai]

I tried three models and settled on the TFM American Cartoons Model. The other two I used were the Stable Diffusion v1.5 pruned model and Realistic Vision v1.2.

I would recommend you give both the TFM American Cartoons model and Realistic Vision a shot.

Prompt and Pray

Let’s start prompting.

I have found that this prompt is a good start:

Prompt:

color comic book cartoon avatar of a male smiling, white background

Negative prompt:

3d, photorealistic

Since models can be biased towards specific races (usually Asian or Caucasian), you could experiment with putting your race in the prompt and whatever you are not in the negative prompt.

At the risk of sounding a bit unrefined, avoid using colour words in place of ethnicity names, since the model may read a colour literally. So skip “white” and use “Caucasian” or “European” instead.
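If you prefer scripting to clicking around the UI, the same img2img generation can be driven through Automatic1111's built-in API (launch the webui with the --api flag). This is only a rough sketch against the standard /sdapi/v1/img2img endpoint; the filenames, port and parameter values are placeholders to adapt to your own setup, and the request runs against whichever checkpoint is currently selected in the webui, so pick your model there first.

```python
# Rough sketch: send one img2img request to a locally running Automatic1111 instance.
import base64
import requests

with open("selfie_512.png", "rb") as f:          # placeholder input file
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_image],
    "prompt": "color comic book cartoon avatar of a male smiling, white background",
    "negative_prompt": "3d, photorealistic",
    "denoising_strength": 0.9,   # placeholder values; tune with the X/Y plot below
    "cfg_scale": 7,
    "steps": 25,
    "width": 512,
    "height": 512,
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
r.raise_for_status()

# The API returns base64-encoded images.
with open("cartoon.png", "wb") as out:
    out.write(base64.b64decode(r.json()["images"][0]))
```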

Controlnet Canny

To increase fidelity to the source photo, I used Controlnet to further guide the artificial intelligence.

[Screenshot: setting up the Controlnet canny preprocessor and model in Automatic1111]

Upload your image to Controlnet and make sure to enable it.

Choose “canny” for the preprocessor and do the same for the model (bottom red rectangle).

Since my image is 512x512px, I didn’t have to change anything related to dimensions.

Then press “preview annotator result” at the bottom.
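If you're curious what that annotator preview is actually handing to the model, you can approximate it outside the webui with OpenCV's Canny edge detector. A small sketch; the 100/200 thresholds mirror the extension's usual defaults, but check your own sliders:

```python
# Approximate the Controlnet canny preprocessor: plain Canny edges on the input photo.
import cv2

img = cv2.imread("selfie_512.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
edges = cv2.Canny(img, 100, 200)  # low/high thresholds, matching the extension's defaults
cv2.imwrite("canny_preview.png", edges)
```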

Results

With this prompt and pray technique, plus Controlnet, here’s what I got. Some keepers:

Some duds:

And some just make me scream, “who dat”.

X/Y plot: denoise vs CFG

Regardless of the model you choose, I'd recommend doing an X/Y plot of denoising strength against CFG scale.

Depending on the model you use, your results might sway from what you want to something completely unusable. Take a look:

[Image: X/Y plot of CFG scale against denoising strength for Stable Diffusion cartoon creation]

There's a sweet spot, highlighted in the red square, between CFG 5 and 11 and a denoising strength of 0.7 to 1.0.

At lower denoising strengths, you get realistic images. At higher CFGs, you get abstract paintings.
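If you'd rather script this sweep than use the webui's built-in X/Y plot script, here's a rough sketch against the same /sdapi/v1/img2img endpoint as before. A fixed seed keeps the grid comparable, and the ranges are just a starting point:

```python
# Rough sketch: sweep CFG scale and denoising strength over a fixed seed.
import base64
import requests

with open("selfie_512.png", "rb") as f:  # placeholder filename
    init_image = base64.b64encode(f.read()).decode()

base_payload = {
    "init_images": [init_image],
    "prompt": "color comic book cartoon avatar of a male smiling, white background",
    "negative_prompt": "3d, photorealistic",
    "steps": 25,
    "seed": 42,  # fixed so only CFG and denoising vary between images
}

for cfg in [3, 5, 7, 9, 11, 13]:
    for denoise in [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
        payload = {**base_payload, "cfg_scale": cfg, "denoising_strength": denoise}
        r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
        with open(f"cfg{cfg}_denoise{denoise:.1f}.png", "wb") as out:
            out.write(base64.b64decode(r.json()["images"][0]))
```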

It wasn't until I switched from Realistic Vision to the TFM American Cartoons model that I started getting far more hits.

Consistently, TFM produces much more cartoony and comic-like images. The cartoon faces also look much more like me.

It’s now just a question of seeing what I prefer and locking in the parameters in Automatic1111.

Prompt. Pray. Spray.

Now that you’ve found the perfect settings, it’s time to pray and spray.

Lock in your ideal denoising strength. Choose the model which gives you the best chance of a positive outcome.

Then, make a large batch of images.
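Through the API, that just means locking the values you found and bumping batch_size and n_iter with a random seed. Another rough sketch with placeholder values:

```python
# Rough sketch: generate a large batch of candidates at the locked-in settings.
import base64
import requests

with open("selfie_512.png", "rb") as f:  # placeholder filename
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_image],
    "prompt": "color comic book cartoon avatar of a male smiling, white background",
    "negative_prompt": "3d, photorealistic",
    "cfg_scale": 7,              # sweet-spot values found above
    "denoising_strength": 0.9,
    "seed": -1,                  # random seed for every image
    "batch_size": 2,             # images generated in parallel (easy on 8GB of VRAM)
    "n_iter": 10,                # batches run back to back, 20 candidates in total
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
for i, b64 in enumerate(r.json()["images"]):  # may also include a grid, depending on settings
    with open(f"candidate_{i:02d}.png", "wb") as out:
        out.write(base64.b64decode(b64))
```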

Find your keeper.

Done!

Besides the apparent male-pattern baldness, the three images above are winners.

Your model matters. Your prompt matters.

If I had to create a cartoon avatar of a headshot, I’d do the following:

  • Take the best portrait
  • Use the prompt above and refine it based on what I’m getting
  • Run an X/Y CFG-denoising strength plot, and find a sweet spot (e.g. CFG 7 and denoising 0.9)
  • Take this sweet spot and generate more samples.
  • Pick the winner output image and upsample it if necessary.

AI is amazing for communications and marketing professionals, especially if you are looking for a model to market your products.
