
Training a Latex Pony LoRA

By twiByte
Created: 2024-02-20 09:57:39
Updated: 2024-04-18 23:53:24
Expiry: Never

This guide is specifically meant for training LoRA models on 2D pony images featuring latex clothes, but will likely apply to most pony images regardless of subject and style. It also assumes that you already know how to use LoRAs in your Stable Diffusion interface of choice.

Prerequisites

  • An NVIDIA GPU with at least 18GB of VRAM.
    • Untested on AMD/Intel GPUs and may produce poor results.
  • Windows 11.
    • Untested on other operating systems.
  • Python.
    • Check Kohya's recommended version.
  • Pony Diffusion V6 XL and its matching SDXL VAE.
  • Kohya GUI.
    • https://github.com/bmaltais/kohya_ss
    • If using an NVIDIA GPU, make sure you install the CUDA libraries as specified in the installation instructions and check that xformers is enabled in the interface for faster model training.
    • It may throw exception errors the first time you run it.
      • Read the errors; you may just need to install missing Python dependencies (such as "accelerate") into Kohya's virtual Python environment.
      • Install missing dependencies using the included Python environment found in the "venv" folder instead of using your system's Python libraries!
  • AUTOMATIC1111 (or equivalent) Stable Diffusion interface.
  • ~10-50 uncaptioned artist images.
    • You'll need to source these yourself!
    • Too few may make prompting ineffective or extremely rigid (i.e. undercooked).
    • Too many may "burn" your prompted images (i.e. overcooked).
    • Try to avoid having any text or watermarks in your images.
    • It's crucial that you don't use images of "decapitated/floating" pony portraits/busts, as this may confuse the model.
      • The occasional partially occluded pony image (e.g. a pony partially standing off-screen) may help with training.
    • Avoid having characters be in "confusing" poses/predicaments (e.g. rope suspension bondage, partial mummification) UNLESS that's the specific element you want to train for.
    • Stick to one type: ponies, Equestria Girls, anthros or humans.
      • LoRAs trained exclusively on pony images usually work well with humanoid forms by default.
      • If low on artist images, you may be able to get away with the odd mixed image as long as the rendering style is consistent.
  • ~100 uncaptioned all-purpose pony regularisation images.
    • https://twibooru.org/galleries/458
      • Use gallery-dl or a similar tool to download everything in this gallery.
    • If using your own, use a mixture of clothed/naked pony vectors and any unwatermarked screencaps containing ~1-3 ponies.
    • Helps the model understand which objects and rendering styles are desired in your outputs.
    • Adding a few images of occluded ponies may help with training.
    • Stick to one type: ponies, Equestria Girls, anthros or humans.
      • LoRAs trained exclusively on pony images usually work well with humanoid forms by default.
      • Your class prompt (see below) must describe the character format you choose (e.g. pony).
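
Before training, you can quickly sanity-check both folders by counting the images in each. The sketch below is a hypothetical helper and is not part of Kohya. The set of image extensions is an assumption, and the 10-50 range simply mirrors the rough guidance above.

```python
from pathlib import Path

# Assumption: these are the image formats used in your dataset folders.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def count_images(folder: Path) -> int:
    """Count image files directly inside `folder` (subfolders are ignored)."""
    return sum(
        1 for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def check_counts(artist: int, reg: int) -> list:
    """Return warnings based on the guide's rough ~10-50 artist image range."""
    warnings = []
    if artist < 10:
        warnings.append(f"{artist} artist images: may be undercooked")
    elif artist > 50:
        warnings.append(f"{artist} artist images: may be overcooked")
    if reg == 0:
        warnings.append("no regularisation images found")
    return warnings
```

For example, check_counts(count_images(Path("artist")), count_images(Path("reg"))) returns an empty list when both folders look reasonable. The folder names "artist" and "reg" here are placeholders.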

Training with Kohya

In the Kohya GUI, the only settings you need to change under the LoRA tab are:

  • Click on the LoRA tab at the top first!
    • The Dreambooth tab may be selected by default instead and looks practically identical.
  • Source model
    • Model Quick Pick
      • Set this to "custom".
    • Pretrained model name or path
      • Set this to your Pony Diffusion V6 XL model path.
    • Tick the "SDXL Model" checkbox.
  • Folders
    • Image/regularisation/output/logging folders
      • Will be automatically filled in after the dataset preparation steps (see below).
    • Model output name
      • Can be anything, although I always keep it the same as the instance prompt (see below).
  • Parameters
    • Basic
      • Max resolution
        • Needs to be changed from 512,512 to 1024,1024 to capture most details.
  • Dataset preparation
    • Instance prompt
      • This keyword must be present in your prompts to trigger the LoRA model.
      • Use a deliberately clunky keyword such as: latexanon
      • Must absolutely NOT be similar to an existing artist's booru tag!
    • Class prompt
      • Should just be set to: pony
        • You may need to change this if your artist images are mostly non-pony.
    • Training images
      • Set this to the folder containing your artist images.
      • Repeats
        • For best results: total repeats = ~1,800 / total artist images (e.g. 30 artist images gives 60 repeats).
    • Regularisation images
      • Set this to the folder containing your regularisation images.
      • Repeats
        • Leave this set to 1.
    • Destination training directory
      • Can be any empty folder you want.
    • Make sure you click the "Prepare training data" and "Copy info to Folders Tab" buttons once you're done here.
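
The repeats rule above is simple arithmetic, worked through in Python below. The sketch also reproduces the folder names that Kohya's "Prepare training data" step is assumed to create: <repeats>_<instance prompt> <class prompt> for training images and <repeats>_<class prompt> for regularisation images. The step estimate assumes that regularisation images double the steps per epoch at batch size 1. Treat all of this as an assumption and check your Kohya console output for the real figures.

```python
import math


def repeats_for(num_artist_images: int, target: int = 1800) -> int:
    """Repeats per artist image so that images x repeats is roughly `target`."""
    if num_artist_images <= 0:
        raise ValueError("need at least one artist image")
    return max(1, round(target / num_artist_images))


def kohya_folder_names(repeats: int, instance: str, class_prompt: str,
                       reg_repeats: int = 1) -> dict:
    """Assumed folder layout created by Kohya's "Prepare training data"."""
    return {
        "img": f"img/{repeats}_{instance} {class_prompt}",
        "reg": f"reg/{reg_repeats}_{class_prompt}",
        "model": "model",
        "log": "log",
    }


def expected_steps(num_artist_images: int, repeats: int, epochs: int = 1,
                   batch_size: int = 1, with_reg: bool = True) -> int:
    """Estimate total steps; assumes regularisation images double each epoch."""
    images_per_epoch = num_artist_images * repeats
    if with_reg:
        images_per_epoch *= 2
    return math.ceil(images_per_epoch / batch_size) * epochs
```

With 30 artist images, repeats_for(30) gives 60, and expected_steps(30, 60) gives 3,600. That matches the ~3,600 steps mentioned under "During training".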

Once all that's done, click on the "Start training" button to begin!

During training

  • Keep an eye on the avr_loss value.
    • If it ever says NaN (i.e. Not a Number), the model is likely to be dead.
      • Check your artist images for consistency and possibly remove a few "odd" images before trying again.
    • This should be around 0.05 for the best results.
  • If using an NVIDIA GPU and you exceed your VRAM, the GPU driver may start using your system RAM or page file instead, causing training to become EXTREMELY slow.
    • Check dedicated VRAM usage on Windows 10+ in the Task Manager to confirm.
  • It takes about an hour with a 4090 to train a model with ~3,600 steps at ~1.8it/s (iterations/steps per second).
    • Slower GPUs may show s/it (seconds per iteration/step) instead.
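
If you copy Kohya's console output into a text file, a short script can pull out every avr_loss value and flag a NaN. This is a hypothetical helper. It assumes the progress bar prints values in the form avr_loss=0.0512 (or avr_loss=nan), so adjust the regular expression if your output differs.

```python
import math
import re

# Assumed console format: "... 1.80it/s, avr_loss=0.0512]" or "avr_loss=nan".
AVR_LOSS = re.compile(
    r"avr_loss=([-+]?(?:\d+\.?\d*(?:[eE][-+]?\d+)?|nan|inf))", re.IGNORECASE
)


def avr_losses(log_text: str) -> list:
    """Extract every avr_loss value, in order, as floats."""
    return [float(v) for v in AVR_LOSS.findall(log_text)]


def summarise(values: list) -> str:
    """Flag any NaN (likely dead model) or report the latest value against ~0.05."""
    if not values:
        return "no avr_loss values found"
    if any(math.isnan(v) for v in values):
        return "NaN detected: the model is likely dead"
    return f"latest avr_loss={values[-1]:.4f} (aim for ~0.05)"
```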

After training

  • Install the LoRA model in your Stable Diffusion interface.
    • For AUTOMATIC1111, copy the model file into the models/Lora folder of your installation.
  • Make sure both the Pony Diffusion V6 XL model and separate VAE are selected in your interface.
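
You can also script the copy step. This sketch assumes Kohya saved the LoRA as a .safetensors file in its output ("model") folder and takes your AUTOMATIC1111 install folder as an argument. Both paths are placeholders for your own.

```python
import shutil
from pathlib import Path


def newest_lora(model_dir: Path) -> Path:
    """Return the most recently modified .safetensors file in Kohya's output folder."""
    files = sorted(model_dir.glob("*.safetensors"), key=lambda p: p.stat().st_mtime)
    if not files:
        raise FileNotFoundError(f"no .safetensors files in {model_dir}")
    return files[-1]


def install_lora(model_dir: Path, webui_dir: Path) -> Path:
    """Copy the newest LoRA into <webui_dir>/models/Lora and return the new path."""
    dest_dir = webui_dir / "models" / "Lora"
    dest_dir.mkdir(parents=True, exist_ok=True)
    src = newest_lora(model_dir)
    return Path(shutil.copy2(src, dest_dir / src.name))
```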

Recommended prompt settings

  • Euler a sampler.
  • CFG scale 4-7.
  • 40-80 sampling steps.
  • Use a width and height of 1,024 or less to avoid "longcat"-esque bodies and other strange anomalies.
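
If you start AUTOMATIC1111 with its --api flag, you can send the same settings to its /sdapi/v1/txt2img endpoint. The payload below is a sketch. The field names follow that API, while the default local address (port 7860), the prompt and the chosen CFG/step values are assumptions that sit within the ranges above. The helper also refuses any width or height above 1,024 pixels.

```python
import json
import urllib.request

# Assumption: default local address; AUTOMATIC1111 must be started with --api.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
MAX_SIDE = 1024  # larger sizes risk "longcat"-esque bodies


def txt2img_payload(prompt: str, width: int = 1024, height: int = 1024,
                    cfg_scale: float = 5.0, steps: int = 50, seed: int = -1) -> dict:
    """Build a txt2img request using the recommended settings."""
    if max(width, height) > MAX_SIDE:
        raise ValueError(f"keep width and height at {MAX_SIDE} or less")
    return {
        "prompt": prompt,
        "negative_prompt": "",    # negative prompts tend to do little with Pony Diffusion
        "sampler_name": "Euler a",
        "cfg_scale": cfg_scale,   # guide: 4-7
        "steps": steps,           # guide: 40-80
        "width": width,
        "height": height,
        "seed": seed,             # -1 requests a random seed
    }


def send(payload: dict, url: str = API_URL) -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response's "images" field holds the generated images as base64 strings. The Pony Diffusion model and VAE still need to be selected in the interface beforehand.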

Recommended upscaling ("Hires. fix") settings

  • 4x Loyaldk SuperPony V2.0 OR R-ESRGAN 4x+ Anime6B upscaler.
  • Upscale by 2.
  • 20 hires steps.
  • 0.3 denoising strength.
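
The Hires. fix settings map onto fields of the same txt2img API. This sketch adds them to a copy of an existing payload dictionary. The upscaler string must exactly match the name shown in your interface's upscaler list, so treat the default here as an assumption. A custom model such as 4x Loyaldk SuperPony V2.0 will typically appear under its file name.

```python
def with_hires_fix(payload: dict, upscaler: str = "R-ESRGAN 4x+ Anime6B") -> dict:
    """Return a copy of a txt2img payload with the recommended Hires. fix settings."""
    updated = dict(payload)  # leave the caller's dictionary untouched
    updated.update({
        "enable_hr": True,
        "hr_upscaler": upscaler,
        "hr_scale": 2,               # upscale by 2
        "hr_second_pass_steps": 20,  # 20 hires steps
        "denoising_strength": 0.3,
    })
    return updated
```

For example, with_hires_fix({"prompt": "pony", "width": 1024, "height": 1024}) returns a payload that renders at 1,024 and then upscales to 2,048.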

Troubleshooting

Your model isn't producing desirable results

  • Initial prompts may produce disappointing results until you figure out your model's quirks.
    • Use the LoRA initialiser and instance prompt at the start of your prompt.
      • E.g. if your LoRA model is named latexanon and your instance prompt shares the same name: <lora:latexanon:1>, latexanon clothes
        • In this particular example, adding clothes to the instance prompt seems to help.
      • You shouldn't need to adjust the weight of your LoRA (i.e. the :1 decimal suffix) unless it is severely overcooked.
    • Add one or more sets of round brackets around a prompt phrase (e.g. (((latex stockings))) ) to increasingly encourage the model to focus on it.
      • E.g. when you just want latex stockings, adding (((((naked))))) may strongly discourage the model from pasting a catsuit onto every pony.
    • Add one or more sets of square brackets around a prompt phrase (e.g. [[[anthro]]] ) to increasingly discourage the model from focusing on it.
    • Negative prompts don't tend to do much with Pony Diffusion and may negatively affect your generated images.
    • Use the class prompt in your prompt.
      • This is most likely pony if you followed the guide exactly.
      • Not usually required, especially when asking for Equestria Girls/anthros/humans instead, but may help in some cases.
    • Use booru tags when possible.
      • E.g. score_9, score_8, score_7, rating_suggestive, twilight sparkle, pony, alicorn, show accurate, vector
      • Exact booru spelling/formatting isn't usually required (e.g. ball gag works well even though the boorus all use ballgag instead).
    • You may need to describe the artist's particular style as well.
      • E.g. pointy art style, bold linework
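
In AUTOMATIC1111, each pair of round brackets multiplies a phrase's attention weight by 1.1 and each pair of square brackets divides it by 1.1. So (((latex stockings))) is weighted roughly 1.33x and [[[anthro]]] roughly 0.75x. The hypothetical helpers below assemble a prompt in the order suggested above and report the effective weight. Other interfaces may treat brackets differently.

```python
def emphasise(phrase: str, level: int) -> str:
    """Wrap `phrase` in `level` round brackets (level > 0) or square brackets (level < 0)."""
    if level > 0:
        return "(" * level + phrase + ")" * level
    if level < 0:
        return "[" * -level + phrase + "]" * -level
    return phrase


def effective_weight(level: int) -> float:
    """AUTOMATIC1111 weighting: x1.1 per round bracket pair, /1.1 per square pair."""
    return 1.1 ** level


def build_prompt(lora: str, instance: str, tags: list, weight: float = 1.0) -> str:
    """LoRA tag and instance prompt first, then booru tags and style descriptions."""
    parts = [f"<lora:{lora}:{weight:g}>", instance, *tags]
    return ", ".join(parts)
```

For example, build_prompt("latexanon", "latexanon clothes", ["score_9", "pony", emphasise("latex stockings", 3), emphasise("anthro", -3)]) produces "<lora:latexanon:1>, latexanon clothes, score_9, pony, (((latex stockings))), [[[anthro]]]".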

Your model seemingly has little to no effect

  • Your model may be undercooked.
    • E.g. Not enough steps, too few artist images etc.

Images look "burnt" or uncanny

  • Your model may be overcooked.
    • E.g. Too many steps, too many artist images etc.

Images are strange or uncanny

  • Your model may be confused.
    • E.g. Confusing poses/predicaments in training images.
    • E.g. Mixing pony/Equestria Girls/anthro/human images.
      • LoRAs trained exclusively on pony images usually work well with humanoid forms by default.
    • E.g. Using "decapitated/floating" pony portrait/bust training images.
    • E.g. Lots of text featured in training images.
  • "Longcat"-esque long bodies may sometimes occur if your output resolution is greater than your base model resolution.
    • Try generating again using a different seed.