Dreambooth Extension for Stable-Diffusion-WebUI
It also adds several other features, including training multiple concepts simultaneously, and (Coming soon) Inpainting training.
To install, simply go to the "Extensions" tab in the SD Web UI, select the "Available" sub-tab, pick "Load from:" to load the list of extensions, and finally, click "install" next to the Dreambooth entry.
For 8bit adam to run properly, it may be necessary to install the CU116 version of torch and torchvision, which can be accomplished below:
Refer to the appropriate script below for extra flags to install requirements:
Setting the torch command to:
TORCH_COMMAND=pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
will ensure that the proper torch version is installed when webui-user is executed, and then left alone after that, versus trying to install conflicting versions.
We also need a newer version of diffusers, as SD-WebUI uses version 0.3.0, while DB training requires > 0.6.0, so we use 0.7.2. Not having the right diffusers version is the cause of the 'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing' error message, as well as safety checker warnings.
To force sd-web-ui to only install one set of requirements, we can specify the command line argument:
And last, if you wish to completely skip the "native" install routine of Dreambooth, you can set the following environment flag: DREAMBOOTH_SKIP_INSTALL=True
This is ideal for "offline mode", where you don't want the script to constantly check things from pypi.
After installing via the WebUI, it is recommended to set the above flags and re-launch the entire Stable-diffusion-webui, not just reload it.
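As a rough illustration of the flags described above, the sketch below collects them into an environment mapping that a launcher script could pass to the WebUI process. Only `TORCH_COMMAND` and `DREAMBOOTH_SKIP_INSTALL` come from this README; the helper name `build_launch_env` is hypothetical, and how you actually start the WebUI (normally via webui-user) is left out on purpose.

```python
import os

# Value copied from the text above; nothing here is read from the extension itself.
TORCH_COMMAND = (
    "pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 "
    "--extra-index-url https://download.pytorch.org/whl/cu116"
)


def build_launch_env(skip_dreambooth_install=False, base_env=None):
    """Return a copy of the environment with the flags from this README set.

    build_launch_env is a hypothetical helper, not part of the extension.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_COMMAND"] = TORCH_COMMAND
    if skip_dreambooth_install:
        # Skips the extension's "native" install routine (useful offline).
        env["DREAMBOOTH_SKIP_INSTALL"] = "True"
    return env


if __name__ == "__main__":
    env = build_launch_env(skip_dreambooth_install=True)
    print(env["TORCH_COMMAND"])
    print(env["DREAMBOOTH_SKIP_INSTALL"])
```

Setting the same variables directly in webui-user is equivalent; the sketch only makes explicit which values end up in the environment.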
Create a Model
- Go to the Dreambooth tab.
- Under the "Create Model" sub-tab, enter a new model name and select the source checkpoint to train from. The source checkpoint will be extracted to models\dreambooth\MODELNAME\working - the original will not be touched.
- Optionally, you can also specify a huggingface model directory and token to create the Dreambooth dataset from huggingface.co. The model path format should be like so: 'runwayml/stable-diffusion-v1-5'
- Click "Create". This will take a minute or two, but when done, the UI should indicate that a new model directory has been set up.
Training (Basic Settings)
- After creating a new model, select the new model name from the "Model" dropdown at the very top.
- Select the "Train Model" sub-tab.
- Fill in the parameters as described below:
Concepts List - The path to a JSON file or a JSON string containing multiple concepts. See here for an example.
If a concepts list is specified, then the instance prompt, class prompt, instance data dir, and class data dir fields will be ignored.
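For orientation, here is a minimal sketch of how a concepts list could be built and serialized to JSON. The key names (`instance_prompt`, `class_prompt`, `instance_data_dir`, `class_data_dir`) are an assumption modelled on the four UI fields named above, and the directory paths and the second concept are purely hypothetical; check the linked example for the exact schema the extension expects.

```python
import json


def make_concept(instance_prompt, class_prompt, instance_data_dir, class_data_dir=""):
    """Build one concept entry. Key names are assumed, not confirmed by this README."""
    return {
        "instance_prompt": instance_prompt,
        "class_prompt": class_prompt,
        "instance_data_dir": instance_data_dir,
        "class_data_dir": class_data_dir,
    }


# Hypothetical paths and prompts, for illustration only.
concepts = [
    make_concept("photo of zkz dog", "photo of a dog",
                 "/path/to/dog/instance", "/path/to/dog/class"),
    make_concept("photo of qxq cat", "photo of a cat",
                 "/path/to/cat/instance", "/path/to/cat/class"),
]

# Either save this string to a .json file or paste it into the Concepts List field.
concepts_json = json.dumps(concepts, indent=2)
print(concepts_json)
```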
Instance Prompt - A short descriptor of your subject using a UNIQUE keyword and a classifier word. If training a dog, your instance prompt could be "photo of zkz dog".
The key here is that "zkz" is not a word that might overlap with something in the real world (unlike, say, "fluff"), and "dog" is a generic word to describe your subject. This is only necessary if using prior preservation.
You can use [filewords] as a placeholder for a caption read from the image filename or from a separate .txt file containing the caption, for example: [filewords], in the style of zymkyr. This syntax is the same as in textual inversion templates.
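To make the [filewords] substitution concrete, the sketch below shows one simplified way such a placeholder could be expanded: prefer a .txt file with the same name as the image, otherwise fall back to the filename without its extension. This is an illustration under those assumptions, not the extension's actual implementation (which may clean up the filename further); `read_caption` and `build_prompt` are hypothetical names.

```python
from pathlib import Path


def read_caption(image_path):
    """Return the caption for an image: sibling .txt if present, else the file stem."""
    image_path = Path(image_path)
    caption_file = image_path.with_suffix(".txt")
    if caption_file.is_file():
        return caption_file.read_text(encoding="utf-8").strip()
    return image_path.stem


def build_prompt(template, image_path):
    """Replace the [filewords] placeholder in the template with the image's caption."""
    return template.replace("[filewords]", read_caption(image_path))


if __name__ == "__main__":
    # With no sibling .txt file, the filename itself supplies the caption.
    print(build_prompt("[filewords], in the style of zymkyr", "a castle on a hill.png"))
```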
Class Prompt - A keyword indicating what type of "thing" your subject is. If your instance prompt is "photo of zkz dog", your class prompt would be "photo of a dog". Leave this blank to disable prior preservation training.
Dataset Directory - The path to the directory where the images described in Instance Prompt are kept. REQUIRED
Classification dataset directory - The path to the directory where the images described in Class Prompt are kept. If a class prompt is specified and this is left blank, images will be generated to /models/dreambooth/MODELNAME/classifiers/
Total number of classification images to use - Leave at 0 to disable prior preservation. For best results you want ~n*10 classification images - so if you have 40 training photos, then set this to 400. This is just a guess.
Training steps - How many total training steps to complete. According to this guide, you should train for approximately 100 steps per sample image. So, if you have 40 instance/sample images, you would train for 4,000 steps. This is, of course, a rough approximation, and other values will have an effect on final output fidelity.
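The two rules of thumb above (about 10 classification images and about 100 training steps per instance image) reduce to simple multiplication. The helper below is a hypothetical convenience, not part of the extension, and the multipliers are the rough guesses quoted in the text rather than tuned values.

```python
def suggested_settings(num_instance_images, class_images_per_instance=10, steps_per_image=100):
    """Apply the README's rules of thumb to a given number of instance images."""
    return {
        "classification_images": num_instance_images * class_images_per_instance,
        "training_steps": num_instance_images * steps_per_image,
    }


print(suggested_settings(40))
# {'classification_images': 400, 'training_steps': 4000}
```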
Batch size - How many training steps to process simultaneously. You probably want to leave this at 1.
Class batch size - How many classification images to generate simultaneously. Set this to whatever you can safely process at once using Txt2Image, or just leave it alone.
Learning rate - You probably don't want to touch this.
Resolution - The resolution to train images at. You probably want to keep this number at 512 or lower unless your GPU is insane. Lowering this (and the resolution of training images) may help with lower-VRAM GPUs.
Save a checkpoint every N steps - How frequently to save a checkpoint from the trained data. I should probably change the default of this to 1000.
Generate a preview image every N steps - How frequently will an image be generated as an example of training progress.
Preview image prompt - The prompt to use to generate preview image. Leave blank to use the instance prompt.
Preview image negative prompt - Like above, but negative. Leave blank to do nothing. :P
Number of samples to generate - Self-explanatory?
Sample guidance scale - Like CFG Scale in Txt2Image/Img2Img, used for generating preview.
Sample steps - Like sample guidance scale, this applies to preview generation: the number of sampling steps to run for each preview image.
Use CPU Only - As indicated, this is more of a last resort if you can't get it to train with any other settings. Also, as indicated, it will be abysmally slow. Also, you cannot use 8Bit-Adam with CPU Training, or you'll have a bad time.
Don't Cache Latents - Why is this not just called "cache" latents? Because that's what the original script uses, and I'm trying to maintain the ability to update this as easily as possible. Anyway...when this box is checked latents will not be cached. When latents are not cached, you will save a bit of VRAM, but train slightly slower.
Train Text Encoder - Not required, but recommended. Enabling this will probably cost a bit more VRAM, but also purportedly increase output image fidelity.
Use 8Bit Adam - Enable this to save VRAM. Should now work on both Windows and Linux without needing WSL.
Center Crop - Crop images if they aren't the right dimensions? I don't use this, and I recommend you just crop your images "right".
Gradient Checkpointing - Enable this to save VRAM at the cost of a bit of speed.
Scale Learning Rate - I don't use this, not sure what impact it has on performance or output quality.
Mixed Precision - Set to 'fp16' to save VRAM at the cost of speed.
Everything after 'Mixed Precision' - Adjust at your own risk. Performance/quality benefits from changing these remain to be tested.
The next two were added after I wrote the above bit, so just ignore me being a big liar.
Pad Tokens - Pads the text tokens to a longer length for some reason.
Max Token Length - Raises the tokenizer's default limit above 75. Requires Pad Tokens for values > 75.
Apply Horizontal Flip - "Apply horizontal flip augmentation". Flips images horizontally at random, which can potentially offer better editability?
Use EMA for finetuning - Use exponential moving average weight to reduce overfitting during the last iterations.
Once a model has been trained for any number of steps, a config file is saved which contains all of the parameters from the UI.
If you wish to continue training a model, you can simply select the model name from the dropdown and then click the blue button next to the model name dropdown to load previous parameters.
Use DreamBooth to Fine-Tune Stable Diffusion in Google Colab
When choosing images, it’s recommended to keep the following in mind to get the best results:
- Upload a variety of images of your subject. If you’re uploading images of a person, try something like 70% close-ups, 20% from the chest up, 10% full body, so Stable Diffusion also gets some idea of the rest of the subject and not only the face.
- Try to change things up as much as possible in each picture. This means:
- Varying the body pose
- Taking pictures on different days, in different lighting conditions, and with different backgrounds
- Showing a variety of expressions and emotions
- When generating new images, whatever you capture will be over-represented. For example, if you take multiple pictures with the same green field behind you, it’s likely that the generated images of you will also contain the green field, even if you want a dystopic background. This can apply to anything, like jewelry, clothes, or even people in the background. If you want to avoid seeing that element in your generated image, make sure not to repeat it in every shot. On the other hand, if you want it in the generated images, make sure it’s in your pictures more often.
- It’s recommended that you provide ~50 images of what you’d like to train Stable Diffusion on to get great results. However, I’ve only used 20-30 so far, and the results are pretty good. If you’re just starting out and want to test it out, I think 20-30 images should be good enough for now, and you can get 50 images after you’ve seen it work.
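The suggested 70% / 20% / 10% split can be turned into concrete counts for a given number of photos. The sketch below is only a planning aid under the assumption that the percentages are applied with integer arithmetic and any rounding leftover goes to close-ups; the category labels and the function name are hypothetical.

```python
def shot_mix(total_images, shares=(("close-up", 70), ("chest-up", 20), ("full-body", 10))):
    """Split total_images by percentage shares; leftovers go to the first category."""
    counts = {name: total_images * percent // 100 for name, percent in shares}
    leftover = total_images - sum(counts.values())
    counts[shares[0][0]] += leftover
    return counts


print(shot_mix(30))
# {'close-up': 21, 'chest-up': 6, 'full-body': 3}
```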
Resize & Crop to 512 x 512px
Once you’ve chosen your images, you should prepare them.
First, we need to resize and crop our images to be 512 x 512px. We can easily do this using the website https://birme.net.
To do this, just:
- Visit the website
- Upload your images
- Set your dimensions to 512 x 512px
- Adjust the cropping area to center your subject
- Click on Save as Zip to download the archive.
- You can then unzip the archive on your computer, and we’ll use the images a bit later.
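If you would rather prepare the images locally, the steps above can be approximated with a short script. This is a sketch, not a replacement for adjusting the crop by hand: it always crops around the geometric center, which may cut off an off-center subject. It assumes the third-party Pillow library is installed for the actual image work; `center_crop_box` and `resize_to_square` are hypothetical helper names.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def resize_to_square(src_path, dst_path, size=512):
    """Center-crop an image to a square and resize it to size x size pixels."""
    # Pillow is imported lazily so the pure-arithmetic helper works without it.
    from PIL import Image

    with Image.open(src_path) as img:
        cropped = img.crop(center_crop_box(*img.size))
        cropped.resize((size, size), Image.LANCZOS).save(dst_path)


print(center_crop_box(800, 600))
# (100, 0, 700, 600)
```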
Renaming Your Images
We’ll also want to rename our images to contain the subject’s name:
Firstly, the subject name should be one unique/random/unknown keyword. This is because Stable Diffusion also has some knowledge of The Sandman from sources other than the one played by Tom Sturridge, and we don’t want it to get confused and produce a combination of interpretations of The Sandman. As such, I’ll call it Sandman2022 to make sure it’s unique.
Secondly, rename the images to subject (1), subject (2) … subject (30). This is because, using this method, you can train multiple subjects at once. If you want to fine-tune Stable Diffusion with Sandman, your friend Kevin, and your cat, you can prepare images for each of them. For the Sandman you’d have Sandman2022 (1), Sandman2022 (2) … Sandman2022 (30), for Kevin you’d have KevinKevinson2022 (1), KevinKevinson2022 (2) … KevinKevinson2022 (30), and for your cat you’d have DexterTheCat (1), DexterTheCat (2) … DexterTheCat (30).
Here’s how I renamed my images for Sandman2022 in bulk on Windows: select them all, right-click one of them, click Rename, type the name you want, and click anywhere to finish the renaming. Everything else will be renamed as well.
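On systems without that bulk-rename shortcut, the same subject (1), subject (2) … naming can be produced with a small script. The sketch below assumes all images for one subject sit in a single folder and that numbering them in alphabetical order is acceptable; the function name and the list of image extensions are hypothetical choices.

```python
from pathlib import Path

# Extensions treated as images here; adjust to match your files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def rename_images(folder, subject):
    """Rename every image in folder to '<subject> (<n>)<ext>', numbered from 1."""
    folder = Path(folder)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    renamed = []
    for index, path in enumerate(images, start=1):
        target = path.with_name(f"{subject} ({index}){path.suffix.lower()}")
        if target != path:
            if target.exists():
                # Refuse to overwrite a file that already has the target name.
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
        renamed.append(target)
    return renamed
```

For example, `rename_images("photos/sandman", "Sandman2022")` on a folder holding three images would leave files named Sandman2022 (1), Sandman2022 (2) and Sandman2022 (3), each keeping its original extension.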