# Stable Diffusion Troubleshooting Guide

## Installation Issues

### Package conflicts

**Error**: `ImportError: cannot import name 'cached_download' from 'huggingface_hub'`

**Fix**:

```bash
# cached_download was removed from recent huggingface_hub releases,
# and older diffusers versions still import it. Upgrading diffusers
# to a version that no longer uses the function resolves the conflict.
pip install --upgrade diffusers huggingface_hub
```

### xFormers installation fails

**Error**: `RuntimeError: CUDA error: no kernel image is available for execution`

**Fix**:

```bash
# Check CUDA version
nvcc --version

# Install matching xformers
pip install xformers --index-url https://download.pytorch.org/whl/cu121  # For CUDA 12.1

# Or build from source
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```

### Torch/CUDA mismatch

**Error**: `RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED`

**Fix**:

```bash
# Check versions
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Reinstall PyTorch with the correct CUDA build
pip uninstall torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

## Memory Issues

### CUDA out of memory

**Error**: `torch.cuda.OutOfMemoryError: CUDA out of memory`

**Solutions**:

```python
import gc

import torch
from diffusers import DiffusionPipeline

# The snippets below assume `pipe` is an already-loaded pipeline
# and `prompt` is a string.

# Solution 1: Enable CPU offloading
pipe.enable_model_cpu_offload()

# Solution 2: Sequential CPU offload (more aggressive, slower)
pipe.enable_sequential_cpu_offload()

# Solution 3: Attention slicing
pipe.enable_attention_slicing()

# Solution 4: VAE slicing for large images
pipe.enable_vae_slicing()

# Solution 5: Use lower precision
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    torch_dtype=torch.float16  # or torch.bfloat16
)

# Solution 6: Reduce batch size
image = pipe(prompt, num_images_per_prompt=1).images[0]

# Solution 7: Generate smaller images
image = pipe(prompt, height=512, width=512).images[0]

# Solution 8: Clear cache between generations
torch.cuda.empty_cache()
gc.collect()
```

### Memory grows over time

**Problem**: Memory usage increases with each generation

**Fix**:

```python
import gc

import torch

def generate_with_cleanup(pipe, prompt, **kwargs):
    try:
        image = pipe(prompt, **kwargs).images[0]
        return image
    finally:
        # Clear cache after generation
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()
```

### Large model loading fails

**Error**: `RuntimeError: Unable to load model weights`

**Fix**:

```python
# Use low CPU memory mode
pipe = DiffusionPipeline.from_pretrained(
    "large-model-id",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16
)
```

## Generation Issues

### Black images

**Problem**: Output images are completely black

**Solutions**:

```python
# Solution 1: Disable the safety checker (it blacks out flagged images)
pipe.safety_checker = None

# Solution 2: Check VAE scaling
# The issue might be with VAE encoding/decoding
latents = latents / pipe.vae.config.scaling_factor  # Before decode

# Solution 3: Ensure proper dtypes
pipe = pipe.to(dtype=torch.float16)
pipe.vae = pipe.vae.to(dtype=torch.float32)  # VAE often needs fp32 to avoid NaNs

# Solution 4: Check the guidance scale
# Too high a value can cause artifacts
image = pipe(prompt, guidance_scale=7.5).images[0]  # Not 20+
```

### Noise/static images

**Problem**: Output looks like random noise

**Solutions**:

```python
# Solution 1: Increase inference steps
image = pipe(prompt, num_inference_steps=50).images[0]

# Solution 2: Reset the scheduler by re-creating it from its own config
pipe.scheduler = pipe.scheduler.from_config(pipe.scheduler.config)

# Solution 3: Verify the model was loaded correctly
print(pipe.unet)  # Should show the model architecture

# If none of these help, rerun against the known-good baseline below
```
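When the output is still noise after these checks, it helps to rerun against a known-good baseline before blaming the model: a fixed seed, the pipeline's default scheduler, and a moderate step count. A minimal sketch, assuming a CUDA machine; the model ID and prompt are placeholders:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # placeholder model ID
    torch_dtype=torch.float16,
).to("cuda")

# A fixed seed makes runs comparable, so you can tell a broken
# configuration apart from ordinary sampling variance
generator = torch.Generator("cuda").manual_seed(0)

image = pipe(
    "a photo of an astronaut riding a horse",  # placeholder prompt
    num_inference_steps=50,
    generator=generator,
).images[0]
image.save("baseline.png")
```

If the baseline produces a clean image, the problem is in your settings (scheduler, steps, dtype) rather than the weights.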
### Blurry images

**Problem**: Output images are low quality or blurry

**Solutions**:

```python
# Solution 1: Use more steps
image = pipe(prompt, num_inference_steps=50).images[0]

# Solution 2: Use a better VAE
from diffusers import AutoencoderKL
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
pipe.vae = vae

# Solution 3: Use SDXL or a refiner
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)

# Solution 4: Upscale with img2img
from diffusers import StableDiffusionImg2ImgPipeline
upscale_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(...)
upscaled = upscale_pipe(
    prompt=prompt,
    image=image.resize((1024, 1024)),
    strength=0.3
).images[0]
```

### Prompt not being followed

**Problem**: Generated image doesn't match the prompt

**Solutions**:

```python
# Solution 1: Increase the guidance scale
image = pipe(prompt, guidance_scale=10.0).images[0]

# Solution 2: Use negative prompts
image = pipe(
    prompt="A red car",
    negative_prompt="blue, green, yellow, wrong color",
    guidance_scale=7.5
).images[0]

# Solution 3: Use prompt weighting to emphasize important words.
# Note: the (word:weight) syntax below comes from AUTOMATIC1111-style UIs;
# plain diffusers needs a helper such as compel to build weighted embeddings.
prompt = "A (red:1.5) car on a street"

# Solution 4: Use longer, more detailed prompts
prompt = """
A bright red sports car, ferrari style, parked on
a city street, photorealistic, high detail, 8k,
professional photography
"""
```

### Distorted faces/hands

**Problem**: Faces and hands look deformed

**Solutions**:

```python
# Solution 1: Use negative prompts
negative_prompt = """
bad hands, bad anatomy, deformed, ugly, blurry,
extra fingers, mutated hands, poorly drawn hands,
poorly drawn face, mutation, deformed face
"""

# Solution 2: Use face-specific models
# ADetailer or similar post-processing

# Solution 3: Use ControlNet for poses
# Load pose estimation and condition generation

# Solution 4: Inpaint problematic areas (sketched in full below;
# create_face_mask and inpaint_pipe are placeholders for your own
# mask helper and an inpainting pipeline)
mask = create_face_mask(image)
fixed = inpaint_pipe(
    prompt="beautiful detailed face",
    image=image,
    mask_image=mask
).images[0]
```
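A fuller sketch of Solution 4, assuming you already have a black-and-white mask image (white where the face should be repainted); diffusers does not ship a `create_face_mask` helper, and the inpainting checkpoint named here is just one example:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",  # any inpainting checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Load the generated image and a matching mask (white = repaint)
image = Image.open("generated.png").convert("RGB").resize((512, 512))
mask = Image.open("face_mask.png").convert("L").resize((512, 512))

fixed = inpaint_pipe(
    prompt="beautiful detailed face",
    image=image,
    mask_image=mask,
    strength=0.75,  # how strongly to repaint the masked region
).images[0]
fixed.save("fixed.png")
```

Keeping the mask tight around the problem area preserves the rest of the composition.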
## Scheduler Issues

### Scheduler not compatible

**Error**: `ValueError: Scheduler ... is not compatible with pipeline`

**Fix**:

```python
from diffusers import EulerDiscreteScheduler

# Create the scheduler from the existing config
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config
)

# Check which schedulers are compatible
print(pipe.scheduler.compatibles)
```

### Wrong number of steps

**Problem**: Output quality varies between runs even with the same number of steps

**Fix**:

```python
# Reset timesteps explicitly
pipe.scheduler.set_timesteps(num_inference_steps)

# Check the scheduler's step count
print(len(pipe.scheduler.timesteps))
```

## LoRA Issues

### LoRA weights not loading

**Error**: `RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel`

**Fix**:

```python
# Check the weight file format
# Should be .safetensors or .bin

# Load with the correct weight name
pipe.load_lora_weights(
    "path/to/lora",
    weight_name="lora.safetensors"
)

# Or try loading into a specific component
pipe.unet.load_attn_procs("path/to/lora")
```

### LoRA not affecting output

**Problem**: Generated images look the same with and without the LoRA

**Fix**:

```python
# Fuse the LoRA weights
pipe.fuse_lora(lora_scale=1.0)

# Or set the scale explicitly
pipe.set_adapters(["lora_name"], adapter_weights=[1.0])

# Verify the LoRA is loaded
print(list(pipe.unet.attn_processors.keys()))
```

### Multiple LoRAs conflict

**Problem**: Multiple LoRAs produce artifacts

**Fix**:

```python
# Load with different adapter names
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="subject")

# Balance the weights
pipe.set_adapters(
    ["style", "subject"],
    adapter_weights=[0.5, 0.5]  # Lower weights reduce interference
)

# Or merge the LoRAs offline with appropriate ratios before loading
```

## ControlNet Issues

### ControlNet not conditioning

**Problem**: ControlNet has no effect on the output

**Fix**:

```python
# Check the control image format:
# it should be RGB and match the generation size
control_image = control_image.resize((512, 512))

# Increase the conditioning scale
image = pipe(
    prompt=prompt,
    image=control_image,
    controlnet_conditioning_scale=1.0,  # Try 0.5-1.5
    num_inference_steps=30
).images[0]

# Verify the ControlNet is loaded
print(pipe.controlnet)
```

### Control image preprocessing

**Fix**:

```python
from controlnet_aux import CannyDetector

# Proper preprocessing
canny = CannyDetector()
control_image = canny(input_image)

# Ensure the correct format
control_image = control_image.convert("RGB")
control_image = control_image.resize((512, 512))
```

## Hub/Download Issues

### Model download fails

**Error**: `requests.exceptions.ConnectionError`

**Fix**:

```bash
# Set a longer timeout
export HF_HUB_DOWNLOAD_TIMEOUT=600

# Use a mirror if available
export HF_ENDPOINT=https://hf-mirror.com

# Or download manually
huggingface-cli download stable-diffusion-v1-5/stable-diffusion-v1-5
```

### Cache issues

**Error**: `OSError: Can't load model from cache`

**Fix**:

```bash
# Clear the cache
rm -rf ~/.cache/huggingface/hub

# Or set a different cache location
export HF_HOME=/path/to/cache
```

```python
# Force a re-download
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    force_download=True
)
```

### Access denied for gated models

**Error**: `401 Client Error: Unauthorized`

**Fix**: Accept the model license on the Hub website first, then authenticate:

```bash
# Log in to Hugging Face
huggingface-cli login
```

```python
# Or pass a token directly
pipe = DiffusionPipeline.from_pretrained(
    "model-id",
    token="hf_xxxxx"
)
```
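For flaky connections and recurring cache problems alike, it can also help to fetch the model once with `huggingface_hub.snapshot_download` and load from the local path afterwards, so later runs never touch the network. A minimal sketch (the model ID is a placeholder):

```python
from huggingface_hub import snapshot_download
from diffusers import DiffusionPipeline

# Download (or resume a partial download) once;
# returns the local snapshot directory
local_dir = snapshot_download("stable-diffusion-v1-5/stable-diffusion-v1-5")

# All subsequent loads read from disk only
pipe = DiffusionPipeline.from_pretrained(local_dir)
```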
## Performance Issues

### Slow generation

**Problem**: Generation takes too long

**Solutions**:

```python
# Solution 1: Use a faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config
)

# Solution 2: Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]

# Solution 3: Use LCM
from diffusers import LCMScheduler
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]

# Solution 4: Enable xFormers
pipe.enable_xformers_memory_efficient_attention()

# Solution 5: Compile the model
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```

### First generation is slow

**Problem**: The first image takes much longer than later ones

**Fix**:

```python
# Warm up the model
_ = pipe("warmup", num_inference_steps=1)

# Then run the actual generation
image = pipe(prompt, num_inference_steps=50).images[0]

# Compile for faster subsequent runs
pipe.unet = torch.compile(pipe.unet)
```

## Debugging Tips

### Enable debug logging

```python
import logging

logging.basicConfig(level=logging.DEBUG)

# Or for specific modules
logging.getLogger("diffusers").setLevel(logging.DEBUG)
logging.getLogger("transformers").setLevel(logging.DEBUG)
```

### Check model components

```python
# Print pipeline components
print(pipe.components)

# Check model configs
print(pipe.unet.config)
print(pipe.vae.config)
print(pipe.scheduler.config)

# Verify device placement
print(pipe.device)
for name, module in pipe.components.items():
    if hasattr(module, "device"):
        print(f"{name}: {module.device}")
```

### Validate inputs

```python
# Check image dimensions
print(f"Height: {height}, Width: {width}")
assert height % 8 == 0, "Height must be divisible by 8"
assert width % 8 == 0, "Width must be divisible by 8"

# Check prompt tokenization
tokens = pipe.tokenizer(prompt, return_tensors="pt")
print(f"Token count: {tokens.input_ids.shape[1]}")  # Max 77 for SD
```

### Save intermediate results

```python
import torch
from PIL import Image

def save_latents_callback(pipe, step_index, timestep, callback_kwargs):
    latents = callback_kwargs["latents"]
    # Decode and save the intermediate image
    with torch.no_grad():
        image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
    Image.fromarray((image * 255).astype("uint8")).save(f"step_{step_index}.png")
    return callback_kwargs

image = pipe(
    prompt,
    callback_on_step_end=save_latents_callback,
    callback_on_step_end_tensor_inputs=["latents"]
).images[0]
```

## Getting Help

1. **Documentation**: https://huggingface.co/docs/diffusers
2. **GitHub Issues**: https://github.com/huggingface/diffusers/issues
3. **Discord**: https://discord.gg/diffusers
4. **Forum**: https://discuss.huggingface.co

### Reporting Issues

Include the following (the snippet after this list collects most of the version info):

- Diffusers version: `pip show diffusers`
- PyTorch version: `python -c "import torch; print(torch.__version__)"`
- CUDA version: `nvcc --version`
- GPU model: `nvidia-smi`
- Full error traceback
- Minimal reproducible code
- Model name/ID used
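A small script like the one below gathers most of this into a single paste-able block; a sketch using only standard attributes, so the exact fields you report are up to you:

```python
import platform

import diffusers
import torch

# Collect the environment details most bug reports need
print(f"Python:    {platform.python_version()}")
print(f"diffusers: {diffusers.__version__}")
print(f"torch:     {torch.__version__}")
print(f"CUDA:      {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU:       {torch.cuda.get_device_name(0)}")
```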