Sync all skills and memories 2026-04-14 07:27

2026-04-14 07:27:20 +09:00
parent 516bb44fe6
commit 1eba2bca95
386 changed files with 167655 additions and 0 deletions
--- a/skills/mlops/models/stable-diffusion/references/troubleshooting.md
+++ b/skills/mlops/models/stable-diffusion/references/troubleshooting.md
@@ -0,0 +1,555 @@
+# Stable Diffusion Troubleshooting Guide
+
+## Installation Issues
+
+### Package conflicts
+
+**Error**: `ImportError: cannot import name 'cached_download' from 'huggingface_hub'`
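+
+This happens when an older `diffusers` release, which still imports `cached_download`, is installed alongside `huggingface_hub` 0.26 or later, which removed that function.
+
+**Fix**:
+```bash
+# Upgrade both packages together so their versions are compatible
+pip install --upgrade diffusers huggingface_hub
+
+# Or, if you must keep an old diffusers release, pin huggingface_hub below 0.26
+pip install "huggingface_hub<0.26"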
+```
+
+### xFormers installation fails
+
+**Error**: `RuntimeError: CUDA error: no kernel image is available for execution`
+
+**Fix**:
+```bash
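+# Check the CUDA version PyTorch was built with; prebuilt xformers wheels must match it
+python -c "import torch; print(torch.__version__, torch.version.cuda)"
+
+# Install the matching xformers wheel
+pip install -U xformers --index-url https://download.pytorch.org/whl/cu121  # For CUDA 12.1
+
+# Or build from source (needs a CUDA toolkit matching PyTorch's; check with `nvcc --version`)
+pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers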
+```
+
+### Torch/CUDA mismatch
+
+**Error**: `RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED`
+
+**Fix**:
+```bash
+# Check versions
+python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
+
+# Reinstall PyTorch built for a CUDA version your driver supports (see `nvidia-smi`)
+pip uninstall -y torch torchvision
+pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
+```
+
+## Memory Issues
+
+### CUDA out of memory
+
+**Error**: `torch.cuda.OutOfMemoryError: CUDA out of memory`
+
+**Solutions**:
+
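+```python
+import torch
+from diffusers import DiffusionPipeline
+
+# Solution 1: Enable model CPU offloading (don't also call pipe.to("cuda"))
+pipe.enable_model_cpu_offload()
+
+# Solution 2: Sequential CPU offload (saves more memory but is much slower;
+# use it instead of Solution 1, not together with it)
+pipe.enable_sequential_cpu_offload()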
+
+# Solution 3: Attention slicing
+pipe.enable_attention_slicing()
+
+# Solution 4: VAE slicing for large images
+pipe.enable_vae_slicing()
+
+# Solution 5: Use lower precision
+pipe = DiffusionPipeline.from_pretrained(
+    "model-id",
+    torch_dtype=torch.float16  # or torch.bfloat16
+)
+
+# Solution 6: Reduce batch size
+image = pipe(prompt, num_images_per_prompt=1).images[0]
+
+# Solution 7: Generate smaller images
+image = pipe(prompt, height=512, width=512).images[0]
+
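+# Solution 8: Free unused memory between generations
+import gc
+gc.collect()              # release unreachable Python objects first
+torch.cuda.empty_cache()  # then return cached, unused CUDA blocks to the driver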
+```
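+
+To see why Solution 5 helps, note that the memory needed just to hold the weights is roughly parameter count times bytes per element. A minimal sketch, assuming weights dominate (activations, the text encoder, and the VAE add more on top); the ~860M-parameter UNet below is the commonly cited size for SD 1.x and serves only as an illustration.
+
+```python
+# Bytes per element for common torch dtypes, keyed by plain strings
+BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}
+
+
+def weight_memory_gib(num_params: int, dtype: str) -> float:
+    """Approximate memory needed to hold the weights alone, in GiB."""
+    return num_params * BYTES_PER_ELEMENT[dtype] / 1024**3
+
+
+# Illustration: a UNet with ~860M parameters (SD 1.x scale)
+for dtype in ("float32", "float16"):
+    print(f"{dtype}: {weight_memory_gib(860_000_000, dtype):.2f} GiB")
+```
+
+This prints about 3.20 GiB for float32 and 1.60 GiB for float16, so halving precision roughly halves the weight footprint.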
+
+### Memory grows over time
+
+**Problem**: Memory usage increases with each generation (often because references to earlier outputs or tensors are still alive)
+
+**Fix**:
+```python
+import gc
+import torch
+
+def generate_with_cleanup(pipe, prompt, **kwargs):
+    try:
+        image = pipe(prompt, **kwargs).images[0]
+        return image
+    finally:
+        # Collect garbage first, then release cached CUDA memory
+        gc.collect()
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+```
+
+### Large model loading fails
+
+**Error**: `RuntimeError: Unable to load model weights`
+
+**Fix**:
+```python
+# Use low CPU memory mode
+pipe = DiffusionPipeline.from_pretrained(
+    "large-model-id",
+    low_cpu_mem_usage=True,
+    torch_dtype=torch.float16
+)
+```
+
+## Generation Issues
+
+### Black images
+
+**Problem**: Output images are completely black
+
+**Solutions**:
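+```python
+import torch
+from diffusers import AutoencoderKL, DiffusionPipeline
+
+# Solution 1: Disable the safety checker
+# (it replaces images it flags as NSFW with black ones)
+pipe.safety_checker = None
+
+# Solution 2: Check VAE scaling when decoding latents yourself
+latents = latents / pipe.vae.config.scaling_factor  # Before pipe.vae.decode(latents)
+
+# Solution 3: Avoid fp16 overflow in the VAE
+# The SDXL VAE can produce NaNs (black images) in fp16; swap in an fp16-safe VAE
+vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
+pipe = DiffusionPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0",
+    vae=vae,
+    torch_dtype=torch.float16
+)
+
+# Solution 4: Check guidance scale
+# Very high values can oversaturate or break the image
+image = pipe(prompt, guidance_scale=7.5).images[0]  # Not 20+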
+```
+
+### Noise/static images
+
+**Problem**: Output looks like random noise
+
+**Solutions**:
+```python
+# Solution 1: Increase inference steps
+image = pipe(prompt, num_inference_steps=50).images[0]
+
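+# Solution 2: Reset the scheduler from the pipeline's own config
+# (undoes any manual scheduler changes)
+pipe.scheduler = pipe.scheduler.from_config(pipe.scheduler.config)
+
+# Solution 3: Verify model was loaded correctly
+print(pipe.unet)  # Should show model architecture
+
+# Solution 4: Check that the scheduler's prediction type matches the checkpoint
+# (SD 2.x 768-px models use "v_prediction"; SD 1.x uses "epsilon")
+print(pipe.scheduler.config.prediction_type)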
+```
+
+### Blurry images
+
+**Problem**: Output images are low quality or blurry
+
+**Solutions**:
+```python
+# Solution 1: Use more steps
+image = pipe(prompt, num_inference_steps=50).images[0]
+
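+# Solution 2: Use a fine-tuned VAE (for SD 1.x models)
+from diffusers import AutoencoderKL
+vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=pipe.vae.dtype)
+pipe.vae = vae.to(pipe.device)
+
+# Solution 3: Switch to SDXL (optionally followed by its refiner)
+pipe = DiffusionPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0"
+)
+
+# Solution 4: Upscale with img2img (low strength preserves the composition)
+from diffusers import StableDiffusionImg2ImgPipeline
+upscale_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(...)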
+upscaled = upscale_pipe(
+    prompt=prompt,
+    image=image.resize((1024, 1024)),
+    strength=0.3
+).images[0]
+```
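+
+With img2img, `strength` also decides how many denoising steps actually run: the input is noised part of the way and only the remaining steps execute. The sketch below is a simplified stand-alone version of the step arithmetic in diffusers' img2img pipelines (`int(num_inference_steps * strength)`, capped at `num_inference_steps`); treat it as an approximation, since some schedulers step differently.
+
+```python
+def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
+    """Approximate number of denoising steps an img2img call actually runs."""
+    if not 0.0 <= strength <= 1.0:
+        raise ValueError("strength must be in [0, 1]")
+    # The input image is noised part of the way; only this many steps remain
+    return min(int(num_inference_steps * strength), num_inference_steps)
+
+
+# strength=0.3 with 50 steps runs about 15 denoising steps
+print(effective_img2img_steps(50, 0.3))
+```
+
+If an upscale looks under-refined at low strength, raising `num_inference_steps` adds denoising steps without starting from a noisier image.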
+
+### Prompt not being followed
+
+**Problem**: Generated image doesn't match the prompt
+
+**Solutions**:
+```python
+# Solution 1: Increase guidance scale
+image = pipe(prompt, guidance_scale=10.0).images[0]
+
+# Solution 2: Use negative prompts
+image = pipe(
+    prompt="A red car",
+    negative_prompt="blue, green, yellow, wrong color",
+    guidance_scale=7.5
+).images[0]
+
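+# Solution 3: Use prompt weighting to emphasize important words
+# "(word:1.5)" is AUTOMATIC1111 WebUI syntax; standard diffusers pipelines
+# treat it as plain text. In diffusers, use a weighting library such as
+# Compel and pass its embeddings via prompt_embeds.
+prompt = "A (red:1.5) car on a street"
+
+# Solution 4: Use longer, more detailed prompts
+# (the CLIP text encoder truncates anything beyond 77 tokens)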
+prompt = """
+A bright red sports car, ferrari style, parked on a city street,
+photorealistic, high detail, 8k, professional photography
+"""
+```
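+
+Solutions 1 and 2 work through classifier-free guidance: at each step the UNet predicts noise with and without the prompt, and `guidance_scale` extrapolates from the unconditional prediction toward the conditional one. A minimal sketch of that standard formula, with plain lists standing in for the noise-prediction tensors a real pipeline uses:
+
+```python
+def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
+    """Combine unconditional and text-conditioned noise predictions.
+
+    guided = uncond + guidance_scale * (text - uncond)
+    """
+    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]
+
+
+uncond = [0.0, 1.0]
+text = [1.0, 1.0]
+print(classifier_free_guidance(uncond, text, 7.5))  # [7.5, 1.0]
+```
+
+At `guidance_scale=1.0` the result equals the text prediction, and diffusers' Stable Diffusion pipelines skip the unconditional pass entirely at that value (as in the LCM settings under Performance Issues). When `negative_prompt` is set, its prediction takes the place of the unconditional one, which is how negative prompts steer away from unwanted content; very large scales amplify the difference and can oversaturate images.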
+
+### Distorted faces/hands
+
+**Problem**: Faces and hands look deformed
+
+**Solutions**:
+```python
+# Solution 1: Use negative prompts
+negative_prompt = """
+bad hands, bad anatomy, deformed, ugly, blurry,
+extra fingers, mutated hands, poorly drawn hands,
+poorly drawn face, mutation, deformed face
+"""
+
+# Solution 2: Use face-specific models
+# ADetailer or similar post-processing
+
+# Solution 3: Use ControlNet for poses
+# Load pose estimation and condition generation
+
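+# Solution 4: Inpaint problematic areas
+# create_face_mask is a hypothetical helper returning a mask the size of `image`
+# (white = repaint, black = keep); inpaint_pipe is an inpainting pipeline,
+# e.g. StableDiffusionInpaintPipeline
+mask = create_face_mask(image)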
+fixed = inpaint_pipe(
+    prompt="beautiful detailed face",
+    image=image,
+    mask_image=mask
+).images[0]
+```
+
+## Scheduler Issues
+
+### Scheduler not compatible
+
+**Error**: `ValueError: Scheduler ... is not compatible with pipeline`
+
+**Fix**:
+```python
+from diffusers import EulerDiscreteScheduler
+
+# Create scheduler from config
+pipe.scheduler = EulerDiscreteScheduler.from_config(
+    pipe.scheduler.config
+)
+
+# Check compatible schedulers
+print(pipe.scheduler.compatibles)
+```
+
+### Wrong number of steps
+
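+**Problem**: The same `num_inference_steps` gives noticeably different quality with different schedulers
+
+**Fix**:
+```python
+num_inference_steps = 30
+
+# Inspect the timestep schedule explicitly
+# (pipelines call set_timesteps themselves on every call)
+pipe.scheduler.set_timesteps(num_inference_steps)
+
+# Check the scheduler's step count; second-order schedulers such as
+# HeunDiscreteScheduler evaluate the UNet more than once per step,
+# so this can differ from num_inference_steps
+print(len(pipe.scheduler.timesteps))
+```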
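+
+One reason identical `num_inference_steps` values behave differently is that schedulers can pick different timesteps from the 1000-step training schedule. The sketch below is a simplified stand-alone version of the "leading" and "trailing" options of `timestep_spacing` in diffusers' DDIM-style schedulers, assuming 1000 training timesteps and, for "leading", `steps_offset=1` as in SD 1.x configs; it is for intuition, not a drop-in replacement.
+
+```python
+def leading_timesteps(num_train_timesteps, num_inference_steps, steps_offset=1):
+    """Evenly spaced timesteps counted up from 0, returned noisiest first."""
+    step_ratio = num_train_timesteps // num_inference_steps
+    steps = [i * step_ratio + steps_offset for i in range(num_inference_steps)]
+    return steps[::-1]
+
+
+def trailing_timesteps(num_train_timesteps, num_inference_steps):
+    """Evenly spaced timesteps counted back from the last training timestep."""
+    step_ratio = num_train_timesteps / num_inference_steps
+    return [round(num_train_timesteps - i * step_ratio) - 1 for i in range(num_inference_steps)]
+
+
+print(leading_timesteps(1000, 4))   # [751, 501, 251, 1]
+print(trailing_timesteps(1000, 4))  # [999, 749, 499, 249]
+```
+
+With few steps, "leading" never starts at the noisiest timestep (751 here versus 999), which can leave residual noise. Schedulers that support it can be switched with, e.g., `DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")`.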
+
+## LoRA Issues
+
+### LoRA weights not loading
+
+**Error**: `RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel`
+
+**Fix**:
+```python
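+# Check the weight file format (.safetensors or .bin) and that the LoRA
+# was trained for this base model (an SD 1.x LoRA won't load into SDXL)
+
+# Point to the folder or Hub repo and name the weight file explicitly
+pipe.load_lora_weights(
+    "path/to/lora",
+    weight_name="lora.safetensors"
+)
+
+# Older, lower-level API: load directly into the UNet
+pipe.unet.load_attn_procs("path/to/lora")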
+```
+
+### LoRA not affecting output
+
+**Problem**: Generated images look the same with/without LoRA
+
+**Fix**:
+```python
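+# Activate the adapter using the name it was loaded with (adapter_name=...)
+pipe.set_adapters(["lora_name"], adapter_weights=[1.0])
+
+# Or fuse LoRA weights into the base model
+pipe.fuse_lora(lora_scale=1.0)
+
+# Verify which adapters are loaded and active
+print(pipe.get_list_adapters())
+print(pipe.get_active_adapters())
+
+# Many LoRAs also need their trigger word(s) in the prompt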
+```
+
+### Multiple LoRAs conflict
+
+**Problem**: Multiple LoRAs produce artifacts
+
+**Fix**:
+```python
+# Load with different adapter names
+pipe.load_lora_weights("lora1", adapter_name="style")
+pipe.load_lora_weights("lora2", adapter_name="subject")
+
+# Balance weights
+pipe.set_adapters(
+    ["style", "subject"],
+    adapter_weights=[0.5, 0.5]  # Lower weights
+)
+
+# Or use LoRA merge before loading
+# Merge LoRAs offline with appropriate ratios
+```
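+
+The adapter weights above scale each LoRA's contribution. As a sketch of the underlying math, following the standard LoRA formulation as implemented in PEFT (which diffusers uses for `set_adapters`), a LoRA-patched linear layer with base weight $W \in \mathbb{R}^{k \times d}$ computes:
+
+```math
+h = W x + \sum_{i} w_i \, \frac{\alpha_i}{r_i} \, B_i A_i x
+```
+
+Here $A_i \in \mathbb{R}^{r_i \times d}$ and $B_i \in \mathbb{R}^{k \times r_i}$ are adapter $i$'s low-rank matrices, $r_i$ its rank, $\alpha_i$ its alpha, and $w_i$ the value passed in `adapter_weights`. Because the contributions add up, two LoRAs at weight 1.0 can push a layer further from $W$ than either alone, which is why lowering the weights reduces artifacts; `fuse_lora(lora_scale=s)` merges the update, scaled by $s$, into $W$.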
+
+## ControlNet Issues
+
+### ControlNet not conditioning
+
+**Problem**: ControlNet has no effect on output
+
+**Fix**:
+```python
+# Check control image format
+# Should be RGB, matching generation size
+control_image = control_image.resize((512, 512))
+
+# Increase conditioning scale
+image = pipe(
+    prompt=prompt,
+    image=control_image,
+    controlnet_conditioning_scale=1.0,  # Try 0.5-1.5
+    num_inference_steps=30
+).images[0]
+
+# Verify ControlNet is loaded
+print(pipe.controlnet)
+```
+
+### Control image preprocessing
+
+**Fix**:
+```python
+from controlnet_aux import CannyDetector
+
+# Proper preprocessing
+canny = CannyDetector()
+control_image = canny(input_image)
+
+# Ensure correct format
+control_image = control_image.convert("RGB")
+control_image = control_image.resize((512, 512))
+```
+
+## Hub/Download Issues
+
+### Model download fails
+
+**Error**: `requests.exceptions.ConnectionError`
+
+**Fix**:
+```bash
+# Set longer timeout
+export HF_HUB_DOWNLOAD_TIMEOUT=600
+
+# Use mirror if available
+export HF_ENDPOINT=https://hf-mirror.com
+
+# Or download manually
+huggingface-cli download stable-diffusion-v1-5/stable-diffusion-v1-5
+```
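+
+For intermittent connection errors, wrapping the download in a retry loop with exponential backoff is a common pattern. A minimal stdlib sketch; `download` is any zero-argument callable you supply, and the attempt count and delays are illustrative defaults, not values from any library.
+
+```python
+import time
+
+
+def retry_with_backoff(download, attempts=5, base_delay=2.0, retry_on=(OSError,), sleep=time.sleep):
+    """Call download() until it succeeds, doubling the wait after each failure.
+
+    OSError covers both the built-in ConnectionError and
+    requests.exceptions.ConnectionError (a subclass of IOError/OSError).
+    """
+    for attempt in range(attempts):
+        try:
+            return download()
+        except retry_on:
+            if attempt == attempts - 1:
+                raise  # out of retries: re-raise the last error
+            sleep(base_delay * 2**attempt)  # 2s, 4s, 8s, ... with the defaults
+```
+
+For example, `retry_with_backoff(lambda: snapshot_download("stable-diffusion-v1-5/stable-diffusion-v1-5"))` with `snapshot_download` imported from `huggingface_hub`. The injectable `sleep` argument exists only to make the helper easy to test.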
+
+### Cache issues
+
+**Error**: `OSError: Can't load model from cache`
+
+**Fix**:
+```bash
+# Clear the entire hub cache (every model will be re-downloaded)
+rm -rf ~/.cache/huggingface/hub
+
+# Or set a different cache location
+export HF_HOME=/path/to/cache
+```
+
+```python
+# Force a fresh download of one model
+pipe = DiffusionPipeline.from_pretrained(
+    "model-id",
+    force_download=True
+)
+```
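+
+Instead of deleting the whole cache, you can remove a single model's folder. A minimal sketch of where that folder lives, assuming the default layout used by `huggingface_hub`: the hub cache is `$HF_HUB_CACHE` if set, otherwise `$HF_HOME/hub`, with `HF_HOME` defaulting to `$XDG_CACHE_HOME/huggingface` or `~/.cache/huggingface`; each model repo gets a folder named `models--<org>--<name>`. Check your installed version's documentation if paths differ.
+
+```python
+import os
+
+
+def hub_cache_dir(env=os.environ):
+    """Resolve the Hugging Face hub cache directory from environment variables."""
+    if env.get("HF_HUB_CACHE"):
+        return os.path.expanduser(env["HF_HUB_CACHE"])
+    xdg_cache = env.get("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
+    hf_home = os.path.expanduser(env.get("HF_HOME", os.path.join(xdg_cache, "huggingface")))
+    return os.path.join(hf_home, "hub")
+
+
+def model_cache_folder(repo_id, env=os.environ):
+    """Folder holding one cached model repo, e.g. models--org--name."""
+    return os.path.join(hub_cache_dir(env), "models--" + repo_id.replace("/", "--"))
+
+
+print(model_cache_folder("stable-diffusion-v1-5/stable-diffusion-v1-5"))
+```
+
+Deleting that one folder (or using the interactive `huggingface-cli delete-cache`) forces only that model to re-download.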
+
+### Access denied for gated models
+
+**Error**: `401 Client Error: Unauthorized`
+
+**Fix**:
+```bash
+# Login to Hugging Face
+huggingface-cli login
+```
+
+```python
+# Or pass a token explicitly
+pipe = DiffusionPipeline.from_pretrained(
+    "model-id",
+    token="hf_xxxxx"
+)
+```
+
+Accept the model's license on its Hub page first; for gated models, a valid token alone is not enough.
+
+## Performance Issues
+
+### Slow generation
+
+**Problem**: Generation takes too long
+
+**Solutions**:
+```python
+# Solution 1: Use faster scheduler
+from diffusers import DPMSolverMultistepScheduler
+pipe.scheduler = DPMSolverMultistepScheduler.from_config(
+    pipe.scheduler.config
+)
+
+# Solution 2: Reduce steps
+image = pipe(prompt, num_inference_steps=20).images[0]
+
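+# Solution 3: Use LCM-LoRA (this LoRA is for SDXL; use
+# latent-consistency/lcm-lora-sdv1-5 with SD 1.5 pipelines)
+from diffusers import LCMScheduler
+pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
+pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
+image = pipe(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]
+
+# Solution 4: Enable xFormers (with PyTorch 2.x, diffusers already uses
+# scaled dot-product attention by default, so the gain may be small)
+pipe.enable_xformers_memory_efficient_attention()
+
+# Solution 5: Compile the UNet (the first call is slow while it compiles)
+import torch
+pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)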
+```
+
+### First generation is slow
+
+**Problem**: First image takes much longer
+
+**Fix**:
+```python
+import torch
+
+# Compile first (optional) so the warm-up run also pays the compilation cost
+pipe.unet = torch.compile(pipe.unet)
+
+# Warm up with a short run at the resolution you will use
+_ = pipe("warmup", num_inference_steps=1)
+
+# Then run actual generation
+image = pipe(prompt, num_inference_steps=50).images[0]
+```
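+
+When timing generation, exclude warm-up runs so one-time costs such as CUDA initialization and compilation don't skew the numbers. A minimal stdlib sketch; `run` is any zero-argument callable you supply (e.g. `lambda: pipe(prompt, num_inference_steps=30)`), the run counts are arbitrary defaults, and if your callable returns before GPU work finishes you would need to add a device sync inside it.
+
+```python
+import statistics
+import time
+
+
+def benchmark(run, warmup=1, repeats=3, clock=time.perf_counter):
+    """Return the median wall-clock time of run() after discarding warm-up calls."""
+    for _ in range(warmup):
+        run()  # one-time costs (CUDA init, torch.compile) land here
+    timings = []
+    for _ in range(repeats):
+        start = clock()
+        run()
+        timings.append(clock() - start)
+    return statistics.median(timings)
+```
+
+The median is used rather than the mean so that a single slow outlier run does not dominate the result.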
+
+## Debugging Tips
+
+### Enable debug logging
+
+```python
+import logging
+logging.basicConfig(level=logging.DEBUG)
+
+# Or for specific modules
+logging.getLogger("diffusers").setLevel(logging.DEBUG)
+logging.getLogger("transformers").setLevel(logging.DEBUG)
+```
+
+### Check model components
+
+```python
+# Print pipeline components
+print(pipe.components)
+
+# Check model config
+print(pipe.unet.config)
+print(pipe.vae.config)
+print(pipe.scheduler.config)
+
+# Verify device placement
+print(pipe.device)
+for name, module in pipe.components.items():
+    if hasattr(module, 'device'):
+        print(f"{name}: {module.device}")
+```
+
+### Validate inputs
+
+```python
+# Check image dimensions
+print(f"Height: {height}, Width: {width}")
+assert height % 8 == 0, "Height must be divisible by 8"
+assert width % 8 == 0, "Width must be divisible by 8"
+
+# Check prompt tokenization
+tokens = pipe.tokenizer(prompt, return_tensors="pt")
+print(f"Token count: {tokens.input_ids.shape[1]}")  # CLIP limit is 77, including start/end tokens
+```
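+
+Rather than asserting, you can snap requested sizes to a valid multiple before calling the pipeline. A minimal sketch, assuming a VAE downsampling factor of 8 and 4 latent channels as in SD 1.x/2.x/SDXL (diffusers pipelines expose these as `pipe.vae_scale_factor` and `pipe.unet.config.in_channels`; check them for other models); rounding down keeps the result within the requested size.
+
+```python
+def snap_down(value: int, multiple: int = 8) -> int:
+    """Largest multiple of `multiple` not exceeding value, but never below `multiple`."""
+    return max(multiple, (value // multiple) * multiple)
+
+
+def latent_shape(height: int, width: int, batch: int = 1, channels: int = 4, factor: int = 8):
+    """Shape of the latent tensor the UNet denoises for a given image size."""
+    if height % factor or width % factor:
+        raise ValueError(f"height and width must be divisible by {factor}")
+    return (batch, channels, height // factor, width // factor)
+
+
+h, w = snap_down(770), snap_down(515)
+print(h, w, latent_shape(h, w))  # 768 512 (1, 4, 96, 64)
+```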
+
+### Save intermediate results
+
+```python
+import torch
+from PIL import Image
+
+def save_latents_callback(pipe, step_index, timestep, callback_kwargs):
+    latents = callback_kwargs["latents"]
+
+    # Decode and save intermediate
+    with torch.no_grad():
+        image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
+    image = (image / 2 + 0.5).clamp(0, 1)  # map [-1, 1] to [0, 1]
+    image = image.cpu().permute(0, 2, 3, 1).float().numpy()[0]  # .float(): numpy has no bfloat16
+    Image.fromarray((image * 255).astype("uint8")).save(f"step_{step_index}.png")
+
+    return callback_kwargs
+
+image = pipe(
+    prompt,
+    callback_on_step_end=save_latents_callback,
+    callback_on_step_end_tensor_inputs=["latents"]
+).images[0]
+```
+
+## Getting Help
+
+1. **Documentation**: https://huggingface.co/docs/diffusers
+2. **GitHub Issues**: https://github.com/huggingface/diffusers/issues
+3. **Discord**: https://discord.gg/diffusers
+4. **Forum**: https://discuss.huggingface.co
+
+### Reporting Issues
+
+Include:
+- Diffusers version: `pip show diffusers`
+- PyTorch version and the CUDA version it was built with: `python -c "import torch; print(torch.__version__, torch.version.cuda)"`
+- CUDA toolkit version (if installed): `nvcc --version`
+- GPU model: `nvidia-smi`
+- Full error traceback
+- Minimal reproducible code
+- Model name/ID used
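+
+Most of the version details above can be gathered with a small stdlib script (diffusers also ships `diffusers-cli env`, which prints a similar summary). A minimal sketch using `importlib.metadata`; the package list is an assumption, so add whatever your pipeline uses.
+
+```python
+from importlib import metadata
+
+DEFAULT_PACKAGES = ("diffusers", "transformers", "torch", "accelerate", "huggingface_hub", "xformers")
+
+
+def collect_versions(packages=DEFAULT_PACKAGES):
+    """Map each package name to its installed version, or 'not installed'."""
+    versions = {}
+    for name in packages:
+        try:
+            versions[name] = metadata.version(name)
+        except metadata.PackageNotFoundError:
+            versions[name] = "not installed"
+    return versions
+
+
+for name, version in collect_versions().items():
+    print(f"{name}: {version}")
+```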