# Segment Anything Troubleshooting Guide

## Installation Issues

### CUDA not available

**Error**: `RuntimeError: CUDA not available`

**Solutions**:

```python
# Check CUDA availability
import torch
print(torch.cuda.is_available())
print(torch.version.cuda)
```

```bash
# Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

```python
# If CUDA works but SAM doesn't use it
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")  # Explicitly move to GPU
```

### Import errors

**Error**: `ModuleNotFoundError: No module named 'segment_anything'`

**Solutions**:

```bash
# Install from GitHub
pip install git+https://github.com/facebookresearch/segment-anything.git

# Or clone and install
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything
pip install -e .

# Verify installation
python -c "from segment_anything import sam_model_registry; print('OK')"
```

### Missing dependencies

**Error**: `ModuleNotFoundError: No module named 'cv2'` or similar

**Solutions**:

```bash
# Install all optional dependencies
pip install opencv-python pycocotools matplotlib onnxruntime onnx

# For pycocotools on Windows
pip install pycocotools-windows
```

## Model Loading Issues

### Checkpoint not found

**Error**: `FileNotFoundError: checkpoint file not found`

**Solutions**:

```bash
# Download correct checkpoint
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# Verify file integrity
md5sum sam_vit_h_4b8939.pth
# Expected: a7bf3b02f3ebf1267aba913ff637d9a2
```

```python
# Use absolute path when loading
sam = sam_model_registry["vit_h"](checkpoint="/full/path/to/sam_vit_h_4b8939.pth")
```
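
`md5sum` isn't available everywhere (notably on stock Windows), so a small standard-library helper can compute the checksum instead. This is a generic sketch; `file_md5` is a hypothetical name, not part of SAM:

```python
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Compute a file's MD5 in chunks so large checkpoints don't fill RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the published checksum before loading:
# assert file_md5("sam_vit_h_4b8939.pth") == "a7bf3b02f3ebf1267aba913ff637d9a2"
```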

### Model type mismatch

**Error**: `KeyError: 'unexpected key in state_dict'`

**Solutions**:

```python
# Ensure model type matches checkpoint
# vit_h checkpoint → vit_h model
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

# vit_l checkpoint → vit_l model
sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth")

# vit_b checkpoint → vit_b model
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
```

### Out of memory during load

**Error**: `CUDA out of memory` during model loading

**Solutions**:

```python
# Use smaller model
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Load to CPU first, then move
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cpu")
torch.cuda.empty_cache()
sam.to("cuda")

# Use half precision
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam = sam.half()
sam.to("cuda")
```

## Inference Issues

### Image format errors

**Error**: `ValueError: expected input to have 3 channels`

**Solutions**:

```python
import cv2

# Ensure RGB format
image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR to RGB

# Convert grayscale to RGB
if len(image.shape) == 2:
    image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)

# Handle RGBA
if image.shape[2] == 4:
    image = image[:, :, :3]  # Drop alpha channel
```

### Coordinate errors

**Error**: `IndexError: index out of bounds` or incorrect mask location

**Solutions**:

```python
import numpy as np

# Ensure points are (x, y) not (row, col)
# x = column index, y = row index
point = np.array([[x, y]])  # Correct

# Verify coordinates are within image bounds
h, w = image.shape[:2]
assert 0 <= x < w and 0 <= y < h, "Point outside image"

# For bounding boxes: [x1, y1, x2, y2]
box = np.array([x1, y1, x2, y2])
assert x1 < x2 and y1 < y2, "Invalid box coordinates"
```
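
To fail fast with a readable message instead of an `IndexError` deep inside prediction, the checks above can be wrapped in one helper. `validate_prompts` is a hypothetical convenience, not a SAM API:

```python
import numpy as np

def validate_prompts(image_shape, points=None, boxes=None):
    """Raise ValueError early if any prompt falls outside the image."""
    h, w = image_shape[:2]
    if points is not None:
        for x, y in np.atleast_2d(points):
            if not (0 <= x < w and 0 <= y < h):
                raise ValueError(f"point ({x}, {y}) outside {w}x{h} image")
    if boxes is not None:
        for x1, y1, x2, y2 in np.atleast_2d(boxes):
            if not (x1 < x2 and y1 < y2):
                raise ValueError(f"box {(x1, y1, x2, y2)}: need x1 < x2 and y1 < y2")
            if x1 < 0 or y1 < 0 or x2 > w or y2 > h:
                raise ValueError(f"box {(x1, y1, x2, y2)} outside {w}x{h} image")
```

Call it once before `predictor.predict(...)`; a mask that lands in the wrong place usually means (x, y) was accidentally given as (row, col).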

### Empty or incorrect masks

**Problem**: Masks don't match expected object

**Solutions**:

```python
# Try multiple prompts
input_points = np.array([[x1, y1], [x2, y2]])
input_labels = np.array([1, 1])  # Multiple foreground points

# Add background points
input_points = np.array([[obj_x, obj_y], [bg_x, bg_y]])
input_labels = np.array([1, 0])  # 1=foreground, 0=background

# Use box prompt for large objects
box = np.array([x1, y1, x2, y2])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)

# Combine box and point
masks, scores, _ = predictor.predict(
    point_coords=np.array([[center_x, center_y]]),
    point_labels=np.array([1]),
    box=np.array([x1, y1, x2, y2]),
    multimask_output=True
)

# Check scores and select best
print(f"Scores: {scores}")
best_mask = masks[np.argmax(scores)]
```

### Slow inference

**Problem**: Prediction takes too long

**Solutions**:

```python
# Use smaller model
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Reuse image embeddings
predictor.set_image(image)  # Compute once
for point in points:
    masks, _, _ = predictor.predict(...)  # Fast, reuses embeddings

# Reduce automatic generation points
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=16,  # Default is 32
)

# Use ONNX for deployment
# Export: python scripts/export_onnx_model.py --return-single-mask
```
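
To confirm where the time actually goes (for ViT-H the one-off `set_image` embedding step usually dominates, and each subsequent `predict` is cheap), a tiny stdlib timer helps. `timeit_once` is a hypothetical helper, not part of SAM:

```python
import time

def timeit_once(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# _, t_embed = timeit_once(predictor.set_image, image)
# _, t_mask = timeit_once(predictor.predict, point_coords=pts, point_labels=labels)
```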

## Automatic Mask Generation Issues

### Too many masks

**Problem**: Generating thousands of overlapping masks

**Solutions**:

```python
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=16,           # Reduce from 32
    pred_iou_thresh=0.92,         # Increase from 0.88
    stability_score_thresh=0.98,  # Increase from 0.95
    box_nms_thresh=0.5,           # More aggressive NMS
    min_mask_region_area=500,     # Remove small masks
)
```
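
If tuning the generator still leaves near-duplicate masks, they can also be filtered after the fact with a greedy mask-level NMS over pairwise IoU. A minimal numpy sketch (`dedupe_masks` is a hypothetical helper, not part of SAM):

```python
import numpy as np

def dedupe_masks(masks, scores, iou_thresh=0.9):
    """Return indices of masks to keep, dropping near-duplicates of higher-scoring ones."""
    kept = []
    for i in np.argsort(scores)[::-1]:  # best score first
        is_dup = False
        for j in kept:
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            if union > 0 and inter / union > iou_thresh:
                is_dup = True
                break
        if not is_dup:
            kept.append(i)
    return kept
```

This is O(n²) in the number of masks; for very large outputs, comparing bounding boxes first and only then computing pixel IoU keeps it fast.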

### Too few masks

**Problem**: Missing objects in automatic generation

**Solutions**:

```python
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=64,           # Increase density
    pred_iou_thresh=0.80,         # Lower threshold
    stability_score_thresh=0.85,  # Lower threshold
    crop_n_layers=2,              # Add multi-scale
    min_mask_region_area=0,       # Keep all masks
)
```

### Small objects missed

**Problem**: Automatic generation misses small objects

**Solutions**:

```python
import numpy as np

# Use crop layers for multi-scale detection
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    crop_n_layers=2,
    crop_n_points_downscale_factor=1,  # Don't reduce points in crops
    min_mask_region_area=10,           # Very small minimum
)

# Or process image patches
def segment_with_patches(image, patch_size=512, overlap=64):
    h, w = image.shape[:2]
    all_masks = []

    for y in range(0, h, patch_size - overlap):
        for x in range(0, w, patch_size - overlap):
            patch = image[y:y+patch_size, x:x+patch_size]
            masks = mask_generator.generate(patch)

            # Offset masks to original coordinates
            for m in masks:
                m['bbox'][0] += x
                m['bbox'][1] += y
                # Offset the segmentation mask too, into a full-size canvas
                full = np.zeros((h, w), dtype=bool)
                ph, pw = m['segmentation'].shape
                full[y:y+ph, x:x+pw] = m['segmentation']
                m['segmentation'] = full

            all_masks.extend(masks)

    return all_masks
```

## Memory Issues

### CUDA out of memory

**Error**: `torch.cuda.OutOfMemoryError: CUDA out of memory`

**Solutions**:

```python
# Use smaller model
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")

# Clear cache between images
torch.cuda.empty_cache()

# Process images sequentially, not batched
for image in images:
    predictor.set_image(image)
    masks, _, _ = predictor.predict(...)
    torch.cuda.empty_cache()

# Reduce image size
max_size = 1024
h, w = image.shape[:2]
if max(h, w) > max_size:
    scale = max_size / max(h, w)
    image = cv2.resize(image, (int(w * scale), int(h * scale)))

# Use CPU for large batch processing
sam.to("cpu")
```

### RAM out of memory

**Problem**: System runs out of RAM

**Solutions**:

```python
import gc

# Process images one at a time
for img_path in image_paths:
    image = cv2.imread(img_path)
    masks = process_image(image)
    save_results(masks)
    del image, masks
    gc.collect()

# Use generators instead of lists
def generate_masks_lazy(image_paths):
    for path in image_paths:
        image = cv2.imread(path)
        masks = mask_generator.generate(image)
        yield path, masks
```

## ONNX Export Issues

### Export fails

**Error**: Various export errors

**Solutions**:

```bash
# Install correct ONNX version
pip install onnx==1.14.0 onnxruntime==1.15.0

# Use correct opset version
python scripts/export_onnx_model.py \
    --checkpoint sam_vit_h_4b8939.pth \
    --model-type vit_h \
    --output sam.onnx \
    --opset 17
```

### ONNX runtime errors

**Error**: `ONNXRuntimeError` during inference

**Solutions**:

```python
import onnxruntime

# Check available providers
print(onnxruntime.get_available_providers())

# Use CPU provider if GPU fails
session = onnxruntime.InferenceSession(
    "sam.onnx",
    providers=['CPUExecutionProvider']
)

# Verify input shapes (avoid shadowing the built-in `input`)
for inp in session.get_inputs():
    print(f"{inp.name}: {inp.shape}")
```

## HuggingFace Integration Issues

### Processor errors

**Error**: Issues with `SamProcessor`

**Solutions**:

```python
from transformers import SamModel, SamProcessor

# Use matching processor and model
model = SamModel.from_pretrained("facebook/sam-vit-huge")
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

# Ensure input format
input_points = [[[x, y]]]  # Nested list for batch dimension
inputs = processor(image, input_points=input_points, return_tensors="pt")
outputs = model(**inputs)

# Post-process correctly
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu()
)
```

## Quality Issues

### Jagged mask edges

**Problem**: Masks have rough, pixelated edges

**Solutions**:

```python
import cv2
import numpy as np
from scipy import ndimage

def smooth_mask(mask, sigma=2):
    """Smooth mask edges with a Gaussian blur."""
    smooth = ndimage.gaussian_filter(mask.astype(float), sigma=sigma)
    return smooth > 0.5

def refine_edges(mask, kernel_size=5):
    """Refine mask edges with morphological operations."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    # Close small gaps
    closed = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_CLOSE, kernel)
    # Open to remove noise
    opened = cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel)
    return opened.astype(bool)
```

### Incomplete segmentation

**Problem**: Mask doesn't cover entire object

**Solutions**:

```python
# Add multiple points
input_points = np.array([
    [obj_center_x, obj_center_y],
    [obj_left_x, obj_center_y],
    [obj_right_x, obj_center_y],
    [obj_center_x, obj_top_y],
    [obj_center_x, obj_bottom_y]
])
input_labels = np.array([1, 1, 1, 1, 1])

# Use bounding box
masks, _, _ = predictor.predict(
    box=np.array([x1, y1, x2, y2]),
    multimask_output=False
)

# Iterative refinement: feed previous logits back in
mask_input = None
for point in points:
    masks, scores, logits = predictor.predict(
        point_coords=point.reshape(1, 2),
        point_labels=np.array([1]),
        mask_input=mask_input,
        multimask_output=False
    )
    mask_input = logits
```

## Common Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| `CUDA out of memory` | GPU memory full | Use smaller model, clear cache |
| `expected 3 channels` | Wrong image format | Convert to RGB |
| `index out of bounds` | Invalid coordinates | Check point/box bounds |
| `checkpoint not found` | Wrong path | Use absolute path |
| `unexpected key` | Model/checkpoint mismatch | Match model type |
| `invalid box coordinates` | x1 > x2 or y1 > y2 | Fix box format |

## Getting Help

1. **GitHub Issues**: https://github.com/facebookresearch/segment-anything/issues
2. **HuggingFace Forums**: https://discuss.huggingface.co
3. **Paper**: https://arxiv.org/abs/2304.02643

### Reporting Issues

Include:

- Python version
- PyTorch version: `python -c "import torch; print(torch.__version__)"`
- CUDA version: `python -c "import torch; print(torch.version.cuda)"`
- SAM model type (vit_b/l/h)
- Full error traceback
- Minimal reproducible code
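
A small script can gather most of this automatically. A stdlib-only sketch (the package names below are the usual import names; anything not installed is simply reported as missing):

```python
import importlib
import platform
import sys

def environment_report():
    """Collect the version info most SAM bug reports ask for."""
    lines = [f"Python: {sys.version.split()[0]} ({platform.platform()})"]
    for name in ("torch", "torchvision", "segment_anything", "cv2", "onnxruntime"):
        try:
            mod = importlib.import_module(name)
            lines.append(f"{name}: {getattr(mod, '__version__', 'unknown')}")
        except ImportError:
            lines.append(f"{name}: not installed")
    try:
        import torch
        lines.append(f"CUDA: {torch.version.cuda}, available: {torch.cuda.is_available()}")
    except ImportError:
        pass
    return "\n".join(lines)

print(environment_report())
```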