# Segment Anything Troubleshooting Guide ## Installation Issues ### CUDA not available **Error**: `RuntimeError: CUDA not available` **Solutions**: ```python # Check CUDA availability import torch print(torch.cuda.is_available()) print(torch.version.cuda) # Install PyTorch with CUDA pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121 # If CUDA works but SAM doesn't use it sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth") sam.to("cuda") # Explicitly move to GPU ``` ### Import errors **Error**: `ModuleNotFoundError: No module named 'segment_anything'` **Solutions**: ```bash # Install from GitHub pip install git+https://github.com/facebookresearch/segment-anything.git # Or clone and install git clone https://github.com/facebookresearch/segment-anything.git cd segment-anything pip install -e . # Verify installation python -c "from segment_anything import sam_model_registry; print('OK')" ``` ### Missing dependencies **Error**: `ModuleNotFoundError: No module named 'cv2'` or similar **Solutions**: ```bash # Install all optional dependencies pip install opencv-python pycocotools matplotlib onnxruntime onnx # For pycocotools on Windows pip install pycocotools-windows ``` ## Model Loading Issues ### Checkpoint not found **Error**: `FileNotFoundError: checkpoint file not found` **Solutions**: ```bash # Download correct checkpoint wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth # Verify file integrity md5sum sam_vit_h_4b8939.pth # Expected: a7bf3b02f3ebf1267aba913ff637d9a2 # Use absolute path sam = sam_model_registry["vit_h"](checkpoint="/full/path/to/sam_vit_h_4b8939.pth") ``` ### Model type mismatch **Error**: `KeyError: 'unexpected key in state_dict'` **Solutions**: ```python # Ensure model type matches checkpoint # vit_h checkpoint → vit_h model sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth") # vit_l checkpoint → vit_l model sam = sam_model_registry["vit_l"](checkpoint="sam_vit_l_0b3195.pth") # vit_b checkpoint → vit_b model sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth") ``` ### Out of memory during load **Error**: `CUDA out of memory` during model loading **Solutions**: ```python # Use smaller model sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth") # Load to CPU first, then move sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth") sam.to("cpu") torch.cuda.empty_cache() sam.to("cuda") # Use half precision sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth") sam = sam.half() sam.to("cuda") ``` ## Inference Issues ### Image format errors **Error**: `ValueError: expected input to have 3 channels` **Solutions**: ```python import cv2 # Ensure RGB format image = cv2.imread("image.jpg") image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # BGR to RGB # Convert grayscale to RGB if len(image.shape) == 2: image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB) # Handle RGBA if image.shape[2] == 4: image = image[:, :, :3] # Drop alpha channel ``` ### Coordinate errors **Error**: `IndexError: index out of bounds` or incorrect mask location **Solutions**: ```python # Ensure points are (x, y) not (row, col) # x = column index, y = row index point = np.array([[x, y]]) # Correct # Verify coordinates are within image bounds h, w = image.shape[:2] assert 0 <= x < w and 0 <= y < h, "Point outside image" # For bounding boxes: [x1, y1, x2, y2] box = np.array([x1, y1, x2, y2]) assert x1 < x2 and y1 < y2, "Invalid box coordinates" ``` ### Empty or incorrect masks **Problem**: Masks don't match expected object **Solutions**: ```python # Try multiple prompts input_points = np.array([[x1, y1], [x2, y2]]) input_labels = np.array([1, 1]) # Multiple foreground points # Add background points input_points = np.array([[obj_x, obj_y], [bg_x, bg_y]]) input_labels = np.array([1, 0]) # 1=foreground, 0=background # Use box prompt for large objects box = np.array([x1, y1, x2, y2]) masks, scores, _ = predictor.predict(box=box, multimask_output=False) # Combine box and point masks, scores, _ = predictor.predict( point_coords=np.array([[center_x, center_y]]), point_labels=np.array([1]), box=np.array([x1, y1, x2, y2]), multimask_output=True ) # Check scores and select best print(f"Scores: {scores}") best_mask = masks[np.argmax(scores)] ``` ### Slow inference **Problem**: Prediction takes too long **Solutions**: ```python # Use smaller model sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth") # Reuse image embeddings predictor.set_image(image) # Compute once for point in points: masks, _, _ = predictor.predict(...) # Fast, reuses embeddings # Reduce automatic generation points mask_generator = SamAutomaticMaskGenerator( model=sam, points_per_side=16, # Default is 32 ) # Use ONNX for deployment # Export: python scripts/export_onnx_model.py --return-single-mask ``` ## Automatic Mask Generation Issues ### Too many masks **Problem**: Generating thousands of overlapping masks **Solutions**: ```python mask_generator = SamAutomaticMaskGenerator( model=sam, points_per_side=16, # Reduce from 32 pred_iou_thresh=0.92, # Increase from 0.88 stability_score_thresh=0.98, # Increase from 0.95 box_nms_thresh=0.5, # More aggressive NMS min_mask_region_area=500, # Remove small masks ) ``` ### Too few masks **Problem**: Missing objects in automatic generation **Solutions**: ```python mask_generator = SamAutomaticMaskGenerator( model=sam, points_per_side=64, # Increase density pred_iou_thresh=0.80, # Lower threshold stability_score_thresh=0.85, # Lower threshold crop_n_layers=2, # Add multi-scale min_mask_region_area=0, # Keep all masks ) ``` ### Small objects missed **Problem**: Automatic generation misses small objects **Solutions**: ```python # Use crop layers for multi-scale detection mask_generator = SamAutomaticMaskGenerator( model=sam, crop_n_layers=2, crop_n_points_downscale_factor=1, # Don't reduce points in crops min_mask_region_area=10, # Very small minimum ) # Or process image patches def segment_with_patches(image, patch_size=512, overlap=64): h, w = image.shape[:2] all_masks = [] for y in range(0, h, patch_size - overlap): for x in range(0, w, patch_size - overlap): patch = image[y:y+patch_size, x:x+patch_size] masks = mask_generator.generate(patch) # Offset masks to original coordinates for m in masks: m['bbox'][0] += x m['bbox'][1] += y # Offset segmentation mask too all_masks.extend(masks) return all_masks ``` ## Memory Issues ### CUDA out of memory **Error**: `torch.cuda.OutOfMemoryError: CUDA out of memory` **Solutions**: ```python # Use smaller model sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth") # Clear cache between images torch.cuda.empty_cache() # Process images sequentially, not batched for image in images: predictor.set_image(image) masks, _, _ = predictor.predict(...) torch.cuda.empty_cache() # Reduce image size max_size = 1024 h, w = image.shape[:2] if max(h, w) > max_size: scale = max_size / max(h, w) image = cv2.resize(image, (int(w*scale), int(h*scale))) # Use CPU for large batch processing sam.to("cpu") ``` ### RAM out of memory **Problem**: System runs out of RAM **Solutions**: ```python # Process images one at a time for img_path in image_paths: image = cv2.imread(img_path) masks = process_image(image) save_results(masks) del image, masks gc.collect() # Use generators instead of lists def generate_masks_lazy(image_paths): for path in image_paths: image = cv2.imread(path) masks = mask_generator.generate(image) yield path, masks ``` ## ONNX Export Issues ### Export fails **Error**: Various export errors **Solutions**: ```bash # Install correct ONNX version pip install onnx==1.14.0 onnxruntime==1.15.0 # Use correct opset version python scripts/export_onnx_model.py \ --checkpoint sam_vit_h_4b8939.pth \ --model-type vit_h \ --output sam.onnx \ --opset 17 ``` ### ONNX runtime errors **Error**: `ONNXRuntimeError` during inference **Solutions**: ```python import onnxruntime # Check available providers print(onnxruntime.get_available_providers()) # Use CPU provider if GPU fails session = onnxruntime.InferenceSession( "sam.onnx", providers=['CPUExecutionProvider'] ) # Verify input shapes for input in session.get_inputs(): print(f"{input.name}: {input.shape}") ``` ## HuggingFace Integration Issues ### Processor errors **Error**: Issues with SamProcessor **Solutions**: ```python from transformers import SamModel, SamProcessor # Use matching processor and model model = SamModel.from_pretrained("facebook/sam-vit-huge") processor = SamProcessor.from_pretrained("facebook/sam-vit-huge") # Ensure input format input_points = [[[x, y]]] # Nested list for batch dimension inputs = processor(image, input_points=input_points, return_tensors="pt") # Post-process correctly masks = processor.image_processor.post_process_masks( outputs.pred_masks.cpu(), inputs["original_sizes"].cpu(), inputs["reshaped_input_sizes"].cpu() ) ``` ## Quality Issues ### Jagged mask edges **Problem**: Masks have rough, pixelated edges **Solutions**: ```python import cv2 from scipy import ndimage def smooth_mask(mask, sigma=2): """Smooth mask edges.""" # Gaussian blur smooth = ndimage.gaussian_filter(mask.astype(float), sigma=sigma) return smooth > 0.5 def refine_edges(mask, kernel_size=5): """Refine mask edges with morphological operations.""" kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size)) # Close small gaps closed = cv2.morphologyEx(mask.astype(np.uint8), cv2.MORPH_CLOSE, kernel) # Open to remove noise opened = cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel) return opened.astype(bool) ``` ### Incomplete segmentation **Problem**: Mask doesn't cover entire object **Solutions**: ```python # Add multiple points input_points = np.array([ [obj_center_x, obj_center_y], [obj_left_x, obj_center_y], [obj_right_x, obj_center_y], [obj_center_x, obj_top_y], [obj_center_x, obj_bottom_y] ]) input_labels = np.array([1, 1, 1, 1, 1]) # Use bounding box masks, _, _ = predictor.predict( box=np.array([x1, y1, x2, y2]), multimask_output=False ) # Iterative refinement mask_input = None for point in points: masks, scores, logits = predictor.predict( point_coords=point.reshape(1, 2), point_labels=np.array([1]), mask_input=mask_input, multimask_output=False ) mask_input = logits ``` ## Common Error Messages | Error | Cause | Solution | |-------|-------|----------| | `CUDA out of memory` | GPU memory full | Use smaller model, clear cache | | `expected 3 channels` | Wrong image format | Convert to RGB | | `index out of bounds` | Invalid coordinates | Check point/box bounds | | `checkpoint not found` | Wrong path | Use absolute path | | `unexpected key` | Model/checkpoint mismatch | Match model type | | `invalid box coordinates` | x1 > x2 or y1 > y2 | Fix box format | ## Getting Help 1. **GitHub Issues**: https://github.com/facebookresearch/segment-anything/issues 2. **HuggingFace Forums**: https://discuss.huggingface.co 3. **Paper**: https://arxiv.org/abs/2304.02643 ### Reporting Issues Include: - Python version - PyTorch version: `python -c "import torch; print(torch.__version__)"` - CUDA version: `python -c "import torch; print(torch.version.cuda)"` - SAM model type (vit_b/l/h) - Full error traceback - Minimal reproducible code