Sync all skills and memories 2026-04-14 07:27

2026-04-14 07:27:20 +09:00
parent 516bb44fe6
commit 1eba2bca95
386 changed files with 167655 additions and 0 deletions
--- a/skills/mlops/inference/obliteratus/SKILL.md
+++ b/skills/mlops/inference/obliteratus/SKILL.md
@@ -0,0 +1,330 @@
+---
+name: obliteratus
+description: Remove refusal behaviors from open-weight LLMs using OBLITERATUS — mechanistic interpretability techniques (diff-in-means, SVD, whitened SVD, LEACE, SAE decomposition, etc.) to excise guardrails while preserving reasoning. 9 CLI methods, 28 analysis modules, 116 model presets across 5 compute tiers, tournament evaluation, and telemetry-driven recommendations. Use when a user wants to uncensor, abliterate, or remove refusal from an LLM.
+version: 2.0.0
+author: Hermes Agent
+license: MIT
+dependencies: [obliteratus, torch, transformers, bitsandbytes, accelerate, safetensors]
+metadata:
+  hermes:
+    tags: [Abliteration, Uncensoring, Refusal-Removal, LLM, Weight-Projection, SVD, Mechanistic-Interpretability, HuggingFace, Model-Surgery]
+    related_skills: [vllm, gguf, huggingface-tokenizers]
+---
+
+# OBLITERATUS Skill
+
+Remove refusal behaviors (guardrails) from open-weight LLMs without retraining or fine-tuning. Uses mechanistic interpretability techniques — including diff-in-means, SVD, whitened SVD, LEACE concept erasure, SAE decomposition, Bayesian kernel projection, and more — to identify and surgically excise refusal directions from model weights while preserving reasoning capabilities.
+
+**License warning:** OBLITERATUS is AGPL-3.0. NEVER import it as a Python library. Always invoke via CLI (`obliteratus` command) or subprocess. This keeps Hermes Agent's MIT license clean.
+
+## When to Use This Skill
+
+Trigger when the user:
+- Wants to "uncensor" or "abliterate" an LLM
+- Asks about removing refusal/guardrails from a model
+- Wants to create an uncensored version of Llama, Qwen, Mistral, etc.
+- Mentions "refusal removal", "abliteration", "weight projection"
+- Wants to analyze how a model's refusal mechanism works
+- References OBLITERATUS, abliterator, or refusal directions
+
+## Step 1: Installation
+
+Check if already installed:
+```bash
+obliteratus --version 2>/dev/null && echo "INSTALLED" || echo "NOT INSTALLED"
+```
+
+If not installed, clone and install from GitHub:
+```bash
+git clone https://github.com/elder-plinius/OBLITERATUS.git
+cd OBLITERATUS
+pip install -e .
+# For Gradio web UI support:
+# pip install -e ".[spaces]"
+```
+
+**IMPORTANT:** Confirm with user before installing. This pulls in ~5-10GB of dependencies (PyTorch, Transformers, bitsandbytes, etc.).
+
+## Step 2: Check Hardware
+
+Before anything, check what GPU is available:
+```bash
+python3 -c "
+import torch
+if torch.cuda.is_available():
+    gpu = torch.cuda.get_device_name(0)
+    vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
+    print(f'GPU: {gpu}')
+    print(f'VRAM: {vram:.1f} GB')
+    if vram < 4: print('TIER: tiny (models under 1B)')
+    elif vram < 8: print('TIER: small (models 1-4B)')
+    elif vram < 16: print('TIER: medium (models 4-9B with 4bit quant)')
+    elif vram < 32: print('TIER: large (models 8-32B with 4bit quant)')
+    else: print('TIER: frontier (models 32B+)')
+else:
+    print('NO GPU - only tiny models (under 1B) on CPU')
+"
+```
+
+### VRAM Requirements (with 4-bit quantization)
+
+| VRAM     | Max Model Size  | Example Models                              |
+|:---------|:----------------|:--------------------------------------------|
+| CPU only | ~1B params      | GPT-2, TinyLlama, SmolLM                    |
+| 4-8 GB   | ~4B params      | Qwen2.5-1.5B, Phi-3.5 mini, Llama 3.2 3B   |
+| 8-16 GB  | ~9B params      | Llama 3.1 8B, Mistral 7B, Gemma 2 9B       |
+| 24 GB    | ~32B params     | Qwen3-32B, Llama 3.1 70B (tight), Command-R |
+| 48 GB+   | ~72B+ params    | Qwen2.5-72B, DeepSeek-R1                    |
+| Multi-GPU| 200B+ params    | Llama 3.1 405B, DeepSeek-V3 (685B MoE)      |
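+
+As a rough cross-check on the table above, weight memory can be estimated from parameter count and bit width. This is a back-of-the-envelope sketch, not OBLITERATUS's own estimator: `estimate_weight_gib` is a hypothetical helper, and it ignores activations, the KV cache, and quantization overhead, all of which add to peak usage.
+
+```python
+def estimate_weight_gib(params_billion: float, bits: int = 4) -> float:
+    """Approximate memory for model weights alone, in GiB."""
+    total_bytes = params_billion * 1e9 * bits / 8
+    return total_bytes / 1024**3
+
+
+if __name__ == "__main__":
+    for name, size in [("8B", 8), ("32B", 32), ("70B", 70)]:
+        print(f"{name}: ~{estimate_weight_gib(size):.1f} GiB of 4-bit weights")
+```
+
+By this estimate, 4-bit weights for a 70B model come to roughly 33 GiB, so running one on a 24 GB card would additionally rely on offloading part of the model.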
+
+## Step 3: Browse Available Models & Get Recommendations
+
+```bash
+# Browse models by compute tier
+obliteratus models --tier medium
+
+# Get architecture info for a specific model
+obliteratus info <model_name>
+
+# Get telemetry-driven recommendation for best method & params
+obliteratus recommend <model_name>
+obliteratus recommend <model_name> --insights  # global cross-architecture rankings
+```
+
+## Step 4: Choose a Method
+
+### Method Selection Guide
+**Default / recommended for most cases: `advanced`.** It uses multi-direction SVD with norm-preserving projection and is well-tested.
+
+| Situation                         | Recommended Method | Why                                      |
+|:----------------------------------|:-------------------|:-----------------------------------------|
+| Default / most models             | `advanced`         | Multi-direction SVD, norm-preserving, reliable |
+| Quick test / prototyping          | `basic`            | Fast, simple, good enough to evaluate    |
+| Dense model (Llama, Mistral)      | `advanced`         | Multi-direction, norm-preserving         |
+| MoE model (DeepSeek, Mixtral)     | `nuclear`          | Expert-granular, handles MoE complexity  |
+| Reasoning model (R1 distills)     | `surgical`         | CoT-aware, preserves chain-of-thought    |
+| Stubborn refusals persist         | `aggressive`       | Whitened SVD + head surgery + jailbreak   |
+| Want reversible changes           | Steering vectors (see Analysis section) | Inference-time, no permanent weight change |
+| Maximum quality, time no object   | `optimized`        | Bayesian search for best parameters      |
+| Experimental auto-detection       | `informed`         | Auto-detects alignment type — experimental, may not always outperform advanced |
+
+### 9 CLI Methods
+- **basic** — Single refusal direction via diff-in-means. Fast (~5-10 min for 8B).
+- **advanced** (DEFAULT, RECOMMENDED) — Multiple SVD directions, norm-preserving projection, 2 refinement passes. Medium speed (~10-20 min).
+- **aggressive** — Whitened SVD + jailbreak-contrastive + attention head surgery. Higher risk of coherence damage.
+- **spectral_cascade** — DCT frequency-domain decomposition. Research/novel approach.
+- **informed** — Runs analysis DURING abliteration to auto-configure. Experimental — slower and less predictable than advanced.
+- **surgical** — SAE features + neuron masking + head surgery + per-expert. Very slow (~1-2 hrs). Best for reasoning models.
+- **optimized** — Bayesian hyperparameter search (Optuna TPE). Longest runtime but finds optimal parameters.
+- **inverted** — Flips the refusal direction. Model becomes actively willing.
+- **nuclear** — Maximum force combo for stubborn MoE models. Expert-granular.
+
+### Direction Extraction Methods (--direction-method flag)
+- **diff_means** (default) — Simple difference-in-means between refused/complied activations. Robust.
+- **svd** — Multi-direction SVD extraction. Better for complex alignment.
+- **leace** — LEACE (LEAst-squares Concept Erasure). Closed-form, least-squares-optimal linear erasure.
+
+### 4 Python-API-Only Methods
+(NOT available via CLI — require Python import, which violates AGPL boundary. Mention to user only if they explicitly want to use OBLITERATUS as a library in their own AGPL project.)
+- failspy, gabliteration, heretic, rdo
+
+## Step 5: Run Abliteration
+
+### Standard usage
+```bash
+# Default method (advanced) — recommended for most models
+obliteratus obliterate <model_name> --method advanced --output-dir ./abliterated-models
+
+# With 4-bit quantization (saves VRAM)
+obliteratus obliterate <model_name> --method advanced --quantization 4bit --output-dir ./abliterated-models
+
+# Large models (70B+) — conservative defaults
+obliteratus obliterate <model_name> --method advanced --quantization 4bit --large-model --output-dir ./abliterated-models
+```
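+
+Because of the AGPL boundary described in the license warning, a Python caller would shell out to the CLI rather than import the package. The following is a minimal sketch of that pattern; `build_obliterate_cmd` and `run_obliterate` are hypothetical helper names, and only flags documented in this skill are used.
+
+```python
+import subprocess
+from typing import List, Optional
+
+
+def build_obliterate_cmd(
+    model: str,
+    method: str = "advanced",
+    quantization: Optional[str] = None,
+    output_dir: str = "./abliterated-models",
+) -> List[str]:
+    """Assemble the argv list for `obliteratus obliterate`."""
+    cmd = ["obliteratus", "obliterate", model, "--method", method]
+    if quantization is not None:
+        cmd += ["--quantization", quantization]
+    cmd += ["--output-dir", output_dir]
+    return cmd
+
+
+def run_obliterate(model: str, **kwargs) -> int:
+    """Run the CLI as a separate process; raises CalledProcessError on failure."""
+    result = subprocess.run(build_obliterate_cmd(model, **kwargs), check=True)
+    return result.returncode
+```
+
+Passing an argument list instead of a shell string avoids quoting problems with model names and paths.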
+
+### Fine-tuning parameters
+```bash
+obliteratus obliterate <model_name> \
+  --method advanced \
+  --direction-method diff_means \
+  --n-directions 4 \
+  --refinement-passes 2 \
+  --regularization 0.1 \
+  --quantization 4bit \
+  --output-dir ./abliterated-models \
+  --contribute  # opt-in telemetry for community research
+```
+
+### Key flags
+| Flag | Description | Default |
+|:-----|:------------|:--------|
+| `--method` | Abliteration method | advanced |
+| `--direction-method` | Direction extraction | diff_means |
+| `--n-directions` | Number of refusal directions (1-32) | method-dependent |
+| `--refinement-passes` | Iterative passes (1-5) | 2 |
+| `--regularization` | Regularization strength (0.0-1.0) | 0.1 |
+| `--quantization` | Load in 4bit or 8bit | none (full precision) |
+| `--large-model` | Conservative defaults for 120B+ | false |
+| `--output-dir` | Where to save the abliterated model | ./obliterated_model |
+| `--contribute` | Share anonymized results for research | false |
+| `--verify-sample-size` | Number of test prompts for refusal check | 20 |
+| `--dtype` | Model dtype (float16, bfloat16) | auto |
+
+### Other execution modes
+```bash
+# Interactive guided mode (hardware → model → preset)
+obliteratus interactive
+
+# Web UI (Gradio)
+obliteratus ui --port 7860
+
+# Run a full ablation study from YAML config
+obliteratus run config.yaml --preset quick
+
+# Tournament: pit all methods against each other
+obliteratus tourney <model_name>
+```
+
+## Step 6: Verify Results
+
+After abliteration, check the output metrics:
+
+| Metric | Good Value | Warning |
+|:-------|:-----------|:--------|
+| Refusal rate | < 5% (ideally ~0%) | > 10% means refusals persist |
+| Perplexity change | < 10% increase | > 15% means coherence damage |
+| KL divergence | < 0.1 | > 0.5 means significant distribution shift |
+| Coherence | High / passes qualitative check | Degraded responses, repetition |
+
+### If refusals persist (> 10%)
+1. Try `aggressive` method
+2. Increase `--n-directions` (e.g., 8 or 16)
+3. Add `--refinement-passes 3`
+4. Try `--direction-method svd` instead of diff_means
+
+### If coherence is damaged (perplexity > 15% increase)
+1. Reduce `--n-directions` (try 2)
+2. Increase `--regularization` (try 0.3)
+3. Reduce `--refinement-passes` to 1
+4. Try `basic` method (gentler)
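+
+The thresholds above can be applied mechanically when scripting runs. This sketch assumes the metrics have already been read from the run's output as plain numbers; the function name and the way the metrics are obtained are illustrative, not part of the OBLITERATUS CLI.
+
+```python
+from typing import List
+
+
+def assess_run(refusal_rate: float, ppl_increase: float, kl_div: float) -> List[str]:
+    """Return warnings for metrics outside the Step 6 warning thresholds.
+
+    Rates are fractions: 0.10 means 10%.
+    """
+    warnings = []
+    if refusal_rate > 0.10:
+        warnings.append("refusals persist")
+    if ppl_increase > 0.15:
+        warnings.append("coherence damage")
+    if kl_div > 0.5:
+        warnings.append("significant distribution shift")
+    return warnings
+```
+
+An empty list means no warning threshold was crossed; a qualitative read of sample outputs is still needed for the coherence row.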
+
+## Step 7: Use the Abliterated Model
+
+The output is a standard HuggingFace model directory.
+
+```bash
+# Test locally with transformers
+python3 -c "
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model = AutoModelForCausalLM.from_pretrained('./abliterated-models/<model>')
+tokenizer = AutoTokenizer.from_pretrained('./abliterated-models/<model>')
+inputs = tokenizer('How do I pick a lock?', return_tensors='pt')
+outputs = model.generate(**inputs, max_new_tokens=200)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+"
+
+# Upload to HuggingFace Hub
+huggingface-cli upload <username>/<model-name>-abliterated ./abliterated-models/<model>
+
+# Serve with vLLM
+vllm serve ./abliterated-models/<model>
+```
+
+## CLI Command Reference
+
+| Command | Description |
+|:--------|:------------|
+| `obliteratus obliterate` | Main abliteration command |
+| `obliteratus info <model>` | Print model architecture details |
+| `obliteratus models --tier <tier>` | Browse curated models by compute tier |
+| `obliteratus recommend <model>` | Telemetry-driven method/param suggestion |
+| `obliteratus interactive` | Guided setup wizard |
+| `obliteratus tourney <model>` | Tournament: all methods head-to-head |
+| `obliteratus run <config.yaml>` | Execute ablation study from YAML |
+| `obliteratus strategies` | List all registered ablation strategies |
+| `obliteratus report <results.json>` | Regenerate visual reports |
+| `obliteratus ui` | Launch Gradio web interface |
+| `obliteratus aggregate` | Summarize community telemetry data |
+
+## Analysis Modules
+
+OBLITERATUS includes 28 analysis modules for mechanistic interpretability.
+See `skill_view(name="obliteratus", file_path="references/analysis-modules.md")` for the full reference.
+
+### Quick analysis commands
+```bash
+# Run specific analysis modules
+obliteratus run analysis-study.yaml --preset quick
+
+# Key modules to run first:
+# - alignment_imprint: Fingerprint DPO/RLHF/CAI/SFT alignment method
+# - concept_geometry: Single direction vs polyhedral cone
+# - logit_lens: Which layer decides to refuse
+# - anti_ouroboros: Self-repair risk score
+# - causal_tracing: Causally necessary components
+```
+
+### Steering Vectors (Reversible Alternative)
+Instead of permanent weight modification, use inference-time steering:
+```python
+# Python API only — for user's own projects
+from obliteratus.analysis.steering_vectors import SteeringVectorFactory, SteeringHookManager
+```
+
+## Ablation Strategies
+
+Beyond direction-based abliteration, OBLITERATUS includes structural ablation strategies:
+- **Embedding Ablation** — Target embedding layer components
+- **FFN Ablation** — Feed-forward network block removal
+- **Head Pruning** — Attention head pruning
+- **Layer Removal** — Full layer removal
+
+List all available: `obliteratus strategies`
+
+## Evaluation
+
+OBLITERATUS includes built-in evaluation tools:
+- Refusal rate benchmarking
+- Perplexity comparison (before/after)
+- LM Eval Harness integration for academic benchmarks
+- Head-to-head competitor comparison
+- Baseline performance tracking
+
+## Platform Support
+
+- **CUDA** — Full support (NVIDIA GPUs)
+- **Apple Silicon (MLX)** — Supported via MLX backend
+- **CPU** — Supported for tiny models (< 1B params)
+
+## YAML Config Templates
+
+Load templates for reproducible runs via `skill_view`:
+- `templates/abliteration-config.yaml` — Standard single-model config
+- `templates/analysis-study.yaml` — Pre-abliteration analysis study
+- `templates/batch-abliteration.yaml` — Multi-model batch processing
+
+## Telemetry
+
+OBLITERATUS can optionally contribute anonymized run data to a global research dataset.
+Enable with `--contribute` flag. No personal data is collected — only model name, method, metrics.
+
+## Common Pitfalls
+
+1. **Don't use `informed` as default** — it's experimental and slower. Use `advanced` for reliable results.
+2. **Models under ~1B respond poorly to abliteration** — their refusal behaviors are shallow and fragmented, making clean direction extraction difficult. Expect partial results (20-40% remaining refusal). Models 3B+ have cleaner refusal directions and respond much better (often 0% refusal with `advanced`).
+3. **`aggressive` can make things worse** — on small models it can damage coherence and actually increase refusal rate. Only use it if `advanced` leaves > 10% refusals on a 3B+ model.
+4. **Always check perplexity** — if it spikes > 15%, the model is damaged. Reduce aggressiveness.
+5. **MoE models need special handling** — use `nuclear` method for Mixtral, DeepSeek-MoE, etc.
+6. **Quantized models can't be re-quantized** — abliterate the full-precision model, then quantize the output.
+7. **VRAM estimation is approximate** — 4-bit quant helps but peak usage can spike during extraction.
+8. **Reasoning models are sensitive** — use `surgical` for R1 distills to preserve chain-of-thought.
+9. **Check `obliteratus recommend`** — telemetry data may have better parameters than defaults.
+10. **AGPL license** — never `import obliteratus` in MIT/Apache projects. CLI invocation only.
+11. **Large models (70B+)** — always use `--large-model` flag for conservative defaults.
+12. **Spectral certification RED is common** — the spectral check often flags "incomplete" even when practical refusal rate is 0%. Check actual refusal rate rather than relying on spectral certification alone.
+
+## Complementary Skills
+
+- **vllm** — Serve abliterated models with high throughput
+- **gguf** — Convert abliterated models to GGUF for llama.cpp
+- **huggingface-tokenizers** — Work with model tokenizers
--- a/skills/mlops/inference/obliteratus/references/analysis-modules.md
+++ b/skills/mlops/inference/obliteratus/references/analysis-modules.md
@@ -0,0 +1,166 @@
+# OBLITERATUS Analysis Modules — Reference
+
+OBLITERATUS includes 28 analysis modules for mechanistic interpretability of refusal in LLMs.
+These modules help understand how and where refusal behaviors are encoded before performing abliteration.
+
+---
+
+## Core Analysis (Run These First)
+
+### 1. Alignment Imprint Detection (`alignment_imprint.py`)
+Fingerprints whether a model was trained via DPO, RLHF, CAI, or SFT.
+This determines which extraction strategy will work best.
+
+### 2. Concept Cone Geometry (`concept_geometry.py`)
+Determines if refusal is a single linear direction or a polyhedral cone
+(set of multiple mechanisms). Single-direction models respond well to `basic`;
+polyhedral models need `advanced` or `surgical`.
+
+### 3. Refusal Logit Lens (`logit_lens.py`)
+Identifies the specific layer where a model "decides" to refuse by decoding
+intermediate layer representations into token space.
+
+### 4. Ouroboros Detection (`anti_ouroboros.py`)
+Identifies if a model attempts to "self-repair" refusal behaviors after
+excision. Reports a risk score (0-1). High scores mean additional refinement
+passes are needed.
+
+### 5. Causal Tracing (`causal_tracing.py`)
+Identifies which components (layers, heads, MLPs) are causally necessary
+for refusal behavior using activation patching.
+
+---
+
+## Geometric Analysis
+
+### 6. Cross-Layer Alignment (`cross_layer.py`)
+Measures how refusal directions align across different layers. High alignment
+means the refusal signal is consistent; low alignment suggests layer-specific
+mechanisms.
+
+### 7. Residual Stream Decomposition (`residual_stream.py`)
+Decomposes the residual stream into attention and MLP contributions to
+understand which component type contributes more to refusal.
+
+### 8. Riemannian Manifold Geometry (`riemannian_manifold.py`)
+Analyzes the curvature and geometry of the weight manifold near refusal
+directions. Informs how aggressively projections can be applied without
+damaging the manifold structure.
+
+### 9. Whitened SVD (`whitened_svd.py`)
+Covariance-normalized SVD extraction that separates guardrail signals from
+natural activation variance. More precise than standard SVD for models with
+high activation variance.
+
+### 10. Concept Cone Geometry (extended)
+Maps the full polyhedral structure of refusal, including cone angles,
+face counts, and intersection patterns.
+
+---
+
+## Probing & Classification
+
+### 11. Activation Probing (`activation_probing.py`)
+Post-excision verification — probes for residual refusal concepts after
+abliteration to ensure complete removal.
+
+### 12. Probing Classifiers (`probing_classifiers.py`)
+Trains linear classifiers to detect refusal in activations. Used both
+before (to verify refusal exists) and after (to verify it's gone).
+
+### 13. Activation Patching (`activation_patching.py`)
+Interchange interventions — swaps activations between refused and complied
+runs to identify causal components.
+
+### 14. Tuned Lens (`tuned_lens.py`)
+Trained version of logit lens that provides more accurate per-layer
+decoding by learning affine transformations for each layer.
+
+### 15. Multi-Token Position Analysis (`multi_token_position.py`)
+Analyzes refusal signals across multiple token positions, not just the
+last token. Important for models that distribute refusal across the sequence.
+
+---
+
+## Abliteration & Manipulation
+
+### 16. SAE-Based Abliteration (`sae_abliteration.py`)
+Uses Sparse Autoencoder features to identify and remove specific refusal
+features. More surgical than direction-based methods.
+
+### 17. Steering Vectors (`steering_vectors.py`)
+Creates and applies inference-time steering vectors for reversible refusal
+modification. Includes `SteeringVectorFactory` and `SteeringHookManager`.
+
+### 18. LEACE Concept Erasure (`leace.py`)
+LEAst-squares Concept Erasure: closed-form linear concept removal that is
+optimal in the least-squares sense. Available as both an analysis module and a direction extraction method.
+
+### 19. Sparse Surgery (`sparse_surgery.py`)
+High-precision weight modification targeting individual neurons and
+weight matrix entries rather than full directions.
+
+### 20. Conditional Abliteration (`conditional_abliteration.py`)
+Targeted removal that only affects specific refusal categories while
+preserving others (e.g., remove weapons refusal but keep CSAM refusal).
+
+---
+
+## Transfer & Robustness
+
+### 21. Cross-Model Transfer (`cross_model_transfer.py`)
+Tests whether refusal directions extracted from one model transfer to
+another architecture. Measures universality of guardrail directions.
+
+### 22. Defense Robustness (`defense_robustness.py`)
+Evaluates how robust the abliteration is against various defense mechanisms
+and re-alignment attempts.
+
+### 23. Spectral Certification (`spectral_certification.py`)
+Provides mathematical bounds on the completeness of refusal removal
+using spectral analysis of the projection.
+
+### 24. Wasserstein Optimal Extraction (`wasserstein_optimal.py`)
+Uses optimal transport theory for more precise direction extraction
+that minimizes distribution shift.
+
+### 25. Wasserstein Transfer (`wasserstein_transfer.py`)
+Distribution transfer between models using Wasserstein distance
+for cross-architecture refusal direction mapping.
+
+---
+
+## Advanced / Research
+
+### 26. Bayesian Kernel Projection (`bayesian_kernel_projection.py`)
+Probabilistic feature mapping that estimates uncertainty in refusal
+direction identification.
+
+### 27. Cross-Model Universality Index
+Measures if guardrail directions generalize across different model
+architectures and training regimes.
+
+### 28. Visualization (`visualization.py`)
+Plotting and graphing utilities for all analysis modules. Generates
+heatmaps, direction plots, and layer-wise analysis charts.
+
+---
+
+## Running Analysis
+
+### Via CLI
+```bash
+# Run analysis from a YAML config
+obliteratus run analysis-study.yaml --preset quick
+
+# Available study presets:
+# quick     — Fast sanity check (2-3 modules)
+# full      — All core + geometric analysis
+# jailbreak — Refusal circuit localization
+# knowledge — Knowledge preservation analysis
+# robustness — Stress testing / defense evaluation
+```
+
+### Via YAML Config
+See the `templates/analysis-study.yaml` template for a complete example.
+Load with: `skill_view(name="obliteratus", file_path="templates/analysis-study.yaml")`
--- a/skills/mlops/inference/obliteratus/references/methods-guide.md
+++ b/skills/mlops/inference/obliteratus/references/methods-guide.md
@@ -0,0 +1,141 @@
+# OBLITERATUS Methods — Detailed Guide
+
+> The CLI accepts 9 methods via `--method`: basic, advanced, aggressive, spectral_cascade,
+> informed, surgical, optimized, inverted, nuclear.
+> Four additional methods (failspy, gabliteration, heretic, rdo) are available only via the Python API.
+
+## How Abliteration Works (Theory)
+
+Abliteration identifies a "refusal direction" — a vector in the model's activation space that
+corresponds to refusal behavior — and projects it out of the weight matrices.
+
+Mathematically: `W_new = W_old - (W_old @ d @ d.T)`, where `d` is the refusal direction written as a unit-norm column vector, so that `d @ d.T` is the projector onto that direction.
+
+The key challenge is finding accurate refusal directions without damaging other capabilities.
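+
+The projection formula can be checked on a toy matrix. The sketch below uses plain Python lists to stay dependency-free; it illustrates the formula only and is not the OBLITERATUS implementation, which this guide does not show. `project_out` is a hypothetical name.
+
+```python
+import math
+from typing import List
+
+Matrix = List[List[float]]
+
+
+def project_out(W: Matrix, d: List[float]) -> Matrix:
+    """Return W - W u u^T, where u is d scaled to unit length."""
+    norm = math.sqrt(sum(x * x for x in d))
+    u = [x / norm for x in d]
+    projected = []
+    for row in W:
+        coeff = sum(w * x for w, x in zip(row, u))  # component of this row along u
+        projected.append([w - coeff * x for w, x in zip(row, u)])
+    return projected
+
+
+if __name__ == "__main__":
+    W = [[1.0, 2.0], [3.0, 4.0]]
+    d = [0.0, 2.0]  # deliberately not unit length; it is normalized inside
+    print(project_out(W, d))
+```
+
+With `d` along the second axis, the second column of the result is zero while the first column is untouched, which is the sense in which the direction has been removed.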
+
+---
+
+## Direction Extraction Methods
+
+Before projecting, OBLITERATUS extracts refusal directions using one of three methods:
+
+| Method | Flag | Description | Best For |
+|:-------|:-----|:------------|:---------|
+| Diff-in-Means | `--direction-method diff_means` | Difference between mean activations on refused vs. complied prompts | Default, fast, robust |
+| SVD | `--direction-method svd` | Multi-direction extraction via Singular Value Decomposition | Complex alignment, multiple refusal mechanisms |
+| LEACE | `--direction-method leace` | LEAst-squares Concept Erasure; closed-form, least-squares-optimal linear erasure | Maximum precision, research |
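+
+Diff-in-means, the default in the table above, reduces to a few lines once activations have been collected. The sketch below assumes the activations are already available as equal-length vectors (collecting them from a model is out of scope here). `diff_in_means` is an illustrative name, and whether OBLITERATUS normalizes the result in the same way is an assumption.
+
+```python
+import math
+from typing import List, Sequence
+
+
+def diff_in_means(refused: Sequence[Sequence[float]],
+                  complied: Sequence[Sequence[float]]) -> List[float]:
+    """Unit vector pointing from the complied mean to the refused mean."""
+    dim = len(refused[0])
+    mean_r = [sum(v[i] for v in refused) / len(refused) for i in range(dim)]
+    mean_c = [sum(v[i] for v in complied) / len(complied) for i in range(dim)]
+    diff = [r - c for r, c in zip(mean_r, mean_c)]
+    norm = math.sqrt(sum(x * x for x in diff))
+    return [x / norm for x in diff]
+```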
+
+---
+
+## Method Details
+
+### basic
+- **Directions:** 1 (single diff-in-means vector)
+- **Speed:** Fast (~5-10 min for 8B model)
+- **Risk:** Low
+- **Use case:** Quick tests, prototyping, evaluating if abliteration works for a model
+- **How it works:** Extracts one refusal direction and projects it out uniformly across all layers.
+
+### advanced (DEFAULT — RECOMMENDED)
+- **Directions:** 4 (multi-direction SVD)
+- **Speed:** Medium (~10-20 min for 8B model)
+- **Risk:** Low-Medium
+- **Refinement passes:** 2
+- **Use case:** Default for most models. Well-tested and reliable.
+- **How it works:** Extracts multiple refusal directions via SVD, applies norm-preserving bi-projection to maintain weight matrix norms. Two refinement passes catch residual refusal.
+
+### aggressive
+- **Directions:** 8+ (whitened SVD + jailbreak-contrastive)
+- **Speed:** Medium-Slow
+- **Risk:** Medium-High (may damage coherence)
+- **Use case:** When `advanced` leaves > 10% refusals. Stubborn models.
+- **How it works:** Uses whitened SVD for covariance-normalized extraction, adds jailbreak-contrastive directions, performs attention head surgery on the most refusal-active heads.
+
+### spectral_cascade
+- **Speed:** Medium
+- **Risk:** Medium
+- **Use case:** Research, novel approaches
+- **How it works:** DCT (Discrete Cosine Transform) frequency-domain decomposition of refusal signals. Separates high-frequency (surface-level) from low-frequency (deep) refusal patterns.
+
+### informed (EXPERIMENTAL)
+- **Speed:** Slow (~20-40 min for 8B model)
+- **Risk:** Variable — results depend on analysis quality
+- **Use case:** When you want auto-configuration, but be aware this is experimental and may not outperform `advanced`.
+- **How it works:** Runs 4 analysis modules first (alignment imprint, concept geometry, logit lens, ouroboros detection), then auto-configures extraction strategy. Includes an "Ouroboros loop" that detects and counteracts self-repair.
+- **Note:** The auto-detection can sometimes misconfigure. If results are poor, fall back to `advanced`.
+
+### surgical
+- **Speed:** Very slow (~1-2 hrs for 8B model)
+- **Risk:** Low (very precise)
+- **Use case:** Reasoning models (R1 distills, QwQ, etc.) where chain-of-thought must be preserved.
+- **How it works:** Uses SAE (Sparse Autoencoder) features + individual neuron masking + attention head surgery + per-expert decomposition (for MoE). CoT-aware — identifies and protects reasoning-critical directions before projecting.
+
+### optimized
+- **Speed:** Very slow (hours — runs many trials)
+- **Risk:** Low (finds optimal parameters)
+- **Use case:** When quality matters more than speed. Production models.
+- **How it works:** Bayesian hyperparameter search via Optuna TPE sampler. Optimizes n_directions, regularization, refinement passes, and layer selection jointly. Evaluates each configuration on refusal rate + perplexity.
+
+### inverted
+- **Speed:** Fast
+- **Risk:** High (model behavior changes dramatically)
+- **Use case:** Research, studying refusal mechanisms
+- **How it works:** Instead of projecting out the refusal direction, reflects it. The model actively complies rather than passively not-refusing. Useful for understanding the geometry of alignment.
+
+### nuclear
+- **Speed:** Slow
+- **Risk:** Medium-High
+- **Use case:** Stubborn MoE models (DeepSeek-MoE, Mixtral, etc.)
+- **How it works:** Combines expert-granular abliteration (EGA), steering vector injection, attention head pruning, and multi-pass refinement. Decomposes refusal signals into per-expert components for MoE architectures.
+
+---
+
+## Method Selection Flowchart
+
+```
+Is this a quick test?
+  → YES: basic
+  → NO: continue
+
+Is it an MoE model (Mixtral, DeepSeek-MoE)?
+  → YES: nuclear
+  → NO: continue
+
+Is it a reasoning model (R1, QwQ, CoT-focused)?
+  → YES: surgical
+  → NO: continue
+
+Do you need the absolute best quality and have time?
+  → YES: optimized
+  → NO: advanced (recommended default)
+
+Did advanced leave > 10% refusals?
+  → YES: aggressive
+  → Still refusing: nuclear
+```
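+
+The flowchart can be read as a pair of small functions: one for the first-pass choice and one for escalation when refusals remain. The function and parameter names are hypothetical; only the method names come from this guide.
+
+```python
+def choose_method(quick_test: bool = False, is_moe: bool = False,
+                  is_reasoning: bool = False, want_best: bool = False) -> str:
+    """First-pass method choice, following the flowchart top to bottom."""
+    if quick_test:
+        return "basic"
+    if is_moe:
+        return "nuclear"
+    if is_reasoning:
+        return "surgical"
+    if want_best:
+        return "optimized"
+    return "advanced"
+
+
+def escalate(previous: str) -> str:
+    """Next method to try when more than 10% of refusals remain."""
+    return {"advanced": "aggressive", "aggressive": "nuclear"}.get(previous, previous)
+```
+
+Because the checks run in order, an MoE reasoning model resolves to `nuclear`, matching the flowchart's precedence.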
+
+---
+
+## Key Parameters
+
+| Parameter | Range | Default | Effect |
+|:----------|:------|:--------|:-------|
+| `--n-directions` | 1-32 | method-dependent | More directions = more complete removal, but higher damage risk |
+| `--regularization` | 0.0-1.0 | 0.1 | Higher = more conservative (less removal, less damage) |
+| `--refinement-passes` | 1-5 | 2 | More passes catch residual refusal, but diminishing returns |
+| `--quantization` | 4bit, 8bit | none | Reduces VRAM usage; quality impact minimal for extraction |
+| `--verify-sample-size` | 10-200 | 20 | More samples = more accurate refusal rate estimate |
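+
+The effect of `--verify-sample-size` can be quantified with a standard binomial confidence interval. The sketch below uses the Wilson score interval, which is a generic statistical tool and not something the OBLITERATUS CLI is documented to report; `wilson_interval` is an illustrative helper.
+
+```python
+import math
+from typing import Tuple
+
+
+def wilson_interval(refusals: int, n: int, z: float = 1.96) -> Tuple[float, float]:
+    """Approximate 95% Wilson score interval for a refusal rate of refusals/n."""
+    p = refusals / n
+    denom = 1 + z * z / n
+    center = (p + z * z / (2 * n)) / denom
+    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
+    return max(0.0, center - half), min(1.0, center + half)
+
+
+if __name__ == "__main__":
+    for n in (20, 200):
+        lo, hi = wilson_interval(refusals=n // 20, n=n)  # 5% observed in both cases
+        print(f"n={n}: {lo:.1%} to {hi:.1%}")
+```
+
+With the default of 20 prompts, an observed 5% refusal rate is still compatible with a true rate above 20%; at 200 prompts the upper bound drops below 10%.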
+
+---
+
+## Troubleshooting
+
+| Problem | Likely Cause | Fix |
+|:--------|:-------------|:----|
+| Refusal rate > 20% | Too few directions | Increase `--n-directions`, try `aggressive` |
+| Refusal rate 5-20% | Residual refusal | Add `--refinement-passes 3`, try `--direction-method svd` |
+| Perplexity spike > 20% | Over-aggressive removal | Reduce `--n-directions`, increase `--regularization` |
+| Repetitive output | Weight matrix damage | Switch to `basic` (single direction) or lower `--n-directions`; check norm preservation |
+| MoE model still refuses | Non-expert-aware method | Switch to `nuclear` |
+| Reasoning degraded | CoT directions damaged | Use `surgical` method |
+| OOM during extraction | Insufficient VRAM | Add `--quantization 4bit` and/or `--large-model` |
--- a/skills/mlops/inference/obliteratus/templates/abliteration-config.yaml
+++ b/skills/mlops/inference/obliteratus/templates/abliteration-config.yaml
@@ -0,0 +1,33 @@
+# OBLITERATUS Abliteration Config
+# Usage: obliteratus run this-file.yaml
+#
+# This is for reproducible, version-controlled abliteration runs.
+# For one-off usage, the CLI flags are simpler.
+
+# Model to abliterate
+model:
+  name: "meta-llama/Llama-3.1-8B-Instruct"
+  dtype: "bfloat16"         # float16, bfloat16, float32
+  quantization: null         # null, "4bit", "8bit"
+  device: "auto"             # auto, cuda, cuda:0, cpu
+
+# Abliteration method and parameters
+abliteration:
+  method: "advanced"         # Recommended default; see SKILL.md Step 4 for the 9 CLI methods
+  n_directions: null         # null = auto-detect, or integer (e.g., 8)
+  regularization: 0.0        # 0.0-1.0, fraction of original to preserve
+  refinement_passes: 1       # Iterative passes (increase for self-repair)
+  norm_preserve: true        # Keep weight norms intact after projection
+
+# Output
+output:
+  directory: "./abliterated-models"
+  save_metadata: true        # Save abliteration_metadata.json alongside model
+  contribute: false          # Save community contribution data
+
+# Verification
+verify:
+  enabled: true
+  test_prompts: null         # null = use built-in test prompts
+  compute_perplexity: true
+  compute_kl: true
--- a/skills/mlops/inference/obliteratus/templates/analysis-study.yaml
+++ b/skills/mlops/inference/obliteratus/templates/analysis-study.yaml
@@ -0,0 +1,40 @@
+# OBLITERATUS Analysis Study Config
+# Usage: obliteratus run this-file.yaml --preset jailbreak
+#
+# Run analysis modules to understand refusal geometry BEFORE abliterating.
+# Useful for research or when you want to understand what you're removing.
+
+# Model to analyze
+model:
+  name: "meta-llama/Llama-3.1-8B-Instruct"
+  dtype: "bfloat16"
+  quantization: "4bit"       # Saves VRAM for analysis
+  device: "auto"
+
+# Study configuration
+study:
+  # Available presets: quick, full, attention, jailbreak, guardrail, knowledge
+  preset: "jailbreak"
+
+  # Or specify individual strategies:
+  # strategies:
+  #   - layer_removal
+  #   - head_pruning
+  #   - ffn_ablation
+  #   - embedding_ablation
+
+# Analysis modules to run (subset of the 28 available)
+analysis:
+  - alignment_imprint        # Detect DPO/RLHF/CAI/SFT training method
+  - concept_geometry          # Map refusal cone geometry
+  - logit_lens               # Find which layer decides to refuse
+  - anti_ouroboros            # Detect self-repair tendency
+  - cross_layer              # Cross-layer alignment clustering
+  - causal_tracing           # Causal necessity of components
+  - residual_stream          # Attention vs MLP contribution
+
+# Output
+output:
+  directory: "./analysis-results"
+  save_plots: true           # Generate matplotlib visualizations
+  save_report: true          # Generate markdown report
--- a/skills/mlops/inference/obliteratus/templates/batch-abliteration.yaml
+++ b/skills/mlops/inference/obliteratus/templates/batch-abliteration.yaml
@@ -0,0 +1,41 @@
+# OBLITERATUS Batch Abliteration Config
+# Abliterate multiple models with the same method for comparison.
+#
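+# Run each one sequentially, substituting the model IDs listed below:
+#   for model in "Qwen/Qwen2.5-1.5B-Instruct" "meta-llama/Llama-3.1-8B-Instruct"; do
+#     obliteratus obliterate "$model" --method advanced --quantization 4bit; done
+#
+# Or use this as a reference for which models to process.
+
+# Common settings
+defaults:
+  method: "advanced"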
+  quantization: "4bit"
+  output_dir: "./abliterated-models"
+
+# Models to process (grouped by compute tier)
+models:
+  # Small (4-8 GB VRAM)
+  small:
+    - "Qwen/Qwen2.5-1.5B-Instruct"
+    - "microsoft/Phi-3.5-mini-instruct"
+    - "meta-llama/Llama-3.2-3B-Instruct"
+
+  # Medium (8-16 GB VRAM)
+  medium:
+    - "meta-llama/Llama-3.1-8B-Instruct"
+    - "mistralai/Mistral-7B-Instruct-v0.3"
+    - "google/gemma-2-9b-it"
+    - "Qwen/Qwen2.5-7B-Instruct"
+
+  # Large (24 GB VRAM, 4-bit quantization)
+  large:
+    - "Qwen/Qwen2.5-14B-Instruct"
+    - "Qwen/Qwen3-32B"
+    - "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
+
+# Per-model method overrides (optional)
+overrides:
+  "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B":
+    method: "surgical"        # CoT-aware for reasoning models
+  "mistralai/Mixtral-8x7B-Instruct-v0.1":
+    method: "nuclear"         # Expert-granular for MoE models