Sync all skills and memories 2026-04-14 07:27

This commit is contained in:
2026-04-14 07:27:20 +09:00
parent 516bb44fe6
commit 1eba2bca95
386 changed files with 167655 additions and 0 deletions

View File

@@ -0,0 +1,72 @@
---
name: youtube-content
description: >
Fetch YouTube video transcripts and transform them into structured content
(chapters, summaries, threads, blog posts). Use when the user shares a YouTube
URL or video link, asks to summarize a video, requests a transcript, or wants
to extract and reformat content from any YouTube video.
---
# YouTube Content Tool
Extract transcripts from YouTube videos and convert them into useful formats.
## Setup
```bash
pip install youtube-transcript-api
```
## Helper Script
`SKILL_DIR` is the directory containing this SKILL.md file. The script accepts any standard YouTube URL format, short links (youtu.be), shorts, embeds, live links, or a raw 11-character video ID.
```bash
# JSON output with metadata
python3 SKILL_DIR/scripts/fetch_transcript.py "https://youtube.com/watch?v=VIDEO_ID"
# Plain text (good for piping into further processing)
python3 SKILL_DIR/scripts/fetch_transcript.py "URL" --text-only
# With timestamps
python3 SKILL_DIR/scripts/fetch_transcript.py "URL" --timestamps
# Specific language with fallback chain
python3 SKILL_DIR/scripts/fetch_transcript.py "URL" --language tr,en
```
## Output Formats
After fetching the transcript, format it based on what the user asks for:
- **Chapters**: Group by topic shifts, output timestamped chapter list
- **Summary**: Concise 5-10 sentence overview of the entire video
- **Chapter summaries**: Chapters with a short paragraph summary for each
- **Thread**: Twitter/X thread format — numbered posts, each under 280 chars
- **Blog post**: Full article with title, sections, and key takeaways
- **Quotes**: Notable quotes with timestamps
### Example — Chapters Output
```
00:00 Introduction — host opens with the problem statement
03:45 Background — prior work and why existing solutions fall short
12:20 Core method — walkthrough of the proposed approach
24:10 Results — benchmark comparisons and key takeaways
31:55 Q&A — audience questions on scalability and next steps
```
## Workflow
1. **Fetch** the transcript using the helper script with `--text-only --timestamps`.
2. **Validate**: confirm the output is non-empty and in the expected language. If empty, retry without `--language` to get any available transcript. If still empty, tell the user the video likely has transcripts disabled.
3. **Chunk if needed**: if the transcript exceeds ~50K characters, split into overlapping chunks (~40K with 2K overlap) and summarize each chunk before merging.
4. **Transform** into the requested output format. If the user did not specify a format, default to a summary.
5. **Verify**: re-read the transformed output to check for coherence, correct timestamps, and completeness before presenting.
## Error Handling
- **Transcript disabled**: tell the user; suggest they check if subtitles are available on the video page.
- **Private/unavailable video**: relay the error and ask the user to verify the URL.
- **No matching language**: retry without `--language` to fetch any available transcript, then note the actual language to the user.
- **Dependency missing**: run `pip install youtube-transcript-api` and retry.

View File

@@ -0,0 +1,56 @@
# Output Format Examples
## Chapters
```
00:00 Introduction
02:15 Background and motivation
05:30 Main approach
12:45 Results and evaluation
18:20 Limitations and future work
21:00 Q&A
```
## Summary
A 5-10 sentence overview covering the video's main points, key arguments, and conclusions. Written in third person, present tense.
## Chapter Summaries
```
## 00:00 Introduction (2 min)
The speaker introduces the topic of X and explains why it matters for Y.
## 02:15 Background (3 min)
A review of prior work in the field, covering approaches A, B, and C.
```
## Thread (Twitter/X)
```
1/ Just watched an incredible talk on [topic]. Here are the key takeaways: 🧵
2/ First insight: [point]. This matters because [reason].
3/ The surprising part: [unexpected finding]. Most people assume [common belief], but the data shows otherwise.
4/ Practical takeaway: [actionable advice].
5/ Full video: [URL]
```
## Blog Post
Full article with:
- Title
- Introduction paragraph
- H2 sections for each major topic
- Key quotes (with timestamps)
- Conclusion / takeaways
## Quotes
```
"The most important thing is not the model size, but the data quality." — 05:32
"We found that scaling past 70B parameters gave diminishing returns." — 12:18
```

View File

@@ -0,0 +1,124 @@
#!/usr/bin/env python3
"""
Fetch a YouTube video transcript and output it as structured JSON.
Usage:
python fetch_transcript.py <url_or_video_id> [--language en,tr] [--timestamps]
Output (JSON):
{
"video_id": "...",
"language": "en",
"segments": [{"text": "...", "start": 0.0, "duration": 2.5}, ...],
"full_text": "complete transcript as plain text",
"timestamped_text": "00:00 first line\n00:05 second line\n..."
}
Install dependency: pip install youtube-transcript-api
"""
import argparse
import json
import re
import sys
def extract_video_id(url_or_id: str) -> str:
"""Extract the 11-character video ID from various YouTube URL formats."""
url_or_id = url_or_id.strip()
patterns = [
r'(?:v=|youtu\.be/|shorts/|embed/|live/)([a-zA-Z0-9_-]{11})',
r'^([a-zA-Z0-9_-]{11})$',
]
for pattern in patterns:
match = re.search(pattern, url_or_id)
if match:
return match.group(1)
return url_or_id
def format_timestamp(seconds: float) -> str:
"""Convert seconds to HH:MM:SS or MM:SS format."""
total = int(seconds)
h, remainder = divmod(total, 3600)
m, s = divmod(remainder, 60)
if h > 0:
return f"{h}:{m:02d}:{s:02d}"
return f"{m}:{s:02d}"
def fetch_transcript(video_id: str, languages: list = None):
"""Fetch transcript segments from YouTube.
Returns a list of dicts with 'text', 'start', and 'duration' keys.
Compatible with youtube-transcript-api v1.x.
"""
try:
from youtube_transcript_api import YouTubeTranscriptApi
except ImportError:
print("Error: youtube-transcript-api not installed. Run: pip install youtube-transcript-api",
file=sys.stderr)
sys.exit(1)
api = YouTubeTranscriptApi()
if languages:
result = api.fetch(video_id, languages=languages)
else:
result = api.fetch(video_id)
# v1.x returns FetchedTranscriptSnippet objects; normalize to dicts
return [
{"text": seg.text, "start": seg.start, "duration": seg.duration}
for seg in result
]
def main():
parser = argparse.ArgumentParser(description="Fetch YouTube transcript as JSON")
parser.add_argument("url", help="YouTube URL or video ID")
parser.add_argument("--language", "-l", default=None,
help="Comma-separated language codes (e.g. en,tr). Default: auto")
parser.add_argument("--timestamps", "-t", action="store_true",
help="Include timestamped text in output")
parser.add_argument("--text-only", action="store_true",
help="Output plain text instead of JSON")
args = parser.parse_args()
video_id = extract_video_id(args.url)
languages = [l.strip() for l in args.language.split(",")] if args.language else None
try:
segments = fetch_transcript(video_id, languages)
except Exception as e:
error_msg = str(e)
if "disabled" in error_msg.lower():
print(json.dumps({"error": "Transcripts are disabled for this video."}))
elif "no transcript" in error_msg.lower():
print(json.dumps({"error": f"No transcript found. Try specifying a language with --language."}))
else:
print(json.dumps({"error": error_msg}))
sys.exit(1)
full_text = " ".join(seg["text"] for seg in segments)
timestamped = "\n".join(
f"{format_timestamp(seg['start'])} {seg['text']}" for seg in segments
)
if args.text_only:
print(timestamped if args.timestamps else full_text)
return
result = {
"video_id": video_id,
"segment_count": len(segments),
"duration": format_timestamp(segments[-1]["start"] + segments[-1]["duration"]) if segments else "0:00",
"full_text": full_text,
}
if args.timestamps:
result["timestamped_text"] = timestamped
print(json.dumps(result, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()