youtube-watcher
Fetch and analyze YouTube video transcripts, summaries, and metadata. Use when you need to summarize videos, answer questions about video content, extract key points, or research topics covered in YouTube videos.
Permissions
Risk Assessment
This skill requests 2 of 4 possible permissions. Moderate scope: review whether both permissions are necessary for the skill's stated purpose.
SKILL.md
Fetch and read transcripts from YouTube videos. Use when you need to summarize a video, answer questions about its content, or extract information from it.
What It Does
YouTube Watcher retrieves video transcripts and metadata from YouTube, enabling your OpenClaw agent to:
- Summarize videos — get key points from any video with a transcript
- Answer questions — query specific information from video content
- Extract data — pull quotes, timestamps, and structured information
- Research topics — analyze multiple videos on a subject
Usage
Provide a YouTube URL or video ID, and the skill fetches the transcript:
Watch this video and summarize the key points: https://youtube.com/watch?v=...
What does the speaker say about AI safety in this video? https://youtube.com/watch?v=...
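Because the skill accepts either a full URL or a bare video ID, it can help to normalize the input first. The following is a minimal sketch of that step, not the skill's actual implementation. The function name `extract_video_id`, the list of accepted hosts and path prefixes, and the 11-character ID pattern are assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-]. This matches IDs
# seen in practice but is not a documented guarantee.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")

# Assumption: hosts this sketch chooses to accept.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(text):
    """Return the video ID from a URL or bare ID, or None if none is found."""
    text = text.strip()
    if VIDEO_ID_RE.fullmatch(text):
        return text  # already a bare ID

    # Allow links pasted without a scheme, e.g. "youtube.com/watch?v=..."
    if "://" not in text:
        text = "https://" + text

    parsed = urlparse(text)
    host = (parsed.hostname or "").lower()
    candidate = None

    if host == "youtu.be":
        # Short links carry the ID as the first path component.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None


# "abcdefghijk" is a placeholder ID, not a real video.
print(extract_video_id("https://www.youtube.com/watch?v=abcdefghijk&t=42s"))
print(extract_video_id("https://youtu.be/abcdefghijk"))
```

Returning `None` for unrecognized input lets the caller report a malformed or truncated link instead of attempting a fetch.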
Supported Features
| Feature | Support |
|---|---|
| Video transcripts | Auto-generated and manual captions |
| Multiple languages | Yes, via available caption tracks |
| Video metadata | Title, description, channel, duration |
| Timestamp extraction | Yes, with caption timing data |
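Caption timing data is typically expressed as an offset in seconds from the start of the video. As a rough sketch, assuming offsets arrive as integer or floating-point seconds, a hypothetical helper `format_timestamp` could turn them into the familiar `M:SS` or `H:MM:SS` form used in summaries:

```python
def format_timestamp(seconds):
    """Format an offset in seconds as M:SS, or H:MM:SS for offsets of an hour or more."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


print(format_timestamp(75.4))   # 1:15
print(format_timestamp(3725))   # 1:02:05
```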
Limitations
- Requires the video to have captions (auto-generated or manual)
- Private and age-restricted videos may not be accessible
- Very long videos may need to be processed in segments
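One way to process a long video in segments is to group caption entries into chunks that stay under a size budget, keeping each chunk's start time so that results can still point back to a position in the video. This is a sketch under stated assumptions: the entry shape (dicts with `text` and `start` keys), the function name `chunk_segments`, and the character-based budget are all hypothetical, not the skill's documented behavior.

```python
def chunk_segments(segments, max_chars=4000):
    """Group caption entries into chunks of at most roughly max_chars characters.

    Each entry is assumed to be a dict with "text" (str) and "start" (seconds).
    A single entry longer than max_chars becomes its own chunk.
    """
    groups = []
    current, size = [], 0
    for seg in segments:
        length = len(seg["text"]) + 1  # +1 for the joining space
        if current and size + length > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(seg)
        size += length
    if current:
        groups.append(current)

    # Keep the start time of the first entry so each chunk can be cited.
    return [
        {"start": group[0]["start"], "text": " ".join(s["text"] for s in group)}
        for group in groups
    ]


captions = [
    {"text": "aaaa", "start": 0.0},
    {"text": "bbbb", "start": 2.0},
    {"text": "cccc", "start": 4.0},
]
print(chunk_segments(captions, max_chars=10))
```

Splitting on caption boundaries rather than at arbitrary character positions keeps every chunk aligned with a real timestamp.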
Best Practices
- Provide the full URL — include the complete YouTube link
- Be specific — ask targeted questions for better extraction
- Check availability — not all videos have transcripts enabled
- Combine with search — use with web search for comprehensive research
Why You Need youtube-watcher
YouTube is often described as the world's second-largest search engine, and it is a major source for tutorials, product reviews, conference talks, and thought leadership content. Watching a 45-minute video to extract three key insights is a poor use of time when what you need is the information, not the viewing experience.
YouTube Watcher fetches video transcripts and metadata, letting your OpenClaw agent summarize videos, answer questions about their content, and extract structured information without you having to watch the video. Feed it a URL and get the key points, timestamps, quotes, and data you need.
This is especially powerful for research workflows: instead of watching 10 videos on a topic, your agent can process all their transcripts in minutes and synthesize a comprehensive summary with citations back to specific videos and timestamps.
Common Use Cases
- Summarize a 1-hour conference talk into key takeaways with timestamps
- Extract all mentioned tools, libraries, or products from a tech review video
- Answer specific questions about video content without watching the full video
- Research a topic across multiple YouTube videos and synthesize findings
- Generate notes and action items from recorded meetings uploaded to YouTube
Frequently Asked Questions
Does it work with any YouTube video?
It works with any video that has captions, either auto-generated or manually uploaded. Many public videos have auto-generated captions, but not all do. Private and age-restricted videos may not be accessible.
Does it need an API key?
No API key is required. The skill uses YouTube's public transcript endpoints to fetch caption data.
Can it extract content in languages other than English?
Yes. It retrieves whatever caption tracks are available on the video, including multi-language captions. You can specify which language track to use.
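Choosing a language track can be thought of as a preference-ordered lookup over the tracks a video exposes. The sketch below assumes each track is described by a dict with hypothetical `language_code` and `is_generated` fields; the function name `pick_track` and the rule of preferring manually uploaded captions over auto-generated ones are illustrative choices, not the skill's documented behavior.

```python
def pick_track(tracks, preferred_languages):
    """Return the best track for the first preferred language that is available.

    Within a language, manually uploaded captions are preferred over
    auto-generated ones. Returns None if no preferred language is available.
    """
    for lang in preferred_languages:
        matches = [t for t in tracks if t["language_code"] == lang]
        if matches:
            # False sorts before True, so manual tracks come first.
            return sorted(matches, key=lambda t: t["is_generated"])[0]
    return None


tracks = [
    {"language_code": "en", "is_generated": True},
    {"language_code": "en", "is_generated": False},
    {"language_code": "es", "is_generated": True},
]
print(pick_track(tracks, ["de", "en"]))
```

Here German is requested first but is not available, so the lookup falls back to the manually uploaded English track.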