Insight Meditation Center Talks

This project downloads video transcripts from YouTube and scrapes talk metadata from audiodharma.org. It processes the raw transcripts with generative AI to produce cleaned, formatted, and enriched markdown files suitable for a personal knowledge base.

The transcription articles generated by this project are here.

Installation

Set up a Python virtual environment and install the required dependencies.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

You must also have the gemini-cli tool installed, configured, and available on your system’s PATH.

Script Usage

The script is organized into two main subcommands: youtube for downloading and processing video transcripts, and audiodharma for scraping website data.
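
As a rough illustration of that layout, the sketch below wires up the two subcommands with argparse. It is not taken from download.py: the function name build_parser, the dest names, and the choice to accept --limit and --force-ai-processing on both youtube sources are assumptions inferred from the example invocations in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout; the real download.py may differ."""
    parser = argparse.ArgumentParser(prog="download.py")
    commands = parser.add_subparsers(dest="command", required=True)

    # `audiodharma` takes no further arguments in the README examples.
    commands.add_parser("audiodharma", help="scrape talk and speaker metadata")

    youtube = commands.add_parser("youtube", help="download and process transcripts")
    sources = youtube.add_subparsers(dest="source", required=True)

    # Options shared by both youtube sources (assumed to apply to each).
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--limit", type=int, default=None)
    common.add_argument("--force-ai-processing", action="store_true")

    # `youtube channel-url <url>` processes a channel tab.
    channel = sources.add_parser("channel-url", parents=[common])
    channel.add_argument("url")

    # `youtube video-id <id>` processes a single video.
    video = sources.add_parser("video-id", parents=[common])
    video.add_argument("video_id")

    return parser
```

Calling build_parser().parse_args() on the command line would then yield a namespace with command, source, and the option values, which a dispatcher could route to the matching handler.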

1. Scraping AudioDharma.org

Before processing YouTube videos, it’s recommended to update the local cache of talk and speaker metadata from audiodharma.org. This data is used to enrich the final markdown files with speaker names and talk URLs.

python download.py audiodharma

This command scrapes the website and saves the data to cache/audiodharma/talks.yaml and cache/audiodharma/speakers.yaml. To avoid re-fetching the whole site, it stops as soon as it encounters a page with no new information.
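
The early-stop behaviour can be pictured with a minimal sketch like the one below. It assumes listing pages are visited newest first and that each talk is identified by its URL; scrape_until_known, fetch_page, and the max_pages safety cap are hypothetical names for illustration, not the script's actual code.

```python
from typing import Callable, Dict, List

Talk = Dict[str, str]


def scrape_until_known(
    fetch_page: Callable[[int], List[Talk]],
    known: Dict[str, Talk],
    max_pages: int = 1000,
) -> int:
    """Merge new talks into `known`; stop at the first page that adds nothing.

    Returns the number of talks added.
    """
    added = 0
    for page in range(1, max_pages + 1):
        talks = fetch_page(page)
        new = [t for t in talks if t["url"] not in known]
        if not new:
            # Empty page, or only already-cached talks: assume older pages
            # are cached too and stop requesting more.
            break
        for talk in new:
            known[talk["url"]] = talk
        added += len(new)
    return added
```

Passing fetch_page in as a parameter keeps the stopping rule separate from the HTTP and parsing code, so it can be exercised with a stub instead of the live site.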

2. Processing YouTube Transcripts

The youtube command fetches video transcripts, cleans them with an AI, and saves them as markdown files.

Commands

Common Options

Examples

# 1. Update the audiodharma.org data cache first.
python download.py audiodharma

# 2. Process all new videos from the IMC live stream channel.
# The script will stop once it finds a video that has already been processed and is up-to-date.
python download.py youtube channel-url "https://www.youtube.com/@InsightMeditationCenter/streams"

# Process the 5 most recent videos from the main videos tab, forcing AI processing.
python download.py youtube channel-url "https://www.youtube.com/@InsightMeditationCenter/videos" --limit 5 --force-ai-processing

# Process a single video.
python download.py youtube video-id "dQw4w9WgXcQ"
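
The interaction between the stop-at-first-processed rule and the --limit and --force-ai-processing options in the examples above could look roughly like the following sketch. The function select_videos and the is_up_to_date callback are hypothetical, and treating the force flag as disabling the early stop is an assumption rather than documented behaviour.

```python
from typing import Callable, Iterable, List, Optional


def select_videos(
    video_ids: Iterable[str],
    is_up_to_date: Callable[[str], bool],
    limit: Optional[int] = None,
    force: bool = False,
) -> List[str]:
    """Return the video IDs to process, in the order given (newest first)."""
    selected: List[str] = []
    for video_id in video_ids:
        if limit is not None and len(selected) >= limit:
            # Honour the cap on how many videos to handle in one run.
            break
        if not force and is_up_to_date(video_id):
            # Assumed behaviour: everything older has been handled already.
            break
        selected.append(video_id)
    return selected
```

With four videos listed newest first and the third already processed, a default run would select the first two, while a forced run with a limit of 3 would select the first three.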

Project Structure