---
title: Transcribe Content with Speakers and Timestamps
type: skill
summary: Turn audio or video into a cleaned transcript with speaker labels, timestamps, and a usable handoff format.
category: Media
environment: local-agent
harnesses:
  - claude-code
  - claude-desktop
  - codex
complexity: moderate
tags:
  - transcription
  - video
  - audio
  - content
compatibility:
  mac:
    status: tested
    notes: Tested on macOS with ffmpeg and a local transcription workflow.
    requires:
      - name: ffmpeg
        url: https://ffmpeg.org/download.html
        reason: Extracts and normalizes audio from source media.
      - name: WhisperX
        url: https://github.com/m-bain/whisperX
        reason: Creates transcripts with word-level timestamps and speaker diarization.
      - name: Python
        url: https://www.python.org/downloads/
        reason: Runs the transcription tooling.
  windows:
    status: expected
    notes: Expected to work with equivalent ffmpeg, Python, and WhisperX setup.
    requires:
      - name: ffmpeg
        url: https://ffmpeg.org/download.html
        reason: Extracts and normalizes audio from source media.
      - name: WhisperX
        url: https://github.com/m-bain/whisperX
        reason: Creates transcripts with word-level timestamps and speaker diarization.
      - name: Python
        url: https://www.python.org/downloads/windows/
        reason: Runs the transcription tooling.
  linux:
    status: expected
    notes: Expected to work with standard package-manager installs.
    requires:
      - name: ffmpeg
        url: https://ffmpeg.org/download.html
        reason: Extracts and normalizes audio from source media.
      - name: WhisperX
        url: https://github.com/m-bain/whisperX
        reason: Creates transcripts with word-level timestamps and speaker diarization.
      - name: Python
        url: https://www.python.org/downloads/
        reason: Runs the transcription tooling.
---

## Use When

Use this skill when you have a video, podcast, screen recording, interview, course clip, or meeting recording and need a transcript that keeps the conversation structure intact.

## Inputs

- A local audio or video file.
- Speaker names, if known.
- The desired output format: clean transcript, chaptered notes, captions, or content brief.

## Workflow

1. Confirm the user owns the media or has permission to process it.
2. Use `ffmpeg` to extract the audio track from the source into a single clean file, for example mono 16 kHz WAV.
3. Run transcription with speaker diarization enabled.
4. Review obvious speaker-label mistakes and timestamp drift.
5. Remove filler words (such as "um", "uh", and false starts) only when doing so does not change the meaning.
6. Return the transcript with speaker names, timestamps, and a short quality note that flags known gaps, such as unclear audio or uncertain speaker labels.
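
Steps 2 and 3 can be sketched as two command lines assembled in Python. This is a minimal sketch, not a definitive setup: the helper names (`default_audio_path`, `build_ffmpeg_command`, `build_whisperx_command`) are hypothetical, the 16 kHz mono WAV target is an assumption, and the WhisperX flags shown should be checked against `whisperx --help` for the installed version. Diarization in WhisperX may also need a Hugging Face token, which is left out here.

```python
from pathlib import Path


def default_audio_path(source: str) -> str:
    """Derive an output name next to the source, e.g. talk.mp4 -> talk.16k.wav."""
    src = Path(source)
    return str(src.with_name(src.stem + ".16k.wav"))


def build_ffmpeg_command(source: str, audio_out: str) -> list[str]:
    """Build an ffmpeg command that extracts mono 16 kHz PCM audio."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it already exists
        "-i", source,          # input audio or video file
        "-vn",                 # drop any video stream
        "-ac", "1",            # downmix to a single channel
        "-ar", "16000",        # resample to 16 kHz (assumed target rate)
        "-c:a", "pcm_s16le",   # write 16-bit PCM, i.e. a plain WAV file
        audio_out,
    ]


def build_whisperx_command(audio: str, output_dir: str) -> list[str]:
    """Build a WhisperX command with diarization on (flags are assumptions to verify)."""
    return [
        "whisperx", audio,
        "--diarize",                 # attach speaker labels to segments
        "--output_format", "json",   # keep timestamps in a machine-readable file
        "--output_dir", output_dir,
    ]


# To actually run a command, pass it to subprocess.run(cmd, check=True).
```

Building the commands as lists rather than shell strings keeps file names with spaces or quotes from being misparsed.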

## Output

The final transcript should be easy to quote, skim, and hand to another AI workflow. Prefer this shape:

```txt
[00:00:12] Speaker 1: Welcome back. Today we are looking at...
[00:00:26] Speaker 2: The tricky part is...
```
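
The shape above can be produced from diarized segments with a small formatter. This is a minimal sketch, assuming each segment is a dict with `start` (seconds), `text`, and an optional `speaker` label; the function names and the `speaker_names` mapping are hypothetical and only illustrate how known speaker names could replace raw labels.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as [HH:MM:SS], truncating fractions."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def format_transcript(segments, speaker_names=None):
    """Turn segments into one '[HH:MM:SS] Name: text' line per segment."""
    speaker_names = speaker_names or {}
    lines = []
    for segment in segments:
        label = segment.get("speaker", "Unknown")   # segments may lack a label
        name = speaker_names.get(label, label)      # fall back to the raw label
        text = " ".join(segment["text"].split())    # collapse stray whitespace
        lines.append(f"{format_timestamp(segment['start'])} {name}: {text}")
    return "\n".join(lines)


segments = [
    {"start": 12.4, "text": " Welcome back. ", "speaker": "SPEAKER_00"},
    {"start": 26.0, "text": "The tricky part is...", "speaker": "SPEAKER_01"},
]
names = {"SPEAKER_00": "Speaker 1", "SPEAKER_01": "Speaker 2"}
print(format_transcript(segments, names))
```

Segments without a speaker label are printed as `Unknown` rather than dropped, so gaps in diarization stay visible during review.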

## Prompt

Use the related prompt when you want an AI pass over a raw transcript to normalize speaker names, clean formatting, and preserve timestamps.
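
After an AI cleanup pass, it can help to check that no timestamp was altered or dropped. The following is a minimal sketch, assuming both the raw and cleaned transcripts use the `[HH:MM:SS]` line prefix shown in the Output section and that the cleanup keeps one line per original entry; the function names are hypothetical.

```python
import re

# Matches a [HH:MM:SS] marker at the start of a line.
TIMESTAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]", re.MULTILINE)


def extract_timestamps(transcript: str) -> list[str]:
    """Return every line-leading timestamp, in document order."""
    return TIMESTAMP.findall(transcript)


def timestamps_preserved(raw: str, cleaned: str) -> bool:
    """True when the cleaned transcript keeps the same timestamps in the same order."""
    return extract_timestamps(raw) == extract_timestamps(cleaned)


raw = "[00:00:12] SPEAKER_00: um, welcome back.\n[00:00:26] SPEAKER_01: The tricky part is..."
cleaned = "[00:00:12] Speaker 1: Welcome back.\n[00:00:26] Speaker 2: The tricky part is..."
print(timestamps_preserved(raw, cleaned))
```

If the cleanup pass is allowed to merge adjacent lines, an exact match is too strict, and a subset check on the timestamps would be the looser alternative.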