AI Skills Library

Transcribe Content with Speakers and Timestamps

Turn audio or video into a cleaned transcript with speaker labels, timestamps, and a usable handoff format.

Local agent

Use When

Use this skill when you have a video, podcast, screen recording, interview, course clip, or meeting recording and need a transcript that keeps the conversation structure intact.

Inputs

  • A local audio or video file.
  • Speaker names, if known.
  • The desired output format: clean transcript, chaptered notes, captions, or content brief.

Workflow

  1. Confirm the user owns the media or has permission to process it.
  2. Use ffmpeg to extract a clean audio file from the source.
  3. Run transcription with speaker diarization enabled.
  4. Review the result and correct obvious speaker-label mistakes and timestamp drift.
  5. Remove filler words only when doing so does not change the meaning.
  6. Return the transcript with speaker names, timestamps, and a short quality note.
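
Step 2 of the workflow can be sketched as follows. This is a minimal example, not the skill's actual implementation: it assumes ffmpeg is installed and on PATH, and the choice of mono 16 kHz 16-bit WAV is an assumption (a common input format for speech models), as are the function names and file paths.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: Path, target: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes mono 16-bit PCM WAV audio."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the target if it already exists
        "-i", str(source),         # input audio or video file
        "-vn",                     # ignore any video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample (16 kHz is an assumed default)
        "-c:a", "pcm_s16le",       # uncompressed 16-bit PCM
        str(target),
    ]


def extract_audio(source: Path, target: Path) -> Path:
    """Run ffmpeg and return the path of the extracted audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(source, target), check=True)
    return target
```

For example, `extract_audio(Path("interview.mp4"), Path("interview.wav"))` would write the audio track next to the hypothetical source file. Keeping command construction separate from execution makes the flags easy to inspect before anything runs.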

Output

The final transcript should be easy to quote, skim, and hand to another AI workflow. Prefer this shape:

[00:00:12] Speaker 1: Welcome back. Today we are looking at...
[00:00:26] Speaker 2: The tricky part is...
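
One way to produce lines in this shape is sketched below. The `Segment` type and its field names are hypothetical; a real transcription tool will return its own structure, which would need to be mapped onto something similar.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized utterance (hypothetical structure for illustration)."""
    start_seconds: float
    speaker: str
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as [HH:MM:SS], truncating fractional seconds."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def render_transcript(segments: list) -> str:
    """Sort segments by start time and emit one line per segment."""
    ordered = sorted(segments, key=lambda s: s.start_seconds)
    return "\n".join(
        f"{format_timestamp(s.start_seconds)} {s.speaker}: {s.text.strip()}"
        for s in ordered
    )
```

Sorting by start time before rendering guards against segments that arrive out of order, which can happen when diarization and transcription results are merged.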

Prompt

Use the related prompt when you want an AI pass over a raw transcript to normalize speaker names, clean formatting, and preserve timestamps.
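
When the speaker names are already known, the renaming part of that pass can also be done deterministically. The sketch below assumes transcript lines follow the `[HH:MM:SS] Speaker: text` shape shown under Output; the function name and the example name mapping are hypothetical.

```python
import re

# Matches "[HH:MM:SS] Label: text" and captures the three parts.
LINE_PATTERN = re.compile(r"^(\[\d{2}:\d{2}:\d{2}\]) ([^:]+): (.*)$")


def rename_speakers(transcript: str, names: dict) -> str:
    """Replace generic speaker labels with known names, keeping timestamps."""
    renamed = []
    for line in transcript.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            # Leave lines that do not fit the expected shape untouched.
            renamed.append(line)
            continue
        timestamp, speaker, text = match.groups()
        label = names.get(speaker.strip(), speaker)
        renamed.append(f"{timestamp} {label}: {text}")
    return "\n".join(renamed)
```

Calling `rename_speakers(raw, {"Speaker 1": "Ana"})` would relabel only the lines attributed to Speaker 1 and leave every timestamp and every unmapped label as it was.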
