Transcript JSON Specification

Transcript data is stored in a single server-side `transcript.json` file per project. The file is timestamp-based and is not tied to any specific audio file. If the audio is edited without a corresponding transcript update, playback positions become misaligned. Re-running automatic transcription realigns the transcript but overwrites any manual edits.

Root Keys

key	type	description
`paragraphs`	list	Transcript text, structured as paragraphs → sentences → words
`sectionBreaks`	list	User-defined section markers with timestamps
`annotations`	object	Text annotations (hyperlinks) attached to transcript segments
`notes`	list	Inline notes anchored to timecodes (stage directions, speech labels, editor notes)

All root keys are optional. A file may contain only `paragraphs`, for example, or only `annotations`.
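
Because any root key may be absent, a reader of the file will probably want to normalize it before use. The following is a minimal sketch in Python, not the project's actual loader; the names `ROOT_DEFAULTS` and `parse_transcript` are hypothetical, and the empty defaults are assumed from the types in the table above.

```python
import json
from typing import Any, Callable

# Assumed empty default for each root key, based on the types listed above.
ROOT_DEFAULTS: dict[str, Callable[[], Any]] = {
    "paragraphs": list,
    "sectionBreaks": list,
    "annotations": dict,
    "notes": list,
}


def parse_transcript(raw: str) -> dict[str, Any]:
    """Parse transcript JSON text and fill in any missing root keys."""
    data = json.loads(raw)
    # Build fresh defaults on every call so callers never share mutable state.
    normalized = {key: make_default() for key, make_default in ROOT_DEFAULTS.items()}
    normalized.update(data)
    return normalized
```

With this in place, `parse_transcript('{"notes": []}')["paragraphs"]` yields an empty list instead of raising `KeyError`.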

`paragraphs`

A list of paragraph objects. Paragraphs are created either manually by the user or automatically based on detected pauses during transcription. Each paragraph contains one or more sentences; each sentence optionally contains a word-level breakdown.

Paragraph

key	type	description
`speaker`	str	The `speaker_id` associated with this paragraph
`start`	float	Start time of the paragraph in seconds
`end`	float	End time of the paragraph in seconds
`sentences`	list	Ordered list of sentence objects within this paragraph

Sentence

key	type	description
`start`	float	Start time of the sentence in seconds
`end`	float	End time of the sentence in seconds
`text`	str	Full text of the sentence
`words`	list	Word-level breakdown (may be empty; if so, `text` is used as the atomic unit)

Word

key	type	description
`start`	float	Start time of the word in seconds
`end`	float	End time of the word in seconds
`word`	str	The word string
`probability`	float	Confidence score (0–1) for the word's timing/detection
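
The paragraphs → sentences → words nesting, together with the rule that `text` is the atomic unit when `words` is empty, can be walked as sketched below. The helper name `iter_timed_units` is hypothetical. Treating a missing `words` key the same as an empty list is an assumption.

```python
from typing import Iterator


def iter_timed_units(paragraphs: list[dict]) -> Iterator[tuple[float, float, str]]:
    """Yield (start, end, text) for the smallest timed unit available."""
    for paragraph in paragraphs:
        for sentence in paragraph.get("sentences", []):
            # Assumption: a missing "words" key behaves like an empty list.
            words = sentence.get("words") or []
            if words:
                for word in words:
                    yield word["start"], word["end"], word["word"]
            else:
                # No word-level data: the sentence text is the atomic unit.
                yield sentence["start"], sentence["end"], sentence["text"]
```

A consumer such as a playback highlighter could use this to avoid branching on whether word-level timing exists.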

`sectionBreaks`

A list of section break objects. Section numbers are not stored in the file; they are assigned at runtime by sorting the breaks by `beforeSegStart`.

key	type	description
`beforeSegStart`	float	Start time, in seconds, of the first segment (sentence) in this section. At runtime, the break is positioned immediately before the nearest sentence start. `0` places the break before the very first paragraph.
`name`	str	User-defined label for the section (may be empty string)
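
The runtime behaviour described above (numbering by sorted `beforeSegStart`, then snapping each break to the nearest sentence start) could look roughly like the sketch below. It rests on several assumptions: numbering starts at 1, the `number` key and both function names are hypothetical, and "nearest" means smallest absolute time difference.

```python
from typing import Optional


def number_sections(section_breaks: list[dict]) -> list[dict]:
    """Return copies of the breaks sorted by beforeSegStart with a runtime number."""
    ordered = sorted(section_breaks, key=lambda brk: brk["beforeSegStart"])
    # Assumption: section numbers are 1-based; "number" is never written to the file.
    return [{**brk, "number": i} for i, brk in enumerate(ordered, start=1)]


def snap_to_sentence_start(
    before_seg_start: float, sentence_starts: list[float]
) -> Optional[float]:
    """Pick the sentence start closest in time to the break, or None if there are none."""
    if not sentence_starts:
        return None
    return min(sentence_starts, key=lambda start: abs(start - before_seg_start))
```

Returning copies keeps the stored objects free of runtime-only fields, so the file can be written back unchanged.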

`notes`

A list of inline note objects anchored to timecodes. Each note is rendered at the first paragraph boundary at or after its timecode. The list is kept sorted by `timecode`.

key	type	description
`timecode`	float	Position in seconds; the note renders before the first paragraph starting at or after this time
`type`	str	`"stage_direction"` \| `"speech_label"` \| `"editor_note"`
`text`	str	The note text
`public`	bool	`false` = visible to editors only; `true` = visible in presentation view. Notes of type `"editor_note"` are always hidden in presentation view, regardless of this flag.
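
The placement and visibility rules for notes can be expressed compactly, as in the sketch below. Both function names are hypothetical, and `paragraph_starts` is assumed to be sorted in ascending order. The specification does not say what happens when no paragraph starts at or after the timecode, so this sketch returns an index one past the last paragraph and leaves that case to the caller.

```python
from bisect import bisect_left


def paragraph_index_for_note(timecode: float, paragraph_starts: list[float]) -> int:
    """Index of the first paragraph starting at or after the note's timecode.

    Returns len(paragraph_starts) when no such paragraph exists (assumed handling).
    """
    return bisect_left(paragraph_starts, timecode)


def visible_in_presentation(note: dict) -> bool:
    """Editor notes are always hidden; other types follow the public flag."""
    # Assumption: a missing "public" key is treated as false.
    return note["type"] != "editor_note" and bool(note.get("public", False))
```

For paragraphs starting at 0.0, 4.5 and 12.5 seconds, a note at timecode 1.0 maps to index 1 and is rendered before the paragraph that starts at 4.5.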

`annotations`

An object containing text annotations attached to transcript segments. Currently the only annotation type is hyperlinks.

`hyperlinks`

A map from a unique string ID (generated at creation time) to a hyperlink object.

{
  "hyperlinks": {
    "<id>": { ... }
  }
}

Hyperlink object

key	type	description
`url`	str	The destination URL
`name`	str \| null	Display name for the link (null if not set)
`description`	str \| null	Short description shown in the UI (null if not set)
`editorNotes`	str \| null	Private notes visible only in the editor (null if not set)
`segmentIdx`	int	Index of the transcript segment (sentence) this link is anchored to
`charStart`	int	Character offset within the sentence text where the link begins
`charEnd`	int	Character offset within the sentence text where the link ends
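
To show how `segmentIdx`, `charStart` and `charEnd` fit together, the sketch below resolves a hyperlink to the text it covers. It relies on two assumptions that the tables do not state: `segmentIdx` is a 0-based index over all sentences in document order across paragraphs, and `charEnd` is exclusive. The function name `linked_text` is hypothetical.

```python
def linked_text(link: dict, paragraphs: list[dict]) -> str:
    """Return the substring of the anchored sentence that the hyperlink covers."""
    # Assumption: segmentIdx counts sentences across all paragraphs, starting at 0.
    sentences = [
        sentence
        for paragraph in paragraphs
        for sentence in paragraph.get("sentences", [])
    ]
    sentence = sentences[link["segmentIdx"]]
    # Assumption: charEnd is exclusive, so a plain slice selects the linked text.
    return sentence["text"][link["charStart"]:link["charEnd"]]
```

Under these assumptions, a link with `segmentIdx` 0, `charStart` 10 and `charEnd` 17 on the sentence "Hello and welcome." selects "welcome".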

Example

{
  "paragraphs": [
    {
      "speaker": "speaker_0",
      "start": 0.0,
      "end": 4.2,
      "sentences": [
        {
          "start": 0.0,
          "end": 4.2,
          "text": "Hello and welcome.",
          "words": [
            { "start": 0.0, "end": 0.5, "word": "Hello",   "probability": 0.99 },
            { "start": 0.6, "end": 0.9, "word": "and",     "probability": 0.98 },
            { "start": 1.0, "end": 1.6, "word": "welcome", "probability": 0.97 }
          ]
        }
      ]
    }
  ],
  "sectionBreaks": [
    { "beforeSegStart": 0, "name": "Introduction" }
  ],
  "notes": [
    { "timecode": 0.0, "type": "stage_direction", "text": "theme music plays", "public": true },
    { "timecode": 1.0, "type": "editor_note", "text": "check audio quality here", "public": false }
  ],
  "annotations": {
    "hyperlinks": {
      "lnk_1714000000000": {
        "url": "https://example.com",
        "name": "Example",
        "description": null,
        "editorNotes": null,
        "segmentIdx": 0,
        "charStart": 10,
        "charEnd": 17
      }
    }
  }
}