Transcript JSON Specification
Transcript data is stored in a single server-side transcript.json file per project. The file is timestamp-based and is not tied to the specific audio it is rendered against. If the audio is edited without a corresponding transcript update, transcript timestamps will no longer line up with playback. Re-running automatic transcription realigns the transcript but overwrites any manual edits.
Root Keys
| key | type | description |
|---|---|---|
| paragraphs | list | Transcript text, structured as paragraphs → sentences → words |
| sectionBreaks | list | User-defined section markers with timestamps |
| annotations | object | Text annotations (hyperlinks) attached to transcript segments |
| notes | list | Inline notes anchored to timecodes (stage directions, speech labels, editor notes) |
All root keys are optional: a file may contain only paragraphs, only annotations, or any other subset of the keys.
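Because every root key may be absent, a reader of the file has to supply defaults. The following is a minimal sketch of one way to do that in Python. The function name `load_transcript` and the choice of empty containers as defaults are assumptions made for illustration; the specification itself only states that the keys are optional.

```python
import json


def load_transcript(raw: str) -> dict:
    """Parse transcript JSON text and fill in any missing root keys.

    Assumption: an absent key is treated as an empty list/object.
    """
    data = json.loads(raw)
    defaults = {
        "paragraphs": [],
        "sectionBreaks": [],
        "annotations": {},
        "notes": [],
    }
    # Keys present in the file win over the defaults.
    return {**defaults, **data}
```

With this in place, downstream code can index `transcript["notes"]` or `transcript["paragraphs"]` without checking for the key first.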
paragraphs
A list of paragraph objects. Paragraphs are created either manually by the user or automatically based on detected pauses during transcription. Each paragraph contains one or more sentences; each sentence optionally contains a word-level breakdown.
Paragraph
| key | type | description |
|---|---|---|
| speaker | str | The speaker_id associated with this paragraph |
| start | float | Start time of the paragraph in seconds |
| end | float | End time of the paragraph in seconds |
| sentences | list | Ordered list of sentence objects within this paragraph |
Sentence
| key | type | description |
|---|---|---|
| start | float | Start time of the sentence in seconds |
| end | float | End time of the sentence in seconds |
| text | str | Full text of the sentence |
| words | list | Word-level breakdown (may be empty; if so, text is used as the atomic unit) |
Word
| key | type | description |
|---|---|---|
| start | float | Start time of the word in seconds |
| end | float | End time of the word in seconds |
| word | str | The word string |
| probability | float | Confidence score (0–1) for the word's timing/detection |
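The paragraphs → sentences → words nesting, together with the rule that a sentence with an empty words list is treated as a single atomic unit, can be walked as in the sketch below. The helper name `iter_units` is hypothetical, and treating a missing words key the same as an empty list is an assumption.

```python
from typing import Iterator


def iter_units(paragraphs: list) -> Iterator[tuple]:
    """Yield (start, end, text) for the smallest timed unit available."""
    for paragraph in paragraphs:
        for sentence in paragraph["sentences"]:
            # Assumption: a missing "words" key behaves like an empty list.
            words = sentence.get("words") or []
            if words:
                for w in words:
                    yield (w["start"], w["end"], w["word"])
            else:
                # No word-level data: the whole sentence is the atomic unit.
                yield (sentence["start"], sentence["end"], sentence["text"])
```

A caller that highlights text during playback could use these tuples directly, falling back to sentence-level highlighting wherever word timings are unavailable.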
sectionBreaks
A list of section break objects. Section numbers are assigned at runtime (sorted by beforeSegStart) and are not stored in the file.
| key | type | description |
|---|---|---|
| beforeSegStart | float | Start time of the first segment in this section. At runtime, the break is positioned immediately before the nearest sentence start. 0 places the break before the very first paragraph. |
| name | str | User-defined label for the section (may be empty string) |
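Since section numbers are derived at runtime rather than stored, a consumer has to compute them from the sort order. The sketch below shows one way to do so; the function name `number_sections` and the choice to start numbering at 1 are assumptions, as the specification does not state the first section number.

```python
def number_sections(section_breaks: list) -> list:
    """Return (number, break) pairs ordered by beforeSegStart.

    Assumption: numbering starts at 1. The input list is not modified,
    so the computed numbers never end up in the stored file.
    """
    ordered = sorted(section_breaks, key=lambda b: b["beforeSegStart"])
    return list(enumerate(ordered, start=1))
```

Returning pairs instead of writing a number field into each object keeps the in-memory data identical to what is saved.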
notes
A list of inline note objects anchored to timecodes. Notes are rendered at the nearest paragraph boundary at or after their timecode. The list is kept sorted by timecode.
| key | type | description |
|---|---|---|
| timecode | float | Position in seconds; the note renders before the first paragraph starting at or after this time |
| type | str | One of "stage_direction", "speech_label", or "editor_note" |
| text | str | The note text |
| public | bool | false = visible to editors only; true = visible in presentation view. "editor_note" type is always hidden in presentation regardless of this flag. |
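The placement and visibility rules for notes can be sketched as two small helpers. Both function names are hypothetical. The placement helper assumes the paragraphs list is sorted by start time, and it returns `len(paragraphs)` when no paragraph starts at or after the timecode; how such trailing notes are actually rendered is not covered by the specification.

```python
from bisect import bisect_left


def paragraph_index_for_note(paragraphs: list, timecode: float) -> int:
    """Index of the first paragraph whose start is at or after timecode.

    Assumption: paragraphs are sorted by "start". A result equal to
    len(paragraphs) means no such paragraph exists.
    """
    starts = [p["start"] for p in paragraphs]
    return bisect_left(starts, timecode)


def visible_in_presentation(note: dict) -> bool:
    """Editor notes are always hidden; other types follow the public flag."""
    return note["type"] != "editor_note" and bool(note["public"])
```

For example, with paragraphs starting at 0.0, 4.2, and 9.0 seconds, a note at timecode 1.0 would render before the paragraph that starts at 4.2.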
annotations
An object containing text annotations attached to transcript segments. Currently the only annotation type is hyperlinks.
hyperlinks
A map from a unique string ID (generated at creation time) to a hyperlink object.
{
  "hyperlinks": {
    "<id>": { ... }
  }
}
Hyperlink object
| key | type | description |
|---|---|---|
| url | str | The destination URL |
| name | str \| null | Display name for the link (null if not set) |
| description | str \| null | Short description shown in the UI (null if not set) |
| editorNotes | str \| null | Private notes visible only in the editor (null if not set) |
| segmentIdx | int | Index of the transcript segment (sentence) this link is anchored to |
| charStart | int | Character offset within the sentence text where the link begins |
| charEnd | int | Character offset within the sentence text where the link ends |
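To illustrate how segmentIdx, charStart, and charEnd fit together, the sketch below resolves a hyperlink to the text it covers. It rests on two assumptions that the specification does not state explicitly: segmentIdx counts sentences from 0 across all paragraphs in document order, and charEnd is exclusive. The second assumption matches the Example section, where offsets 10 and 17 in "Hello and welcome." select "welcome". The function name `linked_text` is hypothetical.

```python
def linked_text(transcript: dict, link: dict) -> str:
    """Return the substring of the anchored sentence covered by the link.

    Assumptions: segmentIdx indexes sentences flattened across all
    paragraphs (0-based), and charEnd is an exclusive offset.
    """
    sentences = [
        sentence
        for paragraph in transcript.get("paragraphs", [])
        for sentence in paragraph["sentences"]
    ]
    sentence = sentences[link["segmentIdx"]]
    return sentence["text"][link["charStart"]:link["charEnd"]]
```

If the sentence text is edited after a link is created, the stored offsets may no longer match the intended words, so an editor built on this format would likely need to update or revalidate them.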
Example
{
  "paragraphs": [
    {
      "speaker": "speaker_0",
      "start": 0.0,
      "end": 4.2,
      "sentences": [
        {
          "start": 0.0,
          "end": 4.2,
          "text": "Hello and welcome.",
          "words": [
            { "start": 0.0, "end": 0.5, "word": "Hello", "probability": 0.99 },
            { "start": 0.6, "end": 0.9, "word": "and", "probability": 0.98 },
            { "start": 1.0, "end": 1.6, "word": "welcome", "probability": 0.97 }
          ]
        }
      ]
    }
  ],
  "sectionBreaks": [
    { "beforeSegStart": 0, "name": "Introduction" }
  ],
  "notes": [
    { "timecode": 0.0, "type": "stage_direction", "text": "theme music plays", "public": true },
    { "timecode": 1.0, "type": "editor_note", "text": "check audio quality here", "public": false }
  ],
  "annotations": {
    "hyperlinks": {
      "lnk_1714000000000": {
        "url": "https://example.com",
        "name": "Example",
        "description": null,
        "editorNotes": null,
        "segmentIdx": 0,
        "charStart": 10,
        "charEnd": 17
      }
    }
  }
}