# Seedance 2 — Canonical Prompt Schema

> **Reader:** This file is the machine-readable contract for writing Seedance 2 prompts through Higgsfield. Follow it verbatim. Do not invent new field labels.
>
> **Source:** Higgsfield official prompting guide (Apr 2026) + ByteDance ModelArk Seedance 2.0 documentation. Last validated: 2026-05-21.

## 1. Top-level skeleton

```
Total: <duration> / <shot_count> / <aspect_ratio>

Style: <aesthetic>, <lens-feel>, <grain>, <grade>, <palette>.
       <Negative constraints>.

Scene: <One prose paragraph. Time of day, location, set dressing,
        starting mise-en-scène, who is where, what is happening.>

Cast:  @<Name1> = <one-line role description>
       @<Name2> = <one-line role description>
       Environment: @<env_tag> (optional, if using a saved location)

Shot 1 (<duration>s): <Plain-language action and camera move>.
@<Name> [<emotion tag>]: "<dialogue line>"

Shot 2 (<duration>s): <…>

…

SFX: <concrete sound event 1>, <concrete sound event 2>, <ambient tone>.
```

## 2. Field-by-field rules

### `Total:` — header line
- **Required.** Always the first line of the prompt.
- Format: `Total: 15s / 5 shots / 16:9` (or `9:16` for vertical, `1:1` for square).
- Supported durations: **5s**, **10s**, **15s** on web; mobile caps at **10s**.
- Hard rule: **shot count × ~2–3s ≤ duration.** If you violate this, Seedance compresses or drops shots.
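
The header rules above can be checked mechanically before a prompt is submitted. The following is a minimal sketch, not part of Higgsfield or Seedance tooling: `check_total_line` is a hypothetical helper, and treating 2 seconds as the strict per-shot minimum is an assumption taken from the lower end of the "~2-3s" budget.

```python
import re

# Durations listed in this section for the `Total:` line.
SUPPORTED_DURATIONS = {5, 10, 15}
# Assumption: the lower bound of the "~2-3s per shot" budget is the hard floor.
MIN_SECONDS_PER_SHOT = 2

TOTAL_RE = re.compile(r"^Total:\s*(\d+)s\s*/\s*(\d+)\s+shots?\s*/\s*(\d+:\d+)\s*$")


def check_total_line(line: str) -> list[str]:
    """Return a list of problems; an empty list means the header passes."""
    match = TOTAL_RE.match(line.strip())
    if match is None:
        return ["header does not match 'Total: <N>s / <N> shots / <W:H>'"]
    duration, shots = int(match[1]), int(match[2])
    problems = []
    if duration not in SUPPORTED_DURATIONS:
        problems.append(f"unsupported duration {duration}s")
    if shots < 1:
        problems.append("shot count must be at least 1")
    elif shots * MIN_SECONDS_PER_SHOT > duration:
        needed = shots * MIN_SECONDS_PER_SHOT
        problems.append(f"{shots} shots need at least {needed}s, but total is {duration}s")
    return problems


print(check_total_line("Total: 5s / 4 shots / 9:16"))
```

A stricter budget (3 seconds per shot) can be enforced by changing `MIN_SECONDS_PER_SHOT`; the aspect ratio is parsed but deliberately not validated here.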

### `Style:` — aesthetic block
- One paragraph, comma-separated. No bullet points.
- **Working vocabulary** (model honors these):
  - Camera aesthetic buckets: `ARRI ALEXA aesthetic`, `RED cinema look`, `Super 16mm`, `Super 35mm`, `IMAX-style`, `anamorphic`, `widescreen`.
  - Lens feel: `wide-angle`, `normal lens`, `85mm telephoto feel`, `macro`, `fisheye`, `shallow depth of field`, `deep focus`.
  - Grain / texture: `35mm film grain`, `analog grain`, `digital sharpness`, `clean digital look`.
  - Grade / color: `warm cinematic grade`, `cool teal-orange grade`, `bleach bypass`, `desaturated`, `high-contrast`, `noir grade`.
  - Palette: name 2–3 colors max — `tobacco amber, crisp blacks` / `sodium yellow, deep blue shadows`.
- **Wasted tokens** (model ignores):
  - Camera body model names: `Arriflex 435`, `Sony Venice 2`, `Phantom Flex`.
  - Lens model names: `Cooke S4`, `Zeiss Master Primes`.
  - Aperture values (T-stops and f-stops): `T2.8`, `f/1.8`.
  - Literary metaphors: `"each word lands like a loaded round"`.
- **Negative constraints in this block** (always include if relevant):
  - `No subtitles, no text overlay, no captions.`
  - `No on-screen text.`
  - `Dialogue in <language>.`
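
A `Style:` block can be scanned for the wasted tokens listed above before it is sent. This sketch is illustrative only: `find_wasted_tokens` is a hypothetical helper, and its patterns cover just the examples named in this section, not everything the model may ignore.

```python
import re

# Patterns built from the "Wasted tokens" examples in this section.
# They are illustrative, not an exhaustive catalogue.
WASTED_PATTERNS = {
    "camera body name": re.compile(r"\b(Arriflex 435|Sony Venice 2|Phantom Flex)\b", re.I),
    "lens model name": re.compile(r"\b(Cooke S4|Zeiss Master Primes)\b", re.I),
    "aperture value": re.compile(r"\b(?:T|f/)\d+(?:\.\d+)?(?!\w)"),
}


def find_wasted_tokens(style: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs for tokens the section calls wasted."""
    hits = []
    for label, pattern in WASTED_PATTERNS.items():
        for match in pattern.finditer(style):
            hits.append((label, match.group(0)))
    return hits


print(find_wasted_tokens("ARRI ALEXA aesthetic, Cooke S4, T2.8, 35mm film grain"))
```

Aesthetic buckets such as `ARRI ALEXA aesthetic` or `35mm film grain` pass untouched because only specific model names and aperture values are matched.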

### `Scene:` — one prose paragraph
- 2–4 sentences. No field sub-labels.
- Cover: time of day, location, weather, set dressing, starting position of cast.
- This paragraph replaces the fields that older, non-canonical schemas called `Starting Composition`, `Static Description`, and `Establishing Description`.
- Reference environment once via `@<env_tag>`; do not repeat the tag.

### `Cast:` — declaration block
- One line per character.
- Format: `@<short_name> = <role + visual anchor>`
- Use **short readable names** (`@Colt`, `@Dutch`) rather than raw UUIDs in the prompt body. Higgsfield resolves these to character chips internally.
- For the environment, optionally: `Environment: @<env_tag>`.

### `Shot N (<duration>s):` — beat blocks
- Numbered sequentially: `Shot 1:`, `Shot 2:`, `Shot 3:`, …
- Per-shot duration in parentheses is optional but useful for budgeting.
- Each Shot describes one camera setup. **Do not write `Hard Cut to…` inside a Shot.** Each new `Shot N:` is itself the cut.
- Pattern: `<size + angle + movement> on <subject>. <Action>.`
  - Example: `Shot 2 (2s): Eye-level close-up on @Dutch at table left. Static. Jaw set, single slow nod, no words.`

### Dialogue inside a Shot
- Format: `@<Name> [<emotion tag>]: "<line>"`
- Emotion tag examples: `[voice low, deliberate]`, `[shouting, panicked]`, `[whispered, almost inaudible]`, `[laughing dryly]`.
- **Hard rule: in a Shot with dialogue, keep the character physically still.** No head turns, no walking, no reaching. Speech combined with significant movement conflicts with the lip-sync engine.
- Movement beats go in a **separate Shot** before or after the dialogue Shot.
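
The stillness rule lends itself to a rough keyword check. The sketch below is a heuristic under stated assumptions: `dialogue_motion_conflicts` is a hypothetical helper, the movement verb list is invented for illustration, and a keyword match cannot tell a character move from every possible phrasing.

```python
import re

# Matches the dialogue format from this section: @Name [emotion tag]: "line"
DIALOGUE_RE = re.compile(r'^@(\w+)\s*\[([^\]]+)\]:\s*"([^"]*)"\s*$')
# Hypothetical keyword list; extend it to match your own writing habits.
MOVEMENT_RE = re.compile(r"\b(turns?|walks?|reach(?:es)?|runs?|stands up|sits down)\b", re.I)


def dialogue_motion_conflicts(shot_block: str) -> list[str]:
    """Return movement words found in the description of a Shot that has dialogue."""
    lines = [ln.strip() for ln in shot_block.strip().splitlines()]
    if not any(DIALOGUE_RE.match(ln) for ln in lines):
        return []  # no dialogue, so character movement is allowed
    description = " ".join(ln for ln in lines if not DIALOGUE_RE.match(ln))
    return [m.group(0) for m in MOVEMENT_RE.finditer(description)]


shot = (
    "Shot 3 (3s): Medium shot on @Colt. He turns and walks to the door.\n"
    '@Colt [voice low, deliberate]: "We ride at dawn."'
)
print(dialogue_motion_conflicts(shot))
```

Camera moves such as `dolly in` are intentionally absent from the keyword list, since the rule concerns the speaking character, not the camera.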

### `SFX:` — sound events
- Final line of the prompt.
- Comma-separated list of **concrete sound events**, not stylistic descriptions.
- ✓ `glass set on wood, low ambient saloon room tone, distant boot scuff, two-beat silence after final line`
- ✗ `short declarative sentences, weighty pauses` (that's a directing note, not an SFX cue)

## 3. Shot vocabulary — quick reference

### Shot sizes (model honors)
- `extreme wide shot` (EWS) — landscape with tiny figure
- `wide shot` (WS) — full figure with environment
- `medium long shot` (MLS) — full body
- `medium shot` (MS) — waist up
- `medium close-up` (MCU) — chest up
- `close-up` (CU) — head and shoulders
- `extreme close-up` (ECU) — eyes / mouth / hand only

### Camera angles
- `eye level`, `low angle`, `high angle`, `Dutch angle` (tilted), `bird's eye / top-down`, `worm's eye / from below`, `over the shoulder` (OTS), `point of view` (POV)

### Camera movement
- `push in` / `dolly in` — toward subject
- `pull out` / `dolly out` — away from subject
- `pan left` / `pan right` — horizontal rotation
- `tilt up` / `tilt down` — vertical rotation
- `tracking shot` / `lateral track` — sideways follow
- `orbit` / `arc` — circling subject
- `crane up` / `crane down` — vertical rise/fall
- `handheld` — organic shake
- `Steadicam` — smooth float
- `whip pan` — fast rotational blur
- `static` / `locked off` — no movement
- `slow-motion ramp` — speed change

### Lighting (model honors)
- Direction: `backlight`, `rim light from <direction>`, `key light from <direction>`, `top light`, `under light`, `motivated by <source>`
- Quality: `hard light`, `soft light`, `diffused`, `dappled`
- Fill: `minimal fill`, `bounce fill`, `negative fill on shadow side`
- Practicals: `practical lamps in frame`, `neon signs`, `firelight flicker`, `screen glow`
- Mood: `chiaroscuro`, `low-key`, `high-key`

### Audio / dialogue rules
- One Shot = one dialogue beat (max ~10 words).
- Use `[emotion tag]` before the quote.
- For voiceover: `VO: @<Name>: "<line>"` (in SFX block, not in a Shot).
- For overlapping dialogue: avoid it in a single generation; generate separate clips and combine them in the edit.
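
The one-beat-per-Shot guideline can be approximated by counting dialogue lines and words. This is a sketch with assumptions: `dialogue_budget_problems` is a hypothetical helper, and the "max ~10 words" guidance is treated as a hard cap of 10 purely for the sake of the check.

```python
import re

# One dialogue line per match: @Name [emotion tag]: "line"
QUOTE_RE = re.compile(r'^@\w+\s*\[[^\]]+\]:\s*"([^"]*)"\s*$', re.M)
MAX_WORDS = 10  # assumption: "max ~10 words" treated as a hard cap


def dialogue_budget_problems(shot_block: str) -> list[str]:
    """Flag Shots with more than one dialogue line or an over-long line."""
    quotes = QUOTE_RE.findall(shot_block)
    problems = []
    if len(quotes) > 1:
        problems.append(f"{len(quotes)} dialogue lines in one Shot; expected at most 1")
    for quote in quotes:
        word_count = len(quote.split())
        if word_count > MAX_WORDS:
            problems.append(f"dialogue line has {word_count} words; limit is {MAX_WORDS}")
    return problems


shot = 'Shot 1 (5s): Medium close-up on @Mara.\n@Mara [voice barely above a whisper]: "He never said when."'
print(dialogue_budget_problems(shot))
```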

## 4. Validated do-not-do list

1. Do not use field labels `Static Description:`, `Dynamic Description:`, `Starting Composition:`. These are from community Claude skills, not Seedance grammar.
2. Do not put `Hard Cut to…` or `Whip-pan to…` inline inside a Shot.
3. Do not pair a head turn, a look, or other movement with a dialogue line in the same Shot.
4. Do not repeat camera specs (size, focal feel, angle) across consecutive Shots if unchanged. The model assumes consistency.
5. Do not write camera body or lens model names. Use aesthetic buckets.
6. Do not write literary metaphors. Concrete physical descriptions only.
7. Do not exceed 1 shot per 2–3 seconds of duration.
8. Do not use raw UUIDs in the prompt body when short names work.
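
Some of the rules above are purely textual and can be linted. The sketch below covers rules 1, 2, and 8 only; `lint_prompt` is a hypothetical helper, and the UUID pattern assumes the standard 8-4-4-4-12 hexadecimal layout, which this document does not specify.

```python
import re

BANNED_LABELS = ("Static Description:", "Dynamic Description:", "Starting Composition:")
INLINE_CUT_RE = re.compile(r"\b(Hard Cut to|Whip-pan to)\b", re.I)
# Assumption: UUIDs follow the common 8-4-4-4-12 hexadecimal layout.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I
)


def lint_prompt(prompt: str) -> list[str]:
    """Return findings for do-not-do rules 1, 2, and 8."""
    findings = []
    lowered = prompt.lower()
    for label in BANNED_LABELS:
        if label.lower() in lowered:
            findings.append(f"rule 1: banned field label '{label}'")
    for match in INLINE_CUT_RE.finditer(prompt):
        findings.append(f"rule 2: inline transition '{match.group(0)}'")
    if UUID_RE.search(prompt):
        findings.append("rule 8: raw UUID in prompt body")
    return findings


print(lint_prompt("Shot 1 (3s): Wide shot. Hard Cut to @Colt."))
```

The remaining rules (metaphors, repeated camera specs) depend on meaning rather than surface text and are left to human review.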

## 5. Minimal working example

```
Total: 5s / 1 shot / 16:9

Style: ARRI ALEXA aesthetic, 50mm normal lens feel, 35mm film grain,
       warm grade, deep shadows. No subtitles, no text overlay.

Scene: Late afternoon kitchen. @Mara stands at the sink, looking out the
       window. Soft daylight through curtains. Steam rising from a kettle.

Cast:  @Mara = young woman in linen shirt, hair tied back

Shot 1 (5s): Medium close-up on @Mara from behind, slow dolly in to over
the shoulder, ending on her reflection in the window.
@Mara [voice barely above a whisper]: "He never said when."

SFX: kettle whistle softening, faucet drip, distant bird, faint floor creak.
```

This is the canonical shape. Everything more complex is a variation on it.
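
For pipelines that assemble prompts from structured data, the canonical shape can be rendered from a small data model. This is a sketch, not an official client: the `Shot` and `Prompt` classes and their field names are hypothetical, and the renderer keeps each block on a single logical line, without the wrapped-line alignment used in the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shot:
    seconds: int
    description: str
    dialogue: Optional[str] = None  # full line, e.g. @Name [tag]: "..."


@dataclass
class Prompt:
    duration: int
    aspect_ratio: str
    style: str
    scene: str
    cast: dict[str, str]  # short name -> role description
    shots: list[Shot]
    sfx: list[str]

    def render(self) -> str:
        """Render the blocks in canonical order, separated by blank lines."""
        count = len(self.shots)
        noun = "shot" if count == 1 else "shots"
        parts = [f"Total: {self.duration}s / {count} {noun} / {self.aspect_ratio}"]
        parts.append(f"Style: {self.style}")
        parts.append(f"Scene: {self.scene}")
        cast_lines = [f"@{name} = {role}" for name, role in self.cast.items()]
        parts.append("Cast:  " + "\n       ".join(cast_lines))
        for index, shot in enumerate(self.shots, start=1):
            block = f"Shot {index} ({shot.seconds}s): {shot.description}"
            if shot.dialogue:
                block += "\n" + shot.dialogue
            parts.append(block)
        parts.append("SFX: " + ", ".join(self.sfx) + ".")
        return "\n\n".join(parts)


prompt = Prompt(
    duration=5,
    aspect_ratio="16:9",
    style="ARRI ALEXA aesthetic, warm grade. No subtitles, no text overlay.",
    scene="Late afternoon kitchen. @Mara stands at the sink.",
    cast={"Mara": "young woman in linen shirt, hair tied back"},
    shots=[Shot(5, "Medium close-up on @Mara from behind, slow dolly in.",
                '@Mara [voice barely above a whisper]: "He never said when."')],
    sfx=["kettle whistle softening", "faucet drip"],
)
print(prompt.render())
```

Deriving the shot count from `len(self.shots)` keeps the `Total:` header and the Shot blocks from drifting apart.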

## 6. Reference & anchor system

Seedance 2 through Higgsfield supports persistent references via the **Elements** system. Three reference categories matter inside a prompt.

### Character references — three tiers

- **Soul ID** (trained, persistent): 15–20 photos, ~3–5 min training, ~$3/training. Referenced via `character_id` (UUID) through MCP, or `@<short_name>` in Cinema Studio. Highest fidelity; identity locked across style, lighting, angle.
- **Soul 2.0 text tag** (no training): ~75-word inline description using Soul-style face/body/expression tags. Lighter, faster, less consistent. Use for secondary cast.
- **One-off** (no anchor): plain text description in prompt. New face each generation. Use for extras and exploration.

Cross-surface: **one Soul ID works across Nano Banana, Soul image gen, and Seedance 2 video** within Higgsfield. Train once, use everywhere.

### Environment / object references — Elements

- Locations and props are also Elements — created from reference images, referenced as `@<env_tag>` or `@<prop_name>`.
- **Persistence:** geometry + key features stick. Lighting is a project-level dial (Relight), not baked into the Element.
- **Inline prop tethering** without creating a saved Element: `@<Char> wearing <description> from @<reference_image_N>`.

### Image-to-video starting frame

- **Min 768px** shortest side, recommended ≥1024×1024.
- **Max 10 MB.** Formats: JPG, PNG, WebP, HEIC.
- **Aspect ratios** auto-detected: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16.
- **Modes:** single-frame · first-and-last · Omni (multi-image).
- **Rule:** do not redescribe content visible in the frame. Describe only motion, camera moves, lighting shift, and mood.
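
The starting-frame limits above can be checked before upload. The sketch below assumes the caller already knows the image dimensions and file size; `check_start_frame` is a hypothetical helper, "10 MB" is read as 10 MiB, and `.jpeg` is assumed to be accepted alongside `.jpg`.

```python
# Formats listed in this section; ".jpeg" is an assumed alias for JPG.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "webp", "heic"}
MIN_SHORT_SIDE = 768
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB


def check_start_frame(width: int, height: int, size_bytes: int, filename: str) -> list[str]:
    """Return a list of problems with a candidate starting frame."""
    problems = []
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format '{ext or 'none'}'")
    if min(width, height) < MIN_SHORT_SIDE:
        problems.append(f"shortest side {min(width, height)}px is below {MIN_SHORT_SIDE}px")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


print(check_start_frame(1024, 600, 2_000_000, "frame.jpg"))
```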

### Color / mood references

- **Soul HEX:** upload ≤20 reference images → palette extraction. Works in Soul image gen.
- **Moodboard:** ≤80 images (10–30 sweet spot). Defines style/tone signature. **Do not include faces** — conflicts with Soul ID.
- **No LUT / true style-transfer reference** exists as a feature. Workaround: HEX + moodboard + textual description in `Style:`.

### Combining references in one prompt

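- **Omni cap:** 12 assets total, within per-type limits of 9 images, 3 videos, and 3 audio clips (the per-type limits sum to 15, so they cannot all be maxed out at once). Sweet spot: 3–5 images + 1–2 videos.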
- **Placement** in canonical structure:
  - `Style:` — Moodboard / HEX color palette refs
  - `Scene:` — `@<env_tag>` for location
  - `Cast:` — `@<character_name>`, prop tethers
  - `Shot N:` — starting frame is Shot 1's first keyframe
  - `SFX:` — audio references
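
The Omni limits can be enforced with a simple count check. This sketch assumes the cap means at most 12 assets overall, with per-type ceilings of 9 images, 3 videos, and 3 audio clips; `check_omni_assets` is a hypothetical helper, not a Higgsfield API.

```python
# Assumed reading of the Omni cap: per-type ceilings plus an overall total.
PER_TYPE_LIMITS = {"images": 9, "videos": 3, "audio": 3}
TOTAL_CAP = 12


def check_omni_assets(images: int, videos: int, audio: int) -> list[str]:
    """Return a list of cap violations for a planned set of Omni references."""
    counts = {"images": images, "videos": videos, "audio": audio}
    problems = []
    for kind, count in counts.items():
        if count > PER_TYPE_LIMITS[kind]:
            problems.append(f"{count} {kind} exceeds the limit of {PER_TYPE_LIMITS[kind]}")
    total = sum(counts.values())
    if total > TOTAL_CAP:
        problems.append(f"{total} assets exceeds the total cap of {TOTAL_CAP}")
    return problems


print(check_omni_assets(9, 3, 3))
```

Under this reading, a request can satisfy every per-type ceiling and still fail the overall cap, which is why the two checks are kept separate.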

### Known failure modes

- Soul ID drift in highly stylized environments — overpowering style tokens override identity.
- Identity weakens at extreme angles if training set lacked profiles. Include profile + 3/4 angle in training photos.
- Hand/pose artifacts are inherited from the base model — no reference fixes these.
- Project-level lighting is not Element-bound. Specify lighting per shot or via Relight; don't expect the Location element to "remember" golden hour.
