April 2, 2026

How to Upload Local Images and Videos to JSONClip With cURL and Render in One Request

A complete multipart JSONClip tutorial for rendering local images, videos, and audio with cURL in one request, including basename rules, mixed-media examples, troubleshooting, and when to switch back to hosted URLs.

Long-read tutorial

This guide is for the moment when your images or videos are not hosted yet. Maybe the files live on your laptop. Maybe they come from a content folder your team just exported. Maybe a workflow step downloaded them locally and now needs to render immediately. JSONClip supports that with one multipart request: upload the files in `files[]`, send the project JSON inside `render_config`, and reference each file by basename.

A lot of teams overcomplicate this step because they assume “API” automatically means “everything must already be on a CDN.” That is not true. The hosted JSON flow is simpler when URLs already exist, but the multipart flow is the practical route when the assets only exist on your machine or on a worker at request time.

Tutorial map

These guides are meant to work together. Start with the article that matches your current workflow, then use the others when you move from manual setup into repeatable automation.

Why multipart upload exists

The hosted JSON mode cannot read arbitrary workstation paths because the renderer runs remotely. If you point the hosted API at `/Users/me/Desktop/clip.mp4`, the server has no access to that file. Multipart upload solves the problem honestly: you attach the files to the request, JSONClip stores them for the render job, and your JSON references them by filename.

That is more than convenience. It creates a reproducible render package. The request body and the uploaded files belong to the same operation, which makes debugging easier and keeps the caller in control of exactly what was rendered.

The render model in one minute

JSONClip works best when you think in layers, not in vague editor gestures. A render request has a format, a scene list, optional overlays, optional audio, optional effects, and optional captions. That separation matters because it keeps the workflow legible whether you are clicking in the editor, sending cURL, or calling the API from an automation tool.

Layer | What it controls | Why it matters
Format | Width, height, FPS, background color | If the format is unclear, everything downstream gets harder, especially captions and text fit.
Scenes | The base images or videos | Treat scenes as the backbone. If scene order is wrong, every overlay, effect, and audio cue inherits the mistake.
Overlays | Text, logos, sticker-like layers | Overlays carry the messaging. They should be positioned with intent, not added as a last-minute afterthought.
Audio | Voiceover, music, sound cues | Good video feels finished because the audio is managed carefully, not because the visuals are fancy.
Effects and transitions | Motion treatment and continuity | Effects are there to reinforce pacing, not to rescue weak structure.
Captions | Subtitle-style bottom text or inline cues | Captions should stay readable on mobile and should match the spoken pacing.

The one rule you must remember in multipart mode

Basename matching is what matters. If the file you upload is `intro.jpg`, then the `src` in `render_config` should be `intro.jpg`. It should not be `./intro.jpg`, `/tmp/intro.jpg`, or an unrelated alias. Keep the names exact and the flow stays reliable.
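A quick way to internalize the rule: whatever `basename` prints for the local path is the exact string that belongs in `src`. A minimal sketch, with illustrative paths:

```shell
# Derive the exact "src" value for render_config from a local path.
# basename strips the directory, which is precisely the multipart rule:
# upload ./campaign-assets/intro.jpg, reference "intro.jpg".
src_name() {
  basename "$1"
}

src_name "./campaign-assets/intro.jpg"   # prints: intro.jpg
src_name "/tmp/exports/feature.mp4"      # prints: feature.mp4
```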

Local upload + render in one request
DIR=./campaign-assets
curl -sS -X POST 'https://api.jsonclip.com/render?sync=1' \
  -H "X-API-Key: YOUR_API_KEY" \
  --form-string "render_config=$(cat <<'JSON'
{
  "env": "prod",
  "movie": {
    "format": { "width": 720, "height": 1280, "fps": 30, "background_color": "#000000" },
    "scenes": [
      { "type": "image", "src": "intro.jpg", "duration_ms": 1200, "transition_out": { "type": "white_strobe", "duration_ms": 240 } },
      { "type": "video", "src": "feature.mp4", "duration_ms": 2200, "transition_out": { "type": "blur", "duration_ms": 320 } },
      { "type": "image", "src": "cta.jpg", "duration_ms": 1600 }
    ],
    "overlays": [
      {
        "type": "text",
        "text": "LOCAL FILE DEMO",
        "from_ms": 100,
        "to_ms": 1800,
        "position_px": { "x": 360, "y": 160 },
        "width_px": 580,
        "style": { "font": "Avenir Next", "size_px": 60, "bold": true, "case": "upper", "align": "center", "color": "#ffffff" },
        "stroke": { "color": "#000000", "width_px": 5 }
      }
    ],
    "audio": [
      { "src": "voiceover.mp3", "role": "voiceover", "from_ms": 0, "to_ms": 5000 },
      { "src": "music.mp3", "role": "music", "from_ms": 0, "to_ms": 5000, "volume_db": -10, "duck_under_voice": true }
    ],
    "effects": [
      { "type": "zoom_in", "from_ms": 0, "to_ms": 1800, "settings": { "strength": 1.08 } },
      { "type": "fade_out", "from_ms": 4300, "to_ms": 5000 }
    ]
  }
}
JSON
)" \
  -F "files[]=@$DIR/intro.jpg" \
  -F "files[]=@$DIR/feature.mp4" \
  -F "files[]=@$DIR/cta.jpg" \
  -F "files[]=@$DIR/voiceover.mp3" \
  -F "files[]=@$DIR/music.mp3"

A minimal mental model for multipart mode

Piece | What you send | Why it exists
`files[]` | The local assets themselves | These are the bytes the renderer can actually use.
`render_config` | The JSONClip movie definition | This describes how those uploaded files should be assembled into a video.
Basename references | Names like `intro.jpg` or `feature.mp4` | These connect the uploaded files to the scene, overlay, or audio entries in the config.
Sync result | The returned `movie_url` and metadata | This gives you a direct success path for the first iteration.

A realistic example that mixes images, video, voiceover, and music

Multipart request with mixed local media
DIR=./local-demo
curl -sS -X POST 'https://api.jsonclip.com/render?sync=1' \
  -H "X-API-Key: YOUR_API_KEY" \
  --form-string "render_config=$(cat <<'JSON'
{
  "env": "prod",
  "movie": {
    "format": { "width": 1080, "height": 1920, "fps": 30, "background_color": "#000000" },
    "scenes": [
      { "type": "image", "src": "opener.jpg", "duration_ms": 1600, "transition_out": { "type": "white_strobe", "duration_ms": 240 } },
      { "type": "video", "src": "product-demo.mp4", "duration_ms": 2600, "transition_out": { "type": "blur", "duration_ms": 320 } },
      { "type": "image", "src": "cta.jpg", "duration_ms": 1900 }
    ],
    "overlays": [
      {
        "type": "text",
        "text": "Everything here came from local files",
        "from_ms": 120,
        "to_ms": 2000,
        "position_px": { "x": 540, "y": 210 },
        "width_px": 860,
        "style": { "font": "Avenir Next", "size_px": 80, "bold": true, "align": "center", "color": "#ffffff" },
        "stroke": { "color": "#000000", "width_px": 5 }
      }
    ],
    "audio": [
      { "src": "voice.mp3", "role": "voiceover", "from_ms": 0, "to_ms": 6100 },
      { "src": "music.mp3", "role": "music", "from_ms": 0, "to_ms": 6100, "volume_db": -11, "duck_under_voice": true }
    ],
    "captions": {
      "style": "bold_center",
      "cues": [
        { "from_ms": 0, "to_ms": 1600, "text": "Local files do not block the API workflow." },
        { "from_ms": 1600, "to_ms": 3900, "text": "Upload them once, then reference them by name." },
        { "from_ms": 3900, "to_ms": 6100, "text": "That keeps the request explicit and reproducible." }
      ]
    }
  }
}
JSON
)" \
  -F "files[]=@$DIR/opener.jpg" \
  -F "files[]=@$DIR/product-demo.mp4" \
  -F "files[]=@$DIR/cta.jpg" \
  -F "files[]=@$DIR/voice.mp3" \
  -F "files[]=@$DIR/music.mp3"

How to keep multipart requests easy to read

The biggest risk with multipart mode is not the upload itself. The risk is turning the request into an unreadable wall of shell syntax. The fix is simple: keep your `render_config` formatted clearly, keep filenames plain, and limit the first version of the workflow to the assets you actually need.

In practice, that means naming files on purpose before you upload them. `opener.jpg`, `demo.mp4`, `cta.jpg`, `voice.mp3`, and `music.mp3` are better names than `IMG_9472.JPG` and `final-final-v8.mp4`. The renderer can technically use the messy names, but your future debugging self will pay for the sloppiness.
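One way to keep the shell readable is to move the JSON into its own file. curl already supports this: a form value that starts with `<` is read from a file's content, while `@` would attach the file itself as an upload. A minimal sketch, assuming a `render.json` saved next to the command:

```shell
# Keep the movie definition in its own file instead of an inline heredoc.
cat > render.json <<'JSON'
{
  "env": "prod",
  "movie": {
    "format": { "width": 720, "height": 1280, "fps": 30, "background_color": "#000000" },
    "scenes": [ { "type": "image", "src": "intro.jpg", "duration_ms": 1200 } ]
  }
}
JSON

# In the curl command, replace the heredoc with a file read:
#   -F "render_config=<render.json"
# "<" sends the file's CONTENT as the field value; "@" would attach
# the file itself as an upload, which is not what render_config wants.
python3 -m json.tool render.json > /dev/null && echo "render.json is valid JSON"
```

Checking the file with a JSON parser before sending it catches quoting mistakes locally instead of as API errors.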

When this flow is better than pre-uploading to object storage

Use multipart when you need a fast one-shot request from a local source. That includes developer testing, operator-driven campaign builds from a folder, or automation workers that assemble files locally before rendering.

Do not overuse multipart if the same assets will be reused often. In that case it can be smarter to persist them into stable storage first and then switch back to the hosted JSON flow. The hosted flow is easier to read, easier to reuse, and easier to scale when the URLs already exist.

Common multipart mistakes that waste time

  • Uploading `intro.jpg` but referencing `intro.PNG` in the JSON.
  • Using a directory path in `src` instead of the actual basename.
  • Mixing JSON-body examples and multipart examples in the same request shape.
  • Uploading large local files first, then realizing the scene timing was never tested on a shorter version.
  • Treating multipart mode as a permanent architecture when the real need is durable media storage.
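Most of these mistakes can be caught before the request is ever sent. The sketch below creates illustrative fixtures, then checks that every `src` in the config resolves to a file in the asset directory by exact basename. The plain `grep`/`sed` extraction is an assumption that filenames contain no spaces:

```shell
#!/bin/sh
# Preflight sketch: confirm every "src" basename in the config has a matching
# file in the asset directory before uploading. The fixtures created here are
# illustrative; a real run would check your actual render.json and assets.
DIR=./campaign-assets
mkdir -p "$DIR"
touch "$DIR/intro.jpg" "$DIR/feature.mp4"
cat > render.json <<'JSON'
{ "movie": { "scenes": [
  { "type": "image", "src": "intro.jpg" },
  { "type": "video", "src": "feature.mp4" }
] } }
JSON

missing=0
for name in $(grep -o '"src": *"[^"]*"' render.json | sed 's/.*"\([^"]*\)"$/\1/'); do
  [ -f "$DIR/$name" ] || { echo "missing asset: $name"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all src references resolve"
```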

What to check when the render fails

Symptom | Likely cause | Fix
The API says an asset is missing | The basename in `src` does not match the uploaded filename | Make the names identical and keep the extension exact.
The request is accepted but media is wrong | You uploaded the wrong local files under names the JSON references | Verify the directory contents and the form fields before rerunning.
The command becomes impossible to maintain | The project is outgrowing ad hoc shell usage | Move the JSON into a file or graduate to an upstream storage step plus the hosted API mode.
The render works once but not as a reusable workflow | Local file handling is too tied to one machine | Persist the assets earlier and reserve multipart for the cases that truly need it.

Troubleshooting

Most first attempts fail for ordinary reasons, not exotic ones. The fix is usually to simplify the request, verify the media sources, and add complexity back in once the minimal version works.

What you see | What it usually means | What to do
The API returns an error before rendering starts | Your JSON shape or media references are wrong | Validate the body, confirm your header is `X-API-Key`, and make sure every `src` is either a downloadable URL or a basename uploaded in multipart mode.
The final video renders but the pacing feels wrong | Scene durations, effect timing, or audio trim are off | Shorten the first version of the workflow. Get a clean five-second or eight-second result before you scale to a longer reel.
The video looks fine in one environment and wrong in another | Preview parity or unsupported media format issue | Stick to stable formats and verify with the final render, not only with a browser preview.
The output is technically correct but hard to read | Typography, caption size, or spacing is too aggressive | Reduce text density. Good automation usually starts with simpler copy than teams expect.
The render succeeds but the wrong clip appears | A basename collision or manual shell mistake replaced the expected file | Use clearer filenames and a dedicated working directory per render.
The request is huge and slow | You are shipping more bytes than the first iteration needs | Cut the first test down to the minimal assets that prove the path.
The team keeps rerendering the same assets locally | The workflow really wants durable upstream storage now | Promote the assets to stable URLs and switch to the hosted JSON guide.

How this connects to automation workflows

Multipart mode also matters in automation, but it is not always the first choice. In n8n, Make.com, and Zapier, the cleanest pattern is still hosted URLs when available. Multipart is the right advanced move when the automation platform has binary files in-flight and you need to render before those assets are stored elsewhere.

That means this guide is especially useful for hybrid teams. Designers or operators can work from a local folder today, while engineering later decides whether those assets should stay local, be uploaded earlier in the pipeline, or be generated upstream by another service.

FAQ

Can I upload images and videos in the same multipart render request? Yes. The same basename rule applies to both.

Do I still get a normal `movie_url` response? Yes. With `?sync=1`, the response has the same shape as in the hosted JSON flow, including `movie_url`.

Should I use local upload mode forever? Only if the local-first pattern is truly part of your workflow. Otherwise stable hosted URLs usually make maintenance easier.
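The `movie_url` from a sync response can be pulled out in plain shell. The response body below is a canned illustrative example; in a real run you would capture it with `RESPONSE=$(curl ...)` from the request shown earlier:

```shell
# Sketch: extract movie_url from a sync response in plain shell.
# The response body here is a hardcoded example; a real run would capture it:
#   RESPONSE=$(curl -sS -X POST 'https://api.jsonclip.com/render?sync=1' ...)
RESPONSE='{"movie_url":"https://renderer.jsonclip.com/jsonclip/movies/example.mp4","duration_ms":5000}'
MOVIE_URL=$(printf '%s' "$RESPONSE" | sed 's/.*"movie_url": *"\([^"]*\)".*/\1/')
echo "$MOVIE_URL"
```

If `jq` is available, `jq -r .movie_url` is the more robust choice; the `sed` version only assumes a flat, well-formed response.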

How to reason about a multipart local upload payload field by field

The fastest way to debug a multipart local upload request is to stop thinking about it as one giant blob. Treat it as a set of layers with separate responsibilities. Once you do that, most failures become ordinary: the format was wrong, a scene URL was bad, the overlay text was too dense, the audio timing was sloppy, or the effect window did not match the cut.

Field group | What to verify first | What teams usually overcomplicate
Format | Width, height, FPS, and channel fit | Over-optimizing FPS before the composition works
Scenes | Each scene source resolves correctly in multipart upload mode | Adding too many scenes before the first request proves the path
Overlays | Text width, contrast, and timing | Decorative styling before the copy is clear
Audio | Track role and trim window | Layering multiple tracks before one clean track works
Effects | Whether the effect has a job | Using motion as a substitute for structure
Captions | Cue timing and line readability | Treating captions as mandatory even when the video does not need them

How to version a multipart local upload render contract

Once a multipart local upload request starts producing useful videos, give the template a name and a version. That can be as simple as `starter_vertical_v1`, `product_demo_v2`, or `quote_card_landscape_v1`. The naming convention matters because it lets your team discuss the template as a stable object instead of as a vague memory of the JSON.

This also makes change control less emotional. If a new layout is better, call it `v2` and keep `v1` available until the new one proves itself. Teams that overwrite a working request every time someone has a new idea create their own instability.

Useful metadata wrapper around a multipart local upload payload
{
  "template_key": "starter_vertical_v1",
  "campaign_id": "campaign_2048",
  "channel": "reels",
  "request": {
    "...": "JSONClip movie payload here"
  }
}

How to adapt one multipart local upload tutorial into several channels

A strong template family usually changes by channel more than by brand idea. The hook, visual proof, and CTA can stay conceptually similar while format, text density, and end-card pacing shift. That means you do not need a brand-new request for every destination. You need controlled variants.

Channel | Typical format | Typical edit change
Short-form vertical | 720x1280 or 1080x1920 | Keep the opener fast and the text large
Landscape explainer | 1280x720 or 1920x1080 | Give the layout more breathing room and wider text blocks
Square promo | 1080x1080 | Center the composition and reduce edge-hugging text
Story or ad variant | Vertical with clear CTA zone | Protect the closing frame for call-to-action legibility

How to decide when to leave multipart local upload and use automation tooling

The plain multipart local upload workflow is the correct long-term solution when the caller already has the data and can assemble the request predictably. Move into automation only when a trigger, branch, schedule, or downstream business system genuinely needs orchestration.

That is why these guides connect. Start with multipart local upload. If the next problem is workflow coordination, step into n8n, Make.com, or Zapier based on where the rest of the business process lives.

How to review a multipart local upload video before you call it done

The easiest mistake in a multipart local upload workflow is to stop as soon as the render technically succeeds. A successful render is not the same thing as a useful video. Before you ship, review the video with boring discipline: can a person understand the opener instantly, does each scene stay on screen long enough to make sense, does the audio enter and exit cleanly, and does the close actually tell the viewer what to do next?

This matters even more in automation because the first video is rarely the final goal. The real goal is a repeatable pattern. If the first result works only because you manually tolerated a weak opening, awkward copy density, or a sloppy CTA, the system is not ready to scale. A reusable template needs stronger quality rules than a one-off experiment.

Review the first output at normal speed, then one more time with the sound off, and then once again by jumping through key moments on the timeline. Sound-off review tells you whether the visual structure is carrying its own weight. Scrub review tells you whether the transitions, text timing, and end card are landing where you think they are landing.

Review pass | What to look for | What usually needs fixing
Normal playback | Overall rhythm and legibility | Scene durations that are slightly too long or slightly too short
Muted playback | Message clarity without audio support | Overlays doing too much work or not enough
Scrub review | Cut points, effect windows, caption timing | Transitions or text cues landing a little early or late
Mobile-size check | Phone readability | Text that technically fits but is tiring to read
Final export review | Parity between idea and delivered file | Subtle issues that were easy to ignore in the build flow

How to turn one multipart local upload example into a repeatable template

The healthy way to reuse a multipart local upload project is to freeze the structure and vary only the data that actually changes. In plain terms, that means you decide which parts are template constants and which parts are runtime variables. Constants usually include format, text style, caption style, transition family, and effect intensity. Variables usually include scene source URLs, headline text, supporting copy, voiceover, music, or the closing CTA.

This distinction is operationally important because it keeps later edits cheap. If your structure and data are mixed together without a rule, every new campaign becomes a mini redesign. If they are separated early, one template can support many outputs with much less rework.

Template layer | Keep stable when possible | Let it vary when needed
Canvas | Width, height, FPS, safe margins | Only change for a different destination channel
Typography | Font family, general weight, default alignment | Swap only when the brand system truly requires it
Motion language | Core transition and effect families | Change only when the creative intent changes
Content data | Never hard-code campaign-specific values into the template | Headlines, asset URLs, captions, and CTA text
Distribution | Delivery step shape | Destination channel, notification recipient, or storage path
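One minimal way to implement the constants-versus-variables split in plain shell is a placeholder template plus `sed` substitution. The `{{OPENER}}` and `{{HEADLINE}}` placeholder names are an illustrative convention, not part of the JSONClip API:

```shell
#!/bin/sh
# Template-vs-data sketch: the template file holds the stable structure with
# placeholders; a substitution step injects per-campaign values. Assumes the
# substituted values contain no "/" or other sed-special characters.
cat > template.json <<'JSON'
{ "movie": {
  "format": { "width": 720, "height": 1280, "fps": 30 },
  "scenes": [ { "type": "image", "src": "{{OPENER}}", "duration_ms": 1200 } ],
  "overlays": [ { "type": "text", "text": "{{HEADLINE}}" } ]
} }
JSON

OPENER=opener.jpg
HEADLINE="Spring launch"
sed -e "s/{{OPENER}}/$OPENER/" -e "s/{{HEADLINE}}/$HEADLINE/" template.json > render.json
grep -o '"text": "[^"]*"' render.json   # prints: "text": "Spring launch"
```

At scale, a real templating tool or `jq` merge is sturdier, but the point stands at any size: the template file is versioned, the variables are not.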

What to log so debugging stays cheap

Every serious workflow needs enough logs to answer four questions later: what payload did we send, what assets did we reference, what result came back, and which business record did that result belong to? Teams often log too little and then start guessing. Guessing is expensive.

For JSONClip, the minimum useful log record is usually a request identifier, the project or business record identifier, the format, the main asset references, the final `movie_url`, and any credits or duration metadata returned by the render. If you can replay or inspect a failed run from that record, your observability is probably good enough for this stage.

Useful workflow log record shape
{
  "template_key": "starter_vertical_v1",
  "source_record_id": "campaign_2048",
  "format": { "width": 720, "height": 1280, "fps": 30 },
  "primary_assets": [
    "cover.jpg",
    "demo.mp4",
    "voice.mp3"
  ],
  "movie_url": "https://renderer.jsonclip.com/jsonclip/movies/example.mp4",
  "duration_ms": 6100,
  "credits_used": 42
}
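A record like the one above can be appended from the same shell that ran the render. This is a sketch with illustrative values; in a real run the fields come from the request you sent and the response the API returned:

```shell
# Logging sketch: append one JSON line per render to a run log (JSONL).
# The hardcoded values are illustrative; a real run would fill them from the
# request payload and the sync response.
MOVIE_URL="https://renderer.jsonclip.com/jsonclip/movies/example.mp4"
printf '{"template_key":"starter_vertical_v1","source_record_id":"campaign_2048","movie_url":"%s","duration_ms":6100}\n' \
  "$MOVIE_URL" >> render-log.jsonl
tail -n 1 render-log.jsonl
```

One line per run keeps the log greppable by campaign identifier, which is usually enough to replay or inspect a failed render later.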

A practical shipping checklist

  • The opener is readable in under a second.
  • The text density matches the actual pace of the cut.
  • No scene exists only because an asset was available.
  • Music and voiceover timing make sense together.
  • Effects and transitions reinforce pacing instead of hiding weak structure.
  • The closing frame clearly tells the viewer what happens next.
  • The request or project can be rerun without manual mystery steps.
  • The workflow owner knows whether the next step is hosted JSON, multipart upload, or a workflow tool such as n8n, Make.com, or Zapier.

How to document a multipart local upload workflow so another person can run it

A tutorial is only useful if a second person can follow it later without private context. For a multipart local upload workflow, the minimum documentation set is simple: what inputs are required, what the output looks like, who owns the template, what the normal render duration looks like, and what should happen when the run fails.

This sounds administrative, but it has direct quality impact. Teams that do not write down the expected inputs tend to sneak extra assumptions into the process. Then the workflow seems fine until a new operator or a new campaign uses a slightly different asset set and the whole thing becomes brittle.

Document section | What it should contain
Purpose | What class of video this workflow is supposed to produce
Inputs | Required asset types, text fields, and optional fields
Template rules | Format, text limits, caption usage, and motion rules
Operational notes | Expected runtime, sync or async mode, and downstream destination
Failure policy | Who gets notified and what should be retried

How to keep a multipart local upload template from drifting over time

Template drift is one of the quiet costs in video systems. A small text size tweak here, a transition change there, a different CTA rhythm for one campaign, and soon the template is no longer a template. It is a bag of exceptions. The fix is to treat changes as deliberate revisions, not as random convenience edits.

In practical terms, keep a short change log. Note why the template changed, what visual behavior changed, and whether older outputs still need the previous version. Even a tiny log beats memory.

Simple template change log format
- starter_vertical_v1
  - purpose: short product teaser
  - updated: 2026-04-03
  - notable rules:
    - opener under 2 seconds
    - one headline overlay
    - captions optional

- starter_vertical_v2
  - purpose: same template with cleaner close
  - updated: 2026-04-10
  - notable changes:
    - wider CTA safe area
    - slower end fade
    - tighter caption line length

A release checklist for a multipart local upload update

  1. Test one known-good input set.
  2. Test one awkward but realistic input set, such as longer copy or a darker image.
  3. Confirm the final output still matches the intended channel format.
  4. Confirm the downstream consumer still receives the same key result fields.
  5. Write down the update in the template notes before treating the change as complete.

Conclusion

Multipart mode is the practical bridge between local assets and a real render API. It lets you stay explicit, reproducible, and fast even when the media is not hosted yet.

If you later move those assets into stable storage, switch to the hosted media guide. If you want the same idea inside a workflow tool, continue with n8n, Make.com, or Zapier.

That is the practical bar for a good JSONClip workflow: easy to read, easy to rerun, easy to debug, and easy to hand off to the next person or the next automation layer.