April 2, 2026

How to Upload Local Images and Videos to JSONClip With cURL and Render in One Request

A complete multipart JSONClip tutorial for rendering local images, videos, and audio with cURL in one request, including basename rules, mixed-media examples, troubleshooting, and when to switch back to hosted URLs.

Long-read tutorial

This guide is for the moment when your images or videos are not hosted yet. Maybe the files live on your laptop. Maybe they come from a content folder your team just exported. Maybe a workflow step downloaded them locally and now needs to render immediately. JSONClip supports that with one multipart request: upload the files in `files[]`, send the project JSON inside `render_config`, and reference each file by basename.

A lot of teams overcomplicate this step because they assume “API” automatically means “everything must already be on a CDN.” That is not true. The hosted JSON flow is simpler when URLs already exist, but the multipart flow is the practical route when the assets only exist on your machine or on a worker at request time.

Tutorial map

These guides are meant to work together. Start with the article that matches your current workflow, then use the others when you move from manual setup into repeatable automation.

Why multipart upload exists

The hosted JSON mode cannot read arbitrary workstation paths because the renderer runs remotely. If you point the hosted API at `/Users/me/Desktop/clip.mp4`, the server has no access to that file. Multipart upload solves the problem honestly: you attach the files to the request, JSONClip stores them for the render job, and your JSON references them by filename.

That is more than convenience. It creates a reproducible render package. The request body and the uploaded files belong to the same operation, which makes debugging easier and keeps the caller in control of exactly what was rendered.

The render model in one minute

JSONClip works best when you think in layers, not in vague editor gestures. A render request has a format, a scene list, optional overlays, optional audio, optional effects, and optional captions. That separation matters because it keeps the workflow legible whether you are clicking in the editor, sending cURL, or calling the API from an automation tool.

Layer | What it controls | Why it matters
Format | Width, height, FPS, background color | If the format is unclear, everything downstream gets harder, especially captions and text fit.
Scenes | The base images or videos | Treat scenes as the backbone. If scene order is wrong, every overlay, effect, and audio cue inherits the mistake.
Overlays | Text, logos, sticker-like layers | Overlays carry the messaging. They should be positioned with intent, not added as a last-minute afterthought.
Audio | Voiceover, music, sound cues | Good video feels finished because the audio is managed carefully, not because the visuals are fancy.
Effects and transitions | Motion treatment and continuity | Effects are there to reinforce pacing, not to rescue weak structure.
Captions | Subtitle-style bottom text or inline cues | Captions should stay readable on mobile and should match the spoken pacing.

The one rule you must remember in multipart mode

Basename matching is what matters. If the file you upload is `intro.jpg`, then the `src` in `render_config` should be `intro.jpg`. It should not be `./intro.jpg`, `/tmp/intro.jpg`, or an unrelated alias. Keep the names exact and the flow stays reliable.
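A quick way to internalize the rule: whatever `basename` prints for the local path is the exact string that belongs in `src`. A minimal sketch, with illustrative paths:

```shell
# Derive the exact "src" value for render_config from a local path.
# basename strips the directory, which is precisely the multipart rule:
# upload ./campaign-assets/intro.jpg, reference "intro.jpg".
src_name() {
  basename "$1"
}

src_name "./campaign-assets/intro.jpg"   # prints: intro.jpg
src_name "/tmp/exports/feature.mp4"      # prints: feature.mp4
```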

Local upload + render in one request
DIR=./campaign-assets
curl -sS -X POST 'https://api.jsonclip.com/render?sync=1' \
  -H "X-API-Key: YOUR_API_KEY" \
  --form-string "render_config=$(cat <<'JSON'
{
  "env": "prod",
  "movie": {
    "format": { "width": 720, "height": 1280, "fps": 30, "background_color": "#000000" },
    "scenes": [
      { "type": "image", "src": "intro.jpg", "duration_ms": 1200, "transition_out": { "type": "white_strobe", "duration_ms": 240 } },
      { "type": "video", "src": "feature.mp4", "duration_ms": 2200, "transition_out": { "type": "blur", "duration_ms": 320 } },
      { "type": "image", "src": "cta.jpg", "duration_ms": 1600 }
    ],
    "overlays": [
      {
        "type": "text",
        "text": "LOCAL FILE DEMO",
        "from_ms": 100,
        "to_ms": 1800,
        "position_px": { "x": 360, "y": 160 },
        "width_px": 580,
        "style": { "font": "Avenir Next", "size_px": 60, "bold": true, "case": "upper", "align": "center", "color": "#ffffff" },
        "stroke": { "color": "#000000", "width_px": 5 }
      }
    ],
    "audio": [
      { "src": "voiceover.mp3", "role": "voiceover", "from_ms": 0, "to_ms": 5000 },
      { "src": "music.mp3", "role": "music", "from_ms": 0, "to_ms": 5000, "volume_db": -10, "duck_under_voice": true }
    ],
    "effects": [
      { "type": "zoom_in", "from_ms": 0, "to_ms": 1800, "settings": { "strength": 1.08 } },
      { "type": "fade_out", "from_ms": 4300, "to_ms": 5000 }
    ]
  }
}
JSON
)" \
  -F "files[]=@$DIR/intro.jpg" \
  -F "files[]=@$DIR/feature.mp4" \
  -F "files[]=@$DIR/cta.jpg" \
  -F "files[]=@$DIR/voiceover.mp3" \
  -F "files[]=@$DIR/music.mp3"

A minimal mental model for multipart mode

Piece | What you send | Why it exists
`files[]` | The local assets themselves | These are the bytes the renderer can actually use.
`render_config` | The JSONClip movie definition | This describes how those uploaded files should be assembled into a video.
Basename references | Names like `intro.jpg` or `feature.mp4` | These connect the uploaded files to the scene, overlay, or audio entries in the config.
Sync result | The returned `movie_url` and metadata | This gives you a direct success path for the first iteration.

A realistic example that mixes images, video, voiceover, and music

Multipart request with mixed local media
DIR=./local-demo
curl -sS -X POST 'https://api.jsonclip.com/render?sync=1' \
  -H "X-API-Key: YOUR_API_KEY" \
  --form-string "render_config=$(cat <<'JSON'
{
  "env": "prod",
  "movie": {
    "format": { "width": 1080, "height": 1920, "fps": 30, "background_color": "#000000" },
    "scenes": [
      { "type": "image", "src": "opener.jpg", "duration_ms": 1600, "transition_out": { "type": "white_strobe", "duration_ms": 240 } },
      { "type": "video", "src": "product-demo.mp4", "duration_ms": 2600, "transition_out": { "type": "blur", "duration_ms": 320 } },
      { "type": "image", "src": "cta.jpg", "duration_ms": 1900 }
    ],
    "overlays": [
      {
        "type": "text",
        "text": "Everything here came from local files",
        "from_ms": 120,
        "to_ms": 2000,
        "position_px": { "x": 540, "y": 210 },
        "width_px": 860,
        "style": { "font": "Avenir Next", "size_px": 80, "bold": true, "align": "center", "color": "#ffffff" },
        "stroke": { "color": "#000000", "width_px": 5 }
      }
    ],
    "audio": [
      { "src": "voice.mp3", "role": "voiceover", "from_ms": 0, "to_ms": 6100 },
      { "src": "music.mp3", "role": "music", "from_ms": 0, "to_ms": 6100, "volume_db": -11, "duck_under_voice": true }
    ],
    "captions": {
      "style": "bold_center",
      "cues": [
        { "from_ms": 0, "to_ms": 1600, "text": "Local files do not block the API workflow." },
        { "from_ms": 1600, "to_ms": 3900, "text": "Upload them once, then reference them by name." },
        { "from_ms": 3900, "to_ms": 6100, "text": "That keeps the request explicit and reproducible." }
      ]
    }
  }
}
JSON
)" \
  -F "files[]=@$DIR/opener.jpg" \
  -F "files[]=@$DIR/product-demo.mp4" \
  -F "files[]=@$DIR/cta.jpg" \
  -F "files[]=@$DIR/voice.mp3" \
  -F "files[]=@$DIR/music.mp3"

How to keep multipart requests easy to read

The biggest risk with multipart mode is not the upload itself. The risk is turning the request into an unreadable wall of shell syntax. The fix is simple: keep your `render_config` formatted clearly, keep filenames plain, and limit the first version of the workflow to the assets you actually need.

In practice, that means naming files on purpose before you upload them. `opener.jpg`, `demo.mp4`, `cta.jpg`, `voice.mp3`, and `music.mp3` are better names than `IMG_9472.JPG` and `final-final-v8.mp4`. The renderer can technically use the messy names, but your future debugging self will pay for the sloppiness.
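One way to keep the shell readable is to move the JSON into its own file. curl already supports this: a form value that starts with `<` is read from a file's content, while `@` would attach the file itself as an upload. A minimal sketch, assuming a `render.json` saved next to the command:

```shell
# Keep the movie definition in its own file instead of an inline heredoc.
cat > render.json <<'JSON'
{
  "env": "prod",
  "movie": {
    "format": { "width": 720, "height": 1280, "fps": 30, "background_color": "#000000" },
    "scenes": [ { "type": "image", "src": "intro.jpg", "duration_ms": 1200 } ]
  }
}
JSON

# In the curl command, replace the heredoc with a file read:
#   -F "render_config=<render.json"
# "<" sends the file's CONTENT as the field value; "@" would attach
# the file itself as an upload, which is not what render_config wants.
python3 -m json.tool render.json > /dev/null && echo "render.json is valid JSON"
```

Checking the file with a JSON parser before sending it catches quoting mistakes locally instead of as API errors.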

When this flow is better than pre-uploading to object storage

Use multipart when you need a fast one-shot request from a local source. That includes developer testing, operator-driven campaign builds from a folder, or automation workers that assemble files locally before rendering.

Do not overuse multipart if the same assets will be reused often. In that case it can be smarter to persist them into stable storage first and then switch back to the hosted JSON flow. The hosted flow is easier to read, easier to reuse, and easier to scale when the URLs already exist.

Common multipart mistakes that waste time

  • Uploading `intro.jpg` but referencing `intro.PNG` in the JSON.
  • Using a directory path in `src` instead of the actual basename.
  • Mixing JSON-body examples and multipart examples in the same request shape.
  • Uploading large local files first, then realizing the scene timing was never tested on a shorter version.
  • Treating multipart mode as a permanent architecture when the real need is durable media storage.
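Most of these mistakes can be caught before the request is ever sent. The sketch below creates illustrative fixtures, then checks that every `src` in the config resolves to a file in the asset directory by exact basename. The plain `grep`/`sed` extraction is an assumption that filenames contain no spaces:

```shell
#!/bin/sh
# Preflight sketch: confirm every "src" basename in the config has a matching
# file in the asset directory before uploading. The fixtures created here are
# illustrative; a real run would check your actual render.json and assets.
DIR=./campaign-assets
mkdir -p "$DIR"
touch "$DIR/intro.jpg" "$DIR/feature.mp4"
cat > render.json <<'JSON'
{ "movie": { "scenes": [
  { "type": "image", "src": "intro.jpg" },
  { "type": "video", "src": "feature.mp4" }
] } }
JSON

missing=0
for name in $(grep -o '"src": *"[^"]*"' render.json | sed 's/.*"\([^"]*\)"$/\1/'); do
  [ -f "$DIR/$name" ] || { echo "missing asset: $name"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all src references resolve"
```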

What to check when the render fails

Symptom | Likely cause | Fix
The API says an asset is missing | The basename in `src` does not match the uploaded filename | Make the names identical and keep the extension exact.
The request is accepted but media is wrong | You uploaded the wrong local files under names the JSON references | Verify the directory contents and the form fields before rerunning.
The command becomes impossible to maintain | The project is outgrowing ad hoc shell usage | Move the JSON into a file or graduate to an upstream storage step plus the hosted API mode.
The render works once but not as a reusable workflow | Local file handling is too tied to one machine | Persist the assets earlier and reserve multipart for the cases that truly need it.

Troubleshooting

Most first attempts fail for ordinary reasons, not exotic ones. The fix is usually to simplify the request, verify the media sources, and add complexity back in once the minimal version works.

What you see | What it usually means | What to do
The API returns an error before rendering starts | Your JSON shape or media references are wrong | Validate the body, confirm your header is `X-API-Key`, and make sure every `src` is either a downloadable URL or a basename uploaded in multipart mode.
The final video renders but the pacing feels wrong | Scene durations, effect timing, or audio trim are off | Shorten the first version of the workflow. Get a clean five-second or eight-second result before you scale to a longer reel.
The video looks fine in one environment and wrong in another | Preview parity or unsupported media format issue | Stick to stable formats and verify with the final render, not only with a browser preview.
The output is technically correct but hard to read | Typography, caption size, or spacing is too aggressive | Reduce text density. Good automation usually starts with simpler copy than teams expect.
The render succeeds but the wrong clip appears | A basename collision or manual shell mistake replaced the expected file | Use clearer filenames and a dedicated working directory per render.
The request is huge and slow | You are shipping more bytes than the first iteration needs | Cut the first test down to the minimal assets that prove the path.
The team keeps rerendering the same assets locally | The workflow really wants durable upstream storage now | Promote the assets to stable URLs and switch to the hosted JSON guide.

How this connects to automation workflows

Multipart mode also matters in automation, but it is not always the first choice. In n8n, Make.com, and Zapier, the cleanest pattern is still hosted URLs when available. Multipart is the right advanced move when the automation platform has binary files in-flight and you need to render before those assets are stored elsewhere.

That means this guide is especially useful for hybrid teams. Designers or operators can work from a local folder today, while engineering later decides whether those assets should stay local, be uploaded earlier in the pipeline, or be generated upstream by another service.

FAQ

Can I upload images and videos in the same multipart render request? Yes. The same basename rule applies to both.

Do I still get a normal `movie_url` response? Yes. With `?sync=1`, the response has the same shape as in the hosted JSON flow, including `movie_url`.

Should I use local upload mode forever? Only if the local-first pattern is truly part of your workflow. Otherwise stable hosted URLs usually make maintenance easier.
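The `movie_url` from a sync response can be pulled out in plain shell. The response body below is a canned illustrative example; in a real run you would capture it with `RESPONSE=$(curl ...)` from the request shown earlier:

```shell
# Sketch: extract movie_url from a sync response in plain shell.
# The response body here is a hardcoded example; a real run would capture it:
#   RESPONSE=$(curl -sS -X POST 'https://api.jsonclip.com/render?sync=1' ...)
RESPONSE='{"movie_url":"https://renderer.jsonclip.com/jsonclip/movies/example.mp4","duration_ms":5000}'
MOVIE_URL=$(printf '%s' "$RESPONSE" | sed 's/.*"movie_url": *"\([^"]*\)".*/\1/')
echo "$MOVIE_URL"
```

If `jq` is available, `jq -r .movie_url` is the more robust choice; the `sed` version only assumes a flat, well-formed response.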

How to reason about a multipart local upload payload field by field

The fastest way to debug a multipart local upload request is to stop thinking about it as one giant blob. Treat it as a set of layers with separate responsibilities. Once you do that, most failures become ordinary: the format was wrong, a scene URL was bad, the overlay text was too dense, the audio timing was sloppy, or the effect window did not match the cut.

Field group | What to verify first | What teams usually overcomplicate
Format | Width, height, FPS, and channel fit | Over-optimizing FPS before the composition works
Scenes | Each scene source resolves correctly in multipart upload mode | Adding too many scenes before the first request proves the path
Overlays | Text width, contrast, and timing | Decorative styling before the copy is clear
Audio | Track role and trim window | Layering multiple tracks before one clean track works
Effects | Whether the effect has a job | Using motion as a substitute for structure
Captions | Cue timing and line readability | Treating captions as mandatory even when the video does not need them

How to version a multipart local upload render contract

Once a multipart local upload request starts producing useful videos, give the template a name and a version. That can be as simple as `starter_vertical_v1`, `product_demo_v2`, or `quote_card_landscape_v1`. The naming convention matters because it lets your team discuss the template as a stable object instead of as a vague memory of the JSON.

This also makes change control less emotional. If a new layout is better, call it `v2` and keep `v1` available until the new one proves itself. Teams that overwrite a working request every time someone has a new idea create their own instability.

Useful metadata wrapper around a multipart local upload payload
{
  "template_key": "starter_vertical_v1",
  "campaign_id": "campaign_2048",
  "channel": "reels",
  "request": {
    "...": "JSONClip movie payload here"
  }
}

How to adapt one multipart local upload tutorial into several channels

A strong template family usually changes by channel more than by brand idea. The hook, visual proof, and CTA can stay conceptually similar while format, text density, and end-card pacing shift. That means you do not need a brand-new request for every destination. You need controlled variants.

Channel | Typical format | Typical edit change
Short-form vertical | 720x1280 or 1080x1920 | Keep the opener fast and the text large
Landscape explainer | 1280x720 or 1920x1080 | Give the layout more breathing room and wider text blocks
Square promo | 1080x1080 | Center the composition and reduce edge-hugging text
Story or ad variant | Vertical with clear CTA zone | Protect the closing frame for call-to-action legibility

How to decide when to leave multipart local upload and use automation tooling

The plain multipart local upload workflow is the correct long-term solution when the caller already has the data and can assemble the request predictably. Move into automation only when a trigger, branch, schedule, or downstream business system genuinely needs orchestration.

That is why these guides connect. Start with multipart local upload. If the next problem is workflow coordination, step into n8n, Make.com, or Zapier based on where the rest of the business process lives.

How to review a multipart local upload video before you call it done

The easiest mistake in a multipart local upload workflow is to stop as soon as the render technically succeeds. A successful render is not the same thing as a useful video. Before you ship, review the video with boring discipline: can a person understand the opener instantly, does each scene stay on screen long enough to make sense, does the audio enter and exit cleanly, and does the close actually tell the viewer what to do next?

This matters even more in automation because the first video is rarely the final goal. The real goal is a repeatable pattern. If the first result works only because you manually tolerated a weak opening, awkward copy density, or a sloppy CTA, the system is not ready to scale. A reusable template needs stronger quality rules than a one-off experiment.

Review the first output at normal speed, then one more time with the sound off, and then once again by jumping through key moments on the timeline. Sound-off review tells you whether the visual structure is carrying its own weight. Scrub review tells you whether the transitions, text timing, and end card are landing where you think they are landing.

Review pass | What to look for | What usually needs fixing
Normal playback | Overall rhythm and legibility | Scene durations that are slightly too long or slightly too short
Muted playback | Message clarity without audio support | Overlays doing too much work or not enough
Scrub review | Cut points, effect windows, caption timing | Transitions or text cues landing a little early or late
Mobile-size check | Phone readability | Text that technically fits but is tiring to read
Final export review | Parity between idea and delivered file | Subtle issues that were easy to ignore in the build flow

How to turn one multipart local upload example into a repeatable template

The healthy way to reuse a multipart local upload project is to freeze the structure and vary only the data that actually changes. In plain terms, that means you decide which parts are template constants and which parts are runtime variables. Constants usually include format, text style, caption style, transition family, and effect intensity. Variables usually include scene source URLs, headline text, supporting copy, voiceover, music, or the closing CTA.

This distinction is operationally important because it keeps later edits cheap. If your structure and data are mixed together without a rule, every new campaign becomes a mini redesign. If they are separated early, one template can support many outputs with much less rework.

Template layer | Keep stable when possible | Let it vary when needed
Canvas | Width, height, FPS, safe margins | Only change for a different destination channel
Typography | Font family, general weight, default alignment | Swap only when the brand system truly requires it
Motion language | Core transition and effect families | Change only when the creative intent changes
Content data | Never hard-code campaign-specific values into the template | Headlines, asset URLs, captions, and CTA text
Distribution | Delivery step shape | Destination channel, notification recipient, or storage path
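One minimal way to implement the constants-versus-variables split in plain shell is a placeholder template plus `sed` substitution. The `{{OPENER}}` and `{{HEADLINE}}` placeholder names are an illustrative convention, not part of the JSONClip API:

```shell
#!/bin/sh
# Template-vs-data sketch: the template file holds the stable structure with
# placeholders; a substitution step injects per-campaign values. Assumes the
# substituted values contain no "/" or other sed-special characters.
cat > template.json <<'JSON'
{ "movie": {
  "format": { "width": 720, "height": 1280, "fps": 30 },
  "scenes": [ { "type": "image", "src": "{{OPENER}}", "duration_ms": 1200 } ],
  "overlays": [ { "type": "text", "text": "{{HEADLINE}}" } ]
} }
JSON

OPENER=opener.jpg
HEADLINE="Spring launch"
sed -e "s/{{OPENER}}/$OPENER/" -e "s/{{HEADLINE}}/$HEADLINE/" template.json > render.json
grep -o '"text": "[^"]*"' render.json   # prints: "text": "Spring launch"
```

At scale, a real templating tool or `jq` merge is sturdier, but the point stands at any size: the template file is versioned, the variables are not.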

What to log so debugging stays cheap

Every serious workflow needs enough logs to answer four questions later: what payload did we send, what assets did we reference, what result came back, and which business record did that result belong to? Teams often log too little and then start guessing. Guessing is expensive.

For JSONClip, the minimum useful log record is usually a request identifier, the project or business record identifier, the format, the main asset references, the final `movie_url`, and any credits or duration metadata returned by the render. If you can replay or inspect a failed run from that record, your observability is probably good enough for this stage.

Useful workflow log record shape
{
  "template_key": "starter_vertical_v1",
  "source_record_id": "campaign_2048",
  "format": { "width": 720, "height": 1280, "fps": 30 },
  "primary_assets": [
    "cover.jpg",
    "demo.mp4",
    "voice.mp3"
  ],
  "movie_url": "https://renderer.jsonclip.com/jsonclip/movies/example.mp4",
  "duration_ms": 6100,
  "credits_used": 42
}
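A record like the one above can be appended from the same shell that ran the render. This is a sketch with illustrative values; in a real run the fields come from the request you sent and the response the API returned:

```shell
# Logging sketch: append one JSON line per render to a run log (JSONL).
# The hardcoded values are illustrative; a real run would fill them from the
# request payload and the sync response.
MOVIE_URL="https://renderer.jsonclip.com/jsonclip/movies/example.mp4"
printf '{"template_key":"starter_vertical_v1","source_record_id":"campaign_2048","movie_url":"%s","duration_ms":6100}\n' \
  "$MOVIE_URL" >> render-log.jsonl
tail -n 1 render-log.jsonl
```

One line per run keeps the log greppable by campaign identifier, which is usually enough to replay or inspect a failed render later.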

A practical shipping checklist

  • The opener is readable in under a second.
  • The text density matches the actual pace of the cut.
  • No scene exists only because an asset was available.
  • Music and voiceover timing make sense together.
  • Effects and transitions reinforce pacing instead of hiding weak structure.
  • The closing frame clearly tells the viewer what happens next.
  • The request or project can be rerun without manual mystery steps.
  • The workflow owner knows whether the next step is hosted JSON, multipart upload, or a workflow tool such as n8n, Make.com, or Zapier.

How to document a multipart local upload workflow so another person can run it

A tutorial is only useful if a second person can follow it later without private context. For a multipart local upload workflow, the minimum documentation set is simple: what inputs are required, what the output looks like, who owns the template, what the normal render duration looks like, and what should happen when the run fails.

This sounds administrative, but it has direct quality impact. Teams that do not write down the expected inputs tend to sneak extra assumptions into the process. Then the workflow seems fine until a new operator or a new campaign uses a slightly different asset set and the whole thing becomes brittle.

Document section | What it should contain
Purpose | What class of video this workflow is supposed to produce
Inputs | Required asset types, text fields, and optional fields
Template rules | Format, text limits, caption usage, and motion rules
Operational notes | Expected runtime, sync or async mode, and downstream destination
Failure policy | Who gets notified and what should be retried

How to keep a multipart local upload template from drifting over time

Template drift is one of the quiet costs in video systems. A small text size tweak here, a transition change there, a different CTA rhythm for one campaign, and soon the template is no longer a template. It is a bag of exceptions. The fix is to treat changes as deliberate revisions, not as random convenience edits.

In practical terms, keep a short change log. Note why the template changed, what visual behavior changed, and whether older outputs still need the previous version. Even a tiny log beats memory.

Simple template change log format
- starter_vertical_v1
  - purpose: short product teaser
  - updated: 2026-04-03
  - notable rules:
    - opener under 2 seconds
    - one headline overlay
    - captions optional

- starter_vertical_v2
  - purpose: same template with cleaner close
  - updated: 2026-04-10
  - notable changes:
    - wider CTA safe area
    - slower end fade
    - tighter caption line length

A release checklist for a multipart local upload update

  1. Test one known-good input set.
  2. Test one awkward but realistic input set, such as longer copy or a darker image.
  3. Confirm the final output still matches the intended channel format.
  4. Confirm the downstream consumer still receives the same key result fields.
  5. Write down the update in the template notes before treating the change as complete.

Conclusion

Multipart mode is the practical bridge between local assets and a real render API. It lets you stay explicit, reproducible, and fast even when the media is not hosted yet.

If you later move those assets into stable storage, switch to the hosted media guide. If you want the same idea inside a workflow tool, continue with n8n, Make.com, or Zapier.

That is the practical bar for a good JSONClip workflow: easy to read, easy to rerun, easy to debug, and easy to hand off to the next person or the next automation layer.