JSON 5 min read Updated Jul 3, 2026

Storing Binary Data in JSON with Base64

JSON is a text format and cannot represent raw binary data. The standard solution is to Base64-encode the binary and store it as a string. Here is how it works, when to use it, and when to choose an alternative.

Contents

Why JSON cannot store binary natively
The Base64 approach: encode binary to string
Examples in JavaScript, Python, and Go
The 33% size overhead tradeoff
Alternatives: multipart, BSON, MessagePack, Protobuf
When to use Base64 in JSON vs alternatives

Why JSON cannot store binary natively

The JSON specification (RFC 8259) defines exactly six value types: object, array, string, number, boolean, and null. There is no byte-array type and no notion of "raw bytes." Every value is ultimately text built from Unicode characters, and a JSON document is meant to be a valid sequence of characters that can be transmitted, logged, and read as text.

This matters because arbitrary binary data — a PNG, a ZIP archive, an encrypted blob — contains byte values that have no meaning as text. Many of those bytes (0x00 through 0x1F) are control characters that JSON strings forbid, and a byte sequence interpreted as UTF-8 will frequently be invalid (for example, a lone continuation byte like 0x80 is not a legal UTF-8 sequence). If you tried to paste raw bytes into a JSON string, the parser would reject the document.

You could store bytes as a JSON array of numbers, like [137, 80, 78, 71], but this is wasteful: each byte expands to several characters plus a comma, often 3–4x the size, with no compatibility benefit. The community converged long ago on a single text-safe representation: Base64.

JSON strings must be valid Unicode. Embedding raw bytes — or even Latin-1 "binary-as-text" — risks data corruption the moment the payload passes through a parser, a database column with a different encoding, or a logging pipeline that normalizes text. Always encode binary explicitly.

The Base64 approach: encode binary to string

Base64 maps every 3 bytes of binary input to 4 ASCII characters drawn from a 64-character alphabet (A–Z, a–z, 0–9, +, and /, with = for padding). Every character in that alphabet is a safe, printable JSON string character, so the result drops cleanly into a JSON value with no escaping needed.

The typical pattern is to wrap the encoded data alongside metadata such as the filename and MIME type, so the receiver knows how to reconstruct the file:

{
  "filename": "logo.png",
  "contentType": "image/png",
  "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNk+M8AAAMCAYAAA..."
}

The data field is a plain string. The server receives standard JSON, decodes the Base64 string back into the original bytes, and writes the file. Because the binary travels inside an ordinary string, it works through every JSON-aware tool in the chain — REST APIs, message queues, document databases, and config files — without special handling.

If your Base64 string may end up in a URL or filename, use the URL-safe alphabet, which replaces + with - and / with _. Inside a JSON body the standard alphabet is fine, since + and / are valid in JSON strings.

Examples in JavaScript, Python, and Go

Below are complete round-trip examples — encoding bytes into a JSON payload and decoding them back — in three common server and client languages.

JavaScript (browser). The browser's btoa() works on binary strings, so an ArrayBuffer must first be turned into a byte string. For large buffers, chunk the conversion to avoid argument-length limits:

// Encode a Blob/File to a Base64 JSON field
async function fileToJson(file) {
  const buf = await file.arrayBuffer();
  const bytes = new Uint8Array(buf);
  let binary = "";
  const chunk = 0x8000;
  for (let i = 0; i < bytes.length; i += chunk) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunk));
  }
  return {
    filename: file.name,
    contentType: file.type,
    data: btoa(binary),
  };
}

// Decode back to bytes
function jsonToBytes(payload) {
  const binary = atob(payload.data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

Python. The base64 module operates on bytes and returns bytes, so decode to str before placing the value in a dict that json.dumps will serialize:

import base64, json

# Encode
with open("logo.png", "rb") as f:
    raw = f.read()

payload = {
    "filename": "logo.png",
    "contentType": "image/png",
    "data": base64.b64encode(raw).decode("ascii"),
}
body = json.dumps(payload)

# Decode
parsed = json.loads(body)
raw_again = base64.b64decode(parsed["data"])
assert raw_again == raw

Go. The standard library's encoding/json already Base64-encodes []byte fields automatically. A struct field of type []byte is marshaled to a Base64 string and unmarshaled back to bytes with no manual conversion:

package main

import (
    "encoding/json"
    "fmt"
)

type File struct {
    Filename    string `json:"filename"`
    ContentType string `json:"contentType"`
    Data        []byte `json:"data"` // auto Base64 in JSON
}

func main() {
    f := File{Filename: "logo.png", ContentType: "image/png", Data: []byte{0x89, 'P', 'N', 'G'}}
    b, _ := json.Marshal(f)
    fmt.Println(string(b)) // {"filename":"logo.png","contentType":"image/png","data":"iVBORw=="}

    var back File
    _ = json.Unmarshal(b, &back)
    fmt.Printf("%x\n", back.Data) // 89504e47
}

Go's automatic []byte handling is a quiet convenience: you never call an encoder yourself. If you need URL-safe output or a custom alphabet, marshal the field as a string and encode it with base64.URLEncoding manually.

The 33% size overhead tradeoff

Base64's core cost is fixed by its design: 3 bytes in, 4 characters out, a 4/3 ratio that adds roughly 33% size overhead before any other factors. A 1 MB file becomes about 1.37 MB of Base64 text. Padding adds 0–2 extra characters, and if your encoder inserts line breaks every 76 characters (MIME style), that is a small additional cost — though JSON payloads usually omit line wrapping.

On top of the raw 33%, the JSON wrapper itself contributes overhead: the field names, quotes, braces, and any string escaping. The escaping is usually negligible for a Base64 string because the Base64 alphabet contains no characters JSON must escape. The bigger hidden cost is memory: encoding and decoding typically materializes the entire blob plus its encoded form in memory at once, so a "small" 50 MB upload can briefly consume well over 100 MB on the server.

The 33% expansion is on the wire and in storage. If you persist Base64 in a database text column, every row carries that overhead permanently, plus you lose the ability to do byte-range reads. For anything large, store the raw bytes (or an object-storage URL) and Base64 only at the transport boundary if you must.

Compression partially offsets this. Base64 text compresses well with gzip or Brotli — often back down close to, though not below, the original binary size — because the encoding is regular. If your transport already applies content compression (most HTTP stacks do), the on-the-wire penalty of Base64 is smaller than the raw 33% suggests, but the CPU and memory costs of encoding remain.

Alternatives: multipart/form-data, BSON, MessagePack, Protocol Buffers

Base64-in-JSON is convenient but not always the right tool. Several alternatives carry binary natively, eliminating the 33% expansion:

multipart/form-data — The browser-native, HTTP-standard way to upload files. Binary parts are sent as raw bytes with their own headers, so there is no encoding overhead. JSON metadata can ride along as additional form fields. This is the default choice for file uploads from web clients.
BSON — "Binary JSON," used by MongoDB. It is a binary serialization of a JSON-like document model with a first-class binary type, so byte arrays are stored directly. BSON is slightly larger than JSON for text-heavy data but far more efficient for binary fields.
MessagePack — A compact binary serialization that mirrors the JSON data model (objects, arrays, strings, numbers) and adds native bin and ext types. It is typically smaller and faster to parse than JSON, and binary needs no Base64. Good drop-in when both ends control the format.
Protocol Buffers — Google's schema-driven binary format. Fields of type bytes are stored raw, and the schema (.proto) gives strong typing and forward/backward compatibility. Best for high-throughput service-to-service RPC (often via gRPC), at the cost of needing a shared schema and codegen.

The common thread: each of these can transmit bytes as bytes. JSON's limitation is specifically that it is a text format. If you are free to move off plain JSON, you can avoid the encoding tax entirely.

You do not have to choose globally. A common hybrid is a JSON metadata document plus a separate binary channel — multipart parts, a second request to object storage, or a binary side-stream — keeping your primary API JSON while large blobs travel raw.

When to use Base64 in JSON vs alternatives

Base64-in-JSON earns its keep when simplicity and a uniform text pipeline outweigh raw efficiency. Reach for it when:

Your API is strictly JSON. Many platforms, webhooks, and serverless functions accept only application/json. Base64 lets you include binary without a second content type or endpoint.
The payload is small. For files under roughly 1 MB — icons, thumbnails, signatures, short audio clips — the 33% overhead is immaterial and inlining keeps the request atomic.
Atomicity matters. When the binary must arrive together with related fields in a single, all-or-nothing document (a record plus its attachment), embedding avoids the coordination problems of separate uploads.
Human-readability or debuggability helps. A JSON payload you can log, diff, and replay end-to-end is easier to operate than a binary protocol, even at some size cost.

Choose an alternative when the binary is large or performance-critical. Use multipart/form-data for browser file uploads; use BSON when MongoDB is already in the path; use MessagePack for compact internal messaging where both ends agree on the format; and use Protocol Buffers for typed, high-volume RPC. The decisive questions are: how big is the blob, do both ends control the wire format, and does the rest of your stack expect plain JSON?

In practice many teams settle on a clean split — small binaries inline as Base64 for convenience, large binaries to object storage with only a URL or key in the JSON. That keeps the API JSON-clean while sidestepping the worst of the overhead.

FAQ

How do I send a binary file as JSON?

Base64-encode the file contents and include the string in your JSON payload: { "filename": "image.png", "data": "iVBORw0KGgo..." }. In JavaScript: const data = btoa(String.fromCharCode(...new Uint8Array(arrayBuffer))). In Python: import base64; data = base64.b64encode(file_bytes).decode()

How much larger is binary data when Base64-encoded in JSON?

Base64 adds approximately 33% overhead — 3 bytes of binary become 4 ASCII characters. A 1 MB file becomes ~1.37 MB as a Base64 string in JSON. The JSON wrapper itself adds additional overhead. For large files, multipart/form-data is more efficient.

Should I use Base64 or multipart/form-data for file uploads?

Use multipart/form-data for file uploads — it is more bandwidth-efficient (no Base64 overhead) and is the browser native standard. Use Base64 in JSON when your API is strictly JSON-only, when the file is small (under ~1 MB), or when you need to include the file alongside other JSON fields atomically.

Try Base64 encoding and decoding instantly

Paste any string or file — base64.dev auto-detects and converts it instantly.

Open base64.dev →