Base64 vs Base32, Base58 & Base85

Not all binary-to-text encodings are the same. Base64, Base32, Base58, and Base85 each make different tradeoffs between size efficiency, readability, and the characters they use. Here is when to use each.

Why binary-to-text encodings exist

Computers store everything as bytes — sequences of 8 bits that can take any of 256 values. But many systems that move data around were designed only to handle text. Email bodies, URLs, JSON strings, XML attributes, and source code files all expect printable characters, and some treat certain bytes (null, newline, control codes) as special or simply corrupt them in transit. Sending raw binary through these channels is unreliable.

Binary-to-text encoding solves this by mapping arbitrary bytes onto a restricted, safe alphabet of printable characters. The cost is size: representing 8 bits of information using characters that each only carry a few bits of usable entropy always inflates the output. The question every encoding answers is which tradeoff to make — fewer, safer characters that are easy for humans to handle, or a denser alphabet that keeps the output small.

The math is straightforward. An alphabet of N symbols carries log2(N) bits per character. To encode 8 bits you need 8 / log2(N) characters per byte, and the overhead is how much larger the text is than the original binary:

Alphabet size   Bits/char   Chars per byte   Overhead
─────────────────────────────────────────────────────
16  (hex)       4.00        2.00             100%
32  (base32)    5.00        1.60             ~60%
58  (base58)    5.86        1.37             ~37%
64  (base64)    6.00        1.33             ~33%
85  (base85)    6.41        1.25             ~25%

Larger alphabets are more efficient, but they pull in more punctuation and case-sensitive characters that can clash with the surrounding format. The four encodings below sit at different points on that curve.
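
The figures in the table can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `encoding_stats` is made up for illustration, and the results are theoretical limits rather than what any particular implementation emits.

```python
import math

def encoding_stats(alphabet_size: int) -> tuple[float, float, float]:
    """Return (bits per char, chars per byte, overhead fraction) for an alphabet."""
    bits_per_char = math.log2(alphabet_size)
    chars_per_byte = 8 / bits_per_char
    overhead = chars_per_byte - 1  # extra characters per input byte
    return bits_per_char, chars_per_byte, overhead

# Same alphabet sizes as the table: hex, Base32, Base58, Base64, Base85.
for n in (16, 32, 58, 64, 85):
    bits, chars, overhead = encoding_stats(n)
    print(f"{n:>3}  {bits:.2f}  {chars:.2f}  {overhead:.0%}")
```

Real encoders work on whole groups of bytes, so their actual overhead can sit slightly above these limits once padding or rounding is included.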

Base64 — the standard

Base64 is the default binary-to-text encoding of the modern web. It uses a 64-character alphabet (A-Z, a-z, 0-9, plus + and /) and = for padding. Because 64 is a power of two, each character maps cleanly to exactly 6 bits. As specified in RFC 4648, Base64 groups the input into 3-byte (24-bit) chunks and emits 4 characters per chunk, giving the familiar ~33% size overhead.

Input:  "Man" = [0x4D, 0x61, 0x6E] = 24 bits
Bits:   01001101 01100001 01101110
Split:  010011  010110  000101  101110
        →  19   →  22   →  5    →  46   (6-bit values)
Map:       T       W       F       u
Output: "TWFu"   (3 bytes → 4 chars)
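
The same grouping can be written out by hand. The sketch below handles only a full 3-byte group (no padding) and checks itself against the standard library; `encode_group` is an illustrative name, not part of any API.

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(chunk: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters (padding not handled)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    n = int.from_bytes(chunk, "big")  # the 24-bit group as one integer
    # Peel off four 6-bit values, most significant first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(B64_ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
assert encode_group(b"Man") == base64.b64encode(b"Man").decode("ascii")
```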

When the input length is not a multiple of 3, Base64 pads the final group with = so the output length is always a multiple of 4. Almost every language ships a built-in implementation:

// JavaScript
btoa("Hi")                      // => "SGk="
atob("SGk=")                    // => "Hi"

# Python
import base64
base64.b64encode(b"Hi")         # => b'SGk='
base64.b64decode(b"SGk=")       # => b'Hi'

The + and / characters are unsafe in URLs and filenames. RFC 4648 defines a "URL-safe" variant that swaps them for - and _. JWTs use this variant and also strip the = padding entirely.
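
In Python the URL-safe variant is available as `base64.urlsafe_b64encode`. Stripping and restoring the padding is left to the caller, so the two helpers below (`b64url_encode` and `b64url_decode`, names chosen for this example) are one possible way to get JWT-style output.

```python
import base64

def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 with the '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

raw = b"\xfb\xff\xfe"           # chosen because it maps to '+' and '/'
print(base64.b64encode(raw))    # b'+//+'
print(b64url_encode(raw))       # -__-
print(b64url_encode(b"Hi"))     # SGk
```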

Base64 wins by ubiquity, not by efficiency. It is the format you reach for when you need something that works everywhere with zero extra dependencies: data URIs, email attachments (MIME), HTTP Basic Auth headers, JSON payloads carrying binary blobs, and JWT segments all rely on it.

Base32 (RFC 4648) — human-readable and case-insensitive

Base32 trades size for legibility. It uses a 32-character alphabet: the uppercase letters A-Z and the digits 2-7. The digits 0 and 1 are deliberately left out because they are easily confused with O and I/L (8 and 9 are unused as well). Each character encodes 5 bits, so Base32 groups input into 5-byte (40-bit) chunks and emits 8 characters per chunk, padding with = as needed. The result is about 60% larger than the original binary.

Base32 alphabet (RFC 4648):
ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

Input:  "Hi"
Output: "JBUQ===="   (padded with = to a multiple of 8 characters)
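
To see how the padding behaves, the short sketch below encodes inputs of one to five bytes with Python's `base64.b32encode` and counts the trailing = characters. The helper `padding_count` is illustrative only.

```python
import base64

def padding_count(data: bytes) -> int:
    """Number of '=' characters in the standard Base32 encoding of data."""
    return base64.b32encode(data).count(b"=")

for data in (b"H", b"Hi", b"Hi!", b"Hi!!", b"Hi!!!"):
    encoded = base64.b32encode(data).decode("ascii")
    # The encoded length is always a multiple of 8 characters.
    print(f"{len(data)} byte(s) -> {encoded} ({padding_count(data)} pad chars)")
```

Only an input whose length is a multiple of 5 bytes comes out with no padding at all.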

The payoff is that Base32 is case-insensitive and free of punctuation, so it survives being read aloud, typed by hand, or printed on paper without ambiguity. Because of this it shows up wherever a human is in the loop. The classic example is TOTP, the algorithm behind Google Authenticator and similar apps — the shared secret in an otpauth:// URI is Base32-encoded:

# Python — generate a TOTP secret in Base32
import base64, os
secret = base64.b32encode(os.urandom(20)).decode()
# => e.g. "JBSWY3DPEHPK3PXP..." (only A-Z and 2-7)

Base32 also appears in Tor v3 onion service addresses, in the NSEC3 records of DNSSEC (using the Base32hex variant), and in other protocols where data must pass through case-folding or case-insensitive layers such as DNS. It is the encoding to choose when output may be transcribed by a person.

Base32 is designed to be case-insensitive and is conventionally emitted in uppercase, but some decoders reject lowercase unless told otherwise. RFC 4648 also defines an "extended hex" variant (Base32hex, alphabet 0-9A-V), used in DNSSEC, which is not interchangeable with standard Base32. Always confirm which variant a system expects.

Base58 — no 0/O/I/l confusion

Base58 was introduced by Satoshi Nakamoto for Bitcoin addresses. Its design goal is purely human ergonomics: take a Base62-ish alphabet but remove the four characters that are hardest to tell apart in common fonts — the digit 0, the capital O, the capital I, and the lowercase l. It also drops + and / so the output is safe to paste into URLs and double-click to select as a single token.

Base58 alphabet (Bitcoin):
123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
(no 0, O, I, or l)

Unlike 64 and 32, 58 is not a power of two, so there is no clean bit-to-character mapping. Base58 instead treats the entire input as one big integer and repeatedly divides it by 58, much like converting a number to a different radix. This makes encoding and decoding slower on long inputs and means the output length depends on the input's value, not just its byte count, but the resulting strings are remarkably tidy.
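
The repeated-division approach fits in a few lines of Python. This is a sketch rather than a reference implementation: it uses the Bitcoin alphabet and follows the Bitcoin convention (an assumption here, since the text above does not cover it) of writing each leading zero byte as a literal "1". The function names are illustrative.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    """Encode bytes as Base58 by treating them as one big integer."""
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)          # peel off the least significant digit
        digits.append(B58_ALPHABET[rem])
    # Leading zero bytes vanish in the integer, so keep them as '1' characters.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))

def b58decode(text: str) -> bytes:
    """Inverse of b58encode; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    leading_ones = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body

print(b58encode(b"hello world"))        # StV1DL6CwTryKyV
```

Because every step works on an arbitrarily large integer, the cost grows roughly quadratically with input size, which is one reason Base58 is used for short identifiers rather than bulk data.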

// JavaScript using the `bs58` package
import bs58 from "bs58";
const bytes = Buffer.from("hello world");
bs58.encode(bytes);             // => "StV1DL6CwTryKyV"
bs58.decode("StV1DL6CwTryKyV"); // => <Buffer 68 65 6c 6c 6f ...>

In Bitcoin, Base58 almost always appears as Base58Check, which prepends a version byte and appends a 4-byte checksum (the first four bytes of a double SHA-256 of the data) before encoding. This lets a wallet reject a mistyped address instead of sending funds into the void. Beyond Bitcoin, Base58 is used in IPFS content identifiers (the older CIDv0 format), Monero and many other cryptocurrencies, and Solana public keys. It is a good choice when a value will be copied, typed, or read by people; pair it with a checksum when a single wrong character must never silently succeed.
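
A checksum of this kind can be sketched as follows. The layout (one version byte, then the payload, then four checksum bytes) follows the Bitcoin convention and is an assumption of this example; the function names are hypothetical and the Base58 helpers are compact versions written for this sketch, not a library API.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out            # leading zero bytes become '1'

def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    ones = len(text) - len(text.lstrip("1"))
    return b"\x00" * ones + n.to_bytes((n.bit_length() + 7) // 8, "big")

def checksum(body: bytes) -> bytes:
    """First 4 bytes of SHA-256(SHA-256(body))."""
    return hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4]

def b58check_encode(version: int, payload: bytes) -> str:
    body = bytes([version]) + payload
    return b58encode(body + checksum(body))

def b58check_decode(text: str) -> tuple[int, bytes]:
    raw = b58decode(text)
    body, check = raw[:-4], raw[-4:]
    if len(raw) < 5 or checksum(body) != check:
        raise ValueError("bad Base58Check checksum")
    return body[0], body[1:]

address = b58check_encode(0x00, bytes(range(1, 21)))  # 20-byte dummy payload
print(address)
print(b58check_decode(address))
```

Changing any single character of `address` and decoding again should raise `ValueError`, which is the behaviour a wallet relies on to catch typos.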

There is no single official Base58 alphabet. Bitcoin, Ripple (XRP), and Flickr each use different orderings. bs58 defaults to the Bitcoin alphabet — decoding a Ripple address with it will produce garbage.

Base85 (ASCII85) — the most compact

Base85 pushes the alphabet as large as is practical for ASCII while keeping the encoding sane. With 85 symbols each character carries about 6.41 bits, so Base85 groups input into 4-byte (32-bit) chunks and emits just 5 characters per chunk: only ~25% overhead, the tightest of any common scheme. The reason 85 is the magic number: 85^5 = 4,437,053,125 is at least 2^32 = 4,294,967,296, while 84^5 = 4,182,119,424 falls short, so 85 is the smallest alphabet size for which five characters can always represent a 32-bit group.

Encoding a 4-byte group:
1. Treat the 4 bytes as one 32-bit unsigned integer N.
2. Compute five base-85 digits: N = (((d0*85 + d1)*85 + d2)*85 + d3)*85 + d4
3. Map each digit (0-84) to a printable ASCII char, starting at '!' (0x21).
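
The three steps above can be sketched directly in Python. The function name `a85_encode_group` is illustrative, and the sketch covers only a full 4-byte group with the classic ASCII85 alphabet; it ignores partial groups and any shortcuts real encoders apply.

```python
import base64

def a85_encode_group(chunk: bytes) -> str:
    """Encode exactly 4 bytes as 5 ASCII85 characters."""
    if len(chunk) != 4:
        raise ValueError("this sketch only handles full 4-byte groups")
    n = int.from_bytes(chunk, "big")        # step 1: one 32-bit unsigned integer
    digits = []
    for _ in range(5):                      # step 2: five base-85 digits
        n, d = divmod(n, 85)
        digits.append(d)                    # collected least significant first
    # Step 3: reverse to get d0..d4 and offset each digit from '!' (0x21).
    return "".join(chr(0x21 + d) for d in reversed(digits))

# Five base-85 digits are enough for 32 bits; five base-84 digits are not.
assert 84**5 < 2**32 <= 85**5

print(a85_encode_group(b"Man "))            # 9jqo^
assert a85_encode_group(b"Man ") == base64.a85encode(b"Man ").decode("ascii")
```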

The classic ASCII85 alphabet runs from ! (value 0) through u (value 84) — 85 consecutive printable characters. As a space optimization, a group of four zero bytes is written as a single z instead of !!!!!. Python's standard library implements it directly:

# Python
import base64
base64.a85encode(b"Man is distinguished")
# => b'9jqo^BlbD-BleB1DJ+*+F(f,q'
base64.b85encode(b"hi")   # => b'XlV' (RFC 1924 alphabet, as used by Git; not Z85)

Base85 is used where every byte counts but the data still has to be text. Adobe PDF uses ASCII85 to embed binary streams, PostScript uses it, and Git's binary patch format (git diff --binary) uses a Base85 variant to keep binary deltas compact. The ZeroMQ project defined Z85, a Base85 alphabet chosen so the output is safe inside source code and string literals.

Base85 has several incompatible variants (Adobe ASCII85, delimited by <~ and ~>; RFC 1924; and ZeroMQ Z85) that use different alphabets and rules for whitespace and the z shortcut. The ASCII85 alphabet also includes characters like ", ', \, and < that break HTML, JSON, and XML, so it is a poor fit for the web despite its efficiency.
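
Python's standard library exposes two of these variants, which makes the incompatibility easy to see. A small sketch, using only `base64.a85encode` and `base64.b85encode`:

```python
import base64

data = b"Man "
zeros = bytes(4)   # four zero bytes

print(base64.a85encode(data))              # b'9jqo^'
print(base64.a85encode(data, adobe=True))  # b'<~9jqo^~>' (Adobe framing)
print(base64.b85encode(data))              # same bytes, different alphabet

# The z shortcut exists in ASCII85 but not in the RFC 1924 style alphabet.
print(base64.a85encode(zeros))             # b'z'
print(base64.b85encode(zeros))             # b'00000'
```

Text produced by one function cannot be decoded by the other's counterpart, so the variant has to be agreed on out of band.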

Comparison table

Side by side, the four encodings trade efficiency against human-friendliness in a predictable way — smaller alphabets are easier for people, larger alphabets are smaller on the wire.

Encoding   Alphabet size          Overhead   Padding   Group size      Case
────────────────────────────────────────────────────────────────────────────────
Base32     32 (A-Z 2-7)           ~60%       Yes (=)   5 bytes → 8     Insensitive
Base58     58 (no 0 O I l)        ~37%       No        big-integer     Sensitive
Base64     64 (A-Z a-z 0-9 + /)   ~33%       Yes (=)   3 bytes → 4     Sensitive
Base85     85 (! through u)       ~25%       No        4 bytes → 5     Sensitive
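
The overhead column can be checked empirically for the encodings that ship with Python. In this sketch the 60-byte payload is chosen so that no encoder needs padding; Base58 is left out because it is not in the standard library, and `measured_overhead` is an illustrative helper.

```python
import base64

payload = bytes(range(1, 61))  # 60 bytes: a multiple of 3, 4 and 5

encoders = {
    "hex": base64.b16encode,
    "base32": base64.b32encode,
    "base64": base64.b64encode,
    "ascii85": base64.a85encode,
}

def measured_overhead(encode, data: bytes) -> float:
    """Fraction by which the encoded text is longer than the input."""
    return len(encode(data)) / len(data) - 1

for name, encode in encoders.items():
    size = len(encode(payload))
    print(f"{name:8} {size:4d} chars  {measured_overhead(encode, payload):.0%}")
```

For inputs that are not a multiple of the group size, padding pushes the measured figures slightly above the table's values.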

Choosing between them comes down to the channel and the audience:

  • Base64 — the safe default. Use it for data URIs, JWTs, email/MIME, HTTP headers, and any binary blob inside JSON. Universal support, no dependencies.
  • Base32 — when a human types or reads the value: TOTP secrets, onion addresses, recovery codes, and case-insensitive identifiers.
  • Base58 — when a value is copied by hand and a typo must never silently succeed: cryptocurrency addresses and keys, IPFS CIDv0. Almost always paired with a checksum (Base58Check).
  • Base85 — when size matters and the channel tolerates a wide character set: PDF/PostScript streams, Git binary patches, and ZeroMQ keys (Z85). Avoid it in HTML, JSON, and URLs.

Encoding never provides confidentiality — all four schemes are fully reversible without a key. If you need to protect data, encrypt it first and then encode the ciphertext for transport.


FAQ

What is the difference between Base64 and Base32?

Base32 uses only 32 characters (A-Z and 2-7) and is case-insensitive, making it easier for humans to read and type. The tradeoff is about 60% size overhead versus Base64's 33%. Base32 is used for TOTP secrets (authenticator apps) and Tor onion addresses.

Why does Bitcoin use Base58 instead of Base64?

Base58 removes characters that look similar (0 and O, I and l) and avoids + and / which have special meaning in URLs. This makes Bitcoin addresses easier to type and copy without errors.

Which encoding is most space-efficient?

Base85 (ASCII85) is the most efficient, with only 25% overhead. Base64 has 33% overhead. Base32 has 60% overhead. For most web use cases, Base64 is the standard due to wide support despite not being the smallest.
