Base64 Encoding and Decoding: A Complete Developer's Guide

Open any JWT, poke around a CSS file embedding a tiny icon, or look inside an email's raw MIME source — you'll find Base64 everywhere. It's one of those encoding schemes that every developer encounters daily but surprisingly few fully understand at the byte level. This guide covers exactly what Base64 is, how the algorithm works step-by-step, which variant to use in each context, and the real pitfalls that trip up even experienced developers.

The Problem Base64 Was Designed to Solve

Binary data — images, audio, compiled executables, cryptographic keys — is a sequence of raw bytes spanning values from 0 to 255. Most text-based protocols and storage systems were designed to handle printable ASCII characters: bytes in the range 32 to 126. When these systems encounter raw binary, they may interpret certain byte values as control characters, truncate on null bytes, corrupt data at line endings, or simply reject the input as invalid.

Email (SMTP) is the original example. The protocol was designed for 7-bit ASCII text and had no mechanism for binary attachments. Developers needed a way to take arbitrary binary data and represent it using only the safe printable characters that every ASCII-aware system would faithfully transmit. Base64 is that representation: it was specified for Privacy-Enhanced Mail in RFC 1421 (1993), adopted by MIME for email attachments, and later standardized on its own in RFC 4648 (2006). The name comes from the fact that it uses 64 characters from the ASCII set as its alphabet.

The Base64 Alphabet

The standard Base64 alphabet consists of exactly 64 characters:

Uppercase letters A through Z — values 0 to 25
Lowercase letters a through z — values 26 to 51
Digits 0 through 9 — values 52 to 61
+ — value 62
/ — value 63

Each of these 64 characters can represent a 6-bit value (since 2⁶ = 64). The padding character = is not part of the alphabet proper — it is used to pad the output to a multiple of 4 characters when the input length is not a multiple of 3 bytes.
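The alphabet table can be reproduced in a few lines of Python. This is a minimal sketch; the names ALPHABET and VALUE_OF are illustrative and not part of any library.

```python
import string

# Standard Base64 alphabet in value order: A-Z, a-z, 0-9, then + and /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Reverse lookup used when decoding: character -> 6-bit value
VALUE_OF = {char: value for value, char in enumerate(ALPHABET)}

print(len(ALPHABET))   # 64
print(ALPHABET[19])    # T
print(VALUE_OF["/"])   # 63
```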

How the Encoding Algorithm Works

The core algorithm takes 3 bytes of input (24 bits) and produces 4 Base64 characters (24 bits spread across 4 × 6-bit groups). Here is a step-by-step example encoding the string Man:

Start with the ASCII byte values: M = 77, a = 97, n = 110. In binary:

M         a         n
01001101  01100001  01101110

Concatenate those 24 bits into one stream, then split into four 6-bit groups:

010011  010110  000101  101110
  19      22       5      46

Look up each 6-bit value in the Base64 alphabet: 19 → T, 22 → W, 5 → F, 46 → u. Result: TWFu. You can verify this in your browser console: btoa("Man") returns exactly "TWFu".
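The same steps can be sketched in Python for a single 3-byte group. The helper name encode_group is hypothetical, and the sketch deliberately handles only the exact 3-byte case described above.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters (hypothetical helper)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer
    bits = int.from_bytes(three_bytes, "big")
    # Read four 6-bit groups, starting from the most significant end
    indices = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
```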

Padding with =

When the input is not a multiple of 3 bytes, Base64 pads the output so it remains a multiple of 4 characters. If there is 1 remaining byte (8 bits), it is zero-padded to 12 bits, producing 2 Base64 characters followed by ==. If there are 2 remaining bytes (16 bits), they are zero-padded to 18 bits, producing 3 Base64 characters followed by =. Decoders use the padding characters to know how many meaningful bytes the final group contains.

btoa("M")    // "TQ=="  — 1 byte → 2 chars + ==
btoa("Ma")   // "TWE="  — 2 bytes → 3 chars + =
btoa("Man")  // "TWFu"  — 3 bytes → 4 chars, no padding
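For readers who prefer to see the padding rule as code, here is a hand-rolled Python encoder checked against the standard library. It is a sketch for illustration only: encode_base64 is a hypothetical name, and real code should call base64.b64encode.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_base64(data: bytes) -> str:
    """Hand-rolled encoder for illustration; use base64.b64encode in real code."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Zero-fill the short group so it can be split like a full one
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(bits >> shift) & 0b111111] for shift in (18, 12, 6, 0)]
        # Characters made up entirely of fill bits are replaced by '='
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

for sample in (b"M", b"Ma", b"Man"):
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", encode_base64(sample))
```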

Base64 vs Base64URL

The standard Base64 alphabet includes + and /. Both of these characters have special meanings in URLs: + encodes a space in form data, and / is the path separator. If you embed standard Base64 in a URL query parameter or path segment, these characters will either be misinterpreted or need percent-encoding, which inflates the size and makes the string harder to handle.

Base64URL (defined in RFC 4648 §5) solves this by substituting two characters: - replaces +, and _ replaces /. Padding characters are typically omitted as well. The result is a string that is safe to include in URLs and filenames without any additional encoding. JWTs use Base64URL exclusively — the three dot-separated segments of a JWT are each Base64URL-encoded, not standard Base64. This is a common source of confusion when developers try to decode a JWT segment using a standard Base64 decoder and get an error on tokens that happen to contain - or _.
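A short Python comparison makes the difference concrete. The input bytes are chosen only because they happen to produce the characters for values 62 and 63. Note that Python's urlsafe_b64encode keeps the padding, so stripping it is shown as a separate step.

```python
import base64

raw = bytes([0xFB, 0xFF])  # chosen so the output uses values 62 and 63

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
unpadded = url_safe.rstrip("=")

print(standard)  # +/8=
print(url_safe)  # -_8=
print(unpadded)  # -_8

# The two alphabets differ only in the characters for values 62 and 63
assert url_safe == standard.translate(str.maketrans("+/", "-_"))
```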

Common Use Cases

Data URIs for Images

A data URI embeds file content directly in HTML or CSS, eliminating an HTTP request. The format is data:[mimetype];base64,[encoded-data]. For small icons and decorative images, this can improve first-paint performance because the image data arrives with the HTML document. However, Base64 encoding inflates the data by about 33%, and a data URI cannot be cached separately from the document that contains it. It is best reserved for images smaller than roughly 4 KB.
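Building a data URI is a one-line string format once the bytes are encoded. The sketch below assumes a hypothetical helper named to_data_uri and uses a tiny inline SVG as a stand-in for a real icon file.

```python
import base64

def to_data_uri(content: bytes, mime_type: str) -> str:
    """Build a data URI in the form data:[mimetype];base64,[encoded-data]."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# A tiny inline SVG stands in for a real icon file
svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"/>'
uri = to_data_uri(svg, "image/svg+xml")

print(uri[:40])
print(len(svg), "bytes raw ->", len(uri), "characters as a data URI")
```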

Email MIME Attachments

The original use case. MIME (Multipurpose Internet Mail Extensions) defines how email clients encode attachments for transmission over SMTP. The Content-Transfer-Encoding: base64 header tells the receiving mail client that the attachment body is Base64-encoded and must be decoded before use.
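Python's standard email package applies this encoding automatically when a binary attachment is added. The following is a minimal sketch; the subject, body text, filename, and payload are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report attached"
msg.set_content("See the attached file.")

# Arbitrary binary payload standing in for a real attachment
payload = bytes(range(256))
msg.add_attachment(
    payload,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",  # hypothetical filename
)

attachment = next(msg.iter_attachments())
print(attachment["Content-Transfer-Encoding"])  # base64
assert attachment.get_content() == payload      # decoding restores the bytes
```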

Basic Authentication Headers

HTTP Basic Auth encodes credentials as username:password in Base64 and transmits them in the Authorization header: Authorization: Basic dXNlcjpwYXNz. Critically, this is encoding, not encryption. The credentials are trivially recoverable by anyone who sees the header. Basic Auth should only ever be used over HTTPS, and even then it is generally inferior to token-based authentication.
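The header value above can be reproduced, and just as easily reversed, in a few lines of Python. The function name basic_auth_header is hypothetical.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header = basic_auth_header("user", "pass")
print(header)  # Basic dXNlcjpwYXNz

# Anyone who sees the header can reverse it without a key
token = header.split(" ", 1)[1]
print(base64.b64decode(token).decode("utf-8"))  # user:pass
```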

JWT Tokens

As mentioned, JWTs use Base64URL to encode their header and payload segments. This allows the JSON claims inside a JWT to be read by any client without a decryption key, because the payload is encoded, not encrypted. The signature segment provides integrity, not confidentiality.
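To illustrate how readable a JWT payload is, the sketch below builds a demo token and reads its claims back using nothing but the standard library. All helper names and claim values are hypothetical, the signature is a placeholder string, and nothing here verifies a signature, so it must not be used to make trust decisions.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    # Restore the padding that Base64URL producers usually strip
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def read_claims(token: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))

# Build a demo token with a placeholder in the signature position
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "1234", "admin": False}).encode())
demo_token = f"{header}.{payload}.placeholder"

print(read_claims(demo_token))  # {'sub': '1234', 'admin': False}
```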

Storing Binary Data in JSON

JSON has no native binary type. When an API needs to include binary data — a thumbnail image, a cryptographic nonce, a file hash — in a JSON response, Base64 encoding the bytes to a string is the standard approach. The alternative, an array of byte integers, is verbose and harder to work with in most languages.
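A round trip through JSON might look like the following sketch. The field names and the sample bytes are placeholders, not a real image or a real API schema.

```python
import base64
import json

# Placeholder bytes, not a real image
thumbnail = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

# Sender: encode the bytes to an ASCII string before serializing
body = json.dumps({
    "name": "thumb.png",  # hypothetical field names
    "data": base64.b64encode(thumbnail).decode("ascii"),
})

# Receiver: parse the JSON, then decode the string back to bytes
received = json.loads(body)
restored = base64.b64decode(received["data"])

assert restored == thumbnail
print(body)
```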

Encoding in JavaScript

btoa and atob: The Browser APIs

Browsers provide btoa() (binary to ASCII, i.e., encode to Base64) and atob() (ASCII to binary, i.e., decode from Base64). They are simple and require no imports:

btoa("Hello, World!")  // "SGVsbG8sIFdvcmxkIQ=="
atob("SGVsbG8sIFdvcmxkIQ==")  // "Hello, World!"

The critical limitation: btoa() only handles strings where every character is in the Latin-1 range (code points 0–255). Pass a string containing any character above U+00FF (an emoji, a Chinese character, an em dash) and you get an InvalidCharacterError. The fix is to first encode the string to UTF-8 bytes before Base64-encoding:

// Encode UTF-8 string to Base64
function toBase64(str) {
  return btoa(
    encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, (_, p1) =>
      String.fromCharCode(parseInt(p1, 16))
    )
  );
}

// Decode Base64 back to UTF-8 string
function fromBase64(b64) {
  return decodeURIComponent(
    atob(b64)
      .split("")
      .map(c => "%" + c.charCodeAt(0).toString(16).padStart(2, "0"))
      .join("")
  );
}

In modern browsers and Node.js 16+, a cleaner alternative is available using TextEncoder and Uint8Array, or simply using the Node.js Buffer API when running server-side.

Node.js: Buffer API

In Node.js, the Buffer class handles Base64 encoding cleanly and correctly handles Unicode:

// Encode string to Base64
const encoded = Buffer.from("Hello, 世界").toString("base64");
// "SGVsbG8sIOS4lueVjA=="

// Decode Base64 to string
const decoded = Buffer.from("SGVsbG8sIOS4lueVjA==", "base64").toString("utf8");
// "Hello, 世界"

// Base64URL variant: "-" and "_" replace "+" and "/", and padding is omitted
const raw = Buffer.from([0xfb, 0xff]);
raw.toString("base64");     // "+/8="
raw.toString("base64url");  // "-_8"

Encoding in Python

Python's standard library includes the base64 module. The key functions are b64encode() and b64decode(), which operate on bytes objects. For URL-safe encoding, use urlsafe_b64encode() and urlsafe_b64decode():

import base64

# Encode
data = "Hello, World!".encode("utf-8")
encoded = base64.b64encode(data)
print(encoded)  # b'SGVsbG8sIFdvcmxkIQ=='

# Decode
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # Hello, World!

# URL-safe variant
url_safe = base64.urlsafe_b64encode(data)
print(url_safe)  # b'SGVsbG8sIFdvcmxkIQ=='  (same here, no + or / in this input)

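# Restore stripped padding before decoding (common with Base64URL input)
unpadded = b"SGVsbG8sIFdvcmxkIQ"
padded = unpadded + b"=" * (-len(unpadded) % 4)
print(base64.b64decode(padded))  # b'Hello, World!'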

A common pitfall in Python: b64decode() will raise a binascii.Error if the input has incorrect padding. When decoding Base64URL strings that have omitted padding, add the padding back before decoding: s += "=" * (-len(s) % 4).

The 33% Size Overhead — When It Matters

Every 3 bytes of input becomes 4 bytes of output. That is a 33% size increase. For small payloads this is negligible, but at scale it compounds significantly:

A 1 MB image embedded as a Base64 data URI becomes approximately 1.33 MB. If you serve this on a page to 100,000 users per day, you've added roughly 33 GB of bandwidth per day purely from the encoding overhead.
API endpoints that accept or return large binary payloads as Base64 JSON fields significantly increase both payload size and serialization/deserialization time compared to raw binary endpoints using multipart/form-data or application/octet-stream.
JWT tokens grow proportionally with the payload. Keep JWT payloads small — avoid embedding large claim sets or redundant data.

For large binary transfers, prefer sending binary directly with the correct MIME type rather than Base64-encoding it inside JSON. Reserve Base64 for contexts where binary is genuinely not supported: JSON fields, HTTP headers, CSS data URIs, and text-only storage systems.
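The overhead is easy to compute ahead of time. The sketch below uses a hypothetical helper, base64_length, and checks it against the standard library for a few input sizes. It covers padded output without line breaks. Formats that wrap lines, such as MIME, add slightly more.

```python
import base64

def base64_length(byte_count: int) -> int:
    """Length of padded Base64 output for byte_count input bytes."""
    return 4 * ((byte_count + 2) // 3)  # 4 characters per started 3-byte group

for size in (1, 1_000, 1_000_000):
    encoded = base64_length(size)
    assert encoded == len(base64.b64encode(bytes(size)))
    print(f"{size:>9} bytes -> {encoded:>9} chars ({encoded / size:.2f}x)")
```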

Common Mistakes

Double-Encoding

This is the most frequent Base64 bug. You encode a value, store it, then encode it again when reading it out because you forgot it was already encoded. The result is valid Base64 that decodes to another valid Base64 string instead of your original data. Debugging this is confusing because the encoded string looks perfectly valid at every stage. The fix is to establish clear conventions: encode once at the boundary where binary meets text, decode once at the boundary where text meets binary.
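The symptom is easy to reproduce. In this small Python sketch, a single decode of the double-encoded value yields more Base64 text instead of the original bytes.

```python
import base64

original = b"hello"

once = base64.b64encode(original)   # b'aGVsbG8='
twice = base64.b64encode(once)      # still valid Base64, but of the wrong thing

# One decode returns Base64 text rather than the original bytes
assert base64.b64decode(twice) == once
assert base64.b64decode(twice) != original

# Only a second decode recovers the data
assert base64.b64decode(base64.b64decode(twice)) == original
```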

Wrong Variant: Standard vs URL-Safe

If you decode a Base64URL string using a standard Base64 decoder, it will fail or produce garbage whenever the string contains - or _ (which are not in the standard alphabet). Always know which variant you are dealing with. JWTs are always Base64URL. Email (MIME) attachments use standard Base64. When in doubt, check the RFC or the spec of the system producing the encoded data.
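Python shows both failure modes. By default, base64.b64decode discards characters outside the standard alphabet, so the result is silently wrong. With validate=True it raises an error instead. A minimal sketch, with input bytes chosen only to produce - and _ in the output:

```python
import base64
import binascii

raw = bytes([0xFB, 0xFF, 0xFE])
url_safe = base64.urlsafe_b64encode(raw)  # b'-__-'

# Default mode discards characters outside the standard alphabet,
# so the result is wrong but no exception is raised
lenient = base64.b64decode(url_safe)
print(lenient == raw)  # False

# validate=True rejects the foreign characters instead
try:
    base64.b64decode(url_safe, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
print(strict_rejected)  # True

# The matching decoder round-trips correctly
assert base64.urlsafe_b64decode(url_safe) == raw
```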

Missing Padding on Decode

Many Base64URL implementations strip padding, while many standard Base64 decoders require it. If you receive an unpadded Base64URL string and try to decode it with a standard library, you may get an error even though the data is valid. The solution is to re-add padding before decoding: a padded Base64 string's length must be a multiple of 4, so append = characters until it is. A length that is 1 more than a multiple of 4 can never be valid, so no amount of padding will fix it.
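A small helper can restore the padding and reject lengths that cannot be valid. This is a sketch under the assumption that the input is otherwise well-formed; restore_padding is a hypothetical name.

```python
import base64

def restore_padding(text: str) -> str:
    """Append '=' until the length is a multiple of 4 (hypothetical helper)."""
    if len(text) % 4 == 1:
        # No whole number of bytes encodes to 4n + 1 characters
        raise ValueError("invalid Base64 length")
    return text + "=" * (-len(text) % 4)

for unpadded in ("TQ", "TWE", "TWFu"):
    padded = restore_padding(unpadded)
    print(unpadded, "->", padded, "->", base64.urlsafe_b64decode(padded))
```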

Security: Base64 Is Not Encryption

This bears repeating because the confusion keeps causing real security incidents. Base64 is an encoding scheme, not a cryptographic primitive. It provides zero confidentiality. Anyone who sees a Base64-encoded string can decode it to the original bytes in milliseconds with freely available tools. There is no key, no secret, no protection of any kind.

Treating Base64 as obfuscation is security theater. Credentials, passwords, API keys, and sensitive data stored or transmitted as Base64 are effectively in plaintext. The correct tools for confidentiality are encryption algorithms: AES for symmetric encryption, RSA or an elliptic-curve scheme such as ECIES for asymmetric encryption. (ECDSA is a signature algorithm and provides no confidentiality.) Base64 is only ever an encoding layer on top of already-protected data, not the protection itself.

Use Tanvrit's Base64 encoder/decoder to quickly encode or decode strings right in your browser, with no data ever leaving your machine.
