FORMATFORGE // KNOWLEDGE_BASE

NFC vs NFD: Unicode Normalization Explained

Updated: April 2026

Quick Answer: NFC vs NFD

Use NFC for storage, APIs, HTML, search indexing, and most production text handling. Use NFD only when you intentionally need decomposed characters for analysis, accent stripping, or compatibility with systems that store decomposed filenames. If you are unsure, choose NFC.

What Unicode Normalization Is

Unicode normalization converts equivalent strings into a consistent form so visually identical text behaves consistently in code, search, storage, and comparison. Without normalization, two strings that look the same on screen can fail equality checks because they are stored as different code-point sequences.

The most common comparison is NFC vs NFD. NFC keeps characters in composed form where possible. NFD decomposes them into base characters plus combining marks. Both are valid Unicode. They are not interchangeable unless you normalize them first.

Composed vs Decomposed: Visual Code Point Comparison

The character "e with acute accent" can be stored two ways in Unicode. Both render identically on screen, but they are different byte sequences.

| Form | Display | Code Points | Bytes (UTF-8) |
| --- | --- | --- | --- |
| NFC (composed) | é | U+00E9 (1 code point) | C3 A9 (2 bytes) |
| NFD (decomposed) | é | U+0065 U+0301 (2 code points) | 65 CC 81 (3 bytes) |

More complex characters have larger differences. The Korean syllable "ga" in NFC is a single code point (U+AC00), while in NFD it decomposes to two code points (U+1100 U+1161). Characters with multiple combining marks, like "o with diaeresis and macron" (U+022B), decompose to three code points or more.
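
To check code-point and byte counts like these yourself, here is a minimal sketch using Python's standard unicodedata module. The helper name describe is illustrative only and is not part of any library.

```python
import unicodedata


def describe(text, form):
    """Return (list of code points, UTF-8 byte count) for text in the given form."""
    normalized = unicodedata.normalize(form, text)
    code_points = [f"U+{ord(ch):04X}" for ch in normalized]
    return code_points, len(normalized.encode("utf-8"))


# Print both forms for "e with acute" and the Hangul syllable "ga".
for label, sample in [("e with acute", "\u00e9"), ("Hangul ga", "\uac00")]:
    for form in ("NFC", "NFD"):
        print(label, form, *describe(sample, form))
```

Passing any other string to describe shows how far its NFC and NFD forms diverge.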

NFC vs NFD Side by Side

| Property | NFC | NFD |
| --- | --- | --- |
| How é is stored | Single composed code point | Base letter + combining accent |
| Best default for web apps | Yes | No |
| Best for storage and indexing | Yes | Usually no |
| Best for decomposition work | No | Yes |
| String equality without normalization | NFC != NFD even when the text looks identical | NFD != NFC even when the text looks identical |

Comparison summary: NFC is the default form for storage, indexing, APIs, and most web application text. NFD is mainly useful when you intentionally need decomposed characters for analysis or accent stripping.

NFKC and NFKD: Compatibility Normalization

Beyond NFC and NFD, Unicode defines two compatibility forms: NFKC and NFKD. These replace compatibility characters with their compatibility equivalents (for example, a ligature with its component letters) in addition to composing or decomposing.

| Form | Composition | Compatibility Mapping | Use Case |
| --- | --- | --- | --- |
| NFC | Composed | No | General storage, APIs, web content |
| NFD | Decomposed | No | Accent stripping, character analysis |
| NFKC | Composed | Yes | Search indexing, username comparison, identifier matching |
| NFKD | Decomposed | Yes | Full decomposition for analysis |

Compatibility normalization folds characters that are semantically equivalent but visually distinct. Examples include ligatures, fullwidth forms, and superscripts.

Use NFKC when you need maximum folding for search or identity comparison. Do not use it for display or storage where the visual distinction matters, because the normalization is lossy and cannot be reversed.
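
The folding and its lossiness can be seen in a short sketch. The three sample characters (a ligature, a fullwidth letter, and a superscript digit) are illustrative choices, not an exhaustive list.

```python
import unicodedata

# Illustrative compatibility characters; NFC leaves each one untouched.
samples = {
    "ligature fi": "\ufb01",      # single-code-point "fi" ligature
    "fullwidth A": "\uff21",      # fullwidth Latin capital A
    "superscript two": "\u00b2",  # superscript digit two
}

for label, ch in samples.items():
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"{label}: NFC keeps {nfc!r}, NFKC folds to {nfkc!r}")
```

Once a value has been folded to plain "fi", "A", or "2", nothing in the string records which original character it came from, which is why NFKC output is better kept in a separate search or comparison key than written over the stored text.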

When to Use NFC

When to Use NFD

Normalization in Databases

Most databases store Unicode text as raw bytes and do not normalize on write. This means your application is responsible for normalization.

| Database | Behavior | Collation Notes |
| --- | --- | --- |
| PostgreSQL | Stores bytes as-is. ICU-based collations are available since v10; nondeterministic ICU collations (since v12) can treat canonically equivalent strings as equal, but stored data is not normalized. | Use CREATE COLLATION with ICU (and deterministic = false) for accent-insensitive searches. Normalize to NFC in your application before writes. |
| MySQL | Stores raw bytes with utf8mb4. Collations handle comparison but not storage normalization. | utf8mb4_unicode_ci treats NFC and NFD equivalents as equal in comparisons. UNIQUE indexes follow the column collation as well, so whether equivalents collide depends on the collation, while the stored bytes stay unnormalized. |
| SQLite | No built-in normalization. Binary comparison by default. | Load the ICU extension for Unicode-aware collation, or normalize in your application. |
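
As a rough illustration of why the application has to normalize before writing, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name tags and the helper to_nfc are hypothetical, chosen only for this example.

```python
import sqlite3
import unicodedata


def to_nfc(value):
    """Normalize a string to NFC before it reaches the database."""
    return unicodedata.normalize("NFC", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT UNIQUE)")

composed = "caf\u00e9"     # NFC spelling
decomposed = "cafe\u0301"  # NFD spelling of the same visible text

# Without normalization both rows are accepted, because the bytes differ.
conn.execute("INSERT INTO tags VALUES (?)", (composed,))
conn.execute("INSERT INTO tags VALUES (?)", (decomposed,))
raw_count = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]

# Normalizing before the write lets the UNIQUE constraint catch the duplicate.
conn.execute("DELETE FROM tags")
conn.execute("INSERT INTO tags VALUES (?)", (to_nfc(composed),))
try:
    conn.execute("INSERT INTO tags VALUES (?)", (to_nfc(decomposed),))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(raw_count, duplicate_rejected)
```

The same pattern (normalize in one helper, call it on every write and every lookup value) should carry over to other databases, though their collation behavior differs as described in the table.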

Cross-Platform Normalization Differences

Operating systems handle filename normalization differently, which causes bugs in cross-platform applications and file-syncing tools.

| Platform | Filesystem Behavior | Impact |
| --- | --- | --- |
| macOS (HFS+/APFS) | HFS+ converts filenames to a variant of NFD on disk. APFS preserves the form it was given but treats normalization-equivalent names as the same file. | A file created as caf\u00e9.txt and one created as cafe\u0301.txt refer to the same file. Directory listings can return the decomposed form. |
| Windows (NTFS) | Preserves the original sequence. No normalization is applied. | NFC and NFD filenames can coexist as separate files. Most Windows applications produce NFC text, so this rarely causes issues on Windows alone. |
| Linux (ext4, XFS, Btrfs) | Filenames are opaque byte sequences. No normalization, no case folding. | Two filenames that look identical but differ in normalization form are treated as entirely separate files. This causes sync conflicts with macOS. |

If your application handles files across platforms, normalize filenames to NFC before storing paths in a database or performing lookups.
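
One way to make lookups tolerant of either form is to compare NFC forms while keeping the on-disk spelling for the actual file access. The sketch below works on a plain list of names; in real code the list would typically come from os.listdir. The function name find_by_name is hypothetical.

```python
import unicodedata


def find_by_name(wanted, listing):
    """Return the entry in listing whose NFC form matches wanted, or None."""
    target = unicodedata.normalize("NFC", wanted)
    for entry in listing:
        if unicodedata.normalize("NFC", entry) == target:
            return entry  # keep the on-disk spelling so the file can be opened
    return None


# A listing as it might come back from a filesystem that stores decomposed names.
listing = ["cafe\u0301.txt", "notes.txt"]
match = find_by_name("caf\u00e9.txt", listing)
print(repr(match))
```

Returning the original entry rather than the normalized one matters on filesystems that treat names as raw bytes, where only the exact stored spelling will open the file.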

W3C and WHATWG Recommendations

The W3C Character Model specification recommends NFC for all web content. WHATWG specifications for HTML and the URL standard also assume NFC.

When Not to Mix Forms

Do not keep mixed NFC and NFD strings in the same database, index, or comparison workflow. That creates subtle bugs: duplicate-looking values, failed lookups, incorrect length checks, and inconsistent slugs. Normalize at input boundaries and normalize again before equality-sensitive comparisons if necessary.
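
A minimal sketch of "normalize at the boundary, and again before comparing", assuming Python 3.8 or later for unicodedata.is_normalized. The function names ensure_nfc and same_text are illustrative.

```python
import unicodedata


def ensure_nfc(value):
    """Return value in NFC, skipping the rewrite when it is already normalized."""
    if unicodedata.is_normalized("NFC", value):
        return value
    return unicodedata.normalize("NFC", value)


def same_text(a, b):
    """Equality check that holds even if one side bypassed the input boundary."""
    return ensure_nfc(a) == ensure_nfc(b)


print(same_text("caf\u00e9", "cafe\u0301"))
```

Calling ensure_nfc wherever text enters the system (form input, API payloads, imported files) keeps stored data in one form; same_text is a fallback for comparisons against data whose origin is not known.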

Common Bugs Caused by Mixed Forms

| Bug | What causes it | Fix |
| --- | --- | --- |
| Database lookup fails for accented text | Stored values and query values use different forms | Normalize both sides to NFC |
| Regex behaves strangely on accented words | Combining marks are separate code points in NFD | Normalize first or use Unicode-aware regex |
| Character counts look wrong | NFD uses more code points than NFC for the same visible characters | Normalize to NFC before counting, then inspect with the Text Analysis Tool |
| Slug output changes between systems | Different normalization forms reach the slug step | Normalize to NFC before slug creation |

Bug summary: Mixed normalization forms most often break lookup consistency, regex behavior on accented text, character counts, and slug output across systems.
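
Two of these bugs, character counts and regex matching, can be reproduced in a few lines. This is a sketch using Python's built-in re module, whose \w does not match combining marks; other regex engines may behave differently.

```python
import re
import unicodedata

nfc_word = "caf\u00e9"
nfd_word = unicodedata.normalize("NFD", nfc_word)

# Code-point counts differ even though both strings render as four characters.
lengths = (len(nfc_word), len(nfd_word))

# \w+ covers the whole NFC word but cannot cross the combining mark in NFD.
word = re.compile(r"\w+")
nfc_match = word.fullmatch(nfc_word) is not None
nfd_match = word.fullmatch(nfd_word) is not None

print(lengths, nfc_match, nfd_match)
```

Normalizing to NFC before counting or matching makes both strings behave the same way.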

Code Examples

JavaScript

const composed = "\u00E9";
const decomposed = "e\u0301";

console.log(composed === decomposed); // false
console.log(decomposed.normalize("NFC") === composed); // true

function normalizeForStorage(value) {
  return value.normalize("NFC").trim();
}

// NFKC for search indexing (folds ligatures, fullwidth, etc.)
function normalizeForSearch(value) {
  return value.normalize("NFKC").toLowerCase().trim();
}

Python

import unicodedata

composed = "\u00E9"
decomposed = "e\u0301"

print(composed == decomposed)  # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True


def normalize_for_storage(value):
    return unicodedata.normalize("NFC", value).strip()


# NFKC for search indexing
def normalize_for_search(value):
    return unicodedata.normalize("NFKC", value).casefold().strip()


# Accent stripping via NFD
def strip_accents(value):
    nfd = unicodedata.normalize("NFD", value)
    return ''.join(c for c in nfd if unicodedata.category(c) != 'Mn')

FAQ

Should I store text as NFC?

Yes, in almost all web, API, database, and search cases. NFC is the safe default for production text handling.

Does JavaScript normalize strings automatically?

No. You must call .normalize("NFC") or another normalization form explicitly.

When is NFD useful?

NFD is useful when you intentionally need decomposed characters, such as accent stripping or combining-mark inspection.

Can this affect slugs and search?

Yes. Mixed normalization forms can produce inconsistent slugs and reduce matching consistency in search and indexes.

What is the difference between NFC and NFKC?

NFC composes characters but preserves compatibility distinctions (ligatures, fullwidth forms, superscripts). NFKC applies compatibility decomposition first, folding those distinctions away. Use NFC for storage, NFKC for search indexing or identifier comparison.

Why do files from macOS cause normalization issues on Linux?

macOS has traditionally stored filenames in a variant of NFD, while Linux treats filenames as raw byte sequences. A file created on macOS with an accented name can arrive on Linux in NFD form. If your Linux code expects NFC, the filename will not match. Normalize filenames to NFC when reading from cross-platform sources.
