FORMATFORGE // KNOWLEDGE_BASE

Regex Basics: Regular Expressions for Beginners

Runs locally in your browser Updated: April 2026 No data upload required

Regex Basics for Real Work

Regular expressions are not useful because they look clever. They are useful because they help you find, extract, validate, or clean text patterns quickly. The trick is to keep the pattern as small and explicit as possible for the job you actually need to do.

Three Core Building Blocks

Concept Example What it does
Character class [a-z] Matches one lowercase letter
Quantifier \d{4} Matches exactly four digits
Anchor ^error Matches "error" only at the start of the line

Character Classes in Detail

Character classes define which characters are allowed at a given position. Regex engines provide predefined shorthand classes for the most common sets, and you can build custom ranges for anything else.

Predefined Character Classes

Shorthand Equivalent Matches
\d [0-9] Any digit
\D [^0-9] Any non-digit
\w [a-zA-Z0-9_] Word character (letter, digit, underscore)
\W [^a-zA-Z0-9_] Non-word character
\s [ \t\n\r\f\v] Any whitespace character
\S [^ \t\n\r\f\v] Any non-whitespace character
. Almost everything Any character except newline (unless dotall mode is on)

Custom Ranges

Square brackets let you define exactly which characters to allow. A caret inside the brackets negates the set.

Quantifier Cheat Sheet

Quantifiers control how many times the preceding element must appear. By default they are greedy, meaning they match as much text as possible. Append ? to make any quantifier lazy (match as little as possible).

Quantifier Meaning Example Matches
* Zero or more go*d "gd", "god", "good", "goood"
+ One or more go+d "god", "good", "goood" (not "gd")
? Zero or one colou?r "color", "colour"
{n} Exactly n \d{4} Exactly four digits like "2024"
{n,} n or more \w{3,} Words with three or more characters
{n,m} Between n and m [a-z]{2,5} Two to five lowercase letters

Grouping and Capturing

Parentheses serve two purposes: they group elements for quantifiers, and they capture the matched text so you can reference it later (in replacements or in code).

Capturing Groups

// Pattern: (\d{4})-(\d{2})-(\d{2})
// Input:   2024-03-15
// Group 1: 2024
// Group 2: 03
// Group 3: 15

Each pair of parentheses creates a numbered group. Group 0 is always the entire match. In a replacement string, you reference groups with $1, $2, etc. (or \1, \2 depending on the engine).

Non-Capturing Groups

When you need grouping for alternation or quantifiers but do not need the captured value, use (?:...). This avoids polluting your group numbering and is slightly faster.

// Capturing:     (https?|ftp)://
// Non-capturing:  (?:https?|ftp)://

Lookahead and Lookbehind

Lookarounds assert that something exists before or after the current position without including it in the match. They are zero-width: they check a condition but consume no characters.

Type Syntax Meaning
Positive lookahead X(?=Y) Match X only if followed by Y
Negative lookahead X(?!Y) Match X only if NOT followed by Y
Positive lookbehind (?<=Y)X Match X only if preceded by Y
Negative lookbehind (?<!Y)X Match X only if NOT preceded by Y

A practical example: \d+(?= USD) matches a number only when it appears before " USD", so in "Price: 150 USD" it matches "150" but not in "150 items".

Useful Beginner Patterns

How to Practice Safely

Test the pattern against a tiny sample first, then expand to more realistic input. Use the Regex Tester Online to see what matches and how the pattern behaves before you run it on full logs, exports, or copied content.

Practical Exercises

Try these challenges in the Regex Tester. Solutions are hidden below each one.

Challenge 1: Extract Prices

Given the text "Items cost $12.99, $3.50, and $149.00", write a pattern that captures all dollar amounts including the decimal portion.

Solution
\$\d+\.\d{2}

The escaped \$ matches a literal dollar sign. \d+ matches one or more digits before the decimal, and \.\d{2} matches the period plus exactly two decimal digits.

Challenge 2: Validate a Username

Match usernames that are 3 to 16 characters long and contain only letters, digits, and underscores. The first character must be a letter.

Solution
^[a-zA-Z]\w{2,15}$

The anchor ^ ensures we start at the beginning. [a-zA-Z] forces a letter first. \w{2,15} allows 2 to 15 more word characters, giving a total length of 3 to 16. The $ anchor prevents trailing characters.

Challenge 3: Find Lines Without a Keyword

Match entire lines that do NOT contain the word "error" (case-insensitive).

Solution
^(?!.*error).*$

The negative lookahead (?!.*error) at the start of the line asserts that "error" does not appear anywhere on the line. If the assertion passes, .*$ matches the full line. Use the i flag for case insensitivity.

Challenge 4: Swap First and Last Name

Given "Doe, Jane", rearrange to "Jane Doe" using a replacement pattern.

Solution
// Pattern: (\w+),\s*(\w+)
// Replacement: $2 $1

Group 1 captures the last name, group 2 captures the first name. The replacement reverses the order and drops the comma.

Common Beginner Mistakes

Frequently Asked Questions

What is the difference between greedy and lazy matching?

A greedy quantifier like .* consumes as much text as possible while still allowing the overall pattern to match. A lazy quantifier like .*? consumes as little as possible. For example, given <b>one</b> and <b>two</b>, the greedy pattern <b>.*</b> matches everything from the first <b> to the last </b>, while the lazy <b>.*?</b> matches each tag pair individually.

When should I use non-capturing groups?

Use (?:...) when you need to group for alternation or repetition but do not need the matched text for back-references or replacements. This keeps your group numbering clean and avoids a minor performance cost in engines that store captured content.

Are lookaheads supported in all regex engines?

Positive and negative lookaheads are widely supported in JavaScript, Python, Java, .NET, and most modern engines. Lookbehinds have broader support now but some engines (older JavaScript versions before ES2018) do not support them. Always test in your target environment.

How do I match a literal special character like a dot or bracket?

Precede the character with a backslash: \. matches a literal period, \[ matches a literal bracket. Inside a character class, most special characters lose their meaning, so [.] also matches a literal dot.

Can I use regex to parse full HTML or XML documents?

No. Regex cannot handle nested, recursive structures reliably. For quick extraction of a single tag or attribute in known-clean markup, a simple pattern works. For anything more complex, use a proper parser. The HTML Cleaner handles stripping and sanitization without regex.

Next Step

Once you understand the basics, move to the regex debugging guide for greedy matches, multiline input, escaping problems, and performance issues. For a broader theory overview, see the regular expressions explained guide.

Related Tools

Related Guides