Regex Basics: Regular Expressions for Beginners

Regex Basics for Real Work

Regular expressions are not useful because they look clever. They are useful because they help you find, extract, validate, or clean text patterns quickly. The trick is to keep the pattern as small and explicit as possible for the job you actually need to do.

Three Core Building Blocks

Concept	Example	What it does
Character class	`[a-z]`	Matches one lowercase letter
Quantifier	`\d{4}`	Matches exactly four digits
Anchor	`^error`	Matches "error" only at the start of the line

Character Classes in Detail

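Character classes define which characters are allowed at a given position. Regex engines provide predefined shorthand classes for the most common sets, and you can build custom ranges for anything else. The equivalents listed below are the ASCII definitions; Unicode-aware engines (Python 3, for example) also let \d, \w, and \s match non-ASCII digits, letters, and whitespace.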

Predefined Character Classes

Shorthand	Equivalent	Matches
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Word character (letter, digit, underscore)
`\W`	`[^a-zA-Z0-9_]`	Non-word character
`\s`	`[ \t\n\r\f\v]`	Any whitespace character
`\S`	`[^ \t\n\r\f\v]`	Any non-whitespace character
`.`	Almost everything	Any character except newline (unless dotall mode is on)
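
As a quick check of the shorthand classes, here is a minimal sketch using Python's standard `re` module. The sample string and variable names are illustrative assumptions, not part of the original guide. It also shows that Python treats `\d` as Unicode-aware unless the `re.ASCII` flag is set.

```python
import re

sample = "id_42\tok"  # hypothetical input: word chars, digits, one tab

digits = re.findall(r"\d", sample)       # single digits
word_runs = re.findall(r"\w+", sample)   # runs of letters, digits, underscores
whitespace = re.findall(r"\s", sample)   # whitespace characters

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit, but not in [0-9].
arabic_three = "\u0663"
unicode_digit = re.fullmatch(r"\d", arabic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_three, flags=re.ASCII) is not None

print(digits, word_runs, whitespace, unicode_digit, ascii_digit)
```

If you need strict ASCII behavior in Python, either pass `re.ASCII` or spell the class out as `[0-9]`.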

Custom Ranges

Square brackets let you define exactly which characters to allow. A caret inside the brackets negates the set.

[aeiou] — matches any lowercase vowel
[A-Fa-f0-9] — matches a hexadecimal digit
[^0-9] — matches anything that is not a digit
[a-zA-Z] — matches any ASCII letter regardless of case
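
The custom ranges above can be tried out with a short sketch like this one. It assumes Python's `re` module; the input strings are made up for illustration.

```python
import re

# [aeiou]: every lowercase vowel in the word, in order of appearance.
vowels = re.findall(r"[aeiou]", "audio")

# [A-Fa-f0-9]: six hexadecimal digits, either case.
is_hex = re.fullmatch(r"[A-Fa-f0-9]{6}", "1A2b3C") is not None
not_hex = re.fullmatch(r"[A-Fa-f0-9]{6}", "1A2b3G") is not None

# [^0-9]: the caret negates the set, so this strips everything but digits.
digits_only = re.sub(r"[^0-9]", "", "tel: 555-0199")

print(vowels, is_hex, not_hex, digits_only)
```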

Quantifier Cheat Sheet

Quantifiers control how many times the preceding element must appear. By default they are greedy, meaning they match as much text as possible. Append ? to make any quantifier lazy (match as little as possible).

Quantifier	Meaning	Example	Matches
`*`	Zero or more	`go*d`	"gd", "god", "good", "goood"
`+`	One or more	`go+d`	"god", "good", "goood" (not "gd")
`?`	Zero or one	`colou?r`	"color", "colour"
`{n}`	Exactly n	`\d{4}`	Exactly four digits like "2024"
`{n,}`	n or more	`\w{3,}`	Words with three or more characters
`{n,m}`	Between n and m	`[a-z]{2,5}`	Two to five lowercase letters
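
To see the rows of the cheat sheet in action, the following sketch (Python's `re` module assumed, word lists invented for the example) filters candidate strings with `re.fullmatch`, which requires the whole string to match.

```python
import re

candidates = ["gd", "god", "good", "goood"]

# * allows zero o's, + requires at least one.
star = [w for w in candidates if re.fullmatch(r"go*d", w)]
plus = [w for w in candidates if re.fullmatch(r"go+d", w)]

# ? makes the preceding "u" optional, but allows at most one.
spellings = ["color", "colour", "colouur"]
optional_u = [w for w in spellings if re.fullmatch(r"colou?r", w)]

# {2,5} with word boundaries: whole words of two to five lowercase letters.
bounded = re.findall(r"\b[a-z]{2,5}\b", "a to hello lengthy")

print(star, plus, optional_u, bounded)
```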

Grouping and Capturing

Parentheses serve two purposes: they group elements for quantifiers, and they capture the matched text so you can reference it later (in replacements or in code).

Capturing Groups

// Pattern: (\d{4})-(\d{2})-(\d{2})
// Input:   2024-03-15
// Group 1: 2024
// Group 2: 03
// Group 3: 15

Each pair of parentheses creates a numbered group. Group 0 is always the entire match. In a replacement string, you reference groups with $1, $2, etc. (or \1, \2 depending on the engine).
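
Here is how the date example might look in code, as a sketch that assumes Python's `re` module. Note that Python writes backreferences in replacements as `\1` (or `\g<1>`), not `$1`. The surrounding sentence text is an invented sample.

```python
import re

DATE = r"(\d{4})-(\d{2})-(\d{2})"

match = re.search(DATE, "Released 2024-03-15")
whole = match.group(0)             # group 0 is the entire match
year, month, day = match.groups()  # groups 1..3 as a tuple

# Reorder the captured parts in a replacement string.
reordered = re.sub(DATE, r"\3/\2/\1", "2024-03-15")

print(whole, year, month, day, reordered)
```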

Non-Capturing Groups

When you need grouping for alternation or quantifiers but do not need the captured value, use (?:...). This avoids polluting your group numbering and can be slightly faster in engines that store captured content.

// Capturing:     (https?|ftp)://
// Non-capturing:  (?:https?|ftp)://
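
The effect on group numbering is easiest to see side by side. This sketch assumes Python's `re` module; the URL and the extra `(\S+)` group for the rest of the address are additions made for the example.

```python
import re

url = "ftp://example.com"

# With a capturing scheme group, the host lands in group 2.
capturing = re.match(r"(https?|ftp)://(\S+)", url)

# With a non-capturing scheme group, the host is group 1.
non_capturing = re.match(r"(?:https?|ftp)://(\S+)", url)

print(capturing.groups())
print(non_capturing.groups())
```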

Lookahead and Lookbehind

Lookarounds assert that something exists before or after the current position without including it in the match. They are zero-width: they check a condition but consume no characters.

Type	Syntax	Meaning
Positive lookahead	`X(?=Y)`	Match X only if followed by Y
Negative lookahead	`X(?!Y)`	Match X only if NOT followed by Y
Positive lookbehind	`(?<=Y)X`	Match X only if preceded by Y
Negative lookbehind	`(?<!Y)X`	Match X only if NOT preceded by Y

A practical example: \d+(?= USD) matches a number only when it is immediately followed by " USD". In "Price: 150 USD" it matches "150"; in "150 items" it matches nothing.
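
A small sketch of lookahead and lookbehind, assuming Python's `re` module (which requires lookbehind patterns to have a fixed width). The sample sentences are invented for illustration.

```python
import re

# Positive lookahead: digits only when " USD" follows; " USD" is not consumed.
usd_amounts = re.findall(r"\d+(?= USD)", "Price: 150 USD for 150 items")

# Positive lookbehind: digits only when a dollar sign comes right before.
after_dollar = re.findall(r"(?<=\$)\d+", "Cost $40, qty 12")

# Negative lookbehind: whole numbers that do NOT follow a dollar sign.
# The \b stops the engine from matching the tail of "40".
not_after_dollar = re.findall(r"(?<!\$)\b\d+", "Cost $40, qty 12")

print(usd_amounts, after_dollar, not_after_dollar)
```

The word boundary in the last pattern matters: without it, the engine could skip the "4" and report "0" as a number not preceded by a dollar sign.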

Useful Beginner Patterns

Email-like match: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
ISO date: \d{4}-\d{2}-\d{2}
Simple HTML tag match: <[^>]+>
US phone number: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
IPv4 address (shape only, octets are not range-checked): \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
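
These patterns check the shape of the text, not whether the value is real. One possible way to combine a shape check with real validation is sketched below, assuming Python and its standard `ipaddress` and `datetime` modules. The helper names `is_real_ipv4` and `is_real_iso_date` are hypothetical.

```python
import ipaddress
import re
from datetime import date

IPV4_SHAPE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
ISO_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_real_ipv4(candidate: str) -> bool:
    """Shape check with the regex, then range check with ipaddress."""
    if IPV4_SHAPE.fullmatch(candidate) is None:
        return False
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        return False
    return True


def is_real_iso_date(candidate: str) -> bool:
    """Shape check with the regex, then calendar check with datetime."""
    if ISO_DATE_SHAPE.fullmatch(candidate) is None:
        return False
    try:
        date.fromisoformat(candidate)
    except ValueError:
        return False
    return True


# The regex alone accepts values that are not valid.
shape_accepts_bad_ip = IPV4_SHAPE.fullmatch("999.999.999.999") is not None
print(shape_accepts_bad_ip, is_real_ipv4("999.999.999.999"))
print(is_real_iso_date("2024-03-15"), is_real_iso_date("2024-13-45"))
```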

How to Practice Safely

Test the pattern against a tiny sample first, then expand to more realistic input. Use the Regex Tester Online to see what matches and how the pattern behaves before you run it on full logs, exports, or copied content.

Practical Exercises

Try these challenges in the Regex Tester. A solution follows each one.

Challenge 1: Extract Prices

Given the text "Items cost $12.99, $3.50, and $149.00", write a pattern that captures all dollar amounts including the decimal portion.

Solution

\$\d+\.\d{2}

The escaped \$ matches a literal dollar sign. \d+ matches one or more digits before the decimal, and \.\d{2} matches the period plus exactly two decimal digits.
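
Run against the challenge text, the solution could be used like this. The sketch assumes Python's `re` module; summing the amounts with `Decimal` is an extra step added for illustration.

```python
import re
from decimal import Decimal

text = "Items cost $12.99, $3.50, and $149.00"

# findall returns every non-overlapping match as a string.
prices = re.findall(r"\$\d+\.\d{2}", text)

# Drop the leading "$" and add the amounts without float rounding.
total = sum(Decimal(p[1:]) for p in prices)

print(prices, total)
```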

Challenge 2: Validate a Username

Match usernames that are 3 to 16 characters long and contain only letters, digits, and underscores. The first character must be a letter.

Solution

^[a-zA-Z]\w{2,15}$

The anchor ^ ensures we start at the beginning. [a-zA-Z] forces a letter first. \w{2,15} allows 2 to 15 more word characters, giving a total length of 3 to 16. The $ anchor prevents trailing characters.
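
A sketch of the username check in Python follows. Two choices here are assumptions about intent rather than part of the original solution: `re.ASCII` keeps `\w` from accepting non-ASCII letters, and `fullmatch` is used because Python's `$` also matches just before a trailing newline. The function name `is_valid_username` is hypothetical.

```python
import re

USERNAME = re.compile(r"^[a-zA-Z]\w{2,15}$", re.ASCII)


def is_valid_username(name: str) -> bool:
    # fullmatch rejects "abc\n", which search or match would accept.
    return USERNAME.fullmatch(name) is not None


checks = {
    "ab": is_valid_username("ab"),                # too short
    "abc": is_valid_username("abc"),              # minimum length
    "a" + "b" * 15: is_valid_username("a" + "b" * 15),  # 16 chars, maximum
    "a" + "b" * 16: is_valid_username("a" + "b" * 16),  # 17 chars, too long
    "1abc": is_valid_username("1abc"),            # starts with a digit
    "bad-name": is_valid_username("bad-name"),    # hyphen not allowed
}
print(checks)
```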

Challenge 3: Find Lines Without a Keyword

Match entire lines that do NOT contain the word "error" (case-insensitive).

Solution

^(?!.*error).*$

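The negative lookahead (?!.*error) at the start of the line asserts that "error" does not appear anywhere on the line. If the assertion passes, .*$ matches the full line. Use the i flag for case insensitivity, and the m (multiline) flag when the input contains several lines so that ^ and $ match at each line boundary.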
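
Applied to a multi-line string, the solution needs two flags in Python: `re.IGNORECASE` for the case-insensitive requirement and `re.MULTILINE` so that `^` and `$` work per line. The log text below is an invented sample.

```python
import re

log = "ok: started\nERROR: disk full\nwarning: retry\nfatal error"

NO_ERROR = re.compile(r"^(?!.*error).*$", re.IGNORECASE | re.MULTILINE)

# Each element is one full line that does not contain "error" in any case.
clean_lines = NO_ERROR.findall(log)

print(clean_lines)
```

For large files, iterating over lines and testing `"error" not in line.lower()` is often simpler; the regex form is mainly useful where only a pattern can be supplied, such as an editor's search box.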

Challenge 4: Swap First and Last Name

Given "Doe, Jane", rearrange to "Jane Doe" using a replacement pattern.

Solution

// Pattern: (\w+),\s*(\w+)
// Replacement: $2 $1

Group 1 captures the last name, group 2 captures the first name. The replacement reverses the order and drops the comma.
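
In Python the same swap looks like this, as a sketch using the standard `re` module. The replacement is written `\2 \1` because Python does not use the `$1` syntax. The second input string is an invented example showing that every occurrence is rewritten.

```python
import re

PATTERN = r"(\w+),\s*(\w+)"

swapped = re.sub(PATTERN, r"\2 \1", "Doe, Jane")
swapped_many = re.sub(PATTERN, r"\2 \1", "Doe, Jane; Smith, John")

print(swapped)
print(swapped_many)
```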

Common Beginner Mistakes

Using .* when a tighter class would be safer
Forgetting anchors and matching more than expected
Escaping too little or too much
Testing only on perfect input instead of real-world messy text
Using greedy quantifiers when lazy ones would prevent over-matching
Forgetting that . does not match newlines by default
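
Two of these mistakes are easy to reproduce. The sketch below assumes Python's `re` module and uses made-up inputs: the first part shows the dot stopping at a newline unless `re.DOTALL` is set, the second shows an unanchored pattern accepting input that is too long.

```python
import re

# Mistake: expecting . to cross a line break.
two_lines = "BEGIN first\nsecond END"
without_dotall = re.search(r"BEGIN.*END", two_lines)
with_dotall = re.search(r"BEGIN.*END", two_lines, re.DOTALL)

# Mistake: forgetting anchors when validating a four-digit code.
unanchored_ok = re.search(r"\d{4}", "123456") is not None
anchored_ok = re.fullmatch(r"\d{4}", "123456") is not None

print(without_dotall is None, with_dotall is not None)
print(unanchored_ok, anchored_ok)
```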

Frequently Asked Questions

What is the difference between greedy and lazy matching?

A greedy quantifier like .* consumes as much text as possible while still allowing the overall pattern to match. A lazy quantifier like .*? consumes as little as possible. For example, given <b>one</b> and <b>two</b>, the greedy pattern <b>.*</b> matches everything from the first <b> to the last </b>, while the lazy <b>.*?</b> matches each tag pair individually.
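
The difference is easy to observe on a short tagged string. This is a sketch assuming Python's `re` module; the `<b>` tags and sample text are chosen for illustration.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the end, then backs up to the LAST closing tag.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? stops at the FIRST closing tag it can reach.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)
print(lazy)
```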

When should I use non-capturing groups?

Use (?:...) when you need to group for alternation or repetition but do not need the matched text for back-references or replacements. This keeps your group numbering clean and avoids a minor performance cost in engines that store captured content.

Are lookaheads supported in all regex engines?

Positive and negative lookaheads are supported in JavaScript, Python, Java, .NET, and most modern engines. Lookbehinds are now widely supported too, but JavaScript only gained them in ES2018, and some engines (Python's re module, for example) require the lookbehind pattern to have a fixed length. Always test in your target environment.

How do I match a literal special character like a dot or bracket?

Precede the character with a backslash: \. matches a literal period, \[ matches a literal bracket. Inside a character class, most special characters lose their meaning, so [.] also matches a literal dot.
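
A short sketch of the escaping options, assuming Python's `re` module. `re.escape` is the standard-library helper for escaping a whole literal string; the sample strings are invented.

```python
import re

version = "v1.2"

escaped_dot = re.findall(r"\.", version)   # literal dot via backslash
class_dot = re.findall(r"[.]", version)    # literal dot inside a class
bare_dot = re.findall(r".", version)       # unescaped: any character

# re.escape builds a pattern that matches the given text literally.
literal = "a.b[1]"
matches_itself = re.fullmatch(re.escape(literal), "a.b[1]") is not None
matches_other = re.fullmatch(re.escape(literal), "axb[1]") is not None

print(escaped_dot, class_dot, bare_dot, matches_itself, matches_other)
```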

Can I use regex to parse full HTML or XML documents?

No. Regex cannot handle nested, recursive structures reliably. For quick extraction of a single tag or attribute in known-clean markup, a simple pattern works. For anything more complex, use a proper parser. The HTML Cleaner handles stripping and sanitization without regex.
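
As one illustration of the parser route, the sketch below strips tags with Python's standard `html.parser` module instead of a regex. It is a minimal example under that assumption, not a description of how the HTML Cleaner works; the class name `TextExtractor` and function `strip_tags` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores every tag, however deeply nested."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<div><p>Hello <b>nested</b> world</p></div>"))
```

This does no sanitization; it only shows that a parser handles nesting without any pattern tuning.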

Next Step

Once you understand the basics, move to the regex debugging guide for greedy matches, multiline input, escaping problems, and performance issues. For a broader theory overview, see the regular expressions explained guide.

Related Tools

Regex Tester Online for live matching and plain-language explanations
Remove Line Breaks to clean copied text before testing patterns
Remove Duplicate Lines for post-match cleanup workflows
HTML Cleaner when regex is overkill for tag stripping

Related Guides

Regex Debugging — fixing greedy matches, escaping, and performance
Regular Expressions Explained — deeper theory and advanced features
Text Cleaning — the broader workflow where regex plays a key role
Data Cleaning Best Practices — applying regex in data pipelines
