Skip to the content.

Character Parsers

In addition to the Core Parsers, ParserObjects provides a few pre-built parsers for common character and string parsing and matching tasks. Several of these methods will cache the created instances so you don’t recreate them on every call. Exceptions will be noted below. All of the specialty parsers assume char input, because all of these tasks are related to characters and strings.

All these specialty parsers can be accessed with this declaration:

using static ParserObjects.ParserMethods;

Matching Parsers

Character Matcher

The MatchChar parser matches a given character. It operates the same as Match(char) except it caches instances for characters that have previously been requested.

var parser = MatchChar('x');

This is functionally equivalent to the following, but caches the instance to reduce memory consumption if parsers are being re-requested multiple times:

var parser = Match('x');

Character In Collection

If you want to match a character which is one of a finite set of options, you can use the MatchAny parser to check if the character is in a collection.

var possibilities = new HashSet<char>() { 'x', 'y', 'z' };
var parser = MatchAny(possibilities);

HashSet<char> or another collection type optimized for fast .Contains() is preferred, but you can use any collection.

There is also a NotMatchAny which returns success if the character is not in the collection:

var forbidden = new HashSet<char>() { 'a', 'b', c' };
var parser = NotMatchAny(forbidden);

Character String Parser

The CharacterString parser matches a literal string of characters against a char input and returns the string on success.

var parser = CharacterString("abc");

This is functionally equivalent to a combination of the MatchSequence and Transform parsers:

var parser = Match("abc").Transform(x => new string(x.ToArray()));

Character Class Parsers

The Letter parser matches any uppercase or lowercase letter character. Word matches a sequence of one or more letters and returns them as a string. UpperCase matches any one uppercase character, and LowerCase matches any one lowercase character. The Symbol parser matches any non-letter, non-number symbol or punctuation character. You can convert any of those to a string like Word does by using the .ListCharToString() extension method:

var allUpperCase = UpperCase().ListCharToString();
var allLowerCase = LowerCase().ListCharToString();

Digit Parsers

ParserObjects provides parsers for parsing digits (‘0’-‘9’) and sequences of digits.

If you want to parse formatted numbers with possible negative values, decimal values and syntactic rules (no leading 0, etc), consider using one of the Programming Parser Methods instead.

A Single Digit

The Digit parser returns a single character in the range ('0'-'9'). The NonZeroDigit parser returns a single character in the range ('1'-'9'), and the HexadecimalDigit parser returns any valid hex character ('a'-'f', 'A'-'F', '0'-'9').

var parser = Digit();
var parser = NonZeroDigit();
var parser = HexadecimalDigit();

A String of Digits

The DigitString parser returns a string of consecutive digits. The HexadecimalString parser returns a string of hexadecimal digits.

var parser = DigitString();
var parser = HexadecimalString();

Line Parsers

The Line method parses the remainder of the line until the next newline ('\n') character. It does not return the newline character. The PrefixedLine method parses the line if it starts with the given prefix. If the prefix is null or empty, it is the same as Line.

var parser = Line();
var parser = PrefixedLine("abc");

The PrefixedLine parser is not cached.

Whitespace Parsers

The WhitespaceCharacter parser matches any single whitespace character and returns it. The Whitespace parser returns a string of one or more whitespace characters and returns the string. The OptionalWhitespace parser returns a string of zero or more whitespace characters and is equivalent to .Whitespace().Optional().

var parser = WhitespaceCharacter();
var parser = Whitespace();
var parser = OptionalWhitespace();

Quoted String Parsers

using static ParserObjects.QuotedParserMethods;

ParserObjects provides several methods for parsing quoted strings with escape characters. By default, the backslash ('\') is used as an escape character, which is a common idiom in modern programming and data serialization languages. These methods all have two variants, one to return the literal match including quotes and escape sequences, and a “stripped” version to return the string contents without quotes or escapes.

Double Quoted Strings

var parser = DoubleQuotedString();
var parser = StrippedDoubleQuotedString();

Single Quoted Strings

var parser = SingleQuotedString();
var parser = StrippedSingleQuotedString();

Custom Strings

You can create your own quoted string parser method by specifying the start, stop and escape characters:

var parser = DelimitedStringWithEscapedDelimiters('"', '"', '\\');
var parser = StrippedDelimitedStringWithEscapedDelimiters('"', '"', '\\');

The parsers created by these methods are not cached.

Regexes

You can use basic regular expressions to create a parser:

var parser = Regex("(a|b)?c*");

For more details on what syntax is supported by the Regex parser, see the Regexes Page. Be warned that certain types of patterns may create pathological backtracking behavior which will hurt the performance of your parser.

Identifier Parsers

Camel Case

You can parse CamelCase identifiers using the CamelCase and UpperCamelCase parsers:

var parts = CamelCase().Parse("camelCaseIdentifier123ABC");
// returns ["camel", "Case", "Identifier", "123", "ABC"]

UpperCamelCase expects the first character of the first word to be capitalized. the CamelCase parser allows the first character to be upper or lower case. This parser treats number strings as a word, and also consecutive upper-case characters as a single word acronym.

Spinal Case

Spinal Case (also known by some people as “kebab case”) consists of words separated by a dash, and can be parsed with the SpinalCase parser:

var parts = SpinalCase().Parse("spinal-case-identifier");
// returns ["spinal", "case", "identifier"]

Capitalization does not matter, and numbers may also be used. If you want only capitalized identifiers, you can use the ScreamingSpinalCase parser, which only recognizes upper-case letters and numbers:

var parts = ScreamingSpinalCase().Parse('SCREAMING-SPINAL-CASE");
// returns ["SCREAMING", "SPINAL", "CASE"]

Snake Case

Snake case consists of workds separated by an underscore and can be parsed with the SnakeCase parser, and an all-uppercase version can be parsed with the ScreamingSnakeCase parser:

var parts = SpinalCase().Parse("snake_case_identifier");
// returns ["snake", "case", "identifier"]

var parts = ScreamingSnakeCase().Parse("SNAKE_CASE_IDENTIFIER");
// returns ["SNAKE", "CASE", "IDENTIFIER"]