Character Parsers

In addition to the Core Parsers, ParserObjects provides a few pre-built parsers for common character and string parsing and matching tasks. Several of these methods will cache the created instances so you don’t recreate them on every call. Exceptions will be noted below. All of the specialty parsers assume char input, because all of these tasks are related to characters and strings.

All these specialty parsers can be accessed with this declaration:

using static ParserObjects.Parsers;

Matching Parsers

Character Matcher

The MatchChar parser matches a given character. It is similar to the Match parser but optimized for character matching workflows. The library will do additional caching of parser instances where possible, and the MatchChar parser will never match the end sentinel of the input sequence.

var parser = MatchChar('x');
var parser = MatchChar(c => char.IsSymbol(c));

This is functionally similar to, but faster than, the general-purpose Match() parser:

var parser = Match('x');
var parser = Match(c => char.IsSymbol(c));

You can also perform a case-insensitive match by setting the caseInsensitive flag:

var parser = MatchChar('x', caseInsensitive: true);

Note: MatchChar will not read the end sentinel, whereas the Match parser will. As a result, these two parsers behave differently at the end of input:

var parser1 = Match(c => !char.IsLetter(c));
var parser2 = MatchChar(c => !char.IsLetter(c));

The Match parser will see that the default end sentinel '\0' satisfies the predicate and return it, while the MatchChar parser will not read the end sentinel and will return failure. Also, the MatchChar parser will always return failure if you ask it to match the end sentinel, because it will never attempt to read past the end of input:

var parser = MatchChar('\0');
var result = parser.Parse("");
result.Success.Should().BeFalse();

If your input string contains embedded null characters, those will match the above parser, up until the sequence reaches end of input.

Character In Collection

If you want to match a character which is one of a finite set of options, you can use the MatchAny parser to check if the character is in a collection.

var possibilities = new HashSet<char>() { 'x', 'y', 'z' };
var parser = MatchAny(possibilities);

HashSet<char> or another collection type optimized for fast .Contains() is preferred, but you can use any collection.

There is also a NotMatchAny which returns success if the character is not in the collection:

var forbidden = new HashSet<char>() { 'a', 'b', 'c' };
var parser = NotMatchAny(forbidden);

Notice that these methods do not have a caseInsensitive flag. If you want case-insensitive matching you need to set up your collection with a case-insensitive IEqualityComparer<T>.
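As a minimal sketch of that setup, the block below builds a HashSet<char> with a case-insensitive comparer. The CaseInsensitiveCharComparer class and the Build method are hypothetical helpers written for this example, not part of ParserObjects. Folding with char.ToUpperInvariant is an assumption about what "case-insensitive" should mean for your input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ParserObjects): treats 'x' and 'X' as equal
// by folding both characters to upper case with the invariant culture.
public sealed class CaseInsensitiveCharComparer : IEqualityComparer<char>
{
    public bool Equals(char a, char b)
        => char.ToUpperInvariant(a) == char.ToUpperInvariant(b);

    // Equal characters must produce equal hash codes, so hash the folded form.
    public int GetHashCode(char c)
        => char.ToUpperInvariant(c).GetHashCode();
}

public static class CaseInsensitiveSetExample
{
    // Builds the same set as the MatchAny example, but with the custom comparer.
    public static HashSet<char> Build()
        => new HashSet<char>(new CaseInsensitiveCharComparer()) { 'x', 'y', 'z' };
}
```

With this set, Contains('X') and Contains('x') both return true, so a set built this way can be handed to MatchAny or NotMatchAny in place of the plain HashSet<char> used in the examples above.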

Character String Parser

The MatchChars parser matches a literal string of characters against a char input and returns the string on success. There is also an alias CharacterString() which does the same thing:

var parser = MatchChars("abc");
var parser = CharacterString("abc");

This is functionally equivalent to (though faster than) a combination of the MatchSequence and Transform parsers:

var parser = Match("abc").Transform(x => new string(x.ToArray()));

There is also a MatchAny parser, similar to the Trie parser, which takes several string patterns and will return a string if any of the patterns match. It also has a caseInsensitive mode:

var parser = MatchAny(new[] { "pattern1", "pattern2", ...}, caseInsensitive: true);

Character Class Parsers

The Letter parser matches any uppercase or lowercase letter character. Word matches a sequence of one or more letters and returns them as a string. UpperCase matches any one uppercase character, and LowerCase matches any one lowercase character. The Symbol parser matches any non-letter, non-number symbol or punctuation character. You can convert any of those to a string like Word does by using the .ListCharToString() extension method:

var letter = Letter();
var word = Word();
var alsoWord = Letter().ListCharToString();
var allUpperCase = UpperCase();
var allLowerCase = LowerCase();
var symbols = Symbol();

Digit Parsers

ParserObjects provides parsers for parsing digits (‘0’-‘9’) and sequences of digits.

using static ParserObjects.Parsers.Digits;

If you want to parse formatted numbers with possible negative values, decimal values and syntactic rules (no leading 0, etc), consider using one of the Programming Parser Methods instead.

A Single Digit

The Digit parser returns a single character in the range ('0'-'9'). The NonZeroDigit parser returns a single character in the range ('1'-'9'), and the HexadecimalDigit parser returns any valid hex character ('a'-'f', 'A'-'F', '0'-'9').

var parser = Digit();
var parser = NonZeroDigit();
var parser = HexadecimalDigit();

A String of Digits

The DigitString parser returns a string of consecutive digits.

var parser = DigitString();

A String of Digits as an Integer

The DigitsAsInteger parser reads a string of consecutive digits and parses them as an int:

var parser = DigitsAsInteger();

This is the same as:

var parser = DigitString().Transform(int.Parse);

Notice that this parser applies no special handling to leading zeros, does not handle decimal points, fractions, or scientific notation, does not parse a leading - for negatives, and so on. For a more structured number parser following existing programming language rules see The C Parsers or the JS Parsers.
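To make those limits concrete, this sketch exercises int.Parse directly. It assumes, based on the equivalence stated above, that the matched digit string reaches int.Parse unchanged. The DigitStringConversion class and its methods are hypothetical names used only for illustration, and the overflow behavior is worth confirming against your version of the library.

```csharp
using System;

public static class DigitStringConversion
{
    // Mirrors the Transform(int.Parse) step on an already-matched digit string.
    public static int ToInt(string digits) => int.Parse(digits);

    // Returns true when the digit string is too large to fit in a 32-bit int.
    public static bool Overflows(string digits)
    {
        try
        {
            int.Parse(digits);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

ToInt("007") yields 7, so leading zeros are silently dropped, and Overflows("99999999999") is true because the value exceeds int.MaxValue. If either matters for your grammar, validate the digit string before converting it.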

Line Parsers

The Line method parses the remainder of the line until the next newline character. It does not return the newline character. The PrefixedLine method parses the line if it starts with the given prefix. If the prefix is null or empty, it is the same as Line.

var parser = Line();
var parser = PrefixedLine("abc");

The PrefixedLine parser instance is not cached by the library.

Whitespace Parsers

The WhitespaceCharacter parser matches any single whitespace character and returns it. The Whitespace parser matches one or more whitespace characters and returns them as a string. The OptionalWhitespace parser returns a string of zero or more whitespace characters and is equivalent to Whitespace().Optional().

var parser = WhitespaceCharacter();
var parser = Whitespace();
var parser = OptionalWhitespace();

Quoted String Parsers

using static ParserObjects.Parsers;

ParserObjects provides several methods for parsing quoted strings with escape characters. By default, the backslash ('\') is used as an escape character, which is a common idiom in modern programming and data serialization languages. These methods all have two variants, one to return the literal match including quotes and escape sequences, and a “stripped” version to return the string contents without quotes or escapes.
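The difference between the two variants can be illustrated without the library. The StripQuoted function below is a hypothetical helper, not the ParserObjects implementation. It assumes the input is already a well-formed literal, removes the first and last character, and drops each escape character while keeping the character that follows it. Whether the library also translates sequences such as \n is not stated here, so this sketch does not.

```csharp
using System;
using System.Text;

public static class QuotedStringSketch
{
    // Hypothetical helper: strips the surrounding quotes and the escape characters
    // from a well-formed quoted literal. It does not validate its input.
    public static string StripQuoted(string literal, char escape = '\\')
    {
        var sb = new StringBuilder();
        // Skip the opening quote at index 0 and the closing quote at the end.
        for (int i = 1; i < literal.Length - 1; i++)
        {
            // On an escape character, skip it and keep the next character verbatim.
            if (literal[i] == escape && i + 1 < literal.Length - 1)
                i++;
            sb.Append(literal[i]);
        }
        return sb.ToString();
    }
}
```

For the six-character input "a\"b" (counting both quotes and the backslash), the literal variant corresponds to the text unchanged, while the stripped variant corresponds to the three characters a"b.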

Double Quoted Strings

var parser = DoubleQuotedString();
var parser = StrippedDoubleQuotedString();

Single Quoted Strings

var parser = SingleQuotedString();
var parser = StrippedSingleQuotedString();

Custom Strings

You can create your own quoted string parser method by specifying the start, stop and escape characters:

var parser = DelimitedStringWithEscapedDelimiters('"', '"', '\\');
var parser = StrippedDelimitedStringWithEscapedDelimiters('"', '"', '\\');

The parsers created by these methods are not cached.

Regexes

You can use basic regular expressions to create a parser:

var parser = Regex("(a|b)?c*");

For more details on what syntax is supported by the Regex parser, see the Regexes Page. Be warned that certain types of patterns may create pathological backtracking behavior which will hurt the performance of your parser.

Identifier Parsers

Camel Case

You can parse CamelCase identifiers using the CamelCase and UpperCamelCase parsers:

var parts = CamelCase().Parse("camelCaseIdentifier123ABC");
// returns ["camel", "Case", "Identifier", "123", "ABC"]

UpperCamelCase expects the first character of the first word to be capitalized, while the CamelCase parser allows the first character to be upper or lower case. The parser treats a run of digits as a word, and a run of consecutive upper-case characters as a single acronym word.
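The word-splitting rules described above can be approximated with a regular expression from the .NET base library. This is a sketch of the rules, not the CamelCase parser's implementation. The SplitCamelCase name and the pattern are assumptions made for illustration, and only ASCII letters and digits are handled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseSketch
{
    // Alternatives, tried in order at each position:
    //   [A-Z]+(?![a-z])  a run of capitals not followed by a lower-case letter (acronym)
    //   [A-Z]?[a-z]+     an optionally capitalized lower-case word
    //   [0-9]+           a run of digits
    private static readonly Regex WordPattern =
        new Regex("[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+");

    // Returns the words found in the identifier, in order of appearance.
    public static string[] SplitCamelCase(string identifier)
        => WordPattern.Matches(identifier).Cast<Match>().Select(m => m.Value).ToArray();
}
```

SplitCamelCase("camelCaseIdentifier123ABC") produces camel, Case, Identifier, 123, ABC, which matches the parser output shown above. Inputs outside that example may be split differently by the real parser.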

Spinal Case

Spinal Case (also known by some people as “kebab case”) consists of words separated by a dash, and can be parsed with the SpinalCase parser:

var parts = SpinalCase().Parse("spinal-case-identifier");
// returns ["spinal", "case", "identifier"]

Capitalization does not matter, and numbers may also be used. If you want only all-uppercase identifiers, you can use the ScreamingSpinalCase parser, which only recognizes upper-case letters and numbers:

var parts = ScreamingSpinalCase().Parse("SCREAMING-SPINAL-CASE");
// returns ["SCREAMING", "SPINAL", "CASE"]

Snake Case

Snake case consists of words separated by an underscore and can be parsed with the SnakeCase parser, and an all-uppercase version can be parsed with the ScreamingSnakeCase parser:

var parts = SnakeCase().Parse("snake_case_identifier");
// returns ["snake", "case", "identifier"]

var parts = ScreamingSnakeCase().Parse("SNAKE_CASE_IDENTIFIER");
// returns ["SNAKE", "CASE", "IDENTIFIER"]

Stringify Parsers

The Stringify parser takes a parser which returns an IReadOnlyList<char> result and transforms it to return a string result instead. This may be useful in places where you are using a List or a Match(pattern) parser and need a string result without concatenating the pieces yourself.

var getWord = Stringify(
    List(
        Match(c => char.IsLetter(c))
    )
);

Capturing Parsers

CaptureString Parser

The CaptureString parser takes a list of several parsers, matches each of them in series, and returns the complete match as a string. This parser can only be used with an ISequence<char> input, is optimized for ICharSequence inputs, and represents a significant optimization over alternative parsers for the same behavior:

var parser = CaptureString(p1, p2, p3, ...);

This has equivalent behavior to, but much higher performance than, a Combine or Rule parser with a Transform:

var parser = Combine(
        p1,
        p2,
        p3
    )
    .Transform(l => string.Join("", l));