Core Single Parsers
The core single parsers are all generic on the type of input they accept. For a list of additional parsers which are related strictly to parsing strings with character input sequences, see the String/Character Parsers page. In addition, there are several specialty parsers related to parsing common patterns from programming languages, which also take character input sequences. You can find these in the Programming Parsers page.
Throughout the descriptions of these parsers, examples will be shown where one parser is functionally or logically equivalent to a combination of other parsers. This is done to give multiple possible ways to understand some of the trickier concepts.
The best way to access these core parsers is through the static factory methods. Add this to the top of your C# file:
using ParserObjects;
using static ParserObjects.Parsers<char>;
(Replace <char>
with whatever your input type is.)
Declaration Styles
There are two basic styles of declaring parsers. The first is to use the static factory methods to create parsers:
using static ParserObjects.Parsers<char>;
var parser = List(
Any()
);
The second is to use monadic extension methods to combine them:
using ParserObjects;
using static ParserObjects.Parsers<char>;
var parser = Any().List();
In a few cases there are tuple syntaxes available as well. These will be noted in the appropriate sections.
Do not use parser class names directly from the ParserObjects.Internal.Parsers
namespace. These class names are not designed for easy discovery or use, and they may change between releases to better describe what they are and how they operate. The Function and Method names described here will stay the same between releases, even if the ways they are implemented may change.
Matching Parser Types
Matching parsers match 0 or more input items from the input sequence and return some sort of value.
using ParserObjects;
using static ParserObjects.Parsers<char>;
Any Parser
The Any
parser matches any single input value and returns it directly. It consumes one item of input, and only fails when the sequence is at the end.
var anyParser = Any();
It is functionally equivalent to the match predicate parser (Except for end-of-input, where the Match parser will return the End Sentinel), though simpler and faster (described below):
var anyParser = Match(_ => true);
Empty Parser
The Empty
parser consumes no input and always returns success with a default value, even when the input sequence is at end. It consumes no input and returns no value.
var parser = Empty();
If you would like to always return success, consume no input, but also return a value: use the Produce()
parser instead.
End Parser
The End
parser returns success if the stream is at the end, failure otherwise. It consumes no input and returns no value.
var parser = End();
There is also an IsEnd
parser which returns a success result with a boolean value to indicate end:
var parser = IsEnd();
Match Parser
The MatchParser
examines the next input item or next several items and returns the matched values if they match a pattern or predicate:
// Match a single item, using C# .Equals()
var parser = Match('A');
// Match a single item which satisfies a predicate
var parser = Match(c => IsMatch(c));
// Match a series of items, using C# .Equals()
var parser = Match(new [] { 'A', 'B', 'C' });
// Same as above, but uses the fact that a string is an IEnumerable<char>
var parser = Match("ABC");
The Match
parser can match the end sentinel, and if the end sentinel matches, it will return success at end of input (with 0 .Consumed
). If you would like to have the same behavior as Match but without matching the end sentinel, use the MatchItem
parser instead.
Note: If you are matching single characters the MatchChar() parser is optimized to cache values and perform fewer allocations.
MatchItem Parser
The MatchItem
parser is the same as Match(predicate)
except it returns failure at end of input, even if the end sentinel value would have matched the predicate:
var parser = MatchItem(c => IsMatch(c));
Peek Parser
The Peek
parser peeks at the next value of input, but does not consume it. It returns failure when the input sequence is at end, success otherwise.
var parser = new PeekParser<char>();
var parser = Peek();
This parser is functionally equivalent to the Any
and None
parsers:
var parser = Any().None();
Basic Combinator Parser Types
These parsers are used to combine and compose smaller parsers to form larger parsers. This is the heart of the ParserObjects combinators approach. To use these, import the methods you’re using (replace <char>
with whatever input type you are using):
using ParserObjects;
using static ParserObjects.Parsers<char>;
Bool Parser
The Bool
parser invokes a parser and returns true
if the inner parser succeeds, false
otherwise. It is useful if you want to know whether something matches, but don’t care what the result value is, or if you want to convert IParser<TInput>
to IParser<TInput, bool>
.
var parser = Bool(innerParser);
Chain Parser
The Chain
parser invokes an initial parser to obtain a prefix value, then uses that prefix value to select the next parser to invoke.
var parser = Chain(initial, result => {
if (!result.Success)
return HandleFailureParser();
if (result.Value == 'a')
return AParser();
if (result.Value == 'b')
return BParser();
});
var parser = initial.Chain(result => {
if (!result.Success)
return HandleFailureParser();
if (result.Value == 'a')
return AParser();
if (result.Value == 'b')
return BParser();
});
The Chain parser will throw an InvalidOperationException
if the callback method returns a null
parser value.
ChainWith Parser
The ChainWith
parser is related to the Chain
parser but uses a different fluent syntax for selecting a value.
var parser = ChainWith(initial, config => config
.When(x => x == 'a', AParser())
.When(x => x == 'b', BParser())
);
Choose Parser
The Choose
parser invokes an initial parser to parse a prefix value without consuming any input, then uses that prefix value to select the next parser to invoke.
var parser = Choose(initial, result => {
if (!result.Success)
return HandleFailureParser();
if (result.Value == 'a')
return AParser();
if (result.Value == 'b')
return BParser();
});
var parser = initial.Choose(result => {
if (!result.Success)
return HandleFailureParser();
if (result.Value == 'a')
return AParser();
if (result.Value == 'b')
return BParser();
});
The Choose
parser is implemented using the Chain
parser internally and is equivalent to a combination of the Chain
and None
parsers:
var parser = initial
.None()
.Chain(result => ...);
Combine Parser
The Combine
parser takes a list of parsers, parses each in sequence, and returns a list of object
results. You can transform or filter these results as appropriate for your application.
var parser = Combine(p1, p2, p3, ...);
For most cases, it is preferred to use the strongly-typed Rule
parser instead of Combine
.
Fail Parser
The Fail
parser returns failure unconditionally. It can be used to explicitly insert failure conditions into your parser graph, to provide error messages which are more helpful than the default error messages, or to serve as a placeholder for replacement operations. The Fail parser has an output type so it can be inserted into places in your parser graph that expect an output type to be specified.
var parser = Fail<char>("helpful error message");
var parser = Fail("helpful error message");
If the output type is not specified, it returns the same as the input type.
First Parser
The First
parser takes a list of parsers. Each parser is attempted in order, and the result is returned as soon as any parser succeeds. If none of the parsers succeed, the First
parser fails. The First parser can also be written as an extension method on a tuple of parsers. The First
parser is used to create preference or precedence among multiple possible options.
var parser = First(
parser1,
parser2,
parser3
);
var parser = (parser1, parser2, parser3).First();
The tuple variant of this parser is limited up to 9 child parsers. The other variants can take any number of child parsers.
List Parser
The List
parser attempts to parse the item parser repeatedly until it fails, and returns an enumeration of the results. Optionally the items may have a separator between them. The List parser takes optional minimum
and maximum
values, to control the number of items matched. If you specify a minimum, the list will fail unless at least that number of items has been matched. If you do not specify a minimum, the list may return success if no items are matched, and return an empty list as a result. If a maximum number is specified, the list will continue matching only until that maximum number is reached then it will stop even if more matches are possible.
var parser = List(innerParser);
var parser = List(innerParser, 3, 5);
var parser = List(innerParser, separatorParser);
var parser = List(innerParser, separatorParser, 3, 5);
// same as List(innerParser, minimum: 1);
var parser = List(innerParser, true);
var parser = List(innerParser, separatorParser, true);
var parser = innerParser.List();
var parser = innerParser.List(3, 5);
var parser = innerParser.List(separatorParser);
var parser = innerParser.List(separatorParser, 3, 5);
// Same as innerParser.List(minimum: 1);
var parser = innerParser.List(true);
If the inner parser returns success but consumes zero input, the List parser will break the loop and return only a single item. If a minimum number is set, the List parser will loop only until the minimum value and then break, returning success with a list with the correct number of items. This is a precaution to prevent the list parser from getting into an infinite loop when no input is being consumed.
None Parser
The None
parser evaluates an inner parser and then rewinds the input sequence to ensure no data has been consumed.
var parser = None(Any());
var parser = Any().None();
NonGreedyList Parser
The NonGreedyList
parser is similar to List()
except it attempts to match the fewest number of items possible. It takes a continuation parser which will be invoked to continue the parse:
var parser = NonGreedyList(
itemParser,
values => new Rule(
values,
finalParser
(v, f) => { ... }
)
);
Like the List()
parser, NonGreedyList()
parser also takes optional separator, minimum and maximum parameters.
The NonGreedyList
implementation provides a backtracking behavior. It will attempt to continue the parse by matching zero items. If the parse fails, it will match one item and attempt again, if that fails it will match a second item and attempt again, etc. Performance can be negatively impacted if the NonGreedyList has to make many such attempts and backtracks.
Optional Parser
The Optional
parser attempts to invoke the inner parser, but returns success no matter the result. The Optional parser takes a callback argument to return a default value if the parse fails. If the default value callback is not provided, the Optional parser will return an IOption
object which will report on success or failure of the inner parser.
var parser = Optional(innerParser);
var parser = Optional(innerParser, () => defaultValue);
var parser = innerParser.Optional();
var parser = innerParser.Optional(() => defaultValue);
The Optional parser is conceptually equivalent to a combination of First
and Produce
parsers:
var parser = First(
innerParser,
Produce(() => defaultValue)
);
Predict Parser
The Predict
parser peeks at a lookahead value in the input stream, and uses that value to determine what parser to invoke next.
var parser = Predict(config => config
.When(c => c == 'a', AParser())
.When(c => c == 'b', BParser())
);
If no matching value is found, the Predict parser returns failure. The Predict
parser is implemented internally using the Chain
parser and the Peek
parser. It is logically equivalent to, though nicer syntax than:
var parser = Peek().Chain(r => ...);
Produce Parser
The Produce
parser produces a value but consumes no input. It always returns success.
var parser = Produce(() => "abcd");
var parser = Produce((input, data) => "abcd");
The produce parser may be used to construct synthetic values at parse time. It can return a constant value or create a new value on every call. The value will not be cached. It may look at and consume input from the input sequence. It may use values from the contextual state data.
The simple case of the Produce parser is functionally equivalent to a combination of the Empty
and Transform
parsers:
var parser = Empty().Transform(_ => "abcd");
Note: The Produce
parser has access to the input stream and can consume input or perform other operations on the input stream or the parse state. While it is strongly preferred that you treat the Produce
parser callback as a read-only, side-effect-free operation, you have the power to do anything you want in the callback you provide. Keep in mind that side-effects you create in your callback will not be automatically undone if subsequent parsers .Reset()
or .Rewind()
the input sequence, or if the parent parser fails, etc. Be careful not to create problems for yourself here, and try to take the most simple approach.
Rule Parser
The Rule
parser attempts to execute a list of parsers, and then return a combined result. If any parser in the list fails, the input is rewound and the whole parser fails. You can create rule parsers by using the .Rule()
extension method on a Tuple
or ValueTuple
of parser objects, which may be cleaner to read and write in some situations
var parser = Rule(
parser1,
parser2,
parser3,
(r1, r2, r3) => ...
);
var parser = (parser1, parser2, parser3).Rule((r1, r2, r3) => ...);
The Rule()
method and tuple variants are both limited to 9 parsers at most. If you need to combine the results of more than 9 parsers, use the Combine
parser instead.
Synchronize Parser
The Synchronize
parser allows entering panic mode when a parse fails. In panic mode, the parser will discard tokens to get back to a known “good” state, before attempting the parse again. This is useful for cases where you want to report all syntax errors to the user, not just the first error.
var parser = Synchronize(inner, x => x == ';');
var parser = inner.Synchronize(x => x == ';');
Once you define your parser, you can check to see if there are any errors. If the parser eventually succeeds, the successful result will also be available:
var result = parser.Parse(...);
var allErrors = result.TryGetData<ErrorList>();
var successResult = result.TryGetData<IResult>();
You can use the list of errors to report problems back to the user. Notice that if the first attempted parse fails, the Synchronize
parser will always return failure, even if it is eventually able to find a successful continuation after discarding some inputs. Use the TryGetData
methods described above to see what your errors were and what your eventual success would have been, and then you can decide what you want to do with that information.
Try Parser
The Try
parser catches user-thrown exceptions from within the parse and handles them. When an exception is caught, the input sequence is rewound to the location where the Try
parser began.
var parser = Try(innerParser, ex => {...}, bubble: true);
The second parameter is a callback to allow examining the exception when it is received. This can be a useful place to set a breakpoint during debugging. The third parameter bubble
tells whether to rethrow the exception (true
) or to handle the exception and return a failure result (false
).
You can get information about the exception thrown from the result, if you set bubble: false
:
var result = parser.Parse(...);
var exception = result.TryGetData<Exception>();
Note: The ParserObjects library uses special exceptions for non-local control flow purposes in specific situations. The Try
parser will not catch or interfere with these in any way, and if you throw a ParserObjects.Internal.ControlFlowException
or a subclass of ParserObjects.Internal.ControlFlowException
in your user callbacks or custom parser implementations, they will not be caught or handled by the Try
parser.
Code Callback Parsers
Some parsing tasks can better be handled manually with a user-provided callback delegate. This can be for specific algorithms (stack-based, shunting yard, etc) or cases where debugging tasks require setting breakpoints in the middle of a parse. The Function
and Sequential
parsers both allow you to write your own parser code in a callback delegate, though the features they offer are a little different.
Because these functions take arbitrary user callback delegates which may operate on the IParseState<TInput>
and the ISequence<TInput>
, many optimizations (.Match()
, etc) are not available and several other features do not work as might be expected (.ToBnf()
, etc). In exchange for some of these missing features, you get more flexibility and an opportunity to do some of your own optimizations.
Function Parser
The Function
parser takes a callback function to perform the parse and expects you to create your own IResult<TOutput>
return value. The callback takes success
and fail
arguments, which are factory methods to generate the correct result object with filled-in metadata. It is suggested you use these callbacks, but it is not required. The Function
parser will automatically rewind the input sequence on failure, so you do not need to cleanup manually. It will also automatically report the correct number of consumed input tokens so you do not need to track it yourself.
var parser = Function((t, success, fail) => {
if (t.Input.GetNext() == 'A')
// for success
return success("ok");
// for failure
return fail("parse failed");
});
The Function
parser callback is a largely unstructured environment where you have access to the input sequence, and are expected to do the parsing yourself.
Note: If your user callback delegate has side-effects, those will not be undone if the parse is failed and the input sequence is rewound. It is generally recommended that you do not have side-effects in your callback, and that you do not maintain external state for this reason. You are free to do these things, but may suffer complications in some scenarios.
Sequential Parser
The Sequential
parser is a more structured version of the Function
parser, that expects you to be delegating parsing work to other IParser
instances. This allows you to use procedural logic to aid in parsing and to set breakpoints between parsers to get maximum debuggability. The downside is that the Sequential Parser does not work with some features like BNF stringification or .Replace()
/.ReplaceChild()
operations, and .Match()
cannot be optimized with arbitrary user callbacks.
var parser = Sequential(t =>
{
var type = t.Parse(Word());
if (type == 'decimal')
{
var colon = t.Parse(Match(':'));
var value = t.Parse(Integer());
return value;
}
if (type == 'hex')
{
var colon = t.Parse(Match(':'));
var value = t.Parse(HexadecimalInteger());
return value;
}
return 0;
});
The t
object assists in performing the parse and it has ability to handle errors by causing the whole Sequential
parser to fail if any of the child parsers fail.
Note: If your user callback delegate has side-effects, those will not be undone if the parse is failed and the input sequence is rewound. It is generally recommended that you do not have side-effects in your callback, and that you do not maintain external state for this reason. You are free to do these things, but may suffer complications in some scenarios.
Matching Parsers
These parsers help to simplify matching of literal patterns.
Match Parser
The Match
parser has several different forms.
The first form takes a single item to match against, using default .Equals()
behavior, and returns that item if it matches the next input value:
var parser = Match('c');
The second form takes a predicate callback which takes the next input item and returns a bool
. If the predicate returns true, the item is considered a match and is returned
var parser = Match(c => c == 'a' || c == 'b' || char.IsSymbol(c));
The third form takes an enumerable of input values as a pattern, and attempts to match all of them. If all input items match, in order, the values will be returned as an IReadOnlyList
:
var parser = Match(new char[] { 'a', 'b', 'c', 'd' });
// Notice that a string is an IEnumerable<char>. This is the same as the above.
var parser = Match("abcd");
This is functionally equivalent (though faster and more succinct) to a combination of the Rule
and Match
parsers:
var parser = Rule(
Match(c => c == 'a'),
Match(c => c == 'b'),
Match(c => c == 'c'),
Match(c => c == 'd')
(a, b, c, d) => new [] { a, b, c, d }
);
Note: For characters and strings, the MatchChar
parser is faster than the Match(char)
parser for matching a single item, and the MatchChars()
parser is faster than Match(IEnumerable<char>)
for similar behaviors.
Trie and MatchAny parsers
The Trie
parser uses a trie to find the longest match in a list of possible literal sequences. This is a useful optimization for keyword and operator literals, especially where individual patterns may have overlapping prefixes. The ParserObjects library provides IReadOnlyTrie<TKey, TResult>
and IInsertableTrie<TKey, TResult>
abstractions for this purpose.
var parser = Trie(trie);
var parser = trie.ToParser();
var parser = Trie(trie => trie.Add(...));
There is also a MatchAny
parser which is similar to Trie
but matches characters into strings.
Transforming parsers
These parsers exist to transform results from one form to another.
Transform Parser
The Transform
parser transforms the output of an inner parser. If the inner parser fails the Transform
parser fails. If the inner parser succeeds, the Transform
parser will return a transformed result.
var parser = Transform(innerParser, r => ...);
var parser = innerParser.Transform(r => ...);
Capturing Parsers
The IParser<TInput>.Match()
can be a significant optimization over the IParser<TInput>.Parse()
method. The Capture
parser takes advantage of this fact by calling .Match()
on a list of parsers in series and then returns a list of input items directly from the input sequence which were spanned by the match as an IReadOnlyList<TInput>
. The Capture
parser can be a significant optimization in terms of both runtime performance and memory use if you need to match a complicated pattern and return the input items directly.
var parser = Capture(p1, p2, p3, ...);
There is also a CaptureString
variant which operates on char
inputs and returns a string
output instead of an IReadOnlyList<char>
.
Recursive Parsers
These parsers exist to help simplify certain recursion scenarios, especially in parsing equations and mathematical expressions. They are not helpful in all recursion scenarios.
Left Apply Parser
The LeftApplyParser
is a parser for left-associative parsing. The left value is parsed first and the value of it is applied to the right side production rule. The value of the right parser will then be used as the new left value and it will attempt to continue until a right parser does not match. The pseudo-BNF for it is:
self := <self> <right> | <item>
var parser = LeftApply(
itemParser,
left => Rule(
left,
...
)
);
For a single step, LeftApply
is equivalent to this:
var parser = Rule(
itemParser,
...
);
But LeftApply
has a looping effect where the value of the Rule
would be used as the next value for itemParser
and the Rule
applied again.
Right Apply Parser
The RightApply
is for right-associative recursion. It is conceptually similar to the LeftApply
parser, but with right-recursion instead. It parses an item and then attempts to parse a separator followed by a recursion to itself. The pseudo-BNF for it is:
self := <item> (<middle> <self>)?
var parser = RightApply(item, middle, (l, m, r) => ...);
This same recursive functionality can be reproduced by a combination of Deferred
, First
, and Rule
:
IParser<char, string> parserCore = null;
var parser = Deferred(() => parserCore);
parserCore = First(
Rule(
item,
middle,
parser,
(l, m, r) => ...
),
item
);
This type of parser is also referred to as “recursive ascent” because of how it parses a list of items and then starts combining them together from the right to the left.
Pratt Parser
The Pratt
parser is an implementation of the Pratt parsing algorithm, which may be particularly helpful with parsing mathematical expressions and other types of languages with similar structures.
var parser = Pratt(config => { ... });
For detailed information about configuring and using the Pratt
parser, see the Pratt Parser page. It may be simpler to use in many situations than the LeftApply
and RightApply
parsers are, or attempting to use other mechanisms for parsing precedence and associativity rules.
Earley Parser
The Earley
parser is an implementation of the Earley parsing algorithm, which is a powerful algorithm for context-free languages and left- and right-recursive grammars.
var parser = Earley(...);
For more detailed information about configuring and using the Earley
parser, see the Earley Parser page.
Referencing Parsers
These parsers exist to help with referencing issues, to help resolve circular dependencies or decide on which parser to use at parse-time.
Create Parser
The Create
parser creates a parser at parse time using information available in the current parse state. Create parser looks similar to the Deferred
parser, though has a few important semantic differences: The create callback takes the ParseState
, and it cannot be used with find/replace operations. The Create parser is expected to create new parser instances at different times, so it is not considered to have “children”. This means that the parser returns by the Create parser will not be visible to Visitors.
var parser = new CreateParser<TInput, TOutput>(state => { ... });
var parser = Create(state => { ... });
Deferred Parser
The Deferred
parser references another parser and resolves the reference at parse time instead of at declaration time. This allows your parser to handle recursion and circular references. The parser returned from the Deferred parser is expected by the system to be the same throughout the entire parse and may be cached after first access. Because the parser returned by Deferred is expected to be static and available at any time after the parser graph is created, the parser can be used with find/replace operations and should correctly work with BNF stringification.
var parser = new DeferredParser<TInput, TOutput>(() => targetParser);
var parser = Deferred(() => targetParser);
Replaceable Parser
The Replaceable
parser references an inner parser and invokes it transparently. However, the replaceable parser allows the inner parser to be replaced in-place at runtime without cloning. This is useful in cases where you want to make modifications to the parser tree without creating a whole new tree.
var parser = Replaceable(innerParser);
var parser = innerParser.Replaceable();
If an inner parser is not explicitly specified, the inner parser will be a Fail
parser. These two lines are equivalent:
var parser = Replaceable<TOutput>();
var parser = Replaceable(Fail<TOutput>());
It is extremely helpful to name your replaceable parsers so you can quickly find and replace values by name.
See the section on Finding and Replacement in the Parsers Usage Page for more details.