I have been kicking around a project idea in my head for a few years now. It’s something I don’t need often, but I have needed it every once in a while. Before discussing what the idea is, I’ll talk about the most recent use-case where I applied it:

Client Webhooks

We had an application at work with a webhooks feature. Our clients wanted to keep their local systems in sync with the data in our application, so if they gave us a URL, we would post updates to their systems in real time instead of them having to run batch loads at the end of the day.

We started to run into problems with a few of our clients, however. Many of them didn’t have large technical teams, so the integrations they wrote to receive our data were occasionally problematic. We would get a variety of error responses, especially 500 Internal Server Error (when their systems threw an unhandled exception somewhere) and 400 Bad Request (when they couldn’t handle the data we did or didn’t send them). These error responses would trigger all sorts of logging and alarm bells, and in some situations even retries, which always saw the same results and just caused more logging and alarms.

To avoid this issue, we asked clients to send us some validation rules that we could use to make sure the data was ready to send before we attempted to send it. We wouldn’t send every single update, only the ones which were in a state ready to be received by the downstream system.

Each validation rule was a string with xpath-like syntax to locate an element in our data, plus a few rules about what to do with that data (“must exist”, “must not exist”, etc). We would parse the xpath-like string, find the data elements in our objects, and then compare what we found to the rules. This system worked great: we didn’t have to write custom validations for every client, and they could push new validations to us as they developed their own systems further.
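To make the idea concrete, here is a minimal sketch of that kind of rule check. It is not the code we ran at work. The names ValidationRule, Requirement and RuleChecker are made up for illustration, and the path syntax is reduced to plain dotted property names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rule kinds; only "must exist" and "must not exist" are modeled here.
public enum Requirement { MustExist, MustNotExist }

// Hypothetical rule shape: a dotted path plus what must be true at that path.
public sealed class ValidationRule
{
    public ValidationRule(string path, Requirement requirement)
    {
        Path = path;
        Requirement = requirement;
    }

    public string Path { get; }
    public Requirement Requirement { get; }
}

public static class RuleChecker
{
    // Walks a dotted path such as "Customer.Email" with reflection.
    // Returns null when any step along the way is missing or null.
    public static object Resolve(object root, string path)
    {
        object current = root;
        foreach (var name in path.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (current == null) return null;
            var property = current.GetType().GetProperty(name);
            if (property == null) return null;
            current = property.GetValue(current);
        }
        return current;
    }

    public static bool IsSatisfied(object payload, ValidationRule rule)
    {
        bool exists = Resolve(payload, rule.Path) != null;
        return rule.Requirement == Requirement.MustExist ? exists : !exists;
    }

    // An update is only sent when every client-supplied rule passes.
    public static bool ReadyToSend(object payload, IEnumerable<ValidationRule> rules)
        => rules.All(rule => IsSatisfied(payload, rule));
}

public static class Program
{
    public static void Main()
    {
        var update = new { Customer = new { Email = "pat@example.com" }, Payment = (object)null };
        var rules = new[]
        {
            new ValidationRule("Customer.Email", Requirement.MustExist),
            new ValidationRule("Payment", Requirement.MustNotExist),
        };
        Console.WriteLine(RuleChecker.ReadyToSend(update, rules)); // prints "True"
    }
}
```

The point of the design is that the rules are plain data, so a client can change them without anyone redeploying the sending application.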

CSPath

The above isn’t the only time I have needed a feature like this, though it is the most recent one. While this approach has some serious problems and is brittle, there are rare occasions where the need arises and it is the least evil of the alternatives.

I decided to crystallize this idea into a new library called CSPath, in homage to its XPath origins. It’s not exactly the same as XPath though, because XPath works on XML documents, which themselves don’t have any functionality. XPath provides some functions and namespacing features to get around this limitation of XML. .NET objects don’t lack functionality, so CSPath can focus entirely on object graph traversal and object filtering (which is, believe me, difficult enough).

To get an idea of what CSPath can do, it helps to look at an example:

Example

var costs = lineItem.Path(".Charges[]<ShippingCharge>{.Payment = null}.Cost");

CSPath’s primary public interface is the object.Path() extension method. You can access all the behind-the-scenes bits if you want to customize your experience, but the Path method is probably your best starting point.

Here’s a breakdown of what the path above is doing:

  • .Charges Gets the value of public property “Charges”
  • [] Enumerates the contents of the IEnumerable
  • <ShippingCharge> Only gets the items which are of type “ShippingCharge”
  • {.Payment = null} Only looks at items with a public “Payment” property value of null
  • .Cost Gets the value of the Cost property.

In short, the example above is roughly equivalent to the following LINQ:

var costs = lineItem.Charges
    .OfType<ShippingCharge>()
    .Where(sc => sc.Payment == null)
    .Select(sc => sc.Cost);

Take a look at the README file for a full list of all path syntax elements (so far) and what they mean.

Internals

CSPath is broken into two halves: the parser where we read the path string and output a sequence of IPath objects, and the IPath objects which implement the finding/filtering operations themselves.

The Parser

The parser is a scannerless, combinator-based recursive descent parser. I started with a normal scanner/parser combo, where the scanner broke the input into tokens and the parser operated only on a sequence of tokens, but I thought that was overkill for this use-case. Allocating the extra tokens and then having another class to wrap the scanner and provide peek/putback on the token stream was all unnecessary work. So I merged the two together into a scannerless design.

The “combinator-based” bit in the description above is really the interesting part, and I’ll talk about it more in depth in future posts. It’s a technique that I’ve read about in academic literature before but hadn’t ever seen used in a practical way. I recently saw such an example while looking through the repo for Microsoft’s Kusto Query Language, and I was so excited by the approach that I immediately tried to adapt it for my own needs. Using some of these ideas, I was able to make a grammar class which both describes the grammar of the language and also creates a parser from it at the same time, improving readability and maintainability significantly.
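Until those posts exist, here is a small generic sketch of what “scannerless” and “combinator-based” mean in practice. It is not CSPath’s actual grammar class and not Kusto’s code. Every name in it (ParseResult, Parser, Literal, Identifier, Then, Many) is invented for the example, and it only recognizes a chain of “.Name” steps.

```csharp
using System;
using System.Collections.Generic;

// Outcome of one parse attempt: success flag, parsed value, and the position after the match.
public readonly struct ParseResult<T>
{
    public ParseResult(bool success, T value, int next)
    {
        Success = success;
        Value = value;
        Next = next;
    }

    public bool Success { get; }
    public T Value { get; }
    public int Next { get; }
}

// Scannerless: a parser reads characters straight from the input string, with no token stream.
public delegate ParseResult<T> Parser<T>(string input, int position);

public static class Combinators
{
    private static ParseResult<T> Fail<T>() => new ParseResult<T>(false, default(T), 0);

    // Matches exactly one expected character.
    public static Parser<char> Literal(char expected) => (input, pos) =>
        pos < input.Length && input[pos] == expected
            ? new ParseResult<char>(true, expected, pos + 1)
            : Fail<char>();

    // Matches one or more letters, digits or underscores.
    public static Parser<string> Identifier() => (input, pos) =>
    {
        int end = pos;
        while (end < input.Length && (char.IsLetterOrDigit(input[end]) || input[end] == '_')) end++;
        return end > pos
            ? new ParseResult<string>(true, input.Substring(pos, end - pos), end)
            : Fail<string>();
    };

    // Sequence: runs first, then second, and keeps the value of second.
    public static Parser<TB> Then<TA, TB>(this Parser<TA> first, Parser<TB> second) => (input, pos) =>
    {
        var a = first(input, pos);
        return a.Success ? second(input, a.Next) : Fail<TB>();
    };

    // Repetition: applies item zero or more times and collects the values.
    public static Parser<List<T>> Many<T>(this Parser<T> item) => (input, pos) =>
    {
        var values = new List<T>();
        while (true)
        {
            var r = item(input, pos);
            if (!r.Success || r.Next == pos) break; // stop on failure or when nothing was consumed
            values.Add(r.Value);
            pos = r.Next;
        }
        return new ParseResult<List<T>>(true, values, pos);
    };
}

public static class Program
{
    public static void Main()
    {
        // The grammar reads like the rule it describes: ('.' Identifier)*
        var step = Combinators.Literal('.').Then(Combinators.Identifier());
        var chain = step.Many();

        var result = chain(".Charges.Cost", 0);
        Console.WriteLine(string.Join(", ", result.Value)); // prints "Charges, Cost"
    }
}
```

Small parsers are ordinary values that get glued together by functions like Then and Many, so the code that declares the grammar is also the code that builds the parser.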

The IPath Interface

The IPath interface is quite simple and contains only one method:

IEnumerable<object> Filter(IEnumerable<object> input);

The parser outputs an ordered list of these path objects, which we execute on the input one after another until we get the final result. Many of the path classes are lazily evaluated, which means that if you call .First() or .Take() on the result, only the items you actually need are computed.
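Here is a rough sketch of how such a pipeline can be chained. Only the IPath interface comes from the library as described above. The EnumeratePath, TypePath and PathRunner classes are hypothetical stand-ins written for this post, not CSPath’s real classes.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public interface IPath
{
    IEnumerable<object> Filter(IEnumerable<object> input);
}

// Hypothetical stage for "[]": flattens every enumerable input item (strings are left alone).
public sealed class EnumeratePath : IPath
{
    public IEnumerable<object> Filter(IEnumerable<object> input)
    {
        foreach (var item in input)
        {
            if (item is IEnumerable sequence && !(item is string))
            {
                foreach (var element in sequence)
                    yield return element; // streamed one at a time, so evaluation stays lazy
            }
        }
    }
}

// Hypothetical stage for "<T>": keeps only the items that are instances of the given type.
public sealed class TypePath : IPath
{
    private readonly Type _type;

    public TypePath(Type type) { _type = type; }

    public IEnumerable<object> Filter(IEnumerable<object> input)
        => input.Where(item => _type.IsInstanceOfType(item));
}

public static class PathRunner
{
    // Feeds the output of each stage into the next, starting from a one-item sequence.
    public static IEnumerable<object> Run(object root, IEnumerable<IPath> stages)
        => stages.Aggregate(
            (IEnumerable<object>)new[] { root },
            (current, stage) => stage.Filter(current));
}

public static class Program
{
    public static void Main()
    {
        var root = new object[] { 1, "two", 3, "four" };
        var stages = new IPath[] { new EnumeratePath(), new TypePath(typeof(string)) };

        // First() pulls items through both stages only until the first match appears.
        Console.WriteLine(PathRunner.Run(root, stages).First()); // prints "two"
    }
}
```

Because both stages stream their results, nothing is computed until the caller starts enumerating.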

It’s worth mentioning here that, because the inputs are all of type object, we have to look up things like indexers and properties by reflection. This brings a performance penalty compared with the equivalent LINQ, sometimes a significant one.
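For readers who haven’t done this kind of lookup before, the sketch below shows roughly what it involves. The ReflectionLookup class is hypothetical and is not how CSPath is implemented internally. It also assumes the indexer has the default name “Item”, which is the usual case but not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ReflectionLookup
{
    // Reads a public instance property by name, as a ".Name" path step would need to.
    public static object GetProperty(object target, string name)
    {
        PropertyInfo property = target.GetType()
            .GetProperty(name, BindingFlags.Public | BindingFlags.Instance);
        if (property == null)
            throw new MissingMemberException("No public property named " + name);
        return property.GetValue(target);
    }

    // Reads target[index] through the type's indexer.
    public static object GetIndexed(object target, object index)
    {
        // Arrays expose no "Item" property, so they are handled separately.
        if (target is Array array && index is int i) return array.GetValue(i);

        // Assumption: the indexer uses the default name "Item".
        PropertyInfo indexer = target.GetType()
            .GetProperty("Item", new[] { index.GetType() });
        if (indexer == null)
            throw new MissingMemberException("No indexer accepting " + index.GetType().Name);
        return indexer.GetValue(target, new[] { index });
    }
}

public static class Program
{
    public static void Main()
    {
        var order = new { Tags = new List<string> { "rush", "gift" } };

        object tags = ReflectionLookup.GetProperty(order, "Tags");
        Console.WriteLine(ReflectionLookup.GetIndexed(tags, 1)); // prints "gift"
    }
}
```

Every call goes through type metadata instead of a compiled member access, which is where the extra cost comes from.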

Caveats and Contraindications

There are some obvious deficiencies in this concept and there are reasons why I go out of my way to describe use of this as “rare”:

  • Parsing the path string and executing the path, with all the necessary reflection, is much slower than using LINQ
  • The lack of strong typing means problems with your paths won’t be found until runtime
  • Specifying structure in a string, including verbatim property and type names, prevents you from easily refactoring the target object types.

So I personally would make sure all the following were true before using CSPath:

  • The object structures you’re using either don’t change often or follow semantic versioning (so you can tie your path strings to major version numbers)
  • The values you’re trying to find aren’t known at compile time. They may come from an external source (config file, database, API) or be calculated at runtime.
  • You need so much flexibility that it isn’t feasible to provide pre-canned query methods for every possibility
  • You have a unit-test suite where you can test the paths you write to make sure they are accurate

It’s rare that all these conditions are met, but when you find yourself in this situation I hope you’ll agree that this is a handy little library to have around.

Future Developments

The 0.0.2 version currently published on NuGet has a lot of features but also a lot of TODO notes and missing bits. I’m actively developing a few last features that I think are fundamental, but I don’t have big long-term plans for it after that. The library is small enough and the interfaces simple enough that I think we can get to a “stable” 1.0.0 version pretty quickly.