This morning I made a few last commits on my whiteknight/io_cleanup1 branch, and I’m cautiously optimistic that the branch is now ready to merge. The last remaining issue, which has taken the last few days to resolve, has been fixing readine semantics to match some old behavior.

A few days ago I wrote a post about how complicated readline is. At the time, I thought I had the whole issue under control. But then Moritz pointed out a problem with a particular feature unique to Socket that was missing in the new branch.

In master, you could pass in a custom delimiter sequence as a string to the .readline() method. Rakudo was using this feature like this:

str = s.readline("\r\n")

Of course, as I’ve pointed out in the post about readline and elsewhere, there was no consistency between the three major builtin types: FileHandle, Socket and StringHandle. The closest thing we could do with FileHandle is this:

str = f.readline();

Notice two big differences between FileHandle and Socket here: First, FileHandle has a separate record_separator method that must be called separately, and the record separator is stored as state on the FileHandle between .readline() calls. Second, FileHandle’s record separator sequence may only be a single character. Internally, it’s stored as an INTVAL for a single codepoint instead of as a STRING*, even though the .record_separator() method takes a STRING* argument (and extracts the first codepoint from it).

Initially in the io_cleanup1 branch I used the FileHandle semantics to unify the code because I wasn’t aware that Socket didn’t have the same restrictions that FileHandle did, even if the interface was a little bit different. I also didn’t think that the Socket version would be so much more flexible despite the much smaller size of the code to implement it. In short, I really just didn’t look at it closely enough and assumed the two were more similar than they actually were. Why would I ever assume that this subsystem ever had “consistency” as a driving design motivation?

So I rewrote readline. From scratch.

The new system follows the more flexible Socket semantics for all types. Now you can use almost any arbitrary string as the record separator for .readline() on FileHandle, StringHandle and Socket. In the whiteknight/io_cleanup1 branch, as of this morning, you can now do this:

var f = new 'FileHandle';'foo.txt', 'r');
string s = f.readline();

…And you can also do this, which is functionally equivalent:

var f = new 'FileHandle';'foo.txt', 'r');
string s = f.readline("TEST");

The same two code snippets should work the same for all built-in handle types. For all types, if you don’t specify a record separator by either method, it defaults to “\n”.

Above I mentioned that almost any arbitrary string should work. I use the word “almost” because there are some restrictions. First and foremost, the delimiter string cannot be larger than half the size of the buffer. Since buffers are sized in bytes, this is a byte-length restriction, not a character-length restriction. In practice we know that delimiters are typically things like “\n”, “\r\n”, ”,”, etc. So if the buffer is a few kilobytes this isn’t a meaningful limitation. Also, the delimiter must be the same encoding as the handle uses, or it must be able to convert to that encoding. So if your handle uses ascii, but you pass in a delimiter which is utf16, you may see some exceptions raised.

I think that the work on this branch, save for a few small tweaks, is done. I’ve done some testing myself and have asked for help to get it tested by a wider audience. Hopefully we can get this branch merged this month, if no other problems are found.