Programming, Software and Code

GSoC Idea: Asynchronous IO System



I talked to Coke briefly yesterday evening about GSoC. I hadn't heard specifically that Parrot would be applying for GSoC again this year and even though I was hopeful I didn't want to presume. He didn't have a firm answer about whether Parrot was planning to apply as a mentoring orgnization, but he did say "I think we'd be crazy not to". That's good enough for me.

There are a few topics that I've covered more than others on this blog. One of them has been asynchronous IO. I've covered that topic at length, and was hoping to have the time to implement the system myself. Regrettably I don't think I will have time to do it before summer, which means it's ripe for prospective GSoC students to gander at.

Parrot has a basic, synchronous-blocking IO system right now. This is all well and good but let's all face facts: IO tends to be among the slowest operations that a program performs. IO represents a huge bottleneck, especially for certain types of IO-intensive programs. To alleviate this, Parrot's design documents describe an asynchronous system where IO operations are performed in the background while the program itself can continue performing non-IO operations at the same time.

I'm purposefully conflating ideas of "asynchronous" and "non-blocking" ideas, though strictly speaking there are differences between the two. What Parrot really needs is an "asynchronous non-blocking" IO system, which I will just call "AIO" for short.

In an AIO system, IO requests are dispatched to the operating system and program execution immediately resumes. The dispatch operation returns a request object. There are two ways to keep track of the IO request, if you need to keep track at all: First is to poll the request object to see if it's completed (or had an error). The second is to provide a callback with the request and the OS will execute the callback when the request has been satisfied (again, with error information if the request was not successful).

A current Parrot print operation looks like this:

print "Hello World\n"

An AIO implemention would add new forms like this:

request = print "Hello World\n", callback

This presents a lot of flexibility, and could potentially improve performance significantly for IO-intensive programs.

The IO system could use a few miscellaneous cleanups but is generally not in bad condition. A good implementation of AIO will be well-integrated with Parrot's event scheduler, and possibly with with the threading implementation as well. The scheduler is in good condition but the threading system is not. The aspiring AIO developer may find herself making fixes to any of these systems along the way. If AIO sounds like the job for you, you may want to start reading through the source code now so there are no surprises over the summer.

Comments

This is a subject near and dear to my heart also. I've designed few AIO infrastructures at various jobs and I've learned a few things along the way.

Most important, I think, is that this is one of those areas that needs to be considered from the top down. It's tempting to start by looking at what the OS offers for AIO (select, (e)poll, WaitForMultipleObjects etc.), whip up an event loop and then start building on top of it. What really needs to be done is to consider the ideal interface that you would want to code to in HLLs, and work down through the stack designing an infrastructure that will support that HLL vision on a variety of different OS facilities and event loops.

One thing that needs to be considered from the start is SSL/TLS. Don't be tempted to punt and leave it for version 2. We'll need to consider the libraries (OpenSSL and GnuTLS mainly) and figure out how they'll fit into the stack. Carefully study their APIs and write some throwaway code to make sure you understand the APIs - the AIO aspects are (or at least were 8 years ago when I did this) not particularly well documented.

SSL can get complicated when you consider that, for long-lived connections, the library will periodically decide it needs to renegotiate the session key, so it will do a whole bunch of IO back and forth that the upper layers of the stack have no knowledge of.

Anyway, lots of fun to be had here. Maybe this is the prodding I need to finally get my hands dirty in parrot!

That's some great advice. I have looked at the OS-specific APIs already, and the PDD does seem to offer a good high-level approach. At the basic level, Parrot is going to have an interface like this:

request = write iopmc, "hello", callback

Where the IO PMC will be something like a Socket, FileHandle, or SSLSocket. So while it's not something we want to punt off till later, we have enough encapsulation that we can do things piecemeal.



This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.