IO Work Proceeds, with Questions



Earlier this week I kicked off work on a revamp of the Parrot IO system in the io_rewiring branch. The purposes of the branch are multifold:
  1. To improve speed, by decreasing reliance on PCCINVOKE calls to perform IO operations
  2. To improve flexibility, by realizing that IO-related PMCs do not all hold a common interface
  3. To make basic IO PMC types properly subclassable
The io_rewiring branch is being used primarily for tinkering right now. We still have a lot of questions open as to how we are going to fix everything. However, we have some general concepts that we are trying:
  1. IO API functions like Parrot_io_* are now called by the methods in IO PMCs, instead of the other way around. Most IO operations now, except those called from methods in PIR directly, do not use PCCINVOKE.
  2. IO PMCs subscribe to a number of roles that determine what operations they can and cannot be expected to do. A PMC that does "file" can 'seek' but cannot 'connect'. A PMC opened for writing only does "write", but does not do "read".
  3. The PMC inheritance hierarchy is getting a little bit more sane. Pipes are not going to be FileHandles with a special flag set anymore. Sockets are not subclassed FileHandles either. At the moment, all IO objects are derived from a Handle type, but this may change. The important part is that each of them does "IO".
  4. Buffering logic is (probably) being abstracted into a separate PMC type somewhere.
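
To make the role idea in points 2 and 3 concrete, here is a minimal sketch in Python rather than PIR or C. Everything in it is an assumption for illustration: the `does` method, the `roles` sets, and the role names "pipe" and "socket" are hypothetical and are not taken from the branch. Only the roles "IO", "file", "read" and "write" and the type names Handle, FileHandle, Pipe and Socket come from the description above.

```python
# Hypothetical model of role-based IO handles; not Parrot's actual API.

class Handle:
    """Base type: every IO object does "IO"."""
    roles = frozenset({"IO"})

    def __init__(self, mode="r"):
        # Roles that depend on how the handle was opened.
        extra = set()
        if "r" in mode:
            extra.add("read")
        if "w" in mode:
            extra.add("write")
        self._mode_roles = frozenset(extra)

    def does(self, role):
        # Collect class-level roles along the inheritance chain,
        # then fall back to the per-instance roles set by the mode.
        for cls in type(self).__mro__:
            if role in getattr(cls, "roles", ()):
                return True
        return role in self._mode_roles


class FileHandle(Handle):
    roles = frozenset({"file"})      # seekable (assumed meaning)


class Pipe(Handle):                  # no longer a FileHandle with a flag
    roles = frozenset({"pipe"})      # hypothetical role name


class Socket(Handle):                # not a FileHandle subclass either
    roles = frozenset({"socket"})    # hypothetical role name


log = FileHandle(mode="w")
print(log.does("IO"), log.does("file"), log.does("write"), log.does("read"))
# prints: True True True False
```

The point of the sketch is only that role membership, not the class hierarchy, answers the question "can this handle do X?".
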
So my goal for this branch, in a nutshell, is to break a little bit of encapsulation for major gains in performance, flexibility, and subclassability. Instead of blindly calling a method to perform an operation, we check the roles of the PMC and, depending on which roles it implements, access attributes and methods as necessary.

The reason I want to try this approach is simple: We have a series of PMCs, each with an internal implementation and an external-facing API. The problem is that the PMC types are not all going to have the same API: Not all PMCs implement 'seek', 'connect' or 'fcntl', for example. So it doesn't make sense for all PMCs to blindly implement these methods, or to carry a million API methods to satisfy every need of every potential type.

So the question comes down to this: Do we maintain a large standard API with more methods than any one PMC type needs, do we maintain a small standard API and shoehorn all IO PMCs into it, or do we not maintain any specific API, and assume all PMCs of a particular type share common internals that we can poke into directly?

And we're not really poking into the "internals". We're using named attributes, which are easy to subclass from PIR and are being treated as public fields. We're also trying to use VTABLEs where appropriate. In fact, I have a major complaint about the massive overabundance of arithmetic-related VTABLEs as compared to the dearth of IO-related VTABLEs, but that's a different rant for a different day. If we make the rule that we only access VTABLEs and named attributes, and if we properly document which VTABLEs and attributes each IO PMC role requires, I think this method should be fine.
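
Here is a rough Python sketch of that rule, under stated assumptions: an API-level function checks a role and then touches only documented named attributes, never a method on the handle. The function name `io_seek`, the attributes `size` and `position`, and the `TempFile` subclass are all hypothetical, invented for illustration, and are not names from the branch.

```python
# Hypothetical sketch; none of these names come from the Parrot source.

class Handle:
    roles = {"IO"}

    def does(self, role):
        return role in self.roles


class FileHandle(Handle):
    roles = {"IO", "file"}

    def __init__(self, size):
        # Named attributes, treated as public fields by the API layer.
        self.size = size
        self.position = 0


class Socket(Handle):
    roles = {"IO", "socket"}


class TempFile(FileHandle):
    """A user subclass: inherits the named attributes, adds one of its own."""

    def __init__(self, size, label):
        super().__init__(size)
        self.label = label


def io_seek(handle, offset):
    """API-level seek: check the role, then use named attributes only."""
    if not handle.does("file"):
        raise TypeError("handle does not do 'file', cannot seek")
    if not 0 <= offset <= handle.size:
        raise ValueError("offset out of range")
    handle.position = offset
    return handle.position


scratch = TempFile(100, "scratch")
print(io_seek(scratch, 10))   # prints: 10
```

Because `io_seek` depends only on the "file" role and the two documented attributes, the subclass works without overriding anything, while `io_seek(Socket(), 0)` raises `TypeError` instead of calling a method the socket never needed to implement.
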

Whether the work in this branch ever satisfies all my goals or not, and if so whether we get community approval to merge it into trunk, is up in the air. It certainly is fertile ground for exploration, however, and I'm taking the opportunity to explore in great detail.

This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.