Programming, Software and Code

Cheap Subclasses

I had an idea the other night when reading over PDD23. That PDD talks about the intention to have an entire hierarchy of exception types, but then mentions a caveat that having too many types is expensive. That got me to thinking, does it really have to be so expensive to make subtypes?

In Parrot when we create a subtype we first create a new VTABLE struct. This struct contains function pointers to all the VTABLE interface functions, plus a small amount of metadata about the class. The VTABLE structure contains a string that is the class name, and a pointer to the Class or PMCProxy PMC that defines the type. There are several function pointers in the VTABLE structure. On a very quick count tonight it looks like there are about 184 of them, and before the vtable_massacre branch merged there were significantly more. Plus other fields, there are over 200 pointers (or fields with equivalent size) in that structure. It's a huge amount of memory to hold for every type, especially if HLLs are expecting to be able to create large amounts of their own types.

Now, consider a case like what is described in PDD23, where we have several exception subtypes which appear to differ from each other only by name. It's a huge waste to give each of these subtypes it's own 184-pointer VTABLE structure, when they are all going to be mostly identical. It's absurd to do it that way, and this is probably a big reason why we don't support the subtypes as described in PDD23.

Consider now the case of user-defined classes and subclasses. This is, I suspect, the largest set of types for most applications. Every PIR-defined object type is an Object PMC, which means the VTABLE structure in C for every user-defined type is 99% identical to the VTABLE structure of Object. All the function pointers, all 184 of them, are identical. The associated NameSpace PMC (after chromatic's refactor the Class PMC instead) contains a list of all the :vtable and :method Sub PMCs. The VTABLEs in Object all search the NameSpace for an override and then launch that override if provided. So for types defined in PIR, we don't need the whole VTABLE struct: just the pointer to the Class PMC that contains the info. We can point the VTABLE pointer to Object's VTABLE and use it without needing an expensive copy.

Instead of creating a Class PMC and a VTABLE structure with over 200 pointers, we only define the Class and the handful of defined overrides that we already define anyway. This is significant memory savings for applications that define many types.

There are two options to implement this kind of idea:
  1. Add a PMC* pointer to every PMC that points to the Class or PMCProxy object that controls it. This could create a mess in GC if Class and PMCProxies weren't marked constant.
  2. Define a new "PMCType" structure. PMCType would contain pointers like a string name, a Class PMC pointer, and maybe a VTABLE pointer. If we add this structure, PMCs get larger by one pointer. If we replace the VTABLE struct and include a pointer to a VTABLE in the PMCType, we have to suffer an additional pointer dereference per VTABLE call (with opportunities to cache).
So this system is not without it's tradeoffs, but with this in place we gain the ability to define large numbers of cheap subclasses of built-in types like what is specified in PDD23, but we also significantly simplify the process of creating new classes in PIR and reduce the amount of memory required for each type.

This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.