Whiteknight: NameSpace and Other Cleanups

NameSpaces hold methods, but they aren’t supposed to. As far as the user is concerned, Methods are not stored in the NameSpace. However, we don’t really have any other place for IMCC to store them right now. IMCC creates NameSpaces in response to the .namespace directive, and stores Sub PMCs in the most recently declared namespace as they are defined. This includes normal Subs and those marked :method. This is a convenience: we need to store methods somewhere until the classes get created at runtime. It’s also good, at least so far as we have thought it out, to store the methods together in a group somewhere so that when we create the Class at runtime we can import that whole group of methods in one big swipe. It’s more efficient to do it all at once than to load each new method individually.

The big problem is that, according to the user, methods aren’t supposed to be held in the namespace. This means if we do something funny, like define a normal subroutine with the same name as a method, the NameSpace needs to flag the two values, keep them separate (though with the same name!), and when the Class asks for the methods we need to return only the method, but when anybody else asks we need to return the Sub instead. Magic!

Actually, in this case, the “magic” looks like really really ugly code. If you haven’t had your daily dose of ugliness today, take a look at Parrot’s src/pmc/namespace.pmc file. Or, look in src/oo.c, the file where much of the coupling between the Class and NameSpace PMCs happens. Do it. I dare you.

NameSpaces are ugly. I’ve had them on my hitlist of items that need a major redesign and major refactor for a long time. Unfortunately, NameSpaces aren’t the underlying problem. PIR is. And IMCC is.

Originally there was PASM, and PASM was good. PASM is a low-level assembly language that has almost a 1:1 correspondence between instruction mnemonics and the underlying VM ops. It isn’t perfect correspondence because our PCC system is so convoluted that it’s nearly impossible for a normal human to code a subroutine invocation by hand, but it was very close to the ideal assembly language. However, people didn’t want to write code in PASM, but they wanted to bootstrap their compilers so that the compilers ran on Parrot and generated code that also ran on Parrot. Unfortunately impatience got the best of people. Instead of using some other language to bootstrap a stage zero compiler, then using that to write a nicer stage one compiler in a better language, PASM was made more “usable” and thus PIR was born. Why write two stages of a compiler when we can write just one? And why use some other language when we can change PIR to be what we need it to be?

Look at Winxed: it uses a pared-down stage zero compiler written in C++ to bootstrap with. This C++ compiler isn’t glamorous or fancy, but it gets a decent subset of the Winxed language available for use. Then, the “real” compiler, the stage one compiler, is written in Winxed itself. This is what people should have done in the beginning: PASM should have been the code that was generated by compilers, not the code that the compiler was written in directly. Humans shouldn’t be writing PASM. We all know that. Every time I even think about writing PASM directly I throw up a little bit in my mouth. It’s a built-in reflex.

But when it comes to PIR, not everybody seems to realize that it’s just as bad. It’s like saying “I’m not interested in being romantic with a farm animal, but if you put a dress on it and some lipstick…”. The answer is still “no”.

PASM represents a series of capabilities. PASM is the stuff that Parrot can do. PIR adds on to that all sorts of semantics, which says nothing about the haphazard way that those new features and semantics were added over time. Look at the way NameSpaces interact with Classes. Look at the way NameSpaces interact with Methods. Look at how Methods differentiate themselves from ordinary Subs, even though they are both the same exact type of PMC. These are all PIR semantics which have enforced themselves on the underlying objects.

Think about what the difference is between an ordinary Sub PMC and a Sub PMC that has been marked with :method. If it has that flag, the Sub can’t be stored normally in a NameSpace (not even as a global data item for use outside of normal method lookup and dispatch) and IMCC automatically inserts instructions to pull the first argument to the function out into a variable named “self”. And then people say that we don’t all like the name “self”, and would like the ability to give it an arbitrary name by adding an :invocant flag to an ordinary PMC parameter. People also say that methods shouldn’t be stored in the NameSpace by default, but then we need to add a :nsentry flag to force IMCC to insert it into the NameSpace anyway. Read that again. We add a lot of really messy special-case logic to the NameSpace PMC to prevent methods from being automatically stored there, then we bolt on a new flag to PIR syntax to force IMCC to store some methods there anyway. I can’t make this stuff up. Instead of adding more magic, more flags, and more semantics to the mix, why don’t we start ripping the bad crap out? What if we don’t store anything in a namespace by default, and give the user a simple interface to store anything in there that they want? What if we let the user do what they need to do, instead of providing a bad default behavior and then needing to provide a lot of hack-job tools to work around those bad defaults?

I’m happy to write about the problems with PIR and IMCC. I’m happy to write about these things at length. The better use of time is in thinking about what we want to replace these things with. We need to answer two important questions: What do we want the “perfect future parrot” to look like? How do we get from here to there?

Let’s start at the base: the object model. We want 6model in Parrot, although some of the necessary changes could be done with even a modest refactor of our current object system. At the base, what we need is a metaobject to which methods and attributes can be added. An add_method method on the metaobject should be sufficient. We don’t need to group the methods anywhere at compile time. At least, we don’t have to if we don’t want to. Eventually we want to have the tools so a user can assemble and store any arbitrary constant PMCs in the packfile at compile time. We just don’t want Parrot to be making those decisions for the users. All we need is the sub name or the sub ID value and we can fetch it directly from the packfile. An HLL can output a sub to run early in the runtime process to create and populate the metaobject. PASM does not need to be aware of it at all. All PASM needs to provide is a syntax for defining Subs, a syntax for getting a static reference to a Sub at runtime, and the syntax necessary to allocate the metaobject and call methods on it. That’s not too bad for requirements so far. Nowhere in this process do we need to instantiate or use a NameSpace.
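To make that concrete, here is a rough sketch in Python rather than in anything Parrot actually runs. Every name in it (MetaObject, PACKFILE_SUBS, init_point_class) is invented for illustration and is not a real Parrot or 6model API. It only shows the shape of the idea: Subs live in a packfile-like table, and an HLL-generated init sub fetches them by ID and adds them to a metaobject, with no NameSpace involved.

```python
# Hypothetical sketch: none of these names are real Parrot or 6model APIs.

class MetaObject:
    """A bare-bones metaobject: a name plus tables of methods and attributes."""

    def __init__(self, name):
        self.name = name
        self.methods = {}
        self.attributes = []

    def add_method(self, name, sub):
        self.methods[name] = sub

    def add_attribute(self, name):
        self.attributes.append(name)

    def find_method(self, name):
        return self.methods.get(name)


# An ordinary sub. The invocant is just the first declared parameter.
def point_to_string(invocant):
    return "(%d, %d)" % (invocant["x"], invocant["y"])


# Stand-in for the packfile: Sub constants keyed by a sub ID (assumed format).
PACKFILE_SUBS = {"Point.to_string": point_to_string}


def init_point_class():
    """The kind of sub an HLL could emit to run early at runtime."""
    meta = MetaObject("Point")
    meta.add_attribute("x")
    meta.add_attribute("y")
    meta.add_method("to_string", PACKFILE_SUBS["Point.to_string"])
    return meta


point_meta = init_point_class()
instance = {"x": 1, "y": 2}
result = point_meta.find_method("to_string")(instance)
```

The design point is that the grouping of methods is the HLL's decision, made in code the HLL generated, not something the assembler does behind the user's back.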

In fact, an HLL compiler should be able to build the metaobject at compile time and serialize the whole thing–methods, attributes, and all–as a single constant in the packfile. That will cut down on all runtime overhead of adding methods to the metaobject that the previous system would require. The good news is that we aren’t too far off from having this capability available in Parrot. Think about this kind of situation seriously. The user can decide at compile time what types to use for things like namespacing and metaobjects, instantiate those in the compiler, serialize them into the packfile, and have them available immediately when the packfile is executed. In many cases, Parrot might not need to provide default types at all, because users can provide things that they need to fit their own semantics much better than Parrot can.

I don’t like magic. I don’t want things to be happening that I can’t see, at least not in an assembly language. In something like Perl6 there are semantics and you expect things like the “self” keyword to work in a method call. In something like PASM you don’t want variables magically appearing, or parameters magically declaring themselves in a way you can’t see. Basically, what I am saying is that we want to get rid of the “self” keyword that PIR has and the “:method” flag along with it. The “:method” flag does two things: it magically prepends a parameter “.param pmc self” to the front of the parameter list for the Sub, and it causes the Sub not to be stored in the NameSpace. If we do what I suggested previously and use the metaobject and static Sub references to work with methods in lieu of storing them in the NameSpace, and we force subs to explicitly declare the invocant in the list of parameters, we can get rid of “:method” (and make a whole class of hard-to-debug errors disappear from PIR/PASM forever).

Let me take a minute to clear up terminology. PASM is at the right level of abstraction, but is missing some important features which have been added to PIR over the years. PIR is far too high level, and needs to have some features removed which are more problematic than they should be. I’m going to call the new language I’m envisioning “PASM” or even “New PASM”, because it is much closer to the former than the latter.

At the PASM level, and even deeper into the guts of Parrot itself, there really should not be a difference between a method and an ordinary subroutine. Invocants are passed at the front of the argument list and pulled off the front of the parameters list. Internally, PCC handles this mostly correctly, although there are still some vestiges of an older system lingering around. As far as anybody is concerned, a Sub is a Sub is a Sub, and it doesn’t matter whether it’s invoked with “o.foo(1, 2, 3)” or “foo(o, 1, 2, 3)” syntax. In fact, I don’t see any reason why PASM would need to support the former. An HLL can easily generate the second form without any hassle, and it will be mostly transparent to the user of the HLL. The big difference is where the Subs are found: Subs found from Class.find_method() method calls will be treated by the HLL like methods. Subs found elsewhere will be treated like ordinary Subs by the HLL. Both are identical in the eyes of Parrot.
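A small Python sketch of the “a Sub is a Sub is a Sub” idea follows. The two lookup tables are stand-ins I made up for illustration (one plays the role of what a find_method call would consult, the other plays some ordinary lookup location); they are not Parrot data structures. The same function object is reachable both ways and is always called with the invocant as the first argument.

```python
# Hypothetical sketch: the tables below are illustrative, not Parrot internals.

def foo(invocant, a, b, c):
    """One sub. Nothing about it says 'method' or 'ordinary subroutine'."""
    return (invocant["name"], a + b + c)


# What a class-level method lookup might consult.
method_table = {"foo": foo}

# Some other place the HLL chose to keep subs.
plain_subs = {"foo": foo}

o = {"name": "o"}

# An HLL would compile "o.foo(1, 2, 3)" down to a lookup plus a plain call.
via_method_lookup = method_table["foo"](o, 1, 2, 3)

# And "foo(o, 1, 2, 3)" down to a different lookup plus the same plain call.
via_plain_call = plain_subs["foo"](o, 1, 2, 3)
```

Only the place where the HLL found the sub differs. The call itself is identical in both cases.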

So let’s rip some stuff out of NameSpace. NameSpaces don’t need to store methods anymore, so rip out all that logic. NameSpaces need to store two things: References to other namespaces (to form a traversable tree), and references to “global” data items like variables and subroutines which are stored in the namespace. NameSpaces do not need to store a reference to a Class of the same name unless the user explicitly stores the Class in the NameSpace. Likewise, since the Class doesn’t need to pull methods out of the NameSpace anymore, it doesn’t need a reference to the NameSpace. Cut the cord. Separate the two objects forever more. Quoth the Parrot, forevermore.
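Stripped down that far, a NameSpace is not much more than a tree of dictionaries. Here is a minimal Python sketch under that assumption. The class and method names (add_namespace, set_global, find) are mine, chosen for illustration, and do not match the real NameSpace PMC interface.

```python
# Hypothetical sketch: not the real NameSpace PMC interface.

class NameSpace:
    """Holds exactly two things: child namespaces and global data items."""

    def __init__(self, name):
        self.name = name
        self.children = {}      # name -> NameSpace, forming a traversable tree
        self.global_items = {}  # name -> any value the user chose to store

    def add_namespace(self, child):
        self.children[child.name] = child
        return child

    def set_global(self, name, value):
        self.global_items[name] = value

    def find(self, path):
        """Walk a list of names: all but the last are namespaces."""
        ns = self
        for part in path[:-1]:
            ns = ns.children[part]
        return ns.global_items[path[-1]]


root = NameSpace("root")
math_ns = root.add_namespace(NameSpace("Math"))
math_ns.set_global("answer", 42)
found = root.find(["Math", "answer"])
```

There is no method table, no Class reference, and no special casing. If the user wants a Class reachable from here, they store it with set_global like any other value.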

Remember the :nsentry flag? Kill that too. If you want a function to be stored in a namespace, call the appropriate method or opcode and add it there yourself. Get rid of the .namespace directive. You can create namespaces at compile time and store them as constants in the packfile, or you can create them at runtime if you want them. Parrot doesn’t need to be creating them for you (and shouldn’t presume to know which type to instantiate if you say “I want a namespace-like object”). If the user doesn’t store the Sub someplace like a NameSpace for easy lookup, it will only be available as a static compile-time constant reference or by looking it up in the packfile object. Since we aren’t blindly storing things in places people don’t want us to, we can kill the :anon flag too. Good riddance.

What are packfiles? As I see them, they are files which contain Parrot constants. Some of those are data constants, others are executable Sub constants with associated bytecode, but at the core it’s all a bunch of constant data generated by a compiler. When we load a packfile into Parrot we thaw all those values and make them usable again.

With that idea in mind, what does PASM of the future look like? Well, we have the ability to declare constants in PASM. Sub constants are obvious. Those are the way Parrot does stuff. But other types of constants should be able to be declared and stored in the packfile too. One function gets marked specially as a “main” function where execution starts, and that function is in charge of looking up references to the necessary constants (subs and data) to continue program flow, etc.

Of course, we don’t know what types of data objects our users are going to want to be storing in their generated bytecode files. There’s no real way for us to add PASM syntax to declare types that we don’t know about yet, or to do it in a way that is going to support all sorts of types that people haven’t even thought about yet. So, maybe we leave that kind of stuff to the HLL developers and stick with using PASM to declare Subs and code literals.

Get 6model into Parrot. Use it to start rewriting built-in types in something higher than C, such as Lorito (if it’s available) or Winxed or NQP. Winxed and NQP might be less desirable than Lorito for certain performance-critical types, but more desirable for some other types. The more that we can move out of C the better I think we will be. Internal types bring overhead, and for types that are not used frequently this overhead is unnecessary for most programs. Rip namespaces apart. Don’t use them to store stuff in. Remove all the piled-on garbage syntax for controlling what does and does not get stored in the NameSpace. Remove all of that. Give users the ability to declare Subs and to get references to them out of the packfile. There is nothing else that Parrot can or should do, because we don’t know what all our users want and can’t possibly provide any behaviors that won’t completely suck for some people. Rip out all the stupid semantics that PIR imposes, and then rip out all the stupid workarounds that users need because the PIR semantics suck so bad.

Get rid of global class storage. We don’t want or need it, and it’s a huge hassle anyway. HLLs should not have to make some kind of a lookup key object and ask Parrot to find the metaobject for them so we can instantiate an object. That’s bullmalarky. The HLL should create and control the metaobject, and Parrot shouldn’t pretend that it can organize them very well. The HLL should have the metaobjects it wants and should organize them in a way it expects. Maybe the HLL will choose to store them globally in a NameSpace or other map type or whatever they want. Parrot shouldn’t be doing that and absolutely should not be organizing classes in a hash by string name. For built-in types maybe we do need a little lookup somewhere so we can get the metaobjects if they exist and create them if they don’t, but that’s only for built-in types.

Currently, we would probably look up a class in a global class registry, or look up the associated NameSpace and then ask the NameSpace for a reference to the Class object it’s associated with (automatically creating a new Class if it can’t be found, and doing the magic dance of pulling the methods from the NameSpace into the new Class). That’s wrong. Classes and NameSpaces shouldn’t be associated together, unless the HLL or the user explicitly wants to work with them together. Parrot shouldn’t be doing that internally. I can’t stress this point enough. I’d say it a few more times, but this blog post is already getting long enough.

I think we can cut out the :vtable flag too. If the metaobject has an .add_vtable() method, we can use that to add them. Of course, there will be far fewer vtables in my hypothetical future parrot. I’ll talk about that idea later.

On the subject of multidispatch, the way we do it now with type name strings is stupid. That’s mostly a symptom of PIRitis. The PIR syntax for multidispatch is of the form :multi(<type name>,<type name>,...). The PIR syntax uses string names, so the underlying mechanism has evolved to use string names too. That’s wrong. Classes might not have unique names. They might not have names at all. Parrot damn sure shouldn’t be managing classes internally anyway, and there won’t necessarily be a good way for Parrot to look up a class by name anyway. MultiSubs should be treated more like “meta-functions”, analogous to how the metaobjects are to objects. At compile time or early at runtime the MultiSub can be created with explicit static sub references and stashed wherever the HLL deems appropriate. The HLL can decide how to specify a signature for the multisub, and should be able to provide its own sorting and dispatch algorithm. I think Parrot can provide a nice default option (or, a library of them!) but we shouldn’t make our way of doing dispatch the only way of doing it.
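Here is roughly what I mean, sketched in Python with invented names (MultiSub, add_candidate, first_isinstance_match are all hypothetical, not Parrot APIs). Signatures are tuples of actual type objects instead of type name strings, and the matching algorithm is a plain function the HLL passes in, so it can be swapped out without touching the MultiSub itself.

```python
# Hypothetical sketch: not Parrot's MultiSub PMC or its dispatch algorithm.

class MultiSub:
    """A 'meta-function': candidate subs plus a pluggable dispatcher."""

    def __init__(self, matcher):
        self.candidates = []    # list of (signature, sub) pairs
        self.matcher = matcher  # HLL-provided: (candidates, args) -> sub or None

    def add_candidate(self, signature, sub):
        self.candidates.append((signature, sub))

    def __call__(self, *args):
        sub = self.matcher(self.candidates, args)
        if sub is None:
            raise TypeError("no suitable candidate found")
        return sub(*args)


def first_isinstance_match(candidates, args):
    """One possible default: first candidate whose types all match, in order."""
    for signature, sub in candidates:
        if len(signature) == len(args) and all(
            isinstance(arg, expected) for arg, expected in zip(args, signature)
        ):
            return sub
    return None


# Signatures hold direct references to types, never their names.
add = MultiSub(first_isinstance_match)
add.add_candidate((int, int), lambda a, b: a + b)
add.add_candidate((str, str), lambda a, b: a + " " + b)

int_result = add(1, 2)
str_result = add("hello", "world")
```

An HLL with different ideas about ordering or ambiguity would supply a different matcher function and keep everything else the same.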

Parrot shouldn’t maintain an internal cache of named multisubs to do dispatch for things like arithmetic and comparative vtables. Cut that out too. If an HLL wants to use a method to test equality, let the HLL generate that code. If the type provides an “equals” vtable/method, maybe fall back to that. If we have to get rid of the various equality ops, let’s do that. If the op is calling a vtable, which falls back to MMD, which invokes a method somewhere, let’s cut out the middle steps and just have the HLL invoke a method to test for equality. It doesn’t make a lot of sense for us to provide an op that takes a circuitous route to invoke a method that the user defined and could have called directly. This is not to mention the fact that far too often doing a simple test for equality between PMCs will lead to a strange error message from the MMD system saying that it can’t find a suitable candidate for signature “PP->I”. What the hell does that even mean, and why is Parrot using MMD to test for equality? Don’t the types involved know how to determine amongst themselves whether they are alike? And if the objects themselves can’t, the metaobjects almost certainly should be able to.

In C#, the base type object has four methods defined on it by default: GetType, Equals, GetHashCode, and ToString. I think we might want a handful more, but not much more. C# is a pretty powerful language, and .NET is a pretty powerful and speedy runtime. I’m not saying it’s the best or that we should want to borrow too many ideas from it, but they are clearly doing something right. Food for thought.

That’s enough for this particular rant. Eventually I’ll be talking about some of these topics in more detail. I’m serious about getting some of these problems sorted out and making Parrot cleaner, nicer, easier to use, and more amenable to HLLs with wildly divergent semantics. We all want those kinds of things, and this is part of how I think we can get there.
