Immutable Strings branch performance
12 May 2010Parrot now uses immutable strings internally for it's string operations. In a lot of ways this was a real improvement in terms of better code and better performance for many benchmarks. However, many HLL compilers, specifically NQP and Rakudo suffered significant performance decreases with immutable strings. Why would this be?
It turns out that Immutable strings are great for many operations. Copy operations are cheap, maybe creating a new STRING header to point to an existing buffer. There's no sense to actually copy the buffer because nobody can change it. Substrings are likewise very cheap, consisting of only a new STRING header pointing into the middle of an existing immutable buffer.
Some operations however are more expensive. A great example of that is string appends. When we append two strings, we need to allocate a new buffer, copy both previous buffers into the new buffer (possibly translating charset and encoding to match) and creating a new header to point to the new buffer. With the older system of COW strings, appends were far less expensive, and many pieces of code--especially NQP and PCT code generators--used them in large numbers.After the switch to immutable strings, any code that was optimized to use lots of cheap appends began to take huge amounts of time and waste lots and lots of memory.
The solution to these problems is not to use many bare append operations on native strings, but instead to create a new StringBuilder PMC type. StringBuilder stores multiple chunks of strings together in an array or tree structure, and only coalesces them together into a single string buffer when requested. This allows StringBuilder to calculate the size of the allocated string buffer only once, only perform a single set of copies, not create lots of unnecessary temporary STRING headers, etc.
Several contributors have worked on a branch, "codestring", for this purpose, and some results I saw this morning are particularly telling about the performance improvements it brings. Here's numbers from the benchmark of building the Rakudo compiler:
JimmyZ: trunk with old nqp-rx real:8m5.546s user:7m37.561s sys:0m10.281s
JimmyZ: trunk with new nqp-rx real:7m48.292s user:7m11.795s sys:0m10.585s
JimmyZ: codestring with new nqp-rx real:6m58.873s user:6m22.732s sys:0m6.356s
The "new nqp-rx" he's talking about there is nqp-rx modified to make better use of the new CodeString and StringBuilder PMCs in the branch. The numbers are quite telling, build time is down about 12%. I think the finishing touches are being put on, and the branch will be merged back to trunk soon.
This entry was originally posted on Blogger and was automatically converted. There may be some broken links and other errors due to the conversion. Please let me know about any serious problems.