ask pig about pig -> 162,000 ops


Great, great progress this past week on both the display system and optimizations. A few games were showing visual artifacts when drawing in the status line area, and that is now resolved.

The more important achievement this week was in the realm of optimizations.

Why z8 games are slow

To investigate why it takes so long to get a response from Lost Pig after entering commands, I logged the full list of z-machine opcodes executed from <ENTER> to the game's response. I then looked through the opcodes the game called, got a sense of which ones were called more or less frequently, and targeted those areas for optimization.
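
The tallying step can be sketched in a few lines of plain Lua. This is a hypothetical sketch, not the interpreter's logging code: it assumes the log has already been reduced to a list of opcode names, and `tally` and `share` are illustrative names.

```lua
-- Hypothetical sketch: count how often each opcode appears in a logged
-- trace. Assumes the log has been reduced to a list of opcode names.
local function tally(trace)
  local counts, total = {}, 0
  for _, name in ipairs(trace) do
    counts[name] = (counts[name] or 0) + 1
    total = total + 1
  end
  return counts, total
end

-- Percentage of all logged calls that were the opcode `name`.
local function share(counts, total, name)
  if total == 0 then return 0 end
  return 100 * (counts[name] or 0) / total
end

-- Usage with a tiny made-up trace (not real Lost Pig data)
local trace = {"je", "loadw", "je", "add", "jz"}
local counts, total = tally(trace)
print(total)        -- 5
print(counts.je)    -- 2
```

Here `share(counts, total, "je")` gives 40, meaning 40 percent of this toy trace. Running the same tally over a real trace is how the most-called opcodes would stand out.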

I stumbled upon the very simple command `ask pig about pig`, which seemed to bring the system to its knees. It took some 30 seconds for a response to come back.

Imagine my shock to find that this simple command near the game's start triggered an avalanche of opcode calls: 162,000+! In fairness, other prompts triggered only about 20,000 opcode calls. Compare this with Planetfall.z3, where `ambassador, tell me about brochure` triggers about 1,000 opcode calls. So in this very informal check, a z8 game seems to request 20x to 160x more work from the system.

1 z-machine opcode == many Pico-8 calls

One thing to keep in mind is that the z-machine is a virtual machine. The opcodes are simply requests for work to be done by the system. It is up to the developer of the interpreter to decide how best to perform that work. Let's look at one simple z-machine opcode:

je a [b1 b2 b3] — 2OP:$1
Branch if a is equal to at least one of the other operands.

So we receive a list of numbers, and we must compare `a` with the others. If any of them match, we `branch`, sending the program off to do something else. It's like `if (a == b1) then do one thing, otherwise do another thing`.

So, first we have to implement a way to check the values for equality. Because Pico-8 has a strict limit on a program's code size, I had implemented a function that checks whether a value is in a set of other values (put a pin in that). So we call the function `is_member`, which was implemented using Pico-8's `deli()` function, and we get back true or false.

Now that we know true or false, we have to do something with that knowledge. The documentation says to `branch`, so I call the branch function I wrote. The branch function has to do quite a bit of work: fetching values from memory, breaking them apart into individual bits, evaluating those bits, and possibly setting values into memory. So the `branch()` function itself calls other functions before it finally finishes the original request for `je`.
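
To show why branching involves so much bit work, here is a hypothetical plain-Lua sketch of branch handling as laid out in the Z-Machine Standards Document (section 4.7), not the author's `branch()`. Bit 7 of the first branch byte says whether to branch on true or false, bit 6 selects a one-byte 6-bit offset or a two-byte 14-bit signed offset, offsets 0 and 1 mean "return false" and "return true", and any other offset jumps relative to the end of the branch data. The `mem` table of byte values, `decode_branch`, and the `"jump"`/`"ret"` results are all assumptions made for illustration. It uses `%` and multiplication rather than bitwise operators so it runs unchanged across Lua versions.

```lua
-- Hypothetical sketch of Z-machine branch handling (Standards Document,
-- section 4.7). `mem` maps addresses to byte values (0-255); all names
-- here are illustrative.

-- Read the branch data starting at `pc`.
-- Returns: branch polarity, signed offset, address just past the data.
local function decode_branch(mem, pc)
  local b1 = mem[pc]
  local on_true = b1 >= 0x80          -- bit 7: branch if condition is true
  local offset, len
  if b1 % 0x80 >= 0x40 then           -- bit 6 set: one byte, offset 0-63
    offset = b1 % 0x40
    len = 1
  else                                -- bit 6 clear: 14-bit signed offset
    offset = (b1 % 0x40) * 0x100 + mem[pc + 1]
    if offset >= 0x2000 then offset = offset - 0x4000 end
    len = 2
  end
  return on_true, offset, pc + len
end

-- Decide what happens after a test opcode such as je produced `cond`.
-- Returns "jump", new_pc  or  "ret", value (0 = rfalse, 1 = rtrue).
local function branch(mem, pc, cond)
  local on_true, offset, next_pc = decode_branch(mem, pc)
  if cond ~= on_true then
    return "jump", next_pc            -- not taken: continue after branch data
  end
  if offset == 0 then return "ret", 0 end
  if offset == 1 then return "ret", 1 end
  return "jump", next_pc + offset - 2
end

-- Usage: 0xC5 = branch on true, one-byte form, offset 5
local mem = {[100] = 0xC5}
print(branch(mem, 100, true))   -- "jump", 104
print(branch(mem, 100, false))  -- "jump", 101
```

A real interpreter would also perform the routine return for the `"ret"` cases, which is where values get written back into memory.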

The accumulation of small improvements

In my initial evaluations of Lost Pig's opcode calls, I found `je` to be far and away the most-called opcode. In a simple input/response sequence, `je` accounted for about 10% to 14% of the calls. Anything we can do to make `je` faster should pay accumulated dividends over the course of gameplay.

So I rethought the helper function `is_member` and its reliance on `deli()`. Function calls cost performance in Pico-8. In this case I was packing the values passed to `je` into a table, calling `is_member()`, which called `deli()`, and then returning the result back up the chain. But if we're being honest with ourselves, all we actually need is `if a == b1 or a == b2 or a == b3 then`. This proved to be measurably faster.
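
To make the difference concrete, here is a hypothetical plain-Lua sketch of both styles. It is not the interpreter's code: this `is_member` searches with a loop rather than `deli()`, and `je_helper` and `je_inline` are illustrative names. Both give the same answer; the helper version additionally builds a table, makes an extra call, and iterates on every `je`, which is the overhead described above.

```lua
-- Hypothetical sketch contrasting the two approaches for je.

-- Helper style: pack the operands into a table, then search it.
-- Every je pays for a table allocation, an extra call, and a loop.
local function is_member(v, list)
  for _, x in pairs(list) do        -- pairs skips nil (unused) operands
    if x == v then return true end
  end
  return false
end

local function je_helper(a, b1, b2, b3)
  return is_member(a, {b1, b2, b3})
end

-- Inline style: compare directly, with no table and no extra call.
-- Unused operands are nil, and a number never equals nil.
local function je_inline(a, b1, b2, b3)
  return a == b1 or a == b2 or a == b3
end

-- Usage: both agree; only the cost differs
print(je_inline(7, 3, 7))        -- true
print(je_helper(7, 3, 4, 5))     -- false
```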

The main lesson learned during this evaluation/improvement process has been that the trade-off between writing a helper function and simply doing the work immediately inline in the code can make a difference in Pico-8 development. It does risk making the code a little harder to follow, but I could verify that the improvements made a tangible difference to Lost Pig's performance.

Where `ask pig about pig` previously took about 30 seconds, it now takes about 17 seconds, with text appearing on screen after 9 seconds. For comparison, I tried Lost Pig on the C128 in the amazingly optimized Ozmoo. The C128 takes over 2 minutes to respond to that prompt. So 17 seconds suddenly doesn't feel like an obscene amount of time to wait, given the limitations of the system.
