Customizing Eliza8 Scripts


The original ELIZA introduced the concept of a "script", which drives the understood vocabulary and grammar and defines responses to user input. The most famous script was called "DOCTOR", which is used as the default in this port, Eliza8.

However, in the interest of keeping things as simple as possible, I have modified the script format to be easier to read by a human (IMHO). The original format was written in LISP and contained innumerable `(` symbols; very easy to lose track of nesting. That was then modified by a Java port, which introduced a simpler script format, but which (to my mind) contained a lot of redundant information that could be stripped out without upsetting context.

The parts of a script

The file "base_script.txt" shows what the default script is like in plain text. It is easy to see the primary commands to Eliza8, as they appear as a kind of "header" for each section, and end in a ":". The order of the sections is not important, but each section must be fully defined before starting a new section.

colors:

This is unique to Eliza8 for the Pico-8, and is used to color the UI to fit the mood of the script you have written. A horror-themed script might use Halloween colors, for example. It is a comma-separated list of four numbers. Any color from Pico-8's full, extended palette may be used. They must be in the following order:

  1. Eliza8's response text color
  2. The user's input text color, as seen in the conversation
  3. The text input background color (and edge trim for the conversation window)
  4. The primary conversation background and text input color
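
To make the ordering concrete, here is a minimal sketch in Python (not Eliza8's actual Lua code) that reads a colors: value and labels each number by position. The role names and the sample numbers are my own placeholders, and it assumes the four numbers arrive as a single comma-separated string.

```python
def parse_colors(value):
    """Turn a value such as "7, 11, 1, 0" into a dict keyed by UI role."""
    numbers = [int(part.strip()) for part in value.split(",")]
    if len(numbers) != 4:
        raise ValueError("colors: expects exactly four numbers")
    # Hypothetical role names, listed in the order the script requires.
    roles = (
        "response_text",
        "user_text",
        "input_background",
        "conversation_background",
    )
    return dict(zip(roles, numbers))


theme = parse_colors("7, 11, 1, 0")  # arbitrary sample values
```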

greeting:

The very first thing Eliza8 will say on startup.

closing:

What Eliza8 will say when it receives one of the "quit:" keywords.

fallback:

When Eliza8 can't parse the user input, has nothing in memory to talk about, and literally has no other options, this is what Eliza8 will say.

quit:

A comma-separated list of words that can be used to "quit" a session with Eliza8 (although, in truth, Eliza8 just keeps going; it's just for show).

preprocess:

Eliza8 uses the concept of a "weighted keyword" to decide which part of a user's input to respond to. However, Eliza8 cannot understand the concept of a "synonym" in the context of a keyword (although synonyms can be defined and used for another purpose). So this list defines a kind of substitution that should occur before deciding which keyword to act upon.

Preprocessing directives are written one per line, as many as you need, in the form

<user input word> = <eliza8 keyword>

In the default script, this list is used to route synonyms like "chatbot" and "pico-8" to the generic "computer" keyword, and also to break contractions down for verb extraction. In this way "i'm" is first turned into "i am", which gives Eliza8 the ability to recognize the word "am" and act upon it.
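
As a rough illustration of that substitution step, here is a small Python sketch. It is not how Eliza8 itself is written (that is Pico-8 Lua), and the rule table is a hypothetical excerpt modeled on the examples just mentioned. Punctuation handling is left out to keep it short.

```python
def preprocess(text, rules):
    """Replace every input word that has a rule; leave other words alone."""
    words = text.lower().split()
    return " ".join(rules.get(word, word) for word in words)


# Hypothetical rules in the spirit of the default script.
rules = {"i'm": "i am", "chatbot": "computer", "pico-8": "computer"}

result = preprocess("I'm afraid of my chatbot", rules)
```

After this pass the keyword search only has to know about "am" and "computer", not every contraction or nickname.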

postprocess:

At first glance, it is similar to preprocess: above; however, postprocess is used for the final response string Eliza8 says to the user. One of the tricks Eliza8 performs is to extract a slice of the original user input, postprocess words like "my" into "your" and use the user's words against her/him. So, the user says, "I feel scared that I am losing my mind." and Eliza8 can flip that into "Your mind?"

Like preprocess:, you can define as many of these as you need, one entry per line:

<user input word> = <eliza8 transformed word>
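
Here is a minimal Python sketch of that flipping step, assuming a hypothetical rule table. One design point worth noticing: each word is looked up exactly once, so a pair of rules like "i = you" and "you = i" cannot undo each other.

```python
def postprocess(fragment, rules):
    """Swap each word at most once, in a single left-to-right pass."""
    words = fragment.lower().split()
    return " ".join(rules.get(word, word) for word in words)


# Hypothetical rules; the real default script may differ.
rules = {"my": "your", "your": "my", "i": "you", "you": "i"}

flipped = postprocess("i think you stole my hat", rules)
```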

synonyms:

Synonyms are used in grammar pattern matching. Once Eliza8 has extracted the keyword to act upon, there is a list of grammar patterns associated with that keyword that are evaluated. (see "keywords:" below for more information).

Eliza8 basically only "knows" a handful of vocabulary, but synonyms for those words can be referenced during the parsing phase. So, for example, in the grammar pattern "* i am * @sad *" the word @sad is a marker for a potential synonym. Rather than define each and every possible "* i am * depressed *" and "* i am * unhappy *" pattern which all lead to the same response, we can instead list those "sad" synonyms and let the parser match against any of those.

It would be nice to be able to do this for keywords as well, but think of how many combinations of synonyms could be checked at any given time. We have to check every word the user inputs, check every synonym, check every combination of synonyms, and this spirals out of control rapidly. Rather, we use preprocess: to simplify the language parsing task as much as possible first, then use synonyms sparingly once we're close to generating a response.

These are defined as

<word> = <list, of, comma, separated, synonyms>
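
A small Python sketch of reading one such line follows. The sample line is made up for illustration and the function name is hypothetical; Eliza8's own parser is written in Lua and may handle edge cases differently.

```python
def parse_synonym_line(line):
    """Split "word = a, b, c" into the word and a list of its synonyms."""
    word, _, rest = line.partition("=")
    synonyms = [item.strip() for item in rest.split(",") if item.strip()]
    return word.strip(), synonyms


word, synonyms = parse_synonym_line("sad = unhappy, depressed, miserable")
```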

keywords:

Here's the real magic of Eliza8. This is where we define which specific keywords we want Eliza8 to understand, which keywords are more/less important than others, the sentence patterns in which that keyword might be used, and a list of responses to that usage. This includes the ability to use synonym substitutions, to forward the response handling to another keyword, or to even extract a piece of the user's input and use it in the response.

Keyword definitions are written in the format

keyword <your keyword> <optional keyword weight>
     pattern <your pattern>
          <response pattern 1>
          <response pattern 2>
          etc...

A keyword is...

Each keyword definition must start with the actual word "keyword." Immediately after that is the keyword you are defining. Immediately after that is an optional "weight" for the word. The higher the weight, the more "important" the keyword becomes to Eliza8 in deciding how to respond to the user. If no weight is given, the default of "1", the lowest possible weight, is used. Notice in the example script, "computer" has a weight of 50. This pretty much assures that any mention of a computer in the user's input will be the thing Eliza8 responds to. Keywords of equal weight will be evaluated in the order they are defined in the script.
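
The ranking rule can be sketched in a few lines of Python. This is only an illustration of the idea, not Eliza8's code; apart from "computer" at 50, the keywords and weights below are invented. Python's sorted() is stable, which gives the "equal weights keep script order" behavior for free.

```python
def rank_keywords(words, keywords):
    """Return the keywords found in the input, heaviest first.

    keywords is a list of (keyword, weight) pairs in script order.
    """
    found = [(kw, weight) for kw, weight in keywords if kw in words]
    # Stable sort: ties stay in the order the script defined them.
    ordered = sorted(found, key=lambda item: -item[1])
    return [kw for kw, _ in ordered]


# Hypothetical keyword table, in script order.
keywords = [("sorry", 1), ("remember", 5), ("computer", 50), ("dream", 5)]

words = "i remember my dream about a computer".split()
ranked = rank_keywords(words, keywords)
```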

A pattern is...

Next, we define one or more grammar patterns for this keyword. This line must begin with the actual word "pattern" followed immediately by a single grammar pattern. A pattern is composed of a number of potential components:

  • * : a wildcard representing 0 or more words, any words
  • literal text: actual words that must appear in the user's input
  • @<word>: a synonym, as defined in "synonyms:" above
  • $: as the first character in a pattern, indicates the response should be "memorized"

So, let's look at the example grammar pattern "* i am * @sad *". This will match any of the following user inputs:

  • Recently, I am very unhappy about work.
  • I am quite depressed today.
  • I am sad.

Adding "*" here and there does not hurt, and in fact adds flexibility to Eliza8's ability to match a variety of similar sentence structures. However, the system's stupidity is also revealed when the user says things like, "I am not sad today." "not" would simply be "the wildcard match" and this default script cannot parse that it is negating the sadness. A more clever script potentially could, I believe.
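
To show how a pattern like this can be tested against input, here is a minimal word-by-word matcher in Python. It is a sketch under my own assumptions, not Eliza8's parser: input is lowercased and stripped of punctuation, the "$" prefix is ignored, the synonym list is hypothetical, and "@sad" is taken to match the word "sad" itself as well as its synonyms (as the "I am sad." example suggests).

```python
import re


def match_tokens(pattern, words, synonyms):
    """Return True if the list of words fits the list of pattern tokens."""
    if not pattern:
        return not words
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Let the wildcard swallow 0, 1, 2, ... words and try the remainder.
        return any(
            match_tokens(rest, words[i:], synonyms)
            for i in range(len(words) + 1)
        )
    if not words:
        return False
    if head.startswith("@"):
        root = head[1:]
        allowed = {root, *synonyms.get(root, [])}
        return words[0] in allowed and match_tokens(rest, words[1:], synonyms)
    return words[0] == head and match_tokens(rest, words[1:], synonyms)


def matches(pattern, text, synonyms):
    """Tokenize both sides, then run the recursive matcher."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return match_tokens(pattern.split(), words, synonyms)


synonyms = {"sad": ["unhappy", "depressed"]}  # hypothetical list
pattern = "* i am * @sad *"
```

With this sketch, matches(pattern, "I am not sad today.", synonyms) is also true, which is exactly the weakness described above: "not" is just more wildcard filler.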

Prefixing "$" does not affect the pattern match whatsoever. However, what it *does* do is tell Eliza8 to commit the response to "memory" and not show that response to the user immediately. After committing this to memory, the *next* best keyword and pattern match will be evaluated and shown to the user. This is one of Eliza8's neatest tricks that really breathes life into the conversation. In doing this, the next time there is a situation where Eliza8 doesn't know how to parse a useful keyword out of the user input, a "memorized" response will be shown instead. This has the effect of "returning to an earlier topic" and gives the illusion that Eliza8 is paying attention to the conversation.
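
The memory trick can be pictured as a simple queue. The Python sketch below is only a guess at the shape of it: the class and method names are hypothetical, and the oldest-first recall order is my assumption, since the order Eliza8 actually uses is not described here.

```python
from collections import deque


class Memory:
    """Holds "$" responses until the conversation runs dry."""

    def __init__(self, fallback):
        self.saved = deque()
        self.fallback = fallback

    def remember(self, response):
        # Called when a "$" pattern matches: store it, do not show it yet.
        self.saved.append(response)

    def recall(self):
        # Called when no keyword matches: oldest memory first (assumed),
        # or the fallback: line when memory is empty.
        return self.saved.popleft() if self.saved else self.fallback


memory = Memory(fallback="Please go on.")
memory.remember("Earlier you mentioned your brother.")
first = memory.recall()
second = memory.recall()
```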

A response is...

Now that Eliza8 has found a keyword of interest and a pattern match, what should Eliza8 say to the user? You can define any number of responses. Eliza8 will use them, in order, as necessary throughout a conversation, so as not to repeat a previous answer too quickly. A response is composed of a few different parts, to be used at your discretion.

  • literal text: Just some words to say, in the way you want them said.
  • (#): A number representing a piece of the user's original input to be quoted back in the response.
  • goto <keyword>: The literal word "goto" followed by one of the defined keywords.

While you could certainly just write out explicit sentences as literal text, the magical part of Eliza8's response system comes from the "(#)" marker. When a pattern is matched, each piece of the pattern is internally assigned a number from 1 to x. Consider the pattern we used earlier, "* i am * @sad *". This pattern is broken down into 4 (four) smaller pieces of text, numbered as follows:

  1. The text before "i am"
  2. The text between "i am" and "@sad"
  3. The synonym for "@sad"
  4. The text that comes after "@sad"

In this case "i am" is thrown away, because it was only used as the keyword trigger to find the response. In other words, the responses already take that "i am" into account in their wording.

In a response, such as "I am sorry to hear that you are (3)", the user input for #3 above, in this case the synonym for "@sad" will be substituted into Eliza8's response. If the user had written, "I am depressed." this lets Eliza8 respond, "I am sorry to hear that you are depressed."
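
The substitution itself is easy to sketch in Python. This is an illustration rather than Eliza8's implementation; it assumes the four captured pieces are already available as a list, and the function name is made up.

```python
import re


def fill_response(template, parts):
    """Swap each "(n)" marker for the n-th captured piece (1-based)."""

    def swap(found):
        return parts[int(found.group(1)) - 1]

    return re.sub(r"\((\d+)\)", swap, template)


# Pieces captured from "I am depressed." by "* i am * @sad *":
# nothing before "i am", nothing in between, the synonym, nothing after.
parts = ["", "", "depressed", ""]

reply = fill_response("I am sorry to hear that you are (3).", parts)
```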

The "goto <keyword>" is really more of a scripting convenience. Note the responses for foreign languages. They all start with "goto xforeign" and are then followed up with a language-specific response. So, when Eliza8 is hit with the word "francais" anywhere in the sentence, the first "goto" will defer evaluation to the keyword "xforeign." If "francais" comes up again, then the "I told you, I don't speak French." is used. If it comes up *again* then the cycle begins again, and so on. This allows the scripter to handle common cases in a more generalized way.

Had we used preprocess: or synonyms: to consolidate all foreign language responses, Eliza8 would never have an opportunity to distinguish between the languages and give a unique response. "goto" lets us combine some of the benefits of synonyms with the benefits of unique responses.
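
Here is a small Python sketch of that cycle. It simplifies heavily: each keyword has one flat list of responses (patterns are left out), the per-keyword counter is my own device, and the "xforeign" reply text is invented. Only the "francais" lines follow the example described above.

```python
def respond(keyword, script, counters):
    """Pick the keyword's next response in order, following any goto."""
    responses = script[keyword]
    index = counters.get(keyword, 0)
    counters[keyword] = index + 1
    choice = responses[index % len(responses)]  # wrap around at the end
    if choice.startswith("goto "):
        return respond(choice[len("goto "):], script, counters)
    return choice


script = {
    "xforeign": ["I speak only English."],  # hypothetical wording
    "francais": ["goto xforeign", "I told you, I don't speak French."],
}

counters = {}
replies = [respond("francais", script, counters) for _ in range(3)]
```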

Loading a custom script

Your script must be a raw text file: no PDF, RTF, DOC, etc. Just a TXT file. Lines must be as you see in the sample script, essentially one directive per line. White space at the beginning of a line is ignored and is included in the sample simply to make it easier to edit and read. Feel free to use tab indentation to organize your script. Feel free to insert extra blank lines between directives to visually separate information.
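
Those layout rules (headers ending in ":", leading whitespace ignored, blank lines allowed) are enough to split a script into sections. The Python sketch below shows one way to do it; it is not Eliza8's loader, the sample text is invented, and it would misread any ordinary line that happens to end in ":".

```python
def split_sections(text):
    """Group the non-blank lines of a script under their "name:" headers."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # leading/trailing whitespace is cosmetic
        if not line:
            continue  # blank lines only separate things visually
        if line.endswith(":"):
            current = line[:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Hypothetical two-section script.
sample = """
greeting:
    Hello. What is on your mind?

quit:
    bye, goodbye, quit
"""

sections = split_sections(sample)
```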

Once you have your script, simply launch Eliza8. After Eliza8 greets you, just drag and drop your script into the Eliza8 window. You will be told that the script is loading, then you should be shown your greeting, as defined by your script. The script is now loaded and ready to go.

Making a default script

To do this, you must own a licensed copy of Pico-8 and use the Pico-8 .p8.png file. Load the program into Pico-8 memory and enter the code editor. Inside is a variable called "script". Simply copy-paste your custom script and replace everything between "[[" and "]]". Save your custom Eliza8 to keep this as your personal default script.

A note on Pico-8 boundaries and Eliza8

Compared to the machines that built the original ELIZA, Pico-8 feels absolutely unbounded. However, there are things to keep in mind. There are two ways to bring a script into Eliza8.

  • Use standard Eliza8 then drag-and-drop your custom script into the running program.
  • Embed the script into the .p8 file and save that as a custom modded version of Eliza8

It is important to remember the limitations of the Pico-8 system when considering which to use. The .p8 file system itself has limitations on tokens, characters, and "compressed size" of the cartridge. Also, Pico-8 has a 2MiB memory limitation.

The default script is about 13,000 characters (13K), which includes all of the whitespace for formatting (unnecessary) and newlines (necessary). The current .p8 cartridge has the following values:

  • Tokens: 2369 / 8192
  • Chars: 30744 / 65535
  • Compressed: 9720 / 15616

As you can see, the complete program is well within .p8 boundaries and there is a lot of room to grow the script, or add your own interface, maybe even drop Eliza8 into some game environment? So, if your script is embedded, you will have to think about the character limitation of the .p8 file system.
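
For a back-of-the-envelope check before embedding, the figures above can be turned into a rough character budget. This Python sketch assumes the 30744-character total includes the roughly 13,000-character default script, which the numbers imply but do not state outright. It says nothing about the compressed-size limit, which cannot be estimated this simply.

```python
CHAR_LIMIT = 65535            # .p8 character limit quoted above
CART_CHARS = 30744            # current cartridge total quoted above
DEFAULT_SCRIPT_CHARS = 13000  # approximate size of the default script

# Assumption: the cartridge total already counts the default script,
# so replacing it frees those characters for the new script.
code_chars = CART_CHARS - DEFAULT_SCRIPT_CHARS
script_budget = CHAR_LIMIT - code_chars


def fits_in_cartridge(script_text):
    """Rough check against the character limit only."""
    return len(script_text) <= script_budget
```

Under that assumption the budget works out to about 47,800 characters for an embedded script, a bit under four times the size of the default one.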

However, I've done tests for another project and it is quite possible to bring in much more text than 65K via drag-and-drop. The book "Treasure Island" is 300K characters, and I was able to import and parse the whole book (for another project) in just a couple of seconds. This is all to say that Pico-8's memory is more than sufficient to hold ambitious scripts that can't fit inside the cartridge limitations.

The default script in Eliza8 tends to hit a maximum memory usage of about 130KiB (out of 2048) with a CPU usage during parsing time of about 0.13 (out of 1.0). I believe this means there is room and potential for significantly more complex scripts, even potentially games which make creative use of grammar patterns.

Files

base_script.txt (for making your own scripts) 12 kB
May 20, 2021
