Exploring Squeak's Virtual Machine Model

In this article I explore Squeak's Virtual Machine model, including a description of the approach it takes to implement its interpreter. As we'll be working with C, I'll assume you know how to compile the vanilla VM. If not, you can read my previous article to learn how here.

Diving into Squeak's VM

Squeak's VM model is very similar to the original Smalltalk-80 implementation [GoR83]. It consists of a rather simple virtual processor with 1-byte instructions and a stack-based instruction set. This means that it has an imaginary infinite stack, and the instructions do all their work on the top of it. Smalltalk code inside the image is stored in instances of CompiledMethod, which basically consist of ordered collections of encoded instructions that match this Virtual Machine's instruction set. I say encoded because each kind of instruction that the VM accepts has been assigned a unique number between 0 and 255 that represents it. We call these encodings the VM's bytecodes. These bytecodes include all the different kinds of instructions that a machine with a stack-based instruction set needs in order to work.

 

Figure (VMModel): Squeak's VM model. The top of the stack points to aMagnitude. The instruction pointer points to some bytecode within someCompiledMethod's bytecodes.
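To make the model a bit more concrete, here is a minimal C sketch of the state it describes: a stack, a current method and an instruction pointer. All the names (ToyVM, ToyMethod, toy_push and so on) and the fixed stack size are my own assumptions for illustration, not the definitions used by the real VM, and the helpers do no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and sizes are illustrative assumptions,
   not the definitions used by the real Squeak VM. */
#define TOY_STACK_SIZE 64      /* the model itself treats the stack as unbounded */

typedef intptr_t ToyOop;       /* stand-in for an object pointer */

typedef struct {
    const uint8_t *bytecodes;  /* encoded instructions, one byte each */
    size_t         size;       /* number of bytecodes */
} ToyMethod;

typedef struct {
    ToyOop           stack[TOY_STACK_SIZE];
    int              sp;       /* index of the top element, -1 when empty */
    const ToyMethod *method;   /* method currently being executed */
    size_t           ip;       /* index of the next bytecode to fetch */
} ToyVM;

/* Start executing method m with an empty stack. */
static void toy_init(ToyVM *vm, const ToyMethod *m)
{
    vm->sp = -1;
    vm->method = m;
    vm->ip = 0;
}

/* Stack helpers (no overflow or underflow checks in this sketch). */
static void   toy_push(ToyVM *vm, ToyOop value) { vm->stack[++vm->sp] = value; }
static ToyOop toy_pop(ToyVM *vm)                { return vm->stack[vm->sp--]; }
static ToyOop toy_top(const ToyVM *vm)          { return vm->stack[vm->sp]; }

/* Read the bytecode under the instruction pointer and advance it. */
static uint8_t toy_fetch(ToyVM *vm)
{
    return vm->method->bytecodes[vm->ip++];
}
```

Every instruction in the model only ever touches this state: it is fetched through the instruction pointer and reads or writes values near the top of the stack.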

Stack-based IS

Consider the following code:

Magnitude >> #hash     "Hash must be redefined whenever = is redefined."
    ^self subclassResponsibility

You can look at the compiled method generated for this method by inspecting the expression Magnitude >> #hash, which evaluates to an instance of CompiledMethod. For each method there is a CompiledMethod that holds the bytecodes to be executed when the method is invoked. If you look into "all bytecodes" you'll see:

0x70 self (actually means push self)
0xD0 send: subclassResponsibility
0x7C returnTop

The first line is an instruction to push self onto the top of the machine's stack. It has bytecode 0x70. The following line is a send instruction, with bytecode 0xD0. Who will be the receiver of the message? This being a stack-based machine, it will be the object on top of the stack. You may wonder how a single byte can specify that the message to send is #subclassResponsibility and not some other. Well, the CompiledMethod instance holds not only the bytecodes but also an ordered collection of literals, where it keeps references to things like the selectors and class names the method uses. So bytecode 0xD0 means not only "send a message to the object on top of the stack", but also "send the message whose name you can find in literal number 0 of my collection of literals" (in general, bytecode 0xDx means "send the zero-argument message whose name you can find in literal number x of my collection of literals"). The last step is to execute an instruction that returns the top of the stack (where the result of self subclassResponsibility was placed after execution). This instruction has bytecode 0x7C.
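The decoding just described can be sketched in a few lines of C. This is only an illustration that covers the three bytecodes of this example; the function name toy_describe and its signature are hypothetical, and the literals are modelled as plain C strings rather than real objects.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes a human-readable description of one bytecode into buf.
   Only the three kinds of bytecode used in this article are handled.
   Returns 1 if the bytecode was recognised, 0 otherwise. */
static int toy_describe(uint8_t bytecode,
                        const char *const literals[], size_t literalCount,
                        char *buf, size_t bufSize)
{
    if (bytecode == 0x70) {
        snprintf(buf, bufSize, "push self");
        return 1;
    }
    if (bytecode == 0x7C) {
        snprintf(buf, bufSize, "returnTop");
        return 1;
    }
    if ((bytecode & 0xF0) == 0xD0) {
        /* High nibble D selects "send"; the low nibble is the literal index. */
        unsigned index = bytecode & 0x0F;
        if (index < literalCount)
            snprintf(buf, bufSize, "send: %s", literals[index]);
        else
            snprintf(buf, bufSize, "send: <literal %u out of range>", index);
        return 1;
    }
    snprintf(buf, bufSize, "unknown bytecode 0x%02X", (unsigned)bytecode);
    return 0;
}
```

Calling it with 0xD0 and a literal array whose first entry is "subclassResponsibility" fills the buffer with "send: subclassResponsibility". The selector itself never appears in the bytecode, only its index into the literals.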

Everything described above concerns the VM's instruction set. We still lack a description of the interpreter, which actually does all the work by taking the bytecodes in order and executing them one by one. Squeak's interpreter is one of the pieces of the VM written in Slang, and you can browse it by opening a browser on the Interpreter class (which will be available after you load the VMMaker package). Its most important method is Interpreter >> #interpret, which consists of an endless loop that fetches the next bytecode and executes the corresponding action.

Into the C

Inside the interpret method, most of the magic lies in this line:

[true] whileTrue: [self dispatchOn: currentBytecode in: BytecodeTable].

which represents the endless fetch-decode-execute loop of the machine. You won't find an implementor of #dispatchOn:in: inside Interpreter. This is because the Slang translator has a special case that turns each occurrence of #dispatchOn:in: directly into a C switch statement, using the BytecodeTable as a reference for each case. You can inspect the table to see the bytecode-to-selector mapping:

Interpreter classPool at: #BytecodeTable

So, to understand how the dispatcher works we'll have to dive a bit into the C sea of the VM. To observe the interpret loop in all its glory you'll have to open the folder with the generated C code and, in there, [gnu-]interp.c. There you can find most of Interpreter's methods, but translated to C. If you browse a bit you'll find the #interpret method, already translated. You should look for:

sqInt interpret(void) 

Here we see that #dispatchOn:in: has been replaced with an expanded switch of 256 cases, one for each bytecode. Browse case 112 (0x70) and you'll find "pushReceiverBytecode", which pushes the self pointer onto the top of the stack, as we learned before. Browse cases 208 (0xD0) and 124 (0x7C) as well to verify that they match their corresponding instructions.
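If you don't have the generated sources at hand, the following toy version shows the shape of such a switch-based loop. It is a simplified sketch under my own assumptions and not the generated code: objects are plain integers, only the three bytecodes from the example are handled, and toy_send is a hypothetical stand-in for a real message send.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t ToyOop;             /* stand-in for an object pointer */
enum { TOY_STACK_SIZE = 16 };

/* Hypothetical stand-in for a message send: a real VM would look up the
   selector stored at literalIndex and activate the method it finds. */
static ToyOop toy_send(ToyOop receiver, unsigned literalIndex)
{
    return receiver * 10 + (ToyOop)literalIndex;
}

/* Executes bytecodes until a returnTop is reached.
   Returns 0 and stores the returned value in *result on success,
   or -1 on an unknown bytecode, a stack error or a missing return. */
static int toy_interpret(const uint8_t *bytecodes, size_t count,
                         ToyOop self, ToyOop *result)
{
    ToyOop stack[TOY_STACK_SIZE];
    int sp = -1;                     /* index of the top element */
    size_t ip = 0;                   /* index of the next bytecode */

    while (ip < count) {
        uint8_t currentBytecode = bytecodes[ip++];    /* fetch */
        switch (currentBytecode) {                    /* decode */
        case 0x70:                   /* push the receiver (self) */
            if (sp + 1 >= TOY_STACK_SIZE) return -1;
            stack[++sp] = self;
            break;
        case 0xD0:                   /* send literal selector 0, no arguments */
            if (sp < 0) return -1;
            /* The receiver on top of the stack is replaced by the result. */
            stack[sp] = toy_send(stack[sp], currentBytecode & 0x0Fu);
            break;
        case 0x7C:                   /* return the top of the stack */
            if (sp < 0) return -1;
            *result = stack[sp];
            return 0;
        default:                     /* every other bytecode is left out here */
            return -1;
        }
    }
    return -1;                       /* ran past the end without returning */
}
```

Running it over the bytes 0x70, 0xD0, 0x7C with self set to 4 pushes 4, replaces it with whatever toy_send returns for that receiver, and returns that value. The real interpret function follows the same fetch, switch, execute pattern, only with a case for every one of the 256 bytecodes.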

 

That's all for now. I may update this article later to add more information, and I'll definitely write another article about extending the virtual machine (adding primitives and plugins).

 

References

[GoR83] Goldberg, A., Robson, D.: Smalltalk-80: The Language and its Implementation.
   Addison-Wesley, Reading, Mass. (May 1983)