TinyVM – A small and easy to understand virtual machine in C

(github.com)

241 points | by jxub 2231 days ago

9 comments

  • userbinator 2230 days ago
    After seeing C4[1], everything else doesn't seem tiny at all... and maybe it's just me, but this is another one of those projects where I found the directory layout rather confusing, especially for something that claims to be "small and easy to understand." bin/ is empty, there's only a single nearly-empty file in src/, lib/ is also empty, include/tvm is two levels but one is empty, and all the interesting stuff actually appears to be in libtvm/ .

    More importantly, even after going through all the files in libtvm/ , I still haven't managed to find the main instruction execution loop nor the decoding switch. Sorry, but I don't think "small and easy to understand" applies, and I've had experience with VMs and interpreters and the like for many years. Compare with this, for example:

    https://github.com/tjmerritt/z80

    A rule-of-thumb when investigating the source code of a project for the first time: if I have to go more than 2 directories deep to get to the "meat" of the code, my desire to explore further drops significantly.

    [1] https://news.ycombinator.com/item?id=8558822

    • dfbrown 2230 days ago
      C library public headers are commonly two levels deep so that projects using them can add the "include" directory to their header search path and in their code have #include "<libname>/header.h". It helps avoid filename clashes.
    • lmitchell 2230 days ago
      I dunno, I find it pretty easy to understand... seems like the main instruction loop is in tvm.c, in the function tvm_vm_run(), which is exactly the first place I looked and seems totally sensible to me.

        for (; vm->prog->instr[*instr_idx] != -0x1; ++(*instr_idx))
          tvm_step(vm, instr_idx);
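
      For anyone who wants the shape of that loop without opening the repo, here is a minimal sketch of a run/step pair over a sentinel-terminated program. It is not TinyVM's code: toy_vm, toy_step, toy_run and the OP_* values are hypothetical names made up for illustration, and only the "loop until -1, call step" structure mirrors the snippet above.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical opcodes for illustration; not TinyVM's instruction set. */
      enum { OP_END = -1, OP_NOP = 0, OP_INC = 1, OP_DEC = 2 };

      struct toy_vm {
          const int *instr; /* program, terminated by OP_END */
          int acc;          /* single accumulator register */
      };

      /* Decode and execute the instruction at *idx.
       * idx is passed by pointer so a jump opcode could rewrite it. */
      static void toy_step(struct toy_vm *vm, size_t *idx)
      {
          switch (vm->instr[*idx]) {
          case OP_NOP: break;
          case OP_INC: vm->acc++; break;
          case OP_DEC: vm->acc--; break;
          default:     break; /* unknown opcodes are ignored in this sketch */
          }
      }

      /* Execute instructions until the OP_END sentinel is reached. */
      static void toy_run(struct toy_vm *vm)
      {
          assert(vm != NULL && vm->instr != NULL);
          for (size_t idx = 0; vm->instr[idx] != OP_END; ++idx)
              toy_step(vm, &idx);
      }
      ```

      Running the program { OP_INC, OP_INC, OP_NOP, OP_DEC, OP_INC, OP_END } through toy_run() leaves acc at 2.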
      • spc476 2230 days ago
        Compare that to my 6809 emulator [1]. I have a mc6809_run() function that is pretty much the same, but mc6809_step() is in the same file and not hidden in a header file. There also seems to be a lack of memory operations (like reading or writing) other than PUSH/POP.

        [1] https://github.com/spc476/mc6809 [2]

        [2] Yes, it lacks a README. I know.

    • coldtea 2230 days ago
      >bin/ is empty, there's only a single nearly-empty file in src/

      Because that's where the generated binary will go.

      >lib/ is also empty

      Presumably the same for any lib files?

      >include/tvm is two levels but one is empty

      That's as intended too, poor man's namespacing.

      • saagarjha 2230 days ago
        > Because that's where the generated binary will go.

        > Presumably the same for any lib files?

        Why is it even tracked, then? Surely putting it in .gitignore is a better way to go?

        • numpad 2229 days ago
          I find leaving them in the repo makes it pretty clear how you have to configure your build process. For GCC, I instantly know I need `-I ./tinyvm/include/` and can use `#include <tvm/tvm.h>` etc.
    • acqq 2230 days ago
      I’ve found that switch fast, in a few clicks, reading this on my phone:

      https://github.com/jakogut/tinyvm/blob/master/include/tvm/tv...

      I’ve used the info you provided as the start, of course.

      • ben_bai 2230 days ago
        In a .h file?

        Have to agree with userbinator.

      • e12e 2230 days ago
        Isn't it odd to have:

        /* nop */ case 0x0

        Rather than #define NOP 0x0? Namespacing? How about vm_NOP?
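
        A sketch of what named opcodes could look like, assuming an enum with a VM_ prefix instead of raw case 0x0 labels. Only NOP = 0x0 comes from the line quoted above; the other names and values, and the vm_opcode_name() helper, are hypothetical.

        ```c
        #include <assert.h>
        #include <stddef.h>

        enum vm_opcode {
            VM_NOP  = 0x0,   /* matches the "nop" case quoted above */
            VM_PUSH = 0x1,   /* hypothetical value */
            VM_POP  = 0x2,   /* hypothetical value */
            VM_OPCODE_COUNT  /* number of opcodes defined in this sketch */
        };

        /* Mnemonic for each opcode, indexed by opcode value. */
        static const char *const vm_opcode_names[VM_OPCODE_COUNT] = {
            [VM_NOP]  = "nop",
            [VM_PUSH] = "push",
            [VM_POP]  = "pop",
        };

        /* Return a printable mnemonic, or "?" for an out-of-range opcode. */
        static const char *vm_opcode_name(int op)
        {
            if (op < 0 || op >= VM_OPCODE_COUNT)
                return "?";
            assert(vm_opcode_names[op] != NULL); /* every opcode needs a name */
            return vm_opcode_names[op];
        }
        ```

        With an enum the switch can read case VM_NOP: and the compiler can warn about unhandled enumerators, which a comment next to a bare constant cannot do.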

    • kowdermeister 2230 days ago
      "easy to understand" is added by the poster to the title, the repo doesn't make this claim.

      With zero explanation and code it's a small experiment, but not really educational.

    • pankajdoharey 2230 days ago
      By C4 you meant C in 4 functions? https://github.com/rswier/c4 here?
    • tytytytytytytyt 2230 days ago
      Why does he/she have .gitignore files in subdirectories? That seems to make those subdirectories linger for no reason.
      • tines 2230 days ago
        Usually because a build script depends on their existence or something.
        • saagarjha 2230 days ago
          Then have the scripts create them?
      • coldtea 2230 days ago
        >Why does he/she have .gitignore files in subdirectories? That seems to make those subdirectories linger for no reason.

        He has it precisely to make them linger. And what "no reason"? It's so subsequent scripts can put results there.

  • giancarlostoro 2230 days ago
    Another good simple one is NekoVM (though not sure how they compare size wise) which is one of the target platforms that Haxe compiles to:

    https://github.com/HaxeFoundation/neko

    Being interested in writing my own languages (though never finding the time alongside other pet projects), I always wanted to write something that would ultimately be usable with NekoVM as one of my side goals for a language. Neko also has a module for Apache.

  • classichasclass 2230 days ago
    I like an elegant VM, but the ones I find the most interesting are the ones you can actually employ as a compilation target.

    For example, there's a C-like language for SUBLEQ machines: http://mazonka.com/subleq/hsq.html

  • peterkelly 2230 days ago
    Here's the switch statement, if you're looking for it: https://github.com/jakogut/tinyvm/blob/master/include/tvm/tv...
  • mar77i 2230 days ago
    Rather recently I remembered [0], which I then rebuilt in C using the same memory layout, trying to approximate the functionality of the original thing [1]...

    [0] https://www.randelshofer.ch/fhw/gri/holzi.html

    [1] https://gist.github.com/mar77i/46bd25504dd9e81d0ca7778efcee4...

  • joe_the_user 2230 days ago
    Hmm,

    Scanning the syntax, is there an operation for addressing memory? I see an operation for moving one value to another and that's it. I don't see any method of addressing a variable position in memory (or a variable position in an array if one wants to be more managed about it).

    I suppose you could handle all operations from the stack.

    But I think a lot of things require memory reads and writes, at least to do efficiently.

    • int0x80 2230 days ago
      I was also looking for it and couldn't find it. You can't do much with the stack if you can't load/store.
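
      To make concrete what seems to be missing, here is a minimal sketch of register-indirect load and store operations with bounds checks. Nothing here is taken from TinyVM: toy_machine, toy_load, toy_store and the sizes are hypothetical, and a real VM would wire these into its decode switch.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      #define TOY_MEM_WORDS 16 /* hypothetical memory size, in words */
      #define TOY_NUM_REGS  4  /* hypothetical register count */

      struct toy_machine {
          int32_t mem[TOY_MEM_WORDS];
          int32_t reg[TOY_NUM_REGS];
      };

      /* LOAD: reg[dst] = mem[reg[addr_reg]].
       * Returns false, leaving state untouched, if the address is out of range. */
      static bool toy_load(struct toy_machine *m, size_t dst, size_t addr_reg)
      {
          assert(m != NULL && dst < TOY_NUM_REGS && addr_reg < TOY_NUM_REGS);
          int32_t addr = m->reg[addr_reg];
          if (addr < 0 || addr >= TOY_MEM_WORDS)
              return false;
          m->reg[dst] = m->mem[addr];
          return true;
      }

      /* STORE: mem[reg[addr_reg]] = reg[src].
       * Returns false, leaving state untouched, if the address is out of range. */
      static bool toy_store(struct toy_machine *m, size_t addr_reg, size_t src)
      {
          assert(m != NULL && src < TOY_NUM_REGS && addr_reg < TOY_NUM_REGS);
          int32_t addr = m->reg[addr_reg];
          if (addr < 0 || addr >= TOY_MEM_WORDS)
              return false;
          m->mem[addr] = m->reg[src];
          return true;
      }
      ```

      Because the address comes from a register, a program can compute it at run time, which is what indexing into an array needs.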
  • d33 2230 days ago
    I'm kind of worried about usage of strcmp here:

    https://github.com/jakogut/tinyvm/search?q=strcmp

    It's also very easy to crash the thing, either with a malformed input file or afl-fuzz. Are you sure C was the right choice here?
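
    A minimal sketch of one defensive alternative, assuming the worry is comparing tokens that may not be NUL-terminated or may be longer than expected. The mnemonic table and toy_lookup() are hypothetical and not taken from the TinyVM sources; the caller passes an explicit token length, so the comparison never reads past the input.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical mnemonic table for illustration. */
    static const char *const toy_mnemonics[] = { "nop", "push", "pop" };
    #define TOY_MNEMONIC_COUNT (sizeof toy_mnemonics / sizeof toy_mnemonics[0])

    /* Look up a token of explicit length len (it need not be NUL-terminated).
     * Returns the table index, or -1 if the token is not a known mnemonic. */
    static int toy_lookup(const char *tok, size_t len)
    {
        assert(tok != NULL);
        for (size_t i = 0; i < TOY_MNEMONIC_COUNT; ++i) {
            /* Compare lengths first so memcmp only reads len bytes of tok. */
            if (strlen(toy_mnemonics[i]) == len &&
                memcmp(toy_mnemonics[i], tok, len) == 0)
                return (int)i;
        }
        return -1;
    }
    ```

    An unknown token yields -1, so the assembler can report an error and stop instead of continuing with a bad opcode.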

    • spc476 2230 days ago
      For the actual emulation, I would say C is okay. For the assembler? I can think of half a dozen other languages better suited for that.
  • kondor6c 2230 days ago
    There is also the Java based "PC emulator":

    https://github.com/ianopolous/JPC

    However, development seems to have slowed down drastically.

  • _RPM 2230 days ago
    Great work. After scanning the source tree with my eyes, I have yet to find the implementation of the instruction set. Therefore, I wouldn't say this is small.