Limbo module implemented on C is sometimes called the OS Inferno driver because it is built into the OS kernel. We usually need such module to either access from Limbo some features absent in Inferno (access existing 3rd-party C/C++ libraries or host OS specific syscalls) or get maximum performance (my benchmark show about 1.3-1.5 times speedup between Limbo with JIT and C, but sometimes even this may be critical).
Adding our module to OS kernel
Sadly, for now there is no way to dynamically load module implemented in C, so we should build it right into the OS kernel. (This make development more complex because after each change we’ve to rebuild Inferno. Luckily, required partial rebuild took just about 10 seconds.)
To embed our module we’ve to modify several files:
libinterp/mkfile, module/runt.m, emu/Linux/emu and emu/Linux/emu-g.
The standard patch
command can’t be used to modify these files
because user may want to add several modules in random order, and all
these modules will try to modify same files in same places.
It may handle one-two modules, but with more modules it will fail because
what it expects to see in these files will differ too much from what really
is in these files now.
To solve this issue I’ve developed small Perl script - in most cases it’s enough to edit name of your module in that script in line
my $MODNAME = 'CJSON';
and it will apply all needed changes in all these files, embedding your
module into OS Inferno kernel.
In more complex cases, for example where you need to link extra C/C++
libraries with Inferno, this script have to be additionally modified
(example of such modification for linking with C++ library
re2 you can see in
Re2 module).
Run this script with option -R
to undo changes.
So, download the script, save it into $INFERNO_ROOT
, rename to
patch.example, edit it to modify module name to 'Example'
, and run.
Now (non-existing yet) module Example added to kernel, all we need is
create it and rebuild Inferno with it.
To begin, let’s create these two files:
- module/example.m
-
Example: module { PATH: con "$Example"; };
- libinterp/example.c
-
#include <lib9.h> #include <isa.h> #include <interp.h> #include "runt.h" #include "examplemod.h" void examplemodinit(void) { builtinmod("$Example", Examplemodtab, Examplemodlen); }
And rebuild OS Inferno:
$ (cd libinterp/; mk nuke)
$ rm Linux/386/bin/emu # work around "text file busy" error
$ mk install
Now we can implement Limbo application, which successfully load our module (which isn’t doing anything useful yet).
- testexample.b
-
implement TestExample; include "sys.m"; include "draw.m"; include "example.m"; TestExample: module { init: fn(nil: ref Draw->Context, nil: list of string); }; init(nil: ref Draw->Context, nil: list of string) { sys := load Sys Sys->PATH; example := load Example Example->PATH; if(example == nil) sys->print("fail to load Example: %r\n"); else sys->print("Example module loaded\n"); }
And run:
$ emu
; limbo testexample.b
; testexample
Example module loaded
;
How it works
While building Inferno, file module/example.m analysed, and corresponding C data structures is generated. General information about this module saved in separate file libinterp/examplemod.h. Structures describing module’s public interface (constants, adt, functions) appended to file libinterp/runt.h (which contain information about all C modules). Both these .h-files are already included in our libinterp/example.c.
Next, while booting OS Inferno will call our function
examplemodinit()
, which should initialize global data of our module (if
such data exist) and attach it (by executing builtinmod()
) to kernel.
builtinmod()
will set connection between our module and it pseudo-path
'$Example'
defined in constant PATH
, used from Limbo to load
our
module.
Functions: how to get params and return result
Numbers
Let’s begin with simple data types, to not overcomplicate initial examples with pointers.
- module/example.m
-
Example: module { ... increment: fn(i: int): int; };
- libinterp/example.c
-
... void Example_increment(void *fp) { F_Example_increment *f; int i; f = fp; i = f->i; *f->ret = i + 1; }
Rebuild Inferno.
- testexample.b
-
... init(nil: ref Draw->Context, nil: list of string) { ... sys->print("increment(5) = %d\n", example->increment(5)); }
Don’t forget to restart emu
before executing our example application
because currently running emu
doesn’t contain modified C module.
$ emu
; limbo testexample.b
; testexample
Example module loaded
increment(5) = 6
;
How it works
While building, definition for function increment()
(found in
module/example.m), it parameters and returned value was automatically
generated and appended into libinterp/runt.h:
void Example_increment(void*);
typedef struct F_Example_increment F_Example_increment;
struct F_Example_increment
{
WORD regs[NREG-1];
WORD* ret;
uchar temps[12];
WORD i;
};
I didn’t check yet what regs is; temps added for data alignment; ret is a pointer to returned value; and i is our parameter.
Strings
- module/example.m
-
Example: module { ... say: fn(s: string); };
- libinterp/example.c
-
... void Example_say(void *fp) { F_Example_say *f; String *s; char *str; f = fp; s = f->s; str = string2c(s); print("%s\n", str); }
- testexample.b
-
... init(nil: ref Draw->Context, nil: list of string) { ... example->say("Hello!"); }
Build, restart, test:
$ emu
; limbo testexample.b
; testexample
Example module loaded
increment(5) = 6
Hello!
;
How it works
Here is what we got in libinterp/runt.h:
void Example_say(void*);
typedef struct F_Example_say F_Example_say;
struct F_Example_say
{
WORD regs[NREG-1];
WORD noret;
uchar temps[12];
String* s;
};
With noret vs ret it’s all clear - function say()
return nothing.
Type String*
is C implementation of Limbo’s strings. You can find
struct String
in include/interp.h, helper functions working with
strings (like used in our example string2c()
) are in libinterp/string.c.
In similar way you can work with all other Limbo’s data types: using
Array*
, List*
, etc. Not for all data types there are available
helper functions like for strings, but you can find enough examples in
VM’s opcodes implementation (in libinterp/xec.c) - for example, how to
handle array slices).
User adt defined in module/example.m converted to usual C struct (pick adt into union). Tuple also became usual struct.
It’s very likely you will have to run Inferno rebuild (which will fail) after modifying module/example.m just to update libinterp/runt.h and let you see which exactly structures was generated for your data types and understood how to handle them in libinterp/example.c.
Exceptions
To raise exception it’s enough to call error()
. You can include
raise.h to raise standard exceptions defined in libinterp/raise.c
or define your own exceptions in similar way in
libinterp/example.c.
Of course, if you manually allocated memory using malloc()
, then before
calling error()
you have to free()
than memory to avoid leaking.
Objects allocated in standard way on heap (like String*
and Array*
)
may not be destroyed, they will be freed later by GC anyway (more about GC
below).
Return a pointer
One subtle thing about returning result from functions is *f->ret
already pointing to user’s variable, which should keep function result
value after it returns. There are two consequences because of this:
-
If you put result into
*f->ret
first, but then decide to raise exception, something impossible in Limbo will happen: function both return value and raise exception. -
If in variable, where result of your function should be stored, there is already existing value (which is also a pointer because this variable’s type is the same as type of value returned by your function), then you should
destroy()
it before overwriting it with your returned value.
To demonstrate first issue let’s modify our function increment()
in this way:
- libinterp/example.c
-
... void Example_increment(void *fp) { ... *f->ret = i + 1; error("some error"); }
- testexample.b
-
... init(nil: ref Draw->Context, nil: list of string) { ... i := 0; { i = example->increment(5); } exception e { "*" => sys->print("catched: %s\n", e); } sys->print("i = %d\n", i); }
; testexample
...
catched: some error
i = 6
;
So solve second issue in C functions before storing returned value into
*f->ret
you have to free current value. Usually it’s done either in this way:
destroy(*f->ret);
*f->ret = new_value;
or in this way (H
is C equivalent for Limbo’s nil
):
void *tmp;
...
tmp = *f->ret;
*f->ret = H;
destroy(tmp);
...
*f->ret = new_value;
As far as I understood, right now both snippets do the same, and there is no difference between them. But if Dis will be changed to run simultaneously on multiple CPU/Core, then second snippet will work correctly while first one will not.
Locking Dis
Inferno uses global Dis lock (probably, similar to well-known GIL in Python). C functions called with lock set because it’s required to safely access and modify Dis data structures (i.e. any values and variables available in Limbo apps - including parameters and returned value of our C functions).
But if your function must do some long operation (like read/write or call
to slow function from external lib or do some complex calculation doesn’t
involving Dis data structures), then you should release()
lock before
that operation, to let Dis continue running in parallel with your
operation, and after you have done you should acquire()
lock again
(because without it you won’t be able to return result and get back to
your caller’s Limbo code). You can see an example in sys->read()
implementation in emu/port/inferno.c:
void
Sys_read(void *fp)
{
...
release();
*f->ret = kread(fdchk(f->fd), f->buf->data, n);
acquire();
}
Heap
To be able to correctly create and destroy complex Limbo data structures on C, you should be understood how they stored in memory, i.e. how heap implemented in Inferno. All mentioned below functions to work with heap are in libinterp/heap.c, data structures are in include/interp.h.
What’s in my memory…
To simply allocate n bytes in heap, and then free them, you should do this:
Heap *h;
uchar *data;
h = nheap(256);
data = H2D(uchar*, h);
... // working with data, 256 bytes long
destroy(data);
This result in allocating 256 + size of heap header bytes, which will begin with
heap header first and your data next. Heap header is struct Heap
, plus
there are two macros to convert pointer to heap header to pointer to your
data (with type cast at same time for usability) H2D()
and vice versa
D2H()
. Function destroy()
uses D2H()
to convert pointer to beginning
of user’s data with unknown size to pointer to heap header and find out
so how many bytes it should free.
struct Heap
{
int color; /* Allocation color */
ulong ref;
Type* t;
ulong hprof; /* heap profiling */
};
#define H2D(t, x) ((t)(((uchar*)(x))+sizeof(Heap)))
#define D2H(x) ((Heap*)(((uchar*)(x))-sizeof(Heap)))
Few words about GC
So, what’s in heap header? I don’t know about hprof, I didn’t look in heap profiling, but all other fields are very important.
At first, few words about garbage collection in Inferno. There are two
strategies used: simple reference counting (which is enough for most
data structures), plus sort of
tri-colour marking
algorithm to free data structures with cyclic references.
So, ref is reference counter, and color is used by tri-colour marking.
When nheap()
or another similar function allocate on heap new data
object, it heap header’s ref value is set to 1. When you call
destroy()
, it decrease ref value by 1, and if ref become equal to 0
it will actually free memory used by this object.
So, while you keep value returned by nheap()
(or any other similar
function) in single variable - you’ve exactly one reference to that
object, and object’s ref set to 1. As soon as you will copy that pointer
into another variable - you must increase reference counter plus notify
tri-colour algorithm about this. This is done in this way (Setmark()
is
also macros, but to understand what it does you have to understand how
Inferno’s tri-colour algorithm works, which will be explained later):
Heap *h;
uchar *data, *data2;
data = H2D(uchar*, nheap(256));
data2 = data;
h = D2H(data2);
h->ref++;
Setmark(h); // works with h->color
destroy(data); // data won't be freed from memory yet
destroy(data2); // now data will be freed from memory
Of course, if you allocated object on heap, and then returned it to user
by copying to *f->ret
, you don’t need to do something with ref and
color - pointer to that object stored in your function’s local variables
will be deleted as soon as you return from function, so there is will be
still just one pointer to that data - in user’s variable where she store
value returned by your function. So, it was actually pointer moving, not
copying.
There is one subtle issue related to moving pointers. In previous example
with value returned from function we move new, just allocated pointer, and
don’t need to do anything more. But if you moved pointer from one existing
data structure to another existing data structure, you must notify
tri-colour algorithm about this by calling Setmark()
(this is also
related to implementation details of Inferno GC and will be explained
later):
dst->data = src->data;
src->data = H;
Setmark(D2H(dst->data));
Types of objects in heap
Previous example with nheap()
probably never used in real code because
Limbo doesn’t have such data type: n bytes. So, you can’t receive such
data object from Limbo as parameter for your function, and you can’t
return object allocated with nheap()
to Limbo. As for your function’s
internal needs, it’s usually enough to use malloc()
and free()
.
All objects in heap which can be accessed by Limbo must have some type,
defined by struct Type
. This is needed to solve task of finding all
pointers inside any object - which is required for:
-
allocating memory for that object (to set all pointers inside object to
H
a.k.a.nil
); -
delete object from memory with
destroy()
(to either delete from memory or at least decrease ref for all objects referenced by this one); -
do garbage collection (to find and check all objects referenced by another complex object).
struct Type
{
int ref;
void (*free)(Heap*, int);
void (*mark)(Type*, void*);
int size;
int np;
void* destroy;
void* initialize;
uchar map[STRUCTALIGN];
};
For standard data types (any adt/tuple/struct which contain either
non-pointer fields or fields with pointers to usual heap-objects, which
can be freed using destroy()
- like String*
and Array*
) the fields
np and map are used. Field map contain bit mask, with one bit
(starting from most significant bit in first byte) per each 4 bytes of
adt/tuple/struct, where set bit mean corresponding 4 bytes is a pointer to
heap-object. Field np contain length of map field in bytes.
Some data types use memory in non-standard way. Typical examples are
String
and Array
- first one contain char*
field, which have to be
freed with free()
; second may consist of adt/tuple/struct which should
be freed according to their type, not type of Array
struct itself.
The Type
struct allow defining own non-standard functions-handlers for
such types, which will be called while allocating/initializing memory,
by destroy()
and from GC: free, mark, destroy and initialize.
Field size contain size of structure for that type (if we have to provide type while allocating new object of that type anyway, keeping size inside type allow us to avoid providing both type and size separately).
Field ref is used to keep current amount of objects of that type in
heap. Thing is, type list is not limited by list of predefined types
like string
, array of
, etc. - any tuple defined on the fly is a
separate type, which needs its own definition by structure Type
. That
means, some types are created by Limbo on the fly, stored in same heap, and
must be freed from memory as soon as there are no more objects of that
type in heap. To make this work, when creating every new object of some
type, that type’s ref must be increased, and while freeing object of
some type destroy()
will also decrease object type’s ref and will free
that type after freeing object if type’s ref become 0.
Values of Type
for standard types are defined statically, with ref set
to 1 (so their ref never become less than 1 and they never will be
freed) - you can find their definitions in libinterp/head.c:
Type Tbyte = { 1, 0, 0, 1 };
Type Tword = { 1, 0, 0, sizeof(WORD) };
Type Tptr = { 1, 0, markheap, sizeof(WORD*), 1, 0, 0, { 0x80 } };
Type Tarray = { 1, freearray, markarray, sizeof(Array) };
Type Tstring = { 1, freestring, noptrs, sizeof(String) };
...
Allocating our own complex data structures in heap
Types for your own adt/tuple/struct in some cases can be defined with help
of libinterp/runt.h (it will calculate your struct size for Type.size
and pointers bit mask for Type.map
and Type.np
), in other cases you will
have to define it manually (for example, to create and return tuple from C
function). Usually these types defined while initializing your module
(back to our module Example) using helper function dtype()
. And you will
allocate memory for your objects using heap(&Tsometime)
instead of
nheap(n_bytes)
.
- module/example.m
-
Example: module { ... MyData: adt{ i: int; s: string; new: fn(i: int): ref MyData; }; };
- libinterp/runt.h
-
(generated automatically based on module/example.m)
typedef struct Example_MyData Example_MyData; struct Example_MyData { WORD i; String* s; }; #define Example_MyData_size 8 #define Example_MyData_map {0x40,} void MyData_new(void*); typedef struct F_MyData_new F_MyData_new; struct F_MyData_new { WORD regs[NREG-1]; Example_MyData** ret; uchar temps[12]; WORD i; };
Value 0x40 for Example_MyData_map mean bit mask 010000…, i.e. first 4 bytes of our struct isn’t pointer (
WORD
) and next 4 bytes is pointer (String*
). - libinterp/example.c
-
static Type* TMyData; static uchar MyData_map[] = Example_MyData_map; void examplemodinit(void) { ... TMyData = dtype(freeheap, Example_MyData_size, MyData_map, sizeof(MyData_map)); } void MyData_new(void *fp) { F_MyData_new *f; int i; Example_MyData* mydata; f = fp; i = f->i; mydata = H2D(Example_MyData*, heap(TMyData)); mydata->i = i; destroy(*f->ret); *f->ret = mydata; }
- testexample.b
-
... example: Example; MyData: import example; ... init(nil: ref Draw->Context, nil: list of string) { ... mydata := MyData.new(5); sys->print("i = %d, s = %q\n", mydata.i, mydata.s); }
; testexample
...
i = 5, s = ''
;
Array
Let’s see how to handle arrays in C. In memory array allocated in this
way: heap header, array header, array elements.
So, to allocate array on heap we’ve to know amount and size of it
elements. To initialize these elements (in case they contain pointers
which must be set to H
) and later correctly free these elements from
memory we’ve to know type of elements (this mean we don’t have to provide
elements size anymore because it’s already inside element’s Type
).
struct Array
{
WORD len;
Type* t;
Array* root;
uchar* data;
};
The len is array length, t is type of it elements, root is pointer
to parent array (if this array is a slice), and data is pointer to first
element (that’s next byte after struct Array
if this is independent
array or address of some element in parent’s array, which is first element
of our slice).
If we don’t create slice of another array, then we need to allocate more
memory than used by Array
struct (and so in Tarray.size
). Because of
this we can’t allocate memory for array using heap()
function. Happily,
we’ve function heaparray()
. Example of allocating array[16] of byte
:
Array *arr;
arr = H2D(Array*, heaparray(&Tbyte, 16));
And here is example of function creating array slice:
static Array*
slice(Array* sa, int s, int e)
{
Type *t;
Heap *h;
Array *a;
if(s < 0 || s > e || e > sa->len)
error(exBounds);
t = sa->t;
h = heap(&Tarray);
a = H2D(Array*, h);
a->len = e - s;
a->data = sa->data + s*t->size;
a->t = t;
t->ref++;
if(sa->root != H) /* slicing a slice */
sa = sa->root;
a->root = sa;
h = D2H(sa);
h->ref++;
Setmark(h);
return a;
}
adt и ref adt
There is subtle difference between Limbo and C in how they handle adt and ref adt. On Limbo they look very similar:
a := array[10] of MyData;
b := array[10] of ref MyData;
for(i := 0; i < len b; i++)
b[i] = ref MyData;
a[0].i = b[0].i;
but on C it’s completely different things:
Array *a, *b;
int i;
Example_MyData* adata;
Example_MyData** bdata;
a = H2D(Array*, heaparray(TMyData, 10));
adata = (Example_MyData*)a->data;
b = H2D(Array*, heaparray(&Tptr, 10));
bdata = (Example_MyData**)b->data;
for(i = 0; i < b->len; i++)
bdata[i] = H2D(Example_MyData*, heap(TMyData));
adata[0].i = bdata[0]->i;
GC
You can read tri-color algorithm overview in
wikipedia.
In Inferno these three "colours" are named mutator
, marker
and
sweeper
.
New objects created with h->color=mutator
.
After each complete gc loop values of variables mutator
, marker
and
sweeper
change on a circle, effectively resulting in "changing" values
(actually, meaning) of h->color
of all objects in heap:
mutator -> marker
marker -> sweeper
sweeper -> mutator // this never happens because all sweepers already freed
If gc see h->color==sweeper
it free h
from memory.
So, finally, what Setmark()
is and why it needed:
#define Setmark(h) if((h)->color!=mutator) { (h)->color = propagator; nprop=1; }
GC may work a very long time, and everything else is paused while GC works. To solve this issue, Inferno’s GC do it work in small chunks - after checking several objects in heap GC stops and allow other code to run, then it continues from same point. But for tri-colour algorithm it needs to check all objects in heap in one run, atomically. So Inferno GC provide some way to find out is objects on heap were changed since beginning of GC loop.
This is what Setmark()
does. It sets global flag nprop when needed,
notifying GC about some object was changed, and so GC shouldn’t free all
sweeper
objects at end of current loop, but instead should start all
over again and recheck all objects in heap.
Value of h->color==propagator
means GC must check this object on next
run. After checking it h->color
will be set to mutator
.
(Same propagator
value is set to "root" objects in beginning of next GC
loop, but that doesn’t important in this context.)
Detaching heap object from GC
While GC delete from memory all objects which doesn’t referenced from
"root" objects (which is probably consists of working threads, loaded
modules, etc.) ignoring their ref value, then we can’t keep pointer to
heap object outside Limbo threads (for example in global variables of
our C module). To prevent GC from freeing such our objects in heap we
should detach them from GC using poolimmutable()
(later we can attach it
back to GC using poolmutable()
):
- libinterp/example.c
-
... static Array* EmptyArray; void examplemodinit(void) { ... EmptyArray = H2D(Array*, heaparray(&Tbyte, 0)); poolimmutable(D2H(EmptyArray)); }