scripts and object code: like chocolate and peanut butter

Basic Mandelbrot Set in glorious ASCII

What is this?

I like to program in interpreted languages because development is fast and easy, and I think duck typing is Just Fine. But I also like the speed and efficiency of object code compiled from carefully written C or C++, written with some regard to How Computers Actually Work.

So, from time to time, when I have a medium-sized project that benefits from flexibility but also needs some high performance computation, I use the ability of the scripting language to load and run native object code.

Back in ancient times I did this with Tcl. In slightly less ancient times, I used Perl. Perl has a cool set of modules called Inline::* that let you put source code for a compiled language mixed right in with the Perl code. It would then rebuild and link that code when you ran the Perl script and would cache the object code to save startup time on subsequent runs. It was brilliant, but I don’t code in Perl much anymore because all the serious programmers tell me “Perl is Considered Harmful.” (That is, most folks who pay me don’t want to see Perl code. Shrug.)

So I decided the other day to experiment with the binary interfaces to languages I use nearly every day: NodeJS and Python. I also included my old friend, Perl.

The test library to bind: the “fractizer”

I used a simple bit of C code I wrote a while back for a parallel computing demo as my test target. (Relevant files are native/fractizer.[ch].) Until now, I have used it in a standalone executable that I call via fork from a simple Node.JS server application. The code computes fractals. It’s not rocket science, but it has just enough interface to be a half-decent experiment: a few functions, a struct, pointers, and an array. These are the basic elements you’d need to bind any library.

typedef struct cp_t {
    double r;
    double i;
} cp_t;

typedef void (*znp1_calc_t)(cp_t *z, cp_t *c);

typedef struct fparams_t {
    uint16_t max_iters;
    double   escape_val;
    double   x_min;
    double   x_max;
    double   y_min;
    double   y_max;
    uint16_t x_pels;
    uint16_t y_pels;
    uint16_t x_tile;
    uint16_t y_tile;
    uint8_t  type;
    uint8_t  do_julia;
    double   jx;
    double   jy;
    znp1_calc_t algo;
} fparams_t;

void showParams(fparams_t *p);
void set_default_params(fparams_t *p);
void generate_fractal(fparams_t *pparams, uint16_t *rbuf);

First contestant: Node.JS

I worked on NodeJS first because my application already used it, and I thought it would be cool to avoid the fork() that was part of the existing demo.

Try 1

Node has a native binary interface, part of the V8 engine. V8 is under constant improvement, and they make no promises about long-term API or ABI compatibility. Instead, they have a (promised) more stable wrapper interface called N-API, so that you can move compiled objects between Node versions. It took me about an hour to figure out how to use N-API on my little functions. It would have taken less time had the documentation been better, particularly with examples that include a few non-trivial things like passing complex types to and from Node. But I got it working. The wrapper code looked like this:

#include "fractizer.h"
#include <node_api.h>

bool get_named(napi_env env, napi_value obj, const char *name, uint32_t *tuint, double *tdouble) {
    bool hasit = false;
    napi_has_named_property(env, obj, name, &hasit);
    if (hasit) {
        napi_value nobj;
        napi_get_named_property(env, obj, name, &nobj);
        napi_get_value_uint32(env, nobj, tuint);
        napi_get_value_double(env, nobj, tdouble);
    }
    return hasit;
};

fparams_t unpack_node_params2(napi_env env, napi_callback_info info) {
    fparams_t params;
    set_default_params(&params);

    size_t argc = 1;
    napi_value argv[1];
    napi_get_cb_info(env, info, &argc, argv, NULL, NULL);

    uint32_t tuint;
    double tdouble;
    if (get_named(env, argv[0], "max_iters", &tuint, &tdouble)) {
        params.max_iters = tuint;
    };
    if (get_named(env, argv[0], "escape_val", &tuint, &tdouble)) {
        params.escape_val = tdouble;
    };
    if (get_named(env, argv[0], "x_min", &tuint, &tdouble)) {
        params.x_min = tdouble;
    };
    if (get_named(env, argv[0], "x_max", &tuint, &tdouble)) {
        params.x_max = tdouble;
    };
    if (get_named(env, argv[0], "y_min", &tuint, &tdouble)) {
        params.y_min = tdouble;
    };
    if (get_named(env, argv[0], "y_max", &tuint, &tdouble)) {
        params.y_max = tdouble;
    };
    if (get_named(env, argv[0], "x_pels", &tuint, &tdouble)) {
        params.x_pels = tuint;
    };
    if (get_named(env, argv[0], "y_pels", &tuint, &tdouble)) {
        params.y_pels = tuint;
    };
    if (get_named(env, argv[0], "x_tile", &tuint, &tdouble)) {
        params.x_tile = tuint;
    };
    if (get_named(env, argv[0], "y_tile", &tuint, &tdouble)) {
        params.y_tile = tuint;
    };
    if (get_named(env, argv[0], "type", &tuint, &tdouble)) {
        params.type = tuint;
    };
    if (get_named(env, argv[0], "jx", &tuint, &tdouble)) {
        params.jx = tdouble;
    };
    if (get_named(env, argv[0], "jy", &tuint, &tdouble)) {
        params.jy = tdouble;
    };

    return params;

};

napi_value runFunc(napi_env env, napi_callback_info info) {

    fparams_t params = unpack_node_params2(env, info);
    void *vrdata;
    size_t len = params.x_pels * params.y_pels;
    napi_value abuf, oary;
    napi_create_arraybuffer(env, len * sizeof(uint16_t), &vrdata, &abuf);
    uint16_t *rdata = (uint16_t *)vrdata;
    napi_create_typedarray(env, napi_uint16_array, len, abuf, 0, &oary);

    if (rdata) {
        generate_fractal(&params, rdata);
        return oary;
    }
    napi_get_null(env, &oary);
    return oary;
}

napi_value Init(napi_env env, napi_value exports) {

    napi_status status;
    napi_value fn;
    status = napi_create_function(env, NULL, 0, runFunc, NULL, &fn);
    if (status != napi_ok) {
        napi_throw_error(env,NULL,"Unable to wrap native function.");
    }
    status = napi_set_named_property(env, exports, "run", fn);
    if (status != napi_ok) {
       napi_throw_error(env,NULL,"Unable to populate exports");
    }
    return exports;

};

NAPI_MODULE(NODE_GYP_MODULE_NAME, Init)

It’s mostly straightforward, using various functions to create and access Javascript objects. I had to write a function to convert between a Node object and my configuration struct. This is a theme in all three languages I tried, and I think it’s lame. There should be a utility that reads a header file and does this for me.

Note that the code returns something called a TypedArray. This is a cool thing. It lets you use a real pointer to memory in your compiled code and access it directly from Node without a copy or conversion, which, I’m pretty sure, avoids copying a potentially big array. It also avoids the size bloat of a similar-length array of full-blown Javascript objects.
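
For reference, this is roughly what calling the synchronous version looks like from the Javascript side. It is a minimal sketch rather than code from the project: it assumes the addon was built with node-gyp under the target name "fractizer" (so the compiled module lands at build/Release/fractizer.node) and it uses the "run" export registered in Init() above.

// Hypothetical usage sketch. The require() path assumes node-gyp's default
// output location for a target named "fractizer".
const fractizer = require('./build/Release/fractizer');

// Property names match the ones unpack_node_params2() looks for.
const result = fractizer.run({ x_pels: 32, y_pels: 32, max_iters: 100 });

// result is a Uint16Array viewing the ArrayBuffer allocated in runFunc(),
// so the pixel data comes back to Javascript without a copy.
console.log(result instanceof Uint16Array, result.length); // true 1024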

Try 2

Node has an interesting execution model. There is a single main thread, but you can take advantage of multiple additional threads for activities that might block or just take a long time to compute. Doing so also lets you avail yourself of extra CPU cores to run those long-running tasks while the main thread soldiers on. Taking advantage of this means making your code asynchronous.

Getting that to work well took more time than I care to admit, but I did ultimately succeed. Again, the documentation sucked, particularly regarding how you marshal data across the script/binary boundary and between main and subthreads. In the end, not much wrapper code was really needed, but figuring it out was not fun — lots of segfaults in the interim.

This is what the async wrapper looked like (I also switched to a C++ wrapper around the native API; it cut down a bit on typing, but I’m not sure it’s better than the raw C functions, especially if you don’t want to use C++ exceptions):

#include "fractizer.h"
#include <napi.h>

class aWorker : public Napi::AsyncWorker {

public:
  aWorker(const Napi::Function& callback) : 
    Napi::AsyncWorker(callback), bufptr(0) { }

protected:
  void Execute() override {
    showParams(&parms);
    if (bufptr) {
        generate_fractal(&parms, bufptr);
        return;
    }
    std::cout << "no buffer" << std::endl;
  }

  void OnOK() override {
    Napi::Env env = Env();

    size_t len = parms.x_pels * parms.y_pels;
    Napi::Array oary = Napi::Array::New(env, len);
    for (uint32_t i=0;i<len;i++) {
        oary[i] = bufptr[i];
    }
    Callback().MakeCallback(
      Receiver().Value(), {
        env.Null(), oary
      }
    );
    delete [] bufptr;
  }

public:
    bool get_named(Napi::Object parms_arg, const char *name, uint32_t &tuint, double &tdouble) {
        bool hasit = parms_arg.Has(name);
        if (hasit) {
            Napi::Value v = parms_arg.Get(name);
            tuint   = v.As<Napi::Number>().Uint32Value();
            tdouble = v.As<Napi::Number>().DoubleValue();
        }
        return hasit;
    };
    void unpack_params(Napi::Object parms_arg) {
        std::cout << "unpackParams()" << std::endl;
        set_default_params(&parms);

        uint32_t tuint;
        double tdouble;

        if (get_named(parms_arg, "max_iters", tuint, tdouble)) {
            parms.max_iters = tuint;
        };
        if (get_named(parms_arg, "escape_val", tuint, tdouble)) {
            parms.escape_val = tdouble;
        };
        if (get_named(parms_arg, "x_min", tuint, tdouble)) {
            parms.x_min = tdouble;
        };
        if (get_named(parms_arg, "x_max", tuint, tdouble)) {
            parms.x_max = tdouble;
        };
        if (get_named(parms_arg, "y_min", tuint, tdouble)) {
            parms.y_min = tdouble;
        };
        if (get_named(parms_arg, "y_max", tuint, tdouble)) {
            parms.y_max = tdouble;
        };
        if (get_named(parms_arg, "x_pels", tuint, tdouble)) {
            parms.x_pels = tuint;
        };
        if (get_named(parms_arg, "y_pels", tuint, tdouble)) {
            parms.y_pels = tuint;
        };
        if (get_named(parms_arg, "x_tile", tuint, tdouble)) {
            parms.x_tile = tuint;
        };
        if (get_named(parms_arg, "y_tile", tuint, tdouble)) {
            parms.y_tile = tuint;
        };
        if (get_named(parms_arg, "type", tuint, tdouble)) {
            parms.type = tuint;
        };
        if (get_named(parms_arg, "do_julia", tuint, tdouble)) {
            parms.do_julia = tuint;
        };
        if (get_named(parms_arg, "jx", tuint, tdouble)) {
            parms.jx = tdouble;
        };
        if (get_named(parms_arg, "jy", tuint, tdouble)) {
            parms.jy = tdouble;
        };

    };

    void setupBuffer() {
      size_t len = parms.x_pels * parms.y_pels;
      bufptr = new uint16_t[len];
    };


private:
    fparams_t parms;
    Napi::ArrayBuffer abuf;
    Napi::TypedArray  tary;
    uint16_t *bufptr;
};


void aRun(const Napi::CallbackInfo& info) {

  Napi::Object parms_arg = info[0].ToObject();
  Napi::Function cb = info[1].As<Napi::Function>();

  auto w = new aWorker(cb);
  w->unpack_params(parms_arg); 
  w->setupBuffer();
  w->Queue();

  return;
}


Napi::Object Init(Napi::Env env, Napi::Object exports) {
  exports.Set(
    Napi::String::New(env, "aRun"),
    Napi::Function::New(env, aRun)
  );
  return exports;
}


NODE_API_MODULE(addon, Init)

Still had to write that struct setter function, though.

To use this in Node, you need to compile it (obvs). Basically, you install node-gyp from npm and then call node-gyp configure build. You will need a configuration file for gyp (binding.gyp) that is pretty simple:

{
    "targets": [
        {
            "include_dirs": [
                "<!@(node -p \"require('node-addon-api').include\")"
            ],
            "dependencies": [
                "<!(node -p \"require('node-addon-api').gyp\")"
            ],
            "target_name": "fractizer",
            "sources": [ "native/fractizer.cpp", "native/wrapper.cpp" ],
            "defines": [ "NAPI_DISABLE_CPP_EXCEPTIONS" ]
        }
    ]
}

Anyway, that worked fine, but I did not enjoy the experience. One thing I do not like is that all the dealing with Node objects still has to happen in the main thread. So you must convert any input arguments to a form your native code understands in the main thread, then run your native code in its own thread, and when it is done, it calls back into the main thread, where you provide more code to unspool your native types back into Javascript objects. I wish some of that prep/unprep could be done in the async part, so that you maximize the performance of the main loop. In my case, converting a C array into a Javascript Array takes runtime I’d rather not spend in the main event loop. Alas, I was not able to figure out how to do this, though I suspect it is possible and I was just too dumb to see how.
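
For completeness, the script side of the async version looks something like the sketch below. It is again a sketch under the same assumptions about the build path; aRun takes a parameter object and a callback, as registered in Init() above, and the callback receives the Array built in OnOK().

// Hypothetical usage sketch for the async wrapper.
const fractizer = require('./build/Release/fractizer');

fractizer.aRun({ x_pels: 32, y_pels: 32, max_iters: 100 }, (err, pixels) => {
    if (err) {
        console.error(err);
        return;
    }
    // pixels is a plain Javascript Array, filled element by element in OnOK().
    console.log(pixels.length); // 1024
});

// The main event loop stays free while the fractal is computed on a worker thread.
console.log('computing...');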

Next at bat: Python

Anyway, after the Node experience and because of the one-way-to-do-it philosophy of the Python community, I just assumed Python would be a pain in the ass, too. It turns out, no, Python isn’t so bad. In fact, for Python, using the ctypes library, I could wrap my existing library without writing any more C code: the entire wrapper could be done in Python, and I didn’t have to modify the native source.

I did tell Python all about my structs, but in return I got automagically created object accessors, so fair trade.

In theory, if you already have a .so built, you needn’t compile anything at all. (Actually, I did have to add extern "C", because the ABI is C-only and my original code was in a .cpp file, even though it was basically just C.)

Anyway, the new Python module looked like this:

import ctypes

class fparams_t(ctypes.Structure):
    _fields_ = [
        ('max_iters', ctypes.c_ushort),
        ('escape_val', ctypes.c_double),
        ('x_min', ctypes.c_double),
        ('x_max', ctypes.c_double),
        ('y_min', ctypes.c_double),
        ('y_max', ctypes.c_double),
        ('x_pels', ctypes.c_ushort),
        ('y_pels', ctypes.c_ushort),
        ('x_tile', ctypes.c_ushort),
        ('y_tile', ctypes.c_ushort),
        ('type', ctypes.c_ubyte),
        ('do_julia', ctypes.c_ubyte),
        ('jx', ctypes.c_double),
        ('jy', ctypes.c_double),
        ('algo',ctypes.c_void_p),
]


class Fractizer(object):
    def __init__(self):
        self.fr = ctypes.cdll.LoadLibrary('./fractizer.so')
        self.params = fparams_t()
        self.fr.set_default_params(ctypes.byref(self.params))

    def getParams(self):
        return self.params

    def showParams(self):
        self.fr.showParams(ctypes.byref(self.params))

    def compute(self):
        output_len = self.params.x_pels * self.params.y_pels
        output_ary_t = ctypes.c_ushort * output_len
        output_ary = output_ary_t()
        poutput_ary = ctypes.pointer(output_ary)
        self.fr.generate_fractal(ctypes.byref(self.params),poutput_ary)
        return output_ary

    def showResult(self,ary):
        olines = []
        for j in range(self.params.y_pels):
            x = [ary[i + j*self.params.x_pels] for i in range(self.params.x_pels)]
            y = ['{0:4}'.format(q) for q in x]
            olines.append(' '.join(y))
        return '\n'.join(olines)



if __name__ == '__main__':
    # example usage
    f = Fractizer()
    f.getParams().x_pels = 20
    f.getParams().y_pels = 20
    f.showParams()
    result = f.compute()
    print(f.showResult(result))

Of the languages I tested, only Python asked me to build the code myself, but I think that’s reasonable, as their main idea is that you are binding an existing library anyway:

g++ -c -fPIC -O3 fractizer.cpp -o fractizer.o
g++ -shared fractizer.o -o fractizer.so

Not so bad. I think it would have been cool if Python could have read the C header file and generated the parallel Python type for the parameters rather than me having to create (and hardcode) it myself, but I guess it’s about par for the course. However, at least I did not have to write accessor functions.

Overall, I was impressed with Python. I was also able to run the wrapped code from multiple Python threads, each with its own instance of my wrapper class, and it all seemed to go just fine.

Olde Tymes’ Sake: Perl

I finished with Perl, because I remembered it being so easy. I remembered incorrectly. All the building and linking stuff is handled by Inline::C, but if your library uses its own types, Perl needs just as much help as the other languages. You need to tell it about any structs you might have to use, and provide accessor functions for them.

Telling Perl about the types is straightforward. You create a typemap to tell it these are pointers to things it doesn’t understand:

fparams_t *    T_PTR
cp_t *         T_PTR
uint16_t *     T_PTR
uint8_t        T_U_CHAR

Basically, I told it that there are these things called fparams_t and cp_t, and that Perl will be managing pointers to them but doesn’t really need to know about their innards. A more complex typemap could have created accessors for me automatically, but I find it easier just to let Perl treat the structs as opaque and provide access with my own routines. Usually, only a subset of the members of a struct will require access from Perl. I also had to add types for uint16_t and uint8_t because the built-in type system doesn’t know the <stdint.h> aliases for basic types. Kind of annoying, since the error messages were not helpful at all.

There is a library on CPAN, Inline::Struct, that reads struct definitions in header files and automatically creates typemaps for them. I haven’t gotten it to work yet, but I am corresponding with the author and I think we can get it working eventually. In the meantime, I have to handle the structs myself.

Anyway, this is an entire Perl script including the wrapper code and a quick-n-dirty example run:

#!/usr/bin/perl -w

use strict;
use warnings qw(all);
use Data::Dumper;
use Inline C =>
    Config =>
        INC => '.',
        TYPEMAPS => 'perl_typemaps',
        ENABLE => "AUTOWRAP";

use Inline "C";
Inline->init;

my $params = new_fparams();
my $width = 120;
my $height = 60;
fr_set_x_pels($params,$width);
fr_set_y_pels($params,$height);
fr_show($params);

my $output = fr_calc($params);

my $olines = [];
for (my $j=0;$j<$height;$j++) {
    my $line = '';
    for (my $i=0;$i<$width;$i++) {
        my $v = $output->[$j*$width+$i];
        my $s = $v >= 200 ? '*' : ' ';
        $line .= $s;
    };
    push(@$olines,$line);
};


print(join("\n",@$olines));
print("\n");

1;

__DATA__
__C__

#include "fractizer.h"
// Could have linked to pre-compiled code here, but it's easier
// to abuse the preprocessor and just include the source:
#include "fractizer.cpp"

fparams_t *new_fparams() {
    fparams_t *p = malloc(sizeof(fparams_t));
    if (p) set_default_params(p);
    return p;
}

void free_fparams(fparams_t *p) {
    if (p) free(p);
}

void fr_set_max_iters(fparams_t *p, uint16_t i) { p->max_iters = i; };
void fr_set_escape_val(fparams_t *p, double d)  { p->escape_val = d; };
void fr_set_x_min(fparams_t *p, double d)       { p->x_min = d; };
void fr_set_x_max(fparams_t *p, double d)       { p->x_max = d; };
void fr_set_y_min(fparams_t *p, double d)       { p->y_min = d; };
void fr_set_y_max(fparams_t *p, double d)       { p->y_max = d; };
void fr_set_x_pels(fparams_t *p, uint16_t i)    { p->x_pels = i; };
void fr_set_y_pels(fparams_t *p, uint16_t i)    { p->y_pels = i; };
void fr_set_x_tile(fparams_t *p, uint16_t i)    { p->x_tile = i; };
void fr_set_y_tile(fparams_t *p, uint16_t i)    { p->y_tile = i; };
void fr_set_type(fparams_t *p, uint8_t i)       { p->type = i; };
void fr_set_do_julia(fparams_t *p, uint8_t i)   { p->do_julia = i; };
void fr_set_jx(fparams_t *p, double d)          { p->jx = d; };
void fr_set_jy(fparams_t *p, double d)          { p->jy = d; };

void fr_show(fparams_t *p) {
    showParams(p);
};

SV *fr_calc(fparams_t *p) {
    size_t len = p->x_pels * p->y_pels;
    uint16_t *buf = malloc(sizeof(uint16_t) * len);
    generate_fractal(p,buf);

    AV* array = newAV();
    for (size_t i=0; i<len; i++) {
        av_push(array, newSVuv(buf[i]));
    };
    free(buf);
    return newRV_noinc((SV*)array);
}

Performance

I didn’t evaluate the performance of these various bindings; I assume they are all similar. The one exception might be the synchronous binding in Node.JS. I think that one has the potential to be faster because the same array used by the C code can be wrapped and used directly by NodeJS as a TypedArray. This avoids a copy of the entire output buffer, which all the other versions do either implicitly or explicitly.

Conclusion

Binding compiled/compilable code to your favorite dynamic language gives you the benefits of both, with the overhead of having to learn a bit more about the inner guts of the scripting language than you’d prefer. The process is more or less the same in the three languages I tried, but you can see how the philosophies vary.

Because there is no standard C++ ABI, all of the languages force you to use extern "C" C++ code. The exception is the C++ wrapper for Node’s N-API, which sort of does the inverse: you include a header that uses C++ to wrap the N-API C functions, and it works because you are the one compiling the wrapper.

Something I did not try is binding object code from languages other than C and C++. I assume that as long as those languages can expose a C-style ABI, I can just link their objects in and pretend they came from a C compiler.

Addendum: SWIG

Those of you who have been down this road before will ask: what about SWIG? The “Simplified Wrapper and Interface Generator” is a tool designed to look at your source and automatically generate wrappers for popular languages. SWIG has been around for a very long time. The last time I tried to use it was >10 years ago, and I remember the experience as not great:

  • Getting SWIG installed and built was not trivial at the time, particularly on Windows.
  • I had to learn about SWIG and its special .i language for specifying interfaces.
  • I had to make changes to my code so that SWIG could understand it.
  • I had to apply some manual tweaks to the wrapper code it generated. You can do this with their language, but it is still basically you coding in the target language’s API.

In the intervening decade, some of this is fixed and some of this is most decidedly not fixed. On the plus side, it’s easier to install than it used to be. But on the downside, SWIG macros are as arcane as ever, and they do not save you from having to know how your scripting language interface API works — which to my mind is the whole point of SWIG.

This is what a usable input file to SWIG looked like for my project (for Perl):

%module fractizer
%{
#include "fractizer.h"
%}

%include "fractizer.h"

%inline %{
    unsigned short *make_obuf(size_t s) {
        unsigned short *p = malloc(sizeof(unsigned short) * s);
        return p; 
    }
    void free_obuf(unsigned short *p) {
        if (p) free(p);
    }
    SV *bufToArray(unsigned short *buf, size_t len) {
        AV *av = newAV();
        for (size_t i=0; i<len; i++) {
            av_push(av, newSVuv(buf[i]));
        }
        return newRV_noinc((SV*)av);
    }
%}

The first part is not so bad: just include the C header file. But things go downhill from there:

  • I needed to provide routines for creating and freeing buffers that were not in my library. That’s reasonable, as this code is still all in “C land.”
  • To see the contents of that buffer in the scripting language, I needed to provide a routine to do that. And that routine is written using the primitives provided by the scripting language — the exact thing you’d hope SWIG was designed to do for you. So now I have invested time in learning SWIG and I still need to know how $scripting_language works under the hood. Why bother?
  • Finally, SWIG didn’t understand stdint types, either, so I had to change my code to use olde fashioned names. Maybe that’s just a Perl typemap issue.

It also took me a little while to figure out how it wrapped my code and how to call it. The right answer looks like this:

#!/usr/bin/perl -w
use lib '.';

use fractizer;

my $width  = 200;
my $height = 100;

my $params = fractizer::fparams_t->new();
fractizer::set_default_params($params);
$params->swig_x_pels_set($width);
$params->swig_y_pels_set($height);

fractizer::showParams($params);

my $obuf = fractizer::make_obuf($params->swig_x_pels_get() * $params->swig_y_pels_get());

fractizer::generate_fractal($params,$obuf);

my $output = fractizer::bufToArray($obuf,$params->swig_x_pels_get() * $params->swig_y_pels_get());

fractizer::free_obuf($obuf);

# ... then display the results

In short, my take on SWIG hasn’t changed: it introduces the complexity of its own little macro language and you are not really shielded from the details of your scripting language’s implementation.

meh.