How to work with Ruby strings in C extensions

The Problem

This post is about string types in C and Ruby. As you may know, C uses null-terminated strings, while Ruby uses a more sophisticated string type that stores its length explicitly. As a result, C strings cannot contain a null byte, while Ruby strings can. Many Ruby gems are written in C, but what happens when you convert a Ruby string to a C string?

Well, that depends. There are at least two ways to do it in the Ruby C API:

RSTRING_PTR(VALUE)

strlcpy(
    our_c_string,
    RSTRING_PTR(our_ruby_string),
    RSTRING_LEN(our_ruby_string) + 1 // don't forget the terminating zero
);

So what happens to strings containing nulls? They just get truncated.

"HAHA I'M HAXXOR! \0 SOME CORRUPT DATA" becomes "HAHA I'M HAXXOR! "
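To see the truncation without building a whole extension, here is a minimal sketch in plain C. It does not use the Ruby API at all: the struct counted_str and both function names are made up for illustration, standing in for the pointer and length that RSTRING_PTR and RSTRING_LEN would give you.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Ruby string: a pointer plus an explicit
 * length, roughly what RSTRING_PTR and RSTRING_LEN expose. */
struct counted_str {
    const char *ptr;
    size_t      len;
};

/* C-string style copy: stops at the first zero byte, as strlcpy or
 * strcpy would. Returns the bytes copied, not counting the terminator. */
size_t copy_as_c_string(char *dst, size_t dst_size, struct counted_str src)
{
    assert(dst_size > 0); /* need room for at least the terminator */
    size_t n = 0;
    while (n < src.len && n + 1 < dst_size && src.ptr[n] != '\0') {
        dst[n] = src.ptr[n];
        n++;
    }
    dst[n] = '\0';
    return n;
}

/* Length-aware copy: every byte survives, embedded zeros included.
 * Returns the bytes copied (capped at dst_size). */
size_t copy_as_bytes(char *dst, size_t dst_size, struct counted_str src)
{
    size_t n = src.len < dst_size ? src.len : dst_size;
    memcpy(dst, src.ptr, n);
    return n;
}
```

Fed the example string above, copy_as_c_string keeps only the 17 bytes before the null, while copy_as_bytes keeps all of them. The difference is purely in whether the code trusts the terminator or the length.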

But there is another, better way.

StringValueCStr(VALUE)

our_c_string = StringValueCStr(our_ruby_string);

Seems simple enough, but if our_ruby_string contains a null byte, we get an exception, the same one we got from systemd-journal:

ArgumentError: string contains null byte

Better, but it still may not be good enough.
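Conceptually, the check behind that exception boils down to "is there a zero byte anywhere inside the declared length?". The sketch below is not Ruby's actual implementation, just a plain C illustration of the idea; has_embedded_nul is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the first `len` bytes at `ptr` contain a zero byte,
 * i.e. the data cannot be represented losslessly as a C string. */
bool has_embedded_nul(const char *ptr, size_t len)
{
    assert(ptr != NULL || len == 0);
    return len > 0 && memchr(ptr, '\0', len) != NULL;
}
```

A wrapper could run a check like this up front and decide for itself whether to raise, reject, or escape the data, instead of silently truncating it.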

What to do then?

There are several options to make it right, depending on what you're trying to do.

Wrapping C library that depends on C strings

If the C library that you're wrapping relies heavily on C strings, then you have no choice. Just use StringValueCStr and let it fail on strings that are not valid C strings.

This also happens when you use ffi instead of writing an extension. ffi always uses StringValueCStr for string arguments.

Wrapping C library that doesn't depend on C strings

You may be lucky enough to find one. I got lucky when writing a wrapper for systemd-journal, because sd_journal_sendv() takes iovec structs as arguments, not strings. So just use it! Get the buffer pointer and length.

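struct iovec* msgs = xcalloc(argc, sizeof(struct iovec));

for (int i = 0; i < argc; i++) {
    VALUE v = argv[i];
    // pointer + explicit length: embedded nulls are passed through intact
    msgs[i].iov_base = RSTRING_PTR(v);
    msgs[i].iov_len  = RSTRING_LEN(v);
}

int result = sd_journal_sendv(msgs, argc);
xfree(msgs); // free the iovec array only; the string buffers belong to Ruby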

However, doing it this way means that you must abandon ffi and write a real extension.
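If you want to play with the iovec pattern outside of Ruby, here is a small standalone sketch. It assumes a POSIX system (for sys/uio.h); struct field and fill_iovecs are hypothetical names, and the struct merely stands in for a Ruby string's pointer and length.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h> /* struct iovec (POSIX) */

/* Hypothetical stand-in for a Ruby string: pointer plus explicit length. */
struct field {
    char  *ptr;
    size_t len;
};

/* Fills out[0..n-1] from in[0..n-1] without ever calling strlen,
 * and returns the total number of bytes described. */
size_t fill_iovecs(struct iovec *out, const struct field *in, int n)
{
    assert(n >= 0);
    size_t total = 0;
    for (int i = 0; i < n; i++) {
        out[i].iov_base = in[i].ptr;
        out[i].iov_len  = in[i].len;
        total += in[i].len;
    }
    return total;
}
```

Because the length travels alongside the pointer, a field such as "MESSAGE=one\0two" is described as 15 bytes rather than being cut off at the null.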

Wrapping a library without strings or writing something yourself

Then just avoid C strings. Use ffi and do all string-related work in Ruby. Or even use rice and write everything in C++.

Conclusion

C strings are awful. Avoid them.
