David Mertens
2014-05-23 12:35:42 UTC
Hey everyone,
tl;dr: C::Blocks is a new TinyCC-based module, presently only available on
GitHub. (1) It jit-compiles blocks of C code, building and inserting OPs
into the Perl OP tree, making invocation of C code essentially free. (2) It
will allow different blocks of C code to share function and struct
declarations, thus removing the need to always recompile perl.h, an
otherwise major cost of jit-compiling C code that can interface with Perl
and Perl data structures.
I am seeking help and encouragement to squash the segfaults that
currently prevent the completion of the second feature. :-)
----
I like Perl, but I like C, too. I would like to be able to write and call C
code from Perl in about as painless a way as possible. Inline::C is nice,
as are XS::TCC and C::TinyCompiler, but we can do better. C::Blocks is my
attempt to do better.
*Pain point 1*. C code should be a first-class citizen. With XS::TCC and
C::TinyCompiler, you pass your code to the compiler via a string. With
Inline::C, you either place your code at the bottom of your script in a
__DATA__ section, or you enclose it in a string. Steffen's module is
probably the most transparent in this sense. Still, working with an
interface that requires me to compile a string to get my product feels the
same as compiling a regex from a string. This is Perl! We can do better!
C::Blocks does better by using a keyword parser hook. Blocks of code that
you want executed are called like so:
    print "Before cblock\n";
    cblock {
        printf("In cblock\n");
    }
    print "After cblock\n";
If stdio.h is included, you get the output
    Before cblock
    In cblock
    After cblock
Because it uses a parser hook, the C code really is inline with your Perl
code.
*Pain point 2*. Calling C code should be obvious and cheap. All three
modules discussed so far provide a mechanism for calling C functions. This
means that for a simple, small operation, I must wrap my idea into a
function in one place and invoke it in another. Furthermore, if I want to
repeatedly call a block of C code in a loop, I must define that block of
code somewhere outside of the loop, potentially very far from the call
site. C::TinyCompiler suffers further because it uses a complicated and
rather slow calling mechanism.
C::Blocks solves this by extracting and jit-compiling the C code at Perl
parse time, generating an OP and inserting it into the Perl OP tree. This
means that you can insert your C code exactly where you want it and not
worry about repeated re-compiles. If you wrap the example given above in a
for loop, the C code is compiled once, at parse time, and the resulting OP
simply runs on every iteration.
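To make the "compile once, run many times" idea concrete, here is a toy model in plain C. It is only a sketch and not the actual C::Blocks or Perl internals: the names toy_op, pp_cblock, toy_runops, and run_cblock_in_loop are made up for illustration, and pp_cblock merely stands in for the machine code a jit compiler would produce. The point is that once the OP exists, executing it costs little more than a call through a function pointer.

```c
#include <stddef.h>

/* Toy model of an op tree node: a function pointer plus the next op. */
typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*ppaddr)(toy_op *self); /* code to run for this op */
    toy_op *next;                    /* op that follows in execution order */
    int    *counter;                 /* hypothetical payload for this demo */
};

/* Stand-in for the code a jit compiler would emit for one cblock. */
static toy_op *pp_cblock(toy_op *self) {
    ++*self->counter;   /* the "body" of the block */
    return self->next;  /* hand control to the following op */
}

/* Run loop: follow the chain, calling each op's function pointer. */
static void toy_runops(toy_op *start) {
    for (toy_op *op = start; op != NULL; )
        op = op->ppaddr(op);
}

/* Build the op once ("parse time"), then execute it n times ("run time"). */
static int run_cblock_in_loop(int n) {
    int hits = 0;
    toy_op block = { pp_cblock, NULL, &hits }; /* built exactly once */
    for (int i = 0; i < n; i++)
        toy_runops(&block);                    /* no recompilation here */
    return hits;
}
```

Calling run_cblock_in_loop(5) builds a single op and runs its body five times, which mirrors what happens when a cblock sits inside a Perl for loop.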
*Pain point 3*. Sharing C code should be as easy as sharing Perl code.
C::TinyCompiler provides a fairly complex mechanism to allow modules to add
declarations and symbols to a compiler context. Any string that uses that
mechanism must recompile those declarations, however, which tempts me to
prematurely optimize by placing all of my C code in one giant string
instead of interspersing it among my Perl code. Neither XS::TCC nor
Inline::C provides much (if any) automated machinery to share code.
C::Blocks provides a mechanism to share function declarations, struct
definitions, and other identifiers with other cblocks in the current
lexical scope, as well as to share them on a per-package basis. It is even
more versatile than normal Perl function scoping, allowing you to correctly
correlate functionality with lexical scope. (It is also somewhat buggy, as
discussed next.)
*Pain point 4*. Changing C code should not cost anything. Inline::C can
take seconds to recompile a changed set of C code. In contrast, there is no
cost associated with changing code when using XS::TCC and C::TinyCompiler
because they jit-compile their code. That comes at the cost, however, of
always compiling everything each time you invoke your Perl script. If your
C code needs the Perl C API, you will have to re-parse perl.h every time
you *compile* a code block, which can happen many times with each execution
of your script. Inline::C's caching mechanism provides a big win in that
respect, unless you change your code. It would be nice if we could somehow
cache the result of parsing ``#include "perl.h"''.
C::Blocks uses a fork of tcc that I've been working on for many months
aimed at allowing one compiler context to share its symbol table with other
compiler contexts. This builds on the previous point: the sharing
mechanism described there also applies to preprocessor includes,
so once I have compiled a block that uses the Perl headers, I can share all
of those declarations with later compilation units, without recompiling. In
future work, I plan to store these symbol tables to disk so that they don't
even need to be re-parsed each time you run your script.
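As a rough illustration of the sharing idea (and not the tcc fork's actual data structures), the sketch below gives each compiler context its own small symbol table plus a pointer to a table built by an earlier context. Everything here is hypothetical: sym_table, sym_declare, sym_lookup, and MAX_SYMS are names invented for the example. A later context finds the earlier declarations by following the pointer instead of parsing the headers again.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8  /* tiny fixed capacity, enough for a demo */

/* One compiler context's declarations, plus a link to a shared table. */
typedef struct sym_table sym_table;
struct sym_table {
    const char      *names[MAX_SYMS];
    int              count;
    const sym_table *shared; /* table from an earlier compile, or NULL */
};

/* Record a declaration in this context. Returns 0 on success, -1 if full. */
static int sym_declare(sym_table *t, const char *name) {
    if (t->count >= MAX_SYMS)
        return -1;
    t->names[t->count++] = name;
    return 0;
}

/* Look a name up locally first, then in each shared table in turn.
 * Returns 1 if found, 0 otherwise. */
static int sym_lookup(const sym_table *t, const char *name) {
    for (; t != NULL; t = t->shared)
        for (int i = 0; i < t->count; i++)
            if (strcmp(t->names[i], name) == 0)
                return 1;
    return 0;
}
```

In this model a first context would declare the names that come from the Perl headers, and every later context would set its shared pointer to that table. Lookups are one-directional: the later context sees the shared names, but the shared table never sees the later context's own declarations.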
----
C::Blocks currently addresses pain points 1 and 2 above completely. It
has taken many months to hack on tcc to reach this point. I have now
encountered some segfault-causing issues when trying to share code, yet I
have a hard time reproducing those segfaults with direct tests on tcc. If
you think this sounds like a cool project, I would appreciate some
camaraderie as I try to dig into the internals of tcc. You can find me on
Perl's IRC network hanging out on #pdl, #xs, and #tinycc, among other
channels. You can find my work at https://github.com/run4flat/C-Blocks
If you would like to help out, let me know and I will give you a tour
through the codebase. :-)
Any help or encouragement would be much appreciated! Thanks!
David
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan