Announcing C::Blocks, a different way to interface Perl and C code

David Mertens

2014-05-23 12:35:42 UTC

Hey everyone,

tl;dr: C::Blocks is a new TinyCC-based module, presently only available on
GitHub. (1) It jit-compiles blocks of C code, building and inserting OPs
into the Perl OP tree, making invocation of C code essentially free. (2) It
will allow different blocks of C code to share function and struct
declarations, thus removing the need to always recompile perl.h, an
otherwise major cost of jit-compiling C code that can interface with Perl
and Perl data structures.

I am seeking help and encouragement to squash the segfaults that
currently prevent the completion of the second feature. :-)

----

I like Perl, but I like C, too. I would like to be able to write and call C
code from Perl in about as painless a way as possible. Inline::C is nice,
as are XS::TCC and C::TinyCompiler, but we can do better. C::Blocks is my
attempt to do better.

*Pain point 1*. C code should be a first-class citizen. With XS::TCC and
C::TinyCompiler, you pass your code to the compiler via a string. With
Inline::C, you either place your code at the bottom of your script in a
__DATA__ section, or you enclose it in a string. Steffen's module (XS::TCC)
is probably the most transparent in this sense. Still, working with an
interface that requires me to compile a string to get my product feels the
same as compiling a regex from a string. This is Perl! We can do better!

C::Blocks does better by using a keyword parser hook. Blocks of C code that
you want executed are written like so:

    print "Before cblock\n";
    cblock {
        printf("In cblock\n");
    }
    print "After cblock\n";

If stdio.h is included, you get the output

    Before cblock
    In cblock
    After cblock

Because it uses a parser hook, the C code really is inline with your Perl
code.

*Pain point 2*. Calling C code should be obvious and cheap. All three
modules discussed so far provide a mechanism for calling C functions. This
means that for a simple, small operation, I must wrap my idea into a
function in one place and invoke it in another. Furthermore, if I want to
repeatedly call a block of C code in a loop, I must define that block of
code somewhere outside of the loop, potentially very far from the call
site. C::TinyCompiler suffers further because it uses a complicated and
rather slow calling mechanism.

C::Blocks solves this by extracting and jit-compiling the C code at Perl
parse time, generating an OP and inserting it into the Perl OP tree. This
means that you can insert your C code exactly where you want it and not
worry about repeated re-compiles. If you were to wrap the example given
above in a for loop, you would see how this works.
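
The compile-once, run-many behaviour described above can be sketched in plain C. This is only an illustration of the idea, not C::Blocks or Perl internals: toy_op, compile_block, execute, and run_loop are hypothetical names, and the "compile" step here simply stores a pointer to an ordinary function where the real module would invoke its TinyCC-based JIT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an OP node: it only holds a pointer to
 * already-compiled code and a link to the next node. */
typedef struct toy_op {
    void (*run)(void);
    struct toy_op *next;
} toy_op;

int compile_count = 0;  /* how many times "compilation" happened */
int call_count = 0;     /* how many times the block body ran */

/* The body of the C block. */
void block_body(void) {
    call_count++;
    printf("In cblock\n");
}

/* Stand-in for the parse-time JIT step: it runs once per block that
 * appears in the source, no matter how often the block later executes. */
void compile_block(toy_op *op) {
    compile_count++;
    op->run = block_body;
    op->next = NULL;
}

/* Stand-in for the interpreter's run loop: walk the chain of nodes
 * and call the compiled code attached to each one. */
void execute(const toy_op *op) {
    for (; op != NULL; op = op->next) {
        assert(op->run != NULL);
        op->run();
    }
}

/* "Parse" the block once, then execute the same node n times, as a
 * surrounding for loop would. Returns the number of compilations. */
int run_loop(int n) {
    toy_op op;
    compile_count = 0;
    call_count = 0;
    compile_block(&op);
    for (int i = 0; i < n; i++) {
        execute(&op);
    }
    return compile_count;
}
```

Calling run_loop(3) prints "In cblock" three times and returns 1: each iteration costs only an indirect function call, because the compilation step is not repeated.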

*Pain point 3*. Sharing C code should be as easy as sharing Perl code.
C::TinyCompiler provides a fairly complex mechanism to allow modules to add
declarations and symbols to a compiler context. Any string that uses that
will need to recompile those declarations, however, tempting me to
prematurely optimize by placing all of my C code in one giant string
instead of interspersed among my Perl code. Neither XS::TCC nor Inline::C
provides much (if any) automated machinery to share code.

C::Blocks provides a mechanism to share function declarations, struct
definitions, and other identifiers with other cblocks in the current
lexical scope, as well as to share them on a per-package basis. It is even
more versatile than normal Perl function scoping, allowing you to correctly
correlate functionality with lexical scope. (It is also somewhat buggy, as
discussed below.)

*Pain point 4*. Changing C code should not cost anything. Inline::C can
take seconds to recompile a changed set of C code. In contrast, there is no
cost associated with changing code when using XS::TCC and C::TinyCompiler
because they jit-compile their code. That comes at the cost, however, of
always compiling everything each time you invoke your Perl script. If your
C code needs the Perl C API, you will have to re-parse perl.h every time
you *compile* a code block, which can happen many times with each execution
of your script. Inline::C's caching mechanism provides a big win in that
respect, unless you change your code. It would be nice if we could somehow
cache the result of parsing ``#include "perl.h"''.

C::Blocks uses a fork of tcc that I've been working on for many months,
aimed at allowing one compiler context to share its symbol table with other
compiler contexts. This builds on the previous point: the sharing mechanism
described there also applies to preprocessor includes, so once I have
compiled a block that uses the Perl headers, I can share all of those
declarations with later compilation units without recompiling. In future
work, I plan to store these symbol tables to disk so that they don't even
need to be re-parsed each time you run your script.
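
As a rough illustration of why a shared symbol table avoids repeated header parsing, here is a toy model in plain C. It does not reflect tcc's actual data structures or the fork's API: symtab, symtab_define, symtab_lookup, parse_perl_headers, and compile_blocks_sharing are all hypothetical names, and "parsing perl.h" is reduced to defining two identifiers (SV and newSViv, which do exist in the Perl C API).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical toy symbol table: a fixed array of names plus an optional
 * shared table that is consulted when a name is not found locally. */
typedef struct symtab {
    const char *names[MAX_SYMS];
    size_t count;
    const struct symtab *shared;  /* table exported by an earlier context */
} symtab;

int header_parse_count = 0;  /* how often the "expensive" parse ran */

void symtab_init(symtab *t, const symtab *shared) {
    t->count = 0;
    t->shared = shared;
}

void symtab_define(symtab *t, const char *name) {
    assert(t->count < MAX_SYMS);
    t->names[t->count++] = name;
}

/* Returns 1 if name is visible from t, searching t first and then
 * any tables it shares; returns 0 otherwise. */
int symtab_lookup(const symtab *t, const char *name) {
    for (; t != NULL; t = t->shared) {
        for (size_t i = 0; i < t->count; i++) {
            if (strcmp(t->names[i], name) == 0) {
                return 1;
            }
        }
    }
    return 0;
}

/* Stand-in for parsing perl.h: expensive, so it should run only once. */
void parse_perl_headers(symtab *t) {
    header_parse_count++;
    symtab_define(t, "SV");
    symtab_define(t, "newSViv");
}

/* Compile n later blocks against one shared header table. Returns how
 * many of those blocks could resolve a declaration from the headers. */
int compile_blocks_sharing(int n) {
    symtab headers;
    int resolved = 0;
    header_parse_count = 0;
    symtab_init(&headers, NULL);
    parse_perl_headers(&headers);
    for (int i = 0; i < n; i++) {
        symtab block;
        symtab_init(&block, &headers);
        symtab_define(&block, "my_local_helper");
        resolved += symtab_lookup(&block, "newSViv");
    }
    return resolved;
}
```

In this model, compile_blocks_sharing(5) resolves newSViv in all five blocks while header_parse_count stays at 1. Without sharing, each block would have to call parse_perl_headers itself.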

----

C::Blocks currently addresses, completely, pain points 1 and 2 above. It
has taken many months to hack on tcc to reach this point. I have now
encountered some segfault-causing issues when trying to share code, yet I
have a hard time reproducing those segfaults with direct tests on tcc. If
you think this sounds like a cool project, I would appreciate some
camaraderie as I try to dig into the internals of tcc. You can find me on
Perl's IRC network hanging out on #pdl, #xs, and #tinycc, among other
channels. You can find my work at https://github.com/run4flat/C-Blocks

If you would like to help out, let me know and I will give you a tour
through the codebase. :-)

Any help or encouragement would be much appreciated! Thanks!
David

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan

Ingy dot Net

2014-05-23 21:38:04 UTC

I for one welcome our new C::Blocks overlords! :-)


s***@public.gmane.org

2014-05-24 03:34:07 UTC

Sounds pretty cool, David.
A plodder like me is probably just going to stick with Inline, but it would be great to see C::Blocks take off. And it has the attractiveness to do so.
(I like the way you can so easily just plonk the C code right in there amongst the perl code .... makes Inline look not-so-inline :-)

Cheers,
Rob


David Mertens

2014-05-24 11:08:15 UTC

Thanks Rob, and thanks Ingy. Any and all words of encouragement are much
appreciated.

By the way, it's possible to write a similar keyword-based wrapper for
Inline::C. In fact, I plan to eventually build something like
C::Blocks::Cache (or something like that) which generates a .xs and .pmc
file from a single .pm file. In principle, one could then pass this along
to the Inline machinery to compile.

Eventually. :-)
David


Reini Urban

2014-05-27 14:55:14 UTC

Post by David Mertens
Thanks Rob, and thanks Ingy. Any and all words of encouragement are much
appreciated.
By the way, it's possible to write a similar keyword-based wrapper for
Inline::C. In fact, I plan to eventually build something like
C::Blocks::Cache (or something like that) which generates a .xs and .pmc
file from a single .pm file. In principle, one could then pass this
along to the Inline machinery to compile.
Eventually. :-)
David

Very cool

I'm planning to do something similar with perlcc and rperl.

Generate shared libs per .pm, a Makefile and the ability to choose the
compilation backend (= optimization level) via B::C, B::CC or rperl.

David Mertens

2015-08-04 00:53:25 UTC

Post by David Mertens
Hey everyone,
C::Blocks is a new TinyCC-based module, presently only available on
github. (1) It jit-compiles blocks of C code, building and inserting OPs
into the Perl OP tree, making invocation of C code essentially free. (2) It
will allow different blocks of C code to share function and struct
declarations, thus removing the need to always recompile perl.h, an
otherwise major cost of jit-compiling C code that can interface with Perl
and Perl data structures.
I am currently seeking help and encouragement to squash the segfaults that
currently prevent the completion of the second feature. :-)

It took a lot longer than I had expected, but I finally completed the
second feature listed above and published the first "alpha" release of
C::Blocks <https://metacpan.org/pod/C::Blocks> today.

It took a long time because ultimately I had to create my own fork of the
Tiny C Compiler <https://github.com/run4flat/tinycc> that supports extended
symbol tables. I released an Alien distribution
<https://metacpan.org/pod/Alien::TinyCCx> with (what I believe to be a
nearly complete implementation of) extended symbol table support late last
week, and it seems to be passing its test suite decently well
<http://matrix.cpantesters.org/?dist=Alien-TinyCCx+0.06> on a fair number
of platforms. Alien::TinyCCx is doing particularly well on Linux and
decently on Windows. I'm a bit annoyed it's not passing on Macs because it works on
my Mac. That'll get fixed soonish, I hope.

C::Blocks is still rough around the edges, but it's surprisingly fast. I have
a number of examples
<https://metacpan.org/source/DCMERTENS/C-Blocks-0.01/examples> that might
give you an idea of how it works. I am particularly fond of my "port"
<https://metacpan.org/source/DCMERTENS/C-Blocks-0.01/examples/libobjmg.pl>
of XS::Object::Magic <https://metacpan.org/pod/XS::Object::Magic>, which
simply involved copying the top half of rafl's code from his Magic.xs
<https://metacpan.org/source/FLORA/XS-Object-Magic-0.04/Magic.xs> file into
a cshare block.

I plan to write more about it on blogs.perl.org, so I would encourage folks
to check there if interested.

David