Discussion:
Stringwise comparisons (Inline::C)
David Oswald
2012-05-07 16:50:18 UTC
I'm trying to figure out a sane approach to string comparisons in
extension code. I'm using Inline::C on this one. The simplest
solution turns out to be the hardest: Unpack an SV into a C-String,
and use C's comparison functions. Why is that the hardest? Because
it ignores Unicode. However, Perl's 'cmp', 'le', 'ge', 'lt', 'gt',
and 'eq' know about Unicode. So I wanted to compare the two SV's
using Perl's native operations. I see some documentation in perlapi,
but it's so terse I could benefit from an example.

So... does anyone know how these would be implemented in C as XS
extension code such that the comparisons invoke Perl's native tools?

sub compare {
    return $_[0] cmp $_[1];
}

.... A C stub:

int compare( SV* left, SV* right ) {
    int result;
    // Compare the two SV's stringwise.
    return result;
}


....and....

sub lessthan {
    return $_[0] lt $_[1];
}

A C stub:

int lessthan( SV* left, SV* right ) {
    int result;
    // Less-than compare the two SV's stringwise.
    return result;
}

My goal is to use Inline::C and Inline::C2XS to produce an XS version
of my List::BinarySearch module. It would be named
List::BinarySearch::XS, and I would convert List::BinarySearch to
auto-detect the XS version's presence, substituting the XS versions in
place of the Pure Perl versions if the XS module exists. Not
surprisingly, the documentation on dealing with strings in extension
code is a bit opaque.

Dave
--
David Oswald
daoswald-***@public.gmane.org
David Mertens
2012-05-07 17:13:00 UTC
Sorry, I got nothing. The only function I found for UTF8 comparisons is
foldEQ_utf8, which is supposed to perform case-insensitive string
comparisons. Maybe the XS mailing list or p5p is a better place to ask?

David
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
Sisyphus
2012-05-08 01:30:28 UTC
Post by David Oswald
sub compare {
return $_[0] cmp $_[1];
}
int compare( SV* left, SV* right ) {
int result;
// Compare the two SV's stringwise.
return result;
}
....and....
sub lessthan {
return $_[0] lt $_[1];
}
int lessthan( SV* left, SV* right ) {
int result;
// Less-than compare the two SV's stringwise.
return result;
}
I think a callback to a perl subroutine is what you're after. See the
perlcall documentation.

The following demo is taken from one of the examples in the perlcall docs -
but it prints out the result of the comparison, rather than returning it.

I couldn't quickly modify it to return the value .... though it shouldn't be
that difficult. I think you'll need to replace "dSP" with "dXSARGS" and then
somehow XSRETURN the value off the stack. I'll leave my quick and wrong
attempt in there - it simply returns 'hello'.
There might even be an example in perlcall that demonstrates how to
correctly return a value - I didn't look all the way through it.

##############################
use warnings;

use Inline C => Config =>
BUILD_NOISY => 1;

use Inline C => <<'EOC';

void call_Compare(char * a, char * b) {
    dSP;
    int count;

    ENTER;
    SAVETMPS;

    PUSHMARK(SP);
    XPUSHs(sv_2mortal(newSVpv(a, 0)));
    XPUSHs(sv_2mortal(newSVpv(b, 0)));
    PUTBACK;

    count = call_pv("Compare", G_SCALAR);

    SPAGAIN;

    if (count != 1)
        croak("Big trouble\n");

    printf ("Comparison of %s and %s yields %d\n", a, b, POPi);

    PUTBACK;
    FREETMPS;
    LEAVE;

}


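/* The quick and wrong attempt mentioned above, kept as posted: the value
   that Compare leaves on the stack is never copied into ST(0), and the
   LEAVE that should pair with ENTER is missing. */
void return_Compare(char * a, char * b) {
    dXSARGS;
    int count;

    ENTER;
    SAVETMPS;

    PUSHMARK(SP);
    XPUSHs(sv_2mortal(newSVpv(a, 0)));
    XPUSHs(sv_2mortal(newSVpv(b, 0)));
    PUTBACK;

    count = call_pv("Compare", G_SCALAR);

    SPAGAIN;

    if (count != 1)
        croak("Big trouble\n");

    PUTBACK;
    FREETMPS;
    XSRETURN(1);

}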

EOC

call_Compare ('hello', 'world');
call_Compare ('hello', 'hello');
call_Compare ('hello', 'hell');

print return_Compare ('hello', 'world'), "\n";
print return_Compare ('hello', 'hello'), "\n";
print return_Compare ('hello', 'hell'), "\n";


sub Compare {
return $_[0] cmp $_[1];
}

##############################
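
For anyone reading this in the archive: a minimal sketch of how return_Compare could hand the comparison result back, following the "Returning a Scalar" pattern in perlcall rather than dXSARGS/XSRETURN. It assumes a Perl sub named Compare exists in package main, as in the demo above, and that the function is dropped into an Inline::C section. Declaring the return type as int is my own choice here, not something from the original demo, so Inline::C's typemap does the work of returning the value.

```c
/* Inline::C normally supplies these headers itself; they are repeated
   here only so the snippet reads as a complete XS-style unit. */
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

/* Calls the Perl sub main::Compare and returns its integer result. */
int return_Compare(char * a, char * b) {
    dSP;
    int count;
    int result;

    ENTER;
    SAVETMPS;

    /* Push both arguments as mortal copies onto the Perl stack. */
    PUSHMARK(SP);
    XPUSHs(sv_2mortal(newSVpv(a, 0)));
    XPUSHs(sv_2mortal(newSVpv(b, 0)));
    PUTBACK;

    count = call_pv("Compare", G_SCALAR);

    SPAGAIN;

    if (count != 1)
        croak("Big trouble\n");

    /* Copy the value out before FREETMPS releases the temporaries. */
    result = POPi;

    PUTBACK;
    FREETMPS;
    LEAVE;

    return result;
}
```

Note that taking char * arguments still flattens the Perl strings to raw bytes before Compare ever sees them, so this only addresses returning the value, not the Unicode question.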

Cheers,
Rob
David Oswald
2012-05-08 01:52:24 UTC
Post by Sisyphus
I think a callback to a perl subroutine is what you're after. See the
perlcall documentation.
That demo will actually help when I get this function (in an XS
version of List::BinarySearch):

bsearch_custom { $_[0] cmp $_[1] } $needle, @haystack

But for this:

bsearch_str $needle, @haystack

I wouldn't be calling out to a subroutine implemented in Perl, but
rather, to a Perl internal built-in that does comparisons.

The perl-xs-***@public.gmane.org mailing list has been completely silent on my
request for suggestions. Max Maischein (Corion) has been helpful,
with this suggestion over at PerlMonks:

Corion says: I think you'll need to call the appropriate OP,
which seems (wildly guessing) to be pp_scmp in pp.c.
Or appropriate the code from there.

I've looked at pp.c (Perl internals) and I think he's right for the
case of 'cmp'. Looks like I just need to figure out how to use the
undocumented function, and need to locate somewhere in that code the
'lt', 'gt', and 'eq' operators.

Once I figure it out it actually may make a good addition to the
Inline::C cookbook. Any existing example code that shows how easy it
is to pass Perl strings as C-strings is glossing over the fact that as
soon as the user's data is utf8 encoded the code breaks.
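
To make that breakage concrete, here is a small plain-C sketch (no Perl headers needed). The constant and function names are made up for illustration. It shows two things that can go wrong when the raw buffers behind two Perl strings are handed to strcmp: the same character can be stored as different bytes depending on whether the string is held as Latin-1 or UTF-8, and byte order can disagree with code point order when the two encodings are mixed.

```c
#include <string.h>

/* U+00E9 (e with acute) stored two ways: one Latin-1 byte, or two UTF-8 bytes. */
static const char E_ACUTE_LATIN1[] = "\xE9";
static const char E_ACUTE_UTF8[]   = "\xC3\xA9";

/* U+0100 (A with macron) has no Latin-1 form; in UTF-8 it is the bytes C4 80. */
static const char A_MACRON_UTF8[]  = "\xC4\x80";

/* Returns 1 if strcmp reports the two encodings of the same character as different. */
int same_char_looks_different(void) {
    return strcmp(E_ACUTE_LATIN1, E_ACUTE_UTF8) != 0;
}

/* U+00E9 sorts before U+0100 by code point, but the Latin-1 byte E9 is
   greater than the UTF-8 lead byte C4, so strcmp orders them the other way.
   Returns 1 if strcmp puts the lower code point last. */
int order_is_inverted(void) {
    return strcmp(E_ACUTE_LATIN1, A_MACRON_UTF8) > 0;
}
```

Both functions return 1. Perl's own eq and cmp compare characters rather than storage bytes, which is why the goal in this thread is to reach Perl's comparison from C instead of unpacking to C strings.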



--

David Oswald
daoswald-***@public.gmane.org
Sisyphus
2012-05-08 03:41:08 UTC
Post by David Oswald
I wouldn't be calling out to a subroutine implemented in Perl, but
rather, to a Perl internal built-in that does comparisons.
request for suggestions. Max Maischein (Corion) has been helpful,
Corion says: I think you'll need to call the appropriate OP,
which seems (wildly guessing) to be pp_scmp in pp.c.
Or appropriate the code from there.
Yep, if you can't re-structure the code to call a perl sub then, afaik,
you'll have to do as Corion has suggested.
Post by David Oswald
Once I figure it out it actually may make a good addition to the
Inline::C cookbook.
Yes, I think so.

Cheers,
Rob
David Mertens
2012-05-08 11:26:04 UTC
Now now, let's not be too hasty. Using an undocumented function is
generally not advised, and documenting how to use an undocumented function
is generally frowned upon. As such I see two possibilities here:

1. Use pp_scmp and document it with lots of caveats indicating that the
function is not part of Perl's public API.
2. Ask p5p if pp_scmp (and any other handy UTF8 string functions) can be
added to the public API, as well as how you might go about doing that. The
difference between public and private API for Perl, as far as I know, is
just the decision to document the thing. So this probably means simply
adding the docs for the function. I could help out with this.

Would you like me to email p5p and get their ideas about it?

David
David Mertens
2012-05-08 17:24:20 UTC
To those not CC'd:

I posted a question on p5p and Nicholas Clark pointed out that sv_cmp looks
like it's *exactly* what you need, both as a general comparator and as a
means to implement eq, lt, gt, etc. Does this look right?
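
A minimal sketch of what that might look like for the two stubs from the original post, assuming the code lives in an Inline::C section. sv_cmp and sv_eq are both documented in perlapi as UTF-8 aware. The function name str_equal is made up here for illustration, and none of this has been checked against older perls.

```c
/* Inline::C normally supplies these headers itself; they are repeated
   here only so the snippet reads as a complete XS-style unit. */
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

/* Three-way string comparison, like Perl's `cmp`:
   negative, zero, or positive as left sorts before, equal to, or after right. */
int compare(SV* left, SV* right) {
    return sv_cmp(left, right);
}

/* Like Perl's `lt`: true when left sorts strictly before right. */
int lessthan(SV* left, SV* right) {
    return sv_cmp(left, right) < 0;
}

/* Like Perl's `eq` (hypothetical helper name): sv_eq tests string
   equality directly. */
int str_equal(SV* left, SV* right) {
    return sv_eq(left, right) ? 1 : 0;
}
```

The remaining operators fall out the same way: `gt` is sv_cmp(left, right) > 0, `le` is <= 0, and `ge` is >= 0. Because the arguments stay as SV* all the way down, Perl decides how to interpret the bytes, not the C code.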

David
David Oswald
2012-05-09 01:38:15 UTC
Post by David Mertens
I posted a question on p5p and Nicholas Clark pointed out that sv_cmp looks
like it's *exactly* what you need, both as a general comparator and as a
means to implement eq, lt, gt, etc. Does this look right?
David

Thanks for taking that initiative. I was a little shy about asking on
P5P for some reason; I guess I wasn't sure if I was asking the right
question.

It looks like sv_cmp is the right tool for the job.

It's unfortunate that the API is so opaque in this area. One of
Perl's strengths is text processing, but when writing XS extension
code that should "just work" regardless of whether the text handed to
it is Unicode or not, that strength is stripped away if nobody can
figure out how to do it. I think you and Nicholas made progress.

I probably won't have a chance to work with it for a few days, but
when I do I'll be sure to discuss my experiences here on the Inline
list (hopefully on-topic since I'll use Inline::C or maybe Inline::CPP
and then possibly Rob's c2xs or cpp2xs).

Dave
--
David Oswald
daoswald-***@public.gmane.org
David Mertens
2012-05-09 11:06:56 UTC
David -
Post by David Oswald
Post by David Mertens
I posted a question on p5p and Nicholas Clark pointed out that sv_cmp looks
like it's *exactly* what you need, both as a general comparator and as a
means to implement eq, lt, gt, etc. Does this look right?
David
Thanks for taking that initiative. I was a little shy about asking on
P5P for some reason;
I've been on the p5p list since December, so I've gotten some idea of what
passes as acceptable. :-)

Perl5Porters doesn't usually get that sort of question, but I've seen far
sillier questions, and I've asked similar questions elsewhere and not
gotten anywhere.
Post by David Oswald
I guess I wasn't sure if I was asking the right question.
Funny you mention it. I certainly asked the wrong question, but still
managed to get the right answer. :-)
Post by David Oswald
It looks like sv_cmp is the right tool for the job.
It's unfortunate that the API is so opaque in this area. One of
Perl's strengths is text processing, but when writing XS extension
code that should "just work" regardless of whether the text handed to
it is Unicode or not, that strength is stripped away if nobody can
figure out how to do it. I think you and Nicholas made progress.
I've also felt that perlapi generally lacks good examples. I could see
myself enjoying expanding these docs.
Post by David Oswald
I probably won't have a chance to work with it for a few days, but
when I do I'll be sure to discuss my experiences here on the Inline
list (hopefully on-topic since I'll use Inline::C or maybe Inline::CPP
and then possibly Rob's c2xs or cpp2xs).
No rush on my end. I've already gained a lot just from learning more about
the C API, and the process for expanding it was a learning experience too.

David
Steffen Mueller
2012-05-09 16:37:58 UTC
Post by David Mertens
Now now, let's not be too hasty. Using an undocumented function is
generally not advised, and documenting how to use an undocumented
1. Use pp_scmp and document it with lots of caveats indicating that the
function is not part of Perl's public API.
2. Ask p5p if pp_scmp (and any other handy UTF8 string functions) can
be added to the public API, as well as how you might go about doing
that. The difference between public and private API for Perl, as far
as I know, is just the decision to document the thing. So this
probably means simply adding the docs for the function. I could help
out with this.
Sorry, part of 2. is wrong. It's not just documentation. On win32, for
example, you can't access symbols that weren't explicitly exported. That
said, making functions public isn't *technically* difficult; it just
brings up the question of whether the current interface is
something we want to commit to for eternity.

--Steffen
