Mailing List Archive

RFC: pack()ing long words
pack() and unpack() can handle words of 1, 2, 4, and (if you built your
perl right) 8 bytes. And I use the same magic characters (although
without using pack and unpack) in Data::Hexdumper.

However, I want to extend it to support 16 byte words and, indeed, to
support any other length words. 3 byte words, for example.

I'd like to remain as compatible as possible with the characters used in
pack()'s templates, but there's nothing there for what I want.

So, can I propose that we pick a character for this purpose and at least
define some syntax for specifying a word length, endian-ness, and repeat
count for it, even if it isn't implemented yet?

Something like this perhaps:
X5,4>

which means:
X - whatever letter we choose
5 - word length
,4 - optional repeat count
> - optional endian-ness
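For comparison, a 3-byte word can already be decoded with the letters pack() has today by padding each word out to four bytes. The helper below, `unpack_u24`, is a hypothetical sketch. It is not part of perl or Data::Hexdumper, and it only illustrates the word length, repeat count and endian-ness semantics that the proposed letter would cover.

```perl
use strict;
use warnings;

# Hypothetical helper: decode 3-byte (24-bit) unsigned words using only
# existing pack() letters. Each word is padded to 4 bytes and decoded
# with N (big-endian 32-bit) or V (little-endian 32-bit).
sub unpack_u24 {
    my ($data, $endian) = @_;          # $endian is '>' or '<'
    die "length must be a multiple of 3\n" if length($data) % 3;
    my @words = unpack("(a3)*", $data);
    return map { unpack("N", "\0" . $_) } @words if $endian eq '>';
    return map { unpack("V", $_ . "\0") } @words;
}

my @be = unpack_u24("\x01\x02\x03\xff\xff\xff", '>');
print "@be\n";    # 66051 16777215
```

Padding only works while the padded word still fits a native integer, which is exactly why 16-byte words need a new letter.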

--
David Cantrell | Enforcer, South London Linguistic Massive

Fashion label: n: a liferaft for personalities
which lack intrinsic buoyancy
Re: RFC: pack()ing long words
On Mon, Aug 13, 2012 at 2:41 PM, David Cantrell <david@cantrell.org.uk> wrote:
> pack() and unpack() can handle words of 1, 2, 4, and (if you built your
> perl right) 8 bytes. And I use the same magic characters (although
> without using pack and unpack) in Data::Hexdumper.
>
> However, I want to extend it to support 16 byte words and, indeed, to
> support any other length words. 3 byte words, for example.

That sounds like an excellent idea.

> I'd like to remain as compatible as possible with the characters used in
> pack()'s templates, but there's nothing there for what I want.
>
> So, can I propose that we pick a character for this purpose and at least
> define some syntax for specifying a word length, endian-ness, and repeat
> count for it, even if it isn't implemented yet?
>
> Something like this perhaps:
> X5,4>
>
> which means:
> X - whatever letter we choose
> 5 - word length
> ,4 - optional repeat count
> > - optional endian-ness

I don't like the syntax much, but I'm not sure I can think of
something better. Maybe «X{5}4»?

Leon
Re: RFC: pack()ing long words
On Mon, 13 Aug 2012 12:41:02 +0100, David Cantrell
<david@cantrell.org.uk> wrote:

> pack() and unpack() can handle words of 1, 2, 4, and (if you built your
> perl right) 8 bytes. And I use the same magic characters (although
> without using pack and unpack) in Data::Hexdumper.
>
> However, I want to extend it to support 16 byte words and, indeed, to
> support any other length words. 3 byte words, for example.
>
> I'd like to remain as compatible as possible with the characters used in
> pack()'s templates, but there's nothing there for what I want.
>
> So, can I propose that we pick a character for this purpose and at least
> define some syntax for specifying a word length, endian-ness, and repeat
> count for it, even if it isn't implemented yet?
>
> Something like this perhaps:
> X5,4>
>
> which means:
> X - whatever letter we choose
> 5 - word length
> ,4 - optional repeat count
> > - optional endian-ness

Counterintuitive in that order

l4 is 4 longs, so if the 4 in your example matches the 4 in l4, I'd
guess that

X5>4

would be more intuitive

I kinda like your approach though. What about bits? Why restrict to
multiples of 8 bits?

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.14 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
Re: RFC: pack()ing long words
On Mon, Aug 13, 2012 at 01:51:03PM +0200, H.Merijn Brand wrote:

> Counterintuitive in that order
>
> l4 is 4 longs, so if the 4 in your example matches the 4 in l4, I'd
> guess that
>
> X5>4
>
> would be more intuitive

But then you have a problem if someone wants native endian-ness - you
can't tell whether X54 is a single word of 54 bytes, or four words of
five bytes each.

Leon's suggestion of putting the word length in {curlies} works nicely.
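A rough sketch of how a single item in the curly-brace form might be parsed, assuming the order letter, {length}, optional `<` or `>`, optional count. The regex and `parse_item` are hypothetical; nothing like this exists in pp_pack.c. With the braces, `X{54}` and `X{5}4` are unambiguous, while the unbraced `X54` is simply rejected.

```perl
use strict;
use warnings;

# Hypothetical parser for one template item in the proposed form
#   LETTER {LENGTH} [<|>] [COUNT]
# e.g. "X{5}>4". The letter, the braces and the field order are only
# the proposal under discussion, not anything perl implements.
sub parse_item {
    my ($item) = @_;
    $item =~ /\A([A-Za-z])\{(\d+)\}([<>])?(\d+|\*)?\z/
        or die "cannot parse '$item'\n";
    return {
        letter => $1,
        length => $2,
        endian => defined $3 ? $3 : 'native',
        count  => defined $4 ? $4 : 1,
    };
}

for my $t ('X{5}>4', 'X{54}', 'X{5}4') {
    my $p = parse_item($t);
    print "$t: $p->{length}-byte word x $p->{count}, $p->{endian}\n";
}
```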

> I kinda like your approach though. What about bits? Why restrict to
> multiple of 8 bits?

Hmmm ... and just have X{40}4 instead of X{5}4 for a five byte
(== 40 bits) word. I haven't looked at the source (and am somewhat
terrified to do so TBH) but I can see that getting a bit tricky. If you
consume just three bits with X3, does the next template thingy, and all
the ones after it, have to start and stop half way through a byte?
Yuck.

Maybe that's something to allow for in the syntax, but leave the
implementation until even later.
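The existing bit-string letters `B` and `b` already meet this question, and they sidestep it by padding to whole bytes. A quick check with current perl:

```perl
use strict;
use warnings;

# "B3" stores 3 bits but occupies a whole byte, so the following "C"
# starts on the next byte boundary.
my $packed = pack("B3 C", "101", 255);
printf "%d bytes: %s\n", length $packed, unpack("H*", $packed);   # 2 bytes: a0ff

# unpack() behaves the same way: the remaining 5 bits of the first
# byte are skipped, not handed to the next item.
my ($bits, $byte) = unpack("B3 C", "\xbf\x01");
print "$bits $byte\n";    # 101 1
```

So a sub-byte word letter would either follow this padding rule or be the first letter to break byte alignment for everything after it.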

--
David Cantrell | Bourgeois reactionary pig

Cum catapultae proscriptae erunt tum soli proscripti catapultas habebunt
(When catapults are outlawed, only outlaws will have catapults.)
Re: RFC: pack()ing long words
On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david@cantrell.org.uk> wrote:
> I haven't looked at the source (and am somewhat
> terrified to do so TBH) but I can see that getting a bit tricky.

pp_pack.c is where you need to be. It's rather full of "tricky".

Leon
Re: RFC: pack()ing long words
On 13 August 2012 20:31, Leon Timmermans <fawaka@gmail.com> wrote:
> On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david@cantrell.org.uk> wrote:
>> I haven't looked at the source (and am somewhat
>> terrified to do so TBH) but I can see that getting a bit tricky.
>
> pp_pack.c is where you need to be. It's rather full of "tricky".

And that is the diplomatic way to put it. :-)

yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: RFC: pack()ing long words
On 13/08/2012 20:32, demerphq wrote:
> On 13 August 2012 20:31, Leon Timmermans <fawaka@gmail.com> wrote:
>> On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david@cantrell.org.uk> wrote:
>>> I haven't looked at the source (and am somewhat
>>> terrified to do so TBH) but I can see that getting a bit tricky.
>>
>> pp_pack.c is where you need to be. It's rather full of "tricky".
>
> And that is the diplomatic way to put it. :-)

I shouldn't have looked, but I did. It is dark and full of terrors, and
I want my mummy.

Thankfully, all I'm asking for right now is that the syntax be defined,
so that I can go ahead and implement it in my module, and make sure I
use the same magic letter as pack() will do if pack/unpack ever sprout
this tentacle in the future. So if people agree that this is a good
thing to do, all that actually needs patching for now is the
documentation, something like ...

x A null byte (a.k.a ASCII NUL, "\000", chr(0))
X Back up a byte.
+
+ Y NOT YET IMPLEMENTED. This syntax is reserved for a word of
+ an arbitrary number of bits. The number of bits is
+ specified as a base ten number in {braces}, eg Y{40} for
+ a forty bit (or five byte) word.
+
@ Null-fill or truncate to absolute position, counted from the
start of the innermost ()-group.
. Null-fill or truncate to absolute position specified by
the value.

--
David Cantrell | http://www.cantrell.org.uk/david

Eye have a spelling chequer / It came with my pea sea
It planely marques four my revue / Miss Steaks eye kin knot sea.
Eye strike a quay and type a word / And weight for it to say
Weather eye am wrong oar write / It shows me strait a weigh.
Re: RFC: pack()ing long words
On Mon, Aug 13, 2012 at 5:35 PM, David Cantrell <david@cantrell.org.uk> wrote:
> On 13/08/2012 20:32, demerphq wrote:
>> On 13 August 2012 20:31, Leon Timmermans <fawaka@gmail.com> wrote:
>>> On Mon, Aug 13, 2012 at 8:16 PM, David Cantrell <david@cantrell.org.uk> wrote:
>>>> I haven't looked at the source (and am somewhat
>>>> terrified to do so TBH) but I can see that getting a bit tricky.
>>>
>>> pp_pack.c is where you need to be. It's rather full of "tricky".
>>
>> And that is the diplomatic way to put it. :-)
>
> I shouldn't have looked, but I did. It is dark and full of terrors, and
> I want my mummy.
>
> Thankfully, all I'm asking for right now is that the syntax be defined,
> so that I can go ahead and implement it in my module, and make sure I
> use the same magic letter as pack() will do if pack/unpack ever sprout
> this tentacle in the future. So if people agree that this is a good
> thing to do, all that actually needs patching for now is the
> documentation, something like ...
>
> x A null byte (a.k.a ASCII NUL, "\000", chr(0))
> X Back up a byte.
> +
> + Y NOT YET IMPLEMENTED. This syntax is reserved for a word of
> + an arbitrary number of bits. The number of bits is
> + specified as a base ten number in {braces}, eg Y{40} for
> + a forty bit (or five byte) word.
> +
> @ Null-fill or truncate to absolute position, counted from the
> start of the innermost ()-group.
> . Null-fill or truncate to absolute position specified by
> the value.

pp_pack.c has its terrors, but even I can see that if you unpack an
integer type you get an IV or a UV on the stack (that's what mPUSHi
and mPUSHu do). What is it you want pushed on the stack when you
unpack a 16-byte word? It's not going to be anything that Perl can
represent as a numeric value unless you also implement
arbitrary-precision numerics. Or have I misunderstood what you're
wanting?
Re: RFC: pack()ing long words
On Mon, 13 Aug 2012 19:18:42 -0500, "Craig A. Berry"
<craig.a.berry@gmail.com> wrote:

> pp_pack.c has its terrors, but even I can see that if you unpack an
> integer type you get an IV or a UV on the stack (that's what mPUSHi
> and mPUSHu do). What is it you want pushed on the stack when you
> unpack a 16-byte word? It's not going to be anything that Perl can
> represent as a numeric value unless you also implement
> arbitrary-precision numerics. Or have I misunderstood what you're
> wanting?

Math::BigInt?
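Math::BigInt ships with perl, so one possible answer to Craig's question is to hand back an object. A minimal sketch, assuming a big-endian 16-byte input; `u128_from_be` is a hypothetical name, and a little-endian variant would `reverse` the bytes first.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: turn a 16-byte big-endian word into a
# Math::BigInt via its 32-digit hex representation.
sub u128_from_be {
    my ($bytes) = @_;
    die "need exactly 16 bytes\n" unless length($bytes) == 16;
    return Math::BigInt->new('0x' . unpack('H32', $bytes));
}

my $n = u128_from_be("\x01" . ("\0" x 15));   # 2**120
print "$n\n";
```

The cost is that every unpacked word becomes an object, which is much slower than the IV/UV that mPUSHi and mPUSHu push today.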

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.14 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
Re: RFC: pack()ing long words
On Mon, Aug 13, 2012 at 07:18:42PM -0500, Craig A. Berry wrote:

> pp_pack.c has its terrors, but even I can see that if you unpack an
> integer type you get an IV or a UV on the stack (that's what mPUSHi
> and mPUSHu do). What is it you want pushed on the stack when you
> unpack a 16-byte word?

Dunno. I guess the thing that is the closest match to an int would be a
string of bytes in the right order, so a PV.
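A sketch of the PV option, under the assumption that the module normalises every word to big-endian: equal-length big-endian byte strings then compare with `lt`/`gt` in the same order as the numbers they encode. `unpack_pv_words` is a hypothetical helper, not a proposed API.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into $size-byte words and return
# each one as a PV, reversed if necessary so it is big-endian.
sub unpack_pv_words {
    my ($data, $size, $endian) = @_;    # $endian is '>' or '<'
    die "length must be a multiple of $size\n" if length($data) % $size;
    my @words = unpack("(a$size)*", $data);
    @words = map { scalar reverse $_ } @words if $endian eq '<';
    return @words;
}

# Two little-endian 16-byte words holding 2 and 1.
my @w = unpack_pv_words(("\x02" . ("\0" x 15)) . ("\x01" . ("\0" x 15)), 16, '<');
print join(" ", map { unpack("H*", $_) } @w), "\n";
print "ordered\n" if $w[0] gt $w[1];
```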

--
David Cantrell | top google result for "topless karaoke murders"

I know that you believe you understand what you think you wrote, but
I'm not sure you realize that what you wrote is not what you meant.
Re: RFC: pack()ing long words
----------------------------------------
> Date: Tue, 14 Aug 2012 18:10:02 +0100
> From: david@cantrell.org.uk
> To: perl5-porters@perl.org
> Subject: Re: RFC: pack()ing long words
>
> On Mon, Aug 13, 2012 at 07:18:42PM -0500, Craig A. Berry wrote:
>
> > pp_pack.c has its terrors, but even I can see that if you unpack an
> > integer type you get an IV or a UV on the stack (that's what mPUSHi
> > and mPUSHu do). What is it you want pushed on the stack when you
> > unpack a 16-byte word?
>
> Dunno. I guess the thing that is the closest match to an int would be a
> string of bytes in the right order, so a PV.

A packed string (a PV of binary data, not ASCII digits) is the best method, or, as others would say, the method of last resort, for packing and unpacking ints of any word size. If a suitable big number library is loaded into the script, then take and return big number objects instead. I integrated http://search.cpan.org/~salva/Math-Int64-0.26/lib/Math/Int64.pm into my XS library, so Math::Int64 objects are accepted and returned, or 8-byte PV strings otherwise. For sanity, the size is checked to make sure the scalar is exactly 8 characters long.
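The size check described above could look roughly like this in pure Perl. It does not use Math::Int64; instead, as an assumption for illustration, it decodes with the existing `Q>` letter when `$Config{ivsize}` says this perl has 64-bit integers, and otherwise returns the validated 8-byte PV. `u64_from_pv` is a hypothetical name.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: insist on exactly 8 bytes, then decode as a
# big-endian unsigned 64-bit integer if this perl supports it;
# otherwise return the validated PV unchanged.
sub u64_from_pv {
    my ($pv) = @_;
    die "expected exactly 8 bytes, got " . length($pv) . "\n"
        unless length($pv) == 8;
    return $Config{ivsize} >= 8 ? unpack("Q>", $pv) : $pv;
}

print u64_from_pv("\0\0\0\0\0\0\x01\x00"), "\n";   # 256 on a 64-bit perl
```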
Re: RFC: pack()ing long words
* David Cantrell <david@cantrell.org.uk> [2012-08-13T13:16:09]
> Leon's suggestion of putting the word length in {curlies} works nicely.
>
> > I kinda like your approach though. What about bits? Why restrict to
> > multiple of 8 bits?
>
> Hmmm ... and just have X{40}4 instead of X{5}4 for a five byte
> (== 40 bits) word.

I liked his suggestion as well. I'd probably suggest r, as it's an
rbitrary-length word.

--
rjbs
Re: RFC: pack()ing long words
On Mon, Aug 13, 2012 at 7:41 AM, David Cantrell <david@cantrell.org.uk> wrote:

> Something like this perhaps:
> X5,4>
>
> which means:
> X - whatever letter we choose
> 5 - word length
> ,4 - optional repeat count
> > - optional endian-ness
>

We already have (...)4 for repeating.
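For reference, the existing group syntax repeats everything inside the parentheses, so a separate ",4" count on the new letter may be redundant:

```perl
use strict;
use warnings;

# "(a3)2" pulls out two 3-byte chunks; "(n C)2" repeats a mixed group
# of a big-endian 16-bit word followed by an unsigned char.
my @chunks = unpack("(a3)2", "abcdef");
my @mixed  = unpack("(n C)2", "\x01\x02\x03\x04\x05\x06");
print "@chunks\n";   # abc def
print "@mixed\n";    # 258 3 1029 6
```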
Re: RFC: pack()ing long words
On 08/13/2012 01:51 PM, H.Merijn Brand wrote:
> Counterintuitive in that order
>
> l4 is 4 longs, so if the 4 in your example matches the 4 in l4, I'd
> guess that
>
> X5>4
>
> would be more intuitive
>
> I kinda like your approach though. What about bits? Why restrict to
> multiple of 8 bits?
>

But then, to make it really interesting, pack/unpack would have to
handle all the templates at arbitrary bit offsets.

For instance:

unpack("X{23}CX{9}", $data);
# should extract
# - a bitstring from bits 0 to 22
# - an unsigned char from bits 23 to 30
# - a bitstring from bits 31 to 39

Or at least, it should be possible to do that for X templates:

unpack("X{23}X{8}X{9}", $data);
# should extract
# - a bitstring from bits 0 to 22
# - a bitstring from bits 23 to 30
# - a bitstring from bits 31 to 39
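Until something like this exists, the same split can be done by hand: expand the bytes into a string of '0'/'1' characters with `B*` and cut fields at arbitrary offsets. `bit_fields` is a hypothetical helper, and it assumes most-significant-bit-first order within each byte, which is only one possible convention.

```perl
use strict;
use warnings;

# Hypothetical helper: return consecutive bit fields of the given
# widths, reading each byte most-significant bit first.
sub bit_fields {
    my ($data, @widths) = @_;
    my $bits = unpack("B*", $data);
    my ($pos, @fields) = (0);
    for my $w (@widths) {
        die "not enough bits\n" if $pos + $w > length $bits;
        push @fields, substr($bits, $pos, $w);
        $pos += $w;
    }
    return @fields;
}

# 40 bits split as 23 + 8 + 9, like the "X{23}CX{9}" example.
my ($head, $char, $tail) = bit_fields("\x00\x00\x01\x02\x01", 23, 8, 9);
print "$head\n$char\n$tail\n";
print oct("0b$char"), "\n";   # 129, the "unsigned char" at bits 23..30
```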

Also, when considering bit strings and bit offsets, "endianness" may be
interpreted in several ways: where do you place the byte boundaries? At
the byte boundaries in the string being unpacked? Every 8 bits of the
sub-bitstring? Starting from the left or from the right?
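The within-byte half of that question already shows up in pack(): `B` reads each byte most-significant bit first and `b` least-significant bit first, so the same bytes produce different bit strings.

```perl
use strict;
use warnings;

my $byte = "\x01";
print unpack("B8", $byte), "\n";   # 00000001
print unpack("b8", $byte), "\n";   # 10000000

# Across two bytes, "b" reverses the order inside each byte but keeps
# the bytes themselves in string order.
my $two = "\x12\x34";
print unpack("B*", $two), "\n";    # 0001001000110100
print unpack("b*", $two), "\n";    # 0100100000101100
```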