Mailing List Archive

Problems with hex-conversion functions
Hello everyone.

I see several problems with the two hex-conversion function pairs that
Python offers:
1. binascii.hexlify and binascii.unhexlify
2. bytes.fromhex and bytes.hex

Problem #1:
bytes.hex is not implemented, although it was specified in PEP 358.
This means there is no symmetrical function to accompany bytes.fromhex.

Problem #2:
Both pairs perform the same function, although The Zen Of Python suggests
that "There should be one-- and preferably only one --obvious way to do it."
I do not understand why PEP 358 specified the bytes function pair although
it mentioned the binascii pair...

Problem #3:
bytes.fromhex may receive spaces in the input string, although
binascii.unhexlify may not.
I see no good reason for these two functions to have different features.

Problem #4:
binascii.unhexlify may receive both input types: strings or bytes, whereas
bytes.fromhex raises an exception when given a bytes parameter.
Again there is no reason for these functions to be different.
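
The differences described in problems #3 and #4 can be checked with a short sketch. It assumes a Python 3 interpreter recent enough that binascii.unhexlify accepts ASCII text. The helper try_fromhex is a made-up name for illustration, and since the handling of a bytes argument by bytes.fromhex is not the same in every release, the sketch records the outcome instead of asserting one.

```python
import binascii

spaced = "de ad be ef"

# bytes.fromhex skips the spaces between byte pairs.
from_method = bytes.fromhex(spaced)

# binascii.unhexlify rejects the same text; binascii.Error is what
# Python 3 raises for malformed hex input.
try:
    binascii.unhexlify(spaced)
    unhexlify_accepts_spaces = True
except binascii.Error:
    unhexlify_accepts_spaces = False

# unhexlify takes both text and bytes as input.
from_text = binascii.unhexlify("deadbeef")
from_bytes = binascii.unhexlify(b"deadbeef")


def try_fromhex(arg):
    """Hypothetical helper: return bytes.fromhex(arg) or the exception name."""
    try:
        return bytes.fromhex(arg)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# Depending on the release this is either the name of the exception
# that was raised or the decoded bytes.
fromhex_on_bytes = try_fromhex(b"deadbeef")
```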

Problem #5:
binascii.hexlify returns a bytes type, although ideally converting to hex
should always return string types and converting from hex should always
return bytes.
IMO it makes no sense for hexlify to output bytes, since the output is a
representation of other bytes.
This is also the suggested behavior of bytes.hex in PEP 358.
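
To make the return-type point concrete, here is a small sketch of what hexlify gives back and the extra step needed to get text. The last line assumes Python 3.5 or later, where a bytes.hex method is available; it is shown only for comparison with the PEP 358 behavior.

```python
import binascii

data = b"\xde\xad\xbe\xef"

# hexlify returns a bytes object holding the ASCII hex digits.
hexed = binascii.hexlify(data)

# Getting a str takes an explicit decode step.
as_text = hexed.decode("ascii")

# Assumption: Python 3.5+, where bytes.hex() returns a str directly.
method_text = data.hex()
```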


Problems #4 and #5 call for a decision about the input and output of the
functions being discussed:

Option A : Strict input and output
unhexlify (and bytes.fromhex) may only receive strings and may only return
bytes
hexlify (and bytes.hex) may only receive bytes and may only return strings

Option B : Robust input and strict output
unhexlify (and bytes.fromhex) may receive bytes and strings and may only
return bytes
hexlify (and bytes.hex) may receive bytes or strings and may only return
strings
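
The two options can be sketched as thin wrappers around the binascii functions. All four function names are hypothetical and exist only for this illustration. Treating a str passed to the robust hexlify as ASCII data is also an assumption, since the proposal does not say how text input would be interpreted.

```python
import binascii


def unhexlify_strict(text):
    """Option A: accept only str, return bytes."""
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    return binascii.unhexlify(text.encode("ascii"))


def hexlify_strict(data):
    """Option A: accept only bytes, return str."""
    if not isinstance(data, bytes):
        raise TypeError(f"expected bytes, got {type(data).__name__}")
    return binascii.hexlify(data).decode("ascii")


def unhexlify_robust(value):
    """Option B: accept str or bytes, return bytes."""
    if isinstance(value, str):
        value = value.encode("ascii")
    return binascii.unhexlify(value)


def hexlify_robust(value):
    """Option B: accept bytes or str, return str."""
    if isinstance(value, str):
        # Assumption: text input is encoded as ASCII before conversion.
        value = value.encode("ascii")
    return binascii.hexlify(value).decode("ascii")
```

Under option A, unhexlify_strict(b"dead") raises TypeError, while under option B both unhexlify_robust("dead") and unhexlify_robust(b"dead") return the same two bytes.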

Of course we may also consider a third option, which will allow the return
type of all functions to be robust (perhaps specified in a keyword
argument), but as I wrote in the description of problem #5, I see no sense
in that.

Note that PEP 3137 describes: "... the more strict definitions of encoding
and decoding in Python 3000: encoding always takes a Unicode string and
returns a bytes sequence, and decoding always takes a bytes sequence and
returns a Unicode string." - suggesting option A.

To repeat problems #4 and #5, the current behavior does not match any
option:
* The return type of binascii.hexlify should be string, and this is not the
current behavior.
As for the input:
* Option A is not the current behavior because binascii.unhexlify may
receive both input types.
* Option B is not the current behavior because bytes.fromhex does not allow
bytes as input.


To fix these issues, three changes should be applied:
1. Deprecate bytes.fromhex. This fixes the following problems:
#4 (go with option B and remove the function that does not allow bytes
input)
#2 (the binascii functions will be the only way to "do it")
#1 (bytes.hex should not be implemented)
2. In order to keep the functionality that bytes.fromhex has over unhexlify,
the latter function should be able to handle spaces in its input (fix #3)
3. binascii.hexlify should return string as its return type (fix #5)
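
As an illustration of change 2, a space-tolerant unhexlify could look roughly like the wrapper below. The name unhexlify_spaces is hypothetical, and the sketch simply drops every ASCII space before decoding. That is looser than bytes.fromhex, which as far as I know only skips whitespace between byte pairs, so this is a rough approximation and not a proposal for the exact semantics.

```python
import binascii


def unhexlify_spaces(value):
    """Hypothetical wrapper: binascii.unhexlify that ignores ASCII spaces."""
    if isinstance(value, str):
        value = value.encode("ascii")
    # Drop the spaces, then hand the remaining digits to unhexlify.
    return binascii.unhexlify(bytes(value).replace(b" ", b""))
```

With this wrapper, unhexlify_spaces("de ad be ef") returns the same bytes as bytes.fromhex("de ad be ef").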
Re: Problems with hex-conversion functions
On Sat, Sep 5, 2009 at 14:26, Ender Wiggin<wiggin15@gmail.com> wrote:
> Hello everyone.
>
> I see several problems with the two hex-conversion function pairs that
> Python offers:
> 1. binascii.hexlify and binascii.unhexlify
> 2. bytes.fromhex and bytes.hex
>
> Problem #1:
> bytes.hex is not implemented, although it was specified in PEP 358.

Probably an oversight.

> This means there is no symmetrical function to accompany bytes.fromhex.
>
> Problem #2:
> Both pairs perform the same function, although The Zen Of Python suggests
> that
> "There should be one-- and preferably only one --obvious way to do it."
> I do not understand why PEP 358 specified the bytes function pair although
> it mentioned the binascii pair...
>

It's nicer to have this kind of functionality on the built-ins than in
the standard library. "Practicality beats purity".

> Problem #3:
> bytes.fromhex may receive spaces in the input string, although
> binascii.unhexlify may not.
> I see no good reason for these two functions to have different features.
>

Well, one allows for sloppy input while the other does not. Usually
accepting sloppy input but giving strict output is better.

> Problem #4:
> binascii.unhexlify may receive both input types: strings or bytes, whereas
> bytes.fromhex raises an exception when given a bytes parameter.
> Again there is no reason for these functions to be different.

Well, giving bytes back into bytes seems somewhat silly. That's an
error in mixing your strings and bytes.

>
> Problem #5:
> binascii.hexlify returns a bytes type - although ideally, converting to hex
> should
> always return string types and converting from hex should always return
> bytes.
> IMO there is no meaning of bytes as an output of hexlify, since the output
> is a
> representation of other bytes.
> This is also the suggested behavior of bytes.hex in PEP 358
>
> Problems #4 and #5 call for a decision about the input and output of the
> functions being discussed:
>
> Option A : Strict input and output
> unhexlify (and bytes.fromhex) may only receive strings and may only return
> bytes
> hexlify (and bytes.hex) may only receive bytes and may only return strings
>
> Option B : Robust input and strict output
> unhexlify (and bytes.fromhex) may receive bytes and strings and may only
> return bytes
> hexlify (and bytes.hex) may receive bytes or strings and may only return
> strings
>
> Of course we may also consider a third option, which will allow the return
> type of
> all functions to be robust (perhaps specified in a keyword argument), but as
> I wrote in
> the description of problem #5, I see no sense in that.
>
> Note that PEP 3137 describes: "... the more strict definitions of encoding
> and decoding in
> Python 3000: encoding always takes a Unicode string and returns a bytes
> sequence, and decoding
> always takes a bytes sequence and returns a Unicode string." - suggesting
> option A.
>
> To repeat problems #4 and #5, the current behavior does not match any
> option:
> * The return type of binascii.hexlify should be string, and this is not the
> current behavior.
> As for the input:
> * Option A is not the current behavior because binascii.unhexlify may
> receive both input types.
> * Option B is not the current behavior because bytes.fromhex does not allow
> bytes as input.
>
> To fix these issues, three changes should be applied:
> 1. Deprecate bytes.fromhex. This fixes the following problems:
>    #4 (go with option B and remove the function that does not allow bytes
> input)
>    #2 (the binascii functions will be the only way to "do it")
>    #1 (bytes.hex should not be implemented)
> 2. In order to keep the functionality that bytes.fromhex has over unhexlify,
>    the latter function should be able to handle spaces in its input (fix #3)
> 3. binascii.hexlify should return string as its return type (fix #5)

Or we fix bytes.fromhex(), add bytes.hex() and deprecate binascii.(un)hexlify().

-Brett
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com
Re: Problems with hex-conversion functions
Brett Cannon wrote:
>> To fix these issues, three changes should be applied:
>> 1. Deprecate bytes.fromhex. This fixes the following problems:
>> #4 (go with option B and remove the function that does not allow bytes
>> input)
>> #2 (the binascii functions will be the only way to "do it")
>> #1 (bytes.hex should not be implemented)
>> 2. In order to keep the functionality that bytes.fromhex has over unhexlify,
>> the latter function should be able to handle spaces in its input (fix #3)
>> 3. binascii.hexlify should return string as its return type (fix #5)
>
> Or we fix bytes.fromhex(), add bytes.hex() and deprecate binascii.(un)hexlify().

binascii is the legacy approach here, so if anything was to go, those
functions would be it. I'm not sure getting rid of them is worth the
hassle though (especially in 2.x).

Regarding bytes.hex(), it may be better to modify the builtin hex()
function to accept bytes as an input type.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
Re: Problems with hex-conversion functions
Sorry for the late reply. I would really like to see this fixed.

>> Or we [...] deprecate binascii.(un)hexlify().
...
>> binascii is the legacy approach here, so if anything was to go, those
>> functions would be it
...

I'm not entirely convinced binascii is the legacy approach. What makes this
module "legacy"?
On the contrary, I'm pretty sure modularity is better than sticking all the
functionality in the core.
As was written in this issue:
http://psf.upfronthosting.co.za/roundup/tracker/issue3532
"If you wanted to produce base-85 (say), then you can extend the
functionality of bytes by providing a function that does that, whereas you
can't extend the existing bytes type."
This example shows that "hex" is actually getting a special treatment by
having builtin methods associated with the bytes type. Why don't we add
".base64" methods? Or even ".zlib"? After all, these options were present
in Python 2.x using the "encode" method of string. In my opinion, having
modules to deal with these types of conversions is better, and this is why
I suggested sticking to binascii.
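
The base-85 argument can be illustrated with a short sketch. It assumes a Python version that ships base64.b85encode and base64.b85decode (3.4 or later), which did not exist when this message was written. The attribute name base85 is made up for the example.

```python
import base64

payload = b"hello world"

# A module-level function adds a new encoding without touching bytes.
encoded = base64.b85encode(payload)
roundtrip = base64.b85decode(encoded)

# Attaching a new method to the built-in bytes type is refused.
try:
    bytes.base85 = lambda self: base64.b85encode(self)
    could_extend = True
except TypeError:
    could_extend = False
```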
In any case, seeing as both this discussion and the one linked above were
abandoned, I would like to hear about what needs to be done to actually fix
these issues. If no one else is willing to do it (that would be a little
disappointing), I think I have the skills to learn and fix the code itself,
but I don't have the time and I am unfamiliar with the process of submitting
patches and getting them approved. For example, who gets to decide about the
correct approach?
Is there a better place to discuss this?

Thanks for the responses.

-- Arnon

Re: Problems with hex-conversion functions
On 9/23/2010 5:31 AM, Ender Wiggin wrote:

> I think I have the skills to learn and fix the
> code itself, but I don't have the time
> and I am unfamiliar with the process of submitting patches and getting

Anyone can submit a patch at bugs.python.org. The process of getting one
approved includes responding to questions, suggestions, and criticisms.
Beyond that, the process may be short if the patch is simple and
non-controversial. Other patches may take extensive discussion on
python-dev or other forums. Some are ignored or rejected.

One can also participate by commenting on issues started by others. See
http://wiki.python.org/moin/TrackerDocs/
for more.

> them approved. For example, who gets
> to decide about the correct approach?

This particular issue would probably require more discussion than less.
However, submission of a patch using one approach would tend to push the
discussion to happen.

--
Terry Jan Reedy

Re: Problems with hex-conversion functions
On Thu, Sep 23, 2010 at 7:31 PM, Ender Wiggin <wiggin15@gmail.com> wrote:
> Sorry for the late reply. I would really like to see this fixed.
>
>>> Or we [...] deprecate binascii.(un)hexlify().
> ...
>>> binascii is the legacy approach here, so if anything was to go, those
>>> functions would be it
> ...
>
> I'm not entirely convinced binascii is the legacy approach. What makes this
> module "legacy"?

Because the binascii functions predate the bytes type, and we added
the bytes methods knowing full well that the hexlify/unhexlify
functions already existed.

> On the contrary, I'm pretty sure modularity is better than sticking all the
> functionality in the core.
> As was written in this issue:
> http://psf.upfronthosting.co.za/roundup/tracker/issue3532
> "If you wanted to produce base-85 (say), then you can extend the
> functionality of bytes by providing a
> function that does that, whereas you can't extend the existing bytes type."
> This example shows that "hex" is actually getting a special treatment by
> having builtin methods associated
> with the bytes type. Why don't we add ".base64" methods? Or even ".zlib"?
> After all, these options were present
> in Python 2.x using the "encode" method of string. In my opinion, having
> modules to deal with these types of
> conversions is better, and this is why I suggested sticking to binascii.

This *is* a matter of opinion, but python-dev's collective opinion was
already expressed in the decision to include these methods in the
bytes API.

Base 16 *is* given special treatment by many parts of Python,
precisely because it *is* special: it's the most convenient way to
express binary numbers in a vaguely human readable format.

No other coding even comes close to that level of importance in
computer science.

> If no one else is willing to do it (that would be a
> little disappointing)

Why would it be disappointing? While it's untidy, nothing's actually
broken and there are ways for programmers to do everything they want
to do. I (and many others here) already have a pretty long list of
"things I'd like to improve/fix but haven't got around to yet", so it
isn't uncommon for things to have to wait awhile before someone looks
at them.

As Terry said though, there *are* ways to expedite that process (In
this case, providing a patch that adds a .hex method in accordance
with PEP 358, or, as a more ambitious, extensible alternative,
consider updating the hex builtin to support the PEP 3118 API, which
would allow it to automatically provide a hex dump of any object that
exposes a view of a contiguous sequence of data bytes).
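
A rough sketch of the more ambitious alternative follows. The function hex_of is a hypothetical stand-in for an extended hex builtin, not an existing API. It assumes Python 3.5 or later for bytes.hex and relies on memoryview to reach any object that exports the PEP 3118 buffer interface.

```python
import array


def hex_of(obj):
    """Hypothetical helper: hex() for ints, a hex dump for buffer exporters."""
    if isinstance(obj, int):
        return hex(obj)
    # memoryview() accepts any object exposing the PEP 3118 buffer API;
    # tobytes() copies the underlying data out as a bytes object.
    with memoryview(obj) as view:
        return view.tobytes().hex()


dump_bytes = hex_of(b"\xde\xad")
dump_bytearray = hex_of(bytearray(b"\x00\x01"))
dump_array = hex_of(array.array("B", [1, 2, 255]))
```

Integers still go through the regular builtin, so hex_of(255) gives the usual "0xff" form, while buffer objects produce plain hex digits with no prefix.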

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
Re: Problems with hex-conversion functions
Hello again. I submitted two patches to resolve the issues from my first
post.

Patch 9951 - implement bytes.hex (http://bugs.python.org/issue9951)
Patch 9996 - fix input and output of binascii functions (http://bugs.python.org/issue9996)

Fix #1 - patch 9951 implements bytes.hex
Fix #2 - this is not fixed for now, no deprecation
Fix #3 - this is not fixed for now. I will probably submit another patch if
patch 9996 is accepted (create shared conversion functions to be used by
both binascii and bytes, maybe)
Fix #4 - patch 9996 makes binascii behave correctly in this conversion
Fix #5 - same as #4 (strict input and output)

As you can see, patch 9996 was rejected and I was referred to this mailing
list to continue the discussion.
I would like to hear your thoughts about the backward compatibility issue in
patch 9996, and about getting patch 9951 committed. Thanks.

Re: Problems with hex-conversion functions
On Sat, Oct 2, 2010 at 5:17 AM, Arnon Yaari <wiggin15@gmail.com> wrote:
> Hello again. I submitted two patches to resolve the issues from my first
> post.
>
> Patch 9951 - implement bytes.hex (http://bugs.python.org/issue9951)
> Patch 9996 - fix input and output of binascii functions
> (http://bugs.python.org/issue9996)
>
> Fix #1 - patch 9951 implements bytes.hex
> Fix #2 - this is not fixed for now, no deprecation
> Fix #3 - this is not fixed for now. I will probably submit another patch if
> patch 9996 is accepted (create shared conversion functions to be used by
> both binascii and bytes, maybe)
> Fix #4 - patch 9996 makes binascii behave correctly in this conversion
> Fix #5 - same as #4 (strict input and output)
>
> As you can see, patch 9996 was rejected and I was referred to this mailing
> list to continue the discussion.

I actually agree with that rejection. You appear to be thinking of hex
coding solely as a data display format, when it is also used fairly
often as a data interchange format (usually embedded inside a larger
formatting scheme rather than standalone). For data interchange, you
want the hex values as ASCII-encoded bytes; for display to the user,
you want them as a string.

The conversion of the binascii API to Py3k took a data interchange
view of the world, whereas bytes.fromhex is more user I/O oriented.
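
The two views can be put side by side in a small sketch. The "DATA:" framing is invented purely for illustration and does not correspond to any real protocol.

```python
import binascii

payload = b"\x01\x02\xff"

# Interchange: the hex digits stay as ASCII bytes and can be spliced
# straight into a larger bytes message (the framing here is made up).
frame = b"DATA:" + binascii.hexlify(payload) + b"\r\n"

# Display: text for a user-facing message needs a str.
message = "payload = " + binascii.hexlify(payload).decode("ascii")

# User input: bytes.fromhex takes text and tolerates spaces.
parsed = bytes.fromhex("01 02 ff")
```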

> I would like to hear your thoughts about the backward compatibility issue in
> patch 9996, and getting patch 9951 committed. Thanks.

The 9951 patch looks pretty good on a quick read through. I put some
specific feedback on the tracker.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia