Mailing List Archive

Cython for cPickle?
Hi,

I noticed that there is a PEP (3154) and a GSoC proposal about improving
Pickle. Given the recent discussion on this list about using Cython for the
import module, I wonder if it wouldn't make even more sense to switch from
a C (accelerator) implementation to Cython for _pickle.

The rationale is that C code that deals a lot with object operations tends
to be rather verbose, and _pickle specifically looks very verbose in many
places. Some of this is optimised I/O, ok, but most of it seems to take its
complexity from code specialisations for builtin types and a lot of error
handling code. A Cython reimplementation would take a lot of weight out of
this.

Note that the approach won't be as simple as compiling pickle.py. _pickle
uses a lot of optimisations that only work at the C level, at least
efficiently. So the idea would be to rewrite _pickle in Cython instead.
It's currently about 6500 lines of C. Even if we divide that only by a
rather conservative factor of 3, we'd end up with some 2000 lines of Cython
code, all extracted straight from the existing C code. That sounds like
less than two weeks of work, maybe even if we add the marshal module to it.
In less than a month of GSoC time, this could easily reach a point where
it's "close to the speed of what we have" and "fast enough", but a lot more
accessible and maintainable, thus also making it easier to add the
extensions described in the PEP.

What do you think?

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com
Re: Cython for cPickle? [ In reply to ]
> What do you think?

I think I know what Jim Fulton thinks (as we talked about something
like this at PyCon): don't. He is already sad that cPickle grew so many
pickle features when it was designed as a really fast implementation.
pickle speed is really important to some users, and any loss of
performance needs serious justification. Easier maintenance is not
a sufficient reason.

Regards,
Martin
Re: Cython for cPickle? [ In reply to ]
On Thu, Apr 19, 2012 at 6:55 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
> What do you think?

I think the possible use of Cython for standard library extension
modules is potentially worth looking into for the 3.4 timeframe (c.f.
the recent multiple checkins sorting out the refcounts for the new
ImportError helper function). There are obviously a lot of factors to
consider before actually proceeding with such an approach (even for
the extension modules), but a side-by-side comparison of pickle.py,
the existing C accelerated pickle module and a Cython accelerated
pickle module (including benchmark numbers) would be a valuable data
point in any such discussion.

However, it would definitely have to be pitched to any interested
students as a proof-of-concept exercise, with a real possibility that
the outcome will end up supporting MvL's reply.
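
A rough sketch of what the first two columns of such a side-by-side comparison could look like, assuming CPython 3, where the pure-Python pickler is reachable under the private name `pickle._Pickler`. The helper names `dump_with` and `compare` and the sample data are invented for illustration; a Cython build would be a third entry in the table. No timing results are claimed here.

```python
import io
import pickle
import timeit


def dump_with(pickler_cls, obj, protocol=2):
    """Pickle obj into a bytes buffer using the given Pickler class."""
    buf = io.BytesIO()
    pickler_cls(buf, protocol).dump(obj)
    return buf.getvalue()


def compare(obj, repeat=20):
    """Return the average seconds per dump for each implementation."""
    implementations = (
        ("pickle.py", pickle._Pickler),  # private name, pure Python
        ("_pickle", pickle.Pickler),     # C accelerator when available
    )
    results = {}
    for name, cls in implementations:
        seconds = timeit.timeit(lambda: dump_with(cls, obj), number=repeat)
        results[name] = seconds / repeat
    return results


sample = {"numbers": list(range(1000)), "text": ["spam"] * 100}
timings = compare(sample)
for name, seconds in timings.items():
    print(f"{name:10s} {seconds * 1e6:10.1f} us per dump")
```

A real comparison would need representative workloads (many small objects, large buffers, deep nesting) and would have to cover loading as well as dumping.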

Regards,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
Re: Cython for cPickle? [ In reply to ]
On Thu, 19 Apr 2012 10:55:24 +0200
Stefan Behnel <stefan_ml@behnel.de> wrote:
>
> I noticed that there is a PEP (3154) and a GSoC proposal about improving
> Pickle. Given the recent discussion on this list about using Cython for the
> import module, I wonder if it wouldn't make even more sense to switch from
> a C (accelerator) implementation to Cython for _pickle.

I think that's quite orthogonal to PEP 3154 (which shouldn't add a lot
of new code IMHO).

> Note that the approach won't be as simple as compiling pickle.py. _pickle
> uses a lot of optimisations that only work at the C level, at least
> efficiently. So the idea would be to rewrite _pickle in Cython instead.
> It's currently about 6500 lines of C. Even if we divide that only by a
> rather conservative factor of 3, we'd end up with some 2000 lines of Cython
> code, all extracted straight from the existing C code. That sounds like
> less than two weeks of work, maybe even if we add the marshal module to it.

I think this all needs someone to demonstrate the benefits, in
terms of both readability/maintainability, and performance.

Also, while C is a low-level language, Cython is a different language
than Python when you start using its optimization features. This means
core developers have to learn that language.

Regards

Antoine.


Re: Cython for cPickle? [ In reply to ]
On Thu, Apr 19, 2012 at 05:38, Nick Coghlan <ncoghlan@gmail.com> wrote:
> On Thu, Apr 19, 2012 at 6:55 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
>> What do you think?
>
> I think the possible use of Cython for standard library extension
> modules is potentially worth looking into for the 3.4 timeframe (c.f.
> the recent multiple checkins sorting out the refcounts for the new
> ImportError helper function).

I'd rather just "rtfm" as was suggested and get it right than switch
everything around to Cython.
Re: Cython for cPickle? [ In reply to ]
On Thu, 19 Apr 2012 14:44:06 +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
> Also, while C is a low-level language, Cython is a different language
> than Python when you start using its optimization features. This means
> core developers have to learn that language.

Hmm. On the other hand, perhaps some core developers (present or
future) would prefer to learn Cython over learning C [*].

--David

[*] For this you may actually want to read "learning to modify the Python
C codebase", since in fact I know how to program in C, I just prefer to
do as little of it as possible, and so haven't really learned the Python
C codebase.
Re: Cython for cPickle? [ In reply to ]
Personally I find the unholy product of C and Python that is Cython to be
more complex than the sum of the complexities of its parts. Is it really
wise to be learning Cython without already knowing C, Python, and the
CPython object model?

While code generation alleviates the burden of tedious languages, it's also
infinitely more complex, makes debugging very difficult and adds to
prerequisite knowledge, among other drawbacks.
Re: Cython for cPickle? [ In reply to ]
Matt Joiner, 19.04.2012 16:13:
> Personally I find the unholy product of C and Python that is Cython to be
> more complex than the sum of the complexities of its parts. Is it really
> wise to be learning Cython without already knowing C, Python, and the
> CPython object model?

The main obstacle that I regularly see for users of the C-API is actually
reference counting and an understanding of what borrowed references and
owned references imply in a given code context. In fact, I can't remember
seeing any C extension code getting posted on Python mailing lists (core
developers excluded) that has no ref-counting bugs or at least a severe
lack of error handling. Usually, such code is also accompanied by a comment
that the author is not sure if everything is correct and asks for advice,
and that's rather independent of the functional complexity of the code
snippet. OTOH, I've also seen a couple of really dangerous code snippets
already that posters apparently meant to show off with, so not everyone is
aware of these obstacles.
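
To make the reference-counting point concrete, here is a small sketch that watches an object's reference count from the Python side. It assumes CPython, since `sys.getrefcount` is implementation specific. In C-API code, each of the steps below would need a matching `Py_INCREF` or `Py_DECREF` written by hand, which is where the bugs described above tend to come from.

```python
import sys

obj = object()
baseline = sys.getrefcount(obj)

holder = [obj]                        # the list takes its own (owned) reference
after_store = sys.getrefcount(obj)

del holder                            # the list is freed and gives it back
after_release = sys.getrefcount(obj)

print("change after storing in a list:", after_store - baseline)
print("change after deleting the list:", after_release - baseline)
```

Only the differences are meaningful; the absolute numbers include temporary references held by the interpreter during the call.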

Also, the C code by inexperienced programmers tends to be fairly
inefficient because they simply do not know what impact some convenience
functions have. So they tend to optimise prematurely in places where they
feel more comfortable, but that can never make up for the overhead that
simple and very convenient-looking C-API functions introduce in other
places. Value packing comes to mind.

So, from my experience, there is a serious learning curve beyond knowing C,
right from the start when trying to work on C extensions, including
CPython's own code, because the C-API is far from trivial.

And that's the kind of learning curve that Cython tries to lower. It makes
it substantially easier to write correct code, simply by letting you write
Python code instead of C plus C-API code. And once it works, you can start
making it explicitly faster by applying "I know what I'm doing" schemes to
proven hot spots or by partially rewriting it. And if you do not know yet
what you're doing, then *that's* where the learning curve begins. But by
then, your code is basically written, works more or less and can be
benchmarked.


> While code generation alleviates the burden of tedious languages, it's also
> infinitely more complex, makes debugging very difficult and adds to
> prerequisite knowledge, among other drawbacks.

You can use gdb for source level debugging of Cython code and cProfile to
profile it. Try that with C-API code.

Stefan

Re: Cython for cPickle? [ In reply to ]
On Thu, Apr 19, 2012 at 16:08, Stefan Behnel
>> While code generation alleviates the burden of tedious languages, it's also
>> infinitely more complex, makes debugging very difficult and adds to
>> prerequisite knowledge, among other drawbacks.
>
> You can use gdb for source level debugging of Cython code and cProfile to
> profile it. Try that with C-API code.

I know I'm in the minority of committers being on Windows, but we do
receive a good amount of reports and contributions from Windows users
who dive into the C code. The outside contributors actually gave the
strongest indication that we needed to move to VS2010.

Visual Studio by itself makes debugging unbelievably easy, and with
the Python Tools for VS plugin it even allows Visual Studio's built-in
profiler to work. I know Windows is not on most people's maps, but if
we have to scrap the debugger, that's another learning curve
attachment to evaluate.
Re: Cython for cPickle? [ In reply to ]
Brian Curtin, 19.04.2012 23:19:
> On Thu, Apr 19, 2012 at 16:08, Stefan Behnel
>>> While code generation alleviates the burden of tedious languages, it's also
>>> infinitely more complex, makes debugging very difficult and adds to
>>> prerequisite knowledge, among other drawbacks.
>>
>> You can use gdb for source level debugging of Cython code and cProfile to
>> profile it. Try that with C-API code.
>
> I know I'm in the minority of committers being on Windows, but we do
> receive a good amount of reports and contributions from Windows users
> who dive into the C code.

Doesn't match my experience at all - different software target audiences, I
guess.


> Visual Studio by itself makes debugging unbelievably easy, and with
> the Python Tools for VS plugin it even allows Visual Studio's built-in
> profiler to work. I know Windows is not on most people's maps, but if
> we have to scrap the debugger, that's another learning curve
> attachment to evaluate.

What I meant was that there's pdb for debugging Python code (which doesn't
know about the C code it executes) and gdb (or VS) for debugging C code,
from which you can barely infer the Python code it executes. For Cython
code, you can use gdb for both Cython and C, and within limits also for
Python code. Here's a quick intro to see what I mean:

http://docs.cython.org/src/userguide/debugging.html

For profiling, you can use cProfile for Python code (which doesn't tell you
about the C code it executes) and oprofile, callgrind, etc. (incl. VS) for
C code, from which it's non-trivial to infer the relation to the Python
code. With Cython, you can use cProfile for both Cython and Python code as
long as you stay at the source code level, and only need to descend to a
low-level profiler when you care about the exact details, usually assembly
jumps and branches.
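
As an illustration of the cProfile limitation described above, the following sketch profiles a pickle round trip. The function name `roundtrip` and the sample data are made up for the example. On CPython the C-implemented calls are expected to show up as single opaque "built-in method" entries, with no view into what happens inside them.

```python
import cProfile
import io
import pickle
import pstats


def roundtrip(data):
    """Python-level function that spends its time inside C code."""
    return pickle.loads(pickle.dumps(data))


def profile_roundtrip(data):
    """Profile one round trip and return the textual report."""
    profiler = cProfile.Profile()
    profiler.enable()
    roundtrip(data)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats()
    return out.getvalue()


report = profile_roundtrip(list(range(10000)))
print(report)
```

The report lists `roundtrip` with its Python source location, while the dumps and loads calls appear without any internal breakdown.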

Anyway, I guess this is getting off-topic for this list.

Stefan

Re: Cython for cPickle? [ In reply to ]
On Thu, Apr 19, 2012 at 17:21, Stefan Behnel <stefan_ml@behnel.de> wrote:
> Brian Curtin, 19.04.2012 23:19:
>> On Thu, Apr 19, 2012 at 16:08, Stefan Behnel
>>>> While code generation alleviates the burden of tedious languages, it's also
>>>> infinitely more complex, makes debugging very difficult and adds to
>>>> prerequisite knowledge, among other drawbacks.
>>>
>>> You can use gdb for source level debugging of Cython code and cProfile to
>>> profile it. Try that with C-API code.
>>
>> I know I'm in the minority of committers being on Windows, but we do
>> receive a good amount of reports and contributions from Windows users
>> who dive into the C code.
>
> Doesn't match my experience at all - different software target audiences, I
> guess.

I don't know what this means. I work on CPython, which is the target
audience at hand, and I come across reports and contributions from
Windows users for C extensions.

>> Visual Studio by itself makes debugging unbelievably easy, and with
>> the Python Tools for VS plugin it even allows Visual Studio's built-in
>> profiler to work. I know Windows is not on most people's maps, but if
>> we have to scrap the debugger, that's another learning curve
>> attachment to evaluate.
>
> What I meant was that there's pdb for debugging Python code (which doesn't
> know about the C code it executes) and gdb (or VS) for debugging C code,
> from which you can barely infer the Python code it executes. For Cython
> code, you can use gdb for both Cython and C, and within limits also for
> Python code. Here's a quick intro to see what I mean:
>
> http://docs.cython.org/src/userguide/debugging.html

I know what you meant. What I meant is "easy debugging on Windows goes
away, now I have to set up and learn GDB on Windows". *I* can do that.
Does the rest of the community want to have to do that as well? We
should also take into consideration how something like this affects
the third-party IDEs and their debugger support.
Re: Cython for cPickle? [ In reply to ]
On Thu, Apr 19, 2012 at 4:55 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
>
> That sounds like less than two weeks of work, maybe even if we add the
> marshal module to it.
> In less than a month of GSoC time, this could easily reach a point where
> it's "close to the speed of what we have" and "fast enough", but a lot more
> accessible and maintainable, thus also making it easier to add the
> extensions described in the PEP.
>
> What do you think?


As others have pointed out, many users of pickle depend on its performance.
The main reason why _pickle.c is so big is all the low-level optimizations
we have in there. We have custom stack and dictionary implementations just
for the sake of speed. We also have fast paths for I/O operations and
function calls. These optimizations alone are taking easily 2000 lines of
code and they are not micro-optimizations. Each of these was shown to give
speedups from one to several orders of magnitude.
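
For context on what such a dictionary is for, here is a small sketch of the pickler's memo in action. It assumes that the custom dictionary mentioned above is the memo table that tracks objects already written, which the message does not state explicitly. Protocol 2 is chosen only because its opcode names are easy to read.

```python
import pickle
import pickletools

shared = [1, 2]
payload = [shared, shared]            # the same list object appears twice

data = pickle.dumps(payload, protocol=2)
opcodes = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
restored = pickle.loads(data)

# A BINGET opcode means "fetch an already pickled object from the memo".
print("memo lookups in the stream:", opcodes.count("BINGET"))
print("identity preserved after loading:", restored[0] is restored[1])
```

Every object that can be shared goes through this lookup, so the speed of the memo affects almost every dump.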

So I disagree that we could easily reach the point where it's "close to the
speed of what we have." And if we were to attempt this, it would be a
multi-month undertaking. I would rather see that time spent on
improving pickle than on yet another reimplementation.

-- Alexandre
Re: Cython for cPickle? [ In reply to ]
> So I disagree that we could easily reach the point where it's "close to the
> speed of what we have." And if we were to attempt this, it would be a
> multi-month undertaking. I would rather see that time spent on
> improving pickle than on yet another reimplementation.

Of course, this being free software, anybody can spend time on whatever they
please, and this should not make anybody feel sad. You just don't get merits
if you work on stuff that nobody cares about.

Regards,
Martin


Re: Cython for cPickle? [ In reply to ]
Alexandre Vassalotti wrote:
>
> We have custom stack and
> dictionary implementations just for the sake of speed. We also have fast
> paths for I/O operations and function calls.

All of that could very likely be carried over almost
unchanged into a Cython version. I don't see why it
should take multiple months. It's not a matter of
rewriting it from scratch, just translating it from
one dialect (C) to another (the C subset of Cython).

--
Greg
Re: Cython for cPickle? [ In reply to ]
On Sun, Apr 22, 2012 at 6:12 PM, <martin@v.loewis.de> wrote:

>> So I disagree that we could easily reach the point where it's "close to the
>> speed of what we have." And if we were to attempt this, it would be a
>> multi-month undertaking. I would rather see that time spent on
>> improving pickle than on yet another reimplementation.
>
> Of course, this being free software, anybody can spend time on whatever they
> please, and this should not make anybody feel sad. You just don't get merits
> if you work on stuff that nobody cares about.


Yes, of course. I don't want to discourage anyone from investigating this
option; in fact, I would very much like to see myself proven wrong. But, if
I understood Stefan correctly, he is proposing to have a GSoC student do
the work, which I would feel uneasy about since we have no idea how
valuable this would be as a contribution.

-- Alexandre
Re: Cython for cPickle? [ In reply to ]
On Mon, Apr 23, 2012 at 9:27 AM, Alexandre Vassalotti
<alexandre@peadrop.com> wrote:
> On Sun, Apr 22, 2012 at 6:12 PM, <martin@v.loewis.de> wrote:
>> Of course, this being free software, anybody can spend time on whatever they
>> please, and this should not make anybody feel sad. You just don't get merits
>> if you work on stuff that nobody cares about.
>
>
> Yes, of course. I don't want to discourage anyone from investigating this
> option; in fact, I would very much like to see myself proven wrong. But, if I
> understood Stefan correctly, he is proposing to have a GSoC student do
> the work, which I would feel uneasy about since we have no idea how
> valuable this would be as a contribution.

So long as it's made clear to the students applying that it's a proof
of concept that may return a negative result (i.e. "it was tried, it
proved to be a bad idea") I don't see a problem with it. The freedom
to try out multiple ideas in parallel is one of the great strengths of
open source.

We've had GSoC students try unsuccessful experiments in the past and
have gained useful information as a result (e.g. the main reason I
know the Import Engine API proposed in the deferred PEP 406 isn't
adequate as currently written is because of the design level problems
Greg found when implementing it last summer. The currently documented
design simply doesn't achieve the full objectives of the PEP).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
Re: Cython for cPickle? [ In reply to ]
On Sun, Apr 22, 2012 at 6:34 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
> On Mon, Apr 23, 2012 at 9:27 AM, Alexandre Vassalotti
> <alexandre@peadrop.com> wrote:
>> On Sun, Apr 22, 2012 at 6:12 PM, <martin@v.loewis.de> wrote:
>>> Of course, this being free software, anybody can spend time on whatever they
>>> please, and this should not make anybody feel sad. You just don't get merits
>>> if you work on stuff that nobody cares about.
>>
>>
>> Yes, of course. I don't want to discourage anyone from investigating this
>> option; in fact, I would very much like to see myself proven wrong. But, if I
>> understood Stefan correctly, he is proposing to have a GSoC student do
>> the work, which I would feel uneasy about since we have no idea how
>> valuable this would be as a contribution.
>
> So long as it's made clear to the students applying that it's a proof
> of concept that may return a negative result (i.e. "it was tried, it
> proved to be a bad idea") I don't see a problem with it. The freedom
> to try out multiple ideas in parallel is one of the great strengths of
> open source.
>
> We've had GSoC students try unsuccessful experiments in the past and
> have gained useful information as a result (e.g. the main reason I
> know the Import Engine API proposed in the deferred PEP 406 isn't
> adequate as currently written is because of the design level problems
> Greg found when implementing it last summer. The currently documented
> design simply doesn't achieve the full objectives of the PEP)

However, I think that in this case the success may be predetermined,
or at least not determined by technical success alone. I have a lot of
respect for Cython, but I don't think it is right to have any part of
core Python depend on it. Cython is an incredibly complex and
relatively young (and still fast evolving) piece of technology, while
I think that core dependencies should be minimized and limited to
absolutely fundamental building blocks.

--
--Guido van Rossum (python.org/~guido)