template rendering as iteration, instead of layered concatenation
Brian Harring
2007-06-14 19:23:00 UTC
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".

The pros of it are following (copy/pasting from the ticket):

* instant results; as long as you don't have any tags/nodes that
require collapsing their subtree down into a single string, data is
sent out immediately. Rather nice if you have a slow page- you start
getting chunks of the page immediately, rather than having to wait for
the full 10s/whatever time for the page to render.

* Far less abusive on memory; on the usual 'spanish inquisition' heavy
test (term coined by mtredinnick, but it works), a reduction from
84MB to 71MB in usage at the end of rendering. What I find rather
interesting about that reduction is that the resultant page is 6.5MB;
I'm actually not sure where the extra ~7MB of the reduction comes from
(I suspect there is some copy idiocy somewhere forcing a new string)-
alternatively, it may just be intermediate data hanging around. I've
been working with this code in one form or another for 3 months now,
and still haven't figured out the 'why' for that diff.

* Surprisingly, at least for the test cases I have locally, I'm not
seeing a noticeable hit on small pages- nothing above background noise,
at the very least. For larger pages, I'm seeing a mild speed up (4-5%,
which was expected).

* conversion over is simple- the only potential API breakage is for
Nodes that don't actually do anything, i.e. have no render method.

The con of it is that the Node base class grows a potential gotcha:

class Node(object):
    def render(self, context):
        return ''.join(self.iter_render(context))

    def iter_render(self, context):
        return (self.render(context),)

As hinted at above, if a derivative (for whatever reason) doesn't
override iter_render or render, it goes recursive and eventually
results in a RuntimeError (exceeds max stack depth).
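
For illustration (hypothetical subclasses, not from the patch), a
derivative overriding either method is safe; one overriding neither
recurses:

class TextNode(Node):
    def __init__(self, s):
        self.s = s

    def render(self, context):
        # iter_render() falls back to this, so both entry points work.
        return self.s

class BrokenNode(Node):
    # Overrides neither method: render() and iter_render() bounce off
    # each other until "RuntimeError: maximum recursion depth exceeded".
    pass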

Have been playing with this patch in one form or another for the last few
months, and have been pretty happy with the results- what I'm
specifically wondering is if folks are willing to accept the potential
Node gotcha for the gains listed above.

Alternatively, if folks have an alternate solution, would love to hear
it.

General thoughts, comments, etc, desired- thanks.
~harring
Malcolm Tredinnick
2007-06-14 23:14:29 UTC
Post by Brian Harring
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".
+1 from me. Except for ...
Post by Brian Harring
The con of it is that the Node base class grows a potential gotcha:

    def render(self, context):
        return ''.join(self.iter_render(context))

    def iter_render(self, context):
        return (self.render(context),)
As hinted at above, if a derivative (for whatever reason) doesn't
override iter_render or render, it goes recursive and eventually
results in a RuntimeError (exceeds max stack depth).
Whoops. :-(

To completely bulletproof this, you could add a metaclass to Node to
check that the Node subclass that is being created has a method called
either "render" or "iter_render" (or both). That's only run once each
import, so won't give back any of the savings in real terms.
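
A minimal sketch of that check (the metaclass name and spelling here
are illustrative, not from the patch):

class NodeMeta(type):
    # Runs once per class definition, at import time; rejects Node
    # subclasses that override neither render() nor iter_render().
    def __init__(cls, name, bases, attrs):
        super(NodeMeta, cls).__init__(name, bases, attrs)
        if 'render' in attrs or 'iter_render' in attrs:
            return  # this class supplies its own implementation
        for klass in cls.__mro__[1:]:
            if klass is Node:
                # Reached the base Node without finding an override.
                raise TypeError(
                    "%s must override render() or iter_render()" % name)
            if 'render' in vars(klass) or 'iter_render' in vars(klass):
                return  # an intermediate ancestor overrides one

class Node(object):
    __metaclass__ = NodeMeta

    def render(self, context):
        return ''.join(self.iter_render(context))

    def iter_render(self, context):
        return (self.render(context),)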

Regards,
Malcolm


Jacob Kaplan-Moss
2007-06-14 23:16:22 UTC
Post by Malcolm Tredinnick
Whoops. :-(
Yeah, I got stuck on that, too :)
Post by Malcolm Tredinnick
To completely bulletproof this, you could add a metaclass to Node to
check that the Node subclass that is being created has a method called
either "render" or "iter_render" (or both). That's only run once each
import, so won't give back any of the savings in real terms.
Oh, that would work well.

Any reason not to spell ``iter_render`` as ``__iter__``?

Jacob

Brian Harring
2007-06-15 10:08:45 UTC
Post by Jacob Kaplan-Moss
Post by Malcolm Tredinnick
Whoops. :-(
Yeah, I got stuck on that, too :)
Post by Malcolm Tredinnick
To completely bulletproof this, you could add a metaclass to Node to
check that the Node subclass that is being created has a method called
either "render" or "iter_render" (or both). That's only run once each
import, so won't give back any of the savings in real terms.
Oh, that would work well.
Any reason not to spell ``iter_render`` as ``__iter__``?
__iter__ is used by iter() and takes no args; iter_render (and render),
however, take a single arg: context.

Plus __iter__ is already used for visiting each node of the tree ;)

~harring
Jacob Kaplan-Moss
2007-06-15 11:24:24 UTC
Post by Brian Harring
Post by Jacob Kaplan-Moss
Any reason not to spell ``iter_render`` as ``__iter__``?
__iter__ is used by iter() and takes no args; iter_render (and render),
however, take a single arg: context.
Yeah, that's a good reason :)

Note to self: think about suggestions *BEFORE* posting 'em.

Jacob

Malcolm Tredinnick
2007-06-17 07:12:19 UTC
Post by Brian Harring
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".
* instant results; as long as you don't have any tags/nodes that
require collapsing their subtree down into a single string, data is
sent out immediately. Rather nice if you have a slow page- you start
getting chunks of the page immediately, rather than having to wait for
the full 10s/whatever time for the page to render.
* Far less abusive on memory; on the usual 'spanish inquisition' heavy
test (term coined by mtredinnick, but it works), a reduction from
84MB to 71MB in usage at the end of rendering. What I find rather
interesting about that reduction is that the resultant page is 6.5MB;
I'm actually not sure where the extra ~7MB of the reduction comes from
(I suspect there is some copy idiocy somewhere forcing a new string)-
alternatively, it may just be intermediate data hanging around. I've
been working with this code in one form or another for 3 months now,
and still haven't figured out the 'why' for that diff.
There was one subtle bug whose fix negates these improvements in some
cases: if any middleware needs access to the contents of the response,
we need to store a copy of the stringified response content. Since the
iterator version isn't rewindable, you get to access the iterators once
only. That's handled automatically, though, so it's only a penalty in
the necessary cases and never a development burden.
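
A tiny sketch of why the copy is needed (illustrative):

chunks = iter(['<html>', 'body', '</html>'])
first = ''.join(chunks)   # e.g. middleware reading the content
second = ''.join(chunks)  # '' -- the iterator is already exhausted,
                          # hence the stored stringified copy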

Other than that (and a few minor formatting things), the patch was fine.
All committed now -- along with protection against infinite looping in
render().

Thanks,
Malcolm


Malcolm Tredinnick
2007-06-21 10:33:05 UTC
Brian,
Post by Brian Harring
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".
[...]
Post by Brian Harring
* Far less abusive on memory; on the usual 'spanish inquisition' heavy
test (term coined by mtredinnick, but it works), a reduction from
84MB to 71MB in usage at the end of rendering. What I find rather
interesting about that reduction is that the resultant page is 6.5MB;
I'm actually not sure where the extra ~7MB of the reduction comes from
(I suspect there is some copy idiocy somewhere forcing a new string)-
alternatively, it may just be intermediate data hanging around. I've
been working with this code in one form or another for 3 months now,
and still haven't figured out the 'why' for that diff.
When you were doing this memory checking, what server configuration were
you using?

Regards,
Malcolm


Brian Harring
2007-06-21 17:16:48 UTC
Post by Malcolm Tredinnick
Brian,
Post by Brian Harring
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".
[...]
Post by Brian Harring
* Far less abusive on memory; on the usual 'spanish inquisition' heavy
test (term coined by mtredinnick, but it works), a reduction from
84MB to 71MB in usage at the end of rendering. What I find rather
interesting about that reduction is that the resultant page is 6.5MB;
I'm actually not sure where the extra ~7MB of the reduction comes from
(I suspect there is some copy idiocy somewhere forcing a new string)-
alternatively, it may just be intermediate data hanging around. I've
been working with this code in one form or another for 3 months now,
and still haven't figured out the 'why' for that diff.
When you were doing this memory checking, what server configuration were
you using?
lighttpd/fcgi; the gain there is from what directly comes out of the
templating subsystem (more specifically, HttpResponse gets an iterable
instead of a string). The gain there *should* be constant regardless of
setup- the exception to this would be if your configuration touches
.content, or prefers to collect the full response and send that
instead of buffering- if your setup kills the iteration further up
(and it's not middleware related), I'd be interested.

Nudge on ticket #4625 btw ;)

~harring
Malcolm Tredinnick
2007-06-21 23:47:57 UTC
Post by Brian Harring
Post by Malcolm Tredinnick
Brian,
Post by Brian Harring
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".
[...]
Post by Brian Harring
* Far less abusive on memory; on the usual 'spanish inquisition' heavy
test (term coined by mtredinnick, but it works), a reduction from
84MB to 71MB in usage at the end of rendering. What I find rather
interesting about that reduction is that the resultant page is 6.5MB;
I'm actually not sure where the extra ~7MB of the reduction comes from
(I suspect there is some copy idiocy somewhere forcing a new string)-
alternatively, it may just be intermediate data hanging around. I've
been working with this code in one form or another for 3 months now,
and still haven't figured out the 'why' for that diff.
When you were doing this memory checking, what server configuration were
you using?
lighttpd/fcgi; the gain there is from what directly comes out of the
templating subsystem (more specifically, HttpResponse gets an iterable
instead of a string). The gain there *should* be constant regardless of
setup- the exception to this would be if your configuration touches
.content, or prefers to collect the full response and send that
instead of buffering- if your setup kills the iteration further up
(and it's not middleware related), I'd be interested.
Nudge on ticket #4625 btw ;)
You can safely assume I'm aware of it. #4625 is way down the list,
though; there are bigger problems. I'm currently trying to work through
all the unexpected side-effects of the first change (in between being
really busy with Real Life). If we don't come up with a solution to the
hanging database connections today -- see relevant thread on the
django-users list -- I'm going to back out the whole iterator change for
a while until we can fix things properly.

The reason for the original question was part of that. We might have to
give back all the memory savings, since we need to know when use of the
database is definitely finished. One solution that remains
WSGI-compliant is to have the __call__ method return a single item
iterable of all the content which is basically a return to the old
behaviour (slightly fewer string copies, but not many). That is how the
mod_python handler works already, in effect, so that's easier to fix.
Alternatively, we have to introduce much tighter coupling between
templates, backend and HTTP layer to manage the database connections
correctly, which would be a nasty thing to have to do.
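
As a sketch of that single-item option (illustrative name; assumes the
same dispatcher/signals machinery the handlers already use):

def collapse_to_single_item(response):
    # Exhaust the template iterators here, so the database work is
    # provably finished before the WSGI server sees the response;
    # this gives back the memory savings, but stays WSGI-compliant.
    content = ''.join(response)
    dispatcher.send(signal=signals.request_finished)
    return [content]  # single-item iterable, per the old behaviour

The handler's __call__ would then end with
return collapse_to_single_item(response) after calling start_response.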

Making the change at the WSGI handler level is more efficient in terms
of network traffic, since WSGI servers are not permitted to buffer
iterated writes. Lots of short writes can be occurring with a compliant
server at the moment. But if we do that, it's not clear it's enough of a
win over the original code to warrant it.

Malcolm



Brian Harring
2007-06-22 01:29:45 UTC
Post by Malcolm Tredinnick
Post by Brian Harring
Post by Malcolm Tredinnick
Brian,
Post by Brian Harring
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".
[...]
Post by Brian Harring
* Far less abusive on memory; on the usual 'spanish inquisition' heavy
test (term coined by mtredinnick, but it works), a reduction from
84MB to 71MB in usage at the end of rendering. What I find rather
interesting about that reduction is that the resultant page is 6.5MB;
I'm actually not sure where the extra ~7MB of the reduction comes from
(I suspect there is some copy idiocy somewhere forcing a new string)-
alternatively, it may just be intermediate data hanging around. I've
been working with this code in one form or another for 3 months now,
and still haven't figured out the 'why' for that diff.
When you were doing this memory checking, what server configuration were
you using?
lighttpd/fcgi; the gain there is from what directly comes out of the
templating subsystem (more specifically, HttpResponse gets an iterable
instead of a string). The gain there *should* be constant regardless of
setup- the exception to this would be if your configuration touches
.content, or prefers to collect the full response and send that
instead of buffering- if your setup kills the iteration further up
(and it's not middleware related), I'd be interested.
Nudge on ticket #4625 btw ;)
You can safely assume I'm aware of it. #4625 is way down the list,
though; there are bigger problems. I'm currently trying to work through
all the unexpected side-effects of the first change (in between being
really busy with Real Life). If we don't come up with a solution to the
hanging database connections today -- see relevant thread on the
django-users list -- I'm going to back out the whole iterator change for
a while until we can fix things properly.
Up to you regarding punting it; at the very least, I don't much like
breaking folks' setups, so a temp punt won't get complaints out of me.

Re: the new thread- I'm not on that ML, although by the looks of it I
ought to be. Also, in the future, if it *looks* like I had a hand in
whatever borkage is afoot, don't hesitate to cc me- as you said, real
life intervenes, so I'm not watching everything. I'd rather have a few
false positives email-wise than miss something I may have broken.
Post by Malcolm Tredinnick
The reason for the original question was part of that. We might have to
give back all the memory savings, since we need to know when use of the
database is definitely finished. One solution that remains
WSGI-compliant is to have the __call__ method return a single item
iterable of all the content which is basically a return to the old
behaviour (slightly fewer string copies, but not many). That is how the
mod_python handler works already, in effect, so that's easier to fix.
Alternatively, we have to introduce much tighter coupling between
templates, backend and HTTP layer to manage the database connections
correctly, which would be a nasty thing to have to do.
Making the change at the WSGI handler level is more efficient in terms
of network traffic, since WSGI servers are not permitted to buffer
iterated writes. Lots of short writes can be occurring with a compliant
server at the moment. But if we do that, it's not clear it's enough of a
win over the original code to warrant it.
Keep in mind I'm having a helluva time finding *exactly* what's
required of a handler beyond __iter__/close, but a potential approach:

from itertools import chain

# Assumed imports from the handler module context (the old
# dispatcher-based signals machinery):
from django.core import signals
from django.dispatch import dispatcher

class IterConsumedSignal(object):
    # Zero-length iterator: fires request_finished once iteration
    # reaches it, i.e. once the real response is fully consumed.

    def __iter__(self):
        return self

    def next(self):
        dispatcher.send(signal=signals.request_finished)
        raise StopIteration()

class ResponseWrapper(object):

    def __init__(self, response):
        self._response = response
        if hasattr(self._response, 'close'):
            self.close = self._response.close

    def __iter__(self):
        assert self._response is not None
        try:
            if isinstance(self._response, basestring):
                return chain([self._response], IterConsumedSignal())
            return chain(self._response, IterConsumedSignal())
        finally:
            self._response = None


then for __call__...

def __call__(self, environ, start_response):
    from django.conf import settings

    # Set up middleware if needed. We couldn't do this earlier, because
    # settings weren't available.
    if self._request_middleware is None:
        self.load_middleware()

    dispatcher.send(signal=signals.request_started)
    try:
        request = WSGIRequest(environ)
        response = self.get_response(request)

        # Apply response middleware
        for middleware_method in self._response_middleware:
            response = middleware_method(request, response)

    except:
        dispatcher.send(signal=signals.request_finished)
        raise

    try:
        status_text = STATUS_CODE_TEXT[response.status_code]
    except KeyError:
        status_text = 'UNKNOWN STATUS CODE'
    status = '%s %s' % (response.status_code, status_text)
    response_headers = response.headers.items()
    for c in response.cookies.values():
        response_headers.append(('Set-Cookie', c.output(header='')))
    start_response(status, response_headers)
    return ResponseWrapper(response)

A failing of this approach is that if anything else is accessed on the
response, it's no longer accessible- the ResponseWrapper specifically
lacks __getattr__ proxying, to avoid the potential of the underlying
response being inadvertently consumed.

Thoughts? Upshot of this approach is that it's fairly loosely
coupled, and is still able to preserve the iteration behaviour.
Downside is that if there are other attrs/methods that should be
exposed, have to add the proxying to ResponseWrapper.

~harring
Malcolm Tredinnick
2007-06-22 01:57:58 UTC
[...]
Post by Brian Harring
Post by Malcolm Tredinnick
Making the change at the WSGI handler level is more efficient in terms
of network traffic, since WSGI servers are not permitted to buffer
iterated writes. Lots of short writes can be occurring with a compliant
server at the moment. But if we do that, it's not clear it's enough of a
win over the original code to warrant it.
Keep in mind I'm having a helluva time finding *exactly* what's
required of a handler beyond __iter__/close,
That's how you can tell you're reading the WSGI spec. I find it a bit
tricky to interpret in places.
Post by Brian Harring
but a potential approach
[...]

Similar but slightly different to what I've been trying to make work.
The wrapping and delayed "catch and release" work on signals has had me
searching for alternative possibilities, too.

One alternative I was trying was to hook something up to the iterator on
the HttpResponse being exhausted as a trigger for cleanup. Depending on
which way I squint, I can convince myself it's either a good or a bad
idea (I dislike separating start-request and end-request signalling
quite so widely, for example).
Post by Brian Harring
A failing of this approach is that if anything else is accessed on the
response, it's no longer accessible- the ResponseWrapper specifically
lacks __getattr__ proxying, to avoid the potential of the underlying
response being inadvertently consumed.
My reading of the spec (PEP 333) is that what you've done should be safe
and sufficient. I'm not as comfortably familiar with WSGI details as I
would like to be, though.
Post by Brian Harring
Thoughts? Upshot of this approach is that it's fairly loosely
coupled, and is still able to preserve the iteration behaviour.
Downside is that if there are other attrs/methods that should be
exposed, have to add the proxying to ResponseWrapper.
My only concerns are whether there's something a bit neater that avoids
yet another wrapper, as I mentioned above. Functionality-wise, I think
your solution is sufficient (particularly since I wrote a similar
version last night, so two people inventing the same thing gives it some
legitimacy).

This problem is my high-priority item at the moment. When I've done my
necessary work today, I'm going to hook up a couple of different wsgi
servers (along with mod_python) and make sure we fix it or back it out
for a bit. Will keep you in the loop either way.

Regards,
Malcolm


Brian Harring
2007-06-22 02:20:40 UTC
Post by Malcolm Tredinnick
[...]
Post by Brian Harring
Post by Malcolm Tredinnick
Making the change at the WSGI handler level is more efficient in terms
of network traffic, since WSGI servers are not permitted to buffer
iterated writes. Lots of short writes can be occurring with a compliant
server at the moment. But if we do that, it's not clear it's enough of a
win over the original code to warrant it.
Keep in mind I'm having a helluva time finding *exactly* what's
required of a handler beyond __iter__/close,
That's how you can tell you're reading the WSGI spec. I find it a bit
tricky to interpret in places.
Good to know I'm not just being an idiot and missing something
obvious. :)
Post by Malcolm Tredinnick
Post by Brian Harring
but a potential approach
[...]
Similar but slightly different to what I've been trying to make work.
The wrapping and delayed "catch and release" work on signals has had me
searching for alternative possibilities, too.
One alternative I was trying was to hook something up to the iterator on
the HttpResponse being exhausted as a trigger for cleanup. Depending on
which way I squint, I can convince myself it's either a good or a bad
idea (I dislike separating start-request and end-request signalling
quite so widely, for example).
Thought about it, but decided against it since it would require an
extra method/protocol for django Response objects.
Post by Malcolm Tredinnick
Post by Brian Harring
Thoughts? Upshot of this approach is that it's fairly loosely
coupled, and is still able to preserve the iteration behaviour.
Downside is that if there are other attrs/methods that should be
exposed, have to add the proxying to ResponseWrapper.
My only concerns are whether there's something a bit neater that avoids
yet another wrapper, as I mentioned above.
Related: the long-term intention for signals (non-paired) was to make
them actually track their listeners- via that, you could just ask a
specific signal "is anything listening to you?". If something is, wrap;
if not, return the original response object.

The downside to that approach is that if an iter_render does something
special like adding a signal on the fly, it could potentially screw up.

Either way, the usage of chain there means the overhead *should* be
pretty minimal. Worst case, could just override close- fire the signal
there, then defer to the original object's close (if it had one)- zero
overhead in the content iteration.
Post by Malcolm Tredinnick
Functionality-wise, I think
your solution is sufficient (particularly since I wrote a similar
version last night, so two people inventing the same thing gives it some
legitimacy).
This problem is my high-priority item at the moment. When I've done my
necessary work today, I'm going to hook up a couple of different wsgi
servers (along with mod_python) and make sure we fix it or back it out
for a bit.
Thinking some tests would be a useful addition also; the number of
things that have popped up has me a bit concerned about the assertions
the test suite makes.
Post by Malcolm Tredinnick
Will keep you in the loop either way.
Would be appreciated.

Meanwhile, thinking about it, shifting the signal firing to close
should be a bit saner- as said, it will avoid the extra layer (even if
it is C code) for iteration.
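
A sketch of that close-based variant of the earlier ResponseWrapper
(same assumed imports; illustrative):

class ResponseWrapper(object):
    def __init__(self, response):
        self._response = response

    def __iter__(self):
        # No per-chunk wrapping: iteration costs nothing extra.
        return iter(self._response)

    def close(self):
        try:
            close = getattr(self._response, 'close', None)
            if close is not None:
                close()  # defer to the original object's close, if any
        finally:
            dispatcher.send(signal=signals.request_finished)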

~harring
Graham Dumpleton
2007-06-22 03:47:05 UTC
Post by Malcolm Tredinnick
This problem is my high-priority item at the moment. When I've done my
necessary work today, I'm going to hook up a couple of different wsgi
servers (along with mod_python) and make sure we fix it or back it out
for a bit. Will keep you in the loop either way.
Ah, just found this thread after making comments on that proposed
patch in the database thread over on the user list.

http://groups.google.com/group/django-users/browse_frm/thread/588718d711a5a0f0

If you are going to do testing, I hope you will consider also testing
mod_wsgi.

One thing I would like to point out is that although you perceive the
goal as being reducing overall memory use, do keep in mind, especially
perhaps with deployment solutions which run under Apache, that you may
lose out more than you gain by forcing lots of small writes of data, as
the overhead of all those small (flushed) writes can actually affect
throughput.

When I have done testing with both mod_python and mod_wsgi, albeit not
with really big responses, the best performance was always had when
all output was turned into a single string in Python code before
writing it back through the Apache API. In other words, it was better
than having Apache doing the buffering of the individual writes within
its output buckets and then only writing and flushing them at the end
of the response. And depending on the response, much much better than
allowing WSGI to flush after every iterable value returned.

Because of the poor performance of WSGI when lots of small values are
returned from an iterable, in mod_wsgi I specifically provide a
directive which allows one to optionally violate the WSGI
specification and turn on output buffering by Apache so that
performance will be improved. One could also wrap an application in a
WSGI middleware component which did buffering in Python and get the
best performance, but if an application has some URLs which provide
streaming data, that then becomes messy, as the middleware component
has to check the URLs and know when it should be turned on or off, and
that itself affects performance. In mod_wsgi at least, the directive to
turn on output buffering can be more easily associated with only
specific URLs or otherwise by using Apache Location directive to
restrict the scope. The URL checks in this case being done in Apache
are more efficient than if done in Python.

Thus, you possibly should consider whether there is really one
solution that fits all. You might want to consider making whether
buffering is done or not somehow controllable on a per-resource basis.
For small to moderate sized pages you will get better performance
through buffering than you save in memory. With big pages, progressive
flushes would be better so as to save memory, but you wouldn't want to
write out too small an amount of data in each go. What you may need is
some sort of adaptive buffering scheme that avoids small writes
somehow. In other words, it would do some measure of buffering between
needing to write data, but not buffer the complete message. Obviously,
if streaming, you need a way of saying never buffer that overrides
such a scheme.
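
As a sketch of such a scheme (the function name and 8KB threshold are
illustrative):

def coalesce(chunks, min_size=8192):
    # Accumulate small writes and only yield once min_size bytes are
    # queued; never holds more than one buffer's worth of the response.
    buf, buffered = [], 0
    for chunk in chunks:
        buf.append(chunk)
        buffered += len(chunk)
        if buffered >= min_size:
            yield ''.join(buf)
            buf, buffered = [], 0
    if buf:
        yield ''.join(buf)

A streaming URL would simply bypass the wrapper.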

Graham


Malcolm Tredinnick
2007-06-22 04:58:00 UTC
Post by Graham Dumpleton
Post by Malcolm Tredinnick
This problem is my high-priority item at the moment. When I've done my
necessary work today, I'm going to hook up a couple of different wsgi
servers (along with mod_python) and make sure we fix it or back it out
for a bit. Will keep you in the loop either way.
Ah, just found this thread after making comments on that proposed
patch over on the database thread over on the user list.
http://groups.google.com/group/django-users/browse_frm/thread/588718d711a5a0f0
If you are going to do testing, I hope you will consider also testing
mod_wsgi.
Yep.. it's on the list.
Post by Graham Dumpleton
One thing I would like to point out is that although you perceive the
goal as being reducing overall memory use, do keep in mind, especially
perhaps with deployment solutions which run under Apache, that you may
lose out more than you gain by forcing lots of small writes of data, as
the overhead of all those small (flushed) writes can actually affect
throughput.
[... good stuff snipped ...]

Graham, you've accurately put into words most of the things I've had
running around in my brain with respect to performance on the network
and server writing level, mostly from having traced through individual
request/response cases a lot yesterday. How to do buffering sensibly in
the default case and allow for control in the extreme cases is something
I don't think we've got right yet. Any solution should ideally work for
any WSGI server, so working out how to take advantage of Apache-specific
(or any other server-specific) issues without adding three dozen config
params and calling options is one part of the puzzle.

It isn't the forgotten ugly duckling. Work in progress.

Regards,
Malcolm


Brian Harring
2007-06-22 06:29:03 UTC
Post by Graham Dumpleton
Post by Malcolm Tredinnick
This problem is my high-priority item at the moment. When I've done my
necessary work today, I'm going to hook up a couple of different wsgi
servers (along with mod_python) and make sure we fix it or back it out
for a bit. Will keep you in the loop either way.
<snip comments re: "repeated flushes kill performance">

Been digging around a bit for the reasoning behind why buffering is
disallowed by WSGI:
http://mail.python.org/pipermail/web-sig/2004-August/000555.html

By the looks of it, instead of assuming each yielded chunk requires a
flush, an explicit flush object may be added for WSGI2:
http://groups.google.com/group/python-web-sig/browse_thread/thread/8694bd30e46329e5/691ab6a563070404

Graham, any comments re: the future of that? If WSGI2 guts the "it
must flush every step of the way" requirement, it would definitely be
relevant to the discussion.
Post by Graham Dumpleton
When I have done testing with both mod_python and mod_wsgi, albeit not
with really big responses, the best performance was always had when
all output was turned into a single string in Python code before
writing it back through the Apache API. In other words, it was better
than having Apache doing the buffering of the individual writes within
its output buckets and then only writing and flushing them at the end
of the response.
If you have the stats still floating around, I'd be interested in
them- it's kind of a given that if you're doing single-char writes,
overhead is going to build up, but the implication is that even with a
sane chunk size per write, it still was noticeably less performant.
That I'd be interested in.
Post by Graham Dumpleton
Because of the poor performance of WSGI when lots of small values are
returned from an iterable, in mod_wsgi I specifically provide a
directive which allows one to optionally violate the WSGI
specification and turn on output buffering by Apache so that
performance will be improved. One could also wrap an application in a
WSGI middleware component which did buffering in Python and get the
best performance, but if an application has some URLs which provide
streaming data, that then becomes messy, as the middleware component
has to check the URLs and know when it should be turned on or off, and
that itself affects performance. In mod_wsgi at least, the directive to
turn on output buffering can be more easily associated with only
specific URLs or otherwise by using Apache Location directive to
restrict the scope. The URL checks in this case being done in Apache
are more efficient than if done in Python.
Personally, I'd be more inclined to just have wsgi at the top of the
stack collapse the results down into a big ass string- use
middleware/marking/whatever to disable that collapse, or slip in
buffering instead of the collapse.

If WSGI isn't buffering-friendly, then by default drop the attempt at
buffering until (someday) it is.
Post by Graham Dumpleton
Thus, you possibly should consider whether there is really one
solution that fits all.
At the top of the stack? Probably not. That doesn't mean, however,
that the innards of the stack (the template rendering) must aim for the
LCD. Clarifying: worst case, say WSGI long-term sticks with the "it
must flush after every chunk" approach. There are other interfaces,
however, and they may be more generative/iterative friendly.

Collapsing an iterable down into a single string is pretty easy- the
only downside to this is that you suffer a bit of overhead from
zipping up/down the generators; that cost is a bit offset by avoiding
intermediate strings however.

So collapsing is definitely doable. The reverse, however, is not true-
you have the resultant page in memory (thus no savings are possible),
so there's no point in chunking it up. Saner to just hand it to the
backend, and let it decide how best to process it.

Thus I'd argue that it's better to have the internals iterative- it
allows for fcgi backends to behave better, and ought to behave fine
for a WSGI backend. This is assuming the collapsing overhead for WSGI
is basically background noise performance-wise- if it's a noticeable
hit, that would change things a bit :)

While it's not performance related, one additional argument against
iter_render via James Bennett is that it makes third party code
supplying their own Node derivatives a little tricky if they're trying
to support svn and <=0.96. A metaclass could solve that, but it's
potentially relevant since we're discussing horkage induced by the
change.

Considering the number of folk who seem to be running svn (perhaps a
sign a release might be wise ;), I'd be +.5 on punting it out of
mainline and into its own branch- enough unexpected bugs have popped
out already that I'm a fair bit worried about what bugs are still
undiscovered.

~harring
Malcolm Tredinnick
2007-06-22 06:54:27 UTC
On Thu, 2007-06-21 at 23:29 -0700, Brian Harring wrote:
[...]
Post by Brian Harring
While it's not performance related, one additional argument against
iter_render via James Bennett is that it makes third party code
supplying their own Node derivatives a little tricky if they're trying
to support svn and <=0.96.
Why is this an issue? Those classes will implement render() and
Node.iter_render() will call that when needed. They just ignore
iter_render() without any harm at all. I thought that was the big
advantage of your circular render() <-> iter_render() calls in Node: the
API is backwards compatible.
Post by Brian Harring
Considering the number of folk who seem to be running svn (perhaps a
sign a release might be wise ;)
We always have a lot of people running svn. It's expected. Of course we
have no idea of the actual numbers. That's also expected.
Post by Brian Harring
... I'd be +.5 on punting it out of mainline and into its own branch-
enough unexpected bugs have popped out already that I'm a fair bit
worried about what bugs are still undiscovered.
Since this really isn't the problem I want to work on right now, and
since it is causing some destabilisation and changes are blocked a bit
on somebody with commit access being ready to react, I'll back it out
so that you can work on it in peace a bit more.

Malcolm


Brian Harring
2007-06-22 07:16:11 UTC
Post by Malcolm Tredinnick
[...]
Post by Brian Harring
While it's not performance related, one additional argument against
iter_render via James Bennett is that it makes third party code
supplying their own Node derivatives a little tricky if they're trying
to support svn and <=0.96.
Why is this an issue? Those classes will implement render() and
Node.iter_render() will call that when needed. They just ignore
iter_render() without any harm at all. I thought that was the big
advantage of your circular render() <-> iter_render() calls in Node: the
API is backwards compatible.
External code calling the API is backwards compatible (render just
invokes iter_render); implementations that override behaviour of a
Node derivative, however, have to override *both*, as you pointed out,
if the code is trying to target .96/svn, which kind of sucks.

It's addressable via including their own:

def render(self, context):
    return ''.join(self.iter_render(context))

I'd prefer that, for folks who need that compatibility, a metaclass be
available that they can include and that handles it automatically-
basically avoiding the potential of a screwup.
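
A sketch of what that opt-in compatibility metaclass could look like
(hypothetical name, not part of the patch):

class CompatNodeMeta(type):
    # For third-party nodes targeting both 0.96 and svn: whichever of
    # render/iter_render the class defines, synthesize the other.
    def __init__(cls, name, bases, attrs):
        super(CompatNodeMeta, cls).__init__(name, bases, attrs)
        if 'iter_render' in attrs and 'render' not in attrs:
            def render(self, context):
                return ''.join(self.iter_render(context))
            cls.render = render
        elif 'render' in attrs and 'iter_render' not in attrs:
            def iter_render(self, context):
                return (self.render(context),)
            cls.iter_render = iter_render

Third-party code would set __metaclass__ = CompatNodeMeta on its nodes
and define whichever method suits the version it primarily targets.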

Might be overkill however, thoughts definitely welcome on that one.
Post by Malcolm Tredinnick
Post by Brian Harring
... I'd be +.5 on punting it out of mainline and into its own branch-
enough unexpected bugs have popped out already that I'm a fair bit
worried about what bugs are still undiscovered.
Since this really isn't the problem I want to work on right now, and
since it is causing some destabilisation and changes are blocked a bit
on somebody with commit access being ready to react, I'll back it out
so that you can work on it in peace a bit more.
Works for me. Ironing out the known issues shouldn't be too tricky,
but I'm thinking the test coverage needs a bit of an increase before
iter_render appears in trunk again (try to smoke out any remaining
explosions prior to folks hitting them).

~harring
Malcolm Tredinnick
2007-06-22 07:21:33 UTC
Post by Brian Harring
Post by Malcolm Tredinnick
[...]
Post by Brian Harring
While it's not performance related, one additional argument against
iter_render via James Bennett is that it makes third party code
supplying their own Node derivatives a little tricky if they're trying
to support svn and <=0.96.
Why is this an issue? Those classes will implement render() and
Node.iter_render() will call that when needed. They just ignore
iter_render() without any harm at all. I thought that was the big
advantage of your circular render() <-> iter_render() calls in Node: the
API is backwards compatible.
External code calling the API is backwards compatible (render just
invokes iter_render);
And vice-versa: Node.iter_render() called self.render(...). So if a
sub-class of Node only supplies a render() method and leaves
iter_render() alone because it doesn't care about it, calling
iter_render() on it will still do the right thing.

Malcolm




Malcolm Tredinnick
2007-06-22 07:16:01 UTC
On Fri, 2007-06-22 at 16:54 +1000, Malcolm Tredinnick wrote:
[...]
Post by Malcolm Tredinnick
Since this really isn't the problem I want to work on right now, and
since it is causing some destabilisation and changes are blocked a bit
on somebody with commit access being ready to react, I'll back it out
so that you can work on it in peace a bit more.
Done in [5511].

Regards,
Malcolm


Graham Dumpleton
2007-06-22 06:57:41 UTC
Post by Brian Harring
Post by Graham Dumpleton
Post by Malcolm Tredinnick
This problem is my high-priority item at the moment. When I've done my
necessary work today, I'm going to hook up a couple of different wsgi
servers (along with mod_python) and make sure we fix it or back it out
for a bit. Will keep you in the loop either way.
<snip comments re: "repeated flushes kill performance">
Been digging around a bit for the reasoning behind why buffering is
disallowed by WSGI:
http://mail.python.org/pipermail/web-sig/2004-August/000555.html
By the looks of it, instead of assuming each yielded chunk requires a
flush, an explicit flush object may be added for WSGI2:
http://groups.google.com/group/python-web-sig/browse_thread/thread/86...
Graham, any comments re: the future of that? If WSGI2 guts the "it
must flush every step of the way" requirement, it would definitely be
relevant to the discussion.
The idea was certainly brought up but not really thrashed out as to
how it would work in practice. Not long after that thread the whole
WSGI 2.0 discussion pretty well died off again so there has been no
more discussion on it.

The initial idea that was put up was that by default buffering is
assumed, and that to flag that a flush should be done, a specific
flush request object is yielded. What wasn't addressed was what the
flush object should be. Personally I'm not too keen on the idea of it
being some special class that has to be supplied by a WSGI adapter
through the WSGI environment; I can't see why a None object couldn't
be used.

Also, I thought that simply making that change wouldn't be enough.
Since WSGI has some ties to the CGI specification, I would like to see
the idea from CGI 1.2 of a Script-Control header that could be passed
back by a request handler be used. In this case a request handler
might use it to indicate a buffering behaviour different to the
default. Thus, if the default was still to flush, the request handler
might pass back the Script-Control header with the value
'buffer-response'. So rather than a request handler explicitly
controlling when flushing should occur, it could indicate simply
whether it wanted buffering or not. This wouldn't preclude it still
yielding a None to force a flush if buffering was requested.
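
An illustrative sketch of that protocol (no such convention exists in
WSGI 1.0):

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield 'header material, buffered by the server\n'
    yield None  # explicit flush marker under the proposed scheme
    yield 'slow remainder, written after the flush\n'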

Am not sure how this idea of using the Script-Control header would go
down in the WSGI group, as from what I have seen they tend to like
passing objects or methods through the WSGI environment and requiring
a request handler to call back to a higher layer that way. To me the
Script-Control header that CGI 1.2 introduces seems simpler, and could
be used for various other purposes as well.
Post by Brian Harring
Post by Graham Dumpleton
When I have done testing with both mod_python and mod_wsgi, albeit not
with really big responses, the best performance was always had when
all output was turned into a single string in Python code before
writing it back through the Apache API. In other words, it was better
than having Apache doing the buffering of the individual writes within
its output buckets and then only writing and flushing them at the end
of the response.
If you have the stats still floating around, I'd be interested in
them- it's kind of a given that if you're doing single-char writes,
overhead is going to build up, but the implication is that even with a
sane chunk size per write, it still was noticeably less performant.
That I'd be interested in.
It was very ad hoc, more rule of thumb than anything sophisticated.

Time for me to catch a train. At least the weekend is here down
under. :-)

Graham

