Post by Malcolm Tredinnick
Post by Brian Harring
Post by Malcolm Tredinnick
Post by Brian Harring
Just filed ticket 4565, which basically converts template rendering
away from "build a string within this node of subnode results, return
it, wash rinse repeat", and into "yield each subnode chunk, and my
data as it's available".
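A minimal sketch of the two rendering styles (hypothetical TextNode/NodeList classes, not Django's actual Node API):

```python
# Hypothetical mini-template nodes illustrating the two styles;
# not Django's actual Node API.

class TextNode:
    def __init__(self, text):
        self.text = text

    def render(self, context):
        # Old style: build and return the full string for this node.
        return self.text

    def iter_render(self, context):
        # New style: yield chunks as they become available.
        yield self.text

class NodeList(list):
    def render(self, context):
        # Old style: concatenate subnode results into one big string;
        # the whole page lives in memory at once.
        return ''.join(node.render(context) for node in self)

    def iter_render(self, context):
        # New style: flatten subnode chunks lazily, so no full-page
        # string is ever materialized.
        for node in self:
            for chunk in node.iter_render(context):
                yield chunk

nodes = NodeList([TextNode('<p>'), TextNode('hello'), TextNode('</p>')])
assert nodes.render({}) == '<p>hello</p>'
assert list(nodes.iter_render({})) == ['<p>', 'hello', '</p>']
```

Both styles produce the same bytes; the difference is purely in when (and how much) memory is held.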
Post by Brian Harring
* Far less abusive on memory; in the usual 'spanish inquisition' heavy
test (term coined by mtredinnick, but it works), memory usage at the
end of rendering drops from 84m to 71m. What I find rather interesting
about that reduction is that the resultant page is 6.5mb; for the
extra 7mb I'm actually not sure where the reduction comes from (I
suspect there is some copy idiocy somewhere forcing a new string)-
alternatively, it may just be intermediate data hanging around. I've
been working with this code in one form or another for 3 months now,
and still haven't figured out the 'why' for that diff.
When you were doing this memory checking, what server configuration were
you using?
lighttpd/fcgi; the gain there comes directly out of the templating
subsystem (more specifically, HttpResponse gets an iterable instead of
a string). The gain *should* be constant regardless of setup- the
exception would be a configuration that touches .content, or one that
prefers to collect the full response and send it whole instead of
streaming it. If your setup kills the iteration further up (and it's
not middleware related), I'd be interested to hear about it.
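The ".content touches it" failure mode can be sketched with a simplified stand-in for HttpResponse (StreamingResponse here is illustrative, not the real class):

```python
class StreamingResponse:
    """Simplified stand-in for a response holding an iterable of chunks;
    not Django's actual HttpResponse."""

    def __init__(self, chunks):
        self._iterator = iter(chunks)

    def __iter__(self):
        return self._iterator

    @property
    def content(self):
        # Touching .content forces the whole iterable into one string,
        # giving back the memory savings of chunked rendering.
        collapsed = ''.join(self._iterator)
        self._iterator = iter([collapsed])
        return collapsed

resp = StreamingResponse(['<p>', 'hi', '</p>'])
assert resp.content == '<p>hi</p>'   # middleware touched .content...
assert list(resp) == ['<p>hi</p>']   # ...so only one big chunk remains
```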
Nudge on ticket #4625 btw ;)
You can safely assume I'm aware of it. #4625 is way down the list,
though; there are bigger problems. I'm currently trying to work through
all the unexpected side-effects of the first change (in between being
really busy with Real Life). If we don't come up with a solution to the
hanging database connections today -- see relevant thread on the
django-users list -- I'm going to back out the whole iterator change for
a while until we can fix things properly.
Up to you regarding punting it; at the very least, I don't much like
breaking folks' setups, so a temporary punt won't get complaints out
of me.
Re: the new thread, I'm not on that ml, although by the looks of it I
ought to be. Also, in the future, if it *looks* like I had a hand in
whatever borkage is afoot, don't hesitate to cc me- as you said, real
life intervenes, so I'm not watching everything. I'd rather have a few
false positives email-wise than miss something I may have broken.
Post by Malcolm Tredinnick
The reason for the original question was part of that. We might have to
give back all the memory savings, since we need to know when use of the
database is definitely finished. One solution that remains
WSGI-compliant is to have the __call__ method return a single item
iterable of all the content which is basically a return to the old
behaviour (slightly fewer string copies, but not many). That is how the
mod_python handler works already, in effect, so that's easier to fix.
Alternatively, we have to introduce much tighter coupling between
templates, backend and HTTP layer to manage the database connections
correctly, which would be a nasty thing to have to do.
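The single-item-iterable option boils down to collapsing the chunks at the handler boundary; a sketch (collapse is a hypothetical helper, not actual handler code):

```python
def collapse(response_iterable):
    # Join all chunks and hand WSGI a one-element list. Because the
    # whole body is materialized here, the database connection can
    # safely be closed as soon as this returns- at the cost of giving
    # back the memory savings of chunked rendering.
    return [''.join(response_iterable)]

assert collapse(iter(['<p>', 'hi', '</p>'])) == ['<p>hi</p>']
```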
Making the change at the WSGI handler level is more efficient in terms
of network traffic, since WSGI servers are not permitted to buffer
iterated writes. Lots of short writes can be occurring with a compliant
server at the moment. But if we do that, it's not clear it's enough of a
win over the original code to warrant it.
Keep in mind I'm having a helluva time finding *exactly* what's
required of a handler beyond __iter__/close, but a potential approach
(the wrapper hands off the response exactly once, chaining an
IterConsumedSignal onto the end so we know when iteration finishes):
from itertools import chain

class ResponseWrapper(object):
    def __init__(self, response):
        self._response = response
        if hasattr(self._response, 'close'):
            self.close = self._response.close

    def __iter__(self):
        assert self._response is not None
        # hand off the response; the wrapper can only be iterated once
        response, self._response = self._response, None
        if isinstance(response, basestring):
            return chain([response], IterConsumedSignal())
        return chain(response, IterConsumedSignal())
then for __call__...
def __call__(self, environ, start_response):
    from django.conf import settings

    # Set up middleware if needed. We couldn't do this earlier, because
    # settings weren't available.
    if self._request_middleware is None:
        self.load_middleware()

    request = WSGIRequest(environ)
    response = self.get_response(request)

    # Apply response middleware
    for middleware_method in self._response_middleware:
        response = middleware_method(request, response)

    try:
        status_text = STATUS_CODE_TEXT[response.status_code]
    except KeyError:
        status_text = 'UNKNOWN STATUS CODE'
    status = '%s %s' % (response.status_code, status_text)
    response_headers = response.headers.items()
    for c in response.cookies.values():
        response_headers.append(('Set-Cookie', c.output(header='')))
    start_response(status, response_headers)
    return ResponseWrapper(response)
A failing of this approach is that if anything else is accessed on the
response afterwards, it's no longer accessible- the ResponseWrapper
specifically lacks a __getattr__ proxying to the underlying response,
to avoid the potential of the response being inadvertently consumed.
Thoughts? The upshot of this approach is that it's fairly loosely
coupled, and still preserves the iteration behaviour. The downside is
that if there are other attrs/methods that should be exposed, we have
to add proxying for them to ResponseWrapper.
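For what it's worth, adding the proxying would be a small change; a sketch (ProxyingResponseWrapper is illustrative, not proposed code):

```python
class ProxyingResponseWrapper:
    """Sketch of a ResponseWrapper variant with attribute proxying.

    Unknown attribute lookups (status_code, headers, etc.) are
    forwarded to the wrapped response; iteration still goes through
    the wrapper, so the response can't be consumed twice.
    """

    def __init__(self, response):
        self._response = response

    def __iter__(self):
        # Hand off the response exactly once.
        response, self._response = self._response, None
        assert response is not None, 'response already consumed'
        return iter(response)

    def __getattr__(self, name):
        # Only called for names not found on the wrapper itself;
        # forward them to the underlying response.
        return getattr(self._response, name)
```

The trade-off the thread mentions still applies: once proxying exists, any attribute access that consumes the underlying iterable (a .content-style property, say) silently defeats the single-consumption guarantee.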