Malcolm Tredinnick
2008-07-06 01:51:38 UTC
I thought I'd sit down yesterday, do a final review of #285 and commit
it. I was naïve. It turned out to be all I did for the day.
There are some compromises needed somewhere, so it's time for audience
participation. When responding to this email, please try to keep in mind
that there are many, many different ways people install and use Django
and any solution has to work for all of them, not just your preferred
method. This also isn't a case where you can say "the specs say..",
since any specs aren't worth the paper they're printed on here. What
counts is what webservers used here in the real world actually do.
Fortunately, web servers mostly follow the specs, but that doesn't mean
ISPs do.
The good news: I have a patch that is mostly backwards compatible,
doesn't require intrusive code changes and even works with Apache's
mod_rewrite, thus avoiding counter-intuitive URLs when using Apache
+ FastCGI the way a lot of shared-hosting environments use it
(mod_rewrite plus a django.fcgi file). I tested it on a bunch of setups
(nginx, lighttpd, CherryPy's WSGI server, Apache +
fastcgi/mod_python/mod_wsgi) and everything looks good.
Thus endeth the good news.
To understand the bad news, a quick intro to the problems we have and
some "ideal" solutions (there are some proposed solutions at the bottom
for the attention deficit types who want to skip ahead)...
Firstly, there's the "development" vs "production" differences with URL
prefixes. You don't always know at development time what prefix the
applications will be installed under. This shouldn't be a problem, since
web servers set SCRIPT_NAME to be the portion they are managing and (in
theory) PATH_INFO to be the bits that are passed to your application for
handling. So SCRIPT_NAME + PATH_INFO (+ QUERY_STRING) is the URL. Except
things are never that easy. Mod_python (and a few other Apache plugins)
have noticed that PATH_INFO isn't always set correctly by Apache;
mod_rewrite changes things around, and so on. Still, we can work around
all of those, so let's safely assume we have a SCRIPT_NAME portion that
is the webserver prefix and PATH_INFO is the bit our Django apps care
about for URL parsing (it's true, we can derive those bits in all cases
with minimal amounts of hassle).
Code needs to know how to construct proper URLs. Which means, amongst
other things, it needs to know SCRIPT_NAME so that that can be added as
a prefix. Portable code would also like to only have to work with
PATH_INFO, so that it is agnostic about the prefix under which it is
installed. Thus developers can have things under whatever prefix they
like and when installation happens, there isn't a dependency on the
final deployment URL.
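
As a rough illustration of the split described above, here is a minimal
sketch that takes a WSGI-style environ dictionary and rebuilds the URL
path from SCRIPT_NAME, PATH_INFO and QUERY_STRING. The function name
rebuild_url is made up for this example and is not part of Django or
of the patch in #285; it also assumes the server has already set the
two variables sensibly, which is exactly the part that varies in
practice.

```python
def rebuild_url(environ):
    """Return (script_name, path_info, full_url) for a WSGI-style environ.

    Hypothetical helper: assumes SCRIPT_NAME is the server-managed prefix
    and PATH_INFO is the remainder handed to the application.
    """
    script_name = environ.get("SCRIPT_NAME", "")
    # An empty PATH_INFO is treated as the root of the application.
    path_info = environ.get("PATH_INFO", "") or "/"
    # Avoid a doubled slash when the prefix ends with one.
    full_url = script_name.rstrip("/") + path_info
    query = environ.get("QUERY_STRING", "")
    if query:
        full_url += "?" + query
    return script_name, path_info, full_url


example_environ = {
    "SCRIPT_NAME": "/site_prefix",
    "PATH_INFO": "/foo/",
    "QUERY_STRING": "page=2",
}
script_name, path_info, full_url = rebuild_url(example_environ)
```

Application code that only looks at path_info stays ignorant of the
prefix; code that builds links needs script_name as well.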
Problem #1
-----------
I suspect there are a number of installations around, particularly using
mod_python, that have Apache configuration files looking something like
this:
    <Location /admin/>
        PythonHandler django.core.handlers.modpython
        SetEnv DJANGO_SETTINGS_MODULE mysite.settings
        ...
    </Location>

    <Location /site_prefix/>
        PythonHandler django.core.handlers.modpython
        SetEnv DJANGO_SETTINGS_MODULE mysite.settings
        ...
    </Location>
Only *some* pieces under "/" are handled by Django, leaving the rest of
that namespace free for static documents, other scripts, etc. The fact
that mod_python passes through the URL as "/admin/foo/" and
"/site_prefix/foo/" means that the URL file (it's the same URLConf file
in both cases, since it's the same settings file) can differentiate
between the two.
Portable URL practices here would mean that the Django code shouldn't
care (or know much) about "/site_prefix/" and "/admin/" in those cases,
so there's no easy way to tell them apart. This is particularly
problematic with newforms-admin, since its urlpatterns entry is

    "admin/(.*)$"

Strip the leading "admin/" and no other pattern is going to get a look
in.
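
The effect can be reproduced with nothing but the re module. This is a
sketch, not Django's resolver: it assumes patterns are applied with
re.search against the path minus its leading slash, and the helper
names (strip_prefix, matches_admin) are invented for the example.

```python
import re

# The newforms-admin pattern quoted above.
ADMIN_PATTERN = re.compile(r"admin/(.*)$")


def strip_prefix(full_path, script_name):
    """Return the PATH_INFO-style remainder, without a leading slash."""
    if not full_path.startswith(script_name):
        raise ValueError("path is not under the given prefix")
    return full_path[len(script_name):].lstrip("/")


def matches_admin(path):
    """True if the admin pattern matches the given path."""
    return ADMIN_PATTERN.search(path) is not None


# mod_python today: the full path reaches the URLConf.
full_admin = "/admin/foo/".lstrip("/")        # "admin/foo/"
full_site = "/site_prefix/foo/".lstrip("/")   # "site_prefix/foo/"

# After a SCRIPT_NAME / PATH_INFO split, both mounts look identical.
split_admin = strip_prefix("/admin/foo/", "/admin/")
split_site = strip_prefix("/site_prefix/foo/", "/site_prefix/")
```

With the full path, only the first mount matches the admin pattern.
After the split, both remainders are "foo/", so the admin pattern
matches neither and the URLConf cannot tell the two mounts apart.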
But that isn't the biggest problem...
Problem #2
-----------
The "{% url ... %}" template tag and the reverse() function. :-(
Both of these need to be aware of SCRIPT_NAME (or the equivalent) so
that they can put the right prefix onto the URLs they construct. Since
template rendering is independent of the current request, this is really
hard to work around. To the point that I don't have a solution that I'd
be comfortable including in the code. Any code that is intended to be at
all portable would need to be passing the URL prefix into every
HTTP-destined template rendering call, or else we'd need to have it
available in the thread's environment (the latter is the closest I've
come to finding happiness -- we already use the current thread's
environment for the active translation context, so this would be another
aspect like that).
Solutions(?)
============
Firstly, I'll say that we have to include something to fix the problem
with the wrong path being passed through. On everything except
mod_python, the SCRIPT_NAME is an important component of the path.
That's a solved problem, though. John Melesky's patches in #285 did most
of the work and I've shuffled things a little to make it more backwards
compatible and to handle mod_rewrite in the Apache case. So no problems
there. Take it as given that we present the proper full path in the
request object. That's a bug fix, nothing more or less.
The meta-problem is that to avoid problems #1 and #2 above, we need to
keep the URLConf files aware of the full path (solution #3 below tries
to avoid this, but it has a hole).
Solution #1
===========
Nothing changes in URLConf-land. Every time you install under a
different prefix, you need to edit your root URLConf (only). This means
that if you're writing code that is installation location agnostic, it's
going to look like this:
    SITE_PREFIX = "site_prefix/"

    urlpatterns = patterns('',
        ('^%sfoo/...' % SITE_PREFIX, ....),
        ...
    )
I've written a couple of sites like that and all that SITE_PREFIX stuff
hanging around is kind of noisy and interferes with the real point of
the code. But it gets the job done.
So no changes == minimally disruptive, but slightly messy long-term.
Solution #2
===========
We introduce a new second argument to patterns() which is the common
prefix to put before all the patterns in that call. This isn't hard on a
technical level and would be backwards compatible, if a little prone to
misreading of old code. The above example now becomes
    urlpatterns = patterns('', SITE_PREFIX,
        ('foo/...', ...),
        ...
    )
Less shouting all round.
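
To make the idea concrete without touching patterns() itself, here is
a small stand-alone sketch of what "put a common prefix before every
pattern" could mean. The helper prefix_patterns, the view names and
the example regexes are all hypothetical; the real change would live
inside patterns() and might well treat anchoring differently.

```python
import re

SITE_PREFIX = "site_prefix/"  # hypothetical deployment prefix


def prefix_patterns(prefix, *entries):
    """Prepend a literal prefix to the regex of each (regex, view, ...) tuple.

    Sketch only: a leading "^" on the original regex is dropped and the
    combined pattern is anchored at the start of the prefix instead.
    """
    result = []
    for entry in entries:
        regex = entry[0]
        if regex.startswith("^"):
            regex = regex[1:]
        combined = "^" + re.escape(prefix) + regex
        result.append((combined,) + tuple(entry[1:]))
    return result


urlpatterns = prefix_patterns(
    SITE_PREFIX,
    (r"^foo/(\d+)/$", "myapp.views.foo_detail"),
    (r"^bar/$", "myapp.views.bar"),
)
```

The individual entries no longer mention the prefix, so moving the
site means changing one value rather than every line.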
Solution #3
===========
We shove the current SCRIPT_NAME prefix into the currently active
context, just as we do with the active locale. The reverse() function
knows to look there for the prefix (and if nothing's present it's the
same as an empty prefix). Using the current thread's context doesn't
make me deliriously happy, but I can live with it for something like
this.
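
A minimal sketch of that mechanism, using threading.local() the same
way the active translation is tracked per thread. The function names
here (set_script_prefix, get_script_prefix, reverse_with_prefix) are
assumptions for illustration, and reverse_with_prefix is only a
stand-in that glues a prefix onto an already-resolved path; it does
not do any URLConf lookup.

```python
import threading

# Per-thread storage; each request-handling thread sees its own prefix.
_active = threading.local()


def set_script_prefix(prefix):
    """Record the SCRIPT_NAME prefix for the current thread."""
    _active.prefix = prefix


def get_script_prefix():
    """Return the current thread's prefix, or "" if none was set."""
    return getattr(_active, "prefix", "")


def reverse_with_prefix(path):
    """Stand-in for reverse(): join the thread's prefix and a resolved path."""
    return get_script_prefix().rstrip("/") + "/" + path.lstrip("/")
```

A request handler would call set_script_prefix() once at the start of
each request, after which template tags and view code can build URLs
without being handed the prefix explicitly.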
This is the neatest solution from an ideal world perspective, since it
respects the design principles behind PATH_INFO and SCRIPT_NAME and
similar webserver-set environment variables.
Unfortunately, I don't see how to make something like the admin pattern
for newforms-admin work, then. Particularly under installations that do the
SCRIPT_NAME / PATH_INFO split. If we could find a way of saying "these
go to the admin path, these go to the foo path, these go to the blah
path", I'd probably like this solution a bit more.
<End of solutions>
Personally, I prefer solution #2 and somehow I'll learn to live with the
fact there'll always be the ugliness of not being prefix-independent.
It's sad, but it might be the most pragmatic.
However, I'd like to hear some other well-considered opinions first in
case there's a possibility I've forgotten.
Regards,
Malcolm
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers-/***@public.gmane.org
To unsubscribe from this group, send email to django-developers-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---