Discussion:
Database pooling vs. persistent connections
Aymeric Augustin
2013-02-17 11:24:52 UTC
Permalink
**tl;dr** I believe that persistent database connections are a good idea.
Please prove me wrong :)

--------------------

Since I didn't know why the idea of adding a connection pooler to Django was
rejected, I did some research before replying to the cx_Oracle SessionPooling
thread.
To clarify -- we've historically been opposed to adding connection
pooling to Django for the same reason that we don't include a web
server in Django -- the capability already exists in third party
tools, and they're in a position to do a much better job at it than us
because it's their sole focus. Django doesn't have to be the whole
stack.
All the discussions boil down to this argument, and the only ticket on the
topic is short on details: https://code.djangoproject.com/ticket/11798

--------------------

The connection pools for Django I've looked at replace "open a connection" by
"take a connection from the pool" and "close a connection" by "return the
connection to the pool". This isn't "real" connection pooling: each worker
holds a connection for the entire duration of each request, regardless of
whether it has an open transaction or not.

This requires as many connections as workers, and thus is essentially
equivalent to persistent database connections, except connections can be
rotated among workers.

Persistent connections would eliminate the overhead of creating a connection
(IIRC ~50ms/req), which is the most annoying symptom, without incurring the
complexity of a "real" pooler.

They would be a win for small and medium websites that don't manage their
database transactions manually and where the complexity of maintaining an
external connection pooler isn't justified.

Besides, when Django's transaction middleware is enabled, each request is
wrapped in a single transaction, which reserves a connection. In this case, a
connection pooler won't perform better than persistent connections.

Obviously, large websites should use an external pooler to multiplex their
hundreds of connections from workers into tens of connections to their
database and manage their transactions manually. I don't believe persistent
connections to the pooler would hurt in this scenario, but if it does, it
could be optional.

--------------------

AFAICT there are three things to take care of before reusing a connection:

1) restore a pristine transaction state: transaction.rollback() should do;

2) reset all connection settings: the foundation was laid in #19274;

3) check if the connection is still alive, and re-open it otherwise:
- for psycopg2: "SELECT 1";
- for MySQL and Oracle: connection.ping().

Some have argued that persistent connections tie the lifetime of database
connections to the lifetime of workers, but it's easy to store the creation
timestamp and re-open the connection if it exceeds a given max-age.
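Roughly, the three steps plus the max-age check might combine as follows. This is a toy sketch with illustrative names only (`PersistentConnection`, `prepare_for_reuse` and friends are not Django API):

```python
import time

class PersistentConnection:
    """Toy stand-in for a database connection, to illustrate the
    reuse checks discussed above.  Names are illustrative only."""

    def __init__(self, max_age=600):
        self.max_age = max_age
        self.created_at = time.monotonic()
        self.closed = False
        self.in_transaction = False

    def ping(self):
        # 3) liveness check: "SELECT 1" for psycopg2,
        # connection.ping() for MySQL and Oracle.
        return not self.closed

    def reopen(self):
        self.closed = False
        self.created_at = time.monotonic()

    def prepare_for_reuse(self):
        expired = time.monotonic() - self.created_at > self.max_age
        if expired or not self.ping():
            # Re-open dead or too-old connections.
            self.reopen()
        else:
            # 1) restore a pristine transaction state
            #    (stands in for transaction.rollback()).
            self.in_transaction = False
        # 2) resetting connection settings (#19274) would go here.

conn = PersistentConnection(max_age=600)
conn.in_transaction = True  # pretend a request left a transaction open
conn.prepare_for_reuse()
assert conn.in_transaction is False and not conn.closed
```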

So -- did I miss something?
--
Aymeric.
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
Anssi Kääriäinen
2013-02-17 18:31:48 UTC
Permalink
On 17 helmi, 13:24, Aymeric Augustin
Post by Aymeric Augustin
**tl;dr** I believe that persistent database connections are a good idea.
Please prove me wrong :)
[...]
So -- did I miss something?
I am not yet convinced that poolers implemented inside Django core are
necessary. A major reason for doing #19274 was to allow somewhat easy
creation of 3rd party connection poolers.

I don't see transactional connection pooling as something that forces
including connection pools in Django; a transactional pooler
implementation should be possible outside Django. On every execute,
check which connection to use: when inside a transaction, use the
connection tied to the transaction; otherwise take a free connection
from the pool and use that. The big problem is that Django's
transaction handling doesn't actually know when the connection is
inside a transaction. Fix this, and transactional poolers external
to Django will be possible. (Tying the connection to transaction
managed blocks could work, but then what to do with queries outside any
transaction managed block?)
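The routing rule described above is easy to sketch. This is illustrative toy code, not a proposed API:

```python
class TransactionalPool:
    """Sketch of the routing rule: a connection stays pinned to an
    open transaction; other queries use any free connection."""

    def __init__(self, connections):
        self.free = list(connections)
        self.txn_connection = None  # pinned to the open transaction

    def connection_for_execute(self, in_transaction):
        if in_transaction:
            # Inside a transaction: lazily pin one connection
            # and keep using it until the transaction ends.
            if self.txn_connection is None:
                self.txn_connection = self.free.pop()
            return self.txn_connection
        # Outside a transaction: any free connection will do.
        return self.free[0] if self.free else self.txn_connection

pool = TransactionalPool(["conn_a", "conn_b"])
first = pool.connection_for_execute(in_transaction=True)
second = pool.connection_for_execute(in_transaction=True)
assert first == second  # the transaction stays on one connection
```

The hard part, as noted above, is the `in_transaction` flag: today Django's transaction handling can't supply it reliably.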

Instead of implementing poolers inside Django, would it be better to
aim for DatabaseWrapper subclassing (as done in #19274)? The subclassing
approach has some nice properties, for example one could implement
"rewrite to prepared statements" feature (basically, some queries will
get automatically converted to use prepared statements on execution
time). This setup should result in nice speedups for some use cases.

Another implementation idea is to have the DB settings contain a
'POOLER' entry. By default this entry is empty, but when defined it
points to a class that has a (very limited) pooler API:
lend_connection(), release_connection() and close_all_connections()
(the last one is needed for test teardown). The connections
themselves could have .reset(), .ping() and similar methods. This is
simple and should also be extensible.
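A minimal sketch of that hypothetical 'POOLER' API (the class and method names come from the proposal above; everything else is illustrative):

```python
class SimplePooler:
    """Sketch of the proposed pooler API: lend_connection(),
    release_connection() and close_all_connections()."""

    def __init__(self, create_connection, max_idle=10):
        self.create_connection = create_connection
        self.max_idle = max_idle
        self.idle = []       # connections waiting to be reused
        self.lent = set()    # connections currently in use

    def lend_connection(self):
        conn = self.idle.pop() if self.idle else self.create_connection()
        self.lent.add(conn)
        return conn

    def release_connection(self, conn):
        self.lent.discard(conn)
        if len(self.idle) < self.max_idle:
            self.idle.append(conn)

    def close_all_connections(self):
        # Needed for test teardown, per the proposal.
        self.idle.clear()
        self.lent.clear()

pooler = SimplePooler(create_connection=object)
conn = pooler.lend_connection()
pooler.release_connection(conn)
assert pooler.lend_connection() is conn  # the idle connection is reused
```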

It seems SQLAlchemy has a mature pooling implementation. So, yet
another approach is to see if SQLAlchemy's pooling implementation
could be reused. (Maybe in conjunction with the above 'POOLER' idea).

I also do believe that persistent database connections are a good
idea. I don't yet believe the implementation must be in Django core...

- Anssi
Carl Meyer
2013-02-17 22:17:09 UTC
Permalink
Post by Anssi Kääriäinen
It seems SQLAlchemy has a mature pooling implementation. So, yet
another approach is to see if SQLAlchemy's pooling implementation
could be reused. (Maybe in conjunction with the above 'POOLER' idea).
There is in fact an implementation of reusing SQLAlchemy's connection
pool with Django (via DatabaseWrapper subclassing):
https://github.com/kennethreitz/django-postgrespool/

I have had this in production for several months with no issues, so the
concept certainly works.

Carl
Marc Tamlyn
2013-02-18 09:39:18 UTC
Permalink
+1 to django-postgrespool; it works well for me.
Post by Carl Meyer
There is in fact an implementation of reusing SQLAlchemy's connection
pool with Django (via DatabaseWrapper subclassing):
https://github.com/kennethreitz/django-postgrespool/
Andrey Antukh
2013-02-18 14:38:00 UTC
Permalink
Also, https://github.com/niwibe/djorm-ext-pool was initially based on
django-postgrespool but also works with SQLite and MySQL.

The only difference is that it uses monkey patching instead of a separate
backend. We have also used it in production on a project for a while and it
has not given us any problems.

Andrey
Post by Carl Meyer
There is in fact an implementation of reusing SQLAlchemy's connection
pool with Django (via DatabaseWrapper subclassing):
https://github.com/kennethreitz/django-postgrespool/
--
Andrey Antukh - Андрей Антух - <niwi-***@public.gmane.org>
http://www.niwi.be/about.html
http://www.kaleidos.net/A5694F/

"Linux is for people who hate Windows, BSD is for people who love UNIX"
"Social Engineer -> Because there is no patch for human stupidity"
Aymeric Augustin
2013-02-18 12:41:24 UTC
Permalink
Post by Anssi Kääriäinen
I am not yet convinced that poolers implemented inside Django core are
necessary.
In case I wasn't clear enough — I agree that Django core shouldn't provide
connection pooling, because it'll be simplistic and hamper innovation outside
of Django.
Post by Anssi Kääriäinen
I also do believe that persistent database connections are a good
idea. I don't yet believe the implementation must be in Django core...
The goal of my proposal is to save the cost (in response time) of establishing
the database connection, in vanilla Django, without compromising reliability.
I'm trying to help people who aren't even aware of the problem :)

The implementation is quite straightforward:
https://github.com/django/django/pull/733

Everything is controlled by a per-connection max-age setting. It can be set
to a low value on rarely used connections, so they get closed quickly, and
are re-opened only when necessary.
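With the patch applied, the setting would live in the usual DATABASES dict. Something like the following (aliases and values here are illustrative, not from the pull request):

```python
# settings.py -- per-connection max-age, as described in the proposal.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'CONN_MAX_AGE': 600,  # keep connections open for up to 10 minutes
    },
    'reporting': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'stats',
        'CONN_MAX_AGE': 0,  # rarely used: close at the end of each request
    },
}
```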

Why would this not belong in Django core -- besides historical reasons?
--
Aymeric.
Jacob Kaplan-Moss
2013-02-18 13:47:05 UTC
Permalink
On Mon, Feb 18, 2013 at 7:41 AM, Aymeric Augustin
Post by Aymeric Augustin
The goal of my proposal is to save the cost (in response time) of establishing
the database connection, in vanilla Django, without compromising reliability.
I'm trying to help people who aren't even aware of the problem :)
https://github.com/django/django/pull/733
Everything is controlled by a per-connection max-age setting. It can be set
to a low value on rarely used connections, so they get closed quickly, and
are re-opened only when necessary.
+1 from me.

The overhead of establishing a connection isn't a big deal when you
control your own hardware and can route traffic to the database over a
gigE backplane, but many of our users don't have that luxury. Routing
on AWS seems to be particularly crappy: most of my AWS-hosted sites
show a 75-100ms overhead just to establish a connection. That can be
as much as 90% of my response time! This fix represents a cheap and
easy performance improvement. Let's get it in.

Jacob
Carl Meyer
2013-02-18 18:02:57 UTC
Permalink
Post by Jacob Kaplan-Moss
+1 from me.
The overhead of establishing a connection isn't a big deal when you
control your own hardware and can route traffic to the database over a
gigE backplane, but many of our users don't have that luxury. Routing
on AWS seems to be particularly crappy: most of my AWS-hosted sites
show a 75-100ms overhead just to establish a connection. That can be
as much as 90% of my response time! This fix represents a cheap and
easy performance improvement. Let's get it in.
+1 from me too. Similar experience on Heroku (which is of course also
AWS), persistent DB connections via django-postgrespool were a quick and
easy win to save a significant chunk of response time. I don't see any
reason Django shouldn't do it by default, since many users won't even
realize how much their response time is being hurt by the lack of it.

Carl
Michael
2013-02-18 18:51:16 UTC
Permalink
I was just alerted to this by the reopening of ticket #18135.

I agree the routing on AWS is slow and adds a large amount of connection
overhead. This is something that AWS needs to address.

I will argue, however, that the AWS use case is a reason we need this to be
an OPTION, not automatic.

Generally, in my experience, the database is the first place that requires
investigation when you start seeing slowdowns as your application grows
with users. Relational databases are expensive to scale. They generally
require DBAs, massive servers and tons of memory, and that is when everything
goes right. So we do not immediately jump to upgrading the database; that
would be just wrong.

Instead what we do is to look to places to cache aggressively. I think that
in the future we are going to see more and more places flattening databases
and moving to NoSQL, generally in views, but occasionally directly on the
pages with javascript. This forces less and less reliance on the relational
database.

So the future of web frameworks will have less and less reliance on a
constrained database. This is a really really good thing because AWS has
given us the ability to only purchase the hardware that we need when we
need it. That means that if one of my sites get a ton of users right now,
my application will scale the number of EC2 instances. I have confidence
that the number of queries directly to my database will not increase
directly with that load because each query is aggressively cached.

My EC2 scaling is directly in proportion to the number of users my
application has. RDS, on the other hand, scales exponentially with the
number of users that I have [1]. This is consistent with any large
application. The database just costs more.

So here is my point: requiring that an application keep a persistent
connection to the database is really good for small applications, but as
the number of front-facing web servers increases, the number of database
connections also increases, consuming the most expensive resource of the web
application, the database. The cost of leaving open ("sleeping", I would
argue "stale") connections doesn't scale linearly with usership; instead it
increases exponentially.

This should be an option. For the small percentage of people who are
noticing SQL connection time issues (which I would recommend they raise
with an AWS support person), a persistent connection makes sense.
The general Django community should not be exhausting their most expensive
resource first.




[1] http://aws.amazon.com/rds/#pricing-tabs
Post by Carl Meyer
Post by Jacob Kaplan-Moss
The overhead of establishing a connection isn't a big deal when you
control your own hardware and can route traffic to the database over a
gigE backplane, but many of our users don't have that luxury. Routing
on AWS seems to be particularly crappy: most of my AWS-hosted sites
show a 75-100ms overhead just to establish a connection.
+1 from me too. Similar experience on Heroku (which is of course also
AWS), persistent DB connections via django-postgrespool were a quick and
easy win to save a significant chunk of response time. I don't see any
reason Django shouldn't do it by default, since many users won't even
realize how much their response time is being hurt by the lack of it.
Carl
Jacob Kaplan-Moss
2013-02-18 19:03:56 UTC
Permalink
Post by Michael
I will argue, however, that the AWS use case is a reason we need this to be
an OPTION, not automatic.
Check out Aymeric's patch; it is indeed an option triggered by
DATABASES[CONN_MAX_AGE].

Jacob
Michael
2013-02-18 19:17:37 UTC
Permalink
Post by Jacob Kaplan-Moss
Post by Michael
I will argue, however, that the AWS use case is a reason we need this to be
an OPTION, not automatic.
Check out Aymeric's patch; it is indeed an option triggered by
DATABASES[CONN_MAX_AGE].
Unless I am reading it wrong, only on the next database query. So if I set
the max age to 0 and don't have a connection to the database after the
validation on start, the connections are still sleeping. I really never
want to hang onto a DB connection.

Am I reading that wrong? I will get an instance fired up sometime this week
to test this out.
Carl Meyer
2013-02-18 19:21:11 UTC
Permalink
Post by Michael
Post by Jacob Kaplan-Moss
Check out Aymeric's patch; it is indeed an option triggered by
DATABASES[CONN_MAX_AGE].
Unless I am reading it wrong, only on the next database query. So if I
set the connection to 0 and don't have a connection to the database
after the validation on start, the connections are still sleeping. I
really never want to hang onto a DB connection.
No, it's not on the next DB query, it's at the request_finished signal
(which is the same point when Django will currently close your
connection for you). So the effect with CONN_MAX_AGE=0 is really not
distinguishable from what you get currently.

Carl
Michael
2013-02-18 19:27:33 UTC
Permalink
Post by Carl Meyer
No, it's not on the next DB query, it's at the request_finished signal
(which is the same point when Django will currently close your
connection for you). So the effect with CONN_MAX_AGE=0 is really not
distinguishable from what you get currently.
Carl
Thanks, Carl, I had missed that.
Then I have no objection. I will still take a deeper dive this week and let
you know how it goes.
Aymeric Augustin
2013-02-18 19:39:48 UTC
Permalink
Post by Michael
I was just alerted to this by the reopening of ticket #18135.
My initial analysis is incomplete. I'll comment on the ticket in a few minutes.
Post by Michael
I will argue, however, that the AWS use case is a reason we need this to be an OPTION, not automatic.
Like Jacob said, persistent connections can be disabled entirely by setting CONN_MAX_AGE = 0.
Post by Michael
So here is my point: requiring that an application keep a persistent connection to the database is really good for small applications, but as the number of front-facing web servers increases, the number of database connections also increases, consuming the most expensive resource of the web application, the database. The cost of leaving open ("sleeping", I would argue "stale") connections doesn't scale linearly with usership; instead it increases exponentially.
If the majority of your views do not hit a given database, I expect that you'll set a low CONN_MAX_AGE for this database.
Post by Michael
For the small percentage of people who are noticing SQL connection time issues (which I would recommend they raise with an AWS support person), a persistent connection makes sense.
Most Django sites suffer from the database connection establishment delay — actually, all except those that use a connection pooler. AWS is the most egregious example, but it isn't the primary motivation for this patch.

On a random server running both Django and PostgreSQL locally, I measured an overhead of 7ms. If PostgreSQL were running on another machine, the overhead would be larger. If one is aiming for 200ms response times, connection establishment can easily eat 5 to 10% of the processing time.
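For anyone wanting to reproduce this kind of measurement on their own setup, a quick sketch (the psycopg2 connection string in the comment is a placeholder):

```python
import time

def average_overhead(connect, samples=20):
    """Average wall-clock cost of calling connect(), in milliseconds."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        conn = connect()
        total += time.perf_counter() - start
        if hasattr(conn, "close"):
            conn.close()  # don't accumulate open connections
    return total / samples * 1000.0

# With psycopg2 installed, one could measure e.g.:
#   average_overhead(lambda: psycopg2.connect("dbname=test host=dbserver"))
```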
--
Aymeric.
Aymeric Augustin
2013-02-18 21:27:38 UTC
Permalink
Post by Jacob Kaplan-Moss
This fix represents a cheap and easy performance improvement. Let's get it in.
After working on a few iterations of the patch, I have identified two problems
worth discussing.


Problem #1: Is it worth re-executing the connection setup at the beginning of
every request?

The connection setup varies widely between backends:
- SQLite: none
- PostgreSQL: https://github.com/django/django/blob/master/django/db/backends/postgresql_psycopg2/base.py#L185-L205
- MySQL: https://github.com/django/django/blob/master/django/db/backends/mysql/base.py#L445-L452
- Oracle: https://github.com/django/django/blob/master/django/db/backends/oracle/base.py#L549-L599

The current version of the patch repeats it every time. In theory, this isn't
necessary. Doing it only once would be simpler.

It could be backwards incompatible, for instance, if a developer changes the
connection's time zone. But we can document that setting CONN_MAX_AGE = 0
restores the previous behavior in such cases.


Problem #2: How can Django handle situations where the connection to the
database is lost?

Currently, with MySQL, Django pings the database server every time it creates
a cursor, and reconnects if that fails. This isn't a good practice and this
behavior should be removed: https://code.djangoproject.com/ticket/15119

Other backends don't have an equivalent behavior. If a connection was opened,
Django assumes it works. Worse, if the connection breaks, Django fails to close
it, and keeps the broken connection instead of opening a new one:
https://code.djangoproject.com/ticket/15802

Thus, persistent connections can't make things worse :) but it'd be nice to
improve Django in this area, consistently across all backends.

I have considered four possibilities:

(1) Do nothing. At worst, the connection will be closed after max_age and then
reopened. The worker will return 500s in the meantime. This is the current
implementation.

(2) "Ping" the connection at the beginning of each request, and if it doesn't
work, re-open it. As explained above, this isn't a good practice. Note
that if Django repeats the connection setup before each request, it can
take this opportunity to check that the connection works and reconnect
otherwise. But I'm not convinced I should keep this behavior.

(3) Catch database exceptions in all appropriate places, and if the exception
says that the connection is broken, reconnect. In theory this is the best
solution, but it's complicated to implement. I haven't found a conclusive
way to identify error conditions that warrant a reconnection.

(4) Close all database connections when a request returns a 500. It's a bad
idea because it ties the HTTP and database layers. It could also hide
problems.
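For illustration, option (2) is easy to sketch with toy objects (these are not Django's backend classes; the names are made up):

```python
class FlakyConnection:
    """Toy connection used only to exercise the reconnect logic."""

    def __init__(self):
        self.alive = True
        self.reconnects = 0

    def cursor(self):
        if not self.alive:
            raise RuntimeError("connection lost")
        class Cursor:
            def execute(self, sql): pass
            def close(self): pass
        return Cursor()

    def close(self):
        self.alive = False

    def connect(self):
        self.alive = True
        self.reconnects += 1

def ensure_usable(connection):
    # Option (2): ping at the start of the request, reconnect on failure.
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT 1")  # psycopg2-style; MySQL/Oracle: connection.ping()
        cursor.close()
    except Exception:
        # Any error here is taken to mean the connection is broken.
        connection.close()
        connection.connect()

conn = FlakyConnection()
conn.alive = False  # simulate a dropped connection
ensure_usable(conn)
assert conn.alive and conn.reconnects == 1
```

The objection to this approach stands: it adds a round-trip to every request, which is exactly what ticket #15119 wants to remove.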


Ideas?
--
Aymeric.
Karen Tracey
2013-02-18 22:30:12 UTC
Permalink
Just a couple of random quick thoughts:

Will persistent connections be able to ensure that "bad connections" (e.g.
those in state "current transaction is aborted, commands ignored until end
of transaction block") don't leak from one request to another?

MySQL, at least, will close a connection from the server side if it is idle
for too long (no idea if that is configurable, I've never had reason to
investigate...have just noticed the behavior). In the pull request it looks
like we only check for the need to close at the end of a request cycle? For
a very low activity site it seems those two behaviors could interact badly,
with the request cycle finishing code deciding to keep the connection
around since it's not too old....long delay until next request while MySQL
server closes what it sees as an idle connection...next request tries to
use a connection that the server has closed?

Karen
Alex Gaynor
2013-02-18 22:33:28 UTC
Permalink
As far as I know, the MySQL timeout time is configurable, and the default
is 8 hours.

Alex
Post by Karen Tracey
Will persistent connections be able to ensure that "bad connections" (e.g.
those in state "current transaction is aborted, commands ignored until end
of transaction block") don't leak from one request to another?
MySQL, at least, will close a connection from the server side if it is
idle for too long (no idea if that is configurable, I've never had reason
to investigate...have just noticed the behavior). In the pull request it
looks like we only check for the need to close at the end of a request
cycle? For a very low activity site it seems those two behaviors could
interact badly, with the request cycle finishing code deciding to keep the
connection around since it's not too old....long delay until next request
while MySQL server closes what it sees as an idle connection...next request
tries to use a connection that the server has closed?
Karen
--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
Karen Tracey
2013-02-18 22:41:38 UTC
Permalink
Post by Alex Gaynor
As far as I know, the MySQL timeout time is configurable, and the default
is 8 hours.
Assuming this is the variable that does it:
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_wait_timeout
then that sounds right. I guess I have just managed to encounter a fair number
of servers where that value was lowered. (I've mainly seen it from mysql shell
sessions, where I get a message about the server having gone away and the
connection being re-established if I've waited "too long" between
commands... pretty sure "too long" has never been anywhere near 8 hours.)

Karen
Aymeric Augustin
2013-02-18 22:51:47 UTC
Permalink
Will persistent connections be able to ensure that "bad connections" (e.g. those in state "current transaction is aborted, commands ignored until end of transaction block") don't leak from one request to another?
I'm calling connection.abort() at the end of each request to avoid this. If there are bugs in connection.abort(), I'll try to fix them.
MySQL, at least, will close a connection from the server side if it is idle for too long (no idea if that is configurable, I've never had reason to investigate...have just noticed the behavior). In the pull request it looks like we only check for the need to close at the end of a request cycle? For a very low activity site it seems those two behaviors could interact badly, with the request cycle finishing code deciding to keep the connection around since it's not too old....long delay until next request while MySQL server closes what it sees as an idle connection...next request tries to use a connection that the server has closed?
Indeed, that could be a problem.

To mitigate it, I could:
- check the connection's max age before each request, and reopen it if it's outdated (in addition to the current check after each request, required to emulate the current behavior);
- implement a "ping" before each request — but I'd really prefer to avoid this overhead;
- disable persistent connections by default :(
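The first mitigation could look roughly like the sketch below. CONN_MAX_AGE is the proposed setting; the wrapper class and close_if_outdated() are illustrative names, not the patch's actual code:

```python
import time

CONN_MAX_AGE = 600  # seconds; the proposed setting

class TimedConnection:
    """Illustrative wrapper that remembers when it was opened."""
    def __init__(self, opened_at=None):
        self.opened_at = time.monotonic() if opened_at is None else opened_at

    @property
    def age(self):
        return time.monotonic() - self.opened_at

def close_if_outdated(conn):
    """Run before each request (and after, as today): drop a connection
    past its max age so the request opens a fresh one lazily."""
    if conn is not None and conn.age > CONN_MAX_AGE:
        return None
    return conn

fresh = TimedConnection()
stale = TimedConnection(opened_at=time.monotonic() - 3600)  # an hour old
assert close_if_outdated(fresh) is fresh
assert close_if_outdated(stale) is None
```

Checking before the request shrinks, but does not close, the window Karen describes: the server can still drop a connection that is younger than max_age.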
--
Aymeric.
Carl Meyer
2013-02-19 00:31:01 UTC
Permalink
Post by Aymeric Augustin
Problem #1: Is it worth re-executing the connection setup at the beginning of
every request?
- SQLite: none
- PostgreSQL: https://github.com/django/django/blob/master/django/db/backends/postgresql_psycopg2/base.py#L185-L205
- MySQL: https://github.com/django/django/blob/master/django/db/backends/mysql/base.py#L445-L452
- Oracle: https://github.com/django/django/blob/master/django/db/backends/oracle/base.py#L549-L599
The current version of the patch repeats it every time. In theory, this isn't
necessary. Doing it only once would be simpler.
It could be backwards incompatible, for instance, if a developer changes the
connection's time zone. But we can document to set CONN_MAX_AGE = 0 to restore
the previous behavior in such cases.
It seems silly to re-run queries per-request that were only ever
intended to be per-connection setup. So I favor the latter. I think this
change will require prominent discussion in the
potentially-backwards-incompatible section of the release notes regardless.

(This could form an argument for keeping the built-in default 0, and
just setting it to a higher number in the project template. But I don't
like polluting the default project template, and I think a majority of
existing upgrading projects will benefit from this without problems, so
I don't actually want to do that.)
Post by Aymeric Augustin
Problem #2: How can Django handle situations where the connection to the
database is lost?
Currently, with MySQL, Django pings the database server every time it creates
a cursor, and reconnects if that fails. This isn't a good practice and this
behavior should be removed: https://code.djangoproject.com/ticket/15119
Other backends don't have equivalent behavior. Once a connection has been opened,
Django assumes it works. Worse, if the connection breaks, Django fails to close
it, and keeps the broken connection instead of opening a new one:
https://code.djangoproject.com/ticket/15802
Thus, persistent connections can't make things worse :) but it'd be nice to
improve Django in this area, consistently across all backends.
(1) Do nothing. At worst, the connection will be closed after max_age and then
reopened. The worker will return 500s in the meantime. This is the current
implementation.
(2) "Ping" the connection at the beginning of each request, and if it doesn't
work, re-open it. As explained above, this isn't a good practice. Note
that if Django repeats the connection setup before each request, it can
take this opportunity to check that the connection works and reconnect
otherwise. But I'm not convinced I should keep this behavior.
(3) Catch database exceptions in all appropriate places, and if the exception
says that the connection is broken, reconnect. In theory this is the best
solution, but it's complicated to implement. I haven't found a conclusive
way to identify error conditions that warrant a reconnection.
(4) Close all database connections when a request returns a 500. It's a bad
idea because it ties the HTTP and database layers. It could also hide
problems.
I'd be inclined to go for (1), with the intent of moving gradually
towards (3) as specific detectable error conditions that happen in real
life and do warrant closing the connection and opening a new one are
brought to our attention. Unfortunately handling those cases is likely
to require parsing error messages, as pretty much anything related to
DBAPI drivers and error conditions does :/

I tried to dig for the origins of the current MySQL behavior to see if
that would illuminate such a case, but that code goes way back into the
mists of ancient history (specifically, merger of the magic-removal
branch), beyond which the gaze of "git annotate" cannot penetrate.

Option (4) is very bad IMO, and (2) is not much better.

Carl
Mario Briggs
2013-02-19 05:46:13 UTC
Permalink
Just FYI, in the DB2 backend for Django, we have exposed a custom
database property PCONNECT and have defaulted it to true for a long
time now. It uses the persistent connections supported by the DB2
Python driver (http://code.google.com/p/ibm-db/wiki/APIs#ibm_db.pconnect),
rather than opening a fresh connection each time. One point to note with
this approach is that it cannot be used with multi-threaded web servers;
we did have a situation with a customer using this with Apache on Windows
(http://groups.google.com/group/ibm_db/browse_thread/thread/eab320359f0d16d8?hl=en_US),
so I am curious how the above approach deals with the multi-threaded
scenario.

regards
Mario Briggs
Anssi Kääriäinen
2013-02-19 10:04:28 UTC
Permalink
Post by Carl Meyer
Post by Aymeric Augustin
Problem #1: Is it worth re-executing the connection setup at the beginning of
every request?
- SQLite: none
- PostgreSQL:https://github.com/django/django/blob/master/django/db/backends/postg...
- MySQL:https://github.com/django/django/blob/master/django/db/backends/mysql...
- Oracle:https://github.com/django/django/blob/master/django/db/backends/oracl...
The current version of the patch repeats it every time. In theory, this isn't
necessary. Doing it only once would be more simple.
It could be backwards incompatible, for instance, if a developer changes the
connection's time zone. But we can document to set CONN_MAX_AGE = 0 to restore
the previous behavior in such cases.
It seems silly to re-run queries per-request that were only ever
intended to be per-connection setup. So I favor the latter. I think this
change will require prominent discussion in the
potentially-backwards-incompatible section of the release notes regardless.
(This could form an argument for keeping the built-in default 0, and
just setting it to a higher number in the project template. But I don't
like polluting the default project template, and I think a majority of
existing upgrading projects will benefit from this without problems, so
I don't actually want to do that.)
Maybe we need another setting for what to do at request start. It does
seem somewhat likely that users could do SET SEARCH_PATH in middleware
to support multitenant setups, for example, and expect that setting to go
away when the connection is closed after the request. Any other SET is a
likely candidate for problems in PostgreSQL, and I am sure other DBs
have their equivalent of session state, too. (In this case, doing RESET
ALL and running the connection setup again is the right thing to do in
PostgreSQL.)

It would be nice to support this use case, but just documenting the
change clearly in the release notes, and pointing out that if you have
such requirements you should set max_age to 0, may be good enough. More
features can always be added later on.
Post by Carl Meyer
Post by Aymeric Augustin
Problem #2: How can Django handle situations where the connection to the
database is lost?
Currently, with MySQL, Django pings the database server every time it creates
a cursor, and reconnects if that fails. This isn't a good practice and this
behavior should be removed: https://code.djangoproject.com/ticket/15119
Other backends don't have equivalent behavior. Once a connection has been opened,
Django assumes it works. Worse, if the connection breaks, Django fails to close
it, and keeps the broken connection instead of opening a new one:
https://code.djangoproject.com/ticket/15802
Thus, persistent connections can't make things worse :) but it'd be nice to
improve Django in this area, consistently across all backends.
(1) Do nothing. At worst, the connection will be closed after max_age and then
    reopened. The worker will return 500s in the meantime. This is the current
    implementation.
(2) "Ping" the connection at the beginning of each request, and if it doesn't
    work, re-open it. As explained above, this isn't a good practice. Note
    that if Django repeats the connection setup before each request, it can
    take this opportunity to check that the connection works and reconnect
    otherwise. But I'm not convinced I should keep this behavior.
(3) Catch database exceptions in all appropriate places, and if the exception
    says that the connection is broken, reconnect. In theory this is the best
    solution, but it's complicated to implement. I haven't found a conclusive
    way to identify error conditions that warrant a reconnection.
(4) Close all database connections when a request returns a 500. It's a bad
    idea because it ties the HTTP and database layers. It could also hide
    problems.
I'd be inclined to go for (1), with the intent of moving gradually
towards (3) as specific detectable error conditions that happen in real
life and do warrant closing the connection and opening a new one are
brought to our attention. Unfortunately handling those cases is likely
to require parsing error messages, as pretty much anything related to
DBAPI drivers and error conditions does :/
I tried to dig for the origins of the current MySQL behavior to see if
that would illuminate such a case, but that code goes way back into the
mists of ancient history (specifically, merger of the magic-removal
branch), beyond which the gaze of "git annotate" cannot penetrate.
Option (4) is very bad IMO, and (2) is not much better.
I hope this discussion is about what to do at request finish/start
time.

I am very strongly opposed to anything where Django suddenly changes
connections underneath you. At request finish/start this is OK (you
should expect new connections then anyway), but otherwise, if you get a
broken connection, it isn't Django's business to swap the connection
underneath you. There is a reasonable expectation that while you are
using a single connections[alias] in a script, for example, the
underlying connection stays the same for the whole time. Otherwise SET
somevar in PostgreSQL could break, for example.

Now, one could argue that SET somevar should not be used with
Django's connections. But this is very, very limiting for some real-world
use cases (multitenancy and SET SEARCH_PATH, for one). There is no
way to actually enforce such a limitation, so it would exist in the
documentation only. In addition, the result is that, very rarely, you get
a weird (potentially data-corrupting) problem because your connection
was swapped at the wrong moment. Nearly impossible to debug
(especially if it is not logged either).

If the connection swapping is still wanted, then there must at least
be a way to tell Django that do NOT swap connections unless told to do
so.

I think a good approach would be to mark the connection potentially
broken on errors in queries, and then in request_finished check for
this potentially-broken flag. If the flag is set, then and only then run
ping() / SELECT 1. So this is a slight modification of no. 3, where
one can mark the connection potentially broken liberally, but the
connection is swapped only when the ping fails, and only in
request_finished. For most requests there should be no overhead, as
errors in queries are rare.
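A minimal sketch of that flag-and-ping scheme; the attribute and method names (errors_occurred, is_usable) are made up for illustration:

```python
class Connection:
    """Toy DB wrapper: flag errors in execute(), ping only at request end."""
    def __init__(self, usable=True):
        self.usable = usable
        self.errors_occurred = False
        self.closed = False

    def execute(self, run_query):
        try:
            return run_query()
        except Exception:
            self.errors_occurred = True  # mark potentially broken, liberally
            raise

    def is_usable(self):
        return self.usable  # stands in for ping() / SELECT 1

def on_request_finished(conn):
    """Ping only if some query errored; close only if the ping fails."""
    if conn.errors_occurred and not conn.is_usable():
        conn.closed = True
    conn.errors_occurred = False

conn = Connection(usable=False)
try:
    conn.execute(lambda: 1 / 0)  # any query error sets the flag
except ZeroDivisionError:
    pass
on_request_finished(conn)
assert conn.closed  # ping failed after an error: connection dropped
```

Requests with no query errors never pay for a ping, and the swap only ever happens at the request boundary.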

BTW, on the remark above in Aymeric's post that persistent connections
can't make things worse: I don't believe this. Persistent connections
will keep the broken connection from request to request, whereas at least
on PostgreSQL a broken connection is correctly closed at request
finish today.

- Anssi
Aymeric Augustin
2013-02-19 10:20:53 UTC
Permalink
Post by Anssi Kääriäinen
Maybe we need another setting for what to do in request.start. It does
seem somewhat likely that users could do SET SEARCH_PATH in middleware
to support multitenant setups for example, and expect that set to go
away when connection is closed after the request. Any other SET is a
likely candidate for problems in PostgreSQL, and I am sure other DBs
have their equivalent of session state, too. (In this case doing RESET
ALL; run connection setup again is the right thing to do in
PostgreSQL).
It would be nice to support this use case, but just documenting this
change clearly in the release notes, and point out that if you have
such requirements, then set max_age to 0. More features can always be
added later on.
Yes, the short term solution in this case is to set max_age to 0.

I think the long term solution is to modify such middleware to revert its
effects in process_response() — or reset everything in process_request().
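Such a middleware might look like the old-style sketch below. The connection object is stubbed with a recorder so the sketch runs standalone; tenant_schema_for() is a hypothetical helper and the SQL is illustrative, not a vetted multitenant recipe:

```python
class RecordingCursor:
    """Stub cursor that records SQL instead of talking to a database."""
    log = []
    def execute(self, sql, params=None):
        RecordingCursor.log.append(sql)
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

class StubConnection:
    def cursor(self):
        return RecordingCursor()

connection = StubConnection()  # stand-in for django.db.connection

def tenant_schema_for(request):  # hypothetical helper
    return "tenant_a"

class SearchPathMiddleware:
    """Set per-tenant state at request start, revert it at request end,
    so a persistent connection carries nothing over to the next request."""

    def process_request(self, request):
        with connection.cursor() as cursor:
            cursor.execute("SET search_path TO %s", [tenant_schema_for(request)])

    def process_response(self, request, response):
        with connection.cursor() as cursor:
            cursor.execute("RESET search_path")
        return response

mw = SearchPathMiddleware()
mw.process_request(request=None)
mw.process_response(request=None, response=None)
assert RecordingCursor.log == ["SET search_path TO %s", "RESET search_path"]
```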

I agree with Carl that I should document this thoroughly.
Post by Anssi Kääriäinen
(…)
I hope this discussion is about what to do at request finish/start
time.
You're right, it's very important not to fiddle with connections during
request processing.
Post by Anssi Kääriäinen
I am very strongly opposed to anything where Django suddenly changes
connections underneath you. At request finish/start this is OK (you
should expect new connections then anyways), but otherwise if you get
broken connection, it isn't Django's business to swap the connection
underneath you. There is a reasonable expectation that while you are
using single connections[alias] in a script for example, you can
expect the underlying connection to be the same for the whole time.
Otherwise SET somevar in postgresql could break for example.
Even worse, you could break transactional integrity! See this comment:
https://code.djangoproject.com/ticket/15119#comment:9
Post by Anssi Kääriäinen
I think a good approach would be to mark the connection potentially
broken on errors in queries, and then in request_finished check for
this potentially broken flag. If flag set, then and only then run
ping() / select 1. So, this is a slight modification of no. 3 where
one can mark the connection potentially broken liberally, but the
connection is swapped only when the ping fails, and only in
request_finished. For most requests there should be no overhead as
errors in queries are rare.
This is an interesting idea. Django already catches database exceptions in
execute() to re-raise them. We could simply set the flag there.
Post by Anssi Kääriäinen
BTW the remark above in Aymeric's post that persistent connections
can't make things worse: I don't believe this. Persistent connections
will keep the broken connection from request to request, and at least
on PostgreSQL a broken connection is correctly closed in request
finish.
https://code.djangoproject.com/ticket/15802 says it isn't — although I haven't
confirmed that this bug still exists.
--
Aymeric.
Anssi Kääriäinen
2013-02-19 10:37:52 UTC
Permalink
On 19 Feb, 12:20, Aymeric Augustin
Post by Aymeric Augustin
Post by Anssi Kääriäinen
Maybe we need another setting for what to do in request.start. It does
seem somewhat likely that users could do SET SEARCH_PATH in middleware
to support multitenant setups for example, and expect that set to go
away when connection is closed after the request. Any other SET is a
likely candidate for problems in PostgreSQL, and I am sure other DBs
have their equivalent of session state, too. (In this case doing RESET
ALL; run connection setup again is the right thing to do in
PostgreSQL).
It would be nice to support this use case, but just documenting this
change clearly in the release notes, and point out that if you have
such requirements, then set max_age to 0. More features can always be
added later on.
Yes, the short term solution in this case is to set max_age to 0.
I think the long term solution is to modify such middleware to revert its
effects in process_response() — or reset everything in process_request().
I agree with Carl that I should document this thoroughly.
Yeah, just doing the reset manually in middleware should be good
enough.
Post by Aymeric Augustin
Post by Anssi Kääriäinen
I think a good approach would be to mark the connection potentially
broken on errors in queries, and then in request_finished check for
this potentially broken flag. If flag set, then and only then run
ping() / select 1. So, this is a slight modification of no. 3 where
one can mark the connection potentially broken liberally, but the
connection is swapped only when the ping fails, and only in
request_finished. For most requests there should be no overhead as
errors in queries are rare.
This is an interesting idea. Django already catches database exceptions in
execute() to re-raise them. We could simply set the flag there.
Post by Anssi Kääriäinen
BTW the remark above in Aymeric's post that persistent connections
can't make things worse: I don't believe this. Persistent connections
will keep the broken connection from request to request, and at least
on PostgreSQL a broken connection is correctly closed in request
finish.
https://code.djangoproject.com/ticket/15802 says it isn't — although I haven't
confirmed that this bug still exists.
I just tried this, and broken connections are closed correctly, at least
in master. The first paragraph of comment https://code.djangoproject.com/ticket/15802#comment:20
says this is already fixed in 1.4. The open part of the ticket seems
to be about 1.3 (which is not going to be fixed anyway) or automatic
connection swapping (which is dangerous). So maybe it is time to
re-close the ticket?

- Anssi
Carl Meyer
2013-02-19 18:31:35 UTC
Permalink
Post by Anssi Kääriäinen
I hope this discussion is about what to do at request finish/start
time.
I am very strongly opposed to anything where Django suddenly changes
connections underneath you. At request finish/start this is OK (you
should expect new connections then anyways), but otherwise if you get
broken connection, it isn't Django's business to swap the connection
underneath you.
I agree completely. I assumed we were talking about closing the
connection at request-finished time, which would cause a new one to be
opened on the next request.
Post by Anssi Kääriäinen
I think a good approach would be to mark the connection potentially
broken on errors in queries, and then in request_finished check for
this potentially broken flag. If flag set, then and only then run
ping() / select 1. So, this is a slight modification of no. 3 where
one can mark the connection potentially broken liberally, but the
connection is swapped only when the ping fails, and only in
request_finished. For most requests there should be no overhead as
errors in queries are rare.
I like this idea. The ping is a minimal cost if you only pay it when
there is a query error during the request, and it's more reliable than
trying to parse error messages to figure out which ones require closing
the connection.

I think we should exclude some classes of errors that we already
distinguish and know they don't mean the connection is broken (e.g.
IntegrityError). If some codebase makes heavy use of algorithms like
get_or_create that rely on a correctly-handled IntegrityError, they
shouldn't need to pay the cost of the ping more frequently.
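That exclusion could sit right where the flag is set. The exception classes and the errors_occurred attribute below are illustrative names, not Django's actual code:

```python
class DatabaseError(Exception):
    pass

class IntegrityError(DatabaseError):
    """Constraint violation: the connection itself is still healthy."""
    pass

def execute(conn, run_query):
    """Flag the connection as potentially broken on database errors,
    except for classes known not to indicate a broken connection."""
    try:
        return run_query()
    except IntegrityError:
        raise  # e.g. get_or_create's expected race: no flag, no later ping
    except DatabaseError:
        conn.errors_occurred = True  # illustrative flag checked at request end
        raise

class Conn:
    errors_occurred = False

c = Conn()
def duplicate_key():
    raise IntegrityError("duplicate key")
try:
    execute(c, duplicate_key)
except IntegrityError:
    pass
assert not c.errors_occurred  # IntegrityError does not trigger the ping
```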

Carl
Eric Florenzano
2013-02-20 05:52:09 UTC
Permalink
One question: does this proposal include some way of specifying a maximum
number of outstanding connections to the database from a single process?
I looked but didn't see it in the pull request. In my experience, most
PostgreSQL instances don't have a whole lot of breathing room in terms of
the number of total outstanding connections they can handle, due to low
shared-memory defaults etc. This is something that django-postgrespool does
by way of SQLAlchemy's pool_size and max_overflow settings.
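The cap Eric describes is a bounded pool: at most pool_size connections ever exist, and a request blocks (or times out) when the pool is exhausted. A minimal pure-Python sketch of the idea (not django-postgrespool's or SQLAlchemy's actual code):

```python
import queue

class BoundedPool:
    """At most pool_size connections ever exist; get() blocks at the cap."""

    def __init__(self, create, pool_size):
        self._create = create
        self._slots = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._slots.put(None)  # empty slot: connection created lazily

    def get(self, timeout=None):
        slot = self._slots.get(timeout=timeout)  # blocks when exhausted
        return slot if slot is not None else self._create()

    def put(self, conn):
        self._slots.put(conn)  # return the connection for reuse

made = 0
def create():
    global made
    made += 1
    return object()

pool = BoundedPool(create, pool_size=2)
a, b = pool.get(), pool.get()
pool.put(a)
c = pool.get()  # reuses a; no third connection is created
assert made == 2 and c is a
```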
Alex Gaynor
2013-02-20 06:11:35 UTC
Permalink
I don't really see what point such an option would serve. This is *not a
connection pool*. Currently Django uses one open connection per thread;
this simply keeps them open between requests. That means before, you needed
O(concurrent threads) connections; now you need O(total active threads)
connections. If your database can't handle that many connections, why do
you have that many threads running? Peak load would already be making it
fall over.

Alex
Post by Eric Florenzano
One question: does this proposal include some way of specifying a maxiumum
number of outstanding connections to the database from a single process?
Looked but didn't see it in the pull request. In my experience, most
PostgreSQL instances don't have a whole lot of breathing room in terms of
the number of total outstanding connections they can handle due to low
shared memory defaults etc. This is something that django-postgrespool does
by way of SQLAlchemy's pool_size and max_overflow settings.
--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
Shai Berger
2013-02-21 23:42:19 UTC
Permalink

Hi,
Post by Aymeric Augustin
**tl;dr** I believe that persistent database connections are a good idea.
Please prove me wrong :)
[...]
So -- did I miss something?
I think you haven't -- but several points came to mind, which weren't
discussed in this thread. As far as I can tell, none of them are problems, but
*I* may be missing something, so I thought I should raise them here.

First -- the very mention of this topic reminded me of
https://code.djangoproject.com/ticket/9964, which was my pet bug for about two
years. The point of that bug, though, was to make sure that transactions are
properly closed before (and regardless of whether) the connection is closed, so
while related, it should be unaffected.

Second -- Tasks out of requests. Core developers participating in this
discussion already have this in mind, Aymeric has commented about it in
https://code.djangoproject.com/ticket/17887, but it was left out of the
discussion on the thread. The suggested changes, AFAICT, modify the behavior
around the end of requests only -- for tasks, nobody closed the connection
before, and nobody is going to do anything different now; so that's not
"something missed" either.

Third -- different use patterns of multi-db. AFAICT the change treats all
connections as equals, so no new problems should arise, but I can't shake a
gnawing suspicion that some interaction may pop up. In particular, I'm worried
because not all such patterns are automatically tested; I know my pet pattern
(two aliases for one db, with different transaction behavior -- for logging to
the db, which shouldn't be reverted when the main transaction is rolled back)
has had problems go unnoticed by others (see e.g.
https://code.djangoproject.com/ticket/14415).
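
For readers unfamiliar with that pattern, two aliases pointing at one physical database look roughly like this. This is a hedged sketch: the engine string matches the Django 1.x PostgreSQL backend, but the database and alias names are hypothetical.

```python
# Hypothetical settings sketch: two aliases for one physical database.
# Writes routed through the "logging" alias use their own connection and
# transaction, so log rows survive a rollback on "default".
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myapp",
    },
    "logging": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myapp",  # same database, separate connection
    },
}
```

With something like `LogEntry.objects.using("logging").create(...)`, the log write commits on its own connection, independently of the main request transaction.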

And last (and probably least) -- coordinated distributed transactions and
two-phase commit are not really supported by Django; if you want them, you need
to write backend-specific code (probably a custom backend -- I haven't had the
[mis]fortune of needing to do this yet). I suspect such code would be affected,
though I'm not sure whether it becomes easier or harder to write. I know the
issue is mostly theoretical, and even if there is a problem with it, it's not a
blocker; still, I thought it was worth a mention.

Thanks,
Shai.
Aymeric Augustin
2013-02-22 09:21:14 UTC
Permalink
Hi Shai,

Thanks for compiling this list! My answers below.
Post by Shai Berger
First -- the very mention of this topic reminded me of
https://code.djangoproject.com/ticket/9964, which was my pet bug for about two
years. The point of that bug, though, was to make sure that transactions are
properly closed before (and regardless if) the connection is closed, so while
related, it should be unaffected.
If anything, persistent connections will *require* proper transaction termination
at the end of each request cycle.
Post by Shai Berger
Second -- Tasks out of requests. Core developers participating in this
discussion already have this in mind, Aymeric has commented about it in
https://code.djangoproject.com/ticket/17887, but it was left out of the
discussion on the thread. The suggested changes, AFAICT, modify the behavior
around the end of requests only -- for tasks, nobody closed the connection
before, and nobody is going to do anything different now; so that's not
"something missed" either.
Yes, this patch must not affect behavior outside of the request handler.
Post by Shai Berger
Third -- different use patterns of multi-db. AFAICT the change treats all
connections as equals, so no new problems should arise, but I can't remove a
gnawing suspicion that some interaction may pop up. In particular, I'm worried
because not all such patterns are automatically tested; I know my pet pattern
(two aliases for one db, for different transaction behavior -- for logging into
db, which shouldn't be reverted when the main transaction is rolled back) has
had problems unnoticed by others (see e.g.
https://code.djangoproject.com/ticket/14415).
The problem described in that ticket occurred because of the invalid
assumption that each connection was to a different database.

As far as I can tell I'm not making this assumption anywhere. Transaction state
is managed per-connection, and the representation of a connection in Django
is very straightforward.
Post by Shai Berger
And last (and probably least) -- coordinated distributed transactions and two-
phase-commit are not really supported by Django, and if you want them you need
to write backend-specific code (probably a custom backend -- I haven't had the
[mis]fortune to need to do this yet). I suspect such code would be affected,
I'm not sure if it becomes easier or harder to write. I know the issue is
mostly theoretical, and even if there is a problem with it, that's not a
blocker; still, I thought it was worth a mention.
I've never thought about this and I'm not sure what the interactions are. At worst,
you can restore the previous behavior by setting the connection's max age to zero.
--
Aymeric.
Anssi Kääriäinen
2013-02-22 11:09:26 UTC
Permalink
On 22 Feb, 11:21, Aymeric Augustin
Post by Aymeric Augustin
Post by Shai Berger
Second -- Tasks out of requests. Core developers participating in this
discussion already have this in mind, Aymeric has commented about it in
https://code.djangoproject.com/ticket/17887, but it was left out of the
discussion on the thread. The suggested changes, AFAICT, modify the behavior
around the end of requests only -- for tasks, nobody closed the connection
before, and nobody is going to do anything different now; so that's not
"something missed" either.
Yes, this patch must not affect behavior outside of the request handler.
The way connections are handled in requests could be useful for task
queues, too. The tasks should be separated from each other, so the
desired properties are:
- Make sure you have a fresh connection state (transaction state, for
example) per task.
- Make sure that if a connection is closed inside some task, the
connection will be reopened for the next task.
- For performance reasons, avoid closing connections unconditionally
after each task.

This is exactly what is wanted in request processing, too.

For example, django-celery already has its own version of
persistent connections, see: https://github.com/celery/django-celery/blob/master/djcelery/loaders.py#L84

So, persistent connections will be usable outside request processing
as long as they are opt-in. Supporting persistent connections outside
request processing isn't needed for the initial implementation, of
course.
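
The three desired properties above can be sketched in a few lines. This is a hedged illustration built on a stand-in `Connection` class, not Django's actual backend API; the names `run_task`, `ensure_usable`, and `max_age` are invented for the example.

```python
import time

class Connection:
    """Stand-in for a DB connection wrapper (hypothetical, for illustration)."""
    def __init__(self, max_age=600):
        self.max_age = max_age        # seconds; 0 means "close after each task"
        self.opened_at = None
        self.closed = True
        self.in_transaction = False

    def ensure_usable(self):
        """Reopen if closed, or recycle if older than max_age."""
        now = time.time()
        if self.closed or (self.max_age and now - self.opened_at > self.max_age):
            self.opened_at = now      # (re)open the connection here
            self.closed = False

    def reset(self):
        """Restore a pristine transaction state (i.e. roll back leftovers)."""
        self.in_transaction = False

def run_task(conn, task):
    # Property 1: fresh, usable connection state per task.
    conn.ensure_usable()
    conn.reset()
    try:
        return task(conn)
    finally:
        if conn.max_age == 0:
            conn.closed = True        # old behavior: close after every task
        else:
            conn.reset()              # keep it open, but leave nothing pending
```

Property 2 falls out of `ensure_usable()`: a connection closed inside one task is transparently reopened before the next.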

- Anssi
David Cramer
2013-02-28 19:09:40 UTC
Permalink
Can we please change this so it defaults to off, and just document how to
turn it on and in what situations you should turn it on?

In my opinion this default-on feature caters to a very specific audience,
and will cause a lot of unexpected behavior with other users.

Here is the tl;dr of an argument for turning it on:

Connections are expensive, so we should persist them.

Here is the tl;dr for turning it off:

Connections are expensive, so we don't have a lot of them.

Immediately, for anyone who has configured more workers than they have
Postgres connections (which I can only imagine is common among people who
haven't set up infrastructure like pgbouncer), things will start blowing up.

Why should this be the default behavior?

Unlike MySQL, which already has cheap connections and doesn't suffer from this
problem, connections in Postgres carry quite a large cost, both on creation
and while they remain open. **Everyone** who cares about their performance on
Postgres should already be using pgbouncer (or some alternative) to offset
**both** of these costs, not just the single problem that Django is
addressing here.

I will happily defend my opinion in person at PyCon in two weeks if it
still remains a default, and anyone is willing to listen; if you reach
out to the other large (and I don't mean scale) Django users, I imagine you
will find a lot of mixed feelings about this default.
Post by Aymeric Augustin
**tl;dr** I believe that persistent database connections are a good idea.
Please prove me wrong :)
[...]
--------------------
1) restore a pristine transaction state: transaction.rollback() should do;
2) reset all connection settings: the foundation was laid in #19274;
3) check that the connection is still usable:
- for psycopg2: "SELECT 1";
- for MySQL and Oracle: connection.ping().
Some have argued that persistent connections tie the lifetime of database
connections to the lifetime of workers, but it's easy to store the creation
timestamp and re-open the connection if it exceeds a given max-age.
So -- did I miss something?
--
Aymeric.
Aymeric Augustin
2013-02-28 20:11:02 UTC
Permalink
Post by David Cramer
Can we please change this so it defaults to off, and just document how to turn it on and in what situations you should turn it on?
In my opinion this default-on feature caters to a very specific audience, and will cause a lot of unexpected behavior with other users.
For the record, the reasoning behind the current default is as follows.

By default, Django uses the database session backend. As a consequence, each worker opens a database connection early in the request cycle (in process_request), and keeps it open until the end of the request. I don't have any figures, but I believe that such sites cannot be dismissed as "a very specific audience".

When a website grows larger (in terms of traffic), its developers optimize it: put sessions into a cache, disable transaction middleware, turn autocommit on, handle transactions manually, use a database connection pooler, etc.

The further along this curve you are, the less persistent connections are useful; they may even become harmful. However, the further along this curve you are, the more you know about database performance too, and the better your ability to determine if persistent connections are useful for your application.

The default is targeted at people who wouldn't know what a good value is and who will get the most benefit from persistent connections.

You obviously have a strong opinion on what a good value is for your applications (namely, 0) and you won't suffer from an unsuitable default.
Post by David Cramer
Immediately for anyone who has configured more workers than they have Postgres connections (which I can only imagine is common among people who havent setup infrastructure like pgbouncer) things will start blowing up.
This argument was addressed by Alex earlier in the thread and I agree with his answer. I also find it more helpful to blow up at application start rather than when the traffic exceeds some threshold.
Post by David Cramer
Why should this be the default behavior?
Unlike MySQL, which already has cheap connections and doesn't suffer this problem, connections in Postgres add quite a large cost, both on creation, on allowing them. **Everyone** who cares about their performance on Postgres should already be using pgbouncer (or some alternative) to offset **both** of these, not just a single problem which is what Django is addressing here.
"**Everyone** who cares about their performance on Postgres" can take 5 seconds to set CONN_MAX_AGE to 0.
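
For concreteness, CONN_MAX_AGE is set per database alias. A sketch of the setting being debated (the database name is hypothetical; 600 is an illustrative value, not a documented default):

```python
# settings.py sketch: the CONN_MAX_AGE knob under discussion.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myapp",       # hypothetical database name
        "CONN_MAX_AGE": 600,   # keep connections open for up to 10 minutes
    },
}

# David's preferred default is equivalent to:
# "CONN_MAX_AGE": 0          # close the connection at the end of each request
```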

Persistent connections will improve the situation a little bit for everyone else — people running on a hosted platform, people who don't have the means to configure, run and monitor a pooler efficiently, people who don't even know what a pooler is, etc.
Post by David Cramer
I will happily defend my opinion in person at PyCon in two weeks if it still remains a default, and anyone is willing to listen, and if you reach out to the other large (and I dont mean scale) Django users I imagine you will find a lot of mixed feelings about this default.
I won't be at PyCon, but several Django committers will. If a core dev wants to change the default to 0, feel free. (Being the author of this feature, I'm obviously biased towards turning it on.)

I don't mind being proved wrong, and I chose to commit this early in the 1.6 release cycle precisely to give plenty of time for such feedback!
--
Aymeric.
Christophe Pettus
2013-02-28 21:05:14 UTC
Permalink
One comment on the patch (which I generally approve of entirely):

It would be helpful to have a backend method that performs the "restore connection between uses" function, rather than just using connection.abort() (of course, the default implementation can use that). For example, on PostgreSQL, ABORT; DISCARD ALL is the recommended way of resetting a connection, so being able to implement that would be great.
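
A hedged sketch of what such a backend hook might look like. The class and method names are invented for illustration, and small fakes stand in for a real psycopg2 connection so the example is self-contained:

```python
class FakeCursor:
    """Minimal stand-in cursor that records executed SQL."""
    def __init__(self, log):
        self.log = log
    def execute(self, sql):
        self.log.append(sql)

class FakeConnection:
    """Minimal stand-in for a real DB-API connection."""
    def __init__(self):
        self.executed = []
    def cursor(self):
        return FakeCursor(self.executed)

class PostgresWrapper:
    """Hypothetical backend hook: reset a PostgreSQL connection between
    requests with ABORT; DISCARD ALL instead of a plain rollback."""
    def __init__(self, connection):
        self.connection = connection

    def reset_for_reuse(self):
        cur = self.connection.cursor()
        cur.execute("ABORT")        # end any open transaction first
        cur.execute("DISCARD ALL")  # drop temp tables, prepared statements,
                                    # session settings, advisory locks

conn = FakeConnection()
PostgresWrapper(conn).reset_for_reuse()
# conn.executed is now ["ABORT", "DISCARD ALL"]
```

A default implementation could keep issuing a rollback, with backends overriding `reset_for_reuse` where the database offers a stronger reset.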

--
-- Christophe Pettus
xof-tNz4p9GgBDdWk0Htik3J/***@public.gmane.org
Aymeric Augustin
2013-02-28 21:29:16 UTC
Permalink
Post by Christophe Pettus
It would be helpful to have a backend method that performs the "restore connection between uses" function, rather than just using connection.abort() (of course, the default implementation can use that). For example, on PostgreSQL, ABORT; DISCARD ALL is the recommended way of resetting a connection, so being able to implement that would be great.
Good point, filed here: https://code.djangoproject.com/ticket/19948
--
Aymeric.
Christophe Pettus
2013-02-28 21:10:50 UTC
Permalink
Post by David Cramer
Immediately for anyone who has configured more workers than they have Postgres connections (which I can only imagine is common among people who haven't set up infrastructure like pgbouncer) things will start blowing up.
If they have this configuration, it's an error. The fact that the error is now surfacing doesn't make it a correct configuration.

--
-- Christophe Pettus
xof-tNz4p9GgBDdWk0Htik3J/***@public.gmane.org
Michael
2013-02-28 21:40:06 UTC
Permalink
Post by Christophe Pettus
Post by David Cramer
Immediately for anyone who has configured more workers than they have
Postgres connections (which I can only imagine is common among people who
haven't set up infrastructure like pgbouncer) things will start blowing up.
If they have this configuration, it's an error. The fact that the error
is now surfacing doesn't make it a correct configuration.
Why is this an error? It sounds more like a barrier to entry to me. Most
sites won't ever hit the maximum number of database connections, but they
over-provision in their app (or they add more apps to the database as they
see their database load isn't so high). It seems to me that if we enable this
by default, we are requiring hardware that is way more demanding than the
actual use case for the app, just for a small speed increase.
David Cramer
2013-02-28 21:43:24 UTC
Permalink
It is most definitely not an "error" to have fewer connections available than workers, considering workers may serve different types of requests, and will now persist the database connection even after that request has finished.

It may not be "the most correct", but a concurrent connection is very different from a single operation.
Post by Christophe Pettus
Immediately for anyone who has configured more workers than they have Postgres connections (which I can only imagine is common among people who haven't set up infrastructure like pgbouncer) things will start blowing up.
If they have this configuration, it's an error. The fact that the error is now surfacing doesn't make it a correct configuration.
Why is this an error? It sounds more like a barrier to entry to me. Most sites won't ever hit the maximum number of database connections, but they over-provision in their app (or they add more apps to the database as they see their database load isn't so high). It seems to me that if we enable this by default, we are requiring hardware that is way more demanding than the actual use case for the app, just for a small speed increase.
Christophe Pettus
2013-02-28 21:46:37 UTC
Permalink
Post by David Cramer
It is most definitely not an "error" to have fewer connections available than workers, considering workers may serve different types of requests, and will now persist the database connection even after that request has finished.
If you have more workers than database connections, you have either (a) over-configured the number of workers, which is generally a bad thing to do, or (b) you are accepting that you will at high-load points get refused connections. I don't see either one as being correct.

--
-- Christophe Pettus
xof-tNz4p9GgBDdWk0Htik3J/***@public.gmane.org