Discussion:
reverse caching of foreign keys #3369
wbyoung-MYLTGd/4c51g9hUCZPvPmw@public.gmane.org
2007-01-29 02:42:38 UTC
i just wanted to spark some discussion of #3369. i implemented it and
would like to see it get included.

here's an example of what reverse caching of foreign keys would mean:

b = Blog.objects.get(id=1)
for entry in b.entry_set.all():
    entry.blog # No database access required.

currently each item in the list would require a database access (which
isn't needed).
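
A minimal sketch of the idea in plain Python, not Django's actual ORM code. The `Blog` and `Entry` classes, the `entries()` method, the `reverse_cache` flag, and `QUERY_LOG` are all hypothetical stand-ins used only to count simulated database hits:

```python
# Hypothetical stand-ins for Django models; not Django's real implementation.
QUERY_LOG = []  # records every simulated database hit

BLOG_ROWS = {1: {"id": 1, "title": "News"}}
ENTRY_ROWS = [
    {"id": 10, "blog_id": 1, "headline": "First"},
    {"id": 11, "blog_id": 1, "headline": "Second"},
]


class Blog:
    def __init__(self, id, title):
        self.id = id
        self.title = title

    @classmethod
    def get(cls, id):
        QUERY_LOG.append(("blog", id))
        return cls(**BLOG_ROWS[id])

    def entries(self, reverse_cache=True):
        """Fetch this blog's entries with one simulated query."""
        QUERY_LOG.append(("entries", self.id))
        result = []
        for row in ENTRY_ROWS:
            if row["blog_id"] == self.id:
                entry = Entry(**row)
                if reverse_cache:
                    # The parent is already in hand, so store it on the child.
                    entry._blog_cache = self
                result.append(entry)
        return result


class Entry:
    def __init__(self, id, blog_id, headline):
        self.id = id
        self.blog_id = blog_id
        self.headline = headline

    @property
    def blog(self):
        # Fall back to a lookup only when nothing has been cached yet.
        if not hasattr(self, "_blog_cache"):
            self._blog_cache = Blog.get(self.blog_id)
        return self._blog_cache


b = Blog.get(1)
for entry in b.entries():
    assert entry.blog is b  # served from the cache
queries_with_cache = len(QUERY_LOG)

del QUERY_LOG[:]
b = Blog.get(1)
for entry in b.entries(reverse_cache=False):
    entry.blog  # each access triggers its own lookup
queries_without_cache = len(QUERY_LOG)
```

With the reverse cache, the simulated query count stays fixed no matter how many entries are returned; without it, the count grows by one per entry.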

the one controversial change that i made is this. when you assign a
value for a foreign key association, django sets the value, and clears
its cache. so this:

entry = Entry.objects.get(id=1)
blog = Blog.objects.get(id=1)
entry.blog = blog
entry.blog is blog # false
entry.blog == blog # maybe true, maybe false

is how things work currently. to some extent this makes sense. the
entry hasn't been saved, so the value in the database is still
potentially some other value. to me it makes more sense to update the
cache rather than clearing the cache. my changes do just that, so the
way it works is:

entry = Entry.objects.get(id=1)
blog = Blog.objects.get(id=1)
entry.blog = blog
entry.blog is blog # true
entry.blog == blog # true

doing this made implementing the reverse caching easier. all tests
still pass, and i doubt it is something that anyone using django would
notice. updating the cache is the way that i (and i imagine others)
would expect things to work.
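
One way to picture "update the cache rather than clear it" is a descriptor whose setter stores both the id and the object. This is only a sketch in plain Python; `CachedForeignKey`, `load_blog`, and the attribute names are hypothetical and are not taken from the patch on #3369:

```python
class CachedForeignKey:
    """Hypothetical descriptor: assignment stores the id and the object itself."""

    def __init__(self, name, loader):
        self.id_attr = name + "_id"
        self.cache_attr = "_" + name + "_cache"
        self.loader = loader  # callable that fetches the related object by id

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if not hasattr(instance, self.cache_attr):
            related_id = getattr(instance, self.id_attr)
            setattr(instance, self.cache_attr, self.loader(related_id))
        return getattr(instance, self.cache_attr)

    def __set__(self, instance, value):
        setattr(instance, self.id_attr, value.id)
        # Update the cache instead of clearing it.
        setattr(instance, self.cache_attr, value)


class Blog:
    def __init__(self, id):
        self.id = id


def load_blog(blog_id):
    # Stands in for a database query; returns a fresh object every time.
    return Blog(blog_id)


class Entry:
    blog = CachedForeignKey("blog", load_blog)

    def __init__(self, blog_id):
        self.blog_id = blog_id


entry = Entry(blog_id=1)
blog = load_blog(1)
entry.blog = blog
same_object = entry.blog is blog
```

Because the setter keeps the assigned object, the later read returns that same object and needs no lookup.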

so i hope that people like the reverse caching idea (the main point of
this post) and don't mind the small implication that it has on other
parts of django.


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-developers-/***@public.gmane.org
To unsubscribe from this group, send email to django-developers-unsubscribe-/***@public.gmane.org
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
Jacob Kaplan-Moss
2007-01-29 16:51:22 UTC
Post by wbyoung-MYLTGd/***@public.gmane.org
i just wanted to spark some discussion of #3369. i implemented it and
would like to see it get included.
b = Blog.objects.get(id=1)
for entry in b.entry_set.all():
    entry.blog # No database access required.
How does this differ from select_related()?

Jacob

wbyoung-MYLTGd/4c51g9hUCZPvPmw@public.gmane.org
2007-01-29 22:25:03 UTC
It's a lot more efficient, for one.

select_related() generates a more complex query against the database. It also
goes as "far as possible", so if the blog has a lot more relationships
to it or if the entries have a lot more relationships to them, then
select_related would be pretty inefficient especially since it already
knows the exact object for the blog. If you already have the
information, then why query the database for it? Why not just cache
it right away?

Other reasons it's different...

b = Blog.objects.get(id=1)
for entry in b.entry_set.all():
    entry.blog is b # True

Ok, so that might not be a big deal in most cases, but imagine the
blog object does some calculations that each blog entry might want to
know about. (This is the case in my application, but with different
models). If each blog is a fresh object from the database, then this
information would have to be recalculated for each object accessing
the original blog. But if the blog caches it and it's the same blog
every time, then everything's a lot faster. Here's a code example that
might help explain what I mean:

b = Blog.objects.get(id=1)
for entry in b.entry_set.all():
    calculated = entry.blog.do_some_heavy_calculation()
    entry.handle(calculated)

The reason for not just using the original b object? Well if you're
passing your list of entries to a view (and don't want to pass the
blog object as well because - well, why should you have to?), then
this situation is very real.
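
To make the cost difference concrete, here is a small sketch in plain Python. It assumes the blog memoizes its result on the instance; the `calls` counter and the `_calculated` attribute are hypothetical, and the arithmetic is just a placeholder for real work:

```python
class Blog:
    """Hypothetical model that memoizes an expensive result on the instance."""

    calls = 0  # counts how often the expensive work actually runs

    def do_some_heavy_calculation(self):
        if not hasattr(self, "_calculated"):
            Blog.calls += 1
            self._calculated = sum(i * i for i in range(1000))
        return self._calculated


class Entry:
    def __init__(self, blog):
        self.blog = blog


# Shared instance: every entry points at the same blog object.
shared = Blog()
entries = [Entry(shared) for _ in range(5)]
for entry in entries:
    entry.blog.do_some_heavy_calculation()
calls_shared = Blog.calls

# Fresh instance per entry, as when each access re-fetches the blog.
Blog.calls = 0
entries = [Entry(Blog()) for _ in range(5)]
for entry in entries:
    entry.blog.do_some_heavy_calculation()
calls_fresh = Blog.calls
```

With a shared instance the expensive work runs once; with a fresh blog per entry it runs once per entry.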

Another reason... select_related() isn't needed here and shouldn't be
required to be used here. It's possible to get the same result via
simple caching. select_related() is something entirely different.

And the last reason this is different from select related:

b = Blog.objects.get(id=1)
for entry in b.entry_set.all():
    entry.blog is b # There is no reason that this should not be True

It makes sense for that to be True. For people unaware of what's
going on, that should be True. For people assuming it's True, it
should be. If the information's there for it to be True, and it's
possible for it to be True, then shouldn't it be?

That's why it's different.

All tests still pass with the changes included.
I wrote test cases.
I added documentation.
It's ready to be included.... just needs the okay.
wbyoung-MYLTGd/4c51g9hUCZPvPmw@public.gmane.org
2007-02-08 17:10:18 UTC
Does anyone else have any thoughts on this?
Gary Wilson
2007-02-09 05:25:58 UTC
Post by wbyoung-MYLTGd/***@public.gmane.org
Does anyone else have any thoughts on this?
Oh, and I forgot to mention that there is a similar ticket wanting the
same thing for select_related:
Ticket #17 - Metasystem optimization: Share select_related in memory
http://code.djangoproject.com/ticket/17


Gary Wilson
2007-02-09 04:35:37 UTC
Post by wbyoung-MYLTGd/***@public.gmane.org
b = Blog.objects.get(id=1)
entry.blog is b # True
So what if the blog with id=1 changes in between accesses? What if I
had

b = Blog.objects.get(id=1)
# [I do some other things here, and meanwhile the title of the blog with
# id=1 was changed]
for entry in b.entry_set.all():
    entry.blog is b # Is this still true?
    entry.blog.title is b.title # How about this?

The caching should probably not always be automatic, but rather
explicit.

I could maybe agree that objects from the same query should be cached, for
example:
entry1, entry2 = b.entry_set.all()[:2]
entry1.blog is entry2.blog

because it came from a single query. But then again, even with a
single query you might do things when looping through the returned
objects where you want the latest version of the object in each
iteration. You might also want to do things differently depending on
whether or not the code is in a transaction block or depending on the
isolation level. Someone with more database experience than I should
chime in here.

Gary


Brian Harring
2007-02-09 05:40:08 UTC
Post by Gary Wilson
Post by wbyoung-MYLTGd/***@public.gmane.org
b = Blog.objects.get(id=1)
entry.blog is b # True
So what if the blog with id=1 changes in between accesses?
As much as I hate the possibility of more django.dispatcher.send
invocations, it seems that as long as the change occurs within the dbapi,
a signal could be sent to existing instances telling them to update themselves.

Would require tracking all instances of a specific model/id though,
which while not overly costly memory wise, is going to be extra
overhead in a common codepath.
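
A rough sketch of that tracking, in plain Python and without Django's dispatcher: a direct loop in `save()` stands in for the signal. The `_live` registry, keyed by a hypothetical (model name, pk) pair, holds weak references so tracked instances can still be garbage collected:

```python
import weakref
from collections import defaultdict

# (model name, pk) -> weak set of live instances; a hypothetical registry.
_live = defaultdict(weakref.WeakSet)


class Blog:
    def __init__(self, pk, title):
        self.pk = pk
        self.title = title
        _live[("Blog", pk)].add(self)

    def save(self):
        # Stand-in for the signal: notify every other live copy of this row.
        for other in _live[("Blog", self.pk)]:
            if other is not self:
                other.title = self.title


a = Blog(1, "Old")
b = Blog(1, "Old")
unrelated = Blog(2, "Other")

a.title = "New"
a.save()
```

The registration in `__init__` is the extra per-instance overhead mentioned above; it is paid on every object created, whether or not anything is ever updated.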
Post by Gary Wilson
b = Blog.objects.get(id=1)
[ I do some other things here, and meanwhile the title of blog with
id=1 was changed]
entry.blog is b # Is this still true?
entry.blog.title is b.title # How about this?
The caching should probably not always be automatic, but rather
explicit.
One alternative view is that the caching doesn't go far enough;
identified by primary id, could have it such that a returned record is
unique in memory. Still have to get the id via a query, but can
substitute in existing instances instead.

Basically, could go full blown and do what ticket #17 is requesting.
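
The unique-in-memory idea is commonly called an identity map. A minimal sketch in plain Python follows; `IdentityMap` and `get_or_create` are hypothetical names, and nothing here reflects what ticket #17 or Django actually implements:

```python
import weakref


class IdentityMap:
    """Hypothetical per-model registry keyed by primary key."""

    def __init__(self):
        # Weak references let unused instances be garbage collected.
        self._instances = weakref.WeakValueDictionary()

    def get_or_create(self, pk, factory):
        instance = self._instances.get(pk)
        if instance is None:
            instance = factory(pk)
            self._instances[pk] = instance
        return instance


class Blog:
    def __init__(self, pk):
        self.pk = pk


blogs = IdentityMap()
first = blogs.get_or_create(1, Blog)
second = blogs.get_or_create(1, Blog)
other = blogs.get_or_create(2, Blog)
```

The primary key still has to come from a query result, but the existing instance is substituted whenever one is already alive.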

Ticket #17 doesn't really detail the potential gotchas though; namely,
if threading of django is supported (literally N threads executing
multiple queries) you can get redundant work occurring on the same
instance in each thread.

Haven't seen any threading of django (just fork and friends), but I'm
also fairly new to django. So... what usage of django out there
natively does multiple python threads of a django instance without
forking?

The other issue here is that even in a single thread execution, if a
function goes and messes with a record/model instance *all* refs
obviously get this pollution. Potential gotcha, although that can be
resolved via adding a method to generate a nonshared instance for
modification, with its save updating the db and the shared version.
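
A sketch of that non-shared copy, in plain Python. `checkout`, `save`, `SHARED`, and `DATABASE` are hypothetical names, with a dict standing in for the table:

```python
import copy


class Blog:
    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


DATABASE = {}  # pk -> dict of saved fields, standing in for a table
SHARED = {}    # pk -> the single shared instance


def checkout(pk):
    """Return a private copy that can be modified without affecting other refs."""
    return copy.copy(SHARED[pk])


def save(private):
    """Write the private copy to the 'database' and refresh the shared instance."""
    DATABASE[private.pk] = {"title": private.title}
    SHARED[private.pk].title = private.title


SHARED[1] = Blog(1, "Old title")
reader_view = SHARED[1]

draft = checkout(1)
draft.title = "New title"
title_before_save = reader_view.title

save(draft)
title_after_save = reader_view.title
```

Other holders of the shared instance see the change only once the draft is saved, which avoids the pollution described above.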

Personally I'm more interested in the unique instance approach- poking
at implementing it locally at the moment, but I'd like to see what
folks' opinions are on instance caching.

~harring
wbyoung-MYLTGd/4c51g9hUCZPvPmw@public.gmane.org
2007-02-09 15:34:34 UTC
Post by wbyoung-MYLTGd/***@public.gmane.org
b = Blog.objects.get(id=1)
[ I do some other things here, and meanwhile the title of blog with
id=1 was changed]
entry.blog is b # Is this still true?
entry.blog.title is b.title # How about this?
I understand how this could happen, but if you simply read the code,
why shouldn't it be true? (If on a single thread and not using a
Post by wbyoung-MYLTGd/***@public.gmane.org
b = Blog.objects.get(id=1)
[ I do some other things here, and meanwhile the title of blog with
id=1 was changed]
b.title # Should it be fetched now since there's a chance that it's changed? No.
If you think your object might have changed in the database, then when
it matters, you fetch it again. I think the caching would still be
good in this case. Don't you?
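
A small sketch of "fetch it again when it matters", in plain Python. The `refresh()` helper and the `ROWS` dict are hypothetical; with the ORM, the equivalent would be another Blog.objects.get(id=1):

```python
ROWS = {1: {"title": "Original"}}  # stands in for the blog table


class Blog:
    def __init__(self, pk):
        self.pk = pk
        self.refresh()

    def refresh(self):
        """Hypothetical helper: reload this instance's fields from storage."""
        self.title = ROWS[self.pk]["title"]


b = Blog(1)
ROWS[1]["title"] = "Changed elsewhere"  # another process updates the row

stale_title = b.title      # still the value read earlier
b.refresh()                # explicit re-fetch at the point where it matters
fresh_title = b.title
```

A plain model instance already behaves this way: its fields stay as they were read until the code asks for them again.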

