dormando - Cache your sessions. Don't piss off your users
I hope you're all enjoying the 1.2.6 stable release of memcached. Don't want to hear no whining about it crashing!

One of the most common questions in memcached land is the ever-obnoxious "how do I put my sessions in memcached?". The long-standing answer is usually "you don't", or "carefully", but people often walk the dark path instead. Many libraries do this as well, although I've seen at least one which gets it right.

This isn't as huge of a deal as people make it out to be. I've been asked about this over the mailing list, in IRC, in person, and even in job interviews. What people end up doing gives me the willies! Why! Why why why... Well, I know why.

So what is the deal with sessions? Why does everyone want to jettison them from mysql/postgres/disk/whatever? Well, a session is:

- Almost always larger than 250 bytes, and almost always smaller than 5 kilobytes.
- Read from datastore for every logged in (and often logged out) user for every dynamic page load.
- Written to the datastore for every dynamic page load.
- Eventually reaped from the database after N minutes of inactivity.

Ok, well, that sucks I guess. Every time a user loads a page we read a blob row from mysql, then write a blob row back. This is a lot slower than reading rows without blobs. Alright, so I see it now. Memcached to the rescue!

Er, except maybe it's a little complicated to actually cache these things in memcached, since we need a write for every read... Why not just use memcached for sessions!? It lines up perfectly! Check it out:

- Set a memcached expire time for the max inactivity for a session. Say 30 minutes...
- Read from memcached.
- Write to memcached.
- A miss from memcached means the user is logged out.

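As a rough sketch, the memcached-only pattern above looks something like this in Python. A tiny in-memory class stands in for a real memcached client, and the key scheme and function names are made up for illustration:

```python
import time

SESSION_TTL = 30 * 60  # max inactivity: 30 minutes

class FakeMemcached:
    """In-memory stand-in for a memcached client (get/set with a TTL)."""
    def __init__(self):
        self.store = {}

    def set(self, key, value, ttl):
        self.store[key] = (value, time.time() + ttl)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        value, expires = item
        if time.time() > expires:
            del self.store[key]  # expired: behaves like a miss
            return None
        return value

mc = FakeMemcached()

def load_session(session_id):
    # A miss means the user is logged out -- the core weakness:
    # eviction, restart, or rehashing all look exactly like a logout.
    return mc.get("session:" + session_id)

def save_session(session_id, data):
    # Every page load rewrites the session, resetting the TTL.
    mc.set("session:" + session_id, data, SESSION_TTL)

save_session("abc123", {"user_id": 42})
print(load_session("abc123"))  # {'user_id': 42}
print(load_session("missing"))  # None -> treated as logged out
```

Everything hinges on that `get` miss meaning "logged out", which is exactly what makes evictions and restarts so painful.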
Voila! ZERO reads or writes to the database, fantastic! Fast. Except I really don't like the tradeoffs here. This is one example where I believe the experience of both your users and your operations team is cheapened. Users now get logged out when anything goes wrong with memcached! Operations has to dance on eggshells. Or needles. Painful.

- Evictions are serious business. Even if you disable them (-M), out-of-memory errors mean no one can log into your site.
- Upgrading memcached, OS kernel, hardware, etc, now means kicking everyone off your site.
- Adding/removing memcached servers kicks people off your site. Even with consistent hashing, while the miss rate is low it's not going to be zero.

So now what? Well, we have zero accesses on our database, so it's fast! But we can't ever touch memcached again for fear of ticking off users. Progress be damned!

Before you all think I'm completely off my rocker, I will admit there are some legitimate reasons to do this. If the way your site works doesn't really impact users on loss of a session, or impacts few enough users, you can use this design pattern. How many people are actually affected if you get logged out of wikipedia.org? Well, the people writing revisions certainly mind, but the greater userbase is unaffected. They're a non-profit, they understand the tradeoff, etc. So that's fine. It's not fine for a lot of the people I see suggesting it or doing it. As developers get more comfy with memcached the session issue will become more of an obvious bottleneck.

The memcached/mysql hybrid really isn't that bad at all. You can get rid of over 90% of the database reads, a lot of the writes, and leave your users logged in during rolling upgrades of memcached.

First, recap the components involved: The page session handler itself, and some batch job which reaps dead sessions. For small websites (like a vbulletin forum) these batch jobs are often run during page loads. For larger sites they will be crons and so forth. This batch job can also be used to save data about sessions for later analysis.

The pattern is simple. For reads, fetch from memcached first and the database second. For writes, write to memcached every time, and also write to the database if you haven't synced the session there in the last N seconds. So if a user is clicking around they will write to the database only once every 120 seconds, but to memcached on every page load.

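A minimal sketch of that read/write path in Python, with plain dicts standing in for memcached and the sessions table (the 120-second window is the tunable N; all names here are invented):

```python
import time

SYNC_INTERVAL = 120  # N: seconds between forced database writes

cache = {}  # stands in for memcached: session_id -> data
db = {}     # stands in for the sessions table: session_id -> (data, last_sync)

def read_session(session_id):
    # Reads hit memcached first, the database second.
    data = cache.get(session_id)
    if data is not None:
        return data
    row = db.get(session_id)
    if row is None:
        return None
    data, _ = row
    cache[session_id] = data  # repopulate the cache on a miss
    return data

def write_session(session_id, data, now=None):
    # Writes always go to memcached; the database only sees one
    # write per SYNC_INTERVAL seconds of activity.
    now = time.time() if now is None else now
    cache[session_id] = data
    row = db.get(session_id)
    if row is None or now - row[1] >= SYNC_INTERVAL:
        db[session_id] = (data, now)

write_session("abc", {"page": 1}, now=1000)
write_session("abc", {"page": 2}, now=1010)  # within N: cache only
write_session("abc", {"page": 3}, now=1200)  # past N: synced to the DB
print(read_session("abc"))  # {'page': 3}
print(db["abc"][0])         # {'page': 3} -- last synced copy
```
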
Now modify the batch job. Crawl all expired sessions, and check memcached for the latest data. If a session is not actually expired, leave it alone; if it is, take the latest possible data from memcached and write it back to the database before reaping. Easy.

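The modified batch job might look something like this sketch, again with dicts standing in for memcached and the database (the data layout is an assumption, not any real library's API):

```python
IDLE_LIMIT = 30 * 60  # session considered expired after 30 minutes idle

cache = {"s2": ({"user": "bob"}, 900)}  # session_id -> (data, last_active)
db = {
    "s1": ({"user": "ann"}, 0),  # idle since t=0: genuinely dead
    "s2": ({"user": "bob"}, 0),  # looks dead in the DB, but fresh in cache
}

def reap_sessions(now):
    """Crawl DB sessions past the idle limit, consulting the cache first."""
    reaped = []
    for sid, (data, last_active) in list(db.items()):
        if now - last_active < IDLE_LIMIT:
            continue  # still active per the database
        cached = cache.get(sid)
        if cached is not None and now - cached[1] < IDLE_LIMIT:
            # Cache has fresher activity: sync it back, don't expire.
            db[sid] = cached
            continue
        # Truly expired: keep the final cached data if any, then reap.
        final = cached[0] if cached else data
        reaped.append((sid, final))  # hook for later analysis/logging
        db.pop(sid)
        cache.pop(sid, None)
    return reaped

reaped = reap_sessions(now=2000)
print(reaped)  # [('s1', {'user': 'ann'})] -- s2 was rescued by the cache
```
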
You take the tradeoff of sessions being mildly lossy for recent information, but you gain reliability back in your system. Reads against the database should be almost nonexistent, and write load should drop significantly, but not as much as reads.

So please, if you run some website I might eventually use, don't put memcached in a place where restarting individual servers might piss me off. Thanks :)

I'd like to also challenge maintainers of session libraries for all languages to turn this design pattern into tunable (note all the places where I wrote N) libraries folks can plug in and use.

The more standard this stuff is, the more likely the next fancy startup is going to get it right. Reuse is a great thing. I can't say enough about how far efforts like krow's libmemcached go toward standardizing how we use memcached, but it would also be a great help to ship libraries for common design patterns.


7 comments
From: tamasrepus Date: November 3rd, 2008 01:37 am (UTC)

How to "crawl" expired sessions?

I like your idea, but there's one step which is not clear how to implement... it may be because I don't have much Memcache experience under my belt.

You mention to "crawl all expired sessions." As far as I know, memcached only lets you retrieve/delete entries if you know the full key. How do you know the keys/sessions to remove when "crawling," if all session information is being stored in memcached, and you can't query anything?
From: netherben Date: November 3rd, 2008 07:33 am (UTC)

Re: How to "crawl" expired sessions?

In this pattern, since you have a copy of the sessions in the DB as well, you can query for session IDs from the database and check the database's sessions against the sessions in memcache.

We use a similar method at Hab.la, but write to DB when an 'important' session change has occurred, or if the DB session is nearing expiry. I like the timer idea.

From: dormando Date: November 3rd, 2008 07:37 am (UTC)

Re: How to "crawl" expired sessions?

This pattern discusses using memcached to augment the database usage... so you should already have your session data in the database, which is trawlable.

for each expired session in the database:
- find the related memcached key; test whether the session was updated there (and thus isn't expired)
- if the session is still expired, expire it and delete the memcached key
- log useful data somewhere
From: ext2366 Date: February 26th, 2010 10:24 am (UTC)

Re: How to "crawl" expired sessions?

I've been thinking about implementing this pattern. However, there's one thing that I haven't come up with a good solution to yet.

We expire sessions after 24 hours. Without something running more often, it seems that when we come to crawl the sessions, the memcached version may have been evicted.

I guess there are 2 options.

1. Somehow make sure they don't get evicted - maybe by giving memcached lots of memory.
2. Make a list of sessions as they get touched, and crawl that every half hour or so, persisting them. This ends up with writes, or some sort of in-memory structure.

Do you have any other ideas?
From: dormando Date: February 26th, 2010 11:37 pm (UTC)

Re: How to "crawl" expired sessions?

You should be trawling the sessions often, and expiring them when they're actually expired.

For example if you hold sessions for 24 hours, you'd scan through sessions that haven't been updated in the db in over 30 minutes, sync their memcached keys, and if they're expired, expire them.

You're still only crawling stuff that's not being hit all the time, which should reduce most of your extra writes. You could reduce it more by having a "dirty" column and only looking at sessions which have been idle for more than 15-30 minutes and haven't been dirtied by a sync caused by normal user activity.

So even though you expire sessions every 24 hours, an active user should be writing his session back to the db every 5-15 minutes. Which dirties it and updates the stamp in the DB for the last time the session was known active... No in memory structures, just a single index lookup.
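
That idle-plus-dirty crawl boils down to a single indexed query. A sketch using Python's sqlite3 module, with an invented table layout:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sessions (
        id TEXT PRIMARY KEY,
        data TEXT,
        last_active INTEGER,  -- unix timestamp of last known activity
        dirty INTEGER         -- 1 if a normal page-load sync just wrote it
    )
""")
conn.execute("CREATE INDEX idx_idle ON sessions (dirty, last_active)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("s1", "{}", 100, 0),   # idle and not recently synced: crawl it
        ("s2", "{}", 100, 1),   # idle but freshly dirtied: skip it
        ("s3", "{}", 1900, 0),  # recently active: skip it
    ],
)

def sessions_to_crawl(now, idle_secs=1800):
    # Only look at sessions idle past the threshold that a normal
    # user-driven sync hasn't already refreshed.
    rows = conn.execute(
        "SELECT id FROM sessions WHERE dirty = 0 AND last_active < ?",
        (now - idle_secs,),
    )
    return [r[0] for r in rows]

print(sessions_to_crawl(now=2000))  # ['s1']
```
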
From: brong Date: April 21st, 2009 05:00 am (UTC)
Interesting. I've been considering, well, pretty much exactly the same "don't cause IO to the mysql machines for sessions" approach.

I have responses to your issues:

- Evictions are serious business. Even if you disable them (-M), out of memory errors means no one can log into your site.

So make sure you have enough memory for all your sessions plus some serious slack. We have a bunch of machines with 32GB of RAM. I've calculated that 2GB would be only about 10% used at our maximum logged-in-user load, and would hardly affect those hosts at all.

So long as you can actually get stats out of memcached (currently 37% "alive" in the most active slab), you can monitor and scale as needed.

Besides - LRU. The ones that get kicked off are already pretty stale assuming you haven't sized your memcached insanely low.

- Upgrading memcached, OS kernel, hardware, etc, now means kicking everyone off your site

Ahh, yes - "single instance" hashing. So write the session to multiple memcacheds each time. In the worst case you may need to try multiple fetches. Still no database IO though. Most times you'll fetch the cache record successfully from a random server.

The general answer to "single point of failure" is store in multiple places. Hardly takes a genius to figure that one out.

- Adding/removing memcached servers kicks people off your site. Even with consistent hashing, while the miss rate is low it's not going to be zero.

See above. Store more than one copy. Duh.

We can shut down any one of our machines with a couple of seconds notice and be sure of consistency. A hard failure not so much (cyrus and mysql replication are both running async for performance), but it's still pretty close. Maybe lose a couple of seconds' worth of data - and that happens very rarely, only if there's complete hardware failure.

We haven't switched to memcached yet - I've only been trialing the setup on a testbed - but shutting down one of the servers didn't affect the active sessions at all.

There are two things stopping me from switching things over now - some poorly factored APIs mean there are two other places I would need to convert from mysql-specific queries to the abstract session interface, and we don't _yet_ have monitoring infrastructure to make sure that all the memcacheds are online and working. Everything else gets a health check every 2 minutes and we get paged if it goes down. I wouldn't want live sessions in the memcacheds until we have a test for them.
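
brong's write-to-every-node idea could be sketched like this in Python, with small in-memory classes standing in for the individual memcached servers (a real setup would use one client per server):

```python
import random

class FakeNode:
    """In-memory stand-in for one memcached server."""
    def __init__(self):
        self.store = {}

nodes = [FakeNode(), FakeNode(), FakeNode()]

def write_session(key, value):
    # Write every copy; losing any single node loses no sessions.
    for node in nodes:
        node.store[key] = value

def read_session(key):
    # Try nodes in random order; fall through misses until a hit.
    for node in random.sample(nodes, len(nodes)):
        value = node.store.get(key)
        if value is not None:
            return value
    return None  # every replica missed: treat as logged out

write_session("abc", {"user": 7})
nodes[0].store.clear()  # simulate restarting one server
print(read_session("abc"))  # {'user': 7} -- the survivors still have it
```
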
From: chelpsix Date: October 10th, 2009 06:59 pm (UTC)
I propose a link exchange.