We recently migrated our session management from MongoDB to Redis. The migration was motivated by our experience with MongoDB, which didn’t handle our pattern of frequent updates and even more frequent reads particularly well. Redis, on the other hand, is proven storage for exactly that use case.

Database migrations aren’t always easy, because we need to learn the patterns, best practices, and quirks of another service. We aimed to keep our Java service layer as simple as possible, so that it would be stable and future-proof: session management is certainly one of those services with a quite stable feature set, and its code won’t be touched very often. So keeping it simple and comprehensible for anybody peeking into it after several years is an important aspect.

Due to our, well, naïve approach, we faced two issues:

  1. Spring Data’s concept of implementing secondary indices and how it works together with EXPIRE
  2. Redis’ scope of atomicity and Spring Data’s update mechanism

This article summarizes our learnings from adopting Redis with a thin Java service using Spring Data as the persistence layer.

Spring Data Redis with Secondary Indices and EXPIRE/TTL

Adopting Spring Data with Redis starts straightforward: all you need are the dependencies for your Gradle or Maven build along with an @EnableRedisRepositories annotation in a Spring Boot app. Most of Spring Boot’s defaults make sense and get you running against a Redis instance quite smoothly.
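
For reference, the relevant Gradle dependency might look like this (the Maven coordinates are identical):

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-data-redis'
}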

The actual implementation of a generic repository isn’t required, because Spring Data lets you declare a simple interface and derives a generic implementation at runtime. Our repository started like this:

import org.springframework.data.repository.CrudRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface SessionDataCrudRepository extends CrudRepository<SessionData, String> {
}

The entity managed by that repository also started as simple as it gets:

import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.TimeToLive;

import java.util.concurrent.TimeUnit;

@RedisHash("SessionData")
public class SessionData {

  @Id
  private String sessionId;
  @TimeToLive(unit = TimeUnit.MINUTES)
  private Long ttl;

  ...
}

You’ll notice that we opted to model the TimeToLive as a ttl property, which is translated to an EXPIRE on the entity’s key. We didn’t want to track expiring sessions manually, but wanted Redis to remove expired sessions transparently. The ttl is regularly reset to its initial value during user activity - otherwise a user might be logged out in the midst of working with our platform.
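
Creating a session then might look like the following sketch - assuming conventional setters on SessionData (not shown above) and the SESSION_TIMEOUT constant that appears in the code further below:

SessionData session = new SessionData();
session.setSessionId(UUID.randomUUID().toString());
// the ttl property is translated to an EXPIRE on the Redis key
session.setTtl(SESSION_TIMEOUT.toMinutes());
repository.save(session);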

What happens when a user actually pushes the Logout button, and how can we disable a user account and invalidate a running session? Easy: we also have a userId as part of the SessionData and can perform a query to find every session for that userId. The required changes to the classes above look like this (a sketch of the actual invalidation follows after the list):

  • SessionDataCrudRepository

    @Repository
    public interface SessionDataCrudRepository extends CrudRepository<SessionData, String> {
    
    +    List<SessionData> findByUserId(String userId);
    }
    
  • SessionData

    +import org.springframework.data.redis.core.index.Indexed;
    
    @RedisHash("SessionData")
    public class SessionData {
    
        @Id
        private String sessionId;
        @TimeToLive(unit = TimeUnit.MINUTES)
        private Long ttl;
    
    +    @Indexed
    +    private String userId;
    
        ...
    }
    

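With the index in place, invalidating all sessions of a user - on logout or when an account gets disabled - boils down to a query and a delete. A minimal sketch, assuming the repository shown above:

public void invalidateSessions(String userId) {
  List<SessionData> sessions = repository.findByUserId(userId);
  // deleting via the repository also cleans up the secondary index
  repository.deleteAll(sessions);
}
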
The innocent-looking @Indexed annotation triggers special behaviour in Spring Data. It tells Spring Data to create and maintain a secondary index on the entities, so that we can query a whole list of SessionData for a given userId. The combination of a secondary index and automated expiry of entities makes the setup a bit more complex, though. Redis won’t automatically update the secondary index when a referenced entity expires, so Spring Data needs to handle that case. Yet Spring Data doesn’t constantly poll Redis for expiring entities (keys), which is why it relies on Redis keyspace notifications for expiring keys along with so-called phantom copies:

When the expiration is set to a positive value, the corresponding EXPIRE command is run. In addition to persisting the original, a phantom copy is persisted in Redis and set to expire five minutes after the original one. This is done to enable the Repository support to publish RedisKeyExpiredEvent, holding the expired value in Spring’s ApplicationEventPublisher whenever a key expires, even though the original values have already been removed.
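
To make this more tangible, here is roughly what Spring Data maintains in Redis for a single session with id 42 - the exact layout may differ between versions, so treat this as a sketch:

SessionData                 the set of all known entity ids
SessionData:42              the hash holding the entity’s properties
SessionData:42:phantom      the phantom copy, expiring five minutes after the original
SessionData:42:idx          the set of index entries referencing this entity
SessionData:userId:alice    the secondary index: all session ids for userId alice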

There’s a little detail to notice in the next paragraph:

By default, the key expiry listener is disabled when initializing the application. The startup mode can be adjusted in @EnableRedisRepositories or RedisKeyValueAdapter to start the listener with the application or upon the first insert of an entity with a TTL. See EnableKeyspaceEvents for possible values.

Sadly, we didn’t read that far. That’s why we experienced the effects of enabling EXPIRE with disabled key expiry listeners, combined with an ever-growing secondary index. Long story short: we observed an ever-growing number of keys and growing memory usage - until Redis’ memory limit was reached.

Inspecting the Redis keys made it obvious where to find the configuration error, which ultimately led us to fix the @EnableRedisRepositories annotation to enable keyspace events. We also disabled Spring Data’s automated configuration of the notify-keyspace-events property, because we had already enabled that setting server-side:

import org.springframework.data.redis.repository.configuration.EnableRedisRepositories;

import static org.springframework.data.redis.core.RedisKeyValueAdapter.EnableKeyspaceEvents.ON_STARTUP;

@EnableRedisRepositories(enableKeyspaceEvents = ON_STARTUP, keyspaceNotificationsConfigParameter = "")
@SpringBootApplication
public class SessionManagementApplication {

   ...
}

We also had to manually clean up the stale data, so let’s also mention that you should always prefer SCAN over KEYS when working with large data sets. Netflix’s nf-data-explorer might help if you don’t fancy working with the native redis-cli.
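
If you prefer to stay in Java for such a cleanup, a sketch using a StringRedisTemplate (assumed to be available as a bean) could look like this - SCAN iterates in batches instead of blocking the server like KEYS does:

import org.springframework.data.redis.core.Cursor;
import org.springframework.data.redis.core.ScanOptions;
import org.springframework.data.redis.core.StringRedisTemplate;

public void cleanupStaleSessionKeys(StringRedisTemplate template) {
  // iterate over all keys of the SessionData keyspace in batches of 500
  ScanOptions options = ScanOptions.scanOptions().match("SessionData:*").count(500).build();
  try (Cursor<String> cursor = template.scan(options)) {
    cursor.forEachRemaining(key -> {
      // inspect the key and decide whether it is stale before deleting it
      template.delete(key);
    });
  }
}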

Missing Entities During Concurrent Reads and Writes

With the issue of the ever-growing memory usage fixed, we eventually made the new service the primary source for our sessions.

When requests hit our security chain, we always verify that the user’s session is valid. Those verifications are simple lookups of a sessionId at the session management service. Usually, a status 404 NOT FOUND from the session management indicates that either the sessionId is invalid (unknown) or the session has expired (and been deleted by Redis).
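
A hypothetical sketch of such a lookup endpoint - the actual controller isn’t part of this article, and getSession is shown further below:

@GetMapping("/sessions/{sessionId}")
public ResponseEntity<SessionDto> lookupSession(@PathVariable String sessionId) {
  return sessionService.getSession(sessionId)
      .map(ResponseEntity::ok)                             // 200 OK with the session data
      .orElseGet(() -> ResponseEntity.notFound().build()); // 404 NOT FOUND
}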

Along with some related changes in our applications consuming the new API, we observed another strange behaviour: some sessions couldn’t be found, although we were 100% sure that they should still be valid (known and not expired). After a failed session lookup, most retries succeeded, so we knew that the data wasn’t lost but simply couldn’t be found.

We couldn’t reproduce the erroneous behaviour on demand, and collecting logs, metrics, and traces didn’t give us a lead. Along the way we added caching and other workarounds, with some changes improving the overall behaviour, but we didn’t actually fix the issue.

If you read the first part of this article carefully, you might remember the little detail about us refreshing the ttl. We not only refresh the ttl, but also a lastResponse timestamp as part of the SessionData:

import org.springframework.data.annotation.Id;
import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.redis.core.TimeToLive;
import org.springframework.data.redis.core.index.Indexed;

import java.time.LocalDateTime;
import java.util.concurrent.TimeUnit;

@RedisHash("SessionData")
public class SessionData {

  @Id
  private String sessionId;
  @TimeToLive(unit = TimeUnit.MINUTES)
  private Long ttl;
  private LocalDateTime lastResponse;

  @Indexed
  private String userId;

  ...
}

So, let’s have a more detailed look at the request processing regarding the session management. The user sends a request, along with a sessionId, indicating that they are logged in. We perform a lookup with that sessionId to verify the user’s session. If the session is considered valid, the application can proceed with the requested action. After the application has processed the request, the security chain regularly updates the session, resetting the ttl and writing the current lastResponse timestamp. Usually, the user performs several requests - probably not the actual human, but a frontend application running in the browser. That frontend application doesn’t truly care how frequently it sends new requests, so we can assume that several requests might hit our backends at the same time.

Several requests being verified. Several requests triggering a session refresh along with the write operation on the SessionData.

We were still using Spring Data’s CrudRepository for reading and updating sessions, using the following code:

  • reading:

    SessionDataCrudRepository repository;

    public Optional<SessionDto> getSession(String sessionId) {
      Optional<SessionData> session = repository.findById(sessionId);
      ...
      return session.map(this::toDto); // mapping to the DTO; toDto elided in this article
    }

  • updating:

    SessionDataCrudRepository repository;

    public Optional<Long> refreshSessionTtl(String sessionId) {
      Optional<SessionData> session = repository.findById(sessionId);

      AtomicLong updatedTtl = new AtomicLong();
      session.ifPresent(data -> {
        data.setLastResponse(LocalDateTime.now(clock).truncatedTo(SECONDS));
        data.setTtl(SESSION_TIMEOUT.toMinutes());

        SessionData saved = repository.save(data);
        updatedTtl.set(saved.getTtl());
      });
      return Optional.of(updatedTtl.longValue());
    }

Sometimes the repository.findById(...) didn’t yield anything, so we focussed on that part. The problem was actually triggered by the repository.save(...) call, though. After several weeks of googling and staring at logs and traces, we found a correlation between refreshSessionTtl and getSession calls.

Many articles on the internet had already trained us to think of Redis as a single-threaded service, performing every request sequentially. Googling with “spring data redis concurrent writes” as part of the search query led us to Stack Overflow and the issue at spring-projects/spring-data-redis/issues/1826, where our problem was described and even explained - along with a fix.

Long story short: Spring Data implements updates as a sequence of DEL and HMSET, without any transactional guarantees. In other words: updating entities via a CrudRepository doesn’t provide atomicity. Our HGETALL requests sometimes happened exactly between the DEL and the HMSET, resulting in an empty result - or sometimes in a result, but with a negative ttl.
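
Schematically, the race looks like this:

update (save)                     concurrent lookup (findById)
DEL SessionData:42
                                  HGETALL SessionData:42    <- empty reply: session “missing”
HMSET SessionData:42 ...

A lookup arriving after the HMSET but before the expiry has been re-applied would also explain the results with a negative ttl, since Redis returns a TTL of -1 for keys without an expiry.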

Our issue could now be reproduced with an integration test (a sketch follows below the code) and fixed using a PartialUpdate. The implementation above changed to:

KeyValueOperations keyValueOperations;
SessionDataCrudRepository repository;

public Optional<Long> refreshSessionTtl(String sessionId) {
  Optional<SessionData> session = repository.findById(sessionId);

  AtomicLong updatedTtl = new AtomicLong(-3);
  session.ifPresent(data -> {
    PartialUpdate<SessionData> update = new PartialUpdate<>(data.getSessionId(), SessionData.class)
        .refreshTtl(true)
        .set("ttl", SESSION_TIMEOUT.toMinutes())
        .set("lastResponse", LocalDateTime.now(clock).truncatedTo(SECONDS));
    keyValueOperations.update(update);
    Optional<SessionData> saved = repository.findById(data.getSessionId());
    if (saved.isPresent()) {
      updatedTtl.set(saved.get().getTtl());
    }
  });
  return Optional.of(updatedTtl.longValue());
}
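
The actual integration test isn’t part of this article; a sketch of the general idea might look like the following, assuming a running Redis instance, a wired repository, and a hypothetical createSessionData helper:

import org.junit.jupiter.api.Test;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

import static org.assertj.core.api.Assertions.assertThat;

...

@Test
void concurrentReadersShouldAlwaysFindTheSession() throws Exception {
  SessionData session = createSessionData("42"); // hypothetical helper
  repository.save(session);

  ExecutorService pool = Executors.newFixedThreadPool(2);
  AtomicInteger misses = new AtomicInteger();

  Future<?> writer = pool.submit(() -> {
    for (int i = 0; i < 10_000; i++) {
      repository.save(session); // DEL + HMSET, not atomic
    }
  });
  Future<?> reader = pool.submit(() -> {
    for (int i = 0; i < 10_000; i++) {
      if (!repository.findById("42").isPresent()) {
        misses.incrementAndGet(); // read hit the gap between DEL and HMSET
      }
    }
  });
  writer.get();
  reader.get();
  pool.shutdown();

  // fails with repository.save(...), passes with the PartialUpdate variant
  assertThat(misses.get()).isZero();
}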

Summary

A combination of expiring keys, secondary indices, and delegating all the magic to Spring Data Redis requires a proper configuration of keyspace event listeners. Otherwise, your memory usage grows over time due to the phantom copies. Consider using a configuration like @EnableRedisRepositories(enableKeyspaceEvents = ON_STARTUP) in your app.

In an environment with concurrent reads and updates, beware that Spring Data’s CrudRepository implements updates as a two-step process of DEL and HMSET. If you observe sporadically missing keys or results with a negative TTL, you might have hit exactly this concurrency issue. Check your write operations and consider updating only the changed properties with a PartialUpdate and Spring Data’s RedisKeyValueTemplate#update method.

You can contact us for feedback or questions via Twitter @Europace or @gesellix. Thanks!


Teaser Photo by Antonio Grosz on Unsplash.