Like any complex framework, EF can introduce difficult-to-diagnose bugs and performance problems due to its misuse, particularly as projects grow — either in scope or in scale. This article will discuss four guidelines for using EF effectively.
Always disable lazy loading.
Lazy loading is a feature in Entity Framework that allows you to defer loading of connected entities (via foreign keys) until they’re accessed for the first time. There are two main requirements for lazy loading of a navigation property to work:
- The property in question must be marked ‘public virtual.’
- The DbContext’s ‘Configuration.LazyLoadingEnabled’ must be set to true.
The following is an example of a DbContext with lazy-load-enabled navigation properties:
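A minimal sketch of such a context follows; the Employee and Department entities and the CompanyContext name are illustrative stand-ins, not taken from a real schema:

```csharp
using System.Collections.Generic;
using System.Data.Entity; // EF6

public class Department
{
    public int Id { get; set; }
    public string Name { get; set; }

    // 'public virtual' lets EF substitute a lazy-loading proxy at runtime.
    public virtual ICollection<Employee> Employees { get; set; }
}

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int DepartmentId { get; set; }

    public virtual Department Department { get; set; }
}

public class CompanyContext : DbContext
{
    public CompanyContext()
    {
        // True by default in EF6; shown explicitly for clarity.
        Configuration.LazyLoadingEnabled = true;
    }

    public DbSet<Department> Departments { get; set; }
    public DbSet<Employee> Employees { get; set; }
}
```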
And here’s an example of lazy loading in action:
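Assuming the illustrative CompanyContext above, a sketch of lazy loading at work:

```csharp
using (var db = new CompanyContext())
{
    // Query 1: fetches the employee row only; Department is not loaded yet.
    var employee = db.Employees.First();

    // Accessing the navigation property triggers a second query here.
    Console.WriteLine(employee.Department.Name);
}
```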
In some scenarios, this feature grants our code a performance boost since it’s possible to conditionally avoid loading an entity’s contents from the database:
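For example, in this sketch (the employeeId and includeDepartment variables are hypothetical), the Department row is never fetched unless the branch actually runs:

```csharp
using (var db = new CompanyContext())
{
    var employee = db.Employees.Find(employeeId);

    if (includeDepartment)
    {
        // Only this branch incurs the extra query for the department.
        Console.WriteLine(employee.Department.Name);
    }
}
```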
However, incorrect use of this feature has serious performance consequences. The following code, while innocent at first glance, will actually execute a SQL query for each salaried employee in the database (plus the first query) — the infamous ‘N+1 Query’ problem. As the number of items in ‘salariedEmployees’ grows, the number of round trips to the database increases.
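A sketch of the problem, assuming a hypothetical IsSalaried flag on Employee:

```csharp
using (var db = new CompanyContext())
{
    // Query 1: loads the employees, but not their departments.
    var salariedEmployees = db.Employees
        .Where(e => e.IsSalaried)
        .ToList();

    foreach (var employee in salariedEmployees)
    {
        // Each iteration lazily fires one more query: N+1 in total.
        Console.WriteLine(employee.Department.Name);
    }
}
```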
There are other problems, too. If a repository returns an entity that happens to outlive the DbContext from which it came, it’s possible that accessing a navigation property will throw a runtime exception. Usually this isn’t a problem in web applications since the lifetime of the DbContext is tied to the lifetime of the web request (using your favorite Dependency Injection (DI) container). However, in other types of applications (services, etc.), this can become a problem.
Lastly, if an entity is eventually serialized as part of an API response, lazy loading will cause every property to be sequentially loaded from the database as part of the serialization (since serialization must touch every property). Usually this will lead to runtime errors since entities typically have circular references; in other cases, you’ll be left with performance problems down the road.
Note: As pointed out by my colleague, serialization of Common Language Runtime (CLR) objects with circular references can be accomplished using additional third-party libraries.
The above examples can all be written with lazy loading disabled. Here's what that looks like:
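A sketch of the same loop with lazy loading off, eagerly loading via 'Include' (the lambda overload requires 'using System.Data.Entity;'):

```csharp
using (var db = new CompanyContext())
{
    db.Configuration.LazyLoadingEnabled = false;

    // One SQL query with a join brings back employees and their departments.
    var salariedEmployees = db.Employees
        .Include(e => e.Department)
        .Where(e => e.IsSalaried)
        .ToList();

    foreach (var employee in salariedEmployees)
    {
        // No extra queries; Department was loaded upfront.
        Console.WriteLine(employee.Department.Name);
    }
}
```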
A lot of debate surrounds the usefulness of lazy loading. For most code, the additional cost incurred by disabling lazy loading is simply a few calls to ‘Include’ (as seen above). However, some applications depend heavily on this functionality. In one design pattern, the repository returns entities directly, and the calling code accesses the navigation properties to retrieve what’s needed from the database. In my opinion, this defeats the point of using a relational database management system since the join operator is completely avoided, and all data access is boiled down to single-table queries.
In addition, this pattern causes code that has nothing to do with the data access layer to directly cause database round trips. This kind of code is also (once again) less performant than loading the needed information upfront. Given the subtlety of the problems introduced by lazy loading and the limited upside, it’s best to disable it.
Prefer projection over manual mapping.
Using the same repository above, consider these two repository methods that return a Data Transfer Object (DTO) representation of an organization:
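A sketch of the two approaches; the DepartmentSummary shape and the repository's 'db' field are illustrative:

```csharp
public class DepartmentSummary
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int EmployeeCount { get; set; }
}

// Version 1: load the entity graph, then map it by hand.
public DepartmentSummary GetSummary(int departmentId)
{
    var department = db.Departments
        .Include(d => d.Employees) // pulls every employee row into memory
        .Single(d => d.Id == departmentId);

    return new DepartmentSummary
    {
        Id = department.Id,
        Name = department.Name,
        EmployeeCount = department.Employees.Count
    };
}

// Version 2: project straight into the DTO inside the query.
public DepartmentSummary GetSummaryByProjection(int departmentId)
{
    return db.Departments
        .Where(d => d.Id == departmentId)
        .Select(d => new DepartmentSummary
        {
            Id = d.Id,
            Name = d.Name,
            // Translated to COUNT(*) in SQL; no employee rows come back.
            EmployeeCount = d.Employees.Count()
        })
        .Single();
}
```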
The first loads the entity (as well as everything else we need via ‘Include’) in the first step, and then maps the result to the DTO, which is then returned. The second uses projection to map the query directly to the object.
There are two advantages to the projection approach. First, EF is smart enough to notice we’re using only the count of the ‘Employees’ collection in the projection (rather than the employees themselves). The generated SQL will not bring back properties for employees since we don’t need them. The performance gain due to this fact can be enormous — especially if departments have many employees. In contrast, the first technique loads all of the employees into memory (and, in fact, there’s no way to avoid doing so).
Secondly, the second version is easier to maintain. Suppose a new navigation property is added to Department, and a corresponding property is added to DepartmentSummary. In the second version of this function, the projection simply needs to be modified to set the DTO’s property. In the first version, doing the same will result in a null reference exception (or an N+1 situation if lazy loading is enabled) if the programmer forgets to ‘Include’ the related entity.
DbContext, SaveChanges and dependency injection
The management of DbContext in the context of business transactions is a difficult topic. Mehdi El Gueddari has written the best post I've seen on the subject; I won't duplicate his efforts, but I'll try to summarize the important points below and present a way to avoid most of the mess.
We're assuming a typical architecture in the following discussion, where the DI container has set DbContext's lifetime to 'InstancePerRequest.' Note that in this architecture, the DbContext associated with the web request is injected into both Repository1 and Repository2.
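A sketch of that wiring, using Autofac as an illustrative container (the repository names follow the discussion):

```csharp
public class Repository1
{
    private readonly CompanyContext db;

    // Receives the request-scoped DbContext; Repository2 looks the same.
    public Repository1(CompanyContext db)
    {
        this.db = db;
    }
}

// Container setup (Autofac shown as one example):
// builder.RegisterType<CompanyContext>().InstancePerRequest();
// builder.RegisterType<Repository1>().InstancePerRequest();
// builder.RegisterType<Repository2>().InstancePerRequest();
```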
What's wrong with the 'usual' architecture?
Nonweb contexts
In nonweb projects, such as services or console apps, there’s no web request — so configuring your DI to use 'InstancePerRequest' for your DbContexts will throw an exception. As a result, we must configure DbContext's lifetime to be singleton or transient. Singleton won't work since DbContext isn't thread-safe and the cache will quickly go stale (it's generally advised that DbContext's lifetime be kept short).
Configuring DbContext to be transient, or instance-per-dependency, will cause Repository1 and Repository2 to receive different instances of DbContext. This can cause all sorts of issues:
- Atomicity is lost since the DbContexts' units of work are separate.
- Entities returned from Repository1 and passed to Repository2 won’t be 'attached' to Repository2's DbContext, leading to strange lost updates/inserts/deletes.
- Two connections to the database are opened, prompting escalation to a distributed transaction if TransactionScope is used (more on that later).
Your service classes shouldn't require a web context to function correctly. If a standalone Windows service or scheduled task eventually makes its way into the project, you’ll want to reuse the data access layer, but ignoring the above points will keep you from doing so. Even if you’re sure your project will never require a nonweb component, there are still problems.
Web contexts
In this scenario, all repositories work on the same instance of DbContext — the one associated with the web request. There are two main issues in this scenario:
- Implicitly using a shared DbContext leads to confusion over who needs to call SaveChanges.
- Implicitly reusing a shared DbContext across multiple calls to service methods can lead to unexpected side effects.
For the first point above, the first solution you may consider is to have each repository call SaveChanges at the end of each of its public methods. The problem is that doing so prevents you from composing two or more repository methods in a single service call without losing atomicity (since SaveChanges is called more than once).
This greatly reduces the reusability of the repository's code. In addition, each repository shares the same DbContext, so calling SaveChanges in one repository can (rather unexpectedly) save changes made by another.
The second solution involves injecting the DbContext into the service class and calling SaveChanges from there instead of inside the repository. Although this solution allows you to compose repository methods while maintaining atomicity, it's still unsatisfactory from a clarity standpoint — the service object is calling SaveChanges on an object it didn't seem to make changes to.
Note that this solution makes the situation much, much worse in the nonweb context (where each service/repository gets its own instance of DbContext). In that context, the repositories aren't calling SaveChanges at all, and the service class is saving an empty set of changes.
For the second point above, suppose the controller calls service method A() that makes some database changes, and a runtime exception is thrown after DbContext has been modified but before SaveChanges is called — presumably, A() failed to update the database.
The controller catches and logs the exception, and calls a different service method B(). Once B() calls SaveChanges, A()'s partially complete changes are incorrectly committed to the database.
This bug is a consequence of the service class reusing the same instance of DbContext to perform different operations and is very difficult to detect — it only occurs in exceptional cases. In all other cases, A() is atomic.
A better way
The following points need to be addressed:
- Repositories must use a shared DbContext within a given service method so joint operations can be atomic.
- Repeated calls to the service class must use separate DbContexts in order to avoid the 'partial completion' bug described in the above section (#2).
- Repositories must not call SaveChanges (so their methods are composable).
- The service class must work in web and nonweb contexts.
- There must be no hard-coded dependencies.
I've included an example architecture that accomplishes all of the above:
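A sketch of that architecture; the factory interfaces, the OrganizationService class and the TransferEmployee operation are hypothetical names chosen for illustration:

```csharp
public interface IDbContextFactory
{
    CompanyContext Create();
}

public interface IRepositoryFactory
{
    Repository1 CreateRepository1(CompanyContext db);
    Repository2 CreateRepository2(CompanyContext db);
}

public class OrganizationService
{
    private readonly IDbContextFactory contextFactory;
    private readonly IRepositoryFactory repositoryFactory;

    public OrganizationService(IDbContextFactory contextFactory,
                               IRepositoryFactory repositoryFactory)
    {
        this.contextFactory = contextFactory;
        this.repositoryFactory = repositoryFactory;
    }

    public void TransferEmployee(int employeeId, int departmentId)
    {
        // One DbContext per operation, shared by both repositories.
        using (var db = contextFactory.Create())
        {
            var repo1 = repositoryFactory.CreateRepository1(db);
            var repo2 = repositoryFactory.CreateRepository2(db);

            // ... compose repository methods; neither calls SaveChanges ...

            // Called once and only once, here, by the service.
            db.SaveChanges();
        }
    }
}
```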
The service class creates the DbContext on a per-operation basis using a mockable DbContextFactory. It then passes the DbContext through the (also mockable) repository factories in order to create repositories that will work on that DbContext.
Since the service class explicitly controls the DbContext, it can take responsibility for calling SaveChanges once and only once, satisfying the atomicity requirement above. This also means the repositories don't have to call SaveChanges, allowing us to compose their methods as needed.
Each DbContext is local to its own method, resulting in no partial update bugs. All dependencies can be set to either 'Transient' or 'Singleton' in this code (doesn't matter), so it will work in both contexts, and there aren't any hard-coded dependencies.
Although this approach satisfies all of our objectives above, it isn't perfect. As it currently stands, there’s no way to combine multiple service methods into a single transaction since the creation of DbContext prompts the opening of a connection.
In addition, since we're explicitly disposing DbContext as soon as the service method has completed, the service method shouldn't return any entities if lazy loading is enabled. Trying to access a navigation property while the DbContext is disposed will throw a runtime exception (as discussed earlier).
Prior to EF 6, the only way to achieve atomicity across multiple calls to ‘SaveChanges’ was using TransactionScope:
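A sketch of that pattern (requires 'using System.Transactions;'; contextFactory is the hypothetical factory from the previous section):

```csharp
using (var scope = new TransactionScope())
{
    using (var db = contextFactory.Create())
    {
        // ... first unit of work ...
        db.SaveChanges();
    }

    using (var db = contextFactory.Create())
    {
        // ... second unit of work ...
        db.SaveChanges();
    }

    // Both saves commit together; if Complete() is never reached,
    // everything rolls back when the scope is disposed.
    scope.Complete();
}
```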
This approach has a few problems. By design, TransactionScope can detect when a local transaction isn’t sufficient for atomicity and automatically escalates the transaction to a ‘distributed transaction.’ A distributed transaction is designed to span more than one machine and is very expensive. Unfortunately, it’s surprisingly easy to inadvertently write code that triggers this escalation.
In SQL Server 2005, very simple code (Update 4 in the link) can cause TransactionScope to escalate from a local transaction to a distributed transaction. This has been fixed in SQL Server 2008 and above.
Any time you have more than one DbContext (that is, more than one database) and you update both within the same TransactionScope, you get an escalation, even if that wasn't your intention. I've done this personally when adding logging to an external database inside existing service methods.
Since distributed transactions are uncommon, it's better to use DbContext.Database.BeginTransaction() (example below) which always creates a lightweight local transaction. Only switch to TransactionScope if you’re absolutely sure you need a distributed transaction.
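A sketch using EF6's Database.BeginTransaction():

```csharp
using (var db = new CompanyContext())
using (var transaction = db.Database.BeginTransaction())
{
    try
    {
        // ... first set of changes ...
        db.SaveChanges();

        // ... second set of changes ...
        db.SaveChanges();

        // A lightweight local SQL transaction on a single connection.
        transaction.Commit();
    }
    catch
    {
        transaction.Rollback();
        throw;
    }
}
```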
Finally, TransactionScope doesn't play well with async/await if TransactionScopeAsyncFlowOption.Enabled isn't set (the ambient transaction won't flow across an await), whereas BeginTransaction() doesn't have this problem.
When used improperly, Entity Framework can cause subtle defects that aren't detected until late in the product's lifecycle. Since many apps' central concern is data, it pays to carefully architect the code surrounding your data to be as reliable, reusable and performant as possible. With these guidelines in mind, you'll avoid many of the traps of Entity Framework while still leveraging its full potential.
Big thanks to Tom Straub, Jonathan Gardner and Adam Howard for lending their guidance and experience for this post.
This article originally appeared on Jan. 10, 2017.