Author: Marc Geldon
IQueryable<T>.Skip(0) seems like it should be “free.” In other words, since it will have no effect on the resulting data, there should be little to no performance cost for calling it. But this is demonstrably not true in LINQ to Entities, and it occurs to me that LINQ providers are not required to optimize it away. Therefore, it is probably a good idea to avoid making such a call at all, so that you do not have to concern yourself with whether the provider will generate a slower query if you do it.
Unfortunately, it is not just your own code you need to be concerned with. It turns out that some fairly popular LINQ helpers do this. For example, Rob Connery’s PagedList<T> type does it:
public PagedList(IQueryable<T> source, int index, int pageSize)
this.TotalCount = source.Count();
this.PageSize = pageSize;
this.PageIndex = index;
this.AddRange(source.Skip((index - 1) * pageSize).Take(pageSize).ToList());
Index, in this code, is the 1-based page you’d like to display. When index is 1, the code calls Skip(0), and you get less efficient SQL. Fixing the problem is quite simple:
this.PageIndex = index;
var pageData = index > 1 ?
source.Skip((index - 1) * pageSize).Take(pageSize) : source.Take(pageSize);
Naturally, this only optimizes SQL generation for the first page of results, but since this is by far the most common page a user might display, it can net you a substantial win in server performance.
Troy Goode’s updated PagedList type, which I have recommended in the past (and still do, with this tweak!), has the same problem. The solution is almost identical, although Troy’s type is 0-based rather than 1-based. Update: Troy has now posted an update containing this fix, among many other changes.
The root of the problem in the SQL generation is that when you call Skip(0), LINQ to Entities, in at least some cases, generates SQL like this:
) AS [Project3]
WHERE [Project3].[row_number] > 0
ORDER BY [Project3].[TimeRecordDate] DESC
The reference to
row_number can have a substantial performance impact. In one case I tried with a SQL Server table containing just over 1 million rows, a simple query ran with sub-second performance without this WHERE clause, and over 4 seconds with the WHERE clause, even though the result set was exactly the same.
Now, is this a bug in LINQ to Entities? Or LINQ itself? It’s hard to say. Both frameworks are doing exactly what you tell them to, even though it doesn’t make a lot of sense. It seems like a missed opportunity for optimization, but I’m open to the possibility that there might be some desirable effect of this that I have simply missed. For my own purposes, I am content to simply not call Skip(0) at all.