An Underutilized Secret Weapon for Faster Queries: The SQL Filtered Index

“Efficiency is doing things right; effectiveness is doing the right things.”

Peter Drucker

First, a Quick Refresher: “Normal” Indexes

When storing and retrieving data in MS SQL Server, you have two options: Clustered and Nonclustered indexes. A clustered index is your table in a specific order (Think of the old-timey phonebooks). A nonclustered index is typically a smaller subset of fields of the entire table, usually ordered on a different field than your clustered index (Think the index in the back of a textbook). However, both indexes, by default, consist of one record for each row in your table.

The Problem: When Your Index Gets Too Big

Lets say you have a shipping company and you store your orders in the table below.

CREATE TABLE Orders (
    OrderID INT IDENTITY(1,1) PRIMARY KEY,
    OrderDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CustomerName VARCHAR(255) NOT NULL,
    ShippingAddress TEXT NOT NULL,
    -- This is the key column:
    -- It will be NULL (empty) if it hasn't shipped.
    -- It will have a date/time if it has shipped.
    ShipDate DATETIME NULL 
);

Over time, your business has done great and your widgets are selling like hotcakes! The query below that you use to generate your weekly reports is running slower and slower, though.

-- Select count of orders that have not shipped yet
SELECT OrderDate, COUNT(*) Total_Orders
FROM Orders
WHERE ShipDate IS NULL
GROUP BY OrderDate;

You decide you should probably add a nonclustered index to this table to help speed it up.

-- Create NC index on ShipDate
CREATE NONCLUSTERED INDEX idx_ShipDate
ON dbo.Orders(ShipDate)
INCLUDE (OrderDate);

This helps speed up your query tremendously and time marches on. But soon you realize that this index is rather large and you wonder if you can save space. And you also notice that sometimes your estimated counts are way off from your actual counts on this and a few other queries that utilize this index (even after confirming stats are up to date).

This is Where Filtered Indexes Shine!

A Filtered Index is a special nonclustered index that only includes a subset of the rows in your table, based on a WHERE clause you define. You can create a new index like below (and drop the old one above).

-- Create NC index on orderdate but only where ShipDate is null
CREATE NONCLUSTERED INDEX idx_OrderDate_ShipDate_IsNull
ON dbo.Orders(OrderDate)
WHERE ShipDate IS NULL;

The “Hidden” Benefits of Filtered Indexes

Benefit 1: Smaller Size and (Much) Better Statistics

This index will now only be the few thousand orders that are currently unshipped instead of the millions of rows of everything ever ordered (shipped or not). The added bonus of being smaller is that the 200 row histogram of the statistics for this filtered index will be better (more representative) than what the old one was.

Benefit 2: Less Blocking on INSERTS, UPDATES, and DELETES

Another benefit of Filtered Indexes comes into play with INSERTs, UPDATEs, and DELETEs. If you are doing any of those actions on records that are not included in the filtered index, then other queries that only need information in the filtered index won’t be blocked. Think fully covered nonclustered filtered indexes here and you’ll get the idea. The action will occur on the other indexes that may be affected, but since your rows are not getting the action, that Filtered Index will not be part of the INSERT, UPDATE, or DELETE.

Benefit 3: Solving the “Wide Table” Lookup Problem

Lets say you have a very wide table with lots of fields and tons of rows. You have a query that needs to pull back several long fields. You could just create a nonclustered index on the predicates that narrows this down, but even then the query is taking a long time to do the lookups to grab the long fields.

You could update the nonclustered index to include those several long fields, but that will take up a lot of space. Instead, you decide to create a filtered index on your predicate data to narrow this index down to just the rows you really care about and then include those long fields, thus keeping that nonclustered index small AND still allow it to be fully covering.

A Quick Warning: Your Query Must “Fit” the Filter

The added benefit of Filtered Indexes is that your predicate doesn’t have to match the filtering criteria exactly. As long as your predicate is a subset of the filtering criteria, the optimizer can pick your filtered index. For example, the filtered index below is looking for any orders over $100. But my query is looking for any orders over $500. However, the second SELECT query will NOT use the filtered index because $50 is not a subset of the filtered index.

--The table
CREATE TABLE Orders (
    OrderID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    OrderDate DATETIME2 NOT NULL DEFAULT GETDATE(),
    CustomerID INT NOT NULL,
    OrderStatus VARCHAR(20) NOT NULL,
    Amount DECIMAL(10, 2) NOT NULL
);

--The filtered index where we only have amounts greater than 100
CREATE NONCLUSTERED INDEX IX_LargeOrders
ON Orders (CustomerID, OrderDate)  -- Index key columns
INCLUDE (Amount)                   -- Covering columns
WHERE (amount >= 100);   -- The filter

--A query looking for orders over 500 will use the filtered index
SELECT CustomerID, OrderDate, Amount
FROM dbo.Orders
WHERE Amount > 500

--A query looking for orders over 50 will NOT use the filtered index
SELECT CustomerID, OrderDate, Amount
FROM dbo.Orders
WHERE Amount > 50


A Warning: The “Selectivity” Trap

You shouldn’t use a filtered index when the predicate of that filter isn’t very selective. For instance, if it returns 90% of your table, there is not likely a reason to use the filter. Where is that sweet spot? From my experience, that isn’t until it gets to at least 20%, if not less, of the total size of the table. Your mileage may vary, though, so try it out if you think it’ll help. Just remember, every new index increases your DB size so Backups, defragmentation, and insert/update/deletes will take that much longer.

The Final Verdict

So as you can see, Filtered Indexes are a great tool. They are not going to be something you use all the time, but in specific needs they come in very handy to:

  • Reduce your overall storage footprint
  • Improve Statistics
  • Speed up your queries

Here are some external links if you want to read more on this subject.

Create Filtered Indexes: It’s always a good idea to have the official Microsoft docs handy if you want to see every last technical detail.

What You Can (and Can’t) Do with Filtered Indexes: This is a great summary of the common “gotchas” and limitations, like what you can and can’t put in the WHERE clause.

Filtered Indexes, Variables, and Less Doom & Gloom: This post tackles that next big problem you’ll hit: why your index might not get used when you have variables in your query.

Also, below you will fine a couple articles I’ve written that you may find interesting and related to this article.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.