Unindexed items in Content Search in Office 365

A Content Search that you run from the Office 365 Security & Compliance Center gives you the option to include unindexed items in the estimated search results. Unindexed items are Exchange mailbox items and documents on SharePoint sites that weren't indexed for search. In Exchange, an unindexed item is typically an email message with an attached file of a type that can't be indexed. Here are some reasons why items can't be indexed for search and are returned as unindexed items when you run a search:

  • The file type is unsupported or disabled for indexing.

  • The file type is supported for indexing but an indexing error occurred for a specific file.

  • Too many files are attached to an email message.

  • A file attached to an email message is too large.

  • A file is encrypted with non-Microsoft technologies.

  • A file is password-protected.

For legal investigations, your organization may be required to review unindexed items. You can also specify whether to include unindexed items when you export search results to a local computer or when you prepare the results for further analysis with Office 365 Advanced eDiscovery.

For information about indexing in SharePoint Online, see Search limits for SharePoint Online.

Contents

File types not indexed for search

Messages and documents with unindexed file types can be returned in search results

Unindexed items included in the search results

Unindexed items excluded from the search results

Indexing limits for messages in Content Search

More information about unindexed items

File types not indexed for search

Certain types of files, such as Bitmap or MP3 files, don't contain content that can be indexed. As a result, the search indexing servers in Exchange and SharePoint don't perform full-text indexing on these types of files, which are considered unsupported file types. There are also file types for which full-text indexing has been disabled, either by default or by an administrator. Unsupported and disabled file types are considered unindexed items in Content Searches. As previously stated, unindexed items can be included in the set of search results when you run a search, export the search results to a local computer, or prepare search results for Advanced eDiscovery.

For a list of supported and disabled file formats, see the following topics:

Messages and documents with unindexed file types can be returned in search results

Not every email message with an unindexed file attachment or every unindexed SharePoint document is automatically returned as an unindexed item. That’s because other message or document properties, such as the Subject property in email messages and the Title or Author properties for documents, are indexed and available to be searched. For example, a keyword search for "financial" will return items with an unindexed file attachment if that keyword appears in the subject of an email message or in the file name or title of a document. However, if the keyword appears only in the body of the file, the message or document would be returned as an unindexed item.

Similarly, messages with unindexed file attachments and documents of an unindexed file type are included in search results when other message or document properties, which are indexed and searchable, meet the search criteria. Message properties that are indexed for search include sent and received dates, sender and recipient, the file name of an attachment, and text in the message body. Document properties indexed for search include created and modified dates. So even though a message attachment may be an unindexed item, the message will be included in the regular search results if the value of other message or document properties matches the search criteria.
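
The behavior described above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the Item class, its fields, and the classify function are hypothetical names invented for illustration, not part of any Office 365 API, and matching is reduced to a case-insensitive substring test rather than the real query engine.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical model of a message or document (not an Office 365 type)."""
    # Properties that are indexed, e.g. subject, attachment file name, title.
    indexed_properties: dict = field(default_factory=dict)
    # True if an attachment or the file content could not be indexed.
    has_unindexed_content: bool = False


def classify(item: Item, keyword: str, include_unindexed: bool = True) -> str:
    """Return 'regular result', 'unindexed item', or 'not returned'."""
    needle = keyword.lower()
    # Indexed properties are searchable even when the file content is not.
    if any(needle in str(value).lower() for value in item.indexed_properties.values()):
        return "regular result"
    # Nothing in the index matched, so the item can only surface as unindexed.
    if item.has_unindexed_content and include_unindexed:
        return "unindexed item"
    return "not returned"


subject_hit = Item({"subject": "Financial report Q3"}, has_unindexed_content=True)
no_hit = Item({"subject": "Lunch plans"}, has_unindexed_content=True)
print(classify(subject_hit, "financial"))
print(classify(no_hit, "financial"))
```

In this model, the first message is returned with the regular results because the keyword appears in its indexed subject, while the second can only be returned as an unindexed item.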

For a list of email and document properties that you can search for by using the Search feature in the Security & Compliance Center, see Keyword queries and search conditions for Content Search.

Unindexed items included in the search results

Your organization might be required to identify and perform additional analysis on unindexed items to determine what they are, what they contain, and whether they’re relevant to a specific investigation. To include unindexed items with the search results, you can use the unindexed items option when you run a search, export search results, or prepare the results for Advanced eDiscovery. To include unindexed items, select the Include items that have an unrecognized format, are encrypted, or weren't indexed for other reasons checkbox when you're creating or modifying a search.

Keep the following in mind about unindexed items:

  • When you run a search that includes unindexed items, the number and total size of unindexed items (returned by the search query) are displayed in search statistics in the details pane.

  • When you export search results and include unindexed items, unindexed Exchange items are exported to a separate PST file for each mailbox in which they are located, or as individual messages if you select the option to download Exchange items as messages. Unindexed SharePoint items are exported to a folder named Uncrawlable.

  • If you choose to include all mailbox items in the search results, or if a search query doesn’t specify any keywords or only specifies a date range, unindexed items might not be copied to the PST file that contains the unindexed items. This is because all items, including any unindexed items, will be automatically included in the regular search results.

  • Unindexed items aren't available to be previewed. You have to export the search results to view unindexed items returned by the search.

Unindexed items excluded from the search results

If an item is unindexed but it doesn't meet the search query criteria, it won't be included as an unindexed item in the search results. In other words, the item is excluded from the search results. For example, let's say you run a search and don't include any keywords or properties because you want to include all content. But you include a date range condition for the query. If an unindexed item falls outside of that date range, it won't be included as an unindexed item. Date ranges are an effective way to exclude unindexed items from your search results.

Similarly, if you choose to include unindexed items when you export the results of a search, unindexed items that were excluded from the search results won't be exported.
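
As a rough illustration of the date range example, the following Python sketch filters a list of unindexed items by date. The function name, the tuple layout, and the sample file names are hypothetical, and treating both ends of the range as inclusive is an assumption rather than documented behavior.

```python
from datetime import date


def unindexed_items_in_scope(unindexed_items, start, end):
    """Keep only unindexed items whose date falls inside the query's date range.

    `unindexed_items` is a list of (name, item_date) tuples; this shape is an
    assumption made for the illustration.
    """
    return [name for name, item_date in unindexed_items if start <= item_date <= end]


items = [
    ("scan.bmp attached to a March message", date(2016, 3, 10)),
    ("locked.zip attached to a July message", date(2016, 7, 2)),
]
in_scope = unindexed_items_in_scope(items, date(2016, 1, 1), date(2016, 6, 30))
print(in_scope)
```

Only the March item stays in scope here; the July item falls outside the date range, so it would be neither counted nor exported as an unindexed item.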

Indexing limits for messages in Content Search

The following table describes the indexing limits that might result in an email message being returned as an unindexed item or a partially indexed item in a Content Search in Office 365.

For a list of indexing limits for SharePoint documents, see Search limits for SharePoint Online.

| Indexing limit | Maximum value | Description |
|---|---|---|
| Maximum attachment size (excluding Excel files) | 32 MB | The maximum size of an email attachment that will be parsed for indexing. Any attachment that's larger than this limit won't be parsed for indexing, and the message with the attachment will be marked as unindexed. Note: Parsing is the process in which the indexing service extracts text from the attachment, removes unnecessary characters like punctuation and spaces, and then divides the text into words (a process called tokenization) that are then stored in the index. |
| Maximum size of Excel files | 4 MB | The maximum size of an Excel file located on a site or attached to an email message that will be parsed for indexing. Any Excel file that's larger than this limit won't be parsed, and the file or the email message with the file attachment will be marked as unindexed. |
| Maximum number of attachments | 10 | The maximum number of files attached to an email message that will be parsed for indexing. If a message has more than 10 attachments, the first 10 attachments are parsed and indexed, and the message is marked as partially indexed because it had additional attachments that weren't parsed. |
| Maximum attachment depth | 1 | The maximum number of nested attachments that are parsed. For example, if an email message has another message attached to it and the attached message has an attached Word document, the Word document won't be indexed; only the attached message will be indexed. |
| Maximum number of attached images | 0 | An image that's attached to an email message is skipped by the parser and isn't indexed. |
| Maximum time spent parsing an item | 30 seconds | A maximum of 30 seconds is spent parsing an item for indexing. If the parsing time exceeds 30 seconds, the item is marked as partially indexed. |
| Maximum parser output | 2 million characters | The maximum amount of text output from the parser that's indexed. For example, if the parser extracted 8 million characters from a document, only the first 2 million characters are indexed. |
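
To make the size and count limits above concrete, here is a minimal Python sketch that applies three of them to a message's attachments. It is not how the indexing service is implemented: the Attachment class and the indexing_status function are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and checking sizes before the attachment count is an assumption about precedence that the limits themselves don't specify.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: binary megabytes

# Limits taken from the indexing limits listed above.
MAX_ATTACHMENT_BYTES = 32 * MB
MAX_EXCEL_BYTES = 4 * MB
MAX_ATTACHMENTS = 10


@dataclass
class Attachment:
    """Hypothetical attachment record used only for this sketch."""
    name: str
    size_bytes: int
    is_excel: bool = False


def indexing_status(attachments):
    """Return 'unindexed', 'partially indexed', or 'indexed' for a message."""
    # An attachment over its size limit is not parsed; the message is unindexed.
    for att in attachments:
        limit = MAX_EXCEL_BYTES if att.is_excel else MAX_ATTACHMENT_BYTES
        if att.size_bytes > limit:
            return "unindexed"
    # Only the first 10 attachments are parsed; extra ones make it partial.
    if len(attachments) > MAX_ATTACHMENTS:
        return "partially indexed"
    return "indexed"


print(indexing_status([Attachment("report.docx", 2 * MB)]))
print(indexing_status([Attachment("big.xlsx", 5 * MB, is_excel=True)]))
print(indexing_status([Attachment(f"file{i}.txt", 1024) for i in range(11)]))
```

The sketch leaves out attachment depth, images, parsing time, and parser output, which would need details about the message structure and the parser that this article doesn't describe.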

More information about unindexed items

  • As previously stated, because message and document properties and their metadata are indexed, a keyword search might return results if that keyword appears in the indexed metadata. However, that same keyword search might not return the same item if the keyword only appears in the content of an item with an unsupported file type. In this case, the item would be returned as an unindexed item.

  • If an unindexed item is included in the search results because it met the search query criteria (and wasn't excluded), then it's counted with the regular search results and not as an unindexed item in the estimated search statistics. For the same reason, it won't be included with unindexed items when you export search results.

  • Although a file type is supported for indexing and is indexed, there can be indexing or search errors that will cause a file to be returned as an unindexed item. For example, searching a very large Excel file might be partially successful (because the first 4 MB are indexed), but will then fail because the file size limit is exceeded. In this case, it’s possible that the same file is returned with the search results and as an unindexed item.

  • Attached files encrypted with Microsoft technologies are indexed and can be searched. Files encrypted with non-Microsoft technologies are unindexed.

  • Email messages encrypted with S/MIME aren't indexed. This includes encrypted messages with or without file attachments.

  • Messages protected using Information Rights Management (IRM) are indexed and will be included in the search results if they match the search query.
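
The encryption and protection rules in this article can be summarized as a small lookup table. The Python sketch below is only a restatement of those rules; the dictionary keys and the is_indexed function are hypothetical labels chosen for illustration and don't correspond to any Office 365 property names.

```python
# Indexing outcome by protection type, as stated in this article.
# The string keys are hypothetical labels chosen for this sketch.
IS_INDEXED = {
    "microsoft_encrypted_attachment": True,
    "non_microsoft_encrypted_attachment": False,
    "smime_encrypted_message": False,
    "irm_protected_message": True,
    "password_protected_file": False,
}


def is_indexed(protection: str) -> bool:
    """Look up whether content with the given protection is indexed."""
    return IS_INDEXED[protection]


for protection, indexed in IS_INDEXED.items():
    print(f"{protection}: {'indexed' if indexed else 'unindexed'}")
```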
