I've been involved with an initiative to transform search and information discovery at a company for a while now. We're trying some genuinely groundbreaking things, but it ultimately comes down to getting the right information to the user when they ask for it, and we tend to call this relevance. Relevance is a tricky thing because, to a large degree, it is contextual. There are aspects you can anticipate or control, and others you never will. For these reasons, I've devised a three-facet approach to discussing relevance.
Collection: Ensuring that you have the right stuff in your bucket while keeping the wrong stuff out. Most search experiences could be improved a great deal by simply removing all the crap that nobody's interested in from the index.
Selection: Being thoughtful about how you are pulling items from the bucket at query time. This is far and away the most difficult facet to get right. It means building a platform that anticipates user needs as it translates the user request into a response made up of items from the bucket in Facet 1.
Inspection: Giving the user the right levers to pull to refine the response from the platform. Where the platform's ability to anticipate user needs stops, we can turn control over to the user to create a result set that is more satisfactory based on their specific needs.
Now that we have pithy names for each facet, let's dig into each.

"Search relevance is driven by collection, selection, and inspection."
The first thing to do is to make sure that the information you are searching is as clean as possible; in Claude Shannon's terms, improve the signal-to-noise ratio. Too often, the thinking is that better technology will do a better job of managing the same ol' pile of crap that the last "better" technology failed to manage. But it's not just about removing stuff from your indices. Cleaning house is good, but the problem is often that there's no good description of the information to help the technology understand it. Segmenting your content into rational collections is a good start. Don't just organize by where it's stored; organize the content by purpose.
A solid metadata strategy helps, but metadata is difficult to maintain and to keep consistent. Semantic technology and machine learning can be leveraged to ferret out the relationships between unstructured and structured data and create dynamic metadata. Whatever you can do on any of these fronts will only help.
More than anything, a solid model for your knowledge domain will help create a rational pool of information. I work in professional services, so the kernel of our knowledge graph is "People doing Work for Companies." In a retail company, it might be "Sales Reps selling Products to Customers." Whatever the primary function of your organization is, use that as the core of the knowledge model, and thoughtfully add information in a way that lets it hang off that structure. Having explicit entities defined within the knowledge domain not only makes the information segmentation more effective, but also allows for more specific requests for information.
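To make the idea concrete, here is a minimal sketch of such a core model, using the professional-services kernel above. The entity types, ids, and relation names are all illustrative, not a real schema:

```python
from dataclasses import dataclass, field

# Hypothetical core model: "People doing Work for Companies."
# All names here are invented for illustration.

@dataclass
class Entity:
    id: str
    type: str          # "person", "engagement", or "company"
    attrs: dict = field(default_factory=dict)

@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)   # id -> Entity
    edges: list = field(default_factory=list)      # (subject, relation, object)

    def add(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def relate(self, subj: str, relation: str, obj: str) -> None:
        self.edges.append((subj, relation, obj))

    def neighbors(self, entity_id: str, relation: str) -> list:
        return [o for s, r, o in self.edges if s == entity_id and r == relation]

kg = KnowledgeGraph()
kg.add(Entity("p1", "person", {"name": "Ana"}))
kg.add(Entity("w1", "engagement", {"name": "ERP rollout"}))
kg.add(Entity("c1", "company", {"name": "Acme Corp"}))
kg.relate("p1", "works_on", "w1")   # a Person does Work
kg.relate("w1", "for", "c1")        # the Work is for a Company

# Content can now "hang off" the structure: a document tagged with w1
# is implicitly related to both the person and the company.
print(kg.neighbors("p1", "works_on"))
```

The payoff is that everything added later only has to declare how it relates to one of these core entities to become reachable from all of them.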
Once you have a well-organized collection, having an algorithm that is tuned to pull the most relevant information is the next step.
Parsing: Understanding what the user is asking for is critical to responding effectively. Natural Language Processing allows you to parse the user's query and identify what they are asking for. Mapping entities found in the user input to nodes on the knowledge domain graph allows you to reach into the pool of data and return the most relevant information based on where it hangs off that graph.
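A toy version of that mapping step might look like the following. A real system would use a proper NER pipeline; the dictionary lookup and the sample data here are just to illustrate the flow from query text to graph nodes to attached documents:

```python
# Map known entity names in a query onto knowledge-graph node ids,
# then retrieve the documents "hanging off" those nodes.
# ENTITY_INDEX and DOCS_BY_NODE are invented sample data.

ENTITY_INDEX = {            # surface form -> graph node id
    "acme": "company:acme",
    "erp rollout": "work:erp_rollout",
    "ana": "person:ana",
}

DOCS_BY_NODE = {            # node id -> documents attached to that node
    "company:acme": ["Acme master service agreement", "Acme account plan"],
    "work:erp_rollout": ["ERP rollout status report"],
}

def parse_query(query: str) -> list:
    """Return the graph node ids mentioned in the query."""
    q = query.lower()
    return [node for phrase, node in ENTITY_INDEX.items() if phrase in q]

def retrieve(query: str) -> list:
    """Gather documents attached to every node the query mentions."""
    docs = []
    for node in parse_query(query):
        docs.extend(DOCS_BY_NODE.get(node, []))
    return docs

print(retrieve("show me the ERP rollout work at Acme"))
```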
Who's askin'?: There are a number of data points about a specific user that can help tune the algorithm to present the most relevant results. If the user is logged in, you can use profile data about that user to help determine what is most appropriate. With mobile, it is possible to determine where and when the question is being asked. It is also possible to incorporate "I/me" as a query concept (e.g. "What's my vacation balance?" or "Who have I worked with?"). The more you know about the user, the more relevant the results you can return.
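One way to picture this is a pre-processing step that folds what we know about the asker into the query before it reaches the ranking engine. The field names and signals below are hypothetical:

```python
from datetime import datetime

# Sketch: enrich a raw query with signals about who is asking, where,
# and when. All field names here are invented for illustration.

def contextualize(query: str, user: dict, when: datetime = None) -> dict:
    """Rewrite first-person references and attach profile/context signals."""
    q = query.lower()
    structured = {"text": q, "signals": {}}

    # "I/me/my" as a query concept: bind it to the logged-in user.
    if any(w in q.split() for w in ("i", "me", "my")):
        structured["signals"]["person_id"] = user["id"]

    # Profile data can bias results toward the user's own context.
    structured["signals"]["department"] = user.get("department")

    # On mobile, when (and where) the question is asked is available too.
    if when is not None:
        structured["signals"]["hour_of_day"] = when.hour
    return structured

ctx = contextualize(
    "What's my vacation balance?",
    user={"id": "p1", "department": "consulting"},
    when=datetime(2024, 5, 1, 9, 30),
)
print(ctx["signals"])
```

With "my" bound to a person id, the platform can answer from that user's own records rather than returning the generic vacation policy.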
Biasing: In certain cases, some sources of information will be more important than others. In other cases, information priority may depend on who is asking the question, or when and where they are asking it. In still other cases, priority may be based on specifically what is being asked. It's important to note that, just like security, biasing can be early- or late-binding, and a mix of these approaches will yield the best results. It is also important to bias thoughtfully, because boosting one source, by definition, means demoting the others.
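A small sketch of both binding times together, with invented sources, weights, and scores: a static per-source weight is the early-binding bias, and a query-time boost keyed to the asker is the late-binding one:

```python
# Source biasing sketch. Weights and sample results are illustrative.
# Early binding: static per-source weight, fixed ahead of query time.
# Late binding: a boost decided at query time, based on who is asking.

SOURCE_WEIGHT = {"hr_portal": 1.5, "wiki": 1.0, "shared_drive": 0.6}

def rank(results: list, asker_department: str = None) -> list:
    """Re-rank results; boosting one source implicitly demotes the rest."""
    scored = []
    for doc in results:
        s = doc["base_score"] * SOURCE_WEIGHT.get(doc["source"], 1.0)  # early
        if asker_department and doc.get("department") == asker_department:
            s *= 1.3                                                   # late
        scored.append((s, doc["title"]))
    return [title for s, title in sorted(scored, reverse=True)]

results = [
    {"title": "Vacation policy", "source": "hr_portal", "base_score": 0.7},
    {"title": "Old policy draft", "source": "shared_drive", "base_score": 0.9},
    {"title": "Consulting handbook", "source": "wiki", "base_score": 0.6,
     "department": "consulting"},
]
print(rank(results, asker_department="consulting"))
```

Note how the highest raw score (the stale shared-drive draft) ends up last once the weights apply: that is the demotion side of the trade-off.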
No matter how clean you keep the pool of data you draw from, and no matter how intelligent you make the algorithm that reaches in and grabs information for the user, you will never be able to fully predict the context in which the user is asking the question. The best you can do is provide them with what seem to be the most relevant materials based on the way they phrased the query. At that point, it's best to turn control over to the user and let them fine-tune the relevance of the result set.
Our research showed that users overwhelmingly want information discovery to be a dialog. Users were perfectly fine if the first result set wasn't perfect, as long as you provided the affordances to refine it on their own. This means all the usual clustering and filtering of results, as well as the ability to search within results to get at precisely what is desired.
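Those refinement affordances can be pictured as repeated applications of one small operation. The result data and field names below are made up:

```python
# Sketch of the refinement dialog: filter a result set by a facet,
# then search within what remains. Sample data is invented.

results = [
    {"title": "Acme account plan", "type": "document", "year": 2024},
    {"title": "Acme billing dashboard", "type": "tool", "year": 2024},
    {"title": "Acme 2019 retrospective", "type": "document", "year": 2019},
]

def refine(results: list, facet: str = None, value=None, within: str = None) -> list:
    """One refinement step: facet filter and/or search-within-results."""
    out = results
    if facet:
        out = [r for r in out if r.get(facet) == value]
    if within:
        out = [r for r in out if within.lower() in r["title"].lower()]
    return out

step1 = refine(results, facet="type", value="document")   # cluster/filter
step2 = refine(step1, within="account")                   # search within results
print([r["title"] for r in step2])
```

Each step narrows the previous result set rather than starting over, which is what makes the interaction feel like a dialog.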
It was also important to respond with different kinds of results: a specific data point, a pointer to a source of record, or even tools to continue to explore information possibilities (e.g. a break-even analysis tool). Returning a list of links to documents is not enough anymore. We enhanced this capability by providing the ability to perform computational analysis of the underlying data in order to provide answers that were derived from (rather than stored in) the source data.
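The break-even example can be made concrete. The point of a derived answer is that it is computed from underlying figures rather than stored in any document; the numbers and formula inputs below are invented:

```python
# Derived answer sketch: the break-even point exists nowhere in the
# source data; it is computed on demand. Figures are illustrative.

def break_even_units(fixed_costs: float, price: float, variable_cost: float) -> float:
    """Units at which revenue covers fixed plus variable costs."""
    if price <= variable_cost:
        raise ValueError("price must exceed variable cost per unit")
    return fixed_costs / (price - variable_cost)

# A search platform could answer "when does this project break even?"
# by pulling these numbers from the source of record and computing:
units = break_even_units(fixed_costs=50_000, price=120.0, variable_cost=70.0)
print(f"Break-even at {units:.0f} units")  # 50,000 / (120 - 70) = 1,000 units
```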
Thoughtfulness is at the core of a successful and relevant search experience. Begin with a thoughtful model that describes the knowledge domain. Then thoughtfully incorporate data into the pool of available information in a way that relates to the knowledge model. Be thoughtful in the way items are selected from the information pool, so that the user gets the best possible starting point for information discovery. Finally, provide thoughtful levers for the user to pull to apply their specific context and refine the returned results. If you manage to hit on all of these aspects, the resulting information is far more likely to satisfy the user's needs.