Managing Search Complexity through Simplicity

At the heart of any search solution is a good understanding of the business problem to be solved as well as knowledge of the available content and metadata. You have to work within the confines of the content you have (or can add with content enhancement). You have to analyze that content and be able to describe how you can identify some documents as being relevant for solving a business problem and why other documents are not relevant. That is just the beginning.

Developing an effective search solution is complex by its very nature: it brings many different pieces together to accomplish specific business objectives. Managing that complexity is one of the top challenges businesses face. The key to managing it and delivering an effective solution is to simplify the scope by dividing the space into meaningful layers:

Business Problem to Solve
    The end goal is to provide a space where employees, customers and contractors can search for and find accurate and timely information. After all, their job is not to search but to complete some other business process or transaction. Search is a tool for helping people complete their tasks successfully. Understanding the business problem to be solved and how it relates to the larger strategic plan of the business is vital for creating a successful search space.

User Environment
    This layer covers how many people will use the system, how they will access it (web, intranet, internet, mobile devices, embedded applications, etc.), and which business processes it must support.

Search Application
    Consists of the Query and Results sub-layers. It is a user interface which consists of various search tooling functionalities. People’s impression of the effectiveness of a search solution is often based on the search interface alone. After all, that is what people see and use. Too often businesses use an out-of-the-box interface without evaluating whether it is designed to solve the business problems driving the need to update search.

Search Tooling
    Search tooling is the tool set that a search platform provides to build search applications. Tooling may include search word boosting, relevance tuning, thesaurus, synonyms, stopword lists, facets, taxonomies, search analytics, knowledge extraction tooling, and more. Also included is how content sources are crawled, parsed and indexed. Different search platforms provide different tool sets. Understanding what tooling is available and how it works is key to being able to architect an effective search solution.
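
    As a rough illustration of two of the tools named above, stopword lists and synonyms, here is a minimal Python sketch of query preprocessing. The stopword list, the synonym map and the function name expand_query are all made up for the example; real platforms ship their own lists and apply them through their own configuration, so treat this as a picture of the idea rather than any product's behavior.

```python
# Hypothetical stopword list and synonym map, invented for this example.
# A real search platform supplies and manages its own.
STOPWORDS = {"the", "a", "an", "of", "for", "in"}
SYNONYMS = {"laptop": ["notebook"], "hr": ["human resources"]}


def expand_query(query):
    """Lowercase the query, drop stopwords, and add synonyms after each term."""
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    expanded = []
    for term in terms:
        expanded.append(term)
        # Append any configured synonyms right after the original term.
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


print(expand_query("The laptop policy for HR"))
```

    Even in a toy like this, the choices matter: dropping "for" is harmless here, but an overly aggressive stopword list can remove words that carry meaning in your content.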

Search Platform/Engine
    The software that provides the search tooling, whether it is IBM OmniFind Enterprise Edition, Endeca, Autonomy, FAST, Lucene/Solr or some proprietary solution. In addition to search tooling, you also want to pay attention to a platform’s scalability, fail-over, disaster recovery, system management, configuration management, system security and availability.

Content Enhancement
    Content enhancement or enrichment is sometimes required in order to develop a search solution that will solve a specific business need. This may mean that third-party data needs to be added to existing data. It may mean that knowledge extraction tools need to be used for unstructured (that is, non-fielded) data like that found in emails, reports and memo fields in databases.

Content & Metadata
    It is important to know the number of documents, content types (email, reports, database records, etc.), average document size, annual growth of your data stores, multilingual requirements, and governance strategy. It is also important to know the types of information that are explicitly available or can be extrapolated via content enhancement. This information will be the basis for building an effective search solution.
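
    To show why document counts, average size and annual growth belong together, here is a small Python sketch that projects raw content volume a few years out. It assumes simple compound growth at a constant yearly rate, which is a simplification, and the function name and the figures in the usage line are invented for illustration. It also estimates raw content only; index size depends on the platform.

```python
def project_corpus_bytes(doc_count, avg_doc_bytes, annual_growth, years):
    """Estimate raw content volume after `years` of compound growth.

    annual_growth is a fraction, e.g. 0.20 for 20% per year (an assumed,
    constant rate; real growth is rarely this regular).
    """
    current_bytes = doc_count * avg_doc_bytes
    return current_bytes * (1 + annual_growth) ** years


# Illustrative figures only: 1 million documents averaging 50 KB each,
# growing 20% per year, projected three years ahead.
estimate = project_corpus_bytes(1_000_000, 50_000, 0.20, 3)
print(f"{estimate / 1e9:.1f} GB")
```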

Security
    Security is a vital piece of any enterprise project. In the case of search, there is user authentication and authorization for accessing the search application and for the data that will be displayed in search results lists. (You don’t want just anybody to view HR data.) Search engine crawlers also require authentication and authorization for access to different data stores at the collection level. And then there is security and digital rights management at the individual document level within a given data store. In many cases this is the most complex IT piece of the search puzzle. However, solving this layer alone does not guarantee an effective search solution.
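
    The document-level piece can be pictured as filtering a results list against what the user is allowed to see. The Python sketch below is one simplified way to express that, assuming each document carries a set of groups permitted to view it. The Document class, its fields and the trim_results function are hypothetical names for this example; actual platforms enforce this through their own security models, often at index or query time.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical fields for illustration; real platforms model ACLs differently.
    doc_id: str
    title: str
    allowed_groups: set = field(default_factory=set)


def trim_results(results, user_groups):
    """Keep only documents that share at least one group with the user."""
    groups = set(user_groups)
    return [doc for doc in results if doc.allowed_groups & groups]


results = [
    Document("1", "Salary review 2011", {"hr"}),
    Document("2", "Employee handbook", {"hr", "employees"}),
]

# A user who is only in the "employees" group sees the handbook, not the HR data.
visible = trim_results(results, ["employees"])
print([doc.title for doc in visible])
```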

Storage Environment
    Knowing the number and types of data stores (portal, file system, FileNet8, Domino, Quickr, Documentum, etc.) is vital. Knowing how frequently the stores are updated and will need to be searched is another key piece of information. It is also important to know the format that data is stored in (PDF, database, flat files, etc.).

IT Infrastructure
    All of the above occurs within an IT framework that has many layers in its own right, which can include network, hardware, operating system, system software and more, depending on the environment.
