Getting Started with Enterprise Search
Steven Magid - Intervate Solutions
Italian fashion icon Aldo Gucci once remarked that “the bitterness of poor quality is remembered long after the sweetness of low price”. Whilst we can probably all identify with this sentiment, there are many areas in IT where good quality is indeed available across a wide range of prices, and I’m glad to say that my favourite topic, Enterprise Search, is a prime, if somewhat complex, example.
What is Enterprise Search?
Enterprise Search is the ability to search across all sources of information within and outside of an organisation, ideally from a single location.
The indomitable Google brand has created a common misconception that the technologies of the Web can be used to solve the organisational challenges of finding information. On the Internet, search engines borrow a principle from the world of academia, and rank results according to the popularity of a page – the more incoming links a page receives from other popular pages, the higher the page in question will be ranked.
This, however, doesn’t work well inside the organisation, where there is no vast web of hyperlinks to draw from. Even if there were, the relevancy of results in an area with only one subject matter expert should not be diminished just because the area lacks popularity. Organisations also invariably need to search not only unstructured content (web pages, documents, etc.) but also the structured data found in line-of-business databases. Additionally, companies commonly have a complex security model governing access to information, which must be adhered to and which is not an issue when searching the public web.
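To make the contrast concrete, here is a minimal Python sketch of link-popularity ranking in the spirit of the principle described above. It is a simplified illustration, not the algorithm any particular web search engine actually runs. The function name `link_popularity`, the damping value of 0.85 and the sample page names are all assumptions made for the example.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Score pages by incoming links (simplified, illustrative only).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical web site: plenty of links, so popularity is informative.
web = {"home": ["news", "docs"], "news": ["home"], "docs": ["home"], "blog": ["home"]}

# Hypothetical file share: documents do not link to one another.
intranet = {"policy.doc": [], "expert_report.doc": [], "memo.doc": []}

if __name__ == "__main__":
    for name, graph in (("web", web), ("intranet", intranet)):
        scores = link_popularity(graph)
        for page, score in sorted(scores.items(), key=lambda item: -item[1]):
            print(f"{name:9s} {page:18s} {score:.3f}")
```

On the small web graph, `home` collects the highest score because every other page links to it. On the intranet sample, where no document links to any other, every document receives an identical score, so link popularity gives the engine nothing to rank by.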
Enterprise Search Market Overview
Making an informed decision on an Enterprise Search solution involves much research and market awareness. The market is in the early stages of maturity, driven largely by a recent spate of mergers and acquisitions at the high end of the market, and commoditisation of search at the low end, where it’s becoming popular for vendors to release their entry-level solutions for free. In 2006, as an answer to the ultra-affordable Google Mini Appliance, IBM was the first to make search free with its release of IBM OmniFind Yahoo! Edition. Microsoft followed suit two years later, releasing Microsoft Search Server 2008 Express as a free download.
Whilst Gartner predicts the Enterprise Search market will eclipse $1.5 billion by 2012, the local market is lagging behind. Most major corporations and government agencies do not have a single search solution that covers a substantial portion of their enterprise’s content. The rapid proliferation of Portal and Enterprise Content Management platforms with embedded search, such as Microsoft Office SharePoint Server 2007, means this capability often exists, but poorly planned, flick-of-the-switch approaches to implementing search keep the real value lying dormant.
Choosing a Solution that is Right for You
The first step in deciding which Enterprise Search solution to procure is to clearly understand your needs. A number of well-differentiated products provide a solution for almost every business requirement, but the idea of “one search solution to rule them all” is pure fantasy. Rather, select your most important business outcomes and map them to a single scalable and robust solution that can provide for the majority of the requirements. Mid-enterprise solutions are capable of delivering on most non-specialist requirements and offer easier and quicker implementations.
You also ought to consider the scope and size of the corpus of content. The business outcome must be mapped to available processing capacity and storage space. This mapping is important because compliance search, for example, requires a large amount of storage space to ensure that no result is ever omitted, but it is lighter on the processor since the likelihood of many concurrent searches is low.
Next, you will need to assess the vendors in terms of available connectors and security. Connectors allow you to index content from different repositories and most search solutions are shipped with the main repository connectors. Security is important since without security trimming, you may find your staff viewing results with content that they are prohibited from accessing. One corporation reported that after indexing a particular repository, a search of an employee’s name returned results linking to disciplinary records for that individual. The employees could not open the documents, but the results page was enough to see which employees had these records, and that was damage enough!
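The disciplinary-records story above is a failure of security trimming, and the idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product implements it: the `Hit` class, the `allowed_groups` field and the group names are hypothetical, and a real engine would take its access lists from the source repository through its connector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result plus the groups allowed to see it (hypothetical shape)."""
    title: str
    url: str
    allowed_groups: frozenset


def security_trim(hits, user_groups):
    """Keep only the hits that the user is entitled to see.

    The filter runs before anything is rendered, so neither the title nor
    the link of a restricted document ever reaches the results page.
    """
    user_groups = set(user_groups)
    return [hit for hit in hits if hit.allowed_groups & user_groups]


# Hypothetical raw results for a search on an employee's name.
raw_hits = [
    Hit("Disciplinary record", "hr://records/1", frozenset({"hr"})),
    Hit("Staff directory entry", "intranet://people/1", frozenset({"hr", "staff"})),
]

if __name__ == "__main__":
    for hit in security_trim(raw_hits, {"staff"}):
        print(hit.title, hit.url)
```

The design point is that trimming happens on the result list itself rather than being left to the repository at the moment a document is opened, since the results page alone can disclose more than intended.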
By this stage in the product selection process you will probably have reduced your list of potential vendors to between two and four possibilities. This is where the bulk of the research begins as you try to distinguish between the features of these vendors. It helps to break these down into two major areas – interface architecture, and indexing and retrieval technology.
Interface architecture is concerned with executing searches and presenting the results. With access to a team of developers, you will usually have a wide array of options available. The Application Programming Interface (API) should allow your developers to exercise considerable control over the design of the interface – popular options include the development of parametric search and faceted navigation. In terms of presenting the results, a well-planned implementation will control what information (or metadata) is surfaced in the results, and in certain instances the results themselves will be made actionable.
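As an illustration of the faceted navigation mentioned above, the Python sketch below counts metadata values across a result set and then narrows the results when a user picks a value. It does not use any vendor’s API; the function names `facet_counts` and `refine`, and the `department` and `file_type` metadata fields, are assumptions chosen for the example.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count how many results carry each value of the chosen metadata fields."""
    counts = {field: Counter() for field in fields}
    for doc in results:
        for field in fields:
            value = doc.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def refine(results, **selected):
    """Narrow the results to those matching every selected facet value."""
    return [
        doc for doc in results
        if all(doc.get(field) == value for field, value in selected.items())
    ]


# Hypothetical results, each carrying metadata surfaced by the search engine.
results = [
    {"title": "Q1 budget", "department": "Finance", "file_type": "xls"},
    {"title": "Q2 budget", "department": "Finance", "file_type": "xls"},
    {"title": "Budget policy", "department": "Finance", "file_type": "doc"},
    {"title": "Marketing budget", "department": "Marketing", "file_type": "ppt"},
]

if __name__ == "__main__":
    for field, counter in facet_counts(results, ["department", "file_type"]).items():
        print(field, dict(counter))
    print([doc["title"] for doc in refine(results, department="Finance", file_type="doc")])
```

The counts are what the interface would show beside each facet value, and `refine` is the step that runs when the user clicks one.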
The indexing and retrieval technology is embedded within the search engine and is therefore an area where you will have very little control in terms of in-house development and customisation. This makes it the most important area to evaluate, yet also the most difficult. Because the underlying technology can’t be extensively modified, it may be important to perform a Proof-of-Concept prior to making the purchasing decision, since it is difficult to assess the accuracy of results and the available tuning options without first-hand experience and access to a large corpus of relevant and familiar content.
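One way to put numbers on result accuracy during a Proof-of-Concept is to have subject matter experts mark which documents are relevant for a handful of test queries, then measure how many of each engine’s top results appear in that list. The Python sketch below uses precision at k, a standard retrieval measure, as one possible yardstick; it is not a method prescribed by any vendor. The query strings, document identifiers and function names are hypothetical.

```python
def precision_at_k(returned_ids, relevant_ids, k=10):
    """Fraction of the top k returned documents that experts judged relevant.

    Divides by k even when fewer than k results come back, so an engine
    that returns very little is not flattered.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    top = list(returned_ids)[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k


def mean_precision_at_k(runs, judgements, k=10):
    """Average precision at k over every query that has expert judgements."""
    scores = [
        precision_at_k(runs.get(query, []), relevant, k)
        for query, relevant in judgements.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical expert judgements and one candidate engine's ranked results.
judgements = {
    "leave policy": {"doc-12", "doc-40"},
    "expense claim form": {"doc-7"},
}
candidate_engine = {
    "leave policy": ["doc-12", "doc-3", "doc-40", "doc-9"],
    "expense claim form": ["doc-5", "doc-7", "doc-8", "doc-1"],
}

if __name__ == "__main__":
    print(mean_precision_at_k(candidate_engine, judgements, k=4))
```

Running the same judgements against each shortlisted product gives a like-for-like comparison on your own content, which is exactly what a vendor demonstration on unfamiliar data cannot provide.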
Requirements here go a long way to determining how much you are going to spend. Content classification, clustering, fact and entity extraction, concept extraction, audio and visual search, and sentiment analysis are some popular features that will typically move you beyond the mid-enterprise mark and into the expensive high-end enterprise bracket.
Irrespective of the price paid for Enterprise Search, whether it’s a free download or a high-end super-system, search needs to be treated as a strategic technology. Companies need to understand exactly what they want from a search solution, and once a product is selected, its implementation needs to be carefully planned. Finally, businesses must learn that search is not a magic wand. It may clean the squalid and grimy window used to peer into the front yard, but if it’s an ugly, broken-down jalopy that lies beyond the window then that is what the users will see. Successful search projects therefore need to focus on both the product to be implemented and the state and quality of the data to be searched.