Why Build Custom Categorizers Using Boolean Queries Instead of Machine Learning? Robert Wood Johnson Foundation Case Study

Joseph Busch, Vivian Bliss

Abstract


This presentation will cover a case study in using Boolean queries to scope custom categories, provide a Boolean query syntax primer, and then present a step-by-step process for building a Boolean query categorizer. The Robert Wood Johnson Foundation (RWJF) is the largest philanthropy dedicated solely to health in the United States. Taxonomy Strategies has been working with RWJF to develop an enterprise metadata framework and taxonomy that supports needs across areas including program management, research and evaluation, communications, and finance. We have also been working with RWJF on methods for applying automation to support taxonomy development and implementation within their various information management applications. Machine learning has become a popular and hyped method promoted by large information management application vendors including Microsoft, IBM, Salesforce and others. The benefit of machine learning is that you don’t need to do any preparation; content just gets processed. The problem is that it is opaque: the categories are generic, may be irrelevant, can be biased, and are difficult to change or tune. Pre-defined categories (e.g., a controlled vocabulary or taxonomy) plus Boolean queries to scope the context for each category are much more transparent. The benefit is relevant categories. The problem is that pre-defined categories require work to set up, as well as specialized skills. But how hard is it to do this?
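To give a rough sense of what scoping a pre-defined category with a Boolean query can look like, here is a minimal Python sketch. It is an illustration only: the category names, the query terms, and the helper functions `matches` and `categorize` are hypothetical and are not taken from the RWJF taxonomy or from any particular vendor's query syntax.

```python
import re

# Hypothetical categories and queries, for illustration only.
# A query is either a term (string) or a tuple:
#   ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q)
QUERIES = {
    "Childhood Obesity": (
        "AND",
        ("OR", "child", "children", "youth"),
        ("OR", "obesity", "overweight"),
    ),
    "Health Insurance Coverage": (
        "AND",
        ("OR", "insurance", "medicaid", "uninsured"),
        ("NOT", "auto insurance"),
    ),
}


def matches(query, text):
    """Evaluate a nested Boolean query against lowercased text."""
    if isinstance(query, str):
        # A term matches only as a whole word or whole phrase.
        pattern = r"\b" + re.escape(query) + r"\b"
        return re.search(pattern, text) is not None
    op, *args = query
    if op == "AND":
        return all(matches(arg, text) for arg in args)
    if op == "OR":
        return any(matches(arg, text) for arg in args)
    if op == "NOT":
        return not matches(args[0], text)
    raise ValueError(f"Unknown operator: {op}")


def categorize(text):
    """Return the names of all categories whose query matches the text."""
    lowered = text.lower()
    return [name for name, query in QUERIES.items() if matches(query, lowered)]


if __name__ == "__main__":
    print(categorize("A program to reduce obesity among children in rural schools."))
    print(categorize("Compare auto insurance rates before you buy."))
```

Because each category is defined by a query that can be read directly, it is easy to see why a document was or was not assigned to it, and tuning a category means editing its terms rather than retraining a model.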