The Technical Track focuses on the latest text analytics techniques and methods: a how-to for developing text analytics foundations, incorporating the latest advances, and exploring new approaches. This track will appeal both to those just starting to develop text analytics capabilities and to those looking to enhance their current efforts.
View the Text Analytics Forum 2019 Final Program PDF
Wednesday, November 6: 11:45 a.m. - 12:30 p.m.
This talk describes the use of an enterprise knowledge graph as the semantic backbone of a text analytics application. Going beyond traditional hierarchical taxonomies, we demonstrate how we use a knowledge graph to enhance entity resolution, boost signal detection, and improve relevance scoring. We examine a use case where graph-informed tagging adds business value by surfacing connections between different facets of content and by driving personalization and user experience through precise metadata.
Dan Segal, Information Architect, IBM
Wednesday, November 6: 1:30 p.m. - 2:15 p.m.
This presentation covers text analytics and text processing techniques used in creating several interesting text-based knowledge graphs. One example is the Noam Chomsky Knowledge Graph, which incorporates hundreds of articles and numerous books that Chomsky has authored about linguistics, mass media, politics, and war. Another example covers health effects of ingredients in foods and beauty products. We show how a combination of AI techniques and knowledge graphs can transform text-heavy applications into an interactive response system usable by scientists, technologists, politicians, and scholars, as well as by smart applications, intelligent chatbots, question-answering systems, and other AI and data systems.
Jans Aasman, CEO, Franz Inc.
Wednesday, November 6: 2:30 p.m. - 3:15 p.m.
This session looks at how AI and data science may shape the world of project delivery, particularly projects with a high degree of complexity, but only if mediated by human sense-making and decision support. Our experienced and popular speaker takes methods from his counter-terrorism work (DARPA and other projects) and applies them to project management using a multi-methods approach. Get in on the ground floor and discover new ways of thinking about AI and analytics.
Dave Snowden, Founder & Chief Scientist, The Cynefin Company
Wednesday, November 6: 4:00 p.m. - 5:00 p.m.
A panel of four text analytics experts answers questions gathered before and during the conference, along with additional questions from the program chair. This was one of our most popular features last year, so come prepared with your favorite questions and be ready to learn!
Jeremy Bentley, Head, Strategy, MarkLogic
John Paty, Expert System
Mark Butler, VP Engineering, Voise, Inc.
Simon Taylor, VP, Partners & Alliances, Lucidworks
Thursday, November 7: 10:15 a.m. - 11:00 a.m.
Government influence operations (IO) have been conducted throughout recorded history. In recent times, they have commonly been referred to as propaganda, active measures, or psychological operations (PSYOPS). More than a century of Russian “Chekist” tradition has culminated in a force that can mobilize thousands of humans augmented by unlimited numbers of bots. As documented in congressional testimony, this force has repeatedly seized control of foreign news cycles, inserting sentiments or wholly fictional stories. A simple positive/neutral/negative axis is not as applicable to the IO mission as one specific to the operation in question, such as entity stability/instability, trustworthiness, or advocacy of violence. Given an IO action, such as promotion of an embarrassing story, the operator wants to measure the effect as change in sentiment, such as distrust of the now-discredited entity.
Christopher Biow, SVP, Global Public Sector, Basis Technology
Mike Harris, Director, Field Operations, Basis Technology Corp
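The operation-specific sentiment axes described in this session can be pictured as a lexicon-based scorer over a custom axis, measuring an action's effect as the change in score. This is a minimal illustrative sketch, not the speakers' method; the lexicon words and weights are hypothetical.

```python
# Hypothetical lexicon for a custom "trustworthiness" axis; the words
# and weights below are illustrative, not from the presented work.
TRUST_LEXICON = {
    "reliable": 1, "honest": 1, "trusted": 1,
    "corrupt": -1, "fraud": -1, "discredited": -1,
}

def axis_score(text, lexicon):
    """Average lexicon weight over matched words; 0.0 if none match."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

# Measuring an IO action's effect as the change along the axis.
before = "analysts call the agency reliable and honest"
after = "reports describe the agency as corrupt and discredited"
delta = axis_score(after, TRUST_LEXICON) - axis_score(before, TRUST_LEXICON)
print(delta)  # negative: a shift toward distrust
```

Real systems would of course use trained models and far richer axes, but the before/after delta is the core measurement idea.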
Regulations.gov was launched in 2003 to provide the public with access to federal regulatory content and the ability to comment on federal regulations. Manually reading thousands of comments is time-consuming and labor-intensive, and it is difficult for multiple reviewers to assess content, themes, stakeholder identity, and sentiment accurately and consistently. To address this, text analytics can be used to develop transparent and accurate text models, and visual analytics can quantify, summarize, and present the results of that analysis. This talk addresses public commentary submitted in response to new product regulations by the U.S. Food and Drug Administration.
Emily McRae, Systems Engineer, SAS
Thursday, November 7: 11:15 a.m. - 12:00 p.m.
Due to subjective content, an absence of labels, and a lack of dimensions, analyzing unstructured data can be a challenging task. In this session we discuss improving unstructured data analysis through automation (including a human in the loop), pre-processing capabilities for reducing noise, and options for feature engineering and extraction. You also get an overview of our hybrid analytical platform, which combines Natural Language Processing with Machine Learning and statistical techniques to deliver rich insights. Our hope is that, regardless of your platform of choice, you come away with ideas that make your own analysis easier and more effective.
Sundaresh Sankaran, Solutions Architect, Global Technology Practice, SAS Institute
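The noise-reduction pre-processing mentioned in this session might, in its simplest form, look like the generic sketch below. This is not SAS's platform; it is a minimal illustration of the kind of cleaning pass that typically precedes feature extraction.

```python
import re

def preprocess(text):
    """Minimal noise-reduction sketch: lowercase, strip URLs and
    punctuation, collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop punctuation/symbols
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

print(preprocess("Check THIS out: https://example.com/a!!"))
```

Production pipelines add language-aware steps (tokenization, stop-word handling, lemmatization), but the principle of removing noise before feature engineering is the same.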
One fundamental obstacle to using machine learning (ML) to accurately extract facts from free-text documents is that it requires huge amounts of pre-categorized data for training. Manual annotation is not a viable option, as it would entail enormous amounts of human analyst time. In this presentation we outline an innovative rule-based approach for the automated generation of pre-categorized data that can then be used to train ML models. The approach relies on queries written in a powerful pattern-definition language that fully exploits the results of the underlying natural language processing (NLP): deep linguistic, semantic, and statistical analysis of documents. Applying rule-based and ML techniques sequentially yields highly accurate results. An example project illustrating this technology focuses on the automated extraction of clinical information from patient medical records.
Sergei Ananyan, CEO, Megaputer Intelligence
Elli Bourlai, Senior Computational Linguist, Megaputer Intelligence
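The idea of rules generating training data for ML can be sketched as simple labeling patterns applied to raw sentences. The patterns below stand in for the pattern-definition language the presenters describe; the rules, categories, and example sentences are hypothetical illustrations.

```python
import re

# Hypothetical rule patterns that pre-categorize sentences, producing
# (sentence, label) pairs that could later train an ML model.
RULES = {
    "medication": re.compile(r"\b(mg|tablet|dose|prescribed)\b", re.I),
    "diagnosis": re.compile(r"\b(diagnosed with|history of|presents with)\b", re.I),
}

def auto_label(sentences):
    """Apply rule patterns to generate labeled training pairs.
    Sentences matching no rule stay unlabeled rather than guessed."""
    labeled = []
    for s in sentences:
        for label, pattern in RULES.items():
            if pattern.search(s):
                labeled.append((s, label))
                break  # first matching rule wins
    return labeled

notes = [
    "Patient prescribed 50 mg of atenolol daily.",
    "Presents with acute chest pain.",
    "Follow-up scheduled in two weeks.",
]
print(auto_label(notes))  # third note matches no rule and is skipped
```

High-precision rules like these trade coverage for accuracy, which is exactly what makes their output usable as training data for a broader-coverage ML model.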
Thursday, November 7: 1:00 p.m. - 1:45 p.m.
AI promises to categorize all types of content with reliable results, but the reality is much more complex. Most applications won’t work with a meat-grinder approach, where you pour a huge amount of content in one end and a perfectly organized collection comes out the other. Effective automated categorization depends on defining a process workflow and assembling a stack of methods to process different types of content in different ways. Designing and validating a content processing workflow requires human judgments, so good-quality categorization applications often depend on making the best use of people. This presentation provides a reality check on unsupervised automated categorization and discusses a case study in which the performance was suitable for editorial review and approval, but not for unsupervised processing of a large collection.
Joseph Busch, Principal, Taxonomy Strategies
There is no such thing as unstructured text; even tweets have some structure: words, clauses, phrases, even the occasional paragraph. Techniques that treat documents as undifferentiated bags of words have never achieved high enough accuracy to build good auto-categorization, whether using machine learning (ML) or rules. By going beyond bags of words and utilizing the structures found in “unstructured” text, however, it is possible to achieve dramatically improved accuracy. Using multiple examples from recent projects, this talk presents how to build content structure models and content structure rules that can be used for both rules-based and ML categorization. We conclude with a method for combining rules and ML in a variety of ways for the best of both worlds.
Tom Reamy, Chief Knowledge Architect & Founder, KAPS Group and Author, Deep Text
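One common way to combine rules and ML, of the kind this talk surveys, is to let a high-precision rule pass decide first and fall back to an ML classifier otherwise. The sketch below is a generic illustration under that assumption; the rule, categories, and stand-in classifier are hypothetical, not the speaker's implementation.

```python
# Illustrative hybrid pipeline: high-precision rules first, ML fallback.

def rule_categorize(doc):
    """Toy high-precision rule: an explicit leading keyword implies a
    category; return None when no rule fires."""
    if doc.lower().startswith("invoice"):
        return "billing"
    return None

def ml_categorize(doc):
    """Stand-in for a trained classifier (hypothetical)."""
    return "general"

def hybrid_categorize(doc):
    """Return (label, source) so downstream review knows which stage decided."""
    label = rule_categorize(doc)
    if label is not None:
        return label, "rules"
    return ml_categorize(doc), "ml"

print(hybrid_categorize("Invoice #42: amount due"))      # rule fires
print(hybrid_categorize("Meeting notes from Tuesday"))   # falls back to ML
```

Tracking which stage produced each label also makes it easy to audit precision per stage, which matters when tuning the rule set.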
Thursday, November 7: 2:00 p.m. - 2:45 p.m.
The American Psychological Association’s PsycINFO databases release around 3,000 records per month. In June 2017, a plan was created to bring machine-aided indexing (MAI) back to the APA’s PsycINFO databases. Since then, MAI has been implemented across three of the databases, including PsycARTICLES. Pearson discusses the strategy used to build the rule base and integrate the software into the production system. He also looks at some of the challenges faced along the way and explores future goals and further deployment plans.
Christopher Pearson, Machine-Aided Indexing Specialist, Content Management, American Psychological Association
Thursday, November 7: 3:00 p.m. - 3:45 p.m.
Products like Amazon Alexa and Google Home are changing expectations of how search should work. Searchers now expect voice-driven search solutions that provide answers, not just a list of links. This talk shares how knowledge graphs enable natural language search and how text analytics, along with machine learning, can be used to populate these powerful constructs. We explain how to architect these solutions and provide real-world examples of how many of our clients have taken advantage of these powerful tools.
Joseph Hilger, COO, Enterprise Knowledge, LLC
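At its core, answering a question from a knowledge graph means looking up a fact rather than ranking documents. The toy sketch below shows that distinction with a graph reduced to subject-predicate-object triples; the entities and predicates are hypothetical, and real systems would use an RDF store and a query language such as SPARQL.

```python
# Toy knowledge graph as subject-predicate-object triples (hypothetical data).
TRIPLES = [
    ("Model X100", "hasWarranty", "2 years"),
    ("Model X100", "manufacturedBy", "Acme Corp"),
]

def answer(subject, predicate):
    """Return a direct answer for a (subject, predicate) question,
    or None when the graph holds no matching fact."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None

# "What is the warranty on the Model X100?" parses to a graph lookup:
print(answer("Model X100", "hasWarranty"))  # an answer, not a list of links
```

The hard parts the talk addresses, parsing the spoken question into such a lookup and populating the graph via text analytics, sit on either side of this lookup step.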
Like many enterprises, the Inter-American Development Bank (IDB) has multiple information sources isolated in different systems. No links connect these resources or make them accessible outside their native systems, and it is not possible to relate distinct kinds of resources that share characteristics, e.g., to find a course on the same topic as a publication. To address this, IDB implemented a system that automatically extracts entities and concepts from its systems, spanning structured and unstructured data, semantically enriches the data, and makes it accessible in a knowledge graph. Hernandez and Marino share lessons learned from this project that can help attendees start with a baseline of best practices for their own projects, saving valuable time and money.
Chris Marino, Senior Consultant, Enterprise Knowledge
Monica Hernandez, Senior Project Manager, Inter-American Development Bank