The Applications Track focuses on the types of applications that can be built using text analytics techniques – the newest kinds of applications, how they deliver business value, and how to organize your text analytics teams to build them. This track will appeal to those who have some text analytics capabilities but want to learn what others are doing to develop new applications, as well as those who are primarily interested in the business side of text analytics.
Wednesday, November 7: 1:30 p.m. - 2:15 p.m.
Performing and synthesizing text analysis is not an easy task; it requires a range of disciplines. In this session, Chung and Duddempudi share lessons learned from their journey developing this capability for their team, centered on these areas: the disciplines required to perform text analysis; generating the right level of insights to answer business questions; integrating text analysis into business operations; and determining criteria to select the right tools.
Alice Chung, PMP, Certified Innovation Manager (GIMI), Senior Analytics Manager, Medical Insights Lead, Genentech
Deepthi Duddempudi, Data Scientist, Incedo
There are many tools available that provide the building blocks for automated tagging applications, including NLP, entity extraction, summarization, and sentiment analysis. Many tools and search engines also include a content categorizer, but these usually categorize to IPTC news or Wikipedia categories. What if you want to categorize to some other scheme, or to a set of custom subjects relevant to you or your organization's areas of interest? Boolean queries are a useful way to scope custom categories, and they are also the most transparent method for specifying the rules a content categorizer uses to automate tagging against predefined categories or taxonomies. This session provides a quick review of Boolean query syntax, then presents a step-by-step process for building a Boolean query categorizer.
Joseph Busch, Principal, Taxonomy Strategies
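For flavor, here is a minimal sketch of what a Boolean-rule categorizer can look like in code. It is not Busch's implementation, and the categories and rules below are invented; it simply illustrates the pattern of scoping a custom category with AND/OR/NOT conditions over term matches:

```python
import re

def tokens(text):
    """Lowercase word tokens for simple term matching."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

# Each category maps to a Boolean predicate over the document's tokens.
# These example rules are purely illustrative.
RULES = {
    "food safety": lambda t: ("contamination" in t or "recall" in t) and "food" in t,
    "mergers": lambda t: "merger" in t or ("acquisition" in t and "company" in t),
}

def categorize(text):
    """Return all categories whose Boolean rule matches the document."""
    t = tokens(text)
    return [cat for cat, rule in RULES.items() if rule(t)]

if __name__ == "__main__":
    doc = "The company announced a recall after food contamination was found."
    print(categorize(doc))  # ['food safety']
```

Production categorizers add phrase matching, proximity operators, and weighting, but the transparency of the rules is the same: anyone can read a rule and see exactly why a document was tagged.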
Wednesday, November 7: 2:30 p.m. - 3:15 p.m.
Enterprises have finally mastered the art and science of gathering enough data, but some struggle to make it meaningful – particularly when dealing with unstructured text. Cutting-edge machine learning initiatives will help, but success depends on consistently organized and labeled data. For enterprises with data from multiple sources in multiple types, a general taxonomy is an important pre-processor to normalize, clean, and tag the data. Explore how users across the enterprise can, in real time, leverage unstructured text data sets for a wide range of business applications.
Stephen Scarr, CEO and Co-Founder, eContext
Incredible technical capabilities and a myriad of implementation strategies are the real excitement … for us. How do you get the movers and shakers excited too? Begin by translating the possible—and something that sounds like magic—into something relatable. Berndt and Kent start the discussion on the many ways organizations are benefiting from text analytics and illustrate the value of taxonomies, ontologies, and semantics in a text analytics infrastructure—all with an eye toward helping you navigate the financial and organizational barriers to a successful text analytics project.
Sarah Ann Berndt, KM & Social Learning Program Manager, Knowledge Management & Social Learning, TechnipFMC
Evelyn L. Kent, Principal, Bacon Tree Consulting
In this talk, Garcia and Raya share how Grupo Imagen applies text mining solutions and calculates their ROI. They began exploring data mining, machine learning, and artificial intelligence to answer business questions and to produce new KPIs, dashboards, and BI systems. Although the solution is fully conceptualized, the great challenge is carrying it out with limited monetary and human resources. The fundamental challenge for implementation is satisfying the very basic business equation: Profit = Sales - Costs (including research and development). Now that the research has been done, the remaining obstacle is cost. When your CPM is $5, can you afford IBM Watson, or should you start from scratch and build a customized low-cost solution?
Daniel Villegas Raya, Audience Development Manager, Grupo Imagen
Thursday, November 8: 10:15 a.m. - 11:00 a.m.
The AI hype is rapidly spreading into C-suites of organizations around the world, and with good reason—the promise is compelling. The convergence of AI, robotic process automation (RPA), machine learning, and cognitive platforms allows employees to focus their attention on exception processing and higher-value work while digital labor manages low-value, repetitive tasks. While the debate over whether digital labor will add or eliminate jobs is ongoing, what matters in today's enterprise is how digital and human labor can be integrated to improve efficiency and drive innovation. Using real-world examples, this session covers how machine processing, guided by human knowledge, curation, and control, provides assisted intelligence (AI) to organizations that want to streamline processes, reduce operating costs, and improve ROI.
Jeremy Bentley, Head, Strategy, MarkLogic
Traditional knowledge platforms cannot effectively understand unstructured information due to the complexity of language and the lack of structure, so they cannot organize disparate sources of knowledge (marketing material, customer service content, emails, chat logs, social media chatter, customer response surveys, internal documentation, etc.) in any meaningful way. Addressing the complexity of language requires more than keywords, more than spending weeks and months manually tagging data, and more than locating topic-specific content to train machine learning algorithms. This presentation explains the concepts behind an AI/cognitive computing platform, what makes it work, and how it can be deployed as a smart infrastructure to support a variety of business objectives. The session includes a demonstration of an English-language knowledge graph (ontology), a customer support self-service mobile solution, and a smart content navigation portal.
Bryan Bell, Regional Vice President of Sales, Lucidworks
Thursday, November 8: 11:15 a.m. - 12:00 p.m.
FINRA receives hundreds of thousands of documents each year from stockbrokers and investors to be reviewed and analyzed by our investigators. The investigators look for information about the who, what, where, when, and how contained in these documents, which makes review labor-intensive. Our solution was to develop a system that leverages NLP, machine learning, and graph databases. An enhanced NER model combined with our custom entity resolution algorithms allowed us to extract individuals, organizations, and FINRA-specific entities and to map these entities into FINRA's business systems. Entities were loaded into the Titan graph database, which supported navigation between documents, individuals, and organizations, visually highlighting hard-to-see patterns and insights. In addition, our NLP process allowed us to generate document summaries. This system significantly improved the effectiveness and comprehensiveness of investigators' document review.
Dmytro Dolgopolov, Senior Director, Financial Industry Regulatory Authority (FINRA)
Greg Wolff, Enterprise Software Architect, Financial Industry Regulatory Authority (FINRA)
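As a rough illustration of the pipeline shape described above (not FINRA's actual system), the sketch below uses spaCy for NER, a toy name-normalization step in place of FINRA's custom entity resolution, and networkx as a stand-in for the Titan graph database:

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def resolve(name):
    """Toy entity resolution: normalize whitespace and case.
    Real systems use far richer matching (aliases, registration IDs, etc.)."""
    return " ".join(name.lower().split())

def build_graph(documents):
    """Link each document to the people and organizations it mentions."""
    g = nx.Graph()
    for doc_id, text in documents.items():
        g.add_node(doc_id, kind="document")
        for ent in nlp(text).ents:
            if ent.label_ in ("PERSON", "ORG"):
                key = resolve(ent.text)
                g.add_node(key, kind=ent.label_)
                g.add_edge(doc_id, key)
    return g

docs = {"tip-001": "Jane Doe of Acme Securities contacted John Smith."}
g = build_graph(docs)
print(list(g.edges("tip-001")))
```

Once documents and entities share a graph, investigators can traverse from a person to every document mentioning them and on to co-mentioned organizations, which is the navigation pattern the session describes.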
To reduce waste in DoD research, the Defense Technical Information Center (DTIC) is developing a document similarity application to identify forms of fraud, waste, and abuse, such as equivalent work being done by different services. The tool will give DTIC the capability to apply content analytics across a large collection of documents, including requests for proposals, proposals, technical reports, and project descriptions. The challenge goes beyond identifying simple copy-and-paste duplication. In this presentation, we discuss our hybrid approach to evaluating document similarity, which combines multiple techniques, including vector space models, semantic similarity, and a novel text analytics approach called trains-of-thought analysis. We also demonstrate the web-based application, including real-time document similarity analysis and data visualizations that speed the discovery and assessment of similar content.
Hany Mohammed, Senior Information Architect, Defense Technical Information Center (DTIC)
Lowell Vizenor, CTO, Defense Technical Information Center (DTIC)
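The vector-space-model strand of such a hybrid can be shown in a few lines: TF-IDF vectors compared by cosine similarity flag near-duplicate text. This sketch (with invented proposal snippets) covers only that one strand; it would miss paraphrased duplication, which is where semantic and trains-of-thought techniques come in:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Proposal to develop a corrosion-resistant coating for naval aircraft.",
    "We propose developing corrosion-resistant coatings for navy aircraft.",
    "A study of unmanned ground vehicle navigation in urban terrain.",
]

# Vectorize documents and compute pairwise cosine similarity.
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)
sims = cosine_similarity(vectors)

# Report pairs above an arbitrary similarity threshold.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sims[i, j] > 0.5:
            print(f"docs {i} and {j} look similar: {sims[i, j]:.2f}")
```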
Thursday, November 8: 1:00 p.m. - 1:45 p.m.
Using NLP and linguistics, Saini presents Sapient's work developing an unsupervised-learning-based question-answering system. The talk showcases a demo on real-life data, explains the process of building such automated Q&A systems, and discusses the shortcomings of chatbot systems and how they can be integrated with Q&A systems to make them scalable.
Anuj Saini, Architect NLP, Sapient Corp.
Sanjani introduces a semantic search and browse tool that aims to help researchers at the IMF with their studies by finding relevant papers, authors, and concepts.
Marzie Taheri Sanjani, US Head of Quantitative Macro Research, Global Macro Advisers and SPX
Thursday, November 8: 2:00 p.m. - 2:45 p.m.
Meza discusses how using an analytical framework in conjunction with the current human interface improved the understanding of International Space Station crew perspective data and shortened analysis time, allowing for more informed decisions and rapid development improvements.
David Meza, Chief Knowledge Architect, NASA Johnson Space Center
The Center for Food Safety and Applied Nutrition at the U.S. Food & Drug Administration (CFSAN/FDA) has been piloting a new process to identify, prioritize, and address potential emerging chemical hazards of concern that may be associated with CFSAN-regulated products. The objective has been to develop a business solution that enables analysts to identify predictors that are indicative of emerging chemical hazards associated with CFSAN-regulated products. This presentation reviews how CFSAN leverages SAS capabilities such as text analytics, entity extraction, predictive modeling, and business intelligence, combined with access to a variety of data sources, in our approach to build the Emerging Chemical Hazard Intelligence Platform (ECHIP), CFSAN’s solution for identifying emerging chemical hazards. We discuss how we developed an integrated solution that enables our analysts to quickly filter, visualize, and identify trends in reports that are indicative of potential chemical hazards.
Emily McRae, Systems Engineer, SAS
This presentation showcases a strategy of applying text analytics to explore the Trafficking in Persons (TIP) reports and to apply new layers of structured information. Specifically, it identifies common themes across the reports, uses topic analysis to measure structural similarity across reports, identifies source and destination countries involved in trafficking, and uses a rule-building approach to extract these relationships from free-form text. We subsequently depict these trafficking relationships across multiple countries using a geographic network diagram that covers the types of trafficking as well as whether the countries involved are invested in addressing the problem. This ultimately provides decision makers with big-picture information about how to best combat human trafficking internationally.
Tom Sabo, Advisory Solutions Architect, SAS
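The rule-building idea can be illustrated with a toy pattern. The actual work uses SAS's linguistic rule syntax rather than Python, and the sentences and country names below are invented, but the shape is the same: a rule extracts source-to-destination relationships from free text, and the resulting pairs feed a country-to-country network:

```python
import re

# Toy rule: capture "trafficked from X to Y" relationships.
PATTERN = re.compile(
    r"trafficked from (?P<source>[A-Z][a-z]+) to (?P<dest>[A-Z][a-z]+)"
)

sentences = [
    "Victims are trafficked from Atlantis to Freedonia for forced labor.",
    "Reports describe women trafficked from Freedonia to Sylvania.",
]

edges = []
for s in sentences:
    for m in PATTERN.finditer(s):
        edges.append((m.group("source"), m.group("dest")))

print(edges)  # [('Atlantis', 'Freedonia'), ('Freedonia', 'Sylvania')]
```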
Thursday, November 8: 3:00 p.m. - 3:45 p.m.
Since supervised machine learning gained court acceptance for use in e-discovery 6 years ago, best practices have evolved. This talk describes the special circumstances of e-discovery and the best approaches that are currently in use. How robust is the Continuous Active Learning (CAL) approach? How much impact does the choice of seed documents have? What are SCAL and TAR 3.0?
Bill Dimm, Founder & CEO, Hot Neuron LLC
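For readers unfamiliar with CAL, the sketch below shows the basic loop under simplified assumptions: a scikit-learn classifier stands in for the relevance model, and a keyword check simulates the human reviewer. It is a toy, not a production TAR system, but it captures the cycle of train, rank, review, retrain:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pool = [
    "email about the merger negotiations",
    "lunch menu for the cafeteria",
    "draft merger agreement attached",
    "holiday party schedule",
    "notes from merger due-diligence call",
]
is_relevant = lambda d: "merger" in d  # stand-in for a human reviewer

X = TfidfVectorizer().fit_transform(pool)
labeled = {0: True, 1: False}  # seed judgments: one relevant, one not

while len(labeled) < len(pool):
    # Retrain on all judgments so far.
    clf = LogisticRegression().fit(
        X[list(labeled)], [labeled[i] for i in labeled]
    )
    # Rank the unreviewed pool by predicted relevance.
    unreviewed = [i for i in range(len(pool)) if i not in labeled]
    scores = clf.predict_proba(X[unreviewed])[:, 1]
    # "Review" the top-ranked document and fold in the judgment.
    top = unreviewed[max(range(len(unreviewed)), key=lambda k: scores[k])]
    labeled[top] = is_relevant(pool[top])

print([pool[i] for i, rel in labeled.items() if rel])
```

The seed-document question the talk raises maps directly onto the `labeled` dictionary above: different starting judgments change which documents surface first.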
This short case study describes a recent project with a special collection from a major university library that posed a fascinating challenge: Given scanned images (and OCR) of 14,000 typewritten and handwritten Cuban catalog cards, how can we extract structured text, index the content, and build XML records from this source data? Using a variety of text analytics techniques—including both Boolean and Bayesian approaches—we were able to identify, extract, and structure the targeted elements accurately enough to create a dataset that required minimal manual cleanup.
Bob Kasenchak, Information Architect, Factor
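The Bayesian side of such an approach might look like the sketch below: a Naive Bayes classifier labels each OCR'd line by field type so lines can be mapped into XML records. The training lines here are invented stand-ins (the real project trained on the actual cards and combined this with Boolean pattern rules):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented examples of (card line, field label) pairs.
train_lines = [
    ("Habana : Imprenta Nacional, 1961.", "imprint"),
    ("Marti, Jose, 1853-1895.", "author"),
    ("Versos sencillos / Jose Marti.", "title"),
    ("Madrid : Editorial Catedra, 1982.", "imprint"),
    ("Guillen, Nicolas, 1902-1989.", "author"),
    ("Motivos de son / Nicolas Guillen.", "title"),
]

# Word and bigram counts feed a multinomial Naive Bayes line classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit([t for t, _ in train_lines], [l for _, l in train_lines])

print(model.predict(["Cien anos de soledad / Gabriel Garcia Marquez."]))
```

Once each line carries a predicted field label, assembling the XML record is largely a matter of wrapping the labeled lines in the corresponding elements and flagging low-confidence predictions for manual review.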