Upgrade to a Platinum Pass for your choice of two preconference workshops or access to Taxonomy Boot Camp, a co-located event with Text Analytics Forum 2018. Also includes access to the Enterprise Solutions Showcase Grand Opening Reception from 5:00 p.m. - 6:30 p.m.
Upgrade to a Platinum or Gold Pass for full access to all sessions at KMWorld, Enterprise Search & Discovery, Office 365 Symposium, and Taxonomy Boot Camp, a series of co-located events happening alongside Text Analytics Forum 2018. Also includes access to the Networking Happy Hour taking place in the Enterprise Solutions Showcase from 5:00 p.m. - 6:00 p.m.
While it may not occur to us on a daily basis, there is a widespread cultural tendency toward quick decisions and quick action. This pattern has resulted in many of society’s greatest successes, but even more of its failures. We have begun to reward speed over quality, and the negative effects suffered in both our personal and professional lives are potentially catastrophic. Pontefract proposes a return to balance between the three components of productive thought: dreaming, deciding, and doing; combining creative, critical, and applied thinking. “Open Thinking” is a cyclical process in which creativity is encouraged, critiquing leads to better decisions, and thoughtful action delivers positive, sustainable results. Get tips & techniques to use in your organization!
Hayes surfaces ideas on how the world’s biggest and most innovative companies transform customer and employee experiences. Learn how the best and brightest organizations take a human-first approach to finally meet the transformational promise of Big Data by delivering moments of clarity to employees and customers alike through engaging digital experiences.
Becoming information-driven enables key stakeholders within an organization to leverage all available enterprise data and content to gain the best possible understanding of the meaning and insights they carry. Connecting enterprise data along topical lines across all available sources provides people with the collective knowledge and expertise of the organization in context. This is especially valuable for data-intensive companies that are geographically dispersed with lots of content in multiple data repositories. By connecting people with relevant knowledge and expertise, the overall performance of the organization increases. Parker discusses the challenges preventing data-intensive organizations from becoming “information-driven,” how insight engines help organizations solve these challenges and multiply the business benefits, and the current state and future possibilities of insight engines.
With the recently published book, Deep Text: Using Text Analytics to Overcome Information Overload, Get Real Value from Social Media, and Add Big(ger) Text to Big Data as a guide, author Tom Reamy provides an extensive overview of the whole field of text analytics: what text analytics is, how to get started, developing best practices, the latest applications, and building an enterprise text analytics platform. The talk ends with a look at current and future trends that promise to dramatically enhance our ability to utilize text with new techniques and applications.
How do you decide whether cognitive computing is right—even necessary—for your organization? When new and complex technologies like AI and cognitive computing burst on the scene, it’s easy to rush to adopt them. The result is often confusion and technology abandonment when the new applications don’t meet expectations. Hoping to forestall this shelfware phenomenon, in 2016 the Cognitive Computing Consortium started to develop guidelines for understanding how to use cognitive applications. Our goal was to come up with a set of usage profiles that developers could match to their planned use of cognitive technologies. This presentation describes the Consortium framework for understanding cognitive applications and gives examples of successful uses for a variety of purposes such as customer relations, healthcare, and robotics. A panel of experienced experts then describes how they are using cognitive applications and fields questions on that topic from the audience.
This talk is about how we’ve found ways to clean up messy data by increasing precision and recall with a hybrid rules-based/Bayesian approach while also making a new data source meaningful and usable across the organization. We were able to dramatically increase the quality of extracted attributes by transforming raw data into a managed taxonomy. By integrating the work of engineering and taxonomy, we can ensure that changes to the taxonomy are painlessly integrated into databases and that engineering work increases the effectiveness of taxonomists. Attendees walk away with an idea of what collaboration between developers and taxonomists looks like from the taxonomist’s perspective at one company with a strong engineering culture, along with some practical tips on how to turn low-quality or unstructured data into high-quality semantic data.
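To make the hybrid idea concrete, here is a minimal Python sketch (with hypothetical rules, training examples, and taxonomy labels, not the speakers' actual pipeline) of a rules-first tagger that falls back to a Naive Bayes classifier and maps raw text to canonical taxonomy concepts:

    # A minimal sketch of a hybrid rules-first, Bayesian-fallback tagger.
    import re
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hypothetical rules mapping raw variants to canonical taxonomy concepts.
    RULES = {
        r"\b(laptop|notebook)\b": "Computers > Laptops",
        r"\b(smart ?phone|mobile phone)\b": "Electronics > Phones",
    }

    # Hypothetical labeled examples used to train the statistical fallback.
    train_texts = ["thin light notebook for travel", "android phone with dual sim",
                   "gaming laptop with rtx card", "budget mobile phone"]
    train_labels = ["Computers > Laptops", "Electronics > Phones",
                    "Computers > Laptops", "Electronics > Phones"]

    nb_model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    nb_model.fit(train_texts, train_labels)

    def tag(text: str) -> str:
        """Rules win when they fire (high precision); Bayes covers the rest (recall)."""
        for pattern, concept in RULES.items():
            if re.search(pattern, text, flags=re.IGNORECASE):
                return concept
        return nb_model.predict([text])[0]

    print(tag("lightweight notebook, 14 inch"))        # a rule fires
    print(tag("dual sim phone with a great camera"))   # no rule fires; the Bayesian fallback decides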
DTIC acquires approximately 25,000 new research documents each year, and this number is expected to at least double in the next few years. A key challenge for DTIC is to make this data useful to end users. In response, DTIC has invested in an enterprise metadata strategy to provide efficient and consistent information extraction methods across collections and to develop downstream applications that will leverage this metadata to automate much of the manual effort it takes analysts to enrich the content and researchers to search through it for answers. One of these applications is the Metatagger, a text analytics tool that is applied to content to provide automatic tagging and subject categorization. The terminology for the tagging comes from the DTIC Thesaurus, and the tool uses topic files to extract terms and categories.
Performing and synthesizing text analysis is not an easy task. It requires several different disciplines. In this session, Chung and Duddempudi share lessons learned from their journey in developing this capability for their team, centered on these areas: the disciplines required to perform text analysis; generating the right level of insights to answer business questions; integrating into business operations; and determining criteria to select the right tools.
There are lots of tools available that provide the building blocks for automated tagging applications, including NLP, entity extraction, summarization, and sentiment analysis. Many tools and search engines also include a content categorizer. What they usually do is categorize to IPTC news or Wikipedia categories. But what if you want to categorize to some other scheme or a set of custom subjects relevant to you or your organization’s areas of interest? Boolean queries are a useful way to scope custom categories. They also happen to be the most transparent method for specifying the rules content categorizers use to automate tagging against predefined categories or taxonomies. This session provides a quick review of the Boolean query syntax and then presents a step-by-step process for building a Boolean query categorizer.
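As a concrete illustration of the approach (the categories and rules below are invented for the example, not taken from the session), a Boolean query categorizer can be sketched in a few lines of Python:

    # A minimal Boolean query categorizer: each custom category is scoped
    # by a Boolean rule over terms found in the document.
    import re

    # Rule grammar: ("AND", ...), ("OR", ...), ("NOT", rule), or a bare term.
    CATEGORIES = {
        "Machine Learning": ("AND", "learning", ("OR", "supervised", "unsupervised", "neural")),
        "Taxonomy": ("AND", "taxonomy", ("NOT", "biology")),
    }

    def tokens(text):
        return set(re.findall(r"[a-z]+", text.lower()))

    def evaluate(rule, toks):
        if isinstance(rule, str):
            return rule in toks
        op, *args = rule
        if op == "AND":
            return all(evaluate(a, toks) for a in args)
        if op == "OR":
            return any(evaluate(a, toks) for a in args)
        if op == "NOT":
            return not evaluate(args[0], toks)
        raise ValueError(f"unknown operator: {op}")

    def categorize(text):
        toks = tokens(text)
        return [cat for cat, rule in CATEGORIES.items() if evaluate(rule, toks)]

    print(categorize("An intro to supervised learning for text."))
    # -> ['Machine Learning']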
Keyword research allows companies to learn the voice of their customers and tune their marketing messages for them. One of the challenges in keyword research is to find collections of keywords that are topically relevant and in demand and therefore likely to draw search traffic and customer engagement. Data sources such as search logs and search engine result pages provide valuable sources of keywords, as well as insight into audience-specific language. Additionally, cognitive technologies such as natural language processing and machine learning provide capabilities for mining those sources at scale. With a few tools and some minimal coding, an analyst can generate clusters of best-bet keywords that are not only syntactically similar but semantically related. This how-to talk presents some practical techniques for automated analysis of keyword source data using off-the-shelf APIs.
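A minimal sketch of the clustering step might look like the following Python fragment; the keywords are invented, and TF-IDF character n-grams stand in for the embedding API an analyst would use to capture semantic rather than purely syntactic similarity:

    # Cluster candidate keywords mined from search logs or result pages.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    keywords = ["cheap flights to paris", "paris flight deals", "low cost paris airfare",
                "hotel near eiffel tower", "eiffel tower hotels", "best paris hotels"]

    # Character n-grams capture surface similarity; swap in an embedding API
    # for semantic relatedness.
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
    X = vectorizer.fit_transform(keywords)

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    for label in sorted(set(kmeans.labels_)):
        cluster = [kw for kw, l in zip(keywords, kmeans.labels_) if l == label]
        print(label, cluster)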
Uncovering insights and deep connections across your unstructured data using AI is challenging. You need to design for scalability and the appropriate level of sophistication at various stages in the data ingestion pipeline, as well as in post-ingestion interactions with the corpora. In this session, we discuss the top 10 considerations and techniques you need to account for when designing AI-enabled discovery and exploration systems that can help knowledge workers make good decisions. These include but are not limited to document cleansing and conversion, machine-learned entity extraction and resolution, knowledge graph construction, natural language queries, passage retrieval, relevancy training, relationship graphs, and anomaly detection.
Enterprises have finally mastered the art and science of gathering enough data, but some struggle to make it meaningful – particularly when dealing with unstructured text. Cutting-edge machine learning initiatives will help, but success depends on consistently organized and labeled data. For enterprises with data of multiple types from multiple sources, a general taxonomy is an important pre-processor to normalize, clean, and tag the data. Explore how users across the enterprise can, in real time, leverage unstructured text data sets for a wide range of business applications.
Incredible technical capabilities and a myriad of implementation strategies are the real excitement … for us. How do you get the movers and shakers excited too? Begin by translating the possible—and something that sounds like magic—into something relatable. Berndt and Kent start the discussion on the many ways organizations are benefiting from text analytics and illustrate the value of taxonomies, ontologies, and semantics in a text analytics infrastructure—all with an eye toward helping you navigate the financial and organizational barriers to a successful text analytics project.
In this talk, Garcia and Raya share how Grupo Imagen applies analytical solutions in text mining and how it calculates the ROI of doing so. Data mining, machine learning, and artificial intelligence were the topics the team began to explore in order to obtain new KPIs, dashboards, and BI systems. Although the solution is fully conceptualized, the great challenge is to carry it out with limited monetary and human resources. The fundamental challenge for implementation is to satisfy the very basic business equation: Profit = Sales - Costs (research and development). Now that all the research has been done, the challenge Grupo Imagen faces is cost: When your CPM is $5, can you afford IBM Watson, or should you start from scratch and build a customized, low-cost solution?
A panel of four text analytics experts answers questions gathered before and during the conference, along with additional questions from the program chair. This was one of our most popular features last year, so come prepared with your favorite questions and be ready to learn.
Organizations can use game design techniques to fully engage customers, partners, and employees. When well implemented, gamification can transform a work culture by cultivating deep emotional connections, high levels of active participation, and long-term relationships that drive knowledge sharing, learning and business value. Enterprises can utilize strategy games, simulation games, and role-playing games as means to teach, drive operational efficiencies, and innovate. Find out how organizations have embraced social collaboration using playful design to reap tremendous value, grab tips and tools to build a learning culture, and learn how to engage your community!
Semantic-enhanced artificial intelligence is based on the fusion of semantic technologies and machine learning. A leader in the field discusses six core aspects of semantic-enhanced AI and why semantics should be a fundamental element of any AI strategy. He looks at concrete examples and shares how to increase the precision of machine learning tasks through semantic enrichment. Semantic AI is the next generation of artificial intelligence. Understand how machine learning (ML) can help extend knowledge graphs and, in return, how knowledge graphs can help improve ML algorithms. This integrated approach ultimately leads to systems that work like self-optimizing machines after an initial setup phase, while remaining transparent through the underlying knowledge models.
Answers are the key exchange between customer and provider in support, service, and sales, yet that intersection is fraught with friction when information isn’t readily available, context is unknown, and time is of the essence. AI-driven technologies such as natural language processing, machine learning, and text analytics can help reduce the friction and create more satisfying experiences for both customer and vendor, across any touchpoint, ensuring the most precise answer is delivered every time. Johnson explores how and shares real-world outcomes from Fortune 1000 companies.
Most text analysis methods include the removal of stopwords, which generally overlap with the linguistic category of function words, as part of pre-processing. While this makes sense in the majority of use cases, function words can be extremely powerful. Research within the field of language psychology, largely centered on Linguistic Inquiry and Word Count (LIWC), has shown that function words are indicative of a range of cognitive, social, and psychological states. This makes an understanding of function words vital to making appropriate decisions in text analytics. In model design, differences in expected distributions of function words compared with content words have an impact on feature engineering. For instance, methods which use as their input the presence or absence of a word within a text segment will produce no usable signal when applied to function words, while those that are sensitive to deviations from expected frequency within a given language context will be highly successful. When interpreting results, differences in the way that function and content words are processed neurologically must be accounted for. As awareness of the utility of function words rises within the text analytics community, it is increasingly important to cultivate a nuanced understanding of the nature of function words.
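The feature-engineering point can be illustrated with a small Python sketch; the function-word list and baseline rates below are invented for the example and are not LIWC's:

    # Binary presence/absence of function words is nearly constant across
    # texts, while their relative frequency deviates in informative ways.
    import re

    FUNCTION_WORDS = {"i", "you", "we", "the", "a", "of", "and", "but", "not"}
    BASELINE_RATE = {"i": 0.02, "we": 0.01}  # assumed expected rates per token

    def profile(text):
        toks = re.findall(r"[a-z']+", text.lower())
        counts = {w: toks.count(w) for w in FUNCTION_WORDS}
        presence = {w: int(c > 0) for w, c in counts.items()}          # weak signal
        rates = {w: c / max(len(toks), 1) for w, c in counts.items()}  # useful signal
        deviation = {w: rates[w] - BASELINE_RATE.get(w, 0.0) for w in ("i", "we")}
        return presence, rates, deviation

    _, _, dev = profile("I really think I should have said what I felt.")
    print(dev)  # elevated first-person-singular rate relative to the assumed baseline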
The basic premise of taxonomy and text analytics work is to impose structure on—or reveal structure in—unstructured content. Despite being called “unstructured,” much workplace information can be described as semi-structured, as there is always some level of organization in even the most basic content formats. For example, in a workplace document you will likely find titles, headers, sentences, and paragraphs, or at least a clear indicator of the beginning and end of a large block of text. Similarly, taxonomies and ontologies are artificial constructs which may reflect the information they describe or be imposed as a form of ordering on semi-structured content. In this session, attendees hear case studies about using the contextual structure of taxonomies and ontologies and the various structural indicators in text to perform taxonomy-based content auto-categorization and information extraction.
The AI hype is rapidly exploding into C-suites of organizations around the world and with good reason—the promise is compelling. The convergence of AI, robotic process automation (RPA), machine learning, and cognitive platforms allows employees to focus their attention on exception processing and higher-value work while digital labor manages low-value, repetitive tasks. While the debate whether digital labor will add or eliminate jobs is ongoing, what’s important in today’s enterprise is how digital and human labor can be integrated to improve efficiency and drive innovation. Using real-world examples, this session covers how machine processing, when guided by human knowledge, curation, and control, provides assisted intelligence (AI) to organizations that want to streamline processes, reduce operating costs, and improve ROI.
Traditional knowledge platforms are not capable of effectively understanding unstructured information due to the complexity of language and the lack of structure. Therefore, they cannot effectively organize disparate sources of knowledge (marketing material, customer service content, emails, chat logs, social media chatter, customer response surveys, internal documentation, etc.) in any meaningful way. Addressing the complexity of language requires more than keywords, spending weeks and months manually tagging data, or locating topic-specific content in an effort to train machine learning algorithms. This presentation explains the concepts behind an AI/cognitive computing platform, what makes it work, and how it can be deployed as a smart infrastructure to support a variety of business objectives. It will include a demonstration of an English-language knowledge graph (ontology), a customer support self-service mobile solution, and a smart content navigation portal.
This presentation serves as an overview of current issues with named entity recognition in text analytics, focusing on work done beyond the categories of people, place, organization, and other elements that are (relatively) easily extracted through current processes. It covers areas of ongoing research, issues, and ideas about their potential benefits to taxonomy and ontology development.
Traditional approaches to concept and relationship extraction focus either on pure statistical techniques or on detecting and extending noun phrases. This talk outlines an alternative approach that identifies multiword concepts and the relationships between them, without requiring any predefined knowledge about the text’s subject. We demonstrate a number of capabilities built using this approach, including ontology learning, intelligent browsing, semantic search, and text categorization.
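For contrast, the traditional statistical baseline the talk departs from can be sketched as a pointwise mutual information (PMI) score over adjacent word pairs; this is not the speakers' alternative approach, and the text is illustrative:

    # Score adjacent word pairs by PMI to surface candidate multiword concepts.
    import math, re
    from collections import Counter

    text = ("knowledge graph construction supports semantic search and "
            "semantic search improves knowledge graph quality")
    words = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n = len(words)

    def pmi(pair):
        w1, w2 = pair
        p_pair = bigrams[pair] / (n - 1)
        return math.log2(p_pair / ((unigrams[w1] / n) * (unigrams[w2] / n)))

    for pair, count in bigrams.most_common():
        if count > 1:
            print(" ".join(pair), round(pmi(pair), 2))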
FINRA receives hundreds of thousands of documents of various kinds each year from stockbrokers and investors to be reviewed and analyzed by our investigators. The investigators are looking for information about the who, what, where, when, and how contained in these documents, which is a labor-intensive task. Our solution was to develop a system that leverages NLP, machine learning, and graph databases. An enhanced NER model combined with our custom entity resolution algorithms allowed us to extract individuals, organizations, and FINRA-specific entities and to map these entities into FINRA’s business systems. Entities were loaded into the Titan graph DB, which supported navigation between documents, individuals, and organizations, visually highlighting hard-to-see patterns and insights. In addition, our NLP process allowed us to generate document summaries. This system significantly improved the effectiveness and comprehensiveness of investigators’ document review.
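A minimal Python sketch of the downstream steps (entity resolution against a reference list and loading document-entity links into a graph) is shown below; the names and documents are hypothetical, and networkx stands in for the Titan graph DB:

    # Resolve extracted mentions to known entities, then build a document-entity graph.
    import difflib
    import networkx as nx

    known_entities = ["John Smith", "Acme Brokerage LLC", "Jane Doe"]
    extracted = {"doc-001": ["J. Smith", "ACME Brokerage"],
                 "doc-002": ["Jane Doe", "Acme Brokerage LLC"]}

    def resolve(mention, candidates, threshold=0.6):
        """Map a raw mention to its closest known entity, if similar enough."""
        best = max(candidates,
                   key=lambda c: difflib.SequenceMatcher(None, mention.lower(), c.lower()).ratio())
        score = difflib.SequenceMatcher(None, mention.lower(), best.lower()).ratio()
        return best if score >= threshold else mention

    graph = nx.Graph()
    for doc, mentions in extracted.items():
        graph.add_node(doc, kind="document")
        for m in mentions:
            entity = resolve(m, known_entities)
            graph.add_node(entity, kind="entity")
            graph.add_edge(doc, entity)

    # Entities shared by multiple documents hint at patterns worth reviewing.
    print([n for n in graph if graph.nodes[n].get("kind") == "entity" and graph.degree(n) > 1])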
In order to reduce waste in DoD research, the DTIC is developing a document similarity application to identify forms of fraud, waste, and abuse, such as equivalent work being done by different services. The document similarity tool will provide DTIC with the capability to apply content analytics against a large collection of documents, including requests for proposals, proposals, technical reports, and project descriptions. The challenge goes beyond the identification of simple copy-and-paste-style duplication. In this presentation, we discuss our hybrid approach to evaluating document similarity, which combines multiple methods, including vector space models, semantic similarity, and a novel approach to text analytics called trains-of-thought analysis. In addition, we demonstrate the web-based application, including real-time document similarity analysis and data visualizations that speed the finding and assessment of similar content.
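The vector-space component of such a pipeline can be sketched as TF-IDF vectors compared by cosine similarity; the documents and threshold below are illustrative, and the semantic and trains-of-thought layers would sit on top of a baseline like this:

    # Flag document pairs whose TF-IDF cosine similarity exceeds a review threshold.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["Proposal to develop a lightweight sensor for unmanned aircraft.",
            "Technical report on a lightweight sensor for unmanned aerial vehicles.",
            "Project description for a shipboard water purification system."]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sims = cosine_similarity(tfidf)

    threshold = 0.2
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if sims[i, j] > threshold:
                print(f"docs {i} and {j} look similar ({sims[i, j]:.2f})")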
AI is on the highest rung of the IT agenda. But how does it support professionals’ needs for insights in decision-making? Mayer looks at text analytics, the particular strand of AI that deals with language, the essential vehicle for professional knowledge. Through examples of its impact in insurance, media and the sciences, he illustrates “the art of the possible” and how you can make AI part of your knowledge practice’s roadmap.
The Inter-American Development Bank is a multilateral public sector institution committed to improving lives in Latin America and the Caribbean. Human capital may be the institution’s most important resource for realizing its vision: The knowledge of its roughly 5,000 employees is spread across offices in 29 countries throughout the Americas, Europe, and Asia. The IDB’s knowledge management division led an 8-week proof of concept that used natural language processing techniques to create explicit representations of the tacit knowledge held by its employees and make those representations searchable. Attempting to identify and represent people’s knowledge is a complex task. Part of this complexity lies in the fact that the variables used to determine knowledge have ambiguous definitions. These and other considerations are what make this POC so different from a simple skills database or profile search. This presentation details our experience with this project and how the use of NLP allowed us to successfully create approximations of IDB personnel knowledge and turn them into machine-searchable knowledge entities.
Information retrieval can be seen as matching the intellectual content represented in documents to a knowledge gap in the mental map of a searcher. For decades, most of the focus of information retrieval research, whether in academia or in commercial systems, has been on improving the representation of documents, or collections of documents. Less attention has been paid to representing the searcher’s information need, or knowledge gap. This knowledge gap was characterized by Belkin, Oddy, and Brooks as an Anomalous State of Knowledge. This talk will describe the theory and practice of this concept and how it can be utilized to enhance information retrieval.
Using NLP and linguistics, Saini presents Sapient’s work to develop an unsupervised learning-based question-answering system. The talk showcases a demo on real-life data and explains the process of building such automated QnA systems. It also discusses the shortcomings of chatbot systems and how they can be integrated with QnA systems to make them scalable.
Sanjani introduces a semantic search and browse tool that aims to help researchers at the IMF with their studies by finding relevant papers, authors, and concepts.
Advances in machine learning have led to an evolution in the field of text analytics. As these and other AI technologies are incorporated into business processes at organizations around the world, there’s an expectation that intelligent automation will lead to improvements like increased operational efficiency, enriched customer engagements and faster detection of emerging issues. How will technology meet that demand? How can we combine the expertise of humans with the speed and power of machines to analyze unstructured text that’s being generated at an unprecedented rate? Find out in this talk from Mary Beth Moore, who will share stories about text analytics being used to augment regulatory analysis, improve product quality and fight financial crimes.
The advent of unsupervised machine-learning algorithms makes it possible for content owners to index their content without a taxonomy. This means that publishers are faced with a challenge: Do you maintain your existing taxonomies or replace them with a full ML approach? Or is there a way of combining the two? This talk looks at case studies that have implemented different solutions, including publishers with private taxonomies used within their organizations and the use of large-scale, public controlled vocabularies such as MeSH.
Meza discusses how an analytical framework, used in conjunction with the current human interface, improved the understanding of International Space Station crew perspective data and shortened analysis time, allowing for more informed decisions and rapid development improvements.
The Center for Food Safety and Applied Nutrition at the U.S. Food & Drug Administration (CFSAN/FDA) has been piloting a new process to identify, prioritize, and address potential emerging chemical hazards of concern that may be associated with CFSAN-regulated products. The objective has been to develop a business solution that enables analysts to identify predictors that are indicative of emerging chemical hazards associated with CFSAN-regulated products. This presentation reviews how CFSAN leverages SAS capabilities such as text analytics, entity extraction, predictive modeling, and business intelligence, combined with access to a variety of data sources, in our approach to build the Emerging Chemical Hazard Intelligence Platform (ECHIP), CFSAN’s solution for identifying emerging chemical hazards. We discuss how we developed an integrated solution that enables our analysts to quickly filter, visualize, and identify trends in reports that are indicative of potential chemical hazards.
This presentation showcases a strategy of applying text analytics to explore the Trafficking in Persons (TIP) reports and apply new layers of structured information. Specifically, it identifies common themes across the reports, uses topic analysis to identify structural similarity across reports, identifies source and destination countries involved in trafficking, and uses a rule-building approach to extract these relationships from free-form text. We subsequently depict these trafficking relationships across multiple countries using a geographic network diagram that covers the types of trafficking as well as whether the countries involved are invested in addressing the problem. This ultimately provides decision makers with big-picture information about how to best combat human trafficking internationally.
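A minimal sketch of the rule-building idea, with invented sentences and placeholder country names rather than actual TIP report text:

    # Simple pattern rules that pull source/destination country pairs from free-form text.
    import re

    COUNTRIES = ["Country A", "Country B", "Country C"]
    country_pattern = "|".join(re.escape(c) for c in COUNTRIES)

    # Rule: "... from <source> ... to <destination>"
    rule = re.compile(
        rf"from\s+(?P<source>{country_pattern}).{{0,60}}?to\s+(?P<destination>{country_pattern})",
        re.IGNORECASE,
    )

    text = ("Victims are trafficked from Country A through transit points "
            "to Country B, while others travel from Country C to Country A.")

    edges = [(m.group("source"), m.group("destination")) for m in rule.finditer(text)]
    print(edges)  # [('Country A', 'Country B'), ('Country C', 'Country A')]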
Machine learning models often depend on large amounts of training data for supervised learning tasks. This data may be expensive to collect, especially if it requires human labeling. This raises some particular quality issues: for example, how do you ensure that human agreement is high, and what do you do when it is not? Also, when your data is expensive to tag, how do you ensure that you have the smallest set possible that is representative of all your features? This talk addresses these and other issues associated with gathering hand-coded datasets for supervised machine-learning models, especially models run on textual data.
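One concrete agreement check is Cohen's kappa between two annotators; the labels below are invented for the example:

    # Measure chance-corrected agreement between two human labelers.
    from collections import Counter

    annotator_a = ["pos", "neg", "pos", "pos", "neg", "neg", "pos", "neg"]
    annotator_b = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg"]

    def cohens_kappa(a, b):
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n
        counts_a, counts_b = Counter(a), Counter(b)
        expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
        return (observed - expected) / (1 - expected)

    print(round(cohens_kappa(annotator_a, annotator_b), 2))
    # 0.5 here: only moderate agreement, worth investigating before scaling up labeling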
A U.S. intelligence community researcher recently declared, “Analytics is my second priority.” We have long passed the point where even “medium data” projects exceed the capacity of human analysts to actually read the corpus. Yet “human in the loop” is essential to ensuring quality in machine analytics. Thus his, and our, first priority becomes effective triage: determining which text warrants human attention, which should be condensed by automated means, and which may actually best be disregarded as valueless or actively malign. We model the text analytic process as a succession of tiered steps, each with its own accuracy rate. While we classically think of text analytic accuracy in favorable terms as “precision and recall,” their inverses are “false negative and false positive.” We explore how initial steps with high-volume, automated processing can best tune their accuracy trade-offs to optimize the latter, human-moderated steps.
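The accuracy bookkeeping can be made concrete with a small Python sketch; the counts are illustrative, not from any real corpus:

    # Precision/recall and their inverses for a triage step, and how recall
    # compounds across tiered steps.
    def step_metrics(tp, fp, fn, tn):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)                 # 1 - recall = false negative rate
        false_positive_rate = fp / (fp + tn)    # the cost passed to human reviewers
        return precision, recall, false_positive_rate

    # Tier 1: high-volume automated filter; Tier 2: finer-grained classifier on what survives.
    p1, r1, fpr1 = step_metrics(tp=900, fp=300, fn=100, tn=8700)
    p2, r2, fpr2 = step_metrics(tp=855, fp=45, fn=45, tn=255)

    print(f"tier 1: precision={p1:.2f} recall={r1:.2f} fpr={fpr1:.3f}")
    print(f"tier 2: precision={p2:.2f} recall={r2:.2f} fpr={fpr2:.3f}")
    # Only items surviving both tiers reach a human, so recall compounds:
    print(f"end-to-end recall is roughly {r1 * r2:.2f}")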
Since supervised machine learning gained court acceptance for use in e-discovery 6 years ago, best practices have evolved. This talk describes the special circumstances of e-discovery and the best approaches that are currently in use. How robust is the Continuous Active Learning (CAL) approach? How much impact does the choice of seed documents have? What are SCAL and TAR 3.0?
This short case study describes a recent project with a special collection from a major university library which posed a fascinating challenge: Provided with scanned images (and OCR) of 14,000 typewritten and handwritten Cuban catalog cards, how can we extract structured text, index the content, and build XML records from this source data? Using a variety of text analytics techniques—including both Boolean and Bayesian approaches—we were able to identify, extract, and structure the targeted elements accurately enough to create a dataset that required minimal manual cleanup.
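A rule-based fragment of such a pipeline might look like the following Python sketch; the card text, patterns, and element names are hypothetical, and the Bayesian components that score ambiguous fields are omitted:

    # Extract fields from an OCR'd catalog card and emit an XML record.
    import re
    import xml.etree.ElementTree as ET

    card = """MARTI, Jose.  Versos sencillos.
    La Habana : Imprenta Nacional, 1891.  62 p."""

    fields = {
        "author": re.search(r"^([A-Z][A-Z' -]+,\s+[A-Za-z]+)", card),
        "title": re.search(r"\.\s+([^.]+)\.", card),
        "date": re.search(r"\b(1[6-9]\d{2})\b", card),
    }

    record = ET.Element("record")
    for name, match in fields.items():
        if match:
            ET.SubElement(record, name).text = match.group(1).strip()

    print(ET.tostring(record, encoding="unicode"))
    # <record><author>MARTI, Jose</author><title>Versos sencillos</title><date>1891</date></record>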
The intersection of knowledge sharing and new ways of learning and training is having an impact on how connected your employees feel to your organization at large. Moneypenny demonstrates how using video, social networks, and content collaboration together empowers knowledge practitioners, experts, and people across the organization to engage with each other. Foster a culture of curiosity and share learning and best practices, while improving employee experience.
What are the chances of three thought leaders meeting in the same room, at the same terminal, in the same airport, in the same city by coincidence? Hear their story and many more as they discuss the impact of social media, organizational culture, machine learning, demographics and more!