Publishers’ success is driven by providing compelling content to the STM community, and by organizing and delivering the content in a cost-efficient operational framework.
Parity’s STM publishing infrastructure provides intelligent automation and analytics to support both objectives.
Parity’s STM publishing technologies comprise four successive modules, each of which benefits from the processing of earlier modules, plus an applications layer built on top of them:
- The Tagger takes unstructured data and provides detailed tagging of entities within the text. An example is extracting references from an article’s full text and tagging each reference to the level of page number, volume, each author’s first name, last name, etc.
- The Linker clusters together multiple representations of the same object. For example, it links references to the articles they cite and finds duplicate article records in a database.
- The Profiler assembles comprehensive profiles of people, institutions, and other entities. An entity’s profile includes all occurrences of the entity in the data set, a summary of the entity, relationships to other entities, and additional details. For example, in the STM context the profile for a person would include all papers and patents written by the person, awarded grants, topics of expertise, frequent collaborators, affiliation, and contact information, among other things. In assembling profiles, the Profiler disambiguates between distinct entities that are referred to in the same way (for example two authors with the same name or same first initial and last name) while overcoming variants in how they are referenced (for example nicknames, abbreviations, and typos). A secondary, but important, outcome of profiling is that it provides the basis for building a network of entities.
- The Analytics module mines the network of entities in response to queries from the Applications/Solutions layer. An array of analytics methodologies is employed, including term vectors, which provide quantified representations of the concepts that characterize entities such as articles, institutions, and persons; deep domain-aware natural language processing (NLP); and machine learning to discover important predictive patterns in the data.
- Applications and Solutions that address a wide variety of publishers’ and researchers’ needs are built on top of the underlying modules by Parity and other organizations.
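As a rough illustration of the term-vector idea mentioned under the Analytics module, the sketch below represents each entity as a bag of term counts and compares entities by cosine similarity. It is not Parity's implementation: the function names `term_vector` and `cosine_similarity`, the raw-count weighting, and the sample texts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Return a term-frequency vector (term -> count) for a piece of text.

    Hypothetical helper: raw counts are a simplifying assumption; a real
    system would likely weight terms and use domain-aware tokenization.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample texts standing in for an article and two author profiles.
article = term_vector("protein folding simulation with molecular dynamics")
author_profile = term_vector("molecular dynamics and protein structure prediction")
unrelated = term_vector("medieval trade routes in northern Europe")

# The article is closer to the related profile than to the unrelated one.
print(cosine_similarity(article, author_profile) > cosine_similarity(article, unrelated))
```

Because the same vector form can describe an article, a person, or an institution, a single similarity measure can compare any pair of entities in the network.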
Modules benefit from but do not require the results of preceding modules. For example, references can be linked with high accuracy even if they are untagged, but fully tagged references result in higher precision and recall. Likewise, author profiling can be done without linking references, but the citation graph provides additional evidence that can increase profiling accuracy.
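The point about tagged versus untagged references can be sketched with a toy example. Under the assumptions stated here (hypothetical function names `untagged_score` and `tagged_score`, invented article records, and simple token overlap standing in for whatever matching Parity actually uses), the raw reference string matches two candidate articles equally well, while the tagged fields separate them.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def untagged_score(reference, article):
    """Jaccard token overlap between a raw reference string and an article record."""
    ref_tokens = tokens(reference)
    # Compare against the bibliographic fields only, not the record identifier.
    art_tokens = tokens(" ".join(v for k, v in article.items() if k != "id"))
    return len(ref_tokens & art_tokens) / len(ref_tokens | art_tokens)


def tagged_score(fields, article):
    """Number of tagged reference fields that agree exactly with the article record."""
    return sum(1 for key, value in fields.items() if article.get(key) == value)


# Invented records: same author and journal, with volume and page swapped.
article_a = {"id": "a1", "author": "Smith", "journal": "J Chem Phys",
             "volume": "112", "first_page": "345", "year": "2001"}
article_b = {"id": "a2", "author": "Smith", "journal": "J Chem Phys",
             "volume": "345", "first_page": "112", "year": "2010"}

raw_reference = "Smith J. J Chem Phys 112, 345"
tagged_reference = {"author": "Smith", "journal": "J Chem Phys",
                    "volume": "112", "first_page": "345"}

# Untagged: both candidates receive the same overlap score.
print(untagged_score(raw_reference, article_a), untagged_score(raw_reference, article_b))
# Tagged: knowing which number is the volume and which is the page breaks the tie.
print(tagged_score(tagged_reference, article_a), tagged_score(tagged_reference, article_b))
```

In this contrived case the untagged comparison cannot tell the two records apart, whereas field-level agreement favors the first record; real reference data is usually less adversarial, which is consistent with untagged linking still working well in practice.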
These technologies support a variety of applications, including those highlighted in the STM R&D solutions section.
Our customers include some of the most respected electronic publishers in the world. Our best-in-class products and services draw on Parity's years of experience serving the STM publishing industry. We have helped many organizations achieve superior results with minimal disruption, some starting from scratch and others improving upon existing solutions.
Reference Processing and Linking
Robust, scalable, and highly accurate bibliographic reference processing, including extraction, tagging, linking, and related capabilities.
Author and Institution Profiles
Parity's patented profiling technology offers detailed knowledge of institutions and authors as they relate to each other in the knowledge network, leveraging inferences within customer data and public data sources.
Development and Classification
Enable powerful content navigation and knowledge discovery by organizing your content into domain-specific ontologies.
Reduce the cost of migrating and transforming data into publishable content. Parity is an experienced service provider for metadata capture, data cleansing, and conversion.
Integrate your research content with patent data, revealing new connections between research and IP to attract new users and types of usage.