APR Nov/Dec 2022 - 72

» FORMULATION AND DEVELOPMENT
manual curation process dramatically increases costs and time needed
for discovery and development of therapeutics. Failure to implement
a standardized process can even set up the enterprise to waste money
again and again, finding and curating the same data repeatedly.
To avoid these problems, life science organizations must adopt
strategies to efficiently curate data and associated metadata (in all
forms) upon ingestion to a central repository. Without an established
and diligent approach, researchers cannot efficiently leverage the
enterprise's assets, and data that should be the fuel for development
instead becomes a stumbling block.

Applying FAIR Principles to Complex Data
The data management framework that the ML research community
has embraced in recent years is FAIR Data, meaning digital assets
should be Findable, Accessible, Interoperable and Reusable. While the
FAIR principles are more widely applied in academia and in clinical
research settings (where data sharing is expected and required), they
are also applicable within the walls of life science companies to make
data more valuable across the enterprise.
Forward-thinking life science companies are using these guidelines to
not only help combat the data management issues they are facing,
but also to maximize the potential of their data. If applied correctly,
the FAIR principles can advance research in pharma companies by
reducing R&D work and costs, bringing operational efficiencies, and
eventually accelerating time to market.
Of course, applying the FAIR principles is enough of a challenge
when dealing with tabular data; FAIRifying the medical imaging
data required for many ML efforts is an even bigger hurdle. Medical
imaging data are a rich and valuable source of information for
researchers, but because of their large size and complex nature, they
pose one of the biggest challenges to organizations that are seeking to
de-silo and FAIRify data.
As an example, consider DICOM files. The DICOM format itself
provides some level of standardization, but significant variations still
exist between modalities (MR, CT, PET, etc.), vendor instrumentation
(Siemens, GE, Philips, etc.), acquisition type, and specific site. Such
differences must be reconciled before the data can be used in
assessment/analysis approaches, including ML.
Additional challenges standing in the way of leveraging imaging data
include the sheer size, as data sets can be in the range of gigabytes
per study, terabytes over a cohort, and petabytes in legacy systems.
Furthermore, labeling the data often requires qualitative assessments
by scientists and radiologists. Setting up workflows and data capture
mechanisms for this work can add more labor to an already intensive
effort. Moreover, imaging data oftentimes need to be integrated with
other data types (e.g., clinical measures). Collectively, these factors
help illustrate why application of the FAIR principles presents a high
hurdle, even to very motivated research teams.

A Realistic Approach for De-Siloing and FAIRifying
Life science enterprises are coming to terms with the fact that manually
locating datasets and curating them at the scale required for ML is
cost-prohibitive, time-consuming, and prone to human error. As
discussed, data and metadata from internal and external sources need
to be curated, including standardization, classification, and quality
control, to ensure that they meet the standard required for complex
analysis and ML workflows. As more data is introduced over time,
these processes must be easily repeated within and across datasets,
and scale with the size of the data coming in.
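
To make the standardization and quality-control steps more concrete, the following is a minimal sketch of a repeatable curation rule for imaging headers. It assumes each header has already been read into a plain dictionary keyed by DICOM attribute names (`Modality`, `Manufacturer`). The alias tables, the `CuratedHeader` type, and the `curate` function are hypothetical illustrations and are not taken from any particular platform.

```python
from dataclasses import dataclass

# Hypothetical alias tables: a real deployment would curate these per site and vendor.
MANUFACTURER_ALIASES = {"siemens": "Siemens", "ge": "GE", "philips": "Philips"}
MODALITY_ALIASES = {"MR": "MR", "MRI": "MR", "CT": "CT", "PT": "PET", "PET": "PET"}


@dataclass(frozen=True)
class CuratedHeader:
    modality: str
    manufacturer: str
    passed_qc: bool


def normalize_manufacturer(raw: str) -> str:
    """Map a free-text vendor string to a canonical name using its first word."""
    tokens = raw.strip().lower().split()
    if not tokens:
        return "Unknown"
    return MANUFACTURER_ALIASES.get(tokens[0], "Unknown")


def curate(header: dict) -> CuratedHeader:
    """Standardize one header and flag it if any field could not be resolved."""
    raw_modality = str(header.get("Modality", "")).strip().upper()
    modality = MODALITY_ALIASES.get(raw_modality, "Unknown")
    manufacturer = normalize_manufacturer(str(header.get("Manufacturer", "")))
    passed_qc = modality != "Unknown" and manufacturer != "Unknown"
    return CuratedHeader(modality, manufacturer, passed_qc)
```

Because the rule is a pure function of the header, it can be rerun unchanged whenever new data arrives, which is the repeatability property the paragraph above calls for.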
In attempts to streamline some of this work, many organizations have
created homegrown infrastructure for select purposes. However,
these systems are typically built for very specific tasks and require
expertise from IT departments and/or data scientists to create and
maintain. Such programs can be difficult to train staff on, and the
institutional knowledge on how to run them may reside with just a
handful of people, which puts such operations at risk when teams are
reorganized or key members depart.
Many organizations are discovering that a better alternative is a
modern data management platform, which can automate much
of the work of ingesting data, even complex imaging data, and
curating it to FAIR standards. The automation of these processes is
key, as it reduces human error and variability. In addition, automation
promotes adoption of FAIRification as a realistic and achievable
goal that doesn't require an indefinite all-hands-on-deck effort from
data scientists.
A modern data management platform can leverage cloud scalability
and parallelism to achieve many goals at once: It reduces upload time
while handling data from both old and new sources. It automatically
de-identifies data, extracts metadata, and classifies data per needs.
It then builds an easily searchable collection of data. In summary, it
prepares the data for downstream processes in a standardized and
efficient way.
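
The ingestion steps described above (de-identify, extract metadata, classify, then index) can be sketched as a small pipeline. This is an illustration only: the field list, the classification rule, and all function names are hypothetical, and a local thread pool stands in for the cloud parallelism a real platform would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subset of identifying fields; a real de-identification profile is far longer.
IDENTIFYING_FIELDS = {"PatientName", "PatientID", "PatientBirthDate"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record without the identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def extract_metadata(record: dict) -> dict:
    """Pull out the searchable fields, defaulting to 'Unknown' when absent."""
    return {
        "modality": record.get("Modality", "Unknown"),
        "body_part": record.get("BodyPartExamined", "Unknown"),
    }


def classify(metadata: dict) -> str:
    """Toy classification rule, used here only to show where this step fits."""
    if metadata["modality"] == "MR" and metadata["body_part"] == "BRAIN":
        return "neuro-mr"
    return "unclassified"


def process(record: dict) -> dict:
    """Run one record through de-identification, extraction, and classification."""
    metadata = extract_metadata(deidentify(record))
    metadata["label"] = classify(metadata)
    return metadata


def ingest(records: list[dict], workers: int = 4) -> list[dict]:
    """Process records in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, records))
```

The list returned by `ingest` plays the role of the searchable collection: every entry has the same keys regardless of which source the record came from.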
With this type of extensible data platform, the FAIR principles can
become more concrete within an organization:
* Findability is dramatically improved, as researchers can build
their complete dataset with simple queries within one interface.
Without this approach, researchers commonly must request
data from CROs, external collaborators, or an internal archive,
which can take days or weeks, and often comes with additional
costs (which are paid repeatedly if multiple divisions request the
same data from the vendor).
* Accessibility is handled with role-based user permissions,
which can be configured to give individuals the appropriate
level of access to data, and with user roles that determine
which actions they can take with the data.
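
The two bullets above can be illustrated together with a short sketch: a single search function over one catalog (Findability) that returns only what the caller's role and project membership allow (Accessibility). The role names, the `ROLE_ACTIONS` table, and the catalog record shape are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical roles and the actions each one may perform.
ROLE_ACTIONS = {
    "viewer": {"search", "view"},
    "analyst": {"search", "view", "download"},
    "admin": {"search", "view", "download", "delete"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    projects: frozenset  # projects this user has been granted access to


def can(user: User, action: str, project: str) -> bool:
    """True if the user belongs to the project and the role permits the action."""
    return project in user.projects and action in ROLE_ACTIONS.get(user.role, set())


def search(user: User, catalog: list, **criteria) -> list:
    """Return catalog records matching every criterion that the user may search."""
    return [
        record
        for record in catalog
        if can(user, "search", record["project"])
        and all(record.get(key) == value for key, value in criteria.items())
    ]
```

A call such as `search(user, catalog, modality="MR")` then stands in for the simple query within one interface, with the permission check applied to each record before it is returned.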
