Instrumentation & Measurement Magazine 26-5 - 33

operates within the intended workflow in the operational
environment.
Effective evaluations will rely on a combination of domain
subject matter experts (SMEs) and AI practitioners
to define the best metrics for measuring both performance
and effectiveness. These experts, working together, should
develop use cases and requirements before AI-enabled system
development begins. In addition, the SMEs will work with the
AI experts to interpret the evaluation results.
Metrics need to be standard and repeatable, meet the mission
need, address complex applications where ground-truth is
absent or limited, operate in systems of
systems, provide continuous T&E feedback, facilitate trustworthiness
from development through operations, and be
documented through evaluation cards. We highlight a few of
these below.
Standard and Repeatable Measures: Well-defined metrics
will capture the ability of an AI model to perform a specific
function for which it was designed, such as object recognition.
Performance metrics tell an important part of the story,
yet do not always indicate effective use. Equally important,
effectiveness metrics must be well defined to capture how well
the AIES performs in operational settings.
Future AI T&E metric sets must help testers evaluate
performance and effectiveness for several stakeholders,
including managers, users, and developers. Measures for
managers describe mission effectiveness, suitability, and survivability;
comparisons among systems to support system selection; and,
for contractors, the match to programmatic requirements. Measures for
users enable them to understand when and how to use and
trust systems. Measures for system developers are diagnostic
as the AI model is developed and integrated. AI testers can use
all of this information to appraise the pedigree of the AI model
development and training process, capability effectiveness, integration,
suitability, and expected survivability (e.g., risk of data or
model corruption).
Effectiveness for Mission Need: Defining the requirements
for a system and mapping these to algorithmic performance
is key to future AI efforts. The AI-enabled system with the
best performance may not be the most effective, for reasons
such as explainability and data drift. (The detection and mitigation
of data drift is an active area of research.) While knowing
that a model performs strongly in isolation is informative,
it is more critical to define how the model should perform in
a well-defined mission.
This mission need should be defined as a use case that describes
the functionality that the AIES is to provide. A use case
is a detailed specification that includes descriptions of the user's
expectations for the system, required accuracy thresholds,
inputs and their variations, interactions with other systems, etc.
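As a minimal sketch, assuming a Python representation, the use-case elements listed above can be captured in a record whose accuracy thresholds are then checked against measured results. The class name UseCase, its field names, the method unmet_requirements, and the example thresholds are all hypothetical illustrations, not a format defined by the authors.

```python
from dataclasses import dataclass, field


# Hypothetical use-case record; field names are illustrative, not a standard schema.
@dataclass
class UseCase:
    name: str
    user_expectations: list[str]
    # Metric name -> minimum acceptable value required by the mission.
    accuracy_thresholds: dict[str, float]
    input_variations: list[str] = field(default_factory=list)
    interacting_systems: list[str] = field(default_factory=list)

    def unmet_requirements(self, measured: dict[str, float]) -> list[str]:
        """Return threshold metrics that were not measured or fall below the minimum."""
        return [
            metric
            for metric, minimum in self.accuracy_thresholds.items()
            if measured.get(metric, float("-inf")) < minimum
        ]


# Hypothetical example values for a document-triage use case.
triage = UseCase(
    name="document triage",
    user_expectations=["flag documents that mention persons of interest"],
    accuracy_thresholds={"proper_name_recall": 0.90, "precision": 0.80},
    input_variations=["scanned pages", "mixed-language text"],
    interacting_systems=["machine translation service"],
)

print(triage.unmet_requirements({"proper_name_recall": 0.93, "precision": 0.75}))
# ['precision']
```

Keeping the thresholds in the use case, rather than in the test code, lets the same measured results be judged against different mission needs.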
August 2023
Complexity in the Absence of "Ground-Truth": AI evaluators
use the term "ground-truth" to describe a data set in which,
for a given set of feature values, there is a single corresponding,
agreed-upon output or answer value. For lab-based evaluation,
ground-truth provides a relatively straightforward
means of evaluating a system. However, not all AI application
types can be built or evaluated under conditions where
ground-truth data exists. In military and intelligence settings,
the desired outputs can sometimes be specified more precisely
by mission-need characteristics or system behaviors, which serve
as proxies for ground-truth.
AI applications can be divided into these categories for
performance evaluations:
◗ Where ground-truth exists
◗ Where ground-truth exists, but class imbalance or rare
event conditions can skew accuracy calculations and
weighting
◗ Where ground-truth does not exist, but there are reasonable
proxies
◗ Where no ground-truth exists [6].
Computer vision is an example of an application type
where ground-truth exists. An image either contains an object
or does not. However, annotating images to mark correct
answers may be a time-consuming and therefore expensive
process.
When ground-truth exists but is imbalanced, the metric might be
well established, yet the accuracy required for one set of
answers is much greater than for another. For instance,
if an event occurs only 1% of the time yet is the critical
objective of the use case, a metric may be tailored to emphasize that
1% rather than the whole population. At other times, it is easier
or more efficient to tailor the test data's distribution (rare vs.
common events), depending on the criticality of the objective or the data
set size, to ensure that the AI learning system focuses on the
rare events of interest.
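The following sketch, assuming binary labels where 1 marks the rare, mission-critical event, illustrates why overall accuracy can mislead at a 1% event rate and how a metric focused on the rare class exposes the difference. The test data and the function names accuracy and rare_event_recall are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rare_event_recall(y_true, y_pred, rare=1):
    """Fraction of rare-event cases (label == rare) that the model caught."""
    rare_preds = [p for t, p in zip(y_true, y_pred) if t == rare]
    if not rare_preds:
        raise ValueError("test set contains no rare events")
    return sum(p == rare for p in rare_preds) / len(rare_preds)


# Hypothetical test set: 1,000 cases, 10 of them (1%) are the critical rare event.
y_true = [1] * 10 + [0] * 990

# Hypothetical model A never predicts the rare event.
model_a = [0] * 1000

# Hypothetical model B catches 8 of the 10 rare events but raises 30 false alarms.
model_b = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

for name, y_pred in [("A", model_a), ("B", model_b)]:
    print(name, round(accuracy(y_true, y_pred), 3), rare_event_recall(y_true, y_pred))
```

Model A scores 0.99 accuracy yet catches none of the rare events, while model B scores 0.968 accuracy but catches 8 of 10. Recall on the rare class is only one possible tailored metric; the appropriate choice depends on the use case.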
Some applications, such as Machine Translation (MT),
have no ground-truth, because two human translators will typically
translate the same document differently. Both translations may be
equally valid and correct yet differ, so establishing a single right
answer is impossible. Use cases are valuable for identifying proxy
measures. For instance, information analysts often use MT
to support document triage, which relies on proper names.
Proper-name translation can be validated against ground-truth
answers, so these measures serve as proxy measures of
performance and as actual measures of effectiveness.
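A proper-name proxy measure could be computed along the lines of this sketch, which assumes an analyst-supplied list of names that should appear in the translation. The function proper_name_recall and the sample text are hypothetical, and the case-insensitive substring match is a deliberate simplification.

```python
def proper_name_recall(translation: str, expected_names: list[str]) -> float:
    """Share of expected proper names found in the MT output (case-insensitive).

    Hypothetical proxy metric: uses plain substring matching as a simplification.
    """
    if not expected_names:
        raise ValueError("need at least one expected proper name")
    text = translation.casefold()
    found = sum(1 for name in expected_names if name.casefold() in text)
    return found / len(expected_names)


# Hypothetical MT output and analyst-curated list of names that should appear.
mt_output = "Minister Varga met delegates from Tbilisi on Tuesday."
names = ["Varga", "Tbilisi", "Batumi"]
print(proper_name_recall(mt_output, names))
```

A production harness would likely need tokenization and handling of transliteration variants (several valid spellings of one name), since plain substring matching can both miss valid renderings and match names embedded in longer words.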
As a rule, each AIES will fall into one of the four ground-truth
availability categories listed above, although multi-algorithm
systems may have aspects of several categories; an application
that generates captions for images is one example.
The objects recognized in the image would have ground-truth
answers, whereas the text description in the caption
might not.
For situations lacking ground-truth, evaluators will use
operationally based "proxy" metrics as substitutes
for ground-truth. For example, a metric could measure improvement
in the user's ability to perform a task while maintaining
quality.
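The proxy metric described above, improvement in task performance while maintaining quality, might be computed as in this sketch. It assumes throughput (tasks completed per hour) as the measure of task performance and a reviewer-assigned quality score; the TrialResult fields, the assisted_improvement function, and the 0.02 quality tolerance are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record of one user trial.
@dataclass
class TrialResult:
    tasks_completed: int
    hours: float
    quality: float  # e.g., fraction of outputs a reviewer judged acceptable


def throughput(result: TrialResult) -> float:
    """Tasks completed per hour."""
    return result.tasks_completed / result.hours


def assisted_improvement(
    baseline: TrialResult,
    assisted: TrialResult,
    quality_tolerance: float = 0.02,  # hypothetical allowable quality drop
) -> Optional[float]:
    """Relative throughput gain with AI assistance, or None if quality fell
    more than quality_tolerance below the unassisted baseline."""
    if assisted.quality < baseline.quality - quality_tolerance:
        return None
    return throughput(assisted) / throughput(baseline) - 1.0


# Hypothetical trial data.
baseline = TrialResult(tasks_completed=40, hours=8.0, quality=0.95)
assisted = TrialResult(tasks_completed=60, hours=8.0, quality=0.94)
rushed = TrialResult(tasks_completed=80, hours=8.0, quality=0.85)

print(assisted_improvement(baseline, assisted))  # 0.5 (50% more tasks per hour)
print(assisted_improvement(baseline, rushed))    # None (quality dropped too far)
```

Returning None when quality falls outside the tolerance keeps a speed gain from being reported when it was bought with degraded output.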
IEEE Instrumentation & Measurement Magazine
33
