Instrumentation & Measurement Magazine 26-5 - 34

Continuous T&E and Self-Evaluation Feedback: Future AIESs for DoD should adopt a consistent approach at the programmatic level that builds the ability to continuously capture data, results, and standardized measures into every deployed system. These measures should be provided openly to ease adoption by commercial vendors, facilitating not only refinement but also innovation.
The feedback measures can include confidence values provided alongside the results to indicate their reliability. Calibrated confidence measures give users moving from one AIES to another a level of continuity in interpreting system results. Once validated, confidence measures are highly useful for testing and validating a system's operation under different operating conditions. Integrating these measures into the system makes continuous feedback feasible throughout the system's life cycle, increasing trust.
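Whether a system's confidence values are in fact calibrated can be checked mechanically by comparing stated confidence against observed accuracy. As a minimal sketch (the binning scheme and bin count are illustrative choices, not a mandated metric):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and observed accuracy,
    weighted by the number of samples in each confidence bin."""
    totals = [0] * n_bins        # samples per bin
    conf_sum = [0.0] * n_bins    # summed confidence per bin
    hit_sum = [0.0] * n_bins     # summed correctness per bin
    for c, ok in zip(confidences, correct):
        b = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into the top bin
        totals[b] += 1
        conf_sum[b] += c
        hit_sum[b] += 1.0 if ok else 0.0
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        if totals[b]:
            gap = abs(hit_sum[b] / totals[b] - conf_sum[b] / totals[b])
            ece += (totals[b] / n) * gap
    return ece
```

A low value indicates that, for example, outputs reported at 90% confidence are right about 90% of the time, which is what gives users continuity when moving between systems.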
In addition, metrics on system use, such as the extent to which system output is acted on, should be defined so that they inform evaluations. Standard practice should include the facility to capture system use, outputs, user feedback, and, where possible, the resulting outcomes of system use. This addresses the fact that ML model performance in real-world applications can degrade over time. Because AI models are integrated into existing operations, a continual feedback loop improves future evaluation efforts.
T&E from Development through Operations: A T&E plan for AIESs should include a checklist ensuring that tools, measures, processes, and SME input are applied. The plan should capture key artifacts that provide a measure of the quality of the AI development and training processes. These artifacts include well-documented training and testing, with datasheets, model cards [7], training/test logs, results, and lessons learned.
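As one illustration, a minimal model-card artifact in the spirit of [7] can be captured as a structured record whose completeness is checked automatically. The fields below are assumptions drawn from the model-card literature, not a mandated schema:

```python
# Illustrative model-card record; fields are assumptions, not a mandated schema.
model_card = {
    "model_name": "example-target-classifier",
    "version": "0.3.1",
    "intended_use": "illustrative image classification under benign conditions",
    "training_data": "see accompanying datasheet",
    "evaluation_data": "holdout split plus independent operational set",
    "metrics": {"accuracy": None, "calibration_error": None},  # filled at test time
    "limitations": ["performance may degrade under domain drift"],
    "lessons_learned": [],
}

def is_complete(card, required=("model_name", "version", "intended_use", "metrics")):
    """Check that the mandatory fields of a card are present and non-empty."""
    return all(card.get(k) not in (None, "", [], {}) for k in required)
```

A checklist item in the T&E plan could then be as simple as requiring `is_complete` to pass before a model advances to operational testing.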
Traditional software development processes increasingly use DevSecOps tools and practices. AI-based system development, like other software development, should include specific tools that tie the challenges of AI development and testing into an overall managed ecosystem tailored to AI's unique processes. The hardware/platform profile, network connectivity, power reliability, and form factor must be included in the use-case specifications. AIESs that rely heavily on massive models and large feature sets have unique power and/or connectivity constraints. Capturing these constraints ensures that testers conduct operational test and evaluation of AIESs within the bounds of their power and compute limitations. A well-designed operational test for AIESs is another measure of quality. The test plan must include all data and effects relevant to the environment the system will operate within.
Data
Data is the driving force in ML and plays a critical role in T&E. Having the right data, reflective of operational conditions, yet also capturing boundary conditions, unexpected inputs (accidental and intentional), and imbalanced classes will ensure more effective T&E. By including data as an essential component of an overall T&E framework, evaluations become more repeatable, efficient, and informative. Evaluation results are an important data element to add to the evaluation package. Not only do they enable comparison to past iterations of systems or new capabilities to determine progress, but they also facilitate detecting error causes and domain drift [6].
Data hubs and data sharing infrastructures are increasingly used throughout service-level enterprises. A data hub for T&E could hold the AI test data, isolated from the AI training data. The trend toward datasheets for training data should be extended to include and differentiate test results data. Datasheets [8] describe all aspects of a data set, including the data's specifics, characteristics, amount, type, and coverage; the collection procedures; any potential sources of bias or ethical concern; and pre- and post-processing needs.
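The isolation of test data from training data in such a hub can be verified mechanically. A minimal sketch follows, using content hashes as a sample's identity in the hub, which is an assumption about how the hub would index items:

```python
import hashlib

def fingerprint(sample: bytes) -> str:
    """Content hash used as a sample's identity (an illustrative assumption)."""
    return hashlib.sha256(sample).hexdigest()

def leaked_samples(train_samples, test_samples):
    """Return fingerprints of samples present in both partitions."""
    train_ids = {fingerprint(s) for s in train_samples}
    return sorted(fingerprint(s) for s in test_samples
                  if fingerprint(s) in train_ids)
```

An empty result confirms the test partition contains nothing the model was trained on, a precondition for any credible evaluation.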
Planning for T&E must start at the earliest stages of program acquisition. Evaluation data should be drawn from two sources: a holdout set from the training data and an independently collected operational data set. The cost of curating and annotating this data is typically high, and future systems should be designed to automatically collect operational evaluation data. Automatic data collection could be built into the tools, and standard operating procedures would exist to automatically collect operational data along with the results of its human use. This facilitates testing that closely aligns with operational use.
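The two-source evaluation data described above can be sketched as follows; the holdout fraction and seed are illustrative choices, not recommended values:

```python
import random

def split_evaluation_data(curated, operational, holdout_frac=0.2, seed=0):
    """Form the two evaluation sources: a holdout carved from the curated
    training data, plus an independently collected operational set kept whole.
    holdout_frac and seed are illustrative, not recommended, values."""
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    pool = list(curated)
    rng.shuffle(pool)
    k = int(len(pool) * holdout_frac)
    holdout, train = pool[:k], pool[k:]
    return train, {"holdout": holdout, "operational": list(operational)}
```

Keeping the operational set whole, rather than mixing it into the holdout, preserves the distinction between "performance on data like the training data" and "performance under real operating conditions."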
In situations where additional processing or annotation is
necessary, collections of tools that facilitate more rapid data
annotation are expected to become increasingly available (e.g.,
[9]). They would reside in a repository as part of the evaluation
infrastructure.
Synthetic evaluation data is useful where the amount of relevant operational data is low or where the event of interest is rare. For instance, in image recognition, if the data contains 10,000 images of a terrain and only 100 images of a tank in that terrain, then synthetic generation of additional tank images allows for more accurate testing. The use of synthetic data presents risks in that it can reduce fidelity to operational data, increase vulnerability to malicious attack, or introduce bias [10]. A future data environment would have tools to generate synthetic test data that reflects operational data closely enough to give meaningful and accurate results while reducing the data collection and annotation burden. This future vision is more distant than the other recommendations, since it is an area of active research requiring fundamental AI R&D.
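Before any generation takes place, the tank-in-terrain imbalance above can be quantified to decide how much synthetic data is needed. A minimal sketch (actual synthesis would use generative models or simulation, which this does not attempt; the target ratio is an illustrative choice):

```python
from collections import Counter

def synthetic_budget(labels, target_ratio=0.5):
    """Number of synthetic samples needed per class so that every class
    reaches target_ratio of the majority class count.
    target_ratio is an illustrative choice, not a recommended value."""
    counts = Counter(labels)
    majority = max(counts.values())
    goal = int(majority * target_ratio)
    return {cls: max(0, goal - n) for cls, n in counts.items()}
```

With the article's numbers (10,000 terrain images, 100 tank images) and a 0.5 target ratio, the budget calls for 4,900 synthetic tank images and none for terrain.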
An essential element of T&E is the ability to compare current testing to previous tests. This means treating previous test results as a data set that must be cataloged and tracked. In the past, test results, once summarized and reported, could be lost even though the data used for testing was retained. Saving test results avoids rerunning tests to compare potential progress or confirm compatibility, and allows better comparisons to past tests. Future test data repositories should treat test results
IEEE Instrumentation & Measurement Magazine
August 2023
