Society for the Preservation of Natural History Collections September 2019 - 24
Features
Maintaining Data Flexibility and Accessibility Over Time in Large Natural History Collections
Laurel Kaminsky*, Trudi Durgee, Akito Y. Kawahara
Digitization is often perceived as a balance between time commitments and data capture. Time
includes the hours spent designing workflows and equipment, managing personnel, capturing data,
and performing quality control. Data refers to the type, amount, and quality of information
generated. However, a third component is often overlooked, which we call "data flexibility": the
curation and maintenance of the generated data over time. A plan to maintain data is important
to ensure that data is not lost as technology changes and that it remains accessible to future
researchers. For example, if a researcher left data on a floppy disc and never copied it to a
modern computer, the data would likely be lost, either through corruption of the disc or because
modern computers can no longer read it.
While every institution should have a plan to maintain its data, data flexibility is critical for
large natural history collections, which must manage the enormous number of files generated
through digitization. As data accumulates, it becomes harder to go back and change folder
organization or to batch rename files to a standard format. The McGuire Center for Lepidoptera
and Biodiversity (MGCL) was founded in 2004 and contains an estimated 1 to 3 million pinned
specimens, 2 to 7 million papered specimens, 20,000 microscopic slides, and 50,000 fluid lots of
larval collections (Kawahara et al. 2012). If we image every specimen at MGCL, we will probably
generate more than 5 million files. To maintain data flexibility of images at MGCL, we have
created a workflow that includes: 1) simple rules that govern how we name images, add images to
folders, and search for images; and 2) a design that allows for fluid data transfer between
databases using computer scripts.
Keep Organizational Rules Simple
There is a benefit to keeping computer infrastructure organization and rules simple. Simple rules
are easier for digitization volunteers and staff to follow and remember, and easier to work out
again if institutional knowledge is lost. File naming is especially important to keep simple
because image files form the bulk of the files generated by digitization. Our file naming
protocol is MGCL_#######_[D or V or L], where "D" is dorsal, "V" is ventral, and "L" is lateral.
We opted not to put the sex in the image name because sexing specimens would take too much time,
the sex may not be known, and including it would make the file naming format less standardized.
The scientific name
was not included in the image file name because a name change would require updating the file
name and would make it harder for the collection manager to assess whether an image has been
uploaded to a database. Sex and scientific name are captured in a database and can be added to
the file name through scripting if necessary (see next section).
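The naming rule above is simple enough to check mechanically. The following Python sketch is an illustration rather than the script used at MGCL. It assumes that the seven "#" characters stand for exactly seven digits and that image files carry an extension such as .jpg or .CR2; the names parse_image_name and ImageName are hypothetical.

```python
import re
from pathlib import PurePath
from typing import NamedTuple, Optional

# Assumption: "#######" in the protocol means exactly seven digits.
NAME_PATTERN = re.compile(r"MGCL_(\d{7})_([DVL])")

VIEWS = {"D": "dorsal", "V": "ventral", "L": "lateral"}


class ImageName(NamedTuple):
    catalog_number: str  # kept as text so leading zeros survive
    view: str            # "dorsal", "ventral", or "lateral"


def parse_image_name(filename: str) -> Optional[ImageName]:
    """Return the catalog number and view, or None if the name breaks the rule."""
    stem = PurePath(filename).stem  # drop the extension (.jpg, .CR2, ...)
    match = NAME_PATTERN.fullmatch(stem)
    if match is None:
        return None
    return ImageName(match.group(1), VIEWS[match.group(2)])
```

Running a check like this over a folder before upload would flag names that volunteers typed incorrectly, while the file name itself stays free of sex and scientific name.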
Folder structure is critical to maintaining data flexibility. We avoid unnecessary folder nesting
and duplicate the same folder structure across image types, which makes folders easier to find
and scripts easier to write. We have three main image folders in our pinned collection, based on
image type: raw (.CR2), high-compression (.jpg), and low-compression (.jpg). All three image
folders are at the same directory level, and the file structure within each of them is the same.
Folders are organized by family, then by genus, and within each genus by the date the images
were taken. It is estimated that there are over 100,000 species of Lepidoptera in the world.
Adding species-level folders would make it harder to keep track of images, especially when
identifications change.
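Because the three image-type folders share one internal layout, a single relative path locates an image set in all of them. The Python sketch below illustrates this under stated assumptions: the top-level folder names (raw, jpg_high, jpg_low) and the YYYY-MM-DD date format are placeholders, since the article does not specify them, and image_folders is a hypothetical helper.

```python
from datetime import date
from pathlib import PurePosixPath

# Assumption: placeholder names for the three image-type folders.
IMAGE_TYPE_FOLDERS = ("raw", "jpg_high", "jpg_low")


def image_folders(root: str, family: str, genus: str, taken: date) -> dict:
    """Map each image type to its family/genus/date folder."""
    # Assumption: date folders use ISO format, which sorts chronologically as text.
    relative = PurePosixPath(family) / genus / taken.isoformat()
    return {
        kind: PurePosixPath(root) / kind / relative
        for kind in IMAGE_TYPE_FOLDERS
    }
```

A script that knows where the raw files for a genus and date live can therefore derive the matching compressed folders without searching for them.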
Lastly, a method for tracking what has been digitized is crucial. We keep a spreadsheet of basic
summary statistics: what we have digitized, the number of species, who performed the work, and
where each specimen is in the post-imaging pipeline (including transcription, georeferencing,
and uploading data to databases). The key piece of information is the folder date, that is, the
date the images were taken. We only run a post-imaging process on an entire folder, which makes
the folder's status easier to track.
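Since each post-imaging step is completed on a whole folder, a tracking spreadsheet can hold one row per date folder and be queried with a few lines of code. The Python sketch below assumes the spreadsheet is exported as CSV with the columns family, genus, folder_date, transcribed, georeferenced, and uploaded, and that a finished step is marked "yes". These column names and the function pending_folders are illustrative assumptions, not the layout of the MGCL spreadsheet.

```python
import csv
import io

# Assumption: one column per post-imaging step, marked "yes" when done.
STEPS = ("transcribed", "georeferenced", "uploaded")


def pending_folders(csv_text: str, step: str) -> list:
    """List (family, genus, folder_date) for folders where `step` is not done."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["family"], row["genus"], row["folder_date"])
        for row in reader
        if row[step].strip().lower() != "yes"
    ]
```

Tracking at the folder level keeps the spreadsheet short: it grows with the number of imaging sessions, not with the number of image files.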
Data Flexibility and Computer Scripting
Data flexibility includes how computer infrastructure can be built to streamline post-imaging
processing, file movement, and compatibility with databases. Basic scripting skills are crucial.
The benefit of scripting is that it reduces the time spent on repetitive processes. For example,
a researcher who wants to add the scientific name or sex to an image file name can download the
data from the database and then run a script to insert it into the file name. A standardized
folder structure implemented across image types enables a collection manager to move images, or
add new ones, from one folder to another. Data flexibility also ensures that information from a
database can be used to help manage computer files. One example involves new accessions that are
unidentified. If a collection manager has unidentified specimens from many genera, the specimens
can be imaged first, then identified, and the names entered into the database. The
identifications can then be downloaded, and a script can move the associated images to the
appropriate genus folders.
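Both scripting examples above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the scripts used at MGCL: the database export is a CSV file with the columns catalog_number, family, and genus; unidentified images wait in a folder named for the imaging date; and a label is appended to the end of the file name. All function names are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def load_identifications(export_csv: Path) -> dict:
    """Map catalog number -> (family, genus) from a database export."""
    with export_csv.open(newline="") as handle:
        return {
            row["catalog_number"]: (row["family"], row["genus"])
            for row in csv.DictReader(handle)
        }


def name_with_label(image_name: str, label: str) -> str:
    """Append a database value (e.g. scientific name or sex) to a file name."""
    path = Path(image_name)
    return f"{path.stem}_{label.replace(' ', '_')}{path.suffix}"


def move_identified_images(date_folder: Path, type_root: Path,
                           identifications: dict, dry_run: bool = True) -> list:
    """Move images to type_root/family/genus/<date folder name>.

    Returns the (source, target) pairs. With dry_run=True nothing is moved,
    so the plan can be reviewed first.
    """
    moves = []
    for image in sorted(date_folder.iterdir()):
        if not image.is_file():
            continue
        parts = image.stem.split("_")  # e.g. ["MGCL", "0012345", "D"]
        if len(parts) != 3 or parts[1] not in identifications:
            continue  # leave unidentified or oddly named files in place
        family, genus = identifications[parts[1]]
        target = type_root / family / genus / date_folder.name / image.name
        moves.append((image, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(image), str(target))
    return moves
```

Because the folder structure is the same for every image type, the same function can be called once per image-type folder with the same identifications. The dry-run default is a design choice for safety when thousands of files are involved.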