Title: A Data-aggregation and Annotation System for Biological Collections AH draft (ahipp 20:18, 25 June 2009 (UTC))

Data on the distribution, ecology, and identity of wild-collected organisms are essential to understanding the evolution of biodiversity. With access to reliable data on the habitats, localities, and taxonomy of wild-collected organisms, ecologists are able to model the effects of climate change on biotic communities, predict how species distributions will be affected by urbanization, and identify populations for genetic, ecological, and biosystematic study. Data repositories such as GBIF and NBII play a central role in providing access to these data, but they depend on collectors and taxonomists who can accurately identify and georeference the collections. As data sources grow, aggregating and validating data at regional levels becomes increasingly important.

The project proposed here will implement a set of tools for data aggregation, annotation, and georeferencing within a regional biological collections node for the NBII Great Lakes region, built around tools being developed by GBIF and extended for this project. The project will deploy these tools as a coherent system, built on a flexible regional web interface that integrates with existing species data already compiled for many species found in the region, as well as with plant-identification keys. As a data-curation tool, the Great Lakes herbarium node will (1) provide access to integrated data that are fed upward to the node from participating institutions and downward to the node from GBIF, and (2) allow users to annotate specimens based on their knowledge of the specimens and collection localities. Annotations will live at the node level and flow back to the source institutions, where they can be accepted, rejected, or superseded according to each source institution's curatorial policies and knowledge.

The project will have two important outcomes. Most immediately, it will result in a network of herbarium data for the Great Lakes region that will provide access to high-quality data already available at each of the six collaborating institutions as well as six additional participating institutions. This number includes two sub-regional networks (vPlants, representing the greater Chicago region; and Wisconsin's WISPLANTS network), providing a test of data aggregation when the aggregation target is itself a data node. The network database will be populated with an immediately available body of data by the end of the project, and the portal will provide ready access to researchers, educators, and land managers. Long-range plans and institutional missions at the core collaborating institutions ensure that data will continue to be added to the system, and a hosting collaboration with NBII ensures stability of the system indefinitely. In the longer run, the project creates a set of tools for data aggregation at regional scales that can be readily installed on other systems.

Intellectual merit: The proposed project integrates a suite of bioinformatics tools that increase the accuracy and accessibility of biodiversity data. Through collaboration between databased biological collections and GBIF, one of the world's major international data servers, the project implements and validates a system of data aggregation, georeferencing, and annotation that will be directly portable to other regional nodes.

Broader impacts: The project creates a new collaboration among Great Lakes region biological collections and GBIF. Moreover, the data provided will serve land managers, educators, and the lay public in addition to biodiversity researchers.

  • Do we need to be more clear that we're starting with all/only botanical data [plants, fungi, lichens]? It is implied by the inclusion of "herbarium data", but not sure the ABI reviewers would implicitly "get that".--gtonkovich 16:17, 26 June 2009 (UTC)
  • Are we still planning on the ability of a feed-back loop to the partner institutions such that this portal also acts as a data curation tool? If so, is that an important point for the summary or not?--gtonkovich 16:21, 26 June 2009 (UTC)
    • I agree on both of these (AH); I'll make the relevant changes. (ahipp 14:12, 29 June 2009 (UTC))
