header
 
 

Executive Summary  

Full Report                      

NSF Dear Colleague Letter                                   


 


 
 
Report of an NSF, USGS, NASA Workshop on Biodiversity and Ecosystem Informatics held at NASA Goddard Space Flight Center, June 22 - 23, 2000.

Dave Maier, Eric Landis, Judy Cushing, Anne Frondorf, Avi Silberschatz, Mike Frame and John L. Schnase (Editors)


Executive Summary

In June 2000, a group of computer scientists, biologist, and natural resource managers met to examine the prospects for advancing computer science and information technology (CS/IT) research by focusing on the complex and often unique challenges found in the biodiversity and ecosystem domain. We refer to this emerging, interdisciplinary field of study as Biodiversity and Ecosystem Informatics (BDEI). This report synthesizes the discussions and recommendations made at the workshop. It itemizes current BDEI challenges, lays out a national BDEI research agenda, and recommends actions to be taken within the national research agenda. It also proposes specific mechanisms to communicate and implement those actions. The following points summarize the conclusions of this forum:
  • The CS/IT research community plays a foundational role in creating the technological infrastructure from which advances in the environmental sciences evolve;

  • The next-generation CS/IT applications required by our expanding need to understand complex, ecosystem-scale processes will require solutions to significant, ground-breaking CS/IT research problems;

  • Important new research opportunities for the CS/IT community are provided by the urgency, complexity, scale, and uniqueness of the data, processes, and problems presented by work in the biodiversity and ecosystem domain; and

  • There is an increased need for governmental and industrial support of basic CS/IT research in order to respond to these challenges. Both the national CS/IT and environmental research agendas would derive significant, synergistic benefit from such investment.
 

In the remainder of this section, we introduce two major themes that weave throughout this report. First, the CS/IT requirements of biodiversity and ecosystem research are drastically changing, thereby requiring new solutions to fit the altered landscape. Second, the CS/IT research community has a long and successful record of creating new solutions and enabling the technology transfer needed to put these ideas into practical use. It is therefore a wise investment of public monies to ensure that the emerging, interdisciplinary field of biodiversity and ecosystem informatics becomes a healthy and viable discipline. top of page

Biodiversity & Ecosystem Sciences

The most striking feature of Earth is the existence of life, and the most striking feature of life is its diversity. This biological diversity - or biodiversity - provides us with clean air, clean water, food, clothing, shelter, medicines, and aesthetic enjoyment. Biodiversity, and the ecosystems that support it, contribute trillions of dollars to national and global economies, directly through industries such as agriculture, forestry, fishing, and ecotourism and indirectly through biologically-mediated services such as plant pollination, seed dispersal, grazing land, carbon dioxide removal, nitrogen fixation, flood control, waste breakdown, and the biocontrol of crop pests. And biodiversity - the biological richness of ecosystems per se - is perhaps the single most important factor influencing the stability and health of our environment. Clearly, this is one of our most important knowledge domains, vital to a wide range of scientific, educational, commercial, and government activities.

There is an increasing need to understand and respond to complex environmental problems. Just as we are developing a capacity to predict long-term climate events, we would now like to predict public health and ecological outcomes far into the future. Unfortunately, we currently lack the technologies to do this. The environmental sciences are "resource limited" by fundamental inadequacies in the CS/IT tools that can be applied to problems of this scale. If we are to keep pace with our need for quality information about the living systems of our planet, we must produce systems that can efficiently manage petabytes of a new generation of high-resolution, Earth-observing satellite data. We must understand how to integrate these new datasets with traditional biodiversity data, such as specimen data held in natural history collections, and genomic data from cellular- and molecular-level work. We must be able to make correlations among data from these and even more disparate sources, such as ecosystem-scale global change and carbon cycle data, compile those data in new ways, analyze them, and present the results in an understandable and usable way.

Despite encouraging advances in computation and communication performance in recent years, we are still unable to perform these activities on a large scale. It is only recently, for example, that IBM announced plans to build the world's fastest supercomputer - Blue Gene - which will attempt to compute the three-dimensional folding of human protein molecules. Given the thousands of proteins that are produced by the unknown millions of species on this planet, and given too that many of these molecules may have potentially significant economic value or environmental importance, we are clearly entering a new world of computer-mediated exploration. top of page

Biodiversity and Ecosystem Informatics

Until recently, little attention has been paid to computer and information science and technology research in the biodiversity and ecosystem domain. The interdisciplinary field of biodiversity and ecosystem informatics (BDEI) is attempting to change that. We are pushing the boundaries in two directions by identifying research challenges that can simultaneously advance the environmental sciences and the computer and information sciences. The potential for such synergies is high because of the nature of work in the biodiversity and ecosystem domain.

The single most important factor influencing work in this field is the problem of complexity. This complexity arises from several sources. First is the underlying biological complexity of the organisms themselves. There are millions of species, each of which is highly variable across individual organisms, populations, and time. Species have complex chemistries, physiologies, developmental cycles, and behaviors resulting from more than three billion years of evolution. There are hundreds, if not thousands, of ecosystems, each comprising complex interactions among large numbers of species and between those species and multiple abiotic factors. The second source of complexity is sociologically generated and includes problems of communication and coordination - among agencies, divergent interests, and groups of people from different regions, from different backgrounds, and with different points of view. Biodiversity and ecosystem data can be politically and commercially sensitive and entail conflicts of interest. The kinds of data scientists have collected about organisms and their relationships vary greatly in precision and accuracy, and the methods used to collect and store these data are almost as diverse as the natural world they document. Many important observations are made by non-scientists, such as amateur birders and natural history enthusiasts. And the range of datasets with which these datasets must interact is unusually broad, including geographical, meteorological, geological, chemical, physical, and genomic sources. There is thus an unusual need to accommodate differences in data quality within a democratized community information infrastructure that is both formal and informal.

As in most biological and earth sciences, location is central. Much biodiversity and ecosystem data is georeferenced - it is tied to some place on the globe. Sometimes the designation of a location can be ambiguous or imprecise, especially with observations and samples taken in previous centuries. As a result, something as central to the science as a means for spatial referencing becomes a complex issue. Biodiversity and ecosystem data are also distinctive for being species-referenced. Genetic data is frequently associated with a species or sub-species, invasions and extinctions are tracked at the species level, and much of the characterization of an ecosystem is described through the number and distribution of its constituent species. However, the naming of species is an abstract process, deeply embedded in long-standing scientific cultural processes - incomplete, subject to local variation, and changing with time. In the ongoing process of species discovery, different scientists may assign two or more names to the same species, and a single species name may be applied to what turns out to be distinct species. To make matters worse, most species on the planet have not yet been named and classified, and there is no authoritative listing of all the species we do know. In this field, ontological complexities abound!

Many key biodiversity and ecosystem questions involve flux - changes in range, numbers, distribution, genetics, and proportions over time. Extinctions, migrations, incursions, restorations, predicted environment impacts are all issues of flux. However, seldom does one dataset span enough time, area, or include enough species to answer important questions by itself. Scientists often require that biodiversity and ecosystem data be assembled from different sources into time sequences of comparable datasets, realizing that the component datasets may have been compiled for quite different purposes. Scientists also often deal with data at small scales over a large area or extended periods of time. Many significant situations will be lost if standard methods for moving to larger scales are used.

Finally, historical information serves prominently in the work of biodiversity and ecosystem scientists. Examples include plant and animal specimens and their labels, publications (some dating back 250 years), maps, and personal field notebooks. The study of biodiversity and ecosystems requires the analysis of trends, adaptations, and long-term relationships. These historical sources are thus often as pertinent as contemporary data. An additional and significant problem is that many of the historical information sources are not yet in digital form. For example, over 750 million natural history specimens and their accompanying metadata remain to be digitized in the US alone.

Because of these complexities, humans still play a crucial role in the processing of biodiversity and ecosystem data. This information is simply not as amenable to automatic correlation, analysis, synthesis, and presentation as many other types of information. People act as sophisticated filters and query processors - locating resources on the Internet, downloading datasets, reformatting and organizing data for input to analysis tools, then reformatting again to visualize results. This process of creating higher-order understanding from dispersed datasets is a fundamental intellectual process in the biodiversity and ecosystem sciences, but it breaks down quickly as the volume and dimensionality of the data increase. Who could be expected to understand millions of cases, each having hundreds of attributes? Yet problems on this scale are common in biodiversity and ecosystem research. Biodiversity and Ecosystem Informatics Research Agenda

Given this context, there are clearly areas where computer science and information technology research could be advanced - with great social and scientific benefit - by focusing on challenges in the biodiversity and ecosystem domain. These synergistic opportunities fall into three major categories: acquisition and conversion of data and metadata, analysis and synthesis of data and metadata, and dissemination of data and metadata. Some specific opportunities include the following:

Acquisition and Conversion of Data and Metadata

  • Modernizing the Biological Library - The accumulated volume of biological information and data collected over the past 250 years is massive. Improving methods for organizing, storing and retrieving these records is extremely critical. New techniques and tools must be developed for information extraction, text understanding, and cross-lingual information retrieval, making this an important non-business application domain for research on data integration, data cleansing, data warehousing, and archiving.

  • Digitizing the Biological Legacy - America's museums and laboratories maintain nearly one billion biological specimens. There is an urgent need to convert them, their documentation, and new specimens into metric-quality digital formats. This provides an excellent opportunity to advance research on lossless image compression, 3D image understanding, robotics, and the problem of integrating physical artifacts into digital libraries.

  • Multi-dimensional Observation and Recording - Efforts are needed to enable the collection of detailed information about the Earth in multiple dimensions and at multiple scales. This provides rich opportunities for research on scaling sensor-fusion techniques to large fields and developing and testing temporal-spatial data access methods.

  • Mobile Computing - New instrumentation is needed to bring knowledge to the field and to collect, store, and transmit data from the field. Specific opportunities here include applications of human-computer interaction research to multi-model interfaces, hands-free systems, wearable computers, remote presence, robotics, and human augmentation.

  • Taxonomic Freedom - Changes in biological names and classification schemes over time and discipline present enormous challenges. There is a need to integrate various interpretations, views, and versions of taxonomic data and make it available in a simple, easy-to-understand formats. This provides an unusually challenging context in which to examine the flexibility and robustness of knowledge representation systems, particularly their temporal and versioning aspects and their support for cross-ontology linking and translation.

Analysis and Synthesis of Data and Metadata

  • Comparing Across Scales - Biological data from different sources and times are frequently collected and presented in different scales and resolutions resulting in a loss of detail when multiple datasets are required for data synthesis and analysis. Tools and procedures to facilitate analysis across scales are needed, which provides an important opportunity for research on adaptive- and multi-resolution techniques for computation and modeling.

  • Modern Modeling - Researchers, managers, and policy-makers require models for biological decision-making rather than disaggregated collections of data and facts. Improved spatio-temporal modeling of biological, ecological, and social processes are required, providing a fertile area for multi-modal data assimilation research and high-performance computing.

  • Taxonomic Retooling - Taxonomists need new and improved tools for naming and defining species, changing and manipulating taxonomic organization, and performing other tasks regarding taxonomic content and structure. This provides a rich domain for research in knowledge acquisition and hierarchical display techniques.

  • Making Data Usable - Too frequently, decision-makers underutilize research results. To enhance the use of biological data, decision-makers require systems that will facilitate the synthesis and analysis of scientific data and research results. This is an excellent application area for research on uncertainty analysis, reasoning with incomplete information, and automatic summarization.

  • Machine Processable Metadata - Current scientific metadata is largely for human consumption. It is used to document and interpret datasets. However, much more value will be gained from it when it is complete, correct, and descriptive enough to help automate data manipulation tasks, such as summarization, combination of datasets, and conversion of data to appropriate forms for use in models and statistical tools. There are important opportunities here for testing of data-based inferencing technology and metadata-based information integration research.

  • Need for Speed and Accuracy - Many tasks in data management are iterative and time consuming. Researchers are frequently challenged with data entry and pattern discovery procedures and are required to estimate the quality of utilized data. Meanwhile species are disappearing at a rate greater than they can be recorded. This is a challenging domain for research on data reduction and data mining algorithms, including parallel implementations, modeling and analytic techniques with tunable accuracy, and data quality metrics.

Dissemination of Data and Metadata

  • Visualization - Users of biodiversity and ecosystem data and information, including land managers, policymakers, educators, non-governmental organizations, industry, and others outside biological research, need visualization techniques to better understand data, relationships among data, natural processes, and management actions over time. This leads to opportunities for research on advanced display and visualization techniques, including display of uncertainty, user-adaptive display, and multi-dimensional data visualization.

  • Interdisciplinary Collaboration and Communication - Stakeholders of biodiversity and ecosystem data are growing in numbers and breadth. No longer are management decisions made solely by individuals or single agencies, but involve communities of individuals. This calls for the development of computer-supported cooperative work and remote collaboration research suited for participants with widely varying roles, specialties, and training. It also provides opportunities to study cross-domain mapping, data integration, data quality management, ontologies, and other knowledge representations.

  • Data Management Guidelines - Biodiversity and ecosystem information is frequently used in complex and potentially controversial political, economic, and environmental discussions and decision-making. Informatics issues arising from this context include issues of data security, data sharing policies, intellectual property rights, quality assurance, and reuse of data. This provides important opportunities for research on data models for representing annotation and provenance, explicit modeling of data product generation, and policy development and dissemination techniques. top of page

Biodiversity and Ecosystem Informatics Research Agenda Implementation Plan

A concerted effort should be made to build a sustainable biological information infrastructure that proactively engages the broader CS/IT community in BDEI research. The following are among the specific actions that should be taken:

  • Interdisciplinary Planning Groups - Interdisciplinary planning groups, comprising members of the biodiversity and ecosystem and CS/IT research communities, should be established to articulate and communicate the special informatics challenges from one community to research actions in the other community. These planning groups should identify existing CS/IT technologies that could be transferred from other domains, long-term basic CS/IT research questions, short-term CS/IT research that needs to be pursued, and infrastructure needs including, equipment, facilities, networks, and personnel.

  • Matching Research Needs with Available and Appropriate Mechanisms - Efforts to implement any research agenda item need preliminary study to determine how these research actions could benefit from existing programs. These programs include mechanisms for funding, partnerships, interdisciplinary training and teaming, and resource sharing. There also is a need to insure that new CS/IT research is effectively applied to real biodiversity and ecosystem test cases.

  • Communicating the Research Agenda - Every effort should be made to communicate the BDEI research agenda to an audience that includes researchers in computer science and the biodiversity and ecosystem sciences, as well as the many agencies and foundations that support their efforts. Recommended actions include developing extended workshops or seminars; building a multi-sector, multi-disciplinary community; developing interdisciplinary "matchmaking" mechanisms; adding a CS/IT component to existing biodiversity and ecosystem projects; developing venues for multi-disciplinary activities; and promoting biodiversity and ecosystem informatics through the dissemination of reports, publications, email distribution lists, and websites.

  • Short-term Critical Actions that Require Immediate Attention - Several activities should be immediately initiated to "jump start" BDEI research. Paramount among these is an urgent plea from the scientific community for the formation of an NSF, USGS, NASA interagency strategic partnership to promote BDEI research. Within the next fiscal year, every effort should be made to launch a high-profile solicitation for cutting-edge research in this area. This activity should highlight the urgency and importance of biodiversity and ecosystem informatics problems and opportunities, and provide a forum for organizing problem-specific, interdisciplinary consultative and investigative teams and pursuing problems relevant to the core missions of the sponsoring agencies.
    top of page

Conclusion

A more complete consideration of these issues is presented in the full workshop report. The workshop and report emphasize that the biodiversity and ecosystem sciences are fundamentally information sciences, and worthy of special attention from the computer science and information technology community because of their distinctive attributes of scale and socio-technical complexity. At almost every turn, scale, complexity, and urgency conspire to create a particularly wicked set of problems. Working on these problems will undoubtedly advance our understanding and use of information technologies, and, even more important, give us the tools to protect and manage our natural world so as to provide a stable and prosperous future. top of page

Responsible NASA Official:

Dr. John L. Schnase

Senior Bioinformatics Scientist
Earth and Space Data Computing Division
NASA Goddard Space Flight Center

top of page

NASA Privacy Statement

Modified: 19 July 2001