Publishing data through GBIF

Data about the world’s biodiversity is complex. For centuries, scientists and researchers studying the natural world have recorded a wealth of information about the organisms they observe or collect. Public and private institutions around the world manage this information. The same technological advances that enable scientists and amateurs alike to contribute to our body of scientific evidence also make it possible, at least in theory, to establish richer connections between all these data. But standardizing all their sources and formats is an overwhelming task.

GBIF is a research infrastructure for biodiversity data. In general, we integrate data by focusing on the specific elements that tie all this varied and variable information together: evidence that a verified source found a specific organism at a specific time and place. Most data include other details, but by using GBIF and its associated standards, tools and web services, we enable people to publish, discover and retrieve thousands of datasets containing hundreds of millions of species occurrences.


Supported classes of datasets

The four classes of datasets supported by GBIF start simply and become progressively richer, more structured and more complex. We encourage data holders to publish the richest data possible to ensure their use across a wider range of research approaches and questions, but not every dataset includes information at the same level of detail. Sharing what is available through GBIF is valuable, because even partial information answers some important questions.

Resources metadata

At its simplest level, GBIF allows institutions to create datasets describing undigitized resources like those in natural history and other collections. All three other dataset classes include this basic information, but this ‘metadata-only’ class offers researchers a valuable tool for discovering and learning about evidence not yet available online. Metadata-only datasets can also help assess the relative importance and value of undigitized collections and set priorities for future digitization. As with all datasets, GBIF ensures that each metadata dataset is associated with a unique Digital Object Identifier (DOI) to streamline data users’ citation of these resources.

Checklist data

Datasets can also provide a catalogue or list of named organisms, or taxa. While they may include additional details like local species names or specimen citations, these ‘checklists’ typically categorize information along taxonomic, geographic, and thematic lines, or some combination of the three. For example, a dataset that catalogues the Red Listed molluscs of Seychelles has distinct elements of taxonomy (the phylum Mollusca), geography (the island nation of Seychelles) and theme (species deemed imperiled by IUCN experts). Checklists function as a rapid summary or baseline inventory of taxa in a given context.

Occurrence-only data

Other datasets published through GBIF have sufficiently consistent detail to contribute information about the location of individual organisms in time and space. That is, they offer evidence of the occurrence of a species (or other taxon) at a particular place on a specified date. These datasets make up the core of data published through GBIF, and examples range from specimens and fossils in natural history collections to observations by field researchers and citizen scientists and data gathered from camera traps or remote-sensing satellites.

Occurrence records in these datasets sometimes provide only general locality information, sometimes simply identifying the country, but in many cases more precise locations and geographic coordinates support fine-scale analysis and mapping of species distributions.
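
As a rough illustration of the what, where and when that an occurrence record carries, the sketch below models one in Python. The class name, field names and sample values are all invented for this example and are not taken from any GBIF schema or API; the only point is that a record may hold just a country or may also hold coordinates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical record: a taxon found at a place on a date."""
    scientific_name: str
    event_date: date
    country_code: str                  # general locality only
    latitude: Optional[float] = None   # decimal degrees, when recorded
    longitude: Optional[float] = None  # decimal degrees, when recorded

    def has_coordinates(self) -> bool:
        """True when both coordinates are present."""
        return self.latitude is not None and self.longitude is not None


# Invented sample values, for illustration only.
records = [
    Occurrence("Puma concolor", date(2015, 3, 14), "AR"),
    Occurrence("Puma concolor", date(2016, 7, 2), "AR", -34.6, -58.4),
]

# Only records with coordinates can feed fine-scale mapping.
mappable = [r for r in records if r.has_coordinates()]
```

In this sketch both records count as evidence of occurrence in the country, but only the second could be placed on a fine-scale map.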

Sampling-event data

Datasets sometimes provide greater detail, not only offering evidence that a species occurred at a given location and date, but also making it possible to assess community composition for broader taxonomic groups or even the abundance of species at multiple times and places. These datasets typically derive from standard protocols for measuring and monitoring biodiversity like vegetation transects, bird censuses and freshwater or marine sampling. By indicating the methods, events and relative abundance of species recorded in a sample, these datasets improve comparisons with data collected using the same protocols at different times and places—in some cases, even leading researchers to infer the absence of particular species from particular sites.  
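
The following Python sketch shows, under simplifying assumptions, why recording the protocol and per-sample counts matters. The event structure, function names and counts are hypothetical and invented for illustration. The sketch treats a taxon as possibly absent from a site only when it was recorded elsewhere under the same protocol, which is a much cruder rule than a real analysis would use.

```python
# Each sampling event records its site, its protocol and a count per taxon.
# All values are invented for illustration.
events = [
    {"site": "A", "protocol": "point-count",
     "counts": {"Parus major": 6, "Erithacus rubecula": 2}},
    {"site": "B", "protocol": "point-count",
     "counts": {"Parus major": 3}},
]


def relative_abundance(counts):
    """Share of each taxon among all individuals counted in one sample."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def inferred_absences(events, protocol):
    """Per site, taxa seen at other sites under the same protocol but not here."""
    comparable = [e for e in events if e["protocol"] == protocol]
    all_taxa = set().union(*(e["counts"] for e in comparable))
    return {e["site"]: sorted(all_taxa - set(e["counts"])) for e in comparable}


abundance_a = relative_abundance(events[0]["counts"])
absences = inferred_absences(events, "point-count")
```

With occurrence-only data, the missing robin at site B would simply be a missing record; knowing that both sites were sampled with the same protocol is what makes the gap informative.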
