Editor's Note: This blog post was originally posted in The Guardian.

Global Forest Watch uses data to monitor changes to the Earth’s forests. What can other climate initiatives gain from the project?

Forests provide benefits few of us appreciate. They store carbon, mitigate the impacts of climate change, and preserve biodiversity and ecosystem stability. They provide resources we all use, and more than a billion people around the world depend on forests directly for their livelihoods.

But forests are under more pressure today than ever. Between 2000 and 2012, the world lost a net 1.5 million square kilometres of tree cover, an area roughly the size of Mongolia. The clearing and burning of forests is responsible for between 12% and 20% of greenhouse gas emissions. In response, a major declaration on forests was signed at the UN climate summit in September, committing to end global deforestation by 2030. But one major barrier to curbing the destruction of forests around the world remains: the lack of reliable data that tells us precisely when and where it’s happening.

To fill this data gap, we created Global Forest Watch (GFW), an online platform that combines hundreds of thousands of satellite images, high-tech data processing and crowd-sourcing to provide near-real-time data on the world’s forests. Our goal is to enable governments, companies, NGOs, and the public to better manage forests, track illegal deforestation and more.

But big data comes with big challenges. From the start, GFW grappled with a lack of public data, barriers to participation, and confusion over terminology. In our experience, these challenges are common to data-driven initiatives that aim to enable public use of big data. So as we push forward with Global Forest Watch, we thought we would share a few lessons that might help other big data initiatives seeking to tackle climate change.

Building Support for Open Data

It can be difficult to manage the process of opening up previously exclusive data—like the locations of concessions for logging, agriculture, and mining—for public use. Global Forest Watch compares this data with satellite-detected tree cover loss to determine where harmful or illegal activities might be taking place. Governments may release official deforestation statistics, but not simple ways for the public to verify these numbers. And without a history of public data-sharing, even virtuous countries, companies, and researchers may be reluctant to share their information, for fear of losing control over how it would be used.

Global Forest Watch data at work in an area of southwest Brazil.

Despite these concerns, many groups have embraced open and transparent data recently. In June, the Roundtable on Sustainable Palm Oil released via GFW the first detailed public maps of their certified concessions, which Global Forest Watch now uses in its analyses. As the open data movement slowly gains trust and traction, big data tools like GFW will have more and more material to work with.

Reaching Out to Those Who Know the Forests

Satellite images and data processing techniques can only do part of the work. Global Forest Watch was built to allow users to contribute their own data to provide local context, such as maps of protected areas, concessions, or land ownership, or short stories explaining why forests were lost, regrown, or conserved in a particular area. However, despite hundreds of thousands of visitors to the GFW website, relatively few have submitted their own data or content. So we have learned that outreach on the ground is indispensable, and we are now showcasing GFW for governments, local communities, and businesses around the world.

We have also been looking for ways to better engage users online. We are now working with TomNod, a crowdsourcing platform that is part of Digital Globe, a GFW partner. TomNod uses ultra-high-resolution satellite imagery to identify areas in Indonesia where forests and other sensitive ecosystems have been cleared by fire, whether for agriculture or because of land conflict. These detailed satellite pictures have prompted almost 300,000 “tags” on the images, identifying over 24,000 active fires. This data will be posted online for the public and law enforcement officials.

A Forest by Any Other Name

Providing data on forests for both technical experts and the broad public requires caution in defining key terms and describing exactly what the data show. The term “forest” is particularly fraught: countries and experts define forest by different thresholds for canopy cover, and some include “plantation forests” in their definition while others exclude them. “Deforestation” is even more confusing, with over 800 competing definitions. Satellites tend to be agnostic to such definitional questions and, without extensive additional analysis, measure only tree cover loss or gain, showing where trees were but no longer are, or vice versa. Without consistent definitions, there is an increased risk that the data on GFW may be misinterpreted, or dismissed as not relevant. Clarifying the data and responding to critiques and inquiries about it has therefore become a major priority for the project, along with continually improving our response to such feedback. It has also inspired new research efforts, including an extensive project to map the extent of degradation in the world’s pristine intact forests, and an initiative to map plantation forests in key countries across the world. So while finding definitions that everyone can agree on may not always be possible, we discovered that there is much that can be done to provide options for those with different priorities.

As we see it, these challenges are not unique to the development of GFW but are hurdles that anyone working in the sphere of big data for the environment needs to tackle. The big lesson is that the big data revolution is under way, and we can all play a role by demanding transparency, contributing our efforts and feedback to science-based platforms where we can, and supporting efforts that confront climate change with timely and accurate data.