It’s the UN Decade on Ecosystem Restoration (2021-2030), and the pressure is on. In Latin America and the Caribbean, for example, 18 countries have joined Initiative 20x20, with the goal of protecting and beginning to restore 50 million hectares by 2030. And in Africa, 32 countries have committed to start regenerating 128 million hectares through AFR100.

Reaching those ambitious national commitments will require governments, from national to municipal, to craft public policies that help local communities revitalize their land. These policy incentive instruments — ranging from direct subsidies and tax reductions for farmers to technical support — are among the most effective tools to bring prosperity back to agricultural lands (and stop the expansion of farms into natural ecosystems like forests and wetlands).

But for many policymakers, it is difficult to grasp the breadth of agricultural, environmental and financial incentives that national and subnational governments have passed to encourage restoration. Retrieving useful information from the archives is a time- and labor-intensive task that requires analysts to read through thousands of policy documents packed with dense information and formatted in a variety of ways. During this laborious manual process, analysts are likely to miss key details.

Machine Learning for Policy Analysis

To speed up policy analysis (and restoration progress), WRI and data scientists from Omdena and Solve for Good conducted a study that uses a technique called natural language processing (NLP). A branch of artificial intelligence, NLP converts unstructured text documents into numerical matrices that machine learning models (including deep learning models) can use to filter and categorize information. In this case, it can help policymakers find which documents are useful for their work.
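To make the "text into matrices" idea concrete, here is a minimal sketch in plain Python: each document becomes a vector of word counts over a shared vocabulary, and a new document is labeled by its most similar labeled neighbor. The toy snippets, labels and similarity rule are illustrative assumptions, not the project's actual model.

```python
from collections import Counter
import math

def vectorize(text, vocab):
    """Turn a document into a term-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy corpus: two labeled policy snippets and one unlabeled query.
labeled = [
    ("the ministry grants a direct subsidy to farmers who restore degraded land", "incentive"),
    ("this decree defines the administrative boundaries of the protected area", "no incentive"),
]
query = "a tax reduction is granted to landholders who plant native trees"

# A shared vocabulary maps every document into the same matrix space.
vocab = sorted({w for doc, _ in labeled for w in doc.lower().split()}
               | set(query.lower().split()))

# Label the new document by its nearest labeled neighbor.
scores = [(cosine(vectorize(query, vocab), vectorize(doc, vocab)), label)
          for doc, label in labeled]
best = max(scores)[1]
print(best)
```

Real systems replace the raw counts with weighted features (such as TF-IDF) and the nearest-neighbor rule with a trained classifier, but the matrix representation is the same idea.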

Our “automatic assistant” synthesized policy incentive information from Chile, El Salvador, Guatemala, India, Peru and the United States by tackling four main tasks:

  1. Collecting all forest restoration policy documents from countries’ legislation websites using web-scraping.
  2. Separating restoration policy documents containing incentive information from those without incentive information.
  3. Placing the policy documents into six subcategories (fines, technical support, supplies, direct payments, credits, and tax reductions).
  4. Identifying which jurisdictions issued which policies and at what times.
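As a rough illustration of tasks 3 and 4, a keyword- and pattern-based sketch in Python follows. The keyword lists and the year regex are illustrative assumptions for this example; the actual project learned these distinctions from expert-labeled data rather than fixed rules.

```python
import re

# Illustrative keyword lists for the six incentive subcategories
# (assumed for this sketch, not the project's actual rules).
CATEGORY_KEYWORDS = {
    "fines": ["fine", "penalty", "sanction"],
    "technical support": ["technical assistance", "training", "extension"],
    "supplies": ["seedlings", "equipment", "inputs"],
    "direct payments": ["payment", "subsidy", "grant"],
    "credits": ["credit", "loan", "financing"],
    "tax reductions": ["tax reduction", "tax exemption", "deduction"],
}

def categorize(text):
    """Return every subcategory whose keywords appear in the document."""
    lowered = text.lower()
    return [cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(w in lowered for w in words)]

def extract_year(text):
    """Pull a four-digit year out of the text, if one is present."""
    match = re.search(r"\b(19|20)\d{2}\b", text)
    return int(match.group()) if match else None

# A made-up policy snippet, not a real decree.
doc = "A 2019 decree establishes a subsidy and a tax exemption for reforested land."
print(categorize(doc))   # matches both "direct payments" and "tax reductions"
print(extract_year(doc))
```

One document can carry several incentive types at once, which is why the categorizer returns a list rather than a single label.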

Leaning on the knowledge of policy experts fluent in English and Spanish, the team trained the model so well that it placed incentives into the six categories identified above almost as accurately as the policy experts themselves.

They created a database for all six countries containing the policy titles, links to their full text, and effective timelines. The model also pulled out all sentences mentioning relevant information, categorizing each policy according to its type of incentive and issuing institution. This type of analysis can shorten the policy analysis and summary process from months to minutes.

The program can also work like a search engine: When a policymaker wants to see, for example, how many restoration policies Chile issued in 2019 that directly paid landholders to reward their restoration work, they can enter those criteria and receive results almost instantly.
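This search-engine use case amounts to filtering structured records. A minimal sketch in Python, with made-up rows whose fields mirror the kind of metadata the article describes (the titles and URLs are placeholders, not real policies):

```python
# Illustrative database rows: titles and URLs are placeholders.
policies = [
    {"country": "Chile", "year": 2019, "incentive": "direct payments",
     "title": "Hypothetical restoration payment decree", "url": "https://example.org/1"},
    {"country": "Chile", "year": 2019, "incentive": "tax reductions",
     "title": "Hypothetical reforestation tax relief", "url": "https://example.org/2"},
    {"country": "Peru", "year": 2018, "incentive": "direct payments",
     "title": "Hypothetical landholder payment law", "url": "https://example.org/3"},
]

def search(records, **criteria):
    """Return every record whose fields match all of the given criteria."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# The example query from the text: Chilean direct-payment policies from 2019.
results = search(policies, country="Chile", year=2019, incentive="direct payments")
print(len(results))
```

In practice the records would live in a real database and the query would run over thousands of rows, but the matching logic is the same.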

The analysis is of immediate use for policymakers who are joining programs like the Restoration Policy Accelerator, and WRI and its partners have used this technique to study policies in Chile and Guatemala. By encouraging more policymakers to contribute labeled training data (making the models more accurate), we can improve these techniques even further. A similar approach could easily be applied to a wide range of environmental policy analyses, too, by simply replacing the training datasets. The era of natural language processing in policy analysis is only beginning.

This work is funded through the Climate Solutions Partnership, a consortium of HSBC, WRI and World Wide Fund for Nature (WWF).