This essay is the second in a series on the promise and challenges of using AI and machine learning to create a planetary environmental management system.
When I think about a global environmental management system, my mind immediately goes to environmental monitoring. As the old adage goes, you can’t manage what you can’t measure, so any management system needs ways to track what is being managed.
Too often, however, the conversation around clear and informative monitoring slides into assertions about how the act of monitoring and providing data itself will lead to action. The theory goes that illegal logging persists in part because it is difficult to know where and when it is occurring. Similarly, overfishing occurs because the authorities mandated with enforcing fishing regulations, and watchdog groups who would report illegal activity, can’t quickly see where and when those regulations are being violated.
Many data portals refer to this public availability of environmental monitoring data as “transparency.” For instance, Global Forest Watch’s “How does GFW make change?” section states: “greater transparency helps the public hold governments and companies accountable for how their decisions impact forests.” Similarly, Global Fishing Watch’s website opens with “Global Fishing Watch is promoting ocean sustainability through greater transparency.”
The datasets underpinning these portals and others, like Nature Map Earth and the UN Biodiversity Lab, or the United Nations Food and Agriculture Organization’s Hand in Hand initiative, aim to provide critical information we need to understand our environment. However, when it comes to environmental management, datasets and the monitoring that produces them — as necessary as they are — are not sufficient to drive action. A large gap sits between data availability and use.
What I Mean by ‘Transparency’ and Why It Matters
Let’s start with definitions, because when talking about environmental data, transparency can mean many things.
Here I am using the general term “transparency” to mean “shining light on metrics that have previously been hidden or hard to see.” Each of the data portals mentioned above, and the datasets they comprise, make previously hidden changes to our environment publicly visible. I call this “data transparency” — using data to shed light in a public way. Transparency can also apply to the processes used to generate those data and the mechanisms through which the data are shared. This kind of transparency, which I’ll call “process transparency,” can be achieved through publishing algorithms and workflows, or even better, making the code that implements them available, well documented, easy to modify, and open-source. There’s also “action transparency,” which shines light on what actions are taken by whom. That kind of transparency will be discussed in future posts.
Finally, when I’m discussing transparency here, I mean transparency to the broadest possible degree. Perfect universal transparency is not possible because not everyone on the Earth has the same access to information. Transparency can be seen as a measure of how many people have access to information, and maximizing transparency maximizes that number.
Before launching into the limitations and problems with transparency, let me also explain why it’s necessary for a global environmental management system.
First, any such system will rely on innumerable data providers and will fail unless those providers and the people using their data can trust that the data are accurate (within a stated degree of accuracy). This means that the data need to be viewable, and that the processes used to create those data are verifiable — hence, data and process transparency are critical to the system’s credibility.
Given the number of data contributors, multiple datasets will be created with similar goals. For instance, biodiversity health can be measured in many ways, including with the World Wildlife Fund/Zoological Society of London Living Planet Index, the Biodiversity Intactness Index, or Nature Map’s species richness data. These datasets need to be comparable in some way, and being transparent about what the data are and how they were produced is critical for making those comparisons. Acknowledging each dataset’s assumptions and shortfalls matters as well: global indexes usually fail to represent the heterogeneity of biodiversity distribution, and their coarse results need to be well communicated to allow proper interpretation.
Related to the prior two points, a global system should permit a broad community of collaborators. Organizations and communities wanting to incorporate their own data, or use data from the global system, will need to understand how to interoperate with the system.
Next, any management system needs to track changes over time. Making the data available, and the methodologies for creating them clear, helps ensure that long-term continual reporting of the metrics is possible, provided the means for collecting the inputs used to create the datasets remain available.
Finally, and most basically, if the datasets are not widely available, they will not be usable and cannot help with a global management system that supports the broadest possible audience.
Transparent environmental data have provided clear benefits. GFW, for example, reports that its platform has been used by the governments of Mexico, Indonesia, and the Democratic Republic of the Congo to inform policy, and by governments including Ghana and Indonesia to bolster law enforcement. Advocacy organizations have also used GFW to spur enforcement action against illegal logging.
Data Transparency Isn’t Always Possible or Good
But despite Stewart Brand’s famous dictum that “information wants to be free,” transparency about data isn’t always perceived as beneficial, or even legally possible:
- In many cases, sharing information is illegal or restricted in specific ways, such as in Europe’s General Data Protection Regulation. Sharing geospatial information in particular is limited by many governments (see, for example, the geospatial data policies of India and Malaysia).
- Sometimes laws mandating data sharing can be detrimental to positive outcomes, such as the recently overturned rule in the United States insisting that only transparent data could be used by the EPA in its assessments.
- Corporations and sovereign nations are also cautious about sharing data on natural resources that might be exploited by other corporations or countries, as in Brazil’s restrictions on sharing biological data. In fact, there is a large data repatriation movement, with countries and communities attempting to wrest their data back. The debate about the costs and benefits of transparency can be seen in discussions comparing the FAIR principles (data should be findable, accessible, interoperable, and reusable) and the CARE principles for Indigenous Data Governance (collective benefit, authority to control, responsibility, and ethics). More on these two sets of data principles will come in future posts.
- Sharing data on the locations of valuable resources such as minerals, high value timber, or endangered wildlife can open them up to more threats.
Any platform that provides data transparency needs to take into account the risks of data sharing and provide means to mitigate those risks. Limits to transparency can often be overcome through several mechanisms. First, data can be anonymized, and there is extensive literature and many tools available for how to anonymize data (and why it’s not as easy as you might think). Another tool is data aggregation, where summarized data — rather than raw data — are shared. Finally, as is sometimes the case for vulnerable species, data can be embargoed — that is, only released after time has mitigated data sharing risks.
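To make the aggregation mechanism concrete, here is a minimal Python sketch, with hypothetical grid size and coordinates, that converts sensitive point observations into coarse grid-cell counts so raw locations are never shared:

```python
from collections import Counter

def aggregate_observations(points, cell_size_deg=0.5):
    """Summarize sensitive point locations (lat, lon) as counts per
    coarse grid cell, so exact coordinates are never shared."""
    counts = Counter()
    for lat, lon in points:
        # Snap each point to the lower-left corner of its grid cell.
        cell = (cell_size_deg * (lat // cell_size_deg),
                cell_size_deg * (lon // cell_size_deg))
        counts[cell] += 1
    return dict(counts)

# Hypothetical example: three sightings of a vulnerable species.
sightings = [(-15.21, -56.84), (-15.33, -56.91), (-15.92, -57.10)]
summary = aggregate_observations(sightings, cell_size_deg=0.5)
# Each key is a 0.5-degree cell; the shared summary reveals presence
# per cell but not the precise locations of individual animals.
```

The cell size becomes a tunable privacy parameter: coarser cells hide more, at the cost of less useful data.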
Even When It’s Possible, Transparency Is Not Enough
Challenging and important as it is, providing transparent data is generally just the first step to achieving positive environmental outcomes. Data does not lead directly to action. There are many reasons for this, including translation issues, information overload, analysis paralysis, lack of resources, failure to take local contexts into account, and in some cases, corruption. Below are some of the more common obstacles to turning data transparency alone into better environmental outcomes:
Raw data is often useless for decision makers. The first gap between data and action arises from the raw forms in which the data come. In many cases, decision makers who could effect positive environmental change can’t use data in its raw form. For example, corporations trying to de-risk their supply chains simply want to know which of their suppliers are harvesting commodities unsustainably. A global map of deforestation pixels does not give them that information. (The GFW team recognized this and created Global Forest Watch Pro to address exactly this issue.)
Sometimes information needs to come in a specific form, such as police reports for law enforcement or disclosure documents for banks. The Brazilian organization Terras takes a step in that direction, creating software that uses big-data and machine-learning-driven environmental analysis to help rural property owners determine whether their properties comply with the social-environmental responsibility criteria of lending organizations. But data collected using big data and machine learning are frequently inadmissible in court: transparency can be rendered ineffectual by mismatched regulation.
Data often highlight too many options and prioritize none. A second problem with transparency arises in situations where there are simply too many options highlighted by the data. In the context of illegal forest loss, satellite imagery can show so many places where loss is occurring that it’s difficult to know how to address the problem. In these cases, transparency needs to be followed up by prioritization, which provides an opportunity for further transparency — specifically, transparency in how priorities are set.
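As a sketch of what that prioritization step might look like, here is a toy Python example; the fields, weights, and scoring rule are illustrative assumptions, not any platform’s actual scheme. Publishing the rule itself keeps the prioritization transparent too:

```python
def prioritize_alerts(alerts):
    """Rank deforestation alerts by a simple, published score:
    larger losses rank higher, and losses inside protected areas
    are weighted double."""
    def score(alert):
        protected_bonus = 2.0 if alert["in_protected_area"] else 1.0
        return alert["hectares_lost"] * protected_bonus
    return sorted(alerts, key=score, reverse=True)

# Hypothetical alerts from a satellite-based monitoring feed.
alerts = [
    {"id": "A1", "hectares_lost": 40.0, "in_protected_area": False},
    {"id": "A2", "hectares_lost": 25.0, "in_protected_area": True},
    {"id": "A3", "hectares_lost": 60.0, "in_protected_area": True},
]
ranked = prioritize_alerts(alerts)
# Scores: A3 = 120, A2 = 50, A1 = 40, so enforcement effort
# is directed to A3 first.
```

Even a rule this simple turns an overwhelming map of alerts into an ordered work queue, and the scoring function is itself open to scrutiny and debate.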
Data needs to be processed into meaning. The next challenges with transparency flow from creating meaning from data. There’s a value chain moving from data to information to insight to action — and moving along that chain requires knowing the types of processing available and the means to perform that processing. Further transparency can be applied at each step. And even those with that knowledge might still lack the resources to access and process data into actionable information — human resources to do the work, computational resources to process the information in reasonable time, or financial resources to pay for people and infrastructure.
There is commonly a mismatch between the scale of the data and the scale at which insight is needed. In conservation and Earth observation, machine learning and artificial intelligence are most frequently applied to coarse data and over broad areas. Machine learning produces the best results when fed large amounts of data, and freely available satellite imagery is a perfect source for that data firehose. However, global studies at the resolution of freely available data frequently produce results that don’t meet needs in more localized contexts. One example of this is the excellent surface water availability dataset generated by the European Commission Joint Research Centre. The dataset does a wonderful job of presenting a global perspective of surface water availability over a long timeframe. However, being a global map, it can fail to meet local-scale needs. For instance, it has been shown to under-represent surface water in Brazil’s Pantanal. To be useful in local contexts, a system needs additional information, the ability to integrate local and global data, and the ability to fit algorithms to local conditions.
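The scale mismatch can be illustrated with a toy example (invented data, not the JRC product): coarsening a fine-resolution water mask with majority voting makes a narrow river disappear entirely at the coarse scale.

```python
def downsample_majority(grid, factor):
    """Coarsen a binary grid by `factor`, marking a coarse cell as
    water (1) only when water pixels form a strict majority of the
    fine pixels inside it."""
    n = len(grid)
    coarse = []
    for i in range(0, n, factor):
        row = []
        for j in range(0, n, factor):
            block = [grid[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(1 if sum(block) > len(block) // 2 else 0)
        coarse.append(row)
    return coarse

# 4x4 fine-resolution mask: a one-pixel-wide river down column 1.
fine = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
coarse = downsample_majority(fine, factor=2)
# Every 2x2 block contains only 2 of 4 water pixels, so no block
# passes the majority test: the river vanishes from the coarse map.
```

The same effect, at much larger scale, is how linear features like rivers and narrow wetlands can be under-represented in global-resolution products.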
Data has a hard time beating intentional ignorance and corruption…for now. Finally, let’s face it: All the data in the world won’t help when the people making decisions ignore it or intentionally misconstrue it. However, transparency in how decisions are made can help with this. And investigations into using big data and machine learning to detect corruption are starting to get traction.
Bottom line: Transparency is foundational for a global environmental management system. But it’s in no way sufficient for success, or easy to achieve.
From Transparency to Explainability
In the above, I talked about transparency in data and transparency in processes. All of the datasets I discussed used artificial intelligence to extract information from numerous sensors, primarily Earth observation satellites. Although the platforms I described do a good job of making the data publicly available, the processes used to generate those data can be opaque. Indeed, the lack of transparency in artificial intelligence solutions has been one of AI’s most common pitfalls.
The earliest AI solutions in health care, for example, were in part rejected because although the systems made correct predictions, they had difficulty justifying their decisions. In effect, this lack of understanding reflects a lack of transparency — a failure to explain why an AI system behaves the way it does.
In fact, there are two types of transparency in AI systems: interpretability and explainability. A highly interpretable AI system behaves in a humanly understandable way: given a set of inputs, humans can interpret how the system came to reach a given output. Explainability goes deeper: an explainable AI system is one whose internal mechanisms can be elucidated.
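As a toy illustration of the distinction (the thresholds, inputs, and labels are all invented): a simple rule-based classifier is interpretable because a human can trace inputs to output, and it can be made explainable by having it report which rule fired. A deep neural network offers neither property by default.

```python
def classify_deforestation_risk(tree_cover_loss_pct, fire_alerts):
    """A fully interpretable rule set: each branch is a rule a human
    can read and verify. Returning the reason alongside the label is
    a crude form of explainability."""
    if tree_cover_loss_pct > 10 and fire_alerts > 5:
        return "high", "loss above 10% combined with frequent fire alerts"
    if tree_cover_loss_pct > 10:
        return "medium", "loss above 10% but few fire alerts"
    return "low", "loss at or below 10%"

risk, reason = classify_deforestation_risk(12.5, 8)
# The model hands back its own account of the decision, which a
# human reviewer (or a skeptical stakeholder) can check directly.
```

Real AI systems are rarely this simple, but the trade-off the sketch embodies is the one discussed above: the more opaque the model, the more work is needed to earn the trust an environmental management system depends on.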
Understanding these topics in the context of an environmental management system is important because unless people trust the system, it will not be accepted. In the next post I will go into much more detail about the explainability and interpretability of AI systems, and how important these will be for a global environmental management system.
Thanks to my esteemed reviewers, commenters, and contributors: Adia Bey, Aurélie Shapiro, Azalea Kamellia, Bob Lalasz, Dan Morris, Debora Pignatari Drucker, Diana Anthony, Erik Lindquist, Gregoire Dubois, Holly Grimm, Karen Bakker, Karin Tuxen-Bettman, Johanna Prüssmann, Mayra Milkovic, Nasser Olwero, Nicholas Clinton, Sophie Galloway, Tanya Birch, and Tyler Erickson.
Read the previous essay in the series: G.AI.A: Artificial Intelligence and Planetary-Scale Environmental Management