Data privacy for at-risk species

Elizabeth Bondi and I were concerned by the assumption that geo-obfuscating the GPS locations in a dataset (i.e., adding random jitter within some range) would be sufficient to preserve location privacy, particularly when publishing camera trap data. We dug into it a bit and found that some very simple methods could massively reduce the search area, and we wrote it up for the AI for Animals workshop at CVPR this year (attached to this thread). I'd love to have a conversation about best practices for publishing data on at-risk species, and to brainstorm how we can still allow de-siloed, cross-organization research without putting sensitive species at risk.
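To illustrate why jitter alone can be weak, here is a minimal Python sketch of one simple attack. It assumes (hypothetically) that every record from the same camera site is jittered independently and uniformly within a disc of radius `max_offset_m`; under that assumption, just averaging the published coordinates shrinks the error roughly as 1/sqrt(N). This is chosen for illustration and is not necessarily one of the methods in the workshop paper; the site coordinates, the 5 km radius, and all function names are made up.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def jitter(lat, lon, max_offset_m, rng):
    """Displace a point uniformly within a disc of radius max_offset_m.

    Uses a local flat-earth approximation, which is fine for offsets of a few km.
    """
    r = max_offset_m * math.sqrt(rng.random())  # sqrt keeps density uniform over the disc area
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dlat = r * math.cos(theta) / EARTH_RADIUS_M
    dlon = r * math.sin(theta) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def centroid(points):
    """Plain average of lat/lon; adequate for a small area away from the antimeridian."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


if __name__ == "__main__":
    rng = random.Random(42)
    site = (-1.2921, 36.8219)  # hypothetical camera-trap location
    records = [jitter(*site, 5_000, rng) for _ in range(50)]
    estimate = centroid(records)
    print(f"error of one published record: {haversine_m(*site, *records[0]):.0f} m")
    print(f"error of the centroid of 50 records: {haversine_m(*site, *estimate):.0f} m")
```

Applying a single fixed offset per site would defeat this particular averaging trick, but other cues (habitat, roads, camera placement habits) can still narrow the search area.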




These topics are blowing my mind, and not in a good way! I just woke up with night sweats after watching Doug, Trishant, Laure and Koustubh's Tech Tutor talk, and now this, Sara!

I love open source and open access so much, but I am super concerned, as I am sure everyone else is, about how to handle these issues. It's all really tricky. Maybe there needs to be (and there probably already is in many cases) a kind of "chain-of-custody" approach to open data, so that any and all uses can be traced back to their sources? For example, a single, central repository for accessing data sets, where people have to provide a reasonable amount of personal/organisational information to gain access and agree not to share the data through any means other than the repository. Urgh, even writing these words is making me cringe!

I may never sleep again!
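A "chain-of-custody" log can be made tamper-evident by hash-chaining access records, so that each entry commits to everything recorded before it. Below is a minimal Python sketch, assuming a hypothetical `AccessLedger` kept by the central repository; the class, fields, and method names are illustrative and not an existing system.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of an entry."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AccessLedger:
    """Hypothetical append-only access log; each entry stores the hash of the previous one."""

    entries: list = field(default_factory=list)

    def record(self, dataset_id: str, requester: str, organisation: str, purpose: str) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "dataset_id": dataset_id,
            "requester": requester,
            "organisation": organisation,
            "purpose": purpose,
            "prev_hash": prev_hash,
        }
        entry_hash = _digest(body)
        self.entries.append({**body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; editing any entry, or deleting any but the last, breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

    def trace(self, dataset_id: str) -> list:
        """Trace every recorded use of a dataset back to its requester and organisation."""
        return [
            (e["requester"], e["organisation"], e["purpose"])
            for e in self.entries
            if e["dataset_id"] == dataset_id
        ]
```

This only records who accessed what and makes after-the-fact edits to the log detectable (publishing the latest hash now and then would also expose a truncated log). It cannot stop someone from re-sharing a dataset once downloaded, so it would still rely on the access agreement described above.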

Interesting question and kudos for questioning assumptions!

It seems clear that obfuscation only works to the extent that the cost of de-obfuscation is greater than the expected reward.
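One way to make that trade-off concrete (the symbols are just shorthand for this post): let C_de-obf be an adversary's cost to de-obfuscate a location, p the probability that doing so actually leads them to the animals, and V the value to them of finding the animals.

```latex
\text{obfuscation protects a site} \iff C_{\text{de-obf}} > \mathbb{E}[R] = p \cdot V
```

Cheap attacks lower C_de-obf, while high-value species raise V, so the same jitter radius can be adequate for one species and useless for another.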

For high-value, highly endangered species, a better approach might be to actually encrypt the data and establish a public key registry of trusted orgs and researchers. Only those whose public keys have been accepted into the chain of trust could decrypt the data.

This would be a semi-decentralized solution, based on the chain of trust.
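An encrypted-registry scheme like this is commonly built as hybrid ("envelope") encryption: the dataset is encrypted once with a random symmetric key, and that key is then encrypted separately for each trusted member's public key. Below is a minimal sketch using the third-party Python `cryptography` package (RSA-OAEP wrapping an AES-256-GCM key). The registry here is just a dict of already-vetted public keys, the function and member names are hypothetical, and how keys get vetted into the chain of trust (signing, revocation) is not shown.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# RSA-OAEP with SHA-256, used only to wrap the small symmetric data key.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_for_registry(plaintext: bytes, registry: dict) -> dict:
    """Encrypt once with AES-256-GCM, then wrap the data key for every trusted member."""
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce; must never be reused with the same key
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_keys = {
        member: public_key.encrypt(data_key, OAEP) for member, public_key in registry.items()
    }
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_keys": wrapped_keys}


def decrypt_as(member: str, private_key, package: dict) -> bytes:
    """Unwrap the data key with the member's private key, then decrypt the payload."""
    if member not in package["wrapped_keys"]:
        raise PermissionError(f"{member} is not in the trusted registry for this package")
    data_key = private_key.decrypt(package["wrapped_keys"][member], OAEP)
    return AESGCM(data_key).decrypt(package["nonce"], package["ciphertext"], None)


if __name__ == "__main__":
    # Hypothetical members; a real registry would hold vetted public keys only.
    keys = {
        name: rsa.generate_private_key(public_exponent=65537, key_size=2048)
        for name in ("org_a", "researcher_b")
    }
    registry = {name: key.public_key() for name, key in keys.items()}
    package = encrypt_for_registry(b"hypothetical camera-trap records", registry)
    print(decrypt_as("org_a", keys["org_a"], package).decode())
```

Encrypting the payload once keeps its size independent of the number of members; only the small wrapped keys grow with the registry. One limitation: removing someone from the registry only protects data encrypted after their removal, not copies they have already decrypted.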

 

If you haven't read them, these two papers propose decision-making frameworks for sensitive animal occurrence data:

Tulloch AIT, Auerbach N, Avery-Gomm S, Bayraktarov E, Butt N, Dickman CR, Ehmke G, Fisher DO, Grantham H, Holden MH, et al. 2018. A decision tree for assessing the risks and benefits of publishing biodiversity data. Nature Ecol. Evol. 2: 1209-1217. https://doi.org/10.1038/s41559-018-0608-1

Lennox RJ, Harcourt R, Bennett JR, Davies A, Ford AT, Frey RM, Hayward MW, Hussey NE, Iverson SJ, Kays R. 2020. A novel framework to protect animal data in a world of ecosurveillance. BioScience 70(6): 468–476. https://doi.org/10.1093/biosci/biaa035

The Research Data Alliance has two relevant interest groups:

Sensitive Data IG: This is just getting started.

Data Policy Standardization and Implementation IG: For those publishing results, this group is working toward more consistent data-access policies. The paper below, which resulted from this group's work, gives some general suggestions for defining exceptions to open data policies, including exceptions for sensitive species data:

Hrynaszkiewicz I, Simons N, Hussain A, Grant R, Goudie S. 2020. Developing a research data policy framework for all journals and publishers. Data Science Journal 19(1): 5. http://doi.org/10.5334/dsj-2020-005

If others know of relevant resources please share!