The goal of my analysis is to assess bird diversity in a region of Panama. I have compiled several different species lists:
- Birds an ornithologist I work with has documented during point counts in the region (165 species)
- A list of birds on eBird from the region (510 species)
- A list of birds from GBIF from the region, with some cleaning (removing synonyms) (561 species)
- A list of all birds recorded on iNaturalist in the country of Panama (945 species)
I understand there are tradeoffs in choosing a species list: too broad and the AI might identify birds unlikely to be in the region (so the full country list for Panama would likely be overkill and decrease data quality). On the other hand, if I limit myself to the birds observed by the ornithologist, I could miss some rarer or cryptic species.
What are your thoughts on how I should go about producing a species list to narrow down the ID possibilities for the AI? Do you have experience with syncing taxonomy with BirdNET Analyzer (for example, iNat tends to be more current than GBIF in terms of updated species names)?
3 March 2026 2:09pm
Hi Hubert,
I’m based in eastern South Africa and we have come across the same questions when setting up BirdNET workflows.
We typically use a custom species list in BirdNET Analyzer together with a relatively high minimum confidence threshold if we want to reduce out-of-range predictions, especially species that may appear in eBird but are unlikely in our study area. Before using that list, we download regional occurrence data and cross-reference it with other sources. Locally we use SABAP2 and our point count data (where available) as a baseline to build a realistic candidate pool.
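As a rough illustration of the cross-referencing step, the sketch below keeps a species if it appears in at least a chosen number of sources, and always keeps everything from a trusted baseline such as local point counts. The function names, the `min_sources` rule, and the example records are assumptions made for illustration and are not part of BirdNET Analyzer; the resulting names would likely still need to be mapped to the labels the model expects.

```python
from collections import Counter


def normalize(name: str) -> str:
    """Collapse stray whitespace and case differences in a scientific name."""
    return " ".join(name.split()).capitalize()


def build_candidate_list(sources, min_sources=2, always_include=()):
    """Return species seen in >= min_sources lists, plus all baseline species.

    sources: dict mapping a source label to an iterable of scientific names.
    always_include: source labels whose species are kept unconditionally.
    """
    counts = Counter()
    for names in sources.values():
        # Use a set so duplicates within one source count only once.
        counts.update({normalize(n) for n in names})

    keep = {sp for sp, c in counts.items() if c >= min_sources}
    for label in always_include:
        keep |= {normalize(n) for n in sources[label]}
    return sorted(keep)


# Illustrative records only, not real occurrence data.
sources = {
    "point_counts": ["Ramphastos sulfuratus"],
    "regional_atlas": ["Ramphastos sulfuratus", "Harpia harpyja"],
    "occurrence_db": ["harpia harpyja ", "Momotus subrufescens"],
    "country_list": ["Momotus subrufescens", "Passer domesticus"],
}
candidates = build_candidate_list(
    sources, min_sources=2, always_include=["point_counts"]
)
print(candidates)
```

Raising `min_sources` tightens the pool, while adding a source to `always_include` protects rare species that only one trusted dataset has recorded.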
That said, being too strict with the species list or the confidence threshold can suppress detections. As you mention, this is particularly true for rare or locally uncommon species, which often have fewer training examples and therefore lower confidence scores. So when starting with a new dataset we often begin with a species list and see how predictions are distributed before tightening the filter from there.
This process also helps us identify where the default BirdNET model may be underperforming and where custom classifiers or additional local training data might be needed.
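A first exploratory pass of this kind could be summarised with something like the sketch below, which groups detections by species and reports how many there are and how their confidence scores are distributed. It assumes the detections have already been loaded as `(species, confidence)` pairs; the function name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def summarize_confidence(detections):
    """Summarise confidence scores per species.

    detections: iterable of (species, confidence) pairs.
    Returns a dict: species -> {"n", "median", "max"}.
    """
    by_species = defaultdict(list)
    for species, confidence in detections:
        by_species[species].append(confidence)

    return {
        species: {
            "n": len(scores),
            "median": median(scores),
            "max": max(scores),
        }
        for species, scores in by_species.items()
    }


# Illustrative values only.
detections = [
    ("Ramphastos sulfuratus", 0.91),
    ("Ramphastos sulfuratus", 0.62),
    ("Ramphastos sulfuratus", 0.78),
    ("Harpia harpyja", 0.12),
    ("Harpia harpyja", 0.31),
    ("Harpia harpyja", 0.18),
]
summary = summarize_confidence(detections)
for species, stats in sorted(summary.items()):
    print(species, stats)
```

Species whose scores never rise above a low maximum are candidates for manual checking before deciding whether a stricter filter would remove real detections.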
4 March 2026 7:03pm
Hi Donovan,
Thank you for that, very helpful.
Some followup questions:
- Do you set the high minimum confidence threshold in the app, or apply it to the data afterwards? Other than saving processing time I don't see an advantage to doing it in the app, or am I mistaken?
- What are you looking for after you run through the data once with one species list? Do you have a manual validation step to set confidence thresholds per species?
- What do you do about sensitivity and overlap? It seems an overlap of 2 is good anywhere, but sensitivity is more nuanced: https://onlinelibrary.wiley.com/doi/10.1111/ibi.70013
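To make the "afterwards" option in the first question concrete, here is a minimal sketch of filtering after the run: the analysis is done once with a permissive threshold, and a default cutoff plus optional per-species overrides are applied to the results. The function name, the threshold values, and the example detections are assumptions for illustration; per-species values would normally come from a manual validation step.

```python
def filter_detections(detections, default_threshold=0.5, per_species=None):
    """Keep detections at or above their species' confidence threshold.

    detections: iterable of (species, confidence) pairs.
    per_species: optional dict of species -> threshold overriding the default.
    """
    per_species = per_species or {}
    return [
        (species, confidence)
        for species, confidence in detections
        if confidence >= per_species.get(species, default_threshold)
    ]


# Illustrative values only.
detections = [
    ("Ramphastos sulfuratus", 0.42),
    ("Ramphastos sulfuratus", 0.81),
    ("Harpia harpyja", 0.35),
    ("Harpia harpyja", 0.15),
]
# Hypothetical lower cutoff for a rare species with few training examples.
kept = filter_detections(
    detections, default_threshold=0.5, per_species={"Harpia harpyja": 0.3}
)
print(kept)
```

Because the raw scores are retained, thresholds can be revised later without re-running the analysis.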
Donovan Tye