Hi all,
Following up on the movebank-api-client release (https://github.com/mcb77/movebank-api-client) a couple of weeks ago, I've open-sourced a second piece of Movebank tooling that's been sitting in a private repo for years: movebank-mirror.
What it does: takes a Movebank account and downloads study metadata + event data to a local file/folder structure. Metadata is per-study JSON; events are per-(tag, sensor type) CSVs under a directory tree.
That's the whole pitch: a predictable on-disk shape you can read with whatever you already use (polars, pandas, R's 'readr', awk).
The engineering-interesting bits:
1. Catch-up + incremental update sync: For each (tag, sensor type) pair, the downloader runs in two modes. Catch-up does chunked historical downloads (50k records per chunk by default, cursor on 'timestamp'). Once caught up, the pair switches to update mode, polling for new rows via 'update_ts' with an exponential backoff ladder (5 min -> 15 min -> 1 h -> 6 h -> 24 h -> 7 d). Backoff resets on any pass with new rows. Goal: stay fresh without being a noisy neighbour to the live API.
2. Resumable: State per (tag, sensor type) lives in a small JSON file next to the CSVs. If you interrupt a catch-up and run the sync again next week, it picks up exactly where it left off.
3. Single-process lock: A '.lock' file in the mirror directory means it's safe to schedule via systemd / cron without worrying about concurrent runs trampling state.
4. License acceptance: Configurable via the same LicenseChecker interface movebank-api-client uses.
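To make the backoff ladder from item 1 concrete, here is a minimal Java sketch of how such a ladder could be modelled. It is not the movebank-mirror source: the class name BackoffLadder and the method afterPoll are hypothetical, and the exact semantics (an empty poll climbs one rung, a poll with new rows drops back to the first rung, the last rung is sticky) are assumptions; only the rung durations come from the description above.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical sketch of an update-mode backoff ladder; not the project's actual code. */
final class BackoffLadder {
    // Rungs as listed in the announcement: 5 min -> 15 min -> 1 h -> 6 h -> 24 h -> 7 d.
    static final List<Duration> RUNGS = List.of(
            Duration.ofMinutes(5),
            Duration.ofMinutes(15),
            Duration.ofHours(1),
            Duration.ofHours(6),
            Duration.ofHours(24),
            Duration.ofDays(7));

    // Index of the current rung; a fresh ladder starts at the shortest wait.
    private int rung = 0;

    /**
     * Records the outcome of one polling pass and returns how long to wait
     * before the next pass (assumed semantics, see lead-in).
     */
    Duration afterPoll(int newRows) {
        if (newRows > 0) {
            rung = 0;                       // any pass with new rows resets the backoff
        } else if (rung < RUNGS.size() - 1) {
            rung++;                         // empty pass: climb one rung, capped at 7 d
        }
        return RUNGS.get(rung);
    }
}
```

Under these assumptions a (tag, sensor type) pair that goes quiet is polled less and less often, down to once a week, and returns to 5-minute polling as soon as a pass finds new rows.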
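For item 3, one common way to get single-process behaviour on the JVM is an OS-level lock on a file via java.nio. The sketch below illustrates that general technique only; whether movebank-mirror implements its '.lock' file this way is an assumption, and the names MirrorLock and runExclusively are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of a single-process guard; not the project's actual code. */
final class MirrorLock {
    /**
     * Runs the given task only if no other run holds the lock on
     * mirrorDir/.lock. Returns false without running the task otherwise.
     */
    static boolean runExclusively(Path mirrorDir, Runnable sync) {
        try {
            Files.createDirectories(mirrorDir);
            Path lockFile = mirrorDir.resolve(".lock");
            try (FileChannel channel = FileChannel.open(
                    lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                FileLock lock;
                try {
                    // Non-blocking: null means another process holds the lock.
                    lock = channel.tryLock();
                } catch (OverlappingFileLockException e) {
                    // The lock is already held inside this same JVM.
                    return false;
                }
                if (lock == null) {
                    return false;
                }
                try {
                    sync.run();
                    return true;
                } finally {
                    lock.release();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a guard of this kind, a cron or systemd timer that fires while a previous run is still busy simply exits early instead of touching the state files.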
Distribution:
- CLI: ./gradlew :cli:installDist -> bin/movebank-mirror
- Library: de.firetail.compat.movebank:movebank-mirror:0.0.2 on Maven Central
- JDK 21, LGPL-2.1
- Repo: https://github.com/mcb77/movebank-mirror
Two notes on intent:
- This is a community tool, not an official Movebank product. Built independently against the public API.
- It's deliberately conservative about traffic: a 'movebank-mirror sync' run for a study with millions of events will spread requests across hours, not minutes. If you see a misbehaving instance, that's a bug - please file it.
If you have a setup where you can try it, I'd love use-case reports, especially for weird studies with unusual sensor types or license shapes, since those are the test cases I haven't found in my own data.
Cheers,
Matthias