discussion / Software Development / 25 May 2026

Open-sourcing the movebank-mirror tool

Hi all,

Following up on the movebank-api-client release (https://github.com/mcb77/movebank-api-client) a couple of weeks ago, I've open-sourced a second piece of Movebank tooling that's been sitting in a private repo for years: movebank-mirror.

What it does: it takes a Movebank account and downloads study metadata and event data into a local directory structure. Metadata is stored as one JSON file per study. Events are stored as one CSV per (tag, sensor type) pair, arranged in a directory tree.

That's the whole pitch: a predictable on-disk layout you can read with whatever you already use (polars, pandas, R's 'readr', awk).
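To make the "predictable on-disk shape" concrete, here is a minimal Java sketch of how a reader could compute paths into such a mirror. The post only says that metadata is one JSON file per study and events are one CSV per (tag, sensor type). The directory name 'events', the file names 'study.json' and '<sensorType>.csv', and the MirrorLayout class are hypothetical, not the tool's actual layout.

```java
import java.nio.file.Path;

/**
 * Hypothetical sketch of a mirror layout: one JSON file per study, one CSV per
 * (tag, sensor type) pair. Directory and file names are illustrative only.
 */
public final class MirrorLayout {
    private final Path root;

    public MirrorLayout(Path root) {
        this.root = root;
    }

    /** e.g. root/42/study.json */
    public Path studyMetadata(long studyId) {
        return studyDir(studyId).resolve("study.json");
    }

    /** e.g. root/42/events/collar_07/gps.csv */
    public Path eventsCsv(long studyId, String tagId, String sensorType) {
        return studyDir(studyId)
                .resolve("events")
                .resolve(safe(tagId))
                .resolve(safe(sensorType) + ".csv");
    }

    private Path studyDir(long studyId) {
        return root.resolve(Long.toString(studyId));
    }

    /** Tag names are free text: replace path-unsafe characters and neutralise "." / "..". */
    private static String safe(String name) {
        String s = name.replaceAll("[^A-Za-z0-9._-]", "_");
        // "." and ".." are special path segments, so prefix them (and empty names).
        return (s.isEmpty() || s.chars().allMatch(c -> c == '.')) ? "_" + s : s;
    }

    public static void main(String[] args) {
        MirrorLayout layout = new MirrorLayout(Path.of("mirror"));
        System.out.println(layout.studyMetadata(42L));
        System.out.println(layout.eventsCsv(42L, "collar/07", "gps"));
    }
}
```

Sanitising tag identifiers is worth doing in any layout like this, since tag names are user-chosen and could otherwise contain '/' or '..' and escape the tree.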

The engineering-interesting bits:

1. Catch-up + incremental update sync: For each (tag, sensor type) pair, the downloader runs in one of two modes. Catch-up mode does chunked historical downloads (50k records per chunk by default, with the cursor on 'timestamp'). Once caught up, the pair switches to update mode and polls for new rows via 'update_ts', following a roughly exponential backoff ladder (5 min -> 15 min -> 1 h -> 6 h -> 24 h -> 7 d). The backoff resets after any pass that returns new rows. The goal is to stay fresh without being a noisy neighbour to the live API.
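Here is a rough Java sketch of the per-pair mode switch and the update-mode backoff ladder described above. PairScheduler and its methods are hypothetical names. The rule that a short chunk (fewer than 50k rows) ends catch-up is an assumption, since the post doesn't say how "caught up" is detected.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical sketch: names are illustrative, not movebank-mirror's API.
public final class PairScheduler {
    enum Mode { CATCH_UP, UPDATE }

    static final int CHUNK_SIZE = 50_000; // default chunk size from the post

    // Update-mode poll intervals: 5 min -> 15 min -> 1 h -> 6 h -> 24 h -> 7 d.
    static final List<Duration> LADDER = List.of(
            Duration.ofMinutes(5), Duration.ofMinutes(15), Duration.ofHours(1),
            Duration.ofHours(6), Duration.ofHours(24), Duration.ofDays(7));

    Mode mode = Mode.CATCH_UP;
    int rung = 0; // current position on LADDER while in UPDATE mode

    /** Catch-up: a short chunk is taken to mean history is exhausted (assumption). */
    void onCatchUpChunk(int rowsReturned) {
        if (rowsReturned < CHUNK_SIZE) {
            mode = Mode.UPDATE;
            rung = 0;
        }
    }

    /** Update mode: returns how long to wait before the next poll. */
    Duration onUpdatePoll(int newRows) {
        if (newRows > 0) {
            rung = 0; // new data: reset to the shortest interval
        } else {
            rung = Math.min(rung + 1, LADDER.size() - 1); // quiet: climb, capped at 7 d
        }
        return LADDER.get(rung);
    }

    public static void main(String[] args) {
        PairScheduler s = new PairScheduler();
        s.onCatchUpChunk(50_000); // full chunk: still catching up
        s.onCatchUpChunk(12_345); // short chunk: switch to UPDATE
        System.out.println(s.mode);
        for (int rows : new int[] {0, 0, 0, 7, 0}) {
            System.out.println(rows + " new rows -> next poll in " + s.onUpdatePoll(rows));
        }
    }
}
```

Running main prints the mode, then the wait after each simulated poll. Three quiet polls climb the ladder, and the poll with new rows drops it back to 5 min.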

2. Resumable: The state for each (tag, sensor type) pair lives in a small JSON file next to its CSVs. If you interrupt a run during catch-up and start the sync again next week, it picks up exactly where it left off.
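One common way to get this kind of resumability is to rewrite the state file atomically after each chunk. An interrupt then leaves either the old state or the new one, never a half-written file. The sketch below assumes the state consists of a mode and a millisecond cursor. The PairState record, its JSON field names and the temp-file-plus-rename approach are illustrative assumptions, not movebank-mirror's real format. The hand-rolled JSON keeps the example dependency-free, whereas a real implementation would use a JSON library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical per-(tag, sensor type) state. Field names, file name and the
 * millisecond cursor are assumptions for illustration only.
 */
public record PairState(String mode, long cursorMillis) {
    private static final Pattern JSON =
            Pattern.compile("\"mode\":\"(\\w+)\",\"cursor\":(\\d+)");

    String toJson() {
        return "{\"mode\":\"" + mode + "\",\"cursor\":" + cursorMillis + "}";
    }

    static PairState fromJson(String json) {
        Matcher m = JSON.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("unrecognised state file: " + json);
        }
        return new PairState(m.group(1), Long.parseLong(m.group(2)));
    }

    /** A missing file means the pair was never synced: start catch-up from the beginning. */
    static PairState loadOrInitial(Path file) throws IOException {
        return Files.exists(file) ? fromJson(Files.readString(file)) : new PairState("CATCH_UP", 0L);
    }

    /**
     * Write to a sibling temp file, then rename it over the old state, so a crash
     * mid-write leaves the previous state intact. ATOMIC_MOVE may be unsupported
     * on some file systems (AtomicMoveNotSupportedException).
     */
    void save(Path file) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.writeString(tmp, toJson());
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path stateFile = Files.createTempDirectory("mirror-demo").resolve("state.json");
        PairState s = loadOrInitial(stateFile);          // no file yet: fresh catch-up
        s = new PairState(s.mode(), 1_700_000_000_000L); // pretend one chunk was written
        s.save(stateFile);
        System.out.println(loadOrInitial(stateFile));    // a restarted run would resume from here
    }
}
```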

3. Single-process lock: A '.lock' file in the mirror directory prevents concurrent runs, so it's safe to schedule syncs via systemd or cron without worrying about overlapping runs trampling the state.
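Here is a minimal Java sketch of one way such a '.lock' file can work, using an OS-level lock via FileChannel.tryLock. The post doesn't say whether movebank-mirror uses OS locks or just checks whether the file exists, and MirrorLock is a hypothetical name.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical single-process guard: an OS-level lock on mirrorDir/.lock. */
public final class MirrorLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private MirrorLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns the held lock, or null if another sync already holds it. */
    public static MirrorLock tryAcquire(Path mirrorDir) throws IOException {
        FileChannel ch = FileChannel.open(mirrorDir.resolve(".lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock l = ch.tryLock(); // null: held by another process
            if (l != null) {
                return new MirrorLock(ch, l);
            }
        } catch (OverlappingFileLockException e) {
            // Already held elsewhere in this JVM: treat it the same as "busy".
        } catch (IOException e) {
            ch.close();
            throw e;
        }
        ch.close();
        return null;
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path mirrorDir = Path.of(args.length > 0 ? args[0] : ".");
        try (MirrorLock lock = tryAcquire(mirrorDir)) { // a null resource is simply not closed
            if (lock == null) {
                System.err.println("another sync is running; exiting");
                return;
            }
            System.out.println("lock held; syncing..."); // the real sync would run here
        }
    }
}
```

The advantage of an OS lock over a create-if-absent marker file is that the OS releases it when the process exits, even after a crash, so a killed cron job can't block every later run. Whether such locks are advisory or mandatory is platform-dependent (on Linux they are advisory), so they only exclude processes that also try to take the lock.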

4. License acceptance: Configurable via the same LicenseChecker interface that movebank-api-client uses.

Distribution:
 - CLI:     ./gradlew :cli:installDist  ->  bin/movebank-mirror
 - Library: de.firetail.compat.movebank:movebank-mirror:0.0.2 on Maven Central
 - JDK 21, LGPL-2.1
 - Repo:    https://github.com/mcb77/movebank-mirror

Two notes on intent:

 - This is a community tool, not an official Movebank product. Built independently against the public API.
 - It's deliberately conservative about traffic: a 'movebank-mirror sync' run for a study with millions of events will spread its requests across hours, not minutes. If you see a misbehaving instance, that's a bug; please file it.

If you have a setup where you can try it, I'd love to hear use-case reports, especially for weird studies with unusual sensor types or licence shapes, since those are the test cases I haven't found in my own data.

Cheers,
Matthias