What powers the Kea Database?
23 June 2019
The Kea Database was first launched in mid-2017 as part of a wider citizen science project called the ‘Kea Sightings Project’. Since then, the database has been extended and improved over time—a never-ending process of bug fixing, usability enhancements and data management. Like most modern technology projects, the Kea Database draws on a multitude of different technologies, resulting in a powerful set of features for obtaining, displaying, querying and exporting data. This article seeks to explain how it works and the rationale behind the technology chosen, decisions that were made even more critical by the long-term nature of this project and its data.
Before looking at the technology specifics, the question must be answered: why another database? There are already many excellent citizen science initiatives out there, with iNaturalist NZ and eBird already showing hundreds of thousands of observations of New Zealand birds and other species.
The primary reason for building a new platform is that no other system currently provides for the identification of individuals, along with the ability to add profiles and descriptions of each. Kea are a particularly charismatic species, with distinct behaviours and interactions with humans that make them a highly visible species in the wild. The distinct personalities and differing behaviours across birds enable the creation of meaningful and interesting ‘personal’ profiles, an approach less likely to work with less-characterful species.
Nonetheless, as I highlighted in my honours dissertation1, it is very important for projects with a long-term view to consider the ongoing sustainability of their databases. Though the dissertation was written after the database was created, its sustainability was certainly considered as part of the design.
To ensure the sustainability and longevity of the project and its data:
- The project uses mature, stable and well-supported frameworks
This ensures that other developers or future maintainers can continue to support the project.
- The project was open source from day one
By open-sourcing the project, the so-called ‘bus factor’2 is reduced in that anyone should be able to immediately provision a copy of the software and associated services should something happen to the maintainer.
- The project has had an API from day one
The front-end of the database (what the public sees) has only ever accessed and uploaded data via the same API that any application can use. This is in contrast to other, often older, projects that use a separate interface for accessing and displaying data alongside a lesser-featured API.
- All public data is exportable
Through the API, all public data is viewable via the REST interface in a JSON or CSV format. Hence, should the data need to be imported into another system at some future point, it should be easily achievable.
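As a sketch of what that export path looks like in practice, the snippet below flattens a JSON list of observation records into CSV using only the standard library. The field names here ('id', 'date_sighted', 'latitude', 'longitude') are hypothetical stand-ins for whatever the real API returns:

```python
import csv
import io
import json

def observations_to_csv(json_text):
    """Flatten a JSON list of observation records into CSV text.

    Field names are illustrative; a real client would use the
    fields actually exposed by the API.
    """
    records = json.loads(json_text)
    fields = ["id", "date_sighted", "latitude", "longitude"]
    out = io.StringIO()
    # Ignore any extra fields a record might carry beyond our columns.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return out.getvalue()

# Made-up sample data standing in for an API response:
sample = json.dumps([
    {"id": 1, "date_sighted": "2019-01-05",
     "latitude": -42.94, "longitude": 171.56},
])
print(observations_to_csv(sample))
```

In practice the API serves CSV directly, so a consumer need not do this conversion; the point is that the data round-trips through plain, open formats.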
Together, these four factors should help mitigate the risks associated with running a separate database. In fact, because of these considerations made early in the project, the Kea Database is rapidly becoming the location for all kea-related data, including historical datasets, scientific surveys and data collected through partnerships with backcountry organisations.
The project has been coded using open source technologies exclusively, with the code being released for others to use under an Affero General Public Licence (AGPL-3.0)3. Using open source software was the obvious choice—not only because of the technological benefits, but also because of its basis in collaboration—much like the citizen science nature of the Kea Database.
We were even lucky enough to have our project win the award for the science category in the 2018 New Zealand Open Source Awards, recognising the “natural combination […] leveraging one community to help build another, with mutual benefit.”4 We were delighted with the unexpected award, and naturally had to share it with the resident kea at Willowbank Wildlife Reserve in Christchurch.
The Kea Database uses a variety of services to provide both moderators and the public with maps, geospatial processing, content-editing tools and various methods of querying and displaying the data. The following diagram aims to illustrate some of the many linkages between these services.
External services are highlighted in green, custom components are in blue, and infrastructure is in orange. Red lines highlight the components that are directly accessible via a URL.
The following sections seek to summarise the technology choices and justifications for the choice of each component.
At the heart of the Kea Database project is Django, a mature Python framework that powers platforms such as Instagram5. GeoDjango is an official extension that adds support for handling spatially-enabled data. Paired with GeoDjango is PostGIS, a spatially-enabled extension of the open-source Postgres database.
These two technologies, combined with the excellent ‘Django REST Framework’ and various other Python modules, form the single access point for submitting and obtaining all kea-related information. This central access point is then used across multiple services, including the main website, a special map and a new survey tool for the Department of Conservation.
- Provides the GET API endpoints for all 1500+ kea in the database and associated band combos, including querying and sorting.
- Provides POST API endpoints for reporting observations and surveys.
- Provides GeoJSON endpoints for direct view and analysis in tools such as QGIS, or displaying on web maps.
- Interfaces with Amazon S3 object storage and CloudFront caching for image storage. Automatically processes uploaded images into multiple sizes suitable for use on the web.
- Automatically allocates uploaded sightings to a particular geospatial region and reverse-geocodes them to the nearest named feature to enable easy dataset browsing and moderation.
- Provides a secure administrative interface for moderating sightings and updating/editing information.
- Imports raw data from the Department of Conservation’s internal Access-based kea database through repeatable and structured scripts to ensure both datasets are synchronised with each other.
- Enables the import of suitably formatted sightings data from other sources via CSV.
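Of the features above, the automatic region allocation and reverse-geocoding step can be sketched in plain Python. The real database does this with PostGIS queries against proper geospatial layers; the feature list and coordinates below are purely illustrative:

```python
import math

# Hypothetical named features; the real database draws these from a
# geospatial layer rather than a hard-coded list.
FEATURES = [
    ("Arthur's Pass", -42.942, 171.564),
    ("Aoraki / Mount Cook", -43.595, 170.142),
    ("Milford Sound", -44.672, 167.925),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest_feature(lat, lon):
    """Return the name of the closest known feature to a sighting."""
    return min(FEATURES, key=lambda f: haversine_km(lat, lon, f[1], f[2]))[0]

print(nearest_feature(-42.95, 171.56))  # a sighting near Arthur's Pass
```

Doing this inside PostGIS rather than in application code means the lookup scales with proper spatial indexes, but the principle is the same: every sighting is tagged with its nearest named feature at upload time.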
The combination of GeoDjango, PostGIS and Django REST Framework was a no-brainer given the hugely powerful feature set and community-backed modules. This decision has enabled complex functionality to be rapidly built, whilst retaining relative simplicity in the source code. Django also has a great testing framework and a well-defined process for its updates and deprecations, all contributing to project maintainability.
The initial version of the front-end of the Kea Database was ultimately rewritten8 after about a year, using a different mapping library, leveraging the lessons learned from the first version and taking advantage of now-mature technologies such as Bootstrap 4. Presently the predominant technologies used are React, Redux, React Router, SCSS & Bootstrap 4, Formik, Yup and the excellent create-react-app tool for easy code transpilation. The front-end currently uses mapbox-gl for the mapping elements; however, after some successful testing, this will likely be changed to the open-source Leaflet library.
- Presents the main interface by which the public interacts with the database and uploads sightings.
- Enables the database to be easily accessed across a variety of devices from mobile to desktop.
- Provides the means to browse a map of sightings, as well as individual birds and their sightings.
- Automatically creates visualisations of band combos to improve usability and identification of individual birds.
- Connects to the WordPress.com API to provide administrator-editable help pages, home page text and recent blog posts.
Hosting has long been a bugbear of the various projects I have been involved with—there are certainly cost advantages to hosting everything on a $10/m VPS on the other side of the planet, but equally this comes with issues such as latency, as well as a whole stack of components to maintain and back up. After six years of maintaining my own SaaS product for high schools, I was not keen on the mental overhead (albeit slight) of manually maintaining systems, nor on creating a whole set of monitoring and automation infrastructure—both routes I’d tried in the past.
Whilst a Kubernetes-style container approach may yet solve many of my problems, in May 2017 application platform Heroku seemed like the easiest option for the database back-end. After some slight rearrangements of the Django application layout and the wrangling of some ‘buildpacks’ to enable the necessary geospatial libraries, a simple git command was all it took to deploy to production. Heroku sorted all of the necessary rollback features, auto-renewing SSL certificates and various performance metrics. Secrets are simply stored in environment variables on the Heroku administrative interface and all of the Kea Database data is stored in either object storage or a managed database enabling easy backups.
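The configuration approach can be sketched as follows: a minimal illustration of reading twelve-factor-style settings from environment variables, as Heroku encourages. The variable names and defaults here are illustrative rather than the actual ones used by the Kea Database:

```python
import os

def env_bool(name, default=False, environ=os.environ):
    """Read a boolean flag from the environment ('true'/'1' count as on)."""
    return str(environ.get(name, default)).lower() in ("true", "1")

def env_str(name, default, environ=os.environ):
    """Read a plain string setting, falling back to a development default."""
    return environ.get(name, default)

# Illustrative Django-style settings; on Heroku these would be set
# through the administrative interface, never committed to the repo.
DEBUG = env_bool("DJANGO_DEBUG")
SECRET_KEY = env_str("SECRET_KEY", "insecure-dev-only-key")
DATABASE_URL = env_str("DATABASE_URL", "postgres://localhost/kea_dev")
```

Keeping every secret in the environment means the same codebase runs unchanged in development, CI and production, and nothing sensitive ever appears in the open source repository.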
Heroku is not without its problems, the main two being cost and performance. The cost is probably at least twice what it might be if I were managing the infrastructure myself, and the performance could be improved—for example, there is presently no caching at the back-end level. The usual approach for hosting a Django app might be to put it behind NGINX, but on Heroku that would be a separate chargeable service.
The front-end is entirely static, and therefore very simply hosted on Amazon’s S3 object storage, with a CloudFront layer handling caching and SSL certificates. A one-line deploy command in package.json for the various front-end packages means a deploy only takes a few seconds—no maintenance required.
As an aside, code submitted as a pull request automatically undergoes some checking on Travis CI, which offers free services to open source projects. The front-end presently has no testing, but is checked for code consistency. The back-end has some unit tests that run on each build, producing a code coverage report.
The current hosting approach has been working OK for the last couple of years, but there are a few issues to be resolved:
- Data sovereignty
Data sovereignty is an increasingly important and relevant consideration with the ever-increasing amounts of data collected around the world. Whilst perhaps not so significant for non-personal data such as on the Kea Database, on principle the data should be stored in New Zealand.
- Performance
As more data gets added to the database and its functionality is extended, the performance constraints of the current hosting arrangement are becoming increasingly clear—there’s only so much database query optimisation that can be done. Adding a caching layer and increasing the available computing power would be immensely useful, but it is presently cost-prohibitive to do so.
- Cost
Presently the per-month cost of the database is high, especially given the current performance constraints. Ideally the hosting would be moved to a more flexible service, or potentially even to a service offering free hosting and maintenance for non-profit organisations.
All of these issues need to be resolved in due course, whilst keeping in mind the primary goal of easy ongoing maintenance and backups. I don’t yet have an answer to them, but there’s some promising work in the containerisation space and potential sponsorship arrangements in the works.
The Kea Database leverages a number of external services to reduce cost or complexity.
Mapbox was selected for the second iteration of the Kea Database front-end, as its pricing model changed to provide a generous free tier. Mapbox is used to provide an ‘outdoors’ style basemap, along with a library called mapbox-gl, which powers the maps on the main Kea Database front-end. The mapbox-gl library, whilst powerful in some aspects, proved to be quite limiting in others—for example, it does not have a library of community plugins like other popular open source mapping libraries do.
The mapping library Leaflet has been trialled elsewhere with good effect, so it is likely that the main Kea Database will be switched to this at some point. Fortunately, Leaflet can be used with the underlying Mapbox outdoors basemap provided through a standard Tile Mapping Service interface.
Beloved by trampers and outdoor enthusiasts everywhere, the New Zealand Topo50 maps are almost certainly the most used and recognisable map series for any backcountry navigation. As such, their inclusion on the mapping interfaces of the Kea Database rapidly became a ‘must-have’ feature.
Land Information New Zealand (LINZ) helpfully provides an online data portal9 with thousands of layers of data, one of which is a combined layer of the Topo50 series of maps. After searching for various solutions for hosting the tiles, short of generating and hosting them ourselves, the data portal’s provided Tile Mapping Service (TMS) quickly became the obvious choice—as a standard, it was fast to implement. Despite some vague wording10 around what constitutes ‘reasonable use’ for rate limiting, this has so far been a great solution—Topo50 map tiles now automatically appear when the map is zoomed in to a suitable level. If for some reason the rate limit is exceeded, the map simply degrades to the underlying Mapbox basemap.
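The tile lookup behind a TMS- or XYZ-style basemap reduces to the standard ‘slippy map’ arithmetic: a zoom level and a lat/lon pair determine which tile image to fetch. A minimal sketch (the URL template is a placeholder, not LINZ’s actual endpoint):

```python
import math

def deg_to_tile(lat, lon, zoom):
    """Convert WGS84 lat/lon to XYZ tile indices at a given zoom level.

    This is the standard Web Mercator 'slippy map' formula; a TMS
    server flips the y axis (y_tms = 2**zoom - 1 - y_xyz).
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile_url(template, lat, lon, zoom):
    """Fill a placeholder XYZ URL template like '.../{z}/{x}/{y}.png'."""
    x, y = deg_to_tile(lat, lon, zoom)
    return template.format(z=zoom, x=x, y=y)

# A tile covering Arthur's Pass at a backcountry-navigation zoom level:
print(tile_url("https://tiles.example.org/{z}/{x}/{y}.png", -42.94, 171.56, 13))
```

Because the mapping library does this arithmetic itself, wiring the LINZ layer in was just a matter of supplying the TMS endpoint and a minimum zoom level at which the Topo50 tiles should appear.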
The need for a simple content management interface became apparent in the early days of the project. Not wanting to clutter up the Kea Database back-end with content management functionality, WordPress.com was a cheap, simple and performant choice.
WordPress.com provides a simple REST API, enabling the Kea Database front-end to make a simple API query to pull across all relevant data for the home page, the various footer pages and blocks of text elsewhere around the site. Naturally, it also provides a great blogging interface and as a managed service implementation of an open source product, there has been zero maintenance required.
Essentially: it just works!
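As a sketch of how such an API is consumed, the snippet below parses a simplified response in roughly the shape WordPress.com’s posts endpoint returns; the field names and sample data are illustrative, and the real front-end does this in JavaScript, but the idea is the same:

```python
import json

# A hypothetical, trimmed-down API response; real responses contain
# many more fields and may differ across WordPress.com API versions.
sample = json.dumps({
    "posts": [
        {"slug": "help", "content": "<p>Help text</p>"},
        {"slug": "about", "content": "<p>About the project</p>"},
    ]
})

def extract_page_content(response_text, slug):
    """Return the HTML content for a given page slug, or None if absent."""
    data = json.loads(response_text)
    for post in data.get("posts", []):
        if post.get("slug") == slug:
            return post.get("content")
    return None

print(extract_page_content(sample, "help"))
```

One API call on page load is enough to populate the home page, footer pages and blog excerpts, with the content fully editable by non-technical administrators through the normal WordPress interface.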
Google Analytics is a widely used tool for helping understand how websites are used. We use it to find out how users are getting to our site (e.g. directly, searching, social media), how long they’re spending and what they’re doing there. Analytics are also good for determining how people view the site—for example, more than 56% of visits in 2019 have been using a mobile device, highlighting the importance of ensuring a good experience on small screens.
In an ideal world the analytics would of course be provided by a non-Google service for reasons of privacy—something that is in the backlog.
Like all good software projects, there’s a continually updated backlog of features and bug fixes that get addressed as time allows. Currently on the cards is a project building a survey tool for the Department of Conservation, with the goal of creating a long-term dataset in a more rigid scientific framework to enable increased understanding of kea population numbers. Independently, there is a separate mapping interface being developed to enable the import and display of a number of sightings done by a non-profit community organisation that has a partnership with the Kea Conservation Trust. Due to the flexible architecture of the database, building new interfaces is relatively trivial, each interacting with the data through the common REST API.
There are also many improvements to the main Kea Database itself in the works, such as refinements to the user interface of the sightings form, public image uploading and improving the behaviour of searching. However, despite hundreds of hours of work the backlog never seems to get smaller!
The goal is that the above notes are of interest to anyone seeking to build a similar service or use the open source Kea Database code11, but if nothing else, hopefully this article will be useful for any future maintainers of the project. Feel free to direct any queries or advice to me on social media or elsewhere.
By their very nature, community projects are always more than just the work of one individual—I’d like to acknowledge the hard work of my project co-conspirator, Dr. Laura Young, and her dedication to all things kea. I’d also like to acknowledge project co-founder Mark Brabyn, who is now pursuing other projects.
Laura and I are also very thankful for the continued support from the Kea Conservation Trust, the Arthur’s Pass Wildlife Trust and the Department of Conservation. Equally, we’d like to mention our many brilliant project sponsors, such as Active Adventures, whose ongoing support for the project is gratefully welcomed.
https://nzosa.org.nz/ (quoted from the awards booklet)↩
I rewrote the database front-end with support from Satoshi at Catalyst IT, for which I’m very grateful!↩