The Sea Around Us In The Cloud

Project details

Published: August 24, 2021

Last Updated: 1 month ago.

View All Projects

Sea Around Us (SAU) is an international research initiative at the University of British Columbia that acts as a critical resource to assess the impact of fisheries on marine environments. SAU has been collecting fisheries-related information from every maritime country since 1950. The data is viewed by government subsidiaries, marine biodiversity and is available on their website.

Since the data presented through the website is vast and complex, when large database queries are made, or multi-dimensional charts are rendered, the website may slow down or become unavailable. Moreover, the Sea Around Us database team needs to take care of the customized data requests sent by scientific colleagues, members of the conservation community, NGOs, and other researchers. Sea Around Us seeks to improve data access, avoid time delays, and provide a better user experience to enhance meaningful analysis and research, broadening the impact of this valuable dataset.

Approach

To streamline the data analysis workflow, SAU in collaboration with UBC Cloud Innovation Centre (UBC CIC), developed a solution that uses Jupyter notebooks, SAU data, and AWS services using a server-less architecture. This approach empowers the users to execute complex data analyses and customized visualizations to communicate insights, in addition to alleviating the load of the SAU website for obtaining similar results. The workflow of the solution is separated into two different scenarios (i) changes required when there is an update in underlying data (ii) day-to-day analysis workflow for a researcher.

The UBC CIC team built Jupyter notebooks templates with data analysis and visualizations examples, complete with the code behind charts and tables. Instructions are also added in the notebooks that explain the context and implementation details of the code blocks. User guides, additional documentation, and other reference material are provided in the GitHub repo to help the users understand the workflow easily, and to make changes as necessary for drawing insights from data.

User Interface: data analysis example

Supporting Artifacts

Architecture Diagram

Technical Details

The solution aims to improve data access for researchers and offers a new way to interact with the SAU dataset using Jupyter notebooks to analyze, interpret and visualize the data. The team worked with the PostgreSQL database of SAU and the queries provided by the SAU data team. As seen in the architecture diagram, the first part of the workflow involves exporting the Relational Database Service (RDS) snapshot of the SAU database (filtered to only the necessary tables) to S3 storage. The metadata is captured with the help of Glue crawlers, making queries possible. When the queries are run on Athena, CSV files are automatically generated in folders on S3. This is the subset of SAU data highlighted in the architecture diagram Day-to-day workflow for an Analyst. In the Day-to-day workflow, an analyst or a power user can load this subset of SAU data in S3 (via CloudFront) into Jupyter notebooks in any environment for further analysis and visualizations – easily generating insights from data. Jupyter notebooks that demonstrate different scenarios related to data analysis, visualizations, and data access are also included in the notebooks section of the Github repository.

Monitoring the solution utilization can be done via CloudFront statistics. Logs set up for the distribution can be used to monitor and analyze popular objects, usage by location, and to track the trends in requests and data transfer.

The solution enables researchers and analysts to perform data analysis or create custom visualizations as needed within the notebook itself, reducing and potentially replacing the need of using the SAU’s website for obtaining analytical information. More importantly, this solution has the potential of offloading 80% of customized data requests (8 customized data requests per month) from the Sea Around Us database team. Such requests usually come from scientific colleagues, members of the conservation community both national and international (for example, World Wildlife Fund and NGOs involved in SDG 14 Life Below Water), graduate students, and other researchers.

Link to solution on GitHub: https://github.com/UBC-CIC/Sea-Around-Us

Acknowledgements

The Sea Around Us is supported by the David and Lucille Packard Foundation, Marisla Foundation, MAVA Foundation, Minderoo Foundation, Oak Foundation, Oceana, Paul M. Angell Family Foundation, Pew Charitable Trusts, and the Bloomberg and Rockefeller Foundations via RARE.

About the University of British Columbia Cloud Innovation Centre (UBC-CIC)

UBC’s CIC is a public-private collaboration between UBC and Amazon. A CIC identifies digital transformation challenges, the problems or opportunities that matter to the community, and provides subject matter expertise and CIC leadership.

Using Amazon’s innovation methodology, dedicated UBC and Amazon CIC staff will work with students, staff and faculty, as well as community, government or not-for-profit organizations to define challenges, to engage with subject matter experts, to identify a solution, and to build a Proof of Concept (PoC). Through co-op and work-integrated learning, students also have an opportunity to learn new skills which they will later be able to apply in the workforce.

UBC’s CIC focuses on Community Health and Wellbeing.