Document Smart Search

Published: September 4, 2025

The Department of Fisheries and Oceans (DFO) collaborated with the UBC Cloud Innovation Centre (CIC) to develop an AI-powered prototype that makes accessing and exploring official documents more efficient. DFO provides a wide range of scientific and policy resources, but navigating this content to find relevant insights can be a challenge. With the prototype, users can search through conversations with an AI assistant and through filter options, which helps them find and use relevant information more effectively.

Government agencies manage vast document libraries containing critical research and policy information. Finding the right document among thousands can take hours of manual searching. The Department of Fisheries and Oceans (DFO) faced exactly this challenge with its extensive collection of scientific publications and policy documents. DFO identified an opportunity to improve this experience and approached the UBC Cloud Innovation Centre to co-create a prototype that helps users discover and engage with these documents, making it easier to search, filter, and use the information they contain.

Approach

The UBC CIC built a web application prototype to help users explore and navigate large collections of official documents, such as DFO’s extensive library of policy and research publications. The prototype organizes content by topic and mandate, generates summaries, and assigns relevance scores in order to help users find relevant materials.

Powered by AWS, the prototype can analyze and rank documents by how closely they align with a user’s search or query. The system assigns each document a relevance score that reflects its semantic similarity to the user’s search.
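To illustrate how a relevance score can reflect semantic similarity, the following minimal sketch ranks documents by the cosine similarity between a query embedding and document embeddings. Cosine similarity is a common choice for this, but the source does not state which measure the prototype uses. The function names, document IDs, and toy three-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_relevance(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed); real embeddings have far more dimensions.
query = [1.0, 0.0, 1.0]
docs = {
    "salmon-climate": [0.9, 0.1, 0.8],
    "harbour-fees": [0.0, 1.0, 0.1],
}
ranking = rank_by_relevance(query, docs)
```

Here `ranking` lists the hypothetical `salmon-climate` document first because its vector points in nearly the same direction as the query vector.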

Users can interact with these materials in two ways:

  1. By browsing and filtering documents using topics, mandates, and publication years.
  2. Through a conversational AI assistant that highlights key documents related to the user’s search, helping them identify the most relevant DFO materials.
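The first mode, browsing with filters, can be sketched as a simple conjunction of optional criteria. This is an illustrative sketch only: the `Document` record, its fields, and the sample titles are hypothetical, and the prototype's actual schema and query layer are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical record shape, assumed for illustration.
    title: str
    topic: str
    mandate: str
    year: int


def filter_documents(docs, topic=None, mandate=None, year_from=None, year_to=None):
    """Keep documents that satisfy every filter that is set (None means no filter)."""
    kept = []
    for doc in docs:
        if topic is not None and doc.topic != topic:
            continue
        if mandate is not None and doc.mandate != mandate:
            continue
        if year_from is not None and doc.year < year_from:
            continue
        if year_to is not None and doc.year > year_to:
            continue
        kept.append(doc)
    return kept


# Made-up sample records for illustration only.
library = [
    Document("Salmon stock assessment", "Salmon", "Fisheries management", 2021),
    Document("Coastal habitat review", "Habitat", "Ecosystem protection", 2018),
    Document("Salmon and warming rivers", "Salmon", "Ecosystem protection", 2023),
]
recent_salmon = filter_documents(library, topic="Salmon", year_from=2022)
```

Leaving a filter unset keeps it from narrowing the results, which mirrors how a user might start broad and then add topic, mandate, or year constraints one at a time.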

For example, if used with DFO materials, a user interested in the impact of climate change on Pacific salmon can either search directly through the Document Search feature or ask a question using the AI-Assistant. In both cases, the prototype provides relevance scores and AI-generated summaries to help the user identify which documents most closely align with their query. Whether browsing through filtered search results or receiving recommended sources from the assistant, these features help users focus on the most useful materials.

With this solution, users can locate the most relevant documents on a topic in just one search, reducing what once took hours or days of manual searching to only seconds. This solution can scale to any organization looking to make extensive document libraries easier to discover and explore.

Screenshots of UI

The following screenshots highlight key parts of the user interface, showcasing both the public and administrator views.

Public View

The public-facing interface is designed to help users search for documents, explore topics, and engage with the AI-assistant.

AI-Assistant

DFO Document Smart Search Homepage: users can search documents, browse mandates, observe emerging trends, and click “Get Started” to begin an interaction with the AI assistant.
To add context to the user’s interaction, users can select a role to receive tailored responses: General Public, Internal/External Researcher, or Policy Maker.
Users can see the sources behind the AI assistant’s responses by clicking ‘Sources’, which opens a view of the original documents. They can also rate their interaction by clicking ‘Rate’ under the assistant’s last message, where they can give a star rating and leave feedback.

Document Search

Users can filter and explore documents by keyword, topic, mandate, and publication year. For example, typing a phrase such as “Salmon rate of change in abundance” into the search bar will display documents relevant to that query, which can then be narrowed further using the filters on the left.
Users can assess how well a document matches their query using the relevance score; “Document Summary” provides a brief overview of the document, while “Query Summary” offers an AI-generated explanation of its relevance to the user’s search.

Analytics

The Analytics Function allows users to explore trends across DFO’s document library by visualizing document counts over time, which helps identify emerging topics and shifting priorities.

Users can explore patterns in the materials available through the DFO library over time, filtered by topic, document type, or derived topic.
Users begin by selecting filters for topic, document type, and year range. The platform then generates charts that show how document counts shift over time or compare across categories, helping users identify patterns in the available materials. 
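The counting step behind such charts can be sketched as follows. This is a minimal illustration under assumed inputs: the record fields (`year`, `topic`, `doc_type`) and the sample data are hypothetical, and the prototype's own analytics queries may work differently.

```python
from collections import Counter


def counts_by_year(records, topic=None, doc_type=None, year_range=None):
    """Count matching documents per year; filters left as None are ignored."""
    counter = Counter()
    for rec in records:
        if topic is not None and rec["topic"] != topic:
            continue
        if doc_type is not None and rec["doc_type"] != doc_type:
            continue
        if year_range is not None and not (year_range[0] <= rec["year"] <= year_range[1]):
            continue
        counter[rec["year"]] += 1
    # Sort by year so the result can be plotted directly as a time series.
    return dict(sorted(counter.items()))


# Made-up records for illustration only.
records = [
    {"year": 2020, "topic": "Salmon", "doc_type": "Research"},
    {"year": 2021, "topic": "Salmon", "doc_type": "Research"},
    {"year": 2021, "topic": "Salmon", "doc_type": "Policy"},
    {"year": 2021, "topic": "Habitat", "doc_type": "Research"},
    {"year": 2023, "topic": "Salmon", "doc_type": "Research"},
]
salmon_trend = counts_by_year(records, topic="Salmon", year_range=(2020, 2022))
```

With this sample data, `salmon_trend` maps 2020 to one document and 2021 to two, which is the kind of series a line or bar chart would display.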

Administrator View

The administrator interface provides tools to monitor user engagement, review feedback, and adjust how the assistant responds to different types of users, for example by providing more detailed, technical answers for researchers and easier-to-digest summaries for the general public.

To access admin functions, users sign up by creating an account through the platform’s registration page.
The administrator dashboard shows session counts, user engagement over time, and average feedback scores by role. This helps administrators understand how different user groups are interacting with the tool and identify where improvements or additional support may be needed.
Administrators can easily customize the assistant’s behavior by editing role-specific prompts. Each role comes with a pre-set prompt, which administrators can adjust to better align with the audience’s level of expertise.
Administrators can review user feedback for each role.
Administrators can view user feedback across roles, including ratings and comments. This gives users a channel to engage with the tool and lets administrators gather insights directly from their audience.
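As an illustration of the dashboard's "average feedback scores by role" metric, the sketch below averages star ratings per role. The feedback record shape (`role`, `stars`, `comment`) and the sample entries are assumptions; how the prototype actually stores and aggregates feedback is not specified here.

```python
from collections import defaultdict


def average_rating_by_role(feedback):
    """Return the mean star rating for each role that has at least one entry."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for entry in feedback:
        sums[entry["role"]] += entry["stars"]
        counts[entry["role"]] += 1
    return {role: sums[role] / counts[role] for role in sums}


# Made-up feedback entries for illustration only.
feedback = [
    {"role": "General Public", "stars": 4, "comment": "Clear summary."},
    {"role": "General Public", "stars": 5, "comment": ""},
    {"role": "Policy Maker", "stars": 3, "comment": "Needs more sources."},
]
averages = average_rating_by_role(feedback)
```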

Architecture Diagram

This diagram shows the Document Smart Search architecture. The solution ingests documents through Amazon S3 and AWS Glue, combines structured data in Amazon RDS with vector embeddings generated by Amazon Bedrock and indexed in Amazon OpenSearch for hybrid search, and delivers AI responses through AWS Lambda-orchestrated RAG workflows with built-in AWS security.

Architecture deep dive: https://github.com/UBC-CIC/DFO-Smart-Search/blob/main/docs/architectureDeepDive.md

Technical Details

The UBC CIC implemented a serverless architecture on AWS to handle unpredictable query volumes while keeping operational overhead minimal. Amazon Bedrock provides the foundation model capabilities, while Amazon OpenSearch enables hybrid search combining keyword matching with semantic similarity.

User authentication is managed by Amazon Cognito, which controls sign-up, sign-in, and role-based access. To start data ingestion, administrators upload documents to Amazon S3 and trigger an AWS Glue pipeline that scrapes, cleans, and organizes HTML content by topic and mandate.

Structured metadata is stored in Amazon RDS, while text embeddings are generated using Amazon Bedrock and indexed in Amazon OpenSearch. This enables hybrid search, which combines traditional keyword matching with semantic similarity.
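One common way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF), sketched below. This is an assumed technique chosen for illustration; the source does not say how the prototype or OpenSearch combines the two signals. The function name, the constant `k=60`, and the document IDs are likewise illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from a keyword search and a vector search.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
semantic_hits = ["doc-b", "doc-d", "doc-a"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

In this example `doc-b` comes out ahead of `doc-a` because it ranks near the top of both lists, while documents found by only one retriever fall lower in the fused order.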

The system is secured with AWS Shield for DDoS protection and AWS WAF for application-layer threats. All interactions flow through Amazon API Gateway, which routes requests to AWS Lambda functions. For admin tasks (like managing users or prompts), Lambda accesses DynamoDB for chat memory and RDS for structured data.

When a user asks a question, a Lambda function launches a Retrieval-Augmented Generation (RAG) process: pulling conversation history from DynamoDB, facts from RDS, and relevant documents from OpenSearch. It then sends this context to a Bedrock-hosted LLM, which generates a tailored, source-grounded response.
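The RAG flow described above can be sketched in a few lines. This is a simplified illustration, not the prototype's code: the role prompts, function names, and sample data are hypothetical, and `generate` is a stand-in for the call to the Bedrock-hosted LLM so that the sketch runs without AWS access.

```python
# Hypothetical role prompts; in the prototype, administrators can edit these.
ROLE_PROMPTS = {
    "General Public": "Answer in plain language and keep it brief.",
    "Internal/External Researcher": "Answer with technical detail.",
    "Policy Maker": "Answer with a focus on policy implications.",
}


def build_prompt(role, history, facts, passages, question):
    """Assemble one text prompt from the role prompt and the retrieved context."""
    lines = [ROLE_PROMPTS[role], "", "Conversation so far:"]
    lines += [f"{speaker}: {text}" for speaker, text in history]
    lines += ["", "Known facts:"]
    lines += [f"- {fact}" for fact in facts]
    lines += ["", "Sources:"]
    lines += [f"[{i}] {p['title']}: {p['text']}"
              for i, p in enumerate(passages, start=1)]
    lines += ["", f"Question: {question}",
              "Answer using only the sources above and cite them by number."]
    return "\n".join(lines)


def answer_question(question, role, history, facts, passages, generate):
    """Build the prompt, call the model, and return the answer with its sources."""
    prompt = build_prompt(role, history, facts, passages, question)
    return {"answer": generate(prompt), "sources": [p["title"] for p in passages]}


def fake_llm(prompt):
    # Stand-in for the real model call (assumption); returns a fixed reply.
    return "stub answer"


# Made-up context standing in for data from DynamoDB, RDS, and OpenSearch.
history = [("user", "Hi"), ("assistant", "Hello, how can I help?")]
facts = ["The library contains policy and research publications."]
passages = [{"title": "Salmon and warming rivers", "text": "Warmer water affects migration."}]
result = answer_question("How does climate change affect Pacific salmon?",
                         "General Public", history, facts, passages, fake_llm)
```

Returning the source titles alongside the answer is what allows an interface to offer a ‘Sources’ view for each response.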

Link to solution on GitHub: https://github.com/UBC-CIC/DFO-Smart-Search

Acknowledgements

Student team: Developers: Daniel Long, Tien Nguyen, Nikhil Sinclair, and Zayan Sheikh. Project Assistance by Amy Cao and Harleen Chahal.

About the University of British Columbia Cloud Innovation Centre (UBC CIC)

The UBC CIC is a public-private collaboration between UBC and Amazon Web Services (AWS). A CIC identifies digital transformation challenges, the problems or opportunities that matter to the community, and provides subject matter expertise and CIC leadership.

Using Amazon’s innovation methodology, dedicated UBC and AWS CIC staff work with students, staff and faculty, as well as community, government or not-for-profit organizations to define challenges, to engage with subject matter experts, to identify a solution, and to build a Proof of Concept (PoC). Through co-op and work-integrated learning, students also have an opportunity to learn new skills which they will later be able to apply in the workforce.