Research Data Insights Generator
A professor from the UBC Department of Geography collaborated with the UBC Cloud Innovation Centre to create a solution that helps surface meaningful insights from large research datasets. The prototype enables researchers to upload, analyze, and interact with their collected data. To support this process, it includes an AI-powered chat interface that lets users ask research questions, interpret results, and receive contextual responses drawn from their uploaded documents and project materials.
Approach
The prototype is built on AWS Cloud Infrastructure and utilizes Large Language Models (LLMs) to support research analysis across diverse data sources. Researchers can upload survey responses, audio transcripts, and other documents into the project pipeline, where files are ingested, converted to text, and prepared for evaluation. The system employs both plain LLM prompting and Retrieval Augmented Generation (RAG), drawing on contextual documents such as survey questions, transcripts, and literature to strengthen interpretability and relevance of results.
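The ingestion step described above can be sketched as a simple dispatcher that routes each upload to a text extractor by file type. This is a minimal illustration, not the project's actual implementation: the real pipeline would call document parsers and an audio transcription service, which are stubbed out here.

```python
import pathlib

def extract_text(path: str, data: bytes) -> str:
    # Route each uploaded file to a text extractor based on its
    # extension, so every source ends up as plain text ready for
    # downstream analysis.
    suffix = pathlib.Path(path).suffix.lower()
    if suffix == ".txt":
        return data.decode("utf-8")
    if suffix in {".pdf", ".docx"}:
        return parse_document(data)    # stub for a document parser
    if suffix == ".mp3":
        return transcribe_audio(data)  # stub for a transcription service
    raise ValueError(f"unsupported file type: {suffix}")

def parse_document(data: bytes) -> str:
    # Stand-in for a real PDF/DOCX parser.
    return "<parsed document text>"

def transcribe_audio(data: bytes) -> str:
    # Stand-in for a real speech-to-text service.
    return "<audio transcript>"
```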
To illustrate the tool’s analytical capabilities, spatial empathy was selected as a representative use case. This use case explores how individuals express emotional connections to particular places and communities through uploaded survey or interview data. Researchers can compare responses across different contexts and group results based on relevant variables, such as the time and location of data collection or participant demographics.
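Grouping responses by a collection variable, as described above, amounts to bucketing rows by a chosen field. A minimal sketch follows; the field names and sample responses are purely illustrative.

```python
from collections import defaultdict

def group_responses(responses, key):
    # Bucket survey responses by a grouping variable, e.g. location,
    # collection date, or a participant demographic field.
    groups = defaultdict(list)
    for r in responses:
        groups[r[key]].append(r["text"])
    return dict(groups)

# Hypothetical survey rows
responses = [
    {"location": "Vancouver", "text": "I feel rooted in my neighbourhood."},
    {"location": "Kelowna", "text": "This valley shaped who I am."},
    {"location": "Vancouver", "text": "The seawall is where I belong."},
]
by_location = group_responses(responses, "location")
```

Each group can then be analyzed separately, so responses collected in different places or from different demographics can be compared side by side.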
By integrating flexible data ingestion, customizable prompt engineering, and multi-model analysis powered by AWS services, the prototype offers researchers a more robust and adaptable approach to uncovering patterns and generating deeper insights from complex datasets. Built using models hosted on Amazon Bedrock, this solution provides security and scalability, ensuring that user data is never used to train underlying models. The infrastructure enables scaling of computational resources, cost-effective solutions, and reliable performance, while the Amazon Bedrock model selection allows researchers to choose the most appropriate AI models for their specific analytical needs without compromising data confidentiality or control.
For more details, please visit the GitHub repository.
Screenshots of UI
This section outlines the stages of the user journey, as represented through screenshots of the user interface. Members can submit research data, observations, or survey responses and also use the Insights Generator. Researchers have all the same capabilities as Members but can additionally create agendas, add collaborators, and configure AI settings. To gain Researcher access, a user must first be added by an Administrator.
MEMBER VIEW
RESEARCHER VIEW

ADMINISTRATOR VIEW
Supporting Artifacts
Click below to see technical details of the solution, including the detailed Architecture. Or click here to go directly to the project GitHub repository.
Architecture Diagram

For a deep dive into the application’s architecture, check the project GitHub repository.
Technical Details
The Research Data Insights (RDI) platform helps researchers organize, analyze, and collaborate on their data using AI. It’s built with secure AWS infrastructure, so the system scales easily, keeps information safe, and is simple to use.
Researchers can upload a wide range of files, including text (PDF, DOCX) and audio (MP3), and the platform automatically processes them into vector embeddings using the Amazon Titan Embeddings model. These embeddings are stored in PostgreSQL with the pgvector extension and used in LLM prompts through a Retrieval-Augmented Generation (RAG) pipeline, which leverages LangChain to connect the vector store with the LLM. Uploaded documents can then be queried using AI models such as Mistral and Meta Llama, which draw on the research observations and context documents to provide context-specific insights and summaries.
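The core RAG loop described above (embed the query, retrieve the most similar stored chunks, inject them into the prompt) can be sketched in a few lines. This is a toy illustration, not the platform's implementation: the bag-of-words "embedding" stands in for the Amazon Titan model, and the sort stands in for a pgvector similarity query.

```python
import math
import re

VOCAB = ["place", "community", "survey", "transcript", "emotion"]

def embed(text):
    # Toy stand-in for an embedding call: a word-count vector over a
    # tiny fixed vocabulary. The real pipeline gets a dense vector
    # back from the embedding model.
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    # Rank stored chunks by similarity to the query embedding, as a
    # vector-store nearest-neighbour search would.
    qv = embed(query)
    return sorted(store, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def build_prompt(query, context_chunks):
    # The retrieved chunks are injected into the LLM prompt as context.
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context above.")

store = [
    "Participants described strong emotion toward their home community.",
    "The survey asked about place attachment in urban neighbourhoods.",
    "Budget figures for the 2021 fiscal year.",
]
query = "How do participants feel about their community?"
chunks = retrieve(query, store)
prompt = build_prompt(query, chunks)
```

In the actual platform, the resulting prompt would be sent to a Bedrock-hosted model; here the pipeline stops at prompt construction.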
A built-in scoring system allows researchers to evaluate how well the AI responses align with their work. This is done by asking multiple LLMs to score a research observation through a scoring prompt and then aggregating their responses to produce a final score. Collaboration is also supported: multiple team members can work within the same research agenda, share insights, and keep separate chat histories for their own interactions with the AI.
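The aggregation step of the scoring system can be sketched as collecting one numeric score per model and averaging them. This is a minimal sketch under assumed details: the model names and the 1–5 scale are illustrative, not taken from the project.

```python
from statistics import mean

def aggregate_scores(model_scores, scale=(1, 5)):
    # Clamp each model's score to the expected scale (guards against
    # out-of-range model output), then average into the final score.
    lo, hi = scale
    clamped = [min(max(s, lo), hi) for s in model_scores.values()]
    return round(mean(clamped), 2)

# Hypothetical per-model scores for one research observation
scores = {"mistral": 4, "llama": 5, "titan": 7}  # 7 exceeds the scale
final = aggregate_scores(scores)                 # averages [4, 5, 5]
```

Averaging over several independent models smooths out any single model's idiosyncratic judgment of an observation.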
For more details, please visit the GitHub repository.
Acknowledgements
This project was created in collaboration with Siobhán Wittig McPhee from the UBC Department of Geography.
Student Team: Development by Harsh Amin and Rohit Murali. Project assistance by Harleen Chahal.
Image by Ryutaro Tsukata.
About the University of British Columbia Cloud Innovation Centre (UBC CIC)
The UBC CIC is a public-private collaboration between UBC and Amazon Web Services (AWS). A CIC identifies digital transformation challenges (the problems or opportunities that matter to the community) and provides subject matter expertise and leadership.
Using Amazon’s innovation methodology, dedicated UBC and AWS CIC staff work with students, staff and faculty, as well as community, government or not-for-profit organizations to define challenges, to engage with subject matter experts, to identify a solution, and to build a Proof of Concept (PoC). Through co-op and work-integrated learning, students also have an opportunity to learn new skills which they will later be able to apply in the workforce.