
Leveraging Large Language Models for a Course Question-Answer Repository

‘In higher education, students often struggle with finding answers to course-specific questions due to information being fragmented across multiple platforms. This leads to confusion over what and who they should consult to answer their questions, as well as wasted time for instructors when answering logistical questions. The goal of this project is to create a web application that allows instructors to upload course material and provide an interface for students to view course materials and receive personalized responses to their questions. The interface is built on top of a large language model (LLM) that uses uploaded course information.’

— Capstone Team PL-40



Leveraging LLMs for a Course Question-Answer Repository is a capstone project developed for the UBC CIC by students in the Faculty of Electrical and Computer Engineering to demonstrate a solution that helps students find answers to their course-specific questions in a timely manner and spares instructors time spent on redundant questions. Team PL-40 developed the prototype with the guidance and support of the UBC CIC.

Approach

Team PL-40 developed LLM-Course-QA, a question-answering system that leverages LLMs and provides a platform for higher education institutions to help students obtain answers to academic questions. The system utilizes information from course documents uploaded by instructors when responding to related queries.

The open-source solution was designed and developed to enable instructors to upload course documents, process their content, and store the data in a vector database for retrieval. Through the front-end, a student can ask a question, which the system processes using retrieval augmented generation (RAG) to fetch the documents most relevant to the query. The selected documents are then passed to the LLM, which generates a response for the student.
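The retrieval step described above can be sketched in a few lines. The following is a minimal, self-contained illustration only: bag-of-words vectors and cosine similarity stand in for a real embedding model and vector database, and the chunk texts and function names are hypothetical, not the team's actual code.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Stand-in for the vector database: document chunks stored with their vectors.
chunks = [
    "The final exam is on April 20 and covers lectures 1 through 24.",
    "Office hours are held Tuesdays at 2pm in room KAIS 2020.",
    "Assignment 3 is due March 15 and is worth 10% of the grade.",
]
index = [(embed(c), c) for c in chunks]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

A query such as `retrieve("When is the final exam?")` surfaces the exam chunk, which would then be passed to the LLM as context for generating the answer.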

The above video was created by Team PL-40; it describes the goal of the project and includes a demonstration of the solution.

See more about the project in the 2024 Capstone Design & Innovation Day showcase, under Artificial Intelligence and Software Systems.

Screenshots of UI

Admin View

Instructor View

User View

Student View

Architecture Diagram

Technical Components

The Leveraging LLMs for a Course Question-Answer Repository project consists of 3 main components: the front-end, back-end including data ingestion and retrieval pipelines, and the LLM. You may expand the sections below to learn more about each component.

Expand below to learn more about the architecture diagram, step-by-step.

Front-End

Dashboards are divided into 2 views: (1) the instructor and (2) the student dashboards.

  1. The instructor dashboard allows instructors to create new courses and to upload, update, or delete course information within each course. Course information includes, but is not limited to, syllabi, lecture notes, and class slides.
  2. The student dashboard provides students the ability to join courses created by instructors, view uploaded course material, and receive course-specific answers to their questions based on the content uploaded by instructors. Students may also report incorrect information generated by the LLM.

Data Ingestion and Retrieval Pipelines

The pipeline is responsible for taking documents uploaded through the instructor dashboard and preparing them for use by the LLM. Its first task is to convert the uploaded documents into plain text using optical character recognition (OCR). The text is then split into chunks and embedded as vectors to work within token limits: all LLMs have a maximum number of tokens they can accept as input, so retrieving only the most relevant chunks, rather than entire documents, keeps the data within capacity. The overall purpose of the pipeline is to clean and parse the uploaded data into a consistent format that the retrieval system can search.
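The chunking step above can be sketched as follows. This is a simplified illustration, assuming whitespace splitting in place of a real tokenizer; the parameter values are illustrative, not the team's actual configuration. Overlapping chunks are a common way to avoid cutting a relevant passage in half at a chunk boundary.

```python
def chunk_text(text: str, max_tokens: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens tokens.

    Whitespace splitting stands in for a real tokenizer here.
    """
    tokens = text.split()
    chunks = []
    step = max_tokens - overlap  # advance by this much between chunks
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored in the vector database alongside a reference back to its source document.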

Large Language Model

The LLM provides the core functionality of the application. When students query the chat interface from the student dashboard with a course-specific question, the LLM should provide an answer based on facts acquired from the uploaded documents. However, since LLMs are known to hallucinate, the team fine-tuned the model with sample questions and answers to minimize false responses and ensure consistent answers.
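Beyond fine-tuning, a common complementary safeguard in RAG systems (which may differ in detail from the team's implementation) is to ground the model at prompt time: the retrieved chunks are placed in the prompt with an explicit instruction to answer only from that material and to admit when the answer is not there. A hypothetical sketch:

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt from retrieved course-material passages."""
    # Number the passages so an answer could cite its source.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "You are a course assistant. Answer the student's question using ONLY "
        "the course material below. If the material does not contain the "
        "answer, say you don't know rather than guessing.\n\n"
        f"Course material:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what gets sent to the LLM; because the instructions and context travel with every query, the model has no need to fall back on facts it memorized during pre-training.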

Link to solution on GitHub: https://github.com/UBC-CIC/LLM-Course-QA

Acknowledgements

Congratulations to Team PL-40 who won an APSC Faculty Award for this work.

Team PL-40 was formed by a group of senior students in Electrical and Computer Engineering as part of the UBC Electrical and Computer Engineering Capstone Program. Guidance was provided by a faculty member who acted as the technical director, supported by the UBC Cloud Innovation Centre technical team.

Photo by Yakobchuk Olena.

About the University of British Columbia Cloud Innovation Centre (UBC CIC)

The UBC CIC is a public-private collaboration between UBC and Amazon Web Services (AWS). A CIC identifies digital transformation challenges, the problems or opportunities that matter to the community, and provides subject matter expertise and leadership.

Using Amazon’s innovation methodology, dedicated UBC and AWS CIC staff work with students, staff and faculty, as well as community, government or not-for-profit organizations to define challenges, to engage with subject matter experts, to identify a solution, and to build a Proof of Concept (PoC). Through co-op and work-integrated learning, students also have an opportunity to learn new skills which they will later be able to apply in the workforce.