Facilitating Document Analysis – Document Chat
Translating real-world data (e.g., electricity use, heat consumption, or waste) into life cycle inventory (LCI) data is essential for assessing the environmental impact of emerging technologies. Traditional methods of obtaining this data require extensive domain knowledge and many assumptions, which often compromises the accuracy of sustainability evaluations. Researchers need a way to gather and validate key data more quickly so they can assess the potential environmental impact of emerging technologies.
To address these challenges, the UBC Faculty of Forestry collaborated with the UBC Cloud Innovation Centre (CIC) to develop an automated pipeline, powered by a large language model (LLM), that makes document analysis quick and easy.
Approach
The prototype, called Document Chat, lets users upload PDF documents and ask natural-language questions about them. By speeding up the research process, it aims to improve the clarity and reliability of life cycle assessments and to support more sustainable technological development.
Screenshots of the UI
From the main page, the user can upload PDFs for analysis. Once uploaded, the files begin processing and appear in the “My documents” section, which shows the number of pages and the file size of each upload.
After selecting a document that has finished processing and is marked “Ready to chat”, the user can start a new conversation.
The user can enter any questions pertaining to the document in the text-entry box.
Within seconds, the LLM generates a response, which is displayed in the user interface. The user can then continue the conversation with follow-up questions.
Architecture Diagram
For an in-depth explanation of the architecture diagram, check out the Architecture Deep Dive on GitHub.
Technical Details
Learn more about the core technical components of the solution.
Serverless Application
This serverless application uses AWS services to handle document processing and conversational AI. Because it is built on serverless services such as AWS Amplify, AWS Lambda, and Amazon S3, there is no infrastructure to provision or maintain: AWS Amplify lets the application scale automatically with user demand, and Lambda functions process documents as they arrive. The result is faster development, automatic scaling, and lower operating cost, since the project pays only for the resources it actually uses.
File Upload and Storage
Users upload PDF documents through the application, and each file is stored in an Amazon S3 bucket. Uploads use pre-signed URLs: time-limited URLs that grant permission to upload one specific object, so files can be sent to S3 without exposing AWS credentials to the browser.
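The idea behind a pre-signed URL can be illustrated with a short sketch. This is not the AWS Signature Version 4 algorithm that S3 actually uses (a real deployment would rely on an AWS SDK call such as boto3's `generate_presigned_url`); it is a simplified, stdlib-only illustration under stated assumptions: the backend signs the object key and an expiry time with a secret it never shares, and the storage side accepts the upload only if the signature matches and the URL has not expired. The secret, host, and function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, quote, unquote, urlencode, urlparse

# Hypothetical server-side secret; it never leaves the backend.
SECRET_KEY = b"example-server-secret"
# Hypothetical endpoint standing in for the S3 bucket.
UPLOAD_HOST = "https://uploads.example.com"


def _sign(key: str, expires: int) -> str:
    """HMAC-SHA256 over the HTTP method, object key, and expiry timestamp."""
    message = f"PUT\n{key}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_upload_url(key: str, ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Backend: issue a URL that authorizes uploading `key` until it expires."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign(key, expires)})
    return f"{UPLOAD_HOST}/{quote(key)}?{query}"


def verify_upload_url(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept only an unexpired URL whose signature matches."""
    parsed = urlparse(url)
    key = unquote(parsed.path.lstrip("/"))
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    current = int(time.time() if now is None else now)
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature, _sign(key, expires))
```

For example, a URL from `make_upload_url("uploads/report.pdf")` passes `verify_upload_url` for five minutes; changing the key in the URL or waiting past the expiry makes verification fail, which is what keeps a leaked URL from being reused indefinitely.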
Data Ingestion and Processing
Once a file is uploaded, an automated workflow is triggered. A Lambda function extracts metadata (such as the file size and number of pages) and updates the database. The extracted text is then split into smaller chunks, an embeddings model generates an embedding for each chunk, and the embeddings are stored in a vector database.
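The project's Lambda code is not reproduced here, but two steps of this workflow can be sketched in stdlib Python under stated assumptions: reading the bucket, key, and size from an S3 event notification (the `Records[].s3.bucket.name`, `s3.object.key`, and `s3.object.size` fields belong to the S3 notification format), and splitting page text into overlapping fixed-size chunks ready for embedding. PDF parsing is out of scope, so page text is passed in as a list of strings; the chunk size, overlap, and function names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import unquote_plus


def parse_upload_records(event: dict) -> List[Dict]:
    """Read bucket, key, and size from each record of an S3 event notification.

    Object keys arrive URL-encoded (a space becomes '+'), so they are decoded here.
    """
    records = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        records.append({
            "bucket": s3["bucket"]["name"],
            "key": unquote_plus(s3["object"]["key"]),
            "size_bytes": s3["object"].get("size", 0),
        })
    return records


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def chunk_pages(pages: List[str], chunk_size: int = 1000, overlap: int = 200) -> List[Dict]:
    """Chunk each page separately so every chunk keeps its page number."""
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        for piece in split_into_chunks(text, chunk_size, overlap):
            chunks.append({"page": page_number, "text": piece})
    return chunks


def document_metadata(record: Dict, pages: List[str]) -> Dict:
    """Metadata for the database: file size (from the event) and page count."""
    return {"key": record["key"], "size_bytes": record["size_bytes"], "page_count": len(pages)}
```

The overlap is a common design choice: a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice. Each chunk's text would then be sent to the embeddings model, and the resulting vector stored alongside the chunk.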
Querying and Interaction
Users can query the document using natural language. A Retrieval-Augmented Generation (RAG) pipeline retrieves the most relevant chunks from the vector database and passes them as context to an LLM hosted on Amazon Bedrock, which generates the answer.
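The retrieval step can be illustrated with a small, self-contained sketch. It is not the project's implementation: a toy word-count vector stands in for the embeddings model, an in-memory list stands in for the vector database, and the Amazon Bedrock call is omitted, leaving only the prompt that would be sent to the LLM. Chunks belonging to the selected document are ranked by cosine similarity to the question, and the top `k` are inserted into the prompt as numbered context. The class and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embeddings model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Stand-in for the vector database: one row per chunk."""

    def __init__(self) -> None:
        self._rows: List[Dict] = []

    def add(self, doc_id: str, text: str) -> None:
        self._rows.append({"doc_id": doc_id, "text": text, "vector": embed(text)})

    def search(self, doc_id: str, query: str, k: int = 3) -> List[str]:
        """Return the k chunks of `doc_id` most similar to the query."""
        q = embed(query)
        candidates = [row for row in self._rows if row["doc_id"] == doc_id]
        candidates.sort(key=lambda row: cosine(q, row["vector"]), reverse=True)
        return [row["text"] for row in candidates[:k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Real embedding models capture meaning rather than exact word overlap, so "use" and "uses" would score as similar there, while this toy version treats them as different words. The filtering by `doc_id` reflects that each conversation is about one selected document.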
Document Management
Users can view, inspect, and delete their uploaded documents, which keeps their document library easy to manage.
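The source does not describe how deletion is implemented. One reasonable design, sketched here purely as an assumption, is for a delete to remove the stored file, its metadata row, and its chunks together, so that a deleted document can never be retrieved as context for a later answer. Plain dictionaries stand in for the S3 bucket, the metadata database, and the vector database; all names are hypothetical, and the `status` field mirrors the “Ready to chat” state shown in the UI.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentRecord:
    doc_id: str
    filename: str
    size_bytes: int
    page_count: int
    status: str = "processing"  # set to "ready" once embeddings are stored


@dataclass
class DocumentStore:
    """Hypothetical in-memory stand-ins for the bucket, metadata table, and vector index."""
    files: Dict[str, bytes] = field(default_factory=dict)
    metadata: Dict[str, DocumentRecord] = field(default_factory=dict)
    chunks: Dict[str, List[str]] = field(default_factory=dict)

    def list_documents(self) -> List[DocumentRecord]:
        """The "My documents" view: every record, sorted by filename."""
        return sorted(self.metadata.values(), key=lambda record: record.filename)

    def inspect(self, doc_id: str) -> Optional[DocumentRecord]:
        """Details for one document, or None if it does not exist."""
        return self.metadata.get(doc_id)

    def delete(self, doc_id: str) -> bool:
        """Remove the file, its metadata, and its chunks; True if the document existed."""
        existed = doc_id in self.metadata
        self.files.pop(doc_id, None)
        self.metadata.pop(doc_id, None)
        self.chunks.pop(doc_id, None)
        return existed
```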
Check out the solution on GitHub
Acknowledgements
This project was created in collaboration with Qingshi Tu, Assistant Professor of Industrial Ecology at the University of British Columbia.
About the University of British Columbia Cloud Innovation Centre (UBC CIC)
The UBC CIC is a public-private collaboration between UBC and Amazon Web Services (AWS). A CIC identifies digital transformation challenges (the problems or opportunities that matter to the community) and provides subject matter expertise and CIC leadership.
Using Amazon’s innovation methodology, dedicated UBC and AWS CIC staff work with students, staff, and faculty, as well as community, government, or not-for-profit organizations, to define challenges, engage with subject matter experts, identify a solution, and build a Proof of Concept (PoC). Through co-op and work-integrated learning, students also have an opportunity to learn new skills that they can later apply in the workforce.