edYou
Sep 19, 2021
https://github.com/EthanDP/edyou-hackrice-11
The Idea
As a student, I often find myself going through countless research papers, articles, and other documents to grasp the material I need to study. There are times when I struggle to make sense of the concepts explained in these documents and look for alternative sources to aid my understanding. Understanding this struggle in my personal learning process led me to create edYou, a web application that brings important content available on platforms like YouTube right to the digital documents I am reading.
Architecture
To create a seamless and scalable web application, I chose Django, a high-level Python web framework, to build it. Django provides a robust infrastructure that supports clean, reusable, and maintainable code. Python, known for its versatility and readability, is well suited to the intricate logic and data manipulation tasks that edYou requires.
Text Parsing Algorithm
A major challenge of this project was designing an efficient text parsing algorithm that could identify searchable terms in the documents. This is where Tika and Beautiful Soup 4 came into the picture.
Tika helped me extract content and metadata from various document formats, including PDF and HTML files. Beautiful Soup 4, a Python library, further refined Tika's output by parsing the text and removing unnecessary tags. My custom text parsing algorithm then processed this clean text to find the searchable terms most relevant to the context, ranked by their frequency and importance.
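The ranking step can be illustrated with a minimal sketch. This is not the algorithm edYou actually uses: it assumes the text has already been cleaned, it ranks by raw frequency only (the "importance" weighting is left out), and the function name, stopword list, and thresholds are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real one would be far longer.
STOPWORDS = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "for", "on", "with", "that", "this", "it", "as", "by", "be",
})


def extract_search_terms(text, max_terms=5, min_length=4):
    """Return the most frequent non-stopword terms in already-cleaned text."""
    # Lowercase the text and keep alphabetic runs only.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        w for w in words if len(w) >= min_length and w not in STOPWORDS
    )
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]
```

Calling `extract_search_terms(clean_text, max_terms=3)` would yield up to three candidate terms to hand to the video search. A weighting scheme such as TF-IDF could replace the raw counts if rarer, more specific terms should rank higher.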
Integrating YouTube API and PyPDF2
To provide high-quality supplementary materials for learning, edYou leverages the YouTube API. Once the text parsing algorithm identifies the searchable terms, edYou queries the YouTube API for videos relevant to those terms. The video results are displayed beside the document, so students can reach a wealth of information without leaving the reading interface.
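As a rough sketch of the lookup described above, the snippet below builds a request URL for the YouTube Data API v3 search endpoint and turns a response into title and link pairs. It is written under assumptions: the helper names and the default of three results are hypothetical, the API key is a placeholder, and the HTTP request itself is left to whatever client the application already uses.

```python
from urllib.parse import urlencode

YOUTUBE_SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(term, api_key, max_results=3):
    """Build a YouTube Data API v3 search URL for one search term."""
    params = {
        "part": "snippet",        # include titles and descriptions
        "q": term,                # the term found by the text parser
        "type": "video",          # skip channels and playlists
        "maxResults": max_results,
        "key": api_key,           # placeholder; supply a real API key
    }
    return f"{YOUTUBE_SEARCH_ENDPOINT}?{urlencode(params)}"


def parse_video_results(response):
    """Reduce a decoded search response (a dict) to title/URL pairs."""
    videos = []
    for item in response.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id is None:
            continue  # not a video result
        videos.append({
            "title": item.get("snippet", {}).get("title", ""),
            "url": f"https://www.youtube.com/watch?v={video_id}",
        })
    return videos
```

Keeping URL construction and response parsing separate from the network call makes both steps easy to test without spending API quota.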
To make the annotation process more intuitive, I implemented PDF manipulation and annotation features using PyPDF2. This Python library allows students to highlight portions of the text, add comments, and store their annotations directly on the document.
Google OAuth for User Management
To facilitate user management and provide seamless access to the edYou platform, I integrated Google OAuth for authentication. This allows users to create an account and log in using their Google credentials, eliminating the need for creating and memorizing a new set of credentials.
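The first step of a Google sign-in can be sketched as building the authorization URL that the user is redirected to. This is only an illustration of the standard OAuth 2.0 authorization-code flow, not edYou's actual integration, which may rely on a Django package instead. The function name is hypothetical, and the client ID and redirect URI are placeholders.

```python
import secrets
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_google_login_url(client_id, redirect_uri, state=None):
    """Return (login_url, state) for the start of the authorization-code flow."""
    # The state value should be stored in the session and checked on callback
    # to protect against cross-site request forgery.
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,          # placeholder: from the Google Cloud console
        "redirect_uri": redirect_uri,    # placeholder: must match a registered URI
        "response_type": "code",         # authorization-code flow
        "scope": "openid email profile", # basic identity information
        "state": state,
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}", state
```

After the user approves access, Google redirects back to `redirect_uri` with a `code` parameter, which the server then exchanges for tokens. That exchange is omitted here.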