Course Compass

Search engine for university courses made with Python, React, MongoDB, FastAPI, Docker, and AWS.

Code Chrome Web Store

Overview

This project is a full-stack application that allows university students to conveniently look up information about courses. The project currently provides access to the data of over 13,000 courses at Duke University.

The application consists of four parts: data scrapers, a database, an API, and a browser extension frontend.

Data scrapers collect course details from web pages and fetch course data from university APIs. The course data is stored in a MongoDB database. When a student (or any user) wants to learn about a course, they launch the browser extension’s popup window. The user types a course number into the search bar, and the extension sends a request to the RESTful API hosted on Amazon Web Services. The API queries the database and returns the relevant data to be displayed to the user.

Demo

Here’s a quick demo video showing how a user can look up course information through Course Compass.

Tech Stack

Python MongoDB FastAPI Docker AWS Lambda Amazon ECR JavaScript React

Course Compass was built using the FARM (FastAPI, React, MongoDB) stack. Below is a detailed breakdown of the technologies used in each aspect of the project.

  • Data scrapers: Python, with packages like Requests and Beautiful Soup.
  • Database: MongoDB, hosted on MongoDB Atlas.
  • API: Built with FastAPI, containerized with Docker, stored in Amazon Elastic Container Registry (ECR), and deployed through AWS Lambda.
  • Extension: React and Vite.

Design Philosophy

Scalability was a key focus for this project, and it largely influenced my choice of the tech stack. I designed the backend architecture to support growth in user traffic and course data volume, and designed the frontend to easily accommodate expansions and new features.

FastAPI’s performant asynchronous capabilities, combined with containerization via Docker and deployment through AWS Lambda, ensure the backend can scale automatically under increased demand.

I chose MongoDB as the database system so the application can scale if more universities are added. Different institutions associate different attributes with their classes. Some attributes, like course name and course number, are fairly universal. Others, like schedule, location, credits, and textbooks, vary across schools. Because MongoDB is a NoSQL database, its flexible schema can accommodate these varying formats of course data.
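To make the flexible-schema point concrete, here is a minimal sketch of two course documents with different shapes that could sit in the same collection. Every field name and value below is a hypothetical placeholder, not the project's actual schema.

```python
# Hypothetical course documents; field names and values are illustrative only.
course_a = {
    "university_id": "university_a",
    "course_number": "EXAMPLE 101",
    "course_name": "Introductory Example",
    "credits": 1.0,
    "prerequisites": ["EXAMPLE 100"],
}

course_b = {
    "university_id": "university_b",
    "course_number": "SAMPLE 200",
    "course_name": "Intermediate Sample",
    "schedule": "Mon/Wed 10:00-11:15",
    "textbooks": ["A Sample Textbook"],
}

# Only the fairly universal attributes appear in both documents.
shared_fields = set(course_a) & set(course_b)
print(sorted(shared_fields))
```

A relational table would need a column (or a nullable placeholder) for every school-specific attribute, whereas here each document simply carries the fields its school provides.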

React was used for the frontend because its component-based approach allows new components to be added easily in the future.

Motivation

I began developing this project in November of 2024, when I was a first-year undergraduate student in my first semester of college. In the months prior, I realized that one major difference between high school and university is the almost overwhelming number of classes offered in higher education. Some departments at my university list more courses than my entire high school does.

Like many of my software projects, Course Compass was created to solve a problem I personally have: quickly accessing important information about my college’s courses. It is impossible to memorize the course name behind every course number, let alone the details of every course. The obvious solution is to search for course numbers on Google or another search engine. However, this method proves inconsistent. Googling a course number by itself rarely yields results from the desired school without scrolling. Even appending the college name doesn’t always return the course info page I am looking for: often, course instruction websites from past semesters are shown instead. Many of these don’t display the course name prominently, if at all. In short, without navigating to a department course catalog and scrolling to the appropriate course, it is difficult to find crucial information about classes, such as prerequisites and when they are offered.

My frustration with looking up courses led me to start a project with a simple goal in mind: a lightweight search tool to quickly find information about courses. Over 1,000 lines of code later, Course Compass was born, and I haven’t had to Google a course number since.

How It Works

All that users see of the application is the extension frontend, where they look up course information. However, behind the scenes, many processes are running to ensure the systematic storage and efficient retrieval of data.

This diagram provides a high-level overview of how the application’s various components integrate with one another.

sequenceDiagram
  actor User
  participant 🧩 Extension
  participant 🔗 API
  participant 🗄️ Database
  participant 🤖 Data Scrapers

  🤖 Data Scrapers->>🗄️ Database: 0. Store scraped course data
  User->>🧩 Extension: 1. Searches for a course
  🧩 Extension->>🔗 API: 2. Sends search query to API endpoint
  🔗 API->>🗄️ Database: 3. Queries for matching course
  🗄️ Database->>🔗 API: 4. Returns results of search
  🔗 API->>🧩 Extension: 5. Sends response with course data
  🧩 Extension->>User: 6. Displays course info in popup

Data Scrapers

Combining web scrapers and API callers, object-oriented Python scripts fetch data on course listings. Requests is used to send HTTP requests, Beautiful Soup is used to parse and extract data from HTML, and YAML settings are used to configure program execution. The fetched data is first saved to a cache directory before being written to the database.
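The cache-before-write step could look roughly like the sketch below. It uses only the standard library and assumes one JSON file per university; the function names, file layout, and record fields are hypothetical and are not taken from the project's code.

```python
import json
from pathlib import Path


def save_to_cache(courses, cache_dir, university_id):
    """Write fetched course records to <cache_dir>/<university_id>.json."""
    cache_dir = Path(cache_dir)
    # Create the cache directory on first use.
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{university_id}.json"
    path.write_text(json.dumps(courses, indent=2), encoding="utf-8")
    return path


def load_from_cache(path):
    """Read cached course records back before writing them to the database."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Keeping a cache on disk means the database write can be retried or inspected without re-running the scrapers.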

Database

A Python script writes the cached course data to the cloud-hosted MongoDB database and also sets up the synonym mappings and MongoDB Atlas search indexes.

Synonym mappings are defined for some department codes. For instance, CS and COMPSCI are synonymous in the context of Computer Science courses.
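Atlas Search reads synonym mappings from documents in a source collection, where each document has a `mappingType` and a `synonyms` list. The sketch below builds such documents for the department codes mentioned in this README; the helper name and the exact groups are assumptions made for illustration.

```python
def build_synonym_documents(groups):
    """Turn groups of interchangeable department codes into Atlas Search
    synonym documents using the "equivalent" mapping type."""
    return [
        {"mappingType": "equivalent", "synonyms": list(group)}
        for group in groups
    ]


# Hypothetical groups based on the examples in this README.
DEPARTMENT_SYNONYMS = [["CS", "COMPSCI"], ["STA", "STAT", "STATS"]]

synonym_documents = build_synonym_documents(DEPARTMENT_SYNONYMS)
# These documents would then be inserted into the synonyms source
# collection, for example with a MongoDB driver's bulk insert.
```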

MongoDB Atlas search indexes are defined for the course number and university ID fields. The course number field is indexed with two types: autocomplete (useful for autocomplete matching) and string (useful for full text matching). The university ID field is indexed with the token type.
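An index definition matching this description might look like the following sketch, written as a Python dictionary. The field names (`course_number`, `university_id`), the synonym mapping name, and the source collection name are assumptions; only the index types come from the description above.

```python
# Hypothetical Atlas Search index definition; names are placeholders.
SEARCH_INDEX_DEFINITION = {
    "mappings": {
        "dynamic": False,
        "fields": {
            # One field indexed with two types: full-text and autocomplete.
            "course_number": [
                {"type": "string"},
                {"type": "autocomplete"},
            ],
            # Token type allows exact-match filtering on the university.
            "university_id": {"type": "token"},
        },
    },
    # Link the index to the collection holding the synonym documents.
    "synonyms": [
        {
            "name": "dept_synonyms",
            "analyzer": "lucene.standard",
            "source": {"collection": "synonyms"},
        }
    ],
}
```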

API

The database is exposed to the frontend through a RESTful API made with FastAPI. The API constructs MongoDB aggregation pipelines to query the database, taking into account synonymous department codes and the possibility of typos.

The API is containerized using Docker. The resulting Docker image is stored on Amazon Elastic Container Registry (ECR). Then, an AWS Lambda function is created from the Docker image and deployed to the cloud. The API is accessed through its AWS Lambda Function URL.

For more information on the search pipeline and deployment on AWS Lambda, see the next section on Challenges and Learnings.

Frontend

Users can query the database through the application’s frontend, which is a browser extension. Written in JavaScript with React and the Chrome Extensions APIs and styled with CSS, the extension was built with Vite and published to the Chrome Web Store, where users can easily install it in their browser.

Through the extension, users can conveniently look up course information in two ways. Clicking on the extension’s icon in the extensions toolbar opens a search bar where a user can enter course numbers. The popup window then displays a variety of information about the course. Additionally, a user can select a course number on any web page. Then, through the context menu opened from a right click, the user can perform a course search on the selected text. The extension popup will open to show all the details about the course.

Challenges and Learnings

The technical challenges I overcame as I developed Course Compass taught me valuable lessons and strengthened my skills as a developer and engineer. Below are two of the many problems I encountered and the knowledge I gained while solving them.

The first challenge was making search robust. For users to smoothly look up courses of interest by course number, several factors need to be taken into account.

  1. Users make typos. Someone intending to search for a COMPSCI course could accidentally type COPSCI.
  2. Department codes are not universal. COMPSCI courses in one school could be marked as CS classes in another. Even in the same school, some people may refer to Statistics classes as STAT classes while others call them STA or STATS courses.
  3. Different users type the same course number differently. One user may type in MATH 218D-1 while others may enter math218d1 or math 218 d1.
  4. Users may not always type out the entire course number. For instance, a partial course number like STA 199 should be autocompleted to match STA 199L.

With all these potential variations, an exact matching comparison is not enough. As a result, I took numerous steps to create a robust search system.

  1. User inputs are preprocessed. Non-alphanumeric characters like dashes are replaced with spaces. Consecutive spaces are replaced by a single space. Wherever a letter and a number are right next to each other, a space is inserted.
  2. If the query contains a department code with known synonyms, then query variations are created from the synonyms. Each query variation is then processed until a match is found. For example, if the original query is STATS 240, then query variations are created for STAT 240 and STA 240.
  3. Each query is run through an aggregation pipeline. The pipeline was fine-tuned to combine:
    • Exact full-text matching with synonyms
    • Fuzzy full-text matching
    • Exact autocomplete matching
    • Fuzzy autocomplete matching
    • Filter for the target university
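The first two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names and the `SYNONYM_GROUPS` table are hypothetical, and `run_search` stands in for whatever function executes the aggregation pipeline.

```python
import re

# Hypothetical synonym groups, based on the department codes mentioned above.
SYNONYM_GROUPS = [("STA", "STAT", "STATS"), ("CS", "COMPSCI")]


def preprocess_query(raw: str) -> str:
    """Normalize spacing so differently typed course numbers look alike."""
    # Replace non-alphanumeric characters such as dashes with spaces.
    text = re.sub(r"[^A-Za-z0-9]", " ", raw)
    # Insert a space wherever a letter and a digit sit next to each other.
    text = re.sub(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", " ", text)
    # Collapse consecutive spaces into one and trim the ends.
    return re.sub(r" +", " ", text).strip()


def query_variations(query: str) -> list:
    """Return the query plus one variation per synonymous department code."""
    dept, _, rest = query.partition(" ")
    for group in SYNONYM_GROUPS:
        if dept.upper() in group:
            synonyms = [code for code in group if code != dept.upper()]
            return [query] + [f"{code} {rest}".strip() for code in synonyms]
    return [query]


def search_with_variations(raw_query, run_search):
    """Try each variation in turn and stop at the first one with results."""
    for variation in query_variations(preprocess_query(raw_query)):
        results = run_search(variation)
        if results:
            return results
    return []
```

With this sketch, `MATH 218D-1`, `math218d1`, and `math 218 d1` all normalize to the same four tokens. Letter case is left untouched here, on the assumption that the search index's analyzer handles case.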

This system makes the extension forgiving and flexible, allowing users to find courses even with typos and varying department codes. By leveraging MongoDB Atlas search indexes, the API efficiently fetches the necessary course details, consistently achieving response times of under 200 milliseconds.
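One way to combine the matching strategies listed above is a single `$search` stage with a `compound` operator. The sketch below only builds the pipeline as Python data. The index name, field names, synonym mapping name, `maxEdits` value, and result limit are all assumptions, and the tuned scoring used in the project is not reproduced here.

```python
def build_search_pipeline(query, university_id, limit=5):
    """Build a hypothetical aggregation pipeline for a course search."""
    path = "course_number"  # assumed field name
    return [
        {
            "$search": {
                "index": "course_search",  # assumed index name
                "compound": {
                    "should": [
                        # Exact full-text matching with synonyms.
                        {"text": {"query": query, "path": path,
                                  "synonyms": "dept_synonyms"}},
                        # Fuzzy full-text matching (kept separate because
                        # fuzzy and synonyms cannot share one text clause).
                        {"text": {"query": query, "path": path,
                                  "fuzzy": {"maxEdits": 1}}},
                        # Exact autocomplete matching.
                        {"autocomplete": {"query": query, "path": path}},
                        # Fuzzy autocomplete matching.
                        {"autocomplete": {"query": query, "path": path,
                                          "fuzzy": {"maxEdits": 1}}},
                    ],
                    # Require at least one of the clauses above to match.
                    "minimumShouldMatch": 1,
                    # Restrict results to the target university.
                    "filter": [
                        {"equals": {"path": "university_id",
                                    "value": university_id}}
                    ],
                },
            }
        },
        {"$limit": limit},
    ]
```

The returned list could then be passed to a collection's `aggregate` method. Putting the university restriction in `filter` rather than `should` keeps it from affecting relevance scores.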

While designing this search pipeline, I learned much about MongoDB Atlas Search’s advanced text search capabilities and how to adjust relevant parameters to optimize for the task at hand. More broadly, I learned to approach problems from the user’s perspective and to consider user behavior. A system that works under ideal conditions is one thing, but a system that excels under real-world usage with messy inputs is another.

API Deployment on AWS

Once I had the application working locally, I wanted to deploy the project to the cloud so other people could also benefit from it. The database was already in the cloud, hosted on MongoDB Atlas. The frontend was easily deployed to the Chrome Web Store. The challenge was deploying the API.

I spent a few days researching the various options offered by different cloud providers. Ultimately, I chose AWS Lambda for its pay-per-use pricing model and its relatively simple serverless setup.

Since the API requires a few packages to run, I used Docker to bundle all the dependencies and containerize the runtime environment. This ensured the API would behave consistently on AWS and in my local environment. Once I verified that the Dockerized application was functional on my machine, I pushed the Docker image to Amazon Elastic Container Registry (ECR). Then, I created a Lambda function from this Docker image. To make the FastAPI app compatible with Lambda, I created an event handler with Mangum, an adapter that allows ASGI apps to run on Lambda. Finally, I made the Lambda function accessible through its Lambda Function URL.

Along the way, I worked through a variety of obstacles, from evaluating which cloud platform to use to fixing insufficient file permissions, port inconsistencies, and middleware configuration. In the process, I gained essential software development skills such as weighing trade-offs, debugging, and managing deployments.

Learn More

The code for the project can be found here. The Chrome Web Store listing of the extension is located here.