I’m Kyle McCormick, a rising sophomore at Worcester Polytechnic Institute, and I just finished an 11-week summer internship on edX’s Mobile team. During my time on the team, I focused primarily on performance-related backend work. In this post, I’m going to be talking about one of the two projects on which I spent the most time.

Course Metadata Caching

One of the first issues I took on was to figure out why calls to our team’s User Course Enrollment API endpoint, which is used to display the main screen in our mobile app, were taking over a second to respond. The endpoint simply returns a list of courses in which a user is enrolled, along with some basic course metadata (name, university, start date, etc.), so it was surprising that it was taking more than a few hundred milliseconds. I started my investigation by walking through the code that computed the enrollment list, using pyinstrument to do profiling.

One of the first things that I noticed was that for each course enrollment, metadata for the course was being loaded from MongoDB three separate times (twice in calls to enrollment.course and once during serialization) and it was not being cached between calls to the endpoint. I had a suspicion that this was the cause of the endpoint’s performance problems, which I confirmed using NewRelic’s “X-Ray Tracing” tool. 
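To make the pattern concrete, here is a minimal sketch of how the redundant loads add up. It is not the actual edx-platform code: the get_course function below is a hypothetical in-memory stand-in for the MongoDB-backed loader, and the Enrollment class and serialize function are invented for illustration.

```python
from collections import Counter

# Counts how many times each course is loaded during one request.
load_counts = Counter()


def get_course(course_id):
    """Hypothetical stand-in for the expensive MongoDB-backed loader."""
    load_counts[course_id] += 1
    return {"id": course_id, "name": "Course " + course_id}


class Enrollment:
    """Illustrative enrollment whose `course` property is not memoized."""

    def __init__(self, course_id):
        self.course_id = course_id

    @property
    def course(self):
        # Every attribute access triggers a fresh load.
        return get_course(self.course_id)


def serialize(enrollment):
    # Two property accesses plus one direct load: three loads per enrollment.
    name = enrollment.course["name"]
    course_id = enrollment.course["id"]
    get_course(enrollment.course_id)
    return {"course_id": course_id, "name": name}


enrollments = [Enrollment("demo-1"), Enrollment("demo-2")]
payload = [serialize(e) for e in enrollments]
print(dict(load_counts))
```

With two enrollments, the loader runs six times in a single request, and nothing carries over to the next request.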

New Relic X-Ray Trace screenshot

As you can see from the trace shown above, the function modulestore/mixed.py.get_course, which loads courseware from MongoDB, was consuming 90% of the response time. This implied that there were two ways to optimize the endpoint: make get_course faster, or minimize the number of calls to it by caching the data it returns. While the former is certainly something that needs to be done, it is a task that requires major changes to core platform code. Since a caching solution is simpler to implement – it’s just a layer over the get_course call – I opted to pursue the latter solution. Additionally, because the cache persists between requests, it continues to pay off even if get_course itself is later optimized.

To cache a course’s metadata, I created a Django model called CourseOverview. The first time metadata for a course is requested, get_course is called, an instance of CourseOverview is created from its return value, and the instance is saved to MySQL. The next request for the course’s metadata will just load the CourseOverview instance, which requires only a single MySQL query (as opposed to the 1-4 MongoDB queries that are executed for each call to get_course). When a course is updated in edX Studio, the corresponding CourseOverview instance is cleared, which forces the next request to fetch the updated metadata by calling get_course again.
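The flow described above (load on first request, reuse on later requests, clear on update) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the real CourseOverview model: a dictionary stands in for the MySQL table, slow_get_course is a hypothetical substitute for get_course, and the field names are invented.

```python
class CourseOverviewCache:
    """Illustrative read-through cache; a dict stands in for the MySQL table."""

    # Assumed subset of metadata worth caching (hypothetical field names).
    FIELDS = ("id", "name", "start")

    def __init__(self, loader):
        self._loader = loader      # expensive function, e.g. get_course
        self._rows = {}            # course_id -> cached overview
        self.loader_calls = 0      # bookkeeping for the demonstration

    def get(self, course_id):
        overview = self._rows.get(course_id)
        if overview is None:
            # Cache miss: fall back to the expensive loader and store a slim copy.
            self.loader_calls += 1
            course = self._loader(course_id)
            overview = {field: course[field] for field in self.FIELDS}
            self._rows[course_id] = overview
        return overview

    def invalidate(self, course_id):
        # Called when a course is updated; the next get() reloads it.
        self._rows.pop(course_id, None)


# Hypothetical course store standing in for MongoDB.
courses = {
    "demo-1": {"id": "demo-1", "name": "Intro", "start": "2015-06-01",
               "blocks": ["large courseware tree"]},
}


def slow_get_course(course_id):
    return dict(courses[course_id])


cache = CourseOverviewCache(slow_get_course)
first = cache.get("demo-1")    # miss: calls the loader
second = cache.get("demo-1")   # hit: served from the cache

courses["demo-1"]["name"] = "Intro (updated)"   # simulate an edit in Studio
cache.invalidate("demo-1")
third = cache.get("demo-1")    # miss again: picks up the new name
```

Storing only the handful of fields the endpoint needs is what keeps the cached read cheap; the full courseware tree never has to be touched on a cache hit.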

I updated the mobile User Course Enrollment API endpoint to use this caching system (PR #8484), and performed load testing using Locust.io to confirm that there was a performance improvement. Production data on NewRelic from around the time the updated code was released makes the performance impact strikingly apparent.

NewRelic performance graph showing faster site response times

The above graph shows the average response time for the Mobile API’s UserCourseEnrollmentsList endpoint in milliseconds. The endpoint was updated to use the metadata caching on July 1st. The average response time dropped from ~1100 ms to ~110 ms.

Additional Applications of Caching

Because it requires a similar set of metadata, I also updated the web student dashboard to use the metadata caching system (PR #8642), which resulted in a less-dramatic but still significant response time reduction. 

NewRelic performance graph showing faster site response times

The above graph shows the average response time for the web-based edX student dashboard in milliseconds. The dashboard was updated to use metadata caching on July 17th. The average response time dropped from 600-800 ms to ~250 ms.

In addition to these two use cases, the caching system can be taken advantage of in any scenario where basic, user-agnostic course metadata is required. For example, another edX employee recently updated our general Enrollment API to use CourseOverviews instead of calling get_course (PR #8927), resulting in a 10x decrease in the 95th percentile response time for the EnrollmentListView endpoint.

Conclusion

As my fellow intern Ben mentioned in his blog post, the edX internship program really shines in how it embeds its interns into real software engineering teams, treating them like full-time employees and giving them impactful projects to work on. Having never before worked on a project of this scale or with this many contributors, I learned an incredible amount. Although my tenure here has come to an end, I hope to stay involved in the awesome Open edX community. Finally, I’d like to thank Nimisha, Chris, Kishore, Adam, Dave, and the rest of the edX team for an incomparable internship experience!