My name is Utkarsh Jadhav, and I am a master’s student of Computer Science at Northeastern University, here in Boston. I have spent the last thirteen weeks at edX working with the Platform team. The Platform team is responsible for building up the infrastructure for Open edX products, particularly working on making the code run faster. In this post, I would like to highlight one of my major projects, Call Stack Manager. It is a tool which keeps track of unique call stacks of functions, methods, and Django Model classes.
Courseware Student Module (CSM) and Courseware Student Module History (CSMH) in the edx-platform code are responsible for maintaining the user state of students attempting problems, along with their respective grades. The edx.org website recently hit 5 million learners, and the growing size of CSM and CSMH is causing serious concerns about storage space and database breakdowns. Breaking up monolithic code areas such as the LMS and CMS in edx-platform is a vital need as the code base and database volume increase.
edX User State Client (eUSC)
Originally, the user state client was structured as follows:
Previous structure of User State Client showing communication between edx-platform and MySQL database via CSM/CSMH
The whole structure lived under edx/edx-platform. The user state client communicated directly with the MySQL tables named courseware_studentmodule and courseware_studentmodulehistory via the Django ORM.
When I started my internship, the team already knew about the downsides of this architecture. In order to fix this, I helped implement the proposed architecture which is shown in the following figure:
Proposed structure of User State Client showing a layer edx-user-state-client between platform and the database backend
In this new architecture, edx-user-state-client acts as a layer between edx-platform and the database backends. There are many advantages of using such a structure:
- eUSC will act as a single interface, at an abstracted level, through which all calls to the database will be made. Such a pattern will eventually make communication with the database more effective.
- eUSC will also allow easy switching between different backends. This API will allow a choice of backends, even for distributed tasks.
As a first step towards creating this structure, I created a repository named edx/edx-user-state-client. This repository contains the interface XBlockUserStateClient, which takes care of all calls that Django model classes make to the database.
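To make the idea of a single interface with swappable backends more concrete, here is a minimal sketch in Python. It is not the actual XBlockUserStateClient API: the class names UserStateClientSketch and InMemoryUserStateClient, the get/set signatures, and the record_attempt helper are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class UserStateClientSketch(ABC):
    """Hypothetical interface: callers depend on this, not on a backend."""

    @abstractmethod
    def get(self, username, block_key):
        """Return the stored state dict for one user and one block."""

    @abstractmethod
    def set(self, username, block_key, state):
        """Store the state dict for one user and one block."""


class InMemoryUserStateClient(UserStateClientSketch):
    """Hypothetical backend that keeps state in a dict instead of MySQL."""

    def __init__(self):
        self._states = {}

    def get(self, username, block_key):
        # Return a copy so callers cannot mutate stored state by accident.
        return dict(self._states.get((username, block_key), {}))

    def set(self, username, block_key, state):
        self._states[(username, block_key)] = dict(state)


def record_attempt(client, username, block_key):
    """Caller code only knows the interface, so backends can be swapped."""
    state = client.get(username, block_key)
    state["attempts"] = state.get("attempts", 0) + 1
    client.set(username, block_key, state)
    return state["attempts"]
```

Because record_attempt only talks to the interface, a MySQL-backed or distributed backend could in principle replace the in-memory one without touching the caller.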
Call Stack Manager
The XBlockUserStateClient is responsible for making all calls to the database. However, given the immense size of the edX code base, the use of third-party extensions (e.g. XBlocks), and the many calls made to the database in various places, it is worthwhile to catch calls to the database that are not made via the XBlockUserStateClient interface.
To address this need, I developed a library called Call Stack Manager. It allows us to track calls that bypass the interface and communicate directly with the database. Call Stack Manager logs such calls in the LMS log.
The library implements two main decorators:
- @trackit – which tracks the decorated entity
- @donottrack – which halts tracking of entities decorated by @trackit.
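As a rough illustration of what tracking unique call stacks could look like, here is a minimal sketch of a trackit-style decorator. It is not the library's actual implementation: the SEEN_STACKS registry, the LOGGED list (a stand-in for the LMS log), and the example functions are assumptions for the sake of the sketch.

```python
import functools
import traceback

# Maps each tracked function name to the set of call stacks already seen.
# This module-level registry is an assumption of the sketch.
SEEN_STACKS = {}
LOGGED = []  # stand-in for the LMS log


def trackit(func):
    """Record each unique call stack that reaches `func` (sketch only)."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # Drop the last frame, which is this wrapper itself.
        frames = traceback.extract_stack()[:-1]
        stack = tuple((f.filename, f.lineno, f.name) for f in frames)
        seen = SEEN_STACKS.setdefault(func.__name__, set())
        if stack not in seen:
            # Only a stack we have not seen before produces a log entry.
            seen.add(stack)
            LOGGED.append(
                "Logging new call stack number %d for %s"
                % (len(seen), func.__name__)
            )
        return func(*args, **kwargs)

    return wrapper


@trackit
def load_state():
    return "state"


def caller_a():
    return load_state()


def caller_b():
    return load_state()
```

Calling caller_a repeatedly from the same place adds only one log entry, while a call through caller_b adds a second one, since it reaches load_state along a different stack.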
The primary motivation for developing this library was to track calls to Model classes in CSM and CSMH, primarily StudentModule and StudentModuleHistory. Communication with databases in Django is done by user-defined classes that subclass the Django ‘Model’ class. Model classes use the QuerySet API to create, retrieve, and update database records. Calls made through the QuerySet API can be intercepted by using a custom manager named CallStackManager, which is defined in the Call Stack Manager library. In this way, the particular case of tracking Django Model classes that directly access the database was handled.
While running Call Stack Manager in its initial version, I faced the following problems:
- Call logs were made repetitively, cluttering the LMS log.
- Many calls that we already knew about were unnecessarily recorded.
- Call logs had unneeded frames in them, making them long and hard to read.
To address these issues, I introduced a new decorator named @donottrack, which halts tracking within the scope of the function it decorates. In general, calls to a tracked method fall into two categories: those that are made by the new interface implementation, and those that are not. Calls that we already know about and expect (that is, calls from the new implementation) can be ignored, as we are only interested in capturing calls that we don’t know about. Thus, we use the @donottrack decorator to hide any tracked calls made by that implementation. At this point, the only calls being tracked are those made from outside the new implementation.
In this way, expected and known calls made to the database (e.g. via the XBlockUserStateClient interface) were not logged, giving us a clear picture of what the unknown calls were doing. Also, duplicate frames in the call stack were filtered out using regular expression filters. As a result, the recorded calls were fewer, more precise, and easier to read.
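Filtering frames with regular expressions might look roughly like the sketch below. The patterns in EXCLUDE_PATTERNS and the (filename, line, function) tuple format are assumptions for illustration; the patterns the library actually uses are not described here.

```python
import re

# Hypothetical patterns for frames we consider noise in a logged stack.
EXCLUDE_PATTERNS = [
    re.compile(r"/site-packages/"),
    re.compile(r"call_stack_manager"),
]


def filter_frames(frames):
    """Drop noisy frames and consecutive duplicates.

    `frames` is a list of (filename, line_number, function_name) tuples.
    """
    kept = []
    for frame in frames:
        filename = frame[0]
        if any(pattern.search(filename) for pattern in EXCLUDE_PATTERNS):
            continue  # frame matches a noise pattern
        if kept and kept[-1] == frame:
            continue  # identical to the previous kept frame
        kept.append(frame)
    return kept
```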
An example of such a logged call stack is as follows:
Logging new call stack number 4 for
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor/views/api.py”, line 240, in wrapped
return func(*args, **kwargs)
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor/views/api.py”, line 176, in wrapped
return func(*args, **kwargs)
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor/views/api.py”, line 127, in wrapped
return func(request, *args, **kwargs)
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor/views/api.py”, line 1896, in rescore_problem
instructor_task.api.submit_rescore_problem_for_student(request, module_state_key, student)
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor_task/api.py”, line 110, in submit_rescore_problem_for_student
return submit_task(request, task_type, task_class, usage_key.course_key, task_input, task_key)
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor_task/api_helper.py”, line 346, in submit_task
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor_task/tasks.py”, line 80, in rescore_problem
return run_main_task(entry_id, visit_fcn, action_name)
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor_task/tasks_helper.py”, line 279, in run_main_task
task_progress = task_fcn(entry_id, course_id, task_input, action_name)
File “/edx/app/edxapp/edx-platform/lms/djangoapps/instructor_task/tasks_helper.py”, line 345, in perform_module_state_update
modules_to_update = StudentModule.objects.filter(course_id=course_id, module_state_key__in=usage_keys)
During development of the Call Stack Manager library, I had to solve numerous Python-level problems, such as handling Django Model classes effectively, creating Django Model classes at run time for testing, wrapping functions so that they do not lose their identity, and handling clashes with other decorators such as @contract in PyContracts, among many others.
Call Stack Manager as a General Library
The main purpose of the Call Stack Manager was to track calls of StudentModule and StudentModuleHistory. In addition, we can track any Python function at any particular level of code. Tracking can be halted when required. With the use of this library, we can deprecate unwanted functions effectively by tracking unknown calls. It will be interesting to pursue development of this tool as a generic tool applicable to any Django project.
I strongly believe that a generalised version of Call Stack Manager could, with further additions and modifications, be used as a plugin or standard library for Django and Python projects.
Looking back on my internship experience, I enjoyed working on the edX code base. Working on such a large-scale open source project with a huge global impact is very exciting. EdX has a team of awesome coders, and it treated interns as full-time employees with maximum exposure at every level. I found myself working on cutting-edge technologies and participating in all kinds of technical discussions with the Platform team. Working at the basic Python level and solving unusual and unexpected problems was especially rewarding. I would like to thank John Eskew, Calen Pennington, Ali Mohammad, Brian Beggs, Miki Goyal, Adam Palay, and Ned Batchelder for their continued help and support. Working at edX was a fascinating and challenging opportunity that I will cherish for years to come.