Recursion is building maps of biology and chemistry to understand how genes interact with one another and what role they play in disease.
Recursion scientists perform CRISPR-Cas9-mediated knockouts to understand the activity level of every gene in a specific disease model, which enables new discoveries. Each map is specific to a cell type.
New experiments run on Recursion’s platform generate novel insights that must be periodically incorporated into new versions of the map. Generating an updated map version was a manual process: engineers ran scripts on development machines to filter and aggregate new experiment data, then uploaded the result to the cache that housed the map. Because the process depended on manual intervention, maps were updated only when engineers or data scientists could carve out time, so scientists had no guarantee of when to expect an update.
Embedded with the team that built and maintained these maps of biology, we approached the problem systematically. First, we worked with our experiment operations team to ensure that new experiments destined for the map were ready at a fixed day and time each week. Next, we wrote infrastructure as code to run the map-building processes on cloud compute instances, triggered once the experiments were ready. Finally, we improved the reliability of the underlying API that transferred data from the cloud instances into the cache that housed the map. By making these improvements iteratively, we ensured that our scientists receive novel insights at a weekly cadence. We also reused this infrastructure to build maps for multiple cell types, creating an atlas of biology for Recursion.
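The three steps described here (a readiness gate, an automated filter-and-aggregate build, and a more reliable upload) can be illustrated with a minimal Python sketch. It is not Recursion's implementation: every name in it (`build_map`, `upload_with_retry`, `weekly_build`, the `quality`, `gene`, and `value` fields, and the quality threshold) is hypothetical. The retry-with-exponential-backoff strategy is likewise an assumption, shown as one common way to harden a data-transfer call.

```python
import time
from collections import defaultdict
from statistics import mean


def build_map(records, min_quality=0.5):
    """Filter out low-quality records, then average the values per gene.

    The record fields and the threshold are illustrative assumptions.
    """
    by_gene = defaultdict(list)
    for rec in records:
        if rec["quality"] >= min_quality:
            by_gene[rec["gene"]].append(rec["value"])
    return {gene: mean(values) for gene, values in by_gene.items()}


def upload_with_retry(upload, payload, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call upload(payload), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return upload(payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the scheduler
            sleep(base_delay * 2 ** attempt)


def weekly_build(experiments_ready, load_records, upload, sleep=time.sleep):
    """Run one scheduled build; do nothing if upstream experiments are not ready."""
    if not experiments_ready():
        return None
    new_map = build_map(load_records())
    upload_with_retry(upload, new_map, sleep=sleep)
    return new_map
```

In a setup like this, a scheduler would invoke `weekly_build` once per week after the agreed readiness time, and the same function could be reused for each cell type by passing a different `load_records` and `upload` target.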