Behind the scenes: developing new data pipelines in Stall Catchers

blog · July 17, 2020, 12:40pm

Have you been wondering what's happening behind the scenes in Stall Catchers? There haven't been too many interface updates recently, but you might have noticed some changes in how the movies are loaded and handled in the game... Movies now load quicker, and the "you already annotated this movie" and similar warnings should have disappeared. We're all set up and will soon be receiving new data too.

All of this is due to some major work on the way we handle data in Stall Catchers. However, these kinds of features tend to "fly under the radar" most of the time. They are not reflected in fancy interface changes, but rather fall into the category of a "tune-up". It's kind of like deciding between an oil change and a car wash. You want your car to look nice, but if you wait too long to change the oil, it won't matter what the car looks like. In Stall Catchers we have a prioritized list of literally hundreds of new features. Recently, we realized it was time to bite the bullet and deal with some very important tune-up features that we call "pipelines".

These pipeline features are invisible from a user interface standpoint, but they have strong implications for user experience. We divide them into two parts: an internal pipeline and an external pipeline. Our internal pipeline governs how movies are selected and distributed to catchers. If we don't get this part right, then when we run low on data, catchers end up annotating the same movies over and over again. Collecting annotations for movies they haven't seen yet provides more useful data and makes playing Stall Catchers more interesting.
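To make the idea of an internal pipeline concrete, here is a minimal sketch of one way movie selection could work. It is not the actual Stall Catchers code: the Movie class, its fields, and the pick_movie function are all hypothetical names chosen for illustration, and the rule it uses (prefer movies the catcher hasn't seen, then the movie with the fewest annotations so far) is just an assumption about what a reasonable policy might look like.

```python
from dataclasses import dataclass, field


@dataclass
class Movie:
    # Hypothetical record; field names are illustrative, not the real schema.
    movie_id: str
    annotation_count: int = 0
    seen_by: set = field(default_factory=set)


def pick_movie(movies, catcher_id):
    """Pick the next movie for a catcher.

    Assumed policy: prefer movies this catcher has not seen yet and,
    among those, the one with the fewest annotations. Only if every
    movie has been seen do we fall back to repeats.
    """
    unseen = [m for m in movies if catcher_id not in m.seen_by]
    pool = unseen if unseen else list(movies)
    if not pool:
        return None  # no movies available at all
    # Tie-break on movie_id so the choice is deterministic.
    return min(pool, key=lambda m: (m.annotation_count, m.movie_id))
```

With a policy like this, a catcher only gets a repeat when there is truly nothing new left to show them, which is the situation the internal pipeline is meant to make rare.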

Our external pipeline defines the process by which entire research datasets are added and prioritized within Stall Catchers. It ensures that when multiple datasets are being analyzed in Stall Catchers, volunteer effort is distributed appropriately among them. It also reduces the likelihood that we will run out of data to analyze.
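As a rough illustration of distributing effort among datasets, the sketch below splits volunteer effort in proportion to a priority weight and skips datasets that have no movies left. This is an assumption-laden toy, not how Stall Catchers actually does it: the Dataset class, the priority and movies_remaining fields, and both functions are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical record; names and fields are illustrative only.
    name: str
    priority: float
    movies_remaining: int


def effort_shares(datasets):
    """Return each active dataset's share of volunteer effort.

    Assumed policy: shares are proportional to priority, and datasets
    with no movies left (or zero priority) receive no effort.
    """
    active = [d for d in datasets if d.movies_remaining > 0 and d.priority > 0]
    if not active:
        return {}
    total = sum(d.priority for d in active)
    return {d.name: d.priority / total for d in active}


def choose_dataset(datasets, rng=random):
    """Randomly choose which dataset the next movie comes from."""
    shares = effort_shares(datasets)
    if not shares:
        return None  # nothing left to analyze
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]
```

For example, with datasets of priority 3 and 1 that both still have movies, about three quarters of the movies served would come from the first. Once a dataset runs out, its share is automatically redistributed to the others.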

While we'd rather spend time on cool new features like a live chatbox (which is in the works!), we had to get our house in order first. The good news is that initial versions of the internal and external pipelines have been implemented and seem to be working, though our super-catchers have been helping us work out a few kinks. This means we've minimized the likelihood that you will see the same movie twice and increased the likelihood that we will always have fresh movies to annotate, both of which ensure we are speeding as fast as we can toward a new treatment for Alzheimer's.

This is a companion discussion topic for the original entry at https://blog.hcinst.org/new-data-pipelines/