It looks like SC is now "humming like a Singer" (sewing machine).
Before I explain the bug, I want to give big thanks to EVERYONE for being patient through this frustrating experience, and extra gratitude to badstallsbadbad, caparaom, sachambers, christiane, gcalkins, and others for their detailed feedback, screenshots, video captures, analysis, and troubleshooting assistance. It was only because of your help that I was able to get to the bottom of this. And an extra special thank you to @seplute for encouraging/pestering/harassing me to address this, while taking on many of my other obligations so I would have the time to look into it. @seplute, in light of your good motives, I have decided to remove the restraining order.
As many of you already know, @ieva recently left the project so she could begin a more normal life with her family in Switzerland. Working on a project like this is not a normal job, especially when you are the chief developer, are on call 24/7, and have to deal with a boss who sends you a list of urgent questions at 11pm, forgetting that you are 6 hours ahead of him. Ieva wrote pretty much every line of code in Stall Catchers. Her contributions to this project will never be forgotten, and she has even occasionally pitched in to help despite a new job that keeps her very busy.
We have a new developer on board who is starting to ramp up with the code, and we'll introduce him soon. In the meantime, code maintenance falls to me. Resolving this bug was tricky because, even though I've been coding since the early 80s, I don't really know PHP, the language Stall Catchers is written in, and I certainly wasn't familiar with the SC code. Now that I've taken my first deep dive into it, I can more fully appreciate the tremendous amount of work Ieva did, as well as the considerable complexity of the code. The good news is that I'm now much more familiar with the code and prepared to deal with future issues more quickly.
To make matters even more "exciting", our test server is down and I am in the midst of a "lift and shift" of our servers to a new Microsoft platform so we can be ready to absorb up to 100,000 concurrent users on April 13 for the Megathon. If you haven't seen @seplute's trailer yet - you must! http://bit.ly/MegathonTrailer
Anyhoo - I'll stop whining and explain the bug now...
First, we were seeing many error messages saying "you have already seen this movie." Then, after a period of time, some of our most active users - our "super catchers" - began to see a reversal in their blue tube: even when they answered correctly on a training movie, their blue tube would go down, and they would receive fewer points for each annotation.
The way we calculate the blue tube height prevents bots from taking over SC and gives exactly the right incentive and weight to catcher inputs. It adjusts automatically to fluctuating cognitive abilities, even within the same day (e.g., sundowning, happy hour, etc.). However, because of the way we implemented it to prevent extreme values, if one type of training movie suddenly disappears (in this case, flowing training examples), the values get wonky and the bar starts to drop.
This is exactly what happened: suddenly, no flowing training movies were being shown to existing users. New users, however, continued to see them. This turned out to be an important clue.
Part of my investigation was to take the queries the code generates to find new movies and run them separately - that way I could see whether the problem was somehow in the data. When I ran the query that selects a flowing training movie, it came up empty. But why? I systematically turned off the query's various filters and eventually found the culprit: "error_count < 2".
Sometimes a movie doesn't load properly. If this happens repeatedly, it can mean that the movie file is corrupted. In that case, we don't want to try showing it anymore, so we suppress it. So Ieva set a threshold to suppress any movie that generates a loading error more than once. As it turns out, we've been having a lot of movie loading errors (and we still aren't sure why), so many that all of the flowing calibration movies were being suppressed by this filter. And in another few days the same would have been true of the stalled training movies.
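To make the failure mode concrete, here's a minimal sketch of how a filter like "error_count < 2" can silently exhaust one category of movies while another still works. The table and column names here are hypothetical (the real SC schema is PHP/MySQL and surely differs); the point is just the mechanism: once every flowing movie has accumulated two or more loading errors, the query legitimately returns nothing.

```python
import sqlite3

# Hypothetical, simplified schema - not the actual Stall Catchers database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER, kind TEXT, error_count INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (1, "flowing", 3),   # repeated loading errors
        (2, "flowing", 2),   # repeated loading errors
        (3, "stalled", 1),   # one error so far - still eligible
        (4, "stalled", 0),
    ],
)

def eligible(kind, max_errors=2):
    """Return ids of movies of this kind that pass the suppression filter."""
    rows = conn.execute(
        "SELECT id FROM movies WHERE kind = ? AND error_count < ? ORDER BY id",
        (kind, max_errors),
    ).fetchall()
    return [r[0] for r in rows]

print(eligible("flowing"))  # -> [] : every flowing movie is suppressed
print(eligible("stalled"))  # -> [3, 4] : stalled movies still served
```

Note that the query is behaving exactly as designed; the bug is upstream, in whatever keeps inflating error_count.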
Moreover, there seems to be some kind of strange feedback loop whereby movie errors that may be due to a temporary network hiccup get logged and then produce spurious "you have already annotated this movie" errors, which log yet another error. So it looks like there is a logic bug in the code that perpetuates this vicious circle. I will continue to hunt for that bug, but in the meantime, disabling the filter has resolved the blue bar reversal and broken the vicious circle that was generating lots of error reports for users.
Bottom line - from a catcher's standpoint, the user experience should be much better. However, related issues could arise as we get close to the end of this dataset. Additionally, in the next week, we are going to move all our servers to a new host and new architecture - so things could get worse again before they get better. On the bright side, in our new server home we should rarely (if ever) have any slowdowns, as it will be an "elastic" system that can stretch to accommodate any number of catchers without getting bogged down.
Change is always challenging, but we hope you understand why we are moving in these directions and appreciate so much your patience and loyalty. Our immediate goal is to be running glitch-free and ready for 100K concurrent users by April 13, and I think we are on track for that. Hope to see you at the Megathon! http://megathon.us