Hello,
In the past couple days I seem to be getting the same movies repeatedly. I know I’ve gotten one movie at least three times because of comments that I’ve added. Is this intentional or a problem with the game? Just thought I’d check in case it is a problem.
Thanks,
Annette
Hi Annette -
We have completed the most recent dataset. Yay!!! You are correct that we are now repeating the annotations of previous vessels. You will see that the numbers of “Flowing” and “Stalled” votes are up in the 30s, 40s, and 50s. This indicates that we are on the “default” set of vessels that keeps the game running smoothly without the “Out of Movies” error. I know Pietro and the team are working to get a new dataset uploaded ASAP. They are trying to make this happen automatically, but as of now it is a manual process. They are aware that the supercatchers have run out of new vessels to annotate.
Thanks, Annette & Guy. I was just looking this morning and wondering where all the new movies had gone. I was seeing flows with 80 votes. That algorithm must be really well tuned now to complete the dataset so soon. Impressive.
Hi Mike - I don’t think the algorithm changed its stopping point, but rather it was just a smaller dataset than the previous one. Each dataset is unique and the number of vessels to be annotated will vary.
Thanks Guy, for letting me know what is going on! I guess I will take a break now and wait for the new movies to be loaded.
Dear Super-Catchers,
(and thanks, Guy, for alerting us to the need for more information on this!)
I apologize for the radio silence and lack of new data. I’d like to explain, as best I can, what is going on with our lack of data and how we intend to resolve it.
-
We have a commitment from the Cornell Lab to provide weekly data installments. However, the people responsible for providing the data have multiple obligations in the lab, and in this particular case, those obligations caused a delay for the current installment. Combined with that, the previous installment was (as Guy observed) a smaller than usual dataset. So basically, we had a perfect storm for the current data drought. By way of update, @mh973 tells us that the new dataset will be uploaded tomorrow (11/21/17).
-
In our current Stall Catchers data pipeline, there is an issue. Once all vessels have been annotated at least once, new vessels are assigned at random from the full set. What this means is that if a super catcher has annotated 1/4 of the vessels and the rest of the community has annotated the other 3/4, then as we resample the full set of vessels, that super catcher will see familiar vessels 1/4 of the time. We know how to fix this, but getting the design and implementation just right, in a way that doesn’t bias the research results, takes time. The good news is that we finally hashed out a design, and now it is just a matter of moving it into the implementation (software coding) stage.
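To make the repeat-rate arithmetic above concrete, here is a small simulation. It is purely illustrative and not the actual Stall Catchers code: the vessel count, the 1/4 split, and the “filter out already-seen vessels” fix are all assumptions for the sketch.

```python
import random

# Hypothetical numbers: 1000 vessels, of which a "super catcher"
# has already annotated a quarter.
all_vessels = list(range(1000))
seen_by_supercatcher = set(all_vessels[:250])

# Current behavior: once every vessel has at least one annotation,
# the next vessel is drawn uniformly at random from the full set,
# so roughly 1/4 of this player's draws are repeats.
draws = [random.choice(all_vessels) for _ in range(10_000)]
repeat_rate = sum(v in seen_by_supercatcher for v in draws) / len(draws)
print(f"repeat rate with uniform resampling: {repeat_rate:.2f}")

# One possible fix (an assumption, not the deployed design):
# draw only from vessels this player has not yet seen.
unseen = [v for v in all_vessels if v not in seen_by_supercatcher]
draws_fixed = [random.choice(unseen) for _ in range(10_000)]
repeat_rate_fixed = (
    sum(v in seen_by_supercatcher for v in draws_fixed) / len(draws_fixed)
)
print(f"repeat rate when filtering seen vessels: {repeat_rate_fixed:.2f}")
```

The subtlety Pietro alludes to is that per-player filtering changes which players see which vessels, so the actual design has to confirm this does not bias the aggregated research results.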
-
On the Cornell Lab side, there is another kind of pipeline in the works that will make it faster and easier to generate new datasets for Stall Catchers (SC). This is held up by two things at the moment: 1) Cornell is awaiting a new validation study that resolves some data agreement issues that came up recently. They reasonably want to be absolutely confident about the quality of data that emerges from the SC+Lab analysis, and so we are taking a new approach that will provide additional checks and balances. 2) The people involved in streamlining the Lab’s data pipeline have had other obligations, but those are ending and they will be available to develop and implement this new pipeline. Once it is in place and the current validation study is complete, the Lab will provide an influx of new datasets, primarily evaluating prospective treatments, at a much more rapid pace. In the meantime, we are nearing completion of the high fat diet study data, and will have a new crowd-based result to report about the effect of a high fat diet on the incidence of capillary stalls, which (depending on what we find) might implicate such a diet as a cardiovascular risk factor in Alzheimer’s disease. We will of course report these results back to you.
In the meantime, we will do a better job of keeping you posted on the timing of new datasets and hope you will forgive the lack of information. Thanks for your ongoing patience and, as always, we are grateful for your candid and frequent feedback.
My best,
Pietro
UPDATE: there are approximately 10,000 new movies in the database and 10,000 more on the way by tomorrow.