Is the presentation of the movies randomized for each participant? I assume it must be, because if there were a particular pattern to the presentation, that would create an artifact in the results. I was wondering because sometimes it feels like there is some kind of pattern: often there seem to be several grainy movies in a row, followed by several movies of a different resolution. Since flowing blood vessels are more common than stalled ones, I would think that if someone just responded "flowing" to all the movies, they would still do better than chance. Has anyone ever tried different response patterns, or rules for responding, to see whether there is in fact some pattern to the movie presentation, or whether the order is truly random?
Hi Mike, my name is Lindsay. I am from the lab that produces the video data, and I recently joined the EyesonAlz team. Excellent question. The order of video presentation is indeed completely random. You are not the first to think you are seeing a pattern; this kind of grouping is a fascinating feature of our brains, which try to categorize everything. A good example is how most music shuffle features aren't truly random but rather calculated, because people impose their own patterns on genuine randomness, making it seem patterned. The equipment used to acquire the data is constantly being tuned, so imaging sessions may vary in quality and graininess, but that is not an indicator of flowing vs. stalled. Questions like this are extremely important in the emerging field of citizen science, and we take many precautions to ensure that the crowd remains a "blinded" analyst so that personal bias cannot work its way into the system. Side note: the algorithms used to evaluate count accuracy account for the possibility of someone guessing "not flowing" for every video, so even if someone guessed randomly on a large number of videos, it could not degrade the quality of the data produced! Thank you for your work on this analysis. Coming from the biomedical lab, I can enthusiastically confirm that everyone's help on Stall Catchers is very much needed to advance the treatment research.
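To illustrate why constant guessing can beat raw chance yet still be harmless to the data, here is a toy sketch in Python. The numbers (an 80/20 flowing-to-stalled split, a random presentation order) are hypothetical and this is not the actual Stall Catchers scoring code; it just shows how a sensitivity-style check exposes a strategy that raw accuracy alone would reward.

```python
import random

random.seed(0)

# Hypothetical calibration set: most vessels flow, a minority are stalled.
truth = ["flowing"] * 80 + ["stalled"] * 20
random.shuffle(truth)  # presentation order is randomized

def accuracy(answers, truth):
    return sum(a == t for a, t in zip(answers, truth)) / len(truth)

# A guesser who always answers "flowing" beats 50% raw accuracy...
always_flowing = ["flowing"] * len(truth)
print(accuracy(always_flowing, truth))  # 0.8 here, by construction

# ...but checking their hit rate on the stalled movies alone exposes
# the strategy: they never catch a single real stall.
hits = sum(a == "stalled" for a, t in zip(always_flowing, truth) if t == "stalled")
stall_total = sum(t == "stalled" for t in truth)
print(hits / stall_total)  # 0.0 -> such answers carry no weight
```

A scoring scheme that looks at hit rate and false-alarm rate separately, rather than overall accuracy, is standard in signal detection and is the kind of precaution described above.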
Sometimes I can guess whether or not a blood vessel has a stall just by judging from the resolution, without even moving the slider. You say that it should not be possible to make such a judgment based on the resolution alone, but I do so very often, and very often I am correct. Movies with a very grainy resolution almost always have a stall in them for some reason, and the same seems true of movies with a very light resolution. You're saying that it only appears that movies with a certain resolution contain stalled vessels, and that's not really the case? If that is indeed true, then how is it possible that I can judge whether or not a blood vessel will have a stall in it without even moving the slider?
Now, sometimes making judgments based on resolution does fail; I'm certainly not always correct when I do it. Checking every single movie by moving the slider results in higher accuracy, but judging by resolution lets me proceed much more quickly. If you don't want people making judgments based on the resolution of the movie without even moving the slider, then maybe that's an issue that needs to be discussed, because I bet a lot of other people are doing the same thing. I'm kind of embarrassed to admit publicly that this is what I'm doing, but in the interest of improving your research, you need to know that it is in fact possible.
Hi @MikeLandau -
As players like yourself complete a large number of annotations, you will start to see repeats of the calibration movies. These are the ones where you are scored as "Correct" or "Not Correct". The calibration movies are there to help you maintain alertness and to measure your sensitivity in spotting stalls in the real vessels when they are given to you for analysis. It is true that some of these calibration movies have distinctive resolutions and become recognizable. However, the real vessels we are scoring come in a variety of resolutions that sometimes match the calibration movies, so you can't always predict whether you are seeing a new vessel movie or a calibration. While guessing on the calibrations without using your slider might work for some of those very grainy or light-colored movies, it does not help advance our Alzheimer's research. Guessing the answers on real movies with the same grainy, light-colored images will ultimately require all of us to annotate more movies in order to reach consensus as a community. Guessing doesn't hurt the eventual outcome of the science, but it doesn't help us either, and it makes more work for everybody.
Secondly, improving your accuracy and being "Correct" all the time will improve your sensitivity score, which means you can earn 116 points on every movie you annotate, receive much larger bonuses on the calibration films, and receive even higher points when you "Redeem". A higher sensitivity score also means your vote as to whether a vessel is Stalled or Flowing will receive more weight than the votes of players with lower sensitivity scores.
As citizen scientists, we need to take our gaming seriously and help to cure Alzheimer's, as opposed to just going fast and earning points. Please take your time and use the slider to determine whether a vessel is Stalled or Flowing, even if you think you remember it. See if you can join the ranks of the other top players with a high sensitivity score. The Alzheimer's research will advance even faster as a result.
As a friend and fellow Stall Catcher,
@pietro Please feel free to comment.
Okay, sorry! I kind of thought that the whole guessing thing was not a good idea, but I would like to point out that this is a major flaw with the whole "game" concept. The problem with calling Stall Catchers a game is that it's not really a game; it just has the form of a game. When you have a leaderboard and ranks, it does encourage people to play it like a game. I certainly understand that people guessing the answers is tantamount to subjects in a psychological experiment doing something other than what they're supposed to be doing, which of course would lead to useless results, as you point out above.
Another thought did occur to me. I have noticed that sometimes the same movie is presented more than once. If a movie is presented more than once to participants, doesn’t that raise the possibility that the participants are simply remembering the answers from the movies that they saw previously, instead of actually doing the signal detection task the way they are supposed to be doing it? In other words, does presenting the same movies more than once, or even several times introduce a confounding variable into the research design?
Great question @MikeLandau -
You are correct, it is possible to see the same real vessel movie more than once. However, this is much less frequent than the repeating calibration movies. Given the random selection of movies from the dataset, a repeat of a real vessel movie might only occur every week or two. Occasionally you might see one again the next day, but this is less likely. Guarding against player bias when the same person sees a movie more than once is part of the stopping algorithm, and part of the reason we need 20 or so community votes for each vessel. The newly announced dynamic stopping algorithm, which will reduce the total number of community votes required based on player experience, will also take this potential for player bias into consideration.
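A dynamic stopping rule like the one mentioned above can be sketched in a few lines. This is purely illustrative: the margin, the cap of 20 votes, and the experience weights are all made-up numbers, not the actual Stall Catchers algorithm.

```python
# Toy dynamic-stopping sketch (illustrative only; not the real algorithm).
# Each vote is (answer, weight), where weight stands in for player experience.
def decide_or_continue(votes, margin=3.0, max_votes=20):
    """Return a consensus label once the weighted margin is large enough,
    or None if more votes are still needed."""
    stalled = sum(w for a, w in votes if a == "stalled")
    flowing = sum(w for a, w in votes if a == "flowing")
    if abs(stalled - flowing) >= margin or len(votes) >= max_votes:
        return "stalled" if stalled > flowing else "flowing"
    return None  # keep collecting votes

# Three experienced players (weight 1.5) agreeing can settle a movie early,
# where three novices (weight 0.5) would not.
experienced = [("flowing", 1.5)] * 3
print(decide_or_continue(experienced))  # -> "flowing" (margin 4.5 >= 3.0)
novices = [("flowing", 0.5)] * 3
print(decide_or_continue(novices))      # -> None (margin 1.5, still open)
```

The point of such a rule is exactly what the reply describes: when more reliable answers arrive, the vessel can be retired with fewer total votes, saving volunteer time.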
Personally, I have found that I can’t necessarily remember how I voted the last time, since I still occasionally miss some of the calibration movies that I have seen numerous times. Use your growing experience and you will be fine. Over time you may even change your mind on how you vote based on similar calibration movies.
One of the things I get frustrated with is that sometimes I see spots that look like stalls, but they’re not, so I guess that must mean that they didn’t stay in one place long enough to be considered a stall. Is there a certain number of frames that a spot must be stationary before it is considered a stall?
As far as the movies repeating is concerned, that’s also quite frustrating because it seems like I keep on getting the wrong answer even though I’ve seen a movie several times, so I guess, at least in my case, in those instances, seeing the movie more than once, doesn’t help that much. I’m still not exactly clear about why movies are presented to the same person more than once. If a person scores a blood vessel as flowing during one trial, and stalled during another, don’t those two annotations basically cancel each other out?
As always, your questions home in on key concepts.
Please see below…
The white that you see is the fluid part of the blood, and the dark gaps are red blood cells. If there is a gap in the vessel that remains present throughout ALL frames, then it is usually believed to be a red blood cell that isn’t moving - so the vessel is deemed to be stalled. Of course, when multiple vessels are present within an outline, part of the challenge is determining which vessel is the target one. If a gap exists in the target vessel and an overlapping vessel comes into view at the gap point, that could create the mistaken impression that the vessel is flowing.
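The rule described above (a gap that persists through every frame indicates a stall) can be stated as a one-line check. This is a simplification for illustration: real annotation is about judging noisy video, not clean booleans, and tracking the same spot across frames is the hard part the code glosses over.

```python
# Sketch of the rule above: a vessel looks "stalled" only if a gap
# (a dark, non-moving segment) is visible at the same spot in every frame.
# Each entry answers: "was the gap seen in this frame?"
def looks_stalled(gap_seen_per_frame):
    return all(gap_seen_per_frame)

print(looks_stalled([True, True, True]))         # gap persists -> True
print(looks_stalled([True, False, True, True]))  # gap clears/moves -> False
```

The overlapping-vessel confusion mentioned above corresponds to a frame being wrongly marked False because a different vessel crossed the gap point.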
We have video examples of tricky vessels with expert narration to explain how they should be annotated. We simply haven’t had time to properly edit these videos, but intend to post them as soon as we can.
When Stall Catchers is operating normally, you would not see the same “real” vessel movie more than once. We are currently collecting extra data on the existing dataset to support our validation of new methods for combining answers from multiple people. While this does not directly accelerate the biomedical research, it potentially improves the efficiency of Stall Catchers which, multiplied by many future datasets, ultimately makes the best use of volunteer time.
For the purpose of answering the biomedical research question, we count only the first instance a movie is viewed. But for the purpose of validating our consensus algorithm, we consider all answers.
Also, the cognitive model of the perceiver that underlies some of our methods assumes that anytime a person is shown the same vessel they will see it differently, and there is some probability distribution governing an answer of “flowing” vs “stalled” based on these varying percepts. Given your academic background, I thought you might be interested to know that.
Why would the model predict that any time a person is shown the same blood vessel they will see it differently? Is that because our visual system has not evolved to recognize stalls in blood vessels? It seems like in these movies there are not many visual features to latch onto. We never see blood vessels, and stalls in our daily lives, so it would make sense that we are not particularly evolved to be very good at this task.
I was just wondering, is there any way to modify the images so that the stalls are more salient to the perceiver? Even though it would probably be more expensive, has anyone ever tried filming the movies in color? Alternatively, I know that sometimes in medical procedures dyes have been used to make certain features of the blood vessel more salient. Has that technique ever been used with these movies? There seems to be a lot of “noise” in these images. A lot of the movies have very strong luminescence, and even some of them appear to have some kind of an electromagnetic static pattern running through them. Is there any way to reduce some of these features?
In some ways, the task of identifying stalls reminds me of the old psychological school of structuralism, in which trained observers engaged in introspection to examine the contents of their own consciousness and then come to some kind of consensus. It strikes me that when one observer sees a stall in a blood vessel, and another observer thinks he sees that same stall, we still have no way of knowing whether the two observers are having the same visual experience. Visual experience, like all perception, is private, so we never really know whether what you see and what I see are the same thing. I guess when a large group of people can come to some kind of consensus, we can say with some degree of certainty that we are having the same visual experience. However, that brings me to the question of what we do with the blood vessels for which the crowd cannot reach a consensus on whether the vessel is flowing or stalled. Just like for the structuralists of old, if we can't come to a consensus about what we are seeing, that would seem to pose a real problem.
Good question! It is a general model of recognition, not specific to Stall Catchers or to this task. It derives from psychophysics research showing that if you show the same person exactly the same thing 100 times, they will never see it exactly the same way, due to noise and other influences in our perceptual system. This model underlies our crowd science approach both for interpreting individual answers and for combining answers to the same movie from many different people.
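The percept-noise idea described above can be made concrete with a toy simulation. This is a generic signal-detection-style sketch, not the project's actual model: the signal strength, noise level, and decision criterion are all hypothetical.

```python
import random

random.seed(42)

# Toy percept-noise model: each time a person views the same vessel, their
# internal evidence is the true signal plus fresh perceptual noise, so
# repeated viewings can yield different answers with a stable probability.
def answer_once(true_signal, noise_sd=1.0, criterion=0.5):
    percept = true_signal + random.gauss(0.0, noise_sd)
    return "stalled" if percept > criterion else "flowing"

# The same stalled vessel (signal = 1.0) shown 1000 times to one observer:
answers = [answer_once(1.0) for _ in range(1000)]
print(answers.count("stalled") / 1000)  # roughly 0.69, never exactly 1.0
```

This is why the model predicts that a person shown the same vessel 100 times will not give the same answer 100 times, and why each answer is best treated as a draw from a probability distribution rather than a fixed fact.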
Some of the movies are certainly overexposed, and we have discussed ways to better regulate movie quality in future datasets. In those cases (when no clear gaps are visible), paying attention to whether the texture is clearly moving in one direction can be useful in making a flowing/stalled determination. We have also considered adding "tuning knobs" to allow participants to adjust image quality parameters such as brightness and contrast. Another popular request in this vein (no pun intended) is a zoom capability. These features are all part of our ongoing discussion, and the ongoing challenge for us is how best to prioritize feature development.
This is exactly the kind of question that human computation (crowdsourcing science) research seeks to address. Though we call methods related to answering these questions “consensus algorithms”, in some sense it is a misnomer, as consensus implies agreement among all voters. In reality, it is more like a “quorum algorithm” - seeking a majority vote. But the approach is more involved because each participant has a natural bias toward answering flowing or stalled, and each participant demonstrates a degree of sensitivity in discriminating between flowing and stalled. So we do our best to factor in these individual differences. And because humans are involved, we cannot guarantee perfect accuracy, but through validation studies, we can guarantee that a certain accuracy is achieved with a certain likelihood, and those guarantees are sufficient to support the research requirements for data quality.
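One common way to factor in per-player bias and sensitivity, in the spirit described above, is to convert each vote into log-odds evidence using that player's estimated hit rate and false-alarm rate. This is a generic sketch (all rates here are invented), not the project's actual consensus algorithm.

```python
import math

# Toy weighted-consensus sketch. Each player is summarized by:
#   hit_rate         - P(says "stalled" | vessel truly stalled)  (sensitivity)
#   false_alarm_rate - P(says "stalled" | vessel truly flowing)  (bias)
def log_odds_evidence(vote, hit_rate, false_alarm_rate):
    if vote == "stalled":
        return math.log(hit_rate / false_alarm_rate)
    return math.log((1 - hit_rate) / (1 - false_alarm_rate))

def consensus(votes):
    """votes: list of (vote, hit_rate, false_alarm_rate).
    Positive total log-odds favors 'stalled'."""
    total = sum(log_odds_evidence(v, h, f) for v, h, f in votes)
    return "stalled" if total > 0 else "flowing"

# One sharp observer's "stalled" outweighs two near-chance "flowing" votes:
votes = [("stalled", 0.95, 0.05),   # high sensitivity, low bias
         ("flowing", 0.55, 0.45),   # near-chance player
         ("flowing", 0.55, 0.45)]
print(consensus(votes))  # -> "stalled"
```

Under a scheme like this, a simple head count would call the example vessel "flowing" two votes to one, but weighting by individual sensitivity and bias reverses the outcome, which is why it is closer to a "quorum" of evidence than a literal consensus.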
Thanks for the great dialog, as always, Michael.