With @MikeLandau’s consent, I am sharing some excellent questions he asked me via email, and will chase down answers for them shortly:
Some of those movies have an extremely large number of negative comments on them. How many comments does a movie have to get before you consider taking it down? Also, why do some incorrect responses only move the blue bar down a little bit, while others move it extremely far down? Sometimes when I get one wrong, the bar barely moves, but with other movies it drops a lot. What is the difference between those movies?
Another thing I’ve always wondered is what your criterion is for perfect performance. I know that you have an expert who makes judgments on the movies as to whether or not the vessel has a stall in it. I’m just curious: why is this person considered to be an expert? I understand that he has years of study in this field, but it would seem to me that, for the purposes of this task, a person should be considered an expert according to the degree to which he can demonstrate 100% sensitivity on the signal detection task. For example, I know there are a few people who can probably keep the blue bar at the top for longer than I can. I would say they are better at playing the game than I am. Can your expert perform to the same criterion as the people who play the game better than I do? I can’t come up with their names at the moment, but I see them on the leaderboard almost every day. If he cannot, then I think you would have to admit that they are in fact more of an expert than he is, and I’m being totally serious! There are a few movies with negative comments on them from the top players that you still haven’t taken down, and I’m wondering why. Just curious. On the other hand, if your expert can keep the blue bar at the top all the time, then of course he is a true expert, and you can pretty much disregard what I’ve said above. I was just wondering what your criterion is for expert performance.
I was also thinking it would be nice if people could actually see whether other players have the blue bar all the way to the top. Right now, all people can see is the total number of points accumulated for the day, yesterday, or the week. It would be nice if people could also see which players currently have 100% sensitivity. The total number of points is a little deceptive: if someone redeems an incredibly large number of saved-up points, it shoots them right to the top of the leaderboard and makes it appear as though they’ve done a lot of work that day, when in fact they may not have done any work on that particular day at all.
Caught, and will attempt to answer @MikeLandau’s characteristically incisive questions …
We use the “bad movie” and “bad feedback” flags as ways to find movies that should be considered for removal. Recently (IIRC), one of our vessel experts went through the user comments on calibration movies as another approach and selected some movies for removal. We have not actually removed those movies yet, but plan to.
How far the bar drops depends on whether your incorrect response was a false negative (a missed stall) or a false positive (reporting a stall that isn’t there), as well as on your recent history of false positives and false negatives. In general, a false negative will have a greater impact on your sensitivity bar than a false positive.
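To make that concrete, here is a minimal sketch of how an asymmetric update like this might look. The penalty values, the history weighting, and the function itself are illustrative assumptions for discussion, not the actual Stall Catchers scoring code:

```python
# Illustrative sketch only -- the real penalty weights are not public.
# Assumes sensitivity is tracked as a value in [0, 1] (the blue bar).

def update_sensitivity(sensitivity, missed_stall, recent_error_count):
    """Apply an asymmetric penalty for an incorrect calibration answer.

    missed_stall: True for a false negative (calling a stalled vessel
        "flowing"), False for a false positive (calling a flowing vessel
        "stalled").
    recent_error_count: number of errors in the player's recent history.
    """
    # Assumed asymmetry: a missed stall costs more than a false alarm.
    base_penalty = 0.10 if missed_stall else 0.04
    # Assumed history effect: a run of recent errors amplifies the drop.
    penalty = base_penalty * (1 + 0.5 * recent_error_count)
    return max(0.0, sensitivity - penalty)
```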
I think this question, like the others, is apt! But the logic might be circular: in order for the expert to have a perfect bar, the expert would need only agree with herself, whether right or wrong. The more crowd-generated data we obtain and compare to the laboratory-based annotations, the more we realize how subjective the answers can be, especially for very challenging and possibly indeterminate vessels. There are cases of crowd-expert disagreement we’ve examined closely where the expert was determined to be correct (by a group of scientists) based on aspects of the vessel movie that would be virtually impossible for a non-expert to consider. For example, if it is impossible to tell by looking at the designated vessel itself, the adjoining flow context can give clues: if a vessel that shares a junction with the designated vessel has one end flowing and one end stalled, then the designated vessel must be flowing, because the blood has to go somewhere (sketched in code after the list below). Observing this disagreement between the crowd and experts, and even among experts, leads to a few considerations:
we need to provide advanced annotation courses to volunteers who have achieved some level of expertise (perhaps as measured by sensitivity).
we might want to let our high-sensitivity catchers provide a second level of vetting for the tricky vessels.
we need to reconsider how the crowd-generated data is being used in the lab - not as a final answer on flowing/stalled, but as a way to greatly reduce the set of vessels that need to be examined by experts.
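As promised above, here is a small sketch of that flow-conservation inference. The data representation is made up for illustration and is not code from our pipeline:

```python
# Sketch of the junction reasoning described above (hypothetical types).

def infer_from_junction(neighbor_vessels):
    """Infer the designated vessel's state from vessels sharing its junction.

    neighbor_vessels: list of (end_a, end_b) state pairs, each state being
        "flowing" or "stalled", for the vessels that meet the designated
        vessel at a junction.
    """
    for end_a, end_b in neighbor_vessels:
        # A neighbor flowing at one end and stalled at the other must be
        # draining through the shared junction -- the blood has to go
        # somewhere -- so the designated vessel must be flowing.
        if {end_a, end_b} == {"flowing", "stalled"}:
            return "flowing"
    return "indeterminate"  # the context gives no conclusive clue
```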
We have certainly considered creating a sensitivity-based leaderboard (or something like that), and will take your comment as further support for that idea!
Thank you very much for your replies. One other thought I had: have you ever considered that there might be a social desirability effect going on when the crowd annotates movies they’ve seen before? I think for a lot of people, once they get a blood vessel wrong a few times, they start to remember the “correct” response they think you are looking for. So, for example, they really think the blood vessel is flowing, but they say it is stalled, because they know that’s the response that will get them the points; everyone starts saying “stalled” when in fact they think the vessel is flowing. I think this might be an important issue, because it could totally throw off the decision the crowd comes to. You might believe they now think the vessel is stalled, when in fact they have just decided to say that to get the points and get the question right. Is this something you consider when you are analyzing the results?
As for the question of whether the expert can achieve a perfect score: a couple of hours after I wrote and posted it, it did occur to me that it was a ridiculous question. Yes, if she sets the criterion, then of course there’s no way she can be incorrect, because she is, in effect, creating the test and then taking her own test, so of course she will always get the correct answer. A more appropriate question would be: who decides what the correct answer is? If it’s only one person, then I guess that could be a problem. However, if you’re using a group of experts to make the decision, then it’s like having an “expert” crowd. Yes, I agree that the judgment of the novice crowd is limited, because they can only make judgments based on what they see, and as is often said, there is often more to something than meets the eye. I don’t really know how one would impart such knowledge to people who are not subject matter experts, but I guess that is something you are working on.
When I come across a movie that I know the answer to, I choose the answer that will give me the points. If I disagree with the community or experts, I then flag the video for the possible wrong answer, and leave a comment stating my disagreement.
@pietro,
I don’t know if I agree with the part that says “people could actually see whether other players have the blue bar all the way to the top”. Mike has a point about the redeemed points shooting someone to the top of the leaderboard, but so what? I think doing something like that will open the door for people to be bullied for just redeeming points instead of actually catching stalls. I “catch” for an hour or two a night, and if I have any points to redeem, I do it at the end of my time. Yes, the redeemed points have added to my score, and once or twice I have been “shot” to number 1. But overall, I put the time in to do the work. I just have a bad feeling about showing people’s abilities like that. There are folks on here who started with a very low score and worked hard to make it go up. I see it every day when I play.
I guess what I am trying to say is, there are enough things in this world that people are judged on; this shouldn’t be one of them. Sorry for the ranting, and thanks for listening.
Sincerely,
Carol
This is excellent feedback - thank you. I think you are bringing a very valid perspective to this discussion and consideration of features like this.
Fortunately, we haven’t seen any bullying in this community, which I think tends to select for people who are doing the right things for the right reasons. But I do think sometimes people can feel pressure that works against their desire to play a game, so maybe giving people control over whether or not their stats are shared on leaderboards would help with that. What do you think?
I don’t think anyone would be criticized for redeeming their points, unless they were able to redeem those points during a competition in one of the Catchathons. If someone could redeem points during a competition, it could be viewed as cheating. In future competitions, the redeem feature should be temporarily locked (if that isn’t already the case), so that people know they are on a level playing field.
I think you are right about people feeling pressured to do more or work harder to achieve a higher score. Believe me when I say I get really bummed out when I get one “wrong” and my sensitivity points get crushed. But… I plug along and work twice as hard to bring that rating back up.
I happen to like playing this game, as hidden-object games are my favorite type to play. Once I understood what I was supposed to be looking for, this turned into an easy, laid-back kind of game. I am not competing against every player on here, but in a sense I am. Yes, the goal is to get to number 1, but that takes work. Putting my sensitivity score up for all to see would not be to my liking. I play for my own enjoyment and to help out with the research. I get a sense of fulfillment knowing I am helping. (Just call me a worker bee.)
I think if a person wants others to see their blue bar, then they can post it. I don’t know if it’s for bragging rights, empowerment, or what. But I shall remain in the back 40 where no one can see me!
Sincerely,
Carol
I’d just mention that for top players like you, Carol, who spend a lot of time in the top-ten listing, anyone who cares can simply keep an eye on the scores to ascertain your current sensitivity. If your score is incrementing by 116 or multiples thereof, we know your blue bar is maxed out. (On that basis, I’ve been wondering if Guy ever misses!)
I agree with you that posting sensitivity scores is of questionable value, and might be a disincentive for players, especially less experienced ones, who might be ‘sensitive’ (sorry) to that exposure. Good luck with this latest batch!
Mike
Thanks for the feedback, everybody - this is very helpful. Myself, I was thinking more about a “hall of fame” of catchers who have ever reached perfect sensitivity. We could update it as soon as someone reaches that milestone for the first time, no one would ever be removed from that board, and it probably wouldn’t even be on the VM, so you would have to go to the “Leaderboards” page, e.g., to see it (maybe). (When we add more types of “achievements” in the game, there could be halls of fame for other things too.)
The general thinking in the field of incentives and feedback is that public achievements should be available to all players and achievable by all players: they should be designed so that everyone has a chance of reaching the goal. These would include things like our “Levels”, since the more you play, the more points you earn and the higher the level you achieve. Perhaps there could be a celebration or achievement for playing the game on 10, 25, 50, and 100 days. Most of these reward and celebrate the participation of a player, not their skill.
The next category of achievements is skill-based: how fast we can annotate, or what our average sensitivity is over time. These can be detrimental and have unintended consequences for some people. If they are going to be tracked, made available, or celebrated in the game, they should be optional for players - give a button that allows a player to “hide” their stats from other players.
Competitions, on the other hand, should be voluntary, and yes, you have winners and losers. However, if the stakes are low, then the effects of losing are minimal, and winners gain some short-term bragging rights. We already have the choice of whether to join a team or a competition, and team competitions help all individuals participate without feeling singled out. Most of us are here for the long haul - for the science and research, not for the points or notoriety - and these competitions are not taken too seriously.
Specific to the “Perfect Sensitivity” Hall of Fame: whether a person has ever achieved perfect sensitivity sits somewhere between the first and second category. It does not indicate a player’s ongoing ability or potential - we all miss calibrations from time to time (even @caprarom, I think) and our sensitivity is affected. I think it is an achievable milestone for all players, and one we could celebrate for everyone. Not having achieved it yet is not a reflection of a person’s skill so much as of their time playing the game.
Just be careful with other achievements or Halls of Fame. Keep competitions voluntary, skill levels optionally private, and celebrate mutually achievable milestones.
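To make those categories concrete, here is one hypothetical way they could be encoded. Every name and field below is just a sketch for the planning discussion, not a spec:

```python
# Hypothetical encoding of the three achievement categories described above.
from dataclasses import dataclass

@dataclass
class Achievement:
    name: str
    category: str              # "participation", "skill", or "competition"
    optionally_private: bool   # skill stats should be hideable by the player
    voluntary: bool            # competitions should be opt-in

participation_milestones = [
    Achievement(f"Played on {d} days", "participation",
                optionally_private=False, voluntary=False)
    for d in (10, 25, 50, 100)
]
skill_example = Achievement("Perfect sensitivity", "skill",
                            optionally_private=True, voluntary=False)
```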
Definitely more than two cents in here!!! Just a quick note that I captured all of this in our achievements/badges team thread, as this is one of the topics in active planning. This is extremely useful, @gcalkins - thank you!
What do you mean 3-16 points? Do you mean on the blue bar itself or the points you then get for the movies?
In any case, the sensitivity bar is designed to guard against random answers: if a bot gets hold of a Stall Catchers account, or a user starts answering incorrectly a lot for any other reason, the bar will drop, and we will not include their answers in the crowd answer (or will assign them lower confidence) to protect the data.
The reason there’s a range is that it depends on whether you get a flowing vessel or a stalled one wrong.
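For what it’s worth, here is a rough sketch of what that sensitivity-weighted aggregation could look like. The cutoff value, the weighting scheme, and the function are assumptions for illustration, not our actual analysis pipeline:

```python
# Illustrative sketch of down-weighting or excluding low-sensitivity answers.

SENSITIVITY_CUTOFF = 0.5  # assumed floor below which answers are excluded

def crowd_answer(votes):
    """votes: list of (says_stalled: bool, sensitivity: float in [0, 1])."""
    stalled = flowing = 0.0
    for says_stalled, sensitivity in votes:
        if sensitivity < SENSITIVITY_CUTOFF:
            continue  # e.g., a bot or a run of bad answers: protect the data
        # Weight each remaining answer by the answerer's sensitivity.
        if says_stalled:
            stalled += sensitivity
        else:
            flowing += sensitivity
    return "stalled" if stalled > flowing else "flowing"
```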
Hi Egle,
Sorry to be so random about the questions. I have had a lot going on the last couple of weeks. Ok, I will try to do this better this time.
About the email you sent me asking if I would be interested in joining some group on here (I can’t remember the name of it, or find the email): it needed more info about me and a pic to put with the name, but the web address you listed wouldn’t let me in to see what it was all about.
That explains a lot about the problem I have been having when I make an incorrect choice. I just couldn’t figure out why sometimes I am “zapped” with only a few points (3-5) taken away, and then other times, wow, it hits me for 10-14 points.
Thank you for the reply and explaining it to me. Again, any help will be greatly appreciated.
It sure did help and back to catching I go!
Carol
Carol, regarding that second item, minor vs. precipitous drops in your skill level and resultant annotation scores, my experience suggests another variable that would certainly apply to you. That variable is how long you have been maxed out on your skill level. If you’ve been maxed out (and active) for a long time, when you finally miss a calibration, you might only drop to 114 or 115 per annotation vs. the 116 max. In fact, you might even stay at 116. This has happened to me a few times. However, if you then miss a second calibration (maybe because the first miss shook you a bit), then you’ll get ‘zapped’ big time. If you just recently maxed out your skill bar, and then miss a calibration, you’ll get a harsher penalty. Hope that helps.
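If it helps to see it spelled out as a toy model, here’s my guess at the pattern; every number below comes from my own experience playing, not the actual scoring rules:

```python
# A toy model of the "grace period" I described -- all numbers are guesses.

MAX_POINTS = 116  # per-annotation points with the skill bar maxed out

def points_after_misses(days_maxed_out, consecutive_misses):
    if consecutive_misses == 0:
        return MAX_POINTS
    if consecutive_misses == 1 and days_maxed_out > 30:
        # A long, active streak at the top softens the first miss (~114-116).
        return MAX_POINTS - 2
    # A second miss, or a miss soon after maxing out, bites much harder.
    return MAX_POINTS - 14
```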
Mike C.
Hi Mike,
Thank you for explaining it in a different way. Yes, I totally get it now! Doesn’t mean I have to like it but I understand the workings of it a lot better. Also means I have to stop and call it a night when the old peepers start to blur!
Thank you for your help.
Carol