We need your help cleaning the data!

seplute · April 6, 2017, 7:09pm

We have a new dataset in Stall Catchers (yay!), but this one is a bit wanky - we had to use a new algorithm to generate outlines so that we could get more data ready for Stall Catchers faster, but this machine work is certainly not perfect*! But - no surprises here - we humans can fix it.

If you’re seeing weird movies - e.g. a missing outline, an outline that does not align with a vessel, or an outline that jumps to a new spot in the same movie - please flag them and we will remove them ASAP! This will make it easier for the next person.

THANK YOU!!

*The second part of our project in fact deals with making this process better & faster by combining the best of humans & machines - like we do on Stall Catchers. Still a bit of a way to go though!

caprarom · April 6, 2017, 7:16pm

Among the other “wankyness,” it seems like movies I’ve already flagged and left comments for reappear again later lacking the comments and without indication they’ve already been evaluated at least once. It’s possible I’m just seeing very similar movies that look the same, but thought I should mention it as a possible issue.

pietro · April 6, 2017, 8:07pm

Thanks for the heads-up! We are looking into it now…

Best,
Pietro

pietro · April 6, 2017, 9:44pm

Ok - mystery solved! You did indeed see duplicates today: 7 of them, to be exact. However, they were calibration movies (not from the new dataset). We have about 1000 calibration movies in the system. Soon we will expand this to include many more, which will reduce the likelihood of seeing a duplicate.

The probabilities for this kind of thing tend to go against our intuition. For example, when asked how many people must be in a room for there to be more than a 50/50 chance of two people having the same birthday, most people would calculate 365 possible birthdays divided by 2, or about 183 people. Strangely, the correct answer is only 23 people.

The same goes for seeing duplicates of calibration vessels. It is much more likely than one might think to see duplicates even when drawing from a pool of 1000 vessels.
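For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation. It assumes every draw is uniformly random and independent, which may not match how Stall Catchers actually serves calibration movies; the function names are made up for illustration.

```python
def prob_duplicate(pool_size: int, draws: int) -> float:
    """Chance that at least two of `draws` uniform random picks coincide."""
    if draws > pool_size:
        return 1.0  # more picks than items: a repeat is guaranteed
    p_all_distinct = 1.0
    for k in range(draws):
        # The (k+1)-th pick must avoid the k items already seen.
        p_all_distinct *= (pool_size - k) / pool_size
    return 1.0 - p_all_distinct


def draws_for_even_odds(pool_size: int) -> int:
    """Smallest number of picks with a better-than-50% chance of a repeat."""
    draws = 1
    while prob_duplicate(pool_size, draws) <= 0.5:
        draws += 1
    return draws


if __name__ == "__main__":
    # 365 = birthdays; 1000 = approximate calibration pool size from the post.
    for pool in (365, 1000):
        n = draws_for_even_odds(pool)
        print(f"pool of {pool}: {n} picks, P(repeat) = {prob_duplicate(pool, n):.3f}")
```

Under these assumptions, a pool of 365 crosses even odds at 23 picks and a pool of 1000 at 38 picks, so seeing a repeated calibration movie within a single session is not surprising.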

Thanks again for letting us know. We always appreciate feedback when something doesn’t seem quite right, especially after a major update!

All the best,
Pietro

caprarom · April 6, 2017, 10:34pm

Pietro, I’m quite used to seeing duplicates on the calibrations every session - seven does not surprise me. The movies in question were not calibrations, so they must have just been very similar. They were mostly movies where the green outline was largely or wholly outside the frame making it impossible to properly evaluate them. I would click the “flag” movie box, and leave a comment like, “Outline is outside the frame.” There seem to be a bunch of those, a lot in the extreme bottom right corner. Regards, Mike

pietro · April 7, 2017, 7:03am

Hi Mike,

Thank you very much for following up. This additional information you provided really does paint a different picture.

Given your latest comments I think there are two factors at play. 1) This dataset includes images of the same animal taken on different days. Because most stalls last only a few seconds, the same vessel might be stalled on one day and flowing on the next. But the outline for that vessel might appear in the same place on both days. 2) The method for generating outlines for this dataset is slightly different than usual and has a higher error rate (approximately 1%) than previous datasets, leading to issues like the outline being outside of the frame. You did exactly the right thing by flagging it. We are systematically removing these bad movies as they are flagged, as quickly as we are able.

Thanks again,
Pietro

gcalkins · April 15, 2017, 4:32pm

Seeing several movies that have been flagged by other users for bad outlines, outlines not on the image, etc. that have not yet been removed from the system. Not sure how quickly these are being removed from the dataset, but some of them were commented on last week. Just a heads up that these are slowing us down.

pietro · April 15, 2017, 11:56pm

Dear @gcalkins - you are absolutely right! Thank you for mentioning it.

We have a higher proportion than normal of bad outlines and bad movies in the current dataset. This is a side-effect of using automated methods that have made it possible to bring this database to Stall Catchers much sooner than would otherwise have been possible. So even though we have to deal with some bad outlines, it gives us a shortcut to getting the research done faster without compromising our analysis. But it is indeed a nuisance!

To address this, we are running a mini-crowdsourcing effort among our team members and some Cornell undergraduate volunteers to examine and remove the flagged vessels (I removed 20 of them myself yesterday!). I am hoping that by the end of the week, at least 90% of them will be gone.

Thanks for your patience and for alerting us. We will do our best to clean this up quickly!

Best wishes,
Pietro

gcalkins · April 19, 2017, 10:57pm

Thanks @pietro,
Hopefully you can get your undergraduates ramped up soon in their crowdsourcing effort to start removing the flagged items. I have marked hundreds of bad outlines myself and have seen that many marked by others. It really is annoying to everyone to see those that have already been flagged by others. The sooner the better.
Regards,
gcalkins

pietro · April 21, 2017, 4:42pm

Hi @gcalkins,

Thank you for keeping tabs on this!

Thanks to Stall Catchers’ excellent flagging of bad movies and the concerted efforts of Mohammad, his team of checkers, and @ieva, 2200 bad movies have been removed.

At this point, all movies in the dataset have been seen at least a few times by different Catchers, so most bad movies have been flagged. It is conceivable that a few may continue to lurk, so please let us know if you continue to encounter bad movies, and certainly flag anything that looks like a bad outline.

Thank you again for your patience. We will do our best to avoid this issue with future datasets.

Best wishes,
Pietro

LotteryDiscountz · April 22, 2017, 4:52pm

Newbie question here, how does the flagging work? I check the flag box when seeing an outline issue, but I’m still presented with the choice to assess stall vs flow. Do I just do the best I can, submit as normal and a report alert goes through to y’all then? (I was expecting, after checking the flag box, to be taken to some different reporting dialogue.) Thanks for advice!

seplute · April 22, 2017, 11:31pm

Hello!

Yes - exactly - just do the best you can. In cases where the outline is obviously wrong, and there is nothing else to do with it other than flag the movie, just check the box and click flowing. Since we check all flagged vessels and remove the impossible ones, the answer really doesn’t matter.

In other cases, though, it might seem like the movie is of bad quality or otherwise difficult to analyse, but you can still manage to see whether it is flowing or stalled, and your answer (even if you feel unsure) is very valuable to us. (Then again - sometimes we have no choice but to make do with lower-quality data anyway.)

Thanks very much for asking, and have fun stall catching!
Egle

LotteryDiscountz · April 23, 2017, 8:17pm

Thanks for the encouragement! So, a follow-up question: if I see an outline that has, say, 2 halves that capture 2 different vessels, and I report it, do you redraw the outline to make further analysis easier, or will y’all just leave it and hope that people will assess both, or maybe the more prominent one? If it will be a “we’ll just make do” case, then it’s probably not worth my reporting those, I would guess?

blog · April 25, 2017, 3:41pm

Hi! Good question. Figuring out which vessel is the target vessel when more than one seems to be encompassed within an outline can be very tricky. This is one reason why we need human eyes looking at these!

In principle, each outline specifies a single vessel that connects two other vessels - one at each end. Sometimes a single vessel can be twisty and also move up and down (causing it to disappear and reappear as you move the slider). In some cases, this can make a single vessel appear to be two vessels.

When someone flags a vessel movie as “bad”, an expert will look at it and decide either to delete it (if it is truly a mistake) or leave it (if it is ok, but just confusing). We will not redraw outlines and try again.

Does that answer your question?

Thanks,
Pietro

LotteryDiscountz · April 25, 2017, 8:19pm

All clear, thanks Pietro!

gcalkins · May 2, 2017, 6:45pm

Players are flagging bad outlines, but none are being removed from the movies being viewed. Please remove these additional bad outlines. It is a waste of time for all of us.

I quote the first entry above:

“If you’re seeing weird movies - e.g. missing outline, an outline that does not align with a vessel, or an outline that jumps to a new spot in the same movie - please flag them and we will remove them ASAP! This will make it easier for the next person”

I was hoping this would have been done as we progressed. I have personally marked several hundred that fall in this category. I am seeing many that have been flagged by other players.

Thanks
@gcalkins

seplute · May 3, 2017, 10:34pm

Dear @gcalkins,

Thank you for keeping an eye on this! We have reiterated this issue to our collaborators at Cornell & asked them to make sure they check and remove the bad movies. It seems they have been removing a few hundred a week.

I believe this is going slower than initially expected (and there are more bad movies than expected!), and all movies still need to be double-checked & manually removed (and they only have a couple of people working on it - which is the whole reason we need Stall Catchers in the end, really).

In any case, they seem to be on top of it as much as they can be, and hopefully you will be seeing fewer and fewer bad movies now.

Also, hopefully we will soon have better methods for generating movies faster without so much unclean data! (This issue has been really frustrating for us too.)

Thank you again, and our greatest apologies to you and all the catchers who have encountered these repeatedly - hopefully we can avoid anything like that in the future!

Best,
Egle

nn62 · June 3, 2017, 5:48pm

I wanted to give you a bit more information about what is happening behind the scenes and why we are having difficulties with these vessels. The data for Stall Catchers is organized by capillary segment, where we define each capillary as going from one branch point to another. A rather challenging part of parceling out the data this way is making sure we get the outlines correct. This turns out to take a lot of time, as we currently have someone manually trace each vessel. In many cases, we have images of the same mouse taken at different times. The vessels don’t change very much from day to day, so in most cases, we can take the outline of the vessel from one imaging session and also use it for another imaging session. Because the mouse is not in exactly the same position in each imaging session (we do our best to put it in the same place under the microscope, but it is pretty difficult), the images can be rotated or tilted. We do additional image computations to try to fix the orientation of these vessels, but as you can see, we can’t quite get it right.

In addition to trying to catch up on addressing the flagged data, we are currently working on new algorithms based on machine learning to better identify these vessel segments. However, it is likely that we will continue to encounter some of these issues. I was thinking that it might be nice for advanced players to have an option to get credit for marking problem vessels.

gcalkins · June 5, 2017, 4:55pm

Thanks @nn62 for the added explanation.

I didn’t realize that you were trying to use the same outlines over several different imaging sessions. Have you thought about using some “tags” (implants under the skin) that would act as registration marks for the imaging? Three such tags would give you the necessary x,y,z coordinates to help assist your machine-based algorithms with the necessary 3-D orientation calibrations. Beyond that, a technician could identify a few key minutiae in the loops, whorls, or other distinctive vessels present within the overall imaging session, which would provide the same assistance to the algorithms.
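To make the registration-mark idea concrete, here is a rough sketch of how matched landmarks could be used to carry an outline from one session to the next. It is a simplified 2-D version (rotation plus shift only, no tilt or scaling) of the 3-D problem described above, it is not the method the Cornell team uses, and all function names and coordinates are hypothetical.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and shift mapping src landmarks onto dst.

    src and dst are equal-length lists of (x, y) pairs, matched by index.
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matched landmark pairs")
    # Centroids of both landmark sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    # Accumulate dot and cross products of the centered landmark pairs.
    dot = cross = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - sx, y - sy, u - dx, v - dy
        dot += x * u + y * v
        cross += x * v - y * u
    theta = math.atan2(cross, dot)  # angle that best aligns the two sets
    c, s = math.cos(theta), math.sin(theta)
    # Shift that carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


def apply_rigid_2d(points, theta, tx, ty):
    """Rotate points by theta about the origin, then shift by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


if __name__ == "__main__":
    # Hypothetical landmark positions in session 1 and session 2.
    day1_marks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    day2_marks = apply_rigid_2d(day1_marks, math.radians(5), 3.0, -1.5)
    theta, tx, ty = fit_rigid_2d(day1_marks, day2_marks)
    # Carry a session-1 vessel outline over to session-2 coordinates.
    outline_day1 = [(2.0, 2.0), (3.0, 2.5), (4.0, 3.5)]
    print(apply_rigid_2d(outline_day1, theta, tx, ty))
```

Real images would also need to handle tilt, depth, and tissue deformation, which this sketch ignores.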

Anything we, the players, can do to make things easier, we would be happy to help. Many times we can see the misalignment and the intended vessels but have no way to alert you beyond flagging the film and adding a comment. Perhaps a special flag could be added that denotes a simple “shifted outline”. These could be easily fixed by your techs and added back to the data set for community analysis.

I would think that the fixing process probably takes longer than simply making your expert call as to stalled or flowing. Perhaps these movies should just be removed from further community analysis. It makes no sense to have the community analyzing misaligned outlines and doesn’t make sense to fix them either.

Thanks,
@gcalkins