"This is all fabulous. Thanks for doing this. Some scattered thoughts:"

I've been using a combination of deviation, entropy, and # of shows rated, which seemed most closely aligned with the RYM system. Entropy is the metric that drives most of the weights, followed by deviation and then # of shows. Typically, anomalous raters are identified by more than one metric; for example, about half of all people flagged by deviation also have low (zero or near-zero) entropy. But your point is well taken.
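For readers curious what these metrics look like in code, here is a minimal sketch of an entropy- and deviation-based rater weight. The function names, thresholds, and the exact combination rule are invented for illustration; they are not the weights actually used on the site.

```python
from collections import Counter
import math

def rating_entropy(ratings):
    """Shannon entropy (bits) of a rater's 1-5 star distribution.
    A rater who only ever gives one value (e.g. all 1s) scores 0."""
    counts = Counter(ratings)
    n = len(ratings)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rater_weight(rated_shows, site_means, min_shows=10):
    """Illustrative weight combining entropy, mean absolute deviation
    from each show's site-wide mean, and number of shows rated.
    `rated_shows` is a list of (rating, show_id) pairs; the thresholds
    here are made up, not the values actually used."""
    if len(rated_shows) < min_shows:
        return 0.0
    entropy = rating_entropy([r for r, _ in rated_shows])
    deviation = sum(abs(r - site_means[s]) for r, s in rated_shows) / len(rated_shows)
    # One-note raters (low entropy) and extreme deviators get down-weighted.
    return min(1.0, entropy / 2.0) * max(0.0, 1.0 - deviation / 2.0)
```

Note how the two flags interact multiplicatively here: a rater with zero entropy gets zero weight no matter how many shows they've rated, which matches the observation that the metrics tend to identify the same people.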
1. Dropping raters with exceptionally high deviation scores will mechanically increase R² and decrease RMSE, so I would be careful to use those metrics only when comparing weighting schemes that treat high-deviators the same.
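The mechanical effect described here is easy to demonstrate with toy numbers: removing the raters who deviate most from the show means shrinks the residuals by construction, regardless of whether the weighting scheme is any good. A minimal sketch (all numbers invented):

```python
import math

def rmse(pairs):
    """Root-mean-square error between individual ratings and the
    show means those ratings are being compared against."""
    return math.sqrt(sum((r - m) ** 2 for r, m in pairs) / len(pairs))

# Toy data: (individual rating, show's mean rating); one extreme deviator.
all_raters = [(4.0, 4.2), (4.5, 4.2), (1.0, 4.2), (4.1, 4.2)]
kept = [(r, m) for r, m in all_raters if abs(r - m) < 2.0]

full_rmse = rmse(all_raters)   # dominated by the single deviator
trimmed_rmse = rmse(kept)      # much smaller, purely by construction
```

Any scheme that drops (or zero-weights) the deviator will look better on RMSE than one that keeps them, even if both schemes are otherwise identical, which is exactly the comparison hazard being flagged.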
"3. I would be curious to see a graph of avg rating by year, comparing the different weighting schemes. Which years win? Which lose? (Ideally extend back to 1.0)"

I haven't done this for every year, but here are a couple of things that aim in that direction. For all of the weighted averages (two of which are depicted below), the distribution remains left-skewed but is squished from the top. There's a bit more mass in each tail, and the overall mean rating (the mean across all shows) actually increases. This graph includes shows from all eras.
4. You've probably thought of this, but a middle ground option between doing nothing and doing full-on real-time adjusted weights would be to generate the weights at regular intervals, e.g. monthly. (...any new accounts created in between weighting updates would get an initial weight of zero or close to zero.)
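The middle-ground option can be sketched in a few lines: weights are regenerated only at the interval boundary, and accounts created since the last update start at zero. The account schema (dicts with `id`/`created` keys) and the interval logic are illustrative, not the site's actual implementation.

```python
from datetime import date

def weights_for_interval(accounts, weight_fn, last_update):
    """Regenerate weights at a fixed interval (e.g. monthly); accounts
    created after the last update start at a weight of zero."""
    return {
        a["id"]: 0.0 if a["created"] > last_update else weight_fn(a)
        for a in accounts
    }

accounts = [
    {"id": "longtime", "created": date(2019, 5, 1)},
    {"id": "brand_new", "created": date(2020, 3, 2)},
]
# With a March 1 cutoff, the brand-new account contributes nothing
# until the next scheduled weight regeneration.
weights = weights_for_interval(accounts, lambda a: 1.0, last_update=date(2020, 3, 1))
```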
"My previous research ( https://phish.net/blog/1539388704/setlists-and-show-ratings.html ) demonstrated that show ratings (with equal rater weights) are significantly correlated with setlist elements such as the amount of jamming in a show, the number and type of segues, the relative rarity of the setlist, narrative songs, and other factors."

The findings in question were derived from 3.0 shows only, from 3/6/09 through 12/31/17. We don't know whether the same elements would correlate as reliably with higher ratings for shows from 1983 to 2004. We also don't know whether those elements have continued to exert the same effects on ratings from 2018 to present.
If we believe we know the majority of setlist elements that typically correlate with a rating, would it make sense to list those elements for a user to rate individually when they go to rate a show? A formula could then crunch those numbers together to generate an "overall rating". Personally, I think I would stop and think twice about each category and probably be less impulsive with my rating after attending a show. Yes, it could drive some people away from rating due to the "survey-ness" of the approach, but at least you'd be able to understand and measure the impact of a rating based on those elements.
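The "formula crunching those numbers together" could be as simple as a weighted average of the element sub-ratings. In this sketch, the element names and weights are placeholders, not values derived from the research linked above:

```python
def overall_rating(element_scores, weights=None):
    """Combine per-element sub-ratings (each on the 1-5 scale) into one
    overall rating via a weighted average. Unweighted by default."""
    if weights is None:
        weights = {name: 1.0 for name in element_scores}
    total = sum(weights[name] * score for name, score in element_scores.items())
    return total / sum(weights.values())

# Hypothetical survey result for one show; "jamming" counted double
# purely to show how the weighting works.
score = overall_rating(
    {"jamming": 5, "segues": 4, "setlist_rarity": 3, "narrative_songs": 4},
    weights={"jamming": 2.0, "segues": 1.0, "setlist_rarity": 1.0, "narrative_songs": 1.0},
)
```

One advantage of this structure is exactly the one noted above: each component of a rating becomes visible and measurable, rather than being folded into a single impulsive number.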
"None of this explains why Saturday of the fest is still the 3rd best show of all time?"

Good shows sometimes take months to come back to reality. Unless you are suggesting that they delay ratings until days or weeks after a show, there's not really a way to combat it.
"None of this explains why Saturday of the fest is still the 3rd best show of all time?"

Whether it is that good or not, there's an entire segment of the fan base that's actually angry enough that the band could possibly play a show of such high quality today that they're resorting to burning down a silly rating system. It's awesome that the boys can still bring so much heat 40 years in that we have this kind of passion about it.
To me, a useful rating system should tell us (1) what the people who attended this particular show thought (i.e., the fun factor) and (2) what the people who regularly listen to and attend Phish concerts think about the quality of this show (i.e., replay quality).

An easily calculable split based on an existing dataset for user attendance and the current ratings, one that would be highly informative and require a few lines of code and maybe a table join? Pump it straight into my veins!!
Distinguishing ratings on those two dimensions would help a lot.
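The "few lines of code and a table join" really are about that short. Here is an illustrative sketch using an in-memory SQLite database; the table and column names are guesses at a plausible schema, not the site's actual one.

```python
import sqlite3

# Toy data: three ratings for one show, two of them from attendees.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ratings (user_id INTEGER, show_id INTEGER, rating REAL);
CREATE TABLE attendance (user_id INTEGER, show_id INTEGER);
INSERT INTO ratings VALUES (1, 100, 5.0), (2, 100, 4.0), (3, 100, 3.0);
INSERT INTO attendance VALUES (1, 100), (2, 100);
""")

# One LEFT JOIN splits the averages: attendees vs. everyone else.
attended_avg, replay_avg = db.execute("""
    SELECT
      AVG(CASE WHEN a.user_id IS NOT NULL THEN r.rating END),
      AVG(CASE WHEN a.user_id IS NULL THEN r.rating END)
    FROM ratings r
    LEFT JOIN attendance a
      ON a.user_id = r.user_id AND a.show_id = r.show_id
    WHERE r.show_id = 100
""").fetchone()
```

SQLite's `AVG` ignores NULLs, so the two `CASE` expressions cleanly partition the ratings into the "fun factor" average and the "replay quality" average in a single query.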
"None of this explains why Saturday of the fest is still the 3rd best show of all time?"

Not anymore! It took a big drop after this article came out. Coincidence??
Just keep IT the same please. Nobody's rating should outweigh anyone else's rating. That is not cool. I think y'all are thinking a bit too hard on this. Keep IT simple! You know, Phish's anthem...no need to overcomplicate any of this. Obviously, raters should only use one account to make ratings, but giving some people more rating power is simply unethical. That's like the super delegates at the DNC, not that they matter anymore since the Democrats have completely subverted the will of the people in this current "primary." Just let everyone have the same opportunity to vote and vote once.
"None of this explains why Saturday of the fest is still the 3rd best show of all time?"

Recency bias. The shows then settle into the overall scope of phish.
"Thank you, @Pauli, for this. Super interesting (even though I don't understand a significant percentage of it). The one factor that I didn't see discussed in these posts is the timing of ratings. It seems that a lot of the abuse of the system has happened all of a sudden - where a show is sitting at say 4.3, and then in the space of an hour, 50 one-star ratings come in that drop it to a 3.6. And from what I recall, they come in from 50 different accounts. Is there any way to take that into consideration as well as the general rating patterns of a user you've described in the previous posts?"

Agree with this perspective. If you have the rating timestamps I'd love to see a time series of a bunch of shows like 7-14-19 or even the recent Dick's run per this post.
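The burst pattern described in the quoted comment (50 ratings landing within an hour) is straightforward to detect from timestamps alone. A sliding-window sketch, with the threshold and window as illustrative parameters rather than tuned values:

```python
from datetime import datetime, timedelta

def rating_bursts(timestamps, window=timedelta(hours=1), threshold=50):
    """Return (start, end) spans where at least `threshold` ratings
    arrive within `window` of each other - e.g. 50 inside an hour."""
    ts = sorted(timestamps)
    bursts = []
    start = 0
    for end in range(len(ts)):
        # Slide the window forward until it spans at most `window`.
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            bursts.append((ts[start], ts[end]))
    return bursts

# Hypothetical example: three ratings in 20 minutes, then one much later.
base = datetime(2019, 7, 14, 23, 0)
times = [base + timedelta(minutes=m) for m in (0, 10, 20, 300)]
flagged = rating_bursts(times, threshold=3)
```

Flagged spans could then be cross-referenced with account age or the per-rater patterns described in the earlier posts, rather than discarded automatically.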
On another note, I like @Lysergic's idea above - keep the current 5 point scale but add percentile. That would keep everyone happy!
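For concreteness, one possible definition of that percentile (the fraction of shows rated at or below a given show) is a one-liner; this is an assumed definition, not the formula @Lysergic proposed:

```python
def percentile_rank(show_rating, all_show_ratings):
    """Percent of shows whose rating is at or below this show's rating -
    one plausible way to display a percentile next to the 5-point average."""
    at_or_below = sum(1 for r in all_show_ratings if r <= show_rating)
    return 100.0 * at_or_below / len(all_show_ratings)
```

A 4.3 show could then be shown as, say, "4.3 (87th percentile)", preserving the familiar scale while adding the differentiation people are asking for.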
Can someone who is so dead set on determining the exact right methodology to rank shows explain to me why it matters if night 3 of Mondegreen is 5th or if it's 25th or 50th? It's a show everyone should listen to regardless. And a good reminder why new phish fans should still be going to every show they can.
Who are the people who are like, "Oh thank god that rating was adjusted, I really felt like it was wrong"?
@Zeron said: "Thank you @paulj and all for a very interesting, nerdy and generally considerate discussion here. I'll echo a few points brought up by others, and provide an example for reference. ...since we are seriously considering a change to ratings, then we should also consider letting users have more than 5 options to differentiate shows."

This was brought up a couple of days ago in the Discussion Thread. You are not alone in this belief, and I'll certainly raise that as an issue in my memo to the Admins.
"Can someone who is so dead set on determining the exact right methodology to rank shows explain to me why it matters if night 3 of Mondegreen is 5th or if it's 25th or 50th?"

Not sure if this is aimed at me, but I think so.
The discussion thread can be found at This Link.