I’ve kinda found /r/ELI5 has turned into “explain something in a short paragraph” and has given up on analogies anyone can understand.
Let’s say you want to find out who the most popular kid in class is, so you start asking around. Five kids say they like Kevin, and three say they like Steve. But three kids can’t stand Kevin, and one hates Steve…so which one is more popular? Plus, there’s Stacy, who nobody really pays attention to except Margo, who says she’s her best friend.
So if you added up the scores like we do here, you would have 2 for Kevin, 2 for Steve, and 1 for Stacy. What if we did it by percent, then? Now Kevin has 62.5% approval, Steve has 75%, but Stacy has 100% - and only Margo likes her!
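If you want to check that arithmetic yourself, here’s a quick Python sketch. The vote counts come straight from the story above; the `votes` dictionary and the variable names are just made up for illustration.

```python
# Likes and dislikes for each kid, taken from the classroom story.
# The dictionary layout is an illustrative assumption.
votes = {
    "Kevin": (5, 3),
    "Steve": (3, 1),
    "Stacy": (1, 0),
}

# Net score: likes minus dislikes.
net_scores = {kid: likes - dislikes for kid, (likes, dislikes) in votes.items()}

# Approval: likes as a percentage of all votes cast for that kid.
approval = {
    kid: 100 * likes / (likes + dislikes)
    for kid, (likes, dislikes) in votes.items()
}

for kid in votes:
    print(f"{kid}: net {net_scores[kid]}, approval {approval[kid]:.1f}%")
```

Neither number settles it: Kevin and Steve tie on net score, and Stacy tops the percent ranking on a single vote.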
What Wilson scores do is take “confidence” into account, too. There’s some crazy-looking math going on, but you don’t have to work it out by hand. You just take into account how many votes there were on top of what percent liked a kid, and the more information you have on someone, the more sure you can be that the “real” share of classmates who like them is close to the percent you counted. When there are only a few votes, the score plays it safe and assumes the real share could be a lot lower.
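If you’re curious what the crazy-looking math looks like in code, here’s a minimal Python sketch of the lower end of the Wilson score interval, which is the version usually used for ranking. The function name `wilson_lower_bound`, the `z` parameter, and the two confidence levels tried here are my own choices for illustration, not necessarily what any particular site uses.

```python
import math

def wilson_lower_bound(likes, dislikes, z=1.96):
    """Lower end of the Wilson score interval for the share of likes.

    z is the normal quantile for the chosen confidence level
    (about 1.96 for 95%, about 1.28 for 80%); the default is an assumption.
    """
    n = likes + dislikes
    if n == 0:
        return 0.0  # no votes, so nothing to be confident about
    p = likes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

# Vote counts from the classroom story: (likes, dislikes).
classmates = {"Kevin": (5, 3), "Steve": (3, 1), "Stacy": (1, 0)}

for z in (1.2816, 1.96):  # roughly 80% and 95% confidence
    scores = {kid: wilson_lower_bound(l, d, z) for kid, (l, d) in classmates.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(f"z={z}: " + ", ".join(f"{kid} {scores[kid]:.3f}" for kid in ranking))
```

Stacy’s 100% from a single vote drops to the bottom either way. Kevin and Steve stay close: at the looser level Steve comes out ahead, and at the stricter one they’re nearly tied, which is about what you’d expect with so few votes.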
How sure is sure enough? Well, that’s called the confidence level in statistics, and you figure it out by deciding how unlikely a fluke you’re willing to put up with, and then taking the opposite: if you’re okay with being fooled 5% of the time, you’re working at 95% confidence. Like, you’re pretty sure Stacy isn’t popular since Margo’s the only one who pays attention to her, so who’s the least Stacy-ish kid in class? It’s gotta be Steve.