I wrote a while ago about the strong correlation between entry order for the prognostication quiz and success on the "predicted answer" statistic. In other words, people who entered the quiz first were better at predicting the most popular overall answers, while those who entered late were not as good. This looked like a significant trend.
Grant commented on my initial "statistics" post:
"I'd be interested to see if there are any time trends to the answers to individual questions. Events like the Celtics suddenly hitting the skids or the remarkable lack of Minnesotan snow for the first ten days of the year might account for some of the correlation."
I looked to see if these two questions seemed to show any pattern with respect to entry order, but nothing seemed obvious.
To try to get a handle on the factors that might underlie this phenomenon, I decided to use SciLab to teach myself principal component analysis (PCA), which can sort through a bunch of variables to see which linear combination of variables explains most of the variance in a key statistic ("predicted score" in my case).
Since I have only 43 entrants, I wanted to limit the number of variables, so I first tried 15 variables: one for entry order (very quantitative) and one for each question, with answers A-F turned into integers 1-6. (This is a huge fudge, since most of the answers are not ordered quantitative measures of anything. Some of my answers are somewhat quantitative, such as for questions 1, 12, 13, and 14, but most of the others are not, and will probably introduce artifacts.)
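The coding step described above can be sketched in a few lines of Python. This is only an illustration: the three entries are invented, and the names `encode_answer` and `build_row` are hypothetical helpers, not anything from the SciLab session.

```python
# Hypothetical entries: (entry order, answers to the 14 questions as letters A-F).
# These rows are made up purely for illustration; they are not the quiz data.
entries = [
    (1, "ABCDEFABCDEFAB"),
    (2, "BBCAEFACCDEFAA"),
    (3, "FACDBFABCDAFCB"),
]


def encode_answer(letter):
    """Map an answer letter A-F to an integer 1-6."""
    code = ord(letter.upper()) - ord("A") + 1
    if not 1 <= code <= 6:
        raise ValueError(f"unexpected answer: {letter!r}")
    return code


def build_row(entry_order, answers):
    """Build one row per entrant: entry order followed by 14 coded answers."""
    if len(answers) != 14:
        raise ValueError("expected 14 answers")
    return [entry_order] + [encode_answer(a) for a in answers]


# 15 columns per row: E, Q01, Q02, ..., Q14
data = [build_row(order, answers) for order, answers in entries]
for row in data:
    print(row)
```

Note that the integer codes impose an ordering (A < B < ... < F) that most of the questions do not really have, which is exactly the fudge mentioned above.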
Performing PCA provides a first principal component that explains 18% of the variance in "predicted score":
Desc | No. | PC1 (18%) |
Entry Order | E | 0.48 |
Baseball | Q11 | 0.42 |
Hockey | Q07 | 0.35 |
Stocks | Q13 | 0.35 |
Mushing | Q05 | 0.31 |
Hoops | Q08 | 0.26 |
Golf | Q09 | 0.21 |
Football | Q03 | 0.18 |
Writing | Q06 | 0.18 |
Storms | Q12 | 0.16 |
Film | Q04 | 0.13 |
Snow | Q14 | 0.11 |
Beauty | Q02 | 0.02 |
Peace | Q10 | 0.01 |
Freedom | Q01 | (0.12) |
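For anyone who wants to try something similar without SciLab, here is a minimal sketch of how a table of first-component loadings can be produced. It makes several assumptions that the post does not confirm: the variables are standardized first (so the PCA runs on the correlation matrix), the first component is found by power iteration rather than a library eigen-solver (chosen only to keep the sketch dependency-free), and the data are random placeholders, not the quiz answers. The fraction it reports is the share of total variance across the standardized input variables.

```python
import math
import random


def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1.
    Assumes no column is constant (that would divide by zero)."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        scaled_cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*scaled_cols)]


def correlation_matrix(z):
    """Correlation matrix of standardized rows z (n rows, p columns)."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_component(corr, iterations=1000):
    """Power iteration on the correlation matrix.
    Returns (loadings, fraction of total variance explained)."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient estimates the largest eigenvalue; the trace of a
    # correlation matrix is p, so eigenvalue / p is the explained fraction.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue / p


# Placeholder data: 43 entrants, entry order plus 14 random answers coded 1-6.
random.seed(0)
rows = [[order] + [random.randint(1, 6) for _ in range(14)]
        for order in range(1, 44)]
labels = ["E"] + [f"Q{i:02d}" for i in range(1, 15)]

loadings, explained = first_component(correlation_matrix(standardize(rows)))
for label, loading in sorted(zip(labels, loadings), key=lambda t: -abs(t[1])):
    print(f"{label}  {loading:+.2f}")
print(f"PC1 explains {explained:.0%} of the total variance")
```

The sign of a principal component is arbitrary, so only the relative signs and sizes of the loadings carry meaning. With random answers like these, no variable should be expected to dominate in any consistent way.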
As I predicted, the largest part of the first principal component was "entry order," confirming the correlation I showed earlier. I then examined the questions that contributed most to this component. What jumped out at me was that all six of the sports questions were in the top half, along with the stocks question. "What," I asked myself, "would explain different answers on the sports questions?" The answer was so obvious, I was embarrassed that it hadn't occurred to me earlier. Might there be a gender effect?
The 43 entrants are as evenly divided as possible, with 21 women and 22 men. The women, it turns out, were much prompter than the men. Eight of the first 12 entrants were women, while eight of the tardiest 12 entrants were men. Was that the key effect? If so, I would expect "sex" to be a stronger portion of the first principal component (i.e. greater than the 0.48 for "entry order" above). I coded female = 1 and male = 2 for "sex" and ran PCA on sex plus the 14 questions:
Desc | No. | PC1 (16%) |
Stocks | Q13 | 0.38 |
Sex | S | 0.37 |
Baseball | Q11 | 0.36 |
Golf | Q09 | 0.32 |
Hockey | Q07 | 0.32 |
Snow | Q14 | 0.31 |
Storms | Q12 | 0.31 |
Football | Q03 | 0.25 |
Film | Q04 | 0.24 |
Writing | Q06 | 0.12 |
Mushing | Q05 | 0.12 |
Hoops | Q08 | 0.07 |
Beauty | Q02 | 0.05 |
Peace | Q10 | 0.04 |
Freedom | Q01 | (0.19) |
So sex contributes just 0.37 to the first principal component, less than the 0.48 that entry order contributed in the first run. This suggests that even though the women submitted answers earlier than the men, the early-answering men are more similar to the women and the late-answering women are more similar to the men. Entry order is still the strongest correlation. Very odd.
Oh - I keep meaning to learn PCA analysis because it is used a lot in climate research for temperature reconstructions...
Could you say again the part where you said about the things?
Sorry to have lost you, Jack! Maybe Marcus can explain it to you, since he wants to learn it, and the best way to learn is to teach someone else!
At any rate, I liked some of the vadlo mouse cartoons!