
Unusual Correlation


I wrote a while ago of the strong correlation between entry order for the prognostication quiz and success on the "predicted answer" statistic. In other words, people who entered the quiz first were better at predicting the most popular overall answers, while those who entered late were not as good. It seemed like this was a significant trend.

Grant commented on my initial "statistics" post:

"I'd be interested to see if there are any time trends to the answers to individual questions. Events like the Celtics suddenly hitting the skids or the remarkable lack of Minnesotan snow for the first ten days of the year might account for some of the correlation."

I looked to see if these two questions seemed to show any pattern with respect to entry order, but nothing seemed obvious.

To try to get a handle on the factors that might be underlying this phenomenon, I decided to teach myself principal component analysis (PCA) using SciLab, which can sort through a bunch of variables to see which linear combination of variables explains most of the variance in a key statistic ("predicted score" in my case).

Since I only have 43 entrants, I wanted to limit the number of variables, so I tried first with 15 variables: one for entry order (very quantitative) and one for each question, where I turn answers A-F into integers 1-6. (This is a huge fudge since most of the answers are not orderly quantitative measures of anything. Some of my answers are somewhat quantitative, such as for questions 1, 12, 13, and 14, but most of the others are not, and will probably introduce artifacts.)
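For readers who want to try the same setup, here is a minimal sketch in Python with NumPy rather than SciLab. It rests on several assumptions: the answers are randomly generated stand-ins (not the real quiz entries), the variable names are made up for illustration, and the PCA is the textbook version run on the correlation matrix of the 15 input columns. That textbook version looks for the directions of greatest variance among the inputs themselves and does not use "predicted score", so it may differ from what I actually ran.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entrants, n_questions = 43, 14

# Synthetic stand-in data (assumption): NOT the real quiz entries.
entry_order = np.arange(1, n_entrants + 1)
letters = rng.choice(list("ABCDEF"), size=(n_entrants, n_questions))

# Turn answers A-F into integers 1-6, as described in the post.
letter_to_int = {ch: i for i, ch in enumerate("ABCDEF", start=1)}
answers = np.array([[letter_to_int[a] for a in row] for row in letters])

# 15 variables: entry order plus one column per question.
X = np.column_stack([entry_order, answers]).astype(float)

# PCA via eigendecomposition of the correlation matrix, so every
# variable is put on the same scale before comparison.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# eigh returns eigenvalues in ascending order; sort descending.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()  # fraction of total variance per component
pc1 = eigvecs[:, 0]                  # loadings of the first principal component

labels = ["Entry Order"] + [f"Q{q:02d}" for q in range(1, n_questions + 1)]
print(f"PC1 explains {explained[0]:.0%} of the total variance")
for name, loading in sorted(zip(labels, pc1), key=lambda t: -abs(t[1])):
    print(f"{name:12s} {loading:+.2f}")
```

With random answers the loadings are just noise, so the point here is only the mechanics: the letter-to-integer coding, the 43 x 15 data matrix, and reading off the first eigenvector as the loadings.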

Performing PCA provides a principal component that explains 18% of the variance in "predicted score":

Desc         No.   PC1 (18%)
Entry Order  E     0.48
Baseball     Q11   0.42
Hockey       Q07   0.35
Stocks       Q13   0.35
Mushing      Q05   0.31
Hoops        Q08   0.26
Golf         Q09   0.21
Football     Q03   0.18
Writing      Q06   0.18
Storms       Q12   0.16
Film         Q04   0.13
Snow         Q14   0.11
Beauty       Q02   0.02
Peace        Q10   0.01
Freedom      Q01   (0.12)


As I predicted, the largest part of the first principal component was "entry order," confirming the correlation I showed first. I then examined the questions that contributed most to this component. What jumped out to me was that all six of the sports questions were in the top half, along with the stocks question. "What," I asked myself, "would explain different answers on the sports questions?" The answer was so obvious, I was embarrassed that it hadn't occurred to me earlier. Might there be a gender effect?

The 43 entrants are as evenly divided as possible, with 21 women and 22 men. The women, it turns out, were much prompter than the men: eight of the first 12 entrants were women, while eight of the tardiest 12 entrants were men. Was that the key effect? If so, I would expect "sex" to be a stronger portion of the first principal component than "entry order" was (i.e. a loading greater than the 0.48 above). I coded female = 1 and male = 2 for "sex" and ran PCA on sex plus the 14 questions:

Desc         No.   PC1 (16%)
Stocks       Q13   0.38
Sex          S     0.37
Baseball     Q11   0.36
Golf         Q09   0.32
Hockey       Q07   0.32
Snow         Q14   0.31
Storms       Q12   0.31
Football     Q03   0.25
Film         Q04   0.24
Writing      Q06   0.12
Mushing      Q05   0.12
Hoops        Q08   0.07
Beauty       Q02   0.05
Peace        Q10   0.04
Freedom      Q01   (0.19)


So sex has a loading of just 0.37 on the first principal component, lower than the 0.48 that entry order had. That suggests that even though the women submitted answers earlier than the men, the early-answering men are more similar to the women and the late-answering women are more similar to the men. Entry order still shows the stronger association. Very odd.
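A note on reading these numbers: in PCA as it is usually defined, each component is a unit-length vector, so the squared loadings add up to about 1, and a squared loading can be read as that variable's share of the component. The sketch below checks this on the PC1 column of the first table. The values are copied from the post; treating the parenthesized Freedom entry as negative is my assumption (the sign does not affect the squares), and the dictionary name is just for illustration.

```python
# PC1 loadings copied from the first table in the post.
# Assumption: the parenthesized Freedom value is negative.
pc1_loadings = {
    "Entry Order": 0.48, "Baseball": 0.42, "Hockey": 0.35, "Stocks": 0.35,
    "Mushing": 0.31, "Hoops": 0.26, "Golf": 0.21, "Football": 0.18,
    "Writing": 0.18, "Storms": 0.16, "Film": 0.13, "Snow": 0.11,
    "Beauty": 0.02, "Peace": 0.01, "Freedom": -0.12,
}

# For a unit-length component the squared loadings should sum to ~1
# (small deviations come from the two-decimal rounding in the table).
sum_sq = sum(v * v for v in pc1_loadings.values())
print(f"sum of squared loadings: {sum_sq:.3f}")

# Each variable's share of the component is its squared loading
# divided by that sum.
shares = {name: v * v / sum_sq for name, v in pc1_loadings.items()}
for name, share in sorted(shares.items(), key=lambda t: -t[1])[:5]:
    print(f"{name:12s} {share:.1%}")
```

Read this way, a loading of 0.48 corresponds to somewhat under a quarter of the component rather than 48% of it, though the ranking of the variables is unchanged.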

Comments

  1. Oh - I keep meaning to learn PCA analysis because it is used a lot in climate research for temperature reconstructions...

  2. Could you say again the part where you said about the things?

  3. Sorry to have lost you, Jack! Maybe Marcus can explain it to you, since he wants to learn it, and the best way to learn is to teach someone else!

  4. Rachel, 4:11 PM

    At any rate, I liked some of the vadlo mouse cartoons!
