P308D - Categorical Data Analysis - Dale Berger
What is a negative correlation between gender and graduation date? - People who are higher on sex are lower on graduation date. Graduating earlier or later? Earlier.
So: people who are male are graduating earlier; females later: what does that imply if you look at the
(See the Google Sheet
-.149: Put that in a sentence: A: “On average, females graduated -.149 semesters later” A2: Probably better to say, “More females graduated later than males.” …
When we say, “Holding graduation date constant” or even better, “for men and women graduating in the same semester,” on average the difference in salary was 2253. Now, that’s a larger number than what we got when we ‘’didn’t’’ control for graduation date.
Q: So, why? A: There are a couple ways to do this. One is graphically:
This graph is simplified. (J: I don’t really get it:))
Suppression: the relationship between gender and salary was suppressed by the graduation date: people who graduated later tended to get greater salaries, but because proportionally more women graduated later, this suppressed the difference in salary between men and women.
How do you tell if there is suppression? A: You compare C and C’ (“cee” and “cee prime”) - and if C’ is bigger than C, then you have suppression.
So, in this situation we have mediation, and we also have suppression.
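A minimal sketch of the C versus C’ comparison, using made-up numbers (not the class data): salary is built as exactly 2000 for being male plus 1500 per semester of graduation date, and the women in this toy set graduate one semester later on average. The helper names `mean` and `sp` are just for illustration.

```python
# Hypothetical toy data, constructed so the slopes come out as round numbers.
male = [1, 1, 1, 0, 0, 0]
grad = [0, 1, 2, 1, 2, 3]  # semester of graduation; the women graduate later
salary = [2000 * m + 1500 * g for m, g in zip(male, grad)]


def mean(xs):
    return sum(xs) / len(xs)


def sp(xs, ys):
    """Sum of cross-products of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


# C: slope of salary on gender alone (no control for graduation date).
c = sp(male, salary) / sp(male, male)

# C': slope of salary on gender with graduation date also in the model
# (standard two-predictor least-squares formula).
num = sp(male, salary) * sp(grad, grad) - sp(grad, salary) * sp(male, grad)
den = sp(male, male) * sp(grad, grad) - sp(male, grad) ** 2
c_prime = num / den

suppression = abs(c_prime) > abs(c)
print(c, c_prime, suppression)
```

In this toy set C is 500 and C’ is 2000: the later graduation dates of the women hide most of the gender difference until graduation date is held constant.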
Q: Two-Tail tests. p. 42; - each table is .025 on bottom end, and .025 on top end.
Q: When doing the Wilcoxon T, how do you figure out which T value to choose from the possible ones? - A: The table only lists surprisingly small values: depending on how you sum ranks … the Wilcoxon test is based on ranking them in the order that gives you the smallest possible T value. The same concept applies to the Mann-Whitney “U” - you get multiple answers; you take the smaller one because the tables are constructed for one end of the distribution.
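A small sketch of the "take the smaller one" rule for the Wilcoxon T, assuming hypothetical paired differences with no ties and no zero differences (tie handling is left out to keep it short):

```python
# Hypothetical paired differences (after minus before); no ties, no zeros.
diffs = [3, -1, 4, -2, 6, 5]

# Rank the absolute differences from smallest (rank 1) to largest.
ordered = sorted(diffs, key=abs)
ranked = [(d, rank) for rank, d in enumerate(ordered, start=1)]

# Sum the ranks separately for positive and negative differences.
t_plus = sum(rank for d, rank in ranked if d > 0)
t_minus = sum(rank for d, rank in ranked if d < 0)

# The tables list only small values, so the smaller sum is the one looked up.
T = min(t_plus, t_minus)
print(t_plus, t_minus, T)
```

The two sums always add up to n(n+1)/2, so either one determines the other; the table just needs the smaller.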
Theoretically, you have the whole distribution. . .
Q: If you use the table on the left, you’re using α = .025.
This statistic is useful if you have two ordered variables, going from Low to High. They don’t need to be interval: all you need to do is order it; in the case of which one is bigger, and which one is smaller.
If you have people who are Temp; People who are hourly; exempt employees; profit-sharing; you could code these things as 1,2,3,4, but would it make sense to run a correlation looking for a linear relationship between them? Not really. But it would make sense at a job level, if you were able to put these in a specific order.
Take an attitude measure: Strongly Agree (1) <-- ? (2) --> Strongly Disagree (3) - there aren’t necessarily equal intervals here, but what we’re interested in is whether there is an ordinal trend, such as, “are people higher in the job level more likely to agree?” Example 2
In a 2x2 table like in Example 3 what test would you apply? Chi Square Test of Independence.
If you ran a χ^2 on this table, [https://docs.google.com/spreadsheets/d/1qc381NZ-FnCABig4oXXr4eEQh7TX2L3so1rrF_U0smc/edit#gid=0 Example 2] how many rows would it have? You can't really tell what the salary is: it's either really big or really small: and you have 6 degrees of freedom: (rows-1)*(columns-1) = (3-1)*(4-1) = 6.
Say someone is higher on 01; Example 2.1-2 If you take someone from 'a' and someone from ‘b' they can be concordant or discordant - (otherwise we don't count them?)
If I sample one person from two cells; I have two cells. We could see: are they a concordant pair, a discordant pair, or neither? In this case, they would be a concordant pair; because the person in cell ‘d’ is higher on BA and Salary than the person in cell ‘a’.
This tells us how many pairs we have: 100 for the first pick; 99 for the second; divide by 2 because the order of the two picks doesn’t matter, so it’s a combination: 100*99/2 = 4950 pairs. (J: It’s a permutation? Or did I get that mixed up?)
How many concordant pairs would there be there? 16*51 = 816.
Where:
To do this, you have to code consistently with some theory: In Example 2 anyone lower and further to the right would be concordant
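A sketch of the pair counting, under two assumptions: the table layout (rows = no BA / BA, columns = not on salary / on salary) is a guess, and the 9 is inferred as 100 - 16 - 24 - 51, not read from the class table. The function name `count_pairs` is hypothetical.

```python
# Assumed 2x2 layout: rows = (no BA, BA), columns = (not salary, salary).
# 16, 24 and 51 appear in the notes; 9 is inferred from a total of 100.
table = [[16, 24],
         [9, 51]]


def count_pairs(t):
    """Count concordant and discordant pairs in an ordered r x c table."""
    rows, cols = len(t), len(t[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):      # cells in a lower row
                for m in range(cols):
                    if m > j:                 # lower and to the right
                        concordant += t[i][j] * t[k][m]
                    elif m < j:               # lower and to the left
                        discordant += t[i][j] * t[k][m]
    return concordant, discordant


n = sum(sum(row) for row in table)
total_pairs = n * (n - 1) // 2                # 100 * 99 / 2
concordant, discordant = count_pairs(table)
print(total_pairs, concordant, discordant)
```

Pairs that share a row or a column are tied on one variable, so they count as neither concordant nor discordant.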
What is an Odds Ratio? If someone has no BA, what are the Odds that they’re on Salary?
The odds would be 24:16, which is 3:2 = 1.5. That’s not the proportion of people on Salary. . . the Odds ratio is a ratio of odds! (laughter from class)
Q: If the odds are “1”, what does that mean? A: Equally likely. Q: If the odds were the same for both groups, what would the ratio be? A: “1”
If you did this the other way, you would get the ratio 0.265.
Q: If you see an odds ratio of 10, another odds ratio of .1, which is bigger? A: They’re the same strength, just in opposite directions. (J: 1/(Odds Ratio 1) = Odds Ratio 2, here 1/10 = .1)
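A sketch of the odds and odds-ratio arithmetic. The counts 16, 24 and 51 come from the notes; the 9 for "BA, not on salary" is an assumption (100 - 16 - 24 - 51), chosen because it reproduces the 0.265 above. The variable names are made up.

```python
import math

# No BA: 16 not on salary, 24 on salary (from the notes).
# BA:    9 not on salary (inferred), 51 on salary.
no_ba_not_salary, no_ba_salary = 16, 24
ba_not_salary, ba_salary = 9, 51

odds_no_ba = no_ba_salary / no_ba_not_salary   # 24:16 = 1.5
odds_ba = ba_salary / ba_not_salary            # 51:9

odds_ratio = odds_ba / odds_no_ba              # BA relative to no BA
inverse_ratio = odds_no_ba / odds_ba           # the same table read the other way

print(round(odds_ratio, 3), round(inverse_ratio, 3))

# Odds ratios of 10 and .1 are equally far from 1 on the log scale.
same_strength = math.isclose(math.log(10), -math.log(0.1))
```

Taking logs is the usual way to see that 10 and .1 describe the same strength of relationship: the two logs differ only in sign.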
(J: Odds ratio Test of Independence. ?)
Q: If both rows of the table are 16 and 24, you have perfect independence; and you will get a χ^2 of zero. If you have perfect independence, you expect Bumble had something to do with your data. . . .
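A sketch of the χ^2 test of independence computed by hand, reading the perfect-independence case as a table whose two rows are both 16 and 24 (an interpretation of the notes). The second table reuses the inferred 9 for the missing cell, which is an assumption. `chi_square` is a hypothetical helper, not a library call.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


independent_stat, independent_df = chi_square([[16, 24], [16, 24]])
assumed_stat, assumed_df = chi_square([[16, 24], [9, 51]])
print(independent_stat, independent_df)
print(round(assumed_stat, 3), assumed_df)
```

With identical rows every observed count equals its expected count, so the statistic is exactly zero; the same function returns 6 degrees of freedom for any 3x4 table.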
Q: In 2x2 table, Odds Ratio is identical to test of independence - (J: is that true?)
Packet 6:
Assumptions of Linear Model:
Bottom line: Ordinary Least Square Regression is clearly inappropriate.
Now, the logistic model fitting the pattern to the data looks like a lazy s. [image: https://www.evernote.com/shard/s95/sh/f634acd0-a7d0-4ebe-9a0c-c919c213881e/31a50a753d8f6c7b4cdc26dfa6e3d794]
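A sketch of why the curve looks like a lazy S, with made-up coefficients (`b0`, `b1`, `a`, `b` are all hypothetical): the logistic function keeps predicted probabilities between 0 and 1, while a straight line fit to a 0/1 outcome can run past both ends.

```python
import math


def logistic(x, b0=-4.0, b1=0.8):
    """Predicted probability from a logistic model (hypothetical coefficients)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def linear(x, a=-0.1, b=0.12):
    """Straight-line prediction for comparison (hypothetical coefficients)."""
    return a + b * x


xs = list(range(0, 11))
logistic_preds = [logistic(x) for x in xs]
linear_preds = [linear(x) for x in xs]

for x, p, q in zip(xs, logistic_preds, linear_preds):
    print(x, round(p, 3), round(q, 2))
```

The logistic predictions flatten out near 0 and 1 and cross .5 where b0 + b1*x = 0 (x = 5 here), whereas the straight line gives a "probability" below 0 at x = 0 and above 1 at x = 10.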
(J: you’d better hope this sample doesn’t contain ))
If you have success Rate; Failure rate; Etc.,
Search for Logistic regression on Wikipedia.
* Binary Logistic Regression with SPSS