The closest I've come to the language of "measurement" before reading Williams, Lauer & Asher, Morgan, and Goubil-Gambrell was when I was rooming with a graduate student in cognitive psychology, who, after entering the dissertation phase of his work, would travel to Chicago schools and conduct what I now can label correlational studies of ADHD. I liked listening to him. The literature he was immersed in sounded like a foreign language. And although he had no great love for it, he was passionate about what it could describe.

I enjoy learning new languages. The grammar of measurement, however, is based on calculation, a muscle that I haven't trained in a long time and that probably wasn't all that big to begin with. So I apologize in advance if some of the questions listed below are ignorant of the obvious. First, some definitions.

Qualitative & Quantitative

Morgan is most helpful in distinguishing qualitative research and its subgenres of ethnography, case study, and description as best serving research designs "concerned with process and description" (27). Goubil-Gambrell adds that the method is particularly valuable in "identifying key variables" (584) and can be distinguished also by its lack of "treatment" (which, in its "administration," reminded me of the menu of treatments spas let you choose from) (588).

Since Goubil-Gambrell as well as the prompt for this blog constructed "qualitative" in opposition to "quantitative," I'm going to assume that Morgan's categories of "correlational studies" and "experimental studies" both make up the quantitative category, the former being more concerned with relationships, the latter with "outcomes or effects" (26). Williams prefers the terms "descriptive method" and "experimental method" but they seem, especially in reference to treatments and variables, to overlay "qualitative" and "quantitative" (9).

Validity & Reliability

These were the most difficult terms to parse. Lauer and Asher offer this: reliability "is the ability of independent observers or measurements to agree" (134); validity is the ability of a "measurement system...to measure whatever it is intended to assess" (140) (in these introductory surveys of terms, terms tend to repeat). Goubil-Gambrell isn't particularly helpful here, briefly defining validity as "whether the experiment actually measures what it says it will measure" and saying that reliability "refers to whether the experiment precisely measures a single dimension of human ability" (587). Since it is the most obscure, I wish I had read that definition first rather than last.

Still, after going through Lauer and Asher's handling of the two terms, I'm somewhat confident concluding that reliability often calibrates inward, among its own elements, both those of the study (the measurement instrument) and those of the studiers (interraters), checking whether they are equivalent and consistent, before turning outward to judge if the results can be repeated under the same conditions. Validity, on the other hand, focuses more on calibration of the result itself, how it relates to past and future studies, how well "the researcher measures what he claims to measure" (Williams 22).
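To make the "independent observers ... agree" part of reliability concrete for myself, here is a minimal sketch in Python. The two raters and their yes/no codes are made up for illustration, and Cohen's kappa is a standard agreement statistic I'm bringing in on my own; I'm not claiming it is the measure Lauer and Asher use.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items on which two raters gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for what the raters would match on by chance.

    Undefined (division by zero) when chance agreement is already 1.
    """
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's rate for a label, summed over labels
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two raters observing the same eight classroom sessions
rater_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rater_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]

print(percent_agreement(rater_1, rater_2))
print(cohens_kappa(rater_1, rater_2))
```

The gap between the two numbers is the part of the raters' agreement that chance alone could account for.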

Probability & Significance

Probability concerns the frequencies of (population, sample, or sampling) distributions, "generalized to cases where there are different total numbers of units involved," such as, to take an example from Williams, the probability of coming up "heads" after 64 coin tosses, a number based on the frequency of generating that result.
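Here is how I would sketch the coin-toss idea in Python. This is my own reading of the example, not Williams's calculation: I'm assuming a fair coin (p = 0.5) and asking how likely each possible count of heads is across 64 tosses, using the standard binomial formula.

```python
from math import comb

def prob_heads(k, n=64, p=0.5):
    """Probability of exactly k heads in n tosses, assuming P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The whole distribution: one probability for each possible count of heads
distribution = [prob_heads(k) for k in range(65)]

print(prob_heads(32))      # the single most likely count, an even split
print(prob_heads(45))      # a lopsided result is far less likely
print(sum(distribution))   # all the possibilities together add up to 1 (within rounding)
```

Even the most likely outcome, an even split, turns up only a minority of the time, which helped me see why the talk is of distributions rather than single results.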

Next step: probability becomes vital when it comes to the null hypothesis, which must be rejected to establish grounds for a research hypothesis. The probability level, then, is a "criterion for rejecting a null hypothesis" (61). If the study's measure comes in at or under the established probability level, the null hypothesis can be thrown away. (I'm assuming that the null is something everyone wants to avoid or get beyond, the research hypothesis being a kind of imprimatur. This could be very wrong.)

The level of this probability, this zone of "accept or reject," becomes the "significance level": "if a calculated value of probability is such that it falls within the rejection region, the researcher will often call whatever difference or relationship he is studying statistically significant" (Williams 61).
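A minimal sketch of that accept-or-reject decision, again in Python and again under my own assumptions: the null hypothesis is "the coin is fair," the test is two-sided, and the significance level is set at 0.05, a conventional choice rather than a number taken from Williams.

```python
from math import comb

ALPHA = 0.05  # assumed significance level

def p_value_two_sided(k, n=64):
    """Chance, under a fair coin, of a head count at least as far from n/2 as k."""
    distance = abs(k - n / 2)
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= distance)
    return extreme / 2**n

def is_significant(k, n=64, alpha=ALPHA):
    """True when the result falls in the rejection region for the null."""
    return p_value_two_sided(k, n) <= alpha

for heads in (32, 36, 45):
    verdict = "reject the null" if is_significant(heads) else "keep the null"
    print(heads, p_value_two_sided(heads), verdict)
```

With these assumptions, 36 heads out of 64 is well within what chance produces, while 45 heads lands in the rejection region and would be called statistically significant.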

I had a minor epiphany when reading this section of the text: the "significant" language of CNN polls and medical findings suddenly became clear.

Questions

Some confusion lingers. Some of this is just musing.

Lauer and Asher say "reliability is to a large degree a social construction" (134). Then they say that "validity depends in important ways on social consensus" (141). So are both social constructs?

Morgan asserts that "experimental designs are different [from correlative and descriptive designs] in another way: comparison" (37). So how do you do correlation without comparison?

Goubil-Gambrell tells us that "the reason for all the statistical apparatus in quantitative research is to explain that relationships between variables are due not to chance but to cause-and-effect relationships" (586). How does this square with Williams admitting that "our knowledge of the laws of chance" informs us about the "degree of variation" among samples (43)? This "knowledge" of sampling underpins probability, which governs the whole "null" or "research" decision, a huge point of accuracy and distinction. My question then: how much do you need to know about chance for it to become knowledge? And isn't that then a little, what's the word... chancy? To call chance predictable or consistent leaves me scratching my head, even though I know they list the "odds" on the back of a scratch ticket.

Aside 1: I love it when Morgan calls theory a "bin" (28). I take my bins to the recycling center every week, otherwise they start to stink. How big is your bin?

Aside 2 (and this isn't sarcastic): I find it fascinating how quantitative research generates particular language (see Williams 58, 67).

Lastly, Williams (23) and Lauer and Asher (145) claim that you can get to reliability through validity, but not vice versa. Again, I'm probably confusing something basic, but it seems to me that a valid result can be generated in an experiment, but it might not be based on reliable measurement tools.

I hope there has been some "truth value" to all of this.