Chapter 2

Let's recall what a measurement is

Now, is a statistic a measurement?  We sometimes hear the two terms used synonymously. Are people correct in doing so?

A statistic is not a measure; a statistic is a means of analyzing the scores.

Glass and Hopkins state that statistics describe characteristics of observational units.
In other words, a statistic is a method of analyzing a set of scores to enhance its interpretation.

We’re not going into depth about statistical processes, but we will cover the basic concepts needed.

There are two main categories of statistics: descriptive and inferential.
 Descriptive statistics are used to describe or summarize a data set.
They involve tabulating, depicting, and describing collections of data, either quantitative or qualitative.

Inferential statistics involve attempts to infer the properties of a large collection of data from inspection of a sample of observations.
 
The purpose of inferential statistics is to predict or estimate characteristics of a population from knowledge of the characteristics of only a sample of that population.  The descriptive characteristics of a sample can be generalized to the entire population, within a known margin of error.

In the school setting, we will generally be using descriptive statistics for three purposes.

1. to describe a set of test scores
2. to standardize test scores
3. to estimate the validity and reliability of the test.
 
Measurement scales

To interpret a score and analyze it accurately, we must know the underlying rules for the score.

These rules depend on the scale the score reflects.  There are four different scales used in statistics: nominal, ordinal, interval, and ratio.
 

Nominal measurement
(giving a name or names)  It is the process of grouping objects into classes so that all those in a single class are equivalent with respect to some attribute or property. In other words, objects are grouped into mutually exclusive categories.

 Numerals given for identification
 Gender

The idea in a nominal scale is that it uses only the distinctness property of numbers. In other words, 1 is distinct from 2 or 4.
 
 It does not mean that a person who is number 1 is any better than a person with the number 99.

 It just identifies one from the other. Examples: gender, race.

Ordinal measurement
It is possible when differing degrees or amounts of an attribute or property can be detected.

Used when we can rank a group of things from low to high on a specific characteristic.

e.g., ranking of teams in a league, ranking of players, rank ordering of states on different variables, a ranking of priorities, norms, social class, percentile.

Interval measurement
 More refined measurement than ordinal
 
Describes the magnitude of difference among things
 Meaningful order, and units of measure
  Year (A.D.)
  Temperature (F or C)

Interval measurement involves assigning numbers to objects in such a way that equal differences in the numbers correspond to equal differences in the amount of the attribute measured.

The difference between 50 and 60 degrees is equal to the difference between 90 and 100 degrees

With interval measurement the zero point of the scale is arbitrary and doesn’t correspond to absence of the attribute.

0 degrees Celsius or 0 degrees Fahrenheit does not mean there is an absence of heat or temperature.

You can transform interval data to ordinal data, but you cannot transform ordinal data to interval data.

Ratio
 Numbers represent equal units
 Absolute zero.
 Represent equal difference in the amounts of the attribute measured
Observations can be compared as ratios or percentages
 Examples: weight, distance, time, age

Same as interval but has an absolute zero, which allows us to compare and say that subject A has 2, 3, or x times as much of an attribute as subject B.

It allows us to make comparisons such as:
A person who can jump 40” can jump twice as high as a person who is able to jump 20”.

We cannot make ratio-type comparisons in attitude, achievement, personality, intelligence, or sociometric status, because there is no absolute zero point.  A person with an IQ of 140 does not have 2 times the intelligence of a person with an IQ of 70.
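
The same caution applies to any interval scale, such as temperature. The following is a minimal sketch in Python, not part of the original notes: the function name c_to_f and the example readings of 10 and 20 degrees are illustrative choices, and the 50/60 and 90/100 degree readings are assumed here to be Celsius. It shows that the ratio between two temperatures changes when the unit changes, while equal differences stay equal.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Illustrative readings (assumed values, chosen only for the demonstration).
warm_c, cool_c = 20.0, 10.0
warm_f, cool_f = c_to_f(warm_c), c_to_f(cool_c)

# The "ratio" depends on the unit, so it carries no real meaning.
ratio_c = warm_c / cool_c   # 20 / 10
ratio_f = warm_f / cool_f   # 68 / 50
print("ratio in Celsius:", ratio_c)
print("ratio in Fahrenheit:", ratio_f)

# Equal differences remain equal after the unit change.
diff_low = c_to_f(60) - c_to_f(50)
diff_high = c_to_f(100) - c_to_f(90)
print("differences in Fahrenheit:", diff_low, diff_high)
```

Because the ratio depends on where zero is placed, a statement such as "twice as hot" has no fixed meaning on an interval scale; only differences do.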
 

It is important that we understand these scales. Why? Because certain statistical techniques are for use only with certain types of data.
An example is the Spearman rank-difference correlation, which is for use only with ordinal data.
The properties of the measurement scale should reflect the underlying characteristics of the attribute being measured.
 We use the scales to describe the data as accurately as possible.

Discrete and continuous variables

First we must define what a variable is. A variable is a non-uniform characteristic of observational units (people, or the thing we are observing, such as a state, school district, school, or classroom).  Variables we often observe in people include age, height, IQ, and occupation.

A continuous variable is one that could, theoretically, be measured to a finer and finer degree, such as height in inches.  In other words, a fraction could be used to get the most exact measure possible.

A discrete variable is one that can take on only distinct points on a scale, or separated values, such as the number of students in a class.  There cannot be 1.5 or 1.75 students in a class; there can be 1 or 2.
 
 

Frequency distributions.

If we are given a sheet of paper with 100 scores randomly ordered on it, it would be impossible to understand the value of the scores.  We couldn’t tell if a score were high, low, or average.  Therefore we must come up with some way to organize a set of data into a usable form.  An easy way of making sense of a data set is to develop a frequency distribution.  There are two basic characteristics of such a distribution.

1. The test scores must have a meaningful order.  In other words, a score must differentiate the amount of the attribute.

2. The categories in the frequency distribution must be mutually exclusive.  In other words, a score can fit in only one category in the distribution.  Categories cannot overlap.

In most cases when we see a set of data we may think that it has a meaningful order.  But this is not always true. The numbers on a locker, the numbers assigned to participants in a race, or the number assigned to a football player have no real meaning. They are just a way of identifying one from the other.

Sometimes we have data that can be summarized in a table but is not a distribution. An example would be needing to know how much equipment, merchandise, etc. you have.  In this case a frequency table would be suitable.  A frequency table is important when the numbers do not represent a score.
 

Let’s work with a data set.

6,3,7,4,2,2,7,9,10,4,3,2,1,2,8

First let us rank order the scores.  This, as you can see, is useful. It allows us to see the highest and lowest scores, as well as where most of the scores fell, but it isn’t a frequency distribution.  It does have one of the characteristics, that of having a meaningful order, but it doesn’t have the second characteristic. Which is what?

1,2,2,2,2,3,3,4,4,6,7,7,8,9,10,5,6,5,3,1,1,7,4,5,7,3,2,1

Mutually exclusive categories.

To turn this into a frequency distribution we must take the following steps.
1. Rank order the test scores.
2. Tally each score in its appropriate category (tally).
3. Record the number of scores by summing the tallies in each interval (f).
4. Sum the frequencies from the bottom interval to the top (cf).
5. Convert the cumulative frequency to a cumulative percent by dividing the cf by N (the total number of scores), then multiplying the result by 100 (c%).
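
The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name frequency_distribution is hypothetical, the scores are assumed to be whole numbers, and every score value between the lowest and highest is listed even when its frequency is zero.

```python
from collections import Counter

# The 15 scores from the data set above.
scores = [6, 3, 7, 4, 2, 2, 7, 9, 10, 4, 3, 2, 1, 2, 8]


def frequency_distribution(scores):
    """Return (score, f, cf, c%) rows, highest score first."""
    tallies = Counter(scores)            # step 2: tally each score
    n = len(scores)
    rows = []
    cf = 0
    # Steps 1 and 4: walk the ordered score values from the bottom up.
    for score in range(min(scores), max(scores) + 1):
        f = tallies.get(score, 0)        # step 3: frequency (f)
        cf += f                          # step 4: cumulative frequency (cf)
        c_pct = round(cf / n * 100, 1)   # step 5: cumulative percent (c%)
        rows.append((score, f, cf, c_pct))
    return rows[::-1]                    # highest score at the top


table = frequency_distribution(scores)
print("score  f  cf  c%")
for score, f, cf, c_pct in table:
    print(f"{score:5d} {f:2d} {cf:3d} {c_pct:5.1f}")
```

The cf of the top row should always equal N, and its c% should be 100, which is a quick way to check the tallies.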
 
 
 

If we have a data set that needs an interval size larger than one, we follow these steps:
1. find the range
2. select the number of intervals (classes/categories)
3. define the score limits for the intervals
4. tally the observations into the intervals
5. count the tallies within each interval and express as a frequency.
 

Finding the range.

To find the range we must get the difference between the largest score or observation and the smallest score or observation.  The formula is as follows:

R = (highest score - lowest score) + 1.

Adding 1 allows us to reflect all possible scores.
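
As a quick check of the range formula, here is a small Python sketch using the data set from earlier in these notes. The function name inclusive_range and its unit parameter are assumptions made for illustration; unit defaults to 1, the unit of measurement for whole-number scores.

```python
# The 15 scores from the data set used earlier in the notes.
scores = [6, 3, 7, 4, 2, 2, 7, 9, 10, 4, 3, 2, 1, 2, 8]


def inclusive_range(scores, unit=1):
    """R = (highest score - lowest score) + one unit of measurement."""
    return (max(scores) - min(scores)) + unit


r = inclusive_range(scores)
print("R =", r)  # (10 - 1) + 1 = 10
```

The result, 10, matches the number of possible whole-number scores from 1 through 10.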
 

Select number of intervals or classes.
 
 
 
 
 
 
 

Define the score limits for the intervals.

To estimate the approximate width of each interval, divide the range by the desired number of intervals.  To make things more convenient we will round up or down to a round number.
Each interval of the frequency distribution should begin with a multiple of the class width. Do not begin with the lowest score.
 

As you will notice, this is not exact: there is a small area not included between the upper limit of one interval and the lower limit of the next.  To set the real limits, take half of one unit, add it to the upper limit, and subtract it from the lower limit. This gives us the real score limits.

120 - .5 = 119.5 and 139 + .5 = 139.5
140 - .5 = 139.5 and 159 + .5 = 159.5
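
The real-limit arithmetic above can be sketched in Python. The function name real_limits and the unit parameter are hypothetical names chosen for this illustration; unit is assumed to be 1 for whole-number scores.

```python
def real_limits(lower_score, upper_score, unit=1):
    """Return (lower real limit, upper real limit) for one interval."""
    half = unit / 2
    return lower_score - half, upper_score + half


first = real_limits(120, 139)
second = real_limits(140, 159)
print(first)   # (119.5, 139.5)
print(second)  # (139.5, 159.5)
```

Notice that the upper real limit of the first interval equals the lower real limit of the second, so no area is left uncovered between them.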

Tally the observations into the classes.
 Can make an x or use a box method.

Then follow the procedures for the 1 width interval.

Let’s make a frequency distribution from our data.
 

When we build a frequency distribution we usually record the scores with the highest at the top.  If a lower score represents better performance, it’s appropriate to place the lower scores at the top, as with golf scores or timed performances.

Now we can use the frequency distribution to display the data graphically.
Why would we do this?

Because displaying data graphically can be more powerful than presenting numbers.  We can more easily defend our position or drive a point home.

There are two commonly used graphs that we will learn in this class.
1. frequency polygon
2. histogram

A frequency polygon depicts frequency as a line plot, with the horizontal or x axis depicting the midpoints of the intervals and the vertical or y axis depicting the frequency of cases.

Show an example.

A histogram is a type of bar graph. The width of the interval is represented by the width of the bars on the x or horizontal axis.  The frequency of cases is shown on the vertical or y axis.   Show example using data.

The histogram has no space between intervals.  Make sure to notice this.
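
Before drawing either graph, it can help to list what will be plotted. The sketch below is an illustration only, using the 15-score data set from earlier with an interval of one; the variable names polygon_points and bars are hypothetical. For the polygon it pairs each interval midpoint (horizontal position) with its frequency (vertical height); for the histogram it gives each bar's left edge, right edge, and height, using the real limits as the edges.

```python
from collections import Counter

# The 15 scores from the data set used earlier in the notes.
scores = [6, 3, 7, 4, 2, 2, 7, 9, 10, 4, 3, 2, 1, 2, 8]

tallies = Counter(scores)
# With an interval of one, each score value is its own midpoint.
midpoints = list(range(min(scores), max(scores) + 1))
frequencies = [tallies.get(m, 0) for m in midpoints]

# Frequency polygon: one (midpoint, frequency) point per interval.
polygon_points = list(zip(midpoints, frequencies))

# Histogram: each bar runs from the lower to the upper real limit,
# so the right edge of one bar is the left edge of the next.
bars = [(m - 0.5, m + 0.5, f) for m, f in polygon_points]

for point in polygon_points:
    print(point)
```

Because the bar edges are the real limits, adjacent bars share an edge, which is why the histogram shows no space between intervals.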

If we develop a histogram from a frequency table, the bars do not touch. Show an example using baseball equipment.

Show the special situation using the data set on page 36 for shuttle run scores, where we want to develop special rules.
1. The range must be calculated to the nearest tenth of a second, so R = (highest - lowest) + .1.
2. The range will be in tenths of a second.
3. Scores will be recorded from lowest to highest.
4. In graph form, scores would be presented from lowest to highest as well.
 
 

In education settings we often want to determine the median.
 
 
 

The median is the 50th percentile (P50)   or X.50

A percentile is a score value for a specified percentage of cases in a distribution of scores.
 

The fiftieth percentile is the score that lies in the middle of the distribution.

In other words, half of the students or subjects scored above this score and half scored below it.

Calculation from ordered scores:
If we have a small group of scores and can easily order them, the median can be calculated quite simply.
 

Let’s do an example:

Here are some scores from a push up test.
Data set:  30, 26, 14, 50, 23, 26, 14, 32, 40,50, 30,30,32,33,35,36, 42
First we would order the scores:
14,14,23,26,26,30,30,30,32,32,33,35,36,40,42,50,50

Next we would simply find the middle score. Since N = 17, we would count 8 from the front and 8 from the back, and the score that divides them would be 32.
Now let’s find the 75th percentile.  In this case we would find the score where 75%, or 12, of the students scored below and four scored above.  Here it would be 36, the 13th score in the ordered list.
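
The middle-score count used for the median above can be sketched in Python with the push-up data. The function name median_from_ordered is hypothetical, and averaging the two middle scores when N is even is a common convention assumed here, not something stated in these notes.

```python
# Push-up scores from the example above.
pushups = [30, 26, 14, 50, 23, 26, 14, 32, 40, 50,
           30, 30, 32, 33, 35, 36, 42]


def median_from_ordered(scores):
    """Order the scores and return the middle one."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2              # for N = 17 this is index 8, the 9th score
    if n % 2 == 1:
        return ordered[middle]   # 8 scores below and 8 scores above
    # Assumed convention for even N: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


median = median_from_ordered(pushups)
print("median =", median)  # 32
```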

Let’s now use a data set and develop a frequency distribution where the interval is one.  Let’s use the one from the book, page 47, your homework.  Let’s calculate the median for this. We must use the real limits to do this.  Remember, we must set real limits to include all points under the curve.  Also remember that we determine the real limits by taking half of one unit, adding it to the upper score limit, and subtracting it from the lower score limit.

62,60,59,66,57,65,58,59,65,60,64,61,60,61,60
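N = 15.
Now use the following formula to determine the score at a given percentile.

percentile score = lrl + [ (.x(N) - Σfb) / fw ] (i)

where lrl is the lower real limit of the interval containing the score, .x(N) is the percentile (as a proportion) multiplied by the total number of cases, Σfb is the sum of the frequencies below the interval, fw is the frequency within the interval, and i is the size (width) of the interval.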

Now calculate the 50th percentile from your distribution.

First determine the interval that contains the score: .5(N) = 7.5, so the score is the 7.5th score in the list.  We count up the cf column to find the interval that contains it. It is the 60 interval, because 7.5 is greater than 4 but less than 8 on the cf list. Now we calculate the score.

50th percentile = 59.5 + [ (7.5 - 4) / 4 ] (1) = 60.375
Do the 70th and 80th.
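
The 50th percentile calculation above can be checked with a short Python sketch. This is an illustration under stated assumptions, not the textbook's procedure verbatim: the function name percentile_score is hypothetical, the interval size is fixed at one, and higher scores are assumed to mean better performance.

```python
from collections import Counter

# Scores listed above for the interval-of-one example.
scores = [62, 60, 59, 66, 57, 65, 58, 59, 65, 60, 64, 61, 60, 61, 60]


def percentile_score(scores, proportion):
    """Score at a percentile: lrl + ((.x(N) - fb) / fw) * i, with i = 1."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    i = 1                                # interval size
    tallies = Counter(scores)
    target = proportion * len(scores)    # .x(N), 7.5 for the median here
    fb = 0                               # sum of frequencies below the interval
    for score in sorted(tallies):
        fw = tallies[score]              # frequency within this interval
        if fb + fw >= target:            # cf reaches the target in this interval
            lrl = score - i / 2          # lower real limit
            return lrl + (target - fb) / fw * i
        fb += fw
    raise ValueError("no interval found")


p50 = percentile_score(scores, 0.50)
print("50th percentile =", p50)  # 59.5 + (7.5 - 4) / 4 = 60.375
```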

Now let’s calculate using a frequency distribution with an interval greater than one, using our frequency table from earlier.

Find out which score would be the 50th percentile.
Now calculate using the formula.
Do the 75th.
Answer problems 1 through 6, part a.
 

Calculating percentiles when lower scores = better performance.

Use the same formula.  The only difference is that lrl = x + .5 of the unit of measurement. If the unit is .1, then .5 of .1 = .05.

If we want to find the 75th percentile for the figure on page 42:
First find the interval where the score will lie.  .75 x 20 = 15.

 The score will lie in the interval where 15 falls in the cf column. It would be the 10.4 interval, because its cf = exactly 15. Now use 10.4 in your formula.

Remember,
 Σfb = the sum of the frequencies below, which is the cf below the interval.

.x(N) is the percentile multiplied by the total number of cases.

And fw is the frequency within the interval.

x.75 = 10.45 + [ (12 - .75(20)) / 3 ] (.1)

 = 10.45 + (-3/3)(.1)

 = 10.45 - .1 = 10.35

Calculation of percentile ranks

Sometimes we need to know the percentile that corresponds to a particular score.  We know the score on the test, but need to see how the person has performed compared to others.  In other words, we want to find the percentile rank of a particular score. A percentile rank gives the percentage of cases falling at or below a specified score in a distribution.
 

This information is very useful whether you are in a fitness center or a classroom.

Calculations from ordered scores.

To calculate a percentile rank from ordered scores, choose the score you want to calculate from.  Then, from the bottom, count up to that score.  Then divide the rank of that score by the total number of scores and multiply by 100.

Example:

1,1,2,2,3,4,5,5,6,7,7,7,8,8,9

We want to know what percentile rank the score 4 represents.  Counting from the bottom, we find it is the 6th number out of 15 numbers, so 6/15 = .40 x 100 = 40%.
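
The counting method above can be sketched in Python. The function name percentile_rank_ordered is hypothetical, and counting every score at or below the chosen score (so that tied scores are all included) is an assumption that follows the definition of percentile rank given earlier.

```python
# Ordered scores from the example above.
scores = [1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8, 8, 9]


def percentile_rank_ordered(scores, x):
    """Percentage of scores falling at or below x."""
    at_or_below = sum(1 for s in scores if s <= x)   # rank counted from the bottom
    return at_or_below * 100 / len(scores)


pr_4 = percentile_rank_ordered(scores, 4)
print("percentile rank of 4 =", pr_4)  # 6 * 100 / 15 = 40.0
```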
 

Calculation from a frequency distribution with an interval of 1

Use the following formula to get a more precise estimate of the percentile rank.

PR for x = { [ Σfb + ((x - lrl) / i) (fw) ] / N } x 100

Let’s calculate a percentile rank using the following steps.

1. Identify the interval containing the score of interest.  Let’s use a score of 62,
     using your frequency distribution from question 3a.
2. Find the cf just below the interval identified (Σfb).
3. Determine the lrl by subtracting half of the unit of measurement from the lowest score in the interval.
4. Find i, the interval size, and fw, the frequency within that particular interval.
5. Calculate the percentile rank.

PR for 62 = { [ 15 + ((62 - 59.5) / 4) (2) ] / 30 } x 100

 = { [ 15 + (2.5 / 4) (2) ] / 30 } x 100

 = { [ 15 + (.625) (2) ] / 30 } x 100

 = [ (15 + 1.25) / 30 ] x 100

 = (16.25 / 30) x 100

 = 54.167%
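
The worked example for a score of 62 can be checked with a short Python sketch. The function name percentile_rank is hypothetical; the argument names follow the symbols in these notes (fb, lrl, i, fw, N), and the values are the ones used in the calculation above.

```python
def percentile_rank(x, fb, lrl, i, fw, n):
    """PR = (fb + ((x - lrl) / i) * fw) / N * 100, higher score = better."""
    within = (x - lrl) / i * fw      # estimated cases below x inside the interval
    return (fb + within) / n * 100


pr_62 = percentile_rank(x=62, fb=15, lrl=59.5, i=4, fw=2, n=30)
print(f"percentile rank of 62 = {pr_62:.3f}%")  # 16.25 / 30 x 100
```

Working inside the parentheses first, as the function does, avoids the order-of-operations mistakes that are easy to make when the formula is written on one line.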
 

Now let’s calculate it where lower scores = better performance.

Use this formula to calculate it:

PR for x = { [ Σfb + ((lrl - x) / i) (fw) ] / N } x 100
 

Example using p. 42: let’s calculate the percentile rank for a score of 10.7.

PR for 10.7 = { [ 3 + ((10.75 - 10.7) / .1) (2) ] / 20 } x 100

 = { [ 3 + (.05 / .1) (2) ] / 20 } x 100
 = { [ 3 + (.5) (2) ] / 20 } x 100
 = [ (3 + 1) / 20 ] x 100
 = (4 / 20) x 100
 = .2 x 100
 = 20%
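
The same check can be made for the case where lower scores mean better performance. In this Python sketch the function name percentile_rank_lower_is_better is hypothetical; the only change from the usual formula is that x is subtracted from the lrl, and the values are the ones used in the example for a score of 10.7.

```python
def percentile_rank_lower_is_better(x, fb, lrl, i, fw, n):
    """PR = (fb + ((lrl - x) / i) * fw) / N * 100, lower score = better."""
    within = (lrl - x) / i * fw      # estimated cases "below" x inside the interval
    return (fb + within) / n * 100


pr = percentile_rank_lower_is_better(x=10.7, fb=3, lrl=10.75, i=0.1, fw=2, n=20)
# Rounded because decimal tenths are not stored exactly in floating point.
print(f"percentile rank of 10.7 = {round(pr, 2)}%")
```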

IQ scores for 50 persons

141 97 107 127 108
87 124 110 114 92
115 118 101 112 102
91 146 129 114 102
96 108 109 139 134
92 106 83 109 104
98 97 116 113 131
101 108 113 106 89
101 108 113 106 86
107 129 105 89 123

Develop a frequency distribution for the IQ scores provided above.

What will the lowest interval begin with?
 

What will the highest interval begin with?
 

Develop a histogram from the data in the frequency distribution.
 

Develop a frequency polygon from the frequency distribution.
 

Which of the following variables would typically be measured on nominal scales?
a. socioeconomic status  b. speaking ability
c.  typing speed   d. favorite food
e.  nationality    f. musical ability
g. assertiveness   h. year of birth
i. religious affiliation  j.  age
k.  political affiliation   l.  occupation
m. gender     n. handedness

Which options in question 1 would likely be measured on an interval, but not a ratio scale?
 
 
 

Which two variables in question 1 are most likely to be measured by ratio scales?
 

As typically measured, which four variables represent at least ordinal scales but probably not interval scales?
 

To say, “this value is 25% greater than that value,” requires at least which type of measurement scale?
 

If one has a choice of interval, ordinal, and ratio scales for measuring a variable (e.g. height), order the scales from the least to the most desirable.
 

When persons are measured on an interval scale (e.g., date of birth), do differences between persons represent a ratio scale?
 

Is the measure “number of books included in a library’s card catalog” a discrete variable?  If not, what measurement scale does it represent?
 

Give an example of a measure used in your major area of interest that has the properties of a ratio scale.
 

Would you recommend measuring age in months or in years as the unit of measurement?  Why?