Let's recall what a measurement is.
Now, is a statistic a measurement? We sometimes hear the two terms used synonymously. Are people correct in doing so?
A statistic is not a measure; a statistic is a means of analyzing the scores.
Glass and Hopkins state that statistics describe characteristics of observational units.
In other words, a statistic is defined as a method of analyzing a set of scores to enhance its interpretation.
We’re not going into depth about statistical processes, but we will cover the basic concepts needed.
There are two main categories of statistics: descriptive and inferential.
Descriptive statistics are used to describe or summarize a data set. They involve tabulating, depicting, and describing collections of data, either quantitative or qualitative.
Inferential statistics involve attempts to infer the properties of a large collection of data from inspection of a sample of observations.
The purpose of inferential statistics is to predict or estimate characteristics of a population from knowledge of the characteristics of only a sample of the population. The descriptive characteristics of a sample can be generalized to the entire population, within a known margin of error.
In the school setting, we will generally be using descriptive statistics for three purposes.
1. To describe a set of test scores
2. To standardize test scores
3. To estimate the validity and reliability of the test
Measurement scales
To interpret a score, we must know the underlying rules for the score in order to analyze it accurately. This depends on the scale the score reflects. There are four different scales used in statistics: nominal, ordinal, interval, and ratio.
Nominal measurement (giving a name or names)
Nominal measurement is the process of grouping objects into classes so that all those in a single class are equivalent with respect to some attribute or property; in other words, grouping into mutually exclusive categories.
Numerals are given for identification only.
The idea in a nominal scale is that it uses only the distinctness property of numbers; in other words, 1 is distinct from 2 or 4.
It does not mean that a person who is number 1 is any better than a person with the number 99. The number just identifies one from the other.
Examples: gender, race.
Ordinal measurement
Ordinal measurement is possible when differing degrees or amounts of an attribute or property can be detected.
It is used when we can rank a group of things from low to high on a specific characteristic.
Examples: ranking of teams in a league, ranking of players, rank ordering of states on different variables, a ranking of priorities, norms, social class, percentile.
Interval measurement
More refined measurement than ordinal
Describes the magnitude of difference among things
Meaningful order and units of measure
Examples: year (A.D.), temperature in degrees F or C
Interval measurement involves assigning numbers to objects in such a way that equal differences in the numbers correspond to equal differences in the amount of the attribute measured.
The difference between 50 and 60 degrees is equal to the difference between 90 and 100 degrees.
With interval measurement, the zero point of the scale is arbitrary and doesn’t correspond to absence of the attribute.
0 degrees Celsius or 0 degrees Fahrenheit does not mean there is an absence of heat or temperature.
You can transform interval data to ordinal data, but you cannot transform ordinal data to interval data.
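To see why the transformation only goes one way, here is a minimal sketch. The helper name `to_ranks` and both sets of temperatures are made up for illustration. Two different sets of interval scores collapse to the same ranks, so the ranks alone cannot tell us what the original differences were.

```python
def to_ranks(values):
    """Rank values from low (1) to high; tied values share the lowest position."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical temperatures (interval data), made up for illustration only.
temps_a = [50, 60, 90, 100]
temps_b = [50, 51, 52, 100]

ranks_a = to_ranks(temps_a)   # interval -> ordinal is always possible
ranks_b = to_ranks(temps_b)

# Both sets produce identical ranks, so the size of the
# differences is lost and cannot be recovered from the ranks.
print(ranks_a, ranks_b)
```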
Ratio measurement
Numbers represent equal units
Absolute zero
Equal differences in the numbers represent equal differences in the amounts of the attribute measured
Observations can be compared as ratios or percentages
Examples: weight, distance, time, age
Same as interval but has an absolute zero, which allows us to compare and say subject A has 2, 3, or x times as much of an attribute as subject B.
It allows us to make comparisons such as:
A person who can jump 40” can jump twice as high as a person who is able to jump 20”.
We cannot make ratio-type comparisons in attitude, achievement, personality, intelligence, or sociometric status, because there is no absolute zero point. A person with an IQ of 140 does not have 2 times the intelligence of a person with an IQ of 70.
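A short sketch of why "twice as much" needs an absolute zero. The jump heights come from the example above; the two temperatures and the function name `c_to_f` are assumptions added for illustration. A ratio on a ratio scale survives a change of units, while a "ratio" on an interval scale changes with the arbitrary zero point.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Ratio scale: jump height. Changing units (inches -> cm) keeps the ratio.
jump_a_in, jump_b_in = 40, 20
ratio_inches = jump_a_in / jump_b_in
ratio_cm = (jump_a_in * 2.54) / (jump_b_in * 2.54)

# Interval scale: temperature (hypothetical values). The apparent
# "ratio" depends on which arbitrary zero the scale happens to use.
temp_a_c, temp_b_c = 20, 10
ratio_celsius = temp_a_c / temp_b_c
ratio_fahrenheit = c_to_f(temp_a_c) / c_to_f(temp_b_c)

print(ratio_inches, ratio_cm)            # same ratio in both units
print(ratio_celsius, ratio_fahrenheit)   # different "ratios"
```

So saying 20 degrees is "twice as hot" as 10 degrees is not meaningful, while saying a 40” jump is twice a 20” jump is.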
It is important that we understand these scales. Why? Because certain statistical techniques are for use only with certain types of data.
An example is the Spearman rank-difference correlation, which is for use with ordinal (ranked) data.
The properties of the measurement scale should reflect the underlying characteristics of the attribute being measured.
We use the scales to describe the data as accurately as possible.
Discrete and continuous variables.
First we must define what a variable is. A variable is a nonuniform characteristic of observational units (people, or the thing we are observing, such as a state, school district, school, or classroom). Variables we often observe in people include age, height, IQ, and occupation.
A continuous variable is one that could theoretically be measured to a finer and finer degree, such as height in inches. In other words, a fraction could be used to get the most exact measure possible.
A discrete variable is one that can only take on distinct points on a scale, or separated values, such as the number of students in a class. There cannot be 1.5 or 1.75 students in a class; there can be 1 or 2.
Frequency distributions.
If we are given a sheet of paper with 100 scores randomly ordered on it, it would be impossible to understand the value of the scores. We couldn’t tell if a score were high, low, or average. Therefore we must come up with some way to organize a set of data into a usable form. An easy way of making sense of a data set is to develop a frequency distribution. There are two basic characteristics of such a distribution.
1. The test scores must have a meaningful order. In other words, a score must differentiate the amount of the attribute.
2. The categories in the frequency distribution must be mutually exclusive. In other words, a score can fit in only one category in the distribution. Categories cannot overlap.
In most cases when we see a set of data, we may think that it has a meaningful order. But this is not always true: the numbers on a locker, the numbers assigned to participants in a race, or the number assigned to a football player have no real meaning. They are just a way of identifying one from the other.
Sometimes we have data that can be summarized in a table but is not a distribution. An example would be needing to know how much equipment, merchandise, etc. you have. In this case a frequency table would be suitable. A frequency table is appropriate when the numbers do not represent a score.
Let's work with a data set.
6,3,7,4,2,2,7,9,10,4,3,2,1,2,8
First let us rank order the scores. This, as you can see, is useful: it allows us to see the highest and lowest scores, as well as where most of the scores fell. But it isn’t a frequency distribution. It does have one of the characteristics, that of having a meaningful order, but it doesn’t have the second characteristic. Which is what?
1,2,2,2,2,3,3,4,4,6,7,7,8,9,10,5,6,5,3,1,1,7,4,5,7,3,2,1
Mutually exclusive categories.
To turn this into a frequency distribution we must take the following steps.
1. Rank order the test scores.
2. Tally each score in its appropriate category (tally).
3. Record the number of scores by summing the tallies in each interval (f).
4. Sum the frequencies from the bottom interval to the top (cf).
5. Convert the cumulative frequency to a cumulative percent by dividing the cf by N (the total number of scores), then multiplying the result by 100 (c%).
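The five steps can be sketched in a few lines of Python, using the 15 scores from the data set above. The function name `frequency_distribution` and the choice to list the highest score first are assumptions made for this illustration, not a prescribed layout.

```python
from collections import Counter

scores = [6, 3, 7, 4, 2, 2, 7, 9, 10, 4, 3, 2, 1, 2, 8]

def frequency_distribution(scores):
    """Return rows of (score, f, cf, c%) listed from highest score to lowest."""
    counts = Counter(scores)           # steps 2-3: tally, then frequency (f)
    n = len(scores)
    rows = []
    cf = 0
    # Step 1 is handled by walking the score values in order, bottom upward.
    for score in range(min(scores), max(scores) + 1):
        f = counts.get(score, 0)
        cf += f                         # step 4: cumulative frequency (cf)
        c_pct = cf / n * 100            # step 5: cumulative percent (c%)
        rows.append((score, f, cf, round(c_pct, 1)))
    return rows[::-1]                   # show the highest score at the top

print("score   f  cf     c%")
for score, f, cf, c_pct in frequency_distribution(scores):
    print(f"{score:>5} {f:>3} {cf:>3} {c_pct:>6}")
```

Note that a score of 5 appears with f = 0: every category in the range is listed, even when no one earned that score.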
If we have a set that needs an interval size larger than one, we follow these steps:
1. Find the range.
2. Select the number of intervals (classes/categories).
3. Define the score limits for the intervals.
4. Tally the observations into the intervals.
5. Count the tallies within each interval and express the count as a frequency.
Finding the range.
To find the range we must get the difference between the largest score or observation and the smallest score or observation. The formula is as follows:
R = (highest score - lowest score) + 1
Adding 1 allows us to reflect all possible scores.
Select the number of intervals or classes.
Define the score limits for the intervals.
To estimate the approximate width of each interval, divide the range by the desired number of intervals. To make things more convenient, we will round up or down to a round number.
Each interval of the frequency distribution should begin with a multiple of the interval width. Do not simply begin with the lowest score.
As you notice, this is not exact: there is a small area not included between the upper limit of one interval and the lower limit of the next. Set the real limits by taking half of one unit, adding it to the upper limit, and subtracting it from the lower limit. This gives us the real score limits.
120 - .5 = 119.5 and 139 + .5 = 139.5
140 - .5 = 139.5 and 159 + .5 = 159.5
Tally the observations into the classes. We can make an x or use a box method.
Then follow the procedures for the interval width of 1.
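Here is a rough sketch of the grouped procedure: range, score limits that start at a multiple of the width, real limits, and the tally. The scores are hypothetical (made up so the intervals match the 120-139 and 140-159 example above), and the function name `grouped_distribution` and the width of 20 are assumptions for illustration.

```python
def grouped_distribution(scores, width, unit=1):
    """Group scores into intervals of a given width and tally frequencies.

    Returns rows of (lower limit, upper limit, lower real limit,
    upper real limit, f), listed from the lowest interval upward.
    """
    low, high = min(scores), max(scores)
    start = (low // width) * width        # begin at a multiple of the width
    rows = []
    while start <= high:
        end = start + width - unit        # score limits of this interval
        f = sum(1 for s in scores if start <= s <= end)
        # Real limits: half a unit below the lower limit, half a unit above the upper.
        rows.append((start, end, start - unit / 2, end + unit / 2, f))
        start += width
    return rows

# Hypothetical scores, made up for illustration only.
scores = [121, 125, 138, 139, 140, 144, 150, 159, 163]

value_range = (max(scores) - min(scores)) + 1   # R = (highest - lowest) + 1
rows = grouped_distribution(scores, width=20)

print("R =", value_range)
for row in rows:
    print(row)
```

The upper real limit of one interval equals the lower real limit of the next, so no part of the scale is left out.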
Let's make a frequency distribution from our data.
When we build a frequency distribution, we usually record the highest scores at the top. If a lower score represents better performance, it’s appropriate to place the lower scores at the top, as with golf scores or timed performances.
Now we can display the data graphically.
Why would we do this?
Because graphically displaying data can be more powerful than presenting numbers. We can more easily make a point in defending our position or drive a point home.
There are two commonly used graphs that we will learn in this class:
1. Frequency polygon
2. Histogram
A frequency polygon depicts frequency as a line plot, with the horizontal (x) axis depicting the midpoints of the intervals and the vertical (y) axis depicting the frequency of cases.
Show an example.
A histogram is a type of bar graph. The width of the interval is represented by the width of the bars on the horizontal (x) axis. The frequency of cases is shown on the vertical (y) axis. Show example using data.
The histogram has no space between intervals. Make sure to notice this.
If we develop a histogram from a frequency table, the bars do not touch. Show example using baseball equipment.
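Before drawing either graph, it can help to work out the numbers that go on the axes. The sketch below computes the points for a frequency polygon (interval midpoint on the horizontal axis, frequency on the vertical axis) and the bar edges for a histogram. The grouped frequencies are hypothetical, the function names are made up, and using the real limits as bar edges is an assumption chosen here so that adjacent bars touch.

```python
def polygon_points(intervals):
    """Return (midpoint, frequency) pairs for a frequency polygon.

    intervals is a list of (lower score limit, upper score limit, f).
    """
    return [((lower + upper) / 2, f) for lower, upper, f in intervals]

def histogram_bars(intervals, unit=1):
    """Return (left edge, right edge, height) for each histogram bar.

    Edges are placed at the real limits, so neighboring bars share an edge.
    """
    return [(lower - unit / 2, upper + unit / 2, f) for lower, upper, f in intervals]

# Hypothetical grouped frequencies, made up for illustration only.
intervals = [(120, 139, 4), (140, 159, 4), (160, 179, 1)]

points = polygon_points(intervals)
bars = histogram_bars(intervals)

print(points)
print(bars)
```

These lists could then be handed to whatever plotting tool or graph paper is being used.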
Show special situation using the data set on page 36 for shuttle run scores, where we want to develop special rules.
1. The range must be calculated to the nearest tenth of a second, so R = (highest - lowest) + .1, because scores are in tenths of a second.
2. The range will be in tenths of a second.
3. Scores will be recorded from lowest to highest.
4. If in graph form, scores would be presented from lowest to highest as well.
In education settings we often want to determine the median.
The median is the 50th percentile (P50), or X.50.
A percentile is a score value for a specified percentage of cases in a distribution of scores.
The fiftieth percentile is the score that lies in the middle of the distribution.
In other words, half of the students or subjects scored above this score and half scored below it.
Calculation from ordered scores:
If we have a small group of scores and can easily order them, the median can be calculated pretty simply.
Let's do an example.
Here are some scores from a push-up test.
Data set: 30, 26, 14, 50, 23, 26, 14, 32, 40, 50, 30, 30, 32, 33, 35, 36, 42
First we would order the scores:
14, 14, 23, 26, 26, 30, 30, 30, 32, 32, 33, 35, 36, 40, 42, 50, 50
Next we would simply find the middle score. Since N = 17, we would count 8 from the front and 8 from the back, and the score that divides them would be 32.
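The counting procedure can be checked with a short sketch using the push-up scores. The function name `median_from_ordered` is made up, and averaging the two middle scores when N is even is an assumption added here (the example above only covers an odd N).

```python
import statistics

scores = [30, 26, 14, 50, 23, 26, 14, 32, 40, 50, 30, 30, 32, 33, 35, 36, 42]

def median_from_ordered(scores):
    """Order the scores and return the one in the middle."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2                     # with N = 17, 8 scores fall on each side
    if n % 2 == 1:
        return ordered[middle]
    # Assumption for even N: take the average of the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2

median = median_from_ordered(scores)
print(median)
print(statistics.median(scores))        # the standard library gives the same value
```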
Now let's find the 75th percentile. In this case we would find the score where 75%, or 12, of the students scored below and four scored above. Here it would be 36, the 13th score in the ordered list.
Let's now use a data set and develop a frequency distribution where the interval is one. Let’s use the one from the book, page 47 (your homework), and calculate the median for it. We must use the real limits to do this. Remember, we must set real limits to include all points under the curve. Also remember that we determine the real limits by taking half of one unit and adding it to the upper score limit and subtracting it from the lower score limit.
62, 60, 59, 66, 57, 65, 58, 59, 65, 60, 64, 61, 60, 61, 60
N = 15.
Use the following formula now to determine a percentile.
$$X_{.x} = \text{lrl} + \left(\frac{.x(N) - \sum fb}{fw}\right) i$$
Here lrl is the lower real limit of the interval, .x(N) is the percentile (as a proportion) multiplied by the total number of cases, Σfb is the sum of the frequencies below the interval, fw is the frequency within the interval, and i is the size (width) of the interval.
Now calculate the 50th percentile from your distribution.
First determine the interval the score is in by calculating .5(N) = 7.5; thus, the score is the 7.5th score in the list. So we count up in the cf column to find where this interval is. It is the 60 interval here, because 7.5 is greater than 4 (the cf below the interval) but less than 8 (the cf at the interval). Now we calculate the score.
50th %ile = 59.5 + ((7.5 - 4) / 4)(1)
= 59.5 + .875
= 60.375
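As a check on the hand calculation, here is a minimal sketch of the same formula for a distribution with an interval width of 1. The function name `percentile_from_scores` is made up for illustration; the variable names follow the symbols used in these notes (lrl, fb, fw, i).

```python
from collections import Counter

scores = [62, 60, 59, 66, 57, 65, 58, 59, 65, 60, 64, 61, 60, 61, 60]

def percentile_from_scores(scores, proportion):
    """Percentile for an interval width of 1: lrl + ((.x(N) - sum fb) / fw) * i."""
    counts = Counter(scores)
    target = proportion * len(scores)   # .x(N)
    sum_fb = 0                          # cumulative frequency below the interval
    i = 1                               # interval width
    for score in sorted(counts):
        fw = counts[score]              # frequency within this interval
        if sum_fb + fw >= target:       # the target case falls in this interval
            lrl = score - 0.5           # lower real limit
            return lrl + (target - sum_fb) / fw * i
        sum_fb += fw
    raise ValueError("proportion must be between 0 and 1")

median = percentile_from_scores(scores, 0.5)
print(median)
```

For the median, the loop stops at the 60 interval with a cumulative frequency of 4 below it and 4 scores within it, which matches the hand calculation of 60.375.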
Do the 70th and 80th.
Now let's calculate using a frequency distribution with an interval greater than one, using our frequency table from earlier.
Find out which score would be the fiftieth percentile.
Now calculate using the formula.
Do the 75th.
Answer problems 1 through 6, part a.
Calculating percentiles when lower scores = better performance.
Use the same formula. The only difference is that lrl = x + .5 of the unit of measurement. If the unit is .1, then .5 of .1 = .05, so lrl = x + .05.
If we want to find the 75th percentile for the figure on page 42:
First find the interval where the score will lie: .75 x 20 = 15.
The score will lie in the interval where 15 falls in the cf column. It would be 10.4, because cf = exactly 15 there. Now use 10.4 in your formula.
Remember,
Σfb, the sum of the frequencies below, is the cf below the interval.
.x(N) is the percentile multiplied by the total number of cases.
And fw is the frequency within the interval.
X.75 = 10.45 + ((12 - .75(20)) / 3)(.1)
= 10.45 + (-3/3)(.1)
= 10.45 - .1 = 10.35
Calculation of percentile ranks
Sometimes we need to know which percentile corresponds to a particular score. We know the score on the test, but need to see how the person has performed compared to others. In other words, we want to find the percentile rank of a particular score. A percentile rank gives the percentage of cases falling at or below a specified score in a distribution.
This information is very useful whether you are in a fitness center or a classroom.
Calculations from ordered scores.
To calculate a percentile rank from ordered scores, choose the score you want to calculate from. Then, from the bottom, count up to that score. Then divide the rank of that score by the total number of scores.
Example:
1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8, 8, 9
We want to know in this case what percentile the number four represents.
Counting from the bottom, we find it is the 6th number out of 15 numbers, so
6/15 = .40, and .40 x 100 = 40%
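The counting method can be sketched as follows, using the 15 ordered scores from the example. The function name `percentile_rank_ordered` is made up, and counting every score at or below the chosen score (so that tied scores all count) is an assumption based on the definition of percentile rank given earlier.

```python
scores = [1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8, 8, 9]

def percentile_rank_ordered(scores, score):
    """Percentage of cases falling at or below the given score."""
    rank = sum(1 for s in scores if s <= score)   # count up from the bottom
    return rank * 100 / len(scores)

pr_of_4 = percentile_rank_ordered(scores, 4)
print(pr_of_4)
```

For a score of 4, six of the 15 scores are at or below it, which reproduces the 40% found by hand.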
Calculating from a frequency distribution with an interval of 1
Use the following formula to get a more precise estimate of the percentile rank.
$$PR_x = \frac{\sum fb + \left(\frac{x - \text{lrl}}{i}\right) fw}{N} \times 100$$
Let's calculate a percentile rank using the following steps.
1. Identify the interval containing the score of interest. Let's use a score of 62, using your frequency distribution from question 3a.
2. Find the cf just below the interval identified (Σfb).
3. Determine the lrl by subtracting half of the unit of measurement from the lowest score in the interval.
4. Find i, the interval size, and fw, the frequency within that particular interval.
5. Calculate the percentile rank.
PR for 62 = [15 + ((62 - 59.5) / 4)(2)] / 30 x 100
= [15 + (2.5 / 4)(2)] / 30 x 100
= [15 + (.625)(2)] / 30 x 100
= (15 + 1.25) / 30 x 100
= 16.25 / 30 x 100
= 54.167%
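The same arithmetic as a small sketch. The function name `percentile_rank` and its parameter names are made up to mirror the symbols in the formula; the numbers plugged in are the ones from the worked example for a score of 62.

```python
def percentile_rank(x, sum_fb, lrl, i, fw, n):
    """PR = (sum fb + ((x - lrl) / i) * fw) / N * 100."""
    within = (x - lrl) / i * fw     # share of the interval's cases at or below x
    return (sum_fb + within) / n * 100

pr_62 = percentile_rank(x=62, sum_fb=15, lrl=59.5, i=4, fw=2, n=30)
print(round(pr_62, 3))
```

Writing the calculation as a function makes it easy to try other scores from the same distribution by changing only the arguments.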
Now let's calculate it where lower scores = better performance.
Use this formula to calculate it:
$$PR_x = \frac{\sum fb + \left(\frac{\text{lrl} - x}{i}\right) fw}{N} \times 100$$
Example using p. 42: let's calculate the percentile rank for a score of 10.7.
PR for 10.7 = [3 + ((10.75 - 10.7) / .1)(2)] / 20 x 100
= [3 + (.05 / .1)(2)] / 20 x 100
= [3 + (.5)(2)] / 20 x 100
= (3 + 1) / 20 x 100
= 4 / 20 x 100
= .2 x 100
= 20%
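A sketch of the reversed formula, where lower scores mean better performance. The function name `percentile_rank_lower_better` is made up; the values are the ones from the 10.7 example. Because decimals such as 10.7 are stored approximately by the computer, the result is rounded before printing.

```python
def percentile_rank_lower_better(x, sum_fb, lrl, i, fw, n):
    """PR = (sum fb + ((lrl - x) / i) * fw) / N * 100, for timed scores and the like."""
    within = (lrl - x) / i * fw     # note the order: lrl - x, not x - lrl
    return (sum_fb + within) / n * 100

pr_10_7 = percentile_rank_lower_better(x=10.7, sum_fb=3, lrl=10.75, i=0.1, fw=2, n=20)
print(round(pr_10_7, 3))
```

The only change from the usual formula is the order of subtraction inside the interval term, since here the "lower" real limit sits above the score numerically.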
IQ scores for 50 persons
141 97 107 127 108
87 124 110 114 92
115 118 101 112 102
91 146 129 114 102
96 108 109 139 134
92 106 83 109 104
98 97 116 113 131
101 108 113 106 89
101 108 113 106 86
107 129 105 89 123
Develop a frequency distribution for the IQ scores provided above.
What will the lowest interval begin with?
What will the highest interval begin with?
Develop a histogram from the data in the frequency distribution.
Develop a frequency polygon from the frequency distribution.
Which of the following variables would typically be measured on nominal scales?
a. socioeconomic status
b. speaking ability
c. typing speed
d. favorite food
e. nationality
f. musical ability
g. assertiveness
h. year of birth
i. religious affiliation
j. age
k. political affiliation
l. occupation
m. gender
n. handedness
Which options in question 1 would likely be measured on an interval, but not a ratio, scale?
Which two variables in question 1 are most likely to be measured by ratio scales?
As typically measured, which four variables represent at least ordinal scales but probably not interval scales?
To say, “this value is 25% greater than that value,” requires at least which type of measurement scale?
If one has a choice of interval, ordinal, and ratio scales for measuring a variable (e.g., height), order the scales from the least to the most desirable.
When persons are measured on an interval scale (e.g., date of birth), do differences between persons represent a ratio scale?
Is the measure “number of books included in a library’s card catalog” a discrete variable? If not, what measurement scale does it represent?
Give an example of a measure used in your major area of interest that has the properties of a ratio scale.
Would you recommend measuring age in months or in years as the unit of measurement? Why?