A criterion-referenced test translates a score into a statement about the behavior expected of an individual who earns that score. Most tests prepared by teachers in schools are criterion-referenced. Their purpose is to indicate whether a student has a reasonable grasp of the material.
It is important to understand what the criterion means for any given test. Many criterion-referenced tests have a cut score, which indicates whether the examinee has passed or failed. Usually, an individual who scores at or above this level is assumed to have mastered the material. The cut score is different from the criterion. The criterion defines the domain of subject matter being assessed. For example, if the criterion is that students should be able to spell common two-syllable words, the cut score might be 75% of such words spelled correctly (Harlen).
Interpreting a criterion-referenced test score means relating the score to the subject matter being assessed. On a test of mastery, examinees are considered to have mastered the material when they score at or above the cut score.
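The distinction between the criterion and the cut score can be made concrete with a minimal sketch. The word list, the responses, and the function names below are hypothetical and chosen only for illustration; the 75% cut score is the one used in the spelling example above. The answer key stands in for the criterion (the domain being sampled), while the cut score is only the threshold applied to the result.

```python
def percent_correct(responses, answer_key):
    """Return the percentage of items answered correctly."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    correct = sum(1 for given, expected in zip(responses, answer_key) if given == expected)
    return 100.0 * correct / len(answer_key)


def has_mastery(score, cut_score=75.0):
    """Treat an examinee at or above the cut score as having mastered the domain."""
    return score >= cut_score


# Hypothetical criterion: spelling common two-syllable words.
answer_key = ["garden", "window", "basket", "pencil", "rabbit", "winter", "button", "yellow"]
# Hypothetical examinee who misspells two of the eight words.
responses = ["garden", "window", "baskit", "pencil", "rabbit", "winter", "buton", "yellow"]

score = percent_correct(responses, answer_key)
print(score, has_mastery(score))  # 75.0 True
```

Note that the result depends only on the examinee's own responses and the fixed threshold, not on how anyone else performed.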
While cut scores are common, a number of criterion-referenced tests have no such score. An example is the ACT, which is designed to assess the degree to which a student has mastered high-school-level material.
Many college and graduate school entrance exams are norm-referenced tests. For example, the SAT and GRE compare how a student performs relative to other students taking the same test. Individuals assessed with this type of test cannot fail. The test score merely reports how the person compares to peers, often as a percentile.
A disadvantage of norm-referenced testing is that it does not measure progress across the population as a whole. In other words, if everyone taking the test does better or worse on average, norm-referenced scores will not show it. Changes in the general population can be determined only when scores are measured against a fixed standard. An advantage of this type of test is that both teachers and students know what to expect from it, and it is administered to large populations.
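A small sketch can illustrate why a population-wide change is invisible to norm-referenced scores. The cohort scores, the 10-point gain, and the cut score of 70 are invented for illustration, and percentile rank is defined here simply as the percentage of the group scoring below a given score (other definitions exist).

```python
def percentile_rank(score, group_scores):
    """Percentage of scores in the norm group that fall below the given score."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical cohort, then the same cohort after everyone gains 10 points.
cohort = [38, 45, 51, 57, 60, 64, 69, 75, 80, 86]
improved = [s + 10 for s in cohort]

# Norm-referenced view: each examinee's standing relative to peers.
ranks_before = [percentile_rank(s, cohort) for s in cohort]
ranks_after = [percentile_rank(s, improved) for s in improved]

# Fixed-standard view: how many examinees reach a hypothetical cut score.
cut_score = 70
passed_before = sum(1 for s in cohort if s >= cut_score)
passed_after = sum(1 for s in improved if s >= cut_score)

print(ranks_before == ranks_after)  # True: percentile ranks are unchanged
print(passed_before, passed_after)  # 3 6: the fixed standard shows the gain
```

Every examinee keeps the same percentile rank even though the whole group improved, while the count measured against the fixed cut score doubles.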
Standards-based tests assess whether an individual has achieved a certain level of mastery of the subject being measured. Generally, this involves setting a criterion that serves as the benchmark for an acceptable understanding of a given subject. This is the basis for many tests that are now given to children to ensure that they are receiving an education that helps them master important subjects (Salkind).
This type of testing has a number of advantages. Students have their scores compared against a fixed standard. They are not artificially ranked within a group so as to produce a bell curve. This means that all students can succeed if they meet or exceed the standard. If a bell curve were used, it is possible that students scoring above the standard would still be considered failures because their peers scored relatively higher. Many of these tests are scored by people rather than computers. This means that a response can be evaluated in its entirety rather than simply as right or wrong. These tests are not multiple-choice tests, which have been shown to be unfair to minorities. Criterion-based tests that are standards-based can also be the focus of courses, which is often difficult with a multiple-choice test. Standards-based tests are designed to accommodate education reform that can allow all students to succeed (Harlen).
An ipsative measure indicates which of a number of desirable options a respondent has chosen. There can be two or more options to choose from. This is sometimes called a forced-choice scale. It differs from a Likert-type scale, in which respondents choose a score from 1 to 5 to indicate their agreement or disagreement with a given statement. This is an important distinction because Likert-type scales yield scores that can be compared across individuals, whereas scores from ipsative measures cannot be compared in this fashion.
An example may be helpful in understanding the distinction between a Likert-type measurement and an ipsative measure. If one were to measure kindness and empathy with a Likert-type scale, individuals could indicate how much they agreed with statements reflecting either of these traits. If the traits were evaluated in an ipsative manner, respondents would have to choose, from several statements, the one that best applied to them. In general, ipsative measures are helpful when attempting to evaluate traits within individuals. They are also helpful in determining whether an individual is faking.
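The kindness and empathy example can be sketched in a few lines to show why ipsative scores do not compare across individuals. All respondents, ratings, and choices below are hypothetical. The Likert-type ratings are given independently for each trait, while each forced-choice item makes the respondent pick one trait over the other.

```python
TRAITS = ("kindness", "empathy")

# Likert-type ratings (1 to 5), given independently for each trait.
likert = {
    "respondent_a": {"kindness": 5, "empathy": 5},
    "respondent_b": {"kindness": 2, "empathy": 1},
}

# Ipsative (forced-choice) items: for each of four items, the respondent
# picks the statement that best applies, so each pick favors one trait.
forced_choices = {
    "respondent_a": ["kindness", "empathy", "kindness", "empathy"],
    "respondent_b": ["kindness", "kindness", "kindness", "empathy"],
}


def ipsative_scores(choices):
    """Count how often each trait was chosen across the forced-choice items."""
    return {trait: choices.count(trait) for trait in TRAITS}


ipsative = {name: ipsative_scores(picks) for name, picks in forced_choices.items()}
totals = {name: sum(scores.values()) for name, scores in ipsative.items()}

print(ipsative["respondent_a"])  # {'kindness': 2, 'empathy': 2}
print(ipsative["respondent_b"])  # {'kindness': 3, 'empathy': 1}
print(totals)                    # {'respondent_a': 4, 'respondent_b': 4}
```

Every respondent's ipsative scores sum to the same total, so they describe only the relative ordering of traits within one person. Respondent B's ipsative kindness score is higher than respondent A's even though the Likert-type ratings suggest that A is the kinder of the two, which is why the ipsative scores should not be compared between them.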
Applicability to the Proposed Study
The proposed research design does not involve any specific type of traditional testing. However, it does include an assessment of the father's level of involvement with the family. There is also an interview to determine how involved the mother and daughter feel the father is with the family.
There is no fixed criterion against which the father's involvement with the family is measured. His involvement is assessed only in relation to the other fathers in the study. Therefore, this measure can be considered philosophically based on norm-referenced testing (Salkind). In other words, the father's time spent with the family is compared only to that of other fathers participating in the study and is not judged against a standard.
The sexual behavior and development of the young females in this study are compared to those of the other participants in this analysis. They are not compared to any standard based on medical or psychological protocols. It could be argued that their pregnancy is a type of criterion. However, the timing of pregnancy is not the focus of this study. Instead, the study compares early female sexual development and behavior among participants in the group in relation to the level of father presence in the household. Therefore, this is a norm-referenced type of evaluation.
Harlen, W. Teaching, learning and assessing science 5-12. Thousand Oaks, CA: Sage.
Kramer, G. P., Bernstein, D. A., & Phares, V. Introduction to clinical psychology. Upper Saddle River, NJ: Pearson Prentice Hall.
Salkind, N. J. Encyclopedia of educational psychology. Thousand Oaks, CA: Sage.
Author Info: Peter Studies
Peter Studies is a psychologist and a part-time lecturer at the University of Wisconsin.