A Peek Behind the Curtain – Part I: Test Reliability


What do you think of when you hear the term “reliability”? Many of you probably think of it in terms of shopping for a car, where you might ask, “How reliable is this model, and will it break down more often than most?” In that sense, reliability can be thought of as a kind of dependability.

Imagine you are planning a long series of trips across the country in that car. You would want to be sure it will consistently perform as it should. The same thinking applies to the research we do here at eHarmony Labs. When we build scales to assess personality traits, values, or interests, we need tests that are reliable and will not “break down” when estimating those traits.

With a car, this is fairly easy: it either starts or it doesn’t. When researchers develop tests to quantify an unobservable psychological trait, such as a personality characteristic, it is much more complicated. Unlike a car’s dependability, a state like stress or anger cannot be measured directly, so it has to be measured indirectly, for example by asking people how angry they are or by watching whether they make an angry face.

When researchers build these tests, they hope to get dependable estimates of the underlying traits. Reliability, from a methodological standpoint, refers to the consistency of a measure or test. Think of it like a game of darts. If your throws keep landing in the same spot, even if that spot isn’t where you were aiming, your throwing would be considered reliable. For a psychological test, if all of the items consistently estimate the same underlying trait, the test is considered reliable.

Interestingly, a test can be considered reliable even if it measures the wrong trait, as long as it measures the same thing consistently. So a test can be perfectly reliable but useless! For example, you wouldn’t want to use an aggression test to measure extraversion: you would get consistent measurements, but they wouldn’t tell you anything about extraversion.

One of the most common ways to assess reliability in psychological tests is known as internal consistency reliability. In a very general sense, individual items that are meant to measure the same psychological trait are compared with each other to determine how similar their estimates are. For example, if you want to know how satisfied someone is with their relationship, you might ask several different questions, such as “How happy are you with your relationship?”, “How often do you fight?”, “Do you tell your partner your innermost thoughts?”, and “Does your partner help you when you are in trouble?” You would expect someone who is satisfied with their relationship to answer all of those questions in a consistent direction (more happiness, less fighting, more disclosure, more support).
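One widely used statistic for internal consistency is Cronbach’s alpha, which compares how much each item varies on its own with how much the total score varies. The Python sketch below is only an illustration under stated assumptions: the article doesn’t say which statistic eHarmony Labs uses, and the five respondents, their 1 to 5 answers, and the extra “hours of TV” item are all hypothetical. It reverse-scores the fighting question so that a higher number always means more satisfied, then shows what happens to alpha when an item that measures something else is mixed in.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def reverse_score(score, low=1, high=5):
    """Flip a 1-5 answer so that a higher number always means more satisfied."""
    return low + high - score


# Hypothetical answers from five people (1-5 scale) to the four example questions:
# happy with relationship, how often you fight, share innermost thoughts, partner helps.
raw = [
    [5, 1, 5, 5],
    [4, 2, 4, 5],
    [3, 3, 3, 3],
    [2, 4, 2, 1],
    [1, 5, 2, 1],
]
# "How often do you fight?" runs the opposite way, so reverse-score that column.
scored = [[h, reverse_score(f), d, s] for h, f, d, s in raw]

# Hypothetical off-trait item (e.g. hours of TV watched) that does not track satisfaction.
off_trait = [2, 5, 1, 4, 3]
with_off_trait = [row + [x] for row, x in zip(scored, off_trait)]

alpha_consistent = cronbach_alpha(scored)
alpha_mixed = cronbach_alpha(with_off_trait)
print(f"four satisfaction items: alpha = {alpha_consistent:.2f}")
print(f"plus an unrelated item:  alpha = {alpha_mixed:.2f}")
```

With this made-up data, the four satisfaction items give an alpha of about 0.98, and adding the unrelated item drops it to about 0.85. Note that a high alpha only says the items move together; it cannot say whether they move together because they all measure satisfaction, which is a question of validity rather than reliability.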

How does the reliability of a test affect the work done at eHarmony, and how important is it, really? Let’s look at the relationship satisfaction measure we use in our studies of married couples, the Dyadic Adjustment Scale. The test estimates how satisfied a person is with their romantic relationship and is used across many of our ongoing projects. It is one of the most reliable tests of relationship satisfaction. If we didn’t have a reliable measure of relationship satisfaction, we wouldn’t be able to build our matching models, and eHarmony wouldn’t exist!

A test with low reliability would distort the estimates of satisfaction levels and would not accurately represent how satisfied a given couple was. In other words, our observations of relationship satisfaction would be skewed, because some of the items in the test would be measuring something other than satisfaction. The result would be a very poor estimate of a satisfaction score. If we used such a test anyway, we wouldn’t be predicting which couples were most satisfied; we would be predicting something else. Just as you would want a reliable car for a long road trip, researchers want to make sure that the tests they develop will consistently perform as they should on the road ahead.

As you can see, ensuring that each item is a reliable measure is very important in test construction, which is the basis for the research we do at eHarmony. Without testing for reliability, we can’t know whether we have built a valid matching model. But what does this say about the overall dependability of a test? Reliability is only part of the story. With reliability, we’ve established that the test measures the same trait consistently. However, we still don’t know whether we are measuring the trait we want to measure. Stay tuned for Part II of “A Peek Behind the Curtain,” in which we examine the other half of a dependable test: test validity.

Read more from Jonathan Beber at eHarmony Labs.

