Section 3. Data Collection: Designing an Observational System

Learn how to find ways to observe the behavior, the conditions, and the changes - or lack of changes - that will answer your evaluation questions.


  • What do we mean by an observational system?

  • Why design an observational system?

  • When should you design an observational system?

  • Who should design an observational system?

  • How do you design an observational system?

The local community health center was starting a program to encourage regular physical activity among people with high blood pressure. The program had one simple objective: to engage participants in 45 minutes of regular, moderate aerobic exercise at least four times a week over the course of six months.  The hope was that this regimen would lower participants’ blood pressure, and lead to weight loss and an overall sense of greater well-being. A related aim was that participants would continue the exercise on their own after the program ended.

The Center had gathered a group of 50 people who were willing to take part. After physical checkups, all the participants attended workshops on diet, the mechanics and dangers of high blood pressure, and how to start and maintain injury-free regular exercise. They also received counseling about the kinds of exercise they might undertake – walking, bicycling, swimming, etc. – and about ways to make exercising pleasant. To help participants to integrate exercise into their lives, the center decided to ask them to exercise at their own convenience, using whatever activities they chose, as long as they maintained the 45-minute, four-day-a-week pattern. They were also asked to keep journals of the frequency and nature of their exercise, and to meet, in groups of five, with Health Center counselors once a month for checks on blood pressure, weight, and progress, as well as support, encouragement, and advice.

The center then had to decide how to evaluate the program. The performance goal of the program – what participants were actually supposed to do – was maintaining the exercise schedule over the six-month period. Since each participant was tending to that individually, it would be hard to actually watch them all exercise whenever they did so. But the center needed to know whether or not they had. The other program goals – lower blood pressure, good weight loss – also had to be observed somehow.  How could the center design a system to find out whether the behavior of physical activity was occurring and whether this resulted in lower blood pressure and weight loss?

Once you’ve determined your evaluation questions and gathered information about what to look for, you have to find a way to look for it.  That’s what observation systems are all about. Like the center in the example, you’ll need to find ways to observe the behavior, the conditions, and the changes – or lack of changes – in them that will answer your evaluation questions. This section is about setting up observation systems to do just that.

What do we mean by designing an observational system?

An observational system is the way you get information about your program – what it and its participants and implementers are actually doing, and what seems to be occurring as a result.  “Observation” here may mean actual observation – watching people, conditions, activity, or results to see what happens – but it may also refer to less direct ways of monitoring a program’s operation and outcomes.  Its varieties include monitoring the behavior of individuals and groups to see the results at different levels.  Some methods of observation that might prove useful in different evaluation situations:

  • Direct observation. This is the purest and most verifiable form – watching people or observing conditions or situations firsthand. If you’re involved in an effort to increase the use and neighborhood sense of ownership of a public park, for instance, you might directly observe how much and how people use the park by visiting and observing on different days, in different types of weather, and under different circumstances over a substantial period of time. Direct observers may be “invisible,” as an observer of park activity would probably be, or they may be staff members who work with participants, recording what happens.  In either case, they are taking measures as outside observers, not as participants themselves.
  • Participant observation. A participant observer becomes part of the action, and observes as an insider. In the case of the park, a participant observer might be a neighborhood resident directly involved in the effort, or might be someone who becomes part of the life of the park for the purposes of observation. He might jog there daily, or join a weekly volleyball game and get to know others who use the park on a regular basis. His own notes about what is observed in the park might also become part of his recording.
  • Self-reports. Some of what you’re trying to achieve may simply not be visible at all, at least not to you. Changes in what people do in private, such as their use of contraceptives, may not be (or should not be) observed directly by an outsider. Similarly, when the goal is to effect changes in the behavior of large numbers of people, such as to promote healthy eating in the community, it will not be feasible to directly observe this for everyone. In such situations, we ask people to report on their own behavior. Thus, an observational system may include interviews, journals, surveys, or other means of first-person reporting. Since such reporting may be subject to bias, we usually try to also use other forms of evidence (e.g., observing weight loss as a product of the behaviors of healthy nutrition and physical activity).
  • Second-hand reports. An observational system may include or depend on the reports of others who have direct experience with the people or conditions you’re concerned with. Teachers, probation officers, park rangers, public health nurses, social workers – even bartenders or hairdressers – might be valuable sources of second-hand information. These reports, like self-reports, may be gathered by interviews, journals, surveys, checklists, and the like.
  • Electronic or mechanical observation. The observer in this case isn’t a person (although ultimately people would review its information), but an automatically-operated or always-on camera, audio recorder, heart monitor, pedometer, GPS (global positioning system) tracker, or other piece of equipment.  A camera operated by a tripwire is often used, for example, to study the density of an animal population in a particular area, or along a particular path.
  • Tests of various kinds. Depending on what you’re measuring, this category could cover everything from pencil-and-paper tests of academic learning to hands-on skills tests to blood tests and the like. They might also include tests of new program methods and procedures to see if they work before putting them into practice.
  • Public and other records. Police reports, census data, employment statistics, public health information – all of these and more could give you information on community-level indicators that will help you determine the outcomes of your work.
  • Products or results of behavior. Sometimes it is more practical to observe the product or result of a behavior, rather than the behavior itself. For instance, if interested in environmental pollution, we might observe the amount of debris or toxins on the ground or in the water, rather than the behavior of illegal dumping of toxins or materials. Similarly, an initiative interested in preventing childhood obesity might use school records of height and weight to measure obesity – in addition to direct observations of school lunches and what youth report on eating surveys.

In addition to specifying what kinds of observation you’ll use, the design of an observational system should also cover when, where, how often, by whom, and under what circumstances observations will take place, as well as just what will be looked at. All of these depend on what you’re observing and what information you hope to gain from your evaluation (back to those evaluation questions again).

Among other considerations, will you look at the process of your effort or program – the steps you took in setting it up, and whether they were faithful to what you intended?  Will you look at what you actually did – the number of participants you had, the methods you used, the time everything took, how long participants stayed, etc.? Are you interested in which parts of what you did were successful and which were not?  And what do you want to know about outcomes – the results of your program?

Designing an observational system entails thinking carefully about what you need to know, and creating a system that is most likely to get you that information as accurately and easily as possible. We’ll discuss the design process in detail, including the issues mentioned in the last two paragraphs, in the “how-to” part of this section.
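As a rough illustration, the elements of a design (what, how, where, how often, and by whom) can be written down as a small structured record for each evaluation question. The Python sketch below uses the health center example. The class name `ObservationPlan`, its field names, and the helper `uncovered_questions` are all hypothetical, chosen for illustration rather than taken from any standard evaluation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationPlan:
    """One planned observation (hypothetical structure, for illustration only)."""
    question: str   # the evaluation question this observation helps answer
    what: str       # the behavior, condition, or result being observed
    method: str     # e.g. self-report, instrument reading, direct observation
    observer: str   # who makes or records the observation
    where: str      # the setting in which it takes place
    frequency: str  # how often it is repeated


def uncovered_questions(questions, plans):
    """Return the evaluation questions that no planned observation addresses."""
    covered = {plan.question for plan in plans}
    return [q for q in questions if q not in covered]


questions = [
    "Did participants engage in the recommended exercise routine?",
    "Did participants' blood pressure decrease?",
    "Did participants who needed to lose weight do so?",
]

# Assumed example plans; the details are illustrative, not prescribed.
plans = [
    ObservationPlan(questions[0], "exercise sessions", "self-report journal",
                    "participant", "wherever they exercise", "after each session"),
    ObservationPlan(questions[1], "blood pressure", "instrument reading",
                    "health center counselor", "health center", "monthly"),
]

missing = uncovered_questions(questions, plans)
print(missing)
```

In this sketch the check would flag the weight-loss question, since no observation has been planned for it yet. A paper table with the same columns would serve the same purpose.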

Why design an observational system?

If you’re serious about evaluation, there are a number of reasons for designing a good observational system:

  • It can help you get reliable information. Designing a system that standardizes the methods, times, and other aspects of the observation will mean that the information you get from different observers and places is likely to be accurate and consistent, and therefore more useful to an overall evaluation.
  • It can help you find out exactly what you need to know, without wasted effort. You can design a system that examines what you’re interested in, and ignores what you’re not. That means that you don’t have to sort out unnecessary data, and that you’ll have the right means of collecting the data you need to address your evaluation questions.
  • It can ensure that observations are made. A consistent system that’s designed and accepted by those who will do the observing, whatever form it takes, makes it far more likely that observations will be made when, where, and how they’re supposed to.
  • It can make it easier to analyze your data. A consistent, rational system of observation can give you good information for scientific analysis, whether that analysis is quantitative (based on numbers and statistics) or qualitative (based on narrative and interpretation of the meaning of behavior and events).
  • It can help you avoid haphazard evaluation. A well-designed observational system will allow you to collect information systematically, and not leave you with a mass of disconnected data that are not necessarily related to what you want to know.
  • It will make it easier to justify your findings. The more accurate your information, the more reliable the conclusions that can be drawn from it. If your observational system is designed and implemented well, it’s much easier to argue that your information is reliable and accurate, and a good base for the conclusions you reach.
  • It can help you gain credibility with funders and policymakers. The people who control funds and policy are particularly concerned with accountability. If you can present them with a useful evaluation based on data collected through a well-designed and reliable observational system, they’ll be more inclined to treat you as a knowledgeable voice in the field.
  • It can let you pass on your practices with confidence. A well-designed observational system makes it possible to feel that your evaluation results tell the truth. If those results show that your program is highly effective, you can pass on what you do as a best practice to colleagues in the field and others, without worrying that you may be urging them to use methods or assumptions that might not work very well.
  • It can give you the best information possible about what’s working in your program, and what you need to adjust.

When should you design an observational system?

As we’ve discussed, an observational system refers not only to direct observation, but to any method of examining and recording the process, activities, and outcomes of your program. An observational system is intrinsic to your evaluation, since that is what will tell you what actually went on. Therefore, the ideal is to design that system before you actually start implementing the program, so that you can monitor the program throughout its existence.

That’s the ideal. The reality for many community workers – especially those who work in small, community-based organizations – is that evaluation begins whenever the time, energy, and resources are available, which is often months or years after the program has started. Whenever it begins, the observational system should be designed to fit the evaluation questions you’re asking. Because the observations must be consistent and reliable, it’s well worth taking the time to design an effective observational system.

It’s best if you can observe through a whole program cycle, from beginning to end. Some programs don’t have a cycle, and the observation may focus on the behavior or results of individual participants rather than the program as a whole. In these situations, evaluation may begin as new participants begin the program and are observed from the beginning.

If you’ve been recording events, keeping journals, etc., before you start evaluating, you may have information that you can incorporate into the results of your observation. If your design calls for specific firsthand observations, their definition may be precise enough that similar observations recorded in journals kept by staff may not meet the criteria to be included. If staff journals or records are part of the system you’re putting in place, however, you may be able to use all or most of the information you have (and, indeed, you could design the observational system so you can).

The real danger here is that you’ve missed something important already by the time your observational system is operational. It may be that the early part of the program is crucial, at least for some participants or for some changes, and you’ll be starting to observe after those changes have been made. Start your observations early so that your system can pick those changes up.

Who should design an observational system?

We’ve stated many times the Community Tool Box bias toward participatory process, and particularly toward participatory research and evaluation. In the case of an observational system, a system will function best if it’s designed by a group that includes those who will actually be the observers. If they’re part of the planning, they’ll be familiar with the system, know exactly what information it’s meant to observe, and understand their roles with the observational system.

In a community-based or smaller organization, it’s likely that time will be a factor – there are probably too few people already doing too many jobs. If that’s the case, the level and nature of the observational system has to be one that the staff can actually handle, whether they’re the observers, or whether they’re facilitating the observation for outside evaluators or volunteers. If they help to plan and set up the system, they’ll have far more incentive to make sure it works than if it’s imposed on them.

The design of the observational system should specifically include the people who will actually do the observing, who are often either staff members or members of the group that will benefit from the program. In addition to these, it helps to include researchers or others who understand observational systems, and can help to design a system that specifically meets the needs of the evaluation or research project. It might also be beneficial to include individuals from the group(s) that will be observed, to help with cultural issues and provide feedback on their response to the design. The observational design team, therefore, might consist both of members of the overall evaluation planning group, and others specifically recruited to work on an observational system.

The actual short list includes:

  • Program staff and administrators
  • Support staff (who often do recording or data entry)
  • Outside evaluators or research consultants
  • Participants or beneficiaries
  • Volunteer observers

If, for some reason, the design group doesn’t include anyone with research experience, a training that includes information about different methods of observation, and about which methods are likely to produce which kinds of information, could help greatly to inform the design process. If the group does include researchers, that information could be presented as part of the discussion about design and the various possibilities, rather than in a training or workshop format.

How do you design an observational system?

So, you’ve decided on evaluation questions and planned an evaluation. Now it’s time to determine how you’ll get the data you need to answer your evaluation questions.

Review your evaluation questions

Remember these? You decided what it was you wanted to know, in order to determine whether your program was effective. Let’s go back to the example at the beginning of this section, the local community health center program. The center was starting a physical activity program for people with high blood pressure. Its objective was to have the participants engage in 45 minutes of moderate exercise at least four times a week over six months.

It hoped for several outcomes from this activity:

  • That participants’ blood pressure would decrease
  • That participants who needed to would lose weight
  • That participants would experience a sense of greater well-being
  • That participants would continue the exercise routine after the six-month program ended.

Some evaluation questions, therefore, are:

  • Did participants engage in the recommended exercise routine for the period of the program?
  • Did participants’ blood pressure decrease by the end of the six months?
  • Did those participants who needed to lose weight do so by the end of six months?

There could easily be many other questions, a few examples being:

  • How well attended were the workshops? Did participants find them helpful? Did participants who attended all or most of the workshops achieve better results than those who didn’t?
  • Did participants experience a sense of greater well-being by the end of six months?
  • Did participants continue the exercise routine (and maintain their lower blood pressure) after the program ended?

The answers to these questions might be only a few of those that the center wanted, but let’s stick with them for now, and use them as examples as we go through this part of the section. You may be concerned about your own process – how well you actually plan and implement your program.  You may also, like the center, have a specific time frame in which you hope for results. You might also have benchmarks – smaller achievements along the way to a larger goal – that you’re concerned with recording. All of this should  figure into your observational design.

Decide what you need to observe to answer your questions

Depending on the kind of program or effort you’re engaged in, and the nature of your evaluation questions, there’s a broad range of choices here.

Some of the most common:

  • Participants’ behavior. This could be anything from the aggressive behaviors of children in a schoolyard or play setting to the degree of welding skill exhibited by participants in an employment-training program to the social interactions of the users of the neighborhood park we discussed earlier. The possibilities are nearly endless.
  • Someone else’s behavior. The ultimate test of whether a high school peer mediation training program is working, for instance, may not be the behavior of the mediators on whose training the program focuses, but the behavior of the students with whom they work. There’s also a possibility here of looking at the ways in which participants are treated by program staff and vice versa.
  • Conditions. An initiative may aim to change conditions directly – eliminating a crack house where drug dealing occurs, building affordable housing, cleaning up a polluted river – or may be meant to influence those changes by implementing programs or environmental or policy changes.
  • Observations of products or results of behavior. When the behavior or event or condition itself isn’t visible or observable, either because it’s private, or because it takes place on a level that can’t be observed directly, you may have to measure its products or results.  It would be virtually impossible to observe directly the rate at which adolescents practiced safe sex, but it would be possible to learn the rate of STD infections among them, and the number of teen pregnancies before, during, and after a safe-sex peer education program.

When products or effects are all you can observe, you have to be sure that you’ve chosen the right ones to look at.  They should be, to the extent possible, obvious results of the behavior or condition you’re interested in, and you should take into account – and try to correct for – any other factors that might have caused them.

  • Participants’ knowledge or attitudes. Like participants’ behavior, the possible range here is enormous, from scores on a knowledge test to nearly anything else you might think of.
  • Someone else’s knowledge or attitudes. For instance, an advocacy program would be concerned with changing the attitudes of legislators and the public, but might not have direct contact with those whom it hoped to influence. Such a program might use repeated public opinion surveys to assess willingness to support a particular policy change.
  • Goal attainment.  Some programs have a particular aim that is their only reason for existence. This might be the passage or repeal of a law, the building of a school, the freedom of one or more political prisoners, etc. The only evaluation question in that case may be whether the goal has been achieved (or to what degree). In this situation, a goal attainment scale can be used to assess the degree of attainment (e.g., from 5 = most favorable outcome, to 1 = least favorable outcome).
  • Interactions. The focus of an evaluation might be on the nature of interactions, or on whether particular individuals or groups interact or engage each other. For instance, if the goal is increasing parent-child interaction, each party’s talking to and responding to the other might be measured. As above, interactions among program participants, among staff, or between participants and staff might be the focus of observations.

All of the above possibilities might have to do with either program goals – i.e., what a program wants to accomplish – or process and implementation – how a program goes about setting up and carrying out its work.

There are also some areas of observation that relate specifically to program process and implementation:

  • Planning. Measurement here may focus on who was involved in planning what parts of the program, how the plan was developed, what its content was, satisfaction, etc.
  • Timeline. When did the planning, implementation, and evaluation of the program each begin?  How long did each take?  Were deadlines met, and, if not, why not?
  • Numbers of participants. How many participants did you have? What was the average amount of time they spent in the program? How many dropped out before completing the program?  How did those numbers compare to what you expected?
  • Methods. What methods did you use in the program or intervention? How were they used?
  • Program implementation. What did you actually do? This would include the program activity, its frequency and duration, the number of participants it served, where it took place (if that’s relevant), and how it was conducted.

In addition to identifying what you want to look at, you’ll have to define it carefully, so that observers know exactly what to look for. You have to be certain that all observations of a particular behavior, for instance, refer to the same phenomenon (e.g., specific features that define whether the behavior occurred), even if the observations are made by different people. If observers don’t use the same measurement, you can’t really count on the information you get. Setting the limits of observations in each category – what’s included, what’s excluded, and where the boundaries are – will help to eliminate disagreement and make the observation more reliable.
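One way to check whether a definition is precise enough is to have two observers record the same occasions independently and compare their records. The Python sketch below computes simple percent agreement and Cohen’s kappa, a common statistic that adjusts agreement for chance. Neither measure is named in this section, so treat the choice as an assumption. The observer records are made-up values for illustration only.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of occasions on which two observers recorded the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("both observers need the same non-empty set of occasions")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement between two observers, corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same category independently.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Made-up records: did the defined behavior occur on each of ten occasions?
observer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
observer_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
print(agreement, round(kappa, 3))
```

Low agreement on a trial run like this usually suggests that the definition of the behavior, or the boundaries of a category, needs tightening before real observations begin.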

To continue the health center example, let’s look at what the Center needs to observe.  In order to determine whether participants are actually doing their exercise regularly, the center has to find a way to observe people’s behavior (e.g., activity logs, self-reports on how often they engaged in physical activity). To find out whether they’ve lost weight, the center has to observe an outcome of behavior (e.g., weight, body mass index).  To learn whether they’re experiencing an increased sense of well-being, the center has to obtain self-reports.  And to learn whether they continue their exercise past the end of the program, it has to find a way to observe behavior after the program ends.
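To make the first of these concrete, here is a minimal Python sketch of how journal entries might be checked against the exercise schedule. It assumes that a day counts when the participant logs at least 45 minutes in total, and that a week counts when at least four such days fall within the same calendar (ISO) week. The center in the example might define the pattern differently. The function name and the journal entries are hypothetical.

```python
from collections import defaultdict
from datetime import date

MIN_MINUTES_PER_DAY = 45   # from the program objective
MIN_DAYS_PER_WEEK = 4      # from the program objective


def weekly_adherence(entries):
    """Map each logged ISO week to True if the exercise pattern was met.

    entries: iterable of (date, minutes) journal records for one participant.
    Weeks with no journal entries at all do not appear in the result; an
    evaluator would have to decide how to treat them.
    """
    minutes_by_day = defaultdict(int)
    for day, minutes in entries:
        minutes_by_day[day] += minutes

    qualifying_days = defaultdict(int)
    for day, total in minutes_by_day.items():
        iso_year, iso_week, _ = day.isocalendar()
        # Adding 0 still registers the week, so it shows up as not met.
        qualifying_days[(iso_year, iso_week)] += 1 if total >= MIN_MINUTES_PER_DAY else 0

    return {week: days >= MIN_DAYS_PER_WEEK for week, days in qualifying_days.items()}


# Made-up journal for one participant, for illustration only.
journal = [
    (date(2024, 1, 1), 45), (date(2024, 1, 2), 50),
    (date(2024, 1, 4), 45), (date(2024, 1, 6), 60),   # four qualifying days
    (date(2024, 1, 8), 45), (date(2024, 1, 9), 30),
    (date(2024, 1, 10), 45),                          # only two qualifying days
]

adherence = weekly_adherence(journal)
print(adherence)
```

Because the input is self-reported, a result like this is only as trustworthy as the journals themselves, which is one reason to pair it with measured outcomes such as blood pressure and weight.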

Decide how the observations will be conducted

Earlier, we discussed some methods of observation. We’ll return to those here, and examine them in greater detail.

  • Direct observation. Direct observation takes one of two forms. In the “fly on the wall” approach, the observer is anonymous and generally unnoticed. More often in service organizations of various kinds, the observer is either an (identified) outside evaluator or a staff member who works with participants and records, sometimes with their help, their behaviors and aspects of the situation. Anonymous observers are particularly good in situations where the people being observed are equally anonymous – conditions, large events, or activity like the use of that neighborhood park we talked about earlier, where the people involved could be anyone.

One common method of direct observation – whether the observer is a program staffer or a program participant – is keeping a journal or activity log. The observer writes down or otherwise records, soon after the occurrence, an account of what happened and events related to it, and often reactions to those experiences as well. The journal or activity log then becomes a picture of the flow of the program, detailing the progress through it of particular participants, adjustments made, and satisfaction.

The nature of journals or report logs will obviously vary with the nature of the program, and not all programs or efforts lend themselves to this kind of observation.  But, especially in situations where several people have written journals or logs that cover the same period and events, they can be a very powerful and revealing means of observation.

  • Participant observation. As we’ve explained, participant observers become part of the event, activity, culture, etc. that’s being observed, and experience it firsthand. Thus, in a health or human service program, the observer might be an actual participant (i.e., a  member of the group at whom the program is aimed), or an evaluator who joins participants in their activities. In the case of the park, for instance, a neighborhood resident who already visits the park regularly might volunteer to track how he and others actually use it – when various people or groups come, what they do once they’re there, which parts of the park they frequent, who interacts with whom, etc.

A micro-grant program was designed by a non-governmental organization in a rural village to increase income through creation of small businesses by participants. Members of the staff took part in each workshop, and participated in such activities as training in lending and loans, all the while observing the activities and other participants. At the end of each day, the staff conducted group discussions where they relayed what they had seen, and asked participants to analyze what they had done. The staff in this instance functioned as participant observers, and their participation added greatly to their ability to help their clients, low-income women from small villages, understand and use their experiences.

  • Self-reports. When the object of your observation is participants’ behavior that takes place away from the program (the amount of time participants spend reading to their children, for example), you often have to rely on the observations of participants themselves. This reliance has, as you might expect, both advantages and disadvantages. On the one hand, participants obviously know their own actions well. On the other, they can also leave out things that might be embarrassing or report in ways that they think others want to see. In addition, since they haven’t had prior experience as observers or been trained as observers, they might miss, or dismiss as unimportant and not report, behaviors or conditions that would be valuable for evaluation purposes.

An obvious remedy would be to train self-reporters as observers, and in some cases – medical trials, for instance – that’s both reasonable and common. In other situations, however, it would go too far toward telling participants “the right answers,” and thereby possibly changing what they report toward what they think you want to hear. There is also a huge advantage to self-reports: when they’re honest and represent a real change in behavior or experience for the reporters, they’re far more powerful than anything another person could say about their experience.

Self-reports, at least as defined here, imply an array of possible techniques for data collection.  Individual and group interviews, focus groups, public meetings, surveys and questionnaires, journals, checklists, and even casual conversation might all be ways for participants to convey information for an evaluation.

  • Second-hand reports. These are reports about participant behavior or about conditions that come from people associated with those participants or conditions, but not directly connected to your program or effort. They might be service workers, teachers, health professionals, youth workers, family members (particularly parents of young children), employers – almost anyone. Second-hand observations have to be viewed with some of the same cautions as those from people who work closely with participants. Observer-participant relationships, sympathy or empathy for participants, or observers’ personal biases can all keep reports from being objective. These observers may also need training.
  • Electronic or mechanical observation. There are some circumstances where human observation is impossible or impractical. Observations of speeding are often made electronically, as are observations of health conditions inside the human body (using X-rays, CT or MRI scans, electrocardiographs, etc.). In these situations, objectivity is no problem, but you have to be sure that whatever equipment you use is working properly, set up correctly, well-maintained, and protected against possible damage. It is also important that whoever interprets the information the equipment provides is trained to do so, and understands the limits and appropriate uses of that information.
  • Tests or other similar observation tools. Education and health organizations often use various kinds of tests as observation tools. In a human service context, they are generally used to observe progress in skills, competency levels, or development. In public health or medicine, tests may be used to observe health status (e.g., screenings for elevated blood cholesterol) and the effects of treatment. They can be very useful in all these circumstances, but they’re also very specific, and don’t allow much room for intuition. In addition, the results of tests of skills, knowledge, or intellectual ability may be influenced by nervousness, lack of sleep, personal problems, or other factors that have little to do with actual competency.
  • Public records and the like. If you’re using community-level indicators, such as rates of infant mortality or injuries due to motor vehicle crashes, as one way of looking at outcomes, you’ll have to use records, census data, and other similar material to get the data you need.

To continue with our previous example, the local health center would have to use a variety of these observation methods. The beginning and ongoing observations of blood pressure and weight at the monthly counseling sessions would take place with the use of instruments – a blood pressure cuff and a weighing scale – as well as by direct visual observation. (While an obvious reduction in fat may not indicate weight loss if the fat is replaced by muscle, it does indicate an increase in fitness, which may be equally beneficial. Fitness levels could also be mechanically and electronically measured, if the program chose.)

The amount and type of exercise each participant engaged in would be self-observed and self-reported through journals and interviews.  Feelings of well-being would be self-reported, but could also be observed by counselors trained to look for changes over time in posture, self-presentation, and other observable indicators.  Finally, observations of continued exercise would come through one or more follow-up visits some time after the program ended, with interviews, blood pressure and weight measurements, and more direct visual observation of fitness levels.  (Participants might also agree to continue keeping journals for a set period of time after the program ended, thus providing a self-report of their ongoing levels of physical activity.)
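
As an illustration of how journal self-reports might be turned into an observation of the performance goal, here is a minimal sketch in Python. The journal format, the function name `weekly_adherence`, and the rule of counting distinct days of at least 45 minutes are all assumptions made for the example, not part of the health center's actual program.

```python
from collections import defaultdict
from datetime import date

# Hypothetical journal entries: (date of session, minutes of exercise).
journal = [
    (date(2024, 1, 1), 45),
    (date(2024, 1, 3), 50),
    (date(2024, 1, 5), 45),
    (date(2024, 1, 6), 60),
    (date(2024, 1, 8), 45),
    (date(2024, 1, 10), 30),  # too short to count toward the target
    (date(2024, 1, 12), 45),
]


def weekly_adherence(entries, min_minutes=45, sessions_per_week=4):
    """Map each ISO (year, week) to True if the weekly target was met.

    Assumption: a "session" is a distinct day with at least min_minutes
    of exercise. Weeks with no qualifying session do not appear at all.
    """
    qualifying_days = defaultdict(set)
    for day, minutes in entries:
        if minutes >= min_minutes:
            iso = day.isocalendar()
            qualifying_days[(iso[0], iso[1])].add(day)
    return {
        week: len(days) >= sessions_per_week
        for week, days in sorted(qualifying_days.items())
    }


adherence = weekly_adherence(journal)
print(adherence)  # {(2024, 1): True, (2024, 2): False}
```

A summary like this is only as accurate as the journals themselves, so it would still need to be checked against the interviews and measurements described above.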

Decide when you need to observe

The question here is whether you need to start observing at the very beginning of the program (you almost always do), and how often you should observe throughout the course of the evaluation.

Some of the possibilities:

  • Pre- and post- observation. This means making your observations at the beginning and the end of the evaluation period or the program. It’s the equivalent of what many schools do with standardized testing. They test reading scores at the beginning and end of each year, and then compare the two to determine how much the students have advanced. Although this type of observation may tell you whether anything changed during the program, it won’t give you strong evidence of how the change took place, what caused it, or how effective your methods were.

This explanation assumes only before-and-after observation.  Most of the possibilities here include before-and-after observation, but add other observations to it.  For most kinds of evaluation, you should start observing at the beginning, or even well before the beginning (to understand whether any changes may in fact be part of an already-existing trend).  If you conduct your first observation partway through, you won’t know if changes occurred before then.  A major change may occur toward the beginning in some interventions, toward the end in others, steadily throughout the intervention in still others, and in a few not till after the intervention is over.  It’s important to know just where you started from in order to fully understand what you’re seeing.  It may be that a long intervention is no more effective than a short one, or that a short one makes no difference at all.  You can only tell by knowing where you started from and through repeated or continuous measurement.
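
To make the difference between before-and-after and repeated observation concrete, here is a small sketch in Python. The readings are invented for illustration, and the helper names `total_change` and `interval_changes` are hypothetical. It shows that the pre/post comparison yields a single number, while repeated measurement shows where in the program the change happened.

```python
# Hypothetical monthly systolic blood pressure readings (mmHg) for one
# participant. The first value is the baseline taken before the program.
readings = [150, 149, 148, 140, 134, 133, 133]


def total_change(values):
    """Pre/post comparison: last observation minus first observation."""
    return values[-1] - values[0]


def interval_changes(values):
    """Change between each pair of consecutive observations."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


pre_post = total_change(readings)
steps = interval_changes(readings)

# Interval 1 is baseline-to-month-1, interval 2 is month-1-to-month-2, etc.
largest_drop_interval = min(range(len(steps)), key=steps.__getitem__) + 1

print(pre_post)               # -17
print(steps)                  # [-1, -1, -8, -6, -1, 0]
print(largest_drop_interval)  # 3
```

With only the first and last readings, you would know that blood pressure fell by 17 points. With the monthly readings, you can also see that most of the drop came in the third and fourth months, which is the kind of information you need to judge how long an intervention has to run.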

If your program or effort is one with a specific, one-time goal – for example, the passage of a law, or the clean-up of a particular space – the temptation may be to evaluate it only by whether you reach your goal or not (i.e., a single observation at the end of the effort).  This would be a mistake, because it wouldn’t take into account the parts of your effort that were successful and why, whether or not you reached your goal.  That’s a piece of information you’ll need the next time – and there will be a next time – you or others in the community take on a similar effort.

  • At regular intervals during the evaluation period. You might choose any period from once an hour to once a month or more, depending on what you’re observing. The regularity makes observations easy to schedule, and gives an interval-by-interval picture of what’s going on.
  • At irregular intervals during the evaluation period. The reason for this schedule might be logistical (you observe when you can); might have to do with making sure that observations aren’t expected, so that you get a true picture of what you’re looking at; or might be an attempt to look at the program or effort randomly, again to try to get an accurate picture.
  • At specific times during the evaluation period. In this case, you might be concerned to see what happens or what participants are doing at different, identifiable times that imply different, identifiable conditions. In observing the use of that neighborhood park we mentioned earlier, you might want to be sure to go on weekdays, on weekends, in the morning, afternoon, and evening, at each of the four seasons, in rain, clouds, sun, and snow, and on days when there were special events in the park, to see who uses the park and how under different conditions and at different times.  If you’re monitoring the process and progress of the program, it’s important to make sure you observe each stage of it – the planning, the preparation, the implementation, the evaluation, and any follow-up – to make sure you get a full picture of what you did and how you did it. This will give you the information you need to analyze in order to make adjustments in how you conduct your work.
  • Continuously. When the observer is a staff member working with program participants (or one or more of the participants themselves), it may be possible to make ongoing observations. The observer in this situation might observe directly using checklists, keep a journal, ask participants to keep records, video- or audiotape sessions, or record what happens in some other way, so that there’s an ongoing, day-to-day account of the behavior and what is happening in the environment of the program.
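
If you choose irregular intervals in order to look at the program randomly, one way to pick the observation days is to draw them at random from the evaluation period. The sketch below is one possible approach in Python; the function name `random_observation_days`, the dates, and the number of observations are assumptions for the example.

```python
import random
from datetime import date, timedelta


def random_observation_days(start, end, count, seed=None):
    """Pick `count` distinct days between start and end (inclusive).

    A seed can be supplied so the same schedule can be regenerated later.
    """
    total_days = (end - start).days + 1
    if count > total_days:
        raise ValueError("more observations requested than days available")
    rng = random.Random(seed)
    offsets = rng.sample(range(total_days), count)
    return sorted(start + timedelta(days=offset) for offset in offsets)


# Hypothetical six-month evaluation period with twelve unannounced visits.
schedule = random_observation_days(
    date(2024, 1, 1), date(2024, 6, 30), count=12, seed=7
)
for day in schedule:
    print(day.isoformat())
```

A purely random draw can leave long gaps or miss particular conditions, so in practice you might combine it with observations at specific times, such as weekends or special events.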

At the local health center, some of the observation – particularly that of monitoring participants’ blood pressure and weight – would be done at regular intervals, during the monthly meetings.  There would also be some continuous observation – that of participants keeping track of their exercise programs in journals.  And finding out whether participants continued with their routines would be a one- or two-time follow-up – perhaps six months and a year after the program ended.

Define and describe the behaviors, products, conditions, and/or events that observers should be concerned with

If you want to be sure you know what observers are referring to in their reports, you have to be specific about what you want them to look for. The planning group, or a subgroup of it – the ideal would be a group that included a high proportion of people who will actually be researchers and observers – should set out identification standards for each element to be observed. These would explain what it looked like, when it was likely to occur, who would probably be involved, etc. For instance, to observe bullying or interpersonal violence on a playground would require clear definitions of this behavior, examples and non-examples, and scoring instructions. As a result, observers could all start with the same guidance about what they were looking for.

Design training for observers

Unless all the observation is to be done by those directly involved in the planning (not impossible in a small organization), and depending on their previous experience, observers might need to be trained in one or more areas:

  • What it’s important to record, and why. People who have no acquaintance with research might not realize how important it is to record such details as the date, time, evaluation length, place, and circumstances of any observation, a description of who was involved and for how long, whether there were unexpected people or conditions present, etc. An early morning observation might provide a different set of observations than a late afternoon or evening one, for example. The presence of other people in a situation or interview – relatives, friends, program staff – can change the character of the behaviors displayed or information offered. It’s crucial, therefore, that observers understand that the context of the observation can be as important as its content.
  • The definitions and descriptions of the behaviors, conditions, events, or situations to be observed. Careful definitions and descriptions of what’s to be observed won’t make much difference unless those who’ll do the observations are familiar with them.
  • Effects of observation. In some cases, the behavior of participants might change as a result of their reactions to being observed. Observers have to be aware of that possibility, and make their own behavior as invisible as they can, so as not to influence participants’ behaviors.

In addition to human observers, the presence of audio or video equipment can often have an effect.  One way to offset it is to wait to start collecting data until participants are used to the presence of the equipment.  It’s also important to get participants’ permission beforehand to use recording equipment.

  • Observer bias. Especially in situations where the observers are also program staff, their relationships with participants, or simply with the effort as a whole, may affect their reports or observations. If they particularly like or dislike a participant, that may have some influence on how they interpret or describe that person’s behavior. If they’re heavily invested in the success of the program being evaluated, they may – intentionally or unintentionally – put the best possible light on what they see. Whether or not they’re program staff, observers can also be influenced by their personal assumptions, their cultural, religious, or educational background, or their current psychological states or life circumstances. If they can be helped to recognize these biases and understand why and how they should be acknowledged or eliminated, there’s a better chance that they’ll conduct reliable observations.
  • Observer drift. Sometimes, after people have been observing for a while, their observations tend to take on a regularity based on the rules they make up rather than shared definitions based on a standard.  They might tend to rate the behavior of certain participants in ways based more on past experience than on what they see, for instance.

You may also have to correct for observer effects, bias, or drift over the course of the evaluation.  That’s part of devising checks for accuracy and reliability based on a standard.
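
One way to help observers remember the context of each observation is to give them a standard record to fill in. The following Python sketch shows what such a record might contain, based on the details listed above under what it’s important to record. The class name `ObservationRecord`, its field names, and the choice of which fields are required are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ObservationRecord:
    """Context for a single observation (hypothetical layout)."""
    observer: str
    started_at: datetime          # date and time of the observation
    duration_minutes: int         # how long the observation lasted
    place: str
    circumstances: str            # e.g. "monthly group meeting"
    people_present: list[str] = field(default_factory=list)
    unexpected_conditions: str = ""
    notes: str = ""               # the content of the observation itself

    def missing_context(self):
        """Return the names of required text fields that were left blank."""
        required = ("observer", "place", "circumstances")
        return [name for name in required if not getattr(self, name).strip()]


record = ObservationRecord(
    observer="Counselor A",
    started_at=datetime(2024, 3, 4, 8, 30),
    duration_minutes=30,
    place="Health center meeting room",
    circumstances="",  # left blank by mistake
    people_present=["participant", "participant's spouse"],
)
print(record.missing_context())  # ['circumstances']
```

A check like `missing_context` could be run as records are handed in, so that gaps are caught while the observer still remembers the details.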

Devise checks for reliability and accuracy

If your information is to be reliable, it’s important that when two observers record a particular behavior, they mean exactly the same thing. This is a matter of training (see above), and also one of checking, either at the beginning or periodically, to make sure that all observers are seeing things in the same way. A participatory design of the system will help here. If the observers are involved in defining what they’ll all be observing, there’s a much better chance they’ll all see it similarly.

Some ways you can try to ensure agreement among observers:

  • Use an external standard. One way to define what you’re looking for is to use a standard that’s used and accepted by all observers. “Behavior X looks like this, occurs in these circumstances, lasts for this long, and has these after-effects or results.”  The use of external standards often employs a checklist or something similar.  The observer checks off components of a behavior or condition, thus documenting what he sees in a way that matches how a different observer would score the same situation.  Such standards help assure the continued accuracy of the observational system.

Research teams and laboratories commonly use standards to assure agreement in identifying various conditions.  Each condition is described in detail, with various possible markers, such as measure of blood pressure or environmental toxins.

  • Check for inter-rater reliability. Inter-rater reliability is the research term for assessing whether all observers interpret the same things – behaviors, conditions, events – in the same way. One way to address it is to check observers against one another. Two or more are exposed to the same situations or information, and then their scores, such as counts of instances of bullying, are compared. If they all say essentially the same thing, then inter-rater reliability is high, and everything’s fine. Typically, in research, if observers agree 80% or more of the time, observations are deemed reliable. If they disagree about what they saw, then you have to find the source of the disagreement. They may define terms differently, or their backgrounds may bias them toward seeing the same thing in different ways. Whatever the case, you have to uncover differences, and find a way to help all observers see things similarly and accurately.
  • Use random third-party checks. A researcher, program director, or someone else who has a clear idea of what information is important and what various conditions or behaviors look like can observe in randomly chosen situations along with a regular observer to see if their observations match reasonably well.  If they disagree once, it may not mean much; but if they consistently disagree, there’s a problem.
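
The 80% agreement guideline mentioned above can be checked with a simple calculation. Here is a minimal sketch in Python, assuming two observers have each scored the same ten playground intervals; the scores and the function name `percent_agreement` are invented for the example. The same comparison could be used for a random third-party check, with the third party’s scores in place of the second observer’s.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share of observations on which two observers gave the same score."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both observers must score the same observations")
    if not ratings_a:
        raise ValueError("at least one observation is required")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical scores for ten observation intervals.
observer_1 = ["bullying", "none", "none", "bullying", "none",
              "none", "bullying", "none", "none", "none"]
observer_2 = ["bullying", "none", "bullying", "bullying", "none",
              "none", "bullying", "none", "none", "none"]

agreement = percent_agreement(observer_1, observer_2)
reliable = agreement >= 0.80  # the guideline described in the text

# Intervals where the observers disagreed, for follow-up discussion.
disagreements = [
    index
    for index, (a, b) in enumerate(zip(observer_1, observer_2))
    if a != b
]

print(f"{agreement:.0%}", reliable, disagreements)  # 90% True [2]
```

Simple percent agreement does not account for agreement that would happen by chance. Statistics such as Cohen’s kappa are a common alternative when that matters, although the text above refers only to the percentage.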

Determine how to review and adjust your observational system for the next evaluation

Here’s this section’s version of “keep at it.” Just like your program, your evaluation, including your observational system, should be evaluated and adjusted to be as effective as possible.

Now you’re ready to start collecting and analyzing data. With careful planning and good training, you should be able to get the information you need for your evaluation.

In Summary

In order to conduct an evaluation that allows you to see your program or effort clearly and to adjust and improve it, you have to have a way of collecting accurate and useful information about it. The observational system you use is the way you look at what you’re doing – at your own process, at participants’ behavior and their progress and results, at conditions that affect your effort or that your effort is trying to change – to gain the information that you’ll analyze to evaluate your work. That system has to be feasible within your resources, and has to fit with the nature of your program, so designing it is an important part of your evaluation.

The design of observational systems is best carried out as a participatory process, particularly one involving both researchers or evaluators and those who’ll do the actual data collection. That involvement will give them a clear understanding of the system itself, of what information is needed, and of the pitfalls to data collection that they might encounter along the way. The result should be a more reliable system, and, ultimately, more accurate data for your evaluation.

Stephen B. Fawcett
Phil Rabinowitz

Online Resource

Assessing Children's Physical Activity in Their Homes: The Observational System for Recording Physical Activity in Children-Home, an article written by McIver et al. (2009), provides a real-life example of observational systems in evaluations.

CDC Data Collection Methods for Program Evaluation: Observation is an article that helps users understand observation as a method for evaluation.

The University of Wisconsin – Extension provides an article on Collecting Evaluation Data: Direct Observation. This article by Taylor-Powell and Steele extensively discusses what and how to observe in an evaluation.

The National Collaboration on Childhood Obesity Research: Measures Registry is a searchable database of diet and physical activity measures relevant to childhood obesity research. The purpose of this registry is to promote the consistent use of common measures and research methods across childhood obesity prevention and research at the individual, community, and population levels. Obesity and public health researchers need standard measures to describe, monitor, and evaluate interventions, particularly policy and environmental interventions, and factors and outcomes at all levels of the socio-ecological model.

Print Resources

Bailey, J. (1977). A handbook of research methods in applied behavior analysis. Tallahassee, Florida: The author, Department of Psychology. (pp. 74-126).

Fawcett, S., et al. (2008). Community Tool Box Curriculum Module 12: Evaluating the initiative. Center for Community Health and Development. University of Kansas.