Today, developing an effective system for testing and assessing the skills and professional level of health care professionals is crucial, since progress in medicine and technology poses new challenges that these professionals must be able to overcome. Patient confidence is equally significant: patients cannot rely on health care professionals whose qualifications are in question. Moreover, health care professionals bear increased responsibility, because in the contemporary health care system the health and well-being of patients is their primary concern.
In such a situation, it is natural that specialists (Norcini, 1998) attempt to develop various systems for testing and evaluating the qualifications and professional level of health care professionals, starting from medical education. However, at the moment there is no universal, ideal test that allows an absolutely precise assessment of the skills and qualifications of health care professionals.
Nevertheless, there are various testing systems that are recognized by many professionals and widely implemented in contemporary medical education. One of the most widespread is the Objective Structured Clinical Examination (OSCE), which is used to test both students of medicine and working health care professionals. OSCE was developed in the 1970s, soon gained the recognition of specialists, and came into use in medical education.
Many specialists have noted the high reliability and effectiveness of OSCE as a system for testing health care professionals. Specialists (Kane et al., 1989) point to a number of advantages of OSCE over other systems of testing, including its high level of standardization and fair peer comparison. On the other hand, there is an opposing view, according to which OSCE is not as effective as it may seem and has a number of drawbacks.
It is therefore necessary to examine in detail the essence of OSCE and to analyze its basic advantages, drawbacks and validity in order to assess adequately the extent to which this system of testing is reliable and effective.
The essence of OSCE, its advantages and disadvantages
OSCE is a modern type of examination used in the health sciences, including medicine, physical therapy and nursing, to test clinical performance and competence in skills such as communication, clinical examination, medical procedures, prescription, exercise prescription, joint mobilization, manipulation techniques, and interpretation of test results. The scope of application of OSCE is therefore very large and spans various branches of the modern health sciences. Such wide application indirectly indicates that health care professionals are confident in the reliability of the examination.
The wide spread of OSCE in contemporary health science and medical education can be explained by the fact that it has been implemented successfully for many years and has already been researched in detail. Today, it is possible to find many scientific works and studies dedicated to OSCE. These studies evaluate not only the strengths and weaknesses of OSCE but also its reliability and validity.
At the same time, the use of new systems of testing and examination often encounters strong opposition from educators and health care professionals because such systems, being new, can be unreliable. New testing and examination systems are, as a rule, under-researched.
Therefore, they can hardly be applied in medical education, because educators are uncertain about their effects not only on students but also on patients who may be involved in an examination based on new, experimental systems. In this context, OSCE naturally seems more reliable because it has been researched in detail and does not evoke serious opposition, though it is still criticized as an imperfect testing system.
OSCE comprises a series of stations through which all candidates rotate on a timed basis. At each station, a candidate is faced with a simulated task or problem and is required to perform specific functions to complete the task or address the problem. OSCE stations can be interactive or non-interactive. Interactive stations typically involve “standardized patients”, actors who have been specially trained to portray patients with specific medical conditions or drug-related problems. Alternatively, interactive stations can involve “standardized clients”, actors or other health care professionals who have been specially trained to portray allied health care professionals in an interdisciplinary health care context. A candidate at an interactive station is observed and assessed by a trained examiner using a standardized marking key (Carpenter, 1995). Non-interactive, or quiet, stations typically require written responses to tasks or problems and involve no direct observation and assessment.
As a performance-based tool, OSCE has advantages over other forms of assessment, such as multiple-choice tests. OSCE provides ample opportunities to test and assess students’ professional and practical skills, as well as their professional knowledge, in situations created for the examination that resemble real life. This closeness to real-life situations is extremely beneficial for students. First of all, the interactive approach prepares students to work in a real health care environment, where they have to deal with real patients and solve their health problems without external assistance. In this respect, it is very important that OSCE gives students the possibility of communicating with actor-patients. Communication is very significant for health care professionals and their patients, especially for beginners, who are often inexperienced in communicating with patients and cannot always apply their theoretical knowledge and professional skills in real life. Real patients are likely to describe their health problems differently from the way medical students became used to during their studies. Young health care professionals who have never had the experience of interactive testing that OSCE provides can simply be confused and have significant problems diagnosing their patients’ health problems.
In addition, OSCE allows students to acquire important experience of patient psychology, since communication with actor-patients contributes to a better understanding of human psychology. Students learn to take into consideration the psychological factors that influence patients’ behavior. At the same time, the educators who assess students can evaluate the extent to which students are prepared to work in real-life situations, communicate with patients, interact with colleagues and, what is more, make decisions independently.
Thus, owing to its interactivity, OSCE allows educators to assess not only the professional knowledge and skills of students but also their communication skills, which are often very significant in the work of health care professionals. OSCE therefore proves beneficial for both students and educators, since it enlarges students’ experience and allows educators to assess students more objectively and adequately.
In addition, communication and interpersonal skills, ethical and professional judgment, and complex ethical problem identification and resolution skills may be assessed more effectively and efficiently through a well-designed OSCE than through other testing methods. Experience in different health care professions has established the reliability and validity of a well-constructed and well-implemented OSCE, which is another considerable advantage of OSCE over other testing methods.
Furthermore, the use of OSCE in high-stakes settings, such as certification examinations and maintenance-of-competency reviews, has demonstrated its value in assessing clinical competency. For instance, the Medical Council of Canada has used an OSCE in its entry-to-practice examinations since 1994 (Woodburn and Sutcliffe, 1996). Other health professions, such as chiropractic and chiropody, have also introduced OSCE in both undergraduate and professional education. In this way, OSCE is becoming a truly universal method applied in medical science and education.
On the other hand, OSCE has not been used extensively in some fields of medical science. In pharmacy, for instance, it was not used extensively because of the high costs and difficulties associated with developing and administering this form of assessment. Nevertheless, in recent years OSCE has become a commonly used system of testing, and its significance and effectiveness are recognized by many specialists working in medical science and the health care system (Martin et al., 1996).
One more advantage of OSCE is its objectivity, which is of the utmost importance in medical education, where errors can lead to extremely dangerous outcomes. Subjective assessment of students’ skills and abilities, for instance their overestimation, can result in a high risk of errors in students’ future professional careers. The high degree of objectivity of OSCE is achieved through the use of a detailed mark scheme and a standard set of questions. For instance, a station at which a candidate demonstrates to a simulated patient how to use a metered-dose inhaler would award points for specific actions performed safely and accurately, such as explaining to the patient the need for a seal around the mouthpiece (Singer et al., 1996).
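The checklist-based marking described in this paragraph can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `ChecklistItem`, `INHALER_STATION` and `score_station` are hypothetical, and, apart from the mouthpiece-seal item taken from the example above, the checklist items and point values are invented for demonstration and do not come from any published marking key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One observable action on a station's marking key."""
    description: str
    points: int


# Hypothetical marking key for a metered-dose inhaler station.
# Only the mouthpiece-seal item comes from the text; the others are illustrative.
INHALER_STATION = (
    ChecklistItem("Explains the need for a seal around the mouthpiece", 1),
    ChecklistItem("Shakes the inhaler before use", 1),
    ChecklistItem("Checks the patient's understanding", 1),
)


def score_station(checklist, performed):
    """Sum the points of the checklist items the examiner ticked as performed.

    `performed` is a set of item descriptions observed by the examiner.
    Descriptions that are not on the checklist are rejected rather than
    ignored, so a typo cannot silently change a candidate's mark.
    """
    known = {item.description for item in checklist}
    unknown = set(performed) - known
    if unknown:
        raise ValueError(f"Not on the marking key: {sorted(unknown)}")
    return sum(item.points for item in checklist if item.description in performed)


observed = {
    "Explains the need for a seal around the mouthpiece",
    "Shakes the inhaler before use",
}
print(score_station(INHALER_STATION, observed))  # prints 2
```

Because every examiner applies the same fixed list of items, two candidates who perform the same actions receive the same mark, which is the sense in which such a scheme supports objectivity.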
Analysis of the OSCE procedure reveals one more advantage of this testing method over others. OSCE normally consists of several short stations, at each of which the candidate is examined on a one-to-one basis by an impartial examiner with either real or simulated patients. This procedure is considered a substantial improvement over traditional examination methods, because the stations can be standardized, enabling fairer peer comparison, and complex procedures can be assessed without endangering patients’ health (Cohen et al., 1996).
Nevertheless, in spite of its obvious advantages over other examination methods, OSCE is still subject to substantial criticism. Critics of OSCE (Streiner and Norman, 1995) argue that its stations can never be truly standardized and objective in the same way as a written exam. It is possible to argue that such criticism comes mainly from the conservative part of educators and health care professionals, who stand for conventional methods of examination. Still, this criticism is justified, because it is impossible to replace written exams entirely with OSCE. In addition, different actor-patients have been known to afford candidates more or less assistance, and different marking criteria have been known to be applied. In such a context, it is hardly possible to speak of the absolute objectivity of OSCE, especially compared to written forms of examination.
Furthermore, another point of criticism is that at certain institutions it is not uncommon for the examiners to be members of the teaching staff who are known to the students, and vice versa. This familiarity does not necessarily affect the integrity of the examination process, but it is a deviation from anonymous marking. It is therefore possible to admit the possibility of subjective assessment of students in the course of OSCE. At any rate, this examination method is not as absolutely objective as its supporters attempt to present it (Hodges et al., 1997).
At the same time, OSCE has further drawbacks that are substantial and can affect the quality of the assessment and of the examination itself. Different students are given different patients with different presenting problems (Martin et al., 1996), which leaves room for unequal complexity of examination. Some students may have patients who, simply because of their better communication skills, explain their presumed problems more clearly than other patients do. In such a situation, it is easier for a student to identify and diagnose the health problem, solve it and perform his or her functions well. Other students may have difficulty understanding the essence of the patient’s health problem if the patient is less skillful in communication and presents the problem vaguely, even though the actual professional level, knowledge and skills of the students are equal. As a result, it is impossible to claim that all students are in an absolutely equal position during OSCE. In this respect, OSCE differs from conventional written testing, which places all students in an identical position, since they take standardized tests that are identical for all students.
Finally, returning to the problem of subjectivity in OSCE assessment, there is also a risk that examiners’ subjectivity results in inter-examiner variation in the assessment of the same performance.
Nevertheless, in spite of this significant criticism, the aforementioned drawbacks of OSCE can hardly outweigh its advantages.
The reliability and validity of OSCE
Any examination method should be reliable and valid, and OSCE is no exception. In order to assess the validity and reliability of OSCE, it is possible to refer to the study conducted by Jean Wessel and other researchers, entitled “Reliability and Validity of an Objective Structured Clinical Examination for Physical Therapy Students” (2003).
The purpose of this study was to examine the reliability and validity of an objective structured clinical examination (OSCE) for students in the first year of a 2-year physical therapy program. Forty-eight students were examined at eight stations in one of two duplicate OSCE circuits. The stations evaluated skills required for the management of persons with chronic musculoskeletal conditions. At each station, students were required to interact with patients, demonstrate techniques, or interpret observations. Checklists were used to score all stations, which had equal values.
The associations among stations were examined using Pearson’s correlations and Cronbach’s α. A two-way analysis of variance (ANOVA) with repeated measures was used to determine differences among stations and between circuits. The validity of the OSCE was evaluated by determining the correlations between scores on the stations and performance in a subsequent clinical practicum. Correlations between stations were r = -0.14 to 0.33 and between stations and clinical performance were r = -0.28 to 0.27. Cronbach’s α was 0.48. The ANOVA revealed significant differences among stations (F = 62.6, p < 0.001) but not between circuits (F = 1.8, p = 0.185). There was no significant interaction between circuit and station (F = 1.1, p = 0.356). There was poor internal consistency of the OSCE, and it did not predict clinical performance. Further research is required to determine if a larger number of stations can reliably and validly assess clinical skills of physical therapy students. J Allied Health. 2003; 32:266-269.
Objective Structured Clinical Examinations (OSCEs) presently are being used to evaluate clinical skills of many different health professional groups. The OSCE comprises a series of timed stations at which students/practitioners are evaluated on clinical skills. At each station, the examinee is asked to perform a specific task, such as taking a history or performing part of an assessment or treatment. The “patients” are standardized patients, individuals who are trained to perform a specific and consistent role. Written stations may be included to evaluate the students’ interpretation and application of findings or their ability to plan further assessment or treatment. Scoring of students’ performance is done by means of checklists of behavior considered necessary to perform the evaluated task or by global rating scales.
The results of OSCEs are being used in physical therapy to make decisions about licensing [1], pass/fail in a course, and readiness of students to begin a clinical practicum [5]. To be confident about these decisions, the examiners would want an instrument that is internally consistent [6] and that can predict successful clinical performance. Results from the medical literature [7,8] and one physical therapy study [9] reveal that correlations between OSCE stations are generally low. Twenty stations may be required to achieve an acceptable internal consistency [7,8]. In comparison, variability due to different raters and standardized patients is low [3,7]. Validity of the OSCE has been supported by the findings that medical students and residents perform better on an OSCE as they progress through their programs [7].
The purpose of this study was to examine the reliability and validity of an OSCE used to evaluate first-year physical therapy students. Reliability was evaluated by comparing two duplicate circuits of the OSCE and measuring internal consistency and correlations between specific stations. Validity was assessed by determining if there was any correlation between students’ OSCE scores and their subsequent performance in a clinical placement.
Subjects were 48 students completing the first year of a 2-year, post-baccalaureate physical therapy program. Each year of this program consisted of three units (semesters) of study over a 12-month period. In each unit, the students took three courses (problem-based tutorial, clinical skills laboratory, and inquiry seminar), followed by a clinical practicum. Students in this study already had finished the first two units, which covered physical therapy management of musculoskeletal conditions of the spine and extremities. They were in unit three, which focused on chronic musculoskeletal conditions. At the end of the 8-week academic component of the unit, the students took part in an OSCE that was part of the final evaluation for the clinical skills course. It was the first such examination taken by these students. The students went on to complete a 6-week clinical practicum.
There were 16 male and 32 female students in the study. Before entering the physical therapy program, all students had an undergraduate degree in another field; kinesiology/physical education, 27 (56%); science, 11 (23%); arts, 7 (15%); other, 3 (6%). The study was approved by the Educational Program Ethics Committee, and students gave their consent to use the results of the OSCE and their clinical practicum evaluation.
For the OSCE, the students were assigned randomly to one of two circuits. The eight stations in one circuit were identical to the stations in the other except for the examiner and the person playing the standardized patient role. At each station, students were required to interact with standardized or real patients (interactive stations), show techniques on models, or document and interpret observations from slides or videos. Checklists were used to score all stations. At the interactive stations, examiners evaluated the students’ safety and ability to communicate with the patient and the specific clinical skills. Each station lasted 5 minutes.
The OSCE was designed to evaluate clinical skills that were learned in the unit; that could not be evaluated easily in a written examination; and that covered communication, assessment, and treatment skills. The scenarios and checklists for each station were developed by consensus by the two course instructors, considering the feedback of students, faculty, and clinicians who had participated in unit 3 OSCEs in previous years.
The evaluators were practicing physical therapists. Before the OSCE, they were instructed on how to use the checklists for their stations and told to add comments if they were unsure how to score an item. The two therapists and two patients at the identical duplicate stations practiced together to standardize further the patient simulations and the use of the checklist. The total possible score for each station was 12.
The Physical Therapist Clinical Performance Instrument (CPI) [10] was used to evaluate the students’ clinical performance in the subsequent clinical placement. This instrument consists of 24 items, each of which is graded on a 100-mm visual analogue scale with a left anchor of novice clinical performance and a right anchor of entry-level performance. The “final” score used in the present study was the mean of all items that were graded by the therapist supervising the student’s clinical practicum. That is, if the clinical supervisor considered an item not applicable to the student, it was marked as such and was not considered in the final score. In a study of 44 physical therapy academic programs, the CPI showed good interrater reliability (r = 0.87), internal consistency (Cronbach’s α = 0.97), and construct validity.
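The rule for the “final” CPI score, the mean of only those items the supervisor actually graded, can be sketched in a few lines of Python. This is a minimal illustration rather than the study’s own procedure: the function name `cpi_final_score` and the use of `None` to mark a not-applicable item are assumptions made for the example.

```python
def cpi_final_score(item_scores):
    """Return the mean of the graded CPI items, in mm on the 0-100 scale.

    `item_scores` holds one entry per CPI item: a number from 0 to 100 for a
    graded item, or None when the supervisor marked the item not applicable.
    (Using None as the marker is an assumption of this sketch.)
    """
    graded = [score for score in item_scores if score is not None]
    if not graded:
        raise ValueError("No applicable items were graded")
    for score in graded:
        if not 0 <= score <= 100:
            raise ValueError(f"Score outside the 100-mm scale: {score}")
    # Not-applicable items are excluded from both the sum and the count.
    return sum(graded) / len(graded)


# Four hypothetical items, one of which was marked not applicable.
print(cpi_final_score([80.0, None, 60.0, 70.0]))  # prints 70.0
```

Excluding not-applicable items from the denominator, rather than counting them as zero, keeps a student from being penalized for skills the placement gave no opportunity to show.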
The clinical supervisors who evaluated the students had been trained in the proper use and scoring of the CPI. In addition to the overall mean score, scores on some individual items of the CPI were used in the analyses because the items evaluated skills assessed in the OSCE. The selected items were (1) safety, (6) communication, (11) physical therapy examination, (12) interpretation of findings, (14) performance of interventions, and (15) education of others.
Pearson’s coefficients were used to examine interstation correlations of the OSCE, whereas Cronbach’s α was used as a measure of internal consistency. A two-way analysis of variance (ANOVA) with repeated measures (circuit versus station) was used to determine differences among stations and between circuits. The Shapiro-Wilk statistic indicated that the distributions of the mean CPI score and five of the six selected items were significantly different from a normal distribution. Spearman rank correlations, which do not assume normality, were therefore calculated to analyze the relationships between OSCE and CPI final and item scores.
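To make the internal-consistency measure concrete, the following Python sketch computes Cronbach’s alpha from a table of station scores using the standard formula alpha = k/(k - 1) × (1 - sum of station variances / variance of total scores), where k is the number of stations. It is a minimal sketch, not the analysis code of the study: the function name `cronbach_alpha` and the tiny data set are hypothetical, and only the Python standard library is used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Compute Cronbach's alpha for a students-by-stations score table.

    `scores` is a list of rows, one per student; each row lists that
    student's score at each of the k stations, in the same station order.
    """
    if len(scores) < 2:
        raise ValueError("At least two students are required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("Every student needs scores for the same k >= 2 stations")
    # Sample variance of each station (column) across students.
    station_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each student's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("Total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(station_variances) / total_variance)


# Hypothetical scores of three students at two stations.
example = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
print(round(cronbach_alpha(example), 3))  # prints 0.667
```

Alpha rises toward 1 when students who do well at one station also tend to do well at the others, so low correlations between stations, as reported in this study, necessarily produce a low alpha.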
Regarding the results of the research, the ANOVA revealed significant differences among stations (F = 62.6, p < 0.001) but not between circuits (F = 1.8, p = 0.185), and there was no significant interaction between circuit and station (F = 1.1, p = 0.356). The mean of the final score for the CPI is listed in Table 4 of the study along with the mean scores for the selected items. The low interitem correlations and Cronbach’s α (0.48), and the significant station effect of the ANOVA all indicated that there was poor internal consistency in this OSCE. The lack of a significant interaction or circuit main effect supported the reliability of the examiners and standardized patients. These findings are similar to findings reported for medical and nursing students [3,5] and for physical therapy students performing specific musculoskeletal testing.
In the present study, the OSCE included three written stations where students were required to interpret observations or physical findings. To see whether the inclusion of written stations affected internal consistency (as suggested in the medical literature), the two types of station were compared. The means were lower and the variance greater for the written stations, but correlations between stations of a similar type (interactive or written) were not higher or lower than correlations between stations of a different type. It seems that the written stations were more “difficult” than the interactive ones, but did not greatly affect an already low internal consistency.
The inability of the OSCE to predict clinical performance may be due to characteristics of either the OSCE or the CPI.
The OSCE had only eight stations and was just one method of measuring clinical competence. Perhaps a greater number of stations would increase the internal consistency and the predictive validity of the OSCE. The CPI measures global clinical performance and includes many aspects of “professional behavior” that are not examined easily in an OSCE format. The researchers found only six CPI items that addressed skills assessed on the OSCE. These items still failed to correlate significantly with the OSCE scores.
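The suggestion that more stations might raise internal consistency can be explored with the Spearman-Brown prediction formula, which estimates the reliability of a lengthened test. The Python sketch below applies it to the figures reported above (eight stations, Cronbach’s α of 0.48). Several points are assumptions of this sketch and not claims of the study: alpha is treated as the reliability estimate, the added stations are assumed to be comparable to the existing ones, the target of 0.80 is a conventional illustrative threshold, and the function names are hypothetical.

```python
import math


def spearman_brown(reliability, length_factor):
    """Predict reliability when a test is lengthened by `length_factor`."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def stations_needed(current_alpha, current_stations, target_alpha):
    """Estimate how many comparable stations would reach `target_alpha`."""
    if not (0 < current_alpha < 1 and 0 < target_alpha < 1):
        raise ValueError("Reliabilities must lie strictly between 0 and 1")
    # Spearman-Brown solved for the lengthening factor.
    factor = (target_alpha * (1 - current_alpha)) / (current_alpha * (1 - target_alpha))
    return math.ceil(current_stations * factor)


# Reported values: 8 stations, alpha = 0.48. The 0.80 target is an assumption.
print(stations_needed(0.48, 8, 0.80))  # prints 35
print(round(spearman_brown(0.48, 20 / 8), 2))  # predicted alpha for 20 stations
```

Under these assumptions, roughly 35 comparable stations would be needed to reach 0.80, while twenty stations would be predicted to give an alpha of about 0.70. The estimate is only as good as the assumption that new stations behave like the existing ones.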
The results might be different with senior students or graduates. The OSCE was the first one performed by the students in this study. Many commented on their anxiety during the OSCE and believed that their performance did not reflect their abilities accurately. They had completed only one major clinical practicum before this OSCE, and had not yet started the clinical practicum related to this academic unit.
The research concludes that, although there was consistency between duplicate circuits of the same stations in this OSCE, the internal consistency was low, and the total score was unable to predict subsequent clinical performance. An eight-station OSCE should not be used in isolation to make decisions about clinical competence.
Thus, taking into account all of the above, it is possible to conclude that OSCE is a widespread examination method applied in different fields of medical science. OSCE has proved to be an effective and efficient method of examining students. However, it cannot be viewed as ideal or perfect, since along with its advantages it has considerable drawbacks. On the one hand, OSCE proves very helpful for examining students’ practical skills and abilities, including their ability to solve complex problems and communicate with patients. On the other hand, the objectivity of OSCE may be questioned, since, compared to conventional written tests, OSCE seems to be less standardized and objective. This means that, regardless of its numerous advantages, OSCE cannot totally replace conventional methods of assessing students’ skills, abilities and professional knowledge. This is why it is possible to recommend using OSCE in combination with other examination methods in order to reduce the risk of error in the process of assessment and to obtain a broader and, therefore, more objective assessment of medical students.
Nevertheless, OSCE is a reasonably reliable and valid examination method, and its current wide use is justified.