Quality Assurance Program for Teaching

What would a quality assurance program for teaching at the higher-education level look like? We don’t have one now, nor even much to build on, but perhaps there are analogous programs we could adapt or copy. I think there are, and I will suggest an approach below.

But first: We are starting from a very modest base (not of teaching but of teaching support). As I railed in my previous post, what we have now is an incentive system under which professors are individually rewarded with retention or raises if they get adequate scores from students in surveys administered at the end of courses, surveys that do not reflect learning. I have been assured that we have departments in which high SETs are a tenure liability, but that’s too depressing to dwell on.

We have some ancillary activities, like a teaching and learning center that provides a lot of on-line resources and some training if anyone asks (and whose staff is highly informed and dedicated). Almost no one ever asks (the last event they put on at Cal drew about 40 people from a faculty of a thousand-odd, including a fair number of lecturers and staff). We require one two- or three-unit course for our GSIs (graduate student instructors, the equivalent of TAs at other schools), but of course this only trains the one prof who teaches it, plus the very few of our own grad students we eventually hire. And we give an annual teaching award, for which the first hurdle is spotless SETs, with no mechanism for the winners to diffuse and replicate what they do well. There is an annual teaching seminar that meets monthly, which usually has trouble recruiting a dozen participants, so in eighty years it might reach all of us.

Several of my colleagues, at Cal and elsewhere, assert firmly that our teaching is actually very good. Our alumni are certainly in demand. I am happy to stipulate that our teaching is superb, and that teaching at Berkeley deserves A+ across the board, with a cherry on top. We are all really great teachers, bow, exeunt stage left with armfuls of flowers.

But I don’t care! No action follows from that proposition: the operational question is not whether to pat ourselves on the back some or a lot, but whether there are things we could do that would cause enough more learning to be worth doing. If there are, and we are doing C work for our students, we should do them; and if we are doing A work, we should also do them. Absolute-scale measures are managerially pretty much useless. When I critique a student paper draft, the advice I give about how it could be [even] better is worth about a hundred of the letter grade itself. If you still think high absolute performance is a license not to seek improvement, ask yourself whether you would fly on an airline whose maintenance principle was “if it ain’t broke, don’t fix it!”

Some other colleagues, mostly economists, believe incentives are everything: if we pay faculty enough more for better teaching, and punish them enough for bad teaching, the market will waft us to an optimum. After all, Pharaoh beat the Hebrews if they didn’t work hard, fed them if they did, and got a nice pyramid, right? Incentives do matter, but fear of firing and money rewards are not well-suited to this particular population, which operates pretty far up the hierarchy of needs. Anyway, if you can’t observe good performance (cf. Philip Stark’s discussion of SETs), if the workforce doesn’t know how to effect it, and if they have the wrong tools, all the incentives in the world won’t work.

Finally, there is assuredly a production possibility frontier in research-teaching space. It slopes down monotonically and is concave to the origin. If we were on it, any improvement in student learning would come at the expense of some research productivity (it still might be worth it, but that’s a tough sell). But this is another misuse of good economic theory, like thinking a market equilibrium is where the world is rather than where it is always groping towards. As I learned from Bob Leone, one of the real live paid professional economists who have taught me so much good stuff, no real organization is ever at its PPF for any pair of output measures, and if it were, it would not be next week, as the PPF moves outward with organizational learning and technological advance. Indeed, organizations without good QA systems are always quite far from their PPF.

The wise manager assumes she can move up or to the right or both, and is almost always correct; the foolish manager assumes she is on the PPF and wanders back and forth along where she thinks it is, like the tiger pacing along remembered cage bars.

Let’s start where college faculty should be comfortable: we have a highly developed QA system for research that has, by near-universal agreement, made our research the wonder of the world and getting better all the time. The way that goes is that we

  • Collaborate on papers and projects,
  • Read each other’s work and cite it carefully in our own,
  • Seek out experts and advice, for example on methodological issues, and
  • Coach each other in institutionalized ways, like journal prepublication reviews and conference presentations.

The coaching is detailed and multidimensional: when I review an article I’m usually asked to make a coarse summary judgment like “publish/revise & resubmit/outer darkness” but that’s the least useful part of the process for the author and for me, and I always write a detailed critique. Both the author and I improve our practice through this coaching.

There is some measurement, like impact scores and citation indices, but I don’t think any of us would substitute that in a tenure review for actually reading stuff someone wrote, and there is no solid quantitative research to prove that this or that research methodology is best, or that this or that type of collaboration or critique is good and another bad. Yet we plug along doing it, and research gets better and better.

This template suggests some practical options for teaching:

  • Collaborate on curriculum and co-teach courses,
  • Watch each other’s students learn, visiting classrooms and reviewing assignments and syllabuses,
  • Seek out research-based teaching expertise and knowledge, and
  • Coach each other [note: not, generally, have a “master teacher” grade junior colleagues at tenure time; coach each other. Research is full of 360-degree review.]

Existing QA for pedagogy, at least in higher ed, has none of those things. None. Industrial QA is built in large part on watching each other work and talking about what we see (the hot idea in coding now is to do it in pairs, one person typing and the other watching while they talk about what they’re doing). It certainly works for Google and Toyota, and management is like teaching in so many respects…

Now that I think of it, people in every high-performance profession, from musicians to scientists to writers to fighter pilots, flock so they can help each other get better, and they have done so since forever. They spend very little time grading each other but a lot of time talking shop about why and how this or that works. J.D. Salinger lived alone in a cabin in New Hampshire, but he’s an uninformative exception; if you want to find writers, they’re in cafés in New York and London and San Francisco, and astronomers are at conferences and schmoozing in the common room, not sitting alone on mountaintops. Every high-performance profession, except college professors in teaching mode!

Of course, it could be that research would have advanced much faster if everyone did it alone and we did it all with incentives, paying profs according to citation indices (note collaboration sneaking in here already), or patent revenues/book royalties. Does anyone believe that?

Deming’s fourteen points are full of good guidance, though his emphasis on uniformity and consistency needs special handling in a service industry where diversity in the product is a feature and not a bug. Deming also specifically cautions against things like rewarding individuals for success, because you will mainly be rewarding random variation, you will destroy team morale, and you will set the winner up for resentment and disillusion when regression towards the mean takes hold next year.
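Deming’s warning about rewarding random variation is easy to check numerically. Below is a minimal simulation sketch, with every parameter invented for illustration: a hundred instructors of identical true quality, each observed through a noisy annual score, with the top scorer “rewarded” each year.

```python
import random

# Minimal sketch (all numbers invented): N instructors of IDENTICAL
# true quality, observed each year through a noisy score. Rewarding
# the top scorer mostly rewards noise, and the winner's score the
# following year regresses toward the mean.
random.seed(1)

N = 100            # instructors
TRUE_QUALITY = 70  # everyone identical, on purpose
NOISE_SD = 10      # year-to-year measurement noise
TRIALS = 1000

total_drop = 0.0
for _ in range(TRIALS):
    year1 = [TRUE_QUALITY + random.gauss(0, NOISE_SD) for _ in range(N)]
    winner_score = max(year1)                          # this year's "best teacher"
    next_year = TRUE_QUALITY + random.gauss(0, NOISE_SD)
    total_drop += winner_score - next_year

print(f"average drop in the winner's score next year: {total_drop / TRIALS:.1f} points")
```

With these numbers the winner’s score falls by roughly 25 points on average, though nothing about anyone’s teaching changed; the award goes to the noise.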

Chris Argyris and Donald Schön had useful insights about organizational learning. My favorite is the instruction to interrupt learned behavior and force attention to it, and the first behavior I would interrupt is our focus, first, on curriculum and, second, on teacher behavior, never getting to what students are doing, even though that’s where the learning is happening. Of course, that can’t even start until management puts us, kicking and screaming, into a room together to talk about learning in the first place. Drive out fear.

What Deming, a statistician (with the soul of a psychologist and the calling of a prophet), really likes is measuring stuff. Not to pick winners and losers, but to understand processes and identify excursions either way that lead to learning if examined. What can we measure about learning?

Well, we can give examinations and look at test score improvement. There is a lot wrong with this, perhaps the subject of another post, but I want to stay away from the question of what I think good teaching practice is; the point of a QA program is to learn exactly that (including learning it from real research by others). We could also distinguish between (i) the average learning of a class and (ii) the grade each individual student deserves, and assess learning by taking a sample of the students and giving them an oral exam that doesn’t count towards a grade, maybe even paying them for their time. In some cases, we can assess learning by performance in follow-on courses. Wieman’s group has developed and validated standard examinations, for use before and after an introductory course, to assess learning in physics and chemistry. And of course, SETs provide lots of useful information if we use them properly.
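A statistic commonly reported with such before-and-after instruments is Hake’s normalized gain, which asks what fraction of the available headroom a class actually gained. A minimal sketch; the section labels and scores below are invented for illustration:

```python
# Hake's normalized gain for a pre/post instrument scored 0-100%:
# g = (post - pre) / (100 - pre), i.e. the fraction of the headroom
# above the pretest score that the class actually gained.

def normalized_gain(pre_pct: float, post_pct: float) -> float:
    """Class-average normalized gain on a 0-100% instrument."""
    if pre_pct >= 100:
        raise ValueError("no headroom left to gain")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# Invented class averages for two hypothetical sections:
print(f"section A: g = {normalized_gain(45.0, 58.0):.2f}")  # 0.24
print(f"section B: g = {normalized_gain(45.0, 76.0):.2f}")  # 0.56
```

The appeal of g over a raw score difference is that it doesn’t penalize a class that walks in already knowing a lot.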

I don’t have any good ideas about what to measure in Theory T (lecture-style) teaching, mostly because I’ve stopped doing it to students. But in a “flipped-classroom”, active-learning, Theory C environment, there are all sorts of things to measure that could be illuminating. For example, I try to get a TA to sit behind the class with a stopwatch for a few sessions and record what fraction of the class time I am talking, and what fraction students are. Then I put a graph of the results up on the screen and invite the students to discuss what we are doing, note a trend, etc. Getting them to debate the optimal value of this indicator pays off nicely (no, I don’t know what it is; if I did I would just tell them). Other promising measures that GSIs (or a visiting coach) could observe include the following (a sketch of how such a log might be tallied appears after the list):

  • The average number of students who speak between interventions or comments by the prof,
  • The average length of a student contribution,
  • The fraction of the class that speaks during a session,
  • The time the prof waits after posing an interesting question before giving a hint,
  • The number of times students directly address each other,
  • The variance over different students in the average number of days between contributions, and
  • The average number of hands in the air at any moment.

As usual, variance in these measures is worth ten of absolute value, and trend is worth twenty.
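To make the stopwatch exercise concrete, here is a minimal sketch of how a GSI’s observation log might be tallied into a few of these measures. The log format, the speaker labels, and the times are all invented for illustration:

```python
from collections import Counter

# Minimal sketch of tallying a GSI's observation log. The format is
# invented: (seconds_elapsed, speaker) at each change of speaker, with
# "PROF" for the instructor, "END" closing the session, and anything
# else a student.
log = [
    (0, "PROF"), (140, "S1"), (155, "PROF"), (180, "S2"),
    (210, "S3"), (250, "PROF"), (300, "S1"), (330, "END"),
]

talk_time = Counter()   # seconds of talk per speaker
turns = Counter()       # number of turns per speaker
for (start, speaker), (end, _) in zip(log, log[1:]):
    talk_time[speaker] += end - start
    turns[speaker] += 1

total = sum(talk_time.values())
student_time = total - talk_time["PROF"]
students = [s for s in turns if s != "PROF"]

print(f"prof talking:              {talk_time['PROF'] / total:.0%}")
print(f"students talking:          {student_time / total:.0%}")
print(f"distinct student speakers: {len(students)}")
print(f"average student turn:      {student_time / sum(turns[s] for s in students):.0f} s")
```

Collected session after session, tallies like these are exactly what yield the variance and the trend.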

That’s just for class time; there’s lots we could measure about assignments, critique of student work, etc., all with an eye to improving our understanding of what we are doing, why, and what we could do differently. What’s the most important part of this? No question in my mind, it’s breaking the crippling isolation in which we work.

We are trapped in it by our instinctive misunderstanding of where the PPF is in an environment where research will always be non-negotiable, and by the fear I described in the previous post. But fear can be driven out, and QA in the present context has the advantage that the work force is enormously curious and dedicated. We just need institutional norms and routines that open some windows across the airshaft, and those are the duty of leadership.

Author: Michael O'Hare

Professor of Public Policy at the Goldman School of Public Policy, University of California, Berkeley, Michael O'Hare was raised in New York City and trained at Harvard as an architect and structural engineer. Diverted from an honest career designing buildings by the offer of a job in which he could think about anything he wanted to and spend his time with very smart and curious young people, he fell among economists and such like, and continues to benefit from their generosity with on-the-job social science training. He has followed the process and principles of design into "nonphysical environments" such as production processes in organizations, regulation, and information management and published a variety of research in environmental policy, government policy towards the arts, and management, with special interests in energy, facility siting, information and perceptions in public choice and work environments, and policy design. His current research is focused on transportation biofuels and their effects on global land use, food security, and international trade; regulatory policy in the face of scientific uncertainty; and, after a three-decade hiatus, on NIMBY conflicts afflicting high speed rail right-of-way and nuclear waste disposal sites. He is also a regular writer on pedagogy, especially teaching in professional education, and co-edited the "Curriculum and Case Notes" section of the Journal of Policy Analysis and Management. Between faculty appointments at the MIT Department of Urban Studies and Planning and the John F. Kennedy School of Government at Harvard, he was director of policy analysis at the Massachusetts Executive Office of Environmental Affairs. He has had visiting appointments at Università Bocconi in Milan and the National University of Singapore and teaches regularly in the Goldman School's executive (mid-career) programs. At GSPP, O'Hare has taught a studio course in Program and Policy Design, Arts and Cultural Policy, Public Management, the pedagogy course for graduate student instructors, Quantitative Methods, Environmental Policy, and the introduction to public policy for its undergraduate minor, which he supervises. Generally, he considers himself the school's resident expert in any subject in which there is no such thing as real expertise (a recent project concerned the governance and design of California county fairs), but is secure in the distinction of being the only faculty member with a metal lathe in his basement and a 4×5 Ebony view camera. At the moment, he would rather be making something with his hands than writing this blurb.

15 thoughts on “Quality Assurance Program for Teaching”

  1. Small point: Pair programming is starting to decline in popularity… but it’s being supplanted by even more collaborative agile methods, so your basic point still stands. And for what it’s worth, my institution (not even close to being in Berkeley’s league) has much the same issues, incentives that are either lacking or perverse, etc.

  2. Of course, even with this approach, you are still measuring inputs. I’m not sure how to do this, but shouldn’t there be some effort at measuring how well the students actually learn? Ultimately, that’s one of the really hard problems in measuring the quality of teaching: what exactly are you trying to measure?

    In the primary and secondary schools, at least there is some presumption that most students need to learn the same basic core, so it should be possible to make tests for proficiency gains, at least in principle. But at the college level, especially in elective or distribution-requirement courses, where each student chooses what to take, how exactly do you show that the teaching was effective? Make some kind of cross-university standardized test per subject area?

    I applaud you for trying to find answers, and wish you luck.

  3. I would definitely fly on an airline whose policy was “if it ain’t broke, don’t fix it” as long as their other policy was “determine before every flight whether or not it’s broke.” In fact, I think it’s safe to say that those ARE most airlines’ policies. 🙂

    1. Any airline following that policy would be grounded quickly. I’m pretty sure commercial airliners have an extremely strict maintenance schedule, which has to be performed and checked in a way which creates a complete audit trail. And that would include replacement of various components after a certain number of operating hours or a certain number of takeoffs and landings, regardless of whether they’re “broken” or not.

    2. No you wouldn’t, because you would be assuring that when anything broke, it would break in midair. What RichardC says. Parts are swapped out routinely on strict statistical MTBF schedules. Airlines update components and practices that are working fine when something better comes along, like all the business class seats on several airlines recently. They adopted computerized reservations while the old paper system worked. In the US southwest deserts there are enormous parking lots
      http://www.dailymail.co.uk/sciencetech/article-2336804/The-great-aviation-graveyard-New-aerial-images-hundreds-planes-left-die-American-deserts.html
      for whole planes that flew in just fine and are not broken, but have been idled in favor of better equipment.

    3. Well, there’s conjecture, and then there’s FAA regulation.

      In the USA, aircraft airworthiness inspections must be performed every 100 flight hours for commercial airliners, per FAR 91.409. Every airliner in operation must maintain a valid Certificate of Airworthiness showing that it has passed inspection. “If it ain’t broke don’t fix it” is fine as long as everything is regularly inspected to make sure it ain’t broke, or showing signs of poor condition (see FAR 43 Appendix D at the first link).

      These regulations don’t require “replacement of various components after a certain number of operating hours or a certain number of takeoffs and landings, regardless of whether they’re ‘broken’ or not”, nor do they cover “updat[ing] components and practices that are working fine when something better comes along”; that’s just good business sense, as is grounding perfectly good aircraft in massive junkyards because they have been replaced by much more efficient aircraft.

      1. Freeman, interesting, but your explication of FAA regulations is incomplete. Yes, airliners require regular airworthiness inspections, but that is not the only inspection and maintenance they require. LOTS of components on aircraft have specific Times Between Overhaul, and for commercial operators TBOs are absolutely mandatory. That means that part must come out. (It does not necessarily imply new replacement, but it does require that the part be brought back to nominal specifications.) Moreover, the FAA frequently issues Airworthiness Directives as issues with aircraft are discovered. Some ADs require special inspections and/or replacements after a certain number of hours and sometimes even immediate replacement without inspection.

        Also, if you are operating an aircraft for scheduled passenger service you must contend with the totality of Part 121, which is voluminous, and which includes gems like the one below, directly applicable to this article (unlike most of this comment thread).


        §121.373 Continuing analysis and surveillance.
        (a) Each certificate holder shall establish and maintain a system for the continuing analysis and surveillance of the performance and effectiveness of its inspection program and the program covering other maintenance, preventive maintenance, and alterations and for the correction of any deficiency in those programs, regardless of whether those programs are carried out by the certificate holder or by another person.

        (b) Whenever the Administrator finds that either or both of the programs described in paragraph (a) of this section does not contain adequate procedures and standards to meet the requirements of this part, the certificate holder shall, after notification by the Administrator, make any changes in those programs that are necessary to meet those requirements.

        (c) A certificate holder may petition the Administrator to reconsider the notice to make a change in a program. The petition must be filed with the FAA certificate-holding district office charged with the overall inspection of the certificate holder’s operations within 30 days after the certificate holder receives the notice. Except in the case of an emergency requiring immediate action in the interest of safety, the filing of the petition stays the notice pending a decision by the Administrator.
        Airworthiness by the FARs is a pretty broad legal concept. Normal regular airworthiness inspections are part of it, by no means all of it.

        1. Thanks for clarifying, David. Your explication of the A&P side of aircraft maintenance is indeed more complete than the one I offered. My particular expertise is on the avionics side, which is probably why I consider routine maintenance stuff that involves bringing a part back to nominal specifications to be akin to fixing something that’s broken, even though the component may have been functioning just fine and most people would probably not consider it “broken”. We document all discrepancies and corrective action just the same whether out-of-spec (or even too close to it) or malfunctioning, and in my mind it has become pretty much the same thing after 30 years of the routine. Assuring tight adherence to specifications through a regular regimen of inspection, readjustment/replacement, and recertification is how we minimize the incidence of things breaking down in midair. But you’re right — it all involves quite a bit of fixing things that ain’t actually broke when you think about it that way.

          I gratefully concede the point. Thanks for the schoolin’!

          1. David is a graduate of the Goldman School. You argue with our alumni at your own peril, Freeman :-).

          2. Yes, Michael, well after all I’m just a simple midwestern farm boy, a graduate of the Autodidactism School, so I takes my schoolin’ wherever I find it. 😉

          3. No problem. I hope I wasn’t rude. I am not an A&P but a private pilot and I’ve had some experience helping various flying clubs I’ve been a member of remain compliant with their responsibilities, but that’s all Part 91 and comparatively simple as you know. Anything I might know or not know outside that realm is pure aviation geekery. 🙂

          4. Dave J: Rude? Not in the least. I’m always delighted to be enlightened. Trust me, I can hold my own when it comes to an actual argument, but in this case I found myself in agreement with you.

            The “simple midwestern farm boy” bit was directed at Michael. I’m still a bit peeved with him over a recent post that I took offense to. Hopefully he has continued to follow the story and is able to see us as more than a “bunch of curdled Babbitts [who] serve[] up all the teenage girls in town as sexual toys, like a box of candy, to louts who entertain the good people playing football”.

  4. It would be easy to start with online courses, which are very accessible to peers. Of course, they are limited to an impoverished subset of types of learning interaction. For that reason, they may be handier for developing the tools of the O’Hare revolution.

    Intensive peer review of teaching is time-consuming and may not be feasible for all courses. But if you did it for some, these would become the gold standard for the teaching component of tenure review. It would become impossible to get tenure with an empty record of peer-reviewed teaching. The process is far more important than the score.

    Mo’H: “.. people in every high-performance profession, from musicians to scientists to writers to fighter pilots, flock so they can help each other get better ..”
    Darwin is a possible counterexample, after 1850 or so. But his progressive physical isolation in the country coincided with Rowland Hill’s radically improved postal service. Darwin, like most other intellectuals of the period, wrote thousands of letters to other biologists and to working-class pigeon-fanciers. They worked in a rich virtual community.
