Traits and management

K-12 education has been convulsed for years by the idea that good teaching is a trait, a tacit justification for all the versions of the loony idea that we can increase learning by just finding the ‘bad teachers’ and firing them. The latter scheme looks even better if “finding” employs a bureaucratic, mechanistic process of testing students (on things that can be measured “objectively”; bye-bye art, music, creativity, and courage). The alternative idea is that people with widely varying intrinsic qualities, or starting points, can all learn to be better teachers. Both are obviously correct to some degree; at the time they get control of the chalk, some people have better “teacher traits” than others, and it must also be the case that practice, training, and coaching can improve anyone’s performance at this job, like all others. But the relative weight placed on trait and learning theories of effectiveness matters a lot.

Administrators and politicians love what I call immaculate corrections: schemes, like student testing for teacher promotion, that excuse managers from all the heavy lifting of retail attention to what subordinates and customers are actually doing and why they do it. If you can couple impersonal performance assessment with a theory of motivation that puts greed (for a raise) and fear (of dismissal) in play, and delegate the implementation labor to people who aren’t on your payroll and can’t defend themselves against having their time wasted (the students), it’s a hat trick. The only defect of a scheme like this is that it doesn’t deliver much value in the classroom (or wherever), but that’s a feeble weapon with which to confront an internally consistent and theoretically beautiful construct that lets managers out of doing a lot of real work.

Alison Gopnik’s WSJ column has more on the costs of using the trait model, retailing this recent paper [paywall]: people in academia who believe traits count for a lot seem to (i) gather in particular disciplines and (ii) have a lot of trouble engaging women and African-Americans as peers, presumably because their trait beliefs come wrapped up with familiar stereotypes about what kind of people are (intrinsically) smart. Gopnik:

Professors of philosophy, music, economics and math thought that “innate talent” was more important than did their peers in molecular biology, neuroscience and psychology. And they found this relationship: The more that people in a field believed success was due to intrinsic ability, the fewer women and African-Americans made it in that field.

This should be sort of a bombshell, but it’s been a busy few weeks. We’ve known for a while that the student evaluations of teaching we use at Cal (to the near-exclusion of anything else) for promotion and tenure decisions don’t have much to do with student learning. Indeed, our administrative higher-ups are reflecting deeply on the fell implication that maybe we should (i) do more observation and coaching with an eye to actually improving teaching before review time, when it could actually be useful, and (ii) evaluate teaching for promotion in some way that actually indicates whether students are learning. Of course, both of these involve actual work, while SETs produce numbers (which must be Data, right?) and don’t cost us (faculty) anything to obtain, so it’s a tough call.

This call has got a lot tougher with the appearance of the first study known to me [HT: Philip Stark] in which students could register their evaluations without knowing the actual sex of the instructor, using an on-line course in which the same teacher presented as a male and as a female, and hooboy:

Students in the two groups that perceived their assistant instructor to be male rated their instructor significantly higher than did the students in the two groups that perceived their assistant instructor to be female, regardless of the actual gender of the assistant instructor….For example, when the actual male and female instructors posted grades after two days as a male, this was considered by students to be a 4.35 out of 5 level of promptness, but when the same two instructors posted grades at the same time as a female, it was considered to be a 3.55 out of 5 level of promptness.

Hard to imagine anything more traity than sex, mmm. There’s more (a colleague reminded me of this about a minute after this post went up; click on the link at the top of the story), and stuff like this anyway needs to be considered against the background of the crap women put up with every day, at work, at school, and on the street.

So the same teaching practices will get a woman significantly lower student evaluation scores than a man. Could this be true for minorities…how could it not? I think this study (assuming of course that contrary findings don’t emerge from similar experiments) is a beacon to personal injury lawyers and to every woman prof (at least; stay tuned for the experiment in which Phyleesha and Felice are the same person) henceforth denied a raise or tenure through a process in which student evaluations counted. Not to mention an ambitious federal prosecutor with a copy of Title IX in his pocket. Now we’re not just talking about leaving student learning on the table, but consent agreements and actual money: I wonder if this will be enough to make us stop delegating teaching assessment to unpaid, inexpert conscripts. There’s lots of useful stuff to learn from student evaluations, but not for pay and hiring.

Comments

  1. alnval12 says

    Thank you. I was delighted to read it and especially appreciated the links to outside content. The "arranged environment" of the blog really kicked my curiosity into high gear and I revisited a lot of old friends from the '50s and '60s who thought, as I still do, of teaching as a skill that can be taught. It's hard to believe, but not surprising given its presumed cost-effectiveness (?), that we still wander around with a muddled nature-vs.-nurture trait theory as the rationale for measuring teacher effectiveness.

  2. JamesWimberley says

    Armies are interesting models for teaching. They have a very clear idea of what they need new soldiers to learn, and they get it done. An extreme example is the French language course for new entrants to the Foreign Legion. The pass rate is apparently very high: the students are not your first choice for a seminar on critical theory or gender sensitivity, but they are highly motivated (in some cases, the legend goes, by outstanding arrest warrants).

    • MICHAEL_OHARE says

      Language teaching is an interesting, maybe a special, case. Bob Frank contrasts his Peace Corps experience, from zero to teaching math to Nepalis in Nepali in six weeks, with his high school experience in which after three years of French he could conjugate obscure tenses but not speak French. I asked a prof at Cal once whether they had experimented with on-line language methods like Rosetta Stone and he said "we need the intro teaching jobs for our grad students."
      My fringe view is that while a course in French literature or linguistics is a course, we should no more give academic credit for learning a language than for typing. "Freshmen: here are the URLs of two programs we've tested, found best, and paid for you to take. By the end of the summer there are required swimming and oral + reading language tests. See you in the fall."

  3. paulwallich says

    There's probably plenty of useful information in student evaluations even for pay and hiring; it's just that eliciting and using it requires the people in administrative roles not to be stupid and mechanistic. Heaven forfend they should read the comments with an understanding of unreliable narration, or use comments in other classes as context for comments in the class under examination. (Yes, a friend and I, who have over the decades agreed on many things, once took the same course. Diametrically opposing comments.)

    Would it be wrong to ask, if the administrators are just applying rote formulas to numeric scores rather than doing one-on-one evaluation and facilitation, why they shouldn't just be replaced by a part-time intern and a handful of five-dollar apps?

  4. RhodesKen says

    Not in disagreement, but playing back a comparable story from my past…

    My partner and I started a computer consulting firm in the 1970s. We needed to hire (and keep) excellent programmers and weed out the ones who just didn't measure up. Trying to look at the code of programmers working on large software development projects was an inefficient and costly way to judge our young programmers. Better, we decided, was to measure their output, not simply in lines of code, which is a dismal measure, but rather in assignments completed successfully, time taken to do it, and problems subsequently found in their work.

    Crude measures, we knew, but in a time of intense business pressure, and with a virtually endless supply of folks who wanted to be programmers, we opted for efficient use of our time, using a "culling" approach and then following up with lots of education opportunities for the ones who made the cut. We considered that our time, being the scarce resource, was best spent in this way.

    So it looks, at first glance, like the "measure teaching by student outcomes" idea matches our successful business model from forty years ago. It looks like it's an economist's approach to the challenge of building a successful staff of teachers.

    But there's just that one little "oh, by the way" that screws up the method: I haven't heard that there's an endless supply of bright young entrants into the field, just champing at the bit to make the big bucks in elementary and secondary education.

    • NCGatSmFcts says

      Oh come on. All that and you're not going to tell us who was better at it? ; > (Or were they all male? Not trying to start something! Just curious.) Did you notice any patterns of correlative traits?

    • paulwallich says

      There are some other confounders that may be even more important than the lack of cannon fodder. For one, we're not trying to ramp up teaching staffs, so the administrator-time constraint is much looser. For another, high-stakes multiple-choice testing bears little or no resemblance to even the kind of evaluation that you did. To do a serious job of measuring by student outcome you'd need much better instruments and a much longer time constant. So it sounds more as if someone had heard of your successful business model but not really understood it, and then decided to implement a version that would be an insult to cargo cults everywhere.

  5. doncoffin64 says

    A couple of things. Well, maybe three.

    First, as an economist, I am depressed to be reminded that "Professors of philosophy, music, economics and math thought that “innate talent” was more important than did their peers in molecular biology, neuroscience and psychology. And they found this relationship: The more that people in a field believed success was due to intrinsic ability, the fewer women and African-Americans made it in that field." Certainly some of the major figures in the past did not believe that: both the more mainstream (Keynes) and the heterodox (Henry George) believed that anyone who took the time and made the effort could understand economics. In my teaching, I have always tried to approach the subject as one that is accessible to everyone (whether I have always succeeded is another matter).

    Second, the primary problem with course-teacher evaluations (CTEs) is how often they are badly designed. And, being badly designed, it's difficult for them to tell us as much as they might. For example, I formerly taught (I am retired) in a business school in which one of the primary objectives of the program was to assist students in learning how to work productively in teams. You might think that the CTE would ask about that. It did not. Two or three of us kept arguing that a CTE that did not even address our learning objectives for the program was somewhat problematic. No one agreed to make changes. No evaluation or assessment system that does not pay attention to our objectives is worth the trouble.

    Third, I am mostly opposed to high-stakes testing. Recently the state of Indiana proposed changing its (K-12) testing regimen to expand the scope of the tests and extend their duration. The consequence would have been for 10- and 12-year-olds to spend more time on high-stakes testing than I did on my PhD qualifying exams. The kindest thing I could say about this is that it was batshit crazy. Which may be unkind to bats.

  6. NoGatorFan says

    "the fell implication"…. is this a typo?

    When I was still teaching at a university, I did a little experiment one semester. In my small quantitative chemical analysis course, I stopped penalizing students for late lab reports, gave little prizes to the students who performed the best on each lab analysis, greatly increased the multiple-choice content of all exams, and played silly games with the numerical scores on those exams (e.g., everybody got 20 points added to their score).

    I got the best evaluations of my career. My conclusion: what is good for student evaluations is not always good for students. I felt like I had cheated them but they seemed very positive about the experience. Another reason I don't teach for a living anymore.

    • RonGibson says

      "Fell" fits perfectly. It's surely not a typo, but if it is, it leads me to wonder if we have a word for a typo that improves the sentence that contains it?

  7. call_me_navarro says

    the author is using fell as an adjective in which form it means cruel or dreadful.

    in my 20 years teaching 6th grade students first math and now science i have learned that "fun and student-friendly" is no inherent sign of the quality of instruction either bad or good. good instruction provides useful information which can be remembered and applied as well as giving the background needed to go beyond simple application.