There is an old joke that professors grade essays on their heft. The weightier the paper, the better the grade. Drawing from the idea that the longer the work is, the more time was put into it and the more deserving it is of a higher grade, the concept brings the flaws of human grading into focus.
Which brings us to a recent study evaluating the accuracy of computer programs created to score essays. These programs are by no means new; they have been in use for years, particularly in the world of standardized testing. With so many short essays being churned out by test takers the world over, it seemed a simpler solution to automate the grading process.
Of course, while automated grading of multiple-choice tests is simple enough, cost-effective, and accurate, can we really say the same for automated essay grading?
According to a study from the University of Akron and a consultancy called The Common Pool, the answer is a resounding yes. They took something like 16,000 essays (with sets that included different lengths, different rubrics, etc.) that had already been scored once by a human, then let a computer (well, several programs, actually) grade them again. The results were almost terrifyingly similar. Want proof? Here’s a chart of the scores on mean estimation… they are all so close that the lines all appear to be one goddamn line:
Of course, charting out other factors yields less impressive-looking graphs, but fuck truth when we have visual impact, right?
Regardless of potential data skew based on the most widely circulated chart from the paper, the study really did find a striking similarity between the human and computer graders. This is the first time a study like this has been done on this scale, and it does a lot to address the many flaws in computerized essay grading. Many programs favor essays with more complex lexical choices, as they are representative of an advanced vocabulary (never mind the fact that one can easily toss around a word without knowing the finer points of its meaning, i.e. thesaurus junkies). Programs also favor length, both in the paper as a whole and in the sentences themselves. And, of course, they prefer proper grammar.
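For the curious, here is roughly what grading on surface features could look like in code. This is a toy Python sketch under loud assumptions: the function names `surface_features` and `surface_score` and the weights are invented for illustration, and none of it is taken from the Akron study or any real grading product. It also skips grammar checking entirely.

```python
import re


def surface_features(essay: str) -> dict:
    """Measure the surface traits mentioned above: length, sentence length, word length."""
    words = re.findall(r"[A-Za-z']+", essay)
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    word_count = len(words)
    return {
        "word_count": word_count,
        "avg_sentence_length": word_count / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / word_count if word_count else 0.0,
    }


def surface_score(essay: str) -> float:
    """Combine the features with made-up weights (hypothetical, not from any real grader)."""
    f = surface_features(essay)
    return (
        0.01 * f["word_count"]
        + 0.1 * f["avg_sentence_length"]
        + 0.5 * f["avg_word_length"]
    )
```

Paste the same two-word sentence in ten times and this scorer hands out a higher mark, which is more or less the grading-by-weight joke in executable form.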
However, programs have been ridiculed for favoring these technical aspects at the expense of actual content. Can we honestly dole out high marks to students spouting eloquent garbage? The programs are those theoretical professors grading papers by weight, with no regard for the actual information within. A problem, to be sure.
As artificial intelligence technology advances, though, the programs have become more complicated. They are able to discern some relationships between words and phrases that help them “understand” the meaning of the essays. Last year, the University of Florida did some research on the usage of automatic grading systems using AI technology. The system in place was able to look at something like “the heart pumps blood” and find a relationship between the words “heart” and “blood,” essentially finding the meaning of the sentence by piecing together word relationships built through the rubric created by the teacher.
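To make that idea concrete, here is a minimal Python sketch of rubric-driven word matching. It is an assumption-heavy illustration, not the Florida system: the function `concept_coverage` and the rubric format (pairs of words that should turn up in the same sentence) are hypothetical.

```python
import re


def concept_coverage(essay: str, rubric_pairs: list) -> float:
    """Return the fraction of rubric word pairs that co-occur in at least one sentence."""
    # Turn each sentence into a set of lowercase words.
    sentences = [
        set(re.findall(r"[a-z']+", s.lower()))
        for s in re.split(r"[.!?]+", essay)
    ]
    hits = sum(
        1
        for first, second in rubric_pairs
        if any(first in words and second in words for words in sentences)
    )
    return hits / len(rubric_pairs) if rubric_pairs else 0.0


# Hypothetical rubric a teacher might write for a biology question.
rubric = [("heart", "blood")]
score = concept_coverage("The heart pumps blood.", rubric)
```

A check like this only sees co-occurrence: it would give "Blood heart banana." exactly the same credit as "The heart pumps blood."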
Interesting, to be sure, but it’s still a crude system that can, seemingly, be easily exploited by a moderately clever student. Like a child beating the square peg into the round hole until the corners break, the systems might be able to hammer out a rudimentary “understanding” of the essays, but just as that mangled square peg will never be a perfect fit for the round hole, so too will these programs never understand complex, intricate writing.
Why, then, would we let these systems do our grading for us? There are many purported advantages to removing the human component in grading. It does away with biases (personal, racial, gender-specific), which curbs grade inflation. It alleviates teacher fatigue (from which errors can stem).
There are pros and cons to both methods of grading, to be sure. And this study seems to add another entry to the pro column of computerized grading.
My issue with all this isn’t whether or not the Akron study is accurate. They obviously found a strong similarity between human and computer grading of these essays. To me, this is indicative of a far greater problem.
I am mere days away from completing my English degree, and there is a problem that has been gnawing away at me for the majority of my school-going years. A problem I assumed would vanish when I entered the collegiate world. But it didn’t. It continued on, this relentless march toward mediocrity.
It is a problem with the formulaic nature of writing education.
If a computer can grade an essay with nearly the same degree of accuracy as a human, this says less about our marvelous technology (sorry, but I follow AI research and know even the most cutting-edge experimental programs are nowhere near as impressive as any human mind) and more about the shabby state of our student writing. We teach our students the fucking five-paragraph essay, the rote rehashing of theses to form concluding statements. Pick a topic, back it up with two or three points, wrap it up. There is no room for creativity, for real cleverness, for anything that makes writing art and not just a series of rules to be regurgitated from the tip of a pen or onto a computer screen. As Alexander Pope wrote,
True ease in writing comes from art, not chance,
As those move easiest who have learned to dance.
Our students are less concerned with writing interesting, engaging pieces exploring novel ways of thinking or delicately bending the rules; they instead hammer out blocky, mechanical essays. They present bland topics with just the right number of supporting facts to net them a decent grade. That’s it.
I have had many professors, and I have never had one who really inspired me to be a more creative, interesting writer. There was one who broke the mold slightly, but even she wasn’t really a powerful force in my academic career. I know that many others have had those professors who shaped them, who really touched them, who showed them something about themselves or their course of study or the world that left them grateful and better for having known them. I understand that, I respect that, but I never had that. My thirst for knowledge, information, and creativity has always been best sated on my own, outside a traditional classroom.
And while I’m sure there are many English professors [And since when are English professors the only ones expected to foster strong writing in their students? You might have a great idea, oh mighty chemist, but if you can’t write a goddamn elucidatory (…fuck you, WordPress, that’s a word) paper to share that work with the rest of the scientific community and the world, then you are shit out of luck, now aren’t you?] out there who really work to engage their students, given my own experiences and the fact that most students, if they had an “inspirational professor”, only had one or two… statistically, most professors just teach their students that mechanical, boring writing.
I suppose it is time for me to clarify a few points here, particularly for those of you who know me and are pointing at the screen in horror, screaming about my hypocrisy. I am aware that I am known for being an exceedingly technical proofreader. Am I not just perpetuating this system I purport to despise? Well… yes, I am. Because there is technically nothing wrong with writing this way. And, in fact, I am a firm believer in understanding and utilizing technically sound writing, particularly in formal settings. And those five-paragraph essays I was harping on about? Well, they are actually a very useful tool to teach young writers about structure. I do not think they are the devil so much as a despicable crutch: we are not only allowing older, more advanced writers to use it, we are actively encouraging this kind of lazy writing. While there is less room for creative flair in formal, academic papers, there should be breathing room for a personal voice to show through the formal technical aspects. It’s a delicate balance, tying the writer’s soul into the formal rules… but it’s certainly possible. But we are not teaching (or even encouraging) this kind of skillful writing. Which, I believe, is a travesty.
More on that in a second.
Just last night, I was teasing a boy for marking a diaeresis, as it’s considered rather archaic in modern English. That being said, I was only poking fun because I am a right and proper bitch (and because the two of us seem to communicate primarily in taunts, mockery, and faux arguments). In all actuality, I found the use of the diacritic strangely charming. I have always enjoyed people who strive to plumb the true depths of the English language. Perhaps that’s an English major thing.
But these finer points of language… they are not taught anymore. Or, at least, not to any real degree. Why did diaeresis diacritics fall out of vogue, anyway? Because the variants, sans markings, became more popular. And our schools teach what is popular. Which is fine, which is useful, but which becomes more and more diluted. Our vocabulary shrinks, the finer points of our language get lost, and then where are we? The loss of the flavorful bits of language, those accent marks and mellifluous phrases and cheeky verbiage, cripples us. We lose more than just words; we lose imagination and creativity. And as those slowly degrade, so too do the advances tied to them. Invention, discovery. This destroys us slowly, across all aspects of human knowledge and progress.
And we just allow it. That is what I have such a problem with.
Formula is a base, just as we have basic vocabulary. But as we continue through our education, we need to be advancing. We build on the base. We learn the rules, then we learn how to break them. Instead, we stop at a simple formula. After we’ve mastered this, we are done. The end of the line for our writing education. Oh, there’s a bit picked up here and there. But there’s no longer any real push to expand your skills.
Not even for English students, sadly.
Our writing can be graded by a computer program. That’s how basic it is, how fucking systematic it is.
Congratulations to us.
I don’t have a quick-fix solution to this perceived problem. Perhaps you don’t even agree with me that this is a problem. So be it. These were just my bitter, scattered thoughts as I read about the Akron study.
Take this with a grain of salt, like you should all my posts, dear galleons.