59 comments

  • lqet 13 days ago
    I work in academia. When we grade exams, the order of the exams on the stack is the order in which they were collected in the room (people can sit wherever they like). For grading, we are usually 5 people in a single room, and everyone grades a specific exercise for consistency. The exams are getting shuffled heavily, with everyone just grabbing stacks, looking for exams where "their" exercise was not yet graded, and taking them out. So basically, the order in which we grade exams can be considered random.

    However, I also grade weekly exercise sheets during the semester, and these are committed into a repository, where each student has a folder that... begins with the first letter of their first name. Everyone I have ever worked with acknowledges that you have to shuffle the order in which you grade these submissions each week, for fairness. Several effects come into play: (1) you are usually less tired at the beginning, (2) your mood gets better during the last 2 sheets because you know you are done soon, (3, and crucially) at the beginning, you have not yet seen all the common errors / developed a "feeling" for them, and you might thus miss them in early submissions, but spot them immediately in later submissions.
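
    A minimal sketch of such a weekly shuffle, assuming the student folders are available as a list of names. The function name, the seed string, and the folder names are hypothetical, not anything from the repository described above:

```python
import random

def weekly_grading_order(folders, week, seed="sheet"):
    """Return a reproducible, per-week shuffled copy of the folder list."""
    rng = random.Random(f"{seed}-{week}")  # separate generator for each week
    order = sorted(folders)                # start from a canonical order
    rng.shuffle(order)                     # shuffle the copy, not the input
    return order

# Hypothetical folder names, for illustration only.
folders = ["alice", "bob", "carol", "dave", "erin"]
for week in (1, 2, 3):
    print(week, weekly_grading_order(folders, week))
```

    Seeding per week keeps the order reproducible if grading is interrupted, while each week draws its order from a separately seeded generator.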

    Another alphabetic effect: In elementary school, my name was on top of the list of students in my class. I remember that I often had to do some special job simply because I was the first name on this list (for example, carry a group ticket when we visited some museum, keep track of something, be the first at something where nobody wanted to be the first, with everyone watching, be the first to be graded in PE, again with everyone watching, etc.). As a fairly shy kid, this already annoyed me in first grade.

    • cvwright 13 days ago
      My strategy was to, like you said, grade problem by problem. Then for each problem, first find all those who got full marks. Then group the others into piles based on what mistakes they made.

      This ensures that everyone who made the same mistake(s) gets the same grade. It also tends to shuffle the order of the exams after every problem.

      Obviously you don’t need this strategy for simple multiple choice questions, and it’s probably also not a great fit for long-form essays. But it worked great for technical short answer problems in CS and security.
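
      The pile idea can be sketched roughly as follows, assuming each answer has already been tagged with the set of mistakes it contains. The mistake labels, point values, and function name are made up for illustration:

```python
from collections import defaultdict

def grade_by_mistake_pile(answers, deductions, max_points):
    """answers: {student: set of mistake labels}; deductions: {label: points off}."""
    piles = defaultdict(list)
    for student, mistakes in answers.items():
        # One pile per distinct combination of mistakes (empty set = full marks).
        piles[frozenset(mistakes)].append(student)

    grades = {}
    for mistakes, students in piles.items():
        score = max(0, max_points - sum(deductions[m] for m in mistakes))
        for student in students:
            grades[student] = score  # everyone in the pile gets the same score
    return piles, grades

# Hypothetical data for one problem worth 5 points.
answers = {
    "s1": set(),
    "s2": {"off_by_one"},
    "s3": {"off_by_one"},
    "s4": {"off_by_one", "no_base_case"},
}
deductions = {"off_by_one": 1, "no_base_case": 2}
piles, grades = grade_by_mistake_pile(answers, deductions, max_points=5)
print(grades)
```

      Keying the piles on the whole set of mistakes is one way to handle answers that contain several common mistakes at once: each combination simply becomes its own pile.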

      • bobbiechen 13 days ago
        When I was a TA at CMU, we used Gradescope https://www.gradescope.com/ for this. Every exam would be scanned and divided into problems (based on a predefined template - fixed page space for answers).

        Then, each problem was assigned to a TA. Either there's a predefined rubric, or you create it as you go (-1 point for mistake X, half credit for mistake Y, etc.). There's a pretty slick interface where you just read the answer, and use keyboard shortcuts to apply the relevant deductions.

        It still has the issue that every time you change the rubric, you'd need to go back and re-do previously-graded instances of that problem. But it was way faster and (equally important) less tiring.

        • Tijdreiziger 13 days ago
          There’s also open-source software that does the same job at TU Delft: https://zesje.tudelft.nl/

          (disclaimer: I briefly worked on the software for my bachelor’s thesis)

      • jcla1 13 days ago
        This sounds like an organisational nightmare to be honest. You'd be going through the pile of exams multiple times (at least twice) and what do you do if there are multiple mistakes that are common in a single exam question?

        Also: if you're sorting into "mistakes piles" for single exercises, how can you parallelise marking of separate and independent questions?

        • cvwright 13 days ago
          Teach at a broke public university, and you never have to juggle huge teams of TAs.
          • kkylin 13 days ago
            Even at top-notch universities (public or private), when I talk to retired faculty, grading almost always comes up as a reason they don't want to teach anymore.

            [Edit: not disagreeing with your point.]

            • bobthepanda 13 days ago
              Not only is it generally time intensive, you are also subject to lots of tiring back and forth with some students about their grades.

              No grading is perfect, but there’s also some undercurrent of an attitude that students have paid to be there and are entitled to a certain grade.

              • mschuster91 13 days ago
                > No grading is perfect, but there’s also some undercurrent of an attitude that students have paid to be there and are entitled to a certain grade.

                Given that students have taken on hundreds of thousands of dollars in debt that they'll have to repay no matter what and on top of that a lot of jobs being completely out of reach these days without an academic degree (that for fucks sake isn't remotely required by virtually all jobs requiring it!), that's completely understandable.

                Want to fix higher education? Bring the hammer down on companies abusing it as a proxy for legally discriminating against classes of society that are closely correlated with poor academic outcomes. Academic education should be reserved for the best of the best of our youth, and it should be fully paid for by the government, not simply another hurdle to pass to get a job that pays barely more than flipping burgers.

                • bobthepanda 13 days ago
                  I think it is rational that students can feel entitled to that.

                  I also think that the vast majority of poorly paid, non-tenured professors and other teaching staff don't love being the targets of this harassment, since it's not their fault and largely out of their control, and it's not like they're getting the bulk of the tuition money. (That mostly goes to administrative expenses and sports programs.)

                  Heck, most adjunct faculty are often paid below minimum wage and qualify for food stamps.

                • bsder 12 days ago
                  > Given that students have taken on hundreds of thousands of dollars in debt that they'll have to repay no matter what and on top of that a lot of jobs being completely out of reach these days without an academic degree (that for fucks sake isn't remotely required by virtually all jobs requiring it!), that's completely understandable.

                  Would that my students were this engaged before the exam. Guess which students show up the most often for office hours? ... yeah, the ones that are getting the best grades.

                  If my students spent half as much time learning the subject as arguing with me about grades, they would be getting a higher grade than the one they are arguing for.

          • jcla1 13 days ago
            I do (I'm a mathematician). We are usually between 4 and 10 people marking an exam with anywhere between 50 and 600 participants.
        • kkylin 13 days ago
          Online tools like Gradescope make this a little less painful (but still painful), but sometimes it's what's needed, especially on problems that are a little open-ended.
      • underdeserver 13 days ago
        Sibling comment already said so, but I want to emphasize - this requires two run-throughs (at least).

        When I was grading homework, it took about 5 hours a week per class per run through. They didn't pay me enough to make sense for it to be 10 hours.

        • raydev 13 days ago
          A second pass wouldn't necessarily take the same amount of time, especially if you note the issues/concerns on your first pass.
          • underdeserver 13 days ago
            True, but the overhead is large. I graded intro linear algebra and intro calculus, so there were a lot of students - I think 150 or so - and most of them were wrong.

            Graders know that wrong homework takes much longer than correct homework to grade. It's correct? Full marks, move on. Is it wrong? Well, how wrong is it? Did they make a bad assumption, but followed it through to its conclusion? Did they forget a minus sign? Or is it complete hogwash?

            So it might not be 10 hours, but still would be around 8 hours. And that's still too much.

      • ska 13 days ago
        For final exams, we used to mark across all sections of a course (so for 101 type courses, this can be hundreds to 1000s of papers).

        Get all the profs and TA's together, break into groups taking one problem or set of problems. Then you random sample (each group takes a stack) to get a feel for the 'typical' errors, once that's done - you are a machine going through the stacks.

        Every once in a while (not that often) you run into a novel error or approach, and the group discusses.

      • nextos 13 days ago
        My CS school implemented OCR test sheets, with some exceptions, and equivalent strategies, such as test suites and benchmarks for programming assignments. This was done to avoid subjective grading, as it was a big issue even in well-intentioned cases.

        Often, you still get big problems, but the set of solutions is small. It's always three options plus a fourth option (none / all). If you make a mistake you score negative points. It's not perfect, sometimes wording is ambiguous and it's unclear whether you need to tick the fourth catch-all option, but I found it better than the alternatives as it removes most arbitrariness from the process, but obviously has other issues.

        Regular exams often had wildly different grading standards for the same course depending on the class, and thus on the professor who was correcting exams. This was really annoying.

      • anticensor 12 days ago
        An even better strategy is to have the papers scanned by a double-sided scanner and graded by an AI grader.
    • V__ 13 days ago
      A teacher friend of mine always goes through his stack twice. Once to correct all mistakes and a second time to write down points. As you said, once you have seen all mistakes you know how "bad" of a mistake it actually is.
      • smogcutter 13 days ago
        > As you said, once you have seen all mistakes you know how "bad" of a mistake it actually is.

        Crucially, this is not quite what the poster said. It’s not about stack ranking students against each other.

        Say every paper makes the same subtle mistake, and you only notice it halfway through the pile. Unless you go back through them all, you’ll unfairly grade the later entries more harshly.

        • Zancarius 13 days ago
          > It’s not about stack ranking students against each other.

          It's not, but it sort of has that effect, albeit indirectly, and definitely unfairly.

          • smogcutter 13 days ago
            I think we’re talking about the same thing, but to clarify my meaning:

            If you weigh the severity of students mistakes (or successes for that matter) in relation to each other rather than to an objective rubric, you’re effectively stack ranking them whether you mean to or not.

        • kkylin 13 days ago
          I'm not a big fan of putting everything in the cloud, but one of the advantages of online grading systems is that it is easier to make this kind of adjustment. The workflow goes like this: make a rubric item for a specific kind of mistake (it takes a little experience to know which mistakes are likely one-off and which ones are likely to be repeated by other students), assign X points, and later if you decide there are worse mistakes, adjust the points and that gets applied to everyone.
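
          A rough sketch of why this adjustment is cheap, assuming the system stores which rubric items were applied to each answer and computes the score only when asked. This is an illustrative toy model, not the data model of any particular grading tool; the class and item names are hypothetical:

```python
class RubricGrader:
    """Toy model: scores are derived from rubric tags, never stored directly."""

    def __init__(self, max_points):
        self.max_points = max_points
        self.rubric = {}    # rubric item name -> points deducted
        self.applied = {}   # student -> set of rubric item names

    def set_item(self, name, points_off):
        # Creating and re-weighting an item are the same operation.
        self.rubric[name] = points_off

    def apply(self, student, *items):
        self.applied.setdefault(student, set()).update(items)

    def score(self, student):
        off = sum(self.rubric[i] for i in self.applied.get(student, ()))
        return max(0, self.max_points - off)

grader = RubricGrader(max_points=10)
grader.set_item("sign error", 1)
grader.apply("s1", "sign error")
grader.apply("s2", "sign error")
print(grader.score("s1"), grader.score("s2"))

# Later the grader decides the mistake is less severe than first thought.
grader.set_item("sign error", 0.5)
print(grader.score("s1"), grader.score("s2"))
```

          Because only the tags are stored per student, changing the points for an item changes every affected score at once, with no second pass through the answers.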
    • donatj 13 days ago
      In around the year 2000 I had an essay due that day I had forgotten, and about ten minutes of computer lab time before home room in the morning. I wrote an introduction and conclusion; then filled the remainder with copy pasted chunks of the introduction and conclusion. The thought being at least I’d get a laugh. If anyone had read the thing it would have been clear it was nonsense.

      I received an 80% with no notes or markup.

      I have been left wondering for the last 25 years how much student work is actually even reviewed.

      I work in EdTech and every time we add a feature that requires manual teacher review of student work you will see that some teachers are VERY diligent while others never touch it.

      • filipezf 13 days ago
        There was this numerical calculus class at Uni where the teacher forbade us to use the calculator. So I just programmed the integral on it, got the partial steps, and just wrote random numbers to fill the substeps. Got full grade :D The other case everybody got to pass the class, but after vacation we found the stack of exams completely untouched under a desk. The teacher had a side business to run...
      • jtriangle 13 days ago
        I know a guy who copy/pasted a wikipedia article, in line citations and all, and submitted it for a sociology class and got an A, no notes, nothing.
        • mixmastamyk 13 days ago
          He “only cheated himself.” :-D
          • wolverine876 13 days ago
            The point is to develop skills and knowledge, so I would agree. Do you disagree?
            • mixmastamyk 13 days ago
              I agree, but we used to cringe at this saying when young, so funny to bring it back now.
    • spullara 13 days ago
      Everything in this thread just randomizes who doesn't get graded fairly.
      • jibe 13 days ago
        For a single assignment, yes. But at least randomization might mitigate the effect across a term.
      • karaterobot 13 days ago
        Is there a better solution? It can't be for teachers to be perfect: that's not possible, so it's not a solution.
        • jacoblambda 13 days ago
          Probably it would be something like as follows:

          Have a group of N graders. And a parity of k. Let's say N is 6 and k is 2. Randomly shuffle the assignments and partition the assignments into N groups.

          Each grader gets assigned k of the N groups such that they share at most 1 overlap with any other grader and each group is assigned to k people. The assignment orders are shuffled for each grader. They mark up and then grade the assignments.

          Then for each of the N groups, randomly shuffle the group and equally distribute the assignments to the N-k graders.

          Now each grader reviews the assignment grades/markups (in random order) and assigns a grade based on the k grades/markups from the previous rounds along with a rationale for the grade assigned.

          From there the student receives the final assigned grade, the rationale for the grade, and the k markups. If they have a complaint they can go to the professor (who then can also see the k initial grades along with everything else) to dispute the grade for the assignment.

          ---

          This way each TA only has to mark up (class size * k / N) assignments, and review (class size / N) assignments to assign a final grade (which should take far less time to do than the initial markups). On top of that every assignment has a guaranteed (k + 1) separate eyes on it. And then the professors can serve as an unbiased arbiter while retaining all the context from the process.

          To take it an additional step further, the professors could sample a random subset of the assignments to verify the markup and grading is going properly.

          And those reviews/grade adjustments can then be recorded (along with the final grade/rationales) to document how a given TA's grading deviates from the final reviewed grade or the grade the professor assigns. Likewise for a TA's final assigned grade deviating from the professor's. This would allow deviations to be mitigated over time and major deviations to be identified.
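
          A minimal sketch of the assignment plan described above, for the N = 6, k = 2 example. It uses a simple cyclic design (group g is marked by graders g and g+1, wrapping around), which is an assumption of this sketch: it satisfies the at-most-one-overlap condition for k = 2, but a larger k would need a different design. The function name and the least-loaded reviewer rule are also illustrative choices:

```python
import random

def plan_grading(assignments, n_graders=6, k=2, seed=0):
    """Return (marking, review): grader index -> list of assignments."""
    rng = random.Random(seed)
    pool = list(assignments)
    rng.shuffle(pool)
    # Partition the shuffled assignments into N roughly equal groups.
    groups = [pool[g::n_graders] for g in range(n_graders)]

    # Cyclic design (assumption): group g is marked by graders g .. g+k-1 mod N.
    markers = {g: [(g + j) % n_graders for j in range(k)]
               for g in range(n_graders)}

    marking = {gr: [] for gr in range(n_graders)}
    review = {gr: [] for gr in range(n_graders)}
    for g, group in enumerate(groups):
        for gr in markers[g]:
            marking[gr].extend(group)
        # Only the N-k graders who did not mark this group may review it.
        reviewers = [gr for gr in range(n_graders) if gr not in markers[g]]
        shuffled = group[:]
        rng.shuffle(shuffled)
        for a in shuffled:
            # Give each paper to the currently least-loaded eligible reviewer.
            target = min(reviewers, key=lambda gr: len(review[gr]))
            review[target].append(a)

    # Each grader works through their own piles in a fresh random order.
    for gr in range(n_graders):
        rng.shuffle(marking[gr])
        rng.shuffle(review[gr])
    return marking, review

marking, review = plan_grading(range(60))
print({gr: len(v) for gr, v in marking.items()})  # markups per grader
print({gr: len(v) for gr, v in review.items()})   # reviews per grader
```

          With 60 assignments this gives each grader 60 * 2 / 6 = 20 markups, and each assignment is reviewed by someone who did not mark it.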

        • jtriangle 13 days ago
          No, the solution is for the scoring to be handled by software that doesn't exist yet. Some things have easy, objective measures of correctness. STEM is mostly this way. Others, your humanities et al, are fairly subjective.

          You could probably cover most of this with an LLM, and access to a large body of graded material for a given course, provided said material was graded fairly. Generating that data would be time consuming, as, any given assignment would need to be graded by as many people as possible in order to find a fair average.

          From there, it's simple comparison between your sample work and the presented work. We're probably a decade from this really being viable en masse, but, it no doubt will happen, and for better or worse we'll likely end up with EDUAAS (education as a service).

          • jacoblambda 13 days ago
            LLMs are not going to be a solution. LLMs have absolutely no concept of truth.

            And not everything has an objective solution. Even those that do often have a process associated with them and factoring in that work/process is an important part of grading. Reducing that subjective grading process to only objective solutions being right is grossly reductive and disproportionately punishes students who have the process right and understand the material but make small errors. That's exactly what you don't want to do.

            ---

            Instead the solution is to make sure each assignment gets multiple eyes on it and in a random order. Then to document biases and trends in biases so that the TAs and professors can be aware of them and mitigate them.

            It's a process problem that can only be solved by a process solution. Replacing the graders with technology or reducing problems to a binary right/wrong will never ever solve this and in many cases will end up being more harmful than the biases they claim to solve would be.

            • coredog64 13 days ago
              The LLM can compile verbose prose down to a short summary. If the summaries of each chunk are consistent, then it’s at least structurally well written. Then you grade the summary itself.
              • brewdad 13 days ago
                At that point you are grading the work of the LLM, not the student.
        • spullara 6 days ago
          Have an LLM at 0 temperature with the correct answers in context grade it?
        • stevage 13 days ago
          Automatically unskew the results after grading based on this finding?
        • thaumasiotes 13 days ago
          Yes, you can grade objectively.
    • euroderf 12 days ago
      For grading essay assignments, and possibly also essay-style exams:

      It is important to get a feel for the collective level of writing before grading essays individually, and it is important to avoid over-grading or under-grading essays at the beginning or end of a stream of papers. Therefore I did a three-stage grading process, with three colors of pens:

      The first pass, with a red pen, is marking up single-point problems like misspellings and glaring usage errors. Also of course the general level of writing begins to seep in. This pass of course includes all papers, and it can pass fairly quickly.

      The second pass, with a green pen, mostly just marks in the margin where a (good) point is made or a conclusion is reached. This is to prepare for the next pass. Again, all papers are done in this pass.

      The third pass, in blue pen, is where the quality of the writing is assessed and critiqued. Maybe some short notes in the margins, maybe just comments at the end of the essay.

      When students get their papers back, there are some chuckles (or whatevs) when students see all the pretty colors. But after I explain the method and its rationale, the method is clear and understood (and also appreciated?).

    • xorvoid 13 days ago
      We graded similarly, incidentally, when I was at U-of-M (lol). I don’t think we ever sorted by name so I don’t know if we’d have a bias effect by name unless it’s an implicit bias towards lexicographical esthetics. I won’t deny that grading fatigue can have subjective effects. I always thought we did a pretty fair and objective job. I taught Computer Architecture and we developed answer keys and grading scales before grading a single test. Of course assigning partial credit always ended up being pretty subjective. Typically though people would err in the same ways and so those would be subjectively identical. I never thought names factored into this much but, to be fair, no one ever collected data…

      Finally, I guess I’ll admit that I’m probably very biased because my initials are A.B. and I’ve always gotten excellent grades, so… maybe maybe maybe

    • pjdesno 13 days ago
      There are all sorts of good ways to avoid these biases. I use the same practice described above for paper exams, and grading order for eg question 2 may be affected by score on question 1, but it won’t be affected by name or ID number.

      If you use Canvas or Gradescope with the default settings, it’s almost impossible to avoid this sort of bias.

      Worse yet, in Gradescope you’re strongly steered towards grading with a fixed “rubric” with specific points off for each of N pre-defined errors, allowing grading to be done by TAs with little more knowledge than the students themselves, resulting in scores which have little relationship to the quality of the student answer.

    • dheera 13 days ago
      When I saw the title I would have thought that the higher concentrations of Asian names starting with V, W, X, Y, Z would have led to higher grades at that end of the alphabet, and thought that effect would have eclipsed anything else.
      • pks016 13 days ago
        Anecdotally, the course I grade has this effect (just looking at the average score). I have been grading this course for the last 5 years (9-10 times). Last names in L-Z score slightly higher than A-L.
      • lupire 13 days ago
        Indian names start with A, B, N. Chinese names also start with C, F, L.
    • Fnoord 13 days ago
      > [..] As a fairly shy kid, this already annoyed me in first grade.

      (I suppose the cons outweighed the pros.)

      Did you perceive any pros?

      I suppose one way to do grades is first read through all papers to get an idea of the levels of the students. Though you still have bias/nepotism and such then. Perhaps a teamwork or committee would work, or teachers swapping classes/schools?

      I had a French teacher in high school who dropped a pen on the list of students and then where it landed that person would get rehearsal. People in mid (waves) were fried.

      Plus, there is also the issue of certain last names being common in certain cultures, leading to skewed statistics.

    • starttoaster 13 days ago
      This might come off rude on accident, but I mean genuinely without malice. When I'm writing an essay to submit to my professor/teacher, I am asked to make multiple drafts to get a proper end result that is ready to submit. Understanding that educational staff is already often overworked, should I expect _less_ from the person I receive my education from? If you acknowledge that many of the grades I receive are actually not fair to me, and there's an attempt to randomize the order that papers are graded, many of the grades that I received (whether high or low) were done partially (that is to say, the opposite of "impartially".) And there's a real concern that in your example where the submissions are committed to a repository that you need to shuffle, that my submission ends up in a similar position in the stack week after week, unless you're actually doing something to ensure my position in the stack is different between submissions. It's probably sufficient in many cases but doesn't guarantee randomness unless the algorithm to randomize submissions takes previous stack orderings into account.
      • advael 13 days ago
        I know it's not the point of your post, but I think it's worth pointing out that you're misunderstanding randomness (albeit in a very typical way). Although randomness is likely eventually (over a lot of instances) going to be the most "fair" way to distribute where your submission is in the order, it does not guarantee that it will always be different, and in fact a "random" algorithm that took previous orderings into account would be provably less random than one that didn't

        It's also worth noting that randomization in a context like this is inherently an imperfect solution to a problem that generally can't be solved perfectly. If we find out that weird ordering biases exist, I think randomization is done on the assumption that many we don't know about could also exist, that there's no clear way to mitigate them completely, and then randomizing the order per-instance is just the best we can do to ensure it's fair (Which, again, won't be perfect. Perfect isn't available)

        • barfbagginus 11 days ago
          I don't think they're misunderstanding randomness.

          They are claiming that randomness is not sufficient for fairness.

          And I agree. Adding some determinacy to the shuffling design would reduce the variance of unfair scores that students receive. Ie, it could reduce the likelihood that a student's scores will be biased by a run of advantaged or disadvantaged scoring orders.

          It wouldn't be very hard to design such a scheme. It would produce fairer results in the short run. And in the long run, it would converge to the same fair averages as the fully random process.

          In fact it would converge faster. This short run statistical efficiency matters. Students don't have access to the long run results - since they're evaluated in the short run.
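
          One possible scheme of this kind, offered as a sketch rather than as what the comment has in mind: draw a single random base order, then rotate it each week by evenly spread offsets, so every student moves through the early, middle, and late positions over the term. The function names are hypothetical, and the comparison against independent shuffles is only illustrative:

```python
import random
from statistics import mean, pvariance

def balanced_orders(students, weeks, seed=0):
    """A random base order, rotated by evenly spread offsets, one per week."""
    rng = random.Random(seed)
    base = list(students)
    rng.shuffle(base)
    n = len(base)
    shifts = [(w * n) // weeks for w in range(weeks)]
    rng.shuffle(shifts)  # which week gets which offset is still random
    return [base[s:] + base[:s] for s in shifts]

def random_orders(students, weeks, seed=0):
    """Baseline: an independent full shuffle every week."""
    rng = random.Random(seed)
    pool = list(students)
    return [rng.sample(pool, len(pool)) for _ in range(weeks)]

def mean_positions(orders):
    """Average position of each student over all weeks (0 = graded first)."""
    return {s: mean(order.index(s) for order in orders) for s in orders[0]}

students = list(range(20))
for name, make in (("balanced", balanced_orders), ("random", random_orders)):
    means = mean_positions(make(students, weeks=10))
    print(name, "variance of average position:", pvariance(means.values()))
```

          The trade-off is that a pure rotation keeps the same neighbours together every week, so it removes the position effect faster but does nothing about effects that depend on whose paper was graded just before yours.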

      • jamiek88 13 days ago
        It’s simply human nature. Teachers can either lie to themselves and you about it or mitigate it. What more could you possibly want from them as humans?
        • starttoaster 13 days ago
          I somewhat assumed there would be commenters suggesting the human angle as a retort. That's why I prefaced with both "this is what the teacher expects of me" and "understanding that educational staff is already often overworked." It just seems to me that the current systems aren't sufficient, and acknowledging that is what leads people to improving those systems. The above commenter suggested what they do in academia as workarounds to what the study showed, and I'm saying even that is not sufficient.

          It seems like you're agreeing with me, but jumping to their defense with "people are fallible." People are fallible, that's why we build systems to take human elements out of it. Recognizing where humanity has soured something is key to that.

    • yeahwhatever10 13 days ago
      When I was a TA I always did a second pass to make sure everything was even. It’s not that hard.
      • eks391 13 days ago
        It's hard when you are the only TA for 260 students who get 3 assignments per week, you must also hold free hours and you aren't allowed to go over 27 hrs each week so the school isn't breaking federal laws.
    • Jsebast24 11 days ago
      Only people whose last names start with an A can relate. The rest, I have found, just don't care. It's a curse you have to carry from 1st grade and all through school and college.
    • ripjaygn 13 days ago
      While this helps the students with names lower down the order, people who are graded later still suffer.
    • bandrami 13 days ago
      We tried a lot of things. What eventually worked was ending grades. You mastered the material or you did not; perhaps a couple of students mastered it with high marks.

      Obvs this takes an administration that is OK with that, which most aren't.

      • dev_tty01 13 days ago
        Having hired a lot of engineers, I can tell you that mastery of material is nothing close to a bimodal distribution.
    • gonzo41 13 days ago
      Have you ever thought about just passing out a set of grades at random to random individuals and see how that shakes out. Like totally random and unjustified grades. D minus for an A+ student. A+ for fails etc. Just random chaos. Then just score the final correctly and see the effect?

      Or just having a Kafkaesque pass fail grade with no feedback for each student relative to their own performance over time with an expected growth rate applied?

    • madeofpalk 13 days ago
      > you have to shuffle the order in which you grade these submissions each week, for fairness

      I don't think this is fair. It's just a more randomly distributed unfairness, rather than by a deterministic factor (like the student's name)

      'Fair' would be each student is assessed independently for the work they did, rather than their mark being impacted by how early or late they were marked.

      • jcparkyn 13 days ago
        I think an important difference is that when you shuffle them, the unfairness stops being correlated across multiple assignments, so the "aggregate" unfairness over the course of the semester is much lower.
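
        A back-of-the-envelope version of this argument, under simplifying assumptions that are not in the comment itself: the order-induced error in a grade depends only on the position in the stack, has zero mean and variance sigma^2 (taken across students), and independent shuffles make a student's positions independent from week to week:

```latex
% b_{i,j}: order-induced error in student i's grade on assignment j
% m: number of assignments in the semester
\begin{align*}
\text{fixed order: } & b_{i,1} = \dots = b_{i,m} = b_i
  \;\Rightarrow\;
  \operatorname{Var}\Big(\tfrac{1}{m}\sum_{j=1}^{m} b_{i,j}\Big) = \sigma^2, \\
\text{independent shuffles: } &
  \operatorname{Var}\Big(\tfrac{1}{m}\sum_{j=1}^{m} b_{i,j}\Big)
  = \frac{1}{m^2}\sum_{j=1}^{m}\operatorname{Var}(b_{i,j})
  = \frac{\sigma^2}{m}.
\end{align*}
```

        So with, say, 12 weekly sheets, the spread of the semester-average error would shrink by a factor of 12 in variance compared with grading in the same order every week.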
      • shepherdjerred 13 days ago
        It would be essentially impossible to have something "truly" fair for open-ended questions since humans are stateful.

        Maybe this is a case that AI could actually do quite well.

        Manually grade the answers and identify the classes of mistakes. Then hand the classes of mistakes to the AI and ask for it to determine which answers have which types of mistakes.

        Once you've done that, you just need to associate a deduction for each type of mistake and do some simple math.

        • vagrantJin 13 days ago
          what do you mean AI? you must be joking.
          • shepherdjerred 13 days ago
            Imagine a question: compare bubble sort and quick sort algorithm.

            Some students might mix up the algorithms, some might give the incorrect computation complexity, some might describe them incorrectly in some way.

            Manually grade some (or all of) the answers by noting the kinds of things students got wrong (e.g. the above criteria). Then, feed in to ChatGPT (or your favorite alternative) the answer + the categories of mistakes to expect.

            Here's a simplified example: https://chat.openai.com/share/bf801e12-51d5-4255-9968-bbf91b...

      • luplex 13 days ago
        There are many notions of "fairness", many of which are logically incompatible with each other.

        In this example, I think it's kind of fair to give everyone an equal chance of being advantaged. You're not hurting anyone specifically.

      • gqcwwjtg 13 days ago
        Is that distinction worth making here? There’s no way to “assess independently” the work of each student without some amount of randomness. But I think that’s okay, because isn’t randomly distributed unfairness just… fairness?
  • ryandrake 13 days ago
    Maybe related, or maybe not, but I remember when I was in K-12 school back in the 80s and early 90s, they would always seat us physically in the class front-to-back by last name. So the kids with last names starting with A-D or so would always be in front, and the kids with last names starting with U-Z would be in the back. For every class. I remember this because many of my friends had last names "near" my last name since we were always in close proximity to each other. I vaguely remember, by the time we were in high school, there were definitely more high-achieving kids with A-D last names and definitely more of the troublemakers were U-Z. Was it caused by sitting in closer proximity to the teacher and getting more teacher attention? We'll never know because this wasn't an experiment and there wasn't a control group.
    • wongarsu 13 days ago
      "students who sit closer are more likely to be high achievers" might also be the source of most of the stereotypes of people with glasses. It took me years to realize I'm mildly shortsighted, so the first half of school I chose seats in the front half of the classroom to make reading the blackboard easier. Many of my friends had glasses and preferred to sit up front because their glasses didn't fully correct their vision.
      • RheingoldRiver 13 days ago
        In a somewhat reverse scenario, when I was in 4th grade (9 years old), I knew 100% that I was getting nearsighted, and I absolutely did NOT want glasses. Fortunately (debatable) we got to pick our seats so I always picked a seat in the very first row, where I could kinda-sorta-almost see what was written on the board if I squinted. And I was also way above my grade level so I was able to fake it pretty well for most of the year even when this started to fail me. My mom insisted on taking me to get my eyes checked about 2/3 of the way through the year and I couldn't fake my way through that, though, so I finally got glasses, but by that point I was used to sitting at the front of the room, so I choose front-of-room seats when possible for most of the rest of my schooling. There's probably some moral here but I don't know what it is.
        • smeej 13 days ago
          I moved states and schools midway through 3rd grade and was seated alphabetically, in the back, for the first time in my life. The teachers in my previous school knew me to be a model student, so would sit me up front "to set an example."

          My parents couldn't figure out for the life of them why I was suddenly struggling and thought I was having adjustment issues. I had taught myself to read when I was 3; how could I suddenly be having trouble keeping up?

          It took longer to figure out because I was only nearsighted in one eye. I was tall for my grade, so as long as the person in front of me to the left was shorter than me or the teacher was writing high enough on the board, I was fine, because my left eye was fine. But when everything aligned just wrong, I was suddenly helpless, because my right eye could barely see clearly an arm's length from my face! It's a hard thing to notice when only one of your eyes isn't working very well, especially when you're 9.

      • Ekaros 13 days ago
        I remember that at that age my sight was getting worse quite quickly. So along the way there will be many points where your glasses are slightly lacking.
    • nsriv 13 days ago
      I'm a teacher now, and this made me wince. It's exactly how I've been told by my parents that seating worked for them in school (India, 60s-80s) but their grading was done by semi-anonymous roll numbers.
      • user_7832 13 days ago
        Today I'm 99% sure all CBSE board exams (I think equivalent to A-levels?) are randomized heavily. However I did notice the name's alphabetical order effect in school, albeit in a minor way (folks with later letters were less involved in anything a teacher might need a volunteer for).
    • mertd 13 days ago
      Circular shift is the trivial solution. In my high school every row moved up on Mondays and the front row moved to the back. Of course you could argue the ones who started at the front on week 1 still have an advantage but it's likely not that significant.
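
      A minimal sketch of that weekly rotation, assuming the rows are kept in a list with the front row first. The row labels and the function name are made up for illustration:

```python
from collections import deque

def rotate_rows(rows, weeks):
    """Return the seating after `weeks` Monday rotations.

    rows[0] is the front row. Each rotation sends the front row to the
    back and moves every other row one step forward.
    """
    d = deque(rows)
    d.rotate(-weeks)  # negative rotation shifts elements toward the front
    return list(d)

# Hypothetical rows, front to back
rows = ["row 1", "row 2", "row 3", "row 4"]
for week in range(5):
    print(week, rotate_rows(rows, week))
```

      With four rows the seating returns to its starting arrangement every fourth week, so over a long term each row spends roughly the same number of weeks at the front.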
  • zdw 13 days ago
    As someone whose initials are Z and W, I tend to notice alpha sort a lot. Asking a friend whose initials are A and B about this, it's not something they ever noticed.

    I haven't noticed a grading/ranking difference, but far more frequently I'll hear that "oh, we ran out of item/time/etc. before we got to you", which has made me much more sensitive to issues of planning/organization.

    • arp242 13 days ago
      When I was a kid marbles were the big thing, and if you were playing with them in class the teacher would put it in a big glass jar. When it was full he would call out the kids and each would get a handful.

      I was last in the alphabet; this was already an issue with books we had to read; you could choose which book to read, but it was always in alphabetical order. When it was my turn there were just a few left, and certainly all the popular high-demand ones were gone.

      Anyway, when it finally was my turn to get my marbles he was all out. When I asked "where's my marbles?" he just shrugged and said "all out". I must've been about 7. Lots of crying ensued and I think I got some marbles from other kids, but it wasn't about the marbles – not really.

      I still don't understand how anyone can expect any different result...

    • StevenXC 13 days ago
      Like most inequities, those who are in the benefiting group frequently don't realize that privilege.
      • sdwr 13 days ago
        They realize (bring form to, make real) them, but don't realize (understand) them
        • wryoak 13 days ago
          I hate how much I love this worthlessly picky comment
        • godelski 13 days ago
          This reminds me of cliques. I give them the definition: insight everyone can recite but nobody can act upon.
          • DangitBobby 13 days ago
            I think you mean clichés.
            • godelski 13 days ago
              I do. Gotta live swipe and homophones
      • dustingetz 13 days ago
        I for one am glad that I was not born a mosquito, the odds are not in our favor!
    • andoma 13 days ago
      This reminds me of a funny event from when I was in fourth or fifth grade. When the class was supposed to stand in line we always had to sort based on last name. My last name started with Ö (last letter of the alphabet in the Nordics) so I always ended up last. Then one time, the teacher said something like "Let's reverse the order today, but wait, we also sort on the first name". My first name starts with an A so I ended up last in line anyway, much to the joy of everyone :)
    • underlipton 13 days ago
      This seems like a good example (free of cultural baggage) of how people with privilege often don't notice that they're receiving that privilege. What seemed normal and fair to your friend turned out to be an advantage that they didn't even consider.
    • zeroonetwothree 13 days ago
      Outside of school I can’t think of even a single instance of alphabetical sorting of my name (I have a middle letter). What situations are you in that this comes up a lot?
      • libria 13 days ago
        Probably every single health or wellness "Find a Provider" portal lists them A-Z. That's a multi-billion dollar industry. If I was Dr. Zachary Zane, I'd change my name.
        • sitkack 12 days ago
          AAA Aches and Ailment
      • smeej 13 days ago
        My mom made the critical mistake of marrying from first five letters down to last five letters during the police academy, only later to be released from the "we have to expose you to tear gas so you know how it feels and only use it judiciously" chamber in alphabetical order by last name.

        It was 40ish years ago and I still don't think she's forgiven my dad.

      • wcunning 13 days ago
        My daily standup is run by the order my boss sees the participants in the JIRA board -- My first name starts with W, so I'm last in that list. Makes staying engaged the whole meeting hard...
        • macintux 13 days ago
          I'm the first in the list, which has some advantages, but I do get tired of always being the first person to throw themselves on whatever grenade is lying around.
      • zo1 13 days ago
        As the other poster said, the order of standup and other such things. Having a "Z" means that you're usually last, and sometimes people make a point of "hey let's do it in reverse today" where I end up being first.

        I remember when working on joint tasks, by the time it got to me, most of the people that worked with me had already given their updates and details. So when it was my turn, I'd say "same as A, B, C", cause they'd given all the juicy details.

        Other than that, it's pretty straightforward and boring. The world doesn't magically function differently for us.

      • hu3 13 days ago
        Company Discord of a client. My name is among the top.

        It's a remote job so, being frequently visible in that list can be an advantage.

      • talsperre 13 days ago
        The order in JIRA boards during the daily standups comes to mind. I am sure there are similar examples in other domains that are not software related.
      • kaashif 13 days ago
        I have a middle letter and also don't remember this happening much.

        We should ask people with later letters if they remember this more.

        • godelski 13 days ago
          I'm a near last letter surname. It's not uncommon for arbitrary things to be sorted by name, but a ton of official things use surname ordering. There are also things where I tend to be last and don't know the sorting method, but I suspect it isn't uncommon for someone to just throw in a sort somewhere (though it's also common to see people do things in LIFO order, which disadvantages people who get shit done on time... My apartment renewal does that...). I also remember getting a PCR test during covid where they binned by last name.

          I can just say I do remember being last in a lot of arbitrary and official things and seeing other friends just get done with it faster and have to waste less time sitting and waiting.

      • NikkiA 11 days ago
        Whenever my Doctor's office calls to arrange annual vaccines, I (surname starts with A) get an appointment just before my boyfriend's (surname starts with B) because they invariably phone me about 10-20 minutes before him.
      • IshKebab 13 days ago
        Yeah I don't think it really happens outside school, but school is pretty formative and it happens all the time in school.
        • wryoak 13 days ago
          It happens in your phone contacts when you’re deciding who to talk to. You’re starting with your Abrahams, Billys and Changs, probably rarely reaching out to your Xaviers, Yusufs and Zeldas about going out tonight because you’ve already assembled a crew by the time you reach the Mimis, Natashas and Ottos.
          • IshKebab 13 days ago
            I don't think many people use their phone contact list like that.
            • godelski 13 days ago
              I wouldn't be surprised. It's very natural. Probably not for that specific use case but if for some reason you are actually going through the list then it's natural
              • smeej 13 days ago
                Plus, it's common for me to meet someone on a first-name basis and not find out their last name right away. And people's last names change more often than their first names. Phone sorting by first name is the way to go.
          • fsckboy 13 days ago
            just want to add that in my lifetime that switched from being "by last name" to "by first name". So, Yusuf Ahmed and Abraham Zigfeld experienced a noticeable shift in popularity that they were totally unprepared for
          • thaumasiotes 13 days ago
            Do none of your friends like or dislike any of your other friends?
            • wryoak 13 days ago
              Probably but that’s not how they’re organized in my contacts. It’s a list not a graph.
    • washadjeffmad 13 days ago
      Similar initials, frequently last in line, and same.

      I wonder if this was the kiln of my patience and acceptance, or if people who road rage and get frustrated with waits are more likely to have earlier lettered names?

      • ambrose2 13 days ago
        I really cannot stand sitting in a car in traffic and my last name starts with an A, interesting!
    • RheingoldRiver 13 days ago
      > Asking a friend whose initials are A and B about this, it's not something they ever noticed.

      Kinda surprised, my last name starts with C and I was hyper-aware of this and how random it was probably all the way from kindergarten. Being a child and therefore an asshole, I was grateful for my advantage rather than thinking the system was unjust.

    • itronitron 13 days ago
      • CamelCaseName 13 days ago
        Do you have something to say about this? I'm confused, why did I read this wikipedia page?
    • winwang 13 days ago
      (just doing roll call here with initials WW)
  • noodlesUK 13 days ago
    At my university, almost all of our marking was pseudonymised. We were assigned a random candidate number at the beginning of each year, and that is what went on our important papers/exams. The less important coursework often didn’t bother with this, and used our student numbers instead, but the general idea was the same.

    We didn’t put our names on any of our work other than our dissertation (and a few trivial assignments that didn’t impact overall marks). It wasn’t that hard to de-anonymise, but it meant that the system had a bit more integrity.

    It’s a really straightforward system to implement and I don’t know why it isn’t done more frequently.

    I also think that our VLE sorted assignments by time of submission rather than any identifier.

    • trescenzi 13 days ago
      Wouldn't a possible outcome here though be that it just randomly reduces grades instead of reducing them in a way that's related to the students? If the issue is the sorting the random candidate numbers would still be sorted. It solves the problem of bias related to the individual but it doesn't solve the problem of bias related to the way that the submissions are sorted.

      A random identifier coupled with a random sort order seem like the way to go here.
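
      A rough sketch of that combination, assuming nothing about any real VLE: the function names and the ID format are made up, and the only point is that the pseudonym and the grading order are drawn independently.

```python
import random
import secrets

def assign_candidate_ids(students):
    """Give each student a unique random pseudonym (hypothetical 8-hex-digit format)."""
    ids = {}
    used = set()
    for student in students:
        cid = secrets.token_hex(4)
        while cid in used:  # extremely unlikely, but keep the IDs unique
            cid = secrets.token_hex(4)
        used.add(cid)
        ids[student] = cid
    return ids

def grading_order(candidate_ids, rng=None):
    """Return the pseudonyms in a fresh random order for one assignment."""
    if rng is None:
        rng = random.SystemRandom()
    order = list(candidate_ids)
    rng.shuffle(order)  # never sort: a sorted pseudonym is still a fixed order
    return order

ids = assign_candidate_ids(["Abraham", "Mimi", "Zelda"])
for assignment in ("sheet 1", "sheet 2"):
    print(assignment, grading_order(ids.values()))
```

      Calling `grading_order` again for every assignment is what keeps the same pseudonym from landing at the bottom of the pile each time.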

    • ghaff 13 days ago
      University exams, this probably makes a lot of sense. After all, the exam is the exam and whether a student is well-spoken and actively participates in class shouldn't matter for an exam grade. I'm less convinced that blinded conference proposals are a good idea--an argument I've had with various people. If you know based on past experience that someone will almost certainly hit a home run, I'm less inclined to pick a random person without obvious qualifications for the same topic--although just picking friends of the committee can obviously go too far.
      • wongarsu 13 days ago
        You could try to work around that by first grading all anonymized proposals, then grading all potential speakers without knowing their proposal. In the third round you deanonymize and look at the weighted average of the two grades. You probably still need some judgment calls because the combination of speaker and topic can be important. But the score would give you a good base to work off.

        Maybe you could make it even more impartial by allowing conditional scores in the first two rounds. Like "Jim is a 6, but an 8 if his talk is about molecular biology" or "this Lessons Learnt talk is a 5, but if it's by X, Y or Z it's a 9"

        • ghaff 13 days ago
          Yeah, but I'm not sure conference proposals by themselves actually have a lot of value given that, in many cases (ask me how I know), the presentations won't actually exist until a week or two before the event.

          Certainly a talk by X that's totally unconnected from anything they're directly involved with has less value.

    • omoikane 13 days ago
      I had classes like that, where at the beginning of the quarter, each student gets assigned a username of the form "<course id> <three alpha characters>" and all participation is based on username from then on. Even though the usernames are seemingly random, certain usernames started gaining reputations on the class discussion forums, and students came to recognize some names.

      But computer science courses tend to have very objective rubrics for grading, so I am not sure the anonymity mattered much.

    • __MatrixMan__ 13 days ago
      I think I get better feedback when the teacher knows who I am. Grades are secondary.
      • ghaff 13 days ago
        I'm not sure exam grades at the university level are really the place to get useful feedback beyond grades.
    • xhkkffbf 13 days ago
      I think the point is that some automated systems like Canvas may hide the names, but they're still presented in alphabetical order. Pseudonyms don't help if you don't shuffle them.
  • tokai 13 days ago
    >One simple fix would be to make random order the default setting.

    Fixed in the sense that the bias will be random. Presumably students graded last will still receive lower grades.

    • kibwen 13 days ago
      It would be less than ideal, but still an improvement over the current situation as long as the order is re-randomized for every assignment, because at least then you'd only be occasionally disadvantaged rather than consistently disadvantaged.
    • tetha 13 days ago
      There are however other factors involved in the grade, which have a higher impact on the grade. Like, understanding of the material and ability to present a solution. - E - I'm mostly saying that because a bunch of comments are jumping on this as a significant bias against some students.

      From my experience as a tutor, yes, this bias exists. But it won't turn a horribly wrong or an excellently correct solution into anything else.

      I eventually knew my strugglers and my excellers. I'd skim the excellers first, because if they messed up, something bad was going on. Then I'd go through the strugglers to see problems. And then I'd grade the rest first in whatever order I got the sheets, then the strugglers and then the excellers. I needed the baseline to see how bad the worst ones actually do. Some exercise sheets were an accidental adventure, I can tell you.

      And writing it like that, it sounds totally callous and cold. But focusing on the lower third in the exercises and communicating their struggles to the TA and prof was very appreciated by everyone, especially those students. It makes sure to get the important fundamentals right.

    • exe34 13 days ago
      It should average out over their career at the university - whereas if the alphabetical order is kept, then they would be systematically penalised.
      • zeroonetwothree 13 days ago
        It won’t average out perfectly. There will still be lucky and unlucky students.

        Of course it’s better than a fixed order, and if it’s easy to switch then might as well. But we should keep thinking about how we can make it even better.

        • furyofantares 13 days ago
          Since the effect looks very small, it looks to me like it's only a problem because it adds up if it happens for every assignment for every course. I don't think it needs to average out perfectly; it looks to me like you'd have to be astronomically lucky/unlucky for it to matter if each assignment is in random order.
          • zeroonetwothree 13 days ago
            Some courses are only graded based on a small number of tests. I actually went to UM and a grade might be something like 30% midterm 60% final 10% homework (obviously different professors have different systems). In that case if you get unlucky just twice on the two tests you basically get the full penalty.
            • furyofantares 13 days ago
              I'm not sure how much a +/- 0.3 (out of 100) deviation from average on a single course matters even if you end up dead first/last for both midterm and final in that example. I mean, it will matter sometimes. But it's (by far) not as big a deal as if it happens for all your courses.

              Still, yes, you could flip the order from midterm to final instead of randomizing both and the effect goes to more like +/- 0.1 out of 100 for the luckiest and unluckiest.

              • gwern 13 days ago
                Yes, that sort of mirror-sampling would reduce variance. The problem is, though, you need to know all the uses of randomness in order to properly counterbalance them, and these systems are already enough of a pain to use.

                (For example, if you have two, you can simply swap: but what about other biases? Like if it's broken in half to assign to 2 graders. Or what about if there are three exams? And what about balance across other courses? If you want to do variance-reduction and tricks like antithetic sampling, you need to know all this in order to structure it properly - get it wrong, and you may make things worse.)

                So that's why simple random shuffling would be preferred. It allows total ignorance of all other uses (past, present and future), handles all ordering biases, and can be done independently in parallel across arbitrary sets of courses/exams/grades/students.
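
                To put rough numbers on the three options in this subthread, here is a toy model. Everything in it is an assumption: the effect is taken to be linear in grading position, running from +0.3 for the first paper to -0.3 for the last (the figure quoted upthread), and the weights are the 30% midterm / 60% final example with homework ignored.

```python
import random

WEIGHTS = {"midterm": 0.3, "final": 0.6}  # example weights from upthread
SPREAD = 0.3  # assumed: first graded gets +0.3, last graded gets -0.3 (out of 100)

def position_effect(pos, n):
    """Assumed linear effect: +SPREAD at position 0, -SPREAD at position n - 1."""
    return SPREAD - 2 * SPREAD * pos / (n - 1)

def course_bias(positions, n):
    """Weighted bias for one student, given their grading position in each exam."""
    return sum(WEIGHTS[exam] * position_effect(pos, n)
               for exam, pos in positions.items())

n = 100
last = n - 1

# Alphabetical order both times: the last student is last twice.
fixed = course_bias({"midterm": last, "final": last}, n)

# Order flipped for the final: last on the midterm, first on the final.
mirrored = course_bias({"midterm": last, "final": 0}, n)

# Independent shuffle per exam: sample many possible outcomes for one student.
rng = random.Random(0)
shuffled = [
    course_bias({"midterm": rng.randrange(n), "final": rng.randrange(n)}, n)
    for _ in range(10_000)
]

print("fixed order, last student:   ", round(fixed, 3))
print("mirrored order, last student:", round(mirrored, 3))
print("shuffled, average bias:      ", round(sum(shuffled) / len(shuffled), 3))
print("shuffled, unluckiest draw:   ", round(min(shuffled), 3))
```

                Under these assumptions the mirrored order does not cancel exactly, because the two exams carry different weights. That is the sort of detail you would have to know in advance to counterbalance properly, while the shuffle needs no such knowledge.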

  • dotnet00 13 days ago
    Yep, I noticed this with myself too when I first did some grading a few months ago.

    There was also the factor that the ones I graded initially did not make certain mistakes or answered in expected ways, such that when I did encounter unexpected answers/mistakes, I had to go back and rethink the grading on the papers I had graded previously. Eg if someone answered in a way that made me think an answer I considered incorrect was actually less wrong.

    I only had to deal with a small class, so backtracking was doable and I graded the papers in whatever shuffled up order they were turned in, otherwise there would have definitely been a bias.

    • bee_rider 13 days ago
      I especially noticed this when grading programming projects, because it is slightly complicated.

      I’d either find that:

      A bug was really common, got to re-evaluate after the first couple times I see it, apparently it is an easy mistake to make.

      Or, I’d find a new bug that was pretty common, but which I didn’t know about at first. Got to update my tests and re-run everybody.

      I tended to be really thorough and re-do the whole stack eventually, but it was a real pain. Could have half-assed it of course, but they spend weeks on these things, feel like I owe them honest feedback.

      It would tend to lead me to “softer” grading as well, if you are lazy and only check for a couple bugs, you might take a large number of points off for each problem. Finding some problems and punishing them harshly is not very fair for those students that randomly hit the bugs you expect. If you find every bug, you can only take a couple points off per bug without tanking everybody’s score.

    • JadeNB 13 days ago
      > I only had to deal with a small class, so backtracking was doable and I graded the papers in whatever shuffled up order they were turned in, otherwise there would have definitely been a bias.

      Grading papers in submission order just introduces a different bias, though.

      (For what it's worth, I'm in the same boat and I do the same, because I don't trust my ability to give the papers any true random sorting by hand, so I take the very weak randomization that the submission order gives me.)

      • dotnet00 13 days ago
        Introducing a slight bias factor that is randomized each time results in a lower average bias compared to a bias factor that is the same every time. Plus, as these weren't take-home assignments, I think someone finishing earlier is more likely to be either someone who was already going to score well, or someone who was already going to make the most common errors.
        • withinboredom 13 days ago
          I take tests extremely quickly, I either know the answer or guess it from what I know. I don't think about it. I was usually one of the first people to turn in tests.

          I was usually (almost always) the last person to turn in assignments, I like to be one of the last people out of a door or the last person in a line (I don't like crowds).

          Grading by order-turned-in would almost always mean my assignment would be one of the first or last ones graded.

          I would guess that if you did a frequency analysis of people to order, you'd find there were always a certain group who turned it in first, and another group that turned it in last.

          • brewdad 13 days ago
            You need to find a classmate to be a chaos gremlin that randomly mixes up the pile when they drop off their assignments.
        • JadeNB 13 days ago
          > Introducing a slight bias factor that is randomized each time results in a lower average bias compared to a bias factor that is the same every time.

          That's what I'm saying—it's reasonable to believe that the submission time is correlated with other factors, such as ability or confidence (though the effect can cut both ways, with extremely able students submitting early because they finish early or late because they are extra careful, and similarly for other factors). Thus, this isn't really randomization, just correlation with another factor than the name.

  • jedberg 13 days ago
    This is basically the reason my kids have the last name that they do.

    My last name starts with E and my wife's with Y. Bucking tradition, she didn't change her name when we got married, so when we had kids we had to decide what name to give them. We opted to hyphenate.

    Historically, hyphenated last names were [Woman's last name]-[Man's last name]. However, my wife hated that her last name was near the end of the alphabet growing up.

    We bucked tradition once again and put my name first, so that when sorted alphabetically they would be at the front of the list. Incidentally their first names start with A and B so that they show up at the front when sorted by first name too.

    • zvolsky 13 days ago
      Haha, I've always enjoyed being at the end getting less attention from teachers. If the data merely shows a correlation, it may as well be explained by us at the end being under less pressure.
    • lelanthran 13 days ago
      > Bucking tradition, she didn't change her name when we got married,

      Unless you were married earlier than the 90s, I wouldn't really call that "bucking tradition" any time from, say, the mid-90s onwards.

      If you really want to buck tradition, then don't get married - just live together, and have kids :-)

      (After all, there's nothing more traditional than marriage, is there?)

      • jedberg 13 days ago
        In the US, 80% of women still take their husband’s last name.

        But you hit on an important point — a lot of couples are just skipping marriage now.

        We went halfway there — we bought the house together years before we got married.

        • zeroonetwothree 13 days ago
          Owning a house together is probably a more serious commitment anyway
    • mjh2539 12 days ago
      In Latin American countries (and Spain) the paternal surname goes first, followed by the maternal surname.
    • throw_pm23 13 days ago
      Wow, you really gave your children a headstart there :)
    • alephknoll 13 days ago
      [flagged]
      • jedberg 13 days ago
        You seem to be irrationally upset about my light-hearted anecdote. I sincerely hope your weekend gets better.
        • alephknoll 13 days ago
          It was just a simple observation. You are reading into things too much. There's no need to be so defensive. Your comment hasn't upset me in the slightest ( rationally or irrationally ) and I sincerely hope you weren't offended by mine.
          • macintux 13 days ago
            > It must be exhausting being married to a woman who wants to 'buck tradition'. Why didn't she buck tradition and just name your kids 'Aa, Aa', 'Aaa, Aaa', etc and be done with it? Heck why not go all the way and let them go nameless.

            You managed to combine snarky reductio ad absurdum and a gratuitous attack on his wife in three sentences. Why wouldn't someone be annoyed by that?

  • shipmaster 13 days ago
    My last name starts with a letter at the bottom of the alphabet. I notice this all the time. Anecdote from this year: My son is in a high school class that requires constant input from the teacher on long running projects they have. The teacher reviews the projects alphabetically by surname, and about 40% of the time never gets to the bottom of the class, asking the students to find her after school if they have issues. But the nature of the projects definitely requires proactive comments from the teacher. I ask my son to go find the teacher regardless and get a proactive review, but not all the kids do that, and hence the potential for a lower grade.
  • nebulous1 13 days ago
    I wonder why Helen Wang chose this as a research topic
    • jeegsy 13 days ago
      Well spotted!
  • candrewlee14 13 days ago
    Serious unintended consequences of ordering… Reminds me of the hungry judge effect [1] - judges tend to be more harsh before a break and more lenient after.

    [1] https://en.m.wikipedia.org/wiki/Hungry_judge_effect

    • thaumasiotes 13 days ago
      https://nautil.us/impossibly-hungry-judges-236688/

      > we should dismiss this finding, simply because it is impossible. When we interpret how impossibly large the effect size is, anyone with even a modest understanding of psychology should be able to conclude that it is impossible that this data pattern is caused by a psychological mechanism. As psychologists, we shouldn’t teach or cite this finding, nor use it in policy decisions as an example of psychological bias in decision making.

      • SamBam 13 days ago
        Odd article. It simply states that the effect size is too big to be believable (it calls it repeatedly "impossible," but it doesn't seem like it can possibly mean "literally impossible" or "mathematically impossible.") It doesn't give any alternative explanations or specific ways the study is wrong. And it links to a rebuttal by the original authors where they responded to a bunch of the suggestions for data error or confounding factors and found that their results remain.
        • thaumasiotes 13 days ago
          That is explained in pretty much the section I quoted. The explanation of the effect is given in the article's links.

          But the article is written specifically to make the point that it should be enough to observe that it isn't possible for the effect to be real. You aren't making a good point when you cite an effect that is obviously nonsense.

  • prof-dr-ir 13 days ago
    Randomizing the grading order just hides the problem at the level of an individual course, but at least it helps in the average.

    More worrying is when e.g. job candidates are discussed (often in alphabetical order) and people simply tire out near the end of the meeting. When this happens, be sure to suggest taking a break!

  • retrac 13 days ago
    Electoral ballots have often listed the candidates in alphabetical order, but some studies have suggested that it gives a small benefit to the first person listed. [1] Many election authorities, in Canada at least, have shifted to randomizing the order in some way [2]. Some people have even played with alphabetic sort for novelty purposes; a man in Ontario changed his legal name to "Above Znoneofthe" so he would appear last on the ballot as "Znoneofthe, Above".

    [1] https://electionlab.mit.edu/research/ballot-order-effects

    [2] https://www.cbc.ca/news/canada/british-columbia/vancouver-do...

  • xyst 13 days ago
    That 0.6 pt gap over multiple semesters is the difference between graduating with “summa cum laude” or “magna cum laude”
    • zeroonetwothree 13 days ago
      It’s 0.6% so it would only be if you happened to drop a letter grade as a result. Like 90.5 -> 89.9. And that would have to happen multiple times to significantly affect your GPA.
  • ghghgfdfgh 13 days ago
    There's a section of one of the Diary of a Wimpy Kid books that talks about this exact thing. I was reminded of it as soon as I saw the headline. The justification it comes up with is that kids with names at the front of the alphabet sit in the front of the classroom, so they get called on and learn more. It definitely turned some gears in my brain when I first read it as a teen. Here's the relevant page: https://imgur.com/a/6wIx6qg
  • COGlory 13 days ago
    Multiple factors at play here.

    1) Rubrics are often defined, but the application of the rubric is by a human. Application will shift as the grader gets a sense of the class's understanding.

    2) As you get fatigued while grading, you'll make mistakes, and be less tolerant of others. Especially if you're an overworked adjunct or graduate student.

    3) There are probably a lot more last names early in the alphabet so weighting is important.

    My policy on this when I was a grad student was to publish the rubric, and ask all students to check their grades too.

  • princeb 13 days ago
    >“We kind of suspect that fatigue is one of the major factors that is driving this effect, because when you’re working on something for a long period of time, you get tired and then you start to lose your attention and your cognitive abilities are dropping,” Pei said.

    there is a similar effect found here https://en.wikipedia.org/wiki/Hungry_judge_effect

    • tokai 13 days ago
      I believe the hungry judge effect has generally been accepted as false.
    • zeroonetwothree 13 days ago
      The thing is, it’s unclear why that effect would make you give people lower grades. Surely an equally reasonable guess is that reduced cognitive ability could make you give higher grades because you don’t notice errors?
      • janci 13 days ago
        Sometimes you see the result is wrong so you do not give any points initially and then look on the steps and try to find something that looks correct to give at least some points. The willingness to track through every step diminishes with increasing fatigue.
      • bee_rider 13 days ago
        It depends on what you are doing and how you are grading. I’d try to not take many points off if an error is somehow “really easy to make,” but that depends on my ability to evaluate the difficulty of mistakes.
  • redandblack 13 days ago
    When I studied engineering in India, we never put our names on the finals at college. Everyone gets an exam ID and that goes on the answer sheets.

    Also, it is never your professor who grades you - the answer sheets are collected and lecturers/professors will correct them at the state level across all the engineering colleges in my state.

    I do not know how it is now as there has been an explosion of colleges in the state. But expect the standardized tests are similarly conducted.

    • kwhitefoot 13 days ago
      A lot of bachelor's degrees these days are awarded on the basis of modules with no finals. For instance when I did a course on C# a few years ago in Norway that was worth 6 points (I got full marks :-) ). If I had done another 29 modules of similar difficulty I would have got 180 points and been awarded a BSc in Computer Science.

      It's quite different from the way it was when I studied physics in the 1970s when only the final counted. Annual exams only determined whether one was allowed to continue but had no effect on the class of degree that was awarded.

    • user_7832 13 days ago
      As far as I know even now it's the same for government universities (eg Delhi/Mumbai Uni). But private unis may just have a few/one profs grade everything.
  • samatman 13 days ago
    A computer-based system like this is an opportunity to remove all personal details from an assignment while grading it; it baffles me that this isn't the default.

    The database could tag every assignment with a UUID4, and present them for grading top-to-bottom in UUID lexical order, without exposing who is being graded in any way.

    You can't fix fatigue bias, but this would distribute it randomly. It also removes the opportunity for favoritism and hostility, subconscious or otherwise, which is probably more important.

    Once grading is completed, the assignments are reconnected with students. Give the profs a way to mark assignments with metadata, sometimes they need to talk to a student personally about something, this should be made easy.

    Grades can't be immutable, professors need discretion in that, but it would leave an audit trail if professors maliciously modified grades (or the opposite). That should be uncommon to begin with, but both professors and students benefit from an audit trail here.

    A system like this should be used whenever it's practical, and always for high-stakes tests like midterms and finals. Not making a case against oral exams here, just that when it's possible to blind the grading process, it should be.
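
    A minimal sketch of the blinded ordering described above, in Python. The function names (`blind_submissions`, `unblind`) and the in-memory dicts are hypothetical stand-ins for whatever a real learning management system would store; only `uuid.uuid4()` is a real library call. The grader only ever sees the token and the text, and the roster that maps tokens back to students stays out of view until grading is done.

```python
import uuid


def blind_submissions(submissions):
    """Tag each submission with a random UUID4 and hide the student name.

    submissions: dict mapping student name -> submission text.
    Returns (queue, roster):
      queue  - list of (token, text) sorted by token, shown to the grader
      roster - dict token -> student name, kept away from the grader
    """
    roster = {}
    queue = []
    for student, text in submissions.items():
        token = str(uuid.uuid4())
        roster[token] = student
        queue.append((token, text))
    # Lexical order of random UUIDs is unrelated to the student's name.
    queue.sort(key=lambda item: item[0])
    return queue, roster


def unblind(grades_by_token, roster):
    """Reconnect grades with students once grading is complete."""
    return {roster[token]: grade for token, grade in grades_by_token.items()}


# Hypothetical example data.
submissions = {"Adams": "answer A", "Nguyen": "answer N", "Zhou": "answer Z"}
queue, roster = blind_submissions(submissions)

# Placeholder "grading": a real grader would read the text and enter a score.
grades_by_token = {token: len(text) for token, text in queue}
final_grades = unblind(grades_by_token, roster)
```

    Because fresh tokens are drawn per assignment, a student's position in the queue changes from one assignment to the next, which is what spreads any fatigue effect around.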

  • boesboes 13 days ago
    This seems related: https://news.ycombinator.com/item?id=39672111

    As in, order matters

  • xmddmx 13 days ago
    Is anyone confused by "lower-ranked names"? To me this means A, B, C, but the article says "Wang said students whose surnames start with A, B, C, D or E received a 0.3-point higher grade out of 100 possible points than compared with when they were graded randomly."

    So I guess "alphabetically lower ranked" means the last letters of the alphabet, not first? Confused.

    • samatman 13 days ago
      This is an important observation!

      The programmer's perspective and the user's perspective aren't always the same, and both need consideration. A user is going to see a list: it starts at the top, and it ends at the bottom. The first fields are higher, the later fields are lower.

      Of course, if this is a sorted list, the first field will be the "lowest" value, for whatever comparison is used to sort it.

    • pks016 13 days ago
      Yes, while grading we divide the students by their last names.
    • ghaff 13 days ago
      Yeah, I misunderstood this at first and then was somewhat confused by the comments until I actually clicked through and looked at the post. :-)

      I can actually believe the effect going in either direction and it's small.

  • danilor 13 days ago
    Has anybody found this link to this study? Or even the title?

    I searched the authors in google scholar but I couldn't find it.

  • StefanBatory 13 days ago
    I have a surname that's alphabetically low. Even at uni, the number of times I went to class and came out empty-handed because my teacher didn't score my assignment in time (at my uni, 90% of the time we have an oral discussion about it), so that I had to come back the next week while others didn't, is way too high.
  • RecycledEle 13 days ago
    I can explain why the kids with A names outperform the kids with Z names.

    As someone whose first and last names are both very early in the alphabet, I was always called on first or second when I was in elementary school and middle school. I always had to be there early.

    My friend whose name was very late in the alphabet learned he did not have to be ready for the first minute or two of class.

    He would be standing near the door talking as I was quickly pulling out last night's homework, and I would be marked down for not being ready while he would later be commended for being ready when the teacher called his name.

    As a teacher, I see that the kids who stand outside the door talking do not do as well as the kids who are there early.

  • cm2187 13 days ago
    We know there are big disparities of academic success by ethnic group (cf the whole harvard discrimination against asians controversy), and there are also big concentrations of patronyms by ethnic groups (or at the minimum first letters that are more common in one part of the world than another). And on top of that if the university itself discriminates against certain ethnic groups in its recruitment it will reinforce this bias (like if asians students require better grades to get in, it is unsurprising those students that get in perform better than the rest).

    That would be my best guess for a rationale behind that result.

  • corimaith 13 days ago
    If we changed our policy of exams from discriminative to evaluative, grading bias would be a trivial issue but here we are since we just NEED ways to fit everyone into numbers that we can easily use.
  • huffmsa 13 days ago
    I had a theory in school that this was the case for presentations too so I always forced myself to go first. No one else to compare me against, and no sitting around getting jittery.
  • TrianguloY 13 days ago
    I also have the theory that having an app/software starting with A, B, or an "alphabetically first" letter was noticeable in the past. Nowadays things are usually sorted "algorithmically", but it was common for stores to list searches with some alphabetical score, which meant that those apps were usually shown first.

    Even now, for example, if you go to Play Store and want to know the apps that you had but are not installed, the default sorting is by name.

  • TrianguloY 13 days ago
    As a different but similar situation: I have a first name that is usually at the top when sorted alphabetically. Nowadays it's not a problem anymore, but as a kid I usually received a lot of calls from people that either misclicked or didn't know how to use a phone properly. It turned out it was because I was the first on the phonebook list.
  • jimmar 13 days ago
    Order effects are real. I'm a prof. I notice that the longer I grade, the less motivated I am to take off points and then justify why I took off those points. It's easier just to give points and move on. (And if anybody wants to criticize this, I'll be happy to launch into a diatribe on the psychometric dumpster fire that most assignments and their associated grading scales really are.)
    • dgacmu 13 days ago
      Also prof: me too. I'm much more likely to provide comments on the first couple of exams I grade than on the later ones.

      I've found that gradescope is helpful in this regard, because it at least forces every point assignment to be matched to a rubric item. I don't have data, but I believe it makes our grading a lot more uniform compared to the pre-gradescope days. (This might be easier in grading computer science exams than in more subjective areas, though.)

    • zeroonetwothree 13 days ago
      This is the opposite of the effect they found. I do wonder if there is a big difference depending on grader and the study found some kind of average.
      • jimmar 13 days ago
        The article mentions that the paper is under review, but I'm guessing the effect size is small and that individual differences between graders are very substantial. The article states:

        > The researchers collected available historical data of all programs, students and assignments on Canvas from the fall 2014 semester to the summer 2022 semester.

        Thousands of students X 8 years X lots of assignments per year and you get a sample size so big that it would be hard not to find statistically significant effects.
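
        A rough sketch of that sample-size point, in Python. The 0.6-point gap is from the article; the standard deviation of 15 points and the group sizes are assumptions picked purely for illustration, and the two-sample z statistic is a textbook simplification, not the analysis the researchers ran.

```python
import math


def two_sample_z(diff, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups
    that share a common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error


GAP = 0.6          # grade gap reported in the article (points out of 100)
ASSUMED_SD = 15.0  # assumption: spread of grades, not taken from the study

for n in (100, 10_000, 1_000_000):
    z = two_sample_z(GAP, ASSUMED_SD, n)
    print(f"n per group = {n:>9,}: z = {z:.2f}")
```

        Under these assumptions the same 0.6-point gap is nowhere near the conventional 1.96 cutoff with a hundred students per group, and clears it easily once the groups reach the thousands, so statistical significance here says little about how large the effect is.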

  • diogenescynic 13 days ago
    It's the same with applying to jobs. The first applicants have a greater likelihood to get the job. If you're given a list of names... you're just generally more likely to pick something from the top of the list than the bottom.
  • analog31 13 days ago
    I propose one of the following:

    1. Keep the present system of grading by alphabetical order

    2. Record the order in which the papers are actually graded

    When the grading is done, the teacher assigns a point scale (A = 90, B = 80 or whatever) but the computer does a regression fit and removes the bias.
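
    One way the regression step could look, as a hedged Python sketch. It assumes the order effect is linear in grading position and that position is otherwise unrelated to the quality of the work; `remove_order_trend` is a hypothetical helper, not something any grading system is known to provide.

```python
def remove_order_trend(grades_in_grading_order):
    """Fit grade = a + b * position by ordinary least squares, then subtract
    the position-dependent part b * (position - mean_position) from each grade.

    The class mean is unchanged; only the linear drift with position is removed.
    Returns (slope, adjusted_grades).
    """
    n = len(grades_in_grading_order)
    if n == 0:
        return 0.0, []
    mean_pos = (n - 1) / 2
    mean_grade = sum(grades_in_grading_order) / n
    sxx = sum((p - mean_pos) ** 2 for p in range(n))
    sxy = sum(
        (p - mean_pos) * (g - mean_grade)
        for p, g in enumerate(grades_in_grading_order)
    )
    slope = sxy / sxx if sxx else 0.0
    adjusted = [
        g - slope * (p - mean_pos)
        for p, g in enumerate(grades_in_grading_order)
    ]
    return slope, adjusted


# Made-up grades that drift down by half a point per paper graded.
raw = [90 - 0.5 * i for i in range(5)]
slope, adjusted = remove_order_trend(raw)
```

    With a single class the fitted slope will be noisy, and if grading order is alphabetical, any real difference that happens to correlate with surname gets removed along with the bias, so this probably works best combined with a shuffled grading order.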

    • 2cynykyl 13 days ago
      This is a great idea! Next time I mark a stack of exams I will also note the time of day that the mark was entered. I can then cross-reference this with how long I have been sitting between breaks, since my last meal, etc, etc. Unfortunately I will not have this opportunity until mid-fall 2024.
  • beryilma 13 days ago
    With huge grade inflation in US universities, all students are already getting better grades than they really deserve. The amount of gymnastics that professors do to pass all students is insane. So, no student is really receiving a lower grade.
  • jncfhnb 13 days ago
    Most exam grading is not viewing the writing as a whole but rather looking for incidences of specific points to assign credit for. One could imagine an LLM be quite effective at labeling sentences as pertaining to a predefined idea at scale.
  • stikit 13 days ago
    A .3 point difference isn’t going to make a real difference to anyone’s life and is likely a wash when other yet undiscovered biases are in the mix. Unfairness and bias is a critical factor in driving people to extraordinary achievements.
    • wolverine876 13 days ago
      > Unfairness and bias is a critical factor in driving people to extraordinary achievements.

      The evidence is a strong negative correlation between bias and achievement: Extraordinary achievements are so disproportionately achieved by people in groups that are not the target of bias. Look at top government officials, SV leaders, Nobel Prize winners, etc etc - mostly white males.

      The biggest targets of bias in the US, for example - probably women and black people - generally get the worst results (in areas where there is discrimination). By contrast, as an example wherever black people aren't subject to bias, such as certain forms of music and certain sports, achievement is extraordinary. Imagine all that talent and drive in other fields.

    • inemesitaffia 12 days ago
      It stacks over time
  • searealist 13 days ago
  • yencabulator 13 days ago
    It seems it would take less time for Instructure, Inc. (makers of the mentioned software) to fix this than it took to do this research.

    Anyone know whether this is happening, and if not why not?

  • markusde 13 days ago
    I noticed this in myself last time I was a TA. I'd go back and re-grade the first 15 assignments or so to make sure the rules were being applied consistently.
  • largbae 13 days ago
    What other popular systems might lead to different outcomes based on sort order? Dating site matches? Your own contact list?

    Interesting category of problems...

  • stevage 13 days ago
    Would it be possible to simply accept that this exists and automatically unskew the grades after marking?
  • 1-6 13 days ago
    Let’s just hope parents don’t try to game the system by starting to name their kids AAAi Aung.
    • nsenifty 13 days ago
      I'm Indian (in the US) and I've noticed a vast majority of my Indian friends name their kids Aanav, Aanir or Aanvi etc. some of which aren't even words in any Indian language. Now I probably know why.
    • jen20 13 days ago
      Fortunately Bobby is near the front of the alphabet anyway!
  • levocardia 13 days ago
    > Wang said students whose surnames start with A, B, C, D or E received a 0.3-point higher grade out of 100 possible points than compared with when they were graded randomly. Likewise, students with later-in-the-alphabet surnames received a 0.3-point lower grade — creating a 0.6-point gap.

    The hand-wringing over such a small effect size seems unwarranted. I suspect you would find similar effect sizes for other small interventions, like whether the grading took place during the week or the weekend, or in the morning vs. the evening.

  • 1shooner 13 days ago
    This reminds me of an experience I had of just the opposite: tightly-controlled consistency in writing assessments:

    Almost 20 years ago I worked for a standardized test essay grading service. We graded against all sorts of secondary-level rubrics (not AP, who do their own). These would usually be from 9 - 12 grade, from every US state, and evaluating everything from reading comprehension to subject matter-specific assessment. We'd do weeks long jobs of a single test (e.g. Alabama 9th grade reading proficiency). These usually had at least 3 dimensions, and at least 4 points per dimension. We would go through a week or more of training on a rubric, then another week of 'leveling', where a manager would occasionally bring you aside and talk through why that '3' you gave on a dimension should have been a '2'.

    By the end of the training, we usually had had enough discussions and encountered enough edge cases to understand the weaknesses/inconsistencies in the rubric (which we had to abide by anyway). Once we were running at full-speed, everything was still double-graded and inconsistent scores were reviewed. Sometimes graders were pulled if they still didn't get the rubric.

    It was a simultaneously stimulating and very boring job, and most readers were educators themselves. I wonder how long before it disappears completely.

  • dcposch 13 days ago
    I bet this correlation goes away if you separate the data by ethnicity.
    • carabiner 13 days ago
      Yeah Chen, Cho, and Cohen are up there and would bias results.
  • p0w3n3d 13 days ago
    Just do name coding. I doubt this happens everywhere in the world
  • klysm 13 days ago
    Job interviews have similar effects
    • 1-6 13 days ago
      Order matters a lot but recruiters typically present the highest flyers first and the lower candidates last.
      • ghaff 13 days ago
        In my experience, it varies. I've been on interview panels where we just weren't feeling it for a number of candidates and basically told the recruiter to try harder and eventually hit someone where we said, "That's who we want. Find a way to make it happen."
  • Aldo_MX 13 days ago
    Maybe the answer is smaller groups?
  • flawsofar 13 days ago
    what’s weird is just how long it took to find a statistic like this one
  • redandblack 13 days ago
    The other benefit for being higher in the alpha order is you get the snow day calls first - 4:30 am, and get to call your friends before school calls them.

    We were always woken up by my daughter screaming as her friends called her. No such luck for the post-pandemic kids.

  • faitswulff 13 days ago
    I wonder if these biases are replicable in LLMs.
  • pavlov 13 days ago
    Clearly evidence of anti-Polish bias when all the Zbigniews and Zygmunts and Wojteks get lower grades. (Or just another example of correlation vs. causation in action)
  • hilux 13 days ago
    > Wang noted that for a small group of graders (about 5%) that grade from Z to A, the grade gap flips as expected

    This is critical. Otherwise we could not discount some group (e.g. some ethnicity) disproportionately occupying one end of the alphabet or another.

    Super interesting and important finding. I hope this gets wide visibility and universities take a break from politicking to fix the problem - presumably through enforced randomizing.

    • buggy6257 13 days ago
      Enforced randomization isn't going to fix the problem, it just evenly distributes the problem.

      Based on these results, it would mean that the graders are just getting tired/lazy/inattentive the further they get in their stack of papers to grade. That's the problem that needs to be fixed, not the order they get graded in. Enforced randomization is simply a short term alleviation so no student(s) get disproportionately affected by this phenomenon.

      • bluGill 13 days ago
        > it would mean that the graders are just getting tired/lazy/inattentive the further they get in their stack of papers

        Or maybe they are getting better / more picky.

        I know in code reviews I often pass a few and then notice something that I realize was also wrong in previous reviews I allowed, but later reviews that day (week?) will not allow that.

        • 13of40 13 days ago
          I've participated in day-long and multi-day interview events for job candidates, and I see the same effect. At the beginning you don't have a frame of reference and you're more likely to question your own decision or give someone the benefit of the doubt, but by the end you're far more systematic, plus a little bit numb to the effect your decision is having.
          • throwaway35777 13 days ago
            > by the end you're far more systematic, plus a little bit numb to the effect your decision is having

            Maybe decision fatigue is supposed to bias humans toward the optimal solution for the fiancee problem [1].

            [1] https://en.m.wikipedia.org/wiki/Secretary_problem

          • cyanydeez 13 days ago
            For grading, you could probably just add a mediating factor and throw in test cases that calibrate the factor and then you curve everyone on that factor.

            It'd seemingly be more work but would result in averages that are more reasonable to the changes in stress.

        • labcomputer 13 days ago
          Yes, and:

          Additionally, universities (and, by extension departments) want grades to approximately follow a normal distribution (and yes, you in the back, their actions show they do actually want that, even if they say otherwise).

          When you start grading a problem you have some idea what a "good" solution looks like, what an "ok" solution looks like, and same for "bad" solutions... If you award points based on that, the result will be a normal-ish distribution. But your idea of a good/ok/bad solution evolves as you see more papers.

          There's two reasons for that:

          First, you can't (ahead of time) imagine all the ways that students will invent to fuck up a problem set, and find edge cases in your grading rubric that result in unfairly-high or -low scores. As you gain experience teaching, you will anticipate more of the ways, but you will never anticipate every way.

          Second, the TA/grader wants to be able to stack-rank the papers and have the scores be monotonic. The grader wants this because non-monotonic scoring triggers far more complaining than harsh scoring or picky scoring. When you come across papers that are worse than ones you've already recently graded, you assign even lower scores.

          This results in a ratcheting effect with more extreme scores as you get closer to the bottom of the pile. But, since the mean score is usually a B/B-/C+ (~75-85), and since scores are usually limited to the range 0-100, this means that papers closer to the bottom will receive statistically lower scores.

          Now, you could go back and re-grade ones you've already done, but:

          1. The university is officially only paying you for 20hrs/week (and requires a signed end-of-semester statement attesting to the same).

          2. The assigned workload of teaching and grading doesn't permit a two-pass grading scheme while keeping within 20 hours.

          3. If you complain to the graduate ombudsman about the workload needing more than 20 hours, you won't have funding next semester (so you have a prisoner's dilemma among TAs who might want to grade more fairly).

          4. If you're grading (say) a final exam for a frosh/soph class, you're probably in a room with 4-8 other graders late into the night. One effective way to make your coworkers hate you is to be that guy who always finishes grading his stack last, when everyone is worried about catching the last train/bus.

          Basically, all the incentives are aligned to make this happen.

          • hilux 13 days ago
            That's thought-provoking - thank you.

            Essentially, unless it's an old exam where the universe of bad answers is already known, you need two passes - a discovery pass followed by the grading pass.

        • bigfudge 13 days ago
          In my case, I have to make a conscious effort to remain consistently (in)tolerant of lazy writing. It’s hard to keep on reading between the lines and giving the benefit of the doubt.
        • rjzzleep 13 days ago
          I had the same conclusion. You learn things as you go, including things you don't like.
      • davrosthedalek 13 days ago
        In my experience, it's not tired/lazy/inattentive, but resignation. You normally have some expectation what students will be able to solve. Typically, these expectations are set too high. That's very common, not only for me, but for pretty much anyone I know. So over the time of grading, one adjusts down the expectations and gives partial credit earlier, for example.
      • throwaway35777 13 days ago
        I was a grader once. I guarantee if someone gives a good answer they'll get full marks even near the bottom of the stack. For BS answers I'll admit I got less generous as the hours went on.

        No one's getting hurt by this system if it's randomized. It's a matter of graders giving out partial credit for wrong answers which is discretionary. Rarely students are granted a small mercy. Seems OK.

        • dunham 13 days ago
          I was one of many TAs for a large math class in college (pre-calc - think high school math for college students). For uniformity, the prof had the partial credit down to a science - specifying points for getting certain aspects of the problem. For the finals, a few TAs would be assigned to a given page, for uniformity.

          The fascinating thing was that the distribution of grades was about the same every year.

          And I had a math prof for analysis who would give negative points for BS answers. You could say “I need X but don’t know how to prove it” in the middle of a proof, but if you made up something that was incorrect, you’d get negative points.

          • hilux 13 days ago
            Oh, that brings back memories! "For every epsilon, there is a delta ..."
        • bumby 13 days ago
          >For BS answers I'll admit I got less generous as the hours went on.

          What do you think is the cause of this? Do you become more cynical (and less generous) because you’ve seen so many BS answers previously? Is it just that getting fatigued makes you less generous?

          • ihaveajob 13 days ago
            When I was a TA in grad school, I noticed the same. Early on I thought some BS answers were at least kind of funny, and I gave them the benefit of the doubt, maybe giving more attention to the parts that were correct. After I saw similar answers later on, the novelty wore off and I was probably less amused, so the inclination to be lenient disappeared. Sometimes I went back to previous decisions if I remembered them, to be fair, but I don't think I always remembered since the volume could be high (grading 80 exams in a row is TEDIOUS).
      • BugsJustFindMe 13 days ago
        > Enforced randomization isn't going to fix the problem, it just evenly distributes the problem.

        Evenly distributing the problem does fix the problem. Proportionality is what matters. Grading being arbitrary is fine if everyone is graded equally.

        • zeroonetwothree 13 days ago
          Random order would still mean a few students in the class get unlucky and near the end the majority of the time. Although over the course of all classes it would tend to even out somewhat.

          It’s certainly better than fixed order.

          • BugsJustFindMe 13 days ago
            "randomization" is not the important part here. "evenly distributing" is. It is absolutely possible to reorder the sequence fairly such that your scenario doesn't occur. It could even to a human observer look randomized if you want. In a trivial example case where the effect were linear you could just switch the order back and forth, and on average every student would receive the same middle-of-group impact.
        • whiterknight 13 days ago
          The mistake is assuming grades are an objective measurement, and not gamification to try to help you learn.
          • BugsJustFindMe 13 days ago
            It's a common mistake. So common, in fact, that it has real practical impact on students at the edge who might not otherwise have failed or passed.
      • skhunted 13 days ago
        For me I grade tests as follows. The stack is created as students turn in the test. I grade the first page in that order. The stack reverses for the second page. So on and so forth. I teach college math. I just can't imagine a system of grading done in alphabetical order.
        • falseprofit 13 days ago
          Scanning and grading on a computer can alphabetize them.
          • skhunted 13 days ago
            That makes sense. I haven't had people upload assignments for a long time. I'd forgotten that this was a thing.
        • kurthr 13 days ago
          I also came here to say this. My only guess is that the alphabetization (by the "learning management system") is meant to make filling the grades into a table "easier" for the computer or for the person handing out the results. Why is it "easier" if the system doesn't have to order them at all, or it could do so by student number (same issue as alphabetical order) or something random, which is the other (non default) option for the "learning management system".

          I feel like only the most obsessive compulsive humans would have this issue (without computer "help"), as the last thing I wanted to do as a TA was to add another step of ordering all the papers before grading them. I also always reviewed the first few papers I graded after grading the rest to make sure I was being fair, because it was obvious to me that until I saw a representative distribution of answers I couldn't do fair grading/marking.

      • hilux 13 days ago
        In the real world, universities are never going to fix the problem of overworked and underpaid grad students getting tired.
      • furyofantares 13 days ago
        It's a 0.6 gap from top to bottom out of a score of 100. Plus or minus a third of a percent from average. Pretty small effect. But it would add up (or, well, persist - it wouldn't get bigger) if it happens to you for every assignment for every class and that sucks.

        If there's more than one assignment you can basically erase it by randomizing each separately.

        If you really care beyond that then randomize for one assignment, flip it for the next, then randomize again for the next etc.
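
        A small Python sketch of that randomize-then-flip schedule. The function name and the roster are made up for illustration, and the fairness claim in the note below holds exactly only if the order effect is roughly linear in grading position.

```python
import random


def grading_orders(students, num_assignments, seed=None):
    """Return one grading order per assignment: shuffle for even-numbered
    assignments (0, 2, ...), then reverse that shuffle for the next one."""
    rng = random.Random(seed)
    orders = []
    for i in range(num_assignments):
        if i % 2 == 0:
            order = list(students)
            rng.shuffle(order)
        else:
            order = list(reversed(orders[-1]))
        orders.append(order)
    return orders


# Hypothetical roster.
students = ["Adams", "Baker", "Chen", "Nguyen", "Zhou"]
orders = grading_orders(students, num_assignments=4, seed=0)

# Average position in the pile for each student across all assignments.
mean_position = {
    s: sum(order.index(s) for order in orders) / len(orders) for s in students
}
```

        With an even number of assignments, each shuffled order is paired with its mirror image, so every student ends up with the same average position (the middle of the pile) regardless of how the shuffles fall.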

      • WaitWaitWha 13 days ago
        > graders are just getting tired/lazy/inattentive the further they get in their stack of papers to grade.

        I will admit to this. Initially, my patience and tolerance for errors are significantly higher than towards the end of the grading. By the second hour of grading, not only am I mentally exhausted, my tolerance is significantly lower.

        I try to prevent this by creating very explicit grading rubric and I stick to it as much as possible.

        • ghaff 13 days ago
          Clear rubrics are the thing where possible. They aren't everywhere though. I've been on conference committees and so many different factors come into play--including how late in the day it is. But, in that case, a bunch of people are rating and commenting and there's no strict order so it probably evens out to a reasonable degree.
      • bumby 13 days ago
        As the number of assignments grows, wouldn’t randomization help converge on the more accurate grades (in aggregate)?
        • falseprofit 13 days ago
          It would help, but with only a couple dozen courses and most determined by a couple exams it’s not quite a large number.
      • andix 13 days ago
        Even distribution would fix the problem. If grading has a subjective component, there will always be deviations from the "correct" grade. If those patterns are randomly distributed over all students, their grade averages will be comparable again.
      • cyanydeez 13 days ago
        Unfortunately, it's gonna be AI to the "rescue" and the problem is obfuscated.
    • freeopinion 13 days ago
      My first thought was, "Who takes the time to sort before grading?" Computers change the world in such incredibly subtle ways. Of course, such subtleties exist without computers. This is just one case where computers make the subtleties more detectable.
  • underseacables 13 days ago
    Anchoring?
  • m3kw9 13 days ago
    But all the wangs and Xiang and Zhu’s still getting high grades
  • mistrial9 13 days ago
    current curricular trends in California include "algebra removed from 8th grade as unfair" (or more extreme rhetoric given) and this week "equity grading for K-8" where there is no D or F given in any subject. These real-life changes combined with something so arbitrary as this one as "news" really give an impression of a collapse of some kind in public education discourse.
  • llm_trw 13 days ago
    I'm willing to take bets that in 15 years there will be a scandal about faked data by at least one of the researchers in this paper.

    It smells just like every other interesting psychology result that at best is a fluke.

    • zeroonetwothree 13 days ago
      I think it’s maybe less likely since this is looking at actual grades and not some kind of survey or experiment. But certainly it’s always a concern in social sciences until we get reproduction.
    • verdverm 13 days ago
      Unlikely. If you talk with anyone who's done grading, this will likely jive with our experience and make us data aware of the outcomes. Like anything, with grading you can get into a flow, and the more you process an assignment, the more answers you've seen and those can change how you grade future answers
      • somenameforme 13 days ago
        Not really taking a position on this one way or the other, but I would say that "this jives with my experience" is near to being a prerequisite for junk science. Somebody saying something controversial is going to be challenged -- confirming biases is precisely how you peddle junk.

        For instance, the Journal of Personality and Social Psychology [1] is a terrible journal, with a replication success rate in the 20% range. Yet it's ironically well regarded. Both can probably be explained by the exact same phenomenon: go read their articles and they read like a stream of bias confirmations for those of a certain ideological orientation -- the same orientation that's clearly widely shared amongst social science researchers.

        [1] - https://psycnet.apa.org/PsycARTICLES/journal/psp/126/2

        • verdverm 13 days ago
          I absolutely observed my own biases and created techniques to mitigate... a few that come to mind

          1. Grade problem by problem. This actually makes grading sooo much easier on your own mind

          2. Take a second pass to look for outliers in consistency

          3. When possible, craft problems that can be automatically graded for correctness. This leaves more time for commentary on the quality of the solution

          (I taught computer science, which lends itself to some of this)
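
          Point 3 could look something like the following minimal sketch. It assumes a hypothetical assignment where each student submits a function that returns a list in ascending order; `grade_correctness`, `CASES`, and the two sample submissions are invented for illustration and do not describe any real course's tooling.

```python
def grade_correctness(submission, cases):
    """Return the fraction of test cases the submitted function passes."""
    passed = 0
    for args, expected in cases:
        try:
            if submission(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed case
    return passed / len(cases)

# Hypothetical assignment: return the items of a list in ascending order.
CASES = [
    (([3, 1, 2],), [1, 2, 3]),
    (([],), []),
    (([5, 5, 1],), [1, 5, 5]),
    (([2, 1],), [1, 2]),
]

def correct_submission(items):
    return sorted(items)

def buggy_submission(items):
    # Drops duplicates, so it fails the case with repeated values.
    return sorted(set(items))

print(grade_correctness(correct_submission, CASES))  # 1.0
print(grade_correctness(buggy_submission, CASES))    # 0.75
```

          The correctness score here does not depend on where a submission sits in the stack, so grading order can only affect the manual part of the grade.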

          The harder bias to handle is the one you develop for students one way or another through the course of a semester or course. Perceived effort shifts grades

      • zeroonetwothree 13 days ago
        I really doubt you can notice a 0.6% discrepancy anecdotally. They only detected it in the study because of the massive amount of data they used.

        Classic confirmation bias.

        • verdverm 13 days ago
          Anecdotally, I would go back and adjust grades on individual problems from earlier in the stack.

          I can very easily notice my own over-strictness from early in the stack.

          • 2cynykyl 13 days ago
            For sure. I also find I have to update my rubric to give more/less part marks, which also requires going back. It takes about 10-15 graded papers before things settle down.
    • hilux 13 days ago
      The result seems pretty intuitive to me. The test is easy to re-run, unless the data have been "lost," which is not mentioned.

      Most importantly, none of the researchers is a psychologist or behavioral economist or any kind of "social scientist."

  • dboreham 13 days ago
    Again, if this kind of thing surprises you, read this book: https://www.amazon.com/Knowledge-Illusion-Never-Think-Alone/... The human brain is just a fancy ChatGPT with an internal UI that fools itself into believing it is more logical/smart than it actually is.
    • zeroonetwothree 13 days ago
      If anything the difference only being 0.6% seems pretty impressive for the brain.