Many years ago I investigated  the problem of Rush Hour with minimal size cars. I called it Unit Rush Hour, as the cars are just 1x1, but restricted to either horizontal or vertical movement. Interestingly, the puzzles can also be viewed as a kind of maze with restricted movement. My web page has the hardest 4x4 and 5x5 instances in playable form.
I found the hardest 6x6 puzzle to require a whopping 732 steps .
I'm interested in the "hardest" 6x6 puzzle. You show the one that takes the most moves to complete, but I don't think that's necessarily very difficult, if they're all pretty straightforward. Rather, I think the hardest puzzle is one where you have a lot of options to choose from. IOW, a very broad tree, rather than a very deep one. Are you able to quantify the hardest puzzle along those lines somehow?
ThinkFun, the US publishers of Rush Hour, wrote a blog post about this when they made the mobile app with a bunch of autogenerated levels maybe around 2010. Unfortunately that article is basically impossible to find now, since they redid their website and deleted the old posts. (I say "impossible", since I spent a couple of hours looking for it again while reading papers on procedural puzzle generation last year. Maybe somebody here has better luck).
The basic metric their level generator used for quantifying interesting difficulty was the earliest point of non-trivial divergence in solutions. I.e. if there's a puzzle with a 50 and 51 move solution, having those solutions diverge on move 3 is interesting. Having them diverge on move 45 isn't.
Check out the paper "Difficulty Rating of Sokoban Puzzle": https://pdfs.semanticscholar.org/880f/32f843e8e1fe9c712b0fc4... -- in particular the "problem decomposition" model introduced at the end. Being able to break a solution into subproblems makes it easier, even if there are a lot of steps (see Figures 5 and 6 in the paper).
One simple idea I had was NumMoves * log(ClusterSize) as a difficulty metric. But I'm not sure. I had my wife play several different puzzles to try to gauge difficulty based on that but it didn't seem so clear cut.
What we the people are demanding are insanely hard puzzles. Not an enumeration of hard games on the 6x6 board, but a heuristic generator for the 7x7 or the 8x8 or even the 10x10 if need may be, of games that seem utterly intractable for the first few hours.
Just finished this write up documenting my latest side project. I've been a bit obsessed with this problem for the past few weeks. Hope you like it! Be sure to check out the puzzle database and let me know if you do anything cool with it.
FWIW I think your approach to counting moves is more logical. We're more interested in understanding how many intermediate states a puzzle has between its initial state and solution. Whether you move a piece by one square or three is not particularly interesting.
Btw, if you like Rush Hour, I highly recommend Antivirus (terrible name for Googling) by Smart Games . The pieces come in 9 distinct shapes along with 2 wall pieces, which give most puzzles a unique character. Computational investigation shows that the hardest possible puzzle requires over 200 moves to solve. Beyond that, the game looks and feels fantastic .
> Ultimately I ended up with a complete database of every "interesting" starting position.
That's a nice thing to strive for. I love puzzle games, but I have come to realize there's a big difference in how different apps generate their games. For instance, Simon Tatham's Puzzle Collection is mostly very good. But for the "Loopy" game, I have liked the variants from the app "Slitherlink" more. Same game, but the challenges presented makes the difference.
Awesome. Have you consider  set (red car in 3 square, AAA) for primary row? The article state you only consider  set and I checked the database doesn't have element in  set. I ask it out of curiosity because some variant include 3-square red car.