What's most impressive is they've separated the rendering algorithm (Mitsuba 2) from the retargeting framework (Enoki).
Enoki looks amazing from their paper. It supports vectorized CPU code, JIT-compiled GPU kernels, forward/reverse-mode autodiff, and nested array types.
Mitsuba 2 then expands that range even further by templating on key types and operations. For example, a material's color property might be represented by an RGB tuple for basic rendering, or by an array that captures the full spectrum of light frequencies for a spectral renderer. They supply some example code, which is absurdly clean: it's devoid of any storage or calculation specifics and focuses just on the high-level algorithm.
They claim that the GPU implementation is superior to PyTorch / TensorFlow in some regards, as it can split the difference between eagerly sending every operation to the GPU and processing the entire graph at once.
The amount of work and understanding needed to produce something like this is insane - they just casually mention how they've implemented a novel light transport scheme, an "extensive mathematical support library", and sophisticated Python bindings.
The original Mitsuba, also by Wenzel Jakob, is an extremely fast, clean, and modular piece of software engineering, in a space with few excellent free implementations; this looks to be quite an extraordinary successor. With the ability to be compiled to run on GPUs, it could even conceivably compete with Cycles, if the community picked it up and ran with it.
The renderer is capable of making incredibly life-like images, but it still takes a _lot_ of effort to make a life-like scene.
The most striking issue to me is that it's not naturally lit. The light from the windows looks like someone put studio lights outside each one, rather than the sun and sky. Given that Mitsuba supports spectral rendering, an empirical spectral sky/sun model would dramatically improve the quality of the lighting, and, assuming such a model is present in Mitsuba, it would be an easy change.
Secondly, the glossy surfaces lack surface detail and natural variation; they are unnaturally even and free of imperfections.
There are other things as well, but those are the main ones to fix IMHO. Fixing the lighting should be simple as mentioned, but adding the surface detail can be quite time-consuming.
In this case the image is meant as an illustration, so the simplifications made are quite acceptable.
Bad as in "unrealistic", i.e. lacking details and textures? Probably because they're researchers, not 3D game designers. My guess is that it is just showing a standardized raytracing scene from a public research dataset used to compare raytracers.
The scene is demonstrating their "lightpath vectorization". If the claims work out, the real gain is better use of the full hardware capabilities by vectorizing multiple rays without GPU/SIMD branch divergence - the divergence happens when different rays intersect different objects and dramatically slows down parallel work. That should really speed up rendering, allowing more rays and producing less noise and more detail.
The funny thing is this looks to me like a realistic picture of the inside of a McMansion, i.e. a room whose contents are more or less "fake" things: plastic "wood grained" floor (Pergo or whatever), plastic "wood grained" table (fiberboard, etc.), brand-new polyester mid-tier couch (room never actually used), pre-made pseudo-fancy gallery/bay windows, higher-grade fake ferns...