“The fact they run benchmarking TWICE with wildly different results should make them stop and think.” In a postmortem published Friday, Sakana admitted that the system had found a way to ...