When you start increasing machine speed with AI cores, you very quickly run into throughput limits on data: even if you produce new data fast enough, you just can't consume it that fast. It is especially noticeable when crafting datacells with training data, but it shows up in other recipes as well. Perhaps scaling down the amount of data produced and required would help? For example, divide all the numbers by 10, or maybe even by 100. Since production and consumption would shrink by the same factor, it shouldn't change the balance, but there would be far less data to move per second, which might help with throughput.
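To illustrate the arithmetic behind the suggestion, here is a rough sketch in Python. All the numbers and names in it are made up for illustration (they are not the real recipe values), and it assumes the throughput limit is on the amount of data units moved per second.

```python
from fractions import Fraction

# All numbers below are made up for illustration; they are not real recipe values.
DATA_PRODUCED_PER_CRAFT = 1000   # hypothetical output of a data-producing recipe
DATA_REQUIRED_PER_CRAFT = 5000   # hypothetical input of a datacell recipe
CRAFTS_PER_SECOND = 4            # hypothetical speed of the boosted consumer


def scaled(amount, factor):
    """Divide a recipe amount by the scaling factor, kept exact."""
    return Fraction(amount, factor)


def producers_per_consumer(produced, required):
    """How many producer crafts feed one consumer craft (the balance ratio)."""
    return required / produced


def data_per_second(required, crafts_per_second):
    """How much data has to reach the consumer every second."""
    return required * crafts_per_second


for factor in (1, 10, 100):
    produced = scaled(DATA_PRODUCED_PER_CRAFT, factor)
    required = scaled(DATA_REQUIRED_PER_CRAFT, factor)
    ratio = producers_per_consumer(produced, required)
    flow = data_per_second(required, CRAFTS_PER_SECOND)
    print(f"divide by {factor}: ratio {ratio}, data moved per second {flow}")
```

With these example numbers the producer-to-consumer ratio stays at 5 for every factor, while the data that has to be moved each second drops from 20000 to 2000 to 200.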