That's pretty much expected, I'm afraid. Not only does it have to generate twice as many chunks, plus whatever overhead comes from the copying itself, but I also suspect that the code that waits for the echo chunks to finish generating before copying them takes a significant performance hit when a lot of chunks are generated at once.
It miiiiiiight be possible to refactor things to get rid of that last bit of overhead, but the "twice as many chunks" part is always going to hurt compared to a proper fix on SE's end.