That sounds odd. Which motherboard is this? Are you sure the slots have different PCIe versions?
Almost all GPUs require PCIe x16 slots to physically fit. Most motherboards only have PCIe x1 (very narrow) and PCIe x16 slots (very wide). It’s common for motherboards to have two x16 slots, but only one is wired to use 16 lanes and the other uses only 8 or 4 (x8 and x4). It is also possible to drive an x16 GPU on an x1 slot using an adapter.
If there really were two different versions on your board, the theoretical bandwidth per lane would be only 500 MB/s for PCIe 2 vs. 985 MB/s for PCIe 3. However, you’d still have to take into account the number of lanes provided (i.e. multiply by 16, 8 or 4).
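To put rough numbers on that, here is a small Python sketch that multiplies the per-lane figures quoted above by the lane count. The names `PER_LANE_MB_S` and `slot_bandwidth_mb_s` are made up for this illustration, and the results are theoretical peaks, not what you would measure in practice.

```python
# Theoretical per-lane bandwidth in MB/s, using the figures quoted above.
PER_LANE_MB_S = {2: 500, 3: 985}


def slot_bandwidth_mb_s(pcie_version: int, lanes: int) -> int:
    """Return the theoretical peak bandwidth of a slot in MB/s."""
    return PER_LANE_MB_S[pcie_version] * lanes


if __name__ == "__main__":
    # Compare every version/lane combination mentioned in this thread.
    for version in (2, 3):
        for lanes in (16, 8, 4):
            total = slot_bandwidth_mb_s(version, lanes)
            print(f"PCIe {version} x{lanes}: {total} MB/s")
```

By this arithmetic, a PCIe 2 slot wired for 16 lanes (8000 MB/s) lands very close to a PCIe 3 slot wired for 8 lanes (7880 MB/s), so the lane count can matter as much as the version.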
I’ve been searching through the Internet and a lot of people claim that the performance bottleneck is negligible, at least in games.
It usually is, because for best performance, you’d want to minimize PCIe transfers while the game is running. The difference is in the load times (usually negligible) and the per-frame transfers (different for every game, but usually a small factor).
I can’t find if the same is true for Cycles or other GPU render engines.
Right now, Cycles only does GPU transfers on load and during image updates, so PCIe bandwidth is a negligible factor in both cases.
However, Cycles also needs to fit the entire scene into GPU memory, otherwise rendering will fail. In the future, Cycles may implement streaming data in during rendering, easing the memory limit but increasing the stress on the PCIe bus. I’m only aware of one GPGPU renderer that does this, which is Redshift. You may want to ask that community how much impact PCIe bandwidth has in practice.
Is it a more valid approach to buy a more powerful GPU (e.g. 1070, 1080) and use one instead of two?
Looking just at PCIe bandwidth, I’d say no. Two 1070s will have better price/performance than one 1080, but they’ll also consume more power and produce more heat.