I bet we are considering the same thing:
building a "no-hop" grid, but out of 100GbE links.
From posts I found by googling, one way to test the throughput and cabling is to connect the CU QSFP28 port on one PC directly to one on another PC and do a point-to-point transfer with dd or a similar tool.
This validates the cable and ports without any switch in the way. Why wouldn't that also work natively, as you suggest? There would be no switch latency in the path.
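For what it's worth, here is a rough sketch of that kind of point-to-point check in Python, using only the standard library. It is not what any of those posts used. The function names (`receive_all`, `send_bytes`, `loopback_demo`) are made up for illustration, and the demo runs both ends over 127.0.0.1 so it works on one machine. On a real pair of boxes you would run the receiver on one PC and point the sender at the address assigned to the directly cabled port. A single Python TCP stream is unlikely to get anywhere near 100 Gbit/s, so treat this as a "does the link pass traffic" check. A purpose-built tool such as iperf3 is the better choice for actual rate numbers.

```python
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB per send/recv call (arbitrary choice)


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def send_bytes(host, port, total_bytes):
    """Send total_bytes of zeros and return the elapsed time in seconds."""
    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n
    # Sender-side timing only: data may still be in flight when this returns.
    return time.perf_counter() - start


def loopback_demo(total_bytes=64 * CHUNK):
    """Run both ends on 127.0.0.1; returns (bytes_received, gbit_per_s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    t = threading.Thread(target=receive_all, args=(server, result))
    t.start()
    elapsed = send_bytes("127.0.0.1", port, total_bytes)
    t.join()
    server.close()
    return result["bytes"], total_bytes * 8 / elapsed / 1e9


if __name__ == "__main__":
    received, rate = loopback_demo()
    print(f"received {received} bytes at roughly {rate:.2f} Gbit/s")
```

The byte count confirms nothing was dropped on the way. The rate is only a rough sender-side figure.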
The rates we need limit the target to 10-12 links, which is doable with 5-6 ConnectX-4 cards in the right box.
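The card count works out as below. This is a back-of-the-envelope sketch that assumes dual-port cards (which is what 10-12 links on 5-6 cards implies) and one x16 slot per card. The x16 figure is my assumption about the cards, so check it against the actual part. The lane total is what "the right box" has to supply.

```python
import math


def cards_needed(links, ports_per_card=2):
    """Cards required for a given number of links (assumes dual-port cards)."""
    return math.ceil(links / ports_per_card)


def pcie_lanes_needed(cards, lanes_per_card=16):
    """Total PCIe lanes if every card sits in an x16 slot (assumption)."""
    return cards * lanes_per_card


for links in (10, 12):
    cards = cards_needed(links)
    print(f"{links} links -> {cards} cards, {pcie_lanes_needed(cards)} PCIe lanes")
# 10 links -> 5 cards, 80 PCIe lanes
# 12 links -> 6 cards, 96 PCIe lanes
```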