It is not clear what your test scenario is or how far the results differ.
If your vdbench-based benchmark is running over Mellanox adapter(s), I suggest you write to email@example.com and present the test in more detail to get assistance with the optimal and properly expected figures.
Test scenario: running a vdbench-based benchmark with randomly placed writes of varying sizes, using ConnectX-4 with RoCE.
Result: iostat reports throughput roughly two orders of magnitude lower than what dstat, vdbench, and the device counters report (e.g. 100 KB/s instead of 50 MB/s).
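One way to narrow down which tool is off is to compute the write rate directly from the kernel counters in /proc/diskstats, which is the source iostat reads on Linux. The snippet below is only a minimal sketch: the helper names (sectors_written, write_rate, measure) are made up for illustration, and it assumes the standard diskstats layout in which the tenth column is the cumulative count of sectors written, in 512-byte units.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_written(diskstats_text, device):
    """Return the cumulative 'sectors written' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, name, then the I/O counters;
        # fields[9] is the number of sectors written.
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(device)


def write_rate(before_text, after_text, device, interval_s):
    """Write throughput in bytes/s between two diskstats snapshots."""
    delta = sectors_written(after_text, device) - sectors_written(before_text, device)
    return delta * SECTOR_BYTES / interval_s


def measure(device, interval_s=5.0):
    """Take two snapshots of /proc/diskstats and return bytes/s written."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return write_rate(before, after, device, interval_s)
```

Calling measure("sdX") while the benchmark is running (with "sdX" replaced by the block device under test, a placeholder here) gives a figure to compare against iostat on one side and dstat/vdbench on the other. If this number agrees with dstat and vdbench, the discrepancy is more likely in how iostat is being invoked or read than in the adapter.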