See my answers to your questions below:
1. If you plan to deploy a non-blocking cluster of up to 36 nodes, a single SX6036 should be fine. However, if you already know you will exceed this number, build your L2 switches from the beginning with the proper cabling. This saves downtime and re-cabling of the cluster once the number of nodes grows beyond 36.
2. Yes. This setup sounds reasonable.
3. There aren't any special tricks to the cabling. In a non-blocking fat-tree built from 36-port switches, each L1 switch uses 18 ports going toward aggregation (L2) and 18 ports facing the nodes. Make sure the L1-to-L2 links are spread as evenly as possible across the L2 switches.
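The sizing and even spreading described above can be sketched in a few lines of Python. This is a hypothetical planning helper (`plan_fat_tree` is not a Mellanox tool), assuming a two-level non-blocking fat-tree of 36-port switches with the 18 down / 18 up split. It assigns uplinks round-robin across the L2 switches, so the cable counts per L2 switch (and from any one L1 switch to each L2 switch) differ by at most one.

```python
import math

PORTS = 36          # ports per 1U switch (e.g. SX6036)
DOWN = PORTS // 2   # 18 node-facing ports per L1 switch
UP = PORTS - DOWN   # 18 uplinks per L1 switch toward L2


def plan_fat_tree(nodes):
    """Hypothetical sizing helper for a 2-level non-blocking fat-tree.

    Returns (l1_count, l2_count, links), where links[i][j] is the number
    of cables from L1 switch i to L2 switch j.
    """
    if nodes <= PORTS:
        return 1, 0, []  # one switch is enough, no aggregation layer
    l1 = math.ceil(nodes / DOWN)
    if l1 > PORTS:
        # Each L2 switch can reach at most 36 L1 switches: 36 * 18 = 648 nodes.
        raise ValueError("more than 648 nodes needs a third switch level")
    l2 = math.ceil(l1 * UP / PORTS)  # enough L2 ports for every uplink
    links = [[0] * l2 for _ in range(l1)]
    for i in range(l1):
        for k in range(UP):
            # Global round-robin: consecutive uplinks land on consecutive
            # L2 switches, so the load is spread as evenly as possible.
            links[i][(i * UP + k) % l2] += 1
    return l1, l2, links


if __name__ == "__main__":
    l1, l2, links = plan_fat_tree(72)
    print(l1, l2, links[0])  # 4 2 [9, 9]
```

For 72 nodes this gives 4 L1 and 2 L2 switches, with 9 cables from each L1 switch to each L2 switch. At 648 nodes (36 L1, 18 L2) every L1 switch has exactly one cable to every L2 switch, which is the two-level limit for 36-port switches.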
Another option to consider is a chassis switch (starting at 108 ports with the SX6506).
There are trade-offs between a design based on 1U switches and a chassis design: the balance usually favors the chassis in large-scale designs and the 36-port switches in smaller clusters.
With a chassis, you need to populate all spines and can populate leafs as needed. You would need fewer cables, but probably longer ones.
With 36-port switches, you would build your L2 aggregation layer at the beginning (to avoid cluster downtime and re-cabling) and add L1 switches as needed. You would need more cables, but shorter ones.
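To make the cable-count side of this trade-off concrete, here is a rough Python count. It is a sketch under stated assumptions: the chassis needs only node cables because its leaf-to-spine connections are internal, and the 1U design uses the non-blocking 18/18 split with one cable per L1 uplink. The function names are hypothetical, and the count says nothing about cable length or cost.

```python
import math

DOWN = 18  # node-facing ports per 36-port L1 switch (non-blocking split)
UP = 18    # uplinks per L1 switch toward the L2 layer


def cables_1u(nodes):
    """Node cables plus L1-to-L2 inter-switch cables (assumed 18/18 split)."""
    if nodes <= 36:
        return nodes  # a single switch, no inter-switch links
    l1 = math.ceil(nodes / DOWN)
    return nodes + l1 * UP


def cables_chassis(nodes):
    """Only node cables: leaf-to-spine paths are assumed internal to the chassis."""
    return nodes


if __name__ == "__main__":
    for n in (36, 72, 108):
        print(n, cables_1u(n), cables_chassis(n))
    # 36 36 36
    # 72 144 72
    # 108 216 108
```

Beyond 36 nodes, the 1U design needs roughly twice as many cables because every node-facing port on an L1 switch is matched by an uplink, while each of those cables can stay short within the rack row.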