So you have 360GB of raw data. If your IOPS can handle 20GB shards, then you have 18 primary shards, and with a replication factor of 2, 36 replica shards. That gives you just over 1TB of data (1080GB) spread over 54 shards.
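The shard math above can be sketched as follows (the numbers are the example's; substitute your own):

```python
raw_gb = 360          # raw primary data
shard_size_gb = 20    # target shard size
replicas = 2          # replication factor

primaries = raw_gb // shard_size_gb        # 18 primary shards
replica_shards = primaries * replicas      # 36 replica shards
total_shards = primaries + replica_shards  # 54 shards in total
total_gb = raw_gb * (1 + replicas)         # 1080 GB on disk

print(primaries, replica_shards, total_shards, total_gb)  # 18 36 54 1080
```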
So with a replication factor of 2 you need, at the very least, 3 data nodes in order to allocate each primary and its two replicas on separate nodes. That gives you 18 shards and 1080/3=360GB of storage per node. Assuming you are aiming for a 1:32 heap-to-storage ratio, you will want 360/32=~12GB of heap per node, which means 24GB of memory per machine in order to leave enough memory over for the Lucene filesystem cache. Alternatively, you can use a 1:96 ratio, which would require 360/96=~4GB of heap per node and 8GB of memory per machine. Use a smaller ratio for faster search and a larger ratio for slower search but a larger total storage volume per node.
In this case, 3 nodes were enough to logically allocate 2 replicas. Once your desired ratio suggests a heap size greater than 30GB, you need to reduce the volume per node by adding more nodes: above roughly 32GB of heap the JVM can no longer use compressed ordinary object pointers (compressed oops), so heaps are best kept at or below about 30GB. For a target ratio of 1:32, that means you can have up to 30*32=960GB of storage per node and stay within the compressed-oops limit.
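The per-node storage ceiling follows directly from the heap cap and the chosen ratio:

```python
max_heap_gb = 30  # keep heap at or below ~30 GB for compressed oops

# Maximum storage per node for each heap:storage ratio
limits = {ratio: max_heap_gb * ratio for ratio in (32, 96)}
print(limits)  # {32: 960, 96: 2880}
```

So at 1:96 a single node could in principle hold close to 3TB before the heap becomes the constraint, at the cost of slower search.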
Also, be sure to account for failover when sizing. If all your nodes are running at maximum volume, then not only can you not add more data, but if a node fails and its shards need to re-allocate to the surviving nodes, there must be free space for them in the first place.
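A quick way to sanity-check failover headroom is to ask whether the surviving nodes can absorb one failed node's data. A minimal sketch, with hypothetical utilization numbers:

```python
capacity_gb = 960     # per-node capacity (1:32 ratio, 30 GB heap)
nodes = 3
used_per_node = 700   # hypothetical current usage per node

freed = used_per_node                              # data that must re-allocate
spare = (nodes - 1) * (capacity_gb - used_per_node)  # free space on survivors
print(spare >= freed)  # False: 520 GB spare cannot absorb 700 GB
```

Here a single node failure would leave shards unassigned, so you would either add a node or keep per-node usage lower.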