In general, how do ML researchers scale GPUs for different algorithms?

I saw a big data practice exam question similar to this:

You have a task running an LSTM (Long Short-Term Memory) RNN with MXNet on EC2. What is the best strategy to arrange GPU resources? (Select 2):

A. Data parallelism with multiple GPU cores, distributed

B. Model parallelism with multiple GPU cores, distributed

C. One compute EC2 instance with an elastic GPU

D. Process parallelism with multiple GPU cores, distributed

E. One general cluster with multiple GPU cores

Which two should I choose? Could an ML expert elaborate on this?

  • John M
    12-03-2018

    Any implementation of parallelism will give you more allocated GPU resources than a single compute node or a cluster sharing the same GPUs. Detailed answers are at this link:

    https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-deep-learning-training-using-gpus-in-the-aws-cloud/
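
    For a concrete picture of option A, here is a minimal sketch of data parallelism in MXNet Gluon, assuming two GPUs; the layer sizes, optimizer settings, and dummy data are placeholders I made up, not anything taken from the exam or the article:

        import mxnet as mx
        from mxnet import autograd, gluon, nd

        # Assumption: two GPUs are available; each one holds a full copy of the model.
        ctx_list = [mx.gpu(0), mx.gpu(1)]

        # A small LSTM classifier (hypothetical sizes, layout NTC = batch, time, channel).
        net = gluon.nn.Sequential()
        net.add(gluon.rnn.LSTM(128, num_layers=2, layout='NTC'),
                gluon.nn.Dense(10))
        net.initialize(mx.init.Xavier(), ctx=ctx_list)  # replicate parameters on every GPU

        trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})
        loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

        def train_step(batch_x, batch_y):
            # Data parallelism: split one mini-batch across the GPUs ...
            x_shards = gluon.utils.split_and_load(batch_x, ctx_list)
            y_shards = gluon.utils.split_and_load(batch_y, ctx_list)
            # ... run forward/backward independently on each shard ...
            with autograd.record():
                losses = [loss_fn(net(x), y) for x, y in zip(x_shards, y_shards)]
            for loss in losses:
                loss.backward()
            # ... then one optimizer step aggregates the gradients from all GPUs.
            trainer.step(batch_x.shape[0])

        # Dummy batch: 32 sequences, 20 time steps, 50 features each.
        train_step(nd.random.uniform(shape=(32, 20, 50)),
                   nd.random.randint(0, 10, shape=(32,)).astype('float32'))

    Option B (model parallelism) would instead place different layers of the network on different GPUs and pass activations between them, which is the usual route when the model itself is too large to fit in one GPU's memory.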

    Hope this helps - John

  • John M
    12-03-2018

    ... one last thought: the data transfer rate is not really a factor of the GPU, so it's more about the compute, but see what the article I linked above says.
