GCP pipeline for minimum service cost with variable input data?

Which pipeline should be used to process JSON messages from Pub/Sub to BigQuery with minimum service cost, a variable input data volume, and minimal manual intervention? I believe Dataflow is the service to go with for minimum cost, and its default throughput-based autoscaling handles a variable input data volume, based on the online documentation. Please help. I also saw an option for Dataproc with the diagnose command, but I don't think diagnose is used for this purpose.
  •
    Matthew U
    01-14-2019

    Dataproc is only an option if you are already using Hadoop/Spark; if you are not, it is not even a consideration. Google's official recommendation for new big data processing pipelines is Dataflow, with Dataproc recommended only if you are already using Hadoop/Spark and want to keep doing so.


    Dataflow only charges for resources while a processing pipeline is running and shuts them down once processing is complete, saving costs compared to a cluster that is always up regardless of whether it is being used. For that reason, Dataflow would be the better choice.
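
    For reference, a minimal Apache Beam (Python) sketch of such a streaming pipeline might look like the following. The project, region, subscription, bucket, table, and schema names are placeholders for illustration, not values from this thread; the autoscaling and worker-cap options shown are one way to keep costs bounded while handling variable volume.

    ```python
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical project/resource names; replace with your own.
    options = PipelineOptions(
        streaming=True,                            # Pub/Sub sources require streaming mode
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
        autoscaling_algorithm="THROUGHPUT_BASED",  # scale workers with incoming volume
        max_num_workers=10,                        # cap workers to cap cost
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            # Read raw message bytes from a Pub/Sub subscription.
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/json-events")
            # Decode and parse each message as JSON.
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # Append parsed rows to a BigQuery table, creating it if needed.
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-project:analytics.events",
                schema="user_id:STRING,event_type:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )
    ```

    With these options the pipeline runs continuously and Dataflow adds or removes workers based on throughput, so quiet periods use fewer resources and bursts are absorbed without manual intervention.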

  •
    Roshan R
    01-15-2019

    Thanks
