Streaming Ingest Pipeline Hands On

Length: 00:20:03

Lesson Summary:

We are going to demonstrate how to take our streaming ingest of traffic sensor data from the previous section, run it through a Dataflow pipeline to calculate average speeds, and output the results to BigQuery.

The command-line reference for what we are demonstrating is below.

  1. Create a BigQuery dataset for the pipeline output:

    • bq mk --dataset $DEVSHELL_PROJECT_ID:demos

  2. Create a Cloud Storage bucket for Dataflow staging:

    • gsutil mb gs://$DEVSHELL_PROJECT_ID

  3. Create a Pub/Sub topic and publish the sensor data (a publisher sketch follows this list):

    • cd ~/googledataengineer/courses/streaming/publish

    • gcloud pubsub topics create sandiego

    • ./download_data.sh

    • sudo pip install -U google-cloud-pubsub

    • ./send_sensor_data.py --speedFactor=60 --project=$DEVSHELL_PROJECT_ID

  4. Open a new Cloud Shell tab:

    • Browse to the Dataflow directory and run the launch script, passing in our project ID, our storage bucket (here named after the project ID), and the AverageSpeeds pipeline to run (a verification sketch follows this list).

    • cd ~/googledataengineer/courses/streaming/process/sandiego

    • ./run_oncloud.sh $DEVSHELL_PROJECT_ID $DEVSHELL_PROJECT_ID AverageSpeeds
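For orientation, send_sensor_data.py essentially replays historical sensor records into the sandiego topic, using --speedFactor to compress event time (60 means roughly one hour of data per real minute). Below is a minimal sketch of that publishing loop in Python, using the google-cloud-pubsub client installed above; the file name and the fixed sleep are illustrative assumptions, not the script's exact code:

    import gzip
    import os
    import time

    from google.cloud import pubsub_v1

    project_id = os.environ["DEVSHELL_PROJECT_ID"]
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, "sandiego")  # topic created in step 3

    SPEED_FACTOR = 60  # replay 60x faster than real time

    # Assumed file name from the download step; each line is one
    # timestamped traffic sensor observation in CSV form.
    with gzip.open("sensor_obs2008.csv.gz", "rt") as f:
        for line in f:
            # Pub/Sub payloads are bytes; publish the raw CSV record.
            future = publisher.publish(topic_path, line.rstrip("\n").encode("utf-8"))
            future.result()  # block until the service accepts the message
            # The real script paces messages by the gap between event
            # timestamps divided by the speed factor; a fixed pause
            # stands in for that here.
            time.sleep(1.0 / SPEED_FACTOR)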
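Once the Dataflow job is running, you can verify that aggregates are landing in BigQuery. Here is a minimal sketch using the google-cloud-bigquery Python client; the table name average_speeds is an assumption (check the demos dataset in the console for the table the pipeline actually creates):

    import os

    from google.cloud import bigquery

    project_id = os.environ["DEVSHELL_PROJECT_ID"]
    client = bigquery.Client(project=project_id)

    # Assumed table name; list the tables in the demos dataset
    # if the pipeline writes elsewhere.
    query = f"""
        SELECT *
        FROM `{project_id}.demos.average_speeds`
        LIMIT 10
    """

    for row in client.query(query).result():
        print(dict(row))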

