Discover your Strava Data with Serverless & Advanced Analytics

Interceptor, the decent road bike that serves me well.

The Setup

From Strava to Minority Report, with the power of the cloud

Requirements

  • A Source of Data: for our use case, the Strava API, which gives programmatic access to the activities we record.
  • A Set of Tools: cloud providers (Microsoft Azure, Amazon Web Services, Google Cloud Platform) all provide the building blocks to carry this out, at low cost.
  • An Insight Goal: exploring data without a premise or a target won't get you far. Here, we will track progress, performance and improvements across our own activities, which can be done with Google's Data Studio, Amazon QuickSight or Microsoft Power BI.

Setting The Source

To call the Strava API, we need three values from our Strava API application:

  1. Client ID
  2. Client Secret
  3. Refresh Token
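The client ID and client secret are shown on the Strava API application settings page. The refresh token is different: to list activities, it needs an activity scope, which the token displayed on that page may not carry. One way to obtain a suitable token is Strava's OAuth authorization-code flow, done once by hand. Below is a minimal stdlib sketch of that flow. It assumes the `activity:read_all` scope and uses `http://localhost/exchange_token` as a placeholder redirect URI, which must match the callback domain configured for the app. The helper names are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request, urlopen

AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"
TOKEN_URL = "https://www.strava.com/oauth/token"


def build_authorize_url(client_id, redirect_uri="http://localhost/exchange_token",
                        scope="activity:read_all"):
    """URL to open once in a browser to grant our own app access to our activities."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # placeholder: must match the app's callback domain
        "response_type": "code",
        "approval_prompt": "force",
        "scope": scope,
    })
    return f"{AUTHORIZE_URL}?{query}"


def extract_code(redirected_url):
    """Strava redirects to redirect_uri?...&code=XXX; return that one-time code."""
    return parse_qs(urlparse(redirected_url).query)["code"][0]


def exchange_code(client_id, client_secret, code):
    """Trade the one-time code for tokens and return the refresh token to store."""
    data = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    }).encode()
    with urlopen(Request(TOKEN_URL, data=data, method="POST"), timeout=30) as resp:
        return json.load(resp)["refresh_token"]
```

To use it, open the URL from `build_authorize_url`, approve access, and copy the address Strava redirects to. Since nothing listens on localhost, that page will likely fail to load, but the code is in its URL. Pass that code to `exchange_code`; the returned refresh token is what goes into Secret Manager.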

Setting The Data Platform

The platform needs to:

  1. Retrieve our activities at regular intervals, without any human interaction.
  2. Store that data.
  3. Let us explore the data, design graphs and visualize insights and breakthroughs.

On Google Cloud Platform, this translates into the following building blocks:

  1. Cloud Scheduler: at regular intervals, Cloud Scheduler (essentially a managed cron job) triggers our serverless function for us. It calls the Cloud Function's HTTP endpoint (via an HTTP POST) every 6 hours (note: Strava allows 100 calls every 15 minutes, up to 1,000 calls a day). Since our function doesn't require any parameter, the request body can be an empty JSON object, {}, but it still has to be valid JSON.
  2. Cloud IAM: the function runs as a service account, which needs the following roles:
  • roles/bigquery.jobUser, to run BigQuery (load) jobs;
  • roles/bigquery.dataEditor, to write data to BigQuery tables;
  • roles/secretmanager.secretAccessor, to read our Strava secrets.
Strava API Secrets in Secret Manager
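To check that a 6-hour schedule stays well within those limits, here is a rough sketch of the arithmetic. It assumes each run makes one token-refresh request plus one request per page of 200 activities. The pagination loop stops on the first short page, so an exact multiple of 200 costs one extra, empty page. Counting the token request against the limit is a conservative assumption. The function names are illustrative.

```python
RATE_LIMIT_15_MIN = 100  # Strava limits quoted above
RATE_LIMIT_DAILY = 1000
PER_PAGE = 200           # page size requested by the function


def requests_per_run(activity_count, per_page=PER_PAGE):
    """One token refresh plus one GET per page, including the last short page."""
    pages = activity_count // per_page + 1
    return 1 + pages


def requests_per_day(activity_count, interval_hours=6):
    """Requests for a full day when the scheduler fires every `interval_hours`."""
    runs_per_day = 24 // interval_hours
    return runs_per_day * requests_per_run(activity_count)


def within_limits(activity_count, interval_hours=6):
    """Assumes a single run finishes inside one 15-minute window."""
    return (requests_per_run(activity_count) <= RATE_LIMIT_15_MIN
            and requests_per_day(activity_count, interval_hours) <= RATE_LIMIT_DAILY)
```

For 450 activities, a run costs 4 requests and a full day 16, far below both limits. In Cloud Scheduler, an every-6-hours schedule can be written in unix-cron format as 0 */6 * * *.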

The Serverless Function

  1. Retrieve the Strava credentials: we store our Strava OAuth2 credentials in Secret Manager, and the function reads them at runtime:
import logging

from google.cloud import secretmanager


def fetch_from_secretmanager(project_id, secret_id):
    """Return the latest version of a Secret Manager secret as a string."""
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
    response = client.access_secret_version(request={"name": name})
    payload = response.payload.data.decode("UTF-8")
    logging.info(f"Retrieved {secret_id}")
    return payload
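
  2. Exchange the refresh token for a short-lived access token:

import logging

import requests


def fetch_strava_accesstoken(clientid, secret, refreshtoken):
    """Exchange the long-lived refresh token for a short-lived access token."""
    resp = requests.post(
        "https://www.strava.com/api/v3/oauth/token",
        # Plain values: the original wrapped each one in braces, i.e. a set literal
        params={
            "client_id": clientid,
            "client_secret": secret,
            "grant_type": "refresh_token",
            "refresh_token": refreshtoken,
        },
        timeout=30,
    )
    resp.raise_for_status()  # fail loudly on bad credentials instead of a KeyError below
    response = resp.json()
    logging.info("Retrieved refresh_token & access_token")
    return response["access_token"]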
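The function above keeps only the access token. The same JSON answer also holds `refresh_token` and `expires_at` (a Unix timestamp). Strava may return a refresh token that differs from the one we sent; in that case, the stored secret should be updated, for instance by adding a new Secret Manager version. Here is a small sketch that keeps those fields. `StravaTokens` and `parse_token_response` are hypothetical helpers, not part of the original function.

```python
import time
from dataclasses import dataclass


@dataclass
class StravaTokens:
    access_token: str
    refresh_token: str
    expires_at: int  # Unix timestamp, in seconds

    def expires_soon(self, margin_s=300, now=None):
        """True if the access token expires within `margin_s` seconds."""
        now = time.time() if now is None else now
        return self.expires_at - now <= margin_s


def parse_token_response(payload):
    """Keep the fields of Strava's token response that matter here."""
    return StravaTokens(
        access_token=payload["access_token"],
        refresh_token=payload["refresh_token"],
        expires_at=int(payload["expires_at"]),
    )
```

Because the scheduled function refreshes its token on every run, `expires_soon` mostly matters if the access token is reused across calls.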
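
  3. Fetch all our activities, page by page:

import logging

import requests


def fetch_strava_activities(token):
    """Page through /athlete/activities until a short page signals the end."""
    page, activities = 1, []
    while True:
        resp = requests.get(
            "https://www.strava.com/api/v3/athlete/activities",
            headers={"Authorization": f"Bearer {token}"},
            params={"page": page, "per_page": 200},
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        activities += data
        if len(data) < 200:  # fewer than a full page: this was the last one
            break
        page += 1

    logging.info(f"Fetched {len(activities)} activities")
    return activities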
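The loop relies on a simple termination rule: keep requesting the next page until fewer than `per_page` items come back. With the HTTP call injected as a parameter, that rule can be tested offline. `fetch_all_pages` is an illustrative refactoring, not part of the deployed function. In the real function, `fetch_page` would wrap the `requests.get` call.

```python
from typing import Callable, List


def fetch_all_pages(fetch_page: Callable[[int, int], List[dict]],
                    per_page: int = 200) -> List[dict]:
    """Request pages 1, 2, ... until one comes back shorter than per_page."""
    page, items = 1, []
    while True:
        data = fetch_page(page, per_page)
        items += data
        if len(data) < per_page:  # short (or empty) page: nothing left to fetch
            return items
        page += 1
```

When the total is an exact multiple of `per_page`, the loop makes one extra request that returns an empty page. That costs one call, which is harmless at these volumes.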

  4. Load the activities into BigQuery, replacing the previous content of the table:

import logging

from google.cloud import bigquery


def activites_to_bq(jsonl_lines, project, dataset, table):
    """Load a list of activity dicts into BigQuery, overwriting the table."""
    bq_client = bigquery.Client()
    job_config = bigquery.job.LoadJobConfig()
    logging.info(f"Loading in {project} / {dataset} / {table}")
    job_config.source_format = bigquery.job.SourceFormat.NEWLINE_DELIMITED_JSON
    job_config.write_disposition = bigquery.job.WriteDisposition.WRITE_TRUNCATE  # overwrite
    job_config.create_disposition = bigquery.job.CreateDisposition.CREATE_IF_NEEDED
    job_config.autodetect = True  # infer the schema from the JSON fields
    job = bq_client.load_table_from_json(
        json_rows=jsonl_lines,
        destination=f"{project}.{dataset}.{table}",
        job_config=job_config,
    )
    job.result()  # wait for the load job so that failures raise here

    logging.info(f"Completed job id: {job.job_id}")
    return job.job_id
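`load_table_from_json` accepts the activities as a plain list of dicts. Before uploading, the client library serializes them to newline-delimited JSON, one JSON object per line; that is the format `NEWLINE_DELIMITED_JSON` refers to. Here is a rough stdlib sketch of that serialization. `to_ndjson` is a hypothetical helper, useful for instance to write the same rows to a file and load them with `load_table_from_file` instead.

```python
import json


def to_ndjson(rows):
    """Serialize dicts as newline-delimited JSON, one object per line."""
    # ensure_ascii=False keeps non-ASCII activity names (e.g. "Café ride") readable
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)
```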
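
  5. Chain these steps in the function's entry point, run, which Cloud Scheduler calls:

import os

# Hypothetical configuration: the original snippet does not show where these
# constants are defined; here they are read from environment variables set on
# the Cloud Function, with placeholder defaults for the secret and table names.
GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID")
STORED_CLIENT = os.environ.get("STORED_CLIENT", "strava-client-id")
STORED_SECRET = os.environ.get("STORED_SECRET", "strava-client-secret")
STORED_REFRESHTOKEN = os.environ.get("STORED_REFRESHTOKEN", "strava-refresh-token")
BQ_DATASET = os.environ.get("BQ_DATASET", "strava")
BQ_TABLE = os.environ.get("BQ_TABLE", "activities")


def run(request):
    """HTTP entry point triggered by Cloud Scheduler; the request body is ignored."""
    strava_clientid = fetch_from_secretmanager(GCP_PROJECT_ID, STORED_CLIENT)
    strava_clientsecret = fetch_from_secretmanager(GCP_PROJECT_ID, STORED_SECRET)
    strava_refreshtoken = fetch_from_secretmanager(GCP_PROJECT_ID, STORED_REFRESHTOKEN)

    strava_accesstoken = fetch_strava_accesstoken(
        strava_clientid, strava_clientsecret, strava_refreshtoken
    )
    strava_activities = fetch_strava_activities(strava_accesstoken)

    activites_to_bq(strava_activities, GCP_PROJECT_ID, BQ_DATASET, BQ_TABLE)
    return "Strava API Job completed."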

Exploring The Data

Adding a BigQuery source to Data Studio.

Discovering Insights

left: Weekly averages with watts progression; right: quarterly overview.
metrics used
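The weekly view is built in Data Studio, but the same aggregation can be sketched in Python against the activities loaded above, for instance to sanity-check a chart. This sketch assumes each activity carries `start_date_local` (an ISO 8601 string) and, for activities recorded with power data, `average_watts`. Activities without that field are skipped. `weekly_average_watts` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekly_average_watts(activities):
    """Average `average_watts` per ISO (year, week), skipping activities without power."""
    by_week = defaultdict(list)
    for act in activities:
        watts = act.get("average_watts")
        if watts is None:
            continue
        # start_date_local looks like "2021-03-14T09:12:33Z"; keep only the date part
        day = datetime.strptime(act["start_date_local"][:10], "%Y-%m-%d").date()
        year, week, _ = day.isocalendar()
        by_week[(year, week)].append(watts)
    return {wk: round(mean(vals), 1) for wk, vals in sorted(by_week.items())}
```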

Building on AWS

AWS Lake Formation design & features
