Auditing GA4 Traffic Source UTMs via BigQuery

Auditing GA4 Traffic Source UTMs via BigQuery

One of the new fields in GA4 is ‘Landing Page + query string‘, which displays all query parameters, whether you exclude them or not. Digital marketers can leverage this field to solve one of their biggest pain points, which is missing or inconsistent UTMs.

Today, I will show you how to run an audit on all of your landing pages using BigQuery. You can still use the Explore feature in GA4 to get an idea of some specific pages. However, aggregating the results won’t be possible. That’s why we will be using the Google Analytics 4 BigQuery dataset to achieve this at scale.

If you’re searching for a better and more automated method to audit your Google Analytics accounts, I highly recommend the GA4 Audit tool that I recently developed.

Before we jump to the BigQuery part, I want to highlight the setting for the Explore report if you decide to use it in your GA4:

Dimensions:

– Session source / medium

– Landing page + query string

Metrics

– Sessions

Filters

Landing page + query string that contains any of the below strings

fbclid: Facebook click id

ttclid: TikTok click id

sccid: Snapchat click id

twclid: Twitter click id

msclkid: Microsoft Ads click id

To audit additional parameters, you can simply add them as desired. After examining the report, you will be able to view the values for each specific landing page and source/medium. The challenge lies in aggregating the data and obtaining an overview of each traffic source. While you could still export the data to Excel and perform aggregation there, we will be utilizing BigQuery to automate this. The final Explore report should resemble the following:



GA4 BigQuery Dataset

By default, the GA4 BigQuery dataset includes the full page location URLs, including query parameters, unless the URL parameters are not passed to GA4. To conduct an analysis on the GA4 BigQuery export, you can utilize the following query. It will aggregate the results by source / medium. Additionally, you can customize the query to capture any additional click ID fields you wish to include. Don’t forget to change the table suffix per your desired date range.

SELECT
  CASE
    WHEN landing_page LIKE "%fbclid%" THEN 'Facebook'
    WHEN landing_page LIKE '%ttclid%' THEN 'TikTok'
    WHEN landing_page LIKE '%sccid%' THEN 'Snapchat'
    WHEN landing_page LIKE '%twclid%' THEN 'Twitter'
    when landing_page LIKe '%msclkid%' THEN 'Microsoft'
  ELSE
  'other'
END
  AS click_id_channel,
  session_source_medium,
  SUM(sessions) AS sessions
FROM (
  SELECT
    CASE
      WHEN ( SELECT value.int_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND KEY = 'entrances') = 1 
      THEN ( SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND KEY = 'page_location')
  END
    AS landing_page,
  IFNULL(
  IF
    ((
      SELECT
        value.string_value
      FROM
        UNNEST(event_params)
      WHERE
        KEY = 'source') IS NULL, CONCAT(traffic_source.source, ' / ', traffic_source.medium), CONCAT( (
        SELECT
          value.string_value
        FROM
          UNNEST(event_params)
        WHERE
          KEY = 'source')," / ", (
        SELECT
          value.string_value
        FROM
          UNNEST(event_params)
        WHERE
          KEY = 'medium') ) ), '(direct) / (none)') AS session_source_medium,
    COUNTIF(event_name = 'page_view') AS sessions
  FROM
    `analytics_258822732.events_*`
  WHERE
    
    _table_suffix BETWEEN '20230101'  # change date range
    AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 day))
  GROUP BY
    1,
    2
  HAVING
    landing_page IS NOT NULL
  ORDER BY
    sessions DESC )
GROUP BY
  1,
  2
HAVING
  click_id_channel != 'other'
ORDER BY
  sessions desc


The query output should look similar to the example below, helping you to identify which click_id_channel does not match the corresponding traffic source.


To identify the exact landing pages missing UTM parameters, you can add “landing_page” to the select clause of the query. This will enable you to view the landing pages associated with each click ID and determine which ones are lacking UTMs parametrs. By including this information in your analysis, you can gain a more comprehensive understanding of the extent to which GA4 BigQuery UTMs are missing or inconsistent across landing pages.