The place where random ideas get written down and lost in time.

2025-04-12 - GA4 Stats from an ESP32?

Category DEV

At Randall, I’ll soon have the Distant Signal panel installed. This connects to the local wifi and gets the turnout state from the local MQTT broker. I want to track the “health” of that wifi connection, as that has been an issue in the past. The simplest way is to reuse my existing Google Analytics dashboards, thus I want the CircuitPython script on the ESP32 to send pings to GA4.

The goal is to measure the “uptime” of the display. The display is working when it is able to receive messages from the MQTT server. It can receive them when it can connect to the wifi. Thus we want to track the wifi state, or some kind of proxy for it.

Since this is CircuitPython, there are Adafruit libraries that already deal with JSON and networking.

So really “all” I need is something similar to the Analytics class in Conductor 2:

  • Queue stats to be sent.
  • Stats are sent as JSON payloads.
  • POSTs can fail, in which case they need to be retried with a backoff delay.
    • ⇒ Obviously a “wifi lost” event would have no wifi to send the stat.
  • Do I need to put a real date/timestamp in the events?
    • ⇒ These ESP32 don’t have a “date” clock, so it’s nice if I don’t have to.
    • There’s an NTP library which I’ve used on the LitterTimer experiment but it makes the logic more complicated since we need to expect that the device may not have wifi.
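
A minimal sketch of such a queue, assuming the actual POST is done by a function passed in by the caller that returns True on success. The class name, the delays, and the queue size are all hypothetical choices for illustration; this is not the Conductor 2 code.

```python
import time


class StatsQueue:
    """Queues stats and retries failed sends with an exponential back-off."""

    def __init__(self, send_fn, base_delay_s=5, max_delay_s=600, max_len=20):
        self._send_fn = send_fn        # Callable(payload) -> bool, True on success.
        self._base_delay_s = base_delay_s
        self._max_delay_s = max_delay_s
        self._max_len = max_len        # Caps memory use on a small device.
        self._pending = []
        self._delay_s = base_delay_s
        self._next_try = 0             # time.monotonic() value of the next attempt.

    def enqueue(self, payload):
        if len(self._pending) >= self._max_len:
            self._pending.pop(0)       # Queue is full: drop the oldest stat.
        self._pending.append(payload)

    def pending_count(self):
        return len(self._pending)

    def process(self, now=None):
        """Call from the main loop. Attempts at most one send per call."""
        if now is None:
            now = time.monotonic()
        if not self._pending or now < self._next_try:
            return
        try:
            ok = self._send_fn(self._pending[0])
        except Exception:
            ok = False                 # No wifi, DNS failure, etc.
        if ok:
            self._pending.pop(0)
            self._delay_s = self._base_delay_s
            self._next_try = now
        else:
            # Wait before retrying, then double the delay up to the maximum.
            self._next_try = now + self._delay_s
            self._delay_s = min(self._delay_s * 2, self._max_delay_s)
```

Since it relies on time.monotonic() only, this needs no real date or NTP: the back-off is relative to the device's own uptime.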

So what do we need to do custom pings to Google Analytics 4?

  • A “GA4_CLIENT_ID”, typically configured in settings.toml
  • A “GA4_MEASUREMENT_ID”, also configured in settings.toml
  • A “GA4_API_SECRET”, also configured in settings.toml
    • The feature is disabled if any of these settings is missing.
    • Placing them in settings.toml avoids leaking them into any git repository, and it makes it easier to configure a new panel.
    • See below on where to find these.
  • A base URL:
    • https://www.google-analytics.com/debug/mp/collect    -- debugging version
    • https://www.google-analytics.com/mp/collect          -- real pings version
  • The POST URL is:
    • %(base)s?api_secret=%(GA4_API_SECRET)s&measurement_id=%(GA4_MEASUREMENT_ID)s
  • POST mime type: text/plain
  • Payload is JSON, and is used as such in Conductor:

{
        'client_id': GA4_CLIENT_ID,
        'events': [ {
                'name': event_action,
                'params': {
                        'items': [ ],
                        'value': value_int,
                        'currency': 'USD'
                }
        } ]
}

  • The Adafruit “requests” library is the simplest way to send a POST request. It accepts either a JSON payload (passed as a Python dictionary) or a string payload.
  • When POSTing to the debug URL, print the response.status_code and the response.content.decode() to get valuable information on why requests fail.
  • When POSTing to the real ping version, the GA server replies with 204 (Success w/ No Content) and a complete lack of response body.
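
Put together, the settings, URL and POST steps above could look roughly like this. It is a sketch under assumptions: the function names are made up for illustration, and “session” is assumed to be an adafruit_requests Session (or anything with a compatible post() method returning an object with status_code, content and close()).

```python
import json
import os

GA4_BASE_URL = "https://www.google-analytics.com/mp/collect"
GA4_DEBUG_URL = "https://www.google-analytics.com/debug/mp/collect"


def load_ga4_settings(getenv=os.getenv):
    """Returns a dict with the 3 settings, or None if any of them is missing."""
    settings = {}
    for key in ("GA4_CLIENT_ID", "GA4_MEASUREMENT_ID", "GA4_API_SECRET"):
        value = getenv(key)            # On CircuitPython this reads settings.toml.
        if not value:
            return None                # Feature disabled.
        settings[key] = str(value)
    return settings


def build_post_url(settings, debug=False):
    base = GA4_DEBUG_URL if debug else GA4_BASE_URL
    return "%s?api_secret=%s&measurement_id=%s" % (
        base, settings["GA4_API_SECRET"], settings["GA4_MEASUREMENT_ID"])


def post_event(session, settings, payload, debug=False):
    """POSTs one payload. Returns True if the server accepted it."""
    response = session.post(
        build_post_url(settings, debug),
        data=json.dumps(payload),
        headers={"Content-Type": "text/plain"})
    try:
        if debug:
            # The debug endpoint explains why a request is rejected.
            print(response.status_code, response.content.decode())
        return response.status_code in (200, 204)
    finally:
        response.close()
```

The str() conversion is there because a purely numeric value in settings.toml may come back as an int rather than a string.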

Some details on the JSON payload:

  • The only required field is really the event’s name.
  • The “currency” is only needed if there’s a value, which is an optional field.
  • It’s possible to “back timestamp” an event by providing its time in the payload (the Measurement Protocol takes a “timestamp_micros” field, a Unix time in microseconds, rather than a formatted date). This is useful when doing some kind of retries with an exponential back-off. However the ESP32 does not have the current date and time -- that would require adding an NTP library and corresponding calls to initialize it. Possible, yet extra work for little benefit.
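
To illustrate the optional fields, here is a small hypothetical payload builder that only adds “value” and “currency” when a value is given. The function name is an assumption, and the empty “items” list from the Conductor payload is left out here on the assumption that it is optional too.

```python
def build_payload(client_id, event_name, value=None):
    """Builds the GA4 payload as a dict, ready to be serialized to JSON."""
    params = {}
    if value is not None:
        params["value"] = int(value)
        params["currency"] = "USD"     # Only needed when there is a value.
    return {
        "client_id": str(client_id),
        "events": [{"name": event_name, "params": params}],
    }
```

For example, build_payload("123", "panel__wifi_hb") yields an event with an empty “params” dictionary, and no timestamp at all.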

Example of usage for Distant Signal

GA3 used to have an event with fields “category”, “action”, and “label” that formed a nice hierarchy. GA4 simplified all that and removed these fields. The only event field is “name”. Custom fields are possible yet they add setup complexity, processing delay, and other complications.

We need to pass in some kind of “panel id”, to differentiate multiple Distant Signal panels, and some events need to have a parameter that is a string. However the “value” field in the event is really a “currency value” and as such is only a number.

The GA4 event name can only be alphanumeric and underscore.

A simple solution is to adopt a pattern for the name:

“<turnout>__<category>_<action>___<extra>”

The schema would become:

Event Name                     Value      Description
<ID>__wifi_connected           <rssi>     WiFi connected
<ID>__wifi_hb                  <rssi>     WiFi heart-beat
<ID>__mqtt_connected           <rssi>     MQTT connected callback
<ID>__mqtt_disconnected        <rssi>     MQTT disconnected callback
<ID>__msg_script               n/a        MQTT “script” message received
<ID>__msg_turnout___<state>    n/a        MQTT “turnout/state” message received
<ID>__msg_block___<block>      1 or 0     MQTT “block/state” message received
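
A sketch of how these names could be built. The helper names and the “T330” panel ID are hypothetical. Each part is sanitized so that it only contains letters and digits separated by single underscores, which keeps the “__” and “___” separators unambiguous.

```python
_ALLOWED = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def sanitize(text):
    """Replaces each run of other characters by a single underscore."""
    out = ""
    for ch in str(text):
        if ch in _ALLOWED:
            out += ch
        elif not out.endswith("_"):
            out += "_"
    return out.strip("_")


def make_event_name(panel_id, category, action, extra=None):
    """Builds "<ID>__<category>_<action>" plus an optional "___<extra>"."""
    name = "%s__%s_%s" % (
        sanitize(panel_id), sanitize(category), sanitize(action))
    if extra is not None:
        name += "___" + sanitize(extra)
    return name


def block_event(panel_id, block_name, active):
    """Returns (event name, value) for a "block/state" message."""
    name = make_event_name(panel_id, "msg", "block", block_name)
    return name, (1 if active else 0)
```

With these, make_event_name("T330", "msg", "turnout", "normal") gives “T330__msg_turnout___normal”.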

Explanation:

  • The point of the wifi / MQTT messages is to also convey the state of the wifi.
  • We do that by passing the wifi RSSI (signal strength) as a value to graph it over time.
  • We don’t have a callback when the wifi is lost, nor could we send a ping to GA4 (since there would be no network, duh). Instead we’ll use a heart-beat, say every 15 minutes or every 30 minutes. I don’t need much more granularity than that.
  • The MQTT library has its own heart-beat and thus can call a “disconnected” callback. That can happen due to lack of wifi (in which case we won’t receive the event), or due to the MQTT broker server being offline. Consequently, we shall treat that event as unreliable. A good proxy for it is the MQTT “connected” event -- by definition it happens when the network is up, and it happens once at startup and then once after each disconnection.
  • My main goal is currently to track the wifi health, but let’s future proof it by also sending pings for the obvious core task of Distant Signal, which is to process the MQTT messages.
    • The “turnout” message selects a state, which is a string. Thus we cannot place it in the “value” field of the GA4 payload (it’s a currency, really). Instead we’ll sanitize it and append it to the event name, making it easier to split/extract later if needed.
    • The “block” message has 2 values: the block name, and its state. The state can easily be converted to a value (1 for ‘active’, 0 for ‘inactive’), and the block name can be sanitized and appended to the event name.
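
The heart-beat itself can be a simple timer checked from the main loop. A minimal sketch, with a hypothetical class name and the 15 minutes period mentioned above as default:

```python
import time


class HeartBeat:
    """Tells the main loop when it is time to send the next heart-beat."""

    def __init__(self, period_s=15 * 60):
        self._period_s = period_s
        self._next = None              # None until the first call to due().

    def due(self, now=None):
        """Returns True once per period. The first call is always due."""
        if now is None:
            now = time.monotonic()
        if self._next is None or now >= self._next:
            self._next = now + self._period_s
            return True
        return False
```

In the main loop, each time due() returns True the script would read the RSSI (on CircuitPython, presumably from wifi.radio.ap_info.rssi) and queue a “wifi_hb” event with it. A gap in the heart-beats on the GA4 side then means the panel was offline.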

Using this pattern, we can get these in the graphs using custom filters:

  • “starts with PanelID__” to filter specific panels.
  • “contains __mqtt_dis” to filter specific categories/actions.
  • Filter “everything after ___” to display a table with turnout/block changes.
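
The same split can be done in code when processing exported data off the device. This is a sketch with a hypothetical function name; it assumes the panel ID contains no double underscore and the category contains no underscore at all.

```python
def parse_event_name(name):
    """Splits "<ID>__<category>_<action>___<extra>" back into its parts."""
    head, sep, extra = name.partition("___")
    panel_id, _, rest = head.partition("__")
    category, _, action = rest.partition("_")
    return {
        "panel_id": panel_id,
        "category": category,
        "action": action,
        "extra": extra if sep else None,   # None when there is no "___" part.
    }
```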

Configuring Google Analytics

Now we need to set up GA and “prime” it to make this all work.

  • Head to the Google Analytics web site.
  • Create an Account if you don’t have one, or use an existing one. We don’t need its ID.
  • Using the Admin toolbar on the left, create a Property.
    • If asked what type of property you need, make sure to select a web property. That’s important.
    • Give it some dummy base URL, something that belongs to you. We won’t be using that URL anyway. I typically select <my web site/my project name> even if there’s no such web page.
    • Under Data settings, select the Data Stream that was created along with the new Property. That’s the key to using the Measurement Protocol API. If you were using an existing property, you might want to add a new Data Stream to separate things. The point is that the Data Stream has to be a “web” one.
    • Enter the “Web stream details” screen. This gives us what we need:
      • GA4_CLIENT_ID is named the “Stream ID” on this page. IIRC “client id” is the old name under GA3, and GA4 refers to it as a “stream ID” in the configuration pages, whilst the JSON payload keeps the old name of “client_id”. Same stuff. It looks like a large int64 number.
      • GA4_MEASUREMENT_ID is the “Measurement ID” on this page. It’s the “G-<letter>” code that we use in the gtag.js block.
      • Down in the “Web stream details” screen, select the “Measurement Protocol API secrets” and create a new one. The secret value goes in GA4_API_SECRET.

Now at this point, I typically run everything using the debug POST URL, check the server’s response, and fix issues as reported. A typical issue is that I always forget that the event name must be only alphanumeric and underscore, and I often start with fancier event names that I need to simplify.
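
To make the debug responses easier to read, a small helper can extract the messages from the response body. This is a sketch that assumes the debug endpoint replies with a JSON object holding a “validationMessages” list whose entries have “validationCode” and “description” fields, which is what I recall from the Measurement Protocol documentation. The function name is hypothetical.

```python
import json


def validation_messages(body_text):
    """Returns a list of "CODE: description" strings, empty if all is well."""
    try:
        body = json.loads(body_text)
    except ValueError:
        return ["unparseable response: " + body_text]
    if not isinstance(body, dict):
        return ["unexpected response: " + body_text]
    messages = []
    for msg in body.get("validationMessages", []):
        messages.append("%s: %s" % (
            msg.get("validationCode", "?"), msg.get("description", "")))
    return messages
```

An empty list means the payload passed validation, and the same POST can then be switched to the real URL.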

The second point is that at first nothing will work. POSTs will properly return a 204 indicating they are accepted, yet no stats are visible in the GA4 dashboard, and the Admin page has a cryptic message about “no data received in past 48 hours”. The fix is that data collection needs to be “initialized” by sending at least some successful pings from a real web page using the proper gtag.js library.

  • On a web server you own, create a new page. I typically match the one used for the stream URL.
  • Copy and paste the Google tag JavaScript using the link at the bottom of the “Web stream details” page.
  • Load that page to execute the google tag with your measurement ID in it.
  • Don’t forget to disable any ad blocker you’d have since they may be blocking the Google Analytics pings in the first place!
  • After a while, you should see the “Web stream details” having a green banner at the top with “Data collection is active in the past 48 hours”.
  • However at first, it will still display “no data” for a while. That page is really misleading, and instead you want to use the “home” button in the left toolbar, and that should indicate “your data collection is active”, with hopefully something in the “active users in the last 30 minutes”. Clicking the “View Realtime” link should show more. That’s progress yet it’s a bit early, and as the disclaimer indicates on the page, you will likely find no data if you try to create a custom dashboard right away.

Generally speaking, one major change from GA3 to GA4 is the major loss of “realtime” processing in the dashboards. Whilst the GA page can display some “real time” data in the one pre-made existing “Realtime overview” dashboard, that’s about it. I find that custom dashboards never have real time data. The processing delay seems to be at least 12 hours, especially if you use the GA connector for Google Looker Studio.


 Generated on 2025-04-16 by Rig4j 0.1-Exp-f2c0035