R’alf Dev Log

The place where random ideas get written down and lost in time.

2025-04-12 - GA4 Stats from an ESP32?

Category DEV

At Randall, I’ll soon have the Distant Signal panel installed. This connects to the local wifi and gets the turnout state from the local MQTT broker. I want to track the “health” of that wifi connection, as that has been an issue in the past. The simplest way is to reuse my existing Google Analytics dashboards, thus I want the CircuitPython script on the ESP32 to send pings to GA4.

The goal is to measure the “uptime” of the display. The display is working when it is able to receive messages from the MQTT server. It can receive them when it can connect to the wifi. Thus we want to track the wifi state, or some kind of proxy for it.

Since this is CircuitPython, we have Adafruit libraries that already deal with JSON and networking.

So really “all” I need is something similar to the Analytics class in Conductor 2:

Queue stats to be sent.
Stats are sent as JSON payloads.
POSTs can fail, in which case they need to be retried with a backoff delay.

⇒ Obviously a “wifi lost” event would have no wifi to send the stat.
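Here is a minimal sketch of what such a queue could look like, in plain Python so it can be tried off-device. The names StatsQueue and send_fn, and the delay values, are made up for illustration; this is not the Conductor 2 implementation.

```python
import time


class StatsQueue:
    """Queues payloads and retries failed sends with an exponential backoff."""

    def __init__(self, send_fn, base_delay_s=5, max_delay_s=600, max_pending=20):
        self._send_fn = send_fn            # hypothetical callback, returns True on success
        self._base_delay_s = base_delay_s
        self._max_delay_s = max_delay_s
        self._max_pending = max_pending
        self._pending = []
        self._delay_s = 0
        self._next_try = 0

    def add(self, payload):
        # Drop the oldest entry rather than grow without bound on a small device.
        if len(self._pending) >= self._max_pending:
            self._pending.pop(0)
        self._pending.append(payload)

    def pending(self):
        return len(self._pending)

    def process(self, now=None):
        """Call from the main loop. Sends at most one payload per call."""
        if now is None:
            now = time.monotonic()
        if not self._pending or now < self._next_try:
            return False
        try:
            ok = self._send_fn(self._pending[0])
        except Exception:
            ok = False                      # e.g. no wifi: treat as a failed POST
        if ok:
            self._pending.pop(0)
            self._delay_s = 0
            self._next_try = now
            return True
        # Failure: double the delay (capped) before the next attempt.
        self._delay_s = min(self._max_delay_s,
                            max(self._base_delay_s, self._delay_s * 2))
        self._next_try = now + self._delay_s
        return False
```

Calling process() once per main loop iteration keeps each iteration short, and a failed POST only costs one attempt per backoff period.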

Do I need to put a real date/timestamp in the events?

⇒ These ESP32 don’t have a “date” clock, so it’s nice if I don’t have to.
There’s an NTP library which I’ve used on the LitterTimer experiment but it makes the logic more complicated since we need to expect that the device may not have wifi.

So what do we need to do custom pings to Google Analytics 4?

A “GA4_CLIENT_ID”, typically configured in settings.toml
A “GA4_MEASUREMENT_ID”, also configured in settings.toml
A “GA4_API_SECRET”, also configured in settings.toml

The feature is disabled if any of these settings is missing.
Placing them in settings.toml avoids leaking them into any Git source, and it makes it easier to configure a new panel.
See below on where to find these.
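A possible way to read these and disable the feature when one is missing. This is only a sketch: the function name load_ga4_settings is an assumption. On CircuitPython, os.getenv reads settings.toml; on a desktop Python it reads the process environment, which is handy for testing.

```python
import os

GA4_KEYS = ("GA4_CLIENT_ID", "GA4_MEASUREMENT_ID", "GA4_API_SECRET")


def load_ga4_settings(getenv=os.getenv):
    """Returns a dict with the 3 settings, or None when any is missing or empty."""
    settings = {}
    for key in GA4_KEYS:
        value = getenv(key)
        if not value:
            return None            # feature disabled
        settings[key] = str(value)  # normalize to a string
    return settings
```

The caller can then skip all the GA4 code paths when the result is None.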

A base URL:

https://www.google-analytics.com/debug/mp/collect -- debugging version
https://www.google-analytics.com/mp/collect -- real pings version

The POST URL is:

%(base)s?api_secret=%(GA4_API_SECRET)s&measurement_id=%(GA4_MEASUREMENT_ID)s
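As a sketch, the URL can be assembled like this. The function name ga4_post_url is made up, and it assumes the secret and measurement ID contain only URL-safe characters, so no escaping is done.

```python
GA4_DEBUG_URL = "https://www.google-analytics.com/debug/mp/collect"
GA4_REAL_URL = "https://www.google-analytics.com/mp/collect"


def ga4_post_url(api_secret, measurement_id, debug=False):
    """Builds the POST URL for either the debugging or the real endpoint."""
    base = GA4_DEBUG_URL if debug else GA4_REAL_URL
    return "%s?api_secret=%s&measurement_id=%s" % (base, api_secret, measurement_id)
```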

POST mime type: text/plain
Payload is JSON, and is used as such in Conductor:

{
        'client_id': GA4_CLIENT_ID,
        'events': [ {
                'name': event_action ,
                'params': {
                        'items': [ ] ,
                        'value' : value_int ,
                        'currency' : 'USD'
                }
        } ]
}

The Adafruit “requests” library is the simplest way to send a POST request. It accepts either a JSON payload (given as a Python dictionary) or a string payload.
When POSTing to the debug URL, print the response.status_code and the response.content.decode() to get valuable information on why requests fail.
When POSTing to the real ping version, the GA server replies with 204 (Success w/ No Content) and a complete lack of response body.
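A hedged sketch of the send step. It assumes “session” is an already created adafruit_requests Session (or anything with a compatible post() method), and the helper name send_ga4 is made up. Any 2xx status is treated as success, which covers the 204 of the real endpoint.

```python
def send_ga4(session, url, payload, debug=False):
    """POSTs one payload as JSON. Returns True when the server accepted it."""
    response = session.post(url, json=payload)
    try:
        if debug:
            # The debug endpoint explains in its body why a payload is rejected.
            print("GA4 status:", response.status_code)
            print("GA4 body:", response.content.decode())
        return 200 <= response.status_code < 300
    finally:
        response.close()   # release the socket, they are scarce on an ESP32
```

Since it only returns True or False, this function can be plugged as-is into whatever retry logic wraps it.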

Some details on the JSON payload:

The only required field is really the event’s name.
The “currency” is only needed if there’s a value, which is an optional field.
It’s possible to “back timestamp” an event by adding a “timestamp_micros” field (a Unix epoch time in microseconds) to the payload. This is useful when doing some kind of retries with an exponential back-off. However the ESP32 does not have the current date time -- that would require adding an NTP library and corresponding calls to initialize it. Possible, yet extra work for little benefit.
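To illustrate these rules, here is a small payload builder. The function name ga4_payload is an assumption, and it only adds “params” when there is a value to report.

```python
def ga4_payload(client_id, event_name, value=None):
    """Builds the JSON-able dictionary for a single GA4 event."""
    event = {"name": event_name}
    if value is not None:
        # "currency" is only needed when a value is present.
        event["params"] = {"value": int(value), "currency": "USD"}
    return {"client_id": str(client_id), "events": [event]}
```

For example ga4_payload("1234", "panel1__wifi_hb", -67) would carry an RSSI of -67 as the event value.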

Example of usage for Distant Signal

GA3 used to have an event with fields “category”, “action”, “label” that formed a nice hierarchy. GA4 simplified all that and removed these fields. The only event field is “name”. Custom fields are possible yet they add setup complexity, processing delay, and other complications.

We need to pass in some kind of “panel id”, to differentiate multiple Distant Signal panels, and some events need to have a parameter that is a string. However the “value” field in the event is really a “currency value” and as such is only a number.

The GA4 event name can only be alphanumeric and underscore.

A simple solution is to adopt a pattern for the name:

“<turnout>__<category>_<action>___<extra>”

The schema would become:

Event Name                    Value     Description
<ID>__wifi_connected          <rssi>    WiFi connected
<ID>__wifi_hb                 <rssi>    WiFi heart-beat
<ID>__mqtt_connected          <rssi>    MQTT connected callback
<ID>__mqtt_disconnected       <rssi>    MQTT disconnected callback
<ID>__msg_script              n/a       MQTT “script” message received
<ID>__msg_turnout___<state>   n/a       MQTT “turnout/state” message received
<ID>__msg_block___<block>     1 or 0    MQTT “block/state” message received

Explanation:

The point of the wifi / MQTT messages is to also convey the state of the wifi.
We do that by passing the wifi RSSI (signal strength) as a value to graph it over time.
We don’t have a callback when the wifi is lost, nor could we send a ping to GA4 (since there would be no network, duh). Instead we’ll use a heart-beat, say every 15 minutes or every 30 minutes. I don’t need much more granularity than that.
The MQTT library has its own heart-beat and thus can call a “disconnected” callback. That can happen due to lack of wifi (in which case we won’t receive the event), or due to the MQTT broker server being offline. Consequently, we shall treat that event as unreliable. A good proxy for it is the MQTT “connected” event -- by definition it happens when the network is up, and it happens once at startup and then once after each disconnection.
My main goal is currently to track the wifi health, but let’s future proof it by also sending pings for the obvious core task of Distant Signal, which is to process the MQTT messages.
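The heart-beat itself can be a simple timer polled from the main loop. A minimal sketch, where the class name HeartBeat is made up and the 15 minutes default comes from the text above:

```python
import time


class HeartBeat:
    """Tells the main loop when it is time to send the next heart-beat."""

    def __init__(self, period_s=15 * 60):
        self._period_s = period_s
        self._next = None      # None means "never fired yet"

    def due(self, now=None):
        if now is None:
            now = time.monotonic()
        if self._next is None or now >= self._next:
            self._next = now + self._period_s
            return True        # fires right away on the first call
        return False
```

In the main loop, something like “if hb.due(): queue an <ID>__wifi_hb event with the current RSSI” would do. On CircuitPython the RSSI should be available via wifi.radio.ap_info.rssi, though I have not checked that on this specific board.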

The “turnout” message selects a state, which is a string. Thus we cannot place it in the “value” field of the GA4 payload (it’s a currency, really). Instead we’ll sanitize it and append it to the event name, making it easier to split/extract later if needed.
The “block” message has 2 values: the block name, and its state. The state can easily be converted to a value (1 for ‘active’, 0 for ‘inactive’), and the block name can be sanitized and appended to the event name.
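A sketch of how the names could be built. The function names are made up. The sanitizer collapses runs of invalid characters into a single underscore, an extra precaution so that a sanitized string never contains the “__” or “___” separators.

```python
def sanitize(text):
    """Keeps ASCII letters/digits; any run of other chars becomes one underscore."""
    out = []
    for c in str(text):
        if ("a" <= c <= "z") or ("A" <= c <= "Z") or ("0" <= c <= "9"):
            out.append(c)
        elif out and out[-1] != "_":
            out.append("_")
    return "".join(out).strip("_")


def event_name(panel_id, category, action, extra=None):
    """Builds <ID>__<category>_<action> with an optional ___<extra> suffix."""
    name = "%s__%s_%s" % (sanitize(panel_id), category, action)
    if extra is not None:
        name += "___" + sanitize(extra)
    return name


def block_value(state):
    """Converts a block state into the numeric GA4 value."""
    return 1 if state == "active" else 0
```

For example a turnout message for state “T330 normal” on panel “PanelA” becomes “PanelA__msg_turnout___T330_normal”.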

Using this pattern, we can get these in the graphs using custom filters:

“starts with PanelID__” to filter specific panels.
“contains _mqtt_dis” to filter specific categories/actions.
Filter “everything after ___” to display a table with turnouts/blocks changes.
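The same split can be done in code when post-processing exported data. This is a sketch that assumes names follow the pattern above; the function name split_event_name is made up.

```python
def split_event_name(name):
    """Returns (panel_id, category_action, extra); extra is None when absent."""
    extra = None
    if "___" in name:
        name, extra = name.split("___", 1)
    parts = name.split("__", 1)
    panel_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return panel_id, rest, extra
```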

Configuring Google Analytics

Now we need to set up GA and “prime” it to make this all work.

Head to the Google Analytics web site.
Create an Account if you don’t have one, or use an existing one. We don’t need its ID.
Using the Admin toolbar on the left, create a Property.

If asked what type of property you need, make sure to select a web property. That’s important.
Give it some dummy base URL, something that belongs to you. We won’t be using that URL anyway. I typically select <my web site/my project name> even if there’s no such web page.
Under Data settings, select the Data Stream that has been created when creating the new Property. That’s the key to use the Measurement API. If you were using an existing property, you might want to add a new Data Stream to separate things. The point is that the Data Stream has to be a “web” one.
Enter the “Web stream details” screen. This gives us what we need:

GA4_CLIENT_ID is named the “Stream ID” on this page. IIRC “client id” is the old name under GA3, and GA4 refers to it as a “stream ID” in the configuration pages, whilst the JSON payload keeps the old name of “client_id”. Same stuff. It looks like a large int64 number.
GA4_MEASUREMENT_ID is the “Measurement ID” on this page. It’s the “G-<letter>” code that we use in the gtag.js block.
Down in the “Web stream details” screen, select the “Measurement Protocol API secrets” and create a new one. The secret value goes in GA4_API_SECRET.

Now at this point, I typically run everything using the debug POST URL and check the server’s response and fix issues as reported. A typical issue is that I always forget that the event name must be only alphanumeric and underscore, and I often start with some more fancy event names that I need to simplify.

The second point is that at first nothing will work. POSTs will properly return a 204 indicating they are accepted, yet no stats are visible in the GA4 dashboard, and the Admin page has a cryptic message about “no data received in past 48 hours”. The fix is that data collection needs to be “initialized” by sending at least some successful pings from a real web page using the proper gtag.js library.

On a web server you own, create a new page. I typically match the one used for the stream URL.
Copy paste the Google tag Javascript using the link at the bottom of the “Web stream details” page.
Load that page to execute the google tag with your measurement ID in it.
Don’t forget to disable any ad blocker you’d have since they may be blocking the Google Analytics pings in the first place!
After a while, you should see the “Web stream details” having a green banner at the top with “Data collection is active in the past 48 hours”.
However at first, it will still display “no data” for a while. That page is really misleading, and instead you want to use the “home” button in the left toolbar, and that should indicate “your data collection is active”, with hopefully something in the “active users in the last 30 minutes”. Clicking the “View Realtime” link should show more. That’s progress yet it’s a bit early, and as the disclaimer indicates on the page, you will likely find no data if you try to create a custom dashboard right away.

Generally speaking, one major change from GA3 to GA4 is the major loss of “realtime” processing in the dashboards. Whilst the GA page can display some “real time” data in the one pre-made existing “Realtime overview” dashboard, that’s about it. I find that custom dashboards never have real time data. The processing delay seems to be at least 12 hours, especially if you use the GA connector for Google Looker Studio.

2025-04-10 - CircuitPython on ESP32: “pystack exhausted”

Category DEV

Well, that escalated quickly.

I’ve been battling with this in Distant Signal:

@@ MQTT: Failed with pystack exhausted
Traceback (most recent call last):
File "code.py", line 498, in <module>
File "code.py", line 354, in _mqtt_loop
File "adafruit_minimqtt/adafruit_minimqtt.py", line 956, in loop
File "adafruit_minimqtt/adafruit_minimqtt.py", line 1027, in _wait_for_msg
File "adafruit_minimqtt/adafruit_minimqtt.py", line 381, in _handle_on_message
File "code.py", line 328, in _mqtt_on_message
File "script_loader.py", line 27, in newScript
File "script_parser.py", line 316, in parseJson
File "script_parser.py", line 289, in _parseGroup
File "script_parser.py", line 214, in _parseInstructions
File "script_parser.py", line 278, in _parseInstructions
File "adafruit_display_text/label.py", line 88, in __init__
File "adafruit_display_text/__init__.py", line 273, in __init__
File "adafruit_display_text/__init__.py", line 294, in _get_ascent_descent
RuntimeError: pystack exhausted

It turns out that CircuitPython is built with a max stack depth of 15-16 calls (*), at least for this version of the MatrixPortal S3.

It’s entirely possible to rebuild it with a different stack depth, if one is inclined to do so.

The stack size is controlled by CIRCUITPY_PYSTACK_SIZE which is defined globally here:
https://github.com/adafruit/circuitpython/blob/HEAD/py/circuitpy_mpconfig.h#L455
and is also architecture dependent -- e.g. the SAM arch overrides the default.

Documentation on CIRCUITPY_PYSTACK_SIZE is here:
https://github.com/adafruit/circuitpython/blob/HEAD/docs/environment.rst#circuitpy_pystack_size

(*) As can be seen above, the “stack size” is not really a number of calls but really a size in bytes. Thus I expect calls with more arguments will exhaust the stack faster, or whatever else needs to be pushed on the stack for each call.

At runtime, the “ustack” module can give the current max and used size:
https://docs.circuitpython.org/en/latest/shared-bindings/ustack/index.html

Well, that module is not available on the MatrixPortal S3 port at least.

So how does one solve that? Since I do not want to rebuild CircuitPython, the only option is to amend the code to reduce stack usage.

For example the main starts with this:

if __name__ == "__main__":
    setup()
    loop()

This conveniently mimics an Arduino sketch setup. But we don’t need that “loop()” function. Just place everything under the “if main” and save one call stack entry.

The MQTT error above happened when a message was received -- I’d parse the script right there in the MQTT handler. Instead, what I do is save the script in a temporary global variable, and process it from the main loop, after the MQTT handler has returned. That way we’re “saving” about 5 calls depth on the stack.

The parser is about 5 calls deep. I could probably save one call in there, if I really have to.
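The deferred processing can be sketched as follows. This is not the actual Distant Signal code: the names are made up, and only the handler signature (client, topic, message) follows the one used by the MiniMQTT on_message callback.

```python
_pending_script = None


def on_mqtt_message(client, topic, message):
    """MQTT handler: only stores the payload, so the call stack stays shallow."""
    global _pending_script
    _pending_script = message


def process_pending(parse_fn):
    """Called from the main loop, after the MQTT handler has returned."""
    global _pending_script
    if _pending_script is None:
        return False
    script, _pending_script = _pending_script, None
    parse_fn(script)   # the parser now starts from the main loop's stack depth
    return True
```

If two scripts arrive between two main loop iterations, only the latest one is kept, which seems fine for a display that only shows the last script anyway.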

Another easy strategy is to create lambdas, and queue them for processing in the main loop.

That’s more or less what I do with the delayed processing of the script, except since there’s only one thing to do, I keep the string reference around rather than wrap the processing in a lambda. Delayed processing can quickly become hard to follow, it’s harder to understand and debug.
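For reference, the lambda variant could look like this. It is a sketch, and the class name DeferredQueue is made up.

```python
class DeferredQueue:
    """Collects callables to be run later from the main loop."""

    def __init__(self):
        self._tasks = []

    def defer(self, fn):
        self._tasks.append(fn)

    def run(self):
        # Swap the list first, so tasks queued while running wait for the next pass.
        tasks, self._tasks = self._tasks, []
        for fn in tasks:
            fn()
        return len(tasks)
```

A handler would call something like queue.defer(lambda s=message: parse(s)), where the default argument captures the current value of the message, and the main loop calls queue.run() once per iteration.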

2025-04-07 - The HUB75 Protocol for LED Matrix Displays

Category DEV

Over there in the trains/electronics blog, I have a little write up about the HUB75 Protocol for LED Matrix Displays, as used in the Distant Signal project. Go read it there.

2025-02-09 - Wifi APIs on Android 13

Category DEV

I’m still in the process of updating RTAC to work properly on Android 13.

RTAC uses a number of APIs which have been deprecated since Android 10. Since it was running on 2017 hardware with Android 9, that wasn’t much of a problem. Now I want to move to newer hardware that runs Android 13 so I need to deal with it.

WifiLock:

The WifiManager.WifiLock API is still present and usable.
I’ve added a handler class for it in TCM.
However it’s not absolutely clear what it really provides.

WifiManager#enableNetwork is reserved to “Device Owners” and system apps.

It seems that ConnectivityManager#requestNetwork could be an adequate replacement?
A first try of the ConnectivityManager#requestNetwork API wasn’t conclusive.
The app suddenly displayed a modal dialog “connecting to device” with a spinner, which has the undesirable side-effect of pausing the main TCM activity.
Configuring the request for a known SSID did… nothing. It stayed on that spinner and did not switch to that wifi network.
Configuring the request for a known SSID and its WPA password did something weird where the wifi list showed a second entry of the SSID “for this particular app”.
So far that didn’t work as expected. My goal is to switch the wifi to an already known SSID that has already been configured in the Android wifi setting and already set up with password et al.

To be continued.

2025-02-05 - Here we go again…

Category DEV

Somehow the same wishes always circle back:

“Aww, wouldn't it be cool if <insert random unfinished project name> had a built-in editor and I could code the app on my {phone, PDA} whilst I'm using it?”

A few decades ago that's how the Hint project started, which quickly got nowhere -- mostly because I was trying to get everywhere at the same time.

And of course later I realized the goal is its own anti-goal -- it's already hard to finish one project, so adding another project inside that project pretty much guarantees that neither gets anywhere, fast.

2025-01-29 - Moving Again…

Category DEV

Another month, another problem… I’ll spare you 2 days of internal discussion and I’ll just summarize it in one sentence:

I pretty much have decided to move all my open source repositories from BitBucket to GitHub.

Historically I had all my code on Google Code. When that service shut down, a lot of folks went to GitHub and I had my reservations about that service. Instead I chose BitBucket. I still have some philosophical issues with GitHub, but right now I have even more of them with BitBucket so… a lengthy migration is in order.

2025-01-06 - Motivation to Write Games

Category DEV

Recently I’ve started two game projects at home, neither of which is in a good enough state to be published. One of them is TD1 which I’ve discussed here in the previous years, and I finally built a prototype, originally to have a concrete way to explore the issues in a specific defense-vs-attack gameplay. There were some interesting challenges. Then I’ve discussed that with some friends. Now that this is done, I have had little motivation to actually implement the discussed gameplay changes. Instead I have thoughts on other projects.

Then I see a lot of little short games on Itch.io that are “playable” -- small scopes and essentially “mostly” finished. But I honestly don’t feel like playing them much besides the initial intro. Each time I think I’m maybe jealous because these games are finished and mine aren’t… but is that really the case?

Well, let’s analyze this a bit more.

1- I’m jealous these games are “finished” and I don’t finish mine.

Well, it turns out the first part is only the appearance of it. I fully understand some of the games I see in the various itch.io Jams are only finished “on the surface” -- good enough for a Jam deadline. Each time I try some, I feel like there’s a lack of depth.
It’s not true that I don’t finish my games. My Labs has Nerdkill and Asqare which, also on the surface, can appear “finished” -- and in fact these 2 are actually finished from my point of view.

2- These games lack gameplay depth. Well, yeah. Totally. All of them.

Nerdkill is a good example as I never intended for the “game” to have a scoring system because then it would become the game I don’t want it to be (gratification of violence). Not having a scoring system is a core part of this gameplay (“violence is pointless”).
Asqare could do more, but really what for? It’s a me-too kind of gameplay and the more I add to it, the more it would be clear it’s not Bejeweled -- when I started, I was trying to not be that game, as part of the original implementation was to try different gameplays -- but the problem is that no matter what, it would be compared to that baseline.

3- I do have a number of totally unfinished games. Or maybe it’s better to see them as “not even really started”. They are not even on the Labs page or anywhere to be found.

Cangrejo is one. It’s 200% me-too and not even clearly defined where it was going anyway. Jump! is another one such. They are essentially technological demos with a complete lack of interest in producing even a finished MVP. I only started them because I saw some other game and I thought “hey how would I implement this”. But once I found the core engine implementation, I didn't really care about all the work needed to make it a playable game.
In a way TD1 falls into that category as well. I had a vague idea, but I see a huge gap between what the prototype is and where I’d want it to be, and it’s not clear to me the end result would be interesting. Thus, yeah, I have little motivation to put in the effort to bridge that gap, only to end up with something I don’t care for at the end.

In essence, that last part is key: I measure the cost of the project in terms of involvement versus result, and I don’t find it motivating compared to other stuff I may be doing. For example stuff like Conductor, RTAC, or even SDB have a high involvement cost, yet on the other hand I can see the tangible benefits to me of having them working -- the Randall Museum visitors may not see the software behind the train automation, yet I know what drives that automation and I’m proud of it.

The other aspect of these game projects is that I tend to focus on the wrong thing. I find it more interesting to focus on the engine and the framework than the gameplay or the rendering itself. I’m not really a game designer, and I’m definitely not a graphic artist. Eventually implementing the gameplay is the least interesting part of game development so far.

2024-12-29 - Kotlin Web

Category DEV

It's time to look again at Kotlin for web development. Last time I tried it, a few years ago, it was embryonic and really not suitable for usage. Since then things have changed.

The entry point is Kotlin Multi Platform. It now has a version of Jetpack Compose: https://www.jetbrains.com/lp/compose-multiplatform/

The goal is to evaluate that against my use of Dart / Flutter on Firebase GCP.

There's currently some uncertainty in the future of Flutter. Dart should survive a bit longer as it still seems to be used internally by core projects, however the same can't be said of Flutter according to the water tank hearsay.

Evaluation criteria:

How to build a Kotlin MP project.
What can Compose do for the web.
Overall app architecture.
Integration with Firebase authentication.
Integration with Firebase DB.
Ecosystem of 3rd party libraries.
Ease of maintenance over time.

These are all points where Flutter more or less excels. Dart seemed like a barrier at first, until I understood it as basically “Java/Kotlin meets Typescript”. After that, it just made sense and it's easy to work with and pick it up again when I only tweak a project once a year.

The “web UI” aspect of Flutter is familiar -- it's basically ReactJS, but easier to use, and better documented.

The Flutter doc is very strong, and the IJ plugin is a delight to use.


2024-11-11 - ESP32 with Arduino-esp32, IDF 5.1.x, and C++20

Category DEV

Time to try again if this combo finally works.

My main motivation is to use C++20 Modules.

(TL;DR Summary: it works… and yet it doesn’t. C++ Modules require a very specific compilation ordering due to cross module dependencies, and the Arduino CLI simply cannot do that. However with ESP-IDF and CMake, it should be fine.)

2024-11-09 - Rig4j: Addressing Pagination

Category DEV

There are 2 issues, which keep haunting me:

The “N+1” problem with the first page repeating some but not all of the index.
The “next page” link on the index goes to “page N”, and the “next page” after that goes to “N-1”, and so on down to page 1.

I have reverse ordering. The design is that each blog page has a stable URL with N articles each, so let’s say 10: page 1 is 1..10, page 2 is 11..20, etc. up to page N that “fills” up to N articles so that the last page always has an overlap with the index.

It’s telling that even I get surprised when I click on the “Next Page” link on the index and end up on “Page 6” rather than “Page 1”.

Issue 1: Previous / Next Links

The other issue is the navigation links.

I went from this, which was problematic:
⇐ Previous Posts Next Posts ⇒
to this, which IMHO has other issues:
⇐ Previous Page Next Page ⇒

The problem with “previous/next post” is that it’s ambiguous: “previous” is implied to mean “page N-1” or “older in time”, and “next” would mean “page N+1” or “newer in time”. But because the pages are in reverse chronological order, the “next” link actually points to an older-in-time content and page N-1. That’s the reverse of what one would expect.

I think we can solve that conundrum by renaming the links to be more technically correct:
⇐ Newer Page Older Page ⇒
⇐ Newer Post Older Post ⇒

That’s a small trivial change that I think would be less ambiguous:

Newer / Older only imply time-based ordering, not index-based ordering.
We no longer imply what a “previous” or a “next” page is.
Navigation is “[page N+1] << Newer || current is page N || Older >> [page N-1]”.
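In code form, the link targets boil down to this. It is a Python sketch for illustration only (Rig4j itself is not written in Python), and it leaves out the special case of the index page.

```python
def nav_links(page, last_page):
    """Returns (newer, older) page numbers; None when there is no such page."""
    newer = page + 1 if page < last_page else None   # higher page number = newer
    older = page - 1 if page > 1 else None           # page 1 holds the oldest posts
    return newer, older
```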

⇒ Done. Implemented. Seems very nice.

Also added an “{{If.IsIndex}}” command in the template so that we can have “Newer Post” in the single blog full page vs “Newer Posts” in the index blog pages. This required adding a crude support for nested {{If}}...{{Endif}}.



Generated on 2025-04-16 by Rig4j 0.1-Exp-f2c0035