Model Train-related Notes Blog -- these are personal notes and musings on the subject of model train control, automation, electronics, or whatever I find interesting. I also have more posts in a blog dedicated to the maintenance of the Randall Museum Model Railroad.

2023-05-06 - Conductor 2: The “Flaky” Sensors Issue

Category Rtac

I’m still trying to deploy Conductor 2 and it’s clearly not ready yet. I keep finding issues, some expected, some not.

The Freight automation has been an interesting source of such unexpected issues. It’s a simple reversing shuttle automation across two blocks, yet it has proven more challenging than the complex Passenger and Branchline automations, which actually perform quite well. The “simpler” Freight one does not.

The most notable “bug” is with the Freight automation: the engine starts on the first block, travels to the second block as expected, and then, instead of reversing the engine back to the beginning of the route, the automation decides the engine is back on the first block when it’s clearly still on the second block. At that point the automation panics because the engine is obviously not back at the starting point. The program is skipping half of the route automation entirely, for no apparent reason. Why?

One of the issues I was more or less expecting was dealing with the “flaky” sensor on B321, as I explain in detail in a post on the other blog. There’s some of that here, but there’s also something else entirely going on that I have not figured out yet.

Even after a bit of adjusting, the mountain sensors are still flaky, notably B321 and B370 with the Passenger train. I adjusted B321 for the Freight train but the newer PA engine is still borderline.

Importantly, these sensors detect current consumption. In the case of the reversing train, the sensor turns off when the train is stopped.

I cannot adjust these sensors to make them more sensitive -- I’m already running them at the most sensitive setting. Essentially this points out a clear issue with “flaky” sensors handling -- I thought I had accounted for this in the design, and I was partly wrong, or maybe my implementation is wrong, and I should understand why.

From my old design notes, this was supposed to take care of this:

The currently occupied block has a timer. As long as the block is occupied, we “don’t care” if the sensor temporarily turns off, and we still consider the block occupied. We only care when the next block becomes active. In essence, the “occupied block” state is a deflaked version of the “sensor active” state.
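To make that design note concrete, here is a minimal sketch of the “occupied” state ignoring sensor dropouts until the next block activates. The `Block` and `Route` classes below are hypothetical stand-ins written for this illustration, not the actual Conductor 2 classes, and the block timer is left out.

```kotlin
// Hypothetical model for illustration only, not Conductor 2's real classes.
class Block(val name: String) {
    var sensorActive = false   // raw sensor reading, possibly flaky
    var occupied = false       // deflaked state used by the route
        private set

    fun enter() { occupied = true }
    fun leave() { occupied = false }
}

class Route(private val blocks: List<Block>) {
    private var index = 0
    val current: Block get() = blocks[index]

    init { current.enter() }

    // Called once per engine cycle, after the sensors have been read.
    fun update() {
        val next = blocks.getOrNull(index + 1) ?: return
        if (next.sensorActive) {
            // Only the activation of the next block moves the train forward.
            current.leave()
            index++
            current.enter()
        }
        // A dropout of current.sensorActive is deliberately ignored here.
    }
}

fun main() {
    val b1 = Block("B311")
    val b2 = Block("B321")
    val route = Route(listOf(b1, b2))

    b1.sensorActive = true;  route.update()
    b1.sensorActive = false; route.update()   // the sensor flakes
    println(b1.occupied)                      // true: still occupied

    b2.sensorActive = true;  route.update()
    println("${b1.occupied} ${b2.occupied}")  // false true
}
```

The point of the sketch is that nothing ever reads `current.sensorActive`; only the next block’s sensor can end the occupancy of the current one.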

That design seems sound. The implementation is also apparently adequate:

  • The timer is supposed to be reset when the train stops. That’s because the timer uses the block timeout, which is the time to traverse the block; thus the intent was for a reversing block to have “initial timeout going in + stop time + initial timeout going out”. But in fact we don’t reset the timer when the train stops; we actually stop it.
  • This should make the timeoutExpired() method always return false -- the timeout can only be expired if the timer has been started, and we just stopped it.
  • Thus I don’t think it’s an issue of timeout expiring, and in fact I don’t remember seeing that in the logs.
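The reasoning in the list above can be checked with a small sketch. This `BlockTimer` is an assumed implementation written for this post, reusing only the `timeoutExpired()` name; the real timer may well differ. It takes the current time as a parameter so the behavior is easy to follow.

```kotlin
// Assumed implementation for illustration; only the method name
// timeoutExpired() comes from the notes above.
class BlockTimer(private val timeoutMs: Long) {
    private var startedAtMs: Long? = null   // null means "not running"

    fun start(nowMs: Long) { startedAtMs = nowMs }

    fun stop() { startedAtMs = null }

    // A stopped timer can never be expired.
    fun timeoutExpired(nowMs: Long): Boolean {
        val started = startedAtMs ?: return false
        return nowMs - started >= timeoutMs
    }
}

fun main() {
    val timer = BlockTimer(timeoutMs = 30_000)

    timer.start(nowMs = 0)                          // train enters the block
    println(timer.timeoutExpired(nowMs = 10_000))   // false

    timer.stop()                                    // train stops to reverse
    println(timer.timeoutExpired(nowMs = 120_000))  // false, however long we wait

    timer.start(nowMs = 120_000)                    // train leaves again
    println(timer.timeoutExpired(nowMs = 150_000))  // true
}
```

If the real timer behaves like this, a long stop on the reversing block cannot by itself expire the timeout, which matches not seeing it in the logs.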

So issue #1: Why does the Freight automation route end when the train has not returned?

It’s worth noting that the implementation works for the “running blocks” like B321 and B340 when the train is merely going through the block. So far the issue has only been observed with the Freight stopping at B321. I realize I have not observed it with the Passenger train on B370. The logs captured last week made no sense, as they seemed to indicate that the next block B311 had auto-magically activated -- which of course cannot be since there was no train on it.

Eventually I’ll get to the bottom of that. In the meantime, let’s move on and identify the other issues.

Issue #2: Recovery with a flaky sensor is flawed.

That’s another interesting case noticed last week: the “flaky” sensors tend to be off when the train is stopped. That wrecks the current route recovery algorithms, which are inherently based on detecting where the train is located to decide how to recover it. Worse, if we don’t recognize that the Freight is stopped in B321, the initial algorithm would still try to start the Passenger train, which would then collide with the Freight -- that has been fixed separately by adding an interlock: PA can’t start if the FR is not at the home location.

The current recovery algorithm is inherently simplified anyway. For example, it does not handle a train stopped on a block boundary. If more than one block is active, the algorithm just gives up. This limitation is a delivery decision to get started with the easy cases.
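As a rough sketch of that “easy cases only” rule, the location step can be thought of as a three-way decision on the list of active blocks of a route. The `Located` type and the `locate` function are names made up for this example; they are not part of Conductor 2.

```kotlin
// Hypothetical types for illustration only.
sealed class Located
data class Found(val block: String) : Located()
object Missing : Located()      // no active block, e.g. a flaky sensor is off
object Ambiguous : Located()    // more than one active block: give up

fun locate(activeBlocks: List<String>): Located = when (activeBlocks.size) {
    0 -> Missing
    1 -> Found(activeBlocks[0])
    else -> Ambiguous
}

fun main() {
    println(locate(listOf("B321")))                 // Found(block=B321)
    println(locate(emptyList()) == Missing)         // true
    println(locate(listOf("B311", "B321")) == Ambiguous)  // true
}
```

A train stopped on a flaky block ends up in the `Missing` case, which is exactly where the current recovery has nothing to work with.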

So how can we recover a stopped train?

One suggestion is to force any missing train to move, hoping that would trigger enough current usage to trip the sensor. Since the block activation state is cached, we can’t move and detect the train in the same engine cycle; thus we’ll need a timer, something like this:

val recoveryTimer = timer(1.second)

fun RecoverTrainsStart() {
  // Move a train if we don’t find an active block on its route
  if (PA.blocks { active }.isEmpty) PA.reverse(1.speed)
  if (FR.blocks { active }.isEmpty) FR.reverse(1.speed)
  recoveryTimer.reset()
  recoveryTimer.start()
}

on { recoveryTimer } then { RecoverTrainsPart2() }

fun RecoverTrainsPart2() {
  // This is 1 second later. Blocks may have become active.
  PA.stop()
  FR.stop()
  // Detect active blocks for each train route again, and use that info.
}

Well, that may almost work. It will likely fail though: let’s assume the FR train needs to be moving for the sensor to detect it on the block. The code above moves the train so that the second part of the recovery can find where it is. Then it stops the train, and again the block sensor will fail to find it. Then we move on to the recovery route, which first checks that the initial block is properly active; it won’t be, so the recovery will fail again.

What we need is to not stop the train which we’re going to recover, something like this:

on { recoveryTimer } then { RecoverTrainsPart2() }

fun RecoverTrainsPart2() {
  // This is 1 second later. Blocks may have become active.
  // Detect active blocks for each train route again, and use that info.
  // (paOk / frOk are placeholders for the result of that detection.)
  if (paOk && !frOk) {
    PA.stop()
    recover_fr()
  } else if (frOk && !paOk) {
    FR.stop()
    recover_pa()
  } else {
    log("Nothing we can do, aborting.")
    PA.stop()
    FR.stop()
  }
}

Issue #3: “On” rules should not use a flaky sensor as an activation trigger.

I do note that if we have an on rule that checks whether a sensor becomes active, it will have problems dealing with these flaky sensors -- the rule will get executed every time the sensor flakes. We could compensate for this by rewriting these rules to use the block occupied state instead:

on { B321.active   } then { action }                // don’t write this
on { B321.occupied } then { action }                // write this instead

Note that IBlock did not expose a boolean “isOccupied” property when I first wrote this. It does now.

Update: Or here’s another approach: each sensor should keep a log of, say, the last 10 activations. Then we could have a command such as “sensor.activeWithin(2.second)” or something like that. That would be a “soft” way of performing debouncing where needed at the script level. This can work both in an “on rule” and in an “if” in a whileOccupied check. When used in an “on rule”, it would automatically debounce that rule for the time given.
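Here is a minimal sketch of what such a sensor history could look like. Everything in it is an assumption made for illustration: the `SensorHistory` class, the `update` method, and the use of plain millisecond timestamps instead of the script’s “2.second” syntax. Besides the log of the last 10 activations, it also remembers when the sensor was last seen active, so that a sensor which stays on for a long time still counts as recently active.

```kotlin
// Hypothetical class for illustration; not part of Conductor 2.
class SensorHistory(private val capacity: Int = 10) {
    // Timestamps (ms) of the last off-to-on transitions, oldest first.
    private val activations = ArrayDeque<Long>()
    private var active = false
    private var lastSeenActiveMs: Long? = null

    val recentActivations: List<Long> get() = activations.toList()

    // Called once per engine cycle with the raw sensor reading.
    fun update(isActive: Boolean, nowMs: Long) {
        if (isActive && !active) {
            if (activations.size == capacity) activations.removeFirst()
            activations.addLast(nowMs)
        }
        if (isActive) lastSeenActiveMs = nowMs
        active = isActive
    }

    // True if the sensor was seen active at most windowMs ago.
    fun activeWithin(windowMs: Long, nowMs: Long): Boolean {
        val last = lastSeenActiveMs ?: return false
        return nowMs - last <= windowMs
    }
}

fun main() {
    val b321 = SensorHistory()
    b321.update(isActive = true, nowMs = 0)
    b321.update(isActive = false, nowMs = 500)     // the sensor flakes off

    println(b321.activeWithin(windowMs = 2_000, nowMs = 1_500))  // true
    println(b321.activeWithin(windowMs = 2_000, nowMs = 3_000))  // false
}
```

In this sketch the activation log itself is only exposed for inspection; the debouncing decision relies on the last time the sensor was seen active.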

Actually that “soft” debouncing could also help solve issue #2 above: to start a route, we enforce that the start block appears active and is sensing a train. This totally fails on the flaky B321 block, and prevents successful recovery. However it would work if instead we could check whether the sensor had been active within a small amount of time.

