August 26, 1998

Risks Digest

I'm going to use this issue to republish some of the letters I've sent in to Risks Digest. You should look at the Risks Archive to see the ensuing discussions.

Note that some of the submissions (such as my tirade on the stupid leap-second) have already been subsumed by other essays. This is not a complete set of my entries and some have been slightly edited. I've also added comments in italics where appropriate.

Paying ATT Wireless

From Risks 19.93

This seems to have gotten wide circulation in both Risks Digest and David Farber's Interesting People List. My point is more about the powerlessness that people feel in dealing with these kinds of problems than to pick on ATT. But it is a lesson in the value of specific rather than theoretical essays.

Date: Fri, 21 Aug 1998 10:21 -0400
To: Risks Submissions <risks@csl.sri.com>
From: Sample@Frankston.com
Subject: AT&T and snails

Using Quicken I sent a payment to my ATT wireless account. A few weeks later they started dunning me though the payment was clearly listed and processed. But it hadn't cleared. After a while I looked at the payee record and noticed it was queued for electronic payment. But that confuses ATT wireless which claims to not accept electronic payments. So I try again but notice that my paper payment is coerced into an electronic payment automatically. I finally figure out that if, instead of paying "ATT Wireless Services", I add a {} comment to the end, it remains a paper payment. At least on my side.

I figured this out when one of the ATT billing people called me on the phone. She said she would note that the payment is on the way. Just got another call from someone at ATT wireless demanding payment. Of course, there was nothing in my record, and the caller once again told me that ATT doesn't accept electronic payments and that everyone places a check on the back of a snail (OK, snailmail might not be a fair term but it seems most appropriate or, at least, colorful here). Of course, this is nonsense considering the demographics of the PCS early adopters.

Maybe I shouldn't be surprised since this is the same company that has been sending me a monthly bill for a $.15 credit on an old home office line for over a year.

My real puzzle is why ATT doesn't seem to have a clue that it is their fault that the payment is coerced to an electronic payment and that someone should attempt to solve it. The larger issue is that whether a problem is caused by new technologies or more traditional problems, I'm struck by the lack of an attitude that problems are there to be solved instead of simply suffered. It is a reaction consistent with dealing with any bureaucracy but for those of (some of) us reading this list they are teething problems which need attention.

Y2K as a necessary event: Contingency plans needed

From Risks 19.85

In my ongoing role of being the naive* contrarian ....

I'm concerned about all the Y2K discussion that focuses on prevention with little, if any, discussion of contingency plans. This represents a basic misunderstanding of how to deal effectively with technology. I use the term "ballistic automation" for the clockwork-like model of automation in which one sets up all the rules and the system just runs without ongoing intervention and tweaking.

In any system, there will be surprises and failures. While prevention is great, it is never complete. Instead one must prepare for failures. We must assume that there will be pervasive Y2K failures. The question is how do we survive and recover from them. Such planning has a higher value than Y2K prevention in that it is the basis for resilience that can deal with failures in general and, as a side benefit, provides better security since security breaches are simply failures.

And Y2K is only one of many problems. There are many limited-size fields including other clocks (like the Unix one due to expire in 2038).
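A quick check of when such a clock runs out, as a sketch only. It assumes the Unix clock in question is a signed 32-bit count of seconds since January 1, 1970 UTC (the letter does not spell this out), and the names below are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the clock is a signed 32-bit count of seconds
# since 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_SIGNED_32 = 2**31 - 1  # largest value a signed 32-bit field can hold


def last_representable_moment() -> datetime:
    """Return the last instant such a limited-size field can express."""
    return EPOCH + timedelta(seconds=MAX_SIGNED_32)


if __name__ == "__main__":
    print(last_representable_moment().isoformat())
```

Under that assumption the field is exhausted in January 2038. Like the two-digit year, it is a fixed-size field whose limit seemed comfortably far off when it was chosen.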

Systems do not deal with events that are unanticipated and have difficulty with those anticipated but not experienced.

One simple example is the response to zip code changes. Read "Post office delivers new codes" for more on the zip code changes in the Boston area.

It took years for phone systems to learn to deal with area code changes and generalized area codes. But no one has heard of a zip code change. When I provide my new zip code on e-forms, it gets rejected by systems that do checking. Even mail from the Post Office itself uses the old zip code. Not only is the zip code changing but it will be recycled within a year or so! Hopefully, unlike the phone network, I'll still be able to get mail in the future since the Post Office does have some resilience in that it tries to handle failures with manual intervention (for now). But the more general principles of systems design need to percolate from what we've learned in designing systems into the more general awareness of design issues. The zip code system, for example, was designed without leaving extra zip codes for future growth!

While on the topic of the Post Office, there was another article, "Errant mail delivery brings bagful of woes", about the consequences of unreliable delivery. The concept of end to end vs link level reliability is something we've learned in the design of computer systems (see "End-To-End Arguments In System Design"). Again, this experience needs to feed back into low tech systems.

There are indeed risks of technology. But there are also risks of nontechnology. We must understand the risks but shouldn't be so naive as to assume that we can choose a risk-free path. And we must learn that we only anticipate some changes and need to "shake out" systems periodically. Just like we've learned the value of forest fires, Y2K might help in clearing out the underbrush.

* I'm not really that naive, but a nonnaive discussion that goes into all the issues would be too long and boring for this forum.

Once again, I'm risking my life flying

From Risks 19.73

This one got a strong reaction from many who defended the current system. I tried to explain that the real point is that the current system emphasizes procedural correctness over safety and that it is necessary to have an approach that allows one to enhance safety with the addition of GPS rather than to ban any change on the assumption that any change anywhere in the plane will make the existing systems less reliable. Alas, aviation is a field that selects against those who will innovate. Safety in the airline industry is also more about marketing safety. Thus even a modest number of crashes must be avoided by making the cost of each seat very high. It is like when ATT could charge any price they deemed necessary to assure you that your phone will last 100 years.

Caveat: I'm not an expert on avionics. My interest is in creating resilient distributed systems....

I just walked off a DC-10 that had mechanical problems and was delayed. The 757 I'm on is racing it to Interop at the moment.

The DC-10 was already an hour late getting from the hangar to the gate due to either traffic problems (within O'Hare) or a cargo door problem.

But the new problem is (was) a bad compass. The third compass on the plane had to be replaced due to FAA rules. After all, we can't take any risks, can we? I asked the crew whether they could travel without it and rely on a GPS. Of course, a DC-10 has no GPS! Not surprising given the age of the plane. But what is of concern is that they couldn't just go out to the store, buy a GPS, and place it in the cockpit. As a passenger, when I bring my GPS and PC, I've got technology far, far ahead of the technology on the plane. Technology to which two hundred people (or whatever a full DC-10 holds) trust their lives! On the other hand, if both of the other two compasses did fail, there are still lots of ground systems that can find the plane and bring it to a nearby beacon (it is cloudy, so they can't just get out their road maps).

I was already thinking about these issues after talking to the crew (while waiting for the plane to appear out of the mists at the gate) about the 727 which has even more primitive avionics. The reason that the systems can't be upgraded is that the whole plane would have to be recertified as a new aircraft.

There is something very wrong here. The engineering practices that are supposed to assure our safety seem to work to assure our lack of safety.

I can understand the historic necessity of treating the airplane as a single tightly interconnected system. There wasn't the luxury of giving the electronic systems enough capability to act autonomously. I presume, though, that the mechanical systems try to be independent-enough to reduce the propagation of failures.

But, if we think about the simple example of just placing a GPS in the cockpit and allowing the airplane's computer to use the data, we have a very different model. Of course, the navigation system shouldn't fully trust the GPS and must do some reasonableness checks as well as cross-check with other sources. If the GPS fails, then it would compensate.
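To make the idea of an arms-length, suspicious use of GPS data concrete, here is a minimal sketch. It is in no way how real avionics work: the `Fix` type, the `choose_position` function, and the 10 km disagreement threshold are all hypothetical, and the "independent estimate" stands in for whatever other sources (compass, ground beacons) the plane already has.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fix:
    """A position in degrees (hypothetical type for this sketch)."""
    lat: float
    lon: float


def distance_km(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def choose_position(gps: Optional[Fix], estimate: Fix,
                    max_disagreement_km: float = 10.0) -> Fix:
    """Use the GPS fix only if it exists and roughly agrees with an
    independent estimate; otherwise fall back to that estimate."""
    if gps is None:
        return estimate  # GPS failed: compensate with the other sources
    if distance_km(gps, estimate) > max_disagreement_km:
        return estimate  # GPS disagrees wildly: treat it as suspect
    return gps
```

The point is the shape of the design rather than the numbers: the added device can only improve the answer when it passes the checks, and its failure leaves the existing systems exactly where they were.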

Yes, there can be strange systemic interactions. But, instead, we have a situation that assures lousy navigation rather than permitting improvements when available.

Understanding how to build such resilient distributed systems is still in the challenge category. But the Web is a very good example. I see the technology growing more due to hacking than design. Effective hackers work against the constraints of others and are thus forced into being tolerant of others' mistakes. Most will get it wrong, but I'd rather a pilot just put a GPS in the cockpit, even if not interconnected, than have to get out the sextant for each flight.

Followup on Flying

From Risks 19.77

I wrote this in response to comments on the previous entry.

I shouldn't be surprised that the general response has been to tell me (personally) why things have to be the way they are. I've even been told both why there are three compasses and also that there are only two.

Of course I know there are very good reasons for the current approaches. But where is the outrage and dissatisfaction with such a cumbersome and limited approach to building and, more important, evolving systems? Implicit in many of the responses is a naive notion that system boundaries are well-defined.

It's as if I was back listening to ATT in the 70's explaining why civilization would end if I were allowed to plug my telephone into the phone network! (Yes, really!)

There are those of us who, in the 70's, took toys such as the Apple ][ and made them the tools of choice for trillion dollar calculations such as the national budget. From the thread about sextants, the Navy is discovering that the retail marketplace has become the driver. (Are there sextants in the cockpits?)

As to the complaints about the limitations of GPS (of which I and the pilots are well aware), why is there no incentive to address them? Perhaps adding level indicators and reasonableness checking? They already have batteries. One can evolve "toys" much more quickly than "commercial" equipment as long as the linkage with the other systems is arms-length and there is sufficient mutual suspicion.

It would be great to have the position data available on the in-plane IP network. Not only would one be able to add equipment (such as terrain maps) without recertifying the plane, it would allow passengers to use their PCs to enrich the view from the window.

I'm not sure how to respond to the safety issue. While I do wear my seat belts during the entire flight it's a non sequitur. Of course I understand the difference between safety and reliability but it is more than a simple matter of retreating into semantics and formalisms. Safety is not absolute "freedom from accidents or losses".

So I'll fan the flames by asking why flying is safer than driving? The reason is that the marketplace does demand it. Plane crashes are much worse PR per capita than car crashes. So we spare no expense to make planes not crash. Those who can't afford it risk their lives driving (see the 27 May 1998 NY Times business section). Have we simply shifted the risk?

Only respond if you are dissatisfied with business as usual. Post no rationalizations.

Bob Frankston http://www.mit.edu/~bobf

More on flying in the real world

I don't think I submitted this one but it is relevant. This experience of arbitrary rules was reinforced on Delta Airlines which bans the use of cell phones while inside the plane on the ground but allows me to use one near the cockpit at the open door as long as I'm theoretically outside the circumference of the plane.

Once more I'm focusing on my experiences in flying. Perhaps I'm doing it too much but it represents a strong contrast with marketplace thinking. In the consumer marketplace, one must accommodate human foibles but when flying, I have the responsibility to follow many rules. If they are violated by me or any of the other passengers the consequences are dire. If this were true, the risk would be intolerable.

To attend INet 98 (http://www.isoc.org/inet98/) I flew Lufthansa (LH) which has different rules from its partner United (UA). LH doesn't want anyone using CD players connected to their laptops or, perhaps, even within. And DL doesn't want you using a cellular phone on the ground in the back of the plane but you can use it inches from the cockpit as long as it is on the curve that would be occupied by the door if it were closed. Of course, the airlines casually ask people to turn off their phones and other transmitters but I suspect that many, if not a majority, of the people carrying two-way pagers, small phones and other devices aren't even aware that they are carrying transmitters. Those who have tried to look for eyeglasses only to realize that they are already in place can understand not even being aware of carrying such a device. A PDA is just a note pad; it's not a computer any more than one's hearing aid is a computer or one's watch is an electronic device.

Given the reality, what is being done to make planes safe in spite of humans acting like, well, humans?

Of course, I do have my own bias against sensory deprivation due to not being allowed to use my electronic pen and paper. And against the silly need to lug the weight (and the space) of wood pulp (thus increasing the weight of the plane) when all I want to do is read the contents of a book. And I also fear flying on an aircraft that is overly vulnerable to electronic noise.

BTW, similar reasoning applies to banning the use of cellular phones while driving. The ban should be extended to carrying passengers since I find the presence of interesting passengers is even more likely to put me into "automatic" driving mode due to the rich social matrix. A phone conversation tends to be less engaging and thus less distracting. I can even conjure up the necessary statistics to "prove" this. Or, at least, an anecdote or two or one.

PS: In addition to INet 98, I also attended the 2nd World Skeptics Congress in Heidelberg. The emphasis there is on critical thinking (http://www.gwup.org/konferenz_e.html). Very appropriate.

German Train Accident

From Risks 19.89

This is a short comment following on from an earlier discussion in issues 19.80, 19.81 and 19.83.

On the plane back from Europe (Lufthansa appropriately enough), I sat next to a German engineer (actually, chemical and running a company, but that's beside the point) and asked him about the train crash. He said that the wheel had been off the tracks for 5km and the magnitude of the problem was due to having a switch track just before a bridge. Similar accidents had occurred in France but in less damaging locations. The solution is to track the behavior of the wheel (or the proximity to the track) with sensors to discover the problem early. This sounds even better than periodically checking the wheel since it will catch the actual problem as soon as it occurs even if there were a separate cause for a wheel problem.

No Hyphens!

I may have submitted a comment on this topic to Risks in the past but I don't know if it got published.

I just encountered another web form – this one asking me to type in my registration number without hyphens or spaces. I encounter this often on entering credit card numbers. All I can think of is that a complete and utter idiot set up the site since:

  1. The punctuation is there to make it easier for humans to parse the number and thus reduce errors. This is in the interest of those asking for this information.
  2. It is trivial for anyone with a nonnegative IQ to parse the field and simply ignore the punctuation.
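Just how trivial can be shown in a few lines. This is only a sketch: the function name is made up, and it assumes that hyphens and whitespace are the only punctuation a human would add to the number.

```python
import re

# Assumption: hyphens and whitespace are the only separators people type.
_SEPARATORS = re.compile(r"[\s-]+")


def strip_separators(raw: str) -> str:
    """Accept the number the way a human types it and drop the punctuation."""
    return _SEPARATORS.sub("", raw)


if __name__ == "__main__":
    print(strip_separators("4111-1111 1111-1111"))
```

The form can then validate the cleaned value however it likes, and the user never has to see an error message about hyphens.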

I don't understand why there are so many forms of this sort. It represents the worst in systems design – forcing insane systems decisions on users instead of fixing them. Actually, it's not just the user who suffers but also the provider, who gets bad information.

Will the madness never end? I don't really expect it to since cranial computrons seem to be in short supply.

The advantage of publishing here instead of in Risks is that I don't have to pretend to be well-behaved and diplomatic.

Mea Culpa

I'm not a Latin scholar, but I did make a mistake on my site which caused me to send delivery failure messages to a number of authors in a recent Risks Digest issue. I'll replace this entry with the apology after it appears in Risks itself. But it's the kind of error that reminds one to be understanding of the technical foibles of others. The issue is less how one avoids all problems than how one deals with them after the fact.

"Peter G. Neumann" neumann@csl.sri.com:
[Failure] User risks-digest not listed in public Name & Address Book (Subject was: Galaxy IV revisited)]

That said, those who tell me not to type hyphens are still idiots.