Old Chips, New Glitches: the CGA/CRTC "Phantom" VSync

At last year's Evoke demoparty, we released Area 5150, another demo for the original IBM PC using CGA video.  We're still working on a final version in our collective spare time, which is why I haven't posted about it on this blog yet (although it has now been nominated for two Meteorik awards, which is absolutely an honor!).  This final version will hopefully include all the refinements and extra touches that didn't make the cut in time for the deadline, but it's also going to fix a couple of tiny bugs that managed to slip through.  One of those bugs turned out to be a rather interesting case.

There are a few points in the demo where a black horizontal bar flashes briefly across the entire width of the screen, usually near the very top of the image.  Here are some examples, taken from the 60Hz CRT footage of the demo (slowed down 5x for clarity):

It only lasts for a single frame, but (1) I found it distracting, and (2) there was no immediate reason for it to be there at all, so I had to look into it.  Annoyingly enough, it wasn't even deterministic: one run of the demo could have a certain frame haunted by this phantom flicker, but on the next run it would be gone... only to re-emerge somewhere else entirely, middle finger raised.

When it does appear, the bar is always 16 scanlines tall.  That was my immediate clue as to what it must be - an errant vertical blanking interval: on CGA, this always lasts for 16 scanlines.  Of course, it has no business showing up in mid-frame; vertical blanking is supposed to occur past the bottom of the CRT image, when the vertical sync pulse sends the raster beam flying back up towards the top of the screen.  So what on earth was it doing in our visible area?

Vertical Sync on CGA with the 6845 CRTC

Briefly, the 6845 Cathode Ray Tube Controller (b. 1977) is the ticking "heart" of the CGA board - the same role it has played in other video solutions (MDA, HGC, the PCjr, and such non-PC architectures as the Amstrad CPC, BBC Micro, some Commodore PETs, several MSX computers, and even arcade games).  Its main job is to generate video signal timings and memory addresses for the display hardware; it does this with the aid of a clock signal and a host of programmable counters.

Its simplicity and flexibility explain how it came to be used in such a wide range of hardware - and how it can be abused for clever video tweaks and tricks.  There were several clones of the same basic design on the market, mostly compatible with each other; as far as we know, IBM used two such chips in its CGA boards: Motorola's MC6845 and Hitachi's HD6845R.

As it happens, the black bar glitch occurs with both of these CRTC variants... but only on IBM's own CGA boards: the clone CGA cards we've tested don't seem to be susceptible [update - after more scrutiny, this turned out to be wrong].  But more on this later; first, let's see how the CGA handles the 6845's vertical sync signal.

[Figure: CGA frame structure (simplified) with 6845 CRTC timings.  Horizontal: R0 = H. Total, R1 = H. Displayed, R2 = H. Sync Pos, R3 = H. Sync Width (= CGA H. blank width; the CGA H. sync width is fixed).  Vertical: R4 = V. Total + R5 = V. ScanLine Adjust, R6 = V. Displayed, R7 = V. Sync Pos; V. Sync width = CGA V. blank width (the CGA V. sync width is fixed).  Legend: top-left corner on CRT screen, active video area (VRAM contents displayed), overscan area (border color displayed), H/V blanking (video off), horizontal sync pulse, vertical sync pulse.  Horizontal timing registers are set in character clock units; vertical timing registers are set in character row units (except for R5, set in scanlines).]

Vertical sync position is controlled by CRTC register R7: when the internal character row counter reaches the value programmed into this register, the CRTC's VS output signal goes high for the next 16 scanlines.1  This value determines the vertical positioning of the image on the monitor; for the picture to remain stable, a vsync must occur every 262 scanlines, which is the normal duration of a CGA frame.
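As a rough worked example of that frame arithmetic, here is a minimal sketch.  The register values are the ones commonly programmed for the stock 80-column text mode, and R9 (Maximum Scan Line Address, i.e. scanlines per character row minus one) is brought in to convert rows to scanlines; neither comes from the text above, so treat both as assumptions to check against your own mode tables.

```python
# Sketch: CGA frame arithmetic from the 6845 vertical registers.
# Assumed values: stock 80x25 text mode settings (not taken from the article).
R4_VERTICAL_TOTAL = 0x1F    # character rows per frame, minus one
R5_VERTICAL_ADJUST = 0x06   # extra scanlines appended to the frame
R7_VSYNC_POSITION = 0x1C    # character row at which VS goes high
R9_MAX_SCANLINE = 0x07      # scanlines per character row, minus one

VBLANK_SCANLINES = 16       # VS stays high for 16 scanlines on CGA


def scanlines_per_frame(r4, r5, r9):
    """Total scanlines in one CRTC frame."""
    return (r4 + 1) * (r9 + 1) + r5


def vblank_scanlines(r7, r9):
    """First and last scanline (0-based) blanked while VS is high."""
    first = r7 * (r9 + 1)
    return first, first + VBLANK_SCANLINES - 1


total = scanlines_per_frame(R4_VERTICAL_TOTAL, R5_VERTICAL_ADJUST, R9_MAX_SCANLINE)
first, last = vblank_scanlines(R7_VSYNC_POSITION, R9_MAX_SCANLINE)
print(total, first, last)
```

With these assumed values the frame comes out at 262 scanlines, and the blanking interval covers scanlines 224 to 239, well below the active picture.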

When a CRT's electron beam is done scanning the bottom-most line, two things have to happen: the vsync pulse triggers a vertical retrace, signalling the beam to return to the top of the screen for the next frame.  We don't want to be drawing visible retrace lines while this is going on, so the beam has to be turned off until it has reached the top again: this is what the vertical blanking interval (vblank) is for.

The vsync and vblank durations are not the same - the vsync pulse is shorter and fits wholly inside the vblank interval.  However, the 6845 CRTC provides only a single VS signal to time them both.  The CGA deals with that by blanking the video output for as long as VS is high: the CRTC's vsync becomes the CGA's vblank, and a shorter (3-scanline) vertical sync pulse is generated inside that period.

Of course, in normal usage there's absolutely no reason to reprogram the CRTC's vsync position value.  R7 is written once when the video mode is set, and a vsync dutifully proceeds to occur once per frame at the expected time.  But some video modes used in Area 5150 are not set-and-forget: for instance, several logical "frames" may be packed into a single 60Hz cycle.  This requires the CRTC to reach its Vertical Total count at least once without a vblank/vsync getting in the way, so R7 may have to be rewritten several times per frame - sometimes during the active display period.

Drawing a Blank

That evasive black bar was evidently a stray vblank interval.  The associated vsync pulse wasn't causing the monitor to lose sync, because the regular vsync pulses were still arriving, and the CRT's vertical hold oscillator was locking on to them.  Clearly I wasn't seeing a temporary displacement of the regular pulses (which does happen elsewhere in the party release, unfortunately) but an uninvited additional pulse.

Sure enough, my code was messing with R7 when the glitch was showing up, so I went looking for bugs; but even after poring over the datasheets, everything seemed to be correct: the CRTC's row counter couldn't be equal to R7 at the moment of the glitch.  In one instance, I did "fix" it by shuffling the CRTC writes and modifying some values, but that only made me jump to the wrong conclusions about what was going on - this didn't work elsewhere.  Eventually this problem got bumped way down the priority list in favor of more important work, and we never got to put more time into it before the party release.

Recently I've been tackling some of those remaining issues for the final version, and the time came to clonk this one on the head.  I started by writing a test program to reproduce what the glitchy demo sections were doing, but under more controlled conditions, so I could change when and how R7 gets reprogrammed, and see what happens over multiple frames.

What I got was a bit of a surprise: the uncalled-for vblank was appearing at the instant when R7 was modified - even when the 6845's current row count was different from both the old and new R7 values.  In other words,

The rewrite operation itself can fool the 6845 CRTC into making a false-positive comparison, and cause it to trigger a vsync signal immediately, even when the defined conditions are not met.

In retrospect I had already seen the clues to this, because those phantom black bars were all starting in mid-scanline - unlike your respectable and law-abiding vblank, which always begins in the left border area (on the scanline's very last character clock period).  Of course, not every R7 write operation was causing a glitch.  Through testing different parameters, it emerged that there's a combination of several factors at play:

  • The value of the CRTC's character row counter at the time of the write operation.
  • The old value R7 was being changed from.
  • The new value R7 was being changed to.
  • The precise timing of the write.

Why the latter? - the phantom black bar was flickering rather than stable, even though the program was setting the same values on the same row, frame after frame.  Different sets of numbers were yielding visibly different flicker patterns.

The exact horizontal position where the blanking begins (within the scanline) was shifting back and forth, too.  The code was running with interrupts disabled, but this form of jitter usually appears courtesy of the PC's DRAM refresh mechanism.2  When I modified the DRAM refresh rate to synchronize it with the duration of a scanline, it became perfectly steady - there was either a solid black bar, or none at all.
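To illustrate why the refresh timing drifts against the raster by default, here is a small sketch.  All of its constants are assumptions that are not stated above: a scanline of 912 hi-res dots, a timer that counts once per 12 dots, and a default refresh divisor of 18.  It only shows the arithmetic, not the actual timer programming.

```python
# Sketch: why the default DRAM refresh period drifts against the scanline.
# Assumptions (not from the article): a CGA scanline is 912 hi-res dots,
# the PIT counts once per 12 dots, and the PC's default refresh divisor
# on PIT channel 1 is 18.
DOTS_PER_SCANLINE = 912
DOTS_PER_PIT_TICK = 12
DEFAULT_REFRESH_DIVISOR = 18

PIT_TICKS_PER_SCANLINE = DOTS_PER_SCANLINE // DOTS_PER_PIT_TICK


def refresh_is_scanline_locked(divisor):
    """True if a whole number of refresh periods fits in one scanline."""
    return PIT_TICKS_PER_SCANLINE % divisor == 0


locked_divisors = [d for d in range(1, PIT_TICKS_PER_SCANLINE + 1)
                   if refresh_is_scanline_locked(d)]
print(PIT_TICKS_PER_SCANLINE)
print(refresh_is_scanline_locked(DEFAULT_REFRESH_DIVISOR))
print(locked_divisors)
```

Under these assumptions the default divisor does not divide the scanline evenly, so each refresh lands at a different offset from line to line; a divisor that does divide it keeps every write at a fixed position within the scanline.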

One set of parameters which causes a glitch when R7 is modified - with and without DRAM refresh jitter

A couple of important points to note here: unlike some other known CGA issues ("snow" on CPU access, or the NTSC color burst falling outside the horizontal blanking period), this one is not limited to 80-column text mode (+HRES).  It shows up in 40-column mode too - and presumably in graphics modes, since the CRTC doesn't know the difference.  The other thing is that the new vsync position is recorded properly, as far as I could tell.  The next actual vsync occurs at the expected row, so this short hiccup doesn't prevent the register from retaining the correct value.

All in all, this was shaping up into a curious little puzzle - not just thickening the plot around nailing a specific bug, but raising all sorts of questions about the 6845's inner workings too.

All the Wrong Noises

I went looking through the literature again, and this time I noticed a mention in Hitachi's HD6845R/S datasheet.  One section has a list of "anomalous operations caused by rewriting registers during the display operation".  About the side-effects of modifying R7, it provides this beautifully-phrased warning:

There are some cases where VSYNC is placed on the position different from the programmed value or the noise is output [...]
When a rewrite operation is performed, there are some cases where a flicker and so on occur temporally.3

Both of my IBM CGA cards happen to use Motorola's MC6845, but this model seems to be the same overall design as the earlier HD6845R (also numbered HD46505R); none of Motorola's datasheets bother going into details of faulty behavior.

According to Hitachi then, rewriting the vsync position during the active display period is a Very Naughty Thing, and I must have unwittingly unleashed The Noise.  Problem is, for certain video tricks there's just no other way to do it - not without introducing a whole new set of headaches, anyway.  I could really do without vaguely-described anomalies sneaking up on me, so a more useful explanation would be nice.

There's something I wasn't fully aware of during our original work on Area 5150: this 45-year-old chip has been the subject of much new research in recent years.  I thought that might help clear this up, now that I've seen some of that research:

  • Enterprising users of Acorn's BBC Micro have been looking pretty deeply into the 6845's quirks, with the ultimate goal of implementing a clone in FPGA
  • French CPC demo group Logon System (shoutout!) has released the most excellent Amstrad CPC CRTC Compendium - an extremely detailed document with loads of information about the different CRTC variants used by Amstrad, including the MC6845
  • This year, UtterChaos from our own group has been doing some very thorough snooping into the HD6845's guts, with a specific focus on register rewriting behavior; some of his results can be seen on his YouTube channel, 'PCRetroTech', which you should definitely check out: Reverse Engineering the Video Chip of the IBM CGA Card (1981)

However, none of the above gave me an answer to this particular problem.  As far as I could tell, the existence of this glitch hasn't been acknowledged in the Beeb and CPC worlds (most likely because it simply cannot occur in these architectures; read on for why).4  As for UtterChaos's investigation, he hadn't run into this specific issue, although he did mention some oddities when rewriting R6 - the Vertical Displayed register (eventually, we concluded that the cause is probably the same).  The only option was to probe some more.

Anomaly Analysis Automation

Obviously there was a pattern here, so there had to be a way to determine which parameter combinations were prone to glitching, and which ones were safe.  The CGA helpfully supplies a status register at port 3DAh, which can tell us when a vertical retrace is in progress - although again, this really refers to vertical blanking.  If we can get a reading whenever our slippery black bar shows up, we can track it through parameter space by writing an automated test.
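The detection step could look roughly like the sketch below.  It assumes that bit 3 (mask 08h) of the status byte is the vertical retrace flag, which is not spelled out above, and it uses a made-up list of sample bytes in place of real port reads, since Python can't issue IN instructions on the PC.

```python
# Sketch: decoding the CGA status byte read from port 3DAh.
# Assumption: bit 3 (mask 08h) is the "vertical retrace" flag; the bit
# layout is not given in the article.
VERTICAL_RETRACE_MASK = 0x08


def in_vblank(status_byte):
    """True if the retrace flag is set in a status byte."""
    return bool(status_byte & VERTICAL_RETRACE_MASK)


def count_vblank_reads(status_samples):
    """Count how many sampled status bytes had the retrace flag set."""
    return sum(1 for s in status_samples if in_vblank(s))


# Hypothetical samples standing in for real reads of port 3DAh.
samples = [0x00, 0x01, 0x09, 0x08, 0x00]
print(count_vblank_reads(samples))
```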

It was clear that a combination of three factors determines whether an R7 rewrite could possibly glitch at all: the CRTC's current character row count ("Row"), the old value held in R7 before the rewrite ("From"), and the new value being written ("To").  The precise timing factor was more difficult to characterize; so to get useful results, each combination had to be tested for a number of consecutive frames - with maximal DRAM refresh jitter - so that if a glitch is possible at all, it'd show up at least once.

The "Row", "From" and "To" numbers (which I'll designate r, f, and t respectively) are 7-bit values, so there are 128³ = 2,097,152 combinations to test.  To avoid taking an eternity or two, there would be two passes:

  • First pass: for each of the 128×128 (r,f) pairs, test every t value for just one frame; this should winnow out those combinations where no t ever results in a glitch.
  • Second pass: for the remaining combinations (those that did glitch at least once), test every t value more thoroughly by checking for a vblank across a number of frames in succession.  But how many? - I ended up choosing 40, based purely on eyeballing the flicker patterns that seemed to be possible.

If you do the math, you'll see that the first pass took just under 10 hours: not too bad!  I made it log the results in CSV format, with rows for r and columns for f; each field held the vsync read count, i.e. how many t values glitched for that (r,f) pair.

MC6845, R7 glitch test: first pass results

The result was rather intriguing: the 2D pattern quite clearly resembled the Sierpiński Triangle fractal set.  This was unexpected, and pretty cool in itself, but encouraging too: this pattern does show up in bitwise binary logic systems,5 so it could be pointing the way towards figuring out the formula.

A few data points seemed to break this tidy pattern, but they turned out to be false positives (artifacts of how the test screen was constructed).  After fixing my code and eliminating them, I arrived at the chart shown here; all values are hexadecimal.

A count of 1 indicates a non-glitch, since each pair has one case where r=t: when R7 is set to the same value as the current row, a vsync will always occur - that's by design, not a glitch.  In the same way, when r=f, a vsync has already been in progress since the beginning of that row by definition, so all reads are positive regardless of t: that accounts for the continuous line of "80h" results for those pairs.

With a bit of pattern-matching, two bitwise conditions emerged which fit the observed data.  To make things a bit clearer, they're color-coded on the chart.  For a given (r,f) pair, it's possible to get a glitch when both of the following are true:

  • The 3 low bits of the "Row" and "From" values are identical: r&7 == f&7
  • None of the 4 high bits is both set (1) in the "Row" value and clear (0) in the "From" value: (r&~f)&0x78 == 0

Only where these two conditions intersect on the chart do we see actual glitches being recorded.  This intersection may not be sufficient - some (r,f) pairs that fulfill these terms still didn't glitch; but this first pass was testing each t value for only one frame, so it might have missed a few.  It still established a necessary set of conditions to start with.
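The two first-pass conditions can be written as a small predicate.  The sketch below (function names are mine, purely for illustration) also prints the high-bits part of the test on its own, for rows and "From" values that are multiples of 8; the "row bits must be a subset of From bits" rule is what produces the Sierpiński-style triangle.

```python
# Sketch: the two first-pass conditions as a predicate (names are illustrative).
def first_pass_candidate(r, f):
    """True if the (Row, From) pair meets both necessary conditions."""
    low_bits_match = (r & 7) == (f & 7)
    no_high_bit_only_in_row = (r & ~f & 0x78) == 0
    return low_bits_match and no_high_bit_only_in_row


def high_bits_pattern():
    """16x16 text chart over the 4 high bits only (low 3 bits held at 0)."""
    lines = []
    for rh in range(16):
        lines.append("".join(
            "#" if first_pass_candidate(rh << 3, fh << 3) else "."
            for fh in range(16)))
    return lines


for line in high_bits_pattern():
    print(line)
```

This only marks where a glitch is possible according to the two conditions; as noted above, not every marked pair actually glitched in the test.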

What could this be telling us about the 6845's actual operation?  It seems to strongly indicate that in the "glitchy" cases, the internal comparator (which tests whether R7 is equal to the current char row counter) is getting a partially-written R7 value as its input.  If the 4 high bits have already been updated when the comparison is made, but the 3 low bits haven't, that could begin to explain it.  On the other hand, there was still another pass to perform.

Making it Count

The second pass took the remaining (r,f) pairs, tested every possible t for a period of 40 frames, and counted the total number of vblanks (glitches) detected.  That would take an obscene amount of time, but the whole point of the first pass was to narrow it down for us - and it has.  Starting with the cases that meet the observed conditions, and throwing away those 128 redundant pairs where r=f, we're left with 540.  At 128 t values per case, for 40 frames each, that's just under 13 hours.  Efficiency ftw.
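For reference, the running times of both passes can be checked with a few lines of arithmetic.  This sketch assumes one test per frame at a nominal 60 frames per second, and takes the figure of 540 remaining pairs as given above.

```python
# Sketch: estimated running time of the two test passes.
# Assumption: one test per frame at a nominal 60 Hz frame rate.
FRAME_RATE_HZ = 60


def hours(frames):
    """Convert a frame count to hours at the assumed frame rate."""
    return frames / FRAME_RATE_HZ / 3600


first_pass_frames = 128 ** 3          # every (r, f, t), one frame each
second_pass_frames = 540 * 128 * 40   # 540 pairs, 128 t values, 40 frames

first_pass_hours = hours(first_pass_frames)
second_pass_hours = hours(second_pass_frames)
print(round(first_pass_hours, 2), round(second_pass_hours, 2))
```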

These counts were logged into a CSV file as well, with one row per (r,f) pair tested.  For reference, a plain text list of the results is available too (vsync counts have been converted to decimal).

We're now looking at combinations of three parameters, so a true visual representation of the results would be 3D; but they can also be rearranged into three 2D charts - one parameter per axis, with each cell summing up the glitch counts across the third parameter.  Doing that brings out those nice Sierpiński triangles again:

VSync position glitch occurrence: r (character row) vs. f ('from' value), MC6845
VSync position glitch occurrence: r (character row) vs. t ('to' value), MC6845
VSync position glitch occurrence: f ('from' value) vs. t ('to' value), MC6845

The first chart (plotting r × f) resembles the pattern from the first pass, since those are the combinations it checked.  The r × t chart gives us an even more detailed Sierpiński sieve, and f × t seems to have multiple ones (additive and subtractive) woven together at different scale factors.
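Collapsing the 3D results into one of these 2D charts is just a sum over the parameter that isn't plotted.  A minimal sketch, using a made-up handful of counts rather than the real test data:

```python
# Sketch: projecting (r, f, t) glitch counts onto two axes.
from collections import defaultdict

AXES = {"r": 0, "f": 1, "t": 2}


def project(counts, keep):
    """Sum glitch counts over the parameter not named in `keep`.

    `counts` maps (r, f, t) tuples to glitch counts; `keep` is a
    two-letter string such as "rf", "rt" or "ft".
    """
    i, j = AXES[keep[0]], AXES[keep[1]]
    chart = defaultdict(int)
    for key, n in counts.items():
        chart[(key[i], key[j])] += n
    return dict(chart)


# Made-up counts for illustration only (not real results).
counts = {(0, 8, 0x62): 3, (0, 8, 0x60): 5, (1, 9, 0x61): 2}
rf_chart = project(counts, "rf")
ft_chart = project(counts, "ft")
print(rf_chart)
print(ft_chart)
```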

The data can also be visualized as a 3D scatter plot.  One online tool for that is at MiaBella AI; you can paste this text file into the 'Custom Data Set' input box at the bottom, and click Update Chart.  You'll get a 3D view where the color (and size) of each point represents the count of glitches recorded.

VSync position rewrite glitch test results: 3D scatter plot

While that does look cool (and has a kind of demo-effect aesthetic in itself), what we're really after is sussing out the bitwise formulas behind it; and when you have three independent variables conspiring together, staring at patterns doesn't help as much.  There may be a formalized, rigorous mathematical/algorithmic method to figure it out, but that would be a wee bit above my pay grade.  I had to put together an interactive view of the results, place two variables (the "From" and "To" R7 values) on the x and y axes, and make the third (the character row) selectable to control which subset of results is in view.

That way I could define rules for the bit combinations, visualize them on the chart, and try to get a match for the actual data.  I put it all online at this page: VSync Position (R7) Rewrite Glitch Test Results.  This should let you see how my best-guess conditions overlap, and how their intersection correlates with the observed results.

Conclusions

The Rules

This final step revealed two more formulas governing the bit-patterns involved.  All told, to permit a glitch at the instant when R7 is rewritten, four conditions must be met:

Condition                   Interpretation
(r & 7) == (f & 7)          the 3 low bits are the same between the CRTC row counter and the R7 "From" value
(r & ~f & 0x78) == 0        none of the 4 high bits is 1 in the row counter and 0 in the "From" value
(f & (t ^ r) & 0x78) == 0   none of the 4 high bits is 1 in "From" and has opposing states in "To" vs. the row counter
(r & ~t & 0xf) == 0         none of the 4 low bits is 1 in the row counter and 0 in the "To" value

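Put together, the four rules make a simple predicate.  This is a sketch of the best-fit rules only (function names are mine): passing it means a glitch is permitted, not guaranteed, and the exceptions discussed in this post are not modelled.

```python
# Sketch: the four best-fit conditions as one predicate (illustrative names).
def glitch_permitted(r, f, t):
    """True if rewriting R7 from f to t on character row r may glitch."""
    return (
        (r & 7) == (f & 7)
        and (r & ~f & 0x78) == 0
        and (f & (t ^ r) & 0x78) == 0
        and (r & ~t & 0x0F) == 0
    )


def flagged_targets(r, f):
    """All 7-bit "To" values that the rules flag for a given (r, f)."""
    return [t for t in range(128) if glitch_permitted(r, f, t)]


print(glitch_permitted(0x00, 0x08, 0x62))
print(len(flagged_targets(0x00, 0x08)))
```

For row 0 with a "From" value of 8h, the rules flag every "To" value that has bit 3 clear, which includes t=0 (a vsync by design, since it equals the current row).
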
The Exceptions

Of course, that's still just a best-fit approximation of what's really happening.  There are some counter-examples that refuse to line up nicely - cases that don't meet the above conditions, but still show the phantom glitch when rewriting R7.  They're colored red in the interactive chart, and can also be seen in the above 2D plot of r × t (as extra 'fluff' along the diagonal lines).

These exceptions are a little too sporadic and irregular to indicate a single coherent pattern.  Perhaps one could be established by refining the test conditions, e.g. if the R7 writes could be timed down to pixel-clock precision.  (As UtterChaos has demonstrated in his 6845 research mentioned earlier, this is possible on the PC, but requires disabling DRAM refresh entirely - and carefully rewriting/relocating every chunk of code, so that the 8088's instruction fetches keep all DRAM rows timely refreshed.)

On the flipside, we have many cases where all four conditions are met, but no vsyncs turned up.  This could be down to a couple of things; testing each t for even more frames in a row would probably flush out more phantoms.  If you squint at the data a bit, you can also spot an overall trend where more "1" bits (across all three parameters) mean fewer glitches; so that's probably another little gremlin lurking in the silicon.

But there's one more factor which seems to suppress glitches where we expect them to show up: temperature.

A Hot Mess

...Temperature? Yes, at least that's what I observed with the non-automated version of my test program: for certain sets of values - ones that display a very sporadic flicker to begin with - the glitches become less and less frequent as time goes on, until they eventually disappear altogether!

As an example, take the combination r=0, f=8h, t=62h.  If I power up the machine and immediately load up the test screen, intermittent flicker appears as soon as I dial those values in, but then it gradually goes away.

Temperature dependence of the R7 rewrite glitch: for certain parameter combinations, the flicker eventually disappears as the system warms up

The only factor that changes here is the system's uptime, and the only explanation that seems to make sense is temperature; we've already conjectured that the glitch occurs when the 6845 compares the character row counter value to the contents of R7, but the value just written to R7 hasn't had the time to fully 'settle'.  As the chip gets warmer, the response times of its internal flip-flops change, and if the write timings are already marginal that may just make the difference.

Both of the automated tests took hours to run, so they must have missed these temperature-dependent cases - by the time they were checked, most (if not all) of them probably weren't generating any positive vsync reads at all.  Within the limits of reason, there's not much we can do about the system's temperature throughout the test, so this will have to be good enough.

Caught Somewhere in Time

At this point, we might as well look into the timing part of the whole equation.  Why does the glitch seem to depend on the precise scheduling of the write, and how exactly?

We've had two clues: first, the 'phantom' vblank only triggers when the character row number has a specific correlation with the old and new R7 contents, which hints that the new value is only partially resolved.  Second, if we repeat the same exact write at 1-frame intervals, we get irregular flicker with DRAM refresh jitter, but consistent behavior when the jitter is removed.  Taken together, this suggests that the glitch occurs when the data write is misaligned with respect to the CRTC's internal timing - i.e. when it takes place at a particular part of the character clock cycle.  Let's try to see why that should be the case.

[Figure: 6845 bus write timing, adapted from the MC6845 and HD6845 datasheets (timings vary between CRTC models and datasheet revisions).  Signals shown: RS, CS; R/W; Enable; Data Bus D0 ~ D7 (MPU write data), with 0.8V and 2.0V logic thresholds.]

#  Item                  Constraint
1  Enable Cycle Time     min 1.0 µs
2  Pulse Width, E Low    min 430 ns
3  Pulse Width, E High   min 450 ns
4  E Rise/Fall Time      max 25 ns
5  Address Setup Time    min 160 ns
6  Address Hold Time     min 10 ns
7  Data Setup Time       min 195 ns
8  Data Hold Time        min 10 ns

All timing in the CRTC is derived from CLK (the input character clock signal), except for external data transfers.  These are clocked by E (Enable), an input signal normally derived from the processor clock, which is strobed to initialize a read/write.  On the CGA, it goes high when an I/O is signalled and +IOCLK (the CPU clock on the ISA bus) is high, then low again when the transfer finishes.  As far as the CRTC is concerned, it can be "low for extended periods provided the CLK input is active".

As per the diagram above, the window for a data write is defined by the E pulse widths; the data inputs may not change within the region around the active edge of E, i.e. the data setup/hold time.  Otherwise they would be resolved unreliably: the result could be either the previous input, the new input, or some metastable (unpredictable) state.6  That does sound awfully close to the symptoms we've been seeing with the individual bits of R7.

But if that were the case, then a second transfer would likely be needed to correct the input; in actuality the written R7 value does take effect, despite the momentary glitch.  On top of that, Reenigne from our demo team was able to probe the bus signals directly, and his measurements seem to indicate that the timing constraints are not violated (not even with an "OUT DX,AX" word output, which minimizes the delay between the two bytes sent out on the 8-bit bus).  So what else could be going on?

[Figure: CRTC data I/O signals between the ISA bus expansion slot and the CGA board.  6845 (U38) pins: D0 ~ D7, CS, RS, E, R/W, CLK.  Signals: +D0 ~ +D7, -6845CS, +A0, +A3, +E, -IOW, -IOR, -CCLK, +IOCLK, +HRES, +HCLK, +LCLK.  Blocks: address decoder, mode select, CPU R/W latches, clock generators, clock select circuitry.  Clocks: 4.7727 MHz, 1.7898 MHz, 0.8949 MHz.]

Above is an abridged diagram of the signals involved in CRTC data I/O, based on IBM's CGA schematics.  The CLK character rate depends on the video mode: graphics modes and 40-column text use 0.895 MHz, and +HRES doubles the frequency to 1.79 MHz for 80-column text mode.  On an IBM PC or XT, the ISA system clock +IOCLK is the 4.77 MHz CPU clock - on faster systems, 8.33 MHz is usually considered the safe maximum.

The key here is the ratio between these frequencies: at 4.77 MHz, one CPU cycle takes either 3/8 or 3/16 of a character period.  When you shuffle the numbers, a write clocked by E (ultimately by the system clock) could have 8 or 16 discrete offsets with respect to the character period - that is, it could come in during any of the 8 or 16 hi-res dots that subdivide each character.  One CGA hi-res dot is 69.84 ns, which can be compared to the CRTC's timing constraints.  Under normal circumstances there's no predicting which dot it'll fall on, since the CGA circuitry makes no attempt to synchronize the two signals for I/O.
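The "shuffling" can be spelled out in hi-res dots.  The sketch below assumes a master dot clock of 14.31818 MHz (three times the 4.77 MHz CPU clock; the figure itself isn't stated above), so that one CPU cycle is 3 dots and a character is 8 or 16 dots wide.

```python
# Sketch: possible offsets of a CPU-clocked write within a character period.
# Assumption: a 14.31818 MHz master dot clock, so 1 CPU cycle = 3 hi-res dots.
MASTER_CLOCK_MHZ = 14.31818
CPU_CYCLE_DOTS = 3


def reachable_offsets(dots_per_char):
    """Dot offsets (within one character) that a CPU cycle edge can land on."""
    return sorted({(n * CPU_CYCLE_DOTS) % dots_per_char
                   for n in range(dots_per_char)})


dot_ns = 1000 / MASTER_CLOCK_MHZ   # duration of one hi-res dot in ns

print(reachable_offsets(8))        # 80-column text (+HRES)
print(reachable_offsets(16))       # 40-column text and graphics modes
print(round(dot_ns, 2))
```

Because 3 shares no common factor with 8 or 16, stepping by whole CPU cycles eventually visits every dot position in the character.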

The thing is, nothing in the CRTC design appears to synchronize incoming data with the character clock either.  Bus I/O is scheduled with E, but all internal CRTC timings are based on CLK, and this has to include the triggering of the comparators which check the counters against their associated register values.  So the comparison operation may kick in at an arbitrary time with respect to the data write: too soon after it has finished, or perhaps while it's still in progress, and the CRTC has no provision against this.

Now suppose that the 7 bits in our R7 value aren't sent all at once from the input buffers to the register, say if the most significant bits go first.  This clock misalignment would then explain what's going on.  At some particular phase difference between E and CLK, the comparator may well get the R7 value while the 4 high bits are still being resolved, and the 3 low bits still have their old states - that's the most likely explanation for the odd bitwise interplay between the old/new R7 and the current row.

The Bottom Line

To sum it up: the "phantom vsync" glitch is an unintended black bar interrupting normal video display, caused by modifying the 6845 CRTC's VSync Position register (R7) while the CGA is sending out the visible portion of a frame.  The write itself can confuse the CRTC into triggering an immediate vsync, which in turn makes the adapter blank out the video for 16 lines.  When we cannot avoid rewriting R7 at inopportune moments, the easiest way to prevent glitches is to stick to 'known good' parameters in the video code, which is feasible in most practical situations.

My test results (provided above) establish which values can be safely written to R7, depending on its existing value and on the character row counter.  So far, they appear to apply to all IBM CGA cards tested, but at some point I'll clean up my test code and make it available so any CGA board could be put on the rack.  For now, let's try to answer a few more questions:

Which CRTC chips and CGA cards are susceptible?

Between myself, UtterChaos and reenigne we've checked IBM CGA boards with both MC6845 and HD6845(R) CRTCs, and they all seem to be affected in the same way (see update below).  The difference between the earlier and later CGA revisions doesn't seem to matter, although that's not terribly surprising - I'm pretty sure that the CRTC/IO interface wasn't changed.  The full automated test was only run on one of my IBM cards (with an MC6845), but varying samples from those results were double-checked on the others, and the same number combinations had the same effects.

The two 6845 CRTCs used in IBM CGA boards: Motorola MC6845P and Hitachi HD6845P (in both cases 'P' is for the Plastic packaging, not a separate variant)

UtterChaos also had a quick look at a few non-IBM CGA cards with discrete 6845 chips - Motorola, Hitachi and others - and a few didn't seem to exhibit the glitch at all.  We can't be sure why; perhaps these designs don't bother to act on a CRTC vsync during the active part of a scanline (a normally scheduled vsync never starts within that period), or don't start blanking until VS is active on the next hsync, or something along those lines.  I would assume that vertical blanking does work in general, otherwise retrace lines would be visible.  Maybe we should poke at a few of these cards some more.


UPDATE (2023-04-09): we've done some further poking at non-IBM cards with different CRTC chips, and it turns out that the above isn't quite correct.  There will be a follow-up post about this, but for now I should make the following amendments:

  • The Motorola and Hitachi chips are not affected to the same degree - cases that glitch on the MC6845 will glitch on the HD6845(R) too, but the latter glitches on additional cases as well, so in fact it's more susceptible.
  • Some other 6845 variants match the MC6845's behavior, while others do their own thing (at least one doesn't seem to glitch at all).
  • The glitch pattern appears to depend only on the CRTC variant, as cards from different manufacturers with the same CRTC chips show the same results; unlike my previous statement, the issue isn't IBM-specific at all.

Some third party CGA clones don't have an actual 6845 chip, and instead use a more highly integrated ASIC which also handles CRTC functionality.  My guess is that these aren't affected either, although that could depend on the chipset, and at any rate it's harder to tell what's going on inside them.

Is this a hardware bug in the 6845 CRTC, or in the IBM CGA?

A little bit of both.  Kind of a letdown when you can't name and shame, isn't it?  Actually, I'd say that the finger points more in the general direction of the CRTC, since the potential for a misalignment between the I/O Enable signal and the character clock (and thus, for a timing conflict between loading input into the register and acting on it) seems to be a part of the design.  Even when the published timing constraints are met (and CGA seems to meet them) that can evidently happen.

None of the manufacturers seemed to address that in the documentation, or to hint that some kind of host-side synchronization may be necessary.  Except for that less-than-helpful reference by Hitachi, who only have it in their later datasheets (~1987), and go on to wash their hands of it with "the operations in this table are outside our guarantee".3  By the looks of it the issue was there ten years earlier, but only acknowledged (or discovered?) too late in the game, way after the IBM CGA.

Either way, it's worth keeping in mind that IBM and the CRTC designers probably didn't anticipate any particular need to reprogram these parameters during the active display period.  (I mean, who'd want to do such a silly thing?  If you're getting bored with all those free CPU cycles per scanline, why don't you weirdos just ditch the video RAM and go full Atari 2600?  We make real machines now in 1977, get with the times ffs.)

Why hasn't this glitch been described on other platforms using the 6845?

Probably because it simply can't ever happen on most of them.  The most prominent ones would be the Amstrad CPC, Acorn's BBC Micro, and the Commodore PET series.  The CPC has its Z80 running at 4 MHz, and its CRTC (and co.) at 1 MHz.  The hardware adds wait states for 3 out of every 4 CPU cycles; so however long a Z80 I/O instruction happens to take, CRTC writes can only take place at 4-cycle intervals: precisely 1µs apart.  That's the same frequency as the character clock, so everything's always lined up.  None of those nasty fractional phase differences you get on the PC/CGA, so the problem theorized above just isn't there.
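To make the phase argument concrete, here's a rough sketch that counts how many distinct phases a CRTC write can land on, relative to the character clock.  The CPC figures are the ones given above.  The PC figures are my assumptions and aren't stated in this post: a 14.31818 MHz master clock, the CPU at 1/3 of that, and the 80-column character clock at 1/8 of it.  Treating every CPU cycle on the PC as a possible write slot is also a simplification, so take this as an illustration rather than a bus model.

```python
from fractions import Fraction

def write_phases(cpu_hz, slot_cycles, char_hz, count=16):
    """Return the phase of each write slot within a character period.

    A slot is one opportunity for a CRTC write; slots are spaced
    `slot_cycles` CPU cycles apart.  Each phase is a Fraction in [0, 1),
    where 0 means the slot lines up exactly with a character boundary.
    """
    # How many character periods elapse between two consecutive slots
    chars_per_slot = Fraction(char_hz) * slot_cycles / Fraction(cpu_hz)
    return [(n * chars_per_slot) % 1 for n in range(count)]

# Amstrad CPC (figures from the text): 4 MHz Z80, writes every 4 cycles,
# 1 MHz character clock
cpc = write_phases(4_000_000, 4, 1_000_000)

# IBM PC + CGA, 80-column mode (ASSUMED figures, see above)
MASTER_HZ = Fraction(14_318_180)
pc80 = write_phases(MASTER_HZ / 3, 1, MASTER_HZ / 8)

print("CPC phases:", sorted(set(cpc)))
print("PC  phases:", sorted(set(pc80)))
```

Under these assumptions the CPC list collapses to a single phase, while the PC cycles through every multiple of 1/8 of a character period, which is the "fractional phase difference" in question.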

It seems like the BBC Micro also synchronizes things in a similar way (6502 CPU at 2 MHz, CRTC clock at 1 MHz, with every other CPU cycle available for I/O).  I'm less informed about the different PET models - as far as I can tell the CPU and the CRTC both run at 1 MHz, so they're in step by definition.  Then again Commodore used their own 6845-based design (6545, also available from Rockwell), and although it's compatible with the early Hitachi/Motorola parts there are a few differences.

The IBM PC and its peripherals were put together in more of a modular fashion, and I suppose this is just a weird (and sneaky) side effect of that.  Does your system have a nice and clean monolithic design?  No phantom glitches for you!


Well, this has been too many words and too little fixing actual video code, so it's time to go apply these lessons to something productive.  Thanks to reenigne and UtterChaos for the brainstorming, and for the help in double-checking (and interpreting!) my test results.

Notes

  1. More precisely, VS appears to stay high until the horizontal character counter has reached its programmed HTotal value (which determines the length of a scanline) 16 times.  With the earlier MC6845 and HD6845R, this is a fixed count; the later MC6845-1 and HD6845S make it programmable, although 16 is still the default value. [↑]
  2. A full CGA scanline takes 76 PIT cycles.  To keep the PC's DRAM chips refreshed, the DMA controller has to "steal" bus cycles from the CPU every so often, and by default this happens every 18 PIT cycles.  76 doesn't divide by 18, so time-sensitive video operations can suffer from visible jitter; but it does divide by 19, and 19 PIT cycles is still a safe interval.  
    For more background, see Dynamic RAM Refresh: The Invisible Hand in Graphics Programming Black Book (Michael Abrash 1997), chapter 4.  Incidentally, you may notice that my test program toggles the rate between 13h (=19) and 4Bh (=75) cycles: 75 was chosen to maximize jitter, even though it's under-refreshing DRAM by a factor of 4(!).  It didn't seem to impact the stability of my XT, but that may just happen to be the case with this particular code - it's certainly not safe to do that as a general rule. [↑]
  3. From the HD6845R/HD6845S CRT Controller datasheet [↑]
  4. Interestingly the Amstrad CPC CRTC Compendium does mention a different vsync-related quirk in Motorola CRTCs, which my MC6845-equipped CGA does seem to reproduce, called the "ghost vsync" by the authors.  This black bar glitch seemed to be just as paranormal, therefore "phantom" vsync ("goblin" had less of a ring to it). [↑]
  5. Examples: The Sierpinski Tautology Map, Finding Sierpiński in the Oddest Places [↑]
  6. See Setup Time and Hold Time Basics at VLSI Universe [↑]
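
The divisibility argument in note 2 is easy to check numerically.  The sketch below lists where DRAM refresh lands within each of the first few scanlines for a given refresh period, measured in PIT cycles.  It assumes (purely for illustration) that the first refresh coincides with the start of scanline 0; the function name and layout are mine, not taken from the test program.

```python
SCANLINE_PIT_CYCLES = 76  # one full CGA scanline, per note 2

def refresh_offsets(period, scanlines=4):
    """Per-scanline lists of refresh positions (PIT cycles from line start).

    Assumes a refresh at t=0, aligned with the start of scanline 0, and
    one refresh every `period` PIT cycles after that.
    """
    total = SCANLINE_PIT_CYCLES * scanlines
    per_line = [[] for _ in range(scanlines)]
    for t in range(0, total, period):
        per_line[t // SCANLINE_PIT_CYCLES].append(t % SCANLINE_PIT_CYCLES)
    return per_line

for period in (18, 19, 75):
    print(f"period {period}:")
    for line, offsets in enumerate(refresh_offsets(period)):
        print(f"  scanline {line}: {offsets}")
```

With a period of 19 every scanline gets the same offsets, so whatever delay refresh causes is identical on each line.  With 18 the offsets shift from one line to the next, and with 75 the refresh position slips by one PIT cycle per scanline.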

12 comments:

Longshot says:

Hello,

Congratulations for this very (very) interesting article and for this huge investigative work.
The architecture built around the CRTCs always has its say, and clock asynchrony on PC seems to be a problem.
No doubt this offers very interesting possibilities. :-)

On the AMSTRAD CPC, I guess the AMSTRAD 40010 GA is the equivalent of the CGA graphics card. It is commonly called GATE ARRAY (3 models: 40007,40008,40010 at 16 MHz) and it clocks the CRTC and the Z80A.

As you pointed out, the 3/4 pattern that halts the Z80 on its wait states was designed to avoid ram access issues between the CPU and the GA.
One of the consequences of applying this pattern is the framing and linearization of instructions between the Z80A and the CRTC, clocked by the GA.
This is probably why the problem mentioned in your article does not occur on CPC (I have never noticed it).

However, the Z80A has three different I/O instructions (OUT(C),reg8 , OUTI, and OUT(nn),A)
Between the first two there is a difference of 0.25 µsec on the cycles at the time of writing io, and with this difference there are some noticeable differences in internal CRTC processing.
This is particularly the case when the value of R2 or R3 is reprogrammed with a lower value (than its initial value) when C0 is equal to the position of the i/o.
This "late" update causes a late equality and a shift of the start of the HSYNC signal (for R2) or of the end of the HSYNC signal (for R3)
I was able to observe other type of anomaly on the modification of other registers (R0 for example, in particular on critical positions where the equality Cn/Rn is true).

You mention having used OUT DX,AX to minimize the delay between the two data (i guess AH /AL). I believe there is a Direct OUT on 8088 (but that's old to me).
Have you tried modifying R7 with this type of instruction too?

I haven't followed everything on the logic of the tests, which are quite complex.
If the new value of the modified register is important, perhaps it is possible to position R7 several times in a row.
For example, set R7 to 7F or 0 before giving it its final value?

With a type of CRTC allowing the number of VSYNC lines to be programmed through R3, this could have reduced the size of the VSYNC from 16 to 1 line.

In my document, I talk about two types of VSYNC, the GHOST VSYNC (CRTC 2) and the BLOCKED VSYNC (CRTC 0 HD46505S)
Thank you for confirming that the phenomenon occurs on type 2 (Motorola).
The GHOST VSYNC translates the fact that internally the CRTC processes a VSYNC, but without the VSYNC signal being active on the pin of the CRTC.
In this sense, the update of R7 during the VSYNC is not considered for the current VSYNC.

VileR says:

Hi Longshot, appreciate your comment! Also congrats on the newest Amstrad CRTC Compendium (and in English too), very well done. That's exactly where I learned all those things about the CPC timings - awesome that it's still being updated.

True, the 8088 has other forms of 'OUT', but the only direct ones are "OUT imm8,AL" and "OUT imm8,AX", i.e. the destination port number must be 0-255. Higher numbered ports can only be specified with DX, so that leaves "OUT DX,AL" or "OUT DX,AX". The CGA puts the CRTC index register on port 3D4 and the data register on 3D5... so we can either use "OUT DX,AL" twice (incrementing DX in between), or "OUT DX,AX" which effectively sends AL to port [DX], and AH to port [DX+1].
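
As a small illustration of the two options, here's a Python model (not 8088 code) of the port writes they produce.  The port numbers come from the paragraph above; the helper names, the list used as a write log, and the example value are made up for this sketch.  It only models which byte goes to which port and in what order, not the timing.

```python
CRTC_INDEX_PORT = 0x3D4  # CGA: CRTC index register
CRTC_DATA_PORT = 0x3D5   # CGA: CRTC data register
R7 = 7                   # the register discussed in this thread

def out_dx_al(log, dx, al):
    """Model 'OUT DX,AL': one byte written to port DX."""
    log.append((dx, al & 0xFF))

def out_dx_ax(log, dx, ax):
    """Model 'OUT DX,AX': AL goes to port DX, then AH to port DX+1."""
    log.append((dx, ax & 0xFF))
    log.append(((dx + 1) & 0xFFFF, (ax >> 8) & 0xFF))

def pack_crtc_write(register, value):
    """Build the AX word: register index in AL, new value in AH."""
    return ((value & 0xFF) << 8) | (register & 0xFF)

# Option 1: two byte-sized writes, incrementing DX in between
byte_writes = []
out_dx_al(byte_writes, CRTC_INDEX_PORT, R7)
out_dx_al(byte_writes, CRTC_DATA_PORT, 0x1C)   # 0x1C is an arbitrary example

# Option 2: a single word-sized write
word_writes = []
out_dx_ax(word_writes, CRTC_INDEX_PORT, pack_crtc_write(R7, 0x1C))

print(byte_writes == word_writes)
```

Both options end up as the same pair of byte writes; the difference on real hardware is in how far apart the two writes land.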

Intel's official timings are 8 cycles for "OUT DX,AL" and 12 for ",AX". But they're really 'best case' scenarios: exact cycle counting is a headache with the 8088, since execution and bus access are pipelined. The state of the instruction prefetch queue affects execution timing; plus each I/O operation takes up 4 clock cycles, and steps through 4 discrete states of the bus interface (this page has a nice summary).

So we can't really time things precisely on the cycle level, unless we happen to know the exact state of the Bus Interface Unit and the prefetch buffer. That's known to be possible (e.g. UtterChaos's test code), just not exactly practical for 'general' situations unfortunately.

You make a good suggestion about modifying R7 several times as needed. I assumed I'll have to do something like that to side-step 'problematic' parameters. It's a bit like planning a bus trip: you know you'll need to change lines (=rewrite R7), but specific changeovers (=old/new values) can only be made at particular stations (=C4), so you have to consult the schedule (=test results). :-)

VileR says:

BTW, regarding the different CRTC types: I had the wrong idea about the Motorola/Hitachi variants, but after my research for this post I think I've figured it out. The MC6845 and the HD6845R (AKA plain HD6845 or HD46505R) are both "Type 2", while the later MC6845-1 and HD6845S (HD46505S) are "Type 0". So IBM only used "Type 2" CRTCs for CGA. On the PC, it seems like "Type 0"s were used mostly in Hercules-type cards, and some clone cards have "Type 1" UMC chips.

About the "ghost vsync", when I tried moving my R7 writes to the right-side border area, I noticed that the glitches were much less frequent. That seems to confirm what you observed in your document: some of the rewrites were likely landing on the hsync interval. Even if they glitched, the resulting vsync was 'ghosted' by the CRTC and not sent out on the VS pin, so the black bar wasn't visible.

This could be another way to get around the glitch, but the hsync interval is often needed for other time-critical I/O. The timer may be otherwise occupied, too (CGA has no way to report the status of the hsync pulse, so normally you need the timer to target it precisely).

FavoritoHJS says:

After toying around with the dynamic html chart, I noticed that the only rule that doesn't always fit is "r & ~t & 0xf == 0", with every other rule seemingly fitting all cases.

Moreover, it seems like there's a pattern to what combinations break -- they usually appear as a coarser Sierpinski, and rarely coarser than 2x2, though sometimes it can be as coarse as 8x8.

There also appears to be a pattern to what combinations are more likely to break -- fewer bits set in r leads to more breakage... I wonder if the "r & ~t & 0xf == 0" rule just happens to be a major contributor, but not the sole reason for that.

Ian says:

With a borderline retrocomputing-old browser, I am seeing a (page) rendering glitch when trying to read your blog article about a (CGA) rendering glitch:

Your redesigned menu does not work right and pushes the entire page content into a narrow column on the right.

Inserting a «div class="clearfix"»«/div» into the header just before its close tag seems to sort of fix things, as does unsetting the overflow:hidden setting for .content. [NB: I used guillemets here instead of angle brackets to avoid unwanted form content sanitization shenanigans.]
If I drag the post-nav row clearfix class nav element from below the comments div to just before the close header tag, that also seems to fix things.

I had a look at Wayback Machine, but there was only one capture for this latest post, which exhibited the same problem. That's why I took a look at captures for your previous The IBM 5153's True CGA Palette article, and I found that the web/20230220031710 capture was still good (did not have the problem, i.e. was from before you tinkered with your page layout). The subsequent web/20230306031851 capture was bad (i.e. it did exhibit the same problem as the live page).

You might of course question the merit of running "close to half the current version" Firefox, but I question the merit of abandoning backwards compatibility for no good reason. (And I didn't even say anything about your use of WEBPs with no alternatives, which also cuts out older browsers.)

Anyway, maybe this gives you enough to go on, so you can avoid getting WOMM* certified.

;-D

*WOMM=Works On My Machine, a prestigious certification/award, which all Serious Web Developers seek to earn. See Google.

MiaM says:

First time reader here. Interesting post!

A few hardware related comments:

Microbee (Australian computer, also somewhat common in for example Sweden) also uses the 6545, and afaik it specifically needs a 6545 and it also uses it for keyboard scanning, using the screen position to drive the keyboard matrix and iirc using the light pen input to detect pressed keys :O.


The reason for the glitches might be due to how the 6845 handles register writes. I have no information on how it's actually done, that would probably need either some rather specific test hardware setup or looking at the actual chip using a microscope. However, in general, you can build a writable register either using a D flip-flop or a latch. The former will only accept input data on the actual active edge of its clock input, while the latter will accept input data as long as its enable input is active. The latter is also afaik cheaper in terms of chip size / transistor/gate count, so would be the preferred option. If this is how the 6845 is designed, then any value present on the data pins while E is high will temporarily appear to the comparator, so any random data on the bus, appearing when E goes high but the data bus hasn't stabilized on the intended write data yet, would be what is used by the comparator. The reason why certain combinations of the old and new register value and the actually displayed vertical row trigger the glitch while others don't could partially be due to whatever is on the data bus before it stabilizes. Does for example the observed condition correlate in any way to the data bus state of the last byte of the OUT instruction? (Probably not, but still).

Compare with that on some (most?) Commodore 64s you get a grey glitch pixel if you write to a color register that is actively displayed while writing to that register. The reason is likely that grey = $F = all bits high (color registers are 4 bits wide, the high 4 bits of a byte is discarded) and the bus probably defaults to being high (TTL logic and also NMOS logic has a weak pull-up on all inputs which causes bus signals to go high when nothing drives them actively), so when the write starts $F is read from the bus and the grey glitch is displayed, and during the write cycle the correct data is eventually driven on the bus by the processor and then the correct color is displayed).

Btw the PET and the later (commercially failed) CBM-II range all use synchronized bus and CRTC clocks.

It would be interesting to know if this CGA 6845 glitch appears when using an original CGA card either in a newer computer that still has the ISA bus as a more or less integral part of the computer (say for example a 286) or in a way newer computer where the ISA bus is a separate thing that causes wait states, like say a Pentium system. The reason I would think this would be interesting is that the delay between E going high and the data bus becoming valid would be shorter, possibly the data bus might even be valid by the time E goes high on a Pentium system. Not that this would help someone who codes demos for the original IBM PC / XT, but would still be interesting.

PS your group colleague UtterChaos / PCRetroTech seems to be a pure software guy. In his latest video he holds an old Paradise Plantronics/CGA/Hercules compatible card in his hands, show two pin headers, says that there might be composite video on some pin, but "we have no way to know that" even though it would be simple to just plug in the card in a computer and hook up an oscilloscope. The "we have no way of knowing that" felt kind of like if you don't know what's in your fridge and you are too lazy to have a look :D :D Btw, the pin headers are probably the same as on a real CGA card, four pins with composite and power for an RF modulator, and the larger pin header for a light pen.

Ian says:

Not to start an argument, but MiaM, I think you're being a bit harsh on UtterChaos, who's a really good guy, and it's not nice speaking ill of people not in the room. We're all working with a mix of deep knowledge and ignorance, and untempered expectations can be just as bad as hubris. Also, I know from some of his recent vids that UtterChaos recently moved house, so even if he has an oscilloscope at home, —and not everyone does— he might not have that to hand. I mean, you may well be right about the pins I suspect, but why don't you talk to UtterChaos in the corresponding YT comment section, or better yet, make your own video about it? What's that? You're not set up for that and don't have all the gear? How is that better than not having an oscilloscope to hand? :P
I'll tell you a secret. I don't have an oscilloscope to hand here either. Shocking, I know.

VileR says:

@Ian: thanks for the report.  Yeah, it's a fine balance between optimizing for page size and speed (my motive for avoiding javascript, using .webp where it helps, etc.) and supporting older browsers.  I'd still like to do that, within reason, but it's not exactly the easiest thing to routinely test for.  FWIW, my main browser these days is Pale Moon which is a fork of pre-Quantum Firefox, and the new layout works fine on it.

That said, yours isn't the only report I've received about this... so I'll try to see what I can do.

VileR says:

@FavoritoHJS: well spotted!  I noticed more or less the same thing, and if you look at the "r×t" chart, the exceptions to that rule almost form a coherent pattern along the long diagonal line at the top-left quadrant (0×0-3F×3F; they appear as squarish 'jaggies' extending down/left).  But elsewhere they're much less consistent, if they show up at all.

For the purpose of avoiding glitches, I'd just go by the actual results and stick to the 'safe' areas of the diagram.  But I should mention that we've now fully tested the HD6845, in addition to the MC6845, and it turns out that the behavior isn't identical after all -- the Hitachi chip has more glitchy combinations, and curiously the MC6845's little exceptions are a small subset of the extra combinations that make the HD6845 glitch.  (I'm still looking into those results, but I'll update this post and make a follow-up.)

VileR says:

@MiaM:  regarding UtterChaos's reasons for testing or not testing particular aspects of the Paradise card, I don't really see the use of making assumptions here when you could have simply asked him in the PCRetroTech video comments.  What I do know is that his free time isn't as abundant as he'd like it to be, and that he'd rather delve deeper if he could.  I can say however that he's definitely more knowledgeable about hardware than I am (if anyone in our group could be said to have no clue in the hardware department, it'd likely be me).

Either way, those are some helpful insights on the mechanics/timings of writing data to a latch - appreciate the details.  You brought up the idea of looking at the chip under a microscope, so I should mention that high-quality die shots of the HD6845 are available at http://www.seanriddle.com/6845/.  Not that I can make heads or tails of it, but someone with the skills might figure out how the registers/bus interface are implemented.  I'll definitely put the test code up at some point, so that various 6845/CGA combinations could be tested on different bus speeds/setups.

About the Microbee: using the CRTC's address output and light pen input as a keyboard controller is the kind of gigantic hack that takes both genius and balls... mad respect to the designers for pulling it off!

don bright says:

absolutely fascinating!

sierpinski - ah yes, very common in small demos as u can get the pattern w simple binary operators on x/y coordinates. https://www.dwitter.net/h/sierpinski

oh gorsh, lol, i finally realized that the diagram of the CGA with the "top left" in the bottom right is a torus depiction of the screen, by starting with visible ram as blue, its leaving out the background scan area for the left and top of the screen, instead ... it's showing the background of the 'next frame" to the bottom and right??

VileR says:

@don bright: yeah, it's sort of like that - the diagram lays out the frame structure the way it's 'seen' by the 6845, not by the monitor.  A CRTC frame begins when all internal counters are reset, at the top left corner of the active raster area (visible RAM).  So the top border region is what's left of the previous frame, after vertical retrace has been executed.  The very end of the frame comes at the overscan area just to the left of the first active scanline.

Likewise a new scanline starts when the horizontal counters are reset, on the left edge of the active area.  After horizontal retrace, the last section of each CRTC scanline (shown all the way to the right in the diagram) makes up the left-side border.  On the diagram you can picture the 6845's horizontal counts going up from left to right, and the vertical ones from top to bottom - the retrace regions (black) are where the beam gets repositioned, but the 6845 is still counting off characters/rows respectively.
