Part #1 of the 8088 MPH writeup series • continued in 8088 MPH Final: Old vs. New CGA (and Other Gory Details)
By now you may have heard of the 8088 MPH demo, the winning entry in Revision 2015's Oldskool Demo compo this month. It's been my pleasure to combine efforts with the likes of Trixter, reenigne and Scali to make it happen - not only did I get the opportunity to work alongside a bunch of extremely talented wizards of code, we also achieved what we set out to do: break some world records on the venerable (and yet much-maligned!) IBM PC, the mommy and daddy of the x86 platform as we still know it today.
One of our "hey, this hardware shouldn't be doing that!"-moments was extending the CGA's color palette by a cool order of magnitude or two. How'd we pull that off? - reenigne has already posted an excellent technical article answering that very question. To complement his writeup, I'll take a bit of a different approach – here's my 'pictorial' take on how we arrived at this:
The idea that such multi-color trickery was possible came to me some time ago, as I was looking at reenigne's code for patching up composite CGA emulation in DOSBox; messing with that patch during development gave me a much better picture of composite CGA's inner workings. When I had ironed out the basic concept for this hack, I divulged it to reenigne for 'peer review' and for testing on real hardware. Soon enough, we had an improved recipe:
- Take two familiar (though officially undocumented) tweaks. Blend to an even mixture producing a new effect.
- Add one crucial new trick – an ingredient of reenigne's devising.
- Test and calibrate until blue in the face.
Below is my rundown of how it all fits together. Fair warning: the 'target audience' for this writeup is people who may not be overly familiar with CGA, and/or come from other demo platforms. As such, there's a whole bunch of background that's already well-known in CGA-land. To prevent acute boredom, I decided to stick this TOC here – feel free to skip to the interesting part(s):
Because, much like a broken clock, even Wikipedia gets it right sometimes
A short crash course on CGA basics: the first graphics standard available on PCs supports a 16KB memory buffer, and is driven by an MC6845 CRTC (some later cards used alternatives). Video output options are composite NTSC through a standard RCA jack, and the more widely-used DE9 connector, which outputs an RGBI signal (red, green, blue and intensity). The latter is what most people think of when they hear "CGA"; this is a digital (TTL) signal, where each component can be either on or off, hence 16 different colors. Despite what arcade hardware buffs would like you to think, CGA – in the strict sense – is NOT analog RGB, and never was.
Standard (BIOS-supported) graphics modes are high-resolution (640x200) in 2 colors, and medium-resolution (320x200) in 4 colors. Not a lot of wiggle room here: in hi-res mode, only one of the colors (foreground) is redefinable – the background is always black; in medium-res, it's the background color that's adjustable, while the other 3 are determined by the infamously nasty fixed palettes.
Infuriatingly, in an almost-trollish move, IBM mentioned an additional low-resolution 16-color mode - "not supported in ROM" - with zero information on how to actually achieve it. That nut was cracked pretty early on, though.
This is no graphics mode at all, but a modified 80-column text mode. Basically, you adjust CRTC registers to get 100 rows of text instead of the usual 25; this gives you a character box of 8x2 pixels, a quarter of the normal 8x8. Filling the screen with one "magic" ASCII character, 0xDE, effectively splits each character cell into left and right "pixels", corresponding to the background and foreground colors. These two colors can be individually set to any of the 16 CGA values, as in any CGA text mode, as long as you remember to turn off blinking.
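To make the left/right split concrete, here's a minimal Python sketch of how one 160-pixel row could be packed into 80 character/attribute pairs. The function name is made up for illustration, and the sketch assumes the usual text-mode layout (character byte followed by attribute byte, background in the high nibble once blinking is off); it only builds the bytes and doesn't touch any hardware.

```python
HALF_BLOCK = 0xDE  # the "magic" character: left half background, right half foreground

def pack_row(pixels):
    """Pack 160 color indices (0-15) into 80 character/attribute byte pairs."""
    if len(pixels) != 160:
        raise ValueError("expected 160 pixels per row")
    if any(not 0 <= p <= 15 for p in pixels):
        raise ValueError("color indices must be in the range 0-15")
    out = bytearray()
    for left, right in zip(pixels[0::2], pixels[1::2]):
        # High nibble = background (left pixel), low nibble = foreground (right pixel).
        out += bytes((HALF_BLOCK, (left << 4) | right))
    return bytes(out)

row = pack_row([1, 14] * 80)   # alternating blue and yellow pixels
print(len(row), hex(row[0]), hex(row[1]))
```

At 160 bytes per row, 100 rows come to 16,000 bytes, which fits inside the card's 16KB buffer.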
So there you have it; 160x100 @ 16c. This mode was used in games as early as 1983, but never got wildly popular - probably because of the "snow" that plagues IBM CGA cards in 80-column mode, unless you burn some costly CPU time to avoid it.
You may ask: since this is text mode, what's stopping you from using the entire ASCII character set? Other than a healthy respect for your own sanity, nothing really! This was first attempted around the mid-'80s by a few brave souls at Macrocom, who combined the 100-rows trick with ASCII art, to create what Trixter once succinctly called "ANSI from hell".
As you can see above, I've experimented with this a little. With judicious use of the character set, you can almost fool somebody into thinking that this is a 640x200 mode - although there's some inevitable "attribute clash", a little like the ZX Spectrum: each 8x2 character cell can contain only two colors, foreground and background. Also, you have to be a bit of a glutton for punishment to actually draw in this mode from scratch... but that's a subject for a future post.
This trick isn't directly relevant to our demo: we were targeting composite displays. Even if CGA's composite output didn't have its share of bugs and quirks in 80-column mode – which it does – there'd be no way to see this level of detail over NTSC. There's a reason I mention this effect, however; the idea behind it does figure into the story. But more on that later.
Digital RGB monitors were still a luxury item at the time of CGA's introduction, and IBM itself didn't offer one until a couple of years later, coinciding with the release of the PC/XT. But CGA also provided composite output, giving out (mostly) NTSC-compatible video. At the expense of resolution, there's more fun to be had here with color.
On the composite output, the familiar 16-color CGA palette is represented by a series of color signals, whose hue is determined by their phase relative to a reference signal (the NTSC color burst). The frequency of the NTSC color clock (3.579545 MHz) works out to exactly 160 color cycles per active CGA scanline.
These are directly generated by CGA hardware as color signals, so we'll conveniently call them "direct colors". IBM had two main revisions of the CGA, which produce composite video somewhat differently: 'new-style' cards contain additional circuitry, which helps the palette match its RGBI counterpart a little more closely. For the demo, we standardized on 'old-style' cards, simply because we happened to have done more testing on those (with somewhat better results), so all images in this post will reflect 'old-style' CGA colors.
If these 16 direct colors were all we had, it wouldn't be a whole lot of fun, would it? They're also shockingly ugly, especially on an old-style CGA, which doesn’t help matters either. Just look at that palette... gross, dude. Luckily, there's a way to go one better.
Due to bandwidth restrictions, NTSC video doesn't fully separate chrominance (color) from luminance. Effectively, any high-resolution detail – that is, detail with higher frequency than the NTSC color clock – gets 'smeared' when the signal is decoded. This is responsible for the characteristic color bleed, seen in the form of fetching little fringes at the edges of text characters and other fine detail.
Remember how you get 160 color cycles per active CGA scanline? Standard CGA gives us either 320 or 640 active pixels per scanline, depending on the video mode. Ergo, we can switch pixels on and off at 2x or 4x the frequency of the color carrier. Since this high-frequency detail cannot be fully separated from color information, the upshot is this:
This NTSC color cycle is sometimes represented as a wheel: one complete period of this cycle equals a 360° revolution around the color wheel, and we have 160 complete revolutions per scanline.
Let's say we're in hi-res (640x200) mode, where 4 pixels fit into one such color cycle: moving one pixel left or right translates to moving 90° along the wheel, in either direction, and accordingly shifts the hue by 90°. Likewise, in 320x200 mode, we move in 180° increments of hue-shift.
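For the arithmetic-minded, the phase steps can be worked out in a few lines of Python. This is only a sketch: the function name is made up, and the one input is the 160-cycles-per-scanline figure from above.

```python
COLOR_CYCLES_PER_LINE = 160  # color cycles per active scanline, as stated in the text

def degrees_per_pixel(active_pixels):
    """Hue shift caused by moving a pattern one pixel left or right."""
    pixels_per_cycle = active_pixels / COLOR_CYCLES_PER_LINE
    return 360 / pixels_per_cycle

for width in (640, 320):
    per_cycle = width // COLOR_CYCLES_PER_LINE
    print(f"{width}x200: {per_cycle} pixels per color cycle, "
          f"{degrees_per_pixel(width):.0f} degrees per pixel")
```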
In short, manipulating detail at high resolutions is effectively a method of generating color; being an artifact of NTSC's imperfections, this is known as artifact color.
Various filters can be (and often are) employed on the receiving end to recover some of the high-frequency detail, reducing color bleed and making edge transitions somewhat sharper. We're still dealing with technology, not magic, so full separation of detail and color can never quite be achieved, and the trade-off is a whole new set of artifacts (in the form of "echoing" or "ringing"). This trade-off may or may not be acceptable, depending on what you're doing, but the above image doesn't attempt to reproduce any such filtering.
All this business of "fringing" and "bleeding" sure sounds like a bummer, and that's exactly what it is: the unwanted side-effect of a less-than-ideal encoding scheme. But like any good flaw, it can be turned into an advantage by an enterprising soul, and this is where we get to the fun part (your mileage may vary).
When you look at the interplay of color vs. detail over NTSC, a very handy fact becomes apparent:
Our 16 direct colors are exactly this type of periodic composite signal. But hold on – with some simple high-resolution pixel-pushing, we can manually put together our own periodic waveforms! Any pattern of dots will do, as long as it repeats at the right frequency. This lets us achieve solid colors that lie outside the direct color palette.
The "classic" way of doing this on CGA is to set up BIOS mode 6 – 640x200 in 2 colors, white on black – and set the color-burst bit (which is off by default, for a B&W picture). At this resolution we can squeeze 4 pixels into a color clock period, and at 1 bit per pixel, there are 16 possible patterns – giving us 16 solid artifact colors.
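Here's a rough way to see why each 4-pixel pattern lands on its own color. The Python sketch below uses a deliberately idealized model, which is my assumption and not a calibrated description of any real CGA card or TV set: brightness is taken as the average of the four pixels, and color as the component of the waveform at the color carrier frequency (the first harmonic of a 4-sample cycle). The hue angle it prints is relative to an arbitrary reference, so only the differences between patterns mean anything.

```python
import cmath
import math

def artifact_color(pattern):
    """pattern: 4-bit int, most significant bit = leftmost pixel.

    Returns (luma, saturation, hue_in_degrees); hue is None for grey patterns.
    """
    samples = [(pattern >> (3 - i)) & 1 for i in range(4)]
    luma = sum(samples) / 4
    # First harmonic of the 4-sample waveform: the part a decoder reads as color.
    harmonic = sum(s * cmath.exp(-2j * math.pi * i / 4)
                   for i, s in enumerate(samples)) / 4
    saturation = abs(harmonic)
    hue = math.degrees(cmath.phase(harmonic)) % 360 if saturation > 1e-9 else None
    return luma, saturation, hue

for pattern in range(16):
    luma, saturation, hue = artifact_color(pattern)
    hue_text = "grey" if hue is None else f"{hue:5.1f} deg"
    print(f"{pattern:04b}  luma={luma:.2f}  sat={saturation:.2f}  hue={hue_text}")
```

In this model, shifting a pattern by one pixel rotates its hue by 90° and leaves brightness alone, while patterns like 0101 carry no energy at the carrier frequency and come out grey.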
This is pretty much the same technique used by Steve Wozniak to generate color on the Apple ][. In fact, on an old-style CGA card, these 16 colors are identical to the 16 low-res Apple colors (although you couldn't get them on a poster, like Apple owners could). More to the point: the pixels themselves are white, which carries no color information; it's the detail that does the deed.
*But wait, there's more!* Despite popular wisdom, CGA lets us one-up the Apple, and then some. OUR underlying pixels don't have to be white: in 640x200 mode, we can play with the palette register and set any of the 16 direct colors as the foreground (background is always black). By using the same pixel patterns with a different foreground color, we get 16 entirely new sets of artifact colors, with 16 colors each. We can only use one such set at a time, but we get to pick and choose what our 16 colors are.
Then there's 320x200 mode, which supports a palette of 4 direct colors. Only one of those, color #0 (background), is freely selectable. For the rest, intensity may be on or off, but we can only use green/red/yellow or cyan/magenta/white; the undocumented cyan/red/white palette involves disabling the color burst, making the composite picture greyscale.
Since our pixels are twice as fat in this mode, only two of them can squeeze into a color-clock cycle – but at 2 bits per pixel, the total count of artifact colors is still 16. The possible combinations of palette, plus the user-defined background color, provide us with a whole slew of other 16-color sets.
This may be a good place to correct a bit of a misconception. Since we have 160 color cycles per scanline, many people treat CGA's graphics modes over composite as 160x200 "modes", but that's not quite accurate. Our effective color resolution is indeed 160x200, and it's impossible to get finer detail than that using solid artifact colors. But as we've seen, on NTSC the pixel grid and color grid are NOT one and the same – which makes the question of horizontal resolution a bit fuzzy, depending on how you're sampling and/or filtering the signal. It even varies with the specific color waveforms you're using.
IBM itself never documented any of these artifact color tricks, other than one oblique reference to "color mixing techniques" in the PCjr tech ref (if I'm wrong about this, drop me a line and link me!). The concept is fairly old hat, however – it was used in games very early on; some of the first ones I can think of were Microsoft's Decathlon and Flight Simulator, both in 1982. And the limitation has always been the same: the maximum simultaneous color count you can get over composite CGA is 16.
....Or is it? On the off chance that you've been following me so far, and you're still reading, you may have an idea of what the next step is.
We've already observed that our choice of 16 artifact colors depends on the palette and color register settings. One fairly obvious strategy seems to suggest itself here – change those registers at particular scanlines on every frame, and get >16 colors on screen that way. Right?
This has been done before on CGA, and you can actually exploit this for 256 colors (as proven by reenigne - see the image to the left), but that's not how we did our multi-color hacking in the demo. We were actually toying with the idea of including a static screen that uses this technique, but I didn't have the time to pursue this; if anyone manages to compose some nice artwork using this method, I'd love to see it – that's gotta be a bit of an artistic challenge. But no, the way we wrangled more color out of CGA is a whole other shenanigan... which I came across by equal parts chance and morbid curiosity.
Recall how any color/dot pattern of the right length (four repeating pixels in 640x200, or two in 320x200) produces a solid color on a composite display? Back when I was testing composite emulation for DOSBox, that fact was fresh in my mind. At around the same time, I was experimenting with the "ANSI from Hell" graphical hack detailed above; that's purely a text mode / RGBI trick, but it requires a close familiarity with the ROM character set... closer than most sane people would want or need.
Let's take another look at a particular section of the CGA ROM font, in 80-column mode, with the top 2 scanlines highlighted:
At this point, if you're a visually-oriented person, and if you've been following my drift, you're probably catching on. Don't see it yet? Here's a fatter clue:
See those top 25% of the character bitmap? Two dots of foreground and two dots of background, doubled horizontally across. We're in hi-res/80-column mode, so there are two color cycles per character... corresponding exactly to those two matching halves. And those top two scanlines are identical.
That's just the type of repeating pattern that gets us a solid artifact color over NTSC. In fact, it's the very same waveform that 320x200 mode lets us play with. Except that now we have it available in text mode: you know, where we can freely assign a foreground AND a background to each character, from the 16 direct colors.
That's 256 possibilities right there... this is the part that made me go "I have a cunning plan", in my best imitation of Blackadder's Baldrick (just not out loud). Indeed, it's possible to achieve >16 colors on CGA without any flickering, dithering, interlacing or per-scanline effects.
Here's what the possible combinations work out to:
Oh, we're not done yet: once that lightbulb went off over my head, I had another look at the CGA ROM font to see if any other useful bit sequences emerge. There are a few character bitmaps that give us the exact same waveform as 'U' does – 'H', 'V', 'Y' and '¥' – but only one with a different suitable bit sequence right where we need it: 0x13, the double exclamation mark ('‼').
The top two scanlines of 'U' give us a bitmask of 11001100 for foreground/background; '‼' is 01100110 – a single shift to the right, or a 90° shift in phase. This perfectly complements 'U' in terms of having a well-rounded palette, because we get all the colors that the "...1100..." waveform has to offer: going from 'U' to '‼' shifts the phase by 90° (0110); 180° and 270° are achieved by flipping the foreground and background colors for 'U' and '‼' respectively – the same as going '0011' and '1001'.
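If you'd rather see that spelled out, here's a small Python check that the two characters, with and without the foreground/background swap, cover all four rotations of the '1100' waveform. The helper names are made up for illustration; the bit patterns are the ones quoted above, reduced to a single color cycle (4 pixels).

```python
def rotate_right(bits4, n=1):
    """Rotate a 4-bit pattern right by n pixels (n * 90 degrees of phase)."""
    n %= 4
    return ((bits4 >> n) | (bits4 << (4 - n))) & 0xF

def swap_fg_bg(bits4):
    """Swapping foreground and background inverts every bit of the pattern."""
    return bits4 ^ 0xF

U_ROW = 0b1100            # top scanline of 'U' (11001100), one color cycle
DBL_EXCLAIM_ROW = 0b0110  # top scanline of character 0x13 (01100110)

variants = {
    "U": U_ROW,
    "0x13": DBL_EXCLAIM_ROW,
    "U, colors swapped": swap_fg_bg(U_ROW),
    "0x13, colors swapped": swap_fg_bg(DBL_EXCLAIM_ROW),
}
for name, bits in variants.items():
    print(f"{bits:04b}  {name}")

# The four variants are exactly the four rotations of 1100.
assert sorted(variants.values()) == sorted(rotate_right(U_ROW, k) for k in range(4))
```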
Okay, we've pushed the envelope even further: 512 simultaneous colors! Granted, the real number is lower, because a good few are duplicates (and others are very close). But 512 seems to be the limit for this technique: no other characters in our font fit the bill for solid colors. The CGA character ROM does have an alternate 'thin' 8x8 font; but, besides the fact that you'd have to mod your card if you wanted to use it, the 'thin' font has none of the magic bit patterns in the right places, which makes it useless for our purposes.
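How many of those 512 are guaranteed duplicates? A quick Python count of distinct pixel waveforms gives a first answer. This is only a sketch with made-up names: it treats two combinations as duplicates when they put the same four direct colors in the same order within one color cycle, so it counts the certain duplicates and says nothing about different waveforms that merely look alike on screen.

```python
from itertools import product

# One color cycle (4 pixels) of the top scanline, as quoted in the text.
TOP_ROW = {0x55: 0b1100, 0x13: 0b0110}  # 'U' and the double exclamation mark

def waveform(char, fg, bg):
    """The four direct colors output during one color cycle."""
    bits = TOP_ROW[char]
    return tuple(fg if (bits >> (3 - i)) & 1 else bg for i in range(4))

combos = list(product(TOP_ROW, range(16), range(16)))
distinct = {waveform(char, fg, bg) for char, fg, bg in combos}
print(len(combos), "combinations,", len(distinct), "distinct waveforms")
```

The overlap comes from the 16 cases where foreground equals background: the character no longer matters there, and you simply get the plain direct color.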
My kingdom for redefinable characters... alas, when you're dealing with old PC hardware, IBM's penchant for cost-cutting over innovation can always sneak up from behind and ruin your day – even in the most unusual of places.
Still, I was pleased with my little discovery: extending the palette by a factor of 32 has to count for something, right? At this point, I shared my ideas with reenigne. Little did I know that he'd promptly come up with a new devious scheme to double our color count yet again...
This part is some next-level CRTC black magic which I could never have figured out by myself – I'm just a graphics guy; you might as well ask me to wait for a full moon and chant the MC6845 spec-sheet backwards in hexadecimal. All credit goes to reenigne for this particular bit of mad science, which, despite its complex execution, stems from a wonderfully simple idea: our fixed character bitmaps don't play nice with what we're trying to do? No problem – we'll make them play nice, or else.
See, there are two additional characters whose very first scanline could be used; problem is, the second scanline is different, which would ruin our solid color effect. These are ASCII codes 0xB0 and 0xB1, the 'shaded block' characters. It would be quite convenient if we could just tell that offending second scanline to buzz off, wouldn't it? As it turns out, we can.
The lowdown on how this is done is all in reenigne's writeup, which is linked to at the top of this post. But this is the basic idea: by starting a new CRTC frame every other scanline and twiddling with the start address, it's possible to lay down our character rows so that the first scanline of each gets displayed twice!
Now we can make use of those two extra characters, and doing so gets us two more 256-color sets:
Naturally, there are downsides: having to mess with the CRTC every couple of scanlines is quite taxing for the poor 4.77MHz 8088, so there's not much you can do with this other than static pictures. The 512-color variant, using only ASCII 0x55 and 0x13, doesn't suffer from this – it's basically "set and forget", requiring no more CPU intervention than any 80-column text mode (the familiar overhead of avoiding snow).
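The split between "set and forget" characters and ones that need the CRTC trick can be sketched as a simple test on the top two scanlines of each glyph. In the Python below, the rows for 'U' and 0x13 follow the bitmasks given earlier; the rows for 0xB0 and 0xB1 are the values commonly listed for the IBM 8x8 font and are an assumption on my part, as are the function names.

```python
# Top two scanlines of each glyph (8 pixels wide = two color cycles).
TOP_TWO_ROWS = {
    0x55: (0xCC, 0xCC),  # 'U': 11001100 on both rows (from the text)
    0x13: (0x66, 0x66),  # double exclamation mark: 01100110 (from the text)
    0xB0: (0x22, 0x88),  # light shade block (assumed font bytes)
    0xB1: (0x55, 0xAA),  # medium shade block (assumed font bytes)
}

def repeats_every_cycle(row):
    """True if both 4-pixel halves of the row match, i.e. one pattern per color cycle."""
    return (row >> 4) == (row & 0xF)

def classify(rows):
    first, second = rows
    if not repeats_every_cycle(first):
        return "unusable"
    # A matching second row works in plain 100-row text mode; otherwise the
    # first row has to be shown twice, which is what the CRTC trick does.
    return "set and forget" if second == first else "needs first row doubled"

for code, rows in TOP_TWO_ROWS.items():
    print(f"0x{code:02X}: {classify(rows)}")
```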
Then, there's that other problem which plagues 80-column CGA on composite displays... the hardware bug that leads to bad hsync timing and missing color burst. There are ways to compensate for that, but none that reliably works with every monitor and capture device out there. This proved to be an enduring headache in calibrating, determining the actual colors, and obtaining a passable video capture of the entire demo... but that's all covered elsewhere.
At any rate, we now have 1K colors on a 1981 IBM CGA, at an effective resolution of 80x100 'chunky pixels'. 'Chunky' describes the memory layout, but it also applies in the visual sense: we're really plumbing the depths of resolution here. 160x100, that's as low as you could go? Allow me to snicker, IBM - "low-res" just got lower, baby!
One might object that this isn't a lot of canvas. Yeah, yeah: 80x100 is a bit on the cramped side, 'artistically' speaking; but the limitation is part of the challenge, as it has always been in demos. You can keep your fancy 4K monitors - 0.008 megapixels should be enough for anybody.
When we first showed Trixter the 'proof-of-concept' 1024c drawings, his response was, and I quote: "HOLY F!@#$%G SHIT. WOW. I must know how this works!!". Achievement unlocked: getting THAT out of a veteran 8088/CGA hacker and demomaker is, by itself, almost as good as... well, joining the team, 'making a demo about it' and winning the oldskool compo. :)
That's about it for my writeup. If you made it this far, congratulations! There's more I could write about the tools and techniques I used to actually compose these graphics... but we'll get to that some other time.