Raising the Bar for IBM PC/XT Emulation: MartyPC

This was publicly released over a month ago, so it's old news by now; but the IBM PC has waited over 4 decades to be emulated quite this faithfully, so what's a little holdup between friends?  MartyPC is the latest breakthrough in emulating the 5150 and 5160: the machines that sired the PC platform.  It's still fresh out of the box, and perhaps it won't feel every bit as polished in all respects as some of the more battle-hardened, long-running projects, but its author GloriousCow has done a truly impressive job.  As of this writing, it's already at its second public release (v0.1.2); a whole bunch of improvements have made it in, and even more are planned.

What sets this one apart from other PC emulators out there?  Glad you asked!  My take on this question is admittedly subjective, but it really boils down to three things: precision, debugging/monitoring tools, and the way video output is handled.

Precision

Yep, Area 5150's credits section plays just fine!

MartyPC has a cycle-exact implementation of the 8088 CPU, which has been a holy grail of sorts in PC emulation.  Right off the bat, it could boast of running both 8088 MPH and Area 5150 all the way through without hitches [1].  Of course, I'm personally somewhat biased when it comes to these two demos, but they've become litmus tests for accuracy - and MartyPC is the first to pull it off so admirably.

The PC's Historical Baggage

See, cycle-accurate emulation had been achieved for a number of other systems (computers and game consoles), but for a long while it wasn't really on anyone's radar for the PC.  Partly because there wasn't much incentive: more than any other platform, the PC was always a moving target in terms of hardware.  On a machine like the ZX Spectrum or the SNES, your code can afford to expect a specific environment: this CPU and these support chips, all operating together in exactly this fashion.  The system as a whole is a fixed quantity (to some approximation), and exploiting a device's cycle-level timing characteristics is one way to squeeze more blood out of the stone.

PC programmers, on the other hand, dealt with an evolving set of standards (if they could even be called that), where it was much more typical to tell users to get a new CPU or some more RAM.  This kept providing developers with fresh forms of headache, and forced them to err on the side of compatibility.  You already had to worry whether the whole thing would run too fast or too slow, keel over when certain graphics or sound hardware wasn't there, or choke on 'extended' vs. 'expanded' memory.  Getting down to the cycle level - even just spending your time on learning how - would've been an excellent way to decimate your user base.  And when no software relies on cycle accuracy, emulators have little reason to implement it.

Sleuthing in Silicon

But another reason was that the Intel 8088 - the original PC's brains - did a good job guarding its secrets until relatively recently.  The way it's designed makes it impossible to achieve cycle accuracy by relying on published timings; the full story of its internals started to unravel only when Ken Shirriff began a series of articles dissecting the 8086, the 8088's near-identical big brother.  Ken placed the die under a microscope to reverse-engineer its inner workings, and has already revealed quite a few undocumented aspects of its behavior.  Those same die photos enabled Andrew Jenner (reenigne) to disassemble the 8086's microcode, and that of the 8088 as well, unlocking even more crucial details.

Although MartyPC doesn't emulate actual microcode execution, GloriousCow was able to use this work and derive the sought-after precise behavior at the cycle level.  This was verified against the actual CPU, using his own Arduino interface for the 8088, which was then integrated into the emulator - indeed, if you build such an interface yourself, MartyPC's CPU validator functionality will let you do the exact same thing.

It doesn't stop at the CPU, either.  Emulation of the 8237A DMA controller and the CGA board (with its 6845 CRTC) has been made cycle-exact as well, which is notable as the CGA's dot clock runs at 14.31818 MHz: 3 times the frequency of the PC/XT's 4.77 MHz CPU clock.  (This can get taxing on the host, so most of the time the graphics subsystem runs at character cycle precision; dot-cycle precision is triggered only when actually needed, i.e. for one character clock following a CRTC rewrite operation.)  The lowdown on all of this is covered on the author's blog, but it should be pretty clear that this is a whole different ball game in terms of fidelity.
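For a sense of scale, the clock relationships mentioned above can be worked out in a few lines.  This is only a back-of-the-envelope sketch, not anything taken from MartyPC's source: the constant and function names are made up for illustration, and the 912-dot scanline and 262-line frame are the standard CGA figures for 80-column mode (an assumption here, not something stated above).

```python
# Back-of-the-envelope CGA/PC clock arithmetic (illustrative names, not MartyPC code).

DOT_CLOCK_HZ = 14_318_180      # CGA hi-res dot clock, 14.31818 MHz
CPU_DIVIDER = 3                # the 8088 runs at one third of the dot clock
DOTS_PER_CHAR = 8              # hi-res (80-column) character clock: 8 dots
DOTS_PER_SCANLINE = 912        # assumed: 114 hi-res characters per scanline
SCANLINES_PER_FRAME = 262      # assumed: standard non-interlaced frame height


def clock_summary():
    """Return the derived frequencies in Hz as a dictionary."""
    line_rate = DOT_CLOCK_HZ / DOTS_PER_SCANLINE
    return {
        "cpu": DOT_CLOCK_HZ / CPU_DIVIDER,
        "char": DOT_CLOCK_HZ / DOTS_PER_CHAR,
        "line": line_rate,
        "frame": line_rate / SCANLINES_PER_FRAME,
    }


if __name__ == "__main__":
    for name, hz in clock_summary().items():
        print(f"{name:>5}: {hz:,.2f} Hz")
```

Stepping the video hardware once per dot means roughly 14.3 million steps per emulated second, versus about 1.8 million at character granularity, which is why dropping to character clocks whenever possible is attractive.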

Impersonating a Monitor, the Right Way

MartyPC's approach to video output sounds simple: always reflect what you'd expect to see on the full visible area of a real monitor.  This may sound like a no-brainer, and indeed many emulators of other systems take this approach - render a surface that corresponds to a CRT monitor's face, i.e. the active display 'window' plus a realistic amount of overscan (which usually constitutes the border area around the screen).

Certain video tricks play with the relationship between these areas; for instance, the Amstrad CPC might shift the active window's position within the video frame, or the C64 might draw graphics 'over-the-border'.  The typical CPC or C64 emulator renders such effects as you'd see them on an actual monitor, but almost nobody had bothered to do this when emulating a PC.

The typical PC emulator seems to isolate the active raster area, and use the guest's video parameters to compute the frame dimensions (and perhaps the refresh rate and the aspect ratio).  You end up with mode-dependent frame sizes, usually displaying just the active area (the 'window' into video RAM) as in DOSBox, or adding an arbitrary strip of overscan around it as in 86Box or PCem.  Last I checked, even MAME does this for PC variants, but not for most other machines.

This method has its advantages when you're emulating flexible video devices that support different timings and resolutions, say (S)VGA, but at the cost of not really showing you the full picture [2].  I'm sure there are ways to improve it; the thing is, CGA is a fixed-frequency standard (and so is MDA/Hercules, for that matter).  You might as well simulate the monitor just like the typical NTSC/PAL-based home system, and go with the first approach.

That's what MartyPC does, which allows it to accurately render the kind of video tricks which other PC emulators don't play nice with.  Let's have a look at some examples:


Oldskool Video Frame Manipulation in CGA Games/Demos

With CGA's mostly-NTSC-compatible timings, all standard video modes have a nice, fat border around the image (in the overscan area), extending all the way to the edges of the CRT's visible face.

It can only take on a single solid color, so it's not used for more than decorative purposes (usually), but it's part of the authentic experience of 15.7 kHz video on the PC.

Moon Bugs (1983, Windmill Software)

The active video region can be moved around the monitor's display area by reprogramming the CRTC, which is one way to manipulate the screen without expensive VRAM updates.

For instance, Lost Tomb exploits this for its screen-shaking effect, when you've been clumsy enough to trigger an earthquake.

Lost Tomb (1984, Datasoft Inc.)

Another form of this trick was used in Sierra's AGI engine, where each display driver had its own way of doing things, depending on the video hardware. This one jolts the screen both horizontally and vertically.

Most other emulators don't reproduce Roger Wilco's full experience as he runs afoul of this delightful duo here, at least not in CGA mode.

Space Quest II (1988, Sierra On-Line Inc.)

Certain games had explicit controls to let you reposition the image on your monitor. Centering could vary quite a bit between CRTs, and not all monitors and TVs had user-accessible controls to adjust it; sometimes you had to open them up.  Doing it in software - through the CRTC - was certainly preferable to voiding your warranty.

Gremlins (1984, Atarisoft)

The 3D bobs/vectorballs section in 8088 MPH shrinks the active display region, so it takes up less of the visible frame; the preceding 'Flying DeLorean' part does the same.

Other emulators tend to respond by shrinking the total frame size.  This fails to reflect what you'd actually see, since a real monitor would have a hard time changing the physical dimensions of its screen; MartyPC shows this as intended.

8088 MPH and the Incredible Shrinking Frame

Area 5150 plays around with these parameters even more, to draw 'over-the-border' graphics and pull various other tricks; these transformations are meant to be seamless, and in MartyPC (as on a real monitor) they are.

Notice that the falling elephant drops all the way through the bottom edge of the frame: the viewport is both resized and shifted, without visible shrinking or stretching.

Area 5150 'pushing boundaries'

All of the above effects are really based on relatively straightforward CRTC reprogramming, although 8088 MPH and (especially) Area 5150 involved some fairly intricate precision work to make things just right.  Still, even the most basic scenarios can only be faithfully reproduced by rendering the video frame at a fixed size, and using the CRTC's parameters to position the active display area within it.
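As a rough illustration of that last point, the sketch below places the active area inside a fixed frame using nothing but CRTC register values.  It's a deliberately simplified model, not MartyPC's actual code: offsets are measured from the start of the sync pulses, sync widths and the monitor's own behavior are ignored, and the class and function names are hypothetical.  The register values are the usual BIOS defaults for 80-column text mode.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Crtc:
    """A handful of 6845 registers (hypothetical container for this sketch)."""
    h_total: int         # R0: characters per scanline, minus 1
    h_displayed: int     # R1: visible characters per row
    h_sync_pos: int      # R2: character position where hsync starts
    v_total: int         # R4: character rows per frame, minus 1
    v_total_adjust: int  # R5: extra scanlines added to the frame
    v_displayed: int     # R6: visible character rows
    v_sync_pos: int      # R7: character row where vsync starts
    max_scanline: int    # R9: scanlines per character row, minus 1


def active_area(crtc):
    """Return (left, top, width, height) of the active area.

    left/width are in character cells, top/height in scanlines, measured
    from the start of hsync/vsync.  Simplification: sync widths and the
    monitor's own blanking are ignored.
    """
    row_height = crtc.max_scanline + 1
    left = (crtc.h_total + 1) - crtc.h_sync_pos
    top = ((crtc.v_total + 1) - crtc.v_sync_pos) * row_height + crtc.v_total_adjust
    return left, top, crtc.h_displayed, crtc.v_displayed * row_height


# Usual BIOS defaults for CGA 80-column text mode.
TEXT_80 = Crtc(h_total=0x71, h_displayed=0x50, h_sync_pos=0x5A,
               v_total=0x1F, v_total_adjust=0x06, v_displayed=0x19,
               v_sync_pos=0x1C, max_scanline=0x07)

if __name__ == "__main__":
    print(active_area(TEXT_80))
    # Starting hsync one character later slides the picture one cell to the
    # left, while the frame it is drawn into stays the same size.
    print(active_area(replace(TEXT_80, h_sync_pos=0x5B)))
```

The point of the exercise: the frame itself never changes size, and only the rectangle's position and dimensions within it respond to the registers.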

Debugging Tools

As someone who writes programs for those ancient machines, perhaps MartyPC's biggest selling point for me is how it spoils you with a battery of tools to debug and monitor the machine's state.  You get the basic ability to single-step, set breakpoints (on instructions, memory accesses and interrupts), view the call stack, memory dump, code disassembly and CPU state; but there's also instruction- and cycle-level logging, plus detailed views into the states of various components: the video card, PIC, PIT, PPI, and the DMA controller - including pretty much every port, counter, and internal/external register that may be relevant.

When single-stepping, one thing I find especially useful is how it displays the current position of the CRT beam on the emulated monitor.  With the cycle-accurate CPU and video emulation, you can see exactly how many cycles each instruction takes to execute, and how this translates into pixels and scanlines, i.e. how much of the raster gets scanned onto the screen within that time:


"Racing" the beam... one instruction at a time.
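To put rough numbers on this, here is a small sketch of how CPU cycles map onto beam movement.  It assumes the standard CGA geometry (912 hi-res dots per scanline, 262 scanlines per frame) and 3 dots per CPU cycle; the function name is hypothetical, and the cycle counts in the example are arbitrary rather than measured 8088 timings.

```python
DOTS_PER_CPU_CYCLE = 3       # 14.31818 MHz dot clock / 4.77 MHz CPU clock
DOTS_PER_SCANLINE = 912      # assumed standard CGA scanline length, in hi-res dots
SCANLINES_PER_FRAME = 262    # assumed standard CGA frame height


def advance_beam(scanline, dot, cpu_cycles):
    """Move the beam forward by a number of CPU cycles.

    Positions are counted over the whole raster (including blanking),
    and wrap around at the end of each scanline and each frame.
    Returns the new (scanline, dot) pair.
    """
    total = scanline * DOTS_PER_SCANLINE + dot + cpu_cycles * DOTS_PER_CPU_CYCLE
    total %= DOTS_PER_SCANLINE * SCANLINES_PER_FRAME
    return divmod(total, DOTS_PER_SCANLINE)


if __name__ == "__main__":
    # A 4-cycle instruction moves the beam 12 hi-res dots along the line.
    print(advance_beam(0, 0, 4))
    # 304 cycles add up to exactly one scanline.
    print(advance_beam(0, 0, 304))
```

Under these assumptions a whole scanline lasts only 304 CPU cycles, which gives some idea of how little code fits into each line when racing the beam.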

I'm told that even more debugging-related enhancements are in the works, so whether you're writing some bare-metal code for the PC/XT, or just seeking to learn more about what makes them tick, MartyPC looks set to become an integral part of your toolbox.

All in all, this piece of software is even more impressive considering that it's the author's first project written in Rust, and his first emulator project period.  With 8088 MPH and Area 5150, a big motivation behind the whole "this demo will break your emulator" thing was our hope that emulators would step up their game in terms of accuracy, and it's pretty frickin awesome to see it actually happening.

Notes

  1. And now it can play them in your web browser, too: 8088 MPH, Area 5150
  2. There are probably some historical reasons behind this, too.  When projects like MESS or DOSBox first came along, they were emulating the PC on its own descendant, and a common scenario was to use the host's monitor just like the emulated software would: you could get the original video mode on your monitor, at the same resolution and refresh rate that the software would produce natively.
    This made the above-mentioned rendering approach practical and adaptable, but it has persisted as the only option in windowed mode as well - and in full screen (using scalers and so on), even though most of those original video modes aren't natively supported by current graphics cards any longer.

4 comments:

Raven Singularity says:

Those CGA demos are completely mind-bending! Having grown up with Monochrome, then CGA, then EGA+VGA, I don't even see how this is possible. There's too many colours, too much resolution, and it's all going too fast and smooth! Lol. The music is also quite nice for PC speaker.

pAULIE42o says:

Thanks for the post/update - I'm trying to build but hitting a couple hurdles... I'll get there; I hope to get this running on my nix machine. Amongst others, I want to run the 8088 demos and am pumped that MartyPC is giving new [emulation] life to the 8088/8086 series. :P w00t w00t!

FavoritoHJS says:

Another nixos user? i thought we were NOWHERE close to critical mass but i guess we really are...

if you're listening pAULIE42o i got it working, iirc particular pain points were the pixels crate version used only works with vulkan and thus requires vulkan-loader in buildInputs; and vulkan-loader and libxkbcommon don't seem to be properly dynamically loaded (the joys of broken FHS compliance...), worked around that by using lib.strings.makeLibraryPath and tossing the result into LD_LIBRARY_PATH but there must be a better way...

ahem sorry for that tangent. anyways, had a very concerning thought about pc speaker abuse and it's your problem now.

-----

nerdsniped myself a little bit, realized the rarely-used cassette port in the 5150 can be turned into a makeshift speaker output very easily - in fact, if the diagrams on minuszerodegrees.org are correct, it is always a speaker output except for the speaker enable bit.

also got angry at ancient ibm for making the pit gate input software-settable but not loopback from output, which would have made no-maintenance pwm waves trivial (wikipedia is wrong in this, it says the cassette port is pwm capable but it certainly isn't without the cpu setting high and low states manually)

on the bright side, with most pit modes setting this bit resets the speaker counter... and in most modes sets the speaker high, which can be used for a makeshift dac or base of a software speaker driver.

a low speaker can be set by unsetting speaker enable, and by using mode 2 you can get a middle level at 1/n amplitude from high.

this means you can use it with n=2 for a one-trit dac, or at n=3 and setting levels in software for a hopefully-low-overhead 2/3-tone speaker driver.

reenigne says:

FavoritoHJS: The cassette output is capable of PWM in the same sense that the PC speaker is, because (like you said) it's just the speaker output. Software would need to regularly load new PWM values (which sets the output to low in mode 0), but the high state occurs when the counter reaches 0.

I'm not sure exactly what loopback from output you're thinking of that would have enabled no-maintenance PWM waves. If you connect the gate to the output then I think you can get it to restart on terminal count but that basically just gives you mode 2 again. You'd need to involve two timers - one to control the PWM frequency and one to control the duty cycle. If the gate was connected to counter 1 output that could work pretty well - you'd get a 66kHz carrier which would be inaudible, and 18 levels of PWM. But you'd need to make that a software controllable gate so that mode 3 square waves could still work. And you'd still need to regularly update that PWM output in software to play back sampled audio.

Indeed you can get a few different speaker positions using mode 2, but again you'd need software to set the count regularly to actually make noises. So it could buy you some quieter square waves with the CPU only needing to get involved when that square wave goes low and high. And maybe some very lo-fi sample playback sacrificing dynamic range for lack of audible carrier.
