Back in 2013, I became interested in both Hardware and Hypervisor support to do PCI/VGA Passthrough, as I saw a lot of potential in the idea of passing through a real Video Card to a Virtual Machine with a Windows guest just for gaming purposes (Note that this was a few years before this type of setup became popular, so I consider myself a pioneer in this area). At that point in time, the only Hypervisor that supported doing so was Xen, which required a Linux host. Since I had no previous Linux experience, I had to learn everything from scratch, so I began to take a large amount of notes so that I could perfectly reproduce, step by step, everything that I did to get my setup in working order. Eventually, those notes evolved into a full-fledged guide so that other people could use my knowledge to install an Arch Linux host together with the Xen Hypervisor.
By 2016, I had replaced Xen with the standalone version of QEMU, since it gained important features at a much faster pace than Xen (Mainly the fact that QEMU could work around nVidia's artificial limitation that kept the GeForce Drivers from initializing a GeForce card if they detected that it was inside a VM). In the meantime, I updated my guide, but by that point, everyone and their moms had written guides about how to do passthrough. During the 3 years that I spent keeping my guide updated, only two people that I'm aware of used it, so I lost interest in maintaining it as no one but myself was using it.
Somewhere along the way, I noticed that QEMU is so ridiculously complex that, in order to fully explain all of its capabilities, you need a lot of in-depth knowledge about a multitude of Hardware topics related to modern x86 platforms. There is no other way to know what QEMU is actually doing if you don't know about the Hardware platform. Fine-tuning a VM goes far beyond the simple walkthrough installation procedures that nearly everyone else was writing about. So I thought that I had finally found my niche...
Soon after, my obsolete virtualization guide was morphing to provide heavy coverage of the idiosyncrasies of recent x86 platforms. This grew out of proportion when I decided that the easiest way to explain why modern stuff works in a certain way is simply to start at the very beginning, then introduce topics one at a time. At that point, I decided to spin off all that material into its own Wall of Text, as instead of a virtualization guide, it had become a major course on computer history that goes all the way back to the 1981 IBM PC.
If you want a recap of why I began to write all this for a virtualization guide, consider that a Virtual Machine is an extremely complex beast that attempts to implement, mostly in Software, the functionality and behavior of an entire computer. The mix and match of emulation, virtualization and passthrough only tells you the method that was used to reproduce the functionality of a specific component of the VM, but not why it has to be reproduced at all. If you want to understand the complexity of everything that a VMM like QEMU has to do in order to create a usable VM, then you have to understand the underlying Hardware whose functionality it is attempting to reproduce. This includes the Processor, the Chipset, the Firmware, Buses like PCI Express and anything that you can plug into them, and the means by which everything is interconnected or related. As you should expect, explaining all that requires major amounts of text.
What you're going to read next covers, step by step, the almost four decades of evolution of the core aspects of the modern x86 platform, most of which QEMU can or has to reproduce. If you think that all that seems like ridiculous overkill, chances are that this text isn't for you. As the majority of users have simple computers, there is no real need to know all this to be able to use QEMU, since most people can rely on generic instructions to get things running smoothly, which is what nearly everyone actually does. However, as you get into bigger and wider systems like Dual Processors (Which also includes first and second generation AMD ThreadRipper, for reasons you will learn when I get there), it becomes critical to have at the very least a basic understanding of why the platform topology is directly related to the way that the system resources must be allocated, otherwise you will be left pondering why you are getting subpar performance.
Eventually, you will notice that all this knowledge becomes immensely valuable for optimizing any type of setup, as you will have a decent idea of what SMP and SMT are and their relationship with the configurable Sockets-Cores-Threads Topology of the VM Processor, why it is recommended to directly map each vCPU Thread to a CPU in the host, what a NUMA Node is and why it matters if you are using Dual Processors, and how to create a correct virtual PCI Bus Topology that solves some obscure issues with passed-through Video Cards, something that most users seem to ignore because Windows for the most part doesn't care.
Our computing ecosystem as you know it today has been built directly on top of a pile of design decisions made in the early 80's. How did we get here? Backwards compatibility. It was a great idea that a newer computer was an improved, faster version of your previous one, so that it could run all the Software that you already owned, only better. However, maintaining that sacred backwards compatibility eventually became a burden. It is the reason why a lot of design mistakes made early on were dragged along for ages, even influencing later Chipset and Processor designs just to remain compatible with features only relevant in the earliest generations.
If you are using a computer that has an Intel or AMD Processor, your platform has an IBM PC compatible heritage. IBM PC compatibility is not a minor thing, it is a Hardware legacy left from Day One of the IBM PC that is still present in pretty much every modern x86 based platform, and is the reason why you can run MS-DOS on bare metal in a 2020 computer if you want to (Albeit with a severe amount of limitations that don't make it useful at all). The historical importance of the early IBM computers is high enough that there are several specialized Websites that host a lot of extremely detailed information about them, like MinusZeroDegrees, which is where I got huge amounts of the data that I organized for exposition here.
While the whole history is rather long, a lot of things will be easier to digest and make much more sense if you start from the very beginning. After all, I'm sure that neither the original IBM PC designers, nor those that built upon it, could have imagined that almost four decades later, many implementation details would still be relevant. It also explains why, from our current point of view, some things that look like ugly hacks have some kind of ingenuity applied to them.
This history begins with the release of the IBM PC 5150 in August 1981. While at the time there already were other moderately successful personal computer manufacturers, IBM, which back then was the dominant mainframe and enterprise player, decided to enter that emerging market by making a personal computer of its own aimed at corporate business users. Its new computer became extremely popular, and thanks to its successors and the compatible, cheaper clone computers, its user base grew large enough to make it the most important computer platform of the 80's. The platform itself kept continuously evolving through the years, and by the late 90's, the descendants of the IBM PC had killed almost all the other competing platforms.
IBM was used to making fully proprietary systems that provided a complete Hardware and Software solution, yet it was aware that the personal computer market had a totally different set of requirements. Thus, for the IBM PC, IBM took many unusual design decisions. In order to reduce the time that it would take to design the new computer, IBM used mostly standard chips and other components that were already available in the open market (Usually known as off-the-shelf), which anyone else could buy. IBM also delegated the responsibility of creating the main Operating System for the new computer to a third party, Microsoft, which kept the right to license it to other third parties. Finally, IBM deemed it necessary for the success of the IBM PC that there had to be a healthy amount of third party support in the form of expansion cards and user Software, so IBM opted to make the IBM PC architecture an open platform to make developing Hardware and Software for it as easy as possible. All these decisions made the IBM PC rather easy to clone...
The impact of making the IBM PC an open platform is something that can't be overstated. If you want to see how serious IBM was about that, you can take a peek yourself at the IBM PC 5150 Technical Reference Manual (August 1981). Besides detailed computer specifications, that document included the schematics of all the parts of the computer, and even the Firmware source code in printed form. I doubt that currently there are more than two or three commercial computers (Or even single components, like a Motherboard) that could come close to that level of openness. If anything, I'm actually absolutely impressed by how extremely detailed and methodical IBM was in all its documentation, making most of it a pleasure to read even if you don't fully understand it. I suppose that a lot of IBM's former reputation was thanks to that level of attention to detail.
The basic specifications of the IBM PC were rather good when introduced, albeit due to the hectic pace of the industry, it got outdated surprisingly quickly. It was one of the first mass market 16 Bits desktop computers, as it had an Intel 8088 CPU running @ 4.77 MHz when the contemporary home computers used 8 Bits Processors like the MOS Technology 6502 running @ no more than 2 MHz, making the difference, at least on paper, appear to be rather brutal. It had a Socket ready for an optional Intel 8087 FPU, a massive upgrade for the rare Software that could make use of floating point math. It supported up to 256 KiB system RAM (Albeit with later expansion cards it was possible to install more than that, this apparently was the official limit at launch), when others maxed out at just 64 KiB. It also included a long forgotten, built-in Microsoft BASIC interpreter in ROM known as IBM Cassette BASIC, which seems to have been a rather common feature at that point in time in other computers, with the idea being that even with no additional Software, they could be usable. For more addons it had 5 expansion slots, of which at least one had to be used for the mandatory Video Card.
The video and sound capabilities of the IBM PC seem to have been rather mediocre, which makes sense since it was aimed at business users. You could use it with either the text-specialized MDA Video Card, which was what most business users preferred, or the 16 colors CGA Video Card, which got somewhat more popular with home users as gaming began to grow, but was inferior for pure console text. Sound was provided by a beeper, the PC Speaker, mounted inside the Computer Case. For removable storage the IBM PC was a hybrid of sorts, as IBM intended to tackle both the low end and high end markets at the same time, thus it supported both Cassette and Diskette interfaces. Cassettes were intended to be the cheap storage option and were already ubiquitous in other home computers, so support to interface with them was built into the IBM PC Motherboard, though it relied on an external Cassette Deck. Diskettes required an extra expansion card with a FDC (Floppy Disk Controller), yet the Case had front bays for either one or two 5.25'' Diskette Drives, so they were essentially internal and integrated into the computer unit.
During the commercial life of the IBM PC, there were two markedly different models: One colloquially known as PC-1, which was the original model released in 1981, and a refresh released around 1983 known as PC-2. The difference between the two is the Motherboard, as the PC-1 Motherboard supported up to 64 KiB of system RAM installed in the Motherboard itself while the PC-2 could have 256 KiB. In both cases, more system RAM could be installed via expansion cards.
If there is a trend that the IBM PC truly set, it is establishing that a desktop computer is made of three main components: Monitor and Keyboard as totally independent units, and a Case housing the Motherboard with its internal expansion slots closer to the back of the Case, with front bays to house the Diskette Drives. Other contemporary home computers like the Apple II had the Keyboard integrated into the same unit that housed the Motherboard, but had no front bays, so Diskette Drives had to be external units. Some were even like modern AiOs (All-in-Ones), with a unit housing both Monitor and Motherboard. I recall seeing photos of at least one that had Monitor, Keyboard, and Motherboard as part of a single unit, though it was not in a Notebook form factor but instead resembled a portable desktop computer.
Of the IBM PC chip choices, perhaps the most important one was the main Processor. The chosen one was the Intel 8088 CPU (Central Processing Unit), which was based on the same ISA (Instruction Set Architecture) as the more expensive and better performing Intel 8086 CPU (Which is where the x86 moniker is derived from). What makes the Processor ISA important is Binary Compatibility, as when a compiler creates an executable file by translating source code into machine code (Opcodes), it does so by targeting a specific ISA. If you were to change the Processor to one based on a different ISA, it would mean that, at minimum, you would have to recompile (If not port) all the executable files because Binary Compatibility would not be preserved. This early Processor choice means that any direct IBM PC successor was forced to use an x86 compatible Processor.
Both the 8088 and 8086 CPUs were 16 Bits Processors, where the usual definition of the "Processor Bits" refers to the size of its GPRs (General Purpose Registers). By today's standards, they were Single Core Processors that could execute only a Single Thread at a given moment. As they could not execute multiple Threads concurrently, any form of primitive Multitasking was achieved by an OS that did quick context switches at small time intervals, giving the illusion that multiple applications could run simultaneously (This would hold true for 20 years, until the arrival of SMT in the Pentium 4 around 2001, and Multicore around 2005 with the Athlon 64 X2). The 8088 had an external 8 Bits Data Bus, while the 8086 had a 16 Bits one, which means that the latter could move twice the data through the external Data Bus at once. Both the 8088 and 8086 had a 20 Bits Address Bus, allowing them to have a Physical Memory Address Space of 1 MiB (2^20). For comparison, other contemporary 8 Bits Processors had a 16 Bits Address Bus, so they could directly address only 64 KiB (2^16).
Since the size of the 16 Bits GPRs was smaller than the 20 Bits of the Address Bus, it was impossible for either CPU to access the whole Memory Address Space with just the value of a single GPR. Intel's solution was to use a Segmented Memory Model, where accessing a Memory Address required two values, which in x86 Assembler are known as the Segment and Offset pair (Note that you will see some documentation that uses the term Segmented Memory for the way the 8086/8088 1024 KiB Address Space was partitioned into 16 named Segments of 64 KiB each, but that is not what I'm describing). This Segmented Memory Model was a major characteristic of the early x86 based Processors, though not a positive one, as programming in x86 Assembler was far messier than in the Assembler of other Processors based on a Flat Memory Model, like the Motorola 68000, which IBM considered for the IBM PC at some point.
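To illustrate how a Segment and Offset pair translates into a 20 Bits physical address, here is a minimal C sketch (the helper name is mine): the Segment is shifted 4 Bits to the left (that is, multiplied by 16) and the Offset is added to it. It also shows how two different pairs can end up pointing to the exact same physical location, something that will matter again a few paragraphs below.

```c
#include <stdio.h>
#include <stdint.h>

/* Real Mode address translation: physical = (segment * 16) + offset */
static uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

int main(void)
{
    /* Two different Segment:Offset pairs that hit the same physical address */
    printf("F000:FFF0 -> %05X\n", (unsigned)phys_addr(0xF000, 0xFFF0)); /* FFFF0 */
    printf("FFFF:0000 -> %05X\n", (unsigned)phys_addr(0xFFFF, 0x0000)); /* FFFF0 */
    return 0;
}
```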
Another significant characteristic of the x86 Processors is that they actually had two completely independent Physical Address Spaces: The already mentioned Memory Address Space, and another one known as the I/O Address Space, where an individual I/O Address is known as an I/O Port. Currently, this stands out as an oddity of the x86 architecture, since most other Processor ISAs have just a single Address Space, but back then this was somewhat common. The design intention of having two separate Address Spaces was to differentiate simple RAM or ROM memory from another chip's internal Registers.
In addition to the 20 lines required for the 20 Bits Address Bus, the CPUs also had an extra line, IO/M, that signaled whether an Address Bus access was intended for a Memory Address or for an I/O Port. However, the I/O Address Space only used 16 Bits of the Address Bus instead of the full 20, so it was limited to 64 KiB (2^16) worth of I/O Ports, which ironically means that when dealing with these, you didn't need cumbersome tricks like the Segment and Offset pair, as a single GPR sufficed for the full address space range. Accessing an I/O Port required the use of special x86 instructions, IN and OUT, which drove the IO/M line.
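This is roughly what PMIO accesses look like from code (a minimal sketch using GCC inline assembly; it assumes an environment that is actually allowed to touch I/O Ports, like a Kernel, a DOS program or bare metal code, since a normal user space process would be denied these accesses). Note that the port number is a 16 Bits value, matching the 64 KiB I/O Address Space just described:

```c
#include <stdint.h>

/* Write one Byte to an I/O Port: compiles down to the x86 OUT instruction */
static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* Read one Byte from an I/O Port: compiles down to the x86 IN instruction */
static inline uint8_t inb(uint16_t port)
{
    uint8_t value;
    __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
    return value;
}
```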
It may be convenient to explain right now how an Address Space works. As a simplified explanation, an Address Space can be thought of as a fixed amount of slots, where each slot gets its own individual address that can be used to interact with it. In the case of the 8088 and 8086 CPUs, their 20 Bits Address Bus allowed for 1048576 (2^20) individual Memory Addresses in the Memory Address Space, and 65536 (2^16) individual I/O Ports in the I/O Address Space. The reason why they are worth 1 MiB and 64 KiB of memory, respectively, is that the x86 architecture is Byte-level addressable, which means that each individual address location is one Byte in size. As such, each Byte of memory gets its own address, and this also conveniently means that the amount of addressable slots matches the maximum amount of addressable Bytes. An example of an Address Space format that is not Byte-level addressable is the LBA (Logical Block Addressing) used for modern storage drives like HDs (Hard Disks) and SSDs (Solid State Drives), where an individual address points to an entire block of either 512 Bytes or 4 KiB of memory.
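The difference in granularity is easy to see with a bit of arithmetic (a trivial sketch, the function name is mine):

```c
#include <stdint.h>

/* Byte-level addressing: address n is the n-th Byte, so addressable slots
   and addressable Bytes are the same number (2^20 slots = 1 MiB).
   Block-level addressing: one LBA address covers a whole sector, so the
   Byte position has to be derived from the block number and sector size. */
static uint64_t lba_to_byte_offset(uint64_t lba, uint32_t sector_size)
{
    return lba * sector_size;   /* e.g. LBA 100 with 512 Byte sectors -> Byte 51200 */
}
```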
Regardless of the amount or sizes of the Physical Address Spaces that a Processor may have, there has to be something that is listening to the CPU Address Bus looking for a specific address or address range, so that when the CPU sends an address through the Bus, someone actually bothers to take the call and reply back to the CPU. In this context, getting memory to respond to an address is known as Memory Mapping. However, in these ancient platforms, mapping memory faced two important implementation challenges: First, the vast majority of chips were rather dumb. It was not possible to dynamically configure the mapping, so each chip that had addressable memory had to know in advance which address ranges it was supposed to respond to. This was a major problem in Buses that had multiple chips connected in parallel, as without something that arbitrated the Bus, all the chips could potentially answer simultaneously to the same addresses, a rather chaotic situation. Second, any chip that had something addressable, with ROM memory chips being the best example, only had as many Address lines as it needed for its own internal addressable memory size, which means that if you tried to wire these directly to the wider CPU Address Bus, its upper Bits would be left unconnected, so the chips would be unable to fully understand addresses above their own sizes. Instead, they would reply to anything that matched just the lower Bits of whatever was being sent via the Address Bus, an issue known as Partial Address Decoding. In order to make things work by making each chip mapping unique, with no address overlapping or other conflicts across the whole address space, it was necessary to solve these two issues.
The basis for Bus arbitration in parallel Buses was the Chip Select line, which was used to make sure that only the active chip would be listening to the Bus at a given moment, making it possible to wire several chips in parallel without Bus conflicts caused by unintended concurrent usage. The Chip Select line could be built into the chips themselves as an extra Pin, or could be implemented in those that didn't have it with the help of some external circuitry. Still, for the whole scheme to work, you required something that managed the Chip Select line according to the expected location in the address space of the memory of these chips. That job was done by a bunch of extra discrete components that were collectively known as Glue Logic. The glue logic acted as an intermediary that took care of hooking the narrower chips' external Address Buses to the wider CPU Address Bus, assisting them to externally decode the missing upper Bits and activate the Chip Select line when appropriate. Thus, from the CPU point of view, the narrower chips were correctly mapped with no overlapping occurring, as if they effectively understood the full address. You can get an excellent explanation about how the additional supporting decoding logic that helps to listen to the CPU Address Bus works here.
As the actual address range that the memory from a particular chip or bank of chips would be mapped to depended exclusively on how the arbitrating decoding logic was wired to the Address Bus, the mapping layout of a specific platform was pretty much static by design (In some cases, the decoding logic of certain Devices could be configurable, but that was not the case for those that were part of the base platform). The complete platform mapping layout eventually becomes a valuable data construct known as the Memory Map, which should include information about which address ranges have something present in them, and what their uses are going to be.
It is also important to mention that an individual address may not actually be unique. For example, to reduce costs, some platform designers could purposely implement incomplete address decoding logic that only does partial decoding of the CPU Address Bus. Such decoding logic typically leaves some upper Address lines not connected (The same scenario as if you were directly wiring a memory chip to a wider Address Bus), so whenever the CPU sent an address through the Bus, the incomplete decoding logic would just consider the lower Bits of the address and ignore the value of the upper Bits. This caused the chips behind such incomplete decoders to respond to address ranges not really meant for them, as long as the lower Bits of the full address matched what the decoder logic could understand. In other words, the same Byte of memory in a chip could respond to multiple individual addresses, as if they were aliases of the intended mapped one. This was bad for forward compatibility, since sometimes programmers could decide to use an alias instead of the real address, so if in a later platform revision the address decoding logic was improved, the individual address that used to be an alias would now point to a completely different location, causing the Software piece that used that trick to not work as intended in the new platform.
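A toy model may help to visualize both behaviors (a sketch only; the function names, the 8 KiB ROM size and the chosen ranges are mine, picked for illustration within a 20 Bits address space):

```c
#include <stdint.h>
#include <stdbool.h>

/* An 8 KiB ROM needs 13 address lines of its own (2^13 = 8192), so in a
   20 Bits address space the decoding logic has 7 upper Bits left to check. */

/* Full decoding: the glue logic compares all 7 upper Bits, so the Chip
   Select is asserted only for the intended 0xFE000-0xFFFFF range. */
static bool chip_select_full(uint32_t addr)
{
    return (addr & 0xFE000) == 0xFE000;
}

/* Partial decoding: a cheaper decoder that only checks the top 2 Bits.
   Now the same ROM also answers anywhere in 0xC0000-0xFFFFF, so addresses
   like 0xC0123 and 0xFE123 are aliases that hit the same ROM Byte. */
static bool chip_select_partial(uint32_t addr)
{
    return (addr & 0xC0000) == 0xC0000;
}

/* Only the lower 13 Bits actually reach the chip; the rest are not wired
   to it, which is why all the aliases land on the same internal location. */
static uint32_t rom_internal_addr(uint32_t addr)
{
    return addr & 0x1FFF;
}
```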
In addition to that, the Segmented Memory Model used by x86 Processors already allowed something similar to address aliases on its own, as it was possible to use multiple different combinations of Segment and Offset pairs that effectively interacted with the same Physical Memory Address. As programmers of the era often had to use x86 Assembler to get the most performance out of the CPU, it was a necessity to have a very clear idea about both the x86 Segmented Memory Model and the specific platform address decoding scheme idiosyncrasies to know which physical address a piece of code could really be pointing to, more so if a developer was intentionally trying to obfuscate the code as much as possible for things like making Software copy protection harder to crack.
There is some degree of flexibility regarding what can be mapped into the Address Space, which in every case is some form of memory. As explained before, the x86 Processors had two separate Physical Address Spaces, the Memory Address Space and the I/O Address Space. The Memory Address Space was intended for addressing both RAM and ROM memory, but there was a very important detail: The memory could be local, as in the case of the system RAM memory that served as the personal workspace of the CPU, or it could be remote, like when the memory is the workspace of some other Processor or Device. This second case is known as MMIO (Memory Mapped I/O), as the Memory Addresses used by the main CPU to directly address remote memory can be effectively used to transfer data between the CPU and another chip (How the commands and data get routed to and from the remote memory, and how the other chip notices the externally initiated operation, are another matter). In the case of the I/O Address Space, since it was intended to be used exclusively to address other chips' internal Registers, anything in that address space was considered PMIO (Port Mapped I/O). A single Device could use both MMIO and PMIO addressing, which is the typical case for Devices that have both internal Registers and their own local RAM or ROM memory.
The advantage of MMIO over PMIO is that the CPU could use standard general purpose instructions like MOV to either read and write local memory or do I/O to remote memory, while PMIO required the use of the already mentioned special instructions, IN and OUT. The drawback of MMIO is that since it takes addresses from the Memory Address Space, you have to sacrifice how much RAM or ROM you can address, which may not be important at first, but becomes a problem as the Address Space gets crowded. Nowadays MMIO is universally preferred, using a single unified Address Space (At least from the point of view of the Processor).
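The contrast is easy to see in code (a sketch; it reuses the kind of outb() helper shown earlier and assumes a Real Mode or identity mapped environment where the CGA framebuffer can be dereferenced directly at its usual physical address):

```c
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* MMIO: the CGA text framebuffer is just memory to the CPU, so an ordinary
   MOV (here, a pointer write) is enough to put a character on screen. */
static void mmio_example(void)
{
    volatile uint8_t *cga = (volatile uint8_t *)0xB8000;
    cga[0] = 'A';     /* character */
    cga[1] = 0x07;    /* attribute: light grey on black */
}

/* PMIO: the CGA Mode Control Register lives in the I/O Address Space, so
   there is no way to reach it with MOV; it takes an OUT to its I/O Port. */
static void pmio_example(void)
{
    outb(0x3D8, 0x09);   /* 80x25 text mode with video output enabled */
}
```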
Curiously, I never understood why Intel decided that x86 had to have two Address Spaces, since the 8086/8088 used 20 Address lines plus the IO/M line, for a total of 21 lines and a combined 1088 KiB of addressable memory (1024 KiB Memory and 64 KiB I/O). If instead Intel had decided to use all 21 lines for the Address Bus, it would have yielded 2048 KiB (2^21) of addressable memory, and that shouldn't have been much harder to implement given that x86 already relied on a Segmented Memory Model anyways. I think that it is related to the fact that the x86 ISA is a descendant of the Intel 8085 CPU, which also had two Address Spaces. While the 8086 is not Binary Compatible with the 8085, Intel intended for it to be Source Compatible, as the idea was that the primitive automatic translation tools available at the time could easily port Software between both ISAs. A single address space would have required complete Ports of 8085 Software. Had Intel started the x86 ISA from scratch with no backwards compatibility of any sort, chances are that it would be different in this regard.
There are two more details worth mentioning now that are related to the Address Space. The first is that when the Processor is turned on, the very first thing that it does is attempt to read code beginning from a hardcoded Physical Memory Address (In the 8086/8088, the location was near the end of the addressable 1024 KiB), so it is mandatory that there is some memory mapped into that location containing executable code that can initialize the rest of the platform to a usable state, a procedure known as Bootstrapping. For obvious reasons, such memory can't be volatile like RAM, so you're pretty much limited to storing the bootstrapping executable code in a ROM type memory chip. The contents of that critical ROM chip are known as the Firmware. The second thing is that not all the memory in a system has to be directly addressable by the CPU; sometimes there is memory that is addressable by another Device or Coprocessor but that isn't directly visible from the main CPU point of view at all.
When it comes to interfacing with other chips, there is yet another important characteristic of the 8088 and 8086 CPUs: They had a multiplexed external Bus. Basically, a Bus is a composite of three specialized Buses called Data, Address and Control, which perform different slices of the same task. In an ideal scenario, these three Buses would be fully independent entities where each signal has its own Pin in the chip package and wire line in the Motherboard, but that wasn't the case with the 8088 and 8086. In order to reduce the Pin count of the CPUs so that they could fit in a standard and cost effective 40 Pin package, some signals had to share the same Pins. In the case of the 8088 CPU, its 8 Bits Data Bus would require 8 lines, but instead of getting its own 8 Pins, the CPU had an internal Muxer unit that multiplexed these 8 lines onto the same Pins that were used by 8 of the 20 lines of the 20 Bits Address Bus. For the 8086 CPU and its 16 Bits Data Bus, the 16 lines got multiplexed onto 16 of the 20 Pins used by the Address Bus.
Due to the multiplexed signals, what the 8086/8088 CPUs could be sending through the external Bus at a given moment may be either an address or data, according to which step of the Bus Cycle the CPUs were currently executing. In the same style as the IO/M line was used to signal whether an operation targeted a Memory or an I/O address location, the Processors had a line known as ALE, which was used to differentiate whether what was currently on the Bus was an address or data. However, whatever was at the receiving end of the Bus had to be aware of this line so that the Bus contents could be interpreted correctly. This pretty much means that both CPUs could only be directly interfaced with other chips that explicitly supported their particular Bus Cycle protocol and multiplexing scheme; for anything else, you needed at least intermediate chips to demultiplex the Address and Data signals back into separate lines. Also, as the Processor uses some of its transistor budget on an extra Muxer unit, plus the Bus Cycle has to be longer than it could actually be just to accommodate two different signals on the same lines, multiplexing the external Bus incurs a measurable performance penalty compared to a similar design that didn't have limitations due to the amount of available Pins.
Both the 8088 and 8086 CPUs could be potentially used as single main Processors in a mode known as Minimum Mode, where they generated by themselves all the Control Bus signals required to drive other chips. The Control Bus was also multiplexed, but demultiplexing it with an intermediary chip acting as glue logic was quite easy. An example of a simple homebrew design based around an 8088 CPU in Minimum Mode that implements the required Address, Data and Control Bus demultiplexers is here. However, this is not what IBM used for the PC...
Chip vendors like Intel didn't just make CPUs, typically they also provided a multitude of complementary support chips that dramatically enhanced their main Processor capabilities, enough to make an extensible computer platform based around them. Both the 8088 and 8086 CPUs belonged to the Intel MCS-86 family of chip sets (Now you know where the Chipset word is derived from), which were intended to interface with them (Note that the previous link includes 80286 and 80386 era parts that aren't really a good match for first generation MCS-86, but for some reason they are included). They were also rather compatible with the previous chip set generation, the Intel MCS-85, albeit with some caveats, as those were intended for the older Intel 8080 and 8085 CPUs, which had only an 8 Bits Data Bus and a 16 Bits Address Bus.
A chip set not only included support chips for the main Processor, there were also major chips considered to be of a Coprocessor class, like the Intel 8087 FPU and the Intel 8089 IOP. The 8088 and 8086 CPUs supported Multiprocessing (It had a different meaning back then than what you would think about now), allowing them to coordinate with these chips. However, Multiprocessing required a more complex signaling scheme than what the previously mentioned Minimum Mode could do. More signals would require more Pins, but, as you already know, that was impossible to do due to the 8086/8088 packaging constraints. The solution was to heavily multiplex the Control Bus signals into only 3 Pins, now known as S0, S1 and S2, then add a specialized discrete demultiplexer, the Intel 8288 Bus Controller, that could interpret that set of 3 lines as 8 different commands (2^3 makes for a total of 8 different binary combinations). This Multiprocessing mode was known as Maximum Mode, and is how IBM set up the 8088 CPU in the PC.
The advantage of using parts from the same chip set family is that you could expect them to easily interface with each other. Members of the MCS-86 family that interacted with the CPU Address and Data Buses typically supported out of the box either or both of the Bus Cycle protocol and the multiplexed Bus protocol of the 8088 and 8086 CPUs. For example, the 8087 FPU Address and Data lines could be pretty much directly wired to those of the 8086 CPU without demultiplexers or additional circuitry other than the 8288 Bus Controller. Moreover, the 8087 FPU was capable of autodetecting whether the CPU was an 8088 or an 8086 to automatically adjust its Data Bus width. Other chips usually required some external glue logic, so using chips belonging to the same family was far more convenient than attempting to mix and match functionally equivalent chips from other families.
The chips from the older MCS-85 family partially supported being interfaced with the MCS-86 family, as they shared the same Bus protocol and multiplexing scheme, but for the most part, they required some glue logic due to the Bus width differences. As they were intended for the 8080/8085 CPUs' multiplexed 8 Bits Data and 16 Bits Address Bus, they were usually easier to interface with the 8088, which also had an 8 Bits Data Bus, than with the 8086 and its 16 Bits one (This is one of the reasons, along with the cheaper cost and higher availability of the older support chips, that made IBM pick the 8088 CPU instead of the 8086). The Address Bus always required glue logic, as the 8086/8088 CPUs supported a 20 Bits Address Bus whereas the older support chips intended for the 8080/8085 CPUs had just 16 Bits. However, even with the extra glue logic to make the older chips functional enough, they had diminished capabilities. For example, the IBM PC used an Intel 8237A DMAC that, due to its native 16 Bits Address Bus, was limited to a 64 KiB data transfer at a time, which was the entirety of the Physical Memory Address Space of the 8080/8085 CPUs that it was intended for, yet only 1/16 of the address space of an 8088.
The major chips from the Intel MCS-86 family, besides the 8088 and 8086 CPUs (Central Processing Unit), were the 8087 FPU (Floating Point Unit. Also known as NPU (Numeric Processor Unit), but FPU should be the modern term), which focused on floating point math (It was possible to emulate floating point operations using standard CPU integers via Software, but the FPU was ridiculously faster), and the 8089 IOP (I/O Processor), which could be considered a glorified DMAC with some processing capabilities added in. Both of those chips had their own ISAs, so besides the Instruction Set of the x86 CPUs, you had those of the x87 FPUs and x89 IOPs, too. The less extravagant support chips included the already mentioned 8288 Bus Controller, which was a specialized Control Bus demultiplexer required when using the 8086/8088 in Maximum Mode, the 8289 Bus Arbitrator, which made it possible to build Bus topologies with multiple Bus Masters (It seems to have been used mostly for Intel Multibus based systems, no idea if they were ever used in IBM PC expansion cards or IBM PC compatible computers), and the 8284A Clock Generator, which could generate the clock signals electrically required by all the mentioned Intel chips.
The Intel MCS-85 family was based around the Intel 8080 or 8085 CPUs as the main Processor, which is not interesting for the purposes of this story. The support chips, however, are extremely important. The four relevant ones are the 8259A PIC (Programmable Interrupt Controller), the previously mentioned 8237A DMAC (Direct Memory Access Controller), the 8253 PIT (Programmable Interval Timer), and the 8255 PPI (Programmable Peripheral Interface), all of which would be used by the IBM PC. With the exception of the 8255 PPI, the other three would manage to become the staple support chips that defined the IBM PC platform and all its successors.
For the IBM PC platform, IBM mixed parts from both the MCS-86 and MCS-85 families. From the MCS-86 family, IBM picked the Intel 8088 CPU itself, the Intel 8087 FPU as an optional socketed component, the Intel 8288 Bus Controller required for Maximum Mode, and the Intel 8284A Clock Generator. From the MCS-85 family, IBM picked all the mentioned ones, namely the Intel 8259A PIC, the Intel 8237A DMAC, the Intel 8253 PIT and the Intel 8255 PPI. Surprisingly, IBM fully omitted the Intel 8089 IOP from the IBM PC design, even though it was one of the advanced support chips from the MCS-86 family that Intel suggested pairing its 8088 or 8086 CPUs with. That is why the 8089 has been completely forgotten.
The support chips are part of the core of the IBM PC. Thanks to backwards compatibility, with the exception of the 8255 PPI, their functionality should be at least partially present even in modern computers. All of them were mapped into the CPU I/O Address Space and accessed via PMIO, so the CPU could directly interact with them.
Intel 8259A PIC (Programmable Interrupt Controller): The x86 Processor architecture is Interrupt-driven (Also known as Event-driven). In an Interrupt-driven architecture, there is a framework for external Devices to send the Processor a request for immediate attention, which is done by signaling an Interrupt Request. When the Processor receives an interrupt request, it stops whatever it was doing, saves its state as if the current task was put on hold, and switches control to an ISR (Interrupt Service Routine, also known as Interrupt Handler) to check the status of the Device that made the interrupt request. After the ISR services the Device, the Processor resumes the previous task by restoring the saved state. An Interrupt-driven architecture requires additional Hardware to work, as there is an out-of-band path between the Processor and the Devices so that they can trigger interrupts while skipping the Bus. A problem with interrupts is that they make application latency highly inconsistent.
The alternative to Interrupt-driven architectures are Polling-based ones, where the Processor polls the Devices' status continuously, as if it were executing an infinite loop. While technically Polling-based architectures are simpler than Interrupt-driven ones, since you don't require the dedicated Hardware to implement interrupts, it was considered that they were inefficient because they wasted too much time needlessly polling the Devices' status. However, there is nothing stopping an Interrupt-driven architecture from ignoring interrupts and instead doing polling only. Actually, someone tested that in x86 and found that it may be better in some scenarios (TODO: Broken link, found no replacement). I suppose that polling could be preferred for Real Time OSes, as it should provide a very constant or predictable latency, assuming you don't mind the extra power consumption due to the Processor being always awake and working.
The 8088 CPU supported three different types of Interrupts: Exceptions, Hardware Interrupts, and Software Interrupts. All of them relied on the same IVT (Interrupt Vector Table), which held 256 entries, with each entry containing a pointer to the actual ISR. The Interrupts relevant here are the external Hardware Interrupts, which were of two types: The standard Maskable Interrupts, and NMIs (Non-Maskable Interrupts). The difference between them is that Maskable Interrupts can be fully ignored if the Software being run configures the Processor to do so, while NMIs can't be ignored at all and will always be serviced. To manage the standard interrupts, the 8088 had a pair of dedicated Pins, INTR (Interrupt Request) and INTA (Interrupt Acknowledged), which were used to receive interrupts and acknowledge them, respectively. NMIs had their own dedicated Pin to receive them, NMI, but there was no corresponding acknowledge Pin. Note that NMIs are highly specialized and pretty much reserved for niche purposes (Usually to signal Hardware errors), so they don't get a lot of attention. Also, while NMIs can't be ignored by the Processor itself, they can be externally disabled if you have a way to block the NMI line, like via a Jumper.
The 8088, by itself, was capable of receiving standard Maskable Interrupts from just a single Device, since it had only one interrupt request line, the already mentioned INTR. This is precisely where the Intel 8259A PIC comes in handy. The purpose of the PIC was to act as an interrupt multiplexer: it was connected to the Processor INTR and INTA lines and could use them to fan out to 8 interrupt lines, thus up to 8 Devices could use interrupts instead of only a single one directly wired to the CPU. The IRQ (Interrupt Request) lines were known as IRQs 0-7 and had priority levels, with 0 being the highest priority interrupt and 7 the lowest one. Each IRQ was just a single line, not two, since there was no interrupt acknowledge line between the PIC and the Devices. There could be more than one 8259A PIC in a system, with one master and up to 8 slaves, for a total of 64 IRQs. The slave PICs were cascaded, each one taking an IRQ line from the master PIC. Note that the 8259A PIC Datasheet mentions cascading, but not daisy chaining, so it seems that you can't have three or more PICs in a three-level arrangement, where a slave PIC hooked to an IRQ of the master has its own slave hooked to one of its IRQs.
When using the 8259A PIC, whenever a Device signaled an interrupt, the PIC received it via an IRQ line, then the PIC signaled its own interrupt to the Processor, which received it via INTR. The Processor, in turn, activated the INTA line to acknowledge the PIC request. Then, the PIC told the Processor which IRQ the Device interrupt was coming from, so that it could switch to the proper ISR for that IRQ. For more information about how Hardware Interrupts work in x86 based platforms, read here.
In the IBM PC, 6 of the 8 IRQs provided by the PIC (2-7, or all except 0 and 1) were free and made directly available to discrete Devices via the expansion slots. IRQs 0 and 1 were used by internal Motherboard Devices: The Intel 8253 PIT was wired to IRQ 0, and the Keyboard interface logic to IRQ 1. IRQ 6, while exposed in the expansion slots, was used exclusively by FDC (Floppy Disk Controller) type cards, and I'm not aware of other cards that allowed you to set a Device to that IRQ. Meanwhile, the 8088 NMI line was used for error reporting by the Memory subsystem, but for some reason, the 8087 FPU was also wired to it instead of a standard PIC IRQ as Intel recommended.
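As a concrete example of how Software talked to the 8259A in the IBM PC, here is a minimal C sketch using its well known I/O Ports 0x20 (command) and 0x21 (data, where the Interrupt Mask Register lives); the function names are mine, and the inb()/outb() pair is the same kind of wrapper sketched earlier:

```c
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

static inline uint8_t inb(uint16_t port)
{
    uint8_t v;
    __asm__ volatile ("inb %1, %0" : "=a"(v) : "Nd"(port));
    return v;
}

#define PIC_CMD  0x20   /* 8259A command Port in the IBM PC I/O Map */
#define PIC_DATA 0x21   /* 8259A data Port (Interrupt Mask Register) */

/* Mask (disable) a single IRQ line by setting its Bit in the IMR,
   without disturbing the mask state of the other seven lines. */
static void pic_mask_irq(uint8_t irq)
{
    outb(PIC_DATA, inb(PIC_DATA) | (1 << irq));
}

/* Non-specific EOI (End Of Interrupt): tells the PIC that the interrupt
   currently being serviced is done, so it can deliver the next one. */
static void pic_send_eoi(void)
{
    outb(PIC_CMD, 0x20);
}
```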
Intel 8237A DMAC (Direct Memory Access Controller): On a bare Intel 8088 CPU based platform, any data movement between Devices and a memory address has to be performed by the CPU itself. This means that instead of doing the actual compute work, the CPU spends a lot of its time moving data around as if it was the delivery boy of the Devices. Supporting DMA means that there is a way for data movement to happen without the direct intervention of the CPU, offloading such memory transactions from it.
In this primitive era, most chips were rather dull and simple; they just limited themselves to very specific functions. Having Devices like a Floppy Disk Controller or a Sound Card able to independently initiate a memory transaction was theoretically possible, but expensive in both cost and physical size due to the amount of extra logic required to add such a complex function. The solution was to add a shared controller that did so on behalf of multiple Devices, providing a means to perform DMA without needing to have such capabilities integrated into each Device itself. This is known as Third-party DMA, where you have a DMAC that provides DMA services to Devices. Note that a parallel Bus would still be shared between the CPU and the DMAC, so only one of the two could use it at a given moment. The general idea was that the DMAC would help to move data in the background while the CPU was busy executing instructions that didn't require it to access the Bus, but sometimes it could be a hindrance if the DMAC didn't want to relinquish Bus control to the CPU while the latter was waiting for data.
Between the DMAC and DMA supporting Devices there was an out-of-band path similar to IRQs. This path was known as a DMA Channel, and included two lines, DREQ (DMA Request) and DACK (DMA Acknowledge), which were used by the Device to signal a DMA request and by the DMAC to acknowledge that it was ready to service it, respectively. When a Device wanted to perform DMA, it triggered its DMA Channel to tell the DMAC to do a memory transaction. However, because the Devices themselves were rather dumb, the only thing that they could do was to make a request, but they couldn't tell the DMAC what to do when receiving it. There had to be a previous setup period as a Device Driver had to program the DMAC ahead of time so that it knew what operation it had to do when it received a DMA request from a particular Device DMA Channel, something similar in nature to an ISR.
The 8237A DMAC offered 4 DMA Channels, so up to 4 Devices could be serviced by it. The DMA Channels were numbered 0 to 3 and, like the IRQs, they had different priority levels, with 0 being the highest priority channel and 3 the lowest. An important detail of the 8237A is that it was a flyby type DMAC, where flyby means that it could simultaneously set a Device to read or write to the Bus and the memory to accept such a transaction in the same clock cycle, so it didn't actually have to buffer or handle the data itself to perform data movement. Like the 8259A PIC, there could be multiple 8237A DMACs in a system. The supported configurations were cascading up to 4 8237A DMACs off the 4 DMA Channels of a master 8237A in a two-level arrangement, or daisy chaining, where a master 8237A had another 8237A on a DMA Channel, and this second 8237A had a third one behind it, forming a three-level arrangement (Or more if required) with 3 Devices and 1 slave DMAC per level. Cascading and daisy chaining could also be mixed, with no limit mentioned in the 8237A Datasheet.
In the IBM PC, the 8237A DMA Channel 0 was used as a means to initiate a dummy transfer that refreshed the contents of the DRAM (Dynamic RAM) chips, as the IBM PC Motherboard lacked a proper dedicated DRAM Memory Controller. The DMA Channel 0 signals were used to refresh both the DRAM located on the Motherboard and that on Memory expansion cards too, since there was a Pin on the expansion slots, B19, that made the DACK0 line available to expansion cards, albeit there was no DREQ0/DRQ0 line exposed for cards to initiate requests. DMA Channels 1-3 were free, and both their DREQ and DACK lines were exposed in the IBM PC expansion slots for any Device on an expansion card to use, albeit DMA Channel 2 was considered to be exclusive for FDC cards. As with IRQ 6, which was also reserved for FDC type cards, no other expansion card type that I'm aware of allowed you to set it to use that DMA Channel.
It is important to mention that the 8237A, as used in the IBM PC, was severely handicapped due to being intended for the previous Processor generation. While IBM added additional glue logic, known as the Page Registers, to manage the upper 4 address lines so that the 8237A could cover the full 8088 20 Bits width, its lack of native support for such width gave it many limitations. The 8237A was limited to transferring at most 64 KiB in a single transaction, and the transactions could operate only within aligned 64 KiB Segments (0-63 KiB, 64-127 KiB, etc) because the DMAC couldn't modify the Page Registers by itself. Some of the 8237A issues were specific to the IBM PC implementation. For example, IBM decided to wire the EOP (End of Process) line as Output only instead of Input/Output, so it is not possible for an external source to tell the 8237A to abort a DMA operation; it just uses the line to signal when it has finished. The 8237A was also able to do memory-to-memory transfers, but that required DMA Channels 0 and 1 to be available, which was not possible in the IBM PC due to DMA Channel 0 being used for the DRAM refresh procedure.
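To make the Page Register limitation more tangible, here is a sketch of how DOS-era code typically programmed DMA Channel 2 (the FDC channel) for a Diskette read, using the conventional IBM PC I/O Ports of the 8237A and of the Channel 2 Page Register; the function name is mine and the outb() helper is the same kind of wrapper sketched earlier:

```c
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* Program DMA Channel 2 to receive 'bytes' Bytes from the FDC into the
   20 Bits physical address 'phys' (Device-to-memory, single transfer mode). */
static void dma_setup_floppy_read(uint32_t phys, uint16_t bytes)
{
    uint16_t count = bytes - 1;          /* the 8237A counts length - 1        */

    outb(0x0A, 0x06);                    /* mask Channel 2 while reprogramming */
    outb(0x0C, 0x00);                    /* reset the address/count flip-flop  */
    outb(0x0B, 0x46);                    /* single mode, write to memory, ch 2 */

    outb(0x04, phys & 0xFF);             /* address Bits 0-7   (8237A)         */
    outb(0x04, (phys >> 8) & 0xFF);      /* address Bits 8-15  (8237A)         */
    outb(0x81, (phys >> 16) & 0x0F);     /* address Bits 16-19 (Page Register) */

    outb(0x05, count & 0xFF);            /* count low Byte                     */
    outb(0x05, (count >> 8) & 0xFF);     /* count high Byte                    */

    outb(0x0A, 0x02);                    /* unmask Channel 2                   */
}
```

Since the 8237A only increments the lower 16 Bits while the Page Register stays fixed, a transfer whose physical address plus length crosses a 64 KiB boundary wraps around inside the same 64 KiB Segment instead of continuing into the next one, which is exactly the alignment limitation described above.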
While having the 8237A DMAC was better than nothing (And not that bad by 1981 standards), in a few years it would become a burden due to its performance being nearly impossible to scale up. Several people have already wondered how different the story of DMA on IBM PC platforms would have been had IBM decided to go with the Intel 8089 IOP instead of the previous generation 8237A. Besides the facts that the 8089 IOP had only 2 DMA Channels compared to the 8237A's 4 and that it was much more expensive, it may eventually have saved the entire industry a lot of headaches.
Intel 8253 PIT (Programmable Interval Timer): A computer by itself has no concept of time. At most, what it can do is simply count elapsed clock cycles. Because the clock speed that a given part is running at is a known value, it is possible to externally infer the real time elapsed based on a clock cycle counter, at which point you have the first building block to count seconds (Or fractions of them). While theoretically the CPU can count clock cycles, it would be a waste for it to do so, since it means that it wouldn't be able to do anything else without completely desynchronizing. For this reason, there were dedicated timers whose only purpose was to count cycles without interruptions, as they were required to be as consistent as possible for any timing measurement to be accurate.
The 8253 PIT was an interesting chip due to the amount of functions it could do, which exceeded those of a simple timer. Actually, it had three independent timers, known as Counters 0, 1 and 2. All three timers were directly usable by Software, as you could program every how many clock cycles they had to tick, then just read back their values. Each Counter also had both an input GATE line and an output OUT line, the latter of which could be independently triggered by that Counter to allow the 8253 to directly interface with other Hardware.
The IBM PC not only used all three 8253 Counters, it also used all their external OUT lines. Counter 0 was used as the System Timer to keep track of elapsed time, with its OUT line hooked directly to IRQ 0 of the 8259A PIC so that it interrupted the CPU to update the clock at regular intervals with the highest Interrupt priority. Counter 1 was used as a DRAM refresh timer, with its OUT line wired directly to the DRQ0 Pin of the 8237A DMAC to request, also at regular intervals, a dummy DMA transfer that refreshed the DRAM memory. Last but not least, Counter 2's OUT line passed through some glue logic to reach the PC Speaker. In addition to that, Counter 2 had its input GATE line wired to the Intel 8255 PPI, whereas the other two Counters didn't have theirs connected. Both the 8253 PIT and 8255 PPI could be used either individually or in tandem to produce noises via the PC Speaker.
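Programming a Counter is a matter of writing a control word and a divisor to the PIT's I/O Ports (0x40-0x43 in the IBM PC). As a hedged sketch, this is the classic way to make Counter 2 generate a square wave for the PC Speaker, based on the PIT's 1193182 Hz input clock (the 14.31818 MHz crystal divided by 12); the function name is mine and the outb() helper is as sketched earlier:

```c
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* Set Counter 2 to output a square wave of roughly freq_hz Hertz. */
static void pit_set_speaker_tone(uint32_t freq_hz)
{
    uint16_t divisor = (uint16_t)(1193182u / freq_hz);

    outb(0x43, 0xB6);                    /* control word: Counter 2, lo/hi Byte, Mode 3 (square wave) */
    outb(0x42, divisor & 0xFF);          /* divisor low Byte  */
    outb(0x42, (divisor >> 8) & 0xFF);   /* divisor high Byte */
}
```

Counter 0, the System Timer, is programmed the same way through Ports 0x43/0x40; left at its maximum divisor of 65536, it fires IRQ 0 about 1193182 / 65536 = 18.2 times per second, the familiar DOS timer tick. Note that for the tone to actually be heard, the Speaker still has to be ungated through the 8255, as shown further below.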
Intel 8255 PPI (Programmable Peripheral Interface): As mentioned before, the CPU external Bus had a protocol of sorts, which means that anything that directly interfaced with it had to understand that protocol. Yet, when it came to interfacing with external peripherals, instead of adapting them to understand the CPU Bus protocol, designers seemed to always opt for some form of GPIO (General Purpose I/O). GPIO can be used to make protocol agnostic interfaces where you can bit-bang raw data in and out, and leave Software like a Device Driver to interpret the meaning of the raw Bits. It could be considered a raw Bus with a user-defined protocol. Obviously, there were specialized intermediary chips that interfaced with the CPU Bus protocol to provide such GPIO.
Due to my lack of electronics knowledge, I don't actually understand what GPIO truly did to be useful and earn its place in a platform design. There is technically nothing that forbids you from providing direct access to the CPU Bus via an external connector if you wanted to do so (Albeit stretching a circuit that much may hurt signal integrity if the cables are too long or of poor quality, causing a severely limited maximum possible stable clock speed, but the point is that it would still work). IBM itself did that with the IBM 5161 Expansion Chassis addon for the IBM PC 5150 and IBM PC/XT 5160, which was pretty much a separate Computer Case that was cabled to the main computer and provided more room for expansion slots. I assume that bit-banging through GPIO is just far easier to implement in simple peripherals than a direct CPU Bus interface, and it would also be completely neutral in nature, thus easier to interface with other computer platforms. Nowadays, a lot of what previously used to be done via GPIO interfaces is done via USB, which is a well defined protocol.
The Intel 8255 PPI provided a total of 24 GPIO Pins arranged as 3 Ports of 8 Pins each, named Ports A, B and C. Port C could also be halved into two 4 Pin sets to complement Ports A and B, with the extra lines having predefined roles like generating interrupts on behalf of each Port (Which is not very GPIO-like...). Like the 8253 PIT, the 8255 PPI had several possible operating modes to cover a variety of use cases.
In the IBM PC, the 8255 PPI was used for a myriad of things, which makes explaining all its roles not very straightforward. Actually, MinusZeroDegrees has plenty of info about the details of each 8255 GPIO Pin role in both the IBM PC and its successor, the IBM PC/XT.
The easiest one to describe is Port A. Port A's main use was as the input of the Keyboard interface, doing part of the job of a Keyboard Controller with the help of additional logic between it and the Keyboard Port (Albeit the 8255 PPI is not a Keyboard Controller in the proper sense of the word, as it just provides a generic GPIO interface. It could be said that it is as much of a Keyboard Controller as the 8253 PIT is a Sound Card: they were not designed for those roles, they were just parts of those subsystems' circuitry). While inbound data from the Keyboard generated interrupts on IRQ 1, the 8255 itself didn't signal those with its built-in interrupt logic, because it was not wired to the 8259A PIC at all. Instead, the glue logic that did the serial-to-parallel conversion of the incoming Keyboard data, so that it could be fed to the 8255, was also wired to the PIC and signaled the interrupts. Port A also had some auxiliary glue logic that wired it to a set of DIP Switches in the Motherboard known as SW1, and whether Port A was getting its input from the DIP Switches or from the Keyboard depended on the status of a configuration Pin on Port B.
Port B and Port C are where the fun stuff happens. Their jobs included interacting with the 8253 PIT, the PC Speaker (Both directly, and indirectly through the PIT), the Cassette interface, another set of DIP Switches known as SW2, and part of the system RAM error detection logic; Port B even had a Pin used to control whether Port A was reading the Keyboard or the SW1 DIP Switches. The Pins' roles were completely individualized yet mixed within the same Port, unlike Port A. Here comes a major detail involving such an implementation: The CPU could only do 8 Bits Data Bus transactions, which means that you couldn't interact with just a single Bit in a Port. Any operation would have it either reading or writing all 8 Bits at once (I think the 8255 PPI supported doing so in BSR Mode, but that mode was only available for Port C, and I'm unsure whether the IBM PC had Port C configured that way anyways). As you couldn't arbitrarily change the value of just a single Bit without breaking something, it was critical that whenever you wanted to do things like sending data to the PC Speaker, you first loaded the current value of the entire Port into a CPU GPR, modified the Bit you wanted without altering the others, then wrote it back to the Port.
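This read-modify-write pattern is easy to show in code (a sketch: it targets Port B at its conventional IBM PC I/O Port 0x61, where Bits 0 and 1 gate 8253 Counter 2 and the PC Speaker; the function names are mine and the inb()/outb() helpers are the same kind of wrappers sketched earlier):

```c
#include <stdint.h>

static inline uint8_t inb(uint16_t port)
{
    uint8_t v;
    __asm__ volatile ("inb %1, %0" : "=a"(v) : "Nd"(port));
    return v;
}

static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* Ungate Counter 2 and enable the Speaker data line (Bits 0 and 1 of
   Port B) without touching the other six Bits, which in the IBM PC
   handle the unrelated jobs listed above. */
static void speaker_on(void)
{
    uint8_t port_b = inb(0x61);      /* load the whole Port into a GPR */
    outb(0x61, port_b | 0x03);       /* set only Bits 0-1              */
}

/* The reverse operation: clear only Bits 0-1, keep the rest intact. */
static void speaker_off(void)
{
    uint8_t port_b = inb(0x61);
    outb(0x61, (uint8_t)(port_b & ~0x03));
}
```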
A major design decision of the IBM PC was the organization of the 8088 CPU Address Spaces. As previously mentioned, getting something properly mapped into the Address Space required the help of auxiliary address decoding glue logic, so everything present on a Bus had to be behind one. This applied not only to the Motherboard built-in memory and Devices, but to the expansion cards, too, as each required its own decoding logic. It was a rather complex situation when you consider that every address decoding logic in the system was listening to the same unified parallel Bus (The reason why comes in the next chapter), so it was extremely important that there were no address conflicts in anything plugged into the IBM PC. As the platform designer, IBM had to take on the task of explicitly defining the addresses or address ranges that the platform built-in memory and Devices would be mapped to and which ranges were free to use by expansion cards, so that there was no overlapping between any of them. This is the basic concept behind a Memory Map.
The IBM PC 5150 Technical Reference Manual includes the IBM defined Memory Map in Pages 2-25, 2-26 and 2-27, while the I/O Ports Map is in Pages 2-23 and 2-24. These are unique to the IBM PC, or, in other words, different from the Memory and I/O Ports Maps of any other 8086/8088 based platforms. As the address decoding logic that took care of the support chips in the IBM PC Motherboard was hardwired to use the IBM defined addresses, the mapping for them was absolutely fixed. The expansion cards, depending on the manufacturer's intentions, could have either fixed address decoding logic, or logic configurable via Jumpers or DIP Switches.
The IBM PC Memory Map, as can be seen in the previously mentioned tables, was almost empty. However, even by that early point, IBM had already taken a critical decision that would have an everlasting impact: It partitioned the 1 MiB Memory Address Space into two segments, one that occupied the lower 640 KiB (0 to 639 KiB), which was intended to be used solely for the system RAM (Be it located either on the Motherboard or in expansion cards), and another segment that occupied the upper 384 KiB (640 KiB to 1023 KiB), which was intended for everything else, like the Motherboard ROMs, and the expansion cards' RAMs and ROMs as MMIO. These segments would be known in DOS jargon as Conventional Memory and the UMA (Upper Memory Area), respectively. This is where the famous 640 KiB Conventional Memory limit for DOS applications comes from.
The contents of the Conventional Memory are pretty much Software defined, with a single exception: The very first KiB of the Conventional Memory is used by the CPU IVT (Interrupt Vector Table), which has 256 Interrupt entries, each 4 Bytes in size. Each entry (Vector) was a pointer to an Interrupt ISR (Interrupt Service Routine). From these 256 entries, Intel used only the first 8 (INT 0-7) for 8086/8088 internal Exceptions, marked the next 24 (INT 8-31) as reserved for future expansion, then left the rest available for either Hardware or Software Interrupts. IBM, in its infinite wisdom, decided to start mapping its ISR pointers beginning at INT 8, which was reserved. This would obviously cause some issues later on, when Intel expanded the possible Exception Interrupts and they overlapped with IBM's assignments.
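As a quick sketch of how the IVT is laid out, each Vector is just a 4 Byte far pointer stored starting at physical address 0, with the ISR offset in the first 2 Bytes and its Segment in the last 2. A small C fragment (my own illustration, not any IBM or Intel code) that computes where a Vector lives and what physical address it points to could look like this:

    #include <stdint.h>

    /* Where in the Memory Address Space vector N is stored: the IVT begins
       at physical address 0 and each entry is 4 Bytes */
    uint32_t ivt_entry_address(uint8_t vector)
    {
        return (uint32_t)vector * 4;
    }

    /* Given the two halves of an IVT entry (offset first, segment second),
       compute the 20 Bits physical address of the ISR it points to */
    uint32_t isr_physical_address(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 4) + offset;  /* segment * 16 + offset */
    }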
In the case of the UMA segment, the IBM defined Memory Map had several ranges that were either used, marked as reserved or explicitly free. The sum of the actually used ones was 68 KiB, consisting of all the Motherboard ROM plus two video framebuffers: 8 KiB (1016 KiB to 1023 KiB) for the Firmware ROM, which is the only thing that had to be located at a mandatory address as required by the 8088 CPU, 32 KiB (984 KiB to 1015 KiB) for the IBM Cassette BASIC, made out of four 8 KiB ROM chips, 8 KiB (976 KiB to 983 KiB) for an optional ROM chip on the Motherboard that went mostly unused, and two independent RAM based video framebuffers that were MMIO, 4 KiB (704 KiB to 707 KiB) for the MDA Video Card, and 16 KiB (736 KiB to 751 KiB) for the CGA Video Card. 124 KiB worth of addresses were marked as reserved but unused: Two 16 KiB chunks, one right above the end of Conventional Memory (640 KiB to 655 KiB) and the other just below the Motherboard ROM (960 KiB to 975 KiB), and 92 KiB intended for not yet defined video framebuffers, which were themselves part of a 112 KiB contiguous chunk that already had MDA and CGA on it (656 KiB to 767 KiB). Finally, there were 192 KiB explicitly marked free (768 KiB to 959 KiB). Note that the Video Cards were not built into the IBM PC Motherboard, they were always independent expansion cards, yet IBM defined a fixed, non overlapping mapping for both of them as part of the base platform.
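Since the prose above is dense, it may help to see the used UMA ranges rewritten in hex. The addresses below are just the KiB figures converted to hex, and the C table itself is of course my own summary sketch, not anything IBM defined:

    #include <stdint.h>

    /* The used ranges of the IBM PC UMA, as derived from the figures above */
    struct uma_range { uint32_t start, end; const char *use; };

    static const struct uma_range ibm_pc_uma[] = {
        { 0xB0000, 0xB0FFF, "MDA video framebuffer (4 KiB, MMIO)"      },
        { 0xB8000, 0xBBFFF, "CGA video framebuffer (16 KiB, MMIO)"     },
        { 0xF4000, 0xF5FFF, "Optional U28 ROM (8 KiB)"                 },
        { 0xF6000, 0xFDFFF, "IBM Cassette BASIC (32 KiB, 4 ROM chips)" },
        { 0xFE000, 0xFFFFF, "BIOS Firmware ROM (8 KiB)"                },
    };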
As early on IBM held the complete reins of the IBM PC platform, IBM itself decided when an address range stopped being free and was assigned or reserved for something else. Whenever IBM released a new type of Device in expansion card format, it typically also defined a fixed set of resources (Memory Addresses, I/O Ports, DMA Channel and IRQ line) that it would use, then enforced the usage of these resources by making its expansion cards fully hardwired to them, so any mapping was always fixed. As the IBM PC platform matured, the number of newly defined or otherwise reserved address ranges in the Memory Map grew as IBM released more types of expansion cards (For example, the Technical Reference Manual previously linked is dated August 1981; there is a later one from April 1984 that added a fixed address range for the ROM of the new Hard Disk Controller card). Sometimes, IBM could define two or more sets of resources if it intended more than one card of the same Device type to be usable in a system; those cards had Jumpers or DIP Switches to select between the multiple fixed sets of resources. An example is this IBM card that had a Serial Port with a DIP Switch to select between two sets of resources, which would become much better known as COM1 and COM2.
The IBM resource hardwiring practices would become a problem for third party developers, as they had to make expansion cards with a wide range of configurable resources to guarantee compatibility in the foreseeable scenario that IBM released a new card that used the same resources, or that another third party card did. As such, third party expansion cards typically had a lot of Jumpers or DIP Switches that allowed you to select at which address ranges you wanted to map whatever memory they had, which depending on the needs of the card, could include mapping RAM or ROM memory in different ranges of the UMA, and different I/O Ports. Same with IRQs and DMA Channels. You also had to configure the Software to make sure that it knew where to look for that card, as third party Devices had common defaults but these were not guaranteed, unlike IBM's fixed resource definitions for its cards.
While the reserved addressing range for system RAM allowed for a maximum of 640 KiB, the IBM PC Motherboard itself couldn't have that much installed onboard. During the life cycle of the IBM PC, two different Motherboards were used for it, whose main difference was the amount of onboard RAM that they supported. The first PC Motherboard version, known as 16KB - 64KB, supported up to 64 KiB RAM when maxed out with 4 Banks of 16 KiB, and the second Motherboard version, known as 64KB - 256KB, supported up to 256 KiB RAM when maxed out with 4 Banks of 64 KiB (The second Motherboard was included in a new revision of the IBM PC 5150, known as PC-2, released in 1983, around the same time as its successor, the IBM PC/XT 5160). In order to reach the 640 KiB limit (Albeit at launch IBM officially supported only 256 KiB, as that was the practical limit back then), you had to use Memory expansion cards. What made these unique is that they were intended to be mapped into the Conventional Memory range instead of the UMA, as you would expect from any other type of expansion card, which is why the memory on these cards could be used as system RAM.
An important consideration was that all system RAM had to be mapped as a single contiguous chunk; gaps were not accepted in the Conventional Memory mapping (This seems to NOT have been true in the original IBM PC 16KB - 64KB Motherboard with the first Firmware version, as I have actually read about it supporting noncontiguous memory. Chances are that that feature was dropped because it also required support from user Software, which would have made Memory Management far more complex). As such, all Memory expansion cards supported a rather broad selection of address ranges to make sure that you could always map their RAM right after the Motherboard RAM, or right after the RAM from another Memory expansion card. A weird limitation of the IBM PC is that it required all 4 Banks on the Motherboard to be populated before being able to use the RAM on Memory expansion cards, yet the later IBM PC/XT 5160 could use them even with only a single Bank populated.
A little known quirk is that while the IBM PC designers did not intend for there to be more than 640 KiB system RAM, nothing stopped you from mapping RAM into the UMA for as long as the address decoding logic of the Memory expansion card supported doing so. With some hacks to the BIOS Firmware, it was possible to get it to recognize more than 640 KiB Conventional Memory, and pass down this data to the Operating System. PC DOS/MS-DOS supported this hack out of the box since they relied on the BIOS to report the amount of system RAM that the computer had installed, they didn't check it by themselves. The problem was that the hack still maintained the limitation that Conventional Memory had to be a single contiguous segment, so how much you could extend the system RAM depended on which was the first UMA range in use.
Even in the best case scenarios, the beginning of the Video Card framebuffer was effectively the highest limit that the hack could reach: With CGA, it could go up to 736 KiB, with MDA, up to 704 KiB, and with later Video Cards like EGA and VGA, you couldn't extend it at all since their UMA ranges began exactly at the 640 KiB boundary. Mapping the video framebuffer higher was possible, but it required a specialized Video Card that allowed you to do so, further hacking of the Firmware, and even patching the user Software, so that it would use the new address range instead of blindly assuming that the video framebuffer was at the standard IBM defined range. Thus, while theoretically possible, it should have been even rarer to see someone trying to move the video framebuffers to gain more RAM as standard Conventional Memory due to the excessive complexity and specialized Hardware involved. These days it should be rather easy to demonstrate those hacks in emulated environments before attempting them on real Hardware.
I don't know if later systems could work with any of these hacks (Albeit someone recently showed that IBM's own OS/2 had built-in support for 736 KiB Conventional Memory if using CGA), but as the newer types of Video Cards with their fixed mappings beginning at 640 KiB became popular, the hacks ceased to be useful considering how much effort they required. As such, hacks that allowed a computer to have more than 640 KiB Conventional Memory were rather short lived. Overall, if you could map RAM into the UMA, it was just easier to use it for a small RAMDisk or something like that.
The physical dimensions of the IBM PC Motherboard depend on the Motherboard version. The IBM PC 5150 Technical Reference from August 1981 mentions that the Motherboard dimensions are around 8.5" x 11", which in centimeters would be around 21.5 cm x 28 cm. These should be correct for the first 16KB - 64KB Motherboard found in the PC-1. The April 1984 Technical Reference instead mentions 8.5" x 12", and also gives an accurate value in millimeters, rounding to 21.5 cm x 30.5 cm. Given that the second Technical Reference corresponds to the 64KB - 256KB Motherboard of the PC-2, both sizes should be true, just that IBM didn't specifically mention which Motherboard it was talking about in the later Technical Reference.
For internal expansion, not counting the 5 expansion slots, the IBM PC Motherboards had two major empty sockets: One for the optional Intel 8087 FPU, and another for an optional 8 KiB ROM chip (Known as U28) that had no defined purpose but was already mapped. For system RAM, both Motherboards had 27 sockets organized as 3 Banks of 9 DRAM chips each, with each Bank requiring to be fully populated to be usable. The 16KB - 64KB Motherboard used DRAM chips with a puny 2 KiB RAM each, whereas the 64KB - 256KB Motherboard used 8 KiB DRAM chips, which is why it gets to quadruple the RAM capacity using the same amount of chips. While in total there were 4 RAM Banks (Did you notice that there was more installed RAM than usable RAM already?), the 9 DRAM chips of Bank 0 always came soldered, a bad design choice that eventually caused some maintenance issues, as it was much harder to replace one of these chips if it were to fail. Some components like the main Intel 8088 CPU, the ROM chips with the BIOS Firmware and the IBM Cassette BASIC came socketed, yet these had only an extremely limited amount of useful replacements available, which appeared later in the IBM PC life cycle, and ironically, they seem to not have failed as often as the soldered RAM.
While the IBM PC 5150 Motherboards were loaded with chips (On a fully populated Motherboard, around 1/3 of them were DRAM chips), the only external I/O connectors they had were two Ports, one to connect the Keyboard and the other for an optional Cassette Deck. As mentioned before, both external Ports were wired to the Intel 8255 PPI (Programmable Peripheral Interface) GPIO with some additional circuitry between the external Ports and the 8255 input, so it can't be considered a full fledged Keyboard Controller or Cassette Controller. There was also a defined Keyboard protocol so that the Keyboard Encoder, located inside the Keyboard itself, could communicate with the Keyboard Controller circuitry of the IBM PC.
The IBM PC Motherboards also had an internal header for the PC Speaker mounted on the Computer Case, which was wired through some glue logic to both the Intel 8253 PIT and Intel 8255 PPI. Talking about the Computer Case, the IBM PC one had no activity LEDs (Like Power or HDD) at all; it only exposed the Power Switch at the side, near the back. It was really that dull. Also, as the computer's On/Off toggle type Power Switch was not connected to the Motherboard in any way since it was part of the Power Supply Unit itself, the PC Speaker had the privilege of being the first member of the Front Panel Headers found in modern Motherboards.
The IBM PC Keyboard, known as the Model F, deserves a mention. The Keyboard looks rather interesting the moment that you notice that inside it, there is an Intel 8048 Microcontroller working as a Keyboard Encoder, making the Keyboard itself look as if it were a microcomputer. The 8048 is part of the Intel MCS-48 family of Microcontrollers, and had its own integrated 8 Bits CPU, Clock Generator, Timer, 64 Bytes RAM, and 1 KiB ROM. Regarding the ROM contents, manufacturers of Microcontrollers had to write the customer code during chip manufacturing (Some variants of the 8048 came with an empty ROM that could be programmed once, some could even be reprogrammed), which means that the 8048 that the IBM Model F Keyboard used had a built-in Firmware specifically made for it, so it was impossible to replace the Model F 8048 with another 8048 that didn't come from the same type of unit. Some people recently managed to dump the 8048 Firmware used by the IBM Model F Keyboard, either for emulation purposes or to replace faulty 8048s with working reprogrammable chips loaded with the original Model F Firmware.
It shouldn't be a surprise to say that all the mentioned chips, including the main Processor, the support chips, the RAM and ROM memory chips, plus all the glue logic to make them capable of interfacing together and getting mapped to the expected location in the address spaces, had to be physically placed somewhere, and somehow interconnected. The Motherboard, which IBM used to call the Planar in its literature, served as the physical foundation to host the core chips that defined the base platform.
Regarding the chip interconnects, as can be seen in the System Board Data Flow diagram at Page 2-6 of the IBM PC 5150 Technical Reference Manual, the IBM PC had two well defined Buses: The first one was the parallel Local Bus, which interconnected the Intel 8088 CPU, the optional 8087 FPU, the 8288 Bus Controller and the 8259A PIC, and the second one was called the I/O Channel Bus, which transparently extended the Local Bus and interfaced with almost everything else in the system. Additionally, while the mentioned Block Diagram barely highlights it, the I/O Channel Bus was further subdivided into two segments: One that connected the Local Bus with the Memory Controller and also extended to the expansion slots, exposing all the I/O Channel Bus signal lines for expansion cards to use, and another that was a subset of I/O Channel for the support chips located on the Motherboard itself, which just had limited hardwired resources. There is also a Memory Bus between the Memory Controller and the RAM chips that serve as the system RAM.
Local Bus: The core of the Local Bus is the 8088 CPU. As previously mentioned, the 8088 CPU had a multiplexed external Bus, so all the chips that sit on the Local Bus had to be explicitly compatible with its specific multiplexing scheme, limiting those to a few parts from the MCS-86 and MCS-85 chip sets. The Local Bus was separated from the I/O Channel Bus by some intermediate buffer chips that served to demultiplex the output of the 8088 Data and Address Buses, making them independent lines so that it was easier to interface them with third party chips, while the 8288 Bus Controller did the same with the 8088 Control Bus lines. What this means is that the I/O Channel Bus is pretty much a demultiplexed transparent extension of the Local Bus that is behind some glue logic but has full continuity with it; it is not a separate entity. As such, in the IBM PC, everything effectively sits on a single, unified system wide parallel Bus. That means that all the chips were directly visible and accessible to each other, which made the IBM PC Motherboard look like a giant backplane that just happened to have some built-in Devices.
I/O Channel Bus: There is no arguing that the I/O Channel Bus at the Motherboard level is just a demultiplexed version of the 8088 Local Bus, but things get a bit more complex when you consider I/O Channel at the expansion slot level. The expansion slots exposed a composite version of the I/O Channel Bus, since besides being connected to the 8088 CPU demultiplexed Local Bus Address, Data and Control lines, they also had Pins that were directly wired to the 8259A PIC and 8237A DMAC, so that an expansion card could use one or more IRQ and DMA Channel lines at will (Some Devices on the Motherboard were hardwired to them, too. The DMA and IRQ lines that were used by those Devices were not exposed in the I/O Channel expansion slots).
A detail that I found rather curious is that IBM didn't really have a need to create its own, custom expansion slot and its composite Bus. Before IBM began designing the IBM PC, Intel had already been using its own external Bus and connector standard, Multibus, for at least some of its reference 8086/8088 platforms. Another option would have been the older, industry standard S-100 Bus. While it is easy to find discussions about why IBM chose the Intel 8088 CPU over other contemporary Processor alternatives, I failed to find anyone asking why IBM decided to create a new expansion slot type. Whether IBM wanted to roll a new expansion slot standard as a vendor lock-in mechanism or just to sell new cards for the IBM PC, whether the existing options like Multibus weren't good enough for IBM (Multibus had Pins intended for 8 IRQ lines, but no DMA Channels), or whether IBM would have had to pay royalties to Intel to use it and didn't want to, is something that I simply couldn't find answers about.
Memory Bus: From the Processor's point of view, the most important component of the platform should be the system RAM, since it is its personal workspace. However, CPUs usually didn't interface with the RAM chips directly; instead, there was a Memory Controller acting as the intermediary, with the Memory Bus being what linked it with the RAM chips themselves. Note that there could be multiple Memory Buses in the same system, like in the case of the IBM PC, where you always had the Memory Controller and RAM chips that were part of the Motherboard itself plus those on the optional Memory expansion cards, as they had their own Memory Controller and RAM chips, too.
The Memory Controller's main function was to multiplex the input Address Bus, as RAM chips had a multiplexed external Address Bus in order to reduce package pin count. Unlike the Intel 8088 CPU, which multiplexed part of the Address Bus lines with the Data Bus ones, RAM chips had a dedicated Data Bus; pin reduction was achieved by multiplexing the Address Bus upon itself. The multiplexing was implemented simply by splitting the addresses into two halves, Rows and Columns, so a RAM chip effectively received an address in two pieces per single operation. Moreover, since the RAM chips were of the DRAM (Dynamic RAM) type, they had to be refreshed at periodic intervals to maintain the integrity of their contents. A proper DRAM Memory Controller would have done so itself, but the IBM PC didn't have one; instead, it relied on the 8253 PIT and 8237A DMAC as auxiliary parts of the Memory Controller circuitry to time and perform the DRAM refreshes, respectively, while some discrete logic performed the Address Bus multiplexing. Also, the expansion slots exposed the DACK0 signal (From DMA Channel 0) that could be used by Memory expansion cards to refresh their DRAM chips in sync with the Motherboard ones.
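As a rough sketch of what the multiplexing glue logic accomplishes, consider a 64 Kbit x 1 DRAM chip like those of the 64KB - 256KB Motherboard, which receives its 16 Bits internal address as two 8 Bits halves over the same pins. The split itself is trivial; the real work is in the strobe sequencing (RAS, then CAS), which I only hint at in the comments. This is a conceptual illustration of the technique, with the Row/Column ordering being an assumption, not a description of the exact discrete logic IBM used:

    #include <stdint.h>

    /* Split a 16 Bits DRAM address into the two halves that are presented
       sequentially on the chip's 8 multiplexed address pins */
    void split_dram_address(uint16_t address, uint8_t *row, uint8_t *column)
    {
        *row    = (uint8_t)(address >> 8);   /* driven first, latched by RAS  */
        *column = (uint8_t)(address & 0xFF); /* driven second, latched by CAS */
    }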
The IBM PC Motherboard Memory Controller circuitry could manage up to 4 Banks of DRAM chips through its own Memory Bus, being able to access only a single Bank at a time. Obviously you want to know what a Bank is. A Bank (It can also be a Rank. Rank and Bank definitions seem to overlap a lot, depending on context and era) is made out of multiple RAM chips that are simultaneously accessed in parallel, where the sum of their external Data Buses has to match the width of the Memory Data Bus, which, as you could guess, was highly tied to the 8088 CPU 8 Bits Data Bus. There were multiple ways to achieve that sum, including using either a single RAM chip with an 8 Bits external Data Bus, 2 chips with 4 Bits each, or 8 chips with 1 Bit each. The IBM PC took the 1-Bit RAM chips route, as these were the standard parts of the era. While this should mean that the IBM PC required 8 1-Bit RAM chips per Bank, it actually had 9...
As primitive as the Memory Controller subsystem was, it implemented Parity. Supporting Parity means that the Memory Controller had 1 Bit Error Detection for the system RAM, so that a Hardware failure that caused RAM corruption would not go unnoticed. This required the ability to store an extra 1 Bit per 8 Bits of memory, which is the reason why there are 9 1-Bit RAM chips per Bank instead of the expected 8. It also means that the Data Bus between the Memory Controller and a Bank was actually 9 Bits wide, and that per KiB of usable system RAM, you actually had installed 1.125 KiB (1 + 1/8) of raw RAM. I suppose that IBM picked 1-Bit RAM chips instead of the wider ones because Parity doesn't seem to be as straightforward to implement with the other alternatives. The Parity circuitry was wired to both some GPIO Pins of the 8255 PPI and to the NMI Pin of the 8088 CPU, and was also exposed in the expansion slots. When the Memory Controller signaled an NMI, an ISR fired up that checked the involved 8255 PPI Port Bits as a means to identify whether the Parity error happened in the RAM on the Motherboard or in a Memory expansion card. Note that not all Memory expansion cards fully implemented Parity; some cheap models might work without Parity to save on DRAM chip costs.
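A tiny sketch of what the Parity logic conceptually computes for every Byte written to RAM. Whether the IBM PC stored odd or even Parity is a detail I won't vouch for, so the polarity below is just an assumption for illustration:

    #include <stdint.h>

    /* Compute the ninth Bit stored alongside each Byte. With odd Parity, the
       9 Bits together always contain an odd number of 1s, so any single Bit
       flip makes the count even and can be detected on the next read */
    uint8_t parity_bit_odd(uint8_t data)
    {
        int ones = 0;
        for (int bit = 0; bit < 8; bit++)
            ones += (data >> bit) & 1;
        return (ones % 2 == 0) ? 1 : 0;  /* pad the count up to an odd total */
    }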
At the time of the IBM PC, Parity was a rare feature considered unworthy of personal computers (Ever hear about famous supercomputer designer Seymour Cray saying "Parity is for farmers", then comically retracting that statement a few years later by saying "I learned that farmers used computers"?), yet IBM decided to include it anyway. This is the type of attitude that made IBM highly renowned for the reliability of its systems. In comparison, modern Memory Module standards like DDR3/DDR4 also support 1 Bit Error Detection, but without the need for extra raw RAM, because they work on whole 64 Bits Ranks that allow for a special type of error detection algorithm that doesn't require wasting extra memory. The DDR3/DDR4 Memory Modules that have ECC (Error Checking and Correction) capabilities do use an extra RAM chip with both a 1/8 wider 72 Bits Bus and 1/8 more raw RAM than the usable amount, exactly like the IBM PC Parity implementation, but ECC instead allows for 2 Bits Error Detection with 1 Bit Error Correction, assuming that the platform memory subsystem supports using ECC.
External I/O Channel Bus: The fourth and final Bus was a subset of I/O Channel that, according to the Address Bus and Data Bus diagrams of MinusZeroDegrees, has the unofficial name of External I/O Channel Bus (The Technical Reference Block Diagram just calls them the external Address Bus and external Data Bus). The External I/O Channel Bus was separated from the main I/O Channel Bus by some intermediate glue chips. Connected to this Bus, you had the ROM chips for the Motherboard Firmware and the IBM Cassette BASIC, and the Intel 8237A DMAC, Intel 8253 PIT, and Intel 8255 PPI support chips. An important detail is that the External I/O Channel Bus didn't have all 20 Address lines available to it, only 13, which was just enough to address the internal memory of the 8 KiB sized ROMs. As the ROMs were still effectively behind a 20 Bits address decoding logic, they were mapped correctly into the Memory Address Space, so things just worked, but the support chips weren't as lucky...
The support chips were mapped to the I/O Address Space behind an address decoding logic that was capable of decoding only 10 Bits instead of the full 16, which means that they did just partial address decoding. This not only happened with the Motherboard built-in support chips; early expansion cards that used PMIO also decoded only 10 Bits of the CPU Address Bus. As such, in the IBM PC platform as a whole, there were effectively only 1 KiB (2^10) worth of unique I/O Ports, which, due to the partial address decoding, repeated themselves 63 times to fill the 64 KiB I/O Ports Map. In other words, every mapped I/O Port had 63 aliases. At that moment this wasn't a problem, but as the need for more PMIO addressing capabilities became evident, dealing with aliases would cause some compatibility issues in later platforms that implemented the full 16 Bits address decoding for the I/O Address Space in both the Motherboard and the expansion cards.
It was possible to mix an IBM PC with cards that did 16 Bits decoding, but you had to be aware of the address decoding capabilities of all of your Hardware and the possible configuration options, as there were high chances that the mappings overlapped. For example, in the case of an IBM PC with a card that decoded 16 Bits, while the card should theoretically be mappable anywhere in the 64 KiB I/O Address Space, if the chosen I/O Port range was above 1 KiB, you had to check that it didn't overlap with any of the aliased addresses of the existing 10 Bits Devices, which I suppose should have been simple to figure out if you made a cheatsheet of the current 1 KiB I/O Port Map, then repeated it for every KiB to get the other 63 aliases, so that you knew which ranges were really unused. Likewise, it was possible to plug a 10 Bits expansion card into a platform that did 16 Bits decoding for its built-in support chips and other expansion cards, but the 10 Bits card would create aliases all over the entire I/O Address Space, so there were chances that its presence created a conflict with the mapping of an existing 16 Bits Device that you had to resolve by moving things around.
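Checking for those conflicts is just modular arithmetic: a 10 Bits Device responds whenever the low 10 Bits of the address match its own, so a 16 Bits I/O Port collides with it if both addresses are equal modulo 1 KiB. A small sketch of that check:

    #include <stdint.h>

    /* A Device that decodes only 10 Bits responds at its base Port and at
       every alias 1 KiB (0x400) apart across the 64 KiB I/O Address Space */
    int conflicts_with_10bit_device(uint16_t full_port, uint16_t ten_bit_port)
    {
        return (full_port & 0x3FF) == (ten_bit_port & 0x3FF);
    }

    /* Example: a 16 Bits card mapped at Port 0x0461 would collide with the
       8255 PPI Port B at 0x61, because 0x0461 & 0x3FF == 0x061 */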
Another important aspect of the IBM PC is the clock speed that all the chips and Buses ran at. A common misconception people believe is that when you buy a chip advertised to run at a determined Frequency, it automatically runs at that clock speed. The truth is that in most scenarios, a chip doesn't decide which clock speed it will run at; the manufacturer is simply saying that it rated the chip to run reliably up to that Frequency. What actually defines the clock speed that something will run at is the reference clock signal that it is provided with. The origin of a clock signal can be traced all the way back to a Reference Clock that typically comes from a Crystal Oscillator.
The reference clock can be manipulated with Clock Dividers or Clock Multipliers so that different parts of the platform can be made to run at different clock speeds, all of which are synchronous with each other since they are derived from the same source. It is also possible for different parts of the same platform to have their own Crystal Oscillators providing different reference clocks. A part of the platform that runs at the same clock speed and is derived from the same reference clock is known as a Clock Domain. A signal that has to transition from one Clock Domain to a different one must go through a Clock Domain Crossing. The complexity of a Clock Domain Crossing varies depending on whether a signal has to go from one Clock Domain to another one that is synchronous with it because it is derived from the same reference clock, or to an asynchronous Clock Domain that is derived from a different reference clock source.
In the IBM PC, as can be seen in the Clock Generation diagram of MinusZeroDegrees, everything was derived from the 14.31 MHz reference clock provided by a single Crystal Oscillator. This Crystal Oscillator was connected to the Intel 8284A Clock Generator, which used its 14.31 MHz reference clock as input to derive from it three different clock signals, OSC, CLK and PCLK, each with its own output Pin. While this arrangement was functional in the IBM PC, it would cause many headaches later on, as things would eventually have to get decoupled...
8284A OSC: The first clock output was the OSC (Oscillator) line, which just passed the 14.31 MHz reference clock through intact. The OSC line wasn't used by any built-in Device in the IBM PC Motherboard; instead, it was exposed as the OSC Pin in the I/O Channel expansion slots. Pretty much the only card that used the OSC line was the CGA Video Card, which included its own clock divider that divided the 14.31 MHz OSC line by 4 to get a 3.57 MHz TV compatible NTSC signal, so that the CGA Video Card could be directly connected to a TV instead of a computer Monitor. Actually, it is said that IBM chose the 14.31 MHz Crystal Oscillator precisely because it made it easy and cheap to derive an NTSC signal from it.
8284A CLK: The second clock output, and the most important one, was the CLK (System Clock) line, which was derived by dividing the 14.31 MHz reference clock input by 3, giving 4.77 MHz. Almost everything in the IBM PC used this clock: The Intel 8088 CPU, the Intel 8087 FPU, the Intel 8237A DMAC, the Buses, and even the expansion cards, as the I/O Channel expansion slots also exposed this line as the CLK Pin. Even if an expansion card had its own Crystal Oscillator, there would be a Clock Domain Crossing somewhere between the 4.77 MHz I/O Channel CLK and whatever the card internally used. Note that Intel didn't sell a 4.77 MHz 8088 CPU, 8087 FPU or 8237A DMAC; the IBM PC actually used 5 MHz rated models of those chips underclocked to 4.77 MHz simply because that was the clock signal that they were getting as input.
8284A PCLK: Finally, the third clock output was the PCLK (Peripheral Clock) line, which was derived by dividing the previous CLK line by 2, giving 2.38 MHz. The Intel 8253 PIT used it, but not directly, since it first passed through a discrete clock divider that halved it again, giving 1.19 MHz, which was the effective clock input of the PIT. Note that each Counter of the 8253 PIT had its own clock input Pin, but all of them were wired to the same 1.19 MHz clock line in parallel.
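The whole clock tree can be summarized with a few divisions, using the more precise 14.31818 MHz value of the Crystal Oscillator. A trivial sketch:

    #include <stdio.h>

    int main(void)
    {
        /* IBM PC clock derivation, all from a single 14.31818 MHz crystal */
        const double osc  = 14.31818;   /* MHz, passed through untouched to the OSC Pin */
        const double clk  = osc / 3.0;  /* ~4.77 MHz: CPU, FPU, DMAC, I/O Channel CLK   */
        const double pclk = clk / 2.0;  /* ~2.38 MHz: Peripheral Clock                  */
        const double pit  = pclk / 2.0; /* ~1.19 MHz: effective 8253 PIT clock input    */

        printf("OSC %.5f, CLK %.5f, PCLK %.5f, PIT %.5f MHz\n", osc, clk, pclk, pit);
        return 0;
    }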
The Keyboard circuitry also used the PCLK line, but I never looked into that part, so I don't know its details. The Keyboard Port had a clock Pin, but I'm not sure whether it exposed the 2.38 MHz PCLK line or not. I'm not sure what uses this line on the Keyboard side, either. For reference, the 8048 Microcontroller inside the Model F Keyboard has an integrated Clock Generator that can use as clock input either a Crystal Oscillator, or a line coming straight from an external Clock Generator. A schematic for the Model F Keyboard claims that the 8048 uses a 5.247 MHz reference clock as input, yet I failed to identify a Crystal Oscillator in the photos of a disassembled Keyboard. I'm still not sure whether the 8048 in the Keyboard makes direct use of the Keyboard Port clock Pin or not, as both options are viable to use as a reference clock.
An important exception in the IBM PC clocking scheme were the RAM chips used as system RAM, as their functionality was not directly bound to any clock signal at all. The RAM chips of that era were of the asynchronous DRAM type. Asynchronous DRAM chips had a fixed access time measured in nanoseconds, which in the case of those used in the IBM PC, was rated at 250 ns. There is a relationship between Frequency and time, as the faster the clock speed is, the shorter each individual clock cycle lasts. Sadly, I don't understand the in-depth details of how the entire DRAM Memory subsystem worked, like how many clock cycles a full memory operation took, or the important MHz-to-ns breakpoints, nor its relationship with the 8088 Bus Cycle, to know how fast the DRAM chips had to be at minimum for a target clock speed (For reference, the IBM PC Technical Reference Manual claims that the Memory access time was 250 ns with a Cycle time of 410 ns, while an 8088 @ 4.77 MHz had a 210 ns clock cycle with a fixed Bus Cycle of four clocks, for a total of 840 ns. On paper it seems that the 410 ns Cycle time of the Memory subsystem would have been good enough to keep up with the Bus Cycle of an 8088 running at up to 9.75 MHz, but I know that faster 8088 platforms had to use DRAM chips with a lower access time, so there is something wrong somewhere...). Basically, the important thing about asynchronous DRAM is that for as long as it was fast enough to complete an operation from the moment that it was requested to the moment that it was assumed to be finished, everything was good. Faster platforms would require faster DRAM chips (Rated for lower ns, like 200 ns), while faster DRAM chips would work in the IBM PC but, from a performance standpoint, be equal to the existing 250 ns ones.
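For what it's worth, the arithmetic I'm doing in the paragraph above is simply this, a back-of-the-envelope sketch with the caveat that real DRAM timing involves more parameters than a single Cycle time figure:

    #include <stdio.h>

    int main(void)
    {
        const double cpu_mhz    = 4.772727;              /* 14.31818 / 3            */
        const double clock_ns   = 1000.0 / cpu_mhz;      /* ~210 ns per clock       */
        const double bus_cycle  = 4.0 * clock_ns;        /* ~840 ns, 4 clock cycles */
        const double dram_cycle = 410.0;                 /* ns, per the manual      */

        printf("Bus Cycle: %.0f ns, DRAM Cycle time: %.0f ns\n", bus_cycle, dram_cycle);
        /* 410 ns <= 840 ns, so on paper the 250 ns / 410 ns DRAM keeps up with
           a 4.77 MHz 8088 with room to spare, which is why the mismatch with
           faster 8088 platforms puzzles me */
        return 0;
    }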
Albeit not directly related to the operating Frequency, a Hardware function that would later become prevalent is the insertion of Wait States. A Wait State is pretty much an extra do-nothing clock cycle, similar in purpose to the Processor NOP Instruction, that is inserted at the end of a CPU Bus Cycle. Adding Wait States was the typical solution when you had a slower component that wouldn't work at a higher Frequency. Wait States can be injected in varying amounts, and the rates can be configured depending on the type of operation, like for a Memory access or an I/O access (This is what makes a Wait State more useful than outright downclocking everything).
In the case of the IBM PC, it had no Wait States (Also called Zero Wait States) for Memory accesses, but it had 1 Wait State for I/O accesses (The Technical Reference Manual explicitly mentions that an I/O cycle takes one additional 210 ns clock cycle on top of the standard 8088 840 ns Bus Cycle for Memory accesses, for a total of 1050 ns in 5 clock cycles. For some reason, I can't find information that refers to this extra clock cycle as a Wait State, yet it seems that that is exactly what this extra cycle is). The 1 I/O WS would allow support chips and expansion cards a bit more time to recover before the next access. Later platforms that ran at higher Frequencies had user selectable Memory WS and I/O WS to allow some degree of compatibility with both slower DRAM chips and expansion cards, but at a significant performance cost.
Since there was pretty much no built-in I/O in the IBM PC Motherboard, to get the IBM PC 5150 to do anything useful at all, you had to insert specialized expansion cards into any of the 5 I/O Channel Slots. All the slots had the same capabilities, so it didn't matter in which slot you inserted a card. In other computers of the era, slots could have predefined purposes since the address decoding logic for them was found in the Motherboards themselves, whereas in the IBM PC that logic was in the expansion cards. Initially, the card variety was extremely limited, but as the IBM PC ecosystem grew, so did the amount and type of cards.
As mentioned previously, the I/O Channel Slots exposed the I/O Channel Bus, which is itself a demultiplexed yet transparent extension of the CPU Local Bus, and also had Pins with lines directly wired to the Intel 8259A PIC IRQs and Intel 8237A DMAC DMA Channels. Because some Devices on the Motherboard were hardwired to a few of those, only the lines that were actually free were exposed in the slots: 6 Interrupt Request lines, namely IRQs 2-7, and 3 DMA Channels, namely 1-3. None of the 8253 PIT Counters' input or output lines were exposed.
The most critical discrete card was the Video Card, be it either MDA or CGA, both because you needed one to connect the IBM PC to a Monitor or TV so you could see the screen, and because without a Video Card, the IBM PC would fail during POST (There was nothing resembling a headless mode). Besides the RAM used as the video framebuffer, both Video Cards had an 8 KiB ROM with character fonts, but these were not mapped into the CPU Address Space, as they were for the internal use of the Video Cards. Each Video Card also had a goodie of sorts: MDA had a built-in Parallel Port, being perhaps the first example of a multifunction card, while CGA had a rather unknown header that was used to plug in an optional Light Pen, which is a very old peripheral that was used by pressing its tip on some location of the screen of a CRT Monitor as if it were a touch screen. Finally, it was possible for an MDA and a CGA Video Card to coexist simultaneously in the same system, and it was even possible to make a Dual Monitor setup with them. Software support to actually use both screens simultaneously was very, very rare; applications typically defaulted to using only the active card and Monitor combo. Two Video Cards of the same type were never supported at all.
The second most important expansion card was the FDC (Floppy Disk Controller), so that you could attach 5.25'' Diskette Drives to the IBM PC. The original IBM FDC had an internal header for cabling either one or two drives inside the Computer Case, and an external Port for two more drives, for a total of 4. On the original IBM PC models at launch, the FDC and the internal 5.25'' Diskette Drives were optional, as IBM intended Diskettes to be the high end storage option while Cassettes took care of the low end, which is the reason why IBM included a built-in Port in the Motherboard to connect a Cassette Deck. This market segmentation strategy seems to have failed very early in the life of the IBM PC, as Cassette-only models soon disappeared from the market. At the time of the IBM PC release, the only compatible Diskettes had a 160 KB usable size after formatting. I have no idea how much usable capacity a Cassette had.
Other important expansion cards included those that added Serial and Parallel Ports (Unless you used an MDA Video Card, which had an integrated Parallel Port), so that you could connect external peripherals like a Modem or a Printer. Memory expansion cards seem to have been rather common in the early days, as RAM cost plummeted while densities skyrocketed. Consider that at launch, the IBM PC officially supported only 256 KiB RAM using the 16KB - 64KB Motherboard fitted with all 64 KiB plus 3 64 KiB Memory expansion cards, which back then was a ton, yet in a few years getting the full 640 KiB that the computer Hardware supported became viable.
Besides the mentioned cards, the IBM PC expansion possibilities at launch day weren't that many. Assuming that you were using a typical setup with a CGA Video Card, a Diskette Drive and a Printer, you just had two free expansion slots left, and I suppose that more RAM would be a popular filler. Later in the life of the platform, many multifunction cards appeared that allowed saving a few expansion slots by packing multiple common Device types into the same card, thus making it possible to fit more total Devices into the slot starved IBM PC. A very well known multifunction card is the AST SixPack, which could have up to 384 KiB RAM installed in it, a Serial Port, a Parallel Port, and a battery backed RTC (Real Time Clock) that required custom Drivers to use (An RTC would later be included in the IBM PC/AT).
Worth mentioning is that the Mouse wasn't a core part of the PC platform until the IBM PS/2. Actually, the Mouse took a few years before it made its first appearance on the PC platform, plus a few more before becoming ubiquitous. A Mouse for the IBM PC would have required its own expansion card, too, be it a Serial Port card for a Serial Mouse, or a Bus Mouse Controller card for a Bus Mouse (The Bus Mouse isn't an example of a Device supporting the CPU Bus protocol that I mentioned when talking about the 8255 PPI, because protocol translation still happens on the controller card, just as it would happen in a Serial Port card if it were a Serial Mouse. They don't seem different from the GPIO-to-CPU Bus interface of the 8255 PPI).
One of the most curious types of expansion cards that could be plugged in the IBM PC were the Processor upgrade cards. These upgrade cards had a newer x86 CPU and sometimes a Socket for an optional FPU, intended to fully replace the 8088 and 8087 of the IBM PC Motherboard. Some cards also had RAM in them (With their own Memory Controllers) because the newer Processors had wider Data Buses, so using the IBM PC I/O Channel Bus to get to the system RAM in the Motherboard or another Memory expansion card would limit the CPU to 8 Bits of Data and absolutely waste a lot of the extra performance. At some point they were supposed to be cost effective compared to a full upgrade, but you will eventually learn a thing or two about why they were of limited usefulness...
An example of a fully featured upgrade card that could be used in the IBM PC is the Intel InBoard 386/PC, perhaps one of the most advanced ones. This card had a 16 MHz 80386 CPU with its own Crystal Oscillator, a Socket for an optional 80387 FPU, 1 MiB of local RAM using a 32 Bits Data Bus, and also supported an optional Daughterboard for even more RAM. In recent times we had conceptually similar upgrade cards like the AsRock am2cpu, but these work only on specific Motherboards that were designed to accommodate them.
Upgrade cards had several limitations. One of them was that the cards that had RAM in them were intended to use it as a Conventional Memory replacement with a wider Data Bus for better performance, but, since the RAM on the Motherboard had inflexible address decoding logic, it was impossible to repurpose it by mapping that RAM somewhere else, with the end result that the Motherboard RAM had to be effectively unmapped and remain unused. Upgrade cards without RAM could use the Motherboard RAM as Conventional Memory, but that meant missing any performance increase from the newer CPU's wider Data Bus. Another issue is that the I/O Channel Slots didn't provide all the signal lines that were wired to the 8088 CPU on the Motherboard, like the INTR and INTA interrupt lines connected to the 8259A PIC, thus the upgrade cards had to use a cable that plugged into the 8088 CPU Socket (You obviously had to remove the CPU on the Motherboard, and the FPU, too) to route these to the new main Processor, making installing an upgrade card less straightforward than it initially appears to be.
As I failed to find enough detailed info about how the upgrade cards actually worked, I don't know if the card did its I/O through the cable connected to the 8088 Socket, or if instead it used the expansion slot itself except for the unavailable lines that had to be routed through the cable, like the previously mentioned INTR and INTA. Regardless of these details, at least the RAM on the upgrade card should have been directly visible on the I/O Channel Bus, else any DMA to system RAM involving the 8237A DMAC would have been broken.
There were other cards that were similar in nature to the previously described upgrade cards, but instead of replacing some of the Motherboard core components with those sitting on the card itself, they added independent Processors and RAM. Examples of such cards include the Microlog BabyBlue II, which had a Zilog Z80 CPU along with some RAM. Cards like these could be used to run Software compiled for other ISAs on these addon Processors instead of x86, and are basically what emulation looked like during the 80's, when you couldn't really do it purely in Software, actually requiring dedicated Hardware to do the heavy lifting.
Yet another major piece of the IBM PC was a Software one, the Firmware, that was stored in one of the Motherboard ROM chips. The IBM PC Firmware was formally known as the BIOS. The BIOS Firmware was the very first code that the Processor executed, and for that reason, the ROM had to be mapped to a specific location of the Memory Address Space so that it could satisfy the 8088 CPU hardcoded startup behavior. The BIOS was responsible for initializing and testing most of the computer components during POST before handing out control of the computer to user Software. You can read here a detailed breakdown of all the things that the BIOS did during POST before getting the computer into an usable state. Some parts of the BIOS remained always vigilant for user input, like the legendary routine that intercepted the Ctrl + Alt + Del Key Combination.
The BIOS also provided a crucial component of the IBM PC that is usually underappreciated: The BIOS Services. The BIOS Services were a sort of API (Application Programming Interface) that the OS and user Software could call via Software Interrupts, acting as a middleware that interfaced with the platform Hardware Devices. As such, the BIOS Services could be considered like built-in Drivers for the computer. IBM actually expected that the BIOS Services could eventually be used as a HAL (Hardware Abstraction Layer), so if the support chips ever changed, Software that relied on the BIOS Services would be forward compatible. Although IBM strongly recommended that Software developers use the BIOS Services, it was possible for applications to include their own Drivers to bypass them and interface with the Hardware Devices directly. Many performance hungry applications did exactly that, as the BIOS Services were very slow. Regardless of these details, the BIOS Services were a staple feature of the IBM PC.
Compared to later systems, there was no "BIOS Setup" that you could enter by pressing a Key like Del during POST, nor was there any non volatile writable memory where the BIOS could store its settings. Instead, the Motherboard was outfitted with several DIP Switches, the most notorious ones being SW1 and SW2, whose positions had hardcoded meanings for the BIOS, which checked them on every POST. This made the BIOS configuration quite rudimentary in nature, as any change required physical access to the Motherboard.
The BIOS pretty much did no Hardware discovery on its own; it just limited itself to checking during POST for the presence of the basic Hardware Devices that the DIP Switches told it the computer had installed, thus it was very important that the DIP Switches were in the correct positions, since there were many failure conditions during POST that involved the BIOS being unable to find a Hardware Device that it expected to be present. For example, the BIOS didn't scan the entire 640 KiB range of Conventional Memory to figure out how much system RAM it could find, it simply checked the position of the DIP Switches that indicated how much Conventional Memory the computer had installed, then just limited itself to testing whether that amount was physically present (You could use the Motherboard DIP Switches to tell the BIOS that the computer had less system RAM than it actually had and it would work; it failed only if the amount set was higher than what was physically installed). The type of Video Card was also configurable via DIP Switches: you could use them to tell the BIOS whether you had an MDA or CGA Video Card, and it would check if it was present and use it as the primary video output.
There were a few BIOS Services that allowed an OS or any other application to ask the BIOS what Hardware it thought the computer had installed. For example, INT 11h was the Equipment Check, which could be used by an application to determine whether the computer was using an MDA or a CGA Video Card, among other things. There was also INT 12h, which returned the amount of Conventional Memory. A rather interesting detail of INT 12h is that it was the closest thing to a Software visible Memory Map available during that era. Neither the BIOS, the OS nor the user applications knew what the Memory Map of the computer truly looked like; they just blindly used what they were hardcoded to know about, based on both the IBM defined Memory Map, and any user configurable Drivers that could point to where in the Memory and I/O Address Spaces an expansion card was mapped.
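For example, with one of the old DOS C compilers that shipped an int86() helper in dos.h (Borland and Microsoft both did), asking the BIOS those two questions looked roughly like this; consider it a hedged sketch rather than canonical code:

    #include <dos.h>    /* union REGS and int86(), as found in old DOS compilers */
    #include <stdio.h>

    int main(void)
    {
        union REGS regs;

        /* INT 11h - Equipment Check: returns a bitfield describing the
           installed Hardware (Video Card type, Diskette Drives, etc.) in AX */
        int86(0x11, &regs, &regs);
        printf("Equipment word: %04X\n", regs.x.ax);

        /* INT 12h - Memory Size: returns the amount of Conventional Memory
           in KiB in AX, straight from what the DIP Switches declared */
        int86(0x12, &regs, &regs);
        printf("Conventional Memory: %u KiB\n", regs.x.ax);

        return 0;
    }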
It was possible to upgrade the BIOS of the IBM PC, but not by flashing it. You were actually required to purchase a new ROM chip with the latest BIOS version preprogrammed in it, as the standard ROMs of that era weren't of the rewritable variety (And chances are that those that were would need an external reprogrammer, so at the bare minimum, you would have to remove the BIOS ROM from the computer to rewrite it with a new image. It was not like modern in-situ flashing). There are three known BIOS versions for the IBM PC, and the last one (Dated 10/27/82) is rather important since it introduced a major feature: Support for loadable Option ROMs. This greatly enhanced what an expansion card could do.
When the IBM PC was first released, the BIOS had built-in support for pretty much all the existing Device types designed by IBM. The BIOS could check for the presence of, initialize, test and provide an interface to use most of IBM's own expansion cards or 100% compatible third party ones, but it could do nothing about Devices that it didn't know about. This usually was not an issue, since a Device that was not supported by the BIOS would still work if using either an OS Driver, or an application that included a built-in Driver for that Device, which was the case with Sound Cards (Although these came much, much later). However, there were scenarios where a Device had to be initialized very early for it to be useful. For example, the earliest HDs (Hard Disks) and HDC (Hard Disk Controller) cards that could be used on the IBM PC were from third parties that provided Drivers for the OS, so that it could initialize the HDC and use the HD as a storage drive. As the BIOS had absolutely no knowledge about what these things were, it was impossible for the BIOS to boot directly from the HD itself, thus if you wanted to use a HD, you unavoidably had to first boot from a Diskette to load the HDC Driver. IBM must have noticed rather quickly that the BIOS had this huge limitation, and decided to do something about it.
The solution that IBM decided to implement was to make the BIOS extensible by allowing it to run executable code from ROMs located in the expansion cards themselves, effectively making the contents of these ROMs a sort of BIOS Drivers. During POST, the BIOS would scan a predefined range of the UMA at 2 KiB intervals (The IBM PC BIOS first scans the 768 KiB to 800 KiB range expecting a VBIOS, then a bit later it scans 800 KiB to 982 KiB for any other type of ROMs) looking for mapped memory with data in it that had an IBM defined header indicating that it was a valid executable Option ROM. Option ROMs are what allowed an original IBM PC with the last BIOS version to initialize a HDC card so that it could boot from a HD, or to use later Video Cards like EGA and VGA, as these had their initialization code (The famous VBIOS) in an Option ROM instead of expecting built-in BIOS support like MDA and CGA Video Cards did. While IBM could have kept releasing newer BIOS versions in preprogrammed ROMs that added built-in support for more Device types, it would have been a logistical nightmare if every new expansion card also required getting a new BIOS ROM. Moreover, such an implementation would have hit a hard limit rather soon due to the fixed Motherboard address decoding logic for the BIOS 8 KiB ROM chip, while an expansion card was free to use most of the UMA to map its Option ROM. Considering that we are still using Option ROMs, the solution that IBM chose was a very good one.
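Conceptually, the scan routine is quite simple. The sketch below follows the well known Option ROM header convention (signature Bytes 55h AAh, followed by a length Byte in 512 Byte units, with the whole ROM summing to zero modulo 256); the buffer argument and the function itself are placeholders of mine standing in for reads of the Memory Address Space, not the literal IBM routine:

    #include <stdint.h>
    #include <stdio.h>

    /* Walk a UMA range at 2 KiB intervals looking for valid Option ROMs.
       'uma' is a buffer standing in for the scanned slice of the Memory
       Address Space, and 'base_address' is only used for printing */
    void scan_option_roms(const uint8_t *uma, uint32_t size, uint32_t base_address)
    {
        for (uint32_t base = 0; base + 2 < size; base += 2048) {
            if (uma[base] != 0x55 || uma[base + 1] != 0xAA)
                continue;                              /* nothing mapped here */

            uint32_t length = (uint32_t)uma[base + 2] * 512;
            if (length == 0 || base + length > size)
                continue;

            uint8_t checksum = 0;
            for (uint32_t i = 0; i < length; i++)
                checksum = (uint8_t)(checksum + uma[base + i]);

            if (checksum == 0)
                /* Valid Option ROM: the real BIOS would now do a far call to
                   its initialization entry point at offset 3 of the ROM */
                printf("Valid Option ROM found at %05X\n", base_address + base);
        }
    }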
Related to Option ROMs, an extremely obscure feature of the IBM PC 5150 is that the previously mentioned empty ROM socket, known as U28, was already mapped into the 976 KiB to 983 KiB address range (Just below the IBM Cassette BASIC), and thus ready to use if you plugged a compatible 8 KiB ROM chip there. With the last BIOS version, a ROM chip fitted in U28 worked exactly as if it were an Option ROM built into the Motherboard itself instead of an expansion card, as the BIOS routine that scans for valid Option ROMs also scans that 8 KiB address range. So far, the only commercial product that I'm aware of that shipped a ROM chip that you could plug into the U28 Socket was a Maynard SCSI Controller, albeit I don't know what advantages it had compared to having the Option ROM in the expansion card itself. Some hobbyists have also managed to make custom Option ROMs for debugging purposes. Since this empty ROM socket is present in both IBM PC Motherboard versions and predates the loadable Option ROMs feature introduced by the last BIOS version, I don't know what its original intended use was supposed to be, or whether the previous BIOS versions already supported U28 as an Option ROM and the last BIOS merely extended the scheme to around half of the UMA range.
Last but not least, after the BIOS finished the POST stage, it had to boot something that allowed the computer to be useful. Without Option ROMs, the IBM PC BIOS knew how to boot from only two types of storage media: Diskettes, via the FDC and the first Diskette Drive, or the built-in IBM Cassette BASIC, also stored in ROM chips in the Motherboard like the BIOS Firmware itself.
If your IBM PC couldn't use Diskettes because you had no FDC or Diskette Drive, or there was no Diskette inserted in the Diskette Drive, the BIOS would boot the built-in IBM Cassette BASIC. This piece of Software isn't widely known because only genuine IBM branded computers included it. Perhaps the most severe limitation of the IBM Cassette BASIC was that it could only read and write to Cassettes, not Diskettes, something that should have played a big role in how quickly it was forgotten. To begin with, Cassettes were never a popular type of storage media for the IBM PC, so I doubt that most people had a Cassette Deck and blank Cassettes ready to casually use the IBM Cassette BASIC. With no way to load or save code, everything was lost if you rebooted the computer, so its usefulness was very limited.
The other method to boot the IBM PC was with Diskettes. Not all Diskettes could be used to boot the computer; only those that had a valid VBR (Volume Boot Record) were bootable. The VBR was located in the first Sector (512 Bytes) of the Diskette, and stored executable code that could bootstrap another stage of the boot process. Besides the bootable Diskettes with the OSes themselves, there were self contained applications and games known as PC Booters that didn't rely on an OS at all; these could be booted and used directly from a Diskette, too.
The last piece of the IBM PC was the Operating System. While the OS is very relevant to the platform as a whole functional unit, it is not part of the Hardware itself (The Firmware is Software, yet it is considered part of the Motherboard as it comes in a built-in ROM with code that is highly customized for the Hardware initialization of that specific Motherboard model. However, the IBM Cassette BASIC was also in the Motherboard ROM, yet it can actually be considered some sort of built-in PC Booter). The Hardware platform is not bound to a single OS, nor is it guaranteed that a user with a specific Hardware platform uses, or even has, an OS at all.
What made OSes important are their System Calls. System Calls are Software Interrupts similar in style and purpose to the BIOS Services, as both are APIs used to abstract system functions from user applications. The major difference between them is that the BIOS Services rely on Firmware support, so they are pretty much bound to the Hardware platform, while System Calls rely on the OS itself. Since at the time of the IBM PC launch it was very common for the same OS to be ported to many Hardware platforms, an application that relied mainly on the System Calls of a specific OS was easier to port to another platform that already had a port of that OS running on it. I suppose that there was a lot of overlap between BIOS Services and System Calls that did very similar or even identical things. I imagine it is also possible that a System Call had a nested BIOS Service call, thus the overhead of some functions could have been rather high.
There were two OSes available at the launch date of the IBM PC, and some more appeared later. Regardless of the alternatives, the most emblematic OS for the IBM PC was PC DOS. PC DOS was developed mainly by Microsoft, who kept the right to license it to third parties. Eventually, Microsoft would start to port PC DOS to other non-IBM x86 based platforms under the name of MS-DOS.
The first version of PC DOS, 1.0, was an extremely dull, simple and rudimentary OS. PC DOS had no Multitasking capabilities at all, it could only run a single application at a time. Besides having its own System Calls, known as the DOS API, it also implemented the System Calls of CP/M, a very popular OS from the previous generation of platforms with 8 Bits Processors, making it sort of compatible. The intent was to make it easier to port CP/M applications to the IBM PC by giving developers a familiar OS interface to work with, so that they could instead focus on the new Hardware (Mainly the CPU, as it was a different ISA than the previous 8 Bits ones like the Intel 8080/8085 CPUs). However, as far as I know, the CP/M System Calls of PC DOS were barely used, and pretty much entirely forgotten after PC DOS took over the vast majority of the OS market share.
Perhaps the most relevant and exclusive feature of PC DOS was that it had its own File System for Diskettes, FAT12 (File Allocation Table. Originally it was merely named FAT, but it has been retconned). A File System is a data organization format that, with the aid of metadata, defines how the actual user data is stored onto a storage medium. The DOS API provided an interface that Software could use to easily read and write to Diskettes formatted in FAT12, greatly simplifying the development of user applications that had to store files. I have no idea if PC DOS was able to effectively use Cassettes on its own as a Diskette alternative, it is quite an unexplored topic.
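As a side note on where the 12 in FAT12 comes from, each entry in the File Allocation Table is 12 Bits wide, so two consecutive entries are packed into three Bytes. A minimal C sketch of decoding one entry from a raw FAT (the packing rule is the standard FAT12 one; the fat buffer and the helper name are hypothetical, and real code would first have to locate the FAT by parsing the boot sector):

    #include <stdint.h>

    /* Returns the 12 Bit FAT entry for a given cluster number from a raw
       FAT12 table. Two entries share three Bytes: the even entry takes the
       first Byte plus the low nibble of the second, while the odd entry
       takes the high nibble of the second Byte plus the third Byte. */
    uint16_t fat12_entry(const uint8_t *fat, uint16_t cluster)
    {
        uint32_t offset = cluster + (cluster / 2);   /* cluster * 1.5 */
        uint16_t value  = fat[offset] | (fat[offset + 1] << 8);
        return (cluster & 1) ? (value >> 4) : (value & 0x0FFF);
    }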
In the early days of the IBM PC, it seems that only office and productivity Software that had to deal with data storage on Diskettes used PC DOS, as it could rely on the DOS API to use FAT12. If developers didn't need sophisticated storage services, or could do their own implementation of them (Albeit such an implementation may not necessarily be FAT12 compatible, thus not directly readable by PC DOS) along with any other required PC DOS functionality, it was more convenient to do so and make the application a self contained PC Booter that relied only on the BIOS Services, since using the DOS API meant that you had to own and boot PC DOS first, and lose some valuable RAM that would be used by it. Basically, as ridiculous as it sounds now, developers actively tried to avoid using the IBM PC main OS if storing data was not a required feature. And don't even get me started on the amount of things that you had to be careful with so as to not break basic DOS functionality itself, like separately keeping track of the System Time. Other than the importance of the FAT12 File System, there isn't much else to say about the humble beginnings of PC DOS, it was really that plain. This impression will only get worse the moment that you discover that before the history of PC DOS began, Microsoft was already selling a full fledged UNIX based OS, Xenix, for other non-x86 platforms.
One of the things that I was always curious about is what the daily usage of an IBM PC was like from the perspective of an early adopter. Information about the IBM PC 5150 user experience in 1981 is scarce, but based on what I could gather from digital archivists' blog posts and such, the limited amount of expansion cards and user Software means that there weren't a lot of things that you could do with one right at the launch date. That makes it somewhat easy to hypothesize about how the IBM PC could have been used by an average user...
The lowest end model of the IBM PC had only 16 KiB RAM, the MDA Video Card, and no FDC nor Diskette Drive. The only thing that this setup could do was to boot the built-in IBM Cassette BASIC. As previously mentioned, the IBM PC Motherboard supported connecting a Cassette Deck directly to it, so it was possible to use Cassettes as the storage media for the IBM Cassette BASIC environment with just an external Cassette Deck and the base IBM PC unit. You could also use the IBM Cassette BASIC with no Cassette Deck at all, but that would have made it an entirely volatile environment, as it would have been impossible to save your work, losing whatever you had done if the computer was shut down or rebooted. IBM intended this particular model to compete against other contemporary personal computers that also used Cassettes, but aimed at people that were willing to pay a much higher premium for the IBM branding. Supposedly, the Cassette-only IBM PC 5150 was so popular that factory boxed units lasted only a few months in the market before vanishing, never to be seen again.
A mainstream IBM PC would have 64 KiB RAM (Maxing out the first 16KB - 64KB Motherboard) with a FDC and a single 5.25'' Diskette Drive. In addition to the IBM Cassette BASIC, this setup would also have been able to boot from Diskettes with a valid VBR, like standalone PC Booter applications and games, or proper OSes. While PC DOS was the most popular IBM PC OS, it didn't come by default with the system, it was an optional purchase, yet most people got it as it opened the door to using any application that had to be executed from within the DOS environment.
Booting PC DOS from a Diskette should have been an experience similar to booting a modern day Live CD in a computer with no Hard Disk, with the major difference being that every application that you may have wanted to use was on its own Diskette, so you had to disk swap often. Due to the fact that RAM was a very scarce and valuable resource, the whole of PC DOS didn't stay resident in RAM, only the code relevant to the DOS API did, leaving an application free to overwrite the rest. Because of this, if you executed then exited a DOS application, typically you also had to disk swap back again to the PC DOS Diskette to reload it, then disk swap once more to whatever other application you wanted to use.
While it is arguable that people didn't multitask between applications as often as we do now, the entire disk swapping procedure made using the system rather clumsy. Actually, it could have been worse than that, since many applications could require another Diskette to save data, so if you were ready to quit an application to use another one, you may have had to first disk swap to a blank Diskette to save the current data, probably disk swap to the application Diskette again so that it could reload itself (In the same way that the whole of PC DOS didn't stay in RAM all the time, an application Diskette had to be accessed often to load different parts of it, as code that was already in RAM got continuously overwritten), exit, then disk swap to the PC DOS Diskette. It is not surprising that PC Booters were the favoured format for anything not really requiring the DOS API services until fixed storage became mainstream, as the amount of disk swapping should have been quite painful.
A high end setup of the IBM PC would include a FDC with two 5.25'' Diskette Drives. The advantage of this setup was that it massively reduced the daily disk swapping, as applications that couldn't save data to the same Diskette that they were loaded from should have been able to write the data to the other Diskette Drive. You could also do a full disk copy at once, something that was impossible to do with only a single Diskette Drive before the advent of fixed storage media, unless you had enough RAM to make a RAMDisk to hold the data (Keep in mind that early Diskettes were only 160 KB in size, so using three 64 KiB Memory expansion cards could have been a viable, if expensive, alternative to another Diskette Drive).
After fully describing the IBM PC 5150, we can get to the point where it is possible to explain what a computer platform is. Since by now you know about nearly all the components present in the IBM PC and their reason for being there, you should have already figured out by yourself all the things that the definition of platform involves. A computer platform can be defined, roughly, as all of the previously explained things considered as part of a whole functional unit. There are subdivisions like the Hardware platform, which focuses only on a minimum fixed set of physical components, and the system Software, which includes the Firmware and the OS, the first of which can be upgraded or extended, while the latter is not even a required component at all for the existence of standalone user Software, albeit it makes it easier to develop.
How much the components of a computer platform can vary while still being considered the same platform is something quite constrained by the requirements of the Software that will run on it. The user Software can make a lot of fixed assumptions about both the Hardware platform and the system Software, so if you want these applications to work as intended, the computer must provide an environment that fully satisfies all those assumptions. On the Hardware side, the most important part is the Processor due to Binary Compatibility, so that it can natively run executable code compiled for its specific ISA. Following the Processor you have the support chips and basic Devices like the Video Card, all of which can be directly interfaced with by applications (This statement was true for 80's and early 90's platforms at most. In our modern days you always go through a middleware API, like DirectX, OpenGL or Vulkan in the case of Video Cards, or some OS System Call for everything else), as they can be found in a fixed or at least predictable Memory Map. On the system Software side, you have the Firmware and the OS, both of them providing Software Interrupts with a well defined behavior that are intended to make it easier for applications to interface with the Hardware. An application could assume that most of these things were present and behaved in the exact same way in all the systems it would be run on, so any unexpected change could break some function of the application or cause it to not work at all.
When you consider the IBM PC as a whole unit, you can think of it as a reference platform. As the user Software could potentially use all of the IBM PC platform features, it set the minimum of things that had to be present in any system that expected to be able to run applications intended for an IBM PC. There could be optional Devices that could enhance an application's functionality, but these weren't a problem because user Software didn't assume that an optional Device was always available. For example, during the early 90's, not everyone had a Sound Card, but everyone had a PC Speaker. Games typically supported both, and albeit the PC Speaker capabilities aren't even remotely comparable to a Sound Card, at least you had some low quality sound effects instead of being completely mute.
There were other platforms based on x86 Processors that were quite similar to the IBM PC from a component standpoint, but barely compatible with its user Software. For example, in Japan, NEC released their PC-98 platform a bit more than a year after the IBM PC, and from the core Hardware perspective, they had a lot in common. The first PC-98 based computer, the NEC PC-9801, had an Intel 8086 CPU that required a wider Bus and a different type of expansion slot to accommodate its 16 Bits Data Bus, but was otherwise functionally equivalent to the 8088. The support chips included an Intel 8253 PIT, an 8237A DMAC and two cascaded 8259A PICs, so it can be considered that the core components of the platform were around halfway between the IBM PC and the IBM PC/AT. Microsoft even ported MS-DOS to the PC-98 platform, so the same System Calls were available on both platforms.
However, the NEC PC-98 platform had substantial differences with the IBM PC: The Devices weren't wired to the 8237A DMAC and the 8259A PIC support chips in the same way as on the IBM PC, so the standard DMA Channel and IRQ assignments for them were different. The Video Card was completely different since it was designed with Japanese characters in mind, thus anything from the IBM PC that wanted to use it directly, typically games, would not work. The PC-98 also had a Firmware that provided its own Software Interrupts, but they were not the same as those of the IBM PC BIOS, so anything that relied on the IBM PC BIOS Services would fail to work, too. The Memory Map of the PC-98 was similar since the Memory Address Space was also partitioned in two, with the lower 640 KiB reserved for system RAM and the upper 384 KiB for everything else, but the upper section, which in the IBM PC was known as the UMA, was almost completely different. In practice, the only IBM PC user Software that had any chance to work on the PC-98 were PC DOS applications executed under PC-98 MS-DOS that relied only on the DOS API, were console based and otherwise very light on assumptions; anything else had to be ported.
Among the platforms that were partially compatible with the IBM PC, a very notable one came from IBM itself: The IBM PCjr 4860, released in March 1984. The IBM PCjr was a somewhat cut down version of the IBM PC that was targeted at the home user instead of the business user. While the PCjr Firmware was fully BIOS compatible, it had some Hardware differences with the PC. The IBM PCjr Video Card wasn't fully CGA compatible as it had just a partial implementation of the CGA registers, so any game that tried to directly manipulate the registers that weren't implemented would not function as expected. Games that instead used the BIOS Services for graphics mode changes worked, making them a good example of how the BIOS Services were doing the job of a HAL when the underlying Hardware was different. The IBM PCjr was also missing the Intel 8237A DMAC, which means that all memory accesses had to go through the Processor as they couldn't be offloaded to a DMA Controller support chip like in the IBM PC. This caused some applications to not work as expected, since they couldn't perform some operations simultaneously. There were many more differences, but the important point is that due to these differences, around half of the IBM PC Software library didn't run on the IBM PCjr, which is perhaps the main reason why it failed in the market.
Yet another example of a platform is an IBM PC that had the Intel InBoard 386/PC upgrade card mentioned a while ago installed. The upgrade card allowed an IBM PC to enjoy the new Intel 80386 CPU performance and features, however, not all Software requiring a 386 would work with it. For example, a 386 allowed Windows 3.0 to use its 386 Enhanced Mode, but Windows 3.0 assumed that if you had a 386 Processor, you also had an IBM PC/AT compatible platform. An IBM PC with an InBoard 386/PC is a PC class 386, which is a rather unique combination. In order to make the upgrade card useful, Intel and Microsoft took the extra effort of collaborating to develop a port of the Windows 3.0 386 Enhanced Mode for the IBM PC platform. This shortcoming essentially made the whole Processor upgrade card concept a rather incomplete idea, it was simply too much effort.
Among the most amazing engineering feats, I should include all the efforts made to attempt to achieve cross compatibility between completely different computer platforms. Because in those days full Software emulation was not practical due to a lack of processing power, the only viable way to do so was to throw in dedicated Hardware in an expansion card, which was still cheaper than a full secondary computer. With the help of Software similar in purpose to a Virtual Machine Monitor, which arbitrated the shared host resources and emulated what was missing from the guest platform, you could execute Software intended for a completely different platform on your IBM PC. Examples include the previously mentioned Microlog BabyBlue II, which had a Zilog Z80 CPU so that it could natively run code from CP/M applications, or the DCS Trackstar and Quadram Quadlink, which included a MOS 6502 CPU, their own RAM, and a ROM with an Apple II compatible Firmware so that they could run Apple II applications.
Some computers went dramatically further and attempted to fuse two platforms in one, including a very high degree of resource sharing, as both were considered part of the same base platform instead of a simple addon expansion card. These unique computers include the DEC Rainbow 100, which was both IBM PC compatible and could run CP/M applications thanks to having both an Intel 8088 CPU and a Zilog Z80 CPU, but in a more tightly integrated relationship than an IBM PC with a BabyBlue II installed due to some specific resource sharing, or the Sega TeraDrive, an IBM PC/AT and Sega Mega Drive (Genesis) hybrid with an Intel 80286 CPU, a Zilog Z80 CPU, and a Motorola 68000 CPU. I'm even curious whether anyone ever attempted to emulate them due to their unique features...
Some of the earliest rivals that the IBM PC had to face in the personal computer market were other computers extremely similar to the IBM PC itself, but based on different platforms. Most of these PC-like platforms were x86 based (Some even used the Intel 8086 CPU instead of the 8088) and also had their own ported versions of MS-DOS and other user Software, making them similar to the previously mentioned NEC PC-98, but closer to the IBM PC than it was, and thus far more compatible.
However, as the IBM PC popularity grew, its Software ecosystem did, too. Soon enough, it became obvious to any computer manufacturer that it would be impossible to break into the personal computer market with yet another different computer platform that would require its own Software ports of almost everything; it was already hard enough for the partially compatible platforms already in the market to stay relevant, let alone to keep introducing more. Thus, the idea of non-IBM computers that could run out of the box the same user Software as the IBM PC with no modifications at all, and even use the same expansion cards, became highly popular...
The open architecture approach of the IBM PC made cloning the Hardware side of the computer ridiculously easy, as all the chips could be picked off the shelf, while the IBM PC 5150 Technical Reference Manual had extensive diagrams documenting how they were interconnected at the Motherboard level. Microsoft would happily license MS-DOS and a Diskette version of Microsoft BASIC to other computer manufacturers, too. There was a single showstopper: The IBM PC BIOS. While IBM openly provided the source code for it, it was proprietary, making it the only thing that allowed IBM to keep the other computer manufacturers from being able to fully clone the IBM PC. In fact, some of the earliest makers of IBM PC clones got sued by IBM since they outright used the IBM PC BIOS source code that IBM published.
Eventually, a few vendors with good lawyers (Compaq being the most famous and successful) figured out that it was possible to do a legal BIOS replacement that was functionally identical to the IBM PC one, as long as they used a clean room design procedure (Basically, someone had to reverse engineer the BIOS to document what it did, then another developer that had absolutely no contact with anything related to it had to reimplement the same functionality, so as to not accidentally use any of the IBM code). This opened the door for the first legal IBM PC clone computers...
The first wave of fully legal IBM PC clone computers was spearheaded by the launch of the Columbia Data Products MPC 1600 in June 1982, less than a year after the original IBM PC. It got eclipsed by the Compaq Portable launched in March 1983, which is far more well known. Soon after, there was a great breakthrough when, in May 1984, Phoenix Technologies made its own legal BIOS replacement available to other computer manufacturers at an affordable license cost (You can read about the original Phoenix BIOS author's experiences). This caused a flood of competition from new manufacturers, as by that point, anyone with some capital could set up a workshop to build computers with the same (Or compatible) Hardware as that of the IBM PC, the PC DOS compatible MS-DOS from Microsoft, and a Firmware functionally equivalent to the IBM PC BIOS from Phoenix Technologies. All these clones were able to run almost all the Software designed for the IBM PC, for a much lower price than what IBM charged for their systems.
The only thing missing from clones was the IBM Cassette BASIC, as IBM had an exclusive licensing deal with Microsoft for a built-in ROM version of BASIC (Amusingly, there were a few rare versions of MS-DOS intended to be loaded from a built-in ROM used by some clone manufacturers). This wasn't critical for general IBM PC compatibility, except for some specific BASIC applications that expected to read code from the fixed addresses reserved for the ROMs of the IBM Cassette BASIC. Still, for some reason, many clone Motherboards had empty sockets for the 4 ROM chips that made up the IBM Cassette BASIC, but these were only rarely used. You could either remove the original ROM chips from an IBM PC, or just make a pirate copy in programmable ROMs.
Clone computers are the very reason why the IBM PC platform was so successful. IBM prices were always very high since it was aiming mostly at profitable business and enterprise customers that paid for the brand (Remember the famous "No one got fired for buying IBM" motto?); it was the clone manufacturers who were directly fighting against the other partially compatible PC-like platforms (And themselves) based on price and specifications. Clones dramatically helped to increase the size of the installed user base of IBM PC compatible computers, snowballing the growth of its Software ecosystem at the cost of the other platforms, and consequently, driving even more sales of IBM PC compatible computers. It is possible to say that the clones made all the PC-like platforms go extinct.
IBM, eager to continue the success of the IBM PC, released the IBM PC/XT 5160 in March 1983. The PC/XT tackled the main weakness of the IBM PC, which was that its 5 expansion slots were far from enough as the addon ecosystem began to grow. On top of that, it added an internal Hard Disk plus a HDC (Hard Disk Controller) card as out-of-the-box components, albeit at the cost of a Diskette Drive. In later models the Hard Disk would become entirely optional, making it able to host two internal Diskette Drives like the previous IBM PC did.
From the platform perspective, the PC/XT is barely interesting, as it had no core changes over the IBM PC 5150. It still had the same Processor, the same support chips, the same Clock Generation, and the exact same performance. It had a slightly different Memory Map, but nothing major. In essence, the IBM PC/XT was pretty much a wider version of the original IBM PC with some minor enhancements, but otherwise functionally almost identical to it. In fact, most of the things that could be added to it would also work in the previous PC. There is a single thing that the IBM PC/XT actually removed: The Cassette Port and its associated circuitry, perhaps to the grief of no one.
There were two different Motherboards used in the IBM PC/XT: The original one, known as the 64KB - 256KB, and a later one released around 1986, known as 256KB - 640KB. The main difference is obviously the amount of installed RAM that they supported. There were also a few minor revisions that tried to solve some quirks involving the Intel 8237A DMAC, a premonition of things to come.
Most of the big changes of the IBM PC/XT were found in its Motherboard. The easiest thing to notice is that the new Motherboard had 8 I/O Channel Slots instead of the 5 available in the original PC, which was a major upgrade considering how slot starved the original IBM PC was. To accommodate more slots, the spacing between them was reduced compared to the IBM PC. This is one of the greatest legacies of the PC/XT, as its slot spacing became a de facto standard and would eventually be formalized for Motherboard and Computer Case Form Factors like ATX, so we are still using the PC/XT slot spacing today.
A curious detail regarding the expansion slots is that one of them, Slot 8, behaved differently from all the other ones. As can be seen in the Address Bus and Data Bus diagrams of MinusZeroDegrees, for some reason, Slot 8 was wired to the External I/O Channel Bus instead of the main segment. Moreover, Slot 8 repurposed Pin B8 of the I/O Channel Slot, which in the IBM PC 5150 was marked as Reserved, as CARD SLCTD (Card Selected), and it was expected that any card inserted into Slot 8 made use of that line. Because of this, an expansion card had to explicitly support the Slot 8 behavior in order to work if installed there. Many IBM PC/XT era expansion cards typically had a Jumper to select either standard or Slot 8 specific behavior, so they could work in either. There has been some speculation about what IBM's idea behind Slot 8 was. So far, there seems to exist at least a single card that ONLY works in Slot 8, which IBM used in a PC/XT variant known as the 3270 PC that had some expansion cards bundled with it so that it could emulate an IBM 3270 Terminal (Similar in nature to the Microlog BabyBlue II, DCS Trackstar and Quadram Quadlink mentioned previously) to interface with IBM mainframes.
While at the core the support chips remained the same, the Intel 8255 PPI had several of its GPIOs rearranged. It was still being used as part of the Keyboard and PC Speaker subsystems, and their GPIO Pins remained the same for backwards compatibility reasons, but many others were rearranged. Port A, being fully used by the Keyboard interface, remained completely intact; the changes were among the Port B and C GPIO Pins. The most noticeable change is the removal of anything related to the Cassette interface, as the Cassette Deck Port and its support circuitry completely vanished, so it was now impossible to plug a Cassette Deck into the PC/XT. Given how popular the Cassette-only IBM PC models were known to be, I wonder if anyone actually missed it...
The Motherboard still included the IBM Cassette BASIC in ROM and you could still boot to it. However, it was the exact same version as the previous IBM PC one, which means that it still lacked Diskette support. Since Cassettes were now impossible to use, the BASIC environment was absolutely volatile. Keeping the IBM Cassette BASIC in the PC/XT wasn't entirely useless, because the Diskette versions of BASIC that Microsoft provided for genuine IBM PC computers were not standalone; instead, they relied on reading code from the Motherboard ROM. I suppose that this method could save some RAM, as the BASIC environment could get part of its executable code directly from the always available ROM instead of having to waste RAM to load the data from the Diskette. This gimmick became pretty much useless as the amount of RAM that systems had installed continued to grow.
The Memory Map of the PC/XT saw a few changes in the UMA that bridge the gap between the PC and the next platform, the PC/AT. Whereas the original IBM PC had a 16 KiB chunk just above the Conventional Memory (640 KiB to 655 KiB) marked as reserved, the PC/XT unifies it with the next segment for video framebuffers that includes MDA and CGA, so as to make it a full 128 KiB block (640 KiB to 767 KiB. Note that the IBM PC/XT Technical Reference from April 1983 seems to merely mark it as reserved instead of for video framebuffer purposes as in the IBM PC Technical Reference). After it, there is a 192 KiB block intended for Option ROMs (768 KiB to 959 KiB), which is free except for the predetermined allocation of an 8 KiB chunk for the new HDC (Hard Disk Controller) card Option ROM (800 KiB to 807 KiB. Note that this allocation was retconned for the original IBM PC, as it also appears in the April 1984 version of its Technical Reference). Finally, the second 16 KiB chunk that was marked as reserved in the IBM PC is now unified with the range reserved for the Motherboard ROMs, making it a 64 KiB block (960 KiB to 1024 KiB). In summary, the UMA consists of 128 KiB for video framebuffers, 192 KiB for Option ROMs, and 64 KiB for Motherboard ROMs.
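The same layout, written as hexadecimal constants for those who find raw addresses easier to follow than KiB figures (1 KiB being 0x400 Bytes, so 640 KiB is 0xA0000; the constant names are just illustrative):

    /* PC/XT Upper Memory Area layout, same ranges as described above. */
    #define UMA_VIDEO_START   0xA0000   /* 640 KiB: 128 KiB for video framebuffers  */
    #define UMA_OPTROM_START  0xC0000   /* 768 KiB: 192 KiB for Option ROMs         */
    #define UMA_HDC_ROM       0xC8000   /* 800 KiB: 8 KiB HDC card Option ROM       */
    #define UMA_MBROM_START   0xF0000   /* 960 KiB: 64 KiB for Motherboard ROMs     */
    #define UMA_END           0x100000  /* 1024 KiB: top of the 8088 Address Space  */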
About physical characteristics, as previously mentioned, there were two major PC/XT Motherboard versions: The first one was the 64KB - 256KB, and the second one was the 256KB - 640KB, which was released around 1986. The IBM PC/XT 5160 Technical Reference mentions that the Motherboard physical dimensions are 8.5 x 12 inches, which is approximately 21.5 cm x 30.5 cm, and this seems to apply to both Motherboards. That should make them identical in size to the second version of the IBM PC 5150 Motherboard. However, I have seen at least one source claiming that a PC/XT Motherboard measures 8.5 x 13 inches. Yet, as in the same sentence it says that the IBM PC Motherboards are of the same size as the PC/XT one, I suppose that that measurement is simply wrong, since if the PC/XT Motherboard were longer, it would directly contradict both versions of the IBM PC 5150 Technical Reference...
The two PC/XT Motherboards had a few differences. The first, and rather obvious, difference is how much system RAM could be installed into the Motherboard itself, as the first one maxed out at 256 KiB RAM while the second one could fit the full 640 KiB of Conventional Memory. The latter is rather curious since it had asymmetric RAM Banks (Two Banks had 256 KiB RAM each and the other two had 64 KiB each, so they had to use different DRAM chip types), whereas the first had the same arrangement as the 64KB - 256KB IBM PC Motherboard.
Something worth mentioning is that in the PC/XT Motherboards, all the RAM chips are socketed, making troubleshooting a dead Motherboard easier than it was in the IBM PC, since you can swap the Bank 0 DRAM chips for known good ones without having to desolder. Yet another miscellaneous but still interesting change related to RAM is that the PC/XT, in order to determine how much system RAM was installed in the computer, had a Conventional Memory scanning routine in the BIOS that was executed during POST, so it pretty much autodetected it. This is in contrast to the IBM PC, whose Firmware just checked the position of a secondary block of DIP Switches and stopped testing the Conventional Memory when it reached the set amount.
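A minimal sketch of the idea behind such a sizing routine, not the actual IBM code (the 16 KiB step, the test patterns and the simulated open bus behavior are arbitrary choices made for this example):

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated machine with 256 KiB of working RAM; reads beyond it return
       0xFF, roughly like reading an open bus. A real routine would of course
       poke physical memory instead of an array. */
    #define INSTALLED_KIB 256
    static uint8_t ram[INSTALLED_KIB * 1024];

    static uint8_t probe_read(uint32_t addr)
    {
        return (addr < sizeof(ram)) ? ram[addr] : 0xFF;
    }

    static void probe_write(uint32_t addr, uint8_t value)
    {
        if (addr < sizeof(ram))
            ram[addr] = value;
    }

    /* Probe Conventional Memory in 16 KiB steps by writing test patterns and
       reading them back, stopping at the first location that fails. */
    static uint32_t size_conventional_memory(void)
    {
        const uint32_t step = 16 * 1024;
        uint32_t addr;
        for (addr = 0; addr < 640 * 1024; addr += step) {
            probe_write(addr, 0x55);
            if (probe_read(addr) != 0x55)
                break;
            probe_write(addr, 0xAA);
            if (probe_read(addr) != 0xAA)
                break;
        }
        return addr / 1024;   /* Amount of working RAM found, in KiB */
    }

    int main(void)
    {
        printf("Detected %u KiB of Conventional Memory\n",
               (unsigned)size_conventional_memory());
        return 0;
    }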
The second difference between the PC/XT Motherboards is the amount of ROM memory on the Motherboard itself. The 64KB - 256KB Motherboard had just two ROM chips, a 32 KiB one and an 8 KiB one, that were mapped to match the expected position of their contents in the IBM PC 5150. The 32 KiB ROM chip included the full contents of the 8 KiB BIOS, while the 32 KiB of the IBM Cassette BASIC were split, spanning the remaining 24 KiB of the 32 KiB ROM chip and the entirety of the 8 KiB ROM chip. In total that makes for 40 KiB worth of ROM, which remains unchanged from the IBM PC 5150 (If you don't count the IBM PC's obscure optional 8 KiB ROM in the U28 socket). A quirk of this arrangement was that, as can be seen in the Memory Layout of BIOS and BASIC Motherboard Diagram of MinusZeroDegrees, the 8 KiB ROM was affected by partial address decoding and thus its contents were repeated four times in the Address Space, so it actually occupied 32 KiB in the Memory Map, wasting 24 KiB (960 KiB to 983 KiB. This overlapped with the address range reserved for the U28 socket in the IBM PC, but this was not a problem since no PC/XT version had such an empty ROM socket at all).
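The mirroring caused by partial address decoding is easy to express: the undecoded upper address bits are simply ignored, so any address inside the 32 KiB window falls back to an offset within the single 8 KiB ROM. A one-line sketch of the effect (the helper is purely illustrative):

    /* With only the ROM's 13 address lines connected, any address inside the
       32 KiB window maps back to an offset within the single 8 KiB ROM. */
    #define ROM_SIZE 0x2000u                      /* 8 KiB */
    unsigned rom_offset(unsigned addr_in_window)
    {
        return addr_in_window & (ROM_SIZE - 1u);  /* Undecoded upper bits ignored */
    }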
The later 256KB - 640KB Motherboard instead used two 32 KiB ROM chips with the same basic arrangement, but it had a new BIOS version that was extended using the extra 24 KiB of ROM space in the second ROM chip (For a total of 32 KiB for the BIOS, which was now noncontiguously split among both ROM chips, and a total of 64 KiB of ROM memory, fully making use of the extended 64 KiB for Motherboard ROMs in the PC/XT Memory Map), so those 24 KiB of the address range weren't wasted mapping nothing anymore. The bigger BIOS included support for a few new Devices, including Keyboards, FDCs and Diskette Drives. This BIOS version is also known to perform the previously mentioned Conventional Memory scanning and testing routine faster than the original 1983 one due to code optimizations. It was possible to upgrade the original 1983 64KB - 256KB PC/XT Motherboard with the two 32 KiB ROMs of the 1986 BIOS, as the second socket was compatible with both 8 KiB and certain 32 KiB ROM chips. As even more trivial info, both Motherboards had a Jumper that allowed disabling the two ROM sockets in case you wanted to have an expansion card with ROMs mapped there (Since you couldn't bypass the 8088 CPU's hardcoded bootstrapping requirements, this meant that the Firmware became external, as it would be located in an expansion card).
A major new feature of the IBM PC/XT was that it had models that came from the factory with a HDC (Hard Disk Controller) card and a 10 MB HD (Hard Disk). The PC/XT BIOS had no built-in support at all for these (Not even the 1986 one); instead, the HDC card had an 8 KiB ROM chip that the BIOS could use as a loadable BIOS Driver thanks to the Option ROM feature first introduced in the third and last version of the IBM PC 5150 BIOS (Aside from the BIOS timestamp, I don't know if that IBM PC BIOS version was available in the market before or after either the IBM PC/XT or the 64KB - 256KB IBM PC Motherboard, so it is debatable whether the loadable Option ROM feature first reached the market with the IBM PC/XT). As already mentioned, IBM reserved a fixed address range for the HDC card Option ROM so that newer expansion cards could avoid that address range. The HDC and HD used the ST-506 interface, which became the standard for the IBM PCs and compatibles.
The PC/XT also introduced a new version of PC DOS, 2.0. The new OS included a lot of new features, like native support for Hard Disks (The FAT12 File System received some minor modifications to make it usable on them. Besides reading and writing to a HD, PC DOS 2.0 could also be installed on and booted from one), the introduction of directories (PC DOS 1.0/1.1 had no such concept, all the files were found in the root of the Drive. This also required changes to the way that FAT12 stored metadata on the disk media), a standardized interface for loadable Device Drivers (PC DOS 1.0/1.1 had no concept of loadable Drivers either; the closest thing to that was to hack the OS itself to add the required functionality, which was the case with the pre-PC/XT third party Hard Disks, as they directly modified PC DOS 1.0/1.1), and support for TSR (Terminate and Stay Resident) applications. PC DOS 2.0 also had several major improvements to DOS subsystems that began to make it look like a minimalistic but real OS, compared to the ultra primitive PC DOS 1.0.
The enormous amount of space that HDs were capable of holding (10 MB was a lot in 1983) required new forms of data organization. While it should have been possible to have a simple bootable VBR (Volume Boot Record) in the first sector of the HD like it was done with Diskettes, that would have been quite suboptimal (Ironically, today that is often done with USB Flash Drives and external HDs. Formatting them that way is known as Superfloppy). The problem was that at the time, each OS had its own exclusive File System, and there was almost zero cross compatibility between different OSes and File Systems. Any exchange of data between media formatted with different File Systems had to be done with special tools. Thus, if you formatted a HD with FAT12 to use it with PC DOS, you would have been unable to natively use it with another OS like CP/M-86, unless you were willing to format it again at the cost of erasing all the data already stored in it.
As HDs were expensive and considered Workstation class Hardware, it was expected that they would be used by power users that needed the versatility of having more than a single OS installed on them. To resolve this issue, IBM decided to define a partitioning scheme known as MBR (Master Boot Record). By partitioning a HD and then formatting each partition with its own File System, it was possible to use a single HD to install and boot multiple OSes within their own native environments.
When a HD is partitioned using the MBR scheme, the first sector (512 Bytes) of the HD contains the MBR itself. The MBR has executable code similar in nature to a bootable VBR, but it also contains a Partition Table that defines the start and end of up to 4 partitions (Later known as Primary Partitions). Each partition could be formatted with a different File System, and it could also have its own bootable VBR in the first sector of the partition. A major limitation of the MBR is that only one of those partitions could be flagged as bootable (Also known as the Active Partition) at a given time, and the MBR would always load only that one. In order to boot an OS installed in another partition, you had to use disk tools that allowed you to set that partition's bootable flag. Luckily, it was also possible for the Boot Loader loaded from the Active Partition's VBR to have chain loading capabilities (Such a loader is known as a Boot Manager), so that it could load the VBR of another partition to boot whatever OS was in it.
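The on-disk format of the MBR is compact enough to describe as a C struct (the 446 Byte boot code area, the 4 partition entries of 16 Bytes each and the 0xAA55 signature at offset 510 are the standard layout; the helper that looks for the Active Partition is just a hypothetical illustration of how the bootable flag is used):

    #include <stdint.h>

    #pragma pack(push, 1)
    struct mbr_partition_entry {
        uint8_t  boot_flag;        /* 0x80 = Active (bootable), 0x00 = inactive */
        uint8_t  chs_start[3];     /* CHS address of the first sector */
        uint8_t  type;             /* Partition type ID (File System hint) */
        uint8_t  chs_end[3];       /* CHS address of the last sector */
        uint32_t lba_start;        /* Start of the partition, in sectors */
        uint32_t sector_count;     /* Size of the partition, in sectors */
    };

    struct mbr {
        uint8_t  boot_code[446];              /* Executable code */
        struct mbr_partition_entry part[4];   /* The Partition Table */
        uint16_t signature;                   /* Must be 0xAA55 */
    };
    #pragma pack(pop)

    /* Returns the index of the Active Partition, or -1 if none is flagged. */
    int find_active_partition(const struct mbr *m)
    {
        if (m->signature != 0xAA55)
            return -1;
        for (int i = 0; i < 4; i++)
            if (m->part[i].boot_flag == 0x80)
                return i;
        return -1;
    }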
Booting an OS from a HD wasn't complicated; instead, it was actually absurdly linear in nature. The BIOS had absolutely no knowledge of Partitions or File Systems, the only thing it knew how to do was to read the first sector of the HD, which is where the MBR is found. Thus, after the BIOS POSTed, it would blindly read the MBR, then the MBR in turn would load the VBR of the partition flagged as bootable, which could either bootstrap the rest of the OS, or contain a Boot Manager that could chain load another VBR so that you could boot to another OS. When there is more than one HD in the same computer, it is possible to tell the BIOS which HD you want to read the MBR from first. This is essentially how booting from a Hard Disk (Including Dual Boot) worked during the next 30 years.
As some anecdotal experiences, perhaps you remember that during the long Windows XP era, there were some unwritten rules about the order in which you had to do things whenever you wanted to prepare a system for Dual Booting, like installing the older Windows version first and the newer one second, or, if you wanted a Dual Boot with Windows and Linux, it was recommended that you always installed Windows first, then Linux second. At the time, I knew that the rules were true, as doing it in the wrong order pretty much wrecked the OS that was already installed, but it always puzzled me what black magic the OS installers used, since it didn't seem that any particular file managed that.
For example, at some point I wanted to have a Windows 95 and Windows XP Dual Boot. The first time I tried to do it, I installed W95 to an empty, unformatted partition in a HD that already had a NTFS partition with a bootable WXP. The W95 installer formatted the empty partition with FAT16 and installed W95 there, but after that, I couldn't get WXP to boot again. I found no way to make the computer remember that it previously had WXP installed, it felt like its existence went entirely ignored. I moved the HD to another computer, and found that the WXP partition data was still there and intact (The data was not visible to W95 since the WXP partition was formatted using the NTFS File System, which W95 does not understand, but I was expecting that). As back then I didn't know any way of fixing things other than reinstalling Windows, I decided to start over by installing W95 first then WXP, and that worked (At the cost of some hours of work and a lot of lost settings). I figured out that later Windows versions were intelligent enough to scan the other partitions looking for previous Windows installations, and automatically built a Boot Manager so that you could choose which Windows installation you wanted to boot. However, Windows ignored any other unsupported OSes (Including newer Windows versions) when building the list of OSes for the Boot Manager, which is also why it was suggested to always install Linux after Windows. Not much later, I also had issues with computers with two HDs, where a Windows installation on the second HD became unbootable if the first HD was removed, something that made absolutely no sense since Windows had no reason to use the other HD at all. Just in case, I decided to always install Windows with only one HD plugged in the computer, then plug in the second one after finishing the Windows install process. Some time later I finally learned why that happened: Windows could write in the MBR of one HD to try to load a VBR found in another HD, forcing you to keep both HDs unless you manually fixed this.
All these things began to make sense after I finally understood the root cause: There is a high amount of manipulation that the MBR and VBRs silently suffer at the hands of the Windows installers. The MBR and VBRs usually go unnoticed since you can't see them as standard files, even though their data exists on the physical disk, just outside of any File System boundaries. The Windows installers always ask you in which partition you want to install Windows, but probably in the name of user friendliness, they always hide the low level details of the modifications they make to the MBR and VBRs of any connected HDs, causing unnecessary chaos with no justifiable reason. With proper knowledge and specialized tools to repair the MBR and VBRs, fixing my issues should have been rather easy, but sadly, I didn't have them back then. I eventually learned both how simple booting actually is and how much of a mess the Windows installers can make by reading Linux distributions' installation instructions. Something similar happens when people talk about their "C" and "D" Drives, when actually the letter is just a Windows nomenclature for a mapped partition that it can access (Partitions that Windows can't understand don't get a Drive letter), but that says nothing about how many physical HDs there are, nor how they are actually partitioned. The devil is always in the details...
The clone manufacturers didn't stand still after becoming able to legally clone the original IBM PC. As you could expect, they continued to chase IBM and its newest computer from behind. While I'm not exactly sure about specific dates (Someone that wants to crawl old computer magazines looking for era appropriate announcements or advertisements may be able to make such a list), the timeframe for the debut of the first PC/XT clones (NOT the PC-likes) should have been around one year after the IBM PC/XT launch, which means that they should have been available in the market just in time to meet the new IBM PC/AT. However, clone manufacturers weren't satisfied with just doing the same thing that IBM did but at a cheaper price; many of them actually attempted to create genuinely superior products by introducing newer features ahead of IBM while preserving full IBM PC compatibility, something that the other early x86 based PC-like platforms failed at.
One of the most surprising features that was first seen in clone computers was the Reset Button, which neither the IBM PC, the PC/XT, nor even the later PC/AT had at all. The usefulness of the Reset Button relied on the fact that sometimes a running application could become unresponsive in such a way that the Ctrl + Alt + Del Key Combination to reboot the computer was completely ignored (An example would be an application that masked IRQ 1, the Interrupt Line used by the Keyboard, then failed to restore it for some reason). If that happened in an IBM PC, PC/XT or PC/AT, your only option was to power cycle the computer (Unless you added a Reset Button via an expansion card...).
The Reset Button was implemented as a momentary type switch in the front of the Computer Case, with a cable that plugged into a header on the Motherboard that was wired to the RES line of the Intel 8284A Clock Generator. When the Reset Button was pressed, the 8284A CG would receive the RES signal, then in turn generate a signal through the RESET line that was directly wired to the Intel 8088 CPU. Basically, resetting the CPU was an existing Hardware feature that IBM didn't expose in its computers for some reason. The Reset Button was the second member of the Front Panel Headers to be introduced, the first being the PC Speaker.
The most daring clones used as their Processor the non-Intel but x86 compatible NEC V20 CPU, released in 1984. IBM wasn't the only major manufacturer that had products reverse engineered, cloned, and improved upon; the Intel 8088 CPU had become popular enough to warrant such treatment, too. The NEC V20 was an impressive chip: It was around 5% faster than the Intel 8088 CPU at the same clock speed (Theoretically, with code optimized specifically for it, it could be much faster, but in practice, V20 optimized code was extremely rare), supported the new x86 instructions introduced by the Intel 80186/80188 CPUs along with some custom NEC ones, had a mode that emulated the old Intel 8080 CPU that could be used by Software emulators to run 8080 based 8 Bits CP/M applications without needing a dedicated card (The NEC V20 did not emulate the popular Zilog Z80 enhancements made to the Intel 8080 ISA, so applications that used those didn't work), and, best of all, remained pin compatible with the 8088. There were some scenarios where specific applications would not work with it due to low level details, but otherwise, it was the closest thing to an 8088 without being an 8088. The NEC V20 was even used as an upgrade for the original IBM PC and PC/XT, as it was the only possible drop in replacement for the Intel 8088 CPU that was faster than it when running at the same fixed 4.77 MHz clock speed (The other alternative were upgrade cards). There was also a V30, which gave the 8086 the same treatment.
Finally, something minor but still noteworthy is that the PC/XT clone Motherboards didn't bother to reproduce the IBM PC/XT functionality down to the last detail. A notable omission is the special behavior of the original IBM PC/XT expansion slot known as Slot 8, as the PC/XT clone Motherboards implemented it as if it were just another standard slot. This was perhaps for the better, as it means that you could fit any expansion card in it instead of only those that had explicit Slot 8 support. I'm not even sure if there was a PC/XT clone Motherboard that actually implemented the IBM PC/XT Slot 8 behavior.
By 1985, the PC/XT clone computers were reaching maturity. While by that point IBM had already greatly extended the PC platform with its latest computer, the IBM PC/AT, it was a much more complex design than the IBM PC/XT and thus much more expensive, something that also applied to the early PC/AT clones. As such, it took several years for PC/AT compatible platforms to become affordable enough for mainstream users, and a few more to get into the low end of the market. In the meantime, systems based around the original PC platform continued to be designed, manufactured and sold, giving PC class based systems as a whole a surprisingly long life before being phased out of the new computer market altogether.
Besides price, something that made the PC platform's useful life last as long as it did is that it became both possible and cheap to make a faster version of it. So far, the IBM PC/XT and its direct clones were still stuck at 1981 performance levels, since they had the exact same chips running at the same clock speeds as the original IBM PC (The only reasonable upgrade was the already mentioned NEC V20 CPU), but in the meantime, Intel and the other chip manufacturers kept improving their manufacturing processes, allowing them to release new models of the same chips used by the PC/XT that were capable of running at higher Frequencies. Eventually, clone manufacturers decided to use the faster chip models along with a revised Clock Generation scheme in otherwise basic PC/XT designs. This new class of PC/XT clones became known as Turbo XTs. If you were to compare the Turbo XTs with the earlier PC-like computers that were faster than the IBM PC, the main difference is that a PC-like was at best just partially compatible with the PC platform, whereas a Turbo XT uses it as a base, thus compatibility is far better with both its Software and its expansion cards.
As the Turbo XTs were going beyond the capabilities of the system that they were copying, I think that at this point they actually deserve to be considered their own class of IBM PC compatibles, which is a much more fitting and respectable term than being considered mere clones. Actually, clone is usually considered a pejorative term, as it dismisses the fact that most of the early vendors aspired to become serious competitors, and some eventually managed to become recognized brand names with their own custom designs, beating IBM in its own market, like Compaq. If anything, there were a lot of nameless, rather generic Turbo XT Motherboard designs in the late 80's that had no redeeming original feature, nor a manufacturer to make support inquiries to, and were aimed at the low end budget market. For those, being called clones could be more fitting than for the early vendors that managed to provide full fledged computer systems.
The foundation of a Turbo XT Motherboard was the use of higher binned chips. Whereas the IBM PC and PC/XT had an Intel 8088 CPU rated for 5 MHz paired with support chips that matched it, Turbo XTs generally used the newer 8 or 10 MHz 8088 or NEC V20 models, and some even pushed for 12 MHz. As you should already know, the effective Frequency that a chip runs at depends on its reference clock input, so assuming the same Clock Generation scheme of the IBM PC and PC/XT, any faster 8088 CPU would still run at 4.77 MHz for as long as you used the same 14.31 MHz Crystal Oscillator. To get the CPU to run at 8 or 10 MHz, you had to use a 24 or 30 MHz Crystal Oscillator, respectively, so that you could derive the required 8 or 10 MHz CLK line that the CPU used as input. However, if you actually remember the entire IBM PC 5150 Clock Generation diagram, you should quickly notice that there is a major problem with this approach: If you changed the 14.31 MHz Crystal Oscillator for any other one, absolutely everything in the system would run at a different Frequency. Most things could run at a higher Frequency, but some had issues if run at anything other than the default clock speed, for very specific reasons. The first platform designer that should have noticed this issue was not any of the early Turbo XT designers but IBM itself, since the IBM PC/AT design should predate the earliest Turbo XTs by at least a year, yet IBM had to face many of the same problems with its new system.
For example, suppose that you want to run an 8088 or V20 @ 8 MHz, be it by overclocking the 5 MHz models, using a properly rated 8 MHz one, or even underclocking a 10 MHz version. If you were to simply change the 14.31 MHz Crystal Oscillator for a 24 MHz one so that the derived CLK line runs at 8 MHz instead of 4.77 MHz, you would also be deriving 24 MHz OSC and 4 MHz PCLK lines instead of the default 14.31 MHz and 2.38 MHz (Which was further halved to 1.19 MHz), respectively. These clocks would absolutely wreck both the CGA Video Card and the Intel 8253 PIT. The CGA Video Card would be unable to output a NTSC video signal as intended, making it unusable with TVs, albeit it should still work with CGA Monitors. This was not a major issue since, in a worst case scenario, you could replace the CGA Video Card with a MDA one and avoid the OSC line issue altogether (I don't know if a third party ever manufactured a CGA Video Card that included its own Crystal Oscillator and Clock Generator to remove the dependency on the OSC line, albeit that would mean that the card would have to deal with a more complex asynchronous Clock Domain Crossing).
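The arithmetic behind these figures is simple, since the 8284A outputs the crystal Frequency untouched as OSC, divides it by 3 for CLK and by 6 for PCLK, with the Motherboard halving PCLK once more for the PIT. A tiny sketch just to make the numbers visible (the function is purely illustrative):

    #include <stdio.h>

    /* Derived clocks of an 8284A for a given crystal, as used in the PC:
       OSC = crystal, CLK = crystal / 3, PCLK = crystal / 6, and the PIT
       input is PCLK further halved by the Motherboard. */
    static void derive_clocks(double crystal_mhz)
    {
        printf("Crystal %8.5f MHz -> OSC %8.5f, CLK %5.2f, PCLK %5.3f, PIT %6.4f MHz\n",
               crystal_mhz, crystal_mhz, crystal_mhz / 3.0,
               crystal_mhz / 6.0, crystal_mhz / 12.0);
    }

    int main(void)
    {
        derive_clocks(14.31818);   /* Stock IBM PC: 4.77 MHz CLK, 1.193 MHz PIT input */
        derive_clocks(24.0);       /* Naive Turbo XT: 8 MHz CLK, but a 2 MHz PIT input */
        return 0;
    }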
The 8253 PIT was not as lucky: Its concept of time was directly related to the duration of a clock cycle. Basically, the PIT had no idea of what real time is, it is just a clock cycle counter, so it had to be externally calibrated against a real time second based on its 1.19 MHz clock input. If the PIT runs at a faster clock speed, each individual clock cycle lasts less time, which means that the PIT counts the same amount of clock cycles at a faster rate, or, in other words, completes a count in a shorter period of real time. This directly translates into anything that used the PIT to track time running faster than real time without being aware of it. While it may have been possible to work around this issue at the Firmware level by enhancing the BIOS Services to be aware that a real time second could now be worth a varying amount of clock cycles, Software that directly programmed the PIT would assume the standard IBM timings and misbehave as a result. As such, the PIT input clock had to be kept at 1.19 MHz for compatibility reasons, and compared to the CGA issue, this one was completely unavoidable if you were making an IBM PC compatible platform, so it had to be solved somehow.
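To put numbers on it: the PIT produces the system timer tick by dividing its input clock by a 16 Bit reload value, and the BIOS default reload of 65536 with the standard 1.193182 MHz input gives the famous ~18.2 ticks per second. A small sketch of what feeding it a doubled input clock would do (the 2 MHz figure corresponds to the 24 MHz crystal example above):

    #include <stdio.h>

    int main(void)
    {
        const double reload      = 65536.0;      /* BIOS default 16 Bit reload value */
        const double standard_hz = 1193182.0;    /* 14.31818 MHz / 12 */
        const double doubled_hz  = 2000000.0;    /* What a 24 MHz crystal would feed it */

        double standard_tick = standard_hz / reload;   /* ~18.2 Hz */
        double fast_tick     = doubled_hz  / reload;   /* ~30.5 Hz */

        printf("Standard timer tick: %.4f Hz\n", standard_tick);
        printf("Overclocked tick:    %.4f Hz (time would run %.2fx too fast)\n",
               fast_tick, fast_tick / standard_tick);
        return 0;
    }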
In order to resolve the dependency issue on the 14.31 MHz reference clock, one solution was to decouple the single system wide reference clock by using two Clock Generators and two Crystal Oscillators. Instead of a single Intel 8284A Clock Generator with a single 14.31 MHz Crystal Oscillator, now you had two 8284A CGs, one paired with the 14.31 MHz crystal to derive the 14.31 MHz OSC and 2.38 MHz PCLK lines for the CGA Video Card and the PIT, just like in the IBM PC, and another paired with a 24 or 30 MHz crystal that was used exclusively to derive an 8 or 10 MHz CLK line for almost all the rest of the platform. This was the common arrangement of a Turbo XT platform, but a bit complex since it required an asynchronous Clock Domain Crossing. Another solution was to use a single crystal and CG, but passing the 4.77 MHz CLK line through a clock doubling circuit to turn it into a 9.54 MHz one, with the advantages that it remained synchronous and was probably cheaper to implement due to requiring fewer parts. IBM had to deal with the same issue in the PC/AT and decided to go for the former solution with two CGs, as did all PC/AT clones and compatibles, so the single crystal solution lasted only as long as the Turbo XT platform.
While an 8 MHz CLK line would also be overclocking the FPU (If present), the Buses, and everything on them, like the support chips and expansion cards, this was not as much of a problem as the previously mentioned PIT, since none of the other components were as dramatically bound to a fixed Frequency as the PIT was. Fortunately for Turbo XT based platforms, as manufacturing technology had progressed quite a bit, almost all the chips that depended on the CLK line either had higher rated parts available, or could at least tolerate being overclocked to 8 MHz (This was the case of the Intel 8237A DMAC, whose highest rated part went up to 5 MHz and was already in use by the original IBM PC. I'm not sure if other manufacturers made faster ones, or if the 5 MHz parts had to be individually tested to make sure that the overclock was stable). As the support chips were built into the Motherboard itself, there was no reason to worry too much about them, as it was the responsibility of the Motherboard (Or computer) manufacturer to make sure that if it was selling a Turbo XT Motherboard that officially supported running at 8 MHz, all its components could actually work at that clock speed, so I would take for granted that everything soldered was good enough.
In the case of the asynchronous DRAM, while it wasn't directly affected by the higher clock speeds, it still had to be quick enough so that the data was ready when the Processor asked for it. Regarding the DRAM chips themselves, the Manual of a Turbo XT Motherboard mentions that the model intended for 8 MHz operation needed 150 ns or faster DRAM chips, while the 10 MHz model needed 120 ns. Some Turbo XT Motherboards also had configurable Memory Wait States, so that you could add a do-nothing cycle to memory accesses to let slower RAM catch up. Typical choices were 0 or 1 Memory WS. As I don't know the exact MHz-to-ns ratios involved, I'm not sure what the maximum tolerable access time for a given Frequency was, or how many Memory WS you had to add. I suppose that if it was possible to add enough Memory WS, you could even use the original IBM PC 250 ns DRAM chips, albeit the performance would have been horrible.
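The basic arithmetic, at least, is straightforward: one clock cycle lasts 1000/MHz nanoseconds, and a standard 8088 Bus access takes 4 clocks plus any Wait States, so the window in which the DRAM has to respond shrinks as the CLK line gets faster. The C sketch below is illustrative only; the real usable DRAM access time is a fraction of the full Bus cycle, but the proportions are what matter.

/* Clock period versus Bus cycle window at the usual Turbo XT speeds. */
#include <stdio.h>

static void bus_window(double mhz, int wait_states)
{
    double cycle_ns  = 1000.0 / mhz;                 /* duration of one clock */
    double access_ns = cycle_ns * (4 + wait_states); /* 4-clock 8088 Bus cycle */
    printf("%5.2f MHz, %d WS: clock = %6.1f ns, Bus cycle = %6.1f ns\n",
           mhz, wait_states, cycle_ns, access_ns);
}

int main(void)
{
    bus_window(4.77, 0);  /* IBM PC style timing */
    bus_window(8.00, 0);  /* Turbo XT, 150 ns DRAM recommended */
    bus_window(10.00, 0); /* Turbo XT, 120 ns DRAM recommended */
    bus_window(10.00, 1); /* same board with 1 Memory Wait State added */
    return 0;
}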
Expansion cards deserve their own paragraphs, since they could be fairly temperamental. As far as I know, I/O Channel Cards were never rated as guaranteed to work up to a determined Bus clock speed as individual chips like the CPU, FPU or DMAC were, nor did they have a rated access time like the asynchronous DRAM chips. The only thing that seemed to be set in stone is that, at the bare minimum, an I/O Channel Card had to work in the original IBM PC or PC/XT with its 4.77 MHz I/O Channel Bus and 1 I/O WS (This statement should have been true in 1985 when the IBM PC was still common, but after it became obsolete, many newer cards that worked in faster platforms and were physically compatible with the IBM PC I/O Channel Slots would not work in it at all). Some early IBM PC cards were unable to work reliably at higher clock speeds, yet it should be safe to assume that era appropriate cards consistently worked @ 8 MHz at a minimum, as otherwise, without a good source of usable expansion cards, the Turbo XT platforms would have been completely useless. Regardless, if you were purchasing an entire Turbo XT computer, the responsibility of testing the bundled expansion cards to make sure that they were capable of reliably working @ 8 MHz or higher fell on the system builder.
In order to have a higher degree of compatibility with expansion cards, some Turbo XT Motherboards had Firmwares whose BIOS Services did their expected function, but also executed some redundant code that served to add extra delay after I/O accesses, as a sort of artificial Software based Wait State (Basically, it worked like a NOP Instruction, but was done with dummy code instead of a real NOP, apparently because the NOP added more delay than needed). The problem with this approach was that these Software Wait States were not universal, as applications that accessed the Hardware directly instead of using the BIOS Services would completely bypass this workaround. An alternative that was technically possible, but that I failed to find examples of, is a Turbo XT Motherboard that could inject a number of I/O Wait States other than 1 (0 could also be a valid value), as that would have been a universal solution. You can read more in-depth info about this topic in this Thread, so you can have a grasp of how complex it was to deal with this.
I'm rather clueless regarding how expansion card compatibility issues were dealt with back in the day. It seems that it was impossible to know ahead of time whether a particular expansion card would work in a specific Turbo XT computer without testing it first, given the high amount of platform variety and that there was nothing standardized above the IBM PC capabilities. Moreover, 8, 10 and 12 MHz platforms had vastly different difficulty scales (8 was standard, 10 may have required stability testing, and 12 should have been on the borderline of the possible and thus required handpicking the cards). As even back in the day there were Hardware enthusiast communities, I would expect that there was a lot of informal knowledge regarding whether a particular card model could consistently work at higher speeds, as if we were talking about the modern day "average overclock" (The clock speed that almost all chips of a given type can consistently get to). 12 MHz ended up being a clock speed wall for the I/O Channel Bus, which eventually forced the decoupling of the Processor from the Bus so that the CPU clock speed could be increased independently, without being burdened by the slower Bus and everything in it. As far as I know, no Turbo XT Motherboard tried to decouple them; it took until the Turbo AT platforms from IBM PC compatible manufacturers for the decoupling to begin, and they finally became fully asynchronous when the I/O Channel Bus was standardized as ISA in 1988.
While everything mentioned previously was pretty much the bare minimum required from the Hardware perspective for a functional Turbo XT computer, there were still several compatibility issues left to deal with. A lot of applications, mainly games, didn't follow IBM best programming practices of using the PIT for timing purposes; instead, they used loops. The faster the Processor was, the faster the loop would complete, and the faster the game would run. This was pretty much an almost identical situation to what happened with the PIT, with the major difference being that increasing the PIT clock speed didn't increase system performance, so there was no real reason to clock it higher, but that wasn't the case with the CPU. It seems like developers didn't expect that their Software could run in any other IBM PC compatible system with a CPU clock speed other than 4.77 MHz. Dealing with this compatibility issue required yet another solution: Selectable Frequency. This feature would be known as Turbo. Turbo is pretty much what differentiates the Turbo XTs from earlier PC-likes that weren't fully compatible with the IBM PC precisely because they were faster.
Turbo was typically implemented as an On/Off toggle switch in the front of the Computer Case known as the Turbo Button, which was plugged into a header in the Motherboard (Technically it is the third member of the Motherboard Front Panel Headers, albeit by the year 2000 it was already extinct). Its function was to select between two Frequencies: The clock speed that the platform nominally supported, and a 4.77 MHz one that provided compatibility with Software tuned for the speed of the IBM PC. The switch worked by selecting which would be the reference clock source for the derived CLK line. In platforms with two Crystal Oscillator and Clock Generator pairs, it was rather easy for the 14.31 MHz crystal already in place for the PIT and CGA to also happily derive the 4.77 MHz CLK line in the same way that it was done in the IBM PC, while in the simpler design with one crystal and CG pair, the switch could be used to bypass the clock doubling circuit. Essentially, this means that a Turbo XT platform had two operating modes: An IBM PC clone mode that should in theory behave identically to an IBM PC or PC/XT and thus work with applications bound to the 4.77 MHz 8088 CPU speed (And even allow older expansion cards to work), and a faster IBM PC compatible mode for everything else.
Other Turbo implementations allowed for Software control of the clock speed. It was possible to run an executable file with appropriate parameters to change the clock speed, or a DOS TSR (Terminate and Stay Resident) Driver that could do so by pressing a Key Combination similar to Ctrl + Alt + Del to dynamically switch clock speeds on the fly, even when an application was running. I'm not sure if they called a custom BIOS Service of that Motherboard Firmware, or if they interfaced with the clock source switching circuitry directly; both could be possible. I wouldn't be surprised either if there was a pure Firmware side implementation that could do on the fly clock switching with a Key Combination in the same way as Ctrl + Alt + Del, without wasting Conventional Memory on a TSR Driver. I'm also aware that some platforms had higher granularity and could select between multiple speeds, but I'm not sure what Clock Generation scheme they used to achieve that (It may be possible that they were entirely Software Wait States, too).
The switchable clock scheme uncovered yet another issue: If an application was running when you switched the clock source, chances were that it would crash. This typically didn't happen if you were at the DOS Command Prompt when you switched the Turbo status, but applications like games were not happy, though each piece of Software had its own degree of sensitivity. As the Turbo XT platforms matured, newer designs allowed for better clock switching circuitry that could make dynamic switching as seamless as possible, so that you could turn Turbo on or off while applications were running without getting them to implode. These things that look pathetically simple and stupid are the staple of our modern era: Just consider that currently, the Core of a Processor is constantly jumping from its nominal Frequency to several Turbo States, and soon after can go into a low Power State for power saving purposes, then be clock gated entirely, then jump again into action. If things don't implode as they used to 30 years ago, it is because they learned how to do seamless dynamic clock switching right.
And yet another compatibility issue that was uncovered by running at high Frequencies is that the 8237A DMAC tended to be very problematic. This is not surprising, since they usually were 5 MHz models with a severe overclock. Some Turbo XT platforms had Firmwares that could dynamically switch the CLK line to 4.77 MHz every time that there was a DMA access going on. Assuming that the platform had mastered seamless dynamic clock speed switching, this would have been entirely possible to do, yet it doesn't explain how it was supposed to work with applications that programmed the DMAC directly instead of going through the BIOS Services (An identical situation to anything else that directly interfaced with the Hardware. This includes bypassing the Software I/O Wait States added by BIOS Services for expansion cards, and is the reason why the PIT has to run at exactly 1.19 MHz). I don't really know the details or specific platform implementations, just some comments that seem to point to some Turbo XTs running the DMAC above 4.77 MHz all the time (Probably the 8 MHz ones, since 10 and 12 MHz are simply too much overclock for that chip), while others automatically slowed down to 4.77 MHz when the DMAC was in use, then switched back to normal (This would also have been very useful for all I/O operations, considering that expansion card compatibility was all over the place and that I/O Wait States were not configurable). It is possible that the systems that automatically downclocked to 4.77 MHz were all based on Chipset designs instead of discrete chips like the early Turbo XT platforms. However, it is unfair to directly compare those late Chipset based Turbo XT platforms against the early ones made out of discrete support chips, since they are not at the same maturity level, and Chipsets allowed a platform to do many tricks that would require a lot of custom circuitry to implement with individual support chips.
Very late Turbo XTs were made using Chipsets instead of discrete support chips. I prefer to cover Chipsets after the IBM PC/AT generation since the first ones seem to have targeted PC/AT compatibility, not PC, so chronologically speaking, Chipset based Turbo XT platforms appeared after the PC/AT ones. Chipset based platforms had slightly different Clock Generation schemes, since the Chipset could fulfill the role of a Clock Generator: It took as input the reference clocks of two Crystal Oscillators instead of needing two independent CGs, and derived all the required clock lines on its own. Chipsets could also offer a few more features not found in implementations made out of discrete chips, like a RAM Memory Controller with support for more than 640 KiB RAM installed in the Motherboard itself that was able to remap the excess RAM as Expanded Memory (EMS) or as Shadow RAM. All these things are covered after the IBM PC/AT section, too.
I should make a few honorable mentions of other contemporary PC-like platforms that were partially IBM PC compatible yet weren't of the Turbo XT variety. A few of those were based on the Intel 8086 CPU instead of the 8088, like the Olivetti M24, which was released during 1983 and used an 8086 @ 8 MHz, making it among the first IBM PC compatible computers that could run IBM PC applications faster than the IBM PC itself, before the IBM PC/AT arrived. Using the 8086 required a more complex design in order to interface its 16 Bits Data Bus with 8 Bits support chips and expansion cards, but at that point you were close to the PC/AT design, since IBM had to do the same thing to interface the 16 Bits Data Bus of the Intel 80286 CPU with the same 8 Bits support chips. Moreover, the 8086 had a massive disadvantage: It was not an 8088. You could pick an 8088, make it run @ 4.77 MHz, and get a CPU with identical compatibility and accuracy to the 8088 in the IBM PC, while an 8086 had no real chance to perform identically to an 8088 (The same applies to the NEC V20, but you could swap that for an 8088 without needing another Motherboard). As such, only the 8088 based Turbo XT platforms were theoretically capable of matching the original IBM PC compatibility level while also having the capability to offer more performance in Turbo mode. I don't know specific edge cases where the 8086 had worse compatibility than the 8088, but chances are that, like always, games and copy protection schemes were involved. Still, it didn't make a lot of sense to use more expensive parts to make a less compatible computer. Only after mainstream Software learned to properly behave regardless of the speed of the Processor was there a dramatic increase in x86 Processor variety, but by then, the PC based platforms were almost extinct.
The last fresh breath of the PC platform would be in April 1987, when IBM revived it for the entry level Models 25 and 30 of the PS/2 line, but I prefer to talk about them in their own section since it will be easier to digest after going through the PC/AT and Chipsets first. Ironically, the mentioned PS/2 models are 8086 based, not 8088. As amusing as it sounds, IBM's own PS/2 line was less IBM PC compatible than many of the Turbo XT ones...
The IBM PC platform's true successor was the IBM PC/AT 5170 series, released in August 1984. By then, IBM had already learned from the recent IBM PCjr fiasco that IBM PC compatibility was a serious thing, so it had to make sure that the new computer was as backwards compatible as possible with their previous hit. As chronologically speaking IBM should have been among the first to design a PC compatible platform that was faster than the original PC (I'm aware only of the Olivetti M24 being commercially available before the IBM PC/AT and compatible enough, the rest were just far less compatible PC-likes), IBM should also have noticed early on that doing so was not an easy feat. Meanwhile, Intel also had to go through a failure of its own with the unsuccessful iAPX 432 Processor architecture, which forced it to continue designing x86 based products as a stopgap measure for the time being. Amusingly, for marketing purposes Intel had rebranded all their x86 Processor series with an iAPX prefix, like iAPX 88 for the 8088 or iAPX 286 for the 80286, but eventually Intel reverted them back to the old nomenclature so as to not taint the now successful x86 line with the failure of the iAPX 432.
The PC/AT is historically significant because it is the last common ancestor of all modern x86 platforms. In the original IBM PC era, after the overwhelming difference that the IBM PC huge Software ecosystem made became evident, everyone ended up following IBM de facto platform leadership, which served as a sort of unifying force. This can be clearly seen in how the first generation of non-IBM x86 based computers, the PC-likes, which were similar to the IBM PC but only partially compatible with it, was followed by a second generation of systems, known as clones, that strived to be fully compatible with the IBM PC platform. The PC/AT era was similar, but in reverse: At first, everyone followed IBM leadership with a first generation of computers that were mere PC/AT clones, followed by a second generation where the top IBM PC compatible manufacturers decided to create new things of their own, and a divergence began when those manufacturers implemented exclusive platform features ahead of IBM with no standardization nor cross compatibility whatsoever.
Though many of these features or their specific implementations faded away after failing to achieve mainstream adoption and are now very obscure, some were popular enough to be adopted by everyone else. Yet, base PC/AT compatibility was never compromised, as the new platform features were always added as a sort of superset of the existing PC/AT platform (And, by extension, the PC), requiring to be explicitly enabled by the OS or a Driver. Basically, until UEFI based Motherboard Firmwares became mainstream around 2012, x86 based computers always booted as PC/AT compatible computers with the Processor running in Real Mode (8086/8088 level ISA behavior and features). Everything else was built on top of that.
The IBM PC/AT is a major extension of the PC platform. It introduced a newer x86 Processor, the Intel 80286 CPU, widened the I/O Channel Bus to accommodate it, changed a few support chips and added some more. It also introduced a longer version of the IBM PC expansion slot that exposed the wider I/O Channel Bus in a second section, so that a new type of card could make full use of the PC/AT platform features (Including more IRQs and DMA Channels) while remaining physically backwards compatible with the IBM PC cards. Like the IBM PC, the IBM PC/AT was an open platform, as IBM also released the IBM 5170 Technical Reference (March 1984) with all the goodies.
There were three notable variants of the IBM PC/AT. The original one used the Type 1 Motherboard, with an Intel 80286 CPU @ 6 MHz. It eventually got replaced by the Type 2 Motherboard, which had the same specifications and was functionally identical, but was physically smaller. In April 1986 IBM released the IBM PC/AT Model 319 and Model 339, which used the Type 3 Motherboard, which was the same as the Type 2 but with higher binned parts so that it could run the 286 @ 8 MHz. Finally, there was a little known model, the IBM PC/XT 5162 (Model 286), which is usually considered part of the PC/AT family, since despite being branded as a PC/XT, it was fully based on the PC/AT platform. The PC/XT 5162 used a different Motherboard that was smaller than the PC/AT 5170 Type 2/3 and functionally slightly different, but otherwise PC/AT compatible.
The first major component to be changed in the PC/AT was the Intel 8088 CPU, which got replaced by the Intel 80286. The most visible external change caused by this is that instead of the 8088 8 Bits Data Bus and 20 Bits Address Bus that gives it a 1 MiB (2^20) Physical Memory Address Space, the 80286 had a 16 Bits Data Bus and a 24 Bits Address Bus that gives it a 16 MiB (2^24) Physical Memory Address Space. Moreover, the Buses were not multiplexed into the same Pins anymore, each Bit had its own dedicated line. As such, the new Processor required a major rework of the buffer chips section that separated the Local Bus from the I/O Channel Bus, as it had to be extended to support both its wider Bus and the dedicated lines.
Internally, the 80286 had a plethora of new features, the major ones being the introduction of an integrated MMU (Memory Management Unit), and a new operating mode, Protected Mode, to make use of it. Lesser features included Hardware Task Switching and some new instructions. Ironically, all of these features would be barely used during most of the useful lifetime of the PC/AT. The saving grace of the 80286 was simply that it had a faster execution unit than any of the previous x86 Processors, as it took fewer clock cycles to process the same machine code than the 8088, making it perform much better even if both were running at the same clock speed. This ended up being the 286's biggest, and almost sole, selling point for the average user.
The MMU is a dedicated Coprocessor that is used by OSes that implement Virtual Memory to offload the dynamic translation of Virtual Addresses to Physical Addresses, dramatically reducing the overhead compared to doing so purely in Software (Something that was done experimentally in other platforms before MMUs came into existence. I'm not entirely sure if someone tried a Software only implementation of Virtual Memory for the original IBM PC platform, but good candidates would be the Microsoft Xenix ports for it, which are not to be confused with the ports for the PC/AT). I suppose that you also want to know what Virtual Memory and Virtual Addresses are, and why they are a major feature. Basically, Virtual Memory is the rather old concept of Address Space virtualization, which provides each running application its own private and exclusive Virtual Address Space.
In a platform without support for Virtual Memory or any other type of Memory Protection, like the original IBM PC, the OS and all the applications being run see and share the same Physical Address Space. This essentially means that any code being executed can potentially read or write directly to any address. In that environment, creating an OS with Multitasking capabilities was very hard, since it required that all user applications were aware that not all the visible memory was theirs to use. At most, it would have been possible to create an OS API that cooperating applications could call to limit themselves to use only the memory address ranges that the OS allocated for each of them, yet this would not stop a badly coded application from accidentally (Or intentionally...) trashing the memory contents of another one, or even the memory of Devices that used MMIO. This was the case of PC DOS/MS-DOS: Any application could do whatever it wanted, so it was not practical to transform such a Single Tasking OS into a full fledged Multitasking one (Many attempts were made, like IBM TopView or Digital Research Concurrent DOS 286, but with mediocre results. Yet, in a few more years, everything would change with a feature introduced by the Intel 80386 CPU...).
In an OS that implements Virtual Memory, each application sees only its own, exclusive Virtual Address Space, while the OS, with the help of the MMU, manages where the Virtual Addresses that are in use are really mapped to in the Physical Address Space. As Virtual Memory is absolutely transparent to the applications themselves, the OS has full control of the Memory Management, allowing for easier implementations of Memory Protection and thus Multitasking. It also simplifies the applications themselves, as they can all assume that all usable memory is contiguous, completely forgetting about gaps or holes (Remember the first IBM PC BIOS version supporting noncontiguous memory?). Finally, Virtual Memory also allows for Swap File or Page File implementations, which is when you use a file or partition in standard storage media to hold contents of data that an application believes to be in the system RAM, a technique that was widely used in the old days since RAM was quite expensive, and it was assumed that Multitasking with very low performance was better than no Multitasking. However, this feature was a byproduct of Virtual Memory, and is NOT the reason why it exists in the first place (If you think otherwise, blame Windows 9x for calling the Page File Virtual Memory).
Implementing Virtual Memory in the 286 required a rework of the 8086/8088 Segmented Memory Model scheme, resulting in what I like to call Segmented Virtual Memory. Like the 8086/8088, the 286 was a 16 Bits Processor with a Physical Address Space larger than what the value of a single 16 Bits GPR could address, so it still had to use the Segment and Offset pairs to access an individual address. However, in order to accommodate both the larger 24 Bits Physical Address Space and the Virtual Memory Address Space that was layered on top of it, the Segment and Offset pair had a different meaning in the 286 compared to the way that they worked in the 8086/8088.
The 286 made full use of the 16 Bits of the Segment and Offset pairs to effectively create a compounded 32 Bits value for an individual Virtual Address. From these 32 Bits, 2 were used for the Privilege Level (Part of the Memory Protection scheme, colloquially known as Rings 0, 1, 2 and 3), leaving 30 Bits for addressing, which effectively resulted in a maximum of 1 GiB (2^30) Virtual Memory Address Space for each running application, assuming that each of the 16384 (2^14) Segments used its full 64 KiB (2^16) size. From those 30 Bits, one Bit was used to partition the Virtual Address Space between Global and Local, leaving a 512 MiB space that had the same contents for all applications (Used for addressing of the OS API, common libraries, etc.), and a 512 MiB private space for each individual application. Note that all this applies only to the Memory Address Space; the I/O Address Space stayed with the same 64 KiB (2^16) worth of physical I/O Ports, and actually, was never virtualized nor extended.
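A small C sketch of the Selector layout just described may help (the field breakdown is the documented 286 one: 2 Bits of Privilege Level, 1 Bit to pick the Global or Local table, 13 Bits of Descriptor index; the variable names are mine):

/* How a 286 Protected Mode Segment Selector splits into its fields, and how
   those fields add up to the 1 GiB Virtual Address Space mentioned above. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t selector = 0x1234;                /* arbitrary example value */
    unsigned rpl   =  selector       & 0x3;    /* Requested Privilege Level (Rings 0-3) */
    unsigned table = (selector >> 2) & 0x1;    /* 0 = Global (GDT), 1 = Local (LDT) */
    unsigned index = (selector >> 3) & 0x1FFF; /* which of the 8192 Descriptors per table */

    printf("RPL=%u, table=%s, index=%u\n", rpl, table ? "LDT" : "GDT", index);

    /* Total virtual space: 2 tables x 8192 Segments x 64 KiB = 1 GiB */
    unsigned long long total = 2ULL * 8192ULL * 65536ULL;
    printf("Maximum Virtual Address Space: %llu MiB\n", total >> 20);
    return 0;
}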
Virtual Memory incurs a RAM overhead, since there has to be some physical place to store the data about which Virtual Addresses are mapped to which Physical Addresses. It is impractical for the mapping data to have individual address level granularity, since it would mean that for each Virtual Address that is mapped to a Physical Address, you would be spending no less than 7 Bytes (32 Bits from the Virtual Address itself and its Privilege Level plus 24 Bits of the Physical Address) to hold mapping information about something that is actually worth a single Byte. The only way to implement Virtual Memory in an efficient manner is by mapping chunks of address ranges, like the Segments themselves. Data structures known as Segment Descriptor Tables, located somewhere in the system RAM, did exactly that, effectively being used as the workspace of the MMU. Each Segment Descriptor in those tables was 8 Bytes in size and contained the mapping of an entire Segment, which could vary in size from 1 Byte to 64 KiB, so the Virtual Memory overhead wasn't that much in terms of raw RAM if using adequate granularity. The whole Virtual-to-Physical translation added extra complexity that incurred some overhead even when having a MMU to offload it, however.
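For illustration, this is roughly how one of those 8 Bytes Segment Descriptors looks when expressed as a C structure (field names are mine; the last Word was simply reserved on the 286 and only gained meaning with the 80386):

/* C view of a single 286 Segment Descriptor, as stored in the Descriptor
   Tables that the MMU walks. 8 Bytes of mapping data describe up to 64 KiB
   of memory, which is why the RAM overhead was acceptable. */
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint16_t limit;     /* Segment size minus 1, up to 64 KiB */
    uint16_t base_low;  /* Bits 0-15 of the 24 Bits Physical base address */
    uint8_t  base_high; /* Bits 16-23 of the base address */
    uint8_t  access;    /* Present Bit, Privilege Level, Segment type */
    uint16_t reserved;  /* Must be zero on the 80286 */
} segment_descriptor_286;
#pragma pack(pop)

_Static_assert(sizeof(segment_descriptor_286) == 8, "a Descriptor must be 8 Bytes");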
The other major feature of the 80286 was the introduction of Protected Mode. As the new meaning of the Segment and Offset pair changed a very basic behavior of how addressing worked in the x86 ISA, all preexisting code would be pointing to incorrect addresses. As such, in order to remain Binary Compatible with executable code intended for the addressing style of the 8086/8088 and 80186/80188 CPUs, most of the 286 new functionality required to be explicitly enabled by setting the CPU to operate in the new Protected Mode. The 286 itself started in a mode now known as Real Mode, where the Segment and Offset pair (And several more things, but not all) behaved like in the previously mentioned older Processors. Switching the 286 to Protected Mode was a prerequisite for using both the 16 MiB Physical Address Space and the MMU, otherwise, the 80286 would behave just like a faster 8086 and barely anything else. When in Protected Mode, all addressing was processed by the MMU Segmentation Unit even if not using Virtual Memory, Memory Protection or such, causing some performance overhead when compared to Real Mode.
Though PC DOS/MS-DOS itself didn't use Protected Mode, DOS applications had full control of the computer and could do so if they wanted, something that made sense for any application that was constrained by the 640 KiB Conventional Memory barrier. There was a problem with this approach, which is that the BIOS Services (Including those added by expansion card Option ROMs) and the DOS System Calls couldn't be used from within Protected Mode, as all existing code assumed Real Mode addressing style. That is where a major limitation of the 286 Protected Mode became very noticeable: There was no way for Software to return to Real Mode after enabling Protected Mode. Intel's idea was that a new generation of OSes and applications would rely exclusively on the new feature, not for it to be used as a mere extension that a lone application could enter and exit at will.
As the primitive PC DOS was the most common OS in use and there was no mainstream Protected Mode successor on the horizon, not being able to return to Real Mode on demand to use the DOS API was a rather harsh limitation. Without proper OS support, any standalone application that wanted to use Protected Mode would have to reimplement everything on its own, massively bloating the application and making the development process far more complex. As such, commercial DOS applications that used the pure form of the 286 Protected Mode should be very rare; I'm not even aware if one existed at all. The only example that I know of that used the pure 286 Protected Mode in the IBM PC/AT as Intel envisioned is an OS and all its native applications: The UNIX based IBM PC XENIX, which IBM licensed from Microsoft and released less than a year after the PC/AT. However, Xenix was always aimed at the high end OS market; neither it nor any other Xenix variant ever competed directly against PC DOS (Albeit one was supposedly planned at some point), which reigned supreme among average users. One can wonder how different history would be if IBM had decided to push Xenix instead of PC DOS back when it had undisputed control of the PC platform...
Among the minor features of the 80286, a rather interesting yet mostly unknown one is Hardware Task Switching, something that Intel should have included expecting it to be useful for Multitasking OSes. Using this feature had a few disadvantages, the most surprising one being that doing Task Switching using the built-in Hardware support was usually slower than doing Context Switching fully in Software. As such, the Hardware Task Switching function of x86 Processors ended up being a completely unused feature. Actually, while it is still supported by modern x86 Processors for backwards compatibility purposes, it can't be used while running in Long Mode (64 Bits), effectively making this feature obsolete.
The other minor feature is that the 80286 added some new instructions to the x86 ISA, albeit most of them deal with Protected Mode Segmentation and thus were not very useful. Also, as the introduction of new instructions works mostly in a cumulative way, since a new Processor typically has all the instructions of the previous ones based on the same ISA, the 80286 included the instructions previously introduced by the 80186, too. Intel completely ignored the NEC V20/V30 custom instructions, making them exclusive to NEC's x86 ISA extension and thus short lived.
Deploying the Intel 80286 CPU required a few new support chips that were superseding versions of the 8086 ones. The 286 was intended to use the Intel 82288 Bus Controller instead of the 8288 (Furthermore, there was no Minimum Mode that I'm aware of like with the 8086/8088, so you always required the Bus Controller), and the Intel 82284 Clock Generator instead of the 8284A. On the Coprocessor front, the 80287 FPU succeeded the 8087. However, at its core, the 80287 was an 8087 and performed like it, Intel just upgraded the external Bus interface to make it easy to wire to the new 286 CPU but otherwise didn't improve the actual FPU. Interestingly, the 80287 had pretty much no formal external Address Bus, just a Data Bus. A successor to the 8089 IOP was never developed, making that Coprocessor ISA a dead end.
If there is a single thing that the 80286 did over the 8088 that actually mattered when the IBM PC/AT was launched, it was simply being much faster than it, even when executing code that only used the features of the basic 8086 ISA, instantaneously bringing tangible performance benefits to all the preexisting IBM PC applications (Keep in mind that when the IBM PC/AT was released, there were already a few PC-likes that were faster than the IBM PC and could run many applications, but compatibility was usually mediocre. And the PC/AT 80286 CPU @ 6 MHz was quite a bit faster anyway). What made the 80286 perform better is that it took significantly fewer clock cycles to process the same machine code (Opcodes) than the 8088, making the 80286 able to execute more instructions in the same time frame and thus perform faster even if it was running at the same clock speed as the 8088. This is also the case of the NEC V20: It was slightly faster than the 8088 because it also took fewer clock cycles to process the same machine code, but the difference was rather small and pales in comparison to how much faster the 80286 was.
In general, there are only two viable ways to increase compute performance for existing Software: The first is to run the same Processor at faster clock speeds, which is what the Turbo XT platforms did with their 8088s running at 8 or 10 MHz. The other is to design a Processor that can execute the same machine code in a more efficient manner, thus making it faster, like the Intel 80286 and the NEC V20 did. If you have ever heard the term IPC (Instructions Per Cycle) before, what it describes is precisely these performance differences at the same clock speed. There is a sort of balance between the concepts of clock speed and efficiency, since the more efficient the Processor is (Where more efficient usually means more complex), the harder it is to make that design clock higher, so the idea is to get to a sweet spot where the complexity and possible attainable clock speed give the best overall performance. From the multitude of things that can increase a Processor's efficiency, one that early on was done very often was to optimize the Processor execution unit so that processing opcodes took fewer clock cycles.
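The relationship between the two levers can be condensed into a single back-of-the-envelope formula: execution time = instruction count x average cycles per instruction (CPI, the inverse of IPC) / clock speed. The C sketch below uses made-up CPI values purely to illustrate the tradeoff; they are not measured 8088 or 80286 figures.

/* Toy model of the two performance levers: raise the clock, or lower the
   average CPI. The CPI numbers are illustrative placeholders only. */
#include <stdio.h>

static double seconds(double instructions, double cpi, double mhz)
{
    return (instructions * cpi) / (mhz * 1e6);
}

int main(void)
{
    double work = 1e6; /* one million instructions of some hypothetical program */

    printf("Baseline CPU, low clock   : %.3f s\n", seconds(work, 12.0, 4.77));
    printf("Same CPU, higher clock    : %.3f s\n", seconds(work, 12.0, 8.00)); /* Turbo XT route */
    printf("Better execution unit     : %.3f s\n", seconds(work,  6.0, 6.00)); /* 286-like route */
    return 0;
}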
The amount of clock cycles that it takes for the Processor to process an instruction isn't a uniform fixed value; instead, it depends on the machine code that it produces (An instruction can produce one or multiple opcodes) and the context of the operation itself. In the old days, the execution latency of an instruction in a specific context was pretty much a known fixed constant value, as the Processors were rather simple and the Memory subsystem predictable. In the case of modern Processors, it takes a lot of work to get the exact execution latency of an instruction because there is a ridiculous amount of variables at play (Multiple Cache memory levels, Branch Prediction Units, multiple parallel execution units, decoding instructions into MicroOps, OoOE (Out of Order Execution), MacroOp Fusion, resource sharing due to Simultaneous MultiThreading, variable Memory subsystem latency, etc.) that makes for an almost infinite number of contexts, so at most, you may get an average value that speaks more about the efficiency of the Processor as a whole than of the execution unit as an isolated entity. As such, theorycrafting what the maximum possible performance of a modern Processor is can be much harder, and even more so attempting to get close to it.
Since the execution latency of each instruction and context is different, optimizations to the execution unit do not produce uniform IPC increases; instead, the apparent IPC would greatly depend on the instructions most used by the code being processed. For example, the NEC V20 could be several times faster in some operations compared to the 8088, and maybe it is possible that if running some specialized code that exploited its strengths (Without using the V20 custom instructions, so that the same code could also run on the 8088 and 80286), it could come close to 80286 level performance, but that would never happen in real world usage scenarios where it just averaged around 5% more performance. Precisely due to this reason, IPC is a rather ambiguous and overgeneralized term that can only be meaningful as a sort of average value when running code from real applications, as some handcrafted tech demos may give rather misleading results. With all factors taken into account, in the average IPC scale, the Intel 8088 comes last, followed by the NEC V20, the Intel 8086 (Due to its wider Data Bus), the NEC V30 (Like the V20, but for the 8086 instead of the 8088), then finally the 80286 on top of them all, with a significant lead margin.
Perhaps by now, you should have already figured out that, due to the fact that IPC increases are not uniform, it is impossible for the 80286 to be cycle accurate with the 8088, no matter at what clock speed you run it. Whereas in a Turbo XT you could underclock a faster binned 8088 (Not a V20) to 4.77 MHz and get identical results to an IBM PC or PC/XT, at no clock speed would the 80286 perform exactly identically to the 8088. This means that the IBM PC/AT, by design, was unable to be 100% IBM PC compatible in the most strict sense of that definition.
When designing a new Processor, the designers have to face decisions about performance tradeoffs, since they can prioritize improving the execution unit to process the opcodes produced by certain types of instructions faster, but maybe at the cost of not being able to do so for another type of instruction, or worse, they could be forced to make those slower compared to their predecessor. In the case of the 80286, even if its IPC increases were not uniform, it was still faster than the 8088 in pretty much any context. Obviously, an application could be optimized for the 80286 so that it did the same thing but using the most favourable instructions, albeit that code would be suboptimal if it ran on an 8088. However, the execution units of later x86 Processors like the 80386 and 80486 evolved in a radically different way, as they focused on processing the most commonly used instructions as fast as possible, whereas the less common ones actually tended to perform worse in each successive generation. This ended up causing some issues for the Software developers of the era, since they had to decide which Processor they wanted to optimize for, almost always to the detriment of another one.
Back in the day, optimizing Software was much harder than today. The compilers of the era were quite dumb and couldn't exploit all the potential of a CISC ISA like that of the x86 Processors, so for anything that required speed, optimization relied on the use of handcrafted x86 Assembler code. At a time when Hardware resources were quite limited, the difference between compiled C code and optimized ASM could be rather brutal, so it was important to get a skilled x86 Assembler programmer to do at least the most compute intensive parts of the application. ASM optimizations could not only make the difference between a real time application like a game being fluid or being a slideshow, they could also be massively useful in productivity applications like Lotus 1-2-3, which was much faster than its rivals thanks to being written entirely in ASM (Among other tricks, like using the Video Card directly, skipping the BIOS Services).
As these code optimizations had to be done manually, optimizing for multiple Processors meant that the Assembler programmer had to know the idiosyncrasies of each of those, and would also have to do the job two or more times, once per target, something that was expensive in both cost and time. Moreover, including multiple code paths (And a way to detect which Processor the computer had, at a time when figuring that out was not standardized) or providing different executables could increase the application size, something not desirable as Diskette capacities were very small. The end result is that Software developers typically chose a Processor to serve as the lowest common denominator that the application was expected to run on, then optimized only for it, letting clock speed increases handle the potentially lower relative IPC in later Processors. For example, a programmer during the late 80's could have had two choices: To optimize the application targeting an IBM PC/AT with an 80286 @ 6 MHz, guaranteeing that on a newer 80386 @ 16 MHz it would be much faster anyway due to the much higher clock speed, or to optimize for the 80386 to perform even better on it, thus giving some headroom to add more features, but maybe at the cost of being just too slow to be usable on the older 80286.
During the next three decades, both Processors and compilers evolved a lot. The evolution of the x86 execution units, with two notable exceptions, ended up being a sort of feedback loop: Software developers had to optimize their code to use whatever the mainstream Processors available at a given moment did better, then the next generation of Processors focused on doing the same thing even better than the previous one, since that accounted for the majority of code already found in the commercial applications relevant during that era. As such, modern x86 Processors, in general, benefit roughly from most of the same optimizations. This also caused an implied chicken-and-egg scenario: Designing a vastly different execution unit without being able to push the entire Software ecosystem to optimize for what the new Processor did better would just give counterproductive performance results that made such Processors look bad. That is the case of the mentioned two exceptions, as they were Processors that had extremely different execution units compared to other contemporaneous ones, and only shined in very specific circumstances. These two were the relatively recent AMD Bulldozer (FX) architecture, which was quite different when compared to the previous AMD K10/K10.5 (Phenom/Phenom II), Intel Conroe (Core 2 Duo) and the contemporaneous Intel Nehalem (Core i7 1st generation), and a much more famous one, the Intel NetBurst (Pentium 4) architecture, which was radically different when compared to the Intel P6 (Pentium Pro/2/3/M) or AMD K7 (Athlon/Athlon XP).
Meanwhile, compilers matured enough to almost nullify the gap with handcrafted Assembler, and also learned how to compile code optimized for each Processor architecture. Currently, a programmer only has to focus on writing code that follows the general optimization guidelines, then simply offload the Processor architecture optimization job to the compiler, which can produce many codepaths or different executable files optimized for each Processor. Actually, modern compilers are so good that they can produce faster executables than what an average ASM programmer can do, since the compiler can implement a whole bunch of optimizations relevant to modern Processors that are quite complex for humans to do manually. It takes an expert Assembler programmer to attempt to beat a modern compiler, and only in very specific niches will there be substantial performance differences that make the effort worth it.
The previous sections cover the basic features of the Intel 80286 CPU, yet that pales in comparison to the amount of custom circuitry that IBM implemented in the PC/AT to work around some of the 286 shortcomings. All of these would forever influence how the x86 architecture evolved, and not in a good way...
Maybe the biggest issue with the 286 Protected Mode is that you were not supposed to be able to return to Real Mode after enabling it, being barred from using the BIOS Services or the DOS API. However, someone discovered a workaround that made doing so possible. The workaround consisted in sending a reset signal to the 80286 while preserving the computer state and RAM contents by keeping everything powered on. When the Processor was reset, it would restart in Real Mode, then, as you already know, it would begin to execute code loaded from a known fixed address location, which is where the Firmware is mapped to. With a specialized Firmware, it was possible to create an interface where an application that wanted to use Real Mode could save to a predefined RAM address data about the last Processor state and the next address location where it wanted to continue executing code, right before triggering the CPU reset. After the reset, the Processor would begin code execution by loading the Firmware, which, in turn, would check very early the contents of the predefined RAM address to figure out if it was a normal boot or if instead it was an intentional CPU reset. If there was valid data in that predefined RAM address, it could load the previous state and directly resume code execution at the next specified location, completely bypassing the standard POST and Hardware initialization procedures. This way, it was possible for an application running in the 286 Protected Mode to return to Real Mode. It was a massive hack, but it worked. The method required a lot of preparation to use since, as you also already know, in Real Mode you're limited to a 1 MiB Physical Address Space, which means that the code that would be executed in Real Mode had to be found in that addressable MiB. The most fun part about this 286 reset hack is that we have a rather similar interface in our modern days, in the form of the ACPI S3 Low Power State, which can shut down then restart most of the computer after saving its state, while keeping the system RAM powered so it can resume where it left off.
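A very rough sketch of the caller's side of the reset hack looks like this. It is in C only for readability; the CMOS shutdown status Byte and the 0040:0067 resume vector are how I understand the PC/AT Firmware convention, and outb() plus the other helpers are hypothetical stand-ins, so treat it as an outline rather than a recipe:

/* Outline of how an application requested a resume-after-reset instead of a
   full cold boot. Exact shutdown code values varied; 0x0A stands here for
   one of the "jump to the saved vector without full POST" codes. */
#include <stdint.h>

extern void outb(uint16_t port, uint8_t value);          /* hypothetical helper */
extern void store_far_pointer(uint32_t addr, void *ptr); /* hypothetical helper */
extern void cpu_reset_via_8042(void);                    /* sketched after the next paragraph */

void return_to_real_mode(void (*resume)(void))
{
    /* 1. Tell the Firmware that the next reset is intentional, not a cold boot */
    outb(0x70, 0x0F);  /* select the CMOS shutdown status Byte */
    outb(0x71, 0x0A);  /* "resume via the saved vector, skip POST" */

    /* 2. Leave the resume address where the Firmware will look for it
          (the BIOS Data Area vector at 0040:0067, linear 0x0467) */
    store_far_pointer(0x0467, (void *)resume);

    /* 3. Save whatever Processor state must survive, then pull the trigger */
    cpu_reset_via_8042();
    /* Execution continues at resume(), now back in Real Mode */
}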
IBM seems not to have agreed with Intel's strict vision of Protected Mode, since it implemented the described 286 reset hack in the IBM PC/AT. The way it worked was by wiring a GPIO Pin from the new Intel 8042 Microcontroller (Whose main purpose was to replace the IBM PC Intel 8255 PPI as Keyboard Controller), known as P20, to the RESET line of the Processor. Software could then operate this line via an I/O Port. An application that entered Protected Mode could use the 286 reset hack to return to Real Mode, do whatever it wanted to do with the BIOS Services and the DOS API, then enter Protected Mode again. The PC/AT BIOS itself used the 286 reset hack during POST so that it could use Protected Mode to test all the physical RAM, then return to Real Mode to boot legacy OSes like PC DOS as usual. Sadly, while the PC/AT 286 reset hack was functional, it was rather slow. An application that had to use it often to invoke the BIOS Services or DOS System Calls would incur a massive performance penalty, something that made reliance on the 286 reset hack rather undesirable. Moreover, how much Conventional Memory was free was still important for applications that used both Real Mode and Protected Mode since, as mentioned before, the code and data that relied on Real Mode had to fit in the first addressable MiB (Or more specifically, the 640 KiB of Conventional Memory). Basically, the reset hack made it possible for DOS applications to use more RAM, but via a dramatically complex method.
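For completeness, this is the usual way in which Software pulsed the P20 line through the 8042, again as a hedged C sketch with hypothetical inb()/outb() helpers (command 0xFE on Port 0x64 is the "pulse output line" request; the input buffer polling is simplified):

/* Minimal sketch of asking the 8042 to pulse the Processor RESET line. */
#include <stdint.h>

extern uint8_t inb(uint16_t port);                 /* hypothetical helpers */
extern void    outb(uint16_t port, uint8_t value);

void cpu_reset_via_8042(void)
{
    while (inb(0x64) & 0x02)
        ;                     /* wait until the 8042 input buffer is empty */
    outb(0x64, 0xFE);         /* pulse the RESET line wired to P20 */
    for (;;)
        ;                     /* never reached: the CPU restarts in Real Mode */
}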
Less than a year after the IBM PC/AT was released, a Microsoft developer discovered and patented an alternative way to reset the 80286 CPU that was faster than using the Intel 8042 Microcontroller to initiate the reset hack. The new method consisted of basically the same things that had to be done to preserve and resume the computer state with the previously described reset hack, but instead of resetting the Processor via external means, the reset was achieved by performing a controlled CPU Triple Fault, which was much faster and didn't require additional Hardware at all. This method would eventually be prominently used in the OS/2 Operating System that was jointly developed by IBM and Microsoft, and released at the end of 1987. It was also used in some Kernel versions of Windows intended for the 286, like Windows 3.1 running in Standard Mode.
The CPU Triple Fault is one of the reasons why it was quite hard to emulate or virtualize OS/2, as not all emulators or VMMs/Hypervisors knew how to deal properly with such an event. Due to the fact that a patent was involved, I doubt that there was any other major commercial Software that triggered a CPU Triple Fault to return to Real Mode, since using it may have been opening the door to a lawsuit. Moreover, any Software targeting the Intel 80386 or later CPUs could switch back and forth between Modes by simply changing a Bit, as Intel ended up making the return to Real Mode from Protected Mode an official feature of the x86 ISA. It can be said that both the reset hack via the 8042 Microcontroller and the CPU Triple Fault are pretty much 80286 specific, and thus, very obscure, but they should have been supported for compatibility purposes for a long time, and perhaps they still are.
The other major issue of the 80286 CPU involves backwards compatibility. Though in theory the 80286 was fully backwards compatible with the previous x86 Processors thanks to Real Mode, its behavior wasn't 100% identical to any of them. Both backward and forward compatibility is usually tested and guaranteed only for properly defined and documented behavior, yet sometimes programmers decide to exploit quirks, bugs (Formally known as errata), or undocumented instructions (More on that later...) that are not part of the standardized ISA of a particular Processor series. As such, applications that rely on any of those may fail to work when executed on later Processors that don't behave as expected, as the Processor designer is under no obligation to maintain support for nonstandard or unintentional behavior.
In this case, the 8086/8088 CPUs had a very particular quirk: While thanks to the 20 Bits Address Bus they had a 1 MiB Physical Address Space, the way that their Segmented Memory Model worked made it possible for these Processors to internally address a range of almost 64 KiB above the 1 MiB limit, from 1024 to 1087 KiB. Because the 8086/8088 didn't have the 21 Address lines that would be required to externally express that in binary, the missing upper Bit caused any address that the Processor thought was in the 1024 to 1087 KiB range to appear on the external Address Bus as if it was in the 0 to 63 KiB range. This quirk would become known as Address Wraparound. The 286 in Real Mode, instead, considered that range entirely normal and used 21 Address lines. Basically, the 8088 and the 80286 would send via the Address Bus two different addresses for any operation within the mentioned address range, obviously getting two different results. For some reason, a few IBM PC applications relied on the Address Wraparound quirk instead of addressing the 0 to 63 KiB range directly. As the 286 didn't reproduce that behavior even when running in Real Mode, those applications would run fine in the IBM PC with an 8088, but fail if using the newer 80286.
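A worked example makes the quirk obvious. FFFF:0010 is the first address at the 1 MiB boundary; the 8088 has only 20 Address lines to express it, while the 286 has an A20 line (illustrative C, my own):

/* The same Segment:Offset pair seen through 20 versus 21 Address lines. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t segment = 0xFFFF, offset = 0x0010;
    uint32_t linear  = ((uint32_t)segment << 4) + offset; /* 0x100000, exactly 1 MiB */

    uint32_t on_8088  = linear & 0xFFFFF;  /* 20 Address lines: wraps to 0x00000 */
    uint32_t on_80286 = linear & 0x1FFFFF; /* A20 present: stays at 0x100000 */

    printf("FFFF:0010 -> 8088:  0x%05X\n", on_8088);
    printf("FFFF:0010 -> 80286: 0x%06X\n", on_80286);
    return 0;
}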
IBM once again stepped in with a workaround for this 286 shortcoming. The PC/AT added some discrete logic infamously known as the A20 Gate, which managed the 80286 A20 Address line (The 21st line of the CPU Address Bus, counting from A0). The A20 Gate was wired to another spare Pin of the Intel 8042 Microcontroller, known as P21, so that Software could control it via an I/O Port, like the previously mentioned 286 reset hack to exit Protected Mode. The default state of the A20 Gate was to force the 80286 A20 Address line to always return 0, so that it could reproduce the 8086/8088 Address Wraparound quirk while in Real Mode, and it could also be configured to let it function as normal for when in Protected Mode. Thanks to this external help, the 80286 achieved better backwards compatibility with the previous x86 Processors than if it was used as a standalone CPU. Something worth mentioning is that this hack wasn't specific to the IBM PC/AT and clones or compatibles: The NEC PC-98 based computers that used the 80286 CPU also implemented their own version of the A20 Gate hack (Albeit it was managed from a different I/O Port, so like everything in the NEC PC-98, it was IBM PC like but not IBM PC compatible), and it isn't hard to guess that it was for the exact same reason.
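The classic sequence to flip the A20 Gate through the 8042 looked roughly like this (hedged C sketch with hypothetical inb()/outb() helpers; real A20 Handlers added more robust status polling and often verified the result by testing for wraparound afterwards):

/* Sketch of toggling the A20 Gate through the 8042 Output Port (the P21 Pin). */
#include <stdint.h>
#include <stdbool.h>

extern uint8_t inb(uint16_t port);                 /* hypothetical helpers */
extern void    outb(uint16_t port, uint8_t value);

void set_a20_gate(bool enable)
{
    while (inb(0x64) & 0x02)
        ;                              /* wait for the 8042 input buffer */
    outb(0x64, 0xD1);                  /* command: write the 8042 Output Port */

    while (inb(0x64) & 0x02)
        ;
    outb(0x60, enable ? 0xDF : 0xDD);  /* Bit 1 = A20 line enabled or forced to 0 */
}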
The OS/2 Museum site did a long series of articles hunting for IBM PC applications that relied on the 8088 Address Wraparound quirk, identifying the CP/M compatible CALL 5 System Call implemented in PC DOS/MS-DOS and any application that used it, like MicroPro WordStar, and anything compiled with IBM Pascal 1.0. There were also post-IBM PC/AT offenders found, like the spellchecker tool of early versions of Microsoft Word, as it used the CP/M CALL 5 interface (This one was actually discovered by PCjs), and any executable compressed with Microsoft EXEPACK. However, being released after the IBM PC/AT, these didn't influence its design at all and thus are irrelevant here, yet after users and developers forgot the intricate details of the A20 Gate, it was possible for some applications to fail for no explainable reason because not everyone was taking the A20 Gate state into account.
A major omission of the A20 Gate hack is that IBM didn't implement a BIOS Service to standardize how to manage the A20 Gate for things like enabling it, disabling it, or asking about its current status (It was possible for this to be a classic BIOS Service, since it could be called from Real Mode both before entering and after exiting Protected Mode). This would become a problem several years later, as all PC/AT compatibles had something that could do what the A20 Gate did, but not everyone managed it in a PC/AT compatible way, thus, with the lack of a BIOS Service, there was no last resort HAL to manage the A20 Gate. Later Software Drivers whose purpose was to manage the A20 Gate, known as A20 Handlers, had to become unnecessarily bloated and complex because they had to support all the different custom implementations. Further adding to the complexity is that back then there was no standardized function to ask the system which computer model it was, so the only way for an A20 Handler to automatically get things working was by using a sequence of tests that tried to find the idiosyncrasies of a specific computer model before it could decide which code path could be used to manage the A20 Gate in that particular system (Alternatively, an advanced user could manually configure the A20 Handler if it allowed the user to do so).
While the A20 Gate was intended to be used to maintain full backwards compatibility with the IBM PC 8088 behavior when the 80286 was in Real Mode, it could be managed completely independently of the Processor mode. Several years after the IBM PC/AT debut, OSes like Windows/286 and PC DOS 5.0/MS-DOS 5.0 would include A20 Handlers that intentionally made use of the standard behavior of the 80286 A20 line while in Real Mode to get access to the extra 64 KiB memory range, assuming that there was RAM mapped there for it to be useful. The 1024 to 1087 KiB Address Range would become known as the HMA (High Memory Area). The HMA was useful because the memory there could be addressed directly from within Real Mode, without needing to go through Protected Mode at all. In the particular case of PC DOS/MS-DOS, if the HMA was available, they could move data from the OS itself that traditionally used valuable Conventional Memory into it, and since this was done in a completely transparent way, as the OS took care of enabling and disabling the A20 Gate every time that it had to access its own data located in the HMA (I suppose at a performance penalty due to the extra overhead), any regular Real Mode application could benefit from the extra free Conventional Memory without needing to directly support the HMA or anything else.
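The arithmetic behind the HMA is the same wraparound arithmetic as before, just used deliberately: with the A20 Gate enabled, Real Mode Segment 0xFFFF reaches almost 64 KiB past the 1 MiB mark (illustrative C):

/* Address range reachable from Real Mode Segment 0xFFFF with A20 enabled. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t first = ((uint32_t)0xFFFF << 4) + 0x0010; /* 0x100000, exactly 1 MiB */
    uint32_t last  = ((uint32_t)0xFFFF << 4) + 0xFFFF; /* 0x10FFEF */

    printf("HMA range: 0x%06X - 0x%06X (%u Bytes usable)\n",
           first, last, (unsigned)(last - first + 1));  /* 65520 Bytes, just shy of 64 KiB */
    return 0;
}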
A thing that still puzzles me is that I don't recall ever hearing about how upgrade cards for the IBM PC were supposed to deal with the Address Wraparound quirk, as any 80286 or 80386 based card (Like the Intel InBoard 386/PC) should have faced the exact same Processor related problems as the PC/AT. It was certainly possible to include the A20 Gate functionality in the upgrade card itself, and even to map the A20 Gate controls to the same I/O Port as in the IBM PC/AT for compatibility with any Software that managed it. Sadly, I don't know details about specific implementations. I would not be surprised if in some cases, even if a manageable A20 Gate was present, the implementation was not compatible with that of the IBM PC/AT and thus failed to run with some Software. In a worst case scenario, the A20 Gate could either use a Jumper or be completely hardwired, which would make dynamic switching via Software impossible. The only thing that I'm certain of is that I doubt that the upgrade cards used an Intel 8042 Microcontroller to manage the A20 Gate, since it would have been expensive and pointlessly overkill to include it just for that, so if upgrade cards did include the A20 Gate hack, they likely used something else to control it. I'm not sure how they were supposed to deal with the 286 reset hack, either, albeit the Triple Fault method should still have been possible.
By this point, you get the idea that the story of the 80286 CPU is about an incomplete Processor whose shortcomings had to be worked around with hacks. Keep in mind that the 80286 was released in 1982, one year after the IBM PC and two before the PC/AT, so it should have been designed well before Intel had any chance to realize what its long term impact could be. The sad truth is that Intel didn't have a solid plan for the evolution of the x86 architecture; the whole product line was supposed to be an afterthought, a Plan B of sorts that acted as a filler while Intel finished what it believed to be its next flagship product, the iAPX 432 Processor architecture. However, when the iAPX 432 finally became available, it catastrophically failed in the market. As such, Intel's vision of the world would completely change when it became aware that it relied on the newfound success of its x86 Processor line to stay relevant. Basically, the main reason why Intel began to take x86 seriously was that at that moment it had no other viable alternative, and it had the success of the IBM PC and PC/AT platforms to thank for that opportunity.
When the 80286 was deployed in new computer platforms like the IBM PC/AT, Intel should have quickly noticed that some of its big customers of x86 Processors (At least IBM and NEC, not sure how many others did the same) were going out of their way to make the new 286 do things that it wasn't supposed to do, be it to improve backwards compatibility with the previous x86 Processors (The A20 Gate hack), or to add features that Intel didn't care about (The 286 reset hack to return to Real Mode from Protected Mode). Around that time, Intel must have realized that the success of the x86 architecture depended on it being able to meet the future needs of any successors of the current platforms, and most importantly, of the Software that would run on them, which by definition included all the preexisting ones. Having learned from its mistakes, Intel did a much better job when it designed the 80386, which fixed most of the 286 weak points and synchronized the evolution of the x86 architecture with the needs of the Software. The 386 would not only be an absolute success as a Processor, it would also be good enough to solidify the basics of the x86 ISA for almost two decades thanks to the much better long term planning. However, by then it was already too late, as the dark ages of the 80286 had already done a lot of irreparable damage to the x86 ISA as a whole. The legacy of the 80286 CPU is the cause of both broken backwards compatibility and broken forward compatibility, all of which had to be taken into account in future generations of x86 Processors.
The issues of the 80286 were caused not only by Intel's own shortsightedness when designing it; the idiosyncrasies of the Software developers of the era had a lot to do with the prevalence of those issues, too. Back in the early days, Software developers weren't shy about using, even in commercial Software, undocumented instructions (LOADALL, more about that one soon), quirks (The already covered 8086/8088 Address Wraparound quirk, the reason why the A20 Gate exists), Bits marked as reserved in the Processor Registers or other types of data structures (The IBM PC XENIX 1.0 made by Microsoft is a good example of this one), or any other function of a Processor that was not part of its standard ISA. How developers found out about all these things is a rather interesting topic in itself, yet the point is that after a particular discovery had been made public, there were high chances that its use would proliferate. If another developer thought that the discovered nonstandard behavior was somehow useful, they could carelessly use it in their applications, completely disregarding that there was no guarantee that the behavior would be the same in the next generation of Processors. Thus, if a newer Processor didn't implement whatever nonstandard behavior or quirk a previous one had, there was a high risk that as soon as someone discovered that a popular application didn't work with the new Processor, the selling point of backwards compatibility would go down the drain, with all the economic consequences associated with having a less sellable product. This was precisely the case of the 80286 and the Address Wraparound quirk, and why IBM did its own effort to work around it.
Many Software developers were at fault regarding the compatibility issues, since abusing unstandardized behavior just because it worked fine at that point in time made a clean forward compatibility strategy outright impossible to implement for both Intel and IBM. At the time when there were only IBM branded PCs with 8088s or PC/ATs with 80286s and direct clones of them, being bound to their idiosyncrasies didn't really matter, as all the computers based on these platforms were functionally identical to each other (If one wasn't, then it was either a bad clone or a mere PC-like), but as the parts used to build IBM PC or PC/AT compatible computers began to massively diversify, applications that relied on nonstandard Hardware behavior became a major compatibility problem as the subtle differences in "100% compatible" parts were discovered (An interesting example is IBM's own CGA Video Card, as not all CRTC chips that could be used for it behaved exactly the same as the Motorola MC6845). It is as if no developer considered that there could be newer x86 Processors that didn't include the undocumented functions or quirks of the older ones, and thus would not behave in the same way as expected on an 8088, 80286, or whatever other Processor the unintended behavior was first seen on. If this situation sounds familiar, it is because it is pretty much identical to what happened with applications (Mainly games) that were tuned for the exact performance of the 8088 @ 4.77 MHz of the IBM PC, as described in the Turbo XT section.
All these issues were not exclusive to the Software ecosystem of the IBM PC and PC/AT platforms; they also happened on other contemporaneous platforms. For example, the Apple Macintosh platform had a major transition period when the Motorola 68000 CPU line extended its addressing capabilities from 24 to 32 Bits. The Motorola 68000 CPU that the early Macintoshes used had Address Registers 32 Bits in size, but only 24 of those Bits were computed, as the Processor ignored the upper 8. Early Macintosh Software stored flags in these unused 8 Bits (Quite similar to what IBM PC XENIX 1.0 did with the reserved Bits of the Descriptor Tables of the 80286 MMU). The newer Motorola 68020 extended addressing to 32 Bits by actually computing the upper 8 Bits, causing a lot of applications that used the Address Registers the wrong way to implode when executed on later Macintoshes equipped with the new Processor (Note that Apple implemented a translation layer in its OSes to maintain backwards compatibility). At least on the x86 side, from the P5 Pentium onwards, Intel decided that instead of simply discouraging the use of reserved Bits, Software developers would be entirely sandboxed, as any attempt to write to something that they were not supposed to access would cause a GPF (General Protection Fault), forever fixing that sort of forward compatibility issue.
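If you want to picture the 68000 trick in code, here is a rough, hypothetical illustration (not actual Macintosh code) of what stashing a flag in the undecoded upper Byte of a pointer looks like, and why it stops working the moment all 32 Bits are decoded:

```c
/* Rough illustration (not actual Macintosh code) of the 24 Bit address
 * tagging trick: with only 24 address lines decoded, the top 8 Bits of a
 * 32 Bit pointer were "free", so Software stashed flags there. On a CPU
 * that decodes all 32 Bits, the tagged pointer points somewhere else. */
#include <stdio.h>
#include <stdint.h>

#define TAG_LOCKED 0x80000000u                  /* Hypothetical flag kept in the unused upper Byte */

int main(void)
{
    uint32_t handle = 0x0012A400u;              /* Some 24 Bit address */
    uint32_t tagged = handle | TAG_LOCKED;      /* Flag smuggled into the upper Byte */

    uint32_t on_68000 = tagged & 0x00FFFFFFu;   /* 68000: upper 8 Bits ignored, still works */
    uint32_t on_68020 = tagged;                 /* 68020: all 32 Bits decoded, bogus address */

    printf("68000 sees %06Xh, 68020 sees %08Xh\n", (unsigned)on_68000, (unsigned)on_68020);
    return 0;
}
```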
While making patches for the affected Software was technically possible (After all, it was the developer's fault for using Hardware functions that it shouldn't have), it would have been a logistical nightmare to distribute them among affected users, since the only mainstream way to do so was by mailing Diskettes, incurring material and shipping costs. It is not like today, when a Software developer can release a half broken piece of Software, then fix it via patches that anyone can download over the Internet at a negligible cost. A decade later, it would become common for computer magazines to include CDs with tons of utilities, patches for games, applications and such, but when the PC/AT was released, that wasn't an option, either. The only viable way to mitigate the compatibility issues was to forcibly design new Hardware that was fully backwards compatible with the old Software. Thus, what Intel learned the hard way, and a bit later than IBM (Thanks to the IBM PCjr fiasco, and perhaps also because IBM could see first hand how 100% IBM PC compatibility was the supreme goal of clone manufacturers), is that backwards compatibility was much more than just reproducing the documented and standardized behavior of something; it had to reproduce the undocumented and unintentional behavior, too. What was broken in previous Processors had to remain broken in the newer ones. As soon as Intel took backwards compatibility as seriously as its main customers did, it had to make sure that the unintended behavior of any new Processor matched that of the old ones, just to support Software that blatantly ignored the recommended programming practices. But this alone was not enough, either. The workarounds and hacks that Intel's customers implemented for its Processors' quirks, like those that IBM included in the PC/AT due to the halfway attempt of the 80286 CPU at backwards compatibility, produced their own quirks, so Intel had to support them, too.
For example, the A20 Gate is an excellent case of a customer introduced workaround that Intel would end up having to accept as part of its x86 architecture. Thanks to the previously mentioned HMA, Intel couldn't simply go back and fix Real Mode in newer Processors by always using the 8086/8088 style Address Wraparound, as it now had to consider that there was mainstream Software that depended on the normal behavior of the 80286 A20 Address line while in Real Mode to address the extra 64 KiB used for the HMA. The only way to maintain full backwards compatibility was to support two different modes for the A20 Address line that could be changed on the fly: Address Wraparound for 8086/8088 compatibility, and treating the A20 line as normal for 80286 HMA compatibility. Intel eventually included support for that hack directly in its Processors beginning with the 80486 CPU, which had a Pin, A20M, that was exclusively used to change the Processor's internal behavior of the A20 Address line. This is a bit amusing when you consider that the 80486 as a standalone CPU was more compatible with the 8088 than an earlier successor, the 80286, which was released back when 8088 compatibility was much more important. Hardware support for the A20 Gate hack was still present as late as 2014 with Intel Haswell, albeit no longer in physical form with a dedicated Pin.
The fact that a particular behavior or function of a Hardware part is not documented doesn't necessarily mean that the manufacturer is not aware of its existence. At times, not providing public documentation about something can be intentional, since it may be meant to be a confidential feature that gives an advantage to privileged developers. For example, the 80286 CPU had an infamous undocumented instruction, LOADALL, that could be used to directly write values to the Processor Registers in a way that bypassed standard validity checks. If used properly, LOADALL could bend the 286's own official rules to do useful things that were otherwise impossible (Ironically, not even LOADALL could help the 80286 to return to Real Mode from Protected Mode). While LOADALL was not officially documented, Intel did have internal documentation about it, which it provided to several privileged developers in a document known as Undocumented iAPX 286 Test Instruction. One of those developers was Digital Research, which used it for the development of a new OS, Concurrent DOS 286, a direct descendant of CP/M. With the help of LOADALL, Concurrent DOS 286 could use the 286 Protected Mode to emulate an 8088 CPU, making it able to multitask some PC DOS/MS-DOS applications.
Even if there is internal documentation about features that were not publicly divulged, their behavior may still not have been standardized and is thus subject to change. Sometimes, behavior changes can happen even between different Steppings (Minor or Major Revisions) of the same Processor design. That was precisely the roadblock that sealed the fate of Concurrent DOS 286. Digital Research developed that OS using early Steppings of the 80286, where its 8088 emulation trick worked as intended. For some reason, Intel decided to change the LOADALL behavior in the production Steppings of the 80286, breaking the emulation capabilities of the preview version of Concurrent DOS 286. Digital Research delayed the widespread release of its new OS while it argued with Intel about the LOADALL behavior change. Eventually, Intel released the E2 Stepping of the 80286 with a LOADALL version that allowed Concurrent DOS 286 to perform the emulation trick as intended again. However, the whole affair caused the final version of Concurrent DOS 286 to be a bit too late to the OS market to make any meaningful impact, besides having much more specific Hardware requirements, as the emulation feature relied on a particular Processor Stepping. I don't know if the E2 Stepping LOADALL behaves like the early Steppings that Digital Research said worked with Concurrent DOS 286, or if it is a different third variation, with the second variation being the broken one. The point is, not even something that was intentionally designed and internally documented had its behavior set in stone, and reliance on it harmed a privileged developer. Such is the unpredictable nature of undocumented things...
Another privileged Software developer was Microsoft, which also got documentation from Intel about the 80286 LOADALL instruction. Whereas Digital Research used LOADALL to achieve 8088 emulation, Microsoft was perhaps the first to use LOADALL to address the entire 16 MiB Physical Address Space from within Real Mode, completely skipping Protected Mode (Amusingly, Intel documented both use cases in its LOADALL documentation). This trick would informally become known as Unreal Mode. As LOADALL could access the memory above the 1088 KiB boundary without having to enter Protected Mode nor needing to exit it via the 286 reset hacks (The A20 Gate still had to be enabled and disabled on demand), it was the fastest way for a Real Mode application to use more memory. During the time period when 286 CPUs were mainstream (Late 80's to early 90's), Microsoft often used Unreal Mode in some of its PC DOS/MS-DOS Drivers. By that point in time, Unreal Mode had already evolved further with the newer Intel 80386 CPU, as there was no need to use an undocumented instruction to enable it, and Unreal Mode itself was extended to access from within Real Mode the 4 GiB (2^32) Physical Memory Address Space of the 386 instead of only the 16 MiB (2^24) of the 286. Chances are that Intel had to keep supporting Unreal Mode going forward because its usage became widespread enough that, as with the A20 Gate, it was forced to either adopt it as part of the x86 ISA or risk imploding backwards compatibility. Decades later, Unreal Mode is still missing from the official x86 ISA documentation, yet Intel acknowledges it as "Big Real Mode" in other specifications that it helped to develop, like the PMMS (POST Memory Manager Specification), somewhat solidifying its status as a permanent component of the x86 ISA. This leaves Unreal Mode as yet another legacy of the 80286 days that still remains even in modern x86 Processors.
While typically a newer Processor has all the instructions of a previous one, the undocumented LOADALL would remain specific to the 80286 ISA. As a lot of important Software used it, platforms based on later Processors had to deal with the removal of LOADALL by having the Firmware trap and emulate it to maintain backwards compatibility. Since LOADALL was extremely unique in nature, fully emulating it was pretty much impossible, yet the emulation techniques seem to have been good enough for mainstream use cases. An interesting detail is that the Intel 80386 CPU included a different version of LOADALL that is similar in purpose, but not in behavior, to the 286 LOADALL. The 386 LOADALL wasn't as interesting as the 286 one because there weren't a lot of useful things that it could do that weren't achievable via more standard means, including enabling Unreal Mode. Still, something that made the 386 LOADALL useful is that it could actually be used to fully emulate the 286 LOADALL, a method that at least Compaq used for the Firmwares of its 386 based PC/AT compatible platforms. Regardless, it is possible that there is Software using the 286 LOADALL that fails to work on any platform that either isn't a 286 or does not use the emulation method based on the 386 LOADALL.
Finally, to end this extremely long section, I'm sure that you may be wondering why there are almost no mentions of the Intel 80186 and 80188 CPUs, as if everyone had simply opted to skip these Processors even though they were the direct successors of the 8086 and 8088. A little known detail about them is that they were not standalone CPUs (Nor was the 80286 for that matter, since it had an integrated MMU. In other chip set families of the era, the MMU was a discrete Coprocessor); instead, they were closer to a modern SoC (System-on-Chip). Besides the CPU, they also included an integrated Clock Generator, PIC, DMAC and PIT. At first glance, this seems wonderful, until you notice that the integrated functions were not 100% compatible with the discrete parts that were used in the IBM PC, like the 8259A PIC and the 8237A DMAC. Thus, designing an IBM PC compatible computer based around an 80186/80188 was harder than a standard clone based on either the PC 8088 or the PC/AT 80286. Given the almost complete lack of IBM PC compatible computers that used the 80186/80188 Processors, I'm not sure if it was possible to somehow disable or bypass the integrated functions and wire the standard discrete support chips to them as usual, or if that was a lost cause. I recall hearing about at least one 80188 computer that claimed to be IBM PC compatible, but I have absolutely no idea how good it was in practice. There was also an upgrade card with an 80186. Regardless, the reason why almost everyone skipped the 80186/80188 Processors is that they sat at a horrible middle point, as you couldn't make a 100% IBM PC compatible with them, nor could you make an IBM PC/AT clone.
Regarding the capabilities of the 80186/80188 CPUs themselves, they were rather decent. The difference between the 80186 and the 80188 is the same as that between the 8086 and the 8088: the former had a 16 Bits external Data Bus while the latter had an 8 Bits one. Both had a 20 Bits Address Bus, so they were limited to a 1 MiB Physical Address Space like the 8086/8088. The execution unit was vastly improved over the 8086/8088, with an average IPC more than halfway between them and the 80286 CPU. Actually, if 80186/80188 based IBM PC compatible computers had been viable, the average IPC of the 80286 wouldn't appear as ridiculously high as it did, since people wouldn't be directly comparing it to the 8088. The 80186/80188 introduced new x86 instructions that were included in all later x86 Processors like the 80286, but also in the NEC V20/V30. They had no support for Protected Mode or anything related to it, like the MMU and Virtual Memory, albeit at least Siemens, in its PC-like PC-X and PC-D platforms, coupled an 80186 with an external MMU precisely to use Virtual Memory (I failed to find info about which MMU chip these used and how it was interfaced with the 80186, as it stands out as a major oddity). While pretty much unused in the world of the IBM PC platform, these SoCs were supposedly rather successful in the embedded market.
The 80186/80188 generation's biggest legacy in the x86 ISA (Instruction Set Architecture) was the introduction of a proper method to handle invalid opcodes. An opcode is a single instruction of machine code, which is what the Processor actually executes and tells it what to do. Opcodes can be displayed in a human readable format as an Assembler programming language instruction, which is highly specific to a particular ISA and has a direct 1:1 translation to machine code. As an opcode encodes both the instruction and the registers involved, a single Assembler instruction can be translated into several different opcodes depending on the registers that are involved in such an operation (For example, in 16 Bits code, INC AX and INC CX assemble to the different single Byte opcodes 40h and 41h, respectively). Also, a single complex instruction can produce multiple opcodes as part of a single operation. While any Assembler code to machine code translation should cover all the opcodes available in the proper, valid ISA, as otherwise the Assembler tool would reject the invalid input, it is still possible to generate opcodes that don't correspond to any Assembler instruction by manually inputting hexadecimal values. An invalid opcode can be considered worse than an undocumented instruction (Or single opcode) because the latter is supposed to have a defined function that isn't acknowledged by the manufacturer, whereas an invalid opcode is undefined by nature, albeit its behavior should be reproducible on the same Processor Stepping. Depending on the Processor, invalid opcodes can produce a crash, do nothing, be aliases of other documented opcodes, or do something actually useful that can't be done with documented opcodes (In such a case, they are more likely undocumented opcodes than merely invalid ones). You can read some interesting information pertaining to invalid and undocumented opcodes here (Note that it mentions other Processors besides x86).
Whenever the 8086/8088 CPUs encountered an invalid opcode, they tried to interpret it anyway, since there was nothing forbidding the execution unit from doing so. Because they are undefined by nature, the only way to know what an invalid opcode does is via trial-and-error while watching the Processor state before and after its execution, something that someone already bothered to do. Some of these opcodes were reused in later x86 Processor generations, so Software relying on them would, as you expect, not work on later Processors, once again breaking forward compatibility. In the same scenario, the 80186/80188 instead triggered an Invalid Opcode Exception, allowing an Exception Handler to take over execution and do things like emulating the invalid opcode. This also forced programmers not to do things that they weren't supposed to do in the first place, which ironically is pretty much the basis of forward compatibility. Basically, if you don't want programmers using some obscure Processor function that could cause trouble later on, don't even give them the chance to use it at all.
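The concept survives to this day, and you can see it in action even from user space on a modern OS. The following is a minimal sketch, assuming Linux on x86-64, where the deliberately invalid UD2 opcode raises an exception that a handler intercepts and "emulates" by simply skipping over its 2 Bytes:

```c
/* Minimal sketch of the mechanism the 80186 introduced, shown on a modern
 * OS: an invalid opcode raises an exception that a handler can intercept
 * and "emulate". Assumes Linux on x86-64; the handler just skips the
 * 2 Byte UD2 instruction (0F 0B) as a stand-in for real emulation. */
#define _GNU_SOURCE
#include <stdio.h>
#include <signal.h>
#include <ucontext.h>

static void on_invalid_opcode(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info;
    ucontext_t *uc = ctx;
    puts("Invalid opcode trapped, 'emulating' it by skipping 2 Bytes");
    uc->uc_mcontext.gregs[REG_RIP] += 2;    /* UD2 is two Bytes long */
}

int main(void)
{
    struct sigaction sa = { .sa_sigaction = on_invalid_opcode,
                            .sa_flags = SA_SIGINFO };
    sigaction(SIGILL, &sa, NULL);

    __asm__ volatile ("ud2");               /* Architecturally guaranteed invalid opcode */
    puts("Execution resumed after the invalid opcode");
    return 0;
}
```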
The IBM PC/AT cast of support chips received a massive upgrade. It added a second set consisting of an 8259A PIC and an 8237A DMAC that provided additional IRQs and DMA Channels, thus there were now two of each chip. The 8253 PIT was upgraded to an 8254 PIT, which has a few more functions but is otherwise backwards compatible. A Motorola MC146818 RTC was added as a core piece of the platform, so that with proper OS support, you didn't need to input the Date and Time at OS startup ever again. Finally, the 8255 PPI got replaced by a full blown Microcontroller, the Intel 8042, whose main job was that of a Keyboard Controller, but which actually fulfilled a multitude of roles.
While none of these chips are present in their discrete physical forms any longer, most of their functionality is still found in all modern x86 based computers. The Chipset absorbed the functionality of all the support chips except the 8042 Microcontroller, which stuck around as an independent chip for many more years until being replaced by a newer class of Embedded Controllers that were later integrated into the Super I/O chips (I think that a few Chipsets existed that included 8042 functionality, too, but that was not usual). Everything else would change, yet compatibility with the PC/AT support chips is a given even now, making them the IBM PC/AT's greatest legacy.
2x Intel 8259A PICs (Programmable Interrupt Controller): The 8259A PIC could be easily cascaded with one or more slave 8259As, each wired to an IRQ line of the primary one. While with two 8259A PICs, as in the IBM PC/AT, you had 16 total IRQs, due to the cascading the usable amount was 15. Interrupt Priority would depend on how the PICs were wired.
In the IBM PC/AT, the IRQ 2 line of the primary 8259A PIC was repurposed and used to cascade the slave PIC, which provided 8 IRQs of its own. These IRQs would be known as IRQs 8-15. As you may remember, IRQ 2 was directly exposed in the PC and PC/XT I/O Channel Slots, but that wasn't possible anymore in the PC/AT due to it having been repurposed. Instead, the IRQ2 Pin of the I/O Channel Slots was wired to the second PIC, at IRQ 9. For backwards compatibility purposes, the PC/AT BIOS Firmware transparently rerouted Interrupt Requests from IRQ 9 so that they appeared Software side as if they had been made from IRQ 2. Being cascaded from IRQ 2 also impacted Interrupt Priority, as from the main PIC's point of view, any incoming interrupt from the slave PIC at IRQ 2 had higher priority than those of IRQs 3-7. The slave PIC itself had the standard 8259A Interrupt Priority, with IRQ 8 being the highest and IRQ 15 the lowest. Thus, Interrupt Priority in the IBM PC/AT was noncontiguous, with IRQs 0-1 being followed by 8-15, then 3-7. I suppose that it was considered a more balanced approach than wiring the slave PIC to IRQ 7 of the primary one.
Of the 8 new IRQs, there were two that were for internal Motherboard use only, IRQs 8 and 13. IRQ 8 was used by the new RTC, whereas IRQ 13 was used by the Intel 80287 FPU. The FPU case is interesting because in the IBM PC, the 8087 FPU was wired to the 8088 NMI line (Along with the error reporting from the Memory Parity subsystem) instead of a standard IRQ as Intel recommended. For backwards compatibility reasons, IBM decided to give IRQ 13 the same treatment that it gave IRQ 9, as the Firmware rerouted Interrupt Requests from IRQ 13 as NMI. Something that I found quite weird is that the 80287 FPU does not generate Interrupts directly like the 8087 FPU did; instead, the 82288 Bus Controller seems to do so on its behalf with its INTA line, which passes through some glue logic before getting into the slave 8259A PIC at IRQ 13. The remaining IRQs, namely IRQs 10-12, 14 and 15, were exposed in the new section of the longer expansion slots, so only new cards could use them (This does not apply to IRQ 9, as it is used as a direct IRQ2 Pin replacement and is thus found in the first section of the slots). IRQ 14 was pretty much reserved for HDC (Hard Disk Controller) class cards only.
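As a small practical illustration of what the cascading means for Software, here is a minimal bare metal style sketch of how a Driver or OS acknowledges an Interrupt on a PC/AT compatible: anything arriving through the slave PIC needs an EOI (End Of Interrupt) command on both controllers, since the slave hangs off IRQ 2 of the master. The outb helper is the usual inline Assembly wrapper, not a standard library function.

```c
/* Minimal sketch of acknowledging an Interrupt on the cascaded PIC pair of
 * a PC/AT compatible, assuming bare metal (or kernel) code: IRQs 8-15
 * arrive through the slave at IRQ 2, so they need an EOI on BOTH chips. */
#include <stdint.h>

#define PIC1_CMD 0x20   /* Master 8259A command Port */
#define PIC2_CMD 0xA0   /* Slave 8259A command Port */
#define PIC_EOI  0x20   /* Non-specific End Of Interrupt command */

static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}

void send_eoi(uint8_t irq)
{
    if (irq >= 8)
        outb(PIC2_CMD, PIC_EOI);   /* Acknowledge on the slave first... */
    outb(PIC1_CMD, PIC_EOI);       /* ...and always on the master, since the slave hangs off its IRQ 2 */
}
```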
2x Intel 8237A DMAC (Direct Memory Access Controller): Like the PIC, the 8237A DMAC supported using multiple units in the same system. It was possible to wire one or more slave DMACs to a primary one, each wired to a DMA Channel, either cascaded or daisy chained. This also means that for each DMAC you added 4 DMA Channels, but lost one Channel of the chip upstream of that DMAC.
A major limitation is that as DMA bypasses the CPU, it also means that you are bypassing its MMU. As such, DMA transactions involved Physical Addresses only, as it was impossible for a Device to be aware of the new Virtual Memory scheme, or for the CPU to be aware of the independent DMA transactions. This wasn't important at a time when PC DOS reigned supreme, since everything used Physical Addresses, but eventually it would become a pain for Protected Mode OSes, as they required Bounce Buffers to move data from the Physical Addresses that the DMA would see to its final destination (This is what the IOMMU would solve 3 decades later, not by making the Devices themselves aware of Virtual Memory, but by giving an OS a means to transparently intercept and remap Device DMA directly to Virtual Addresses).
In the IBM PC/AT, the DMAC cascading was done the reverse way compared to the PICs. Whereas the new PIC was added as a slave to the existing one, the new DMAC was inserted upstream of the preexisting DMAC, thus making the latter a slave. This master-slave relationship is in truth just a matter of point of view at the platform level, since as numbered by IBM, DMA Channels 0-3 came from the slave DMAC and 4-7 from the master one. I don't know whether it could have been possible to do it the other way around (Add the new DMA Channels with a cascaded slave controller, like the PICs) or to just renumber them, as at least that would have made a bit more sense. The discrete logic of the 4 Bits Page Register, which already extended the native 16 Bits Address Bus of the 8237A DMAC to 20 Bits, was further extended to 8 Bits to support all 24 Bits of the I/O Channel Address Bus, so it was possible to perform a DMA transfer to any part of the 80286's 16 MiB Physical Memory Address Space.
A surprising peculiarity is that the Address Bus of the master DMAC was wired differently than that of the slave. Whereas the slave DMAC was wired roughly the same way as in the IBM PC, which means being directly wired to the lower 16 Bits of the Address Bus (Pins A0 to A15) and letting the Page Registers handle the rest, the master DMAC's 16 Address lines were offset by one Bit, as they weren't wired to the lowest Bit of the Address Bus and instead covered one higher Address line (Pins A1 to A16), with one Bit of the Page Registers being ignored. As the 8237A DMAC could do flyby operations that didn't require the DMAC itself to handle the data, just to set up the transfer, this gave the master DMAC the ability to do transfers of 16 Bits Words instead of standard 8 Bits Bytes, even though the DMAC was otherwise designed with 8 Bits Data Buses in mind. For the scheme to work, 16 Bits transfers required the Bytes to be aligned at even address boundaries, and a single transfer could not cross a 128 KiB boundary (0-128, 128-256, etc). As it worked with Words, a 16 Bits block transfer could do a maximum of 128 KiB (65536 Words) in one go instead of 64 KiB.
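To make the wiring difference concrete, the following plain C snippet just does the address arithmetic that the Hardware performs: for the 8 Bits channels the Page Register and the DMAC offset are simply concatenated, while for the 16 Bits channels the offset is a Word address shifted into lines A1-A16 and the lowest Page Register Bit is ignored (the Page and offset values below are arbitrary examples):

```c
/* Worked example of the address arithmetic implied by the master DMAC
 * wiring (offset Pins shifted to A1-A16). Plain hosted C, just the math. */
#include <stdio.h>
#include <stdint.h>

/* 8 Bits channels (0-3): Page Register supplies A16-A23, DMAC offset supplies A0-A15 */
static uint32_t dma8_address(uint8_t page, uint16_t offset)
{
    return ((uint32_t)page << 16) | offset;
}

/* 16 Bits channels (5-7): DMAC offset is a Word address on A1-A16, so the lowest
 * Page Register Bit is ignored and a transfer stays inside a 128 KiB block */
static uint32_t dma16_address(uint8_t page, uint16_t offset)
{
    return (((uint32_t)page & 0xFE) << 16) | ((uint32_t)offset << 1);
}

int main(void)
{
    printf("8 Bits  channel, page 12h, offset 3400h -> %06Xh\n", (unsigned)dma8_address(0x12, 0x3400));
    printf("16 Bits channel, page 12h, offset 3400h -> %06Xh\n", (unsigned)dma16_address(0x12, 0x3400));
    return 0;
}
```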
Regarding the DMA Channels themselves, of the 4 DMA Channels of the master DMAC, the only one used was DMA Channel 4, to cascade the slave DMAC. Since from the 8237A's point of view DMA Channel 4 is its first channel, this also means that the slave DMAC channels had higher priority. The other 3 channels, known as DMA Channels 5-7, were exposed in the second section of the new expansion slots, and did only 16 Bits Word sized transfers. The slave DMAC still provided DMA Channels 0-3, but with a notable change: due to the IBM PC/AT having new dedicated logic to do DRAM refresh, there was no need to waste a DMA Channel on that. This freed up DMA Channel 0, which was also exposed in the second section of the slots. Meanwhile, the B19 Pin that previously was wired to DACK0 of DMA Channel 0 to provide DRAM refresh for Memory expansion cards now exposed the REFRESH signal from the Memory Controller's dedicated refresh subsystem, so from a Memory expansion card perspective, nothing changed.
Having both DMA Channels 0 and 1 available theoretically made it possible to do the famed memory-to-memory transfers supported by the 8237A, albeit they were supposed to be impractical. The master DMAC would still never be able to do them, for two reasons: First, memory-to-memory transfers were required to be performed on channels 0 and 1, but as in the original IBM PC, its channel 0 was used, this time to cascade the other DMAC. Second, because these transfers required the DMAC to actually manipulate the data by internally buffering it instead of doing a flyby operation, and since its buffer register was only 8 Bits wide, those transfers would be impossible anyway.
Intel 8254 PIT (Programmable Interval Timer): As a superset of the 8253 PIT, the 8254 does everything the 8253 did, plus it adds a Status Read-Back function, which seems to ease reading the current status of the Counters compared to having to do so with 8253 commands.
The IBM PC/AT had the 3 Counters of the 8254 PIT assigned to the same tasks as in the IBM PC and PC/XT, but the wiring was slightly different due to changes in the other chips and discrete circuitry. Counter 0 was used as the System Timer, with its GATE line unused and its OUT line hooked to IRQ 0 of the master 8259A PIC. Counter 1 took care of the DRAM refresh timer, with its GATE line unused and its OUT line wired to the new discrete DRAM refresh circuitry (Part of the Memory Controller circuitry). Counter 2 was used to drive the PC Speaker, with its GATE line coming from discrete logic now managed by the new Intel 8042 Microcontroller (As replacement for the Intel 8255 PPI) and its OUT line wired to the PC Speaker circuitry.
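Counter 2 is also the easiest one to show in code, since its programming interface is unchanged since the original PC. The following is a minimal bare metal style sketch of making the PC Speaker emit a tone: the Counter is loaded with a divisor of its roughly 1.193182 MHz input clock, and two Bits of Port 61h raise its GATE and connect its OUT line to the Speaker (the outb/inb helpers are the usual inline Assembly wrappers, not standard library functions):

```c
/* Minimal sketch of programming Counter 2 of the 8254 PIT to drive the PC
 * Speaker, assuming bare metal code on a PC/AT compatible. The GATE of
 * Counter 2 and the Speaker enable are Bits 0 and 1 of Port 61h. */
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

void speaker_tone(uint16_t frequency_hz)
{
    if (frequency_hz < 19)                  /* Divisor must fit in 16 Bits */
        return;
    uint16_t divisor = (uint16_t)(1193182u / frequency_hz);

    outb(0x43, 0xB6);                       /* Counter 2, lobyte/hibyte, Mode 3 (square wave) */
    outb(0x42, divisor & 0xFF);             /* Low Byte of the divisor */
    outb(0x42, divisor >> 8);               /* High Byte of the divisor */
    outb(0x61, inb(0x61) | 0x03);           /* Raise GATE 2 and the Speaker enable Bit */
}
```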
Motorola MC146818 RTC (Real Time Clock): An RTC is a specialized Timer that is intended to track the human concept of time, instead of just counting arbitrary clock cycles like other, generalist Timers do. Because the speed at which a Timer counts clock cycles is directly related to the clock speed that it is running at, RTC chips also define in their specifications the Clock Generation scheme that should be used for them to work as intended. As such, you can consider RTCs to be factory calibrated to tick every second (Or a fraction of it) if implemented correctly. RTCs usually have a calendar so that they can keep track of the full Time and Date, and it is common for them to be part of a circuit that can operate with an external battery, so that they remain powered and ticking while the rest of the system is off, as to not have to set the current time on every power on.
The Motorola MC146818 RTC should have been among the earliest RTC chips (I don't know if there were any other single chip RTCs before it), and it provided a set of features that would become standard for this class of Devices. It supported tracking Time and Date, had alarm support, and could signal Interrupts. It had built-in support to use either a 32.768 KHz, 1.048576 MHz or 4.194304 MHz reference clock. It also included a built-in 64 Bytes SRAM, which is where the current Time and Date and alarm settings were stored. Of those 64 Bytes, 14 were used for the standard chip functions, and 50 were free for user data. Curiously enough, the MC146818 RTC supports a mode where the RTC functions can be disabled to free up 9 more SRAM Bytes (The chip Registers are directly mapped on the SRAM, thus they can't be disabled), giving a total of 59 Bytes for user data. This mode was supposed to be for systems that have two or more MC146818 RTCs, so that only one serves in the RTC role while the others are treated as mere memory. I have no idea if anyone ever used them that way, nor if from a cost perspective it made sense compared to standard SRAM chips.
The IBM PC/AT introduced the RTC as part of the base platform. The RTC circuitry used both standard Motherboard power and an external battery, the idea being that the battery only supplied power when the computer was powered off. Thanks to the battery, the RTC's 64 Bytes of SRAM became NVRAM (Non-Volatile RAM) in nature, which IBM decided to put to use to store the Motherboard Firmware configuration settings. Of the 50 Bytes of SRAM available for user data, IBM used 24 to store BIOS settings (IBM even documented which address stored which setting), and the other 28 were marked as reserved. The RTC Interrupt line was wired to the slave PIC, at IRQ 8 (Its first IRQ).
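The access method that the PC/AT established for the RTC and its NVRAM is still emulated by modern platforms: an index Port at 70h selects one of the internal Bytes, and a data Port at 71h reads or writes it. The sketch below, again bare metal style code with the usual inline Assembly Port I/O helpers, reads the seconds counter, which the chip stores in BCD by default:

```c
/* Minimal sketch of accessing the MC146818 on a PC/AT compatible, assuming
 * bare metal code: Port 70h selects one of the 64 internal Bytes, Port 71h
 * reads or writes it. Registers 0, 2 and 4 hold seconds, minutes and hours. */
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

static uint8_t cmos_read(uint8_t reg)
{
    outb(0x70, reg);        /* Select the RTC/NVRAM register (Bit 7 also gates NMI on a real PC/AT) */
    return inb(0x71);       /* Read the selected Byte */
}

static uint8_t bcd_to_bin(uint8_t v)
{
    return (uint8_t)((v >> 4) * 10 + (v & 0x0F));
}

uint8_t rtc_seconds(void)
{
    /* A robust Driver would first poll the update-in-progress flag in Status Register A (0Ah) */
    return bcd_to_bin(cmos_read(0x00));
}
```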
Intel 8042 UPI (Universal Peripheral Interface) Microcontroller: A Microcontroller is pretty much a chip that is even more highly integrated than a SoC, as it includes Processor, RAM, ROM, GPIO and maybe other built-in peripherals, yet it is minimalistic in specifications, being a sort of Swiss Army knife. Microcontrollers are used in any type of electronic or electromechanical device that requires a very small embedded computer performing tasks that are essential but don't demand a lot of computing power.
The Intel 8042 UPI was part of the same MCS-48 family as the previously described 8048 used in IBM Model F Keyboards. Compared to it, it had slightly better specifications, but also a slightly different Instruction Set, thus it couldn't run all 8048 Software (Not all 8048 instructions were supported, yet 8048 machine code not using those would run on the 8042). The 8042 included an 8 Bit CPU, Clock Generator, Timer, 2 KiB ROM and 256 Bytes RAM, and at least 16 GPIO Pins (I'm not really sure how many of the other Pins could be repurposed for GPIO), organized as two 8 Bit Ports. The contents of the built-in ROM had to be factory programmed. It also had an 8 Bit external Data Bus, while for addressing there is a single line known as A0 to select which of its two Ports will be accessed, plus an optional mode where 12 of the GPIO Pins could be used as a 12 Bits Address Bus, providing a 4 KiB Address Space (This matches the maximum supported external ROM size, as there are other models based on the 8042 design that don't have built-in ROM and instead fully rely on an external ROM chip). The Bus protocol of the 8042 was quite compatible with that of the Intel 8080/8085 CPUs, thus it could interface directly with support chips from the MCS-85 family, just like the 8086/8088 CPUs.
Besides the 8042, there were several pin and functionally compatible variants. The most important one is the 8742, whose main difference from the 8042 is that the former used EPROM instead of ROM for its internal Firmware, making it user programmable, assuming that the EPROM was blank. It had two major variants, one being OTP (One Time Programmable), useful for low volume products that only had to write the Firmware once in their lifetime (It would have been easier to program small numbers of those chips than to request a custom 8042 batch from Intel), while the other had a crystal window so that you could erase the EPROM with ultraviolet light and write to it again, making it reusable. There was also the 8242 series, which had multiple models that came with commercial Firmwares in their ROMs. An example is the 8242BB, which had an IBM Firmware that was usable for PC/AT compatible platforms (I have no idea whether IBM licensed a generic Firmware, or if it is the same as one used for its own systems. Nor do I have any idea why these chips existed in the first place... Maybe IBM decided to try to get some extra cash from clone manufacturers?).
In the IBM PC/AT, the 8042 replaced the 8255 PPI used by the IBM PC. Whereas the 8255 PPI provided pretty much just dumb GPIO interfaces that had to be operated by the main Processor, the 8042's capabilities also allowed it to do some processing itself, thus it is more akin to a polyvalent auxiliary controller. While the 8042 had a multitude of roles, like providing host managed GPIO interfaces for the A20 Gate and the 286 reset hack (These could have been implemented with the 8255 as they're dumb GPIO, too), its best known role is being used as a full fledged Keyboard Controller. In that role, the 8042 entirely replaced the discrete Keyboard circuitry of the IBM PC and the associated 8255 PPI GPIO interface, as it could do the serial-to-parallel data conversion, signal Interrupts and interface with the I/O Channel Bus all by itself, thus the Keyboard Port could be directly wired to it. In order to keep backwards compatibility with IBM PC Software, the Keyboard function was mapped to the same I/O Port as the 8255 PPI Port A, Port 60h. However, that is pretty much where backwards compatibility with the 8255 PPI ends.
Since the IBM PC/AT had no Cassette interface and the Motherboard had just a few Jumpers, its GPIO requirements were lower than those of the previous Motherboards, which is quite convenient as the 8042 had 8 fewer GPIO Pins than the 8255 PPI. Because the exposed configuration Bits on the second 8042 Port were different enough, IBM seems to have decided to map it somewhere else, so as not to risk any conflicts with applications that blindly manipulated the other 8255 PPI Ports. As such, the second 8 Bit Port was mapped to I/O Port 64h, jumping over I/O Ports 61h and 62h, which is where 8255 PPI Ports B and C used to be mapped, respectively. The reason why the mapping jumped 4 addresses is that the IBM PC/AT wired the A0 Pin used by the 8042 to choose between its Ports to the XA2 Address line of the External I/O Address Bus, whereas had it been wired to XA0 as usual, the mapping would have been contiguous, thus predictably found at Port 61h. While for the most part the readable Bits of Ports 61h and 62h are PC or PC/XT specific, there is a notable exception involving the PC Speaker, as the PC Speaker used two Bits from Port 61h (More precisely, the ones that managed the GATE 2 input of the 8254 PIT and the Speaker data). The IBM PC/AT still had the PC Speaker and it was programmed in the same way, so as to remain backwards compatible with IBM PC Software, but I'm not sure who responds to Port 61h requests, as the 8042 is completely unrelated to it now. I suppose that some discrete glue logic may be involved.
It is easy to make a full roundup of the GPIO Pins of the 8042 as used by the IBM PC/AT. In total, there were 16 GPIO Pins organized as two 8 Bit Ports known as Port 1 and Port 2, which managed the Pins known as P10-17 and P20-27, respectively. P10-13, P22-23 and P25 were Not Connected, for a total of 7 completely unused GPIO Pins. The Keyboard interface used only two Pins, P26 and P27, for Clock and Data, respectively, which were wired to the Keyboard Port. Related to the Keyboard, P24 was wired to the master PIC to generate Interrupts on IRQ 1, and P17 was used for the long forgotten Keyboard Inhibitor (The IBM PC/AT Computer Case had a physical Kensington Lock and a matching key that toggled this function). P21 was used to manage the infamous and already covered A20 Gate. P20 was wired to the Reset circuitry and was used by the 286 reset hack. P16 was used to read the SW1 Jumper to choose between MDA or CGA Video Cards. P14 was used for the J18 Jumper, to set whether there were one or two populated RAM Banks on the Motherboard. Last but not least, P15 isn't marked as Not Connected, but I don't understand what it is supposed to do, since the schematic doesn't wire it anywhere, either. Also, you can check the MinusZeroDegrees Keyboard Controller Chip diagram to see most of the auxiliary functions of the 8042 together, albeit it is a bit incomplete given my coverage.
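Since P21 is the Pin that drives the A20 Gate, this is also where the PC/AT way of enabling it lives on. The following minimal sketch, again bare metal style with the usual inline Assembly Port I/O helpers, shows the classic sequence: command D1h tells the 8042 that the next Byte written to Port 60h is a new value for its output Port, whose Bit 1 is the A20 line (real A20 Handlers also had to poll for completion and support the non-8042 implementations mentioned earlier):

```c
/* Minimal sketch of the PC/AT way of enabling the A20 Gate through the
 * 8042, assuming bare metal code: command D1h asks the 8042 to update its
 * output Port, whose Bit 1 drives the A20 Gate line (Pin P21). */
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

static void wait_8042_ready(void)
{
    while (inb(0x64) & 0x02)    /* Bit 1 of the status Port: input buffer still full */
        ;
}

void enable_a20_via_8042(void)
{
    wait_8042_ready();
    outb(0x64, 0xD1);           /* "Write output Port" command to the 8042 itself */
    wait_8042_ready();
    outb(0x60, 0xDF);           /* Output Port value with Bit 1 (A20 Gate) set */
    wait_8042_ready();
}
```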
As the 80286 CPU extended the Physical Address Space from 1 MiB to 16 MiB, there had to be a new Memory Map to organize it. As can be seen in Page 1-8 of the IBM PC/AT 5170 Technical Reference (March 1984), the PC/AT Memory Map is, for obvious backwards compatibility reasons, built upon that of the IBM PC. As such, you will instantly recognize it.
As you would expect, the organization of the first MiB was mostly the same as before. While the Conventional Memory covering the 0 to 639 KiB range with system RAM would be forever the same, the UMA saw a few changes. The first 128 KiB chunk (640 KiB to 767 KiB) remains intended for video framebuffers, just like in the PC/XT, and is perhaps the only one that was consistent since the original PC. The range reserved for Option ROMs in expansion cards was reduced from 192 KiB in the PC/XT to just 128 KiB (768 KiB to 895 KiB). Meanwhile, the range for Motherboard ROMs was extended from the 64 KiB of the PC/XT to a whopping 128 KiB (896 KiB to 1023 KiB). This range was the one that grew the most, as the original IBM PC had just 48 KiB (40 KiB installed plus the empty U28 Socket for an extra 8 KiB ROM) for Motherboard ROMs, which got extended to 64 KiB in the IBM PC/XT (Either only 40 KiB or the full 64 KiB were used, depending on the Firmware version), then doubled on the IBM PC/AT, both times at the expense of Option ROMs. Note that the PC/AT actually had only 64 KiB of ROM memory installed, but it had two empty sockets for a second pair of 32 KiB ROM chips that were mapped and ready to use, just like the original IBM PC with its U28 Socket.
The things that are completely new are obviously those above the 1024 KiB range, courtesy of the 80286 CPU. IBM defined the range between 1 MiB and 15 MiB as Extended Memory, which would be used to map system RAM from a new type of Memory expansion card (The PC generation cards didn't have the Address Bus lines to map themselves above 1 MiB). At the very end of the Physical Address Space, there was a 128 KiB chunk (16256 KiB to 16383 KiB) that mirrored the 128 KiB of Motherboard ROMs located at the end of the UMA. Other than a possible attempt to eventually remove the Motherboard ROMs from the UMA in a future platform, a likely reason for this mirroring is that after a reset the 80286 begins fetching code from physical address FFFFF0h, near the very top of the 16 MiB Physical Address Space, so the Firmware ROM had to be visible up there anyway. Finally, the range between 15 MiB and 16255 KiB (16 MiB - 128 KiB) is not actually mentioned in the Technical Reference, not even marked as reserved or anything, so I prefer to call it undefined. So far, so good, as this covers the basic IBM PC/AT Memory Map at the time of release.
There are a whole bunch of things related to the PC/AT Memory Map that belong in their own section, since they were later additions that belonged to the DOS ecosystem and were not part of the original IBM platform definition, which is what I'm covering here. The only one worth mentioning now is the HMA (High Memory Area). Some years after the IBM PC/AT release, the first 64 KiB of the Extended Memory (1024 KiB to 1087 KiB) would become known as the HMA, as it was different from the rest of the Extended Memory in that it could be accessed from within Real Mode (Related to the 80286 not behaving like the 8088 with its Address Wraparound quirk, which the A20 Gate hack worked around), but otherwise the HMA was simply a subset of it.
One thing that became noticeable in the IBM PC/AT Memory Map is the concept of the Memory Hole. The Conventional Memory and the Extended Memory address ranges are both used to map system RAM, yet they are not contiguous because the UMA sits between them. Because it was not possible to relocate the UMA elsewhere without breaking IBM PC compatibility, any application that could use the full 16 MiB Physical Address Space had to be aware of the fact that the system RAM was composed of two separate RAM pools, which complicates things when compared to a single continuous unified address range. Sadly, I don't have specific examples to quantify how much messier it was, but consider that the first version of the IBM PC Firmware was supposed to support noncontiguous Conventional Memory, yet that feature was removed. Eventually, Virtual Memory would solve Memory Hole issues, since it would abstract the Memory Map details from user Software and simply present it with a completely uniform environment, so it didn't matter to anyone but the OS developer how ugly the physical Memory Map of the platform truly was.
The physical dimensions of the IBM PC/AT Motherboard depend on the Motherboard version. As Type 2 and 3 Motherboards were identical except for the use of a higher Frequency Crystal Oscillator and higher binned chips, there are only two physically different versions to be considered, Type 1 and 2/3. The dimensions are a bit of an issue since the IBM PC/AT 5170 Technical Reference appears to contradict itself: The March 1984 version, which covers only the Type 1 PC/AT Motherboard, says that its dimensions were 12" x 13", or 30.5 cm x 33 cm, yet the March 1986 version specifically says that the Type 1 Motherboard size was 12" x 13.8", or 30.5 cm x 35 cm, making it 2 cm longer. This difference seems to be because the PC/AT Motherboards are not standard rectangles but have a slightly different shape, so it may be a minor correction, as if the initial measurement was made from the shortest side. Either way, it was gargantuan in size compared to those of the PC and PC/XT. For the Type 2/3 Motherboards, the March 1986 Technical Reference says that they measured 9.3" x 13.8", or 23.8 cm x 35 cm, making them 6.7 cm narrower than Type 1. As always, MinusZeroDegrees has info and photos about this.
For internal expansion, not counting the expansion slots, the PC/AT Motherboards had an empty socket for the optional Intel 80287 FPU, and the previously mentioned two empty sockets for a pair of optional 32 KiB ROM chips. Type 1 Motherboards had 36 sockets for RAM chips, of which at least half (A RAM Bank) had to be populated, while Type 2/3 had only 18 sockets since they used RAM chips with twice the capacity. The Intel 80286 CPU and the two 32 KiB ROM chips for the BIOS Firmware and IBM Cassette BASIC came socketed, albeit there were limited replacements for them. The Motherboards also had an internal header for the Case mounted PC Speaker, now wired to the newer Intel 8254 PIT.
The PC/AT Motherboards, like those of the PC/XT, had only a single external I/O connector, the Keyboard Port. However, there is a major difference between it and the one of the PC and PC/XT: While the physical size and pinout of the Keyboard Port were the same, the Keyboard protocol was different. You could plug an older Keyboard into the PC/AT, but it would not work. The main difference between the old Keyboard protocol and the new one was that the former was unidirectional, as the Keyboard could send data to the IBM PC but not receive anything back, whereas the new one was bidirectional, so the Keyboard Controller could send commands to the Keyboard if it wanted to do so. If you ever saw in a Firmware an option to toggle the NumPad default status as On or Off, it is precisely thanks to this bidirectional protocol.
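A concrete everyday use of that bidirectionality is driving the status LEDs that the PC/AT generation Keyboards introduced. As a minimal bare metal style sketch (again with the usual inline Assembly Port I/O helpers, and skipping the FAh acknowledge handling that a robust Driver would include), the host sends command EDh followed by a Bit mask to the Keyboard itself through the 8042 data Port:

```c
/* Minimal sketch of the bidirectional PC/AT Keyboard protocol in action,
 * assuming bare metal code: command EDh ("Set LEDs") plus a Bit mask are
 * sent to the Keyboard itself through the 8042 data Port. */
#include <stdint.h>

static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

static void wait_8042_ready(void)
{
    while (inb(0x64) & 0x02)    /* Wait until the 8042 input buffer is empty */
        ;
}

void keyboard_set_leds(int scroll, int num, int caps)
{
    uint8_t mask = (uint8_t)((scroll ? 1 : 0) | (num ? 2 : 0) | (caps ? 4 : 0));

    wait_8042_ready();
    outb(0x60, 0xED);           /* "Set LEDs" command, addressed to the Keyboard */
    wait_8042_ready();
    outb(0x60, mask);           /* Bit 0: Scroll Lock, Bit 1: Num Lock, Bit 2: Caps Lock */
}
```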
Ironically, the 8042 of the PC/AT supported an alternate mode to use the PC Keyboard protocol, but due to the fact that for the BIOS, not detecting a PC/AT compatible Keyboard during POST was a fatal error, the only way to use this mode was to boot with a PC/AT Keyboard, use a command to switch the 8042 to PC Keyboard protocol mode, disconnect the PC/AT Keyboard, then plug in the PC Keyboard (Actually, hotplugging the Keyboard was not supported. You were not supposed to disconnect the Keyboard while the computer was on, and there was a real risk of Hardware damage since back then the Keyboard Port was not designed in a way that gracefully handled hotplugging). I have no idea why IBM left this feature half done, as selecting the Keyboard protocol could have been done via a Jumper on the Motherboard, given that the 8042 had many unused GPIO Pins. Later third party Keyboards had switchable modes to use either the PC or the PC/AT Keyboard protocol (Typically known as "XT mode" and "AT mode"), and could work on either platform. The PC/AT Keyboard protocol is still being used today, just that the Keyboard Port connector mutated to the smaller yet pin compatible Mini-DIN-6 format with the IBM PS/2 (This is the reason why AT-to-PS2 and PS2-to-AT passive adapters always worked so well).
The IBM PC/AT Computer Case is relevant enough to deserve its own paragraph, compared to those of the PC and PC/XT, which had pretty much nothing worthy of mention. The new Case introduced what IBM called the Control Panel, nowadays known as the Front Panel. There were two activity LEDs, one for computer Power and the other for HDD activity, and it also had a physical Kensington Keylock with a matching key. The Power LED and the Keylock were plugged via a cable into the same internal header array on the Motherboard. The Power LED doesn't seem to be connected to any circuitry that controls it; it just received power to indicate that the Motherboard was getting it from the Power Supply. The Keylock was wired to Pin P17 of the 8042 Microcontroller, which toggled the Keyboard Inhibitor function. When the Keyboard Inhibitor was turned on, the 8042 didn't listen to the Keyboard (And some OSes extended that to the Mouse), something that served as a primitive form of security against unauthorized users with physical access to the computer. Finally, the HDD LED wasn't plugged into the Motherboard itself; instead, its cable was plugged into a header on a new combo FDC/HDC expansion card that came with the PC/AT (I literally spent HOURS looking around for info about this one, and so far, none of the IBM documentation I checked that deals with disassembly and assembly of the PC/AT or of that particular expansion card seems to explicitly point out that the PC/AT Case HDD LED is plugged into the J6 header of that card. I do not know if I didn't check well enough or if IBM actually omitted to mention it explicitly). Also, IBM still didn't bother to implement a Reset Button.
As the previous Keyboards were unusable with the IBM PC/AT, IBM had to release a revised version of the Model F Keyboard that used the new Keyboard protocol. These revised Model F Keyboards were the first to implement the 3 status LEDs for Num Lock, Caps Lock and Scroll Lock that nowadays you see on almost all Keyboards. The PC/AT compatible Model F Keyboards used the same Intel 8048 Microcontroller as Keyboard Encoder as the previous Model F based units for the PC, but the contents of its internal ROM were different, as they had to implement the PC/AT Keyboard protocol and the control of the new status LEDs.
Beginning in 1985, IBM released the famous Model M Keyboards. These Keyboards used a Motorola 6805 Microcontroller, which had its own embedded CPU, RAM, ROM, Clock Generator and GPIO like its Intel counterparts, but was based on a different ISA. Model M Keyboards had a Ceramic Resonator that provided a 4 MHz reference clock for the 6805, which could run at either 1/2 or 1/4 of the clock input (Either 2 or 1 MHz), albeit I'm not sure how it was configured in the Model M. Due to the extreme number of Keyboard submodels, even after some googling it isn't clear to me whether all Model F Keyboards are Intel 8048 based and all Model M Keyboards are Motorola 6805 based. Regardless, what matters is that the Keyboard protocol could be implemented with either Microcontroller. Learning the implementation details is relevant just because you learn to respect the Keyboards more when you notice that they were far more complex than they initially appear to be. As a bonus, you can read here about the evolution of the Keyboard Layout, covering the original PC Model F, the PC/AT Model F, and the Model M.
As can be seen in the System Board Block Diagram in Page 1-6 of the IBM PC/AT Technical Reference, the Buses of the IBM PC/AT had an almost identical layout to those of the original IBM PC, being immediately recognizable as wider versions of them. These Buses still were the Local Bus, the I/O Channel Bus, the Memory Bus and the External I/O Channel Bus.
Local Bus: Like always, the Local Bus interconnected the main Processor with the support chips that could be directly interfaced with it. As the new Intel 80286 CPU had a demultiplexed Bus with entirely dedicated lines instead of multiplexed ones like in the 8086/8088 CPUs, only the new 286 specific support chips, namely the 82288 Bus Controller and the optional Intel 80287 FPU, could be directly wired to it. The Intel 8259A PIC that was present in the Local Bus of the IBM PC got kicked out because it only supported the multiplexed Bus of the previous Processors, leaving only three chips in the Local Bus instead of four. As the main Processor dictates the Bus width, the Local Bus was obviously extended for the 286 16 Bits Data Bus and 24 Bits Address Bus.
I/O Channel Bus: Being a direct extension of the Local Bus, the I/O Channel Bus was widened along with it. It was still separated from the Local Bus by some buffer chips, albeit compared to the IBM PC, their task should have been simplified, since demultiplexing the output and multiplexing the input was no longer required. The I/O Channel Bus did mostly the same thing that it did previously: It interconnected the Local Bus with the DRAM Memory Controller and the External I/O Channel Bus, and extended up to the expansion slots, where it was exposed for the expansion cards to use. Exposing the wider I/O Channel Bus required a new type of slot with more Pins to expose the new Bus lines, and to use them you obviously needed a new type of card.
The IBM PC/AT also moved the ROM memory from the External I/O Channel Bus into the main I/O Channel section. Since the ROM chips had just an 8 Bits wide Data Bus, IBM decided to do the same thing that it did for the RAM memory and organized the ROM chips as Banks (A Bank uses multiple chips accessed simultaneously in parallel to match the width of the Data Bus), making a 16 Bits ROM Bank using two 8 Bits ROM chips. You can check MinusZeroDegrees BIOS ROM chips - Diagram 2 to see how it was supposed to work.
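If you're wondering how two 8 Bits chips form a single 16 Bits Bank, it is just Byte interleaving: one chip stores the even addressed Bytes and the other the odd addressed ones, and both are selected at the same time so that together they put a whole Word on the Data Bus. The sketch below is a rough illustration of that split, and incidentally of why PC/AT Firmware dumps usually come as separate "even" and "odd" ROM images:

```c
/* Rough illustration of how a 16 Bits ROM Bank is built out of two 8 Bits
 * ROM chips: one chip holds the even addressed Bytes (low half of each
 * Word), the other the odd addressed ones, and both are read in parallel. */
#include <stddef.h>
#include <stdint.h>

void split_rom_image(const uint8_t *image, size_t size,
                     uint8_t *even_chip, uint8_t *odd_chip)
{
    for (size_t i = 0; i + 1 < size; i += 2) {
        even_chip[i / 2] = image[i];        /* Drives D0-D7 of the 16 Bits Bank */
        odd_chip[i / 2]  = image[i + 1];    /* Drives D8-D15 of the 16 Bits Bank */
    }
}
```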
A rarely mentioned detail is that the PC/AT actually had two ROM Banks: One had two 32 KiB ROM chips with the Firmware and the IBM Cassette BASIC, occupying the standard 64 KiB mapping for them (960 KiB to 1023 KiB) as defined by the PC/XT. The other Bank had two ROM sockets, known as U17 and U37, that were mapped (896 KiB to 959 KiB) and ready to use but otherwise unpopulated. Their usefulness was roughly the same as that of the empty U28 Socket in the IBM PC 5150: It was only an extremely obscure and niche feature that was rarely, if ever, used. As far as I know, it could be used for an optional Option ROM, and it was possible to insert some types of writeable ROMs and program them without the need of an external programmer (Albeit you had to open the Computer Case). I'm not sure if it was possible to install and use only a single ROM chip, but assuming it was possible, doing so would have required some type of parser Software to write or read it correctly. Either way, good luck trying to find someone that actually used these ROM sockets.
Memory Bus: The Memory Bus, which interconnected the RAM Banks with the DRAM Memory Controller present in the I/O Channel Bus, was also extended to match the 80286 CPU external Data Bus capabilities. IBM kept using DRAM chips with a 1-Bit external Data Bus, so in order to match the 16 Bits Data Bus of the 80286, the PC/AT had to use 16 DRAM chips per Bank. However, like in the PC and PC/XT, IBM implemented Parity for RAM memory error detection, which took two more DRAM chips. Thus, there were 18 1-Bit DRAM chips per RAM Bank (Twice that of the 9 DRAM chips per Bank in the PC and PC/XT), effectively making the Data Bus part of the Memory Bus 18 Bits wide. Additionally, while the Memory Controller was still made out of discrete logic, it was more complex than the PC and PC/XT one as it could refresh the DRAM by itself without needing to waste a DMA Channel, albeit it still used an 8254 PIT Counter and its OUT line.
Depending on the PC/AT Motherboard version (Type 1 or Type 2/3, respectively), there were either two 256 KiB RAM Banks (18 DRAM chips of 16 KiB per Bank), for a total of 512 KiB of usable RAM, or a single one that used DRAM chips of twice the capacity (32 KiB) to get the same total RAM size (Note that this wasn't a downgrade from the IBM PC/XT since the 256KB - 640KB Motherboard was released around two years after the PC/AT, so both the PC and PC/XT Motherboards of the era maxed out at 256 KiB RAM). The Motherboard RAM was mapped as Conventional Memory, as usual. Additionally, the Type 1 Motherboard had a Jumper, J18, that could be used to disable the mapping of the second RAM Bank, so that the Memory Controller only responded to addresses for the first 256 KiB RAM. This allowed you to use Memory expansion cards with RAM mapped in the 256 to 512 KiB address range, should for some reason using such a card be more convenient than populating the second RAM Bank's 18 DRAM chips.
If you wanted more RAM, you had to use Memory expansion cards. All the PC/AT Motherboards could use cards that mapped 128 KiB RAM into the 512 to 640 KiB range to max out the Conventional Memory. Something that I'm not entirely sure about is whether 8 Bits Memory expansion cards for the IBM PC worked in the PC/AT. I would suppose that they should, but accessing their RAM would come at a high performance penalty, so even if they worked, it would be highly undesirable to use them for Conventional Memory.
External I/O Channel Bus: The External I/O Channel Bus had quite major changes. To begin with, it was harder to interface old support chips of the MCS-85 family, intended for the 8085 CPU 8 Bits Data Bus, with the 80286. Whereas in the IBM PC the External I/O Channel Bus had the same Data Bus width as everything else, the 80286 had a wider 16 Bits Data Bus, which means the same issues that the 8086 had. As such, the glue logic that separated the External I/O Channel Bus from the main I/O Channel Bus also castrated its Data Bus width to 8 Bits. Additionally, the address decoding logic was vastly improved so that it now fully decoded the 16 Bits of the I/O Address Space instead of only 10 Bits.
As the IBM PC/AT made significant changes to the support chips, the External I/O Channel Bus also saw its denizens rearranged. Now you've got two 8237A DMACs, two 8259A PICs, the upgraded 8254 PIT, the MC146818 RTC, the 8042 Microcontroller, and some discrete logic that managed the now orphaned I/O Port of the PC Speaker. The ROM chips with the Firmware and IBM Cassette BASIC were moved to the main I/O Channel Bus.
The IBM PC/AT, as can be seen in the Clock Generation diagram of MinusZeroDegrees, had a much more elaborate Clock Generation scheme than that of the IBM PC and PC/XT. Whereas they derived all their clocks from a single reference clock source, the PC/AT had three Crystal Oscillators and three Clock Generators. A major difference when comparing the PC/AT with the Turbo XT platforms is that the clock speed of everything was fixed, as the PC/AT didn't have two or more selectable speed modes like the Turbo XTs did.
The first Clock Domain was used by almost everything in the system, and involved the Intel 82284 Clock Generator and a Crystal Oscillator whose Frequency depended on the PC/AT Motherboard version. Type 1/2 Motherboards had a 12 MHz Crystal Oscillator, whereas the later Type 3 Motherboards had a 16 MHz one. As such, derived Frequencies varied accordingly. The 82284 only provided two output clock lines, CLK and PCLK, instead of the three of the previous generation Intel 8284A CG.
82284 CLK: The CLK (System Clock) line was the most important one. Its clock speed was equal to that of the reference clock (Like the 8284A OSC line), which means either 12 or 16 MHz depending on the Motherboard version. It directly provided input clock for the 80286 CPU, the 80287 FPU and the 82288 Bus Controller. However, whereas the 8088 CPU ran at the Frequency of the input clock, the 80286 CPU and the 82288 Bus Controller internally halved it, so their effective clock speed was either 6 or 8 MHz. The 80287 FPU was a bit special since it could run either at the input clock line Frequency, or at 1/3 of it. In the PC/AT case, the 80287 FPU was hardwired to run at 1/3 of the input clock, so with a 12 or 16 MHz CLK line, it effectively ran at either 4 or 5.33 MHz, respectively.
The 82284 CLK line also passed through two discrete clock dividers that halved it, providing two separate 6 or 8 MHz lines: one that served as input for the Intel 8042 Microcontroller, which ran at the same clock speed as the input clock (Its built-in CG went unused), and a SYSCLK line that was used for almost everything else, including the I/O Channel Bus and also the expansion cards, as it was exposed in the I/O Channel Slots as the CLK Pin. Moreover, while not shown in the MinusZeroDegrees diagram, the SYSCLK line was used to derive yet another line for the two DMACs. That line passed through another discrete clock divider that halved it to provide either a 3 or 4 MHz input for the DMACs, making the I/O Channel DMA subsystem slower than the IBM PC one (This should have been the easiest way to deal with the fact that the Intel 8237A DMAC chip never had factory binned models that ran higher than 5 MHz, a problem that the Turbo XTs had to deal with by either binning the DMAC chips, or underclocking the system to 4.77 MHz whenever a BIOS Service wanted to use DMA).
82284 PCLK: The PCLK (Peripheral Clock) line was simply CLK/2. For reasons that I don't understand, it went completely unused; why IBM decided to use discrete clock dividers to derive SYSCLK instead of directly using the 82284 PCLK line is not something I know.
The second Clock Domain involved a secondary Intel 8284A Clock Generator and a 14.31 MHz Crystal Oscillator. As explained in the Turbo XT sections, some chips like the 8253/8254 PIT had to run at a specific Frequency so as to not screw up the timing of applications that relied on it. The PC/AT was among the first IBM PC compatible computers that had to deal with this issue, and IBM's solution, later adopted by everyone else, was to decouple the system wide clock into two clock domains. Since the 8284A CG is the same one used in the IBM PC, it also works in the same way, providing three derived clock lines as output: OSC, CLK and PCLK.
8284A OSC: The OSC (Oscillator) line passed through the reference clock input, so it ran at 14.31 MHz. As in the IBM PC platform, it wasn't used internally by the Motherboard but instead was exposed as the OSC Pin in the expansion slots, pretty much just for CGA Video Cards.
8284A CLK: The CLK (System Clock) line was OSC/3, thus 4.77 MHz like in the IBM PC. In the IBM PC/AT, this line went unused.
8284A PCLK: The PCLK (Peripheral Clock) line was CLK/2, thus 2.38 MHz. It was halved with the help of a discrete clock divider to provide the 1.19 MHz required by the 8254 PIT, as usual.
Finally, the third Clock Domain was extremely simple: There is a 32.768 KHz Crystal Oscillator that serves as input for a Motorola MC14069 Hex Inverter (It is not explicitly a Clock Generator. Since I have near zero electronics knowledge, I don't know what it is supposed to be or do). Its output is a 32.768 KHz line, used by the Motorola MC146818 RTC. This clock domain would also become a very common sight, as the 32.768 KHz for the RTC would become as ubiquitous as the 14.31 MHz OSC and the 1.19 MHz clock line for the PIT. Note that the MC146818 RTC could also work with 1.048576 MHz and 4.194304 MHz reference clock inputs, but I have no idea why 32.768 KHz was chosen, nor if there was any advantage in using the other ones.
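Since several divider chains are involved, it may help to see the arithmetic laid out in one place. The following plain C sketch (just math, no Hardware access; the variable names are mine) reproduces the derived Frequencies of the first two Clock Domains for both the 12 and 16 MHz Crystal Oscillators:

```c
#include <stdio.h>

/* Reproduces the PC/AT clock divider chains described above.
   Pure arithmetic, no Hardware access involved. */
static void derive(double crystal_mhz)
{
    double clk    = crystal_mhz;      /* 82284 CLK = reference clock      */
    double cpu    = clk / 2.0;        /* 80286 and 82288 halve CLK        */
    double fpu    = clk / 3.0;        /* 80287 hardwired to CLK/3         */
    double sysclk = clk / 2.0;        /* discrete divider, feeds the Bus  */
    double dma    = sysclk / 2.0;     /* another divider, feeds the DMACs */

    printf("Crystal %.2f MHz: CPU %.2f, FPU %.2f, SYSCLK %.2f, DMA %.2f MHz\n",
           crystal_mhz, cpu, fpu, sysclk, dma);
}

int main(void)
{
    derive(12.0);                     /* Type 1/2 Motherboards            */
    derive(16.0);                     /* Type 3 Motherboards              */

    /* Second Clock Domain: 14.31 MHz -> 4.77 MHz (CLK) -> 2.38 MHz (PCLK)
       -> 1.19 MHz for the 8254 PIT, same as in the original IBM PC.      */
    printf("PIT: %.3f MHz\n", 14.318 / 3.0 / 2.0 / 2.0);
    return 0;
}
```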
Regarding Wait States, the PC/AT still used asynchronous RAM, but the DRAM chips for it had to be rated for a 150 ns access time or faster. This included even the Type 3 Motherboard with the 80286 @ 8 MHz, so it seems that the initial 150 ns DRAM chips were overrated for 6 MHz operation. However, IBM also decided to introduce a Memory Wait State. I'm not sure why IBM did that, as the fact that the same 150 ns DRAM chips could keep up in the PC/AT 8 MHz version seems to point out that IBM was perhaps a bit too conservative with the RAM subsystem in the Type 1/2 Motherboards. Regardless of whether it was necessary or not, all the PC/AT Motherboards had a fixed 1 Memory WS. Measured in time, according to the IBM 5170 Technical Reference, the 286 @ 6 MHz had a clock cycle time of 167 ns with a Bus cycle of 500 ns, and the 286 @ 8 MHz had a clock cycle time of 125 ns with a Bus cycle of 375 ns. These Bus cycle values for Memory operations included 2 clock cycles, which were the fixed Bus cycle of the 80286, plus 1 Memory WS (For reference, the 8088 had a Bus cycle of 4 clock cycles. This also helped with the lower instruction execution latency of the 80286).
Meanwhile, the I/O subsystem of the IBM PC/AT got a bit more complex since it had to deal simultaneously with both the new 16 Bits expansion cards and the old 8 Bits ones. To maintain compatibility with the old cards, a lot of Wait States were added to operations that involved 8 Bits Devices. Moreover, the IBM 5170 Technical Reference at Page 1-7 mentions that the PC/AT further separates I/O operations to these 8 Bits Devices into two different types: 8 Bits operations to 8 Bits Devices, which had 4 I/O WS, and 16 Bits operations to 8 Bits Devices, which had 10 I/O WS (Somehow the page that mentions those values is missing 16 Bits operations to 16 Bits Devices...). This brutal amount of Wait States should have been added to try to be as compatible as possible with the older I/O Channel Cards for the original IBM PC, as those were designed expecting only a 1050 ns I/O Bus cycle time (8088 @ 4.77 MHz plus 1 I/O WS). Compatibility between these cards and the PC/ATs with the 286 @ 6 MHz should have been rather high, since for 8 Bits operations, the two 167 ns clock cycles of the 80286 Bus cycle plus 4 I/O WS equals a 1000 ns Bus cycle, which seems to be a small enough difference compared to the expected 1050 ns. For 16 Bits operations, 2 clock cycles plus 10 I/O WS equals 2000 ns, exactly twice that of 8 Bits operations. Something that I'm not sure about is how 16 Bits operations to 8 Bits Devices were supposed to work; perhaps there was a discrete buffer chip to split a single 16 Bits operation into two 8 Bits ones, which makes sense considering that the Bus cycle time of 16 Bits operations is twice that of 8 Bits ones.
A notorious problem is that the amount of I/O Wait States was still the same in the 8 MHz PC/AT version, which means that with a 125 ns clock cycle time, you get 750 ns and 1500 ns Bus cycle times for 8 Bits and 16 Bits operations, respectively. Basically, with the first version of the PC/AT, IBM tried to compensate with high I/O Wait States to maintain compatibility with the PC and PC/XT expansion cards, but by the time of the 8 MHz PC/AT models, it seems that IBM didn't care about these cards enough to readjust the amount of I/O Wait States. It was very likely that a first generation I/O Channel Card for the IBM PC worked in the 6 MHz PC/AT, but not as likely that it worked in the later 8 MHz one. The cause of this was the exact same one that caused card compatibility issues across different speeds of Turbo XT based systems: the closest thing to a standard was compatibility with the original IBM PC (Or now, with one of the two versions of the PC/AT), but there was nothing above that, nor a rating system that could tell an end user the shortest Bus cycle time that the card was designed to work reliably at.
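As a sanity check of the numbers mentioned above, every one of these Bus cycle times falls out of the same tiny formula: the 2 base clock cycles of the 80286 Bus cycle plus the Wait States, multiplied by the clock cycle time. A minimal sketch in plain C:

```c
#include <stdio.h>

/* Bus cycle time = (2 base clock cycles of the 80286 + Wait States)
   multiplied by the clock cycle time of the CPU clock. */
static double bus_cycle_ns(double cpu_mhz, int wait_states)
{
    double cycle_ns = 1000.0 / cpu_mhz;   /* 6 MHz -> ~167 ns, 8 MHz -> 125 ns */
    return (2 + wait_states) * cycle_ns;
}

int main(void)
{
    double speeds[2] = { 6.0, 8.0 };
    for (int i = 0; i < 2; i++) {
        double mhz = speeds[i];
        printf("286 @ %.0f MHz: Memory %.0f ns, 8 Bit I/O %.0f ns, 16 Bit I/O %.0f ns\n",
               mhz,
               bus_cycle_ns(mhz, 1),      /* 1 Memory WS                           */
               bus_cycle_ns(mhz, 4),      /* 4 I/O WS: 8 Bit op to 8 Bit Device    */
               bus_cycle_ns(mhz, 10));    /* 10 I/O WS: 16 Bit op to 8 Bit Device  */
    }
    return 0;
}
```

Running it reproduces the 500/1000/2000 ns figures for the 6 MHz PC/AT and the 375/750/1500 ns ones for the 8 MHz model.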
In the case of the new 16 Bits expansion cards, I'm not sure what the default amount of I/O WS was supposed to be since the IBM 5170 Technical Reference doesn't explicitly mention it (It should be a 16 Bits operation to a 16 Bits Device). However, completely related to that, the PC/AT repurposed one Pin from the I/O Channel Slots, B8, to use for a new line known as 0WS (Zero Wait States). Expansion cards intended for the PC/AT could use the 0WS line to completely bypass the I/O Wait States, albeit I have no idea if all the 16 Bits cards did so.
If you have good memory, you may remember that I mentioned that the PC/XT uses the B8 Pin for the CARD SLCTD (Card Selected) line used by the infamous Slot 8. As such, there was a risk that older 8 Bits cards hardwired for Slot 8 could make contact with the repurposed line if plugged into the PC/AT and misbehave as a result. Due to the fact that the B8 Pin was marked as reserved in the original IBM PC 5150, and that the Slot 8 weird behavior was never popular to begin with (The PC/AT was released just one year after the PC/XT, and whatever Slot 8's special purpose was supposed to be, it was already dropped), I'm not sure if there actually exist cards that could have issues when plugged into the PC/AT due to the different behavior of that Pin (The cards that supported Slot 8 typically did so as an alternate mode selectable via a Jumper). It could also be possible that there were 8 Bits cards intended for the PC/AT that misbehaved in the PC/XT because they were expecting the 0WS line, but this seems a rare occurrence since anyone making a card that could fit in either the original IBM PC or PC/XT should have thought about the possibility that that card could be used in those computers, too. I don't really know if there were demonstrable compatibility issues or edge cases caused by the repurposed Pin.
To expose the wider I/O Channel Bus (16 Bits Data Bus and 24 Bits Address Bus) plus the additional IRQs and DMA Channels, the IBM PC/AT required a new expansion slot type with more Pins for them, and obviously new cards that used those. The new 16 Bits I/O Channel Slot was physically backwards compatible with the previous 8 Bits slot from the IBM PC since it merely added to the existing slot a second section for the new Pins instead of redesigning the entire slot, so you could physically plug an IBM PC expansion card into the PC/AT. Doing it the other way around by plugging a new 16 Bits card into an 8 Bits slot while leaving the second section connector hanging was also possible assuming that there was no physical obstacle, albeit it only worked if the card supported an 8 Bits mode. Note that not all PC era cards worked properly in the PC/AT for the same reason that they didn't in Turbo XTs, some simply couldn't handle the higher Bus clock speeds. For IRQs and DMA Channels, the new 16 Bits I/O Channel Slot exposed all the ones available in the old 8 Bits slots, namely IRQs 2-7 (Note that IRQ2 was wired to IRQ9) and 8 Bits DMA Channels 1-3, plus the new additions of IRQs 10-12, 14-15, 8 Bits DMA Channel 0, and 16 Bits Channels 5-7. In total, the new slots exposed 11 IRQs, four 8 Bits DMA Channels, and three 16 Bits DMA Channels.
All the PC/AT Motherboards had 8 I/O Channel Slots for expansion cards, but they were of two different types: 6 were of the new 16 Bits type, and two of them were of the older 8 Bits type, physically the same as those of the PC and PC/XT. I find it very intriguing that if you see photos of the PC/AT Motherboards, the two shorter 8 Bits Slots have extra solder pads, as if at some point they were intended to be full fledged 16 Bits Slots. As far as I know, 8 Bits and 16 Bits Devices could be freely mixed as long as their MMIO address ranges were not mapped into the same 128 KiB address block (In the UMA, this means three 128 KiB blocks: 640 to 767 KiB, 768 to 895 KiB, and 896 to 1023 KiB. Each block had to be either all 8 Bits or all 16 Bits). I'm not entirely sure if this also means that it is possible to use older 8 Bits Conventional Memory expansion cards for the IBM PC in the 512 to 639 KiB range.
There was a huge variety of expansion cards for the IBM PC/AT, some of them made by IBM itself and eventually adopted by others. The most prominent one was the new 16 Bits multifunction FDC + HDC card, which used the ST-506 interface for HDs like the previous PC/XT HDC. I'm not entirely sure if this card came with all PC/ATs or only those that came with a HD, but the latter seems improbable because I couldn't find a FDC only card for the PC/AT. The FDC + HDC card supported up to two internal Diskette Drives and one internal Hard Drive, or one Diskette Drive and two Hard Drives. It didn't have a Port for external Diskette Drives like the IBM PC FDC, and since you couldn't use two FDC cards nor did it make sense to downgrade to the IBM PC FDC, this effectively capped the PC/AT platform to just two Diskette Drives.
Speaking of PC/ATs with no HD, those models included one of the most amusing hacks I've ever seen. Since the Power Supply required a certain load to turn on and a bare PC/AT without a HD wasn't enough for its expected minimum load, IBM decided to plug a 50W resistor into the computer. As far as I know, this should make the PC/AT models with no HD the first true space heater-computer hybrids, even before the Intel Pentium 4 Prescott and AMD Piledriver-based FX 9000 series!
The other major type of expansion cards were obviously the Video Cards. Initially, Video Cards included only MDA and CGA, as they were the only ones available at the release date of the PC/AT. Some months later, IBM released the EGA (Enhanced Graphics Adapter) and PGC (Professional Graphics Controller) Video Cards. EGA prevailed for a while before VGA superseded it a few years later, yet it left an important legacy due to its allocation in the Memory Map and the specialized type of Option ROM that it introduced, the VBIOS. IBM even developed an obscure tech demo known as Fantasy Land to pitch EGA. The PGC, as impressive as it was as a piece of Hardware at the time, left no legacy, thus is pretty much irrelevant.
The original IBM EGA Video Card came with 64 KiB RAM to use as framebuffer, and could be expanded to 256 KiB RAM using an optional daughterboard populated with a ton of DRAM chips (Amusingly, Fantasy Land required the EGA card to be fully geared). However, there was a major problem: A framebuffer of that size was far bigger than the 128 KiB range that IBM reserved in the UMA for such purpose. Extending the range to 256 KiB was impossible as it would leave pretty much no room for Option ROMs, and relying on the 286 16 MiB Physical Address Space would make EGA totally incompatible with Real Mode Software and the still relevant PC, so it wasn't a viable option, either. To access all of the EGA framebuffer, IBM had to resort to mapping it indirectly via Bank Switching, as getting those 256 KiB as straightforward MMIO was not possible.
Bank Switching is a means to indirectly map more memory than the Physical Address Space would allow if done directly, at the cost of not being able to access all the memory at the same time. It works by partitioning the full pool of memory that you want to map into chunks known as Banks, then reserving an address range to use as a Window, which is where the Processor would see them. With the help of a specialized Memory Controller, you can tell it to map at a given moment either a single Bank of the same size as the window or multiple smaller Banks (Depending on the implementation details, obviously), then tell it to switch which Banks are mapped when you want to access more memory. In the case of EGA, its 256 KiB framebuffer was partitioned into 16 Banks of 16 KiB each, while the entire 128 KiB block in the UMA reserved for video framebuffers (640 KiB to 767 KiB) was used as a MMIO window, thus the Processor was able to simultaneously see 8 Banks. By switching which of the EGA 16 Banks you wanted to see in the mapped window, you could indirectly access all of its 256 KiB framebuffer.
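Since Bank Switching will keep reappearing (Expanded Memory works the same way), here is a small plain C simulation of the EGA style scheme just described. The bank select function stands in for whatever I/O Port or register a real card would expose for this purpose, so treat it as illustrative only:

```c
#include <stdio.h>
#include <string.h>

#define BANK_SIZE    (16 * 1024)         /* 16 KiB per Bank                     */
#define TOTAL_BANKS  16                  /* 16 Banks = 256 KiB of card RAM      */
#define WINDOW_SLOTS 8                   /* 128 KiB UMA window = 8 Bank slots   */

static unsigned char card_ram[TOTAL_BANKS][BANK_SIZE]; /* RAM on the card       */
static int window_map[WINDOW_SLOTS];     /* which Bank is visible in each slot  */

/* Hypothetical "Bank select" latch: choose which Bank appears in a window slot */
static void select_bank(int slot, int bank) { window_map[slot] = bank; }

/* What the Processor sees at a given offset inside the 128 KiB window */
static unsigned char window_read(long offset)
{
    int slot = (int)(offset / BANK_SIZE);
    return card_ram[window_map[slot]][offset % BANK_SIZE];
}

int main(void)
{
    memset(card_ram[12], 0xAA, BANK_SIZE);   /* put a pattern in Bank 12         */

    select_bank(0, 12);                      /* map Bank 12 into the first slot  */
    printf("Window offset 0: %02X\n", window_read(0));  /* sees Bank 12: AA      */

    select_bank(0, 3);                       /* switch the slot to Bank 3        */
    printf("Window offset 0: %02X\n", window_read(0));  /* now sees Bank 3: 00   */
    return 0;
}
```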
If you have good memory, maybe you already figured out that the EGA framebuffer completely overlapped both the MDA and CGA ones. IBM considered that, as the original EGA Video Card also supported an alternate mode where it just used a 64 KiB window (640 KiB to 703 KiB) for 4 Banks so that it didn't overlap with the fixed location of the previous Video Cards framebuffers, a mode that should have been useful in case you wanted to use dual displays with EGA + MDA or EGA + CGA.
It is notable that the PC/AT had no built-in Firmware support for the EGA Video Card like it had for MDA and CGA; instead, the EGA card had its own 16 KiB Option ROM that the BIOS could load to initialize it. This Option ROM with the Video Card Firmware would become known as the VBIOS. It is quite important to mention that the VBIOS is a special type of Option ROM, as it is mapped to a specific location (The EGA VBIOS was mapped to the 768 KiB to 783 KiB range) that the BIOS would check very early in the POST process, as it was extremely useful to get the computer's main video output working as soon as possible so that the BIOS could display error codes on screen if something went wrong (For reference, the last BIOS version of the IBM PC 5150 checks if there is a VBIOS available several steps before any other Option ROM, actually, even before it fully tests the Conventional Memory). Pretty much every Video Card that had to be initialized as primary output during POST would have a VBIOS.
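For the curious, the way a BIOS recognizes an Option ROM like the VBIOS is by scanning the UMA for a small header: the 55h AAh signature, a size Byte in 512 Byte units, and a checksum of zero. The following is a minimal sketch of such a scan, assuming a 16 Bits real mode DOS compiler in the style of Turbo C; the exact ranges, step and order that each BIOS uses vary, so the values below are only illustrative:

```c
#include <stdio.h>
#include <dos.h>    /* MK_FP() and far pointers (Turbo C style) */

/* An Option ROM starts with the signature 0x55 0xAA, followed by its size
   in 512 Byte blocks, and the whole image must checksum to zero (mod 256).
   For simplicity this assumes the ROM image is smaller than 64 KiB.       */
static int valid_option_rom(unsigned int seg)
{
    unsigned char far *rom = (unsigned char far *)MK_FP(seg, 0);
    unsigned long size, i;
    unsigned char sum = 0;

    if (rom[0] != 0x55 || rom[1] != 0xAA)
        return 0;
    size = (unsigned long)rom[2] * 512;
    for (i = 0; i < size; i++)
        sum += rom[i];
    return sum == 0;
}

int main(void)
{
    unsigned long seg;
    /* Illustrative scan: video range first (segment C000h), then the rest
       of the UMA in 2 KiB steps. Real BIOSes hardcode their own ranges.   */
    for (seg = 0xC000; seg < 0xF000; seg += 0x0080)
        if (valid_option_rom((unsigned int)seg))
            printf("Option ROM found at segment %04lX\n", seg);
    return 0;
}
```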
Regarding the PGC, the card itself was really, really complex, and priced accordingly. The PGC was made out of three sandwiched PCBs that took two adjacent slots with PC/XT spacing (Actually, it couldn't fit in the original IBM PC because the slot spacing was different), had an Intel 8088 CPU all for itself, a ROM with Firmware, 320 KiB DRAM to use as framebuffer, and 2 KiB SRAM that was simultaneously mapped in the host UMA (792 KiB to 793 KiB) and in the PGC 8088 so that it could be used as a buffer to pass data between both the host CPU and the card CPU. It also had CGA emulation capabilities. To use the PGC's full capabilities you were expected to use a specialized IBM Monitor. Due to the extremely scarce info about the PGC it's hard to say anything else, albeit the few comments of users and programmers that had to deal with it seem to point out that it was extremely powerful at the time.
The final type of major expansion cards were the Memory expansion cards. Whereas with the PC and PC/XT the Memory expansion cards were rather simple, since they just had RAM that was going to be mapped into some address range inside the 640 KiB Conventional Memory, in the PC/AT, thanks to the 286 16 MiB Physical Address Space, RAM memory could also be mapped above 1024 KiB. The RAM mapped over the old 8088 1 MiB Physical Address Space boundary became known as Extended Memory, as would the cards that supported mapping RAM to ranges above 1024 KiB. Technically, both Conventional Memory and Extended Memory are system RAM, just that the former is mapped to a range that can be accessed in Real Mode and works exactly as expected by applications intended for an IBM PC with an 8088, while the latter requires dealing with all the issues described in the section explaining the 80286 Protected Mode (Or any other alternative like using the 64 KiB HMA, or LOADALL. All of them are reliant on proper management of the A20 Gate, too), so they are treated in two completely different ways.
Some Extended Memory cards had address decoding logic flexible enough to allow mapping part of the card RAM into the Conventional Memory range. This was known as backfilling. For example, a 2 MiB Extended Memory card could be inserted into a PC/AT Type 1 Motherboard with only 256 KiB RAM installed, then configured to backfill 384 KiB into the 256 to 640 KiB range so that you could max out the 640 KiB Conventional Memory, then the remaining 1664 KiB would be mapped into the 1024 KiB to 2687 KiB range as Extended Memory. Like always, the user had to be careful to make sure that there was no overlap if using multiple cards and that the Extended Memory mapping was contiguous, something that required dealing with Jumpers or DIP Switches on the cards themselves. It was not necessary to fully fill the 640 KiB of Conventional Memory to use the Extended Memory. Albeit it may be redundant to mention, due to the fact that addressing above 1024 KiB required more than the 20 Bits of Address lines exposed by an 8 Bits I/O Channel Slot, only 16 Bits I/O Channel Cards could provide Extended Memory, since these implemented the full 24 Bits of Address lines of the 80286.
Another type of Memory expansion cards that appeared on the market around 1986 were those that added Expanded Memory. Expanded Memory worked conceptually the same way as the already explained EGA Video Card: It reserved a 64 KiB window in the UMA, then used Bank Switching to indirectly map either one 64 KiB or four 16 KiB Banks into it. Expanded Memory cards became somewhat popular since they didn't rely on any of the PC/AT exclusive features (Including those of the 80286 CPU) and thus could work in an IBM PC or PC/XT for as long as the application supported them. The Expanded Memory cards required a specialized API to be used, and eventually the Extended Memory would get its own API, too, beginning the nightmare that was DOS Memory Management...
There isn't much to say about the functionality of the PC/AT Firmware itself; the BIOS did the same basic things that it used to do, and added a few more BIOS Services on top of the previous ones to support the new PC/AT platform Hardware changes. The most important change was in how the BIOS was configured. Compared to the PC and PC/XT, the PC/AT Motherboards had a dramatically reduced amount of configurable stuff that required physical interaction, with just a Jumper to set the main Video Card type, and, in Type 1 Motherboards only, a Jumper to select between 256 or 512 KiB RAM installed on the Motherboard. Nearly everything else became a Software configurable setting that the BIOS would read during POST. The PC/AT took advantage of the fact that the Motorola MC146818 RTC had 50 free Bytes of SRAM that, thanks to being battery backed, could be used as an NVRAM (Non Volatile RAM) to store the BIOS Settings. This also gave birth to the Clear CMOS procedure: If you wanted to force a reset of the BIOS Settings, you had to cut the battery power for the RTC, so that the SRAM would lose its contents (Including the Date and Time, which was the primary purpose of the RTC).
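For reference, that battery backed SRAM is reached through an index/data pair of I/O Ports, 70h and 71h, which is also how the RTC clock registers themselves are read. A minimal sketch, assuming a 16 Bits DOS compiler with Turbo C style outportb()/inportb(); which SRAM index holds which BIOS Setting is specific to each Firmware, so only the well known seconds register of the clock is read here:

```c
#include <stdio.h>
#include <dos.h>    /* outportb()/inportb(), Turbo C style */

#define CMOS_INDEX 0x70   /* write the register index here    */
#define CMOS_DATA  0x71   /* then read or write the data here */

static unsigned char cmos_read(unsigned char index)
{
    outportb(CMOS_INDEX, index);
    return inportb(CMOS_DATA);
}

int main(void)
{
    /* Registers 00h-09h belong to the clock itself; for example, 00h holds
       the seconds counter (in BCD by default). Everything past the RTC's
       own registers is the free SRAM that the PC/AT BIOS uses for its
       Settings, with a layout specific to each Firmware.                  */
    printf("RTC seconds register: %02X\n", cmos_read(0x00));
    return 0;
}
```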
The PC/AT also introduced the BIOS Setup so that you could change the BIOS Settings stored in the RTC SRAM, but it was not accessed in the way that you know it. Instead of being stored in the Motherboard ROM itself, the BIOS Setup was a traditional application that came in a bootable Diskette known as Diagnostics for the IBM Personal Computer AT, so you had to boot with it in order to change the BIOS Settings. As a last resort, it was possible to use the built-in IBM Cassette BASIC to edit the BIOS Settings, but you had to know exactly what you were doing. Some years later, third party Firmware vendors released custom BIOSes for the PC/AT delivered as a pair of ROM chips, so that you could replace the IBM ones containing the standard Firmware. These had a built-in BIOS Setup accessed via a Key Combination during POST, something that may sound familiar to you.
One of those amusing trivia details is that even 35 years ago, there used to be Hardware enthusiasts that liked to tinker with their Hardware. A popular mod for the PC/AT was to replace its 12 MHz Crystal Oscillator with a 16 MHz one (Like the one used by the Type 3 Motherboards), which typically worked well because the early PC/AT models were very conservatively clocked and most chips could take the overclock. IBM didn't like that power users were toying with its business Hardware, so to make sure that their PC/ATs were not getting overclocked, in a later PC/AT BIOS version, IBM added a speed loop test during POST that failed if the 286 was running above 6 MHz. The Type 3 Motherboard had another BIOS version that revised the speed loop test to support 8 MHz operation. If you were overclocking, avoiding these speed loops was a good reason to use a third party Firmware.
The PC/AT also included a new version of its main OS, PC DOS 3.0, which could use some of the new computer capabilities (Mainly the RTC for time keeping) and introduced the FAT16 File System. The convenience of having a built-in battery backed RTC was enormous, as in the PC and PC/XT, you had to type in the Date and Time every time you turned on the computer (Unless you had an expansion card with a RTC and OS Drivers for it. RTC cards were a popular addon for the original IBM PC), whereas in a PC/AT with PC DOS 3.0, the RTC worked out of the box. You could also use older versions of PC DOS, assuming you didn't mind their limitations.
Besides the three PC/AT Motherboard types and the multiple PC/AT computer models based on them, there was another computer that is usually considered part of the IBM PC/AT series. In 1986, IBM launched the IBM PC/XT 5162 Model 286, which has a misleading name since it is actually based on the PC/AT platform. While the computer was fully PC/AT compatible, it had both minor and major differences that are relevant to the PC/AT section.
The PC/XT 5162 Motherboard was completely new. It had 8 I/O Channel Slots, 5 of which were for 16 Bits cards and 3 for 8 Bits ones, a slight downgrade from the PC/AT (One of the 8 Bits slots had extra solder pads, as if it could have been a 16 Bits slot). It measured 8.5" x 13.2", or 22 cm x 33.8 cm, making it slightly smaller than the PC/AT Type 2/3 Motherboard. This size is highly important because it would eventually become the basis for the Baby AT Motherboard Form Factor, making the PC/XT 5162 quite relevant in the history of the PC platform evolution.
The clock generation scheme of the PC/XT 5162 was very similar to the Type 1/2 PC/AT Motherboards with the 286 @ 6 MHz, but there were two major differences: It had 0 Memory Wait States instead of 1, making it perform faster than these PC/ATs, albeit it was slower than the Type 3 with the 286 @ 8 MHz. It used 150 ns DRAM chips like all the PC/AT models, pretty much confirming that IBM was a bit too conservative with the 1 Memory WS in the original 6 MHz PC/AT.
The other difference involves the entire 80287 FPU clocking subsystem. The 287 FPU could be configured to run at either the reference clock input or 1/3 of it. In the PC/AT, it was hardwired to run at 1/3, whereas in the PC/XT 5162 it instead was hardwired to run at the clock input speed. However, it is not wired to the 82284 CG 12 MHz CLK line like it was in the PC/AT but to the secondary 8284A CG 4.77 MHz CLK line that in the PC/AT went unused, so 4.77 MHz is its effective speed. This also made the FPU run fully asynchronously from the CPU. I'm not sure why IBM made these changes at all.
The PC/XT 5162 Motherboard could max out the 640 KiB of Conventional Memory without requiring Memory expansion cards. However, it had a completely asymmetrical arrangement: It had two Banks, the first was 512 KiB in size and the second one a smaller 128 KiB. Filling the first Bank was mandatory, but the latter could be disabled via a Jumper. The 512 KiB RAM Bank wasn't made out of discrete DRAM chips on the Motherboard like all the previous computers, instead, it had two 30 Pin SIMM Slots, each fitted with a 9 Bits 256 KiB SIMM Memory Module. These SIMMs were made out of nine 1-Bit 32 KiB DRAM chips each and included Parity. The second 128 KiB RAM Bank was made out of DRAM chips socketed on the Motherboard, as usual. However, this Bank had an even more asymmetrical arrangement, as it used four 4-Bit 32 KiB DRAM chips plus two 1-Bit 32 KiB DRAM chips for Parity (The sum of that was 18 Bits for the Data Bus, as expected, but technically most of the RAM in the Parity chips should have gone unused since they were much bigger than what was required).
I'm not sure if the PC/XT 5162 was the first PC based computer that had its RAM installed in SIMM Slots. At first, SIMMs were usually seen in very high density Memory expansion cards only; it took a year or two before they were used on the Motherboards themselves. In the early days, SIMMs used the same DRAM chips that used to be socketed on the Motherboard itself, but in a much more convenient format, as they took far less Motherboard space than a ton of individual DRAM chip sockets, yet there wasn't any actual functional difference.
The two empty ROM sockets for the optional ROM Bank that the PC/AT Motherboards had aren't present on the PC/XT 5162 Motherboard. I'm not sure if the 64 KiB range in the PC/AT Memory Map reserved for them is free in the PC/XT 5162 Memory Map or not. At least one source I recall said that the Firmware and IBM Cassette BASIC are mirrored there (Perhaps unintentionally, caused by partial address decoding...), so that range may not be free at all.
The IBM PC/XT 5162 used an actual PC/XT 5160 Case, so it had no Control Panel (The two LEDs and the Keylock). I'm not sure if the Motherboard had a leftover header for the Power LED or if it was removed since it wasn't going to be used anyway. However, the PC/XT 5162 came by default with the same HDC + FDC multifunction card that was used by the IBM PC/ATs, so the header for the HDD LED should still be there, but unused.
During the lifetime of the IBM PC/AT, a major issue arose that multiple generations of users had to learn how to deal with: Everything related to DOS Memory Management. From the launch of the IBM PC/AT onwards, this topic would slowly become a convoluted mess, particularly during the 386 era, where its complexity skyrocketed due to new Processor features that added even more workarounds to deal with this problem. Knowledge of DOS Memory Management techniques would remain necessary even by the late 90's, as people still used DOS for applications and games that weren't Windows 9x friendly (Or simply due to better performance, since executing them from within a Windows environment added a not insignificant overhead). It took until Windows XP became the most common mainstream OS for this topic to stop being relevant, after which it would be almost completely forgotten.
The issues with 286 DOS Memory Management are directly related to the unusual requirements of the 80286 CPU to be able to use its entire Physical Address Space, combined with the PC/AT platform idiosyncrasies, like its Memory Map. As you already know, the 80286 CPU used in the IBM PC/AT had a 16 MiB Physical Memory Address Space, a significant upgrade over the 1 MiB one of the 8088 CPU used in the IBM PC. While the extra 15 MiB that the IBM PC/AT could address should have been good enough for several years, in order to use them as Intel intended when it designed the 80286, Software had to be running within Protected Mode, whose usage had a multitude of cons that were already detailed.
To recap the Protected Mode cons: first of all, Software that relied on it couldn't run on the IBM PC 8088 at all, significantly reducing the number of potential customers of such products unless the developer also provided a dedicated IBM PC port of its application, something that would require more development resources than just making a single Real Mode version that could easily run on both platforms. Moreover, the mainstream Software ecosystem made use of the DOS API from PC DOS to read and write to the FAT File System, and the BIOS Services from the BIOS Firmware to do the role of a Hardware Driver, both of which were usable only from within Real Mode. A developer that wanted to make a Protected Mode application without getting support from a Protected Mode environment would have had to reimplement absolutely everything from scratch, similar to a PC Booter for the IBM PC but worse, since those could at least rely on the BIOS Services to deal with the Hardware. Albeit it was still possible to make a Protected Mode DOS application that could use the DOS API and the BIOS Services by using the 286 reset hack to return to Real Mode, it was slow and cumbersome to do so (Keep in mind that while Hardware support for both Unreal Mode and resetting the 80286 CPU via a Triple Fault was present in the IBM PC/AT since Day One, the techniques to use them were not discovered or popularized until much later. Unreal Mode required knowledge about how to use LOADALL, something that only privileged developers had access to during the 80's, making it effectively unavailable. The 286 Triple Fault technique was discovered and patented early on by Microsoft, so even if public it was risky to use, yet chances are that most developers didn't know about it back then, either).
On top of the Physical Memory Address Space you have the Memory Map, which defines how that address space is intended to be assigned to try to cover all the stuff that has addressing needs, like RAM and ROM memories, the memory of other Devices to use as MMIO, etc. For backwards compatibility reasons, the IBM PC/AT Memory Map had to be built on top of the original IBM PC one.
To recap the basic IBM PC Memory Map, IBM defined it by subdividing the 8088 1 MiB Physical Address Space into two segments: A 640 KiB range between 0 to 639 KiB known as Conventional Memory, intended to be used exclusively for system RAM, and a 384 KiB range between 640 KiB to 1023 KiB known as UMA (Upper Memory Area), intended for everything else, including ROMs like the Motherboard BIOS Firmware, Option ROMs in expansion cards, and MMIO, like the Video Card framebuffer. For the IBM PC/AT Memory Map, in addition to keeping what was in the IBM PC one, IBM defined that the new address space above 1024 KiB and up to 15 MiB would be known as Extended Memory, intended to be used for more system RAM (The last 128 KiB were used to mirror the Motherboard ROM, and the remaining 896 KiB between the end of the Extended Memory and the beginning of the ROM mirror were left undefined. But these are not important). An issue caused by this arrangement is that the system RAM was no longer a single continuous chunk, since the presence of the UMA between the Conventional Memory and the Extended Memory left a memory hole, so Software that directly interacted with the physical memory had to be aware of that (This is something that Virtual Memory greatly simplifies, as it abstracts these details from user applications).
Some years after the IBM PC/AT release, the first 64 KiB of the Extended Memory (1024 KiB to 1087 KiB) would become known as the HMA (High Memory Area), as it was different from the rest of it since the HMA could be accessed from within Real Mode (Related to the 80286 not behaving like the 8088 with its Address Wraparound quirk, which the A20 Gate hack worked around), but otherwise it was simply a subset of the Extended Memory. So far, so good, as this covers the basic IBM PC/AT Memory Map.
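The arithmetic behind the HMA is simple enough to show directly. In Real Mode a Segment:Offset pair translates to Segment * 16 + Offset; an 8088 truncates the result to 20 Bits, while a 286 with the A20 Gate enabled keeps the extra Bit, which is precisely what exposes those almost 64 KiB above 1024 KiB. A quick plain C sketch of the difference:

```c
#include <stdio.h>

/* Real Mode address translation: linear = segment * 16 + offset */
static unsigned long linear(unsigned int seg, unsigned int off, int has_a20)
{
    unsigned long addr = (unsigned long)seg * 16 + off;
    if (!has_a20)
        addr &= 0xFFFFF;   /* 8088 style: only 20 Address lines, so it wraps */
    return addr;
}

int main(void)
{
    /* FFFF:0010 is the first Byte of the HMA */
    printf("8088 (or A20 masked): %05lX\n", linear(0xFFFF, 0x0010, 0)); /* 00000  */
    printf("286 with A20 enabled: %06lX\n", linear(0xFFFF, 0x0010, 1)); /* 100000 */
    return 0;
}
```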
Due to the complexity of actually putting the extra address space to use, for a few years after the IBM PC/AT release, only niche high end applications that really required more memory took care of all the mandatory hassle of using the RAM mapped into the Extended Memory address range; the vast majority of mainstream Software just kept using pure Real Mode and PC DOS as the lowest common denominators, both for full IBM PC compatibility and to ease development. This means that in a PC/AT, while mainstream applications could execute faster, they were still constrained by the 640 KiB RAM limit of the Conventional Memory, which would quickly become a severe limitation. I think that during the years that the IBM PC/AT was top of the line, the only popular mainstream DOS Software that used Extended Memory were RAMDisks, as the virtual disk contents could be stored in the Extended Memory with little risk that they would get trashed by PC DOS or another application, since these were mostly Real Mode only (Remember that there was effectively no Memory Protection). The RAMDisk Driver resided in Conventional Memory and took care of switching Processor modes to access the Extended Memory when requested to do so, then restored the Processor state.
Perhaps one of IBM's most notable acts of shortsightedness was precisely that it didn't take seriously enough the importance of pushing for a Protected Mode OS for the IBM PC/AT as soon as possible, as transitioning to a Protected Mode Software ecosystem early on could have saved us from the chaos that was DOS Memory Management during the 90's. The big irony is that just a few months after the IBM PC/AT release, a viable Protected Mode OS appeared in the form of IBM PC XENIX, a Xenix port for the IBM PC/AT that had FAT File System support, fully replacing any need for the DOS API, the BIOS Services, or Real Mode itself for any new applications that targeted it. However, since Xenix was aimed at serious enterprise users, it was priced far above PC DOS, which means that applications for Xenix would have an even lower potential customer base than an IBM PC/AT running PC DOS. In addition to that, Xenix itself also required more resources than PC DOS, so users that didn't make use of any of Xenix's advanced features would certainly have their applications running slower than on PC DOS due to the higher OS overhead, for no tangible benefit.
I believe that such a transition could have been done anyway, since during the middle of the 80's, IBM was powerful enough that it should have been possible for it to force the entire PC/AT Software ecosystem to adopt either Xenix or another new Protected Mode OS, even if it was at the cost of compatibility, like Microsoft did when it pushed Windows XP, which during the early days had a lot of compatibility issues with Windows 9x and DOS Software. As even at the 1984 release date of the IBM PC/AT you could partition a HD to install multiple OSes, power users could have survived the transition perhaps with no compatibility loss, as they would have been able to Dual Boot both PC DOS when requiring an IBM PC compatible environment and a Protected Mode OS for PC/AT applications. Sadly, IBM didn't make any serious attempt to transition the Software ecosystem to Protected Mode until 1987 with OS/2, but by then, it was already too late. With application memory requirements increasing yet no mainstream Protected Mode OS on the horizon to use the enhanced 80286 addressing capabilities, it seems logical to expect that such a situation created the need for stopgap measures that allowed applications to use more memory from within PC DOS. The problem lies in the fact that those stopgap measures lasted for far longer than expected, and directly increased PC DOS longevity to the detriment of better thought out alternatives that fixed the issue directly from its root...
In 1985, Lotus, Intel and Microsoft teamed up to create a hack that allowed for more memory to be used from within Real Mode, also done in a way that conveniently made it usable in the original IBM PC. This hack was known as Expanded Memory (Not to be confused with Extended Memory). It also had an associated API, EMS (Expanded Memory Specification).
Expanded Memory was physically implemented as a new type of Memory expansion cards that had a Memory Controller capable of doing Bank Switching, similar in nature to the one in EGA Video Cards. Initially, these cards had 256 KiB, 512 KiB or 1 MiB capacities. The way that Expanded Memory worked was by reserving an unused 64 KiB address range in the UMA to use as a window, then mapping to it a specific 64 KiB block of RAM, known as a Page Frame, from the Expanded Memory card. For example, a card with 512 KiB Expanded Memory would be partitioned into 8 Page Frames, each with 64 KiB RAM. By switching which Page Frame was visible in the UMA window at a given time, it was effectively possible to use more RAM, although at a notable complexity and overhead cost, since the application had to keep track of which Page Frame had which contents, then switch on demand between them. A later generation of Expanded Memory cards that came with a new version of the EMS specification allowed subdividing a compliant card's RAM into 16 KiB blocks, so that four of them could be mapped at a given time into the UMA window instead of only a single 64 KiB one. The reserved 64 KiB window for Expanded Memory located somewhere in the UMA would become a common sight in future PC and PC/AT Memory Maps, as it became ubiquitous enough.
Since in order to properly do Bank Switching the Memory Controller had to be managed, the Expanded Memory cards always required Drivers. To hide this from the application developers, an API was defined, the previously mentioned EMS, which allowed Software developers to rely on it to access the Expanded Memory instead of having to manually program the Memory Controllers themselves. This was quite important, as there were multiple Expanded Memory card manufacturers whose Hardware implementations were different, so using the EMS API provided a very convenient Hardware Abstraction Layer so that applications didn't have to include Drivers for each card.
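To give an idea of what using that API looked like from the application side, here is a minimal sketch of the basic LIM EMS call sequence over INT 67h (Get Page Frame Address, Allocate Pages, Map Handle Page, Deallocate Pages), assuming a 16 Bits DOS compiler in the style of Turbo C, with an EMM Driver already loaded and all error handling omitted:

```c
#include <stdio.h>
#include <dos.h>            /* int86(), MK_FP(), far pointers (Turbo C style) */

int main(void)
{
    union REGS r;
    unsigned int frame_seg, handle;
    unsigned char far *window;

    r.h.ah = 0x41;                      /* Get Page Frame Address              */
    int86(0x67, &r, &r);
    frame_seg = r.x.bx;                 /* segment of the 64 KiB UMA window    */

    r.h.ah = 0x43;                      /* Allocate Pages                      */
    r.x.bx = 4;                         /* four 16 KiB logical pages = 64 KiB  */
    int86(0x67, &r, &r);
    handle = r.x.dx;                    /* EMM handle for this allocation      */

    r.h.ah = 0x44;                      /* Map Handle Page                     */
    r.h.al = 0;                         /* into physical page 0 of the window  */
    r.x.bx = 0;                         /* logical page 0 of our handle        */
    r.x.dx = handle;
    int86(0x67, &r, &r);

    window = (unsigned char far *)MK_FP(frame_seg, 0);
    window[0] = 0x42;                   /* this Byte now lives on the card     */
    printf("Wrote to Expanded Memory at %04X:0000\n", frame_seg);

    r.h.ah = 0x45;                      /* Deallocate Pages                    */
    r.x.dx = handle;
    int86(0x67, &r, &r);
    return 0;
}
```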
Sometime around early 1988, Lotus, Intel and Microsoft teamed up again, now also along with AST, to develop another API: XMS (Extended Memory Specification). XMS was conceptually different from EMS since it didn't require being paired with new Hardware; instead, it focused on managing the existing Extended Memory via a new Driver known as the XMM (Extended Memory Manager).
What the XMS API did was merely to standardize how a Real Mode DOS application could use the Extended Memory, by delegating to the XMM Driver the responsibility of switching Processor modes, exchanging data between the Conventional Memory and the Extended Memory, then restoring the Processor state. Basically, an application that supported XMS just had to use its API to ask the XMM Driver to move data to or from the Extended Memory, then let it take care of everything else, significantly easing application development since the developer had no need to meddle with all the mandatory hassles required to access the Extended Memory from within a mostly Real Mode environment. Executable code was expected to remain in the Conventional Memory since the XMS API just moved RAM contents around, leaving the Extended Memory as a sort of secondary storage just for data, so the 640 KiB Conventional Memory limit was still important.
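As a taste of how an application found the XMM in the first place, detection went through the DOS multiplex interrupt, INT 2Fh. The sketch below, again assuming a Turbo C style 16 Bits DOS compiler, only performs the installation check and retrieves the entry point; the actual XMS functions (For example, 0Bh to move an Extended Memory Block) are then requested by far-calling that entry point with the function number in AH, which is omitted here since it requires passing arguments in registers:

```c
#include <stdio.h>
#include <dos.h>            /* int86x(), segread(), struct SREGS (Turbo C style) */

int main(void)
{
    union REGS r;
    struct SREGS s;

    segread(&s);                        /* start from the current segment registers */

    r.x.ax = 0x4300;                    /* XMS installation check                   */
    int86x(0x2F, &r, &r, &s);
    if (r.h.al != 0x80) {
        printf("No XMM Driver (such as HIMEM.SYS) is loaded\n");
        return 1;
    }

    r.x.ax = 0x4310;                    /* Get XMM entry point                      */
    int86x(0x2F, &r, &r, &s);
    /* The XMM control function lives at ES:BX; every XMS call is made by
       far-calling this address with the function number in AH.             */
    printf("XMM entry point at %04X:%04X\n", s.es, r.x.bx);
    return 0;
}
```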
The first XMM Driver was probably Microsoft's HIMEM.SYS, which also doubled as an A20 Handler. The first version of HIMEM.SYS that was distributed should have been the one included in Windows 2.1 (Windows/286), released around the middle of 1988. In 1991, a newer version of HIMEM.SYS was included in PC DOS/MS-DOS 5.0, so that XMS was usable in a pure DOS environment (Keep in mind that while it worked on the original 1984 IBM PC/AT, it isn't really era appropriate, as by then, these early 286s were rather ancient). A very interesting detail is that the Microsoft XMM Driver was much faster than it should have been under normal circumstances, and that is because Microsoft used all its knowledge about the Intel Processors' undocumented functions to cheat. Many versions of HIMEM.SYS relied on LOADALL and its Unreal Mode to access the Extended Memory, completely bypassing the standard way of doing things that included entering Protected Mode then resetting the Processor. Basically, any application that used XMS under DOS in a 286 based computer transparently benefited from the performance hacks involving Unreal Mode.
To the surprise of no one, clone manufacturers didn't rest on their laurels. Right after mastering how to clone the PC, they began to work on how to clone the PC/AT. The early PC/AT clones should either slightly predate or be contemporaries of the earliest Turbo XTs (Not PC-likes), and be both faster and more expensive than them. The PC/AT clones would almost immediately evolve into Turbo ATs (A term that is rarely used), which were faster than the system that they were supposed to be compatible with.
Turbo ATs would eventually further evolve with the introduction of the first Chipsets, as they would dramatically alter the platform topology. This also greatly reduced the variety of original Motherboard designs, as the Chipset based Motherboards tended to consolidate their feature set around a specific Chipset's characteristics, thus a vendor had to really go out of its way to do something different enough. Turbo XTs would eventually get Chipsets, too, but apparently slightly later.
The first PC/AT clone was the Kaypro 286i, announced at the end of February 1985 and available in March, a mere 7 months after the first IBM PC/AT release. It had some redeeming features over the IBM PC/AT, like supporting 640 KiB RAM mapped as Conventional Memory installed on the Motherboard itself, as in the later IBM PC/XT 5162, instead of just 512 KiB as in the PC/AT. Note that the 80286 @ 6 MHz was of a cheaper soldered variant instead of a socketed one, but otherwise performed the same. Other than that, it didn't do anything different enough to be remarkable.
In July 1985, 4 months after the Kaypro 286i and almost a year after the original IBM PC/AT, Compaq released its DeskPro 286. What made it interesting is that it had an Intel 80286 CPU that could run at two operating modes, one known as the Slow Mode that worked at 6 MHz for full compatibility with the original IBM PC/AT, and a Fast Mode that ran at 8 MHz (Note that it predates IBM's own 8 MHz PC/AT model). Unlike most Turbo XT systems, the DeskPro 286 didn't have a Turbo Button; instead, it relied on either a Key Combination, Ctrl + Alt + \, or the MODE SPEED command in MS-DOS. Since it explicitly mentions two different clock speeds, I suppose that it has two different Crystal Oscillators like Turbo XTs and switches between them on demand. As I'm not aware of any earlier system with such a feature, the DeskPro 286 may be the first PC/AT compatible that can be classified as a Turbo AT.
A major difference between Turbo XTs and Turbo ATs is what they were intended to be compatible with. As you already know, a lot of early PC Software was tuned expecting only an IBM PC with its 8088 CPU @ 4.77 MHz, thus they didn't work as intended in faster computers like Turbo XTs or PC/AT based ones. This exact issue also happened with some PC/AT Software, as at least a few applications were tuned specifically for the IBM PC/AT (Usually related to those pesky copy protection schemes...), failing to work properly in anything faster than it. However, when it comes to PC/AT Software, there was an additional hurdle: Which IBM PC/AT version an application was tuned for? Whereas both the PC and PC/XT had identical performance, there was an early 6 MHz PC/AT and a later 8 MHz model, plus the PC/XT 5162 that ran at 6 MHz but with 0 Memory WS. The performance of these three PC/AT platform variants was different. As all PC/AT Software released before 1986 would only be expecting the original 6 MHz IBM PC/AT model, it was possible that some of these very early applications would not work as intended even in IBM own PC/AT variants. Later PC/AT compatible manufacturers noticed this and implemented user configurable clock speed and Memory WS, so that their PC/AT clone computers could perfectly match any of the three IBM PC/AT variants performance levels, resulting in the Turbo ATs being at times more compatible with older PC/AT Software and expansion cards than the later IBM PC/AT 8 MHz models themselves, as IBM never bothered to implement a feature similar to Turbo.
An example of a 1987 Turbo AT that covered the full PC/AT compatibility spectrum is the AST Premium/286, which had an 80286 CPU that could run at either 6 MHz @ 1 Memory WS, 8 MHz @ 2 Memory WS (Albeit based on a page that lists the Motherboard Jumpers, it seems that it is actually 8 MHz @ 1 Memory WS, which makes sense as otherwise it wouldn't match the 8 MHz IBM PC/AT), or 10 MHz @ 0 Memory WS. Getting a 286 to run at 10 MHz @ 0 Memory WS was quite an accomplishment, and it made it slightly faster than another contemporary 286 PC/AT compatible system running at 12 MHz @ 1 Memory WS. However, it required new and expensive 100 ns DRAM chips mounted in a custom, very long Memory expansion card that used a proprietary connector, known as FASTram, that was pretty much a standard 16 Bits I/O Channel Slot followed by another separate section at the end of the card (This is roughly the same thing that VESA Local Bus Slots would do years later). The Motherboard had two slots with the FASTram extension, so you could use two of these custom Memory expansion cards. Finally, like the Compaq DeskPro 286, the AST Premium/286 didn't have a Turbo Button, it changed clock speed via a Key Combination, yet for convenience, the Case Front Panel had 3 LEDs to indicate which clock speed the Processor was currently running at (Changing Memory Wait States required dealing with Motherboard Jumpers). You can read the experiences and bragging rights of the AST Premium/286 Motherboard designer here, at The AST 286 Premium section (Worthy of mention is that it is the same guy that wrote the Assembly code of the original Phoenix BIOS).
Something that I don't remember hearing about is any PC/AT compatible that could go lower than 6 MHz to try to have better compatibility with early PC Software, as a 286 @ 3 MHz may have been not much above the PC 8088 @ 4.77 MHz performance level and thus could have been far more usable for things like games. Considering that, as explained elsewhere, perfect PC compatibility was impossible due to the 286 not being cycle accurate with the 8088, it makes sense that PC/AT compatible manufacturers didn't bother adding a way to slow down their newer flagship systems, since it was a lost cause anyway (Ironically, at least one 80386 based system, the Dell System 310, could be underclocked to 4.77 MHz. Perhaps Dell added such an option because, as you already know, the 4.77 MHz Frequency is easy to derive). Actually, compatibility with early PC Software would crumble rather fast after the PC/AT generation, as no one going for the high end segment of the PC/AT compatible market would even bother trying to support it any longer.
The good thing is that at the end of the PC/AT generation (More likely after the release of the 8 MHz IBM PC/AT), there was a paradigm shift in how Software developers took care of implementing any type of timers or timings in their applications. Instead of outright relying on speed loops or any other bad practices that worked at that moment but killed forward compatibility, Software developers began to be aware that they needed to be more careful so as to not make things that would be hardly usable on faster computers, as by that point it had already become quite obvious that the PC/AT platform was going to have faster versions of the same base platform, or successors that were going to be mostly backwards compatible. This paradigm shift is why late 80's Software is much less problematic on faster computers from the late 90's compared to much of the early Software that pretty much forced you to use a PC class computer. There are some notorious exceptions like the 1990 Origin Systems game Wing Commander, which could play faster than intended on some era accurate 80386 systems, depending on the Processor clock speed and external Cache configuration.
In general, timing issues and bad programming practices would still be present for a long time, but they took many years to manifest instead of doing so in the very next computer model, as happened during the early days. Perhaps the most famous timing issue of the 90's was the Windows Protection Error when trying to boot Windows 95 on AMD Processors over 350 MHz, as it was a pest for mainstream users that in many cases forced the upgrade to Windows 98. This bug was recently researched by OS/2 Museum, which noticed both divide by zero and division overflow bugs in a speed loop code that relied on the LOOP instruction, then came to the conclusion that the reason why AMD Processors were affected but Intel ones were not is because on an era accurate AMD K6-II, the LOOP instruction executed with a mere 1 clock cycle latency, whereas on an Intel Pentium Pro/II/III it took 6 clock cycles, thus it would trigger the bug at a much lower clock speed than otherwise (In much later Intel Processors that can finally match the K6-II speed on that particular routine, the bug is reproducible, too). Every now and then other ancient timing issues pop up when trying to run older pieces of Software on newer computers, like this early Linux SCSI Controller Driver, but none of these are as notorious as the Windows 95 one, nor is their cause as esoteric in nature.
Finally, while I'm not sure about the exact time frame (It could be as late as 1988), an extremely important feature that everyone considers a given nowadays began to appear in either PC/AT clones or Turbo ATs. Third party BIOS Firmware vendors like Phoenix Technologies and AMI decided to integrate the BIOS configuration application into the Firmware ROM image itself, giving birth to the modern BIOS Setup. With this critical tool integrated, if you ever wanted to change the BIOS Settings, you could use a Key Combination during POST to enter the BIOS Setup, without having to worry about preserving the Diskette that came with the system, which could be damaged or become lost. Some Firmware vendors even offered third party BIOSes for the original IBM PC/AT that included an integrated BIOS Setup, which were delivered as a pair of ROM chips to replace the standard IBM ones (These Firmware images should also work in 100% compatible PC/AT clones, too). At the time this integration was a minor convenience, but as the PC/AT compatible platforms began to drastically differ from one another and thus required a BIOS configuration tool appropriately customized for each computer or Motherboard revision, having the BIOS Setup integrated was becoming a necessity. Some time later, during the IBM PS/2 generation time frame, IBM would learn the hard way that having a multitude of different configuration Diskettes was a logistical nightmare...
A major breakthrough that revolutionized the computer industry was the introduction of the Chipset. Up to this point, it seems that the only one that had to deal with fierce competition was IBM. Whereas clone computer manufacturers had been trying to differentiate themselves from IBM by providing computers that were faster or had more features than IBM ones while remaining as close as possible to 100% compatible with them, most of the semiconductor industry had been happy enough merely being used as second sources that manufactured Intel designed chips (Having second sources was a requirement that IBM imposed on Intel. At least until Intel got big enough that it didn't need those second sources any longer, thus Intel tried to get rid of them, but that is part of another story...). So far, NEC was the only one that I'm aware of that early on successfully improved an Intel design with its V20/V30 CPUs.
Eventually, some smaller semiconductor designers wanted to create their own chips that could compete with Intel ones yet remain compatible with them. One of their ideas was to integrate as much as possible of the PC/AT platform functionality into a reduced set of specialized chips intended to be always used together, which is what in our modern days we know as a Chipset (Not to be confused with the original chip set meaning, which is very similar, just that the individual chips were far more generalist). The Chipset effectively began the trend of integration that culminated in much more affordable computers, as providing the same functionality of a multitude of discrete generalist chips with far fewer parts allowed for smaller, simpler, and thus cheaper Motherboards.
The Chipset era also began the consolidation of the PC/AT platform. Instead of having a bunch of general purpose support chips that in the IBM PC/AT behaved in a certain way due to how they were wired, you had a smaller set of chips intended to reproduce the generalist chips' behavior exactly as implemented in the IBM PC/AT itself, at the cost of leaving no room for further additions. For example, the Intel 8259A PIC supported being cascaded with up to eight other 8259As, one master and up to eight slaves interconnected together. The PC used only one PIC, and its successor, the PC/AT, used two, but there was nothing stopping a third party computer manufacturer from making its own superset of the PC/AT platform by using three or more and still being mostly backwards compatible. A Chipset that was intended to reproduce the IBM PC/AT functionality should behave in the same way as its two cascaded 8259As would, but it was impossible to wire a third standalone 8259A to a Chipset, because Chipsets weren't intended to be interfaced with more support chips. As Chipsets invaded the market, they pretty much cemented the limits of the PC/AT platform, since it was not possible to extend a highly integrated Chipset by adding individual support chips. Thus, the evolution of the PC platform as a whole became dominated by which features got into Chipsets and which did not.
The very first Chipset was the C&T (Chips & Technologies) CS8220, released in 1986, whose datasheet you can see here. The CS8220 Chipset was composed of 5 parts: The 82C201 System Controller (Which also had a better binned variant for 10 MHz Bus operation, the 82C201-10), the 82C202 RAM/ROM Decoder, and the 82C203, 82C204 and 82C205 specialized Buffer chips. These chips integrated most of the IBM PC/AT generalist glue chip functionality (Including the chips that made up the primitive Memory Controller, its Refresh Logic, and the A20 Gate) but few of the Intel support chips; from those, the CS8220 only replaced the two Clock Generators, namely the Intel 82284 and 8284A, and the Intel 82288 Bus Controller. Nonetheless, that was enough to allow the C&T CS8220 Chipset to provide almost all of the platform's different reference clocks, to interface directly with the Intel 80286 CPU and the Intel 80287 FPU, and to play the role of Memory Controller for the system RAM and ROM memories. While by itself the CS8220 didn't provide the full PC/AT platform functionality, it could do so if coupled with the standard set of Intel discrete support chips. So far, the CS8220 didn't add anything revolutionary from a feature or performance standpoint, yet the way that it simplified Motherboard designs allowed for easier implementations of the IBM PC/AT platform, making it the first step in bringing PC/AT compatible computers to the masses.
As can be seen in the first page of the CS8220 datasheet, there is a Block Diagram of a reference 286 platform based on it. The Block Diagram showcases four Buses: The Local Bus for the Processor, the Memory Bus for the Motherboard RAM and ROM memory, the System Bus for expansion cards, and the Peripheral Bus for support chips. The Block Diagram also makes clear that each chip had a very specific role, as the Data, Address and Control components of each Bus were each dealt with by a specialized chip (Actually, the Address Bus was divided into the lower 16 Bits and the upper 8 Bits, requiring two chips). Since the System Bus and the Peripheral Bus of the CS8220 effectively fulfilled almost the same roles as the I/O Channel Bus and the External I/O Channel Bus of the original IBM PC/AT, respectively, the Bus topology of both platforms is actually directly comparable, as the IBM PC/AT also had 4 Buses. When comparing both sets of Buses, it is rather easy to notice that the main difference is that in the CS8220, the Buses are much more clearly defined due to the Chipset acting as a formal separator that makes them look like individual entities instead of mere transparent extensions of the Local Bus that go through a multitude of glue chips.
Local Bus: In the Local Bus of a CS8220 based platform, you have the 286 CPU and the 287 FPU, along with the five Chipset components. The Processor no longer interfaces directly with any other system device; for everything but the FPU, it now always has to go through the Chipset, which centralizes everything.
Memory Bus: The Memory Bus still pretty much works in the same way as in the original IBM PC/AT, including being wider than the Processor Data Bus to include Parity support, just that now the Chipset includes specialized logic that can be formally called a Memory Controller instead of merely using generalist chips wired together to serve that role. However, a major difference is that now the Memory Controller is closer to the Processor, since there are far fewer glue chips that it has to go through to get to it, which is quite convenient, as shorter physical distances potentially allow for lower latencies and higher operating Frequencies. For comparison, if the Processor wanted to read or write to the system RAM located on the Motherboard, in the original IBM PC/AT the commands and data had to travel from CPU -> Local Bus -> Buffer Chips -> I/O Channel Bus -> Memory Controller -> Memory Bus -> DRAM Chips, whereas in a CS8220 based platform it had to do one hop less, CPU -> Local Bus -> Chipset/Memory Controller -> Memory Bus -> DRAM Chips.
System Bus: Albeit it is not obvious at first glance, the new System Bus saw perhaps the most significant changes when compared to its predecessor. The most visible difference is that whereas in the IBM PC/AT almost everything had to go through the I/O Channel Bus, in Chipset based platforms, the Chipset is the one that centralizes the Buses and takes care of interfacing them together, leaving the System Bus relegated to being just a specialized Bus for expansion cards. However, as a consequence of that, the System Bus is no longer a direct extension of the Local Bus; it is now a completely separate entity that can have its own protocol. What this means is that if a new x86 Processor type changed the Local Bus protocol (Which eventually happened a multitude of times), an appropriate Chipset could easily take care of interfacing both Buses by translating between the new and old protocols, so that it would still be possible to use I/O Channel Cards that were designed for the 8088 or 80286 Local Bus protocol in a much newer platform. This effectively began to pave the way for fully decoupling whatever Bus the expansion cards used from the one that the Processor used.
Peripheral Bus: Finally, the Peripheral Bus had the same duties as the external I/O Channel Bus, as both were intended for support chips, but like the previously described Memory Bus, the Peripheral Bus and the support chips on it are closer to the Processor than the external I/O Channel Bus was, due to fewer glue chips. Basically, whereas in the original IBM PC/AT communications from the Processor to a support chip like the PIC had to go from CPU -> Local Bus -> Buffer Chips -> I/O Channel Bus -> Buffer Chips -> External I/O Channel Bus -> PIC, in a CS8220 based platform they had one hop less, CPU -> Local Bus -> Chipset -> Peripheral Bus -> PIC.
What I'm not sure about is how the Chipset address decoding logic worked, as it is possible that a Chipset was hardwired to always map some address ranges to a specific Bus. For example, it could be possible that Memory Addresses under 640 KiB and above 1024 KiB had a hardwired mapping to the RAM attached to the Chipset Memory Bus, conflicting with older memory expansion cards now located in the System Bus that wanted to map their own RAM into the Conventional Memory or Extended Memory ranges. It may explain why the Memory expansion cards vanished so quickly, as they may not have been compatible with some Chipset based platforms (I'm aware that at least the Dell System 220 computer, equipped with the C&T CS8220 Chipset successor, the CS8221, claimed to have upgradeable RAM via an "AT-style memory card", so maybe Memory expansion cards did work. However, around the same time the SIMM format became popular, so it could have simply been that the convenience of populating the SIMM Slots on the Motherboard itself demolished DRAM sockets and Memory expansion cards in less than a generation...).
The CS8220 Chipset also simplified clock generation. As mentioned before, instead of requiring the Intel 82284 and 8284A Clock Generators as used in the original IBM PC/AT, the CS8220 had the 82C201 System Controller fulfilling the role of Clock Generator. The 82C201 had two reference clock inputs and six derived clock outputs. As inputs it used two Crystal Oscillators, one that provided a 14.31 MHz reference clock and another one that could be either 12, 16 or 20 MHz (20 MHz would require the better binned variant of the chip, the 82C201-10). The 14.31 MHz input was used as usual to derive two clock lines, the 14.31 MHz OSC line for expansion slots and the 1.19 MHz OSC/12 line for the external 8254 PIT. The other crystal was used to derive four clock lines that would supply the reference clocks for everything else in the system. These lines were PROCCLK (Processor Clock), SYSCLK (System Clock), PCLK (Peripheral Clock) and DMACLK (DMA Clock).
The most important reference clock was the PROCCLK line, which ran at the same Frequency as the Crystal Oscillator and supplied the reference clock of both the 286 CPU and 287 FPU. As you already know, the 286 CPU internally halves the input clock, and the 287 FPU typically runs at one third of the input (It can also run at the clock speed of the input clock, but as far as I know, only the IBM PC/XT 5162 used it that way), so assuming a 20 MHz crystal, the effective operating clock speeds would be 286 @ 10 MHz and 287 @ 6.66 MHz. The SYSCLK and PCLK lines ran at half the PROCCLK speed, with the difference between them being that SYSCLK was synchronized to the Processor clock cycles while PCLK seems to be asynchronous (I'm not sure about the practical difference). SYSCLK was used for the CLK line of the expansion slots, and PCLK was used as the input line for the support chips, except the DMACs and the RTC. Finally, the DMACLK line ran at half the SYSCLK speed (Effectively 1/4 PROCCLK) and was used solely by the pair of Intel 8237A DMACs. Assuming a 20 MHz crystal, the DMACs would be running @ 5 MHz, which is the clock speed that the highest rated Intel 8237A DMACs could officially run at.
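As a worked example of the clock tree just described, the following C sketch derives every line from a 20 MHz crystal; the divider values come straight from the text, while the variable names are only for illustration.

    #include <stdio.h>

    /* Worked example of the CS8220 clock tree described above. The dividers are the
     * ones mentioned in the text; the variable names are just for illustration. */
    int main(void) {
        double crystal = 20.0;              /* MHz, the 82C201-10 case            */
        double procclk = crystal;           /* PROCCLK = crystal frequency        */
        double sysclk  = procclk / 2.0;     /* SYSCLK = PROCCLK / 2               */
        double pclk    = procclk / 2.0;     /* PCLK = PROCCLK / 2, asynchronous   */
        double dmaclk  = sysclk / 2.0;      /* DMACLK = SYSCLK / 2 = PROCCLK / 4  */
        double cpu     = procclk / 2.0;     /* the 286 halves its input clock     */
        double fpu     = procclk / 3.0;     /* the 287 usually divides it by 3    */
        printf("PROCCLK %.2f SYSCLK %.2f PCLK %.2f DMACLK %.2f MHz\n",
               procclk, sysclk, pclk, dmaclk);
        printf("286 @ %.2f MHz, 287 @ %.2f MHz\n", cpu, fpu);
        return 0;
    }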
Missing from the CS8220 Chipset clock generation scheme is everything related to the RTC, which includes the 32.768 KHz Crystal Oscillator and its Clock Generator. These should still have been discrete parts.
Soon after the CS8220 Chipset, C&T launched the 82C206 IPC (Integrated Peripherals Controller). The 82C206 IPC combined the functions of the two Intel 8237A DMACs, the two 8259A PICs, the 8254 PIT, and the Motorola MC146818 RTC (Including the spare RTC SRAM) into a single chip. All these account for almost the entire heart of the PC/AT platform; the only missing part is the Intel 8042 Microcontroller. The 82C206 could be used either as a standalone chip or paired with a CS8220 platform, in both cases as a replacement for the mentioned discrete support chips. A curious thing about the 82C206 is that some Internet sources (Mostly Wikipedia) claim that it also integrates the functions of the Intel 82284 and 8284A Clock Generators and of the Intel 82288 Bus Controller, but those are nowhere to be found in the datasheet. In the CS8220 Chipset, these functions are performed by the 82C201 and 82C202 components, respectively.
From the clock scheme perspective, the 82C206 had two reference clock inputs: one that had to be half the Processor input clock line (If paired with the CS8220 Chipset, it should be the PCLK line), and another clock line with a 32.768 KHz Frequency for the internal RTC (Albeit the datasheet mentions that it supported two other input values, that is the most typical one). The integrated DMACs could run at either the input clock speed or internally halve it, removing the need for a third clock input running at half its speed, which is what the CS8220 DMACLK line was for.
So far, the C&T CS8220 Chipset and the 82C206 IPC fit perfectly the definition of what would later be known as a Northbridge and a Southbridge. The Northbridge took care of all the logic required to interface completely different chips and Buses together, and the Southbridge provided functionality equivalent to the PC/AT support chips. It is amusing that in the very first Chipset generation the Northbridge and Southbridge were actually separate, independent products, but in the next one, they would be part of the same product.
C&T eventually released an improved version of the CS8220 Chipset, the CS8221, also known as the famous NEAT Chipset. The CS8221 managed to integrate the functions of the five chips from the previous Chipset into only three parts: The 82C211 Bus Controller, the 82C212 Memory Controller and the 82C215 Data/Address Buffer. It also added the previously described 82C206 IPC as an official fourth chip, merging Northbridge and Southbridge as parts of the same Chipset product.
The CS8221 was a late 286 era Chipset. By that time period, Intel was already selling 286s binned to run at 16 MHz, while other manufacturers put some more effort into getting them to 20, or even 25 MHz. The previous CS8220 could run a 286 @ 10 MHz if using the better binned variant, but even if 286s could be clocked higher, 10 MHz was pretty much the upper limit of what the expansion cards sitting in the System Bus and the support chips could tolerate (The slower, asynchronous RAM was dealt with by using Wait States), a limit also shared by 8088 Turbo XT platforms. Among the multiple major features of the CS8221, the most notable one was that the System Bus could be configured as a fully independent asynchronous clock domain that was not bound to the clock speed of the Local Bus. This is the very reason why it could run a 286 at clock speeds that easily broke the 10 MHz barrier, since clocking the Processor higher no longer clocked almost everything else higher. While the CS8221 datasheet claims that it supports either 12 or 16 MHz 286s, I'm aware that some Motherboards used it with 20 MHz ones, too.
The CS8221 Chipset had two Clock Generators. The less interesting one was in the 82C212 Memory Controller, which used as input the already too many times mentioned 14.31 MHz crystal to derive the OSC and OSC/12 lines from. The main Clock Generator was in the 82C211 Bus Controller, which could use as input either one or two Crystal Oscillators for synchronous (Single CLK2IN input) or asynchronous (CLK2IN and ATCLK inputs) operating modes. In total, the 82C211 supported five clock deriving schemes, three synchronous and two asynchronous, giving some degree of configuration flexibility according to the crystals used. Amusingly, the 82C211 had only two output lines, PROCCLK and SYSCLK, which provided the reference clocks for everything else in the system. As the 82C206 IPC could internally halve the clock for the DMACs, there was no need for another clock line at all.
At the time that the CS8221 was being used in commercial Motherboards, it seems that there were two standard scenarios regarding how to clock the platform. Getting a 286 @ 16 MHz could be easily achieved by relying on just a single 32 MHz Crystal Oscillator wired to the CLK2IN input, as in one of the Clock Generator synchronous modes it could be used to derive a 32 MHz PROCCLK (Same as CLK2IN) and an 8 MHz SYSCLK (CLK2IN/4). Basically, the Processor clock ran synchronous with the rest of the system, but at twice its speed. The other use case is far more interesting, as it involves a higher clocked Processor, a 286 @ 20 MHz. Using a 40 MHz crystal to derive the reference clocks for the entire system wasn't a good idea, because as in the previous setup, it would also mean that the System Bus would run @ 10 MHz, which was borderline (Albeit still within the realm of the possible, as the previous CS8220 Chipset had a better binned version that could do so). By running the Clock Generator in asynchronous mode with a companion 16 MHz crystal wired to the ATCLK input, it was possible to have a 40 MHz PROCCLK (Same as CLK2IN) with an 8 MHz SYSCLK (ATCLK/2). This seems to be the way that Turbo AT manufacturers got their high speed 286s running, like the Dell System 220 and the GenTech 286/20, both of which had a 286 @ 20 MHz with an 8 MHz Bus using a C&T Chipset. This 286 era Hardware is quite representative of how the platform topology would look for the next 20 years.
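The following C sketch just replays the arithmetic of those two scenarios, with the CLK2IN/4 and ATCLK/2 dividers taken from the text; the helper function and its parameters are purely illustrative.

    #include <stdio.h>

    /* Worked example of the two CS8221 clocking scenarios described above. The dividers
     * (CLK2IN/4 for the synchronous case, ATCLK/2 for the asynchronous one) are the ones
     * mentioned in the text; the helper function is just for illustration. */
    static void plan(double clk2in_mhz, double atclk_mhz) {
        double procclk = clk2in_mhz;       /* PROCCLK = CLK2IN in both cases            */
        if (atclk_mhz > 0.0)               /* asynchronous mode, second crystal present */
            printf("CLK2IN %.0f MHz + ATCLK %.0f MHz -> 286 @ %.0f MHz, System Bus @ %.0f MHz (async)\n",
                   clk2in_mhz, atclk_mhz, procclk / 2.0, atclk_mhz / 2.0);
        else                               /* synchronous mode, single crystal          */
            printf("CLK2IN %.0f MHz -> 286 @ %.0f MHz, System Bus @ %.0f MHz (sync, CLK2IN/4)\n",
                   clk2in_mhz, procclk / 2.0, clk2in_mhz / 4.0);
    }

    int main(void) {
        plan(32.0, 0.0);    /* single 32 MHz crystal: 286 @ 16 MHz, Bus @ 8 MHz */
        plan(40.0, 16.0);   /* 40 MHz plus 16 MHz:    286 @ 20 MHz, Bus @ 8 MHz */
        return 0;
    }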
The other major features introduced by the CS8221 Chipset involved its advanced Memory Controller. The 82C212 Memory Controller supported up to 4 Banks with Parity, as usual, but it had extensive dynamic mapping capabilities that allowed the RAM managed by it to be used to emulate Expanded Memory (EMS), and also allowed for the introduction of a new type of memory known as Shadow RAM. These features by themselves wouldn't be as important if it wasn't because, at the same time that Motherboards with this Chipset were being designed, the SIMM Memory Module format became prominent. With SIMMs, it was possible to save a ton of physical space on the Motherboard compared to the old way of having sockets for individual DRAM chips, and that space could then be used for more SIMM Slots so that you could install even more RAM on the Motherboard itself. All that RAM was directly managed by the Chipset Memory Controller as a single unified RAM pool that could be partitioned and mapped as the user wanted, removing any need for Memory expansion cards.
In order to understand in retrospect how amazing the capabilities of the CS8221 Memory Controller were, you first have to consider how much RAM a PC/AT of the immediately previous generation had, and the Hardware required to actually install that much RAM. As an example, suppose that a well geared 1988 PC/AT computer had at the bare minimum the full 640 KiB of Conventional Memory. As not all computers supported having that much installed on the Motherboard itself (This was the case with the IBM PC/AT, which supported only 512 KiB RAM on the Motherboard), chances are that to get to 640 KiB, the computer needed a Memory expansion card. I suppose that it was preferable to pick an Extended Memory card that could also backfill the Conventional Memory instead of wasting an entire expansion slot for just 128 KiB RAM or so. However, since most applications were unable to use Extended Memory due to the 80286 CPU idiosyncrasies, for general purposes only the first 64 KiB of Extended Memory for the HMA actually mattered (Albeit before 1991, which is when PC DOS 5.0 introduced HIMEM.SYS in a pure DOS environment, I think that only Windows/286 or later could make use of it). Meanwhile, the applications that were memory heavy relied on EMS, which means that you also required an Expanded Memory card (There were also pure Software emulators that could use Extended Memory to emulate Expanded Memory, but I suppose that these were very slow and used only as a last resort. I don't know how Expanded Memory emulators were supposed to work on a 286; those are functionally different from the better known 386 ones). Thus, a well geared PC/AT would probably have two Memory expansion cards, a 512 KiB or so one for Extended Memory that also backfilled the Conventional Memory, and a 512 KiB or 1 MiB Expanded Memory one. The CS8221 Chipset along with SIMM Memory Modules would dramatically change that...
A typical Motherboard based on the CS8221 Chipset had 8 SIMM Slots. SIMMs had to be installed in identical pairs to fill a Bank (Two 9-Bit SIMMs for an 18-Bit Bank). With SIMM capacities being either 256 KiB or 1 MiB, a computer could have from 512 KiB up to 8 MiB installed on the Motherboard itself, which at the time was a massive amount. The magic of the Memory Controller relied on its mapping flexibility, which could be conveniently managed via Software. Basically, you could install a single RAM memory pool on the Motherboard via SIMMs without having to touch a single Jumper, then set in the BIOS Setup how you wanted to map it. For example, with 2 MiB installed (Eight 256 KiB SIMMs), you could fill the 640 KiB of Conventional Memory (I'm not sure if mapping less than that was possible on the CS8221. The original IBM PC/AT didn't require maxing out Conventional Memory to use an Extended Memory expansion card), then choose how much of the remaining 1408 KiB would be mapped as Extended Memory or used for Expanded Memory emulation. If you wanted, you could tell the BIOS Setup to use 1024 KiB for Expanded Memory, then leave 384 KiB for Extended Memory. In summary, the Chipset Memory Controller took care of all the remapping duties so that your system RAM was where you wanted it to be, and all this was possible without the need for specialized Hardware like the previous Memory expansion cards, nor having to pay the performance overhead of Software emulation. A trivial detail is that the Memory Controller required an EMS Driver for the Expanded Memory to work, something that should make this Chipset maybe the first one to require its own custom Driver instead of relying on generic PC or PC/AT 100% compatible Firmware and OS support.
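As a purely illustrative piece of arithmetic (not a real Chipset programming interface), this C sketch reproduces the mapping choice from the example above: 2 MiB installed, Conventional Memory filled first, and the remainder split between Expanded and Extended Memory.

    #include <stdio.h>

    /* Worked example of the CS8221 era mapping choice described above, using the same
     * numbers as the text. Purely illustrative arithmetic, not a Chipset interface. */
    int main(void) {
        unsigned installed_kib    = 8 * 256;                          /* 2048 KiB total      */
        unsigned conventional_kib = 640;                              /* always filled first */
        unsigned remaining_kib    = installed_kib - conventional_kib; /* 1408 KiB            */

        unsigned expanded_kib = 1024;                                 /* the user's EMS pick */
        unsigned extended_kib = remaining_kib - expanded_kib;         /* 384 KiB left        */

        printf("Conventional %u KiB, Expanded %u KiB, Extended %u KiB\n",
               conventional_kib, expanded_kib, extended_kib);
        return 0;
    }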
The other Memory Controller feature was Shadow RAM. By the time of the CS8221 Chipset, RAM memory was significantly faster than ROM chips. The PC/AT platform had several ROMs that were read very often, like the BIOS due to the BIOS Services, and the Video Card VBIOS due to its own routines. Shadow RAM consisted of copying the contents of these ROMs into RAM memory right after POST, then telling the Memory Controller to map that RAM into the same fixed, known address ranges where these ROMs were expected to be. Thanks to this procedure, the ROMs were read only once to load them into RAM, then applications would transparently read their contents from it, which was faster. This resulted in a significant performance boost for things that called the BIOS Services or the VBIOS often enough. After copying the ROM contents to the Shadow RAM, it was typically write protected, both for safety reasons and to reproduce ROM behavior, as it was impossible to directly write to a ROM anyway. However, write protecting the Shadow RAM was not mandatory, so I suppose that either due to an oversight or maybe intentionally, someone could have left it writable so that live patches could be applied to things like the BIOS or VBIOS code. I wonder if someone ever had fun doing that?
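A minimal simulation of that procedure is sketched below in C. The arrays merely stand in for the ROM and the RAM that the Memory Controller remaps over it, and the write protection is represented by a plain flag, since the real programming interface is Chipset specific; the VBIOS range used here is the one mentioned in the next paragraph.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Minimal simulation of the Shadow RAM procedure described above. The arrays stand
     * in for the physical ROM and the RAM that the Memory Controller remaps over it;
     * the register that performs the remap and write protection is represented only by
     * a flag, since the real programming interface is Chipset specific. */
    #define VBIOS_BASE 0xC0000u
    #define VBIOS_SIZE (32u * 1024u)

    static uint8_t rom_chip[VBIOS_SIZE];      /* slow ROM, as wired on the card        */
    static uint8_t shadow_ram[VBIOS_SIZE];    /* fast DRAM set aside by the Chipset    */
    static int shadow_write_protected = 0;    /* stands in for a Chipset register bit  */

    int main(void) {
        memset(rom_chip, 0x90, sizeof rom_chip);     /* pretend VBIOS contents          */

        /* 1. Right after POST, the Firmware copies the ROM contents into the RAM that
         *    will be mapped over the same address range (768 KiB to 799 KiB here).    */
        memcpy(shadow_ram, rom_chip, sizeof shadow_ram);

        /* 2. The Chipset is told to map that RAM at VBIOS_BASE and, typically, to
         *    write protect it so that it keeps behaving like a ROM.                   */
        shadow_write_protected = 1;

        printf("Shadowed %u KiB at %05Xh, write protected: %d\n",
               VBIOS_SIZE / 1024u, VBIOS_BASE, shadow_write_protected);
        return 0;
    }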
What can be shadowed is extremely dependent on the Chipset capabilities. In the case of the CS8221 Chipset, it seems to be able to shadow the entire 384 KiB of the UMA in individually configurable chunks of 16 KiB (This is what the datasheet says that the Chipset supports; the BIOS developers of a particular Motherboard could have skimped on exposing the full settings and just left the ones that enable shadowing in the ranges that they thought mattered the most). However, shadowing the entire UMA was rather pointless because there are things that shouldn't be shadowed to begin with, like the Video Card framebuffer (Which, depending on the Video Card type, could be as much as 128 KiB), which is already RAM, or the 64 KiB window for the Expanded Memory, which is also RAM. Typically, the two most important ranges to shadow were the BIOS (960 KiB to 1023 KiB) and the VBIOS (768 KiB to 799 KiB), which means that in total, 96 KiB of RAM had to be set aside for general shadowing purposes if you enabled them both. I suppose that Option ROMs in expansion cards were also worth shadowing, for as long as you knew at which addresses they were located so as to not waste RAM shadowing nothing. Finally, shadowing nothing actually served a purpose, since doing so was still effectively mapping usable free RAM into otherwise unmapped UMA address ranges, something that in previous generations would have required a specialized Memory expansion card, as regular users didn't map RAM into the UMA. That unused mapped RAM would eventually become useful for UMBs (Upper Memory Blocks). However, UMBs pretty much belong to the 386 era Memory Management section, since they aren't really era appropriate for a 286, and their availability on 286 platforms was extremely dependent on the Chipset mapping or shadowing capabilities.
Later developments from C&T for the 286 include the 82C235 Chipset released during 1990, also known as SCAT, whose reference platform Block Diagram can be seen in Page 11 here. The C&T 82C235 integrated almost all the previously mentioned things into a single chip, the most notable exception still being the Intel 8042 Microcontroller. It is rather ironic if you consider how almost everything had been consolidated into a single chip, then everything would get bloated again in the Intel 80486 and P5 Pentium generations before repeating the cycle of consolidation once more. By the time that the 82C235 Chipset was relevant, computers based on the 80386 CPU were already available in the mass market and next in line to become mainstream, pushing the 286 based Turbo ATs into the budget segment, while the original PC platform based ones like the Turbo XTs were extremely close to obsolescence.
For some reason that I don't understand, the 82C235 had a lower Frequency ceiling than the previous CS8221 Chipset, since it seems that it supported only 12.5 MHz 286s instead of up to 16 MHz, and there is no mention of Bus speeds at all (I'm a bit curious about all the "12.5 MHz" 286s found in some datasheets and PC magazines of the era, since the Processor bins themselves seem to always have been the standard 12 MHz ones. It is even weirder since the CS8221 Chipset clock generation scheme was as simple as it could get if using synchronous mode, so there was no reason to change anything. Maybe it was a small factory overclock that computer manufacturers could get away with?). There is also the legend of the 25 MHz 286 from Harris, whose platform requirements I never bothered to check in detail, like which Chipset supported it and what the preferred clock generation method was.
Even though C&T appeared to be quite important during the late 80's and early 90's, it would eventually be purchased by Intel in 1997, and its legacy would fade into obscurity...
In September 1986, Compaq, one of the most well known manufacturers and among the first to release a virtually 100% IBM PC compatible computer, launched the DeskPro 386. The launch was significant enough to cause a commotion in the industry, as it was the first time that a clone manufacturer directly challenged IBM's leadership. Until then, IBM was the first to use and standardize the significant platform improvements, with clone manufacturers closely following the trend that IBM set before eventually attempting to do it better, faster or cheaper. That was the case with the original IBM PC and PC/AT: the clone manufacturers would begin with the same parts as IBM, then eventually deploy higher clocked ones in Turbo XTs and Turbo ATs. A similar thing happened with the EGA Video Card: IBM designed the card, some semiconductor designers like C&T integrated it in a Chipset-like fashion, then manufacturers began to use those chips to make cheaper EGA compatible Video Cards. This time it was totally different, as Compaq beat IBM to releasing a PC/AT compatible computer with the latest and greatest Intel x86 Processor, the 80386 CPU. The consequences of this would be catastrophic for IBM, as it would begin to lose control of its own platform.
The Compaq DeskPro 386 was the first PC/AT compatible system to make use of the new Intel 80386 CPU, placing Compaq ahead of IBM. As IBM didn't really plan to use the 80386 CPU in any of its PC compatible computers since it was still milking the PC/AT, it was the DeskPro 386 launch that forced IBM to compete to maintain its spot at the top, which it did when it launched a new platform, the IBM PS/2, in April 1987. Sadly for IBM, the DeskPro 386's half year lead in the market gave it enormous momentum, since other PC/AT compatible manufacturers began to follow Compaq and pretty much made clones of the DeskPro 386. Besides, the IBM PS/2 was heavily proprietary in nature, whereas the PC/AT was an open architecture, which gave PC compatible vendors even more incentive to go with Compaq's approach, helping the DeskPro 386 to become a de facto standard. As such, the DeskPro 386 has an enormous historical importance, as modern PCs are considered direct descendants of it instead of IBM's next platform, the PS/2.
The original 80386 CPU, released in October 1985, is perhaps the most important Processor in all the 35 years of evolution of the x86 ISA, as its feature set tackled everything that mattered at the best possible moment. The 386 introduced almost everything that would later become the backbone of the modern x86 architecture, with the 386 ISA remaining the baseline for late era DOS Software, and going as far as Windows 95 (Even if by then the performance of a 386 was far from enough to be usable, it could still boot it). This happened mostly thanks to Intel finally learning that backwards compatibility was important, so many of the 386 features were introduced to solve the shortcomings of the 286.
To begin with, the 80386 was a 32 Bits Processor, as measured by the size of its GPRs (General Purpose Registers). A lot of things were extended to 32 Bits: The eight GPRs themselves, which previously were 16 Bits (And for backwards compatibility, they could still be treated as such), the Data Bus, and the Address Bus. Extending the Address Bus to a 32 Bits width was a rather major feature, since it gave the 80386 a massive 4 GiB (2^32) Physical Address Space. Protected Mode was upgraded to allow returning to Real Mode by just clearing a Bit, completely removing the need to reset the Processor and all the overhead involved in the 286 reset hacks. The 386 also introduced a new operating mode, Virtual 8086 Mode, a sort of virtualized mode that helped it to multitask 8088/8086 applications. A lot of action also happened in the integrated MMU. It was upgraded to support Paging in addition to Segmentation, as a new, better way to implement Virtual Memory in an OS, so both the old Segmentation Unit and the new Paging Unit coexisted in the MMU. The MMU Paging Unit also had its own small cache, the TLB (Translation Lookaside Buffer), which you may have heard about a few times.
The vast majority of the 386 features were available only in Protected Mode, which got enhanced to support them in a backwards compatible manner, so that the 80386 Protected Mode could still execute code intended to run in the 80286 Protected Mode. Since the immense amount of new features means that applications targeting Protected Mode on an 80386 would not work on an 80286, I prefer to treat these modes as two separate entities, 286/16 Bits Protected Mode and 386/32 Bits Protected Mode. Real Mode could be considered extended, too, since it is possible to do 32 Bits operations within it, albeit such code would not work on previous x86 Processors.
The Virtual 8086 Mode was a submode of Protected Mode where the addressing style worked like Real Mode. The idea was that a specialized application, known as a Virtual 8086 Mode Monitor (V86MM), executed from within a Protected Mode OS to create a Hardware assisted Virtual Machine (We're talking about 30 years ago!) for each 8088/8086 application that you wanted to run concurrently. The V86MM was almost identical in role and purpose to a modern VMM like QEMU-KVM, as it could provide each virtualized application its own Virtual Memory Map, trap and emulate certain types of I/O accesses, and a lot of other things. A V86MM intended to be used for DOS applications was known as a VDM (Virtual DOS Machine), which was obviously one of the prominent use cases of the V86 Mode. Another possible usage of the V86 Mode was to call the BIOS Services or the VBIOS from within it, which had an alternative set of pros and cons when compared to returning to Real Mode to do so.
Memory Management in the 80386 was incredibly complex due to the many schemes that it supported. For memory addressing you had, as usual, the old Segmented Memory Model that required two GPRs with a Segment and Offset pair to form a full address, in its 8088 Real Mode addressing style variant, its 286 Protected Mode addressing style variant, and a new 386 Protected Mode one that differed in that it allowed Segments to be extended up to 4 GiB in size, compared to the previous maximum of 64 KiB. Moreover, since in the 80386 the size of a GPR was equal to that of the Address Bus, it was now finally possible to add a different mode of memory addressing: the Flat Memory Model. Using the Flat Memory Model, a single 32 Bits GPR sufficed to reference an address, finally putting x86 on equal footing with other competing Processor ISAs. The fun part is that the Flat Memory Model was merely layered on top of the Segmented Memory Model: Its setup required creating a single 4 GiB Segment. Basically, if using the Flat Memory Model, the MMU Segmentation Unit still performed its duties, but these could be effectively hidden from the programmer after the initial setup.
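To make the "single 4 GiB Segment" setup more concrete, here is a small C sketch that encodes the usual flat code and data Segment Descriptors (base 0, limit 0xFFFFF, 4 KiB granularity) following the 386 descriptor field layout; the encoder function itself is just for illustration, not part of any real API.

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of how a Flat Memory Model is set up on top of Segmentation: a single
     * Segment Descriptor with base 0, a 0xFFFFF limit and 4 KiB granularity covers
     * the whole 4 GiB Address Space. The field layout follows the 386 descriptor
     * format; the helper function is only for illustration. */
    static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                    uint8_t access, uint8_t flags) {
        uint64_t d = 0;
        d |= (uint64_t)(limit & 0xFFFFu);                 /* limit 15:0   */
        d |= (uint64_t)(base  & 0xFFFFFFu)        << 16;  /* base 23:0    */
        d |= (uint64_t)access                     << 40;  /* type, DPL, P */
        d |= (uint64_t)((limit >> 16) & 0xFu)     << 48;  /* limit 19:16  */
        d |= (uint64_t)(flags & 0xFu)             << 52;  /* G, D/B, etc. */
        d |= (uint64_t)((base >> 24) & 0xFFu)     << 56;  /* base 31:24   */
        return d;
    }

    int main(void) {
        /* 0x9A = present ring 0 code, 0x92 = present ring 0 data, 0xC = G=1, D/B=1. */
        printf("flat code: %016llX\n",
               (unsigned long long)make_descriptor(0, 0xFFFFF, 0x9A, 0xC));
        printf("flat data: %016llX\n",
               (unsigned long long)make_descriptor(0, 0xFFFFF, 0x92, 0xC));
        return 0;
    }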
When it comes to Virtual Memory, you could use the existing Segmented Virtual Memory scheme, either at a 286 compatible level or with the 386 enhancements, or the new Paged Virtual Memory scheme. The Virtual Address Space also got extended to 4 GiB per task. It is important to mention that internally, when the 386 was in Protected Mode and thus addresses were always translated by the MMU, the translation was done by the Segmentation Unit first, then optionally, if Paging was being used, by the Paging Unit, before the resulting Physical Address was finally output on the Address Bus. Basically, even if using Paging, the Segmentation Unit couldn't be disabled or bypassed, albeit it could be quite neutered if using a Flat Memory Model. And this is perhaps one of the less known tricks of the 386 MMU: If someone wanted, they could fully use Segmentation and Paging simultaneously, which made for an overly complicated Memory Management scheme that still somehow worked. Surprisingly, there was at least one specific use case where mixing them could be useful...
The Paged Virtual Memory scheme consists of units known as Page Frames that reference a block of addresses with a fixed 4 KiB size, giving a consistent granularity and predictability compared to variable size Segments. Instead of Segment Descriptor Tables (Or better said, in addition to them, since you required at least a minimal initialization of the MMU Segmentation Unit), Paging uses a two level tree hierarchy of Page Directories and Page Tables to reference the Page Frames. The Page Directories and Page Tables are also 4 KiB in size, with a Page Directory containing 1024 4-Byte entries pointing to Page Tables, and each Page Table containing 1024 4-Byte entries pointing to Page Frames. The 386 MMU TLB could cache up to 32 Page Table entries, and considering that they are 4 Bytes each, it should mean that the TLB data size was 128 Bytes.
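The following C sketch shows how a 386 style two level page walk splits a 32 Bits Linear Address into a Page Directory index, a Page Table index and an offset, using ordinary arrays to stand in for the structures that the real MMU walks; the mapping chosen is made up purely for the example.

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of a 386 style two level page walk: 10 Bits of Page Directory index,
     * 10 Bits of Page Table index and a 12 Bits offset into a 4 KiB Page Frame.
     * The tables are simulated in ordinary arrays instead of being walked by a
     * real MMU, and the mapping is made up for the example. */
    #define ENTRIES 1024u

    int main(void) {
        static uint32_t page_directory[ENTRIES];  /* 1024 4-Byte entries = 4 KiB      */
        static uint32_t page_table[ENTRIES];      /* idem, covers 4 MiB of addresses  */

        /* Pretend the OS mapped linear 0x00400000..0x007FFFFF to physical 0x00800000. */
        page_directory[1] = 0x00100000u;          /* "physical address" of page_table */
        for (uint32_t i = 0; i < ENTRIES; i++)
            page_table[i] = 0x00800000u + (i << 12);

        uint32_t linear   = 0x00401234u;
        uint32_t pd_index = linear >> 22;             /* top 10 Bits                  */
        uint32_t pt_index = (linear >> 12) & 0x3FFu;  /* next 10 Bits                 */
        uint32_t offset   = linear & 0xFFFu;          /* low 12 Bits                  */

        uint32_t frame    = page_table[pt_index];     /* a real walk would reach this */
        uint32_t physical = frame + offset;           /* via page_directory[pd_index] */

        printf("linear %08X -> PD[%u] PT[%u] + %03X -> physical %08X\n",
               linear, pd_index, pt_index, offset, physical);
        return 0;
    }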
Compared to Segment Descriptor Tables, the Page Directory and Page Table data structures could have a substantially higher RAM overhead, depending on how you compare them. For example, to hold the Virtual-to-Physical mapping data of 4 MiB worth of addresses, with Paging you could do it with a single 4 KiB Page Table (Plus the required 4 KiB Page Directory and 8 Bytes for the 4 GiB Segment Descriptor), as it can hold the mapping data of 1024 4 KiB Page Frames; in other words, it can map those 4 MiB worth of addresses with a 4 KiB overhead. In comparison, with the Segmented scheme you could have either a single 8 Byte Segment Descriptor for a 4 MiB Segment, 64 Segment Descriptors with a Segment size of 64 KiB each for a total of 512 Bytes of overhead, or even 1024 Segment Descriptors with a Segment size of 4 KiB each for a total of 8 KiB of overhead, just to make it an even comparison with Page Frames. However, keep in mind that there was a fixed limit of 16384 Segments (It was not extended from the 80286), so Segments would absolutely not scale with low granularity, whereas with Paging, with just a 4 KiB Page Directory and 4 MiB in 1024 Page Tables, you are already addressing 1048576 Page Frames of 4 KiB each, for a grand total of 4 GiB of mapped addresses with a reasonable 4100 KiB overhead.
Paging had a drawback of sorts: Memory Protection was simplified, so that each Page could only be set with either Supervisor (Equivalent to Ring 0/1/2) or User (Equivalent to Ring 3) privileges. For the vast majority of uses this was enough; the typical arrangement was to have the OS running as Supervisor/Ring 0 and the user applications as User/Ring 3. However, in the early 2000's, a use case appeared where this was not enough: x86 virtualization. The first attempts at x86 virtualization were made entirely in Software, as there were no specialized Hardware features that helped with it. These early VMMs (Virtual Machine Managers) had to run both the guest OS and the guest applications at the same privilege level, Ring 3, which basically means that the guest OS had no Hardware Memory Protection from its user applications. By mixing Segmentation and Paging, it was possible to implement a technique known as Ring Deprivileging, where the host OS could run in Ring 0, as usual, the guest OS in Ring 1, and the guest applications in Ring 3, providing some form of Hardware protection. Ring Deprivileging and everything associated with Software only x86 virtualization pretty much disappeared after Intel and AMD Hardware virtualization extensions, VT-x and AMD-V, respectively, became mainstream (Actually, a VMM that uses them is colloquially considered to be running in Ring -1).
While it may not seem like a lot of features, the 386 had all the ones that it needed to become extremely successful. Actually, its success dramatically altered where Intel was heading, as it would strongly shift its focus to x86 Processors, adapting to the fact that most of them would be used for IBM PC compatible computers. You may want to read this interview with many of the original designers involved in the 80386, which can give you a better idea of how important the 386 was for Intel. Still, the 386 generation had a lot of rarely told tales that happened while Intel was still experimenting with its product lineup...
While the 80386 CPU would quickly become an era defining Processor, it had a very rough start in the market. This is not surprising, as the 386 was an extremely ambitious CPU that mixed new, modern features on top of backwards compatibility with previous Processors that in some areas operated quite differently, so it was like combining the behaviors of three different Processors into a single one that did them all (8088 Real Mode, 80286 16 Bits Protected Mode with Segmented Virtual Memory, the 80386's own 32 Bits Protected Mode with both Segmented and Paged Virtual Memory, plus the Virtual 8086 Mode). Having so many operating modes made it such a complex beast that the early Steppings were plagued with issues.
The earliest issue with the 386 was that it couldn't hit its clock speed targets. As far as I know, Intel was expecting that the 386s would be able to run at 16 MHz, but it seems that yields for that bin were initially low, since it also launched a 12 MHz part, which in modern times is an extremely rare collectible chip (Since the Compaq DeskPro 386 launched using a 16 MHz 80386, I suppose that the 12 MHz ones were used only in the earliest development systems, then removed from the lineup). However, not hitting the clock speed target was perhaps one of the more minor issues...
Of all the bugs and errata that the 80386 had, a major one was a multiplication bug when running in 32 Bits Protected Mode. It seems that the bug was caused by a flaw in the manufacturing process, since not all 386s were affected. According to the scarce info that can be found about the matter, the bug was severe enough that Intel issued a recall of the 80386 after it was found around 1987. The 386s that were sent back and those newly produced were tested for the bug in 32 Bits mode; the good ones were marked with a Double Sigma, and the bad ones as "16 BIT S/W ONLY". It makes sense that not all 80386 units were sent back, so many shouldn't be marked with either.
The recall and replacement cycle caused an industry wide shortage of 386s, yet Intel seems to have capitalized on that, as it sold those defective 16 Bits 386s anyway. As there was little to no 32 Bits Software available (Actually, the earliest 386 systems like the Compaq DeskPro 386 were the development platforms for it), most 386s were used just as a faster 286, so it made sense to sell them, probably at some discount. I have no idea to whom Intel sold these 16 Bits 386s, nor in which computers of the era they could be found, nor whether end users knew that they could be purchasing computers with potentially buggy chips. The Compaq DeskPro 386, being one of the first computers to use a 386 (And the first IBM PC/AT compatible to do so), should have been affected by all these early 386 era issues, albeit I never looked around for info about how Compaq handled it.
A questionable thing that Intel did was playing around with the 386 ISA after the 80386 launched. There used to be two properly documented instructions, IBTS and XBTS, that worked as intended in early 80386 units, but they were removed in the B1 Stepping because Intel thought that they were redundant, and their respective opcodes became invalid opcodes. However, these opcodes were reused for a new instruction in the 80486 CPU, CMPXCHG. Eventually, it seems that Intel noticed that it wasn't a good idea to overlap two completely different instructions onto the same opcodes, so in later 80486 CPU Steppings, the new instruction was moved to formerly unused opcodes, so as not to have any type of conflict. All this means that there may exist Software intended for early 80386 CPUs that uses the IBTS and XBTS instructions, and thus will fail to execute properly on later 80386 CPUs or any other Processor except the early 80486s, where it can show some undefined behavior, as these could execute the otherwise invalid opcodes but with different results. Likewise, early 80486 Software that used the CMPXCHG instruction with the old encoding may fail on anything but the earliest 80486 Steppings, and misbehave on early 80386s. I suppose that there may still exist early Compiler or Assembler versions that can produce such broken Software. As always, details like this are what make or break backwards and forward compatibility.
One of the most surprising things is that the 80386 CPU was released with pretty much no support chips available specifically for it; the only one that I could find was the Intel 82384 Clock Generator. As the 386 Bus protocol was backwards compatible with that of the previous x86 Processors, the early 386 platforms could get away with reusing designs very similar to non-Chipset based 286 platforms, but with at least the Local Bus width extended to 32/32 (Data and Address Buses, respectively), then letting glue chips fill the void. The most notable examples of this were early 386 platforms that had a Socket for an optional 80287 FPU, which could partially sit in the Local Bus (The 16 Bits of its Data Bus were directly wired to the lower 16 Bits of the CPU's, while the Address Bus had to be behind glue logic). Essentially, the whole thing was handled the same way that IBM did in its PC/AT, which took the 286 with its 16/24 Bus, used it for a platform extension of the IBM PC built around the 8088 with its 8/20 Bus, and ended up interfacing everything with 8/16 8085 era support chips. It is fun when you consider how advanced the 80386 was, and how much the rest of the platform sucked.
Intel eventually released some new support chips to pair with its 80386 CPU. The best known one is the Coprocessor, the 80387 FPU, simply because it was a major chip. Because it arrived two years after the 80386 CPU, computer manufacturers filled the void by adapting the 80287 FPU to run with an 80386, as previously mentioned. The FPU would remain a very niche Coprocessor, as in the DOS ecosystem only very specific applications supported it. There was also the Intel 82385 Cache Controller, a dweller of the Local Bus that interfaced with SRAM chips to introduce them as a new memory type, Cache. As faster 386s entered the market, it was obvious that the asynchronous DRAM was too slow, so the solution was to use the faster, smaller, but significantly more expensive SRAM as a small Cache to keep the Processor busy while retrieving the main DRAM contents. The Cache is memory, yet it is not mapped into the Processor Physical Address Space, thus it is transparent to Software. Later 386 Chipsets like the C&T CS8231 for the 80386 incorporated their own Cache Controllers, albeit the Cache SRAM itself was typically populated only in high end Motherboards due to its cost.
Maybe one of the breaking points of the entire x86 ecosystem is that Intel actually tried to update the archaic support chips, as it introduced a chip for the 386 that both integrated and acted as a superset of the old 8085 era ones. This chip was the Intel 82380 IPC (Integral Peripheral Controller), which was similar in purpose to the C&T 82C206 IPC, as both could be considered early Southbridges that integrated the platform support chips. There was an earlier version of it, the 82370 IPC, but I didn't check the differences between the two.
The 82380 IPC, among some miscellaneous things, integrated the functions of a DMAC, a PIC and a PIT, all of which were much better compared to the ancient discrete parts used in the PC/AT. The integrated DMAC had 8 32 Bits channels, a substantial improvement compared to the two cascaded 8237A DMACs in the PC/AT platform and compatible Chipsets, which provided 4 8 Bits and 3 16 Bits channels. The 82380 integrated PIC was actually three internally cascaded 8259A compatible PICs instead of just two like in the PC/AT. The three chained PICs provided a total of 20 interrupts, 15 external and 5 internal (Used by some IPC integrated Devices), compared to the PC/AT total of 15 usable interrupts. The PIT had 4 timers instead of the 3 of the 8254 PIT, and also took two interrupts instead of one, but these interrupts were internal. Finally, as it interfaced directly with the 80386, it could sit in the Local Bus.
The question that remains unanswered about the 82380 IPC is that of backwards compatibility. Both the new PIC and PIT were considered by Intel to be supersets of the traditional parts used in the PC/AT, so in theory these two should have been IBM PC compatible. What I'm not so sure about is the DMAC, as the datasheet barely makes a few mentions of some Software level compatibility with the 8237A. Since I failed to find any IBM PC compatible that used the 82380 IPC, I find it hard to assume that its integrated Devices were fully compatible supersets of the PC/AT support chips, albeit that doesn't seem logical, since I suppose that by that point, Intel had figured out that another 80186 CPU/SoC style spinoff with integrated Devices that are incompatible with its most popular ones wouldn't have helped it in any way. There were some non IBM PC compatible 386 based computers that used the 82380 IPC, like the Sun386i, but in the IBM PC/AT world, everyone seems to have ignored it, as no one used it either to implement the PC/AT feature set or the new supersets. Even Intel itself seems to have forgotten about it, since some years later, when Intel got into the Chipset business during the 486 heyday, its Chipsets only implemented the standard PC/AT support chip functionality, not the 82380 superset of it. Basically, whatever features the 82380 IPC introduced seem to have been orphaned in a single generation like the 8089 IOP, but it is an interesting fork that the x86 ecosystem evolution could have taken.
During the 386 generation, Intel began to experiment heavily with product segmentation. Whereas in the 286 generation Intel consolidated everything behind the 80286 (Assuming that you ignore the many different bins, packagings and Steppings) with no castrated 80288 variant like in the previous two generations, for the 386 generation Intel ended up introducing a lot of different versions.
The first 386 variant was not a new product but just a mere name change, yet that name change is actually intertwined with the launch of a new product. After the 80386 had been in the market for 2 years or so, Intel decided to introduce a 386 version with a castrated Bus width, a la 8088 or 80188. However, instead of introducing it under the 80388 name that was supposed to be obvious, Intel first decided to rename the original 80386 by adding a DX suffix, becoming the 80386DX. This name change also affected the Coprocessor, as the DX suffix was added to the 80387 FPU, effectively becoming the 80387DX. The 82385 Cache Controller seems to have avoided the DX suffix, since there are no mentions of an 82385DX being a valid part at all. Soon after the name change, Intel launched the castrated 386 version as the 80386SX. The SX suffix was also used by the new support chips that were specifically aimed at the 80386SX, namely the 80387SX FPU and the 82385SX Cache Controller, which are what you expect them to be. I'm not sure whether Intel could have come up with an autodetection scheme so that the DX parts could be used interchangeably with either DX or SX CPUs, as the 8087 FPU could work with either the 8088 or the 8086. Amusingly, Intel designed the 80386 with both 32/32 and 16/24 Bus operating modes, so the same die could be used in either product line according to factory configuration and packaging.
Whereas the now 80386DX had a 32 Bits Data Bus and a 32 Bits Address Bus, the new 80386SX had a 16 Bits Data Bus and a 24 Bits Address Bus (16 MiB Physical Address Space. This was just the external Address Bus; it could still use the 4 GiB Virtual Address Space per task of the 386DX and all its features). Technically that was a bigger difference than the one between the 8088 and 8086, as those only had a different Data Bus width (8 Bits vs 16 Bits), yet the Address Bus was still the same in both (20 Bits). While the 16 Bits Data Bus and 24 Bits Address Bus of the 386SX matched those of the 286 and the Bus protocol was the same, it couldn't be used as a drop in replacement since the package pinout was different (Ironically, if you read the previously linked article, you will certainly notice that the 80386SX was both expected to be named 80388, and to be fully pin compatible with the 80286, too. Some old articles simply didn't age well...). As it was close enough in compatibility, simple adapters were possible, so upgrading a socketed 286 Motherboard to use a 386SX could have been done, allowing that platform to use the new 386 ISA features plus some IPC improvements. Sadly, such upgrades weren't usually cost effective, since adapters could cost almost as much as a proper 386SX Motherboard, thus very few people did that, and that is even assuming that you had a socketed 286 and not a soldered one to begin with.
The next 386 variant, and for sure the most mysterious one, is the 80376 CPU, introduced in 1989. The 376 was a sort of subset of the 386 intended for embedded use, which had a few peculiarities not seen anywhere else in the history of the x86 ISA. The most important one is that it had no Real Mode support; instead, it directly initialized in Protected Mode. The second one is that its integrated MMU didn't support Paging, for some reason. While 80376 applications should run on an 80386, the reverse was not true if they used any of the unsupported features (Basically, nothing DOS related would run on a 376). If you ever wondered why Intel never tried to purge backwards compatibility from the modern x86 ISA, the fact that you very probably never heard before about how Intel already tried to do so with the 376 should tell you something about how successful that attempt was.
In 1990, Intel launched the 80386SL, targeting the nascent laptop and notebook market. The 386SL was almost a full fledged SoC, as it integrated a 386SX CPU/MMU Core with a Memory Controller (Including Expanded Memory emulation support), a Cache Controller and an ISA Bus Controller (The I/O Channel Bus had already been standardized and renamed to ISA by 1990). It also had a dedicated A20 Gate pin, something first seen on the 80486 released in 1989. However, it didn't integrate the core platform support chips, something that the 80186/80188 CPUs did. Instead, the 80386SL had a companion support chip, the 82360SL IOS (I/O Subsystem), which sits directly in the ISA Bus and implements most of the PC/AT core (Except that, for some reason, it had two 8254 PITs instead of one. Albeit Compaq would have a 386 system like that...), thus making it comparable to a Southbridge.
If you check the 386SL Block Diagrams, it is easy to notice that it didn't have a standard Local Bus to interface with other major chips, since all of them were integrated. As such, the 386SL was its own Northbridge, with the Memory Bus, Cache Bus, Coprocessor Bus (The 386SL could be paired with a 387SX FPU) and ISA Bus being directly managed by it. The Intel 82360SL also had its own Bus, the X-Bus, a little known ISA Bus variant that is directly comparable to the external I/O Channel Bus, since it castrated the Data Bus to 8 Bits and typically hosted an 8042 compatible Keyboard Controller, the Floppy Disk Controller and the Firmware ROM. Perhaps the 80386SL and 82360SL are the first Intel chips where we can see how it embraced the x86-IBM PC marriage, as Intel made clear that this pair of chips was designed specifically to be used for PC/AT compatible computers. Also, I think that the 82360SL was technically Intel's first pure PC/AT compatible Southbridge.
The 386SL introduced a major feature, as it was the first Processor to implement a new operating mode, SMM (System Management Mode), which you may have heard about if you follow Hardware security related news. The intention of SMM was that the BIOS Firmware could use it to execute code related to power management in a way that was fully transparent to a traditional OS, so that the computer itself would take care of all the power saving measures without needing Software support, like a Driver for the OS. SMM also had a lot of potential for low level Hardware emulation, a role for which it is currently used by modern UEFI Firmwares to translate input from a USB Keyboard into a PC/AT compatible virtual Keyboard plugged into the Intel 8042 Microcontroller. Being in SMM is colloquially known as Ring -2, albeit this is a modern term, since by the time that it received that nomenclature, the mode introduced by the Intel VT-x and AMD-V Hardware virtualization extensions was already being called Ring -1. The Processor can't freely enter this mode; instead, there is a dedicated Pin known as SMI (System Management Interrupt) that generates a special type of Interrupt to ask the Processor to switch to SMM. The SMI line is typically managed by the Chipset (In the 80386SL case, by its 82360SL IOS companion chip), so any request to enter SMM has to be done through it.
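To make the legacy Keyboard emulation role of SMM a bit more tangible, here is a purely conceptual C sketch of the idea. None of this is real Firmware code and all the names are made up: a real SMI handler lives in SMRAM, is entered when the Chipset asserts the SMI line (For example, when it traps an access to the legacy 8042 ports 0x60/0x64), and returns with the RSM instruction, all without the OS ever noticing. The sketch just models that flow in plain userspace code:

#include <stdio.h>
#include <stdint.h>

/* Pretend input coming from a USB Keyboard, already translated to PC Scancodes */
static uint8_t usb_scancode_queue[] = { 0x1E, 0x30, 0x2E };
static unsigned queue_head = 0;

/* The 8042 registers that the OS expects to exist, emulated in Software */
static uint8_t emulated_8042_output_buffer;
static uint8_t emulated_8042_status;           /* Bit 0 = Output Buffer Full */

/* What an SMI handler would do after the Chipset traps a legacy port read */
static void smi_handler_trap_port_read(uint16_t port) {
    if (port == 0x60 && queue_head < sizeof usb_scancode_queue) {
        emulated_8042_output_buffer = usb_scancode_queue[queue_head++];
        emulated_8042_status |= 0x01;
    }
}

/* What the OS believes is a plain I/O port read of the 8042 */
static uint8_t inb(uint16_t port) {
    smi_handler_trap_port_read(port);          /* The SMI fires transparently in between */
    if (port == 0x60) { emulated_8042_status &= (uint8_t)~0x01; return emulated_8042_output_buffer; }
    if (port == 0x64) return emulated_8042_status;
    return 0xFF;
}

int main(void) {
    for (int i = 0; i < 3; i++)
        printf("Port 0x60 read %d: Scancode %02X\n", i, inb(0x60));
    return 0;
}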
Another interesting thing is that the 386SL had a hybrid Memory Controller that could use either DRAM or SRAM for system RAM. While everyone knows that SRAM is theoretically faster, on Page 26 of the 386SL Technical Overview Intel claims that it performed slower than DRAM, since the SRAM chips that were big enough to be worth using as system RAM required 3 Memory WS, making them ridiculously slower than the small ones that are used as Cache. Thus, the purpose of using SRAM as system RAM was not better performance, but that it was an ultra low power alternative to DRAM. I found that chart quite appalling, since I always thought that SRAM as system RAM should have been significantly faster, even if at a huge capacity cost for the same money. Also, the 386SL datasheet claims that it has a 32 MiB Physical Address Space, pointing to a 25 Bits Address Bus, but the integrated Memory Controller supports only up to 20 MiB of installed memory. I'm not sure why it is not a standard power of two.
The final step in the 386 evolution line was the 80386EX, released in 1994. While the 386EX is out of era, since by that time 486s were affordable, the original P5 Pentium had already been released, and Intel had formally entered the PC/AT compatible Chipset business, it is still an interesting chip. The 80386EX is somewhat similar to the 80386SL as it had a lot of integrated stuff SoC style, but instead of notebooks, it targeted the embedded market. The 386EX had a 386SX CPU/MMU Core plus the addition of the 386SL SMM, with an external 16 Bits Data Bus and 26 Bits Address Bus (64 MiB Physical Address Space). However, compared to the 386SL, it didn't have an integrated Memory Controller, Cache Controller or ISA Controller; instead, it integrated the functionality of some PC/AT support chips like those found in the 82360SL IOS. Basically, the 80386EX had an integrated Southbridge, with the Northbridge being made out of discrete chips, exactly the opposite of the 80386SL (Not counting its companion Southbridge). Compared to the previous chip that Intel designed to target the embedded market, the 80376, the 80386EX was quite successful.
While the 386EX had some PC/AT compatible peripherals, they don't seem to be as compatible as those of the 82360SL IOS. The RTC seems to be completely missing, so it needs a discrete one. There is a single 8254 PIT, which seems to be standard. It has two 8259A PICs that are internally cascaded, but it only exposes 10 external Interrupts, with 8 being internal. This means that if you tried to make a PC/AT compatible computer with ISA Slots, one Interrupt Pin would remain unconnected, since an ISA Slot exposes 11 Interrupt Lines. Finally, the DMAC is supposed to be 8237A compatible but only has 2 channels, which also means unconnected Pins on the ISA Slots. Moreover, implementing the ISA Bus would require external glue chips, since the 386EX exposes its Local Bus, not a proper ISA Bus like the 386SL. It seems that it was possible to make a somewhat PC/AT compatible computer out of a 386EX, given that some embedded computer manufacturers sold them as such, but not fully compatible in a strict sense. As such, I find the 386SL much more representative of where Intel would be going...
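Since the Interrupt Line counting can get confusing, here is a trivial C snippet that does the arithmetic. The only assumption besides the figures quoted above is the standard set of Interrupt Lines present on a 16 Bits ISA Slot (IRQ 3 to 7 on the 8 Bits section, IRQ 9 to 12, 14 and 15 on the 16 Bits extension):

#include <stdio.h>

int main(void) {
    /* Interrupt Lines physically present on a 16 Bits ISA Slot */
    const int isa_slot_irqs[] = { 3, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15 };
    const int slot_lines = sizeof isa_slot_irqs / sizeof isa_slot_irqs[0];
    const int ex386_external_pins = 10;   /* External Interrupt Pins exposed by the 80386EX */

    printf("ISA Slot Interrupt Lines:        %d\n", slot_lines);
    printf("80386EX external Interrupt Pins: %d\n", ex386_external_pins);
    printf("Slot Lines left unconnected:     %d\n", slot_lines - ex386_external_pins);
    return 0;
}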
There are two almost unheard-of variants of the 386 that aren't really important, but I want to mention them to satisfy curious minds. The first is the 80386CX, which seems to be mostly an embedded version of the 80386SX with minor changes. The second is the 80386BX, which is a SoC similar to the 386EX but with a totally different set of integrated peripherals, aimed to cover the needs of early PDAs (The Smartphones of the late 90's) with stuff like an integrated LCD Controller.