This is basically how the SGC works, but it manipulates the decoder. Without some sort of mechanism like this, it is impossible to implement different OSes, because the exception vectors and vectored OS routines need to be changed when the OS is changed. There are many ways to implement this. As far as I know, the GC/SGC does it by initially decoding its on-board ROM at address 0, while the initial PC points to an alias of the same ROM at the address where it will permanently reside once the system is running. In other words, at power-on it disables its address decoding and simply aliases the ROM all over the address map.
Once the loader starts, the 'real' address map is established by enabling the decoder. This can be done either by detecting an access to the actual decoded ROM address (usually by simply detecting an address line going high), or by having the boot code write data to a certain address to switch the decoder on. The latter method may have some advantages, since the boot code then decides exactly when the switch happens. I think the GC uses the former method, the SGC the latter.
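To make the two enabling methods concrete, here is a small C model of such a boot-time decoder. It is only a sketch under loudly hypothetical assumptions: the ROM size, its 'home' at $C00000, the use of A23 as the trigger line and the enable register at $1BF00 are all invented for illustration and are not taken from the GC or SGC.

```c
#include <stdbool.h>
#include <stdint.h>

/* All values are hypothetical, chosen only for illustration. */
#define ROM_SIZE      0x10000u   /* 64 KB boot ROM                          */
#define ROM_HOME      0xC00000u  /* where the ROM lives once decoding is on */
#define ROM_HOME_LINE 0x800000u  /* A23: set for ROM_HOME, clear in low map */
#define ENABLE_REG    0x1BF00u   /* write-only "decoder on" register in IO  */

typedef enum { ENABLE_ON_HIGH_LINE, ENABLE_ON_REG_WRITE } enable_method;
typedef enum { DEV_ROM, DEV_RAM, DEV_IO } device;

typedef struct {
    enable_method method;  /* which of the two mechanisms the board uses */
    bool enabled;          /* false after power-on reset                 */
} boot_decoder;

/* Decide which device answers an access, switching the decoder on if this
 * access is the trigger for the chosen method. */
static device decode(boot_decoder *d, uint32_t addr, bool write)
{
    if (!d->enabled) {
        bool trigger = (d->method == ENABLE_ON_HIGH_LINE)
                           ? (addr & ROM_HOME_LINE) != 0
                           : (write && addr == ENABLE_REG);
        if (!trigger)
            return DEV_ROM;  /* decoding off: ROM aliased over the whole map */
        d->enabled = true;
    }
    /* Decoding on (simplified): IO area, ROM at its home, RAM elsewhere. */
    if (addr >= 0x18000u && addr <= 0x1BFFFu)
        return DEV_IO;
    if (addr >= ROM_HOME && addr < ROM_HOME + ROM_SIZE)
        return DEV_ROM;
    return DEV_RAM;
}
```

With the first method the reset vector fetches at 0 and 4 still see the ROM alias, and the very first instruction fetch from the ROM's home flips the decoder; with the second, nothing changes until the boot code writes to the register.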
Of course, the decoder decodes the 'boot' ROM exactly where the code is now executing (otherwise it would all go belly-up). It also decodes RAM at 0, exactly as Peter said, save for the IO area ($18000-$1BFFF) and some special consideration for the original screen.
The next step is testing the RAM at 0 (remember that the OS is not up at this time, so no tests have been made yet). Normally the OS does not expect RAM here at all, so it will not test it; the boot code has to do that itself, because the next step copies a ROM image into this RAM, and you must ensure that the RAM where the OS is going to sit is working properly.
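A RAM test of the kind meant here could look like the following sketch. It is a generic walking-ones data test followed by an 'address in address' test and its complement, not the test any particular board actually uses; on the real machine `base` would point at address 0, while here it can be any buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the block passes, otherwise 1 + index of the first bad longword. */
static size_t ram_test(volatile uint32_t *base, size_t nwords)
{
    /* 1. Data lines: walk a single 1 through every bit of the first longword. */
    for (uint32_t bit = 1; bit != 0; bit <<= 1) {
        base[0] = bit;
        if (base[0] != bit)
            return 1;
    }
    /* 2. Address lines and cells: store each longword's own index... */
    for (size_t i = 0; i < nwords; i++)
        base[i] = (uint32_t)i;
    for (size_t i = 0; i < nwords; i++)
        if (base[i] != (uint32_t)i)
            return i + 1;
    /* 3. ...then its complement, so every bit of every cell is seen as 0 and 1. */
    for (size_t i = 0; i < nwords; i++)
        base[i] = ~(uint32_t)i;
    for (size_t i = 0; i < nwords; i++)
        if (base[i] != ~(uint32_t)i)
            return i + 1;
    return 0;
}
```

Storing each location's own index catches shorted or stuck address lines, because two aliased locations cannot both hold their own index.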
At this point one might ask: how do we get to an image of an OS? For one thing, we know we have some sort of ROM storage from which we are executing code right now, so this is the first option, if it is large enough. We also know that the QL ROM is sitting on the QL bus at $0000..$BFFF (remember, I am talking here about a system like the SGC); that is the second option. Also, if we were clever enough, the much larger address map of our upgraded CPU might let us place a complete alias of the 1M QL address map somewhere else, at some high address, so that the 1M of address space we would expect at $000000..$0FFFFF also appears to the CPU at, say, $F00000..$FFFFFF. This means we can get at anything on the QL bus simply by offsetting the address by $F00000, so there would be a QL ROM at $F00000..$F0BFFF. That is the third option.
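The alias is nothing more than a fixed offset, which this sketch spells out in both directions, assuming (as in the text) a 1M window at $F00000. The function names are hypothetical.

```c
#include <stdint.h>

#define QL_ALIAS_BASE 0xF00000u  /* example base of the alias window */
#define QL_MAP_SIZE   0x100000u  /* the QL's 1M address map          */

/* CPU address at which a given QL-bus address can always be reached. */
static uint32_t ql_alias(uint32_t ql_addr)
{
    return QL_ALIAS_BASE + (ql_addr & (QL_MAP_SIZE - 1));
}

/* Reverse mapping: returns 1 and the QL-bus address if cpu_addr is in the window. */
static int ql_from_alias(uint32_t cpu_addr, uint32_t *ql_addr)
{
    if (cpu_addr < QL_ALIAS_BASE || cpu_addr >= QL_ALIAS_BASE + QL_MAP_SIZE)
        return 0;
    *ql_addr = cpu_addr - QL_ALIAS_BASE;
    return 1;
}
```

For example, the last byte of the 48k QL ROM ($BFFF) is reached at $F0BFFF, and the QL's IO area at $18000 at $F18000.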
In fact, since we are no longer looking at a bare QL, we can use some parts of the original 1M of QL address space for other things, like a boot ROM, or in fact a much larger Flash chip which holds not only the boot code but also several OS images and add-ons.
This last option is the most flexible, as it involves copying an OS image from whatever storage into RAM, perhaps with a way of choosing which one. The first option is actually a form of the third, with one difference: the boot ROM, which may also contain one or more OS images, is only visible temporarily, until the OS has been copied to RAM. The SGC uses a variant of the third option: the first 256k of the QL's address map also appears at $400000, which is where the SGC boot code finds the QL ROM (whatever version), copies it into RAM at $0 and patches it to run with the extended hardware.
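A host-side sketch of such a load: copy the 48k QL ROM from the alias into RAM at 0, then patch the copy. The `rom_patch` list and its format are hypothetical, since the actual SGC patches are not described here; on the target `ram` would be address 0 and `rom_alias` $400000, which plain C cannot portably express, so ordinary buffers stand in for them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QL_ROM_SIZE 0xC000u  /* $0000..$BFFF */

/* One hypothetical patch: overwrite a 16-bit word at an offset in the copy. */
typedef struct { uint32_t offset; uint16_t value; } rom_patch;

/* Copy the ROM image into RAM, then apply the patches to the RAM copy.
 * Returns 0 on success, -1 if a patch falls outside the ROM image. */
static int load_ql_rom(uint8_t *ram, const uint8_t *rom_alias,
                       const rom_patch *patches, size_t npatches)
{
    memcpy(ram, rom_alias, QL_ROM_SIZE);
    for (size_t i = 0; i < npatches; i++) {
        if (patches[i].offset + 1 >= QL_ROM_SIZE)
            return -1;
        /* The 68k is big-endian: high byte at the lower address. */
        ram[patches[i].offset]     = (uint8_t)(patches[i].value >> 8);
        ram[patches[i].offset + 1] = (uint8_t)(patches[i].value & 0xFFu);
    }
    return 0;
}
```

Patching the RAM copy rather than the ROM is the whole point: whatever QL ROM version is fitted, the running OS is the adapted copy.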
The second option is a bit more complex because it requires one extra step: reconfiguring the decoder so that reads from $0000..$BFFF go to the QL bus (finding the OS ROM there) while writes to the same addresses go to the RAM.
In all options the boot code then copies the OS image to RAM starting at 0. In the second option this is slightly interesting, because it involves reading from and then writing to the same address.
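The same-address copy of the second option can be modelled by a bus whose reads and writes go to different devices. This is a hypothetical host model; on the target the loop body reduces to reading and writing back each location, for instance `move.l (a0),(a0)+` in 68k assembler.

```c
#include <stdint.h>

#define OS_SIZE 0xC000u  /* $0000..$BFFF */

/* Decoder in "shadow" mode: reads see the QL ROM, writes land in RAM. */
typedef struct {
    const uint8_t *rom;  /* what a read at $0000..$BFFF returns */
    uint8_t *ram;        /* what a write at $0000..$BFFF changes */
} shadow_bus;

static uint8_t bus_read(const shadow_bus *b, uint32_t a)       { return b->rom[a]; }
static void    bus_write(shadow_bus *b, uint32_t a, uint8_t v) { b->ram[a] = v; }

/* Read each address and write the value straight back to the same address. */
static void shadow_copy(shadow_bus *b)
{
    for (uint32_t a = 0; a < OS_SIZE; a++)
        bus_write(b, a, bus_read(b, a));
}
```

To the CPU the loop looks like it does nothing, which is exactly what makes it a neat trick: the decoder, not the code, decides that the two halves of each access go to different chips.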
The penultimate step is to reconfigure the decoder to write-protect the portion of RAM where the OS copy now sits; in practice this is usually the first 48 or 64k. This seems to be mandatory, as some software is apparently known to attempt writes to low addresses, which on a real QL harmlessly hit ROM but here would corrupt the OS copy.
The final step is either a soft reset (no external hardware is reset, otherwise the whole procedure would be triggered from the start), which resets the CPU so that it starts executing from the RAM copy of the ROM as usual, or the boot code simulates a reset by loading the initial SSP and jumping to the initial PC.
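The simulated reset needs the two values the 68k fetches from the first eight bytes of its address map: the initial SSP at 0 and the initial PC at 4, both big-endian longwords. This sketch only extracts them from the OS image; the actual load of A7 and the jump take a couple of lines of assembler. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Read a big-endian 32-bit longword, as the 68k stores it. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

typedef struct { uint32_t ssp, pc; } reset_vector;

/* The first two longwords of the OS image (now in RAM at 0) are the
 * initial supervisor stack pointer and the initial program counter. */
static reset_vector read_reset_vector(const uint8_t *os_image)
{
    reset_vector v = { be32(os_image), be32(os_image + 4) };
    return v;
}
```

A real reset also leaves the CPU in supervisor mode with interrupts masked (SR = $2700), so a simulated reset should make sure the same holds before jumping.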
The careful observer will have noticed that I have twice referred to a way of changing the behavior of the decoder, at the very least to write-protect the first 48k of RAM. It is possible to implement this in a 'one-time-only' manner, which returns to the initial state only on power-on reset, but it is more flexible to implement it with some sort of simple write-only register(s) sitting somewhere in the rest of the IO area ($18000..$1BFFF). There would be 3 basic states: power-on reset (boot code at 0), RAM load (RAM at 0, read-write) and RAM protect (RAM at 0, read-only). Being able to change the state at any time makes it possible to load a new OS from software; this is how SMSQ/E is loaded on the SGC.
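The three-state register could be modelled as below. The register address $1BFF0, the encoding of the states as 0/1/2 and the 48k protected area are assumptions for illustration; only the three states themselves come from the text. Only the low part of the map is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTRL_REG    0x1BFF0u  /* hypothetical write-only control register */
#define PROTECT_TOP 0xC000u   /* first 48k protected in ST_RAM_PROTECT    */

typedef enum { ST_BOOT = 0, ST_RAM_LOAD = 1, ST_RAM_PROTECT = 2 } dec_state;
typedef enum { TO_BOOT_ROM, TO_RAM, TO_IO, TO_NOTHING } target;

typedef struct { dec_state state; } decoder;

static void power_on_reset(decoder *d) { d->state = ST_BOOT; }

/* Route one access and handle writes to the control register. */
static target route_access(decoder *d, uint32_t addr, bool write, uint8_t data)
{
    if (addr >= 0x18000u && addr <= 0x1BFFFu) {
        if (write && addr == CTRL_REG && data <= ST_RAM_PROTECT)
            d->state = (dec_state)data;  /* can be changed at any time */
        return TO_IO;
    }
    switch (d->state) {
    case ST_BOOT:
        return TO_BOOT_ROM;              /* boot code at 0 */
    case ST_RAM_LOAD:
        return TO_RAM;                   /* RAM at 0, read-write */
    case ST_RAM_PROTECT:
        /* RAM at 0, read-only: writes to the OS copy are simply dropped. */
        return (write && addr < PROTECT_TOP) ? TO_NOTHING : TO_RAM;
    }
    return TO_NOTHING;
}
```

Because software can write ST_RAM_LOAD again at any time, a running system can unprotect the low RAM, copy in a new OS and protect it again, which is the 'load a new OS from software' case.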
In Peter's scenario, the OS store is an SD card (connected in SPI mode), which is a nice possibility because it allows OS images to be managed externally. However, the boot code is not trivial. It may well be a version of the OS (e.g. Minerva) with a boot program (in SBASIC?) to find, choose and load OS images and extensions. The only slight disadvantage is that the storage is fairly slow, as the SPI transfer is entirely software-driven, but since booting is basically a one-time thing, this is not a big problem.
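As an idea of what the software-driven SD side involves, this sketch builds the 6-byte command frame an SD card expects in SPI mode, including the CRC7 (polynomial x^7 + x^3 + 1). The actual byte transfer, bit-banged or otherwise, depends on the hardware and is not shown; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC7 as used by SD cards: polynomial x^7 + x^3 + 1, initial value 0. */
static uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;                /* 7-bit CRC kept in bits 6..0 */
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;              /* old bit 6 moves up to bit 7 */
            if ((d ^ crc) & 0x80)   /* compare with next data bit, MSB first */
                crc ^= 0x09;        /* the x^3 + 1 taps */
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Frame layout: 01 + 6-bit command index, 32-bit argument MSB first,
 * then CRC7 in the top 7 bits of the last byte and an end bit of 1. */
static void sd_command_frame(uint8_t cmd, uint32_t arg, uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40 | (cmd & 0x3F));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((sd_crc7(frame, 5) << 1) | 1);
}
```

For CMD0 (GO_IDLE_STATE) with argument 0 this gives the familiar final byte $95. In SPI mode the card only checks the CRC on CMD0 and CMD8 unless CRC checking is switched on, but always computing it does no harm.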
There is a hardware consideration, and that is the size of the boot loader ROM. Today it comes down to choosing an easily available and cheap flash chip, and usually what you end up with is a 29F040. By QL standards it's huge (4 Mbit, i.e. half a megabyte), but the smaller ones are not much cheaper and may also be slower. However, given that we have an upgraded CPU, there is certainly address space for it, and then there is more than enough room in it for several OS images, including something like SMSQ/E, to begin with.
Interfacing an SD card as an SPI device is dead easy hardware-wise but needs a driver of some sort; on the other hand, having a means of choosing the OS image to be loaded also requires software.
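The 'means of choosing the OS image' could be as simple as a directory at the start of the flash. In this sketch the directory layout, its placement in sector 0 and the entry names are hypothetical; the 512 KB size and the eight uniform 64 KB sectors are those of the 29F040.

```c
#include <stddef.h>
#include <stdint.h>

#define FLASH_SIZE  0x80000u  /* 29F040: 4 Mbit = 512 KB       */
#define SECTOR_SIZE 0x10000u  /* eight uniform 64 KB sectors   */

/* Hypothetical directory entry the boot code could keep in sector 0. */
typedef struct {
    char     name[16];
    uint32_t offset;  /* start of the image within the flash */
    uint32_t length;  /* bytes to copy to RAM at 0           */
} os_entry;

/* Number of 64 KB sectors an image of 'length' bytes occupies. */
static uint32_t sectors_needed(uint32_t length)
{
    return (length + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Return entry 'choice' if it is sector-aligned and fits the chip, else NULL. */
static const os_entry *choose_os(const os_entry *dir, size_t n, size_t choice)
{
    if (choice >= n)
        return NULL;
    const os_entry *e = &dir[choice];
    if (e->length == 0 || e->offset % SECTOR_SIZE != 0 ||
        e->offset > FLASH_SIZE || e->length > FLASH_SIZE - e->offset)
        return NULL;
    return e;
}
```

Keeping each image sector-aligned means any one of them can be erased and reprogrammed without disturbing the boot code or the other images.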
An approach that is simpler software-wise but more complex hardware-wise would map a selectable 64k portion of the flash to the first 64k of the address map. Speed-wise this is really still only viable if that portion is actually copied into 32-bit RAM, or run from 32-bit flash. Management of the flash is also more difficult: for one thing, you cannot program a flash chip while it is simultaneously being read for code execution, so we come back to running the OS from a RAM emulation of ROM.
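The banked mapping amounts to one register supplying the upper address bits of the flash. A sketch, assuming a hypothetical bank register and a 512 KB flash divided into eight 64k banks:

```c
#include <stdint.h>

#define WINDOW_SIZE 0x10000u  /* first 64k of the CPU's address map */
#define FLASH_SIZE  0x80000u  /* 512 KB flash = 8 banks of 64k      */

/* Flash offset seen through the window for a given bank register value,
 * or -1 if the CPU address lies outside the window. */
static int32_t flash_offset(uint8_t bank, uint32_t cpu_addr)
{
    if (cpu_addr >= WINDOW_SIZE)
        return -1;
    uint32_t banks = FLASH_SIZE / WINDOW_SIZE;
    /* Unused high bits of the bank register are ignored, as in hardware. */
    return (int32_t)((bank % banks) * WINDOW_SIZE + cpu_addr);
}
```

Every instruction fetch through this window is still a read from the flash itself, which is why copying to 32-bit RAM remains the preferred route.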
The impact of a wide bus on the speed of the CPU cannot be overstated. Peter is right in expecting a 3-4 fold speed increase even at QL clock speeds, purely on account of the wide bus and improved CPU architecture. Remember that QL software relies heavily on OS code, which means that code is being executed all the time; doing that from narrow memory will severely reduce the perceived speed of the system.