
Re: Fun things to do with an MC68EC020....

Posted: Fri Mar 08, 2013 9:47 pm
by Dave
Today I ran a loop on an unexpanded QL in monitor mode (US model, has weird interrupts) and then with a 68EC020.

Code:

100 PRINT DATE$
110 FOR loop = 0 TO 100000 : NEXT loop
120 PRINT DATE$
It's a good timewaster. It just moves things around, takes a certain amount of time.

Code:

Machine         Time
Bare QL         2m17s
My 14MHz EC020  1m01s
My 020+RAM      0m39s
Gold Card       0m29s
Super Gold Card 0m12s
My first line is an EC020 running on pure QL hardware. My second, faster line adds 512KB of 16-bit SRAM and a simple PAL address decoder that lets the 020 access that memory at its full width.

Rounding off, with QL = 1, basic 020@14MHz = 2.3, 020+RAM = 3.5, GC = 4.8, SGC = 12.5

So I have a little way to go, I think. However, 2.3x achieved with around £40 of components is... nifty.

Re: Fun things to do with an MC68EC020....

Posted: Mon Mar 25, 2013 3:30 am
by Nasta
A few corrections which may be useful in understanding how to get the most out of an old system, using a compatible fast CPU:
Dave wrote: One of the neat tricks of the 68EC020 is that it can use an 8, 16 or 32-bit bus. On the fly. The address decoder can be configured to not just select devices, but also let the CPU know the width of the data bus for that access. The CPU is natively 32-bit all the time, but can freely access 16- and 8-bit areas/devices.
It goes without saying that one 32-bit access (a long word) moves the same data as four consecutive 8-bit (byte) accesses in roughly a quarter of the time, and since moving data around memory is a big chunk of what computers do, wider is better. Think 4-lane freeway versus single-lane country road.
Unfortunately, out of the whole 68k series only the '020 and '030 generations (including the EC variants) can do this.
The main thing is that they do it in a very interesting manner.
The way the bus signalling on these CPUs works is to start a cycle and then use the cycle termination signal, normally supplied by external devices, to determine what width of bus has just been accessed. In other words, the CPU FIRST does a read or write to an address, and THEN decides what to do next depending on what the external hardware has told it about how much of the data was actually used or supplied. The reason for doing it this way is to avoid double signalling (first telling the CPU what size of bus it is looking at for the address it has supplied, then performing the actual data transfer), which would carry a significant performance penalty.
In particular, and this is quite important in achieving maximum speed, the CPU starts off assuming a 32-bit wide bus, and waits for the external hardware to tell it what really happened. This external hardware can produce 4 possible states:

1) Do nothing and wait for me to tell you otherwise.
2) You have just completed a 32-bit wide data transfer; start the next one.
3) You have just completed a 16-bit wide data transfer; if you needed a wider one, start a new cycle to transfer the next 16 bits.
4) You have just completed an 8-bit wide data transfer; if you needed a wider one, start a new cycle to transfer the next 8 bits.

In this manner the CPU can mix bus widths across its address space almost freely; the only restriction is that the bus width must be constant within a long-word-aligned group of addresses, i.e. the smallest unit of address space that must have a single width (8, 16 or 32 bits) is 4 consecutive byte addresses.
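
To put rough numbers on it, here is a little model of how one 32-bit write gets chopped up depending on the port width the termination signals report. It's written in C purely as an illustration - it has nothing to do with the real '020 bus controller internals, and the address regions are invented.

Code:

#include <stdint.h>
#include <stdio.h>

/* Port width reported for a given long-word-aligned region. Purely
   illustrative: a real decoder drives the DSACK lines instead. */
static int port_width_at(uint32_t addr)
{
    if (addr < 0x00020000u) return 32;   /* pretend: fast 32-bit RAM        */
    if (addr < 0x00040000u) return 16;   /* pretend: a 16-bit EPROM pair    */
    return 8;                            /* pretend: the original 8-bit bus */
}

/* One CPU-level 32-bit write: keep issuing bus cycles until all four bytes
   have been accepted, just as the '020 does after each termination reply. */
static int cycles_for_long_write(uint32_t addr)
{
    int remaining = 4, cycles = 0;
    while (remaining > 0) {
        int bytes = port_width_at(addr) / 8;  /* the answer arrives with the cycle */
        if (bytes > remaining)
            bytes = remaining;
        remaining -= bytes;
        addr += (uint32_t)bytes;
        cycles++;
    }
    return cycles;
}

int main(void)
{
    printf("32-bit port: %d cycle(s)\n", cycles_for_long_write(0x00000000u));
    printf("16-bit port: %d cycle(s)\n", cycles_for_long_write(0x00020000u));
    printf(" 8-bit port: %d cycle(s)\n", cycles_for_long_write(0x00040000u));
    return 0;
}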

Now, when dealing with a system where the native CPU bus is 32 bits wide but the rest of the system is 16 or 8 bits wide, considerable speed-up can be gained by an address decoding method known as shadowing. Unlike standard address decoding, which maps one physical device (memory, I/O chips etc.) to one area of the CPU's address space, shadowing maps multiple physical devices to the same area of the address space, and/or maps one physical device to several areas of the address space. Special rules then govern how these devices and addresses are accessed, to gain some advantage (usually speed) and to prevent contention - i.e. a situation where the overlapping devices might try to store or supply different data (at which point the question arises which data is correct), or the hardware problem of one device driving a logic 1 onto a bus line while another drives a logic 0.
Using shadowing, many tricks are possible, some of which are present in existing QL hardware, and I will attempt to explain how they work below.
Dave wrote: One of the problems with the QL is that it is implemented on an 8-bit data bus. The video, main memory and all devices are structured as 8-bit. Even the ROMs are 8-bit. That means a QL is always running at a quarter of the speed of the same computer with a 32-bit bus.
So, let's get back to the 68EC020. It can access any width on the fly. We can upgrade the memory with 32-bit wide RAM and replace the ROMs with a pair of 16-bit wide EEPROMs or flash. Even with the crappy video, the stock QL would run a LOT faster.
The SGC does all this and more. It has a copy of the video RAM in its own fast memory space. It copies the ROMs containing QDOS into faster, wider RAM. All neat tricks I'm far from capable of copying. However, I think I can get 80% of the results with about 40% of the effort.
Actually, most of the tricks the SGC uses are rather easy to implement.
One small point to make is that 8-bit wide (and lately 16-bit wide, or even serial-access 1- or 4-bit wide) firmware storage is usually chosen mainly for reasons of cost and board space, whatever the technology (EPROM, Flash, FRAM or any other non-volatile storage). One of the main reasons it has come to this is that modern non-volatile storage, although offering high capacity, does not offer great speed. This is somewhat of a 'chicken and egg' thing: it was RAM that absolutely had to be made as wide as possible, since that gave the biggest speed increase, in computers that have traditionally been driven by firmware loaded (booted) from external devices. The non-volatile firmware would in the end only be used to boot the OS, and the hardware containing it would hardly be used at all once the OS was up and running. As a result, an 'evolution' occurred where the hardware holding the boot firmware did not need to be fast, partly because one of the first things it would do is copy itself into fast RAM.

As semiconductor non-volatile storage grew larger, it replaced other kinds of external non-volatile memory such as discs, especially in embedded and ruggedised systems - but it started out by emulating discs, which are in principle (block) serial access devices. Code was never executed from this memory even though it might in principle support random read access. So such memory was optimised to hold a lot of data, to be reasonably fast to read so the parts that need to execute can be copied into RAM, and to use the least possible board space and connection signals. This trend continues, and will likely continue until someone comes up with commercially viable non-volatile RAM. In the meantime it is cheaper to use a package with fewer pins (i.e. a narrower bus), since it takes less space on the board and needs fewer signal lines to connect it (again using less space, leaving more for other signals). The penalty is relatively small: the time needed to copy the relevant part into RAM.

In the particular case of the SGC, the system ROM is initially (after reset) mapped to (if memory serves me right) addresses F00000h to F0BFFFh, i.e. the first 48k of the 16th megabyte, whereas its usual range is 000000h to 00BFFFh, the first 48k of the address space. The CPU requires that the first two long words of the address map (addresses 0 and 4) contain the value the stack pointer is initially loaded with and the address at which code execution starts, so one would normally expect real data there at reset. This is usually done by mapping some sort of non-volatile storage to these addresses - on a standard QL, the system ROM.
However, on the SGC this area initially maps to the SGC's own ROM, so it is the SGC ROM rather than the system ROM that provides the initial stack pointer and the address to start executing code from - and this is done in order to create a copy of the system ROM in the SGC's RAM. The SGC ROM also maps to F10000h to F1FFFFh, and (if I remember correctly) part of it (up to 32k) appears at 018000h to 01FFFFh, the 32k just below the RAM addresses.
The actual SGC RAM (8M) maps to the first 8M of the CPU memory area, except for the parts mentioned above.
Now, this looks like a whole jumble of addresses, but here is how this works:

Initially, i.e. after reset, every byte of the SGC ROM appears at two addresses, roughly 15 megabytes apart - the first byte of the ROM appears at address 000000h and at address F10000h, the second at 000001h and F10001h, and so on. The first thing the CPU does is read addresses 000000h and 000004h and use the data there to load the stack pointer and program counter, i.e. these addresses tell it where to put the stack and at which address to look for its first instruction. And in fact the first instruction points into the very same SGC ROM, but through the 'alias' at F10000h etc. Hardware within the glue PLD detects the CPU accessing any high address, and from then on changes the way it decodes addresses so that the SGC ROM no longer appears at addresses 000000h to 00FFFFh; those addresses now access the corresponding RAM instead.

The code that executes from the ROM alias first copies the system ROM from its remapped addresses at F00000h to F0BFFFh down to RAM at the addresses it would occupy on a normal QL - namely 000000h to 00BFFFh. Then it figures out which ROM version it is and applies the necessary patches to make it run on a 68020 CPU. Next it accesses another high address (probably in the F40000h to FBFFFFh range), which sets the glue PLD to ignore write accesses to RAM at addresses 000000h to 00BFFFh, in essence making it look like ROM - except that this copy of the actual ROM is 32 bits wide and MUCH faster (by a factor of 14 or so).
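
Just to make that sequence concrete, the boot code boils down to something like the sketch below (written in C for readability). The addresses are the ones from the description above, and the 'write-protect' address in particular is a pure placeholder - I am not quoting the actual SGC register here.

Code:

#include <stdint.h>
#include <stddef.h>

/* Rough sketch of an SGC-style boot sequence. All addresses are illustrative;
   the write-protect address especially is a made-up placeholder. */

#define SYS_ROM_ALIAS   0x00F00000u   /* system ROM, remapped high after reset */
#define SHADOW_BASE     0x00000000u   /* where the ROM copy must end up        */
#define ROM_SIZE        0x0000C000u   /* 48k                                   */
#define WRITE_PROTECT   0x00F40000u   /* placeholder: touching it locks shadow */

static void boot_shadow_rom(void)
{
    volatile uint32_t *src = (volatile uint32_t *)SYS_ROM_ALIAS;
    volatile uint32_t *dst = (volatile uint32_t *)SHADOW_BASE;

    /* 1) copy the system ROM into fast 32-bit RAM at its normal address */
    for (size_t i = 0; i < ROM_SIZE / 4; i++)
        dst[i] = src[i];

    /* 2) (not shown) identify the ROM version, patch it for the 68020 */

    /* 3) tell the glue logic to ignore further writes to the shadow area,
          so from now on the copy behaves like a much faster ROM */
    (void)*(volatile uint8_t *)WRITE_PROTECT;
}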

So, as you can see, one trick is to make the decoder change its behaviour once some special address has been accessed. In this case multiple 'shadows' and aliases of the same physical memory are used, together with temporarily disabling one kind of access (or redirecting it to a different kind of memory) to get the desired effect.

It should be noted that other ways of doing things are also possible. For instance, many systems use shadowing to implement faster copies of a ROM in RAM (a sort of ROM emulation) by mapping the ROM to an area normally used by RAM, but only for reading, while writing to the same addresses still writes into RAM. Copying the contents of the ROM to RAM is then done by reading each address within this shadow area and writing the value back to the same address - which actually reads from the ROM but writes into the RAM. When everything is copied, the ROM is disabled completely (so it is no longer accessible at all), leaving the RAM holding the ROM copy in its place, while writes to the same area are ignored to prevent corruption of the copy. This sort of thing is normally done when we want the maximum possible RAM - we want to fill as much of the CPU's address map as possible with RAM, using only the minimum for ROM copies and I/O spaces. Such an approach would be used in this proposed project when 16M of RAM is fitted, since that is the size of the complete address map available on the 68EC020.
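
The copy loop for that read-ROM/write-RAM overlay is about as simple as it gets - something along these lines (base and size are placeholders; on a QL the ROM area happens to be 48k):

Code:

#include <stdint.h>
#include <stddef.h>

/* Read-back/write-back copy for a read-from-ROM / write-to-RAM overlay:
   every read below ROM_SIZE is answered by the ROM, every write lands in
   the RAM 'behind' it, so copying is just reading each location and writing
   it straight back. Base and size are illustrative. */

#define OVERLAY_BASE  0x00000000u
#define ROM_SIZE      0x0000C000u      /* 48k, as on a QL */

static void copy_rom_to_shadow_ram(void)
{
    volatile uint32_t *p = (volatile uint32_t *)OVERLAY_BASE;

    for (size_t i = 0; i < ROM_SIZE / 4; i++)
        p[i] = p[i];                   /* read comes from ROM, write goes to RAM */
}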

Now, the matter of video. This also uses shadowing on the SGC, but with different mappings and with the read/write distinction used in a different way.
In particular, the SGC maps its own 32-bit RAM to the video memory addresses, alongside the actual video memory (reached through the ZX8301 ULA). The difference is that it only ever WRITES to the ZX8301, i.e. the real video memory, so the real video memory always holds a copy of what the SGC RAM holds at the same addresses and you still get a picture.
However, when the CPU reads from any address within the video memory area, it reads ONLY from the on-board SGC RAM. Access speed is always limited by the slower party, so writing speed is governed by how quickly the ZX8301 can accept data into the original RAM; but since reading is done only from SGC RAM, reads are MUCH faster - over 25x faster, since QL motherboard RAM is slower than any other access through the original 8-bit bus.
Here is where the peculiar way the 68EC020 performs accesses comes in handy: because it first assumes a 32-bit bus, a 32-bit write initially presents the full 32 bits of valid data, even if it turns out only 8 or 16 of them are needed because an 8- or 16-bit port is being accessed. The initial 32 bits presented on a write to the video RAM addresses are captured into the SGC RAM, while only 8 bits go out onto the original 8-bit bus; the 68EC020 then performs 3 more cycles to deliver the remaining 3 bytes to the 8-bit QL bus, while the SGC RAM simply waits. For reading, the SGC RAM supplies all 32 bits at once and the slow 8-bit bus is not used at all.

Now, at first glance this approach offers limited acceleration, but two things make it faster in practice than one would expect.
Since one pixel on the screen uses fewer bits than the width of either the 8- or the 16-bit bus, to draw one pixel the CPU must almost always first read a byte (or two, or four) from the video area, modify the bits corresponding to the pixels being changed, and then write the data back. Because of this, nearly all accesses to video RAM - except when filling it with a pattern or, say, a stored window image - are read-modify-write code sequences. Reading is so much faster than writing in this case that it takes next to no time compared to the bog-standard QL, so every such operation is at least twice as fast - line and character drawing, and scrolling, for instance.
But that's not all - the 68EC020 also buffers its writes. Instead of waiting for data to be written before it continues executing code, it holds the data in an internal buffer and its bus control circuits take care of the write while the CPU goes about its business. The only time it has to stall is if it needs to write something else before the previous write has completed, or if it has to read data or instructions via the external bus and must wait for the previous transfer to finish. This is where two other things come in: the instruction cache and the instruction pipeline buffer. Because graphics code is quite repetitive (lots of loops), a CPU with a cache can offer a significant speed advantage even with slow external memory - the cache will likely already contain the instructions that were read from RAM on an earlier pass through the loop, so the CPU does not need to interrupt a write in progress to fetch them, and a number of instructions can execute while the data is still being written to slow RAM instead of only starting once the write has finished. If the next instruction is a jump of up to 3 words backwards, the CPU will not even attempt a cache access but will find the instruction in its pipeline buffer - the case of a very tight loop used for raw data transfer. In any case a degree of parallelism occurs, giving a speed advantage, because transfers that would occupy the bus on lesser CPUs effectively never appear on it, so their speed penalties do not apply.
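
To see why cheap reads matter so much, here is a single pixel operation in outline. This is a generic sketch: the base address is the QL screen, but the mask/bit layout shown is simplified and does not correspond to any particular QL display mode.

Code:

#include <stdint.h>

/* Generic read-modify-write pixel poke. On a shadowed system the read is
   satisfied by the fast 32-bit RAM; only the write goes out over the slow
   8-bit bus, and even that can overlap with following instructions thanks
   to the write buffering described above. */

#define SCREEN_BASE  0x00020000u   /* QL screen RAM */

static void set_pixel_bits(uint32_t byte_offset, uint8_t mask, uint8_t bits)
{
    volatile uint8_t *p = (volatile uint8_t *)(SCREEN_BASE + byte_offset);

    uint8_t v = *p;                              /* READ   - fast (shadow RAM) */
    v = (uint8_t)((v & ~mask) | (bits & mask));  /* MODIFY - pure CPU work     */
    *p = v;                                      /* WRITE  - slow 8-bit bus    */
}
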
An aside to this is that you can actually remove the old video RAM, or have faulty chips, and the system will still work just fine (except that the screen will be corrupt). One trick that can be used is to remove the upper 64k of RAM from an original motherboard - this will save some power.
Higher density DRAM can also be had, which makes it possible to replace each bank of eight 64k x 1 DRAM chips with two chips of 64k x 4 organisation. 8-bit-wide DRAM chips exist but are not easy to find - it's easier to find 16-bit ones and use only 8 bits, but those come in PLCC packages, so not easy to breadboard with.

An interesting point to mention is that shadowing of this kind is possible even on standard QLs, by expanding the logic used on many of the standard RAM expansion cards with extra RAM and decoding to implement the shadowing function.
A long time ago I made my own RAM expansion for the original QL because I could not afford a Miracle TrumpCard. However, I decided that I would rather have 768k total RAM and leave the IO extension space open. Three banks of 256k were used to implement the 768k, but the first 128k of the first bank was used to shadow the video RAM and the second 128k of the first bank replaced the internal RAM completely. The add-on RAM could support the 68008 at full speed. This expansion ran noticeably faster than any other 8-bit-CPU QL, and the reason was faster access to video, as well as to the system variables and tables, which are also located in slow RAM on the QL motherboard.
Later on, when I got my hands on some PLCC 68008s (which have 2 more address lines and hence a 4M address space, compared to the original DIP 68008 with only 1M), I used a 4 meg old-style 8-bit SIMM to populate the whole 4M of the PLCC 68008's address space with RAM, leaving only a small space at the end for a floppy controller. I seem to remember I discovered a bug in an early version of Minerva which Tony and Lau then fixed, related to using more than 1 meg of RAM :)
This also ran the 68008 at about 9MHz (any more than that and the ZX8301 would not access RAM correctly), but Microdrives and net would not work :)

Finally, a word on the ZX8301. It's very finicky about what it expects from the CPU. In particular, a faster CPU must be prevented from seeing the DTACKL signal from the ZX8301 for a certain time, because otherwise it will finish the data transfer too quickly and will either remove the data to be written before the ZX8301 has actually written it to RAM, or will assume the ZX8301 has supplied data from RAM before it actually has.
Fortunately, accesses to the ZX8301 control register, as well as to the ZX8302, do not have that problem, so using video RAM shadowing lets the designer debug this on the fly - the CPU's RAM will provide correct data for the system to operate, even though the screen might show 'snow'. Getting the timing right will then clear up the picture, while the actual functionality remains unimpaired.

Re: Fun things to do with an MC68EC020....

Posted: Mon Mar 25, 2013 3:54 pm
by Dave
Hi Nasta *hugs*

That all sounds dangerously close to magic to me. :) I will reread it a few times to understand the concepts you explained which are good for many people to know, though I doubt I'd ever be able to actually implement any of them. My brain has shown a strong allergy to custom logic.

I have a bunch of Goldfire components here. Do you think they may ever be used, or should I sell them for what I can get?

Re: Fun things to do with an MC68EC020....

Posted: Tue Mar 26, 2013 12:26 am
by Nasta
Dave wrote:Hi Nasta *hugs*

That all sounds dangerously close to magic to me. :) I will reread it a few times to understand the concepts you explained which are good for many people to know, though I doubt I'd ever be able to actually implement any of them. My brain has shown a strong allergy to custom logic.

I have a bunch of Goldfire components here. Do you think they may ever be used, or should I sell them for what I can get?
If you are able to do PAL (or rather GAL) code, you can do this. Just think of it in terms of a decoder that does not only use addresses as its inputs, but also distinguishes between read and write, and changes its decoding based on extra inputs. Basically, all that separates an ordinary decoder from one capable of shadowing is whether those extra inputs come from, say, DIP switches or from the output of a flip-flop that you can set or reset by accessing some address. The only thing remaining is to figure out what needs to be where in order to have all the bits and pieces of hardware and data accessible, and to satisfy the rules set by the CPU - such as where the reset vectors are supposed to be, so you need something non-volatile there to supply them, etc.

There are a few tricks commonly used in such scenarios. For instance, because you need two long words available to the CPU at addresses 0 and 4 at startup, but later want RAM there, you start off by decoding some kind of ROM with init code both at address 0 and wherever you want it later - for instance the top 64k of the 16M of available space, which would be FF0000h. So now you have two 'aliases' of the same ROM: CPU address 0 maps to address 0 in the ROM, but so does CPU address FF0000h.

Now, assume the actual code starts at, say, offset FF of the ROM, and remember that the CPU looks at address 4 to find the address where code execution starts. Instead of putting 0000FFh there (which would make the CPU jump to address FF and start executing from there after reset), you put FF00FFh as the start address, knowing that address FF is eventually not going to contain the ROM any more. This targets the very same ROM through the alias at FF0000h, which will remain in place, so the CPU jumps there immediately after reset and can continue executing from the ROM (obviously the code needs to be position independent, but on the QL that is practically a requirement anyway). You also set up some logic that treats an access to any address above, say, 800000h as a signal that the ROM should no longer appear at address 0 - in essence, the decoder gets an extra input from a flip-flop that is set when A23 becomes 1 and reset when the system is reset; it could hardly be simpler. The very act of jumping to the init code at FF00FFh (i.e. the CPU reading its first instruction from an address with A23 = 1) then automatically sets up the correct address mapping.
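
Expressed as logic, the whole thing is just a decoder with one extra input. Here is a toy model of it, written in C instead of GAL equations so it reads easily; the 800000h threshold and the alias at FF0000h come from the example above, while the device names are just made up for illustration.

Code:

#include <stdint.h>
#include <stdbool.h>

/* Toy model of the boot-time shadowing decoder described above. 'booted'
   stands in for the flip-flop: cleared by reset, set by any access with
   A23 = 1. Device names are invented. */

enum device { DEV_BOOT_ROM, DEV_RAM, DEV_HIGH };  /* DEV_HIGH: ROM alias, I/O... */

static bool booted;                                /* the flip-flop */

static void system_reset(void) { booted = false; }

static enum device decode(uint32_t addr)
{
    if (addr & 0x00800000u) {      /* A23 = 1: this very access sets the flip-flop */
        booted = true;
        return DEV_HIGH;           /* the ROM alias at FF0000h lives up here       */
    }
    if (!booted)
        return DEV_BOOT_ROM;       /* before boot the ROM also answers address 0   */
    return DEV_RAM;                /* afterwards RAM owns the low addresses        */
}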

Once you grasp the basic principle, it's really not too difficult to apply it several times to get what you want - you just need to think about what goes on during initialisation. For instance, you could use an alternative approach which puts the ROM at address 0 but only for reads, while RAM is also mapped to address 0 and accessed only for writes. Since fetching the init vectors and instruction code is just that - reading - the CPU will access only the ROM, as long as the code does not write to the same addresses. It WILL in fact want to write there in order to create a shadow copy in RAM; at that point there is no way to read back the actual contents of that RAM area, as reads of those addresses still return data from the ROM. The ROM code can thus copy whatever it likes from any readable address into the RAM 'behind' the ROM. This makes the address decoder simpler, since at the later stage it only needs to start decoding the RAM for reads and nothing for writes (writes are ignored to prevent shadow RAM corruption). BUT it has the disadvantage that you cannot easily check whether the data was written to the RAM properly (i.e. that the RAM works as intended and has no errors at those addresses), because any given address either reads from ROM and writes to RAM, OR reads from RAM and ignores writes - so you can't, on the fly, read from the ROM, write back to RAM, then read from the RAM and check whether it was written correctly, which would confirm the RAM is working properly.
The way around this is to produce a 'checksum' (actually a CRC, which is more sophisticated and far less prone to missing an error) of all the data that was read, and then, once the shadow RAM is activated for reading (which makes it emulate a ROM), have code in place that reads the contents of the shadow RAM, computes a CRC over it, and compares that with the CRC made from the actual ROM. If they are not the same, the contents of the shadow RAM are corrupt and the system reports an error and halts. If they are the same, the shadow copy is verified and code can continue to execute from it.
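
The check itself is nothing special - the same CRC routine run twice over the same address range, once while reads still come from the ROM and once after the shadow RAM has taken over. A plain CRC-32 would do, for example (just an illustration; I'm not claiming any particular firmware uses this exact polynomial):

Code:

#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320) over a memory range.
   Slow but tiny - fine for a one-off boot-time check. */
static uint32_t crc32_range(const volatile uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical boot-time usage (base/size are placeholders):
   1) while reads still hit the ROM:   rom_crc = crc32_range(base, size);
   2) flip the decoder so reads hit the shadow RAM;
   3) if (crc32_range(base, size) != rom_crc) report an error and halt.   */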

The hardware resulting from any of these approaches (and there could be many others along the same lines) is actually very simple, but the scenario of what needs to be done when has to be carefully considered for the system to work, so that thought process and the code to do this is really where the work is.

Re: Fun things to do with an MC68EC020....

Posted: Tue Mar 26, 2013 7:18 pm
by Dave
I think that's the thing. I learned to design PCBs from schematics and to assemble PCBs. The schematic and logic design... not my thing.

Me spending weeks designing logic is about as good a use of my time as you spending weeks assembling Auroras. Your time is far better spent designing things, and offloading the assembly to someone who is into that ;)

No, that's not a hint AT ALL, ooooh no! :P

Re: Fun things to do with an MC68EC020....

Posted: Mon Apr 01, 2013 5:20 pm
by 1024MAK
Some great information in this thread. Thank you.

Some other 68k CPU systems use a shadow (or part shadow) of the ROM at CPU address 0 and address 4. After reset, the CPU reads the ROM at address 0 and address 4, then the logic disables the shadow of the ROM at these addresses and maps RAM there instead. Only a CPU reset or power on reset will re-enable the ROM shadow at address 0 and address 4.

For a development board / simple proof of concept, in a system with not too complex logic, you could also use fast "standard" logic chips instead of a PLD.

Mark

Re: Fun things to do with an MC68EC020....

Posted: Thu Apr 04, 2013 1:28 pm
by Nasta
1024MAK wrote:Some great information in this thread. Thank you.
For a development board / simple proof of concept, in a system with not too complex logic, you could also use fast "standard" logic chips instead of a PLD.
Mark
Actually, using a GAL (a simple PLD) for this kind of task is almost ideal and, when it comes down to it, nearly trivial - assuming you have some sort of development software (a compiler) and a programmer capable of programming your chosen device. Most SPLDs are not in-circuit programmable, but in this simple case that may even be an advantage, as you don't have to mess around with special pins that are either dedicated or need to be kept accessible for programming. Just use a socket, and off you go.

Programming simple (and not so simple) decoder logic into a GAL is one of its reasons for being. The advantage is that in the great majority of cases (and probably in all cases, given a bit of forethought) the pinout can stay the same, so you can decide which pins will be the decoder's inputs and outputs and program the chip later - and of course, since GALs are reprogrammable, you can change the logic as it suits you. Another advantage is that the delays the decoder introduces are 99% independent of what it actually decodes, additional 'rules' are easy to add (within the limits of the GAL's logic capacity), and, most importantly, putting together the required logic is MUCH simpler than with traditional 74xx parts, because the way you do it is always the same. It can be as simple as constructing a table of address line states to decoder outputs.
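
To give a flavour of where that table ends up, here is a chip-select split written as plain sum-of-products logic. It's in C only so it can be read (and tested) anywhere - in a GAL compiler the same thing is a couple of one-line equations - and the address split itself is a made-up example, not any real QL map.

Code:

#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Made-up 16M map decoded from the top two address lines only:
     A23 A22   device
      0   x    RAM  (lower 8M)
      1   0    ROM  (8M-12M)
      1   1    I/O  (12M-16M)
   Each select is a small sum-of-products of address bits - exactly the kind
   of equation a GAL holds. */

static bool a(uint32_t addr, int bit) { return (addr >> bit) & 1u; }

static bool ram_sel(uint32_t addr) { return !a(addr, 23); }
static bool rom_sel(uint32_t addr) { return  a(addr, 23) && !a(addr, 22); }
static bool io_sel (uint32_t addr) { return  a(addr, 23) &&  a(addr, 22); }

int main(void)
{
    assert(ram_sel(0x000000u) && !rom_sel(0x000000u) && !io_sel(0x000000u));
    assert(rom_sel(0x800000u) && !ram_sel(0x800000u) && !io_sel(0x800000u));
    assert(io_sel (0xC00000u) && !ram_sel(0xC00000u) && !rom_sel(0xC00000u));
    return 0;
}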

Whole computers have been built this way in the past - sometimes even when larger PLDs and gate arrays were available. I remember a DEC workstation whose entire system-dependent hardware, including the video system, was built out of 24 26V12 GALs (a very capable SPLD which is sadly obsolete now :( ), plus some smaller GALs. A long time ago (seems almost like a lifetime now :) ) I built a predecessor to the Aurora of sorts, which used a 640x480 monochrome LCD - a true stand-alone graphics card, using 128k of SRAM, a buffer, a few counters and 6 GALs to implement its entire logic.

Things do get more complex when SPLDs are used for sequential logic, i.e. circuits employing some sort of memory (read: latches and flip-flops) but for simple combinatorial logic, they are great, and used in the millions even today.

Re: Fun things to do with an MC68EC020....

Posted: Thu Apr 04, 2013 2:05 pm
by Dave
I have been using a GAL for address decoding in my little project. Aside from simple address decoding, it also generates the two bus width inputs for the 68EC020.

My older EPROM programmer has a neat function: I can place an existing PAL or GAL in it and it will sequentially go through and work out the programming. It also allows me to get the programming of two GALs and combine them into a larger device. This makes duplicating them easy even if I don't have the JEDEC file.

The only way I have learned to do this myself is the simpler method Nasta describes above - I make a chart of inputs and outputs. Everything after that is magic.

I really do need to get an education!

Re: Fun things to do with an MC68EC020....

Posted: Thu Apr 04, 2013 4:37 pm
by twellys
Dave wrote: The only way I have learned to do this myself is the simpler method Nasta describes above - I make a chart of inputs and outputs. Everything after that is magic.

I really do need to get an education!
I went to one of the best Unis in the world, and all it taught me was to make a chart of inputs and outputs.

Look at me now...

[Pans back from a close-up of me to include the milk float]

:ugeek: !

Re: Fun things to do with an MC68EC020....

Posted: Fri Apr 05, 2013 5:47 pm
by Nasta
Well, ditto here. Basically we did learn lots of 'manual' methods for simplifying logic, but in the end there are two methods that both essentially translate to 'make a table of inputs and outputs'. Figure out which inputs are don't-cares for which case - those won't get mentioned in the equations :)

Even of those two methods, one is not directly suitable for a PAL/GAL. These devices use a sum-of-products architecture, where wide AND gates can have any of the inputs (true or inverted) fed to them, but there is only a fixed number of such gates, and they all OR together into the output pin. For smaller devices like the 16V8 and 20V8, the OR gates are 8 inputs wide. This, however, is enough for 99% of the world's decoders :) and a whole lot of other useful things.
Now, when looking at the tables you make, you normally collect all the combinations of inputs that produce a logic 1 at the output, but sometimes you run out of OR terms. In most cases it then helps to collect the combinations that produce a logic 0 instead, and specify the output as inverted.
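
A tiny example of that inversion trick: suppose an output should be 1 for seven of the eight combinations of three inputs, and 0 only when all three are high. As a sum of the '1' rows it needs seven product terms; as the inverse of the single '0' row it needs one. That's just De Morgan, shown below as C expressions rather than GAL equations.

Code:

#include <assert.h>
#include <stdbool.h>

/* The same function two ways: OR of the seven '1' rows, versus the inverse
   of the single '0' row. A GAL output can be programmed active-low, which
   lets you use the short form when product terms run out. */
static bool long_form(bool a, bool b, bool c)
{
    return (!a && !b && !c) || (!a && !b &&  c) || (!a &&  b && !c) ||
           (!a &&  b &&  c) || ( a && !b && !c) || ( a && !b &&  c) ||
           ( a &&  b && !c);
}

static bool short_form(bool a, bool b, bool c)
{
    return !(a && b && c);             /* one product term, output inverted */
}

int main(void)
{
    for (int i = 0; i < 8; i++)
        assert(long_form(i & 4, i & 2, i & 1) == short_form(i & 4, i & 2, i & 1));
    return 0;
}
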
And with this we conclude the official university material on simple PLD logic :) In most cases you don't even need to go that far: the logic compiler (the one that makes the JEDEC file) will normally perform logic optimisation with the actual GAL device you select in mind.

There are, of course, extra bits like output enables and registered logic, that would be advanced stuff :)