Posts posted by EtchedPixels

  1. With suitable level shifters you can simply wire the 6502 directly to the FPGA. There are similar projects that use a PIC or similar device to do this.


    Another option, which has its own big set of advantages, is to use one of the 6502 FPGA cores and put the entire thing on the FPGA. That way you can have your 32MHz 6502 with 512KB of banked memory and all the other bits you fancy.



  2. Most of a multi-core CPU is the memory interface. At least you've got dual ported RAM (with the odd fun 'feature' - definitely read the chip documentation!) so some of the horrible bits are done for you.

  3. You don't need call/return, providing you can get access to the PC somewhere.


    Ultimately however all 8bit microprocessor designs that pursue elegance evolve into a 6809 (and I say that as a Z80 fan)



    You don't actually need a lot of instructions to get a pretty effective processor, but how easy it is to program for and how short the code is are rather different questions. The 6502 for example is pretty minimal and quite effective (if a PITA to program), while the 8008 is minuscule but does lack a proper stack and arbitrary depth call/return.


    In your instruction set I'd say you can drop NAND (you have NOT and AND), and you can drop NOT (XOR with all ones does the same job). You can in theory even drop SUB as you have ADD and XOR (thus NOT), since a - b is just a + NOT b + 1.
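A minimal sketch of that reduction, assuming an 8-bit word and using hypothetical helper names (`op_and`, `op_xor`, `op_add`) to stand in for the three primitives that remain:

```python
MASK = 0xFF  # assumption: 8-bit machine word, purely for illustration

# The three primitives we keep.
def op_and(a, b):
    return a & b & MASK

def op_xor(a, b):
    return (a ^ b) & MASK

def op_add(a, b):
    return (a + b) & MASK

# Everything below is built only from the primitives above.
def op_not(a):
    # NOT is XOR with all ones.
    return op_xor(a, MASK)

def op_nand(a, b):
    # NAND is AND followed by NOT.
    return op_not(op_and(a, b))

def op_sub(a, b):
    # Two's complement: a - b = a + (NOT b) + 1.
    return op_add(op_add(a, op_not(b)), 1)
```

The cost is code size: each dropped instruction turns into two or three primitive ones at every use.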


    Some of the bigger machine word systems also didn't have a jump instruction as such, you merely need store-conditional and you can treat program counter as a register. That also makes stacks or register link calls trivial


    SL/SR can both be replaced with the more useful rotate operation which is as cheap to implement but can do shift left/shift right/ rotate left/rotate right if combined with AND
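As a rough illustration of the rotate point, here is a sketch assuming an 8-bit word where the only hardware operation is a left rotate; `rol`, `ror`, `shl` and `shr` are names made up for the example:

```python
WIDTH = 8                   # assumption: 8-bit word
MASK = (1 << WIDTH) - 1

def rol(x, n):
    # The single rotate primitive: bits falling off the top wrap to the bottom.
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK

def ror(x, n):
    # Rotate right by n is rotate left by WIDTH - n.
    return rol(x, WIDTH - (n % WIDTH))

def shl(x, n):
    # Shift left: rotate, then AND away the low n bits that wrapped round.
    return rol(x, n) & (MASK << n) & MASK

def shr(x, n):
    # Shift right: rotate, then AND away the high n bits that wrapped round.
    return ror(x, n) & (MASK >> n)
```

The masks are constants for a fixed shift count, so each shift costs one rotate plus one AND.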


    I suspect you can implement call/return and the stack ok as you've got register relative ops so you can use a register of your choice as stack. The only ugly would be that you basically end up doing "load register with constant computed at link time", stick it in (Rstack), Rstack += 2, JMP xx, and your 'RET' is slightly ugly too
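The sequence can be sketched as a toy model. Everything here is an assumption for illustration: the `state` dictionary, a 2-byte return address, and a stack that grows upwards as in the description above.

```python
def call(state, target, return_addr):
    # CALL built from primitive steps:
    # 1. store the return address (a constant fixed at link time) at (Rstack)
    state["mem"][state["rstack"]] = return_addr
    # 2. Rstack += 2
    state["rstack"] += 2
    # 3. JMP target
    state["pc"] = target

def ret(state):
    # RET undoes it: Rstack -= 2, then jump to the address stored there.
    state["rstack"] -= 2
    state["pc"] = state["mem"][state["rstack"]]

state = {"mem": {}, "rstack": 0x8000, "pc": 0x0100}
call(state, target=0x0200, return_addr=0x0106)   # outer call
call(state, target=0x0300, return_addr=0x0208)   # nested call
ret(state)                                       # back in the first callee
```

Because the return address lives in memory rather than a fixed link register, calls nest to any depth the stack region allows.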

  4. If you are doing an RT system then you can hide interrupts from users by making an interrupt an event. You never see an "interrupt", your thread just gets woken up again. Making that work often needs priorities and support for priority inversion handling, but I suspect you need them anyway, plus deadlock detection if you want it to be reasonably userproof.


    Controllers monitoring bits of I/O space seems sensible - it's not new, floppy controllers generally polled the disk change lines of the disks and turned that into an IRQ. Some ethernet controllers support similar PHY polling schemes, so the hardware polls the PHY regularly and checks for certain changes.

  5. Design West 2013 (IIRC) say he preferred his FPGA designers to use VHDL instead of Verilog to make it obvious that they aren't using a programming language and must think differently :-)


    That's confusing given that VHDL is a sort of bastardised Ada, which is (allegedly) a programming language.

  6. I can only speak for the Linux case, but I think Linux will be quite happy with physical mappings in supervisor mode. Do you need to force a page size or can you match on a base/mask pair as the 68010/68451 pair did? Physical without proper caching would be bad though. I guess with 16MB TLB entries for the kernel it wouldn't be too bad.


    I guess the other alternative is segment based addressing 8) There are reasons a lot of the earlier microprocessors with memory protection used segments even if it made programming them less fun in some cases (x86 due to the 16bit size). Does make full virtual memory harder but it makes the MMU architecture much simpler because you cache the entry with the segment register. Would limit you to ucLinux but with protection (although in theory with a bit of core kernel hacking you could also get fork() etc working) or perhaps a retrobsd/2BSD.


    Would going to 8 or 16K pages help? It seems like it would also help performance, especially if your code isn't very compact. There's definitely going to be a trade-off between how much time you spend reloading TLB entries and efficiency of memory use. 16K pages isn't that unreasonable, and x86 is really only 4K nowadays because of compatibility. 16K ought to mean fewer misses and two fewer match bits to worry about in the CAM.
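To put numbers on the page size trade-off, here is a small sketch. It assumes a 32-bit virtual address and an 8-entry TLB; both function names are invented for the example.

```python
def tlb_tag_bits(va_bits, page_size):
    # Bits of the virtual address the CAM has to match on:
    # everything above the in-page offset. page_size must be a power of two.
    offset_bits = page_size.bit_length() - 1
    return va_bits - offset_bits

def tlb_reach(entries, page_size):
    # Amount of memory covered without taking a miss.
    return entries * page_size

for size in (4096, 8192, 16384):
    print(size, tlb_tag_bits(32, size), tlb_reach(8, size))
```

Going from 4K to 16K pages takes the match width from 20 bits to 18 and quadruples what 8 entries can cover, at the price of more memory wasted in partly used pages.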


    Other trick is to ignore some bits of the virtual address space for now (and support it later as needed). Some 64bit cpus do this today.


    Not sure I'd bother with a context/ASID. If you only have 8 entries then it'll be cheap to save/reload them on a task switch, and if that lets you have more TLB entries then I imagine that would be a bigger win?

  7. There is plenty wrong with VHDL and Verilog but I have to say the biggest problem I (and I think many people from a programming background) have is the business of thinking in parallel. Not just the idea that things like assignments take time and aren't instant but things like the fact that (except for power in some cases) it's actually not worth doing conditional evaluation of something, you can evaluate it every clock at no extra cost, in fact you can evaluate hundreds of un-needed things for free just in case they are relevant to a given cycle.


    Not sure a language can help much with that. There is simply a gap between the conceptual model of programming and the reality of FPGA.


  8. It's nastier than that if you are not very careful. Consider the sequence



    TLB miss

    fetch TLB miss handler instruction, oh bugger it's not there -> BOOM



    TLB miss

    fetch instruction

    save old stack pointer, oh bugger -> BOOM


    (and that's a general trap handling issue with TLB misses - where do you put the trap vector and restart data that won't itself cause a TLB miss)



    I would vote for running the TLB miss handler physically mapped. In fact if you don't have many TLB entries, or your TLB entries don't have a size field I'd vote for running "supervisor mode" code physically mapped always.


    If you've only got fixed size say 4K TLBs then you have to take hits executing kernel code, which is stupid, and you have some other horrible cases (the infamous one is drawing a vertical line on a frame buffer)
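The vertical line case is easy to see with some arithmetic. This sketch assumes a 1024x768 frame buffer at 4 bytes per pixel, so each row is exactly one 4K page; those dimensions are illustrative, not from any particular hardware.

```python
PAGE_SIZE = 4096
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1024, 768, 4   # assumed frame buffer layout
STRIDE = WIDTH * BYTES_PER_PIXEL                # bytes per row = one page here

def pages_touched(addresses, page_size=PAGE_SIZE):
    # Number of distinct pages (so distinct TLB entries) a set of accesses needs.
    return len({addr // page_size for addr in addresses})

# Horizontal line along row 10: consecutive pixels share a page.
horizontal = [10 * STRIDE + x * BYTES_PER_PIXEL for x in range(WIDTH)]

# Vertical line down column 100: every pixel lands on a different page.
vertical = [y * STRIDE + 100 * BYTES_PER_PIXEL for y in range(HEIGHT)]

print(pages_touched(horizontal), pages_touched(vertical))
```

With a handful of fixed 4K entries the vertical line takes a TLB miss on essentially every pixel.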


    On x86 we try and do things like map the Linux kernel and its view of physical RAM using large pages, because even with a hardware TLB fetcher and a big TLB the TLB misses hurt.


    Another approach used by some processors is to in effect sacrifice a couple of bits of virtual address space to "direct mapped" and "uncached" and things like that.


    Then it becomes: if address[31] is set, the access is physically mapped at "0" & address[30 downto 0], else it goes through the TLB.


    If you do that then I think you are probably ok providing the user makes sure their TLB trap vector, code and the like is all in physical space.
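A software model of that top-bit scheme might look like the following. It is only a sketch: the TLB is modelled as a plain dictionary from virtual page number to physical page number, and the 4K page size is an assumption.

```python
PHYS_BIT = 1 << 31   # address[31] selects the direct physical window

def translate(vaddr, tlb, page_size=4096):
    if vaddr & PHYS_BIT:
        # Direct mapped: "0" & address[30 downto 0], never consults the TLB.
        return vaddr & (PHYS_BIT - 1)
    vpn, offset = divmod(vaddr, page_size)
    if vpn not in tlb:
        # In hardware this would raise the TLB miss trap.
        raise LookupError("TLB miss")
    return tlb[vpn] * page_size + offset

tlb = {0x1: 0x5}                      # virtual page 1 -> physical page 5
print(hex(translate(0x80001234, tlb)))   # direct window, cannot miss
print(hex(translate(0x00001234, tlb)))   # goes through the TLB
```

Anything placed in the direct window (trap vector, miss handler, trap stack) can be fetched with an empty TLB, which is what breaks the recursion.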

  9. Most of the processors that do this dump enough internal state onto the stack for the trap and then throw the lot at the OS and say "you clean it up". For some processors this was *evil* but usually consisted of a chunk of nasty to understand assembler the manufacturer provided.


    I would think if your instructions are restartable then you probably only need to know PC of trapping instruction and delay slot flag. At that point you can reconstruct and resume execution (you might need to know if the jump was taken).


    So I think I'd push

            condition codes

            flags [delayslot, etc]


    onto the trap stack or even push both a trap pc and a resume pc (the same except for delay slots when resume pc is the



    It becomes something like

                           restartaddr = stack[trap_pc];
                           if (stack[flags] & DELAY_SLOT) {
                              restartaddr -= JUMP_SIZE;
                              stack[trap_pc] = restartaddr;
                           }



    (restores condition code, continues in usermode. The branch will be re-executed and go the same way as before)
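The same fix-up written out as something runnable, as a sketch only. The trap frame is modelled as a dictionary, and `DELAY_SLOT` and `JUMP_SIZE` are assumed values rather than anything from a real instruction set.

```python
DELAY_SLOT = 0x01   # assumed flag bit: trap happened in a branch delay slot
JUMP_SIZE = 4       # assumed size in bytes of the branch instruction

def fix_restart_address(frame):
    # If the trapping instruction sat in a delay slot, back up to the branch
    # itself so the branch is re-executed on return from the trap.
    restart = frame["trap_pc"]
    if frame["flags"] & DELAY_SLOT:
        restart -= JUMP_SIZE
        frame["trap_pc"] = restart
    return restart

in_slot = {"trap_pc": 0x1004, "flags": DELAY_SLOT}
plain = {"trap_pc": 0x2000, "flags": 0}
print(hex(fix_restart_address(in_slot)), hex(fix_restart_address(plain)))
```

This relies on the branch being restartable, so its condition inputs must be part of the saved state.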





    Probably even hideable in hardware. One thing that's nice about hiding it is that you keep compatibility if your behaviour has to change in future processors (eg x86 hides all sorts of parallelism in the real processor when it comes to throwing exceptions)

  10. The SD only works in UZI and is a bit iffy as it's driving it in SPI mode (1,1) not (0,0) as it should. I've added support for other SPI modes to the spimaster VHDL but so far only tested 0,0 with ethernet. Once it's a bit more tidied up I'll put up a new version of the 'classic' build with CP/M 3.x including SD support, ethernet SPI port etc.

  11. You might also want to look at Will's 'SocZ80'. That's got all the VHDL and bits plus a bit file so you can just drop it onto the Papilio Pro and then fiddle with it.


    They do seem to grow however. I think I'm going to have to make a case for my SocZ80 now it's got ethernet, SD card and a few other bits dangling off it and is beginning to look more like a jellyfish 8)



  12. USB host is hard because it's a very complicated protocol you need to speak and the timings are tricky. USB slave is a fair bit simpler, at least at the lower speeds.


    Simply using the USB connector is also not necessarily easy because on most FPGA boards it isn't wired to the FPGA directly in a useful fashion.


    The other problem for board choice and software is that Haider is in Iraq, and a lot of FPGA devices and software are on the controlled lists so export rules may be a problem (especially as the US enforces them its end with the 'shock and awe' scheme).


    Hamster: I assume you've seen, which only needs a UTMI buffer ?




  13. Playing with this today getting it working in SocZ80


    Useful things I have learned


    - 10MHz or less

    - SPI mode 0,0 only

    - 100ns is needed between deselect and reselect

    - 100ns is needed between the transaction end and deselection


    That was far too much fun because the chip responds to just about any SPI protocol timing violation of the above by almost working: everything works except the MAC address writes, which mysteriously fail a lot.


    I have added SPI mode 0 to the SocZ80 SPI master and figured out the delays. Unlike on the little microcontrollers that the libraries are written for, on a 128MHz Z80 on FPGA you can violate them! I can now send and receive packets. On the bright side, since we now do SPI 0,0 I can switch Will's SD card driver to use the correct SPI modes, which might explain why some of the cards I have didn't work.
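For a feel of why a fast core can trip over those limits, here is the arithmetic as a sketch. It assumes the SPI clock is derived by a plain integer divide of the 128MHz system clock, which may not match how the real SPI master divides; the function names are made up.

```python
def min_cycles(delay_ns, clock_hz):
    # Smallest whole number of clock cycles lasting at least delay_ns
    # (integer ceiling division, no floating point).
    return -(-delay_ns * clock_hz // 1_000_000_000)

def min_divider(clock_hz, max_spi_hz):
    # Smallest integer divider that keeps the SPI clock at or below the limit.
    return -(-clock_hz // max_spi_hz)

CPU_HZ = 128_000_000
print(min_cycles(100, CPU_HZ))           # cycles to wait around chip select
print(min_divider(CPU_HZ, 10_000_000))   # divider for a 10MHz ceiling
```

At 128MHz a 100ns gap is 13 clocks, so back-to-back I/O instructions are not automatically slow enough, whereas on a microcontroller running at a few MHz a single instruction already exceeds it.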



  14. 7segs are a nice learning exercise in strobes and also quite handy for debug on other projects, just with resistors this time perhaps 8)


    An LCD would be cool, but if you have some handy SPI pins on the top or a suitable connector then you already have the bits to hook up one of the little Adafruit displays if wanted, assuming the Papilio Duo can source the 100mA+ it needs?