alvieboy Posted September 22, 2013

Hey guys,

I am developing a new CPU (for fun and beyond), which aims to replace the slow ZPU we have been using so far. The new CPU design is coming along very well. It should match, and eventually outperform, the Xilinx Microblaze in program size, performance (MHz) and implementation size (well, perhaps not this last one, let's see).

The CPU is 32-bit and RISC-like, with 31 general purpose registers, a zero register, and a few special registers. It's a hybrid of well-known CPUs like Microblaze, ARM, SPARC and others. All instructions are 16-bit, and can be extended for immediate values. It has 2 to 5 asymmetric ALUs, which in certain scenarios allow the CPU to execute two (or more) instructions at the same time. All normal addressing modes are supported. The design uses 3 to 6 pipeline stages, depending on configuration. All branch instructions have delay slots.

The objective is a fast superscalar CPU (somewhere between 100 MHz and 166 MHz) that fits nicely on a PPro/Papilio One while using the same Wishbone interface as ZPUino does.

Current state: it works in simulation and the assembler/linker is already working. The C/C++ compiler (LLVM) is still missing.

Now... I really need to name it, and this is where I need your advice and help. The best name I've found so far is "XThunderCore", abbreviated "XTC". What are your ideas? Can you come up with a better name for it?

Best,
Alvie
Jack Gassett Posted September 23, 2013

Hmmmm, this sounds really exciting. I like the name ThunderCore, and it is also important to have a nice abbreviation. I'm going to put my thinking cap on for the next couple of days, but honestly, I don't think I can top that name. ThunderCore! I process like the storm! Or, this processor brings the thunder to your code! I like ThunderCore.

Jack.
alex Posted September 23, 2013

Yeah, ThunderCore. Also the X in front makes it more eXtreme, so really, you can't beat that.
alvieboy Posted September 23, 2013

Actually, the "X" there ended up serving several purposes. First, nothing shows up on Google for it (which is a good thing). Second, the sound of "XThunder" seems to emphasize the "Thunder" part, like a very, very impressive one. Also, the "XT" part might suggest "eXTreme" (or eXTended), which is also a good thing.

You know, all through the implementation, right up to now, I've been calling it "newcpu". Such a dumb name... And all vendors seem to adopt names that sound a bit odd: "SPARC" (Scalable Processor ARChitecture), "SuperH", "Blackfin", "S+core", "Microblaze", "Picoblaze", "XGate", "XStormy", "XTensa", "Dragonfly". We need something as powerful as these.
hamster Posted September 23, 2013

HP went through a "fishy" phase with "Mako" (PA-8800) and "Shortfin" (PA-8900). I think they missed out by not using 'Orca', maybe as a backronym for Open Reusable CPU Architecture?
Jack Gassett Posted September 23, 2013

My vote is for XThunderCore, much better than newcpu. Now, if only my Denver Broncos could bring the thunder to Monday Night Football tonight...
alex Posted September 23, 2013

Yeah, let's stick to football. Nobody mention the c - u - p shhhhhh...
F6EEQ Posted September 24, 2013

What is CUP?? A football fan acronym... or Completely Upsetting (the hacker) Processor :D ??

BTW Alvie, not only do you do a super job, you also have bright ideas for naming. I vote 100% for your XTC.
alex Posted September 24, 2013

America's Cup sailing. It brings our office to a standstill every morning. Nail-biting really, the finalists are literally neck and neck.
alex Posted September 25, 2013

Wow, that was a great comeback for Team Oracle USA, hands down. It was absolutely shocking watching our Kiwi boys have such a huge lead and then manage to snatch a great defeat from the jaws of victory.
Jack Gassett Posted September 26, 2013

Man, I need to start watching, it sounds exciting!

Jack.
hamster Posted September 26, 2013

[geek on] I just want to know how they stabilise the boats while they are up on the hydrofoils....
alex Posted September 26, 2013

Quoting Jack: "Man, I need to start watching, it sounds exciting!"

I think you missed the boat, so to speak. It's all over now, maybe next time. I imagined they would have made a big deal about it in the news over there. Front page news material, unless it was only a big deal in San Francisco.
Jack Gassett Posted September 26, 2013

Maybe they did, but I bury my head in the sand when I'm working on something big. Right now I'm trying to get the SID filters for the RetroCade released, so I haven't tuned into the news or anything else for the last week!
alvieboy Posted October 3, 2013

Guys, I am happy to announce that the first implementation of XTC, running a simple assembly program that does something unusual (it prints "Hello World!" through the serial port), actually works on a Papilio Pro!!!!

The assembly program is very simple. I'm posting it here so you can see some of the XTC assembly instructions:

```
.text
.globl _start
_start:
    limr  0x80000000, r3    /* Load IO base address into r3 */
    limr  55, r6            /* 104MHz. Baud rate: 115200, 16x oversample, gives 55 for baud divider */
    copy  r4, r3            /* r4 <- r3 */
    addi  4, r4             /* Add 4 for the UART control register */
    stw   r4, r6            /* Store baud rate divider in UART control reg */
.endless:
    limr  mystring, r2      /* Load mystring offset into r2 */
    call  putstring, r0     /* Call putstring */
    nop
    call  delay, r0         /* Delay for a while */
    nop
    bri   .endless          /* Repeat */
    nop

.global delay
delay:
    limr  0x400000, r2      /* 0x400000 loop iterations */
.wait:
    or    r2, r2            /* Is r2 zero? */
    brine .wait             /* No, jump back to .wait ... */
    addi  -1, r2            /* ... and decrement r2 (this is the delay slot) */
    ret
    nop

.global putstring
.type putstring, @function
putstring:
    limr  2, r5             /* Load 2 into r5 */
.waitready:
    ldw   r4, r1            /* Load the UART control register */
    and   r1, r5            /* Check if bit 1 is set (AND with 2) */
    brine .waitready        /* Bit set: UART is still busy, jump back to .waitready */
    nop
    ldb+  r2, r1            /* Load a char from the string (at r2) into r1, increment r2 */
    or    r1, r1            /* Is it a null char? */
    brine .waitready        /* No, not a null char, jump ... */
    stw   r3, r1            /* ... but store it in the UART transmit register (this is the delay slot) */
    ret                     /* Return from subroutine and ... */
    limr  0, r1             /* ... set r1, the subroutine return value, to zero (this is the delay slot) */

.data
.global mystring
mystring:
    .string "Hello World!\r\n\0"    /* Our string! */
```

There are still a few things to tune, but seeing it working made me very happy.

Alvie
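The 55 in the second `limr` can be reproduced with a quick calculation. This is a minimal sketch that assumes two things the post does not state: the UART counts (divider + 1) clock cycles per oversample tick, hence the "- 1", and the divider is computed with integer division. That convention is the one that yields 55 from 104 MHz, 115200 baud and 16x oversampling. The function names are hypothetical.

```python
def baud_divider(clk_hz: int, baud: int, oversample: int = 16) -> int:
    """Divider for a UART that (by assumption) counts divider + 1 clocks per tick."""
    return clk_hz // (baud * oversample) - 1


def actual_baud(clk_hz: int, divider: int, oversample: int = 16) -> float:
    """Baud rate actually produced by a given divider, under the same assumption."""
    return clk_hz / ((divider + 1) * oversample)


if __name__ == "__main__":
    clk = 104_000_000
    div = baud_divider(clk, 115_200)
    rate = actual_baud(clk, div)
    error_pct = (rate - 115_200) / 115_200 * 100
    print(div, round(rate), f"{error_pct:.2f}%")  # prints: 55 116071 0.76%
```

The integer division rounds 56.4 down to 56, so the real rate comes out slightly fast, by about 0.76%.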
hamster Posted October 3, 2013

Neat! Why the NOPs after the instructions that change the program counter (call/ret/branch...)? Something to do with the pipeline? How will that fit in with interrupts (or do you need an NOP at the start of the interrupt handler?)
alvieboy Posted October 3, 2013

The nop (or other instruction) after a branching instruction sits in what is called a "delay slot". The delay-slot instruction is executed even if the branch is taken, and it is used to increase throughput by hiding pipeline latency. http://en.wikipedia.org/wiki/Delay_slot

This has almost no impact on interrupts. If the delay slot is being executed when the interrupt occurs, the interrupt is delayed until the next cycle. The things that impact interrupts a bit more are loads/stores, multiplications and immediate loading.

Since the architecture specifies 16-bit instructions, loading a 32-bit value into a register, for example, might take more than one instruction. This is done using an internal register called "immreg", which can be filled in chunks. Take a look at the first instruction:

limr 0x80000000, r3

This will be expanded into 3 assembly instructions:

```
0: 8800  imm  0x800       // Load lower 12 bits into immreg
2: 8000  imm  0x000       // Shift immreg left by 12, set lower 12 bits to 0
4: e00f  limr 0x00, r15   // Shift immreg left by 8, set lower 8 bits to 0, load immreg into r15
```

Immreg now has 12+12+8 == 32 bits.

However, not all values need three instructions. The extra ones only need to be emitted if the value does not fit into the 8-bit immediate field of the instruction. For unknown values (like symbol addresses) we do emit the two extra IMM instructions, but they are "relaxed" afterwards by the linker: once all the symbols are resolved, the extra instructions are removed if not needed. One example:

```
imm  0xFFF
imm  0xFFF
limr 0xFE, r1
```

The value to be set is -2 (0xFFFFFFFE). Since immediate loads are sign-extended, only the last instruction is actually needed.
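The way chunks accumulate in immreg can be modelled in a few lines. This is a sketch under assumptions inferred from the description above, not taken from an XTC spec: the first chunk is sign-extended (12 bits for `imm`, 8 bits for a bare `limr`), each later `imm` shifts immreg left by 12 and inserts its field, and `limr` shifts left by 8 and inserts its field. The function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def immreg_value(sequence: list[tuple[str, int]]) -> int:
    """Hypothetical immreg model: zero or more ("imm", field12), then ("limr", field8).

    Returns the unsigned 32-bit value that limr would load into the register.
    """
    if not sequence or sequence[-1][0] != "limr":
        raise ValueError("sequence must end with a limr")
    acc = None
    for op, field in sequence:
        width = {"imm": 12, "limr": 8}[op]
        field &= (1 << width) - 1
        if acc is None:
            acc = sign_extend(field, width)   # assumed: first chunk is sign-extended
        else:
            acc = (acc << width) | field      # shift immreg left, insert next chunk
    return acc & 0xFFFFFFFF                   # the register holds 32 bits


if __name__ == "__main__":
    print(hex(immreg_value([("imm", 0x800), ("imm", 0x000), ("limr", 0x00)])))  # 0x80000000
    print(hex(immreg_value([("imm", 0xFFF), ("imm", 0xFFF), ("limr", 0xFE)])))  # 0xfffffffe
    print(hex(immreg_value([("limr", 0xFE)])))                                  # 0xfffffffe
```

The last two lines show why the relaxation is safe. Under the sign-extension assumption, the unrelaxed three-instruction form and the lone `limr 0xFE` load the same -2.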
As a rule of thumb, the number of extra instructions for an immediate load is:
- zero, if the immediate is between -128 and 127 (8-bit signed);
- one, if the immediate is between -524288 and 524287 (20-bit signed);
- two (three instructions in total), in all other cases.

This might affect interrupts. Since there is no way to read immreg, interrupts have to be disabled from the first "imm" instruction until the instruction that consumes it (in this case limr, Load IMmediate into Register).

This immediate technique is used in ZPU and Microblaze. For ZPU the imm size is 7 bits. For Microblaze it's 16 bits (Microblaze has 32-bit instructions).
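The rule of thumb above can be turned into a small helper that picks the shortest `imm`/`limr` sequence for a constant. This is the kind of decision a linker relaxation pass makes once a symbol's value is known. It is a hypothetical sketch: the field split (12-bit `imm`, 8-bit `limr`) follows the expansion shown earlier, but the function names are illustrative and not part of the XTC toolchain.

```python
def split_immediate(value: int) -> list[tuple[str, int]]:
    """Shortest imm/limr sequence for a 32-bit immediate (0, 1 or 2 imm prefixes)."""
    if not -(1 << 31) <= value < (1 << 32):
        raise ValueError("immediate does not fit in 32 bits")
    if value >= 1 << 31:                  # accept unsigned spellings such as 0x80000000
        value -= 1 << 32
    if -128 <= value <= 127:              # fits the 8-bit signed limr field
        return [("limr", value & 0xFF)]
    if -524288 <= value <= 524287:        # 12 + 8 = 20-bit signed
        return [("imm", (value >> 8) & 0xFFF), ("limr", value & 0xFF)]
    return [("imm", (value >> 20) & 0xFFF),   # 12 + 12 + 8 = 32 bits
            ("imm", (value >> 8) & 0xFFF),
            ("limr", value & 0xFF)]


def extra_imm_count(value: int) -> int:
    """Number of imm prefixes needed before the final limr."""
    return len(split_immediate(value)) - 1


if __name__ == "__main__":
    for v in (-2, 1000, 0x80000000):
        print(v, extra_imm_count(v), split_immediate(v))
```

For 0x80000000 this reproduces the `imm 0x800`, `imm 0x000`, `limr 0x00` sequence shown above. For -2 it collapses to the single `limr 0xFE`.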
Jack Gassett Posted October 3, 2013

Congratulations Alvie, this is a major milestone for XTC. What is next? gcc?

Jack.
alvieboy Posted October 3, 2013

LLVM is already underway, but it has proven more difficult than I expected. I guess I could try GCC, but I've already spent some effort on LLVM, and the implementation is cleaner than in GCC.
hamster Posted October 3, 2013

Quoting alvieboy: "This might affect interrupts, because since there is no way to read the immreg, we need to disable interrupts before processing the first "imm" instruction and until we have the actual instruction (in this case, limr (Load IMmediate into Register)). This immediate technique is used in ZPU and Microblaze. For ZPU, the imm size is 7 bits, for microblaze it's 16 (microblaze has 32-bit instructions)"

Could immediate loads be achieved through PC-relative addressing? E.g. at the end of the function's code, have a table of constants, then access them with something like regX <= mem[PC+offset] (not knowing the syntax of the assembler you are using). That way constants could be shared, and maybe linking could be achieved just by plugging values into the table of constants?
alvieboy Posted October 4, 2013

It's an option, indeed. But it introduces other problems, like memory access latency stalling the pipeline, and it would only allow an 8-bit PC-relative offset: 4 bits for the opcode, 4 for the register index, and 8 for the offset. Loading values from memory would stall the pipeline completely at that point. One improvement would be to mark only the destination register as "dirty" and allow all other operations that do not use that register to proceed.

Note, however, that XTC is meant to be superscalar in some situations. One of those situations might be automatically executing an "imm" plus the following instruction in a single clock cycle, improving performance. It's a tradeoff between performance and code size.
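Here is a back-of-the-envelope comparison of the code-size side of that tradeoff. It rests on assumptions that are not part of any XTC spec. A full 32-bit constant costs `imm` + `imm` + `limr`, or 3 × 2 bytes, at every use. A PC-relative load is one 16-bit instruction per use. A shared literal-pool entry is one 4-byte word with no alignment padding. The sketch deliberately ignores the pipeline-stall cost described above.

```python
INSN_BYTES = 2   # all XTC instructions are 16-bit
WORD_BYTES = 4   # one 32-bit literal-pool entry (assumed: no padding)


def imm_chain_bytes(uses: int) -> int:
    """Bytes spent building a full 32-bit constant inline (imm, imm, limr) at each use."""
    return uses * 3 * INSN_BYTES


def literal_pool_bytes(uses: int) -> int:
    """Bytes spent if every use is a PC-relative load from one shared pool entry."""
    return uses * INSN_BYTES + WORD_BYTES


if __name__ == "__main__":
    for uses in (1, 2, 4):
        print(uses, imm_chain_bytes(uses), literal_pool_bytes(uses))
```

Under these assumptions the two break even at a single use, and sharing pays off from the second use onwards. For constants that fit the 8-bit `limr` field, the inline form (2 bytes) always wins.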
alvieboy Posted October 4, 2013

Another example, "strcpy":

```
.global strcpy
.type strcpy, @function
strcpy:                     /* Source string in r2, destination string in r3 */
    copy  r1, r3            /* Save destination pointer in r1 (result from function) */
.nextchar:
    ldb+  r2, r4            /* Load char into r4, increment source pointer */
    or    r4, r4            /* NULL (terminating) char? */
    brine .nextchar         /* No, fetch next char ... */
    stb+  r3, r4            /* ... but store it into dest, increment dest pointer */
    ret                     /* Return from function */
    nop                     /* Delay slot */
```

This is a byte-per-byte strcpy. It uses only 7 instructions, 18 bytes of code. There is nothing useful to put in the delay slot of the ret on this one, but we use the delay slot after the "brine" (BRanch Indirect if Not Equal) to store the char into the destination and increment the destination pointer.
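A small Python model of the loop shows why putting `stb+` in the `brine` delay slot is correct. It is a hypothetical sketch that assumes three things: exactly one delay slot that always executes, `brine` taken when the preceding `or r4, r4` gives a non-zero result, and registers and pointers modelled as plain Python variables.

```python
def strcpy_model(src: bytes) -> bytes:
    """Hypothetical model of the XTC strcpy loop; src must contain a NUL byte."""
    dest = bytearray()
    i = 0                          # models the source pointer in r2
    while True:
        r4 = src[i]                # ldb+ r2, r4: load char, post-increment source
        i += 1
        taken = r4 != 0            # or r4, r4 sets flags; brine branches if non-zero
        dest.append(r4)            # stb+ r3, r4: delay slot, runs whether or not taken
        if not taken:              # branch not taken: fall through to ret
            return bytes(dest)


if __name__ == "__main__":
    print(strcpy_model(b"Hello\0junk"))  # b'Hello\x00'
```

Because the store sits in the delay slot, the terminating NUL is copied too, which is exactly what strcpy needs, with no extra instruction after the loop.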
alvieboy Posted November 4, 2013

I've set up a domain and a mailing list for XTC. If you're interested in participating, send an email to majordomo <at> xthundercore.com with a line (in the body, not in the subject!) containing:

subscribe dev

Then follow the instructions you'll receive by email to complete your subscription to the mailing list.

Best,
Alvie
Felix Posted November 9, 2013

well done, alvie mate
alvieboy Posted March 2, 2014

If you want to participate in specifying the new ISA: https://docs.google.com/spreadsheet/ccc?key=0Ao3w4mv116ZqdHdzXzdPcjRKa2tNVmVBM1psbXBwQnc#gid=0

I suggest you also subscribe to the mailing list.
This topic is now archived and is closed to further replies.