(October 2019)
I am developing a strange habit - I keep "twisting" products from failed companies... into nice toys.
For science! :-)
After completing my AtomicPI saga, something else caught my attention - a very cheap FPGA board that came out of yet another failed company - the Pano Logic G2.
Why not compile an open-source CPU inside this FPGA?
And in fact, since this FPGA is quite big, why not make it a multi-core one?
And then compile and run programs inside it - with an open-source cross-compiler, that uses an open-source real-time OS? The same OS that most European satellites and their instruments are using?
Why not, indeed!
(he said, a month ago - and dove into the abyss)
How fast could it go? And since the Pano Logic was meant to be a thin client, and comes with VGA, USB, Ethernet... it's packing all the pieces necessary to create a standalone computer! Could it be that this can be made into a fully open-source computer?
Keep reading - I believe you'll learn a thing or two.
P.S. The material is heavily technical and long, so I'll try to lighten it up here and there, with the occasional rant / funny picture. Also, please remember that I am a software developer, not a HW one; I simply enjoy fooling around with technology like this, so take everything said in this blog post - and in the referenced repos - with a grain of salt.
This adventure began a few months ago, when I read a magnificent article from Tom Verbeure - a principal hardware engineer at NVIDIA. Tom built a real-time ray tracer on a dirt-cheap FPGA board; and "dirt-cheap" is not an exaggeration, since even now, you find ads like this on eBay:
Lot of 25 Pano Logic Thin / Zero Desktop Client Black w/ Power Supply Buy now: US $170.00
I'll just quote Tom here, so you can understand the "why" and "how" behind this:
Pano Logic was a Bay Area startup that wanted to get rid of PCs in large organizations by replacing them with tiny, CPU-less thin clients; connected to a central server. Think of them as VNC replacements. No CPU? No software upgrades! No viruses!
...The thin clients had a wired Ethernet interface, a couple of USB ports, an audio port and a video port. And all this was glued together with an FPGA
...The company has been defunct since 2013 and the clients are not supported by anything. But they are amazing for hobby purposes and can be bought dirt cheap on eBay.
So I got my hands on a Pano Logic - in particular, a G2 model; with the Spartan6 LX100 FPGA inside it. This is a rather large FPGA, promising far more power than any hobbyist has a right to - but since Pano Logic (the company) failed, the product itself is of no use to anyone but hackers and tinkerers; and it's therefore sold at amazing bargain prices.
I followed Tom's instructions - first dismantling the box, and then soldering wires to the JTAG connector:
On the other end, I soldered 6 pins from pin header strips - and used a small piece of perfboard, to create an "adapter" of sorts. This allowed me to "plug" the 6 cables into the JTAG connector of a Xilinx programmer. Note that these programmers can be found for cheap on eBay (see Tom's article linked above for details).
The last picture you see above, shows the IMPACT tool - made by Xilinx, the company that created these FPGAs - being able to see the chip.
Just like many other engineers, I learned over the years to hate non-determinism; in all its forms, and all its manifestations. This means that I gravitate towards open-source operating systems; where I can use my engineering skills to fully trace what happened, and why; and fully control the OS's behavior.
I don't want my computer to decide to upgrade while I am giving a presentation. I don't want some fancy antivirus to decide that it must "scan" every .c and .cpp file read by my compiler during a build, because it performs "on-access scans".
I want myself - not some mega-corporation - to be in control of my own hardware. And to automate all the workflows and processes that I need; like installing my development environments on new machines by running a few simple one-liners...
bash$ sudo apt install gcc-8 vim git make cscope exuberant-ctags tmux
bash$ git clone https://github.com/ttsiodras/dotfiles
bash$ git clone https://github.com/ttsiodras/dotvim .vim
...
...and seeing all the myriads of complex dependencies being perfectly resolved under my Debian (or in a similar way, under my Arch)...
...or orchestrating the creation of a complex open-source cross-compiler; allowing me to deterministically build applications with a real-time, freely-accessible OS (that happens to fly on many European satellites)...
...or installing a company's HW synthesis tools - via...
bash$ sudo apt install spartan6-xilinx-synthesis
Actually....
That last one was a lie.
HW is cheap - throw money at the problem! Yes?
But what does that mean for our endeavour with the Pano Logic G2?
Well, the last version of the Xilinx synthesis tools that supported the Spartan family under Linux was the freely available ISE 14.7 WebPACK. I have installed this on my machine, and it does - thankfully - allow me to synthesize for an older Spartan3 board I have.
It's also minuscule. So tiny!
bash$ du -s -h Xilinx/14.7/
15G Xilinx/14.7/
But I digress - and forgot that... the final Linux version of WebPACK doesn't support Spartan6 LX100 chips.
Let me repeat that - in case you didn't catch it - in a way that will make it clear:
bash$ sudo apt install gcc-8
We are sorry, but we detected a 9 year old CPU that is not supported
by the freely available version of our compiler.
Please buy our BRAND NEW CPU - WITH BONUS NSA MANAGEMENT EXTENSIONS!
Or sell your left kidney and buy our BRAND NEW COMPILER TOOLCHAIN that supports everything!
(sigh)
Searching the Xilinx site some more, we see that there is a free version of the ISE WebPACK that targets Spartan6 devices - but only for Windows.
After downloading and unzipping this package... what do you know! That setup actually installs a Virtual Machine, containing...
...a Linux distribution!
But let's continue our investigation - and have a look at this .ova file:
$ cd Xilinx_ISE_S6_Win10_14.7_ISE/ova
$ tar tvf ISE_S6_VM.ova
-rw-r----- vboxovf10/vbox_v5.2.VBOX_VERSION_PATCHr11 12425 2018-02-03 00:39 ISE_S6_VM.ovf
-rw-rw---- vboxovf10/vbox_v5.2.VBOX_VERSION_PATCHr11 7253232128 2018-02-03 00:39 ISE_S6_VM-disk001.vmdk
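Since tar can list it, an .ova is just a plain tar archive - so the same peek can be scripted. What follows is only a sketch using Python's standard tarfile module; the demo archive and its member names (demo.ovf, demo-disk001.vmdk) are made up so the example runs anywhere, and are not the real Xilinx package.

```python
import io
import tarfile


def list_ova_members(fileobj):
    """Return (name, size) for each member of an .ova, read as an uncompressed tar."""
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        return [(member.name, member.size) for member in archive.getmembers()]


def make_demo_ova():
    """Build a tiny in-memory stand-in for an .ova (hypothetical contents)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in (("demo.ovf", b"<Envelope/>"),
                              ("demo-disk001.vmdk", b"\0" * 512)):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, size in list_ova_members(make_demo_ova()):
        print(f"{size:>12} {name}")
```

For a real package you would pass an open file (e.g. open("ISE_S6_VM.ova", "rb")) instead of the demo buffer.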
The .vmdk file contained inside is a virtual drive. After extracting it from the .ova with tar, we discover that this is a dynamic volume; so it can't be mounted as-is with qemu-nbd.
It must first be converted to a "normal" VMDK - and then, we can mount it:
$ qemu-img convert ISE_S6_VM-disk001.vmdk -O vmdk plain.vmdk
$ qemu-nbd -r -c /dev/nbd0 plain.vmdk
$ mount /dev/nbd0p1 /iso4/
$ ls -l /iso4/opt/Xilinx
drwxrwxr-x. 3 500 500 4096 Dec 8 2016 14.7
...and of course, there the Xilinx toolchain is - right where we'd expect it to be... In the same folder as the "Spartan3-supporting" version!
Maybe we don't have to boot this thing at all - we'll just copy the entire tree of ISE_DS, to create two folders - one with the normal (2013-era) WebPACK ISE that we used for ZestSC1/Spartan3 Mandelbrot experiments, and this new one (2016-era) for the upcoming Pano Logic ones.
A symlink will point to one or the other:
$ ls -l
drwxr-xr-x 4 ttsiod users 4096 Oct 12 21:14 ./
drwxr-xr-x 11 ttsiod users 4096 Oct 12 20:59 ../
lrwxrwxrwx 1 root root 15 Oct 12 20:46 ISE_DS -> ISE_DS.Spartan6/
drwxr-xr-x 7 ttsiod users 4096 Mar 4 2018 ISE_DS.Spartan3/
drwxrwxr-x 6 ttsiod users 4096 Dec 8 2016 ISE_DS.Spartan6/
...and since Xilinx tools depend on license files, a script will switch everything from one form to the other - depending upon what we want to do:
#!/bin/bash
check_symlink() {
    if [ ! -h "$1" ] ; then
        echo "$1 was not a symlink! Aborting..."
        exit 1
    fi
}
if [ $# -eq 0 ] ; then
    cd || exit 1
    ls -l .Xilinx/Xilinx.lic Xilinx/Xilinx.lic Xilinx/14.7/ISE_DS
    echo
    echo Use xilinx.sh 3 or xilinx.sh 6
    echo
else
    if [ "$1" -ne 3 -a "$1" -ne 6 ] ; then
        echo Use xilinx.sh 3 or xilinx.sh 6
        exit 1
    fi
    XIL="$1"
    cd || exit 1
    cd .Xilinx || exit 1
    check_symlink Xilinx.lic
    rm Xilinx.lic || exit 1
    ln -s Xilinx.lic.spartan${XIL} Xilinx.lic || exit 1
    cd ../Xilinx/14.7/ || exit 1
    check_symlink ISE_DS
    rm ISE_DS || exit 1
    ln -s ISE_DS.Spartan${XIL} ISE_DS || exit 1
    cd .. || exit 1
    check_symlink Xilinx.lic
    rm Xilinx.lic || exit 1
    ln -s Xilinx.lic.spartan${XIL} Xilinx.lic || exit 1
    cd || exit 1
    ls -l .Xilinx/Xilinx.lic Xilinx/Xilinx.lic Xilinx/14.7/ISE_DS
    echo
    echo Now go run this:
    echo " cd ~/Xilinx/14.7/ISE_DS"
    echo " . settings64.sh"
fi
$ xilinx.sh 6
lrwxrwxrwx 1 ttsiod users 15 Oct 19 16:10 Xilinx/14.7/ISE_DS -> ISE_DS.Spartan6
lrwxrwxrwx 1 ttsiod users 19 Oct 19 16:10 .Xilinx/Xilinx.lic -> Xilinx.lic.spartan6
lrwxrwxrwx 1 ttsiod users 19 Oct 19 16:10 Xilinx/Xilinx.lic -> Xilinx.lic.spartan6

Now go run this:
 cd ~/Xilinx/14.7/ISE_DS
 . settings64.sh
$
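The core move of that script - refuse to touch anything that isn't a symlink, then re-point it - can also be sketched in Python. Note one deliberate difference, which is my own variation and not what the shell script does: instead of rm followed by ln -s, this builds a temporary link and renames it over the old one, so there is never a moment without a link. The function name and the temporary directory layout are illustrative only, and the sketch assumes a POSIX system.

```python
import os
import tempfile


def repoint_symlink(link_path, new_target):
    """Make link_path a symlink to new_target, swapping it in with a single rename.

    Refuses to touch link_path if it exists and is not a symlink,
    mirroring the check_symlink guard of the shell script.
    """
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        raise RuntimeError(f"{link_path} was not a symlink! Aborting...")
    temp_link = link_path + ".tmp"
    if os.path.lexists(temp_link):
        os.remove(temp_link)
    os.symlink(new_target, temp_link)
    os.replace(temp_link, link_path)  # rename(2) overwrites the old link itself


def demo():
    """Switch a throwaway ISE_DS link from the Spartan3 tree to the Spartan6 one."""
    with tempfile.TemporaryDirectory() as root:
        for family in ("3", "6"):
            os.mkdir(os.path.join(root, f"ISE_DS.Spartan{family}"))
        link = os.path.join(root, "ISE_DS")
        repoint_symlink(link, "ISE_DS.Spartan3")
        repoint_symlink(link, "ISE_DS.Spartan6")
        return os.readlink(link)


if __name__ == "__main__":
    print(demo())
```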
Additionally, to avoid wasting a metric ton of hard drive storage, we use rdfind to identify the files that are identical between these two subtrees - and form hard links so they only occupy space once:
$ rdfind -makehardlinks true ISE_DS.Spartan{3,6}/
After this finished, it became clear that the two trees shared almost EVERYTHING. In fact, the total storage cost went BELOW the original storage cost used for just the single ISE for the Spartan3!...
In the absence of miracles, this can only mean one thing: that apparently there are plenty of copies of files spread all over - even within the same folder subtree.
So... can we now, finally, launch the thing?
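To make the idea concrete, here is a toy version of hard-link deduplication. It is a sketch of the general technique only - it makes no claim about how rdfind works internally, it is not crash-safe (it removes a file before linking it), and it assumes everything lives on one filesystem. The file names in the demo are invented.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def hardlink_duplicates(*roots):
    """Replace byte-identical files under roots with hard links; return bytes saved."""
    first_seen = {}
    saved = 0
    for root in roots:
        for folder, _dirs, files in os.walk(root):
            for name in sorted(files):
                path = os.path.join(folder, name)
                if os.path.islink(path):
                    continue
                key = (os.path.getsize(path), file_digest(path))
                original = first_seen.setdefault(key, path)
                if original == path or os.path.samefile(original, path):
                    continue  # first copy, or already linked
                os.remove(path)
                os.link(original, path)
                saved += key[0]
    return saved


def demo():
    """Two trees sharing a file, plus a duplicate inside the first tree."""
    with tempfile.TemporaryDirectory() as root:
        tree_a, tree_b = os.path.join(root, "a"), os.path.join(root, "b")
        os.mkdir(tree_a)
        os.mkdir(tree_b)
        contents = {
            os.path.join(tree_a, "x.bin"): b"x" * 1000,
            os.path.join(tree_a, "copy.bin"): b"x" * 1000,
            os.path.join(tree_b, "x.bin"): b"x" * 1000,
            os.path.join(tree_b, "other.bin"): b"y" * 1000,
        }
        for path, data in contents.items():
            with open(path, "wb") as handle:
                handle.write(data)
        saved = hardlink_duplicates(tree_a, tree_b)
        same_inode = (os.stat(os.path.join(tree_a, "x.bin")).st_ino
                      == os.stat(os.path.join(tree_b, "x.bin")).st_ino)
        return saved, same_inode


if __name__ == "__main__":
    print(demo())
```

The demo includes a duplicate inside a single tree on purpose: that is the kind of copy that can push the combined size below the size of one original tree.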
Err... no.
I don't know what else to say. I believe the situation is describing itself, very eloquently - about the merits of closed-source software.
Let's check the .ovf file in the original package:
$ grep MAC ISE_S6_VM.ovf | head -1
<Adapter slot="0" enabled="true" MACAddress="08002768C935"...
We see here that the Virtual Machine is equipped with an "eth0" Ethernet adapter, with a specific MAC address. Since my laptop only has a "wlan0" interface, I added a dummy one - making it the way Xilinx apparently expects it:
$ cd /etc/systemd/network
$ cat 25-dummy.netdev
[Match]
[NetDev]
Name=eth0
Kind=dummy
MACAddress=08:00:27:68:C9:35
$ sudo systemctl restart systemd-networkd
$ sudo ifconfig eth0 up
$ sudo ifconfig eth0 | grep ether
ether 08:00:27:68:c9:35 txqueuelen 1000 (Ethernet)
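The only fiddly step above is turning the MACAddress attribute of the .ovf (12 hex digits, no separators) into the colon-separated form that the .netdev file expects. A small sketch of that conversion follows; the helper names are mine, and the rendered file simply mirrors the 25-dummy.netdev shown above.

```python
import re


def mac_from_ovf_line(line):
    """Extract MACAddress="..." from an OVF <Adapter> line and insert the colons."""
    match = re.search(r'MACAddress="([0-9A-Fa-f]{12})"', line)
    if match is None:
        raise ValueError("no 12-digit MACAddress attribute found")
    digits = match.group(1)
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def render_netdev(name, mac):
    """Render a dummy-interface .netdev file like the one shown above."""
    return f"[Match]\n[NetDev]\nName={name}\nKind=dummy\nMACAddress={mac}\n"


if __name__ == "__main__":
    adapter = '<Adapter slot="0" enabled="true" MACAddress="08002768C935"'
    print(render_netdev("eth0", mac_from_ovf_line(adapter)), end="")
```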
So does it work now?
Nope.
First, you have to disable your wlan0 adapter (!) - otherwise the lmhostid detected by the Xilinx tools is the MAC address of the wlan0 adapter!
Clearly, Xilinx doesn't check whether there's an eth0 with the MAC they want...
No, they look up which network adapter internet traffic goes through - and check that adapter's MAC.
Maybe.
Or maybe they stop at the first network adapter they find during enumeration.
Or maybe they draw lottery tickets from /dev/urandom - and perform an rm -rf /usr Russian roulette once in a blue moon.
Remember, we are talking about the free version of WebPACK here - that is officially distributed for people who somehow paid the company to get Xilinx Spartan6 chips, and want to program them.
And, yet, the free version distributed, has to perform checks like these - because, erm, it has to... IT JUST HAS TO.
(facepalm)
And you'll then remember how the world was... before these giants decided to rescue us.
Now that we have sacrificed our firstborns and are
able to run the synthesis toolchain for our target - and see our FPGA
being detected in IMPACT - we can finally move to "compiling" our CPU.
Over the last 4 years, I've been working as a real-time embedded SW engineer in the European Space Agency. In a very large percentage of our missions, our SW runs on one form or another of an open-source CPU design - specifically, on a SPARC derivative called LEON.
So when I began fooling around with CPU synthesis and FPGAs, I forked this repository; that contains a mirror of the open-source version of GRLIB, the home of LEONs. My own copy is here; please remember that I am a software developer, not a HW one; I just enjoy playing around with technology. What you are reading is just one of my hobbies - don't go and bet the family farm on my repository's code quality :-)
Also, don't expect this post to start from a 'hello, VHDL world' and end
with a working LEON3. That would require a book, not a blog post.
Instead, we will follow along the traditional paths of engineering; we will base our efforts on pre-existing designs, and tweak them to match our own target. This is in fact one of the roles served by the designs folder in the original repository.
As one might expect, writing code for programmable HW shares similarities with the SW development workflows. You have stages of processing your inputs in both worlds - instead of compiling compilation units into object files and then linking them, the FPGA tools perform synthesis, followed by placement-and-routing. You run your unit and integration tests prior to deploying your SW in production - just as you run your VHDL testbenches in your simulator prior to deploying your circuit to your FPGA.
And you edit your VHDL or Verilog code with your Vim, or perhaps your Emacs - NOTHING ELSE, INFIDEL - just like you would for your traditional SW programming languages.
Some of my friends have literally invested their lives in learning the peculiarities of specific toolchains - heck, even of specific versions of the toolchains!
Think about it - what else can you do, when you don't have the source code of a tool? All you can do, is "learn", over decades, the things to avoid... So that the black-box you build your designs with, doesn't go... banana.
But there are significant differences, too.
For example, HW tools have far more issues with re-using previous work.
If you touch a single .c file in a codebase containing thousands of source files, only that one will be recompiled when you run make - you'll pay the small price for a quick compilation of a single file, and a re-link. Fast build-compile-test cycles.
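The rule that makes this cheap is simple: a target is rebuilt only if it is missing, or older than something it depends on. Here is that timestamp rule as a minimal sketch - a deliberate simplification of what make does, with invented file names and hand-set modification times so the outcome is predictable.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def demo():
    """a.c is older than a.o (up to date); b.c was touched after b.o (stale)."""
    with tempfile.TemporaryDirectory() as root:
        paths = {}
        for name, mtime in (("a.c", 100), ("a.o", 200), ("b.c", 300), ("b.o", 200)):
            paths[name] = os.path.join(root, name)
            open(paths[name], "w").close()
            os.utime(paths[name], (mtime, mtime))
        return [obj for src, obj in (("a.c", "a.o"), ("b.c", "b.o"))
                if needs_rebuild(paths[obj], [paths[src]])]


if __name__ == "__main__":
    print(demo())
```

Only b.o is reported as stale, so only one compilation (plus the re-link) would be paid for.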
But in the HW world, that doesn't seem to be the case. There is no edit-compile-run cycle; there's an edit-compile-GoForAtripToTheAlpsAndStayForAweek-then-run cycle.
Another crazy difference I experienced was that builds are NOT deterministic; in the sense that in a design that utilises almost all resources of your FPGA, you may try rebuilding your code after just adding a comment - only to see it fail to satisfy the timing constraints it did in the previous build!
I am NOT joking. The placement and routing stages, in particular, are apparently very "tough" (algorithmically speaking). Heuristics are applied, in the cost functions that are used to estimate routing and timing performance... These in turn "feed" the gradient descents and simulated annealings that try to find the best location in the search space. In the end, this translates to, potentially, your "compilation" ending up trapped in a different, worse, "local minimum" than the one it found in your previous build.
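A toy model shows why a different starting point can leave you in a worse place. The sketch below is a plain greedy descent over a made-up one-dimensional cost list; it has nothing to do with the actual algorithms inside the Xilinx tools, and the starting index merely stands in for whatever the heuristics happened to pick on a given run.

```python
# Made-up "timing cost" for nine candidate placements; lower is better.
COSTS = [9, 7, 4, 6, 8, 5, 2, 3, 9]


def descend(costs, start):
    """Greedy descent: keep moving to the cheaper neighbour until none is cheaper."""
    position = start
    while True:
        neighbours = [i for i in (position - 1, position + 1) if 0 <= i < len(costs)]
        best = min(neighbours, key=lambda i: costs[i])
        if costs[best] >= costs[position]:
            return position  # local minimum: no neighbour improves on it
        position = best


if __name__ == "__main__":
    for start in (0, len(COSTS) - 1):
        end = descend(COSTS, start)
        print(f"start {start} -> placement {end}, cost {COSTS[end]}")
```

Starting from the left edge the search settles at cost 4; starting from the right edge it finds cost 2. If the timing budget were 3, only one of those two "builds" would meet its constraints, even though the design itself never changed.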
Which is why you see HW designers COMMITTING the bitfiles they generated, after they see them actually work on the chip.
Put simply:
A SW developer committing an executable in his source repository, is an idiot.
A HW developer doing the same, is a wise man.
Apparently.
Which is nice.
Executive Summary: HW design is a strange land. It is, after all, a land full of clocks!
As we said above, we will now base our efforts on pre-existing designs, and tweak them to match our own target.
After cloning my repository, navigate to designs; and copy the folder of my previous (unexpectedly successful!) attempt to bootstrap a LEON3 inside my Spartan3 board:
bash$ cp -a leon3-zestsc1-xc3s1000 lets-make-a-cpu
First of all, the master configuration file - config.vhd - defines a number of things that are FPGA specific. We are targeting a Spartan6 now, not a Spartan3 - so...
--- ../leon3-zestsc1-xc3s1000/config.vhd 2019-03-17 09:37:11.151623486 +0100
+++ config.vhd 2019-10-19 09:21:21.950877962 +0200
@@ -1,7 +1,7 @@
 -----------------------------------------------------------------------------
--- My customizations for my ZestSC1 board - based on the original design
+-- My customizations for my PanoLogic G2 board - based on the original design
 -- for the leon3-digilent-xc3s1000.
 --
 -- Original Copyright:
@@ -15,22 +15,22 @@
 package config is
 -- Technology and synthesis options
-  constant CFG_FABTECH : integer := spartan3;
-  constant CFG_MEMTECH : integer := spartan3;
-  constant CFG_PADTECH : integer := spartan3;
+  constant CFG_FABTECH : integer := spartan6;
+  constant CFG_MEMTECH : integer := spartan6;
+  constant CFG_PADTECH : integer := spartan6;
   constant CFG_TRANSTECH : integer := TT_XGTP0;
   constant CFG_NOASYNC : integer := 0;
   constant CFG_SCAN : integer := 0;
 -- Clock generator
-  constant CFG_CLKTECH : integer := spartan3;
+  constant CFG_CLKTECH : integer := spartan6;
We change all references of spartan3 to spartan6.
We set CFG_CLKMUL and CFG_CLKDIV to the same value - e.g. 5 - so that, for now, the LEON will be running at the same speed as the board's clock (25MHz). After we've done our first successful synthesis/placement/routing, we'll see the maximum frequency our circuit can run at - and we will bump up the clock accordingly.
In the Makefile.inc, we change to using the proper HW parts:
--- ../leon3-zestsc1-xc3s1000/Makefile.inc 2019-02-28 20:55:35.143510266 +0100
+++ Makefile.inc 2019-10-19 08:55:49.590853311 +0200
@@ -1,12 +1,12 @@
-TECHNOLOGY=Spartan3
-PART=xc3s1000
-PACKAGE=ft256
-SPEED=-5
+TECHNOLOGY=Spartan6
+PART=xc6slx100
+PACKAGE=fgg484
+SPEED=-2
 SYNFREQ=48
 # PROMGENPAR=-x xcf04s -u 0 $(TOP).bit -p mcs -w -o digilent-xc3s1000
 MANUFACTURER=Xilinx
-MGCPART=3s1000$(PACKAGE)
+MGCPART=6slx100$(PACKAGE)
 MGCTECHNOLOGY=$(TECHNOLOGY)
 MGCPACKAGE=$(PACKAGE)
   constant CFG_AHB_MONWAR : integer := 0;
   constant CFG_AHB_DTRACE : integer := 0;
 -- DSU UART
-  constant CFG_AHB_UART : integer := 1;
+  constant CFG_AHB_UART : integer := 0;
 -- JTAG based DSU interface
-  constant CFG_AHB_JTAG : integer := 0;
+  constant CFG_AHB_JTAG : integer := 1;

 -- LEON2 memory controller
   constant CFG_MCTRL_LEON2 : integer := 1;
   constant CFG_MCTRL_RAM8BIT : integer := 0;
@@ -132,7 +133,7 @@
   constant CFG_ROMMASK : integer := 16#E00# + 16#100#;
 -- AHB RAM
   constant CFG_AHBRAMEN : integer := 1;
-  constant CFG_AHBRSZ : integer := 16;
+  constant CFG_AHBRSZ : integer := 256;
   constant CFG_AHBRADDR : integer := 16#400#;
   constant CFG_AHBRPIPE : integer := 0;
 -- UART 1
And for now, that's it - LEON configuration wise.
Now, there are many ways to use LEONs in one's design. To make things easier, for this 1st test, I will be using the freely available evaluation version of GRMON. GRMON is a debugging monitor/control tool specifically made to assist development with LEONs. For later stages in particular, where we will be loading the software we compiled inside our CPU, GRMON offers a GDB server; allowing us to debug things over good old GDB. Very convenient.
One might even call this a philosophy.
Speaking of small, nice tools, you'd better download xc3sprog as well. It can be compiled from source - it's fully open; and then, instead of launching IMPACT to program our XC6SLX100, we will be able to spawn a tiny 300KB executable - and do all the work via a simple incantation in our Makefile:
xc3sprog -c xpc -v YourBitfileGoesHere
But enough about tooling, let's get back to the code.
What about leon3mp.vhd - the VHDL file that describes our LEON3 core?
--- ../leon3-zestsc1-xc3s1000/leon3mp.vhd 2019-03-17 09:36:24.901622577 +0100
+++ leon3mp.vhd 2019-10-19 08:45:55.520843754 +0200
@@ -60,15 +60,14 @@
     use_ahbram_sim : integer := 0
   );
   port (
-    resetn : in std_ulogic;
-    clk : in std_ulogic;
-    iu_error : out std_ulogic;
-    dsuact : out std_ulogic;
-    dsu_rx : out std_ulogic;
-    dsu_tx : in std_ulogic;
-    rx : out std_ulogic;
-    tx : in std_ulogic;
-    IO : inout std_logic_vector(46 downto 0)
+    resetn : in std_ulogic;
+    clk : in std_ulogic;
+    iu_error : out std_ulogic;
+    dsuact : out std_ulogic;
+    rx : out std_ulogic;
+    tx : in std_ulogic
   );
 end;
@@ -76,7 +75,7 @@
   constant blength : integer := 12;
   constant fifodepth : integer := 8;
-  constant maxahbm : integer := CFG_NCPU+CFG_AHB_UART; -- A truly "Spartan" set of AHB masters :-)
+  constant maxahbm : integer := CFG_NCPU+CFG_AHB_JTAG; -- A truly "Spartan" set of AHB masters :-)
Compared to the previous ZestSC1/Spartan3 design, GRMON won't be controlling the LEON's Debug Support Unit (DSU) via special serial data; we will be using JTAG instead (spawning grmon -u -xilusb - or, if you are using a Digilent HS2-compatible device, grmon -u -digilent). We therefore need to drop these DSU-serial signals (dsu_rx, dsu_tx).
The Pano UCF file also has no IO. It contains many signals towards other parts that look like a lot of fun, though - VGA signals, for instance... :-)
Looking forward to hooking my HW Mandelbrot directly on a monitor :-)
   -- my ZestSC1 board's frequency in KHz
-  constant BOARD_FREQ : integer := 48000;
-  -- cpu frequency in KHz will be 34000 - as per my S/P/R results,
+  constant BOARD_FREQ : integer := 25000;
+  -- cpu frequency in KHz will be 25000 - as per my S/P/R results,
   -- my design can easily reach this speed.
   constant CPU_FREQ : integer := BOARD_FREQ * CFG_CLKMUL / CFG_CLKDIV;
   constant IOAEN : integer := 0;
@@ -126,13 +123,9 @@
   attribute syn_keep : boolean;
   attribute syn_preserve : boolean;
-  -- RS232 APB Uart
-  signal rxd1 : std_logic;
-  signal txd1 : std_logic;
-
   -- A "heartbeat" LED for the DSU - I used it to make sure the
-  -- locally instantiated clock here beats indeed at 34MHz
-  -- (search below for 34000000 to see the logic)
+  -- locally instantiated clock here beats indeed at 25MHz
The clock in the Pano runs at 25MHz, not 48MHz.
We also need to instantiate the JTAG controller - and remove the DSU-controlling UART:
@@ -199,35 +192,28 @@
     dsuo.tstop <= '0';
     dsuo.active <= '0';
   end generate;
+  ahbjtaggen0 :if CFG_AHB_JTAG = 1 generate
+    ahbjtag0 : ahbjtag generic map(tech => fabtech, hindex => CFG_NCPU)
+      port map(rstn, clkm, tck, tms, tdi, tdo, ahbmi, ahbmo(CFG_NCPU),
+               open, open, open, open, open, open, open, gnd(0));
+  end generate;
+
   -- To verify that the clock shenanigans actually work on my board,
   -- I hooked this up to LED6 (i.e. the 2nd from the right) and
   -- confirmed that the clock driving the LEON3 and the DSU and all
-  -- the rest is indeed a 34MHz clock.
+  -- the rest is indeed a 25MHz clock.
   process(clkm)
   begin
     if rising_edge(clkm) then
       counter_dsu <= counter_dsu + 1;
-      if counter_dsu = 34000000 then
+      if counter_dsu = 25000000 then
         counter_dsu <= 0;
         heartbeat_led_dsu <= not heartbeat_led_dsu;
       end if;
     end if;
   end process;
-  -- Debug UART
-  dcomgen : if CFG_AHB_UART = 1 generate
-    dcom0 : ahbuart
-      generic map (hindex => CFG_NCPU, pindex => 4, paddr => 7)
-      port map (rstn, clkm, dui, duo, apbi, apbo(4), ahbmi, ahbmo(CFG_NCPU));
-    dui.rxd <= rxd1;
-  end generate;
-  nouah : if CFG_AHB_UART = 0 generate apbo(4) <= apb_none; end generate;
-
-  urx_pad : inpad generic map (tech => padtech) port map (dsu_tx, rxd1);
-  utx_pad : outpad generic map (tech => padtech) port map (dsu_rx, txd1);
-  txd1 <= duo.txd;
-
 ----------------------------------------------------------------------
 --- APB Bridge and various periherals -------------------------------
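The heartbeat process in that diff is a nice example of VHDL signal semantics: inside a clocked process, the comparison sees the counter's value from before the edge, and the later assignment (the reset to 0) wins over the earlier increment. The Python model below is only a sketch of that behaviour; it assumes the counter starts at 0 (the diff doesn't show its initial value) and uses a tiny limit so it can be stepped by hand.

```python
def edges_between_toggles(limit, toggles=3):
    """Step the counter process edge by edge; return clock edges between LED toggles."""
    counter, led = 0, False
    gaps, last_toggle_edge, edge = [], 0, 0
    while len(gaps) < toggles:
        edge += 1
        next_counter = counter + 1      # counter_dsu <= counter_dsu + 1;
        if counter == limit:            # compares the pre-edge value
            next_counter = 0            # the later assignment overrides the increment
            led = not led
            gaps.append(edge - last_toggle_edge)
            last_toggle_edge = edge
        counter = next_counter          # signals update after the process has run
    return gaps


def toggle_interval_seconds(limit, clock_hz):
    """The LED toggles every (limit + 1) clock edges."""
    return (limit + 1) / clock_hz


if __name__ == "__main__":
    print(edges_between_toggles(4))
    print(toggle_interval_seconds(25_000_000, 25_000_000))
```

With a limit of 4 the LED flips every 5 edges, so with 25000000 on a 25MHz clock it flips roughly once per second (one clock period more, to be pedantic) - slow enough to eyeball whether the clock is what you think it is.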
Now, I am not the only one stating that - when compared to their SW counterparts - the HW synthesis toolchains are in an abysmal state. There is a movement underway to implement open-source alternatives (e.g. see Yosys, arachne-pnr, etc). But for these efforts to succeed, an ecosystem of open library IPs needs to be developed around them.
I know the current "DNA" of HW engineers is very much of a proprietary nature - but IMHO, the HW design community needs to evolve beyond this. Become open-source mutants, like us SW people!
I am pretty sure some truly spectacular super-powers would come out of such a mutation.
Finally, let's update our testbench to comply with all the changes we did to our LEON3 design:
--- ../leon3-zestsc1-xc3s1000/testbench.vhd 2019-03-08 17:41:44.770716858 +0100
+++ testbench.vhd 2019-10-20 08:58:26.754650548 +0200
@@ -31,6 +31,7 @@
 library gaisler;
 use gaisler.libdcom.all;
 use gaisler.sim.all;
 library techmap;
 use techmap.gencomp.all;
 use std.textio.all;
@@ -56,8 +57,9 @@
   signal rstn : std_ulogic := '1';
   signal iu_error : std_ulogic;
   signal dsuact : std_ulogic;
-  signal dsu_tx : std_logic;
-  signal dsu_rx : std_logic;
   component leon3mp
     port (
@@ -65,8 +67,8 @@
       resetn : in std_ulogic;
       iu_error : out std_ulogic;
       dsuact : out std_ulogic;
-      dsu_rx : out std_ulogic; -- UART1 tx data
-      dsu_tx : in std_ulogic -- UART1 rx data
     );
   end component;
@@ -75,12 +77,12 @@
 begin
   d3 : leon3mp
     port map (
-      clk => CLK,
       resetn => rstn,
+      clk => CLK,
       iu_error => iu_error,
       dsuact => dsuact,
-      dsu_rx => dsu_rx,
-      dsu_tx => dsu_tx
     );
   clk <= not clk after CLK_PERIOD/2;
@@ -94,79 +96,21 @@
     severity failure;
   end process;
-  dsucom : process
-    procedure dsucfg(signal dsutx : out std_ulogic; signal dsurx : in std_ulogic) is
-      variable w32 : std_logic_vector(31 downto 0);
-      variable c8 : std_logic_vector(7 downto 0);
-      constant txp : time := 320 * 1 ns;
-      variable l : line;
-    begin
-      dsutx <= '1';
-      write(l, String'("Resetting for 40 cycles"));
-      writeline(output, l);
-      rstn <= '1';
-      wait for 40*CLK_PERIOD;
-      rstn <= '0';
-      wait for 10*CLK_PERIOD;
-
-      wait for 5000 ns;
-
-      -- Send exactly what grmon3 sends.
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#55#, txp);
-      txc(dsutx, 16#80#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#f0#, txp);
-      txc(dsutx, 16#80#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#ff#, txp);
-      txc(dsutx, 16#f0#, txp);
-      txc(dsutx, 16#ff#, txp);
-
-      -- and look at the magnificent output from our design;
-      -- the DSU replies with 00 00 10 70 ; the proper response!
-
-      -- This test can also be used - it is the original
-      -- scenario taken from digilent-xc3s1000.
-
-      -- txc(dsutx, 16#55#, txp); -- sync uart
-
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#00#, 16#00#, 16#00#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#20#, 16#2e#, txp);
-
-      -- wait for 25000 ns;
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#00#, 16#00#, 16#20#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#00#, 16#01#, txp);
-
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#40#, 16#00#, 16#24#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#00#, 16#0D#, txp);
-
-      -- txc(dsutx, 16#c0#, txp);
-      -- txa(dsutx, 16#90#, 16#70#, 16#11#, 16#78#, txp);
-      -- txa(dsutx, 16#91#, 16#00#, 16#00#, 16#0D#, txp);
-
-      -- txa(dsutx, 16#90#, 16#40#, 16#00#, 16#44#, txp);
-      -- txa(dsutx, 16#00#, 16#00#, 16#20#, 16#00#, txp);
-
-      -- txc(dsutx, 16#80#, txp);
-      -- txa(dsutx, 16#90#, 16#40#, 16#00#, 16#44#, txp);
-
-      -- Look! The DSUACT signal goes high! All good.
-      wait for 50000 ns;
-
-      write(l, String'("Test completed."));
-      writeline(output, l);
-    end procedure;
+  jtagproc : process
+    variable l : line;
   begin
-    dsucfg(dsu_tx, dsu_rx);
-    wait;
-  end process;
+    write(l, String'("Resetting for 40 cycles"));
+    writeline(output, l);
+    rstn <= '1';
+    wait for 40*CLK_PERIOD;
+    rstn <= '0';
+    wait for 10*CLK_PERIOD;
+
+    wait for 5000 ns;
+
+    write(l, String'("Looks like we are booting."));
+    writeline(output, l);
+    assert false report "Reached end of test" severity failure;
+  end process;
+
 end;
Time to launch GHDL to simulate this circuit - GHDL being a magnificent open-source simulator that you can compile from source (or install via your Linux distribution's repositories):
bash$ make simulation-setup
...
bash$ make simulation
...
Resetting for 40 cycles
Panologic G2 LX100 Demonstration design
GRLIB Version 2017.3.0, build 4208
Target technology: spartan6 , memory library: spartan6
ahbctrl: AHB arbiter/multiplexer rev 1
ahbctrl: Common I/O area disabled
ahbctrl: AHB masters: 2, AHB slaves: 8
ahbctrl: Configuration area at 0xfffff000, 4 kbyte
ahbctrl: mst0: Cobham Gaisler LEON3 SPARC V8 Processor
ahbctrl: mst1: Cobham Gaisler JTAG Debug Link
ahbctrl: slv1: Cobham Gaisler AHB/APB Bridge
ahbctrl: memory at 0x80000000, size 1 Mbyte
ahbctrl: slv2: Cobham Gaisler LEON3 Debug Support Unit
ahbctrl: memory at 0x90000000, size 256 Mbyte
ahbctrl: slv3: Cobham Gaisler Single-port AHB SRAM module
ahbctrl: memory at 0x40000000, size 1 Mbyte, cacheable, prefetch
ahbctrl: slv4: Cobham Gaisler Test report module
ahbctrl: memory at 0x20000000, size 1 Mbyte
apbctrl: APB Bridge at 0x80000000 rev 1
apbctrl: slv1: Cobham Gaisler Generic UART
apbctrl: I/O ports at 0x80000100, size 256 byte
apbctrl: slv2: Cobham Gaisler Multi-processor Interrupt Ctrl.
apbctrl: I/O ports at 0x80000200, size 256 byte
apbctrl: slv3: Cobham Gaisler Modular Timer Unit
apbctrl: I/O ports at 0x80000300, size 256 byte
testmod4: Test report module
ahbram3: AHB SRAM Module rev 1, 256 kbytes
gptimer3: Timer Unit rev 1, 8-bit scaler, 2 32-bit timers, irq 8
irqmp: Multi-processor Interrupt Controller rev 4, #cpu 1, eirq 0
apbuart1: Generic UART rev 1, fifo 4, irq 2, scaler bits 12
ahbjtag AHB Debug JTAG rev 2
dsu3_2: LEON3 Debug support unit + AHB Trace Buffer, 2 kbytes
leon3_0: LEON3 SPARC V8 processor rev 3: iuft: 0, fpft: 0, cacheft: 0
leon3_0: icache 1*8 kbyte, dcache 1*8 kbyte
clkgen_spartan3e: spartan3/e sdram/pci clock generator, version 1
clkgen_spartan3e: Frequency 25000 KHz, DCM divisor 5/5
1750 ns : cpu0: 0x00000000 unimp (trapped)
Looks like we are booting.
testbench.vhd:113:5:@6us:(assertion failure): Reached end of test
ghdl:error: assertion failed from: process work.testbench(behav).jtagproc at testbench.vhd:113
ghdl:error: simulation failed
make: *** [Makefile:38: simulation] Error 1
All good! The LEON3 traps after 1.75 microseconds, since it reads a nice 32-bit zero from our "ram" - which is not valid code for a SPARC.
The "assertion failure" is normal, since that's how the testbench ends:
assert false report "Reached end of test" severity failure;
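Since the testbench always ends through a failing assertion, make reports an error even when everything went fine. If you wanted a script to tell the two cases apart, one option is to look at which assertion fired. This is a hypothetical helper of my own (not part of GRLIB or GHDL), keyed on the "(assertion failure)" text seen in the log above:

```python
END_MARKER = "Reached end of test"


def simulation_outcome(log_text):
    """Classify simulator output by the assertion failures it contains.

    'passed'     - the only assertion failures carry the end-of-test marker
    'failed'     - some other assertion fired
    'incomplete' - no assertion failure at all, so the testbench never finished
    """
    failures = [line for line in log_text.splitlines()
                if "(assertion failure)" in line]
    if not failures:
        return "incomplete"
    if all(END_MARKER in line for line in failures):
        return "passed"
    return "failed"


if __name__ == "__main__":
    log = ("Looks like we are booting.\n"
           "testbench.vhd:113:5:@6us:(assertion failure): Reached end of test\n"
           "ghdl:error: simulation failed\n")
    print(simulation_outcome(log))
```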
Now, there's plenty more things we can do here - like configuring the simulated RAM to have a binary we compile ourselves.
But we are insane SW people here, playing with forces we don't comprehend.
Let's launch the thing in the real HW!
$ make ise
...
... laptop fans wake up - sounds like an airplane here...
... 5 minutes pass...
... there's no edit-compile-run cycle... there's...
... edit-compile-GoForAtripToTheAlpsAndStayForAweek-then-maybe-run cycle...
...
FLEXnet Licensing error:-5,357
For further information, refer to the FLEXnet Licensing documentation,
available at "www.flexerasoftware.com".
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR:Map:258 - A problem was encountered attempting to get the license for this
architecture.
Ah yes, I forgot!
$ # No wireless lan interface tolerated by Xilinx ;
$ # temporarily remove the driver for wlan0 from the kernel
$ sudo rmmod wl
$ # Must also have 'eth0' - with the magic MAC address set,
$ # so process my /etc/systemd/network/25-dummy.netdev
$ sudo systemctl restart systemd-networkd
$ sudo ifconfig eth0 up
I refuse to memorize idiocy - so I just add these commands in the Makefile; the network will be automatically made the way Xilinx wants it, every time the build takes place - and will then be automatically set back to normal (modprobe wl ; dhclient wlan0).
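The important property of this setup/restore dance is that the restore must happen even when the build blows up halfway. In a Makefile that takes some care; as an alternative illustration (not what my Makefile does), here is the same idea as a Python context manager. The command lists are the ones quoted above; the wrapper name and the injectable run function are my own additions so the logic can be tried without root or without touching a real network.

```python
import contextlib
import subprocess

# Commands taken from the text above; they need root when run for real.
SETUP = [
    ["rmmod", "wl"],
    ["systemctl", "restart", "systemd-networkd"],
    ["ifconfig", "eth0", "up"],
]
RESTORE = [
    ["modprobe", "wl"],
    ["dhclient", "wlan0"],
]


def run_checked(command):
    """Run one command, raising if it fails."""
    subprocess.run(command, check=True)


@contextlib.contextmanager
def xilinx_friendly_network(run=run_checked):
    """Reshape the network for the license check; always put it back afterwards."""
    try:
        for command in SETUP:
            run(command)
        yield
    finally:
        for command in RESTORE:
            run(command)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    with xilinx_friendly_network(run=lambda command: print(" ".join(command))):
        print("... make ise would run here ...")
```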
Come to think of it, perhaps I should investigate making synthesis happen inside a Docker container; and set up this insane network inside the container. Hmm.
Oh well, postponed for later investigation.
For now, take 2:
$ make ise
...
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
INFO:Security:56 - Part 'xc6slx100' is not a WebPack part.
WARNING:Security:42 - Your software subscription period has lapsed. Your current
version of Xilinx tools will continue to function, but you no longer qualify for
Xilinx software updates or new releases.
----------------------------------------------------------------------
...
(panic attack at first - but thankfully, synthesis continues fine regardless)
(shakes head)
You poor, poor HW people...
...
All constraints were met.
...
Generating Pad Report.
All signals are completely routed.
Design statistics:
Minimum period: ........ (Maximum frequency: 83.081MHz)
...
Creating bit map...
Saving bit stream in "TheBigLeonski.bit".
Creating bit mask...
Saving mask bit stream in "TheBigLeonski.msk".
Bitstream generation is complete.
Woohoo! We're good. Way beyond good, in fact - we can bump up our LEON's clock way above our current setting of 25MHz.
But before we do that, let's bump the number of cores - in fact, this FPGA is such a monster, it can easily accommodate 2, even 4 LEONs. The utilization report - showing the percentage of utilised resources - is far from maxed out in everything (except BlockRAMs [2]).
So we bump up the number of cores:
-- 2 LEON cores, please!
constant CFG_NCPU : integer := (2);
...and we bump up the clock, to a very safe 50MHz:
-- 10/5 = 2, so 2x25MHz = 50MHz
constant CFG_CLKMUL : integer := (10);
constant CFG_CLKDIV : integer := (5);
This is the part that should make you stand up and take notice - here we are, casually specifying, in code, that we want 2 cores in our CPU. FPGAs are amazing.
We run our synthesis again, and a few minutes later...
That's it - GRMON sees both our cores, running at 50MHz:
JTAG chain (1): xc6slx100
GRLIB build version: 4208
Detected frequency: 50.0 MHz
Component                         Vendor
LEON3 SPARC V8 Processor          Cobham Gaisler
LEON3 SPARC V8 Processor          Cobham Gaisler
JTAG Debug Link                   Cobham Gaisler
AHB/APB Bridge                    Cobham Gaisler
LEON3 Debug Support Unit          Cobham Gaisler
Single-port AHB SRAM module       Cobham Gaisler
Generic UART                      Cobham Gaisler
Multi-processor Interrupt Ctrl.   Cobham Gaisler
Modular Timer Unit                Cobham Gaisler
Time to compile some SW and run it inside this!
One can compile a cross compiler for this target from source (and in fact I frequently do, as part of my duties in the Agency). But to avoid making this gigantic blog post even heavier, let's just use the precompiled open-source toolchain of BCC2 - from here. We un-tar under /opt, and build our hello world:
$ cat hello.c
#include <stdio.h>
int main() { puts("Hello, Big Leonski!"); }
$ /opt/bcc-2.0.8-gcc/bin/sparc-gaisler-elf-gcc -mcpu=leon3 \
-o hello hello.c
$ /opt/grmon-eval-3.1.0/linux/bin64/grmon -u -xilusb
...
grmon3> load hello
40000000 .text 25.2kB / 25.2kB [===============>] 100%
40006500 .rodata 128B [===============>] 100%
40006580 .data 1.2kB / 1.2kB [===============>] 100%
Total size: 26.53kB (1.18Mbit/s)
Entry point 0x40000000
Image /var/tmp/hello loaded
grmon3> run
Hello, Big Leonski!
CPU 0: Program exited normally.
CPU 1: Power down mode
And that's it - we have ourselves a multi-core CPU, built from our own source code, running binaries built from our own source code, with a cross-compiler that can also be built from openly accessible source code.
Ideally, one would want to support the remaining pieces of this board: it has two USB slots, Ethernet, and most importantly 128MB of DDR2 SDRAM. These last two pieces in particular would elevate it to something like the first "serious" machine I worked with, back when I was a student: a SPARCstation. I'd love that; and if the HW controllers involved are supported by Linux, bootstrapping the undisputed king of OSes inside this would be a breeze.
Alas, I am told by my friends that DDR controllers are no joke; they are not the playground of bored SW engineers.
Sigh :-)
Still, I hope you found this (very long) read an interesting one.
Cheers!
Discussion in Reddit/Linux Discussion in Reddit/FPGA
To HW designers reading this - please remember who the intended audience of this blog post is. Come to think of it, remember this is written by a SW developer; cue appropriate meme.
Updated: Tue Jun 13 21:45:26 2023