Info
Copyright © 2021-2024 Robert Winkler
This book is licensed under the Creative Commons BY-NC-SA 4.0 which summarized means:
You are free to:
-
Share — copy and redistribute the material in any medium or format
-
Adapt — remix, transform, and build upon the material
-
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
-
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
-
NonCommercial — You may not use the material for commercial purposes.
-
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
-
No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
You can support the book and purchase the chapter exercise solutions here.
The repo for the book, where you can get the code referenced and report any errors (submit an issue or even a pull request) is here. The code is MIT licensed.
If you’re interested in contacting me regarding RISC-V tutoring or any other business request related to the book, you can reach me at books@robertwinkler.com.
Dedication
This book is dedicated to all its supporters and all the students I’ve helped with RISC-V over the years who inspired me to create it.
Thank you.
An extra thank you goes to the corporate level sponsors below:
|
Chapter 0: Hello World
In which we lay the groundwork for the rest of the book…
Prereqs
While someone with no programming experience could probably learn RISC-V from this book, it is definitely preferable to have at least some experience in a higher level imperative programming language. I say imperative, because programming in assembly is the antithesis of functional programming; everything is about state, with each line changing the state of the CPU and sometimes memory. Given that, experience in functional languages like Lisp, Scheme etc. are less helpful than experience in C/C++, Java, Python, Javascript etc.
Of all of the latter, C is the best, with C++ being a close second because at least all of C exists in C++. There are many reasons C is the best prior experience when learning assembly (any assembly, not just RISC-V), including the following:
-
pointers, concepts and symmetry of "address of" and "dereference" operators
-
pointer/array syntax equivalence
-
stack allocation as the default
-
manual memory management, no garbage collector
-
global data
-
rough equivalence in structure of a C program and an assembly program (vs. say Java)
-
pass by value
There is some overlap between those and there are probably more, but you can see that most other languages that are commonly taught as first languages are missing most, if not all of those things.
Even C++, which technically has all of them being a superset of C, is usually taught in a way that mostly ignores all of those things. They teach C++ as if it’s Java, never teaching the fundamentals. In any case this is getting into my problems with CS pedagogy of the last 20 years based on my experience as a CS major myself ('12) and as a programming tutor helping college students across the country since 2016, and I should save it for a proper essay/rant sometime.
Long story short, I use C code and C syntax to help explain and teach RISC-V. I’ll try to provide enough explanation regardless of past experience as best I can.
System Setup
As I tell all of my tutoring students, if you’re majoring in CS or anything related I highly recommend you use Linux. It’s easier in every way to do dev work on Linux vs Windows or Mac. Many assignments require it, which often necessitates using a virtual machine (which is painful, especially on laptops) and/or ssh-ing into a school Linux server, which is also less than ideal. In general, you’ll have to learn how to use the Unix terminal eventually and will probably use it to some extent in your career so it also makes sense to get used to it asap.
That being said, Windows does now have WSL so you can get the full Ubuntu or Debian or Fedora etc. terminal based system on Windows without having to setup a real virtual machine (or dealing with the slowdown that would cause). I’ve even heard that they’ll get support for Linux GUI programs soon.
MacOS on the other hand, is technically a Unix based system and you can use their terminal and install virtually any program from there using Macports or Homebrew or similar.
There are a few RISC-V simulators that I know of and have used:
-
RARS is a RISC-V port of MARS, a Java GUI based simulator with dozens of extra environment calls, syntactic sugar and features like graphics, memory mapped I/O, etc.
-
Venus, a web based simulator used by Berkeley (also has a downloadable jar)
-
Ripes, a graphical processor simulator and assembly editor for bare bones assembly programming
Of those three, RARS is by far the most full featured and user friendly for learning since it forked from the venerable MARS MIPS simulator. It is also the most commonly used by students outside of Berkeley. Given that, this book will focus primarily on RARS, though most of it applies equally well to Venus. Appendix A: Venus covers the differences between RARS and Venus.
You can download/access both at the following links:
-
RARSM RARS iMproved; fork of RARS with fixes/features
-
ThaumicMekanism’s Venus Get the jar here
-
CS 61C’s Venus Get the jar here.
Handy Resources
There are a few references that you should bookmark (or download) before you get started. The first is the RISC-V Greensheet. It’s possible you already have a physical copy of this as it’s actually the tearout from the Patterson and Hennessey textbook Computer Architecture and Design that is commonly used in college courses. Berkeley provides a similar reference sheet with the same information.
There is also a large format of the greensheet.
The second thing is the list of environment calls (aka ecalls, system calls, syscalls) from the RARS wiki.
I recommend you download/bookmark both and keep them open while working because you’ll be referencing them often to remind yourself which instructions and ecalls you have available and how they work.
Hello World
Let’s start with the classic hello world program, first in C, then in RISC-V, and go over all the pieces in overview. You can copy paste these into your editor of choice (mine being neovim), or use the files in the associated repo to follow along.
1
2
3
4
5
6
7
#include <stdio.h>
int main()
{
printf("Hello World!\n");
return 0;
}
It is pretty self explanatory. You have to include stdio.h so you can use the function printf (though in the real world I’d use puts here), the function main is the start of any C/C++ program, which is a function that returns an int. We call printf to display the string "Hello World!\n" to the user and then return 0 to exit. Returning 0 indicates success and there were no errors.
You can compile and run it in a linux/unix terminal as shown below. You can substitute clang or another compiler for gcc if you want.
$ gcc -o hello hello.c
$ ./hello
Hello World!
Now, the same program in RISC-V:
1
2
3
4
5
6
7
8
9
10
11
.data
hello: .asciz "Hello World!\n"
.text
main:
li a7, 4 # load immediate, a7 = 4 (4 is print string system call)
la a0, hello # load address of string to print into a0
ecall
li a7, 10 # exit ecall
ecall
The .data
section is where you declare global variables, which includes string
literals as in this case. We’ll cover them in more detail later.
The .text
section is where any code goes. Here we declare a single label main:
,
indicating the start of our main function.
We then put the number 4 in the a7
register to select the print string system
call. The print string system call takes one argument, the address of the string
to print, in the a0
register. We do that on the next line. On line 8, we call
the system call using the ecall instruction.
Finally we call the exit system call which takes no arguments and exits the program.
Again, we’ll cover system calls in a later chapter. This is just an intro/overview so don’t worry if some things aren’t completely clear. This chapter is about getting you up and running, not really about teaching anything specific yet.
Building and Running
Now that we have our hello world RISC-V program, how do we run it? Well the easiest and quickest[1] way is of course to do it on the command line, which can be done like this:
$ java -jar ~/rars_latest.jar hello.s
RARS 1.5 Copyright 2003-2019 Pete Sanderson and Kenneth Vollmar
Hello World!
Program terminated by calling exit
The name of your RARS jar file may be different[2], so be sure to use the correct name and path. For myself, I keep the jar file in my home directory so I can use tilde to access it no matter where I am. You can also copy it into your working directory (ie wherever you have your source code) so you don’t have to specify a path at all. There are lots of useful command line options that you can use[3], some of which we’ll touch on later.
Running the jar directly on the command line works even in the Windows/DOS command line though I’ve never done it and it’s probably not worth it.
Alternatively, you can start up RARS like a normal GUI application and then load your source file. RARS requires you to hit "assemble" and then "run".
Conclusion
Well, there you have it, you have written and run your first RISC-V program. Another few chapters and you will have no trouble with almost anything you would want to do in RISC-V, whether for a class, or on your own for fun.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Modify the hello world program to print something different, perhaps your name.
-
Run it with both RARS and Venus.
purposes of learning and actually using assembly directly, there’s no reason to make your life harder than necessary.]
Chapter 1: Data
In RISC-V, you can declare global variables in the .data
section.
At a minimum, this is where you would declare/define any literal strings your program will be printing, since virtually every program has at least 1 or 2 of those.
When declaring something in the .data
section, the format is
variable_name: .directive value(s)
where whitespace between the 3 is arbitrary. The possible directives are listed in the following table:
Directive | Size | C equivalent |
---|---|---|
.byte |
1 |
char |
.half |
2 |
short |
.word |
4 |
int, all pointer types |
.float |
4 |
float |
.double |
8 |
double |
.ascii |
NA |
char str[5] = "hello"; (no '\0') |
.asciz |
NA |
char str[] = "hello"; (includes the '\0') |
.string |
NA |
alias for .asciz |
.space |
NA |
typeless, unitinialized space, can be used for any type/array |
As you can see it’s pretty straightforward, but there are a few more details about actually using them so let’s move onto some examples.
Say you wanted to convert the following simple program to RISC-V:
1
2
3
4
5
6
7
8
9
10
11
#include <stdio.h>
int main()
{
char name[30];
int age;
printf("What's your name and age?\n");
scanf("%s %d", name, &age);
printf("Hello %s, nice to meet you!\n", name);
return 0;
}
The first thing you have to remember when converting from a higher level language
to assembly (any assembly), is that what matters is whether it is functionally
the same, not whether everything is done in exactly the same wayfootnote:[Obviously compilers have to follow stricter rules, but for the.
In this instance, that means realizing that your literal strings and your local
variables name
and age
become globals in RISC-V.
1
2
3
4
5
6
7
8
9
10
11
12
.data
age: .word 0 # can be initialized to anything
ask_name: .asciz "What's your name and age?\n"
hello_space: .asciz "Hello "
nice_meet: .asciz ", nice to meet you!\n"
name: .space 30
.text
# main goes here
As you can see in the example, we extract all the string literals and
the character array name
and int age
and declare them as globals.
One thing to note is the second printf
. Because it prints a variable, name
,
using the conversion specifier, we break the literal into pieces around that.
Since there is no built-in printf
function in RISC-V, you have to handle printing
variables yourself with the appropriate environment calls.
Arrays
Obviously strings are special cases that can be handled with .ascii
or .asciz
(or the alias .string
) for literals, but for other types or user inputed strings
how do we do it?
The first way, which was demonstrated in the snippet above is to use .space
to declare an array of the necessary byte size. Keep in mind that the size is
specified in bytes not elements, so it only matches for character arrays. For
arrays of ints/words, floats, doubles etc. you’d have to multiply by the sizeof(type).
"But, .space
only lets you declare uninitialized arrays, how do I do initialized ones?"
Actually, it appears .space
initializes everything to 0 similar to global/static
data in C and C++, though I can’t find that documented anywhere.
Aside from that, there are two ways depending on whether you want to initialize every element to the same value or not.
For different values, the syntax is an extension of declaring a single variable
of that type. You specify all the values, comma separated. This actually gives
you another way to declare a string or a character array, though I can’t really
think of a reason you’d want to. You could declare a .byte
array and list all
the characters individually.
However, if you want an array with all elements initialized to the same value
there is a more convenient option. After the type you put the value you want,
a colon, and then the number of elements. So a: .word 123 : 10
would
declare a 10 integer array with all elements set to 123. Note, Venus does
not support this syntax.
Given what we just covered, this:
1
2
3
4
5
int a[20];
double b[20];
int c[10] = { 9,8,7,6,5,4,3,2,1,0 };
int d[5] = { 42, 42, 42, 42, 42 };
char e[3] = { 'a', 'b', 'c' };
becomes
1
2
3
4
5
6
.data
a: .space 80
b: .space 160
c: .word 9,8,7,6,5,4,3,2,1,0
d: .word 42 : 5
e: .byte 'a', 'b', 'c'
For more examples of array declarations, see array_decls.s. You don’t have to understand the rest of the code, just that it prints out each of the arrays.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Create a RISC-V data section that declares variables equivalent to the following. This will not be a runnable program without a main.
1 2 3 4 5 6
float a; float b = 2.71; int myarray[10] = { 9, 8, 7, 6, 5, 4, 3, 2, 1 }; short array2[10]; char riscv_str[] = "RISC-V assembly is awesome!";
-
How would you declare an array of 500 points? The point structure is tightly packed and defined like this:
1 2 3 4 5
struct point { float x; float y; float z; };
Chapter 2: Environment Calls
We mentioned environment calls (aka ecalls, though they’re also called system calls or syscalls in other languages like MIPS) in chapter 0 when we were going over our "Hello World" program, but what exactly are they?
Essentially, they are the built in functions of an operating system; in this case, the simple operating system of the RARS simulator. They provide access to all the fundamental features, like input and output to/from both the console and files, allocating memory, and exiting. Those are the basics but RARS supports many more, for things ranging from playing MIDI sounds, to getting a random number, to creating GUI dialogs.[4]
Name | a7 | Arguments | Result |
---|---|---|---|
print integer |
1 |
a0 = integer to print |
|
print float |
2 |
fa0 = float to print |
|
print double |
3 |
fa0 = double to print |
|
print string |
4 |
a0 = address of string |
|
read integer |
5 |
a0 = integer read |
|
read float |
6 |
fa0 = float read |
|
read double |
7 |
fa0 = double read |
|
read string |
8 |
a0 = address of input buffer |
works like C’s fgets |
sbrk |
9 |
a0 = size in bytes to allocate |
a0 = address of allocated memory (sbrk is basically malloc but there is no free) |
exit |
10 |
program terminates |
|
print character |
11 |
a0 = character to print (ascii value) |
|
read character |
12 |
a0 = character read |
|
open file |
1024 |
a0 = address of filename |
a0 = file descriptor (negative if error) |
lseek |
62 |
a0 = file descriptor, a1 = offset from base, |
a0 = selected position from beginning of the file or -1 if error |
read from file |
63 |
a0 = file descriptor |
a0 = number of characters read, -1 if error |
write to file |
64 |
a0 = file descriptor |
a0 = number of characters written |
close file |
57 |
a0 = file descriptor |
|
exit2 |
93 |
a0 = termination result |
program terminates, returning number in a0 (only meaningful when run in the terminal, ignored in GUI) |
As you can see, they really only cover the basics. You can read or write the different types, do file I/O using calls identical to POSIX functions (open, read, write, close; see man pages), allocate memory, and exit. Even so, they’re sufficient to build anything you want.
So, what does that table mean? How do these actually work?
The process is:
-
Put the number for the ecall you want in
a7
-
Fill in the appropriate arguments, if any
-
Execute the ecall with
ecall
1
2
3
li a7, 1 # 1 is print integer
li a0, 42 # takes 1 arg in a0, the number to print
ecall # actually execute ecall
You can think of the above as print_integer(42);
. Let’s look at an actual
program that uses a few more ecalls next.
Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#include <stdio.h>
int main()
{
int age;
int height;
char name[50];
printf("What's your name? ");
fgets(name, 50, stdin);
printf("Hello %s", name);
printf("How old are you? ");
scanf("%d", &age);
printf("Enter your height in inches: ");
scanf("%d", &height);
printf("Your age + height = %d\n", age + height);
return 0;
}
I’m using fgets()
instead of scanf("%s", name)
because fgets works the same as the
read string ecall (8).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
.data
name: .space 50
nameprompt: .asciz "What's your name? "
hello_space: .asciz "Hello "
how_old: .asciz "How old are you? "
ask_height: .asciz "Enter your height in inches: "
ageplusheight: .asciz "Your age + height = "
.text
main:
li a7, 4 # print string
la a0, nameprompt # load address of string to print into a7
ecall
li a7, 8 # read string
la a0, name
li a1, 50
ecall
li a7, 4
la a0, hello_space
ecall
la a0, name # note 4 is still in a7
ecall
# don't print a newline here because
# one will be part of name unless they typed >48 characters
li a7, 4
la a0, how_old
ecall
li a7, 5 # read integer
ecall
mv t0, a0 # save age in t0
li a7, 4
la a0, ask_height
ecall
li a7, 5 # read integer
ecall
add t0, t0, a0 # t0 += height
li a7, 4
la a0, ageplusheight
ecall
li a7, 1 # print int
mv a0, t0 # a0 = age + height
ecall
# print newline
li a7, 11 # print char
li a0, 10 # ascii value of '\n'
ecall
li a7, 10 # exit ecall
ecall
There a few things to note from the example.
We don’t declare global variables for age or height. We could, but there’s no reason
to since we need them in registers to perform the addition anyway. Instead, we
copy/save age to t0
so we can use a0
for 2 more ecalls,
then add height to t0
.
This is generally how it works. Use registers for local variables unless required to do otherwise. We’ll cover more about register use when we cover the RISC-V calling convention.
Another thing is when we print their name, we don’t put 4 in a7
again because it
is still/already 4 from the lines above.
Lastly, many people will declare a string "\n"
and use print string to print a newline,
but it’s easier to use the print char ecall as we do right before exiting.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Convert the following C code to RISC-V
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#include <stdio.h> int main() { float price; double golden = 1.618; int ret; printf("Enter what the price of gas was last time you filled up: "); scanf("%f", &price); printf("%f is too expensive!\n", price); printf("The golden ratio is roughly %f\n", golden); printf("Enter an integer for the program to return: " scanf("%d", &ret); return ret; }
-
Write a program that asks the user for their name, reads it in, and then prints "Hello [user’s name]!"
Chapter 3: Branches and Logic
We can’t go much further in our RISC-V programming journey without covering branching.
Almost every non-trivial program requires some logic, even if it’s only a few if
or
if-else
statements. In other words, almost every program requires branching, a way
to do a instead of b, or to do a only if certain conditions are met.
You already know how to do this in higher level languages, the aforementioned if
statement. In assembly it’s more complicated. Your only tool is the ability
to jump to a label on another line based on the result of various comparisons. The
relevant instructions are listed in the following table:
Name | Opcode | Format | Operation |
---|---|---|---|
Branch On Equal |
beq |
beq rs1, rs2, label |
if (rs1 == rs2) goto label |
Branch On Not Equal |
bne |
bne rs1, rs2, label |
if (rs1 != rs2) goto label |
Branch Less Than |
blt |
blt rs1, rs2, label |
if (rs1 < rs2) goto label |
Branch Greater Than |
bgt |
bgt rs1, rs2, label |
if (rs2 > rs2) goto label |
Branch Less Than Or Equal |
ble |
ble rs1, rs2, label |
if (rs1 ⇐ rs2) goto label |
Branch Greater Than Or Equal |
bge |
bge rs1, rs2, label |
if (rs1 >= rs2) goto label |
Set Less Than |
slt |
slt rd, rs1, rs2 |
rd = (rs1 < rs2) ? 1 : 0 |
Set Less Than Immediate |
slti |
slti rd, rs1, imm |
rd = (rs1 < imm) ? 1 : 0 |
Set Less Than Immediate Unsigned |
sltiu |
sltiu rd, rs1, imm |
rd = (rs1 < imm) ? 1 : 0 |
Set Less Than Unsigned |
sltu |
sltu rd, rs1, rs2 |
rd = (rs1 < rs2) ? 1 : 0 |
You can see the same information and more on the RISC-V greensheet and the RARS Supported Instructions list.[5][6]
There are additional pseudoinstructions in the form of beq/bne/blt/bgt/ble/bge + 'z' which are syntactic sugar to compare a register against 0, ie the 0 register.
So the following:
beq t0, x0, label bne t1, x0, label blt t2, x0, label
would be equivalent to:
beqz t0, label bnez t1, label bltz t2, label
Note x0
is the same as zero
and is the hard coded 0 register. I’ll cover
registers in more detail in the chapter on functions and the calling conventions.
One final thing is that labels have the same naming requirements as C variables and functions. They must start with a letter or underscore and the rest can be letters, underscores, or digits.
Practice
The rest of this chapter will be going over many examples, looking at snippets of code in C and translating them to RISC-V.
Basics
Let’s start with the most basic if statement. The code in and after the if statement is arbitrary.
1
2
3
4
if (a > 0) {
a++;
}
a *= 2;
Now in RISC-V, let’s assume that a is in t0
. The tranlation would look
like this:
1
2
3
4
ble t0, x0, less_eq_0 # if (a <= 0) goto less_eq_0
addi t0, t0, 1 # a++
less_eq_0:
slli t0, t0, 1 # a *= 2 (shifting left by n is multiplying by 2^n)
There are a few things to note in this example. The first is that in assembly we test for the opposite of what was in the if statement. This will always be the case when jumping forward because (if we want to keep the same order of code) we can only jump over a block of code, whereas in C we fall into the block if the condition is true. In the process of mentally compiling a bit of C to assembly, it can be helpful to change to jump based logic first. For example the previous C would become:
1
2
3
4
5
if (a <= 0)
goto less_eq_0;
a++;
less_eq_0:
a *= 2;
This is obviously still valid C but matches the branching behavior of assembly exactly. You can see I put comments for the equivalent C code in my assembly; it helps with readability to comment every line or group of lines that way.
The second thing to notice is how we handled the multiplication. This has nothing to do with branching but is something we’ll touch on multiple times throughout the book. Your job when acting as a human compiler is to match the behavior. You are under no obligation to match the structure or operations of the higher level code exactly (unless your professor stupidly forces you to).
Given that, it is in your best interest to change and rearrange things in order to simplify the assembly as much as possible to make your life easier. Generally speaking, this also tends to result in more performant code, since using fewer instructions and fewer branches (the most common outcomes) saves execution time.
In this case, using the standard mul
instruction would actually take 2
instructions:
1
2
li t1, 2
mul t0, t0, t1 # a *= 2
This is why, when multiplying or dividing by a constant power of 2 it’s common
practice to use slli
or srai
. This is true in all assembly languages because
multiplication and division are relatively costly operations so using shifts
when you can saves performance even if you didn’t actually save instructions.
Ok, let’s look at an if-else
example. Again, the actual code is arbitrary and
we’re assuming a and b are in t0
and t1
respectively
1
2
3
4
5
if (a > 0) {
b = 100;
} else {
b -= 50;
}
You could do it something like these two ways
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
bgt t0, x0, greater_0 # if (a > 0) goto greater_0
addi t1, t1, -50 # b -= 50
j less_eq_0
greater_0:
li t1, 100 # b = 100
less_eq_0:
# or
ble t0, x0, less_eq0 # if (a <= 0) goto less_eq_0
li t1, 100 # b = 100
j greater_0
less_eq_0:
addi t1, t1, -50 # b -= 50
greater_0:
You can see how the first swaps the order of the actual code which keeps the actual conditions the same as in C, while the second does what we discussed before and inverts the condition in order keep the the blocks in the same order. In both cases, an extra unconditional branch and label are necessary so we don’t fall through the else case. This is inefficient and wasteful, not to mention complicates the code unecessarily. Remember how our job is to match the behavior, not the exact structure? Imagine how we could rewrite it in C to simplify the logic:
1
2
3
4
b -= 50;
if (a > 0) {
b = 100;
}
which becomes
1
2
3
4
addi t1, t1, -50 # b -= 50;
ble t0, x0, less_eq_0 # if (a <= 0) goto less_eq_0
li t1, 100 # b = 100
less_eq_0:
That is a simple example of rearranging code to make your life easier. In this case, we are taking advantage of what the code is doing to make a default path or default case. Obviously, because of the nature of the code subtracting 50 has to be the default since setting b to 100 overwrites the original value which we’d need if we were supposed to subtract 50 instead. In cases where you can’t avoid destructive changes (like where the condition and the code are using/modifying the same variable), you can use a temporary variable; i.e. copy the value into a spare register. You still save yourself an unecessary jump and label.
Compound Conditions
These first 2 examples have been based on simple conditions, but what if you have compound conditions? How does that work with branch operations that only test a single condition? As you might expect, you have to break things down to match the logic using the operations you have.
Let’s look at and first. Variables a, b, and c are in t0, t1, and t2.
1
2
3
4
if (a > 10 && a < b) {
c += 20;
}
b &= 0xFF;
So what’s our first step? Like previous examples, we need to test for the opposite when we switch to assembly, so we need the equivalent of
1
2
3
4
5
if (!(a > 10 && a < b))
goto no_add20;
c += 20;
no_add20:
b &= 0xFF;
That didn’t help us much because we still don’t know how to handle that compound condition. In fact we’ve made it more complicated. If only there were a way to convert it to or instead of and. Why would we want that? Because, while both and and or in C allow for short circuit evaluation (where the result of the whole expression is known early and the rest of expression is not evaluated), with or, it short circuits on success while and short circuits on failure. What does that mean? It means that with or, the whole expression is true the second a single true term is found, while with and the whole expression is false the second a single false term is found.
Let’s look at the following code to demonstrate:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
if (a || b || c) {
something;
}
// What does this actually look like if we rewrote it to show what it's
// actually doing with short circuit evaluation?
if (a) goto do_something;
if (b) goto do_something;
if (c) goto do_something;
goto dont_do_something;
do_something:
something;
dont_do_something:
// You can see how the first success is all you need
// Compare that with and below:
if (a && b && c) {
something;
}
if (a) {
if (b) {
if (c) {
something;
}
}
}
// which in jump form is
if (a)
goto a_true;
goto failure;
a_true:
if (b)
goto b_true;
goto failure;
b_true:
if (c)
goto c_true:
goto failure;
c_true:
something;
failure:
// Man that's ugly, overcomplicated, and hard to read
// But what if we did this instead:
if (!a) goto dont_do_something;
if (!b) goto dont_do_something;
if (!c) goto dont_do_something;
something;
dont_do_something:
// Clearly you need all successes for and. In other words
// to do and directly, you need state, knowledge of past
// successes. But what about that second translation of and?
// It looks a lot like or?
You’re exactly right. That final translation of and is exactly like or.
It takes advantage of De Morgan’s laws.[7] For those of you who haven’t taken a Digital Logic course (or have forgotten), De Morgan’s laws are 2 equivalencies, a way to change an or to an and, and vice versa.
They are (in C notation):
!(A || B) == !A && !B
!(A && B) == !A || !B
Essentially you can think of it as splitting the not across the terms and changing the logical operation. The law works for arbitrary numbers of terms, not just 2:
(A && B && C) is really ((A && B) && C) so when you apply De Morgan's Law recursively you get: !((A && B) && C) == !(A && B) || !C == !A || !B || !C
Let’s apply the law to our current compound and example. Of course the negation of greater or less than comparisons means covering the rest of the number line so it becomes:
1
2
3
4
5
if (a <= 10 || a >= b))
goto no_add20;
c += 20;
no_add20:
b &= 0xFF;
which turns into:
1
2
3
4
5
6
7
li t6, 10
ble t0, t6, no_add20 # if (a <= 10) goto no_add20
bge t0, t1, no_add20 # if (a >= b) goto no_add20
addi t2, t2, 20 # c += 20
no_add20:
andi t1, t1, 0xFF # b &= 0xFF
See how that works? Or's do not need to remember state. Just the fact that you reached a line in a multi-term or expression means the previous checks were false, otherwise you’d have jumped. If you tried to emulate the same thing with an and, as you saw in the larger snippet above, you’d need a bunch of extra labels and jumps for each term.
What about mixed compound statements?
1
2
3
4
if (a > 10 || c > 100 && b >= c)
printf("true\n");
b |= 0xAA;
Well, the first thing to remember is that &&
has a higher priority than ||
,
which is why most compilers these days will give a warning for the above code
about putting parenthesis around the &&
expression to show you meant it (even
though it’s completely legal as is).
So with that in mind, let’s change it to jump format to better see what we
need to do. While we’re at it, let’s apply De Morgan’s law to the &&
.
1
2
3
4
5
6
7
8
9
10
11
if (a > 10)
goto do_true;
if (c <= 100)
goto done_if;
if (b < c)
goto done_if;
do_true:
printf("true\n");
done_if:
b |= 0xAA;
This one is trickier because we don’t flip the initial expression like normal. Instead of jumping over the body which would require testing for the opposite, we jump to the true case. We do this because we don’t want to have multiple print statements and it lets us fall through the following conditions. We would need multiple print statements because failure for the first expression is not failure for the entire expression. Here’s how it would look otherwise:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
if (a <= 10)
goto check_and;
printf("true\n");
goto done_if;
check_and:
if (c <= 100)
goto done_if;
if (b < c)
goto done_if;
printf("true\n");
done_if:
b |= 0xAA;
That is harder to read and has both an extra print and an extra jump.
So let’s convert the better version to RISC-V (a,b,c = t0
, t1
, t2
):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
.data
true_str: .asciz "true\n"
.text
li t5, 10 # get the necessary literals in some unused regs
li t6, 100
bgt t0, t5, do_true # if (a > 10) goto do_true
ble t2, t6, done_if # if (c <= 100) goto done_if
blt t1, t2, done_if # if (b < c) goto done_if
do_true:
li a7, 4 # print string
la a0, true_str # address of str in a0
ecall
done_if:
ori t1, t1, 0xAA # b |= 0xAA
If-Else Chain
Ok, let’s look at a larger example. Say you’re trying to determine
a student’s letter grade based on their score. We’re going to need a chain
of if-else-if
's to handle all cases.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#include <stdio.h>
int main()
{
int score;
char letter_grade;
printf("Enter your score: ");
scanf("%d", &score);
if (score >= 90) {
letter_grade = 'A';
} else if (score >= 80) {
letter_grade = 'B';
} else if (score >= 70) {
letter_grade = 'C';
} else if (score >= 60) {
letter_grade = 'D';
} else {
letter_grade = 'F';
}
printf("You got a %c\n", letter_grade);
return 0;
}
With chains like these, if you follow everything we’ve learned, it comes out
looking like this (assuming score is t0
and letter_grade is t1
):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
.data
prompt: .asciz "Enter your score: "
grade_str: .asciz "You got a "
.text
main:
li a7, 4 # print str
la a0, prompt
ecall
li a7, 5 # read int
ecall
mv t0, a0 # move score into t0
li t1, 70 # letter_grade default to 'F' ascii value
li t2, 90
blt t0, t2, not_a # if (score < 90) goto not_a
li t1, 65 # leter_grade = 'A'
j grade_done
not_a:
li t2, 80
blt t0, t2, not_b # if (score < 80) goto not_b
li t1, 66 # leter_grade = 'B'
j grade_done
not_b:
li t2, 70
blt t0, t2, not_c # if (score < 70) goto not_c
li t1, 67 # leter_grade = 'C'
j grade_done
not_c:
li t2, 60
blt t0, t2, grade_done # if (score < 60) goto grade_done
li t1, 68 # leter_grade = 'D'
grade_done:
li a7, 4 # print str
la a0, grade_str
ecall
li a7, 11 # print character
mv a0, t1 # char to print
ecall
li a0, 10 # print '\n'
ecall
li a7, 10 # exit
ecall
You can see how we set a default value and then test for the opposite of each condition to jump to the next test, until we get one that fails (aka was true in the original C condition) and set the appropriate grade.
You can arrange chains like this in either direction, it doesn’t have to match the order of the C code. As long as it works the same, do whatever makes the code simpler and more sensible to you.
Conclusion
Branching and logic and learning to translate from higher level code to assembly is something that takes a lot of practice, but eventually it’ll become second nature. We’ll get more practice in the chapter on looping which naturally also involves branching.
One final note, there’s rarely any reason to use the slt
family of opcodes
unless your professor requires it for some strange reason. Even if your
professor says you can’t use pseudoinstructions, that would still leave you with
beq
, bne
, blt
, bge
, which covers every possibility even if you sometimes
have to switch the order of the operands.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Convert the following C code to RISC-V.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#include <stdio.h> int main() { int num; printf("Enter an integer: "); scanf("%d", &num); if (num > 50) { puts("The number is greater than 50"); } else if (num < 50) { puts("The number is less than 50"); } else { puts("You entered 50!"); } return 0; }
-
Prompt for the user’s name, then tell them whether their name starts with a letter from the first or second half of the alphabet. Be sure to handle both upper and lower case correctly, but assume they entered a valid letter.
Chapter 4: Loops
"Insanity is doing the same thing over and over again and expecting different results."
Often misattributed to Albert Einstein
Before we get into the RISC-V, I want to cover something that may be obvious to some but
may have never occurred to others. Any loop structure can be converted to any other
(possibly with the addition of an if
statement). So a for
can be written as a while
and vice versa. Even a do-while
can be written as a for
or while
loop. Let’s look
at some equivalencies.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
for (int i=0; i<a; i++) {
do_something;
}
int i = 0;
while (i < a) {
do_something;
i++;
}
int i = 0;
if (i < a) {
do {
do_something;
i++;
} while (i < a);
}
// you could also have an if (i >= a) goto loop_done; to jump over do-while
In general, when writing assembly, it can help to think more in terms of while
or
do-while
rather than for
because the former more closely resemble what the
assembly looks like in terms of what goes where. Like in the last chapter,
where we would think of the if-else
statements in "jump-form" or "branch-form",
we can do the same here, converting for
to while
in our head as an intermediary
step before going to assembly.
Speaking of "jump-form", lets apply it to the loop above:
1
2
3
4
5
6
7
8
9
10
int i=0;
if (i >= a)
goto done_loop;
loop:
do_something;
i++
if (i < a)
goto loop;
done_loop:
You can see how that starts to look more like assembly. Another thing to note is that
unlike with if
statements where we test for the opposite to jump over the block of code,
when you’re doing the loop test at the bottom like with a do-while
, it is unchanged
from C because you are jumping to continue the loop. If you put the test at the top it
becomes inverted, and you put an unconditional jump at the bottom:
1
2
3
4
5
6
7
8
9
int i=0;
loop:
if (i >= a)
goto done_loop;
do_something;
i++
goto loop:
done_loop:
In general it’s better to test at the bottom, both because the condition matches
the higher level form, and because when you know the loop is going to execute at least once it requires
only one jump + label, rather than 2 since you can forgo the the initial if
check:
1
2
3
4
5
6
7
8
9
10
11
for (int i=0; i<10; i++)
do_something;
// becomes
int i=0;
loop:
do_something;
i++
if (i < 10)
goto loop;
Ok, now that we’ve got the theory and structure out of the way, let’s try doing a simple one in RISC-V.
1
2
3
4
int sum = 0;
for (int i=0; i<100; i++) {
sum += i;
}
That’s about as basic as it gets, adding up the numbers 0 to 99.
1
2
3
4
5
6
7
li t0, 0 # sum = 0
li t1, 1 # i = 1 we can start at 1 because obviously adding 0 is pointless
li t2, 100
loop:
addi t0, t0, t1 # sum += i
addi t1, t1, 1 # i++
blt t1, t2, loop # while (i < 100)
Ok I don’t think there’s much point in doing any more without getting to what loops are most often used for, looping through data structures, most commonly arrays.
Looping Through Arrays
Looping and arrays go together like peanut butter and jam. An array is a sequence of variables of the same type, almost always related in some way. Naturally, you want to operate on them all together in various ways; sorting, searching, accumulating, etc. Given that the only way to do that is with loops, in this section we’ll cover different ways of looping through arrays, including multidimensional arrays.
1D Arrays
Let’s pretend there’s an array int numbers[10];
filled with 10 random numbers.
1
2
3
4
int total = 0;
for (int i=0; i<10; i++) {
total += numbers[i];
}
There are several ways to do this. The first is the most literal translation.
1
2
3
4
5
6
7
8
9
10
11
12
li t0, 0 # total = 0
li t1, 0 # i = 0
la t2, numbers # t2 = numbers
li t3, 10
sum_loop:
slli t4, t1, 2 # t4 = i*sizeof(int) == i*4
add t4, t4, t2 # t4 = &numbers[i]
lw t4, 0(t4) # t4 = numbers[i]
add t0, t0, t4 # total += numbers[i]
addi t1, t1, 1 # i++
blt t1, t3, sum_loop # while (i < 10)
We initialize the relevant variables beforehand (numbers
and 10
could be loaded
every iteration but that’s less efficient). Now what’s with the i*4
? We already
discussed using shifts to multiply and divide by powers of 2 in a previous chapter,
but here we’re doing something that higher level languages do automatically for you
every time you do an array access. When you access the i'th element, under the hood
it is multiplying i
by the size of the type of the array and adding that number of
bytes to the base address and then loading the element located there.
If you’re unfamiliar with the C syntax in the comments, &
means "address of", so
t4
is being set to the address of the i'th element. Actually that C syntax is
redundant because the the &
counteracts the brackets. In C adding a number to a
pointer does pointer math (ie it multiplies by the size of the items as discussed
above). This means that the following comparison is true:
&numbers[i] == numbers + i
which means that this is true too
&numbers[0] == numbers
The reason I use the form on the left in C/C++ even when I can use the right is it makes it more explicit and obvious that I’m getting the address of an element of an array. If you were scanning the code quickly and saw the expression on the right, you might not realize that’s an address at all, it could be some mathematical expression (though the array name would hopefully clue you in if it was picked well).
Anyway, back to the RISC-V code. After we get the address of the element we want, we
have to actually read it from memory (ie load it). Since it’s an array of words
(aka 4 byte ints) we can use load word, lw
.
Finally, we add that value to total
, increment i
, and perform the loop check.
Now, I said at the beginning that this was the most literal, direct translation
(not counting the restructuring to a do-while
form). However, it is not my preferred
form because it’s not the simplest, nor the shortest.
Rather than calculate the element address every iteration, why not keep a pointer to the current element and iterate through the array with it? In C what I’m suggesting is this:
1
2
3
4
5
6
7
int* p = &numbers[0];
int i = 0, total = 0;
do {
total += *p;
i++;
p++;
} while (i < 10);
In other words, we set p
to point at the first element and then increment it every
step to keep it pointing at numbers[i]
. Again, all mathematical operations on pointers
in C deal in increments of the byte syze of the type, so p++
is really adding 1*sizeof(int)
.
1
2
3
4
5
6
7
8
9
10
11
li t0, 0 # total = 0
li t1, 0 # i = 0
la t2, numbers # p = numbers
li t3, 10
sum_loop:
lw t4, 0(t2) # t4 = *p
add t0, t0, t4 # total += *p
addi t1, t1, 1 # i++
addi t2, t2, 4 # p++ ie p += sizeof(int)
blt t1, t3, sum_loop # while (i < 10)
Now, that may not look much better, we only saved 1 instuction, and if we were
looping through a string (aka an array of characters, sizeof(char) == 1
) we wouldn’t
have saved any. However, imagine if we weren’t using slli
to do the multiply but
mul
. That would take 2 instructions, even if one could be above the loop.
And remember we would have to use mul
instead of slli
if we were iterating
through an array of structures with a size that wasn’t a power of 2, so using this
method saves even more in that rare case.
However, there is one more variant that you can use that can save a few more instructions.
Instead of using i
and i<10
to control the loop, use p
and the address just past the
end of the array. In C it would be this:
1
2
3
4
5
6
7
int* p = &numbers[0];
int* end = &numbers[10];
int total = 0;
do {
total += *p;
p++;
} while (p < end);
You could also use !=
instead of <
. This is similar to using the .end()
method
on many C++ data structures when using iterators. Now the RISC-V version:
1
2
3
4
5
6
7
8
9
li t0, 0 # total = 0
la t2, numbers # p = numbers
addi t3, t2, 40 # end = &numbers[10] = numbers + 10*sizeof(int)
sum_loop:
lw t4, 0(t2) # t4 = *p
add t0, t0, t4 # total += *p
addi t2, t2, 4 # p++ ie p += sizeof(int)
blt t2, t3, sum_loop # while (p < end)
So we dropped from 10 to 7 instructions, 6 to 4 in the loop itself which is the most important for performance. And this was for a 1D array. Imagine if you had 2 or 3 indices you had to use to calculate the correct offset. That’s what we go over in the next section.
2D Arrays
The first thing to understand is what’s really happening when you declare a 2D array in C. The contents of a 2D array are tightly packed, in row-major order, meaning that all the elements from the first row are followed by all the elements of the second row and so on. What this means is that a 2D array is equivalent to a 1D array with rows*cols elements in the same order:
1
2
3
4
5
#define ROWS 2
#define COLS 4
// The memory of these two arrays are identical
int array[ROWS][COLS] = { { 1, 2, 3, 4 }, { 5, 6, 7, 8 } };
int array1d[ROWS*COLS] = { 1, 2, 3, 4, 5, 6, 7, 8 };
See the code example 2d_arrays.c for more details.
What this means is that when we declare a 2D array, it’s basically a 1D array with the size equal to rows * columns. Also, when we loop through a 2D array, we can often treat it like a 1D array with a single loop. So everything that we learned before applies.
Let’s do an example.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
for (int i=0; i<rows; i++) {
for (int j=0; j<cols; ++j) {
array[i][j] = i + j;
}
}
// becomes
int r, c;
for (int i=0; i<rows*cols; i++) {
r = i / cols;
c = i % cols;
array[i] = r + c;
}
So assuming rows
and cols
are in a0
and a1
(and nonzero), it would
look like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
la t0, array # p = &array[0]
li t1, 0 # i = 0
mul t2, a0, a1 # t2 = rows * cols
loop:
div t3, t1, a1 # r = i / cols
rem t4, t1, a1 # c = i % cols
add t3, t3, t4 # t3 = r + c
sw t3, 0(t0) # array[i] = *p = r + c
addi t1, t1, 1 # i++
addi t0, t0, 4 # p++ (keep pointer in sync with i, aka p = &array[i])
blt t1, t2, loop # while (i < rows*cols)
You might ask if it’s it worth it to convert it to a single loop when you still
need the original i
and j
as if you were doing nested loops. Generally, it is
much nicer to avoid nested loops in assembly if you can. There are many cases
when you get the best of both worlds though. If you’re doing a clear for example,
setting the entire array to a single value, there’s no need to calculate the row
and column like we did here. I only picked this example to show how you could
get them back if you needed them.
For comparison here’s the nested translation (while still taking advantage of the 1D arrangement of memory and pointer iterators):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
la t0, array # p = &array[0]
li t1, 0 # i = 0
looprows:
li t2, 0 # j = 0
loopcols:
add t3, t1, t2 # t3 = i + j
sw t3, 0(t0) # array[i][j] = *p = i + j
addi t2, t2, 1 # j++
addi t0, t0, 4 # p++ (keep pointer in sync with i and j, aka p = &array[i][j])
blt t2, a1, loopcols # while (j < cols)
addi t1, t1, 1 # i++
blt t1, a0, looprows # while (i < rows)
It’s the same number of instructions, but with an extra label and branch. I think I prefer this version despite the extra branch. On the other hand, either of the last 2 versions are better than the literal translation below:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
la t0, array # p = &array[0]
li t1, 0 # i = 0
looprows:
li t2, 0 # j = 0
loopcols:
add t3, t1, t2 # t3 = i + j
# need to calculate the byte offset of element array[i][j]
mul t4, t1, a1 # t4 = i * cols
add t4, t4, t2 # t4 = i * cols + j
slli t4 t4, 2 # t4 = (i * cols + j) * sizeof(int)
add t4, t4, t0 # t4 = &array[i][j] (calculated as array + (i*cols + j)*4)
sw t3, 0(t4) # array[i][j] = i + j
addi t2, t2, 1 # j++
blt t2, a1, loopcols # while (j < cols)
addi t1, t1, 1 # i++
blt t1, a0, looprows # while (i < rows)
That chunk in the middle calculating the offset of every element? Not only is it far slower than iterating the pointer through the array, but you can imagine how much worse it would be for a 3D array with 3 nested loops.
Conclusion
Hopefully after those examples you have a more solid understanding of looping in RISC-V and how to transform various loops and array accesses into the form that makes your life the easiest. There is more we could cover here, like looping through a linked list, but I think that’s beyond the scope of what we’ve covered so far. Perhaps in a later chapter.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Convert the following C code to RISC-V. If using Venus, you can just hard code a "random" number between 0 and 100.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#include <stdio.h> #include <stdlib.h> int main() { int num = rand() % 101 int guess; puts("I'm thinking of a number 0-100. Try to guess it!"); while (1) { printf("Guess a number: "); scanf("%d", &guess); if (guess > num) { puts("Too high!"); } else if (guess < num) { puts("Too low!"); } else { break; } } printf("Correct, it was %d!\n", num); return 0; }
-
Write a RISC-V program to find and print the average of the following array. Use integer division.
1 2
.data array: .word 93,8,78,-6,51,49,3,2,128,0
-
Write a program to find the min and max of the array in the previous exercise
Chapter 5: Functions and the RISC-V Calling Convention
While I’m sure everyone here probably knows what functions are, you might be wondering what a "Calling Convention" is. In short, it is an agreement between the caller and callee about how to treat/use certain registers. We’ll get to the why and how later.
Functions
In assembly, a function is simply a label with a return instruction associated with it; because this is far more ambiguous than a function in a higher level language, it is good practice to only have a single return instruction associated with a function.[8] A comment above the label is also helpful. Together those help you quickly see the start and end of the function.
1
void func1() {}
would be
1
2
3
4
# void func1()
func1:
# body goes here
ret
As you can see my policy is to put a single line comment of the C prototype above label.
But how do you call a function in assembly? You use the instruction Jump and Link:
jal func_label
.
Let’s change the hello world program from chapter 0 to call a function:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
.data
hello: .asciz "Hello World!\n"
.text
main:
jal hello_world
li a7, 10 # exit ecall
ecall
# void hello_world()
hello_world:
li a7, 4 # print string ecall
la a0, hello # load address of string to print into a0
ecall
ret
What jal
actually does, is save the address of the next instruction to ra
and then do an unconditional jump to the function label. So you could achieve
the same results with the following:
jal func # is equivalent to la ra, next_instr j func next_instr:
That would get tiring and ugly fast though, having to come up with unique labels
for the next instruction every time. You also might be confused about why the
greensheet says jal
saves PC+4 in an arbitrary register R[rd] instead of ra
specifically (which would be R[1]). The instruction does actually take a register argument but
since it’s most commonly used to call a function if you don’t specify a register
it will use ra
as if you did jal ra, func
. This works in conjunction with
the pseudoinstruction ret
which does PC=R[1] using the instruction jalr
(specifically jalr x0, ra, 0
) to easily return from functions.
You might also see another way of returning using jr
which stands for
"Jump Register" and jumps to the address in the register, so to return
from a function you’d do jr ra
. It is also a pseudoinstruction
that uses jalr
. Unless your professor insists on something else, prefer
ret
; not only is it the shortest, returning from functions is its sole
purpose.
The Convention
We’ve gone as far as we can without starting to talk about registers and their purposes in functions. You can think of registers as variables[9] that are part of the CPU. In this case, since we’re dealing with a 32-bit RISC-V architecture, they are 32-bit (aka 4 bytes, 1 word) variables.[10] Since they’re part of the CPU, they exist for the life of the program and the whole program shares the same registers.
But how does that work? If all parts of the program use the same
32 registers, how does one function not stomp all over what another was doing
when it uses them? In fact, how do functions communicate at all? How do they
pass arguments or return results? All these questions are solved by deciding
on a "Calling Convention". It’s different for different architectures and even
different operating systems on the same architecture. This is because different
architectures have different numbers of registers, and some registers like ra
have
semi-hardcoded uses. The the pseudoinstruction ret
uses ra
, and x0
is a
constant 0 and there’s no way to change either of those facts. That still
leaves a lot of flexibility when designing a calling convention. While they
mostly match, you can probably find several variations of RISC-V calling
conventions online. They usually differ in how they setup a stack frame.
The convention covered in this chapter is consistent with, and sufficient for,
almost every college course I’ve ever heard of.
Regardless, what matters is that the calling convention works by setting rules (and guidelines) for register use, and when/how to use the stack.
If you’re unfamiliar with the
runtime stack, it’s exactly what it sounds like. It’s a Last-In-First-Out (LIFO)
data structure that you can use to store smaller values in a program. It grows
in a negative direction, so to allocate 12 bytes, you would subtract 12 from the
stack pointer (in RISC-V that’s sp
).
RISC-V specifically designates certain registers to be used for passing arguments
(at least the first 8), a couple for return values, and others for misc. temporary
or saved values. The rest are special use registers like ra
.
The quickest way to summarize is to look at the table on the greensheet which is reproduced below:
Register | Name | Use | Preserved Across a Call |
---|---|---|---|
x0 |
zero |
Constant 0 |
N.A. |
x1 |
ra |
Return address |
No |
x2 |
sp |
Stack pointer |
Yes |
x3 |
gp |
Global pointer |
— |
x4 |
tp |
Thread pointer |
— |
x5-x7 |
t0-t2 |
Temporaries |
No |
x8 |
s0/fp |
Saved register/Frame pointer |
Yes |
x9 |
s1 |
Saved register |
Yes |
x10-x11 |
a0-a1 |
Function arguments/Return values |
No |
x12-x17 |
a2-a7 |
Function arguments |
No |
x18-x27 |
s2-s11 |
Saved registers |
Yes |
x28-x31 |
t3-t6 |
Temporaries |
No |
To summarize, you have 15 registers that can be used anytime for temporary
values, though some have special uses too (the a and t registers).
You have 12 s registers that have to be saved on the stack if you use
them, plus ra
as well. The zero
register is obviously a special case.
The sp
register is technically preserved but not in the same way. Basically
what you allocate (subtract) you have to deallocate (add) before returning
from a function, thus preserving the original value.
You can ignore gp, tp
, and most of the time fp
too. Also, with 8 registers
to pass arguments, you’ll almost never need to pass arguments on the stack.
Basic example
Let’s start with something simple that doesn’t use the stack.
int hello_name_number(char* name, int number) { printf("Hello %s!\n", name); return number + 10; }
According to the convention that becomes:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
.data
hello_space: .asciz "Hello "
exclaim_nl: .asciz "!\n"
.text
# int hello_name_number(char* name, int number)
hello_name_number:
mv t0, a0 # save name in t0 since we need a0 for the ecall
li a7, 4 # print string
la a0, hello_space
ecall
mv a0, t0 # print name (a7 is still 4)
ecall
la a0, exclaim_nl # print "!\n"
ecall
addi a0, a1, 10 # return number + 10
ret
Some things to note, ecalls are not function calls so we can "save" a0
in
a t register and know that it’ll still be there when the ecall is done. In the
same way, we know that a7
is still the same so we don’t have to keep setting
it to 4 for print string. Lastly, to return a value, we make sure that value
is in a0
before returning.
Using the Stack
First, let’s establish the rules on when you have to use the stack (You can always use it for arbitrary local variables, like a local array for example, but generally don’t if you don’t have a good reason).
-
You call another function, ie you’re a non-leaf function.
This means you have to save
ra
on the stack at the very least, otherwise when you do yourret
you’d jump back into yourself (right after the lastjal
instruction). This does not apply to main because you don’t/shouldn’t return from main, you should call the exit (or exit2) ecall (10 or 93). -
You need to save values across a function call (automatically includes reason 1).
This is fairly common for non-trivial functions. Obvious examples are calling a function in a loop or loops (you’d have to preserve the iterator(s)), and many recursive functions.
-
You run out of temporary registers and overflow into the s registers.
This is very rare. The most common reason this "happens" is people forget they have 8 a registers, in addition to the 7 t registers, that they can also use for temporaries. 15 is more than enough to handle pretty much any function because you rarely need 16 discrete values at the same time.
Let’s look at an example for the first two. Any example for the last rule would be prohibitively large and complicated.
1
2
3
4
5
int non_leaf()
{
func1();
return 42
}
This calls the empty function discussed at the top of this chapter.
1
2
3
4
5
6
7
8
9
10
11
12
#int non_leaf()
non_leaf:
addi sp, sp, -4 # space to save 1 register, ra
sw ra, 0(sp) # store ra in the newly allocated stack space
jal func1
li a0, 42 # return 42
lw ra, 0(sp) # restore original ra
addi sp, sp, 4 # pop the stack
ret
The bit of code at the top and bottom of the function are called the prologue
and epilogue respectively for obvious reasons. We allocate 4 bytes on the stack
by subtracting 4 (I add a negative rather than subtract
because I can copy-paste the line with a single character change for the
epilogue). Then we store the current ra
in that space at the new top of the
stack. Then before we exit we have to load it back and pop the stack.
If we didn’t save and restore ra
we would jump to line 7 when we do our
ret
and then we’d be in an infinite loop.
Next we have the second case, where we need to preserve regular local values across a function call.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
void print_letters(char letter, int count)
{
for (int i=0; i<count; i++) {
putchar(letter);
}
putchar('\n');
}
int save_vals()
{
for (int i=0; i<10; i++) {
print_letters('A'+i, i+1);
}
return 8;
}
That becomes this in RISC-V:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#void print_letters(char letter, int count)
print_letters:
ble a1, x0, exit_pl # if (count <= 0) goto exit_pl
li a7, 11 # print character
pl_loop:
ecall
addi a1, a1, -1 # count--
bgt a1, x0, pl_loop # while (count > 0)
li a0, 10 # '\n'
ecall
exit_pl:
ret
#int save_vals()
save_vals:
addi sp, sp, -12
sw ra, 0(sp)
sw s0, 4(sp)
sw s1, 8(sp)
li s0, 0 # i = 0
li s1, 10
sv_loop:
addi a0, s0, 65 # i + 'A'
addi a1, s0, 1 # i + 1
jal print_letters
addi s0, s0, 1 # i++
blt s0, s1, sv_loop # while (i < 10)
lw ra, 0(sp)
lw s0, 4(sp)
lw s1, 8(sp)
addi sp, sp, 12
ret
Notice that for print_letters, we not only convert the loop to a do-while
, but
we also use the parameter count
as the iterator to count down to 0. It saves
us an instruction initializing an i
.
Second, for save_vals
, we save not only ra
because we call another function,
but also two s registers to save i
and our stopping point. The second is not
actually necessary; because it’s a constant, we could load 10 into a register
right before the check every iteration of the loop. Which version is better depends on
several factors, like how long or complex the loop is, how many times it executes, and
of course personal preference.
Recursive Functions
Let’s do a classic recursive function, the fibonacci sequence.
1
2
3
4
5
6
7
int fib(int n)
{
if (n <= 1)
return n;
return fib(n-2) + fib(n-1);
}
You can see how, at the very least, we’ll have to save ra
and n
, because we
need the original even after the first recursive call. It’s not as
obvious, but we’ll also have to save the return value of the first call so
we’ll still have it to do the addition after the second. You might think
this would require using two s regs, but does it? Let’s see…
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#int fib(int n)
fib:
addi sp, sp, -8
sw ra, 0(sp)
sw s0, 4(sp)
# n already in a0 for immediate return
li t0, 1
ble a0, t0, exit_fib # if (n <= 1) goto exit_fib (ie return n)
mv s0, a0 # save n
addi a0, a0, -2
jal fib # fib(n-2)
addi t0, s0, -1 # calc n-1 first so we can use s0 to save fib(n-2)
mv s0, a0 # save return of fib(n-2) in s0
mv a0, t0 # copy n-1 to a0
jal fib # fib(n-1)
add a0, a0, s0 # a0 = fib(n-1) + fib(n-2)
exit_fib:
lw ra, 0(sp)
lw s0, 4(sp)
addi sp, sp, 8
ret
Notice how we don’t have to save n
any sooner than necessary, ie right before
we have to use a0
to setup the first recursive call. Also, the ordering of
lines 16-18 is important. We needed the original n
to calculate n-1
but
once that’s in a0
ready for the call, because we won’t need n
again afterward,
we can now use s0
to preserve the return value of the first call.
Some of you, if you were paying attention, might point out that you could save a few instructions of performance if you moved the base case testing before the prologue as long as you put the exit label after the epilogue. This is true, but I’d recommend against it unless you were really trying to eke out every last microsecond. It’s nicer/cleaner to keep the prologue and epilogue as the first and last things; they’re one more thing to catch your eye and help delineate where functions start and end. Regardless, if you’re curious, you can see that version, along with every other function in this chapter in the included program calling.s.
Conclusion
While grasping the basics of a calling convention is not too difficult, it takes
practice to get used to it. There are many things that we haven’t covered
in this chapter, like how to pass more than 8 arguments, or use fp
, or handle
floating point arguments or return values. The latter at least, will be covered in
the next chapter.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Implement the following functions in RISC-V and write a program to demonstrate their use. You can reuse much of your code from the previous chapter’s exercises.
1 2 3 4 5 6 7 8 9 10
// return the min or max int get_min(int* array, int size); int get_max(int* array, int size); // return the index of the min/max int locate_min(int* array, int size); int locate_max(int* array, int size); // return the average of the array int calc_average(int* array, int size);
-
The Collatz conjecture is defined as follows: start with any positive integer n, if n is even, divide by 2, otherwise, multiply by 3 and add 1. The conjecture is that all sequences will eventually reach 1. Write 2 versions of the collatz function, one iterative and one recursive. Print out the sequence as they go.
1 2 3 4 5 6 7
void collatz_iterative(int n); void collatz_recursive(int n); // for an added challenge return the number of steps taken to reach 1 // you can remove the printing of the steps int collatz_iterative2(int n); int collatz_recursive2(int n);
Chapter 6: Floating Point Types
Up to this point we haven’t really mentioned floating point values or instructions
at all, except how to declare them in the .data
section and the ecalls for
reading and printing them. There are two reasons we’ve left them alone till now.
First, they use a whole separate set of registers and instructions. Second, and
partly because of the first reason, most college courses do not ever require
you to know or use floating point values. Since this book is targeted at college
students, if you know you won’t need to know this feel free to skip this chapter.
Floating Point Registers and Instructions
The greensheet contains all the floating point registers and their uses but you can also see them in the table below:
Register | Name | Use | Preserved Across a Call |
---|---|---|---|
f0-f7 |
ft0-ft7 |
Temporaries |
No |
f8-f9 |
fs0-fs1 |
Saved registers |
Yes |
f10-f11 |
fa0-fa1 |
Arguments/Return values |
No |
f12-f17 |
fa2-fa7 |
Arguments |
No |
f18-f27 |
fs2-fs11 |
Saved registers |
Yes |
f28-f31 |
ft8-ft11 |
Temporaries |
Yes |
Likewise, you can look to the greensheet to see all the floating point instructions but here are the most important/useful ones:
Name | Opcode | Format | Operation |
---|---|---|---|
Load |
flw,fld |
|
F[rd] = M[R[rs1]+n] |
Store |
fsw,fsd |
|
M[R[rs1]+n] = F[rd] |
Move from Integer |
fmv.[sd].x |
|
F[rd] = R[rs1] |
Move to Integer |
fmv.x.[sd] |
|
R[rd] = F[rs1] |
Move |
fmv.[sd] |
|
F[rd] = F[rs1] |
Convert to SP from DP |
fcvt.s.d |
|
F[rd] = (float)F[rs1] |
Convert to DP from SP |
fcvt.d.s |
|
F[rd] = (double)F[rs1] |
Convert from 32b Integer |
fcvt.[sd].w |
|
F[rd] = (float)R[rs1] |
Convert to 32b Integer |
fcvt.w.[sd] |
|
R[rd] = (int)F[rs1] |
Compare Equal |
feq.[sd] |
|
R[rd] = (F[rs1] == F[rs2]) ? 1 : 0 |
Compare Less Than |
flt.[sd] |
|
R[rd] = (F[rs1] < F[rs2]) ? 1 : 0 |
Compare Less Than Equal |
fle.[sd] |
|
R[rd] = (F[rs1] ⇐ F[rs2]) ? 1 : 0 |
Absolute Value |
fabs.[sd] |
|
F[rd] = (F[rs1] < 0) ? -F[rs1] : F[rs1] |
Add |
fadd.[sd] |
|
F[rd] = F[rs1] + F[rs2] |
Subtract |
fsub.[sd] |
|
F[rd] = F[rs1] - F[rs2] |
Multiply |
fmul.[sd] |
|
F[rd] = F[rs1] * F[rs2] |
Divide |
fdiv.[sd] |
|
F[rd] = F[rs1] / F[rs2] |
Negation |
fneg.[sd] |
|
F[rd] = -F[rs1] |
Anywhere you see a [sd], use s or d for single or double precisision.
You only get equal, less than, and less than equal, but it’s easy enough to flip the operands or test for the opposite result to cover the others.
Practice
We’re going to briefly go over some of the more different aspects of dealing with floating point numbers, but since most of it is the same but with a new set of registers and calling convention, we won’t be rehashing most concepts.
Getting Floating Point Literals
The first thing to know when dealing with floats is how to get float (or double) literals into registers where you can actually operate on them.
There are two ways. The first, and simpler way, is to declare them as globals
and then use the flw
and fld
instructions:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
.data
a: .float 3.14159
b: .double 1.61
.text
main:
la t0, a
flw ft0, 0(t0) # get a into ft0
la t0, b
fld ft1, 0(t0) # get b into ft1
# other code here
The second way is to use the regular registers and convert the values. Of course this means unless you want an integer value, you’d have to actually do it twice and divide, and even that would limit you to rational numbers. It looks like this:
1
2
3
4
5
fmv.s.x ft0, x0 # move 0 to ft0 (0 integer == 0.0 float)
# get 4 to 4.0 in ft1
li t0, 4
fcvt.s.w ft1, t0 # ft1 = (float)t0
As you can see, other than 0 which is a special case, it requires at least 2 instructions.
Note
|
There is a 3rd way in RARS that saves you a step. RARS lets you
use flw and fld like this flw ft0, label, t0 where t0 is used as
a temporary, ie it’s doing the load address into t0 for you before doing
the actual flw.
|
Branching
Branching based on floating point values is slightly different than normal. Instead of being able to test and jump in a single convenient instruction, you have to test first and then jump in a second instruction if the test was true or not. This is similar to the way x86 and MIPS (for floats) do it. For them, the test sets a special control/flag register (or a certain bit or bits in the register) and then all jumps are based on its state.
With RISC-V there is no special control register. The float comparisons are like the slt
instructions where you choose a destination register to set to 1 (true) or 0 (false).
Using them looks like this:
1
2
3
4
5
6
7
8
9
10
11
flt.s t0, ft0, ft1 # t0 = ft0 < ft1
bne t0, x0, was_less # if (t0 != 0 aka ft0 < ft1) goto was_less
# do something for ft0 >= ft1
j blah
was_less:
# do something for ft0 < ft1
blah:
Functions
Finally, lets do a simple example of writing a function that takes a float and returns a float. I’m not going to bother doing one for doubles because it’d be effectively the same, or doing one that requires the stack, because the only differences from normal are a new set of registers and knowing which ones to save or not from the table above.
So, how about a function to convert a fahrenheit temperature to celsius:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
.data
# 5/9 = 0.5 with 5 repeating
fahrenheit2celsius: .float 0.5555555
.text
# float convert_F2C(float degrees_f)
convert_F2C:
la t0, fahrenheit2celsius
flw ft0, 0(t0) # get conversion factor
# C = (F - 32) * 5/9
li t0, 32
fcvt.s.w ft1, t0 # convert to 32.0
fsub.s fa0, fa0, ft1 # fa0 = degrees_f - 32
fmul.s fa0, ft0, fa0 # fa0 = 0.555555 * fa0
ret
You can see we follow the convention with the argument coming, and the result being
returned, in fa0
. In this function we use both methods for getting
a value into float registers; one we load from memory and the other, being
an integer, we convert directly.
Conclusion
As I said before, it is rare for courses to even bother covering floating point instructions or assign any homework or projects that use them. Venus doesn’t even support floating point instructions. Hopefully this brief overview, combined with the knowledge of previous chapters is sufficient if you do need or want to work with floating point values.
There are also 2 example programs conversions.s and calc_pi.s for you to study.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Write a program to convert an input in minutes to hours.
-
Write the following functions and use them in a program.
1 2
float miles2kilometers(float miles); float pounds2kilograms(float pounds);
Chapter 7: Tips and Tricks
This chapter is a grab bag of things you can do to improve your RISC-V programs and make your life easier.
Formatting
You may have noticed I have a general format I like to follow when writing RISC-V (or any) assembly. The guidelines I use are the following
-
1 indent for all code excluding labels/macros/constants.
I use hard tabs set to a width of 4 but it really doesn’t matter as long as it’s just 1 indent according to your preferences.
-
Use spaces to align the first operand of all instructions out far enough.
Given my 4 space tabs, this means column 13+ (due to longer floating point instructions like fcvt.s.w, though I often stop at 10 or 11 when that’s sufficient). The reason to use spaces is to prevent the circumstances that gave hard tabs a bad name. When you use hard tabs for alignment, rather than indentation, and then someone else opens your code with their tab set to a different width, suddenly everything looks terrible. Thus, tabs for indentation, spaces for alignment. Or as is increasingly common (thanks Python), spaces for everything but I refuse to do that to the poor planet.[11]
-
A comma and a single space between operands.
The simulators don’t actually require the comma but since other assembly languages/assemblers do, you might as well get used to it. Besides I think it’s easier to read with the comma, though that might be me comparing it to passing arguments to a function.
-
Comment every line or group of closely related lines with the purpose.
This is often simply the equivalent C code. You can relax this a little as you get more experience.
-
Use a blank line to separate logically grouped lines of code.
While you can smash everything together vertically, I definitely wouldn’t recommend it, even less than I would in a higher level language.
-
Put the
.data
section at the top, similar to declaring globals in C.There are exceptions for this. When dealing with a larger program with lots of strings, it can be convenient to have multiple
.data
sections with the strings you’re using declared close to where you use them. The downside is you have to keep swapping back and forth between.text
and.data
.
Misc. General Tips
-
Try to use registers starting from 0 and working your way up.
It helps you keep track of where things are (esp. combined with the comments). This can fall apart when you discover you forgot something or need to modify the code later and it’s often not worth changing all the registers you’re already using so you can maintain that nice sequence. When that happens I’ll sometimes just pick the other end of sequence (ie
t6
,a7
, ors11
) since if it’s out of order I might as well make it obvious. -
Minimize your jumps, labels, and especially your level of nested loops.
This was already covered in the chapters on branching and loops but it bears repeating.
-
In your prologue save
ra
first, if necessary, then all s regs used starting ats0
.Then copy paste the whole thing to the bottom, move the first line to the bottom and change the number to positive and change all the
sw
tolw
.
func: addi sp, sp, -20 sw ra, 0(sp) sw s0, 4(sp) sw s1, 8(sp) sw s2, 12(sp) sw s3, 16(sp) # body of func here that calls another function or functions # and needs to preserve 4 values across at least one of those calls lw ra, 0(sp) lw s0, 4(sp) lw s1, 8(sp) lw s2, 12(sp) lw s3, 16(sp) addi sp, sp, 20
Constants
One of the easiest things you can do to make your programs more readable is to use defined constants in your programs. RARS has a way of defining constants similar to how C defines macro constants; ie they aren’t "constant variables" that take up space in memory, it’s as if a search+replace was done on them right before assembling the program.
Let’s look at our Hello World program using constants:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
.eqv sys_print_str 4
.eqv sys_exit 10
.data
hello: .asciz "Hello World!\n"
.text
main:
li a7, sys_print_str
la a0, hello # load address of string to print into a0
ecall
li a7, sys_exit
ecall
Macros
RARS supports function style macros that can shorten your code and improve readability in some cases (though I feel it can also make it worse or be a wash).
The syntax looks like this:
1
2
3
4
5
6
7
8
9
10
11
12
.macro macroname
instr1 a, b, c
instr2, b, d
# etc.
.end_macro
# or with parameters
.macro macroname(%arg1)
instr1 a, %arg1
instr2 c, d, e
# etc.
.end_macro
Some common examples are using them to print strings:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
.macro print_str_label(%x)
li a7, 4
la a0, %x
ecall
.end_macro
.macro print_str(%str)
.data
str: .asciz %str
.text
li a7, 4
la a0, str
ecall
.end_macro
.data
str1: .asciz "Hello 1\n"
.text
# in use:
print_str_label(str1)
print_str("Hello World\n")
...
You can see an example program in macros.s.
Switch-Case Statements
It is relatively common in programming to compare an integral type variable (ie basically any built-in type but float and double) against a bunch of different constants and do something different based on what it matches or if it matches none.
This could be done with a long if-else-if
chain, but the longer the chain the more
likely the programmer is to choose a switch-case
statement instead.
Here’s a pretty short/simple example in C:
1
2
3
4
5
6
7
8
9
10
printf("Enter your grade (capital): ");
int grade = getchar();
switch (grade) {
case 'A': puts("Excellent job!"); break;
case 'B': puts("Good job!"); break;
case 'C': puts("At least you passed?"); break;
case 'D': puts("Probably should have dropped it..."); break;
case 'F': puts("Did you even know you were signed up for the class?"); break;
default: puts("You entered an invalid grade!");
}
You could translate this to its eqivalent if-else
chain and handle it like we
cover in the chapter on branching. However, imagine if this switch statment had
a dozen cases, two dozen etc. The RISC-V code for that quickly becomes long and ugly.
So what if we implemented it in RISC-V the same way it is semantically in C?
The same way compilers often (but not necessarily) use? Well, before we do that,
what is a switch actually doing? It is jumping to a specific case label based
on the value in the specified variable. It then starts executing, falling through
any other labels, till it hits a break
which will jump to the end of the switch
block. If the value does not have its own case label, it will jump to the default
label.
Compilers handle it by creating what’s called a jump table, basically an array of label addresses, and using the variable to calculate an index in the table to use to jump to.
The C eqivalent of that would look like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#include <stdio.h>
// This compiles with gcc, uses non-standard extension
// https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
int main()
{
// jump table
void* switch_table[] =
{ &&a_label, &&b_label, &&c_label, &&d_label, &&default_label, &&f_label };
printf("Enter your grade (capital): ");
int grade = getchar();
grade -= 'A'; // shift to 0
if (grade < 0 || grade > 'F'-'A')
goto default_label;
goto *switch_table[grade];
a_label:
puts("Excellent job!");
goto end_switch;
b_label:
puts("Good job!");
goto end_switch;
c_label:
puts("At least you passed?");
goto end_switch;
d_label:
puts("Probably should have dropped it...");
goto end_switch;
f_label:
puts("Did you even know you were signed up for the class?");
goto end_switch;
default_label:
puts("You entered an invalid grade!");
end_switch:
return 0;
}
The &&
and goto *var
syntax are actually not standard C/C++ but are GNU
extensions that are supported in gcc (naturally) and clang, possibly others.[12]
First, notice how the size of the jump table is the value of the highest valued label
minus the lowest + 1. That’s why we subtract the lowest value to shift the range
to start at 0 for the indexing. Second, any values without labels within that range
are filled with the default_label
address. Third, there has to be an initial
check for values outside the range to jump to default otherwise you could get an
error from an invalid access outside of the array’s bounds.
The same program/code in RISC-V would look like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
.data
a_str: .asciz "Excellent job!\n"
b_str: .asciz "Good job!\n"
c_str: .asciz "At least you passed?\n"
d_str: .asciz "Probably should have dropped it...\n"
f_str: .asciz "Did you even know you were signed up for the class?\n"
invalid_str: .asciz "You entered an invalid grade!\n"
enter_grade: .asciz "Enter your grade (capital): "
switch_labels: .word a_label, b_label, c_label, d_label, default_label, f_label
.text
main:
li a7, 4
la a0, enter_grade
ecall
li a7, 12 # read char
ecall
li t2, 5 # f is at index 5
la t0, switch_labels
addi t1, a0, -65 # t1 = grade - 'A'
blt t1, x0, default_label # if (grade-'A' < 0) goto default
bgt t1, t2, default_label # if (grade-'A' > 5) goto default
slli t1, t1, 2 # offset *= 4 (sizeof(word))
add t0, t0, t1 # t0 = switch_labels + byte_offset = &switch_labels[grade-'A']
lw t0, 0(t0) # load address from jump table
jr t0 # jump to address
a_label:
la a0, a_str
j end_switch
b_label:
la a0, b_str
j end_switch
c_label:
la a0, c_str
j end_switch
d_label:
la a0, d_str
j end_switch
f_label:
la a0, f_str
j end_switch
default_label:
la a0, invalid_str
end_switch:
li a7, 4
ecall
li a7, 10 # exit
ecall
You can see we can use the pseudoinstruction jr
(jump register) to do the eqivalent
of the computed goto statement in C.
This example probably wasn’t worth making switch style, because the overhead and extra code of making the table and preparing to jump balanced out or even outweighed the savings of a branch instruction for every case. However, as the number of options increases, the favor tilts toward using a jump table like this as long as the range of values isn’t too sparse. If the range of values is in the 100’s or 1000’s but you only have cases for a dozen or so, then obviously it isn’t worth creating a table that large only to fill it almost entirely with the default label address.
To reiterate, remember it is not about the magnitude of the actual values you’re looking for, only the difference between the highest and lowest because high - low + 1 is the size of your table.
Command Line Arguments
Command line arguments, also known as program arguments, or command line parameters, are strings that are passed to the program on startup. In high level languages like C, they are accessed through the parameters to the main function (naturally):
1
2
3
4
5
6
7
8
9
10
11
12
#include <stdio.h>
int main(int argc, char** argv)
{
printf("There are %d command line arguments:\n", argc);
for (int i=0; i<argc; i++) {
printf("%s\n", argv[i]);
}
return 0;
}
As you can see, argc
contains the number of parameters and argv
is an array of
those arguments as C strings. If you run this program you’ll get something
like this:
$ ./args 3 random arguments
There are 4 command line arguments:
./args
3
random
arguments
Notice that the first argument is what you actually typed to invoke the program, so you always have at least one argument.
RISC-V works the same way. The number of arguments is in a0
and an array of strings
is in a1
when main starts. So the same program in RISC-V looks like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
.data
there_are: .asciz "There are "
arguments: .asciz " command line arguments:\n"
.text
main:
mv t0, a0 # save argc
li a7, 4
la a0, there_are
ecall
mv a0, t0
li a7, 1 # print int
ecall
li a7, 4
la a0, arguments
ecall
li t1, 0 # i = 0
j arg_loop_test
arg_loop:
li a7, 4
lw a0, 0(a1)
ecall
li a7, 11
li a0, 10 # '\n'
ecall
addi t1, t1, 1 # i++
addi a1, a1, 4 # argv++ ie a1 = &argv[i]
arg_loop_test:
blt t1, t0, arg_loop # while (i < argc)
li a7, 10
ecall
Unfortunately, RARS works slightly differently, probably because it’s more GUI focused. It does not pass the program/file name as the first argument, so you can actually get 0 arguments:
$ java -jar ~/rars_latest.jar args.s pa 3 random arguments
RARS 1.5 Copyright 2003-2019 Pete Sanderson and Kenneth Vollmar
There are 3 command line arguments:
3
random
arguments
Program terminated by calling exit
You can see that you have to pass "pa" (for "program arguments") to indicate that the following strings are arguments. In the GUI, there is an option in "Settings" called "Program arguments provided to progam" which if selected will add a text box above the Text Segment for you to enter in the arguments to be passed.
No Pseudoinstructions Allowed
One relatively common assignment requirement is forbidding pseudoinstructions, either all of them, or some subset of them. This forces us to explicitly write what those pseudoinstructions are translated into (or could be translated into since there are often several alternatives).
Pseudoinstruction | Example Use | Equivalence |
---|---|---|
Load Immediate |
li t0, 42 |
ori t0, x0, 42 # or addi t0, x0, 42 |
Move |
mv t0, t1 |
or t0, x0, t1 # or add t0, x0, t1 # or addi t0, t1, 0 |
Load Address |
la t0, label |
lui t0, 0x10010 ori t0, t0, byte_offset |
Branch Less Than or Equal |
ble t0, t1, label |
# test for < and = separately blt t0, t1, label beq t0, t1, label # or add 1 to change <= to < # use a spare reg if you need # to preserve the original value addi t3, t1, 1 blt t0, t3, label |
Branch Greater Than |
bgt t0, t1, label |
# flip the operands and use blt blt t1, t0, label |
Return |
ret |
# as mentioned in chapter 5 jalr x0, ra, 0 |
Jump Register |
jr t0 |
# as mentioned in chapter 5 jalr x0, t0, 0 |
You can see how you use the non-pseudoinstructions to match the same behavior, and there’s often (usually) more than one way.
Another thing I should comment on is the la
equivalence. The reason it is
a pseudoinstruction in the first place is that an address is 32 bits. That’s also
the size of a whole instruction. Clearly there’s no way to represent a whole
address and anything else at the same time. The lower right corner of the
greensheet has the actual formats of the 6 different types of instructions and
even the U and UJ formats still needs 12 bits for the opcode and destination.
This is why lui
(and auipc
) exists,
in order to facilitate getting a full address into a register by doing it in two
halves, 20 + 12. The lower 12 can be placed with addi
or ori
after the lui
.
Note
|
RARS actually uses auipc not lui probably because that’s what the RISC-V
spec says (also see top left on page 2 of the greensheet). They both can accomplish
the same thing but the former is needed for PC-relative addressing which whereas
lui is 0/absolute addressing. Since RARS works with absolute addressing
the easier-to-use lui is fine.
|
That begs the question, what actually goes in the upper half? Well, since
we’re dealing with addresses in the .data
section, the upper portion should
match the upper part of address of the .data
section. In RARS the
.data
section starts at 0x10010000
.
But what about the lower part of the address? This involves counting the bytes
from the top of .data
to the label you want. If all you have is words, halfs,
floats, doubles, or space (with a round number), that’s fairly easy, but the
second you have strings between the start and the label you want, it’s a bit
more painful. This is why I recommend putting any string declarations at the
bottom so at least any other globals will have nice even offsets. Also, if
you have a bunch of globals, it doesn’t hurt to count once and put the offsets
in comments above each label so you don’t forget. Of course, none of this matters
if you’re allowed to just use la
which is true the vast majority of the time.
Let’s look at a small example. We’ll convert the args.s from above (reproduced here for convenience) to bare mode:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
.data
there_are: .asciz "There are "
arguments: .asciz " command line arguments:\n"
.text
main:
mv t0, a0 # save argc
li a7, 4
la a0, there_are
ecall
mv a0, t0
li a7, 1 # print int
ecall
li a7, 4
la a0, arguments
ecall
li t1, 0 # i = 0
j arg_loop_test
arg_loop:
li a7, 4
lw a0, 0(a1)
ecall
li a7, 11
li a0, 10 # '\n'
ecall
addi t1, t1, 1 # i++
addi a1, a1, 4 # argv++ ie a1 = &argv[i]
arg_loop_test:
blt t1, t0, arg_loop # while (i < argc)
li a7, 10
ecall
So we need to change the mv
, the li
's, the j
, and the la
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
# RARS .data (like MARS) always starts at 0x10010000, whether pseudoinstructions
# are on or not.
.data
there_are: .asciz "There are "
arguments: .asciz " command line arguments:\n"
.text
.globl main
main:
or t0, x0, a0 # save argc
ori a7, x0, 4
#la a0, there_are
lui a0, 0x10010 # there_are is at beginning of data so just lui, lower is 0
ecall
or a0, x0, t0
ori a7, x0, 1 # print int
ecall
ori a7, x0, 4
lui a0, 0x10010
ori a0, a0, 11 # 11 is length in bytes of "There are " 10 chars + '\0'
#la a0, arguments
ecall
ori t1, x0, 0 # i = 0
#j arg_loop_test
jal x0, arg_loop_test
arg_loop:
ori a7, x0, 4 # print string for argv[i]
lw a0, 0(a1)
ecall
ori a7, x0, 11
ori a0, x0, 10 # '\n'
ecall
addi t1, t1, 1 # i++
addi a1, a1, 4 # argv++ ie a1 = &argv[i]
arg_loop_test:
blt t1, t0, arg_loop # while (i < argc)
ori a7, x0, 10
ecall
Following the table, you can see the mv
became or
, the li
became
ori
, the j
became jal
with x0
as the destination register,
and lastly the la
became lui
plus an ori
if necessary for the byte offset.
Exercises
You can support the book and purchase the chapter exercise solutions here.
-
Convert the exercises from chapter 5 to run with no pseudoinstructions (
java -jar ~/rars_latest.jar np file.s
on the command line). -
Convert the following C code to RISC-V using a jump table (Note in C/C++ enum values start at 0 and go up by one unless the user manually assigns a value, in which case it continues counting up from there).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
enum { STATE0, STATE1, STATE2, STATE3, STATE14 = 14, STATE42 = 42, STATE43, STATE44 }; int main() { int num; do { printf("Enter a number between 0 and 50: "); scanf("%d", &num); } while (num < 0 || num > 50); switch (num) { case STATE0: puts("Zilch"); break; case STATE1: puts("Uno"); break; case STATE2: puts("Dos"); break; case STATE3: puts("Tres"); break; case STATE14: puts("Catorce"); case STATE42: puts("The answer to life, the universe, and everything."); case STATE43: puts("Off by one"); case STATE44: puts("4 * 11?"); break; } puts("Thanks for playing!"); return 0; }
Appendix A: Venus
Venus is the simulator that is used by UC Berkeley’s CS 61C course. It’s nowhere near as nice nor as full featured as RARS and currently seems poorly documented and maintained.
UC Berkeley is the designer of RISC-V and so was one of the first colleges to start teaching RISC-V, years before RARS existed. Given that they’ve built up a curriculum around Venus and invested at least a little in their own fork it makes sense that they’d be hesitant to switch.
In any case, this appendix will go over the most important differences between Venus and RARS.
Versions and History
Unlike RARS which seems to have one primary and fairly well maintained stable version, and a few even more updated forks, Venus is less clear. As far as I can tell it was started by kvakil 5 years ago, was forked by ThaumicMekanism, and finally was forked by Berkeley for their cs61c course. While the latter has the most recent commit none of them are what I would call well maintained, with the former 2 seeming to stall in 2018 and 2020 respectively. Venus (all versions) has a web based interface but I don’t know why anyone would prefer that to the standalone JAR version.
As of the end of 2022 you can get Venus jar files from ThaumicMekanism’s github here or from Berkeley’s course website here.
I don’t know what the minor differences are between them but you should use whatever is provided by the course if you’re taking it.
Data section
The documentation is sorely lacking and out of date but the most relevant
differences from RARS are that it uses .asciiz
instead of .asciz
and
while .float
and .double
are reserved, Venus doesn’t actually support
floating point instructions so you can’t actually do anything with them.
Environment Calls
Venus only supports a handful of ecalls and they work differently than they do in RARS. Again, the only documentation I can find is ThaumicMekanism’s.[15]
There is one that I know of that is not listed in that table. Ecall 5 is basically
the C function atoi which converts a string to an integer. It takes the address
of the string in a1
and returns the number in a0
. If there is any leading or trailing
whitespace or the string isn’t a valid number it returns 0.
One thing you might have noticed is that Venus doesn’t support any of the input ecalls. This is extremely limiting since you can’t write any interactive programs and to test a function you have to actually change the program (usually a variable in the data section) and re-run it. The only way to even get variable program behavior is to use the file input capability to read data and do something different based on that input.
But aside from the lack of some basic ecalls, there’s also a behavioral difference in
how they work. In RARS, you use a7
to select the ecall leaving the lower a
registers for arguments which matches how you would pass arguments to a function.
Venus uses a0
to select the ecall which means your argument(s) start in a1
.
This is probably why they provide wrapper functions for the ecalls to students
for their projects, though it took until recently to get them right.
So, what does this look like in practice? Let’s take the simple command line argument program from chapter 7 and convert it for Venus. So the following program:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
.data
there_are: .asciz "There are "
arguments: .asciz " command line arguments:\n"
.text
main:
mv t0, a0 # save argc
li a7, 4
la a0, there_are
ecall
mv a0, t0
li a7, 1 # print int
ecall
li a7, 4
la a0, arguments
ecall
li t1, 0 # i = 0
j arg_loop_test
arg_loop:
li a7, 4
lw a0, 0(a1)
ecall
li a7, 11
li a0, 10 # '\n'
ecall
addi t1, t1, 1 # i++
addi a1, a1, 4 # argv++ ie a1 = &argv[i]
arg_loop_test:
blt t1, t0, arg_loop # while (i < argc)
li a7, 10
ecall
becomes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
.data
there_are: .asciiz "There are "
arguments: .asciiz " command line arguments:\n"
.text
main:
mv t0, a0 # save argc
mv t2, a1 # save argv
li a0, 4 # print str
la a1, there_are
ecall
mv a1, t0
li a0, 1 # print int
ecall
li a0, 4
la a1, arguments
ecall
li t1, 0 # i = 0
j arg_loop_test
arg_loop:
li a0, 4
lw a1, 0(t2)
ecall
li a0, 11
li a1, 10 # '\n'
ecall
addi t1, t1, 1 # i++
addi t2, t2, 4 # argv++ ie t2 = &argv[i]
arg_loop_test:
blt t1, t0, arg_loop # while (i < argc)
li a0, 10
ecall
Note that we had to save argv beacuse a1
is now used in our ecalls. In addition
the behavior is different than RARS:
$ java -jar ~/venus-jvm-latest.jar args_venus.s
There are 0 command line arguments:
$ java -jar ~/venus-jvm-latest.jar args_venus.s hello
There are 2 command line arguments:
args_venus.s
hello
RARS may have decided not to include the program name/"executable" in its arguments
but at least it is consistent in its behavior. In addition, the section
on passing arguments in CS 61C’s documentation is inaccurate as of December 2022. Just
like with C and RARS, argc comes first in a0
then argv in a1
, not
the other way around as described.[16]
Constants
Another minor difference from RARS is how you declare constants. In Venus you use the same syntax as C, so the constants example from chapter 7:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
.eqv sys_print_str 4
.eqv sys_exit 10
.data
hello: .asciz "Hello World!\n"
.text
main:
li a7, sys_print_str
la a0, hello # load address of string to print into a0
ecall
li a7, sys_exit
ecall
becomes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
#define sys_print_str 4
#define sys_exit 10
.data
hello: .asciiz "Hello World!\n"
.text
main:
li a0, sys_print_str
la a1, hello # load address of string to print into a1
ecall
li a0, sys_exit
ecall
Conclusion
While most of this book applies equally well to both programs (other than chapter 6 of course) and to RISC-V in general, this appendix should clear up any difficulties you might have run into trying to convert the examples to run in Venus. Hopefully Berkeley will decide to switch to RARS eventually.
References and Useful Links
Supporters
Corporate
supporter3 |
Platinum
supporter4 |
supporter5 |
|
supporter7 |
|
|
Gold
supporter9 |
supporter10 |
|
supporter11 |
supporter12 |
supporter13 |
Silver
supporter14 |
supporter15 |
supporter16 |
supporter17 |
supporter18 |
supporter19 |
supporter20 |
|
|
Bronze
supporter23 |
||
supporter24 |
supporter25 |
supporter26 |
supporter27 |
supporter28 |
supporter29 |
supporter30 |
|
|