Assembly is the programming language closest to a CPU. ‘Closest’ here means that the mnemonics of the language translate essentially 1:1 into the machine instructions the CPU executes.
With other (higher-level) languages, an interpreter or compiler unpacks the source code and usually generates a significantly larger number of instructions when the code is executed on a real CPU.
With assembly, you tell the processor exactly what you want done, at register level.
A CPU operates on basically 2 kinds of thingies:
- registers
- memory locations
Memory is like a large collection of storage lockers. When you write to memory, a numeric value stays in that location until it is either overwritten (written again with another value) or the computer is turned off.
Reading from memory simply means copying the value at a memory location into a CPU register.
Latency is the time between initiating a memory read (sometimes also called a “fetch”) and the moment the value is usable in the register. Physically, electrical signals have to travel through the interconnects between the memory and the CPU, and that takes time.
Logically, a memory access is handled by two layers: a cache and the memory controller. A cache is a map (think of a hash) that can hold a limited number of values recently fetched from, or written to, actual RAM. It speeds up CPU operations by a significant factor.
Think of the cache as a good colleague: instead of doing the research yourself, Googling around and reading dozens of pages, you could ask your colleague and have an answer in 2 seconds. You’d save a ton of time. This only works if your colleague has the particular piece of information you’re looking for. And naturally our brains are limited. So are cache sizes.
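The colleague analogy can be played out in a few lines of Python. This is only a toy sketch: the class name `TinyCache`, the capacity of 2 entries and the least-recently-used eviction rule are assumptions made for illustration, and real CPU caches work on whole cache lines with hardware-specific replacement policies.

```python
from collections import OrderedDict

class TinyCache:
    """A toy cache sitting in front of a slow 'RAM' list (illustrative only)."""

    def __init__(self, ram, capacity):
        self.ram = ram
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # The "colleague" knows the answer: fast path.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Slow path: go to RAM, then remember the value for next time.
        self.misses += 1
        value = self.ram[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # forget the least recently used entry
        return value

ram = list(range(100, 116))  # 16 memory locations holding 100..115
cache = TinyCache(ram, capacity=2)
values = [cache.read(a) for a in (3, 3, 4, 3, 5, 4)]
print(values, cache.hits, cache.misses)
```

With this access pattern only the repeated reads of address 3 are hits. Address 4 has already been pushed out by the time it is read again, which is the "limited brain" part of the analogy.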
Whereas higher-level languages achieve a lot by abstracting the ingredients into a function (subroutine) name, assembly is about the elements of computation: manipulating bits and registers (8, 16, 32, or 64 bits at a time), which are essentially among the smallest storage units of digital computers.
The things we usually learn as programmers are a bit more “sophisticated”, and rightly so! It’s good to work on a level we’re comfortable with.
All languages, deep down “enough”, end up as machine code, the binary form of assembly. For example, Java is compiled to Java bytecode, which is then run in a virtual machine called the JVM. The JVM, however, eventually has to execute plain machine instructions. The same goes for all other languages.
A (high-level) programming language can be either:
- interpreted, or
- compiled
The difference between the two is mainly about when the conversion to machine language happens: “early on”, before the program is run (compiled languages), or during execution (interpreted languages). Python is interpreted, while C is a compiled language. A C compiler produces traditional executable files, whereas Python source code is run by passing the file to the Python interpreter.
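To get a feel for the intermediate step, Python's standard `dis` module can list the bytecode instructions that the CPython interpreter executes for a function. This is a small sketch: the function `add` is made up for the example, and the exact instruction names differ between Python versions, so no particular output is shown.

```python
import dis

def add(a, b):
    return a + b

# Collect the names of the bytecode instructions CPython generated for add().
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Even this one-line function becomes several bytecode instructions, and the interpreter in turn spends many machine instructions on each of them.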
Assembly is a good language for really understanding what the computing hardware actually does. Practically all modern computers are described by the von Neumann architectural model:
A computer can simply load binary values into registers; do comparisons and the usual suspects like addition, subtraction, multiplication, and division; store a value back to RAM (memory); and jump around between various parts of the code (the ‘IF… THEN’ equivalent).
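The load, compute, store and jump cycle described above can be sketched as a made-up machine in Python. Everything here is hypothetical: the instruction names (`LOAD`, `STORE`, `ADD`, `DEC`, `JNZ`, `HALT`), the two registers `A` and `B`, and the memory layout are invented for the example and do not correspond to any real CPU. The one thing it does share with the von Neumann model is that the program and its data live in the same memory.

```python
def run(memory):
    """Fetch-decode-execute loop for a made-up two-register machine."""
    regs = {"A": 0, "B": 0}
    pc = 0  # program counter: address of the next instruction
    while True:
        op, *args = memory[pc]  # fetch and decode
        pc += 1
        if op == "LOAD":        # LOAD reg, addr: register <- memory[addr]
            regs[args[0]] = memory[args[1]]
        elif op == "STORE":     # STORE reg, addr: memory[addr] <- register
            memory[args[1]] = regs[args[0]]
        elif op == "ADD":       # ADD dst, src: dst <- dst + src
            regs[args[0]] += regs[args[1]]
        elif op == "DEC":       # DEC reg: reg <- reg - 1
            regs[args[0]] -= 1
        elif op == "JNZ":       # JNZ reg, target: jump if reg is not zero
            if regs[args[0]] != 0:
                pc = args[1]
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction: {op}")

# Addresses 0..6 hold the program, 7 and 8 hold data.
memory = [
    ("LOAD", "B", 7),   # 0: B <- counter
    ("LOAD", "A", 8),   # 1: A <- running total
    ("ADD", "A", "B"),  # 2: A <- A + B
    ("DEC", "B"),       # 3: B <- B - 1
    ("JNZ", "B", 2),    # 4: loop back while B != 0
    ("STORE", "A", 8),  # 5: write the total back to memory
    ("HALT",),          # 6: stop
    4,                  # 7: data: counter start value
    0,                  # 8: data: result goes here
]
result = run(memory)
print(result[8])  # 4 + 3 + 2 + 1 = 10
```

The `JNZ` line is the ‘IF… THEN’ equivalent: a comparison followed by a conditional change of the program counter.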
The first, basic level of proficiency in assembly is attained by learning the opcodes: which commands are available. In reality, on many processors even these instructions are internally broken down further and executed as sequences of microcode.
Think of registers as just 8-, 16-, 32- or 64-bit variables. They are built from real gates, physical constructs in the CPU, so they “always exist”, fixed. Their values can be altered: you can load a literal number, or the contents of a memory location. There are commands to
- zero a register (make it 0)
- add two registers (arithmetically sum)
- subtract a register’s value from another register
- divide a number in a register by another register
- compare the values of registers (and take action: a jump = branch)
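The operations in the list above can be mimicked in Python by treating registers as 16-bit variables. This is a simplified sketch, not x86 semantics: the helper names (`load`, `zero`, `add`, `sub`, `div`, `equal`) are made up, only two registers are modelled, and the real x86 `DIV` instruction uses fixed registers rather than an arbitrary pair.

```python
MASK = 0xFFFF  # keep every result within 16 bits, like a 16-bit register

regs = {"ax": 0, "bx": 0}

def load(reg, value):
    regs[reg] = value & MASK

def zero(reg):
    regs[reg] ^= regs[reg]  # XOR with itself always gives 0

def add(dst, src):
    regs[dst] = (regs[dst] + regs[src]) & MASK

def sub(dst, src):
    regs[dst] = (regs[dst] - regs[src]) & MASK

def div(dst, src):
    regs[dst] = regs[dst] // regs[src]  # integer division, remainder dropped

def equal(a, b):
    return regs[a] == regs[b]

load("ax", 65535); load("bx", 1)
add("ax", "bx")
wrapped = regs["ax"]    # 65535 + 1 does not fit in 16 bits and wraps to 0

load("ax", 3); load("bx", 5)
sub("ax", "bx")
borrowed = regs["ax"]   # 3 - 5 wraps around to 65534

load("ax", 100); load("bx", 7)
div("ax", "bx")
quotient = regs["ax"]   # 100 // 7 = 14

zero("bx")
load("ax", 0)
branch_taken = equal("ax", "bx")  # a compare like this would drive a jump
print(wrapped, borrowed, quotient, branch_taken)
```

The wrap-around is the main difference from ordinary Python integers: a register has a fixed number of bits, so results that do not fit are silently truncated.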
I did a lot of Intel x86 assembly programming as a teen.
Is assembly really that hard?
Why does assembly have a hard-to-grasp reputation? It’s probably because of the very terse and “weird” vocabulary. Also, compared to other languages, there’s so much seemingly “nonsensical” stuff in assembly: why the heck do you “move the value 64778 to register this-or-that”? It doesn’t seem to make any sense at all!
When you’ve learned to program in assembly, it all makes sense. But I have to admit that looking at some of the code now, in the year 2019 – that’s some 25 years later – I don’t recollect all the details anymore.
Let’s look at an image decompression program. It’s a complete program that shows a RIX image on-screen. RIX is a format which is now almost extinct. It used to be quite popular in the wild, although it is a very simple format. Because of that simplicity, .RIX was also a perfect training target for writing a program that can interpret it.
KOODI SEGMENT PARA 'CODE'
ASSUME CS:KOODI, DS:TIETO
Set_DTA proc near
AllocMem proc near
DeAllocMem proc near
;; Find first file, matching the search mask string defined
;; in memory area pointed to by "maski"
FindFirst proc near
;; After we have called once the FindFirst proc,
;; continue giving next results using the same search mask string
FindNext proc near
LoadRIX proc near
SwitchPic proc near
push ds es
mov al,byte ptr [es:si]
push ds ax
mov byte ptr [si-030ah],al
inc word ptr [w1]
pop es ds
ClearBuf proc near
mov word ptr [es:si],ax
PROSED PROC FAR
PROSED ENDP
KOODI ENDS
TIETO SEGMENT PARA 'DATA'
w1 dw 0
w2 dw 0
maski db '*.rix',0
nomem db 'Not enough free memory (64K) to run program!$'
norix db 'No .RIX files found in current directory!$'
alseg dw 0
kahva dw 0
new_dta db 30 dup(0)
dta_name db 13 dup(0)
TIETO