An assembly language is a low-level programming language that is specific to a given CPU's instruction set, to which it corresponds in a one-to-one relationship. For instance, the Zilog Z80 assembly language is different from the Intel 8080 assembly language, despite the similarities in the underlying instruction sets. Each assembly instruction maps directly to a machine language instruction and vice versa. This also means that machine language instructions can easily be converted ("disassembled") to assembly language, for example for debugging and reverse engineering. In other words, assembly language is "just" a human-comprehensible way of displaying processor instructions.
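The one-to-one mapping can be illustrated with a small lookup table. The following is a minimal Python sketch, not a real assembler: it covers only five single-byte Z80 instructions, and the names `OPCODES`, `MNEMONICS`, `assemble` and `disassemble` are invented for this illustration.

```python
# Toy one-to-one table for a few single-byte Zilog Z80 instructions.
# Only these opcodes are covered; a real tool handles the full instruction
# set, including multi-byte instructions and operands.
OPCODES = {
    "NOP": 0x00,
    "INC B": 0x04,
    "INC C": 0x0C,
    "HALT": 0x76,
    "RET": 0xC9,
}

# Because the mapping is one-to-one, it can simply be inverted.
MNEMONICS = {code: text for text, code in OPCODES.items()}


def assemble(lines):
    """Translate a list of mnemonic strings into machine-code bytes."""
    return bytes(OPCODES[line.strip().upper()] for line in lines)


def disassemble(code):
    """Translate machine-code bytes back into mnemonic strings."""
    return [MNEMONICS[byte] for byte in code]


program = ["INC B", "INC C", "RET"]
machine_code = assemble(program)
print(machine_code.hex())          # 040cc9
print(disassemble(machine_code))   # ['INC B', 'INC C', 'RET']
```

Disassembling the bytes returns exactly the mnemonics that were assembled, which is what makes debugging and reverse engineering from machine code practical.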
A program called an assembler converts source code written in assembly language into the CPU's machine code. Each CPU instruction is assigned a short mnemonic code (traditionally 3 or 4 characters in length). Additional syntax supports registers, addressing modes, labels and other symbolic names, comments, and directives. Assembly languages supporting macro processing (substitution, etc.) used to be called macro assembly languages, but as most assembly languages now support macros, the "macro" prefix has been dropped from common usage.
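How an assembler handles labels and comments can be sketched with a toy two-pass design. The Python below is an illustration under stated assumptions: it supports only four Z80 instructions (`NOP`, `HALT`, `INC B`/`INC C` and `JP` to a label), uses `;` for comments, and the function names `parse` and `assemble` are made up for this example.

```python
# Toy two-pass assembler for a tiny Z80 subset (illustrative only).
SIZES = {"NOP": 1, "INC": 1, "HALT": 1, "JP": 3}   # instruction lengths in bytes
INC_REG = {"B": 0x04, "C": 0x0C}                   # opcodes for INC B and INC C


def parse(source):
    """Yield (label, mnemonic, operand) for each non-empty source line."""
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments
        if not line:
            continue
        label = None
        if ":" in line:
            label, line = (part.strip() for part in line.split(":", 1))
        fields = line.split()
        mnemonic = fields[0].upper() if fields else None
        operand = fields[1] if len(fields) > 1 else None
        yield label, mnemonic, operand


def assemble(source, origin=0):
    """Assemble source text into bytes, loading at address `origin`."""
    statements = list(parse(source))

    # Pass 1: assign an address to every label.
    symbols, address = {}, origin
    for label, mnemonic, _ in statements:
        if label:
            symbols[label] = address
        if mnemonic:
            address += SIZES[mnemonic]

    # Pass 2: emit bytes, replacing label names with their addresses.
    out = bytearray()
    for _, mnemonic, operand in statements:
        if mnemonic == "NOP":
            out.append(0x00)
        elif mnemonic == "HALT":
            out.append(0x76)
        elif mnemonic == "INC":
            out.append(INC_REG[operand.upper()])
        elif mnemonic == "JP":
            target = symbols[operand]
            # JP nn is 0xC3 followed by the 16-bit address, low byte first.
            out += bytes([0xC3, target & 0xFF, target >> 8])
    return bytes(out)


SOURCE = """
start:  NOP          ; do nothing
loop:   INC C        ; count up
        JP loop      ; jump back to the label
"""
print(assemble(SOURCE).hex())   # 000cc30100
```

Two passes are used so that a jump may refer to a label defined later in the source; a production assembler would also report unknown mnemonics and undefined labels rather than raising a raw `KeyError`.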
The instruction mnemonics are typically defined by the manufacturer of the CPU. Although anyone could assign their own arbitrary set of mnemonics for a CPU's instruction set, in practice this is seldom done.
Assembly languages do not provide many of the useful abstractions of high-level languages, such as memory management, object and other complex data structure support, or string manipulation. Such features are often available through libraries of assembly code, though.
Assembly language is essential when a new CPU is developed, since it allows the development of higher-level language compilers for that CPU. Assembly is also required for access to unique and low-level features of a CPU, which is why portions of most operating systems must be written using assembly language. Because assembly provides no abstractions, code written with it can run faster than equivalent code written in a high-level programming language. However, modern optimizing compilers are often better at generating efficient machine code than human-crafted assembly. Finally, writing assembly code is more tedious (and therefore more error-prone) than using a high-level language.
Macros provide shorthand for assembly programmers. The most common macro feature is the substitution macro, which allows the programmer to define a new mnemonic that stands for several preexisting mnemonics. When the macro is used, the assembler substitutes the corresponding mnemonics in place of the macro, as if the programmer had written those mnemonics at that point in the source code. Another form of macro is the iteration macro, which allows the programmer to repeat something without having to duplicate it manually numerous times. Thus macros can reduce the size and complexity of the assembler source code.
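Both kinds of macro can be sketched as a text-expansion step that runs before assembly. The Python below is a simplified illustration: the macro name `PUSHALL` and the `REPT`/`ENDR` keywords are chosen for this sketch rather than taken from any particular assembler, and nested repeat blocks are not supported.

```python
# Hypothetical substitution macro: one invented mnemonic that stands for
# several real Z80 mnemonics.
MACROS = {
    "PUSHALL": ["PUSH AF", "PUSH BC", "PUSH DE", "PUSH HL"],
}


def expand(lines):
    """Expand substitution macros and REPT/ENDR iteration blocks.

    Assumes every REPT has a matching ENDR and that blocks are not nested.
    """
    out = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line.upper().startswith("REPT "):
            count = int(line.split()[1])
            end = i + 1
            while lines[end].strip().upper() != "ENDR":
                end += 1
            body = expand(lines[i + 1:end])   # the body may itself use macros
            out.extend(body * count)          # iteration macro: repeat the body
            i = end + 1
        elif line in MACROS:
            out.extend(MACROS[line])          # substitution macro
            i += 1
        else:
            out.append(line)                  # ordinary instruction, kept as-is
            i += 1
    return out


source = ["PUSHALL", "REPT 3", "INC C", "ENDR", "RET"]
for line in expand(source):
    print(line)
```

Five source lines expand to eight instructions here, which is the sense in which macros shrink the source code without changing the resulting program.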
Without an assembler, a person who wanted to increment the Zilog Z80 "C" register would have to figure out the numeric code (in this case 0C hexadecimal or 00001100 binary; the code 04 hexadecimal increments the "B" register instead). Assembly allows this operation to be specified with the mnemonic:
INC C
which is much easier to remember. Modern CPUs have even more complicated instruction sets, with instructions whose encodings can be four or more times as long as this one-byte instruction. For this reason, some refer to machine language as a first-generation language, and assembly as a second-generation language. High-level programming languages are called third-generation languages.
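The numeric code for an instruction such as INC C is assembled from bit fields, which is part of why working it out by hand is error-prone. As a small Python sketch (the names `REGISTER_CODES` and `encode_inc` are invented for illustration), the Z80's 8-bit register increment uses the bit pattern 00 rrr 100, where rrr is a 3-bit register code:

```python
# Z80 register codes used in the 8-bit "INC r" encoding 00 rrr 100.
# Code 6 selects the memory operand (HL) and is left out of this sketch.
REGISTER_CODES = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "A": 7}


def encode_inc(register):
    """Return the one-byte opcode for INC <register>."""
    return (REGISTER_CODES[register.upper()] << 3) | 0b100


opcode = encode_inc("C")
print(f"{opcode:02X} {opcode:08b}")   # 0C 00001100
```

The same function gives 04 hexadecimal for register B, so a one-bit slip in the register field silently produces a different, still valid instruction.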
The following is a sample program written in assembly for an ARM CPU running the Linux kernel with ARM EABI system call conventions. The program prints "hello world" to standard output.
# Arguments are passed in registers r0-6, the system call number in r7
# Write msglen bytes starting from address msg to standard output (file descriptor 1)
    mov r0, #1
    adr r1, msg
    mov r2, #msglen
    mov r7, #4
    swi #0
# Exit with status 0
    mov r0, #0
    mov r7, #1
    swi #0
# Assembler directives that define the data
msg:
    .ascii "hello world\n"
msglen = . - msg