###########################################################################
# Assembly Notes                                                          #
# Anuradha Weeraman, 03 JUNE 2004                                         #
# Adapted from "Gavin's Guide to 80x86 Assembly"                          #
# $Id: assembly.txt,v 1.1 2004/06/02 21:17:53 anuradha Exp $              #
###########################################################################

the 80x86 family started in 1978 with the 8086.

1 Nibble = 4 Bits
1 Byte   = 2 Nibbles = 8 Bits
1 Word   = 2 Bytes   = 4 Nibbles = 16 Bits

three sizes of registers :
    8 bit, 16 bit, 32 bit (on 386 and above)

four types of registers :
    general purpose registers
    segment registers
    index registers
    stack registers

general purpose registers (16 bit registers) :
    AX, BX, CX, DX

    each is split into two 8-bit registers. eg AX is split into AH, which
    contains the high byte, and AL, which contains the low byte.

    on 386 and above there are also 32-bit registers called :
    EAX, EBX, ECX, EDX

    you can use AL, AH, AX and EAX as separate registers for some tasks,
    but they overlap: AL and AH are the two halves of AX, and AX is the
    low half of EAX, so writing to one changes the others.

    SI, DI, SP and BP can also be used as general purpose registers but
    they have more specific tasks. they are not split into two halves.

index registers
    aka pointer registers. 16-bit. mainly used for string instructions.
    SI (source index), DI (destination index) and IP (instruction pointer)

    on 386 and above there are also 32-bit index registers : ESI and EDI

    you can also use BX to index strings

    IP can't be read or written directly (eg with mov) because it holds
    the address of the next instruction. it only changes through
    instructions such as jmp, call and ret.

stack registers
    BP and SP

a 16-bit register can only address 64KB, but the 8086/8088 has a 20-bit
address bus (1MB). rather than treating two 16-bit registers as a single
32-bit address, which the designers thought was too much for anyone, memory
is addressed using 'segments' and 'offsets' :

    ADDRESS = SEGMENT * 16 + OFFSET
    SEGMENT = (start address of the segment) / 16
              (the lower four bits are always zero, so they are not stored)

one register contains the segment and the other contains the offset.
combined this way, the two registers give a 20-bit address.
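as a worked example of how a segment and an offset combine into a 20-bit address, here is a minimal sketch. it assumes a MASM/TASM-compatible assembler, the simplified directives (.model, .stack, .code) and a DOS .exe target, none of which come from these notes. the label 'start' and the values B800:0010 are arbitrary.

```asm
; sketch only: computes the 20-bit address for B800:0010
; assumes a MASM/TASM-compatible assembler and a DOS .exe target
        .model small
        .stack 100h
        .code
start:
        mov ax,0B800h       ; segment value (arbitrary example)
        mov bx,0010h        ; offset value (arbitrary example)

        mov dx,ax
        mov cl,12
        shr dx,cl           ; DX = top 4 bits of segment*16    (000Bh)
        mov cl,4
        shl ax,cl           ; AX = low 16 bits of segment*16   (8000h)
        add ax,bx           ; add the offset                   (8010h)
        adc dx,0            ; carry any overflow into the high word

        ; DX and AX now hold the address as one 32-bit number,
        ; high word 000Bh and low word 8010h, i.e. B8010h

        mov ax,4C00h        ; DOS function 4Ch: exit, return code 0
        int 21h
        end start
```

the same address can be reached through many segment:offset pairs. B801:0000 also gives B8010h.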
eg. with DS holding the segment and SI holding the offset, the address is
written DS:SI (SEGMENT:OFFSET).

segment registers
    CS, DS, ES, SS
    on 386+, FS, GS

offset registers
    BX, DI, SI, BP, SP, IP.
    on 386+ protected mode, any general register (not a segment register,
    or IP) can be used as an offset register.

the PC stack is LIFO (Last In First Out).

INSTRUCTIONS

there are a lot of instructions in assembly but only about 20 are used a lot

to put data into a register :

    intel syntax (destination first) :
        mov ax,10
        mov bx,20
        mov cx,30
        mov dx,40

    at&t syntax, as used by the GNU assembler (source first, '$' marks an
    immediate value and '%' marks a register) :
        mov $10,%ax
        mov $20,%bx
        mov $30,%cx
        mov $40,%dx

these notes will cover mostly the intel syntax.

'push' puts data onto the stack and 'pop' takes the most recently pushed
item off it.

    push cx
    push ax
    pop cx
    pop ax

this swaps the two registers cx and ax: because the stack is LIFO, the
first pop returns the old value of ax and the second returns the old value
of cx. this can also be done using the xchg instruction :

    xchg ax,cx

three types of operands :

    immediate   eg. '10', 'Y'
    register    any general purpose or index register
    memory      a variable which is stored in memory

    mov ax,10
    mov bx,cx
    mov dx,Number

'int' calls an OS or BIOS function: a subroutine that does something we
would rather not write ourselves.

    int 21h     ; Calls a DOS service
    int 10h     ; Calls the Video BIOS interrupt

most interrupts provide more than one function, so you have to pass the
number of the function you want. this is usually put in AH.

to print a message on the screen :

    mov ah,9    ; subroutine number 9
    int 21h     ; call the interrupt

but in order for this to work, the text to be printed has to be specified
first :

    mov dx,OFFSET Message   ; DX contains offset of message
    mov ax,SEG Message      ; AX contains segment of message
    mov ds,ax               ; DS:DX points to message
    mov ah,9                ; function 9 - display string
    int 21h                 ; call dos service

the words OFFSET and SEG tell the assembler to put the offset and the
segment of the message into the register rather than its contents.
a segment register cannot be loaded with an immediate value, which is why
the segment goes through AX first.

in the data segment :

    Message DB "Hello World!$"

the string has to be terminated by a '$'.
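putting the pieces above together, a complete program might look like the sketch below. the simplified directives (.model, .stack, .data, .code), the '@data' symbol, the 'start' label and the exit call (int 21h, function 4Ch) are assumptions taken from common MASM/TASM practice, not from these notes. '@data' is used here in place of 'SEG Message'.

```asm
; sketch only: prints a '$'-terminated string with DOS function 9
; assumes a MASM/TASM-compatible assembler and a DOS .exe target
        .model small
        .stack 100h

        .data
Message db "Hello World!$"      ; '$' marks the end of the string

        .code
start:
        mov ax,@data            ; segment of the data (assumed idiom)
        mov ds,ax               ; DS cannot be loaded directly
        mov dx,OFFSET Message   ; DS:DX points to message
        mov ah,9                ; function 9 - display string
        int 21h                 ; call dos service

        mov ax,4C00h            ; function 4Ch - exit, return code 0
        int 21h
        end start
```

the order matters: AH is set last because 'mov ax,@data' overwrites both AH and AL.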
DB is short for 'define byte' and the message is an array of bytes.

    bytes (DB)
    words (DW)
    double words (DD) - requires a 32-bit register such as EAX to fit it in

    Number1 db ?
    Number2 dw ?    ; ? means uninitialized

or, with initial values :

    Number1 db 0
    Number2 dw 1

if you declare the variable as a word, you cannot move it to an 8-bit
register, and if you declare it as a byte you cannot move it to a 16-bit
register. you can only put bytes into 8-bit registers and words into
16-bit registers.

'xor' performs a boolean XOR. it is commonly used to clear a register,
eg. 'xor ax,ax' sets ax to 0.

'jmp label' changes flow by jumping to the section of code labeled "label:"

    jmp ALabel
    ......
    ......
    ALabel:

to compare and change flow control, use the 'cmp' instruction :

    cmp ax,3
    je label    ; jump to label if equal

jump on condition instructions (after 'cmp first,second'; the above/below
forms are for unsigned values) :

    ja      : first > second
    jae     : first >= second
    jb      : first < second
    jbe     : first <= second
    jna     : not above          (same as jbe)
    jnae    : not above or equal (same as jb)
    jnb     : not below          (same as jae)
    jnbe    : not below or equal (same as ja)
    jz, je  : jumps if equal
    jnz,jne : jumps if not equal
    jc      : jumps if carry flag is set

on the 8086 a conditional jump is a short jump: the target has to be
within -128 to +127 bytes of the instruction that follows the jump. for
anything further away, jump conditionally around an unconditional 'jmp'.

    cmp al,'Y'
    je ItsYes

ADD operand1,operand2
    adds operand2 to operand1 and stores the result in operand1

SUB operand1,operand2
    subtracts operand2 from operand1 and stores the result in operand1

MUL [register | variable]
    multiplies two unsigned integers (always positive)

IMUL [register | variable]
    multiplies two signed integers (positive or negative)

    these multiply AL or AX by the register or variable given. AL is
    multiplied if a byte sized operand is given and the result is stored
    in AX. if the operand is word sized, AX is multiplied and the result
    is placed in DX:AX. on a 386, 486 or Pentium the EAX register can be
    used with a double word operand and the answer is stored in EDX:EAX.
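the operand-size rules for MUL and IMUL can be seen in a short sketch. as before, the simplified directives, the 'start' label and the exit call are assumptions (MASM/TASM-compatible assembler, DOS .exe target), and the values are arbitrary.

```asm
; sketch only: where MUL and IMUL leave their results
; assumes a MASM/TASM-compatible assembler and a DOS .exe target
        .model small
        .stack 100h
        .code
start:
        ; byte operand: AL * BL -> AX
        mov al,200
        mov bl,3
        mul bl              ; AX = 600 (0258h), too big for 8 bits

        ; word operand: AX * CX -> DX:AX
        mov ax,1000h
        mov cx,0100h
        mul cx              ; DX = 0010h, AX = 0000h (100000h)

        ; signed byte operand: AL * BL -> AX
        mov al,-2           ; stored as 0FEh
        mov bl,3
        imul bl             ; AX = -6 (0FFFAh)
                            ; 'mul bl' would treat 0FEh as 254
                            ; and give 762 (02FAh) instead

        mov ax,4C00h        ; DOS function 4Ch: exit, return code 0
        int 21h
        end start
```

note that the word-sized 'mul' overwrites DX even when the result would fit in AX, so anything kept in DX has to be saved first.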
DIV [register | variable]
    divides two unsigned integers (always positive)

IDIV [register | variable]
    divides two signed integers (positive or negative)

    these work in the same way as MUL and IMUL. if the operand is byte
    sized, the number in AX is divided by it; AL stores the answer and the
    remainder is in AH. if the operand is a 16 bit register or variable,
    then the number in DX:AX is divided by it; the answer is stored in AX
    and the remainder in DX.

to declare a procedure :

    PROC AProcedure
        .
        .       ; some code to do something
        .
        ret     ; without this, execution runs on past the end of the
                ; procedure and your computer will most likely crash
    ENDP AProcedure

it is equally easy to run a procedure. all you need to do is this :

    call AProcedure

parameters can be passed to procedures via the registers, memory and stack.

available memory models :

    tiny        one segment for both code and data

    small       all code is placed in one segment and all data declared in
                the data segment is also placed in one segment. this means
                that all procedures and variables are addressed as NEAR by
                pointing at offsets only

    compact     by default, all elements of code are placed in one segment
                but each element of data can be placed in its own physical
                segment. this means that data elements are addressed by
                pointing at both the segment and offset addresses. code
                elements (procedures) are NEAR and variables are FAR.

    medium      opposite of compact. data elements are NEAR and procedures
                are FAR.

    large       both procedures and variables are FAR. you have to point at
                both the segment and offset addresses.

    flat        this isn't used much as it is for 32-bit unsegmented
                memory space.

macros are like pre-processor directives: the assembler substitutes the
macro's code wherever its name is used, at assembly time.

    macro
        ......
        instructions
    endm

    SaveRegs macro
        push ax
        push bx
        push cx
        push dx
    endm

to call it :

    SaveRegs

when defining variables or labels in macros, use the 'local' directive :

    LOCAL name

where name is the name of a local variable or label. without it, using
the macro twice would define the same label twice.

macros can also have parameters :

    macro arg1,arg2,arg3,result
        ......
    endm

stopped at asm6
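as an illustration of a macro with parameters, here is a minimal sketch. the macro name 'Sum3' and its body are hypothetical; only the parameter list (arg1,arg2,arg3,result) follows the template above. the simplified directives, the 'start' label and the exit call are again assumptions (MASM/TASM-compatible assembler in its default MASM mode, DOS .exe target).

```asm
; sketch only: a hypothetical macro that stores arg1+arg2+arg3 in result
; assumes a MASM/TASM-compatible assembler and a DOS .exe target
        .model small
        .stack 100h

Sum3    macro arg1,arg2,arg3,result
        mov ax,arg1         ; note: the macro overwrites AX
        add ax,arg2
        add ax,arg3
        mov result,ax
        endm

        .code
start:
        Sum3 10,20,30,bx    ; expands to the four instructions above
                            ; with the arguments substituted; BX = 60

        mov ax,4C00h        ; DOS function 4Ch: exit, return code 0
        int 21h
        end start
```

unlike a procedure, nothing is called at run time: each use of 'Sum3' inserts a fresh copy of the four instructions into the program.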