###########################################################################
# Memory Management Notes                                                 #
# Anuradha Weeraman, 03 June 2003                                         #
# Adapted from "The Linux Kernel"                                         #
# $Id: memory-management.txt,v 1.1 2004/06/02 21:17:53 anuradha Exp $     #
###########################################################################

What the mm subsystem provides :

    * large address space
    * protection
    * memory mapping
    * fair physical memory allocation
    * shared virtual memory

in a 'virtual memory' system, all addresses are virtual addresses, not
physical addresses. these virtual addresses are converted into physical
addresses by the processor based on information held in a set of tables
maintained by the operating system.

each process has a 'page table' which translates the process's 'virtual
page frame numbers' (VPFNs) to 'physical page frame numbers' (PFNs). a
virtual address is split into a VPFN and an offset within the page: the
VPFN selects the page table entry, and the offset is carried over
unchanged into the physical address. pages are fixed-size chunks of
memory, 4KB on intel x86 and 8KB on alpha axp.

each entry in a page table consists of :

    * a valid flag
    * the physical page frame number
    * access control information

when the processor finds no valid page table entry for a virtual address,
it notifies the operating system with a 'page fault', passing along the
faulting virtual address and the reason for the fault.

because every virtual page is translated separately, virtual memory can
be mapped to physical memory in any order. page tables also make it easy
to share code and memory: two processes share a physical page when both
of their page tables hold its PFN. shared memory is used as an IPC
mechanism, and linux supports System V shared memory IPC.

loading only the pages that are required is known as 'demand paging'.
linux uses demand paging to load executable images into a process's
virtual memory. whenever a command is run, only its first part is brought
into physical memory; the rest is mapped into the process's address space
and demand paged from disk as it is referenced. linking a file into the
address space this way is called 'memory mapping'.

'swapping' happens when the os needs to bring a virtual page into
physical memory and there are no free physical pages left, so another
page must be discarded to make room. a page that has not been modified
can simply be dropped and read again from its image later, but 'dirty
pages' are saved to a swap file for later retrieval.
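
as a rough illustration of the page table lookup and page fault described
in these notes, here is a minimal python sketch. it assumes 4KB pages and
a flat, single-level table held in a dict. the names PTE, PageFault and
translate are made up for this example and are not kernel identifiers.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12                  # assumption: 4KB pages
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


class PageFault(Exception):
    """raised when a virtual address cannot be translated."""

    def __init__(self, vaddr, reason):
        super().__init__(f"page fault at {vaddr:#x}: {reason}")
        self.vaddr = vaddr       # faulting virtual address
        self.reason = reason     # why the fault happened


@dataclass
class PTE:
    valid: bool      # valid flag
    pfn: int         # physical page frame number
    writable: bool   # stand-in for the access control information


def translate(page_table, vaddr, write=False):
    """translate a virtual address using a flat, single-level page table."""
    vpfn = vaddr >> PAGE_SHIFT       # virtual page frame number
    offset = vaddr & OFFSET_MASK     # byte offset inside the page
    pte = page_table.get(vpfn)
    if pte is None or not pte.valid:
        raise PageFault(vaddr, "page not present")
    if write and not pte.writable:
        raise PageFault(vaddr, "write to read-only page")
    # the offset is carried over unchanged into the physical address
    return (pte.pfn << PAGE_SHIFT) | offset


# virtual pages 0 and 1 live in physical frames 4 and 2: any order works
page_table = {0: PTE(True, 4, True), 1: PTE(True, 2, False)}
```

calling translate(page_table, 0x1194) looks up virtual page 1, finds
frame 2 and keeps the offset 0x194. an address in an unmapped page, or a
write to the read-only page, raises PageFault, which plays the role of
the processor handing control to the os.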

'thrashing' occurs when the 'swap algorithm' (the algorithm that chooses
which pages to discard or swap out) is not very effective: pages are
constantly written to disk and then read back in, leaving the os too busy
to get real work done. the set of pages that a process is currently using
is called its 'working set'.

linux uses the 'Least Recently Used (LRU)' page aging technique to choose
which pages to swap out. in this scheme, every page has an age, and the
more a page is accessed the younger it gets. older, stale pages are good
candidates for swapping.

virtual memory makes it easy for processes to share memory via page
tables.

'physical addressing mode', as opposed to 'virtual addressing mode',
requires no page tables, and the processor doesn't attempt any address
translation in this mode. the linux kernel is linked to run in physical
address space.

the page table entries also contain access control information. most
processors have at least two modes of execution : kernel and user. kernel
code cannot be executed, and kernel data structures cannot be accessed,
by code running in user mode. access control is expressed through flags
in the PTE (page table entry). on the alpha axp the fields are :

    V    - valid. if set, this PTE is valid.
    FOE  - fault on execute. whenever an attempt is made to execute
           code in this page, the processor reports a page fault and
           passes control to the os.
    FOW  - fault on write. as above, but on an attempt to write to
           this page.
    FOR  - fault on read. as above, but on an attempt to read from
           this page.
    ASM  - address space match. used when the os wishes to clear only
           some of the entries from the translation buffer.
    KRE  - code running in kernel mode can read this page.
    URE  - code running in user mode can read this page.
    GH   - granularity hint. used when mapping an entire block with a
           single translation buffer entry rather than many.
    KWE  - code running in kernel mode can write to this page.
    UWE  - code running in user mode can write to this page.
    page frame number
         - for PTEs with the V bit set, this field contains the
           physical page frame number for this PTE. for invalid PTEs,
           if this field is not zero, it contains information about
           where the page is in the swap file.
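
to make the flags above more concrete, here is a small python sketch of
how a PTE could be packed into one integer and checked before an access.
the bit positions, the helper names (make_pte, pte_pfn, access_allowed)
and the treatment of an instruction fetch as a read that also honours FOE
are all assumptions made for illustration; the architecture manual is the
authority on the real layout.

```python
# assumption: illustrative bit positions, not taken from these notes
PTE_V = 1 << 0      # valid
PTE_FOR = 1 << 1    # fault on read
PTE_FOW = 1 << 2    # fault on write
PTE_FOE = 1 << 3    # fault on execute
PTE_KRE = 1 << 8    # kernel read enable
PTE_URE = 1 << 9    # user read enable
PTE_KWE = 1 << 12   # kernel write enable
PTE_UWE = 1 << 13   # user write enable
PFN_SHIFT = 32      # assumption: PFN kept in the upper half of the entry


def make_pte(pfn, flags):
    """pack a page frame number and a set of flags into one integer."""
    return (pfn << PFN_SHIFT) | flags


def pte_pfn(pte):
    """extract the page frame number from a packed PTE."""
    return pte >> PFN_SHIFT


def access_allowed(pte, mode, op):
    """return True if `mode` ('kernel' or 'user') may perform `op`
    ('read', 'write' or 'execute') without causing a page fault."""
    if not pte & PTE_V:
        return False                      # invalid PTE: always faults
    read_enable = PTE_KRE if mode == "kernel" else PTE_URE
    write_enable = PTE_KWE if mode == "kernel" else PTE_UWE
    if op == "read":
        return bool(pte & read_enable) and not pte & PTE_FOR
    if op == "write":
        return bool(pte & write_enable) and not pte & PTE_FOW
    if op == "execute":
        # simplification: a fetch needs read access and FOE clear
        return bool(pte & read_enable) and not pte & PTE_FOE
    raise ValueError(f"unknown operation: {op}")


# a page that only kernel mode may read and write
kernel_page = make_pte(7, PTE_V | PTE_KRE | PTE_KWE)
```

with this entry, access_allowed(kernel_page, "kernel", "write") holds,
while any access from "user" mode is refused, which is the kernel/user
protection described above.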

the following two bits are defined and used by linux :

    _PAGE_DIRTY    - if set, the page needs to be written out to the
                     swap file.
    _PAGE_ACCESSED - used by linux to mark a page as having been
                     accessed.

linux uses a number of memory management related CACHES :

    buffer cache
        contains data buffers used by block device drivers. the
        buffers are of fixed size (for example 512 bytes). the cache
        is indexed via the device identifier and the desired block
        number. block devices are only ever accessed via the buffer
        cache.

    page cache
        used to speed up access to images and data on disk.

    swap cache
        only modified (or dirty) pages are saved in the swap file.

    hardware caches
        a cache of Page Table Entries (the translation buffer) is
        found in the processor.

if any of these caches becomes corrupted, the system can crash.

LINUX PAGE TABLES

linux assumes that there are three levels of page tables. each page table
accessed contains the page frame number of the next level of page table.
in this way, the virtual address can be broken into a number of fields,
each field providing an offset into a particular page table.

to translate a virtual address into a physical one, the processor must
take the contents of each level field, convert it into an offset into the
physical page containing the page table and read the page frame number of
the next level of page table. this is repeated three times, until the
page frame number of the physical page containing the virtual address is
found. the final field in the virtual address, the byte offset, is used
to find the data inside the page.

each platform that linux runs on must provide translation macros that
allow the kernel to traverse the page tables for a particular process.
this way, the kernel does not need to know the format of the page table
entries or how they are arranged. this is so successful that linux uses
the same page table manipulation code for the alpha processor, which has
three levels of page tables, and for intel x86 processors, which have
two.
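
the three-level walk described above can be sketched in python as
follows. the field widths (8KB pages, 10 index bits per level), the
`memory` dict that stands in for physical pages holding page tables, and
the function names are assumptions picked for the example; real platforms
hide these details behind their own translation macros.

```python
PAGE_SHIFT = 13                      # assumption: 8KB pages
LEVEL_BITS = 10                      # assumption: 10 index bits per level
LEVEL_MASK = (1 << LEVEL_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def split_vaddr(vaddr):
    """break a virtual address into (level 1, level 2, level 3, offset)."""
    offset = vaddr & OFFSET_MASK
    l3 = (vaddr >> PAGE_SHIFT) & LEVEL_MASK
    l2 = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK
    l1 = (vaddr >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK
    return l1, l2, l3, offset


def walk(memory, root_pfn, vaddr):
    """walk three levels of page tables and return the physical address,
    or None if an entry is missing (which would be a page fault).

    `memory` maps a PFN to the page table stored in that physical page;
    each table maps an index to the PFN of the next level."""
    *indexes, offset = split_vaddr(vaddr)
    pfn = root_pfn
    for index in indexes:            # repeated three times
        table = memory[pfn]
        if index not in table:
            return None
        pfn = table[index]           # PFN of the next level (or the page)
    return (pfn << PAGE_SHIFT) | offset


# level 1 table in frame 1, level 2 in frame 2, level 3 in frame 3;
# the level 3 table maps index 5 to the data page in frame 40
memory = {1: {0: 2}, 2: {0: 3}, 3: {5: 40}}
```

walk(memory, 1, (5 << 13) | 0x123) follows frames 1, 2 and 3 and ends in
frame 40 with the byte offset 0x123 preserved. a two-level platform can
be pictured as the same loop with one level folded away, which is roughly
what the per-platform macros make possible.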

PAGE ALLOCATION AND DEALLOCATION

the mechanisms and data structures used for page allocation and
deallocation are perhaps the most critical in maintaining the efficiency
of the virtual memory subsystem.

all of the physical pages in the system are described by the 'mem_map'
data structure, which is a list of 'mem_map_t'. important fields
(concerning mm) in mem_map_t are :

    count  : number of users of this page. greater than one when the
             page is being shared.
    age    : age of the page. used to decide whether to swap it or not.
    map_nr : physical page frame number that this mem_map_t describes.

the 'free_area' vector is used by the page allocation code to find free
pages. each element of free_area describes the free blocks of one size:
the first element covers single pages, the next blocks of 2 pages, the
next blocks of 4 pages, and so on.

linux uses the buddy algorithm to allocate and deallocate blocks of pages
efficiently. pages are allocated in blocks which are powers of 2 in size,
e.g. 1 page, 2 pages, 4 pages, 8 pages etc., so long as there are enough
free pages in the system to grant the request. the 'free_area' data
structure is scanned for a block of the required size; if none is free, a
block of the next size up is split. the page deallocation code recombines
adjacent free blocks into larger blocks of free pages whenever it can.

MEMORY MAPPING

when an image is executed, the contents of the executable image must be
brought into the process's virtual address space. the same is also true
of any shared libraries that the executable image has been linked to use.
the executable file is not actually brought into physical memory; instead
it is merely linked into the process's virtual memory. then, as parts of
the program are referenced by the running application, the image is
brought into memory from the executable image. this linking of an image
into a process's virtual address space is known as memory mapping.

every process's virtual memory is represented by an 'mm_struct' data
structure. it has pointers to 'vm_area_struct' data structures, each of
which describes the start and end of an area of virtual memory, the
process's access rights to that memory and a set of operations for that
memory.
these operations are a set of routines that linux must use when
manipulating this area of virtual memory.

vm_area_struct :

    vm_end
    vm_start
    vm_flags
    vm_inode
    vm_ops :
        open()
        close()
        unmap()
        protect()
        sync()
        advise()
        nopage()
        wppage()
        swapout()
        swapin()
    vm_next

stopped at DEMAND PAGING
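
as a rough picture of how the vm_area_struct list listed earlier might be
used, here is a python sketch that chains areas through vm_next and finds
the one containing an address. only vm_start, vm_end, vm_flags and
vm_next come from the notes. the VmArea class, the find_area helper, the
string form of vm_flags and the choice to treat vm_end as one past the
last byte are assumptions of this example, not the kernel's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmArea:
    vm_start: int                        # first address of the area
    vm_end: int                          # assumption: one past the last byte
    vm_flags: str                        # assumption: rights as a string
    vm_next: Optional["VmArea"] = None   # next area in the list


def find_area(first, addr):
    """follow vm_next from `first` and return the area containing `addr`,
    or None if the address is not mapped by any area."""
    area = first
    while area is not None:
        if area.vm_start <= addr < area.vm_end:
            return area
        area = area.vm_next
    return None


# two areas of a hypothetical process: code first, then data
data = VmArea(0x8000, 0xA000, "rw")
text = VmArea(0x1000, 0x4000, "rx", vm_next=data)
```

find_area(text, 0x8fff) returns the data area, while an address such as
0x5000 falls in the gap between the two areas and yields None. a lookup
like this is the kind of check an os needs when deciding whether a
faulting address belongs to the process at all.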