Even with 4k pages, 2.5MB of page tables per process is pretty extreme. Imagine
launching a process that needs 100k of memory but has 2.5MB of
overhead! And imagine if you had to map more than 4GB of memory!
In a 2 level page table, the first 10 bits of the virtual address are
used for the first level, the next 10 bits are used for the second
level, and the remaining 12 bits are the offset within the page.
This doesn’t reduce the size of the entry, but reduces the number of
entries we need to store per process.
Basically, this works by having the first level of the page table manage a
larger range. If we're using 10 bits, then each first level entry
maps a 4MB region, and the 2nd level divides that region into 4k
pages. We only create a 2nd level table for a region that actually
contains mappings.
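To make the split concrete, here is a minimal sketch of decoding a 32-bit virtual address under this 10/10/12 split. The macro names are illustrative, not from any particular kernel.

    /* Split a 32-bit virtual address for a two-level page table:
     * 10-bit first level, 10-bit second level, 12-bit page offset. */
    #include <stdio.h>
    #include <stdint.h>

    #define PT1_INDEX(va)   (((va) >> 22) & 0x3FF)  /* top 10 bits: which 4MB region */
    #define PT2_INDEX(va)   (((va) >> 12) & 0x3FF)  /* next 10 bits: which 4k page   */
    #define PAGE_OFFSET(va) ((va) & 0xFFF)          /* low 12 bits: byte in the page */

    int main(void) {
        uint32_t va = 0x00403004;  /* an arbitrary example address */
        printf("PT1=%u PT2=%u offset=%u\n",
               PT1_INDEX(va), PT2_INDEX(va), PAGE_OFFSET(va));
        return 0;
    }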
Most computers record how each page has been accessed.
Typically, most hardware records whether a page has been read or
modified in two bit fields with the ability to reset these bits. This
yields four classes of pages:
1 - Not referenced, not modified
2 - Not referenced, modified
3 - Referenced, not modified
4 - Referenced, modified
Some hardware implementations will periodically clear the read bit to
help determine which pages have been recently read. This is how you
can get class 2 above.
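As a sketch, the class number can be derived directly from the two bits. The PTE flag positions below are invented for illustration; real hardware defines its own entry layout.

    #include <stdio.h>

    #define PTE_REFERENCED 0x1   /* illustrative bit positions */
    #define PTE_MODIFIED   0x2

    static int page_class(unsigned pte) {
        int r = (pte & PTE_REFERENCED) != 0;
        int m = (pte & PTE_MODIFIED) != 0;
        /* class 1: !R !M, class 2: !R M, class 3: R !M, class 4: R M */
        return 1 + 2 * r + m;
    }

    int main(void) {
        printf("%d\n", page_class(PTE_MODIFIED)); /* prints 2: not referenced, modified */
        return 0;
    }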
Second chance FIFO improves over FIFO's deficiency of paging out
heavily used pages by taking the read and modified bits into account.
Second chance FIFO will scan the list in order for a page with both
the read and modified bits set to zero. If it finds a page in this
class, it will swap that page out. If it fails to find such a page,
it will swap out the first page in the list.
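A sketch of this scan in C; the frame structure is invented to stand in for real page table state.

    #include <stdio.h>
    #include <stddef.h>

    struct frame {
        int referenced;
        int modified;
    };

    /* Returns the index of the frame to evict from fifo[0..n-1],
     * where fifo[0] is the oldest page. */
    size_t pick_victim(const struct frame *fifo, size_t n) {
        for (size_t i = 0; i < n; i++)
            if (!fifo[i].referenced && !fifo[i].modified)
                return i;      /* class 1: cheap to reuse immediately */
        return 0;              /* none found: fall back to the oldest page */
    }

    int main(void) {
        struct frame fifo[3] = { {1, 0}, {0, 0}, {1, 1} };
        printf("evict frame %zu\n", pick_victim(fifo, 3)); /* prints 1 */
        return 0;
    }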
The clock algorithm improves upon second chance FIFO.
Second chance FIFO suffers from the many modifications it must make
to its internal list.
The clock algorithm uses a circular list and stores a pointer
to the oldest page. When a page fault occurs, the page pointed to is
inspected. If its read bit is 0, it is evicted. If the read bit is 1,
the bit is set to 0 and the pointer advances.
In reality, the clock algorithm is only very slightly better than
second chance FIFO.
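A minimal sketch of this loop; the fixed frame count and array stand in for real page frame state.

    #include <stdio.h>

    #define NFRAMES 8

    static int referenced[NFRAMES]; /* R bit per frame */
    static int hand;                /* points at the oldest page */

    /* Advance the hand until a frame with R == 0 is found; clear R
     * bits along the way, giving each page its "second chance". */
    int clock_evict(void) {
        for (;;) {
            if (referenced[hand] == 0) {
                int victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            referenced[hand] = 0;          /* used recently: spare it once */
            hand = (hand + 1) % NFRAMES;
        }
    }

    int main(void) {
        referenced[0] = referenced[1] = 1;
        printf("evict frame %d\n", clock_evict()); /* frame 2: first with R == 0 */
        return 0;
    }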
LRU is based on two observations: pages that have been heavily used
recently will likely be heavily used again in the near future, and
pages that have not been used recently will likely not be used in the
near future.
To maintain the data necessary to implement LRU, the OS would have to
maintain a linked list of all pages in physical memory, with the most
recently used page at the head and the least recently used page at
the tail. This is not cheap: every access requires a search of the
list, and the list can be very big.
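To see why, here is a sketch of the bookkeeping exact LRU would require; the linear search on every access is the expensive part.

    #include <stdio.h>
    #include <stdlib.h>

    struct node { int page; struct node *next; };

    static struct node *head;  /* head = most recently used */

    /* On every access: move `page` to the front, appending it if new. */
    void touch(int page) {
        struct node **pp = &head;
        while (*pp && (*pp)->page != page)  /* linear search: the expensive part */
            pp = &(*pp)->next;
        struct node *n = *pp;
        if (n) {
            *pp = n->next;                  /* unlink from its old position */
        } else {
            n = malloc(sizeof *n);
            n->page = page;
        }
        n->next = head;                     /* relink at the head (MRU) */
        head = n;
    }

    int main(void) {
        touch(1); touch(2); touch(1);
        for (struct node *n = head; n; n = n->next)
            printf("%d ", n->page);         /* prints "1 2": 1 is now MRU */
        printf("\n");
        return 0;
    }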
In a counter-based approximation, each page has a counter; when a
page fault occurs, the page with the lowest counter is evicted.
This policy more closely approximates LRU by favoring recently
accessed pages and penalizing pages that have not been recently
accessed by decreasing their value.
This algorithm falls short of LRU in two ways:
The number of bits in the counter is finite. This allows two
pages to have the value of zero even though one of them was more
recently used.
The algorithm is constrained to the granularity of a clock interrupt.
All pages accessed between two successive interrupts are
considered to be as recently accessed as each other.
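One common realization of this counter scheme is often called aging: on every clock tick, each counter is shifted right and the read bit is added at the top, so recently used pages keep high values. A sketch, with an illustrative 8-bit counter width and frame count:

    #include <stdio.h>
    #include <stdint.h>

    #define NFRAMES 4

    static uint8_t counter[NFRAMES];
    static int referenced[NFRAMES];

    /* Called on each clock interrupt. */
    void age_tick(void) {
        for (int i = 0; i < NFRAMES; i++) {
            counter[i] = (uint8_t)((counter[i] >> 1) | (referenced[i] << 7));
            referenced[i] = 0;   /* reset R for the next interval */
        }
    }

    /* On a page fault, evict the frame with the lowest counter. */
    int pick_victim(void) {
        int victim = 0;
        for (int i = 1; i < NFRAMES; i++)
            if (counter[i] < counter[victim])
                victim = i;
        return victim;
    }

    int main(void) {
        referenced[0] = 1; age_tick();             /* frame 0 used recently */
        referenced[2] = 1; age_tick();             /* then frame 2 */
        printf("evict frame %d\n", pick_victim()); /* frame 1: never used */
        return 0;
    }

Note how both shortcomings show up here: the counter is only 8 bits wide, and all accesses within one tick look identical.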
A naive paging implementation would start up a process with none of
its pages in memory (libraries, program, data, bss, etc…).
When the process attempts to execute its first instruction it would
immediately generate a page fault.
For the first few moments of a program's execution it would generate
many page faults; once mostly loaded, it would run without generating
many more.
Generating many unnecessary faults leads to poorly performing
applications.
Although not universally true, many applications exhibit a locality
of reference.
This means that, if a process is working with a given page at one
point in time, then just before that time and in the near future it
is likely to continue to work with that page and pages that are near
(in terms of virtual address distance).
Many programs will have one or more regions in which they exhibit
locality of reference. Most commonly these are one or more regions
in the stack or heap.
The set of pages that a process is currently using is called the
working set.
If we take locality into account, how can we make paging systems
faster?
Some approaches:
When loading pages from a library or program file, adjacent pages
are loaded at the same time.
Often, after servicing a page fault, an operating system can
continue to load program pages asynchronously.
When choosing pages to evict, if there is more than one page that
is desirable to evict, the OS can choose to evict the page with
the greatest distance from any pages in the working set.
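The kernel's read-ahead itself is not visible from user space, but POSIX exposes a related hint: posix_madvise() with POSIX_MADV_WILLNEED asks the OS to start paging a mapped range in before it is touched. A minimal sketch with minimal error handling; the file path is arbitrary:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void) {
        int fd = open("/etc/hosts", O_RDONLY);   /* any readable file works */
        if (fd < 0) return 1;
        struct stat st;
        fstat(fd, &st);
        char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;
        /* hint: we will read this whole range soon, so prefetch it */
        posix_madvise(p, (size_t)st.st_size, POSIX_MADV_WILLNEED);
        printf("first byte: %c\n", p[0]);        /* likely no major fault now */
        munmap(p, (size_t)st.st_size);
        close(fd);
        return 0;
    }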
Text and other read-only pages will always be in classes 1 or 3 (not
modified).
Stack and heap pages can be in any of 1-4.
When a page in class 1 or 3 (not modified) is evicted, the swapper
only needs to mark the page as not resident and then reuse the
physical page for a new entry.
When a page in class 2 or 4 is evicted, the swapper must also copy
the contents of the page to a different storage system (typically a
disc). This increases the cost of evicting these pages.
For some operations, we need to guarantee that a page will remain in
physical memory.
The most common case is for regions of memory that are dedicated to
buffering for devices or regions of memory that work with DMA.
DMA is basically a system by which an operating system kernel can
tell a device to write the results of an operation directly to a
specific region of memory without the CPU having to copy the data.
During the period of time a DMA operation is occurring, the OS must
guarantee that the region of memory is not evicted by the swapper.
A simpler, but less ideal, alternative is to only perform DMA
operations into operating system buffers and then copy the results to
program buffers.
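From user space, a program can request a comparable residency guarantee with mlock(), which asks the OS to pin a range of pages in physical memory. A minimal sketch; mmap is used so the buffer is page aligned:

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 4096;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        if (mlock(buf, len) != 0) {    /* pin: the swapper must skip this range */
            perror("mlock");           /* may fail if RLIMIT_MEMLOCK is low */
            return 1;
        }
        /* ... the buffer is now guaranteed resident ... */
        munlock(buf, len);             /* make it evictable again */
        munmap(buf, len);
        return 0;
    }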
COW comes into effect when write operations happen to a shared page.
The process that causes the page fault will make a copy of the
physical page into a new physical page and update its page table
entry to point to that page.
The new page will then be marked as writable.
In this way, a new process only uses memory for pages that differ
from the parent process.
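A small program can demonstrate these semantics: after fork(), both processes share the same physical pages, and the child's write triggers a private copy, leaving the parent's page untouched.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static int shared_value = 42;

    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {
            /* write fault: the kernel copies the page for the child */
            shared_value = 99;
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        /* the parent still maps the original, unmodified page */
        printf("parent sees %d\n", shared_value);  /* prints 42 */
        return 0;
    }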
For heap, stack, and data pages the backing store in most operating
systems is one of:
Swap file
Swap partition
In simpler operating systems, swap partitions are preferable because
the operating system can interact directly with the disc rather than
going through FS code.
In more advanced operating systems (more recent versions of Linux or
Windows), the FS implementation is advanced enough that the OS can
determine the location of the swap file's sectors ahead of time, so
there is no overhead to using a swap file.
Swap files have the advantage of being able to be resized on demand.
In Linux, additional swap files can be created and then used with the
swapon command.
Windows manages one or more swap files automatically.
Hibernation is a specific implementation of a swap file.
To hibernate, the operating system will page out all used physical
pages to disc. Either a special hibernation file or the swap file
will be used.
Then, the operating system will either shut down the computer or put
the computer in a special low power state.
When the computer boots back up, the operating system will notice
that it was previously shut down by hibernation.
After the core OS components are loaded, the OS will restore the page
table from the hibernation file and then begin paging in from the
hibernation file.
Theoretically, the best use of physical memory is to use all of it
if possible.
To improve performance and responsiveness of operations that need new
memory (reading a new file, writing new data, allocating new pages to
the heap or stack), many modern VM implementations will keep a few
pages free at all times.
Both Windows and Linux will typically keep about 12-16MB free as a
“hot memory” area.
This hot memory area has the effect of preventing page faults due to
"jitters" in memory usage: if a process is rapidly increasing and
decreasing its memory usage, it is unlikely to generate page faults.
1- The hardware interrupts the kernel. Program counter and registers
are saved. Information necessary to restart the current instruction
is also saved. The OS is then called.
2- The OS discovers a page fault has occurred. The OS inspects either
a special register or the saved instruction from step 1 to figure
out which page is needed.
3- Once the page is known, the OS determines the cause of the page
fault. If the address is inconsistent with the access rights or
memory accessible to the process, a signal is sent to the process or
the process is terminated.
4- If it is consistent, the OS tries to acquire a free physical page
to load the necessary page into memory. If no physical page is free,
the page replacement algorithm is invoked.
5- If the evicted page is dirty, it is scheduled to be written to
disc. In this case the faulting process is put to sleep and a context
switch occurs.
6- As soon as the evicted page is clean, the OS schedules a disc
operation to load the page. While waiting for the load, the faulting
process is suspended and the scheduler will pick another process to
run.
7- As soon as the page is loaded from disc, the page table entry is
updated to reflect its position and the status of the page is set to
resident.
8- The OS restores the registers of the program, and depending on
hardware details will retry the faulting instruction, updating the
program counter accordingly.
9- The faulting process is then marked as runnable for the scheduler.
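Tying the steps together, here is a schematic sketch of the fault path as one C function. Every type and helper below is a stub invented for illustration; a real kernel's fault path is far more involved.

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t vaddr_t;
    struct pte { int frame; bool resident; bool dirty; };

    /* --- stubs standing in for real kernel services --- */
    static struct pte *lookup_pte(vaddr_t va)      { static struct pte p; (void)va; return &p; }
    static bool access_allowed(vaddr_t va)         { (void)va; return true; }
    static void signal_or_kill_process(void)       { }
    static int  alloc_free_frame(void)             { return -1; }  /* -1: none free */
    static int  run_replacement_algorithm(void)    { return 0; }   /* e.g. clock_evict() */
    static struct pte *pte_of_frame(int f)         { static struct pte p; (void)f; return &p; }
    static void write_frame_to_disc(int f)         { (void)f; }    /* sleeps the process */
    static void load_page_from_disc(vaddr_t va, int f) { (void)va; (void)f; }

    void handle_page_fault(vaddr_t faulting_va) {
        /* steps 1-2: hardware saved state and handed us the address */
        struct pte *pte = lookup_pte(faulting_va);

        /* step 3: validate the access */
        if (!access_allowed(faulting_va)) {
            signal_or_kill_process();
            return;
        }

        /* step 4: find a physical frame, evicting if necessary */
        int frame = alloc_free_frame();
        if (frame < 0) {
            frame = run_replacement_algorithm();
            /* step 5: a dirty victim must be written out first */
            if (pte_of_frame(frame)->dirty)
                write_frame_to_disc(frame);
            pte_of_frame(frame)->resident = false;
        }

        /* step 6: schedule the read; the process sleeps until done */
        load_page_from_disc(faulting_va, frame);

        /* step 7: update the page table entry */
        pte->frame = frame;
        pte->resident = true;

        /* steps 8-9: registers are restored, the instruction is retried,
         * and the process is marked runnable again by the return path */
    }

    int main(void) { handle_page_fault(0x1000); return 0; }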