Virtual Machines

Intro / Flashback (05:12)

Virtual Machines (03:24)

  • System Virtual Machines
  • Goal: Imitate hardware
  • What hardware?
    • Whatever's easiest
  • Running Operating Systems
  • Terms
    • Hypervisor (or "VM Monitor")
      • e.g. VirtualBox
      • Sometimes a part of host OS
    • Guest OS
      • e.g. Linux
      • Application for the Hypervisor
    • Host OS
      • e.g. Windows
      • Sometimes Host OS = Hypervisor
  • We'll assume Hypervisor = Host OS
    • e.g. Hypervisor will have exception handlers

Imitate: How close? (02:08)

  • Full Virtualization
    • Guest OS unmodified
  • Paravirtualization
    • Small changes to guest OS
  • Fuzzy line

Other techniques (01:16)

  1. Extra HW support for VMs
  2. Compile guest OS' machine code to new machine code
    • Not as slow as you might think

VM Layering (00:34)

  • Run Hypervisor in Kernel mode
  • Run Guest OS in User mode
    • Can think of Guest OS as Hypervisor's "process"

Pretend Modes (00:41)

  • Pretend User mode and Pretend kernel mode for guest OS
    • Guest OS needs to run stuff in kernel mode
    • But the whole thing actually runs in (real) user mode

Keeping track of stuff (01:36)

  • Hypervisor keeps track of regular process stuff
    • Some extras, though
      • Whether in User/kernel mode
      • Page table ptr of guest OS (HARD)
      • Exception table ptr
      • Whether interrupts are disabled
        • (whether we're pretending they are or not)
  • Guest OS runs like a process, with extra state to keep track of
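
A rough C sketch of that extra state (the struct and field names here are made up for illustration, not from the lecture):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-guest bookkeeping: the usual process state plus
       the "pretend" machine state of the guest OS. */
    struct vm_state {
        /* regular process stuff */
        uint64_t regs[32];
        uint64_t pc;

        /* extras for the guest OS */
        bool     pretend_kernel_mode;    /* fake user/kernel mode bit        */
        uint64_t guest_page_table;       /* guest OS's page table ptr (HARD) */
        uint64_t guest_exception_table;  /* guest OS's exception table ptr   */
        bool     interrupts_disabled;    /* whether we're *pretending* they
                                            are disabled                     */
    };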

Basic Hypervisor Flow (02:32)

How do we make it look like it's on real HW?

  • Run it normally until it stops (exceptions)
  • Hypervisor simulates what processor would "normally" do
    • e.g. Screen system call: talks to screen on your behalf
    • e.g. fault on a disable-interrupts instruction: record that interrupts are (pretend) disabled
    • e.g. System call: Run Guest OS' system call handler
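
A minimal sketch of that loop in C, using the hypothetical vm_state struct above; the cause codes and the run_guest_until_exception / emulate_* / reflect_* helpers are invented names, not a real API:

    /* Hypothetical top-level loop: run the guest OS as an ordinary
       user-mode "process" until it traps, then simulate what the real
       processor would have done. */
    void run_vm(struct vm_state *vm) {
        for (;;) {
            int cause = run_guest_until_exception(vm);  /* runs normally until it stops */

            switch (cause) {
            case CAUSE_SYSCALL:
                /* run the guest OS's own system call handler */
                reflect_into_guest(vm, CAUSE_SYSCALL);
                break;
            case CAUSE_PRIVILEGED_INSTRUCTION:
                /* e.g. disable-interrupts: just record the pretend state */
                emulate_privileged_instruction(vm);
                break;
            case CAUSE_PROTECTION_FAULT:
                /* e.g. screen/keyboard access: talk to the device on the
                   guest's behalf */
                emulate_device_access(vm);
                break;
            default:
                handle_other_exception(vm, cause);      /* page faults, timers, ... */
            }
        }
    }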

Virtual machine execution pieces (00:50)

  1. Making IO and kernel-mode related instructions work
    • Trap and emulate
    • Force instruction to cause fault
    • Make fault handler do what HW would do
    • might require reading machine code to emulate instruction
  2. Making exceptions/interrupts work
    • "reflect" exceptions/interrupts into guest OS
  3. Making page tables work
    • Whole thing

Trap and Emulate (00:56)

  • Normally: Privileged instrs trigger fault
  • Normal OS: crash the program
  • Instead: Hypervisor pretends it did the right thing
    • We'll do the privileged thing on your behalf

Privileged I/O flow (00:59)

  1. Guest OS tries to access device
    • Not allowed: Guest OS in user mode
  2. Protection fault triggered
  3. Handler in hypervisor
    • "Oh you were trying to get keyboard input"
    • talk to device for you
    • Update guest OS w/ any results
    • Switch back

What this looks like (pseudocode) (05:04)

  • Handler actually looks at the faulting instruction
  • Also check what mode it thinks it's in
  • Straightforward but tedious
    • Why I/O isn't super fast in VMs
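
A hedged reconstruction of what such a handler might look like (keyboard-read example; the instruction-decoding helpers and device functions are invented):

    /* Hypothetical protection-fault handler: inspect the faulting
       instruction and do the privileged work on the guest's behalf. */
    void emulate_device_access(struct vm_state *vm) {
        uint32_t insn = read_guest_code(vm, vm->pc);   /* look at the faulting instruction */

        if (is_keyboard_read(insn)) {
            if (vm->pretend_kernel_mode) {
                /* the guest "kernel" is allowed: do the real device access */
                vm->regs[insn_dest_reg(insn)] = real_keyboard_read();
                vm->pc += 4;            /* skip the instruction we just emulated */
            } else {
                /* a guest *user* program tried it: let the guest OS decide */
                reflect_into_guest(vm, CAUSE_PROTECTION_FAULT);
            }
        } else {
            /* ...one tedious case like this per privileged instruction /
               device, which is part of why I/O in a VM is not super fast */
            emulate_other_instruction(vm, insn);
        }
    }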

What if we're in pretend user mode? (02:00)

Read from keyboard

  • If in pretend kernel mode: Do the actual privileged instruction
  • If in pretend user mode: Invoke guest OS' exception handler (after switching to fake kernel mode, etc)
    • How nested VMs work
    • (pass handling down the VM chain)
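
Roughly what "invoke the guest OS' exception handler" could look like, again with invented helpers; nested VMs just repeat this trick one level down:

    /* Hypothetical "reflection" of an exception into the guest OS: make it
       look as if the fault happened on real hardware, then resume the guest
       in its own handler (still in real user mode). */
    void reflect_into_guest(struct vm_state *vm, int cause) {
        /* save what real hardware would save, where the guest OS expects it */
        save_guest_trap_frame(vm, cause, vm->pc);

        /* switch to *pretend* kernel mode (and, later, the matching shadow
           page table) */
        vm->pretend_kernel_mode = true;

        /* continue at the handler the guest OS registered for this cause */
        vm->pc = lookup_guest_handler(vm->guest_exception_table, cause);
    }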

System calls (03:07)

  1. Program in Guest OS makes system call
  2. Invokes handler in hypervisor
  3. Hypervisor forwards system call to the guest OS' system call handler
    • Also mark fake kernel mode, change PC, etc.
    • Also have to update page table (more on this later)

After system call

  • Return from exception in guest OS' syscall handler
    • Privileged instruction --> Protection fault because pretend kernel mode isn't real kernel mode
    • hypervisor has to emulate the actual return to the user program
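
A sketch of emulating that return (invented names); it undoes what reflecting the exception set up:

    /* Hypothetical emulation of the guest OS's return-from-exception
       instruction, which itself faults because pretend kernel mode is
       still real user mode. */
    void emulate_return_from_exception(struct vm_state *vm) {
        /* leave pretend kernel mode... */
        vm->pretend_kernel_mode = false;

        /* ...switch back to the user-mode page table (more on this later)... */
        switch_shadow_page_table(vm, /*kernel=*/false);

        /* ...and resume the guest user program where its OS said to */
        vm->pc = guest_saved_return_pc(vm);
    }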

How this works for Memory-Mapped I/O (02:22)

  • Emulating writing to a control register
    • Have to emulate every memory-writing instruction to make devices work
    • (at least) 2 types of page faults for hypervisor:
      1. Guest OS trying to access device memory (emulate the device)
      2. Guest OS trying to access memory not in its page table (run exception handler in guest OS)
        • Trigger page fault in guest OS
    • tl;dr: Is it not mapped because it's a device, or because we don't have this mapping in the guest OS?
    • Virtual mem on top of this? Extra cases (next)
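
The two fault cases as a sketch (invented helpers):

    /* Hypothetical page-fault handler in the hypervisor: decide whether the
       address is "missing" because it's a device we're emulating, or because
       the guest OS genuinely hasn't mapped it. */
    void handle_guest_page_fault(struct vm_state *vm, uint64_t fault_addr) {
        uint64_t guest_phys;

        if (guest_pt_lookup(vm, fault_addr, &guest_phys) &&
            is_emulated_device(guest_phys)) {
            /* case 1: memory-mapped I/O -- emulate the device access */
            emulate_device_access(vm);
        } else {
            /* case 2: not mapped by the guest OS -- trigger the page fault
               in the guest OS so *its* handler runs */
            reflect_into_guest(vm, CAUSE_PAGE_FAULT);
        }
    }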

Exercise: How many exceptions? (02:37)

Exceptions that occur:

  1. System call
  2. Write 1
  3. Write 2
  4. Write 3
  5. Write 4
  6. Return from system call

  7. More if page faults occur to bring in memory, etc.

Making this faster

  • Paravirtualization

This doesn't always work... (01:29)

  • Works on some architectures
    • Relies on all special instrs triggering fault
    • Not all do in x86
      • Some instructions behave differently in user and kernel mode, rather than just faulting vs. not faulting
    • Why the original VMware compiled guest OS' machine code to other machine code
  • Modern x86 added some ISA extensions that solve this problem

What about virtual memory? (01:01)

Things Virtual Memory needs

  • Change page table from user to kernel mode
    • Not automatic because it's actually all in user mode
  • Virtual memory most complicated part of virtualization
    • also source of slowdowns

Terms (01:21)

  • Virtual address: Fake virtual addr for guest OS
  • Physical address: Fake physical addr for guest OS
    • Virtual address for hypervisor
  • Machine address: Real physical address for hypervisor / host OS
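
In code terms, the three kinds of addresses chain together like this (hypothetical helper names):

    /* Hypothetical full translation for a guest virtual address. */
    uint64_t guest_virtual_to_machine(struct vm_state *vm, uint64_t virt) {
        /* guest page table: virtual -> "physical" (fake, the guest's view) */
        uint64_t phys = guest_pt_translate(vm, virt);

        /* hypervisor's mapping: "physical" -> machine (real RAM) */
        uint64_t machine = hypervisor_phys_to_machine(vm, phys);

        return machine;
    }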

Three page tables (05:01)

  1. Guest page table
    • Hypervisor records where it is, etc
      • Privileged instruction sets base ptr
    • Translates virtual to physical
  2. Hypervisor page table?
    • Translates physical to machine
    • Might not really be a page table
    • We do need to know this mapping, though
    • Need to have a mapping
      • Where would you store hypervisor?
      • What if you want to run multiple VMs?

But guest OS needs translation from virtual to machine directly

  3. Shadow page table
    • Virtual to machine translation
    • "Shadow" of guest page table
    • Hypervisor constructs this from guest page table and its own page table

  • Guest OS only knows about guest page table
  • Hardware only knows about shadow page table

Creating the shadow page table (07:44)

  • Simple
  • 2 page table lookups:
    • guest PT lookup
    • hypervisor ~PT lookup
  • Combine the two
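
A sketch of combining the two lookups into one shadow entry (invented names; the device case is the memory-mapped I/O "nit" covered below):

    /* Hypothetical construction of one shadow PTE: compose the guest's
       virtual->physical mapping with the hypervisor's physical->machine
       mapping. */
    bool make_shadow_pte(struct vm_state *vm, uint64_t virt, pte_t *out) {
        pte_t guest_pte;
        if (!guest_pt_lookup(vm, virt, &guest_pte))  /* lookup 1: guest PT    */
            return false;                            /* guest hasn't mapped it */

        uint64_t phys = pte_frame(guest_pte);
        if (is_emulated_device(phys)) {
            *out = INVALID_PTE;   /* force a fault so we can emulate the device */
            return true;
        }

        uint64_t machine = hypervisor_phys_to_machine(vm, phys);  /* lookup 2 */

        /* keep the guest's permission bits, point at the real machine frame */
        *out = make_pte(machine, pte_permissions(guest_pte));
        return true;
    }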

When do we update the shadow page table?

  • Needs to be up to date
  • What do you have to do when you update the page table?
    • Flush the TLB (Translation Lookaside Buffer)
      • A privileged instruction!
    • Processor actually uses TLB to do address translation w/ normal PTs
      • Same problem
        • Synthesized from a page table
        • Gets out of date
        • Needs to be told when to update
      • So can use same strategy
        • Manage the Shadow PT the same way the hardware manages the TLB
        • Shadow page table is basically a "virtual TLB" (stored as a PT instead of a cache)
      • Usual strategy
        • Want addr translated
        • actually go through TLB
        • Fetch addr from PT if not in TLB (Fetch on demand)
          • Shadow PT: fetch on page fault
        • When PT is modified
          • OS invalidates TLB entry (or flushes them all)
          • This is the privileged instruction we'll take advantage of
      • Our strategy
        • Guest OS edits guest PT
          • triggers instruction to flush TLB
          • Hypervisor clears part of shadow page table (fill in later if page fault)
        • Guest page faults
          • Hypervisor fetches on demand if necessary (like how HW does this automatically)
          • Hypervisor does conversion from 2 PTs
  • More on shadow page table
    • Caches commonly used PTEs in translated form
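
A sketch of treating the shadow PT like a TLB, reusing the make_shadow_pte sketch from above (all names invented):

    /* Hypothetical "virtual TLB" handlers for the shadow page table. */

    /* Guest OS edited its page table and executed its (privileged)
       flush-TLB instruction: discard shadow entries, refill lazily. */
    void on_guest_tlb_flush(struct vm_state *vm) {
        clear_shadow_page_table(vm);   /* fill back in later, on page faults */
        vm->pc += 4;                   /* the emulated flush is done */
    }

    /* Shadow-PT miss on a guest page fault: like hardware filling the TLB
       on demand from the page table. */
    void on_shadow_pt_miss(struct vm_state *vm, uint64_t virt) {
        pte_t pte;
        if (!make_shadow_pte(vm, virt, &pte))
            reflect_into_guest(vm, CAUSE_PAGE_FAULT);  /* guest OS's page fault */
        else
            install_shadow_pte(vm, virt, pte);         /* fetched on demand */
    }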

nit: memory-mapped I/O (00:45)

  • If physical addr is for fake I/O device
    • Make shadow PTE invalid
    • We want page fault for these to emulate access to the device

Page tables and kernel mode? (02:41)

  • remember: Can mark pages as kernel-only
  • Hypervisor needs to make this work
  • If guest OS in pretend kernel mode
    • Shadow PTE: marked as user-mode accessible (the guest really is in user mode)
  • If guest OS in pretend user mode
    • Shadow PTE: marked inaccessible

One solution: Two shadow page tables

  • One for pretend user mode
  • One for pretend kernel mode
  • Switch between them on exceptions etc
  • Higher cost of emulation (switching page tables every time)
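
A sketch of that two-shadow-PT solution, assuming the vm_state sketch from earlier grew hypothetical shadow_pt_user / shadow_pt_kernel fields:

    /* Hypothetical switch between the two shadow page tables whenever the
       guest's pretend mode changes (reflecting an exception, emulating a
       return-from-exception, ...). */
    void switch_shadow_page_table(struct vm_state *vm, bool kernel) {
        uint64_t base = kernel ? vm->shadow_pt_kernel  /* kernel-only pages usable  */
                               : vm->shadow_pt_user;   /* kernel-only pages blocked */

        /* the hypervisor really is in kernel mode, so it may do this */
        set_hardware_page_table_base(base);
    }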

Alternate solution: clear PT on kernel/user switch

  • Also not great for overhead

Exercise (06:52)

Guest PT switches

  1. Switch to another program
  2. Switch back to original program

Shadow PT switches

  1. read(): To kernel mode (assume it doesn't switch back yet)
  2. switch(): progA to progB
  3. switch(): Back to user mode
  4. keyboard(): To kernel mode (assume it doesn't switch back yet)
  5. switch(): progB to progA
  6. switch(): Back to user mode

  7. 32-bit x86: Every time you change the PT ptr or invalidate the TLB, you have to flush the whole TLB

    • So, all this gets really expensive

Tagged TLBs (01:39)

  • HW sometimes includes "address space ID" in TLB entries
    • kind of like a process ID
  • Helpful for normal OSes
    • faster context switching
  • Super useful for hypervisor
    • Lots of switches
    • less expensive per switch
  • not used by modern OSes until recently

Proactively filling page tables (06:52)

Problem with filling on demand

  • Many OSes invalidate entire TLB on context switch
    • Especially without tagged TLBs
  • So, rebuild shadow PT on every OS context switch
    • Often unacceptably slow
    • Want to cache shadow page tables
    • problem: the guest OS won't tell you when it modifies those page tables

Solution: Shadow page table for multiple processes

  • Actually 2 for each one (kernel and user)
  • Problem: What if guest OS modifies another process' page table (e.g. fork, copy on write, evicting pages)
    • Guest OS thinks TLB only stores current process
      • So won't tell the hypervisor of these changes
  • Solution: Trap and emulate
    • Track what physical pages are part of PTs the guest OS knows about
    • Mark them as read-only in shadow PTs
    • When guest OS tries to modify, triggers protection fault
    • Sounds really expensive, but cheaper than updating shadow PT every time
    • Real VM monitors do this
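
A sketch of that trap-and-emulate path (invented helpers): guest page-table pages are mapped read-only in the shadow PTs, so this handler runs whenever the guest OS writes one of them.

    /* Hypothetical handler for a write fault on a page the hypervisor
       knows holds one of the guest's page tables. */
    void on_guest_pt_write(struct vm_state *vm, uint64_t fault_addr) {
        /* 1. do the store the guest OS was trying to do */
        perform_guest_store(vm, fault_addr);

        /* 2. fix up the cached shadow PTEs (for whichever guest process that
              page table belongs to) that depended on the old entry */
        update_shadow_ptes_for(vm, fault_addr);

        /* 3. resume the guest after the emulated store */
        vm->pc += 4;
    }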

Pros and cons (02:34)

Proactive (trap and emulate) over on-demand

  • pro: works with guest OSes that make assumptions about TLB size
  • pro: maintain shadow PT for each guest process (avoids rebuilding on every context switch)
  • pro: better fit with tagged TLBs
  • con: more instructions spent doing copy-on-write
  • con: What happens when PT memory recycled?
  • con: Super complicated

Hardware Hypervisor support (not on exam) (02:13)

  • Common in modern processors
  • Benefits:
    • Lets processor be in kernel mode but still switch to VM monitor when something important happens
      • Processor will track User/Kernel mode PTEs for you (don't need separate shadow PT's for these)
    • Lets you configure when to run hypervisor
      • HW can run guest handlers, and specify which things go to hypervisor
    • Nested page table support
      • HW will do 2 page table lookups itself (to help with Virtual Machines)
      • Even though this is super complicated in the HW
  • Why VMs are much faster today