Distributed Systems 3: Failure / Network filesystem

Intro / Flashback (04:10)

TPC: Normal operation (03:21)

TPC: Some failure cases (00:57)

Worker fails after C prepare (01:54)

Problem

  • C sent Prepare, then worker failed
  • C doesn't know if W got Prepare or not

Three strategies

  1. C resends Prepare after a timeout
  2. C aborts
  3. W resends Agree-to-commit when it comes back up, since it logged that it agreed to commit (not required, but can reduce how often you abort)
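
The first two strategies can be sketched as a coordinator loop that resends Prepare a bounded number of times and aborts if a worker never answers. This is a minimal sketch, not the course's implementation: FlakyWorker, collect_votes, and max_attempts are hypothetical names, a raised TimeoutError stands in for a timeout, and a Python list stands in for the worker's durable log.

```python
import enum


class Vote(enum.Enum):
    AGREE = "agree-to-commit"
    REFUSE = "refuse"


class FlakyWorker:
    """Hypothetical worker that fails to answer the first `drops` Prepares."""

    def __init__(self, drops, vote=Vote.AGREE):
        self.drops = drops
        self.vote = vote
        self.log = []  # stands in for a durable log

    def prepare(self, txn_id):
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("no reply to Prepare")
        # Log the vote before replying so it survives a crash (strategy 3).
        entry = ("voted", txn_id, self.vote)
        if entry not in self.log:
            self.log.append(entry)
        return self.vote


def collect_votes(workers, txn_id, max_attempts=3):
    """Return "commit" only if every worker agrees; otherwise "abort"."""
    for worker in workers:
        for _attempt in range(max_attempts):
            try:
                vote = worker.prepare(txn_id)
            except TimeoutError:
                continue  # strategy 1: resend Prepare after a timeout
            if vote is not Vote.AGREE:
                return "abort"
            break  # this worker agreed; move on to the next one
        else:
            return "abort"  # strategy 2: no answer after all attempts
    return "commit"
```

Aborting is still safe at this point because C has not announced an outcome to anyone.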

Network failure after C prepare (same strategies) (01:02)

Worker failure after C commit (02:36)

Problem

  • C sent commit
  • Worker/network fails
  • Need to get C to resend commit message (can't abort now)

Strategies

  1. W resends agree-to-commit vote
  2. C resends the outcome somehow? (But how would it know it needs to?)

C resending outcomes

  • Have workers acknowledge the Commit message
    • In the assignment: the acknowledgment is whether or not the call returned
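
One way to picture acknowledged Commits is a coordinator that keeps resending Commit to every worker that has not yet acknowledged it. This is a rough sketch under stated assumptions: Worker, send_commit, lost_acks, and max_rounds are hypothetical names, a raised TimeoutError models a lost acknowledgment, and a call that returns normally counts as the acknowledgment.

```python
class Worker:
    """Hypothetical worker whose acknowledgments can be lost."""

    def __init__(self, lost_acks=0):
        self.lost_acks = lost_acks
        self.committed = set()
        self.apply_count = 0

    def commit(self, txn_id):
        # C may resend Commit, so applying it must be idempotent.
        if txn_id not in self.committed:
            self.committed.add(txn_id)
            self.apply_count += 1
        if self.lost_acks > 0:
            self.lost_acks -= 1
            raise TimeoutError("acknowledgment lost")
        return True  # the call returning is the acknowledgment


def send_commit(workers, txn_id, max_rounds=10):
    """Resend Commit to unacknowledged workers; the outcome never changes."""
    pending = list(workers)
    for _round in range(max_rounds):
        still_pending = []
        for worker in pending:
            try:
                worker.commit(txn_id)
            except TimeoutError:
                still_pending.append(worker)  # no acknowledgment: resend
        pending = still_pending
        if not pending:
            return True  # every worker acknowledged
    return False  # not everyone acknowledged yet; retry later, still "commit"
```

Because the worker records which transactions it has already committed, a resent Commit is applied only once even when the first acknowledgment was lost.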

Timeouts (01:22)

2-phase assignment (01:36)

Assignment: RPC (01:04)

Assignment: Failure recovery (00:32)

Assignment: Failure types (02:10)

Reordering (05:38)

Extending voting (00:40)

Quorums (majority) (03:18)

Different size quorums for different operations (02:07)

Tricky details (01:31)

Raft (02:10)

Quorums for Byzantine failures (01:53)

Network filesystems (00:51)

Problems (00:22)

What's the server storing? (00:55)

What if the client crashes? (00:53)

What if the server crashes? (00:33)

NFSv2 (00:39)

NFSv2 RPC calls (01:52)

NFSv2 client vs server (01:09)

Generation numbers (01:34)

Stateless protocol (00:45)

Reading directories: Offset "cookie" (01:54)

Statelessness vs Statefulness (01:26)

Performance (01:08)

Updating cached copies while staying consistent (02:15)

NFSv3's solution: allow inconsistency (04:56)

Open to close consistency model (01:20)

An alternative model (00:48)

AFSv2: a stateful server (01:46)

Last writer wins (00:57)

AFS caching: Callbacks (01:23)

Callback inconsistency (01:59)

NFS: working by blocks like normal (01:37)

Protection vs security (01:36)

Adversaries (01:06)

Authorization vs Authentication (00:59)

Authentication (01:48)
