

FLUSH2 design
=============

Author: Bela Ban

Prerequisites:
- A flush is always started and stopped by the *same* member, both for joins and state transfer
- For state transfers (but not for joins), we can have different members start a flush at the same time


Structures:
- FlushState
  - flush_active: whether we are currently running a flush
  - flush_leaders: set of members which started the flush (no duplicates)
  - start_flush_set: list of members from whom we expect a START-FLUSH-OK message
  - digests: digests from all members, received with START-FLUSH-OK message
  - stop_flush_set: list of members to whom we send a STOP-FLUSH message
  - down_mutex: blocks multicast down calls; unicasts are passed at all times
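
The structure above could be held in a class along the following lines. This is only a sketch: the class name FlushStateSketch, the use of plain strings for member addresses, and the digest shape (sender mapped to its highest seqno) are assumptions made for illustration, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of FlushState; members are plain strings here (assumption).
final class FlushStateSketch {
    boolean flushActive;                                       // flush_active
    final Set<String> flushLeaders = new LinkedHashSet<>();    // flush_leaders: a set, so no duplicates
    final List<String> startFlushSet = new ArrayList<>();      // members we still expect START-FLUSH-OK from
    final Map<String, Map<String, Long>> digests = new HashMap<>(); // digest received from each member
    final List<String> stopFlushSet = new ArrayList<>();       // members that are sent STOP-FLUSH
    final Object downMutex = new Object();                     // down_mutex: monitor for multicast senders
    boolean multicastsBlocked;                                 // true while multicast down calls must wait

    /** Records a flush leader; returns false if the leader was already present. */
    boolean addLeader(String leader) {
        return flushLeaders.add(leader);
    }
}
```

The boolean returned by addLeader() is what the SUSPEND handling can use to decide whether a START-FLUSH still has to be sent.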



On SUSPEND event:
- If FlushState.flush_active:
    - Add flush leader to flush_leaders (if not yet present)
    - If already present: terminate (don't send START-FLUSH message)
- Else
    - Set FlushState.flush_active to true
    - Add flush leader to flush_leaders (if not yet present)
    - Set start_flush_set and stop_flush_set
- Multicast START-FLUSH
- Block until start_flush_set is empty (block on start_flush_set)
- Look at FlushState.digests:
    - If all digests are the same --> return
    - Else --> run reconciliation phase
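
The last step of the SUSPEND handling, deciding whether reconciliation is needed, can be sketched as below. The class DigestCheck is hypothetical, and a digest is simplified to a map from sender to highest seqno, which is an assumption about its shape.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;

// Hypothetical helper; a digest is modelled as sender -> highest seqno (assumption).
final class DigestCheck {
    /** Returns true if at least two of the collected digests differ. */
    static boolean needsReconciliation(Collection<Map<String, Long>> digests) {
        // Map.equals() compares contents, so identical digests collapse into one set entry.
        return new HashSet<>(digests).size() > 1;
    }
}
```

If the method returns false, the SUSPEND handling can return immediately; otherwise the reconciliation phase runs.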

On START-FLUSH(sender=P):
- If FlushState.flush_active:
    - Add flush leader to flush_leaders (if not yet present)
- Else
    - Set FlushState.flush_active to true
    - Add flush leader to flush_leaders (if not yet present)
- Call block()
- Make down_mutex block multicast messages from now on
- Return START-FLUSH-OK with my digest to P


On START-FLUSH-OK(sender=P,digest D):
- Add digest D to FlushState.digests
- Remove P from FlushState.start_flush_set
- Notify FlushState.start_flush_set if empty
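
The hand-off between the thread blocked in SUSPEND and the START-FLUSH-OK handler might look like the sketch below. The class StartFlushCollector and its method names are hypothetical, and the timeout is an assumption added so that the waiting thread cannot block forever; the design above does not mention one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; members are strings and a digest is sender -> highest seqno (assumptions).
final class StartFlushCollector {
    private final Set<String> startFlushSet = new HashSet<>();
    private final Map<String, Map<String, Long>> digests = new HashMap<>();

    StartFlushCollector(Set<String> expected) {
        startFlushSet.addAll(expected);
    }

    /** START-FLUSH-OK handler: store the digest, remove the sender, wake the waiter when done. */
    synchronized void onStartFlushOk(String sender, Map<String, Long> digest) {
        digests.put(sender, digest);
        startFlushSet.remove(sender);
        if (startFlushSet.isEmpty())
            notifyAll();
    }

    /** Called by the SUSPEND thread; returns true if all replies arrived within timeoutMs. */
    synchronized boolean awaitAllReplies(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (!startFlushSet.isEmpty()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0)
                    return false;
                wait(remaining); // loop guards against spurious wakeups
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    synchronized int digestCount() {
        return digests.size();
    }
}
```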

On RESUME event:
- Multicast STOP-FLUSH message (asynchronously, don't wait for results)


On STOP-FLUSH(sender=P):
- Remove P from flush_leaders
- If flush_leaders is empty:
    - Unblock down_mutex
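
How START-FLUSH and STOP-FLUSH interact with down_mutex can be sketched as a small gate. MulticastGate is a hypothetical name, and a plain Java monitor stands in for down_mutex; multicasts stay blocked until the last flush leader has sent its STOP-FLUSH.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the down_mutex behaviour; members are strings (assumption).
final class MulticastGate {
    private final Set<String> flushLeaders = new HashSet<>();
    private boolean blocked;

    /** START-FLUSH(sender=P): remember the leader and block multicasts from now on. */
    synchronized void onStartFlush(String leader) {
        flushLeaders.add(leader);
        blocked = true;
    }

    /** STOP-FLUSH(sender=P): forget the leader; unblock once no leader is left. */
    synchronized void onStopFlush(String leader) {
        flushLeaders.remove(leader);
        if (flushLeaders.isEmpty()) {
            blocked = false;
            notifyAll();
        }
    }

    synchronized boolean isBlocked() {
        return blocked;
    }

    /** Multicast senders wait here while a flush is running; unicasts would bypass this call. */
    synchronized void awaitUnblocked() throws InterruptedException {
        while (blocked)
            wait();
    }
}
```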


On SUSPECT(S):
- Remove S from start_flush_set and stop_flush_set
- If start_flush_set is now empty --> notify thread blocking on it
- Remove S from FlushState.flush_leaders
- If FlushState.flush_leaders is empty: (flush leader S crashed during flush !)
    - Unblock down_mutex


On view change V:
- Remove all members that are not part of V from start_flush_set and stop_flush_set
- If start_flush_set is now empty --> notify thread blocking on it
- Remove all members from FlushState.flush_leaders which are not part of V
- If FlushState.flush_leaders is empty: (flush leader crashed during flush !)
    - Unblock down_mutex
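
Both the SUSPECT and the view change handling come down to pruning the flush sets and checking whether they became empty. A minimal sketch, with the hypothetical helper ViewChangePruning and members modelled as strings:

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical helper; the caller decides what an empty set means for it.
final class ViewChangePruning {
    /** Removes every member that is not in the new view; returns true if the set is now empty. */
    static boolean pruneToView(Set<String> set, Collection<String> view) {
        set.retainAll(view);
        return set.isEmpty();
    }
}
```

For start_flush_set, a result of true means the blocked thread should be notified; for flush_leaders, it means down_mutex should be unblocked. A SUSPECT(S) can be treated the same way, using the current membership without S as the view.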



Reconciliation phase (in NAKACK):
- Sends XMIT requests for messages not in the local digest, but in the digest received from REBROADCAST_MSGS
- Waits until the local digest (updated when missing messages are received) is the same as the digest received from REBROADCAST_MSGS
- Takes suspects into account
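
Working out which messages to request could look like the sketch below. It is not the NAKACK code: XmitPlanner is a hypothetical name, a digest is simplified to a map from sender to the highest seqno received, and seqnos are assumed to start at 1.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; digest = sender -> highest seqno received (assumption).
final class XmitPlanner {
    /** For each sender, the inclusive seqno range {from, to} that is missing locally. */
    static Map<String, long[]> missingRanges(Map<String, Long> local, Map<String, Long> target) {
        Map<String, long[]> missing = new TreeMap<>();
        for (Map.Entry<String, Long> e : target.entrySet()) {
            long have = local.getOrDefault(e.getKey(), 0L); // nothing received from this sender yet
            long want = e.getValue();
            if (want > have)
                missing.put(e.getKey(), new long[] {have + 1, want});
        }
        return missing;
    }
}
```

Reconciliation is complete once missingRanges() returns an empty map for the updated local digest. Senders that have been suspected would have to be dropped from the target digest first.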




Concurrent flushes
------------------

- Concurrent flushes to overlapping member sets will cause at most 1 flush to succeed, and all others to fail

- A failed flush sets the flushInProgress flag back to false on all members on which it successfully set it

- Algorithm:
  - Send a START-FLUSH to all members in the target set
  - Every member sets the flushInProgress flag to true
    - If this succeeds, the member sends back an OK
    - Else a FAIL is sent back
  - If we received OKs from all members, startFlush() succeeded and returns true
  - If 1 FAIL is received:
    - Send an ABORT-FLUSH to all members which took part in the flush; this causes flushInProgress to be set back to false
    - startFlush() failed and returns false
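
The algorithm can be illustrated with an in-process simulation in which messages are replaced by direct method calls. The names ConcurrentFlushSketch and Member are hypothetical, and using compare-and-set for the flag is an assumption. The abort is sent only to members that answered OK, so a competing flush that holds the flag elsewhere is left untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical in-process simulation; real members would exchange messages.
final class ConcurrentFlushSketch {
    static final class Member {
        final String name;
        final AtomicBoolean flushInProgress = new AtomicBoolean(false);

        Member(String name) {
            this.name = name;
        }

        /** START-FLUSH: true stands for OK, false for FAIL. */
        boolean onStartFlush() {
            return flushInProgress.compareAndSet(false, true);
        }

        /** ABORT-FLUSH: give the flag back. */
        void onAbortFlush() {
            flushInProgress.set(false);
        }
    }

    /** Returns true if every target answered OK; otherwise rolls back and returns false. */
    static boolean startFlush(List<Member> targets) {
        List<Member> acquired = new ArrayList<>();
        boolean failed = false;
        for (Member m : targets) {
            if (m.onStartFlush())
                acquired.add(m);
            else
                failed = true;
        }
        if (!failed)
            return true;
        for (Member m : acquired)   // abort only where this flush set the flag
            m.onAbortFlush();
        return false;
    }
}
```

With members A, B, C and D, a flush on {A,B,C} followed by a flush on {C,D} lets the first succeed; the second fails on C and releases the flag it had set on D.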


Concurrent total flushing (members={A,B,C,D})
---------------------------------------------

- In general, concurrent total flushes are not allowed

- Concurrent flushes on the same member:
  - The first thread to call Channel.startFlush() runs the flush phase
  - Subsequent threads to call Channel.startFlush() block until the current flush phase has completed
    (Channel.stopFlush())
    
- Concurrent flushes on different members:
  - A.startFlush() and B.startFlush() are called concurrently
  - A sends a START-FLUSH to all members, and so does B
  - When a START-FLUSH is received, a flag 'flushing-in-progress' is set
  - If flushing-in-progress cannot be set (because another flush protocol is running), we send back a
    START-FLUSH-FAIL. (We might add a small timeout to block until flushing-in-progress can be acquired)
  - On reception of a START-FLUSH-FAIL, we 'roll back' the flush, setting all of our acquired 'flushing-in-progress'
    flags back to false
  - SUMMARY: either A fails, or B fails, or both A and B fail in a concurrent flush phase (similar to concurrent
    transactions)
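
For the same-member case, serializing callers of startFlush() could be sketched with a single-permit semaphore. LocalFlushGuard is a hypothetical name, and the timeout parameter is an assumption; the text above only says that later threads block until the current flush phase has completed.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of per-member serialization of flushes.
final class LocalFlushGuard {
    private final Semaphore permit = new Semaphore(1);

    /** Blocks until no other local flush is running; returns false on timeout or interrupt. */
    boolean startFlush(long timeoutMs) {
        try {
            return permit.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Ends the flush phase; must only be called after a successful startFlush(). */
    void stopFlush() {
        permit.release();
    }
}
```

A semaphore is used rather than a reentrant lock so that the thread calling stopFlush() does not have to be the one that called startFlush().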


Concurrent partial flushing (members={A,B,C,D})
-----------------------------------------------

- In general, concurrent partial flushes are not allowed (same as for concurrent total flushes)











