Unit III: Consensus Algorithm:
Introducing the consensus problem, Analysis and design, Classification, Algorithms: CFT algorithms, BFT algorithms, Choosing an algorithm
Contents
Que. 1 Explain Consensus Algorithm.
Consensus Algorithms:
- It is a procedure that allows each peer of the blockchain network to set a shared agreement about the state of the decentralized ledger.
- In other words, it is a protocol using which all the nodes in the blockchain network come to a common consensus (agreement) on the current data state in the ledger and can trust unknown peers in the network.
- It is designed to acquire reliability in a network that consists of nodes or multiple users.
- Some of the common objectives of the consensus mechanism for blockchain are collaboration, equal rights to each node, coming to an agreement, cooperation, and mandatory participation of every node.
Working of Consensus Algorithm:
- Consensus algorithms are vital in large-scale, fault-tolerant systems because they enable a set of distributed/replicated machines or servers to work as a coherent group and agree on system state, even in the presence of failures or outages.
- To achieve this, the algorithm sets a threshold, or the number of member machines that must reach consensus or agreement.
- As they solve a consensus problem, consensus algorithms assume some processes and systems will be unavailable and that only a portion of the nodes will respond.
- They also assume some communications will be lost during transmission. However, a response is required from the available nodes.
Types:
- Proof of Work (PoW)
- Practical Byzantine Fault Tolerance (PBFT)
- Proof of Stake (PoS)
- Proof of Burn (PoB)
- Proof of Capacity
- Proof of Elapsed Time
Application of consensus algorithm:
- Consensus algorithms synchronize state machine replicas and ensure consistency among them.
- The most basic application of the algorithm is to decide whether a transaction in a distributed environment needs to be implemented or not
- Google’s PageRank,
- load balancing,
- smart power grids,
- clock synchronization and
- control of unmanned aerial vehicles like drones.
Also Read: Unit 2: Basic Cryptographic primitives
Que. 2 State and explain the Byzantine general’s problem.
- The problem of reaching agreement in the presence of faults or Byzantine consensus was first formulated by M. Pease, R. Shostak, and L. Lamport.
- In distributed systems, a common goal is to achieve consensus (agreement) among nodes on the network even in the presence of faults.
- In order to explain the problem, Lamport came up with an allegorical representation of the problem and named it the Byzantine general’s problem.
- The Byzantine general’s problem metaphorically depicts a situation where a Byzantine army, divided into different units, is spread around a city.
- A general commands each unit, and they can only communicate with each other using a messenger.
- To be successful, the generals must coordinate their plan and decide whether to attack or retreat.
- The problem, however, is that any generals could potentially be disloyal and act maliciously to obstruct agreement upon a united plan.
- The requirement now becomes that every honest general must somehow agree on the same decision even in the presence of treacherous generals.
- In order to address this issue, honest (loyal) generals must reach a majority agreement on their plan.
- In the digital world, generals are represented by computers (nodes) and communication links are messengers carrying messages. Disloyal generals are faulty nodes.
Que. 3 Fault tolerance and State machine replication
Fault tolerance:
- A fundamental requirement in a consensus mechanism is that it must be fault-tolerant.
- In other words, it must be able to tolerate a number of failures in a network and should continue to work even in the presence of faults.
- This naturally means that there has to be some limit to the number of faults a network can handle, since no network can operate correctly if a large majority of its nodes are failing.
- Based on the requirement of fault tolerance, consensus algorithms are also called fault-tolerant algorithms, and there are two types of fault-tolerant algorithms.
Types of fault-tolerant consensus:
- Fault-tolerant algorithms can be divided into two types of fault-tolerance.
- The first is Crash fault-tolerance (CFT) and the other is Byzantine fault-tolerance (BFT).
- CFT covers only crash faults or, in other words, benign faults. In contrast, BFT deals with the type of faults that are arbitrary and can even be malicious.
- Replication is a standard approach to make a system fault-tolerant. Replication results in a synchronized copy of data across all nodes in a network. This technique improves the fault tolerance and availability of the network. This means that even if some of the nodes become faulty, the overall system/network remains available due to the data being available on multiple nodes.
- There are two main types of replication techniques:
- Active replication, which is a type where each replica becomes a copy of the original state machine replica.
- Passive replication, which is a type where there is only a single copy of the state machine in the system kept by the primary node, and the rest of the nodes/replicas only maintain the state.
- In the context of fault-tolerant consensus mechanisms, replication plays a vital role by introducing resiliency into the system.
State machine replication (SMR);
- State machine replication (SMR) is a de facto technique that is used to provide deterministic replication services in order to achieve fault tolerance in a distributed system.
- State machine replication was first proposed by Lamport in 1978 in his paper:
- At an abstract level, it is a mathematical model that is used to describe a machine that can be in different states.
- It is important to understand that a state machine can only have one state at a time.
- A state machine stores a state of the system and transitions it to the next state as a result of input received.
- As a result of state transition, an output is produced along with an updated state.
- The fundamental idea behind SMR can be summarized as follows:
- All servers always start with the same initial state.
- All servers receive requests in a totally ordered fashion (sequenced as generated from clients).
- All servers produce the same deterministic output for the same input.
- State machine replication is implemented under a primary / backup paradigm, where a primary node is responsible for receiving and broadcasting client requests.
- This broadcast mechanism is called total order broadcast or atomic broadcast, which ensures that backup or replica nodes receive and execute the same request in the same sequence as the primary.
- Consequently this mean that all replicas will eventually have the same state as the primary thus resulting in achieving consensus
Que. 4 Explain Analysis and Design
a. Model
b. Processes
c. Timing assumptions
d. Synchrony
e. Asynchrony
f. Partial synchrony
Model:
- Distributed computing systems represent different entities in the system under a computational model.
- This computational model is a beneficial way of describing the system under some syst assumptions.
- A computational model represents processes, network conditions, timing assumptions, and how all these entities interact and work together.
- We will now look at this model in detail and introduce all objects one by one
Processes
- Processes communicate with each other by passing messages to each other.
- This is why these systems are called message-passing distributed systems. There is another class, called shared memory, which we will not discuss here as we are only dealing with message-passing systems
Timing assumptions
There are also some timing assumptions that are made when designing consensus algorithms.
Synchrony
- In synchronous systems, there is a known upper bound on the communication and processor delays. Synchronous algorithms are designed to be run on synchronous networks.
- At a fundamental level, in a synchronous system, a message sent by a processor to another is received by the receiver in the same communication round as it is sent.
Asynchrony
- In asynchronous systems, there is no upper bound on the communication and processor delays.
- In other words, it is impossible to define an upper bound for communication and processor delays in asynchronous systems.
- Asynchronous algorithms are designed to run on asynchronous networks without any timing assumptions.
- These systems are characterized by the unpredictability of message transfer (communication) delays and processing delays.
- This scenario is common in large-scale geographically dispersed distributed systems and systems where the input load is unpredictable
Partial synchrony
- In asynchronous systems, there is no upper bound on the communication and processor delays.
- In other words, it is impossible to define an upper bound for communication and processor delays in asynchronous systems.
- Asynchronous algorithms are designed to run on asynchronous networks without any timing assumptions.
- These systems are characterized by the unpredictability of message transfer (communication) delays and processing delays.
- This scenario is common in large-scale geographically dispersed distributed systems and systems where the input load is unpredictable
Que. 5 Explain the term “State Machine Replication”
State Machine Replication
- It is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas.
- It is also called fundamental approach in distributed computing for building fault tolerant systems.
- In a client-server setting, the servers simulate a state machine in a fault tolerant manner. Clients send commands as if there is a single state machine.
- Maintaining a single server is prone to crashes or Byzantine faults.
- Thus, instead of maintaining a single server, an SMR system uses multiple server replicas, some of which can be faulty.
- The group of servers presents the same interface as that of a single server to the client.
- The server replicas all initially start with the same state.
- However, when they receive concurrent requests from a client, honest replicas first need to agree on the sequence of client commands received by them.
- This problem is called log replication, and it is a multi-shot version of consensus.
Que. 6 Explain Fault Tolerance.
Fault tolerance:
- A fundamental requirement in a consensus mechanism is that it must be fault-tolerant.
- In other words, it must be able to tolerate a number of failures in a network and should continue to work even in the presence of faults.
- This naturally means that there has to be some limit to the number of faults a network can handle, since no network can operate correctly if a large majority of its nodes are failing.
- Based on the requirement of fault tolerance, consensus algorithms are also called faulttolerant algorithms, and there are two types of fault-tolerant algorithms.