MEMO consensus mechanism: HotStuff, based on the PBFT protocol
In the MEMO decentralized storage network, Keepers periodically challenge Providers to ensure data security, and the Keepers then reach consensus on the result of each challenge. Since the initial release of Megrez in April this year and up to Megrez 2.5, MEMO has adopted the HotStuff consensus mechanism, which is based on the PBFT protocol.
PBFT was the first practical Byzantine fault-tolerant consensus protocol to be proposed; although its communication overhead is relatively high, its core logic is very solid.
Compared to PBFT, HotStuff is simpler and more efficient, with lower communication overhead. The key idea is that HotStuff replaces PBFT's mesh communication topology with a star topology: each round of communication goes through a leader Keeper, and the leader role rotates among all Keepers in the system.
HotStuff can also be seen as an improvement on SBFT and Paxos. Both have more complex roles and processes, especially Paxos. Paxos targets consistency problems in trusted distributed systems: it assumes nodes fail only by crashing, so it is not Byzantine fault-tolerant and is unsuitable for decentralized environments like blockchains. SBFT's addition of a collector role introduces centralization concerns and makes the protocol less concise.
The bottleneck in PBFT is the all-to-all broadcast in the prepare and commit phases. Each node must broadcast because the leader is not trusted: a node can only confirm that a phase has passed after receiving 2f+1 matching messages, which drives up the network communication complexity.
PBFT could, in principle, reduce communication by having each replica talk only to the leader instead of broadcasting. But because the leader is not trusted, declaring a proposal passed would require the leader to collect enough votes and then broadcast all of those votes to the other replicas, which is a heavy load for the leader. In the original PBFT, the load is shared equally among all n nodes; in the leader-centric variant, the leader's receiving load stays low but its sending load becomes high. Although the total load changes little, it concentrates on the leader and places much higher demands on that node. This is why PBFT chose the equal-sharing broadcast strategy.
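As a rough illustration of this trade-off, here is a sketch that counts only per-phase protocol messages (ignoring retransmissions and view changes; the function names are illustrative, not part of any MEMO API):

```python
# Rough per-phase message counts for n consensus nodes (illustrative only).

def pbft_broadcast_messages(n: int) -> int:
    # In PBFT's prepare/commit phases every node broadcasts to every
    # other node, so each phase costs n * (n - 1) messages.
    return n * (n - 1)

def star_messages(n: int) -> int:
    # In a star (leader-based) topology each replica exchanges one round
    # trip with the leader: n - 1 votes in, n - 1 broadcasts out.
    return 2 * (n - 1)

for n in (4, 16, 64):
    print(n, pbft_broadcast_messages(n), star_messages(n))
```

The quadratic-versus-linear gap is why concentrating communication on the leader is attractive, provided the leader's per-message burden can be kept small.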
HotStuff's main improvement lies in signature aggregation. To relieve the overloaded leader, the message size is minimized: since all nodes sign the same object, BLS signature aggregation can combine the n signatures into a single aggregate signature. This reduces the sending and receiving load of both the replicas and the leader, and thus the total complexity; the leader takes on only one extra task, aggregation, which carries little overhead.
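The essential property is that n signatures over the same object collapse into one constant-size proof that anyone can verify. The following toy sketch mimics that shape with modular arithmetic; it is NOT real cryptography and only stands in for BLS to show why the aggregate stays verifiable:

```python
# Toy stand-in for BLS aggregation (NOT cryptographically secure).
# Each node "signs" the same message hash; the n signatures are combined
# into one aggregate that verifies against the combined public keys.

P = 2**61 - 1  # a Mersenne prime used as a toy modulus

def toy_sign(secret_key: int, msg_hash: int) -> int:
    return (secret_key * msg_hash) % P

def toy_aggregate(sigs):
    agg = 0
    for s in sigs:
        agg = (agg + s) % P
    return agg

def toy_verify_aggregate(agg_sig: int, combined_keys: int, msg_hash: int) -> bool:
    # Holds because sum(sk_i * h) == (sum sk_i) * h (mod P), loosely
    # mirroring how BLS checks one pairing equation against the
    # aggregated public key.
    return agg_sig == (combined_keys * msg_hash) % P

msg_hash = 123456789
secret_keys = [11, 22, 33, 44]
sigs = [toy_sign(sk, msg_hash) for sk in secret_keys]
agg = toy_aggregate(sigs)
print(toy_verify_aggregate(agg, sum(secret_keys) % P, msg_hash))  # True
```

In a real deployment the leader ships this single aggregate (plus a bitmap of who voted) instead of n full signatures, which is what cuts the per-message size.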
The same applies to the view-change phase: complexity drops because the volume of messages shrinks and replicas communicate only with the leader rather than broadcasting among themselves (the aggregate signature is independently verifiable).
Enhanced Security and Reliability
Although aggregated signatures effectively improve consensus performance, the protocol still relies on an untrusted leader as the hub that relays messages and advances consensus. Unlike PBFT, HotStuff therefore needs three rounds of voting, followed by a decide step, to guarantee the safety and reliability of consensus.
1. prepare: The leader initiates the proposal, which is the start of consensus. Replicas reply with votes on the proposal, and the leader collects enough votes before moving to the next phase. This corresponds to PBFT's prepare phase.
2. precommit: The leader broadcasts an aggregate proof that the proposal passed the prepare phase, and each replica can verify that enough replicas agreed to the proposal. The replica then tells the leader that it has received the precommit message and is ready to proceed. At this point a replica that received the precommit message knows the proposal has passed, but it can only guarantee its own knowledge of this state, not that the other replicas share it.
3. commit: The leader broadcasts an aggregate proof that the proposal passed the precommit phase, and each replica can verify that enough replicas completed that phase and confirm the proposal has passed. The proposal is now in the pending-commit state for that replica, which corresponds to PBFT's commit phase. The replica then replies to the leader that it is ready to commit.
4. decide: The leader broadcasts an aggregate proof that the proposal passed the commit phase, and each replica can verify that enough replicas reached the commit state. The replica then executes the proposal and switches to the next view, which corresponds to PBFT's apply phase.
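The phase progression above can be sketched as a small state machine. This is a hypothetical structure for illustration only (real HotStuff implementations pipeline the phases across views); the class and method names are invented here:

```python
# Minimal sketch of HotStuff's phase progression for one view.
from dataclasses import dataclass

PHASES = ["prepare", "precommit", "commit", "decide"]

@dataclass
class View:
    n: int                 # total replicas
    f: int                 # tolerated Byzantine replicas
    phase_index: int = 0   # start in the prepare phase
    decided: bool = False

    def quorum(self) -> int:
        # 2f + 1 votes prove that a majority of honest nodes agree.
        return 2 * self.f + 1

    def advance(self, votes: int) -> str:
        """Leader collects votes; with a quorum, it broadcasts the
        aggregate proof and the view moves to the next phase."""
        if votes < self.quorum():
            return PHASES[self.phase_index]   # wait for more votes
        self.phase_index = min(self.phase_index + 1, len(PHASES) - 1)
        if PHASES[self.phase_index] == "decide":
            self.decided = True               # replicas execute the proposal
        return PHASES[self.phase_index]

view = View(n=4, f=1)
for _ in range(3):                 # prepare -> precommit -> commit -> decide
    phase = view.advance(votes=3)  # 2f + 1 = 3 votes each round
print(phase, view.decided)         # decide True
```

Each `advance` with a quorum corresponds to one leader broadcast of an aggregate proof, so the whole view costs a constant number of star-topology rounds rather than repeated all-to-all broadcasts.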