Blockchains are basically distributed SQL databases
Introduction
Blockchains are more similar to centralized databases than you think. Fundamentally, a database is a storage system that processes I/O requests. The major difference is that a blockchain uses cryptography to restrict which data each user is allowed to update. At the networking layer, a blockchain runs a Byzantine fault tolerant (BFT) consensus protocol so that no centralized entity can violate the rules of the system.
Distributed database
Take MySQL, for example. MySQL can be deployed as a distributed relational database (for instance, a replicated setup or MySQL NDB Cluster) that runs on a cluster of machines, but all of those machines sit under one administrative domain. So even though it runs on multiple computers, it’s still centralized. In other words, a single (possibly malicious) database admin can go in and change whatever data they want, or wipe all of it, because they have unrestricted control over the system.
Blockchains prevent this by running a BFT consensus protocol for every update. No single entity can change the data at will, because an update takes effect only after a supermajority of network nodes agrees to it.
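To make “supermajority” concrete: a common rule in BFT protocols (PBFT, Tendermint and others) is that n nodes can tolerate at most f Byzantine nodes when n ≥ 3f + 1, and an update commits only once strictly more than two-thirds of the nodes vote for it. The Python sketch below assumes that rule with one vote per node. It computes both numbers and shows why one admin is not enough: any two quorums overlap in at least f + 1 nodes, so at least one honest node sits in both and will refuse to approve two conflicting updates.

```python
def max_faulty(n: int) -> int:
    """Largest number of Byzantine nodes f that n nodes tolerate, from n >= 3f + 1."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest vote count that is strictly more than two-thirds of n."""
    return 2 * n // 3 + 1


def update_commits(n: int, approvals: int) -> bool:
    """An update takes effect only if a supermajority of nodes approved it."""
    return approvals >= quorum_size(n)


def quorums_share_honest_node(n: int) -> bool:
    """Two quorums overlap in at least 2q - n nodes; with at most f faulty nodes,
    an overlap of f + 1 or more guarantees at least one honest node in common."""
    return 2 * quorum_size(n) - n >= max_faulty(n) + 1


for n in (4, 7, 10, 100):
    print(n, max_faulty(n), quorum_size(n))
# 4 1 3
# 7 2 5
# 10 3 7
# 100 33 67

print(update_commits(4, 1))  # False: one admin alone cannot push an update
```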
Row updates
In a MySQL database server, any query from a sufficiently privileged account can update any row. Queries usually come from some authenticated source (e.g. a backend server), but once a query reaches the database cluster, it runs to completion.
In a blockchain, every transaction request is signed with the sender’s private key. For the transaction to be considered valid, the signature must verify against the sender’s public key, from which the sender’s address is derived. If the signature fails verification, the blockchain rejects the transaction.
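The validity check above can be sketched in Python. Real blockchains use ECDSA over secp256k1 (Bitcoin, Ethereum) or Ed25519; because those need a third-party library, this sketch swaps in a Lamport one-time signature built only on SHA-256, which is a genuine but impractical signature scheme (each key may safely sign only one message). The address format (first 20 bytes of a hash over the public key) and the transaction dict layout are hypothetical, chosen only to mirror the idea that an address is derived from a public key.

```python
import hashlib
import secrets

# Toy Lamport one-time signatures over SHA-256, standing in for ECDSA/Ed25519.
# A Lamport key must never sign more than one message.


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(h(a), h(b)) for a, b in sk]
    return sk, pk


def digest_bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    d = h(message)
    return [(d[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk, message: bytes):
    """Reveal one secret per pair, selected by the matching digest bit."""
    return [sk[i][bit] for i, bit in enumerate(digest_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    """Every revealed secret must hash to the public-key entry for its bit."""
    if len(signature) != 256:
        return False
    bits = digest_bits(message)
    return all(h(sig) == pk[i][bits[i]] for i, sig in enumerate(signature))


def address_of(pk) -> str:
    """Hypothetical address: first 20 bytes of a hash over the whole public key."""
    return h(b"".join(a + b for a, b in pk))[:20].hex()


def is_valid_transaction(tx: dict) -> bool:
    # 1. The supplied public key must hash to the claimed sender address.
    if address_of(tx["public_key"]) != tx["sender"]:
        return False
    # 2. The signature over the payload must verify against that public key.
    return verify(tx["public_key"], tx["payload"], tx["signature"])


sk, pk = keygen()
payload = b"transfer 10 to bob"
tx = {"sender": address_of(pk), "public_key": pk,
      "payload": payload, "signature": sign(sk, payload)}
tampered = dict(tx, payload=b"transfer 1000 to bob")

print(is_valid_transaction(tx))        # True
print(is_valid_transaction(tampered))  # False
```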
Another difference is that SQL queries are declarative: the client states what result it wants, and the database engine figures out the execution steps. Blockchain transactions, apart from simple value transfers and contract deployments, can only invoke smart contracts, which are predefined programs that mutate the data in predefined ways. These smart contracts are analogous to SQL stored procedures. In other words, a blockchain has no ad hoc, general-purpose transactions: every state change must go through a predefined smart contract (stored procedure).
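Here is a minimal Python sketch of the stored-procedure analogy, assuming a hypothetical TokenContract with two entry points (mint and transfer). The contract’s state can change only through those methods, and the hypothetical apply_transaction dispatcher rejects anything that does not name one of them, so there is no path for an ad hoc UPDATE.

```python
class TokenContract:
    """Hypothetical token contract. Its balances change only through the methods
    listed in ENTRY_POINTS, like a table written only by stored procedures."""

    ENTRY_POINTS = {"mint", "transfer"}

    def __init__(self, owner: str):
        self.owner = owner
        self.balances = {}  # account -> token balance

    def mint(self, sender: str, to: str, amount: int) -> None:
        if sender != self.owner:
            raise PermissionError("only the owner may mint")
        self.balances[to] = self.balances.get(to, 0) + amount

    def transfer(self, sender: str, to: str, amount: int) -> None:
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount


def apply_transaction(contract, sender: str, method: str, *args) -> None:
    """A transaction may only name a predefined entry point; there is no ad hoc query path."""
    if method not in contract.ENTRY_POINTS:
        raise ValueError(f"unknown entry point: {method!r}")
    getattr(contract, method)(sender, *args)


token = TokenContract(owner="alice")
apply_transaction(token, "alice", "mint", "alice", 100)
apply_transaction(token, "alice", "transfer", "bob", 30)
print(token.balances)  # {'alice': 70, 'bob': 30}
```

Something like apply_transaction(token, "bob", "set_balance", "bob", 10**6) raises ValueError, because set_balance is not a predefined entry point.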
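As an aside, both smart contract languages (e.g. Solidity) and SQL (through recursive common table expressions or procedural extensions) are Turing-complete. The practical difference is that blockchains cap every transaction with a gas limit so that it can never run forever, which is why blockchain virtual machines such as Ethereum’s are described as quasi-Turing-complete. A SQL query, by contrast, can keep running until it finishes or until a configured timeout, if one is set, elapses.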
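Gas metering can be illustrated with a toy interpreter. This is not the EVM: the instruction set (add, jnz, jmp, halt) and the flat cost of 1 gas per instruction are hypothetical, whereas the EVM prices each opcode individually. The point is that every executed step draws down a budget fixed before execution starts, so even an infinite loop halts.

```python
class OutOfGas(Exception):
    """Raised when a transaction exhausts its gas budget."""


GAS_PER_STEP = 1  # hypothetical flat cost per executed instruction


def run(program, gas_limit: int):
    """Run a toy program and return (accumulator, gas_used).

    Instructions: ("add", k) adds k to the accumulator, ("jnz", t) jumps to
    index t if the accumulator is non-zero, ("jmp", t) jumps unconditionally,
    and ("halt",) stops.
    """
    acc, pc, gas_left = 0, 0, gas_limit
    while True:
        if gas_left < GAS_PER_STEP:
            raise OutOfGas(f"out of gas at instruction {pc}")
        gas_left -= GAS_PER_STEP  # charge before executing the step
        op = program[pc]
        if op[0] == "halt":
            return acc, gas_limit - gas_left
        if op[0] == "add":
            acc += op[1]
            pc += 1
        elif op[0] == "jnz":
            pc = op[1] if acc != 0 else pc + 1
        elif op[0] == "jmp":
            pc = op[1]
        else:
            raise ValueError(f"unknown instruction {op[0]!r}")


countdown = [("add", 3), ("add", -1), ("jnz", 1), ("halt",)]
infinite = [("jmp", 0)]

print(run(countdown, gas_limit=100))  # (0, 8)
try:
    run(infinite, gas_limit=1000)
except OutOfGas as err:
    print(err)  # out of gas at instruction 0
```

On Ethereum, a transaction that runs out of gas is reverted, but the gas it consumed is still charged to the sender, which is what makes submitting infinite loops expensive.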
Putting it all together
Let’s start with a centralized, distributed MySQL cluster and add cryptographic signature verification to every query. Next, let’s take away general-purpose queries and replace them with stored procedures that anybody can upload (for a fee). Each stored procedure owns its own set of tables and cannot mutate other tables except by calling another stored procedure.
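The state model described above (procedures uploaded for a fee, each owning private tables, cross-procedure writes only through calls) can be sketched in Python. Chain, Context, DEPLOY_FEE and the two sample procedures are hypothetical names for illustration. Python cannot actually stop a procedure from reaching into another namespace, so this sketch enforces isolation only by convention, whereas a blockchain’s virtual machine enforces it.

```python
class Context:
    """What a running procedure sees: its own storage plus the ability to call others."""

    def __init__(self, chain, owner: str):
        self._chain = chain
        self.storage = chain.storage[owner]  # this procedure's own "tables" only

    def call(self, name: str, *args):
        # Another procedure's tables change only by running that procedure's code.
        return self._chain.call(name, *args)


class Chain:
    DEPLOY_FEE = 10  # hypothetical upload fee

    def __init__(self):
        self.procedures = {}  # procedure name -> function(ctx, *args)
        self.storage = {}     # procedure name -> dict that only it writes
        self.balances = {}    # account -> funds available to pay fees

    def deploy(self, deployer: str, name: str, fn) -> None:
        """Upload a new procedure for a fee; it gets an empty private namespace."""
        if name in self.procedures:
            raise ValueError(f"{name!r} is already deployed")
        if self.balances.get(deployer, 0) < self.DEPLOY_FEE:
            raise ValueError("insufficient funds for the deployment fee")
        self.balances[deployer] -= self.DEPLOY_FEE
        self.procedures[name] = fn
        self.storage[name] = {}

    def call(self, name: str, *args):
        """Run a procedure with a context bound to that procedure's storage."""
        return self.procedures[name](Context(self, name), *args)


def counter(ctx, amount):
    ctx.storage["count"] = ctx.storage.get("count", 0) + amount
    return ctx.storage["count"]


def doubler(ctx, amount):
    # doubler cannot write counter's storage directly; it has to call counter.
    ctx.storage["calls"] = ctx.storage.get("calls", 0) + 1
    return ctx.call("counter", 2 * amount)


chain = Chain()
chain.balances["alice"] = 100
chain.deploy("alice", "counter", counter)
chain.deploy("alice", "doubler", doubler)
print(chain.call("doubler", 5))  # 10
print(chain.storage)   # {'counter': {'count': 10}, 'doubler': {'calls': 1}}
print(chain.balances)  # {'alice': 80}
```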
At the networking layer, let’s allow anybody to operate a node in this networked MySQL cluster, provided they stake some money for the privilege. The cluster is multi-leader, so users can send a transaction request to any node. We run a BFT consensus protocol so that the nodes agree on which transactions are accepted and in what order.
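Staking changes how the supermajority is counted: votes are usually weighted by stake rather than by node count. A minimal Python sketch, assuming a hypothetical ValidatorSet, a hypothetical minimum stake (MIN_STAKE), and a rule that a decision needs strictly more than two-thirds of total stake:

```python
MIN_STAKE = 10  # hypothetical minimum stake, in arbitrary currency units


class ValidatorSet:
    """Hypothetical validator registry: anyone may join by staking at least MIN_STAKE."""

    def __init__(self):
        self.stakes = {}  # node id -> amount staked

    def join(self, node: str, stake: int) -> None:
        if stake < MIN_STAKE:
            raise ValueError(f"stake {stake} is below the minimum of {MIN_STAKE}")
        self.stakes[node] = self.stakes.get(node, 0) + stake

    def total_stake(self) -> int:
        return sum(self.stakes.values())

    def has_supermajority(self, voters) -> bool:
        """True if the voters hold strictly more than two-thirds of all stake."""
        backing = sum(self.stakes.get(v, 0) for v in set(voters))  # count each node once
        return 3 * backing > 2 * self.total_stake()  # integer math avoids rounding


vs = ValidatorSet()
for node, stake in [("a", 40), ("b", 40), ("c", 40), ("d", 80)]:
    vs.join(node, stake)
print(vs.has_supermajority(["a", "b", "c"]))  # False: 120 of 200
print(vs.has_supermajority(["a", "b", "d"]))  # True: 160 of 200
```

Weighting by stake means an attacker must control a large share of the staked money, not just spin up many nodes, to outvote everyone else.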
But wait: if users can send transaction requests to any node and anybody can run a node, the network fills up with individual transaction requests, which leads to lock contention and conflicting updates. As an optimization, let’s batch the transactions into groups called “blocks”. Now, whenever a node receives a transaction request, instead of committing it immediately, it stores the request in a pending pool called the “mempool” and gossips it to its peers, who store it in their own mempools.
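The mempool-and-gossip flow can be simulated in a single Python process. Node, receive and take_batch are hypothetical names; real clients gossip asynchronously over the network, while this sketch uses direct recursive calls. Each node deduplicates by transaction hash, which is what stops a transaction from bouncing between peers forever.

```python
import hashlib


class Node:
    """Hypothetical node: pending transactions wait in a mempool and are gossiped
    to peers rather than committed one by one."""

    def __init__(self, name: str):
        self.name = name
        self.peers = []
        self.mempool = {}  # tx hash -> raw tx; dicts keep insertion (arrival) order

    def receive(self, tx: bytes) -> None:
        tx_id = hashlib.sha256(tx).hexdigest()
        if tx_id in self.mempool:
            return  # already seen, so stop here; this is what ends the flood
        self.mempool[tx_id] = tx
        for peer in self.peers:
            peer.receive(tx)  # stands in for an asynchronous network message

    def take_batch(self, max_txs: int) -> list:
        """Remove up to max_txs of the oldest pending transactions to form a block."""
        ids = list(self.mempool)[:max_txs]
        return [self.mempool.pop(tx_id) for tx_id in ids]


a, b, c = Node("a"), Node("b"), Node("c")
a.peers, b.peers, c.peers = [b], [a, c], [b]  # line topology: a - b - c
a.receive(b"tx1")  # one user submits to node a
c.receive(b"tx2")  # another user submits to node c
print([len(n.mempool) for n in (a, b, c)])  # [2, 2, 2]

block = b.take_batch(max_txs=10)
print(block)  # [b'tx1', b'tx2']
```

In a real network, once a block is committed every node also drops that block’s transactions from its own mempool; this sketch only removes them from the node that built the batch.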
The nodes in the network take turns proposing blocks of transaction requests to their peers, who run the BFT consensus protocol to decide whether or not to accept that block. Congratulations, we just made a blockchain starting from a distributed MySQL cluster!
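A minimal Python sketch of that final step, assuming a hypothetical round-robin rotation (the proposer for height h is validators[h % n]) and a single round of yes/no votes with the strictly-more-than-two-thirds rule. Production BFT protocols such as Tendermint use several voting rounds and stake-weighted proposer rotation; this only shows the shape of “take turns proposing, then vote”.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    height: int
    proposer: str
    txs: list = field(default_factory=list)


def proposer_for(height: int, validators: list) -> str:
    """Hypothetical round-robin schedule: validators take turns by block height."""
    return validators[height % len(validators)]


def decide(block: Block, validators: list, votes: dict) -> bool:
    """Accept a block only if the scheduled proposer sent it and strictly more
    than two-thirds of the validators voted yes (one node, one vote here)."""
    if block.proposer != proposer_for(block.height, validators):
        return False
    yes = sum(1 for v in validators if votes.get(v) is True)
    return 3 * yes > 2 * len(validators)


validators = ["a", "b", "c", "d"]
ledger = []
for height in range(3):
    block = Block(height, proposer_for(height, validators), txs=[f"tx{height}"])
    votes = {v: True for v in validators if v != "d"}  # node "d" is offline
    if decide(block, validators, votes):
        ledger.append(block)
print([blk.proposer for blk in ledger])  # ['a', 'b', 'c']
```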