From Idea to Reality: The Power of Serverless for Modern Applications
Overview
Problem Statement
The API might return JSON like the following for 18 March 2023 at 12:30 pm:
"us-east-1-18-03-23" : {
"temperature" : 28.4,
"atm_pressure" : 110045,
"timestamp" : 1679122800
}
The Initial Approach
Since the data returned by the API is unstructured, and since we are unsure whether the API might return additional parameters in the future, we cannot impose a fixed, predefined schema. Let's explore non-relational databases for this scenario. DynamoDB, Amazon Web Services' offering for storing non-relational data, comes to mind. Let's picture how the data from the API could be stored in DynamoDB.
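As a rough sketch of that picture, the function below shapes one API reading into a DynamoDB-style item. The table's key names and the composite partition-key scheme are illustrative assumptions, not the article's actual schema:

```python
# Sketch: shaping one API reading into a DynamoDB item.
# Key names and the composite-key scheme are illustrative
# assumptions, not a schema from the article.

def to_dynamodb_item(region_date: str, reading: dict) -> dict:
    """Map a raw API reading onto a DynamoDB item.

    The partition key combines region and calendar date, mirroring
    the "us-east-1-18-03-23" key in the sample payload; the epoch
    timestamp works as a sort key so readings stay ordered in time.
    """
    return {
        "region_date": region_date,            # partition key
        "timestamp": reading["timestamp"],     # sort key
        "temperature": reading["temperature"],
        "atm_pressure": reading["atm_pressure"],
        # Any extra parameters the API starts returning can simply
        # become new attributes -- no schema migration needed.
        **{k: v for k, v in reading.items()
           if k not in ("timestamp", "temperature", "atm_pressure")},
    }

item = to_dynamodb_item(
    "us-east-1-18-03-23",
    {"temperature": 28.4, "atm_pressure": 110045, "timestamp": 1679122800},
)
```

Because DynamoDB items in the same table need not share attributes beyond the key, a new field in the API payload lands as just another attribute on new items.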
The Drawback
The Solution
Google Bigtable is a distributed, highly scalable, and NoSQL database system designed to handle large amounts of structured data. It is used by Google to power its own services such as Gmail, Google Search, and Google Analytics. Bigtable is built on top of Google’s proprietary distributed file system, known as Google File System (GFS), and provides a simple data model with sparse, distributed, and persistent multidimensional sorted maps. It allows for efficient reads and writes, automatic sharding, and replication across multiple data centers, making it suitable for storing and processing massive amounts of data in real-time applications.
What am I compromising on by migrating from DynamoDB to Bigtable?
Bigtable's architecture is centralised. A table is composed of rows, each describing a single entity, and columns, which hold individual values for each row. Each row is indexed by a single row key, and related columns are grouped into column families. Bigtable's data model is thus column-oriented, as opposed to DynamoDB's decentralised key-value data model.
Replication of data in Bigtable only happens within a single data centre, whereas DynamoDB can replicate data across multiple data centres.
DynamoDB partitions data with consistent hashing: every node in the system is assigned one or more points on a fixed circular space called the "ring". Each data item, identified by a key, is assigned to a node by hashing its key to a point on the ring and then walking the ring clockwise to the first node encountered. The main advantage of this technique is that adding or removing a node only affects its immediate neighbours, while all other nodes remain unaffected.

Partitioning in Bigtable, by contrast, is key-range based, with data ordered by row key. Contiguous row ranges are called tablets: each table consists of a set of tablets, and each tablet contains all data associated with one row range. Initially a table consists of just one tablet; as it grows, it is automatically split into multiple tablets.

The Bigtable implementation includes a single master and multiple tablet servers. The master is responsible for assigning tablets to tablet servers, whereas tablet servers handle read and write requests for the tablets they serve and split tablets that have grown too large. Since each cell belongs to a particular row, each row belongs to a particular tablet, and each tablet is assigned to exactly one tablet server at a time, finding the node that stores a specific cell is simple: query the special METADATA table, which stores the location of each tablet under a row key.
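The Dynamo-style ring described above can be sketched in a few lines. This is a minimal illustration, not DynamoDB's implementation: it places a single point per node, whereas real systems assign many virtual nodes per physical node to even out the load:

```python
# A minimal consistent-hashing ring, sketching Dynamo-style
# partitioning. One point per node for clarity; production systems
# use many virtual nodes per physical node.
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a key to a point on the ring (a 128-bit integer space)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Sorted list of (point, node) pairs -- the "ring".
        self._ring = sorted((_hash(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self._ring, (_hash(node), node))

    def remove_node(self, node):
        self._ring.remove((_hash(node), node))

    def node_for(self, key: str):
        """Walk clockwise from the key's point to the first node."""
        points = [p for p, _ in self._ring]
        i = bisect.bisect_right(points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("us-east-1-18-03-23")
```

Removing a node reassigns only the keys that fell in its arc of the ring (they move to its clockwise neighbour); every key owned by another node keeps its owner, which is exactly the "only immediate neighbours are affected" property.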
DynamoDB clients do not have to wait until their updates reach all the replicas, but in return they must deal with multiple object versions on reads. Bigtable clients, on the other hand, enjoy a consistent view of their data, but in return they must wait in the presence of system failures. In other words, DynamoDB sacrifices consistency while Bigtable sacrifices availability.
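To make "multiple object versions on reads" concrete, here is a toy sketch of a client reconciling divergent replica state. The replica layout and the last-write-wins merge policy are illustrative assumptions, not DynamoDB's actual protocol:

```python
# Sketch of reading from an eventually consistent store: replicas may
# disagree, and the client reconciles. Last-write-wins is only one
# possible (and lossy) reconciliation policy, shown for illustration.
def read_with_reconcile(replicas: dict, key: str):
    """Collect the versions each replica holds and reconcile them.

    Each replica maps key -> (version_counter, value). A strongly
    consistent store would never expose divergent versions; an
    eventually consistent one pushes this merge step onto the client.
    """
    versions = [r[key] for r in replicas.values() if key in r]
    # Client-side reconciliation: keep the write with the highest
    # version counter (tuples compare element-wise in Python).
    return max(versions)

replicas = {
    "replica-1": {"temp": (2, 28.4)},   # saw the latest update
    "replica-2": {"temp": (1, 27.9)},   # lagging behind
}
version, value = read_with_reconcile(replicas, "temp")
```

A Bigtable-style system would instead block or fail the read until the single authoritative tablet server is reachable, which is the availability cost named above.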
Dynamo's original design largely ignores security-related requirements, assuming a trusted environment, while Bigtable has an adequate authorisation mechanism. In Bigtable, access-control rights are granted at the column-family level. For example, it can be configured that three different applications access data from the same table, where the first application is only permitted to view personal data, the second can view and update personal data, and the third can view and update all users' data.
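The three-application example above can be modelled as a tiny column-family ACL. The application names, family names, and permission sets here are illustrative assumptions, not Bigtable's actual configuration format:

```python
# A toy column-family ACL, sketching Bigtable-style authorisation.
# App names, families, and grants are illustrative assumptions.
READ, WRITE = "read", "write"

# Per-application grants at column-family granularity, matching the
# three-application example: view-only, view+update on personal data,
# and view+update on everything.
acl = {
    "app-1": {"personal": {READ}},
    "app-2": {"personal": {READ, WRITE}},
    "app-3": {"personal": {READ, WRITE}, "usage": {READ, WRITE}},
}

def is_allowed(app: str, column_family: str, action: str) -> bool:
    """Check whether an application may act on a column family."""
    return action in acl.get(app, {}).get(column_family, set())
```

Because the grant hangs off the column family rather than the row or cell, adding a new column to an existing family requires no ACL change at all.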