Thursday, November 16, 2017

Postgres JSONB storage capabilities

JSONB Features

  • JSONB data input is a little slower than with the plain json type, but processing is then significantly faster because the data does not need to be re-parsed
  • JSONB columns can be restricted by constraints (such as CHECK constraints) and validation functions
  • JSONB is an efficient representation with indexing support (for example, GIN indexes)
  • JSONB is efficient for storing and retrieving whole JSON documents, but modifying an individual field requires extracting the entire JSON document
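
To make these trade-offs concrete, here is a minimal Python sketch (not Postgres code) that mimics the difference between the plain json type, which stores text and re-parses it on every access, and jsonb, which parses once on input. The classes `TextJson` and `BinaryJson` and the `non_negative_price` rule are hypothetical. In Postgres itself the validation would typically be a CHECK constraint on the jsonb column.

```python
import json


class TextJson:
    """Toy stand-in for the plain `json` type: keeps the raw text and
    re-parses it on every read."""

    def __init__(self, text):
        self.text = text
        self.parses = 0

    def get(self, key):
        self.parses += 1  # every read pays the parsing cost again
        return json.loads(self.text).get(key)


class BinaryJson:
    """Toy stand-in for `jsonb`: parses (and validates) once on input,
    after which reads are plain lookups."""

    def __init__(self, text, check=None):
        value = json.loads(text)  # the extra input cost is paid here, once
        if check is not None and not check(value):
            # plays the role of a CHECK constraint or validation function
            raise ValueError("document failed validation")
        self.value = value
        self.check = check
        self.parses = 1

    def get(self, key):
        return self.value.get(key)  # no re-parsing needed

    def set(self, key, new_value):
        # Changing one field still produces a complete new document,
        # which is validated again like a fresh insert.
        doc = dict(self.value)
        doc[key] = new_value
        return BinaryJson(json.dumps(doc), self.check)


def non_negative_price(doc):
    """Hypothetical validation rule: prices must not be negative."""
    return doc.get("price", 0) >= 0


raw = '{"name": "widget", "price": 10}'
t = TextJson(raw)
b = BinaryJson(raw, check=non_negative_price)
for _ in range(3):
    t.get("price")
    b.get("price")
print(t.parses, b.parses)  # 3 1

b2 = b.set("price", 12)  # a new document; b itself is unchanged
print(b.get("price"), b2.get("price"))  # 10 12

try:
    BinaryJson('{"name": "broken", "price": -5}', check=non_negative_price)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

The parse counter is only a proxy for the cost difference; the real jsonb format is a decomposed binary representation, not a Python dict.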

Rapid Prototyping

  • The stored data is schema-less, so as business requirements change rapidly there is no need to continuously write migrations
  • No up-front effort is required to think through a data model and ensure proper normalization
  • No need to write SQL
  • The data is of secondary importance: occasional data loss or corruption is acceptable, so the strong guarantees provided by a standard RDBMS are not necessary
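
A minimal sketch of the schema-less workflow, assuming Python's built-in `sqlite3` module as a stand-in for Postgres (in Postgres the `body` column would be declared as jsonb). The `events` table and its fields are hypothetical. A new field appears in later documents without any ALTER TABLE migration.

```python
import json
import sqlite3

# SQLite is used only because it ships with Python; the JSON is stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Version 1 of the app stores a minimal document...
conn.execute("INSERT INTO events (body) VALUES (?)",
             (json.dumps({"type": "signup", "user": "ann"}),))
# ...version 2 adds a "referrer" field; no migration is needed.
conn.execute("INSERT INTO events (body) VALUES (?)",
             (json.dumps({"type": "signup", "user": "bob", "referrer": "ann"}),))
conn.commit()

docs = [json.loads(body)
        for (body,) in conn.execute("SELECT body FROM events ORDER BY id")]
# Readers must cope with fields that older documents lack.
referrers = [d.get("referrer") for d in docs]
print(referrers)  # [None, 'ann']
```

The cost moves to the readers: every consumer has to tolerate documents that lack newer fields, as the `.get("referrer")` call does.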

ACID

Atomicity

Each transaction is "all or nothing": if any part of it fails, none of its changes are applied.
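
A minimal sketch of all-or-nothing behaviour, using Python's built-in `sqlite3` module as a stand-in for Postgres; the `accounts` table and the `transfer` function are hypothetical. A simulated failure between the debit and the credit rolls back the whole transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))


try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(after_failure)  # {'alice': 100, 'bob': 0}: the debit was rolled back too

transfer(conn, "alice", "bob", 30)
after_success = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(after_success)  # {'alice': 70, 'bob': 30}
```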

Consistency

Any transaction brings the database from one valid state to another: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
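
A small sketch, again using Python's `sqlite3` module as a stand-in for Postgres, of a rule the database enforces on every transaction: a hypothetical CHECK constraint forbids negative balances, so an update that would break it is rejected and the data stays in its previous valid state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is one of the "defined rules" every transaction must respect.
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES ('alice', 50)")
conn.commit()

rejected = False
try:
    with conn:  # rolls back if the statement fails
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    rejected = True  # the overdraft would have left the database in an invalid state

(balance,) = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
print(rejected, balance)  # True 50
```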

Isolation

The concurrent execution of transactions results in the same system state that would be obtained if the transactions were executed sequentially, i.e., one after the other.
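
To see why isolation matters, the plain Python sketch below (no database involved) replays two hypothetical transactions, T1 and T2, that each add 10 to a shared counter with a read-modify-write. Run one after the other they give 20. Interleaved, one update is lost and the result matches no sequential order, which is exactly what isolation rules out.

```python
def run(schedule):
    """Execute a schedule of (transaction, operation) steps on a shared counter.

    Each transaction reads the counter into a private variable and later
    writes back that value + 10."""
    db = {"counter": 0}
    local = {}
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["counter"]
        elif op == "write":
            db["counter"] = local[txn] + 10
    return db["counter"]


serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial))       # 20: same as running T1 then T2
print(run(interleaved))  # 10: T1's update is lost; no serial order gives this
```

A database that provides isolation prevents the second schedule, for example by making one transaction wait or by aborting it so it can be retried.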

Durability

Once a transaction has been committed, it remains committed, even in the event of power loss or a crash.
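
Durability is usually implemented by writing changes to a log on stable storage before acknowledging the commit; Postgres does this with its write-ahead log (WAL). The Python sketch below is a heavily simplified, hypothetical version: `commit` appends a record and calls `os.fsync` before returning, and `recover` replays the log after a restart. It ignores torn writes, checkpoints, and syncing the containing directory.

```python
import json
import os
import tempfile


def commit(log_path, record):
    """Append a commit record and force it to stable storage before
    reporting success."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()             # move Python's buffer into the OS
        os.fsync(log.fileno())  # ask the OS to push it to the disk
    return True  # only now is it safe to acknowledge the commit


def recover(log_path):
    """Rebuild state after a restart by replaying every committed record."""
    state = {}
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                rec = json.loads(line)
                state[rec["key"]] = rec["value"]
    return state


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
commit(log_path, {"key": "alice", "value": 100})
commit(log_path, {"key": "alice", "value": 70})
print(recover(log_path))  # {'alice': 70}
```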

CAP Theorem

In the presence of a network partition, one has to choose between consistency and availability.

Consistency

Every read receives the most recent write or an error

Availability

Every request receives a (non-error) response, without the guarantee that it contains the most recent write

Partition Tolerance

The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

In practice it is really just A vs. C, because network partitions cannot be ruled out in a distributed system.

Availability is achieved by replicating the data across different machines
Consistency is achieved by updating several nodes before allowing further reads
Total partitioning, meaning the failure of part of the system, is rare. However, a delay (latency) in propagating an update between nodes can be seen as a temporary partition, which forces a temporary choice between A and C:
1. Systems that allow reads before all the nodes have been updated get high availability
2. Systems that lock all the nodes before allowing reads get consistency
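
The trade-off in the two numbered points can be simulated with a hypothetical two-node `Cluster` in plain Python, where replication lag is modelled by updates that sit in a pending queue until `replicate()` runs. `read_available` answers from the lagging replica (option 1), while `read_consistent` refuses to answer until all nodes are updated (option 2).

```python
class Cluster:
    """Hypothetical two-node cluster: writes go to the primary and reach
    the replica only when `replicate()` runs, modelling replication lag."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # updates not yet applied on the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_available(self, key):
        # Option 1: answer immediately from the replica, possibly stale.
        return self.replica.get(key)

    def read_consistent(self, key):
        # Option 2: no reads until every node has the latest update.
        if self.pending:
            raise RuntimeError("replica not up to date; try again later")
        return self.replica.get(key)


c = Cluster()
c.write("x", 1)
c.replicate()
c.write("x", 2)                # the replica now lags behind the primary
stale = c.read_available("x")  # an answer, but not the latest write
try:
    c.read_consistent("x")
    refused = False
except RuntimeError:
    refused = True             # no stale answer, but no answer either
c.replicate()
fresh = c.read_consistent("x")  # the latest value once the nodes agree
print(stale, refused, fresh)    # 1 True 2
```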

Levels of Transaction Isolation

Read Committed
Repeatable Read
Serializable
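
A toy multi-version store in plain Python can illustrate how the first two levels differ, assuming a simplified model loosely based on Postgres' snapshot behaviour: under Read Committed each statement sees the latest committed data, while under Repeatable Read the whole transaction reads from one snapshot (here taken when the toy transaction is created). `VersionedStore` and `Transaction` are hypothetical. Serializable, which additionally guards against anomalies between concurrent transactions, is not modelled.

```python
class VersionedStore:
    """Toy multi-version store: every committed write gets a new version number."""

    def __init__(self):
        self.version = 0
        self.history = {}  # key -> list of (version, value)

    def commit_write(self, key, value):
        self.version += 1
        self.history.setdefault(key, []).append((self.version, value))

    def read(self, key, snapshot):
        # Latest value committed at or before the snapshot version.
        visible = [v for ver, v in self.history.get(key, []) if ver <= snapshot]
        return visible[-1] if visible else None


class Transaction:
    def __init__(self, store, level):
        self.store = store
        self.level = level
        self.snapshot = store.version  # taken when the transaction starts

    def read(self, key):
        if self.level == "READ COMMITTED":
            # a fresh snapshot for every statement
            return self.store.read(key, self.store.version)
        # REPEATABLE READ: one snapshot for the whole transaction
        return self.store.read(key, self.snapshot)


store = VersionedStore()
store.commit_write("price", 10)

rc = Transaction(store, "READ COMMITTED")
rr = Transaction(store, "REPEATABLE READ")
first = (rc.read("price"), rr.read("price"))
store.commit_write("price", 12)  # another transaction commits in between
second = (rc.read("price"), rr.read("price"))
print(first, second)  # (10, 10) (12, 10)
```

The Read Committed transaction sees the new price on its second read (a non-repeatable read), while the Repeatable Read transaction keeps seeing its original snapshot.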

Document Database

Document databases are designed to store semi-structured data, in which there is no clear separation between the data's schema and the data itself.
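
Because semi-structured documents differ in shape, queries navigate paths that may not exist in every document. The hypothetical `get_path` helper below, loosely modelled on Postgres' `#>` path operator, returns `None` (where SQL would give NULL) when a step is missing; the `orders` documents are made up for illustration.

```python
def get_path(doc, path):
    """Follow `path` (dict keys and list indexes) into a nested document.

    Returns None as soon as a step is missing."""
    current = doc
    for step in path:
        if isinstance(current, dict) and step in current:
            current = current[step]
        elif (isinstance(current, list) and isinstance(step, int)
              and 0 <= step < len(current)):
            current = current[step]
        else:
            return None
    return current


orders = [
    {"id": 1, "customer": {"name": "Ann"}, "items": [{"sku": "A1", "qty": 2}]},
    {"id": 2, "customer": {"name": "Bob", "vip": True}, "items": []},
]
names = [get_path(o, ["customer", "name"]) for o in orders]
skus = [get_path(o, ["items", 0, "sku"]) for o in orders]
vips = [get_path(o, ["customer", "vip"]) for o in orders]
print(names)  # ['Ann', 'Bob']
print(skus)   # ['A1', None]
print(vips)   # [None, True]
```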

Column-oriented DBMS

A column-oriented DBMS is a database management system (DBMS) that stores data tables by column rather than by row. A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on.
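
The difference between the two layouts can be shown by flattening the same hypothetical three-row table both ways in plain Python. Summing one column touches a contiguous slice in the column layout but requires a strided read across every row in the row layout.

```python
rows = [
    ("alice", "Oslo", 100),
    ("bob", "Lima", 250),
    ("carol", "Oslo", 75),
]

# Row-oriented: the values of one row are stored next to each other.
row_layout = [value for row in rows for value in row]
# Column-oriented: all values of a column together, then the next column.
column_layout = [value for column in zip(*rows) for value in column]

print(row_layout)     # ['alice', 'Oslo', 100, 'bob', 'Lima', 250, 'carol', 'Oslo', 75]
print(column_layout)  # ['alice', 'bob', 'carol', 'Oslo', 'Lima', 'Oslo', 100, 250, 75]

n = len(rows)
# Column layout: the third column is one contiguous slice.
column_sum = sum(column_layout[2 * n:3 * n])
# Row layout: every row has to be visited to pick out the third value.
row_sum = sum(row_layout[2::3])
print(column_sum, row_sum)  # 425 425
```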

Third Normal Form (3NF)

Each attribute contains only atomic values (first normal form).
No data is redundantly represented based on part of a key: for every candidate key, no non-key attribute depends on a proper subset of that candidate key (second normal form).
No non-key attribute depends on anything other than the key, i.e., there are no transitive dependencies (third normal form).
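
A worked example of the last rule, using a hypothetical `employees` table in plain Python: `dept_name` depends on `dept_id`, which is not the key, so the department name is repeated for every employee. Moving it into its own table keyed by `dept_id` removes the redundancy, and joining the tables back reproduces the original rows. The sketch assumes each `dept_id` maps to exactly one name, which is the dependency being factored out.

```python
# employee_id is the key; dept_name depends on dept_id (a transitive dependency).
employees = [
    {"employee_id": 1, "name": "Ann", "dept_id": 10, "dept_name": "Sales"},
    {"employee_id": 2, "name": "Bob", "dept_id": 10, "dept_name": "Sales"},
    {"employee_id": 3, "name": "Cid", "dept_id": 20, "dept_name": "R&D"},
]

# Decompose: dept_name moves to a table keyed by dept_id.
employee_table = [
    {k: row[k] for k in ("employee_id", "name", "dept_id")} for row in employees
]
department_table = {row["dept_id"]: row["dept_name"] for row in employees}
print(department_table)  # {10: 'Sales', 20: 'R&D'}: "Sales" is now stored once

# Joining the two tables back reproduces the original rows (lossless decomposition).
rejoined = [dict(row, dept_name=department_table[row["dept_id"]])
            for row in employee_table]
print(rejoined == employees)  # True
```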

