Monday, January 29, 2018

Cassandra vs Oracle

Source:
https://www.datastax.com/wp-content/uploads/2013/11/WP-DataStax-Oracle.pdf

Oracle is a solid RDBMS that performs well for the use cases for which it was designed (e.g. ERP and accounting applications). It is not architected to handle the new wave of big data, online applications being developed today. Oracle's scale-up, master-slave, non-distributed architecture falls short of what modern online applications need.

Scalability and Performance Limitations 

Oracle’s scale-up, master-slave design limits both its scalability and its performance, so it struggles to meet the elasticity and performance SLA needs of many online applications.
It is widely recognized that Oracle cannot add capacity online in an elastic, scale-out (rather than scale-up) manner to serve increasing user workloads, keep performance high, and easily ingest fast incoming data from many geographical locations.

Benefits of Cassandra

  • Massively scalable architecture – a masterless design where all nodes are the same.
  • Linear scale performance – online node additions produce predictable increases in performance.
  • Continuous availability – redundancy of both data and function means no single point of failure.
  • Transparent fault detection and recovery – easy failed node recovery.
  • Flexible, dynamic schema data modeling – easily supports structured, semi-structured, and unstructured data.
  • Guaranteed data safety – commit log design ensures no data loss.
  • Active everywhere design – all nodes may be written to and read from.
  • Tunable data consistency – support for strong or eventual data consistency.
  • Multi-data center replication – cross data center and multi-cloud availability zone support for writes/reads built in.
  • Data compression – data compressed up to 80% without performance overhead.
  • CQL (Cassandra Query Language) – an SQL-like language that makes moving from an RDBMS very easy.
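
The "tunable data consistency" item can be made concrete with a little arithmetic. The sketch below is not Cassandra code and the function names are hypothetical; it only illustrates the commonly cited rule that a read is guaranteed to overlap the latest acknowledged write when the replicas read (R) plus the replicas written (W) exceed the replication factor (N).

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


if __name__ == "__main__":
    n = 3  # assumed replication factor, for illustration only
    q = quorum(n)
    print("write QUORUM / read QUORUM:", is_strongly_consistent(n, q, q))
    print("write ONE / read ONE:", is_strongly_consistent(n, 1, 1))
    print("write ALL / read ONE:", is_strongly_consistent(n, n, 1))
```

Choosing smaller R and W trades that overlap guarantee for lower latency, which is the "eventual" end of the range.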

DataStax Cassandra

  • Built-in analytics functionality for Cassandra data via integration with a number of Hadoop components (e.g. MapReduce, Hive, Pig, Mahout, etc.)
  • Enterprise search capability on Cassandra data via Solr integration.
  • Enterprise security including external/internal authentication and object permission management, transparent data encryption, client-to-node and node-to-node encryption, and data auditing.
  • Visual cluster management for all administration tasks including backup/restore operations, performance monitoring, alerting, and more.

Cassandra Use Cases

  • Time-series data management (e.g. financial, sensor data, web click stream, etc.) 
  • Online web retail
  • Web buyer behavior and personalization management
  • Recommendation engines
  • Social media input and analysis
  • Online gaming
  • Fraud detection and analysis
  • Risk analysis and management
  • Supply chain analytics
  • Web product searches
  • Write-intensive transactional systems

Data Modeling Differences

In traditional databases such as Oracle, data is modeled in a standard “third normal form” design without the need to know what questions will be asked of the data. 
By contrast, in NoSQL, the questions asked of the data are what drive the data model design and the data is highly de-normalized.
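
To illustrate the contrast, here is a minimal sketch using plain Python dictionaries rather than a real database. The data, table names, and function names are all hypothetical. The normalized form stores each fact once and joins at read time; the query-first form duplicates data into a structure keyed by exactly what the question filters on.

```python
from collections import defaultdict

# Normalized (relational-style): each fact is stored once and joined at read time.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders = [
    {"order_id": 100, "user_id": 1, "item": "book"},
    {"order_id": 101, "user_id": 2, "item": "lamp"},
    {"order_id": 102, "user_id": 1, "item": "pen"},
]


def orders_for_user_normalized(name):
    """Answer the question with a join-like scan over both tables."""
    ids = {uid for uid, user in users.items() if user["name"] == name}
    return [o["item"] for o in orders if o["user_id"] in ids]


# Denormalized (query-first): one table per question, keyed by the query's filter.
orders_by_user_name = defaultdict(list)
for o in orders:
    orders_by_user_name[users[o["user_id"]]["name"]].append(o["item"])


def orders_for_user_denormalized(name):
    """Answer the same question with a single key lookup."""
    return orders_by_user_name.get(name, [])
```

The denormalized table answers one question quickly, but a different question (say, orders by item) would need its own table, which is why the questions have to be known up front.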

Data Processing Concerns of Modern Applications

| Legacy Application | Modern Application |
| --- | --- |
| Slow/medium velocity data | High velocity data |
| Data coming in from one/few locations | Data coming in from many locations |
| Rigid, static structured data | Flexible, fluid, multi-type data |
| Low/medium data volumes; purge often | High data volumes; retain forever |
| Deploy app central location/one server | Deploy app everywhere/many servers |
| Write data in one location | Write data everywhere/anywhere |
| Primary concern: scale reads | Scale writes and reads |
| Scale up for more users/data | Scale out for more users/data |


DataStax Cassandra vs. Oracle at Functional Level

| Feature/Function | DataStax/Cassandra | Oracle RDBMS |
| --- | --- | --- |
| High Availability | Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers | General replication; Oracle Data Guard (for failover) and Oracle RAC (single point of failure with storage), both of which are expensive add-ons. GoldenGate also offered for certain use cases. |
| Scalability Model | Linear performance gains via node additions | Scale up by adding CPUs/RAM, or via Oracle RAC or Exadata |
| Replication Model | Peer-to-peer; number of copies configurable across cluster and each data center | Master-slave |
| Multi-data center/geography/cloud capabilities | Multi-directional, 1-many data center support built in, with true read/write anywhere capability | Nothing specific for multi-data center |
| Data partitioning/sharding model | Automatic; done via primary key; random or ordered | Table partitioning option for Enterprise Edition; manual server sharding |
| Data volume support | TB-PB capable | TB capable; PB with Exadata |
| Analytic support | Analytics on Cassandra data via Hadoop integration (MapReduce, Hive, Pig, Mahout) | Analytic functions in Oracle RDBMS via SQL MapReduce. Hadoop support done in NoSQL appliance |
| Enterprise search support | Built into DataStax Enterprise via Solr integration | Handled via Oracle search (cost add-on) |
| Mixed workload support | All handled in one cluster with built-in workload isolation; no workload competes for resources with another | Handled via Oracle Exadata |
| Data Model | Google Bigtable-like; a wide column store | Relational/tabular |
| Flexibility of data model | Flexible. Designed for structured, semi-structured, and unstructured data | Rigid; primarily structured data |
| Data consistency model | Tunable consistency (CAP theorem consistency) per operation (e.g. per insert, delete, etc.) across the cluster | Traditional ACID |
| Transaction Support | Provides full Atomic, Isolated, and Durable (AID) transactions, including batch transactions and “lightweight” transactions with Cassandra 2.0 and higher | Traditional ACID |
| Security | Support for all key security needs: login ID/passwords; external security support; object permission management; transparent data encryption; client-to-node and node-to-node encryption; data auditing | Full security support |
| Storage model | Targeted directories with separation (e.g. put some column families on SSDs, some on spinning disk) | Tablespaces |
| Data compression | Built in | Various methods |
| Memory usage model | Distributed object/row caches across all nodes in a cluster | Standard data/metadata caches with query cache |
| Logical database container | Keyspace | Database |
| Primary data object | Column family/table | Table |
| Data variety support | Structured, semi-structured, unstructured | Primarily structured |
| Indexes | Primary, secondary. Extensible via Solr indexes | B-Tree, bitmap, clustered, others |
| Core language | CQL (Cassandra Query Language; resembles SQL) | SQL |
| Primary query utilities | CQL shell | SQL*Plus |
| Visual query tools | DataStax DevCenter and 3rd party support (Aqua Data Studio) | SQL Developer from Oracle etc. |
| Development Language support | Many (Java, C#, Python) | Many |
| Geospatial support | Done via Solr integration | Oracle Geospatial option (cost add-on) |
| Logging (e.g., web, application) data support | Handled via log4j | Nothing built in |
| Backup/recovery | Online, point-in-time restore | Online, point-in-time restore |
| Enterprise management/monitoring | DataStax OpsCenter | Oracle Enterprise Manager |
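
The "automatic partitioning via primary key" and "peer-to-peer replication with a configurable number of copies" entries can be pictured with a toy token ring. This is a simplified sketch, not how Cassandra is implemented: the `Ring` class and node names are hypothetical, MD5 from the standard library stands in for the partitioner's hash, and real clusters add virtual nodes and rack/data-center-aware replica placement.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to an integer token (MD5 is only a stand-in here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy token ring: every node is a peer placed at the hash of its name."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("at least one node is required")
        self.replication_factor = min(replication_factor, len(nodes))
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        """Return the first node at or after the key's token, then the next ones clockwise."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
    print(ring.replicas("user:42"))  # three distinct nodes holding this key
```

Because placement depends only on the key's hash and the ring, any node can work out where a row lives, which is what allows a masterless design.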