Enterprise Java Application Architecture and Implementation: 2018

Source:
https://www.datastax.com/wp-content/uploads/2013/11/WP-DataStax-Oracle.pdf

Oracle is a solid RDBMS that performs well for the use cases for which it was designed (e.g. ERP and accounting applications). It is not architected for the new wave of big data, online applications being developed today. Its scale-up, master-slave, non-distributed architecture falls short of what modern online applications need.

Scalability and Performance Limitations

Oracle’s scale-up, master-slave design limits both its scalability and its performance, which makes it hard to meet the elasticity and performance SLA needs of many online applications.

Oracle cannot add capacity online in an elastic, scale-out (rather than scale-up) manner, a limitation that is widely recognized. As a result it struggles to service increasing user workloads, keep performance high, and easily consume fast incoming data from many geographical locations.

Benefits of Cassandra

Massively scalable architecture – a masterless design where all nodes are the same.
Linear scale performance – online node additions produce predictable increases in performance.
Continuous availability – redundancy of both data and function means no single point of failure.
Transparent fault detection and recovery – easy failed node recovery.
Flexible, dynamic schema data modeling – easily supports structured, semi-structured, and unstructured data.
Guaranteed data safety – commit log design ensures no data loss.
Active everywhere design – all nodes may be written to and read from.
Tunable data consistency – support for strong or eventual data consistency.
Multi-data center replication – cross data center and multi-cloud availability zone support for writes/reads built in.
Data compression – data compressed up to 80% without performance overhead.
CQL (Cassandra Query Language) – an SQL-like language that makes moving from an RDBMS very easy.
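
The "tunable data consistency" item above can be made concrete with a little arithmetic. The sketch below is not Cassandra code. It assumes the commonly documented rule that a read is guaranteed to overlap the latest acknowledged write when the number of replicas contacted on read (R) plus the number contacted on write (W) exceeds the replication factor (RF), and that a quorum is a strict majority of RF. The function names are hypothetical and exist only for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum operation: a strict majority of RF."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}")
    print("quorum reads + quorum writes:", is_strongly_consistent(q, q, rf))
    print("one-replica reads + one-replica writes:", is_strongly_consistent(1, 1, rf))
```

With RF = 3, quorum reads combined with quorum writes satisfy the overlap rule, while reading and writing a single replica does not, which is the eventual-consistency end of the dial.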

DataStax Cassandra

Built-in analytics functionality for Cassandra data via integration with a number of Hadoop components (e.g. MapReduce, Hive, Pig, Mahout, etc.)
Enterprise search capability on Cassandra data via Solr integration.
Enterprise security including external/internal authentication and object permission management, transparent data encryption, client-to-node and node-to-node encryption, and data auditing.
Visual cluster management for all administration tasks including backup/restore operations, performance monitoring, alerting, and more.

Cassandra Use Cases

Time-series data management (e.g. financial, sensor data, web click stream, etc.)
Online web retail
Web buyer behavior and personalization management
Recommendation engines
Social media input and analysis
Online gaming
Fraud detection and analysis
Risk analysis and management
Supply chain analytics
Web product searches
Write intensive transactional systems

Data Modeling Differences

In traditional databases such as Oracle, data is modeled in a standard “third normal form” design without the need to know what questions will be asked of the data.

By contrast, in NoSQL, the questions asked of the data are what drive the data model design and the data is highly de-normalized.
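
The difference can be illustrated with plain Python data structures standing in for tables. This is a minimal sketch, not a Cassandra schema: the `users` and `orders` data, and the question "which orders belong to a given user?", are made up for the example. The normalized layout stores each fact once and joins at read time; the query-driven layout copies the user's name into every row so the question is answered by a single key lookup.

```python
from collections import defaultdict

# Normalized (3NF-style) data: each fact is stored once and joined at query time.
users = {1: {"name": "Ana"}, 2: {"name": "Raj"}}
orders = [
    {"order_id": 100, "user_id": 1, "item": "book"},
    {"order_id": 101, "user_id": 2, "item": "lamp"},
    {"order_id": 102, "user_id": 1, "item": "pen"},
]


def orders_for_user_normalized(user_id):
    """Answer the question with a join-like scan over the orders table."""
    name = users[user_id]["name"]
    return [(name, o["order_id"], o["item"]) for o in orders if o["user_id"] == user_id]


def build_orders_by_user():
    """Query-first layout: one partition per user_id, with the user's name
    duplicated into every row so a read needs no join."""
    table = defaultdict(list)
    for o in orders:
        name = users[o["user_id"]]["name"]
        table[o["user_id"]].append((name, o["order_id"], o["item"]))
    return table


orders_by_user = build_orders_by_user()
print(orders_by_user[1])  # same rows as orders_for_user_normalized(1)
```

The cost of the denormalized layout is duplicated data and a table that serves one question well; a different question would typically get its own table.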

Data Processing Concerns of Modern Applications

Legacy Application	Modern Application
Slow/medium velocity data	High velocity data
Data coming in from one/few locations	Data coming in from many locations
Rigid, static structured data	Flexible, fluid, multi-type data
Low/medium data volumes; purge often	High data volumes; retain forever
Deploy app central location/one server	Deploy app everywhere/many servers
Write data in one location	Write data everywhere/anywhere
*Primary concern: scale reads*	*Scale writes and reads*
Scale up for more users/data	Scale out for more users/data

DataStax Cassandra vs. Oracle at Functional Level

Feature/Function	DataStax/Cassandra	Oracle RDBMS
High Availability	Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers	General replication; Oracle Data Guard (for failover) and Oracle RAC (single point of failure with storage), both of which are expensive add-ons. GoldenGate is also offered for certain use cases.
Scalability Model	Linear performance gains via node additions	Scale up by adding CPUs and RAM, or via Oracle RAC or Exadata
Replication Model	Peer-to-peer; number of copies configurable across cluster and each data center	Master-slave
Multi-data center/geography/cloud capabilities	Multi-directional, 1-many data center support built in, with true read/write anywhere capability	Nothing specific for multi-data center
Data partitioning/sharding model	Automatic; done via primary key; random or ordered	Table partitioning option in Enterprise Edition; manual server sharding
Data volume support	TB-PB capable	TB capable; PB with Exadata
Analytic support	Analytics on Cassandra data via Hadoop integration (MapReduce, Hive, Pig, Mahout)	Analytic functions in Oracle RDBMS via SQL MapReduce. Hadoop support done in NoSQL appliance
Enterprise search support	Built into DataStax Enterprise via Solr integration	Handled via Oracle search (cost add-on)
Mixed workload support	All handled in one cluster with built-in workload isolation; no workload competes for resources with another	Handled via Oracle Exadata
Data Model	Google Bigtable like; a wide column store	Relational/tabular
Flexibility of data model	Flexible. Designed for structured, semi-structured, and unstructured data	Rigid; primarily structured data
Data consistency mode	Tunable consistency (CAP theorem consistency) per operation (e.g. per insert, delete, etc.) across the cluster	Traditional ACID
Transaction Support	Provides full Atomic, Isolated, and Durable (AID) transactions including batch transactions and “lightweight” transactions with Cassandra 2.0 and higher	Traditional ACID
Security	Support for all key security needs: login ID/passwords; external security support; object permission management; transparent data encryption; client-to-node and node-to-node encryption; data auditing	Full security support
Storage model	Targeted directories with separation (e.g. put some column families on SSDs, some on spinning disk)	Tablespaces
Data compression	Built in	Various methods
Memory usage model	Distributed object/row caches across all nodes in a cluster	Standard data/metadata caches with query cache
Logical database container	Keyspace	Database
Primary data object	Column family/table	Table
Data variety support	Structured, semi-structured, unstructured	Primarily structured
Indexes	Primary, secondary. Extensible via Solr indexes	B-Tree, bitmap, clustered, others
Core language	CQL (Cassandra Query Language; resembles SQL)	SQL
Primary query utilities	CQL shell	SQL*Plus
Visual query tools	DataStax DevCenter and third-party support (Aqua Data Studio)	SQL Developer from Oracle etc.
Development Language support	Many (Java, C#, Python)	Many
Geospatial support	Done via Solr integration	Oracle Geospatial option (cost add-on)
Logging (e.g., web, application) data support	Handled via log4j	Nothing built in
Backup/recovery	Online, point-in-time restore	Online, point-in-time restore
Enterprise management/monitoring	DataStax OpsCenter	Oracle Enterprise Manager
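
The "Data partitioning/sharding model" and "Replication Model" rows in the table describe automatic placement by key with a configurable number of copies. The sketch below shows one simplified way such placement can work: a hash ring where each node owns one token and a row's replicas are the next nodes clockwise from the hash of its partition key. It is an illustration under stated assumptions, not Cassandra's implementation: the `Ring` class and `token` function are hypothetical, MD5 is used only because it is in the Python standard library, and real clusters add virtual nodes and rack or data center aware replication strategies.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 here is purely illustrative)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        # Cannot place more copies than there are nodes.
        self.rf = min(replication_factor, len(nodes))
        # Each node owns a single token, derived from its name in this sketch.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and take the next rf nodes."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(self.rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes, stable for the same key
```

Because placement depends only on the key's hash and the node tokens, any node can compute where a row lives, which fits the masterless, read/write-anywhere design described earlier.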

Monday, January 29, 2018

Cassandra vs Oracle