Big Data Security Assessment

What is Big Data?

Big Data refers to datasets whose size and/or structure is beyond the ability of traditional software tools or database systems to store, process, and analyze within reasonable timeframes.

Hadoop is one of the main computing environments for such workloads. It is built on top of a distributed, clustered file system (HDFS) that was designed specifically for large-scale data operations, and it has been widely embraced by enterprises.

Benefits of Big Data and Data Analytics

  • Big data makes it possible to gain more complete answers because you have more information.
  • More complete answers mean more confidence in the data, which in turn enables a fundamentally different approach to tackling problems.

Security Issues

  • The primary goal of an attacker is to obtain the sensitive data that sits in a Big Data cluster. Organizations collect and process large amounts of sensitive information about customers, employees, intellectual property (IP), and finances. Such confidential information is aggregated and centralized in one place for analysis in order to increase its value. This centralization makes the data a valuable target for attackers, and the confidential information it contains might be exposed.
  • Attacks may also attempt to destroy or modify data, or to disrupt the availability of the platform.

Security Strategy

Understanding the architecture and cluster composition of the ecosystem in place is the first step in putting together a security strategy. It is important to understand each component interface as a potential attack target.

Each component offers an attacker a specific set of potential exploits, while defenders have a corresponding set of options for attack detection and prevention.

Big Data Security Issues

Threats to Big Data Platforms

Data access & ownership
Relational and quasi-relational platforms include roles, groups, schemas, label security, and various other facilities for limiting user access to subsets of the available data. Authentication and authorization requirements shall be assessed when managing the cluster, to limit access to sensitive data.
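
As a minimal illustration of role-based limits, the sketch below checks whether any of a user's roles grants access to a dataset; the roles and dataset names are hypothetical, not taken from any particular platform.

    # Minimal role-based access check (illustrative only; roles and
    # dataset names are hypothetical, not tied to a specific platform).
    ROLE_GRANTS = {
        "analyst": {"sales_aggregates"},
        "finance": {"sales_aggregates", "payroll"},
        "admin":   {"sales_aggregates", "payroll", "customer_pii"},
    }

    def can_read(user_roles, dataset):
        """Return True if any of the user's roles grants read access."""
        return any(dataset in ROLE_GRANTS.get(role, set()) for role in user_roles)

    print(can_read(["analyst"], "payroll"))             # False
    print(can_read(["analyst", "finance"], "payroll"))  # True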

Logging

Logging capabilities in the big data ecosystem, both open source and commercial, shall be assessed for proper implementation. We need to verify that logs are configured to capture both the correct event types and sufficient detail to reconstruct user actions, including the queries executed.
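
As an example of what to verify, HDFS NameNode audit events are typically written as tab-separated key=value pairs; a small parser like the sketch below (the allowed/ugi/ip/cmd/src field layout is an assumption to check against your distribution's audit format) confirms that user identity and executed commands are actually being captured.

    import re

    # Parse HDFS NameNode audit lines of the (assumed) common form:
    #   ... FSNamesystem.audit: allowed=true  ugi=alice (auth:KERBEROS)
    #   ip=/10.0.0.5  cmd=open  src=/data/secret.csv  dst=null  perm=null
    AUDIT_FIELD = re.compile(r"(\w+)=([^\t]+)")

    def parse_audit_line(line):
        fields = dict(AUDIT_FIELD.findall(line))
        return {k: fields.get(k) for k in ("allowed", "ugi", "ip", "cmd", "src")}

    line = ("2024-01-01 12:00:00 INFO FSNamesystem.audit: allowed=true\t"
            "ugi=alice (auth:KERBEROS)\tip=/10.0.0.5\tcmd=open\t"
            "src=/data/secret.csv\tdst=null\tperm=null")
    print(parse_audit_line(line))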

Security Monitoring
Built-in monitoring tools that detect misuse or block malicious queries shall be validated and assessed. Database activity monitoring technologies can help flag, or even block, misuse.
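
As a toy illustration of such monitoring, the sketch below applies one detection rule over parsed audit events like those in the previous example; the threshold and sensitive path prefix are hypothetical and would need tuning per cluster.

    from collections import Counter

    # One toy detection rule over parsed audit events (threshold and
    # path prefix are hypothetical): flag users reading unusually many
    # files under a sensitive location.
    SENSITIVE_PREFIX = "/data/pii/"
    READ_THRESHOLD = 100  # reads per window; tune per cluster

    def flag_bulk_readers(events):
        reads = Counter(
            e["ugi"] for e in events
            if e.get("cmd") == "open"
            and (e.get("src") or "").startswith(SENSITIVE_PREFIX)
        )
        return [user for user, count in reads.items() if count > READ_THRESHOLD]

    sample = [{"ugi": "mallory", "cmd": "open", "src": f"/data/pii/f{i}"}
              for i in range(150)]
    print(flag_bulk_readers(sample))  # ['mallory']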

Data at rest protection
Encryption can help protect against attempts to access data outside established application interfaces. Unauthorized copying of archives or direct reading of files from disk can be mitigated using encryption at the file or HDFS layer. This ensures files are protected against direct access, as only the file services hold the encryption keys. Third-party products can provide advanced transparent encryption options for both HDFS and non-HDFS file formats. Transport Layer Security (TLS) provides confidentiality of data in transit, authentication via certificates, and data integrity verification.
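
As a sketch of HDFS-layer encryption, the commands below drive Hadoop's transparent-encryption CLI from Python; this assumes a configured Hadoop KMS, and the key name and zone path are examples.

    import subprocess

    # Sketch: create an HDFS encryption zone with the standard
    # transparent-encryption CLI (requires a configured Hadoop KMS).
    # Key name and path are examples, not conventions.
    key, zone = "pii-key", "/data/pii"

    subprocess.run(["hadoop", "key", "create", key], check=True)
    # The zone directory must exist and be empty before conversion.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", zone], check=True)
    subprocess.run(["hdfs", "crypto", "-createZone",
                    "-keyName", key, "-path", zone], check=True)
    subprocess.run(["hdfs", "crypto", "-listZones"], check=True)  # verify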

Inter-node communication
Data in transit, along with application queries, may be open to inspection and tampering when unencrypted RPC over TCP/IP is used. Ensure the TLS and SSL capabilities bundled with big data distributions are enabled.
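
One way to verify these settings is to read a daemon's effective configuration over HTTP from its /conf endpoint. In the sketch below, the host, port, and JSON response shape are assumptions to confirm against your distribution; the two properties checked are the standard Hadoop switches for RPC and block-transfer encryption.

    import json
    from urllib.request import urlopen

    # Check a daemon's effective settings via its /conf endpoint
    # (host, port, and JSON shape are assumptions to verify locally).
    NAMENODE = "http://namenode.example.com:9870"
    EXPECTED = {
        "hadoop.rpc.protection": "privacy",   # encrypt RPC traffic
        "dfs.encrypt.data.transfer": "true",  # encrypt block transfers
    }

    with urlopen(f"{NAMENODE}/conf?format=json") as resp:
        props = {p["key"]: p["value"] for p in json.load(resp)["properties"]}

    for key, want in EXPECTED.items():
        got = props.get(key)
        status = "OK" if got == want else "MISCONFIGURED"
        print(f"{status}: {key} = {got!r} (want {want!r})")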

Multi-tenancy

When multiple applications and 'tenants' are served in the ecosystem, we need to ensure one tenant cannot read another's data. Encryption zones, which are built into native HDFS, support this separation, and additional security controls such as Access Control Entries (ACEs) or Access Control Lists (ACLs) shall be implemented to ensure privacy.
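
A minimal sketch of per-tenant isolation using HDFS ACLs follows; the tenant group names and directory layout are examples, and ACL support must be enabled on the NameNode.

    import subprocess

    # Sketch: give each tenant group access to its own directory only,
    # using HDFS ACLs (requires dfs.namenode.acls.enabled=true).
    # Tenant names and paths are examples.
    tenants = ["tenant_a", "tenant_b"]

    for t in tenants:
        path = f"/data/{t}"
        subprocess.run(["hdfs", "dfs", "-mkdir", "-p", path], check=True)
        # Owner and group get rwx; everyone else gets nothing.
        subprocess.run(["hdfs", "dfs", "-chmod", "770", path], check=True)
        subprocess.run(["hdfs", "dfs", "-setfacl", "-m",
                        f"group:{t}:rwx", path], check=True)
        subprocess.run(["hdfs", "dfs", "-getfacl", path], check=True)  # verify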

Client interaction
Gateway services shall be created to load data, rather than having clients communicate directly with both resource managers and individual data nodes, since compromised clients may send malicious data or links to services. Apache Knox, discussed under Security Solutions below, is one such gateway.

API security
Ensure the big data cluster's APIs are protected from code injection, command injection, and buffer overflow attacks.
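
Input validation is the first line of defense here. The sketch below shows a generic allow-list check for an API parameter; the parameter name and pattern are illustrative, not tied to any particular big data API.

    import re

    # Allow-list validation for an API path parameter (name and pattern
    # are illustrative). Rejecting anything outside a strict, bounded
    # pattern blocks most injection and traversal payloads outright,
    # and the length cap also guards against oversized inputs.
    SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")

    def validate_dataset_name(name: str) -> str:
        if not SAFE_NAME.fullmatch(name):
            raise ValueError(f"rejected unsafe dataset name: {name!r}")
        return name

    print(validate_dataset_name("sales_2024"))  # passes
    try:
        validate_dataset_name("../etc/passwd; rm -rf /")
    except ValueError as err:
        print(err)  # rejected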

Holistic Approach for Big Data Security Operations

  • Segregate administrative roles and restrict unwanted access to a minimum.
  • Direct access to files or data shall be addressed through a combination of role-based authorization, access control lists, file permissions, and segregation of administrative roles.

Authentication and perimeter security

  • Ensure nodes are authenticated before they join a cluster. If an attacker can add a node they control to the cluster, they can exfiltrate data. Certificate-based identity options can provide strong authentication and improve security; a connection-level certificate check is sketched below.
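
A minimal sketch of such a check follows: before trusting a node, verify that it presents a certificate chaining to the cluster's own CA. The hostname, port, and CA bundle path are assumptions.

    import socket, ssl

    # Verify a node presents a certificate signed by our cluster CA
    # before trusting it (host, port, and CA bundle path are examples;
    # the DataNode HTTPS port varies by distribution).
    CLUSTER_CA = "/etc/cluster/ca.pem"

    def node_cert_is_valid(host: str, port: int) -> bool:
        ctx = ssl.create_default_context(cafile=CLUSTER_CA)
        try:
            with socket.create_connection((host, port), timeout=5) as sock:
                with ctx.wrap_socket(sock, server_hostname=host):
                    return True  # handshake succeeded: cert chains to our CA
        except (ssl.SSLError, OSError):
            return False

    print(node_cert_is_valid("datanode01.example.com", 9865))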

Data protection

  • Tokenization, masking, and data-element encryption tools help support a data-centric security implementation when the systems that process data cannot be fully trusted, or when we do not want to share raw data with users. Both techniques are sketched below.
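
The sketch below illustrates both techniques in miniature: deterministic tokenization via HMAC (so equal values remain joinable without exposing the original) and partial masking. The secret and field formats are examples only.

    import hashlib, hmac

    # Sketch of two data-centric protections (secret and field formats
    # are examples): deterministic tokenization and partial masking.
    SECRET = b"rotate-me-and-store-in-a-vault"  # never hard-code in production

    def tokenize(value: str) -> str:
        """Stable, non-reversible token; equal inputs give equal tokens."""
        return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]

    def mask(card_number: str) -> str:
        """Show only the last four digits."""
        return "*" * (len(card_number) - 4) + card_number[-4:]

    print(tokenize("4111111111111111"))  # joinable surrogate token
    print(mask("4111111111111111"))      # ************1111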

Configuration and patch management

  • Keep encryption keys, certificates, and open-source libraries up to date; it is common for hundreds of nodes to unintentionally run different configurations and remain unpatched.
  • Use configuration management tools, recommended configurations, and pre-deployment checklists. A simple drift-detection sketch follows this list.
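
A simple way to spot drift is to fingerprint each node's effective configuration, as sketched below; the hostnames, port, and /conf JSON shape are assumptions to adapt to your cluster.

    import hashlib, json
    from urllib.request import urlopen

    # Detect configuration drift by hashing each node's effective
    # configuration from its /conf endpoint (hostnames, port, and
    # JSON shape are assumptions to adapt per cluster).
    NODES = ["dn01.example.com", "dn02.example.com", "dn03.example.com"]
    PORT = 9864  # DataNode HTTP port; adjust per distribution

    def conf_fingerprint(node):
        with urlopen(f"http://{node}:{PORT}/conf?format=json") as resp:
            props = sorted((p["key"], p["value"])
                           for p in json.load(resp)["properties"])
        return hashlib.sha256(json.dumps(props).encode()).hexdigest()[:12]

    fingerprints = {node: conf_fingerprint(node) for node in NODES}
    if len(set(fingerprints.values())) > 1:
        print("DRIFT DETECTED:", fingerprints)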

Security Solutions

Apache Ranger — Ranger is a policy administration tool for Hadoop clusters. It includes a broad set of management functions, including auditing, key management, and fine-grained data access policies across HDFS, Hive, YARN, Solr, Kafka, and other modules.
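
As a sketch of policy administration through Ranger's public v2 REST API, the request below creates an HDFS path policy; the Ranger host, service name, group, path, and credentials are examples, and the payload shape should be checked against your Ranger version.

    import base64, json
    from urllib.request import Request, urlopen

    # Sketch: create an HDFS path policy via Ranger's public v2 REST
    # API (host, service name, group, path, and credentials are
    # examples; verify the payload against your Ranger version).
    RANGER = "http://ranger.example.com:6080"
    auth = base64.b64encode(b"admin:changeme").decode()

    policy = {
        "service": "cluster_hadoop",  # Ranger service name (example)
        "name": "finance-read-pii",
        "resources": {"path": {"values": ["/data/pii"], "isRecursive": True}},
        "policyItems": [{
            "groups": ["finance"],
            "accesses": [{"type": "read", "isAllowed": True}],
        }],
    }

    req = Request(f"{RANGER}/service/public/v2/api/policy",
                  data=json.dumps(policy).encode(),
                  headers={"Content-Type": "application/json",
                           "Authorization": f"Basic {auth}"})
    with urlopen(req) as resp:
        print(resp.status, resp.read()[:200])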

Apache Ambari — Ambari is a facility for provisioning and managing Hadoop clusters. It helps administrators set configurations and propagate changes to the entire cluster.
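
For example, Ambari exposes cluster state over a REST API; the sketch below lists the hosts of a cluster, with the Ambari host, cluster name, and credentials as placeholder examples.

    import base64, json
    from urllib.request import Request, urlopen

    # Sketch: list cluster hosts through Ambari's REST API (host,
    # cluster name, and credentials are placeholder examples).
    AMBARI = "http://ambari.example.com:8080"
    auth = base64.b64encode(b"admin:changeme").decode()

    req = Request(f"{AMBARI}/api/v1/clusters/prod/hosts",
                  headers={"Authorization": f"Basic {auth}"})
    with urlopen(req) as resp:
        for item in json.load(resp)["items"]:
            print(item["Hosts"]["host_name"])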

Apache Knox — You can think of Knox as a Hadoop firewall. More precisely, it is an API gateway. It handles HTTP and RESTful requests, enforcing authentication and usage policies on inbound requests and blocking everything else.
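
The sketch below routes a WebHDFS directory listing through Knox instead of contacting the NameNode directly; the gateway host, topology name in the URL, and credentials are examples, and a CA-trusted gateway certificate is assumed.

    import base64, json
    from urllib.request import Request, urlopen

    # Sketch: list an HDFS directory through the Knox gateway rather
    # than the NameNode directly (gateway host, 'default' topology,
    # and credentials are examples; assumes a CA-trusted certificate).
    KNOX = "https://knox.example.com:8443"
    auth = base64.b64encode(b"alice:secret").decode()

    req = Request(f"{KNOX}/gateway/default/webhdfs/v1/data?op=LISTSTATUS",
                  headers={"Authorization": f"Basic {auth}"})
    with urlopen(req) as resp:
        for f in json.load(resp)["FileStatuses"]["FileStatus"]:
            print(f["pathSuffix"], f["type"])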

Monitoring — Hive, PIQL, Impala, Spark SQL, and similar modules offer SQL or pseudo-SQL syntax. This enables you to leverage activity monitoring, dynamic masking, redaction, and tokenization technologies originally developed for relational platforms.
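
As a toy example of redaction applied to query output, the sketch below masks anything matching a US SSN pattern before rows reach the caller; the pattern and row format are illustrative.

    import re

    # Toy redaction pass over query results (pattern and row format
    # are illustrative): mask anything that looks like a US SSN
    # before it reaches the caller.
    SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    def redact(row: str) -> str:
        return SSN.sub("***-**-****", row)

    print(redact("alice,123-45-6789,approved"))
    # -> alice,***-**-****,approved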