Data Governance

 Data governance is a set of processes and policies that ensure the quality, usability, security, and compliance of data. It is a critical part of any organization that wants to make effective use of its data.

The four main components of data governance are:

  • Data policies and procedures: These define the rules and regulations for how data is managed. They should cover areas such as data ownership, access control, and data retention.
  • Data quality management: This ensures that the data is accurate, complete, and consistent. It includes processes for data cleansing, validation, and monitoring.
  • Data catalog and metadata management: This provides a central repository for storing information about the data. This information can include the data's source, format, and usage.
  • Data security and privacy: This protects the data from unauthorized access, use, or disclosure. It includes measures such as encryption, access control, and security awareness training.
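The data quality component above (accurate, complete, consistent) can be made concrete with a small validation pass. This is only a minimal sketch: the field names (`customer_id`, `email`, `country`) and the three rules are hypothetical stand-ins for whatever an organization's own data policies define.

```python
from collections import Counter

# Hypothetical rule set: these field names are illustrative only.
REQUIRED_FIELDS = ("customer_id", "email", "country")


def check_quality(records):
    """Return a list of (row_index, problem) tuples for a list of dict records."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for i, rec in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not str(rec.get(field) or "").strip():
                problems.append((i, f"missing {field}"))
    # Accuracy (very rough): a non-empty email must contain exactly one "@".
    for i, rec in enumerate(records):
        email = rec.get("email") or ""
        if email and email.count("@") != 1:
            problems.append((i, "malformed email"))
    # Consistency: the same customer_id must not appear twice.
    counts = Counter(rec.get("customer_id") for rec in records if rec.get("customer_id"))
    for i, rec in enumerate(records):
        if counts.get(rec.get("customer_id"), 0) > 1:
            problems.append((i, "duplicate customer_id"))
    return problems
```

Running such a check on a schedule and tracking the number of problems over time is one simple form of the "monitoring" mentioned in the list.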

Data governance is important for a number of reasons. It can help to:

  • Improve the quality of data: By ensuring that the data is accurate, complete, and consistent, data governance can help to improve the quality of decision-making.
  • Increase the usability of data: By providing a central repository for data and by defining data standards, data governance can make it easier for people to find and use the data they need.
  • Protect the security of data: By implementing security measures, data governance can help to protect the data from unauthorized access, use, or disclosure.
  • Comply with regulations: By defining data policies and procedures, data governance can help organizations to comply with regulations such as GDPR and CCPA.

Data governance is a complex and challenging task, but it is essential for any organization that wants to make effective use of its data. By implementing data governance practices, organizations can improve the quality, usability, security, and compliance of their data.

Here are some of the benefits of data governance:

  • Improved decision-making: By ensuring that the data is accurate, complete, and consistent, data governance can help to improve the quality of decision-making. This is because decision-makers will have access to the information they need to make informed decisions.
  • Increased efficiency: Data governance can help to increase efficiency by streamlining the data management process. This can be done by automating tasks, such as data cleansing and validation.
  • Reduced risk: Data governance can help to reduce risk by identifying and mitigating potential problems. This can be done by implementing security measures, such as encryption and access control.
  • Improved compliance: Data governance can help organizations to comply with regulations, such as GDPR and CCPA. This is because data governance defines the rules and regulations for how data is managed.
  • Increased trust: Data governance can help to increase trust between stakeholders by ensuring that the data is managed in a transparent and accountable manner.

If you are considering implementing data governance in your organization, I recommend that you do the following:

  • Define your goals: First, define your goals for data governance. What do you want to achieve by implementing it?
  • Identify your stakeholders: Next, identify who will be affected by data governance.
  • Assess your current state: Assess your current state of data governance. What are your strengths and weaknesses?
  • Develop a plan: Develop a plan for implementing data governance. It should cover the goals, stakeholders, and resources needed.
  • Implement the plan: Put the plan into practice. This may involve changes to your policies, procedures, and technology.
  • Monitor and improve: Finally, monitor and improve your data governance practices so that they stay effective and continue to meet your goals.

By following these steps, you can implement data governance in your organization and reap the benefits that it has to offer.

Data Lake Migration Strategy

Data lake migration is the process of moving data from a legacy data warehouse or data mart to a data lake. This can be a complex and challenging task, but it can be a valuable way to improve the efficiency and scalability of your data management.

There are three main data migration strategies:

  • Lift and shift: This is the simplest and cheapest strategy. It involves copying the data from the old system to the new system without any changes. This can be a good option if the old system is well-designed and the data is in good shape.
  • Replatform: This strategy involves transforming the data to fit the new system. This can be a more complex and expensive strategy, but it can be a good option if the old system is not well-designed or if the data needs to be cleaned up.
  • Refactor: This strategy involves redesigning the data architecture to take advantage of the new system. This can be the most complex and expensive strategy, but it can be a good option if you want to make significant changes to the way you manage your data.

The best data migration strategy for you will depend on your specific needs and requirements. If you are not sure which strategy is right for you, I recommend that you consult with a data migration expert.

Here are some of the factors to consider when choosing a data migration strategy:

  • The size and complexity of the data: The larger and more complex the data, the more complex the migration strategy will be.
  • The cost of the migration: The cost of the migration will depend on the size and complexity of the data, as well as the chosen strategy.
  • The time it takes to migrate the data: The time it takes to migrate the data will depend on the size and complexity of the data, as well as the chosen strategy.
  • The availability of the data during the migration: The data may not be available during the migration, so you need to make sure that you have a plan for how to manage this.
  • The risk of data loss or corruption: There is always a risk of data loss or corruption during a migration. You need to make sure that you have a plan for how to mitigate this risk.
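One common way to mitigate the risk of data loss or corruption, especially for a lift-and-shift copy, is to compare checksums of the source and the migrated files. The sketch below assumes both sides are reachable as local or mounted directories; the function names are hypothetical, and a data lake on object storage would more likely rely on the store's own checksum facilities.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_dir, target_dir):
    """Return (missing, corrupted): relative paths absent from or different in target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    missing, corrupted = [], []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = target_dir / rel
        if not dst.is_file():
            missing.append(str(rel))
        elif sha256_of(src) != sha256_of(dst):
            corrupted.append(str(rel))
    return missing, corrupted
```

An empty result for both lists is a reasonable gate before decommissioning the old system.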

Once you have chosen a data migration strategy, you need to develop a detailed plan. The plan should include the following:

  • The steps involved in the migration: The plan should include a detailed description of the steps involved in the migration.
  • The resources needed for the migration: The plan should identify the resources needed for the migration, such as hardware, software, and staff.
  • The timeline for the migration: The plan should specify the timeline for the migration.
  • The risks associated with the migration: The plan should identify the risks associated with the migration and how they will be mitigated.
  • The contingency plans: The plan should include contingency plans in case of unexpected problems.

By following these tips, you can increase your chances of success when migrating your data to a data lake.

Create an ODBC DSN using PowerShell

https://docs.microsoft.com/en-us/powershell/module/wdac/add-odbcdsn?view=windowsserver2019-ps


Why do cyber scams happen? - Social Engineering


Desi Cultural Circle (DCC)

Click Here : https://forms.gle/bjCvL53Hxsb1XPV36

Engineering

 https://github.com/T-Kuhn/HighPrecisionStepperJuggler


https://github.com/EdjeElectronics/OpenCV-Playing-Card-Detector

Docker - Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock

 

ERROR: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info: dial unix /var/run/docker.sock: connect: permission denied

Example 1: Got permission denied while trying to connect to the Docker daemon socket

sudo chmod 666 /var/run/docker.sock

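Example 2: Server: ERROR: Got permission denied while trying to connect to the Docker daemon socket

sudo groupadd docker
sudo usermod -aG docker ${USER}
newgrp docker

Log out and back in (or run newgrp docker as above) so that the new group membership takes effect. With the user in the docker group, sudo chmod 666 /var/run/docker.sock is not needed; it makes the socket writable by every local user, and the permissions are reset when the daemon restarts.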
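To see why group membership fixes the error, it helps to spell out the owner/group/other check that the kernel applies to the socket file. The helper below is a hypothetical sketch of that check (it ignores ACLs and capabilities). On a typical Linux install the socket is owned by root with group docker and mode 660, which is an assumption about your system rather than something this note verifies.

```python
import os
import stat


def can_read_write(path, uid, gids):
    """Apply the classic Unix owner/group/other check to `path`.

    `uid` and `gids` describe the user being tested. Only the basic
    permission bits are considered.
    """
    info = os.stat(path)
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == info.st_uid:
        wanted = stat.S_IRUSR | stat.S_IWUSR
    elif info.st_gid in gids:
        wanted = stat.S_IRGRP | stat.S_IWGRP
    else:
        wanted = stat.S_IROTH | stat.S_IWOTH
    return (info.st_mode & wanted) == wanted
```

Calling `can_read_write("/var/run/docker.sock", os.getuid(), os.getgroups())` before and after joining the docker group shows the difference. With mode 666 the last branch returns True for every user, which is why that fix is so permissive.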

Secure Kafka

An earlier blog covered practices for building a secure Hadoop cluster. In that blog I intentionally didn't mention Kafka's security, because the topic deserves a dedicated article. Now it's time for that, and this blog is devoted to Kafka security only.

Kafka Security challenges

1) Encryption in motion. By default you communicate with the Kafka cluster over an unsecured network, and anyone who can listen to the network between your client and the Kafka cluster can read the message content.

The way to avoid this is to use on-wire encryption: SSL/TLS. With SSL/TLS, data is encrypted on the wire between your client and the Kafka cluster.

[Diagram: communication without SSL/TLS]

[Diagram: SSL/TLS communication]

After you enable SSL/TLS communication, writing a message to or reading a message from the Kafka cluster follows its own sequence of steps.

[Diagram: sequence of steps for writing/reading a message over SSL/TLS]

2) Authentication. Now that the traffic between client and server is encrypted, there is another challenge: the server doesn't know who it is communicating with. In other words, you have to enable a mechanism that prevents unknown users from working with the cluster. The default authentication mechanism in the Hadoop world is the Kerberos protocol.

[Diagram: workflow for enabling secure communication with Kafka]

Kerberos is the trusted way to authenticate a user on the cluster and make sure that only known users can access it.

3) Authorization. Once users are authenticated on your cluster (and you know whether you are working as Bob or Alice), you may want to apply authorization rules, such as setting permissions for certain users or groups. In other words, you define what a user can and can't do. Sentry can help with this. Sentry's philosophy is that users belong to groups, groups have roles, and roles have permissions.

4) Encryption at rest. Another security aspect is encryption at rest: protecting data stored on disk. Kafka is not designed for long-term data storage, but it can hold data for days or even weeks. We have to make sure that data stored on the disks cannot be stolen and then read without the encryption key.

Security implementation. Step 1 - SSL/TLS

There is no strict sequence of steps for implementing security, but as a first step I recommend configuring SSL/TLS. As a baseline I took Cloudera's documentation. To keep your security setup organized, create a directory on your Linux machine for all the files (start with one machine; later you will need to do the same on the other Kafka servers):

$ sudo mkdir -p /opt/kafka/security

$ sudo chown -R kafka:kafka /opt/kafka/security

A Java KeyStore (JKS) is a repository of security certificates (either authorization certificates or public key certificates) plus the corresponding private keys, used for instance in SSL encryption. We need to generate a key pair (a public key and the associated private key). The keytool command does this and wraps the public key into an X.509 self-signed certificate, which is stored as a single-element certificate chain. The certificate chain and the private key are stored in a new keystore entry identified by the alias selfsigned.

# keytool -genkeypair -keystore keystore.jks -keyalg RSA -alias selfsigned -dname "CN=localhost" -storepass 'welcome2' -keypass 'welcome3'

If you want to check the content of the keystore, run the following command:

# keytool -list -v -keystore keystore.jks

...

Alias name: selfsigned

Creation date: May 30, 2018

Entry type: PrivateKeyEntry

Certificate chain length: 1

Certificate[1]:

Owner: CN=localhost

Issuer: CN=localhost

Serial number: 2065847b

Valid from: Wed May 30 12:59:54 UTC 2018 until: Tue Aug 28 12:59:54 UTC 2018

...
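Note the "Valid from ... until ..." line in the listing: the self-signed certificate is only valid for a limited period. The sketch below simply computes that period from the line shown above; the helper names are hypothetical, and the parser assumes both dates use the same time zone and an English locale.

```python
from datetime import datetime


def parse_keytool_date(text):
    """Parse a date such as 'Wed May 30 12:59:54 UTC 2018' from keytool output.

    The time zone token is dropped, so both dates must share the same zone.
    """
    parts = text.split()
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


def validity_days(valid_line):
    """Return the validity period in days from a 'Valid from: ... until: ...' line."""
    start_text, end_text = valid_line.split("Valid from:")[1].split("until:")
    return (parse_keytool_date(end_text) - parse_keytool_date(start_text)).days


line = "Valid from: Wed May 30 12:59:54 UTC 2018 until: Tue Aug 28 12:59:54 UTC 2018"
print(validity_days(line))  # 90
```

Ninety days is keytool's default when no validity is given. If the brokers should not need a new certificate that soon, the -validity option of keytool -genkeypair accepts a number of days.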

As the next step, extract a copy of the certificate from the Java keystore that was just created:

# keytool -export -alias selfsigned -keystore keystore.jks -rfc -file server.cer

Enter keystore password: welcome2

Then create a trust store by making a copy of the default Java trust store. The main difference between a trustStore and a keyStore is that the trustStore (as the name suggests) stores certificates from trusted Certificate Authorities (CAs), which are used to verify the certificate presented by the server in an SSL connection, while the keyStore stores the private key and the program's own identity certificate, which it presents to the other party (server or client) to prove its identity. In my case, on Big Data Cloud Service, I ran the following command:

# cp /usr/java/latest/jre/lib/security/cacerts /opt/kafka/security/truststore.jks

List the files created so far:

# ls -lrt

-rw-r--r-- 1 root root 113367 May 30 12:46 truststore.jks

-rw-r--r-- 1 root root   2070 May 30 12:59 keystore.jks

-rw-r--r-- 1 root root   1039 May 30 13:01 server.cer

Put the certificate that was just extracted from the keystore into the trust store (note: "changeit" is the default password):

# keytool -import -alias selfsigned -file server.cer -keystore truststore.jks -storepass changeit

Check the file size afterwards (it is bigger, because it now includes the new certificate):

# ls -lrt

-rw-r--r-- 1 root root   2070 May 30 12:59 keystore.jks

-rw-r--r-- 1 root root   1039 May 30 13:01 server.cer

-rw-r--r-- 1 root root 114117 May 30 13:06 truststore.jks

This may seem complicated, so I decided to depict all those steps in one diagram:

[Diagram: keystore, certificate export and truststore steps]

So far, all these steps have been performed on a single (arbitrary) broker machine. But you need the keystore and truststore files on each Kafka broker, so let's copy them (note: this syntax works on Big Data Appliance, Big Data Cloud Service or Big Data Cloud at Customer):

# dcli -C "mkdir -p /opt/kafka/security"

# dcli -C "chown kafka:kafka /opt/kafka/security"

# dcli -C -f /opt/kafka/security/keystore.jks -d /opt/kafka/security/keystore.jks

# dcli -C -f /opt/kafka/security/truststore.jks -d /opt/kafka/security/truststore.jks

After doing all these steps, you need to make some configuration changes in Cloudera Manager for each node (go to Cloudera Manager -> Kafka -> Configuration). In addition, on each node you have to change the listeners in "Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties".

Also make sure that security.inter.broker.protocol is set to SSL in Cloudera Manager. After the nodes restart and all brokers are up and running, let's test it:

# openssl s_client -debug -connect kafka1.us2.oraclecloud.com:9093 -tls1_2

...

Certificate chain

0 s:/CN=localhost

   i:/CN=localhost

---

Server certificate

-----BEGIN CERTIFICATE-----

MIICxzCCAa+gAwIBAgIEIGWEezANBgkqhkiG9w0BAQsFADAUMRIwEAYDVQQDEwls

b2NhbGhvc3QwHhcNMTgwNTMwMTI1OTU0WhcNMTgwODI4MTI1OTU0WjAUMRIwEAYD

VQQDEwlsb2NhbGhvc3QwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCI

53T82eoDR2e9IId40UPTj3xg3khl1jdjNvMiuB/vcI7koK0XrZqFzMVo6zBzRHnf

zaFBKPAQisuXpQITURh6jrVgAs1V4hswRPrJRjM/jCIx7S5+1INBGoEXk8OG+OEf

m1uYXfULz0bX9fhfl+IdKzWZ7jiX8FY5dC60Rx2RTpATWThsD4mz3bfNd3DlADw2

LH5B5GAGhLqJjr23HFjiTuoQWQyMV5Esn6WhOTPCy1pAkOYqX86ad9qP500zK9lA

hynyEwNHWt6GoHuJ6Q8A9b6JDyNdkjUIjbH+d0LkzpDPg6R8Vp14igxqxXy0N1Sd

DKhsV90F1T0whlxGDTZTAgMBAAGjITAfMB0GA1UdDgQWBBR1Gl9a0KZAMnJEvxaD

oY0YagPKRTANBgkqhkiG9w0BAQsFAAOCAQEAaiNdHY+QVdvLSILdOlWWv653CrG1

2WY3cnK5Hpymrg0P7E3ea0h3vkGRaVqCRaM4J0MNdGEgu+xcKXb9s7VrwhecRY6E

qN0KibRZPb789zQVOS38Y6icJazTv/lSxCRjqHjNkXhhzsD3tjAgiYnicFd6K4XZ

rQ1WiwYq1254e8MsKCVENthQljnHD38ZDhXleNeHxxWtFIA2FXOc7U6iZEXnnaOM

Cl9sHx7EaGRc2adIoE2GXFNK7BY89Ip61a+WUAOn3asPebrU06OAjGGYGQnYbn6k

4VLvneMOjksuLdlrSyc5MToBGptk8eqJQ5tyWV6+AcuwHkTAnrztgozatg==

-----END CERTIFICATE-----

subject=/CN=localhost

issuer=/CN=localhost

---

No client certificate CA names sent

Server Temp Key: ECDH, secp521r1, 521 bits

---

SSL handshake has read 1267 bytes and written 441 bytes

---

New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384

Server public key is 2048 bit

Secure Renegotiation IS supported

Compression: NONE

Expansion: NONE

SSL-Session:

    Protocol  : TLSv1.2

    Cipher    : ECDHE-RSA-AES256-GCM-SHA384

    Session-ID: 5B0EAC6CA8FB4B6EA3D0B4A494A4660351A4BD5824A059802E399308C0B472A4

    Session-ID-ctx:

    Master-Key: 60AE24480E2923023012A464D16B13F954A390094167F54CECA1BDCC8485F1E776D01806A17FB332C51FD310730191FE

    Key-Arg   : None

    Krb5 Principal: None

    PSK identity: None

    PSK identity hint: None

    Start Time: 1527688300

    Timeout   : 7200 (sec)

    Verify return code: 18 (self signed certificate)

Well, it seems our SSL connection is up and running. Time to try putting some messages into the topic:

# kafka-console-producer --broker-list kafka1.us2.oraclecloud.com:9093 --topic foobar

...

18/05/30 13:56:28 WARN clients.NetworkClient: Connection to node -1 could not be established. Broker may not be available.

18/05/30 13:56:28 WARN clients.NetworkClient: Connection to node -1 could not be established. Broker may not be available.

The reason for this error is that the client is not properly configured. We need to create and use client.properties and jaas.conf files.

# cat /opt/kafka/security/client.properties

security.protocol=SSL

ssl.truststore.location=/opt/kafka/security/truststore.jks

ssl.truststore.password=changeit

# cat /opt/kafka/security/jaas.conf

KafkaClient {

      com.sun.security.auth.module.Krb5LoginModule required

      useTicketCache=true;

    };

# export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/security/jaas.conf"

Now you can try again to produce messages:

# kafka-console-producer --broker-list kafka1.us2.oraclecloud.com:9093 --topic foobar --producer.config /opt/kafka/security/client.properties

...

Hello SSL world

No errors, which is already good! Let's try to consume the message:

# kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning  --consumer.config /opt/kafka/security/client.properties

...

Hello SSL world

Bingo! We have set up secure communication between the Kafka cluster and the Kafka client, and written a message through it.

Security implementation. Step 2 - Kerberos

So we have Kafka up and running on a Kerberized cluster, yet we can write and read data without a Kerberos ticket.

$ klist

klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1001)

This is not how it is supposed to work. We assume that if the cluster is protected by Kerberos, it is impossible to do anything without a ticket. Fortunately, it is relatively easy to configure communication with a Kerberized Kafka cluster.

First, make sure that you have enabled Kerberos authentication in Cloudera Manager (Cloudera Manager -> Kafka -> Configuration).

Second, go to Cloudera Manager again and change the value of "security.inter.broker.protocol" to SASL_SSL.

Note: Simple Authentication and Security Layer (SASL) is a framework for authentication and data security in Internet protocols. It decouples authentication mechanisms from application protocols, in theory allowing any authentication mechanism supported by SASL to be used in any application protocol that uses SASL. Very roughly, in this blog post you may think of SASL as equal to Kerberos.

After this change, you need to modify the listeners protocol on each broker (to SASL_SSL) in the "Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties" setting. You are then ready to restart the Kafka cluster and write data to and read data from it. Before doing this, you need to modify the Kafka client properties:

$ cat /opt/kafka/security/client.properties

security.protocol=SASL_SSL

sasl.kerberos.service.name=kafka

ssl.truststore.location=/opt/kafka/security/truststore.jks

ssl.truststore.password=changeit
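A missing line in client.properties tends to surface only as a confusing connection error, so a quick sanity check of the file can save time. The sketch below is a hypothetical helper, not part of Kafka: the "expected" keys are simply the ones this walkthrough sets for each protocol, not an official list of required settings.

```python
# Keys this walkthrough sets for each protocol; not an official Kafka rule set.
EXPECTED_KEYS = {
    "SSL": {"ssl.truststore.location", "ssl.truststore.password"},
    "SASL_SSL": {
        "ssl.truststore.location",
        "ssl.truststore.password",
        "sasl.kerberos.service.name",
    },
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_keys(text):
    """Return the expected keys that are absent for the configured protocol."""
    props = parse_properties(text)
    # Kafka clients default to PLAINTEXT when security.protocol is not set.
    protocol = props.get("security.protocol", "PLAINTEXT")
    return sorted(EXPECTED_KEYS.get(protocol, set()) - set(props))
```

For example, feeding it the SSL-only file from Step 1 after changing only security.protocol to SASL_SSL reports that sasl.kerberos.service.name is still missing.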

After this you can try to read data from the Kafka cluster:

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning  --consumer.config /opt/kafka/security/client.properties

...

Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner  authentication information from the user

...

The error may mislead you, but the real reason is the absence of a Kerberos ticket:

$ klist

klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1001)

$ kinit oracle

Password for oracle@BDACLOUDSERVICE.ORACLE.COM:

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning  --consumer.config /opt/kafka/security/client.properties

...

Hello SSL world

Great, it works! But now we have to run kinit every time before reading or writing data from the Kafka cluster. For convenience we can use a keytab instead. To do this, go to the KDC server and generate a keytab file there:

# kadmin.local

Authenticating as principal hdfs/admin@BDACLOUDSERVICE.ORACLE.COM with password.

kadmin.local: xst -norandkey -k testuser.keytab testuser

Entry for principal oracle with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:oracle.keytab.

Entry for principal oracle with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:oracle.keytab.

Entry for principal oracle with kvno 2, encryption type des3-cbc-sha1 added to keytab WRFILE:oracle.keytab.

Entry for principal oracle with kvno 2, encryption type arcfour-hmac added to keytab WRFILE:oracle.keytab.

Entry for principal oracle with kvno 2, encryption type des-hmac-sha1 added to keytab WRFILE:oracle.keytab.

Entry for principal oracle with kvno 2, encryption type des-cbc-md5 added to keytab WRFILE:oracle.keytab.

kadmin.local:  quit

# ls -l

...

-rw-------  1 root root    436 May 31 14:06 testuser.keytab

...

Now that we have a keytab file, we can copy it to the client machine and use it for Kerberos authentication. Don't forget to change the owner of the keytab file to the user who will run the script:

$ chown opc:opc /opt/kafka/security/testuser.keytab

Also, we will need to modify jaas.conf file:

$ cat /opt/kafka/security/jaas.conf

KafkaClient {

      com.sun.security.auth.module.Krb5LoginModule required

      useKeyTab=true

      keyTab="/opt/kafka/security/testuser.keytab"

      principal="testuser@BDACLOUDSERVICE.ORACLE.COM";    

};

It seems we are fully ready to consume messages from the topic. Even though oracle is our Kerberos principal on the OS, we connect to the cluster as testuser (according to jaas.conf):

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning  --consumer.config /opt/kafka/security/client.properties

...

18/05/31 15:04:45 INFO authenticator.AbstractLogin: Successfully logged in.

18/05/31 15:04:45 INFO kerberos.KerberosLogin: [Principal=testuser@BDACLOUDSERVICE.ORACLE.COM]: TGT refresh thread started.

...

Hello SSL world

Security implementation. Step 3 - Sentry

In the previous step we configured authentication, which answers the question "who am I?". Now it is time to set up an authorization mechanism, which answers the question "what am I allowed to do?". Sentry has become a very popular engine in the Hadoop world and we will use it for Kafka authorization. As I mentioned earlier, Sentry's philosophy is that users belong to groups, groups have roles, and roles have permissions.

We need to follow this model with Kafka as well, but we start with some service configuration (Cloudera Manager -> Kafka -> Configuration).

It is also very important to add the user kafka to "sentry.service.admin.group" in the Sentry configuration (Cloudera Manager -> Sentry -> Config).

Now that we know who connects to the cluster, we can restrict them from reading particular topics (in other words, perform authorization).

Note: to perform administrative operations with Sentry, you have to work as the kafka user.

$ id

uid=1001(opc) gid=1005(opc) groups=1005(opc)

$ sudo find /var -name "kafka*keytab" -printf "%T+\t%p\n" | sort | tail -1 | cut -f 2

/var/run/cloudera-scm-agent/process/1171-kafka-KAFKA_BROKER/kafka.keytab

$ sudo cp /var/run/cloudera-scm-agent/process/1171-kafka-KAFKA_BROKER/kafka.keytab /opt/kafka/security/kafka.keytab

$ sudo chown opc:opc /opt/kafka/security/kafka.keytab

obtain Kafka ticket:

$ kinit -kt /opt/kafka/security/kafka.keytab kafka/`hostname`

$ klist

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: kafka/kafka1.us2.oraclecloud.com@BDACLOUDSERVICE.ORACLE.COM

 

Valid starting     Expires            Service principal

05/31/18 15:52:28  06/01/18 15:52:28  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/05/18 15:52:28

Before configuring and testing Sentry with Kafka, we need to create an unprivileged user to whom we will give grants (the kafka user is privileged and bypasses Sentry). There are a few simple steps. First, create the unprivileged test user on each Hadoop node (this syntax works on Big Data Appliance, Big Data Cloud Service and Big Data Cloud at Customer):

# dcli -C "useradd testsentry -u 1011"

Remember that Sentry relies heavily on groups, so we have to create one and put the "testsentry" user in it:

# dcli -C "groupadd testsentry_grp -g 1017"

After the group has been created, put the user in it:

# dcli -C "usermod -g testsentry_grp testsentry"

Check that everything is as we expect:

# dcli -C "id testsentry"

10.196.64.44: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.60: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.64: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.65: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.61: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

Note: the user ID and group ID must be the same on each machine. Now verify that Hadoop can look up the group:

# hdfs groups testsentry

testsentry : testsentry_grp

You have to perform all these steps as root. Next, create a testsentry principal in the KDC (it's not mandatory, but it is more organized and easier to understand). Go to the KDC host and run the following commands:

# kadmin.local 

Authenticating as principal root/admin@BDACLOUDSERVICE.ORACLE.COM with password. 

kadmin.local:  addprinc testsentry

WARNING: no policy specified for testsentry@BDACLOUDSERVICE.ORACLE.COM; defaulting to no policy

Enter password for principal "testsentry@BDACLOUDSERVICE.ORACLE.COM": 

Re-enter password for principal "testsentry@BDACLOUDSERVICE.ORACLE.COM": 

Principal "testsentry@BDACLOUDSERVICE.ORACLE.COM" created.

kadmin.local:  xst -norandkey -k testsentry.keytab testsentry

Entry for principal testsentry with kvno 1, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type des3-cbc-sha1 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type arcfour-hmac added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type des-hmac-sha1 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type des-cbc-md5 added to keytab WRFILE:testsentry.keytab.

Now everything is set up for the unprivileged user, and it is time to start configuring Sentry policies. Since kafka is a superuser, we run the admin commands that manage Sentry settings as the kafka user. To obtain Kafka credentials, run:

$ kinit -kt /opt/kafka/security/kafka.keytab kafka/`hostname`

$ klist 

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: kafka/kafka1.us2.oraclecloud.com@BDACLOUDSERVICE.ORACLE.COM

 

Valid starting     Expires            Service principal

06/15/18 01:37:53  06/16/18 01:37:53  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/20/18 01:37:53

First we need to create role. Let's call it testsentry_role:

$ kafka-sentry -cr -r testsentry_role

Let's check that the role has been created:

$ kafka-sentry -lr

...

admin_role

testsentry_role

Once the role is created, we need to give it some permissions on a certain topic:

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Topic=testTopic->action=write"

and also describe and read:

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Topic=testTopic->action=describe"

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Topic=testTopic->action=read"

Next, we have to allow a consumer group to read from and describe this topic:

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Consumergroup=testconsumergroup->action=read"

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Consumergroup=testconsumergroup->action=describe"

The next step is linking the role and the group: we assign testsentry_role to testsentry_grp (the group automatically inherits all of the role's permissions):

$ kafka-sentry -arg -r testsentry_role -g testsentry_grp

after this, let's check that our mapping worked fine:

$ kafka-sentry -lr -g testsentry_grp

...

testsentry_role

Now let's review the list of permissions that our role has:

$ kafka-sentry -r testsentry_role -lp

...

HOST=*->CONSUMERGROUP=testconsumergroup->action=read

HOST=*->TOPIC=testTopic->action=write

HOST=*->TOPIC=testTopic->action=describe

HOST=*->TOPIC=testTopic->action=read

It's also very important to have the consumer group in the client properties file:

$ cat /opt/kafka/security/client.properties

security.protocol=SASL_SSL

sasl.kerberos.service.name=kafka

ssl.truststore.location=/opt/kafka/security/truststore.jks

ssl.truststore.password=changeit

group.id=testconsumergroup

Once everything is set, switch to the testsentry user for testing:

$ kinit -kt /opt/kafka/security/testsentry.keytab testsentry

$ klist 

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: testsentry@BDACLOUDSERVICE.ORACLE.COM

 

Valid starting     Expires            Service principal

06/15/18 01:38:49  06/16/18 01:38:49  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/22/18 01:38:49

test writes:

$ kafka-console-producer --broker-list kafka1.us2.oraclecloud.com:9093 --topic testTopic --producer.config /opt/kafka/security/client.properties

...

> testmessage1

> testmessage2

>

It seems everything is OK. Now let's test a read:

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic testTopic --from-beginning  --consumer.config /opt/kafka/security/client.properties

...

testmessage1

testmessage2

Now, to show Sentry in action, I'll try to read messages from another topic, which is outside the topics allowed for our test group:

$ kafka-console-consumer --from-beginning --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --consumer.config /opt/kafka/security/client.properties

...

18/06/15 02:54:54 INFO internals.AbstractCoordinator: (Re-)joining group testconsumergroup

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 13 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 15 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 16 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 17 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

So, as we can see, we cannot read from a topic that we are not authorized to read.

To systematize all this, I'd like to put the user-group-role-privileges flow in one picture:

[Diagram: user -> group -> role -> privileges flow]

I'd also like to summarize the steps required to get the list of privileges for a certain user (testsentry in my example):

// Run as superuser - Kafka

$ kinit -kt /opt/kafka/security/kafka.keytab kafka/`hostname`

$ klist 

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: kafka/cfclbv3872.us2.oraclecloud.com@BDACLOUDSERVICE.ORACLE.COM

 

Valid starting     Expires            Service principal

06/19/18 02:38:26  06/20/18 02:38:26  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/24/18 02:38:26

// Get the list of groups the user belongs to

$ hdfs groups testsentry

testsentry : testsentry_grp

// Get the list of roles for the group

$ kafka-sentry -lr -g testsentry_grp

...

 

testsentry_role

// Get the list of permissions for the role

$ kafka-sentry -r testsentry_role -lp

...

HOST=*->CONSUMERGROUP=testconsumergroup->action=read

HOST=*->TOPIC=testTopic->action=describe

HOST=*->TOPIC=testTopic->action=write

HOST=*->TOPIC=testTopic->action=read

HOST=*->CONSUMERGROUP=testconsumergroup->action=describe

Based on what we saw above, our user testsentry can read from and write to the topic testTopic. To read data, the consumer has to belong to the consumer group "testconsumergroup".
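The user -> group -> role -> privilege lookup summarized above can be sketched as a toy in-memory model. This is not Sentry's API: the dictionaries just mirror the testsentry example, the function names are hypothetical, and the Host part of a privilege is parsed but not evaluated (every privilege here uses Host=*).

```python
# Toy in-memory model; the data mirrors the example, the functions are hypothetical.
USER_GROUPS = {"testsentry": ["testsentry_grp"]}
GROUP_ROLES = {"testsentry_grp": ["testsentry_role"]}
ROLE_PRIVILEGES = {
    "testsentry_role": [
        "HOST=*->CONSUMERGROUP=testconsumergroup->action=read",
        "HOST=*->TOPIC=testTopic->action=describe",
        "HOST=*->TOPIC=testTopic->action=write",
        "HOST=*->TOPIC=testTopic->action=read",
        "HOST=*->CONSUMERGROUP=testconsumergroup->action=describe",
    ]
}


def parse_privilege(text):
    """Split 'HOST=*->TOPIC=t->action=read' into a dict with lower-case keys."""
    return {k.lower(): v for k, v in (part.split("=", 1) for part in text.split("->"))}


def privileges_for(user):
    """Walk user -> groups -> roles -> privileges and return the parsed privileges."""
    found = []
    for group in USER_GROUPS.get(user, []):
        for role in GROUP_ROLES.get(group, []):
            found.extend(parse_privilege(p) for p in ROLE_PRIVILEGES.get(role, []))
    return found


def is_allowed(user, resource_type, name, action):
    """True if some privilege of the user matches the resource and the action."""
    return any(
        p.get(resource_type) == name and p.get("action") == action
        for p in privileges_for(user)
    )


print(is_allowed("testsentry", "topic", "testTopic", "write"))  # True
print(is_allowed("testsentry", "topic", "foobar", "read"))      # False
```

The second call corresponds to the failed read from foobar shown earlier: no role reachable from testsentry carries a privilege on that topic.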

Security implementation. Step 4 - Encryption at rest

The last part of the security journey is encryption of the data stored on disk. There are multiple ways to do this; one of the most common is Navigator Encrypt.