PowerShell Useful Commands

Domain Name


Start -> Run -> CMD, then run nslookup interactively:

nslookup
set type=all
_ldap._tcp.dc._msdcs.DOMAIN_NAME
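
The same SRV lookup can also be done non-interactively, which is handy in scripts (replace DOMAIN_NAME with your Active Directory domain):

nslookup -type=SRV _ldap._tcp.dc._msdcs.DOMAIN_NAME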

MySQL - Adding Remote User

To add a MySQL user with remote access to the database you have to:
  • bind the MySQL service to an external IP address on the server
  • add a MySQL user for remote connections
  • grant the user permissions to access the database
In order to connect remotely, MySQL must bind port 3306 to your server’s external IP.
Edit my.cnf:

#Replace xxx with your IP Address
bind-address = xxx.xxx.xxx.xxx
Restart MySQL after changing the config. If you don’t have a firewall enabled, external clients should now be able to reach the MySQL service.

Now create the user for both localhost and the ‘%’ wildcard host, and grant permissions on all databases. Open mysql and run:

CREATE USER 'myuser'@'localhost' IDENTIFIED BY 'mypass';
CREATE USER 'myuser'@'%' IDENTIFIED BY 'mypass';
Then

GRANT ALL ON *.* TO 'myuser'@'localhost';
GRANT ALL ON *.* TO 'myuser'@'%';
This lets myuser access all databases, both locally and from external sources. Depending on your OS you may also have to open port 3306 to allow remote connections; if so, look at your firewall (iptables on Linux) configuration.
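
For example, on a Linux server using iptables, opening the port and then testing from a remote machine might look like this (203.0.113.10 stands in for your server’s IP):

# on the server: allow inbound MySQL traffic
iptables -A INPUT -p tcp --dport 3306 -j ACCEPT

# from a remote client: verify the connection works
mysql -h 203.0.113.10 -u myuser -p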

Databricks

Databricks is a unified analytics platform that helps organizations to solve their most challenging data problems. It is a cloud-based platform that provides a single environment for data engineering, data science, and machine learning.

Databricks offers a wide range of features and capabilities, including:

  • Apache Spark: Databricks is built on Apache Spark, a unified analytics engine for large-scale data processing.
  • Delta Lake: Delta Lake is an open-source storage layer that brings ACID transactions, schema enforcement, and time travel (data versioning) to data lakes.
  • MLflow: MLflow is an open source platform for managing the end-to-end machine learning lifecycle.
  • Workspaces: Databricks Workspaces provide a secure and collaborative environment for data scientists and engineers to work together.
  • Notebooks: Databricks Notebooks are a powerful tool for data exploration, analysis, and visualization.
  • Jobs: Databricks Jobs are a way to automate data pipelines and workflows.
  • Monitoring: Databricks provides a comprehensive monitoring dashboard that provides visibility into your data and workloads.
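
As a quick taste of driving the platform from a shell, the (legacy) Databricks CLI can list and trigger jobs. A minimal sketch, assuming you already have a workspace URL and a personal access token; 1234 is a hypothetical job ID:

pip install databricks-cli

# authenticate against your workspace (prompts for host and token)
databricks configure --token

# list configured jobs, then trigger one by ID
databricks jobs list
databricks jobs run-now --job-id 1234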

Databricks is a popular choice for organizations of all sizes, from startups to some of the world's largest companies.

Here are some of the benefits of using Databricks:

  • Speed: Databricks can help you to process large amounts of data quickly and efficiently.
  • Scalability: Databricks is scalable, so you can easily add more resources as your needs grow.
  • Ease of use: Databricks is easy to use, even for non-technical users.
  • Collaboration: Databricks provides a collaborative environment for data scientists and engineers to work together.
  • Security: Databricks is secure, so you can be confident that your data is safe.

If you are looking for a unified analytics platform that can help you to solve your most challenging data problems, then Databricks is a good choice.

Here are some of the use cases for Databricks:

  • Data engineering: Databricks can be used to build and manage data pipelines.
  • Data science: Databricks can be used to develop and deploy machine learning models.
  • Business intelligence: Databricks can be used to create interactive dashboards and reports.
  • Regulatory compliance: Databricks can be used to help organizations comply with regulations, such as GDPR and CCPA.
  • Research: Databricks can be used to conduct research and analysis on large datasets.

If you are interested in learning more about Databricks, I recommend that you visit the Databricks website.

Data Catalog

A data catalog is a system that collects and organizes metadata about data assets. It provides a central repository for information about the data, such as its source, format, and usage. Data catalogs can be used to help people find and use the data they need, and to improve the overall management of data assets.

Here are some of the benefits of using a data catalog:

  • Improved data discovery: Data catalogs can help people find the data they need by providing a central repository for information about the data. This can save time and effort, and it can help to ensure that people are using the most accurate and up-to-date data.
  • Increased data usability: Data catalogs can make data more usable by providing information about the data's format, lineage, and quality. This can help people to understand the data and to use it more effectively.
  • Improved data governance: Data catalogs can help to improve data governance by providing information about the data's ownership, access control, and security. This can help to ensure that the data is managed in a secure and compliant manner.
  • Reduced data duplication: Data catalogs can help to reduce data duplication by providing information about the data's location and usage. This can help to prevent people from creating duplicate copies of the data.
  • Improved data quality: Data catalogs can help to improve data quality by providing information about the data's lineage and quality. This can help to identify and correct errors in the data.

There are two main types of data catalogs:

  • Enterprise data catalogs: These are designed to be used by entire organizations. They typically store metadata about all of the data assets in the organization.
  • Self-service data catalogs: These are designed to be used by individual users or teams. They typically store metadata about the data assets that are relevant to the user or team.

Data catalogs can be implemented using a variety of technologies; in the Hadoop ecosystem, for example, the Hive Metastore acts as a catalog of table metadata that engines such as Hive and Spark query. The best technology for your organization will depend on your specific needs and requirements.
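
As a small illustration, on a cluster with the Hive client installed you can pull a table's catalog metadata (location, format, owner) straight from the metastore (mydb.mytable is a hypothetical table name):

hive -e "DESCRIBE FORMATTED mydb.mytable"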

If you are considering implementing a data catalog in your organization, I recommend that you do the following:

  • Define your goals: What do you want to achieve by implementing a data catalog?
  • Identify your stakeholders: Who will be using the data catalog?
  • Assess your current state: What are the strengths and weaknesses of your current data management?
  • Develop a plan: The plan should cover the goals, stakeholders, and resources needed for the data catalog.
  • Implement the plan: This may involve making changes to your policies, procedures, and technology.
  • Monitor and improve: Ongoing monitoring will help you ensure that the data catalog is effective and that it meets your goals.

By following these steps, you can implement a data catalog in your organization and reap the benefits that it has to offer.

KERBEROS - ACL Example

Here is an example of a kadm5.acl file:

*/admin@ATHENA.MIT.EDU    *                               # line 1
joeadmin@ATHENA.MIT.EDU   ADMCIL                          # line 2
joeadmin/*@ATHENA.MIT.EDU i   */root@ATHENA.MIT.EDU       # line 3
*/root@ATHENA.MIT.EDU     ci  *1@ATHENA.MIT.EDU           # line 4
*/root@ATHENA.MIT.EDU     l   *                           # line 5
sms@ATHENA.MIT.EDU        x   * -maxlife 9h -postdateable # line 6

(line 1) Any principal in the ATHENA.MIT.EDU realm with an admin instance has all administrative privileges except extracting keys.

(lines 1-3) The user joeadmin has all permissions except extracting keys with his admin instance, joeadmin/admin@ATHENA.MIT.EDU (matches line 1). He has no permissions at all with his null instance, joeadmin@ATHENA.MIT.EDU (matches line 2). His root and other non-admin, non-null instances (e.g., extra or dbadmin) have inquire permissions with any principal that has the instance root (matches line 3).

(line 4) Any root principal in ATHENA.MIT.EDU can inquire or change the password of their null instance, but not any other null instance. (Here, *1 denotes a back-reference to the component matching the first wildcard in the actor principal.)

(line 5) Any root principal in ATHENA.MIT.EDU can generate the list of principals in the database, and the list of policies in the database. This line is separate from line 4, because list permission can only be granted globally, not to specific target principals.

(line 6) Finally, the Service Management System principal sms@ATHENA.MIT.EDU has all permissions except extracting keys, but any principal that it creates or modifies will not be able to get postdateable tickets or tickets with a life of longer than 9 hours.
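
After editing kadm5.acl (and restarting kadmind so the change takes effect), a quick sanity check is kadmin's getprivs command, which reports the privileges the admin server grants your current principal. The output line below is illustrative:

kadmin -p joeadmin/admin@ATHENA.MIT.EDU
kadmin: getprivs
current privileges: GET ADD MODIFY DELETE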


Importance of change advisory board (CAB)


A change advisory board (CAB) is a group of people who meet regularly to review and approve changes to an organization's IT infrastructure. The CAB helps to ensure that changes are made in a controlled and orderly manner, and that they do not impact the business negatively.

The importance of a CAB can be summarized as follows:

  • Ensures that changes are reviewed and approved by a group of experts: The CAB typically includes representatives from different areas of the organization, such as IT, business, and operations. This ensures that changes are reviewed from all angles and that any potential risks are identified and mitigated.
  • Provides a forum for communication and collaboration: The CAB provides a forum for stakeholders to discuss changes and to reach consensus on the best course of action. This helps to ensure that changes are implemented smoothly and that everyone is on the same page.
  • Helps to improve the quality of changes: The CAB can help to ensure that changes are well-planned, well-tested, and documented. This helps to reduce the risk of errors and problems.
  • Helps to improve the efficiency of change management: The CAB can help to streamline the change management process and to identify opportunities for improvement. This can help to save time and money.
  • Helps to build trust and credibility: The CAB can help to build trust and credibility between IT and the business. This is important for ensuring that changes are supported by the business and that they are implemented successfully.

Overall, the CAB is an important part of any organization's change management process. By ensuring that changes are reviewed and approved by a group of experts, the CAB helps to improve the quality, efficiency, and success of changes.

Here are some of the benefits of having a CAB:

  • Improved decision-making: The CAB can help to improve decision-making by providing a forum for discussion and debate. This can help to ensure that all perspectives are considered and that the best possible decision is made.
  • Increased visibility: The CAB can help to increase visibility of changes by providing a forum for communication and collaboration. This can help to ensure that everyone is aware of changes and that they are implemented smoothly.
  • Reduced risk: The CAB can help to reduce risk by identifying and mitigating potential problems. This can help to prevent changes from impacting the business negatively.
  • Improved efficiency: The CAB can help to improve efficiency by streamlining the change management process. This can help to save time and money.
  • Increased compliance: The CAB can help to ensure that changes comply with all relevant regulations. This can help to protect the organization from fines and penalties.

If you are considering implementing a CAB in your organization, I recommend that you do the following:

  • Define the scope of the CAB: This will determine who should be involved and what issues should be discussed.
  • Identify the members of the CAB: The members should be experts from different areas of the organization, such as IT, business, and operations.
  • Establish the meeting schedule: The CAB should meet regularly to review and approve changes. The meeting schedule should be agreed upon by all members.
  • Develop the meeting agenda: The CAB should have a clear agenda for each meeting. This will help to ensure that the meeting is productive.
  • Document the decisions: The decisions of the CAB should be documented. This will help to ensure that everyone is aware of the decisions that have been made.

By following these steps, you can ensure that your CAB is successful.

Installing Apache Zeppelin on a Hadoop Cluster

Apache Zeppelin (https://zeppelin.incubator.apache.org/) is a web-based notebook that enables interactive data analytics. You can make data-driven, interactive, and collaborative documents with SQL, Scala, and more.


This document describes the steps you can take to install Apache Zeppelin on a CentOS 7 Machine.


Steps

Note: Run all the commands as Root


Configure the Environment

Install Maven (If not already done)

cd /tmp/

wget https://archive.apache.org/dist/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz

tar xzf apache-maven-3.1.1-bin.tar.gz -C /usr/local

cd /usr/local

ln -s apache-maven-3.1.1 maven

Configure Maven (If not already done)

#Run the following

export M2_HOME=/usr/local/maven

export M2=${M2_HOME}/bin

export PATH=${M2}:${PATH}

Note: If you log in as a different user or log out, these settings will be wiped out, so you won’t be able to run any mvn commands. To prevent this, you can append these export statements to the end of your ~/.bashrc file:


#append the export statements

vi ~/.bashrc

#apply the export statements

source ~/.bashrc
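
A non-interactive way to do the same append (a sketch; adjust the paths if your Maven lives elsewhere):

cat >> ~/.bashrc <<'EOF'
export M2_HOME=/usr/local/maven
export M2=${M2_HOME}/bin
export PATH=${M2}:${PATH}
EOF

source ~/.bashrc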


Install NodeJS


Note: Steps referenced from https://nodejs.org/en/download/package-manager/


curl --silent --location https://rpm.nodesource.com/setup_5.x | bash -


yum install -y nodejs

Install Dependencies

Note: Used for Zeppelin Web App


yum install -y bzip2 fontconfig

Install Apache Zeppelin

Select the version you would like to install

View the available releases and select the latest:


https://github.com/apache/zeppelin/releases


Override the {APACHE_ZEPPELIN_VERSION} placeholder with the value you would like to use.



Download Apache Zeppelin

cd /opt/

wget https://github.com/apache/zeppelin/archive/{APACHE_ZEPPELIN_VERSION}.zip

unzip {APACHE_ZEPPELIN_VERSION}.zip

ln -s /opt/zeppelin-{APACHE_ZEPPELIN_VERSION-WITHOUT_V_INFRONT} /opt/zeppelin

rm {APACHE_ZEPPELIN_VERSION}.zip
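
For example, if you picked the v0.6.2 release (an illustrative choice), the steps become:

cd /opt/

wget https://github.com/apache/zeppelin/archive/v0.6.2.zip

unzip v0.6.2.zip

ln -s /opt/zeppelin-0.6.2 /opt/zeppelin

rm v0.6.2.zip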

Get Build Variable Values

Get Spark Version

Run the following command:


spark-submit --version

Override the {SPARK_VERSION} placeholder with this value.


Example: 1.6.0


Get Hadoop Version

Run the following command:


hadoop version

Override the {HADOOP_VERSION} placeholder with this value.


Example: 2.6.0-cdh5.9.0


Take this value and get the major and minor version of Hadoop. Override the {SIMPLE_HADOOP_VERSION} placeholder with this value.


Example: 2.6


Build Apache Zeppelin

Update the below placeholders and run:


cd /opt/zeppelin

mvn clean package -Pspark-{SPARK_VERSION} -Dhadoop.version={HADOOP_VERSION} -Phadoop-{SIMPLE_HADOOP_VERSION} -Pvendor-repo -DskipTests

Note: this process will take a while
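
Using the example values above (Spark 1.6.0 on CDH 5.9.0), the build command would look roughly like this (note that Zeppelin's Spark build profiles use only the major.minor version):

cd /opt/zeppelin

mvn clean package -Pspark-1.6 -Dhadoop.version=2.6.0-cdh5.9.0 -Phadoop-2.6 -Pvendor-repo -DskipTests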


 


Configure Apache Zeppelin

Base Zeppelin Configuration

Setup Conf

cd /opt/zeppelin/conf/

cp zeppelin-env.sh.template zeppelin-env.sh

cp zeppelin-site.xml.template zeppelin-site.xml

Setup Hive Conf

# note: verify that the path to your hive-site.xml is correct

ln -s /etc/hive/conf/hive-site.xml /opt/zeppelin/conf/

Edit zeppelin-env.sh

Uncomment export HADOOP_CONF_DIR

Set it to export HADOOP_CONF_DIR="/etc/hadoop/conf"
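
If you prefer not to edit the file by hand, appending the line also works (a sketch; it assumes HADOOP_CONF_DIR isn't already set elsewhere in the file):

echo 'export HADOOP_CONF_DIR="/etc/hadoop/conf"' >> /opt/zeppelin/conf/zeppelin-env.sh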


Starting/Stopping Apache Zeppelin

Start Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh start

Restart Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh restart

Stop Zeppelin

/opt/zeppelin/bin/zeppelin-daemon.sh stop

Viewing Web UI

Once the Zeppelin process is running, you can view the Web UI by opening a web browser and navigating to:


http://{HOST}:8080/


Note: Network rules will need to allow this communication
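
To confirm the daemon is actually serving, you can check from the host itself, which sidesteps any firewall variables (the first response line should report an HTTP status):

curl -sI http://localhost:8080/ | head -1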


Runtime Apache Zeppelin Configuration

Further configuration may be needed for certain operations to work


Configure Hive in Zeppelin

Open Cloudera Manager and get the public hostname of the machine that has the HiveServer2 role. Identify this as HIVESERVER2_HOST

Open the Web UI and click the Interpreter tab

Change the Hive default.url option to: jdbc:hive2://{HIVESERVER2_HOST}:10000
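
Before saving the interpreter setting, it can be worth verifying the JDBC endpoint from a shell on the cluster. A sketch using the Beeline client that ships with Hive:

beeline -u "jdbc:hive2://{HIVESERVER2_HOST}:10000"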


How to check the MD5 checksum of a downloaded file

Issue:

You would like to verify the integrity of your downloaded files.

Solution:

WINDOWS:

Download the latest version of WinMD5Free.

Extract the downloaded zip and launch the WinMD5.exe file.

Click on the Browse button, navigate to the file that you want to check and select it.

Just as you select the file, the tool will show you its MD5 checksum.

Copy and paste the original MD5 value provided by the developer or the download page.

Click on the Verify button.

MAC:

Download the file you want to check and open the download folder in Finder.

Open the Terminal, from the Applications / Utilities folder.

Type md5 followed by a space. Do not press Enter yet.

Drag the downloaded file from the Finder window into the Terminal window.

Press Enter and wait a few moments.

The MD5 hash of the file is displayed in the Terminal.

Open the checksum file provided on the Web page where you downloaded your file from.

The file usually has a .cksum extension.

NOTE: The file should contain the MD5 sum of the downloaded file. For example: md5sum: 25d422cc23b44c3bbd7a66c76d52af46

Compare the MD5 hash in the checksum file to the one displayed in the Terminal.

If they are exactly the same, your file was downloaded successfully. Otherwise, download your file again.


LINUX:

Open a terminal window.

Type the following command: md5sum [path to the file, including its name and extension] -- NOTE: You can also drag the file to the terminal window instead of typing the full path.

Hit the Enter key.

You’ll see the MD5 sum of the file. 

Match it against the original value.
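
If the download page publishes a checksum file, md5sum can do the comparison for you. A sketch assuming a download named myfile.iso and a vendor-provided myfile.iso.md5 in the standard "hash  filename" format:

md5sum myfile.iso

# or let md5sum compare against the published value directly
md5sum -c myfile.iso.md5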

Cannot Login to Cloudera Manager with LDAP/LDAPS Enabled

Summary

After changing ‘Authentication Backend Order’ to external, users cannot log in. This guide explains how to revert to the default behaviour of authenticating through the database first.

Symptoms

Users cannot log in to Cloudera Manager

Conditions

Cloudera Manager boots up

Login page accessible through the browser

External authentication is enabled (LDAP, LDAP with TLS = LDAPS)

Authentication Backend Order was changed to external authentication.

Cause

Cloudera Manager tries to connect to LDAP if auth_backend_order is set to external only, or to external then database. A misconfiguration of LDAP or external authentication leaves the Cloudera Manager Server unable to map user credentials appropriately.

Instructions

Please follow the instructions below to fix this.

Note: Take a backup of the SCM database first [0]

By deleting the auth_backend_order config, Cloudera Manager falls back to the DB_ONLY auth backend and will not try to connect to the LDAP server.

Step 1: 

Stop the Cloudera Manager server

$ sudo service cloudera-scm-server stop

Confirm that auth_backend_order is set to something other than the default, i.e. not DB_ONLY and not empty.


Step 2:

Run this query in the Cloudera Manager schema to reset the Authentication Backend Order configuration:

Connect to the MySQL DB:

mysql -u root -p

mysql> use scm;

mysql> select ATTR, VALUE from CONFIGS where ATTR = 'auth_backend_order';

Delete the auth_backend_order attribute from the Cloudera Manager database (this will revert to the default behavior):

mysql> delete from CONFIGS where ATTR = 'auth_backend_order' and SERVICE_ID is null;
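
To confirm the removal, re-run the earlier select; it should now return an empty set:

mysql> select ATTR, VALUE from CONFIGS where ATTR = 'auth_backend_order';
Empty set (0.00 sec)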


Step 3:

Start the Cloudera Manager server

$ sudo service cloudera-scm-server start


Now try to log in with the admin user.


Reference

https://www.devopsbaba.com/cannot-login-to-cloudera-manager-with-ldap-ldaps-enabled/


Linux Administration Commands

As a system administrator, you may want to know who is on the system at any given point in time. You may also want to know what they are doing. In this article let us review 4 different methods to identify who is on your Linux system.

1. Get the running processes of logged-in user using w

The w command is used to show logged-in user names and what they are doing. The information is read from the /var/run/utmp file. The output of the w command contains the following columns:
  • Name of the user
  • User’s machine number or tty number
  • Remote machine address
  • User’s Login time
  • Idle time (not usable time)
  • Time used by all processes attached to the tty (JCPU time)
  • Time used by the current process (PCPU time)
  • Command currently getting executed by the users
 
The following options can be used with the w command:
  • -h Don’t print the header
  • -u Ignore the username while figuring out the current process and CPU times
  • -s Use the short format (omits the login time, JCPU, and PCPU)

$ w
 23:04:27 up 29 days,  7:51,  3 users,  load average: 0.04, 0.06, 0.02
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
ramesh   pts/0    dev-db-server        22:57    8.00s  0.05s  0.01s sshd: ramesh [priv]
jason    pts/1    dev-db-server        23:01    2:53   0.01s  0.01s -bash
john     pts/2    dev-db-server        23:04    0.00s  0.00s  0.00s w

$ w -h
ramesh   pts/0    dev-db-server        22:57   17:43   2.52s  0.01s sshd: ramesh [priv]
jason    pts/1    dev-db-server        23:01   20:28   0.01s  0.01s -bash
john     pts/2    dev-db-server        23:04    0.00s  0.03s  0.00s w -h

$ w -u
 23:22:06 up 29 days,  8:08,  3 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
ramesh   pts/0    dev-db-server        22:57   17:47   2.52s  2.49s top
jason    pts/1    dev-db-server        23:01   20:32   0.01s  0.01s -bash
john     pts/2    dev-db-server        23:04    0.00s  0.03s  0.00s w -u

$ w -s
 23:22:10 up 29 days,  8:08,  3 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM               IDLE WHAT
ramesh   pts/0    dev-db-server        17:51  sshd: ramesh [priv]
jason    pts/1    dev-db-server        20:36  -bash
john     pts/2    dev-db-server         1.00s w -s

2. Get the user name and processes of logged-in users using the who and users commands

The who command is used to get the list of usernames that are currently logged in. The output of the who command contains the following columns: user name, tty number, date and time, machine address.
$ who
ramesh pts/0        2009-03-28 22:57 (dev-db-server)
jason  pts/1        2009-03-28 23:01 (dev-db-server)
john   pts/2        2009-03-28 23:04 (dev-db-server)
 
To get a list of all usernames that are currently logged in, use the following:
$ who | cut -d' ' -f1 | sort | uniq
john
jason
ramesh

Users Command

The users command prints the names of the users currently logged in to the current host. It is one of those commands that has no options other than help and version. If a user has ‘n’ terminals open, that user name will be shown ‘n’ times in the output.
$ users
john jason ramesh

3. Get the username you are currently logged in as using whoami

The whoami command is used to print the logged-in user name.
$ whoami
john
 
The whoami command gives the same output as id -un, as shown below:
$ id -un
john
 
The who am i command will display the logged-in user name and current tty details. The output of this command contains the following columns: logged-in user name, tty name, current time with date, and the ip-address from which this user initiated the connection.
$ who am i
john     pts/2        2009-03-28 23:04 (dev-db-server)

$ who mom likes
john     pts/2        2009-03-28 23:04 (dev-db-server)

Warning: Don't try "who mom hates" command.
Also, if you su to some other user, this command will still report the originally logged-in user name.

4. Get the user login history at any time

The last command will give the login history for a specific username. If we don’t give any argument, it will list the login history for all users. By default this information is read from the /var/log/wtmp file. The output of this command contains the following columns:
  • User name
  • Tty device number
  • Login date and time
  • Logout time
  • Total working time
$ last jason
jason   pts/0        dev-db-server   Fri Mar 27 22:57   still logged in
jason   pts/0        dev-db-server   Fri Mar 27 22:09 - 22:54  (00:45)
jason   pts/0        dev-db-server   Wed Mar 25 19:58 - 22:26  (02:28)
jason   pts/1        dev-db-server   Mon Mar 16 20:10 - 21:44  (01:33)
jason   pts/0        192.168.201.11  Fri Mar 13 08:35 - 16:46  (08:11)
jason   pts/1        192.168.201.12  Thu Mar 12 09:03 - 09:19  (00:15)
jason   pts/0        dev-db-server   Wed Mar 11 20:11 - 20:50  (00:39)
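
Since wtmp is rotated, older history lives in archived copies, and last can read those directly (the rotated filename may differ by distribution):

$ last -f /var/log/wtmp.1 jason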