Product: TIBCO Spotfire®
List of additional parameters when Kerberos and High Availability are enabled on the Hadoop cluster
What additional parameters are required when you have Kerberos and High Availability enabled on the Hadoop cluster and want to connect that data source to TIBCO Spotfire Data Science?
Here is the list of additional parameters that need to be configured when you add a data source to TIBCO Spotfire Data Science.
Kerberos related:
- alpine.principal=alpine/chorus.alpinenow.local@ALPINENOW.LOCAL
- alpine.keytab=/home/chorus/keytab/alpine.keytab
- dfs.datanode.kerberos.principal=hdfs/_HOST@TDS.LOCAL
- dfs.namenode.kerberos.principal=hdfs/_HOST@TDS.LOCAL
- yarn.resourcemanager.principal=yarn/_HOST@TDS.LOCAL (the Kerberos principal for the Resource Manager)
- mapreduce.jobhistory.principal=mapred/_HOST@TDS.LOCAL
Note: _HOST is a placeholder that Hadoop replaces at runtime with the fully qualified domain name of the local host, so any host in the cluster can authenticate using this principal pattern.
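The dfs.* and yarn.* principals above come from the cluster's own configuration files. As a sketch, using the example realm above, the HDFS principals appear in hdfs-site.xml in this form:

  <!-- hdfs-site.xml on the Hadoop cluster (example values from this article) -->
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@TDS.LOCAL</value>
  </property>
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <value>hdfs/_HOST@TDS.LOCAL</value>
  </property>

Copy the text of each value element into the corresponding data source parameter.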
Protections:
- spark.hadoop.hadoop.rpc.protection=privacy
- hadoop.security.authentication=kerberos (only required when Kerberos is enabled)
When Data in Transit Encryption is enabled on the CDH cluster:
- hadoop.rpc.protection=privacy (the default value is authentication)
- dfs.data.transfer.protection=privacy
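These protection settings mirror the cluster's own configuration: hadoop.security.authentication and hadoop.rpc.protection live in core-site.xml, and dfs.data.transfer.protection in hdfs-site.xml. A minimal excerpt, using the values above, looks like this:

  <!-- core-site.xml on the cluster -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
  <!-- hdfs-site.xml on the cluster -->
  <property>
    <name>dfs.data.transfer.protection</name>
    <value>privacy</value>
  </property>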
YARN parameters:
You can get the following parameters from yarn-site.xml on the Hadoop server.
- yarn.app.mapreduce.am.staging-dir=/tmp
- yarn.resourcemanager.admin.address=cdh516dare.tds.local:8033 (the address of the Resource Manager admin interface)
- yarn.resourcemanager.resource-tracker.address=cdh516dare.tds.local:8031 (the address of the resource tracker interface)
- yarn.resourcemanager.scheduler.address=cdh516dare.tds.local:8030 (the address of the scheduler interface)
- yarn.resourcemanager.webapp.address=cdh516dare.tds.local:8088 (the HTTP address of the Resource Manager web application)
- yarn.resourcemanager.webapp.https.address=cdh516dare.tds.local:8090 (the HTTPS address of the Resource Manager web application)
- yarn.application.classpath= (you get this value by running the command yarn classpath on the CDH server's command line)
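Each of these is stored as a property element in yarn-site.xml, and you copy the value text into the data source configuration. For example, the admin address above appears in this form:

  <!-- yarn-site.xml (example value from this article) -->
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>cdh516dare.tds.local:8033</value>
  </property>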
High availability:
You can get the following parameters from hdfs-site.xml on the Hadoop server. This first parameter simply assigns a logical name to the nameservice.
- dfs.nameservices=nameservice1
dfs.ha.namenodes.[nameservice ID] - unique identifiers for each NameNode in the nameservice, configured as a comma-separated list of NameNode IDs. DataNodes use this list to determine all the NameNodes in the cluster. For example, if you used mycluster as the nameservice ID previously and wanted nn1 and nn2 as the individual NameNode IDs, you would configure this as follows:
- dfs.ha.namenodes.nameservice1=namenode64,namenode72
Nodes communicate with each other over RPC, and the following parameters establish that communication. For each of the previously configured NameNode IDs, set the full address and RPC port of the NameNode process:
- dfs.namenode.rpc-address.nameservice1.namenode64=nn1.alpinenow.local:8020
- dfs.namenode.rpc-address.nameservice1.namenode72=nn2.alpinenow.local:8020
dfs.client.failover.proxy.provider.[nameservice ID] - the Java class that HDFS clients use to contact the active NameNode, and therefore to determine which NameNode is currently serving client requests. The only implementation that currently ships with Hadoop is ConfiguredFailoverProxyProvider, so use this unless you are using a custom one.
- dfs.client.failover.proxy.provider.nameservice1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.http-address.[nameservice ID].[name node ID] - the fully qualified HTTP address for each NameNode to listen on. As with rpc-address above, set the addresses for both NameNodes' HTTP servers:
- dfs.namenode.http-address.nameservice1.namenode64=nn1.alpinenow.local:50070
- dfs.namenode.http-address.nameservice1.namenode72=nn2.alpinenow.local:50070
If you have HTTPS enabled on CDH:
- dfs.namenode.https-address.nameservice1.namenode64=nn1.alpinenow.local:50470
- dfs.namenode.https-address.nameservice1.namenode72=nn2.alpinenow.local:50470
- dfs.namenode.servicerpc-address.nameservice1.namenode64=nn1.alpinenow.local:8022
- dfs.namenode.servicerpc-address.nameservice1.namenode72=nn2.alpinenow.local:8022
- dfs.ha.automatic-failover.enabled.nameservice1=true
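Putting these together, the HA definition in hdfs-site.xml looks roughly like the excerpt below (using the nameservice and NameNode IDs from this article); the right-hand side of each parameter above is the text of the corresponding value element:

  <!-- hdfs-site.xml: HA nameservice definition (excerpt) -->
  <property>
    <name>dfs.nameservices</name>
    <value>nameservice1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.nameservice1</name>
    <value>namenode64,namenode72</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.namenode64</name>
    <value>nn1.alpinenow.local:8020</value>
  </property>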
If Resource Manager is configured for High Availability:
You can get the following parameters from yarn-site.xml on the Hadoop server.
- yarn.resourcemanager.ha.rm-ids=rm60,rm70
- yarn.resourcemanager.webapp.https.address.rm70=nn2.alpinenow.local:8090
- yarn.resourcemanager.webapp.address.rm70=nn2.alpinenow.local:8088
- yarn.resourcemanager.admin.address.rm70=nn2.alpinenow.local:8033
- yarn.resourcemanager.resource-tracker.address.rm70=nn2.alpinenow.local:8031
- yarn.resourcemanager.scheduler.address.rm70=nn2.alpinenow.local:8030
- yarn.resourcemanager.address.rm70=nn2.alpinenow.local:8032
- yarn.resourcemanager.webapp.https.address.rm60=nn1.alpinenow.local:8090
- yarn.resourcemanager.webapp.address.rm60=nn1.alpinenow.local:8088
- yarn.resourcemanager.admin.address.rm60=nn1.alpinenow.local:8033
- yarn.resourcemanager.resource-tracker.address.rm60=nn1.alpinenow.local:8031
- yarn.resourcemanager.scheduler.address.rm60=nn1.alpinenow.local:8030
- yarn.resourcemanager.address.rm60=nn1.alpinenow.local:8032
- yarn.resourcemanager.zk-address=cm.alpinenow.local:2181,nn1.alpinenow.local:2181,nn2.alpinenow.local:2181
- yarn.resourcemanager.recovery.enabled=true
- yarn.resourcemanager.ha.automatic-failover.embedded=true
- yarn.resourcemanager.ha.automatic-failover.enabled=true
- yarn.resourcemanager.ha.enabled=true
- failover_resource_manager_hosts=cdh516node1.tds.local,cdh516node2.tds.local
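The HA switches follow the same pattern in yarn-site.xml; a short excerpt using the rm-ids above looks like this. Note that failover_resource_manager_hosts does not follow the yarn.* naming convention and appears to be a Spotfire Data Science data source parameter rather than a yarn-site.xml property, so you may not find it in that file:

  <!-- yarn-site.xml: ResourceManager HA (excerpt) -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm60,rm70</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm60</name>
    <value>nn1.alpinenow.local:8032</value>
  </property>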
MapReduce:
You can get the following parameters from mapred-site.xml on the Hadoop server.
- mapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.MapTask$MapOutputBuffer (the MapOutputCollector implementation(s) to use; this may be a comma-separated list of class names, in which case the map task tries to initialize each collector in turn and uses the first one that initializes successfully)
- mapreduce.job.reduce.shuffle.consumer.plugin.class=org.apache.hadoop.mapreduce.task.reduce.Shuffle (the name of the class whose instance is used by the reduce tasks of this job to send shuffle requests; the class must be an instance of org.apache.hadoop.mapred.ShuffleConsumerPlugin)
- mapreduce.jobhistory.address=cdh6dite.tds.local:10020 (MapReduce JobHistory Server IPC host:port)
- mapreduce.jobhistory.webapp.address=cdh6dite.tds.local:19888 (MapReduce JobHistory Server web UI host:port)
- mapreduce.application.classpath= (you get this value by running the command hadoop classpath on the CDH server's command line)
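For example, the JobHistory Server addresses above appear in mapred-site.xml in this form:

  <!-- mapred-site.xml (example values from this article) -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>cdh6dite.tds.local:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>cdh6dite.tds.local:19888</value>
  </property>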
If using Hive:
- hive.metastore.client.connect.retry.delay=1
- hive.metastore.client.socket.timeout=600
Hive with Kerberos:
- hive.hiveserver2.uris=jdbc:hive2://cm.alpinenow.local:10000/default
- hive.metastore.kerberos.principal=hive/_HOST@ALPINENOW.LOCAL (ALPINENOW.LOCAL is the realm name)
- hive.server2.authentication.kerberos.principal=hive/_HOST@ALPINENOW.LOCAL
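The two Hive principals come from hive-site.xml on the cluster; a minimal excerpt, assuming the example realm above, looks like this:

  <!-- hive-site.xml (excerpt, example realm from this article) -->
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@ALPINENOW.LOCAL</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/_HOST@ALPINENOW.LOCAL</value>
  </property>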
Spark History Server:
The following parameters need to be added when the Spark service is running on the Hadoop cluster.
- spark.yarn.historyServer.address=http://172.27.0.3:18088
- spark.eventLog.dir=hdfs://172.27.0.3:8020/user/spark/applicationHistory
- spark.eventLog.enabled=true