How to check whether Hadoop and MATLAB are integrated properly?

Question

Pulkesh Haran on 4 May 2017


Hi,

We integrated MATLAB R2016b with Hadoop 2.7.2, but we are not sure it is working properly. How can we check that the program is running on the cluster and that each node is contributing to the processing?

1. MapReduce takes more time on the cluster (50 nodes) than MATLAB mapreduce does on a single computer.

2. Where do we set the MATLAB Distributed Computing Server properties, such as the number of nodes, the parallel pool, etc.?

3. How can we see the MATLAB + Hadoop cluster configuration in the MATLAB interface?

Please provide full details in your answer. Thanks.


lov kumar on 2 Jun 2019

Please help me. How can I fix this error?

Error using mapreduce (line 124)
The HADOOP job failed to submit. It is possible that there is some issue with the HADOOP configuration.

Error in bg1 (line 9)
meanDelay = mapreduce(ds,@meanArrivalDelayMapper,@meanArrivalDelayReducer,mr,...

I am using this code:

setenv('HADOOP_HOME','C:/hadoop-2.8.0');
cluster = parallel.cluster.Hadoop;
mr = mapreducer(cluster);
ds = datastore('hdfs://localhost:9000/lov/airlinesmall.csv','TreatAsMissing','NA',...
    'SelectedVariableNames','ArrDelay','ReadSize',1000);
preview(ds)
outputFolder = 'hdfs://localhost:9000/results/out1';
meanDelay = mapreduce(ds,@meanArrivalDelayMapper,@meanArrivalDelayReducer,mr,...
    'OutputFolder',outputFolder)
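One way to narrow down a submission error like this is to run the same job serially in the local MATLAB session first. If the local run works, the mapper and reducer are probably fine and the problem is more likely in the Hadoop configuration. The sketch below is only an illustration: it assumes the airlinesmall.csv sample file that ships with MATLAB, and the mapper and reducer bodies are assumed stand-ins, since the poster's own functions are not shown in this thread.

```matlab
% Sketch: run the job locally and serially, with no Hadoop involved.
% Local functions in a script require R2016b or later.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay', 'ReadSize', 1000);

localMr = mapreducer(0);   % 0 selects the local MATLAB session, serial execution
meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer, localMr);
result = readall(meanDelay);
disp(result)

% Assumed stand-in for the poster's mapper (not taken from the thread).
function meanArrivalDelayMapper(data, ~, intermKVStore)
    % Emit the count and the sum of the valid delays in this chunk.
    delays = data.ArrDelay(~isnan(data.ArrDelay));
    add(intermKVStore, 'PartialCountSumDelay', [numel(delays), sum(delays)]);
end

% Assumed stand-in for the poster's reducer (not taken from the thread).
function meanArrivalDelayReducer(~, intermValIter, outKVStore)
    % Combine the partial counts and sums into one overall mean.
    total = [0, 0];
    while hasnext(intermValIter)
        total = total + getnext(intermValIter);
    end
    add(outKVStore, 'MeanArrivalDelay', total(2) / total(1));
end
```

If this local run fails as well, the error is in the MATLAB code or the data rather than in the cluster setup.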


Answer 1

Rick Amos on 12 May 2017


To check whether MATLAB is running on the Hadoop cluster correctly, your best bet is to look at the Hadoop/YARN web UI. By default, this is:

http://hadoophostname:8088/

Here, hadoophostname should be replaced by the hostname of the head node of the Hadoop cluster. During a mapreduce operation in MATLAB, you should see a running job in the web UI.
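The same check can also be scripted from MATLAB. The following is a minimal sketch, not an official MathWorks workflow: it assumes the YARN ResourceManager REST endpoint /ws/v1/cluster/apps is reachable on the same default port 8088, and 'hadoophostname' is the same placeholder as above.

```matlab
% Sketch: list the applications currently running on YARN from MATLAB.
rmHost = 'hadoophostname';   % placeholder: replace with the head node hostname
url = sprintf('http://%s:8088/ws/v1/cluster/apps', rmHost);

try
    % Ask only for running applications; webread decodes the JSON reply.
    reply = webread(url, 'states', 'RUNNING');
catch err
    reply = [];
    fprintf('Could not reach %s (%s)\n', url, err.message);
end

if ~isstruct(reply) || ~isfield(reply, 'apps') || isempty(reply.apps)
    disp('No running YARN applications were reported.');
else
    apps = reply.apps.app;             % struct array, or cell array of structs
    if ~iscell(apps)
        apps = num2cell(apps);
    end
    for k = 1:numel(apps)
        fprintf('%s  %s  %.1f%%\n', apps{k}.id, apps{k}.name, apps{k}.progress);
    end
end
```

Running this while a MATLAB mapreduce call is in progress should list that job if it was submitted to the cluster.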

If you don't see a job running, it is possible that the Hadoop installation you provided to MATLAB is not configured to run jobs in cluster mode. This can happen if the Hadoop property mapreduce.jobtracker.address, found in ${HADOOP_INSTALL}/etc/hadoop/mapred-site.xml, has not been set or has been set to "local". This property should be set to the hostname of the head node of the cluster.
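As a rough sketch, the property can be read from MATLAB with xmlread. This assumes the HADOOP_HOME environment variable points at the same installation folder that the answer calls ${HADOOP_INSTALL}; adjust the path if your layout differs.

```matlab
% Sketch: report the value of one property in mapred-site.xml.
hadoopHome = getenv('HADOOP_HOME');   % assumption: set to the Hadoop install folder
confFile = fullfile(hadoopHome, 'etc', 'hadoop', 'mapred-site.xml');
wanted = 'mapreduce.jobtracker.address';

value = '';                           % stays empty if the property is not found
if exist(confFile, 'file')
    doc = xmlread(confFile);
    props = doc.getElementsByTagName('property');
    for k = 0:props.getLength - 1     % DOM node lists are zero-based
        p = props.item(k);
        nameNode = p.getElementsByTagName('name').item(0);
        valueNode = p.getElementsByTagName('value').item(0);
        if ~isempty(nameNode) && ~isempty(valueNode) && ...
                strcmp(strtrim(char(nameNode.getTextContent)), wanted)
            value = strtrim(char(valueNode.getTextContent));
        end
    end
else
    fprintf('Configuration file not found: %s\n', confFile);
end

if isempty(value) || strcmpi(value, 'local')
    fprintf('%s is unset or "local"; jobs may not run in cluster mode.\n', wanted);
else
    fprintf('%s = %s\n', wanted, value);
end
```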

In a Hadoop cluster, the number of workers that are launched is controlled by Hadoop. By default, it will run as many workers as it can fit in the memory given to it.


Pulkesh Haran on 16 May 2017


So we have integrated MATLAB and Hadoop correctly: we can see the running program in the Hadoop interface. But I have the following questions:

1. Why does the MATLAB + Hadoop cluster (with 50 nodes) take more time to run a mapreduce job than MATLAB mapreduce on a single machine?

2. What is the role of MDCS?

3. How do we define or configure the properties of MDCS?

4. In the MATLAB GUI, how can we see the MATLAB + Hadoop cluster, or a program running on the cluster, with detailed information?

5. The MATLAB + Hadoop cluster with 50 nodes takes more time than normal MATLAB mapreduce running on a single machine. How can we resolve that problem?

6. We are getting errors like these while converting 50K images into a sequence file:

    i. Unable to read MAT file ... : not a binary file.
    ii. A serialization error.

How can we resolve them?

Thanks

Kojiro on 11 May 2017 at 0:51

1. That seems strange.

2. MDCS performs the parallel execution of mapreduce on Hadoop.

3. In order to use MDCS with Hadoop, you need to set the following parameters.

In the Hadoop 2.x settings, set "yarn.nodemanager.user-home-dir" (in $HADOOP_HOME/etc/hadoop/yarn-site.xml),

and in MATLAB, set the 'HADOOP_HOME' environment variable with setenv, then create a parallel.cluster.Hadoop object and a mapreducer.

This example will help you.

4. You can monitor the status using the Hadoop web UI ( http://YOUR_HADOOP_HOST:8088/ ) or with the following command in a terminal:

yarn application -status ID
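The terminal command can also be called from the MATLAB session with system. This is a minimal sketch that assumes the yarn executable is on the system path of the machine running MATLAB; it lists the running applications and then asks for the status of the first ID it finds.

```matlab
% Sketch: call the YARN command-line client from MATLAB.
[status, listing] = system('yarn application -list -appStates RUNNING');

if status ~= 0
    disp('Could not run the yarn client; check the PATH and the Hadoop setup.');
else
    disp(listing);
    % Application IDs have the form application_<timestamp>_<sequence>.
    ids = regexp(listing, 'application_\d+_\d+', 'match');
    if ~isempty(ids)
        [~, detail] = system(['yarn application -status ' ids{1}]);
        disp(detail);
    end
end
```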

5, 6. Could you share more detailed MATLAB scripts? Or have you also asked MathWorks technical support?

Pulkesh Haran on 14 May 2017 at 7:08

We are facing this problem while running our MATLAB script on the cluster for 50K images.

We are creating a sequence file for 50 thousand images. I am attaching the MATLAB code and other details. Please help us with this.

Error 1

Parallel mapreduce execution on the Hadoop cluster:
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map   1% Reduce   0%
Map  33% Reduce   0%
Map  93% Reduce   0%
Error using mapreduce (line 118)
Unable to read MAT-file /tmp/tp1ce5fe8e_0189_4e64_85a3_b671c61453a4/task_0_675_MAP_4.mat: not a binary MAT-file.

Error in create_seq (line 101)
seqds = mapreduce(imageDS, @identityMap, @identityReduce,'OutputFolder',output_identity);

Error 2

>> whos -file '/home/nitw_viper_user/task_0_1081_MAP_4.mat'
  Name       Size            Bytes  Class         Attributes

  Error      1x1              2336  MException

>> Error

Error =

  MException with properties:

    identifier: 'parallel:internal:DeserializationException'
       message: 'Deserialization threw an exception.'
         cause: {0×1 cell}
         stack: [3×1 struct]

Error 3

>> create_seq

Hadoop with properties:

                 HadoopInstallFolder: '/home/nitw_viper_user/hadoop-2.7.2'
             HadoopConfigurationFile: ''
                  SparkInstallFolder: ''
                    HadoopProperties: [2×1 containers.Map]
                     SparkProperties: [0×1 containers.Map]
                   ClusterMatlabRoot: '/usr/local/MATLAB/R2016b'
    RequiresMathWorksHostedLicensing: 0
                       LicenseNumber: ''
                     AutoAttachFiles: 1
                       AttachedFiles: {}
                     AdditionalPaths: {}

Parallel mapreduce execution on the Hadoop cluster:
********************************
*      MAPREDUCE PROGRESS      *
********************************
Map   0% Reduce   0%
Map   1% Reduce   0%
Map   2% Reduce   0%
Map  22% Reduce   0%
Map  40% Reduce   0%
Map  80% Reduce   0%
Error using mapreduce (line 118)
The HADOOP job failed to complete.

Error in create_seq (line 101)
seqds = mapreduce(imageDS, @identityMap, @identityReduce,'OutputFolder',output_identity);

Caused by:
    Error using distcompdeserialize
    Deserialization threw an exception.
    Error using distcompdeserialize
    Deserialization threw an exception.
    Error using distcompdeserialize
    Deserialization threw an exception.
    Error using distcompdeserialize
    Deserialization threw an exception.
    Error using distcompdeserialize
    Deserialization threw an exception.
    Error using distcompdeserialize
    Deserialization threw an exception.
    Error using distcompdeserialize
    Deserialization threw an exception.
    Error using distcompdeserialize
    Deserialization threw an exception.

Date: 07-05-2017

[WARN] BlockReaderFactory - I/O error constructing remote block reader. <java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror /task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461>java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror /task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461

    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)

[WARN] DFSClient - Failed to connect to /192.168.193.177:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror /task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461 <java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror /task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461>java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror /task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461

    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    at java.io.DataInputStream.read(Unknown Source)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)

[INFO] DFSClient - Successfully connected to /192.168.193.167:50010 for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461

Error using mapreduce (line 118)
Unable to read MAT-file /tmp/tp3c745940_f508_4326_93f7_fc5f6fb9ef06/task_0_805_MAP_1.mat: not a binary MAT-file.

Error in main (line 270)
res = mapreduce(seqds, @Ltrp_db1_seq_file_mapper, @Ltrp_db1_reducer, 'OutputFolder', Ltrp_db1_seq_file_result);
