0 Replies Latest reply on Jun 10, 2015 10:49 PM by ricepot100

    UDA exception of ibv_reg_mr

      Dear sirs,

           I encounter a problem when I use ROCE mode in Hadoop 2.2.0, the details is as follow (And english is not my mother lanugage, if some words unproper, please pardon ):

       

      CPU Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

      DDR:  192GB

      Mellanox Technologies MT27520 Family

       

      OS: Suse 11 SP3    x86_64 

      Hadoop version: 2.2.0 or 2.4.1

      Hadoop nodes: 2 datanodes

      uda version:      3.4.1-0.0

      OFED: MLNX_OFED_LINUX-2.4-1.0.4-sles11sp3-x86_64.iso

       

      driver: mlx4_en

      version: 2.4-1.0.4 (Mar  8 2015)

      firmware-version: 2.31.5270

      bus-info: 0000:05:00.0

      supports-statistics: yes

      supports-test: yes

      supports-eeprom-access: no

      supports-register-dump: no

       

      I have configured the  Mellanox netcard as eth but not IB, so could use ROCE mode

      Test solution: HiBench Terasort

      1. modify the mapred-site.xml as follow:

      ......

      <property>

              <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>

              <value>com.mellanox.hadoop.mapred.UdaShuffleConsumerPlugin</value>

      </property>

      <property>

              <name>mapreduce.reduce.memory.mb</name>

              <value>2048</value>

      </property>

      <property>

              <name>mapreduce.reduce.java.opts</name>

              <value>-Xmx1024m</value>

      </property>

      ......

      2. modify the yarn-site.xml as follow:

      <property>

              <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

              <value>com.mellanox.hadoop.mapred.UdaShuffleHandler</value>

      </property>

      2. During run the Terasort, the output looks fine, no errors,

       

      3. But the performance is not improved comparing with normal socket test

       

      4. trace the syslog(for example: application_1433928017396_0005/container_1433928017396_0005_01_000143/syslog), I see some UDA errors as follow:

      PS: From the syslog, I see that, do the action of register RDMA memory twice, but only the first time is success, the second time is failed !!!

      And I don't know, why the register memory failed, the output message on the terminal will also finish the reduce steps. Is it go to the normal socket path when encounter the problem ?

       

       

      2015-06-10 20:33:31,486 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

      2015-06-10 20:33:31,566 INFO [main] org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin: com.mellanox.hadoop.mapred.UdaShuffleConsumerPlugin@6b4ffe93

      2015-06-10 20:33:31,567 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: init - Using UdaShuffleConsumerPlugin

      2015-06-10 20:33:31,571 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.job.shuffle.input.buffer.percent is deprecated. Instead, use mapreduce.reduce.shuffle.input.buffer.percent

      2015-06-10 20:33:31,571 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Using JAVA Xmx with mapred.job.shuffle.input.buffer.percent to limit UDA shuffle memory

      2015-06-10 20:33:31,571 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: RDMA shuffle memory is limited to 686MB

      2015-06-10 20:33:31,571 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: UDA: user prefer rdma.buf.size=1024KB

      2015-06-10 20:33:31,572 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: UDA: minimum rdma.buf.size=16KB

      2015-06-10 20:33:31,572 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: UDA has limited support for map task speculative execution

      2015-06-10 20:33:31,572 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: UDA: number of segments to fetch: 300

      2015-06-10 20:33:31,575 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: going to execute C++ thru JNI with argc=: 12 cmd: [-w, 256, -r, 9011, -a, 1, -m, 1, -g, /home/f00318826/cloud/hadoop-2.2.0/logs/userlogs/application_1433928017396_0005/container_1433928017396_0005_01_000143/, -s, 1024]

      2015-06-10 20:33:31,577 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin:  +++>>> invoking UdaBridge.startNative: isNetMerger=true

      2015-06-10 20:33:31,577 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: UDA version is 3.4.1-0.0 (Merger/NetMergerMain.cc:55)

      2015-06-10 20:33:31,577 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Compiled on the Jun  8 2015, 11:29:01

      (Merger/NetMergerMain.cc:56)

      2015-06-10 20:33:31,577 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: size of rdma buffer as passed from java is 1048576

      (Merger/NetMergerMain.cc:58)

      2015-06-10 20:33:31,578 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: main initialization finished ret=0 (UdaBridge.cc:235)

      2015-06-10 20:33:31,578 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin:  <<<+++ after UdaBridge.startNative ret=0

      2015-06-10 20:33:31,580 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: mParams array is [300, job_1433928017396_0005, attempt_1433928017396_0005_r_000002_0, 0, 1048576, 16384, org.apache.hadoop.io.Text, null, 262144, 720332378, 1, /home/f00318826/cloud/hadoop-2.2.0/temp/nm-local-dir/usercache/f00318826/appcache/application_1433928017396_0005]

      2015-06-10 20:33:31,580 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: UDA: sending INIT_COMMAND

      2015-06-10 20:33:31,581 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: got params from java: hadoop_cmd->count=13,  num_maps=300, job_id=job_1433928017396_0005, reduce_task_id=attempt_1433928017396_0005_r_000002_0, lpq_size=0, rdma.buf.size=1048576B, minBuffer=16384B, cmp_func=org.apache.hadoop.io.Text, comp_alg=null, comp_block_size=262144, shuffleMemorySize=720332378B (Merger/reducer.cc:65)

      2015-06-10 20:33:31,582 INFO [Thread-9] org.apache.hadoop.mapred.ShuffleConsumerPlugin: C++ THREAD STARTED (by RdmaClient) and attached to JVM tid=0xd10f3700 (CommUtils/UdaUtil.cc:46)

      2015-06-10 20:33:31,589 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Successfully init'ed device (DataNet/RDMAComm.cc:370)

      2015-06-10 20:33:31,589 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Going to register memory with contig-pages (DataNet/RDMAComm.cc:123)

      2015-06-10 20:33:31,589 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Going to register RDMA memory. size=650117120 (DataNet/RDMAComm.cc:69)

      2015-06-10 20:33:31,773 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: After RDMA memory registration. size=650117120 (DataNet/RDMAComm.cc:107)

      2015-06-10 20:33:31,778 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Successfully init'ed device (DataNet/RDMAComm.cc:370)

      2015-06-10 20:33:31,778 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Going to register memory with contig-pages (DataNet/RDMAComm.cc:123)

      2015-06-10 20:33:31,778 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Going to register RDMA memory. size=650117120 (DataNet/RDMAComm.cc:69)

      2015-06-10 20:33:31,779 ERROR [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: ibv_reg_mr failed for memory of total_size=650117120 , MSG=Cannot allocate memory (errno=12), memory-pointer=140028050276032 (DataNet/RDMAComm.cc:98)

      2015-06-10 20:33:31,779 ERROR [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: raising com/mellanox/hadoop/mapred/UdaRuntimeException to java side, with info=ibv_reg_mr failure (UdaBridge.cc:91)

      2015-06-10 20:33:31,779 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: checking mapred.rdma.developer.mode...

      2015-06-10 20:33:31,781 ERROR [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Critical failure has occured in UdaPlugin - We'll try to use vanilla as fallbackPlugin.

          Exception is:com.mellanox.hadoop.mapred.UdaRuntimeException: This is a UDA exception invoked from C++ with the following info: ibv_reg_mr failure

          And with the following C++ stacktrace:

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_ZN12UdaExceptionC2EPKc+0x30) [0x7f5ad20ff340]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_Z20netlev_init_rdma_memPPvmP10netlev_devm+0x114) [0x7f5ad20fbd94]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_Z16rdma_mem_managerPPvmP10netlev_dev+0x84) [0x7f5ad20fc064]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_Z10create_devP11ibv_contextP10netlev_ctxPFvP14progress_eventPvEPS5_l+0x65) [0x7f5ad20fd485]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_Z14map_ib_devicesP10netlev_ctxPFvP14progress_eventPvEPS3_l+0x7b) [0x7f5ad20fd66b]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_ZN10RdmaClient12register_memEP11memory_pool13double_buffer+0x25) [0x7f5ad20f81e5]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_Z15handle_init_msgP10hadoop_cmd+0x243) [0x7f5ad2108663]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(_Z23reduce_downcall_handlerRKSs+0x98) [0x7f5ad2108eb8]

              /home/f00318826/cloud/hadoop-2.2.0/share/hadoop/mapreduce/libuda.so(Java_com_mellanox_hadoop_mapred_UdaBridge_doCommandNative+0xa3) [0x7f5ad210edb3]

              [0x7f5ae0600f90]

          at com.mellanox.hadoop.mapred.UdaBridge.doCommandNative(Native Method)

          at com.mellanox.hadoop.mapred.UdaBridge.doCommand(UdaBridge.java:63)

          at com.mellanox.hadoop.mapred.UdaPluginRT.<init>(UdaPlugin.java:316)

          at com.mellanox.hadoop.mapred.UdaShuffleConsumerPluginShared.init(UdaShuffleConsumerPluginShared.java:191)

          at com.mellanox.hadoop.mapred.UdaShuffleConsumerPluginShared.init(UdaShuffleConsumerPluginShared.java:200)

          at com.mellanox.hadoop.mapred.UdaShuffleConsumerPlugin.init(UdaShuffleConsumerPlugin.java:64)

          at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)

          at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)

          at java.security.AccessController.doPrivileged(Native Method)

          at javax.security.auth.Subject.doAs(Subject.java:415)

          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

          at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

       

      2015-06-10 20:33:31,809 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=720332352, maxSingleShuffleLimit=180083088, mergeThreshold=475419360, ioSortFactor=10, memToMemMergeOutputsThreshold=10

      2015-06-10 20:33:31,811 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: Succesfuly switched to Using fallbackPlugin

      2015-06-10 20:33:31,811 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: fetchOutputs: Using fallbackPlugin

      2015-06-10 20:33:31,812 INFO [main] org.apache.hadoop.mapred.ShuffleConsumerPlugin: createKVIterator: Using fallbackPlugin

      2015-06-10 20:33:31,814 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1433928017396_0005_r_000002_0 Thread started: EventFetcher for fetching Map Completion Events

      2015-06-10 20:33:31,831 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning GPU4:13562 with 1 to fetcher#5

      2015-06-10 20:33:31,832 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to GPU4:13562 to fetcher#5

      2015-06-10 20:33:31,833 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning GPU5:13562 with 1 to fetcher#1

      2015-06-10 20:33:31,833 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 1 of 1 to GPU5:13562 to fetcher#1

      2015-06-10 20:33:31,848 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1433928017396_0005_r_000002_0: Got 139 new map-outputs

      2015-06-10 20:33:31,931 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=13562/mapOutput?job=job_1433928017396_0005&reduce=2&map=attempt_1433928017396_0005_m_000008_0 sent hash and received reply

      2015-06-10 20:33:31,931 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=13562/mapOutput?job=job_1433928017396_0005&reduce=2&map=attempt_1433928017396_0005_m_000000_0 sent hash and received reply

      2015-06-10 20:33:31,939 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 about to shuffle output of map attempt_1433928017396_0005_m_000000_0 decomp: 14028770 len: 14028774 to MEMORY

      2015-06-10 20:33:31,943 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1433928017396_0005_m_000008_0 decomp: 14038130 len: 14038134 to MEMORY

      2015-06-10 20:33:31,963 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 14028770 bytes from map-output for attempt_1433928017396_0005_m_000000_0

      2015-06-10 20:33:31,966 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 14028770, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->28066900

      2015-06-10 20:33:31,967 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: GPU4:13562 freed by fetcher#5 in 136ms

      2015-06-10 20:33:31,967 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning GPU4:13562 with 75 to fetcher#5

      2015-06-10 20:33:31,967 INFO [fetcher#5] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 20 of 75 to GPU4:13562 to fetcher#5

      2015-06-10 20:33:31,967 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 14038130 bytes from map-output for attempt_1433928017396_0005_m_000008_0

      2015-06-10 20:33:31,968 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 14038130, inMemoryMapOutputs.size() -> 2, commitMemory -> 14028770, usedMemory ->28066900

      2015-06-10 20:33:31,968 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: GPU5:13562 freed by fetcher#1 in 135ms