A question about parallel computation
-
To illustrate with code, one of my functions contains the following:
DynamicList<label> CellLabel;
DynamicList<scalar> CellCenterX;
DynamicList<scalar> CellCenterZ;

// markField_ is a volScalarField; part of it is 0, the rest is 1.
// I want the cell labels and the x and z centre coordinates of the cells where markField_[i] = 1.
forAll(markField_, i)
{
    if (markField_[i])
    {
        CellLabel.append(i);
        CellCenterX.append(mesh_.C()[i].component(0));
        CellCenterZ.append(mesh_.C()[i].component(2));
    }
}

forAll(CellLabel, i) // output
{
    Info << "i = " << CellLabel[i] << ", " << CellCenterX[i] << ", " << CellCenterZ[i] << endl;
}
This code runs without problems on a single core, and the output looks like this:
i = 20287, 2.4625, -0.39375
i = 20288, 2.4625, -0.39125
i = 20289, 2.4625, -0.39875
i = 20290, 2.4625, -0.39625
i = 20589, 2.4625, -0.38625
i = 20590, 2.4625, -0.38875
i = 20612, 2.4625, -0.38375
i = 20613, 2.4625, -0.38125
i = 20614, 2.4675, -0.39375
i = 20615, 2.4675, -0.39125
i = 20616, 2.4675, -0.39875
i = 20617, 2.4675, -0.39625
In parallel, however, it fails immediately with this error:
[wdx-Precision-7920-Tower:06322] *** Process received signal ***
[wdx-Precision-7920-Tower:06322] Signal: Segmentation fault (11)
[wdx-Precision-7920-Tower:06322] Signal code: (-6)
[wdx-Precision-7920-Tower:06322] Failing at address: 0x3e8000018b2
[wdx-Precision-7920-Tower:06322] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fd0451a0f20]
[wdx-Precision-7920-Tower:06322] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fd0451a0e97]
[wdx-Precision-7920-Tower:06322] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fd0451a0f20]
[wdx-Precision-7920-Tower:06322] [ 3] /home/wdx/OpenFOAM-6/platforms/linux64GccDPInt32Opt/lib/libMassSource.so(_ZN4Foam19massSourceWaveMaker15CalDynamicListsEv+0x316)[0x7fd04750cb96]
[wdx-Precision-7920-Tower:06322] [ 4] /home/wdx/OpenFOAM-6/platforms/linux64GccDPInt32Opt/lib/libMassSource.so(_ZN4Foam19massSourceWaveMakerC1ERKNS_12IOdictionaryERNS_14GeometricFieldIdNS_12fvPatchFieldENS_7volMeshEEES8_RNS4_INS_6VectorIdEES5_S6_EE+0x9c3)[0x7fd04750e053]
[wdx-Precision-7920-Tower:06322] [ 5] DXFlow(+0x4941a)[0x5557003bc41a]
[wdx-Precision-7920-Tower:06322] [ 6] [wdx-Precision-7920-Tower:06316] *** Process received signal ***
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fd045183b97]
[wdx-Precision-7920-Tower:06322] [ 7] [wdx-Precision-7920-Tower:06316] Signal: Segmentation fault (11)
[wdx-Precision-7920-Tower:06316] Signal code: (-6)
[wdx-Precision-7920-Tower:06316] Failing at address: 0x3e8000018ac
DXFlow(+0x57a9a)[0x5557003caa9a]
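The error occurs when the dynamic lists are accessed (that is, during the output loop). From what I have read, it is a memory access problem. I think the main cause is that the code is missing parallel statements. My knowledge of MPI programming is very limited: I only roughly know the reduce function, and not in any depth. I also looked up some material and found this one fairly useful as a reference: here.
Even so, I have not managed to solve the problem, so I am asking for help here. Alternatively, could anyone point me to some material on the MPI functions in OF?
Thanks, everyone!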
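Since reduce is mentioned above, here is a minimal sketch of what a sum reduction does, written in plain C++ with no MPI or OpenFOAM so that it can be compiled on its own. Each entry of `localCounts` stands in for the size of `CellLabel` on one process. The names `sumReduce` and `emptyRanks` are hypothetical helpers invented for this illustration. In OpenFOAM the equivalent total is, as far as I know, obtained with `returnReduce(CellLabel.size(), sumOp<label>())`; whether that is the right fix for the crash above is an assumption, not something the thread confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a sum reduction: localCounts[r] is the number of
// marked cells found on rank r. In a real parallel run every rank would
// receive the same total.
std::size_t sumReduce(const std::vector<std::size_t>& localCounts)
{
    return std::accumulate(localCounts.begin(), localCounts.end(), std::size_t{0});
}

// Ranks whose local list is empty. These ranks must not index into their
// own list, even though the global total is non-zero.
std::vector<std::size_t> emptyRanks(const std::vector<std::size_t>& localCounts)
{
    std::vector<std::size_t> ranks;
    for (std::size_t r = 0; r < localCounts.size(); ++r)
    {
        if (localCounts[r] == 0)
        {
            ranks.push_back(r);
        }
    }
    assert(ranks.size() <= localCounts.size());
    return ranks;
}
```

With counts of 0, 12, 0 and 3 on four ranks, the total is 15 while ranks 0 and 2 hold nothing, which is the situation a decomposed case can produce when the marked region lies in only some subdomains.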
-
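Hello, and thanks for your reply. I was already aware of what you described: Pout prints the output from every processor, and each cell has its own label.
Perhaps I did not describe the problem precisely enough, so let me put it another way.
The program flow is as follows:
1. I create an empty container, select the cells in a certain region based on cell position, and append elements (cell labels) to the container;
2. The container is operated on later, so it must not be empty.
The program has no problems on a single core, and the algorithm for finding the cell labels is correct. In parallel, however, a problem appears: because the domain is decomposed, not every processor holds cells inside the target region. Processors that contain no target cells never append anything, so the containers have different sizes on different processors, and the parallel run diverges.
I am now trying to apply gather and scatter operations to all the containers, but without success. Could anyone give me some advice? Thanks.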
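The gather-and-scatter idea can be sketched as follows. This is plain C++ that imitates the data movement without MPI or OpenFOAM: `perRank[r]` plays the role of the container built on processor `r`, and the result is the concatenated list that every processor would hold afterwards. `MarkedCell` and `gatherAll` are hypothetical names made up for this example. In OpenFOAM the same pattern is usually written, to my knowledge, with a `List<List<label>>` of size `Pstream::nProcs()`, filled at index `Pstream::myProcNo()`, passed to `Pstream::gatherList` and `Pstream::scatterList`, and flattened with `ListListOps::combine`; applying it to this case is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One selected cell. The label is local to the owning rank, so the rank is
// stored alongside it to keep the entry unambiguous after gathering.
struct MarkedCell
{
    int rank;
    int localLabel;
    double x;
    double z;
};

// perRank[r] is the list built on rank r and may be empty. The return value
// is the concatenation in rank order, i.e. what each rank would hold after a
// gather followed by a scatter.
std::vector<MarkedCell> gatherAll(const std::vector<std::vector<MarkedCell>>& perRank)
{
    std::size_t total = 0;
    for (const auto& local : perRank)
    {
        total += local.size();
    }

    std::vector<MarkedCell> all;
    all.reserve(total);
    for (const auto& local : perRank)
    {
        all.insert(all.end(), local.begin(), local.end());
    }

    assert(all.size() == total);
    return all;
}
```

After this step every processor sees a list of the same length, so later code that assumes a non-empty container behaves the same everywhere. Note that a gathered label is only meaningful on the processor that owns the cell, which is why the sketch carries the rank with each entry.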