
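Preparation: Cluster Configuration

1. Open the Cluster Profile Manager

On the MATLAB Home tab, open the Parallel menu and select [Create and Manage Clusters].

2. Add a cluster profile

Click Add Cluster Profile > MATLAB Job Scheduler.

Double-click the added profile to rename it.
The default name, MJSProfile1, also works as-is.

3. Edit the cluster profile

Select the profile and click [Edit] on the toolbar to edit it.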

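Description of this cluster (Description)	lab2022
Host name of the machine running the MATLAB Job Scheduler (Host)	a9.media.hosei.ac.jp
User name for accessing the MATLAB Job Scheduler (Username)	your user name registered in lab2022
Number of computational threads to use on each worker (NumThreads)	1 (default)
License number (LicenseNumber)	<none> (default)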

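Click [Done]. The second image shows the MATLAB Job Scheduler cluster profile after configuration.

4. Set the cluster profile as the default

Select [Set as Default] to make this profile the default.

5. Validate the cluster profile

Click [Validate] to validate the cluster profile. You are then prompted for a user name and password.
When cluster validation succeeds, the result looks like the following.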

Exercises

Preparing the file to analyze

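Move the input_img image file from Coding files > Block Process On Large Image to C:\Users\<user name>\Documents\MATLAB. The scripts below read it as input_img.jpg.

The move succeeded if the file appears in the Current Folder panel, shown on the left of the image.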

Switching cluster profiles (local and a9 Server)

To switch between local and a9 Server, change which of the two cluster profiles is set as the default.
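The same switch can also be made from the command line. The following is a minimal sketch, assuming the server profile kept its default name MJSProfile1 and the local profile is named local as in this guide; the variable names are hypothetical and the profile names should be adjusted to match your Cluster Profile Manager.

```matlab
% Assumed profile names; replace them with the names in your Cluster Profile Manager.
localProfile  = 'local';        % local profile name used in this guide
serverProfile = 'MJSProfile1';  % default name of the profile added in step 2

target = localProfile;          % set to serverProfile to use the a9 Server

% List the profiles known to this MATLAB installation.
available = parallel.clusterProfiles;

if any(strcmp(available, target))
    % Make the chosen profile the default for parpool, parfor, and UseParallel.
    parallel.defaultClusterProfile(target);
    fprintf('Default cluster profile is now: %s\n', parallel.defaultClusterProfile);
else
    warning('Profile %s was not found; the default profile is unchanged.', target);
end
```

A parallel pool that is already open keeps using the profile it was started with, so close it with delete(gcp('nocreate')) before switching.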

Switching parallel processing on and off

To turn parallel processing off, change 'UseParallel',true to 'UseParallel',false in the command; to turn it back on, make the reverse change.
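Rather than editing the literal each time, the flag can be held in a variable. The sketch below assumes a hypothetical variable useParallel and a synthetic test image standing in for the real input file, so it runs without the image from the exercise; it still needs Image Processing Toolbox for blockproc and edge.

```matlab
% Hypothetical flag: true uses the parallel pool, false runs serially.
useParallel = false;

% Synthetic grayscale image (assumption: stands in for the real input image).
img = uint8(255 * rand(200, 200));

% Same edge-detection block function as in the scripts in this guide.
fun = @(block_struct) edge(block_struct.data, "canny");

tic                                   % start stopwatch timer
result = blockproc(img, [25 25], fun, ...
    'UseParallel', useParallel);      % only this flag changes between runs
elapsed = toc;                        % elapsed time in seconds

fprintf("UseParallel = %d: %.2f s\n", useParallel, elapsed);
```

Running the script once with each value of useParallel gives the two timings to compare.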

With parallel processing

tic % Start stopwatch timer
% Input image
input_img = "input_img.jpg"; % Add image path
% Initialize edge detection function
fun = @(block_struct) edge(block_struct.data,"canny");
% Convert source image from RGB to grayscale
input_image = rgb2gray(imread(input_img));
% Perform parallel block process
result = blockproc(input_image,[25 25],fun, ...
   'UseParallel',true);
toc % Terminate stopwatch timer
% Show output image
imshow(result)

Without parallel processing

tic % Start stopwatch timer
% Input image
input_img = "input_img.jpg"; % Add image path
% Initialize edge detection function
fun = @(block_struct) edge(block_struct.data,"canny");
% Convert source image from RGB to grayscale
input_image = rgb2gray(imread(input_img));
% Perform block process without parallel processing
result = blockproc(input_image,[25 25],fun, ...
   'UseParallel',false);
toc % Terminate stopwatch timer
% Show output image
imshow(result)

Exercise problems

Exercise 1: Block processing of a large image

With parallel processing

tic % Start stopwatch timer
% Input image
input_img = "input_img.jpg"; % Add image path
% Initialize edge detection function
fun = @(block_struct) edge(block_struct.data,"canny");
% Convert source image from RGB to grayscale
input_image = rgb2gray(imread(input_img));
% Perform parallel block process
result = blockproc(input_image,[25 25],fun, ...
   'UseParallel',true);
toc % Terminate stopwatch timer
% Show output image
imshow(result)

Without parallel processing

tic % Start stopwatch timer
% Input image
input_img = "input_img.jpg"; % Add image path
% Initialize edge detection function
fun = @(block_struct) edge(block_struct.data,"canny");
% Convert source image from RGB to grayscale
input_image = rgb2gray(imread(input_img));
% Perform block process without parallel processing
result = blockproc(input_image,[25 25],fun, ...
   'UseParallel',false);
toc % Terminate stopwatch timer
% Show output image
imshow(result)

Processing time comparison (unit: seconds)

Parallel processing	Environment	Time (s)
Off	Windows (Local)	11.97
On	Windows (Local), 8 workers	3.30
On	Red Hat Enterprise Linux (a9 Server), 12 workers	2.79

Exercise 2: Searching for the global minimum

With parallel processing

tic % Start stopwatch timer
% Consider a function with several local minima.
fun = @(x) x.^2 + 4*sin(5*x);
fplot(fun,[-10,10])
rng default % For reproducibility
opts = optimoptions(@fmincon,'Algorithm','sqp');
problem = createOptimProblem('fmincon','objective',...
    fun,'x0',3,'lb',-5,'ub',5,'options',opts);
ms = MultiStart('UseParallel', true);
% To search for the global minimum, run MultiStart on 2000 instances of the
% problem using the fmincon 'sqp' algorithm.
[x,f] = run(ms,problem,2000)
toc % Terminate stopwatch timer

Without parallel processing

tic % Start stopwatch timer
% Consider a function with several local minima.
fun = @(x) x.^2 + 4*sin(5*x);
fplot(fun,[-10,10])
rng default % For reproducibility
opts = optimoptions(@fmincon,'Algorithm','sqp');
problem = createOptimProblem('fmincon','objective',...
    fun,'x0',3,'lb',-5,'ub',5,'options',opts);
ms = MultiStart('UseParallel', false);
% To search for the global minimum, run MultiStart on 2000 instances of the
% problem using the fmincon 'sqp' algorithm.
[x,f] = run(ms,problem,2000)
toc % Terminate stopwatch timer

Processing time comparison (unit: seconds)

Parallel processing	Environment	Time (s)
Off	Windows (Local)	6.20
On	Windows (Local), 8 workers	1.70
On	Red Hat Enterprise Linux (a9 Server), 12 workers	1.52

Exercise 3: Optimizing an SVM classifier

With parallel processing

tic % Start stopwatch timer
load ionosphere % Load the ionosphere data set.
rng default % Set the random seed
% Find hyperparameters that minimize five-fold cross-validation loss by
% using automatic hyperparameter optimization, evaluated in parallel.
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto', ...
'HyperparameterOptimizationOptions',struct('UseParallel',true))
toc % Terminate stopwatch timer

Without parallel processing

tic % Start stopwatch timer
load ionosphere % Load the ionosphere data set.
rng default % Set the random seed for reproducibility
% Find hyperparameters that minimize five-fold cross-validation loss by
% using automatic hyperparameter optimization, evaluated serially.
Mdl = fitcsvm(X,Y,'OptimizeHyperparameters','auto', ...
'HyperparameterOptimizationOptions',struct('UseParallel',false))
toc % Terminate stopwatch timer

Processing time comparison (unit: seconds)

Parallel processing	Environment	Time (s)
Off	Windows (Local)	32.38
On	Windows (Local), 8 workers	8.23
On	Red Hat Enterprise Linux (a9 Server), 12 workers	7.86

Exercise 4: Clustering data with parallel processing

With parallel processing

Mu = bsxfun(@times,ones(20,300),(1:20)'); % Gaussian mixture mean
rn300 = randn(300,300);
Sigma = rn300'*rn300; % Symmetric and positive-definite covariance
Mdl = gmdistribution(Mu,Sigma); % Define the Gaussian mixture distribution

rng(1); % For reproducibility
X = random(Mdl,10000);
% Specify the options for parallel computing.
stream = RandStream('mlfg6331_64');  % Random number stream
options = statset('UseParallel',1,'UseSubstreams',1,...
    'Streams',stream);
% Cluster the data using k-means clustering. Specify that there are k = 200
% clusters in the data and increase the number of iterations. Typically, the
% objective function contains local minima. Specify 10 replicates to help
% find a lower local minimum.

tic; % Start stopwatch timer
[idx,C,sumd,D] = kmeans(X,200,'Options',options,'MaxIter',10000,...
    'Display','final','Replicates',10);
toc % Terminate stopwatch timer

Without parallel processing

Mu = bsxfun(@times,ones(20,300),(1:20)'); % Gaussian mixture mean
rn300 = randn(300,300);
Sigma = rn300'*rn300; % Symmetric and positive-definite covariance
Mdl = gmdistribution(Mu,Sigma); % Define the Gaussian mixture distribution

rng(1); % For reproducibility
X = random(Mdl,10000);
% Specify the options, with parallel computing disabled.
stream = RandStream('mlfg6331_64');  % Random number stream
options = statset('UseParallel',false,'UseSubstreams',1,...
    'Streams',stream);
% Cluster the data using k-means clustering. Specify that there are k = 200
% clusters in the data and increase the number of iterations. Typically, the
% objective function contains local minima. Specify 10 replicates to help
% find a lower local minimum.

tic; % Start stopwatch timer
[idx,C,sumd,D] = kmeans(X,200,'Options',options,'MaxIter',10000,...
    'Display','final','Replicates',10);
toc % Terminate stopwatch timer

Processing time comparison (unit: seconds)

Parallel processing	Environment	Time (s)
Off	Windows (Local)	8.45
On	Windows (Local), 8 workers	9.90
On	Red Hat Enterprise Linux (a9 Server), 12 workers	11.00

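Exercise 5: Running MATLAB functions on multiple GPUs

Using a single GPU

N = 1000;
r = gpuArray.linspace(0,4,N); % Growth rates, created on the GPU
x = rand(1,N,"gpuArray");     % Random initial population on the GPU
numIterations = 1000;
for n=1:numIterations
    x = r.*x.*(1-x);          % Iterate the logistic map
end
plot(r,x,'.');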

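Measuring the elapsed time

N = 1000;
r = gpuArray.linspace(0,4,N);
x = rand(1,N,"gpuArray");
numIterations = 1000;
tic; % Start stopwatch timer
for n=1:numIterations
    x = r.*x.*(1-x);
end
wait(gpuDevice); % Wait for the GPU to finish before stopping the timer
toc; % Terminate stopwatch timer
plot(r,x,'.');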

Using multiple GPUs with parfor

% N, r, and numIterations are reused from the single-GPU script above.
numGPUs = gpuDeviceCount("available");
parpool(numGPUs); % One worker per available GPU
numSimulations = 100;
X = zeros(numSimulations,N,"gpuArray");
parfor i = 1:numSimulations
    X(i,:) = rand(1,N,"gpuArray");
    for n=1:numIterations
        X(i,:) = r.*X(i,:).*(1-X(i,:));
    end
end
figure
plot(r,X,'.');

Because no supported GPU device was available, numGPUs was changed to 1.
Measuring the elapsed time

parpool(1);
numSimulations = 100;
numIterations = 1000;
N = 1000;
r = gpuArray.linspace(0,4,N);
X = zeros(numSimulations,N,"gpuArray");
tic;
parfor i = 1:numSimulations
    X(i,:) = rand(1,N,"gpuArray");
    for n=1:numIterations
        X(i,:) = r.*X(i,:).*(1-X(i,:));
    end
end
toc;
figure
plot(r,X,'.');
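Whether a supported GPU is present can be checked before any gpuArray is created. The following is a minimal sketch, assuming a hypothetical flag useGPU and a CPU fallback that is not part of the original exercise; it runs the same logistic-map iteration on whichever device is available.

```matlab
% Sketch: run the logistic-map iteration on the GPU if one is usable, else on the CPU.
N = 1000;
numIterations = 1000;
r = linspace(0, 4, N);   % growth rates (CPU arrays to start with)
x = rand(1, N);          % random initial population

useGPU = canUseGPU;      % hypothetical flag; true only if a supported GPU is usable
if useGPU
    fprintf("GPUs available: %d\n", gpuDeviceCount("available"));
    r = gpuArray(r);     % move the data to the GPU
    x = gpuArray(x);
else
    disp("No supported GPU found; running on the CPU.");
end

for n = 1:numIterations
    x = r .* x .* (1 - x);
end

if useGPU
    r = gather(r);       % bring the results back to host memory
    x = gather(x);
end
```

After the script finishes, plot(r,x,'.') draws the same kind of figure as the single-GPU script.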

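Using multiple GPUs asynchronously with parfeval

% Preallocate an array of future objects, one per simulation.
f(numSimulations) = parallel.FevalFuture;
% Display the contents of the function file myParallelFcn.m.
type myParallelFcn
% Submit each simulation to the pool without blocking the client.
for i=1:numSimulations
    f(i) = parfeval(@myParallelFcn,1,r);
end
figure
hold on
% Plot each result as soon as its future completes.
afterEach(f,@(x) plot(r,x,'.'),0);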
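The script above calls myParallelFcn, but its body is not listed here. The following is a hypothetical version, assuming it repeats the logistic-map iteration from the single-GPU script; the use of "like" to create the random start on the same device as r is a choice made for this sketch, so the function also runs on CPU input.

```matlab
function x = myParallelFcn(r)
% Hypothetical body for myParallelFcn; save it as myParallelFcn.m.
% Iterates the logistic map x <- r.*x.*(1-x) from a random start.
    N = numel(r);                 % one state per growth-rate value
    x = rand(1, N, "like", r);    % random start on the same device as r
    numIterations = 1000;
    for n = 1:numIterations
        x = r .* x .* (1 - x);
    end
end
```

Calling x = myParallelFcn(linspace(0,4,1000)) on the client is a quick way to check the function before submitting it with parfeval.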

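Exercises 1 to 4: with and without parallel processing, Local versus a9 Server

Exercises 1 to 3
- Without parallel processing (Local): processing time is long, but varies little between runs.
- With parallel processing (Local): processing time is short, but varies widely between runs.
- With parallel processing (a9 Server): processing time is short and varies little between runs.
Exercise 4
- Processing time was shorter without parallel processing.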

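[Figures for Exercise 1 (block processing of a large image), Exercise 2 (searching for the global minimum), Exercise 3 (optimizing an SVM classifier), and Exercise 4 (clustering data with parallel processing)]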
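The comparison can be made concrete by computing speedup factors from the measured times. This is a small sketch using the values reported in the processing time tables; the variable names times and speedup are introduced here for illustration.

```matlab
% Processing times in seconds, copied from the comparison tables in this guide.
% Columns: serial (Local), parallel (Local, 8 workers), parallel (a9 Server, 12 workers).
times = [11.97  3.30  2.79;    % Exercise 1: block processing
          6.20  1.70  1.52;    % Exercise 2: global minimum search
         32.38  8.23  7.86;    % Exercise 3: SVM optimization
          8.45  9.90 11.00];   % Exercise 4: k-means clustering

% Speedup = serial time / parallel time; values below 1 mean a slowdown.
speedup = times(:,1) ./ times(:,2:3);

for k = 1:size(times,1)
    fprintf("Exercise %d: Local x%.2f, a9 Server x%.2f\n", ...
        k, speedup(k,1), speedup(k,2));
end
```

Exercises 1 to 3 give factors above 1 on both the local pool and the a9 Server, while Exercise 4 gives factors below 1, matching the observations listed above.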

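Differences between running with and without parallel processing

With parallel processing
- The command uses 'UseParallel',true.
- While processing, the icon in the lower-left corner turns green.
Without parallel processing
- The command uses 'UseParallel',false.
- While processing, the icon in the lower-left corner turns blue.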

MATLAB R2022a: Parallel Server Evaluation
