Construction of a Dimensional-Reduction Theory and Massive Parallelization of the RPA Code

LaFeAsO, κ-(ET)2Cu(SCN)2, EtMe3Sb[Pd(dmit)2]2
Kazuma Nakamura (School of Engineering, Univ. of Tokyo, A03-9)
Post constrained RPA Project: Reduction of spatial dimension
KN-Yoshimoto-Nohara-Imada, J. Phys. Soc. Jpn. 79, 123708 (2010)
[Figure: diagram with vertices z1, z2, z3, z4]
KAZUMA NAKAMURA (A03-9)
YOSHIHIDE YOSHIMOTO (A02-5)
MASATOSHI IMADA (A03-9)
Acknowledge:
YOSHIRO NOHARA (Max Planck Institute)
Aim and Background
Strong correlation and quantum fluctuation from first principles, and prediction of new phases and functions of correlated materials
- Ab initio construction of an effective model describing low-energy properties
- Analysis of the derived model, taking strong correlation and quantum fluctuation into account with high accuracy
LDA+Dynamical-Mean-Field Theory,
V. I. Anisimov, et al. J. Phys. Cond. Mat., 9, 767 (1997)
LDA+path-integral-renormalization-group;
Y. Imai, I. V. Solovyev, M. Imada, PRL 95, 176405 (2005)
Feasibility Studies (2006-present)
(1) Iron-based superconductors:
- KN-Arita-Imada, JPSJ 77, 093711 (2008)
- Miyake-KN-Arita-Imada, JPSJ 79, 044705 (2010)
- Misawa-KN-Imada, JPSJ 80, 023704 (2011)
- KN-Yoshimoto-Nohara-Imada, JPSJ 79, 123708 (2010)
(2) Organic compounds:
- KN-Yoshimoto-Kosugi-Arita-Imada, JPSJ 78, 083710 (2009)
- Shinaoka-Misawa-KN-Imada, in preparation
(3) Alkali-cluster-in-zeolite systems:
- KN-Koretsune-Arita, PRB 80, 043941 (2009)
(4) Transition metal and its oxides:
- KN-Arita-Yoshimoto-Tsuneyuki, PRB 74, 235113 (2006)
- Miyake-Aryasetiawan-Imada, PRB 80, 155134 (2009)
(5) Excited states of semiconductors:
- KN-Yoshimoto-Arita-Tsuneyuki-Imada, PRB 77, 195126 (2008)
(6) Review:
- Imada-Miyake, JPSJ 79, 112001 (2010)
Low-energy Hamiltonian
1) Basis function: maximally localized Wannier function (Marzari-Vanderbilt 1997, Souza-Marzari-Vanderbilt 2002)
2) Transfer integral: matrix elements of the DFT Kohn-Sham Hamiltonian
3) Screened Coulomb, screened exchange: constrained RPA
   Original idea: Aryasetiawan et al. 2004, Solovyev-Imada 2005
   Practical detail: KN-Arita-Imada, JPSJ 77, 093711 (2008)
RPA polarizability:
[Figure: band diagram around Ef divided into occupied (O), target (T), and virtual (V) bands, with four classes of transitions labeled (1)-(4)]
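The bookkeeping behind the band diagram can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual constrained-RPA rule that a transition is excluded only when both its initial and final states lie in the target (T) subspace. The toy band energies, the unit matrix elements, and the names `transition_kept` and `static_weight` are illustrative assumptions and are not taken from the original code.

```python
from itertools import product

# Toy bands: (energy in eV relative to Ef, subspace label, occupation).
# O = occupied, T = target, V = virtual. All values are hypothetical.
BANDS = [
    (-5.0, "O", 1.0),
    (-0.5, "T", 1.0),
    (0.5, "T", 0.0),
    (6.0, "V", 0.0),
]


def transition_kept(from_space, to_space):
    """cRPA rule: drop a transition only if both states are target bands."""
    return not (from_space == "T" and to_space == "T")


def static_weight(bands, constrained):
    """Sum (f_i - f_j) / (e_j - e_i) over occupied -> empty transitions.

    Matrix elements are set to 1, so this only illustrates which
    transitions enter the sum, not a real polarization.
    """
    total = 0.0
    for (e_i, s_i, f_i), (e_j, s_j, f_j) in product(bands, repeat=2):
        if f_i <= f_j:
            continue  # not an occupied -> empty transition
        if constrained and not transition_kept(s_i, s_j):
            continue  # T -> T transition is left to the model solver
        total += (f_i - f_j) / (e_j - e_i)
    return total


print("full RPA weight:", static_weight(BANDS, constrained=False))
print("cRPA weight:    ", static_weight(BANDS, constrained=True))
```

With these toy numbers the low-energy T-to-T transition carries most of the weight, so removing it leaves a much weaker screening contribution.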
LaFeAsO: constrained RPA
KN-Arita-Imada, JPSJ 77, 093711 (2008)
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, constrained RPA, full RPA; guide lines: 1/r and 1/(6.7r)]
The cRPA interaction is a 3D interaction with a long-range tail that decays as a power law
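To get a feeling for the size of such a power-law tail, the two guide lines can be evaluated numerically. This is a minimal sketch, assuming that the labels 1/r and 1/(6.7r) denote the Coulomb law without screening and with a uniform dielectric constant of 6.7; the function name `coulomb_tail_ev` is hypothetical.

```python
# e^2 / (4 pi eps0) expressed in eV * Angstrom.
COULOMB_EV_ANGSTROM = 14.3996


def coulomb_tail_ev(r_angstrom, eps=1.0):
    """Coulomb interaction in eV at distance r, divided by a constant eps."""
    return COULOMB_EV_ANGSTROM / (eps * r_angstrom)


for r in (2.0, 5.0, 10.0):
    bare = coulomb_tail_ev(r)
    screened = coulomb_tail_ev(r, eps=6.7)
    print(f"r = {r:5.1f} A   1/r: {bare:6.3f} eV   1/(6.7 r): {screened:6.3f} eV")
```

Even at 10 Angstrom the screened tail is still of order 0.2 eV under this assumption, so the interaction is far from on-site only.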
LaFeAsO
What's the problem?
We derive ab initio parameters for a 3D model, while we solve a 2D model in the analysis stage
[Figure: LaFeAsO crystal structure with LaO and FeAs layers]
We have a serious problem of "dimensional inconsistency"
- LaFeAsO is a quasi-2D system
- Derived model = 3D model, analyzed model = 2D model
- Treating strong quantum fluctuation effects with high accuracy is considerably difficult for the 3D model
Reducing 3D to 2D
KEY IDEA: renormalize the spatial dimension ("Dimensional Downfolding")
We extend the cRPA idea to the spatial degrees of freedom
[Figure: a 3D model with interlayer and intralayer interactions is reduced to a 2D model; the interlayer interaction is deleted, and the intralayer interaction becomes a renormalized interaction through interlayer screening]
Computational details:
[Six numbered equations; the corresponding steps are listed under "Computational details" in the Appendix]
LaFeAsO: Band & Wannier
Typical quasi-2D system, a good target for the present study
[Figure: band structure and Wannier functions (xy, yz, z2, zx, x2-y2) in the LaO/FeAs layered structure]
t ~ 300 meV, t⊥ ~ 10 meV
LaFeAsO: 2D downfolded
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, 2D-cRPA, full RPA]
Summary
Nakamura-Yoshimoto-Nohara-Imada, J. Phys. Soc. Jpn. 79, 123708 (2010)
We developed a new ab initio downfolding scheme for deriving effective low-energy models in low dimensions
It justifies 2D short-ranged Hubbard models as effective models from first principles
[Figure: two panels of interaction (eV) vs. r (Angstrom)]
Performance Report for Massively-Parallel Project
For constrained-RPA code
KAZUMA NAKAMURA (A03-9)
YOSHIHIDE YOSHIMOTO (A02-5)
Acknowledge:
YOSHIRO NOHARA (Max Planck Institute)
YUICHIRO MATSUSHITA (OSHIYAMA Lab)
HIROAKI ISHIZUKA (MOTOME Lab)
Computational cost
[Equation: polarization formula annotated with the sizes Nk, Nk Nb Nb, and NPW]
cost ∝ (Nk)^2 (Nb)^2 NPW
(Nk)^2 (Nb)^2 NPW / [Nk Nb NPW x O(10)] = Nk Nb / O(10) = 10,000 (if Nk = 100, Nb = 1,000)
Need: distributed-memory code
EtMe3Sb[Pd(dmit)2]2
Memory size of ~400 Gbyte with Nband = 2000, Nk = 125, NPW = 100,000
The data cannot be stored on a single node
A "distributed-memory RPA code" must be developed
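The quoted memory size can be reproduced with simple arithmetic. This is a minimal sketch, assuming that the stored data are plane-wave coefficients held as double-precision complex numbers (16 bytes each), one per band, k point, and plane wave; the function names and the process counts in the loop are hypothetical.

```python
def coefficient_memory_bytes(n_band, n_k, n_pw, bytes_per_coeff=16):
    """Bytes needed for n_band * n_k * n_pw complex coefficients."""
    return n_band * n_k * n_pw * bytes_per_coeff


def per_process_gbyte(total_bytes, n_proc):
    """Even share of the data per process, in Gbyte (10^9 bytes)."""
    return total_bytes / n_proc / 1e9


total = coefficient_memory_bytes(n_band=2000, n_k=125, n_pw=100_000)
print("total:", total / 1e9, "Gbyte")

# Hypothetical process counts, to show how the share per process shrinks.
for n_proc in (1, 16, 128):
    print(n_proc, "processes:", per_process_gbyte(total, n_proc), "Gbyte each")
```

Under this assumption the total comes out at 400 Gbyte, matching the figure on the slide, and splitting it over 128 processes leaves about 3 Gbyte per process.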
Massive parallelization I (proposed by YOSHIHIDE YOSHIMOTO)
Division of data: Steps 1-4
[Figure: the occupied (occ) and unoccupied (unocc) data are each split into blocks 1-10 (Data split), and the blocks are sent to the MPI processes (Data send to MPI). Each process then calculates its contribution (Calc), and the contributions are summed: 1 + 2 + ... + 10]
Massive parallelization II
Steps 6-7
[Figure: the blocks held by the processes are shifted cyclically by one position, from 1, 2, ..., 10 to 10, 1, ..., 9 (Data Rotation with MPI_SENDRECV). Each process calculates its contribution (Calc), and the contributions are summed: 1 + 2 + ... + 10]
Step8
Step9
[Figure: the blocks are shifted cyclically once more, to 9, 10, 1, ..., 8 (Data Rotation with MPI_SENDRECV), followed by Calc and the sum 1 + 2 + ... + 10]
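The rotation scheme in Steps 6-9 can be checked with a small serial simulation. This is a minimal sketch, assuming that each of the 10 processes keeps its occ block fixed while the unocc blocks travel around a ring, one position per step (the slides do not state which of the two data sets is rotated). No MPI library is used here; the list shift stands in for the MPI_SENDRECV exchange, and the function names are hypothetical.

```python
def rotate_once(blocks):
    """Shift blocks one position around the ring (last rank wraps to rank 0)."""
    return [blocks[-1]] + blocks[:-1]


def ring_rotation_pairs(n_proc):
    """Return every (occ block, unocc block) pair handled during the rotation."""
    occ = list(range(1, n_proc + 1))    # stays on its home process
    unocc = list(range(1, n_proc + 1))  # travels around the ring
    covered = set()
    for _step in range(n_proc):
        # "Calc": every process works on the two blocks it currently holds.
        for rank in range(n_proc):
            covered.add((occ[rank], unocc[rank]))
        # "Data Rotation": pass the unocc block to the neighbouring process.
        unocc = rotate_once(unocc)
    return covered


pairs = ring_rotation_pairs(10)
print("layout after one rotation:", rotate_once(list(range(1, 11))))
print("pairs covered:", len(pairs), "of", 10 * 10)
```

After 10 steps every occ block has met every unocc block exactly once, while each process only ever holds one block of each kind.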
MPI_COMM_SPLIT
core = 128
[Figure: MPI_COMM_SPLIT divides the processes into communicators assigned to (q1), (q2), (q3), (q4)]
- 4 comm
- 8 MPI x 4 OMP per comm
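The grouping on this slide can be sketched without an MPI runtime. MPI_COMM_SPLIT takes a color (which sub-communicator a process joins) and a key (its rank order inside it). This is a minimal sketch, assuming contiguous blocks of world ranks are mapped to the same communicator; the actual mapping used by the code is not given on the slide, and `split_ranks` is a hypothetical helper.

```python
def split_ranks(n_ranks, n_comm):
    """Return (world_rank, color, key) for an even, contiguous split."""
    if n_ranks % n_comm != 0:
        raise ValueError("n_ranks must be divisible by n_comm")
    per_comm = n_ranks // n_comm
    return [(rank, rank // per_comm, rank % per_comm) for rank in range(n_ranks)]


# 128 cores = 4 communicators x 8 MPI processes x 4 OpenMP threads,
# so there are 4 * 8 = 32 MPI processes in total.
layout = split_ranks(n_ranks=32, n_comm=4)
for color in range(4):
    members = [rank for rank, c, _key in layout if c == color]
    print(f"comm {color}: world ranks {members[0]}-{members[-1]}")
```

In a real MPI program the color and key computed this way would be passed to MPI_COMM_SPLIT, and each resulting communicator would then work on its own q point.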
Performance of our code: Benchmark for small system: SrVO3

SrVO3@kashiwa, 2 q points (n = MPI x OMP)
n            time (sec)   parallelized fraction (%)   efficiency (%)   speedup
1            341.4        -                           -                -
4 (1x4)      89.9         98.2                        94.9             3.8
8 (2x4)      49.5         97.7                        86.2             6.9
12 (3x4)     33.8         98.3                        84.1             10.1
16 (4x4)     27.3         98.1                        78.2             12.5

SrVO3@kashiwa, 20 q points (n = COMM x MPI x OMP)
n              time (sec)   parallelized fraction (%)   efficiency (%)   speedup
1              3590         -                           -                -
8 (1x2x4)      492          98.6                        91.3             7.3
32 (4x2x4)     126          99.6                        89.3             28.6
80 (10x2x4)    51           99.8                        87.6             70.1
160 (20x2x4)   26           99.9                        85.8             137.3
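The three derived figures that accompany each timing are consistent with the standard definitions of speedup, parallel efficiency, and the parallelized fraction from Amdahl's law. This is a minimal sketch that recomputes them from the SrVO3 timings; the interpretation of the columns is inferred from the numbers, and the function name is hypothetical.

```python
def scaling_metrics(t_serial, t_parallel, n):
    """Return (speedup, efficiency, parallelized fraction) for n cores.

    speedup    S = T(1) / T(n)
    efficiency E = S / n
    fraction   p solves Amdahl's law S = 1 / ((1 - p) + p / n)
    """
    speedup = t_serial / t_parallel
    efficiency = speedup / n
    fraction = (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)
    return speedup, efficiency, fraction


# Timings in seconds from the 2-q-point SrVO3 benchmark.
t1 = 341.4
for n, tn in ((4, 89.9), (8, 49.5), (12, 33.8), (16, 27.3)):
    s, e, p = scaling_metrics(t1, tn, n)
    print(f"n = {n:2d}  speedup = {s:5.1f}  efficiency = {100 * e:5.1f} %  "
          f"parallelized fraction = {100 * p:5.1f} %")
```

The parallelized fraction stays close to 98 % in all rows, which is what limits the efficiency as n grows.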
Test run on 2011/1/14: 2048-core calculation
Performance of our code: Benchmark for large system: C60
C60@kashiwa, 1 q point (n = MPI x OMP)
n            time (sec)   parallelized fraction (%)   efficiency (%)   speedup
1            15639.1      -                           -                -
4 (1x4)      4077.0       98.6                        95.9             3.8
8 (2x4)      2108.3       98.9                        92.7             7.4
16 (4x4)     1071.3       99.4                        91.2             14.6
32 (8x4)     542.6        99.6                        90.1             28.8
64 (16x4)    297.9        99.7                        82.0             52.5

C60@kashiwa, 32 q points (n = COMM x MPI x OMP; speedup counted relative to the n = 64 run)
n                 time (sec)   parallelized fraction (%)   efficiency (%)   speedup
64 (1x16x4)       9202.83      -                           -                -
128 (2x16x4)      4657.06      99.98                       98.81            126.72
256 (4x16x4)      2352.64      99.99                       97.79            250.24
512 (8x16x4)      1166.69      100.00                      98.60            504.96
1024 (16x16x4)    589.33       100.00                      97.62            999.04
2048 (32x16x4)    305.81       100.00                      94.04            1925.76
Production run on 2011/2/11: 4096-core calculation
Constrained RPA for dmit
Conditions of the production run:
- Nk = 75 (5x5x3)
- Nband = 1000 (Nocc = 464, Npocc = 4, Nvir = 532)
- Ecut (wavefunction) = 36 Ry (100,000 PWs)
- Ecut (polarization) = 4.0 Ry (3,200 PWs)
Architecture and performance:
- SGI Altix ICE 8400EX system
- X5570 (4 cores) x 2
- Ifort 11.1, SGI-oriented MPI, InfiniBand
- 4096 cores (4 comm x 128 MPI x 8 OMP)
- Total time = 43h19min
Dielectric function: dmit and κ-bedt
[Figure: εM(q+G) vs. |q + G| (a.u.) for dmit and κ-bedt]
dmit: 4096 cores, 43h19min, kashiwa
κ-bedt: 128 cores, 384 h (16 days), SR11000@ITC; 1/6 of dmit
Convergence:
[Figure: εM(q+G) vs. |q + G| (a.u.) for κ-bedt and dmit, with curves labeled 20.0 eV and 12.5 eV; additional axis: Energy (eV)]
3D-cRPA interaction: dmit and κ-bedt
[Figure: interaction (eV) vs. r (Angstrom) for dmit and κ-bedt; curves: bare, 3D-cRPA]
Unfortunately, the dmit result is not yet converged.
APPENDIX
Computational data:
[Figure: diagram with vertices z1, z2, z3, z4]
κ-Cu(SCN)2: Band & Wannier
[Figure: geometry and Wannier function]
t ~ 65 meV, t⊥ ~ 0.1 meV
κ-Cu(SCN)2: 2D downfolded
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, 2D-cRPA, full RPA]
Screening length
[Figure: interaction (eV) vs. r (Å); LaFeAsO: zero at 8.4 Å, c = 8.4 Å; κ-Cu(SCN)2: zero at 16.4 Å, c = 16.4 Å]
Thus, the screening length of the interlayer screening corresponds to the c value
Feynman diagram for screened interaction
[Figure: diagram with vertices z1, z2, z3, z4]
The Coulomb interaction between electrons at z1 and z4 is screened by the RPA polarization at (z2, z3)
Interlayer screening
[Figure: diagram with vertices z1, z2, z3, z4]
Electrons at z1 and z4 are in the target layer, while the screening electrons at z2 and z3 are in another layer
Other types of interlayer screening:
[Figure: three further diagrams with vertices z1, z2, z3, z4]
Computational details:
(0) The steps below are performed after the cRPA calculation
(1) Target-band RPA
(2) Fourier transform (wave vector in BZ; reciprocal lattice vector: in-plane, out-of-plane)
[Figure: five panels showing polarization diagrams with vertices z1-z4 distributed over Layer 1, Layer 2 (= target), and Layer 3]
We have to cut this polarization to avoid double counting
(3) Polarization cutting (the cut part is set to 0)
(4) Inverse Fourier transform of the cut polarization; g, g': reciprocal lattice vectors of the superlattice
(5) 2D dielectric function
(6) 2D screened Coulomb
(7) 2D screened exchange
LaFeAsO: cRPA (previous slide)
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, full RPA]
Program structure
Polarization
do q = 1, Nk
  do pair = 1, Npair
    do k = 1, Nk
      call FFT module
    enddo
    call TETRAHEDRON module
    do G = 1, NPW
      do G' = 1, NPW
        do i = 1, N
          do k = 1, Nk
          enddo
        enddo
      enddo
    enddo
  enddo
enddo