Construction of a Dimensional-Reduction Theory and Large-Scale Parallelization of the RPA Code
LaFeAsO, κ-(ET)2Cu(SCN)2, EtMe3Sb[Pd(dmit)2]2
Kazuma Nakamura (School of Engineering, The University of Tokyo; A03-9)

Post constrained RPA Project: Reduction of spatial dimension
KN-Yoshimoto-Nohara-Imada, J. Phys. Soc. Jpn. 79, 123708 (2010)
KAZUMA NAKAMURA (A03-9), YOSHIHIDE YOSHIMOTO (A02-5), MASATOSHI IMADA (A03-9)
Acknowledgment: YOSHIRO NOHARA (Max Planck Institute)
[Figure: diagram with vertices z1, z2, z3, z4]

Aim and Background
Strong correlation and quantum fluctuation from first principles, and prediction of new phases and functions of correlated materials.
- Ab initio construction of an effective model describing low-energy properties
- High-accuracy analysis of the derived model, taking strong correlation and quantum fluctuation into account
  - LDA + dynamical mean-field theory: V. I. Anisimov et al., J. Phys. Cond. Mat. 9, 767 (1997)
  - LDA + path-integral renormalization group: Y. Imai, I. V. Solovyev, M. Imada, PRL 95, 176405 (2005)

Feasibility Studies (2006-present)
(1) Iron-based superconductors:
- KN-Arita-Imada, JPSJ 77, 093711 (2008)
- Miyake-KN-Arita-Imada, JPSJ 79, 044705 (2009)
- Misawa-KN-Imada, JPSJ 80, 023704 (2011)
- KN-Yoshimoto-Nohara-Imada, JPSJ 79, 123708 (2010)
(2) Organic compounds:
- KN-Yoshimoto-Kosugi-Arita-Imada, JPSJ 78, 083710 (2009)
- Shinaoka-Misawa-KN-Imada, in preparation
(3) Alkali-cluster-in-zeolite systems:
- KN-Koretsune-Arita, PRB 80, 043941 (2009)
(4) Transition metal and its oxides:
- KN-Arita-Yoshimoto-Tsuneyuki, PRB 74, 235113 (2006)
- Miyake-Aryasetiawan-Imada, PRB 80, 155134 (2009)
(5) Excited states of semiconductors:
- KN-Yoshimoto-Arita-Tsuneyuki-Imada, PRB 77, 195126 (2008)
(6) Review:
- Imada-Miyake, JPSJ 79, 112001 (2010)

Low-energy Hamiltonian
1) Basis function: maximally localized Wannier function (Marzari-Vanderbilt 1997, Souza-Marzari-Vanderbilt 2002)
2) Transfer integral: matrix elements of the DFT Kohn-Sham Hamiltonian
3) Screened Coulomb and screened exchange: constrained RPA. Original idea: Aryasetiawan et al.
2004, Solovyev-Imada 2005. Practical detail: KN-Arita-Imada, JPSJ 77, 093711 (2008)

RPA polarizability
[Figure: band diagram with virtual (V), target (T), and occupied (O) bands around Ef; transitions labeled (1)-(4)]

LaFeAsO: constrained RPA
KN-Arita-Imada, JPSJ 77, 093711 (2008)
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, constrained RPA, full RPA, 1/r, 1/(6.7r)]
The cRPA interaction is a 3D interaction with a long-range tail that decays as a power law.

LaFeAsO: What's the problem?
We derive ab initio parameters for a 3D model, but we solve a 2D model in the analysis stage.
[Figure: crystal structure with LaO layer and FeAs layer]
- LaFeAsO is a quasi-2D system.
- Derived model = 3D model; analyzed model = 2D model.
- We therefore have a serious problem of "dimensional inconsistency".
- Treating strong quantum fluctuation effects with high accuracy is considerably difficult for the 3D model.

Reducing 3D to 2D
KEY IDEA: renormalize the spatial dimension ("Dimensional Downfolding").
We extend the cRPA idea to the spatial degrees of freedom.
[Figure: 3D model (interlayer interaction, intralayer interaction) reduced to a 2D model (renormalized interaction); labels: delete, interlayer screening]

Computational details:
[Equations for steps 1-6 are not preserved in this transcript]

LaFeAsO: Band & Wannier
Typical quasi-2D system, a good target for the present study.
[Figure: band structure and Wannier functions (xy, yz, z2, zx, x2-y2) with the LaO/FeAs layer stacking; t 300 meV, tD 10 meV]

LaFeAsO: 2D downfolded
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, 2D-cRPA, full RPA]

Summary
Nakamura-Yoshimoto-Nohara-Imada, J. Phys. Soc. Jpn.
79, 123708 (2010)
- We developed a new ab initio downfolding scheme for deriving effective low-energy models in low dimensions.
- It justifies 2D short-ranged Hubbard models as effective models from first principles.
[Figure: interaction (eV) vs. r (Angstrom), two panels]

Performance Report for Massively-Parallel Project: constrained-RPA code
KAZUMA NAKAMURA (A03-9), YOSHIHIDE YOSHIMOTO (A02-5)
Acknowledgment: YOSHIRO NOHARA (Max Planck Institute), YUICHIRO MATSUSHITA (OSHIYAMA Lab), HIROAKI ISHIZUKA (MOTOME Lab)

Computational cost
$\mathrm{cost} \propto N_k^2 N_b^2 N_{PW}$
$\frac{N_k^2 N_b^2 N_{PW}}{N_k N_b N_{PW} \times O(10)} = \frac{N_k N_b}{O(10)} = 10{,}000 \quad (\text{if } N_k = 100,\ N_b = 1{,}000)$

Need: distributed-memory code
EtMe3Sb[Pd(dmit)2]2: memory size of ~400 GB with Nband = 2000, Nk = 125, NPW = 100000.
The data cannot be stored on a single node, so a "distributed-memory RPA code" has to be developed.

For massive parallelization I (proposed by YOSHIHIDE YOSHIMOTO)
Division of data:
[Figure, Steps 1-4: data split into occ and unocc blocks 1-10; data sent to the MPI processes; calculation; partial results 1 + 2 + ... + 10 summed]

For massive parallelization II
[Figure, Steps 6-7: data rotation with MPI_SENDRECV (block order 10, 1, 2, ..., 9); calculation; partial results 1 + 2 + ... + 10 summed]
[Figure, Steps 8-9: data rotation with MPI_SENDRECV (block order 9, 10, 1, ..., 8); calculation; partial results 1 + 2 + ... + 10 summed]

MPI_COMM_SPLIT
core = 128, divided with MPI_COMM_SPLIT into groups (q1), (q2), (q3), (q4):
- 4 comm
- 8 MPI x 4 OMP per comm
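The rotation scheme on the two parallelization slides can be imitated without MPI. The sketch below is only an illustration under stated assumptions: it assumes that each rank keeps one occupied block fixed while the unoccupied blocks travel around a periodic ring, and it replaces MPI_SENDRECV by a list shift. The function names `rotation_schedule` and `distributed_pair_sum`, and the toy pair function `o * u`, are hypothetical and do not come from the slides.

```python
def rotation_schedule(n_ranks):
    """Return schedule[step][rank] = (occ_block, unocc_block).

    Assumption: rank r always keeps occupied block r, while the
    unoccupied blocks are passed around a periodic ring.
    """
    held = list(range(n_ranks))  # held[r]: unoccupied block currently on rank r
    schedule = []
    for _ in range(n_ranks):
        schedule.append([(rank, held[rank]) for rank in range(n_ranks)])
        # Stand-in for MPI_SENDRECV: each rank sends its block to rank+1
        # and receives the block previously held by rank-1.
        held = [held[(rank - 1) % n_ranks] for rank in range(n_ranks)]
    return schedule


def distributed_pair_sum(occ, unocc, n_ranks):
    """Sum the toy quantity o * u over all (occ, unocc) pairs, block by block."""
    occ_blocks = [occ[r::n_ranks] for r in range(n_ranks)]
    unocc_blocks = [unocc[r::n_ranks] for r in range(n_ranks)]
    partial = [0] * n_ranks  # one partial sum per simulated rank
    for step in rotation_schedule(n_ranks):
        for rank, (ob, ub) in enumerate(step):
            partial[rank] += sum(
                o * u for o in occ_blocks[ob] for u in unocc_blocks[ub]
            )
    return sum(partial)  # stand-in for the final reduction over ranks
```

With 10 simulated ranks, the unoccupied blocks after the first shift sit in the order 10, 1, 2, ..., 9 (1-based), as in the Step 6-7 figure. After as many steps as there are ranks, every occupied/unoccupied block pair has been visited exactly once, and no rank ever holds more than one block of each kind.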
Performance of our code: benchmark for small system: SrVO3

SrVO3@kashiwa, 2 q-points (n = MPI x OMP)
n           time (sec)   parallel fraction (%)   efficiency (%)   speedup
1           341.4        -                       -                -
4 (1x4)     89.9         98.2                    94.9             3.8
8 (2x4)     49.5         97.7                    86.2             6.9
12 (3x4)    33.8         98.3                    84.1             10.1
16 (4x4)    27.3         98.1                    78.2             12.5
(Speedup is time(1)/time(n), efficiency is speedup/n, and the parallel fraction follows from Amdahl's law.)

SrVO3@kashiwa, 20 q-points (n = COMM x MPI x OMP)
n                time (sec)   parallel fraction (%)   efficiency (%)   speedup
1                3590         -                       -                -
8 (1x2x4)        492          98.6                    91.3             7.3
32 (4x2x4)       126          99.6                    89.3             28.6
80 (10x2x4)      51           99.8                    87.6             70.1
160 (20x2x4)     26           99.9                    85.8             137.3

Test run on 2011/1/14: 2048-core calculation

Performance of our code: benchmark for large system: C60

C60@kashiwa, 1 q-point (n = MPI x OMP)
n           time (sec)   parallel fraction (%)   efficiency (%)   speedup
1           15639.1      -                       -                -
4 (1x4)     4077.0       98.6                    95.9             3.8
8 (2x4)     2108.3       98.9                    92.7             7.4
16 (4x4)    1071.3       99.4                    91.2             14.6
32 (8x4)    542.6        99.6                    90.1             28.8
64 (16x4)   297.9        99.7                    82.0             52.5

C60@kashiwa, 32 q-points (n = COMM x MPI x OMP)
n                  time (sec)   parallel fraction (%)   efficiency (%)   speedup
64 (1x16x4)        9202.83      -                       -                -
128 (2x16x4)       4657.06      99.98                   98.81            126.72
256 (4x16x4)       2352.64      99.99                   97.79            250.24
512 (8x16x4)       1166.69      100.00                  98.60            504.96
1024 (16x16x4)     589.33       100.00                  97.62            999.04
2048 (32x16x4)     305.81       100.00                  94.04            1925.76

Production run on 2011/2/11: 4096-core calculation
Constrained RPA for dmit
Condition of production run:
- Nk = 75 (5x5x3), Nband = 1000 (Nocc = 464, Npocc = 4, Nvir = 532)
- Ecut( ) = 36 Ry (100,000 PWs), Ecut( ) = 4.0 Ry (3,200 PWs)
Architecture and performance:
- SGI Altix ICE 8400EX system, X5570 (4 core) x 2
- Ifort 11.1, SGI-oriented MPI, InfiniBand
- 4096 cores (4 comm x 128 MPI x 8 OMP)
- Total time = 43h19min

Dielectric function: dmit and κ-bedt
[Figure: M(q+G) vs. |q + G| (a.u.), panels dmit and κ-bedt]
- dmit: 4096 cores, 43h19min, kashiwa
- κ-bedt: 128 cores, 384h (16 days), SR11000@ITC; 1/6 of dmit

Convergence: κ-bedt and dmit
[Figure: M(q+G) vs. |q + G| (a.u.); labels: Energy (eV), 12.5 eV, 20.0 eV]

3D-cRPA interaction: dmit and κ-bedt
[Figure: interaction (eV) vs. r (Angstrom), panels dmit and κ-bedt; curves: bare, 3D-cRPA]
Unfortunately, the dmit calculation has yet to converge.

APPENDIX

Computational data:
[Figure: diagram with vertices z1, z2, z3, z4]

κ-Cu(SCN)2: Band & Wannier
[Figure panels: Geometry, Wannier]
t
65 meV, tD 0.1 meV

κ-Cu(SCN)2: 2D downfolded
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, 2D-cRPA, full RPA]

Screening length
[Figure: interaction (eV) vs. r (Å), two panels]
- LaFeAsO: zero at 8.4 Å (c = 8.4 Å)
- κ-Cu(SCN)2: zero at 16.4 Å (c = 16.4 Å)
Thus, the screening length of the interlayer screening corresponds to the c value.

Feynman diagram for screened interaction
The Coulomb interaction between electrons at z1 and z4 is screened by the RPA polarization of (z2, z3).
[Figure: diagram with vertices z1, z2, z3, z4]

Interlayer screening
Electrons at z1 and z4 are in the target layer, while the screening electrons at z2 and z3 are on another layer.
[Figure: other types of interlayer screening, with different placements of z1-z4]

Computational details:
(0) What follows is the post-cRPA story
(1) Target-band RPA
(2) Fourier transform of ... (wave vector in BZ; reciprocal lattice vector, in-plane and out-of-plane)
[Figures: Layers 1-3 with Layer 2 = target, showing the possible placements of z1-z4]
We have to cut this polarization to avoid double counting it.
(3) Polarization cutting (CUT: 0)
(4) Inverse FT of the cut ... (g, g': reciprocal lattice vectors of the superlattice)
(5) 2D dielectric function
(6) 2D screened Coulomb
(7) 2D screened exchange

LaFeAsO: cRPA (previous slide)
[Figure: interaction (eV) vs. r (Angstrom); curves: bare, 3D-cRPA, full RPA]

Program structure: Polarization
do q = 1, Nk
  do ... = 1, Npair
    do k = 1, Nk
      call FFT module to calculate ...
    enddo
    call TETRAHEDRON module to calculate ...
    do G = 1, NPW
      do G' = 1, NPW
        do ... = 1, N
          do k = 1, Nk
            ...
          enddo
        enddo
      enddo
    enddo
  enddo
enddo
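The loop nest above can be made concrete with a small pure-Python sketch for a single q-point. Everything specific here is an assumption for illustration, because the slide's formulas are not preserved in this transcript: the polarization is assumed to accumulate as a sum over band pairs and k-points of M(G) times the complex conjugate of M(G'), multiplied by a real weight per frequency. The array `M` stands in for the FFT module output and `weights` for the TETRAHEDRON module output; both are filled with random numbers, and the names `accumulate_polarization` and `random_inputs` are hypothetical.

```python
import random


def accumulate_polarization(M, weights):
    """Accumulate chi[w][G][Gp] for one q-point.

    M[pair][k][G]       : complex matrix element (assumed FFT-module output)
    weights[pair][k][w] : real weight (assumed TETRAHEDRON-module output)
    """
    n_pair, n_k, n_pw = len(M), len(M[0]), len(M[0][0])
    n_w = len(weights[0][0])
    chi = [[[0j for _ in range(n_pw)] for _ in range(n_pw)] for _ in range(n_w)]
    for pair in range(n_pair):              # loop over band pairs
        for G in range(n_pw):               # do G = 1, NPW
            for Gp in range(n_pw):          # do G' = 1, NPW
                for w in range(n_w):        # loop over frequencies
                    for k in range(n_k):    # do k = 1, Nk
                        chi[w][G][Gp] += (
                            M[pair][k][G]
                            * M[pair][k][Gp].conjugate()
                            * weights[pair][k][w]
                        )
    return chi


def random_inputs(n_pair, n_k, n_pw, n_w, seed=0):
    """Random stand-ins for the matrix elements and the weights."""
    rng = random.Random(seed)
    M = [[[complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
           for _ in range(n_pw)] for _ in range(n_k)] for _ in range(n_pair)]
    weights = [[[rng.uniform(0, 1) for _ in range(n_w)]
                for _ in range(n_k)] for _ in range(n_pair)]
    return M, weights
```

Under these assumptions the inner work scales as Npair x Nk x NPW^2 per frequency and per q-point. With real weights the result is Hermitian in (G, G'), which gives a cheap sanity check on any implementation of the accumulation.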