MOLCAS manual:


8.6 chcc

CHCC is a closed-shell coupled-cluster singles and doubles (CCSD) program based exclusively on Cholesky (or RI) decomposed two-electron integrals. It is aimed at calculations on large systems on highly parallel architectures. Use of point-group symmetry is not implemented. Its main advantages over the CCSDT module in MOLCAS are more efficient parallelization and dramatically lower memory (and possibly disk) requirements.


8.6.1 Dependencies

CHCC requires a previous run of the RHF SCF program to produce molecular orbitals and orbital energies stored in RUNFILE. The SCF program (as well as SEWARD) must be run in Cholesky/RI mode.

The algorithm that almost completely removes the memory bottleneck limiting the size of systems CHCC can treat relies on blocking of the virtual orbitals. The number of blocks, $\rm N'$ (hereafter also called the ``large'' segmentation, LARGe), should be as small as possible, because increasing the segmentation adds CPU and I/O overhead. The blocking can be fine-tuned by the so-called ``small'' segmentation (SMALl), $\rm N''$, which affects only the (typically) most demanding terms, those scaling as $\rm O^2V^4$. The ``large'' segmentation can range from 1 to 32 and the ``small'' segmentation from 1 to 8, but their product, i.e. ``large x small'', must be no more than 64.
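The limits on the two segmentations can be checked before a job is submitted. The following is a minimal sketch, not part of CHCC itself; the function name `check_segmentation` is hypothetical, and the product limit is treated as inclusive, following the wording "no more than 64" above.

```python
def check_segmentation(large, small, max_product=64):
    """Return a list of problems with a LARGe/SMALl choice (empty if none)."""
    problems = []
    if not 1 <= large <= 32:
        problems.append("LARGe must be between 1 and 32")
    if not 1 <= small <= 8:
        problems.append("SMALl must be between 1 and 8")
    # Assumption: the product limit is inclusive ("no more than 64").
    if large * small > max_product:
        problems.append("LARGe x SMALl must not exceed %d" % max_product)
    return problems


print(check_segmentation(4, 2))   # prints []
print(check_segmentation(32, 8))  # one problem: the product is 256
```

An empty list means the combination lies inside the documented ranges; it says nothing about whether the choice is efficient for a given machine.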

The selected blocking also determines the number of ``independent'' parallel tasks that must be executed in each iteration of the CCSD equations. In other words, a particular segmentation predetermines the optimal number of computational nodes (if the best possible parallelization is desired). If the requested ``large'' segmentation is $\rm N'$, then $\rm N'^2$ terms scaling as $\rm O^3V^3$ and $\rm N'^2/2$ terms scaling as $\rm O^2V^4$ result. Which of these terms dominates depends on the calculation: $\rm O^3V^3$ is more demanding for systems with a large number of occupied orbitals and a rather small basis set, while $\rm O^2V^4$ dominates for relatively large basis sets, i.e. a large number of virtual orbitals. For optimal performance, the number of tasks of the dominant type should be divisible by the number of computational nodes. As a simple rule of thumb, $\rm N'^2/2$ should be divisible by the number of nodes, since an $\rm O^3V^3$ step is typically about half as expensive as an $\rm O^2V^4$ step. Otherwise, any reasonable combination (i.e. one in which the number of tasks is larger than the number of computational nodes) is allowed.
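The task counts and the rule of thumb can be tabulated for a given number of nodes. This is a rough sketch that takes the formulas in the text literally; the function names are hypothetical, and "divisible" is read as "$\rm N'^2/2$ is a whole multiple of the node count", which is an assumption.

```python
def task_counts(large):
    """Tasks per CCSD iteration for LARGe = large, as stated in the text."""
    n_o3v3 = large ** 2       # N'^2 terms scaling as O^3 V^3
    n_o2v4 = large ** 2 / 2   # N'^2 / 2 terms scaling as O^2 V^4
    return n_o3v3, n_o2v4


def satisfies_rule_of_thumb(large, nodes):
    """True if N'^2 / 2 is a whole multiple of the number of nodes."""
    return (large ** 2) % (2 * nodes) == 0


def candidate_large_values(nodes):
    """All LARGe values in the allowed range 1..32 that satisfy the rule."""
    return [n for n in range(1, 33) if satisfies_rule_of_thumb(n, nodes)]


print(task_counts(4))             # prints (16, 8.0)
print(candidate_large_values(8))  # prints [4, 8, 12, 16, 20, 24, 28, 32]
```

Since the segmentation should be as small as possible, the first candidate that still fits into memory would normally be the one to try.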


8.6.2 Files

8.6.2.1 Input files

CHCC will use the following input files: CHVEC, CHRED, CHORST, RUNFILE, and CHOR2F (for more information see [*]).

8.6.2.2 Output files

File                      Contents
L0xxxx, L1xxxx, L2xxxx    MO-transformed Cholesky vectors
T2xxxx                    T2 $\rm (ij,a'b')$ excitation amplitudes
RstFil                    Communication file containing T1 amplitudes, restart information, etc.


8.6.3 Input

The input for each module is preceded by its name like:

  &CHCC
Optional keywords
Keyword         Meaning
TITLe           This keyword is followed by one title line.
FROZen          The integer on the following line specifies the number of inactive occupied orbitals in the CCSD calculation. (Default=0)
DELEted         The integer on the following line specifies the number of inactive virtual orbitals in the CCSD calculation. (Default=0)
LARGe           The integer on the following line specifies the main segmentation of the virtual orbitals. The value must be between 1 (no segmentation) and 32. The product of the Large and Small segmentations must be lower than 64. (Default=1)
SMALl           The integer on the following line specifies the auxiliary segmentation of the virtual orbitals. The value must be between 1 (no segmentation) and 8. The product of the Large and Small segmentations must be lower than 64. The Small segmentation does not generate extra parallel tasks. (Default=1)
CHSEgmentation  The integer on the following line specifies the block size of the auxiliary (Cholesky/RI) index. The value must be lower than the minimal dimension of the auxiliary index on each computational node. (Default=100)
MHKEy           The integer on the following line specifies whether library BLAS (MHKEy=1) or hard-coded Fortran vector-vector, matrix-vector and matrix-matrix routines are used. (Default=1)
NOGEnerate      This keyword specifies that the pre-CCSD steps (regeneration of integrals from the Cholesky/RI vectors, etc.) are skipped. (Default=OFF)
ONTHefly        This keyword specifies that all integral types scaling steeper than O2V2 are generated "on-the-fly" from the Cholesky/RI vectors. Use of this keyword saves disk space dramatically, but adds significant arithmetic overhead. The keywords "ONTHefly" and "PRECalculate" are mutually exclusive. (Default=OFF)
PRECalculate    This keyword specifies that all integrals are precalculated before the CCSD iterative procedure starts. Use of this keyword leads to significant consumption of disk space, especially in single-processor runs. (Default=ON)
NODIstribute    This keyword (in combination with the "PRECalculate" keyword) specifies that all integrals are stored on each computational node. When all integrals are stored on each node, extra permutational symmetry can be applied, leading to significant savings of disk space. However, in massively parallel runs (i.e. more than $\approx$8 nodes), the savings from keeping only the subset of integrals required on a particular node are more significant than the savings due to permutational symmetry. (Default=OFF)
JOINlkey        The parameter on the following line specifies which algorithm is used for precalculation of the integrals in parallel runs. In parallel runs, SEWARD produces AO Cholesky/RI vectors segmented in the auxiliary index over the parallel nodes. Depending on the network bandwidth and the computational power of each node, different algorithms can lead to optimal performance. The following options are available:
                0 - None: no cumulation of Cholesky/RI vectors is needed (debug only).
                1 - Minimal: Cholesky/RI vectors are cumulated prior to integral precalculation. Low network bandwidth is required.
                2 - Medium: O2V2 integrals are generated from local Cholesky/RI vectors and cumulated along with the Cholesky/RI vectors afterwards. Other integrals are calculated from cumulated intermediates.
                3 - Full: All integrals are generated from local Cholesky/RI vectors and cumulated afterwards. High network bandwidth is required.
                (Default=2)
MAXIterations   The integer on the following line specifies the maximum number of CCSD iterations. (Default=40)
RESTart         This keyword specifies that the CCSD calculation is restarted from a previous run. This keyword is currently under development and is therefore disabled. (Default=OFF)
THREshold       The double precision floating point number on the following line specifies the convergence threshold for the CCSD correlation energy. (Default=1.0d-6)
PRINtkey        The integer on the following line specifies the print level of the output:
                1 - Minimal
                2 - Minimal + timings of each step of the CCSD iterations
                10 - Debug
                (Default=1)
END of input    This keyword indicates that there is no more input to be read.
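Several of the constraints stated for these keywords can be checked mechanically. The sketch below is only an illustration under stated assumptions: options are held in a plain dictionary keyed by the first four letters of each keyword (a convention of this sketch), the function name `check_options` is hypothetical, only the documented JOINlkey and PRINtkey values are treated as valid, and the per-node auxiliary dimension has to be supplied by the user.

```python
def check_options(opts, min_aux_dim_per_node=None):
    """Return a list of violated keyword constraints (empty if none).

    opts maps four-letter keyword stems to values; flag keywords map to True.
    """
    problems = []
    if opts.get("ONTH") and opts.get("PREC"):
        problems.append("ONTHefly and PRECalculate are mutually exclusive")
    # Assumption: only the values listed in the table are accepted.
    if opts.get("JOIN", 2) not in (0, 1, 2, 3):
        problems.append("JOINlkey must be 0, 1, 2 or 3")
    if opts.get("PRIN", 1) not in (1, 2, 10):
        problems.append("PRINtkey must be 1, 2 or 10")
    chse = opts.get("CHSE", 100)
    if min_aux_dim_per_node is not None and chse >= min_aux_dim_per_node:
        problems.append("CHSEgmentation must be lower than the smallest "
                        "per-node dimension of the auxiliary index")
    return problems


print(check_options({"ONTH": True, "PREC": True}))
print(check_options({"PREC": True, "JOIN": 2, "CHSE": 100},
                    min_aux_dim_per_node=250))  # prints []
```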



  &CHCC  &END
Title
Benzene  dimer
Frozen
12
Deleted
0
Large
4
Small
2
CHSEgment
100
Precalculate
Join
2
Maxiter
50
Threshold
1.0d-6
Print
2
End  of  Input
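The sample input above can be read back into a dictionary, for instance to check the segmentation before a run. This is a minimal sketch and not the parser used by MOLCAS: it assumes that keywords are recognised by their first four characters regardless of case, and the names `parse_chcc`, `VALUE_KEYS` and `FLAG_KEYS` are hypothetical.

```python
SAMPLE = """&CHCC  &END
Title
Benzene  dimer
Frozen
12
Deleted
0
Large
4
Small
2
CHSEgment
100
Precalculate
Join
2
Maxiter
50
Threshold
1.0d-6
Print
2
End  of  Input
"""

# Keywords followed by a value line, and keywords that act as flags.
VALUE_KEYS = {"TITL", "FROZ", "DELE", "LARG", "SMAL", "CHSE",
              "MHKE", "JOIN", "MAXI", "THRE", "PRIN"}
FLAG_KEYS = {"NOGE", "ONTH", "PREC", "NODI", "REST"}


def parse_chcc(text):
    """Map four-letter keyword stems to their value line (or True for flags)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    opts = {}
    i = 0
    while i < len(lines):
        stem = lines[i][:4].upper()
        if stem.startswith("END"):
            break                      # "End of Input" closes the section
        if stem in VALUE_KEYS:
            i += 1
            opts[stem] = lines[i]      # the value is on the next line
        elif stem in FLAG_KEYS:
            opts[stem] = True
        elif stem != "&CHC":
            raise ValueError("unknown keyword: " + lines[i])
        i += 1
    return opts


opts = parse_chcc(SAMPLE)
# Fortran double precision notation (1.0d-6) uses "d" for the exponent.
threshold = float(opts["THRE"].lower().replace("d", "e"))
print(int(opts["LARG"]) * int(opts["SMAL"]))  # prints 8
print(threshold)                              # prints 1e-06
```

For this input the product of the two segmentations is 8, well inside the documented limit.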

