When comparing two different kinds of group therapy, or two individual treatments where patients within each arm are nested within care providers, clustering of observations may occur in both arms. The arms may differ in terms of (a) the intraclass correlation, (b) the outcome variance, (c) the cluster size, and (d) the number of clusters. In addition, there may be some ideal group size or, in the case of care providers, some ideal caseload, which fixes the cluster size. For this case, optimal cluster numbers are derived for a linear mixed model analysis of the treatment effect, under cost constraints as well as under power constraints. To account for uncertain prior knowledge of the relevant model parameters, maximin sample sizes are also given. Formulas for sample size calculation are derived, based on the standard normal as the asymptotic distribution of the test statistic. For small sample sizes, an extensive numerical evaluation shows that, in a two-tailed test employing restricted maximum likelihood estimation, a safe correction for both 80% and 90% power is to add three clusters to each arm for a 5% type I error rate and four clusters to each arm for a 1% type I error rate.
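As a rough illustration of the power-constrained calculation, the sketch below computes a number of clusters per arm from the normal approximation and then applies the small-sample correction stated above. It is not the derivation from this paper: it assumes the same number of clusters in both arms (rather than an optimal, possibly unequal allocation), and it assumes the familiar design-effect form for the variance of the treatment effect estimator, with arm j contributing sigma_j^2 * (1 + (n_j - 1) * rho_j) / (k * n_j). The function name and all parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def clusters_per_arm(delta, var1, var2, icc1, icc2, n1, n2,
                     alpha=0.05, power=0.80, correct=True):
    """Illustrative clusters per arm for a two-tailed test (equal k assumed).

    delta      : treatment effect to detect (difference in means)
    var1, var2 : total outcome variance in each arm
    icc1, icc2 : intraclass correlation in each arm
    n1, n2     : fixed cluster size in each arm
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = z.inv_cdf(power)

    # Variance of one cluster mean per arm (assumed design-effect form).
    v1 = var1 * (1 + (n1 - 1) * icc1) / n1
    v2 = var2 * (1 + (n2 - 1) * icc2) / n2

    # Normal approximation: require (v1 + v2) / k <= (delta / (z_alpha + z_power))^2.
    k = math.ceil((z_alpha + z_power) ** 2 * (v1 + v2) / delta ** 2)

    if correct:
        # Correction from the text: +3 clusters per arm at a 5% type I error
        # rate, +4 at 1%. The text reports it for 80% and 90% power only;
        # other alpha levels are not covered, so they are rejected here.
        if math.isclose(alpha, 0.05):
            k += 3
        elif math.isclose(alpha, 0.01):
            k += 4
        else:
            raise ValueError("correction is only stated for alpha = 0.05 or 0.01")
    return k


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(clusters_per_arm(0.5, 1.0, 1.0, 0.10, 0.05, 8, 6))
```

With unequal costs or variances across arms, an unequal allocation of clusters would generally be more efficient than the equal split assumed here.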