Richard Gonzalez
Analysis of Dyad and Group Data

1  LECTURE NOTES #2

1.1  Suggested homework second night

  1. What is an intraclass correlation?

    The intraclass correlation is an index comparing the variability within a group to the variability across groups. The intuition is that if there is an effect due to social interaction, the variability among subjects who interacted (i.e., within a social group) should differ from the variability among subjects who did not interact (i.e., across social groups).

    There are several ways of measuring variability, and these lead to different measures of the intraclass correlation (for a general review of many measures of the intraclass correlation see Shrout & Fleiss, [1979]). One of the common ways to measure variability uses an ANOVA framework. Within this framework one compares the mean square between groups (variability of the group means around the grand mean; denoted MSB) to the mean square within groups (variability of the individual scores around their respective group means; denoted MSW). As stated above, the underlying intuition is that if there is a group effect, then the variability within groups should differ from the variability between groups. This is the same intuition that leads to the omnibus F test in an analysis of variance. The difference in our application is that we are interested in the intraclass correlation as a measure of nonindependence rather than the usual concern of the ANOVA, which is a statistical test of the equality of group means. Below I will present a different, more intuitive way to think about the intraclass correlation.

    The term ``group'' above can refer to any relevant collection. For example, a researcher might have husbands and wives rate their satisfaction in the marriage. In this context the intraclass correlation compares the variability within a dyad to the variability across dyads. Another example occurs in the context of assessing reliability. Consider four judges who rate several videotapes. Note how this design differs from the previous example, which had different husband and wife pairs. In the present design the same judges rate a standard set of videotapes. Regardless of these design differences, the intraclass correlation is still relevant because it compares the variability of ratings within a videotape (across the judges) to the variability across videotapes. Thus, the intraclass correlation has many uses, such as in reliability theory.

  2. Does the intraclass correlation measure independence?

    The intraclass correlation measures a special meaning of the word independence. The idea is that if there is no group influence, then the variability within groups should be the same as the variability between groups. That is, when $\sigma^2_g = 0$, reflecting no variability across groups, then MSB = MSW (in expectation) and the intraclass correlation equals 0. If the variability within groups is less than the variability across groups, then this is evidence of some kind of ``convergence'' within the group, and the intraclass correlation will be greater than 0. However, if the variability within the group is greater than across groups, then this is evidence of ``divergence'' within a group, and the intraclass correlation will be negative. Negative intraclass correlations need to be interpreted with caution.

  3. When should I use an intraclass correlation in group research?

    The intraclass correlation is used to test the convergence or homogeneity of responses within groups. It applies only to the case of one dependent variable. When you want to test the relationship between two or more dependent variables, you must turn to more complicated techniques (which we will cover during Days 3 and 4 of this workshop).

    A natural situation to use an intraclass correlation is when group members are indistinguishable from one another. However, it is also possible to do an intraclass correlation when the partners are distinguishable, but it requires a small twist in the computation as shown below.

  4. A general statistical framework for the intraclass correlation

    Assume that each observation is an additive function of three components: the grand mean, an effect due to group, and random noise. These three terms are denoted below as $\mu$, $\alpha$, and $\epsilon$, respectively.

    $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$    (1)

    where i refers to the group (such as the couple number) and j refers to the subject within the group (such as roommate 1 and roommate 2). So far we aren't going to distinguish between individuals in a group (we'll deal with the distinguishable case later). We assume that the $\alpha_i$ are normally distributed with mean 0 and variance $\tau^2$, and that the $\epsilon_{ij}$ are normally distributed with mean 0 and variance $\sigma^2_e$. It makes sense that the $\alpha_i$ have a distribution: roommate pairs were randomly selected to be in the study, so the group-level effect has a distribution. Here is an outline of a design matrix with 5 roommate pairs:

    Roommate Pair   Roommate 1   Roommate 2
          1
          2
          3
          4
          5

    This design yields an ANOVA source table conceptually organized as follows:

    Source           Sum of Squares   Degrees of Freedom   Mean Square   F   p-value
    Between Groups   SSB              dfb                  MSB           F   p
    Within Groups    SSW              dfw                  MSW
    Total            SST              dft

    As I will show below, this source table is very important for computing the intraclass correlation. Note that everything I did in this section is identical to a one-way ANOVA where roommate pair is the grouping code (in this example there was one factor with 5 levels) and the grouping code is treated as a random effect. This makes sense because presumably roommate pairs were randomly sampled by the investigator to participate in the design. Once a roommate pair was randomly selected, then the two roommates are automatically in the study.
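
    To make Equation 1 concrete, here is a small R simulation sketch (the parameter values, seed, and variable names are mine, chosen only for illustration); the one-way ANOVA at the end produces the MSB and MSW discussed below:

    # Simulate Y_ij = mu + a_i + e_ij for 5 roommate pairs (illustrative values)
    set.seed(1)
    g  <- 5                                   # number of roommate pairs
    k  <- 2                                   # people per pair
    mu <- 100                                 # grand mean
    a  <- rnorm(g, mean = 0, sd = sqrt(2))    # group effects, tau^2 = 2
    e  <- rnorm(g * k, mean = 0, sd = 1)      # noise, sigma^2_e = 1
    Group <- rep(1:g, each = k)
    Score <- mu + a[Group] + e
    summary(aov(Score ~ factor(Group)))       # source table with MSB and MSW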

  5. What is the formula for an intraclass correlation?

    This question is tricky because there are several different ways to calculate the intraclass correlation depending on which theory one uses (method of moments or maximum likelihood) and how one treats the relevant factors (e.g., random v. fixed).

    The most common formula for the intraclass correlation in this context is

    $r = \frac{MSB - MSW}{MSB + (k-1)MSW}$    (2)

    where k is the number of individuals in each group.

    The intraclass correlation examines the difference between 1) the deviations of the group means from the grand mean (i.e., MSB) and 2) the deviations of the individual scores from the group means (i.e., MSW). This difference is then normalized by a weighted combination of the two components. The numerator is the important piece of this equation: the numerator determines the sign of the intraclass correlation.
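
    As a quick sketch, Equation 2 can be wrapped in a small R function (the function name is mine); plugging in the mean squares from the Snedecor and Cochran example below reproduces the .966 reported there:

    # Equation 2: ANOVA-based intraclass from MSB, MSW, and group size k
    icc_anova <- function(MSB, MSW, k) {
      (MSB - MSW) / (MSB + (k - 1) * MSW)
    }
    icc_anova(817.3, 14.29, 2)   # 0.966 for the dyad example below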

  6. How do I compute MSB and MSW?

    It depends on the design. If the group members are indistinguishable, then the design is identical to a one-way ANOVA with the groups as the factor (as I showed above). The MSB and MSW are taken straight from the ANOVA source table.

    For distinguishable group members (e.g., male/female, president/vice president) a two-factor ANOVA is computed with groups and people as factors. The variable that is used to distinguish the people in the group should be theoretically meaningful. The two-factor ANOVA gives a MS for groups, a MS for people, and an MSW. The MSW from this two-factor ANOVA will usually be less than the MSW from the one-way ANOVA treating individuals as indistinguishable. The model for this two-factor case is

    $Y_{ij} = \mu + \alpha_i + \pi_j + \epsilon_{ij}$

    Note that there is no interaction term (the reason is that in this formulation there is one observation per cell and it is impossible to extract an independent interaction effect).

  7. Here is a numerical example using data from Snedecor and Cochran. There are 12 dyads. The complete data matrix appears below.

    
    ID Group Person Score 
     1     1      1    71
     2     2      1    79
     3     3      1   105
     4     4      1   115
     5     5      1    76
     6     6      1    83
     7     7      1   114
     8     8      1    57
     9     9      1   114
    10    10      1    94
    11    11      1    75
    12    12      1    76
    13     1      2    71
    14     2      2    82
    15     3      2    99
    16     4      2   114
    17     5      2    70
    18     6      2    82
    19     7      2   113
    20     8      2    44
    21     9      2   113
    22    10      2    91
    23    11      2    83
    24    12      2    72
    

    First, we consider the case when dyad members are indistinguishable (such as in homosexual couples or same-sex roommates). The ANOVA for treating the cases as indistinguishable has one grouping code as a factor.

    
    summary(aov(Score ~ factor(Group)))
                  Df Sum of Sq Mean Sq F Value       Pr(F) 
    Between       11   8990.46 817.314 57.1882 1.26299e-08
    Within        12    171.50  14.292                    
    

    The intraclass correlation for the indistinguishable case is

    $0.966 = \frac{817.3 - 14.29}{817.3 + 14.29}$

    (with k = 2 the denominator in Equation 2 reduces to MSB + MSW).

    The general ANOVA method, of course, can be used for any size group, but a major limitation of this approach is that all groups must be of equal size for the intraclass formula to make sense (usually not a problem in marriage research or in jury research). The unequal group size problem can be handled in this framework, but it requires a different twist on the usual model.

    Now we turn to the case when the group members are distinguishable (such as husbands and wives who can be distinguished on gender). Note this ANOVA has two factors and no interaction term. The design is equivalent to a randomized block design (one observation per cell).

    
    summary(aov(Score ~ factor(Group) + factor(Person)))
                   Df Sum of Sq Mean Sq F Value   Pr(F) 
    Between Group  11   8990.46 817.314 61.8078 0.00000
    Between Person  1     26.04  26.042  1.9693 0.18811
    Within         11    145.46  13.223                
    

    This yields an intraclass correlation of

    $0.968 = \frac{817.31 - 13.22}{817.31 + 13.22}$

    Note that the degrees of freedom for Within are fewer in the distinguishable case. Also, we completely ignored the ``between person'' variability in the computation of the intraclass correlation. The difference between the exchangeable and the distinguishable cases rests entirely on what makes up the correct error term.

    In this example the difference between the two versions is trivial but it won't always be. Below I show an example where the intraclass goes from 1 to .78 as one switches analyses from the distinguishable case to the indistinguishable case for the same data.

  8. What do the terms MSB and MSW mean?

    We must first understand the ANOVA model that is being used. The assumption in the exchangeable case is that a particular score $Y_{ij}$ for the jth person in the ith group is composed of three parts: a grand mean $\mu$, a group effect $\alpha_i$, and noise $\epsilon_{ij}$. People from the same group have the same $\alpha_i$ but have different $\epsilon_{ij}$.

    The $\alpha_i$ are assumed to be normally distributed with mean 0 and variance $\tau^2$, and the $\epsilon_{ij}$ are assumed to be normally distributed with mean 0 and variance $\sigma^2_e$.

    The original question can now be answered: What do the terms MSB and MSW mean? The term MSW is an unbiased sample estimate of $\sigma^2_e$. This is the noise attributable to individuals regardless of group membership. The term MSB is the unbiased sample estimate of $k\tau^2 + \sigma^2_e$ (where k is the size of the group). This is the variability of the group means and consists of two parts: noise due to people and the true variability of the groups.

  9. When group members are distinguishable do I need to bother with the intraclass correlation?

    If you have distinguishable dyads, then it may occur to you that a Pearson correlation on the raw data (i.e., husband scores correlated with wife scores) may be possible. There are several problems with this. One problem is that the Pearson correlation on the raw data doesn't measure agreement in a strict sense. That is, if you add 1 to all the husbands' scores, the Pearson correlation won't change, but there is a sense in which the agreement between husband and wife has changed. Unlike the Pearson correlation, the intraclass correlation is sensitive to additive constants.
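
    A two-line check of this point in R (the scores are toy values invented for illustration): adding a constant to one partner's scores leaves the Pearson correlation untouched, even though the partners now agree less in absolute terms.

    husband <- c(3, 4, 5, 6, 7)    # toy data
    wife    <- c(3, 5, 4, 7, 6)
    cor(husband, wife)             # some value r
    cor(husband + 1, wife)         # exactly the same r, yet agreement has changed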

    Another argument for the intraclass correlation on distinguishable dyads is that the intraclass correlation provides an adjusted $R^2$ (i.e., a better estimate of the population $R^2$). Also, if the group size is greater than two, then the intraclass makes more sense than a Pearson correlation because the latter can only be computed over two people.

  10. Is the intraclass correlation a correlation?

    The intraclass correlation is not what nonstatisticians usually mean by a correlation because an intraclass correlation can be used when the members are indistinguishable whereas the Pearson correlation cannot be computed for indistinguishable cases.

    Further, a Pearson correlation assesses association whereas the intraclass correlation assesses agreement. Association is a different concept from agreement. Association is about how two scores ``go together'' in the sense that a change in one score (or for one group member) produces, or is associated with, a linear change in the other score. Agreement, however, means that a score on one variable produces, or is associated with, an exact score on the other variable. One can also make use of the underlying geometry and show that association is about the angle between two vectors whereas agreement is about the distance between two vectors. I'll explain during the workshop the analogy of angles and association on one hand with distance and agreement on the other.

    Given all those differences, it is still possible to put the intraclass correlation into the mathematical form of a correlation. Recall that the correlation is defined as

    $r = \frac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)}\sqrt{\mathrm{var}(Y)}}$

    All we need to show is that the numerator of the intraclass is a covariance and the denominator of the intraclass is the product of two standard deviations to prove that it is a correlation.

    Recall that the expected values of MSB and MSW are $k\tau^2 + \sigma^2_e$ and $\sigma^2_e$, respectively. With a little high school algebra (substitute these expected values into Equation 2 and simplify), one can re-express the intraclass as

    $\frac{MSB - MSW}{MSB + (k-1)MSW} = \frac{(k\tau^2 + \sigma^2_e) - \sigma^2_e}{(k\tau^2 + \sigma^2_e) + (k-1)\sigma^2_e} = \frac{k\tau^2}{k\tau^2 + k\sigma^2_e} = \frac{\tau^2}{\tau^2 + \sigma^2_e}$

    The term $\tau^2$ is a covariance because two people from the same group share the same $\alpha_i$. The variance of each person's score is $\tau^2 + \sigma^2_e$. Thus, the correlation between two people from the same group is

    $\frac{\tau^2}{\sqrt{\tau^2 + \sigma^2_e}\sqrt{\tau^2 + \sigma^2_e}}$

    This term is equivalent to the intraclass correlation and has the form of a covariance divided by the product of two standard deviations. Thus it is by mathematical definition a correlation.

    The intraclass correlation is a ``mathematical correlation,'' but it should not be confused with the standard treatment of a Pearson correlation involving the usual scatterplots because, unlike the Pearson correlation, the intraclass correlation is sensitive to linear transformations of the data.

  11. Can the Pearson correlation equal 1 but the intraclass be less than one?

    If you can compute a meaningful Pearson correlation, then that means you have distinguishable cases (otherwise, the Pearson correlation would not make any sense because you wouldn't know which individual to put in ``column X'' and which individual to put in ``column Y'').

    Consider this example:

    
    Example 1
    ID   Group Person Score 
     1     1      1     3
     2     1      2     3
     3     2      1     4
     4     2      2     4
     5     3      1     5
     6     3      2     5
     7     4      1     6
     8     4      2     6
     9     5      1     7
    10     5      2     7
    

    In the example the scores for each dyad member are identical. The Pearson correlation equals one as does the intraclass correlation for distinguishable cases.

    
    summary(aov(Score ~ factor(Group) + factor(Person)))
                   Df Sum of Sq Mean Sq     F Value    Pr(F) 
     factor(Group)  4        20       5 2.62827e+31 0.000000
    factor(Person)  1         0       0 4.00000e+00 0.105113
         Residuals  4         0       0                     
    

    But now consider the case where one partner always scores one point more. The correlation will still be one. But the only reason we can do a correlation is because the partners within a dyad are distinguishable. Therefore, we must also do a distinguishable intraclass correlation.

    
    Example 2 (add one to one partner's score)
       Group Person Score 
     1     1      1     3
     2     1      2     4
     3     2      1     4
     4     2      2     5
     5     3      1     5
     6     3      2     6
     7     4      1     6
     8     4      2     7
     9     5      1     7
    10     5      2     8
    
    
    summary(aov(Score ~ factor(Group) + factor(Person)))
                   Df Sum of Sq Mean Sq      F Value Pr(F) 
     factor(Group)  4      20.0     5.0 1.165237e+31     0
    factor(Person)  1       2.5     2.5 5.826186e+30     0
         Residuals  4       0.0     0.0                   
    
    The one-point difference between the two partners is reflected in the MS for person. The MSW (or mean square residual) equals 0 because, within each dyad, the partners differ by the same constant, and that constant difference is absorbed by the person factor. So, the Pearson correlation equals one, as does the intraclass correlation.

    Using the same numerical example, if the partners are treated as indistinguishable, then the intraclass will not be 1, even though the only difference between the partners is that one partner scores one point higher than the other (which, in the distinguishable case, leads to a perfect Pearson correlation).

    
    summary(aov(Score ~ factor(Group)))
                  Df Sum of Sq Mean Sq F Value     Pr(F) 
    factor(Group)  4      20.0     5.0      10 0.0132602
        Residuals  5       2.5     0.5                  
    
    Using the same data for the indistinguishable case, the intraclass correlation becomes

    $0.78 = \frac{20 - 2.5}{20 + 2.5}$

    (note that this computation uses the sums of squares; it matches the pairwise formula given later in Equation 4, which for dyads reduces to (SSB - SSW)/(SSB + SSW)).

  12. How do I interpret a negative intraclass correlation?

    The intraclass correlation will be negative whenever MSB  <  MSW. In other words, the intraclass correlation will be negative whenever the variability within groups exceeds the variability across groups. This means that scores in a group ``diverge'' relative to the noise present in the individuals. A negative intraclass correlation is tricky to interpret because the intraclass is bounded on the negative side.

    Unfortunately, the intraclass correlation has a lower bound of

    $\frac{-1}{k-1}$
    where k is the group size; so negative correlations are difficult to interpret. Note that the intraclass correlation does go to +1 in the case of perfect convergence. Is an intraclass correlation of -.20 something to be excited about if your theory predicts divergence? If you ran groups of six, then a -.20 is the smallest intraclass correlation you can observe. Note that for dyads the intraclass correlation can go to -1. Perhaps this asymmetric negative bound issue is what Kenny was referring to when he said that negative intraclass correlations can only be interpreted in the context of dyads?

  13. Does the standard formula for the intraclass correlation lead to an unbiased estimate?

    NO! The formula everyone uses is biased.

    $r = \frac{MSB - MSW}{MSB + (k-1)MSW}$

    An unbiased estimate of the intraclass correlation is given by this formula

    $r = \frac{MSB - tMSW}{MSB + (k-1)tMSW}$

    where $t = \frac{g(k-1)}{g(k-1)-2}$, g is the number of groups, and k is the number of people in a group. Why everyone persists in using a biased formula is a mystery to me. But people don't seem to care about using biased estimators (e.g., the sample standard deviation we use in psychology, with n - 1 in the denominator, is a biased estimator of the population standard deviation).
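
    Here is a sketch of both versions in R (the function names are mine). With the Snedecor and Cochran dyad example, g = 12 and k = 2, so t = 12/10 and the correction is noticeable but small:

    # Biased (usual) and unbiased intraclass estimates
    icc_biased <- function(MSB, MSW, k) {
      (MSB - MSW) / (MSB + (k - 1) * MSW)
    }
    icc_unbiased <- function(MSB, MSW, k, g) {
      t <- (g * (k - 1)) / (g * (k - 1) - 2)
      (MSB - t * MSW) / (MSB + (k - 1) * t * MSW)
    }
    icc_biased(817.3, 14.29, 2)         # 0.966
    icc_unbiased(817.3, 14.29, 2, 12)   # about 0.959, slightly smaller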

    The unbiased form shares the same lower bound as the biased form.

    Because most people don't care about the issue of bias, I won't discuss it again and will continue using biased measures in this workshop. Note that the bias gets very, very small as the number of groups gets large. Usually when you have 20 or 30 groups the bias is so small it won't make any difference in your data analysis.

  14. Are there uses for the intraclass correlation other than group research?

    The intraclass correlation is used in reliability theory. Recall that in classic test theory reliability is defined as

    $r_{11} = \frac{\sigma^2_t}{\sigma^2_t + \sigma^2_e}$

    where t denotes the ``true score.'' A reliability is a proportion and is commonly estimated by a Pearson correlation. When computing, say, a split-half correlation, the reliability is given a proportion-of-variance interpretation even though the calculation is identical to that of a Pearson correlation. Note that the definition of reliability has the same form as the intraclass correlation (see the re-expression $\tau^2/(\tau^2 + \sigma^2_e)$ above). During Day 3 of the workshop I'll show how our multi-level group model can be used to test standard models in test theory (such as the multitrait-multimethod approach).

    The intraclass correlation also appears in the repeated-measures ANOVA where the assumption is that the intraclass correlation across time periods (as well as time periods × the between-subjects variables) is a constant. This is known as the compound symmetry assumption.

    A third place where the intraclass correlation appears is in the adjusted $R^2$ in the context of a random-effects ANOVA. This is analogous to $\omega^2$ (for a fixed-effects model) and closely related to $\eta^2$ and $R^2$.

  15. After having spent all that time developing the intraclass as others use it, let me do something wild, something crazy, something completely off-the-wall: let me give you a completely different way to think about and compute the intraclass correlation.

    The framework we use is the pairwise correlation, which dates back to Pearson ([1901]; see also Fisher, [1925]). The idea is to build the nonindependence directly into the organization of the data matrix, and then compute standard estimators on the reorganized data matrix with appropriate corrections to the standard error. The pairwise correlation is so named because each possible within-group pair of scores is used to compute the correlation. For example, on dyads with individuals Adam and Amos in the first dyad, there are two possible pairings: Adam in column one and Amos in column two; or Amos in column one and Adam in column two. This coding is represented symbolically in Table 1. Thus with N dyads, each column contains 2N scores because each individual is represented in both columns. The two columns (i.e., variables X and X′) are then correlated using the usual product-moment correlation. This correlation is denoted $r_{xx'}$ and is called the pairwise intraclass correlation. It is an estimate of the intraclass correlation of one person's score with his or her partner's score. The pairwise intraclass correlation is the maximum likelihood estimate of the intraclass correlation and therefore is endowed with the usual properties of maximum likelihood estimators (such as consistency). For theoretical development of the exchangeable case, formulas, and examples see Griffin and Gonzalez ([1995]).

    Table 1: Symbolic representation for the pairwise data setup in the exchangeable case. The first subscript represents the dyad and the second subscript represents the individual. Categorization of individuals as 1 or 2 is arbitrary.

                   Variable
    Dyad #      X        X′
      1        X11      X12
               X12      X11
      2        X21      X22
               X22      X21
      3        X31      X32
               X32      X31
      4        X41      X42
               X42      X41
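
    Here is a minimal R sketch of the double-entry coding in Table 1 (the scores are toy values invented for illustration); the significance test at the end anticipates Equation 3 below:

    # Pairwise (double-entry) coding for exchangeable dyads
    member1 <- c(3, 4, 5, 6, 7)      # one member of each dyad (toy data)
    member2 <- c(3, 5, 4, 7, 6)      # the other member
    X      <- c(member1, member2)    # each person appears in both columns
    Xprime <- c(member2, member1)
    r_xx <- cor(X, Xprime)           # pairwise intraclass correlation
    N <- length(member1)             # N dyads, even though there are 2N rows
    Z <- r_xx * sqrt(N)              # asymptotic test (Equation 3 below)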

    The correlation $r_{xx'}$ indexes the absolute similarity between two exchangeable partners in a dyad. This can be seen in a simple scatterplot of X against X′. On this plot each dyad is represented twice, once as the point $(X_i, X'_i)$ and once as the point $(X'_i, X_i)$. We draw line segments between points from the same dyad. These line segments all have a slope of -1 and are bisected by the identity line. An aggregate measure of the squared length of these line segments is inversely related to $r_{xx'}$; it is in this sense that the pairwise intraclass is a measure of similarity between dyad members. Note that when the two individuals in the dyad perfectly agree, the line segments have length 0 (i.e., all points fall on the identity line) and the pairwise intraclass correlation $r_{xx'}$ equals 1. An analogous plot was proposed in the context of calibration and resolution of judgment (Liberman & Tversky, [1993]).

    Two examples of these plots appear in Figure 1. The data, from Stinson and Ickes ([1992]), are the frequency of smiles and laughter between dyad members, separately for dyads consisting of strangers and dyads consisting of friends. For dyads of strangers the pairwise intraclass $r_{xx'}$ was .72 whereas for dyads of friends $r_{xx'}$ was .40. The plot highlights an interesting difference in interaction between friends and strangers. It appears that the interaction pattern between strangers involves matching each other's frequency of smiling to a higher degree than interaction between friends. That is, for strangers both partners' frequencies of smiling were more similar than the frequencies of smiling between friends. For friends, the interaction pattern consisted of pairs where one partner smiled relatively much more than the other partner. This matching difference was independent of the mean level of smiling. The closed circle on the identity line represents the mean frequency of smiles. Dyads of friends had a higher frequency of smiles than dyads of strangers, yet dyads of strangers had a higher degree of matching (as indexed by the pairwise intraclass).

    Figure 1: Graphical display of the pairwise intraclass correlation. Data are from Stinson & Ickes (1992) and represent the frequency of smiles and laughter for same-sex dyads where both members are either friends or strangers. The point on the identity line represents the mean frequency for strangers and the mean frequency for friends; the length of the line segments is related to the similarity of the within-dyad scores.

    It is important to remember that the correlation $r_{xx'}$ is computed over 2N pairs. Because the correlation $r_{xx'}$ is based on 2N pairs rather than on N dyads as in the usual case, the test of significance must be adjusted. The sample value $r_{xx'}$ can be tested against the null hypothesis that $\rho_{xx'} = 0$ using the asymptotic test

    $Z = r_{xx'}\sqrt{N}$    (3)

    where N is the number of dyads and Z follows a standardized normal distribution.

    The pairwise intraclass correlation indexes the similarity of individuals within dyads, and is closely related to other estimators of the intraclass correlation such as the ANOVA estimator (Fisher, [1925]; Haggard, [1958]). However, the pairwise method has several important advantages in the present situation. Most important, it is calculated in the same manner as the usual Pearson correlation: the two ``reverse-coded'' columns are correlated in the usual manner, thus offering ease of computation, flexibility in the use of existing computer packages, and an intuitive link to general correlational methods. It also has certain statistical properties that make it ideal to serve as the basis for more complicated statistics of interdependence (e.g., it is the maximum likelihood estimator of the intraclass correlation on groups of equal size). Moreover, the pairwise method used to compute the intraclass correlation within a single variable can be used to compute the ``cross intraclass correlation'' across different variables, an important index discussed below. Thus, the pairwise approach can extend to multivariate situations.

    The previous example on dyads (Table 1) implicitly assumed dyads whose members are ``exchangeable''; that is, there is no a priori way to classify an individual in a dyad. Examples of exchangeable dyads include gay couples, same-sex roommates, and identical twins. However, distinguishable dyads (such as heterosexual couples, where individuals within a dyad can be classified by sex) also occur. The calculation of the pairwise intraclass correlation in the distinguishable case follows the same general pattern. In the distinguishable case the pairwise correlation model requires one extra piece of information: a grouping code indexing the dyad member. This extra information is needed because each dyad member is distinguishable according to some theoretically meaningful variable. One simply computes the usual partial correlation between the two reversed columns, partialling out the variable of the grouping code. This partial correlation is the maximum likelihood estimator of the pairwise intraclass correlation for the distinguishable case. For the theoretical background underlying the distinguishable case, relevant formulae, computational examples, and extensions to a structural equation modelling framework see Gonzalez & Griffin ([in press]).

    Table 2: Symbolic representation for the pairwise data setup in the distinguishable case. The first subscript represents the dyad and the second subscript represents the individual. Categorization of individuals as 1 or 2 is based on the class variable C.

                      Variable
    Dyad #     C     X        X′
      1        1    X11      X12
               2    X12      X11
      2        1    X21      X22
               2    X22      X21
      3        1    X31      X32
               2    X32      X31
      4        1    X41      X42
               2    X42      X41

    The sample estimate of the partial pairwise intraclass correlation is simply the Pearson correlation between X and X′ partialling out variable C. The partial pairwise intraclass correlation is denoted $r_{xx' \cdot c}$. This correlation can be computed with standard statistical packages (e.g., the partial correlation routine in either SAS or SPSS). For completeness we present the formula for the partial correlation

    $r_{xx' \cdot c} = \frac{r_{xx'} - r_{cx}r_{cx'}}{\sqrt{(1-r_{cx}^2)(1-r_{cx'}^2)}}$

    The sample value $r_{xx' \cdot c}$ can be tested against the null hypothesis that $\rho_{xx' \cdot c} = 0$ using the large sample, asymptotic test

    $Z = r_{xx' \cdot c}\sqrt{N}$

    where Z is normally distributed and can be compared to critical values found in standard tables. Note that the equality of variance assumption applies in the distinguishable case. For instance, the population variance for the men on variable X is assumed to be equivalent to the population variance for the women on variable X. Standard tests for the equality of two dependent variances can be used to determine if this assumption is valid (e.g., Kenny, [1979]). See Gonzalez and Griffin ([1998b]) for advice about dealing with situations where the between-group variances are different.
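
    A short R sketch of the distinguishable case (toy data; the variable names are mine): double-enter the scores, code C for the member classification, and apply the partial correlation formula above.

    # Partial pairwise intraclass for distinguishable dyads
    husband <- c(3, 4, 5, 6, 7)      # toy data
    wife    <- c(4, 5, 6, 7, 8)
    X      <- c(husband, wife)
    Xprime <- c(wife, husband)
    C      <- rep(c(1, 2), each = length(husband))   # member code
    r <- cor(cbind(X, Xprime, C))    # all pairwise correlations
    r_xx.c <- (r["X", "Xprime"] - r["X", "C"] * r["Xprime", "C"]) /
      sqrt((1 - r["X", "C"]^2) * (1 - r["Xprime", "C"]^2))
    Z <- r_xx.c * sqrt(length(husband))              # asymptotic test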

  16. The pairwise intraclass correlation for groups

    Here we show how to extend the pairwise approach to situations where all groups are of size k. The direct extension is to perform the pairwise coding for all possible ordered pairs of group members. For instance, in a group of size three with members denoted A, B, and C, the possible combinations are AB, AC, BA, BC, CA, and CB. For each of the six combinations, data from the person coded on the left (e.g., A in AB) are entered into column X and data from the person coded on the right (e.g., B in AB) are entered into column X′. Thus, columns X and X′ will contain 6N data points, where N is the number of groups. The Pearson correlation between columns X and X′ is the pairwise intraclass correlation for the exchangeable case.
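
    A sketch of this coding in R for an arbitrary common group size (the function name is mine): every ordered within-group pair of distinct members contributes one row, giving k(k-1) rows per group.

    # Pairwise intraclass for groups: all ordered within-group pairs
    icc_pairwise_groups <- function(scores, group) {
      X <- c(); Xprime <- c()
      for (g in unique(group)) {
        s <- scores[group == g]
        for (i in seq_along(s)) {
          for (j in seq_along(s)) {
            if (i != j) {            # ordered pairs: both AB and BA are included
              X      <- c(X, s[i])
              Xprime <- c(Xprime, s[j])
            }
          }
        }
      }
      cor(X, Xprime)
    }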

    Obviously, with large groups the pairwise framework becomes cumbersome because of the many combinations that need to be coded, but it still maintains its interpretational simplicity. A computational shortcut to the pairwise framework for groups is given by a traditional analysis of variance source table. Compute a one-way ANOVA using the grouping code as the single factor (e.g., if there are 20 groups of size 4, then there will be 20 cells in the ANOVA, each cell having four observations). Denote the sum of squares between groups as SSB, the sum of squares within groups as SSW, and the corresponding mean square terms as MSB and MSW, respectively. The exchangeable pairwise intraclass correlation is identical to

    $r_{xx'} = \frac{(k-1)SSB - SSW}{(k-1)(SSB + SSW)}$    (4)
    where k is the group size. Contrast this definition of the pairwise intraclass with the ANOVA-based intraclass correlation (e.g., Shrout & Fleiss, [1979]), which is
    $r = \frac{MSB - MSW}{MSB + (k - 1)MSW}$    (5)
    where k is the group size. For a comparison of these two different formulations of the intraclass correlation, see Gonzalez and Griffin ([1998a]). The setup is similar for the distinguishable case: include a second factor for group member and compute the source table for the two-factor ANOVA.
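
    Equation 4 is easy to script. As a sketch (the function name is mine), plugging in the sums of squares from the HLM example near the end of these notes (SSB = 21.88, SSW = 45.80, k = 10) reproduces the .2481 computed there:

    # Equation 4: pairwise intraclass from one-way ANOVA sums of squares
    icc_pairwise <- function(SSB, SSW, k) {
      ((k - 1) * SSB - SSW) / ((k - 1) * (SSB + SSW))
    }
    icc_pairwise(21.88, 45.80, 10)   # 0.2481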

    Extensions of the intraclass correlation (in either the pairwise or the ANOVA-based formulation) are not straightforward for situations where group sizes vary within the same study. For example, a study on families may have some families of size 3, some of size 4, and so on. For preliminary treatments of this problem see Karlin, Cameron, and Williams ([1981]) and Donner ([1986]).

  17. Relation to general models-Hierarchical Linear Models (HLM)

    There has been much discussion lately in the groups literature about hierarchical linear models. They are useful because they are general and can be extended to many other situations (unequal group sizes, missing data, nonnormal data, etc.). Here I will show how the intraclass correlation for the exchangeable case can be handled through HLM and also how HLM helps unify the ANOVA intraclass and the pairwise intraclass that I presented today. For an introductory comparison between standard ANOVA models and their HLM generalizations see Raudenbush ([1993]).

    The intuition of HLM is very simple. The idea is to run a regression within each group (the level 1 regressions), take the coefficients (the b's) from those separate regressions, and then use those b's as variables in a subsequent regression (the level 2 regression). Of course, it isn't computationally efficient (nor correct) to do the two-step process manually as I just described. Rather, computer programs do both steps simultaneously. There are many reasons why it is a good idea to estimate everything simultaneously. For one, the level 2 regression makes use of parameters estimated by the level 1 regressions, so error estimates and degrees of freedom must be adjusted accordingly.

    We'll begin with the exchangeable case. Let's define the level 1 regression as

    $Y_{ij} = \beta_{i0} + \epsilon_{ij}$    (6)

    where the $\epsilon_{ij}$ are normally distributed with mean 0 and variance $\sigma^2_e$. Note that there is no intercept or grand mean term; the $\beta_{i0}$ are the group means.

    The level 2 regression will decompose the $\beta_{i0}$ into constituent parts. This is accomplished by the following regression that uses $\beta_{i0}$ as the dependent variable.

    $\beta_{i0} = \theta_{00} + \kappa_{i0}$    (7)

    where $\theta_{00}$ is the usual grand mean and $\kappa_{i0}$ is the random effect associated with group i. The $\kappa_{i0}$ are assumed to be normally distributed with mean 0 and variance $\tau^2$. This is identical to the ANOVA model I presented earlier, where $\mu = \theta_{00}$ and $\alpha_i = \kappa_{i0}$.

    An HLM program will estimate $\sigma^2_e$ and $\tau^2$, and one can then use those estimates to compute the intraclass correlation, i.e.,

    $\frac{\tau^2}{\tau^2 + \sigma^2_e}$    (8)
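
    Before turning to SAS, here is a sketch of the same random-intercept model in R, assuming the lme4 package and a data frame rg1 with columns data and group (lme4 is my choice for illustration, not a program used in these notes):

    # Random-intercept model; REML estimation is lme4's default
    library(lme4)
    fit <- lmer(data ~ 1 + (1 | group), data = rg1)   # response is named "data"
    vc  <- as.data.frame(VarCorr(fit))                # tau^2 and sigma^2_e
    icc <- vc$vcov[1] / (vc$vcov[1] + vc$vcov[2])     # Equation 8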

    Here is a simple example using SAS PROC MIXED (identical results will emerge from other HLM programs such as MLn, HLM, VARCL, etc.). The SAS code to read in the data and run the model is

    DATA rg1;                     /* read the level 1 data */
       INFILE 'simple.data';
       INPUT   subject  group data;
    run;
     
    proc mixed method=reml asycorr covtest cl info data=rg1;
     class group;                               /* treat group as a factor */
     model data = / solution predicted corrb;   /* intercept-only fixed part */
     random intercept/sub=group solution;       /* random intercept per group */
    run;
    
    THE OUTPUT IS:
    
                         Covariance Parameter Estimates (REML)
     
        Cov Parm    Subject      Estimate     Std Error       Z  Pr > |Z|  Alpha
     
        INTERCEPT   GROUP      0.44522222    0.38738209    1.15    0.2504   0.05
        Residual               1.01777778    0.21456640    4.74    0.0001   0.05
     
                        Covariance Parameter Estimates (REML)
     
                            Lower     Upper
     
                           0.1357    8.2051
                           0.7002    1.6146
     
     
                       Asymptotic Correlation Matrix of Estimates
     
                      Cov Parm     Row         COVP1         COVP2
     
                      INTERCEPT      1    1.00000000   -0.05538883
                      Residual       2   -0.05538883    1.00000000
     
     
                           Model Fitting Information for DATA
     
                        Description                        Value
     
                        Observations                     50.0000
                        Res Log Likelihood              -75.2790
                        Akaike's Information Criterion  -77.2790
                        Schwarz's Bayesian Criterion    -79.1709
                        -2 Res Log Likelihood           150.5581
     
    
                              Solution for Fixed Effects
     
             Effect         Estimate     Std Error    DF       t  Pr > |t|
     
             INTERCEPT    2.08000000    0.33075671     4    6.29    0.0033
     
     
                          Correlation Matrix for Fixed Effects
     
                          Effect      Row          COL1
     
                          INTERCEPT     1    1.00000000
     
                             Solution for Random Effects
     
          Effect     GROUP      Estimate       SE Pred    DF       t  Pr > |t|
     
          INTERCEPT  1       -0.55347552    0.39410253    45   -1.40    0.1671
          INTERCEPT  2        0.26045907    0.39410253    45    0.66    0.5121
          INTERCEPT  3        0.34185253    0.39410253    45    0.87    0.3903
          INTERCEPT  4        0.66742637    0.39410253    45    1.69    0.0973
          INTERCEPT  5       -0.71626244    0.39410253    45   -1.82    0.0758
     
                                   Predicted Values
     
                  DATA  Predicted   SE Pred       L95       U95  Residual
     
                0.0000     1.5265    0.2943    0.9337    2.1193   -1.5265
                1.0000     1.5265    0.2943    0.9337    2.1193   -0.5265
                3.0000     1.5265    0.2943    0.9337    2.1193    1.4735
                1.0000     1.5265    0.2943    0.9337    2.1193   -0.5265
                1.0000     1.5265    0.2943    0.9337    2.1193   -0.5265
                2.0000     1.5265    0.2943    0.9337    2.1193    0.4735
                2.0000     1.5265    0.2943    0.9337    2.1193    0.4735
                1.0000     1.5265    0.2943    0.9337    2.1193   -0.5265
                1.0000     1.5265    0.2943    0.9337    2.1193   -0.5265
                2.0000     1.5265    0.2943    0.9337    2.1193    0.4735
                2.0000     2.3405    0.2943    1.7477    2.9333   -0.3405
                3.0000     2.3405    0.2943    1.7477    2.9333    0.6595
                4.0000     2.3405    0.2943    1.7477    2.9333    1.6595
                2.0000     2.3405    0.2943    1.7477    2.9333   -0.3405
                1.0000     2.3405    0.2943    1.7477    2.9333   -1.3405
                1.0000     2.3405    0.2943    1.7477    2.9333   -1.3405
                2.0000     2.3405    0.2943    1.7477    2.9333   -0.3405
                2.0000     2.3405    0.2943    1.7477    2.9333   -0.3405
                3.0000     2.3405    0.2943    1.7477    2.9333    0.6595
                4.0000     2.3405    0.2943    1.7477    2.9333    1.6595
                2.0000     2.4219    0.2943    1.8290    3.0147   -0.4219
                3.0000     2.4219    0.2943    1.8290    3.0147    0.5781
                4.0000     2.4219    0.2943    1.8290    3.0147    1.5781
                4.0000     2.4219    0.2943    1.8290    3.0147    1.5781
                2.0000     2.4219    0.2943    1.8290    3.0147   -0.4219
                1.0000     2.4219    0.2943    1.8290    3.0147   -1.4219
                2.0000     2.4219    0.2943    1.8290    3.0147   -0.4219
                3.0000     2.4219    0.2943    1.8290    3.0147    0.5781
                2.0000     2.4219    0.2943    1.8290    3.0147   -0.4219
                2.0000     2.4219    0.2943    1.8290    3.0147   -0.4219
                2.0000     2.7474    0.2943    2.1546    3.3402   -0.7474
                4.0000     2.7474    0.2943    2.1546    3.3402    1.2526
                5.0000     2.7474    0.2943    2.1546    3.3402    2.2526
                3.0000     2.7474    0.2943    2.1546    3.3402    0.2526
                2.0000     2.7474    0.2943    2.1546    3.3402   -0.7474
                1.0000     2.7474    0.2943    2.1546    3.3402   -1.7474
                3.0000     2.7474    0.2943    2.1546    3.3402    0.2526
                3.0000     2.7474    0.2943    2.1546    3.3402    0.2526
                2.0000     2.7474    0.2943    2.1546    3.3402   -0.7474
                4.0000     2.7474    0.2943    2.1546    3.3402    1.2526
                1.0000     1.3637    0.2943    0.7709    1.9565   -0.3637
                0.0000     1.3637    0.2943    0.7709    1.9565   -1.3637
                2.0000     1.3637    0.2943    0.7709    1.9565    0.6363
                1.0000     1.3637    0.2943    0.7709    1.9565   -0.3637
                1.0000     1.3637    0.2943    0.7709    1.9565   -0.3637
                2.0000     1.3637    0.2943    0.7709    1.9565    0.6363
                1.0000     1.3637    0.2943    0.7709    1.9565   -0.3637
                0.0000     1.3637    0.2943    0.7709    1.9565   -1.3637
                1.0000     1.3637    0.2943    0.7709    1.9565   -0.3637
                3.0000     1.3637    0.2943    0.7709    1.9565    1.6363
    

    The PROC MIXED call has three critical lines of syntax. The first defines the variable ``group'' as a class variable; that means group will be treated like a factor in an ANOVA. The second line states that the dependent variable is the variable data. The third line states that there is a random effect called the intercept which is nested within the group variable. The other words such as ``solution'', ``predicted'', ``corrb'', ``cl'', etc., pretty much handle printing options. The last critical subcommand is the ``method=reml'' given on the PROC MIXED line. This tells SAS to use a ``REstricted Maximum Likelihood'' estimation procedure, which in this balanced design reproduces the standard ANOVA estimates of the MS terms.

    The output labels the parameter $\tau^2$ as ``INTERCEPT GROUP'' and the parameter $\sigma^2_e$ as ``Residual''. Nowhere in the output does one find the intraclass correlation printed, so it must be computed by hand using Equation 8. For this example we have an intraclass of

    $.30 = \frac{.445}{.445 + 1.02}$    (9)

    Now here is something very interesting. If you change the option ``method=reml'' to ``method=ml'' (and don't change anything else) you automatically get the pairwise intraclass correlation. ``ml'' stands for ``Maximum Likelihood,'' which is a slightly different estimator than REML. It is interesting that what separates the ANOVA intraclass and the pairwise intraclass is whether or not ``re'' is added to the ``method'' option in PROC MIXED.

    proc mixed method=ml asycorr covtest cl info data=rg1;
     class group;
     model data = / solution predicted corrb;
     random intercept/sub=group solution;
    run;
    
    THE OUTPUT IS:
    
    
                         Covariance Parameter Estimates (MLE)
     
        Cov Parm    Subject      Estimate     Std Error       Z  Pr > |Z|  Alpha
     
        INTERCEPT   GROUP      0.33582222    0.27759303    1.21    0.2264   0.05
        Residual               1.01777778    0.21456640    4.74    0.0001   0.05
     
     
                          Covariance Parameter Estimates (MLE)
     
                             Lower     Upper
     
                            0.1067    4.9168
                            0.7002    1.6146
     
     
                       Asymptotic Correlation Matrix of Estimates
     
                      Cov Parm     Row         COVP1         COVP2
     
                      INTERCEPT      1    1.00000000   -0.07729531
                      Residual       2   -0.07729531    1.00000000
     
     
                           Model Fitting Information for DATA
     
                        Description                        Value
     
                        Observations                     50.0000
                        Log Likelihood                  -75.0338
                        Akaike's Information Criterion  -77.0338
                        Schwarz's Bayesian Criterion    -78.9458
                        -2 Log Likelihood               150.0675
     
     
                          Solution for Fixed Effects
     
             Effect         Estimate     Std Error    DF       t  Pr > |t|
     
             INTERCEPT    2.08000000    0.29583779     4    7.03    0.0022
     
     
                          Correlation Matrix for Fixed Effects
     
                          Effect      Row          COL1
     
                          INTERCEPT     1    1.00000000
     
     
                              Solution for Random Effects
     
          Effect     GROUP      Estimate       SE Pred    DF       t  Pr > |t|
     
          INTERCEPT  1       -0.52184440    0.36006853    45   -1.45    0.1542
          INTERCEPT  2        0.24557384    0.36006853    45    0.68    0.4987
          INTERCEPT  3        0.32231566    0.36006853    45    0.90    0.3755
          INTERCEPT  4        0.62928296    0.36006853    45    1.75    0.0873
          INTERCEPT  5       -0.67532805    0.36006853    45   -1.88    0.0672
     
     
                                   Predicted Values
     
                  DATA  Predicted   SE Pred       L95       U95  Residual
     
                0.0000     1.5582    0.2878    0.9785    2.1379   -1.5582
                1.0000     1.5582    0.2878    0.9785    2.1379   -0.5582
                3.0000     1.5582    0.2878    0.9785    2.1379    1.4418
                1.0000     1.5582    0.2878    0.9785    2.1379   -0.5582
                1.0000     1.5582    0.2878    0.9785    2.1379   -0.5582
                2.0000     1.5582    0.2878    0.9785    2.1379    0.4418
                2.0000     1.5582    0.2878    0.9785    2.1379    0.4418
                1.0000     1.5582    0.2878    0.9785    2.1379   -0.5582
                1.0000     1.5582    0.2878    0.9785    2.1379   -0.5582
                2.0000     1.5582    0.2878    0.9785    2.1379    0.4418
                2.0000     2.3256    0.2878    1.7459    2.9053   -0.3256
                3.0000     2.3256    0.2878    1.7459    2.9053    0.6744
                4.0000     2.3256    0.2878    1.7459    2.9053    1.6744
                2.0000     2.3256    0.2878    1.7459    2.9053   -0.3256
                1.0000     2.3256    0.2878    1.7459    2.9053   -1.3256
                1.0000     2.3256    0.2878    1.7459    2.9053   -1.3256
                2.0000     2.3256    0.2878    1.7459    2.9053   -0.3256
                2.0000     2.3256    0.2878    1.7459    2.9053   -0.3256
                3.0000     2.3256    0.2878    1.7459    2.9053    0.6744
                4.0000     2.3256    0.2878    1.7459    2.9053    1.6744
                2.0000     2.4023    0.2878    1.8226    2.9820   -0.4023
                3.0000     2.4023    0.2878    1.8226    2.9820    0.5977
                4.0000     2.4023    0.2878    1.8226    2.9820    1.5977
                4.0000     2.4023    0.2878    1.8226    2.9820    1.5977
                2.0000     2.4023    0.2878    1.8226    2.9820   -0.4023
                1.0000     2.4023    0.2878    1.8226    2.9820   -1.4023
                2.0000     2.4023    0.2878    1.8226    2.9820   -0.4023
                3.0000     2.4023    0.2878    1.8226    2.9820    0.5977
                2.0000     2.4023    0.2878    1.8226    2.9820   -0.4023
                2.0000     2.4023    0.2878    1.8226    2.9820   -0.4023
                2.0000     2.7093    0.2878    2.1296    3.2890   -0.7093
                4.0000     2.7093    0.2878    2.1296    3.2890    1.2907
                5.0000     2.7093    0.2878    2.1296    3.2890    2.2907
                3.0000     2.7093    0.2878    2.1296    3.2890    0.2907
                2.0000     2.7093    0.2878    2.1296    3.2890   -0.7093
                1.0000     2.7093    0.2878    2.1296    3.2890   -1.7093
                3.0000     2.7093    0.2878    2.1296    3.2890    0.2907
                3.0000     2.7093    0.2878    2.1296    3.2890    0.2907
                2.0000     2.7093    0.2878    2.1296    3.2890   -0.7093
                4.0000     2.7093    0.2878    2.1296    3.2890    1.2907
                1.0000     1.4047    0.2878    0.8250    1.9844   -0.4047
                0.0000     1.4047    0.2878    0.8250    1.9844   -1.4047
                2.0000     1.4047    0.2878    0.8250    1.9844    0.5953
                1.0000     1.4047    0.2878    0.8250    1.9844   -0.4047
                1.0000     1.4047    0.2878    0.8250    1.9844   -0.4047
                2.0000     1.4047    0.2878    0.8250    1.9844    0.5953
                1.0000     1.4047    0.2878    0.8250    1.9844   -0.4047
                0.0000     1.4047    0.2878    0.8250    1.9844   -1.4047
                1.0000     1.4047    0.2878    0.8250    1.9844   -0.4047
                3.0000     1.4047    0.2878    0.8250    1.9844    1.5953
    

    The intraclass correlation for this analysis is also found by applying Equation 8

    $.2477 = \frac{.3358}{.3358 + 1.02}$    (10)

    This is identical (up to rounding) to the value you would get using the pairwise formula I presented earlier. If you run an ANOVA on these data you will find that SSB = 21.88, SSW = 45.80, and k = 10 because there are 10 subjects in each of the 5 groups. Plugging these numbers into Equation 4 yields the same result for the intraclass as the SAS output using ``method=ml''

    $.2481 = \frac{(10-1)(21.88) - 45.80}{(10-1)(21.88 + 45.80)}$    (11)
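
    For readers following along in R, the same REML/ML switch exists in lme4 (a sketch under the same assumptions as the earlier lme4 snippet): setting REML = FALSE gives the maximum likelihood variance estimates, and hence the pairwise intraclass, just as method=ml does in PROC MIXED.

    # REML (ANOVA-style) versus ML (pairwise) variance estimates
    fit_reml <- lmer(data ~ 1 + (1 | group), data = rg1, REML = TRUE)
    fit_ml   <- lmer(data ~ 1 + (1 | group), data = rg1, REML = FALSE)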

  18. The same example can be carried out in a program called HLM, which, as the name suggests, performs hierarchical linear modelling. The versions I've seen are very buggy, but many people I know think the HLM program is the best thing in their lives since they started graduate school.

    HLM is easiest to use through the Windows interface, but that interface does not include all the available options. There is also a command syntax file that allows you to use all the available options. I'll give both the syntax file and excerpts from the resulting output.

    #WHLM CMD FILE FOR hlmex.ssm
    
    nonlin:n
    
    numit:50
    
    stopval:0.0000010000
    
    level1:DATA=INTRCPT1+RANDOM
    
    level2:INTRCPT1=INTRCPT2+random/
    
    fixtau:3
    
    lev1ols:10
    
    accel:5
    
    resfil:n
    
    resfil:n
    
    hypoth:n
    
    homvar:n
    
    CONSTRAIN:N
    
    HETEROL1VAR:N
    
    LAPLACE:N,50
    
    LVR:N
    
    title:no title
    
    output:C:\Rich\dyad\hlm2.out
    
    mlf:y
    
    

              *************************************************************
              *                                                           *
              *             H   H  L      M   M   22                      *
              *             H   H  L      MM MM  2  2                     *
              *             HHHHH  L      M M M    2      Version 4.40    *
              *             H   H  L      M   M   2                       *
              *             H   H  LLLLL  M   M  2222                     *
              *                                                           *
              *************************************************************
    
      SPECIFICATIONS FOR THIS HLM RUN                     Tue Jun 22 00:35:36 1999
    
     -------------------------------------------------------------------------------
    
      Problem Title: NO TITLE
    
     Weighting Specification
     -----------------------
                             Weight
                             Variable
                Weighting?   Name        Normalized?
     Level 1        no                        no
     Level 2        no                        no
    
      The outcome variable is     DATA    
    
      The model specified for the fixed effects was:
     ----------------------------------------------------
    
       Level-1                  Level-2
       Coefficients             Predictors
     ----------------------   ---------------
             INTRCPT1, B0      INTRCPT2, G00
    
    
     The model specified for the covariance components was:
     ---------------------------------------------------------
    
             Sigma squared (constant across level-2 units)
    
             Tau dimensions
                   INTRCPT1
    
    
     Summary of the model specified (in equation format)
     ---------------------------------------------------
    
    Level-1 Model
    
            Y = B0 + R
    
    Level-2 Model
    
            B0 = G00 + U0
    
    Level-1 OLS regressions
     -----------------------
    
     Level-2 Unit     INTRCPT1    
     ------------------------------------------------------------------------------
               1     1.40000    
               2     2.40000    
               3     2.50000    
               4     2.90000    
               5     1.20000    
    
    
    The average OLS level-1 coefficient for INTRCPT1 =      2.08000
    
    
     Least Squares Estimates
     -----------------------
    
     sigma_squared =      1.35360
    
     The outcome variable is     DATA
    
     Least-squares estimates of fixed effects
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00           2.080000   0.164536    12.642         4    0.000
     ----------------------------------------------------------------------------
    
     The outcome variable is     DATA
    
     Least-squares estimates of fixed effects
     (with robust standard errors)
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00           2.080000   0.295838     7.031         4    0.000
     ----------------------------------------------------------------------------
    
    
     The least-squares likelihood value = -78.516119
     Deviance =    157.03224
     Number of estimated parameters =    2
    
    
    
     STARTING VALUES
     ---------------
    sigma(0)_squared =      1.01778
    
     Tau(0)
     INTRCPT1      0.33582 
    
     Estimation of fixed effects
    (Based on starting values of covariance components)
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00           2.080000   0.295838     7.031         4    0.000
     ----------------------------------------------------------------------------
    
    
    The value of the likelihood function at iteration 1 = -7.503375E+001
    
    Iterations stopped due to small change in likelihood function
    ****** ITERATION 2 *******
    
     Sigma_squared =      1.01778
    
     Standard Error of Sigma_squared =      0.21457
    
    
     Tau
     INTRCPT1      0.33582 
    
    
     Standard Errors of Tau
     INTRCPT1      0.27759 
    
    
    Tau (as correlations)
     INTRCPT1  1.000
    
     ----------------------------------------------------
      Random level-1 coefficient   Reliability estimate
     ----------------------------------------------------
      INTRCPT1, B0                        0.767
     ----------------------------------------------------
    
    The value of the likelihood function at iteration 2 = -7.503375E+001
    The outcome variable is     DATA
    
     Final estimation of fixed effects:
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00           2.080000   0.295838     7.031         4    0.000
     ----------------------------------------------------------------------------
    
     The outcome variable is     DATA
    
     Final estimation of fixed effects
     (with robust standard errors)
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00           2.080000   0.295838     7.031         4    0.000
     ----------------------------------------------------------------------------
    
    
    
     Final estimation of variance components:
     -----------------------------------------------------------------------------
     Random Effect          Standard      Variance     df    Chi-square   P-value
                            Deviation     Component
     -----------------------------------------------------------------------------
     INTRCPT1,       U0        0.57950       0.33582     4      21.49782    0.000
      level-1,       R         1.00885       1.01778
     -----------------------------------------------------------------------------
    
    
     Statistics for current covariance components model
     --------------------------------------------------
     Deviance =    150.06750
     Number of estimated parameters =    3
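As a check on this output (the arithmetic below is mine, not part of the HLM printout), the variance-components form of the intraclass correlation can be computed directly from the final estimates:

    ICC = Tau / (Tau + Sigma_squared)
        = 0.33582 / (0.33582 + 1.01778)
        = 0.33582 / 1.35360
        = .248

Note that Tau + Sigma_squared reproduces the least-squares sigma_squared of 1.35360 reported earlier in the output. Further, the deviance difference between the least-squares model and the current model, 157.03224 - 150.06750 = 6.96, can be referred (approximately) to a chi-square distribution with one degree of freedom (corresponding to the one additional parameter, Tau) to test whether the group-level variance component differs from zero.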
    
    
    
Appendix 1
    
Data for the HLM example. I used the data structure below both for SAS and as the level-1 input file for the HLM program. (A sketch of reading these data into SAS appears after the listings.)
    
    SUBID GROUP DATA
     1 1 0
     2 1 1
     3 1 3
     4 1 1
     5 1 1
     6 1 2
     7 1 2
     8 1 1
     9 1 1
    10 1 2
    11 2 2
    12 2 3
    13 2 4
    14 2 2
    15 2 1
    16 2 1
    17 2 2
    18 2 2
    19 2 3
    20 2 4 
    21 3 2
    22 3 3
    23 3 4
    24 3 4 
    25 3 2 
    26 3 1
    27 3 2
    28 3 3 
    29 3 2 
    30 3 2
    31 4 2 
    32 4 4 
    33 4 5
    34 4 3
    35 4 2
    36 4 1
    37 4 3
    38 4 3
    39 4 2
    40 4 4 
    41 5 1
    42 5 0
    43 5 2
    44 5 1
    45 5 1
    46 5 2
    47 5 1
    48 5 0
    49 5 1
    50 5 3
    

For HLM I also had to input a level-2 file structured like this:

    GROUP GROUPID
    1 1
    2 2
    3 3
    4 4
    5 5
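To make the SAS side concrete, here is a minimal sketch of reading the level-1 file into SAS. The file name hlmdata.txt and the dataset name one are illustrative assumptions, not part of the original setup:

    /* Minimal sketch: read the level-1 data listed above into SAS.      */
    /* The file name 'hlmdata.txt' and dataset name ONE are assumptions. */
    data one;
      infile 'hlmdata.txt' firstobs=2;  /* firstobs=2 skips the header line SUBID GROUP DATA */
      input subid group data;
    run;

The resulting dataset can then be analyzed with the PROC MIXED code discussed in footnote 5.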
    

    References

    [1986]
Donner, A. (1986). A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. International Statistical Review, 54, 67-82.

    [1988]
    Donner, A., & Eliasziw, M. (1988). Confidence interval construction for parent-offspring correlations. Biometrics, 44, 727-737.

    [1925]
    Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd.

    [1998a]
Gonzalez, R., & Griffin, D. (1998a). An approximate significance test for the group-level correlation. University of Michigan and University of Sussex.

    [1998b]
Gonzalez, R., & Griffin, D. (1998b). The multiple personalities of the intraclass correlation. University of Michigan and University of Sussex.

    [in press]
Gonzalez, R., & Griffin, D. (in press). The correlational analysis of dyad-level data in the distinguishable case. Personal Relationships.

    [1995]
    Griffin, D., & Gonzalez, R. (1995). The correlational analysis of dyad-level data: Models for the exchangeable case. Psychological Bulletin, 118, 430-439.

    [1958]
    Haggard, E. A. (1958). Intraclass Correlation and the Analysis of Variance. New York: Dryden Press.

    [1981]
Karlin, S., Cameron, E. C., & Williams, P. T. (1981). Sibling and parent-offspring correlation estimation with variable family size. Proceedings of the National Academy of Sciences, 78, 2664-2668.

    [1979]
    Kenny, D. A. (1979). Correlation and Causality. New York: John Wiley.

    [1993]
    Liberman, V., & Tversky, A. (1993). On the evaluation of probability judgments: Calibration, resolution, and monotonicity. Psychological Bulletin, 114, 162-173.

    [1901]
Pearson, K. (1901). Mathematical contributions to the theory of evolution. IX. On the principle of homotyposis and its relation to heredity, to the variability of the individual, and to that of the race. Philosophical Transactions of the Royal Society of London, Series A, 197, 285-379.

    [1993]
    Raudenbush, S. W. (1993). Hierarchical linear models and experimental design. In L. K. Edwards (Ed.), Applied Analysis of Variance in Behavioral Science (pp. 459-496). New York: Marcel Dekker.

    [1979]
    Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.

    [1992]
    Stinson, L., & Ickes, W. (1992). Empathic accuracy in the interactions of male friends versus male strangers. Journal of Personality and Social Psychology, 62, 787-797.


Footnotes:

1 These notes draw liberally from joint work with Dale Griffin. When I say ``our'' or ``we'' in these notes, I am referring to Dale Griffin and myself.

2 In the case of a dyad with distinguishable members the X and Y columns each define a vector in an N-dimensional space. Once each column is centered at its mean, the cosine of the angle between the two vectors is the correlation coefficient.
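To illustrate with a hypothetical numerical example: take X = (1, 2, 3) and Y = (1, 1, 4). Centering each column at its mean gives x = (-1, 0, 1) and y = (-1, -1, 2), so cos(theta) = (x . y)/(|x||y|) = 3/(sqrt(2) sqrt(6)) = .866, which is exactly the Pearson correlation between X and Y.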

3 There are additional uses of the intraclass correlation. For instance, the intraclass correlation appears in reliability theory and can be used where a measure of similarity between two scores is needed.

4 To simplify matters, we have chosen to present large-sample asymptotic significance tests. We present a null hypothesis testing approach rather than a confidence interval approach, but the latter will also be developed. Deriving analytic confidence intervals for correlations has not been an easy problem. Fortunately, there have been recent advances in the variance components literature for deriving confidence intervals that are applicable to the pairwise models (e.g., Donner & Eliasziw, [1988]).

5 There are other ways to write the PROC MIXED code that produce identical output, but I've chosen this way because it makes it easier to generalize to what I want to do during Days 3 and 4. For instance, the line ``random intercept/sub=group solution'' can be replaced with ``random group/solution'' and the output is the same; a sketch of both specifications appears below.
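A minimal sketch of the two equivalent specifications follows; the dataset name one and the intercept-only model statement are assumed for illustration (the covtest option simply requests standard errors for the variance components):

    proc mixed data=one covtest;
      class group;
      model data = / solution;
      random intercept / sub=group solution;  /* specification used in these notes */
    run;

    proc mixed data=one covtest;
      class group;
      model data = / solution;
      random group / solution;                /* equivalent alternative */
    run;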

