Covariance, Regression, and Correlation

  • Covariance Consider a set of paired variables, \((x,y)\). The Covariance of \(x\) and \(y\) is: \[\sigma(x,y)=E[(x-\mu_x)(y-\mu_y)]=E(xy)-\mu_x\mu_y\]

  • Regression \[y=\alpha+\beta x+e\] \[\hat y=\alpha+\beta x\]

\[a=\bar y-b\bar x\]

\[b=\frac{Cov(x,y)}{Var(x)}\] The least-squares estimators for the intercept and slope of a linear regression are simple functions of the observed means, variances, and covariances. This property is exceedingly useful for quantitative genetics, since such statistics are readily obtainable from phenotypic data.

data(women)
cov(women$height, women$weight)
## [1] 69
fit <- lm(weight~height, data = women)
summary(fit)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
plot(women$weight~women$height, xlab='height', ylab='weight')
abline(lm(weight ~ height, data = women))

  • Correlation

\[r(x,y)=\frac{Cov(x,y)}{\sqrt{Var(x)Var(y)}}\]

  • The correlation between genotypic and phenotypic values \[z=G+E\] \(z\) is the phenotypic value, \(G\) is the genotypic value, and \(E\) is an environmental deviation(residual error) \[\sigma_{z,G}=\sigma[(G+E),G]=\sigma_G^2+\sigma_{G,E}\] correlation coefficient is \[\rho^2(G,z)=(\frac{\sigma_{G,z}}{\sigma_G\sigma_z})^2=\frac{(\sigma^2_G+\sigma_{G,E})^2}{\sigma^2_G\sigma^2_z}\] which simplifies to \[\rho^2(G,z)=\frac{\sigma^2_G}{\sigma^2_z}\] if there is no genotype-environment covariance. The quantity \(\sigma^2_G/\sigma^2_z\) is generally referred to as heritability in the broad sense.

  • Regression of offspring phenotype on midparent phenotype

We can estimate the narrow-sense heritability through the regression of child’s phenotype on the parental mid-point phenotype. Related individuals carry copies of many of the same alleles. \[z_{mp}=\frac{z_m+z_f}{2}\] \[z_m=g_m+g_m^{'}+E_m\] \[z_f=g_f+g_f^{'}+E_f\] \[z_o=g_m+g_f+E_o\] \[\sigma(z_{mp},z_0)=\sigma[(\frac{z_m+z_f}{2}),z_o]=\sigma[(\frac{g_m+g_f}{2}),(g_m+g_f)]=\frac{\sigma^2(g_m)+\sigma^2(g_f)}{2}\] additive genetic variance, \(\sigma^2_A\) \[\sigma(z_{mp},z_o)=\frac{\sigma^2_A}{2}\] variance of midparent phenotypes: \[\sigma^2(z_{mp})=\sigma^2(\frac{z_m+z_f}{2})=\frac{\sigma^2(z_m)+\sigma^2(z_f)}{4}\] If the phenotypic variance in the two sexes is equal \[\sigma^2(z_{mp})=\frac{\sigma^2_z}{2}\] \[\beta_{o,mp}=\frac{\sigma^2_A}{\sigma^2_z}=h^2\]

# a simulation study
par.off.corr<-function(L=20, environ.var,Num_inds=1000,print.slope=FALSE,sel.cutoff=FALSE){
    ##Quantitative genetics sims
    allele.freq<-0.5   ###each locus is assumed to have the same allele frequencies. This is just to simplify the coding, in reality these results work when each locus has its own frequency (and the coding wouldn't be too much harder). 
     
     
    
    ##MAKE A MUM
    ## For each mother, at each locus we draw an allele (either 0 or 1) from the population allele frequency. 
    ##We do this twice for each mother two represent the two haplotypes in the mother 
    mum.hap.1<-replicate(Num_inds, rbinom(L,1,allele.freq) )
    mum.hap.2<-replicate(Num_inds, rbinom(L,1,allele.freq) )
    ##type mum.hap.1[,1] to see the 1st mothers 1st haplotype
    
    ##Each mothers genotype at each locus is either 0,1,2
    mum.geno<-mum.hap.1+mum.hap.2
    
    additive.genetic<-colSums(mum.geno)
    genetic.sd<-sd(additive.genetic)
    mean.genetic<-mean(additive.genetic)
    
    additive.genetic<-additive.genetic / sd(additive.genetic)
    mum.pheno<- additive.genetic + rnorm(Num_inds,sd=sqrt(environ.var))
    mum.pheno<-mum.pheno-mean(mum.pheno)
    
    
    
    ##MAKE A DAD (same code as make a mum, only said in a deeper voice)
    dad.hap.1<-replicate(Num_inds, rbinom(L,1,allele.freq) )
    dad.hap.2<-replicate(Num_inds, rbinom(L,1,allele.freq) )
    dad.geno<-dad.hap.1+dad.hap.2
    
    
    additive.genetic<-colSums(dad.geno)
    additive.genetic<-additive.genetic / sd(additive.genetic)
    dad.pheno<- additive.genetic + rnorm(Num_inds,sd=sqrt(environ.var))
    dad.pheno<-dad.pheno-mean(dad.pheno)
    
    ### Make a child
    child.geno<-dad.hap.1+mum.hap.1 ##1 haplotype from mum 1 haplotype from dad
    
    additive.genetic<-colSums(child.geno)
    additive.genetic<-additive.genetic / sd(additive.genetic)
    child.pheno<- additive.genetic + rnorm(Num_inds,sd=sqrt(environ.var))
    child.pheno<-child.pheno-mean(child.pheno)
    
    
    
    ##Calculate midpoints, linear model and plots
    
    parental.midpoint<-(mum.pheno+dad.pheno)/2 ##avg. parents
    
    lm(child.pheno~parental.midpoint) ##linear model between child and mid point
                                ##the slope of this line is the narrow sense heritability.
    
    # plot parental midpoint against offsprings phenotype.
    #layout(1) ###done in case this is run after the code with 3 plots
    if(sel.cutoff){
        plot(parental.midpoint,child.pheno,xlab="Parental midpoint",ylab="Child's phenotype",cex=1.5,cex.axis=1.5,cex.main=1.5,cex.lab=1.5,main=paste("VE=",environ.var,", VA=1 ","(L =",L,")",sep=""),col=ifelse(parental.midpoint>1,"red","grey"))
    }else{
        plot(parental.midpoint,child.pheno,xlab="Parental midpoint",ylab="Child's phenotype",cex=1.5,cex.axis=1.5,cex.main=1.5,cex.lab=1.5,main=paste("VE=",environ.var,", VA=1 ","(L =",L,")",sep=""))
    }
    ## plot the regression in red
    abline(h=0,col="grey",lwd=2)
    abline(v=0,col="grey",lwd=2)
     abline(lm(child.pheno~parental.midpoint),col="blue",lwd=2)
    
    
    
    my.slope<-lm(child.pheno~parental.midpoint)$coeff[2]
    
    if(sel.cutoff){
        sel.child.mean<-mean(child.pheno[parental.midpoint>1])
        sel.par.mean<-mean(parental.midpoint[parental.midpoint>1]);
        points(sel.par.mean,0,col="blue",pch=19,cex=1.2)
        pred.sel.child<-my.slope*sel.par.mean
        points(0,pred.sel.child,col="blue",pch=19,cex=1.2)
        lines(c(0,sel.par.mean), rep(pred.sel.child,2),col="blue",lwd=2)
        lines(rep(sel.par.mean,2), c(0,pred.sel.child),col="blue",lwd=2)
        arrows(x0=0,x1=0,y0=0,y1=pred.sel.child,col="blue",lwd=2,length=0.1,code=3)
        arrows(x0=0,x1=sel.par.mean,y0=0,y1=0,col="blue",lwd=2,length=0.1,code=3)
        text(x=sel.par.mean/2,y=min(child.pheno)*0.06,"S",cex=1.5,col="blue")
        text(x=min(parental.midpoint)*0.06,y=pred.sel.child/2,"R",cex=1.5,col="blue")
    }
    
    if(print.slope) text(x=min(parental.midpoint)*.8,y=max(child.pheno)*.9,label=paste("slope= ",format(my.slope,digit=3)),col="red",lwd=4,cex=1.5)
    
    abline(0,1,col="red",lwd=3,lty=2)
 #  recover()
 }
 

layout(t(1:3))
par.off.corr(L=100, environ.var=100,Num_inds=500)  #,print.slope=TRUE)
par.off.corr(L=100, environ.var=1,Num_inds=500) #,print.slope=TRUE)
par.off.corr(L=100, environ.var=0.001,Num_inds=500) #,print.slope=TRUE)

  • Directional selection differentials and the Robertson-Price Identity

The evolutionary response of a character to selection is a function of the intensity of selection and the fraction of the phenotypic variance attributable to certain genetic effects.

Directional selection differential, \(S\) \[S=\mu_s-\mu\] The degree to which \(\mu_s\) deviates from \(\mu\) depends on the survivorship and reproductive rates of individuals with different phenotypes. If all individuals have equal fertility and viability, then \(\mu_s=\mu, S=0\), and the population mean phenotype is not expected to change between generations.

Fitness, \(W(z)\) is the probability that individuals with phenotype \(z\) survive to reproduce.

Probability density function after selection: \[p_s(z)=\frac{W(z)p(z)}{\int W(z)p(z)dz}\] The denominator is the mean individual fitness, \(\bar{W}\). Letting the relative fitness of phenotype \(z\) be \(w(z)=W(z)/\bar{W}\), \(p_s(z)=w(z)p(z)\)

\[\mu_s=\int zp_s(z)dz=\int zw(z)p(z)dz=E[zw(z)]\] \[\bar{w}=\int w(z)p(z)dz=\frac{1}{\bar{W}}\int W(z)p(z)dz=\bar{W}/\bar{W}=1\] since \(\mu=E(z)*E(w)=E(z)*1\), \[S=\mu_s-\mu=E[zw(z)]-E(z)E(w)=\sigma[zw(z)]-E(z)E(w)=\sigma[z,w(z)]\]

  • Robertson-Price identity

breeder’s equation: predictive eqution for evolutionary change across generations \[\Delta \mu=\mu_o-\mu=\beta(\mu_s-\mu)=\beta S\]

Reference