Assignment2

=Assignment for Homework 3: Parts 5-6: characterizing co-variability between 2 fields.=

5. Scatter plot, correlation and covariance, regression-explained variance

 * 1) Based on your data fields (which you've seen pictures of), **make subsets of your 2 variables** x and y and **make a scatter plot of these showing the strongest (positive or negative) correlation of one field with the other you can find**. The subset might simply be all (x,t) values if your fields are very similar (olr, precip), or maybe the 240 time values at one longitude, or 144 longitudinal values in the time mean, or time series at different longitudes if some variability is offset in your two fields (like pressure and wind).
 * 2) Now consider the covariance and correlation of the two subset arrays entering your scatterplot.
 * What is the correlation coefficient corresponding to this scatter plot? rho = corrcoef(x,y) rho = correlate(x,y)
 * What are the standard deviations of your two data subsets? std(x) stdev(y)
 * What fraction of the variance of y can be 'explained' by linear regression on x (y = mx + b)? How does this relate to rho? How much y variance is explained? (variance: with units of y squared) What is m? //Hint: these are simple questions: use the math formula, not a computer code (Hsieh section 1.4.2, Eq. 1.33).//
 * What fraction of the variance of x can be 'explained' by linear regression on variable y? (x = nx + a)? How does this relate to rho? What is n? //Hint: these are simple questions, use the math formula not computer code.//
 * 1) Now add uncorrelated (random) noise with variance 1 to one of your variables. This might be like observation error. noisey = y + random('Normal',0,1,size(y))
 * How did the variance of y change when this noise was added? var(y) var(noisey)
 * How did the correlation change? rho = corrcoef(x,noisey) rho = correlate(x, noisey)
 * How do these changes affect the regression of y on x? How much (y+noise) variance is explained by linear regression on x? What is the new value of m in the new (noisey = mx + b) regression?
 * Hint: all these could be answered without using the computer, but it may help to confirm with data

6. Lagged correlation, covariance, and cross-covariance: hey let's compute all vs. all
Rather than pick time series at one longitude to correlate with other time series, let's use matrix power to fill out a whole 2D (144x144 or 240x240) covariance matrix or what the heck 3D arrays: the covariance matrix as a function of longitude and time lag (144 x 144 x NLAGS).

For these ENSO type anomalies, let's do NLAGS = 49 [-24 to 24 months]. Everything we might want to know about correlations among our variables is a slice or section of that big covariance array.

Let's work from the anomx and anomy arrays (built in part 1 by subtracting the climatological annual cycle at each longitude). This is traditional climate statistics, although notice that of course we could look at spatial instead of temporal covariances, define anomalies from spatial rather than temporal averages, etc. if we cared to.

Code for computing covariance arrays:
Same code lines for both Matlab and IDL: MAXLAG = 24; % out to +/- 24 months NLAGS = 2*MAXLAG + 1; % 49 lags are considered

**%Matlab**: % deseasonalized anomalies anomy = reshape(anomy,144,240); anomx = reshape(anomx,144,240);

% make a lag vector, for a plot axis later lags = linspace(-MAXLAG, MAXLAG, NLAGS); % lag vector, for plot axis

%% Matlab, upon reading our files, sees them as (144 x 240) arrays. %% The **first** dimension, 144, is called "**rows**" for whatever reason, %% while the **second** dimension, 240, is called "**columns**". %% Statistics functions assume the "**columns**" are the **structural** dimension while %% "**rows**" are the **statistical** dimension to be averaged over. %% So if we want the TEMPORAL covariance of our TEMPORAL anomalies, %% as a function of longitude, we need to transpose the second matrix rather than the first: %% **//covxx = anomx*transpose(anomx) /240.;//** %% OR we could just use the cov or corrcoef functions, %% but we'd need to transpose: **// varx = cov( transpose(anomx)); //** %% Frustratingly, cov(x,y) and corrcoef(x,y) give stupid-sized outputs (2x2), %% although cov(x) and corrcoef(x) are sensible. who wants that?? Frigging matlab. %% Let's just do it more directly, manually, by matrix multiplication.

%%%% Declare the desired arrays: lagged temporal covariance, as function of space (144). lagcovxy_temporal = zeros(144,144, NLAGS); lagcovxx_temporal = zeros(144,144, NLAGS); lagcovyy_temporal = zeros(144,144, NLAGS);

% explicit loop over lag, to fill the arrays of lagged covariance. % Must trim off data within MAXLAG of the ends of the time series, so it's a bit fussy:

NTSTATS = 240. - NLAGS; % trimmed off ends, so sum is over 191 months, not 240 % for ilag = 1:NLAGS % lag index 1-49 for ilag = 1:NLAGS-1 % lag index 1-48 only!!! (bugs/ matlab sux)

% UMM BETTER JUST USE NLAGS-1 in the above because it chokes on 49
timelag = ilag-MAXLAG; % actual time lag, -24 to 24 months lagco**vxy**_temporal(:,:, ilag) = anom**x**(:, MAXLAG:end-MAXLAG) * transpose( anom**y**(:, MAXLAG+timelag:end-MAXLAG+timelag) )/NTSTATS; lagco**vxx**_temporal(:,:, ilag) = anom**x**(:, MAXLAG:end-MAXLAG) * transpose( anom**x**(:, MAXLAG+timelag:end-MAXLAG+timelag) )/NTSTATS; lagco**vyy**_temporal(:,:, ilag) = anom**y**(:, MAXLAG:end-MAXLAG) * transpose( anom**y**(:, MAXLAG+timelag:end-MAXLAG+timelag) )/NTSTATS; end

% Remember covariance is correlation coefficient times the standard % deviations of the 2 variables. So make stdev 144 element arrays % for converting covariances to correlations

stdx = squeeze(std(transpose(anomx))); % 144 element vector: std(anomx) at each longitude stdy = squeeze(std(transpose(anomy)));

stdxrep = repmat(stdx,144,1); % replicate matrix to 144 x 144 stdyrep = repmat(stdy,144,1);

stdyy = (stdyrep .* transpose(stdyrep)); % elementwise multiplication gives stdev x stdev stdxx = (stdxrep .* transpose(stdxrep)); % elementwise multiplication gives stdev x stdev stdxy = (stdxrep .* transpose(stdyrep)); % elementwise multiplication gives stdev x stdev

% Make a few plots figure(1) contourf(lagcovxx_temporal(:,:,MAXLAG)) % zero-lag covariance title('zero-lag COVARIANCE of anomx = deseasonalized olr anomalies') figure(2) contourf(lagcovxx_temporal(:,:,MAXLAG)./stdxx) % zero-lag covariance title('zero-lag CORRELATION of anomx = deseasonalized olr anomalies')

**NOTE ADDED AT END OF COURSE: Some error here as std is computed from144 values while cov is from trimmed values to leave room for lag. ** see some corrections at http://mpo581-hw3-morestats.wikispaces.com/Results_sst_slp_Chen

figure(4) plot(stdx) title('OLR stdev as a function of lon') % Make a few plots figure(3) contourf( squeeze(lagcovxx_temporal(:,80,:)) ) % covariance as a function of lag for longitude #80 = 197.5E

title('COVARIANCE of OLR (lon=197.5) vs lon and lag in [-24,24] months')

full code [|HW3_olr_precip_mapes.m]

**IDL**: MAXLAG = 24; % out to +/- 24 months NLAGS = 2*MAXLAG + 1; % 49 lags are considere lag = -MAXLAG + indgen(NLAGS)

anomx = reform(anomx,144,240) anomy = reform(anomy,144,240)

lagcovxx_temporal = fltarr(144,144, NLAGS); lagcovxy_temporal = fltarr(144,144, NLAGS); lagcovyy_temporal = fltarr(144,144, NLAGS);


 * covxx = matrix_multiply (anomx, transpose(anomx));THIS ONE IS 144x144
 * %%% explicit loop to fill the array. Have to trim data within MAXLAG
 * of the end of the time series.

NTSTATS = 240-NLAGS .run for ilag = 0, NLAGS-1 do begin timelag = ilag-MAXLAG lagcovxx_temporal(*,*, ilag) = matrix_multiply( anomx(*, MAXLAG :239-MAXLAG), $ transpose( anomy(*, MAXLAG+timelag:239-MAXLAG+timelag) ))/NTSTATS; endfor end

contour, anomx <19.9 >(-19.9), lon, time, /fill, levels = levels, tit='olr anomalies', xtit='lon' contour, REFORM(lagcovxx_temporal(*,70,*)), lon, lag, /fill, nlev=23, xtit='lon', ytit='lag', levels=levels
 * Some plots


 * full code[| HW3_olr_precip_mapes.pro]

6. Lagged correlation, covariance, and cross-covariance: questions

 * 1) Show the zero-lag spatial covariance and correlation structures for your primary field, like [|OLR_anoms_covar_correl.BEM.Matlab.png] this for OLR. (please label the axes better than I did!) Interpret the results.
 * 2) Show longitude-lag sections of the covariance or correlation of this field, for a base point at some longitude of interest. Like this for OLR at a central Pacific longitude: [|OLR.lagregression.BEM.jpg] (Please label the axes better than I did in this example! I hate Matlab). Better in IDL: [|olr_lag_covariances.gif]
 * 3) Intepret the results in terms of the characteristic space and time scales of your anomalies. Can you see these characteristic scales in your original raw data like in olr_lag_covariances.gif?
 * 4) Share a longitude-lag slice of your lagged co-variance matrix for your TWO fields. Label it, interpret it.