
=SST and u100= Greta Leber

Note: x = sst; y = u1000.

3. (the assignment part)
 * 1) Confirm that the time mean of the anomalies as defined above is 0. Yes, this must be true by definition of the anomaly.
 * 2) Is the spatial mean of the anomalies (as defined above) 0? No, which shows that the anomalies have spatial structure. Is it the same as the time series of the spatial mean of the raw data, or is it a new thing?



The spatial mean of the raw data is directly related to the spatial mean of the anomalies through the seasonal cycle: if one removes the seasonal cycle from the spatial mean of the raw data, one recovers the spatial mean of the anomalies.

3. My CLIMATOLOGICAL ANNUAL CYCLE has variance:
 * 2.4852 °C^2 : SST
 * 0.1921 m^2 s^-2 : u1000

4. My INTERANNUAL ANOMALY ARRAYS have variance:
 * 0.4473 °C^2 : SST
 * 6.0257 m^2 s^-2 : u1000
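The climatology/anomaly split described above can be sketched as follows. This is a hedged NumPy translation (the analysis itself was done in MATLAB), with a synthetic field standing in for the real SST array; only the shapes (240 months x 144 longitudes) are taken from the write-up.

```python
import numpy as np

# Synthetic stand-in for the (time, lon) SST field: 20 years of monthly data.
rng = np.random.default_rng(0)
sst = rng.normal(size=(240, 144))

years = sst.reshape(20, 12, 144)            # (year, month, lon)
clim = years.mean(axis=0)                   # climatological annual cycle (12, 144)
anom = (years - clim).reshape(240, 144)     # interannual anomalies

# Question 1: the time mean of the anomalies is 0 by construction...
assert np.allclose(anom.mean(axis=0), 0)
# ...but (question 2) the spatial mean at each time is generally not 0.
```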

5. Fill out a variance decomposition table for field 1 (feel free to add columns if you can define other parts):

||= Variance / (°C)^2 ||= SST ||
|| a) total variance of x || 2.9326 ||
|| b) purely spatial (variance of TIME mean at each lon) || 1.8427 ||
|| c) variance of (x minus its TIME mean at each lon) || 1.0898 ||
|| d) purely temporal (variance of LON mean at each time) || 0.3616 ||
|| e) variance of (x minus its LON mean at each time) || 2.5710 ||
|| f) remove both means (space-time variability) || 0.7282 ||
|| g) mean seasonal cycle || 2.4852 ||
|| h) deseasonalized anomalies || 0.4473 ||
|| i) variance of longitudinal mean of h || 0.1025 ||
|| j) h minus i || 0.3448 ||

6. Discuss your results:


 * The total variance, a, can be decomposed into:
 * (g) + (h), mean seasonal cycle + deseasonalized anomalies
 * (b) + (c), spatial + temporal anomalies
 * (d) + (e), temporal + spatial anomalies
 * (b) + (d) + (f), spatial + temporal + space-time
 * (c) + (e) - (f), temporal anomalies + spatial anomalies - space-time
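The decompositions listed above can be checked numerically. A hedged NumPy sketch (synthetic data in place of the real SST; using population variances, which make the identities exact):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(240, 144))               # (time, lon), synthetic stand-in

a = np.var(x)                                 # (a) total variance
tmean = x.mean(axis=0, keepdims=True)         # TIME mean at each lon
b = np.var(tmean)                             # (b) purely spatial
c = np.var(x - tmean)                         # (c) x minus its time mean
lmean = x.mean(axis=1, keepdims=True)         # LON mean at each time
d = np.var(lmean)                             # (d) purely temporal
e = np.var(x - lmean)                         # (e) x minus its lon mean
f = np.var(x - tmean - lmean + x.mean())      # (f) space-time residual

# The decompositions of (a) discussed in the text:
assert np.isclose(a, b + c)
assert np.isclose(a, d + e)
assert np.isclose(a, b + d + f)
assert np.isclose(a, c + e - f)
```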

4. Decomposition by scale (the assignment)

 * 1) Based on the variance_by_scalefactor diagram you make, what space and time scales (units: degrees and months) have an especially prominent amount of variance in your anomx field? These are the scales at which averaging over them reduces variance the most. Make a contour plot of anomx, or use Milan's total plots in Getting data -- can you see these "characteristic" scales by eye?
 * Annotate the contour plot with some scale indications of about the right size (ovals in powerpoint may be easiest).

First, here is my variance_by_scalefactor diagram for SST:



Looking at the right plot: the largest gradients are clustered in the bottom right corner, where the time scalefactor is large and the spatial scalefactor is small. The smallest gradients are found for small time and small spatial scalefactors, and also for large time and large spatial scalefactors. This suggests that features persisting across large space and time scales, and also across small space and time scales, are not as important as those covering large spatial areas for a short time (since a small scalefactor corresponds to a large scale, and vice versa).

The scales over which the most variance occurs are those with a time scalefactor of 48, which translates to 240/48 = 5 months for a 20-year time series, and a spatial scalefactor of less than 8, i.e. (144/8) × 2.5° = 45°.

Now, a contour plot of anomx (with a rectangle in the lower left corner showing how much space is occupied by 45 ° and 5 months: the scales that hold the most variance as discussed above):



Even without the rectangle revealing the scales at which the most variance occurs, the eye is drawn to the portion of the plot between longitudes 150 and 325, where a shape similar in size to the rectangle repeats itself with strong negative and positive deviations. This confirms the finding above that these spatial and time scales contain the largest amount of variance.
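One way the variance_by_scalefactor idea can be sketched, assuming the diagram shows the variance remaining after block-averaging the anomalies (with scalefactor = series length divided by block size, so a time scalefactor of 48 on a 240-month series means 5-month blocks). The data here are synthetic and the block sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
anomx = rng.normal(size=(240, 144))          # 240 months x 144 longitudes

def coarse_variance(field, tblock, lonblock):
    """Variance of the field after averaging into tblock x lonblock blocks."""
    nt, nlon = field.shape
    core = field[:nt - nt % tblock, :nlon - nlon % lonblock]
    core = core.reshape(nt // tblock, tblock, nlon // lonblock, lonblock)
    return np.var(core.mean(axis=(1, 3)))

tblocks = [1, 2, 5, 10]                      # months per block
lonblocks = [1, 2, 9, 18]                    # grid points per block (2.5 deg each)
table = np.array([[coarse_variance(anomx, tb, lb) for lb in lonblocks]
                  for tb in tblocks])
# Larger blocks average away more variance; the scales where the variance
# drops fastest are the "characteristic" scales discussed above.
```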

5. Scatter plot, correlation and covariance, regression-explained variance
 * 1) Based on your data fields (which you've seen pictures of), **make subsets of your 2 variables** x and y and **make a scatter plot of these showing the strongest (positive or negative) correlation of one field with the other you can find**. The subset might simply be all (x,t) values if your fields are very similar (olr, precip), or maybe the 240 time values at one longitude, or 144 longitudinal values in the time mean, or time series at different longitudes if some variability is offset in your two fields (like pressure and wind).
 * After looping through all subsets of my two variables (holding both time and longitude constant separately), I found that the highest correlation coefficient was found at longitude 12.5 ° with a correlation coefficient between uwnd and SST of -0.77. The scatter plot for these two variables at longitude 12.5 ° is:
 * [[image:Leber_HW3_Q5.jpg width="800" height="444"]]
 * 1) Now consider the covariance and correlation of the two subset arrays entering your scatterplot.
 * What is the correlation coefficient corresponding to this scatter plot? rho = corrcoef(x,y) rho = correlate(x,y)
 * Subset.rho=corrcoef(Subset.x,Subset.y); %[1,-0.770822954073569;-0.770822954073569,1]
 * What are the standard deviations of your two data subsets? std(x) std(y)
 * Subset.std_x=std(Subset.x); %1.397462825115031 °C
 * Subset.std_y=std(Subset.y); %0.559514475156165 m s^-1
 * What fraction of the variance of y can be 'explained' by linear regression on x (y = mx + b)? How does this relate to rho? How much y variance is explained? (variance: with units of y squared) What is m? //Hint: these are simple questions: use the math formula, not a computer code (Hsieh section 1.4.2, Eq. 1.33).//
 * The explained variance is the variance of y multiplied by the squared correlation. We already have the correlation (-0.77) from above, but we need the variance of y: Subset.yVar=var(Subset.y,1); %0.311752046042990 m^2 s^-2. Multiplying by the squared correlation gives the explained variance: Subset.yVarExplained=Subset.yVar*(Subset.rho(1,2)^2); %0.185233097963025 m^2 s^-2. The fraction of y variance explained by the linear regression is therefore: Subset.yVarExplainedFraction=Subset.yVarExplained/Subset.yVar; %0.594168026526704 -- exactly rho^2, so about 60%.
 * The slope, m is equal to -0.30862.
 * What fraction of the variance of x can be 'explained' by linear regression on variable y? (x = nx + a)? How does this relate to rho? What is n? //Hint: these are simple questions, use the math formula not computer code.//
 * As above, we calculate the variance of x on its own: Subset.xVar=var(Subset.x,1); %1.944765254463573 °C^2. The linear regression then explains: Subset.xVarExplained=Subset.xVar*(Subset.rho(1,2)^2); %1.155517333302324 °C^2. The fraction of variance explained by the linear regression is therefore: Subset.xVarExplainedFraction=Subset.xVarExplained/Subset.xVar; %0.594168026526704. So, about 60% -- exactly the same fraction (rho^2) as for y.
 * The slope, n, is equal to -1.9252.
 * 1) Now add uncorrelated (random) noise with variance 1 to one of your variables. This might be like observation error. noisey = y + random('Normal',0,1,size(y))
 * How did the variance of y change when this noise was added? var(y) var(noisey)
 * My original y variance was 0.311752046042990 m^2 s^-2. The variance of my y + random noise is: Subset.noiseyVar=var(Subset.noisey,1); %1.330528012299738 m^2 s^-2. By adding the noise, the variance went up by close to 1, as expected when adding independent noise of unit variance (variances of independent variables add).
 * How did the correlation change? rho = corrcoef(x,noisey) rho = correlate(x, noisey)
 * The new rho is: Subset.rho_x_noisey=corrcoef(Subset.x,Subset.noisey); %[1,-0.415953654523728;-0.415953654523728,1]. By adding random noise, therefore, the correlation between the two variables became much weaker.
 * How do these changes affect the regression of y on x? How much (y+noise) variance is explained by linear regression on x? What is the new value of m in the new (noisey = mx + b) regression?
 * Following the procedure above, the explained variance for noisey is: Subset.noiseyVarExplained=Subset.noiseyVar*(Subset.rho_x_noisey(1,2)^2); %0.230204554144309 m^2 s^-2. The fraction of the total variance this represents is: Subset.noiseyVarExplainedFraction=Subset.noiseyVarExplained/Subset.noiseyVar; %0.173017442711645. Now only about 17% of the variance is explained by the linear regression, an appreciable drop from almost 60%.
 * The m in the noisey regression is now equal to -0.34405.
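The question-5 arithmetic above can be reproduced with synthetic data. This is a NumPy sketch (the write-up's actual numbers come from MATLAB); the relations it checks are the standard ones used above: m = rho·σ_y/σ_x, n = rho·σ_x/σ_y, explained fraction = rho², and attenuation of the correlation by added noise.

```python
import numpy as np

# Synthetic correlated x, y stand in for the SST / u1000 subsets.
rng = np.random.default_rng(2)
x = rng.normal(size=240)
y = -0.6 * x + rng.normal(scale=0.8, size=240)

rho = np.corrcoef(x, y)[0, 1]
m = rho * y.std() / x.std()        # slope of y = m*x + b
n = rho * x.std() / y.std()        # slope of x = n*y + a
frac = rho ** 2                    # fraction of variance explained either way
assert np.isclose(m * n, rho ** 2) # the two slopes multiply to rho^2

# Adding unit-variance noise to y (like observation error) inflates var(y)
# by about 1 and weakens the correlation.
noisey = y + rng.normal(size=240)
rho_noisy = np.corrcoef(x, noisey)[0, 1]
assert abs(rho_noisy) < abs(rho)
```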

6. Lagged correlation, covariance, and cross-covariance: questions
 * 1) Show the zero-lag spatial covariance and correlation structures for your primary field, like this for OLR: OLR_anoms_covar_correl.BEM.Matlab.png (please label the axes better than I did!). Interpret the results.
 * [[image:Leber_HW3_Q6.jpg width="800" height="440"]]
 * The largest covariance and correlation is seen in a block between longitudes 150 and 275. This is the area where the ENSO signal is strongest. Outside of this area, the only signals that appear significant are those relating a longitude to itself (the diagonal running from bottom left to top right), which is not of interest.
 * 1) Show longitude-lag sections of the covariance or correlation of this field, for a base point at some longitude of interest. Like this for OLR at a central Pacific longitude: OLR.lagregression.BEM.jpg (Please label the axes better than I did in this example! I hate Matlab). Better in IDL: olr_lag_covariances.gif
 * As in question #5, I looped through, holding each time and longitude constant, to determine the highest correlation. I found the longitude of highest correlation between the two anomaly fields to be lon = 190° (index 77). Here is the longitude-lag plot for that longitude:
 * [[image:Leber_HW3_Q6b2.jpg width="800" height="435"]]
 * 1) Interpret the results in terms of the characteristic space and time scales of your anomalies. Can you see these characteristic scales in your original raw data, like in olr_lag_covariances.gif?
 * The characteristic space and time scales found in question 4 were 45° and 5 months. The contour associated with covariances above approximately 0.73 corresponds to these space and time scales. This is seen more clearly in the following figure (where the contour in question is labeled):
 * 1) Share a longitude-lag slice of your lagged co-variance matrix for your TWO fields. Label it, interpret it.
 * [[image:Leber_HW3_Q6d.jpg width="800" height="420"]]
 * The only feature that appears significant in this plot is similar to the lag-lon covariance plot of SST alone. The main difference here is that the region of strong positive (red) covariance is larger and of greater magnitude.
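The question-6 structures can be sketched in NumPy under loose assumptions: a synthetic (time, lon) field stands in for the SST anomalies, the zero-lag structure is the 144 x 144 longitude-by-longitude correlation matrix, and the lag section uses base-point index 77 from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
anomx = rng.normal(size=(240, 144))          # synthetic (time, lon) anomalies

# Zero-lag spatial correlation: every longitude against every other,
# with 1.0 on the diagonal (each longitude with itself).
corr0 = np.corrcoef(anomx.T)                 # rows = longitudes -> (144, 144)

def lag_corr(field, base, lag):
    """Correlation of each longitude's series with the base-longitude
    series, with the field shifted by `lag` months relative to the base."""
    b = field[:, base]
    if lag >= 0:
        a, c = b[:len(b) - lag], field[lag:, :]
    else:
        a, c = b[-lag:], field[:lag, :]
    return np.array([np.corrcoef(a, c[:, j])[0, 1]
                     for j in range(field.shape[1])])

# Longitude-lag section for lags of -12..+12 months about base index 77.
section = np.array([lag_corr(anomx, 77, lag) for lag in range(-12, 13)])
# At zero lag the base point correlates perfectly with itself.
assert np.isclose(section[12, 77], 1.0)
```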