Assignment

===Assignment for Homework 3: Parts 1-4: characterizing variability in 1 field=== //__DUE DATE: MON APRIL 4__//

Example data grab, **Matlab** first, then **IDL**:

**Matlab**:
x = double(ncload('/Users/bem/Downloads/hw3_data.nc','olr'));
y = double(ncload('/Users/bem/Downloads/hw3_data.nc','precip'));
lon = double(ncload('/Users/bem/Downloads/hw3_data.nc','lon'));
time = linspace(1982., 2001. + 11./12, 240);

**IDL**:
cdf2idl, '/Users/bem/Downloads/hw3_data.nc'
@hw3_data.idl
x = olr
y = precip
time = 1982. + findgen(240)/12.

NLONS = 144; NYEARS = 20; NMONS = 12 * NYEARS;

With data named x and y, the rest of the code can be pretty standard, but on your results presentation page please make the plot titles show the correct field names!

Get used to

 * ===which dimension (first or second in the (144x240) shape reported by whos or help) is space (144) and which is time (240)===
 * ===which is called a "row" or "column" in the matrix manipulations===
 * ===which array indices vary most rapidly as you step through the data in memory order===
 * ===so that months 1,2,3,....13,14,... can be reformed or reshaped from 240 to 12x20===

===These are just language conventions, with no fundamental right or wrong way, so it is irritating to have to learn. Happily, both IDL and Matlab show (144,240) for these arrays, with longitude as dimension "1" and time as dimension "2" in functions like mean(x,1) or total(x,1). In both languages, the arrays need to be transposed or permuted to put time series (240 values) in continuous memory order, so that it can be re-interpreted as 12x20, in order to average over the 20 and make a 12-month climatology.===
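The same bookkeeping can be checked for free in NumPy (a sketch with a synthetic stand-in field, not the homework data; NumPy is row-major, so time as the //last// axis is already contiguous and no transpose is needed, but the reshape logic is identical):

```python
import numpy as np

# Hypothetical stand-in for the (144 lon x 240 month) field in hw3_data.nc
NLONS, NYEARS = 144, 20
NMONS = 12 * NYEARS
x = np.arange(NLONS * NMONS, dtype=float).reshape(NLONS, NMONS)

# Time is the last (fastest-varying) axis in NumPy's C order, so each
# 240-month series can be reinterpreted directly as (20 years x 12 months):
xx = x.reshape(NLONS, NYEARS, 12)   # (lon, year, month)
climx12 = xx.mean(axis=1)           # average over the 20 years -> (144, 12)
print(climx12.shape)                # (144, 12)
```

In Matlab/IDL the permute/transpose step does exactly what NumPy's default memory order gives you for free here.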

//You don't need to turn in anything here, it is just step 1.//

2. Your field 1 means and variances: space, time, spacetime

 * 1) The GRAND (spacetime) **mean** of my x and y (including units): **Matlab**: mean( x(:) ) **IDL**: print, mean(x)
 * 2) The GRAND (spacetime) **variance** of my x and y (including units): **Matlab**: var( x(:), 1) **IDL**: print, mean(x^2) - mean(x)^2
 * NOTE: If your variance budgets are to add up exactly, you need the /N variance, **variance = mean(x^2) - mean(x)^2 = ( sum(x^2) - sum(x)^2/N ) / N**,
 * NOT the /(N-1) variance "estimator" that is the default in both the IDL and Matlab var functions.
 * This is var(x, 1) in Matlab.
 * **return, mean(x^2) - mean(x)^2** can easily be packaged into a 1-line function in IDL: [|var_n.pro]
 * 3) The GRAND standard deviations are: **Matlab**: std( x(:), 1) **IDL**: print, sqrt(var_n(x))
 * 4) The SPATIAL variance of my TIME MEAN longitude section is: _(units). **Matlab**: var( mean(x,2), 1) **IDL**: print, var_n( total(x,2)/NMONS )
 * 5) The TEMPORAL variance of my LONGITUDE MEAN time series is: _(units). **Matlab**: var( mean(x,1), 1) **IDL**: print, var_n( total(x,1)/NLONS )
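The /N-versus-/(N-1) point is quick to verify numerically; a NumPy sketch (var_n here mirrors the 1-line IDL function above; np.var's default ddof=0 is already the /N variance):

```python
import numpy as np

def var_n(x):
    """Population (/N) variance, mean(x^2) - mean(x)^2."""
    x = np.asarray(x, dtype=float)
    return (x**2).mean() - x.mean()**2

x = np.array([1.0, 2.0, 3.0, 4.0])
print(var_n(x))           # 1.25  -- the /N variance
print(np.var(x))          # 1.25  -- NumPy's default ddof=0 agrees
print(np.var(x, ddof=1))  # 1.666... -- the /(N-1) estimator (Matlab/IDL default)
```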

We have to rotate the time dimension to be contiguous in memory (the first array dimension), then reinterpret (reform or reshape) it as months x years, then average over the years.

**Matlab**:
xx = reshape( permute(x, [2 1]), 12, NYEARS, NLONS);
climxx12 = squeeze( mean(xx, 2) );
climx12 = permute( climxx12, [2 1] );

**IDL**:
xx = reform( transpose(x), 12, NYEARS, NLONS)
climxx12 = total(xx, 2) / NYEARS
climx12 = transpose(climxx12)

Subtract that mean annual cycle, to get the traditional climate definition of "anomalies" (temporal anomalies):

**Matlab**:
anomx = reshape(x, NLONS, 12, NYEARS); % declare a right-sized array, initially equal to x
for iy = 1:NYEARS
  anomx(:,:,iy) = anomx(:,:,iy) - climx12; % subtract climo
end
anomx = reshape(anomx, 144, 240);

**IDL**:
anomx = reform(x, NLONS, 12, NYEARS) ; declare a right-sized array, initially equal to x
for iy = 0, NYEARS-1 do anomx[*,*,iy] = anomx[*,*,iy] - climx12
anomx = reform(anomx, 144, 240)
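The same construction in NumPy, with a check that the year-mean of the anomalies really vanishes (synthetic random stand-in data; broadcasting replaces the explicit year loop):

```python
import numpy as np

NLONS, NYEARS = 144, 20
NMONS = 12 * NYEARS
rng = np.random.default_rng(0)
x = rng.standard_normal((NLONS, NMONS))   # hypothetical stand-in field

xx = x.reshape(NLONS, NYEARS, 12)         # (lon, year, month)
climx12 = xx.mean(axis=1)                 # (lon, month) climatology
anomx = (xx - climx12[:, None, :]).reshape(NLONS, NMONS)

# The 20-year mean of the anomalies is 0 at every lon and calendar month:
print(np.abs(anomx.reshape(NLONS, NYEARS, 12).mean(axis=1)).max())
```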

3. (the assignment part)
Write down all the decompositions of total variance that add up sensibly, and confirm them by checking that the numbers work out. Example: (a) = (g) + (h). Yes, it checks out: 100 = 23 + 77, or whatever the numbers are for your field x.
 * 1) Confirm that the 20-year mean of the anomalies as defined above is 0. Write math (on paper, for yourself) that proves it/ shows why.
 * 2) Is the spatial mean of the climate anomalies (as defined above) 0? Is it the same as the time series of the spatial mean of the raw data? Or is it a new object?
 * 3) My CLIMATOLOGICAL ANNUAL CYCLE has variance: _(units). **Matlab**: var( climx12(:), 1) **IDL**: print, var_n(climx12)
 * 4) My INTERANNUAL ANOMALY ARRAY has variance: _(units). **Matlab**: var( anomx(:), 1) **IDL**: print, var_n(anomx)
 * 5) Fill out a variance decomposition table for field 1: feel free to add columns if you can define other parts. It may help you to look at an example: []
|| || field1 name ||
|| a) total variance of x ||  ||
|| b) purely spatial (variance of TIME mean at each lon) ||  ||
|| c) temporal anomalies (x minus its TIME mean at each lon) ||  ||
|| d) purely temporal (variance of LON mean at each time) ||  ||
|| e) spatial anomalies (x minus its LON mean at each time) ||  ||
|| f) remove both means (space-time variability) ||  ||
|| g) mean seasonal cycle ||  ||
|| h) deseasonalized anomalies ||  ||
|| i) variance of longitudinal mean of part h) ||  ||
|| j) h minus i (anomalies from both space and monthly-climatological means) ||  ||
 * __6) Discuss your results__:
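Before filling in the table, it can build confidence to verify one budget line on synthetic data. A NumPy sketch of the example decomposition a) = g) + h), using the var_n (/N) definition from part 2 (it is exact only with /N, which is the point; the field here is a hypothetical random stand-in):

```python
import numpy as np

def var_n(x):
    """Population (/N) variance, mean(x^2) - mean(x)^2."""
    x = np.asarray(x, dtype=float)
    return (x**2).mean() - x.mean()**2

NLONS, NYEARS = 144, 20
rng = np.random.default_rng(1)
x = rng.standard_normal((NLONS, 12 * NYEARS))   # hypothetical field

xx = x.reshape(NLONS, NYEARS, 12)
climx12 = xx.mean(axis=1)                               # g) climatological cycle
anomx = (xx - climx12[:, None, :]).reshape(NLONS, -1)   # h) deseasonalized anomalies

total = var_n(x)
parts = var_n(climx12) + var_n(anomx)
print(total, parts)   # equal to rounding error: a) = g) + h)
```

The cross term vanishes because, at each (lon, month), the anomalies average to zero over the years while the climatology is constant over them.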

// Extra credit for inventing a clear, consistent notation (extra extra credit if it can be typed in clunky Wiki editors)! // // The usual simple prime and overbar notation is not enough: there are multiple kinds of averages, and deviations from those averages. //


 * // a too-codey, not-mathy-enough example: take it or make up your own... //
 * // _bart is the time mean, _pert is the perturbation in time, and the same for space x. //
 * // x = x_bart + x_pert //
 * // x = x_barx + x_perx //
 * // x = x_barx + (x_perx_bart + x_perx_pert) //
 * // etc. //
 * // Then we'd just need to work out the rules of it: //
 * // For example, bar and per commute in x and t: //
 * // x_perx_pert = x_pert_perx (graphs make it clearer, as in //[|Varaince_in_tp_space.pptx]// ) //
 * // x_barx_bart = x_bart_barx //


 * // Some sort of labeled squiggle for "per," just like the labeled overbar we use for the average, might make a workable math notation? //

Rebin the data to coarser meshes, by averaging the values that fall in each coarse grid cell.
Your anomx array is 144 longitudes x 240 months. The 144 space bins can be regridded using factors of 2 to 72, 36, 18, 9, and then (factor of 3) to 3 bins. The 240 time bins can be regridded by factors of 2 to 120, 60, 30, 15, and then (by a factor of 3) to 5 bins.

Scale is the //logarithm// of size. It can be measured in //decades// (factors of 10) or //octaves// (factors of 2). For example, 1km, 10km, 100km, 1000km are equally spaced //scales// (but exponentially growing //sizes//).

Rebinning the data averages out the smaller scale structure, and preserves the larger scale structure. Variance as a function of bin scale measures the relative amount of structure at small vs. large scales.
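Both IDL's rebin and the Matlab blkproc call below are just block averages; the operation is easy to mimic in NumPy (a sketch with a hypothetical rebin helper, assuming the factors divide the dimensions evenly, as the scale factors below do):

```python
import numpy as np

def rebin(a, nlon_out, nt_out):
    """Block-average a 2-D (lon, time) array down to (nlon_out, nt_out),
    in the spirit of IDL's REBIN. Output sizes must divide the input sizes."""
    nlon, nt = a.shape
    return a.reshape(nlon_out, nlon // nlon_out,
                     nt_out, nt // nt_out).mean(axis=(1, 3))

a = np.arange(8.0).reshape(2, 4)   # toy 2x4 field
print(rebin(a, 1, 2))              # [[2.5 4.5]] -- each cell is a 2x2 block mean
```

Block averaging preserves the overall mean, so the variance lost at each rebinning step is exactly the variance of the scales averaged away.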

Necessary code:
scalefactors = [1, 2, 4, 8, 16, 48]; % Matlab and IDL are the same for this line

**IDL**:
scales = alog(scalefactors) ;;; measured in factors of e, not decades or octaves
variance_by_scalefactor = fltarr(6,6)

.run
for it = 0, 5 do begin
  for ix = 0, 5 do begin
    print, 'rebinning space to ' + string(NLONS/scalefactors[ix]) + ' bins, and time to ' + string(NMONS/scalefactors[it]) + ' bins'
    a = rebin(anomx, NLONS/scalefactors[ix], NMONS/scalefactors[it])
    variance_by_scalefactor[ix,it] = var_n(a)
  endfor
endfor
end

 * The loop bounds run 0 to 5 to cover all 6 scale factors, and a loop with multiple commands needs the **.run ... end** block if you paste it into the command line.

**Matlab**:
scales = log(scalefactors); % measured in factors of e, not decades or octaves
variance_by_scalefactor = zeros(6,6); % Step 1: DECLARE the thing you want to build!

for it = 1:6
  for ix = 1:6
    ['rebinning space to ' num2str(NLONS/scalefactors(ix)) ' bins, and time to ' num2str(NMONS/scalefactors(it)) ' bins'] % say what's happening while it runs
    a = blkproc(anomx, [scalefactors(ix) scalefactors(it)], 'mean2');
    variance_by_scalefactor(it,ix) = var(a(:), 1); % /N variance of ALL the block means
  end
end

calculate the gradient of var with scale and then its magnitude - this part is not well tested!
;;; The derivative of variance with scale shows where power (variance) is concentrated.
 * The magnitude of the gradient is the "__distribution__" of variance with scale:
 * it tells you at what scale the rebinning process reduced variance the most,
 * which is the same as saying which scales in the data had the most variance

**IDL**:
NS = 6
dvar1 = variance_by_scalefactor*0 & dvar2 = variance_by_scalefactor*0 ;;; right-sized arrays
for ix = 0, NS-1 do dvar1[ix,*] = deriv(scales, variance_by_scalefactor[ix,*])
for it = 0, NS-1 do dvar2[*,it] = deriv(scales, variance_by_scalefactor[*,it])

variance_distrib = sqrt(dvar1^2 + dvar2^2)
contour, variance_distrib, /xlog, /ylog, title='d(var)/d(scale) (var. dist.): MYFIELDNAME', $
  xtit='rebin factor in lon', ytit='rebin factor in t'

**Matlab**:
[dvar1, dvar2] = gradient(variance_by_scalefactor, scales, scales);
variance_distrib = sqrt(dvar1.^2 + dvar2.^2)
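np.gradient does the same job as the IDL deriv loop or the Matlab gradient call, including the non-uniform spacing in scales (a sketch with a toy linear surface standing in for the real 6x6 variance table):

```python
import numpy as np

scalefactors = np.array([1, 2, 4, 8, 16, 48])
scales = np.log(scalefactors)          # factors of e; note scales[0] = 0

# Toy variance_by_scalefactor surface, linear in each scale coordinate:
v = np.add.outer(scales, scales)

# Centered differences with the non-uniform scale coordinates on both axes:
dvar1, dvar2 = np.gradient(v, scales, scales)
variance_distrib = np.sqrt(dvar1**2 + dvar2**2)
print(variance_distrib.shape)   # (6, 6); constant sqrt(2) here, since dv/ds = 1 on each axis
```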

4. decomposition by scale (the assignment)

 * 1) Based on the variance_by_scalefactor diagram you make, what space and time scales (units: degrees and months) hold an especially large share of the variance in your anomx field? These are the scales at which averaging over them reduces the variance the most. Make a contour plot of anomx, or use Milan's total plots in Getting data -- can you see these "characteristic" scales by eye?
 * Annotate the contour plot with some scale indications of about the right size (ovals in powerpoint may be easiest).