Today I'm starting to use SAS to analyse the data I extracted from the simulation yesterday. Today is probably going to be a case of getting the data into SAS, and then starting to work out how to use the program.
NB: would measuring the polarisation of the group be a useful measure?
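Suppose "polarity" here means what the collective-motion literature usually calls group polarisation. One common definition is the length of the average unit heading vector. In the formula, v_i(t) is boid i's velocity at timestep t, which could be estimated from the dx, dy, dz between consecutive timesteps (an assumption about how it would be computed from this data), and N is the number of boids. P is close to 1 when all the boids head the same way and close to 0 when their headings are disordered.

```latex
P(t) = \frac{1}{N} \left\lVert \sum_{i=1}^{N} \frac{\mathbf{v}_i(t)}{\lVert \mathbf{v}_i(t) \rVert} \right\rVert
```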
Okay, first issue: SAS doesn't like Unix-style data files, whose lines end in a bare \n (line feed) rather than the Windows \r\n. This can be got round by converting the file with the Cygwin program unix2dos. SAS also seems to be quite particular about delimiters, so it may be an idea to use spaces (" ") as the delimiter. (Note to self: the find-and-replace in Notepad is horrifically slow; use Vim's instead.)
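As an alternative to converting the file, the INFILE statement's TERMSTR= option tells SAS what character sequence ends each record. This is a minimal sketch that has not been tested against this data. It assumes the file really does use bare \n line endings and single-space delimiters. 'path_to_data_file' is the same placeholder used in the import step.

```sas
/* Read a Unix-style (LF-terminated), space-delimited file directly,
   without running unix2dos first. */
data boids;
  /* TERMSTR=LF: each record ends with a bare line feed (\n) */
  /* DLM=' ': fields are separated by spaces */
  infile 'path_to_data_file' termstr=lf dlm=' ';
  input BOID RADIUS TIMESTEP X_POS Y_POS Z_POS;
run;
```

If this works, the unix2dos step may be unnecessary. Blank is already the default delimiter for this style of input, so DLM=' ' only makes the intent explicit.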
Now managed to get the data into SAS. The following was used to import it:
data boids; /* this tells SAS a convenient name to use to refer to the data */
infile 'path_to_data_file'; /* use straight quotes; curly quotes will not work */
input BOID RADIUS TIMESTEP X_POS Y_POS Z_POS; /* column headings */
proc print data=boids; /* print out the data */
run;
SQL in SAS seems to work pretty well too: replacing the proc print and run lines above with
proc sql;
select * from boids;
quit;
… selects all of the data. Adding a where clause (select * from boids where BOID = 1;) selects just the results for boid #1. This is pretty awesome, I must say. So now to start working out the distances travelled by each boid. I have 3 coordinates for each boid, x, y and z, so I'm going to work out the difference between two sets of coordinates, i.e. dx = x2 - x1, etc., then use Pythagoras to find h^2 = dx^2 + dy^2 + dz^2.
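Written out, the distance moved between two consecutive positions (x_1, y_1, z_1) and (x_2, y_2, z_2) is given below. As a quick arithmetic check, a move from (0, 0, 0) to (1, 2, 2) gives h = sqrt(1 + 4 + 4) = 3.

```latex
h = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}
```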
After a number of hours searching, I managed to work out how to calculate differences between observations (the SAS term for rows) in a column:
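data sample;
input x; /* one numeric column, x, read from the lines below */
cards;
12
15
17
18
13
;
run;
data two;
set sample;
diff = x - lag(x); /* LAG(x) is x from the previous observation, missing for the first */
run;
proc print data=two;
run;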
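SAS also has a DIF function, where DIF(x) is equivalent to x - LAG(x). The minimal variant below puts both functions side by side on the same five values. For this sample data the change column should read ., 3, 2, 1, -5. The first observation has no predecessor, so its value is missing. The data set name diffs is just an illustrative choice.

```sas
/* Same five values as the sample data above */
data sample;
  input x;
  datalines;
12
15
17
18
13
;
run;

data diffs;
  set sample;
  previous = lag(x);  /* x from the previous observation (missing on row 1) */
  change = dif(x);    /* same as x - lag(x) */
run;

proc print data=diffs;
run;
```

Like LAG, DIF keeps its own queue of earlier values. Each call should therefore run on every observation rather than inside an IF. Otherwise "previous" means the last time that call was executed, not the previous row.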
The middle step (data two, which does the subtraction) is the important bit for the time being. The first step just builds some test data, and the last one prints it. LAG appears to be a function of some importance. After a bit of playing, here is how to calculate the distance travelled by a bird at each timestep, using Pythagoras, in SAS:
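proc sql;
create table boid1 as
select *
from boids
where BOID = 1
order by TIMESTEP; /* LAG relies on the rows being in time order */
quit;
data four;
set boid1;
/* change in each coordinate since the previous timestep (missing on the first row) */
dx = X_POS - lag(X_POS);
dy = Y_POS - lag(Y_POS);
dz = Z_POS - lag(Z_POS);
/* Pythagoras in 3D: distance moved during this timestep */
h = sqrt(dx**2 + dy**2 + dz**2);
run;
proc print data=four;
run;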
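And this is how to calculate the mean distance travelled per timestep by the boid in question (PROC MEANS ignores the missing h on the first row):
proc means data=four;
class BOID; /* one set of statistics per boid; only boid 1 is in this data set */
var h; /* the per-timestep distances calculated above */
run;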
Just gone through the data in Excel, using functions I know (!), and the results come out the same, which is good. The next piece to do is to work out the mean distance travelled by all boids at once, not just one bird. How to do this, then? Hmm, I'm thinking of sorting the results by bird … or maybe grouping them through SQL? Anyway, this is the first post of today; expect another one later, as the computer is complaining about needing a reboot.
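Here is one possible way to handle all the boids at once, along the lines of the sorting idea. First sort by boid and timestep. Then compute the step lengths in a single data step using BY-group processing. Finally, blank out the first row of each boid, so that no distance is computed between the last position of one boid and the first position of the next.

This is a minimal sketch. The toy_boids data set is made-up test data (two boids, three timesteps each, RADIUS set to an arbitrary 1) standing in for boids. The names sorted, steps and boid_means are hypothetical.

```sas
/* Made-up toy data in the same layout as boids, interleaved by timestep */
data toy_boids;
  input BOID RADIUS TIMESTEP X_POS Y_POS Z_POS;
  datalines;
1 1 1 0 0 0
2 1 1 10 0 0
1 1 2 1 2 2
2 1 2 10 3 4
1 1 3 3 4 3
2 1 3 10 3 4
;
run;

/* BY-group processing needs the rows grouped by boid, in time order */
proc sort data=toy_boids out=sorted;
  by BOID TIMESTEP;
run;

data steps;
  set sorted;
  by BOID;  /* creates the FIRST.BOID and LAST.BOID flags */
  /* call LAG on every row so each queue stays in step with the data */
  dx = X_POS - lag(X_POS);
  dy = Y_POS - lag(Y_POS);
  dz = Z_POS - lag(Z_POS);
  h = sqrt(dx**2 + dy**2 + dz**2);
  /* on a boid's first row the previous row belongs to another boid */
  if first.BOID then h = .;
run;

/* mean step length for each boid (missing values are ignored) */
proc means data=steps mean;
  class BOID;
  var h;
run;

/* the same per-boid means via SQL grouping */
proc sql;
  create table boid_means as
  select BOID, mean(h) as mean_h
  from steps
  group by BOID;
quit;
```

For the toy data, the per-boid means work out to 3 for boid 1 and 2.5 for boid 2. For the real data, swap toy_boids for boids in the PROC SORT step. Dropping the CLASS statement from PROC MEANS would instead give one mean over every step of every boid.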