Here's the script file that we'll be going over next class. We'll go a bit deeper into how to read and write files to your computer directly from R, and we'll get a teaser about plotting. Please download both the script file and the data files and save them to the same folder on your computer.
Here is the script file and the data file for Carlyn Perovich's lecture on packages and data.table.
Here is the script file that we'll go over on Tuesday. I took what we had left from Lecture 3 and copied it over, so we'll just start from this one. We'll talk about how to deal with NAs in your data, and start looking at some more advanced data summaries.
Remember that Assignment 1 is due this Tuesday (2 days from now). You can just email it to me to turn in (sorry if I made you print it out already!). I should have another assignment with subsetting practice to post before class.

Here is the script file for next Tuesday. We'll be finishing the notes from last class on accessing data and then moving on to basic data summaries and subsetting.
Remember that the due date for the first assignment was pushed back to February 11th.

TERMS

conditional statement - a command containing a conditional operator that evaluates to either TRUE or FALSE
== - conditional operator; "is equal to"
!= - conditional operator; "is not equal to"
> - conditional operator; "is greater than"
>= - conditional operator; "is greater than or equal to"
< - conditional operator; "is less than"
<= - conditional operator; "is less than or equal to"
| - conditional operator; "or"; expects conditional statements on either side of the operator. Both sides of the | must be FALSE for this to evaluate to FALSE
& - conditional operator; "and"; expects conditional statements on either side of the operator. Both sides of the & must be TRUE for this to evaluate to TRUE
mode - the inherent 'type' of a variable; we'll work mostly with numeric (numbers, integer or real), character (text), and logical (TRUE or FALSE)
class - the 'organization' of the data contained in a variable. The default for simple scalar or vector data is for the class to be the same as the mode. Other classes that we will see are: matrix, data.frame, ts (time series).
coerce - forcing a variable to change its mode or class. This really only works in cases where it makes sense for the data to be in a different mode or class (e.g. a character "46" can become a numeric 46, a matrix of numerics can become a data frame)
vector - one unit of data that contains many elements of the same mode. Describe a vector by its length and the mode of data contained (e.g. "a three element numeric vector").
elements - the individual pieces of a vector
index - the location of an element within a vector (i.e. the 5th element has an index of 5)
vectorization - an inherent property of R that allows faster and more flexible operations on data that have dimensionality (e.g. vectors and matrices as opposed to scalars).
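To make these terms concrete, here is a minimal sketch you can run line by line in the console. The variable names and values (x, v, and so on) are made up for illustration and are not taken from the lecture script.

```r
# Conditional statements evaluate to TRUE or FALSE
x <- 5                             # hypothetical example value
is_five     <- x == 5              # TRUE
not_five    <- x != 5              # FALSE
both_true   <- x > 3 & x <= 10     # TRUE: both sides of & are TRUE
either_true <- x < 3 | x >= 10     # FALSE: both sides of | are FALSE

# Mode and class are the same for simple scalar data
x_mode  <- mode(x)                 # "numeric"
x_class <- class(x)                # "numeric"

# Coercion: a character "46" can become a numeric 46
coerced <- as.numeric("46") + 1    # 47

# A three element numeric vector; the 2nd element has an index of 2
v <- c(2, 4, 6)
second <- v[2]                     # 4

# Vectorization: the operation is applied to each element,
# or to two vectors matched element for element
scaled  <- v * 10                  # 20 40 60
summed  <- v + c(1, 1, 1)          # 3 5 7
over_3  <- v > 3                   # FALSE TRUE TRUE
```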
Since R can tell whether data are vectors or scalars, R determines how to efficiently perform operations on whatever data it encounters (for instance, performing the operation on each element of a vector, or performing an operation on two vectors that are matched element for element).

FUNCTIONS

c(...) - "concatenate"
  Arguments: any number of arguments that are the same mode
  What does it do: combines arguments into a vector
  What does it return: a vector

seq(from, to, by) - "sequence"
  Arguments: from (the starting number), to (the finishing number), by (the number to count by)
  What does it do: generates the numbers starting with the 'from' argument and going to the 'to' argument, counting by the 'by' argument
  What does it return: a vector
  Example: seq(from=1, to=9, by=2) returns c(1, 3, 5, 7, 9)

mode(x) and class(x)
  Arguments: any variable
  What does it do: determines the mode or class of that variable
  What does it return: character

as.numeric(x), as.matrix(x), as.data.frame(x)
  Arguments: any variable
  What does it do: tries to coerce the variable to a numeric, matrix, or data frame
  What does it return: a numeric, matrix, or data frame

mean(x, na.rm = FALSE) and sd(x, na.rm = FALSE)
  Arguments: x (any numeric variable), na.rm=FALSE (a logical determining how NAs are handled)
  What does it do: calculates the mean or standard deviation of the vector and ignores NAs if na.rm is set to TRUE
  What does it return: a numeric or NA (if there are any NAs present and na.rm is FALSE, the default)

str(object) - "structure"
  Arguments: any variable
  What does it do: determines the structure of the variable
  What does it return: a short description of the modes and classes contained in the variable, as well as a preview of the first several elements

length(x)
  Arguments: a vector or matrix
  What does it do: determines how many elements are in the variable
  What does it return: a scalar numeric

dim(x)
  Arguments: a data frame or matrix
  What does it do: determines how many rows and columns are in the variable
  What does it return: a two-element numeric vector representing the number of rows in the variable as the first element and the number of columns as the second.

Here is the first assignment of the semester. I consider the first 5 questions to be fundamental skills that you should know very well, as we'll be using them a lot in the class. The challenge questions (6 through 10) are supposed to be outside your comfort zone and are designed to make you think like a programmer. Give them a try!
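As a quick refresher on the functions listed above, here is a short sketch that calls each of them once. The data (scores, df) are invented for illustration and are not part of the assignment or the lecture script.

```r
# c() combines arguments into a vector; seq() counts from 'from' to 'to' by 'by'
evens <- c(2, 4, 6, 8)
odds  <- seq(from = 1, to = 9, by = 2)        # 1 3 5 7 9

# mode() and class() both return a character
odds_mode  <- mode(odds)                      # "numeric"
odds_class <- class(odds)                     # "numeric"

# mean() and sd() return NA when NAs are present unless na.rm = TRUE
scores       <- c(10, 20, NA, 30)             # hypothetical data with one NA
mean_default <- mean(scores)                  # NA
mean_no_na   <- mean(scores, na.rm = TRUE)    # 20
sd_no_na     <- sd(scores, na.rm = TRUE)      # 10

# length() counts elements (NAs included)
n_scores <- length(scores)                    # 4

# as.data.frame() coerces a matrix of numerics to a data frame
m  <- matrix(1:6, nrow = 3)
df <- as.data.frame(m)
df_dims <- dim(df)                            # 3 rows, 2 columns

# str() prints a short description of the structure
str(df)
```

Running str(df) prints one line per column showing its type and the first few values, which is a quick way to check that a coercion did what you expected.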
Start from a blank script file and don't copy/paste from other sources. I encourage you to work with other people, but everyone should turn in their own answers and code.

We'll be finishing the script file from Lecture 1 first (starting at Part 4: Variables continued) and then going over this script about data structures and how to access them.
Here is the syllabus. Remember that the schedule is subject to change. I'll be sure to keep you all in the loop.
Here are some of the terms and functions we covered today.
TERMS

script - the file that the code is stored in. This is a permanent record of the commands that you want R to execute.
console - the screen that shows what commands have just been executed. The calculations and values that result from running the code end up here.
syntax - the “grammar” of the programming language. The specifics of how you write a command for it to be understood as something for R to execute.
comments - lines of code in the script file that are not treated as commands by R. Anything after a ‘#’ will be ignored by R, so you can write notes in plain English to remind yourself and collaborators what the commands are doing. These are absolutely CRITICAL for writing legible code.
variables - named objects in R. We’ll see a lot of different kinds of variables (e.g. integers, real numbers, vectors, matrices, character strings). They do not alter the original data or variables that are used to derive them, and they are named by the user.
functions - the workhorse of R. A function does 3 things: takes in arguments, does something to those arguments, returns ONE thing.
arguments - variables that are passed to a function so the function can perform some operation on them.

FUNCTIONS

exp(x)
  Arguments: x
  What does it do: exponentiates x (calculates e raised to the power x)
  What does it return: a single number representing the exponentiation of x

log(x, base = exp(1))
  Arguments: x, base (default is exp(1))
  What does it do: calculates the log of x given the base
  What does it return: a single number representing the log of x given the base

Here are the class notes that we will be going over tomorrow morning. For future weeks, I will have these posted by Sunday night.
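If you want to try the terms and functions listed above before class, here is a tiny sketch. The variable names and values are made up for illustration and do not come from the class notes.

```r
# Anything after a '#' is a comment and is ignored by R
x <- 2                       # a variable, named by the user
y <- x * 3                   # deriving y does not alter x; x is still 2

e_squared <- exp(x)          # e raised to the power 2, roughly 7.389

# log() uses base exp(1) by default, so it undoes exp()
back_to_x <- log(e_squared)          # 2
log_base_10 <- log(100, base = 10)   # 2
log_base_2  <- log(8, base = 2)      # 3
```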
Please click the button below to download the script file and save it to a new folder specifically for this course. It'll become more apparent later why it makes sense to keep all files relating to a single project in the same folder. If you haven't done so yet, please download and install R. You can find the links to do so in the post below. Make sure that you can get it to open on your computer.

For tomorrow, please bring your personal computer with R installed and with this script file downloaded. We'll go over the syllabus, go around and introduce ourselves (including one specific goal that you hope to achieve through this class), and then jump right into this material. Let me know if you have any trouble with this file or with installing R. As always, feel free to email me if you have any general questions.
About the Course
The course aims to introduce students to the R language in an interactive way. In-class participatory lectures and assignments will be posted here.