class: center, middle, inverse, title-slide .title[ # Research Issues and Practices in Psychology ] .subtitle[ ## Week 1: Gentle Introduction to R ] .author[ ### Jason Geller, Ph.D. (he/him/his) ] .institute[ ### Princeton University ] .date[ ### 2022-09-06 ] --- # Objective .left-column[ - Get you started with R - Load your first dataset in R - Explain some basic terminology and concepts ] .right-column[ - Explain how to structure any data analysis project - Learn how to run commands and save scripts ] <img src="lover.png" width="30%" style="display: block; margin: auto;" /> --- # It's Scary <img src="nocluedog.png" width="65%" style="display: block; margin: auto;" /> --- # Note - You can learn R! -- - You will get frustrated. -- - You will get errors that don't help or make sense. -- - Google is your friend. - Try Googling the specific error message first. - Then try Googling your specific function and the error. - Try a bunch of different search terms. --- # Outline + `Why R?` + IDE + R commands, data structures, and functions - Tidyverse & the Pipe Operator + Multiple Functions + Reading in data + Saving R scripts --- # Why R? + Free and open-source -- + Flexibility -- + Programming language (not point-and-click) -- + Excellent graphics (via `ggplot2`) -- + Easy to generate reproducible reports (R Markdown and Quarto) -- + Easy to integrate with other tools and programs -- + Inclusive Community -- + Marketability ??? R was created in the early 1990s. It is free and open source. What does that mean? It is open to everyone, and anyone can contribute. I have a package called gazeR. There is a package for most things you may want to do. --- <center> <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Yes, I made the corn song about R. Let me know how the students react tomorrow <a href="https://twitter.com/minebocek?ref_src=twsrc%5Etfw">@minebocek</a>! 
And thanks <a href="https://twitter.com/LisaDeBruine?ref_src=twsrc%5Etfw">@LisaDeBruine</a> for the lyrics :) <a href="https://t.co/8HKHpPoiQ2">pic.twitter.com/8HKHpPoiQ2</a></p>&mdash; Rafael Moral (@rafamoral) <a href="https://twitter.com/rafamoral/status/1564376662971760642?ref_src=twsrc%5Etfw">August 29, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> </center> --- # Outline + Why R? + *IDE* + R commands, data structures, and functions + Tidyverse & the Pipe Operator + Multiple Functions + Reading in data + Saving R scripts --- background-image: url(console_layout.png) background-position: center background-size: cover --- # Outline + Why R? + IDE + *R commands, data structures, and functions* + Tidyverse & the Pipe Operator + Multiple Functions + Reading in data + Saving R scripts --- # Commands - Commands are the instructions you give R to carry out. - They can be very simple or complex. - Computers do what *you* tell them to do. Mistakes happen! 
- Maybe it's a typo, maybe it's a misunderstanding of what the code does --- # Commands - You can type a command directly into the *console* - You can type in a document (*Script* or *Markdown*) and then tell it to run in the *console* ```r X <- 4 ``` --- # Commands - `>` indicates the console is ready for more code - `+` indicates that you haven't finished a code block - Capitalization and symbols matter - Hit the up arrow – you can scroll through the last commands that were run - Hit the tab key – you'll get a list of variable names and options to select from - Use `?` followed by a function name (e.g., `?mean`) to learn more about it --- # Comments <img src="future_self.png" width="40%" height="20%" style="display: block; margin: auto;" /> --- # Comments - You can make comments on your code using the `#` symbol - Comments are not processed by R; they document your code for humans - Feel free to comment your personal code as much as you need to in order to understand it - Try to make your code clear enough that it can be understood even without comments ```r #this does something #this does not work because I am stupid x=4 #assign ``` --- # Assignment - A variable is a symbol or object that stands for another value (just like “X” in algebra) ```r x <- 4 x ``` ``` ## [1] 4 ``` -- - The arrow `<-` is called an ASSIGNMENT OPERATOR, and tells R to save an object called x that has the value of 4. This is similar to saving a value in a graphing calculator. 
- You can also use `=` for assignment if you prefer --- # Objects and Values - We will use the terms __object__ and __variable__ a lot when talking about code in this class - `Objects` are things you save in your environment (like a set of numbers, a dataset, or a regression model) - We will use the word `Variable` to refer to columns of a data frame and to data variables that we use in models --- # Vectors - Think of it as a row or column in a spreadsheet - `c()` combines values of the same class into a vector - Numeric ```r x <- c(2,6,16) x ``` ``` ## [1] 2 6 16 ``` --- # Vectors - Character ```r x <- c("cat", "bat") gender <- c("male", "female") ``` ```r gender <- as.factor(gender) gender ``` ``` ## [1] male female ## Levels: female male ``` - Logical ```r x==7 ``` ``` ## [1] FALSE FALSE ``` --- # Indexing - Vectors can be indexed - Indexing starts at 1, not 0 ```r x[1] # retrieve first ``` ``` ## [1] "cat" ``` ```r x[-2] # everything except the second element ``` ``` ## [1] "cat" ``` ```r x[1] <- 7 # Change values in vector ``` --- # Lists - While vectors are one row of data, we might want to have multiple rows or types - With a vector, it is *key* to understand that all elements must be the same type - Lists are a grouping of variables that can be multiple types (between list items) and *can be different lengths* - Often function output is saved as a list for this reason - They usually have names to help you print out just a small part of the list ```r library(palmerpenguins) output <- lm(flipper_length_mm ~ bill_length_mm, data = penguins) str(output) ``` ``` ## List of 13 ## $ coefficients : Named num [1:2] 126.68 1.69 ## ..- attr(*, "names")= chr [1:2] "(Intercept)" "bill_length_mm" ## $ residuals : Named num [1:342] -11.766 -7.442 0.206 4.29 -3.104 ... ## ..- attr(*, "names")= chr [1:342] "1" "2" "3" "5" ... ## $ effects : Named num [1:342] -3715.57 170.39 1.03 5.35 -2.22 ... ## ..- attr(*, "names")= chr [1:342] "(Intercept)" "bill_length_mm" "" "" ... 
## $ rank : int 2 ## $ fitted.values: Named num [1:342] 193 193 195 189 193 ... ## ..- attr(*, "names")= chr [1:342] "1" "2" "3" "5" ... ## $ assign : int [1:2] 0 1 ## $ qr :List of 5 ## ..$ qr : num [1:342, 1:2] -18.4932 0.0541 0.0541 0.0541 0.0541 ... ## .. ..- attr(*, "dimnames")=List of 2 ## .. .. ..$ : chr [1:342] "1" "2" "3" "5" ... ## .. .. ..$ : chr [1:2] "(Intercept)" "bill_length_mm" ## .. ..- attr(*, "assign")= int [1:2] 0 1 ## ..$ qraux: num [1:2] 1.05 1.04 ## ..$ pivot: int [1:2] 1 2 ## ..$ tol : num 1e-07 ## ..$ rank : int 2 ## ..- attr(*, "class")= chr "qr" ## $ df.residual : int 340 ## $ na.action : 'omit' Named int [1:2] 4 272 ## ..- attr(*, "names")= chr [1:2] "4" "272" ## $ xlevels : Named list() ## $ call : language lm(formula = flipper_length_mm ~ bill_length_mm, data = penguins) ## $ terms :Classes 'terms', 'formula' language flipper_length_mm ~ bill_length_mm ## .. ..- attr(*, "variables")= language list(flipper_length_mm, bill_length_mm) ## .. ..- attr(*, "factors")= int [1:2, 1] 0 1 ## .. .. ..- attr(*, "dimnames")=List of 2 ## .. .. .. ..$ : chr [1:2] "flipper_length_mm" "bill_length_mm" ## .. .. .. ..$ : chr "bill_length_mm" ## .. ..- attr(*, "term.labels")= chr "bill_length_mm" ## .. ..- attr(*, "order")= int 1 ## .. ..- attr(*, "intercept")= int 1 ## .. ..- attr(*, "response")= int 1 ## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> ## .. ..- attr(*, "predvars")= language list(flipper_length_mm, bill_length_mm) ## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric" ## .. .. ..- attr(*, "names")= chr [1:2] "flipper_length_mm" "bill_length_mm" ## $ model :'data.frame': 342 obs. of 2 variables: ## ..$ flipper_length_mm: int [1:342] 181 186 195 193 190 181 195 193 190 186 ... ## ..$ bill_length_mm : num [1:342] 39.1 39.5 40.3 36.7 39.3 38.9 39.2 34.1 42 37.8 ... ## ..- attr(*, "terms")=Classes 'terms', 'formula' language flipper_length_mm ~ bill_length_mm ## .. .. 
..- attr(*, "variables")= language list(flipper_length_mm, bill_length_mm) ## .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1 ## .. .. .. ..- attr(*, "dimnames")=List of 2 ## .. .. .. .. ..$ : chr [1:2] "flipper_length_mm" "bill_length_mm" ## .. .. .. .. ..$ : chr "bill_length_mm" ## .. .. ..- attr(*, "term.labels")= chr "bill_length_mm" ## .. .. ..- attr(*, "order")= int 1 ## .. .. ..- attr(*, "intercept")= int 1 ## .. .. ..- attr(*, "response")= int 1 ## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> ## .. .. ..- attr(*, "predvars")= language list(flipper_length_mm, bill_length_mm) ## .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric" ## .. .. .. ..- attr(*, "names")= chr [1:2] "flipper_length_mm" "bill_length_mm" ## ..- attr(*, "na.action")= 'omit' Named int [1:2] 4 272 ## .. ..- attr(*, "names")= chr [1:2] "4" "272" ## - attr(*, "class")= chr "lm" ``` --- # R as a Calculator - Typing in a simple calculation shows us the result ```r 608+ 28 ``` ``` ## [1] 636 ``` -- ```r 11527-283 ``` ``` ## [1] 11244 ``` -- ```r # division 400/65 ``` ``` ## [1] 6.153846 ``` ```r #multiplication 2*4 ``` ``` ## [1] 8 ``` ```r #exponentiation 5^2 ``` ``` ## [1] 25 ``` --- # Functions - Take an object, do something to it, and return the result - More complex calculations can be done with functions: - What is the square root of 64? ```r # sqrt function # in parentheses: the value we want to apply the function to sqrt(64) ``` ``` ## [1] 8 ``` ```r sr <- function(a, b){ total <- a + b return(total) } sr(2,3) ``` --- # Arguments - Some functions have settings (“arguments”) that we can adjust: - `round(3.14)` - Rounds off to the nearest integer (zero decimal places) - `round(3.14, digits=1)` - One decimal place --- # Getting Help 1. Help files <img src="help.png" width="100%" style="display: block; margin: auto;" /> --- background-image: url(help2.png) background-position: center background-size: cover --- # Exercises 1. 
Open a blank new script 1.1 File -> New File -> R Script 1.2 Ctrl + Shift + N 1.3 Click on new script icon 2. To paste strings together you can use the `paste()` function (e.g., `paste("Hello", "World")`). Use `?paste` or Google "paste function in R" to get an idea of how to use this function. 2.1 Use the `paste()` function to string together a sentence of your choice. Assign it to a variable or object. 3. Modify the `sr` function from the Functions slide so that it returns the product instead of the sum. --- # Outline + Why R + IDE + R commands & functions - `Tidyverse & the Pipe Operator` + Multiple Functions + Reading in data + Saving R scripts --- # Tidyverse and Pipes - The `tidyverse` is an ecosystem of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures. <img src="hex.png" width="40%" style="display: block; margin: auto;" /> --- background-image: url(tidyversepacks.jpg) background-position: center background-size: cover --- # Installing Tidyverse <img src="cran.png" width="40%" style="display: block; margin: auto;" /> ```r install.packages("tidyverse") ``` - Load package ```r library(tidyverse) ``` --- # Pipes - `tidyverse` provides another interface to functions: the pipe operator - Makes code easier to read and follow: - This: ```r a %>% round() ``` -- - Can be read as: - Start with `a` and then round it - The native pipe `|>` (built into R since version 4.1) is slowly becoming more popular --- # Outline + Why R + IDE + R commands & functions - Tidyverse & the Pipe Operator + `Multiple Functions` + Reading in data + Saving R scripts --- # Multiple Functions - The pipe operator makes it easy to apply multiple functions in a row ```r -16 %>% sqrt() %>% abs() ``` - What is this doing? 
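
To make the idea of chaining concrete, here is a minimal sketch that compares a nested call with a piped one. It assumes R 4.1 or later so that the native pipe `|>` is available; the vector `x` and the object names `nested` and `piped` are made up for illustration. The same chain should work with `%>%` once `tidyverse` is loaded.

```r
# Assumption: R >= 4.1 (native pipe |>). All object names here are hypothetical.
x <- c(-16, 9, -4)

# Nested version: read from the inside out (abs first, then sqrt)
nested <- sqrt(abs(x))

# Piped version: read left to right ("take x, then abs, then sqrt")
piped <- x |> abs() |> sqrt()

piped
# [1] 4 3 2

# Both versions give the same result
identical(nested, piped)
# [1] TRUE
```

The order of the steps matters: taking the absolute value first keeps `sqrt()` from being applied to a negative number.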
--- # Outline + Why R + IDE + R commands & functions - Tidyverse & the Pipe Operator + Multiple Functions + `Reading in data` + Saving R scripts --- # Reading in Data - Download the file <img src="readr.png" width="100%" /> - General form: `dataframe.name <- read.csv('filename')` --- # Data Frames > A data frame is like an Excel spreadsheet. It is two-dimensional with rows and columns. - Instead of creating a number of separate vectors, we store all the vectors in a single data frame (DF) - Columns can hold different types: numeric data (price, coordinates, etc.), character data (names, IP addresses, etc.), logical data (wants to receive ads: FALSE/TRUE, etc.), etc. ```r car_model <- c("Ford Fusion", "Hyundai Accent", "Toyota Corolla") car_price <- c(25000, 16000, 18000) car_mileage <- c(27, 36, 32) cars_df <- data.frame(model=car_model, price=car_price, mileage=car_mileage) flextable::flextable(cars_df)%>% flextable::autofit() ``` <table> <thead> <tr> <th style="text-align:left;"> model </th> <th style="text-align:right;"> price </th> <th style="text-align:right;"> mileage </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ford Fusion </td> <td style="text-align:right;"> 25,000 </td> <td style="text-align:right;"> 27 </td> </tr> <tr> <td style="text-align:left;"> Hyundai Accent </td> <td style="text-align:right;"> 16,000 </td> <td style="text-align:right;"> 36 </td> </tr> <tr> <td style="text-align:left;"> Toyota Corolla </td> <td style="text-align:right;"> 18,000 </td> <td style="text-align:right;"> 32 </td> </tr> </tbody> </table> --- # Tibbles - More modern take on data frames - Never changes input's type - Never adjusts the names of variables - It evaluates arguments lazily and sequentially - Differences - Printing ```r as_tibble(cars_df) # as.tibble() is deprecated in favor of as_tibble() ``` --- # Matrices - Matrices are vectors with dimensions (like a 2X3) - All the data *must be the same type* - Data Frames / Tibbles - Like a matrix, but the columns can be different classes ```r myMatrix <- matrix(data = 1:10, nrow = 5, ncol = 2) myMatrix ``` ``` ## [,1] [,2] ## [1,] 1 6 ## [2,] 2 7 ## [3,] 3 8 ## [4,] 4 9 ## [5,] 5 10 ``` --- # Working Directories - `here` package - `here` helps set relative as opposed to absolute paths - Why would absolute paths be a problem? ```r #setwd("your path here") ``` -- ```r #install here library(here) # here here::here() ``` ``` ## [1] "/Users/jasongeller/Documents/505-psych_issues_research/505-psych_issues_research" ``` ```r # can use with read.csv ``` --- # Loading the Data <br> <br> <br> # **Always create an R project before you start** --- # Aside: Naming Conventions - Object names are case-sensitive!! 
- Typing in data to call an object named Data will fail - Object names can contain letters, numbers, underscores “_” and periods “.” - In most cases you should use snake_case to name objects - Avoid periods - use_an_underscore_between_words - Names should be short and descriptive, with descriptive being the most important feature --- # Loading the Data - Download at: https://osf.io/qh9rb/ The faculty dataset contains aggregated data per faculty: - faculty: Business, Economics, Political Science, Sociology - students: number of students - profs: number of profs - salary: amount of salary - costs: amount of costs dataset entails demographic and school-related information on imaginary students, such as --- # Load the data - CSV ```r library(kableExtra) fac=read.csv(here::here("static", "slides", "01-R", "datasets", "faculty.csv")) fac %>% kbl() %>% kable_material_dark() ``` <table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> faculty </th> <th style="text-align:right;"> students </th> <th style="text-align:right;"> profs </th> <th style="text-align:right;"> salary </th> <th style="text-align:right;"> costs </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Business </td> <td style="text-align:right;"> 339 </td> <td style="text-align:right;"> 76 </td> <td style="text-align:right;"> 57273 </td> <td style="text-align:right;"> 33346 </td> </tr> <tr> <td style="text-align:left;"> Economics </td> <td style="text-align:right;"> 225 </td> <td style="text-align:right;"> 79 </td> <td style="text-align:right;"> 83292 </td> <td style="text-align:right;"> 33527 </td> </tr> <tr> <td style="text-align:left;"> Political Science </td> <td style="text-align:right;"> 264 </td> <td style="text-align:right;"> 63 </td> <td style="text-align:right;"> 66425 </td> <td style="text-align:right;"> 24965 </td> </tr> <tr> <td style="text-align:left;"> 
Sociology </td> <td style="text-align:right;"> 162 </td> <td style="text-align:right;"> 77 </td> <td style="text-align:right;"> 54246 </td> <td style="text-align:right;"> 29640 </td> </tr> </tbody> </table> --- # Looking at Data ```r fac%>% summary() ``` ``` ## faculty students profs salary ## Length:4 Min. :162.0 Min. :63.00 Min. :54246 ## Class :character 1st Qu.:209.2 1st Qu.:72.75 1st Qu.:56516 ## Mode :character Median :244.5 Median :76.50 Median :61849 ## Mean :247.5 Mean :73.75 Mean :65309 ## 3rd Qu.:282.8 3rd Qu.:77.50 3rd Qu.:70642 ## Max. :339.0 Max. :79.00 Max. :83292 ## costs ## Min. :24965 ## 1st Qu.:28471 ## Median :31493 ## Mean :30370 ## 3rd Qu.:33391 ## Max. :33527 ``` --- # Looking at Data ```r library(skimr) library(flextable) fac%>% skim() %>% kbl() %>% kable_material_dark() ``` <table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> skim_type </th> <th style="text-align:left;"> skim_variable </th> <th style="text-align:right;"> n_missing </th> <th style="text-align:right;"> complete_rate </th> <th style="text-align:right;"> character.min </th> <th style="text-align:right;"> character.max </th> <th style="text-align:right;"> character.empty </th> <th style="text-align:right;"> character.n_unique </th> <th style="text-align:right;"> character.whitespace </th> <th style="text-align:right;"> numeric.mean </th> <th style="text-align:right;"> numeric.sd </th> <th style="text-align:right;"> numeric.p0 </th> <th style="text-align:right;"> numeric.p25 </th> <th style="text-align:right;"> numeric.p50 </th> <th style="text-align:right;"> numeric.p75 </th> <th style="text-align:right;"> numeric.p100 </th> <th style="text-align:left;"> numeric.hist </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> character </td> <td style="text-align:left;"> faculty </td> <td style="text-align:right;"> 0 </td> <td 
style="text-align:right;"> 1 </td> <td style="text-align:right;"> 8 </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> NA </td> </tr> <tr> <td style="text-align:left;"> numeric </td> <td style="text-align:left;"> students </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 247.50 </td> <td style="text-align:right;"> 74.074287 </td> <td style="text-align:right;"> 162 </td> <td style="text-align:right;"> 209.25 </td> <td style="text-align:right;"> 244.5 </td> <td style="text-align:right;"> 282.75 </td> <td style="text-align:right;"> 339 </td> <td style="text-align:left;"> ▇▇▇▁▇ </td> </tr> <tr> <td style="text-align:left;"> numeric </td> <td style="text-align:left;"> profs </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 73.75 </td> <td style="text-align:right;"> 7.274384 </td> <td style="text-align:right;"> 63 </td> <td style="text-align:right;"> 72.75 </td> <td style="text-align:right;"> 76.5 </td> <td style="text-align:right;"> 77.50 </td> <td style="text-align:right;"> 79 </td> <td style="text-align:left;"> ▂▁▁▁▇ </td> </tr> <tr> 
<td style="text-align:left;"> numeric </td> <td style="text-align:left;"> salary </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 65309.00 </td> <td style="text-align:right;"> 13058.854085 </td> <td style="text-align:right;"> 54246 </td> <td style="text-align:right;"> 56516.25 </td> <td style="text-align:right;"> 61849.0 </td> <td style="text-align:right;"> 70641.75 </td> <td style="text-align:right;"> 83292 </td> <td style="text-align:left;"> ▇▁▃▁▃ </td> </tr> <tr> <td style="text-align:left;"> numeric </td> <td style="text-align:left;"> costs </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> 30369.50 </td> <td style="text-align:right;"> 4023.686079 </td> <td style="text-align:right;"> 24965 </td> <td style="text-align:right;"> 28471.25 </td> <td style="text-align:right;"> 31493.0 </td> <td style="text-align:right;"> 33391.25 </td> <td style="text-align:right;"> 33527 </td> <td style="text-align:left;"> ▃▁▃▁▇ </td> </tr> </tbody> </table> --- # Looking at the Data - Select specific columns - Use the `$` operator to grab one column (replace `column_name` with a real column, such as `salary`) ```r fac$column_name %>% summary() ``` ``` ## Length Class Mode ## 0 NULL NULL ``` ```r fac%>% select(column_name) %>% summary() ``` --- # Looking at the Data - Whole dataset - First 6 observations - Last 6 observations ```r fac # whole dataset ``` ``` ## faculty students profs salary costs ## 1 Business 339 76 57273 33346 ## 2 Economics 225 79 83292 33527 ## 3 Political Science 264 63 66425 24965 ## 4 Sociology 162 77 
54246 29640 ``` ```r head(fac) ``` ``` ## faculty students profs salary costs ## 1 Business 339 76 57273 33346 ## 2 Economics 225 79 83292 33527 ## 3 Political Science 264 63 66425 24965 ## 4 Sociology 162 77 54246 29640 ``` ```r tail(fac) ``` ``` ## faculty students profs salary costs ## 1 Business 339 76 57273 33346 ## 2 Economics 225 79 83292 33527 ## 3 Political Science 264 63 66425 24965 ## 4 Sociology 162 77 54246 29640 ``` --- # Looking at Data ```r # look at specific variables table(fac$students) ``` ``` ## ## 162 225 264 339 ## 1 1 1 1 ``` ```r # let's try another package library("janitor") tabyl(fac$students) ``` ``` ## fac$students n percent ## 162 1 0.25 ## 225 1 0.25 ## 264 1 0.25 ## 339 1 0.25 ``` --- # Reading in Other File Types - Excel ```r library(readxl) fac<- read_excel('/Users/jg/Desktop/experiment.xlsx', sheet=2) # excel files can have multiple sheets ``` - SPSS ```r library(haven) fac<- read_spss('/Users/jg/Desktop/experiment.sav') # SPSS data files use the .sav extension ``` --- # Outline + Why R + IDE + R commands & functions - Tidyverse & the Pipe Operator + Multiple Functions + Reading in data + `Saving R scripts` --- # Saving Files ```r write.csv(fac, file="df.csv") write.table(fac, file="df.txt") ``` --- # Wrapping Up - You've learned: - Some basic programming terminology - Specific *R* defaults and issues - Example functions and use cases - How do I get started? - Practice! --- # Helpful Websites - Google! - Cheat sheets (https://rstudio.cloud/learn/cheat-sheets) - Quick-R: www.statmethods.net - R documentation: www.rdocumentation.org - Swirl: www.swirlstats.com - Stack Overflow: www.stackoverflow.com - Learn Statistics with R: https://learningstatisticswithr.com/ --- # Exercise 1. Create a variable called y with the value of 7. 2. Save the result of 6 + 3 as a variable called a. 3. Create a new project folder for this course entitled "psy_505" (run the .Rproj file). 3.1 Place the exercise.csv file in the folder. 4. Using `here`, read the file into R and assign it to a name of your choice. 5. Explore the data set by running the commands head(data), str(data), glimpse(data) and summary(data) in your R script. You will use these a lot in the future, so have a closer look at the different outputs in the console (lower left). Remember to save your script! --- class: middle, center # Tutorial assignment due next Tuesday