class: center, middle, inverse, title-slide

# Summarizing Data
## Computational Mathematics and Statistics
### Jason Bryer, Ph.D.
### September 3, 2024

---
# One Minute Paper Results

.pull-left[
**What was the most important thing you learned during this class?**

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />
]

.pull-right[
**What important question remains unanswered for you?**

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
]

---
# Familiarity with Statistical Topics <img src="images/hex/likert.png" class="title-hex"><img src="images/hex/googlesheets4.png" class="title-hex">

``` r
likert(stats.results) %>% plot(center = 2.5)
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---
# Math Anxiety Survey Scale <img src="images/hex/likert.png" class="title-hex"><img src="images/hex/googlesheets4.png" class="title-hex">

``` r
likert(mass.results) %>% plot()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---
class: font80
# About `legosets` <img src="images/hex/brickset.png" class="title-hex">

To install the `brickset` package:

``` r
remotes::install_github('jbryer/brickset')
```

To load the `legosets` dataset:

``` r
data('legosets', package = 'brickset')
```

The `legosets` data has 19409 observations of 36 variables.
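The observation and variable counts can be checked directly. This is a minimal sketch, assuming the `brickset` package is installed as shown above and that `legosets` loads as a data frame; the numbers may differ from those quoted here if the package data has been updated since these slides were built.

```r
# Assumption: brickset is installed (see the install command above).
data('legosets', package = 'brickset')

dim(legosets)   # number of rows (observations), then number of columns (variables)
nrow(legosets)  # observations only
ncol(legosets)  # variables only
```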
.code70[

``` r
names(legosets)
```

```
##  [1] "setID"                 "number"                "numberVariant"
##  [4] "name"                  "year"                  "theme"
##  [7] "themeGroup"            "subtheme"              "category"
## [10] "released"              "pieces"                "minifigs"
## [13] "bricksetURL"           "rating"                "reviewCount"
## [16] "packagingType"         "availability"          "agerange_min"
## [19] "thumbnailURL"          "imageURL"              "US_retailPrice"
## [22] "US_dateFirstAvailable" "US_dateLastAvailable"  "UK_retailPrice"
## [25] "UK_dateFirstAvailable" "UK_dateLastAvailable"  "CA_retailPrice"
## [28] "CA_dateFirstAvailable" "CA_dateLastAvailable"  "DE_retailPrice"
## [31] "DE_dateFirstAvailable" "DE_dateLastAvailable"  "height"
## [34] "width"                 "depth"                 "weight"
```
]

---
# Structure (`str`) <img src="images/hex/brickset.png" class="title-hex">

.code50[

``` r
str(legosets)
```

```
## 'data.frame': 19409 obs. of 36 variables:
##  $ setID                : int 7693 7695 7697 7698 25534 7418 7419 6020 22704 7421 ...
##  $ number               : chr "1" "2" "3" "4" ...
##  $ numberVariant        : int 8 8 6 4 6 1 1 1 3 4 ...
##  $ name                 : chr "Small house set" "Medium house set" "Medium house set" "Large house set" ...
##  $ year                 : int 1970 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
##  $ theme                : chr "Minitalia" "Minitalia" "Minitalia" "Minitalia" ...
##  $ themeGroup           : chr "Vintage" "Vintage" "Vintage" "Vintage" ...
##  $ subtheme             : chr NA NA NA NA ...
##  $ category             : chr "Normal" "Normal" "Normal" "Normal" ...
##  $ released             : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ pieces               : int 67 109 158 233 NA 1 1 60 65 NA ...
##  $ minifigs             : int NA NA NA NA NA NA NA NA NA NA ...
##  $ bricksetURL          : chr "https://brickset.com/sets/1-8" "https://brickset.com/sets/2-8" "https://brickset.com/sets/3-6" "https://brickset.com/sets/4-4" ...
##  $ rating               : num 0 0 0 0 0 0 0 0 0 0 ...
##  $ reviewCount          : int 0 0 1 0 0 0 0 0 0 0 ...
##  $ packagingType        : chr "{Not specified}" "{Not specified}" "{Not specified}" "{Not specified}" ...
##  $ availability         : chr "{Not specified}" "{Not specified}" "{Not specified}" "{Not specified}" ...
##  $ agerange_min         : int NA NA NA NA NA NA NA NA NA NA ...
##  $ thumbnailURL         : chr "https://images.brickset.com/sets/small/1-8.jpg" "https://images.brickset.com/sets/small/2-8.jpg" "https://images.brickset.com/sets/small/3-6.jpg" "https://images.brickset.com/sets/small/4-4.jpg" ...
##  $ imageURL             : chr "https://images.brickset.com/sets/images/1-8.jpg" "https://images.brickset.com/sets/images/2-8.jpg" "https://images.brickset.com/sets/images/3-6.jpg" "https://images.brickset.com/sets/images/4-4.jpg" ...
##  $ US_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ US_dateFirstAvailable: Date, format: NA NA ...
##  $ US_dateLastAvailable : Date, format: NA NA ...
##  $ UK_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ UK_dateFirstAvailable: Date, format: NA NA ...
##  $ UK_dateLastAvailable : Date, format: NA NA ...
##  $ CA_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ CA_dateFirstAvailable: Date, format: NA NA ...
##  $ CA_dateLastAvailable : Date, format: NA NA ...
##  $ DE_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ DE_dateFirstAvailable: Date, format: NA NA ...
##  $ DE_dateLastAvailable : Date, format: NA NA ...
##  $ height               : num NA NA NA NA NA ...
##  $ width                : num NA NA NA NA NA ...
##  $ depth                : num NA NA NA NA NA NA NA NA 5.08 NA ...
##  $ weight               : num NA NA NA NA NA NA NA NA NA NA ...
```
]

---
# RStudio Environment tab can help <img src="images/hex/rstudio.png" class="title-hex">

<img src="images/legosets_rstudio_environment.png" width="500" style="display: block; margin: auto;" />

---
class: hide-logo
# Table View

.font60[
]

---
# Data Wrangling Cheat Sheet <img src="images/hex/dplyr.png" class="title-hex">

.center[
<a href='https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf' target='_new'><img src='images/data-transformation.png' width='700' /></a>
]

---
# Tidyverse vs Base R <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/pipe.png" class="title-hex">

.center[
<a href='images/R_Syntax_Comparison.jpeg' target='_new'><img src="images/R_Syntax_Comparison.jpeg" width='700' /></a>
]

---
# Pipes `%>%` and `|>` <img src="images/hex/magrittr.png" class="title-hex">

<img src='images/magrittr_pipe.jpg' align='right' width='200' />

.font90[
The pipe operator (`%>%`), introduced with the `magrittr` R package, allows R operations to be chained. Base R (since version 4.1.0) has its own pipe operator (`|>`). Both take the output of the left-hand side and pass it as the first argument to the function on the right-hand side.
]

.pull-left[
You can do this in two steps:

``` r
tab_out <- table(legosets$category)
prop.table(tab_out)
```

Or as nested function calls:
``` r
prop.table(table(legosets$category))
```
]

.pull-right[
Using the pipe (`|>`) operator, we can chain these calls in what is arguably a more readable format:

``` r
table(legosets$category) |> prop.table()
```
]

<hr />

```
## 
##        Book  Collection    Extended        Gear      Normal       Other 
## 0.034468546 0.031377196 0.028749549 0.154515946 0.684682364 0.062599825 
##      Random 
## 0.003606574
```

---
# Filter <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_filter_sm.png' width='800' />
]

---
# Logical Operators

* `!a` - TRUE if a is FALSE
* `a == b` - TRUE if a and b are equal
* `a != b` - TRUE if a and b are not equal
* `a > b` - TRUE if a is strictly larger than b
* `a >= b` - TRUE if a is larger than or equal to b
* `a < b` - TRUE if a is strictly smaller than b
* `a <= b` - TRUE if a is smaller than or equal to b
* `a %in% b` - TRUE if a is in b, where b is a vector

``` r
which( letters %in% c('a','e','i','o','u') )
```

```
## [1]  1  5  9 15 21
```

* `a | b` - TRUE if a *or* b is TRUE
* `a & b` - TRUE if a *and* b are both TRUE
* `isTRUE(a)` - TRUE if a is TRUE

---
# Filter <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego <- legosets %>% filter(themeGroup == 'Educational' & year > 2015)
```

### Base R

``` r
# which() drops rows where the condition is NA, as filter() does
mylego <- legosets[which(legosets$themeGroup == 'Educational' & legosets$year > 2015),]
```

<hr />

``` r
nrow(mylego)
```

```
## [1] 99
```

---
# Select <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego <- mylego %>% select(setID, pieces, theme, availability, US_retailPrice, minifigs)
```

### Base R

``` r
mylego <- mylego[,c('setID', 'pieces', 'theme', 'availability', 'US_retailPrice', 'minifigs')]
```

<hr />

``` r
head(mylego, n = 4)
```

```
##   setID pieces     theme    availability US_retailPrice minifigs
## 1 26803    103 Education {Not specified}
NA        6
## 2 26689    142 Education {Not specified}             NA        4
## 3 26804     98 Education {Not specified}             NA        6
## 4 26277    188 Education     Educational          94.95       NA
```

---
# Relocate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_relocate.png' width='800' />
]

---
# Relocate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego %>%
    relocate(where(is.numeric), .after = where(is.character)) %>%
    head(n = 3)
```

```
##       theme    availability setID pieces US_retailPrice minifigs
## 1 Education {Not specified} 26803    103             NA        6
## 2 Education {Not specified} 26689    142             NA        4
## 3 Education {Not specified} 26804     98             NA        6
```

### Base R

``` r
mylego2 <- mylego[,c('theme', 'availability', 'setID', 'pieces', 'US_retailPrice', 'minifigs')]
head(mylego2, n = 3)
```

```
##       theme    availability setID pieces US_retailPrice minifigs
## 1 Education {Not specified} 26803    103             NA        6
## 2 Education {Not specified} 26689    142             NA        4
## 3 Education {Not specified} 26804     98             NA        6
```

---
# Rename <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/rename_sm.jpg' width='1000' />
]

---
# Rename <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego %>%
    dplyr::rename(USD = US_retailPrice) %>%
    head(n = 3)
```

```
##   setID pieces     theme    availability USD minifigs
## 1 26803    103 Education {Not specified}  NA        6
## 2 26689    142 Education {Not specified}  NA        4
## 3 26804     98 Education {Not specified}  NA        6
```

### Base R

``` r
names(mylego2)[5] <- 'USD'
head(mylego2, n = 3)
```

```
##       theme    availability setID pieces USD minifigs
## 1 Education {Not specified} 26803    103  NA        6
## 2 Education {Not specified} 26689    142  NA        4
## 3 Education {Not specified} 26804     98  NA        6
```

---
# Mutate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_mutate.png' width='700' />
]

---
# Mutate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego %>%
    filter(!is.na(pieces) & !is.na(US_retailPrice)) %>%
    mutate(Price_per_piece = US_retailPrice / pieces) %>%
    head(n = 3)
```

```
##   setID pieces     theme availability US_retailPrice minifigs Price_per_piece
## 1 26277    188 Education  Educational          94.95       NA       0.5050532
## 2 25949    280 Education  Educational         224.95       NA       0.8033929
## 3 25954      1 Education  Educational          14.95       NA      14.9500000
```

### Base R

``` r
# Keep rows where both pieces and price are present, then divide price by pieces
mylego2 <- mylego[!is.na(mylego$pieces) & !is.na(mylego$US_retailPrice),]
mylego2$Price_per_piece <- mylego2$US_retailPrice / mylego2$pieces
# Reset row names so they run 1, 2, 3, ... as in the dplyr result
rownames(mylego2) <- NULL
head(mylego2, n = 3)
```

```
##   setID pieces     theme availability US_retailPrice minifigs Price_per_piece
## 1 26277    188 Education  Educational          94.95       NA       0.5050532
## 2 25949    280 Education  Educational         224.95       NA       0.8033929
## 3 25954      1 Education  Educational          14.95       NA      14.9500000
```

---
# Group By and Summarize <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.code80[

``` r
legosets %>%
    group_by(themeGroup) %>%
    summarize(mean_price = mean(US_retailPrice, na.rm = TRUE),
              sd_price = sd(US_retailPrice, na.rm = TRUE),
              median_price = median(US_retailPrice, na.rm = TRUE),
              n = n(),
              missing = sum(is.na(US_retailPrice)))
```

```
## # A tibble: 17 × 6
##    themeGroup       mean_price sd_price median_price     n missing
##    <chr>                 <dbl>    <dbl>        <dbl> <int>   <int>
##  1 Action/Adventure       40.2     38.9         30.0  1474     779
##  2 Art and crafts         34.9     47.7         17.5    97       9
##  3 Basic                  21.6     19.2         15.0   873     733
##  4 Constraction           16.4     12.4         13.0   502     284
##  5 Educational           182.     188.         130.
503     465
##  6 Girls                  35.8     24.0         23.0   240     227
##  7 Historical             34.2     32.4         20.0   473     400
##  8 Junior                 22.0     10.1         20.0   228     165
##  9 Licensed               53.3     71.7         30.0  2775    1066
## 10 Miscellaneous          20.7     29.2         13.0  6253    3961
## 11 Model making           74.3     92.1         40.0   771     384
## 12 Modern day             38.2     35.6         30.0  2469    1535
## 13 Pre-school             30.8     22.7         25.0  1562    1103
## 14 Racing                 26.8     26.5         15.0   270     176
## 15 Technical              82.8     95.3         50.0   607     327
## 16 Vintage               NaN       NA           NA     306     306
## 17 <NA>                  NaN       NA           NA       6       6
```
]

---
# Describe and Describe By

``` r
library(psych)
describe(legosets$US_retailPrice)
```

```
##    vars    n  mean   sd median trimmed   mad  min    max range skew kurtosis
## X1    1 7483 38.96 56.5  19.99    27.7 17.79 1.49 849.99 848.5 5.32    44.74
##      se
## X1 0.65
```

``` r
describeBy(legosets$US_retailPrice, group = legosets$availability, mat = TRUE, skew = FALSE)
```

```
##      item                group1 vars    n      mean        sd median   min    max range         se
## X11     1       {Not specified}    1 1831  26.84733  39.96747  19.99  1.49 789.99 788.5  0.9340335
## X12     2           Educational    1   12 212.86667 105.88283 222.45 14.95 399.95 385.0 30.5657410
## X13     3        LEGO exclusive    1 1039  57.21203 106.63125  12.99  1.99 849.99 848.0  3.3080857
## X14     4    LEGOLAND exclusive    1    2   4.99000   0.00000   4.99  4.99   4.99   0.0  0.0000000
## X15     5              Not sold    1    1  12.99000        NA  12.99 12.99  12.99   0.0         NA
## X16     6           Promotional    1    5   4.79000   0.83666   4.99  3.99   5.99   2.0  0.3741657
## X17     7 Promotional (Airline)    1    0       NaN        NA     NA   Inf   -Inf  -Inf         NA
## X18     8                Retail    1 4290  37.55889  38.44918  24.99  1.99 699.99 698.0  0.5870275
## X19     9      Retail - limited    1  302  63.54381  70.91908  39.99  2.49 449.99 447.5  4.0809343
## X110   10               Unknown    1    1   3.99000        NA   3.99  3.99   3.99   0.0         NA
```

---
# Additional Resources

For data wrangling:

* `dplyr` website: https://dplyr.tidyverse.org
* R for Data Science book: https://r4ds.had.co.nz/wrangle-intro.html
* Wrangling penguins tutorial: https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome
* Data transformation cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf

---
class: middle
# Grammar of Graphics

.center[
<img src="images/ggplot2_masterpiece.png" height="550" />
]

---
# Data Visualizations with ggplot2 <img src="images/hex/ggplot2.png" class="title-hex">

* `ggplot2` is an R package that provides an alternative plotting framework based upon Wilkinson’s (2005) Grammar of Graphics.
* `ggplot2` is, in general, more flexible for creating "prettier" and more complex plots.
* It works by creating layers of different types of objects/geometries (e.g., bars, points, lines, polygons).

`ggplot2` has at least three ways of creating plots:

1. `qplot`
2. `ggplot(...) + geom_XXX(...) + ...`
3. `ggplot(...) + layer(...)`

* We will focus only on the second.

---
# Parts of a `ggplot2` Statement <img src="images/hex/ggplot2.png" class="title-hex">

* Data
`ggplot(myDataFrame, aes(x=x, y=y))`
* Layers
`geom_point()`, `geom_histogram()`
* Facets
`facet_wrap(~ cut)`, `facet_grid(~ cut)`
* Scales
`scale_y_log10()`
* Other options
`ggtitle('my title')`, `ylim(c(0, 10000))`, `xlab('x-axis label')`

---
# Lots of geoms <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ls('package:ggplot2')[grep('^geom_', ls('package:ggplot2'))]
```

```
##  [1] "geom_abline"    "geom_area"             "geom_bar"        "geom_bin_2d"
##  [5] "geom_bin2d"     "geom_blank"            "geom_boxplot"    "geom_col"
##  [9] "geom_contour"   "geom_contour_filled"   "geom_count"      "geom_crossbar"
## [13] "geom_curve"     "geom_density"          "geom_density_2d" "geom_density_2d_filled"
## [17] "geom_density2d" "geom_density2d_filled" "geom_dotplot"    "geom_errorbar"
## [21] "geom_errorbarh" "geom_freqpoly"         "geom_function"   "geom_hex"
## [25] "geom_histogram" "geom_hline"            "geom_jitter"     "geom_label"
## [29] "geom_line"      "geom_linerange"        "geom_map"        "geom_path"
## [33] "geom_point"     "geom_pointrange"       "geom_polygon"    "geom_qq"
## [37] "geom_qq_line"   "geom_quantile"         "geom_raster"     "geom_rect"
## [41] "geom_ribbon"    "geom_rug"              "geom_segment"    "geom_sf"
## [45] "geom_sf_label"  "geom_sf_text"          "geom_smooth"     "geom_spoke"
## [49] "geom_step"      "geom_text"             "geom_tile"       "geom_violin"
## [53] "geom_vline"
```

---
# Data Visualization Cheat Sheet <img src="images/hex/ggplot2.png" class="title-hex">

.center[
<a href='https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf'><img src='images/data-visualization-2.1.png' width='700' /></a>
]

---
# Scatterplot <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x=pieces, y=US_retailPrice)) + geom_point()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" />

---
# Scatterplot (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x=pieces, y=US_retailPrice, color=availability)) + geom_point()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-36-1.png" style="display: block; margin: auto;" />

---
# Scatterplot (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x=pieces, y=US_retailPrice, size=minifigs, color=availability)) + geom_point()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" />

---
# Scatterplot (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x=pieces, y=US_retailPrice, size=minifigs)) +
    geom_point() +
    facet_wrap(~ availability)
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />

---
# Boxplots <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x='Lego', y=US_retailPrice)) + geom_boxplot()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" />

---
# Boxplots (cont.)
<img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x=availability, y=US_retailPrice)) + geom_boxplot()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" />

---
# Boxplots (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x=availability, y=US_retailPrice)) +
    geom_boxplot() +
    coord_flip()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-41-1.png" style="display: block; margin: auto;" />

---
# Histograms <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x = US_retailPrice)) + geom_histogram(binwidth = 25)
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-42-1.png" style="display: block; margin: auto;" />

---
# Histograms (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x = US_retailPrice)) +
    geom_histogram(bins = 15) +
    scale_x_log10()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" />

---
# Histograms (cont.) <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x = US_retailPrice)) +
    geom_histogram(binwidth = 25) +
    facet_wrap(~ availability)
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-44-1.png" style="display: block; margin: auto;" />

---
# Density Plots <img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x = US_retailPrice, color = availability)) + geom_density()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" />

---
# Density Plots (cont.)
<img src="images/hex/ggplot2.png" class="title-hex">

``` r
ggplot(legosets, aes(x = US_retailPrice, color = availability)) +
    geom_density() +
    scale_x_log10()
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" />

---
# `ggplot2` aesthetics <img src="images/hex/ggplot2.png" class="title-hex">

.center[
<a href='images/ggplot_aesthetics_cheatsheet.png' target='_new'> <img src='images/ggplot_aesthetics_cheatsheet.png' height='550' /></a>
]

---
# Likert Scales <img src="images/hex/likert.png" class="title-hex">

Likert scales are a type of questionnaire where respondents are asked to rate items on scales usually ranging from four to seven levels (e.g. strongly disagree to strongly agree).

``` r
library(likert)
library(reshape)
data(pisaitems)
items24 <- pisaitems[,substr(names(pisaitems), 1,5) == 'ST24Q']
items24 <- rename(items24, c(
    ST24Q01="I read only if I have to.",
    ST24Q02="Reading is one of my favorite hobbies.",
    ST24Q03="I like talking about books with other people.",
    ST24Q04="I find it hard to finish books.",
    ST24Q05="I feel happy if I receive a book as a present.",
    ST24Q06="For me, reading is a waste of time.",
    ST24Q07="I enjoy going to a bookstore or a library.",
    ST24Q08="I read only to get information that I need.",
    ST24Q09="I cannot sit still and read for more than a few minutes.",
    ST24Q10="I like to express my opinions about books I have read.",
    ST24Q11="I like to exchange books with my friends."))
```

---
# `likert` R Package <img src="images/hex/likert.png" class="title-hex">

``` r
l24 <- likert(items24)
summary(l24)
```

```
##                                                        Item      low neutral     high     mean        sd
## 10   I like to express my opinions about books I have read. 41.07516       0 58.92484 2.604913 0.9009968
##  5           I feel happy if I receive a book as a present. 46.93475       0 53.06525 2.466751 0.9446590
##  8              I read only to get information that I need. 50.39874       0 49.60126 2.484616 0.9089688
##  7               I enjoy going to a bookstore or a library.
51.21231       0 48.78769 2.428508 0.9164136
##  3            I like talking about books with other people. 54.99129       0 45.00871 2.328049 0.9090326
## 11                I like to exchange books with my friends. 55.54115       0 44.45885 2.343193 0.9609234
##  2                   Reading is one of my favorite hobbies. 56.64470       0 43.35530 2.344530 0.9277495
##  1                                I read only if I have to. 58.72868       0 41.27132 2.291811 0.9369023
##  4                          I find it hard to finish books. 65.35125       0 34.64875 2.178299 0.8991628
##  9 I cannot sit still and read for more than a few minutes. 76.24524       0 23.75476 1.974736 0.8793028
##  6                      For me, reading is a waste of time. 82.88729       0 17.11271 1.810093 0.8611554
```

---
# `likert` Plots <img src="images/hex/likert.png" class="title-hex">

``` r
plot(l24)
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-49-1.png" style="display: block; margin: auto;" />

---
# `likert` Plots <img src="images/hex/likert.png" class="title-hex">

``` r
plot(l24, type='heat')
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-50-1.png" style="display: block; margin: auto;" />

---
# `likert` Plots <img src="images/hex/likert.png" class="title-hex">

``` r
plot(l24, type='density')
```

<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-51-1.png" style="display: block; margin: auto;" />

---
# Pie Charts

There is only one pie chart in *OpenIntro Statistics* (Diez, Barr, & Çetinkaya-Rundel, 2015, p. 48). Consider the following three pie charts that represent the preference of five different colors. Is there a difference between the three pie charts? This is probably difficult to answer.

<center><img src='images/Pie.png' width='500'></center>

---
# Pie Charts

There is only one pie chart in *OpenIntro Statistics* (Diez, Barr, & Çetinkaya-Rundel, 2015, p. 48). Consider the following three pie charts that represent the preference of five different colors. Is there a difference between the three pie charts? This is probably difficult to answer.
<center><img src='images/Pie.png' width='500'></center>

<center><img src='images/Bar.png' width='500'></center>

Source: [https://en.wikipedia.org/wiki/Pie_chart](https://en.wikipedia.org/wiki/Pie_chart).

---
class: middle
# Just say NO to pie charts!

.font150[
"There is no data that can be displayed in a pie chart that cannot better be displayed in some other type of chart"]

.right[.font130[John Tukey]]

---
# Additional Resources

For data wrangling:

* `dplyr` website: https://dplyr.tidyverse.org
* R for Data Science book: https://r4ds.had.co.nz/wrangle-intro.html
* Wrangling penguins tutorial: https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome
* Data transformation cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf

For data visualization:

* `ggplot2` website: https://ggplot2.tidyverse.org
* R for Data Science book: https://r4ds.had.co.nz/data-visualisation.html
* R Graphics Cookbook: https://r-graphics.org
* Data visualization cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf

---
class: left, font140
# One Minute Paper

.pull-left[
1. What was the most important thing you learned during this class?
2. What important question remains unanswered for you?
]

.pull-right[
<img src="02-Summarizing_Data_files/figure-html/unnamed-chunk-52-1.png" style="display: block; margin: auto;" />
]

https://forms.gle/U4UXAosdjHorxY919