What is R?

Introduction to R

R is a widely used programming language and software platform for statistical analysis and data visualization. It is popular with data professionals, statisticians, and researchers for examining data, creating graphics, and implementing machine learning techniques. With its broad range of statistical methods and graphical tools, R is a key instrument for investigating and interpreting data. R is also supported by an active community that keeps extending its set of tools and learning materials.

Brief History of R

Robert Gentleman and Ross Ihaka developed the R programming language at the University of Auckland in New Zealand during the 1990s. Their work was influenced by the S language, which originated at Bell Laboratories in the 1970s under the leadership of John Chambers and others. R became an open source project in 1995, which fostered community growth. The R Core Team, a group of developers, maintains the core of the language and the framework of the project. The first official version of R was released in 2000, establishing it as a significant tool for statistical computation and data analysis.

Why Use R?

R is favored in industries like finance, healthcare, technology, and academia due to its powerful capabilities in data manipulation, modeling, and visualization. Companies such as Facebook, Google, and Pfizer use R for their data analysis and statistical needs. Despite its steep learning curve and less user-friendly interfaces, R’s flexibility and extensive library of methods make it ideal for analyzing and visualizing complex datasets.

Basics of R

Installing and Setting Up R

To begin using R, you need to download and install it from the Comprehensive R Archive Network (CRAN). After installing R, you should install RStudio, an integrated development environment (IDE) for R. RStudio provides a more user-friendly interface for coding in R.

The R Console and GUI

R offers two primary interfaces: the R console and the graphical user interface (GUI). The R console is a command-line interface for inputting commands and receiving results. The GUI provides menus and options for easier interaction with the system. Both interfaces allow users to perform advanced data analysis and visualization.

Variables and Data Types in R

In R, variables store data so that it can be reused and manipulated later. Values are assigned with the <- operator. Common data types include numeric, character, and logical. R also has a special type called a factor, which represents categorical data and is particularly useful in data analysis.

Examples:

  • Numeric variable: age <- 25
  • Character variable: name <- "John"
  • Logical variable: is_student <- TRUE
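As a small sketch of how these types can be inspected, the snippet below reuses the three variables from the examples and adds a factor. The education values are made up for illustration and are not part of the original examples.

```r
# Variables from the examples above
age <- 25
name <- "John"
is_student <- TRUE

# class() reports the type of a value
class(age)         # "numeric"
class(name)        # "character"
class(is_student)  # "logical"

# A factor stores categorical data as a fixed set of levels
# (hypothetical values, chosen only for illustration)
education <- factor(c("high school", "bachelor", "bachelor", "master"))
levels(education)  # "bachelor" "high school" "master"
table(education)   # number of observations per category
```

By default, factor levels are sorted alphabetically, which is why "bachelor" comes first even though it is not the first value in the vector.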

Basic Operations in R

Everyday work in R usually falls into three areas: data manipulation, modeling, and visualization. Packages like dplyr and tidyr help with data cleaning and preparation. Modeling tools such as the built-in lm function for linear regression and the randomForest package for random forest models are useful for exploring relationships and making predictions. Visualization packages like ggplot2 allow for the creation of diverse plots and graphs.
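The three steps can be sketched with base R alone, so nothing needs to be installed. This example assumes the built-in mtcars dataset and uses base subsetting and plot() in place of dplyr and ggplot2. It is a minimal illustration, not a recommended workflow.

```r
# mtcars ships with R, so no extra packages are needed
data(mtcars)

# Data manipulation: keep 4-cylinder cars and two columns of interest
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]
nrow(four_cyl)

# Modeling: linear regression of fuel economy (mpg) on weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept and slope of the fitted line

# Visualization: scatter plot with the fitted regression line on top
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

The slope for wt comes out negative, which matches the intuition that heavier cars tend to travel fewer miles per gallon.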

Working with Data in R

Importing Data into R

To import data, you can use the read.csv() function. For example, to import a file named "chestsize.csv" into a data frame:

df <- read.csv("chestsize.csv")
head(df)
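Because "chestsize.csv" may not exist on your machine, the following sketch first writes a small CSV to a temporary file and then reads it back. The column names and values are hypothetical and only stand in for a real dataset.

```r
# Write a tiny CSV to a temporary file so the example runs anywhere
# (column names and values are invented for illustration)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,chest_cm", "John,98", "Alice,86", "Bob,102"), csv_path)

# read.csv() treats the first line as column names by default
measurements <- read.csv(csv_path)

str(measurements)               # column names and their types
nrow(measurements)              # number of rows read
summary(measurements$chest_cm)  # basic statistics for one column
```

Calling str() right after an import is a quick way to confirm that each column was read with the type you expect.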

Data Structures in R

Vectors: One-dimensional arrays for storing a sequence of values.

numbers <- c(1, 2, 3, 4, 5)

Matrices: Two-dimensional arrays for tabular data.

mat <- matrix(1:6, nrow = 2, ncol = 3)

Arrays: Multidimensional data structures for complex data.

arr <- array(1:12, dim = c(2, 3, 2))

Lists: Versatile structures that can hold different types of elements.

my_list <- list(name = "John", age = 30, grades = c(85, 90, 75))

Data Frames: Tables with rows and columns, commonly used for structured data.

df <- data.frame(name = c("John", "Alice", "Bob"), age = c(30, 25, 28), scores = c(85, 90, 75))

Understanding these data structures is crucial for effectively working with data in R.
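To show how elements are read back out of each structure, here is a short sketch that rebuilds the objects defined above and indexes into them. Note that R indexes from 1 and fills matrices and arrays column by column.

```r
numbers <- c(1, 2, 3, 4, 5)
numbers[2]            # second element: 2
numbers[numbers > 3]  # logical filtering: 4 5

mat <- matrix(1:6, nrow = 2, ncol = 3)
mat[2, 3]             # row 2, column 3: 6
dim(mat)              # 2 3

arr <- array(1:12, dim = c(2, 3, 2))
arr[1, 2, 2]          # row 1, column 2, second layer: 9

my_list <- list(name = "John", age = 30, grades = c(85, 90, 75))
my_list$name          # access by name: "John"
mean(my_list$grades)  # average grade, about 83.3

df <- data.frame(name = c("John", "Alice", "Bob"),
                 age = c(30, 25, 28),
                 scores = c(85, 90, 75))
df$age                # one column as a vector: 30 25 28
df[df$age > 26, ]     # rows where age is above 26 (John and Bob)
```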
