R classifies data into six atomic types: double (numbers with or without decimals, reported as "numeric" by class()), integer (whole numbers), character (text), logical (TRUE/FALSE), complex, and raw (bytes). These building blocks are organized into structures such as vectors and matrices for homogeneous data, or lists and data frames for heterogeneous data. To check or convert these types during analysis, use core functions such as class(), typeof(), and as.numeric().
# Numeric (Double): Default type for numbers, with or without decimals
my_num <- 42.5
class(my_num) # Output: "numeric"
## [1] "numeric"
typeof(my_num) # Output: "double"
## [1] "double"
# typeof() reports the low-level storage type R uses in memory; class() reports the high-level class that other functions (e.g., print(), summary()) use to decide how to handle the object.
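The two functions often agree for simple values, but they diverge for structured objects. A short sketch using a few common objects (the object names are illustrative only):

```r
# Data frame: a list of columns underneath
df <- data.frame(x = 1:3)
class(df)    # "data.frame"  (what functions see)
typeof(df)   # "list"        (how it is stored)

# Factor: categories stored as integer codes
f <- factor(c("low", "high", "low"))
class(f)     # "factor"
typeof(f)    # "integer"

# Matrix: a vector with a dim attribute
m <- matrix(1:4, nrow = 2)
class(m)     # "matrix" "array" (R >= 4.0.0)
typeof(m)    # "integer"
```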
# Integer: Whole numbers (add 'L' suffix)
my_int <- 7L
class(my_int) # Output: "integer"
## [1] "integer"
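Without the `L` suffix, a whole number is still stored as a double. A quick check of how integers and doubles interact:

```r
typeof(7)             # "double": a plain number literal is double
typeof(7L)            # "integer"
identical(7, 7L)      # FALSE: same value, different types
7 == 7L               # TRUE: values compare equal
typeof(5L / 2L)       # "double": `/` always returns a double (2.5)
typeof(5L %/% 2L)     # "integer": integer division keeps the type (2)
.Machine$integer.max  # 2147483647, the largest integer R can store
```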
# Character: Text data in quotes
my_char <- "Data Science 2026"
class(my_char) # Output: "character"
## [1] "character"
# Logical: Boolean values
my_bool <- TRUE
class(my_bool) # Output: "logical"
## [1] "logical"
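The examples above do not show the complex type, or raw, the sixth atomic type, which holds individual bytes. A brief sketch of both, plus the fact that logical values count as 1 and 0 in arithmetic:

```r
# Complex: real and imaginary parts
my_cplx <- 3 + 2i
class(my_cplx)   # "complex"
Mod(my_cplx)     # modulus sqrt(3^2 + 2^2), about 3.606

# Raw: individual bytes
my_raw <- charToRaw("A")
class(my_raw)    # "raw"
my_raw           # 41 (the hexadecimal byte for "A")

# Logical values behave as 1/0 in arithmetic
sum(c(TRUE, FALSE, TRUE))   # 2
```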
R organizes atomic types into multi-element structures.
# Vector: Homogeneous collection (same type)
student_grades <- c(85, 92, 78, 88)
is.vector(student_grades)
## [1] TRUE
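Because a vector must hold a single type, c() silently coerces mixed inputs to the most flexible type present (logical < integer < double < character). A matrix is simply a vector with dimensions, so the same rule applies to it:

```r
mixed <- c(1, "a", TRUE)
class(mixed)        # "character"
mixed               # "1" "a" "TRUE"
class(c(1L, 2.5))   # "numeric": integer promoted to double
c(TRUE, 2L)         # 1 2: logical promoted to integer

# A matrix: homogeneous values arranged in rows and columns
grade_matrix <- matrix(c(85, 92, 78, 88), nrow = 2)
dim(grade_matrix)   # 2 2
```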
# List: Heterogeneous collection (different types)
student_info <- list(name = "Alice", age = 21, active = TRUE)
str(student_info)
## List of 3
## $ name : chr "Alice"
## $ age : num 21
## $ active: logi TRUE
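List elements can be pulled out by name. A small sketch of the main access styles; note that single brackets return a smaller list, while `$` and `[[ ]]` return the element itself (the `major` element is a hypothetical addition for illustration):

```r
student_info <- list(name = "Alice", age = 21, active = TRUE)
student_info$name                   # "Alice"
student_info[["age"]]               # 21 (the element itself)
class(student_info["age"])          # "list" (a sub-list containing age)
student_info$major <- "Statistics"  # hypothetical new element
length(student_info)                # 4
```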
# Data Frame: Tabular data (rows and columns)
# Columns can be different types, but must be the same length
class_data <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Score = c(95, 82, 88),
Passed = c(TRUE, TRUE, TRUE)
)
head(class_data)
## Name Score Passed
## 1 Alice 95 TRUE
## 2 Bob 82 TRUE
## 3 Charlie 88 TRUE
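Each column of a data frame keeps its own type, and columns can be used like ordinary vectors. The sketch below inspects column types, filters rows, and shows that columns of unequal length are rejected (the type output assumes R >= 4.0.0, where strings are no longer converted to factors by default):

```r
class_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(95, 82, 88),
  Passed = c(TRUE, TRUE, TRUE)
)
sapply(class_data, class)                 # "character" "numeric" "logical"
mean(class_data$Score)                    # 88.33333
class_data[class_data$Score > 85, "Name"] # "Alice" "Charlie"

# Columns of different lengths raise an error
res <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
res   # message about differing numbers of rows
```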
Essential functions for verifying and changing data types.
# Checking types
is.numeric(my_num) # Returns TRUE
## [1] TRUE
is.character(my_num) # Returns FALSE
## [1] FALSE
# Coercion (Converting types)
as.character(my_num) # Converts 42.5 to "42.5"
## [1] "42.5"
as.numeric("100") # Converts "100" to 100
## [1] 100
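Coercion does not always succeed cleanly. A few common cases worth knowing: decimals are truncated when converted to integer, text that is not a number becomes NA (with a warning), and factors must go through character to recover their values:

```r
as.integer(42.5)                     # 42: truncates toward zero
as.numeric(TRUE)                     # 1
suppressWarnings(as.numeric("abc"))  # NA (normally warns "NAs introduced by coercion")
as.logical(c("TRUE", "yes"))         # TRUE NA

# Factors: convert via character, not directly
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1 (the internal level codes)
as.numeric(as.character(f))   # 10 20 10 (the actual values)
```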
R comes with several built-in datasets. Type data() to list them, and pass a dataset's name to data() to load it:
data("airquality")
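In an interactive session, data() with no arguments opens a browsable list. A script-friendly sketch that checks the list programmatically and takes a first look at the loaded dataset (?airquality opens its help page):

```r
# Names of datasets shipped in the base "datasets" package
available <- data(package = "datasets")$results[, "Item"]
"airquality" %in% available   # TRUE

data("airquality")
dim(airquality)     # 153 6 (rows, columns)
names(airquality)   # "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
```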
Let's write this data to a CSV file. R reads and writes files relative to a folder designated as the working directory. You can set the working directory from the Files tab in RStudio or with setwd():
setwd("D:/Anand_documents")
write.csv(airquality, "Air_quality_data.csv") # row names are saved as an unnamed first column
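By default write.csv() also saves the row names, which come back as an extra column named X when the file is read. A sketch comparing both options; it writes to tempdir() as a stand-in for your own working directory, and the file names are hypothetical:

```r
getwd()                 # shows the current working directory
out_dir <- tempdir()    # stand-in for your working directory
with_rn <- file.path(out_dir, "aq_with_rownames.csv")
no_rn   <- file.path(out_dir, "aq_no_rownames.csv")

write.csv(airquality, with_rn)                   # row names become the first column
write.csv(airquality, no_rn, row.names = FALSE)  # data columns only

file.exists(c(with_rn, no_rn))   # TRUE TRUE
names(read.csv(with_rn))[1]      # "X": the unnamed column gets a default name
ncol(read.csv(no_rn))            # 6
```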
Now let's remove all objects from the environment: click the broom icon in the Environment tab (or run rm(list = ls()) in the console), then press Ctrl+L to clear the console. Next, read the data back from the CSV file with read.csv():
data<-read.csv("Air_quality_data.csv")
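If the file was written with row names, read.csv() can turn that first column back into row names instead of keeping it as a data column. A minimal sketch, assuming a copy of the file is written to tempdir():

```r
csv_path <- file.path(tempdir(), "Air_quality_data.csv")
write.csv(airquality, csv_path)

# Use column 1 as row names rather than as a data column named X
aq <- read.csv(csv_path, row.names = 1)
names(aq)   # "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
nrow(aq)    # 153
```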
You can get to know the structure of the data with functions such as str(), head(), and tail(). Note that the row names saved by write.csv() have come back as an extra column, X.
str(data)
## 'data.frame': 153 obs. of 7 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
head(data)
## X Ozone Solar.R Wind Temp Month Day
## 1 1 41 190 7.4 67 5 1
## 2 2 36 118 8.0 72 5 2
## 3 3 12 149 12.6 74 5 3
## 4 4 18 313 11.5 62 5 4
## 5 5 NA NA 14.3 56 5 5
## 6 6 28 NA 14.9 66 5 6
tail(data)
## X Ozone Solar.R Wind Temp Month Day
## 148 148 14 20 16.6 63 9 25
## 149 149 30 193 6.9 70 9 26
## 150 150 NA 145 13.2 77 9 27
## 151 151 14 191 14.3 75 9 28
## 152 152 18 131 8.0 76 9 29
## 153 153 20 223 11.5 68 9 30
colnames(data)
## [1] "X" "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
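A few more quick checks complement str(), head(), and tail(): dim() gives the size, sapply() applied with class() reports each column's type, and selecting columns by name narrows the view. Sketched here on the built-in airquality data:

```r
aq <- airquality
dim(aq)                                # 153 6 (rows, columns)
sapply(aq, class)                      # class of each column
head(aq[, c("Ozone", "Temp")], n = 3)  # first three rows of two columns
```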
Let's rename some of the columns and check whether any data is missing.
colnames(data)[c(1, 3, 4)] <- c("Sl.No", "Solar", "Wind") # rename columns 1, 3 and 4 (column 4 keeps the name "Wind")
# is.na(data) returns a logical matrix that is TRUE wherever a value is missing
colSums(is.na(data)) # total missing values in each column
## Sl.No Ozone Solar Wind Temp Month Day
## 0 37 7 0 0 0 0
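Counting NAs per column does not show how many rows are affected overall, because a row can be missing values in more than one column. A sketch using complete.cases() on the built-in airquality data:

```r
aq <- airquality
sum(complete.cases(aq))    # 111 rows have no missing values
nrow(na.omit(aq))          # 111: the same rows kept by na.omit()
which(is.na(aq$Solar.R))   # row numbers with a missing Solar.R value
```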
Statistics such as the mean are contaminated by missing data: mean() returns NA if any value is missing, so missing values must be removed or handled first.
summary(data) # skips missing values when computing statistics and reports how many there are (NA's)
## Sl.No Ozone Solar Wind
## Min. : 1 Min. : 1.00 Min. : 7.0 Min. : 1.700
## 1st Qu.: 39 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400
## Median : 77 Median : 31.50 Median :205.0 Median : 9.700
## Mean : 77 Mean : 42.13 Mean :185.9 Mean : 9.958
## 3rd Qu.:115 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500
## Max. :153 Max. :168.00 Max. :334.0 Max. :20.700
## NA's :37 NA's :7
## Temp Month Day
## Min. :56.00 Min. :5.000 Min. : 1.0
## 1st Qu.:72.00 1st Qu.:6.000 1st Qu.: 8.0
## Median :79.00 Median :7.000 Median :16.0
## Mean :77.88 Mean :6.993 Mean :15.8
## 3rd Qu.:85.00 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :97.00 Max. :9.000 Max. :31.0
##
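Unlike summary(), mean() does not skip missing values unless asked. A small illustration, first on a toy vector and then on the Ozone column:

```r
x <- c(4, NA, 8)
mean(x)                 # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)   # 6

mean(airquality$Ozone)                # NA
mean(airquality$Ozone, na.rm = TRUE)  # about 42.13, matching summary()
```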
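# Mean imputation: replace missing Ozone values with the mean of the observed values
m <- mean(data$Ozone, na.rm = TRUE)
data$Ozone[is.na(data$Ozone)] <- m
sum(is.na(data$Ozone)) # confirm no missing values remain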
## [1] 0
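Mean imputation keeps the column mean unchanged but shrinks its spread, since every filled value sits exactly at the mean. The sketch below checks this on a fresh copy of airquality and shows one possible alternative (an illustrative choice, not the only option): filling each month's gaps with that month's own mean via ave():

```r
aq <- airquality
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
sd_before  <- sd(aq$Ozone, na.rm = TRUE)

aq$Ozone[is.na(aq$Ozone)] <- ozone_mean
sd_after <- sd(aq$Ozone)
sd_after < sd_before   # TRUE: imputed values add no spread

# Alternative: fill gaps with the mean of the same month
aq2 <- airquality
aq2$Ozone <- ave(aq2$Ozone, aq2$Month,
                 FUN = function(v) ifelse(is.na(v), mean(v, na.rm = TRUE), v))
sum(is.na(aq2$Ozone))  # 0
```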