Descriptive analytics in R

A study on data types

R has six atomic types, five of which are in everyday use: numeric (stored as double, for numbers with decimals), integer (whole numbers), character (text), logical (TRUE/FALSE), and complex. The sixth, raw, holds bytes and is rarely needed in analysis. These building blocks are organized into structures such as vectors and matrices for homogeneous data, or lists and data frames for heterogeneous data. To check or convert types during analysis, use core functions such as class(), typeof(), and as.numeric().

# Numeric (Double): Default for numbers with decimals
my_num <- 42.5
class(my_num)   # Output: "numeric"
## [1] "numeric"
typeof(my_num)  # Output: "double"
## [1] "double"
# typeof() reports the low-level storage type that R uses in memory. class() reports the high-level type that other functions use to decide how to treat the object.

# Integer: Whole numbers (add 'L' suffix)
my_int <- 7L
class(my_int)   # Output: "integer"
## [1] "integer"
# Character: Text data in quotes
my_char <- "Data Science 2026"
class(my_char)  # Output: "character"
## [1] "character"
# Logical: Boolean values
my_bool <- TRUE
class(my_bool)  # Output: "logical"
## [1] "logical"
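
The remaining commonly used type, complex, and the difference between class() and typeof() on a structured object can be sketched as follows. The names my_complex and df are illustrative and do not appear elsewhere in this tutorial.

```r
# Complex: numbers with an imaginary part
my_complex <- 3 + 2i
class(my_complex)   # "complex"

# class() and typeof() can differ: a data frame is stored internally as a list
df <- data.frame(x = 1:3)
class(df)           # "data.frame"
typeof(df)          # "list"

# An integer has class "integer" but still counts as numeric
is.numeric(7L)      # TRUE
```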

Data structures

R organizes atomic types into multi-element structures.

# Vector: Homogeneous collection (same type)
student_grades <- c(85, 92, 78, 88)
is.vector(student_grades)
## [1] TRUE
# List: Heterogeneous collection (different types)
student_info <- list(name = "Alice", age = 21, active = TRUE)
str(student_info)
## List of 3
##  $ name  : chr "Alice"
##  $ age   : num 21
##  $ active: logi TRUE
# Data Frame: Tabular data (rows and columns)
# Columns can be different types, but must be the same length
class_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(95, 82, 88),
  Passed = c(TRUE, TRUE, TRUE)
)
head(class_data)
##      Name Score Passed
## 1   Alice    95   TRUE
## 2     Bob    82   TRUE
## 3 Charlie    88   TRUE
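
Matrices are mentioned above but not shown. The sketch below builds one and also illustrates what "homogeneous" means in practice: when types are mixed inside c(), R silently converts every element to a single common type. The names score_matrix and mixed are illustrative.

```r
# Matrix: homogeneous, two-dimensional (filled column by column)
score_matrix <- matrix(1:6, nrow = 2, ncol = 3)
dim(score_matrix)      # 2 3
typeof(score_matrix)   # "integer"

# Mixing types in a vector coerces everything to the most flexible type
mixed <- c(1, "a", TRUE)
class(mixed)           # "character"
mixed                  # "1" "a" "TRUE"

# A list keeps each element's own type
kept <- list(1, "a", TRUE)
sapply(kept, class)    # "numeric" "character" "logical"
```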

Type checking & conversion

Essential functions for verifying and changing data types.

# Checking types
is.numeric(my_num)    # Returns TRUE
## [1] TRUE
is.character(my_num)  # Returns FALSE
## [1] FALSE
# Coercion (Converting types)
as.character(my_num)  # Converts 42.5 to "42.5"
## [1] "42.5"
as.numeric("100")     # Converts "100" to 100
## [1] 100
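
Coercion does not always succeed, and some conversions lose information. A few edge cases worth knowing, as a minimal sketch (suppressWarnings() is used here only to keep the output quiet):

```r
# Text that is not a number becomes NA (R also issues a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)            # TRUE

# as.integer() truncates toward zero; it does not round
as.integer(3.9)       # 3

# Logical values convert to 1 and 0
as.numeric(TRUE)      # 1
as.numeric(FALSE)     # 0

# A number typed without the L suffix is a double, not an integer
is.integer(7)         # FALSE
is.integer(7L)        # TRUE
```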

Loading and writing data

R ships with several built-in datasets. Type data("") with the dataset name between the quotes to load it into your programming environment. It is then available as a data frame and can be viewed in the Environment tab.

data("airquality")

Let’s write this data to a CSV file. R operates from a folder designated as the working directory. You can set the working directory from the Files tab or with the setwd("") command. Files inside the working directory can be referred to without their full path. The working directory is also the default location for saved files.

setwd("D:/Anand_documents")
write.csv(airquality,"Air_quality_data.csv")
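
If you would rather not change the working directory, getwd() shows where R is currently operating and file.path() builds a full path for a single file. A minimal sketch, assuming tempdir() as a stand-in for your own folder (out_dir and out_file are illustrative names):

```r
getwd()                                   # Shows the current working directory

out_dir  <- tempdir()                     # Stand-in folder; substitute your own
out_file <- file.path(out_dir, "Air_quality_data.csv")

write.csv(airquality, out_file)           # Full path given, so setwd() is not needed
file.exists(out_file)                     # TRUE
```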

Now let's remove all data from the environment: click the broom icon in the Environment tab. Press Ctrl+L to clear the console. Then read the data back from the CSV file with read.csv("").

data<-read.csv("Air_quality_data.csv")
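
By default write.csv() also writes the row names, and read.csv() returns them as an extra first column named X. If that column is not wanted, row.names = FALSE suppresses it. A sketch using a temporary file (tmp_file and air are illustrative names):

```r
tmp_file <- tempfile(fileext = ".csv")

# Write without row names, then read the file back
write.csv(airquality, tmp_file, row.names = FALSE)
air <- read.csv(tmp_file)

colnames(air)   # "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
ncol(air)       # 6
```

The rest of this tutorial keeps the default behaviour, so the X column is present in the output that follows.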

Knowing more about your data

You can learn more about the structure of your data with commands such as str(), head(), tail(), and colnames().

str(data)
## 'data.frame':    153 obs. of  7 variables:
##  $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
head(data)
##   X Ozone Solar.R Wind Temp Month Day
## 1 1    41     190  7.4   67     5   1
## 2 2    36     118  8.0   72     5   2
## 3 3    12     149 12.6   74     5   3
## 4 4    18     313 11.5   62     5   4
## 5 5    NA      NA 14.3   56     5   5
## 6 6    28      NA 14.9   66     5   6
tail(data)
##       X Ozone Solar.R Wind Temp Month Day
## 148 148    14      20 16.6   63     9  25
## 149 149    30     193  6.9   70     9  26
## 150 150    NA     145 13.2   77     9  27
## 151 151    14     191 14.3   75     9  28
## 152 152    18     131  8.0   76     9  29
## 153 153    20     223 11.5   68     9  30
colnames(data)
## [1] "X"       "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"
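
A few more base R functions answer common first questions about a data frame: how big it is, what type each column has, and which values a column takes. The sketch below runs on the built-in airquality data directly, so it has six columns rather than the seven in the CSV read above.

```r
dim(airquality)              # 153 6 (rows, columns)
nrow(airquality)             # 153
ncol(airquality)             # 6

# Class of every column at once
col_classes <- sapply(airquality, class)
col_classes

# Distinct values of a column
unique(airquality$Month)     # 5 6 7 8 9
```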

Let's rename some columns and check whether any data is missing.

colnames(data)[c(1, 3, 4)] <- c("Sl.No", "Solar", "Wind")
# is.na(data)          # Checks whether each element is a missing value
colSums(is.na(data))   # Total missing values in each column
## Sl.No Ozone Solar  Wind  Temp Month   Day 
##     0    37     7     0     0     0     0
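
Besides counting missing values per column, it is often useful to know how many rows are fully observed. A minimal sketch on the built-in airquality data, using complete.cases() and na.omit() from base R (complete_rows, air_complete, and pct_missing are illustrative names):

```r
# TRUE for each row that has no missing value in any column
complete_rows <- complete.cases(airquality)
sum(complete_rows)                     # Number of fully observed rows

# Drop every row that contains at least one NA
air_complete <- na.omit(airquality)
nrow(air_complete)

# Share of missing values per column, in percent
pct_missing <- colMeans(is.na(airquality)) * 100
round(pct_missing, 1)
```

Dropping rows is the simplest option, but it discards the observed values in those rows as well.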

Handling missing data

Measures such as the mean are affected by missing data: by default, mean() returns NA if even one value is missing. We need to remove or replace the missing values to get a usable result.
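
The effect is easy to see on the Ozone column of the built-in airquality data. A short sketch (ozone is an illustrative name):

```r
ozone <- airquality$Ozone

mean(ozone)                  # NA, because the column contains missing values
mean(ozone, na.rm = TRUE)    # About 42.13, computed on the observed values only
```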

summary(data)  # Computes statistics on non-missing values and reports the NA count per column
##      Sl.No         Ozone            Solar            Wind       
##  Min.   :  1   Min.   :  1.00   Min.   :  7.0   Min.   : 1.700  
##  1st Qu.: 39   1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400  
##  Median : 77   Median : 31.50   Median :205.0   Median : 9.700  
##  Mean   : 77   Mean   : 42.13   Mean   :185.9   Mean   : 9.958  
##  3rd Qu.:115   3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500  
##  Max.   :153   Max.   :168.00   Max.   :334.0   Max.   :20.700  
##                NA's   :37       NA's   :7                       
##       Temp           Month            Day      
##  Min.   :56.00   Min.   :5.000   Min.   : 1.0  
##  1st Qu.:72.00   1st Qu.:6.000   1st Qu.: 8.0  
##  Median :79.00   Median :7.000   Median :16.0  
##  Mean   :77.88   Mean   :6.993   Mean   :15.8  
##  3rd Qu.:85.00   3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :97.00   Max.   :9.000   Max.   :31.0  
## 
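# Replace missing Ozone values with the mean of the observed values
m <- mean(data$Ozone, na.rm = TRUE)
data$Ozone[is.na(data$Ozone)] <- m
sum(is.na(data$Ozone))  # Confirm that no missing values remain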
## [1] 0
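
Mean imputation keeps the column mean unchanged but makes the data look less variable than it is, because every filled-in value sits exactly at the centre. The sketch below checks both effects on the built-in airquality data and shows the median as one possible alternative fill value. It is an illustration, not a recommendation for any particular dataset, and the variable names are illustrative.

```r
ozone      <- airquality$Ozone
ozone_mean <- mean(ozone, na.rm = TRUE)

# Fill missing values with the mean of the observed values
imputed <- ifelse(is.na(ozone), ozone_mean, ozone)

all.equal(mean(imputed), ozone_mean)     # TRUE: the mean is preserved
sd(imputed) < sd(ozone, na.rm = TRUE)    # TRUE: the spread shrinks

# Alternative: the median is less sensitive to extreme values
ozone_median   <- median(ozone, na.rm = TRUE)
imputed_median <- ifelse(is.na(ozone), ozone_median, ozone)
sum(is.na(imputed_median))               # 0
```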