1 Exploratory Data Analysis Exercise:

Download the customer data and read it into R.

customer <- read.csv(file = "https://xiaoruizhu.github.io/Data-Mining-R/lecture/data/CustomerData.csv")
  1. How many rows and columns does the dataset have?
  2. Print the first few rows of the dataset.
  3. Obtain the summary statistics (Min, Median, Max, Mean, and Std.) for Age, EducationYears, HHIncome, and CreditDebt.
  4. Obtain the mean of HHIncome by MaritalStatus.
  5. Obtain a pivot table of LoanDefault vs. JobCategory. Which job category has the highest loan default rate, and which has the lowest?
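One possible way to approach these steps in base R is sketched below. To keep it runnable offline, it uses a small made-up data frame named toy (hypothetical) in place of the downloaded customer data. The column names come from the exercise, but the values and the coding of MaritalStatus, LoanDefault, and JobCategory are assumptions, so the real file may need adjustments (for example, na.rm = TRUE if values are missing).

```r
# Toy stand-in for the customer data; all values are made up for illustration.
toy <- data.frame(
  Age            = c(25, 40, 33, 58),
  EducationYears = c(12, 16, 14, 18),
  HHIncome       = c(30, 80, 55, 120),
  CreditDebt     = c(1.2, 3.5, 0.8, 2.1),
  MaritalStatus  = c("Married", "Unmarried", "Married", "Unmarried"),
  LoanDefault    = c("Yes", "No", "No", "No"),
  JobCategory    = c("Sales", "Service", "Sales", "Professional")
)

dim(toy)    # number of rows and columns
head(toy)   # first few rows

# Min, median, max, mean, and standard deviation for the four numeric variables
num_vars <- c("Age", "EducationYears", "HHIncome", "CreditDebt")
stats <- sapply(toy[, num_vars], function(x) {
  c(Min = min(x), Median = median(x), Max = max(x), Mean = mean(x), Std = sd(x))
})
stats

# Mean of HHIncome within each MaritalStatus group
income_by_status <- aggregate(HHIncome ~ MaritalStatus, data = toy, FUN = mean)
income_by_status

# Cross-tabulate JobCategory (rows) against LoanDefault (columns),
# then convert counts to row proportions to get a default rate per job category
default_tab  <- table(toy$JobCategory, toy$LoanDefault)
default_rate <- prop.table(default_tab, margin = 1)
default_rate
```

Replacing toy with customer should apply the same calls to the real data, provided the columns are coded as assumed here.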

2 Data Manipulation Exercise:

  1. Obtain a dataset “iris_select” that drops the first and second columns by using dataname[, “variable_index”].
  2. Create a new variable Sepal_LW equal to the ratio of sepal length to sepal width (without using mutate()).
  3. How do you get only those variables that contain missing values?
  4. Randomly sample a training data set that contains 80% of the original data points.
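A minimal base R sketch of these four steps follows, using the built-in iris data. Since iris has no missing values, the sketch injects two NAs into a copy purely for illustration. The names iris_na, iris_na_vars, iris_train, and iris_test, and the seed value, are assumptions rather than part of the exercise.

```r
data(iris)

# 1. Drop the first and second columns with a negative column index
iris_select <- iris[, -c(1, 2)]

# 2. Add the ratio of sepal length to sepal width without mutate()
iris$Sepal_LW <- iris$Sepal.Length / iris$Sepal.Width

# 3. Keep only the variables that contain missing values.
#    iris is complete, so a copy with two injected NAs is used here.
iris_na <- iris
iris_na[c(3, 10), "Petal.Length"] <- NA
has_na <- colSums(is.na(iris_na)) > 0
iris_na_vars <- iris_na[, has_na, drop = FALSE]

# 4. Randomly sample 80% of the rows as training data; the rest is held out
set.seed(1)  # arbitrary seed, only for reproducibility
train_idx  <- sample(nrow(iris), size = round(0.8 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]
```

The drop = FALSE argument keeps the result a data frame even when only one column has missing values.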
