How to standardize rows when cleaning data

+1 vote
How to standardize rows when cleaning data?
Nov 14, 2018 in Data Analytics by Ali
• 11,360 points
817 views

1 answer to this question.

0 votes

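The goal of this step is to make sure that 1) every row has the same number of fields and 2) the fields are in the right order. In read.table (when fill = TRUE), lines that contain fewer fields than the maximum number of fields detected are padded with NA, which assumes that any missing fields are the trailing ones. One advantage of the do-it-yourself approach shown here is that we do not have to make this assumption. The easiest way to standardize rows is to write a function that takes a single character vector (the fields of one record) as input and assigns the values in the right order. The function below assumes each record holds a name, a birth year and a death year, and uses 1890 as the cut-off that separates the two years.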

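assignFields <- function(x){
  # x: character vector holding the fields of one record
  out <- character(3)
  # convert once; fields that are not numbers become NA
  num <- suppressWarnings(as.numeric(x))
  # get name: first field that contains a letter (NA if none)
  i <- grepl("[[:alpha:]]", x)
  out[1] <- x[i][1]
  # get birth date (if any): a year before 1890
  i <- which(num < 1890)
  out[2] <- if (length(i) > 0) x[i[1]] else NA
  # get death date (if any): a year after 1890
  i <- which(num > 1890)
  out[3] <- if (length(i) > 0) x[i[1]] else NA
  out
}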
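
To see the function in use, here is a minimal sketch that splits a few raw lines, standardizes each one and binds the results into a data frame. The sample records and the names rawLines, fieldList, standardFields, M and people are made up for illustration and are not from the question. The sketch repeats a compact version of assignFields so that it runs on its own.

```r
# Hypothetical sample records: fields come in varying order and some are missing.
rawLines <- c("Alice,1850,1910", "Bob,1905", "1871,Carol,1937")

# Compact restatement of assignFields (same 1890 cut-off as above).
assignFields <- function(x) {
  out <- rep(NA_character_, 3)
  # Non-numeric fields become NA; the coercion warning is expected, so silence it.
  num <- suppressWarnings(as.numeric(x))
  out[1] <- x[grepl("[[:alpha:]]", x)][1]  # name: first field containing a letter
  out[2] <- x[which(num < 1890)][1]        # birth year: first value below 1890
  out[3] <- x[which(num > 1890)][1]        # death year: first value above 1890
  out
}

# Split every line into its fields, then standardize each record.
fieldList <- strsplit(rawLines, split = ",")
standardFields <- lapply(fieldList, assignFields)

# Every element now has length 3, so the list can be bound row by row.
M <- matrix(unlist(standardFields), nrow = length(standardFields), byrow = TRUE)
colnames(M) <- c("name", "birth", "death")

# Convert to a data frame and turn the year columns into numbers.
people <- as.data.frame(M, stringsAsFactors = FALSE)
people$birth <- as.numeric(people$birth)
people$death <- as.numeric(people$death)
print(people)
```

Because each standardized record has exactly three entries, missing values land in the correct column rather than at the end of the row. Note that a year equal to 1890 matches neither test and would be left as NA.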
answered Nov 14, 2018 by Maverick
• 10,840 points
