R Programming: Finding items with an exceptional sequence

0 votes

My data is in the following format:

ID  Timestamp   Status
1   1/1/2014    1
2   1/1/2014    1
3   1/2/2014    1
4   1/3/2014    1
1   1/3/2014    2
3   1/3/2014    2
4   1/5/2014    2
5   1/5/2014    1
1   1/6/2014    3
2   1/7/2014    3
3   1/8/2014    3
4   1/9/2014    3
5   1/10/2014   2
6   1/10/2014   1
3   1/10/2014   4
3   1/10/2014   5
3   1/10/2014   6
1   1/11/2014   4
2   1/11/2014   3
3   1/11/2014   4
3   1/11/2014   2
5   1/11/2014   3
6   1/12/2014   4
7   1/12/2014   5
5   1/12/2014   6
4   1/12/2014   7
2   1/13/2014   3
3   1/13/2014   4
1   1/14/2014   5
5   1/14/2014   3
6   1/14/2014   4
1   1/15/2014   6
1   1/16/2014   7

Every ID must pass through statuses 1 to 7 in order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7

Because of data entry errors, however, the sequence is broken for some IDs.

In the data above, only ID 1 has the correct status history: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7

How can I check, for every ID, whether its status history follows this order?

Feb 27, 2019 in Data Analytics by Tyrion anex
• 8,700 points
771 views

1 answer to this question.

0 votes

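Here is some code that checks the sequence. First, recreate the data. (The Timestamp values in this dump are day counts that print as 2020 dates rather than 2014; the relative order is the same, so the result is unaffected.)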

dd<-structure(list(ID = c(1L, 2L, 3L, 4L, 1L, 3L, 4L, 5L, 1L, 2L, 
3L, 4L, 5L, 6L, 3L, 3L, 3L, 1L, 2L, 3L, 3L, 5L, 6L, 7L, 5L, 4L, 
2L, 3L, 1L, 5L, 6L, 1L, 1L), Timestamp = structure(c(18262, 18262, 
18263, 18264, 18264, 18264, 18266, 18266, 18267, 18268, 18269, 
18270, 18271, 18271, 18271, 18271, 18271, 18272, 18272, 18272, 
18272, 18272, 18273, 18273, 18273, 18273, 18274, 18274, 18275, 
18275, 18275, 18276, 18277), class = "Date"), Status = c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 4L, 5L, 6L, 
4L, 3L, 4L, 2L, 3L, 4L, 5L, 6L, 7L, 3L, 4L, 5L, 3L, 4L, 6L, 7L
)), .Names = c("ID", "Timestamp", "Status"), row.names = c(NA, 
-33L), class = "data.frame")
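If the data starts out as text like the table in the question, the same kind of data frame can be built by parsing it. This is only a minimal sketch: the name dd_txt is made up for illustration, and only the first five rows of the table are included. The format string uses %Y (four-digit year) so that the dates land in 2014.

```r
# First five rows of the question's table, as plain text
txt <- "ID Timestamp Status
1 1/1/2014 1
2 1/1/2014 1
3 1/2/2014 1
4 1/3/2014 1
1 1/3/2014 2"

# Timestamp is read as character first, then converted to Date
dd_txt <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
dd_txt$Timestamp <- as.Date(dd_txt$Timestamp, format = "%m/%d/%Y")

str(dd_txt)
```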

Next, define a helper function that tests whether an ID has all seven statuses, starting at 1, with every consecutive difference equal to 1:

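# TRUE only when x is exactly 1, 2, ..., 7 in that order
isgoodseq <- function(x) {
    # && short-circuits, so the later checks are skipped when the length is wrong
    length(x) == 7 && all(diff(x) == 1) && min(x) == 1
}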
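To see what the helper accepts and rejects, it can be tried on a few hand-written vectors. This is just an illustrative sketch; the helper is repeated so the snippet runs on its own, and the variable names r1 to r4 are made up here.

```r
isgoodseq <- function(x) {
    length(x) == 7 && all(diff(x) == 1) && min(x) == 1
}

r1 <- isgoodseq(1:7)                          # complete and in order
r2 <- isgoodseq(c(1, 2, 3, 7))                # too short, jumps from 3 to 7
r3 <- isgoodseq(c(1, 2, 3, 4, 5, 6, 2, 4, 4)) # too long, goes backwards
r4 <- isgoodseq(c(1, 3, 2, 4, 5, 6, 7))       # all seven values, wrong order

print(c(r1, r2, r3, r4))
```

Only the first call should come back TRUE; the other three each fail at least one of the three conditions.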

Then sort the rows by timestamp and apply the helper to each ID's statuses:

with(dd[order(dd$Timestamp, dd$ID, dd$Status),], 
    tapply(Status, ID, isgoodseq))

This is the output:

    1     2     3     4     5     6     7 
 TRUE FALSE FALSE FALSE FALSE FALSE FALSE 

So ID 1 is the only ID with a valid status sequence.
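If you also want the IDs themselves, or want to see what went wrong for the failing ones, something like the following may help. It is a sketch, not part of the original answer: the data frame is rebuilt compactly with 2014 dates so the snippet is self-contained, and the names ord, ok, good_ids, bad_ids and seqs are made up for illustration.

```r
isgoodseq <- function(x) {
    length(x) == 7 && all(diff(x) == 1) && min(x) == 1
}

# Same 33 rows as in the question; dates written as offsets from 1/1/2014
dd <- data.frame(
    ID = c(1, 2, 3, 4, 1, 3, 4, 5, 1, 2, 3, 4, 5, 6, 3, 3, 3,
           1, 2, 3, 3, 5, 6, 7, 5, 4, 2, 3, 1, 5, 6, 1, 1),
    Timestamp = as.Date("2014-01-01") +
        c(0, 0, 1, 2, 2, 2, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9,
          10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 13, 14, 15),
    Status = c(1, 1, 1, 1, 2, 2, 2, 1, 3, 3, 3, 3, 2, 1, 4, 5, 6,
               4, 3, 4, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 4, 6, 7)
)

# Sort once, then reuse the sorted rows for both summaries
ord <- dd[order(dd$Timestamp, dd$ID, dd$Status), ]
ok  <- tapply(ord$Status, ord$ID, isgoodseq)

good_ids <- names(ok)[ok]    # IDs with a valid history
bad_ids  <- names(ok)[!ok]   # IDs that need a closer look

# Observed status history per ID, as readable strings
seqs <- tapply(ord$Status, ord$ID, paste, collapse = " -> ")
print(good_ids)
print(seqs[bad_ids])
```

One thing to keep in mind: because the timestamps are dates only, Status is used as a tiebreaker when sorting, so several statuses entered on the same day for one ID will always appear in ascending order. Same-day misordering cannot be detected this way unless a finer-grained timestamp is available.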

answered Feb 27, 2019 by Sophie may
• 10,620 points
