Fill NAs in R with zero if the next valid data point is more than 2 intervals away


I have multiple vectors with NAs, and my intention is to fill NAs that are more than 2 intervals from a valid data point with 0. For example:

x <- c(3, 4, NA, NA, NA, 3, 3)

Expected output is:

3, 4, NA, 0, NA, 3, 3

















Asked by John Musau; edited by markus.
























          6 Answers


















            Answer by Rui Barradas:














          Maybe there are simpler solutions but this one works.



          na2zero <- function(x){
          ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
          if(anyNA(y)){
          if(length(y) > 2) y[-c(1, length(y))] <- 0
          }
          y
          })
          }

          na2zero(x)
          #[1] 3 4 NA 0 NA 3 3

          X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
          lapply(X, na2zero)
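
            To see what the grouping expression inside ave() does, here is a small added illustration (not part of the original answer): for the example vector, each run of NAs and each run of non-NAs gets its own group id, much like data.table::rleid() would produce.

                x <- c(3, 4, NA, NA, NA, 3, 3)
                # is.na(x) flips at positions 3 and 6, so the cumulative sum of |diff| bumps the id there
                cumsum(abs(c(0, diff(is.na(x)))))
                # [1] 0 0 1 1 1 2 2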




































            Answer by Shree:














            Update -



            Here's probably one of the simplest and fastest solutions (thanks to G. Grothendieck's answer). Simply knowing whether the values on either side of an NA are themselves NA is sufficient information. Therefore, using lead and lag from the dplyr package -



            na2zero <- function(x) {
            x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
            x
            }

            na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 NA 3 3

            na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 0 0 NA 3 3

            na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
            [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA
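
            As an added illustration (not from the original answer), these are the three logical masks that get ANDed together for the example vector; only position 4 is an NA with an NA on both sides. lag() and lead() here are the dplyr versions, with 0 as the default fill so the endpoints never count as NA.

                library(dplyr)
                x <- c(3, 4, NA, NA, NA, 3, 3)
                is.na(lag(x, 1, 0))   # [1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE   <- NA to the left?
                is.na(x)              # [1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE   <- NA here?
                is.na(lead(x, 1, 0))  # [1] FALSE TRUE TRUE TRUE FALSE FALSE FALSE   <- NA to the right?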




            Previous Answer (also fast) -



            Here's one way using rle and replace from base R. This method turns every NA that is not an endpoint of its run into a 0 -



            na2zero <- function(x) {
            run_lengths <- rle(is.na(x))$lengths
            replace(x,
            sequence(run_lengths) != 1 &
            sequence(run_lengths) != rep(run_lengths, run_lengths) &
            is.na(x),
            0)
            }

            na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 NA 3 3

            na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 0 0 NA 3 3
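
            For reference, the intermediate vectors used by the condition above look like this for the example input (an added illustration):

                x <- c(3, 4, NA, NA, NA, 3, 3)
                run_lengths <- rle(is.na(x))$lengths  # [1] 2 3 2
                sequence(run_lengths)                 # [1] 1 2 1 2 3 1 2  position within each run
                rep(run_lengths, run_lengths)         # [1] 2 2 3 3 3 2 2  length of the run each element is in
                # an NA is zeroed only if it is neither the first nor the last element of its run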




            Updated Benchmarks -



            set.seed(2)
            x <- c(3, 4, NA, NA, NA, 3, 3)
            x <- sample(x, 1e5, T)

            microbenchmark(
            Rui(x),
            Shree_old(x), Shree_new(x),
            markus(x),
            IceCreamT(x),
            Uwe1(x), Uwe2(x), Uwe_Reduce(x),
            Grothendieck(x),
            times = 50
            )

            all.equal(Shree_new(x), Rui(x)) # [1] TRUE
            all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
            all.equal(Shree_new(x), markus(x)) # [1] TRUE
            all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
            all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
            all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
            all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


            Unit: milliseconds
            expr min lq mean median uq max neval
            Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
            Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
            Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
            markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
            IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
            Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
            Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
            Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
            Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50































            • While this works for the provided example, I don't know that it gives the desired behavior for, say, x <- c(3, 4, NA, NA, NA, NA, 3, 3). – Mako212
            • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3, which seems correct. Am I missing something? – Shree
            • Not sure; the way I read the OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it. – Mako212
            • Nah, the first 0 in your output is > 2 away from the nearest valid point. – Shree
            • The question says more than 2 intervals from a valid data point. – Mako212



















            Answer by IceCreamToucan:














            Here's a data.table option



            library(data.table)

            na0_dt <- function(x){
            replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
            }
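
            To make the indexing easier to follow, here are the intermediate vectors for the example input (an illustration added here, not part of the original answer):

                library(data.table)
                x <- c(3, 4, NA, NA, NA, 3, 3)
                r <- rleid(is.na(x))   # [1] 1 1 2 2 2 3 3  one id per run of NA / non-NA values
                rowid(r)               # [1] 1 2 1 2 3 1 2  position counted from the start of each run
                rev(rowid(rev(r)))     # [1] 2 1 3 2 1 2 1  position counted from the end of each run
                # an element is zeroed when it is NA and both positions are > 1, i.e. an interior NA of its run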


            Benchmark and equality comparison.



            x <- c(3, 4, NA, NA, NA, 3, 3)
            x <- sample(x, 1e5, T)

            all.equal(Rui(x), na0_dt(x))
            # [1] TRUE
            all.equal(Rui(x), markus(x))
            # [1] TRUE
            all.equal(Rui(x), Shree(x))
            # [1] TRUE
            all.equal(Rui(x), Uwe1(x))
            # [1] TRUE
            all.equal(Rui(x), Uwe2(x))
            # [1] TRUE

            library(microbenchmark)
            microbenchmark(
            Rui(x),
            Shree(x),
            markus(x),
            na0_dt(x),
            Uwe1(x),
            Uwe2(x),
            times = 50
            )
            # Unit: milliseconds
            # expr min lq mean median uq max neval cld
            # Rui(x) 262.80052 286.75909 323.24804 300.73305 344.75091 488.45805 50 d
            # Shree(x) 55.30956 64.63756 82.52067 72.70157 85.11101 367.10743 50 b
            # markus(x) 807.05592 882.74498 1007.31244 958.71506 1075.13111 1590.34609 50 e
            # na0_dt(x) 14.95426 16.75980 19.37040 18.53108 19.99180 38.43161 50 a
            # Uwe1(x) 164.20027 178.59412 210.47365 198.76931 224.98389 327.70639 50 c
            # Uwe2(x) 85.33296 97.42117 110.56380 102.55040 115.19759 192.73526 50 b


            Again with length 1e3



            x <- c(3, 4, NA, NA, NA, 3, 3)
            x <- sample(x, 1e3, T)

            microbenchmark(
            Rui(x),
            Shree(x),
            markus(x),
            na0_dt(x),
            Uwe1(x),
            Uwe2(x),
            times = 50
            )
            # Unit: microseconds
            # expr min lq mean median uq max neval cld
            # Rui(x) 2543.180 2668.719 3084.0056 2847.386 3286.566 4820.105 50 bc
            # Shree(x) 530.462 584.206 651.1348 600.821 653.949 1170.052 50 a
            # markus(x) 7036.721 7604.517 9467.0319 8507.491 11014.979 24400.421 50 d
            # na0_dt(x) 1008.001 1092.513 1179.9802 1160.001 1247.591 1564.308 50 a
            # Uwe1(x) 2907.078 3190.976 3773.5566 3424.207 4162.873 9388.723 50 c
            # Uwe2(x) 2299.489 2566.155 2803.8250 2703.386 2996.104 4038.155 50 b


            Functions from other answers used in this benchmark



            Rui <- function(x){
            ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
            if(anyNA(y)){
            if(length(y) > 2) y[-c(1, length(y))] <- 0
            }
            y
            })
            }
            Shree <- function(x) {
            run_lengths <- rle(is.na(x))$lengths
            replace(x,
            sequence(run_lengths) != 1 &
            sequence(run_lengths) != rep(run_lengths, run_lengths) &
            is.na(x),
            0)
            }
            markus <- function(x, gap = 1) {

            stopifnot(gap >= 0)

            if (gap == 0) {
            x[is.na(x)] <- 0
            x } else {
            grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
            tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
            replace(x, tmp > gap, 0)
            }
            }
            na0_dt <- function(x){
            replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
            }
            Uwe1 <- function(x){
            data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x
            }
            Uwe2 <- function(x){
            x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
            x
            }


























            • Interestingly, my base R solution seems faster for x of length 1e3. I have no idea how data.table works, but it seems like it has some initial overhead after which it just takes off! Therefore, the longer the vector, the more it can compensate. – Shree
            • That's interesting; I added that case to the answer. I thought maybe the flipping of speeds would be because of the [.data.table overhead, but it persists even after rewriting my function to avoid [.data.table. – IceCreamToucan
            • Your benchmark makes me feel I should downvote my answer. Tried it, but it's not possible, unfortunately. – markus
            • Ha, that wasn't my intention. Even for a vector of length 1e5 the times are listed in milliseconds, so the timing isn't all that important for most cases, probably. – IceCreamToucan





















            Answer by Uwe:














            For the sake of completeness, here are three other data.table approaches:



            x <- c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L))

            library(data.table)
            data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x



            [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA



            x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
            x



            [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA
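
            The per-group logic of both one-liners is the same: within each rleid group, which(is.na(x)) lists the NA positions and [-c(1L, .N)] drops the first and last of them, so only interior NAs of a run are replaced (for a non-NA group the index vector is empty and nothing changes). A minimal standalone sketch of that idea, with length(y) standing in for .N because .N only exists inside a data.table call:

                y <- c(NA, NA, NA, NA)                             # one group consisting only of NAs
                replace(y, which(is.na(y))[-c(1L, length(y))], 0)  # drop the first and last NA index, zero the rest
                # [1] NA  0  0 NA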




            shift() & Reduce()



            I was so focused on finding the right way to create groups that I started to think about the straightforward approach rather late. The rule is quite simple:




            Replace by zero all NAs which are preceded and succeeded by another NA.




            This can be accomplished by zoo::rollapply() as in G. Grothendieck's answer or by using lag() & lead() like in Shree's latest edit.



            However, my own benchmark (not posted here to avoid duplication with Shree's benchmark) shows that data.table::shift() and Reduce() is the fastest method so far.



              isnax <- is.na(x) 
            x[Reduce(`&`, data.table::shift(isnax, -1:1))] <- 0
            x
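
            For illustration (this snippet is not from the original answer), this is what the reduced mask looks like for a short example vector; fill = FALSE is used here only so that the endpoints print as FALSE rather than NA:

                library(data.table)
                x0 <- c(3, 4, NA, NA, NA, 3, 3)   # short example vector, separate from x above
                m  <- is.na(x0)
                # shift(m, -1:1) returns the lead, the original, and the lag of the NA mask;
                # Reduce(`&`, ...) ANDs them elementwise, so TRUE marks an NA with an NA on both sides
                Reduce(`&`, shift(m, -1:1, fill = FALSE))
                # [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE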


            It is also slightly faster than using lag() & lead() (please note that this differs from Shree's version, as is.na() is only called once):



              isnax <- is.na(x) 
            x[isnax & dplyr::lag(isnax) & dplyr::lead(isnax)] <- 0
            x































            • Indeed, it is faster. Nice! I'll add it to my benchmarks. – Shree
            • @Shree, thank you for benchmarking all the different solutions. BTW: I have switched from microbenchmark to bench for doing benchmarks because it makes it easy to vary problem sizes and to create charts. (I decided not to post the charts because you had already taken on the burden of doing all the benchmarks.) – Uwe



















            Answer by G. Grothendieck:














            Based on the example, I assume what you mean is that if the value is NA and the adjacent values in both directions are NA (or in one direction if the value is first or last), then replace the value with 0. Use a centered rolling window of length 3 to return TRUE if the window is all NA, and then replace the TRUE positions with 0. This gives the following one-liner:



            library(zoo)

            replace(x, rollapply(c(TRUE, is.na(x), TRUE), 3, all), 0)
            ## [1] 3 4 NA 0 NA 3 3
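
            One thing worth illustrating (an added note, not part of the original answer) is how the TRUE padding handles runs that touch the ends of the vector: a leading or trailing NA only needs an NA neighbour on the one side it has, exactly as the rule above states. A made-up example:

                library(zoo)
                x2 <- c(NA, NA, 3)
                replace(x2, rollapply(c(TRUE, is.na(x2), TRUE), 3, all), 0)
                # [1]  0 NA  3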































            • zoo has some pretty handy functions to work with missing values. – PavoDive



















            Answer by markus:














            Another base approach



            x <- c(3, 4, NA, NA, NA, 3, 3, NA, 3, NA, NA, NA, NA, 1)


            Create a grouping variable



            grp <- with(rle(is.na(x)), rep(seq_along(lengths), lengths)) # same as rleid(is.na(x))


            For each group, calculate the parallel minimum of cumsum(is.na(x)) and its reverse (which will be greater than one for values that are 'more than 2 intervals from a valid data point').



            tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
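
            For the example vector defined at the top of this answer, the two helper vectors come out as follows (an added illustration; tmp is, within each NA run, the distance of the element from the nearer end of the run, and 0 elsewhere):

                grp
                # [1] 1 1 2 2 2 3 3 4 5 6 6 6 6 7
                tmp
                # [1] 0 0 1 2 1 0 0 1 0 1 2 2 1 0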


            Finally use this as an identifier to replace desired values in x



            replace(x, tmp > 1, 0)
            # [1] 3 4 NA 0 NA 3 3 NA 3 NA 0 0 NA 1




            Written as a function



            f <- function(x, gap = 1) {

            stopifnot(gap >= 0)

            if (gap == 0) {
            x[is.na(x)] <- 0
            x } else {
            grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
            tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
            replace(x, tmp > gap, 0)
            }
            }
            f(x)



































              6 Answers
              6






              active

              oldest

              votes








              6 Answers
              6






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              2














              Maybe there are simpler solutions but this one works.



              na2zero <- function(x){
              ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
              if(anyNA(y)){
              if(length(y) > 2) y[-c(1, length(y))] <- 0
              }
              y
              })
              }

              na2zero(x)
              #[1] 3 4 NA 0 NA 3 3

              X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
              lapply(X, na2zero)





              share|improve this answer




























                2














                Maybe there are simpler solutions but this one works.



                na2zero <- function(x){
                ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                if(anyNA(y)){
                if(length(y) > 2) y[-c(1, length(y))] <- 0
                }
                y
                })
                }

                na2zero(x)
                #[1] 3 4 NA 0 NA 3 3

                X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
                lapply(X, na2zero)





                share|improve this answer


























                  2












                  2








                  2







                  Maybe there are simpler solutions but this one works.



                  na2zero <- function(x){
                  ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                  if(anyNA(y)){
                  if(length(y) > 2) y[-c(1, length(y))] <- 0
                  }
                  y
                  })
                  }

                  na2zero(x)
                  #[1] 3 4 NA 0 NA 3 3

                  X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
                  lapply(X, na2zero)





                  share|improve this answer













                  Maybe there are simpler solutions but this one works.



                  na2zero <- function(x){
                  ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                  if(anyNA(y)){
                  if(length(y) > 2) y[-c(1, length(y))] <- 0
                  }
                  y
                  })
                  }

                  na2zero(x)
                  #[1] 3 4 NA 0 NA 3 3

                  X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
                  lapply(X, na2zero)






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 9 hours ago









                  Rui BarradasRui Barradas

                  20.2k6 gold badges19 silver badges35 bronze badges




                  20.2k6 gold badges19 silver badges35 bronze badges

























                      6














                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50





                      share|improve this answer


























                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago
















                      6














                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50





                      share|improve this answer


























                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago














                      6












                      6








                      6







                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50





                      share|improve this answer















                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited 4 hours ago

























                      answered 9 hours ago









                      ShreeShree

                      5,0371 gold badge7 silver badges26 bronze badges




                      5,0371 gold badge7 silver badges26 bronze badges













                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago



















                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago

















                      While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                      – Mako212
                      9 hours ago





                      While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                      – Mako212
                      9 hours ago













                      @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                      – Shree
                      9 hours ago





                      @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                      – Shree
                      9 hours ago




                      1




                      1





                      Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                      – Mako212
                      9 hours ago





                      Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                      – Mako212
                      9 hours ago













                      Nah, the first 0 in your output is > 2 away from nearest valid point.

                      – Shree
                      9 hours ago





                      Nah, the first 0 in your output is > 2 away from nearest valid point.

                      – Shree
                      9 hours ago




                      1




                      1





                      The question says more than 2 intervals from a valid data point

                      – Mako212
                      9 hours ago





                      The question says more than 2 intervals from a valid data point

                      – Mako212
                      9 hours ago











                      4














                      Here's a data.table option



                      library(data.table)

                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }


                      Benchmark and equality comparison.



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      all.equal(Rui(x), na0_dt(x))
                      # [1] TRUE
                      all.equal(Rui(x), markus(x))
                      # [1] TRUE
                      all.equal(Rui(x), Shree(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe1(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe2(x))
                      # [1] TRUE

                      library(microbenchmark)
                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: milliseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 262.80052 286.75909 323.24804 300.73305 344.75091 488.45805 50 d
                      # Shree(x) 55.30956 64.63756 82.52067 72.70157 85.11101 367.10743 50 b
                      # markus(x) 807.05592 882.74498 1007.31244 958.71506 1075.13111 1590.34609 50 e
                      # na0_dt(x) 14.95426 16.75980 19.37040 18.53108 19.99180 38.43161 50 a
                      # Uwe1(x) 164.20027 178.59412 210.47365 198.76931 224.98389 327.70639 50 c
                      # Uwe2(x) 85.33296 97.42117 110.56380 102.55040 115.19759 192.73526 50 b


                      Again with length 1e3



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e3, T)

                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: microseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 2543.180 2668.719 3084.0056 2847.386 3286.566 4820.105 50 bc
                      # Shree(x) 530.462 584.206 651.1348 600.821 653.949 1170.052 50 a
                      # markus(x) 7036.721 7604.517 9467.0319 8507.491 11014.979 24400.421 50 d
                      # na0_dt(x) 1008.001 1092.513 1179.9802 1160.001 1247.591 1564.308 50 a
                      # Uwe1(x) 2907.078 3190.976 3773.5566 3424.207 4162.873 9388.723 50 c
                      # Uwe2(x) 2299.489 2566.155 2803.8250 2703.386 2996.104 4038.155 50 b


                      Functions from other answers used in this benchmark



                      Rui <- function(x){
                      ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                      if(anyNA(y)){
                      if(length(y) > 2) y[-c(1, length(y))] <- 0
                      }
                      y
                      })
                      }
                      Shree <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }
                      markus <- function(x, gap = 1) {

                      stopifnot(gap >= 0)

                      if (gap == 0) {
                      x[is.na(x)] <- 0
                      x } else {
                      grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
                      tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
                      replace(x, tmp > gap, 0)
                      }
                      }
                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }
                      Uwe1 <- function(x){
                      data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x
                      }
                      Uwe2 <- function(x){
                      x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
                      x
                      }





                      share|improve this answer





















                      • 1





                        Interestingly, my base R solution seems faster for x of length 1e3. I have no idea how data.table works but seems like it has some initial overhead after which it just takes off! Therefore longer the vector more it can compensate.

                        – Shree
                        7 hours ago






                      • 1





                        That's interesting, I added that case to the answer. I thought maybe the flipping-of-speeds would be because of the [.data.table overhead but it persists even after rewriting my function to avoid [.data.table

                        – IceCreamToucan
                        7 hours ago













                      • Your benchmark makes me feel I should downvote my answer. Tried it, but it's not possibly unfortunately.

                        – markus
                        7 hours ago











                      • Ha, that wasn't my intention. Even for a vector of length 1e5 the times are listed in milliseconds, so the timing isn't all that important for most cases probably.

                        – IceCreamToucan
                        7 hours ago


















                      4














                      Here's a data.table option



                      library(data.table)

                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }


                      Benchmark and equality comparison.



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      all.equal(Rui(x), na0_dt(x))
                      # [1] TRUE
                      all.equal(Rui(x), markus(x))
                      # [1] TRUE
                      all.equal(Rui(x), Shree(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe1(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe2(x))
                      # [1] TRUE

                      library(microbenchmark)
                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: milliseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 262.80052 286.75909 323.24804 300.73305 344.75091 488.45805 50 d
                      # Shree(x) 55.30956 64.63756 82.52067 72.70157 85.11101 367.10743 50 b
                      # markus(x) 807.05592 882.74498 1007.31244 958.71506 1075.13111 1590.34609 50 e
                      # na0_dt(x) 14.95426 16.75980 19.37040 18.53108 19.99180 38.43161 50 a
                      # Uwe1(x) 164.20027 178.59412 210.47365 198.76931 224.98389 327.70639 50 c
                      # Uwe2(x) 85.33296 97.42117 110.56380 102.55040 115.19759 192.73526 50 b


                      Again with length 1e3



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e3, T)

                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: microseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 2543.180 2668.719 3084.0056 2847.386 3286.566 4820.105 50 bc
                      # Shree(x) 530.462 584.206 651.1348 600.821 653.949 1170.052 50 a
                      # markus(x) 7036.721 7604.517 9467.0319 8507.491 11014.979 24400.421 50 d
                      # na0_dt(x) 1008.001 1092.513 1179.9802 1160.001 1247.591 1564.308 50 a
                      # Uwe1(x) 2907.078 3190.976 3773.5566 3424.207 4162.873 9388.723 50 c
                      # Uwe2(x) 2299.489 2566.155 2803.8250 2703.386 2996.104 4038.155 50 b


                      Functions from other answers used in this benchmark



                      Rui <- function(x){
                      ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                      if(anyNA(y)){
                      if(length(y) > 2) y[-c(1, length(y))] <- 0
                      }
                      y
                      })
                      }
                      Shree <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }
                      markus <- function(x, gap = 1) {

                      stopifnot(gap >= 0)

                      if (gap == 0) {
                      x[is.na(x)] <- 0
                      x } else {
                      grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
                      tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
                      replace(x, tmp > gap, 0)
                      }
                      }
                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }
                      Uwe1 <- function(x){
                      data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x
                      }
                      Uwe2 <- function(x){
                      x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
                      x
                      }





                      share|improve this answer





















• Interestingly, my base R solution seems faster for x of length 1e3. I have no idea how data.table works, but it seems to have some initial overhead after which it just takes off! So the longer the vector, the more that overhead is compensated for. – Shree, 7 hours ago

• That's interesting, I added that case to the answer. I thought maybe the flipping of speeds would be because of the [.data.table overhead, but it persists even after rewriting my function to avoid [.data.table. – IceCreamToucan, 7 hours ago

• Your benchmark makes me feel I should downvote my answer. Tried it, but it's not possible, unfortunately. – markus, 7 hours ago

• Ha, that wasn't my intention. Even for a vector of length 1e5 the times are listed in milliseconds, so the timing isn't all that important for most cases. – IceCreamToucan, 7 hours ago





































                      For the sake of completeness, here are three other data.table approaches:



                      x <- c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L))

                      library(data.table)
                      data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x



                      [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA



                      x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
                      x



                      [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA




                      shift() & Reduce()



                      I was so focused on finding the right way to create groups that I started to think about the straightforward approach rather late. The rule is quite simple:




Replace with zero all NAs that are preceded and succeeded by another NA.




                      This can be accomplished by zoo::rollapply() as in G. Grothendieck's answer or by using lag() & lead() like in Shree's latest edit.



However, my own benchmark (not posted here to avoid duplication with Shree's benchmark) shows that combining data.table::shift() and Reduce() is the fastest method so far.



                        isnax <- is.na(x) 
                      x[Reduce(`&`, data.table::shift(isnax, -1:1))] <- 0
                      x


It is also slightly faster than using lag() & lead() (please note that this differs from Shree's version, as is.na() is only called once):



                        isnax <- is.na(x) 
                      x[isnax & dplyr::lag(isnax) & dplyr::lead(isnax)] <- 0
                      x
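Should the rule ever need a wider window (zero out NAs that are more than k positions from a valid value in both directions), the same shift()/Reduce() idea appears to generalise. A minimal sketch, assuming a symmetric window and treating positions beyond the ends as non-NA; k here is a hypothetical threshold, with k = 1 reproducing the rule in the question:

k <- 2                    # hypothetical wider threshold; k = 1 matches the original rule
isnax <- is.na(x)
# fill = FALSE treats out-of-range neighbours as non-NA, so NAs near the ends are kept
x[Reduce(`&`, data.table::shift(isnax, -k:k, fill = FALSE))] <- 0
x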





– Uwe (answered 8 hours ago, edited 4 hours ago)


























• Indeed, it is faster. Nice! I'll add it to my benchmarks. – Shree, 4 hours ago

• @Shree, thank you for benchmarking all the different solutions. BTW: I have switched from microbenchmark to bench for doing benchmarks, because it makes it easy to vary problem sizes and to create charts. (I decided not to post the charts because you already took on the burden of doing all the benchmarks.) – Uwe, 4 hours ago
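As a follow-up to Uwe's comment, a size-varying benchmark along those lines might look roughly like this. This is only a sketch, assuming the functions from the benchmark answer above (Shree, na0_dt, Uwe2) are already defined in the session; bench is the package Uwe mentions:

library(bench)

results <- bench::press(
  n = c(1e3, 1e4, 1e5),
  {
    x <- sample(c(3, 4, NA, NA, NA, 3, 3), n, replace = TRUE)
    bench::mark(Shree(x), na0_dt(x), Uwe2(x), check = TRUE)
  }
)
# ggplot2::autoplot(results)   # one way to chart the timings across sizes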






























Based on the example, I assume what you mean is: if a value is NA and the adjacent values in both directions are also NA (or in one direction, if the value is first or last), then replace the value with 0. Using a centered rolling window of length 3, return TRUE if the window is all NA, and then replace the TRUE positions with 0. This gives the following one-liner



                      library(zoo)

                      replace(x, rollapply(c(TRUE, is.na(x), TRUE), 3, all), 0)
                      ## [1] 3 4 NA 0 NA 3 3
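Since the question mentions multiple vectors, the same one-liner can presumably be wrapped and applied column-wise. A small sketch, assuming the vectors are the (numeric) columns of a hypothetical data frame df:

library(zoo)

# wrap the one-liner so it can be reused per vector
fill_na0 <- function(v) replace(v, rollapply(c(TRUE, is.na(v), TRUE), 3, all), 0)

# df is a hypothetical data frame whose numeric columns are the vectors to clean
df[] <- lapply(df, fill_na0)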





– G. Grothendieck (answered 7 hours ago, edited 7 hours ago)


























• zoo has some pretty handy functions to work with missing values. – PavoDive, 7 hours ago






























                      Another base approach



                      x <- c(3, 4, NA, NA, NA, 3, 3, NA, 3, NA, NA, NA, NA, 1)


                      Create a grouping variable



                      grp <- with(rle(is.na(x)), rep(seq_along(lengths), lengths)) # same as rleid(is.na(x))


For each group, calculate the parallel minimum of cumsum(is.na(x)) and its reverse (this will be greater than one for values that are more than 2 intervals away from a valid data point)



                      tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))


                      Finally use this as an identifier to replace desired values in x



                      replace(x, tmp > 1, 0)
                      # [1] 3 4 NA 0 NA 3 3 NA 3 NA 0 0 NA 1
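To make the intermediate steps concrete, here are the values of grp and tmp for this example vector (worked out by hand):

# x:    3  4 NA NA NA  3  3 NA  3 NA NA NA NA  1
# grp:  1  1  2  2  2  3  3  4  5  6  6  6  6  7
# tmp:  0  0  1  2  1  0  0  1  0  1  2  2  1  0
# tmp > 1 is TRUE only at positions 4, 11 and 12, which are exactly the NAs replaced by 0.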




                      Written as a function



f <- function(x, gap = 1) {
  stopifnot(gap >= 0)
  if (gap == 0) {
    x[is.na(x)] <- 0
    x
  } else {
    grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
    tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
    replace(x, tmp > gap, 0)
  }
}
f(x)





– markus (answered 9 hours ago, edited 8 hours ago)





























