Fill NAs in R with zero if the next valid data point is more than 2 intervals away


I have multiple vectors with NAs, and my intention is to fill NAs that are more than 2 intervals from a valid data point with 0. For example:

x <- c(3, 4, NA, NA, NA, 3, 3)

Expected output is:

3, 4, NA, 0, NA, 3, 3

















Asked by John Musau; edited by markus.
























          6 Answers


















            Answer by Rui Barradas:














          Maybe there are simpler solutions but this one works.



          na2zero <- function(x){
          ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
          if(anyNA(y)){
          if(length(y) > 2) y[-c(1, length(y))] <- 0
          }
          y
          })
          }

          na2zero(x)
          #[1] 3 4 NA 0 NA 3 3

          X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
          lapply(X, na2zero)
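
            To see what the grouping expression inside ave() does, here is a small added illustration (not part of the original answer): for the example vector, each run of NAs and each run of non-NAs gets its own group id, much like data.table::rleid() would produce.

                x <- c(3, 4, NA, NA, NA, 3, 3)
                # is.na(x) flips at positions 3 and 6, so the cumulative sum of |diff| bumps the id there
                cumsum(abs(c(0, diff(is.na(x)))))
                # [1] 0 0 1 1 1 2 2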




































            Answer by Shree:














            Update -



            Here's probably one of the simplest and fastest solutions (thanks to G. Grothendieck's answer). Simply knowing whether the values on either side of an NA are themselves NA is sufficient information. Therefore, using lead and lag from the dplyr package -



            na2zero <- function(x) {
            x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
            x
            }

            na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 NA 3 3

            na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 0 0 NA 3 3

            na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
            [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA
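
            As an added illustration (not from the original answer), these are the three logical masks that get ANDed together for the example vector; only position 4 is an NA with an NA on both sides. lag() and lead() here are the dplyr versions, with 0 as the default fill so the endpoints never count as NA.

                library(dplyr)
                x <- c(3, 4, NA, NA, NA, 3, 3)
                is.na(lag(x, 1, 0))   # [1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE   <- NA to the left?
                is.na(x)              # [1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE   <- NA here?
                is.na(lead(x, 1, 0))  # [1] FALSE TRUE TRUE TRUE FALSE FALSE FALSE   <- NA to the right?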




            Previous Answer (also fast) -



            Here's one way using rle and replace from base R. This method turns every NA that is not an endpoint of its run into a 0 -



            na2zero <- function(x) {
            run_lengths <- rle(is.na(x))$lengths
            replace(x,
            sequence(run_lengths) != 1 &
            sequence(run_lengths) != rep(run_lengths, run_lengths) &
            is.na(x),
            0)
            }

            na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 NA 3 3

            na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
            [1] 3 4 NA 0 0 0 NA 3 3
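
            For reference, the intermediate vectors used by the condition above look like this for the example input (an added illustration):

                x <- c(3, 4, NA, NA, NA, 3, 3)
                run_lengths <- rle(is.na(x))$lengths  # [1] 2 3 2
                sequence(run_lengths)                 # [1] 1 2 1 2 3 1 2  position within each run
                rep(run_lengths, run_lengths)         # [1] 2 2 3 3 3 2 2  length of the run each element is in
                # an NA is zeroed only if it is neither the first nor the last element of its run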




            Updated Benchmarks -



            set.seed(2)
            x <- c(3, 4, NA, NA, NA, 3, 3)
            x <- sample(x, 1e5, T)

            microbenchmark(
            Rui(x),
            Shree_old(x), Shree_new(x),
            markus(x),
            IceCreamT(x),
            Uwe1(x), Uwe2(x), Uwe_Reduce(x),
            Grothendieck(x),
            times = 50
            )

            all.equal(Shree_new(x), Rui(x)) # [1] TRUE
            all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
            all.equal(Shree_new(x), markus(x)) # [1] TRUE
            all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
            all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
            all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
            all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


            Unit: milliseconds
            expr min lq mean median uq max neval
            Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
            Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
            Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
            markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
            IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
            Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
            Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
            Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
            Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50































            • While this works for the provided example, I don't know that it gives the desired behavior for, say, x <- c(3, 4, NA, NA, NA, NA, 3, 3). – Mako212
            • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3, which seems correct. Am I missing something? – Shree
            • Not sure; the way I read the OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it. – Mako212
            • Nah, the first 0 in your output is > 2 away from the nearest valid point. – Shree
            • The question says more than 2 intervals from a valid data point. – Mako212



















            Answer by IceCreamToucan:














            Here's a data.table option



            library(data.table)

            na0_dt <- function(x){
            replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
            }
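
            To make the indexing easier to follow, here are the intermediate vectors for the example input (an illustration added here, not part of the original answer):

                library(data.table)
                x <- c(3, 4, NA, NA, NA, 3, 3)
                r <- rleid(is.na(x))   # [1] 1 1 2 2 2 3 3  one id per run of NA / non-NA values
                rowid(r)               # [1] 1 2 1 2 3 1 2  position counted from the start of each run
                rev(rowid(rev(r)))     # [1] 2 1 3 2 1 2 1  position counted from the end of each run
                # an element is zeroed when it is NA and both positions are > 1, i.e. an interior NA of its run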


            Benchmark and equality comparison.



            x <- c(3, 4, NA, NA, NA, 3, 3)
            x <- sample(x, 1e5, T)

            all.equal(Rui(x), na0_dt(x))
            # [1] TRUE
            all.equal(Rui(x), markus(x))
            # [1] TRUE
            all.equal(Rui(x), Shree(x))
            # [1] TRUE
            all.equal(Rui(x), Uwe1(x))
            # [1] TRUE
            all.equal(Rui(x), Uwe2(x))
            # [1] TRUE

            library(microbenchmark)
            microbenchmark(
            Rui(x),
            Shree(x),
            markus(x),
            na0_dt(x),
            Uwe1(x),
            Uwe2(x),
            times = 50
            )
            # Unit: milliseconds
            # expr min lq mean median uq max neval cld
            # Rui(x) 262.80052 286.75909 323.24804 300.73305 344.75091 488.45805 50 d
            # Shree(x) 55.30956 64.63756 82.52067 72.70157 85.11101 367.10743 50 b
            # markus(x) 807.05592 882.74498 1007.31244 958.71506 1075.13111 1590.34609 50 e
            # na0_dt(x) 14.95426 16.75980 19.37040 18.53108 19.99180 38.43161 50 a
            # Uwe1(x) 164.20027 178.59412 210.47365 198.76931 224.98389 327.70639 50 c
            # Uwe2(x) 85.33296 97.42117 110.56380 102.55040 115.19759 192.73526 50 b


            Again with length 1e3



            x <- c(3, 4, NA, NA, NA, 3, 3)
            x <- sample(x, 1e3, T)

            microbenchmark(
            Rui(x),
            Shree(x),
            markus(x),
            na0_dt(x),
            Uwe1(x),
            Uwe2(x),
            times = 50
            )
            # Unit: microseconds
            # expr min lq mean median uq max neval cld
            # Rui(x) 2543.180 2668.719 3084.0056 2847.386 3286.566 4820.105 50 bc
            # Shree(x) 530.462 584.206 651.1348 600.821 653.949 1170.052 50 a
            # markus(x) 7036.721 7604.517 9467.0319 8507.491 11014.979 24400.421 50 d
            # na0_dt(x) 1008.001 1092.513 1179.9802 1160.001 1247.591 1564.308 50 a
            # Uwe1(x) 2907.078 3190.976 3773.5566 3424.207 4162.873 9388.723 50 c
            # Uwe2(x) 2299.489 2566.155 2803.8250 2703.386 2996.104 4038.155 50 b


            Functions from other answers used in this benchmark



            Rui <- function(x){
            ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
            if(anyNA(y)){
            if(length(y) > 2) y[-c(1, length(y))] <- 0
            }
            y
            })
            }
            Shree <- function(x) {
            run_lengths <- rle(is.na(x))$lengths
            replace(x,
            sequence(run_lengths) != 1 &
            sequence(run_lengths) != rep(run_lengths, run_lengths) &
            is.na(x),
            0)
            }
            markus <- function(x, gap = 1) {

            stopifnot(gap >= 0)

            if (gap == 0) {
            x[is.na(x)] <- 0
            x } else {
            grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
            tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
            replace(x, tmp > gap, 0)
            }
            }
            na0_dt <- function(x){
            replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
            }
            Uwe1 <- function(x){
            data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x
            }
            Uwe2 <- function(x){
            x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
            x
            }


























            • Interestingly, my base R solution seems faster for x of length 1e3. I have no idea how data.table works, but it seems like it has some initial overhead after which it just takes off! Therefore, the longer the vector, the more it can compensate. – Shree
            • That's interesting; I added that case to the answer. I thought maybe the flipping of speeds would be because of the [.data.table overhead, but it persists even after rewriting my function to avoid [.data.table. – IceCreamToucan
            • Your benchmark makes me feel I should downvote my answer. Tried it, but it's not possible, unfortunately. – markus
            • Ha, that wasn't my intention. Even for a vector of length 1e5 the times are listed in milliseconds, so the timing isn't all that important for most cases, probably. – IceCreamToucan





















            Answer by Uwe:














            For the sake of completeness, here are three other data.table approaches:



            x <- c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L))

            library(data.table)
            data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x



            [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA



            x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
            x



            [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA
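
            The per-group logic of both one-liners is the same: within each rleid group, which(is.na(x)) lists the NA positions and [-c(1L, .N)] drops the first and last of them, so only interior NAs of a run are replaced (for a non-NA group the index vector is empty and nothing changes). A minimal standalone sketch of that idea, with length(y) standing in for .N because .N only exists inside a data.table call:

                y <- c(NA, NA, NA, NA)                             # one group consisting only of NAs
                replace(y, which(is.na(y))[-c(1L, length(y))], 0)  # drop the first and last NA index, zero the rest
                # [1] NA  0  0 NA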




            shift() & Reduce()



            I was so focused on finding the right way to create groups that I started to think about the straightforward approach rather late. The rule is quite simple:




            Replace by zero all NAs which are preceded and succeeded by another NA.




            This can be accomplished by zoo::rollapply() as in G. Grothendieck's answer or by using lag() & lead() like in Shree's latest edit.



            However, my own benchmark (not posted here to avoid duplication with Shree's benchmark) shows that data.table::shift() and Reduce() is the fastest method so far.



              isnax <- is.na(x) 
            x[Reduce(`&`, data.table::shift(isnax, -1:1))] <- 0
            x
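
            For illustration (this snippet is not from the original answer), this is what the reduced mask looks like for a short example vector; fill = FALSE is used here only so that the endpoints print as FALSE rather than NA:

                library(data.table)
                x0 <- c(3, 4, NA, NA, NA, 3, 3)   # short example vector, separate from x above
                m  <- is.na(x0)
                # shift(m, -1:1) returns the lead, the original, and the lag of the NA mask;
                # Reduce(`&`, ...) ANDs them elementwise, so TRUE marks an NA with an NA on both sides
                Reduce(`&`, shift(m, -1:1, fill = FALSE))
                # [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE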


            It is also slightly faster than using lag() & lead() (please note that this differs from Shree's version, as is.na() is only called once):



              isnax <- is.na(x) 
            x[isnax & dplyr::lag(isnax) & dplyr::lead(isnax)] <- 0
            x































            • Indeed, it is faster. Nice! I'll add it to my benchmarks. – Shree
            • @Shree, thank you for benchmarking all the different solutions. BTW: I have switched from microbenchmark to bench for doing benchmarks because it makes it easy to vary problem sizes and to create charts. (I decided not to post the charts because you had already taken on the burden of doing all the benchmarks.) – Uwe



















            Answer by G. Grothendieck:














            Based on the example, I assume what you mean is that if the value is NA and the adjacent values in both directions are NA (or in one direction if the value is first or last), then replace the value with 0. Use a centered rolling window of length 3 to return TRUE if the window is all NA, and then replace the TRUE positions with 0. This gives the following one-liner:



            library(zoo)

            replace(x, rollapply(c(TRUE, is.na(x), TRUE), 3, all), 0)
            ## [1] 3 4 NA 0 NA 3 3
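
            One thing worth illustrating (an added note, not part of the original answer) is how the TRUE padding handles runs that touch the ends of the vector: a leading or trailing NA only needs an NA neighbour on the one side it has, exactly as the rule above states. A made-up example:

                library(zoo)
                x2 <- c(NA, NA, 3)
                replace(x2, rollapply(c(TRUE, is.na(x2), TRUE), 3, all), 0)
                # [1]  0 NA  3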































            • zoo has some pretty handy functions to work with missing values. – PavoDive



















            Answer by markus:














            Another base approach



            x <- c(3, 4, NA, NA, NA, 3, 3, NA, 3, NA, NA, NA, NA, 1)


            Create a grouping variable



            grp <- with(rle(is.na(x)), rep(seq_along(lengths), lengths)) # same as rleid(is.na(x))


            For each group, calculate the parallel minimum of cumsum(is.na(x)) and its reverse (which will be greater than one for values that are 'more than 2 intervals from a valid data point').



            tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
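
            For the example vector defined at the top of this answer, the two helper vectors come out as follows (an added illustration; tmp is, within each NA run, the distance of the element from the nearer end of the run, and 0 elsewhere):

                grp
                # [1] 1 1 2 2 2 3 3 4 5 6 6 6 6 7
                tmp
                # [1] 0 0 1 2 1 0 0 1 0 1 2 2 1 0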


            Finally use this as an identifier to replace desired values in x



            replace(x, tmp > 1, 0)
            # [1] 3 4 NA 0 NA 3 3 NA 3 NA 0 0 NA 1




            Written as a function



            f <- function(x, gap = 1) {

            stopifnot(gap >= 0)

            if (gap == 0) {
            x[is.na(x)] <- 0
            x } else {
            grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
            tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
            replace(x, tmp > gap, 0)
            }
            }
            f(x)



































              6 Answers
              6






              active

              oldest

              votes








              6 Answers
              6






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              2














              Maybe there are simpler solutions but this one works.



              na2zero <- function(x){
              ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
              if(anyNA(y)){
              if(length(y) > 2) y[-c(1, length(y))] <- 0
              }
              y
              })
              }

              na2zero(x)
              #[1] 3 4 NA 0 NA 3 3

              X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
              lapply(X, na2zero)





              share|improve this answer




























                2














                Maybe there are simpler solutions but this one works.



                na2zero <- function(x){
                ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                if(anyNA(y)){
                if(length(y) > 2) y[-c(1, length(y))] <- 0
                }
                y
                })
                }

                na2zero(x)
                #[1] 3 4 NA 0 NA 3 3

                X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
                lapply(X, na2zero)





                share|improve this answer


























                  2












                  2








                  2







                  Maybe there are simpler solutions but this one works.



                  na2zero <- function(x){
                  ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                  if(anyNA(y)){
                  if(length(y) > 2) y[-c(1, length(y))] <- 0
                  }
                  y
                  })
                  }

                  na2zero(x)
                  #[1] 3 4 NA 0 NA 3 3

                  X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
                  lapply(X, na2zero)





                  share|improve this answer













                  Maybe there are simpler solutions but this one works.



                  na2zero <- function(x){
                  ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                  if(anyNA(y)){
                  if(length(y) > 2) y[-c(1, length(y))] <- 0
                  }
                  y
                  })
                  }

                  na2zero(x)
                  #[1] 3 4 NA 0 NA 3 3

                  X <- list(x, c(x, x), c(3, 4, NA, NA, NA, NA, 3, 3))
                  lapply(X, na2zero)






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 9 hours ago









                  Rui BarradasRui Barradas

                  20.2k6 gold badges19 silver badges35 bronze badges




                  20.2k6 gold badges19 silver badges35 bronze badges

























                      6














                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50





                      share|improve this answer


























                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago
















                      6














                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50





                      share|improve this answer


























                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago














                      6












                      6








                      6







                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50





                      share|improve this answer















                      Update -



                      Here's probably one of the simplest and fastest solutions (Thanks to answer from G. Grothendieck). Simply knowing whether the value is NA on either side of any NA is sufficient information. Therefore, using lead and lag from dplyr package -



                      na2zero <- function(x) {
                      x[is.na(lag(x, 1, 0)) & is.na(lead(x, 1, 0)) & is.na(x)] <- 0
                      x
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L)))
                      [1] 3 4 NA 0 NA 3 3 NA NA 1 NA 0 0 NA 0 0 NA




                      Previous Answer (also fast) -



                      Here's one way using rle and replace from base R. This method turns every NA, that is not an endpoint in the running length, into a 0 -



                      na2zero <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }

                      na2zero(x = c(3, 4, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 NA 3 3

                      na2zero(x = c(3, 4, NA, NA, NA, NA, NA, 3, 3))
                      [1] 3 4 NA 0 0 0 NA 3 3




                      Updated Benchmarks -



                      set.seed(2)
                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      microbenchmark(
                      Rui(x),
                      Shree_old(x), Shree_new(x),
                      markus(x),
                      IceCreamT(x),
                      Uwe1(x), Uwe2(x), Uwe_Reduce(x),
                      Grothendieck(x),
                      times = 50
                      )

                      all.equal(Shree_new(x), Rui(x)) # [1] TRUE
                      all.equal(Shree_new(x), Shree_old(x)) # [1] TRUE
                      all.equal(Shree_new(x), markus(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe1(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe2(x)) # [1] TRUE
                      all.equal(Shree_new(x), Uwe_Reduce(x)) # [1] TRUE
                      all.equal(Shree_new(x), Grothendieck(x)) # [1] TRUE


                      Unit: milliseconds
                      expr min lq mean median uq max neval
                      Rui(x) 286.026540 307.586604 342.620266 318.404731 363.844258 518.03330 50
                      Shree_old(x) 51.556489 62.038875 85.348031 65.012384 81.882141 327.57514 50
                      Shree_new(x) 3.996918 4.258248 17.210709 6.298946 10.335142 207.14732 50
                      markus(x) 853.513854 885.419719 1001.450726 919.930389 1018.353847 1642.25435 50
                      IceCreamT(x) 12.162079 13.773873 22.555446 15.021700 21.271498 199.08993 50
                      Uwe1(x) 162.536980 183.566490 225.801038 196.882049 269.020395 439.17737 50
                      Uwe2(x) 83.582360 93.136277 115.608342 99.165997 115.376903 309.67290 50
                      Uwe_Reduce(x) 1.732195 1.871940 4.215195 2.016815 4.842883 25.91542 50
                      Grothendieck(x) 620.814291 688.107779 767.749387 746.699435 850.442643 982.49094 50






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited 4 hours ago

























                      answered 9 hours ago









                      ShreeShree

                      5,0371 gold badge7 silver badges26 bronze badges




                      5,0371 gold badge7 silver badges26 bronze badges













                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago



















                      • While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                        – Mako212
                        9 hours ago











                      • @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                        – Shree
                        9 hours ago






                      • 1





                        Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                        – Mako212
                        9 hours ago











                      • Nah, the first 0 in your output is > 2 away from nearest valid point.

                        – Shree
                        9 hours ago






                      • 1





                        The question says more than 2 intervals from a valid data point

                        – Mako212
                        9 hours ago

















                      While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                      – Mako212
                      9 hours ago





                      While this works for the provided example, I don't know that it gives the desired behavior for say, x <- c(3, 4, NA, NA, NA, NA, 3, 3).

                      – Mako212
                      9 hours ago













                      @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                      – Shree
                      9 hours ago





                      @Mako212 It gives [1] 3 4 NA NA 0 NA 3 3 which seems correct. Am I missing something?

                      – Shree
                      9 hours ago




                      1




                      1





                      Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                      – Mako212
                      9 hours ago





                      Not sure, the way I read OP's question I'd expect [1] 3 4 NA 0 0 NA 3 3 to be the desired result, but that's just how I interpreted it.

                      – Mako212
                      9 hours ago













                      Nah, the first 0 in your output is > 2 away from nearest valid point.

                      – Shree
                      9 hours ago





                      Nah, the first 0 in your output is > 2 away from nearest valid point.

                      – Shree
                      9 hours ago




                      1




                      1





                      The question says more than 2 intervals from a valid data point

                      – Mako212
                      9 hours ago





                      The question says more than 2 intervals from a valid data point

                      – Mako212
                      9 hours ago











                      4














                      Here's a data.table option



                      library(data.table)

                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }


                      Benchmark and equality comparison.



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      all.equal(Rui(x), na0_dt(x))
                      # [1] TRUE
                      all.equal(Rui(x), markus(x))
                      # [1] TRUE
                      all.equal(Rui(x), Shree(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe1(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe2(x))
                      # [1] TRUE

                      library(microbenchmark)
                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: milliseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 262.80052 286.75909 323.24804 300.73305 344.75091 488.45805 50 d
                      # Shree(x) 55.30956 64.63756 82.52067 72.70157 85.11101 367.10743 50 b
                      # markus(x) 807.05592 882.74498 1007.31244 958.71506 1075.13111 1590.34609 50 e
                      # na0_dt(x) 14.95426 16.75980 19.37040 18.53108 19.99180 38.43161 50 a
                      # Uwe1(x) 164.20027 178.59412 210.47365 198.76931 224.98389 327.70639 50 c
                      # Uwe2(x) 85.33296 97.42117 110.56380 102.55040 115.19759 192.73526 50 b


                      Again with length 1e3



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e3, T)

                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: microseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 2543.180 2668.719 3084.0056 2847.386 3286.566 4820.105 50 bc
                      # Shree(x) 530.462 584.206 651.1348 600.821 653.949 1170.052 50 a
                      # markus(x) 7036.721 7604.517 9467.0319 8507.491 11014.979 24400.421 50 d
                      # na0_dt(x) 1008.001 1092.513 1179.9802 1160.001 1247.591 1564.308 50 a
                      # Uwe1(x) 2907.078 3190.976 3773.5566 3424.207 4162.873 9388.723 50 c
                      # Uwe2(x) 2299.489 2566.155 2803.8250 2703.386 2996.104 4038.155 50 b


                      Functions from other answers used in this benchmark



                      Rui <- function(x){
                      ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                      if(anyNA(y)){
                      if(length(y) > 2) y[-c(1, length(y))] <- 0
                      }
                      y
                      })
                      }
                      Shree <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }
                      markus <- function(x, gap = 1) {

                      stopifnot(gap >= 0)

                      if (gap == 0) {
                      x[is.na(x)] <- 0
                      x } else {
                      grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
                      tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
                      replace(x, tmp > gap, 0)
                      }
                      }
                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }
                      Uwe1 <- function(x){
                      data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x
                      }
                      Uwe2 <- function(x){
                      x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
                      x
                      }





                      share|improve this answer





















                      • 1





                        Interestingly, my base R solution seems faster for x of length 1e3. I have no idea how data.table works but seems like it has some initial overhead after which it just takes off! Therefore longer the vector more it can compensate.

                        – Shree
                        7 hours ago






                      • 1





                        That's interesting, I added that case to the answer. I thought maybe the flipping-of-speeds would be because of the [.data.table overhead but it persists even after rewriting my function to avoid [.data.table

                        – IceCreamToucan
                        7 hours ago













                      • Your benchmark makes me feel I should downvote my answer. Tried it, but it's not possibly unfortunately.

                        – markus
                        7 hours ago











                      • Ha, that wasn't my intention. Even for a vector of length 1e5 the times are listed in milliseconds, so the timing isn't all that important for most cases probably.

                        – IceCreamToucan
                        7 hours ago


















                      4














                      Here's a data.table option



                      library(data.table)

                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }


                      Benchmark and equality comparison.



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e5, T)

                      all.equal(Rui(x), na0_dt(x))
                      # [1] TRUE
                      all.equal(Rui(x), markus(x))
                      # [1] TRUE
                      all.equal(Rui(x), Shree(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe1(x))
                      # [1] TRUE
                      all.equal(Rui(x), Uwe2(x))
                      # [1] TRUE

                      library(microbenchmark)
                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: milliseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 262.80052 286.75909 323.24804 300.73305 344.75091 488.45805 50 d
                      # Shree(x) 55.30956 64.63756 82.52067 72.70157 85.11101 367.10743 50 b
                      # markus(x) 807.05592 882.74498 1007.31244 958.71506 1075.13111 1590.34609 50 e
                      # na0_dt(x) 14.95426 16.75980 19.37040 18.53108 19.99180 38.43161 50 a
                      # Uwe1(x) 164.20027 178.59412 210.47365 198.76931 224.98389 327.70639 50 c
                      # Uwe2(x) 85.33296 97.42117 110.56380 102.55040 115.19759 192.73526 50 b


                      Again with length 1e3



                      x <- c(3, 4, NA, NA, NA, 3, 3)
                      x <- sample(x, 1e3, T)

                      microbenchmark(
                      Rui(x),
                      Shree(x),
                      markus(x),
                      na0_dt(x),
                      Uwe1(x),
                      Uwe2(x),
                      times = 50
                      )
                      # Unit: microseconds
                      # expr min lq mean median uq max neval cld
                      # Rui(x) 2543.180 2668.719 3084.0056 2847.386 3286.566 4820.105 50 bc
                      # Shree(x) 530.462 584.206 651.1348 600.821 653.949 1170.052 50 a
                      # markus(x) 7036.721 7604.517 9467.0319 8507.491 11014.979 24400.421 50 d
                      # na0_dt(x) 1008.001 1092.513 1179.9802 1160.001 1247.591 1564.308 50 a
                      # Uwe1(x) 2907.078 3190.976 3773.5566 3424.207 4162.873 9388.723 50 c
                      # Uwe2(x) 2299.489 2566.155 2803.8250 2703.386 2996.104 4038.155 50 b


                      Functions from other answers used in this benchmark



                      Rui <- function(x){
                      ave(x, cumsum(abs(c(0, diff(is.na(x))))), FUN = function(y){
                      if(anyNA(y)){
                      if(length(y) > 2) y[-c(1, length(y))] <- 0
                      }
                      y
                      })
                      }
                      Shree <- function(x) {
                      run_lengths <- rle(is.na(x))$lengths
                      replace(x,
                      sequence(run_lengths) != 1 &
                      sequence(run_lengths) != rep(run_lengths, run_lengths) &
                      is.na(x),
                      0)
                      }
                      markus <- function(x, gap = 1) {

                      stopifnot(gap >= 0)

                      if (gap == 0) {
                      x[is.na(x)] <- 0
                      x } else {
                      grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
                      tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
                      replace(x, tmp > gap, 0)
                      }
                      }
                      na0_dt <- function(x){
                      replace(x, rowid(r <- rleid(xna <- is.na(x))) > 1 & rev(rowid(rev(r))) > 1 & xna, 0)
                      }
                      Uwe1 <- function(x){
                      data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x
                      }
                      Uwe2 <- function(x){
                      x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
                      x
                      }





                      share|improve this answer





















• Interestingly, my base R solution seems faster for x of length 1e3. I have no idea how data.table works, but it seems to have some initial overhead after which it just takes off! So the longer the vector, the more that overhead is compensated for. – Shree, 7 hours ago

• That's interesting, I added that case to the answer. I thought maybe the flipping of speeds would be because of the [.data.table overhead, but it persists even after rewriting my function to avoid [.data.table. – IceCreamToucan, 7 hours ago

• Your benchmark makes me feel I should downvote my answer. Tried it, but it's not possible, unfortunately. – markus, 7 hours ago

• Ha, that wasn't my intention. Even for a vector of length 1e5 the times are listed in milliseconds, so the timing isn't all that important for most cases. – IceCreamToucan, 7 hours ago





































                      For the sake of completeness, here are three other data.table approaches:



                      x <- c(3, 4, NA, NA, NA, 3, 3, NA, NA, 1, NA, 0, 0, rep(NA, 4L))

                      library(data.table)
                      data.table(x)[, x := replace(x, which(is.na(x))[-c(1L, .N)], 0), by =.(rleid(is.na(x)))]$x



                      [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA



                      x[data.table(x)[, .I[is.na(x)][-c(1L, .N)], by =.(rleid(is.na(x)))]$V1] <- 0
                      x



                      [1]  3  4 NA  0 NA  3  3 NA NA  1 NA  0  0 NA  0  0 NA




                      shift() & Reduce()



                      I was so focused on finding the right way to create groups that I started to think about the straightforward approach rather late. The rule is quite simple:




Replace with zero all NAs that are preceded and succeeded by another NA.




                      This can be accomplished by zoo::rollapply() as in G. Grothendieck's answer or by using lag() & lead() like in Shree's latest edit.



However, my own benchmark (not posted here to avoid duplication with Shree's benchmark) shows that combining data.table::shift() and Reduce() is the fastest method so far.



                        isnax <- is.na(x) 
                      x[Reduce(`&`, data.table::shift(isnax, -1:1))] <- 0
                      x


It is also slightly faster than using lag() & lead() (please note that this differs from Shree's version, as is.na() is only called once):



                        isnax <- is.na(x) 
                      x[isnax & dplyr::lag(isnax) & dplyr::lead(isnax)] <- 0
                      x
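Should the rule ever need a wider window (zero out NAs that are more than k positions from a valid value in both directions), the same shift()/Reduce() idea appears to generalise. A minimal sketch, assuming a symmetric window and treating positions beyond the ends as non-NA; k here is a hypothetical threshold, with k = 1 reproducing the rule in the question:

k <- 2                    # hypothetical wider threshold; k = 1 matches the original rule
isnax <- is.na(x)
# fill = FALSE treats out-of-range neighbours as non-NA, so NAs near the ends are kept
x[Reduce(`&`, data.table::shift(isnax, -k:k, fill = FALSE))] <- 0
x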





– Uwe (answered 8 hours ago, edited 4 hours ago)


























• Indeed, it is faster. Nice! I'll add it to my benchmarks. – Shree, 4 hours ago

• @Shree, thank you for benchmarking all the different solutions. BTW: I have switched from microbenchmark to bench for doing benchmarks, because it makes it easy to vary problem sizes and to create charts. (I decided not to post the charts because you already took on the burden of doing all the benchmarks.) – Uwe, 4 hours ago
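As a follow-up to Uwe's comment, a size-varying benchmark along those lines might look roughly like this. This is only a sketch, assuming the functions from the benchmark answer above (Shree, na0_dt, Uwe2) are already defined in the session; bench is the package Uwe mentions:

library(bench)

results <- bench::press(
  n = c(1e3, 1e4, 1e5),
  {
    x <- sample(c(3, 4, NA, NA, NA, 3, 3), n, replace = TRUE)
    bench::mark(Shree(x), na0_dt(x), Uwe2(x), check = TRUE)
  }
)
# ggplot2::autoplot(results)   # one way to chart the timings across sizes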






























Based on the example, I assume what you mean is: if a value is NA and the adjacent values in both directions are also NA (or in one direction, if the value is first or last), then replace the value with 0. Using a centered rolling window of length 3, return TRUE if the window is all NA, and then replace the TRUE positions with 0. This gives the following one-liner



                      library(zoo)

                      replace(x, rollapply(c(TRUE, is.na(x), TRUE), 3, all), 0)
                      ## [1] 3 4 NA 0 NA 3 3
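Since the question mentions multiple vectors, the same one-liner can presumably be wrapped and applied column-wise. A small sketch, assuming the vectors are the (numeric) columns of a hypothetical data frame df:

library(zoo)

# wrap the one-liner so it can be reused per vector
fill_na0 <- function(v) replace(v, rollapply(c(TRUE, is.na(v), TRUE), 3, all), 0)

# df is a hypothetical data frame whose numeric columns are the vectors to clean
df[] <- lapply(df, fill_na0)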





– G. Grothendieck (answered 7 hours ago, edited 7 hours ago)


























• zoo has some pretty handy functions to work with missing values. – PavoDive, 7 hours ago






























                      Another base approach



                      x <- c(3, 4, NA, NA, NA, 3, 3, NA, 3, NA, NA, NA, NA, 1)


                      Create a grouping variable



                      grp <- with(rle(is.na(x)), rep(seq_along(lengths), lengths)) # same as rleid(is.na(x))


For each group, calculate the parallel minimum of cumsum(is.na(x)) and its reverse (this will be greater than one for values that are more than 2 intervals away from a valid data point)



                      tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))


                      Finally use this as an identifier to replace desired values in x



                      replace(x, tmp > 1, 0)
                      # [1] 3 4 NA 0 NA 3 3 NA 3 NA 0 0 NA 1
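To make the intermediate steps concrete, here are the values of grp and tmp for this example vector (worked out by hand):

# x:    3  4 NA NA NA  3  3 NA  3 NA NA NA NA  1
# grp:  1  1  2  2  2  3  3  4  5  6  6  6  6  7
# tmp:  0  0  1  2  1  0  0  1  0  1  2  2  1  0
# tmp > 1 is TRUE only at positions 4, 11 and 12, which are exactly the NAs replaced by 0.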




                      Written as a function



f <- function(x, gap = 1) {
  stopifnot(gap >= 0)
  if (gap == 0) {
    x[is.na(x)] <- 0
    x
  } else {
    grp <- with(rle(is.na(x)), rep(seq_along(lengths), times = lengths))
    tmp <- ave(x, grp, FUN = function(x) pmin(cumsum(is.na(x)), rev(cumsum(is.na(x)))))
    replace(x, tmp > gap, 0)
  }
}
f(x)





– markus (answered 9 hours ago, edited 8 hours ago)





























