r - Using auto.arima: Error in OCSBtest(x, m) : subscript out of bounds


I'm using a big (isplit) loop over a huge set of time series to test ARIMA models. I'm using the auto.arima function from the forecast package.

For this I created a function that traverses the time series while keeping track of progress and stores the fitted models and stats (such as accuracy and model parameters). I'm dealing with an error generated by the auto.arima function. To be more precise, it is caused by the OCSB seasonal test.

I'm using the function on 'monthly' time series and on 'weekly' time series. The monthly time series gave no problems (almost 50000 of them, including a lot of 'zero' values). With the weekly time series I ran into a problem. I'm not able to find the real cause of the error.

I tried to recreate the error. I thought it had to do with a lot of 0 (or identical) values in combination with the frequency period of 52. I still can't put my finger on the problem.

See the examples below. For info: my set of time series has weekly values (freq=52), starting in 2010, week 1. The length is 122 samples (until 2012, week 18). I therefore tested with lengths of 122, for which I can generate the error. I still think it has to do with the frequency and 'runs of the same values'...

For some examples the error is generated, for others not.

Example 1 [random numbers, length=122] > no problem:

    library(forecast)  # provides auto.arima
    ts_element <- ts(sample(0:30, 122, replace=TRUE), frequency = 52, start = c(2010, 1))
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)

Example 2 [only 0 values, length=122] > OCSB test error (normally I would assume a different error... see example 3):

    ts_element <- ts(sample(0:0, 122, replace=TRUE), frequency = 52, start = c(2010, 1))
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)
    Error in OCSBtest(x, m) : subscript out of bounds

Example 3 [only 0 values, length=100] > 'zero/equal values' error, which is the one I assumed; this example is not the problem, it only points out that the length is relevant (compare example 2):

    ts_element <- ts(sample(0:0, 100, replace=TRUE), frequency = 52, start = c(2010, 1))
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)
    Error in if (PVAL == min(tablep)) warning("p-value smaller than printed p-value") else warning("p-value greater than printed p-value") :
      missing value where TRUE/FALSE needed

Example 4 [almost the same as ex. 3, 1 non-0 value, length=100] > no problem anymore:

    ts_element[30] <- 1
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)

Example 5 [almost the same as ex. 4, length=122] > OCSB test error:

    ts_element <- ts(sample(0:0, 122, replace=TRUE), frequency = 52, start = c(2010, 1))
    ts_element[30] <- 1
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)
    Error in OCSBtest(x, m) : subscript out of bounds

Example 6 [random 1's and 0's, length=122] > no problem:

    ts_element <- ts(sample(0:1, 122, replace=TRUE), frequency = 52, start = c(2010, 1))
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)

Example 7 [random numbers, smaller length of 50] > no problem:

    ts_element <- ts(sample(1:34, 50, replace=TRUE), frequency = 52, start = c(2010, 1))
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)

Does anyone have an idea what the cause of the OCSB out of bounds error is? And how can I recognize it?

The main problem is that whenever an error occurs in the function described at the beginning of this post, the function doesn't output the information I'm gathering. Hours of waiting for nothing. If the root cause cannot be found, I would also be helped by code that deals with errors in a way that 'ignores' them (skips that time series) and goes further. Or ignores them, but still outputs the information gathered up to that moment.

Who has a solution?

Note: the zero-error is not a problem. I'm covering that in the function.

Nice question, and well explained. It was clearly thought through before you submitted it.

The problems in your examples are due to a number of issues (and in my opinion, bugs) around dealing with time series full of zeroes.

In general, you should use the debug command to step through the code. For example, try debugging the 5 main functions that are run by auto.arima:

    debug(auto.arima)
    debug(nsdiffs)
    debug(forecast:::OCSBtest)
    debug(lm)
    debug(lm.fit)

(Use Q to exit, and undebug to stop debugging a function.) Then try running the code in example 2:

    ts_element <- ts(sample(0:0, 122, replace=TRUE), frequency = 52, start = c(2010, 1))
    fit <- auto.arima(ts_element, trace=FALSE, seasonal.test="ocsb", allowdrift=TRUE, stepwise=TRUE)
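If you want to get a feel for the debug flags before stepping through the real functions, here is a minimal sketch using two hypothetical stand-in functions (outer_fit and inner_test are made up for illustration, they are not part of forecast). It only sets and clears the flags, so it runs without opening the browser:

```r
# Hypothetical stand-ins for auto.arima and a helper it calls.
inner_test <- function(x) sum(x)
outer_fit <- function(x) inner_test(x)

# Flag both functions; the next call to either would open the browser.
debug(outer_fit)
debug(inner_test)
flagged <- c(isdebugged(outer_fit), isdebugged(inner_test))

# Clear the flags again so later calls run normally.
undebug(outer_fit)
undebug(inner_test)
cleared <- !c(isdebugged(outer_fit), isdebugged(inner_test))
```

If you only want to step through a function once, debugonce() flags it for a single call and clears the flag by itself.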

After a lot of pressing enter you will reach the point where R fails. In this case, it is a rather deep and nasty bug in lm.fit. If the coefficients are all zero, for some reason it converts them to NA. When the OCSBtest function tries to pull out the coefficients, it finds that the matrix is empty, and tells you that the index is not appropriate.
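You can reproduce the mechanism in base R without forecast. This is only a sketch and not the OCSBtest source: the variable names y, x1 and x2 are made up, and the assumption is that the test regresses one all-zero series on other all-zero series. R treats the all-zero regressors as rank deficient, so the coefficients come back as NA and the summary table has no rows:

```r
# Regress an all-zero response on two all-zero regressors (no intercept).
n <- 20
y <- rep(0, n)
x1 <- rep(0, n)
x2 <- rep(0, n)
fit <- lm(y ~ x1 + x2 - 1)

# Both coefficients are NA because the design matrix has rank 0.
fit_coefs <- coef(fit)

# summary() drops the NA coefficients, leaving a table with zero rows.
coef_table <- summary(fit)$coefficients

# Asking for the t value of the second coefficient now fails.
lookup <- tryCatch(coef_table[2, 3], error = function(e) e)
```

Here lookup holds the error condition instead of a number, which is the same kind of failure the OCSB test runs into.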

I would tell you to report it to r-bugs... but they can be pretty snippy when it comes to bugs in base. They will probably tell you it is "user error" and that you shouldn't be fitting regression models to all zeroes (sigh).

The first problem with example 3 appears to be a feature that is undocumented in the nsdiffs help page, which describes the forecast:::OCSBtest function. It looks like the time series must be at least 2 times the period + 5 long (2 * 52 + 5 = 109 for weekly data), or the seasonal differencing test is not run. In example 2 (length 122) this is true, but not in example 3 (length 100). Indeed, the first bit of code in the function is:

    if (length(time.series) < (2 * period + 5)) {
        return(0)
    }
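If you want to know in advance which of your series will reach the OCSB test at all, you can apply the same length check yourself. A minimal sketch, where ocsb_long_enough is a hypothetical helper name and the threshold is simply copied from the snippet above:

```r
# Hypothetical helper: TRUE if the series passes the length check above.
ocsb_long_enough <- function(x) {
  length(x) >= 2 * frequency(x) + 5
}

weekly_122 <- ts(rep(0, 122), frequency = 52, start = c(2010, 1))
weekly_100 <- ts(rep(0, 100), frequency = 52, start = c(2010, 1))

long_122 <- ocsb_long_enough(weekly_122)  # 122 >= 109
long_100 <- ocsb_long_enough(weekly_100)  # 100 <  109
```

Series for which this returns FALSE skip the seasonal test, so they cannot hit the out of bounds error, but they can still fail later on.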

I have not read the 2 Osborn references listed in the nsdiffs page, so maybe it is mentioned there somewhere. It would be a good idea to let the authors of forecast know, so they can include it in the documentation somewhere. Maybe they could throw a warning, with an option to turn it off.

Example 3 has a different error from example 2 because example 3 exits the nsdiffs function early, and goes on to fail in the ndiffs function, which handles the ordinary differencing. ndiffs appears to have a bug: if the sum of squared differences is 0 (because the series is all zero), it causes a divide by 0 error. Here is the relevant code in the ndiffs function:

    s2 <- .C("R_pp_sum", as.vector(e, mode = "double"), as.integer(n),
        as.integer(l), s2 = as.double(s2), PACKAGE = "tseries")$s2
    STAT <- eta/s2  # becomes NaN
    PVAL <- approx(table, tablep, STAT, rule = 2)$y  # NaN
    if (is.na(approx(table, tablep, STAT, rule = 1)$y))
        if (PVAL == min(tablep))
            warning("p-value smaller than printed p-value")
        else warning("p-value greater than printed p-value")  # bombs

Example 4 succeeds because s2 is never zero. A simple fix would be to check whether s2 is 0 before dividing.
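To make the failure and the suggested fix concrete, here is a small base R sketch. It is not a patch for the package: safe_ratio is a hypothetical helper, and the value 0.01 just stands in for min(tablep).

```r
# Hypothetical helper: return NA instead of dividing by a zero (or NA) s2.
safe_ratio <- function(eta, s2) {
  if (is.na(s2) || s2 == 0) return(NA_real_)
  eta / s2
}

# Without the guard, an all-zero series gives 0 / 0, which is NaN.
unguarded <- 0 / 0

# Using NaN in an if() comparison raises an error.
comparison <- tryCatch(
  if (unguarded == 0.01) "smaller" else "greater",
  error = function(e) e
)
comparison_fails <- inherits(comparison, "error")

# With the guard the caller gets an NA it can test for with is.na().
guarded <- safe_ratio(0, 0)
</gr_insert_placeholder>
```

The caller still has to decide what an NA statistic means, for example by treating the series as constant and skipping the test.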

Example 5 fails for the same reason as example 2. It enters the nsdiffs function because it has a length of more than 2*period+5, and fails because lm.fit doesn't return coefficients when they are zero.

Example 6 succeeds because lm.fit does return coefficients, because they are not zero, because the time series is a mix of ones and zeroes.

Example 7 succeeds because nsdiffs is not run (because the series is too short) and ndiffs no longer causes a divide by zero, because the sum of squared differences is not zero.

In conclusion, your examples have shown 2 bugs: one in ndiffs when the time series is all zero, and one in the lm.fit function when the covariates are all zero. Also, the documentation for nsdiffs should be updated to say that the test won't run if the time series has a length of less than 2*period+5 when you use the 'ocsb' option (but maybe this is documented in the references).
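As for skipping the series that fail without losing everything gathered so far, wrapping each fit in tryCatch is one way to do it. A minimal sketch: fit_all and toy_fit are hypothetical names, and toy_fit is only a stand-in that fails on constant series so the example runs without the forecast package.

```r
# Fit every series, recording the error message instead of stopping.
fit_all <- function(series_list, fit_fun) {
  results <- vector("list", length(series_list))
  for (i in seq_along(series_list)) {
    results[[i]] <- tryCatch(
      list(ok = TRUE, fit = fit_fun(series_list[[i]]), error = NA_character_),
      error = function(e) {
        list(ok = FALSE, fit = NULL, error = conditionMessage(e))
      }
    )
  }
  results
}

# Hypothetical stand-in for the real fitting call.
toy_fit <- function(x) {
  if (length(unique(x)) < 2) stop("constant series")
  arima(x, order = c(0, 0, 0))
}

set.seed(1)
series_list <- list(
  ts(sample(0:30, 122, replace = TRUE), frequency = 52, start = c(2010, 1)),
  ts(rep(0, 122), frequency = 52, start = c(2010, 1))
)

results <- fit_all(series_list, toy_fit)
failed <- !vapply(results, function(r) r$ok, logical(1))
```

In your own loop you would pass something like function(x) auto.arima(x, seasonal.test = "ocsb") as fit_fun. The failed vector then tells you which series to inspect afterwards, and the results for all the other series are kept.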

