
Kernel (Covariance) Function Options

In supervised learning, it is expected that points with similar predictor values x_i naturally have close response (target) values y_i. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables f(x_i) and f(x_j), where both x_i and x_j are d-by-1 vectors. In other words, it determines how the response at one point x_i is affected by responses at other points x_j, i ≠ j, i = 1, 2, ..., n. The covariance function k(x_i, x_j) can be defined by various kernel functions, and it can be parameterized in terms of the kernel parameters in a vector θ. Hence, it is possible to express the covariance function as k(x_i, x_j | θ).

For many standard kernel functions, the kernel parameters are based on the signal standard deviation σ_f and the characteristic length scale σ_l. The characteristic length scale roughly defines how far apart the input values x_i can be for the response values to become uncorrelated. Both σ_l and σ_f need to be greater than 0, which can be enforced through the unconstrained parametrization vector θ, such that

\theta_1 = \log \sigma_l, \qquad \theta_2 = \log \sigma_f.
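As a minimal sketch of why this parametrization is convenient (in Python/NumPy rather than MATLAB, with invented helper names), exponentiating any real-valued θ always yields positive scale parameters, so an optimizer can work on θ without explicit positivity constraints:

```python
import numpy as np

# Illustrative sketch (not fitrgp internals): map between the constrained
# scales (sigma_l, sigma_f), which must be positive, and the unconstrained
# parametrization theta = [log(sigma_l), log(sigma_f)].
def to_unconstrained(sigma_l, sigma_f):
    return np.log([sigma_l, sigma_f])

def to_constrained(theta):
    # exp of any real number is positive, so the constraint holds by construction
    sigma_l, sigma_f = np.exp(theta)
    return sigma_l, sigma_f
```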

The built-in kernel (covariance) functions with the same length scale for each predictor are:

  • Squared Exponential Kernel

    This is one of the most commonly used covariance functions and is the default option for fitrgp. The squared exponential kernel function is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[ -\frac{(x_i - x_j)^\top (x_i - x_j)}{2 \sigma_l^2} \right],

    where σ_l is the characteristic length scale, and σ_f is the signal standard deviation.

  • Exponential Kernel

    You can specify the exponential kernel function using the 'KernelFunction','exponential' name-value pair argument. This covariance function is defined by

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left( -\frac{r}{\sigma_l} \right),

    where σ_l is the characteristic length scale and

    r = \sqrt{(x_i - x_j)^\top (x_i - x_j)}

    is the Euclidean distance between x_i and x_j.

  • Matern 3/2

    You can specify the Matern 3/2 kernel function using the 'KernelFunction','matern32' name-value pair argument. This covariance function is defined by

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{\sqrt{3}\, r}{\sigma_l} \right) \exp\left( -\frac{\sqrt{3}\, r}{\sigma_l} \right),

    where

    r = \sqrt{(x_i - x_j)^\top (x_i - x_j)}

    is the Euclidean distance between x_i and x_j.

  • Matern 5/2

    You can specify the Matern 5/2 kernel function using the 'KernelFunction','matern52' name-value pair argument. The Matern 5/2 covariance function is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{\sqrt{5}\, r}{\sigma_l} + \frac{5 r^2}{3 \sigma_l^2} \right) \exp\left( -\frac{\sqrt{5}\, r}{\sigma_l} \right),

    where

    r = \sqrt{(x_i - x_j)^\top (x_i - x_j)}

    is the Euclidean distance between x_i and x_j.

  • Rational Quadratic Kernel

    You can specify the rational quadratic kernel function using the 'KernelFunction','rationalquadratic' name-value pair argument. This covariance function is defined by

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{r^2}{2 \alpha \sigma_l^2} \right)^{-\alpha},

    where σ_l is the characteristic length scale, α is a positive-valued scale-mixture parameter, and

    r = \sqrt{(x_i - x_j)^\top (x_i - x_j)}

    is the Euclidean distance between x_i and x_j.
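The five isotropic kernels above can be sketched numerically. The following is an illustrative NumPy implementation (not MATLAB, and not fitrgp's internal code; the function names are invented for this sketch). Note that every kernel evaluates to σ_f² when x_i = x_j:

```python
import numpy as np

# Illustrative sketches of the five isotropic kernels.
# sigma_l: characteristic length scale; sigma_f: signal standard deviation.
def k_squared_exponential(xi, xj, sigma_l, sigma_f):
    d2 = np.sum((xi - xj) ** 2)              # (x_i - x_j)' (x_i - x_j)
    return sigma_f**2 * np.exp(-0.5 * d2 / sigma_l**2)

def k_exponential(xi, xj, sigma_l, sigma_f):
    r = np.linalg.norm(xi - xj)              # Euclidean distance r
    return sigma_f**2 * np.exp(-r / sigma_l)

def k_matern32(xi, xj, sigma_l, sigma_f):
    s = np.sqrt(3.0) * np.linalg.norm(xi - xj) / sigma_l
    return sigma_f**2 * (1.0 + s) * np.exp(-s)

def k_matern52(xi, xj, sigma_l, sigma_f):
    s = np.sqrt(5.0) * np.linalg.norm(xi - xj) / sigma_l
    return sigma_f**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def k_rational_quadratic(xi, xj, sigma_l, sigma_f, alpha):
    d2 = np.sum((xi - xj) ** 2)
    return sigma_f**2 * (1.0 + d2 / (2.0 * alpha * sigma_l**2)) ** (-alpha)
```

As α grows, the rational quadratic kernel approaches the squared exponential kernel, since (1 + z/α)^(−α) → exp(−z); small α mixes in longer-range correlations.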

It is possible to use a separate length scale σ_m for each predictor m, m = 1, 2, ..., d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization θ in this case is

\theta_m = \log \sigma_m, \quad \text{for } m = 1, 2, \ldots, d, \qquad \theta_{d+1} = \log \sigma_f.

The built-in kernel (covariance) functions with a separate length scale for each predictor are:

  • ARD Squared Exponential Kernel

    You can specify this kernel function using the 'KernelFunction','ardsquaredexponential' name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[ -\frac{1}{2} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} \right].

  • ARD Exponential Kernel

    You can specify this kernel function using the 'KernelFunction','ardexponential' name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \exp(-r),

    where

    r = \sqrt{ \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} }.

  • ARD Matern 3/2

    You can specify this kernel function using the 'KernelFunction','ardmatern32' name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \sqrt{3}\, r \right) \exp\left( -\sqrt{3}\, r \right),

    where

    r = \sqrt{ \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} }.

  • ARD Matern 5/2

    You can specify this kernel function using the 'KernelFunction','ardmatern52' name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \sqrt{5}\, r + \frac{5}{3} r^2 \right) \exp\left( -\sqrt{5}\, r \right),

    where

    r = \sqrt{ \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} }.

  • ARD Rational Quadratic Kernel

    You can specify this kernel function using the 'KernelFunction','ardrationalquadratic' name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

    k(x_i, x_j \mid \theta) = \sigma_f^2 \left( 1 + \frac{1}{2\alpha} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2} \right)^{-\alpha}.
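All ARD kernels share the same length-scale-weighted distance r. The following NumPy sketch (again illustrative only, not MATLAB or fitrgp internals; names invented here) shows the shared distance and two of the ARD kernels:

```python
import numpy as np

# Illustrative sketch of the scaled distance common to all ARD kernels:
#   r = sqrt( sum_m (x_im - x_jm)^2 / sigma_m^2 )
# sigma_m is a length-scale vector with one entry per predictor.
def ard_distance(xi, xj, sigma_m):
    return np.sqrt(np.sum(((xi - xj) / sigma_m) ** 2))

def k_ard_squared_exponential(xi, xj, sigma_m, sigma_f):
    return sigma_f**2 * np.exp(-0.5 * ard_distance(xi, xj, sigma_m) ** 2)

def k_ard_matern32(xi, xj, sigma_m, sigma_f):
    s = np.sqrt(3.0) * ard_distance(xi, xj, sigma_m)
    return sigma_f**2 * (1.0 + s) * np.exp(-s)
```

With all entries of sigma_m equal, each ARD kernel reduces to its isotropic counterpart. A very large fitted length scale for predictor m drives its term in r toward zero, effectively removing that predictor from the model; this is the relevance-determination effect.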

You can specify the kernel function using the KernelFunction name-value pair argument in a call to fitrgp. You can either specify one of the built-in kernel options, or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector θ. fitrgp uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas it uses numerical derivatives when using a custom kernel function.

References

[1] Rasmussen, C. E., and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, Massachusetts, 2006.

[2] Neal, R. M. Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics, 118, 1996.
