TurbuStat

TurbuStat implements 14 turbulence-based statistics described in the astronomical literature. TurbuStat also defines a distance metric for each statistic to quantitatively compare spectral-line data cubes, as well as column density, integrated intensity, or other moment maps.

The source code is hosted here. Contributions to the code base are very much welcome! If you find any issues in the package, please make an issue on github or contact the developers at the email on this page. Thank you!

To be notified of future releases and updates to TurbuStat, please join the mailing list: https://groups.google.com/forum/#!forum/turbustat

If you make use of this package in a publication, please cite our accompanying paper:

@ARTICLE{Koch2019AJ....158....1K,
     author = {{Koch}, Eric W. and {Rosolowsky}, Erik W. and {Boyden}, Ryan D. and
       {Burkhart}, Blakesley and {Ginsburg}, Adam and {Loeppky}, Jason L. and
       {Offner}, Stella S.~R.},
      title = "{TURBUSTAT: Turbulence Statistics in Python}",
    journal = {\aj},
   keywords = {methods: data analysis, methods: statistical, turbulence, Astrophysics - Instrumentation and Methods for Astrophysics},
       year = "2019",
      month = "Jul",
     volume = {158},
     number = {1},
        eid = {1},
      pages = {1},
        doi = {10.3847/1538-3881/ab1cc0},
     eprint = {1904.10484},
     adsurl = {https://ui.adsabs.harvard.edu/abs/2019AJ....158....1K},
    adsnote = {Provided by the SAO/NASA Astrophysics Data System}
    }

If your work makes use of the distance metrics, please cite the following:

@ARTICLE{Koch2017,
 author = {{Koch}, E.~W. and {Ward}, C.~G. and {Offner}, S. and {Loeppky}, J.~L. and {Rosolowsky}, E.~W.},
 title = "{Identifying tools for comparing simulations and observations of spectral-line data cubes}",
 journal = {\mnras},
 archivePrefix = "arXiv",
 eprint = {1707.05415},
 keywords = {methods: statistical, ISM: clouds, radio lines: ISM},
 year = 2017,
 month = oct,
 volume = 471,
 pages = {1506-1530},
 doi = {10.1093/mnras/stx1671},
 adsurl = {http://adsabs.harvard.edu/abs/2017MNRAS.471.1506K},
 adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Citations courtesy of ADS

TurbuStat Developers

Many thanks to everyone who has reported bugs and given feedback on TurbuStat!

  • Dario Colombo
  • Jesse Feddersen
  • Simon Glover
  • Jonathan Henshaw
  • Sac Medina
  • Andrés Izquierdo

Contents:

Installing TurbuStat

The newest release of TurbuStat is available on pip:

>>> pip install turbustat  

TurbuStat can also be installed from the github repository.

TurbuStat requires the following packages:

  • astropy>=2.0
  • numpy>=1.7
  • matplotlib>=1.2
  • scipy>=0.12
  • sklearn>=0.13.0
  • statsmodels>=0.4.0
  • scikit-image>=0.12

The following packages are optional when installing TurbuStat and are required only for specific functions in TurbuStat:

  • spectral-cube (>v0.4.4) - Efficient handling of PPV cubes. Required for calculating moment arrays in turbustat.data_reduction.Moments.
  • radio_beam - A class for handling radio beams and useful utilities. Required for correcting for the beam shape in spatial power spectra. Automatically installed with spectral-cube.
  • astrodendro-development - Required for calculating dendrograms in turbustat.statistics.dendrograms
  • emcee - MCMC fitting in PCA and PDF.
  • pyfftw - Wrapper for the FFTW libraries. Allows FFTs to be run in parallel.

To install TurbuStat, clone the repository::

>>> git clone https://github.com/Astroua/TurbuStat  # doctest: +SKIP

Change into the TurbuStat directory and run the following to install TurbuStat::

>>> python setup.py install  # doctest: +SKIP

If you find any issues in the installation, please make an issue on github or contact the developers at the email on this page. Thank you!

To run the testing suite::
>>> MPLBACKEND='agg' python setup.py test  # doctest: +SKIP

The matplotlib backend needs to be set to avoid having interactive plots pop up during the tests.

Accepted Data Types

The TurbuStat routines can accept several different data types.

FITS HDU

The most common format is a FITS HDU. These can be loaded in python with the fits library:

>>> from astropy.io import fits
>>> hdulist = fits.open("test.fits")  
>>> hdu = hdulist[0]  

The TurbuStat statistics expect a single FITS extension (HDU) to be given, not the full HDUList.
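
For example, a minimal sketch of passing an HDU to a statistic, assuming test.fits contains a two-dimensional image:

>>> from turbustat.statistics import PowerSpectrum  # doctest: +SKIP
>>> pspec = PowerSpectrum(hdu)  # doctest: +SKIP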

Numpy array and header

The data can be given as a numpy array, along with a FITS header. This may be useful for data generated from simple physical models (see preparing simulated data). The data can be given as a 2-element tuple or list:

>>> input_data = (array, header)  
>>> input_data = [array, header]  

Spectral-Cube objects

The spectral-cube package is an optional dependency of TurbuStat, used for calculating moment arrays and for spectrally regridding cubes for the VCA statistic. When the data need to be preprocessed, it will often be easiest to work with a SpectralCube object. See the spectral-cube tutorial for more information:

>>> from spectral_cube import SpectralCube  
>>> cube = SpectralCube.read("test.fits")  

This SpectralCube object can be sliced or used to create moment maps. These spatial 2D maps are called a “Projection” or “Slice” and both are accepted by the TurbuStat statistics:

>>> sliced_img = cube[100]  
>>> moment_img = cube.moment0()  

The Projection object offers many of the convenience functions available for a SpectralCube, making it easy to manipulate and alter the data as needed. To load a spatial FITS image as a Projection:

>>> from spectral_cube import Projection
>>> img_hdu = fits.open("test_spatial.fits")[0]  
>>> proj = Projection.from_hdu(img_hdu)  

Preparing Simulated Data

TurbuStat requires the input data be in a valid FITS format. Since simulated observations do not always include a valid observational FITS header, we provide a few convenience functions to create a valid format.

We start with a numpy array of data from some source. First consider a spectral-line data cube with two spatial dimensions and one spectral dimension (also called a PPV cube: position-position-velocity). We will need to specify several quantities, like the angular pixel scale, to create the header:

>>> import numpy as np
>>> import astropy.units as u
>>> from turbustat.io.sim_tools import create_fits_hdu
>>> cube = np.ones((8, 16, 16))
>>> pixel_scale = 0.001 * u.deg
>>> spec_pixel_scale = 1000. * u.m / u.s
>>> beamfwhm = 0.003 * u.deg
>>> imshape = cube.shape
>>> restfreq = 1.4 * u.GHz
>>> bunit = u.K
>>> cube_hdu = create_fits_hdu(cube, pixel_scale, spec_pixel_scale, beamfwhm, imshape, restfreq, bunit)

cube_hdu can now be passed to the TurbuStat statistics, or loaded into a spectral_cube.SpectralCube with SpectralCube.read(cube_hdu) for easy manipulation of the PPV cube.
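
For example, a minimal sketch of reading the HDU into a SpectralCube:

>>> from spectral_cube import SpectralCube  # doctest: +SKIP
>>> sc_cube = SpectralCube.read(cube_hdu)  # doctest: +SKIP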

For a two-dimensional image, the FITS HDU can be made in almost the same way, minus spec_pixel_scale:

>>> img = np.ones((16, 16))
>>> imshape = img.shape
>>> img_hdu = create_fits_hdu(img, pixel_scale, beamfwhm, imshape, restfreq, bunit)

The FITS HDU can be given to TurbuStat statistics, or converted to a spectral_cube.Projection with Projection.from_hdu(img_hdu).
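
For example, a minimal sketch of the conversion:

>>> from spectral_cube import Projection  # doctest: +SKIP
>>> img_proj = Projection.from_hdu(img_hdu)  # doctest: +SKIP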

You can also create just the FITS headers with:

>>> from turbustat.io.sim_tools import create_image_header, create_cube_header
>>> img_hdr = create_image_header(pixel_scale, beamfwhm, img.shape, restfreq, bunit)
>>> cube_hdr = create_cube_header(pixel_scale, spec_pixel_scale, beamfwhm, cube.shape, restfreq, bunit)

Units

The data should be in an appropriate observational unit for the required data product (PPV cube, zeroth moment, centroid, etc.). While TurbuStat does not explicitly check the input data units, two things should be kept in mind:

  1. The data cannot be log-scaled.
  2. When comparing data sets, both should have the same unit (a conversion sketch is shown below this list). Most statistics are not based on the absolute scale in the data, but it is best to avoid possible misinterpretation of the results.
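
For the second point, a minimal sketch of converting two data sets to a common brightness unit with spectral-cube; cube1 and cube2 are hypothetical SpectralCube (or Projection) objects with valid units:

>>> import astropy.units as u  # doctest: +SKIP
>>> # cube1 and cube2 are placeholders for your two data sets
>>> cube1 = cube1.to(u.K)  # doctest: +SKIP
>>> cube2 = cube2.to(u.K)  # doctest: +SKIP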

Noise and missing data

To create realistic data, noise could be added to img and cube in the above examples. The simplest form of noise is Gaussian and can be added to the data with:

>>> sigma = 0.1
>>> noisy_cube = cube + np.random.normal(0., sigma, size=cube.shape)

In this example, Gaussian noise with a standard deviation of 0.1 and a mean of 0 is added to the cube.

A common observational practice is to mask noise-only regions of an image or data cube. A simple example, given knowledge of the noise standard deviation, is to impose an \(N\sigma\) cut on the data:

>>> N = 3.
>>> # Create a boolean array, where True is above the noise threshold.
>>> signal_mask = noisy_cube > N * sigma
>>> # Set all places below the noise threshold to NaN in the cube
>>> masked_cube = noisy_cube.copy()
>>> masked_cube[~signal_mask] = np.NaN

TurbuStat does not contain routines to create robust signal masks. Examples of creating signal masks can be found in Rosolowsky & Leroy 2006 and Dame 2011.

Excluding the dissipation range

When using synthetic observations from simulations, care should be taken to fit only scales in the inertial range. The power-spectrum tutorial shows an example of limiting the fit to the inertial range. The power-spectrum in the dissipation range in that example steepens significantly and is not representative of the turbulent index. This warning should be heeded for power-spectrum-based methods, like the spatial power-spectrum, MVC, VCA and VCS. Spatial structure functions, like the wavelet transform and the delta-variance, should also be examined closely for the influence of the dissipation range on small scales.
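
The fitting range can be restricted in the statistics that fit a power law; for example, PowerSpectrum accepts low_cut and high_cut, and DeltaVariance accepts xlow and xhigh (both are shown in later tutorials). A minimal sketch with PowerSpectrum, where img_hdu is a hypothetical 2D image HDU and the cut-off values are illustrative only:

>>> import astropy.units as u  # doctest: +SKIP
>>> from turbustat.statistics import PowerSpectrum  # doctest: +SKIP
>>> pspec = PowerSpectrum(img_hdu)  # doctest: +SKIP
>>> # Fit only scales between ~4 and ~64 pixels (illustrative values)
>>> pspec.run(low_cut=1. / (64 * u.pix), high_cut=1. / (4 * u.pix))  # doctest: +SKIP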

Source Code
turbustat.io.sim_tools Module
Functions
  • create_cube_header(pixel_scale, …[, v0]) - Create a basic FITS header for a PPV cube.
  • create_fits_hdu(data, *header_args) - Return a FITS HDU for a numpy array of data.
  • create_image_header(pixel_scale, beamfwhm, …) - Create a basic FITS header for an image.

Data Requirements

Use of the statistics and distance metrics requires the input data to satisfy some criteria.

Spatial Projection

TurbuStat assumes that the spatial pixels of the data are square on the sky; otherwise, all physical and angular scales will be incorrect. Data with non-square pixels should first be reprojected. This can easily be done using spectral-cube:

>>> reproj_cube = cube.reproject(new_header)  # doctest: +SKIP
>>> reproj_proj = proj_2D.reproject(new_header_2D)  # doctest: +SKIP

Considerations for distance metrics

The correct pre-processing of data sets is crucial for attaining a meaningful comparison. Listed below are pre-processing steps to consider when using the distance metrics.

The extent of these effects will differ for different data sets. We recommend testing a subset of the data by placing the data at a common resolution (physical or angular) and grid-size, depending on your application. Smoothing and reprojection operations are straightforward to perform with the spectral-cube package.
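
As a minimal sketch of this (assuming cube1 is a SpectralCube and common_header is a hypothetical target grid you define; the 30 arcsec beam is purely illustrative):

>>> from radio_beam import Beam  # doctest: +SKIP
>>> import astropy.units as u  # doctest: +SKIP
>>> common_beam = Beam(30 * u.arcsec)  # illustrative target resolution
>>> smoothed_cube = cube1.convolve_to(common_beam)  # doctest: +SKIP
>>> regridded_cube = smoothed_cube.reproject(common_header)  # doctest: +SKIP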

  1. Spatial scales – Unlike many of the statistics, the distance metrics do not consider physical spatial units. Specifically, metrics that directly compare values at specific scales (DeltaVariance_Distance, SCF_Distance) will use the given WCS information for the data sets to find a common angular scale.

    Different spatial sizes are less of a concern for slope-based comparisons, but discrepancies can arise if the data sets have different noise levels. In these cases, the scales used to fit to the power-spectrum (or the equivalent statistic output) should be chosen carefully to avoid bias from the noise. A similar issue can arise when comparing different simulated data sets if the simulations have different inertial ranges.

  2. Spectral scales – The spectral sampling and range should be considered for all methods that utilize the entire PPV cube (SCF, VCA, VCS, dendrograms, PCA, PDF). Using different-sized spectral pixels affects the noise properties and, for some statistics, the measurement itself.

    For the former, the noise level can introduce a bias in the measured quantities. To mitigate this, data can be masked prior to running metrics. Otherwise, minimum cut-off values can be specified for metrics that utilize the actual intensity values of the data, such as dendrograms and the PDF. For statistics that are independent of intensity, like a power-law slope or correlation, the fitting range can be specified for each statistic to minimize bias from noise. This is the same effect described above for spatial scales.

    For the second case, the VCA index is expected to change with spectral resolution depending on the underlying properties of the turbulent fields (see the VCA tutorial).

Data units for distance metrics

Most of the distance metrics will not depend on the absolute value of the data sets. The exceptions are when values of a statistic are directly compared. This includes Cramer_Distance, the curve distance in DeltaVariance_Distance, and the bins used in the histograms of StatMoments_Distance and PDF_Distance. While each of these methods applies some normalization scheme to the data, we advise converting both data sets to a common unit to minimize possible discrepancies.

Deriving Cube Moments

The Moments class returns moment arrays as well as their errors for use with 2D statistics in the package. Moments are derived using spectral-cube. Definitions for the moments are also available in the spectral-cube documentation.

Basic Use

Moments are easily returned in the expected form for the statistics with Moments. This class takes a FITS file of a spectral-line cube as input and creates moment arrays (zeroth, first, and line width) with their respective uncertainties:

>>> from turbustat.moments import Moments  
>>> # Load in the cube "test.fits"
>>> mm = Moments("test.fits")  
>>> mm.make_moments()  
>>> mm.make_moment_errors()  
>>> output_dict = mm.to_dict()  

output_dict is a dictionary that contains keys for the cube and moments. The moment keys contain a list of the moment and its error map.

The moments can also be saved to FITS files. The to_fits function saves FITS files of each moment. The input to the function is the prefix to use in the filenames:

>>> mm.to_fits("test")  

This will produce three FITS files: test_moment0.fits, test_centroid.fits, test_linewidth.fits for the zeroth, first, and square-root of the second moments, respectively. These FITS files will contain two extensions, the first with the moment map and the second with the uncertainty map for that moment.
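
For example, a minimal sketch of reading the zeroth moment and its uncertainty back from the output file:

>>> from astropy.io import fits  # doctest: +SKIP
>>> mom0_hdulist = fits.open("test_moment0.fits")  # doctest: +SKIP
>>> # First extension: the moment map; second extension: its uncertainty
>>> mom0 = mom0_hdulist[0].data  # doctest: +SKIP
>>> mom0_err = mom0_hdulist[1].data  # doctest: +SKIP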

Source Code

Classes
  • Moments(cube[, scale, moment_method]) - A unified approach to deriving the noise level in a cube, applying a mask, and deriving moments along with their errors.

TurbuStat Tutorials

Tutorials are provided for each of the statistic classes and their associated distance metric classes. The tutorials use the same two data sets, described on the data for tutorials page.

The plotting routines are highlighted in each of the tutorials. For users who require custom plotting routines, we recommend looking at the plotting source code as a starting point.

Data for tutorials

Two data sets are used for the tutorials and can be downloaded here. The data are synthetic \(^{13}{\rm CO}(2\rightarrow1)\) spectral-line data cubes from different ENZO adaptive-mesh-refinement simulations. The simulation setups and input parameters are given in Koch et al. 2017 (see Table 1). One of the data sets is a “Fiducial” and the other is a “Design” simulation; the input parameters for the solenoidal fraction, virial parameter, plasma beta, Mach number, and driving scale all differ between the two simulations.

Parameter            Fiducial   Design
-------------------  ---------  -------
Solenoidal Fraction  1/2        1/3
Virial Parameter     6          2
Plasma beta          1          2
Mach number          8.5        5
Driving scale        2-8        2-4

The simulations were performed on a \(128^3\) base grid with a fixed random turbulent driving field and periodic boundaries (i.e., a periodic box). The data cubes were produced from one time step after gravity is turned on (\(\sim0.1\) of the free-fall time). The spectral resolution (i.e., channel width) of the cubes is \(40\) m/s.

Moment maps derived from the data cubes can also be downloaded at the link above. Each data cube has a zeroth (integrated intensity), first (centroid), and square root of the second (line width) moment map.

These data have a limited inertial range, which is evident in the Spatial Power Spectrum tutorial. Turbulent dissipation causes the power-spectrum to steepen on small scales and it is necessary to limit the range of scales that are fit.

The following image shows the integrated intensity (zeroth moment) maps of the tutorial data:

.. image:: images/design_fiducial_moment0.png

Note how the integrated intensity scale varies between the two data sets.

Applying Apodizing Kernels to Data

Applying Fourier transforms to images with emission at the edges can lead to severe ringing effects from the Gibbs phenomenon. This can be an issue for all spatial power-spectra, including the PowerSpectrum, VCA, and MVC.

A common way to avoid this issue is to apply a window function that smoothly tapers the values at the edges of the image to zero (e.g., Stanimirovic et al. 1999). However, the shape of the window function will also affect some frequencies in the Fourier transform. This page demonstrates these effects for some common window shapes.

TurbuStat has four built-in apodizing functions based on the implementations from photutils:

The Hanning window:

>>> from turbustat.statistics.apodizing_kernels import \
...    (CosineBellWindow, TukeyWindow, HanningWindow, SplitCosineBellWindow)
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> shape = (101, 101)
>>> taper = HanningWindow()
>>> data = taper(shape)
>>> plt.subplot(121)  # doctest: +SKIP
>>> plt.imshow(data, cmap='viridis', origin='lower')  # doctest: +SKIP
>>> plt.colorbar()  # doctest: +SKIP
>>> plt.subplot(122)  # doctest: +SKIP
>>> plt.plot(data[shape[0] // 2])  # doctest: +SKIP
_images/hanning.png

The Cosine Bell Window:

>>> taper2 = CosineBellWindow(alpha=0.8)
>>> data2 = taper2(shape)
>>> plt.subplot(121)  # doctest: +SKIP
>>> plt.imshow(data2, cmap='viridis', origin='lower')  # doctest: +SKIP
>>> plt.colorbar()  # doctest: +SKIP
>>> plt.subplot(122)  # doctest: +SKIP
>>> plt.plot(data2[shape[0] // 2])  # doctest: +SKIP
_images/cosine.png

The Split-Cosine Bell Window:

>>> taper3 = SplitCosineBellWindow(alpha=0.1, beta=0.5)
>>> data3 = taper3(shape)
>>> plt.subplot(121)  # doctest: +SKIP
>>> plt.imshow(data3, cmap='viridis', origin='lower')  # doctest: +SKIP
>>> plt.colorbar()  # doctest: +SKIP
>>> plt.subplot(122)  # doctest: +SKIP
>>> plt.plot(data3[shape[0] // 2])  # doctest: +SKIP
_images/splitcosine.png

And the Tukey Window:

>>> taper4 = TukeyWindow(alpha=0.3)
>>> data4 = taper4(shape)
>>> plt.subplot(121)  # doctest: +SKIP
>>> plt.imshow(data4, cmap='viridis', origin='lower')  # doctest: +SKIP
>>> plt.colorbar()  # doctest: +SKIP
>>> plt.subplot(122)  # doctest: +SKIP
>>> plt.plot(data4[shape[0] // 2])  # doctest: +SKIP
_images/tukey.png

The former two windows consistently taper smoothly from the centre to the edge, while the latter two have flattened plateaus with tapering only at the edge. Plotting the 1-dimensional slices makes these differences clear:

>>> plt.plot(data[shape[0] // 2], label='Hanning')  # doctest: +SKIP
>>> plt.plot(data2[shape[0] // 2], label='Cosine')  # doctest: +SKIP
>>> plt.plot(data3[shape[0] // 2], label='Split Cosine')  # doctest: +SKIP
>>> plt.plot(data4[shape[0] // 2], label='Tukey')  # doctest: +SKIP
>>> plt.legend(frameon=True)  # doctest: +SKIP
_images/1d_apods.png

To get an idea of how these apodizing functions affect the data, we can examine their power-spectra:

>>> freqs = np.fft.rfftfreq(shape[0])
>>> plt.loglog(freqs, np.abs(np.fft.rfft(data[shape[0] // 2]))**2, label='Hanning')  # doctest: +SKIP
>>> plt.loglog(freqs, np.abs(np.fft.rfft(data2[shape[0] // 2]))**2, label='Cosine')  # doctest: +SKIP
>>> plt.loglog(freqs, np.abs(np.fft.rfft(data3[shape[0] // 2]))**2, label='Split Cosine')  # doctest: +SKIP
>>> plt.loglog(freqs, np.abs(np.fft.rfft(data4[shape[0] // 2]))**2, label='Tukey')  # doctest: +SKIP
>>> plt.legend(frameon=True)  # doctest: +SKIP
>>> plt.xlabel("Freq. (1 / pix)")  # doctest: +SKIP
>>> plt.ylabel("Power")  # doctest: +SKIP
_images/1d_apods_pspec.png

The smoothly-varying windows (Hanning and Cosine) have power-spectra that decrease steadily with frequency. This means that the use of a Hanning or Cosine window will affect the shape of the power-spectrum over a larger range of frequencies than the Split-Cosine or Tukey windows.

These apodizing kernels are azimuthally-symmetric. As an example, the 2D power-spectrum of the Tukey window, which is used below, has this structure:

>>> plt.imshow(np.log10(np.fft.fftshift(np.abs(np.fft.fft2(data4))**2)))  
_images/2d_tukey_pspec.png

As an example, we will compare the effect each of the windows has on a red-noise image:

>>> from turbustat.simulator import make_extended
>>> from turbustat.io.sim_tools import create_fits_hdu
>>> from astropy import units as u
>>> # Image drawn from red-noise
>>> rnoise_img = make_extended(256, powerlaw=3.)
>>> # Define properties to generate WCS information
>>> pixel_scale = 3 * u.arcsec
>>> beamfwhm = 3 * u.arcsec
>>> imshape = rnoise_img.shape
>>> restfreq = 1.4 * u.GHz
>>> bunit = u.K
>>> # Create a FITS HDU
>>> plaw_hdu = create_fits_hdu(rnoise_img, pixel_scale, beamfwhm, imshape, restfreq, bunit)
>>> plt.imshow(plaw_hdu.data)  # doctest: +SKIP
_images/rednoise_slope3_img.png

The image should have a power-spectrum index of 3 and a mean value of 0. By running PowerSpectrum, we can confirm that the index is indeed 3 (see the variable x1 in the output):

>>> from turbustat.statistics import PowerSpectrum
>>> pspec = PowerSpectrum(plaw_hdu)
>>> pspec.run(verbose=True, radial_pspec_kwargs={'binsize': 1.0},
...           fit_2D=False,
...           low_cut=1. / (60 * u.pix))  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 8.070e+06
Date:                Thu, 21 Jun 2018   Prob (F-statistic):               0.00
Time:                        11:43:47   Log-Likelihood:                 701.40
No. Observations:                 177   AIC:                            -1399.
Df Residuals:                     175   BIC:                            -1392.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0032      0.001      3.952      0.000       0.002       0.005
x1            -2.9946      0.001  -2840.850      0.000      -2.997      -2.992
==============================================================================
Omnibus:                      252.943   Durbin-Watson:                   1.077
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            26797.433
Skew:                          -5.963   Prob(JB):                         0.00
Kurtosis:                      62.087   Cond. No.                         4.55
==============================================================================
_images/rednoise_pspec_slope3.png

The slope is nearly 3, as expected. Note that we have used the low_cut parameter to exclude the largest scales from the fit. Also note that there is a “hole” in the centre of the 2D power-spectrum in the right panel of the image. This is the zero-frequency point of the image, whose power scales with the mean value of the image. Since this image has a mean of 0, there is no power at the zero-frequency point in the centre of the 2D power-spectrum.
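
As a quick check (the zero-frequency term of the FFT equals the sum of the image values):

>>> ft = np.fft.fft2(plaw_hdu.data)  # doctest: +SKIP
>>> # Both values are ~0 because the image has a mean of 0
>>> print(ft[0, 0].real, plaw_hdu.data.sum())  # doctest: +SKIP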

From the figure, it is clear that the samples on larger scales deviate from a power-law. This deviation results from the lack of samples at these large scales. It can be avoided by increasing the size of the radial bins, but we will use small bins here to highlight the effect of the apodizing kernels on the power-spectrum shape.

Before exploring the effect of the apodizing kernels, we can demonstrate the need for one by taking a slice of the red-noise image, such that the edges are no longer periodic:

>>> pspec_partial = PowerSpectrum(rnoise_img[:128, :128], header=plaw_hdu.header).run(verbose=False, fit_2D=False, low_cut=1 / (60. * u.pix))
>>> plt.imshow(np.log10(pspec_partial.ps2D))  # doctest: +SKIP
_images/rednoise_pspec_slope3_2D_slicecross.png

The ringing at large scales is evident in the cross-shape in the 2D power spectrum. This affects the azimuthally-averaged 1D power-spectrum, and therefore the slope of the power-spectrum. Tapering the values at the edges can account for this.

The power-spectrum also appears noisier than the original, yet no noise has been added to the image. This is because the sliced image no longer fully samples the structure of a power-spectrum with index \(3\): such an index has most of its power on large scales, and slicing has removed significant portions of the large-scale structure. Also note that there is no “hole” at the centre of the 2D power-spectrum since the mean of the sliced image is not \(0\).

We will now compare how the different apodizing kernels change the power-spectrum shape. The power-spectra will be fit up to scales of \(60\) pixels (or a frequency of \(0.01667\) pix\(^{-1}\)), avoiding the poorly-sampled largest scales. The following code computes the power-spectrum using each of the four apodizing kernels shown above.

>>> pspec2 = PowerSpectrum(plaw_hdu)
>>> pspec2.run(verbose=False, radial_pspec_kwargs={'binsize': 1.0},
...            fit_2D=False,
...            low_cut=1. / (60 * u.pix),
...            apodize_kernel='hanning',)  # doctest: +SKIP
>>> pspec3 = PowerSpectrum(plaw_hdu)
>>> pspec3.run(verbose=False, radial_pspec_kwargs={'binsize': 1.0},
...            fit_2D=False,
...            low_cut=1. / (60 * u.pix),
...            apodize_kernel='cosinebell', alpha=0.98)  # doctest: +SKIP
>>> pspec4 = PowerSpectrum(plaw_hdu)
>>> pspec4.run(verbose=False, radial_pspec_kwargs={'binsize': 1.0},
...            fit_2D=False,
...            low_cut=1. / (60 * u.pix),
...            apodize_kernel='splitcosinebell', alpha=0.3, beta=0.8)  # doctest: +SKIP
>>> pspec5 = PowerSpectrum(plaw_hdu)
>>> pspec5.run(verbose=False, radial_pspec_kwargs={'binsize': 1.0},
...            fit_2D=False,
...            low_cut=1. / (60 * u.pix),
...            apodize_kernel='tukey', alpha=0.3)  # doctest: +SKIP

For brevity, we will plot only the 1D power-spectra using the different apodizing kernels.

>>> # Change the colours and comment these lines if you don't use seaborn
>>> import seaborn as sb  # doctest: +SKIP
>>> col_pal = sb.color_palette()  # doctest: +SKIP
>>> pspec.plot_fit(color=col_pal[0], label='Original')  # doctest: +SKIP
>>> pspec2.plot_fit(color=col_pal[1], label='Hanning')  # doctest: +SKIP
>>> pspec3.plot_fit(color=col_pal[2], label='CosineBell')  # doctest: +SKIP
>>> pspec4.plot_fit(color=col_pal[3], label='SplitCosineBell')  # doctest: +SKIP
>>> pspec5.plot_fit(color=col_pal[4], label='Tukey')  # doctest: +SKIP
>>> plt.legend(frameon=True, loc='lower left')  # doctest: +SKIP
>>> plt.ylim([2, 9.5])  # doctest: +SKIP
>>> plt.tight_layout()  # doctest: +SKIP
_images/rednoise_pspec_slope3_apod_comparisons.png

Comparing the power-spectra computed with the different apodizing kernels, the only variations occur on large scales. However, as noted above, the largest scales suffer from a lack of samples and tend to have underestimated errors. Over the well-sampled range of scales, from 1 to 60 pixels, the slope is relatively unaffected regardless of the apodizing kernel that is used. The fitted slopes are:

>>> print("Original: {0:.2f} \nHanning: {1:.2f} \nCosineBell: {2:.2f} \n"
...       "SplitCosineBell: {3:.2f} "
...       "\nTukey: {4:.2f}".format(pspec.slope,
...                                 pspec2.slope,
...                                 pspec3.slope,
...                                 pspec4.slope,
...                                 pspec5.slope))  # doctest: +SKIP
Original: -3.00
Hanning: -2.95
CosineBell: -2.95
SplitCosineBell: -3.00
Tukey: -3.01

Each of the slopes is close to the expected value of \(-3\). The Cosine and Hanning kernels moderately flatten the power-spectra on all scales, as is evident from the figure above comparing the 1D power-spectra of the four kernels.

Warning

The range of frequencies affected by the apodizing kernel depends on the properties of the kernel used. The shape of the kernels are controlled by the \(\alpha\) and/or \(\beta\) parameters (see above). Narrower shapes will tend to have a larger effect on the power-spectrum. It is prudent to check the effect of the apodizing kernel by comparing different choices for the shape!

The optimal choice of apodizing kernel, and the shape parameters for that kernel, will depend on the data being used. If there is severe ringing in the power-spectrum, the Hanning or CosineBell kernels are most effective at removing it. However, as shown above, these kernels bias the slope at all frequencies. The SplitCosineBell and Tukey kernels are not as effective at removing ringing in extreme cases, but they only bias the shape of the power-spectrum on the largest scales (\(\sim1/2\) of the image size and larger).

Accounting for the beam shape

Warning

The beam size of an observation introduces artificial correlations into the data on scales near to and below the beam size. This affects the shape of various turbulence statistics that measure spatial properties (Spatial Power-Spectrum, MVC, VCA, Delta-Variance, Wavelets, SCF).

The beam size is typically expressed as the full-width-half-max (FWHM). However, it is important to note that the data will still be correlated beyond the FWHM. For example, consider a randomly-drawn image with a specified power-law index:

>>> import matplotlib.pyplot as plt
>>> from turbustat.simulator import make_extended
>>> from turbustat.io.sim_tools import create_fits_hdu
>>> from astropy import units as u
>>> # Image drawn from red-noise
>>> rnoise_img = make_extended(256, powerlaw=3.)
>>> # Define properties to generate WCS information
>>> pixel_scale = 3 * u.arcsec
>>> beamfwhm = 3 * u.arcsec
>>> imshape = rnoise_img.shape
>>> restfreq = 1.4 * u.GHz
>>> bunit = u.K
>>> # Create a FITS HDU
>>> plaw_hdu = create_fits_hdu(rnoise_img, pixel_scale, beamfwhm, imshape, restfreq, bunit)
>>> plt.imshow(plaw_hdu.data)  # doctest: +SKIP
_images/rednoise_slope3_img.png

The power-spectrum of the image should give a slope of 3:

>>> from turbustat.statistics import PowerSpectrum
>>> pspec = PowerSpectrum(plaw_hdu)
>>> pspec.run(verbose=True, radial_pspec_kwargs={'binsize': 1.0},
...           fit_kwargs={'weighted_fit': True}, fit_2D=False,
...           low_cut=1. / (60 * u.pix))  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 8.070e+06
Date:                Thu, 21 Jun 2018   Prob (F-statistic):               0.00
Time:                        11:43:47   Log-Likelihood:                 701.40
No. Observations:                 177   AIC:                            -1399.
Df Residuals:                     175   BIC:                            -1392.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0032      0.001      3.952      0.000       0.002       0.005
x1            -2.9946      0.001  -2840.850      0.000      -2.997      -2.992
==============================================================================
Omnibus:                      252.943   Durbin-Watson:                   1.077
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            26797.433
Skew:                          -5.963   Prob(JB):                         0.00
Kurtosis:                      62.087   Cond. No.                         4.55
==============================================================================
_images/rednoise_pspec_slope3.png

Now we will smooth this image with a Gaussian beam. The easiest way to do this is to use the built-in tools from the spectral-cube and radio_beam packages. We will convert the FITS HDU to a spectral-cube Projection, and define a pencil beam for the initial image:

>>> from spectral_cube import Projection
>>> from radio_beam import Beam
>>> pencil_beam = Beam(0 * u.deg)
>>> plaw_proj = Projection.from_hdu(plaw_hdu)
>>> plaw_proj = plaw_proj.with_beam(pencil_beam)

Next we will define the beam to smooth to. A 3-pixel wide FWHM is reasonable:

>>> new_beam = Beam(3 * plaw_hdu.header['CDELT2'] * u.deg)
>>> plaw_conv = plaw_proj.convolve_to(new_beam)
>>> plaw_conv.quicklook()  # doctest: +SKIP
_images/rednoise_slope3_img_smoothed.png

How has smoothing changed the shape of the power-spectrum?

>>> # Change the colours and comment these lines if you don't use seaborn
>>> import seaborn as sb  # doctest: +SKIP
>>> col_pal = sb.color_palette()  # doctest: +SKIP
>>> pspec2 = PowerSpectrum(plaw_conv)
>>> pspec2.run(verbose=True, xunit=u.pix**-1, fit_2D=False,
...            low_cut=0.025 / u.pix, high_cut=0.1 / u.pix,
...            radial_pspec_kwargs={'binsize': 1.0},
...            apodize_kernel='tukey')  # doctest: +SKIP
>>> plt.axvline(np.log10(1 / 3.), color=col_pal[3], linewidth=8, alpha=0.8,
...             zorder=1)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.988
Model:                            OLS   Adj. R-squared:                  0.988
Method:                 Least Squares   F-statistic:                     2059.
Date:                Thu, 21 Jun 2018   Prob (F-statistic):           1.54e-25
Time:                        14:23:19   Log-Likelihood:                 35.997
No. Observations:                  27   AIC:                            -67.99
Df Residuals:                      25   BIC:                            -65.40
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.0626      0.098    -10.848      0.000      -1.264      -0.861
x1            -3.5767      0.079    -45.378      0.000      -3.739      -3.414
==============================================================================
Omnibus:                        3.417   Durbin-Watson:                   0.840
Prob(Omnibus):                  0.181   Jarque-Bera (JB):                2.072
Skew:                          -0.650   Prob(JB):                        0.355
Kurtosis:                       3.391   Cond. No.                         15.7
==============================================================================
_images/rednoise_pspec_slope3_smoothed.png

The slope of the power-spectrum is significantly steepened on small scales by the beam (see the reported result in the variable x1 above). This steepening occurs on scales much larger than the beam FWHM, which is indicated by the thick purple vertical line on the left-hand side of the plot. The fitting was restricted to scales larger than about three times the beam width, yet the recovered slope is still steeper than the original -3.

Also note that convolving the image with the beam causes some tapering at the edges of the image, breaking the periodicity at the edges. The image was apodized with a Tukey window, which causes some of the deviations at large scales (small frequencies). See the tutorial page on apodizing kernels for more.

The beam must be corrected for in the image prior to fitting the power-spectrum. This can be done by (1) including a Gaussian beam component in the model used to fit the power-spectrum, or (2) dividing the power-spectrum of the image by the power-spectrum of the beam response. The former requires a non-linear model and is not currently implemented in TurbuStat (see Martin et al. 2015 for an example). The latter method can be applied prior to fitting, allowing a linear model to still be used for fitting.

The beam correction in TurbuStat requires the optional package radio_beam to be installed. radio_beam allows the beam response for any 2D elliptical Gaussian to be returned. For statistics that create a power-spectrum (Spatial Power-Spectrum, VCA, MVC), the beam correction can be applied by specifying beam_correct=True:

>>> pspec3 = PowerSpectrum(plaw_conv)
>>> pspec3.run(verbose=True, xunit=u.pix**-1, fit_2D=False,
...            low_cut=0.025 / u.pix, high_cut=0.4 / u.pix,
...            apodize_kernel='tukey', beam_correct=True)  # doctest: +SKIP
>>> plt.axvline(np.log10(1 / 3.), color=col_pal[3], linewidth=8, alpha=0.8,
...             zorder=1)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.998
Model:                            OLS   Adj. R-squared:                  0.998
Method:                 Least Squares   F-statistic:                 8.828e+04
Date:                Thu, 21 Jun 2018   Prob (F-statistic):          5.55e-192
Time:                        14:38:33   Log-Likelihood:                 268.87
No. Observations:                 137   AIC:                            -533.7
Df Residuals:                     135   BIC:                            -527.9
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.2247      0.008    -27.671      0.000      -0.241      -0.209
x1            -2.9961      0.010   -297.116      0.000      -3.016      -2.976
==============================================================================
Omnibus:                        7.089   Durbin-Watson:                   1.500
Prob(Omnibus):                  0.029   Jarque-Bera (JB):                9.274
Skew:                           0.285   Prob(JB):                      0.00969
Kurtosis:                       4.140   Cond. No.                         5.50
==============================================================================
_images/rednoise_pspec_slope3_smoothed_beamcorr.png

The shape of the power-spectrum has been restored and we recover the correct slope. The deviation on small scales (large frequencies) occurs on scales smaller than about the FWHM of the beam where the information has been lost by the spatial smoothing applied to the image. If the beam is over-sampled by a larger factor — say with a 6-pixel FWHM instead of 3 — the increase in power on small scales will affect a larger region of the power-spectrum. This region should be excluded from the power-spectrum fit. A reasonable lower-limit to fit the power-spectrum to is the FWHM of the beam. Additional noise in the image will tend to flatten the power-spectrum to larger scales, so setting the lower fitting limit to a couple times the beam width may be necessary. We recommend visually examining the quality of the fit.
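
As a sketch of this recommendation (the factor of two is illustrative, not a rule), the beam FWHM in pixels can be computed from new_beam and the header, and the fit can then exclude scales smaller than twice the FWHM:

>>> # Beam FWHM in pixels (new_beam and plaw_hdu are defined above)
>>> fwhm_pix = (new_beam.major / (plaw_hdu.header['CDELT2'] * u.deg)).decompose().value  # doctest: +SKIP
>>> pspec_beamlim = PowerSpectrum(plaw_conv)  # doctest: +SKIP
>>> pspec_beamlim.run(fit_2D=False, beam_correct=True,
...                   low_cut=0.025 / u.pix,
...                   high_cut=1. / (2 * fwhm_pix * u.pix))  # doctest: +SKIP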

Here are the three power-spectra shown above overplotted to highlight the shape changes from spatial smoothing:

>>> pspec.plot_fit(color=col_pal[0], label='Original')  # doctest: +SKIP
>>> pspec2.plot_fit(color=col_pal[1], label='Smoothed')  # doctest: +SKIP
>>> pspec3.plot_fit(color=col_pal[2], label='Beam-Corrected')  # doctest: +SKIP
>>> plt.legend(frameon=True, loc='lower left')  # doctest: +SKIP
>>> plt.axvline(np.log10(1 / 3.), color=col_pal[3], linewidth=8, alpha=0.8, zorder=-1)  # doctest: +SKIP
>>> plt.ylim([-2, 7.5])  # doctest: +SKIP
>>> plt.tight_layout()  # doctest: +SKIP
_images/rednoise_pspec_slope3_beam_comparisons.png

Similar fitting restrictions apply to the MVC and VCA, as well. The beam correction can be applied in the same manner as described above. For other spatial methods which do not use the power-spectrum, the scales of the beam should at least be excluded from any fitting. For example, lag scales smaller than the beam in the Delta-Variance, Wavelets, and SCF should not be fit. The spatial filtering used to measure Statistical Moments should be set to a width of at least the beam size.

Dealing with missing data

When running any statistic in TurbuStat on a data set with noise, there is a question of whether or not to mask out the noisy data. Masking noisy data is a common practice with observational data and is often crucial for recovering scientifically-usable results. However, some of the statistics in TurbuStat implicitly assume the map is continuous. This page demonstrates the pros and cons of masking data versus including noisy regions.

We will create a red noise image to use as an example:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from turbustat.simulator import make_extended
>>> img = make_extended(256, powerlaw=3., randomseed=54398493)
>>> # Now shuffle so the peak is near the centre
>>> img = np.roll(img, (128, -30), (0, 1))
>>> img -= img.min()
>>> plt.imshow(img, origin='lower')  
>>> plt.colorbar()  
_images/missing_data_image.png

After creating the image, we shifted the peak of the map to near the centre. We also subtracted the minimum value from the image, removing negative values to better mimic observational data.

We will compare the effect of noise and masking on two statistics: the PowerSpectrum and the Delta-variance. The power-spectrum relies on a Fast Fourier Transform (FFT), which implicitly assumes that the map edges are periodic. This assumption presents an issue for observational maps with emission at their edges and requires an apodizing kernel to be applied to the data prior to computing the power-spectrum. The delta-variance relies on convolving the map with a set of kernels of increasing size. While the convolution also uses a Fourier transform, noisy or missing regions in the data are down-weighted, so this method can be used on observational data with arbitrary shape (Ossenkopf et al. 2008a).

Note

Throughout this example, we will not create realistic WCS information for the image because we will be altering the image in each step. For brevity, we pass fits.PrimaryHDU(img) which creates a FITS HDU without complete WCS information. For a tutorial on how TurbuStat creates mock WCS information, see here.

First, we will use both statistics on the unaltered image:

>>> from astropy.io import fits
>>> from turbustat.statistics import PowerSpectrum, DeltaVariance
>>> pspec = PowerSpectrum(fits.PrimaryHDU(img))
>>> pspec.run(verbose=True)  
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 3.239e+05
Date:                Thu, 14 Feb 2019   Prob (F-statistic):          1.61e-293
Time:                        17:08:19   Log-Likelihood:                 610.32
No. Observations:                 181   AIC:                            -1217.
Df Residuals:                     179   BIC:                            -1210.
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.3594      0.003    755.971      0.000       2.353       2.366
x1            -2.9893      0.005   -569.096      0.000      -3.000      -2.979
==============================================================================
Omnibus:                      150.694   Durbin-Watson:                   1.634
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5853.912
Skew:                          -2.593   Prob(JB):                         0.00
Kurtosis:                      30.374   Cond. No.                         4.15
==============================================================================
_images/missing_data_pspec.png

The power-spectrum recovers the expected slope of \(-3\). The delta-variance slope should be \(-\beta -2\), where \(\beta\) is the power-spectrum slope, so we should find a slope of \(1\):

>>> delvar = DeltaVariance(fits.PrimaryHDU(img))
>>> delvar.run(verbose=True)  
                       WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            WLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                     1741.
Date:                Thu, 14 Feb 2019   Prob (F-statistic):           3.50e-23
Time:                        17:13:16   Log-Likelihood:                 48.412
No. Observations:                  25   AIC:                            -92.82
Df Residuals:                      23   BIC:                            -90.39
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.8780      0.017   -113.441      0.000      -1.910      -1.846
x1             0.9986      0.024     41.723      0.000       0.952       1.046
==============================================================================
Omnibus:                        6.913   Durbin-Watson:                   1.306
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                6.334
Skew:                           0.535   Prob(JB):                       0.0421
Kurtosis:                       5.221   Cond. No.                         12.1
==============================================================================
_images/missing_data_delvar.png

Indeed, we recover the correct slope from the delta-variance.

To demonstrate how masking affects each of these statistics, we will arbitrarily mask values below the 25th percentile in the example image and run each statistic:

>>> masked_img = img.copy()
>>> masked_img[masked_img < np.percentile(img, 25)] = np.NaN
>>> plt.imshow(masked_img, origin='lower')  
>>> plt.colorbar()  
_images/missing_data_image_masked.png

The central bright region remains, but many of the fainter features around the image edges have been masked:

>>> pspec_masked = PowerSpectrum(fits.PrimaryHDU(masked_img))
>>> pspec_masked.run(verbose=True, high_cut=10**-1.25 / u.pix)  
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                     2636.
Date:                Thu, 14 Feb 2019   Prob (F-statistic):           3.45e-19
Time:                        17:19:21   Log-Likelihood:                 27.859
No. Observations:                  18   AIC:                            -51.72
Df Residuals:                      16   BIC:                            -49.94
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.5321      0.080     44.347      0.000       3.376       3.688
x1            -2.6362      0.051    -51.344      0.000      -2.737      -2.536
==============================================================================
Omnibus:                        0.336   Durbin-Watson:                   2.692
Prob(Omnibus):                  0.845   Jarque-Bera (JB):                0.445
Skew:                           0.259   Prob(JB):                        0.800
Kurtosis:                       2.429   Cond. No.                         14.6
==============================================================================
_images/missing_data_pspec_masked.png

Masking has significantly flattened the power-spectrum, even with the restriction we added to fit only scales larger than \(10^{1.25}\approx18\) pixels. In fact, this flattening is similar to how noise affects the power-spectrum. Why is this? FFTs cannot be used on data with missing values specified as NaNs. Instead, we have to choose a finite value to fill the missing data; we typically choose to fill these regions with \(0\). When large regions are missing, the fill value leads to a large region of constant values that, by itself, would have a power-spectrum index of \(0\).
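
To make the filling explicit, here is a sketch of the zero-fill described above, applied by hand for illustration:

>>> # Replace the masked (NaN) pixels with zeros before the FFT
>>> filled_img = np.nan_to_num(masked_img, nan=0.)  # doctest: +SKIP
>>> plt.imshow(filled_img, origin='lower')  # doctest: +SKIP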

The delta-variance avoids the filling issue for masked data by introducing weights. Places with missing data have a very low weight or remain masked. The astropy convolution package has routines for interpolating over masked data, which are useful when small regions are missing but typically not when the missing data lie at the edges of the emission in a map. With the masked image, the delta-variance curve we find is:

>>> delvar_masked = DeltaVariance(fits.PrimaryHDU(masked_img))
>>> delvar_masked.run(verbose=True, xlow=2 * u.pix, xhigh=50 * u.pix)  
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            WLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                     1860.
Date:                Thu, 14 Feb 2019   Prob (F-statistic):           5.52e-18
Time:                        17:30:29   Log-Likelihood:                 52.504
No. Observations:                  18   AIC:                            -101.0
Df Residuals:                      16   BIC:                            -99.23
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.8679      0.015   -120.680      0.000      -1.898      -1.838
x1             0.9520      0.022     43.128      0.000       0.909       0.995
==============================================================================
Omnibus:                        4.484   Durbin-Watson:                   1.339
Prob(Omnibus):                  0.106   Jarque-Bera (JB):                2.180
Skew:                           0.694   Prob(JB):                        0.336
Kurtosis:                       3.991   Cond. No.                         11.7
==============================================================================
_images/missing_data_delvar_masked.png

When restricting the fit to scales of less than 50 pixels (about a quarter of the image), we recover a slope of \(0.95\), much closer to the expected value of \(1.0\) than the power-spectrum result.

Another issue that may be encountered with observational data is large empty regions in a map, either due to masking (similar to the above example) or when we want to investigate a single object and have masked out all others. This situation could arise when the data are segmented into individual “blobs” and we want to study the properties of each blob. To mimic this situation, we will pad the edges of the image with empty values (following the numpy example):

>>> def pad_with(vector, pad_width, iaxis, kwargs):
...    pad_value = kwargs.get('padder', 0.)
...    vector[:pad_width[0]] = pad_value
...    vector[-pad_width[1]:] = pad_value
...    return vector
>>> padded_masked_img = np.pad(masked_img, 128, pad_with, padder=np.NaN)

We are also only going to keep the biggest continuous region in the padded image to mimic studying a single object picked from a larger image:

>>> from scipy import ndimage as nd
>>> labs, num = nd.label(np.isfinite(padded_masked_img), np.ones((3, 3)))
>>> # Keep the largest region only
>>> padded_masked_img[np.where(labs > 1)] = np.NaN
>>> plt.imshow(padded_masked_img, origin='lower')  
>>> plt.colorbar()  
_images/missing_data_image_masked_padded.png

The unmasked region is now surrounded by huge empty regions. How does this affect the power-spectrum and delta-variance?:

>>> pspec_masked_pad = PowerSpectrum(fits.PrimaryHDU(padded_masked_img))
>>> pspec_masked_pad.run(verbose=True, high_cut=10**-1.25 / u.pix)  

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.985
Model:                            OLS   Adj. R-squared:                  0.985
Method:                 Least Squares   F-statistic:                     1166.
Date:                Fri, 15 Feb 2019   Prob (F-statistic):           1.41e-29
Time:                        13:43:42   Log-Likelihood:                 35.094
No. Observations:                  39   AIC:                            -66.19
Df Residuals:                      37   BIC:                            -62.86
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.4746      0.123     28.245      0.000       3.233       3.716
x1            -2.6847      0.079    -34.144      0.000      -2.839      -2.531
==============================================================================
Omnibus:                        1.962   Durbin-Watson:                   2.222
Prob(Omnibus):                  0.375   Jarque-Bera (JB):                1.840
Skew:                          -0.489   Prob(JB):                        0.399
Kurtosis:                       2.580   Cond. No.                         12.2
==============================================================================
_images/missing_data_pspec_masked_pad.png

The power-spectrum is similarly flattened as in the non-padded case. However, the sharp cut-off at the edges of the non-masked region leads to the Gibbs phenomenon (i.e., ringing), evident from the horizontal and vertical stripes in the 2D power-spectrum on the right. The ringing can be minimized by applying an apodizing kernel.
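
For instance, a sketch of re-running the power-spectrum on the padded image with a Tukey window (the alpha value is illustrative; see the apodizing kernel tutorial above):

>>> pspec_masked_pad_apod = PowerSpectrum(fits.PrimaryHDU(padded_masked_img))  # doctest: +SKIP
>>> pspec_masked_pad_apod.run(verbose=False, high_cut=10**-1.25 / u.pix,
...                           apodize_kernel='tukey', alpha=0.3)  # doctest: +SKIP

Running the delta-variance on the padded image: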

>>> delvar_masked_padded = DeltaVariance(fits.PrimaryHDU(padded_masked_img))
>>> delvar_masked_padded.run(verbose=True, xlow=2 * u.pix, xhigh=70 * u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            WLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 1.120e+04
Date:                Fri, 15 Feb 2019   Prob (F-statistic):           3.37e-24
Time:                        13:48:37   Log-Likelihood:                 48.777
No. Observations:                  18   AIC:                            -93.55
Df Residuals:                      16   BIC:                            -91.77
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.8663      0.004   -425.902      0.000      -1.875      -1.858
x1             0.9501      0.009    105.823      0.000       0.933       0.968
==============================================================================
Omnibus:                       26.283   Durbin-Watson:                   1.593
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               41.814
Skew:                          -2.253   Prob(JB):                     8.32e-10
Kurtosis:                       8.953   Cond. No.                         11.6
==============================================================================
_images/missing_data_delvar_masked_pad.png

The delta-variance is similarly unaffected by the padded region. Because of the weighting functions, the convolution steps in the delta-variance do not suffer from ringing like the power-spectrum does. Note that this delta-variance curve extends to larger scales because of the padding. On these larger scales, the lack of emission causes the delta-variance to decrease. This is the expected behaviour when large regions of an image are masked, and the user can either (i) limit the lags to smaller values, or (ii) exclude large scales from the fit (as we do in this example).

Now, we will compare the masking examples above to the case where noise is added to the image (without padding). The noise is drawn from a normal distribution with a standard deviation of 1:

>>> noise_rms = 1.
>>> noisy_img = img + np.random.normal(0., noise_rms, img.shape)
>>> plt.imshow(noisy_img, origin='lower')  
>>> plt.colorbar()  
_images/missing_data_image_noisy.png

Since the noise is spatially uncorrelated, the power-spectrum of the noise alone is flat (an index of 0). We therefore expect the power-spectrum to flatten on small scales due to the noise:

>>> pspec_noisy = PowerSpectrum(fits.PrimaryHDU(noisy_img))
>>> pspec_noisy.run(verbose=True, high_cut=10**-1.2 / u.pix)  
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                     7447.
Date:                Fri, 15 Feb 2019   Prob (F-statistic):           4.08e-26
Time:                        13:58:28   Log-Likelihood:                 47.231
No. Observations:                  21   AIC:                            -90.46
Df Residuals:                      19   BIC:                            -88.37
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.4964      0.048     51.948      0.000       2.402       2.591
x1            -2.9054      0.034    -86.298      0.000      -2.971      -2.839
==============================================================================
Omnibus:                        3.284   Durbin-Watson:                   2.477
Prob(Omnibus):                  0.194   Jarque-Bera (JB):                1.493
Skew:                           0.450   Prob(JB):                        0.474
Kurtosis:                       3.948   Cond. No.                         13.3
==============================================================================
_images/missing_data_pspec_noisy.png

The power-spectrum does indeed approach an index of 0 on small scales due to the noise. By excluding scales smaller than \(10^{1.25}\sim18\) pixels, however, we recover an index of \(-2.9\), much closer to the actual index of \(-3\) than in the masking example above. How strongly the power-spectrum index is biased by the noise will depend on the noise level relative to the signal. An alternative approach to modelling the power-spectrum would be to include a noise component (e.g., Miville-Deschenes et al. 2010), but this is not currently implemented in TurbuStat.
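As a rough sketch of that alternative, a power-law plus a constant noise floor can be fit outside of TurbuStat, assuming the binned one-dimensional spectrum and its frequencies are accessible as the ps1D and freqs attributes (attribute names assumed here):

>>> import numpy as np
>>> from scipy.optimize import curve_fit
>>> def powerlaw_plus_noise(freq, amp, index, noise):
...     # Power-law signal plus a constant (white) noise floor
...     return amp * freq**index + noise
>>> freqs = pspec_noisy.freqs.value  # doctest: +SKIP
>>> ps1D = pspec_noisy.ps1D  # doctest: +SKIP
>>> popt, pcov = curve_fit(powerlaw_plus_noise, freqs, ps1D,
...                        p0=(ps1D.max(), -3., np.median(ps1D)))  # doctest: +SKIP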

Running delta-variance on the noisy image gives:

>>> delvar_noisy = DeltaVariance(fits.PrimaryHDU(noisy_img))
>>> delvar_noisy.run(verbose=True, xlow=10 * u.pix, xhigh=70 * u.pix)  
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            WLS   Adj. R-squared:                  0.998
Method:                 Least Squares   F-statistic:                     842.9
Date:                Fri, 15 Feb 2019   Prob (F-statistic):           9.52e-12
Time:                        14:17:20   Log-Likelihood:                 41.456
No. Observations:                  13   AIC:                            -78.91
Df Residuals:                      11   BIC:                            -77.78
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.8245      0.041    -45.005      0.000      -1.904      -1.745
x1             0.9480      0.033     29.034      0.000       0.884       1.012
==============================================================================
Omnibus:                        7.660   Durbin-Watson:                   1.670
Prob(Omnibus):                  0.022   Jarque-Bera (JB):                3.768
Skew:                          -1.137   Prob(JB):                        0.152
Kurtosis:                       4.336   Cond. No.                         20.7
==============================================================================
_images/missing_data_delvar_noisy.png

The delta-variance slope is flattened by about the same amount as in the masked image example above (\(0.95\)). Thus masking and noise (at this level) have a comparable effect on the slope of the delta-variance.

From these examples, we see that the power-spectrum is more biased by masking than by keeping noisy regions in the image. The delta-variance is similarly biased in both cases because of how noisy regions are down-weighted in the convolution step.

Note

We encourage users to test a statistic with and without masking their data to determine how the statistic is affected by masking.

Where noise matters

Noise will affect all the statistics and metrics in TurbuStat to some extent. This section lists common issues that may be encountered with observational data.

  • The previous section shows an example of how noise flattens a power-spectrum. This will affect the spatial power-spectrum, MVC, VCA, VCS (at small scales), the Wavelet transform, and the delta-variance. If the noise level is moderate, the range that is fit can be altered to avoid the scales where noise severely flattens the power-spectrum or equivalent relation.
  • Fits to the PDF can be affected by noise, which tends to cluster the low values in an image around 0. If the noise is (mostly) uncorrelated, this noise component will be Gaussian. A minimum value to include in the PDF should be set to avoid this region. Furthermore, the default log-normal model cannot handle negative values and will raise an error in this case.
  • Many of the distance metrics are defined in terms of the significance of the difference between two values. For example, the power-spectrum distance is the absolute difference between two indices normalized by the square root of the sum of the variances from the fit uncertainties (\(d=|\beta_1 - \beta_2|\, /\, \sqrt{\sigma_1^2 + \sigma_2^2}\)). If one data set is significantly noisier than the other, this will _lower_ the distance. It is therefore important to compare all distances to a common baseline; the significance of a distance comes from this comparison rather than from its absolute value. Koch et al. 2017 explore this in detail.

Statistics

Using statistics classes

The statistics implemented in TurbuStat are python classes. This structure allows for derived properties to persist without having to manually carry them forward through each step.

Using most of the statistic classes involves two steps:

  1. Initialization – The data, relevant WCS information, and other unchanging properties (like the distance) are specified here. Some of the statistics calculated at specific scales (like Wavelet or SCF) can have those scales set here, too. Below is an example using Wavelet:

    >>> from turbustat.statistics import Wavelet
    >>> import numpy as np
    >>> from astropy.io import fits
    >>> import astropy.units as u
    >>> hdu = fits.open("file.fits")[0]  # doctest: +SKIP
    >>> spatial_scales = np.linspace(0.1, 1.0, 20) * u.pc   # doctest: +SKIP
    >>> wave = Wavelet(hdu, scales=spatial_scales,
    ...                distance=260 * u.pc)  # doctest: +SKIP
    
  2. The run function – For most use-cases, the run function can be used to compute the statistic. All of the statistics have this function. It will compute the statistic, perform any relevant fitting, and optionally create a summary plot. The docstring for each run function describes the parameters that can be changed through it. The parameters that are critical to the behaviour of the statistic can all be set from run. Continuing with the Wavelet example from above, the run function is called as:

    >>> wave.run(verbose=True, xlow=0.2 * u.pc, xhigh=0.8 * u.pc)  # doctest: +SKIP
    

This function will run the wavelet transform and fit the relation between the given bounds (xlow and xhigh). With verbose=True, a summary plot is returned.

What if you need to set parameters not accessible from run? run wraps multiple steps into one function; however, the statistic can also be run step-by-step when fine-tuning is needed for a particular data set. Each of the statistics has at least one computational step. For Wavelet, there are two steps: (1) computing the transform (compute_transform) and (2) fitting the log of the transform (fit_transform). Running these two functions is equivalent to using run.
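A minimal sketch of the equivalent two-step call is shown below; the fitting limits are assumed to be accepted by fit_transform in the same form as by run:

>>> wave.compute_transform()  # doctest: +SKIP
>>> wave.fit_transform(xlow=0.2 * u.pc, xhigh=0.8 * u.pc)  # doctest: +SKIP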

The statistics also have plotting functions. From run, these functions are called whenever verbose=True is given. All of the plotting functions start with plot_; for Wavelet, the plotting function is plot_transform. Supplying a save_name to this function will save the figure. For spatial transforms (like the wavelet transform), the x-units can be set to pixel, angular, or physical (when a distance is given) units, and additional arguments can be given to set the colours and markers used in the plot.
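For example, the plot can be saved directly from the plotting function; the xunit keyword name is assumed here to match the one used by run:

>>> wave.plot_transform(save_name="wave_transform.png", xunit=u.pc)  # doctest: +SKIP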

Statistic classes can also be saved or loaded as pickle files. Saving is performed with the save_results function:

>>> wave.save_results("wave_file.pkl", keep_data=False)  # doctest: +SKIP

Whether to include the data in the saved file is set with keep_data. By default, the data are not saved, which reduces the storage space needed.

Note

If the statistic is not saved with the data, it cannot be recomputed after loading.

Loading the statistic from a saved file uses the load_results function:

>>> new_wave = Wavelet.load_results("wave_file.pkl")  # doctest: +SKIP

Unless the data were saved, everything except the data is now accessible from new_wave.

Bispectrum
Overview

The bispectrum is the Fourier transform of the three-point covariance function. It represents the next higher-order expansion upon the more commonly-used two-point statistics, where the autocorrelation function is the Fourier transform of the power spectrum. The bispectrum is computed using:

\[B(k_1, k_2) = F^{\ast}(k_1 + k_2)\,F(k_1)\,F(k_2)\]

where \(\ast\) denotes the complex conjugate, \(F\) is the Fourier transform of some signal, and \(k_1,\,k_2\) are wavenumbers.

The bispectrum retains phase information that is lost in the power spectrum and is therefore useful for investigating phase coherence and coupling.

The use of the bispectrum in the ISM was introduced by Burkhart et al. 2009 and is further used in Burkhart et al. 2010 and Burkhart et al. 2016.

The phase information retained by the bispectrum requires it to be a complex quantity. A real, normalized version can be expressed through the bicoherence. The bicoherence is a measure of phase coupling alone, where values of 1 and 0 represent completely coupled and completely uncoupled phases, respectively. The form used here is defined by Hagihira et al. 2001:

\[b(k_1, k_2) = \frac{|B(k_1, k_2)|}{\sum_{k_1, k_2} |F(k_1)F(k_2)F^{\ast}(k_1 + k_2)|}\]

The denominator normalizes by the “power” at the modes \(k_1,\,k_2\); this effectively divides out the value of the power spectrum, leaving a fractional difference that is entirely the result of the phase coupling. Alternatively, the denominator can be thought of as the value attained if the modes \(k_1,\,k_2\) are completely phase coupled, and is therefore the maximal attainable value.
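As an illustration of the definition above (and not of how TurbuStat samples the bispectrum internally), the bispectrum for a single pair of wavevectors can be written directly with numpy:

>>> import numpy as np
>>> img = np.random.randn(128, 128)  # a stand-in 2D image
>>> fft = np.fft.fft2(img)
>>> k1 = (3, 0)
>>> k2 = (0, 5)
>>> k3 = (k1[0] + k2[0], k1[1] + k2[1])  # k1 + k2, the coupling term
>>> B = np.conj(fft[k3]) * fft[k1] * fft[k2]  # B(k1, k2) for this pair of modes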

Using

The data in this tutorial are available here.

We need to import the Bispectrum code, along with a few other common packages:

>>> from turbustat.statistics import Bispectrum
>>> from astropy.io import fits
>>> import astropy.units as u
>>> import matplotlib.pyplot as plt

Next, we load in the data:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

While the bispectrum can be extended to sample in N-dimensions, the current implementation requires a 2D input. In all previous work, the computation was performed on an integrated intensity or column density map.

First, the Bispectrum class is initialized:

>>> bispec = Bispectrum(moment0)  # doctest: +SKIP

The bispectrum requires only the image, not a header, so passing any arbitrary 2D array will work.

Even using a small 2D image (128x128 here), the number of possible combinations for \(k_1,\,k_2\) is massive (the maximum value of \(k_1,\,k_2\) is half of the largest dimension size in the image). To save time, we can randomly sample some number of phases for each value of \(k_1,\,k_2\) (so \(k_1 + k_2\), the coupling term, changes). This is set by nsamples. There is shot noise associated with this random sampling, and the effect of changing nsamples should be tested. For this example, structure begins to become apparent with about 1000 samples. The figures here use 10000 samples to make the structure more evident. This will take about 10 minutes to run on this image!

The bispectrum and bicoherence maps are computed with run:

>>> bispec.run(verbose=True, nsamples=10000)  # doctest: +SKIP
_images/bispectrum_design4.png

run only performs a single step: compute_bispectrum. For this, there are two optional inputs that may be set:

>>> bispec.run(nsamples=10000, mean_subtract=True, seed=4242424)  # doctest: +SKIP

seed sets the random seed for the sampling, and mean_subtract removes the mean from the data before computing the bispectrum. This removes the “zero frequency” power, defined by the largest scale in the image, which gives the phase coupling along the \(k_1 = k_2\) line. Removing the mean highlights the non-linear mode interactions.

_images/bispectrum_w_and_wo_meansub_coherence.png

The figure shows the effect on the bicoherence from subtracting the mean. The colorbar is limited between 0 and 1, with black representing 1.

Both radial and azimuthal slices can be extracted from the bispectrum to examine how its properties vary with angle and radius. Using the non-mean subtracted example, radial slices can be returned with:

>>> rad_slices = bispec.radial_slices([30, 45, 60] * u.deg, 20 * u.deg, value='bispectrum_logamp')  # doctest: +SKIP
>>> plt.errorbar(rad_slices[30][0], rad_slices[30][1], yerr=rad_slices[30][2], label='30')  # doctest: +SKIP
>>> plt.errorbar(rad_slices[45][0], rad_slices[45][1], yerr=rad_slices[45][2], label='45')  # doctest: +SKIP
>>> plt.errorbar(rad_slices[60][0], rad_slices[60][1], yerr=rad_slices[60][2], label='60')  # doctest: +SKIP
>>> plt.legend()  # doctest: +SKIP
>>> plt.xlabel("Radius")  # doctest: +SKIP
>>> plt.ylabel("log Bispectrum")  # doctest: +SKIP
_images/bispectrum_radial_slices.png

Three slices are returned, centered at 30, 45, and 60 degrees. The width of each slice is 20 degrees. rad_slices is a dictionary whose keys are the center angles given (rounded to the nearest integer). Each entry in the dictionary contains the bin centers ([0]), values ([1]), and standard deviations ([2]). The center angles and slice width can be given in any angular unit. By default, the averaging is over the bispectrum amplitudes. By passing value='bispectrum_logamp', the log of the amplitudes is instead averaged over. The bicoherence array can also be averaged over with value='bicoherence'. The size of the bins can be changed by passing bin_width to radial_slices; the default is 1.
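For instance, a single, wider-binned slice of the bicoherence could be extracted with (a sketch using only the keywords described above):

>>> bicoh_slice = bispec.radial_slices([45] * u.deg, 20 * u.deg,
...                                    value='bicoherence', bin_width=2)  # doctest: +SKIP
>>> plt.errorbar(bicoh_slice[45][0], bicoh_slice[45][1], yerr=bicoh_slice[45][2])  # doctest: +SKIP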

The azimuthal slices are similarly calculated:

>>> azim_slices = bispec.azimuthal_slice([8, 16, 50], 10, value='bispectrum_logamp', bin_width=5 * u.deg)  # doctest: +SKIP
>>> plt.errorbar(azim_slices[8][0], azim_slices[8][1], yerr=azim_slices[8][2], label='8')  # doctest: +SKIP
>>> plt.errorbar(azim_slices[16][0], azim_slices[16][1], yerr=azim_slices[16][2], label='16')  # doctest: +SKIP
>>> plt.errorbar(azim_slices[50][0], azim_slices[50][1], yerr=azim_slices[50][2], label='50')  # doctest: +SKIP
>>> plt.legend()  # doctest: +SKIP
>>> plt.xlabel("Theta (rad)")  # doctest: +SKIP
>>> plt.ylabel("log Bispectrum")  # doctest: +SKIP
_images/bispectrum_azim_slices.png

The slices are returned over angles 0 to \(\pi / 2\). For the azimuthal slices, the center radii are given in units of the wavevector, and a single radial width (10 here) is applied to all slices. If different widths are needed, multiple widths can be given, though their number must match the number of center radii.
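As a sketch of the latter case, each radius can be paired with its own width (using only the arguments described above):

>>> azim_slices = bispec.azimuthal_slice([8, 16, 50], [5, 10, 15],
...                                      value='bispectrum_logamp', bin_width=5 * u.deg)  # doctest: +SKIP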

Delta-Variance
Overview

The \(\Delta\)-variance technique was introduced by Stutzki et al. 1998 as a generalization of the Allan-variance, a technique used to study the drift in instrumentation. They found that molecular cloud structures are well characterized by fractional Brownian motion, which results from a power-law power spectrum with a random phase distribution. The technique was extended by Bensch et al. 2001 to account for the effects of beam smoothing and edge effects on a discretely sampled grid. With this approach, they identified a functional form to recover the index of the near power-law relation. The technique was extended again by Ossenkopf et al. 2008a, where the computation using filters of different scales was moved into the Fourier domain, allowing for a significant improvement in speed. The following description uses the Fourier-domain formulation.

Delta-variance measures the amount of structure over a given range of scales. Each delta-variance point is calculated by filtering an image with an azimuthally symmetric kernel (a French-hat or Mexican-hat, i.e., Ricker, kernel) and computing the variance of the filtered map. Due to the effects of a finite grid, which typically does not have periodic boundaries, and the effects of noise, Ossenkopf et al. 2008a proposed a convolution-based method that splits the kernel into its central peak and outer annulus, convolves the two regions separately, and subtracts the annulus-convolved map from the peak-convolved map. The Mexican-hat kernel separation can be defined using two Gaussian functions. A weight map was also introduced to minimize noise effects where there are low S/N regions in the data. Altogether, this is expressed as:

\[F(r) = \frac{G_{\rm core}(r)}{W_{\rm core}(r)} - \frac{G_{\rm ann}(r)}{W_{\rm ann}(r)}\]

where \(r\) is the kernel size, \(G\) is the convolved image map, and \(W\) is the convolved weight map. The delta-variance is then,

\[\sigma_{\delta}(r) = \frac{\Sigma_{\rm map} \mathrm{Var}(F(r)) W_{\rm tot}(r)}{\Sigma_{\rm map} W_{\rm tot}(r)}\]

where \(W_{\rm tot}(r) = W_{\rm core}(r)\,W_{\rm ann}(r)\).

Since the kernel is separated into two components, the ratio between their widths can be set independently. Ossenkopf et al. 2008a find an optimal ratio of 1.5 for the Mexican-hat kernel, which is the kernel used in TurbuStat.

Performing this operation yields a power-law-like relation between the scales \(r\) and the delta-variance. This power-law relation, measured in the real domain, is analogous to the two-point structure function (e.g., Miesch & Bally 1994). The use of a convolution kernel, as well as the handling of map edges, makes the delta-variance faster to compute and more robust to oddly-shaped regions of signal.

This technique shares many similarities to the Wavelet transform.

Using

The data in this tutorial are available here.

We need to import the DeltaVariance code, along with a few other common packages:

>>> from turbustat.statistics import DeltaVariance
>>> from astropy.io import fits
>>> import astropy.units as u

Then, we load in the data and the associated error array:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_err = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[1]  # doctest: +SKIP

Next, we initialize the DeltaVariance class:

>>> delvar = DeltaVariance(moment0, weights=moment0_err, distance=250 * u.pc)  # doctest: +SKIP

The weight array is optional but is recommended for down-weighting noisy data (particularly near the map edges). Note that this is not the exact form of the weight array used by Ossenkopf et al. 2008b; they use the square root of the number of elements along the line of sight used to create the integrated intensity map. That choice does not, however, take into account the varying S/N of each element used. For the simulated data here, the two are nearly identical, since the noise value associated with each element is constant. If no weights are given, a uniform array of ones is used.

To compute the delta-variance curve, the image is convolved with a set of kernels. The width of each kernel is referred to as the “lag.” By default, 25 lag values are used, logarithmically spaced between 3 pixels and half of the minimum axis size. Alternative lags can be specified with the lags keyword. If an ndarray is passed, it is assumed to be in pixel units; lags can also be given in angular units as astropy.units.Quantity objects. The ratio between the diameters of the inner and outer convolution kernels is set by diam_ratio. By default, this is 1.5 (Ossenkopf et al. 2008a).
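For example, a custom set of lags (in pixel units, per the note above) could be passed as:

>>> import numpy as np
>>> custom_lags = np.logspace(np.log10(3.), np.log10(32.), 15)  # pixel units
>>> delvar = DeltaVariance(moment0, weights=moment0_err, lags=custom_lags,
...                        distance=250 * u.pc)  # doctest: +SKIP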

The entire process is performed through run:

>>> delvar.run(verbose=True, xunit=u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.946
Model:                            WLS   Adj. R-squared:                  0.943
Method:                 Least Squares   F-statistic:                     400.2
Date:                Wed, 18 Oct 2017   Prob (F-statistic):           4.80e-16
Time:                        18:38:51   Log-Likelihood:                 13.625
No. Observations:                  25   AIC:                            -23.25
Df Residuals:                      23   BIC:                            -20.81
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.6826      0.036     46.701      0.000       1.608       1.757
x1             1.2654      0.063     20.006      0.000       1.135       1.396
==============================================================================
Omnibus:                        0.195   Durbin-Watson:                   0.506
Prob(Omnibus):                  0.907   Jarque-Bera (JB):                0.403
Skew:                          -0.047   Prob(JB):                        0.818
Kurtosis:                       2.385   Cond. No.                         10.6
==============================================================================
_images/delvar_design4.png

xunit is the unit the lags will be converted to in the plot. The plot includes a linear fit to the Delta-variance curve; however, there is a significant deviation from a single power-law on large scales. We can restrict the fitting range to reflect this:

>>> delvar.run(verbose=True, xunit=u.pix, xlow=4 * u.pix, xhigh=30 * u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.994
Model:                            WLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                     2167.
Date:                Wed, 18 Oct 2017   Prob (F-statistic):           9.44e-17
Time:                        18:38:52   Log-Likelihood:                 38.238
No. Observations:                  16   AIC:                            -72.48
Df Residuals:                      14   BIC:                            -70.93
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.8620      0.017    106.799      0.000       1.825       1.899
x1             1.0630      0.023     46.549      0.000       1.014       1.112
==============================================================================
Omnibus:                        0.142   Durbin-Watson:                   0.746
Prob(Omnibus):                  0.931   Jarque-Bera (JB):                0.271
Skew:                          -0.182   Prob(JB):                        0.873
Kurtosis:                       2.475   Cond. No.                         11.4
==============================================================================
_images/delvar_design4_wlimits.png

xlow, xhigh, and xunit can also be given in any angular unit and, since a distance was given, in physical units as well. For example, repeating the previous fit but plotting in physical units:

>>> delvar.run(verbose=True, xunit=u.pc, xlow=4 * u.pix, xhigh=30 * u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.994
Model:                            WLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                     2167.
Date:                Wed, 18 Oct 2017   Prob (F-statistic):           9.44e-17
Time:                        18:38:52   Log-Likelihood:                 38.238
No. Observations:                  16   AIC:                            -72.48
Df Residuals:                      14   BIC:                            -70.93
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.8620      0.017    106.799      0.000       1.825       1.899
x1             1.0630      0.023     46.549      0.000       1.014       1.112
==============================================================================
Omnibus:                        0.142   Durbin-Watson:                   0.746
Prob(Omnibus):                  0.931   Jarque-Bera (JB):                0.271
Skew:                          -0.182   Prob(JB):                        0.873
Kurtosis:                       2.475   Cond. No.                         11.4
==============================================================================
_images/delvar_design4_physunits.png

Since the Delta-variance is based on a series of convolutions, there is a choice for how the boundaries should be treated. This is set by the boundary keyword in run. By default, boundary='wrap', as is appropriate for simulated data in a periodic box. If the data are not periodic in the spatial dimensions, boundary='fill' should be used. This mode pads the edges of the data based on the size of the convolution kernel used.
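For observational (non-periodic) data, this looks like (reusing the fit limits from above):

>>> delvar.run(verbose=True, boundary='fill', xunit=u.pix,
...            xlow=4 * u.pix, xhigh=30 * u.pix)  # doctest: +SKIP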

When an image contains NaNs, there are two important parameters for the convolution: preserve_nan and nan_treatment. Setting preserve_nan=True will reset pixels that were originally NaN back to NaN in the convolved image. This is useful when the image has a border of NaNs. When the edges are not handled correctly, the delta-variance curve will show large spikes at small lag values.

If an image has missing values in its interior, setting nan_treatment='interpolate' will interpolate over the missing regions, providing a smoothed version of the convolved image. However, interpolation may perform poorly when the image has a border of NaNs. In this case, nan_treatment='fill' will fill NaN values with a constant value (the default is \(0.0\)). Since the edge effects can be severe with interpolation, the default setting is nan_treatment='fill'.

If an image has both missing regions and a border of NaNs, manual treatment may be necessary to convert the edges to NaNs while correctly handling the interpolating regions in the interior. See the convolution page on astropy for more information.
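A sketch combining these convolution options is shown below; it assumes preserve_nan and nan_treatment can be passed directly to run, which may differ from the exact entry point in a given TurbuStat version:

>>> delvar.run(verbose=True, boundary='fill', nan_treatment='interpolate',
...            preserve_nan=True, xlow=4 * u.pix, xhigh=30 * u.pix)  # doctest: +SKIP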

Similar to the fitting for other statistics, the Delta-variance curve can be fit with a segmented linear model:

>>> delvar.run(verbose=True, xunit=u.pc, xlow=4 * u.pix, xhigh=40 * u.pix, brk=8 * u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.996
Model:                            WLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                     1168.
Date:                Thu, 19 Oct 2017   Prob (F-statistic):           4.97e-17
Time:                        15:36:23   Log-Likelihood:                 45.438
No. Observations:                  18   AIC:                            -82.88
Df Residuals:                      14   BIC:                            -79.31
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.8454      0.015    121.610      0.000       1.813       1.878
x1             1.0860      0.020     54.133      0.000       1.043       1.129
x2            -1.1586      0.253     -4.585      0.000      -1.701      -0.617
x3            -0.0064      0.042     -0.153      0.881      -0.096       0.083
==============================================================================
Omnibus:                        0.130   Durbin-Watson:                   1.082
Prob(Omnibus):                  0.937   Jarque-Bera (JB):                0.037
Skew:                           0.009   Prob(JB):                        0.982
Kurtosis:                       2.778   Cond. No.                         127.
==============================================================================
_images/delvar_design4_break.png

The range here was chosen to force the model to fit a break near the turn-over, and the result is not great. This is not a realistic example; it is included only to highlight how the segmented model is enabled.

There will now be two slopes and a break point returned:

>>> delvar.slope  # doctest: +SKIP
array([ 1.08598566, -0.07259903])
>>> delvar.brk  # doctest: +SKIP
<Quantity 19.413294229328802 pix>

Warning

The turn-over at large scales (usually larger than half the image size) tends to be dominated by the kernel shape rather than the data. On scales smaller than the beam size, the curve will tend to steepen. This is due to the enhanced correlations from over-sampling the beam, which is standard for radio and submillimetre observational data. See Bensch et al. 2001 for a discussion of how the beam affects the delta-variance curves.

Volker Ossenkopf-Okada’s IDL Delta-Variance code is available here.

Dendrograms
Overview

In general, dendrograms provide a hierarchical description of datasets, which may be used to identify clusters of similar objects or variables. This is known as hierarchical clustering. In the case of position-position-velocity (PPV) cubes, a dendrogram is a hierarchical decomposition of the emission in the cube. This decomposition was introduced by Rosolowsky et al. 2008 and Goodman et al. 2009 to calculate the multiscale properties of molecular gas in nearby clouds. The tree structure is composed of branches and leaves: branches are the connections, while leaves are the tips of the branches.

Burkhart et al. 2013 introduced two statistics for comparing the dendrograms of two cubes: the relationship between the number of leaves and branches in the tree versus the minimum branch length, and a histogram comparison of the peak intensity in each branch or leaf. The former statistic shows a power-law-like turn-off with increasing branch length.

Using

The data in this tutorial are available here.

This statistic requires the optional astrodendro package to be installed. See the astrodendro documentation for installation instructions.

Importing the dendrograms code, along with a few other common packages:

>>> from turbustat.statistics import Dendrogram_Stats
>>> from astropy.io import fits
>>> import astropy.units as u
>>> from astrodendro import Dendrogram  
>>> import matplotlib.pyplot as plt
>>> import numpy as np

And we load in the data:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  

Before running the statistics side, we can first compute the dendrogram itself to see what we’re dealing with:

>>> d = Dendrogram.compute(cube.data, min_value=0.005, min_delta=0.1, min_npix=50, verbose=True)  
>>> ax = plt.subplot(111)  
>>> d.plotter().plot_tree(ax)  
>>> plt.ylabel("Intensity (K)")  
_images/design4_dendrogram.png

We see a number of leaves of varying height throughout the tree. Their minimum height is set by min_delta. As we increase this value, the tree becomes pruned: more and more structure will be merged, leaving only the brightest regions on the tree.

While this tutorial uses a PPV cube, a 2D image may also be used! The same tutorial code can be used for both, with changes needed for the choice of min_delta.

The statistics are computed through Dendrogram_Stats:

>>> dend_stat = Dendrogram_Stats(cube,
...                              min_deltas=np.logspace(-2, 0, 50),
...                              dendro_params={"min_value": 0.005, "min_npix": 50})  

There are two parameters that will change depending on the given data set: (1) min_deltas sets the minimum branch heights, which depend entirely on the range of values within the data, and (2) dendro_params, a dictionary setting other dendrogram parameters, such as the minimum number of pixels a region must contain (min_npix) and the minimum value of the data to consider (min_value). The settings given above are specific to these data and will need to be changed when using other data sets.

To run the statistics, we use run:

>>> dend_stat.run(verbose=True)  
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.962
Model:                            OLS   Adj. R-squared:                  0.960
Method:                 Least Squares   F-statistic:                     825.6
Date:                Mon, 03 Jul 2017   Prob (F-statistic):           6.25e-25
Time:                        15:04:02   Log-Likelihood:                 34.027
No. Observations:                  35   AIC:                            -64.05
Df Residuals:                      33   BIC:                            -60.94
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.4835      0.037     13.152      0.000       0.409       0.558
x1            -1.1105      0.039    -28.733      0.000      -1.189      -1.032
==============================================================================
Omnibus:                        4.273   Durbin-Watson:                   0.287
Prob(Omnibus):                  0.118   Jarque-Bera (JB):                3.794
Skew:                          -0.800   Prob(JB):                        0.150
Kurtosis:                       2.800   Cond. No.                         4.39
==============================================================================
_images/design4_dendrogram_stats.png

On the left is the relationship between the value of min_delta and the number of features in the tree. On the right is a stack of histograms, showing the distribution of peak intensities for all values of min_delta. The results of the linear fit are also printed, where x1 is the slope of the power-law tail.

When using simulated data in a periodic box, structures need to be connected across the map edges. Setting periodic_bounds=True will treat the spatial dimensions as periodic. The simulated data shown here should have periodic_bounds enabled:

>>> dend_stat.run(verbose=True, periodic_bounds=True)  
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.962
Model:                            OLS   Adj. R-squared:                  0.961
Method:                 Least Squares   F-statistic:                     808.6
Date:                Thu, 06 Jul 2017   Prob (F-statistic):           2.77e-24
Time:                        13:30:48   Log-Likelihood:                 33.415
No. Observations:                  34   AIC:                            -62.83
Df Residuals:                      32   BIC:                            -59.78
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.3758      0.039      9.744      0.000       0.297       0.454
x1            -1.1369      0.040    -28.437      0.000      -1.218      -1.055
==============================================================================
Omnibus:                        4.386   Durbin-Watson:                   0.267
Prob(Omnibus):                  0.112   Jarque-Bera (JB):                4.055
Skew:                          -0.823   Prob(JB):                        0.132
Kurtosis:                       2.611   Cond. No.                         4.60
==============================================================================
_images/design4_dendrogram_stats_periodic.png

The results have slightly changed. The left panel shows fewer features at nearly every value of \(\delta\) as regions along the edges are connected across the boundaries.

Creating the initial dendrogram is the most time-consuming step. To check the progress of building the dendrogram, dendro_verbose=True can be set in the previous call to give a progress bar and time-to-completion estimate.

Computing dendrograms can be time-consuming when working with large datasets. We can avoid recomputing a dendrogram by loading from an HDF5 file:

>>> dend_stat = Dendrogram_Stats.load_dendrogram("design4_dendrogram.hdf5",
...                                              min_deltas=np.logspace(-2, 0, 50))  

Saving the dendrogram structure is explained in the astrodendro documentation. The saved dendrogram must have min_delta set to the minimum of the given min_deltas; otherwise the pruning will be ineffective.

If the dendrogram is stored in a variable (say you have just run it in the same terminal), you may pass the computed dendrogram into run:

>>> d = Dendrogram.compute(cube.data, min_value=0.005, min_delta=0.01, min_npix=50, verbose=True)  
>>> dend_stat = Dendrogram_Stats(cube, min_deltas=np.logspace(-2, 0, 50))  
>>> dend_stat.run(verbose=True, dendro_obj=d)  

Once the statistics have been run, the results can be saved as a pickle file:

>>> dend_stat.save_results(output_name="Design4_Dendrogram_Stats.pkl", keep_data=False)  

keep_data=False will avoid saving the entire cube and is the default setting.

Saving can also be enabled with run:

>>> dend_stat.run(save_results=True, output_name="Design4_Dendrogram_Stats.pkl")  

The results may then be reloaded:

>>> dend_stat = Dendrogram_Stats.load_results("Design4_Dendrogram_Stats.pkl")  

Note that the dendrogram and data are NOT saved, and only the statistic outputs will be accessible.

Genus
Overview

Genus statistics provide a measure of a region’s topology. At a given value in the data, the Genus value is the number of discrete regions above the value minus the number of regions below it. When this process is repeated over a range of values, a Genus curve can be constructed. The technique has previously been used to study CMB deviations from a Gaussian distribution.

If a region has a negative Genus statistic, it is dominated by holes in the emission (a “swiss cheese” morphology). A positive Genus value implies a “meatball” morphology, where the emission is localized into clumps. The Genus curve of a Gaussian field is shown below. Note that at the mean value (0.0), the Genus value is zero: at the mean intensity, there is no preference for either morphological type.

_images/genus_random.png

Kowal et al. 2007 constructed Genus curves for a set of simulations to investigate the effect of changing the Mach number and the Alfvenic Mach number. The isocontours were taken over a range of density values in the full position-position-position space. Applications to observations include Chepurnov et al. 2008 and Burkhart et al. 2012.

Using

The data in this tutorial are available here.

We need to import the Genus code, along with a few other common packages:

>>> from turbustat.statistics import Genus
>>> from astropy.io import fits
>>> import astropy.units as u
>>> import numpy as np

And we load in the data:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

The FITS HDU is passed to initialize Genus:

>>> genus = Genus(moment0, lowdens_percent=15, highdens_percent=85, numpts=100,
...               smoothing_radii=np.linspace(1, moment0.shape[0] / 10., 5))  # doctest: +SKIP

lowdens_percent and highdens_percent set the lower and upper percentiles of the data between which the Genus values are measured. When using observational data, lowdens_percent should be set above the noise level. Alternatively, specific values for the low and high cut-offs can be passed using min_value and max_value, respectively. Note that min_value and max_value are overridden when the value at lowdens_percent is larger than min_value or the value at highdens_percent is smaller than max_value.

The numpts parameter sets how many Genus values to compute between the given percentiles. Finally, smoothing_radii allows for the data to be smoothed, minimizing the influence of noise on the Genus curve at the expense of resolution. The values given are used as the radii of a Gaussian smoothing kernel. The values given above (np.linspace(1, moment0.shape[0] / 10., 5)) are used by default when no values are given.

Computing the curves is accomplished using run:

>>> genus.run(verbose=True, min_size=4)  # doctest: +SKIP
_images/genus_design4.png

If min_value and max_value are set instead:

>>> genus = Genus(moment0, min_value=137, max_value=353, numpts=100)  # doctest: +SKIP
>>> genus.run(verbose=True, min_size=4)  # doctest: +SKIP
_images/genus_design4_minmaxval.png

Here, min_value and max_value were set to the values at the same percentiles used above, so we get the same result.

The basic sinusoid seen in the Genus curve of the Gaussian field is still evident. As we smooth the data on larger scales, the topological information is lost, and the curve becomes degraded. To avoid spurious noise features, the minimum area a region must have to be considered is set by min_size. This is simulated data, so a small value has been chosen.

Often the smallest size that can be “trusted” in a radio image is the beam area. In this example, a FITS HDU was passed, including an associated header. If the beam information is contained in the header, the size threshold can be set to the beam area using use_beam=True:

>>> moment0.header["BMAJ"] = 2e-5  # deg.   # doctest: +SKIP
>>> genus = Genus(moment0, lowdens_percent=15, highdens_percent=85,
...               smoothing_radii=[1] * u.pix)  # doctest: +SKIP
>>> genus.run(verbose=True, use_beam=True)  # doctest: +SKIP
_images/genus_design4_beamarea.png

The curve has far less detail than in the earlier example because of the new requirement for large, connected regions. Note that the FITS keywords “BMIN” and “BPA” are also read and used when available. More options for reading beam information are available when the optional package radio_beam is installed. If the beam information is not contained in the header, or you wish to use any other minimum area, the size can be passed using min_size. To get the same result as the last example:

>>> genus.run(verbose=True, use_beam=True, min_size=2e-5**2 * np.pi * u.deg**2)  # doctest: +SKIP

If a distance is given to Genus, areas and smoothing radii can be passed in physical units:

>>> genus = Genus(moment0, lowdens_percent=15, highdens_percent=85,
...               smoothing_radii=u.Quantity([0.04 * u.pc]), distance=500 * u.pc)  # doctest: +SKIP
>>> genus.run(verbose=True, min_size=40 * u.AU**2)  # doctest: +SKIP
_images/genus_design4_physunits.png

Note that the smoothing size shown in the plots is always the smoothing radius in pixels.

Modified Velocity Centroids
Overview

Centroid statistics have been used to study molecular clouds for decades. For example, Miesch & Bally 1994 created structure functions of the centroid surfaces from CO data in a number of nearby clouds. The slope of the structure function is one way to measure the size-line width relation of a region. The normalized centroids take the form

\[M_1 = \frac{\Sigma_{v}\, v \,I(x, v)}{\Sigma_{v}\, I(x, v)} = \frac{\Sigma_{v}\, v\, I(x, v)}{M_0},\]

where \(I(x, v)\) is a PPV cube with \(x\) representing the spatial coordinate, \(v\) the velocity coordinate, and \(M_0\) the integrated intensity (zeroth moment). These centroids are intuitive, since the first moment is simply the velocity weighted by the intensity. On small scales, however, the contribution from density fluctuations can dominate, contaminating the centroids on those scales. Lazarian & Esquivel 2003 proposed Modified Velocity Centroids (MVC) as a technique to remove the small-scale density contamination. MVC uses the unnormalized centroid

\[\Sigma_{v}\, v I(x, v)\]

The structure function of the modified velocity centroid is then the squared difference of the unnormalized centroid, with the squared difference of \(M_0\) times the velocity dispersion (\(<v^2>\)) subtracted to remove the density contribution. This is both easier to express and to compute in the Fourier domain, where it yields a two-dimensional power spectrum:

\[P_2(k) = |\mathcal{M}_0\,\mathcal{M}_1|^2 - <M_2>_{x}\,|\mathcal{M}_0|^2,\]

where \(\mathcal{M}_i\) denotes the Fourier transform of the ith moment. MVC is also explored in Esquivel & Lazarian 2005.

Using

The data in this tutorial are available here.

We need to import the MVC code, along with a few other common packages:

>>> from turbustat.statistics import MVC
>>> from astropy.io import fits

Most statistics in TurbuStat require only a single data input. MVC requires three, as you can see in the last equation. The zeroth (integrated intensity), first (centroid), and second (velocity dispersion) moments of the data cube are needed:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment1 = fits.open("Design4_flatrho_0021_00_radmc_centroid.fits")[0]  # doctest: +SKIP
>>> lwidth = fits.open("Design4_flatrho_0021_00_radmc_linewidth.fits")[0]  # doctest: +SKIP

The unnormalized centroid can be recovered by multiplying the normal centroid value by the zeroth moment. The line width array here is the square root of the velocity dispersion. These three arrays must be passed to MVC:

>>> mvc = MVC(moment1, moment0, lwidth)  # doctest: +SKIP

The header is read in from moment1 to convert into angular scales. Alternatively, a different header can be given with the header keyword.

Calculating the power spectrum, radially averaging, and fitting a power-law are accomplished through the run command:

>>> mvc.run(verbose=True, xunit=u.pix**-1)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.941
Model:                            OLS   Adj. R-squared:                  0.941
Method:                 Least Squares   F-statistic:                     1425.
Date:                Mon, 10 Jul 2017   Prob (F-statistic):           1.46e-56
Time:                        16:34:01   Log-Likelihood:                -52.840
No. Observations:                  91   AIC:                             109.7
Df Residuals:                      89   BIC:                             114.7
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         14.0317      0.103    136.541      0.000      13.827      14.236
x1            -5.0142      0.133    -37.755      0.000      -5.278      -4.750
==============================================================================
Omnibus:                        3.535   Durbin-Watson:                   0.129
Prob(Omnibus):                  0.171   Jarque-Bera (JB):                3.484
Skew:                          -0.468   Prob(JB):                        0.175
Kurtosis:                       2.796   Cond. No.                         4.40
==============================================================================
_images/mvc_design4.png

The code returns a summary of the one-dimensional fit and a figure showing the one-dimensional spectrum and model on the left, and the two-dimensional power-spectrum on the right. If fit_2D=True is set in run (the default setting), the contours on the two-dimensional power-spectrum are the fit using an elliptical power-law model. We will discuss the models in more detail below. The dashed red lines (or contours) on both plots are the limits of the data used in the fits. See the PowerSpectrum tutorial for a discussion of the two-dimensional fitting.

The fit here is not very good since the spectrum deviates from a single power-law on small scales. In this case, the deviation is caused by the limited inertial range in the simulation from which this spectral-line data cube was created. We can specify low_cut and high_cut in frequency units to limit the fitting to the inertial range:

>>> mvc.run(verbose=True, xunit=u.pix**-1, low_cut=0.02 / u.pix, high_cut=0.1 / u.pix)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.952
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     255.9
Date:                Mon, 10 Jul 2017   Prob (F-statistic):           6.22e-10
Time:                        16:34:01   Log-Likelihood:                 10.465
No. Observations:                  15   AIC:                            -16.93
Df Residuals:                      13   BIC:                            -15.51
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         16.7121      0.220     75.957      0.000      16.237      17.187
x1            -2.7357      0.171    -15.997      0.000      -3.105      -2.366
==============================================================================
Omnibus:                        0.814   Durbin-Watson:                   2.077
Prob(Omnibus):                  0.666   Jarque-Bera (JB):                0.614
Skew:                          -0.445   Prob(JB):                        0.736
Kurtosis:                       2.564   Cond. No.                         13.5
==============================================================================
_images/mvc_design4_limitedfreq.png

Note the drastic change in the slope! Specifying the correct fit region for the data you are using is critical for interpreting the results of the method. This example has used the default ordinary least-squares fitting. A weighted least-squares can be enabled with weighted_fit=True (this cannot be used for the segmented model described below).

Breaks in the power-law behaviour in observations (and higher-resolution simulations) can result from differences in the physical processes dominating at those scales. To capture this behaviour, MVC can be passed a break point to enable fitting with a segmented linear model (Lm_Seg). Note that the 2D fitting is disabled for this section as it does not handle fitting break points. From the above plot, we can estimate the break point to be near 0.1 / u.pix:

>>> mvc.run(verbose=True, xunit=u.pix**-1, low_cut=0.02 / u.pix,
...         high_cut=0.4 / u.pix,
...         fit_kwargs=dict(brk=0.1 / u.pix), fit_2D=False)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.994
Model:                            OLS   Adj. R-squared:                  0.994
Method:                 Least Squares   F-statistic:                     4023.
Date:                Mon, 10 Jul 2017   Prob (F-statistic):           1.50e-75
Time:                        16:41:34   Log-Likelihood:                 53.269
No. Observations:                  71   AIC:                            -98.54
Df Residuals:                      67   BIC:                            -89.49
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         16.1749      0.094    172.949      0.000      15.988      16.362
x1            -3.1436      0.085    -36.870      0.000      -3.314      -2.973
x2            -5.0895      0.205    -24.855      0.000      -5.498      -4.681
x3            -0.0020      0.054     -0.037      0.970      -0.110       0.106
==============================================================================
Omnibus:                        9.161   Durbin-Watson:                   1.074
Prob(Omnibus):                  0.010   Jarque-Bera (JB):                8.815
Skew:                          -0.747   Prob(JB):                       0.0122
Kurtosis:                       3.865   Cond. No.                         21.5
==============================================================================
_images/mvc_design4_breakfit.png

brk is the initial guess at where the break point is. Here I’ve set it to near the extent of the inertial range of the simulation. log_break should be enabled if the given brk is already the log (base-10) value (since the fitting is done in log-space). The segmented linear model iteratively optimizes the location of the break point, trying to minimize the gap between the different components. This gap is the x3 parameter above. The slopes of the components are x1 and x2, but the second slope is defined relative to the first slope (i.e., if x2=0, the slopes of the components would be the same). The true slopes can be accessed through mvc.slope and mvc.slope_err. The location of the fitted break point is given by mvc.brk, and its uncertainty by mvc.brk_err. If the fit does not find a good break point, it will revert to a linear fit without the break.
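
For example, the fitted values named above can be inspected directly after running the segmented fit:

>>> mvc.slope, mvc.slope_err  # doctest: +SKIP
>>> mvc.brk, mvc.brk_err  # doctest: +SKIP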

Many of the techniques in TurbuStat are derived from two-dimensional power spectra. Because of this, the radial averaging and fitting code for these techniques are contained within a common base class, StatisticBase_PSpec2D. Fitting options may be passed as keyword arguments to run. Alterations to the power-spectrum binning can be passed in compute_radial_pspec, after which the fitting routine (fit_pspec) may be run.
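
A minimal sketch of running these steps separately might look like the following; the exact keyword arguments are an assumption, though fit_pspec is assumed here to accept the same low_cut and high_cut limits used with run:

>>> mvc.compute_pspec()  # doctest: +SKIP
>>> mvc.compute_radial_pspec()  # doctest: +SKIP
>>> mvc.fit_pspec(low_cut=0.02 / u.pix, high_cut=0.1 / u.pix)  # doctest: +SKIP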

The frequency units of the final plot (xunit) and the units of low_cut and high_cut can be given in angular units, as well as physical units when a distance is given. For example:

>>> mvc = MVC(centroid, moment0, lwidth, distance=250 * u.pc)  # doctest: +SKIP
>>> mvc.run(verbose=True, xunit=u.pc**-1, low_cut=0.02 / u.pix,
...         high_cut=0.1 / u.pix, fit_2D=False)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.952
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     255.9
Date:                Sun, 16 Jul 2017   Prob (F-statistic):           6.22e-10
Time:                        14:18:45   Log-Likelihood:                 10.465
No. Observations:                  15   AIC:                            -16.93
Df Residuals:                      13   BIC:                            -15.51
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         16.7121      0.220     75.957      0.000      16.237      17.187
x1            -2.7357      0.171    -15.997      0.000      -3.105      -2.366
==============================================================================
Omnibus:                        0.814   Durbin-Watson:                   2.077
Prob(Omnibus):                  0.666   Jarque-Bera (JB):                0.614
Skew:                          -0.445   Prob(JB):                        0.736
Kurtosis:                       2.564   Cond. No.                         13.5
==============================================================================
_images/mvc_design4_physunits.png

Alternatively, the fitting limits could be passed in units of u.pc**-1.
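
For example (the limits below are hypothetical placeholders; the appropriate values depend on the pixel scale and the 250 pc distance assumed above):

>>> mvc.run(verbose=True, xunit=u.pc**-1, low_cut=0.1 / u.pc,
...         high_cut=0.5 / u.pc, fit_2D=False)  # doctest: +SKIP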

Constraints on the azimuthal angles used to compute the one-dimensional power-spectrum can also be given:

>>> mvc = MVC(centroid, moment0, lwidth, distance=250 * u.pc)  # doctest: +SKIP
>>> mvc.run(verbose=True, xunit=u.pc**-1, low_cut=0.02 / u.pix, high_cut=0.1 / u.pix,
...         fit_2D=False,
...         radial_pspec_kwargs={"theta_0": 1.13 * u.rad, "delta_theta": 40 * u.deg})  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.806
Model:                            OLS   Adj. R-squared:                  0.791
Method:                 Least Squares   F-statistic:                     53.85
Date:                Fri, 29 Sep 2017   Prob (F-statistic):           5.68e-06
Time:                        14:51:27   Log-Likelihood:                 1.4445
No. Observations:                  15   AIC:                             1.111
Df Residuals:                      13   BIC:                             2.527
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         17.3709      0.401     43.271      0.000      16.504      18.238
x1            -2.2897      0.312     -7.338      0.000      -2.964      -1.616
==============================================================================
Omnibus:                        1.198   Durbin-Watson:                   2.743
Prob(Omnibus):                  0.549   Jarque-Bera (JB):                0.809
Skew:                          -0.185   Prob(JB):                        0.667
Kurtosis:                       1.924   Cond. No.                         13.5
==============================================================================
_images/mvc_design4_physunits_azimlimits.png

The azimuthal limits now appear as contours on the two-dimensional power-spectrum in the figure. See the PowerSpectrum tutorial for more information on giving azimuthal constraints.

If strong emission continues to the edge of the map (and the map does not have periodic boundaries), ringing in the FFT can introduce a cross pattern in the 2D power-spectrum. This effect and the use of apodizing kernels to taper the data is covered here.

Most observational data will be smoothed over the beam size, which will steepen the power spectrum on small scales. To account for this, the 2D power spectrum can be divided by the beam response. This is demonstrated here for spatial power-spectra.

PCA
Overview

Principal Component Analysis (PCA) is primarily a dimensionality reduction technique. Generally the data are arranged into a set of columns (representing dimensions or variables) and the set of samples is contained within each row. A covariance matrix is then constructed between each pair of columns. Performing an eigenvalue decomposition of this matrix gives an orthogonal basis for the data, the components of which are the principal components (eigenvectors). The associated eigenvalues correspond to the variance in the data described by each principal component.

By ordering the principal components from the largest to the smallest eigenvalue, a minimal set of eigenvectors can be found that accounts for a large portion of the variance within the data. Projecting the data onto these first N principal components gives a representation with (usually) much reduced dimensionality, while still containing the majority of the structure in the data. The PCA Wikipedia page has a much more thorough explanation.
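
As a minimal, generic sketch of this procedure (independent of TurbuStat, with hypothetical data):

>>> import numpy as np
>>> X = np.random.randn(1000, 5)  # hypothetical data: 1000 samples of 5 variables
>>> cov = np.cov(X, rowvar=False)  # covariance between each pair of columns
>>> eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition of the covariance matrix
>>> order = np.argsort(eigvals)[::-1]  # order components from largest to smallest variance
>>> eigvals, eigvecs = eigvals[order], eigvecs[:, order]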

The use of PCA on spectral-line data cubes was introduced by Heyer & Schloerb 1997, and thoroughly extended in a number of other papers (e.g., Brunt & Heyer 2002a, Brunt & Heyer 2002b, Roman-Duval et al. 2011). An analytical derivation is given in Brunt & Heyer 2013. Briefly, they use the PCA decomposition to measure the spectral and spatial width scales associated with each principal component. Each eigenvector can be multiplied with the spectral channels of the cube, and summed over the spectral axis, to produce an eigenimage. The autocorrelation function of that eigenimage gives an estimate of the spatial scale of that principal component. The autocorrelation of the eigenvector itself gives an associated spectral width. Using the spatial and spectral widths from the first N principal components gives an estimate of the size-line width relation.

Using

The data in this tutorial are available here.

We need to import the PCA code, along with a few other common packages:

>>> from turbustat.statistics import PCA
>>> from astropy.io import fits
>>> import astropy.units as u
>>> import numpy as np
>>> import matplotlib.pyplot as plt

And we load in the data:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

The PCA class is first initialized, and the distance to the region (if desired) can be given:

>>> pca = PCA(cube, distance=250. * u.pc)  # doctest: +SKIP

If the distance is given, you will have the option to convert spatial widths to physical units. Note that we’re using simulated data, and the distance of 250 pc has no special meaning in this case. If no distance is provided, conversions to physical units will raise an error.

The simplest way to run the entire process is using the run command:

>>> pca.run(verbose=True, min_eigval=1e-4, spatial_output_unit=u.pc,
...         spectral_output_unit=u.m / u.s, brunt_beamcorrect=False)  # doctest: +SKIP
Proportion of Variance kept: 0.9996934513444623
Index: 0.72 (0.70, 0.73)
Gamma: 0.90 (0.55, 1.26)
Sonic length: 4.419e+00 (3.306e+00, 5.532e+00) pix at 10.0 K
_images/pca_design4_default.png

Note that we have specified the output units for the spectral and spatial units. By default, these would be kept in pixel units. The key properties are shown when verbose=True: a summary of the results with a plot of the covariance matrix (top left), the variance described by the principal components (bottom left) and the size-line width relation (right). The proportion of variance is the variance contained in the N eigenvalues kept. In this case, we consider all eigenvalues with values above 1e-4 to be important. In observational data, min_eigval should be set to the variance of the noise (square of the rms uncertainty). Typically, only the first \(\sim10\) eigenvalues contain signal in observational data.
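
For observational data, a sketch of this could be the following (the channel range used for the noise estimate is a hypothetical choice of signal-free channels):

>>> import numpy as np
>>> noise_var = np.nanstd(cube.data[:10])**2  # doctest: +SKIP
>>> pca.run(verbose=True, min_eigval=3 * noise_var, spatial_output_unit=u.pc,
...         spectral_output_unit=u.m / u.s, brunt_beamcorrect=False)  # doctest: +SKIP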

In the output above, index is the slope of the size-line width relation, and gamma is the slope with a correction factor applied (see gamma). The sonic length is derived from the intercept of the size-line width relation using a default temperature of 10 K (see below on how to change this).

Since these data are simulated, this example does not account for a finite beam size. If it did, however, we would want to deconvolve the spatial widths with the beam. To see this effect, let us assume these data have a 10” circular beam:

>>> pca.run(verbose=True, min_eigval=1e-4, spatial_output_unit=u.pc,
...         spectral_output_unit=u.m / u.s, brunt_beamcorrect=True,
...         beam_fwhm=10 * u.arcsec)  # doctest: +SKIP
Proportion of Variance kept: 0.9996934513444623
Index: 0.71 (0.70, 0.73)
Gamma: 0.89 (0.54, 1.25)
Sonic length: 4.394e+00 (3.284e+00, 5.505e+00) pix at 10.0 K
_images/pca_design4_beamcorr.png

Since the correction is not linear, the slope changes with the beam correction. If the header of the data has the beam information defined, it will be automatically read in and beam_fwhm will not have to be given.

Both of the PCA runs above do not subtract the mean of the data before creating the covariance matrix. Technically, this is not how PCA is defined (see Overview above) and the decomposition is not performed on a true covariance matrix. The justification used in Brunt & Heyer 2002a and Brunt & Heyer 2002b is that the mean has a physical meaning in this case: it’s the largest spatial scale across the map. If we do subtract the mean, how does this affect the index?

>>> pca.run(verbose=True, min_eigval=1e-4, spatial_output_unit=u.pc,
...         spectral_output_unit=u.m / u.s, brunt_beamcorrect=True,
...         beam_fwhm=10 * u.arcsec, mean_sub=True)  # doctest: +SKIP
Proportion of Variance kept: 0.9998085325029037
Index: 0.86 (0.83, 0.89)
Gamma: 1.12 (0.72, 1.53)
Sonic length: 3.947e+00 (3.107e+00, 4.787e+00) pix at 10.0 K
_images/pca_design4_beamcorr.png

The plot shows how the structure of the covariance matrix has changed. There remains a central peak, though it is smaller, and the positive structure around it is more elongated. The bar plot shows that the relative values of the eigenvalues have changed significantly; this intuitively makes sense as the covariance structure was changed. The index measured is significantly higher than the 0.71 measured above. If we compare the points on the size-line width relation, we see that the steeper relation results from the spectral width remaining the same as in the non-mean-subtracted case, while the spatial size is decreased.

The default setting is to not subtract the mean in order to best reproduce the established Brunt & Heyer formalism. This comparison is included to demonstrate its effect and to highlight that, in not subtracting the mean, some of the assumptions used in PCA are violated. See the PCA Wikipedia page for more information.

The run command has several steps hidden within it. To demonstrate the whole process, the individual steps are broken down below. There are 4 major steps: decomposition, spatial fitting, spectral fitting, and fitting of the size-line width relation.

First, the eigenvalue decomposition is performed using compute_pca:

>>> pca.compute_pca(mean_sub=False, n_eigs='auto', min_eigval=1e-4, eigen_cut_method='value')  # doctest: +SKIP
>>> pca.n_eigs  # doctest: +SKIP
10

mean_sub controls whether to subtract the channel means when calculating the covariance matrix. Formally, this is implied when calculating any covariance matrix, but is not done in the Brunt & Heyer works (see above). n_eigs sets the number of important principal components (which will be used to fit the size-line width relation). This can be an integer, or the code will determine the number of important components based on a threshold given in min_eigval. When eigen_cut_method='value', min_eigval is the smallest eigenvalue to consider important. Since the smallest eigenvalues are dominated by noise in the data, it is practical to set this threshold to a few times the noise variance. When eigen_cut_method='proportion', min_eigval instead corresponds to the total proportion of variance that is considered important:

>>> pca.compute_pca(mean_sub=False, n_eigs='auto', min_eigval=0.99, eigen_cut_method='proportion')  # doctest: +SKIP
>>> pca.n_eigs  # doctest: +SKIP
4

This will keep the number of components that describe 99% of the variance in the data. The percentage of variance described by a principal component is its eigenvalue divided by the sum of all eigenvalues (the total variance in the data). All other components beyond these levels are due to irreducible noise. These noise components can be thought of as an N-dimensional sphere, where it becomes impossible to diminish the remaining variance as there is no preferred direction.
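
For example, the proportion of variance kept can be computed directly from the eigenvalues (a small sketch using the eigvals attribute introduced below):

>>> var_frac = pca.eigvals / pca.eigvals.sum()  # doctest: +SKIP
>>> var_frac[:pca.n_eigs].sum()  # doctest: +SKIP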

The eigenvalues of the important components can be generated with:

>>> pca.eigvals  # doctest: +SKIP

This will return the full set of eigenvalues as a two-dimensional array with a shape equal to the number of spectral channels in the data. To only return the important eigenvalues, use:

>>> pca.eigvals[:, :pca.n_eigs]  # doctest: +SKIP

Eigenimages have the same shape as the spatial dimensions of the data. To save memory, eigenimages are not cached and are calculated on the fly from the data and the set of eigenvectors:

>>> eigimgs = pca.eigimages()  # doctest: +SKIP

eigimgs is a three-dimensional array, with two spatial axes and a third axis running over the number of eigenimages requested. By default, the number of eigenimages returned is equal to pca.n_eigs. To instead return the first n eigenimages:

>>> n = 40
>>> eigimgs = pca.eigimages(n)  # doctest: +SKIP

The second step is to calculate the spatial size scales from the autocorrelation of the eigenimages (reverting back to the PCs from eigen_cut_method='value'):

>>> pca.compute_pca(mean_sub=False, n_eigs='auto', min_eigval=1e-4,
...                 eigen_cut_method='value')  # doctest: +SKIP
>>> pca.find_spatial_widths(method='contour', beam_fwhm=10 * u.arcsec,
...                         brunt_beamcorrect=True, diagnosticplots=True)  # doctest: +SKIP
_images/pca_autocorrimgs_contourfit_Design4.png

This will find the spatial widths by fitting an ellipse to the 1/e contour about the peak in the autocorrelation image, following the fitting technique described by Brunt & Heyer. The first 9 autocorrelation images are shown in the above image, where the cyan contours are the true 1/e contour, and the green dashed line is the elliptical fit. Note that the first autocorrelation image is not shown. This is because the fitting routine failed; if the 1/e level is not reached in the data, there is no contour to fit to. This means that the largest spatial scale in the data (which critically depends on the mean) is larger than the spatial size of the data. Since these example data come from a periodic-box simulation, it is not surprising that this occurs. Note: If this issue is encountered in observational data (or anything without periodic boundaries), try padding the data cube in the spatial directions with zeros to simulate a larger map size.

method may also be set to fit, which fits a 2D Gaussian to the peak; interpolate, which estimates the 1/e level from the peak using a fine grid about the peak region; or xinterpolate, which first fits a 2D Gaussian to better determine the fine grid to use in the interpolation. The default method is contour.

When beam correction is applied (brunt_beamcorrect), the angular FWHM of the beam is needed to deconvolve the spatial widths from the beam size. Note that all spatial scales that cannot be deconvolved from the beam will be set to NaN. If the BMAJ keyword is set in the FITS header of cube, it will be read automatically (and, if the radio_beam package is installed, a few other keywords will be recognized as well). Otherwise, the FWHM must be specified with beam_fwhm. If the data do not have a beam size, brunt_beamcorrect=False will need to be specified in find_spatial_widths and run, as in the example below.
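
For example, if the data have no beam, the correction can simply be turned off:

>>> pca.find_spatial_widths(method='contour', brunt_beamcorrect=False)  # doctest: +SKIP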

Third, we find the spectral widths:

>>> pca.find_spectral_widths(method='walk-down')  # doctest: +SKIP
>>> autocorr_spec = pca.autocorr_spec()  # doctest: +SKIP
>>> x = np.fft.rfftfreq(500) * 500  # doctest: +SKIP
>>> fig, axes = plt.subplots(3, 3, sharex=True, sharey=True)  # doctest: +SKIP
>>> for i, ax in zip(range(9), axes.ravel()):  # doctest: +SKIP
...     ax.plot(x, autocorr_spec[:251, i])
...     ax.axhline(np.exp(-1), label='exp(-1)', color='r', linestyle='--')
...     ax.axvline(pca.spectral_width(u.pix)[i].value,
...                label='Fitted Width', color='g', linestyle='-.')
...     ax.set_title("{}".format(i + 1))
...     ax.set_xlim([0, 50])
...     if i == 0:
...         ax.legend()
_images/pca_autocorrspec_Design4.png

The above image shows the first 50 channels of the first 9 autocorrelation spectra (the data cube has 500 channels in total, but this is the region of interest). The local minima referred to in the next paragraph are the first minimum points in each of the spectra.

There are three methods available to estimate the spectral widths of the autocorrelation spectra. walk-down starts from the peak and continues until the 1/e level is reached. The width is estimated by averaging the points before and after this level is reached. This is the method used by Brunt & Heyer. Otherwise, method may be set to fit, which fits a Gaussian to the data before the first local minimum occurs, or interpolate, which does the same but interpolates onto a finer grid first. As shown in the above figure, the number of oscillations in the autocorrelation spectrum increases with the order of the principal component. The width of interest is determined from the first peak to the first minimum.
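
For example, to use the Gaussian fit instead of the default walk-down method:

>>> pca.find_spectral_widths(method='fit')  # doctest: +SKIP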

Note: If your input data have few spectral channels, it may be necessary to pad additional channels of zeros onto the data. Otherwise the 1/e level may not be reached. This should not have a significant effect on the results, as the added eigenvalues of these channels will be zero and should not be considered.

Finally, we fit the size-line width relation. There is no clear independent variable to fit, and there are significant errors in both dimensions that must be taken into account. This is the error-in-variables problem, and an excellent explanation is provided in Hogg, Bovy & Lang 2010. The Brunt & Heyer works have used the bisector method. In TurbuStat, two fitting methods are available: Orthogonal Distance Regression (ODR) and a Markov Chain Monte Carlo (MCMC) method. In practice, both methods are doing the same thing, but the MCMC provides a direct sampling (assuming uniform priors). The MCMC method requires the emcee package to be installed.

To run ODR:

>>> pca.fit_plaw(fit_method='odr', verbose=True)  # doctest: +SKIP
_images/pca_design4_plaw_odr.png

And to run the MCMC:

>>> pca.fit_plaw(fit_method='bayes', verbose=True)  # doctest: +SKIP
_images/pca_design4_plaw_mcmc.png

Additional arguments for setting the chain properties can be passed as well. See documentation for bayes_linear. The verbose mode shows the fit results along with the data points.

The interesting outputs from this analysis are estimates of the slopes of the size-line width relation (\(\gamma\)) and the sonic length:

>>> pca.gamma  # doctest: +SKIP
0.897
>>> pca.sonic_length(T_k=10 * u.K, mu=1.36, unit=u.pc)  # doctest: +SKIP
(<Quantity 0.10021154 pc>, <Quantity [0.09024886, 0.111332  ] pc>)

The sonic length is defined as the scale at which the fitted line intersects the sound speed (the temperature can be specified with T_k above). Since the sonic length depends on temperature and \(\mu\), this is a function and not a property like \(\gamma\). PCA.sonic_length also returns the 1-sigma error bounds. The error bounds on \(\gamma\) can be accessed with PCA.gamma_error_range.

PDF
Overview

A common technique used in ISM and molecular cloud studies is measurement of the shape of the probability density function (PDF). Often, column density or extinction values are used to construct the PDF. Intensities may also be used, but may be subject to more severe optical depth effects. Properties of the PDF, when related to an analytical form, have been found to correlate with changes in the turbulent properties (e.g., Kowal et al. 2007, Federrath et al. 2010) and gravity (e.g., Burkhart et al. 2015, Burkhart et al. 2017).

A plethora of papers are devoted to this topic, and there is much debate over the form of these PDFs (Lombardi et al. 2015). TurbuStat’s implementation seeks to be flexible because of this. Parametric and non-parametric measures to describe PDFs are shown below.

Using

The data in this tutorial are available here.

We need to import the PDF code, along with a few other common packages:

>>> from turbustat.statistics import PDF
>>> from astropy.io import fits

Since the PDF is a one-dimensional view of the data, data of any dimension can be passed. For this tutorial, we will use the zeroth moment (integrated intensity):

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> pdf_mom0 = PDF(moment0, min_val=0.0, bins=None)  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True)  # doctest: +SKIP
Optimization terminated successfully.
         Current function value: 6.007851
         Iterations: 34
         Function evaluations: 69
                              Likelihood Results
==============================================================================
Dep. Variable:                      y   Log-Likelihood:                -98433.
Model:                     Likelihood   AIC:                         1.969e+05
Method:            Maximum Likelihood   BIC:                         1.969e+05
Date:                Sun, 16 Jul 2017
Time:                        15:06:22
No. Observations:               16384
Df Residuals:                   16382
Df Model:                           2
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
par0           0.4360      0.002    181.019      0.000       0.431       0.441
par1         225.6771      0.769    293.602      0.000     224.171     227.184
==============================================================================
_images/pdf_design4_mom0.png

The resulting PDF and ECDF of the data are plotted, along with a log-normal fit to the PDF. The ECDF is the empirical cumulative distribution function, which gives the fraction of data values at or below a given value. By default, PDF will fit a log-normal to the PDF. Choosing other models and disabling the fitting are discussed below.

The fit parameters can be accessed from model_params and the standard errors from model_stderrs. The fitting in PDF uses the scipy.stats distributions. For the scipy log-normal implementation, the two fit parameters correspond to the width par0 = sigma and the scale par1 = exp(mu).
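
For example:

>>> pdf_mom0.model_params  # doctest: +SKIP
>>> pdf_mom0.model_stderrs  # doctest: +SKIP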

There are several options that can be set. Using min_val will set the lower limit on values to consider in the PDF. bins allows a custom array of bins (edges) to be given. By default, the bins are of equal width, with the number set by the square root of the number of data points (a good estimate when the number of samples is >100).
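
For example, a custom set of bin edges could be passed (the values here are hypothetical):

>>> import numpy as np
>>> custom_bins = np.linspace(0., 500., 50)  # hypothetical bin edges
>>> pdf_mom0 = PDF(moment0, min_val=0.0, bins=custom_bins)  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True)  # doctest: +SKIP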

With the ECDF calculated, we can check the percentile of different values in the data:

>>> pdf_mom0.find_percentile(500)  # doctest: +SKIP
96.3134765625

Or if we want to find the value at a certain percentile:

>>> pdf_mom0.find_at_percentile(96.3134765625)  # doctest: +SKIP
499.98911349468841

The values are not exact between the two operations because the ECDF is computed from a finite set of data values. Note that brightness unit conversions are not yet supported and data values should be passed as floats.

If an array of the uncertainties is available, these may be passed as weights:

>>> moment0_err = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[1]  # doctest: +SKIP
>>> pdf_mom0 = PDF(moment0, min_val=0.0, bins=None, weights=moment0_err.data**-2)  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True)  # doctest: +SKIP
Optimization terminated successfully.
         Current function value: 8.470342
         Iterations: 43
         Function evaluations: 86
                              Likelihood Results
==============================================================================
Dep. Variable:                      y   Log-Likelihood:            -1.3878e+05
Model:                     Likelihood   AIC:                         2.776e+05
Method:            Maximum Likelihood   BIC:                         2.776e+05
Date:                Sun, 16 Jul 2017
Time:                        15:06:23
No. Observations:               16384
Df Residuals:                   16382
Df Model:                           2
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
par0           0.4468      0.002    181.019      0.000       0.442       0.452
par1        2584.0353      9.019    286.499      0.000    2566.358    2601.713
==============================================================================
_images/pdf_design4_mom0_weights.png

Since the data are now defined as data / stderr^2, the fit parameters have changed. While this scaling makes it difficult to use the fit parameters to compare with theoretical predictions, it can be useful when comparing data sets non-parametrically.

When comparing to the PDFs from other data, adopting a common normalization scheme can aid in highlighting similarities and differences. The four normalizations that can be set with normalization_type are demonstrated below. Adopting different normalizations highlights different portions of the data, making it important to choose a normalization appropriate for the data. Each of these normalizations subtly makes assumptions on the data’s properties. Note that fitting is disabled here since some of the normalization types scale the data to negative values and cannot be fit with a log-normal distribution.

standardize subtracts the mean and divides by the standard deviation; this is appropriate for normally-distributed data:

>>> pdf_mom0 = PDF(moment0, normalization_type='standardize')  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True, do_fit=False)  # doctest: +SKIP
_images/pdf_design4_mom0_stand.png

center subtracts the mean from the data:

>>> pdf_mom0 = PDF(moment0, normalization_type='center')  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True, do_fit=False)  # doctest: +SKIP
_images/pdf_design4_mom0_center.png

normalize subtracts the minimum in the data and divides by the range in the data, thereby scaling the data between 0 and 1:

>>> pdf_mom0 = PDF(moment0, normalization_type='normalize')  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True, do_fit=False)  # doctest: +SKIP
_images/pdf_design4_mom0_norm.png

normalize_by_mean divides the data by its mean. This is the most common normalization found in the literature on PDFs since the commonly used parametric forms (log-normal and power-laws) can be arbitrarily scaled by the mean.

>>> pdf_mom0 = PDF(moment0, normalization_type='normalize_by_mean')  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True, do_fit=False)  # doctest: +SKIP
_images/pdf_design4_mom0_normmean.png

The example data are well-described by a log-normal, making the normalization by the mean an appropriate choice. Note how the shape of the distribution appears unchanged in these examples, but the axis they’re defined on changes.

The distribution fitting shown above uses a maximum likelihood estimate (MLE) to find the parameter values and their uncertainties. This works well for well-behaved data, like those used in this tutorial, where the parametric description fits the data well. When this is not the case, the standard errors can be extremely under-estimated. One solution is to adopt a Monte Carlo approach for fitting. When the emcee package is installed, fit_pdf will fit the distribution using MCMC. Note that all keyword arguments to fit_pdf can also be passed to run.

>>> pdf_mom0 = PDF(moment0, min_val=0.0, bins=None)  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True, fit_type='mcmc')  # doctest: +SKIP
Ran chain for 2000 iterations
Used 20 walkers
Mean acceptance fraction of 0.722775
Parameter values: [   0.43589657  225.69177379]
15th to 85th percentile ranges: [ 0.00498541  1.51322986]
_images/pdf_design4_mom0_mcmc.png

The MCMC fit finds the same parameter values (see the first example above) with a ~1-sigma range about twice that of the MLE fit. The MCMC chain is run for 200 burn-in steps, followed by 2000 steps that are used to estimate the distribution parameters. These can be altered by passing burnin and steps to the run command above, as shown below. Other accepted keywords can be found in the emcee documentation.
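
For example, a longer burn-in and chain could be requested with:

>>> pdf_mom0.run(verbose=True, fit_type='mcmc', burnin=500, steps=5000)  # doctest: +SKIP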

Warning

MCMC results should not be blindly accepted. It is important to check the behaviour of the chain to ensure it converged and has adequately explored the parameter space around the converged result.

To check if the MCMC has converged, a trace plot of parameter value versus step in the chain can be made:

>>> pdf_mom0.trace_plot()  # doctest: +SKIP
_images/pdf_design4_mom0_mcmc_trace.png

We can also look at the sample distributions for each fit parameter using a corner plot. This requires the corner.py package to be installed.

>>> pdf_mom0.corner_plot()  # doctest: +SKIP
_images/pdf_design4_mom0_mcmc_corner.png

Each parameter distribution is shown as a 1D histogram, along with the joint distributions as 2D histograms, which is useful for exploring covariant parameters in the fit. The dotted lines show the 16th, 50th, and 84th quantiles. Each of the distributions here is close to normally-distributed and appears well-behaved.

The log-normal distribution is typically not used for observational data since regions of low column density or extinction have greater uncertainties and/or are incompletely sampled in the data (see Lombardi et al. 2015). A power-law model may be a better choice in this case. We can choose to fit other models by passing different rv_continuous models to model in run. Note that the fit will fail if the data are outside of the accepted range for the given model (such as negative values for the log-normal distribution).

For this example, let us consider values below 250 K m/s to be unreliable. We will fit a pareto distribution to the integrated intensities above this (the scipy powerlaw model requires a positive index).

>>> import scipy.stats as stats  # doctest: +SKIP
>>> plaw_data = stats.pareto.rvs(2, size=5000)  # doctest: +SKIP
>>> pdf_mom0 = PDF(moment0, min_val=250.0, normalization_type=None)  # doctest: +SKIP
>>> pdf_mom0.run(verbose=True, model=stats.pareto,
...              fit_type='mle', floc=False)  # doctest: +SKIP
Optimization terminated successfully.
         Current function value: 5.641058
         Iterations: 84
         Function evaluations: 159
Fitted parameters: [   3.27946996   -0.58133183  250.61486355]
Covariance calculation failed.
_images/pdf_design4_mom0_plaw.png

Based on the deviations in the ECDF plot, the log-normal fit was better for this data, though the power-law does adequately describe the data at high integrated intensities. But, there are issues with the fit. The MLE routine diverges when calculating the covariance matrix and standard errors. There are important nuances for fitting heavy-tailed distributions that are not included in the MLE fitting here. See the powerlaw package for the correct approach.

Note that an additional parameter, floc, has been set. Setting floc=False stops the loc parameter from being fixed in the fit; fixing loc is appropriate for the default fitting of a log-normal distribution, but not here. The scale parameter can similarly be fixed with fscale. See the scipy.stats documentation for an explanation of these parameters.

All of these examples use the zeroth moment from the data. Since PDFs are equally valid for any dimension of data, we can also find the PDF for the PPV cube. The class and function calls are identical:

>>> from spectral_cube import SpectralCube
>>> cube = SpectralCube.read("Design4_flatrho_0021_00_radmc.fits")  # doctest: +SKIP
>>> pdf_cube = PDF(cube).run(verbose=True, do_fit=False)  # doctest: +SKIP
_images/pdf_design4.png
References

As stated above, there are a great many papers measuring properties of the PDF. Below are just a few examples with different PDF uses and discussions:

Miesch et al. 1995

Ostriker et al. 2001

Kowal et al. 2007

Federrath et al. 2008

Goodman et al. 2009

Federrath et al. 2010

Burkhart et al. 2015

Lombardi et al. 2015

Alves et al. 2017

Burkhart et al. 2017

Chen et al. 2018

Spatial Power Spectrum
Overview

A common analysis technique for two-dimensional images is the spatial power spectrum – the squared amplitude of the 2D Fourier transform of an image. A radial profile of the 2D power spectrum gives the 1D power spectrum. The slope of this 1D spectrum can be compared to the expected indices in different physical limits. For example, the energy spectrum of Kolmogorov turbulence follows \(k^{-5/3}\), while Burgers’ turbulence has \(k^{-2}\).
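
As a conceptual sketch of these two steps (independent of TurbuStat, using a hypothetical random image):

>>> import numpy as np
>>> img = np.random.randn(128, 128)  # hypothetical image
>>> pspec_2D = np.abs(np.fft.fftshift(np.fft.fft2(img)))**2  # squared amplitude of the 2D FFT
>>> yy, xx = np.indices(img.shape)
>>> radius = np.sqrt((yy - 64)**2 + (xx - 64)**2)  # distance from the zero-frequency pixel
>>> pspec_1D = [pspec_2D[(radius >= r) & (radius < r + 1)].mean()
...             for r in range(1, 64)]  # radial average gives the 1D power spectrum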

However, observations are a combination of both velocity and density fluctuations (e.g., Lazarian & Pogosyan 2000), and the measured index from an integrated intensity map depends on both components, as well as on optical depth effects. For a turbulent, optically thin tracer, an integrated intensity image (or zeroth moment) will have \(k^{-11/3}\), while an optically thick tracer saturates to \(k^{-3}\) (Lazarian & Pogosyan 2004, Burkhart et al. 2013). The effect of velocity resolution is discussed in the VCA tutorial.

Using

The data in this tutorial are available here.

We need to import the PowerSpectrum code, along with a few other common packages:

>>> from turbustat.statistics import PowerSpectrum
>>> from astropy.io import fits

And we load in the data:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

The power spectrum is computed using:

>>> pspec = PowerSpectrum(moment0, distance=250 * u.pc)  # doctest: +SKIP
>>> pspec.run(verbose=True, xunit=u.pix**-1)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.941
Model:                            OLS   Adj. R-squared:                  0.941
Method:                 Least Squares   F-statistic:                     1426.
Date:                Fri, 29 Sep 2017   Prob (F-statistic):           1.44e-56
Time:                        14:32:47   Log-Likelihood:                -52.829
No. Observations:                  91   AIC:                             109.7
Df Residuals:                      89   BIC:                             114.7
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.1677      0.103     30.828      0.000       2.964       3.372
x1            -5.0144      0.133    -37.761      0.000      -5.278      -4.751
==============================================================================
Omnibus:                        3.532   Durbin-Watson:                   0.129
Prob(Omnibus):                  0.171   Jarque-Bera (JB):                3.481
Skew:                          -0.468   Prob(JB):                        0.175
Kurtosis:                       2.797   Cond. No.                         4.40
==============================================================================
_images/design4_pspec.png

The code returns a summary of the one-dimensional fit and a figure showing the one-dimensional spectrum and model on the left, and the two-dimensional power-spectrum on the right. If fit_2D=True is set in run (the default setting), the contours on the two-dimensional power-spectrum are fit using an elliptical power-law model. The dashed red lines (or contours) on both plots are the limits of the data used in the fits. We use an elliptical power-law model:

\[A \left[(q^2 \cos^2\theta + \sin^2\theta) x^2 + 2(1 - q^2) \sin\theta \cos\theta xy + (q^2 \sin^2\theta + \cos^2 \theta) y^2 \right]^{\Gamma/2}\]

Here, the power-law index is \(\Gamma\), the orientation angle of the ellipse with respect to the \(x,y\) coordinate system is given by \(\theta\) and the ellipticity is \(q\in [0,1)\).

The power spectrum of this simulation has a slope of \(-3.3\pm0.1\), but the power-spectrum deviates from a single power-law on small scales. This is due to the limited inertial range in this simulation. The spatial frequencies used in the fit can be limited by setting low_cut and high_cut. The inputs should be given as spatial frequencies in pixel, angular, or physical units. For example,

>>> pspec.run(verbose=True, xunit=u.pix**-1, low_cut=0.025 / u.pix,
...           high_cut=0.1 / u.pix)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.971
Model:                            OLS   Adj. R-squared:                  0.968
Method:                 Least Squares   F-statistic:                     398.6
Date:                Thu, 28 Sep 2017   Prob (F-statistic):           1.42e-10
Time:                        17:02:20   Log-Likelihood:                 14.077
No. Observations:                  14   AIC:                            -24.15
Df Residuals:                      12   BIC:                            -22.87
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.5109      0.190     29.021      0.000       5.097       5.925
x1            -3.0223      0.151    -19.964      0.000      -3.352      -2.692
==============================================================================
Omnibus:                        0.901   Durbin-Watson:                   2.407
Prob(Omnibus):                  0.637   Jarque-Bera (JB):                0.718
Skew:                          -0.215   Prob(JB):                        0.698
Kurtosis:                       1.977   Cond. No.                         15.2
==============================================================================
_images/design4_pspec_limitedfreq.png

When limiting the fit to the inertial range, the slope is \(-3.0\pm0.2\). low_cut and high_cut can also be given as spatial frequencies in angular units (e.g., u.deg**-1). And since a distance was specified, the low_cut and high_cut can also be given in physical frequency units (e.g., u.pc**-1).

The fit to the two-dimensional power-spectrum has also changed. These parameters aren’t included in the fit summary for the 1D fit. Instead, they can be accessed through:

>>> print(pspec.slope2D, pspec.slope2D_err)  # doctest: +SKIP
(-3.155235947194412, 0.19744198375014044)
>>> print(pspec.ellip2D, pspec.ellip2D_err)  # doctest: +SKIP
(0.74395734515060385, 0.043557506230624203)
>>> print(pspec.theta2D, pspec.theta2D_err)  # doctest: +SKIP
(1.1364954648370515, 0.09436799399259721)

The slope is moderately steeper than in the 1D model, but within the respective uncertainty ranges. By default, the parameter uncertainties for the 2D model are determined by a bootstrap: after fitting the model, the residuals are resampled and added back to the data, and the resampled data are then re-fit. This procedure is repeated a number of times to build up distributions for the fit parameters. The bootstrap estimation is enabled by setting the bootstrap keyword to True in fit_2Dpspec, and the number of iterations is set with niters (the default is 100). These can be set in run by passing a keyword dictionary to fit_2D_kwargs (e.g., fit_2D_kwargs={'bootstrap': False}), as in the example below. The other parameters are the ellipticity, which is bounded between 0 and 1 (with 1 being circular), and theta, the angle between the x-axis and the semi-major axis of the ellipse. Theta is bounded between 0 and \(\pi\). The 2D power spectrum here is moderately anisotropic.
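
For example, the bootstrap settings described above can be changed through fit_2D_kwargs:

>>> pspec.run(verbose=True, xunit=u.pix**-1, low_cut=0.025 / u.pix, high_cut=0.1 / u.pix,
...           fit_2D_kwargs={'bootstrap': True, 'niters': 500})  # doctest: +SKIP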

Breaks in the power-law behaviour in observations (and higher-resolution simulations) can result from differences in the physical processes dominating at those scales (e.g., Swift & Welch 2008). To capture this behaviour, PowerSpectrum can be passed a break point to enable fitting with a segmented linear model (Lm_Seg):

>>> pspec = PowerSpectrum(moment0, distance=250 * u.pc)  # doctest: +SKIP
>>> pspec.run(verbose=True, xunit=u.pc**-1, low_cut=0.02 / u.pix, high_cut=0.4 / u.pix,
...           fit_kwargs={'brk': 0.1 / u.pix, 'log_break': False}, fit_2D=False)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.996
Model:                            OLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                     4904.
Date:                Fri, 29 Sep 2017   Prob (F-statistic):           1.84e-77
Time:                        14:29:10   Log-Likelihood:                 61.421
No. Observations:                  70   AIC:                            -114.8
Df Residuals:                      66   BIC:                            -105.8
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.1169      0.087     59.057      0.000       4.944       5.290
x1            -3.3384      0.082    -40.924      0.000      -3.501      -3.176
x2            -4.9624      0.191    -26.043      0.000      -5.343      -4.582
x3            -0.0084      0.048     -0.174      0.863      -0.105       0.088
==============================================================================
Omnibus:                        3.812   Durbin-Watson:                   1.096
Prob(Omnibus):                  0.149   Jarque-Bera (JB):                2.211
Skew:                          -0.191   Prob(JB):                        0.331
Kurtosis:                       2.218   Cond. No.                         22.4
==============================================================================
_images/design4_pspec_breakfit.png

brk is the initial guess for the break point location. Here I’ve set it to the extent of the inertial range of the simulation. log_break should be enabled if the given brk is already the log (base-10) value (since the fitting is done in log-space). The segmented linear model iteratively optimizes the location of the break point, trying to minimize the gap between the different components. This gap is the x3 parameter above. The slopes of the components are x1 and x2, but the second slope is defined relative to the first slope (i.e., if x2=0, the slopes of the components would be the same). The true slopes can be accessed through pspec.slope and pspec.slope_err. The location of the fitted break point is given by pspec.brk, and its uncertainty by pspec.brk_err. If the fit does not find a good break point, it will revert to a linear fit without the break.

Note that the 2D fitting was disabled in this last example. The 2D model cannot fit a break point and will instead try to fit a single power-law between low_cut and high_cut, which we already know is the wrong model. Thus, it has been disabled to avoid confusion. A strategy for fitting the 2D model when the spectrum shows a break is to first fit the 1D model, find the break point, and then fit the 2D spectrum independently using the break point as the high_cut in fit_2Dpspec, as sketched below.
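
A sketch of that strategy, assuming fit_2Dpspec accepts the same low_cut limit used above (the high_cut is the fitted 1D break point, as suggested):

>>> pspec.run(verbose=True, low_cut=0.02 / u.pix, high_cut=0.4 / u.pix,
...           fit_kwargs={'brk': 0.1 / u.pix, 'log_break': False}, fit_2D=False)  # doctest: +SKIP
>>> pspec.fit_2Dpspec(low_cut=0.02 / u.pix, high_cut=pspec.brk)  # doctest: +SKIP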

There may be cases where you want to limit the azimuthal angles used to create the 1D averaged power-spectrum. This may be useful if, for example, you want to find a measure of anisotropy but the 2D power-law fit is not performing well. We will add extra constraints to the previous example with a break point:

>>> pspec = PowerSpectrum(moment0, distance=250 * u.pc)  # doctest: +SKIP
>>> pspec.run(verbose=True, xunit=u.pc**-1, low_cut=0.02 / u.pix, high_cut=0.4 / u.pix,
...           fit_2D=False, fit_kwargs={'brk': 0.1 / u.pix, 'log_break': False},
...           radial_pspec_kwargs={"theta_0": 1.13 * u.rad, "delta_theta": 40 * u.deg})  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.990
Model:                            OLS   Adj. R-squared:                  0.989
Method:                 Least Squares   F-statistic:                     2113.
Date:                Fri, 29 Sep 2017   Prob (F-statistic):           1.76e-65
Time:                        14:29:10   Log-Likelihood:                 30.377
No. Observations:                  70   AIC:                            -52.75
Df Residuals:                      66   BIC:                            -43.76
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.7150      0.173     33.005      0.000       5.369       6.061
x1            -2.9371      0.154    -19.041      0.000      -3.245      -2.629
x2            -4.9096      0.254    -19.313      0.000      -5.417      -4.402
x3             0.0156      0.077      0.202      0.840      -0.138       0.169
==============================================================================
Omnibus:                        3.679   Durbin-Watson:                   1.837
Prob(Omnibus):                  0.159   Jarque-Bera (JB):                1.894
Skew:                          -0.030   Prob(JB):                        0.388
Kurtosis:                       2.196   Cond. No.                         22.9
==============================================================================
_images/design4_pspec_breakfit_azimlimits.png

The azimuthal mask has been added onto the plot of the two-dimensional power spectrum. The constraints used here are based on the major axis direction from the two-dimensional fit performed above. This is given as theta_0. The other parameter, delta_theta, is the width of the azimuthal mask to use. Both parameters can be specified in any angular unit.

The default fit uses Ordinary Least Squares. A Weighted Least Squares can be enabled with weighted_fit=True if the segmented linear fit is not used:

>>> pspec = PowerSpectrum(moment0, distance=250 * u.pc)  # doctest: +SKIP
>>> pspec.run(verbose=True, xunit=u.pix**-1, low_cut=0.025 / u.pix, high_cut=0.1 / u.pix,
...           fit_kwargs={'weighted_fit': True})  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.969
Model:                            WLS   Adj. R-squared:                  0.966
Method:                 Least Squares   F-statistic:                     372.0
Date:                Fri, 29 Sep 2017   Prob (F-statistic):           2.13e-10
Time:                        15:08:21   Log-Likelihood:                 13.966
No. Observations:                  14   AIC:                            -23.93
Df Residuals:                      12   BIC:                            -22.65
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.5119      0.194     28.476      0.000       5.090       5.934
x1            -3.0200      0.157    -19.288      0.000      -3.361      -2.679
==============================================================================
Omnibus:                        0.701   Durbin-Watson:                   2.387
Prob(Omnibus):                  0.704   Jarque-Bera (JB):                0.655
Skew:                          -0.235   Prob(JB):                        0.721
Kurtosis:                       2.050   Cond. No.                         15.3
==============================================================================
_images/design4_pspec_limitedfreq_weightfit.png

The fit has not changed significantly here, but it may in other cases.

If strong emission continues to the edge of the map (and the map does not have periodic boundaries), ringing in the FFT can introduce a cross pattern in the 2D power-spectrum. This effect and the use of apodizing kernels to taper the data is covered here.

Most observational data will be smoothed over the beam size, which will steepen the power spectrum on small scales. To account for this, the 2D power spectrum can be divided by the beam response. This is demonstrated here for spatial power-spectra.

Spectral Correlation Function (SCF)
Overview

The Spectral Correlation Function was introduced by Rosolowsky et al. 1999 and Padoan et al. 2001 to quantify the correlation of a spectral-line data cube as a function of spatial separation. There are different forms of the SCF described in the literature (e.g., Padoan et al. 2003). TurbuStat contains the SCF form described in Padoan et al. 2003, which has been used in Yeremi et al. 2014 and Gaches et al. 2015.

\[S(\boldsymbol{\ell}) = 1 - \left\langle \sqrt{\frac{\sum_v |I(\mathbf{x},v)-I(\mathbf{x}+\boldsymbol{\ell},v)|^2}{\sum_v |I(\mathbf{x},v)|^2+\sum_v |I(\mathbf{x}+\boldsymbol{\ell},v)|^2}}\right\rangle_{\mathbf{x}}.\]

\(S(\boldsymbol{\ell})\) is the total correlation between the cube and the cube shifted by the lag vector \(\boldsymbol{\ell}=(\Delta x, \Delta y)\). By repeating this process for a series of \((\Delta x, \Delta y)\) shifts in the spatial dimensions, a 2D correlation surface is created. This surface describes the spatial scales on which the spectral features begin to change.
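
As a conceptual sketch of the equation above (independent of TurbuStat; the data are assumed to be a (velocity, y, x) array and an integer pixel lag is used for simplicity):

>>> import numpy as np
>>> def scf_single_lag(data, dy, dx):
...     # Shift the cube spatially by the lag (dy, dx), assuming periodic boundaries
...     shifted = np.roll(np.roll(data, dy, axis=1), dx, axis=2)
...     # Sum over the velocity axis at each position, then average over all positions
...     num = np.sum((data - shifted)**2, axis=0)
...     denom = np.sum(data**2, axis=0) + np.sum(shifted**2, axis=0)
...     return 1 - np.nanmean(np.sqrt(num / denom))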

The correlation surface can be further simplified by computing an azimuthal average, yielding a 1D spectrum of the correlation versus the length of the lag vector. This form, as presented in Rosolowsky et al. 1999 and Padoan et al. 2001, yields a power-law relation whose slope can be used to quantify differences between spectral cubes. An example of this comparison is the study by Gaches et al. 2015, where the effect of the chemical species analyzed is traced through changes in the SCF slope.

Using

The data in this tutorial are available here.

Importing a few common packages:

>>> from turbustat.statistics import SCF
>>> from astropy.io import fits
>>> import astropy.units as u

And we load in the data:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

The cube and lags to use are given to initialize the SCF class:

>>> scf = SCF(cube, size=11)  # doctest: +SKIP

size describes the total size of one dimension of the correlation surface; size=11 will compute the SCF up to a lag of 5 pixels in each direction. Alternatively, a set of custom lag values can be passed using roll_lags (a short sketch follows below; see also the example with physical units below). No restriction is placed on the values of these lags; however, the azimuthally-averaged spectrum is only usable if the given lags are symmetric, with positive and negative values. Also note that lags do not have to be integer values! SCF handles non-integer shifts by shifting the data in the Fourier plane.
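
For example, a symmetric set of (possibly non-integer) lags could be given; the values below are hypothetical:

>>> import numpy as np
>>> custom_lags = np.arange(-4.5, 5.0, 1.5)  # symmetric about zero, in pixels
>>> scf = SCF(cube, roll_lags=custom_lags)  # doctest: +SKIP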

To compute the SCF, we run:

>>> scf.run(verbose=True)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.991
Model:                            WLS   Adj. R-squared:                  0.990
Method:                 Least Squares   F-statistic:                     661.0
Date:                Tue, 18 Jul 2017   Prob (F-statistic):           2.28e-07
Time:                        10:07:56   Log-Likelihood:                 26.958
No. Observations:                   8   AIC:                            -49.92
Df Residuals:                       6   BIC:                            -49.76
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0450      0.001    -33.254      0.000      -0.048      -0.042
x1            -0.1624      0.006    -25.710      0.000      -0.178      -0.147
==============================================================================
Omnibus:                        1.340   Durbin-Watson:                   0.445
Prob(Omnibus):                  0.512   Jarque-Bera (JB):                0.696
Skew:                          -0.248   Prob(JB):                        0.706
Kurtosis:                       1.643   Cond. No.                         4.70
==============================================================================
_images/design4_scf.png

The summary plot shows the correlation surface, a histogram of correlation values, and the 1D spectrum from the azimuthal average, plotted with the power-law fit. A weighted least-squares fit is used to find the slope of the SCF spectrum, where the inverse squared standard deviations from the azimuthal average are used as the weights. The solid contours on the SCF surface are from the 2D fit to the surface, while the blue dot-dashed lines are the extents of the data used in the fit (and match the 1D spectrum limits). See the PowerSpectrum tutorial for a more thorough discussion of the two-dimensional fitting.

The 2D model parameters are not shown in the above summary. Instead, the parameters can be accessed with:

>>> print(scf.slope2D, scf.slope2D_err)  # doctest: +SKIP
(-0.21648274416050342, 0.0029877489213308711)
>>> print(scf.ellip2D, scf.ellip2D_err)  # doctest: +SKIP
(0.89100428375797669, 0.013283231941591638)
>>> print(scf.theta2D, scf.theta2D_err)  # doctest: +SKIP
(0.33117523945671401, 0.06876652735591221)

Since each value in the SCF surface is an average over the whole cube, it tends to be less affected by noise than the power-spectrum based methods (e.g., PowerSpectrum tutorial) and the 2D fit is highly constrained despite having many fewer points to fit. The slope of the 2D model is much steeper than the slope of the 1D model. In the 2D model, the index is defined to be the slope along the minor axis, where the slope is the steepest. The ability to return the slope at any angle will be added to TurbuStat in a future release.

Real data may not have a spectrum described by a single power-law. In this case, the fit limits can be specified using xlow and xhigh to limit which scales are used in the fit.

>>> scf.run(verbose=True, xlow=1 * u.pix, xhigh=5 * u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.983
Model:                            WLS   Adj. R-squared:                  0.975
Method:                 Least Squares   F-statistic:                     118.9
Date:                Tue, 18 Jul 2017   Prob (F-statistic):            0.00831
Time:                        10:10:42   Log-Likelihood:                 16.864
No. Observations:                   4   AIC:                            -29.73
Df Residuals:                       2   BIC:                            -30.95
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0103      0.010     -1.036      0.409      -0.053       0.032
x1            -0.2027      0.019    -10.902      0.008      -0.283      -0.123
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.000
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.637
Skew:                          -0.020   Prob(JB):                        0.727
Kurtosis:                       1.045   Cond. No.                         10.0
==============================================================================
_images/design4_scf_fitlimits.png

The one-dimensional power spectrum in the previous examples is averaged over all azimuthal angles. In cases where only a certain range of angles is of interest, limits on the averaged azimuthal angles can be given:

>>> scf.run(verbose=True, xlow=1 * u.pix, xhigh=5 * u.pix,
...         radialavg_kwargs={"theta_0": 1.13 * u.rad,
...                           "delta_theta": 70 * u.deg})  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.987
Model:                            WLS   Adj. R-squared:                  0.981
Method:                 Least Squares   F-statistic:                     157.2
Date:                Mon, 02 Oct 2017   Prob (F-statistic):            0.00630
Time:                        09:00:45   Log-Likelihood:                 17.721
No. Observations:                   4   AIC:                            -31.44
Df Residuals:                       2   BIC:                            -32.67
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0067      0.010     -0.695      0.559      -0.048       0.035
x1            -0.2098      0.017    -12.539      0.006      -0.282      -0.138
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.899
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.449
Skew:                          -0.003   Prob(JB):                        0.799
Kurtosis:                       1.358   Cond. No.                         14.4
==============================================================================
_images/design4_scf_fitlimits_azimlimits.png

theta_0 is the angle at the center of the azimuthal mask and delta_theta is the width of that mask. The mask is shown on the SCF surface by the radial blue-dashed contours.

Here the fit limits were given in pixel units, but angular units and physical units (if a distance is given) can also be passed. For these data, there is some deviation from a power-law at small lags over the range of lags used and so limiting the fitting range has not significantly changed the fit. See Figure 8 in Padoan et al. 2001 for an example of deviations from power-law behaviour in the SCF spectrum.

The slope of the model can be accessed with scf.slope and its standard error with scf.slope_err. The slope and intercept values are in scf.fit.params. scf.fitted_model can be used to evaluate the model at any given lag value. For example:

>>> scf.fitted_model(1 * u.pix)  # doctest: +SKIP
0.97659777310171636
>>> scf.fitted_model(u.Quantity([1, 10]) * u.pix)  # doctest: +SKIP
array([ 0.97659777,  0.61242384])
>>> scf.fitted_model(u.Quantity([50, 100]) * u.arcsec)  # doctest: +SKIP
array([ 0.44197356,  0.3840506 ])

All values passed must have an attached unit. Physical units can be given when a distance has been given (see below).

In some cases, it may be preferable to calculate the SCF on specific physical scales. When SCF is given a distance, roll_lags, xlow, xhigh, and xunit can be given in physical units. Angular units can always be given, as well, since SCF requires a FITS header. In this example, we will use a set of custom lags in physical units:

>>> distance = 250 * u.pc  # Assume a distance
>>> phys_conv = (np.abs(cube.header['CDELT2']) * u.deg).to(u.rad).value * distance  # doctest: +SKIP
>>> custom_lags = np.arange(-4.5, 5, 1.5) * phys_conv  # doctest: +SKIP
>>> print(custom_lags)  # doctest: +SKIP
[-0.10296379 -0.06864253 -0.03432126  0.          0.03432126  0.06864253 0.10296379] pc

The lags here are equally spaced and centered around zero. phys_conv converts the pixel values into physical units. When calling SCF, the distance must now be given:

>>> scf_physroll = SCF(cube, roll_lags=custom_lags, distance=distance)  # doctest: +SKIP
>>> scf_physroll.run(verbose=True, xunit=u.pc)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.892
Model:                            WLS   Adj. R-squared:                  0.856
Method:                 Least Squares   F-statistic:                     24.77
Date:                Tue, 18 Jul 2017   Prob (F-statistic):             0.0156
Time:                        10:57:18   Log-Likelihood:                 14.907
No. Observations:                   5   AIC:                            -25.81
Df Residuals:                       3   BIC:                            -26.59
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.2522      0.038     -6.725      0.007      -0.372      -0.133
x1            -0.1292      0.026     -4.977      0.016      -0.212      -0.047
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.495
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.757
Skew:                           0.914   Prob(JB):                        0.685
Kurtosis:                       2.464   Cond. No.                         19.3
==============================================================================
_images/design4_scf_physroll.png

This example takes a bit longer to run than the others because, whenever a non-integer lag is used, the cube is shifted in Fourier space.
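
For reference, a fractional shift of a single spatial plane can be written with scipy's Fourier-domain shift; this is a standalone sketch of the idea, not the routine SCF calls internally:

>>> from scipy.ndimage import fourier_shift
>>> plane = cube.data[0]  # doctest: +SKIP
>>> shifted = np.fft.ifft2(fourier_shift(np.fft.fft2(plane), shift=(1.5, -0.5))).real  # doctest: +SKIP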

Throughout all of these examples, we have assumed that the spatial boundaries can be wrapped. This is appropriate for the example data since they are generated from a periodic-box simulation and is the default setting (boundary='continuous'). Typically this will not be the case for observational data. To avoid wrapping the edges of the data, set boundary='cut', which excludes the portion of the data that has been spatially wrapped:

>>> scf = SCF(cube, size=11)  # doctest: +SKIP
>>> scf.run(verbose=True, boundary='cut')  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            WLS   Adj. R-squared:                  0.992
Method:                 Least Squares   F-statistic:                     830.7
Date:                Tue, 18 Jul 2017   Prob (F-statistic):           1.16e-07
Time:                        11:13:18   Log-Likelihood:                 24.569
No. Observations:                   8   AIC:                            -45.14
Df Residuals:                       6   BIC:                            -44.98
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0834      0.003    -31.106      0.000      -0.090      -0.077
x1            -0.2425      0.008    -28.821      0.000      -0.263      -0.222
==============================================================================
Omnibus:                        0.723   Durbin-Watson:                   0.501
Prob(Omnibus):                  0.697   Jarque-Bera (JB):                0.556
Skew:                          -0.236   Prob(JB):                        0.757
Kurtosis:                       1.797   Cond. No.                         3.38
==============================================================================
_images/design4_scf_boundcut.png

This results in a steeper SCF slope as the edges of the rolled cubes are no longer used.

Computing the SCF can be computationally expensive for moderately-sized data cubes. This is due to the need for shifting the entire cube along the spatial dimensions at each lag value. To avoid recomputing the SCF surface, the results of the SCF can be saved as a pickled object:

>>> scf.save_results(output_name="Design4_SCF", keep_data=False)  # doctest: +SKIP

Setting keep_data=False removes the data cube before saving, reducing the size of the output file. Having saved the results, they can be reloaded using:

>>> scf = SCF.load_results("Design4_SCF.pkl")  # doctest: +SKIP

Note that if keep_data=False was used when saving the file, the loaded version cannot be used to recalculate the SCF.

Statistical Moments
Overview

A commonly used analysis technique with spectral-line data cubes is to compute the moments of each spectrum (Falgarone et al. 1994). Alternatively, moments can be computed using the distribution of values in an image or a region within an image. This idea was introduced by Kowal et al. 2007 and extended in Burkhart et al. 2010, who computed the mean, variance, skewness, and kurtosis within circular regions across an image. These moments provide an estimate of how the intensity structure varies across an image. Using different neighborhood sizes to compute these statistics will emphasize or hide variations on different spatial scales.

The moments have the following definitions measured within a circular neighborhood \(|\mathbf{x}'-\mathbf{x}|\le r\) with radius \(r\) on a two-dimensional image \(I\):

The mean is defined as:

\[\mu_r(\mathbf{x}) = \frac{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}')\, I(\mathbf{x}')}{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}')}\]

where \(I(\mathbf{x}')\) are the values within the neighborhood region, and \(w(\mathbf{x}')\) is the weight at that position, typically the inverse noise variance (e.g., for a zeroth moment map \(M_0\), \(w(\mathbf{x}') = [\sigma_{M_0}(\mathbf{x}')]^{-2}\)).

The variance is defined as:

\[\sigma^2_r(\mathbf{x}) \equiv \frac{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}') \left[ I(\mathbf{x}') - \mu_r(\mathbf{x}) \right]^2}{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}')}.\]

The skewness is defined as:

\[\gamma_{3,r}(\mathbf{x}) \equiv \frac{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}')\left[\frac{I(\mathbf{x}')-\mu_r(\mathbf{x})}{\sigma_r(\mathbf{x})}\right]^3}{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}')}.\]

And the kurtosis is defined as:

\[\gamma_{4,r}(\mathbf{x}) \equiv \frac{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}') \left[\frac{I(\mathbf{x}')-\mu_r(\mathbf{x})}{\sigma_r(\mathbf{x})}\right]^4}{\sum_{|\mathbf{x}'-\mathbf{x}|\le r} w(\mathbf{x}')} - 3.\]

For the purpose of comparing these spatial moment maps between data sets, Burkhart et al. 2010 recommend using the third and fourth moments—the skewness and kurtosis, respectively—since they are independent of the mean and normalized by the standard deviation.
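
As a concrete illustration of these definitions, the sketch below evaluates the four moments at a single position using uniform weights and no edge handling (purely illustrative; StatMoments evaluates the weighted forms at every pixel):

>>> import numpy as np
>>> def local_moments(img, y0, x0, r):
...     yy, xx = np.indices(img.shape)
...     mask = (yy - y0)**2 + (xx - x0)**2 <= r**2
...     vals = img[mask]
...     mu = vals.mean()
...     var = ((vals - mu)**2).mean()
...     skew = (((vals - mu) / np.sqrt(var))**3).mean()
...     kurt = (((vals - mu) / np.sqrt(var))**4).mean() - 3
...     return mu, var, skew, kurt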

Using

The data in this tutorial are available here.

Import a few packages that are needed and read in the zeroth moment:

>>> import numpy as np
>>> from astropy.io import fits
>>> import astropy.units as u
>>> from turbustat.statistics import StatMoments
>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0] # doctest: +SKIP

The moment0 HDU and radius of the neighborhood are given to initialize StatMoments:

>>> moments = StatMoments(moment0, radius=5 * u.pix)  # doctest: +SKIP

This simulated data has periodic boundaries, so we enable periodic=True when computing the spatial moment maps:

>>> moments.run(verbose=True, periodic=True, min_frac=0.8)  # doctest: +SKIP
_images/design4_statmoments.png

Overlaid on all four plots are the intensity contours from the zeroth moment, making it easier to compare the moment structures to the intensity. Some of the moment maps are more useful than others. For example, the mean array is simply a smoothed version of the zeroth moment. The variance map scales closely with the peaks in emission. The skewness and kurtosis maps are perhaps the most interesting products. Both emphasize regions with sharper gradients in the intensity map, and since both are normalized by the standard deviation, they are independent of the intensity scaling.

The moment maps are shown above, but the distributions of the moments are of greater interest for comparing with other data sets. plot_histograms plots, and optionally saves, the histograms:

>>> moments.plot_histograms()  # doctest: +SKIP
_images/design4_statmoments_hists.png

Again, the mean is just a smoothed version of the zeroth moment values. The variance has a strong tail, emphasized by the squared term at the high-intensity peak in the map. The skewness and kurtosis provide distributions of normalized parameters, making it easier to use these quantities for comparing to other data sets.

This example does not include any blanked regions, though observational data often do. To avoid edge effects near these regions, a limit can be set on the minimum fraction of finite points within each region with min_frac. By default, min_frac is set to 0.8. To completely remove edge effects, this parameter can be increased to 1 to reject all regions that contain a NaN. Note that this will increase the size of the blank region by your chosen radius.

Warning

This example uses data that are noiseless. If these data did have noise, it would be critical to remove the noise-dominated regions when computing the skewness and kurtosis, in particular, since they are normalized by the variance. To minimize the effect of noise, the data can be masked beforehand or an array of weights can be passed to down-weight noisy regions. We recommend using the latter method. For example, a weight array could be the inverse squared noise level of the data. Assume the noise level is set to make the signal-to-noise ratio 10 for every pixel in the map:

>>> np.random.seed(3434789)
>>> noise = moment0.data * 0.1 + np.random.normal(0, 0.1, size=moment0.data.shape)  # doctest: +SKIP
>>> moments_weighted = StatMoments(moment0, radius=5 * u.pix,
...                                weights=noise**-2)  # doctest: +SKIP
>>> moments_weighted.run(verbose=True, periodic=True)  # doctest: +SKIP
_images/design4_statmoments_randweights.png

And the associated histograms:

>>> moments_weighted.plot_histograms()  # doctest: +SKIP
_images/design4_statmoments_hists_randweights.png

An important consideration when choosing the radius is the balance between tracking small-scale variations and the increased uncertainty when estimating the moments with less data. Burkhart et al. 2010 use approximate formulae for the standard errors of skewness and kurtosis for a normal distribution that are valid for large samples. These are \(\sqrt{6 / n}\) for skewness and \(\sqrt{24 / n}\) for kurtosis, where \(n\) is the number of points. This also assumes all of these points are independent of each other. This typically is not true of observational data, where the data are correlated on at least the beam scale. Each of these points should be considered when choosing the minimum radius appropriate for the data set. For more information on the standard errors, see the sections on skewness and kurtosis on their Wikipedia pages.
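
For a rough sense of scale, approximating the number of pixels in the neighborhood as \(\pi r^2\) and assuming independent pixels, the radius of 5 pixels used above gives (a back-of-the-envelope check, not a TurbuStat calculation):

>>> n = np.pi * 5**2  # approximate number of pixels within a radius of 5 pixels
>>> print(round(np.sqrt(6 / n), 3), round(np.sqrt(24 / n), 3))
0.276 0.553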

What happens if the radius is chosen to be too small, making the higher-order moments highly uncertain? A new radius can be given to run to replace the first one given:

>>> moments.run(verbose=False, radius=2 * u.pix)  # doctest: +SKIP
>>> moments.plot_histograms()  # doctest: +SKIP
_images/design4_statmoments_hists_rad_2pix.png

The skewness distribution is narrower, but the kurtosis is wider. The kurtosis uncertainty is larger than the skewness uncertainty, leading to a broader distribution. What are the distribution shapes using larger radii?

>>> moments.run(verbose=False, radius=10 * u.pix)  # doctest: +SKIP
>>> moments.plot_histograms()  # doctest: +SKIP
_images/design4_statmoments_hists_rad_10pix.png

The skewness and kurtosis distributions are not significantly different from the first example, which used radius=5 * u.pix. This seems to suggest that radii in this range give values that are not primarily dominated by the measurement uncertainty. The variance distribution has changed though: its peak is no longer at 0. When averaging over a region larger than the size of most of the structure, the peak of the variance should start to become larger than 0. How about computing moments over a much larger radius?

>>> moments.run(verbose=False, radius=32 * u.pix)  # doctest: +SKIP
>>> moments.plot_histograms()  # doctest: +SKIP
_images/design4_statmoments_hists_rad_32pix.png

This is clearly too large a region to use for these data. A radius of 32 pixels means using a circular region half the size of the image, and the moments become dominated by single prominent features in the map, leading to strange multi-modal moment distributions.

Because this method relies significantly on the pixel size of the map (for small radii), comparing data sets is best done on a common grid. However, if larger radii are being used, the pixel-to-pixel variation will not be as important.

Often it is more convenient to specify scales in angular or physical units, rather than pixels. radius can be given in either, though physical units require a distance. For example, assume the distance to the cloud in this data is 250 pc and we want the radius to be 0.23 pc:

>>> moments = StatMoments(moment0, radius=0.23 * u.pc, distance=250 * u.pc)  # doctest: +SKIP
>>> moments.run(verbose=False, periodic=True)  # doctest: +SKIP
>>> moments.plot_histograms()  # doctest: +SKIP
_images/design4_statmoments_hists_physunits.png

When a radius is given in angular or physical units, the radius of the region used is rounded down to the nearest integer. In this case, 0.23 pc rounds down to 10 pixels and we find the same distributions shown above for the radius=10*u.pix case.

Tsallis Statistics
Overview

The Tsallis statistic was introduced by Tsallis 1988 for describing multi-fractal (non-Gaussian) systems. Its use for describing properties of the ISM has been explored in Esquivel & Lazarian 2010 and Tofflemire et al. 2011. Both of these works describe the increments of a field over a set of lags with the Tsallis distribution. The specific form of this Tsallis distribution is the q-Gaussian distribution:

\[R_q = a \left[ 1 + \left( q - 1 \right) \frac{\left[ \Delta f(r) \right]^2}{w^2} \right]^{\frac{-1}{q - 1}}\]

where \(a\) is the normalization, \(q\) controls how “peaked” the distribution is (and is therefore closely related to the kurtosis; Moments tutorial), and \(w\) is the width of the distribution. As \(q \rightarrow 1\) the distribution approaches a Gaussian, while \(q > 1\) gives a flattened distribution with heavier tails. The field is a standardized measure of some quantity: \(\Delta f(r) = \left[ f(x, r) - \left< f(x, r) \right>_x \right] / \sqrt{{\rm var}\left[f(x, r)\right]}\), where the angle brackets indicate an average over \(x\). The input quantity is the difference over some scale \(r\) of a field \(f(x)\): \(f(x, r) = f(x) - f(x + r)\). The \(x, r\) are vectors for multi-dimensional data and the formalism is valid for any dimension of data. One distribution is generated for each scale \(r\), and the variation of the distribution parameters with changing \(r\) can be tracked.

Both Esquivel & Lazarian 2010 and Tofflemire et al. 2011 calculate the Tsallis distribution properties for 3D (spatial) and 2D (column density) fields for different sets of simulations. Since TurbuStat is intended to work solely for observable quantities, only the integrated intensity or column density maps can currently be used.
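
A minimal sketch of constructing the standardized increment field \(\Delta f(r)\) for a 2D map at a single integer lag, rolling along one axis only and assuming periodic boundaries (Tsallis combines shifts along different directions before histogramming and fitting the values):

>>> import numpy as np
>>> def standardized_increment(img, lag):
...     diff = img - np.roll(img, lag, axis=0)  # f(x) - f(x + r)
...     return (diff - diff.mean()) / diff.std()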

Using

The data in this tutorial are available here.

We need to import the Tsallis code, along with a few other common packages:

>>> from turbustat.statistics import Tsallis
>>> from astropy.io import fits
>>> import astropy.units as u
>>> import numpy as np

And we load in the data:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

With default values, the Tsallis distribution fits are calculated using:

>>> tsallis = Tsallis(moment0).run(verbose=True)  # doctest: +SKIP
lags      logA            w2             q         logA_stderr       w2_stderr         q_stderr      redchisq [1]
pix
---- -------------- -------------- ------------- ---------------- ---------------- ---------------- --------------
 1.0 0.642229857292 0.222031894177 1.72708275041 0.00558780778548 0.00392936618184 0.00103432936045 0.202923321877
 2.0 0.686145334852 0.229079180855 1.71512052919 0.00601337392851 0.00465774887931 0.00120261353862 0.314920328798
 4.0 0.738130813081 0.205824741132 1.74329506331 0.00782571325777 0.00483708991946 0.00165192806993 0.455329606682
 8.0 0.820591668413 0.287364084649 1.70387371009 0.00512574211395 0.00699865080392  0.0016250542739  0.64092221368
16.0  0.83785933956 0.508262278484 1.60949351842 0.00275789013817  0.0115039005934  0.0015760977175 0.613100600322
32.0 0.690126682101 0.852684323374 1.45747128702 0.00207752670078  0.0150332422006 0.00122083964724 0.386135737083
64.0 0.648192026832 0.772095703182 1.43822716004  0.0033147013027  0.0194318148758 0.00200724866629 0.556200738841
_images/design4_tsallis.png

This returns an astropy table of the fits to each of the parameters, their standard errors, and the reduced \(\chi^2\) values (tsallis_table). The figure shows the histograms at each lag and their respective fits. The solid black line is the fit and the dashed red lines indicate which data were used to fit the distribution. The x-axis shows the standardized values of the difference maps at each lag, such that the mean is zero and the standard deviation is one. Because of this, there is no mean parameter fit in the Tsallis distribution. Examining the histograms and the fits is useful for showing where the histograms deviate significantly from the model, something that is difficult to determine from the reduced \(\chi^2\) values alone.

A key to the works of Esquivel & Lazarian 2010 and Tofflemire et al. 2011 is how the fit parameters vary at the different lags. Plots showing the fit parameters as a function of the lag can be shown by running:

>>> tsallis.plot_parameters()  # doctest: +SKIP
_images/design4_tsallis_params.png

The amplitude of the fit, shown on the top, simply sets the normalization. The more interesting shape parameters \(w^2\) and \(q\) are shown in the second and third panels. As explained in the overview, \(w^2\) is analogous to the width of a Gaussian curve, while \(q\) determines how peaked the distribution is and is closely related to the kurtosis.

Warning

These parameters are not independent in this fit! The plot shows that the two are anti-correlated. This means that their standard errors are underestimated, even if the requirements for the non-linear least squares fit are met by the data. When examining and interpreting the parameter fits, this covariance should be kept in mind.

The lag values were automatically determined in the previous example. The default lag size, when none are provided, is to use powers of two up to half of the smallest axis size in the image. The example data is a 128-by-128 pixel image and so the lags used are 1, 2, 4, 8, 16, 32, and 64 pixels. If custom values for the lags are given, they must have an attached unit in pixel, angular or physical units. The latter requires passing a distance to Tsallis. For example, assume that the region in the simulated data is located at a distance of 250 pc:

>>> distance = 250 * u.pc
>>> phys_lags = np.arange(0.025, 0.5, 0.05) * u.pc
>>> tsallis = Tsallis(moment0, lags=phys_lags, distance=distance)  # doctest: +SKIP
>>> tsallis.run(verbose=True)  # doctest: +SKIP
 lags      logA            w2             q         logA_stderr       w2_stderr          q_stderr      redchisq [1]
  pc
----- -------------- -------------- ------------- ---------------- ---------------- ----------------- --------------
0.025 0.642229857292 0.222031894177 1.72708275041 0.00558780778548 0.00392936618184  0.00103432936045 0.202923321877
0.075 0.705449362909 0.218319248608 1.72354565147 0.00737330905613 0.00519808619425  0.00152701310901 0.412646518168
0.125 0.789721056553 0.229683344052 1.75057343162 0.00538732578554 0.00423363268573  0.00128094113344 0.409462321776
0.175 0.812924754652  0.26193847697 1.72229438044 0.00582788661761 0.00641904864759   0.0016875791307 0.591453809951
0.225 0.819013579917 0.327952306938 1.68254342712 0.00414539826435 0.00769391172561  0.00152138140139 0.602749326188
0.275  0.84019947484  0.43700081371 1.65129052189 0.00319162263733    0.01060267014  0.00161948565142 0.572638168121
0.325 0.775203769634 0.638231616687 1.55766127541 0.00157464300665  0.0112005405935 0.000945367175158 0.390439429254
0.375  0.83785933956 0.508262278484 1.60949351842 0.00275789013817  0.0115039005934   0.0015760977175 0.613100600322
0.425  0.82517267059 0.439101136039 1.61618300379 0.00412874845191  0.0131188204054  0.00230155968913 0.823108982477
0.475 0.780592562471 0.538751135268 1.56786712441 0.00244860804161  0.0114898743283  0.00145531646909 0.571370986301
_images/design4_tsallis_physlags.png

The lags given here correspond to pixel scales of 1 to about 21 pixels. Whenever lags are given that convert to a fraction of a pixel, the next smallest integer value is used as the lag. The lags given in the output table are always kept in the units they were given in, not the equivalent pixel size in the image.

Calculating the difference in the image at a given lag requires shifting the data in different directions and then taking the difference (similar to the SCF). If the data are periodic in the spatial dimensions, like the example data used here, we want to keep the portion of the data that was rolled past the edge. The periodic boundary handling is enabled by default. To disable treating the edges as periodic, periodic=False can be passed:

>>> tsallis_noper = Tsallis(moment0).run(verbose=True, periodic=False)  # doctest: +SKIP
lags      logA             w2             q         logA_stderr       w2_stderr         q_stderr      redchisq [1]
pix
---- -------------- --------------- ------------- ---------------- ---------------- ---------------- --------------
 1.0 0.897012384613 0.0118349188867 2.23324265255   0.166620498872 0.00017563398593 0.00483817878284  1.05048714536
 2.0 0.896022807195  0.163157700047 1.82635786848  0.0143795839865 0.00540543040786 0.00328264939428 0.856843401609
 4.0 0.786658543433  0.300038576861 1.68212189627 0.00663851190583  0.0102461537338 0.00237396765607 0.760443068549
 8.0 0.783914175933  0.357145631871 1.65368430773  0.0046022510611  0.0103614381214 0.00184930165344 0.667505258089
16.0 0.790689760595  0.674952448852   1.546507737 0.00215124566812  0.0142829674771 0.00129998567864 0.557924881035
32.0 0.713731153997  0.771328751704 1.47897488745 0.00283752579166  0.0172594116452 0.00169658285939 0.475827962986
64.0 0.783452488524  0.742301900184 1.52244838954 0.00300307934231  0.0179167808952 0.00177362923754 0.606593199807
_images/design4_tsallis_noper.png

The histograms are quite different, partially because we are throwing out extra data as the lags increase.

Throughout these examples, the fitting has been limited to \(\pm 5\) times the standard deviation, as indicated by the dashed red lines in the histogram plots. If the limits need to be changed, the sigma_clip keyword can be passed:

>>> tsallis = Tsallis(moment0).run(verbose=True, sigma_clip=3)  # doctest: +SKIP
lags      logA            w2             q         logA_stderr       w2_stderr         q_stderr       redchisq [1]
pix
---- -------------- -------------- ------------- ---------------- ---------------- ---------------- ---------------
 1.0 0.676668795627  0.29391426291 1.71669037083 0.00157383166087 0.00261634517218 0.00198839049982  0.057608469887
 2.0 0.745791738309 0.322366742708 1.72147347306 0.00183938037718 0.00274517409678 0.00215767963285 0.0624568707002
 4.0 0.673011928843 0.444372636313 1.60204004903 0.00201910360961 0.00408585675961 0.00216448599475 0.0701353589419
 8.0 0.726710296991 0.555894748784 1.57342987012 0.00563401682478 0.00969116551888 0.00448516671654  0.127621910509
16.0 0.789370379072 0.767631108873 1.55063965451  0.0107467878091  0.0179646081854 0.00625731804422  0.166963375365
32.0 0.718628894604  1.08365218957 1.46486083229 0.00825834877876  0.0154641791221  0.0038873537526 0.0866162406828
64.0 0.502202769666  1.19658833745 1.32473447015 0.00947708275941  0.0217408584935 0.00436354465126  0.120069117864
_images/design4_tsallis_sigclip.png

Since there are still many points to fit, the fit qualities have not significantly worsened from lowering the sigma limit. However, the fit parameters have changed:

_images/design4_tsallis_params_sigclip.png

The same basic trend of the fit parameters with increasing lag size can be seen, but the values have changed by a large amount. This is another example of why caution is needed when interpreting the fit standard errors and the reduced \(\chi^2\).

One final parameter can be changed: the number of bins used to create the histogram. For most images, the number of data points will be large, and so the default bin number is set to be the square root of the number of data points. This is a good estimate in the limit of many data points, but will become poor if there are fewer than \(\sim100\) data points from the image. To change the number of bins used, num_bins can be passed to run.
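
For the 128-by-128 pixel example image, this default works out to (a quick check of the stated default, not a call into TurbuStat):

>>> int(np.sqrt(moment0.data.size))  # doctest: +SKIP
128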

Velocity Channel Analysis (VCA)
Overview

A major advantage of a spectral-line data cube, rather than an integrated two-dimensional image, is that it captures aspects of both the density and velocity fluctuations in the field of observation. Lazarian & Pogosyan 2000 and Lazarian & Pogosyan 2004 derived how the power spectrum from a cube depends on the statistics of the density and velocity fields for the 21-cm Hydrogen line, allowing for each of their properties to be examined (provided the data have sufficient spectral resolution).

The Lazarian & Pogosyan theory predicts two regimes based on the power-spectrum slope: the shallow (\(n > -3\)) and the steep (\(n < -3\)) regimes. In the case of optically thick line emission, Lazarian & Pogosyan 2004 show that the slope saturates to \(n = -3\) (see Burkhart et al. 2013 as well). The VCA predictions in these different regimes are shown in Table 1 of Chepurnov & Lazarian 2009 (also see Table 3 in Lazarian 2009). The complementary Velocity Coordinate Spectrum can be used in tandem with VCA.

Using

The data in this tutorial are available here.

We need to import the VCA class, along with a few other common packages:

>>> from turbustat.statistics import VCA
>>> from astropy.io import fits
>>> import astropy.units as u

And we load in the data cube:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

The VCA spectrum is computed using:

>>> vca = VCA(cube)  # doctest: +SKIP
>>> vca.run(verbose=True)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.973
Model:                            OLS   Adj. R-squared:                  0.973
Method:                 Least Squares   F-statistic:                     3188.
Date:                Thu, 20 Jul 2017   Prob (F-statistic):           1.75e-71
Time:                        15:14:32   Log-Likelihood:                -1.2719
No. Observations:                  91   AIC:                             6.544
Df Residuals:                      89   BIC:                             11.57
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.3928      0.058     41.036      0.000       2.277       2.509
x1            -4.2546      0.075    -56.459      0.000      -4.404      -4.105
==============================================================================
Omnibus:                        4.747   Durbin-Watson:                   0.069
Prob(Omnibus):                  0.093   Jarque-Bera (JB):                4.622
Skew:                          -0.550   Prob(JB):                       0.0992
Kurtosis:                       2.916   Cond. No.                         4.40
==============================================================================
_images/design4_vca.png

The code returns a summary of the one-dimensional fit and a figure showing the one-dimensional spectrum and model on the left, and the two-dimensional power-spectrum on the right. If fit_2D=True is set in run (the default setting), the contours on the two-dimensional power-spectrum are the fit using an elliptical power-law model. We will discuss the models in more detail below. The dashed red lines (or contours) on both plots are the limits of the data used in the fits. See the PowerSpectrum tutorial for a discussion of the two-dimensional fitting.

The VCA power-spectrum slope from this simulated data cube is \(-4.25\pm0.08\), which is steeper than the power spectrum we found using the zeroth moment (PowerSpectrum tutorial). However, as was the case for the power-spectrum of the zeroth moment, there are deviations from a single power-law on small scales due to the inertial range in the simulation. The spatial frequencies used in the fit can be limited by setting low_cut and high_cut. The inputs should be spatial frequencies in inverse pixel, angular, or physical units. In this case, we will limit the fitting between frequencies of 0.02 / pix and 0.1 / pix (where the conversion to pixel scales in the simulation is just 1 / freq):

>>> vca.run(verbose=True, xunit=u.pix**-1, low_cut=0.02 / u.pix,
...         high_cut=0.1 / u.pix)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.985
Model:                            OLS   Adj. R-squared:                  0.984
Method:                 Least Squares   F-statistic:                     866.6
Date:                Thu, 20 Jul 2017   Prob (F-statistic):           2.77e-13
Time:                        15:28:29   Log-Likelihood:                 17.850
No. Observations:                  15   AIC:                            -31.70
Df Residuals:                      13   BIC:                            -30.28
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.7695      0.134     28.031      0.000       3.479       4.060
x1            -3.0768      0.105    -29.438      0.000      -3.303      -2.851
==============================================================================
Omnibus:                        1.873   Durbin-Watson:                   2.409
Prob(Omnibus):                  0.392   Jarque-Bera (JB):                1.252
Skew:                          -0.684   Prob(JB):                        0.535
Kurtosis:                       2.641   Cond. No.                         13.5
==============================================================================
_images/design4_vca_limitedfreq.png

With the fit limited to the valid region, we find a shallower slope of \(-3.1\pm0.1\) and a better fit to the model. low_cut and high_cut can also be given as spatial frequencies in angular units (e.g., u.deg**-1). When a distance is given, the low_cut and high_cut can also be given in physical frequency units (e.g., u.pc**-1).

This example has used the default ordinary least-squares fitting. A weighted least-squares can be enabled with weighted_fit=True (this cannot be used for the segmented model described below).

Breaks in the power-law behaviour in observations (and higher-resolution simulations) can result from differences in the physical processes dominating at those scales. To capture this behaviour, VCA can be passed a break point to enable fitting with a segmented linear model (Lm_Seg; see the description given in the PowerSpectrum tutorial). The 2D fitting is disabled for this section as it does not handle fitting break points. In this example, we will assume a distance of 250 pc in order to show the power spectrum in physical units:

>>> vca = VCA(cube, distance=250 * u.pc)  # doctest: +SKIP
>>> vca.run(verbose=True, xunit=u.pc**-1, low_cut=0.02 / u.pix,
...         high_cut=0.4 / u.pix, fit_kwargs=dict(brk=0.1 / u.pix),
...         fit_2D=False)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.998
Model:                            OLS   Adj. R-squared:                  0.998
Method:                 Least Squares   F-statistic:                 1.113e+04
Date:                Thu, 20 Jul 2017   Prob (F-statistic):           2.66e-90
Time:                        16:19:33   Log-Likelihood:                 101.91
No. Observations:                  71   AIC:                            -195.8
Df Residuals:                      67   BIC:                            -186.8
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.6333      0.053     68.784      0.000       3.528       3.739
x1            -3.1814      0.047    -67.916      0.000      -3.275      -3.088
x2            -2.4558      0.094    -26.152      0.000      -2.643      -2.268
x3            -0.0097      0.027     -0.355      0.724      -0.065       0.045
==============================================================================
Omnibus:                        8.205   Durbin-Watson:                   1.148
Prob(Omnibus):                  0.017   Jarque-Bera (JB):                7.707
Skew:                          -0.772   Prob(JB):                       0.0212
Kurtosis:                       3.469   Cond. No.                         20.8
==============================================================================
_images/design4_vca_breakfit.png

By incorporating the break, we find a better quality fit to this portion of the power-spectrum. We also find that the slope before the break (i.e., in the inertial range) is consistent with the slope from the zeroth moment (PowerSpectrum tutorial). The break point was moved significantly from the initial guess, which we had set to the upper limit of the inertial range:

>>> vca.brk  # doctest: +SKIP
<Quantity 0.1624771454997838 1 / pix>
>>> vca.brk_err  # doctest: +SKIP
<Quantity 0.010241094948585336 1 / pix>

From the figure, this is where the curve deviates from the power-law on small scales. With our assigned distance, the break point corresponds to a physical scale of:

>>> vca._physical_size / vca.brk.value  # doctest: +SKIP
<Quantity 0.14082499334584425 pc>

vca._physical_size is the spatial size of one pixel (assuming the spatial dimensions have square pixels in the celestial frame).
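
Since _physical_size is a private attribute, the same conversion can be reproduced from the header, assuming square pixels and the 250 pc distance adopted above (a sketch mirroring the SCF example earlier on this page; it should recover the \(\sim0.14\) pc scale quoted above):

>>> import numpy as np
>>> pix_size = (np.abs(cube.header['CDELT2']) * u.deg).to(u.rad).value * 250 * u.pc  # doctest: +SKIP
>>> pix_size / vca.brk.value  # doctest: +SKIP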

The slope after the break point (x2) in the fit summary is defined relative to the first slope. Its actual value is then the sum of x1 and x2. The slopes and their uncertainties can be accessed through:

>>> vca.slope  # doctest: +SKIP
array([-3.18143757, -5.63724147])
>>> vca.slope_err  # doctest: +SKIP
array([ 0.04684344,  0.104939  ])
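
As a quick check of this convention, the x1 and x2 coefficients from the fit summary above sum to the second slope (a worked check, not TurbuStat output):

>>> round(-3.1814 + (-2.4558), 4)
-5.6372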

The slope above the break point is within the uncertainty of the slope we found in the second example (\(-3.1\pm0.1\)). The uncertainty we find here is nearly half of the previous one since more points have been used in this fit.

The Lazarian & Pogosyan theory predicts that the VCA power-spectrum depends on the size of the velocity slices in the data cube (e.g., Stanimirovic & Lazarian 2001). VCA allows for the velocity channel thickness to be changed with channel_width. This runs a routine that down-samples the spectral dimension to match the given channel width. We can re-run VCA on this data with a channel width of \(\sim 400\) m / s, and compare to the original slope:

>>> vca_thicker = VCA(cube, distance=250 * u.pc,
...                   channel_width=400 * u.m / u.s,
...                   downsample_kwargs=dict(method='downsample'))  # doctest: +SKIP
>>> vca_thicker.run(verbose=True, xunit=u.pc**-1, low_cut=0.02 / u.pix,
...                 high_cut=0.4 / u.pix,
...                 fit_kwargs=dict(brk=0.1 / u.pix), fit_2D=False)  # doctest: +SKIP
                       OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.998
Model:                            OLS   Adj. R-squared:                  0.998
Method:                 Least Squares   F-statistic:                     9739.
Date:                Thu, 20 Jul 2017   Prob (F-statistic):           2.29e-88
Time:                        19:00:25   Log-Likelihood:                 94.310
No. Observations:                  71   AIC:                            -180.6
Df Residuals:                      67   BIC:                            -171.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4422      0.057     25.516      0.000       1.329       1.555
x1            -3.2388      0.051    -64.014      0.000      -3.340      -3.138
x2            -2.8668      0.108    -26.651      0.000      -3.081      -2.652
x3             0.0116      0.030      0.385      0.702      -0.049       0.072
==============================================================================
Omnibus:                        7.262   Durbin-Watson:                   1.043
Prob(Omnibus):                  0.026   Jarque-Bera (JB):                6.646
Skew:                          -0.720   Prob(JB):                       0.0361
Kurtosis:                       3.418   Cond. No.                         20.9
==============================================================================
_images/design4_vca_400ms_channels.png

With the original spectral resolution, the slope in the inertial range was already consistent with the “thickest slice” case, the zeroth moment. The slope here remains consistent with the zeroth moment power-spectrum, so for this data set of \(^{13}{\rm CO}\), there is no evolution in the spectrum with channel size.

An alternative method to change the channel width can be used by specifying downsample_kwargs=dict(method='regrid'). The spectral axis of the cube is smoothed with a Gaussian kernel and down-sampled by interpolating to a new spectral axis with width channel_width (see the spectral-cube documentation).
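
Roughly the same operation can be done directly with spectral-cube. The following is a hand-rolled sketch: the kernel width and new spectral axis here are arbitrary illustrations, not the values TurbuStat derives from channel_width, and it assumes the cube's spectral axis is in m / s:

>>> import numpy as np
>>> from astropy.convolution import Gaussian1DKernel
>>> from spectral_cube import SpectralCube
>>> sc = SpectralCube.read("Design4_flatrho_0021_00_radmc.fits")  # doctest: +SKIP
>>> smoothed = sc.spectral_smooth(Gaussian1DKernel(5))  # doctest: +SKIP
>>> new_axis = np.arange(sc.spectral_axis.min().value,
...                      sc.spectral_axis.max().value,
...                      400) * u.m / u.s  # doctest: +SKIP
>>> regridded = smoothed.spectral_interpolate(new_axis)  # doctest: +SKIP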

Constraints on the azimuthal angles used to compute the one-dimensional power-spectrum can also be given:

>>> vca = VCA(cube)  # doctest: +SKIP
>>> vca.run(verbose=True, xunit=u.pix**-1, low_cut=0.02 / u.pix,
...         high_cut=0.1 / u.pix,
...         radial_pspec_kwargs={"theta_0": 1.13 * u.rad, "delta_theta": 40 * u.deg})  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.958
Model:                            OLS   Adj. R-squared:                  0.955
Method:                 Least Squares   F-statistic:                     298.9
Date:                Fri, 29 Sep 2017   Prob (F-statistic):           2.36e-10
Time:                        14:57:53   Log-Likelihood:                 11.566
No. Observations:                  15   AIC:                            -19.13
Df Residuals:                      13   BIC:                            -17.71
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.2111      0.204     20.597      0.000       3.769       4.653
x1            -2.7475      0.159    -17.290      0.000      -3.091      -2.404
==============================================================================
Omnibus:                       18.967   Durbin-Watson:                   2.608
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               18.398
Skew:                          -1.869   Prob(JB):                     0.000101
Kurtosis:                       6.932   Cond. No.                         13.5
==============================================================================
_images/design4_vca_limitedfreq_azimilimits.png

The azimuthal limits now appear as contours on the two-dimensional power-spectrum in the figure. See the PowerSpectrum tutorial for more information on giving azimuthal constraints.

If strong emission continues to the edge of the map (and the map does not have periodic boundaries), ringing in the FFT can introduce a cross pattern in the 2D power-spectrum. This effect and the use of apodizing kernels to taper the data is covered here.

Most observational data will be smoothed over the beam size, which will steepen the power spectrum on small scales. To account for this, the 2D power spectrum can be divided by the beam response. This is demonstrated here for spatial power-spectra.

Velocity Coordinate Spectrum (VCS)
Overview

The Velocity Coordinate Spectrum (VCS) was presented in the theoretical framework of Lazarian & Pogosyan 2000, and further developed in Lazarian & Pogosyan 2006. The VCS is complementary to the VCA, but rather than integrating over the spectral dimension as in the VCA, the spatial dimensions are integrated over. This results in a 1D power-spectrum whose properties and shape are set by the underlying turbulent velocity and density fields, and the typical velocity dispersion and beam size of the data.

There are two asymptotic regimes of the VCS corresponding to high and low resolution (Lazarian & Pogosyan 2006). The transition between these regimes depends on the spatial resolution (i.e., beam size) of the data, the spectral resolution of the data, and the velocity dispersion. The current VCS implementation in TurbuStat fits a broken linear model to approximate the asymptotic regimes, rather than fitting with the full VCS formalism (Chepurnov et al. 2010, Chepurnov et al. 2015). We assume that the break point lies at the transition point between the regimes and label velocity frequencies smaller than the break as “large-scale” and frequencies larger than the break as “small-scale”.

A summary of the VCS asymptotic regimes is given in Table 3 of Lazarian 2009.

Using

The data in this tutorial are available here.

We need to import the VCS class, along with a few other common packages:

>>> from turbustat.statistics import VCS
>>> from astropy.io import fits
>>> import astropy.units as u

And we load in the data cube:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

The VCS spectrum is computed using:

>>> vcs = VCS(cube)  # doctest: +SKIP
>>> vcs.run(verbose=True)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.909
Model:                            OLS   Adj. R-squared:                  0.908
Method:                 Least Squares   F-statistic:                     815.9
Date:                Fri, 21 Jul 2017   Prob (F-statistic):          3.55e-127
Time:                        16:18:42   Log-Likelihood:                -402.80
No. Observations:                 249   AIC:                             813.6
Df Residuals:                     245   BIC:                             827.7
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -11.4835      0.231    -49.755      0.000     -11.938     -11.029
x1            -9.7426      0.232    -41.915      0.000     -10.200      -9.285
x2             7.4106      2.683      2.762      0.006       2.126      12.696
x3            -0.0399      0.314     -0.127      0.899      -0.658       0.578
==============================================================================
Omnibus:                      133.751   Durbin-Watson:                   0.042
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1872.647
Skew:                          -1.766   Prob(JB):                         0.00
Kurtosis:                      15.962   Cond. No.                         44.9
==============================================================================
_images/design4_vcs.png

By default, the VCS spectrum will be fit to a segmented linear model with one break point (see the Power Spectrum tutorial for a more detailed explanation of the model). If no break points are specified, a spline is used to estimate the location of break points, each of which is tried until a valid fit is found.

With the default settings, the fit is horrendous. This is due to the model's current limitation of fitting with a single break point. For these simulated data, there is no information on velocity scales below the thermal line width. This cube was created from an isothermal simulation, and at a temperature of 10 K, the thermal line width of \(\sim 200\) m / s is oversampled by the 40 m / s channels. We can avoid fitting this region by setting frequency constraints:

>>> vcs.run(verbose=True, high_cut=0.17 / u.pix)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.988
Model:                            OLS   Adj. R-squared:                  0.987
Method:                 Least Squares   F-statistic:                     2140.
Date:                Fri, 21 Jul 2017   Prob (F-statistic):           2.90e-76
Time:                        16:29:33   Log-Likelihood:                -37.805
No. Observations:                  84   AIC:                             83.61
Df Residuals:                      80   BIC:                             93.33
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.8245      0.379      4.819      0.000       1.071       2.578
x1            -1.8412      0.219     -8.404      0.000      -2.277      -1.405
x2           -14.8271      0.412    -35.968      0.000     -15.648     -14.007
x3             0.1035      0.167      0.621      0.536      -0.228       0.435
==============================================================================
Omnibus:                       10.043   Durbin-Watson:                   0.058
Prob(Omnibus):                  0.007   Jarque-Bera (JB):                3.501
Skew:                          -0.116   Prob(JB):                        0.174
Kurtosis:                       2.027   Cond. No.                         21.4
==============================================================================
_images/design4_vcs_lowcut.png

high_cut is set to ignore scales below \(\sim 240\) m / s, just slightly larger than the thermal line width. To see this more clearly, we can create the same plot above in velocity units:

>>> vcs.run(verbose=True, high_cut=0.17 / u.pix,
...         xunit=(u.m / u.s)**-1)  # doctest: +SKIP
_images/design4_vcs_lowcut_physunits.png

The dotted lines, indicating the fitting extents, are now more easily understood. The limit at about \(4 \times 10^{-3} \left({\rm m / s}\right)^{-1}\) corresponds to a scale of \(1 / \left(4 \times 10^{-3}\right) = 250 {\rm\ m / s}\).

This is still not an optimal fit. There are large deviations as the single break-point model tries to interpret the smooth transition at large scales. This flattening at large scales could be from the periodic box condition in the simulation: there is effectively a maximum size cut-off at the box size beyond which there is no additional energy. For the next example, assume that this is indeed the case and that we can remove this region from the fit:

>>> vcs.run(verbose=True, high_cut=0.17 / u.pix, low_cut=6e-4 / (u.m / u.s),
...         xunit=(u.m / u.s)**-1)   # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.996
Model:                            OLS   Adj. R-squared:                  0.996
Method:                 Least Squares   F-statistic:                     5443.
Date:                Fri, 21 Jul 2017   Prob (F-statistic):           6.70e-81
Time:                        17:10:57   Log-Likelihood:                 15.889
No. Observations:                  72   AIC:                            -23.78
Df Residuals:                      68   BIC:                            -14.67
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -8.8409      0.275    -32.183      0.000      -9.389      -8.293
x1            -9.1948      0.217    -42.371      0.000      -9.628      -8.762
x2           -12.3859      0.488    -25.404      0.000     -13.359     -11.413
x3            -0.0062      0.093     -0.067      0.947      -0.191       0.179
==============================================================================
Omnibus:                        6.011   Durbin-Watson:                   0.067
Prob(Omnibus):                  0.050   Jarque-Bera (JB):                5.617
Skew:                          -0.476   Prob(JB):                       0.0603
Kurtosis:                       3.983   Cond. No.                         34.7
==============================================================================
_images/design4_vcs_bothcut_physunits.png

This appears to be a better fit! Also, note that the low_cut and high_cut parameters can be given in pixel or spectral frequency units. We estimated low_cut from the previous example, where the plot was already in spectral frequency units.

Based on the power spectrum slope of \(-3.2\pm0.1\) that we found using the zeroth moment map (Power Spectrum tutorial), these data are in the steep regime, where density fluctuations do not dominate at any spectral scale. Using the asymptotic case from Fig. 2 in Lazarian & Pogosyan 2006, the slopes should be close to \(-6 / m\) at small scales and \(-2 / m\) at large scales, where \(m\) is the index of the velocity field. The second slope in the fit summary (x2) is defined relative to the first slope (x1). The true slopes can be accessed through:

>>> vcs.slope  # doctest: +SKIP
array([ -9.19479557, -21.58069847])
>>> vcs.slope_err  # doctest: +SKIP
array([ 0.21700618,  0.53366172])

Since, in this regime, both components only rely on the velocity field, they should both give a consistent estimate of \(m\):

>>> -2 / vcs.slope[0]  # doctest: +SKIP
0.21751435186363388
>>> - 6 / vcs.slope[1]  # doctest: +SKIP
0.2780262190776282

Each component does give a similar estimate for \(m\). An additional issue with the simulated data is how the inertial range should be handled: the slope at smaller scales will certainly be steepened if portions of the fitted range lie outside the spatial inertial range.

While we find a good fit to the data, the VCS transition between the two regimes is smoothed over. This reflects a breakdown of the asymptotic-regime assumption and of the simplified segmented linear model used here. The model presented in Chepurnov et al. 2010 and Chepurnov et al. 2015, which accounts for a smooth transition over the entire spectrum, would be a more effective and useful choice. This model will be included in a future release of TurbuStat.

Wavelets
Overview

A wavelet transform can be used to filter structure on certain scales, where the scale is typically set by the size and choice of the wavelet kernel. By measuring the amount of structure with kernels of different sizes, the amount of structure on different scales can be calculated. This makes the technique similar to the power spectrum, but the power at a given scale is calculated in the image domain. This approach was introduced for use on astrophysical turbulence by Gill & Henriksen 1990, who used a Mexican-Hat (or Ricker) wavelet for the transform and took the sum of positive values in each filtered map to produce a one-dimensional spectrum of the amount of structure as a function of scale.

This technique has many similarities to the Delta-Variance (see the comparison in Zielinsky & Stutzki 1999). Both create a set of filtered maps at different wavelet scales, though the Delta-Variance splits the wavelet kernel into separate components and normalizes by a weight map to reduce edge effects. From the filtered maps, the Delta-Variance measures the variance across the entire map to estimate the amount of structure, while the wavelet transform takes the sum of the positive values to represent the amount of structure. For both methods, the slope of the one-dimensional spectrum is the desired measurement.

Using

The data in this tutorial are available here.

We need to import the Wavelet class, along with a few other common packages:

>>> import numpy as np
>>> from turbustat.statistics import Wavelet
>>> from astropy.io import fits
>>> import astropy.units as u

And we load in the data:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

The default wavelet transform of the zeroth moment is calculated as:

>>> wavelet = Wavelet(moment0).run(verbose=True)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.954
Model:                            OLS   Adj. R-squared:                  0.953
Method:                 Least Squares   F-statistic:                     1001.
Date:                Tue, 01 Aug 2017   Prob (F-statistic):           8.36e-34
Time:                        18:07:44   Log-Likelihood:                 97.550
No. Observations:                  50   AIC:                            -191.1
Df Residuals:                      48   BIC:                            -187.3
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4175      0.012    119.360      0.000       1.394       1.441
x1             0.3366      0.011     31.635      0.000       0.315       0.358
==============================================================================
Omnibus:                        4.443   Durbin-Watson:                   0.048
Prob(Omnibus):                  0.108   Jarque-Bera (JB):                2.578
Skew:                          -0.334   Prob(JB):                        0.276
Kurtosis:                       2.110   Cond. No.                         4.60
==============================================================================
_images/design4_wavelet.png

The results of the fit and a plot overlaying the fit on the transform are shown with verbose=True. The figure shows that the transform (blue diamonds) does not follow a single power-law relation across all of the scales, and the resulting fit (dashed blue line) is poor. The solid blue lines indicate the range of scales used in the fit. For these simulated data, scales larger than about 25 pixels are affected by the edges of the map in the convolution. The flattening at slightly smaller scales describes actual features in the data and may be a manifestation of the periodic box conditions; we see a similar feature in these data with the Delta-Variance as well. Unlike the Delta-Variance, however, the deviation from a power-law is more pronounced on scales larger than about 10 pixels. To improve the fit, we can limit the fitted region to scales below this:

>>> wavelet = Wavelet(moment0)  # doctest: +SKIP
>>> wavelet.run(verbose=True, xlow=1 * u.pix, xhigh=10 * u.pix)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.992
Model:                            OLS   Adj. R-squared:                  0.991
Method:                 Least Squares   F-statistic:                     2758.
Date:                Wed, 02 Aug 2017   Prob (F-statistic):           1.86e-25
Time:                        14:05:44   Log-Likelihood:                 78.364
No. Observations:                  25   AIC:                            -152.7
Df Residuals:                      23   BIC:                            -150.3
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.3279      0.006    215.931      0.000       1.315       1.341
x1             0.4946      0.009     52.516      0.000       0.475       0.514
==============================================================================
Omnibus:                        4.021   Durbin-Watson:                   0.122
Prob(Omnibus):                  0.134   Jarque-Bera (JB):                3.476
Skew:                          -0.888   Prob(JB):                        0.176
Kurtosis:                       2.572   Cond. No.                         5.95
==============================================================================
_images/design4_wavelet_fitlimits.png

This has significantly improved the fit, and the slope of the power-law is closer to the value found from the Delta-Variance transform. The wavelet transform slope is half of the Delta-Variance slope:

>>> wavelet.slope * 2  # doctest: +SKIP
0.98916576820595215
>>> wavelet.slope_err *2  # doctest: +SKIP
0.018835675570973334

The wavelet transform gives an index of \(0.99 \pm 0.02\), while the Delta-Variance has a slope of \(1.06 \pm 0.02\) fit over a similar range. While limiting the fit gives a result consistent with other methods, the differences in the shape of the spectra may contain useful information and should be interpreted carefully.

These examples have used the default scales to calculate the wavelet transforms. The default scales, in pixel units, vary from 1.5 pixels to half of the smallest image dimension and are spaced equally in logarithmic space. The number of scales to test defaults to 50; this can be changed by giving the num keyword to Wavelet. Alternatively, a custom set of scales can be given. The scales can also be given in angular units, or in physical units when a distance is provided. This can be useful for comparing different datasets at a common scale. For example, assume that this simulated dataset lies at a distance of 250 pc:

>>> phys_scales = np.arange(0.025, 0.5, 0.05) * u.pc
>>> wavelet = Wavelet(moment0, distance=250 * u.pc, scales=phys_scales)  # doctest: +SKIP
>>> wavelet.run(verbose=True, xlow=1 * u.pix, xhigh=10 * u.pix, xunit=u.pc)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.983
Model:                            OLS   Adj. R-squared:                  0.977
Method:                 Least Squares   F-statistic:                     173.6
Date:                Wed, 02 Aug 2017   Prob (F-statistic):           0.000944
Time:                        14:43:07   Log-Likelihood:                 11.334
No. Observations:                   5   AIC:                            -18.67
Df Residuals:                       3   BIC:                            -19.45
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.2668      0.031     41.159      0.000       1.169       1.365
x1             0.5649      0.043     13.178      0.001       0.428       0.701
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.633
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.461
Skew:                           0.166   Prob(JB):                        0.794
Kurtosis:                       1.549   Cond. No.                         4.25
==============================================================================
_images/design4_wavelet_physunits.png

We find a similar slope using the same fit region as the previous example, though with more uncertainty since only 5 of the given scales fall within the fit region. Note that the plot now shows the scales in parsecs. The output unit used in the plot can be changed by specifying xunit, and different units can likewise be used for xlow and xhigh.
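For instance, the fit limits can be given directly in physical units (a minimal sketch; the exact values below are only illustrative and should be matched to the pixel scale of the data):

>>> wavelet.run(verbose=True, xlow=0.05 * u.pc, xhigh=0.2 * u.pc,
...             xunit=u.pc)  # doctest: +SKIP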

Finally, we note a difference between the TurbuStat implementation of the wavelet transform and the one described in Gill & Henriksen 1990. Their definition of the Mexican-Hat wavelet in their Section 2 is an unnormalized form of the kernel, which leads to a slope that is larger by \(+2\) than for the normalized version used here. We use the Mexican-Hat implementation from the astropy.convolution package, which has the correct \(1/\left(\pi \sigma^4\right)\) normalization coefficient for the wavelet transform.

The \(+2\) discrepancy can be explained by thinking of the Mexican-Hat kernel as the negative of the Laplacian of a Gaussian. A normalized Gaussian has a normalization constant of \(1/\left(2 \pi \sigma^2\right)\), or units of \(1/{\rm length}^2\), but has a constant peak for all \(\sigma\). In order to make the Laplacian also have a constant peak, referred to as a scale-normalized derivative in image processing, we need to multiply the Mexican-Hat by a factor of \(\sigma^2\) at each scale. Combined with the normalization coefficient of \(1/\left(\pi \sigma^4\right)\), this restores the \(1/{\rm length}^2\) units of a Gaussian (Lindeberg 1994). In order to reproduce the unnormalized version of Gill & Henriksen 1990, we need to multiply the kernel by \(\sigma^4\). To reproduce their results, we have included the scale_normalization keyword to disable the correct normalization:

>>> wavelet = Wavelet(moment0)  # doctest: +SKIP
>>> wavelet.run(verbose=True, scale_normalization=False,
...             xhigh=10 * u.pix)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 7.016e+04
Date:                Wed, 02 Aug 2017   Prob (F-statistic):           1.40e-41
Time:                        15:10:40   Log-Likelihood:                 78.364
No. Observations:                  25   AIC:                            -152.7
Df Residuals:                      23   BIC:                            -150.3
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.3279      0.006    215.931      0.000       1.315       1.341
x1             2.4946      0.009    264.879      0.000       2.475       2.514
==============================================================================
Omnibus:                        4.021   Durbin-Watson:                   0.122
Prob(Omnibus):                  0.134   Jarque-Bera (JB):                3.476
Skew:                          -0.888   Prob(JB):                        0.176
Kurtosis:                       2.572   Cond. No.                         5.95
==============================================================================
_images/design4_wavelet_unnorm.png

The unnormalized transform appears to follow a power-law relation over all of the scales, and when limited to the same fitting region, the fit appears to be much better. This is deceiving, however, because the extra factors of \(\sigma\) increase the correlation between the x and y variables in the fit! This effectively gives a slope of \(+2\) for free, regardless of the data. Further, it means that the fit statistics are no longer valid, since the x values have been used to rescale the y values and the two are no longer independent.

Warning

We do not recommend using the unnormalized form: it inflates the quality of the fit, hides deviations that may be physically relevant, and provides no additional information or improvements.

Distance Metrics

This section describes the distance metrics defined in Koch et al. 2017 for comparing two data sets using an output of the statistics listed above. It is important to note that few of these distance metrics are defined to be absolute. Rather, most of the metrics give relative distances and are defined only when comparing with a common fiducial image.

As shown in Koch et al. 2017, the distance metrics for some statistics have more scatter than others. Some metrics also suffer from systematic issues and should be avoided when those systematics cannot be controlled for. The Cramer distance metric is an example of this; its shortcomings are described in the paper linked above, and while the implementation is still available, we recommend caution when using it.

A distance metric for Tsallis statistics has not been explored and is not currently available in this release.

Using distance metrics

Distance metrics run a statistic on two datasets and use an output of the statistic to quantitatively determine how similar the datasets are. Like the statistic classes, the distance metrics are also python classes.

Warning

Using the distance metrics requires some understanding of how the statistics are computed. Please read the introduction to using statistics page.

There is less structure in the distance metric classes compared to the statistic classes. There are two steps to using the distance metrics:

  1. Initialization – As with the statistics, unchanging information, including the two datasets, is given at this step. However, the statistics are also run from this step, so properties that would normally be given to the run command of a statistic should be given here as well.

Note

The distance metrics do not always use the full information from a statistic. In these cases, there will be fewer arguments to specify than the statistic’s run function.

For the Wavelet_Distance, the distance metric is initialized using:

>>> from turbustat.statistics import Wavelet_Distance
>>> from astropy.io import fits
>>> hdu = fits.open("file.fits")[0]  # doctest: +SKIP
>>> hdu2 = fits.open("file2.fits")[0]  # doctest: +SKIP
>>> wave_dist = Wavelet_Distance(hdu, hdu2, xlow=[5, 7] * u.pix,
...                              xhigh=[30, 35] * u.pix)  # doctest: +SKIP

The two datasets are given first. For the wavelet transform, the distance is the absolute difference between the slopes, normalized by the slope uncertainties added in quadrature. It is therefore important to set the scales over which the transform is fit. Different limits can be given for the two datasets, as shown above, by specifying xlow and xhigh with two values each. If only a single value is given, it will be used for both datasets. Most parameters that can be specified for one or both of the datasets use this format.

Alternatively, if the wavelet transform has already been computed for one or both of the datasets, the Wavelet can be passed instead of the dataset:

>>> from turbustat.statistics import Wavelet, Wavelet_Distance
>>> wave1 = Wavelet(hdu).run()  # doctest: +SKIP
>>> wave2 = Wavelet(hdu2).run()  # doctest: +SKIP
>>> wave_dist = Wavelet_Distance(wave1, wave2)  # doctest: +SKIP

Some distance metrics require that the statistic be computed with a common set of bins or spatial scales. If a Wavelet is given that does not satisfy these criteria, it will be re-run within Wavelet_Distance. The criteria for avoiding re-computation of a statistic are specified in the Notes of the distance metric docstrings. See the source code documentation on this site.

  2. The second step is to compute the distance metric. In nearly all cases, computing the distance metric is much faster than computing the statistics. All of the distance metric classes have a distance_metric function:

    >>> wave_dist.distance_metric(verbose=True)  # doctest: +SKIP
    >>> wave_dist.distance  # doctest: +SKIP
    

The distance is usually an attribute called distance. Different names are used when multiple distance metrics are defined. For example, PDF_Distance has hellinger_distance, ks_distance, and lognormal_distance. See the source code documentation for specifics on each distance metric class. When multiple distance metrics are defined, there are often separate functions to compute each metric; the distance_metric function will usually run all of them.

With verbose=True in distance_metric, a summary plot showing both datasets will be returned. Usually these plots call the plotting function from the statistic classes. Labels to show in the legend for the two datasets can be given.
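For example, custom legend labels can be passed in the same call (a minimal sketch using the wavelet example from above):

>>> wave_dist.distance_metric(verbose=True, label1='Fiducial',
...                           label2='Design 4')  # doctest: +SKIP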

Note

The one metric that differs from the rest is the Cramer_Distance. This metric does not have a statistic class because it is a two-sample statistical test. Its structure is the same as the rest of the distance metrics, but it requires more steps to compute.

Bispectrum Distance

See the tutorial for a description of the bispectrum.

The distance metric for the bispectrum is Bispectrum_Distance. There are two definitions of a distance:

  1. The surface distance is the L2 norm between the bicoherence surfaces of two 2D images:
    \[d_{\rm surface} = ||b_1(k_1, k_2) - b_2(k_1, k_2)||^2\]

The \(k_1,\,k_2\) are wavenumbers and \(b_1,\,b_2\) are the bicoherence arrays for each image.

Warning

The images must have the same shape to use this definition of distance.

  2. The mean distance is the absolute difference between the mean bicoherence of two 2D images:
    \[d_{\rm mean} = \left|\overline{b_1(k_1, k_2)} - \overline{b_2(k_1, k_2)}\right|\]

This distance metric can be used with images of different shapes.

The bicoherence surface is used for the distance metric because it is normalized between 0 and 1, and this normalization removes the effect of the mean of the images (meansub in the bispectrum tutorial).

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the Bispectrum_Distance class, along with a few other common packages:

>>> from turbustat.statistics import Bispectrum_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

And we load in the two data sets; in this case, two integrated intensity (zeroth moment) maps:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

The images are passed to the Bispectrum_Distance class. Keyword arguments to run can also be given as the dictionary stat_kwargs. In this case, we will increase the number of random samples used to estimate the bispectrum:

>>> bispec = Bispectrum_Distance(moment0_fid, moment0,
...                              stat_kwargs={'nsamples': 10000})  # doctest: +SKIP

This call executes run for both images and may take a few minutes (reduce the number of samples to speed things up). Within Bispectrum_Distance is a Bispectrum for each image: bispec1 and bispec2. Each of these class instances can be run separately, as shown in the bispectrum tutorial, to fine-tune or alter how the bispectrum is computed.
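For example, the bispectrum of the first image can be re-run on its own (a minimal sketch; nsamples is the only keyword shown since it is the one used above):

>>> bispec.bispec1.run(nsamples=25000)  # doctest: +SKIP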

Once the bispectra are computed, we can calculate the distances between the two bispectra and create a summary plot with:

>>> bispec.distance_metric(verbose=True)  # doctest: +SKIP
_images/bispectrum_distmet.png

The bicoherence surfaces of both images are shown. By default, the plots are labelled with “1” and “2” in the order the data were given to Bispectrum_Distance. Custom labels can be set by setting label1 and label2 in the distance metric call.

The distances between these images are:

>>> bispec.surface_distance  # doctest: +SKIP
3.169320958026329

Since these images have equal shapes, surface_distance is defined. If the images do not have equal shapes, the distance will be NaN.

>>> bispec.mean_distance  # doctest: +SKIP
0.009241794373870557

Warning

Caution must be used when passing a pre-computed Bispectrum instead of the data for moment0 or moment0_fid as there are no checks to ensure the bispectra were computed the same way (e.g., do both have mean_sub=True set?). Ensure that the keyword arguments for the pre-computed statistic match those specified to Bispectrum_Distance. See the distance metric introduction.
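As a sketch of this workflow, a Bispectrum can be computed ahead of time and passed in place of the data, keeping its keyword arguments consistent with those given to Bispectrum_Distance (the Bispectrum call signature is assumed here to mirror the other statistic classes):

>>> from turbustat.statistics import Bispectrum
>>> bispec_fid = Bispectrum(moment0_fid)  # doctest: +SKIP
>>> bispec_fid.run(nsamples=10000)  # doctest: +SKIP
>>> bispec = Bispectrum_Distance(bispec_fid, moment0,
...                              stat_kwargs={'nsamples': 10000})  # doctest: +SKIP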

Cramer Distance

The Cramer statistic was introduced by Baringhaus & Franz (2004) for multivariate two-sample testing. The statistic is defined as the Euclidean distances between points in the two data sets, minus half of the distances measured within each data set.

Yeremi et al. 2015 applied this to position-position-velocity data cubes by selecting a sample of the brightest pixels in each spectral channel to reduce the cube to a 2D data matrix. It was also tested in Koch et al. 2017, and the definition used in TurbuStat can be found there.

Warning

Koch et al. 2017 find that this test is unsuitable for comparing data cubes that have a large difference in their mean intensities. When using this metric, be sure that the intensity distributions have similar mean intensities or apply some normalization prior to running the metric. Be cautious when interpreting these results and ensure that the distances are compared to a well-understood fiducial.

Using

The data in this tutorial are available here.

We need to import the Cramer_Distance class, along with a few other common packages:

>>> from turbustat.statistics import Cramer_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

And we load in the two data sets. The Cramer statistic needs two cubes:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> cube_fid = fits.open("Fiducial0_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

Cramer_Distance takes the two cubes as inputs. Minimum intensity values for the statistic to consider can be specified with noise_value1 and noise_value2.

>>> cramer = Cramer_Distance(cube_fid, cube, noise_value1=-np.inf,
...                          noise_value2=-np.inf)  # doctest: +SKIP

Note that, since the Cramer statistic defaults to using the upper 20% of the values in each spectral channel, there may not be large differences in the distance when the noise values are low.

The 2D data matrices and the Cramer statistic can now be calculated with:

>>> cramer.distance_metric(normalize=True, n_jobs=1, verbose=True)  # doctest: +SKIP
_images/cramer_distmet.png

Setting verbose=True creates this figure, showing the data matrix for each data cube. The x-axis is the spectral channel and the y-axis shows the largest pixel values in that channel, ordered with the largest at the bottom. Custom labels can be set with label1 and label2 in the distance metric call above.

The argument n_jobs sets how many cores to use when calculating pairwise distances with the sklearn paired_distances function. This is the slowest step in computing the Cramer statistic; see format_data for more information.

The distance between the data cubes is:

>>> cramer.distance  # doctest: +SKIP
0.18175851051788378

distance_metric performs two steps: format_data to find the 2D data matrix for each cube, and cramer_statistic to calculate the distance. These steps can be run separately to allow for changes in the keyword arguments of both.
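A minimal sketch of running the two steps separately (the keyword arguments for each step are left at their defaults here):

>>> cramer = Cramer_Distance(cube_fid, cube)  # doctest: +SKIP
>>> cramer.format_data()  # doctest: +SKIP
>>> cramer.cramer_statistic()  # doctest: +SKIP
>>> cramer.distance  # doctest: +SKIP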

Delta-Variance Distance

See the tutorial for a description of Delta-Variance.

The distance metric for Delta-Variance is DeltaVariance_Distance. There are two definitions of a distance:

  1. The curve distance is the L2 norm between the delta-variance curves normalized by the sum of each curve:
    \[d_{curve} = \left|\left|\frac{\sigma_{\Delta,1}^2 (\ell)}{\sum_i \sigma_{\Delta,1}^2 (\ell)} - \frac{\sigma_{\Delta,2}^2 (\ell)}{\sum_i \sigma_{\Delta,2}^2 (\ell)}\right|\right|\]

    \(\sigma_{\Delta,i}\) are the delta-variance values at lag \(\ell\).

    This is a non-parametric attempt to describe the entire delta-variance curve, including regions that are not well fit by a power-law model.

    Warning

    This distance requires the delta-variance to be measured at the same lags in angular units. This is described further below.

  2. The slope distance is the t-statistic of the difference in the fitted slopes:
    \[d_{\rm slope} = \frac{|\beta_1 - \beta_2|}{\sqrt{\sigma_{\beta_1}^2 + \sigma_{\beta_2}^2}}\]

    \(\beta_i\) are the slopes of the delta-variance curves and \(\sigma_{\beta_i}\) are the uncertainty of the slopes.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the DeltaVariance_Distance class, along with a few other common packages:

>>> from turbustat.statistics import DeltaVariance_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

And we load in the two data sets; in this case, two integrated intensity (zeroth moment) maps:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

The error maps are saved in the second extension of these FITS files. These can be used as weights for the Delta-Variance:

>>> moment0_err = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[1]  # doctest: +SKIP
>>> moment0_fid_err = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[1]  # doctest: +SKIP

The images (and optionally the error maps) are passed to the DeltaVariance_Distance class:

>>> delvar = DeltaVariance_Distance(moment0_fid, moment0, weights1=moment0_err,
...                                weights2=moment0_fid_err)  # doctest: +SKIP
>>> delvar.distance_metric(verbose=True, xunit=u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.688
Model:                            WLS   Adj. R-squared:                  0.674
Method:                 Least Squares   F-statistic:                     11.70
Date:                Thu, 01 Nov 2018   Prob (F-statistic):            0.00234
Time:                        13:32:37   Log-Likelihood:                 4.9953
No. Observations:                  25   AIC:                            -5.991
Df Residuals:                      23   BIC:                            -3.553
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.5660      0.155     16.602      0.000       2.263       2.869
x1             0.6353      0.186      3.421      0.001       0.271       0.999
==============================================================================
Omnibus:                        3.845   Durbin-Watson:                   0.313
Prob(Omnibus):                  0.146   Jarque-Bera (JB):                3.114
Skew:                          -0.858   Prob(JB):                        0.211
Kurtosis:                       2.784   Cond. No.                         7.05
==============================================================================
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.956
Model:                            WLS   Adj. R-squared:                  0.954
Method:                 Least Squares   F-statistic:                     66.34
Date:                Thu, 01 Nov 2018   Prob (F-statistic):           3.15e-08
Time:                        13:32:37   Log-Likelihood:                 14.779
No. Observations:                  25   AIC:                            -25.56
Df Residuals:                      23   BIC:                            -23.12
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.6490      0.118     14.001      0.000       1.418       1.880
x1             1.3072      0.160      8.145      0.000       0.993       1.622
==============================================================================
Omnibus:                        0.251   Durbin-Watson:                   0.559
Prob(Omnibus):                  0.882   Jarque-Bera (JB):                0.394
Skew:                           0.195   Prob(JB):                        0.821
Kurtosis:                       2.523   Cond. No.                         10.8
==============================================================================
_images/delvar_distmet.png

A summary of the fits is printed along with a plot of the two delta-variance curves and the fit residuals when verbose=True. Custom labels can be set by setting label1 and label2 in the distance metric call.

The distances between these two datasets are:

>>> delvar.curve_distance  # doctest: +SKIP
0.8374744762224977
>>> delvar.slope_distance  # doctest: +SKIP
2.737516700717662

In this case, the default settings were used and all portions of the delta-variance curves were used in the fit, yielding poor fits. Settings can be passed to run by specifying inputs to delvar_kwargs. For example, we will now limit the Delta-Variance fitting to between 4 and 10 pixel lags:

>>>  delvar_fit = DeltaVariance_Distance(moment0_fid, moment0, weights1=moment0_err,
...                                      weights2=moment0_fid_err,
...                                      delvar_kwargs={'xlow': 4 * u.pix,
...                                                     'xhigh': 10 * u.pix})  # doctest: +SKIP
>>> delvar_fit.distance_metric(verbose=True, xunit=u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.985
Model:                            WLS   Adj. R-squared:                  0.982
Method:                 Least Squares   F-statistic:                     77.59
Date:                Thu, 01 Nov 2018   Prob (F-statistic):           0.000313
Time:                        13:32:37   Log-Likelihood:                 20.519
No. Observations:                   7   AIC:                            -37.04
Df Residuals:                       5   BIC:                            -37.15
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.4484      0.087     28.186      0.000       2.278       2.619
x1             0.9605      0.109      8.809      0.000       0.747       1.174
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.931
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.657
Skew:                          -0.378   Prob(JB):                        0.720
Kurtosis:                       1.704   Cond. No.                         16.7
==============================================================================
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.995
Model:                            WLS   Adj. R-squared:                  0.994
Method:                 Least Squares   F-statistic:                     206.3
Date:                Thu, 01 Nov 2018   Prob (F-statistic):           2.95e-05
Time:                        13:32:37   Log-Likelihood:                 23.185
No. Observations:                   7   AIC:                            -42.37
Df Residuals:                       5   BIC:                            -42.48
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.7989      0.065     27.823      0.000       1.672       1.926
x1             1.1402      0.079     14.363      0.000       0.985       1.296
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   1.289
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.654
Skew:                           0.062   Prob(JB):                        0.721
Kurtosis:                       1.507   Cond. No.                         16.6
==============================================================================
_images/delvar_distmet_fitlimits.png

The fits are improved, particularly for the first data set (moment0_fid), with the limits specified. Both of the distances change: the slope distance because of the improved fits, and the curve distance because the comparison is now limited to the fit range:

>>> delvar_fit.curve_distance  # doctest: +SKIP
0.06769078224562503
>>> delvar_fit.slope_distance  # doctest: +SKIP
1.3324272202721044

What if you want to set different limits for the two datasets? Or how can you handle datasets with different boundary conditions in the convolution (e.g., observations vs. simulated observations)? A second set of kwargs can be given with delvar2_kwargs, which specifies the parameters for the second dataset. For example, to pass a different set of fit limits for the second dataset (moment0):

>>>  delvar_fitdiff = DeltaVariance_Distance(moment0_fid, moment0, weights1=moment0_err,
...                                          weights2=moment0_fid_err,
...                                          delvar_kwargs={'xlow': 4 * u.pix,
...                                                         'xhigh': 10 * u.pix},
...                                          delvar2_kwargs={'xlow': 6 * u.pix,
...                                                          'xhigh': 20 * u.pix})  # doctest: +SKIP
>>> delvar_fitdiff.distance_metric(verbose=True, xunit=u.pix)  # doctest: +SKIP
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.985
Model:                            WLS   Adj. R-squared:                  0.982
Method:                 Least Squares   F-statistic:                     77.59
Date:                Thu, 01 Nov 2018   Prob (F-statistic):           0.000313
Time:                        13:32:38   Log-Likelihood:                 20.519
No. Observations:                   7   AIC:                            -37.04
Df Residuals:                       5   BIC:                            -37.15
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.4484      0.087     28.186      0.000       2.278       2.619
x1             0.9605      0.109      8.809      0.000       0.747       1.174
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   0.931
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.657
Skew:                          -0.378   Prob(JB):                        0.720
Kurtosis:                       1.704   Cond. No.                         16.7
==============================================================================
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            WLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 1.084e+04
Date:                Thu, 01 Nov 2018   Prob (F-statistic):           1.99e-12
Time:                        13:32:38   Log-Likelihood:                 36.872
No. Observations:                   9   AIC:                            -69.74
Df Residuals:                       7   BIC:                            -69.35
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.8957      0.010    198.429      0.000       1.877       1.914
x1             1.0257      0.010    104.110      0.000       1.006       1.045
==============================================================================
Omnibus:                        0.438   Durbin-Watson:                   3.074
Prob(Omnibus):                  0.803   Jarque-Bera (JB):                0.436
Skew:                           0.381   Prob(JB):                        0.804
Kurtosis:                       2.237   Cond. No.                         16.6
==============================================================================
_images/delvar_distmet_fitlimits_diff.png

The fit limits, shown with the dot-dashed vertical lines in the plot, differ between the datasets. This will change the slope distance:

>>> delvar_fitdiff.slope_distance  # doctest: +SKIP
0.5956856398497301

But the curve distance is no longer defined:

>>> delvar_fitdiff.curve_distance  # doctest: +SKIP
nan

The curve distance is only valid when the delta-variance curves are compared over the same set of lags. Having different fit limits violates this condition, and the distance is returned as NaN.

The curve distance will also be undefined if different sets of lags are used for the datasets. By default, use_common_lags=True is used in DeltaVariance_Distance, which will find a common set of scales in angular units between the two datasets.

For further fine-tuning of either dataset, the DeltaVariance class for each can be accessed as delvar1 and delvar2. Each of these class instances can be run separately, as shown in the delta-variance tutorial, to alter how the delta-variance is computed.
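For example, the delta-variance for the first dataset can be re-run on its own (a minimal sketch using the fit-limit keywords from above):

>>> delvar.delvar1.run(xlow=4 * u.pix, xhigh=10 * u.pix)  # doctest: +SKIP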

Dendrogram Distance

See the tutorial for a description of the dendrogram statistics.

Warning

Requires the optional astrodendro package to be installed. See the documentation.

Using the two comparisons defined by Burkhart et al. 2013, Dendrogram_Distance provides two distance metrics:

  1. The distance between histograms of peak intensity in the leaves of the dendrogram, measured over a range of minimum branch heights, is:

    \[d_{\mathrm{Hist}} = \left[\sum H(p_{1,\delta_I},p_{2,\delta_I})\right]/N_\delta\]

    \(p_{i,\delta_I}\) are the histograms with minimum branch height of \(\delta_I\), \(H(i, j)\) is the Hellinger distance, and \(N_{\delta}\) is the number of branch heights (and histograms) that the dendrogram was computed for.

  2. The slopes of the linear relation fit to the log of the number of features in the tree as a function of minimum branch height:

    \[d_{\rm slope} = \frac{|\beta_1 - \beta_2|}{\sqrt{\sigma_{\beta_1}^2 + \sigma_{\beta_2}^2}}\]

    \(\beta_i\) are the slopes of the fitted lines and \(\sigma_{\beta_i}\) are the uncertainty of the slopes.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the Dendrogram_Distance class, along with a few other common packages:

>>> from turbustat.statistics import Dendrogram_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

And we load in the two data sets. Dendrogram_Distance can be given two 2D images or cubes. For this example, we will use two cubes:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> cube_fid = fits.open("Fiducial0_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

Dendrogram_Distance requires the two datasets to be given. A number of other parameters can be specified to control the dendrogram settings or fitting settings. This example sets the minimum deltas (branch height) for the dendrograms, as explained in the dendrogram tutorial. Other dendrogram settings, such as the minimum pixel intensity to use and the minimum number of pixels per structure, are also set.

>>> dend_dist = Dendrogram_Distance(cube_fid, cube,
...                                 min_deltas=np.logspace(-2, 0, 50),
...                                 min_features=100,
...                                 dendro_params={"min_value": 0.005,
...                                                "min_npix": 50})  # doctest: +SKIP

min_features sets a threshold on the number of “features” a dendrogram must contain for it to be included in the distance calculation. “Features” here means the number of branches and leaves in the dendrogram. As delta is increased, the number of features drops significantly, with large values leaving only a few features in the dendrogram. min_features ensures a meaningful histogram can be measured from the dendrogram properties.

If additional parameters need to be set to create the dendrograms, dendro_kwargs takes a dictionary as input and passes the arguments to run. Separate settings can be given for each dataset by specifying both dendro_kwargs and dendro2_kwargs. The individual Dendrogram_Stats objects can also be accessed as dendro1 and dendro2 (see the dendrogram tutorial for more information).
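A minimal sketch of passing run keywords through dendro_kwargs and dendro2_kwargs, and then accessing the individual dendrograms (verbose is used only as a stand-in for whichever run keywords are needed):

>>> dend_dist = Dendrogram_Distance(cube_fid, cube,
...                                 min_deltas=np.logspace(-2, 0, 50),
...                                 dendro_params={"min_value": 0.005,
...                                                "min_npix": 50},
...                                 dendro_kwargs=dict(verbose=False),
...                                 dendro2_kwargs=dict(verbose=False))  # doctest: +SKIP
>>> dend_dist.dendro1  # doctest: +SKIP
>>> dend_dist.dendro2  # doctest: +SKIP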

To calculate the two dendrogram distances, we run:

>>> dend_dist.distance_metric(verbose=True)  # doctest: +SKIP

The distance computation is very fast for both methods so both distance metrics are always computed.

Verbose mode creates two plots, which can be saved by specifying save_name in the call above. The first plot shows the histograms used in the Hellinger distance.

_images/dendrogram_distmet.hist_distance.png

The top two panels are the ECDFs of the histograms of peak intensity within features (branches or leaves) of the dendrogram. The histograms themselves are shown in the bottom two panels. The first dataset is shown in the first column of plots and the second in the second column. Note that the intensity values are standardized in all plots. Several curves/histograms are shown in each plot; each one corresponds to a dendrogram computed with a different minimum delta (branch height).

The histogram distance is:

>>> dend_dist.histogram_distance  # doctest: +SKIP
0.14298381514818145

The second plot shows the log of the number of features (branches + leaves) in a dendrogram as a function of log delta (minimum branch height):

_images/dendrogram_distmet.num_distance.png

A line is fit to this relation, and the difference in the slopes of those lines is used to calculate the distance:

>>> dend_dist.num_distance  # doctest: +SKIP
2.7987025053709766

For both plots, the plotting labels can be changed from 1 and 2 by setting label1 and label2 in distance_metric.

For large data sets, creating the dendrogram can be slow. Particularly when comparing many datasets to a fiducial dataset, recomputing the dendrogram each time wastes a lot of time. Dendrogram_Distance can be passed a precomputed Dendrogram_Stats instead of giving a dataset. See the distance metric introduction.
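As a sketch of this, assuming Dendrogram_Stats accepts the same dendrogram settings used above, the fiducial dendrogram can be computed once and re-used:

>>> from turbustat.statistics import Dendrogram_Stats
>>> dend_fid = Dendrogram_Stats(cube_fid,
...                             min_deltas=np.logspace(-2, 0, 50),
...                             dendro_params={"min_value": 0.005,
...                                            "min_npix": 50})  # doctest: +SKIP
>>> dend_fid.run()  # doctest: +SKIP
>>> dend_dist = Dendrogram_Distance(dend_fid, cube,
...                                 min_deltas=np.logspace(-2, 0, 50),
...                                 min_features=100,
...                                 dendro_params={"min_value": 0.005,
...                                                "min_npix": 50})  # doctest: +SKIP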

Warning

A precomputed Dendrogram_Stats should be run with the same min_deltas given to Dendrogram_Distance. The histogram distance is only valid when comparing dendrograms measured with the same deltas.

Genus Distance

See the tutorial for a description of Genus statistics.

The distance metric for Genus is Genus_Distance. The distance between the Genus curves is defined as:

\[d_{\mathrm{genus}} = \left|\left|\frac{G_{1}\left(I_{0,i}\right)}{A_1} - \frac{G_{2}\left(I_{0,i}\right)}{A_2}\right|\right|\]

where \(G_{j}\left(I_{0, i}\right)\) are the Genus curves, and \(A_{j}\) is the total area each Genus curve is measured over to normalize comparisons of images of different sizes.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the Genus_Distance class, along with a few other common packages:

>>> from turbustat.statistics import Genus_Distance
>>> from astropy.io import fits
>>> import astropy.units as u
>>> import numpy as np

And we load in the two data sets; in this case, two integrated intensity (zeroth moment) maps:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

The two images are passed to the Genus_Distance class:

>>> genus = Genus_Distance(moment0_fid, moment0,
...                        lowdens_percent=15, highdens_percent=85, numpts=100,
...                        genus_kwargs=dict(min_size=4 * u.pix**2))  # doctest: +SKIP

Genus_Distance accepts similar keyword arguments to Genus. Keywords to run can be specified in a dictionary to genus_kwargs. Separate keywords for the second image (moment0) can be specified in a second dictionary to genus2_kwargs.

To find the distance between the images:

>>> genus.distance_metric(verbose=True)  # doctest: +SKIP
_images/genus_distmet.png

This returns a figure that plots the Genus curves of the two images, where the image values are standardized (zero mean and standard deviation of one). Colours, labels, and symbols in the plot can be changed. See distance_metric.

When comparing many images to a fiducial image, a pre-computed Genus can be passed instead of a dataset. See the distance metric introduction.
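A minimal sketch of this, assuming Genus takes the same keywords passed to Genus_Distance above:

>>> from turbustat.statistics import Genus
>>> genus_fid = Genus(moment0_fid, lowdens_percent=15, highdens_percent=85,
...                   numpts=100)  # doctest: +SKIP
>>> genus_fid.run(min_size=4 * u.pix**2)  # doctest: +SKIP
>>> genus = Genus_Distance(genus_fid, moment0,
...                        lowdens_percent=15, highdens_percent=85, numpts=100,
...                        genus_kwargs=dict(min_size=4 * u.pix**2))  # doctest: +SKIP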

MVC Distance

See the tutorial for a description of Modified Velocity Centroids (MVC).

The distance metric for MVC is based on the t-statistics of the difference between the power spectrum slopes:

\[d_{\rm MVC} = \frac{\left| \beta_1 - \beta_2 \right|}{\sqrt{\sigma_{\beta_1}^2 + \sigma_{\beta_2}^2}}\]

\(\beta_i\) and \(\sigma_{\beta_i}\) are the index and index uncertainty, respectively.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the MVC_Distance class, along with a few other common packages:

>>> from turbustat.statistics import MVC_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

MVC is the only (current) statistic in TurbuStat that requires multiple moment arrays. Because of this, the input for MVC_Distance has a different format than the other distance metrics: a dictionary that contains the arrays and headers:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> centroid = fits.open("Design4_flatrho_0021_00_radmc_centroid.fits")[0]  # doctest: +SKIP
>>> lwidth = fits.open("Design4_flatrho_0021_00_radmc_linewidth.fits")[0]  # doctest: +SKIP
>>> data = {"moment0": [moment0.data, moment0.header],
...         "centroid": [centroid.data, centroid.header],
...         "linewidth": [lwidth.data, lwidth.header]}  # doctest: +SKIP

And we create a second dictionary for the data set to compare with:

>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> centroid_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_centroid.fits")[0]  # doctest: +SKIP
>>> lwidth_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_linewidth.fits")[0]  # doctest: +SKIP
>>> data_fid = {"moment0": [moment0.data, moment0.header],
...             "centroid": [centroid.data, centroid.header],
...             "linewidth": [lwidth.data, lwidth.header]}  # doctest: +SKIP

These dictionaries can optionally include uncertainty arrays for the moments using the same format with keywords moment0_error, centroid_error, and linewidth_error.
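For example, if the error maps are stored in the second FITS extension (an assumption here, following the Delta-Variance example above), they can be added to the dictionary with the matching *_error keywords:

>>> moment0_err = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[1]  # doctest: +SKIP
>>> data = {"moment0": [moment0.data, moment0.header],
...         "moment0_error": [moment0_err.data, moment0_err.header],
...         "centroid": [centroid.data, centroid.header],
...         "linewidth": [lwidth.data, lwidth.header]}  # doctest: +SKIP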

These dictionaries get passed to MVC_Distance:

>>> mvc = MVC_Distance(data_fid, data)  # doctest: +SKIP

The distance between the MVC power spectra is calculated with:

>>> mvc.distance_metric(verbose=True, xunit=u.pix**-1)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.925
Model:                            OLS   Adj. R-squared:                  0.924
Method:                 Least Squares   F-statistic:                     378.5
Date:                Tue, 13 Nov 2018   Prob (F-statistic):           8.18e-34
Time:                        10:21:40   Log-Likelihood:                -62.343
No. Observations:                  91   AIC:                             128.7
Df Residuals:                      89   BIC:                             133.7
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         15.2461      0.161     94.965      0.000      14.931      15.561
x1            -4.8788      0.251    -19.455      0.000      -5.370      -4.387
==============================================================================
Omnibus:                        5.193   Durbin-Watson:                   0.068
Prob(Omnibus):                  0.075   Jarque-Bera (JB):                4.522
Skew:                          -0.459   Prob(JB):                        0.104
Kurtosis:                       2.408   Cond. No.                         4.40
==============================================================================
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.941
Model:                            OLS   Adj. R-squared:                  0.941
Method:                 Least Squares   F-statistic:                     477.5
Date:                Tue, 13 Nov 2018   Prob (F-statistic):           1.55e-37
Time:                        10:21:40   Log-Likelihood:                -52.867
No. Observations:                  91   AIC:                             109.7
Df Residuals:                      89   BIC:                             114.8
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         14.0302      0.144     97.714      0.000      13.749      14.312
x1            -5.0144      0.229    -21.853      0.000      -5.464      -4.565
==============================================================================
Omnibus:                        3.541   Durbin-Watson:                   0.129
Prob(Omnibus):                  0.170   Jarque-Bera (JB):                3.488
Skew:                          -0.469   Prob(JB):                        0.175
Kurtosis:                       2.800   Cond. No.                         4.40
==============================================================================
_images/mvc_distmet.png

The MVC spectra are plotted in the figure and the fit summaries are printed out. The distance between the indices is:

>>> mvc.distance  # doctest: +SKIP
0.3988169606167437

This is a poor fit; we want to limit the range over which the spectra are fit. Keywords for MVC can be passed with low_cut, high_cut, breaks, pspec_kwargs, and pspec2_kwargs. If separate parameters need to be set for each data set, a two-element list or array can be given to low_cut, high_cut, and breaks; the second element will be used for the second data set. For example, limiting the fit region can be done with:

>>> mvc = MVC_Distance(data_fid, data, low_cut=0.02 / u.pix,
...                    high_cut=0.4 / u.pix)  # doctest: +SKIP
>>> mvc.distance_metric(verbose=True, xunit=u.pix**-1)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.946
Model:                            OLS   Adj. R-squared:                  0.942
Method:                 Least Squares   F-statistic:                     135.6
Date:                Tue, 13 Nov 2018   Prob (F-statistic):           2.99e-08
Time:                        10:36:41   Log-Likelihood:                 10.700
No. Observations:                  15   AIC:                            -17.40
Df Residuals:                      13   BIC:                            -15.98
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         17.9988      0.266     67.588      0.000      17.477      18.521
x1            -2.5502      0.219    -11.647      0.000      -2.979      -2.121
==============================================================================
Omnibus:                        1.189   Durbin-Watson:                   2.376
Prob(Omnibus):                  0.552   Jarque-Bera (JB):                0.814
Skew:                          -0.200   Prob(JB):                        0.666
Kurtosis:                       1.931   Cond. No.                         13.5
==============================================================================
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     70.08
Date:                Tue, 13 Nov 2018   Prob (F-statistic):           1.36e-06
Time:                        10:36:41   Log-Likelihood:                 10.420
No. Observations:                  15   AIC:                            -16.84
Df Residuals:                      13   BIC:                            -15.42
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         16.7135      0.390     42.879      0.000      15.950      17.477
x1            -2.7335      0.327     -8.371      0.000      -3.373      -2.094
==============================================================================
Omnibus:                        0.831   Durbin-Watson:                   2.076
Prob(Omnibus):                  0.660   Jarque-Bera (JB):                0.621
Skew:                          -0.449   Prob(JB):                        0.733
Kurtosis:                       2.568   Cond. No.                         13.5
==============================================================================
_images/mvc_distmet_lims.png

The distance is now:

>>> mvc.distance  # doctest: +SKIP
0.46621655722371613
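
As noted above, separate fit limits can be given for each data set by passing a two-element list to low_cut, high_cut, or breaks. A minimal sketch, where the second set of limits is purely illustrative:

>>> mvc = MVC_Distance(data_fid, data,
...                    low_cut=[0.02 / u.pix, 0.03 / u.pix],
...                    high_cut=[0.4 / u.pix, 0.3 / u.pix])  # doctest: +SKIP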

A pre-computed MVC class can also be passed instead of giving a dataset as the input. See the distance metric introduction.

PCA Distance

See the tutorial for a description of Principal Component Analysis (PCA).

We define the PCA distance as the L2 distance between the normalized eigenvalues of two spectral-line data cubes:

\[d_{\mathrm{PCA}} = \left|\left|\lambda_{1}' - \lambda_{2}'\right|\right|\]

The normalized eigenvalues are \(\lambda_{i} / \sum_i \lambda_{i}\).
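
As a standalone illustration of this definition (not the PCA_Distance implementation itself), the distance can be computed from two eigenvalue arrays with numpy; the eigenvalues below are made up:

>>> import numpy as np
>>> eigvals1 = np.array([10., 3., 1., 0.5])  # made-up eigenvalues for cube 1
>>> eigvals2 = np.array([9., 4., 1.5, 0.5])  # made-up eigenvalues for cube 2
>>> lam1 = eigvals1 / eigvals1.sum()  # normalize so the eigenvalues sum to one
>>> lam2 = eigvals2 / eigvals2.sum()
>>> d_pca = np.linalg.norm(lam1 - lam2)  # L2 distance between normalized eigenvalues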

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the PCA_Distance class, along with a few other common packages:

>>> from turbustat.statistics import PCA_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

PCA_Distance takes two data cubes as input:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> cube_fid = fits.open("Fiducial0_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> pca = PCA_Distance(cube_fid, cube, n_eigs=50, mean_sub=True)  # doctest: +SKIP

There are two additional keywords that set the number of eigenvalues to include in the distance calculation (n_eigs), and whether to subtract the mean from each spectral channel (mean_sub).

To calculate the distance between the eigenvalues:

>>> pca.distance_metric(verbose=True)  # doctest: +SKIP
Proportions of total variance: 1 - 1.000, 2 - 1.000
_images/pca_distmet.png

This prints out what fraction of the total variance is included in the eigenvalue vectors for each data cube. The image shows the covariance matrix for each data cube in the first row and a bar chart of the eigenvalues in the second row.

And the distance is:

>>> pca.distance  # doctest: +SKIP
0.07211706060387167

Note that a comparison of the size-line width relation from PCA as a distance metric is not yet implemented.

A pre-computed PCA class can also be passed instead of a dataset. See the distance metric introduction.

PDF Distance

See the tutorial for a description of PDFs.

There are multiple ways to define the distance between PDFs. Two of the metrics are non-parametric:

  1. The Hellinger distance between the PDFs (computed over the same set of bins):
    \[d_{\rm Hellinger}(p_1,p_2) = \frac{1}{\sqrt{2}}\left\{\sum_{\tilde{I}} \left[ \sqrt{p_1(\tilde{I})} - \sqrt{p_{2}(\tilde{I})} \right]^2\right\}^{1/2}.\]

    where \(p_i\) are the histogram values at the bin \(\tilde{I}\).

  2. The Kolmogorov-Smirnov Distance between the ECDFs of the PDFs:
    \[d_{\rm KS}(P_1, P_2) = {\rm sup} \left| P_1(\tilde{I}) - P_2(\tilde{I}) \right|\]

    where \(P_i\) is the ECDF at the value \(\tilde{I}\).

There is also one parametric distance metric included in PDF_Distance: the t-statistic of the difference in the fitted log-normal widths:

\[d_{\rm LN} = \frac{\left| w_1 - w_2 \right|}{\sqrt{\sigma_{w_1}^2 + \sigma_{w_2}^2}}\]

where \(w_i\) is the width of the log-normal distribution fit.
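
As a standalone sketch of the two non-parametric definitions above (not the PDF_Distance implementation), the Hellinger and KS distances can be computed from two histograms built on a common set of bins:

>>> import numpy as np
>>> np.random.seed(0)
>>> bins = np.linspace(0., 5., 51)  # common set of bins for both histograms
>>> p1, _ = np.histogram(np.random.lognormal(0.0, 0.4, 10000), bins=bins)
>>> p2, _ = np.histogram(np.random.lognormal(0.1, 0.5, 10000), bins=bins)
>>> p1 = p1 / p1.sum()  # normalize the histograms so each sums to one
>>> p2 = p2 / p2.sum()
>>> d_hellinger = np.sqrt(np.sum((np.sqrt(p1) - np.sqrt(p2))**2)) / np.sqrt(2)
>>> d_ks = np.abs(np.cumsum(p1) - np.cumsum(p2)).max()  # sup difference of the ECDFs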

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the PDF_Distance class, along with a few other common packages:

>>> from turbustat.statistics import PDF_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

And we load in the two data sets. PDF_Distance can be given two 2D images or cubes. For this example, we will use two integrated intensity images:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

These two images are given as the inputs to PDF_Distance. Other parameters can be set here, including the minimum image values to include in the histograms (min_val1/min_val2), whether to fit a log-normal distribution (do_fit), and what type of normalization to use on the data (normalization_type; see the PDF tutorial):

>>> pdf = PDF_Distance(moment0_fid, moment0, min_val1=0.0, min_val2=0.0,
...                    do_fit=True, normalization_type=None)  # doctest: +SKIP

This will create and run two PDF instances using a common set of bins for the histograms. These can be accessed as pdf1 and pdf2.

To calculate the distances, we run:

>>> pdf.distance_metric(verbose=True)  # doctest: +SKIP
Optimization terminated successfully.
         Current function value: 6.335450
         Iterations: 36
         Function evaluations: 72
Optimization terminated successfully.
         Current function value: 6.007851
         Iterations: 34
         Function evaluations: 69
                              Likelihood Results
==============================================================================
Dep. Variable:                      y   Log-Likelihood:            -1.0380e+05
Model:                     Likelihood   AIC:                         2.076e+05
Method:            Maximum Likelihood   BIC:                         2.076e+05
Date:                Wed, 14 Nov 2018
Time:                        09:58:10
No. Observations:               16384
Df Residuals:                   16382
Df Model:                           2
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
par0           0.4553      0.003    181.019      0.000       0.450       0.460
par1         299.8377      1.067    281.114      0.000     297.747     301.928
==============================================================================
                              Likelihood Results
==============================================================================
Dep. Variable:                      y   Log-Likelihood:                -98433.
Model:                     Likelihood   AIC:                         1.969e+05
Method:            Maximum Likelihood   BIC:                         1.969e+05
Date:                Wed, 14 Nov 2018
Time:                        09:58:10
No. Observations:               16384
Df Residuals:                   16382
Df Model:                           2
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
par0           0.4360      0.002    181.019      0.000       0.431       0.441
par1         225.6771      0.769    293.602      0.000     224.171     227.184
==============================================================================
_images/pdf_distmet.png

This returns a summary of the log-normal fits (if do_fit=True) and a plot of the PDF and ECDF of each data set. The solid lines in the plot are the fitted distributions.

By default, all three distance metrics are run. For these images, the distances are:

>>> pdf.hellinger_distance  # doctest: +SKIP
0.23007068347013115
>>> pdf.ks_distance  # doctest: +SKIP
0.24285888671875
>>> pdf.lognormal_distance  # doctest: +SKIP
5.561198154785891

Each distance metric can be run separately by running its function in PDF_Distance, or by setting the statistic keyword in distance_metric.
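
For example, only the Hellinger distance could be requested through the statistic keyword; the exact value accepted ('hellinger' below) is an assumption here and should be checked against the PDF_Distance documentation:

>>> pdf.distance_metric(statistic='hellinger', verbose=False)  # doctest: +SKIP
>>> pdf.hellinger_distance  # doctest: +SKIP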

Because the Hellinger distance requires that the PDF histograms share the same bins, there is no input for a pre-computed fiducial PDF, unlike most of the other distance metric classes.

Spatial Power Spectrum Distance

See the tutorial for a description of the spatial power-spectrum.

The distance metric for the power-spectrum is PSpec_Distance. The distance is defined as the t-statistic between the indices of the power-spectra:

\[d_{\rm slope} = \frac{|\beta_1 - \beta_2|}{\sqrt{\sigma_{\beta_1}^2 + \sigma_{\beta_2}^2}}\]

\(\beta_i\) are the slopes of the power-spectra and \(\sigma_{\beta_i}\) are the uncertainties.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the PSpec_Distance class, along with a few other common packages:

>>> from turbustat.statistics import PSpec_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt
>>> import astropy.units as u

And we load in the two data sets; in this case, two integrated intensity (zeroth moment) maps:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

These two images are given as inputs to PSpec_Distance. We know from the power-spectrum tutorial that there should be limits set on where the power-spectra are fit. These can be specified with low_cut and high_cut, along with breaks if the power-spectrum is best fit with a broken power-law model. In this case, we will use the same fit limits for both power-spectra, but separate limits can be given for each image by giving a two-element list to any of these three keywords.

>>> pspec = PSpec_Distance(moment0_fid, moment0,
...                        low_cut=0.025 / u.pix, high_cut=0.1 / u.pix,)  # doctest: +SKIP

This will create and run two PowerSpectrum instances, which can be accessed as pspec1 and pspec2.

To find the distance between these two images, we run:

>>> pspec.distance_metric(verbose=True, xunit=u.pix**-1)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.930
Model:                            OLS   Adj. R-squared:                  0.924
Method:                 Least Squares   F-statistic:                     222.2
Date:                Wed, 14 Nov 2018   Prob (F-statistic):           4.18e-09
Time:                        10:21:04   Log-Likelihood:                 11.044
No. Observations:                  14   AIC:                            -18.09
Df Residuals:                      12   BIC:                            -16.81
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          6.8004      0.202     33.648      0.000       6.404       7.197
x1            -2.3745      0.159    -14.905      0.000      -2.687      -2.062
==============================================================================
Omnibus:                        0.495   Durbin-Watson:                   2.400
Prob(Omnibus):                  0.781   Jarque-Bera (JB):                0.547
Skew:                          -0.148   Prob(JB):                        0.761
Kurtosis:                       2.078   Cond. No.                         15.2
==============================================================================
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.971
Model:                            OLS   Adj. R-squared:                  0.968
Method:                 Least Squares   F-statistic:                     495.9
Date:                Wed, 14 Nov 2018   Prob (F-statistic):           3.96e-11
Time:                        10:21:04   Log-Likelihood:                 14.077
No. Observations:                  14   AIC:                            -24.15
Df Residuals:                      12   BIC:                            -22.87
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.5109      0.171     32.289      0.000       5.176       5.845
x1            -3.0223      0.136    -22.269      0.000      -3.288      -2.756
==============================================================================
Omnibus:                        0.901   Durbin-Watson:                   2.407
Prob(Omnibus):                  0.637   Jarque-Bera (JB):                0.718
Skew:                          -0.215   Prob(JB):                        0.698
Kurtosis:                       1.977   Cond. No.                         15.2
==============================================================================
_images/pspec_distmet.png

The fit summaries and a plot of the power-spectra with their fits are returned when verbose=True. Colours, labels, and symbols can be altered in the plot with the keywords plot_kwargs1 and plot_kwargs2 in distance_metric.

The distance between these two images is:

>>> pspec.distance  # doctest: +SKIP
3.0952798493530262
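
As a sanity check, this value can be approximately reproduced by hand from the slopes and uncertainties in the two fit summaries above:

>>> import numpy as np
>>> beta1, sigma1 = -2.3745, 0.159  # slope and uncertainty of the first power-spectrum fit
>>> beta2, sigma2 = -3.0223, 0.136  # slope and uncertainty of the second fit
>>> np.abs(beta1 - beta2) / np.sqrt(sigma1**2 + sigma2**2)  # roughly 3.10, close to pspec.distance above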

Recomputing an already computed power-spectrum can be avoided by passing a pre-computed PowerSpectrum instead of a dataset. See the distance metric introduction.

SCF Distance

See the tutorial for a description of the Spectral Correlation Function (SCF).

The SCF creates a surface by shifting a spectral-line cube and calculating the correlation of the shifted cube with the original cube. The distance metric defined in SCF_Distance is the L2 distance between the correlation surfaces, weighted by the inverse of the lag:

\[d_{\mathrm{SCF}} = \left( \frac{\sum_{\boldsymbol{\ell}}[S_1(\boldsymbol{\ell})-S_2(\boldsymbol{\ell})]^2/|\boldsymbol{\ell}|}{\sum_{\boldsymbol{\ell}} 1/|\boldsymbol{\ell}|}\right)^{1/2}.\]

where \(S_i\) is the correlation surface and \(\ell\) is the spatial lag between the shifted and original cubes.

This direct comparison between the correlation surfaces requires that a common set of spatial lags be used. SCF_Distance creates a common set of angular lags to compare two data cubes.
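
As a standalone sketch of this definition (not the SCF_Distance implementation), the lag-weighted distance between two correlation surfaces defined on the same grid of lags could be computed as:

>>> import numpy as np
>>> size = 11  # matches the size=11 example below
>>> yy, xx = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
>>> lags = np.sqrt(xx**2 + yy**2)
>>> lags[size // 2, size // 2] = np.nan  # drop the diverging 1/|lag| weight at zero lag
>>> weights = 1. / lags
>>> surf1 = np.random.random((size, size))  # placeholder correlation surfaces
>>> surf2 = np.random.random((size, size))
>>> d_scf = np.sqrt(np.nansum(weights * (surf1 - surf2)**2) / np.nansum(weights))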

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the SCF_Distance class, along with a few other common packages:

>>> from turbustat.statistics import SCF_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt

SCF_Distance takes two data cubes as input:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> cube_fid = fits.open("Fiducial0_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> scf = SCF_Distance(cube_fid, cube, size=11)  # doctest: +SKIP

This call runs SCF for the two cubes, which can be accessed with scf1 and scf2.

The default setting assumes that the boundaries are continuous (e.g., simulated observations from a periodic-box simulation, like this example). To change how boundaries are handled, boundary can be set in SCF_Distance. For example, for observational data, boundary='cut' should be used. When comparing a simulated observation to a real observation, different boundary conditions can be given: boundary=['cut', 'continuous']. The first list item is applied to the first cube given to SCF_Distance and the second item to the second cube, as in the sketch below.
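
For example, comparing an observed cube (first input) to a simulated observation from a periodic box (second input) could look like the following, where cube_obs and cube_sim are placeholder names:

>>> scf = SCF_Distance(cube_obs, cube_sim, size=11,
...                    boundary=['cut', 'continuous'])  # doctest: +SKIP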

To calculate the distance between the cubes:

>>> scf.distance_metric(verbose=True)  # doctest: +SKIP
_images/scf_distmet.png

With verbose=True, this function creates a plot of the SCF correlation surfaces (top row), the weighted difference between the surfaces (left, second row), and azimuthally-averaged SCF curves for both cubes (right, second row).

The distance between the SCF surfaces is:

>>> scf.distance  # doctest: +SKIP
0.08101015924738914

By default, the distance between the surfaces is weighted by the lag (see the equation above). This weighting can be disabled by setting weighted=False in distance_metric, in which case the distance metric reduces to the L2 norm between the surfaces.

A pre-computed SCF class can also be passed instead of a data cube. However, the SCF will need to be recomputed if the lags differ from the common set defined in SCF_Distance. See the distance metric introduction.

Statistical Moments Distance

See the tutorial for a description of the statistical moments.

StatMoments calculates the first four moments (mean, variance, skewness, and kurtosis) in circular regions of a map. For comparisons between different images, we use the skewness and kurtosis as metrics since they are not affected by offsets in the mean and are normalized by the variance. We reduce the moment images to histograms and define metrics from differences in the histograms. The histogram difference is converted to a distance using the Hellinger distance:

\[H(p_1,p_2) = \frac{1}{\sqrt{2}}\left\{\sum_{\tilde{I}} \left[ \sqrt{p_1(\tilde{I})} - \sqrt{p_{2}(\tilde{I})} \right]^2\right\}^{1/2}.\]

where \(p_i\) is the histogram of an image and \(\tilde{I}\) is the bin of the histogram. The same definition is used for the skewness and kurtosis.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the StatMoments_Distance class, along with a few other common packages:

>>> from turbustat.statistics import StatMoments_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt
>>> import astropy.units as u

And we load in the two data sets; in this case, two integrated intensity (zeroth moment) maps:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

These images are given as inputs to StatMoments_Distance:

>>> moments = StatMoments_Distance(moment0_fid, moment0, radius=5 * u.pix,
...                                periodic1=True, periodic2=True)  # doctest: +SKIP

This runs StatMoments for both images, which can be accessed with moments1 and moments2.

Since the moments are calculated in circular regions, it is important that a common circular region be used for both images. StatMoments_Distance scales the given radius to a common angular scale using the WCS information in the headers.

The boundary handling can also be changed by setting periodic1 and periodic2 for the first and second images, respectively. By default, periodic boundaries are used, as is appropriate for simulated observations from a periodic-box simulation (like this example). For real observations, periodic boundaries will likely need to be disabled.

To calculate the distance between the images:

>>> moments.distance_metric(verbose=True)  # doctest: +SKIP
_images/statmoments_distmet.png

Additional arguments for the figure and the number of bins can also be given. The default number of bins for the histogram is set to \(\sqrt{{\rm min}(N_1, N_2)}\) where \(N_i\) is the number of pixels in each image with a finite value.

The distances for the skewness and kurtosis are:

>>> print(moments.skewness_distance, moments.kurtosis_distance)  # doctest: +SKIP
0.01189910501201634 0.019870935761084074

A pre-computed StatMoments class can be passed instead of an image. However, the moments will need to be recomputed if the size differs from the common scale determined in StatMoments_Distance. See the distance metric introduction.

VCA Distance

See the tutorial for a description of Velocity Channel Analysis (VCA).

The VCA distance is defined as the t-statistic of the difference in the fitted slopes:

\[d_{\rm slope} = \frac{|\beta_1 - \beta_2|}{\sqrt{\sigma_{\beta_1}^2 + \sigma_{\beta_2}^2}}\]

\(\beta_i\) are the slopes of the VCA spectra and \(\sigma_{\beta_i}\) are the uncertainties of the slopes.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the VCA_Distance class, along with a few other common packages:

>>> from turbustat.statistics import VCA_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt
>>> import astropy.units as u

VCA_Distance takes two data cubes as input:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> cube_fid = fits.open("Fiducial0_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

From the VCA tutorial, we know that limits should be placed on the power-spectra. These limits can be specified with low_cut and high_cut:

>>> vca = VCA_Distance(cube_fid, cube, low_cut=0.025 / u.pix,
...                    high_cut=0.1 / u.pix)  # doctest: +SKIP

This will run VCA on the given cubes, which can be accessed as vca1 and vca2.

Additional parameters can be specified to VCA. Different fit limits for the two cubes can be given as a two-element list (e.g., low_cut=[0.025 / u.pix, 0.04 / u.pix]). Estimated break points in the power-spectra can be given in the same format, which will enable fitting with a broken-linear model.
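
For instance, a different low_cut for each cube can be given as follows; the second value below is only illustrative:

>>> vca = VCA_Distance(cube_fid, cube,
...                    low_cut=[0.025 / u.pix, 0.04 / u.pix],
...                    high_cut=0.1 / u.pix)  # doctest: +SKIP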

To find the distance between the cubes:

>>> vca.distance_metric(verbose=True)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.985
Model:                            OLS   Adj. R-squared:                  0.984
Method:                 Least Squares   F-statistic:                     436.3
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           8.40e-11
Time:                        10:16:28   Log-Likelihood:                 22.506
No. Observations:                  14   AIC:                            -41.01
Df Residuals:                      12   BIC:                            -39.73
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.0853      0.137     37.170      0.000       4.817       5.353
x1            -2.3350      0.112    -20.887      0.000      -2.554      -2.116
==============================================================================
Omnibus:                        0.981   Durbin-Watson:                   1.483
Prob(Omnibus):                  0.612   Jarque-Bera (JB):                0.712
Skew:                          -0.138   Prob(JB):                        0.700
Kurtosis:                       1.930   Cond. No.                         15.2
==============================================================================
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.986
Model:                            OLS   Adj. R-squared:                  0.985
Method:                 Least Squares   F-statistic:                     722.3
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           4.33e-12
Time:                        10:16:28   Log-Likelihood:                 18.704
No. Observations:                  14   AIC:                            -33.41
Df Residuals:                      12   BIC:                            -32.13
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.6086      0.141     25.636      0.000       3.333       3.884
x1            -3.2136      0.120    -26.876      0.000      -3.448      -2.979
==============================================================================
Omnibus:                       13.847   Durbin-Watson:                   2.394
Prob(Omnibus):                  0.001   Jarque-Bera (JB):                9.666
Skew:                          -1.631   Prob(JB):                      0.00796
Kurtosis:                       5.434   Cond. No.                         15.2
==============================================================================
_images/vca_distmet.png

This function returns a summary of the fits to the VCA spectra and plots the two spectra with the fits. Colours, symbols and labels in the plot can be changed with plot_kwargs1 and plot_kwargs2 in distance_metric.

The distance is:

>>> vca.distance  # doctest: +SKIP
5.366955632554179

Changing the width of the velocity channels affects the contribution of the turbulent velocity field to the spectrum, thereby altering the measured index (Lazarian & Pogosyan 2000). It is generally advisable to compare cubes with a similar velocity resolution.

In VCA_Distance, the channel width can be changed with channel_width. The new channel width should be (1) larger than the current channel widths of the cubes, and (2) in similar units to the spectral axis of the cubes (i.e., a width in velocity should be given for a spectral axis in velocity).

Warning

Changing the spectral resolution will be slow for large cubes. Consider changing the velocity resolution of large cubes before running VCA.

In this example, we will change the velocity resolution to 400 m/s:

>>> vca = VCA_Distance(cube_fid, cube, low_cut=0.025 / u.pix,
...                    high_cut=0.1 / u.pix, channel_width=400 * u.m / u.s)  # doctest: +SKIP
>>> vca.distance_metric(verbose=True)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.985
Model:                            OLS   Adj. R-squared:                  0.983
Method:                 Least Squares   F-statistic:                     419.3
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           1.06e-10
Time:                        10:16:28   Log-Likelihood:                 22.121
No. Observations:                  14   AIC:                            -40.24
Df Residuals:                      12   BIC:                            -38.96
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          3.0105      0.141     21.350      0.000       2.734       3.287
x1            -2.3639      0.115    -20.478      0.000      -2.590      -2.138
==============================================================================
Omnibus:                        0.854   Durbin-Watson:                   1.515
Prob(Omnibus):                  0.652   Jarque-Bera (JB):                0.676
Skew:                          -0.144   Prob(JB):                        0.713
Kurtosis:                       1.963   Cond. No.                         15.2
==============================================================================
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.985
Model:                            OLS   Adj. R-squared:                  0.984
Method:                 Least Squares   F-statistic:                     684.5
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           5.94e-12
Time:                        10:16:28   Log-Likelihood:                 17.855
No. Observations:                  14   AIC:                            -31.71
Df Residuals:                      12   BIC:                            -30.43
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5197      0.146     10.408      0.000       1.234       1.806
x1            -3.2379      0.124    -26.163      0.000      -3.481      -2.995
==============================================================================
Omnibus:                       13.778   Durbin-Watson:                   2.379
Prob(Omnibus):                  0.001   Jarque-Bera (JB):                9.575
Skew:                          -1.633   Prob(JB):                      0.00833
Kurtosis:                       5.398   Cond. No.                         15.2
==============================================================================
_images/vca_distmet_thickchan.png

The VCA power-spectra with 400 m/s channels have slopes similar to those at the original velocity resolution, so the distance has not changed significantly:

>>> vca.distance  # doctest: +SKIP
5.164776059129051

A pre-computed VCA class can also be passed instead of a data cube. See the distance metric introduction.

VCS Distance

See the tutorial for a description of Velocity Coordinate Spectrum (VCS).

There are two asymptotic regimes of the VCS corresponding to high and low resolution (Lazarian & Pogosyan 2006). The transition between these regimes depends on the spatial resolution (i.e., beam size) of the data, the spectral resolution of the data, and the velocity dispersion. The current VCS implementation in TurbuStat fits a broken linear model to approximate the asymptotic regimes, rather than fitting with the full VCS formalism (Chepurnov et al. 2010, Chepurnov et al. 2015). We assume that the break point lies at the transition point between the regimes and label velocity frequencies smaller than the break as “large-scale” and frequencies larger than the break as “small-scale”.

There are four distance definitions for the VCS based on the broken linear modelling described above. All of these distance metrics are based on t-statistics between the VCS for each cube:

  1. The difference between the fitted slopes on “large-scales” (below the break position): large_scale_distance.
    \[d_{\rm large-scale} = \frac{|\beta_{{\rm LS}, 1} - \beta_{{\rm LS}, 2}|}{\sqrt{\sigma_{\beta_{{\rm LS}, 1}}^2 + \sigma_{\beta_{{\rm LS}, 2}}^2}}\]

    \(\beta_{{\rm LS}, i}\) are the slopes of the VCS on large scales and \(\sigma_{\beta_{{\rm LS}, i}}\) are the uncertainties of the slopes.

  2. The difference between the fitted slopes on “small-scales” (above the break position): small_scale_distance.
    \[d_{\rm small-scale} = \frac{|\beta_{{\rm SS}, 1} - \beta_{{\rm SS}, 2}|}{\sqrt{\sigma_{\beta_{{\rm SS}, 1}}^2 + \sigma_{\beta_{{\rm SS}, 2}}^2}}\]

    \(\beta_{{\rm SS}, i}\) are the slopes of the VCS on small scales and \(\sigma_{\beta_{{\rm SS}, i}}\) are the uncertainties of the slopes.

  3. The sum of the differences between the slopes in both regimes: distance
    \[d_{{\rm all}} = d_{\rm large-scale} + d_{\rm small-scale}\]
  4. The difference in the fitted break points: break_distance
    \[d_{\rm break} = \frac{|b_{1} - b_{2}|}{\sqrt{\sigma_{b_{1}}^2 + \sigma_{b_{2}}^2}}\]

    \(b_{i}\) are the break locations of the VCS and \(\sigma_{b_{i}}\) are the uncertainties.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the VCS_Distance class, along with a few other common packages:

>>> from turbustat.statistics import VCS_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt
>>> import astropy.units as u

VCS_Distance takes two data cubes as input:

>>> cube = fits.open("Design4_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP
>>> cube_fid = fits.open("Fiducial0_flatrho_0021_00_radmc.fits")[0]  # doctest: +SKIP

From the VCS tutorial, we know that limits should be placed on the power-spectra. These limits can be specified with low_cut and high_cut:

>>> vcs = VCS_Distance(cube_fid, cube,
...                    fit_kwargs=dict(low_cut=0.025 / u.pix,
...                                    high_cut=0.1 / u.pix))  # doctest: +SKIP

This will run VCS on the given cubes, which can be accessed as vcs1 and vcs2.

Settings for the VCS fitting can be passed with fit_kwargs, and with fit_kwargs2 when different settings are required for the second cube. In this example, we set the fitting limits to be used.

To find the distances between the cubes:

>>> vcs.distance_metric(verbose=True)  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.992
Method:                 Least Squares   F-statistic:                     3678.
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           1.96e-86
Time:                        11:20:09   Log-Likelihood:                -17.089
No. Observations:                  85   AIC:                             42.18
Df Residuals:                      81   BIC:                             51.95
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.2848      0.295      4.354      0.000       0.698       1.872
x1            -2.1220      0.171    -12.428      0.000      -2.462      -1.782
x2           -14.5354      0.317    -45.812      0.000     -15.167     -13.904
x3             0.0715      0.129      0.553      0.582      -0.186       0.329
==============================================================================
Omnibus:                        2.570   Durbin-Watson:                   0.089
Prob(Omnibus):                  0.277   Jarque-Bera (JB):                2.546
Skew:                          -0.378   Prob(JB):                        0.280
Kurtosis:                       2.616   Cond. No.                         21.5
==============================================================================
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.988
Model:                            OLS   Adj. R-squared:                  0.987
Method:                 Least Squares   F-statistic:                     2212.
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           1.43e-77
Time:                        11:20:09   Log-Likelihood:                -38.551
No. Observations:                  85   AIC:                             85.10
Df Residuals:                      81   BIC:                             94.87
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5246      0.380      4.014      0.000       0.769       2.280
x1            -1.9578      0.220     -8.908      0.000      -2.395      -1.520
x2           -14.7109      0.408    -36.020      0.000     -15.524     -13.898
x3             0.1178      0.167      0.707      0.482      -0.214       0.449
==============================================================================
Omnibus:                        7.714   Durbin-Watson:                   0.059
Prob(Omnibus):                  0.021   Jarque-Bera (JB):                3.123
Skew:                          -0.127   Prob(JB):                        0.210
Kurtosis:                       2.096   Cond. No.                         21.5
==============================================================================
_images/vcs_distmet.png

This function returns a summary of the broken linear fits to the VCS for each cube. The plot shows the VCS for both cubes; in this example, the two are quite similar.

The distances between the cubes, as defined above, are:

>>> vcs.large_scale_distance  # doctest: +SKIP
0.5901343561262037
>>> vcs.small_scale_distance  # doctest: +SKIP
0.01921401163828633
>>> vcs.distance  # doctest: +SKIP
0.60934836776449
>>> vcs.break_distance  # doctest: +SKIP
0.0023172070537929865

The difference in the slopes is dominated by vcs.large_scale_distance, while the small-scale slopes are quite similar. The break locations are also similar and give a small vcs.break_distance.

A pre-computed VCS class can also be passed instead of a data cube. See the distance metric introduction.

Wavelet Distance

See the tutorial for a description of Delta-Variance.

The distance metric for wavelets is Wavelet_Distance. The distance is defined as the t-statistic of the difference between the slopes of the wavelet transforms:

\[d_{\rm slope} = \frac{|\beta_1 - \beta_2|}{\sqrt{\sigma_{\beta_1}^2 + \sigma_{\beta_2}^2}}\]

\(\beta_i\) are the slopes of the wavelet transforms and \(\sigma_{\beta_i}\) are the uncertainties of the slopes.

More information on the distance metric definitions can be found in Koch et al. 2017.

Using

The data in this tutorial are available here.

We need to import the Wavelet_Distance class, along with a few other common packages:

>>> from turbustat.statistics import Wavelet_Distance
>>> from astropy.io import fits
>>> import matplotlib.pyplot as plt
>>> import astropy.units as u

And we load in the two data sets; in this case, two integrated intensity (zeroth moment) maps:

>>> moment0 = fits.open("Design4_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP
>>> moment0_fid = fits.open("Fiducial0_flatrho_0021_00_radmc_moment0.fits")[0]  # doctest: +SKIP

The two images are input to Wavelet_Distance:

>>> wavelet = Wavelet_Distance(moment0_fid, moment0, xlow=2 * u.pix,
...                            xhigh=10 * u.pix)  # doctest: +SKIP

This call will run Wavelet on both of the images, which can be accessed with wt1 and wt2.

In this example, we have limited the fitting regions with xlow and xhigh. Separate fitting limits for each image can be given by giving a two-element list for either keyword (e.g., xlow=[1 * u.pix, 2 * u.pix]). Additional fitting keyword arguments can be passed with fit_kwargs and fit_kwargs2 for the first and second images, respectively.

To calculate the distance:

>>> wavelet.distance_metric(verbose=True, xunit=u.pix)  # doctest: +SKIP
                       OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.983
Model:                            OLS   Adj. R-squared:                  0.982
Method:                 Least Squares   F-statistic:                     1013.
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           1.31e-18
Time:                        17:55:59   Log-Likelihood:                 73.769
No. Observations:                  22   AIC:                            -143.5
Df Residuals:                      20   BIC:                            -141.4
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.5636      0.006    267.390      0.000       1.552       1.575
x1             0.3137      0.010     31.832      0.000       0.294       0.333
==============================================================================
Omnibus:                        3.421   Durbin-Watson:                   0.195
Prob(Omnibus):                  0.181   Jarque-Bera (JB):                1.761
Skew:                          -0.397   Prob(JB):                        0.414
Kurtosis:                       1.864   Cond. No.                         7.05
==============================================================================
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.993
Model:                            OLS   Adj. R-squared:                  0.993
Method:                 Least Squares   F-statistic:                     1351.
Date:                Fri, 16 Nov 2018   Prob (F-statistic):           7.76e-20
Time:                        17:55:59   Log-Likelihood:                 75.406
No. Observations:                  22   AIC:                            -146.8
Df Residuals:                      20   BIC:                            -144.6
Df Model:                           1
Covariance Type:                  HC3
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.3444      0.008    158.895      0.000       1.328       1.361
x1             0.4728      0.013     36.752      0.000       0.448       0.498
==============================================================================
Omnibus:                        4.214   Durbin-Watson:                   0.170
Prob(Omnibus):                  0.122   Jarque-Bera (JB):                3.493
Skew:                          -0.958   Prob(JB):                        0.174
Kurtosis:                       2.626   Cond. No.                         7.05
==============================================================================
_images/wavelet_distmet.png

A summary of the fits is printed along with a plot of the two wavelet transforms and the fit residuals. Colours, labels, and symbols can be specified in the plot with plot_kwargs1 and plot_kwargs2.

The distance between these two datasets is:

>>> wavelet.curve_distance  # doctest: +SKIP
9.81949754947785

A pre-computed Wavelet class can also be passed instead of a dataset. See the distance metric introduction.

Creating testing data

TurbuStat includes a simulation package to create 2D images and PPV cubes with set power-law indices. These provide simple, non-realistic synthetic observations that can be used to test idealized regimes of turbulent statistics:

>>> import matplotlib.pyplot as plt
>>> from astropy.io import fits
>>> import astropy.units as u

Two-dimensional images

The 2D power-law image function is make_extended:

>>> from turbustat.simulator import make_extended

The make_extended function is adapted from the implementation in image_registration.

To create an isotropic power-law field, or a fractional Brownian Motion (fBM) field, with an index of 3 and size of 256 pixels, we run:

>>> rnoise_img = make_extended(256, powerlaw=3.)
>>> plt.imshow(rnoise_img)  # doctest: +SKIP
_images/rednoise_slope3_img.png

To calculate the power-spectrum and check its index, we need to generate a minimal FITS HDU:

>>> rnoise_hdu = fits.PrimaryHDU(rnoise_img)

The FITS header lacks the information needed to convert to angular or physical scales and is used here to show the minimal working case. See the handling simulated data tutorial on how to create a usable FITS header for simulated data.
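
A minimal, illustrative way to attach an angular pixel scale through standard FITS keywords (not the full recipe from that tutorial) is:

>>> pix_scale = 10. / 3600.  # hypothetical 10 arcsec pixels, in degrees
>>> rnoise_hdu.header['CDELT1'] = -pix_scale
>>> rnoise_hdu.header['CDELT2'] = pix_scale
>>> rnoise_hdu.header['CUNIT1'] = 'deg'
>>> rnoise_hdu.header['CUNIT2'] = 'deg'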

The power-spectrum of the image should give a slope of 3:

>>> from turbustat.statistics import PowerSpectrum
>>> pspec = PowerSpectrum(rnoise_hdu)
>>> pspec.run(verbose=True, radial_pspec_kwargs={'binsize': 1.0},
...           fit_kwargs={'weighted_fit': True}, fit_2D=False,
...           low_cut=1. / (60 * u.pix))  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 8.070e+06
Date:                Thu, 21 Jun 2018   Prob (F-statistic):               0.00
Time:                        11:43:47   Log-Likelihood:                 701.40
No. Observations:                 177   AIC:                            -1399.
Df Residuals:                     175   BIC:                            -1392.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0032      0.001      3.952      0.000       0.002       0.005
x1            -2.9946      0.001  -2840.850      0.000      -2.997      -2.992
==============================================================================
Omnibus:                      252.943   Durbin-Watson:                   1.077
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            26797.433
Skew:                          -5.963   Prob(JB):                         0.00
Kurtosis:                      62.087   Cond. No.                         4.55
==============================================================================
_images/rednoise_pspec_slope3.png

Anisotropic 2D images can also be produced:

>>> import astropy.units as u
>>> rnoise_img = make_extended(256, powerlaw=3., ellip=0.5, theta=45 * u.deg)
>>> plt.imshow(rnoise_img)  # doctest: +SKIP
_images/rednoise_slope3_ellip_05_theta_45.png

The power-spectrum can then be calculated and fit in 1D and 2D (see Spatial Power Spectrum):

>>> rnoise_hdu = fits.PrimaryHDU(rnoise_img)  # rebuild the HDU with the anisotropic image
>>> pspec = PowerSpectrum(rnoise_hdu)
>>> pspec.run(verbose=True, radial_pspec_kwargs={'binsize': 1.0},
...           fit_kwargs={'weighted_fit': True}, fit_2D=False,
...           low_cut=1. / (60 * u.pix))  # doctest: +SKIP
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 1.122e+06
Date:                Wed, 15 Aug 2018   Prob (F-statistic):               0.00
Time:                        14:18:14   Log-Likelihood:                 526.68
No. Observations:                 177   AIC:                            -1049.
Df Residuals:                     175   BIC:                            -1043.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.7786      0.002   1295.906      0.000       2.774       2.783
x1            -2.9958      0.003  -1059.077      0.000      -3.001      -2.990
==============================================================================
Omnibus:                       35.156   Durbin-Watson:                   2.661
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              334.753
Skew:                           0.205   Prob(JB):                     2.04e-73
Kurtosis:                       9.725   Cond. No.                         4.55
==============================================================================
_images/rednoise_pspec_slope3_ellip_05_theta_45.png

Three-dimensional fields

Three-dimensional power-law fields can also be produced with make_3dfield:

>>> from turbustat.simulator import make_3dfield
>>> threeD_field = make_3dfield(128, powerlaw=3.)

Only isotropic fields can currently be created. Projections of this 3D field are shown below:

>>> plt.figure(figsize=[10, 3])  # doctest: +SKIP
>>> plt.subplot(131)  # doctest: +SKIP
>>> plt.imshow(threeD_field.mean(0), origin='lower')  # doctest: +SKIP
>>> plt.subplot(132)  # doctest: +SKIP
>>> plt.imshow(threeD_field.mean(1), origin='lower')  # doctest: +SKIP
>>> plt.subplot(133)  # doctest: +SKIP
>>> plt.imshow(threeD_field.mean(2), origin='lower')  # doctest: +SKIP
>>> plt.tight_layout()  # doctest: +SKIP
_images/rednoise_3D_slope3_projs.png

Simple PPV cubes

Also included in this module is a simple PPV cube generator. It has many restrictions and is primarily intended for creating idealized optically-thin 21-cm HI emission to test turbulent statistics in idealized conditions.

The function to create the cubes is make_ppv:

>>> from turbustat.simulator import make_3dfield, make_ppv
>>> import astropy.units as u

We need to create 3D velocity and density cubes. For this simple example, we will create small 32-pixel cubes so the example is quick to compute:

>>> velocity = make_3dfield(32, powerlaw=4., amp=1.,
...                         randomseed=98734) * u.km / u.s  # doctest: +SKIP
>>> density = make_3dfield(32, powerlaw=3., amp=1.,
...                        randomseed=328764) * u.cm**-3  # doctest: +SKIP

Both fields need to have appropriate units.

An additional issue is that a density field generated from a power-law will contain negative values. There are numerous approaches to account for negative values, including adding the minimum to force all values to be positive, or taking the exponential of the field to produce a log-normal field (Brunt & Heyer 2002, Roman-Duval et al. 2011). Here we use the approach from Ossenkopf et al. 2006, adding one standard deviation to all values and masking values that remain negative. Each of these approaches will distort the field properties to some extent.

>>> density += density.std()  # doctest: +SKIP
>>> density[density.value < 0.] = 0. * u.cm**-3  # doctest: +SKIP
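
Alternatively, taking the exponential of the field yields a strictly positive, log-normal density field, as mentioned above; the normalization of this sketch is arbitrary:

>>> import numpy as np  # doctest: +SKIP
>>> density_ln = np.exp(make_3dfield(32, powerlaw=3., amp=1.,
...                                  randomseed=328764)) * u.cm**-3  # doctest: +SKIP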

To produce a PPV cube from these fields, using the zeroth axis as the line-of-sight:

>>> cube_hdu = make_ppv(velocity, density, los_axis=0,
...                     T=100 * u.K, chan_width=0.5 * u.km / u.s,
...                     v_min=-20 * u.km / u.s,
...                     v_max=20 * u.km / u.s)  # doctest: +SKIP

We will demonstrate the cube properties using spectral-cube:

>>> from spectral_cube import SpectralCube  # doctest: +SKIP
>>> cube = SpectralCube.read(cube_hdu)  # doctest: +SKIP

The zeroth moment, integrated over the velocity axis:

>>> cube.moment0().quicklook()  # doctest: +SKIP
>>> plt.colorbar()  # doctest: +SKIP
_images/ppv_mom0.png

The velocity centroid map:

>>> cube.moment1().quicklook()  # doctest: +SKIP
>>> plt.colorbar()  # doctest: +SKIP
_images/ppv_mom1.png

And the mean spectrum, averaged over the spatial dimensions:

>>> cube.mean(axis=(1, 2)).quicklook()  # doctest: +SKIP
_images/ppv_mean_spec.png

Warning

These simulated cubes (and those from other numerical methods) suffer from “shot noise” due to the finite number of emitting sources along the line of sight. This leads to deviations from the expected behaviour for several statistics, most notably the VCA and VCS methods. See Esquivel et al. 2003 and Chepurnov & Lazarian 2009 for thorough explanations of this effect.

Useful references for making mock-HI cubes include:

Brunt & Heyer 2002

Miville-Deschenes et al. 2003

Esquivel et al. 2003

Ossenkopf et al. 2006

Chepurnov & Lazarian 2009

Roman-Duval et al. 2011

Source Code

Functions
make_3dfield(imsize[, powerlaw, amp, …]) Generate a 3D power-law field with a specified index and random phases.
make_extended(imsize[, powerlaw, theta, …]) Generate a 2D power-law image with a specified index and random phases.
make_ppv(vel_field, dens_field[, los_axis, …]) Generate a mock, optically-thin HI PPV cube from a given velocity and density field.

Statistics

Along with acting as the base classes for the distance metrics, the statistics can also return useful information about single datasets. See the Using statistics classes tutorial for more information on computing Statistics.
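For example, a statistic class can be run on a single simulated image. The following is a minimal sketch; it assumes the fitted power-spectrum index is stored in the slope attribute after run is called:

>>> from astropy.io import fits
>>> from turbustat.simulator import make_extended
>>> from turbustat.statistics import PowerSpectrum
>>> img = fits.PrimaryHDU(make_extended(128, powerlaw=3.))  # doctest: +SKIP
>>> pspec = PowerSpectrum(img)  # doctest: +SKIP
>>> pspec.run(verbose=False)  # doctest: +SKIP
>>> print(pspec.slope)  # fitted index, close to the input of -3  # doctest: +SKIP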

Distance Metrics

The distance metrics are computed using certain outputs contained in the related statistics class. See the Using distance metrics tutorial for more information on how to use Distance Metrics.

Nearly all of the distance metrics are actually “pseudo” distance metrics: \(d(A, B) = 0\) need not imply that \(A = B\). They must have the following properties:

  1. \(d(A, A) = 0\)
  2. Symmetric \(d(A, B) = d(B, A)\)
  3. Triangle Inequality \(d(A, B) \leq d(A, C) + d(B, C)\)

Here \(A\) and \(B\) represent two datasets (either a PPV datacube, column density map, or an associated moment of the cube).

For two datasets with different physical properties, a good statistic will return a large value \(d(A, B) \gg 0\). If the datasets have similar physical properties, the distance should be small \(d(A, B) \approx 0\). Clear examples of the distance metric properties and the distinction between large and small distances are shown in Boyden et al. 2016 and Boyden et al. 2018.

Additionally, the statistics should ideally be insensitive to spatial shifts \(d\left( A\left[ x,y,v \right], A\left[ x+\delta x,y,v \right] \right)=0\) and independent of the noise level (for observational data) \(d\left( A + \mathcal{N}\left(0, \sigma_1^2 \right), A + \mathcal{N}\left(0, \sigma_2^2 \right) \right) \approx 0\).
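As a quick sketch of these properties (assuming the computed value is stored in the distance attribute once distance_metric has been run), the spatial power spectrum distance between two simulated images should be symmetric, and should be large when their power-law indices differ:

>>> from astropy.io import fits
>>> from turbustat.simulator import make_extended
>>> from turbustat.statistics import PSpec_Distance
>>> img_A = fits.PrimaryHDU(make_extended(128, powerlaw=3.))  # doctest: +SKIP
>>> img_B = fits.PrimaryHDU(make_extended(128, powerlaw=4.))  # doctest: +SKIP
>>> dist_AB = PSpec_Distance(img_A, img_B)  # doctest: +SKIP
>>> dist_AB.distance_metric()  # doctest: +SKIP
>>> dist_BA = PSpec_Distance(img_B, img_A)  # doctest: +SKIP
>>> dist_BA.distance_metric()  # doctest: +SKIP
>>> print(dist_AB.distance, dist_BA.distance)  # d(A, B) = d(B, A)  # doctest: +SKIP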

Source Code

turbustat.statistics Package

Functions
BiSpectrum(*args, **kwargs) Old name for the Bispectrum class.
BiSpectrum_Distance(*args, **kwargs) Old name for the Bispectrum_Distance class.
DendroDistance(*args, **kwargs) Old name for the Dendrogram_Distance class.
GenusDistance(*args, **kwargs) Old name for the Genus_Distance class.
WidthEstimate1D(inList[, method]) Find widths from spectral eigenvectors.
WidthEstimate2D(inList[, method, noise_ACF, …]) Estimate spatial widths from a set of autocorrelation images.
Classes
Bispectrum(img) Computes the bispectrum (three-point correlation function) of the given image (Burkhart et al., 2010).
Bispectrum_Distance(data1, data2[, stat_kwargs]) Calculate the distance between two images based on their bicoherence.
Cramer_Distance(cube1, cube2[, …]) Compute the Cramer distance between two data cubes.
DeltaVariance(img[, header, weights, …]) The delta-variance technique as described in Ossenkopf et al.
DeltaVariance_Distance(dataset1, dataset2[, …]) Compares 2 datasets using delta-variance.
Dendrogram_Distance(dataset1, dataset2[, …]) Calculate the distance between 2 cubes using dendrograms.
Dendrogram_Stats(data[, header, min_deltas, …]) Dendrogram statistics as described in Burkhart et al.
Genus(img[, min_value, max_value, …]) Genus Statistics based off of Chepurnov et al.
Genus_Distance(img1, img2[, …]) Distance Metric for the Genus Statistic.
Lm_Seg(x, y, brk[, weights])
MVC(centroid, moment0, linewidth[, header, …]) Implementation of Modified Velocity Centroids (Lazarian & Esquivel, 03)
MVC_Distance(data1, data2[, …]) Distance metric for MVC.
PCA(cube[, n_eigs, distance]) Implementation of Principal Component Analysis (Heyer & Brunt, 2002)
PCA_Distance(cube1, cube2[, n_eigs, …]) Compare two data cubes based on the eigenvalues of the PCA decomposition.
PDF(img[, min_val, bins, weights, …]) Create the PDF of a given array.
PDF_Distance(img1, img2[, min_val1, …]) Calculate the distance between two arrays using their PDFs.
PSpec_Distance(data1, data2[, weights1, …]) Distance metric for the spatial power spectrum.
PowerSpectrum(img[, header, weights, …]) Compute the power spectrum of a given image.
SCF(cube[, header, size, roll_lags, distance]) Computes the Spectral Correlation Function of a data cube (Rosolowsky et al, 1999).
SCF_Distance(cube1, cube2[, size, boundary, …]) Calculates the distance between two data cubes based on their SCF surfaces.
StatMoments(img[, header, weights, radius, …]) Statistical Moments of an image.
StatMoments_Distance(image1, image2[, …]) Compute the distance between two images based on their moments.
Tsallis(img[, header, lags, distance]) The Tsallis Distribution (see Tofflemire et al., 2011)
VCA(cube[, header, distance, beam, …]) The VCA technique (Lazarian & Pogosyan, 2004).
VCA_Distance(cube1, cube2[, channel_width, …]) Calculate the distance between two cubes using VCA.
VCS(cube[, header]) The VCS technique (Lazarian & Pogosyan, 2004).
VCS_Distance(cube1, cube2[, breaks, …]) Calculate the distance between two cubes using VCS.
Wavelet(data[, header, scales, num, distance]) Compute the wavelet transform of a 2D array.
Wavelet_Distance(dataset1, dataset2[, …]) Compute the distance between the two cubes using the Wavelet transform.

Contributing to TurbuStat

We welcome new contributions to TurbuStat!

To make changes to the source code, make a fork of the TurbuStat repository by pressing the button on the top right-hand side of the page. Changes to the code should be made in a new branch, pushed to your fork of the repository, and submitted with a pull request to the main TurbuStat repository. For more information and a detailed guide to using git, please see the astropy workflow page.

Contributing a new statistic or distance metric

Increasing the number of statistics implemented in TurbuStat is a key goal for the project. If you plan on providing a new implementation, the contribution should address the following criteria:

  1. Statistics and distance metrics should each be implemented as a Python class. A new folder can be added to the statistics folder for the new classes. Note that the new folder should contain an __init__.py that imports the statistic and/or distance metric classes (see here for an example).
  2. Statistics should inherit from the BaseStatisticMixIn class. This mixin class handles unit conversions, extracts the required information from the FITS header, and allows the class to be saved and loaded.
  3. Statistic classes should accept the required data products in __init__. At a minimum, the class should require the data and a FITS header. Other parameters that define the data or the observed object (e.g., distance) can also be specified here. Functions within the class should take the data and compute the statistic. For example, the spectral correlation function (SCF) has separate functions to create the SCF surface, to compute an azimuthally-averaged one-dimensional spectrum, to fit the two-dimensional and one-dimensional spectra, and to make a summary plot. A minimal skeleton illustrating this layout is sketched after this list.
  4. All statistic classes should have a run function that computes the statistic and optionally returns a summary plot (e.g., for the SCF). The run function should use sensible defaults, but should have keyword arguments defined for the most important parameters.
  5. Distance classes should accept the required data for two data sets in __init__. The statistic class should be run on both data sets and keyword arguments should allow the important parameters to be passed to the statistic class (e.g., for the SCF distance). The distance class should then have a distance_metric function defined (for example) that computes the distance metric and optionally returns a summary plot of the statistic run on both of the data sets.
  6. New statistics and distance metrics should have documentation explaining each argument, keyword argument, parameter, and function in the class. TurbuStat follows the astropy docstring rules.
  7. New statistics and distance metrics must have consistency tests. There are two types of consistency tests in TurbuStat: (i) unit tests against the small test data included in the package (see here); an output from the statistic and distance metric is saved when the GetVals.py script is run, and this type of test determines whether new additions to the code, or updates to dependent packages, change the statistic’s output. (ii) Simple model tests where the statistic’s output is known; for example, the power spectrum can be tested by generating an fBm image with a specified index (e.g., see this test).
  8. New statistics and distance metrics should have a tutorial for users (e.g., see this tutorial). Ideally, the tutorial will use the data sets already used in the TurbuStat tutorials. If these data are not useful for demonstrating the new statistic, we ask that the tutorial data be (i) small so the computation is relatively quick to run and (ii) publicly-available so users can reproduce the tutorial results.

Questions and feedback

Contact the developers at the email on this page with questions about contributing to TurbuStat. We also encourage you to open a pull request even if the code is unfinished (add “WIP” to the title) and ask the developers for feedback on the new contribution.
