API

class stella.DownloadSets(fn_dir=None, flare_catalog_name=None)

Downloads the flare catalog and light curves needed for the CNN training, test, and validation sets. This class also reformats the light curves into .npy files and removes the associated FITS files to save space.

download_catalog()

Downloads the flare catalog using Vizier. The flare catalog is named ‘Guenther_2020_flare_catalog.txt’. The star catalog is named ‘Guenther_2020_star_catalog.txt’.

flare_table

Flare catalog that was downloaded.

Type

astropy.table.Table

download_lightcurves(remove_fits=True)

Downloads light curves for the training, validation, and test sets.

Parameters

remove_fits (bool, optional) – Allows the user to remove the TESS light curve FITS files when done. This will save space. Default is True.

download_models(all_models=False)

Downloads the stella pre-trained convolutional neural network models from MAST.

Parameters

all_models (bool, optional) – Determines whether the attribute models holds all 100 trained models or only the 10 models used in the Feinstein et al. (2020) analysis. Default is False.

model_dir

Path to where the CNN models have been downloaded.

Type

str

models

Array of model filenames.

Type

np.array

class stella.FlareDataSet(fn_dir=None, catalog=None, downloadSet=None, cadences=200, frac_balance=0.73, training=0.8, validation=0.9)

Given a directory of files, reformat data to create a training set for the convolutional neural network. Files must be in ‘.npy’ file format and contain at minimum the following indices:

  • 0th index = array of time

  • 1st index = array of flux

  • 2nd index = array of flux errors

All other indices in the files are ignored. This class additionally requires a catalog of flare start times for labeling. The flare catalog can be in either ‘.txt’ or ‘.csv’ file format. An instance of this class is passed into the stella.ConvNN class to create and train the neural network.

load_files(id_keyword='TIC', ft_keyword='tpeak', time_offset=2457000.0)

Loads in light curves from the assigned training set directory. Files must be formatted such that the ID of each star is first and followed by ‘_’ (e.g. 123456789_sector09.npy).
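A minimal sketch of a file that satisfies the layout and naming convention described above. It uses only NumPy; the synthetic values, the temporary directory, and the variable names are illustrative assumptions and are not part of stella.

```python
import os
import tempfile

import numpy as np

# Synthetic light curve: 2-minute cadence in days, flat flux with small noise.
rng = np.random.default_rng(0)
time = 1543.0 + np.arange(1000) * (2.0 / 60.0 / 24.0)
flux = 1.0 + 1e-3 * rng.standard_normal(time.size)
flux_err = np.full(time.size, 1e-3)

# Index 0 = time, index 1 = flux, index 2 = flux errors.
light_curve = np.vstack([time, flux, flux_err])

# The star ID comes first in the filename, followed by an underscore.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "123456789_sector09.npy")
np.save(path, light_curve)

# Reading it back: recover the ID from the filename and the arrays by index.
loaded = np.load(path)
star_id = int(os.path.basename(path).split("_")[0])
```

Any additional rows stacked after the third would be ignored according to the description above.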

times

An n-dimensional array of times, where n is the number of training set files.

Type

np.ndarray

fluxes

An n-dimensional array of fluxes, where n is the number of training set files.

Type

np.ndarray

flux_errs

An n-dimensional array of flux errors, where n is the number of training set files.

Type

np.ndarray

ids

An array of light curve IDs for each time/flux/flux_err. This is essential for labeling flare events.

Type

np.array

id_keyword

The column header in catalog to identify target ID. Default is ‘TIC’.

Type

str, optional

ft_keyword

The column header in catalog to identify flare peak time. Default is ‘tpeak’.

Type

str, optional

time_offset

Time correction applied to convert flare catalog times to the light curve time system. This is necessary when using Max Guenther’s catalog. Default is 2457000.0.

Type

float, optional
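As an illustration of what the time offset is for, the sketch below shifts catalog peak times given in full BJD onto a light curve whose times are stored as BJD - 2457000. The catalog values and array names are hypothetical, and stella's own matching logic may differ.

```python
import numpy as np

# Hypothetical catalog columns: target IDs and flare peak times in full BJD.
catalog_ids = np.array([123456789, 123456789, 987654321])
catalog_tpeak_bjd = np.array([2458545.25, 2458550.75, 2458546.5])

time_offset = 2457000.0  # default documented above

# Light-curve times for one star, already stored as BJD - 2457000.
star_id = 123456789
lc_time = np.linspace(1543.0, 1548.0, 501)

# Select this star's flares and shift them onto the light-curve time system.
tpeak = catalog_tpeak_bjd[catalog_ids == star_id] - time_offset

# Keep only peaks that fall inside the observed time span.
in_span = (tpeak >= lc_time.min()) & (tpeak <= lc_time.max())
tpeak_in_lc = tpeak[in_span]
```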

reformat_data(random_seed=321)

Reformats the data into cadences-sized arrays and assigns a label to each based on the flare times defined in the catalog.

Parameters

random_seed (int, optional) – A random seed to set for randomizing the order of the training_matrix after it is constructed. Default is 321.
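The general idea of windowing, labeling, and seeded shuffling can be sketched with NumPy alone. This is a simplified illustration under stated assumptions (non-overlapping windows, a label of 1 when a catalog peak falls inside the window); stella's actual windowing and labeling rules may differ.

```python
import numpy as np

cadences = 200      # window length, as in the FlareDataSet default
random_seed = 321   # default documented above

# Synthetic light curve with one injected spike standing in for a flare.
time = np.arange(2000) * (2.0 / 60.0 / 24.0)
flux = np.ones(time.size)
flux[730:740] += 0.05
tpeaks = np.array([time[730]])  # hypothetical catalog peak time

# Cut the flux into non-overlapping windows of `cadences` points.
n_windows = time.size // cadences
windows = flux[: n_windows * cadences].reshape(n_windows, cadences)
window_times = time[: n_windows * cadences].reshape(n_windows, cadences)

# Label a window 1 if any catalog peak time falls inside it, else 0.
labels = np.array([
    int(np.any((tpeaks >= t[0]) & (tpeaks <= t[-1]))) for t in window_times
])

# Shuffle rows and labels together with a fixed seed for reproducibility.
order = np.random.default_rng(random_seed).permutation(n_windows)
training_matrix = windows[order]
labels = labels[order]
```

Using the same permutation for rows and labels keeps each label attached to its window after shuffling.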

training_matrix

An n x cadences-sized array used as the training data.

Type

np.ndarray

labels

An n-sized array of labels for each row in the training data.

Type

np.array

class stella.ConvNN(output_dir, ds=None, layers=None, optimizer='adam', loss='binary_crossentropy', metrics=None)

Creates and trains the convolutional neural network.

calibration(df, metric_threshold)

Transforms the rankings output by the CNN into actual probabilities. This can only be run for an ensemble of models.

Parameters
  • df (astropy.table.Table) – Table of output predictions from the validation set.

  • metric_threshold (float) – Defines the ranking above which something is considered a flare.

create_model(seed)

Creates the Tensorflow keras model with appropriate layers.

model
Type

tensorflow.python.keras.engine.sequential.Sequential

cross_validation(seed=2, epochs=350, batch_size=64, n_splits=5, shuffle=False, pred_test=False, save=False)

Performs cross validation for a given number of K-folds. Reassigns the training and validation sets for each fold.

Parameters
  • seed (int, optional) – Sets random seed for creating CNN model. Default is 2.

  • epochs (int, optional) – Number of epochs to run each folded model on. Default is 350.

  • batch_size (int, optional) – The batch size for training. Default is 64.

  • n_splits (int, optional) – Number of folds to perform. Default is 5.

  • shuffle (bool, optional) – Allows for shuffling in sklearn.model_selection.KFold. Default is False.

  • pred_test (bool, optional) – Allows for predicting on the test set. DO NOT SET TO TRUE UNTIL YOU ARE HAPPY WITH YOUR FINAL MODEL. Default is False.

  • save (bool, optional) – Allows the user to save the kfolds table of predictions. Default is False.
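To show how the training and validation sets are reassigned for each fold, here is a NumPy-only sketch of unshuffled K-fold splitting. It mirrors the contiguous folds that a K-fold splitter produces when shuffle is False, but it is an illustration and not the code stella runs; the sample count is arbitrary.

```python
import numpy as np

n_samples, n_splits = 23, 5
indices = np.arange(n_samples)

# Contiguous, unshuffled folds; earlier folds absorb any remainder.
folds = np.array_split(indices, n_splits)

splits = []
for k, val_idx in enumerate(folds):
    # Every sample outside the current fold is used for training.
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    splits.append((train_idx, val_idx))

fold_sizes = [len(val) for _, val in splits]
```

Each sample appears in exactly one validation fold, so every example receives one out-of-fold prediction.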

crossval_predval

Table of predictions on the validation set from each fold.

Type

astropy.table.Table

crossval_predtest

Table of predictions on the test set from each fold. ONLY EXISTS IF PRED_TEST IS TRUE.

Type

astropy.table.Table

crossval_histories

Table of history values from the model run on each fold.

Type

astropy.table.Table

load_model(modelname, mode='validation')

Loads an already created model.

Parameters
  • modelname (str) –

  • mode (str, optional) –

predict(modelname, times, fluxes, errs, multi_models=False, injected=False)

Takes in arrays of time and flux and predicts where the flares are based on the keras model created and trained.

Parameters
  • modelname (str) – Path and filename of a model to load.

  • times (np.ndarray) – Array of times to predict flares in.

  • fluxes (np.ndarray) – Array of fluxes to predict flares in.

  • errs (np.ndarray) – Array of flux errors for predicted flares.

  • injected (bool, optional) – Returns predictions instead of setting attribute. Used for injection-recovery. Default is False.

model

The model input with modelname.

Type

tensorflow.python.keras.engine.sequential.Sequential

predict_time

The input times array.

Type

np.ndarray

predict_flux

The input fluxes array.

Type

np.ndarray

predict_err

The input flux errors array.

Type

np.ndarray

predictions

An array of predictions from the model.

Type

np.ndarray

train_models(seeds=[2], epochs=350, batch_size=64, shuffle=False, pred_test=False, save=False)

Trains n models, one for each of the n initial random seeds given. Also saves each model run to a hidden ~/.stella directory.

Parameters
  • seeds (np.array) – Array of random seed starters of length n, where n is the number of models you want to run.

  • epochs (int, optional) – Number of epochs to train for. Default is 350.

  • batch_size (int, optional) – Setting the batch size for the training. Default is 64.

  • shuffle (bool, optional) – Allows for shuffling of the training set when fitting the model. Default is False.

  • pred_test (bool, optional) – Allows for predictions on the test set. DO NOT SET TO TRUE UNTIL YOU’VE DECIDED ON YOUR FINAL MODEL. Default is False.

  • save (bool, optional) – Saves the predictions and histories from each model in an ascii table to the specified output directory. Default is False.

history_table

Saves the metric values for each model run.

Type

astropy.table.Table

val_pred_table

Predictions on the validation set from each run.

Type

astropy.table.Table

test_pred_table

Predictions on the test set from each run. Must set pred_test = True, or else it is an empty table.

Type

astropy.table.Table

class stella.FitFlares(id, time, flux, flux_err, predictions)

Uses the predictions from the neural network and identifies flaring events based on consecutive points. Users define a given probability threshold for accepting a flare event as real.

get_init_guesses(groupings, time, flux, err, prob, maskregion, region)

Guesses at the initial t0 and amplitude based on probability groups.

Parameters
  • groupings (np.ndarray) – Group of indices for a single flare event.

  • time (np.array) –

  • flux (np.array) –

  • err (np.array) –

  • prob (np.array) –

Returns

  • tpeaks (np.ndarray) – Array of tpeaks for each flare group.

  • amps (np.ndarray) – Array of amplitudes at each tpeak.

group_inds(values)

Groups regions marked as flares (> prob_threshold) for flare fitting. Indices within 4 of each other are grouped as one flare.

Returns

results – An array of arrays, each a group of indices presumed to belong to a single flare.

Return type

np.ndarray
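The grouping rule described above (indices within 4 of each other form one flare) can be sketched as follows. The helper name group_by_gap is hypothetical, and treating a gap of exactly 4 as part of the same group is an assumption.

```python
import numpy as np

def group_by_gap(values, max_gap=4):
    """Split sorted indices into groups wherever the gap exceeds max_gap."""
    values = np.asarray(values)
    if values.size == 0:
        return []
    # Positions where consecutive indices are more than max_gap apart.
    breaks = np.where(np.diff(values) > max_gap)[0] + 1
    return np.split(values, breaks)

# Indices flagged above the probability threshold (made-up values).
flagged = np.array([10, 11, 12, 15, 30, 31, 40])
groups = [g.tolist() for g in group_by_gap(flagged)]
```

Here the first four indices form one group, because no gap between neighbours exceeds 4, while 30-31 and 40 are split off as separate events.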

identify_flare_peaks(threshold=0.5)

Finds where the predicted value is above the threshold as a flare candidate. Groups consecutive indices as one flaring event.

Parameters

threshold (float, optional) – The probability threshold for believing an event is a flare. Default is 0.5.

threshold
Type

float

flare_table

A table of flare times, amplitudes, and equivalent durations. Equivalent duration given in units of days.

Type

astropy.table.Table
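A rough sketch of how a peak time, amplitude, and equivalent duration could be derived from thresholded predictions. The synthetic light curve, the median normalization, and the trapezoidal integration are assumptions made for illustration; they are not taken from stella's source.

```python
import numpy as np

# Synthetic light curve in days with a triangular bump standing in for a flare.
time = np.arange(200) * (2.0 / 60.0 / 24.0)
flux = np.ones(time.size)
flux[100:111] += 0.1 * (1.0 - np.abs(np.arange(-5, 6)) / 5.0)

# Hypothetical per-cadence predictions: high only over the bump.
predictions = np.zeros(time.size)
predictions[100:111] = 0.9

threshold = 0.5
candidate = np.where(predictions > threshold)[0]

# Peak time and amplitude from the brightest flagged cadence.
peak = candidate[np.argmax(flux[candidate])]
tpeak = time[peak]
amp = flux[peak] / np.median(flux) - 1.0

# Equivalent duration: trapezoidal integral of the relative flux excess
# over the flagged cadences; the result carries the units of `time` (days).
excess = flux[candidate] / np.median(flux) - 1.0
t = time[candidate]
equiv_dur = np.sum(0.5 * (excess[1:] + excess[:-1]) * np.diff(t))
```

Because the integrand is dimensionless, the equivalent duration inherits the time units, which is why it is reported in days when the light curve times are in days.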

class stella.MeasureProt(IDs, time, flux, flux_err)

Used for measuring rotation periods.

assign_flag(period, power, width, avg, secpow, maxperiod, orbit_flag=0)

Assigns a flag in the table for which periods are reliable.

averaged_per_sector(tab)

Looks at targets observed in different sectors and determines which measured period is likely the best period. Adds a ‘true_period_days’ column to MeasureProt.LS_results for the results.

Returns

Return type

astropy.table.Table

chiSquare(var, mu, x, y, yerr)

Calculates chi-square for fitting a Gaussian to the peak of the LS periodogram.

Parameters
  • var (list) – Variables to fit (std and scale for Gaussian curve).

  • mu (float) – Mean to fit around.

  • x (np.array) –

  • y (np.array) –

  • yerr (np.array) –

Returns

Return type

chi-square value.

fit_LS_peak(period, power, arg)

Fits the LS periodogram at the peak power.

Parameters
  • period (np.array) – Array of periods from Lomb Scargle routine.

  • power (np.array) – Array of powers from the Lomb Scargle routine.

  • arg (int) – Argmax of the power in the periodogram.

Returns

popt – Array of best fit values for Gaussian fit.

Return type

np.array

gauss_curve(x, std, scale, mu)

Fits a Gaussian to the peak of the LS periodogram.

Parameters
  • x (np.array) –

  • std (float) – Standard deviation of gaussian.

  • scale (float) – Scaling for gaussian.

  • mu (float) – Mean to fit around.

Returns

Return type

Gaussian curve.
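For reference, a Gaussian model and its chi-square statistic can be written as below. The functions follow the documented argument order, but their bodies are assumptions: in particular, scale is treated as the peak height of an unnormalized Gaussian, which may not match stella's definition.

```python
import numpy as np

def gaussian(x, std, scale, mu):
    # Unnormalized Gaussian: `scale` is the peak height (an assumption).
    return scale * np.exp(-0.5 * ((x - mu) / std) ** 2)

def chi_square(var, mu, x, y, yerr):
    # var holds the free parameters (std, scale); mu is held fixed.
    std, scale = var
    model = gaussian(x, std, scale, mu)
    return np.sum(((y - model) / yerr) ** 2)

# Synthetic periodogram peak centred on a 3-day period.
x = np.linspace(2.0, 4.0, 41)
y = gaussian(x, std=0.2, scale=0.8, mu=3.0)
yerr = np.full(x.size, 0.01)

chi2_true = chi_square([0.2, 0.8], 3.0, x, y, yerr)
chi2_off = chi_square([0.4, 0.8], 3.0, x, y, yerr)
```

Holding mu fixed and varying only std and scale matches the parameter split in the chiSquare signature, where var carries the fitted values and mu is passed separately.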

phase_lightcurve(table=None, trough=-0.5, peak=0.5, kernel_size=101)

Finds and creates a phase light curve that traces the spots. Uses only complete rotations and extrapolates outwards until the entire light curve is covered.

Parameters
  • table (astropy.table.Table, optional) – Used for getting the periods of each light curve. Allows users to use already created tables. Default = None, in which case the LS_results attribute is used.

  • trough (float, optional) – Sets the phase value at the minimum. Default = -0.5.

  • peak (float, optional) – Sets the phase value at the maximum. Default = 0.5.

  • kernel_size (odd float, optional) – Sets kernel size for median filter smoothing. Default = 101.

phases
Type

np.ndarray

run_LS(minf=0.08, maxf=10.0, spp=50)

Runs LS fit for each light curve.

Parameters
  • minf (float, optional) – The minimum frequency to search in the LS routine. Default = 0.08 (1/12.5).

  • maxf (float, optional) – The maximum frequency to search in the LS routine. Default = 1/0.1.

  • spp (int, optional) – The number of samples per peak. Default = 50.
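Since the search limits are given as frequencies, it can help to see the period range they imply. The sketch below assumes the light curve times are in days, so frequencies are in 1/day; the evenly spaced grid is hypothetical and is not how the LS routine necessarily samples frequencies.

```python
import numpy as np

minf, maxf = 0.08, 10.0  # defaults from the run_LS signature

# Frequency and period are reciprocals, so the limits swap roles.
max_period = 1.0 / minf  # longest period searched
min_period = 1.0 / maxf  # shortest period searched

# A hypothetical evenly spaced frequency grid and its period counterpart.
freq = np.linspace(minf, maxf, 1000)
period = 1.0 / freq
```

With the defaults, the search covers periods from about 0.1 to 12.5 in the time units of the light curve.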

LS_results
Type

astropy.table.Table

class stella.Visualize(cnn, set='validation')

Creates diagnostic plots for the neural network.

confusion_matrix(threshold=0.5, colormap='inferno')

Plots the confusion matrix of true positives, true negatives, false positives, and false negatives.

Parameters
  • threshold (float, optional) – Defines the threshold for positive vs. negative cases. Default is 0.5 (50%).

  • colormap (str, optional) – Colormap to draw colors from to plot the light curves on the confusion matrix. Default is ‘inferno’.
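The four counts behind a confusion matrix follow directly from the threshold. The sketch below uses made-up labels and predictions; whether a prediction exactly equal to the threshold counts as positive is an assumption.

```python
import numpy as np

labels = np.array([1, 1, 1, 0, 0, 0, 0, 1])
predictions = np.array([0.9, 0.6, 0.3, 0.1, 0.7, 0.2, 0.4, 0.8])
threshold = 0.5

predicted_positive = predictions >= threshold
actual_positive = labels == 1

tp = int(np.sum(predicted_positive & actual_positive))    # flares found
fp = int(np.sum(predicted_positive & ~actual_positive))   # false alarms
fn = int(np.sum(~predicted_positive & actual_positive))   # flares missed
tn = int(np.sum(~predicted_positive & ~actual_positive))  # quiet, correctly so
```

Raising the threshold generally trades false positives for false negatives, which is why the threshold is exposed as a parameter.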

loss_acc(train_color='k', val_color='darkorange')

Plots the loss & accuracy curves for the training and validation sets.

Parameters
  • train_color (str, optional) – Color to plot the training set in. Default is black.

  • val_color (str, optional) – Color to plot the validation set in. Default is dark orange.

precision_recall(**kwargs)

Plots the ensemble-averaged precision-recall metric.

Parameters

**kwargs (dictionary, optional) – Dictionary of parameters to pass into matplotlib.
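As a sketch of what ensemble averaging means for precision and recall, the example below averages hypothetical predictions from three models and scores the result at a single threshold. The numbers are invented, and stella may compute or plot the metric differently.

```python
import numpy as np

labels = np.array([1, 1, 0, 0, 1, 0])

# Hypothetical predictions from three models (rows) on six examples (columns).
model_preds = np.array([
    [0.9, 0.4, 0.2, 0.6, 0.8, 0.1],
    [0.8, 0.5, 0.1, 0.7, 0.7, 0.3],
    [0.7, 0.3, 0.3, 0.5, 0.9, 0.2],
])

# Average across models, then threshold the averaged prediction.
ensemble = model_preds.mean(axis=0)
predicted = ensemble > 0.5

tp = np.sum(predicted & (labels == 1))
fp = np.sum(predicted & (labels == 0))
fn = np.sum(~predicted & (labels == 1))

precision = tp / (tp + fp)  # fraction of flagged examples that are real
recall = tp / (tp + fn)     # fraction of real examples that were flagged
```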