Introduction

Neuroglia is a suite of scikit-learn transformers for converting between the canonical data structures used in ephys & ophys:

  • Spike times: a list of timestamps, each labelled with the neuron that elicited the spike
  • Traces: a 2D array of neurons × time, a.k.a. a "time series", e.g. calcium traces from 2P imaging, binned spike times, or Gaussian-smoothed spike times
  • Tensor: a 3D array of traces aligned to events (events × neurons × time)
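To make these three structures concrete, here is a minimal sketch using numpy and pandas. The column names for the spike-times table ('time', 'neuron') are illustrative, not necessarily neuroglia's exact schema:

```python
import numpy as np
import pandas as pd

# Spike times: one row per spike, labelled with the neuron that fired it
spikes = pd.DataFrame({
    'time': [0.01, 0.02, 0.02, 0.05],    # seconds
    'neuron': ['n1', 'n2', 'n1', 'n3'],
})

# Traces: 2D array, neurons x time (here 3 neurons, 100 time bins)
traces = np.zeros((3, 100))

# Tensor: 3D array, events x neurons x time (here 10 events)
tensor = np.zeros((10, 3, 100))

print(spikes.shape, traces.shape, tensor.shape)
```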

scikit-learn transformers

Transformations between these representations are implemented as scikit-learn transformers. This means that they are all defined as objects with "fit" and "transform" methods, so that, for example, applying Gaussian smoothing to a population of spiking data means transforming from a "spike times" structure to a "traces" structure like so:

import numpy as np
import neuroglia

smoother = neuroglia.spike.Smoother(
    sample_times=np.arange(0, MAX_TIME, 0.001),  # <- the time base the smoothed traces will be cast onto
    kernel='gaussian',  # <- the smoothing kernel
    tau=0.005,  # <- the kernel width, in the same time base as the spike times
)

smoothed_traces = smoother.fit_transform(SPIKES)
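Conceptually, the smoothing step places a Gaussian bump at each spike and evaluates the sum on the requested time base. A standalone sketch of that idea for a single neuron, using scipy (this is an illustration of the concept, not neuroglia's implementation):

```python
import numpy as np
from scipy.stats import norm

# Toy spike times for one neuron (seconds)
spike_times = np.array([0.010, 0.012, 0.050])

# Time base the smoothed trace is cast onto (1 ms bins)
sample_times = np.arange(0.0, 0.1, 0.001)
tau = 0.005  # kernel width (seconds)

# Sum a Gaussian centered on each spike, evaluated at every sample time
trace = norm.pdf(sample_times[:, None], loc=spike_times, scale=tau).sum(axis=1)

print(trace.shape)  # one value per sample time
```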

Conforming to the syntax expected by the scikit-learn API turns these transformers into building blocks that can plug into a scikit-learn pipeline. For example, let's say you wanted to do some dimensionality reduction on the smoothed traces:

from sklearn.decomposition import NMF

nmf = NMF(n_components=10)
reduced_traces = nmf.fit_transform(smoothed_traces)
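Since the smoothed traces may not be handy, here is the same call on a synthetic stand-in, assuming rows are time samples and columns are neurons (the samples × features orientation scikit-learn expects). NMF requires non-negative input, which smoothed spike traces naturally are:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
fake_traces = rng.random((1000, 50))  # 1000 time samples x 50 neurons

nmf = NMF(n_components=10, init='random', random_state=0, max_iter=200)
reduced = nmf.fit_transform(fake_traces)

print(reduced.shape)  # 1000 time samples x 10 components
```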

machine learning pipelines

You could also chain these together like so:

from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('smooth', smoother),
    ('reduce', nmf),
])

reduced_traces = pipeline.fit_transform(SPIKES)

And if you wanted to change an analysis step, it just becomes a matter of replacing that piece of the pipeline:

from sklearn.decomposition import PCA

pipeline = Pipeline([
    ('smooth', smoother),
    ('reduce', PCA(n_components=10)),
])

reduced_traces = pipeline.fit_transform(SPIKES)

event-aligned responses

I've also implemented annotating events with event-aligned responses, so I can build an entire decoding pipeline that decodes which stimulus was presented to a population from (for example) the peak response in 10 ms bins within a 250 ms window after stimulus onset:

from neuroglia.event import PeriEventSpikeSampler
from neuroglia.spike import Binner
from neuroglia.tensor import ResponseReducer
from sklearn.neighbors import KNeighborsClassifier

pipeline = Pipeline([
    ('sample', PeriEventSpikeSampler(
        spikes=SPIKES,
        sample_times=np.arange(0.0, 0.25, 0.01),
        tracizer=Binner,
    )),
    ('reduce', ResponseReducer(method='max')),
    ('classify', KNeighborsClassifier()),
])
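To see what the middle stage is doing, here is a standalone numpy sketch of reducing an event-aligned tensor with a max over time and feeding the result to a classifier. The shapes and labels are synthetic stand-ins; the point is that collapsing the time axis yields one peak response per neuron per event:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
tensor = rng.random((40, 20, 25))     # 40 events x 20 neurons x 25 time bins
labels = rng.integers(0, 4, size=40)  # stimulus identity for each event

# Collapse the time axis: peak response per neuron per event
peak_responses = tensor.max(axis=-1)  # -> (40 events, 20 neurons)

knn = KNeighborsClassifier().fit(peak_responses, labels)
print(knn.score(peak_responses, labels))
```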

cross validation of an entire pipeline

Then, once this pipeline has been defined, we can take advantage of scikit-learn's cross-validation infrastructure to run a 4-fold cross validation across stimulus presentations:

from sklearn.model_selection import cross_val_score

X = EVENTS['times']
y = EVENTS['image_id']

scores = cross_val_score(pipeline, X, y, cv=4)
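The same call works on any estimator that follows the scikit-learn API. A self-contained illustration with synthetic per-event features standing in for the neuroglia pipeline and event times, showing that `cv=4` returns one score per fold:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((80, 20))         # 80 events x 20 features
y = rng.integers(0, 4, size=80)  # an image_id for each event

scores = cross_val_score(KNeighborsClassifier(), X, y, cv=4)
print(scores.shape)  # one accuracy score per fold
```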

These examples illustrate the major features of the package & how the API works.