.. _introduction:

Introduction
============

Neuroglia is a suite of scikit-learn transformers that facilitate converting between the canonical data structures used in ephys & ophys (each sketched below):

- Spike times: a list of timestamps, each labelled with the neuron that elicited the spike
- Traces: a 2D array of neurons x time, also known as a "time series" (e.g., calcium traces from 2P imaging, binned spike times, or Gaussian-smoothed spike times)
- Tensor: a 3D array of traces aligned to events (events x neurons x time)
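
To make these structures concrete, here is a minimal sketch of each one. The column names (``time``, ``neuron``) and the pandas/NumPy layouts are illustrative assumptions, not the package's guaranteed conventions; consult the neuroglia documentation for the exact layouts it expects. ::

    import numpy as np
    import pandas as pd

    # Spike times: one row per spike, labelled with the neuron that fired.
    # (Column names here are assumptions for illustration.)
    spikes = pd.DataFrame({
        'time': [0.002, 0.011, 0.027, 0.031],  # spike timestamps
        'neuron': ['n1', 'n2', 'n1', 'n3'],    # which neuron elicited each spike
    })

    # Traces: a 2D neurons x time array, e.g. 3 neurons sampled every 1 ms.
    sample_times = np.arange(0.0, 0.05, 0.001)
    traces = np.zeros((3, len(sample_times)))

    # Tensor: a 3D events x neurons x time array of event-aligned traces,
    # e.g. the responses of the same 3 neurons around 10 events.
    tensor = np.zeros((10, 3, len(sample_times)))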

scikit-learn transformers
-------------------------

Transformations between these representations are implemented as scikit-learn transformers. This means that they are all defined as objects with "fit" and "transform" methods, so that, for example, applying Gaussian smoothing to a population of spiking data means transforming from a "spike times" structure to a "traces" structure like so::

    import numpy as np
    import neuroglia.spike

    smoother = neuroglia.spike.Smoother(
        sample_times=np.arange(0, MAX_TIME, 0.001),  # <- this is the time base that the smoothed traces will be cast onto
        kernel='gaussian',  # <- this is the kernel that will be used
        tau=0.005,  # <- this is the width of the kernel, in the same time base as the spike times
    )
    smoothed_traces = smoother.fit_transform(SPIKES)

Conforming to the syntax expected by the scikit-learn API turns these transformers into building blocks that can plug into a scikit-learn pipeline. For example, let's say you wanted to do some dimensionality reduction on the smoothed traces::

    from sklearn.decomposition import NMF

    nmf = NMF(n_components=10)
    reduced_traces = nmf.fit_transform(smoothed_traces)

machine learning pipelines
--------------------------

You could also chain these steps together like so::

    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ('smooth', smoother),
        ('reduce', nmf),
    ])

    reduced_traces = pipeline.fit_transform(SPIKES)

And if you want to change an analysis step, it becomes just a matter of replacing that piece of the pipeline::

    from sklearn.decomposition import PCA

    pipeline = Pipeline([
        ('smooth', smoother),
        ('reduce', PCA(n_components=10)),
    ])

    reduced_traces = pipeline.fit_transform(SPIKES)

event-aligned responses
-----------------------

I've also implemented annotating events with event-aligned responses, so you can build an entire decoding pipeline that predicts which stimulus was presented to a population from (for example) the peak response in any 10 ms bin in a 250 ms window after stimulus onset::

    from neuroglia.event import PeriEventSpikeSampler
    from neuroglia.spike import Binner  # bins spikes into counts (assumed to live alongside Smoother)
    from neuroglia.tensor import ResponseReducer
    from sklearn.neighbors import KNeighborsClassifier

    pipeline = Pipeline([
        ('sample', PeriEventSpikeSampler(
            spikes=SPIKES,
            sample_times=np.arange(0.0, 0.25, 0.01),
            tracizer=Binner,
        )),
        ('reduce', ResponseReducer(method='max')),
        ('classify', KNeighborsClassifier()),
    ])

cross validation of an entire pipeline
--------------------------------------

Then, once this pipeline has been defined, we can take advantage of scikit-learn's infrastructure for cross validation to do a 4-fold cross validation across stimulus presentations::

    from sklearn.model_selection import cross_val_score

    X = EVENTS['times']
    y = EVENTS['image_id']

    scores = cross_val_score(pipeline, X, y, cv=4)

``scores`` will contain one accuracy score per fold.
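
Because the assembled pipeline is itself a standard scikit-learn estimator, it can also be fit and used for prediction directly, without the cross-validation wrapper. A minimal sketch, reusing ``X`` and ``y`` from above; ``held_out_times`` is a hypothetical array of new event times introduced here for illustration::

    # Fit the sample -> reduce -> classify pipeline on labelled presentations,
    # then decode the image identity of new event times.
    pipeline.fit(X, y)
    predicted_image_ids = pipeline.predict(held_out_times)  # held_out_times: hypothetical new events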

These examples illustrate the major features of the package & how the API works.