{
  "nbformat_minor": 0, 
  "nbformat": 4, 
  "cells": [
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "%matplotlib inline"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "\nIntroduction to neuroglia\n============\n\nNeuroglia is a Python library for analyzing large scale electrophysiology and calcium imaging with scikit-learn machine learning pipelines\n\nApplying modern machine learning techniques to the analysis of neurophysiology data requires the researcher to extract relevant features from the continuous time-varying activity of populations of recorded neurons. For example, to apply supervised classification techniques to population activity for a decoding analysis, the researcher must create a population response vector for each stimulus to decode. Depending on the scientific question and recording modality, this response vector could be the mean calcium signal in a window during each stimulus or the time to first spike after the stimulus onset.\n\nIn neuroglia, these transformations between these core data structures are defined as scikit-learn compatible transformers\u2014Python objects utilizing a standardized fit, transform, and predict methods, allow them to be chained together into scikit-learn pipelines.\n\nNeuroglia helps you transform your data between the three canonical shapes of neurophysiology data:\n\n-\t**Spikes**: a list of timestamps labelled with the neuron that elicited the spike\n-\t**Traces**: a 2D array of neurons x time. For example, calcium traces from 2P, binned spikin activity, or an LFP signal.\n-\t**Tensor**: a 3D array of traces aligned to events (events x neurons x time)\n\nTransformations between these representations are implemented as scikit-learn transformers. This means that they all are defined as objects with \u201cfit\u201d and \u201ctransform\u201d methods so that, for example, applying a Gaussian smoothing to a population of spiking data means transforming from a \u201cspike times\u201d structure to a \u201ctraces\u201d structure like so:\n\nIn this example, we are going to demonstrate neuroglia's use by implementing a decoding analysis with spikes recorded from a Neuropixels probe in mouse primary visual cortex.\n\n\n\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "import h5py\nimport pandas as pd\n\nnwb_file = '/allen/programs/braintv/workgroups/nc-ophys/Justin/ephys_single_16.nwb'\n\nwith h5py.File(nwb_file) as f:\n\n    # Load spikes\n    nwb_spikes = {}\n    for unit, unit_data in f['processing/V1/UnitTimes'].items():\n        if unit == 'unit_list':\n            continue\n        else:\n            nwb_spikes[unit] = unit_data['times'][:]\n\n    # Load visual stimuli\n    import pandas as pd\n    events = pd.DataFrame(\n        data=f['stimulus/presentation/natural_images/timestamps'][:],\n        columns=['time','end_time','image_id'],\n    )\n    events['image_id'] = events['image_id'].astype(int)\n    events['duration'] = events['end_time']-events['time']"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Spikes\n------\n\nNeuroglia expects spikes to be stored in a pandas dataframe with one column for spike times and another for neuron labels for each spike recorded and sorted.\n\nTo help transform spikes loaded from NWB files, we use the SpikeTablizer transformer from the `nwb` module.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "from neuroglia.nwb import SpikeTablizer\n\nspikes = SpikeTablizer().fit_transform(nwb_spikes)\n\nprint(spikes.head())"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Events\n------\n\nEvents are events that we wish to align neural data to. For example, stimulus presentations, behavioral events, or even other neural events.\n\nIn this example, `events` is a series of visual stimulus presentations, including the time of the presentation and an identifier for each image.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "print(events.head())"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "PeriEventSpikeSampler\n---------------------\n\nWe want to extract the spikes that were elicited by each event, returning a 3D Tensor\n\nWe do this using the PeriEventSpikeSampler. We initialize it by passing in the spikes we want to sample from.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "import numpy as np\nfrom neuroglia.event import PeriEventSpikeSampler\n\nbins = np.arange(0.1, 0.35, 0.01)\n\nspike_sampler = PeriEventSpikeSampler(\n    spikes=spikes,\n    sample_times=bins,\n)"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Now we're ready to sample spikes for each event. We do so using the scikit-learn style `fit_transform()` syntax, passing `events` in as X. The PeriEventSpikeSampler returns a tensor with each event labelled with the timeseries of spiking activity relative to the event.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "tensor = spike_sampler.fit_transform(events)\nprint(tensor)"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Output::\n\n    <xarray.DataArray (event: 5950, sample_times: 24, neuron: 69)>\n    array([[[0., 0., ..., 0., 0.],\n            [0., 0., ..., 0., 0.],\n            ...,\n            [0., 0., ..., 0., 0.],\n            [0., 0., ..., 0., 0.]],\n\n           [[0., 0., ..., 0., 0.],\n            [0., 0., ..., 1., 0.],\n            ...,\n            [0., 0., ..., 0., 0.],\n            [0., 0., ..., 0., 0.]],\n\n           ...,\n\n           [[0., 0., ..., 0., 0.],\n            [0., 0., ..., 0., 0.],\n            ...,\n            [0., 0., ..., 0., 0.],\n            [0., 0., ..., 0., 0.]],\n\n           [[0., 0., ..., 0., 0.],\n            [0., 0., ..., 0., 0.],\n            ...,\n            [0., 0., ..., 0., 0.],\n            [0., 0., ..., 0., 0.]]])\n    Coordinates:\n      * neuron        (neuron) object '102' '110' '111' '112' ... '9' '91' '95' '98'\n      * sample_times  (sample_times) float64 0.1 0.11 0.12 0.13 ... 0.31 0.32 0.33\n        time          (event) float64 1.986e+03 1.987e+03 ... 3.473e+03 3.474e+03\n        end_time      (event) float64 1.987e+03 1.987e+03 ... 3.474e+03 3.474e+03\n        image_id      (event) int64 77 88 113 22 54 102 35 ... 9 40 59 15 99 88 97\n        duration      (event) float64 0.2553 0.2527 0.25 ... 0.2504 0.2536 0.2424\n      * event         (event) int64 0 1 2 3 4 5 6 ... 5944 5945 5946 5947 5948 5949\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "source": [
        "We can get the total number of elicited spikes count with the `ResponseReducer` transformer\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "from neuroglia.tensor import ResponseReducer\nreducer = ResponseReducer(func=np.sum)"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Again, we use the scikit-learn style `fit_transform()` syntax to transform the tensor into a 2D array, where each row contains the population response of each event.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "population_response = reducer.fit_transform(tensor)\nprint(population_response)\n\n# Output::\n#\n#     <xarray.DataArray (event: 5950, neuron: 69)>\n#     array([[3., 2., 0., ..., 0., 2., 0.],\n#            [3., 0., 0., ..., 0., 3., 1.],\n#            [1., 1., 0., ..., 0., 1., 1.],\n#            ...,\n#            [0., 0., 0., ..., 0., 0., 0.],\n#            [1., 1., 0., ..., 0., 0., 1.],\n#            [2., 0., 0., ..., 0., 0., 1.]])\n#     Coordinates:\n#       * neuron    (neuron) object '102' '110' '111' '112' ... '9' '91' '95' '98'\n#         time      (event) float64 1.986e+03 1.987e+03 ... 3.473e+03 3.474e+03\n#         end_time  (event) float64 1.987e+03 1.987e+03 ... 3.474e+03 3.474e+03\n#         image_id  (event) int64 77 88 113 22 54 102 35 45 ... 13 9 40 59 15 99 88 97\n#         duration  (event) float64 0.2553 0.2527 0.25 0.2504 ... 0.2504 0.2536 0.2424\n#       * event     (event) int64 0 1 2 3 4 5 6 ... 5943 5944 5945 5946 5947 5948 5949"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Pipelines\n---------\nBecause neuroglia transformers are scikit-learn compatible, we can take advantage of scikit-learn features like Pipelines, chaining each transformer together into a single pipeline, implementing feature extraction, normalization, dimensionality reduction, and decoding.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "from sklearn.pipeline import Pipeline\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n\npipeline = Pipeline([\n    ('spike_sampler', PeriEventSpikeSampler(spikes=spikes, sample_times=bins)),\n    ('extract', ResponseReducer(func=np.sum)),\n    ('rescale', StandardScaler()),\n    ('classify', LinearDiscriminantAnalysis()),\n])"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "We'll use scikit-learn's model_selection module to split our set of events into a training and testing set.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "from sklearn.model_selection import train_test_split\n\nX = events[['time','duration']]\ny = events['image_id']\n\nchance = 1.0 / len(y.unique())\n\nX_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.33)"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Next, we'll train the entire pipeline.\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "pipeline.fit(X_train,y_train)"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Finally we'll test the pipeline's performance on the held out data\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }, 
    {
      "execution_count": null, 
      "cell_type": "code", 
      "source": [
        "# Test decoding held out data\nscore = pipeline.score(X_test,y_test)\n\nprint(\"Decoding accuracy: {:0.2f} (chance: {:0.3f})\".format(score,chance))"
      ], 
      "outputs": [], 
      "metadata": {
        "collapsed": false
      }
    }, 
    {
      "source": [
        "Output::\n\n    Decoding accuracy: 0.37 (chance: 0.008)\n\n"
      ], 
      "cell_type": "markdown", 
      "metadata": {}
    }
  ], 
  "metadata": {
    "kernelspec": {
      "display_name": "Python 2", 
      "name": "python2", 
      "language": "python"
    }, 
    "language_info": {
      "mimetype": "text/x-python", 
      "nbconvert_exporter": "python", 
      "name": "python", 
      "file_extension": ".py", 
      "version": "2.7.12", 
      "pygments_lexer": "ipython2", 
      "codemirror_mode": {
        "version": 2, 
        "name": "ipython"
      }
    }
  }
}