SpikeData ========= This is the documentation for the :code:`SpikeData` which loads spiking data from :code:`Phy` files. This class calculates raw refractory period violations, quality metrics, and loads raw waveforms. Initializing SpikeData ---------------------- Similarly to :code:`StimulusData`, a :code:`file_path` should be given which holds the various :code:`*.npy` files that are created post-Phy curation. For Windows this may need to prepended by an "r" due to escaping. .. code-block:: python from spikeanalysis import SpikeData spikes = SpikeData(file_path = "path/to/data") For Windows I recommend .. code-block:: python spikes = SpikeData(file_path = r"path\to\data") # prevents improper escaping Saving Data ----------- Upon loading a data directory the :code:`CACHING` state is :code:`False` for each :code:`SpikeData` object. This is for the case where the user is memory constrained. In order to save data to be able to reload qc values and waveform values in the future the :code:`CACHING` should be set to true. This is done with :code:`set_caching` .. code-block:: python spikes.set_caching(cache=True) # files will be saved If the user decides after this they don't want to save data or they are just doing exploratory analysis the same method can be used to switching caching to False .. code-block:: python spikes.set_caching(cache=False) # files will not be saved Caching the data is done as either :code:`json` or :code:`.npy` depending on whether the data being saved is a :code:`dict` or a :code:`numpy.array`, respectively. Refractory violations --------------------- A simple calcuation of refractory period violations for each unit based on the :code:`ref_dur_ms` given. Since neurons have refractory periods in which they can't fire units with too many violations are likely poorly separated or mis-curated. Calculated with the function :code:`refractory_violation`. .. code-block:: python spikes.refractory_violation(ref_dur_ms = 2.0) Generating PCs -------------- :code:`Phy` has principal components files, but these files do not update after curation, so the function :code:`generate_pcs` will generate the new PC values based on the manual curation. Once the post-curation PCs have been generated both Isolation Distance (ref) as well as Silhouette Score (ref) can be calculated with :code:`generate_qcmetrics`. .. code-block:: python spikes.generate_pcs() # organize curated data spikes.generate_qcmetrics() # Isolation distance and Silhouette Score Creating a quality control threshold ------------------------------------ Once refractory period violations and qcmetrics have been calculated, a qc threhold can be generated so that only higher quality units are assessed. Importantly there are no strict cutoffs for these values and they should be determined based on the type of analysis and neuronal populations. The mask is generated with :code:`qc_preprocessing`. Importantly to overwrite a previous qc mask the :code:`recurated` can be set to :code:`True`. An additional layer of masking is provided by the :code:`denoise_data`, which removes any units labeled as :code:`noise` in :code:`Phy`. Finally the mask can be applied to the dataset by setting it with :code:`set_qc`. .. code-block:: python spikes.qc_preprocessing(idthres = 10, rpv = 0.01, sil=0.45) spikes.set_qc() Isolation Distance ^^^^^^^^^^^^^^^^^^ Relies on the mahalobnis distance between clusters as a metric of clustering quality. Since this metric utilizes the covariance matrix of cluster distances it helps reduce less significant and highly coordinated PC spaces to reduce the curse of dimensionality as well as the fact that many contacts of the probes are in similar locations and so should have correlated PC spaces. The Isolation Distance relies on the mahalobnis distances and is reported as the smallest mahalobnis distance of the nearest spike not found in the current cluster. Proposed by Harris et al (2001) and equation adapted from Schmitzer-Torbert et al (2005). Isolation distances can vary from :math:`0` to :math:`\infty` with greater distance indicating that clusters are farther apart in PC space and thus likely cleaner clusters. .. math:: {D^2}_{i,C} = (x_i - \mu_{c})^T \Sigma_c^{-1}(x_i - \mu_C) Silhouette Score ^^^^^^^^^^^^^^^^ Silhouette Score is another metric of clustering quality that seeks to determine the goodness of clustering. It is a metric that assesses the pointwise distances between every spike within a cluster compared to every other spike in the cluster as well as every other spike in the nearest other cluster. The basic idea is that for each spike we can determine whether it fits better within its assigned cluster :math:`a(i)` or whether it would have better distance scores to the neighboring cluster :math:`b(i)`: .. math:: a(i) = \frac{1}{|C_K| - 1} \Sigma_{x \in C_K, x \neq i} distance(i, x) b(i) = \frac{1}{|C_L|} \Sigma_{x \in C_L} distance(i, x) The silhouette score is than: .. math:: s(i) = \frac{b(i) - a(i)}{max(a(i), b(i))} For :code:`spikenalysis`, rather than this implementation proposed by Rousseeuw, the simplified silhouette is used as proposed by Hruschka et al. This makes use of the centroid distance rather than pairwise. So, .. math:: distance(i, \mu_{C_K}) The general interpretation of the silhouette score is that :math:`-1` would indicate that a spike was placed in the wrong the cluster whereas a score of :math:`1` is that a spike was put into the correct cluster. We can take the average of all these scores to get the average silhouette score to indicate the overall quality of a cluster. :math:`\frac{1}{n spikes} \Sigma s(i)` Thus similar to the per-spike basis the per-cluster score can vary from :math:`-1` (bad cluster) to :math:`1` (great cluster) with intermediate values. Denoising Data -------------- :code:`Phy` allows for the labeling of the curated data. :code:`spikeanalysis` only uses one of these labels: :code:`noise`. The goal is to remove multiunit and have only "good" units, which in :code:`spikeanalysis` is done with the :code:`pc_metrics` and the :code:`refractory period violations`. But certain types of artifacts (ie optogenetic stimulus) artifacts can actually have great qc metrics since they are so distinct from the "good" units. So in order to remove these high-qc, but artifact-based units, you add a noise label in :code:`Phy` (see Phy instructions) and then run the helper function :code:`denoise_data` to remove anything you want to be removed regardless of quality values. .. code-block:: python spikes.denoise_data() # remove units labeled as Phy noise Denoising data alters the values of :code:`spike_clusters`, :code:`raw_spike_times`, etc, so if testing multiple iterations of different qc metrics one should use :code:`reload_data` to "un"-denoise the data. This allows applying new :code:`qc` masks to the data while iterating through values. For this reason the :code:`run_all` (see below) does not run :code:`denoise_data`. But do note that :code:`SpikeAnalysis` does run :code:`denoise_data` automatically, so if iterating with :code:`SpikeAnalysis` one should run :code:`reload_data` .. code-block:: python spikes.reload_data() Raw waveforms ------------- Although :code:`Phy` has :code:`templates` of each unit sometimes it is beneficial to analyze the raw waveforms of a neuron. This can be accomplished reading the raw waveforms with the function :code:`get_waveforms`. The user can specificy the number of samples around the spike time to load (:code:`Phy` shows 82 samples so this the default) and the number of waveforms can be specified with :code:`n_wfs`. The waveforms will be saved as :code:`.json` if :code:`set_caching` has been run. Once the raw waveforms have been loaded some common values (depth, amplitude half-width) can be calculated with :code:`get_waveform_values`. The :code:`depth` can be optionally specified for depths to be real depth in tissue. If this isn't given then depths are given as distance from the "0" contact of the probe. .. code-block:: python spikes.get_waveforms() spikes.get_waveform_values(depth=1000) Waveform depth ^^^^^^^^^^^^^^ Since sorting is performed relative to the :code:`channel_map`, the raw depth of each channel are given relative to the :math:`0`. Since this is typically the bottom of the probe the values are at the relative to the bottom of the probe. This can easily be corrected. Determining the depth of a spike, though can be calculated a variety of different ways. Currently to keep the code as understandable and maintainable a weighted average is used (faster, slightly less accurate) rather than a PC based approach (slower, slightly more accurate) .. math:: \frac{1}{\Sigma amplitudes} \sum{amplitudes * depths} Waveform amplitudes ^^^^^^^^^^^^^^^^^^^ Since neurons should always have the same amplitude we can assess the variation in amplitudes as a measure of the quality of a neuron. We expect a rather gaussian distribution of amplitudes so using the :code:`get_amplitudes()` function we assess how many spikes fall within a certain std of the waveform data. .. code-block:: python spikes.get_amplitudes(std=2) # 2 std devs Pipeline Function ----------------- For users wanting to use all the functionality of :code:`SpikeData` an easy to use pipeline will run all functions automatically. (This also means the user doesn't need to remember a bunch of function names.) This function is called :code:`run_all` and will request all parameters to be provided. Example below will all values included. .. code-block:: python spikes.run_all( ref_dur_ms=2, # 2 ms refractory period idthres=20, # isolation distance 20--need an empiric number from your data rpv=0.02, # 2% the amount of spikes violating the 2ms refractory period allowed sil=0.45, # silhouette score (-1,1) with values above 0 indicates better and better clustering amp_std=2 # number of std deviations above mean waveform amplitude to look at amp_cutoff=0.98, # percent of neurons which must fall within amp_std deviations of mean waveform recurated= False, # I haven't recurated my data set_caching = True, # I want to save data for future use depth= 500, # probe inserted 500 um deep ) References ---------- See references page.