Bahareh Tolooshams, Andrew H. Song, Simona Temereanca, and Demba Ba. Submitted. “Deep Exponential-Family Auto-Encoders.” In Advances in Neural Information Processing Systems 32. arXiv Version
We consider the problem of learning recurring convolutional patterns from data that are not necessarily real valued, such as binary or count-valued data. We cast the problem as one of learning a convolutional dictionary, subject to sparsity constraints, given observations drawn from a distribution that belongs to the canonical exponential family. We propose two general approaches towards its solution. The first approach uses the ℓ0 pseudo-norm to enforce sparsity and is reminiscent of the alternating-minimization algorithm for classical convolutional dictionary learning (CDL). The second approach, which uses the ℓ1 norm to enforce sparsity, generalizes to the exponential family the recently-shown connection between CDL and a class of ReLU auto-encoders for Gaussian observations. The two approaches can each be interpreted as an auto-encoder, the weights of which are in one-to-one correspondence with the parameters of the convolutional dictionary. Our key insight is that, unless the observations are Gaussian valued, the input fed into the encoder ought to be modified iteratively, and in a specific manner, using the parameters of the dictionary. Compared to the ℓ0 approach, once trained, the forward pass through the ℓ1 encoder computes sparse codes orders of magnitude more efficiently. We apply the two approaches to the unsupervised learning of the stimulus effect from neural spiking data acquired in the barrel cortex of mice in response to periodic whisker deflections. We demonstrate that they are both superior to generalized linear models, which rely on hand-crafted features.
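The iterative remapping of the encoder input described in this abstract can be made concrete with a small sketch. The following is an illustrative, non-convolutional toy, not the paper's implementation (the function name `poisson_ista`, the step size, and the dictionary are made up): proximal-gradient sparse coding for Poisson counts, in which the observations enter the update only through the residual y − exp(Ax), i.e., the encoder input is remapped through the dictionary and the canonical (log) link at every iteration.

```python
import math

def soft(v, t):
    # element-wise soft-thresholding (proximal operator of the l1 norm)
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]

def matvec(A, x):
    return [sum(a*b for a, b in zip(row, x)) for row in A]

def poisson_ista(y, A, lam=0.1, step=0.05, iters=300):
    """Sparse coding for count-valued data: minimize the negative Poisson
    log-likelihood sum(exp(Ax) - y*(Ax)) plus lam*||x||_1 by proximal
    gradient descent. Unlike the Gaussian case, the effective encoder input
    y - exp(Ax) must be recomputed from the dictionary at every step."""
    At = [list(col) for col in zip(*A)]
    x = [0.0] * len(A[0])
    for _ in range(iters):
        rate = [math.exp(v) for v in matvec(A, x)]            # canonical link
        grad = matvec(At, [r - yi for r, yi in zip(rate, y)])  # A^T(exp(Ax)-y)
        x = soft([xi - step*g for xi, g in zip(x, grad)], step*lam)
    return x
```

For Gaussian observations the exponential link reduces to the identity and the update collapses to standard ISTA, recovering the known CDL/ReLU auto-encoder connection.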
Andrew H. Song, Francisco J. Flores, and Demba Ba. Submitted. “Fast Convolutional Dictionary Learning off the Grid.” In IEEE Transactions on Signal Processing. arXiv Version
Given a continuous-time signal that can be modeled as the superposition of localized, time-shifted events from multiple sources, the goal of Convolutional Dictionary Learning (CDL) is to identify the location of the events--by Convolutional Sparse Coding (CSC)--and learn the template for each source--by Convolutional Dictionary Update (CDU). In practice, because we observe samples of the continuous-time signal on a uniformly-sampled grid in discrete time, classical CSC methods can only produce estimates of the times when the events occur on this grid, which degrades the performance of the CDU. We introduce a CDL framework that significantly reduces the errors arising from performing the estimation in discrete time. Specifically, we construct an expanded dictionary that comprises, not only discrete-time shifts of the templates, but also interpolated variants, obtained by bandlimited interpolation, that account for continuous-time shifts. For CSC, we develop a novel, computationally efficient algorithm, termed Convolutional Orthogonal Matching Pursuit with interpolated dictionary (COMP-INTERP). We benchmark COMP-INTERP against Continuous Basis Pursuit (CBP), the state-of-the-art CSC algorithm for estimating off-the-grid events, and demonstrate, on simulated data, that 1) COMP-INTERP achieves a similar level of accuracy, and 2) is two orders of magnitude faster. For CDU, we derive a novel procedure to update the templates given sparse codes that can occur both on and off the discrete-time grid. We also show that 3) dictionary update with the overcomplete dictionary yields more accurate templates. Finally, we apply the algorithms to the spike sorting problem on electrophysiology recordings and show their competitive performance.
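The on-grid half of the CSC step can be sketched in a few lines. This is a simplified single-template greedy pursuit, not COMP-INTERP itself (names and shapes are illustrative): each iteration finds the discrete-time shift of the template most correlated with the residual and subtracts it. The interpolated-dictionary idea then amounts to enlarging the set of candidate templates with bandlimited-interpolated, fractionally shifted copies.

```python
def conv_matching_pursuit(signal, template, n_events):
    """Greedy on-grid convolutional sparse coding: repeatedly pick the shift
    of `template` most correlated with the residual, record the event
    (shift, amplitude), and subtract its contribution."""
    residual = list(signal)
    k = len(template)
    norm2 = sum(t*t for t in template)
    events = []
    for _ in range(n_events):
        best_shift, best_amp = 0, 0.0
        for s in range(len(signal) - k + 1):
            # least-squares amplitude of the template placed at shift s
            amp = sum(residual[s+i]*template[i] for i in range(k)) / norm2
            if abs(amp) > abs(best_amp):
                best_shift, best_amp = s, amp
        events.append((best_shift, best_amp))
        for i in range(k):
            residual[best_shift+i] -= best_amp * template[i]
    return events, residual
```

With an expanded dictionary, the inner loop would also scan the interpolated variants of `template`, so an event falling between grid points is matched by the nearest fractional shift rather than forced onto the grid.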
Andrew H. Song, Francisco Flores, and Demba Ba. Submitted. “Spike sorting by convolutional dictionary learning.” Advances in Neural Information Processing Systems 31. arXiv Version
Javier Zazo, Bahareh Tolooshams, and Demba Ba. Submitted. “Convolutional Dictionary Learning in Hierarchical Networks.” In CAMSAP 2019. arXiv Version
Filter banks are a popular tool for the analysis of piecewise smooth signals such as natural images. Motivated by the empirically observed properties of scale and detail coefficients of images in the wavelet domain, we propose a hierarchical deep generative model of piecewise smooth signals that is a recursion across scales: the low-pass scale coefficients at one layer are obtained by filtering the scale coefficients at the next layer, and adding a high-pass detail innovation obtained by filtering a sparse vector. This recursion describes a linear dynamic system that is a non-Gaussian Markov process across scales and is closely related to the multilayer convolutional sparse coding (ML-CSC) generative model for deep networks, except that our model allows for deeper architectures, and combines sparse and non-sparse signal representations. We propose an alternating minimization algorithm for learning the filters in this hierarchical model given observations at layer zero, e.g., natural images. The algorithm alternates between a coefficient-estimation step and a filter-update step. The coefficient-estimation step performs sparse (detail) and smooth (scale) coding and, when unfolded, leads to a deep neural network. We use MNIST to demonstrate the representation capabilities of the model, and the utility of its derived features (coefficients) for classification.
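The top-down recursion can be sketched as follows. This is an illustrative simplification, not the paper's model: it keeps all layers at the same length (no upsampling between scales), and the filter and variable names are made up.

```python
def conv_same(x, h):
    # 'same'-length linear filtering with zero padding at the boundaries
    r = len(h) // 2
    return [sum(x[i+j-r]*h[j] for j in range(len(h)) if 0 <= i+j-r < len(x))
            for i in range(len(x))]

def synthesize(top_scale, low_pass, high_pass, sparse_details):
    """One realization of the hierarchical recursion: at each layer, the
    scale coefficients are the low-pass-filtered coarser scales plus a
    detail innovation obtained by high-pass filtering a sparse vector."""
    scale = top_scale
    for detail in sparse_details:          # one sparse innovation per layer
        scale = [a + b for a, b in zip(conv_same(scale, low_pass),
                                       conv_same(detail, high_pass))]
    return scale
```

Each layer adds a filtered sparse innovation to the filtered coarser-scale coefficients, mirroring the scale/detail split of a wavelet filter bank; learning inverts this synthesis by alternating coefficient estimation and filter updates.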
PDF Version
Bahareh Tolooshams, Sourav Dey, and Demba Ba. Submitted. “Deep Residual Auto-Encoders for Expectation Maximization-based Dictionary Learning.” IEEE Transactions on Neural Networks and Learning Systems. arXiv Version
Convolutional dictionary learning (CDL) has become a popular method for learning sparse representations from data. State-of-the-art algorithms perform dictionary learning (DL) through an optimization-based alternating-minimization procedure that comprises a sparse coding step and a dictionary update step. Here, we draw connections between CDL and neural networks by proposing an architecture for CDL termed the constrained recurrent sparse auto-encoder (CRsAE). We leverage the interpretation of the alternating-minimization algorithm for DL as an Expectation-Maximization algorithm to develop auto-encoders (AEs) that, for the first time, enable the simultaneous training of the dictionary and regularization parameter. The forward pass of the encoder, which performs sparse coding, solves the E-step using an encoding matrix and a soft-thresholding non-linearity imposed by the FISTA algorithm. The encoder in this regard is a variant of residual and recurrent neural networks. The M-step is implemented via a two-stage back-propagation. In the first stage, we perform back-propagation through the AE formed by the encoder and a linear decoder whose parameters are tied to the encoder. This stage parallels the dictionary update step in DL. In the second stage, we update the regularization parameter by performing back-propagation through the encoder using a loss function that includes a prior on the parameter motivated by Bayesian statistics. We leverage GPUs to achieve significant computational gains relative to state-of-the-art optimization-based approaches to CDL. We apply CRsAE to spike sorting, the problem of identifying the time of occurrence of neural action potentials in recordings of electrical activity from the brain. We demonstrate on recordings lasting hours that CRsAE speeds up spike sorting by 900x compared to notoriously slow classical algorithms based on convex optimization.
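The encoder's forward pass can be sketched as unfolded FISTA iterations with a soft-thresholding non-linearity. The sketch below is an illustrative dense (non-convolutional) toy with made-up names, not CRsAE itself; the dictionary A plays the role of the tied encoder/decoder weights.

```python
import math

def soft(v, t):
    # soft-thresholding: the ReLU-like non-linearity imposed by FISTA
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]

def fista_encoder(y, A, lam, n_iters=100):
    """Forward pass of an unfolded-FISTA encoder: solve
    min_x 0.5*||y - Ax||^2 + lam*||x||_1 with the dictionary A as weights.
    Each unfolded iteration is one recurrent layer of the encoder."""
    At = list(zip(*A))
    L = sum(sum(a*a for a in row) for row in A)   # crude Lipschitz bound on ||A^T A||
    x = [0.0] * len(A[0])
    z, t = list(x), 1.0
    for _ in range(n_iters):
        Az = [sum(a*zi for a, zi in zip(row, z)) for row in A]
        r = [az - yi for az, yi in zip(Az, y)]
        grad = [sum(a*ri for a, ri in zip(col, r)) for col in At]   # A^T(Az - y)
        x_new = soft([zi - g/L for zi, g in zip(z, grad)], lam/L)
        t_new = (1 + math.sqrt(1 + 4*t*t)) / 2                      # momentum schedule
        z = [xn + (t - 1)/t_new*(xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x
```

In the auto-encoder view, the decoder is simply the linear map x ↦ Ax with the same A, and back-propagating the reconstruction loss through both passes updates the dictionary, paralleling the dictionary update step of DL.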
PDF Version
Demba Ba. Submitted. “Deeply-sparse signal representations.” IEEE Transactions on Signal Processing. arXiv Version PDF Version
Thomas Chang, Bahareh Tolooshams, and Demba Ba. 10/13/2019. “RandNet: deep learning with compressed measurements of images.” In IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP 2019). Pittsburgh, PA. arXiv Version
Principal component analysis, dictionary learning, and auto-encoders are all unsupervised methods for learning representations from a large amount of training data. In all these methods, the higher the dimension of the input data, the longer learning takes. We introduce a class of neural networks, termed RandNet, for learning representations using compressed random measurements of data of interest, such as images. RandNet extends the convolutional recurrent sparse auto-encoder architecture to dense networks and, more importantly, to the case when the input data are compressed random measurements of the original data. Compressing the input data makes it possible to fit a larger number of batches in memory during training. Moreover, in the case of sparse measurements, training is more efficient computationally. We demonstrate that, in unsupervised settings, RandNet performs dictionary learning using compressed data. In supervised settings, we show that RandNet can classify MNIST images with minimal loss in accuracy, despite being trained with random projections of the images that result in a 50% reduction in size. Overall, our results provide a general principled framework for training neural networks using compressed data.
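The compression step amounts to multiplying each flattened image by a fixed random sensing matrix. A minimal sketch follows; the function name, the Gaussian sensing matrix, and the scaling are illustrative choices, not necessarily those of the paper.

```python
import random

def compress(images, m, seed=0):
    """Project each flattened image (length n) to m random measurements
    using a fixed Gaussian sensing matrix Phi (m x n). Choosing m = n/2
    yields the 50% size reduction used in the MNIST experiment."""
    rng = random.Random(seed)               # fixed seed: same Phi for all data
    n = len(images[0])
    Phi = [[rng.gauss(0.0, 1.0/m**0.5) for _ in range(n)] for _ in range(m)]
    return [[sum(p*v for p, v in zip(row, img)) for row in Phi]
            for img in images]
```

The network then trains directly on these m-dimensional measurements, so more batches fit in memory and, when the measurements are sparse, the per-batch computation shrinks as well.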
Andrew H. Song, Leon Chlon, Hugo Soulat, John Tauber, Sandya Subramanian, Demba Ba, and Michael J. Prerau. 4/2019. “Multitaper Infinite Hidden Markov Model for EEG.” In International Engineering in Medicine and Biology Conference (EMBC).

Electroencephalogram (EEG) monitoring of neural activity is widely used for identifying underlying brain states. For inference of brain states, researchers have often used Hidden Markov Models (HMMs) with a fixed number of hidden states and an observation model linking the temporal dynamics embedded in EEG to the hidden states. The use of fixed states may be limiting, in that 1) pre-defined states might not capture the heterogeneous neural dynamics across individuals and 2) the oscillatory dynamics of the neural activity are not directly modeled. To this end, we use a Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), which discovers the set of hidden states that best describes the EEG data, without a-priori specification of the number of states. In addition, we introduce an observation model based on classical asymptotic results of frequency domain properties of stationary time series, along with the description of the conditional distributions for Gibbs sampler inference. We then combine this with multitaper spectral estimation to reduce the variance of the spectral estimates. By applying our method to simulated data inspired by sleep EEG, we arrive at two main results: 1) the algorithm faithfully recovers the spectral characteristics of the true states, as well as the right number of states and 2) the incorporation of the multitaper framework produces a more stable estimate than traditional periodogram spectral estimates.
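The variance-reduction idea behind multitaper estimation can be sketched in a few lines, here with sine tapers standing in for the Slepian (DPSS) tapers of the classical method. Everything below is an illustrative toy, not the paper's estimator.

```python
import math

def sine_tapers(N, K):
    # Riedel-Sidorenko sine tapers: a simple orthonormal family used here
    # as a stand-in for Slepian (DPSS) tapers
    return [[math.sqrt(2.0/(N+1))*math.sin(math.pi*(k+1)*(t+1)/(N+1))
             for t in range(N)] for k in range(K)]

def multitaper_spectrum(x, K=4):
    """Average the periodograms of K orthogonally tapered copies of x.
    Averaging over tapers reduces the variance of the spectral estimate
    at the cost of a modest loss in frequency resolution."""
    N = len(x)
    S = [0.0] * (N//2 + 1)
    for taper in sine_tapers(N, K):
        xt = [w*v for w, v in zip(taper, x)]
        for f in range(N//2 + 1):       # naive DFT of the tapered series
            re = sum(xt[t]*math.cos(2*math.pi*f*t/N) for t in range(N))
            im = sum(xt[t]*math.sin(2*math.pi*f*t/N) for t in range(N))
            S[f] += (re*re + im*im) / K
    return S
```

In the paper's setting, such low-variance spectral estimates serve as the observations that the HDP-HMM links to its hidden brain states.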

PDF Version
Alexander Lin, Yingzhou Zhang, Jeremy Heng, Stephen A. Allsop, Kay M. Tye, Pierre E. Jacob, and Demba Ba. 2019. “Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach.” In International Conference on Artificial Intelligence and Statistics (AISTATS) 2019.

We propose a general statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori unknown number of sub-groups. Our motivation comes from neuroscience, where an important problem is to identify, within a large assembly of neurons, subsets that respond similarly to a stimulus or contingency. Upon modeling the multiple time series as the output of a Dirichlet process mixture of nonlinear state-space models, we derive a Metropolis-within-Gibbs algorithm for full Bayesian inference that alternates between sampling cluster assignments and sampling parameter values that form the basis of the clustering. The Metropolis step employs recent innovations in particle-based methods. We apply the framework to clustering time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear.
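The a-priori-unknown number of clusters comes from the Dirichlet process prior; its marginal over partitions, the Chinese restaurant process, can be sampled in a few lines. This is an illustrative prior draw only, not the paper's Metropolis-within-Gibbs sampler, and the names are made up.

```python
import random

def crp_partition(n, alpha, seed=1):
    """Sample cluster assignments from a Chinese restaurant process prior:
    item i joins an existing cluster c with probability proportional to
    size(c), or opens a new cluster with probability proportional to alpha."""
    rng = random.Random(seed)
    sizes, labels = [], []
    for _ in range(n):
        weights = sizes + [alpha]           # existing clusters, then "new table"
        u = rng.random() * sum(weights)
        acc = 0.0
        for c, w in enumerate(weights):
            acc += w
            if u <= acc:
                break
        if c == len(sizes):
            sizes.append(1)                 # open a new cluster
        else:
            sizes[c] += 1
        labels.append(c)
    return labels
```

In the full framework, a Gibbs sweep resamples each assignment from this prior weighted by the particle-based likelihood of the time series under each cluster's nonlinear state-space model.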

PDF Version
Noa Malem-Shinitski, Yingzhuo Zhang, Daniel T. Gray, Sarah N. Burke, Anne C. Smith, Carol A. Barnes, and Demba Ba. 9/1/2018. “A separable two-dimensional random field model of binary response data from multi-day behavioral experiments.” Journal of Neuroscience Methods, 307, Pp. 175-187. Publisher's Version
Bahareh Tolooshams, Sourav Dey, and Demba Ba. 9/2018. “Scalable convolutional dictionary learning with constrained recurrent sparse auto-encoders.” 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing. Aalborg, Denmark: IEEE. arXiv Version
Yingzhuo Zhang, Noa Malem-Shinitski, Stephen A. Allsop, Kay Tye, and Demba Ba. 4/2018. “Estimating a separably-Markov random field (SMuRF) from binary observations.” Neural Computation, 30, 4, Pp. 1046-1079. Publisher's Version yingzhuo_zhang.pdf
Stephen A Allsop, Romy Wichmann, Fergil Mills, Anthony Burgos-Robles, Chia-Jung Chang, Ada C. Felix-Ortiz, Alienor Vienne, Anna Beyeler, Ehsan M. Izadmehr, Gordon Glober, Meghan I. Cum, Johanna Stergiadou, Kavitha K. Anandalingham, Kathryn Farris, Praneeth Namburi, Christopher A. Leppla, Javier C. Weddington, Edward H. Nieh, Anne C. Smith, Demba Ba, Emery N. Brown, and Kay M. Tye. 3/3/2018. “Corticoamygdala transfer of socially derived information gates observational learning.” Cell, 173, 6, Pp. 1329-1342. Publisher's Version
Gabriel Schamberg, Demba Ba, and Todd P Coleman. 2/15/2018. “A modularized efficient framework for non-Markov time-series estimation.” IEEE Transactions on Signal Processing, 66, 12. Publisher's Version
Seong-Eun Kim, Michael Behr, Demba Ba, and Emery N. Brown. 1/2/2018. “State-space multitaper time-frequency analysis.” Proceedings of the National Academy of Sciences, 115, 1. Publisher's Version
Noa Shinitski, Yingzhuo Zhang, Daniel T Gray, Sarah N Burke, Anne C Smith, Carol A Barnes, and Demba Ba. 2017. “Can you teach an old monkey a new trick?” Cosyne 2017. shinitski_cosyne_2017.pdf
Yingzhuo Zhang, Noa Shinitski, Stephen Allsop, Kay Tye, and Demba Ba. 2017. “A Two-Dimensional Separable Random Field Model of Within and Cross-Trial Neural Spiking Dynamics.” Cosyne 2017. zhang_cosyne_2017.pdf
Gabriel Schamberg, Demba Ba, Mark Wagner, and Todd Coleman. 2016. “Efficient low-rank spectrotemporal decomposition using ADMM.” In Statistical Signal Processing Workshop (SSP), 2016 IEEE, Pp. 1–5. IEEE.
Jonathan D Kenny, Jessica J Chemali, Joseph F Cotten, Christa J Van Dort, Seong-Eun Kim, Demba Ba, Norman E Taylor, Emery N Brown, and Ken Solt. 2016. “Physostigmine and Methylphenidate Induce Distinct Arousal States During Isoflurane General Anesthesia in Rats.” Anesthesia and Analgesia.
Gabriela Czanner, Sridevi V Sarma, Demba Ba, Uri T Eden, Wei Wu, Emad Eskandar, Hubert H Lim, Simona Temereanca, Wendy A Suzuki, and Emery N Brown. 2015. “Measuring the signal-to-noise ratio of a neuron.” Proceedings of the National Academy of Sciences, 112, 23, Pp. 7141–7146.