Neuroscience databases - tools for exploring brain structure-function relationships

a Theme Issue organized and edited by Rolf Kötter

published in August 2001 (Vol. 356, Issue 1412) by

Philosophical Transactions of the Royal Society, Series B: Biological Sciences


Topics and summaries:

Imaging | Connectivity | Physiology | Morphology | Molecular | Methods | Models | Perspectives


IMAGING


A probabilistic atlas and reference system for the human brain

John Mazziotta, M.D., Ph.D.
and the Members of the International Consortium for Brain Mapping

 
 

Because the brains of individuals are not the same with regard to structure, function or organization, no single, unique physical representation of the brain that depicts the human species is possible. For the last six years, we have worked to develop a probabilistic atlas and reference system for the human brain that serves as both an informatics and a neuroscience tool, because it captures, in digital form, the variance of a large population of subjects and includes information about their racial and ethnic backgrounds, education and handedness, personal traits and habits, medical, neurological and psychiatric profiles, structural and functional imaging and DNA for genotyping. The current data structure includes 438 normal subjects between the ages of 20 and 40 and will soon be expanded to 1,000 subjects and possibly beyond. The addition of functional information from fMRI and PET as well as microscopic data on cyto- and chemo-architecture provides new and unique tools and strategies. An important neuroscientific outcome of this program is the ability to examine, for the first time, the stability, relationships and distribution of the micro- and macroscopic structure and function of the human brain. This issue, although a major area of interest, has remained a vexing problem because of the difficulty in obtaining data sets of sufficient magnitude, diversity, number and organization to answer such questions. The resultant data set is organized in four dimensions (three in space and one in time) with an infinite number of potential attributes. Through the consortium structure that we have developed there is a distribution of labor that has been separated into parallel, complementary tasks, executed in such a way as to create a "real world" environment among participants. With this program, differences in equipment, software and protocols actually reflect a microcosm of the larger neuroscience and neuroinformatics communities.
We believe that such an approach adds value: it accommodates both individual and summary data, as well as raw or interpretive data, and it lets the user of the resultant database choose a confidence level by setting a threshold for the type of data to be obtained from a query (e.g., all data for a given location versus only peer-reviewed and independently reproduced data). Such a database, with such a large number of subjects, provides the opportunity for electronic hypothesis generation and comparisons between individuals, experiments and laboratories.
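
The confidence threshold described above can be illustrated with a minimal Python sketch. It is not the consortium's software: the evidence levels, the `Observation` record and the `query` function are hypothetical names chosen for illustration, and the coordinates and values are made up.

```python
from dataclasses import dataclass
from enum import IntEnum


class Evidence(IntEnum):
    """Hypothetical evidence levels, ordered from least to most vetted."""
    RAW = 0
    INTERPRETED = 1
    PEER_REVIEWED = 2
    REPRODUCED = 3


@dataclass(frozen=True)
class Observation:
    subject_id: str
    location: tuple      # (x, y, z) in some atlas space (assumed convention)
    value: float
    evidence: Evidence


def query(observations, location, min_evidence=Evidence.RAW):
    """Return observations at `location` whose evidence level meets the threshold."""
    return [o for o in observations
            if o.location == location and o.evidence >= min_evidence]


# Made-up records, for illustration only.
records = [
    Observation("s001", (10, 20, 30), 0.8, Evidence.RAW),
    Observation("s002", (10, 20, 30), 0.6, Evidence.REPRODUCED),
    Observation("s003", (11, 20, 30), 0.9, Evidence.PEER_REVIEWED),
]

everything = query(records, (10, 20, 30))                     # all data for the location
vetted = query(records, (10, 20, 30), Evidence.PEER_REVIEWED)  # stricter threshold
print(len(everything), len(vetted))
```

Ordering the levels as integers keeps the threshold a single comparison, so a stricter query is simply the same call with a higher `min_evidence`.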



 

Surface Management System: A surface-based database to aid cortical surface reconstruction, visualization and analysis

James Dickson, Heather Drury and David C. Van Essen
Department of Anatomy and Neurobiology, Washington University School of Medicine, St Louis

 
 

Reconstruction of the cerebral cortex and subsequent flattening procedures routinely generate large collections of data in an increasing variety of formats. Viewing a specific dataset from the many possible configurations requires selection of an appropriate combination of compatible files out of the dozens that may exist for each experimental hemisphere. This complexity reflects the diversity of information needed to specify cortical shape, topology and experimental findings. To address the logistical problems that this imposes, we have developed SuMS, the Surface Management System.

SuMS plays four important roles in the surface reconstruction and analysis process.
First, it provides a systematic framework for the classification and storage of all surface, volume and experimental datasets. Second, within this classification, it serves as a version control system for the rapidly evolving surface and volume datasets. Third, with its built-in Database Management System Support, SuMS provides rapid search and retrieval capabilities across all the datasets. Finally, with both client and server side Java implementations, SuMS is in a good position to act as a multi-platform, multi-user "Surface Request Broker" for the community of neuroscientists studying the structure and function of the cerebral cortex.


The fMRIDC: The Challenges and Rewards of Large Scale Databasing of Neuroimaging Studies

John Darrell Van Horn, Jeffrey S. Grethe, Peter Kostelec, Javed A. Aslam, Daniela Rus, Daniel Rockmore, Michael S. Gazzaniga

 
 

The National fMRI Data Center (http://www.fmridc.org) was established in the autumn of 1999 with the objective of creating a mechanism by which members
of the neuroscience community may more easily share functional neuroimaging data.  Examples in other sciences offer proof of the utility and benefit
that data sharing provides through encouraging growth and development in those fields. By building a publicly-accessible repository of raw
neuroimaging data from peer-reviewed studies, the Data Center expects to create a similarly successful environment for the neurosciences.
In this article, we discuss the continuum of database efforts and provide an overview of the scientific and practical difficulties inherent in managing
various database models. Next, we detail the organization, design, and foundation of the fMRI Data Center, ranging from its current capabilities to
the issues involved in the submitting and requesting of data.  We discuss how a publicly-accessible database enables other fields to develop relevant
tools that can aid in the growth of understanding of cognitive processes. Information retrieval and meta-analytic techniques can be utilized to
search, sort, and categorize study information with a view towards subjecting study data to secondary “meta- and mega-analyses”. In addition,
we discuss the technical and policy choices that had to be addressed in the formation of the Data Center. Among others, these include: human subject
confidentiality issues; the protection of investigators' rights; heterogeneous data description and organization; the development of search tools; and data
transfer issues. We conclude with comments concerning the future of the fMRI Data Center effort, its role in promoting the sharing of neuroscientific
data, and how this may alter the manner in which studies are published.


CONNECTIVITY


X-Anat: A graphical database for storing and analyzing information on neuroanatomical connections

William A. Press
Dept. of Psychology, Stanford University, Stanford, CA 94305
Bruno A. Olshausen
Dept. of Psychology and Center for Neuroscience, 1544 Newton Ct., University of California, Davis, Davis, CA 95616
David C. Van Essen
Dept. of Anatomy and Neurobiology, Washington University School of Medicine, 660 S. Euclid Ave., St. Louis, MO 63110

 
 

We have developed a graphical anatomical database program, X-anat, that allows the results of numerous studies on neuroanatomical connections to be stored, compared, and analyzed in a standardized format. Data are entered into the database by drawing injection and label sites from a particular tracer study directly onto canonical representations of the neuroanatomical structures of interest, along with providing descriptive text information. Searches may then be performed on the data by querying the database graphically, for example by specifying a region of interest within the brain for which connectivity information is desired, or via text information such as keywords describing a particular brain region or an author name or reference. Analyses may also be performed by accumulating data across multiple studies and displaying a color coded map that graphically represents the total evidence for connectivity between regions. Thus, data may be studied and compared free of areal boundaries (which often vary from one lab to the next), and instead with respect to standard landmarks, such as the position relative to well known neuroanatomical substrates, or stereotaxic coordinates. If desired, areal boundaries may also be defined by the user to facilitate the interpretation of results. We demonstrate the application of the database to the analysis of pulvinar-cortical connections in the macaque monkey, for which the results of over 120 neuroanatomical experiments were entered into the database. We show how these techniques can be used to elucidate connectivity trends and patterns that may otherwise go unnoticed. The database software may be obtained from http://redwood.ucdavis.edu/bruno/xanat/xanat.html.
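
The accumulation of evidence across studies can be sketched in a few lines of Python. This is an illustrative toy, not the program's actual data format: grid cells stand in for positions on the canonical drawings, and the study records, field names and the `accumulate_evidence` function are all hypothetical.

```python
from collections import Counter


def accumulate_evidence(studies, region_of_interest):
    """Count, per grid cell, how many studies with an injection
    overlapping the region of interest report label in that cell."""
    evidence = Counter()
    for study in studies:
        if study["injection"] & region_of_interest:   # graphical query: any overlap
            evidence.update(study["label"])           # one vote per study per cell
    return evidence


# Made-up tracer studies; each site is a set of (row, column) grid cells.
studies = [
    {"ref": "study A", "injection": {(0, 0), (0, 1)}, "label": {(5, 5), (5, 6)}},
    {"ref": "study B", "injection": {(0, 1)},         "label": {(5, 6), (6, 6)}},
    {"ref": "study C", "injection": {(9, 9)},         "label": {(5, 5)}},
]

roi = {(0, 1)}
evidence = accumulate_evidence(studies, roi)
for cell, votes in sorted(evidence.items()):
    print(cell, votes)
```

The per-cell counts are what a color coded map would display. Because sites are stored as positions rather than area names, the query never depends on any one laboratory's areal boundaries.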


Advanced database methodology for the Collation of Connectivity data on the Macaque brain (CoCoMac)

Klaas Stephan, Lars Kamper, Ahmet Bozkurt, Gully Burns, Malcolm Young, Rolf Kötter
C. + O. Vogt Brain Research Institute, Heinrich Heine University, Universitätsstr. 1, D-40225 Düsseldorf, Germany

 
 

Driven by the necessity of integrating the ever increasing amount of data on the mammalian brain, several ambitious neuroscientific database projects have been started during the last decade. Databases on anatomical connectivity as delivered by tracing studies play a particularly important role as these data characterize the structural constraints of the complex and poorly understood functional interactions in real neural systems. Available connectivity databases have already made possible important analyses of anatomical brain circuitry in various species and opened exciting new ways to interpret functional data, both from electrophysiological and functional imaging studies. The eventual impact and success of connectivity databases, however, will be determined by the resolution of methodological problems that currently still limit their use. These problems comprise four main points: (i) objective representation of coordinate-free, parcellation-based data, (ii) assessment of the reliability and precision of individual data, especially in the case of contradictory reports, (iii) data-mining in large sets of partially redundant and contradictory data, (iv) automatized and reproducible transformation of data between incongruent brain maps (the "parcellation problem"). In this article, we analyze potential solutions to these problems, and present the specific implementation of a database on the cortical connectivity of the Macaque (CoCoMac; http://www.cocomac.org). The design of this database focuses especially on the needs of both experimental and computational neuroscientists to perform flexible data-mining of the great amount of experimental data published by tracing studies. The efficiency and flexibility of our approach are demonstrated by analyses of the cortico-cortical and thalamo-cortical network in the Macaque monkey.


PHYSIOLOGY


Dynamic publication model for neurophysiology databases

Gardner, D., Abato, M., Knuth, K.H., DeBellis, R., and Erde, S.M.
Dept. of Physiology, Weill Medical College of Cornell University, New York, NY 10021 USA

 
 

We have implemented a pair of database projects, one serving cortical electrophysiology and the other invertebrate neurons and recordings. The design for each combines aspects of two proven schemes for information interchange. The journal article metaphor determined the type, scope, organization, and quantity of data to comprise each submission. Sequence databases encouraged intuitive tools for data viewing, capture, and direct submission by authors. Transcending these models, neurophysiology additionally requires new datatypes and benefits from dynamic data viewers that function like a virtual oscilloscope. Datatypes, chiefly timeseries, histogram, and bivariate, and illustration-like wrappers, were selected by utility to the community of investigators. Functional and anatomical characteristics specify neurons. Searches are performed via visual interfaces to sets of controlled-vocabulary trees of values for neurophysiological metadata attributes; in neuroscience, where interpretation of recordings is heavily context-dependent, such metadata also supplement datasets. Permanence is advanced by data model and data formats largely independent of contemporary technology; the projects rely only on Java and the new XML standard, itself implementation-dependent. All user tools are Java-based, free, multiplatform, and distributed by our application server to any contemporary networked computer. Copyright is retained by submitters; viewer displays are dynamic and do not violate copyright of related journal figures. Panels of neurophysiologists view and test schemas and tools, enhancing community support.



 

MORPHOLOGY


Local and global approaches in computational neuroanatomy

Giorgio Ascoli, Jeffrey Krichmar, Slawomir Nasuto, and Steven Senft
Krasnow Institute for Advanced Study at George Mason University

 
 

It is generally assumed that the variability of neuronal morphology has an important effect on the connectivity and response within the nervous system, but this effect has not been thoroughly investigated. Neuroanatomical archives represent a crucial tool to explore structure-function relationships in the brain. We are developing computational tools to describe, generate, store, and render large sets of three-dimensional neuronal structures in a format that is compact, quantitative, accurate, and readily accessible to the neuroscientist.

Single-cell neuroanatomy can be characterized quantitatively at several levels. In computer-aided neuronal tracing files, a dendritic tree is described as a series of cylinders ("branches"), each represented by diameter, spatial coordinates (x, y, and z), and the connectivity to other branches in the tree. This "Cartesian" description constitutes a completely accurate mapping of dendritic morphology, but it bears little "intuitive" information for the neuroscientist (e.g. it is difficult to establish the morphological class of a neuron by simply looking at its Cartesian file). In a classical neuroanatomical analysis, in contrast, neuronal dendrites are characterized on the basis of the statistical distributions of morphological parameters, e.g. maximum branching order or bifurcation asymmetry. This description is intuitively more accessible, but it only yields information on the collective anatomy of a group of dendrites, i.e. it is not complete enough to provide a precise "blueprint" of the original data. We are adopting a third, intermediate level of description, which consists of the algorithmic "generation" of neuronal structures within a certain morphological class based on a set of measured parameters. Given the right algorithm, these "fundamental" parameters describe that morphological class as intuitively as in classical neuroanatomical analysis (because their statistical distributions have an intuitive geometrical meaning), and as completely as in the Cartesian format (because they are sufficient to generate and display complete neurons). Since fundamental parameters measured from experimental data result in statistical distributions, the algorithms that generate "virtual neurons" sample values from these distributions stochastically. As a result, as in nature, no two virtual neurons are identical, even if they belong to a recognizable anatomical class.
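
The two ideas in this paragraph, the Cartesian record and stochastic generation from a few parameters, can be made concrete with a toy Python sketch. It is not the authors' algorithm: the `Branch` record, the single bifurcation rule and every parameter value (bifurcation probability, mean length, taper, minimum diameter) are assumptions made up for illustration, not measured distributions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Branch:
    """One cylinder of a 'Cartesian' description (hypothetical layout)."""
    ident: int
    parent: int        # -1 marks the root
    x: float           # coordinates of the branch end point
    y: float
    z: float
    diameter: float


def random_direction(rng):
    """Sample a unit vector with uniformly random orientation."""
    dx, dy, dz = (rng.gauss(0.0, 1.0) for _ in range(3))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
    return dx / norm, dy / norm, dz / norm


def generate_tree(rng, bifurcation_prob=0.7, mean_length=50.0,
                  taper=0.8, min_diameter=0.5):
    """Grow a virtual dendritic tree; each tip decides from its own diameter only."""
    branches = [Branch(0, -1, 0.0, 0.0, 0.0, 2.0)]
    tips = [0]
    while tips:
        parent = branches[tips.pop()]
        # A tip terminates if it is too thin or loses the random draw.
        if parent.diameter <= min_diameter or rng.random() >= bifurcation_prob:
            continue
        for _ in range(2):  # bifurcate into two daughter branches
            length = rng.expovariate(1.0 / mean_length)
            dx, dy, dz = random_direction(rng)
            child = Branch(len(branches), parent.ident,
                           parent.x + length * dx,
                           parent.y + length * dy,
                           parent.z + length * dz,
                           parent.diameter * taper)
            branches.append(child)
            tips.append(child.ident)
    return branches


tree = generate_tree(random.Random(42))
print(len(tree), "branches")
```

Four numbers plus a random seed stand in for the whole list of cylinders, and a different seed yields a different tree from the same parameters.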

This "computational" approach to neuroanatomy, originally proposed in the 1970s, has only recently become a viable strategy thanks to the exceptional improvement of computer hardware, software, and graphics. The advantages of the "algorithmic" description of neuronal structure are immense. If an algorithm can measure the values of a handful of parameters from an experimental database and generate virtual neurons that are anatomically indistinguishable from their real counterparts, a great deal of data compression and amplification can be achieved. Data compression results from the ability to describe quantitatively and completely thousands of neurons from a morphological class with just a few statistical distributions of fundamental parameters. Data amplification is possible because, from a set of experimental neurons, many more virtual analogs can be generated. In principle, this approach could allow a neuroanatomical database containing data for an entire human brain to be created and stored on a personal computer.

Two major types of algorithms have been proposed for the generation and description of dendritic trees. Local algorithms rely entirely on a set of local rules correlating morphological parameters (such as branch diameter and length) to let each branch grow independently of the other dendrites in the tree and independently of its absolute position within the tree. In global algorithms, new dendritic branches are dealt "from outside" to competing groups of growing tips, also depending on their position in the tree (e.g. on their distance from the soma). Local and global algorithms offer complementary advantages. Local algorithms are simpler and more intuitive, and their fundamental parameters can be measured directly from experimental data. Because of their small number of parameters, they are perfectly suited to studying structure-function relationships and the origin of emergent properties (i.e. anatomical parameters that are not explicitly imposed in the algorithm). Global algorithms are usually more flexible and more accurate overall, but many of their fundamental parameters must be obtained through extensive and elaborate parameter searches. Global algorithms can also be extended to generate populations of interconnected neurons (networks), instead of single neurons. We are developing two programs, L-Neuron and ArborVitae, which implement several global and local algorithms, to investigate systematically the potential of the "computational neuroanatomy" approach for neuroscience databases. We virtually generated anatomically plausible neurons for several morphological classes, including cerebellar Purkinje cells, hippocampal pyramidal and granule cells, and spinal cord motoneurons.


MOLECULAR


How to handle the wealth of receptor subunit sequences and get the most from their comparisons: The Ligand Gated Ion Channel database (LGICdb)

Nicolas Le Novere* and Jean-Pierre Changeux$
*Dept. of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
$Neurobiologie Moleculaire, CNRS URA D1284, Institut Pasteur, 75724 Paris, France

 
 

The extracellularly activated ligand gated ion channels (LGIC) are polymeric ionotropic receptors to neurotransmitters. These LGIC constitute superfamilies of receptors formed by homologous subunits. The last two decades revealed an unexpected wealth of genes coding for these subunits. Multiple comparisons of sequences proved to be an invaluable tool in modern pharmacological investigations. From the study of regulation of gene expression to the understanding of protein structure-function relationships, almost every experimental design involves a sequence comparison step. In addition, the careful analysis of known sequences may lead to the cloning of new genes. Unfortunately, although of outstanding importance, the general sequence databases suffer from several imperfections due to their size and their widespread purpose. Each gene is often represented by multiple entries, multiplicity generated from intrinsic causes, such as alternative splicing or editing, from methodology, cDNA versus genomic cloning, but also from competition between laboratories, each submitting its own clone. In addition, unwanted errors are sometimes made during the submission process. There is therefore room for expert-maintained databases, of restricted focus but higher quality, where the knowledge of the research field helps to filter the huge amount of data generated. The Ligand Gated Ion Channel database (LGICdb) has been developed to handle the growing wealth of cloned LGIC subunits. The database aims to provide only one entry for each gene, containing annotated nucleic acid and protein sequences. Release 3 of the LGICdb contained 266 subunit entries belonging to 28 different species and covering three groups of receptors: the superfamily of pentameric LGIC (nicotinic, 5-HT3, GABA A and C, glycine, and anionic glutamate receptors), the cationic tetrameric glutamate receptors (AMPA, kainate and NMDA receptors) and the trimeric ATP P2X receptors.
In addition to the gene entries, the database provides multiple sequence alignments, phylogenetic investigations, and atomic coordinates when available. The LGICdb is accessible via the World Wide Web (http://www.pasteur.fr/recherche/banques/LGIC/LGIC.html), where it is continuously updated.


METHODS


NeuroScholar and Knowledge Mechanics: a computational framework to manage and manipulate information from the published literature

Gully Burns
University of Southern California, 3614 Watt Way, Los Angeles, CA 90089-2520, USA

 
 

In any scientific discipline, the role of the published literature is to provide an 'intellectual environment' for a domain of knowledge, representing the totality of both experimental evidence and theoretical explanation in articles, books and reviews. Neuroscientists suffer greatly from information overload, due to the extent, complexity and complicated taxonomy of their subject. We describe a paradigm for knowledge management systems called 'Knowledge Mechanics' that implements a versatile, generally applicable framework for the management and manipulation of information that is represented in a distributed literature. We have built a system that implements this paradigm for the systems-level neuroscientific literature called 'NeuroScholar'. This system allows neuroscientists to interact with the information in the literature in roughly the same way that an application programming interface ('API') allows software engineers to manipulate data constructs and subroutines. The system is designed to provide the following functionality: (1) to represent the contents of the system's target literature accurately; (2) to permit users to interpret the literature according to their own judgement, producing a personalized representation; (3) to provide mechanisms to allow users to merge, share and compare their individualized representations; (4) to cross-reference the data in order to identify contradictions and discrepancies between different personalized representations and (5) to provide data-analysis tools that help us to form more powerful interpretations of the literature. Within this paper, we describe Knowledge Mechanics and NeuroScholar in detail, both conceptually and practically. We describe the history of the project and present worked examples illustrating how the system may be used.


CANTOR - a system for the dynamical storage and analysis of complex biological data

Mark A. O'Neill (1) and Claus C. Hilgetag (2)
(1) Digital Vision, Park Road, Didcot, Oxfordshire OX11 8QY, UK
(2) Boston University School of Medicine, Department of Anatomy and Neurobiology, 700 Albany Street W746, Boston, MA 02118, USA

 
 

Many problems in analytical biology, like the classification of species, the analysis of metabolic or neural networks, or the modelling of macromolecules, involve complex relational data. Here we describe a novel software system, CANTOR, which has been developed to deal effectively with such data, and which can also be used as a general development tool for intelligent database applications. Although the system grew out of a specific project in the analysis of neuroanatomical connectivity, it can be applied to a wide range of relational data. Principal elements of the CANTOR system are a database of dynamic objects, as well as a set of library functions which can perform various operations on these objects. The objects possess attributes that define the objects' characteristics as well as their relationships to other objects within the database. Most of the object relationships are dynamically maintained and updated by the objects themselves, thus providing a flexible, efficient and constantly updated data representation. The CANTOR library routines allow modifications of object attributes as well as the rearrangement of objects in the database. This restructuring can be evaluated by a large variety of user-defined cost functions and can be guided by optimisation algorithms, providing a flexible and powerful tool for the structural analysis of the database content. The application of optimisation approaches also makes it possible for the CANTOR system to deal effectively with incomplete and inconsistent data. A prototypical form of CANTOR has been coded and has subsequently been used successfully in the analysis of anatomical and functional mammalian brain connectivity, involving complex and inconsistent experimental data. In addition, it has been used for solving multivariate engineering optimisation problems. CANTOR has been programmed using the ANSI-C language and is thus architecture-independent. 
The software is supported by systems libraries which allow multi-threading (the concurrent processing of several database operations), as well as the distribution of the dynamic data objects and library operations over several computers at once. These attributes make the system easily scalable and in principle allow the representation and analysis of arbitrarily large sets of relational data.
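
The idea of rearranging objects under a user-defined cost function, guided by an optimisation algorithm, can be sketched briefly. The sketch below is in Python for brevity (CANTOR itself is described as ANSI-C) and does not use or reproduce the CANTOR library: the `wiring_cost` function, the `optimise` routine, the node names and the simple swap-based hill climbing are all assumptions chosen for illustration.

```python
import random


def wiring_cost(order, connections):
    """Example cost: total distance between connected nodes in a linear arrangement."""
    position = {node: i for i, node in enumerate(order)}
    return sum(abs(position[a] - position[b]) for a, b in connections)


def optimise(order, connections, cost=wiring_cost, steps=2000, seed=0):
    """Hill climbing by random pairwise swaps; a swap is kept only if the
    cost does not increase. Any function cost(order, connections) can be used."""
    rng = random.Random(seed)
    best = list(order)
    best_cost = cost(best, connections)
    for _ in range(steps):
        i, j = rng.sample(range(len(best)), 2)
        best[i], best[j] = best[j], best[i]
        new_cost = cost(best, connections)
        if new_cost <= best_cost:
            best_cost = new_cost
        else:
            best[i], best[j] = best[j], best[i]   # undo the swap
    return best, best_cost


# Made-up nodes and connections forming a simple chain A-B-C-D-E.
nodes = ["A", "C", "E", "B", "D"]
connections = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

initial_cost = wiring_cost(nodes, connections)
arrangement, final_cost = optimise(nodes, connections)
print(initial_cost, final_cost, arrangement)
```

Because the cost function is passed as a parameter, the same search loop can score an arrangement by any user-defined criterion. A missing connection contributes nothing to the sum, so incomplete data does not stop the search.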


MODELS



 

Neuromorphic hardware databases for exploring structure-function relationships in the brain

Catherine Breslin
Department of Computing Science and Mathematics, University of Stirling, Stirling FK9 4LA, Scotland

 
 

Neuromorphic hardware is the term used to describe full-custom integrated circuits (ICs, or silicon "chips") that are the product of neuromorphic engineering, a methodology for the synthesis of biologically inspired systems such as retinae, cochleas, oculomotor responses and central pattern generators, but also for the replication of neurons and functional circuits of neurons to provide tools for the analysis of the workings of the nervous system, including structure-function relationships. Neuromorphic hardware can be constructed with either digital or analogue circuitry or with a hybrid of the two. Currently, most examples of this type of hardware are constructed using analogue circuits. The correspondence between these circuits and neurons, or functional circuits of neurons, can exist at a number of levels. At the smallest scale, the correspondence is between populations of ion channels, either synaptic or non-synaptic, and types of field-effect transistors, whilst the resistive and capacitive properties of the neuronal membrane can be represented with extrinsic devices, or with the intrinsic properties of the materials from which transistors are composed: doped silicon and polysilicon. This allows silicon "neurons" to be built, with dendritic, somatic and axonal structures and endowed with ionic and synaptic properties. Examples of structure-function relationships already explored using neuromorphic hardware include directional selectivity, sublinear summation and temporal coding. Establishing databases for this hardware is valuable for two reasons. Firstly, independently of neuroscientific motivations, the field of neuromorphic engineering would benefit greatly from a resource in which circuit designs could be stored in a form appropriate for reuse and refabrication. Analogue designers would benefit particularly from such a database, as there are no equivalents to the algorithmic design methods available to designers of digital circuits.
Secondly, and more importantly for the purpose of this theme issue, there is the possibility of databases of silicon neuron designs replicating specific neuronal types and morphologies, especially if an automated process for translating morphometric data directly into a layout-compatible format were to be developed. The question that needs to be addressed is: what could a neuromorphic hardware database contribute to the wider neuroscientific community that a conventional database could not? The answer is that neuromorphic hardware is expected to provide analogue sensory-motor systems for interfacing the computational power of symbolic, digital systems with the external, analogue environment. It is also expected to contribute to ongoing work in "living silicon" and neural prosthetics. This, combined with the possibility of evolving the hardware in the form of analogue field programmable gate arrays, creates the need for a database to be established, and it would be advantageous to set about this whilst the field is relatively young. The paper will outline a framework for the construction of a neuromorphic hardware database, for use at the stage when neuromorphic design can actively contribute to, as well as being informed by, the biological exploration of structure-function relationships.


NeuroML: Model Description Methods for Collaborative Modeling in Neuroscience

Nigel H. Goddard, Fred Howell, Hugo Cornelis, Michael Hucka, Dave Beeman

 
 

Neuronal systems are complex and models of these systems are correspondingly complex. We describe methodologies which will improve the ability of neuroscientists to collaborate in the modeling process. It is crucial for modelers to have access to tools which support discussion, development and exchange of models and components of models. We report our findings on the requirements for these tools and our proposal for structuring their development. The desirability of declarative methods for describing models is discussed. We show the equivalence of this form to object-oriented class design and database schema definition (collectively called templates). We introduce a template hierarchy sufficient to describe models from membrane to network levels. The templates support both a database of models and simulation of models.
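
The correspondence between a declarative description, a class design and a database schema can be illustrated with a short Python sketch. It is not NeuroML syntax: the four template classes, their fields and units, and the `schema` helper are hypothetical and serve only to show how one definition can be read in all three ways.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import List


@dataclass
class Channel:                      # membrane level
    name: str
    max_conductance: float          # units are an assumption (e.g. S/cm^2)


@dataclass
class Compartment:
    name: str
    length_um: float
    diameter_um: float
    channels: List[Channel] = field(default_factory=list)


@dataclass
class Cell:
    name: str
    compartments: List[Compartment] = field(default_factory=list)


@dataclass
class Network:                      # network level
    name: str
    cells: List[Cell] = field(default_factory=list)


def schema(template):
    """Read a template as a schema: the ordered list of its field names."""
    return [f.name for f in fields(template)]


# The same template read as a class: build a (made-up) model instance.
soma = Compartment("soma", 20.0, 20.0, [Channel("Na", 0.12), Channel("K", 0.036)])
net = Network("toy network", [Cell("cell 1", [soma])])

# ...and as a declarative description: plain nested data, ready to store or exchange.
description = asdict(net)
print(schema(Compartment))
print(description["cells"][0]["compartments"][0]["channels"][0])
```

A simulator could instantiate the classes directly, while a database could derive its tables from `schema` and store the output of `asdict`.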


PERSPECTIVES


Neuroscience databases: Tools for exploring brain structure-function relationships

Rolf Kötter
Heinrich Heine University Düsseldorf, Germany

 
 

Faster than ever, neuroscience is generating vast amounts of data that await cross-referencing, comparison, integration and interpretation in the endeavour to unravel the mechanisms of the brain. The complex, diverse and distributed nature of these data requires the development of advanced neuroinformatics methodologies for databases and associated tools that are now beginning to emerge.
Here I present an overview of current issues in the representation, integration and analysis of neuroscience data from molecular to brain systems levels, including issues of implementation, standardisation, management, quality control, copyright, confidentiality and acceptance. Particular emphasis is given to integrative neuroinformatics approaches for exploring structure-function relationships in the brain.



Rolf Kötter 01/2001.