Pierre Baldi (PhD '86), Computer Scientist and Explorer of the Natural-Artificial Intelligence Interface
What can the brain teach us about artificial intelligence, and what can artificial intelligence teach us about the brain? Since John Hopfield's pioneering work on neural networks at Caltech and his development of the Hopfield Model, which revolutionized our understanding of information storage and retrieval, this question has become one of the most interesting and consequential research avenues in science. Particularly exciting over the past three decades have been the parallel advances of neuroscience and neurobiology on one side, and the rise of computational power and the increasing capabilities of machine learning on the other. While these are discrete fields, some of the most important work is happening at the interface, and the research benefits, in both natural and artificial intelligence, have been astounding. At the center of these developments is Caltech alumnus and UC Irvine Distinguished Professor Pierre Baldi, whose educational trajectory and research achievements have had - and will have for decades to come - a major impact on what intelligence means and what it can accomplish.
In the discussion below, Baldi recounts the liberating feeling he experienced upon his arrival at Caltech. As a university student in Paris, Baldi felt too confined by the hierarchical culture. At Caltech, where it was easy to interact meaningfully with luminary professors, and where the administrative boundaries between disciplines were essentially non-existent, Baldi was free to pursue his interests across mathematics, biology, computer science, and dynamical systems. As a postdoctoral scholar at UC San Diego, Baldi gained valuable perspective and new understanding of the conceptual power of algorithms, and his dual appointments at JPL and in the Division of Biology at Caltech cemented a research agenda that would toggle effortlessly between human and machine intelligence.
Baldi joined the faculty at UC Irvine with a clearly established vision to formalize and strengthen a campus-wide interest in interdisciplinarity. With his appointments in the School of Information and Computer Sciences, the Center for Machine Learning and Intelligent Systems, and the Institute for Genomics and Bioinformatics, Baldi is leading research that spans basic and applied science, and biology and computers, all within a broader framework that seeks to universalize the benefits of deep learning across science, mathematics, and engineering.
The question of what exactly artificial intelligence is, and how it will affect society in the future (for better or worse), has become a topic of intense public interest. As a leading practitioner in the field, Baldi has also assumed a role as public intellectual. In addition to over 400 scholarly publications, Baldi is a popular lecturer and the author of landmark texts that explore evolution, deep learning, and bioinformatics, which grapples with the study of biology in our age of big data. At the end of the interview, Baldi reflects on the latest advances of ChatGPT, which he contends has already passed the Turing Test, meaning that the age of being unable to distinguish human from computer communication has already arrived. The implications of this are profound and extend beyond the narrow question of computational power. It is a comforting thought to know of Baldi's ongoing dedication to understanding - and ultimately improving - the machine-human intelligence interface.
Interview Transcript
DAVID ZIERLER: This is David Zierler, Director of the Caltech Heritage Project. It's Thursday, December 21, 2023. It is my great privilege to be here with Professor Pierre Baldi. Pierre, it's wonderful to be with you. Thank you so much for joining me today.
PIERRE BALDI: Thank you for having me.
ZIERLER: Pierre, to start, would you please tell me your title and institutional affiliations? You'll notice I pluralized that, because I know you have many.
BALDI: My official title is Distinguished Professor with a primary appointment in the Department of Computer Science at the University of California, Irvine. That is my primary appointment. I have courtesy appointments in several other departments—in Bioengineering, Mathematics and Statistics. I am also the Director of the Institute for Genomics and Bioinformatics at UCI, which is being replaced by the AI in Science Institute. I am also involved with the Center for Machine Learning and Intelligent Systems at UCI.
ZIERLER: To give a sense of how rare an honor the title Distinguished Professor is, how many are there, roughly, at UC Irvine?
BALDI: The rules have changed over time. I am not sure, but something like one percent of the faculty. Currently, the UC system has a system of steps at each level: assistant professor, associate professor, full professor. Once you reach the last step, which is full professor step nine, you automatically become a distinguished professor.
Understanding Intelligence in All its Forms
ZIERLER: I want to start at a very broad level. I want to understand how you put all these disciplines together to create some really fundamental and important questions. Let's start first with those questions. What are the most important questions that you've pursued in your career?
BALDI: My long-term vision has always been to try to understand intelligence in brains and in machines. That's my lifetime quest. Of course it is a very difficult problem. It's not well-defined. There is tension and synergy between the biological side and the engineering, the artificial intelligence side. Sometimes there is slow progress in one direction so you move to some other direction, so I've been oscillating between those two branches. The key thing is that on the artificial side, we have been developing a number of techniques that can be applied to anything. The analogy that I like to give is linear regression. Everybody knows what linear regression is, and if you are an expert in linear regression, you can work with economists, with psychologists, with physicists. They give you data points and you draw a line through those points to better understand a particular phenomenon. This is similar to some of what I have been doing. I am an expert in deep learning, in artificial neural networks, and I've been applying those techniques to data from different areas of science, whether the life sciences or more recently physics and chemistry. That is why I have been able to work on a variety of problems. But at its core, my research is focused on understanding natural and artificial intelligence.
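[Editor's note: Baldi's linear-regression analogy - collaborators supply data points, and the expert draws a line through them - can be made concrete in a few lines of Python. This is a minimal sketch with made-up data points; NumPy's least-squares routine stands in for "drawing a line."]

```python
import numpy as np

# Hypothetical data points supplied by a collaborator (x, y pairs).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Fit a line y = a*x + b by least squares.
A = np.vstack([x, np.ones_like(x)]).T
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"slope={a:.2f}, intercept={b:.2f}")
```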
ZIERLER: Is it truly a two-way street? In other words, do you look at machines to better understand biological intelligence, and do you look at biological brains in order to better understand machine intelligence?
BALDI: Yes, it's a two-way street, with perhaps a little twist that in recent years the AI side has become so strong and has somewhat diverged from the biological side. If your goal is to build the most intelligent, capable machine today, you don't really have to look to biology to do that.
ZIERLER: What aspects of your research are theoretical and where are you focused more in an experimental sense?
BALDI: By training, as a graduate student at Caltech, I was a mathematician, so I have always kept some theoretical work, and I continue to do that, and I am very excited by it. I usually do that alone or in collaboration with other mathematicians. Then with my graduate students, I tend to work on the more applied problems, where we're getting data from our collaborators at UCI or elsewhere, from different sciences, and we work with them to solve those problems.
Experimental and Simulated Data
ZIERLER: I wonder if you can give a sense of the kind of data that is most useful to you. Where does it come from and what do you do with it?
BALDI: For example, we have several projects with particle physicists. Typically they get data from large instruments, whether it's the Large Hadron Collider at CERN, for example, or the neutrino detector in the NOvA/DUNE Collaboration. They also have simulators, so they can produce simulated data. We receive either experimental data or simulated data, and then in a typical case we apply these AI methods, these neural networks, these deep learning approaches, to try to predict something of interest or to classify the data, et cetera. Another typical example would be biomedical imaging. We work with doctors from the UCI hospitals. They have images, let's say, of colonoscopies, where in some images you have polyps, and in some other images you don't have polyps. We have trained neural networks to classify whether a given image frame contains a polyp or not and to draw a bounding box around the polyp, when a polyp is present. Training can take a few days, but once trained these neural networks are very fast—they can process a frame in roughly twenty milliseconds. So you can then apply them to process colonoscopy videos, running them on one frame in every five or so, resulting in systems that can help doctors identify polyps during colonoscopies in real time.
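[Editor's note: the real-time arithmetic Baldi describes can be checked with a short back-of-the-envelope sketch. The 30 fps video rate is an assumption; the 20 ms inference time and the one-frame-in-five sampling come from his description.]

```python
# Back-of-the-envelope check that sampled inference keeps up with live video.
video_fps = 30        # assumed frame rate of the colonoscopy video stream
sample_every = 5      # run the network on one frame in every five
inference_ms = 20     # rough per-frame inference time once the network is trained

frames_processed_per_s = video_fps / sample_every        # 6 frames per second
compute_needed_ms = frames_processed_per_s * inference_ms  # ms of work per second of video

# The system is real-time if each second of video needs less than 1000 ms of compute.
real_time = compute_needed_ms < 1000
print(frames_processed_per_s, compute_needed_ms, real_time)
```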
ZIERLER: You mentioned simulated data. It's almost as much a philosophical as it is a scientific question—for you, are simulations sometimes an end in and of themselves, or do you always require them as a backdrop to experimental data, to data that comes from the real world?
BALDI: Of course you need both, and ultimately you tend to go to the real data, but simulated data is very important, and sometimes it's sufficient for prototyping and for publication. In some cases, simulated data can be quite accurate. In these large collaborative experiments in physics, there are constraints on how you publish things. You first have to show that your method works on simulated data and publish that, and then you may go to real data and your paper suddenly has one thousand authors. There are all kinds of rules and protocols governing those larger collaborations.
ZIERLER: Is any aspect of your research purely applied? Do you think about applications to societal benefit or startup companies, or is your starting point always fundamental research or curiosity-driven research?
BALDI: The majority is curiosity-driven and fundamental research, but I have been involved in startups, and sometimes we do translational work. We have had projects that have led to translational applications that are used by others.
ZIERLER: When are you inspired to do a startup? When do you see that there might be commercial value to the research that you're doing?
BALDI: It really depends on a case-by-case basis, and a lot depends also on your collaborators. You need experts from the application field. If it's a medical application, of course you have to work with doctors and understand what their constraints are. How well does the system that you have developed perform? Can it be translated into a clinical application?
ZIERLER: What are some of the most important funding sources for you? What are the federal agencies or even private foundations that are most important in supporting your work?
BALDI: The primary ones have been NSF—the National Science Foundation—and NIH—the National Institutes of Health. Those have been the primary ones by far. Occasionally I have had other sources, but I would say mostly NSF and NIH.
ZIERLER: What have been some of the most important technological advances over the course of your career that have not only allowed you to perform research more efficiently but maybe do things that weren't even possible in earlier generations?
BALDI: There are two main ones that are obvious, and they are true for the entire field of AI. One is computing power. The things we can do today were impossible in the 1980s. The algorithms that we use today are not very different from what we already had in the 1980s, but the computing power is one million times greater today, especially with graphical processing units, GPUs. I would say that's number one. Number two is data, the availability of large amounts of data of different kinds, because of the internet, because of databases, because of new sensors and so forth. Data and computing power have been the two main driving factors behind progress in AI and its applications to the sciences.
AI as a Subset of Machine Learning
ZIERLER: There are so many questions about AI and machine learning. Let's just start first with the terminology. Artificial intelligence and machine learning, are these interchangeable for you, and if not, where are the meaningful distinctions?
BALDI: The truth is that today I would say 90 percent of artificial intelligence is machine learning. Machine learning can be viewed as a subfield of AI, but it's pretty much most of AI today. There are a few things that are not machine learning or that people would classify as not being machine learning, but the majority of AI is machine-learning-based. Within machine learning, 90 percent of machine learning I would say is deep learning, or what used to be called neural networks in the early 1980s, at Caltech among other places.
ZIERLER: The 10 percent where there isn't that overlap, what is artificial intelligence absent machine learning?
BALDI: Systems that are rule-based, where one is trying to establish a number of rules that guide the behavior of an intelligent system in a certain domain. That would be a typical example of something that people consider as non-machine-learning based. Some people have tried to build systems that learn the rules from data. In general, this has not worked so well on a large scale. There are a few examples of rule-based systems, but overall this approach does not work so well. What has worked well is deep learning, the machine learning approach.
ZIERLER: Within machine learning you mentioned deep learning. What is the threshold? What is required within machine learning to get to deep learning?
BALDI: Deep learning is basically learning using networks of simplified neurons with connections that can be adjusted. The main distinction I would say is between shallow learning and deep learning. In shallow learning, there are only input and output neurons. The prototypical case is linear regression, which I mentioned at the beginning. Deep learning is when there are hidden neurons in the network, neurons that are not in direct contact with the inputs or the outputs of the system. These neurons have to learn their connections somewhat in the dark. That's the notion of deep learning. How do you adjust the synaptic weights, the parameters, of a neuron that is deep inside a forest of connections and doesn't see the outside world, the inputs or the outputs? That's the central mystery behind deep learning.
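[Editor's note: the shallow-versus-deep distinction Baldi draws can be sketched in a few lines. This is a toy illustration with random weights, not one of his actual models: a shallow model maps inputs directly to outputs, while a deep one interposes a hidden layer whose neurons see neither the raw inputs nor the targets directly.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))          # 8 examples, 3 input features

# Shallow: inputs connect directly to the output, as in linear regression.
w_out = rng.normal(size=(3, 1))
shallow_prediction = x @ w_out

# Deep: a hidden layer of 4 neurons sits between input and output.
w_hidden = rng.normal(size=(3, 4))   # weights the hidden neurons must adjust "in the dark"
hidden = np.tanh(x @ w_hidden)       # hidden activations, one nonlinear step removed from the data
w_deep_out = rng.normal(size=(4, 1))
deep_prediction = hidden @ w_deep_out

print(shallow_prediction.shape, deep_prediction.shape)
```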
Connecting Neuroscience and Algorithms
ZIERLER: What are the pathways to overcoming or to shedding light on that central mystery?
BALDI: That's a very good question. It is first very useful to think about the brain. To see the importance of the problem, let's think about the brain, and let's think about a synapse deep inside the brain. Imagine you are trying to learn how to play tennis or how to play the violin. The brain has to decide whether this little synapse should strengthen or weaken itself so that you play better tennis or better violin. Synapses are very small, and you are probably not used to thinking about those very small scales, so I'm going to rescale everything by a factor of a million. If you rescale things by a million, a synapse becomes 10 centimeters in size. That's about the size of a fist. Then, where is the tennis racket? Well, it's one meter away, the length of your arm. You multiply this by one million; you get a thousand kilometers. So, you are in Pasadena, your fist is a synapse, and the tennis racket is in San Francisco or Seattle. This little fist of yours doesn't know anything about tennis, doesn't know anything about Newton's laws of mechanics, and it cannot Google the answer. It knows nothing about the world; it knows only about its local, biochemical environment. How does it make this decision of strengthening or weakening itself so that your behavior, your global behavior, your ability to play tennis improves? It's completely mind-boggling. It's baffling, right? That's the center, the mystery, of deep learning. The answer we have in AI is actually a very simple algorithm called gradient descent, and we can prove mathematically that it is essentially the only way you can solve this problem. I can go into more detail about what gradient descent is in artificial systems if you want me to. Nobody has been able to prove that biological systems are doing gradient descent.
But there is evidence that they must be computing some kind of approximation to gradient descent in a way that is different from the way used by our artificial systems, which is called back-propagation. It's a longer, technical discussion which I'm happy to go into if you want.
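[Editor's note: the gradient descent algorithm Baldi refers to can be sketched in a few lines. A toy one-parameter quadratic stands in for a network's error; in real deep learning the gradient of the error with respect to each weight is computed by back-propagation.]

```python
# Minimal gradient descent on a one-parameter "loss" L(w) = (w - 3)^2.
# The gradient supplies exactly the local "which way should I move?"
# information that a lone synapse would otherwise lack.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # derivative of the loss with respect to w

w = 0.0                      # initial parameter (a "synaptic weight")
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)   # step against the gradient

print(round(w, 4))  # converges toward the optimum at w = 3
```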
ZIERLER: Is there a theoretical basis that biological systems have this capability and it's just a matter of finding it?
BALDI: I think to the best of our current knowledge, in a very high-dimensional system like we think the brain is, with a lot of parameters, and which must learn by adjusting these parameters to reach some sort of global optimum, the only algorithm that can do this task is gradient descent. If you are changing synapses randomly and hoping to get better behavior, it is hopeless, because it's such a high-dimensional system. You need a gradient. You need a direction that tells you where you should move to locally in order to improve things a little bit. Now, you can also make an argument that biological systems are faced with difficulties in computing this gradient exactly. The logical conclusion is that they have to compute some approximation to the gradient. That is one current area of research, trying to come up with ideas on how biological networks may compute approximate gradients.
ZIERLER: More broadly is there a basis that biological systems rely on algorithms?
BALDI: I would say yes. I have to preface this with a disclaimer, which is that we still really do not know how the brain works. If you ask any neuroscientist to tell you, where do you store the first digit of your telephone number, they are completely unable to tell you that, and it's not just because they cannot do experiments on human brains; you can take a mouse, store one bit of information in the brain of the mouse, and nobody can tell you where that bit is. So, we're still very far from understanding how brains really work. We're making guesses, but yes, the guess is that there is a sort of gradient descent-like algorithm that is being implemented in the messy biological wetware of the brain.
ZIERLER: I can't help but ask, are you strictly a materialist? Are you convinced that just because we don't currently know where the brain stores these things there must be a material basis for it? Have you ever thought philosophically about the idea that there might be non-material bases for how the brain operates?
BALDI: As far as storage is concerned, storage of bits, I do believe that there is a material story there. But if you take broader questions—consciousness of course, and so forth—I don't have very strong opinions. It's a difficult, messy question.
ZIERLER: It could get into metaphysics.
BALDI: And other hypotheses. For instance, there are theories trying to connect fundamental aspects of consciousness to quantum mechanics, proposed for instance by the physicist Roger Penrose—I don't have strong opinions on any of these. I do think our introspective ideas about consciousness, and the vocabulary built on them thousands of years ago, are inadequate and misleading.
ZIERLER: Of course you have been thinking about artificial intelligence for a very long time, but in the zeitgeist, in our culture, artificial intelligence is everywhere now. This is a relatively recent phenomenon. What accounts for it? Why are so many people thinking about artificial intelligence these days?
BALDI: There has been a lot of progress in the last 10, 15 years, driven by computing power, so you start to see all kinds of new, interesting applications—self-driving cars, for example. I would say that the major breakthrough, actually, occurred with the release of ChatGPT in late 2022 and GPT-4 in 2023. I think GPT-4—and of course there were predecessors of GPT-4—really captured the imagination of many people. These systems are called large language models. They are extremely capable. In my opinion, GPT-4 essentially passed the Turing test. We can have a discussion on some of the details, and yes, they make mistakes, sometimes they hallucinate things, but by and large they pass the Turing test, which was the holy grail of artificial intelligence. The idea behind passing the Turing test is to build a machine that interacts with a human and the human cannot reliably tell whether it's a machine or a human. The interaction is through texting or emailing, with no direct contact of course. I don't know if you have played with ChatGPT, or better with GPT-4, but it's pretty good. In many dimensions, it's much better than humans. You can ask it to write a poem about the American elections in Spanish and in the style of Shakespeare, and in a few seconds you get something that is way better than what any human could produce. I think that is really an amazing event in the history of intelligence and artificial intelligence. And now you also have multi-modal systems that can ingest or produce data from other modalities, such as speech or video.
Guardrails on AI
ZIERLER: Of course with all of this interest comes some concern about where AI might be going and if there are necessary guardrails—political, social, economic—that should be put on AI to constrain its power. What are your feelings on this, and are you involved at all in those policy discussions?
BALDI: Of course GPT-4, other large language models, and more broadly modern AI, come with incredible opportunities but also incredible challenges. I am very worried about the challenges. There were two letters that were signed by many scientists calling for a moratorium or a slowdown of research in AI. I signed both. Of course, none of us believe that it is possible to pause AI research, but it was a way to draw some attention to the current situation. In an ideal world, I think it would be a good thing to slow down and do more research on some of the consequences, some of the dangers, and so forth. Unfortunately, I think the nature of capitalism and human nature are such that a slowdown is not possible. Actually, what you see is the opposite, some kind of acceleration. You see all the companies, large companies like Microsoft and Google—and the smaller ones like OpenAI—now in a crazy race to improve these large language models, to make them even better, to combine modalities, so now you're going to have systems that can handle visual input, text input, voice input, and so forth, and similarly for the output. They are going to become extremely powerful. We don't really understand how they work. We don't really understand the possibilities. We don't really understand what their impact is going to be on the labor markets, the financial markets, weapon systems, elections, and so on. So, it's definitely something to be worried about and to keep an eye on. It is useful to notice that human intelligence is not particularly safe either.
Our own history is filled with wars, genocides, increasingly more sophisticated weapons and methods of torture, and so forth. It is very instructive to look at all the things that nature and humans have done to try to control the nefarious aspects of human intelligence. The list is long and goes from good examples of behavior provided by parents and teachers, to constitutions and complex systems of laws and institutions for enforcing them. These are beginning to have parallels in AI, things like RLHF (reinforcement learning from human feedback) and constitutional AI. I could elaborate if you want me to. But another fundamental problem is that today universities cannot train and study the most advanced forms of AI, primarily because they do not have the necessary computing and staff resources. If we want to address this problem, I think there is only one way to do it. The way I have proposed is to create the largest computing and data center in the world, with permanent staff, and with thousands of affiliated academic laboratories (like CERN), working on studying the most advanced forms of AI, the corresponding safety issues, et cetera. This would be the largest scientific project ever undertaken by mankind, with a budget one or two orders of magnitude larger than CERN or JWST.
ZIERLER: If we go to a science fiction perspective, even beyond using AI as a dangerous tool, how far away do you think we are from the real long-term concerns that at a certain point AI is not working for us, that it develops its own consciousness, it develops its own capabilities of doing what it wants regardless of human needs and desires?
BALDI: It's very difficult to make predictions, and most of the time one is wrong regarding these predictions, but that's the existential concern of AI. Within the existential concern, there are different scenarios one can imagine, as well as scenarios we are not able to imagine yet. One obvious scenario is AI going rogue and starting to act directly against humans. For that to happen in the most obvious way, you probably need to have AI embodied into physical robots that move around in the world. Maybe we will decide not to allow that, not to have our most intelligent machines combined with robots that can act in the world. But it is also possible that there could be more subtle destructive effects of AI in terms of AI manipulating us, or AI reducing our ability to lead interesting lives. It's not clear that humans fare well in a world where they don't have to work at all. We don't know much about those questions and other more subtle scenarios. I won't predict a date, maybe it will not happen, but we definitely need to be very vigilant about these things.
ZIERLER: What about recent developments in quantum computing? Are you following that, and is that relevant for the kind of work that you do?
BALDI: I follow that but from a distance. I'm not an expert in quantum computing. It is potentially relevant because you could potentially build very powerful quantum computers that could do calculations that we cannot do on current computers. And they could be applied to AI. It is a very interesting area of research, but it's not my primary area.
Industry and the Imbalance of Computing Power
ZIERLER: Are you using artificial intelligence as a research tool? You mentioned the enormous amounts of data that you produce. Are you utilizing AI to sift through that data to find the signals?
BALDI: Yes, I do, but I have to highlight again another problem of modern times, which is that the computing power that we have in universities is ridiculous compared to the computing power that is available to large IT companies. It's quite unusual to have this situation where the best research or the latest types of AI systems are being developed by companies. It's not that universities don't know how to do it. It's just that they don't have the computing power. Some people think this is OK because AI is a technology, and like all technologies, it must transition from universities to the commercial world as it matures. After all, universities do not build airplanes, nuclear power plants, and so forth. My problem with this view is that AI is also a science, and a fundamental one in terms of defining our place in the universe. And AI is not mature enough.
ZIERLER: Would you include the national labs in this? Oak Ridge, Los Alamos? Is that true for them as well?
BALDI: Some of the national labs have supercomputers, so some of these places potentially could train large AI systems. However, as far as I know, they do not have supercomputers with hundreds of thousands of GPUs, which is the current scale being considered by the largest IT companies. You also need training data at scale, which is another significant problem. You also need hundreds of permanent staff members to run these operations. So my best guess is that currently no national laboratory and no cluster of universities is capable of training the most advanced, multi-modal, large language model AI system.
ZIERLER: Do you have industry partnerships that are valuable so that you have access to these computers?
BALDI: No, I don't.
ZIERLER: This is a source of frustration for you? You feel limited to some degree?
BALDI: Absolutely.
ZIERLER: Where could you go? What industries have the technology that would be wonderful for the kind of work that you do?
BALDI: Nvidia, currently the main company producing GPUs, would be an example. There is also availability of GPUs in the cloud through Amazon, Microsoft, or other companies, but the cost is significant. If you want to be able to work on, let's say, 10,000 GPUs, the cost is already too high for a single investigator or even a single typical university.
ZIERLER: I'm curious, for your students—for your graduate students, the postdocs that you work with—given the relevance in industry nowadays, are they more likely to go into industry than into academia and fundamental research? Is that a trend that you've seen?
BALDI: Oh, absolutely. In the past, maybe 20 years ago, half of my students went to industry and half went to academia. Now the ratio is very skewed towards industry.
ZIERLER: That also means that within industry, it is possible to do fundamental research, in other words to lead something of an academic life?
BALDI: Yes, to some extent in some of the large companies. But there is always the danger that corporate research could be negatively affected by corporate goals.
Focus on Genomics and Bioinformatics
ZIERLER: Let's move to some of your affiliations. Your main affiliation beyond your home academic department is IGB, the Institute for Genomics and Bioinformatics. The rest are courtesy appointments.
BALDI: Yes. I founded the Institute in 2000. That was the time when the human genome had just been sequenced, et cetera, so there was a rapid expansion of research in the areas driven by genomic technologies, gene sequencing, gene expression, and so forth. I founded that Institute and I have been running it for the past 20-plus years. But it's clear that a change is needed, so several years ago I proposed to the administration to create an AI institute. Administrations are not always very visionary so [laughs] the proposal was rejected. But finally we got the necessary approvals and thus we have a new AI in Science Institute at UCI.
ZIERLER: Was the Human Genome Project the catalyst for you to start thinking about biology, or do your interests in biology precede that period, precede the 1990s?
BALDI: My interest preceded that—I actually participated in some biological experiments as an undergraduate and while a student at Caltech. But in the early 1990s, when the Human Genome Project was just starting, you could see already that it was going to be a very significant area of research. In particular, bioinformatics, the application of algorithms, statistics, and AI/machine learning to biological data derived from sequencing technologies and molecular biology, was going to be a very important component. So, I started working on some of these problems in the late 1980s, early 1990s.
ZIERLER: For all of these divergent interests that you have, either by your education or by how you see the world, what do you consider your home discipline from which everything else flows? Are you a mathematician, fundamentally?
BALDI: I am not a pure mathematician. I view myself as a scientist, and when I see interesting questions that interest me, I try to go after them. Some of them may need mathematics, or computers, or both. I don't put myself in a box. I do like theory, and I like to think from first principles, so in that sense you could say that I am some kind of theoretical computer scientist.
ZIERLER: Let's now go back and establish some personal history. Let's go back to the University of Paris. What were you interested in as an undergraduate? What were the most exciting courses for you to take?
BALDI: I was interested in mathematics. I always liked the clarity and the beauty of mathematics, so I definitely took mathematics classes. I was already interested in the brain at the time, so I took some psychology classes too. I did some experiments and so forth. By and large, I was already interested in brains, artificial intelligence, those sorts of ideas—in a very confused way. I didn't have clear ideas at all. I was trying to find my way. I was struggling to find a way into those questions, yes.
ZIERLER: Did you achieve a dual degree in psychology and mathematics at the same time, or this was sequential?
BALDI: It was at the same time. In parallel, I obtained master's degrees in both disciplines.
ZIERLER: What was the value, do you think, especially in terms of thinking about the human brain, in having that focus on psychology?
BALDI: The value was moderate, I would think. It gave me some ideas, some sense, of what was known and what was not known and so forth, but by and large I didn't like very much the French education system at the time. One of my personal regrets is not to have moved to the United States earlier.
ZIERLER: Your sense was that the things that you were interested in, the education in the U.S. was stronger?
BALDI: Definitely, that's what I found out once I moved to the U.S. In France, the university system for me was very hierarchical. It was very difficult to interact with professors outside of the lectures. From my perspective at the time, the French culture seemed to overemphasize mathematics and literature. This is a little bit of a caricature but somehow, you had to become a mathematician or a poet. When you go to empirical sciences—physics, chemistry—it was considered [laughs] not as glamorous. Of course, there have been famous French physicists and chemists, Lavoisier and so forth, but mathematics and poetry were at the top of the pyramid. I found this almost dislike for empirical data, this desire to impose theories on the world without listening to what the world is telling you, a little bit disconcerting.
ZIERLER: I wonder if you can explain the University of Paris system, all of the different numbers and campuses that are associated. Where do you go for what?
BALDI: All these different campuses and these different universities have numbers attached to them. At the time, certain numbers were more specialized for certain areas. For instance, for psychology I went to Paris 10, which is Nanterre, which had a good psychology/sociology program. For mathematics and physics, I went to Paris 6 and 7, which is Jussieu. The French system also has a lot of other things like Grandes Écoles, those technical schools like École Normale Supérieure, Polytechnique, and so forth, which are sort of a parallel system of education, especially for engineering disciplines. I didn't know the system very well because actually I grew up in Italy. When I arrived in France, I was a little bit disoriented by the system.
ZIERLER: Did you grow up speaking French?
BALDI: I grew up speaking both French and Italian. My mother was French. My father was Italian. We lived in Rome, Italy. At home, we spoke both languages. I was raised in a bilingual family.
ZIERLER: Your mom must have chosen your first name?
BALDI: Definitely, and my dad gave me my last name. Exactly! [laughs]
ZIERLER: [laughs] What were the big ideas when you were in Paris as a student, in regard to computers? What could computers do? What were your interests in computers all those years ago?
BALDI: I did a little bit of programming—typically there was a single computer per campus, relatively large for the time. It was less powerful than your laptop today. We had to do programming with punch cards. You had a stack of cards where you had to punch one instruction per card. And then you had to carry this deck and put it into a machine that would send it to the computer for processing. If you flipped the order of the cards, if your deck of cards fell on the floor and you put them back together randomly, it wouldn't work at all. You would submit this stack of cards and then wait a couple of days to get the answer, usually in the form of a long printed sheet of paper. It was horrible. To tell you the truth, although I am a computer scientist, I don't particularly like programming. AI is finally making programming much easier.
From Paris to Pasadena
ZIERLER: Tell me about your interest in going for the PhD and your realization that it would be so important to come to the United States, perhaps even before you thought about Caltech.
BALDI: I didn't realize it would be important. I was just so disappointed, and a little bit depressed, with what I was doing that I thought I needed to do something else. I thought maybe it would be a good idea to apply to universities in the United States. I knew that a lot of good science was coming from there. That's sort of where I was coming from. What is striking to me when I think back to those times is the randomness of the events that changed my life. I think it's true for everybody; if you look at your life in the rear-view mirror, you see that there are a few events, especially people you meet by chance, that completely change your life--
ZIERLER: Yes.
BALDI: At the time, I knew of some famous American universities like Stanford, MIT, UC Berkeley, et cetera. Their names were known to me and to many other people in France, of course. I had applied to those. But then I met, by pure chance, the physics teacher of the American High School in Paris. I told him, "Look, I am applying to American universities, to Stanford, MIT,…" He told me, "You should apply to Caltech. They are the best in the world in astronomy." I had never heard the word "Caltech" before. I don't come from an academic family, so I didn't know the word "Caltech." And I didn't want to do astronomy, although astronomy is very beautiful. But because he told me that, I applied to Caltech. Just, why not? I don't know what it is, but why not?
Then I got accepted into a few places, including Caltech and Berkeley, and I didn't know what to choose. I didn't know Caltech. I had been to Berkeley on a previous summer travel so I knew a little bit about Berkeley. To try to make up my mind, I went to the American Embassy in Paris, and they had a room filled with the catalogs of all the American universities. I went through the catalogs. It's hard to tell from a catalog if you like a place or not. But there was a lady there, a librarian, who at the time seemed quite old to me, and I told her about my situation—"I like California, of course, it's the dream for everybody, and I've been accepted to Berkeley and Caltech, and where should I go?" Immediately, I still remember, she told me, "Oh, don't go to Caltech. It's in a very bad part of Los Angeles."
ZIERLER: [laughs]
BALDI: And I was—in my early twenties. This "very bad part of Los Angeles" really resonated with me. That's when on the spot I decided, "Okay, I'm going to Caltech." But it's purely by chance.
ZIERLER: You mean Berkeley?
BALDI: She said, "Caltech is in a very bad part of Los Angeles." So she advised me to go to Berkeley. But to me, a "very bad part of Los Angeles" sounded exciting.
ZIERLER: That was intriguing.
BALDI: It sounded exciting, exotic. You want to be in a very bad part of something! [laughs]
ZIERLER: You must have been so disappointed when you saw lovely Pasadena!
BALDI: Well, it's so lovely that, no, I was not disappointed.
ZIERLER: [laughs] Was it only mathematics programs that you were applying to? Did you feel like that was the best place for you to study? Were you looking at computer science programs also?
BALDI: I applied only to mathematics, because that's where my curriculum had been. Most of my classes, and I had a couple of letters of recommendation, were mathematics related. I felt the easiest for me would be to try to get into a mathematics program. Although I knew that I was not going to be a pure mathematician, and that my interests were more diverse and more connected to the real world.
ZIERLER: Tell me about arriving at Caltech. What was that like for you? What do you remember?
BALDI: I remember that overall it was incredible. It was very exciting. I wandered around the campus. I rapidly found that it was indeed one of the best places for science in the world. Within weeks I discovered the work of John Hopfield, who was doing neural networks and had developed the Hopfield model, exactly the kinds of things I am still doing today. An embryonic version of the field was being developed at Caltech; I stumbled on this immediately, and I knew that this was what I was going to do. There was also Leroy Hood, so I was also aware very early on of DNA sequencing, of its impact on biology, and of the possibility of sequencing the human genome. I made wonderful friends, and it was an exciting time.
The Joy of Caltech Flatness
ZIERLER: The hierarchical system that you emphasized in Paris, was it a breath of fresh air when you got to Caltech?
BALDI: It was [laughs]—it was a great breath of fresh air, yes.
ZIERLER: How did you go about choosing a thesis advisor? What was most interesting to you?
BALDI: I had a little bit of a problem there, because of course I was in the Mathematics Department, and my interests were a little bit more applied and so forth. I asked a few people, and finally my official advisor ended up being Richard Wilson, who was a combinatorist. I didn't work very closely with him. I didn't work on the problems that interested him, but he gave me a lot of freedom. One of his great qualities was to leave me completely free to do anything I wanted. Which I did. [laughs]
ZIERLER: Tell me about Wilson. What was he known for?
BALDI: Wilson is a combinatorist, so discrete mathematics, the kind of mathematics that is used in computer science. There is a strong tradition, or there was a strong tradition in combinatorics at Caltech, through Marshall Hall, Herb Ryser, Richard Wilson. Richard had solved some important combinatorial problems, so he was known for things like combinatorial designs and extremal set theory. Basically, the study of how to arrange objects in different ways and with different properties, and to estimate the number of such ways.
ZIERLER: Pierre, did you ever interact with engineers who were interested in biology? For example, Carver Mead started to become interested in neurology at that point.
BALDI: Of course I interacted with them! I took a course with Hopfield. I started working on the Hopfield model. Carver Mead was part of the course—it was co-taught by Hopfield and Carver Mead—so I started interacting with them. Then I started interacting with people in EE. For instance, Bob McEliece and Ed Posner were there, as well as Yaser Abu-Mostafa and Demetri Psaltis. Ed Posner, for instance, was on my thesis committee. I interacted with all those people. Even Feynman was teaching a computing class at the time and was interested in some of these issues.
ZIERLER: In speaking with Lee Hood, did you come to appreciate that he was developing technology that would make biology a big data discipline? Was that apparent to you even back then?
BALDI: Yes, it was very clear.
ZIERLER: Did you see the origins of the Human Genome Project intellectually, even as a graduate student?
BALDI: Yes. Technically the Human Genome Project started I think in 1990, and I graduated in 1986, but you could definitely see the beginning. In a sense, Caltech had an early start in both neural networks and bioinformatics. Unfortunately, through a series of mistakes, it squandered both.
ZIERLER: Were there computational advances, either by chronology or just by resources? Did you have access to better computers at Caltech?
BALDI: Not really. The Mathematics Department didn't have any significant computing facilities at the time. I saw the beginning of the personal desktop, the beginning of Microsoft. I saw that towards the end of my PhD. There were a few PCs coming into the Math Department. Barry Simon and Richard Wilson were very interested in these technologies. I think they worked with IBM or some other companies on this. But yeah, you could see the beginning. But at the time the Department of Electrical Engineering had better computing facilities than the Mathematics Department. I remember typing one of my first papers using LaTeX at Caltech.
ZIERLER: Not working closely with Wilson, not seeing yourself as a pure mathematician, did you ever think about changing departments or advisors, or as long as Wilson allowed you all the freedom you needed, it simply worked out?
BALDI: I did think about changing departments. I thought about having an advisor outside of mathematics. I did ask a couple of people, but it didn't work out.
Neural Networks and the Hopfield Model
ZIERLER: Tell me about developing your thesis. What did you work on?
BALDI: I definitely worked on neural networks, on the Hopfield model, on generalizations and extensions of the Hopfield model. That was a chunk of my thesis. I also worked on a more combinatorial problem that was given to me by Ed Posner in electrical engineering, which had to do with cellular radio, the precursor to our current cellular phone system. Basically, the problem could be translated into a particular kind of graph coloring problem.
ZIERLER: What were your conclusions? What did you see as your contributions?
BALDI: The contributions were technical. For instance, in the case of the Hopfield model, I showed that it could be generalized to neurons with polynomial activation functions instead of linear activation functions, and started defining and measuring the capacity of such networks. In the case of the cellular radio problem, I converted it into a graph coloring problem where the colors are radio frequencies, and I estimated the number of colors required under several different scenarios. I also proved a connection to the classical four-color problem for planar graphs. Those are some of the outcomes of that work.
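The frequency-assignment idea can be illustrated with a small greedy graph coloring, where colors stand in for radio frequencies. This is only a generic sketch of the technique, not the thesis construction; the interference graph below is invented for illustration.

```python
def greedy_coloring(adj):
    """Assign each cell the smallest color (frequency index) not already
    used by a neighboring, interfering cell."""
    colors = {}
    for node in adj:
        used = {colors[nb] for nb in adj[node] if nb in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

# Hypothetical interference graph: cells sharing an edge interfere,
# so they must be assigned different frequencies.
cells = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
freqs = greedy_coloring(cells)
```

Greedy coloring is not optimal in general, but it guarantees that no two interfering cells share a frequency and never needs more colors than the maximum degree plus one.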
ZIERLER: In talking to John or your own sense, what aspects of the Hopfield model were mature, and what were really still in building mode?
BALDI: The Hopfield model is a little bit of an outlier within the current field of AI. It is a beautiful mathematical structure, and it is also related to physics, to spin glasses—to statistical mechanics in physics. That was very important at the time. The Hopfield model turned out to be not particularly useful from a practical standpoint, but it was very useful in terms of our thinking about systems of neurons, what they can do, how they can behave, how they can converge to stable states. The model was used to try to understand associative memory, basically how you can start from partial bits of information and then recall what you want through a sequence of associations. That's the functional process that you are trying to capture with the Hopfield model. It opened up the connection of neural networks to statistical mechanics. There were also important connections to optimization. Associative memory, statistical mechanics, and optimization were connected by the fact that there is an energy function behind these Hopfield networks, and the dynamics of these networks tend to minimize this energy function. It was fairly mature, and a lot of people started working on this. At the time, Hopfield was also a member of Bell Labs, so many physicists, many scientists from Bell Labs started working on these neural networks. The first conferences in this area, such as the Snowbird and NIPS conferences, were started in the mid-1980s very much under Caltech and Bell Labs influence, I would say.
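The associative-memory and energy-minimization ideas can be made concrete with a minimal Hopfield-style network. This is an illustrative sketch only (the network size, the simple Hebbian storage rule, and the random pattern are arbitrary choices, not a reconstruction of Hopfield's original work): a pattern is stored in the weights, and a corrupted partial cue is recalled by asynchronous updates that slide downhill in energy.

```python
import numpy as np

def hebbian_weights(patterns):
    """Symmetric weights from outer products of stored patterns;
    the zero diagonal keeps the dynamics well-behaved."""
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0.0)
    return W / len(patterns)

def energy(W, s):
    """The Hopfield energy; asynchronous updates never increase it."""
    return -0.5 * s @ W @ s

def recall(W, s, sweeps=10):
    """Update one unit at a time until the state settles into a
    stored (or spurious) fixed point of the dynamics."""
    s = s.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

rng = np.random.default_rng(0)
memory = rng.choice([-1, 1], size=32)    # one stored +/-1 pattern
W = hebbian_weights(memory[None, :])

cue = memory.copy()
cue[:5] *= -1                            # corrupt 5 of the 32 bits
restored = recall(W, cue)
```

Starting from the partial cue, the network recovers the stored pattern, and the energy of the restored state is lower than that of the corrupted one, which is exactly the associative-memory behavior described above.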
ZIERLER: In the mid 1980s, at Caltech or beyond, were the terms "artificial intelligence" and "machine learning" already in use?
BALDI: Artificial intelligence, definitely; it's an old term. Machine learning existed, but I don't think it was a very common term in the early to mid-1980s.
A Combinatorial Connection
ZIERLER: When you defended, when you finished up at Caltech, were you determined to stay in the United States? Was your experience here such that you wanted to make a research life for yourself in the United States?
BALDI: Without any doubt. The United States was a much better place to do science. Another reason I came to the United States was to avoid the military draft in France. At the time, there was a military draft and basically, I had run away from the draft by coming to Caltech. So I was essentially in an illegal situation, in the sense that the French consulate wouldn't renew my passport. I remained without a valid passport and could not go outside of the U.S. for many years. But more importantly, the level of the science and the enthusiasm, the creativity that I found at Caltech and in the United States, were so great that I did not want to go anywhere else. Then, I had another moment of incredible luck. At that time, in the mid 1980s, there were two places that were really the center of neural networks research. Caltech was one of them, but the other one was UC San Diego, where you had the PDP group led by David Rumelhart with many other people such as Geoff Hinton, Terry Sejnowski, Francis Crick, Michael Jordan, and so on. Somehow, I was hoping to be able to go to UCSD to continue working on neural networks from a different perspective. As a graduate student, I used to do some odd jobs here and there, and so I ended up doing some bartending. One evening, I was bartending at the Athenaeum for a Caltech alumni reunion, and a gentleman came to see me. I fixed a drink for him, and he asked me what I was doing. I said I was doing a PhD in combinatorics. He said, "That's funny, I'm also a professor of mathematics, and I am doing combinatorics research at UCSD." After a brief conversation, he said: "Why don't you come to UCSD for a postdoc?" This was Gill Williamson, Professor of Mathematics at UCSD, now emeritus, and a Caltech alumnus. So, basically, he gave me my first job on the spot. And my first job was exactly where I wanted to be, purely by a chance meeting! I was fortunate to interact with the PDP group at UCSD from 1986 to 1988.
Learning by gradient descent in neural networks, which is the dominant algorithm of AI today, can be traced back to the PDP group.
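As a minimal, self-contained illustration of learning by gradient descent (a toy linear fit with invented data, not code from the PDP group): the parameters of a model are repeatedly nudged against the gradient of the error, which is the same principle that trains today's neural networks.

```python
# Fit y = w*x + b by gradient descent on the mean squared error.
# (Invented toy data; the true relationship is y = 2x + 1.)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    # Gradients of mean((w*x + b - y)^2) with respect to w and b.
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * gw
    b -= lr * gb
```

After enough steps the parameters converge to the values that generated the data; back-propagation is the same idea applied layer by layer through a deep network.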
ZIERLER: Tell me about Williamson and the history of how UC San Diego became a center in this research.
BALDI: Gill Williamson, again, was a combinatorist, a mathematician. He didn't work at all in these areas. He invited me to go there. But my understanding was that at UCSD, there was a forward-looking cognitive science department, and David Rumelhart was one of the key professors there. He had teamed up with J. McClelland at CMU, and together they had created this PDP group—PDP stands for parallel distributed processing—with the basic idea of having networks of little units connected and talking to each other, and creating organized behavior through those interactions, mediated by adjustable connections. Again, the idea of neural networks. The group had quite a few members, and many of the members became quite famous; people like Geoff Hinton, Terry Sejnowski, and Michael Jordan were part of the PDP group. Francis Crick—of Watson and Crick DNA fame—was also hanging out with the PDP group at the time. He was at the Salk Institute. He had a joint appointment with UCSD and came almost every week to the meetings of the PDP group. I attended those meetings and started to work on neural networks and back-propagation right there. I also worked with Walter Heiligenberg, who was a neuroethologist at UCSD working on electric fish. Unfortunately, both Heiligenberg and Posner died in transportation accidents a few years later. And Rumelhart suffered from dementia, most likely Pick's disease, shortly after he moved to Stanford.
ZIERLER: What was your appointment? Was this an assistant professorship? Were you a postdoc?
BALDI: I think the official title was Visiting Lecturer. I was a lecturer, so it was not a tenured position. It was like a postdoc. I didn't lecture very much. I did some lecturing in calculus, but for the most part I was able to focus on research. With Heiligenberg, we showed how arrays of bell-shaped sensors could produce linear responses and explain hyperacuity phenomena. With Kurt Hornik, we proved one of the first theorems connecting neural networks to statistics (Principal Component Analysis) and showed that fully connected linear networks have no spurious local minima. Part of the success of gradient descent learning in high-dimensional space is connected to the absence, or rarity, of spurious local minima.
ZIERLER: Was there collaboration at UCSD with Caltech, or was it sort of its own world?
BALDI: Own world. By and large, it was its own world. At the first machine learning conferences of course, there were participants from both campuses. That's one place where there was some intersection, but by and large, they remained fairly separate.
ZIERLER: What new ideas did you experience at UCSD?
BALDI: The key idea was learning by gradient descent, or back-propagation, which is the algorithm that ultimately succeeded in building intelligent machines. That's one of the key ideas. Another beautiful idea that I experienced is the idea of self-supervised learning, in the form of autoencoders. In machine learning, one standard paradigm is called supervised learning. In supervised learning, you have data and you have targets for the data, so you know what is the right answer for a given input. Let me give you an example. Let's say you are doing biomedical imaging, you are studying cancer in the liver, you have images of the liver, some with cancer and some without, so you have 0/1 labels: one tells you that cancer is present in the image, and zero tells you that there is no cancer. The label or the target is provided with the data. Usually, it is expensive to acquire those targets. It requires human experts to label the images. But it leads to supervised learning. You have data, you have inputs, and you know what the right output should be for those inputs. Supervised learning uses gradient descent to reduce the error that a neural network makes on those outputs. That is fine, but again, it's expensive, and people say, in many cases, "You have data but you don't have labels; what are you going to do in those cases? How can you use very large amounts of data without labels? What can you do from the data itself?"
That's where self-supervised learning comes into play. Let me give you examples of self-supervised learning. When you take language, you take text, you could imagine a machine that, given a piece of text, tries to predict the next word. You don't need labels, you don't need somebody to tell you what the next word is. The information is directly available from the data itself. Another example is with images. You can occlude pieces of images and train a system that can recover the full images with the missing pieces. Again, the labels are directly available from the original data itself. Another example, the one that became important for me when I was at UCSD, is what are called autoencoders. In an autoencoder, you have data that comes in—it can be any kind of data (text, images, audio)—and then you have a neural network that takes this data and feeds it into a small bottleneck layer—that's called the encoder network—and then you have a decoder network that tries to reproduce the input data from the bottleneck representation. So basically, you have a network that gets data in its input and tries to produce the same data in its output, but it has to go through this bottleneck. At first sight, this looks stupid. You already have the data; why are you trying to reproduce the data in the output? In the neural network output, you are going to produce a distorted version of the data. There is no gain. However, the brilliant idea is that what is important is not the distorted output that you get; it's what happens in the bottleneck, where the network is forced to learn how to optimally compress the data in order to reconstruct the data itself. Being able to compress the data is essentially equivalent to understanding the data, because you have to remove the noise and keep what is essential.
This idea that you can learn how to do compression using this technique of auto-encoding the data through a bottleneck layer is another example of self-supervised learning, or learning without external targets. I started to work on back-propagation and autoencoders while at UCSD, and this work continues to this day.
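A minimal linear autoencoder trained by gradient descent can be sketched as follows. All dimensions, the step size, and the synthetic data are arbitrary illustrative choices; with linear units, the optimal bottleneck recovers the principal subspace of the data, in line with the PCA connection mentioned earlier.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data lying near a 2-D subspace of an 8-D space, plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 8))

d, k = 8, 2                             # input width, bottleneck width
W_enc = 0.1 * rng.normal(size=(d, k))   # encoder: data -> bottleneck
W_dec = 0.1 * rng.normal(size=(k, d))   # decoder: bottleneck -> data

def reconstruction_error(W_enc, W_dec):
    residual = X @ W_enc @ W_dec - X    # output minus input
    return (residual ** 2).mean()

lr = 0.01
initial_error = reconstruction_error(W_enc, W_dec)
for _ in range(500):
    codes = X @ W_enc                   # compressed bottleneck codes
    residual = codes @ W_dec - X
    # Gradient steps that shrink the reconstruction error.
    W_dec -= lr * codes.T @ residual / len(X)
    W_enc -= lr * X.T @ (residual @ W_dec.T) / len(X)
final_error = reconstruction_error(W_enc, W_dec)
```

The targets here are the inputs themselves, so no external labels are needed; forcing the data through the two-unit bottleneck is what makes the network learn a compressed representation.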
Dynamical Systems at JPL
ZIERLER: Tell me about returning to Pasadena, your appointment at JPL, and your appointment in Biology at Caltech.
BALDI: To be frank, I wanted to be in a university. That was my goal, to be free to do research. I did apply to a few places, but that didn't work out at the time. Actually, neural networks were viewed in a very negative way by the majority of the community. Traditional computer scientists didn't like neural networks at all. They were looked at as a very poor, unprincipled approach for solving problems. I experienced a lot of rejection, but I applied to JPL, and that was one place that accepted me. I did have a postdoc offer from MIT, but with the passport problem I mentioned to you, I wanted to avoid short-term positions. In short, I ended up at JPL, but I knew that it was not the ideal match for my interests.
ZIERLER: What was the group at JPL?
BALDI: It was a newly founded group that was led by Jacob Barhen. I don't remember the exact name of the group; I think dynamical systems was in it. We were able to do neural network research, but the problem was the connection to NASA. We didn't have a good connection to NASA. We basically were on soft money, and we had a technology that was not directly applicable to NASA at the time because the computers were not powerful enough. So, it was a little bit of an odd group within the big JPL organization, and the even bigger NASA organization.
ZIERLER: Was neural networks relevant to JPL's core missions in planetary science and Earth science?
BALDI: A little bit, but very marginally. There were attempts to do image processing using neural networks, convolutional neural networks, but the computing power was not sufficient, really, to make a mark. It was not clear that you could develop useful technology, at the time, in the late 1980s, that could be useful for NASA missions. I remember I had started a line of work in bioinformatics, applying machine learning methods (hidden Markov models) to biological sequence data. JPL was generous to give me the Lew Allen Prize for some of that work. I asked Ed Stone, who was JPL's Director at the time, if there was room for biology at JPL and he candidly answered no.
ZIERLER: Tell me about the value of your affiliation with Biology at Caltech during this period.
BALDI: I had kept my Caltech connections, and I was participating in various activities on campus, attending talks, and so forth. I had some kind of courtesy appointment in Biology, and that allowed me to submit a couple of grants. But the Caltech administration did not like that a significant fraction of my JPL salary was covered by my Caltech grants. So, I was forced to move to Caltech full-time in a member of the professional staff position that I did not like. In addition, by the mid-90s, there was essentially no research in AI/deep learning or bioinformatics at Caltech, and so I left.
A New Era of Big Data Biology
ZIERLER: This was at the time when Lee Hood left Caltech. I wonder if you followed those developments and more broadly what that meant about Caltech's relationship to so-called small science versus big science in biology?
BALDI: Yes. Of course I was aware that Lee Hood left Caltech. I was told that it was because his operation had become too big. I don't know anything about the details, but that's the explanation I was given—too big for Caltech. He moved to Seattle, of course. What was the other part of your question?
ZIERLER: Just if it registered with you that what Lee was doing was ushering in a new era of biology as a big data enterprise, and perhaps Caltech at that point was not ready for that?
BALDI: Yes, definitely. By the early 1990s, you could see the beginning of the Human Genome Project. You could see the beginning of large databases of DNA sequences and protein sequences, and the need for statistical and algorithmic methods to handle and analyze these data. All those things were starting and moving rapidly in the early 1990s. But little of that research was being carried out at Caltech. Among other errors, Caltech overemphasized neuroscience. Of course, neuroscience is very important and interesting, and it has made considerable progress. But the progress and impact made by bioinformatics and AI in the last three decades dwarf those made by neuroscience. Today, we have machines that more or less pass the Turing test, but we still do not know how we store our telephone numbers in our brains.
ZIERLER: This would include astrobiology, presumably? No one was thinking about astrobiology back then?
BALDI: Yes, that's probably the case. You can think about problems of origin of life, and these of course are very interesting problems, but I don't think they were part of JPL's core mission. JPL's core mission is not on the side of biology.
ZIERLER: When you leave Caltech and JPL, is that to begin your faculty position at UC Irvine, or does something come in between?
BALDI: In between, I headed a neural network startup called Net-ID—we had developed technology, for instance, for recognizing fingerprints. We were also using other related machine learning approaches, in particular hidden Markov models, for biological sequence analysis problems. But the timing was not good; we were ahead of our time and did not have sufficient computing power to move rapidly. I moved to UCI in 1999.
ZIERLER: Tell me about the opportunity at UCI. Why was it attractive to you?
BALDI: Location-wise, it was not far from Pasadena, Los Angeles, and San Diego, but with much less congestion and pollution, close to the beach, et cetera. More importantly, UCI was and still is a young campus within the UC system, much younger than UC Berkeley or UCLA. So, there was a kind of youthful energy and a sense of expansion, with new departments being created and new buildings being constructed continuously from 1999 to today. The joke, even today, is that UCI stands for "Under Construction Indefinitely." Finally, I noticed that interdisciplinary collaborations were easy to establish at UCI. There were no rigid boundaries between departments or between schools. You could easily start new interdisciplinary collaborations, submit joint grants, and so forth. The circular topology of the campus around a park (another chance event, by the way) is the ideal topology for any large university which emphasizes interdisciplinarity—any department is within short walking distance of any other department, by cutting through the park. This has been very helpful in developing collaborations with scientists from other fields, especially in the pre-video-conferencing era. In short, the youthful energy and the ease of establishing new interdisciplinary collaborations are the two main features that attracted me to UCI.
ZIERLER: There was a culture of what we now call interdisciplinarity at UCI.
BALDI: Exactly.
ZIERLER: What was your home department? Where did you start at UCI?
BALDI: I was hired by Computer Science. At the time it was a self-standing department. We have now become a school with three different departments: Computer Science, Informatics, and Statistics. It is actually not very common to have statistics and computer science together in the same school and now in the same building. But I think it is the right vision for obvious reasons. And then, within a year or so, I was able to create the Institute for Genomics and Bioinformatics.
ZIERLER: Given all of your experience—you didn't start in a faculty position straight out of graduate school—did you come in with tenure, or did you have an accelerated tenure clock?
BALDI: Yes, I was hired with tenure.
ZIERLER: Tell me about starting the Institute for Genomics and Bioinformatics. What was available to you? What did you have to work with?
BALDI: I didn't know anything about these institutes, but senior faculty told me that within the University of California system, we have something called organized research units, ORUs, which is a mechanism whereby any group of faculty can submit a proposal to the administration, and if the proposal is accepted, the group of faculty can create one of these interdisciplinary institutes, which do not report to the departments or the schools. Instead, they report directly to the central administration, the Office of Research to be precise. These ORUs receive a modest amount of funding from the central administration, and this allows them to bootstrap their various activities. They are reviewed every five years, with typically a sunset clause after fifteen years or so. I wrote a proposal, and it was accepted, and that's how the Institute started.
Interdisciplinarity at UC Irvine
ZIERLER: What was the founding mission? What did you hope to accomplish in building the Institute?
BALDI: The main mission was to foster interdisciplinary research at the intersection of the life and computational sciences on the UCI campus.
ZIERLER: At Caltech, you were saying earlier, you didn't have access to such great computers. By the late 1990s, had the Rubicon been crossed? Were computers now powerful enough to pursue these questions in biology?
BALDI: The progress was incremental. Of course there were more powerful computers. There was enough computing power to do a number of very interesting things. Definitely enough computing power and storage to begin to analyze all the sequence data that was coming from the sequencing projects, beyond the Human Genome Project—sequencing of all the model organisms, sequencing multiple human genomes, and so forth. In addition, in the late 1990s, early 2000s, other high-throughput omic technologies were becoming available, for instance high-throughput gene expression technologies, in the form of microarrays (and now RNA sequencing), providing the ability to measure the level of expression of all the genes in a given preparation. Today we can do it even at the single-cell level using single-cell RNA-seq technology. Thus, there was an explosion of biological data, both in terms of publicly available data in public repositories and in terms of data produced by laboratories at UCI in both the School of Biological Sciences and the School of Medicine. You could play with all these data, and the computing power that we had, or that we acquired over time, was sufficient to perform interesting analyses and discover novel biology. However, the computing power at the time was not sufficient to have a successful application of neural networks to the game of Go, for instance, or to build something like GPT-4, or to analyze a lot of large images from some large-scale imaging project. We did try, though—for instance, we published a paper on using neural networks to play the game of Go, but we couldn't really solve it with the computing power available at the time. The same is true for the protein folding problem. We developed neural network methods to solve the protein folding problem, but we did not have enough computing power to scale them up. That had to wait another 10 years, with the advent of GPUs, to really take off.
ZIERLER: I want to ask about the term "bioinformatics." I know perhaps you're familiar with this—at Caltech, there was the launch of astroinformatics, of astronomy utilizing AI because of the enormous amounts of data that were coming in from the night sky surveys, the advent of CCD detectors, and things like this. Was bioinformatics taking a cue from astroinformatics, were you talking with astronomers, or this is happening in tandem?
BALDI: It's happening in tandem; I would even say bioinformatics predates astroinformatics.
ZIERLER: It does predate astroinformatics?
BALDI: I would say so, yes.
ZIERLER: How far back does it go? How far back is the idea that biology needs AI in order to understand and sift through all of the data?
BALDI: You could trace it to different origins but definitely to the Human Genome Project. With the Human Genome Project, you saw immediately that you needed large storage facilities for all the sequencing data that was becoming available and then you needed computers, statistics, machine learning to analyze the data. Because DNA is very gibberish, right? Genomes yield very long sequences of A, C, G, Ts, that are very boring and cryptic for the human eye. You really need computers to align the DNA of different organisms to see what they have in common, to detect signals, et cetera. Alignment algorithms, for instance, became very important. Besides sequences, you could also compare gene expression. We developed new statistical methods for gene expression comparison (or differential analysis).
The First Draft of the Human Genome Project
ZIERLER: What was the status of the Human Genome Project by the time the Institute got started? Where were things with the Human Genome Project circa 2001?
BALDI: The first draft was completed. I don't know the exact date—in a sense, even today people are still arguing whether we know the complete sequence or not, because there are repetitive regions and so forth. By that time, the first version, the first complete draft of the human genome was available, as well as that of all the model organisms, whether it's E. coli or C. elegans, or drosophila, or the mouse genome. All those genomes had been sequenced or were going to be sequenced very soon.
ZIERLER: What was the Institute's motivation with the Human Genome Project? Was all of the data available now to be analyzed? Was the data already analyzed at that point?
BALDI: It was partially analyzed, but analysis continues today, right? There was analysis going on for genomic data from the Human Genome Project as well as other genome sequencing projects. But there was a rapid increase in all kinds of other sequence data such as RNA sequences, protein sequences, et cetera. As I mentioned, in addition there was also gene expression data using microarrays. UCI had a microarray facility which later became also a sequencing facility.
ZIERLER: What aspect of the gene expression research was fundamental, and when did you or colleagues at the Institute really start thinking about translational research, even clinical applications?
BALDI: There were several members of the Institute who were interested in translational efforts from the beginning and several startup companies were founded by members of the Institute over the years. Gene expression can be used, for instance, in drug discovery projects both to identify targets and study the effect of compounds. Another thing that became very important over time at least for my own research was the study of circadian rhythms. I realized how important circadian rhythms are for biology. The last 10 years, for instance, my group has spent a lot of time working on gene expression in particular in connection to circadian rhythms in biology.
ZIERLER: What is the connection between circadian rhythms and gene expression?
BALDI: Circadian rhythms for most people are some kind of strange phenomenon that happens when you travel. You have jet lag. You notice that you're a little bit out of whack. They are viewed as a curiosity of biology, some kind of rare, odd phenomenon. Instead, it turns out that circadian rhythms are absolutely fundamental and ubiquitous. Every cell in your body, every cell of every organism, is essentially oscillating with a 24-hour rhythm, and by that I mean that certain genes are expressed and certain genes are turned down in a periodic fashion with a period of 24 hours. Not all of your genes, but a good fraction—10 percent of your genes in any one of your cells—is basically oscillating with this remarkable rhythm. This rhythm is absolutely fundamental for biology. Every aspect of biology, every function in biology, is touched by circadian oscillations.
These rhythms go back to the origin of life. Some of the first organisms, like cyanobacteria, were using photosynthesis to drive their metabolism and all their physiology, and if you are photosynthetic, by definition you are circadian. Why? Because you get your energy from the Sun. You are active when the Sun is up, and when the Sun goes down, you have to shut down and go into a rest mode, and then you become active again when the Sun comes back. All of your circuitry, your molecular circuitry, is tuned to this rhythm of shutting down and turning on every 24 hours. And we descend from those organisms. Furthermore, if you look at the world around you, the circadian rhythm is the only thing that is very predictable and that you can be sure of. You have no idea who is going to be the president of the United States tomorrow, but you can be pretty sure that tomorrow morning, the Sun will be up, right? And if you think of evolution, and compare day and night, the world is very different in terms of temperature, winds, predators, et cetera. Thus, evolution had to pay great attention to this rhythm, and this is why it is deeply etched into the core of all biological machinery. It's just amazing. It's a fundamental aspect of living systems, and so you can interrogate these oscillations using DNA microarrays and now what is called RNA-seq technologies that look at gene expression. We have been looking at which genes are oscillating in different organisms, in different tissues, under different conditions. If you change your diet, you will see changes in these oscillations in your liver, for instance. Any change in your environment, any perturbation, will change your circadian oscillations. It's a very pervasive phenomenon. And of course, on the translational side, you can think about monitoring these oscillations to measure health, diagnose diseases, and identify optimal times for therapeutic interventions.
A Research Convergence
ZIERLER: Tell me about the advent of the Machine Learning Institute. What were you doing that required a new organization, a new research group, dedicated more explicitly to machine learning, circa 2005, 2006?
BALDI: It was clear to some of us that machine learning/AI was essentially the most important area of computer science. Data was growing exponentially, and machine learning methods were needed to analyze the data, data of all kinds, text, audio, images, videos, and so forth. My colleagues and I thought it would be a good thing to have such a center. UCI actually had a tradition in this field going even further back because we have the so-called UCI Machine Learning Repository, which is a large public repository of very different datasets that can be used to test and compare different machine learning approaches. In the short history of machine learning, this Machine Learning Repository has played a useful role. We also hosted the first Southern California machine learning conference. These were some of the components that went into the creation of the center.
ZIERLER: In the title, Center for Machine Learning and Intelligent Systems, is that to suggest that you're thinking about intelligence both in an artificial sense and in a biological sense? Is that baked into the program?
BALDI: That Center in particular is mostly on the artificial side, but yes, it was definitely part of the overall concept. In fact, it was one of my roles to keep the connection to the neurobiology side of things alive.
ZIERLER: In your answer to my very first question about what motivates your interests, at this point this is really a convergence for you of having these two institutes and their representation of your fundamental interests in artificial and biological intelligence.
BALDI: Absolutely.
ZIERLER: How did your research change or how did it improve, either with colleagues or attracting graduate students, as a result of formulating not one but both of these research centers at UCI?
BALDI: It has definitely helped to attract new, talented graduate students, new faculty to our programs. When we interview faculty and they come to the campus, they are definitely introduced to these entities, one way or the other. We have organized symposia, conferences over the years, where people from different departments on the UCI campus come together. We just had a symposium last October with the neuroscientists at UCI together with the AI folks, a joint symposium precisely to foster this two-way dialogue between neuroscience and AI. So, yes, they have been helpful in lubricating those kinds of interactions. And, of course there have been joint grants and joint research projects over the years and so on and so forth.
ZIERLER: You mentioned earlier that the Hopfield model was something of an outlier. I wonder if you could explain that a little more. In what ways was it an outlier, and how did you think about that now having these two research centers at UCI?
BALDI: It's a good question. The Hopfield model was developed with the idea of understanding associative memory. The initial functional focus was somewhat narrow—associative memory—whereas the other efforts from the PDP group were broader in terms of functions. They were trying to apply neural networks to language, to images, to all kinds of different things. The Hopfield model was concentrated on associative memory. In later years there was an attempt to apply a version of the Hopfield model to optimization problems—how neurons, by interacting with each other, could try to find the optimum of some problem, such as the traveling salesman problem. But that didn't pan out, and not just because of computing power. Even today, I would say that almost nobody is using that sort of approach to solve optimization problems. The Hopfield model came with elegant mathematics and a strong connection to statistical physics. But this came from having neurons connected in a symmetric fashion—the strength of the connection from neuron i to neuron j must be equal to the strength of the connection from neuron j to neuron i—which is not very realistic from a biological standpoint, and it was not very practical. In that sense, it was an outlier. The Hopfield model falls in the category of shallow learning. Its deep learning generalization is called the Boltzmann machine. But even Boltzmann machines are not very practical and have been superseded by other methods. They are not what is used to build, say, GPT-4. In short, the Hopfield model was an outlier, both structurally and functionally, but it had beautiful mathematics and a strong connection to statistical mechanics, and those were influential at the time.
ZIERLER: Where is evolution in all of this for you, as an intellectual framework, as a way of thinking about evolution, as a concept relevant both in biological and synthetic systems?
BALDI: Evolution, of course, is fundamental to biology. There is a famous quote saying that nothing makes sense in biology except in light of evolution. People have tried to use evolutionary principles on the engineering side of things, the so-called evolutionary or genetic algorithms. This has led to some interesting results here and there, but overall evolution is too slow. So, the majority of scientists today do not use evolutionary algorithms in AI. There are a few interesting papers on evolutionary algorithms, but by and large, gradient descent has been much more powerful for building AI systems rapidly, as compared to the very slow tinkering associated with evolutionary approaches. Currently AI operates in very different regimes compared to the human brain. For instance, it uses much more power and can be trained on all the written text ever produced by mankind. No single human can read, let alone learn from, these data. Nevertheless, it is still useful to think about evolution in the context of AI. For instance, how did evolution handle the problem that human intelligence is not safe at all?
Overriding Evolution
ZIERLER: Of course you wrote a whole book on these concepts. Is the idea of The Shattered Self: The End of Natural Evolution that evolution is simply too slow to be useful in a synthetic context?
BALDI: Yes, but that is not the main idea of the book. The main idea is that through our technologies, essentially biotechnology and AI, we are becoming capable of overriding evolution in some sense, and this leads us to question who we really are and what kind of future is ahead of us. We can build powerful systems, biological and artificial, in a much better and much faster way than evolution can do. In that sense, it's the end of evolution.
ZIERLER: Is CRISPR and the concept of gene editing sort of the apotheosis of these ideas in terms of what is available to us now?
BALDI: Yes. It is the apotheosis in one direction, basically it gives us the ability to edit a genome as if it was a Word file. We can go anywhere in the file and change whatever letters we have. We can delete text, insert new text. CRISPR is basically that. It's taking the genome and making it look like a Word file where you can do almost anything you want.
ZIERLER: We talked earlier about guardrails as they relate to artificial intelligence and machine learning. What about guardrails on the ethical considerations surrounding gene editing and the concept of making so-called designer babies? What are your thoughts in that regard?
BALDI: I definitely think we need guardrails, because we know how to do things technologically, but we don't know all the consequences. If you start manipulating a genome and inserting something somewhere, you don't always know, for sure, what will happen, what will be the interaction of, say, the corresponding proteins with the rest of the proteins that you produce and so forth. We still don't understand all the details of biology at the molecular level. When you are starting to play with life and you are not sure of the results you can obtain, it is reasonable to be cautious and to have guardrails. Human cloning, for instance, is still not allowed today.
ZIERLER: Is there a biosecurity element to this as well? In other words in the 1970s with the recombinant DNA revolution and the resulting Asilomar Conference because of the concerns about what if there is a leak out of a lab or something like this, is there a concern that with gene editing and with decoupling ourselves from evolution, that we might create pathogens or other problems, from creating life in a way that doesn't rely on evolution and its slowness?
BALDI: It is definitely a concern. The COVID pandemic had a little bit of that. Some thought that COVID had been created in a laboratory, and there is still some debate about that. But definitely one could use CRISPR and related technologies to try to build new strains of viruses that we have never seen before, which could be very dangerous. It is technically conceivable.
ZIERLER: Let's move our conversation closer to the present. You mentioned COVID. Did you get involved at all in COVID research? Was there an opportunity to apply machine learning and bioinformatics to the pandemic?
BALDI: There were some opportunities, but I didn't—I passed on them.
ZIERLER: What happened to your lab? Were you okay in terms of being able to work remotely? Did you have to shut things down?
BALDI: I was okay because my lab is not a wet lab. We're just a bunch of computer scientists working with computers. It was actually very easy to work remotely using Zoom, for instance, so it didn't affect me very much in terms of research. The wet labs were affected much more than I was. We learned how to use Zoom for most of our meetings, and as you can see, some of that has remained with us and will remain; we won't go back to the previous system.
ZIERLER: You mentioned of course we still are not sure about the lab leak versus the wet market origins of the virus. Why is this such a difficult thing to understand? What is your sense of the mystery behind the origins of SARS-CoV-2?
BALDI: It's not something I have looked at very closely, so I wouldn't be able to tell you much more than what you have read in the general press. But it's not trivial to be able to track the source of a virus. Once it starts spreading, it's very difficult. You can say geographically, roughly what is the area, but what was the first version of that virus is difficult to track down. I think the current general consensus among scientists is that it first started in animals.
ZIERLER: In this time, of course, you write Deep Learning in Science, coming out in 2021. Was the sense, the timing there, that the concept of deep learning in science had reached a level of maturity where you could treat it as a survey in a monograph?
BALDI: Yes, and actually I wanted to publish the book earlier, but I was just too busy to write it. Ideally I should have published it earlier. Because of the pandemic, I had more time to write, so I was able to publish it during the pandemic. But GPT-4 came out in 2022. So now I have to write a second edition, because I want to incorporate all the new stuff associated with the more recent versions of large language models.
Deep Learning for all of Science
ZIERLER: I wonder what aspects of writing the book were you talking to a younger generation that needs to appreciate just how important deep learning is in science, if there's an educational aspect to the book? For students starting their careers, thinking about how deep learning can affect the kind of science that they can do, was that a message that you wanted to convey?
BALDI: Definitely, and I wanted to convey that one way to look at it is that deep learning is a powerful technology that has applications everywhere in science. In the book, I give many examples of applications in physics, chemistry, and biology, to show how widely applicable deep learning is in science. The title Deep Learning in Science has two meanings. One is the application of deep learning to science, to the sciences, the natural sciences; but the other one is the scientific study of deep learning. Still today, you may hear some people complaining and saying things like "Deep learning is just a bunch of recipes. Nobody understands what's going on in these networks. It's a black box approach. There is no theory." Et cetera. And that is wrong. There is theory. I have been working on the theory for decades, and there is a growing body of theory. It's just that it's not well known, and it's not always practical. It doesn't always give you answers that you can use in practice. But in terms of understanding the big picture, I think we have a growing body of very interesting theory.
We Have Passed the Turing Test
ZIERLER: You mentioned of course the book already needs an update because of GPT-4. What was so revolutionary about GPT-4 that requires this update?
BALDI: It passed the Turing test. That's the bottom line for me. It's incredibly smart, and it passed the Turing test. Another very interesting thing about the first version of GPT-4 is that it was trained on text alone. For a long time, many people thought that in order to have intelligence, you need to have senses, you need to be able to deal with visual data, audio data, smell, and so forth, and you may even need to have a robot that interacts directly with the world in order to be grounded. GPT-4 essentially shows that this may not be so necessary. If you read all the books ever written by mankind, you can become pretty smart about the world, without having seen images. In a sense, it's not too surprising if you think about Helen Keller. I don't know if you know her story, but she lived roughly a hundred years ago, and she had an infection as a young infant and became deaf and blind. She interacted with her caregivers through touch, and so she had a very narrow channel of communication with the outside world. To make a long story short, she graduated from Radcliffe College, Harvard's sister school, and became a well-known author. So, having a narrow channel is not the end of intelligence. You can still become very intelligent from text alone. Of course now, we have multimodal large language models that are trained on multimodal data: text, audio, images, and video. And we have humanoid robots with GPT-4 intelligence that can interact with the world.
There was one thing I forgot to mention about the pandemic. We started using Zoom in classes. For many classes it's not great and we still want to go back to in-person classes, but I noticed that there are some classes where Zoom is actually better, and the example I wanted to give you is exactly a class that I am teaching—well, not really teaching, but [laughs]—a class that I have introduced, looking at the challenges and opportunities of AI in society, the existential problems, the problems we were discussing. Basically it is a discussion class. If you are in an auditorium, it is not a very good setting for a discussion because you have a very non-symmetric situation where you have the professor standing in front of all the other students. And students only see the back of the other students. What you really want is a circular layout, where all the participants are sitting around a circular table and are equal. At UCI, we don't have many classrooms with such a layout. However, this is the default layout you get in Zoom. I thought that was an interesting thing that came out of the pandemic that directly impacted some of our classes.
ZIERLER: Pierre, we'll bring the conversation right up to the present. The idea that GPT-4, ChatGPT-4 passes the Turing test, do we then need a new threshold? For 75 years, we've held up the Turing test as this Rubicon that we have not yet crossed. If GPT-4 does that, what's the next milestone? What should be worked toward? How will we know if we get there?
BALDI: That's a very good question, and a lot of people are working on that. Some of my colleagues, by the way, may disagree with the statement that GPT-4 passes the Turing test, but anyway, those are details. The Turing test is not a well-defined test. It's a vague idea. It doesn't specify how long the conversation should be, what kind of humans are involved in those conversations, and so forth. So, we definitely need better ways to measure intelligence at different levels, especially as these machines are becoming super intelligent, more intelligent than us. People are working on developing these tests but there are no clear ideas on how to do it. There is no new equivalent to the Turing test that I can propose to you. We have been testing large language models using IQ tests and university-level exams in different specialties.
ZIERLER: Is that because we need—if such a thing were possible—a better working definition of consciousness?
BALDI: Yes, but even intelligence is not a very well-defined term. These are terms that were invented by humans, thousands of years ago, when they knew nothing about neuroscience and nothing about computers. They capture something, but it's a very vague, very rough concept. Now you see new words emerging, things like generative AI, artificial general intelligence (AGI), and superintelligence (SI). These are vague too, but they have become quite common, and signal the struggle we are having with these concepts. Of course, we are going to put these machines through a variety of tests. You can give them IQ tests. You can give them language translation tests (incidentally, there are over 7,000 different human languages). You can have increasingly larger batteries of tests covering an increasing number of domains. For instance, we have shown that the best large language models can pass university-level exams in specialized areas of medicine. Of course, reasoning and mathematics are very important areas for AI. How far can AI go in mathematics? It can pass high-school or college-level exams. But can it make new discoveries at the level of the top human mathematicians, and possibly even beyond? That is a fascinating question and an active area of research, which is unfortunately not very accessible to universities.
Symmetries and Limitless Possibilities
ZIERLER: We can bring the conversation right to today. What are you currently working on? What is most interesting to you generally in the field?
BALDI: I am currently working on these large language models, trying to understand how they work. I have other theoretical projects related to deep learning, for instance something we call the theory of synaptic neural balance. With my students, we are continuing to apply neural networks, particularly now transformers, which are the kind of neural networks that are used in these large language models. But we are using them for other things, not for language. We are using them in physics, for instance. It turns out that these transformers have important symmetry properties which are useful for problems in physics. We are leveraging these new neural architectures and designing them in appropriate ways for problems in physics, where you have certain symmetries that are important and need to be reflected in the architecture of the machine learning systems.
ZIERLER: These symmetries, is this chirality?
BALDI: No, rather things like the symmetry between matter and antimatter in particle colliders. When you are studying the world at the fundamental level of particles, particles and antiparticles ought to be symmetric. When we collide particles at the Large Hadron Collider, in principle we should observe matter and antimatter in a symmetric way. This is not true at the large scale of the universe, where things are not symmetric, and matter dominates. That remains a mystery. But we work with physicists at the LHC and apply deep learning methods to analyze the products of these high-energy collisions. Every time we see a particle, we should also see the corresponding antiparticle. The neural architectures that we use to process the data from those collisions ideally should have this symmetry built into them. And there are many other symmetries that are more technical which have to be reflected in the machine learning algorithms. We have developed methods based on transformer architectures to handle such symmetries.
ZIERLER: Of course the concept of matter and antimatter compels the question, the wondering about a universe versus a multiverse, which is currently untestable. Is there a machine learning approach that might one day make the multiverse a testable proposition, do you think?
BALDI: It is possible. There are all these fundamental questions about the nature of the universe, or fundamental questions in mathematics, that we really don't know how to answer. For some of them, you can argue that very little progress has been made in decades. There is a real possibility that AI may be able to contribute to the solution of such questions. The pinnacle of the open mathematical and computer science problems is the so-called P versus NP problem, and we have no idea how to solve this problem. A similar situation is found in physics with the fundamental problem of reconciling general relativity with quantum mechanics. Maybe AI will come up with something. Not today, not immediately. But what about in a few decades? It's conceivable that this could happen. And it's very exciting and very scary at the same time.
ZIERLER: We'll conclude this conversation with a retrospective question and then we'll end looking to the future. Of course what brings us together is Caltech and your very unique experiences at Caltech both as a graduate student, where officially you were in mathematics but your ideas and interests were much broader than mathematics, and then your subsequent appointments both in Biology and JPL. What has stayed with you from your Caltech days? What has shaped your motivations, your interests as a scholar, the way you approach scientific problems?
BALDI: I owe a lot to my graduate studies at Caltech. Caltech gave me the entry point I needed to start doing science. It instilled in me a passion for thinking from first principles and for interdisciplinary research. It gave me the confidence needed to work in a new direction, even when it seemed wrong to most others.
ZIERLER: It's a good thing you met that librarian at—
BALDI: It changed my life.
ZIERLER: We'll end looking to the future. I want to go back to the idea—the book that you wrote, Deep Learning in Science—particularly your interest in showing how deep learning, artificial intelligence, machine learning, is going to be key to discovery and breakthrough in some of the most implacably difficult problems in fundamental science—in physics, in chemistry, in biology. When you survey these fields, where are you most optimistic that machine learning will play a significant role in breakthrough discoveries, whether for dark matter, or the origins of life, things like that? Where are you most optimistic in both the near term and the long term?
BALDI: Across the board. I think it's a real explosion. If you go to the main machine learning conference today, there are workshops, special sessions, that are entirely focused on deep learning in science, the physical sciences, or the biomedical sciences. So, across the board. I think a very interesting area is chemistry, for instance the prediction of chemical reactions and synthetic pathways, or the discovery of new drugs or materials. I have colleagues working on climate issues and so forth. So, I see reasons for optimism across all the sciences.
ZIERLER: Finally, Pierre, a major project as you already indicated, both scientific and administrative, is the necessary evolution of the Institute for Genomics and Bioinformatics, your vision of what it should become. How can we understand this vision as a microcosm, more generally, of where the intersection of machine and natural intelligence is headed? How is this a beacon, a sign of what is to come, what you are envisioning this Institute to be in the future?
BALDI: As I told you, I proposed this institute a few years ago, and it was not well received by our administration. I think we're just playing catch-up, to tell you the truth, and we should have done it ten years ago or more. The Institute will focus on bringing science to AI and bringing AI to science across the UCI campus. Once it is off the ground, others will have to carry the torch, so to speak. I also want the Institute to be active in the area of AI and society, AI safety, AI regulations, et cetera.
ZIERLER: The bigger idea here is that AI is going to be a part of all of human society and that should be reflected on a college campus.
BALDI: Absolutely.
ZIERLER: This has been a wonderful conversation. I want to thank you so much for spending this time with me.
BALDI: Thank you, David.
[END]
Interview Highlights
- Understanding Intelligence in All its Forms
- Experimental and Simulated Data
- AI as a Subset of Machine Learning
- Connecting Neuroscience and Algorithms
- Guardrails on AI
- Industry and the Imbalance of Computing Power
- Focus on Genomics and Bioinformatics
- From Paris to Pasadena
- The Joy of Caltech Flatness
- Neural Networks and the Hopfield Model
- A Combinatorial Connection
- Dynamical Systems at JPL
- A New Era of Big Data Biology
- Interdisciplinarity at UC Irvine
- The First Draft of the Human Genome Project
- A Research Convergence
- Overriding Evolution
- Deep Learning for all of Science
- We Have Passed the Turing Test
- Symmetries and Limitless Possibilities