The life of a cell is a dynamic one, requiring split-second decisions driven by interactions with multiple small molecules in the environment. From sugars to toxins to the molecular products of sun exposure, small molecules assail cells in what are known as signaling events. “There are thousands, if not millions, of different kinds of small molecules,” says B. Franklin Pugh ’83, Molecular Biology and Genetics/Physiology and Biophysics. “Any of them could come into a cell. When they do, the cell responds by putting out products that allow it to metabolize sugars, or protect itself against insults or toxins. But to do that, it needs to reprogram its genome.”
Receptor proteins inside the cell sense the invading small molecules. The proteins then bind to specific points along the cell’s DNA, turning on certain genes that generate the products needed to protect the cell against the molecules. Together, these small molecules and proteins interact to form a vast machinery that regulates gene expression. “That’s universal for all life forms,” Pugh says. “Imagine all those proteins and small molecules coming together in various combinations. How do these events coalesce at specific genes to turn them on?”
That question has been at the heart of Pugh’s research for decades. Before coming to Cornell in 2020, he carried out a groundbreaking, eight-year project at Pennsylvania State University to map the precise binding locations of more than 400 proteins on the genome of budding yeast, Saccharomyces cerevisiae. In early 2021, he and his collaborators published a major paper in the journal Nature about their findings. “We identified the organization of each of those proteins, how they position themselves on the genome with respect to every other protein,” he says. “And in that compilation of every individual protein, we get a sense of the machineries that control gene regulation and genome function in general.”
ChIP-exo: Defining the Coordinates of Protein Binding
A key aspect of the project required the researchers to define the location on the genome where each individual protein binds. “It’s not just one place,” Pugh says. “It could be as few as 10 or 15 genes a particular protein binds to, or it could be as many as a few thousand genes.”
Pugh and his colleagues developed a new technique, called ChIP-exo, for defining the coordinates of protein binding. ChIP-exo builds on ChIP-seq, an earlier technique wherein the target protein is bound to a specific antibody and the dynamic interactions taking place on a cell’s DNA are chemically fixed. Then the researchers fragment the DNA using high-frequency ultrasonic waves and use the antibody to find and pull out the target protein, bringing the DNA sequence it is attached to with it.
“The problem is that with ChIP-seq you end up with a broad distribution of DNA fragment sizes,” Pugh explains. “They can range from a hundred to a thousand base pairs. Because of that, it’s like taking a picture of something with a moving camera; it’s blurred, very low resolution.”
With ChIP-exo, the researchers add an enzyme to the process, which destroys the DNA up to the precise point where the target protein is bound. That creates ultra-high, single-base-pair resolution. “So now we can look at proteins bound at that site, plus an assemblage of neighboring proteins all around it, and see how they all interact,” Pugh says.
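The resolution argument can be made concrete with a toy simulation. The Python sketch below is an illustration only, not the lab's actual pipeline: the function names (`sonicate`, `exo_trim`, `spread`), the 20-base-pair footprint, and the genome coordinates are all assumptions invented for the example, and real exonuclease digestion is strand-specific and noisier than this model suggests.

```python
import random


def sonicate(site_start, site_end, rng, min_len=100, max_len=1000):
    """Return (left, right) for one random fragment that covers the bound site.

    Fragment lengths follow the 100 to 1,000 base-pair range quoted in the
    article; where the fragment falls around the site is random.
    """
    footprint = site_end - site_start
    length = rng.randint(min_len, max_len)
    # Slide the fragment so it always still contains the whole footprint.
    offset = rng.randint(0, length - footprint)
    left = site_start - offset
    return left, left + length


def exo_trim(fragment, site_start, site_end):
    """Toy exonuclease step: digest each end inward until the protein blocks it."""
    left, right = fragment
    return max(left, site_start), min(right, site_end)


def spread(positions):
    """Width of the range of observed positions, a crude measure of blur."""
    return max(positions) - min(positions)


if __name__ == "__main__":
    rng = random.Random(0)
    site = (5000, 5020)  # hypothetical 20 bp protein footprint
    fragments = [sonicate(*site, rng) for _ in range(200)]
    trimmed = [exo_trim(f, *site) for f in fragments]
    print("ChIP-seq-like left-end spread:", spread([l for l, _ in fragments]))
    print("ChIP-exo-like left-end spread:", spread([l for l, _ in trimmed]))
```

In this model, the untrimmed fragment ends scatter over hundreds of base pairs around the site, which is the "moving camera" blur Pugh describes. After trimming, every fragment end sits on the edge of the footprint.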
PEGR: A New System for Managing and Distilling Data
Every target protein generates a data set of millions and millions of data points. “And because we’re looking at thousands of proteins, we’re now up to billions of data points,” Pugh says. “So we have to manage the data, and we had to develop a computational infrastructure to do that.”
The researchers developed a software system, Platform for Epigenetic and Genomic Regulation (PEGR), that manages and distills the overwhelming mass of data points into something humans can understand. “PEGR is also part of the visualization and dissemination process,” Pugh explains. “When you discover something, you need to get the word out. Part of the way we communicate is to make not only the data available but a means to analyze that data, so that anyone can look at billions of data points and ask a question and hopefully get an answer.”
Pugh’s ultimate goal is to uncover the details of human gene regulation and genome function, but he and his fellow researchers chose to work with single-celled yeast as an interim step. While humans have approximately 20,000 genes, yeast has only 5,000 and has been well studied for over a hundred years. “The yeast system is much simpler and much is already known about it,” Pugh says. “So we used it to figure out the ground rules, unencumbered by the enormous complexities of multicellular human systems.”
Applying Methodologies to Human Tumors
Now, Pugh and his team are collaborating with other researchers at Cornell in Ithaca and at Weill Cornell Medicine to apply their methodologies to human tissues, both normal and diseased. “We’re looking at tumors,” he says. “The configuration of proteins on a genome in a tumor is going to be different than for normal tissue. If we can identify those tumor protein configurations and connect them with the prognosis of the patient, we may be able to predict the prognosis going forward for other patients.”
Pugh’s research could add a new dimension to what researchers already understand about the genetic factors underlying diseases like cancer. “It’s along the lines of molecular diagnostics,” he says. “Many people are doing a lot of sequencing of people’s DNA and of the DNA of tumors, which might help predict outcomes. But that’s only a small part of the picture. We’re saying there’s a lot more out there to be seen in terms of molecular pictures. There’s the epigenetic component, which is partly what we’re looking at.”
Childhood Chemistry Set Gateway to Career
Pugh is no newcomer to Cornell. As an undergraduate, he earned a BS degree in Biology from the university. Back then he already knew he wanted to work in the biological sciences, although he didn’t know which one. His certainty stemmed from his experiences as a child, when, in the basement of his family home, he found a chemistry set that one of his older siblings had rejected.
“I started throwing things together, creating all kinds of messes,” he says. “I thought, ‘This is cool.’ Everything seemed cool. I had no idea what I was doing, but I was just creating sludges of all kinds. When you get that excited about it, I guess you’re just meant to be a scientist.”