TY - JOUR
T1 - Massively parallel characterization of transcriptional regulatory elements
AU - Agarwal, Vikram
AU - Inoue, Fumitaka
AU - Schubach, Max
AU - Penzar, Dmitry
AU - Martin, Beth K.
AU - Dash, Pyaree Mohan
AU - Keukeleire, Pia
AU - Zhang, Zicong
AU - Sohota, Ajuni
AU - Zhao, Jingjing
AU - Georgakopoulos-Soares, Ilias
AU - Noble, William S.
AU - Yardımcı, Galip Gürkan
AU - Kulakovskiy, Ivan V.
AU - Kircher, Martin
AU - Shendure, Jay
AU - Ahituv, Nadav
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/03/13
Y1 - 2025/03/13
N2 - The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific ‘on switches’ that provide similar expression levels to their associated gene. By contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of functional CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
UR - http://www.scopus.com/inward/record.url?scp=85217237796&partnerID=8YFLogxK
U2 - 10.1038/s41586-024-08430-9
DO - 10.1038/s41586-024-08430-9
M3 - Journal articles
C2 - 39814889
AN - SCOPUS:85217237796
SN - 0028-0836
VL - 639
SP - 411
EP - 420
JO - Nature
JF - Nature
IS - 8054
M1 - 219
ER -