CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Nos. 63/219,705, filed Jul. 8, 2021 and 63/313,842, filed Feb. 25, 2022. The entire contents of the above-identified applications are hereby fully incorporated herein by reference.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
This invention was made with government support under Grant Nos. MH117886, HG009761, MH110049, and HL141201 awarded by the National Institutes of Health. The government has certain rights in the invention.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The contents of the electronic sequence listing ("BROD-5420WP_ST26.xml"; size: 23,452,824 bytes (23.5 MB on disk); created on Jul. 8, 2022) are herein incorporated by reference in their entirety.
TECHNICAL FIELD
The subject matter disclosed herein is generally directed to methods of differentiating stem cells into target cell types and screening platforms for systematically identifying transcription factors (TFs) that drive differentiation of stem cells into target cell types.
BACKGROUND
Directed differentiation of human pluripotent stem cells into diverse cell types has the potential to realize a broad array of cellular replacement therapies and provides a tractable model that can be perturbed, genetically or chemically, to assess effects in a cell type-specific context [1-5]. Despite the utility of cellular engineering, it remains challenging or impossible to generate many cell types [1-5]. The best differentiation methods are often labor-intensive and can require months to produce cell populations that are still heterogeneous or immature. Many of these methods rely on exogenous growth factors or small molecules, which are often dosage-sensitive and difficult to identify in a scalable manner. Alternatively, overexpression of transcription factors (TFs) has been shown to rapidly and efficiently generate many different cell types, including neurons and skeletal muscle cells [6-12]. Because TFs use endogenous regulatory pathways to drive differentiation, mimicking natural development, this approach to engineering cell fate may produce higher-fidelity models while illuminating aspects of cellular development. However, discovering TFs for directed differentiation relies on time-intensive, low-throughput arrayed screens. Arrayed screens, in which each perturbation must be performed and tested individually, are inherently limited in scalability, typically to 5-25 TFs [6-12]. By contrast, pooled screening approaches, which use barcodes to test multiple perturbations in parallel, are dramatically more scalable in both time and cost.
In vitro models of the human brain enable high-throughput genetic and chemical screens that can advance our understanding of complex neurodevelopmental and neurodegenerative diseases. To simultaneously assess thousands of different perturbations and ensure unbiased results, such models should be homogeneous, robust, and scalable. Current methods for generating models of the brain generally involve differentiating human embryonic stem cells (hESCs) into neural cells using exogenous factors or small molecules, a process that is labor-intensive, time-consuming, and produces heterogeneous cell populations (Douvaras P, et al., Efficient generation of myelinating oligodendrocytes from primary progressive multiple sclerosis patients by induced pluripotent stem cells. Stem Cell Reports. 2014; 3(2):250-9; Krencik R, et al., Specification of transplantable astroglial subtypes from human pluripotent stem cells. Nat Biotechnol. 2011; 29(6):528-34; Li X J, et al., Specification of motoneurons from human embryonic stem cells. Nat Biotechnol. 2005; 23(2):215-21; Perrier A L, et al., Derivation of midbrain dopamine neurons from human embryonic stem cells. Proc Natl Acad Sci USA. 2004; and Muffat J, et al., Efficient derivation of microglia-like cells from human pluripotent stem cells. Nat Med. 2016; 22(11):1358-67). Furthermore, many cell types in the brain cannot be derived from hESCs. Although methods exist for differentiating neural progenitors and some neuronal subtypes, none efficiently generates glial cells (astrocytes, oligodendrocytes, and microglia) that resemble their in vivo counterparts without transplantation (Douvaras P, et al., 2014; Krencik R, et al., 2011; and Muffat J, et al., 2016). Because glia have been shown to play critical roles in neural development and disease, including them in models is critical to the success of this approach for studying the brain (Chung W S, et al., Do glia drive synaptic and cognitive impairment in disease? Nat Neurosci. 2015; 18(11):1539-45; and Hong S, Stevens B. Microglia: Phagocytosing to Clear, Sculpt, and Eliminate. Dev Cell. 2016; 38(2):126-8).
Thus, there is a need to develop an efficient method that can generate more complete in vitro models of the human brain. Additionally, there is a need for in vitro models of other cell types that can advance our understanding of development and disease.
SUMMARY
In certain example embodiments, the present invention provides for screening platforms for systematically identifying transcription factors (TFs) that drive differentiation of pluripotent stem cells into target cell types. In certain example embodiments, the present invention provides for differentiation methods based on overexpression of TFs to generate specific cell types. Applicants provide examples of the screening methods to identify transcription factors that are capable of differentiating stem cells into all cell types, including neural progenitors/radial glia in the developing central nervous system that are capable of differentiating into neurons, astrocytes, and oligodendrocytes. In certain embodiments, the neural progenitors are referred to as induced neural progenitors (iNPs). Some, but not all, of the iNPs become radial glial cells. Thus, “neural progenitors” as used herein may be referred to as “induced neural progenitors” or “radial glia”. Applicants further identify TFs that are capable of differentiating stem cells into cardiomyocytes.
In one aspect, the present invention provides for a method of differentiating a pluripotent cell population into a target cell type of interest comprising overexpressing one or more transcription factors (TFs) from Table 1 or Table 3 in a pluripotent cell population, and selecting cells expressing one or more target cell markers. In certain embodiments, the target cell is a neural progenitor and selecting cells comprises selecting cells expressing one or more radial glial cell markers. In certain embodiments, the one or more transcription factors are selected from the group consisting of RFX4, NFIB, ASCL1, PAX6, EOMES, FOS, OTX1, NFIC, LHX2, FANCD2, NOTCH1, SMARCC1, ESR2, ESR1, MESP1, RCOR2, GLI3, NOTCH2, HELLS, BCL11A, HES1, SOX9, FEZF2, and TCF7L2, or TFs that are ranked in the top 10% of any screening method in Table 1 (e.g., RFX4, NFIB, ASCL1, PAX6, EOMES, FOS, OTX1, NFIC, LHX2, RCOR2, GLI3, NOTCH2, HELLS, BCL11A, HES1, FANCD2, SOX9, FEZF2, TCF7L2). In certain embodiments, the one or more transcription factors are RFX4, NFIB, ASCL1, PAX6, or a combination thereof. In preferred embodiments, RFX4 is overexpressed to produce the neural progenitors. In certain embodiments, the method further comprises producing RFX4 neural progenitor cells in media comprising dual SMAD inhibitors. In certain embodiments, the one or more radial glial cell markers are selected from Table 2. In certain embodiments, the one or more radial glial cell markers are selected from the group consisting of NES, VIM, SLC1A3, and PAX6. In certain embodiments, the method further comprises inducing differentiation of the neural progenitors into neurons, astrocytes and/or oligodendrocytes. In certain embodiments, differentiation comprises spontaneous differentiation of the neural progenitors. In certain embodiments, differentiation comprises directed differentiation of the neural progenitors.
In certain embodiments, selecting further comprises selecting cells enriched for expression of one or more gene signatures expressed in in vivo radial glia cells. The one or more gene signatures may be any in vivo gene signature known in the art (see, e.g., Pollen et al., Molecular identity of human outer radial glia during cortical development. Cell. 2015; 163(1):55-67). In certain embodiments, selecting cells enriched for expression of one or more gene signatures expressed in in vivo radial glia cells comprises identifying gene signatures for each TF by identifying differentially expressed genes between cells overexpressing a transcription factor and control cells; and selecting cells having a signature that is enriched in an in vivo radial glia cell type. Differentially expressed genes may be identified by comparing expression of genes in cells overexpressing a transcription factor and control cells overexpressing only the reporter gene (e.g., GFP). In certain embodiments, the signature may encompass the top differentially expressed genes (e.g., top 10, 100, 1000 or more most differentially expressed genes). In certain embodiments, the gene signatures are compared to in vivo cells and the gene signatures from cells having an overexpressed transcription factor that are most enriched in the in vivo cell types are selected.
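The signature-based selection described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the Applicants' analysis pipeline: it assumes per-gene mean expression values are already available for cells overexpressing each TF and for GFP control cells, defines each TF's signature as its top-n up-regulated genes by log2 fold change (pseudocount of 1), and scores the overlap with an in vivo radial glia gene set using a hypergeometric tail probability. The function names, the pseudocount, and the choice of statistical test are all hypothetical.

```python
import math


def log2_fold_changes(tf_mean, ctrl_mean, pseudocount=1.0):
    """Per-gene log2 fold change of TF-overexpressing cells over control (e.g., GFP) cells."""
    return {
        gene: math.log2((tf_mean[gene] + pseudocount) / (ctrl_mean[gene] + pseudocount))
        for gene in tf_mean
        if gene in ctrl_mean
    }


def top_signature(lfc, n):
    """The n most up-regulated genes form the TF's signature."""
    return set(sorted(lfc, key=lfc.get, reverse=True)[:n])


def hypergeom_sf(k, universe, n_sig, n_ref):
    """P(overlap >= k) when n_sig genes are drawn from a universe holding n_ref reference genes."""
    hits = sum(
        math.comb(n_ref, i) * math.comb(universe - n_ref, n_sig - i)
        for i in range(k, min(n_sig, n_ref) + 1)
    )
    return hits / math.comb(universe, n_sig)


def rank_tfs(tf_profiles, ctrl_mean, in_vivo_signature, n=100):
    """Rank TFs by enrichment of their top-n signature in an in vivo gene set."""
    universe = len(ctrl_mean)
    ref = set(in_vivo_signature) & set(ctrl_mean)
    results = []
    for tf, tf_mean in tf_profiles.items():
        signature = top_signature(log2_fold_changes(tf_mean, ctrl_mean), n)
        overlap = len(signature & ref)
        results.append((tf, overlap, hypergeom_sf(overlap, universe, len(signature), len(ref))))
    return sorted(results, key=lambda r: r[2])  # smallest p-value (most enriched) first
```

In practice a proper single-cell differential expression test would likely replace the simple fold change, but the ranking logic (signature, overlap, enrichment score, select the most enriched TFs) stays the same.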
In another aspect, the present invention provides for an isolated neural progenitor cell produced by the method of any embodiment herein. In certain embodiments, the present invention provides for a therapeutic composition comprising the isolated neural progenitor cell. In certain embodiments, the present invention provides for an ex vivo system comprising the isolated neural progenitor cell.
In another aspect, the present invention provides for a method of producing neurons, astrocytes and/or oligodendrocytes comprising expressing one or more transcription factors from Table 1 in the isolated neural progenitor cell of any embodiment herein and inducing spontaneous differentiation of the isolated neural progenitor cells. In another aspect, the present invention provides for a method of producing neurons, astrocytes and/or oligodendrocytes comprising expressing one or more transcription factors from Table 1 in the isolated neural progenitor cell of any embodiment herein and inducing directed differentiation of the isolated neural progenitor cells. In preferred embodiments, the neural progenitor cell was produced by overexpression of RFX4. In certain embodiments, the method further comprises differentiating RFX4 neural progenitor cells in media comprising dual SMAD inhibitors. In certain embodiments, the RFX4 neural progenitor cells are differentiated for 7 days. In certain embodiments, the RFX4 neural progenitor cells are differentiated into CNS cell types, radial glia, and neurons. In certain embodiments, the neurons are GABAergic neurons.
In another aspect, the present invention provides for an isolated neuron, astrocyte, or oligodendrocyte produced according to any method described herein. In certain embodiments, the present invention provides for a therapeutic composition comprising the isolated neuron, astrocyte, or oligodendrocyte. In certain embodiments, the present invention provides for an ex vivo system comprising the isolated neurons, astrocytes, and/or oligodendrocytes. In preferred embodiments, the neuron is a GABAergic neuron. In certain embodiments, the GABAergic neuron can be used in a model of autism, schizophrenia, epilepsy, dementia, Alzheimer's disease, anxiety disorders, or depression.
In another aspect, the present invention provides for a non-naturally occurring population of stem cells comprising a reporter gene integrated into an endogenous locus of each stem cell in the population, wherein the endogenous locus is associated with a marker gene for a cell type of interest; the reporter gene is under control of the promoter for the marker gene; and the reporter gene and marker gene are expressed as separate proteins, whereby the marker gene and reporter gene are co-expressed upon differentiation of the stem cells into the cell type of interest. The non-naturally occurring population of stem cells may further comprise a second reporter gene integrated into a second endogenous locus of the stem cell, wherein the locus is associated with a marker gene for a second cell type of interest, and wherein the second cell type of interest is more differentiated than the first cell type of interest. The reporter gene and marker gene (e.g., first and/or second) may be separated by a ribosomal skipping site. The ribosomal skipping site may be a P2A sequence. The reporter gene may be a fluorescent protein as described herein. The cell type of interest may be any differentiated cell (e.g., more differentiated than a stem cell, including but not limited to a progenitor cell). The cell type of interest may be a neural progenitor or mature neural cell type.
In certain embodiments, the cell type of interest is a radial glia cell. The marker gene may be selected from Table 2. The marker gene may be selected from the group consisting of NES, VIM, SLC1A3, and PAX6.
In certain embodiments, the cell type of interest is an astrocyte. The marker gene may be selected from the group consisting of ALDH1L1 and GFAP.
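The marker-P2A-reporter arrangement described above can be modeled at the protein level to show why the marker and reporter end up as separate proteins. This is a simplified sketch with assumptions: the reporter is assumed to be knocked in at the C-terminus of the marker, replacing its stop codon; a GSG linker is assumed before the P2A peptide; and the marker sequence used in the test is a placeholder, while the reporter fragment is the N-terminus of EGFP. The model encodes the known behavior of 2A peptides: the ribosome fails to form the peptide bond between the final glycine and proline, so the upstream protein retains the 2A residues up to ...NPG and the downstream protein begins with proline.

```python
# GSG linker followed by the porcine teschovirus-1 2A (P2A) peptide.
P2A_WITH_LINKER = "GSGATNFSLLKQAGDVEENPGP"


def knock_in_polypeptide(marker_protein: str, reporter_protein: str) -> str:
    """Primary translation product of an in-frame marker-P2A-reporter knock-in.

    Assumes the marker's stop codon (written as a trailing '*') is removed.
    """
    return marker_protein.rstrip("*") + P2A_WITH_LINKER + reporter_protein


def ribosomal_skip(polypeptide: str) -> list:
    """Model P2A skipping: no peptide bond forms between the final G and P of the 2A."""
    site = polypeptide.find(P2A_WITH_LINKER)
    if site == -1:
        return [polypeptide]  # no 2A site: a single fused protein
    cut = site + len(P2A_WITH_LINKER) - 1  # index of the final 2A proline
    return [polypeptide[:cut], polypeptide[cut:]]
```

Because the reporter is released as a separate protein, it can report marker promoter activity without being fused to, and potentially disrupting, the endogenous marker protein.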
In another aspect, the present invention provides for a pooled transcription factor screening system comprising a transcription factor library comprising one or more vectors encoding a transcription factor and a barcode identifying said transcription factor; and a population of pluripotent cells. In certain embodiments, the transcription factors encoded by the vectors are selected from Table 1 and/or Table 3. In certain embodiments, the pluripotent cells are stem cells. In certain embodiments, the system further comprises one or more fluorescent probes configured for detecting one or more target cell marker gene transcripts (e.g., Flow-FISH probes).
In another aspect, the present invention provides for a method of screening for transcription factors capable of differentiating pluripotent cells into a cell type of interest comprising: a) introducing a transcription factor library comprising one or more vectors into a population of pluripotent cells, wherein each vector encodes: a transcription factor selected from Table 1 and/or Table 3 or an agent capable of modulating said transcription factor, and a barcode identifying each transcription factor; b) culturing the cells to allow differentiation of the cells (e.g., 2-10 days, or 2-7 days, or 5-7 days); c) selecting cells expressing one or more marker genes for the cell type of interest; and d) determining barcodes enriched in cells expressing the one or more marker genes, thereby identifying transcription factors capable of differentiating pluripotent cells into a cell type of interest. In certain embodiments, the population of pluripotent cells is a population of human embryonic stem cells (hESCs). In certain embodiments, each transcription factor is inducible. In certain embodiments, the transcription factors selected are normally expressed by the cell type of interest.
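Step d) can be sketched as a comparison of barcode abundance between the marker-positive cells selected in step c) and a reference population. The following Python sketch is an assumption-laden illustration rather than the Applicants' scoring method: the reference population (e.g., unsorted cells or the input library), the counts-per-million normalization, the pseudocount, and the function names are all hypothetical choices, and each barcode is assumed to identify one TF ORF.

```python
import math


def counts_per_million(counts):
    """Normalize raw barcode read counts to counts per million (CPM)."""
    total = sum(counts.values()) or 1  # avoid division by zero for an empty sample
    return {bc: 1e6 * n / total for bc, n in counts.items()}


def barcode_enrichment(sorted_counts, reference_counts, barcode_to_tf, pseudocount=1.0):
    """Rank TFs by log2(CPM in marker-positive cells / CPM in reference cells)."""
    sel = counts_per_million(sorted_counts)
    ref = counts_per_million(reference_counts)
    scores = [
        (tf, math.log2((sel.get(bc, 0.0) + pseudocount) / (ref.get(bc, 0.0) + pseudocount)))
        for bc, tf in barcode_to_tf.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

TFs at the top of this ranking are over-represented among cells expressing the marker genes and are candidates for driving differentiation into the cell type of interest; replicate screens and a statistical model of count noise would typically be layered on top.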
In certain embodiments, selecting cells expressing one or more marker genes for the cell type of interest comprises Flow-FISH using probes targeting one or more marker genes. In certain embodiments, selecting cells expressing one or more marker genes for the cell type of interest comprises single-cell RNA-seq. In certain embodiments, selecting cells further comprises comparing single-cell RNA-seq expression profiles of cells overexpressing one or more of the transcription factors to those of cells overexpressing controls (e.g., green fluorescent protein) to infer a pseudotime for each cell, wherein transcription factors that increase pseudotime relative to controls are identified as directing differentiation. In certain embodiments, selecting cells further comprises grouping one or more of the transcription factors into modules that alter expression of the same gene programs, wherein transcription factors in the same module are co-functional.
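Once a pseudotime has been inferred for each cell, identifying TFs that shift cells to later pseudotime than GFP controls reduces to a two-group comparison. The Python sketch below assumes the pseudotime values have already been computed by a trajectory-inference method (not shown) and grouped by the TF each cell overexpresses. The AUC score (equivalent to the Mann-Whitney U statistic divided by the product of group sizes), the 0.7 threshold, and the function names are illustrative assumptions, not the Applicants' criteria.

```python
from statistics import mean


def pseudotime_shift(tf_pseudotimes, control_pseudotimes):
    """Return (mean shift, AUC) of TF-overexpressing cells versus control cells.

    AUC is the probability that a randomly chosen TF cell lies further along the
    trajectory than a randomly chosen control cell (ties count one half).
    """
    wins = sum(
        1.0 if t > c else 0.5 if t == c else 0.0
        for t in tf_pseudotimes
        for c in control_pseudotimes
    )
    auc = wins / (len(tf_pseudotimes) * len(control_pseudotimes))
    return mean(tf_pseudotimes) - mean(control_pseudotimes), auc


def differentiation_drivers(pseudotime_by_tf, control="GFP", min_auc=0.7):
    """Flag TFs whose cells are shifted to later pseudotime than the control."""
    control_pt = pseudotime_by_tf[control]
    hits = {}
    for tf, pts in pseudotime_by_tf.items():
        if tf == control:
            continue
        shift, auc = pseudotime_shift(pts, control_pt)
        if shift > 0 and auc >= min_auc:
            hits[tf] = (shift, auc)
    return hits
```

Requiring both a positive mean shift and a high AUC guards against a few outlier cells driving the call.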
In certain embodiments, the one or more populations of pluripotent cells are stem cells. In certain embodiments, selecting cells expressing one or more marker genes for the cell type of interest comprises detecting the reporter gene. In certain embodiments, selecting cells comprises FACS.
In certain embodiments, determining barcodes comprises sequencing the DNA barcode or transcript comprising the barcode. In certain embodiments, determining barcodes comprises amplification of barcode sequences (e.g., PCR).
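Determining barcodes by sequencing typically involves locating the barcode in each read and counting reads per library barcode. The sketch below is hypothetical throughout: the constant flanking sequence, the 8-nt barcode length, and the one-mismatch error correction are illustrative assumptions, since the actual amplicon layout of the library is not specified here.

```python
from collections import Counter

LEFT_FLANK = "ACGTACGT"  # hypothetical constant sequence immediately 5' of the barcode
BARCODE_LEN = 8  # hypothetical barcode length


def extract_barcode(read, flank=LEFT_FLANK, length=BARCODE_LEN):
    """Return the barcode that follows the flank, or None if it cannot be found."""
    i = read.find(flank)
    start = i + len(flank)
    if i == -1 or start + length > len(read):
        return None
    return read[start:start + length]


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def count_barcodes(reads, whitelist, max_mismatches=1):
    """Count reads per library barcode, rescuing reads close to exactly one barcode."""
    counts = Counter()
    for read in reads:
        bc = extract_barcode(read)
        if bc is None:
            continue
        if bc in whitelist:
            counts[bc] += 1
            continue
        matches = [w for w in whitelist if hamming(bc, w) <= max_mismatches]
        if len(matches) == 1:  # discard ambiguous corrections
            counts[matches[0]] += 1
    return counts
```

Error correction is only safe when library barcodes are designed to differ at several positions; otherwise a single sequencing error could reassign a read to the wrong TF.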
In certain embodiments, the method further comprises introducing the transcription factor library at a low cell density, such that the cells multiply into small colonies; and inducing expression of the transcription factors or agents encoded by the vectors. In certain embodiments, the method further comprises introducing the vector library at a low MOI, such that most cells receive no more than one vector. In certain embodiments, the method further comprises introducing the vector library at a high MOI, such that most cells receive one or more vectors.
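The low- versus high-MOI regimes can be quantified with the standard Poisson approximation for viral transduction, in which the number of vectors per cell is assumed to follow a Poisson distribution with mean equal to the MOI. This model is an assumption not stated in the source, and real transduction efficiencies vary with cell type and functional titer. The function names below are hypothetical.

```python
import math


def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k vectors at a given MOI."""
    return moi ** k * math.exp(-moi) / math.factorial(k)


def infection_profile(moi):
    """Fractions of cells receiving 0, 1, or more than 1 vector under a Poisson model."""
    if moi <= 0:
        raise ValueError("MOI must be positive")
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    return {
        "uninfected": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        "single_among_infected": p1 / (1.0 - p0),
    }
```

Under this model, at an MOI of 0.3 roughly 26% of cells are transduced and about 86% of those carry a single vector, whereas at an MOI of 3 about 80% of cells receive more than one vector.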
In certain embodiments, the transcription factor library comprises viral vectors. In certain embodiments, the viral vectors are lentivirus, adenovirus or adeno-associated virus (AAV) vectors.
In certain embodiments, the transcription factor library further encodes a protein tag in frame with the transcription factor coding sequence.
In certain embodiments, the population of stem cells expresses a CRISPR system and the transcription factor library comprises vectors encoding one or more CRISPR guide sequences targeting one of the transcription factors. In certain embodiments, the guide sequences comprise one or more aptamer sequences specific for binding an adaptor protein and the CRISPR system comprises an enzymatically inactive CRISPR enzyme and the adaptor protein comprises a functional domain. In certain embodiments, the CRISPR system comprises an enzymatically inactive CRISPR enzyme and a functional domain. In certain embodiments, the functional domain is a transcription activation or repression domain.
In certain embodiments, the transcription factor library comprises vectors encoding an shRNA for one of the transcription factors.
In certain embodiments, identifying transcription factors further comprises determining gene signatures for each identified TF, wherein the gene signature comprises differentially expressed genes between cells overexpressing each transcription factor and control cells; and selecting transcription factors inducing a gene signature that is enriched in an in vivo cell type.
In another aspect, the present invention provides for a method of producing cardiomyocytes comprising overexpressing a transcription factor selected from the group consisting of MESP1, EOMES and ESR1 in a pluripotent cell population, and selecting cells expressing one or more cardiomyocyte markers. In certain embodiments, the transcription factor is EOMES. In certain embodiments, the amino acid sequence of EOMES is SEQ ID NO: 10807 or SEQ ID NO: 10808. In certain embodiments, the transcription factor is induced for about 2 days. In certain embodiments, the transcription factor is induced when the cell density is about 500,000 cells/ml. In certain embodiments, the one or more cardiomyocyte markers comprises TNNT2. In certain embodiments, selecting further comprises selecting cells enriched for expression of one or more gene signatures expressed in in vivo cardiomyocytes.
In another aspect, the present invention provides for an isolated cardiomyocyte produced by the method according to any embodiment herein. In certain embodiments, the present invention provides for a therapeutic composition comprising the isolated cardiomyocyte. In certain embodiments, the present invention provides for an ex vivo system comprising the isolated cardiomyocyte.
In certain embodiments, the pluripotent cell according to any embodiment herein is an embryonic stem cell (ES) or induced pluripotent stem cell. In certain embodiments, the stem cell is a human embryonic stem cell (ES). In certain embodiments, the human embryonic stem cell is selected from the group consisting of HUES66, HUES64, HUES3, HUES8, HUES53, HUES28, HUES49, HUES9, HUES48, HUES45, HUES1, HUES44, HUES6, H1, HUES62, HUES65, H7, HUES13, H9, and HUES63. In certain embodiments, the stem cell is a human induced pluripotent stem cell (iPSC). In certain embodiments, the human iPSC is selected from the group consisting of 11a, PGP1, GM08330 (also known as GM8330-8), and Mito 210.
In another aspect, the present invention provides for a stem cell comprising an exogenous nucleotide sequence capable of inducible expression of one or more transcription factors selected from the group consisting of RFX4, NFIB, ASCL1 and PAX6.
In another aspect, the present invention provides for a stem cell comprising an exogenous nucleotide sequence capable of inducible expression of one or more transcription factors selected from the group consisting of MESP1, EOMES and ESR1.
In another aspect, the present invention provides for a method of predicting transcription factor combinations for differentiating a stem cell into a cell type of interest comprising determining the average gene expression of one or more genes for two or more stem cells each expressing a single transcription factor and comparing the average expression to a gene signature specific for the cell type of interest. In certain embodiments, the method further comprises differentiating a stem cell into the cell type of interest by expressing in the stem cell a double or triple combination of transcription factors whose average gene expression is most similar to a gene signature specific for the cell type of interest.
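The combination-prediction method above, in which the averaged expression of single-TF cells is compared to a target signature, can be sketched as follows. The averaging of single-TF profiles follows the description; the use of Pearson correlation as the similarity measure, the restriction to the genes in the target signature, and the function names are illustrative assumptions.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predict_combinations(single_tf_profiles, target_signature, sizes=(2, 3), top=5):
    """Score TF combinations by correlating averaged single-TF profiles with a target.

    single_tf_profiles: {TF: {gene: expression}} from cells overexpressing one TF.
    target_signature: {gene: expression} for the cell type of interest.
    """
    genes = sorted(target_signature)
    target = [target_signature[g] for g in genes]
    scored = []
    for k in sizes:
        for combo in combinations(sorted(single_tf_profiles), k):
            predicted = [mean(single_tf_profiles[tf][g] for tf in combo) for g in genes]
            scored.append((combo, pearson(predicted, target)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

Ranking all doubles and triples computationally in this way narrows the combinations that need to be tested experimentally.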
In another aspect, the present invention provides for a method of differentiating a stem cell into a cell type of interest comprising expressing in the stem cell a double or triple combination of transcription factors selected from the clusters in Table 19.
These and other aspects, objects, features, and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of illustrated example embodiments.
An understanding of the features and advantages of the present invention can be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention may be utilized, and the accompanying drawings of which:
The figures herein are for illustrative purposes only and are not necessarily drawn to scale.
DETAILED DESCRIPTION OF THE EXAMPLE EMBODIMENTS
General Definitions
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. Definitions of common terms and techniques in molecular biology may be found in Molecular Cloning: A Laboratory Manual, 2nd edition (1989) (Sambrook, Fritsch, and Maniatis); Molecular Cloning: A Laboratory Manual, 4th edition (2012) (Green and Sambrook); Current Protocols in Molecular Biology (1987) (F. M. Ausubel et al. eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (1995) (M. J. MacPherson, B. D. Hames, and G. R. Taylor eds.); Antibodies, A Laboratory Manual (1988) (Harlow and Lane, eds.); Antibodies A Laboratory Manual, 2nd edition 2013 (E. A. Greenfield ed.); Animal Cell Culture (1987) (R. I. Freshney, ed.); Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994); March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992); and Marten H. Hofker and Jan van Deursen, Transgenic Mouse Methods and Protocols, 2nd edition (2011).
As used herein, the singular forms “a”, “an”, and “the” include both singular and plural referents unless the context clearly dictates otherwise.
The term “optional” or “optionally” means that the subsequent described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.
The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within the respective ranges, as well as the recited endpoints.
The terms “about” or “approximately” as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, are meant to encompass variations of and from the specified value, such as variations of +/−10% or less, +/−5% or less, +/−1% or less, and +/−0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. It is to be understood that the value to which the modifier “about” or “approximately” refers is itself also specifically, and preferably, disclosed.
As used herein, a “biological sample” may contain whole cells and/or live cells and/or cell debris. The biological sample may contain (or be derived from) a “bodily fluid”. The present invention encompasses embodiments wherein the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof. Biological samples include cell cultures, bodily fluids, and cell cultures derived from bodily fluids. Bodily fluids may be obtained from a mammalian organism, for example by puncture or other collecting or sampling procedures.
The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
The term “multiplicity of infection” (MOI) as used herein refers to the ratio of agents (e.g., vectors or transcription factors) introduced to the number of target cells (e.g., stem cells or radial glia). In certain embodiments, MOI can refer to viral vectors used to introduce an agent.
Various embodiments are described hereinafter. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.
All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.
Overview
The ability to engineer cell types of interest has advanced basic research and has therapeutic potential, but is currently limited to a small number of cell types. Transcription factors (TFs) regulate gene programs, thereby controlling diverse cellular processes and cell states. Although TF overexpression has been shown to efficiently convert one cell type to another, the process of discovering the right TFs is time-intensive and low-throughput.
The ability to engineer any cell type of interest has the potential to advance our understanding of biological processes and our capability to treat disease [1-5]. Despite this, currently only a few cell types can be generated efficiently and consistently [1, 2, 4, 5]. Overexpression of TFs can be used to engineer cell fates, and TFs have been shown to rapidly and efficiently generate many different cell types, including neurons and skeletal muscle cells [6-12]. For example, overexpressing either NEUROD1 or NEUROG2 can efficiently and rapidly differentiate hESCs into cortical neurons (Zhang Y, et al., Rapid single-step induction of functional neurons from human pluripotent stem cells. Neuron. 2013; 78(5):785-98). Because TFs use endogenous regulatory pathways to drive differentiation, mimicking natural development, this approach may produce higher-fidelity models while illuminating aspects of cellular development. Although TF overexpression has been shown to efficiently convert one cell type to another, the process of discovering TFs that can direct differentiation into a desired cell type (cellular engineering) is time-intensive and low-throughput, limiting the number of transformative TFs that have been identified. Typically, candidate TFs are overexpressed individually or in specific combinations, and cells produced from independent perturbations are evaluated for similarity with the target cell type using discrete assays. This costly and time-consuming process has restricted the TFs tested per cell type to those predicted from prior studies (5-25 TFs on average), thus limiting the number of novel TFs that have been identified for cellular engineering.
To achieve a comprehensive understanding of TFs and their respective programs, Applicants developed a platform for high-throughput, systematic TF ORF overexpression that leverages barcodes for pooled screening. Applicants created a library of all annotated human TF splice isoforms (1,836 genes encoding 3,548 isoforms) and applied it to build a TF Atlas charting expression profiles in human embryonic stem cells (hESCs) overexpressing each TF. The comprehensive TF Atlas allowed systematic investigation and generalized observations, showing that 27% of TF genes could function as “master regulators” that induce differentiation when overexpressed in hESCs. Applicants mapped TF-induced expression profiles to reference cell types and validated candidate TFs for generation of diverse cell types, spanning all three germ layers and trophoblasts. Further targeted screens with a subset of the library allowed Applicants to create a tailored cellular disease model and to integrate mRNA expression and chromatin accessibility data to identify downstream regulators. Finally, Applicants predicted the effects of TF combinations, demonstrated the validity of the predictions in a combinatorial TF overexpression dataset, and showed how to predict combinations of TFs that could produce target profiles of reference cell types, reducing the combinatorial search space for experiments. The TF Atlas provides a comprehensive overview of gene regulatory networks and a roadmap for further understanding developmental trajectories and guiding cellular engineering efforts.
Applicants also provide different selection methods to enrich for expression of different numbers of marker genes that define the target cell type (reporter assay, Flow-FISH, and scRNA-seq).
Applicants applied the library to differentiation of human embryonic stem cells (hESCs) into neural progenitors (NPs). Applicants identified four TFs (RFX4, NFIB, PAX6, and ASCL1) that produced induced NPs (iNPs) that spontaneously differentiate into an array of central nervous system (CNS) cell types. RFX4-iNPs gave rise to the highest proportion of CNS cell types and, when combined with dual SMAD inhibition, produced iNPs at >98% purity that differentiated into predominantly GABAergic neurons, opening up new avenues for studying this cell type.
In an exemplary case, Applicants used available expression data (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016) to select 90 TF isoforms specifically expressed in a target cell type, neural progenitors (NPs), and screened them for the ability to generate NPs. Applicants chose to differentiate hESCs into induced NPs (iNPs) because NPs arise early during development, so overexpression of a single TF may be sufficient to differentiate hESCs into NPs, although no such TF had previously been identified. In addition, current methods for producing NPs, embryoid body formation13 and dual SMAD inhibition14, are respectively low-throughput or produce variable differentiation results depending on the cell line15. Through pooled screening of 90 TF isoforms, Applicants found four novel TFs (RFX4, NFIB, PAX6, and ASCL1), each of which can produce functional iNPs within 1 week. The iNPs resemble the morphology, transcriptome signature, and functional capabilities of human fetal NPs. Applicants then applied the iNPs to model neurodevelopmental disorders. These results collectively demonstrate the feasibility of using pooled TF screening to produce a diverse array of cell types that could be tailored for specific applications.
Notably, although RFX4 has not been extensively studied in neural development, RFX4-derived iNPs spontaneously differentiated into the highest proportion of cell types in the central nervous system (CNS), highlighting the importance of performing unbiased TF screens. Applicants demonstrated that RFX4-derived iNPs can be used to model neurodevelopmental disorders. Applicants also identified transcription factors capable of differentiating stem cells into cardiomyocytes. The TF screening platform provides a generalizable approach for cellular programming that could expand our ability to generate desired cell types and elucidate the complex TF regulatory networks that govern cell type specification.
Embodiments disclosed herein provide for a screening platform and methods of screening for transcription factors (TFs) that drive differentiation of stem cells into target cell types. The stem cells may be induced pluripotent stem cells (also known as iPS cells or iPSCs). The iPSCs may be patient derived.
Embodiments disclosed herein also provide for a screening platform and methods of screening for transcription factors that drive transdifferentiation of cells into target cell types. In certain embodiments, transcription factors that differentiate stem cells into a target cell (e.g., progenitor cell) can be used to transdifferentiate cells of a different lineage to target cells. In certain embodiments, TFs that are expressed in progenitor cells can be used to transdifferentiate cells of one lineage into a target cell of a different lineage.
Embodiments disclosed herein also provide for high-throughput screening methods for identifying transcription factors that enhance or suppress tumor growth. In certain embodiments, a barcoded transcription factor library is introduced into a cancer cell line. After the cancer cell line is grown (e.g., for 2 weeks), the barcodes are sequenced, and barcodes that are enriched or depleted relative to the barcodes present in the initial library are identified. Enriched barcodes may indicate transcription factors that enhance tumor growth, and depleted barcodes may indicate transcription factors that suppress tumor growth.
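The comparison of final and initial barcode abundance can be sketched as a per-barcode log2 fold change. This is a minimal sketch under stated assumptions. Counts are normalized to counts per million (CPM), a pseudocount of 1 keeps fully depleted barcodes finite, and a 2-fold (|log2 FC| ≥ 1) cutoff is used. The barcode names and the threshold are hypothetical. A real screen would likely use replicate-aware statistics rather than a single fixed cutoff.

```python
from math import log2


def barcode_log2fc(initial_counts, final_counts, pseudocount=1.0):
    """Log2 fold change in normalized abundance of each barcode (final vs. initial).

    Counts are scaled to counts per million (CPM) so libraries sequenced to
    different depths are comparable; the pseudocount keeps barcodes that drop
    to zero reads finite. Barcodes absent from the initial library are ignored.
    """
    init_total = sum(initial_counts.values())
    final_total = sum(final_counts.values())
    lfc = {}
    for bc, n0 in initial_counts.items():
        n1 = final_counts.get(bc, 0)
        cpm0 = (n0 + pseudocount) * 1e6 / init_total
        cpm1 = (n1 + pseudocount) * 1e6 / final_total
        lfc[bc] = log2(cpm1 / cpm0)
    return lfc


def classify(lfc, threshold=1.0):
    """Split barcodes into enriched (LFC >= threshold) and depleted (LFC <= -threshold)."""
    enriched = sorted(bc for bc, v in lfc.items() if v >= threshold)
    depleted = sorted(bc for bc, v in lfc.items() if v <= -threshold)
    return enriched, depleted


# Hypothetical read counts per TF barcode before and after growth.
initial = {"TF_A": 1000, "TF_B": 1000, "TF_C": 1000}
final = {"TF_A": 3000, "TF_B": 1000, "TF_C": 100}
enriched, depleted = classify(barcode_log2fc(initial, final))
```

In this toy case, TF_A would be a candidate growth-enhancing TF and TF_C a candidate growth-suppressing TF.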
In certain embodiments, the screening platform is a high-throughput multiplex screening platform.
Embodiments disclosed herein also provide for methods of using transcription factors to drive differentiation of stem cells (e.g., iPSCs or hESCs) into target cell types (e.g., neural cell types, cardiomyocytes), providing a road map for the development of an array of in vitro human models (e.g., brain) that can be tailored for specific applications. Embodiments disclosed herein also provide for in vitro models of in vivo cell types for use in modelling development and disease. In certain embodiments, target cell types can be transferred to a subject in need thereof to regenerate a diseased or damaged tissue.
Embodiments disclosed herein also provide for differentiating or transdifferentiating cells into target cells in vivo by targeted modulation of transcription factors or downstream targets. In certain embodiments, the targeted modulation of transcription factors can be used to regenerate, replenish or replace damaged or diseased cells in a subject in need thereof (e.g., heart cells, pancreatic β cells, eye cells, nervous system cells).
Embodiments disclosed herein also provide for modulating transcription factors that enhance tumor growth or that suppress tumor growth. In certain embodiments, transcription factors are modulated in a treatment regimen in a subject suffering from cancer. In certain embodiments, the treatment is targeted to tumors or sites of tumors.
Many methods of modulating transcription factors may be used. In certain embodiments, the activity of transcription factors can be enhanced (e.g., by modulation of TF phosphorylation sites). In certain embodiments, TFs are overexpressed. In certain embodiments, agents capable of enhancing expression or activity of transcription factors are used. In certain embodiments, agents capable of reducing expression or activity of transcription factors are used.
Applicants provide further examples of the screening methods to identify transcription factors required for differentiation of hESCs into radial glia, neural progenitors in the developing central nervous system that are capable of differentiating into neurons, astrocytes, and oligodendrocytes. Applicants further identify TFs required for differentiation of hESCs into cardiomyocytes. The present invention also advantageously provides for high-throughput methods of screening.
Applicants identified TFs that can differentiate hESCs into radial glia. Additionally, these candidate TFs can advantageously be applied to a high-throughput screening platform for identifying TFs that direct differentiation into specific cell types of interest (e.g., interneurons, pyramidal neurons, and oligodendrocytes). The screen can advantageously be used to identify TFs that differentiate radial glia into astrocytes. The screening platform can advance understanding of gene regulation in neural development and provide robust, scalable cellular models for studying the brain.
Finally, the methods of differentiation using the identified transcription factors can advantageously produce homogeneous populations of target cells (e.g., neural progenitor cell populations).
Screening Platforms
In certain embodiments, the present invention provides a screening platform for systematically identifying transcription factors (TFs) that drive differentiation of cells (e.g., pluripotent cells, stem cells, progenitor cells) into target cell types (e.g., neural cells, muscle cells, endocrine cells). In certain embodiments, the screening platform comprises pluripotent cells that are differentiated into target cells by overexpressing a plurality of transcription factors in the pluripotent cells. Overexpression of transcription factors may be performed according to any method known in the art (e.g., introducing a vector encoding the transcription factor, introducing an agent capable of inducing expression of the endogenous gene, as described further herein). The screening platforms can provide a framework for the development of an array of in vitro human models that can be tailored for specific applications described herein. Further, the screening platform can be used to generate a transcription factor atlas, such that differential gene expression in cells differentiated using each individual transcription factor is identified. Thus, the atlas can be used to group TFs based on gene expression and to identify TFs for each target cell type. The gene expression profiles generated by overexpressing single TFs in the TF Atlas can be used to predict expression profiles produced by overexpressing TF combinations (discussed further herein).
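The idea of predicting combinations from single-TF profiles can be illustrated with a deliberately simple baseline. The prediction method actually used is described elsewhere. This sketch instead assumes that a combination's effect is the sum of the single-TF log fold changes, which ignores synergy and antagonism. It then ranks every TF pair by Euclidean distance to a target profile. The gene names, values, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def predict_combination(single_lfc, tfs, genes):
    """Predict a combination's profile as the sum of single-TF log fold changes.

    single_lfc: TF -> (gene -> log fold change vs. control) from single-TF overexpression.
    Genes missing for a TF are treated as unchanged (0.0).
    """
    return [sum(single_lfc[tf].get(g, 0.0) for tf in tfs) for g in genes]


def rank_pairs(single_lfc, target_lfc):
    """Rank all TF pairs by Euclidean distance between predicted and target profiles."""
    genes = sorted(target_lfc)
    target = [target_lfc[g] for g in genes]
    scored = []
    for pair in combinations(sorted(single_lfc), 2):
        predicted = predict_combination(single_lfc, pair, genes)
        dist = sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))
        scored.append((pair, dist))
    return sorted(scored, key=lambda item: item[1])


single_lfc = {
    "TF1": {"g1": 2.0, "g2": 0.0},
    "TF2": {"g1": 0.0, "g2": 2.0},
    "TF3": {"g1": -1.0, "g2": -1.0},
}
target_lfc = {"g1": 2.0, "g2": 2.0}
ranking = rank_pairs(single_lfc, target_lfc)
```

The benefit of this kind of in silico ranking shows up in the numbers. Pairwise combinations of the 1,836 TF genes alone number 1,684,530. Only the top-ranked pairs would then need to be tested experimentally.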
In certain embodiments, transcription factors may be selected for screening based on expression of the transcription factors in the target cell types or in progenitor cells for the target cell types. Non-limiting examples of transcription factors may be found in Tables 1, 3, 4 and 5. Cell type specific transcription factors are known in the art. Additionally, expression of transcription factors in a target cell type can be determined experimentally (e.g., by RNA sequencing).
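Selecting candidate TFs from expression data can be sketched as a simple specificity filter. The sketch assumes a table of mean expression per cell type (e.g., from RNA sequencing). It keeps a TF when that TF is expressed in the target cell type and is at least `min_fold` times higher there than in any other cell type. Both thresholds and all values shown are illustrative assumptions, not criteria stated in the source.

```python
def select_candidate_tfs(expression, target, min_fold=4.0, min_expr=1.0):
    """Return TFs expressed in `target` and at least `min_fold`-higher there than elsewhere.

    expression: TF -> (cell type -> mean expression, e.g., TPM).
    """
    selected = []
    for tf, by_type in expression.items():
        target_expr = by_type.get(target, 0.0)
        # highest expression of this TF in any non-target cell type
        max_other = max((v for ct, v in by_type.items() if ct != target), default=0.0)
        if target_expr >= min_expr and target_expr >= min_fold * max_other:
            selected.append(tf)
    return sorted(selected)


# Hypothetical expression values for illustration.
expression = {
    "PAX6": {"NP": 50.0, "hESC": 1.0, "CM": 0.5},
    "SOX2": {"NP": 40.0, "hESC": 45.0, "CM": 0.1},
    "GATA4": {"NP": 0.2, "hESC": 0.1, "CM": 30.0},
}
np_candidates = select_candidate_tfs(expression, "NP")
cm_candidates = select_candidate_tfs(expression, "CM")
```

A TF that is also high in the starting hESCs (SOX2 in this toy table) is excluded, because its overexpression would be less likely to shift cell identity toward the target.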
An exemplary screening platform comprises one or more populations of pluripotent cells, a means to overexpress one or more transcription factors in the one or more populations of cells, and a means to identify target cells after differentiation of the cells. Each population of pluripotent cells may express a different transcription factor.
Pooled Screening Platforms
In certain embodiments, TFs are screened for differentiation of stem cells into a target cell in a pooled screen, such that a library of transcription factors is introduced into a single population of stem cells and the transcription factors able to differentiate the stem cells are identified. In certain embodiments, transcription factors are introduced such that each cell receives no more than one transcription factor, or such that single cells receive one or more transcription factors (e.g., 2, 3, 4, or 5 transcription factors). In certain embodiments, the pooled screening platform can be used to identify combinations of transcription factors required for differentiation into a target cell type.
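Whether cells receive one TF or several is commonly controlled through the multiplicity of infection (MOI). This section does not state MOI values. The sketch below uses the standard approximation that the number of vector integrations per cell follows a Poisson distribution with mean equal to the MOI. The function names are hypothetical, and the MOI must be greater than zero.

```python
from math import exp, factorial


def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k vectors when integrations are Poisson(moi)."""
    return exp(-moi) * moi ** k / factorial(k)


def integration_fractions(moi):
    """Fractions of cells receiving 0, 1, or more than 1 vector at a given MOI (moi > 0)."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    return {
        "none": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        # among transduced cells, the share carrying exactly one TF
        "single_among_transduced": p1 / (1.0 - p0),
    }


for moi in (0.3, 1.0, 3.0):
    f = integration_fractions(moi)
    print(f"MOI {moi}: single among transduced = {f['single_among_transduced']:.2f}")
```

Under this approximation, a low MOI such as 0.3 gives roughly 86% single-TF cells among transduced cells, which suits one-TF-per-cell screens. Higher MOIs shift cells toward receiving TF combinations.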
An exemplary pooled screening platform comprises a single population of pluripotent cells, a means to overexpress one or more transcription factors in one or more cells in the population of cells, and a high-throughput means to identify target cells (e.g., microscopy, FACS, Flow-FISH, single cell RNA-seq, or reporter gene) and the overexpressed transcription factor introduced to generate the target cells (e.g., barcode). Each pluripotent cell in the pool may express a different transcription factor or combination of transcription factors.
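With a single cell RNA-seq readout, each cell's TF can be read from its expressed barcode. This is a minimal sketch, assuming per-cell barcode UMI counts are available. A cell is assigned to a TF when one barcode accounts for at least 80% of at least 3 barcode UMIs. Each TF is then scored by the fraction of its cells classified as target cells. The thresholds, function names, and the `is_target` classification (which could come from marker genes, a reporter, or Flow-FISH) are hypothetical.

```python
def assign_cells(barcode_umis, min_umis=3, min_fraction=0.8):
    """Assign each cell to the TF whose barcode dominates its barcode UMIs.

    barcode_umis: cell ID -> (TF barcode -> UMI count).
    Returns cell ID -> TF name, "multiple" (no dominant barcode), or None
    (too few barcode UMIs to make a call).
    """
    assignments = {}
    for cell, counts in barcode_umis.items():
        total = sum(counts.values())
        if total < min_umis:
            assignments[cell] = None
            continue
        tf, top = max(counts.items(), key=lambda kv: kv[1])
        assignments[cell] = tf if top / total >= min_fraction else "multiple"
    return assignments


def target_fraction_by_tf(assignments, is_target):
    """Fraction of singly assigned cells per TF that were classified as target cells."""
    tallies = {}
    for cell, tf in assignments.items():
        if tf is None or tf == "multiple":
            continue
        n_target, n_total = tallies.get(tf, (0, 0))
        tallies[tf] = (n_target + int(is_target[cell]), n_total + 1)
    return {tf: n_target / n_total for tf, (n_target, n_total) in tallies.items()}


# Hypothetical barcode UMI counts and target-cell calls.
umis = {
    "c1": {"RFX4": 10},
    "c2": {"RFX4": 9, "NFIB": 1},
    "c3": {"NFIB": 5, "PAX6": 5},
    "c4": {"PAX6": 1},
    "c5": {"NFIB": 8},
}
is_target = {"c1": True, "c2": False, "c3": True, "c4": True, "c5": True}
calls = assign_cells(umis)
fractions = target_fraction_by_tf(calls, is_target)
```

In a combinatorial screen, cells labeled "multiple" would be kept and analyzed by their barcode set rather than discarded.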
In certain embodiments, barcodes are used to identify the transcription factor or modulating agent for the transcription factor introduced to a cell or population of cells. In certain embodiments, stem cells differentiated into target cells are enriched (e.g., sorted) and the barcodes identified in the enriched cells indicate the transcription factors introduced. Thus, transcription factors may be identified by determining the enrichment of barcodes in cells differentiated into target cells compared to barcodes in the starting library.
The terms “nucleic acid barcode” and “barcode” refer to a short sequence of nucleotides (for example, DNA or RNA) that is used as an identifier for an associated molecule, such as a target molecule and/or target nucleic acid (e.g., transcription factor). A nucleic acid barcode can have a length of at least, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides and can be in single- or double-stranded form. In certain embodiments, the barcode is configured for amplification and subsequent sequencing. In certain embodiments, the barcode is expressed as a transcript (e.g., poly A tailed transcript) that can be identified using a method of RNA sequencing as described further herein. In certain embodiments, barcoding uses an error-correcting scheme (T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms (Wiley, New York, ed. 1, 2005)).
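One simple form of error correction matches each sequenced barcode to the closest barcode in a known whitelist by Hamming distance. The cited reference covers error-correcting codes generally. The sketch below is an assumed illustration, not the specific scheme used. If the whitelist's minimum pairwise distance is d, up to (d - 1) // 2 substitutions can be corrected unambiguously. A minimum distance of 3 therefore allows single-substitution correction. Insertions and deletions are not handled. The whitelist and function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(whitelist):
    """Smallest Hamming distance between any two barcodes in the whitelist."""
    return min(hamming(a, b) for a, b in combinations(whitelist, 2))


def correct_barcode(observed, whitelist, max_dist=1):
    """Return the single whitelist barcode within max_dist of `observed`, else None.

    With a minimum pairwise distance d, up to (d - 1) // 2 substitutions can be
    corrected unambiguously, so max_dist should not exceed that bound.
    """
    hits = [bc for bc in whitelist
            if len(bc) == len(observed) and hamming(observed, bc) <= max_dist]
    return hits[0] if len(hits) == 1 else None


whitelist = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT"]
```

Reads that return None (too distant or ambiguous) would typically be discarded rather than assigned to a TF.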
Pluripotent Cells/Stem Cells
Pluripotent cells may include any mammalian stem cell. As used herein, the term “stem cell” refers to a multipotent cell having the capacity to self-renew and to differentiate into multiple cell lineages. Mammalian stem cells may include, but are not limited to, embryonic stem cells of various types, such as murine embryonic stem cells, e.g., as described by Evans & Kaufman 1981 (Nature 292: 154-6) and Martin 1981 (PNAS 78: 7634-8); rat pluripotent stem cells, e.g., as described by Iannaccone et al. 1994 (Dev Biol 163: 288-292); hamster embryonic stem cells, e.g., as described by Doetschman et al. 1988 (Dev Biol 127: 224-227); rabbit embryonic stem cells, e.g., as described by Graves et al. 1993 (Mol Reprod Dev 36: 424-433); porcine pluripotent stem cells, e.g., as described by Notarianni et al. 1991 (J Reprod Fertil Suppl 43: 255-60) and Wheeler 1994 (Reprod Fertil Dev 6: 563-8); sheep embryonic stem cells, e.g., as described by Notarianni et al. 1991 (supra); bovine embryonic stem cells, e.g., as described by Roach et al. 2006 (Methods Enzymol 418: 21-37); human embryonic stem (hES) cells, e.g., as described by Thomson et al. 1998 (Science 282: 1145-1147); human embryonic germ (hEG) cells, e.g., as described by Shamblott et al. 1998 (PNAS 95: 13726); embryonic stem cells from other primates such as Rhesus stem cells, e.g., as described by Thomson et al. 1995 (PNAS 92: 7844-7848) or marmoset stem cells, e.g., as described by Thomson et al. 1996 (Biol Reprod 55: 254-259). In certain embodiments, the pluripotent cells may include, but are not limited to, lymphoid stem cells, myeloid stem cells, neural stem cells, skeletal muscle satellite cells, epithelial stem cells, endodermal and neuroectodermal stem cells, germ cells, extraembryonic and embryonic stem cells, mesenchymal stem cells, intestinal stem cells, embryonic stem cells, and induced pluripotent stem cells (iPSCs).
As noted, prototype “human ES cells” are described by Thomson et al. 1998 (supra) and in U.S. Pat. No. 6,200,806. The scope of the term covers pluripotent stem cells that are derived from a human embryo at the blastocyst stage, or before substantial differentiation of the cells into the three germ layers. ES cells, in particular hES cells, are typically derived from the inner cell mass of blastocysts or from whole blastocysts. Derivation of hES cell lines from the morula stage has been documented and ES cells so obtained can also be used in the invention (Strelchenko et al. 2004. Reproductive BioMedicine Online 9: 623-629). As noted, prototype “human EG cells” are described by Shamblott et al. 1998 (supra). Such cells may be derived, e.g., from gonadal ridges and mesenteries containing primordial germ cells from fetuses. In humans, the fetuses are typically 5-11 weeks post-fertilization.
In certain embodiments, mouse embryonic stem cells are used. In certain embodiments, mouse embryonic stem cells differentiated into a target cell may be transferred to a mouse to perform in vivo functional studies.
Human embryonic stem cells may include, but are not limited to the HUES66, HUES64, HUES3, HUES8, HUES53, HUES28, HUES49, HUES9, HUES48, HUES45, HUES1, HUES44, HUES6, H1, HUES62, HUES65, H7, HUES13, H9, and HUES63 cell lines. In certain embodiments, the stem cell is a human induced pluripotent stem cell (iPSC). In certain embodiments, the human iPSC is selected from the group consisting of 11a, PGP1, GM08330 (also known as GM8330-8), and Mito 210.
General techniques in cell culture and media use that are useful in the practice of this invention are known in the art (e.g., Large Scale Mammalian Cell Culture (Hu et al. 1997. Curr Opin Biotechnol 8: 148); Serum-free Media (K. Kitano. 1991. Biotechnology 17: 73); or Large Scale Mammalian Cell Culture (Curr Opin Biotechnol 2: 375, 1991)). The terms “culturing” or “cell culture” are common in the art and broadly refer to maintenance of cells and potentially expansion (proliferation, propagation) of cells in vitro. Typically, animal cells, such as mammalian cells, such as human cells, are cultured by exposing them to (i.e., contacting them with) a suitable cell culture medium in a vessel or container adequate for the purpose (e.g., a 96-, 24-, or 6-well plate, a T-25, T-75, T-150 or T-225 flask, or a cell factory), at art-known conditions conducive to in vitro cell culture, such as temperature of 37° C., 5% v/v CO2 and >95% humidity.
Methods related to culturing stem cells are also useful in the practice of this invention (see, e.g., “Teratocarcinomas and embryonic stem cells: A practical approach” (E. J. Robertson, ed., IRL Press Ltd. 1987); “Guide to Techniques in Mouse Development” (P. M. Wasserman et al. eds., Academic Press 1993); “Embryonic Stem Cells: Methods and Protocols” (Kursad Turksen, ed., Humana Press, Totowa N.J., 2001); “Embryonic Stem Cell Differentiation in vitro” (M. V. Wiles, Meth. Enzymol. 225: 900, 1993); and “Properties and uses of Embryonic Stem Cells: Prospects for Application to Human Biology and Gene Therapy” (P. D. Rathjen et al., 1993)). Differentiation of stem cells is reviewed, e.g., in Robertson. 1997. Meth Cell Biol 75: 173; Roach and McNeish. 2002. Methods Mol Biol 185: 1-16; and Pedersen. 1998. Reprod Fertil Dev 10: 31. For further elaboration of general techniques useful in the practice of this invention, the practitioner can refer to standard textbooks and reviews in cell biology, tissue culture, and embryology (see, e.g., Culture of Human Stem Cells (R. Ian Freshney, Glyn N. Stacey, Jonathan M. Auerbach, 2007); Protocols for Neural Cell Culture (Laurie C. Doering, 2009); Neural Stem Cell Assays (Navjot Kaur, Mohan C. Vemuri, 2015); Working with Stem Cells (Henning Ulrich, Priscilla Davidson Negraes, 2016); and Biomaterials as Stem Cell Niche (Krishnendu Roy, 2010)). In certain embodiments, stem cells are spontaneously differentiated or directed to differentiate (see, e.g., Amit and Itskovitz-Eldor, Derivation and spontaneous differentiation of human embryonic stem cells, J Anat. 2002 March; 200(3): 225-232). For further methods of cell culture solutions and systems, see International Patent Publication No. WO 2014/159356A1.
Induced Pluripotent Cells
In certain embodiments, iPSCs or iPSC cell lines are used to identify transcription factors for differentiation of target cells. iPSCs advantageously can be used to generate patient specific models and cell types. iPSCs are a type of pluripotent stem cell that can be generated directly from adult cells. Further, because embryonic stem cells can only be derived from embryos, it has so far not been feasible to create patient-matched embryonic stem cell lines.
Various strategies can be used to induce pluripotency, or increase potency, in cells (Takahashi, K., and Yamanaka, S., Cell 126, 663-676 (2006); Takahashi et al., Cell 131, 861-872 (2007); Yu et al., Science 318, 1917-1920 (2007); Zhou et al., Cell Stem Cell 4, 381-384 (2009); Kim et al., Cell Stem Cell 4, 472-476 (2009); Yamanaka et al., 2009; Saha, K., Jaenisch, R., Cell Stem Cell 5, 584-595 (2009)), and improve the efficiency of reprogramming (Shi et al., Cell Stem Cell 2, 525-528 (2008a); Shi et al., Cell Stem Cell 3, 568-574 (2008b); Huangfu et al., Nat Biotechnol 26, 795-797 (2008a); Huangfu et al., Nat Biotechnol 26, 1269-1275 (2008b); Silva et al., PLoS Biol 6, e253, doi: 10.1371/journal.pbio.0060253 (2008); Lyssiotis et al., PNAS 106, 8912-8917 (2009); Ichida et al., Cell Stem Cell 5, 491-503 (2009); Maherali, N., Hochedlinger, K., Curr Biol 19, 1718-1723 (2009b); Esteban et al., Cell Stem Cell 6, 71-79 (2010); and Feng et al., Cell Stem Cell 4, 301-312 (2009)).
Generally, techniques for reprogramming involve modulation of specific cellular pathways, either directly or indirectly, using polynucleotide-, polypeptide and/or small molecule-based approaches (see, e.g., International Patent Publication No. WO 2012/087965A2). The developmental potency of a cell may be increased, for example, by contacting a cell with one or more pluripotency factors. “Contacting”, as used herein, can involve culturing cells in the presence of a pluripotency factor (such as, for example, small molecules, proteins, peptides, etc.) or introducing pluripotency factors into the cell. Pluripotency factors can be introduced into cells by culturing the cells in the presence of the factor, including transcription factors such as proteins, under conditions that allow for introduction of the transcription factor into the cell. See, e.g., Zhou H et al., Cell Stem Cell. 2009 May 8; 4(5):381-4; International Patent Publication No. WO 2009/117439. Introduction into the cell may be facilitated, for example, using transient methods, e.g., protein transduction, microinjection, non-integrating gene delivery, mRNA transduction, etc., or any other suitable technique. In some embodiments, the transcription factors are introduced into the cells by expression from a recombinant vector that has been introduced into the cell, or by incubating the cells in the presence of exogenous transcription factor polypeptides such that the polypeptides enter the cell. In particular embodiments, the pluripotency factor is a transcription factor. 
Exemplary transcription factors that are associated with increasing, establishing, or maintaining the potency of a cell include, but are not limited to, Oct-3/4, Cdx-2, Gbx2, Gsh1, HesX1, HoxA10, HoxA11, HoxB1, Irx2, Isl1, Meis1, Meox2, Nanog, Nkx2.2, Onecut, Otx1, Otx2, Pax5, Pax6, Pdx1, Tcf1, Tcf2, Zfhx1b, Klf-4, Atbf1, Esrrb, Gcnf, Jarid2, Jmjd1a, Jmjd2c, Klf-3, Klf-5, Mel-18, Myst3, Nac1, REST, Rex-1, Rybp, Sall4, Sall1, Tif1, YY1, Zeb2, Zfp281, Zfp57, Zic3, Coup-Tf1, Coup-Tf2, Bmi1, Rnf2, Mta1, Pias1, Pias2, Pias3, Piasy, Sox2, Lef1, Sox15, Sox6, Tcf-7, Tcf7l1, c-Myc, L-Myc, N-Myc, Hand1, Mad1, Mad3, Mad4, Mxi1, Myf5, Neurog2, Ngn3, Olig2, Tcf3, Tcf4, Foxc1, Foxd3, BAF155, C/EBPβ, mafa, Eomes, Tbx-3, Rfx4, Stat3, Stella, and UTF-1. Exemplary transcription factors include Oct4, Sox2, Klf4, c-Myc, and Nanog.
Small molecule reprogramming agents are also pluripotency factors and may also be employed in the methods of the invention for inducing reprogramming and maintaining or increasing cell potency. In some embodiments of the invention, one or more small molecule reprogramming agents are used to induce pluripotency of a somatic cell, increase or maintain the potency of a cell, or improve the efficiency of reprogramming. In some embodiments, small molecule reprogramming agents are employed in the methods of the invention to improve the efficiency of reprogramming. Improvements in efficiency of reprogramming can be measured by (1) a decrease in the time required for reprogramming and generation of pluripotent cells (e.g., by shortening the time to generate pluripotent cells by at least a day compared to a similar or same process without the small molecule), or alternatively, or in combination, (2) an increase in the number of pluripotent cells generated by a particular process (e.g., increasing the number of cells reprogrammed in a given time period by at least 10%, 30%, 50%, 100%, 200%, 500%, etc. compared to a similar or same process without the small molecule). In some embodiments, a 2-fold to 20-fold improvement in reprogramming efficiency is observed. In some embodiments, reprogramming efficiency is improved by more than 20-fold. In some embodiments, a more than 100-fold improvement in efficiency is observed over the method without the small molecule reprogramming agent (e.g., a more than 100-fold increase in the number of pluripotent cells generated). Several classes of small molecule reprogramming agents may be important to increasing, establishing, and/or maintaining the potency of a cell.
Exemplary small molecule reprogramming agents include, but are not limited to: agents that inhibit H3K9 methylation or promote H3K9 demethylation; agents that inhibit H3K4 demethylation or promote H3K4 methylation; agents that inhibit histone deacetylation or promote histone acetylation; L-type Ca channel agonists; activators of the cAMP pathway; DNA methyltransferase (DNMT) inhibitors; nuclear receptor ligands; GSK3 inhibitors; MEK inhibitors; TGFβ receptor/ALK5 inhibitors; HDAC inhibitors; Erk inhibitors; ROCK inhibitors; FGFR inhibitors; and PARP inhibitors. Exemplary small molecule reprogramming agents include GSK3 inhibitors; MEK inhibitors; TGFβ receptor/ALK5 inhibitors; HDAC inhibitors; Erk inhibitors; and ROCK inhibitors.
In some embodiments of the invention, small molecule reprogramming agents are used to replace one or more transcription factors in the methods of the invention to induce pluripotency, improve the efficiency of reprogramming, and/or increase or maintain the potency of a cell. For example, in some embodiments, a cell is contacted with one or more small molecule reprogramming agents, wherein the agents are included in an amount sufficient to improve the efficiency of reprogramming. In other embodiments, one or more small molecule reprogramming agents are used in addition to transcription factors in the methods of the invention. In one embodiment, a cell is contacted with at least one pluripotency transcription factor and at least one small molecule reprogramming agent under conditions to increase, establish, and/or maintain the potency of the cell or improve the efficiency of the reprogramming process. In another embodiment, a cell is contacted with at least one pluripotency transcription factor and at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten small molecule reprogramming agents under conditions and for a time sufficient to increase, establish, and/or maintain the potency of the cell or improve the efficiency of reprogramming. The state of potency or differentiation of cells can be assessed by monitoring the pluripotency characteristics (e.g., expression of markers including, but not limited to SSEA-3, SSEA-4, TRA-1-60, TRA-1-81, TRA-2-49/6E, Oct-3/4, Sox2, Nanog, GDF3, REX1, FGF4, ESG1, DPPA2, DPPA4, and hTERT).
Introducing Transcription Factors
In certain embodiments, the screening platform may comprise an open reading frame (ORF) or cDNA encoding each transcription factor used in the screen (as used herein, the terms cDNA and ORF are used interchangeably). A cDNA may be synthesized and cloned into a vector. A plurality of cDNAs may be cloned into a library of vectors, such that each transcription factor is represented in the library. Representative transcription factor libraries are known in the art (see, e.g., Yang et al., 2011, A public genome-scale lentiviral expression library of human ORFs, Nature Methods 8, 659-66; and portals.broadinstitute.org/gpp/public/).
In certain embodiments, the screening platform may comprise an agent capable of overexpressing or modulating activity of endogenous transcription factors. In certain embodiments, the agent may be a CRISPR system. In certain embodiments, pluripotent cells are differentiated into target cells by introducing a CRISPR system targeting the endogenous loci encoding the transcription factors. In certain embodiments, the CRISPR system comprises a functional domain that is targeted to the endogenous loci encoding the transcription factors. The functional domain may be a transcriptional activator or repressor (see, e.g., Konermann et al. “Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex” Nature. 2014 Dec. 10. doi: 10.1038/nature14136; Qi, L. S., et al. (2013). “Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression”. Cell. 152 (5): 1173-83; and Gilbert, L. A., et al., (2013). “CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes”. Cell. 154 (2): 442-51). In certain embodiments, a functional domain is targeted to a genomic locus encoding a transcription factor using a guide sequence that includes one or more aptamer sequences. In particular embodiments, this is ensured by the use of adaptor protein/aptamer combinations that exist within the diversity of bacteriophage coat proteins. Examples of such coat proteins include but are not limited to: MS2, PP7, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s and PRR1. In particular embodiments, the aptamer is a minimal hairpin aptamer which selectively binds dimerized MS2 bacteriophage coat proteins in mammalian cells and is introduced into the guide molecule, such as in the stemloop and/or in a tetraloop. In these embodiments, the functional domain is fused to MS2 (see, e.g., Konermann et al., Nature 2015, 517(7536): 583-588).
In certain embodiments, the arrayed screening platform can utilize multiwell plates to introduce individual transcription factors or an agent capable of modulating said transcription factors to populations of pluripotent cells. As used throughout the specification, reference to introducing transcription factors can refer to overexpressing the transcription factor from a vector or introducing an agent capable of modulating said transcription factor (e.g., CRISPR system targeting the transcription factor). Thus, each well of the multiwell plate may be configured for overexpression of a single transcription factor or combination of multiple transcription factors.
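An arrayed layout can be sketched as a plate map that assigns one condition (a single TF or a TF combination) to each well. This sketch makes several assumptions. It assumes a 96-well (8 × 12) plate filled row by row and reserves the first wells for negative controls. The control label `EMPTY_VECTOR` is a hypothetical name, not a control described in the source.

```python
from string import ascii_uppercase


def plate_map(conditions, rows=8, cols=12, n_controls=2, control="EMPTY_VECTOR"):
    """Assign control wells first, then one condition per well, filling row by row.

    conditions: TF names, or tuples of TF names for combination wells.
    Returns well ID (e.g., "A1") -> condition.
    """
    wells = [f"{ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]
    layout = [control] * n_controls + list(conditions)
    if len(layout) > len(wells):
        raise ValueError(f"{len(layout)} conditions do not fit in {len(wells)} wells")
    return dict(zip(wells, layout))


plate = plate_map(["RFX4", "NFIB", "PAX6", "ASCL1", ("RFX4", "NFIB")])
```

Larger TF sets would be split across plates. In practice, replicate wells and randomized positions would help control for edge effects.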
In certain embodiments, transcription factors may be introduced into individual cells by nanowires (see, e.g., Shalek et al., Vertical silicon nanowires as a universal platform for delivering biomolecules into living cells, PNAS 107(5): 1870-1875, February 2010). This modality enables one to assess the phenotypic consequences of introducing a broad range of biological effectors (DNAs, RNAs, peptides, proteins, and small molecules) into almost any cell type. In certain embodiments, the nanowires may be configured on a microarray format. In certain embodiments, the microarray may be configured for overexpressing transcription factors in a site-specific fashion. In certain embodiments, the array may be coupled with live-cell imaging.
Vectors
In certain embodiments, vectors are used to overexpress or modulate expression of transcription factors. Vectors for introducing CRISPR systems are described further herein.
The term “vector” generally denotes a tool that allows or facilitates the transfer of an entity from one environment to another. More particularly, the term “vector” as used throughout this specification refers to nucleic acid molecules to which nucleic acid fragments (cDNA) may be inserted and cloned, i.e., propagated. Hence, a vector is typically a replicon, into which another nucleic acid segment may be inserted, such as to bring about the replication of the inserted segment in a defined host cell or vehicle organism.
A vector thus typically contains an origin of replication and other entities necessary for replication and/or maintenance in a host cell. A vector may typically contain one or more unique restriction sites allowing for insertion of nucleic acid fragments. A vector may also preferably contain a selection marker, such as, e.g., an antibiotic resistance gene or auxotrophic gene (e.g., URA3, which encodes an enzyme necessary for uracil biosynthesis or TRP1, which encodes an enzyme required for tryptophan biosynthesis), to allow selection of recipient cells that contain the vector. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.
Expression vectors are generally configured to allow for and/or effect the expression of nucleic acids (e.g., cDNA, CRISPR system) introduced thereto in a desired expression system, e.g., in vitro, in a host cell, host organ and/or host organism. For example, the vector can express nucleic acids functionally or operatively linked to regulatory element(s), such that the regulatory element(s) drive expression. The promoter(s) can be constitutive promoter(s) and/or conditional promoter(s) and/or inducible promoter(s). In certain embodiments, the vectors comprise regulatory sequences for inducible expression of cDNAs encoding transcription factors. Thus, expression of the transcription factors in cells can be induced at particular time points after introducing the vectors. Inducible expression systems are known in the art and may include, for example, Tet-On/Tet-Off systems (see, e.g., Gossen et al., Transcriptional activation by tetracyclines in mammalian cells. Science. 1995 Jun. 23; 268(5218):1766-9).
In certain example embodiments, the vectors disclosed herein may further encode an epitope tag in frame with the transcription factors, for use in downstream assessment of TF protein expression and of TF abundance in cell populations. Epitope tags provide high sensitivity and specificity in detection by specific antigen binding molecules (e.g., antibodies, aptamers). Exemplary epitope tags include, but are not limited to, Flag, CBP, GST, HA, HBH, MBP, Myc, polyHis, S-tag, SUMO, TAP, TRX, or V5.
Vectors may include, without limitation, plasmids (which refer to circular double stranded DNA loops which, in their vector form are not bound to the chromosome), episomes, phagemids, bacteriophages, bacteriophage-derived vectors, bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), P1-derived artificial chromosomes (PAC), transposons, cosmids, linear nucleic acids, viral vectors, etc., as appropriate. A vector can be a DNA or RNA vector. A vector can be a self-replicating extrachromosomal vector or a vector which integrates into a host genome, hence, vectors can be autonomous or integrative.
The term “viral vectors” refers to the use of viruses, or virus-associated vectors, as carriers of the nucleic acid construct into the cell. Constructs may be integrated and packaged into non-replicating, defective viral genomes like adenovirus, adeno-associated virus (AAV), or herpes simplex virus (HSV) or others, including retroviral and lentiviral vectors, for infection or transduction into cells. The vector may or may not be incorporated into the cell's genome. The constructs may include viral sequences for transfection, if desired. Alternatively, the construct may be incorporated into vectors capable of episomal replication, e.g., EPV and EBV vectors.
Methods for introducing nucleic acids, including vectors, expression cassettes and expression vectors, into cells (e.g., transfection, transduction or transformation) are known to the person skilled in the art, and may include calcium phosphate co-precipitation, electroporation, micro-injection, protoplast fusion, lipofection, exosome-mediated transfection, transfection employing polyamine transfection reagents, bombardment of cells by nucleic acid-coated tungsten microprojectiles, viral particle delivery, etc.
Identification of Target Cells
In certain embodiments, differentiation of pluripotent cells is monitored. In certain embodiments, differentiation of pluripotent cells is monitored by microscopy. The screening method may further be combined with live cell imaging to monitor differentiation upon overexpression of transcription factors. The screening method may also be combined with FACS or ELISA assays to identify cells expressing markers specific for differentiated cell types. Additionally, methods of detecting target cell specific markers may include detecting reporter genes linked to marker genes, FISH, Flow-FISH, RNA sequencing, single cell RNA sequencing, quantitative RT-PCR, or western blot. In preferred embodiments, a pooled screen uses three different selection methods to enrich for cells that express one or more marker genes that define the target cell type: a reporter assay, Flow-FISH, and scRNA-seq. In preferred embodiments, each transcription factor is associated with a unique barcode sequence that can be detected using sequencing.
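Because each transcription factor carries a unique barcode, reading out a pooled screen reduces to counting barcodes in sequencing reads. The sketch below assumes, for illustration only, that the barcode sits at a fixed position at the start of each read and is matched exactly; the barcode sequences and the `count_tf_barcodes` helper are hypothetical.

```python
from collections import Counter

# Hypothetical barcode-to-TF map; in practice this comes from the library design.
BARCODE_TO_TF = {
    "ACGTACGTAC": "TF_A",
    "TTGGCCAATT": "TF_B",
    "GATCGATCGA": "TF_C",
}


def count_tf_barcodes(reads, barcode_to_tf, start=0, length=10):
    """Count reads per TF by exact match of a fixed-position barcode.

    Assumes the barcode occupies read[start:start + length]; reads whose
    barcode is not in the map are tallied under 'unassigned'.
    """
    counts = Counter()
    for read in reads:
        barcode = read[start:start + length]
        counts[barcode_to_tf.get(barcode, "unassigned")] += 1
    return counts


reads = [
    "ACGTACGTACGGGTTT",
    "TTGGCCAATTCCCAAA",
    "ACGTACGTACAAACCC",
    "NNNNNNNNNNAAAAAA",
]
tf_counts = count_tf_barcodes(reads, BARCODE_TO_TF)
print(tf_counts)
```

A real pipeline would typically tolerate sequencing errors (e.g., by allowing a small Hamming distance to the designed barcodes); exact matching is used here only to keep the sketch short.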
Reporter Genes
In certain embodiments, differentiated target cells can be identified and enriched from a pool of cells using a detectable marker, which provides a high-throughput means of identifying target cells. In certain embodiments, the pooled screening platform uses detectable markers associated with marker genes specific to target cells to identify transcription factors.
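Enriching reporter-positive cells usually means setting a fluorescence gate. One common convention, assumed here rather than taken from the screens described above, is to place the gate at a high percentile of a negative control population (e.g., cells transduced with a control vector) and keep sample cells above it. The function names and the 99th-percentile default are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (q in [0, 100]) of a list of numbers."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty input")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def gate_reporter_positive(sample, negative_control, q=99.0):
    """Return (threshold, indices of sample cells above the threshold).

    The threshold is the q-th percentile of reporter signal in the negative
    control, so roughly (100 - q)% of control cells would pass the gate.
    """
    threshold = percentile(negative_control, q)
    return threshold, [i for i, x in enumerate(sample) if x > threshold]


negative = list(range(1, 11))  # hypothetical control fluorescence values
thr, positive_idx = gate_reporter_positive([0.5, 9.5, 12.0, 9.0], negative, q=90)
print(thr, positive_idx)
```

The sorted cells would then be carried forward for TF barcode sequencing.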
In certain embodiments, the detectable marker is integrated into a genomic locus in the pool of cells such that the detectable marker is under control of the regulatory sequences for a target cell specific marker gene. In other words, a polynucleotide sequence encoding a detectable marker is integrated into a genomic locus encoding a marker gene, such that the marker gene and detectable marker are under control of the regulatory sequences for the marker gene and, upon activation of the marker gene, the detectable marker is co-expressed. In certain embodiments, the marker gene and detectable marker are expressed as separate proteins to prevent the detectable marker from interfering with proper folding and function of the marker gene product. Thus, the detectable marker can be used to monitor activation of the marker gene to indicate differentiation into a target cell type. Thus, the present invention also provides for a population of pluripotent cells comprising a detectable marker integrated into an endogenous marker gene specific for a target cell.
Integration of the detectable marker gene at a genomic locus can be performed using known methods in the art. In certain embodiments, a donor construct is used to integrate a polynucleotide sequence encoding the detectable marker. In certain embodiments, the donor construct may comprise a nucleotide sequence encoding: a detectable marker, and optionally, a resistance gene operably linked to a separate regulatory sequence. Cells having the donor construct integrated can be selected based on fluorescence of the detectable marker. Cells having the donor construct integrated can be selected based on selection of cells expressing the resistance gene. The cells can be further selected by determining the integration site of the donor construct.
Selectable markers are known in the art and enable screening for targeted integrations. Examples of selectable markers include, but are not limited to, antibiotic resistance genes, such as beta-lactamase, neo, FabI, URA3, cam, tet, blasticidin, hyg, puromycin and the like. A selectable marker useful in accordance with the invention may be any selectable marker appropriate for use in a eukaryotic cell, such as a mammalian cell, or more specifically a human cell. One of skill in the art will understand and be able to identify and use selectable markers in accordance with the invention.
In certain embodiments, the donor construct is a plasmid, vector, PCR product, or synthesized polynucleotide sequence. In certain embodiments, the donor construct is modified to increase stability or to increase efficiency of integration into a genomic locus. In certain embodiments, the donor construct is modified by a 5′ and/or 3′ phosphorylation modification. In certain embodiments, the donor construct is modified by one or more internal or terminal phosphorothioate (PTO) modifications. PTO modifications are used to generate nuclease-resistant oligonucleotides: in PTO oligonucleotides, a non-bridging oxygen of the phosphate backbone is replaced by a sulfur atom, and PTOs are therefore also known as “S-oligos”. Phosphorothioate bonds can be introduced at the 5′- or 3′-end of an oligonucleotide to inhibit exonuclease degradation, and internally to limit attack by endonucleases. In certain embodiments, the donor construct is obtained using PCR amplification and the 5′ phosphorylation is introduced using 5′ phosphorylated primers.
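When ordering 5′ phosphorylated, PTO-protected primers for donor amplification, modifications are commonly written in an inline notation. The sketch below assumes the IDT-style convention (`/5Phos/` for a 5′ phosphate, `*` after a base for a phosphorothioate bond to the next base); other vendors use different syntax, and the `annotate_oligo` helper and its defaults (two PTO bonds at each end) are hypothetical.

```python
def annotate_oligo(seq, n_pto_5=2, n_pto_3=2, phos_5=True):
    """Return an IDT-style order string for a DNA oligo.

    '*' after a base marks a phosphorothioate bond to the following base;
    n_pto_5 / n_pto_3 set how many terminal internucleotide linkages at each
    end are PTO. '/5Phos/' marks a 5' phosphate.
    """
    seq = seq.upper()
    n_links = len(seq) - 1  # linkage i joins base i and base i + 1
    if n_pto_5 + n_pto_3 > n_links:
        raise ValueError("more PTO bonds requested than linkages exist")
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        if i < n_links and (i < n_pto_5 or i >= n_links - n_pto_3):
            out.append("*")
    prefix = "/5Phos/" if phos_5 else ""
    return prefix + "".join(out)


print(annotate_oligo("acgtacgt"))
```

Protecting only a few terminal linkages is a common compromise, since exonucleases act from the ends while extensive PTO content can affect hybridization.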
In certain embodiments, a genetic modifying agent is used to target the donor construct sequence to the correct genomic location (e.g., CRISPR, TALEN, Zinc finger protein, meganuclease).
In certain embodiments, a method of tagging genes in cells uses a donor template having homology arms that can be integrated at a target locus in the genome of a cell using homology-dependent repair mechanisms. In certain embodiments, a method of tagging genes in cells uses a generic donor template that can be integrated at any target locus in the genome of a cell using homology-independent repair mechanisms. In certain embodiments, gene tagging uses a CRISPR system. In certain embodiments, gene tagging uses a system that alleviates the need for homology templates. Previous reports using zinc-finger nucleases, TALE effector nucleases or CRISPR-Cas9 technology have shown that plasmids containing an endonuclease cleavage site can be integrated in a homology-independent manner, and any of these methods may be used for constructing the tagged pluripotent population of cells of the present invention (see, e.g., Lackner, D. H. et al. A generic strategy for CRISPR-Cas9-mediated gene tagging. Nat. Commun. 6:10237 doi: 10.1038/ncomms10237 (2015); Auer, et al., Highly efficient CRISPR/Cas9-mediated knock-in in zebrafish by homology-independent DNA repair. Genome Res. 24, 142-153 (2014); Maresca, et al., Obligate ligation-gated recombination (ObLiGaRe): custom-designed nuclease-mediated targeted integration through nonhomologous end joining. Genome Res. 23, 539-546 (2013); and Cristea, S. et al., In vivo cleavage of transgene donors promotes nuclease-mediated targeted integration. Biotechnol. Bioeng. 110, 871-880 (2013)).
In certain embodiments, cells are tagged by introducing a ribonucleoprotein complex (RNP) comprising a donor sequence, guide sequences targeting a genomic locus and a CRISPR system. Delivery of CRISPR RNP complexes is described further herein. For example, the RNP complexes may be delivered to a population of cells by transfection.
In certain embodiments, the detectable marker is integrated downstream of the marker gene. In certain embodiments, the detectable marker is integrated upstream of the marker gene.
In certain embodiments, the detectable marker is separated from the marker gene by a ribosomal skipping site. Ribosomal ‘skipping’ refers to generating more than one protein during translation where a specific sequence in the nascent peptide chain prevents the ribosome from creating the peptide bond with the next proline. Translation continues and gives rise to a second chain. This mechanism results in apparent co-translational cleavage of the polyprotein. This process is induced by a ‘2A-like’, or CHYSEL (cis-acting hydrolase element) sequence. In other words, a normal peptide bond is impaired at the site, resulting in two discontinuous protein fragments from one translation event.
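The effect of a 2A site on the translated products can be modeled as a split of the polyprotein between the glycine and the final proline of the conserved C-terminal NPGP motif. The sketch below uses the P2A peptide sequence (ATNFSLLKQAGDVEENPGP) as an example, with hypothetical placeholder sequences for the marker gene and reporter, and assumes idealized 100% skipping efficiency, which real 2A sites do not achieve.

```python
P2A = "ATNFSLLKQAGDVEENPGP"  # porcine teschovirus-1 2A peptide


def skip_translate(polyprotein, motif="NPGP"):
    """Split a polyprotein at each 2A site, between the G and P of N-P-G-P.

    Returns the separate chains that ribosomal skipping would release,
    assuming every 2A site is skipped.
    """
    chains = []
    start = 0
    i = polyprotein.find(motif)
    while i != -1:
        cut = i + 3  # just after the glycine; the proline starts the next chain
        chains.append(polyprotein[start:cut])
        start = cut
        i = polyprotein.find(motif, cut)
    chains.append(polyprotein[start:])
    return chains


# Hypothetical marker-gene C terminus, GSG linker, P2A, and reporter.
poly = "MARKERSEQ" + "GSG" + P2A + "MGFPSEQ"
marker_chain, reporter_chain = skip_translate(poly)
print(marker_chain)
print(reporter_chain)
```

Note that the upstream product (here, the marker gene product) retains most of the 2A peptide at its C terminus, while the downstream product gains a single N-terminal proline.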
In certain embodiments, the detectable marker is a fluorescent protein such as green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), red fluorescent protein (RFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), miRFP (e.g., miRFP670, see, Shcherbakova, et al., Nat Commun. 2016; 7: 12405), mCherry, tdTomato, DsRed-Monomer, DsRed-Express, DsRed-Express2, DsRed2, AsRed2, mStrawberry, mPlum, mRaspberry, HcRed1, E2-Crimson, mOrange, mOrange2, mBanana, ZsYellow1, TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOk, mKO2, mTangerine, mApple, mRuby, mRuby2, HcRed-Tandem, mKate2, mNeptune, NiFP, mKeima Red, LSS-mKate1, LSS-mKate2, mBeRFP, PA-GFP, PAmCherry1, PATagRFP, TagRFP6457, IFP1.2, iRFP, Kaede (green), Kaede (red), KikGR1 (green), KikGR1 (red), PS-CFP2, mEos2 (green), mEos2 (red), mEos3.2 (green), mEos3.2 (red), PSmOrange, Dronpa, Dendra2, Timer, AmCyan1, or a combination thereof. In certain embodiments, the detectable marker is a cell surface marker. In other instances, the cell surface marker is a marker not normally expressed on the cells, such as a truncated nerve growth factor receptor (tNGFR), a truncated epidermal growth factor receptor (tEGFR), CD8, truncated CD8, CD19, truncated CD19, a variant thereof, a fragment thereof, a derivative thereof, or a combination thereof.
In certain embodiments, the signal of the detectable marker may be enhanced by using a fluorescently labeled antibody, antibody fragment, nanobody, or aptamer. The binding agent may be specific to the detectable marker.
Flow-FISH
In certain embodiments, Flow-FISH (flow cytometry combined with fluorescent in situ hybridization) is used to identify target cells in transcription factor screens. Flow-FISH is a cytogenetic technique that quantifies the copy number of RNA or of specific repetitive elements in genomic DNA across whole cell populations by combining flow cytometry with cytogenetic fluorescent in situ hybridization staining protocols (see, e.g., C. P. Fulco et al., Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet 51, 1664-1669 (2019); and Coillard A, Segura E. Visualization of RNA at the Single Cell Level by Fluorescent in situ Hybridization Coupled to Flow Cytometry. Bio Protoc. 2018; 8(12):e2892). The method provides for detecting marker genes indicative of differentiation into target cells using gene-specific FISH probes, and for sorting the cells. In certain embodiments, multiple markers are used to increase specificity: depending on the target cell type, a single marker gene may not be specific enough, and selecting for multiple reporter genes at the same time narrows the selection to the target cell type. Additionally, the assay is versatile in that reporter genes can be added or changed by applying different probes. Flow-FISH combines FISH, to fluorescently label the mRNA of reporter genes, with flow cytometry (see, e.g., Arrigucci et al., FISH-Flow, a protocol for the concurrent detection of mRNA and protein in single cells using fluorescence in situ hybridization and flow cytometry, Nat Protoc. 2017 June; 12(6):1245-1260. doi:10.1038/nprot.2017.039). In certain embodiments, the mRNA of reporter genes is fluorescently labeled; target cells are selected by flow cytometry; and TF barcodes are sequenced (e.g., amplified and then sequenced) to identify TFs enriched in the target cells. In certain embodiments, the marker genes are selected such that they are specifically expressed only in the target cell. In this way, false positive selection or background is avoided. In certain embodiments, the assay is optimized to remove background fluorescence and to select for true positive cells.
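Identifying "TFs enriched in the target cells" after sorting amounts to comparing each TF barcode's frequency in the sorted population with its frequency before sorting. A minimal sketch, assuming a log2 ratio of pseudocount-smoothed frequencies as the enrichment score (one common choice, not necessarily the scoring used in the screens described here); the counts and the `barcode_enrichment` name are hypothetical.

```python
import math


def barcode_enrichment(sorted_counts, presort_counts, pseudocount=1.0):
    """log2 ratio of each TF's barcode frequency in sorted vs. presort cells.

    A pseudocount is added to every TF in both populations so that barcodes
    absent from one population still receive a finite score. Returns a dict
    ordered from most to least enriched.
    """
    tfs = set(sorted_counts) | set(presort_counts)
    n_sorted = sum(sorted_counts.values()) + pseudocount * len(tfs)
    n_presort = sum(presort_counts.values()) + pseudocount * len(tfs)
    scores = {}
    for tf in tfs:
        f_sorted = (sorted_counts.get(tf, 0) + pseudocount) / n_sorted
        f_presort = (presort_counts.get(tf, 0) + pseudocount) / n_presort
        scores[tf] = math.log2(f_sorted / f_presort)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


presort = {"TF_A": 9, "TF_B": 9, "TF_C": 9}
sorted_pos = {"TF_A": 29, "TF_B": 9, "TF_C": 1}
print(barcode_enrichment(sorted_pos, presort))
```

In practice, replicate sorts and a statistical test across replicates would be needed before calling a TF a hit.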
Single Cell RNA-seq
In certain embodiments, the invention provides for identifying transcription factors whose overexpression can differentiate stem cells or progenitor cells into target cells by using single cell sequencing methods. In certain embodiments, transcription factors are introduced into a population of cells and single cells are analyzed by single cell sequencing. The population of cells may be analyzed with or without an integrated detectable marker. The introduced transcription factors can be identified in cells having a gene signature or biological program of interest (e.g., a signature characteristic of the target cell). As used herein, a “signature” may encompass any gene or genes, protein or proteins, or epigenetic element(s) whose expression profile or whose occurrence is associated with a specific cell type, subtype, or cell state of a specific cell type or subtype within a population of cells. A gene signature, as used herein, may thus refer to any set of up- and down-regulated genes that are representative of a cell type, subtype, or cell state. In certain embodiments, transcription factors are introduced at a high multiplicity of infection (MOI), so that individual cells receive multiple transcription factors, to identify combinations of transcription factors capable of inducing a signature or biological program characteristic of the target cell of interest.
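A gene signature of up- and down-regulated genes can be turned into a per-cell score. One simple scoring rule, assumed here for illustration, is the mean normalized expression of the up-regulated genes minus the mean of the down-regulated genes. The example uses GABAergic markers (GAD1, GAD2, SLC32A1) against the pluripotency gene POU5F1 purely as an illustrative signature, with hypothetical expression values.

```python
def signature_score(expr, up_genes, down_genes=()):
    """Mean expression of up-regulated signature genes minus that of down-regulated ones.

    expr maps gene -> normalized expression in one cell; genes absent from
    expr are treated as 0. An empty gene set contributes 0.
    """
    def mean(genes):
        genes = list(genes)
        return sum(expr.get(g, 0.0) for g in genes) / len(genes) if genes else 0.0

    return mean(up_genes) - mean(down_genes)


UP = ["GAD1", "GAD2", "SLC32A1"]  # illustrative GABAergic genes
DOWN = ["POU5F1"]                 # illustrative pluripotency gene

neuron_like = {"GAD1": 3.0, "GAD2": 2.0, "SLC32A1": 1.0, "POU5F1": 0.5}
stem_like = {"POU5F1": 4.0}
print(signature_score(neuron_like, UP, DOWN), signature_score(stem_like, UP, DOWN))
```

Established tools usually also subtract a background score from expression-matched control genes; that refinement is omitted here.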
The transcription factors introduced may be identified by a barcode associated with each transcription factor. The barcode may be expressed on a transcript capable of identification by RNA-seq (e.g., a poly-A tailed transcript including the barcode sequence). In certain embodiments, single cells can be analyzed for a target cell phenotype or target cell subtypes after introducing transcription factors identified by the screening methods described herein. Thus, single cell sequencing may be used for identification of transcription factors and for analysis of cells differentiated by overexpressing transcription factors.
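When TF barcodes are captured alongside the transcriptome, each cell has a small table of barcode UMI counts from which the TFs it received must be called, which matters most at high MOI, where cells carry TF combinations. A minimal sketch, assuming a TF is called present when it has both a minimum UMI count and a minimum fraction of that cell's TF-barcode UMIs; both thresholds and the `assign_tfs` name are hypothetical and would need tuning against ambient-barcode background.

```python
def assign_tfs(umi_counts, min_umis=3, min_fraction=0.1):
    """Call the TFs present in each cell from TF-barcode UMI counts.

    umi_counts: {cell_id: {tf: n_umis}}. A TF is called present if it has
    at least min_umis UMIs and at least min_fraction of the cell's total
    TF-barcode UMIs. Returns {cell_id: sorted list of TFs}.
    """
    calls = {}
    for cell, counts in umi_counts.items():
        total = sum(counts.values())
        calls[cell] = sorted(
            tf for tf, n in counts.items()
            if total > 0 and n >= min_umis and n / total >= min_fraction
        )
    return calls


data = {
    "cell1": {"TF_A": 20, "TF_B": 1},  # TF_B likely ambient background
    "cell2": {"TF_A": 5, "TF_C": 6},   # a TF combination
    "cell3": {},                       # no barcode captured
}
print(assign_tfs(data))
```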
In certain embodiments, the invention involves single cell RNA sequencing (see, e.g., Kalisky, T., Blainey, P. & Quake, S. R. Genomic Analysis at the Single-Cell Level. Annual review of genetics 45, 431-445, (2011); Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314 (2011); Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, (2011); Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nature Protocols 5, 516-535, (2010); Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods 6, 377-382, (2009); Ramskold, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nature Biotechnology 30, 777-782, (2012); and Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports, Volume 2, Issue 3, p 666-673, 2012).
In certain embodiments, the invention involves plate based single cell RNA sequencing (see, e.g., Picelli, S. et al., 2014, “Full-length RNA-seq from single cells using Smart-seq2” Nature protocols 9, 171-181, doi:10.1038/nprot.2014.006).
In certain embodiments, the invention involves high-throughput single-cell RNA-seq. In this regard, reference is made to Macosko et al., 2015, “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets” Cell 161, 1202-1214; International Patent Application No. PCT/US2015/049178, published as International Patent Publication No. WO 2016/040476 on Mar. 17, 2016; Klein et al., 2015, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells” Cell 161, 1187-1201; International Patent Application No. PCT/US2016/027734, published as International Patent Publication No. WO 2016/168584A1 on Oct. 20, 2016; Zheng, et al., 2016, “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotechnology 34, 303-311; Zheng, et al., 2017, “Massively parallel digital transcriptional profiling of single cells” Nat. Commun. 8, 14049 doi: 10.1038/ncomms14049; International Patent Publication No. WO 2014/210353A2; Zilionis, et al., 2017, “Single-cell barcoding and sequencing using droplet microfluidics” Nat Protoc. January; 12(1):44-73; Cao et al., 2017, “Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/104844; Rosenberg et al., 2017, “Scaling single cell transcriptomics through split pool barcoding” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/105163; Rosenberg et al., “Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding” Science 15 Mar. 2018; Vitak, et al., “Sequencing thousands of single-cell genomes with combinatorial indexing” Nature Methods, 14(3):302-308, 2017; Cao, et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science, 357(6352):661-667, 2017; Gierahn et al., “Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput” Nature Methods 14, 395-398 (2017); and Hughes, et al., “Highly Efficient, Massively-Parallel Single-Cell RNA-Seq Reveals Cellular States and Molecular Features of Human Skin Pathology” bioRxiv 689273; doi: doi.org/10.1101/689273, all the contents and disclosure of each of which are herein incorporated by reference in their entirety.
In certain embodiments, the invention involves single nucleus RNA sequencing. In this regard reference is made to Swiech et al., 2014, “In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9” Nature Biotechnology Vol. 33, pp. 102-106; Habib et al., 2016, “Div-Seq: Single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons” Science, Vol. 353, Issue 6302, pp. 925-928; Habib et al., 2017, “Massively parallel single-nucleus RNA-seq with DroNc-seq” Nat Methods. 2017 October; 14(10):955-958; International Patent Application No. PCT/US2016/059239, published as WO 2017/164936 on Sep. 28, 2017; Patent Application No. PCT/US2018/060860, published as WO 2019/094984 on May 16, 2019; Patent Application No. PCT/US2019/055894, published as WO 2020/077236 on Apr. 16, 2020; and Drokhlyansky, et al., “The enteric nervous system of the human and mouse colon at a single-cell resolution,” bioRxiv 746743; doi: doi.org/10.1101/746743, which are herein incorporated by reference in their entirety.
In certain embodiments, the invention involves the Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq) (see, e.g., Buenrostro, et al., Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods 2013; 10 (12): 1213-1218; Buenrostro et al., Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490 (2015); Cusanovich, D. A., Daza, R., Adey, A., Pliner, H., Christiansen, L., Gunderson, K. L., Steemers, F. J., Trapnell, C. & Shendure, J. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science. 2015 May 22; 348(6237):910-4. doi: 10.1126/science.aab1601. Epub 2015 May 7; US20160208323A1; US20160060691A1; and WO2017156336A1).
In certain embodiments, the invention involves single cell multimodal data (for a review of single-cell multiomic methods, see, e.g., Lee J, Hyeon D Y, Hwang D. Single-cell multiomics: technologies and data analysis methods. Exp Mol Med. 2020; 52(9):1428-1442. doi:10.1038/s12276-020-0420-2). In certain embodiments, SHARE-Seq (Ma, S. et al. Chromatin potential identified by shared single cell profiling of RNA and chromatin. bioRxiv 2020.06.17.156943 (2020) doi:10.1101/2020.06.17.156943) is used to generate single cell RNA-seq and chromatin accessibility data. In certain embodiments, CITE-seq (Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865-868 (2017)) is used to generate single cell RNA-seq data together with measurements of cellular proteins. In certain embodiments, Patch-seq (Cadwell, C. R. et al. Electrophysiological, transcriptomic and morphologic profiling of single neurons using Patch-seq. Nat. Biotechnol. 34, 199-203 (2016)) is used to obtain single cell RNA-seq data together with patch-clamp electrophysiological recordings and morphological analysis of single neurons (see, e.g., van den Hurk, et al., Patch-Seq Protocol to Analyze the Electrophysiology, Morphology and Transcriptome of Whole Single Neurons Derived From Human Pluripotent Stem Cells, Front Mol Neurosci. 2018; 11: 261).
Transcription Factor Modules
In example embodiments, the invention provides for identifying transcription factors whose overexpression can differentiate stem cells or progenitor cells into target cells by using single cell sequencing methods. In example embodiments, selecting cells further comprises grouping one or more of the transcription factors into modules that alter expression of the same gene programs, such that transcription factors in the same module are co-functional (i.e., function in similar pathways or have similar functions). As used herein, the term “gene program” or “program” can be used interchangeably with “biological program”, “expression program”, “transcriptional program”, or “expression profile”, and may refer to a set of genes that share a role in a biological function (e.g., an activation program, cell differentiation program, proliferation program). Biological programs can include a pattern of gene expression that results in a corresponding physiological event or phenotypic trait. Biological programs can include up to several hundred genes that are expressed in a spatially and temporally controlled fashion. Expression of individual genes can be shared between biological programs. Expression of individual genes can be shared among different single cell types; however, expression of a biological program may be cell type specific or temporally specific (e.g., the biological program is expressed in a cell type at a specific time). Multiple biological programs may include the same gene, reflecting the gene's roles in different processes. Expression of a biological program may be regulated by a master switch, such as a transcription factor or chromatin modifier. As used herein, the term “topic” refers to a biological program. The biological program can be modeled as a distribution over expressed genes.
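Grouping TFs into co-functional modules can be sketched as clustering TFs by how they shift gene program activity. The sketch assumes each TF already has a hypothetical vector of effects on each program (e.g., mean program usage in TF cells minus that in control cells), and links two TFs when the cosine similarity of their effect vectors meets a threshold; single-linkage grouping via union-find is one simple choice among many, and the threshold is arbitrary.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def tf_modules(effects, threshold=0.8):
    """Group TFs whose program-effect vectors have cosine similarity >= threshold.

    effects: {tf: [effect on program 1, program 2, ...]}. Returns a sorted
    list of modules, each a sorted list of TF names (single linkage).
    """
    tfs = sorted(effects)
    parent = {tf: tf for tf in tfs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(tfs):
        for b in tfs[i + 1:]:
            if cosine(effects[a], effects[b]) >= threshold:
                parent[find(a)] = find(b)

    modules = {}
    for tf in tfs:
        modules.setdefault(find(tf), []).append(tf)
    return sorted(modules.values())


effects = {  # hypothetical effects on three programs
    "TF_A": [2.0, 0.1, 0.0],
    "TF_B": [1.8, 0.0, 0.1],
    "TF_C": [0.0, 1.5, 0.2],
    "TF_D": [-0.1, 1.2, 0.0],
    "TF_E": [0.0, 0.0, 1.0],
}
print(tf_modules(effects))
```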
One method to identify cell programs is non-negative matrix factorization (NMF) (see, e.g., Lee D D and Seung H S, Learning the parts of objects by non-negative matrix factorization, Nature. 1999 Oct. 21; 401(6755):788-91). As an alternative, a generative model based on latent Dirichlet allocation (LDA) (Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. J Mach Learn Res 3, 993-1022), or “topic modeling” may be created. Topic modeling is a statistical data mining approach for discovering the abstract topics that explain the words occurring in a collection of text documents. Originally developed to discover key semantic topics reflected by the words used in a corpus of documents (Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407), topic modeling can be used to explore gene programs (“topics”) in each cell (“document”) based on the distribution of genes (“words”) expressed in the cell. A gene can belong to multiple programs, and its relative relevance in the topic is reflected by a weight. A cell is then represented as a weighted mixture of topics, where the weights reflect the importance of the corresponding gene program in the cell. Topic modeling using LDA has recently been applied to scRNA-seq data (see, e.g., Bielecki, Riesenfeld, Kowalczyk, et al., 2018 Skin inflammation driven by differentiation of quiescent tissue-resident ILCs into a spectrum of pathogenic effectors. bioRxiv 461228; and duVerle, D. A., Yotsukura, S., Nomura, S., Aburatani, H., and Tsuda, K. (2016). CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data. BMC Bioinformatics 17, 363). Other approaches include word embeddings. Identifying cell programs can recover cell states and bridge differences between cells. 
Single cell types may span a range of continuous cell states (see, e.g., Shekhar et al., Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics Cell. 2016 Aug. 25; 166(5):1308-1323.e30; and Bielecki, et al., 2018).
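The NMF approach cited above factors a non-negative cells-by-genes matrix V into W (each cell's usage of each program) and H (each program's gene weights), so that a cell is represented as a weighted mixture of programs. A minimal sketch using the Lee and Seung multiplicative updates for squared error, in pure Python on a tiny hypothetical matrix; real analyses would use an optimized library, normalized counts, and a principled choice of the number of programs k.

```python
import random


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative cells x genes matrix V ~ W @ H.

    Lee & Seung multiplicative updates for squared error. W (cells x k) is
    each cell's program usage; H (k x genes) is each program's gene weights.
    eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


# Hypothetical data: two programs (genes 1-2 and genes 3-4); the last cell mixes both.
V = [[1, 1, 0, 0], [2, 2, 0, 0], [0, 0, 1, 1], [0, 0, 3, 3], [1, 1, 1, 1]]
W, H = nmf(V, k=2)
print(round(reconstruction_error(V, W, H), 6))
```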
Pseudotime
In example embodiments, the invention provides for identifying transcription factors whose overexpression can differentiate stem cells or progenitor cells into target cell types by using single cell sequencing methods. In example embodiments, selecting cells further comprises inferring pseudotime distribution of cells by comparing expression profiles of single cells overexpressing one or more of the transcription factors to those overexpressing controls (e.g., empty vector not expressing a transcription factor or a vector overexpressing a control protein), wherein transcription factors that increase pseudotimes direct differentiation. The methods of the invention can use any trajectory inference (TI) method (see, e.g., Cao J, Spielmann M, Qiu X, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019; 566(7745):496-502; Chen H, Albergante L, Hsu J Y, et al. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nat Commun. 2019; 10(1):1903; and Van den Berge K, Roux de Bézieux H, Street K, et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat Commun. 2020; 11(1):1201).
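Once every cell has a pseudotime, asking whether a TF "increases pseudotime" relative to controls can be summarized by the shift in medians and by the probability that a randomly chosen TF cell lies further along the trajectory than a randomly chosen control cell (equal to the Mann-Whitney U statistic divided by the product of the group sizes). This summary is an assumption chosen for illustration, not the specific statistic used in the methods above; the pseudotime values are hypothetical.

```python
from statistics import median


def pseudotime_shift(tf_pt, control_pt):
    """Return (median difference, AUC) comparing TF cells with control cells.

    AUC is the probability that a random TF cell has a larger pseudotime
    than a random control cell, with ties counted as 1/2; it equals
    U / (n_tf * n_ctrl) for the Mann-Whitney U statistic.
    """
    if not tf_pt or not control_pt:
        raise ValueError("need at least one cell in each group")
    wins = 0.0
    for a in tf_pt:
        for b in control_pt:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    auc = wins / (len(tf_pt) * len(control_pt))
    return median(tf_pt) - median(control_pt), auc


control = [0.0, 0.1, 0.2, 0.3]
tf_cells = [0.4, 0.5, 0.6]
print(pseudotime_shift(tf_cells, control))
```

An AUC near 0.5 indicates no shift; values near 1 indicate that TF-expressing cells have progressed further along the trajectory than controls.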
Cellular processes, such as cell differentiation and cell maturation, are dynamic in nature and are not always well described by discrete analyses such as clustering. Therefore, other methods, such as single-cell trajectory inference and pseudotime estimation, have emerged. These methods allow one to study cellular dynamics, delineate cell developmental lineages, and characterize transitions between different cell states. Briefly, single cells are ordered along deterministic or probabilistic trajectories, and a numeric value referred to as pseudotime is assigned to each cell to indicate how far it has progressed along a dynamic process of interest. Cell trajectory analysis, also known as pseudotime analysis, uses single cell gene expression to order individual cells in pseudotime, placing them at trajectory positions corresponding to biological processes such as cell differentiation; this is possible because individual cells progress through such processes asynchronously. Most TI methods share a common workflow: dimensionality reduction followed by inference of lineages and pseudotimes in the reduced dimensional space. In that reduced dimensional space, a cell's pseudotime for a given lineage is the distance, along the lineage, between the cell and the origin of the lineage. For cells overexpressing TFs, the origin is defined using cells overexpressing controls.
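The definition above (pseudotime as distance along the lineage from its origin) can be made concrete by representing a lineage as a polyline in reduced-dimensional space that starts at the centroid of control cells. Each cell is projected onto the nearest segment, and its pseudotime is the arc length from the origin to that projection. The polyline representation and the helper names are assumptions for illustration; trajectory inference methods fit smooth curves or graphs instead.

```python
import math


def centroid(points):
    """Mean position of a set of points (e.g., control cells), used as the lineage origin."""
    return [sum(c) / len(points) for c in zip(*points)]


def project_pseudotime(point, lineage):
    """Arc length from lineage[0] to the cell's projection onto the lineage polyline.

    lineage: list of points in reduced-dimension space, starting at the
    origin. The cell is projected onto the nearest segment.
    """
    best_dist, best_pt = math.inf, 0.0
    travelled = 0.0
    for p, q in zip(lineage, lineage[1:]):
        seg = [b - a for a, b in zip(p, q)]
        seg_len2 = sum(s * s for s in seg)
        if seg_len2 == 0:
            continue  # skip degenerate zero-length segments
        seg_len = math.sqrt(seg_len2)
        t = sum((x - a) * s for x, a, s in zip(point, p, seg)) / seg_len2
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
        proj = [a + t * s for a, s in zip(p, seg)]
        dist = math.dist(point, proj)
        if dist < best_dist:
            best_dist, best_pt = dist, travelled + t * seg_len
        travelled += seg_len
    return best_pt


origin = centroid([(-0.5, 0.0), (0.5, 0.0)])  # hypothetical control cells
lineage = [origin, (2.0, 0.0), (2.0, 2.0)]    # hypothetical fitted lineage
print(project_pseudotime((1.0, 0.5), lineage), project_pseudotime((2.5, 1.0), lineage))
```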
Target Cell Types
Target cell types may include, but are not limited to, an immune cell, intestinal cell, liver cell, kidney cell, lung cell, brain cell, epithelial cell, endoderm cell, neuron, ectoderm cell, islet cell, acinar cell, hematopoietic cell, hepatocyte, skin cell/keratinocyte, melanocyte, bone cell/osteocyte, hair/dermal papilla cell, cartilage cell/chondrocyte, fat cell/adipocyte, skeletal muscle cell, endothelial cell, cardiac muscle cell/cardiomyocyte, or trophoblast. Target cells may also include progenitor cells associated with target cell types. Markers specific to target cell types are well known in the art.
In certain embodiments, target cell types are neural progenitors. In preferred embodiments, neural progenitors are differentiated to obtain a target cell type that is a neuron, astrocyte and/or oligodendrocyte. In more preferred embodiments, the target cell type is a neuron. In more preferred embodiments, the neuron is a GABAergic neuron. Neurons that produce GABA as their output are called GABAergic neurons, and have chiefly inhibitory action at receptors in the adult vertebrate (Rudy, et al., Three Groups of Interneurons Account for Nearly 100% of Neocortical GABAergic Neurons, Dev Neurobiol. 2011 Jan. 1; 71(1): 45-61). Malfunction of GABAergic neurons has been implicated in a number of diseases ranging from epilepsy to schizophrenia, anxiety disorders and autism. Id.
In certain embodiments, cells differentiated by overexpression of specific transcription factors can be further analyzed. Differentiated target cells can be analyzed for expression of biomarkers specific to the target cells or specific to a phenotype associated with the target cells.
The term “biomarker” is widespread in the art and commonly broadly denotes a biological molecule, more particularly an endogenous biological molecule, and/or a detectable portion thereof, whose qualitative and/or quantitative evaluation in a tested object (e.g., in or on a cell, cell population, tissue, organ, or organism) is predictive or informative with respect to one or more aspects of the tested object's phenotype and/or genotype. The terms “marker” and “biomarker” may be used interchangeably throughout this specification. Biomarkers as intended herein may be nucleic acid-based or peptide-, polypeptide- and/or protein-based. For example, a marker may consist of peptide(s), polypeptide(s) and/or protein(s) encoded by a given gene, or of detectable portions thereof. Further, whereas the term “nucleic acid” generally encompasses DNA, RNA and DNA/RNA hybrid molecules, in the context of markers the term may typically refer to heterogeneous nuclear RNA (hnRNA), pre-mRNA, messenger RNA (mRNA), or complementary DNA (cDNA), or detectable portions thereof. Such nucleic acid species are particularly useful as markers, since they contain qualitative and/or quantitative information about the expression of the gene. Particularly preferably, a nucleic acid-based marker may encompass mRNA of a given gene, or cDNA made from the mRNA, or detectable portions thereof. Any such nucleic acid(s), peptide(s), polypeptide(s) and/or protein(s) encoded by or produced from a given gene are encompassed by the term “gene product(s)”.
Preferably, markers as intended herein may be extracellular or cell surface markers, as methods to measure extracellular or cell surface marker(s) need not disturb the integrity of the cell membrane and may not require fixation/permeabilization of the cells.
Unless otherwise apparent from the context, reference herein to any marker, such as a peptide, polypeptide, protein, or nucleic acid, may generally also encompass modified forms of said marker, such as bearing post-expression modifications including, for example, phosphorylation, glycosylation, lipidation, methylation, cysteinylation, sulphonation, glutathionylation, acetylation, oxidation of methionine to methionine sulphoxide or methionine sulphone, and the like.
The term “peptide” as used throughout this specification preferably refers to a polypeptide as used herein consisting essentially of 50 amino acids or less, e.g., 45 amino acids or less, preferably 40 amino acids or less, e.g., 35 amino acids or less, more preferably 30 amino acids or less, e.g., 25 or less, 20 or less, 15 or less, 10 or less or 5 or less amino acids.
The term “polypeptide” as used throughout this specification generally encompasses polymeric chains of amino acid residues linked by peptide bonds. Hence, insofar as a protein is composed of only a single polypeptide chain, the terms “protein” and “polypeptide” may be used interchangeably herein to denote such a protein. The term is not limited to any minimum length of the polypeptide chain. The term may encompass naturally, recombinantly, semi-synthetically or synthetically produced polypeptides. The term also encompasses polypeptides that carry one or more co- or post-expression-type modifications of the polypeptide chain, such as, without limitation, glycosylation, acetylation, phosphorylation, sulfonation, methylation, ubiquitination, signal peptide removal, N-terminal Met removal, conversion of pro-enzymes or pre-hormones into active forms, etc. The term further also includes polypeptide variants or mutants which carry amino acid sequence variations vis-à-vis a corresponding native polypeptide, such as, e.g., amino acid deletions, additions and/or substitutions. The term contemplates both full-length polypeptides and polypeptide parts or fragments, e.g., naturally-occurring polypeptide parts that ensue from processing of such full-length polypeptides.
The term “protein” as used throughout this specification generally encompasses macromolecules comprising one or more polypeptide chains, i.e., polymeric chains of amino acid residues linked by peptide bonds. The term may encompass naturally, recombinantly, semi-synthetically or synthetically produced proteins. The term also encompasses proteins that carry one or more co- or post-expression-type modifications of the polypeptide chain(s), such as, without limitation, glycosylation, acetylation, phosphorylation, sulfonation, methylation, ubiquitination, signal peptide removal, N-terminal Met removal, conversion of pro-enzymes or pre-hormones into active forms, etc. The term further also includes protein variants or mutants which carry amino acid sequence variations vis-à-vis a corresponding native protein, such as, e.g., amino acid deletions, additions and/or substitutions. The term contemplates both full-length proteins and protein parts or fragments, e.g., naturally-occurring protein parts that ensue from processing of such full-length proteins.
The reference to any marker, including any peptide, polypeptide, protein, or nucleic acid, corresponds to the marker commonly known under the respective designations in the art. The terms encompass such markers of any organism where found, and particularly of animals, preferably warm-blooded animals, more preferably vertebrates, yet more preferably mammals, including humans and non-human mammals, still more preferably of humans.
The terms particularly encompass such markers, including any peptides, polypeptides, proteins, or nucleic acids, with a native sequence, i.e., ones of which the primary sequence is the same as that of the markers found in or derived from nature. A skilled person understands that native sequences may differ between different species due to genetic divergence between such species. Moreover, native sequences may differ between or within different individuals of the same species due to normal genetic diversity (variation) within a given species. Also, native sequences may differ between or even within different individuals of the same species due to somatic mutations, or post-transcriptional or post-translational modifications. Any such variants or isoforms of markers are intended herein. Accordingly, all sequences of markers found in or derived from nature are considered “native”. The terms encompass the markers when forming a part of a living organism, organ, tissue or cell, when forming a part of a biological sample, as well as when at least partly isolated from such sources. The terms also encompass markers when produced by recombinant or synthetic means.
In certain embodiments, markers, including any peptides, polypeptides, proteins, or nucleic acids, may be human, i.e., their primary sequence may be the same as a corresponding primary sequence of or present in a naturally occurring human marker. Hence, the qualifier “human” in this connection relates to the primary sequence of the respective markers, rather than to their origin or source. For example, such markers may be present in or isolated from samples of human subjects or may be obtained by other means (e.g., by recombinant expression, cell-free transcription or translation, or non-biological nucleic acid or peptide synthesis).
The reference herein to any marker, including any peptide, polypeptide, protein, or nucleic acid, also encompasses fragments thereof. Hence, the reference herein to measuring (or measuring the quantity of) any one marker may encompass measuring the marker and/or measuring one or more fragments thereof.
For example, any marker and/or one or more fragments thereof may be measured collectively, such that the measured quantity corresponds to the summed amounts of the collectively measured species. In another example, any marker and/or one or more fragments thereof may be measured each individually. The terms encompass fragments arising by any mechanism, in vivo and/or in vitro, such as, without limitation, by alternative transcription or translation, exo- and/or endo-proteolysis, exo- and/or endo-nucleolysis, or degradation of the peptide, polypeptide, protein, or nucleic acid, such as, for example, by physical, chemical and/or enzymatic proteolysis or nucleolysis.
The term “fragment” as used throughout this specification with reference to a peptide, polypeptide, or protein generally denotes a portion of the peptide, polypeptide, or protein, such as typically an N- and/or C-terminally truncated form of the peptide, polypeptide, or protein. Preferably, a fragment may comprise at least about 30%, e.g., at least about 50% or at least about 70%, preferably at least about 80%, e.g., at least about 85%, more preferably at least about 90%, and yet more preferably at least about 95% or even about 99% of the amino acid sequence length of said peptide, polypeptide, or protein. For example, insofar as it does not exceed the length of the full-length peptide, polypeptide, or protein, a fragment may include a sequence of ≥5 consecutive amino acids, or ≥10 consecutive amino acids, or ≥20 consecutive amino acids, or ≥30 consecutive amino acids, e.g., ≥40 consecutive amino acids, such as for example ≥50 consecutive amino acids, e.g., ≥60, ≥70, ≥80, ≥90, ≥100, ≥200, ≥300, ≥400, ≥500 or ≥600 consecutive amino acids of the corresponding full-length peptide, polypeptide, or protein.
The term “fragment” as used throughout this specification with reference to a nucleic acid (polynucleotide) generally denotes a 5′- and/or 3′-truncated form of a nucleic acid. Preferably, a fragment may comprise at least about 30%, e.g., at least about 50% or at least about 70%, preferably at least about 80%, e.g., at least about 85%, more preferably at least about 90%, and yet more preferably at least about 95% or even about 99% of the nucleic acid sequence length of said nucleic acid. For example, insofar as it does not exceed the length of the full-length nucleic acid, a fragment may include a sequence of ≥5 consecutive nucleotides, or ≥10 consecutive nucleotides, or ≥20 consecutive nucleotides, or ≥30 consecutive nucleotides, e.g., ≥40 consecutive nucleotides, such as for example ≥50 consecutive nucleotides, e.g., ≥60, ≥70, ≥80, ≥90, ≥100, ≥200, ≥300, ≥400, ≥500 or ≥600 consecutive nucleotides of the corresponding full-length nucleic acid.
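The fragment definitions above (an N-/C-terminal or 5′/3′ truncation, optionally with a minimum fraction of the full length and a minimum number of consecutive residues) can be checked with a small sketch. It treats sequences as plain strings and relies on the observation that removing residues only from the ends leaves a contiguous stretch of the full-length sequence; the word “about” in the thresholds is ignored. The function name, its parameters and the example sequences are hypothetical.

```python
def is_fragment(fragment, full, min_fraction=None, min_length=None):
    """Check whether `fragment` is a terminally truncated form of `full`.

    Removing residues only from the N-/C-terminus (or 5'/3' end) leaves a
    contiguous stretch of the full-length sequence, so a substring test is used.
    Optional thresholds: minimum fraction of the full length, minimum length.
    """
    if not fragment or fragment not in full:
        return False
    if min_fraction is not None and len(fragment) < min_fraction * len(full):
        return False
    if min_length is not None and len(fragment) < min_length:
        return False
    return True


full_protein = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary 22-residue example
print(is_fragment(full_protein[2:20], full_protein, min_fraction=0.8))
print(is_fragment("GGCCATT", "ATGGCCATTGTAATG", min_length=5))
```

A sequence with an internal deletion fails the substring test, so it is not treated as a fragment under this terminal-truncation reading.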
Cells such as target cells as disclosed herein may in the context of the present specification be said to “comprise the expression” or conversely to “not express” one or more markers, such as one or more genes or gene products; or be described as “positive” or conversely as “negative” for one or more markers, such as one or more genes or gene products; or be said to “comprise” a defined “gene or gene product signature”.
Such terms are commonplace and well-understood by the skilled person when characterizing cell phenotypes. By means of additional guidance, when a cell is said to be positive for or to express or comprise expression of a given marker, such as a given gene or gene product, a skilled person would conclude the presence or evidence of a distinct signal for the marker when carrying out a measurement capable of detecting or quantifying the marker in or on the cell. Suitably, the presence or evidence of the distinct signal for the marker would be concluded based on a comparison of the measurement result obtained for the cell to a result of the same measurement carried out for a negative control (for example, a cell known to not express the marker) and/or a positive control (for example, a cell known to express the marker). Where the measurement method allows for a quantitative assessment of the marker, a positive cell may generate a signal for the marker that is at least 1.5-fold higher than a signal generated for the marker by a negative control cell or than an average signal generated for the marker by a population of negative control cells, e.g., at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold higher or even higher. Further, a positive cell may generate a signal for the marker that is 3.0 or more standard deviations, e.g., 3.5 or more, 4.0 or more, 4.5 or more, or 5.0 or more standard deviations, higher than an average signal generated for the marker by a population of negative control cells.
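The two quantitative criteria above for calling a cell positive, a signal at least 1.5-fold above the negative-control mean or at least 3.0 standard deviations above it, can be expressed as a short sketch. The function name and defaults are hypothetical; the sketch assumes background-corrected, non-negative signals and at least two negative-control measurements, and it reports each criterion separately because the text presents them as alternatives.

```python
from statistics import mean, stdev


def call_positive(signal, negative_controls, min_fold=1.5, min_sd=3.0):
    """Return (fold_criterion_met, sd_criterion_met) for one cell's signal."""
    mu = mean(negative_controls)
    sd = stdev(negative_controls)  # sample SD; needs >= 2 control values
    fold_ok = mu > 0 and signal >= min_fold * mu
    sd_ok = signal >= mu + min_sd * sd
    return fold_ok, sd_ok


negatives = [100, 110, 90, 105, 95]  # hypothetical control-cell signals
print(call_positive(160, negatives))
print(call_positive(130, negatives))
```

With these toy values a signal of 130 meets the standard-deviation criterion but not the 1.5-fold criterion, showing that the two rules need not agree.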
A marker, for example a gene or gene product, for example a peptide, polypeptide, protein, or nucleic acid, or a group of two or more markers, is “detected” or “measured” in a tested object (e.g., in or on a cell, cell population, tissue, organ, or organism) when the presence or absence and/or quantity of said marker or said group of markers is detected or determined in the tested object, preferably substantially to the exclusion of other molecules and analytes, e.g., other genes or gene products.
The terms “increased”, “increase”, “upregulated” or “upregulate” as used herein generally mean an increase by a statistically significant amount. For avoidance of doubt, “increased” means a statistically significant increase of at least 10% as compared to a reference level, including an increase of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100% or more, including, for example at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold increase or greater as compared to a reference level, as that term is defined herein.
The terms “reduced”, “reduce”, “decrease”, “decreased”, “downregulate” or “downregulated” as used herein generally mean a decrease by a statistically significant amount relative to a reference. For avoidance of doubt, “reduced” means a statistically significant decrease of at least 10% as compared to a reference level, for example a decrease by at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or more, up to and including a 100% decrease (i.e., an absent level as compared to a reference sample), or any decrease between 10-100% as compared to a reference level, as that term is defined herein.
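The definitions of “increased” and “reduced” above combine a minimum effect size (at least 10% relative to the reference level) with statistical significance. A minimal sketch of that rule follows. It assumes replicate measurements for a test and a reference condition and a non-zero reference mean, and it uses a two-sided permutation test on the difference of means as one possible significance test; any appropriate test could be substituted. The function names, the alpha of 0.05 and the number of permutations are assumptions.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def classify_change(test, reference, min_change=0.10, alpha=0.05):
    """Label test vs reference as 'increased', 'reduced' or 'no call'."""
    rel_change = (mean(test) - mean(reference)) / mean(reference)
    if abs(rel_change) < min_change:
        return "no call"  # effect smaller than the 10% minimum
    if permutation_p_value(test, reference) >= alpha:
        return "no call"  # not statistically significant
    return "increased" if rel_change > 0 else "reduced"


reference = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]  # hypothetical replicates
print(classify_change([13.0, 12.8, 13.2, 12.9, 13.1, 13.0], reference))
```

Checking the effect size first avoids running the permutation test when the change is too small to qualify in any case.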
The terms “quantity”, “amount” and “level” are synonymous and generally well-understood in the art. The terms as used throughout this specification may particularly refer to an absolute quantification of a marker in a tested object (e.g., in or on a cell, cell population, tissue, organ, or organism, e.g., in a biological sample of a subject), or to a relative quantification of a marker in a tested object, i.e., relative to another value such as relative to a reference value, or to a range of values indicating a base-line of the marker. Such values or ranges may be obtained as conventionally known.
An absolute quantity of a marker may be advantageously expressed as weight or as molar amount, or more commonly as a concentration, e.g., weight per volume or mol per volume. A relative quantity of a marker may be advantageously expressed as an increase or decrease or as a fold-increase or fold-decrease relative to said another value, such as relative to a reference value. Performing a relative comparison between first and second variables (e.g., first and second quantities) may but need not require determining first the absolute values of said first and second variables. For example, a measurement method may produce quantifiable readouts (such as, e.g., signal intensities) for said first and second variables, wherein said readouts are a function of the value of said variables, and wherein said readouts may be directly compared to produce a relative value for the first variable vs. the second variable, without the actual need to first convert the readouts to absolute values of the respective variables.
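As an illustration of comparing readouts directly, quantitative PCR cycle-threshold (Ct) values can be converted into a relative quantity with the comparative Ct (2^−ΔΔCt) method, without first estimating absolute copy numbers. This is one well-known example rather than the only option. The sketch assumes approximately 100% amplification efficiency for both the target transcript and a housekeeping (normalizer) gene; the function name and example Ct values are hypothetical.

```python
def relative_quantity(ct_target_sample, ct_norm_sample,
                      ct_target_calibrator, ct_norm_calibrator):
    """Fold change of a target transcript in a sample relative to a calibrator
    (e.g., control cells), using the comparative Ct (2^-ddCt) method."""
    d_ct_sample = ct_target_sample - ct_norm_sample
    d_ct_calibrator = ct_target_calibrator - ct_norm_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    # Each cycle earlier corresponds to a doubling at 100% efficiency.
    return 2.0 ** (-dd_ct)


print(relative_quantity(22.0, 18.0, 25.0, 18.0))  # 8.0
```

Here the target crosses threshold three cycles earlier in the sample than in the calibrator after normalization, which corresponds to an 8-fold higher relative quantity.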
Reference values may be established according to known procedures previously employed for other cell populations, biomarkers and gene or gene product signatures. For example, a reference value may be established in an individual or a population of individuals characterized by a particular diagnosis, prediction and/or prognosis of a disease or condition (i.e., for whom said diagnosis, prediction and/or prognosis of the disease or condition holds true). Such population may comprise without limitation 2 or more, 10 or more, 100 or more, or even several hundred or more individuals.
A “deviation” of a first value from a second value may generally encompass any direction (e.g., increase: first value>second value; or decrease: first value<second value) and any extent of alteration.
For example, a deviation may encompass a decrease in a first value by, without limitation, at least about 10% (about 0.9-fold or less), or by at least about 20% (about 0.8-fold or less), or by at least about 30% (about 0.7-fold or less), or by at least about 40% (about 0.6-fold or less), or by at least about 50% (about 0.5-fold or less), or by at least about 60% (about 0.4-fold or less), or by at least about 70% (about 0.3-fold or less), or by at least about 80% (about 0.2-fold or less), or by at least about 90% (about 0.1-fold or less), relative to a second value with which a comparison is being made.
For example, a deviation may encompass an increase of a first value by, without limitation, at least about 10% (about 1.1-fold or more), or by at least about 20% (about 1.2-fold or more), or by at least about 30% (about 1.3-fold or more), or by at least about 40% (about 1.4-fold or more), or by at least about 50% (about 1.5-fold or more), or by at least about 60% (about 1.6-fold or more), or by at least about 70% (about 1.7-fold or more), or by at least about 80% (about 1.8-fold or more), or by at least about 90% (about 1.9-fold or more), or by at least about 100% (about 2-fold or more), or by at least about 150% (about 2.5-fold or more), or by at least about 200% (about 3-fold or more), or by at least about 500% (about 6-fold or more), or by at least about 700% (about 8-fold or more), or the like, relative to a second value with which a comparison is being made.
Preferably, a deviation may refer to a statistically significant observed alteration. For example, a deviation may refer to an observed alteration which falls outside of error margins of reference values in a given population (as expressed, for example, by standard deviation or standard error, or by a predetermined multiple thereof, e.g., ±1×SD or ±2×SD or ±3×SD, or ±1×SE or ±2×SE or ±3×SE). Deviation may also refer to a value falling outside of a reference range defined by values in a given population (for example, outside of a range which comprises ≥40%, ≥50%, ≥60%, ≥70%, ≥75% or ≥80% or ≥85% or ≥90% or ≥95% or even ≥100% of values in said population).
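The deviation criteria above can be sketched in two parts: converting a percentage change into the fold value given in parentheses (e.g., a 30% decrease corresponds to about 0.7-fold and a 500% increase to about 6-fold), and flagging a value that falls outside the reference mean plus or minus a chosen multiple k of the reference standard deviation. The function names and the default k are hypothetical, and the reference values stand in for measurements from a reference population.

```python
from statistics import mean, stdev


def fold_from_percent(percent_change):
    """Convert a percentage change into a fold value (+50 -> 1.5, -30 -> 0.7)."""
    return 1.0 + percent_change / 100.0


def deviates(value, reference_values, k=2.0):
    """True if value lies outside mean +/- k * SD of the reference values."""
    mu = mean(reference_values)
    sd = stdev(reference_values)  # sample SD of the reference population
    return abs(value - mu) > k * sd


reference = [10, 12, 11, 9, 10, 11, 12, 9]  # hypothetical reference values
print(deviates(14, reference, k=2), deviates(14, reference, k=3))
```

The same observation can count as a deviation at ±2×SD but not at ±3×SD, so the chosen multiple directly sets how conservative the call is.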
In a further embodiment, a deviation may be concluded if an observed alteration is beyond a given threshold or cut-off. Such threshold or cut-off may be selected as generally known in the art to provide for a chosen sensitivity and/or specificity of the prediction methods, e.g., sensitivity and/or specificity of at least 50%, or at least 60%, or at least 70%, or at least 80%, or at least 85%, or at least 90%, or at least 95%.
For example, receiver-operating characteristic (ROC) curve analysis can be used to select an optimal cut-off value of the quantity of a given immune cell population, biomarker or gene or gene product signatures, for clinical use of the present diagnostic tests, based on acceptable sensitivity and specificity, or related performance measures which are well-known per se, such as positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR−), Youden index, or similar.
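ROC-based selection of a cut-off can be illustrated by a minimal sketch that scans each observed score as a candidate threshold, calls samples with a score at or above it positive, and keeps the threshold that maximizes the Youden index J = sensitivity + specificity − 1. The function name, the tie-breaking rule (the lowest threshold wins among equal J) and the toy data are assumptions, and both classes must be present in the labels; other criteria listed above, such as PPV or NPV, may be preferred in practice.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing the Youden index.

    A sample is called positive when its score is >= cutoff; labels are 1
    (condition present) or 0 (absent), and both classes must be present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:  # strict '>' keeps the lowest cut on ties
            best = (j, cut, sens, spec)
    return best[1:]


scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]  # hypothetical
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
print(youden_cutoff(scores, labels))
```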
In certain embodiments, the target cells may be detected, quantified, sorted or isolated using a technique selected from the group consisting of flow cytometry, mass cytometry, fluorescence activated cell sorting (FACS), fluorescence microscopy, affinity separation, magnetic cell separation, microfluidic separation, RNA-seq (e.g., bulk or single cell), quantitative PCR, MERFISH (multiplexed error-robust fluorescence in situ hybridization) and combinations thereof. The technique may employ one or more agents capable of specifically binding to one or more gene products expressed or not expressed by the target cells, preferably on the cell surface of the target cells. The one or more agents may be one or more antibodies. Other methods including absorbance assays and colorimetric assays are known in the art and may be used herein.
In other example embodiments, detection of a marker may include immunological assay methods, wherein the ability of an assay to separate, detect and/or quantify a marker (such as, preferably, peptide, polypeptide, or protein) is conferred by specific binding between a separable, detectable and/or quantifiable immunological binding agent (antibody) and the marker. Immunological assay methods include without limitation immunohistochemistry, immunocytochemistry, flow cytometry, mass cytometry, fluorescence activated cell sorting (FACS), fluorescence microscopy, fluorescence based cell sorting using microfluidic systems, immunoaffinity adsorption based techniques such as affinity chromatography, magnetic particle separation, magnetic activated cell sorting or bead based cell sorting using microfluidic systems, enzyme-linked immunosorbent assay (ELISA) and ELISPOT based techniques, radioimmunoassay (RIA), western blot, etc.
In certain example embodiments, detection of a marker or signature may include biochemical assay methods, including inter alia assays of enzymatic activity, membrane channel activity, substance-binding activity, gene regulatory activity, or cell signaling activity of a marker, e.g., peptide, polypeptide, protein, or nucleic acid.
In other example embodiments, detection of a marker may include mass spectrometry analysis methods. Generally, any mass spectrometric (MS) techniques that are capable of obtaining precise information on the mass of peptides, and preferably also on fragmentation and/or (partial) amino acid sequence of selected peptides (e.g., in tandem mass spectrometry, MS/MS; or in post source decay, TOF MS), may be useful herein for separation, detection and/or quantification of markers (such as, preferably, peptides, polypeptides, or proteins). Suitable peptide MS and MS/MS techniques and systems are well-known per se (see, e.g., Methods in Molecular Biology, vol. 146: “Mass Spectrometry of Proteins and Peptides”, by Chapman, ed., Humana Press 2000, ISBN 089603609x; Biemann 1990. Methods Enzymol 193: 455-79; or Methods in Enzymology, vol. 402: “Biological Mass Spectrometry”, by Burlingame, ed., Academic Press 2005, ISBN 9780121828073) and may be used herein. MS arrangements, instruments and systems suitable for biomarker peptide analysis may include, without limitation, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS; MALDI-TOF post-source-decay (PSD); MALDI-TOF/TOF; surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF) MS; electrospray ionization mass spectrometry (ESI-MS); ESI-MS/MS; ESI-MS/(MS)n (n is an integer greater than zero); ESI 3D or linear (2D) ion trap MS; ESI triple quadrupole MS; ESI quadrupole orthogonal TOF (Q-TOF); ESI Fourier transform MS systems; desorption/ionization on silicon (DIOS); secondary ion mass spectrometry (SIMS); atmospheric pressure chemical ionization mass spectrometry (APCI-MS); APCI-MS/MS; APCI-(MS)n; atmospheric pressure photoionization mass spectrometry (APPI-MS); APPI-MS/MS; and APPI-(MS)n. Peptide ion fragmentation in tandem MS (MS/MS) arrangements may be achieved using methods established in the art, such as, e.g., collision induced dissociation (CID).
Detection and quantification of markers by mass spectrometry may involve multiple reaction monitoring (MRM), such as described among others by Kuhn et al. 2004 (Proteomics 4: 1175-86). MS peptide analysis methods may be advantageously combined with upstream peptide or protein separation or fractionation methods, such as the chromatographic and other methods described herein.
In other example embodiments, detection of a marker may include chromatography methods. In one example embodiment, chromatography refers to a process in which a mixture of substances (analytes) carried by a moving stream of liquid or gas (“mobile phase”) is separated into components as a result of differential distribution of the analytes, as they flow around or over a stationary liquid or solid phase (“stationary phase”), between said mobile phase and said stationary phase. The stationary phase is usually a finely divided solid, a sheet of filter material, or a thin film of a liquid on the surface of a solid, or the like. Chromatography may be columnar. While particulars of chromatography are well known in the art, for further guidance see, e.g., Meyer M., 1998, ISBN: 047198373X, and “Practical HPLC Methodology and Applications”, Bidlingmeyer, B. A., John Wiley & Sons Inc., 1993. Exemplary types of chromatography include, without limitation, high-performance liquid chromatography (HPLC), normal phase HPLC (NP-HPLC), reversed phase HPLC (RP-HPLC), ion exchange chromatography (IEC), such as cation or anion exchange chromatography, hydrophilic interaction chromatography (HILIC), hydrophobic interaction chromatography (HIC), size exclusion chromatography (SEC) including gel filtration chromatography or gel permeation chromatography, chromatofocusing, affinity chromatography such as immunoaffinity, immobilized metal affinity chromatography, and the like.
In certain embodiments, further techniques for separating, detecting and/or quantifying markers may be used in conjunction with any of the above described detection methods. Such methods include, without limitation, chemical extraction partitioning, isoelectric focusing (IEF) including capillary isoelectric focusing (CIEF), capillary isotachophoresis (CITP), capillary electrochromatography (CEC), and the like, one-dimensional polyacrylamide gel electrophoresis (PAGE), two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), capillary gel electrophoresis (CGE), capillary zone electrophoresis (CZE), micellar electrokinetic chromatography (MEKC), free flow electrophoresis (FFE), etc.
In certain examples, such methods may include separating, detecting and/or quantifying markers at the nucleic acid level, more particularly RNA level, e.g., at the level of hnRNA, pre-mRNA, mRNA, or cDNA. Standard quantitative RNA or cDNA measurement tools known in the art may be used. Non-limiting examples include hybridization-based analysis, microarray expression analysis, digital gene expression profiling (DGE), RNA-in-situ hybridization (RISH), Northern-blot analysis and the like; PCR, RT-PCR, RT-qPCR, end-point PCR, digital PCR or the like; supported oligonucleotide detection, pyrosequencing, polony cyclic sequencing by synthesis, simultaneous bi-directional sequencing, single-molecule sequencing, single molecule real time sequencing, true single molecule sequencing, hybridization-assisted nanopore sequencing, sequencing by synthesis, single-cell RNA sequencing (sc-RNA seq), or the like.
The present invention is also directed to signatures and uses thereof. In certain embodiments, a homogeneous population of a target cell type (e.g., radial glia) may allow identification of specific signatures (e.g., rare signatures). As used herein, a “signature” may encompass any gene or genes, protein or proteins, or epigenetic element(s) whose expression profile or whose occurrence is associated with a specific cell type, subtype, or cell state of a specific cell type or subtype within a population of cells (e.g., radial glia). In certain embodiments, the expression of the target cell signatures is dependent on epigenetic modification of the genes or regulatory elements associated with the genes. Thus, in certain embodiments, use of signature genes includes epigenetic modifications that may be detected or modulated. For ease of discussion, when discussing gene expression, any gene or genes, protein or proteins, or epigenetic element(s) may be substituted. Reference to a gene name throughout the specification encompasses the human gene, mouse gene and all other orthologues as known in the art in other organisms. As used herein, the terms “signature”, “expression profile”, or “expression program” may be used interchangeably. It is to be understood that also when referring to proteins (e.g., differentially expressed proteins), such may fall within the definition of “gene” signature. Levels of expression or activity or prevalence may be compared between different cells in order to characterize or identify for instance signatures specific for cell (sub)populations. Increased or decreased expression or activity of signature genes may be compared between different cells in order to characterize or identify for instance specific cell (sub)populations. The detection of a signature in single cells may be used to identify and quantitate for instance specific cell (sub)populations.
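Detection of a signature in single cells, as described above, can be sketched as a per-cell signature score. One simple choice, assumed here rather than taken from the description, z-scores each signature gene across cells and averages the z-scores per cell; cells scoring above a chosen threshold can then be counted to quantitate a subpopulation. The gene names, the threshold of 0 and the function name are hypothetical.

```python
import numpy as np


def signature_score(expr, genes, signature):
    """Mean z-scored expression of the signature genes for each cell.

    expr: cells x genes array; genes: column names; signature: gene names.
    """
    cols = [genes.index(g) for g in signature]
    X = expr[:, cols].astype(float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes contribute a z-score of 0
    return ((X - X.mean(axis=0)) / sd).mean(axis=1)


genes = ["G1", "G2", "G3"]  # hypothetical gene names
expr = np.array([[5, 6, 0],
                 [4, 7, 1],
                 [0, 1, 5],
                 [1, 0, 6]])
scores = signature_score(expr, genes, ["G1", "G2"])
fraction_positive = float((scores > 0).mean())  # share of signature-high cells
```

Z-scoring puts genes measured on different scales on a common footing, so no single highly expressed gene dominates the score.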
A signature may include a gene or genes, protein or proteins, or epigenetic element(s) whose expression or occurrence is specific to a cell (sub)population, such that expression or occurrence is exclusive to the cell (sub)population. A gene signature, as used herein, may thus refer to any set of up- and down-regulated genes that are representative of a cell type or subtype. A gene signature, as used herein, may also refer to any set of up- and down-regulated genes between different cells or cell (sub)populations derived from a gene-expression profile. For example, a gene signature may comprise a list of genes differentially expressed in a distinction of interest.
The signature as defined herein (whether a gene signature, protein signature or other genetic or epigenetic signature) can be used to indicate the presence of a cell type, a subtype of the cell type, the state of the microenvironment of a population of cells, a particular cell type population or subpopulation, and/or the overall status of the entire cell (sub)population. Furthermore, the signature may be indicative of cells within a population of cells in vivo.
The signature according to certain embodiments of the present invention may comprise or consist of one or more genes, proteins and/or epigenetic elements, such as for instance 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of two or more genes, proteins and/or epigenetic elements, such as for instance 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of three or more genes, proteins and/or epigenetic elements, such as for instance 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of four or more genes, proteins and/or epigenetic elements, such as for instance 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of five or more genes, proteins and/or epigenetic elements, such as for instance 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of six or more genes, proteins and/or epigenetic elements, such as for instance 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of seven or more genes, proteins and/or epigenetic elements, such as for instance 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of eight or more genes, proteins and/or epigenetic elements, such as for instance 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of nine or more genes, proteins and/or epigenetic elements, such as for instance 9, 10 or more. In certain embodiments, the signature may comprise or consist of ten or more genes, proteins and/or epigenetic elements, such as for instance 10, 11, 12, 13, 14, 15, or more. It is to be understood that a signature according to the invention may for instance also include genes or proteins as well as epigenetic elements combined.
In certain embodiments, a signature is characterized as being specific for a particular target cell or target cell (sub)population if it is upregulated or only present, detected or detectable in that particular target cell or target cell (sub)population, or, alternatively, if it is downregulated or only absent or undetectable in that particular target cell or target cell (sub)population. In this context, a signature consists of one or more differentially expressed genes/proteins or differential epigenetic elements when comparing different cells or cell (sub)populations, including comparing different target cells or target cell (sub)populations, as well as comparing target cells or target cell (sub)populations with non-target cells or non-target cell (sub)populations. It is to be understood that “differentially expressed” genes/proteins include genes/proteins which are up- or down-regulated as well as genes/proteins which are turned on or off. When referring to up- or down-regulation, in certain embodiments, such up- or down-regulation is preferably at least two-fold, such as two-fold, three-fold, four-fold, five-fold, or more, such as for instance at least ten-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold, or more. Alternatively, or in addition, differential expression may be determined based on common statistical tests, as is known in the art.
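The fold-change criterion described above can be illustrated with a minimal sketch. It assumes that mean expression per gene has already been computed for the target and reference populations; the pseudocount of 1, the two-fold default cutoff, and the gene names are illustrative assumptions, and in practice a statistical test would typically be applied alongside the cutoff.

```python
import math
from typing import Dict


def fold_change(target_mean: float, reference_mean: float,
                pseudocount: float = 1.0) -> float:
    """Ratio of target to reference mean expression.

    The pseudocount (an illustrative choice) keeps genes that are switched
    fully on or off (zero in one group) finite.
    """
    return (target_mean + pseudocount) / (reference_mean + pseudocount)


def classify_genes(target: Dict[str, float], reference: Dict[str, float],
                   min_fold: float = 2.0) -> Dict[str, str]:
    """Label each gene 'up', 'down' or 'unchanged' using a fold-change cutoff."""
    labels = {}
    for gene, t_mean in target.items():
        fc = fold_change(t_mean, reference.get(gene, 0.0))
        if fc >= min_fold:
            labels[gene] = "up"
        elif fc <= 1.0 / min_fold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical mean expression values (e.g., normalized counts).
target = {"GENE_A": 9.0, "GENE_B": 1.0, "GENE_C": 3.0}
reference = {"GENE_A": 1.0, "GENE_B": 9.0, "GENE_C": 2.0}
labels = classify_genes(target, reference)
print(labels)  # {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'unchanged'}
print(math.log2(fold_change(7.0, 1.0)))  # 2.0
```

Raising `min_fold` to 10, 20, or 50 corresponds to the stricter thresholds listed in the text.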
As discussed herein, differentially expressed genes/proteins or differential epigenetic elements may be differentially expressed on a single cell level or on a cell population level. Preferably, the differentially expressed genes/proteins or epigenetic elements as discussed herein, such as those constituting the gene signatures as discussed herein, when referring to the cell population or subpopulation level, are genes that are differentially expressed in all or substantially all cells of the population or subpopulation (such as at least 80%, preferably at least 90%, such as at least 95% of the individual cells). This allows one to define a particular subpopulation of target cells. As referred to herein, a “subpopulation” of cells preferably refers to a particular subset of cells of a particular cell type which can be distinguished or are uniquely identifiable and set apart from other cells of this cell type. The cell subpopulation may be phenotypically characterized, and is preferably characterized by the signature as discussed herein. A cell (sub)population as referred to herein may consist of a (sub)population of cells of a particular cell type characterized by a specific cell state.
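The population-level criterion (a gene differentially expressed in at least 80%, 90%, or 95% of individual cells) can be sketched as follows for an upregulated or switched-on gene. This is an illustration only: it assumes per-cell counts for one gene are available, and the detection cutoff `min_count` and the example counts are hypothetical.

```python
from typing import Sequence


def fraction_expressing(counts: Sequence[float], min_count: float = 1.0) -> float:
    """Fraction of individual cells whose count for one gene is >= min_count."""
    if len(counts) == 0:
        raise ValueError("population contains no cells")
    return sum(1 for c in counts if c >= min_count) / len(counts)


def is_population_marker(counts: Sequence[float], min_fraction: float = 0.9,
                         min_count: float = 1.0) -> bool:
    """True if the gene is detected in at least `min_fraction` of the cells."""
    return fraction_expressing(counts, min_count) >= min_fraction


# Hypothetical per-cell UMI counts for one upregulated gene in 10 cells.
cells = [3, 0, 5, 2, 4, 1, 6, 2, 3, 1]
print(fraction_expressing(cells))                      # 0.9
print(is_population_marker(cells, min_fraction=0.8))   # True
print(is_population_marker(cells, min_fraction=0.95))  # False
```

For a downregulated or switched-off gene, the same logic would be applied to the fraction of cells in which the gene is not detected.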
When referring to induction, or alternatively suppression, of a particular signature, what is preferably meant is induction or alternatively suppression (or upregulation or downregulation) of at least one gene/protein and/or epigenetic element of the signature, such as for instance at least two, at least three, at least four, at least five, at least six, or all genes/proteins and/or epigenetic elements of the signature.
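The rule that a signature counts as induced or suppressed when at least a given number of its members change can be expressed as a short check. This sketch assumes per-gene fold changes (target over reference) are already available. Genes missing from that table are treated as unchanged. The two-fold cutoff and the gene names are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def signature_change(signature: Iterable[str], fold_changes: Dict[str, float],
                     direction: str = "induced", min_genes: int = 1,
                     min_fold: float = 2.0) -> Tuple[bool, List[str]]:
    """Decide whether a signature is induced or suppressed.

    Genes absent from `fold_changes` are treated as unchanged (fold change 1).
    Returns (decision, signature genes that moved in the requested direction).
    """
    if direction == "induced":
        hits = [g for g in signature if fold_changes.get(g, 1.0) >= min_fold]
    elif direction == "suppressed":
        hits = [g for g in signature if fold_changes.get(g, 1.0) <= 1.0 / min_fold]
    else:
        raise ValueError("direction must be 'induced' or 'suppressed'")
    return len(hits) >= min_genes, hits


sig = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
fc = {"GENE_A": 3.0, "GENE_B": 2.5, "GENE_C": 1.1, "GENE_D": 0.3}
print(signature_change(sig, fc, min_genes=2))             # (True, ['GENE_A', 'GENE_B'])
print(signature_change(sig, fc, min_genes=len(sig)))      # (False, ['GENE_A', 'GENE_B'])
print(signature_change(sig, fc, direction="suppressed"))  # (True, ['GENE_D'])
```

Setting `min_genes=len(signature)` implements the strictest reading ("all genes of the signature").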
In certain embodiments, cells overexpressing transcription factors may be analyzed for the ability to further differentiate (e.g., radial glia can be differentiated to astrocytes, oligodendrocytes and neurons). The cells may be analyzed using spontaneous or directed differentiation methods. In certain embodiments, cells are analyzed by performing xenografts in immunocompromised animal models. In certain embodiments, the cells are analyzed for the ability to repair or regenerate diseased tissue.
Oncology Screening
In certain embodiments, the barcoded transcription factor library can be used for a method of pooled screening for transcription factors that enhance or suppress tumor growth. Expression of tumor suppressors has been shown to suppress tumor growth (see, e.g., Wang et al., Restoring expression of wild-type p53 suppresses tumor growth but does not cause tumor regression in mice with a p53 missense mutation. J Clin Invest. 2011 March; 121(3):893-904). In certain embodiments, the method is used to identify therapeutic targets for treating specific cancers. Cancer cell lines for any cancer type may be used. Cancer cell lines may be obtained from a patient. In certain embodiments, the barcoded transcription factor library is introduced into a cancer cell line in vitro, the cells are grown (e.g., for 1 to 3 weeks), and the enrichment and depletion of barcodes in the cells is determined relative to the barcodes present in the original library. In certain embodiments, the barcoded transcription factor library is introduced into a cancer cell line in vitro, the cells are transferred to an in vivo model (e.g., nude mice) and grown in vivo (e.g., for 1 to 8 weeks), the tumor cells are harvested (e.g., by removing the tumor), and the enrichment and depletion of barcodes in the cells is determined relative to the barcodes present in the original library. Barcodes that are enriched represent transcription factors that enhance tumor growth; these transcription factors may be targeted for inhibition to suppress tumor growth. Barcodes that are depleted represent transcription factors that suppress tumor growth; these transcription factors may be overexpressed or activated to suppress tumor growth.
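Barcode enrichment and depletion relative to the original library can be scored as a log2 ratio of normalized barcode frequencies. The sketch below is one common way to do this, not necessarily the scoring used in the examples. It assumes raw read counts per TF barcode for the library and for the grown cells or tumor. The pseudocount of 0.5, the ±1 log2 cutoff, and the barcode names are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def log2_enrichment(library: Dict[str, int], sample: Dict[str, int],
                    pseudocount: float = 0.5) -> Dict[str, float]:
    """log2(sample frequency / library frequency) for every library barcode.

    Counts are normalized to each sample's total reads; the pseudocount keeps
    barcodes that dropped out entirely (count 0) finite.
    """
    lib_total = sum(library.values())
    smp_total = sum(sample.values())
    scores = {}
    for barcode, lib_n in library.items():
        lib_freq = (lib_n + pseudocount) / lib_total
        smp_freq = (sample.get(barcode, 0) + pseudocount) / smp_total
        scores[barcode] = math.log2(smp_freq / lib_freq)
    return scores


def call_hits(scores: Dict[str, float],
              threshold: float = 1.0) -> Tuple[List[str], List[str]]:
    """Split barcodes into enriched (candidate growth-promoting TFs) and
    depleted (candidate growth-suppressing TFs) lists."""
    enriched = sorted(b for b, s in scores.items() if s >= threshold)
    depleted = sorted(b for b, s in scores.items() if s <= -threshold)
    return enriched, depleted


library = {"TF_A": 100, "TF_B": 100, "TF_C": 100}
tumor = {"TF_A": 400, "TF_B": 100, "TF_C": 0}
scores = log2_enrichment(library, tumor)
print(call_hits(scores))  # (['TF_A'], ['TF_C'])
```

Normalizing to totals matters because the tumor sample is usually sequenced to a different depth than the library; raw count ratios would otherwise conflate depth with selection.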
Combinatorial TF Screening and Prediction
In example embodiments, the genes and gene programs expressed in cells screened by overexpression of single transcription factors are used to identify transcription factor combinations to differentiate stem cells into a target cell type. In example embodiments, single cells overexpressing single transcription factors are used to identify one or more differentially expressed genes as compared to cells not expressing a transcription factor. In one embodiment, a transcription factor atlas as described herein is used. The differentially expressed genes can be used to determine combinations of transcription factors for directing differentiation of stem cells into target cells that more faithfully recapitulate the in vivo target cells, thereby providing improved cellular models and therapeutics. In one example embodiment, the average expression of the differentially expressed genes across two or more transcription factors is compared to the expression of those genes in the target cell. The combination of transcription factors whose average expression most closely recapitulates the expression in the target cell can be used to differentiate stem cells into the target cells. In example embodiments, the average is taken from 2, 3, 4, or more transcription factors, preferably 2, 3, or 4 transcription factors. In example embodiments, the comparison is made over more than 1 gene, for example, more than 10, 100, 1,000, 5,000, or 10,000 genes. In example embodiments, the genes are part of a gene program, expression program, or pathway as described herein.
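The averaging-and-matching procedure above can be sketched as an exhaustive ranking of k-TF combinations. The sketch assumes that each single-TF profile is a per-gene expression value (for example, a log fold change versus control cells) taken from a single-TF atlas, and it measures closeness with Euclidean distance. The text does not specify a similarity metric, so the distance is an assumption, and correlation or another metric could be substituted. TF and gene names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Profile = Dict[str, float]


def rank_combinations(tf_profiles: Dict[str, Profile], target: Profile,
                      genes: Sequence[str], k: int = 2
                      ) -> List[Tuple[float, Tuple[str, ...]]]:
    """Rank every k-TF combination by how closely the average of the single-TF
    profiles matches the target profile (Euclidean distance; smaller is closer)."""
    ranked = []
    for combo in combinations(sorted(tf_profiles), k):
        # Average the single-TF expression of each gene across the combination.
        avg = [sum(tf_profiles[tf].get(g, 0.0) for tf in combo) / k for g in genes]
        dist = math.sqrt(sum((a - target.get(g, 0.0)) ** 2
                             for a, g in zip(avg, genes)))
        ranked.append((dist, combo))
    ranked.sort()
    return ranked


genes = ["g1", "g2", "g3"]
tf_profiles = {
    "TF1": {"g1": 4.0, "g2": 0.0, "g3": 0.0},
    "TF2": {"g1": 0.0, "g2": 4.0, "g3": 0.0},
    "TF3": {"g1": 0.0, "g2": 0.0, "g3": 4.0},
}
target = {"g1": 2.0, "g2": 2.0, "g3": 0.0}
best_dist, best_combo = rank_combinations(tf_profiles, target, genes, k=2)[0]
print(best_combo, best_dist)  # ('TF1', 'TF2') 0.0
```

Exhaustive enumeration grows combinatorially with library size and k, which is one reason to restrict k to 2, 3, or 4 as the text prefers.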
In example embodiments, combinations of TFs can be screened using the methods and libraries described herein. For example, a library of 4, 5, 6, 7, 8, 9, 10, 20 or more transcription factors can be introduced into stem cells. In preferred embodiments, the TF library is introduced at high MOI (e.g., greater than 1, 2, 3, 4, 5 or more vectors per cell). In example embodiments, the cells are profiled by single cell RNA-seq. Using the pooled screening methods described herein, the combination of TFs overexpressed in each single cell can be identified.
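Identifying the TF combination carried by each cell amounts to thresholding the TF barcode counts recovered per cell and then grouping cells that share a combination. The following is a minimal sketch, assuming per-cell barcode UMI counts are available. The cutoffs `min_umis` and `min_fraction` are hypothetical. The Poisson helper reflects a common modeling assumption for viral integration at a given MOI, not a measured property of this library.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, List


def assign_tfs(barcode_umis: Dict[str, int], min_umis: int = 3,
               min_fraction: float = 0.1) -> FrozenSet[str]:
    """TFs whose barcode passes both an absolute UMI and a relative share cutoff."""
    total = sum(barcode_umis.values())
    if total == 0:
        return frozenset()
    return frozenset(tf for tf, n in barcode_umis.items()
                     if n >= min_umis and n / total >= min_fraction)


def group_by_combination(cells: Dict[str, Dict[str, int]]
                         ) -> Dict[FrozenSet[str], List[str]]:
    """Map each detected TF combination to the list of cell IDs carrying it."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for cell_id, umis in cells.items():
        groups[assign_tfs(umis)].append(cell_id)
    return dict(groups)


def poisson_at_least(k: int, moi: float) -> float:
    """P(N >= k) for N ~ Poisson(moi): expected share of cells with >= k vectors."""
    return 1.0 - sum(math.exp(-moi) * moi ** i / math.factorial(i)
                     for i in range(k))


cells = {
    "cell1": {"TF_A": 10, "TF_B": 8, "TF_C": 1},  # TF_C is below min_umis
    "cell2": {"TF_A": 12, "TF_B": 9},
    "cell3": {"TF_D": 5},
}
groups = group_by_combination(cells)
print(groups[frozenset({"TF_A", "TF_B"})])     # ['cell1', 'cell2']
print(round(poisson_at_least(2, moi=3.0), 3))  # 0.801
```

Under the Poisson assumption, an MOI of 3 would leave roughly 80% of cells with two or more vectors, which is why high MOI favors sampling combinations.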
Use of Target Cells and Transcription Factors
In Vitro Models
In certain embodiments, the present invention provides methods of generating target cell types in vitro. In vitro models may be obtained by overexpressing transcription factors identified through screening as described herein. In certain embodiments, the methods advantageously produce homogeneous cell types. The methods also provide target cells with reduced labor, time and cost.
In certain embodiments, the in vitro models of the present invention may be used to study development, cell biology and disease. In certain embodiments, the in vitro models of the present invention may be used to screen for drugs capable of modulating the target cells or for determining toxicity of drugs (e.g., toxic to cardiomyocytes). In certain embodiments, the in vitro models of the present invention may be used to identify specific cell states and/or subtypes.
In certain embodiments, the in vitro models of the present invention may be used in perturbation studies. Perturbations may include conditions, substances or agents. Agents may be of physical, chemical, biochemical and/or biological nature. Perturbations may include treatment with a small molecule, protein, RNAi, CRISPR system, TALE system, Zinc finger system, meganuclease, pathogen, allergen, biomolecule, or environmental stress. Such methods may be performed in any manner appropriate for the particular application.
In certain embodiments, the in vitro models are configured for performing perturb-seq. Methods and tools for genome-scale screening of perturbations in single cells using CRISPR have been described, herein referred to as perturb-seq (see e.g., Dixit et al., “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens” 2016, Cell 167, 1853-1866; Adamson et al., “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response” 2016, Cell 167, 1867-1882; Feldman et al., Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens, bioRxiv 262121, doi: doi.org/10.1101/262121; Datlinger, et al., 2017, Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods. Vol. 14 No. 3 DOI: 10.1038/nmeth.4177; Hill et al., On the design of CRISPR-based single cell molecular screens, Nat Methods. 2018 April; 15(4): 271-274; Replogle, et al., “Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing” Nat Biotechnol (2020). doi.org/10.1038/s41587-020-0470-y; and International Patent Publication No. WO 2017/075294). In certain embodiments, stem cells are configured for expression of a CRISPR enzyme, such that the cells can be induced to differentiate by overexpressing a transcription factor and barcoded guide sequences can be introduced to the cells.
Differentiation of Progenitor Cells
In certain embodiments, target cells are further differentiated. In certain embodiments, cells are differentiated by spontaneous differentiation. In certain embodiments, cells are differentiated by directed differentiation.
As used herein the term “spontaneous differentiation” refers to a process where progenitor cells spontaneously differentiate into a target cell and usually involves removal of growth factors from the media. In certain embodiments, the process of spontaneous differentiation can be accelerated by suboptimal culture conditions, such as cultivation to high density for extended periods (4-7 weeks) without replacement of a feeder layer. In certain embodiments, neural progenitor cells obtained by overexpressing transcription factors are spontaneously differentiated into neurons, astrocytes and oligodendrocytes by removal of growth factors from the media (see, e.g., Examples 1-2).
As used herein the term “directed differentiation” refers to exposing the stem cells or pluripotent cells to specific signaling pathway modulators and manipulating cell culture conditions (environmental or exogenous) to mimic the natural sequence of developmental decisions to produce a given cell type/tissue. In certain embodiments, pluripotent stem cells (PSCs) are cultured in controlled conditions involving specific substrates or extracellular matrices promoting cell adhesion and differentiation, and defined culture media compositions. A limited number of signaling factors controlling cell differentiation, such as growth factors or small molecules, are applied sequentially or in a combinatorial manner, at varying dosage and exposure time (Cohen D E, Melton D, 2011 “Turning straw into gold: directing cell fate for regenerative medicine”. Nature Reviews Genetics. 12 (4): 243-252). In certain embodiments, radial glia produced using the TF overexpression method as described herein can also be differentiated by directed differentiation into neurons, astrocytes, oligodendrocytes, or organoids.
As used herein, the term “organoid” or “epithelial organoid” refers to a cell cluster or aggregate that resembles an organ, or part of an organ, and possesses cell types relevant to that particular organ. Organoid systems have been described previously, for example, for brain, retinal, stomach, lung, thyroid, small intestine, colon, liver, kidney, pancreas, prostate, mammary gland, fallopian tube, taste buds, salivary glands, and esophagus (see, e.g., Clevers, Modeling Development and Disease with Organoids, Cell. 2016 Jun. 16; 165(7):1586-1597).
In certain embodiments, directed differentiation may include the use of hormones, cytokines, growth factors, mitogens or any other differentiation promoting agents.
In certain embodiments, dual SMAD inhibition (Chambers et al., 2009; Shi et al., 2012a) is used to differentiate RFX4 neural progenitor cells towards CNS cell types, radial glia, and neurons. In certain embodiments, the neurons are GABAergic neurons. Dual SMAD inhibition may include two inhibitors of SMAD signaling. One inhibitor may be a BMP inhibitor. BMP inhibitors include chordin, follistatin, and noggin (Chambers et al., 2009). The two inhibitors may be Noggin and SB431542. SB431542 inhibits the Lefty/Activin/TGFβ pathways by blocking phosphorylation of ALK4, ALK5, ALK7 receptors. Id.
Non-limiting examples of hormones include growth hormone (GH), adrenocorticotropic hormone (ACTH), dehydroepiandrosterone (DHEA), cortisol, epinephrine, thyroid hormone, estrogen, progesterone, testosterone, or combinations thereof.
Non-limiting examples of cytokines include lymphokines (e.g., interferon-γ, IL-2, IL-3, IL-4, IL-6, granulocyte-macrophage colony-stimulating factor (GM-CSF), leukocyte migration inhibitory factors (T-LIF, B-LIF), lymphotoxin-alpha, macrophage-activating factor (MAF), macrophage migration-inhibitory factor (MIF), neuroleukin, immunologic suppressor factors, transfer factors, or combinations thereof), monokines (e.g., IL-1, TNF-alpha, interferon-α, interferon-β, colony stimulating factors, e.g., CSF2, CSF3, macrophage CSF or GM-CSF, or combinations thereof), chemokines (e.g., beta-thromboglobulin, C chemokines, CC chemokines, CXC chemokines, CX3C chemokines, macrophage inflammatory protein (MIP), or combinations thereof), interleukins (e.g., IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14, IL-15, IL-17, IL-18, IL-19, IL-20, IL-21, IL-22, IL-23, IL-24, IL-25, IL-26, IL-27, IL-28, IL-29, IL-30, IL-31, IL-32, IL-33, IL-34, IL-35, IL-36, or combinations thereof), and several related signaling molecules, such as tumor necrosis factor (TNF) and interferons (e.g., interferon-α, interferon-β, interferon-γ, interferon-λ, or combinations thereof).
Non-limiting examples of growth factors include those of the fibroblast growth factor (FGF) family, bone morphogenetic protein (BMP) family, platelet derived growth factor (PDGF) family, transforming growth factor beta (TGFbeta) family, nerve growth factor (NGF) family, epidermal growth factor (EGF) family, insulin-like growth factor (IGF) family, hepatocyte growth factor (HGF) family, hematopoietic growth factors (HeGFs), platelet-derived endothelial cell growth factor (PD-ECGF), angiopoietin, vascular endothelial growth factor (VEGF) family, glucocorticoids, or combinations thereof.
Non-limiting examples of mitogens include phytohaemagglutinin (PHA), concanavalin A (conA), lipopolysaccharide (LPS), pokeweed mitogen (PWM), phorbol ester such as phorbol myristate acetate (PMA) with or without ionomycin, or combinations thereof.
Non-limiting examples of cell surface receptors the ligands of which may act as immunomodulants include Toll-like receptors (TLRs) (e.g., TLR1, TLR2, TLR3, TLR4, TLR5, TLR6, TLR7, TLR8, TLR9, TLR10, TLR11, TLR12 or TLR13), CD80, CD86, CD40, CCR7, or C-type lectin receptors.
In certain embodiments, differentiation promoting agents may be used to obtain particular types of target cells. Differentiation promoting agents include anticoagulants, chelating agents, and antibiotics. Examples of such agents may be one or more of the following: vitamins and minerals or derivatives thereof, such as A (retinol), B3, C (ascorbate), ascorbate 2-phosphate, D such as D2 or D3, K, retinoic acid, nicotinamide, zinc or zinc compounds, and calcium or calcium compounds; natural or synthetic hormones such as hydrocortisone and dexamethasone; amino acids or derivatives thereof, such as L-glutamine (L-glu), ethylene glycol tetraacetic acid (EGTA), proline, and non-essential amino acids (NEAA); compounds or derivatives thereof, such as β-mercaptoethanol, dibutyryl cyclic adenosine monophosphate (db-cAMP), monothioglycerol (MTG), putrescine, dimethyl sulfoxide (DMSO), hypoxanthine, adenine, forskolin, cilostamide, and 3-isobutyl-1-methylxanthine; nucleosides and analogues thereof, such as 5-azacytidine; acids or salts thereof, such as ascorbic acid, pyruvate, okadaic acid, linoleic acid, ethylenediaminetetraacetic acid (EDTA), anticoagulant citrate dextrose formula A (ACDA), disodium EDTA, sodium butyrate, and glycerophosphate; antibiotics or drugs, such as G418, gentamicin, pentoxifylline (1-(5-oxohexyl)-3,7-dimethylxanthine), and indomethacin; and proteins such as tissue plasminogen activator (TPA).
Transdifferentiation
In certain embodiments, the screening platform and methods of screening are used for identifying transcription factors that drive transdifferentiation of cells into target cell types. As used herein, the terms “transdifferentiation” and “lineage reprogramming” refer to the process by which a committed cell of a first cell lineage is changed into another cell of a different cell type or a process in which one mature somatic cell transforms into another mature somatic cell without undergoing an intermediate pluripotent state or progenitor cell type. In some embodiments, transdifferentiation may be a combination of retrodifferentiation and redifferentiation. A “transdifferentiated cell” is a cell that results from transdifferentiation of a committed cell. For example, a committed cell such as a blood cell or glial cell may be transdifferentiated into a neuron; or a fibroblast may be transdifferentiated into a myocyte. As used herein, “retrodifferentiation” is the process by which a committed cell, i.e., mature, specialized cell, reverts back to a more primitive cell stage. A “retrodifferentiated cell” is a cell that results from retrodifferentiation of a committed cell. As used herein, “redifferentiation” refers to the process by which an uncommitted cell or a retrodifferentiated cell differentiates into a more mature, specialized cell. A “redifferentiated cell” refers to a cell that results from redifferentiation of an uncommitted cell or a retrodifferentiated cell. If a redifferentiated cell is obtained through redifferentiation of a retrodifferentiated cell, the redifferentiated cell may be of the same or different lineage as the committed cell that had undergone retrodifferentiation. 
For example, a committed cell such as a white blood cell may be retrodifferentiated to form a retrodifferentiated cell such as a pluripotent stem cell, and then the retrodifferentiated cell may be redifferentiated to form a lymphocyte, which is of the same lineage as the white blood cell (committed cell), or redifferentiated to form a neuron, which is of a different lineage than the white blood cell (committed cell).
In certain embodiments, transcription factors are used to transdifferentiate cells of one lineage into a target cell of a different lineage. In certain embodiments, target cell types can be transferred to a subject in need thereof to regenerate a diseased or damaged tissue. One study showed that islet α-cells can be lineage-traced and reprogrammed by the transcription factors PDX1 and MAFA to produce and secrete insulin in response to glucose, yielding cells capable of reversing diabetes in mice (see, e.g., Furuyama, K. et al., 2019 Diabetes relief in mice by glucose-sensing insulin-secreting human α-cells Nature 567, 43-48). Another study showed that functional cardiomyocytes can be directly reprogrammed from differentiated somatic cells using three developmental transcription factors (i.e., Gata4, Mef2c and Tbx5) (see, e.g., Ieda, et al. (2010). “Direct Reprogramming of Fibroblasts into Functional Cardiomyocytes by Defined Factors”. Cell. 142 (3): 375-386). Another study identified that a combination of three factors, Ascl1, Brn2 and Myt1l, sufficed to convert mouse embryonic and postnatal fibroblasts into functional neurons in vitro (see, e.g., Vierbuchen, et al., (2010). “Direct conversion of fibroblasts to functional neurons by defined factors”. Nature. 463 (7284): 1035-1041). In certain embodiments, transcription factors that differentiate stem cells into a target cell (e.g., progenitor cell) can be used to transdifferentiate cells of one lineage into a target cell of a different lineage. In certain embodiments, TFs that are expressed in progenitor cells can be used to transdifferentiate cells of one lineage into a target cell of a different lineage (see, e.g., Graf, T.; Enver, T. (2009). “Forcing cells to change lineages”. Nature. 462 (7273): 587-594). In this approach, transcription factors from progenitor cells of the target cell type are transfected into a somatic cell to induce transdifferentiation.
Determining the unique set of cellular factors that must be manipulated for each cell conversion is a long and costly process that involves much trial and error. Previous methods required narrowing down factors one by one. As a result, this first step of identifying the key set of cellular factors for cell conversion is the major obstacle researchers face in the field of cell reprogramming. In certain embodiments, the pooled screening methods described herein are used for determining which transcription factors to use.
In certain embodiments, cells can be transdifferentiated to target cells in vivo by targeted modulation of transcription factors or downstream targets. In certain embodiments, the targeted modulation of transcription factors can be used to regenerate, replenish or replace damaged or diseased cells in a subject in need thereof (e.g., heart cells, pancreatic β cells, eye cells, nervous system cells).
In certain embodiments, modulation of one or more of the transcription factors RFX4, NFIB, ASCL1 and PAX6 is used to transdifferentiate glial cells into neurons, astrocytes, or oligodendrocytes. For example, oligodendrocytes may be produced to regenerate the myelin sheath on axons.
In certain embodiments, modulation of one or more of the transcription factors MESP1, EOMES and ESR1 is used to transdifferentiate cardiac fibroblasts into cardiomyocytes. For example, cardiomyocytes may be produced to regenerate a damaged heart.
Cell State Transitions
In certain embodiments, the screening platform and methods of screening are used for identifying transcription factors that modify the cell state or cell state transitions of target cell types. In example embodiments, cell state reflects the fact that cells of a particular type can exhibit variability with regard to one or more features and/or can exist in a variety of different conditions, while retaining the features of their particular cell type and not gaining features that would cause them to be classified as a different cell type. The different states or conditions in which a cell can exist may be characteristic of a particular cell type (e.g., they may involve properties or characteristics exhibited only by that cell type and/or involve functions performed only or primarily by that cell type) or may occur in multiple different cell types. Sometimes a cell state reflects the capability of a cell to respond to a particular stimulus or environmental condition (e.g., whether or not the cell will respond, or the type of response that will be elicited) or is a condition of the cell brought about by a stimulus or environmental condition. Cells in different cell states may be distinguished from one another in a variety of ways. For example, they may express, produce, or secrete one or more different genes, proteins, or other molecules (“markers”), exhibit differences in protein modifications such as phosphorylation, acetylation, etc., or may exhibit differences in appearance. Thus, a cell state may be a condition of the cell in which the cell expresses, produces, or secretes one or more markers, exhibits particular protein modification(s), has a particular appearance, and/or will or will not exhibit one or more biological response(s) to a stimulus or environmental condition.
In example embodiments, a transcription factor or combination of TFs can transition a cell from expressing one cell program to another cell program while the cell type remains the same (e.g., biological program, signature, expression program as described herein). For example, a cell may transition from an “old cell signature” to a “young cell signature” for rejuvenation (e.g., transitioning an “old neuron” to “young neuron”). Another example is enhancing certain cell functions, such as increasing efficiency of T cell killing by transitioning “exhausted T cell signature” to “active or naïve T cell signature.”
Another example of cell state is “activated” state as compared with “resting” or “non-activated” state. Many cell types in the body have the capacity to respond to a stimulus by modifying their state to an activated state. The particular alterations in state may differ depending on the cell type and/or the particular stimulus. A stimulus could be any biological, chemical, or physical agent to which a cell may be exposed.
Another example of cell state reflects the condition of a cell (e.g., a muscle cell or adipose cell) as either sensitive or resistant to insulin. Insulin-resistant cells exhibit a decreased response to circulating insulin; for example, insulin-resistant skeletal muscle cells exhibit markedly reduced insulin-stimulated glucose uptake and a variety of other metabolic abnormalities that distinguish these cells from cells with normal insulin sensitivity.
In an example embodiment, the cell state is an immune cell state. The term “immune cell” as used throughout this specification generally encompasses any cell derived from a hematopoietic stem cell that plays a role in the immune response. The term is intended to encompass immune cells of both the innate and adaptive immune systems. The immune cell as referred to herein may be a leukocyte, at any stage of differentiation (e.g., a stem cell, a progenitor cell, a mature cell) or any activation stage. Immune cells include lymphocytes (such as natural killer cells, T-cells (including, e.g., thymocytes, Th or Tc; Th1, Th2, Th17, Thαβ, CD4+, CD8+, effector Th, memory Th, regulatory Th, CD4+/CD8+ thymocytes, CD4−/CD8− thymocytes, γδ T cells, etc.) or B-cells (including, e.g., pro-B cells, early pro-B cells, late pro-B cells, pre-B cells, large pre-B cells, small pre-B cells, immature or mature B-cells, producing antibodies of any isotype, T1 B-cells, T2 B-cells, naïve B-cells, GC B-cells, plasmablasts, memory B-cells, plasma cells, follicular B-cells, marginal zone B-cells, B-1 cells, B-2 cells, regulatory B cells, etc.)), monocytes (including, e.g., classical, non-classical, or intermediate monocytes), (segmented or banded) neutrophils, eosinophils, basophils, mast cells, histiocytes, microglia, including various subtypes, maturation, differentiation, or activation stages, such as for instance hematopoietic stem cells, myeloid progenitors, lymphoid progenitors, myeloblasts, promyelocytes, myelocytes, metamyelocytes, monoblasts, promonocytes, lymphoblasts, prolymphocytes, small lymphocytes, macrophages (including, e.g., Kupffer cells, stellate macrophages, M1 or M2 macrophages), (myeloid or lymphoid) dendritic cells (including, e.g., Langerhans cells, conventional or myeloid dendritic cells, plasmacytoid dendritic cells, mDC-1, mDC-2, Mo-DC, HP-DC, veiled cells), granulocytes, polymorphonuclear cells, antigen-presenting cells (APC), etc.
As used throughout this specification, “immune response” refers to a response by a cell of the immune system, such as a B cell, T cell (CD4+ or CD8+), regulatory T cell, antigen-presenting cell, dendritic cell, monocyte, macrophage, NKT cell, NK cell, basophil, eosinophil, or neutrophil, to a stimulus. In some embodiments, the response is specific for a particular antigen (an “antigen-specific response”), and refers to a response by a CD4 T cell, CD8 T cell, or B cell via their antigen-specific receptor. In some embodiments, an immune response is a T cell response, such as a CD4+ response or a CD8+ response. Such responses by these cells can include, for example, cytotoxicity, proliferation, cytokine or chemokine production, trafficking, or phagocytosis, and can be dependent on the nature of the immune cell undergoing the response.
T cell response refers more specifically to an immune response in which T cells directly or indirectly mediate or otherwise contribute to an immune response in a subject. T cell-mediated response may be associated with cell mediated effects, cytokine mediated effects, and even effects associated with B cells if the B cells are stimulated, for example, by cytokines secreted by T cells. By means of an example but without limitation, effector functions of MHC class I restricted cytotoxic T lymphocytes (CTLs) may include cytokine and/or cytolytic capabilities, such as lysis of target cells presenting an antigen peptide recognized by the T cell receptor (naturally-occurring TCR or genetically engineered TCR, e.g., chimeric antigen receptor, CAR), secretion of cytokines, preferably IFN gamma, TNF alpha and/or one or more immunostimulatory cytokines, such as IL-2, and/or antigen peptide-induced secretion of cytotoxic effector molecules, such as granzymes, perforins or granulysin. By means of example but without limitation, for MHC class II restricted T helper (Th) cells, effector functions may be antigen peptide-induced secretion of cytokines, preferably, IFN gamma, TNF alpha, IL-4, IL-5, IL-10, and/or IL-2. By means of example but without limitation, for T regulatory (Treg) cells, effector functions may be antigen peptide-induced secretion of cytokines, preferably, IL-10, IL-35, and/or TGF-beta. B cell response refers more specifically to an immune response in which B cells directly or indirectly mediate or otherwise contribute to an immune response in a subject. Effector functions of B cells may include in particular production and secretion of antigen-specific antibodies by B cells (e.g., polyclonal B cell response to a plurality of the epitopes of an antigen (antigen-specific antibody response)), antigen presentation, and/or cytokine secretion.
During persistent immune activation, such as during uncontrolled tumor growth or chronic infections, subpopulations of immune cells, particularly of CD8+ or CD4+ T cells, become compromised to different extents with respect to their cytokine and/or cytolytic capabilities. Such immune cells, particularly CD8+ or CD4+ T cells, are commonly referred to as “dysfunctional” or as “functionally exhausted” or “exhausted”. As used herein, the terms “dysfunctional” and “functional exhaustion” refer to a state of a cell where the cell does not perform its usual function or activity in response to normal input signals, and include refractoriness of immune cells to stimulation, such as stimulation via an activating receptor or a cytokine. Such a function or activity includes, but is not limited to, proliferation (e.g., in response to a cytokine, such as IFN-gamma) or cell division, entrance into the cell cycle, cytokine production, cytotoxicity, migration and trafficking, phagocytic activity, or any combination thereof. Normal input signals can include, but are not limited to, stimulation via a receptor (e.g., T cell receptor, B cell receptor, co-stimulatory receptor). Unresponsive immune cells can have a reduction of at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or even 100% in cytotoxic activity, cytokine production, proliferation, trafficking, phagocytic activity, or any combination thereof, relative to a corresponding control immune cell of the same type. In some particular embodiments of the aspects described herein, a cell that is dysfunctional is a CD8+ T cell that expresses the CD8 cell surface marker. Such CD8+ cells normally proliferate and produce cell killing enzymes, e.g., they can release the cytotoxins perforin, granzymes, and granulysin.
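The "reduction of at least X% relative to a corresponding control immune cell" criterion reduces to a percent-change calculation per functional readout. This hypothetical sketch assumes paired measurements for test and control cells of the same type. Readout names and values are illustrative, and the 10% default is the lowest threshold listed above.

```python
from typing import Dict, Tuple


def percent_reduction(test_value: float, control_value: float) -> float:
    """Reduction of a readout relative to its control, in percent."""
    if control_value <= 0:
        raise ValueError("control readout must be positive")
    return 100.0 * (control_value - test_value) / control_value


def meets_reduction(test: Dict[str, float], control: Dict[str, float],
                    min_reduction: float = 10.0) -> Tuple[bool, Dict[str, float]]:
    """True if any readout is reduced by at least `min_reduction` percent."""
    reductions = {k: percent_reduction(test[k], control[k]) for k in control}
    return any(r >= min_reduction for r in reductions.values()), reductions


# Hypothetical readouts: IFN-gamma secretion and specific lysis in a killing assay.
control = {"ifng_pg_per_ml": 200.0, "specific_lysis_pct": 50.0}
test = {"ifng_pg_per_ml": 150.0, "specific_lysis_pct": 48.0}
flag, reductions = meets_reduction(test, control)
print(flag, reductions)  # True {'ifng_pg_per_ml': 25.0, 'specific_lysis_pct': 4.0}
```

Using `any` mirrors the "or any combination thereof" wording: a single readout crossing the threshold is sufficient.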
However, exhausted/dysfunctional T cells do not respond adequately to TCR stimulation and display poor effector function, sustained expression of inhibitory receptors, and a transcriptional state distinct from that of functional effector or memory T cells. Dysfunction/exhaustion of T cells thus prevents optimal control of infection and tumors. Exhausted/dysfunctional immune cells, such as T cells, e.g., CD8+ T cells, may produce reduced amounts of IFN-gamma, TNF-alpha and/or one or more immunostimulatory cytokines, such as IL-2, compared to functional immune cells. Exhausted/dysfunctional immune cells, such as T cells, e.g., CD8+ T cells, may further produce (increased amounts of) one or more immunosuppressive transcription factors or cytokines, such as IL-10 and/or Foxp3, compared to functional immune cells, thereby contributing to local immunosuppression. Dysfunctional CD8+ T cells can be either protective or detrimental with respect to disease control. As used herein, a “dysfunctional immune state” refers to an overall suppressive immune state in a subject or microenvironment of the subject (e.g., tumor microenvironment). For example, increased IL-10 production leads to suppression of other immune cells in a population of immune cells.
CD8+ T cell function is associated with their cytokine profiles. It has been reported that effector CD8+ T cells with the ability to simultaneously produce multiple cytokines (polyfunctional CD8+ T cells) are associated with protective immunity in patients with controlled chronic viral infections as well as cancer patients responsive to immune therapy (Spranger et al., 2014, J. Immunother. Cancer, vol. 2, 3). In the presence of persistent antigen, CD8+ T cells were found to have lost cytolytic activity completely over time (Moskophidis et al., 1993, Nature, vol. 362, 758-761). It was subsequently found that dysfunctional T cells can differentially produce IL-2, TNF-alpha, and IFN-gamma in a hierarchical order (Wherry et al., 2003, J. Virol., vol. 77, 4911-4927). Decoupled dysfunctional and activated CD8+ cell states have also been described (see, e.g., Singer, et al. (2016). A Distinct Gene Module for Dysfunction Uncoupled from Activation in Tumor-Infiltrating T Cells. Cell 166, 1500-1511 e1509; WO/2017/075478; and WO/2018/049025).
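Polyfunctionality, as described above, is a per-cell property: a cell counts as polyfunctional when it produces several cytokines at once. A minimal sketch of scoring a population is given below, assuming each cell has already been called positive or negative for each cytokine (for example, by intracellular staining); scoring IL-2, TNF-alpha, and IFN-gamma and using a cutoff of two cytokines are illustrative assumptions, and the function names are hypothetical.

```python
from typing import Dict, List

# Cytokines scored in this sketch (an assumption, taken from the discussion above).
CYTOKINES = ("IL-2", "TNF-alpha", "IFN-gamma")


def cytokine_count(cell: Dict[str, bool]) -> int:
    """Number of scored cytokines a single cell is called positive for."""
    return sum(1 for c in CYTOKINES if cell.get(c, False))


def polyfunctional_fraction(cells: List[Dict[str, bool]], min_cytokines: int = 2) -> float:
    """Fraction of cells producing at least `min_cytokines` scored cytokines simultaneously."""
    if not cells:
        raise ValueError("cells must be non-empty")
    n_poly = sum(1 for cell in cells if cytokine_count(cell) >= min_cytokines)
    return n_poly / len(cells)


if __name__ == "__main__":
    # Hypothetical per-cell calls; missing keys are treated as negative.
    cells = [
        {"IL-2": True, "TNF-alpha": True, "IFN-gamma": True},
        {"IFN-gamma": True},
        {"TNF-alpha": True, "IFN-gamma": True},
        {},
    ]
    print(polyfunctional_fraction(cells))  # 0.5
```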
As used herein, terms such as “Th17 cell” and/or “Th17 phenotype” and all grammatical variations thereof refer to a differentiated T helper cell that expresses one or more cytokines selected from the group consisting of interleukin 17A (IL-17A), interleukin 17F (IL-17F), and interleukin 17A/F heterodimer (IL-17A/F). As used herein, terms such as “Th1 cell” and/or “Th1 phenotype” and all grammatical variations thereof refer to a differentiated T helper cell that expresses interferon gamma (IFNγ). As used herein, terms such as “Th2 cell” and/or “Th2 phenotype” and all grammatical variations thereof refer to a differentiated T helper cell that expresses one or more cytokines selected from the group consisting of interleukin 4 (IL-4), interleukin 5 (IL-5) and interleukin 13 (IL-13). As used herein, terms such as “Treg cell” and/or “Treg phenotype” and all grammatical variations thereof refer to a differentiated T cell that expresses Foxp3.
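The phenotype definitions above are marker rules: a differentiated cell receives a label when it expresses at least one of the listed markers. A minimal sketch encoding those rules as a lookup table is shown below. It assumes the cell is already known to be a differentiated T helper (or, for Treg, T) cell and that marker expression has been called upstream; the marker spellings and function name are hypothetical, and a cell may satisfy more than one definition.

```python
from typing import List, Set

# Marker sets from the definitions above; one expressed marker suffices for a label.
PHENOTYPE_MARKERS = {
    "Th17": frozenset({"IL-17A", "IL-17F", "IL-17A/F"}),
    "Th1": frozenset({"IFN-gamma"}),
    "Th2": frozenset({"IL-4", "IL-5", "IL-13"}),
    "Treg": frozenset({"Foxp3"}),
}


def assign_phenotypes(expressed: Set[str]) -> List[str]:
    """Return every phenotype label whose marker set overlaps the expressed markers."""
    return sorted(
        name for name, markers in PHENOTYPE_MARKERS.items() if markers & expressed
    )


if __name__ == "__main__":
    print(assign_phenotypes({"IL-17A", "IFN-gamma"}))  # ['Th1', 'Th17']
    print(assign_phenotypes({"Foxp3"}))                # ['Treg']
```

Returning a list rather than a single label avoids silently discarding cells that co-express markers from more than one definition.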
Depending on the cytokines used for differentiation, in vitro polarized Th17 cells can either cause severe autoimmune responses upon adoptive transfer (‘pathogenic Th17 cell state’) or have little or no effect in inducing autoimmune disease (‘non-pathogenic cell state’) (Ghoreschi et al., 2010; and Lee et al., 2012 “Induction and molecular signature of pathogenic Th17 cells,” Nature Immunology, vol. 13(10): 991-999). A dynamic regulatory network controls Th17 differentiation (See e.g., Yosef et al., Dynamic regulatory network controlling Th17 cell differentiation, Nature, vol. 496: 461-468 (2013); Wang et al., CD5L/AIM Regulates Lipid Biosynthesis and Restrains Th17 Cell Pathogenicity, Cell Volume 163, Issue 6, p 1413-1427, 3 Dec. 2015; Gaublomme et al., Single-Cell Genomics Unveils Critical Regulators of Th17 Cell Pathogenicity, Cell Volume 163, Issue 6, p 1400-1412, 3 Dec. 2015; and International publication numbers WO2016138488A2, WO2015130968, WO/2012/048265, WO/2014/145631 and WO/2014/134351, the contents of which are hereby incorporated by reference in their entirety).
Markers specific for the cell state can be determined for each TF as described previously (e.g., activated, quiescent, exhausted cell state markers). Markers can be determined, for example, by scRNA-seq (e.g., entire programs), flow FISH, reporters, etc.
Therapeutic Compositions and Uses
In certain embodiments, the cells produced according to the present invention are used for treatment, to model a disease, or to screen for therapeutic agents. In certain embodiments, target cells obtained according to the methods described herein may be used for the treatment of a subject in need thereof. In certain embodiments, target cells transdifferentiated according to the methods described herein may be used for the treatment of a subject in need thereof. In certain embodiments, target cells are transferred to a subject to repair, regenerate, replace or replenish a target tissue or cell type. In certain embodiments, transcription factors or agents capable of modulating expression or activity of the transcription factors or downstream pathways are introduced in vivo to generate target cells. In certain embodiments, the TFs or agents are introduced to a specific target region requiring the target cells.
As used herein, a “subject” is a vertebrate, including any member of the class mammalia. As used herein, a “mammal” refers to any mammal including but not limited to human, mouse, rat, sheep, monkey, goat, rabbit, hamster, horse, cow or pig.
In certain embodiments, a cell-based therapeutic includes engraftment of the cells of the present invention. As used herein, the term “engraft” or “engraftment” refers to the process of cell incorporation into a tissue of interest in vivo through contact with existing cells of the tissue.
In certain embodiments, the cell-based therapy may comprise adoptive cell transfer (ACT). As used herein, adoptive cell transfer and adoptive cell therapy are used interchangeably. In certain embodiments, the target cells differentiated according to the methods described herein may be transferred to a subject in need thereof. Where possible, the use of autologous cells benefits the recipient by minimizing graft-versus-host disease (GVHD). In certain embodiments, autologous stem cells are harvested from a subject and the cells are modulated to overexpress the transcription factor(s) to differentiate the stem cells into target cells.
In certain embodiments, the target cells are used as a cell-based therapy to treat a subject suffering from a disease. In certain embodiments, the disease may be treated by infusion of target cell types (see, e.g., US Patent Publication No. 20110091433A1 and Table 2 of application). In certain embodiments, a disease may be treated by inducing target cells in vivo. Target cells may be induced by expressing transcription factors at a specific site of the disease. Transcription factors may be provided to specific cells at a location of disease. In certain embodiments, mRNA is provided. In certain embodiments, transdifferentiation of target cells is performed in vivo.
Diseases
In certain embodiments, the disease to be treated or modeled, or for which therapeutic agents are screened, using the cells produced according to the present invention may be selected from the group consisting of bone marrow failure, hematological conditions, aplastic anemia, beta-thalassemia, diabetes, neuron disease, motor neuron disease, Parkinson's disease, spinal cord injury, muscular dystrophy, kidney disease, liver disease, multiple sclerosis, congestive heart failure, head trauma, lung disease, psoriasis, liver cirrhosis, vision loss, cystic fibrosis, hepatitis C virus, human immunodeficiency virus, inflammatory bowel disease (IBD), and any disorder associated with tissue degeneration.
In certain embodiments, the neuron disease may be a disease in which GABAergic neurons are implicated. In certain embodiments, the disease may be autism, schizophrenia, epilepsy, dementia, Alzheimer's disease, anxiety disorders, or depression (Rudy, et al., Three Groups of Interneurons Account for Nearly 100% of Neocortical GABAergic Neurons, Dev Neurobiol. 2011 Jan. 1; 71(1): 45-61; Xu and Wong, GABAergic Inhibitory Neurons as Therapeutic Targets for Cognitive Impairment in Schizophrenia, Acta Pharmacol Sin. 2018 May; 39(5): 733-753; Fogaça and Duman, Cortical GABAergic Dysfunction in Stress and Depression: New Insights for Therapeutic Interventions, Front Cell Neurosci. 2019; 13: 87; Choi et al., Pathology of nNOS expressing GABAergic neurons in mouse model of Alzheimer's disease, Neuroscience. 2018 Aug. 1; 384: 41-53; Treiman, GABAergic Mechanisms in Epilepsy, Epilepsia. 2001; 42 Suppl 3:8-12; and Coghlan et al., GABA System Dysfunction in Autism and Related Disorders: From Synapse to Symptoms, Neurosci Biobehav Rev. 2012 October; 36(9): 2044-2055).
Aplastic anemia is a rare but fatal bone marrow disorder, marked by pancytopenia and hypocellular bone marrow (Young et al. Blood 2006, 108: 2509-2519). The disorder may be caused by an immune-mediated pathophysiology in which activated type I cytotoxic T cells expressing Th1 cytokines, especially γ-interferon, target the hematopoietic stem cell compartment, leading to bone marrow failure and hence anhematopoiesis (Bacigalupo et al. Hematology 2007, 23-28). The majority of aplastic anemia patients can be treated by transplantation of stem cells obtained from HLA-matched siblings (Locasciulli et al. Haematologica. 2007; 92:11-18).
Thalassemia is an inherited autosomal recessive blood disease marked by a reduced synthesis rate of one of the globin chains that make up hemoglobin. The resulting underproduction of normal globin proteins, often due to mutations in regulatory genes, leads to the formation of abnormal hemoglobin molecules, causing anemia. Different types of thalassemia include alpha thalassemia, beta thalassemia, and delta thalassemia, which affect production of alpha globin, beta globin, and delta globin, respectively.
Diabetes is a syndrome resulting in abnormally high blood sugar levels (hyperglycemia). Diabetes refers to a group of diseases that lead to high blood glucose levels due to defects in either insulin secretion or insulin action in the body. Diabetes is typically separated into two types: type 1 diabetes, marked by a diminished production of insulin, or type 2 diabetes, marked by a resistance to the effects of insulin. Both types lead to hyperglycemia, which largely causes the symptoms generally associated with diabetes, e.g., excessive urine production, resulting compensatory thirst and increased fluid intake, blurred vision, unexplained weight loss, lethargy, and changes in energy metabolism.
Motor neuron diseases are a group of neurological disorders that affect motor neurons. Such diseases include amyotrophic lateral sclerosis (ALS), primary lateral sclerosis (PLS), and progressive muscular atrophy (PMA). ALS is marked by degeneration of both the upper and lower motor neurons, which interrupts signaling to the muscles and results in their weakening and eventual atrophy. PLS is a rare motor neuron disease affecting only upper motor neurons, which causes difficulties with balance, weakness and stiffness in the legs, spasticity, and speech problems. PMA is a subtype of ALS that affects only the lower motor neurons, which can cause muscular atrophy, fasciculations, and weakness.
Parkinson's disease (PD) is a neurodegenerative disorder marked by the loss of the nigrostriatal pathway, resulting from degeneration of dopaminergic neurons within the substantia nigra. The cause of PD is not known, but is associated with the progressive death of dopaminergic (tyrosine hydroxylase (TH) positive) mesencephalic neurons, inducing motor impairment. Hence, PD is characterized by muscle rigidity, tremor, bradykinesia, and potentially akinesia.
Spinal cord injury is characterized by damage to the spinal cord and, in particular, its nerve fibers, resulting in impairment of some or all muscles or nerves below the injury site. Such damage may occur through trauma to the spine that fractures, dislocates, crushes, or compresses one or more of the vertebrae, or through nontraumatic injuries caused by arthritis, cancer, inflammation, or disk degeneration.
Muscular dystrophy (MD) refers to a set of hereditary muscle diseases that weaken skeletal muscles. MD may be characterized by progressive muscle weakness, defects in muscle proteins, muscle cell apoptosis, and tissue atrophy. There are over 100 diseases which exhibit MD characteristics, although nine diseases in particular (Duchenne, Becker, limb girdle, congenital, facioscapulohumeral, myotonic, oculopharyngeal, distal, and Emery-Dreifuss) are classified as MD.
Kidney disease refers to conditions that damage the kidneys and decrease their ability to function, which includes removal of wastes and excess water from the blood, regulation of electrolytes, blood pressure, acid-base balance, and reabsorption of glucose and amino acids. The two main causes of kidney disease are diabetes and high blood pressure, although other causes include glomerulonephritis, lupus, and malformations and obstructions in the kidney.
Multiple sclerosis (MS) is an autoimmune condition in which the immune system attacks the central nervous system, leading to demyelination. MS affects the ability of nerve cells in the brain and spinal cord to communicate with each other, as the body's own immune system attacks and damages the myelin that wraps neuronal axons. When myelin is lost, the axons can no longer effectively conduct signals. This can lead to various neurological symptoms, which usually progress to physical and cognitive disability. In certain embodiments, target cells may include oligodendrocytes.
Congestive heart failure refers to a condition in which the heart cannot pump enough blood to the body's other organs. This condition can result from coronary artery disease, scar tissue on the heart caused by myocardial infarction, high blood pressure, heart valve disease, heart defects, and heart valve infection. Treatment programs typically consist of rest, proper diet, modified daily activities, and drugs such as angiotensin-converting enzyme (ACE) inhibitors, beta blockers, digitalis, diuretics, and vasodilators. However, such treatment programs do not reverse the damage to the heart.
Hepatitis C is an infectious disease of the liver caused by the hepatitis C virus. Hepatitis C can progress to scarring (fibrosis) and advanced scarring (cirrhosis). Cirrhosis can lead to liver failure and other complications such as liver cancer.
Head trauma refers to an injury of the head that may or may not cause injury to the brain. Common causes of head trauma include traffic accidents, home and occupational accidents, falls, and assaults. Various types of problems may result from head trauma, including skull fracture, lacerations of the scalp, subdural hematoma (bleeding below the dura mater), epidural hematoma (bleeding between the dura mater and the skull), cerebral contusion (brain bruise), concussion (temporary loss of function due to trauma), coma, or even death.
Lung disease is a broad term for diseases of the respiratory system, which includes the lung, pleural cavity, bronchial tubes, trachea, upper respiratory tract, and nerves and muscles for breathing. Examples of lung diseases include obstructive lung diseases, in which the bronchial tubes become narrowed; restrictive or fibrotic lung diseases, in which the lung loses compliance, causing incomplete lung expansion and increased lung stiffness; respiratory tract infections, such as the common cold or pneumonia; respiratory tumors, such as lung cancers; pleural cavity diseases; and pulmonary vascular diseases, which affect pulmonary circulation.
Pharmaceutical Compositions
Target cells of the present invention may be combined with various components to produce compositions of the invention. The compositions may be combined with one or more pharmaceutically acceptable carriers or diluents to produce a pharmaceutical composition (which may be for human or animal use). Suitable carriers and diluents include, but are not limited to, isotonic saline solutions, for example phosphate-buffered saline. The composition of the invention may be administered by direct injection. The composition may be formulated for parenteral, intramuscular, intravenous, subcutaneous, intraocular, oral, transdermal administration, or injection into the spinal fluid.
Compositions comprising target cells may be delivered by injection or implantation. Cells may be delivered in suspension or embedded in a support matrix such as natural and/or synthetic biodegradable matrices. Natural matrices include, but are not limited to, collagen matrices. Synthetic biodegradable matrices include, but are not limited to, polyanhydrides and polylactic acid. These matrices may provide support for fragile cells in vivo.
The compositions may also comprise the target cells of the present invention, and at least one pharmaceutically acceptable excipient, carrier, or vehicle.
Delivery may also be controlled, i.e., occur over a period of time, which may range from several minutes to several hours or days. Delivery may be systemic (for example, by intravenous injection) or directed to a particular site of interest. Cells may be introduced in vivo using liposomal transfer.
Target cells may be administered in doses of from 1×10^5 to 1×10^7 cells per kg. For example, a 70 kg patient may be administered 1.4×10^6 cells for reconstitution of tissues. The dosages may be any combination of the target cells listed in this application.
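Because the dose above is stated per kilogram of body weight, the total number of cells for a given subject is the per-kilogram figure multiplied by body weight. A minimal sketch of that conversion, using only the per-kilogram range stated above, is shown below; the function names are hypothetical and the sketch makes no clinical recommendation.

```python
from typing import Tuple

# Per-kilogram dose range stated above (cells per kg of body weight).
MIN_CELLS_PER_KG = 1e5
MAX_CELLS_PER_KG = 1e7


def total_dose_range(body_weight_kg: float) -> Tuple[float, float]:
    """Total cell-number range implied by the per-kg range for a given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body_weight_kg must be positive")
    return MIN_CELLS_PER_KG * body_weight_kg, MAX_CELLS_PER_KG * body_weight_kg


def cells_per_kg(total_cells: float, body_weight_kg: float) -> float:
    """Convert a total dose back into cells per kg for comparison with the stated range."""
    if body_weight_kg <= 0:
        raise ValueError("body_weight_kg must be positive")
    return total_cells / body_weight_kg


if __name__ == "__main__":
    low, high = total_dose_range(70)
    print(f"{low:.1e} to {high:.1e} cells")  # 7.0e+06 to 7.0e+08 cells
```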
Genetic Modifying Agents
In certain embodiments, the one or more modulating agents (e.g., for overexpressing transcription factors, silencing transcription factors or tagging cells with a detectable marker) may be a genetic modifying agent. The genetic modifying agent may comprise a CRISPR system, a zinc finger nuclease system, a TALEN, a meganuclease, or RNAi.
CRISPR
In certain embodiments, a CRISPR system is used to enhance expression or activity of transcription factors. In certain embodiments, the transcription factor expression or activity is enhanced temporarily, such that the enhancement is not permanent. In certain embodiments, expression of the transcription factor from its endogenous gene is enhanced (e.g., by directing an activator to the gene).
In certain embodiments, modification of transcription factor mRNA by a Cas13-deaminase system can be used to modulate transcription factor activity in order to generate target cells (see, e.g., International Patent Publication No. WO 2019/084062). In certain embodiments, the modification silences ubiquitination, methylation, acetylation, succinylation, glycosylation, O-GlcNAc, O-linked glycosylation, iodination, nitrosylation, sulfation, carboxyglutamation, phosphorylation, or a combination thereof. In some embodiments, the modification increases a half-life of a target TF. In certain embodiments, the transcription activity is enhanced by modifying a phosphorylation site on the transcription factor (see, e.g., Hunter and Karin, 1992, The regulation of Transcription by Phosphorylation. Cell, Vol. 70, 375-387; and Whitmarsh and Davis, 2000, Regulation of transcription factor function by phosphorylation. CMLS, Cell. Mol. Life Sci. 57: 1172).
In general, a CRISPR-Cas or CRISPR system as used herein and in documents such as International Patent Publication No. WO 2014/093622 (PCT/US2013/074667) refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is used herein (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)), or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). See, e.g., Shmakov et al. (2015) “Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems”, Molecular Cell, DOI: dx.doi.org/10.1016/j.molcel.2015.10.008.
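The target sequence (protospacer) concept above can be made concrete for one specific effector. The sketch below assumes Streptococcus pyogenes Cas9, which recognizes an NGG protospacer-adjacent motif (PAM) immediately 3' of a 20-nt protospacer; the PAM requirement and the 20-nt length are properties of that enzyme, not statements from this document, and other Cas effectors recognize different motifs. The function name is hypothetical, and only the strand supplied is scanned (the reverse complement would need a separate pass).

```python
from typing import List, Tuple

BASES = set("ACGT")


def find_spcas9_protospacers(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each NGG-adjacent site on the given strand."""
    seq = seq.upper()
    hits = []
    # Each candidate needs spacer_len bases followed by a 3-base PAM.
    for start in range(len(seq) - spacer_len - 3 + 1):
        protospacer = seq[start:start + spacer_len]
        pam = seq[start + spacer_len:start + spacer_len + 3]
        # SpCas9 PAM: any base followed by two guanines (NGG); skip ambiguous bases.
        if pam[0] in BASES and pam[1:] == "GG" and set(protospacer) <= BASES:
            hits.append((start, protospacer, pam))
    return hits


if __name__ == "__main__":
    target = "ACGT" * 5 + "TGG"
    print(find_spcas9_protospacers(target))  # [(0, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

A guide sequence designed against a hit would match the protospacer (with U in place of T in the RNA), while the PAM itself is not part of the guide.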
CRISPR-Cas systems generally fall into two classes based on the architecture of their effector modules, and each class is further subdivided by type and subtype. The two classes are Class 1 and Class 2. Class 1 CRISPR-Cas systems have effector modules composed of multiple Cas proteins, some of which form crRNA-binding complexes, while Class 2 CRISPR-Cas systems include a single, multi-domain crRNA-binding protein.
In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 2 CRISPR-Cas system.
In certain embodiments, a CRISPR system is used to enhance expression or activity of transcription factors (e.g., RFX4, NFIB, ASCL1, PAX6). In certain embodiments, the transcription factor expression or activity is enhanced temporarily, such that the enhancement is not permanent. In certain embodiments, expression of the transcription factor from its endogenous gene is enhanced (e.g., by directing an activator to the gene). In certain embodiments, genes are targeted for downregulation. In certain embodiments, genes are targeted for editing.
Class 1 CRISPR-Cas Systems
In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. Class 1 CRISPR-Cas systems are divided into types I, III, and IV, particularly as described in Makarova et al. 2020. Nat. Rev. Microbiol. 18: 67-83.
The Class 1 systems typically use a multi-protein effector complex, which can, in some embodiments, include ancillary proteins, such as one or more proteins in a complex referred to as a CRISPR-associated complex for antiviral defense (Cascade), one or more adaptation proteins (e.g., Cas1, Cas2, RNA nuclease), and/or one or more accessory proteins (e.g., Cas4, DNA nuclease), CRISPR-associated Rossmann fold (CARF) domain-containing proteins, and/or RNA transcriptase.
The backbone of the Class 1 CRISPR-Cas system effector complexes can be formed by RNA recognition motif domain-containing protein(s) of the repeat-associated mysterious proteins (RAMPs) family subunits, e.g., Cas5, Cas6, and/or Cas7. RAMP proteins are characterized by having one or more RNA recognition motif domains. In some embodiments, multiple copies of RAMPs can be present. In some embodiments, the Class 1 CRISPR-Cas system can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more Cas5, Cas6, and/or Cas7 proteins. In some embodiments, the Cas6 protein is an RNase, which can be responsible for pre-crRNA processing. When present in a Class 1 CRISPR-Cas system, Cas6 can be optionally physically associated with the effector complex.
Class 1 CRISPR-Cas system effector complexes can, in some embodiments, also include a large subunit. The large subunit can be composed of or include a Cas8 and/or Cas10 protein. See, e.g., FIGS. 1 and 2. Koonin E V, Makarova K S. 2019. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087 and Makarova et al. 2020.
Class 1 CRISPR-Cas system effector complexes can, in some embodiments, include a small subunit (for example, Cas11). See, e.g., FIGS. 1 and 2. Koonin E V, Makarova K S. 2019 Origins and evolution of CRISPR-Cas systems. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087.
In some embodiments, the Class 1 CRISPR-Cas system can be a Type I CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-A CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-B CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-C CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-D CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-E CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F1 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F2 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F3 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-G CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a CRISPR-Cas variant, such as a Type I-A, I-B, I-E, I-F, or I-U variant, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems, as previously described.
In some embodiments, the Class 1 CRISPR-Cas system can be a Type III CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-A CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-B CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-C CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-D CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-E CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-F CRISPR-Cas system.
In some embodiments, the Class 1 CRISPR-Cas system can be a Type IV CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-A CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-B CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-C CRISPR-Cas system.
The effector complex of a Class 1 CRISPR-Cas system can, in some embodiments, include a Cas3 protein that is optionally fused to a Cas2 protein, a Cas4, a Cas5, a Cas6, a Cas7, a Cas8, a Cas10, a Cas11, or a combination thereof. In some embodiments, the effector complex of a Class 1 CRISPR-Cas system can have multiple copies, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14, of any one or more Cas proteins.
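The Class 1 types and subtypes enumerated in the embodiments above form a small two-level hierarchy that can be held in a lookup table, for example when annotating which type a given subtype belongs to. The sketch below encodes only the subtypes listed above (it omits the transposon-associated variants), and the names are hypothetical.

```python
from typing import Dict, List

# Class 1 types and subtypes as enumerated in the embodiments above.
CLASS1_SUBTYPES: Dict[str, List[str]] = {
    "I": ["I-A", "I-B", "I-C", "I-D", "I-E", "I-F1", "I-F2", "I-F3", "I-G"],
    "III": ["III-A", "III-B", "III-C", "III-D", "III-E", "III-F"],
    "IV": ["IV-A", "IV-B", "IV-C"],
}


def class1_type_of(subtype: str) -> str:
    """Return the Class 1 type (I, III, or IV) to which a listed subtype belongs."""
    for crispr_type, subtypes in CLASS1_SUBTYPES.items():
        if subtype in subtypes:
            return crispr_type
    raise KeyError(f"{subtype!r} is not a Class 1 subtype listed here")


if __name__ == "__main__":
    print(class1_type_of("I-F2"))   # I
    print(class1_type_of("III-E"))  # III
```

A subtype from a Class 2 system (e.g., II-A) raises `KeyError` rather than being silently assigned to a Class 1 type.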
Class 2 CRISPR-Cas Systems
The compositions, systems, and methods described in greater detail elsewhere herein can be designed and adapted for use with Class 2 CRISPR-Cas systems. Thus, in some embodiments, the CRISPR-Cas system is a Class 2 CRISPR-Cas system. Class 2 systems are distinguished from Class 1 systems in that they have a single, large, multi-domain effector protein. In certain example embodiments, the Class 2 system can be a Type II, Type V, or Type VI system, which are described in Makarova et al. “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants” Nature Reviews Microbiology, 18:67-83 (February 2020), incorporated herein by reference. Each type of Class 2 system is further divided into subtypes. See Makarova et al. 2020, particularly at Figure 2. Class 2, Type II systems can be divided into 4 subtypes: II-A, II-B, II-C1, and II-C2. Class 2, Type V systems can be divided into 17 subtypes: V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F1 (V-U3), V-F2, V-F3, V-G, V-H, V-I, V-K (V-U5), V-U1, V-U2, and V-U4. Class 2, Type VI systems can be divided into 5 subtypes: VI-A, VI-B1, VI-B2, VI-C, and VI-D.
The distinguishing feature of these types is that their effector complexes consist of a single, large, multi-domain protein. Type V systems differ from Type II systems: Type II effectors (e.g., Cas9) contain two nuclease domains that are each responsible for cleavage of one strand of the target DNA, with the HNH nuclease inserted inside the RuvC-like nuclease domain sequence, whereas Type V effectors (e.g., Cas12) contain only a RuvC-like nuclease domain that cleaves both strands. Type VI effectors (Cas13) are unrelated to the effectors of Type II and V systems, contain two HEPN domains, and target RNA. Cas13 proteins also display collateral activity that is triggered by target recognition. Some Type V systems have also been found to possess this collateral activity toward single-stranded DNA in vitro.
In some embodiments, the Class 2 system is a Type II system. In some embodiments, the Type II CRISPR-Cas system is a II-A CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-B CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C1 CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C2 CRISPR-Cas system. In some embodiments, the Type II system is a Cas9 system. In some embodiments, the Type II system includes a Cas9.
In some embodiments, the Class 2 system is a Type V system. In some embodiments, the Type V CRISPR-Cas system is a V-A CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-C CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-D CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-E CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 (V-U3) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F3 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-G CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-H CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-I CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-K (V-U5) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U4 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system includes a Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), CasX, and/or Cas14.
In some embodiments, the Class 2 system is a Type VI system. In some embodiments, the Type VI CRISPR-Cas system is a VI-A CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B1 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B2 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-C CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-D CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system includes a Cas13a (C2c2), Cas13b (Group 29/30), Cas13c, and/or Cas13d.
Specialized Cas-Based Systems
In some embodiments, the system is a Cas-based system that is capable of performing a specialized function or activity. For example, the Cas protein may be fused, operably coupled to, or otherwise associated with one or more functional domains. In certain example embodiments, the Cas protein may be a catalytically dead Cas protein (“dCas”) and/or have nickase activity. A nickase is a Cas protein that cuts only one strand of a double stranded target. In such embodiments, the dCas or nickase provides a sequence-specific targeting functionality that delivers the functional domain to or proximate to a target sequence. Example functional domains that may be fused to, operably coupled to, or otherwise associated with a Cas protein can be or include, but are not limited to, a nuclear localization signal (NLS) domain, a nuclear export signal (NES) domain, a translational activation domain, a transcriptional activation domain (e.g., VP64, p65, MyoD1, HSF1, RTA, and SET7/9), a translation initiation domain, a transcriptional repression domain (e.g., a KRAB domain, NuE domain, NcoR domain, and a SID domain such as a SID4X domain), a nuclease domain (e.g., FokI), a histone modification domain (e.g., a histone acetyltransferase), a light inducible/controllable domain, a chemically inducible/controllable domain, a transposase domain, a homologous recombination machinery domain, a recombinase domain, an integrase domain, and combinations thereof. Methods for generating catalytically dead Cas9 or a nickase Cas9 (WO 2014/204725, Ran et al. Cell. 2013 Sep. 12; 154(6):1380-1389), Cas12 (Liu et al. Nature Communications, 8, 2095 (2017)), and Cas13 (International Patent Publication Nos. WO 2019/005884, WO2019/060746) are known in the art and incorporated herein by reference.
In some embodiments, the functional domains can have one or more of the following activities: methylase activity, demethylase activity, translation activation activity, translation initiation activity, translation repression activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, molecular switch activity, chemical inducibility, light inducibility, and nucleic acid binding activity. In some embodiments, the one or more functional domains may comprise epitope tags or reporters. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporters include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and auto-fluorescent proteins including blue fluorescent protein (BFP).
The one or more functional domain(s) may be positioned at, near, and/or in proximity to a terminus of the effector protein (e.g., a Cas protein). In embodiments having two or more functional domains, each can be positioned at or near or in proximity to a terminus of the effector protein (e.g., a Cas protein). In some embodiments, such as those where the functional domain is operably coupled to the effector protein, the one or more functional domains can be tethered or linked via a suitable linker (including, but not limited to, GlySer linkers) to the effector protein (e.g., a Cas protein). When there is more than one functional domain, the functional domains can be the same or different. In some embodiments, all the functional domains are the same. In some embodiments, all of the functional domains are different from each other. In some embodiments, at least two of the functional domains are different from each other. In some embodiments, at least two of the functional domains are the same as each other.
Other suitable functional domains can be found, for example, in International Patent Publication No. WO 2019/018423.
Split CRISPR-Cas Systems
In some embodiments, the CRISPR-Cas system is a split CRISPR-Cas system. See e.g., Zetsche et al., 2015. Nat. Biotechnol. 33(2): 139-142, the compositions and techniques of which can be used in and/or adapted for use with the present invention. Split CRISPR-Cas proteins are set forth in further detail herein and in documents incorporated herein by reference. In certain embodiments, each part of a split CRISPR protein is attached to a member of a specific binding pair, and when bound with each other, the members of the specific binding pair maintain the parts of the CRISPR protein in proximity. In certain embodiments, each part of a split CRISPR protein is associated with an inducible binding pair. An inducible binding pair is one which is capable of being switched “on” or “off” by a protein or small molecule that binds to both members of the inducible binding pair. In some embodiments, CRISPR proteins are preferably split between domains, leaving domains intact. In particular embodiments, said Cas split domains (e.g., RuvC and HNH domains in the case of Cas9) can be simultaneously or sequentially introduced into the cell such that said split Cas domain(s) process the target nucleic acid sequence in the cell. The reduced size of the split Cas compared to the wild type Cas allows other methods of delivery of the systems to the cells, such as the use of cell penetrating peptides as described herein.
Base Editing
In some embodiments, a polynucleotide of the present invention described elsewhere herein (e.g., RFX4, NFIB, ASCL1, PAX6) can be modified using a base editing system. In some embodiments, a Cas protein is connected or fused to a nucleotide deaminase. Thus, in some embodiments the Cas-based system can be a base editing system. As used herein “base editing” refers generally to the process of polynucleotide modification via a CRISPR-Cas-based or Cas-based system that does not include excising nucleotides to make the modification. Base editing can convert base pairs at precise locations without generating excess undesired editing byproducts that can be made using traditional CRISPR-Cas systems.
In certain example embodiments, the nucleotide deaminase may be a DNA base editor used in combination with a DNA binding Cas protein such as, but not limited to, Class 2 Type II and Type V systems. Two classes of DNA base editors are generally known: cytosine base editors (CBEs) and adenine base editors (ABEs). CBEs convert a C·G base pair into a T·A base pair (Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Li et al. Nat. Biotech. 36:324-327) and ABEs convert an A·T base pair to a G·C base pair. Collectively, CBEs and ABEs can mediate all four possible transition mutations (C to T, A to G, T to C, and G to A). Rees and Liu. 2018. Nat. Rev. Genet. 19(12): 770-788, particularly at FIGS. 1b, 2a-2c, 3a-3f, and Table 1. In some embodiments, the base editing system includes a CBE and/or an ABE. Base editors also generally do not need a DNA donor template or rely on homology-directed repair. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471. Upon binding to a target locus in the DNA, base pairing between the guide RNA of the system and the target DNA strand leads to displacement of a small segment of ssDNA in an “R-loop”. Nishimasu et al. Cell. 156:935-949. DNA bases within the ssDNA bubble are modified by the enzyme component, such as a deaminase. In some systems, the Cas protein can be a partially catalytically disabled variant having nickase functionality, which generates a nick in the non-edited DNA strand to induce cells to repair the non-edited strand using the edited strand as a template. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471.
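The C·G to T·A and A·T to G·C conversions described above can be illustrated with a toy model. The following Python sketch is not part of the disclosed methods; it assumes an idealized editor that converts every target base inside a fixed activity window of the protospacer, written 5′→3′ on the non-target (displaced ssDNA) strand. The window of positions 4-8, counting the PAM-distal end as position 1, is an illustrative default roughly matching the window reported for early CBEs; real editors differ in window width and act probabilistically. The function name `simulate_base_edit` is hypothetical.

```python
def simulate_base_edit(protospacer: str, editor: str,
                       window: tuple = (4, 8)) -> str:
    """Idealized base edit of a protospacer on the displaced (non-target) strand.

    protospacer: 5'->3' sequence, position 1 = PAM-distal end.
    editor: "CBE" (C->T on this strand, i.e., C:G -> T:A) or
            "ABE" (A->G on this strand, i.e., A:T -> G:C).
    window: inclusive 1-based positions assumed accessible to the deaminase.
    """
    conversions = {"CBE": ("C", "T"), "ABE": ("A", "G")}
    if editor not in conversions:
        raise ValueError(f"unknown editor: {editor!r}")
    source, product = conversions[editor]
    start, end = window
    bases = list(protospacer.upper())
    # Convert every matching base inside the window; real editing is stochastic.
    for i in range(start - 1, min(end, len(bases))):
        if bases[i] == source:
            bases[i] = product
    return "".join(bases)


# Hypothetical 20-nt protospacer used only for illustration.
site = "AACACACACA" + "A" * 10
print(simulate_base_edit(site, "CBE"))  # C at positions 5 and 7 become T
print(simulate_base_edit(site, "ABE"))  # A at positions 4, 6 and 8 become G
```

Cytosines at positions 3 and 9 are left untouched because they fall outside the assumed window, which is the practical reason guide placement matters for base editing.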
Base editors may be further engineered to optimize conversion of nucleotides (e.g., A:T to G:C). Richter et al. 2020. Nature Biotechnology. doi.org/10.1038/s41587-020-0453-z.
Other example Type V base editing systems are described in International Patent Publication Nos. WO 2018/213708 and WO 2018/213726, and International Patent Application Nos. PCT/US2018/067207, PCT/US2018/067225, and PCT/US2018/067307, which are incorporated herein by reference.
In certain example embodiments, the base editing system may be an RNA base editing system. As with DNA base editors, a nucleotide deaminase capable of converting nucleotide bases may be fused to a Cas protein. However, in these embodiments, the Cas protein will need to be capable of binding RNA. Example RNA binding Cas proteins include, but are not limited to, RNA-binding Cas9s such as Francisella novicida Cas9 (“FnCas9”), and Class 2 Type VI Cas systems. The nucleotide deaminase may be a cytidine deaminase, an adenosine deaminase, or an adenosine deaminase engineered to have cytidine deaminase activity. In certain example embodiments, the RNA base editor may be used to delete or introduce a post-translational modification site encoded by the expressed mRNA. In contrast to DNA base editors, whose edits are permanent in the modified cell, RNA base editors can provide edits where finer temporal control may be needed, for example in modulating a particular immune response. Example Type VI RNA base editing systems are described in Cox et al. 2017. Science 358: 1019-1027, International Patent Publication Nos. WO 2019/005884, WO 2019/005886, and WO 2019/071048, and International Patent Application Nos. PCT/US20018/05179 and PCT/US2018/067207, which are incorporated herein by reference. An example FnCas9 system that may be adapted for RNA base editing purposes is described in International Patent Publication No. WO 2016/106236, which is incorporated herein by reference.
An example method for delivery of base-editing systems, including use of a split-intein approach to divide CBE and ABE into reconstitutable halves, is described in Levy et al. Nature Biomedical Engineering doi.org/10.1038/s41441-019-0505-5 (2019), which is incorporated herein by reference.
Prime Editing
In some embodiments, a polynucleotide of the present invention described elsewhere herein (e.g., RFX4, NFIB, ASCL1, PAX6) can be modified using a prime editing system (see e.g., Anzalone et al. 2019. Nature. 576: 149-157). Like base editing systems, prime editing systems can perform targeted modification of a polynucleotide without generating double-stranded breaks and do not require donor templates. Further, prime editing can operate via a “search-and-replace” methodology and can mediate targeted insertions, deletions, all 12 possible base-to-base conversions, and combinations thereof. Generally, a prime editing system, as exemplified by PE1, PE2, and PE3 (Id.), can include a reverse transcriptase fused or otherwise coupled or associated with an RNA-programmable nickase, and a prime-editing extended guide RNA (pegRNA) to facilitate direct copying of genetic information from the extension on the pegRNA into the target polynucleotide. Embodiments that can be used with the present invention include these and variants thereof. Prime editing can have the advantage of lower off-target activity than traditional CRISPR-Cas systems, along with fewer byproducts and similar or greater efficiency as compared to traditional CRISPR-Cas systems.
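The figure of 12 base-to-base conversions, and the four transitions that CBEs and ABEs can mediate together, follow from simple enumeration over the four DNA bases. The short Python sketch below makes the arithmetic explicit: 4 × 3 = 12 ordered substitutions, of which 4 are purine-to-purine or pyrimidine-to-pyrimidine transitions and 8 are transversions. The helper name is illustrative only.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitutions() -> dict:
    """Map every ordered (reference, alternate) base pair to its mutation class."""
    table = {}
    for ref, alt in permutations("ACGT", 2):  # 4 x 3 = 12 ordered pairs
        pair = {ref, alt}
        same_class = pair <= PURINES or pair <= PYRIMIDINES
        table[(ref, alt)] = "transition" if same_class else "transversion"
    return table


subs = classify_substitutions()
transitions = sorted(k for k, v in subs.items() if v == "transition")
print(len(subs), transitions)
```

The eight transversions are the conversions that fall outside the reach of CBEs and ABEs but, per the text, within the scope of prime editing.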
In some embodiments, the prime editing guide molecule can both specify the target polynucleotide information (e.g., sequence) and contain a new polynucleotide cargo that replaces target polynucleotides. To initiate transfer from the guide molecule to the target polynucleotide, the PE system can nick the target polynucleotide at a target site to expose a 3′ hydroxyl group, which can prime reverse transcription of an edit-encoding extension region of the guide molecule (e.g., a prime editing guide molecule or peg guide molecule) directly into the target site in the target polynucleotide. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at FIGS. 1b, 1c, related discussion, and Supplementary discussion.
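The nick-and-prime mechanism above determines how the edit-encoding extension of a pegRNA relates to the target. The Python sketch below assembles that 3′ extension (reverse transcriptase template followed by primer binding site, 5′→3′) under assumptions not stated in this section: an SpCas9-style nickase with a 20-nt protospacer, a nick between protospacer positions 17 and 18 (3 nt 5′ of the PAM) on the PAM-containing strand, and an illustrative primer binding site length of 13 nt. Sequences are handled in the DNA alphabet, the spacer and scaffold portions of the pegRNA are not modeled, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_peg_extension(protospacer: str, new_flap: str, pbs_len: int = 13) -> dict:
    """Sketch of a pegRNA 3' extension for an assumed SpCas9-style nickase.

    protospacer: 20-nt protospacer on the nicked (PAM-containing) strand, 5'->3'.
    new_flap: desired sequence of the nicked strand starting immediately 3' of
        the nick (protospacer position 18 onward), 5'->3', including the edit.
    Returns DNA-alphabet strings; transcribe T->U for the RNA.
    """
    if len(protospacer) != 20:
        raise ValueError("this sketch assumes a 20-nt protospacer")
    nick = 17  # assumed: nick between positions 17 and 18, 3 nt 5' of the PAM
    if not 1 <= pbs_len <= nick:
        raise ValueError("PBS must lie within the protospacer 5' of the nick")
    # The freed 3' end of the nicked strand anneals to the primer binding site.
    primed_region = protospacer[nick - pbs_len:nick]
    pbs = revcomp(primed_region)
    # Reverse transcription copies the template into the new 3' flap, so the
    # template is the reverse complement of the desired flap sequence.
    rtt = revcomp(new_flap)
    return {"pbs": pbs, "rtt": rtt, "extension": rtt + pbs}  # 5'->3'


# Hypothetical example: protospacer positions 5-17 are C, so the PBS is 13 G.
parts = design_peg_extension("GGGG" + "C" * 13 + "AAA", new_flap="TTA")
print(parts["extension"])
```

Placing the template 5′ of the primer binding site reflects the direction of reverse transcription: synthesis starts at the annealed 3′ hydroxyl and proceeds along the extension toward the pegRNA scaffold.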
In some embodiments, a prime editing system can be composed of a Cas polypeptide having nickase activity, a reverse transcriptase, and a guide molecule. The Cas polypeptide can lack nuclease activity. The guide molecule can include a target binding sequence as well as a primer binding sequence and a template containing the edited polynucleotide sequence. The guide molecule, Cas polypeptide, and/or reverse transcriptase can be coupled together or otherwise associated with each other to form an effector complex and edit a target sequence. In some embodiments, the Cas polypeptide is a Class 2, Type V Cas polypeptide. In some embodiments, the Cas polypeptide is a Cas9 polypeptide (e.g., is a Cas9 nickase). In some embodiments, the Cas polypeptide is fused to the reverse transcriptase. In some embodiments, the Cas polypeptide is linked to the reverse transcriptase.
In some embodiments, the prime editing system can be a PE1 system or variant thereof, a PE2 system or variant thereof, or a PE3 (e.g., PE3, PE3b) system. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at pgs. 2-3, FIGS. 2a, 3a-3f, 4a-4b, Extended Data FIGS. 3a-3b, 4.
The peg guide molecule can be about 10 to about 200 or more nucleotides in length, such as 10 to/or 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, or 200 or more nucleotides in length. Optimization of the peg guide molecule can be accomplished as described in Anzalone et al. 2019. Nature. 576: 149-157, particularly at pg. 3, FIG. 2a-2b, and Extended Data FIGS. 5a-c.
CAST Systems
In some embodiments, a polynucleotide of the present invention described elsewhere herein (e.g., RFX4, NFIB, ASCL1, PAX6) can be modified using a CRISPR-associated transposase (“CAST”) system, such as any of those described in PCT/US2019/066835. A CAST system can include a Cas protein that is catalytically inactive, or engineered to be catalytically active, and further comprises a transposase (or subunits thereof) that catalyzes RNA-guided DNA transposition. Such systems are able to insert DNA sequences at a target site in a DNA molecule without relying on host cell repair machinery. CAST systems can be Class 1 or Class 2 CAST systems. An example Class 1 system is described in Klompe et al. Nature, doi:10.1038/s41586-019-1323, which is incorporated herein by reference. An example Class 2 system is described in Strecker et al. Science. doi:10.1126/science.aax9181 (2019), and International Patent Application No. PCT/US2019/066835, which are incorporated herein by reference.
Guide Molecules
The CRISPR-Cas or Cas-based system described herein can, in some embodiments, include one or more guide molecules. The terms guide molecule, guide sequence, and guide polynucleotide refer to polynucleotides capable of guiding Cas to a target genomic locus and are used interchangeably as in foregoing cited documents such as WO 2014/093622 (PCT/US2013/074667). In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. The guide molecule can be a polynucleotide.
The ability of a guide sequence (within a nucleic acid-targeting guide RNA) to direct sequence-specific binding of a nucleic acid-targeting complex to a target nucleic acid sequence may be assessed by any suitable assay. For example, the components of a nucleic acid-targeting CRISPR system sufficient to form a nucleic acid-targeting complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target nucleic acid sequence, such as by transfection with vectors encoding the components of the nucleic acid-targeting complex, followed by an assessment of preferential targeting (e.g., cleavage) within the target nucleic acid sequence, such as by Surveyor assay (Qiu et al. 2004. BioTechniques. 36(4):702-707). Similarly, cleavage of a target nucleic acid sequence may be evaluated in a test tube by providing the target nucleic acid sequence, components of a nucleic acid-targeting complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible and will occur to those skilled in the art.
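Cleavage-based readouts such as the Surveyor assay are commonly converted into an estimated modification frequency. The Python sketch below uses a widely used approximation that is not specified in this section: it assumes modified and unmodified amplicons reanneal at random, so the cleaved fraction f_cut relates to the indel fraction as f_cut = 1 − (1 − indel)^2, which rearranges to indel = 1 − √(1 − f_cut). Band intensities would come from gel densitometry; the function name and example values are hypothetical.

```python
import math


def estimate_indel_fraction(parent_band: float, cleaved_bands: list) -> float:
    """Estimate indel frequency from mismatch-nuclease (e.g., Surveyor) band intensities.

    parent_band: background-subtracted intensity of the uncleaved product.
    cleaved_bands: intensities of the cleavage products.
    Assumes random reannealing, so fraction_cleaved = 1 - (1 - indel)**2.
    """
    cleaved = sum(cleaved_bands)
    total = parent_band + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = cleaved / total
    return 1 - math.sqrt(1 - fraction_cleaved)


# Comparing a test guide with a control guide (hypothetical intensities).
test = estimate_indel_fraction(75.0, [15.0, 10.0])
control = estimate_indel_fraction(98.0, [1.0, 1.0])
print(f"test guide: {test:.1%}, control guide: {control:.1%}")
```

Because the relationship is nonlinear, the estimated indel fraction is always smaller than the raw cleaved fraction; comparing guides on the estimate rather than on band ratios keeps the test and control readouts on the same scale.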
In some embodiments, the guide molecule is an RNA. The guide molecule(s) (also referred to interchangeably herein as guide polynucleotide and guide sequence) that are included in the CRISPR-Cas or Cas-based system can be any polynucleotide sequence having sufficient complementarity with a target nucleic acid sequence to hybridize with the target nucleic acid sequence and direct sequence-specific binding of a nucleic acid-targeting complex to the target nucleic acid sequence. In some embodiments, the degree of complementarity, when optimally aligned using a suitable alignment algorithm, can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
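As a simplified illustration of the degree of complementarity discussed above, the Python sketch below scores an ungapped, antiparallel alignment of a guide against an equal-length target strand. It is a deliberate simplification: the text contemplates optimal, possibly gapped, alignment with algorithms such as Smith-Waterman or Needleman-Wunsch, which this sketch does not implement, and it counts only Watson-Crick pairs (G·U wobble pairs are scored as mismatches). The function name is hypothetical.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Ungapped percent complementarity between a guide and the strand it pairs with.

    Both sequences are given 5'->3'; U is treated as T. Because the duplex is
    antiparallel, the guide's 5' base is compared with the target's 3' base.
    """
    g = guide.upper().replace("U", "T")
    t = target_strand.upper().replace("U", "T")[::-1]  # read target 3'->5'
    if not g or len(g) != len(t):
        raise ValueError("this ungapped sketch needs equal-length, non-empty sequences")
    matches = sum((a, b) in WATSON_CRICK for a, b in zip(g, t))
    return 100.0 * matches / len(g)


# Hypothetical 10-nt guide against a perfectly matched and a singly mismatched strand.
print(percent_complementarity("ACGUACGUAC", "GTACGTACGT"))
print(percent_complementarity("ACGUACGUAC", "GTACGTACGA"))
```

A score computed this way can be compared against the thresholds listed above (for example, about or more than about 95%), with the caveat that a gapped optimal alignment may score higher for sequences containing bulges.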
A guide sequence, and hence a nucleic acid-targeting guide, may be selected to target any target nucleic acid sequence. The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmic RNA (scRNA). In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.
In some embodiments, a nucleic acid-targeting guide is selected to reduce the degree of secondary structure within the nucleic acid-targeting guide. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the nucleic acid-targeting guide participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and G M Church, 2009, Nature Biotechnology 27(12): 1151-62).
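Once a structure has been predicted, the fraction of guide nucleotides engaged in self-complementary pairing can be read directly from dot-bracket notation, the format in which folding tools such as RNAfold report structures. The Python sketch below assumes the structure has already been predicted and contains only nested pairs written as "(" and ")" (pseudoknot brackets are rejected); the function name and example structure are hypothetical.

```python
def fraction_self_paired(dot_bracket: str) -> float:
    """Fraction of nucleotides that are base paired in a dot-bracket structure."""
    if not dot_bracket:
        raise ValueError("empty structure")
    open_positions = []
    paired = 0
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            open_positions.pop()
            paired += 2  # each closed pair accounts for two nucleotides
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if open_positions:
        raise ValueError("unmatched '(' in structure")
    return paired / len(dot_bracket)


# Hypothetical 12-nt hairpin: 8 of 12 nucleotides are paired.
structure = "((((....))))"
fraction = fraction_self_paired(structure)
print(f"{fraction:.0%} paired; within a 75% ceiling: {fraction <= 0.75}")
```

The returned fraction maps directly onto the percentage ceilings listed above, so candidate guides can be filtered by comparing it with the chosen threshold.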
In certain embodiments, a guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a guide sequence or spacer sequence. In certain embodiments, the guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat sequence fused or linked to a guide sequence or spacer sequence. In certain embodiments, the direct repeat sequence may be located upstream (i.e., 5′) from the guide sequence or spacer sequence. In other embodiments, the direct repeat sequence may be located downstream (i.e., 3′) from the guide sequence or spacer sequence.
In certain embodiments, the crRNA comprises a stem loop, preferably a single stem loop. In certain embodiments, the direct repeat sequence forms a stem loop, preferably a single stem loop.
In certain embodiments, the spacer length of the guide RNA is from 15 to 35 nt. In certain embodiments, the spacer length of the guide RNA is at least 15 nucleotides. In certain embodiments, the spacer length is from 15 to 17 nt, e.g., 15, 16, or 17 nt, from 17 to 20 nt, e.g., 17, 18, 19, or 20 nt, from 20 to 24 nt, e.g., 20, 21, 22, 23, or 24 nt, from 23 to 25 nt, e.g., 23, 24, or 25 nt, from 24 to 27 nt, e.g., 24, 25, 26, or 27 nt, from 27 to 30 nt, e.g., 27, 28, 29, or 30 nt, from 30 to 35 nt, e.g., 30, 31, 32, 33, 34, or 35 nt, or 35 nt or longer.
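The guide architectures described above, a direct repeat placed 5′ or 3′ of a spacer of defined length, can be expressed as a simple assembly-and-validation step. In the Python sketch below, the direct repeat is supplied by the user, the 15-35 nt bounds mirror one embodiment above rather than the requirement of any particular Cas protein, and the function name and example sequences are hypothetical placeholders.

```python
def assemble_crrna(direct_repeat: str, spacer: str, repeat_position: str = "5prime",
                   min_spacer: int = 15, max_spacer: int = 35) -> str:
    """Join a direct repeat (DR) and a spacer into a crRNA, 5'->3'.

    repeat_position: "5prime" places the DR upstream of the spacer,
        "3prime" places it downstream.
    """
    if not min_spacer <= len(spacer) <= max_spacer:
        raise ValueError(
            f"spacer length {len(spacer)} outside {min_spacer}-{max_spacer} nt")
    dr, sp = direct_repeat.upper(), spacer.upper()
    if repeat_position == "5prime":
        return dr + sp
    if repeat_position == "3prime":
        return sp + dr
    raise ValueError("repeat_position must be '5prime' or '3prime'")


# Placeholder sequences, not a real direct repeat or target.
print(assemble_crrna("GGGGAAAA", "C" * 20))
print(assemble_crrna("GGGGAAAA", "C" * 20, repeat_position="3prime"))
```

The length bounds are parameters so that embodiments with spacers of 35 nt or longer can be accommodated by raising `max_spacer`.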
The “tracrRNA” sequence or analogous terms includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize. In some embodiments, the degree of complementarity between the tracrRNA sequence and crRNA sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and crRNA sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin.
In general, degree of complementarity is with reference to the optimal alignment of the sca sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the sca sequence or tracr sequence. In some embodiments, the degree of complementarity between the tracr sequence and sca sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%; a guide RNA or sgRNA can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length; or a guide RNA or sgRNA can be less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length; and tracr RNA can be 30 or 50 nucleotides in length. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5% or 95% or 95.5% or 96% or 96.5% or 97% or 97.5% or 98% or 98.5% or 99% or 99.5% or 99.9%, or 100%. Off target is less than 100% or 99.9% or 99.5% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% or 94% or 93% or 92% or 91% or 90% or 89% or 88% or 87% or 86% or 85% or 84% or 83% or 82% or 81% or 80% complementarity between the sequence and the guide, with it being advantageous that off target is 100% or 99.9% or 99.5% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% complementarity between the sequence and the guide.
In some embodiments according to the invention, the guide RNA (capable of guiding Cas to a target locus) may comprise (1) a guide sequence capable of hybridizing to a genomic target locus in the eukaryotic cell; (2) a tracr sequence; and (3) a tracr mate sequence. All of (1) to (3) may reside in a single RNA, i.e., an sgRNA (arranged in a 5′ to 3′ orientation), or the tracr RNA may be a different RNA than the RNA containing the guide and tracr mate sequence. The tracr hybridizes to the tracr mate sequence and directs the CRISPR/Cas complex to the target sequence. Where the tracr RNA is on a different RNA than the RNA containing the guide and tracr mate sequence, the length of each RNA may be optimized to be shortened from their respective native lengths, and each may be independently chemically modified to protect from degradation by cellular RNase or otherwise increase stability.
Many modifications to guide sequences are known in the art and are further contemplated within the context of this invention. Various modifications may be used to increase the specificity of binding to the target sequence and/or increase the activity of the Cas protein and/or reduce off-target effects. Example guide sequence modifications are described in International Patent Application No. PCT/US2019/045582, specifically paragraphs [0178]-[0333], which is incorporated herein by reference.
Target Sequences, PAMs, and PFSs
Target Sequences
In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise RNA polynucleotides. The term “target RNA” refers to an RNA polynucleotide being or comprising the target sequence. In other words, the target polynucleotide can be a polynucleotide, or a part of a polynucleotide, to which a part of the guide sequence is designed to have complementarity and to which the effector function mediated by the complex comprising the CRISPR effector protein and a guide molecule is to be directed. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell.
The guide sequence can specifically bind a target sequence in a target polynucleotide. The target polynucleotide may be DNA. The target polynucleotide may be RNA. The target polynucleotide can have one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) target sequences. The target polynucleotide can be on a vector. The target polynucleotide can be genomic DNA. The target polynucleotide can be episomal. Other forms of the target polynucleotide are described elsewhere herein.
The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmic RNA (scRNA). In some preferred embodiments, the target sequence (also referred to herein as a target polynucleotide) may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.
PAM and PFS Elements
PAM elements are sequences that can be recognized and bound by Cas proteins. Cas proteins/effector complexes can then unwind the dsDNA at a position adjacent to the PAM element. It will be appreciated that Cas proteins that target RNA, and systems that include them, do not require PAM sequences (Marraffini et al. 2010. Nature. 463:568-571). Instead, many rely on PFSs, which are discussed elsewhere herein. In certain embodiments, the target sequence should be associated with a PAM (protospacer adjacent motif) or PFS (protospacer flanking sequence or site), that is, a short sequence recognized by the CRISPR complex. Depending on the nature of the CRISPR-Cas protein, the target sequence should be selected such that its complementary sequence in the DNA duplex (also referred to herein as the non-target sequence) is upstream or downstream of the PAM. In some embodiments, the complementary sequence of the target sequence is downstream or 3′ of the PAM; in other embodiments, it is upstream or 5′ of the PAM. The precise sequence and length requirements for the PAM differ depending on the Cas protein used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (that is, the target sequence). Examples of the natural PAM sequences for different Cas proteins are provided herein below and the skilled person will be able to identify further PAM sequences for use with a given Cas protein.
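A sketch of how candidate target sites adjacent to a PAM might be enumerated is shown below in Python. It assumes a PAM located 3′ of the protospacer on the non-target strand, written in IUPAC code (the default NGG is the well-known SpCas9 PAM), and a fixed 20-nt protospacer; PAMs located 5′ of the protospacer, such as those of Cas12a, would require the search to be mirrored. Coordinates for "-" strand hits refer to the reverse-complemented sequence. The function name is hypothetical, and the sketch is not a substitute for dedicated guide design tools.

```python
import re

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq: str, pam: str = "NGG", spacer_len: int = 20) -> list:
    """Return (strand, start, protospacer, pam) for every 3'-PAM site on both strands.

    start is the 0-based position of the protospacer's 5' end on that strand's
    5'->3' sequence ('-' hits are indexed on the reverse complement).
    """
    # A zero-width lookahead lets overlapping PAM matches all be reported.
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in pam.upper()) + ")")
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            p = m.start()
            if p >= spacer_len:  # a full-length protospacer must fit 5' of the PAM
                hits.append((strand, p - spacer_len, s[p - spacer_len:p],
                             s[p:p + len(pam)]))
    return hits


print(find_protospacers("A" * 20 + "TGG" + "A" * 5))
```

Scanning both strands matters because the PAM can lie on either strand of a double-stranded target; the same function with a different IUPAC string covers other 3′-PAM effectors.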
The ability to recognize different PAM sequences depends on the Cas polypeptide(s) included in the system. See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517. Table 15 below shows several Cas polypeptides and the PAM sequences they recognize.
In a preferred embodiment, the CRISPR effector protein may recognize a 3′ PAM. In certain embodiments, the CRISPR effector protein may recognize a 3′ PAM which is 5′H, wherein H is A, C or U.
Further, engineering of the PAM Interacting (PI) domain on the Cas protein may allow programming of PAM specificity, improve target site recognition fidelity, and increase the versatility of the CRISPR-Cas protein, for example as described for Cas9 in Kleinstiver B P et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015 Jul. 23; 523(7561):481-5. doi: 10.1038/nature14592. As further detailed herein, the skilled person will understand that Cas13 proteins may be modified analogously. Gao et al., “Engineered Cpf1 Enzymes with Altered PAM Specificities,” bioRxiv 091611; doi: http://dx.doi.org/10.1101/091611 (Dec. 4, 2016). Doench et al. created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes, and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. The authors showed that optimization of the PAM improved activity and also provided an online tool for designing sgRNAs.
PAM sequences can be identified in a polynucleotide using an appropriate design tool; such tools are available commercially as well as online. Freely available tools include, but are not limited to, CRISPRFinder and CRISPRTarget. Mojica et al. 2009. Microbiology. 155(Pt. 3):733-740; Altschul et al. 1990. J. Mol. Biol. 215:403-410; Biswas et al. 2013. RNA Biol. 10:817-827; and Grissa et al. 2007. Nucleic Acids Res. 35:W52-57. Experimental approaches to PAM identification can include, but are not limited to, plasmid depletion assays (Jiang et al. 2013. Nat. Biotechnol. 31:233-239; Esvelt et al. 2013. Nat. Methods. 10:1116-1121; Kleinstiver et al. 2015. Nature. 523:481-485), screening with a high-throughput in vivo model called PAM-SCANR (Pattanayak et al. 2013. Nat. Biotechnol. 31:839-843 and Leenay et al. 2016. Mol. Cell. 16:253), and negative screening (Zetsche et al. 2015. Cell. 163:759-771).
As previously mentioned, CRISPR-Cas systems that target RNA do not typically rely on PAM sequences. Instead, such systems, including Type VI CRISPR-Cas systems, typically recognize protospacer flanking sites (PFSs), which represent an analogue of PAMs for RNA targets. Type VI CRISPR-Cas systems employ a Cas13. Some Cas13 proteins analyzed to date, such as Cas13a (C2c2) identified from Leptotrichia shahii (LshCas13a), have a specific discrimination against G at the 3′ end of the target RNA. The presence of a C at the corresponding crRNA repeat site can indicate that nucleotide pairing at this position is rejected. However, some Cas13 proteins (e.g., LwaCas13a and PspCas13b) do not seem to have a PFS preference. See, e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.
Some Type VI proteins, such as subtype B, have 5′-recognition of D (G, T, A) and a 3′-motif requirement of NAN or NNA. One example is the Cas13b protein identified in Bergeyella zoohelcum (BzCas13b). See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.
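The two PFS rules described above can be expressed as simple checks on the bases flanking a candidate protospacer in the target RNA. The sketch below is illustrative only: it assumes the LshCas13a rule applies to the single base immediately 3′ of the protospacer, and that the BzCas13b rule is a single 5′ base that is D (A, G or T/U) plus a three-base 3′ motif matching NAN or NNA. The function names are hypothetical, and real PFS preferences are context-dependent rather than strict filters.

```python
def _flank_3prime(target: str, start: int, length: int, n: int) -> str:
    """Return the n bases immediately 3' of the protospacer target[start:start+length]."""
    end = start + length
    if start < 0 or end + n > len(target):
        raise ValueError("3' flank extends past the end of the target")
    return target[end:end + n].upper()


def _flank_5prime(target: str, start: int, n: int) -> str:
    """Return the n bases immediately 5' of a protospacer beginning at `start`."""
    if start - n < 0:
        raise ValueError("5' flank extends past the start of the target")
    return target[start - n:start].upper()


def lshcas13a_pfs_ok(target: str, start: int, length: int) -> bool:
    # Assumed rule: a G immediately 3' of the protospacer disfavors targeting.
    return _flank_3prime(target, start, length, 1) != "G"


def bzcas13b_pfs_ok(target: str, start: int, length: int) -> bool:
    # Assumed rule: the 5' flanking base is D (A, G or T/U, i.e., not C) and
    # the 3' flank matches NAN or NNA (an A at its second or third position).
    five = _flank_5prime(target, start, 1)
    three = _flank_3prime(target, start, length, 3)
    return five in "AGTU" and (three[1] == "A" or three[2] == "A")


print(lshcas13a_pfs_ok("AAAAAU", 0, 5), bzcas13b_pfs_ok("GCCCCCUAU", 1, 5))
```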
Overall, Type VI CRISPR-Cas systems appear to have less restrictive rules for substrate (e.g., target sequence) recognition than those that target DNA (e.g., Type V and Type II).
Zinc Finger Nucleases
In some embodiments, the polynucleotide is modified using a zinc finger nuclease or system thereof. One type of programmable DNA-binding domain is provided by artificial zinc finger (ZF) technology, which uses arrays of ZF modules to target new DNA-binding sites in the genome. Each finger module in a ZF array targets three DNA bases. A customized array of individual zinc finger domains is assembled into a ZF protein (ZFP).
ZFPs can comprise a functional domain. The first synthetic zinc finger nucleases (ZFNs) were developed by fusing a ZF protein to the catalytic domain of the Type IIS restriction enzyme FokI. (Kim, Y. G. et al., 1994, Chimeric restriction endonuclease, Proc. Natl. Acad. Sci. U.S.A. 91, 883-887; Kim, Y. G. et al., 1996, Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. U.S.A. 93, 1156-1160). Increased cleavage specificity, with decreased off-target activity, can be attained by use of paired ZFN heterodimers, each targeting a different nucleotide sequence, with the two target sequences separated by a short spacer. (Doyon, Y. et al., 2011, Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nat. Methods 8, 74-79). ZFPs can also be designed as transcription activators and repressors and have been used to target many genes in a wide variety of organisms. Exemplary methods of genome editing using ZFNs can be found for example in U.S. Pat. Nos. 6,534,261, 6,607,882, 6,746,838, 6,794,136, 6,824,978, 6,866,997, 6,933,113, 6,979,539, 7,013,219, 7,030,215, 7,220,719, 7,241,573, 7,241,574, 7,585,849, 7,595,376, 6,903,185, and 6,479,626, all of which are specifically incorporated by reference.
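Because each finger reads three bases, the recognition length of a ZFN pair follows directly from the finger counts. The minimal sketch below computes the number of bases recognized by the two arrays and the total span including the unbound spacer between the half-sites. The function name and the 3-finger/6-bp example values are hypothetical illustrations, not a recommended design.

```python
def zfn_pair_site(left_fingers: int, right_fingers: int, spacer_bp: int) -> tuple[int, int]:
    """Return (bases recognized, total span in bp) for a paired ZFN heterodimer site.

    Assumes each finger reads 3 bp and the two half-sites flank an unbound spacer.
    """
    if left_fingers < 1 or right_fingers < 1 or spacer_bp < 0:
        raise ValueError("need >= 1 finger per monomer and a non-negative spacer")
    recognized = 3 * (left_fingers + right_fingers)
    return recognized, recognized + spacer_bp


# Illustrative values only: two 3-finger arrays separated by a 6-bp spacer.
print(zfn_pair_site(3, 3, 6))  # -> (18, 24)
```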
TALE Nucleases
In some embodiments, a TALE nuclease or TALE nuclease system can be used to modify a polynucleotide. In some embodiments, the methods provided herein use isolated, non-naturally occurring, recombinant or engineered DNA binding proteins that comprise TALE monomers or half-monomers as part of their organizational structure, enabling the targeting of nucleic acid sequences with improved efficiency and expanded specificity.
Naturally occurring TALEs or “wild type TALEs” are nucleic acid binding proteins secreted by numerous species of proteobacteria. TALE polypeptides contain a nucleic acid binding domain composed of tandem repeats of highly conserved monomer polypeptides that are predominantly 33, 34 or 35 amino acids in length and that differ from each other mainly in amino acid positions 12 and 13. In advantageous embodiments the nucleic acid is DNA. As used herein, the terms “polypeptide monomers”, “TALE monomers” or “monomers” refer to the highly conserved repetitive polypeptide sequences within the TALE nucleic acid binding domain, and the term “repeat variable di-residues” or “RVD” refers to the highly variable amino acids at positions 12 and 13 of the polypeptide monomers. As provided throughout the disclosure, the amino acid residues of the RVD are depicted using the IUPAC single letter code for amino acids. A general representation of a TALE monomer comprised within the DNA binding domain is $X_{1-11}\text{-}(X_{12}X_{13})\text{-}X_{14-33 \text{ or } 34 \text{ or } 35}$, where the subscript indicates the amino acid position and X represents any amino acid; $X_{12}X_{13}$ indicates the RVD. In some polypeptide monomers, the variable amino acid at position 13 is missing or absent, and in such monomers the RVD consists of a single amino acid. In such cases the RVD may be alternatively represented as X*, where X represents $X_{12}$ and (*) indicates that $X_{13}$ is absent. The DNA binding domain comprises several repeats of TALE monomers, which may be represented as $(X_{1-11}\text{-}(X_{12}X_{13})\text{-}X_{14-33 \text{ or } 34 \text{ or } 35})_z$, where in an advantageous embodiment z is from 5 to 40. In a further advantageous embodiment, z is from 10 to 26.
The TALE monomers can have a nucleotide binding affinity that is determined by the identity of the amino acids in its RVD. For example, polypeptide monomers with an RVD of NI can preferentially bind to adenine (A), monomers with an RVD of NG can preferentially bind to thymine (T), monomers with an RVD of HD can preferentially bind to cytosine (C) and monomers with an RVD of NN can preferentially bind to both adenine (A) and guanine (G). In some embodiments, monomers with an RVD of IG can preferentially bind to T. Thus, the number and order of the polypeptide monomer repeats in the nucleic acid binding domain of a TALE determines its nucleic acid target specificity. In some embodiments, monomers with an RVD of NS can recognize all four base pairs and can bind to A, T, G or C. The structure and function of TALEs is further described in, for example, Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011).
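Because each monomer reads one base according to its RVD, an ordered list of RVDs (N- to C-terminal) can be translated into the 5′-to-3′ DNA sequence it is expected to bind. The sketch below uses only the RVD preferences listed in the preceding paragraph and writes ambiguous preferences as IUPAC nucleotide codes (R for A or G, N for any base). The dictionary and function name are hypothetical, and the sketch ignores the 5′ position read by the N-terminal region and any context effects on binding.

```python
# Assumed single-base assignments drawn from the RVD preferences listed above;
# ambiguous preferences are written as IUPAC nucleotide codes.
RVD_TO_BASE = {
    "NI": "A",
    "NG": "T",
    "IG": "T",
    "HD": "C",
    "NN": "R",  # R = A or G
    "NS": "N",  # N = any base
}


def predict_tale_target(rvds: list[str]) -> str:
    """Predict the 5'->3' DNA bases read by an ordered (N- to C-terminal) list of RVDs."""
    bases = []
    for rvd in rvds:
        key = rvd.upper()
        if key not in RVD_TO_BASE:
            raise ValueError(f"no base assignment for RVD {rvd!r}")
        bases.append(RVD_TO_BASE[key])
    return "".join(bases)


print(predict_tale_target(["NI", "HD", "NG", "NN", "NS"]))  # -> ACTRN
```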
The polypeptides used in methods of the invention can be isolated, non-naturally occurring, recombinant or engineered nucleic acid-binding proteins that have nucleic acid or DNA binding regions containing polypeptide monomer repeats that are designed to target specific nucleic acid sequences.
As described herein, polypeptide monomers having an RVD of HN or NH preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, polypeptide monomers having RVDs RN, NN, NK, SN, NH, KN, HN, NQ, HH, RG, KH, RH and SS can preferentially bind to guanine. In some embodiments, polypeptide monomers having RVDs RN, NK, NQ, HH, KH, RH, SS and SN can preferentially bind to guanine and can thus allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, polypeptide monomers having RVDs HH, KH, NH, NK, NQ, RH, RN and SS can preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, the RVDs that have high binding specificity for guanine are RN, NH, RH and KH. Furthermore, polypeptide monomers having an RVD of NV can preferentially bind to adenine and guanine. In some embodiments, monomers having RVDs of H*, HA, KA, N*, NA, NC, NS, RA, and S* bind to adenine, guanine, cytosine and thymine with comparable affinity.
The predetermined N-terminal to C-terminal order of the one or more polypeptide monomers of the nucleic acid or DNA binding domain determines the corresponding predetermined target nucleic acid sequence to which the polypeptides of the invention will bind. As used herein, the monomers and at least one or more half-monomers are “specifically ordered to target” the genomic locus or gene of interest. In plant genomes, the natural TALE-binding sites always begin with a thymine (T), which may be specified by a cryptic signal within the non-repetitive N-terminus of the TALE polypeptide; in some cases, this region may be referred to as repeat 0. In animal genomes, TALE binding sites do not necessarily have to begin with a thymine (T), and polypeptides of the invention may target DNA sequences that begin with T, A, G or C. The tandem repeat of TALE monomers always ends with a half-length repeat, or a stretch of sequence that may share identity with only the first 20 amino acids of a repetitive full-length TALE monomer, and this half repeat may be referred to as a half-monomer. Therefore, because the repeat 0 region specifies the first (5′) base of the target and the half-monomer specifies one additional base, the length of the nucleic acid or DNA being targeted is equal to the number of full monomers plus two.
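The length rule above (full monomers plus two) makes it straightforward to enumerate candidate binding windows in a DNA sequence. In this minimal sketch, the optional requirement that a window begin with T mimics natural plant TALE sites and can be switched off for targets that may begin with any base. Both function names are hypothetical, and the sketch only proposes windows of the right length; it does not score binding.

```python
def tale_site_length(n_full_monomers: int) -> int:
    # +1 for the 5' base read by the repeat 0 region, +1 for the C-terminal half-monomer.
    return n_full_monomers + 2


def find_tale_sites(dna: str, n_full_monomers: int, require_5prime_t: bool = True) -> list[int]:
    """Return start indices of every candidate window of tale_site_length(n) bases.

    With require_5prime_t=True, only windows beginning with T are kept (as in natural
    plant TALE sites); set it to False for targets that need not begin with T.
    """
    dna = dna.upper()
    site_len = tale_site_length(n_full_monomers)
    return [
        i
        for i in range(len(dna) - site_len + 1)
        if not require_5prime_t or dna[i] == "T"
    ]


print(tale_site_length(17), find_tale_sites("GTAACTTT", 2))  # -> 19 [1]
```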
As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), TALE polypeptide binding efficiency may be increased by including amino acid sequences from the “capping regions” that are directly N-terminal or C-terminal of the DNA binding region of naturally occurring TALEs into the engineered TALEs at positions N-terminal or C-terminal of the engineered TALE DNA binding region. Thus, in certain embodiments, the TALE polypeptides described herein further comprise an N-terminal capping region and/or a C-terminal capping region.
An exemplary amino acid sequence of an N-terminal capping region is:
An exemplary amino acid sequence of a C-terminal capping region is:
As used herein, the predetermined “N-terminus” to “C-terminus” orientation of the N-terminal capping region, the DNA binding domain comprising the repeat TALE monomers, and the C-terminal capping region provides the structural basis for the organization of different domains in the d-TALEs or polypeptides of the invention.
The entire N-terminal and/or C-terminal capping regions are not necessary to enhance the binding activity of the DNA binding region. Therefore, in certain embodiments, fragments of the N-terminal and/or C-terminal capping regions are included in the TALE polypeptides described herein.
In certain embodiments, the TALE polypeptides described herein contain an N-terminal capping region fragment that includes at least 10, 20, 30, 40, 50, 54, 60, 70, 80, 87, 90, 94, 100, 102, 110, 117, 120, 130, 140, 147, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260 or 270 amino acids of an N-terminal capping region. In certain embodiments, the N-terminal capping region fragment amino acids are of the C-terminus (the DNA-binding region proximal end) of an N-terminal capping region. As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), N-terminal capping region fragments that include the C-terminal 240 amino acids enhance binding activity equal to the full-length capping region, while fragments that include the C-terminal 147 amino acids retain greater than 80% of the efficacy of the full-length capping region, and fragments that include the C-terminal 117 amino acids retain greater than 50% of the activity of the full-length capping region.
In some embodiments, the TALE polypeptides described herein contain a C-terminal capping region fragment that includes at least 6, 10, 20, 30, 37, 40, 50, 60, 68, 70, 80, 90, 100, 110, 120, 127, 130, 140, 150, 155, 160, 170, or 180 amino acids of a C-terminal capping region. In certain embodiments, the C-terminal capping region fragment amino acids are of the N-terminus (the DNA-binding region proximal end) of a C-terminal capping region. As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), C-terminal capping region fragments that include the N-terminal 68 amino acids enhance binding activity equal to the full-length capping region, while fragments that include the N-terminal 20 amino acids retain greater than 50% of the efficacy of the full-length capping region.
In certain embodiments, the capping regions of the TALE polypeptides described herein do not need to have sequences identical to the capping region sequences provided herein. Thus, in some embodiments, the capping region of the TALE polypeptides described herein has a sequence that is at least 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical or shares identity to the capping region amino acid sequences provided herein. Sequence identity is related to sequence homology. Homology comparisons may be conducted by eye or, more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences. In some preferred embodiments, the capping region of the TALE polypeptides described herein has a sequence that is at least 95% identical or shares identity to the capping region amino acid sequences provided herein.
Sequence homology can be assessed using any of a number of computer programs known in the art, including but not limited to BLAST or FASTA. Suitable computer programs for carrying out alignments, such as the GCG Wisconsin Bestfit package, may also be used. Once the software has produced an optimal alignment, it is possible to calculate % homology, preferably % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result.
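Once an alignment exists, percent identity is a simple count over its columns. The sketch below assumes the two sequences are already aligned (gaps written as "-") and uses one common convention: identical residue columns divided by the number of alignment columns that are not gaps in both sequences. That denominator is an assumption; alignment programs differ in whether they divide by alignment length, the shorter sequence, or aligned positions only, so reported values can differ. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps as '-').

    Assumed convention: identical residue columns / columns not gapped in both.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences carry a gap; they carry no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("AC-DE", "ACGDE"))  # -> 80.0
```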
In some embodiments described herein, the TALE polypeptides of the invention include a nucleic acid binding domain linked to one or more effector domains. The terms “effector domain” or “regulatory and functional domain” refer to a polypeptide sequence that has an activity other than binding to the nucleic acid sequence recognized by the nucleic acid binding domain. By combining a nucleic acid binding domain with one or more effector domains, the polypeptides of the invention may be used to target the one or more functions or activities mediated by the effector domain to a particular target DNA sequence to which the nucleic acid binding domain specifically binds.
In some embodiments of the TALE polypeptides described herein, the activity mediated by the effector domain is a biological activity. For example, in some embodiments the effector domain is a transcriptional inhibitor (i.e., a repressor domain), such as an mSin interaction domain (SID), a SID4X domain, or a Krüppel-associated box (KRAB) domain or fragments of the KRAB domain. In some embodiments the effector domain is an enhancer of transcription (i.e., an activation domain), such as the VP16, VP64 or p65 activation domain. In some embodiments, the nucleic acid binding domain is linked, for example, with an effector domain that includes but is not limited to a transposase, integrase, recombinase, resolvase, invertase, protease, DNA methyltransferase, DNA demethylase, histone acetylase, histone deacetylase, nuclease, transcriptional repressor, transcriptional activator, transcription factor recruiting domain, protein nuclear-localization signal or cellular uptake signal.
In some embodiments, the effector domain is a protein domain which exhibits activities which include but are not limited to transposase activity, integrase activity, recombinase activity, resolvase activity, invertase activity, protease activity, DNA methyltransferase activity, DNA demethylase activity, histone acetylase activity, histone deacetylase activity, nuclease activity, nuclear-localization signaling activity, transcriptional repressor activity, transcriptional activator activity, transcription factor recruiting activity, or cellular uptake signaling activity. Other preferred embodiments of the invention may include any combination of the activities described herein.
Meganucleases
In some embodiments, a meganuclease or system thereof can be used to modify a polynucleotide. Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs). Exemplary methods for using meganucleases can be found in U.S. Pat. Nos. 8,163,514, 8,133,697, 8,021,867, 8,119,361, 8,119,381, 8,124,369, and 8,129,134, which are specifically incorporated by reference.
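One way to see why a large recognition site matters is to estimate how often a given site would occur by chance. The sketch below assumes a random genome with equal, independent base frequencies and a non-palindromic site (so both strands are counted); real genomes violate these assumptions, so the numbers are only order-of-magnitude guides. The 3.1e9-bp genome length is an approximate illustrative value and the function name is hypothetical.

```python
def expected_random_sites(site_len: int, genome_len: int, both_strands: bool = True) -> float:
    """Expected chance occurrences of one specific site_len-bp sequence.

    Assumes uniform, independent base frequencies and a non-palindromic site,
    so each strand is counted separately.
    """
    positions = max(genome_len - site_len + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** site_len


# Approximate, illustrative genome length of 3.1e9 bp.
for n in (12, 18, 24):
    print(n, expected_random_sites(n, 3_100_000_000))
```

Under these assumptions a 12-bp site is expected roughly 370 times by chance, whereas an 18-bp site is expected about 0.09 times, i.e., typically not at all outside the intended target.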
Sequences Related to Nucleus Targeting and Transportation
In some embodiments, one or more components (e.g., the Cas protein and/or deaminase, zinc finger protein, TALE, or meganuclease) in the composition for engineering cells may comprise one or more sequences related to nucleus targeting and transportation. Such sequences may facilitate targeting of the one or more components in the composition to a sequence within a cell. In order to improve targeting of the CRISPR-Cas protein and/or the nucleotide deaminase protein or catalytic domain thereof used in the methods of the present disclosure to the nucleus, it may be advantageous to provide one or both of these components with one or more nuclear localization sequences (NLSs).
In some embodiments, the NLSs used in the context of the present disclosure are heterologous to the proteins. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 10790) or PKKKRKVEAS (SEQ ID NO: 10791); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 10792)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 10793) or RQRRNELKRSP (SEQ ID NO: 10794); the hnRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 10795); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 10796) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 10797) and PPKKARED (SEQ ID NO: 10798) of the polyoma T protein; the sequence PQPKKKPL (SEQ ID NO: 10799) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 10800) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 10801) and PKQKKRK (SEQ ID NO: 10802) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 10803) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 10804) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 10805) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 10806) of the human glucocorticoid receptor (a steroid hormone receptor). In general, the one or more NLSs are of sufficient strength to drive accumulation of the DNA-targeting Cas protein in a detectable amount in the nucleus of a eukaryotic cell. In general, the strength of nuclear localization activity may derive from the number of NLSs in the CRISPR-Cas protein, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique.
For example, a detectable marker may be fused to the nucleic acid-targeting protein, such that its location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g., a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, and their contents may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of nucleic acid-targeting complex formation (e.g., an assay for deaminase activity) at the target sequence, or an assay for altered gene expression activity affected by DNA-targeting complex formation and/or DNA targeting, as compared to a control not exposed to the CRISPR-Cas protein and deaminase protein, or exposed to a CRISPR-Cas and/or deaminase protein lacking the one or more NLSs.
The CRISPR-Cas and/or nucleotide deaminase proteins may be provided with one or more, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, heterologous NLSs. In some embodiments, the proteins comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g., zero or at least one NLS at the amino-terminus and zero or at least one NLS at the carboxy-terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. In preferred embodiments of the CRISPR-Cas proteins, an NLS is attached to the C-terminus of the protein.
In certain embodiments, the CRISPR-Cas protein and the deaminase protein are delivered to the cell or expressed within the cell as separate proteins. In these embodiments, each of the CRISPR-Cas and deaminase proteins can be provided with one or more NLSs as described herein. In certain embodiments, the CRISPR-Cas and deaminase proteins are delivered to the cell or expressed within the cell as a fusion protein. In these embodiments, one or both of the CRISPR-Cas and deaminase proteins is provided with one or more NLSs. Where the nucleotide deaminase is fused to an adaptor protein (such as MS2) as described above, the one or more NLSs can be provided on the adaptor protein, provided that this does not interfere with aptamer binding. In particular embodiments, the one or more NLS sequences may also function as linker sequences between the nucleotide deaminase and the CRISPR-Cas protein.
In certain embodiments, guides of the disclosure comprise specific binding sites (e.g., aptamers) for adapter proteins, which may be linked or fused to a nucleotide deaminase or catalytic domain thereof. When such a guide forms a CRISPR complex (e.g., CRISPR-Cas protein binding to guide and target), the adapter proteins bind, and the nucleotide deaminase or catalytic domain thereof associated with the adapter protein is positioned in a spatial orientation that is advantageous for the attributed function to be effective.
The skilled person will understand that modifications to the guide which allow for binding of the adapter+nucleotide deaminase, but not proper positioning of the adapter+nucleotide deaminase (e.g., due to steric hindrance within the three-dimensional structure of the CRISPR complex), are modifications which are not intended. The one or more modified guides may be modified at the tetra loop, stem loop 1, stem loop 2, or stem loop 3, as described herein, preferably at either the tetra loop or stem loop 2, and in some cases at both the tetra loop and stem loop 2.
In some embodiments, a component (e.g., the dead Cas protein, the nucleotide deaminase protein or catalytic domain thereof, or a combination thereof) in the systems may comprise one or more nuclear export signals (NES), one or more nuclear localization signals (NLS), or any combinations thereof. In some cases, the NES may be an HIV Rev NES. In certain cases, the NES may be a MAPK NES. When the component is a protein, the NES or NLS may be at the C-terminus of the component. Alternatively or additionally, the NES or NLS may be at the N-terminus of the component. In some examples, the Cas protein and optionally said nucleotide deaminase protein or catalytic domain thereof comprise one or more heterologous nuclear export signal(s) (NES(s)) or nuclear localization signal(s) (NLS(s)), preferably an HIV Rev NES or MAPK NES, preferably C-terminal.
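The NLS placement options above can be illustrated at the sequence level: append NLS copies at either terminus, then test whether a given NLS lies "near" a terminus under a chosen distance threshold. In this minimal sketch the default C-terminal SV40 NLS (PKKKRKV, SEQ ID NO: 10790) follows the preferred C-terminal placement, while the "GS" linker and both function names are hypothetical choices for illustration; it manipulates amino acid strings only and says nothing about expression or localization efficiency.

```python
SV40_NLS = "PKKKRKV"  # SV40 large T-antigen NLS (SEQ ID NO: 10790)


def add_nls(protein: str, n_term: tuple[str, ...] = (), c_term: tuple[str, ...] = (SV40_NLS,),
            linker: str = "GS") -> str:
    """Return `protein` with the given NLSs added at each terminus, each joined by `linker`.

    The "GS" linker is a hypothetical illustrative choice.
    """
    parts: list[str] = []
    for nls in n_term:
        parts += [nls, linker]
    parts.append(protein)
    for nls in c_term:
        parts += [linker, nls]
    return "".join(parts)


def nls_near_terminus(fusion: str, nls: str, max_distance: int) -> bool:
    """True if any copy of `nls` lies within max_distance residues of either terminus."""
    start = fusion.find(nls)
    while start != -1:
        to_n = start                             # residues preceding the NLS
        to_c = len(fusion) - (start + len(nls))  # residues following the NLS
        if min(to_n, to_c) <= max_distance:
            return True
        start = fusion.find(nls, start + 1)
    return False


fusion = add_nls("M" + "A" * 100)
print(fusion[-9:], nls_near_terminus(fusion, SV40_NLS, 0))  # -> GSPKKKRKV True
```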
Templates
In some embodiments, the composition for engineering cells comprises a template, e.g., a recombination template. A template may be a component of another vector as described herein, contained in a separate vector, or provided as a separate polynucleotide. In some embodiments, a recombination template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-targeting effector protein as a part of a nucleic acid-targeting complex.
In an embodiment, the template nucleic acid alters the sequence of the target position. In an embodiment, the template nucleic acid results in the incorporation of a modified, or non-naturally occurring base into the target nucleic acid.
The template sequence may undergo breakage-mediated or breakage-catalyzed recombination with the target sequence. In an embodiment, the template nucleic acid may include sequence that corresponds to a site on the target sequence that is cleaved by a Cas protein mediated cleavage event. In an embodiment, the template nucleic acid may include sequence that corresponds to both a first site on the target sequence that is cleaved in a first Cas protein mediated event, and a second site on the target sequence that is cleaved in a second Cas protein mediated event.
In certain embodiments, the template nucleic acid can include sequence which results in an alteration in the coding sequence of a translated sequence, e.g., one which results in the substitution of one amino acid for another in a protein product, e.g., transforming a mutant allele into a wild type allele, transforming a wild type allele into a mutant allele, and/or introducing a stop codon, insertion of an amino acid residue, deletion of an amino acid residue, or a nonsense mutation. In certain embodiments, the template nucleic acid can include sequence which results in an alteration in a non-coding sequence, e.g., an alteration in an exon or in a 5′ or 3′ non-translated or non-transcribed region. Such alterations include an alteration in a control element, e.g., a promoter or enhancer, and an alteration in a cis-acting or trans-acting control element.
A template nucleic acid having homology with a target position in a target gene may be used to alter the structure of a target sequence. The template sequence may be used to alter an unwanted structure, e.g., an unwanted or mutant nucleotide. The template nucleic acid may include sequence which, when integrated, results in: decreasing the activity of a positive control element; increasing the activity of a positive control element; decreasing the activity of a negative control element; increasing the activity of a negative control element; decreasing the expression of a gene; increasing the expression of a gene; increasing resistance to a disorder or disease; increasing resistance to viral entry; correcting a mutation or altering an unwanted amino acid residue; or conferring, increasing, abolishing or decreasing a biological property of a gene product, e.g., increasing the enzymatic activity of an enzyme, or increasing the ability of a gene product to interact with another molecule.
The template nucleic acid may include sequence which results in a change in sequence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides of the target sequence.
A template polynucleotide may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length. In an embodiment, the template nucleic acid may be 20+/−10, 30+/−10, 40+/−10, 50+/−10, 60+/−10, 70+/−10, 80+/−10, 90+/−10, 100+/−10, 110+/−10, 120+/−10, 130+/−10, 140+/−10, 150+/−10, 160+/−10, 170+/−10, 180+/−10, 190+/−10, 200+/−10, 210+/−10, or 220+/−10 nucleotides in length. In an embodiment, the template nucleic acid may be 30+/−20, 40+/−20, 50+/−20, 60+/−20, 70+/−20, 80+/−20, 90+/−20, 100+/−20, 110+/−20, 120+/−20, 130+/−20, 140+/−20, 150+/−20, 160+/−20, 170+/−20, 180+/−20, 190+/−20, 200+/−20, 210+/−20, or 220+/−20 nucleotides in length. In an embodiment, the template nucleic acid is 10 to 1,000, 20 to 900, 30 to 800, 40 to 700, 50 to 600, 50 to 500, 50 to 400, 50 to 300, 50 to 200, or 50 to 100 nucleotides in length.
In some embodiments, the template polynucleotide is complementary to a portion of a polynucleotide comprising the target sequence. When optimally aligned, a template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g., about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more nucleotides). In some embodiments, when a template sequence and a polynucleotide comprising a target sequence are optimally aligned, the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.
The exogenous polynucleotide template comprises a sequence to be integrated (e.g., a mutated gene). The sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). The sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function.
An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequence has about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.
In certain embodiments, one or both homology arms may be shortened to avoid including certain sequence repeat elements. For example, a 5′ homology arm may be shortened to avoid a sequence repeat element. In other embodiments, a 3′ homology arm may be shortened to avoid a sequence repeat element. In some embodiments, both the 5′ and the 3′ homology arms may be shortened to avoid including certain sequence repeat elements.
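The homology-arm design choices above can be sketched as string assembly around a cut site: take up to `arm_len` bases on each side, optionally shorten each arm from its distal end so it excludes repeat elements, and place the sequence to be integrated between the arms. This sketch assumes repeat elements are marked by lowercase (soft-masked) bases, a convention used by some genome browsers, and that `cut` is the index of the first base 3′ of the break; the function name and parameters are hypothetical, and practical designs also consider, for example, blocking re-cleavage of the repaired site.

```python
def build_hdr_template(genome: str, cut: int, insert: str, arm_len: int,
                       trim_masked: bool = True) -> str:
    """Assemble 5' arm + insert + 3' arm around a cut between genome[cut-1] and genome[cut].

    Assumption: repeat elements are soft-masked (lowercase). With trim_masked=True,
    each arm keeps only its cut-proximal part that contains no lowercase base.
    """
    if not 0 < arm_len <= cut or cut + arm_len > len(genome):
        raise ValueError("homology arms would run past the sequence")
    left = genome[cut - arm_len:cut]   # 5' homology arm, ends at the cut
    right = genome[cut:cut + arm_len]  # 3' homology arm, starts at the cut
    if trim_masked:
        masked_left = [i for i, base in enumerate(left) if base.islower()]
        if masked_left:
            left = left[masked_left[-1] + 1:]    # drop the distal (5') masked part
        masked_right = [i for i, base in enumerate(right) if base.islower()]
        if masked_right:
            right = right[:masked_right[0]]      # drop the distal (3') masked part
    return (left + insert + right).upper()


genome = "aaaACGTACGT" + "TTTTGGGG" + "cccAA"
print(build_hdr_template(genome, 11, "GAATTC", 10))  # -> ACGTACGTGAATTCTTTTGGGG
```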
In some methods, the exogenous polynucleotide template may further comprise a marker. Such a marker may facilitate screening for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The exogenous polynucleotide template of the disclosure can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996).
In certain embodiments, a template nucleic acid for correcting a mutation may be designed for use as a single-stranded oligonucleotide. When using a single-stranded oligonucleotide, 5′ and 3′ homology arms may range up to about 200 base pairs (bp) in length, e.g., at least 25, 50, 75, 100, 125, 150, 175, or 200 bp in length.
In certain embodiments, a template nucleic acid for correcting a mutation may be designed for use with a homology-independent targeted integration system. Suzuki et al. describe in vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration (2016, Nature 540:144-149). Schmid-Burgk, et al. describe use of the CRISPR-Cas9 system to introduce a double-strand break (DSB) at a user-defined genomic location and insertion of a universal donor DNA (Nat Commun. 2016 Jul. 28; 7:12338). Gao, et al. describe “Plug-and-Play Protein Modification Using Homology-Independent Universal Genome Engineering” (Neuron. 2019 Aug. 21; 103(4):583-597).
RNAi
In certain embodiments, the genetic modifying agent is RNAi (e.g., shRNA). As used herein, “gene silencing” or “gene silenced” in reference to an activity of an RNAi molecule, for example a siRNA or miRNA refers to a decrease in the mRNA level in a cell for a target gene by at least about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 99%, about 100% of the mRNA level found in the cell without the presence of the miRNA or RNA interference molecule. In one preferred embodiment, the mRNA levels are decreased by at least about 70%, about 80%, about 90%, about 95%, about 99%, about 100%.
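The percentage decrease in this definition is simple arithmetic on measured mRNA levels. As an illustration only, the Python sketch below assumes that the levels are measured by RT-qPCR and converted with the common 2^-ΔΔCt method, which assumes near-100% amplification efficiency for both the target and reference assays. The cycle-threshold values are hypothetical.

```python
def relative_expression_ddct(ct_target_treated: float, ct_ref_treated: float,
                             ct_target_control: float, ct_ref_control: float) -> float:
    """Relative target mRNA level (treated / control) by the 2^-ddCt method,
    assuming ~100% amplification efficiency for both assays."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


def percent_silencing(relative_level: float) -> float:
    """Percent decrease in target mRNA relative to cells without the RNAi molecule."""
    return 100.0 * (1.0 - relative_level)


# Hypothetical cycle thresholds: after normalizing to a reference gene, the
# target crosses threshold 2 cycles later in RNAi-treated cells.
level = relative_expression_ddct(26.0, 18.0, 24.0, 18.0)
silencing = percent_silencing(level)
print(level, silencing)   # 0.25 75.0
print(silencing >= 70.0)  # True: meets an "at least about 70%" embodiment
```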
As used herein, the term “RNAi” refers to any type of interfering RNA, including but not limited to siRNA, shRNA, endogenous microRNA, and artificial microRNA. For instance, it includes sequences previously identified as siRNA, regardless of the mechanism of downstream processing of the RNA (i.e., although siRNAs are believed to have a specific method of in vivo processing resulting in the cleavage of mRNA, such sequences can be incorporated into the vectors in the context of the flanking sequences described herein). The term “RNAi” can include both gene-silencing RNAi molecules and RNAi effector molecules that activate the expression of a gene.
As used herein, a “siRNA” refers to a nucleic acid that forms a double stranded RNA, which double stranded RNA has the ability to reduce or inhibit expression of a gene or target gene when the siRNA is present or expressed in the same cell as the target gene. The double stranded siRNA can be formed by the complementary strands. In one embodiment, a siRNA refers to a nucleic acid that can form a double stranded siRNA. The sequence of the siRNA can correspond to the full-length target gene, or a subsequence thereof. Typically, the siRNA is at least about 15-50 nucleotides in length (e.g., each complementary sequence of the double stranded siRNA is about 15-50 nucleotides in length, and the double stranded siRNA is about 15-50 base pairs in length, preferably about 19-30 nucleotides, preferably about 20-25 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length).
As used herein “shRNA” or “small hairpin RNA” (also called stem loop) is a type of siRNA. In one embodiment, these shRNAs are composed of a short, e.g., about 19 to about 25 nucleotide, antisense strand, followed by a nucleotide loop of about 5 to about 9 nucleotides, and the analogous sense strand. Alternatively, the sense strand can precede the nucleotide loop structure and the antisense strand can follow.
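The two strand orders described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a disclosed design rule: the default 9-nt loop UUCAAGAGA is one loop commonly used in published shRNA vectors (e.g., pSUPER) and is not specified in this disclosure, the 21-nt target sequence is hypothetical, and promoter and terminator elements of an actual expression cassette are omitted.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_shrna(sense: str, loop: str = "UUCAAGAGA", antisense_first: bool = True) -> str:
    """Assemble a hairpin from a target-matching sense strand.

    antisense_first=True gives antisense-loop-sense; False gives
    sense-loop-antisense. DNA letters are converted to RNA.
    """
    sense = sense.upper().replace("T", "U")
    loop = loop.upper().replace("T", "U")
    if not 19 <= len(sense) <= 25:
        raise ValueError("stem strand should be about 19-25 nt")
    if not 5 <= len(loop) <= 9:
        raise ValueError("loop should be about 5-9 nt")
    antisense = reverse_complement_rna(sense)
    if antisense_first:
        return antisense + loop + sense
    return sense + loop + antisense


# Hypothetical 21-nt target-matching sequence.
hairpin = design_shrna("GCAAGCTGACCCTGAAGTTCA")
stem = 21
# The two arms are mutually complementary, so they can base-pair into a stem.
print(reverse_complement_rna(hairpin[-stem:]) == hairpin[:stem])  # True
```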
The terms “microRNA” and “miRNA” are used interchangeably herein and refer to endogenous RNAs, some of which are known to regulate the expression of protein-coding genes at the posttranscriptional level. Endogenous microRNAs are small RNAs naturally present in the genome that are capable of modulating the productive utilization of mRNA. The term “artificial microRNA” includes any type of RNA sequence, other than endogenous microRNA, which is capable of modulating the productive utilization of mRNA. MicroRNA sequences have been described in publications such as Lim, et al., Genes & Development, 17, p. 991-1008 (2003), Lim et al., Science 299, 1540 (2003), Lee and Ambros, Science, 294, 862 (2001), Lau et al., Science 294, 858-861 (2001), Lagos-Quintana et al., Current Biology, 12, 735-739 (2002), Lagos-Quintana et al., Science 294, 853-857 (2001), and Lagos-Quintana et al., RNA, 9, 175-179 (2003), which are incorporated by reference. Multiple microRNAs can also be incorporated into a precursor molecule. Furthermore, miRNA-like stem-loops can be expressed in cells as a vehicle to deliver artificial miRNAs and short interfering RNAs (siRNAs) for the purpose of modulating the expression of endogenous genes through the miRNA and/or RNAi pathways.
As used herein, “double stranded RNA” or “dsRNA” refers to RNA molecules that are comprised of two strands. Double-stranded molecules include those comprised of a single RNA molecule that doubles back on itself to form a two-stranded structure. For example, the stem loop structure of the progenitor molecules from which the single-stranded miRNA is derived, called the pre-miRNA (Bartel et al. 2004. Cell 116:281-297), comprises a dsRNA molecule.
Delivery
The programmable nucleic acid modifying agents and other modulating agents, or components thereof, or nucleic acid molecules thereof (including, for instance, an HDR template), or nucleic acid molecules encoding or providing components thereof, may be delivered by a delivery system herein described.
Vector delivery, e.g., plasmid or viral delivery: the modulating agents can be delivered using any suitable vector, e.g., plasmid or viral vectors, such as adeno-associated virus (AAV), lentivirus, adenovirus, or other viral vector types, or combinations thereof. In some embodiments, the vector, e.g., plasmid or viral vector, is delivered to the tissue of interest by, for example, an intramuscular injection, while other times the delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choice, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.
In certain embodiments, mRNAs encoding the transcription factors are delivered to a subject in need thereof. In certain embodiments, the mRNA is modified mRNA (see, e.g., U.S. Pat. No. 9,428,535 B2).
In certain embodiments, proteins, mRNA, or cells are administered via targeted injection (e.g., into the tissue to be repaired), intravenous injection, infusion, or other delivery methods. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the target cell or tissue, the general condition of the subject to be treated, the degree of modification sought, the administration route, the administration mode, the type of modification sought, etc.
In certain embodiments, transcription factors are expressed in target tissue cells transiently. In certain embodiments, transcription factors are expressed or enhanced only for the time required to differentiate or transdifferentiate cells into target cells. In certain embodiments, transcription factors are expressed or enhanced for 1 to 14 days, preferably about 2 days. In certain embodiments, the means of delivery does not result in integration of a sequence encoding the transcription factors into the genome of target cells.
The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
EXAMPLES
Example 1—Identification of Transcription Factors that Differentiate hESCs into Radial Glia
Radial glia are neural progenitors of the developing mammalian brain capable of generating neurons, astrocytes, and oligodendrocytes. The two most established methods for producing neural progenitors, embryoid body formation and dual SMAD inhibition, are not high-throughput and produce non-homogenous neural progenitor populations (Chambers S M, et al., Highly efficient neural conversion of human ES and iPS cells by dual inhibition of SMAD signaling. Nat Biotechnol. 2009; 27(3):275-80; and Pankratz M T, et al., Directed neural differentiation of human embryonic stem cells via an obligated primitive anterior stage. Stem Cells. 2007; 25(6):1511-20). Applicants developed a stepwise method for differentiating hESCs into neural progenitors. Although previous studies have shown that overexpression of the TFs ASCL1 and PAX6 can drive differentiation of embryonic stem cells into neural progenitors and neurons, the TFs that direct human radial glia differentiation remain unknown (Chanda S, et al., Generation of induced neuronal cells by the single reprogramming factor ASCL1. Stem Cell Reports. 2014; 3(2):282-96; and Zhang X, et al., Pax6 is a human neuroectoderm cell fate determinant. Cell Stem Cell. 2010; 7(1):90-100). Applicants individually overexpressed candidate TFs that are specifically expressed in radial glia based on available RNA-sequencing (RNA-seq) datasets, and selected those that generate cells expressing radial glia-specific marker genes and presenting associated morphology. Identification of novel TFs that direct radial glia differentiation can enable better understanding of neural development and provide positive controls for establishing a TF screening platform.
To establish a system for TF-directed differentiation, Applicants compared two overexpression methods, cDNA and CRISPR activation (Konermann S, et al., Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 2015; 517(7536):583-8), to upregulate NEUROD1 and NEUROG2, known TFs that direct differentiation of hESCs into neurons, in the HUES66 hESC line (Zhang Y, et al., 2013). Applicants chose the HUES66 line because of its ability to generate brain organoids efficiently and maintain karyotype stability (Quadrato G, et al., Cell diversity and network dynamics in photosensitive human brain organoids. Nature. 2017; 545(7652):48-53). Immunostaining for the neuronal marker MAP2 showed that, in this system, only cDNA overexpression (specifically, of the TF ORF without UTR, as described further herein) successfully and efficiently differentiated hESCs into neurons. Based on the results of the comparison, Applicants used cDNA to overexpress TFs individually in a targeted arrayed screen to identify those that could differentiate hESCs into radial glia (
Applicants next evaluated the fidelity of radial glia differentiated from each candidate. First, Applicants performed RNA-seq on radial glia derived from overexpressing each candidate for 7 and 12 days. Gene signature analysis of the RNA-seq data suggested similarities (e.g., EOMES and RFX4) and differences (e.g., NFIB and ASCL1) in the transcriptomes between the candidates. To determine how closely the differentiated radial glia resembled their in vivo counterparts, Applicants computationally generated gene expression signatures based on the 1,000 most differentially expressed genes compared to the GFP overexpression control and quantified enrichment of these signatures in human fetal radial glia and other neural cell types from the Pollen et al. dataset (
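The signature comparison described above can be sketched in Python under stated assumptions. The text does not specify the enrichment statistic, so this sketch interprets "most differentially expressed" as the largest log2 fold change over the GFP control and scores each reference cell type by the mean z-score of its average expression of the signature genes. The expression values and cell-type labels are toy placeholders; loading the Pollen et al. dataset is not shown.

```python
import math
from statistics import mean, pstdev


def top_signature(candidate, control, n, pseudocount=1.0):
    """Genes with the largest log2 fold change (candidate TF vs. GFP control)."""
    lfc = {
        gene: math.log2((candidate[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in candidate
        if gene in control
    }
    return sorted(lfc, key=lfc.get, reverse=True)[:n]


def signature_scores(reference, signature):
    """Score each reference cell type by the mean z-score (across cell types)
    of its average expression of the signature genes."""
    cell_types = list(reference)
    totals = {ct: 0.0 for ct in cell_types}
    used = 0
    for gene in signature:
        values = [reference[ct].get(gene, 0.0) for ct in cell_types]
        sd = pstdev(values)
        if sd == 0:
            continue  # uninformative gene: identical in every cell type
        mu = mean(values)
        used += 1
        for ct, value in zip(cell_types, values):
            totals[ct] += (value - mu) / sd
    return {ct: (total / used if used else 0.0) for ct, total in totals.items()}


# Toy data (hypothetical): normalized bulk expression for one candidate TF and
# the GFP control, plus mean expression per reference cell type.
candidate = {"VIM": 100.0, "SLC1A3": 80.0, "MAP2": 1.0, "GAPDH": 50.0}
gfp = {"VIM": 10.0, "SLC1A3": 10.0, "MAP2": 1.0, "GAPDH": 50.0}
reference = {
    "radial_glia": {"VIM": 9.0, "SLC1A3": 8.0},
    "neuron": {"VIM": 1.0, "SLC1A3": 1.0},
    "interneuron": {"VIM": 2.0, "SLC1A3": 0.0},
}
signature = top_signature(candidate, gfp, n=2)  # the text above uses 1,000 genes
scores = signature_scores(reference, signature)
print(signature, max(scores, key=scores.get))  # ['VIM', 'SLC1A3'] radial_glia
```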
Discussion of Methods for Selection and Characterization of TFs Driving Optimal Radial Glia Differentiation
Applicants can continue to validate the candidate TFs. Applicants have already identified and selected the most promising TFs for further characterization to understand their role in radial glia differentiation. In particular, because some of the candidates did not produce neurons until after 4 weeks of differentiation, Applicants can spontaneously differentiate radial glia derived by candidate TF overexpression for a total of 6-8 weeks to observe additional astrocytes and oligodendrocytes. Applicants can immunostain the cells that have been differentiated for 6 and 8 weeks to determine which candidates generate radial glia that can differentiate into all 3 cell types at these time points. After pinpointing the ideal TF induction and differentiation timeline, Applicants can perform single-cell RNA-seq on the cells spontaneously differentiated from the top 4 candidates to more precisely characterize the types of differentiated cells. Because of the morphology of neural cells and the difficulty of dissociating them into single cells, single nuclei can be isolated from neural cells and sequenced as previously described (see e.g., WO/2017/164936). Applicants can compare the in vivo anatomical locations of the cell types to which the differentiated cells correspond with the expression pattern of each TF in the human brain using the Allen Human Brain Atlas (Sunkin S M, et al., Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res. 2013; 41(Database issue):D996-D1008). To better understand the regulatory pathways through which the TFs drive differentiation, Applicants can also perform chromatin immunoprecipitation followed by sequencing (ChIP-seq) using the epitope tag (e.g., V5) on the TF cDNA constructs and identify target genes for the top 4 candidates.
Applicants can integrate differentially expressed genes and TF target genes from the RNA-seq and ChIP-seq results respectively to better understand potential pathway similarities and differences between the top 4 TFs. Finally, Applicants can combine 2 or 3 of the top 4 candidates and assess any potential synergistic improvement in radial glia fidelity using RNA-seq and spontaneous differentiation.
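One way to integrate the two data types, sketched here in Python purely as an assumption (the text does not name a method), is to intersect each TF's differentially expressed genes with its ChIP-seq target genes and ask whether the overlap is larger than expected by chance using a one-sided hypergeometric test. The gene universe (e.g., all expressed genes) and the gene sets below are hypothetical, and correction for testing multiple TFs is not shown.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when `draws` genes are picked at random from `population`
    genes, `successes` of which are ChIP-seq targets."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def integrate_targets(de_genes, chip_targets, universe):
    """Overlap of differentially expressed and TF-bound genes, with a
    one-sided hypergeometric p-value for enrichment of the overlap."""
    de = set(de_genes) & universe
    chip = set(chip_targets) & universe
    overlap = de & chip
    p = hypergeom_sf(len(overlap), len(universe), len(chip), len(de))
    return sorted(overlap), p


# Hypothetical gene sets for one candidate TF.
universe = {f"G{i}" for i in range(10)}
overlap, p = integrate_targets(["G0", "G1", "G2", "G3", "G4"],
                               ["G0", "G1", "G2", "G3", "G4"], universe)
print(overlap, p)  # here p equals 1/252
```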
Given the data described herein, Applicants expect to find several candidate TFs whose overexpression can differentiate hESCs into radial glia that closely resemble primary cells. Applicants may also uncover multiple candidate TFs that each produce different subtypes of radial glia. Some of these candidates might upregulate the radial glia marker genes without exhibiting other properties associated with radial glia, such as the ability to differentiate into different neural cell types. Since the candidate TFs likely have different downstream gene targets, the radial glia they produce can have different transcriptome signatures and spontaneously differentiate into varying proportions of different downstream neural cell types. Applicants expect that the downstream cell types identified by single-nucleus RNA-seq can correlate with the expression pattern of the TF in the human brain.
A number of directed differentiation protocols require overexpression of two or more TFs for successful cell type conversion. It is possible that one TF can be insufficient for generating radial glia that can maintain multipotency and spontaneously differentiate into neurons, astrocytes, and oligodendrocytes. In this case, Applicants can select 5-10 candidates that produce cell types with transcriptome signatures that are most similar to human fetal radial glia and overexpress different combinations of these candidates. Applicants can also combine the top 5-10 TFs that are most specifically and highly expressed in radial glia based on available RNA-seq datasets (Camp J G, et al., 2015; Johnson M B, et al., 2015; Pollen A A, et al., 2015; Thomsen E R, et al., 2016; Wu J Q, et al., 2010; and Zhang Y, et al., 2016).
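Testing TF combinations in an arrayed format scales combinatorially with the number of candidates. The Python sketch below, offered only as an illustration, enumerates all pairs and triples of a candidate list and counts them for 5 and 10 candidates; the TF names are hypothetical placeholders, and restricting to combinations of 2 or 3 TFs is an assumption taken from the combinations discussed above.

```python
from itertools import combinations
from math import comb


def combination_pool(candidates, sizes=(2, 3)):
    """All unordered TF combinations of the given sizes."""
    return [combo for k in sizes for combo in combinations(candidates, k)]


def n_combinations(n_candidates, sizes=(2, 3)):
    """Number of unordered combinations without enumerating them."""
    return sum(comb(n_candidates, k) for k in sizes)


# Hypothetical placeholder names for four top candidates.
pool = combination_pool(["TF1", "TF2", "TF3", "TF4"])
print(len(pool))                              # 10 (6 pairs + 4 triples)
print(n_combinations(5), n_combinations(10))  # 20 165
```

The jump from 20 to 165 conditions as the candidate list grows from 5 to 10 is one reason to prioritize candidates by transcriptome similarity before testing combinations.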
Example 2—Arrayed TF Screen for iNP Differentiation
As described in Example 1, Applicants compared two methods for overexpressing TFs to direct differentiation, ORF (open reading frame, cDNA) and synergistic activation mediators (SAM) CRISPR-Cas9 activation16. Applicants used these methods to stably upregulate NEUROD1 or NEUROG2, two TFs that have been previously shown to induce neuronal differentiation, in the HUES66 hESC line (
Based on the results of the comparison, Applicants used TF ORF overexpression to screen for TFs that could differentiate hESCs into iNPs first in an arrayed format to identify optimal parameters and candidate TFs that could guide the development of pooled TF screens (
Example 3—Development of a Pooled TF Screening Platform
Pooled screens are less expensive and time-intensive than arrayed screens because they do not require individually preparing each perturbation (e.g., overexpression of TFs) in the library. Pooled screening involves transducing pooled lentiviral libraries at a low multiplicity of infection (MOI) to ensure that most transduced cells receive only one stably integrated construct. At the end of the screen, deep sequencing of the DNA barcodes contained in the constructs integrated into the bulk genomic DNA can be used to identify changes in the construct distribution resulting from the applied screening selection pressure. In certain embodiments, cells having characteristic markers for the cell type of interest (e.g., radial glia) are sorted and the DNA barcodes corresponding to TFs are determined, thus identifying TFs that drive differentiation into the cell type of interest.
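The rationale for a low MOI can be made concrete with the standard approximation that lentiviral integration events per cell follow a Poisson distribution with mean equal to the MOI. The Python sketch below is illustrative only; the MOI values are examples, not parameters from this disclosure.

```python
from math import exp, factorial


def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k integrated constructs."""
    return exp(-moi) * moi ** k / factorial(k)


def transduction_summary(moi):
    """Fractions of all cells with 0, 1, or >1 integrations, plus the
    fraction of transduced cells carrying exactly one construct."""
    p0 = poisson_pmf(0, moi)
    p1 = poisson_pmf(1, moi)
    transduced = 1.0 - p0
    return {
        "untransduced": p0,
        "single": p1,
        "multiple": transduced - p1,
        "single_among_transduced": p1 / transduced,
    }


for moi in (0.3, 2.0):
    s = transduction_summary(moi)
    print(moi, round(s["single_among_transduced"], 3), round(s["multiple"], 3))
```

Under this model, roughly 86% of transduced cells carry a single construct at an MOI of 0.3, whereas at an MOI of 2 most transduced cells carry several, which is the regime exploited when screening for TF combinations.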
Applicants provide a generalizable TF screening platform based on pooled screening for further identification of regulators driving cellular differentiation (
Applicants have engineered two different HUES66 hESC reporter lines that express the fluorescent protein EGFP upon upregulation of an endogenous radial glia marker gene, either VIM or SLC1A3. Screening in two different marker gene reporter lines can more specifically pinpoint which TFs direct radial glia differentiation rather than merely upregulate one gene that may also be expressed in other cell types. For each marker gene, Applicants used CRISPR-Cas9 to precisely edit the endogenous locus such that EGFP is expressed under the same promoter as the marker gene, followed by a P2A ribosomal skipping site and the marker gene (Cong L, et al., Multiplex genome engineering using CRISPR/Cas systems. Science. 2013; 339(6121):819-23; and Mali P, et al., RNA-guided human genome engineering via Cas9. Science. 2013; 339(6121):823-6). Applicants chose to insert EGFP at the N-terminus of the proteins because the N-terminal insertion site was consistent across the isoforms. The P2A ribosomal skipping site separates the EGFP and marker gene proteins and prevents the EGFP insertion from potentially interfering with folding of the endogenous protein. For each marker gene, Applicants generated three clonal hESC lines to reduce the possibility that the candidate TFs identified have an effect only in a particular clonal line. Applicants evaluated the ability of the reporter lines to fluoresce upon marker gene upregulation by targeting CRISPR activators (Konermann S, et al., 2015) to the marker gene promoter, as well as by overexpressing a candidate TF from Example 1 to differentiate the hESCs into radial glia. In both cases, Applicants detected EGFP fluorescence in both marker lines by imaging. For TF overexpression, Applicants also observed morphological changes consistent with radial glia differentiation.
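The EGFP-P2A-marker reporter layout can be illustrated at the protein level. This Python sketch assumes the widely used P2A peptide sequence ATNFSLLKQAGDVEENPGP and complete ribosomal skipping, which in practice is efficient but not always complete. Skipping occurs between the final glycine and proline of the 2A peptide, so the upstream EGFP product retains most of the 2A residues while the marker protein gains a single N-terminal proline. The EGFP and marker sequences are truncated toy placeholders.

```python
# P2A peptide from porcine teschovirus-1 (many constructs also place a GSG
# linker immediately upstream of it).
P2A = "ATNFSLLKQAGDVEENPGP"


def reporter_polypeptide(egfp: str, marker: str) -> str:
    """Single open reading frame: EGFP (no stop codon) - P2A - marker."""
    return egfp + P2A + marker


def ribosomal_skip(polypeptide: str):
    """Split the translated ORF where 2A skipping occurs (between the
    C-terminal Gly and Pro of P2A), assuming 100% skipping."""
    idx = polypeptide.find(P2A)
    if idx < 0:
        raise ValueError("no P2A site found")
    split = idx + len(P2A) - 1  # index of the final proline of P2A
    return polypeptide[:split], polypeptide[split:]


# Toy placeholders: the first residues of EGFP and a made-up marker peptide.
egfp_part, marker_part = ribosomal_skip(reporter_polypeptide("MVSKGEE", "MARKER"))
print(egfp_part)    # MVSKGEEATNFSLLKQAGDVEENPG
print(marker_part)  # PMARKER
```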
Applicants validated the pooled screening system by pooling the targeted 90 TF library in Examples 1-2 and performing a targeted pooled screen (
Development of a Versatile Genome-Scale TF Screen
To scale up the pooled TF screen to include all annotated TF isoforms, Applicants can use the >1,300 TF library from the Broad GPP and then synthesize a >3,500 genome-scale TF library that includes all annotated TFs (see, e.g., Table 3). The Broad GPP library is a convenient intermediate because it is readily available at a lower cost. Applicants added the candidates identified in Examples 1-2 to the Broad GPP library as positive controls. Applicants amplified the pooled Broad GPP library and verified even distribution of the TFs with deep sequencing. Applicants can package the Broad GPP library into lentivirus for transducing the hESC radial glia reporter lines. As in the targeted pooled screen, Applicants can isolate the fluorescent and control cell populations and deep sequence the barcodes to compare the TF distribution between the two populations. Applicants can evaluate the results of the Broad GPP library using the candidates identified in Examples 1-2. If the TF screen using the Broad GPP library is successful, Applicants can synthesize the complete >3,500 genome-scale TF library and screen for radial glia differentiation using the genome-scale library.
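The comparison of TF distributions between the fluorescent and control populations can be sketched as a per-barcode fold change. In this Python illustration, which is an assumption rather than the disclosed analysis, read counts are normalized to counts per million, a small pseudocount avoids division by zero, and barcodes are ranked by log2 fold change. The barcode-to-TF mapping is assumed to have been done already, the counts are hypothetical, and a practical analysis would add biological replicates and a statistical test.

```python
from math import log2


def cpm(counts):
    """Normalize raw barcode read counts to counts per million."""
    total = sum(counts.values())
    return {bc: 1e6 * c / total for bc, c in counts.items()}


def rank_enrichment(sorted_counts, control_counts, pseudocount=0.5):
    """log2 fold change of each barcode (sorted vs. control population),
    ranked from most to least enriched."""
    s, c = cpm(sorted_counts), cpm(control_counts)
    barcodes = set(s) | set(c)
    lfc = {
        bc: log2((s.get(bc, 0.0) + pseudocount) / (c.get(bc, 0.0) + pseudocount))
        for bc in barcodes
    }
    return sorted(lfc.items(), key=lambda item: item[1], reverse=True)


# Hypothetical barcode counts, already mapped to TF names.
fluorescent = {"TF_A": 900, "TF_B": 50, "TF_C": 50}
control = {"TF_A": 100, "TF_B": 450, "TF_C": 450}
ranking = rank_enrichment(fluorescent, control)
print(ranking[0][0])  # TF_A
```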
Validation of Novel TFs
Applicants can validate any additional TFs identified in the pooled screens using the arrayed methods described in Examples 1-2. If any of the candidate TFs produce radial glia that are comparable with the top 3 candidates identified in Examples 1-2, Applicants can combine the TF(s) from the pooled screens with those from the arrayed screens to potentially improve radial glia fidelity.
Discussion
By starting with a targeted pooled library and incrementally scaling up to a genome-scale library using known positive controls, Applicants can establish a generalizable TF screening platform. As Applicants increase the TF library size, Applicants expect that the proportion of fluorescent cells in the screening population can decrease. Applicants can adjust the screening parameters, such as increasing flow cytometry time and number of PCR cycles for barcode amplification, to detect the rarer positive population. Performing the pooled screening platform with the genome-scale TF library may provide additional novel TFs that can drive radial glia differentiation.
As discussed in Examples 1-2, it is possible that radial glia differentiation can require upregulation of multiple TFs. To screen for combinations of TFs, Applicants can transduce the TF libraries at high MOI such that each cell potentially overexpresses multiple TFs. Applicants can validate the candidates most enriched for radial glia marker gene expression both individually and combinatorially. Multiple barcodes in single cells can be determined by any single cell sequencing method described herein.
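Calling multiple TF barcodes per cell from single-cell sequencing data can be sketched as follows. This Python illustration assumes reads have already been parsed into (cell barcode, UMI, TF barcode) tuples, which depends on the sequencing platform and is not shown, and it uses a hypothetical threshold of 3 distinct UMIs to call a TF barcode in a cell. Cells with more than one call correspond to combinatorial perturbations.

```python
from collections import defaultdict


def assign_tf_barcodes(reads, min_umis=3):
    """Call the TF barcodes carried by each cell.

    reads: iterable of (cell_barcode, umi, tf_barcode) tuples. A TF barcode
    is called for a cell if it is supported by at least `min_umis`
    distinct UMIs, so PCR duplicates of one molecule count only once.
    """
    umis = defaultdict(set)
    for cell, umi, tf in reads:
        umis[(cell, tf)].add(umi)
    calls = defaultdict(list)
    for (cell, tf), cell_umis in umis.items():
        if len(cell_umis) >= min_umis:
            calls[cell].append(tf)
    return {cell: sorted(tfs) for cell, tfs in calls.items()}


reads = [
    ("c1", "u1", "TF_A"), ("c1", "u2", "TF_A"), ("c1", "u3", "TF_A"),
    ("c1", "u4", "TF_B"), ("c1", "u4", "TF_B"),  # one UMI read twice
    ("c2", "u1", "TF_B"), ("c2", "u2", "TF_B"), ("c2", "u3", "TF_B"),
    ("c2", "u4", "TF_C"), ("c2", "u5", "TF_C"), ("c2", "u6", "TF_C"),
]
print(assign_tf_barcodes(reads))  # {'c1': ['TF_A'], 'c2': ['TF_B', 'TF_C']}
```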
Since current neural progenitor differentiation protocols often require formation of neural rosettes, it is possible that pooled screening cannot recover some candidates found in the arrayed screen in Examples 1-2. Applicants can recover these candidates by constructing an inducible TF library (e.g., dox inducible), transducing the library at low cell density, allowing the cells to multiply in small colonies, and then inducing TF overexpression.
Compared to short hairpin RNAs and guide RNAs, cDNAs contain longer variable sequences, which can increase the skew in the distribution of pooled cDNA libraries. If the pooled cDNA libraries are significantly more skewed, Applicants can increase the screening coverage such that more cells are expressing each cDNA.
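Library skew and screening coverage can be quantified with two small calculations. The Python sketch below assumes skew is summarized as the ratio of the 90th to the 10th percentile of per-construct read counts (a common convention for pooled libraries, not one stated in this disclosure), and computes the number of cells to transduce so that each construct is carried, on average, by a chosen number of transduced cells. The coverage of 500 cells per construct and the 30% transduction efficiency are hypothetical values.

```python
import math
from statistics import quantiles


def skew_ratio(read_counts):
    """90th-percentile / 10th-percentile read count across library constructs."""
    deciles = quantiles(read_counts, n=10)  # nine cut points
    if deciles[0] <= 0:
        raise ValueError("10th percentile is zero; library is severely skewed")
    return deciles[8] / deciles[0]


def cells_to_transduce(n_constructs, coverage, transduced_fraction):
    """Cells needed so that, on average, each construct is carried by
    `coverage` transduced cells."""
    return math.ceil(n_constructs * coverage / transduced_fraction)


counts = list(range(1, 101))  # hypothetical per-construct read counts
print(round(skew_ratio(counts), 2))        # 9.0
print(cells_to_transduce(3500, 500, 0.3))  # 5833334
```

A more skewed library means that rare constructs fall below the intended coverage first, so raising the coverage term is the direct lever for compensating.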
Example 4—Development of a Pooled TF Screening Platform Using Flow-FISH
Applicants have further developed a pooled transcription factor screening platform that does not require generating clonal cell lines that express a marker gene. Applicants have used flow-FISH to read out transcription factor screens. The method provides for detecting marker genes indicative of differentiation into target cells using gene-specific probes and sorting the cells. In certain embodiments, multiple markers are used to increase specificity. Selecting for multiple reporter genes simultaneously can narrow down the target cell type, because a single gene is often not specific enough, depending on the target cell type. Additionally, the assay is versatile in that reporter genes can be added or changed by applying different probes. Flow-FISH combines FISH, to fluorescently label mRNA of reporter genes, with flow cytometry (see, e.g., Arrigucci et al., FISH-Flow, a protocol for the concurrent detection of mRNA and protein in single cells using fluorescence in situ hybridization and flow cytometry, Nat Protoc. 2017 June; 12(6):1245-1260. doi:10.1038/nprot.2017.039). Specifically, Applicants fluorescently label mRNA of reporter genes, select for target cell types by flow cytometry, and then amplify TF barcodes to identify TFs enriched in the target cells. In certain embodiments, the marker genes are selected such that they are specifically expressed only in the target cell. In this way, false positive selection and background are reduced. The assay is also optimized to remove background fluorescence and to select for true positive cells.
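Multi-marker selection can be sketched as an AND gate: a cell is sorted only if its FISH signal exceeds background for every marker probe. In this Python illustration, each marker's cutoff is set, as a hypothetical choice, at the 99th percentile of signal in a negative control population; the intensities are toy values, and the marker names follow the SLC1A3 and VIM probes used in the screen.

```python
def background_threshold(negative_control, percentile=99.0):
    """Signal level exceeded by only ~1% of negative-control events."""
    values = sorted(negative_control)
    index = round(percentile / 100.0 * (len(values) - 1))
    return values[index]


def gate_all_markers(cells, thresholds):
    """Indices of cells whose FISH signal exceeds the cutoff for every marker."""
    return [
        i for i, cell in enumerate(cells)
        if all(cell[marker] > cutoff for marker, cutoff in thresholds.items())
    ]


# Hypothetical intensities: a negative control sets each marker's cutoff.
negative = list(range(100))
thresholds = {"SLC1A3": background_threshold(negative), "VIM": background_threshold(negative)}
cells = [
    {"SLC1A3": 150, "VIM": 200},  # positive for both markers
    {"SLC1A3": 150, "VIM": 50},   # VIM below background
    {"SLC1A3": 20, "VIM": 300},   # SLC1A3 below background
]
print(thresholds["VIM"], gate_all_markers(cells, thresholds))  # 98 [0]
```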
Applicants used the 90 TF library to screen for TFs that differentiate hESCs into radial glia by combining SLC1A3 and VIM probes for those reporter genes (Table 4). The data show that Applicants were able to selectively enrich for the TFs that the arrayed and reporter gene screens described in Examples 1-3 identified as drivers of radial glia differentiation.
Example 5—Identification of Candidate TFs Using the Pooled TF Screening Platform
Having optimized parameters and identified candidate TFs in the arrayed screen, Applicants generated a pooled TF screening approach, as described herein. The pooled screening platform is less expensive and laborious than arrayed screening, making it more high-throughput. Applicants simplified TF identification in pooled screens by pairing a unique DNA barcode with each of the 90 TF ORF isoforms synthesized for the arrayed screen (
For the reporter cell line method, Applicants generated clonal reporter cell lines with EGFP inserted into the endogenous locus of an NP marker gene, either SLC1A3 or VIM, as described. Applicants transduced the SLC1A3 or VIM reporter cell line with the pooled TF library, differentiated the cells for 7 days, and sorted for high and low EGFP-expressing cells (
For the flow-FISH method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and labeled 2 or 10 NP marker gene transcripts using pooled FISH probes (
For the scRNA-seq method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and performed scRNA-seq to profile 59,640 single cells (
Overall, the arrayed and pooled screens nominated overlapping sets of candidate TFs for iNP differentiation (
Example 6—Validation of Candidate TFs
To validate the screening results, Applicants chose to focus on the eight candidate TFs from the flow-FISH screen as well as two additional candidates that were enriched in the other screens and previously suggested to mediate iNP differentiation, ASCL127 and PAX628 (
Example 7—Spontaneous Differentiation of iNPs
Next, Applicants functionally validated the candidate TFs by spontaneously differentiating the iNPs produced by each candidate. Applicants transiently overexpressed candidate TFs for 1 week to produce iNPs and removed growth factors from the media to allow the iNPs to spontaneously differentiate (
Applicants validated these four TFs in two additional pluripotent stem cell lines, iPSC11a and H1. For both cell lines, overexpression of the four TFs produced iNPs that expressed higher levels of NP marker genes relative to GFP control (
Applicants further characterized the cells spontaneously differentiated from iNPs produced by these four TFs using scRNA-seq. Cluster analysis of 52,364 cells revealed that the iNPs generated a broad range of cell types that are produced by NPs during development, such as cell types from the retina, CNS, epithelium, and neural crest (
Applicants sought to better understand the transcriptional networks that lead to iNP production by profiling the transcriptional targets of the four TFs using chromatin immunoprecipitation with sequencing (ChIP-seq). Motif analysis generated distinct motifs for each TF and suggested potential transcription coregulators, some of which have been previously shown to interact with the TF (
Example 8—Modeling Neurodevelopmental Disorders Using iNPs
To demonstrate that iNPs can be used to model neurological disorders, Applicants knocked out and overexpressed DYRK1A, perturbations which have been implicated in autism spectrum disorder31 and Down syndrome32 respectively, in iPSC11a (
Applicants then spontaneously differentiated the iNPs to further profile the effects of DYRK1A perturbation on neurogenesis and neural development. Applicants found that knockout of DYRK1A increased, whereas overexpression of DYRK1A decreased, the proportion of proliferating iNPs (
Example 9—Genome-Scale TF Screen to Identify Drivers of Astrocyte Differentiation
Astrocytes are the most abundant cell type in the vertebrate central nervous system. Although previously thought to be passive responders to neuronal damage, growing evidence suggests that astrocytes actively signal to neurons to influence synaptic development, transmission, and plasticity through secreted and contact-dependent signals (Chung W S, et al., 2015). Current protocols to differentiate astrocytes from hESCs are labor-intensive, requiring the production of embryoid bodies, and take several months to produce mature astrocytes (Krencik R, et al., 2011). Identification of TFs that direct astrocyte differentiation can enable better understanding of astrocyte development and contribute to more complete models of the brain amenable to high-throughput studies. Therefore, Applicants can apply the genome-scale TF screens described herein to identify candidates that can differentiate radial glia into astrocytes (
Using the methods described in Example 3, Applicants have engineered two different HUES66 hESC reporter lines that express the fluorescent protein EGFP upon upregulation of an astrocyte marker gene, either ALDH1L1 or GFAP. For each reporter line, Applicants generated three clonal lines and verified fluorescence upon marker gene upregulation using CRISPR activation. Flow-FISH using astrocyte markers and scRNA-seq may also be used as described.
Genome-Scale TF Screen for Astrocyte Differentiation
Applicants can differentiate both the GFAP and ALDH1L1 hESC reporter lines, or hESCs, into radial glia using dox-inducible overexpression of the top radial glia candidate TF(s) found in Examples 1-9. Once the hESCs have differentiated into radial glia, Applicants can withdraw dox to turn off overexpression and transduce the cells with the genome-scale TF library. Since neurogenesis precedes gliogenesis in the developing brain, Applicants hypothesize that astrocyte differentiation might require signaling from neurons. Applicants can thus perform the TF screen in the presence of neurons differentiated through NEUROG2 overexpression (Zhang Y, et al., 2013). Astrocyte differentiation might also require more time than radial glia differentiation, so Applicants can perform small-scale screens to determine the optimal time point. After 1, 2, and 4 weeks of differentiation, Applicants can use flow cytometry to quantify the percentage of fluorescent cells. Applicants can then perform the genome-scale screen and, at the time point with the highest percentage of fluorescent cells, isolate fluorescent cells indicating upregulation of the marker gene, together with cells in the lowest 15% of fluorescence as controls. Applicants can deep sequence the TF barcodes in both populations to identify TFs enriched in the fluorescent population.
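Choosing the screening time point and defining the two sort gates can be sketched in Python as follows. The 1, 2, and 4 week time points and the lowest-15% control gate come from the text above; the fluorescence threshold, which in practice would be set from a non-fluorescent control, and the intensity values are hypothetical.

```python
def percent_fluorescent(values, threshold):
    """Percentage of cells whose EGFP signal exceeds the threshold."""
    return 100.0 * sum(v > threshold for v in values) / len(values)


def best_timepoint(timecourse, threshold):
    """Time point (weeks) with the highest percentage of fluorescent cells."""
    return max(timecourse, key=lambda t: percent_fluorescent(timecourse[t], threshold))


def sort_gates(values, threshold, low_fraction=0.15):
    """Indices of fluorescent cells and of the lowest-fluorescence control cells."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    low = sorted(order[:int(len(values) * low_fraction)])
    high = [i for i, v in enumerate(values) if v > threshold]
    return high, low


# Hypothetical EGFP intensities after 1, 2, and 4 weeks of differentiation.
timecourse = {
    1: [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    2: [1, 2, 3, 4, 5, 6, 7, 8, 100, 100],
    4: [1] * 10,
}
week = best_timepoint(timecourse, threshold=50)
print(week, sort_gates(timecourse[week], threshold=50))  # 2 ([8, 9], [0])
```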
Validation of Candidate TFs
After identifying candidate TFs for astrocyte differentiation, Applicants can evaluate the fidelity of astrocytes differentiated from these candidates using RNA-seq, immunostaining, and functional studies on synapse formation and elimination. Applicants can perform RNA-seq on the differentiated astrocytes at two different time points determined by enrichment of fluorescent cells during the screen. Applicants can compare the RNA-seq results from differentiated astrocytes to those from human astrocytes using methods described in Examples 1-2. Applicants can also immunostain the differentiated astrocytes for the astrocyte markers SOX9, AQP4, and GFAP. Finally, Applicants can assess the ability of differentiated astrocytes to promote synapse formation and elimination. For synapse formation, Applicants can culture isolated mouse neurons or differentiated human neurons with and without the differentiated astrocytes and quantify the number of synapses in each condition by immunostaining for the pre- and post-synaptic markers bassoon and homer1, respectively, and imaging. Applicants can quantify synapse elimination with an in vitro assay used in previous studies, in which a pH-sensitive fluorescent dye (pHrodo) is conjugated to isolated synaptosomes, which then fluoresce upon incorporation into lysosomes through phagocytosis (Chung W S, et al., Astrocytes mediate synapse elimination through MEGF10 and MERTK pathways. Nature. 2013; 504(7480):394-400).
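The text does not specify how synapses are counted from bassoon and homer1 images. One common approach, sketched here in Python as an assumption, counts a synapse wherever a presynaptic bassoon punctum lies within a distance cutoff of a postsynaptic homer1 punctum, pairing puncta one-to-one so that a single homer1 punctum is not counted twice. Punctum detection from images is not shown, and the 0.5 µm cutoff and the coordinates are hypothetical.

```python
from math import dist


def count_synapses(bassoon, homer1, max_distance_um=0.5):
    """Count synapses as bassoon puncta paired one-to-one with the nearest
    unpaired homer1 punctum within `max_distance_um`."""
    unpaired = list(homer1)
    n = 0
    for pre in bassoon:
        nearby = [post for post in unpaired if dist(pre, post) <= max_distance_um]
        if nearby:
            unpaired.remove(min(nearby, key=lambda post: dist(pre, post)))
            n += 1
    return n


# Hypothetical punctum centroids (x, y) in micrometers from one image field.
bassoon = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 10.0)]
homer1 = [(0.05, 0.0), (5.3, 5.0), (20.0, 20.0)]
print(count_synapses(bassoon, homer1))  # 2
```

Counts from fields imaged with and without the differentiated astrocytes can then be compared per condition, ideally normalized to neuron number or dendrite length.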
Discussion
Like radial glia, astrocytes in the human brain are highly diverse, and Applicants therefore expect to find multiple TFs that direct differentiation into different astrocyte subtypes. These TFs likely regulate cellular pathways that are important for astrocyte function. Like in vivo astrocytes, the differentiated astrocytes may increase synapse formation and phagocytose synaptosomes.
Since astrocytes arise at a later time point than radial glia during development, Applicants may extend the differentiation time of the pooled screen accordingly. In addition, it is possible that astrocyte differentiation requires exogenous factors beyond those provided by NEUROG2-differentiated neurons. Applicants can screen in the presence of isolated mouse neurons or mouse cortical brain slices to provide additional factors. If astrocyte differentiation requires upregulation of more than one TF, Applicants can transduce the TF library at high MOI. Applicants can also combine TF upregulation with downregulation by generating a TF CRISPR knockdown library and transducing cells with both the cDNA and CRISPR knockdown libraries.
Example 10—Discussion
In summary, Applicants have developed a systematic method to identify TFs for iNP differentiation that could be applied to any cell type of interest. Applicants showed that NP RNA-seq data can serve as a starting point for selecting TFs and marker genes for unbiased pooled screening, and demonstrated the feasibility of using reporter cell line, flow-FISH, or scRNA-seq methods to select candidate TFs. Applicants found four novel TFs that could individually differentiate hESCs and iPSCs into iNPs that resemble the morphology, transcriptome signature, and functionality of human fetal radial glia. Of the four candidates, RFX4 produced iNPs that spontaneously differentiated into the highest proportion of CNS cell types, even though RFX4 has been less extensively studied in CNS development than the other candidates38,39. These findings highlight the importance of performing unbiased TF screens. By knocking out and overexpressing DYRK1A in iNPs to model neurodevelopmental disorders, Applicants demonstrated the potential of iNPs to advance understanding of complex processes in development and disease.
The screening approach could be extended to generate other cell types that may require more than one TF. To identify combinations of TFs, Applicants could screen TFs at a higher MOI to increase the probability of introducing more than one TF in the same cell. Iterative TF screens, for instance performing TF screens in iNPs for differentiation into neurons or glia, may more closely mimic the natural developmental trajectory and facilitate generation of mature cell types. Other factors, such as mechanical stress or signaling from other cell types that are naturally present during development, may also be necessary in TF screens for some cell types.
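The effect of MOI on the chance of delivering more than one TF can be estimated with a Poisson model of lentiviral integration. This sketch assumes integrations per cell are Poisson-distributed with mean equal to the MOI, which is a standard approximation rather than something measured here. It counts integrations, not distinct TFs; in a large library, two integrations in one cell will almost always carry different TFs.

```python
import math

def multi_integration_fraction(moi):
    """Fraction of transduced cells (k >= 1) carrying two or more integrations,
    assuming integrations per cell ~ Poisson(moi)."""
    if moi <= 0:
        raise ValueError("moi must be > 0")
    p0 = math.exp(-moi)          # untransduced cells
    p1 = moi * math.exp(-moi)    # exactly one integration
    return (1.0 - p0 - p1) / (1.0 - p0)

for moi in (0.3, 1.0, 3.0):
    print(f"MOI {moi}: {multi_integration_fraction(moi):.1%} of transduced cells have >=2 integrations")
```

Under this model, roughly 14%, 42%, and 84% of transduced cells carry two or more integrations at MOI 0.3, 1, and 3, respectively, which is why low MOI favors single-TF perturbations and high MOI favors combinations.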
Beyond cellular programming, TF screening enables identification of factors involved in cellular reprogramming and trans-differentiation, as well as cancer progression and senescence. The demonstration that barcoding of ORFs allows for a variety of screening selection methods could also apply to pooled ORF screening of other protein families of interest. Future application of this TF screening platform for cellular engineering has the potential to expand the number of available cellular models that will help elucidate complex regulatory mechanisms behind development and disease.
Example 11—TF Screen to Identify Drivers of Cardiomyocyte Differentiation
Using the described screens, Applicants have identified that the transcription factor EOMES generates cardiomyocytes. Overexpression of EOMES for 2 days differentiates stem cells into beating cardiomyocytes by day 8. This differentiation method produces a much higher percentage of cardiomyocytes (~75% vs. ~30%) than the published mouse method (see, e.g., Van den Ameele J, Tiberi L, Bondue A, et al. Eomesodermin induces Mesp1 expression and cardiac differentiation from embryonic stem cells in the absence of Activin. EMBO Reports. 2012; 13(4):355-362. doi:10.1038/embor.2012.23; and WO2013010965A1). The present invention demonstrates the use of human EOMES to differentiate human stem cells. For the cardiomyocytes, Applicants have observed the cells beating after 2 weeks of differentiation and have made a video recording. Applicants have further identified MESP1 and ESR1 as candidates that drive cardiomyocyte differentiation. In certain embodiments, the cardiomyocytes generated according to the present invention may be used for transplantation into patients suffering from heart disease. The present methods also allow cardiomyocytes to be generated by expressing a single transcription factor, as opposed to previous methods in which fibroblasts are differentiated into cardiomyocytes by expressing three transcription factors. In certain embodiments, the cardiomyocytes of the present invention may be used for screening drugs; for example, drugs can be screened for toxicity to cardiomyocytes.
Conditions for generating cardiomyocytes according to the present invention include the following. ES cells are cultured in RPMI + 1× B27 (without insulin) + 50 μg/mL ascorbic acid and switched to RPMI + 1× B27 at day 7. The seeding density is high (about 500,000 cells/mL). Dox (about 500 ng/mL) is added during days 0-2 to induce expression of the transcription factor (e.g., EOMES). This method results in about 75% of the cells expressing the cardiomyocyte marker TNNT2.
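The cardiomyocyte conditions above can be written as a day-by-day lookup. This is a sketch for planning only: it assumes "days 0-2" means dox on days 0 and 1 (2 days of induction, matching the 2-day EOMES overexpression), and the 2 mL culture volume in the usage line is hypothetical.

```python
def cardiomyocyte_conditions(day, dox_days=2):
    """Return (base medium, doxycycline in ng/mL) for a differentiation day.

    Assumption: dox is present on days 0 .. dox_days-1; set dox_days=3 if a
    window that includes day 2 is intended.
    """
    if day < 0:
        raise ValueError("day must be >= 0")
    if day < 7:
        medium = "RPMI + 1x B27 (without insulin) + 50 ug/mL ascorbic acid"
    else:
        medium = "RPMI + 1x B27"  # switch at day 7
    dox_ng_per_ml = 500 if day < dox_days else 0
    return medium, dox_ng_per_ml

def cells_to_seed(volume_ml, density_per_ml=500_000):
    """Cells needed to reach the stated seeding density of about 500,000 cells/mL."""
    return round(volume_ml * density_per_ml)

for d in (0, 1, 2, 7):
    print(d, *cardiomyocyte_conditions(d))
print(cells_to_seed(2.0))  # hypothetical 2 mL culture -> 1000000 cells
```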
Example 12—A Multiplexed Transcription Factor Screening Platform for Directed Differentiation
Directed differentiation of human pluripotent stem cells into diverse cell types has the potential to realize a broad array of cellular replacement therapies and provides a tractable model that can be perturbed, genetically or chemically, to assess effects in a cell type-specific context (Cohen and Melton, 2011; Colman and Dreesen, 2009; Keller, 2005; Kiskinis and Eggan, 2010; Robinton and Daley, 2012). However, it remains challenging or impossible to generate many cell types (Cohen and Melton, 2011; Colman and Dreesen, 2009; Keller, 2005; Kiskinis and Eggan, 2010; Robinton and Daley, 2012). The best differentiation methods are often labor-intensive and can require months to produce even heterogenous or immature cell populations. Many of these methods rely on exogenous growth factors or small molecules, which are often dosage-sensitive and difficult to identify in a scalable manner. Alternatively, overexpression of transcription factors (TFs) has been shown to rapidly and efficiently generate many different cell types, including neurons and skeletal muscle cells (Furuyama et al., 2019; Pang et al., 2011; Song et al., 2012; Sugimura et al., 2017; Takahashi and Yamanaka, 2006; Weintraub et al., 1989; Zhang et al., 2013). As TFs use endogenous regulatory pathways to drive differentiation, mimicking natural development, this approach to engineering cell fate may produce higher fidelity models while illuminating aspects of development. However, the process of discovering TFs for directed differentiation relies on time-intensive and low-throughput arrayed screens. Arrayed screens, in which each perturbation must be performed and tested individually, are challenging to carry out at large scale, typically limited to 5-25 TFs (Furuyama et al., 2019; Pang et al., 2011; Song et al., 2012; Sugimura et al., 2017; Takahashi and Yamanaka, 2006; Weintraub et al., 1989; Zhang et al., 2013). 
By contrast, pooled screening approaches, which make use of barcodes to enable multiple perturbations to be tested in parallel, are more scalable, both in terms of time and cost.
To unlock the potential of this promising approach, Applicants sought to develop a multiplexed TF screening platform to identify TFs that can drive specific cell fates in a high-throughput manner. Applicants explored two requirements for pooled screening to identify TFs that drive differentiation. First, each perturbation must be introduced into cells at single copy yet still drive sufficient TF expression to induce cellular programming. Second, target cell types must be enriched from a diverse cell population, and the TF perturbations that produce them must be identifiable.
Applicants first compared different TF overexpression methods and found that ORF overexpression most effectively differentiated human embryonic stem cells (hESCs) into neurons. To establish a generalizable platform for systematic identification of TFs for cellular programming, Applicants created a barcoded human TF library, which Applicants named Multiplexed Overexpression of Regulatory Factors (MORF). The MORF library comprises all known TFs from the human genome, with 3,548 isoforms covering 1,836 genes. Applicants used this library to assay 90 TF isoforms for differentiation of hESCs into neural progenitors (NPs). Applicants chose NPs as the target cell type because induced NPs (iNPs) offer a tractable model for studying complex disorders of the central nervous system (CNS), but current methods for producing iNPs, namely embryoid body formation (Schafer et al., 2019; Zhang et al., 2001) or dual SMAD inhibition (Chambers et al., 2009; Shi et al., 2012a), are low-throughput or produce variable differentiation results depending on the cell line (Hu et al., 2010), respectively. Applicants selected for TFs that drive iNP differentiation using various methods to enrich for target cell types based on marker gene combinations. The pooled screens identified four TFs (RFX4, NFIB, PAX6, and ASCL1), each of which produced multipotent iNPs that could spontaneously differentiate into CNS cell types. Addition of dual SMAD inhibitors to RFX4-overexpressing cells produced homogenous iNPs that preferentially differentiated into GABAergic neurons. RFX4-iNPs can be used to model neurodevelopmental disorders. Using iNPs as a demonstration, Applicants show that pooled TF screening is a scalable and generalizable approach for systematically identifying TFs that drive differentiation of desired cell types.
Example 13—TF ORF Overexpression Effectively Drives Differentiation
Recently, the microbial CRISPR-Cas9 system has been adapted for large-scale gene activation screening, which provides a rapid and efficient method for elucidating complex biology at the genome scale (Gilbert et al., 2014; Konermann et al., 2015). Applicants therefore first sought to leverage the ease and scalability of CRISPR activation (CRISPRa) to screen 1,965 annotated TF genes (Zhang et al., 2012) for their ability to drive differentiation of HUES66 hESCs toward NP cell fates. However, the initial screen did not lead to significant differentiation (data not shown), in contrast to previous observations in mouse embryonic stem cells (Liu et al., 2018).
Although CRISPRa has been used in a range of biological contexts (Gilbert et al., 2014; Joung et al., 2017a; Konermann et al., 2015), the particular regulatory environment of hESCs may be uniquely buffered against TF overexpression. Therefore, Applicants next compared the ability of CRISPRa and ORF-based methods to overexpress NEUROD1 or NEUROG2, two TFs that have been previously shown to induce neuronal differentiation (Zhang et al., 2013), at single copy in HUES66 hESCs (
Example 14—A Barcoded Human TF Library for Directed Differentiation
To enable high-throughput, systematic identification of TFs for directed differentiation of any desired cell type, Applicants created a barcoded human TF library, MORF (
Example 15—Development of a Pooled TF ORF Screening Platform for iNP Differentiation
As a demonstration, Applicants performed a targeted TF screen for differentiation of hESCs into iNPs. To select a subset of TFs for the screen, Applicants examined eight RNA-sequencing (RNA-seq) datasets (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016) and identified 70 TFs that were found to be specifically expressed in NPs in at least two datasets. For each TF, Applicants included isoforms that comprised >25% of the expressed transcript in NPs, resulting in a total of 90 TF isoforms (see Methods; Table 1). Applicants pooled the barcoded TFs and packaged them into a lentiviral library for delivery in hESCs (
For the reporter cell line method, Applicants generated clonal reporter cell lines with EGFP inserted downstream of an endogenous NP marker gene, either SLC1A3 or VIM, which were selected based on convergence across published RNA-seq datasets and high expression levels (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016). Applicants transduced the SLC1A3 or VIM reporter cell line with the pooled TF library, differentiated the cells for 7 days, and sorted for high and low EGFP-expressing cells (
For the flow-FISH method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and labeled either 2 or 10 NP marker gene transcripts using pooled FISH probes (
For the scRNA-seq method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and performed scRNA-seq to profile 53,560 single cells (
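In an scRNA-seq readout, each cell must first be linked to the TF it received before TFs can be compared. The sketch below assumes per-cell TF barcode UMI counts are available and that each cell already carries an NP/non-NP label from clustering; the minimum-UMI and dominance thresholds and the TF names are hypothetical, not parameters stated in this document.

```python
from collections import Counter

def assign_tf(barcode_umis, min_umis=2, min_fraction=0.8):
    """Assign a cell to its dominant TF barcode, or None if undetected or ambiguous."""
    total = sum(barcode_umis.values())
    if total < min_umis:
        return None
    tf, n = max(barcode_umis.items(), key=lambda kv: kv[1])
    return tf if n / total >= min_fraction else None

def np_fraction_by_tf(cells):
    """cells: iterable of (barcode_umis, is_np) pairs.
    Returns TF -> fraction of its assigned cells labeled as NPs."""
    hits, totals = Counter(), Counter()
    for umis, is_np in cells:
        tf = assign_tf(umis)
        if tf is None:
            continue  # drop cells with no barcode or mixed barcodes
        totals[tf] += 1
        hits[tf] += bool(is_np)
    return {tf: hits[tf] / totals[tf] for tf in totals}

# Hypothetical cells: (barcode UMI counts, clustered as NP?)
example = [({"TF_A": 5}, True), ({"TF_A": 3}, False), ({"GFP": 4}, False),
           ({"TF_A": 2, "GFP": 2}, True), ({}, True)]
print(np_fraction_by_tf(example))
```

Comparing each TF's NP fraction against the GFP control would then rank candidates, analogous to the sort-based enrichment in the reporter and flow-FISH screens.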
To verify the results from the pooled screen, Applicants performed an arrayed screen on the same 90 TF isoforms, packaging each TF individually into lentivirus for delivery into hESCs (
Example 16—Validation of Candidate TFs for iNP Differentiation
For downstream analysis, Applicants chose to focus on the eight candidate TFs from the flow-FISH screen as well as two additional candidates that were enriched in the other screens and previously suggested to mediate iNP differentiation, ASCL1 (Casarosa et al., 1999) and PAX6 (Zhang et al., 2010) (
Example 17—Functional Evaluation of iNP Multipotency Using Spontaneous Differentiation
Next, Applicants evaluated the multipotency of iNPs produced by each candidate TF by spontaneously differentiating the iNPs. Applicants transiently overexpressed candidate TFs for 1 week to produce iNPs and then removed growth factors from the media to allow the iNPs to spontaneously differentiate for 8 weeks (
Applicants validated these four TFs in two additional pluripotent stem cell lines, iPSC11a and H1. For both cell lines, overexpression of the four TFs produced iNPs that expressed higher levels of NP marker genes relative to GFP control (
Applicants further characterized the cells spontaneously differentiated from iNPs produced by these four TFs using scRNA-seq. Cluster analysis of 53,113 cells revealed that the iNPs generated a broad range of cell types, such as cell types from the retina, CNS, epithelium, and neural crest (
To better understand the transcriptional networks that lead to iNP production, Applicants profiled the four TFs using chromatin immunoprecipitation with sequencing (ChIP-seq). Motif analysis generated distinct motifs for each TF and suggested potential transcriptional coregulators, some of which have been found in previous studies (
Example 18—Combining RFX4 with Dual SMAD Inhibition Produces Homogenous iNPs
Next, Applicants sought to improve the consistency of RFX4-iNPs. Although RFX4-iNPs produced the highest proportion of CNS cell types, the iNPs were less consistent between biological replicates (
Applicants then compared iNPs generated by the optimized protocol, RFX4-DS, to those from two alternative NP differentiation methods that rely on embryoid body (EB) formation (Schafer et al., 2019) and dual SMAD inhibition (DS) (Shi et al., 2012a). Applicants derived iNPs using the three differentiation methods in two batch replicates and performed scRNA-seq on 42,780 iNPs (15,211 RFX4-DS-iNPs, 11,148 EB-iNPs, and 16,421 DS-iNPs). Cluster analysis showed that, as expected, the majority of the cells were NPs (
To characterize the cells spontaneously differentiated from RFX4-DS-iNPs, Applicants performed scRNA-seq on 26,111 cells at 4 and 8 weeks of spontaneous differentiation. Cluster analysis showed that RFX4-DS-iNPs differentiated predominantly into CNS cell types, including radial glia and neurons, with a small subset differentiating into meningeal cells (
Example 19—RFX4-iNPs Accurately Model Effects of DYRK1A Perturbations on Neural Development
To explore the utility of the differentiation protocol Applicants developed, Applicants transiently overexpressed RFX4 to differentiate iPSC11a into iNPs to study the effects of DYRK1A perturbation on NPs during neural development (
Applicants further characterized neurons spontaneously differentiated from DYRK1A-perturbed iNPs using electrophysiology. Whole-cell patch-clamp recording of neurons after 12-14 weeks of spontaneous differentiation confirmed that neurons derived from unperturbed iNPs were electrophysiologically functional (
Example 20—Discussion
By screening TF ORFs, Applicants were able to identify four TFs that could individually differentiate hESCs and induced pluripotent stem cells into iNPs that resemble the morphology, transcriptome signature, and multipotency of NPs. Of the four candidate TFs, overexpression of RFX4, which has not been extensively studied in CNS development, resulted in the highest proportion of CNS cell types, highlighting the importance of performing large-scale, unbiased TF screens (Ashique et al., 2009; Blackshear et al., 2003). Combining RFX4 overexpression with dual SMAD inhibition produced homogenous iNPs that spontaneously differentiated into predominantly GABAergic neurons. Notably, the differentiation method produced iNPs within 7 days, compared to 11-16 days for existing differentiation methods, and is more scalable than the embryoid body method (Chambers et al., 2009; Schafer et al., 2019; Shi et al., 2012a; Zhang et al., 2001). By perturbing DYRK1A in iNPs to model neurodevelopmental disorders, Applicants found that DYRK1A modulates iNP proliferation to disrupt neurogenesis, confirming results from previous studies in other model systems (Fotaki et al., 2002; Hammerle et al., 2011; Park et al., 2010; Soppa et al., 2014; Yabut et al., 2010) and suggesting candidate genes that mediate the effect of DYRK1A on neural development.
Although Applicants focused here on 90 TF isoforms highly expressed in the target cell type (~23% of TFs expressed in NPs and ~2.5% of all TF isoforms), the accessibility and low cost of the multiplexed screening approach lend themselves to extending the technology to additional cell types of interest. For some of these cell types, Applicants have recommended lists of marker genes and TFs based on published RNA-seq datasets (Table 9). Applicants have also provided code for aggregating gene lists from different datasets and selecting marker genes and a subset of TFs from the TF library for targeted screening (see Methods). Moreover, the approach may be applied to identify combinations of TFs by screening at a higher MOI to increase the probability of introducing more than one TF into the same cell. Iterative TF screens may also expand the range of cell types that can be generated with this platform. For instance, performing TF screens in iNPs for differentiation into neurons or glia may facilitate generation of mature cell types, as iterative overexpression of TFs may mimic the natural developmental trajectory.
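As a quick arithmetic check of the ~2.5% figure, using the MORF library size of 3,548 isoforms stated elsewhere in this document:

```latex
\frac{90\ \text{isoforms screened}}{3{,}548\ \text{isoforms in MORF}} \approx 0.0254 \approx 2.5\%
```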
Beyond directed differentiation, TF screening enables identification of factors involved in cellular reprogramming (Takahashi and Yamanaka, 2006) and trans-differentiation (Pang et al., 2011; Song et al., 2012), as well as cancer progression (Darnell, 2002) and senescence (Campisi, 2001). The ORF barcoding approach allows for a variety of screening selection methods and could also be extended to pooled ORF screening of other protein families of interest. Future application of the multiplexed TF screening platform for cellular engineering has the potential to expand the number of available cellular models that will help elucidate complex regulatory mechanisms behind development and disease.
Example 21—Methods for Examples 1-21
Sequences and cloning. The plasmids lentiMPHv2 (Addgene 89308) and lentiSAMv2 (Addgene 75112) were used for CRISPR activation. LentiCRISPRv2 (Addgene 52961) was used for CRISPR-Cas9 mediated homology-directed repair (HDR). For CRISPR-Cas9 knockout of DYRK1A, the Puromycin resistance gene in lentiCRISPRv2 was replaced with a Blasticidin resistance gene (Addgene 75112). Single guide RNA (sgRNA) spacer sequences used in this study are listed in Table 10 and were cloned into the respective vectors as previously described (Joung et al., 2017b). For spontaneous differentiation using a dox-inducible gene expression system, the plasmid pUltra-puro-RTTA3 (Addgene 58750) was used for rtTA expression. The EF1a promoter in pLX_TRC209 (Broad Genetic Perturbation Platform) was replaced with the pTight promoter (Addgene 31877). For DYRK1A overexpression, the codon-optimized DYRK1A sequence (NM_001396) was cloned into pLX_TRC209 (Broad Genetic Perturbation Platform) for expression under EF1a, and the Hygromycin resistance gene was replaced with a Blasticidin resistance gene (Addgene 75112).
Cell culture and differentiation. HEK293FT cells (Thermo Fisher Scientific R70007) were maintained in high-glucose DMEM with GlutaMax and pyruvate (Thermo Fisher Scientific 10569010) supplemented with 10% fetal bovine serum (VWR 97068-085) and 1% penicillin/streptomycin (Thermo Fisher Scientific 15140122). Cells were passaged every other day at a ratio of 1:4 or 1:5 using TrypLE Express (Thermo Fisher Scientific 12604021).
Unless otherwise specified, human embryonic stem cells (hESCs) used in these experiments were from the HUES66 cell line (Harvard Stem Cell Institute iPS Core Facility). Other stem cell lines used in this study include human induced pluripotent stem cell (iPSC) 11a (gift from the Arlotta laboratory, Harvard University) and hESC H1 (WiCell). hESCs and iPSCs were maintained in cell culture dishes coated with 1% Geltrex membrane matrix (Thermo Fisher Scientific A1413202) in mTeSR1 medium (STEMCELL Technologies 85850). For routine maintenance, stem cells were passaged 1:10-1:20 using ReLeSR (STEMCELL Technologies 05873) and seeded in mTeSR with 10 μM ROCK Inhibitor Y27632 (Enzo Life Sciences ALX-270-333-M025). For lentivirus transduction and differentiation, cells were dissociated using Accutase (STEMCELL Technologies 07920). All stem cells were maintained below passage 30 and confirmed to be karyotypically normal and negative for mycoplasma within 5 passages before differentiation.
During neuronal differentiation, stem cell media was incrementally shifted towards neuronal media, consisting of Neurobasal medium (Thermo Fisher Scientific 21103049) supplemented with B-27 (Thermo Fisher Scientific 17504044), GlutaMAX (Thermo Fisher Scientific 35050061), and Normocin (Invivogen ant-nr-1). 1 day after the start of differentiation (day 1), media was changed to stem cell media with the appropriate antibiotic. Antibiotic was included in the media for a total of 5 days of selection. On day 2, media was changed to 75% stem cell media and 25% neuronal media. On day 3, media was changed to 50% stem cell media and 50% neuronal media. On day 4, media was changed to 25% stem cell media and 75% neuronal media. On day 5, media was changed to neuronal media.
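The stepwise media shift above follows a simple rule: the neuronal fraction rises by 25% per day from day 2 and reaches 100% at day 5. The sketch below encodes that rule; the total media volume is a parameter supplied by the user, and reading the 5-day selection as days 1-5 is an assumption based on antibiotic being added on day 1.

```python
def media_volumes(day, total_ml):
    """Return (stem cell media mL, neuronal media mL) for a differentiation day.
    Day 1 is all stem cell media; each later day adds 25% neuronal media."""
    neuronal_fraction = min(1.0, max(0.0, 0.25 * (day - 1)))
    return total_ml * (1 - neuronal_fraction), total_ml * neuronal_fraction

def antibiotic_on(day, selection_days=5):
    """Assumed selection window: antibiotic from day 1 for 5 days (days 1-5)."""
    return 1 <= day <= selection_days

for d in range(1, 7):
    stem, neuro = media_volumes(d, total_ml=10.0)
    print(f"day {d}: {stem:.1f} mL stem cell + {neuro:.1f} mL neuronal, antibiotic={antibiotic_on(d)}")
```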
During TF-driven neural progenitor (NP) differentiation, stem cell media was gradually shifted towards NP media, consisting of DMEM/F-12 with HEPES (Thermo Fisher Scientific 11330057) supplemented with B-27 (Thermo Fisher Scientific 17504044), 20 ng/mL EGF (MilliporeSigma E9644), 20 ng/mL bFGF (STEMCELL Technologies 78003), 2 μg/mL heparin (STEMCELL Technologies 07980), and Normocin (Invivogen ant-nr-1). As in neuronal differentiation, the proportion of NP media was increased in 25% increments from day 2 to day 5. When selected with the appropriate antibiotic, cells were passaged at day 4. For spontaneous differentiation, 2 μg/mL doxycycline (MilliporeSigma D9891) was added to the media starting from day 0 for 7 days. After 7 days, cells were maintained in NP media for 3 days before media was changed to differentiation media, which had the same components as NP media but without EGF and bFGF. During spontaneous differentiation, 40-60% of differentiation media was refreshed every other day.
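The spontaneous differentiation timeline can be summarized as a lookup by day. This sketch follows the durations in the text (7 days of dox, 3 days in NP media, then differentiation media); the choice of 50% as the refresh fraction (midpoint of 40-60%) and of counting "every other day" from the first day of differentiation media are assumptions.

```python
REFRESH_FRACTION = 0.5  # assumed midpoint of the 40-60% range in the text

def spontaneous_plan(day, dox_days=7, np_hold_days=3):
    """Return (medium, doxycycline in ug/mL) for a day of the spontaneous
    differentiation timeline."""
    if day < 0:
        raise ValueError("day must be >= 0")
    if day < dox_days:
        # Days 0-6: dox-induced TF expression during the stem cell -> NP media shift.
        return "stem cell / NP media (stepwise shift)", 2.0
    if day < dox_days + np_hold_days:
        # Days 7-9: NP media without dox.
        return "NP media", 0.0
    # Day 10 onward: NP media lacking EGF and bFGF.
    return "differentiation media (NP media without EGF/bFGF)", 0.0

def is_partial_change_day(day, start=10):
    """Assumed every-other-day refresh counted from the first differentiation-media day."""
    return day >= start and (day - start) % 2 == 0

for d in (0, 6, 7, 9, 10, 11, 12):
    medium, dox = spontaneous_plan(d)
    print(d, medium, dox, "refresh" if is_partial_change_day(d) else "")
```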
For comparison to other NP differentiation methods, embryoid body (EB) (Schafer et al., 2019) and dual SMAD inhibition (DS) (Shi et al., 2012a) methods were used to differentiate hESCs into NPs as previously described. To provide the best comparison between the methods, the differentiation timelines for the three methods were aligned such that iNP differentiation ended at approximately the same time. The iNPs produced by the three methods were dissociated for scRNA-seq at the same time. During the RFX4-iNP protocol optimization, base media from the DS and EB protocols were tested. DS media is a 1:1 mix of N-2- and B-27-containing media. N-2 medium consists of DMEM/F12 with HEPES (Thermo Fisher Scientific 11330057) supplemented with N-2 (Thermo Fisher Scientific 17502048), 5 μg/mL insulin (Millipore Sigma 19278), 100 μM nonessential amino acids (Thermo Fisher Scientific 11140050), 100 μM 2-mercaptoethanol (Millipore Sigma M6250), and Normocin (Invivogen ant-nr-1). B-27 medium is the same as the neuronal medium described above. EB media consists of DMEM/F12 with HEPES (Thermo Fisher Scientific 11330057) supplemented with N-2 (Thermo Fisher Scientific 17502048), B27 minus vitamin A (Thermo Fisher Scientific 12587010), and Normocin (Invivogen ant-nr-1). SMAD inhibitors dorsomorphin (Millipore Sigma P5499) and SB-431542 (R&D Systems 1614) were added where indicated.
Lentivirus production. HEK293FT cells (Thermo Fisher Scientific R70007) were cultured as described above. 1 day prior to transfection, cells were seeded at ~40% confluency in T25, T75, or T225 flasks (Thermo Fisher Scientific 156367, 156499, or 159934). Cells were transfected the next day at ~90-99% confluency. For each T25 flask, 3.4 μg of plasmid containing the vector of interest, 2.6 μg of psPAX2 (Addgene 12260), and 1.7 μg of pMD2.G (Addgene 12259) were transfected using 17.5 μL of Lipofectamine 3000 (Thermo Fisher Scientific L3000150), 15 μL of P3000 Enhancer (Thermo Fisher Scientific L3000150), and 1.25 mL of Opti-MEM (Thermo Fisher Scientific 31985070). Transfection parameters were scaled up linearly with flask area for T75 and T225 flasks. Media was changed 5 h after transfection. Virus supernatant was harvested 48 h post-transfection, filtered with a 0.45 μm PVDF filter (MilliporeSigma SLHV013SL), aliquoted, and stored at −80° C.
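Linear scaling by flask area can be computed directly from the T25 recipe. This sketch assumes the nominal growth areas implied by the flask names (25, 75, and 225 cm²), giving scale factors of 3 and 9; actual vendor growth areas may differ slightly.

```python
# Per-T25 transfection recipe from the text.
T25_RECIPE = {
    "transfer plasmid (ug)": 3.4,
    "psPAX2 (ug)": 2.6,
    "pMD2.G (ug)": 1.7,
    "Lipofectamine 3000 (uL)": 17.5,
    "P3000 Enhancer (uL)": 15.0,
    "Opti-MEM (mL)": 1.25,
}
# Assumed nominal growth areas implied by the flask names.
FLASK_AREA_CM2 = {"T25": 25, "T75": 75, "T225": 225}

def scale_recipe(flask):
    """Scale every reagent linearly with flask area relative to a T25."""
    factor = FLASK_AREA_CM2[flask] / FLASK_AREA_CM2["T25"]
    return {reagent: round(amount * factor, 3) for reagent, amount in T25_RECIPE.items()}

for flask in ("T75", "T225"):
    print(flask, scale_recipe(flask))
```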
Lentivirus transduction. For transduction, 3×10⁶ hESCs or iPSCs were seeded in 10-cm cell culture dishes with 10 μM ROCK Inhibitor Y27632 (Enzo Life Sciences ALX-270-333-M025) and an appropriate volume of lentivirus in mTeSR. After 24 h, media was refreshed with the appropriate antibiotic. For 5 days, media with the appropriate antibiotic was refreshed every day, and cells were passaged after 3 days of selection. Concentrations for selection agents were determined using a kill curve: 150 μg/mL Hygromycin (Thermo Fisher Scientific 10687010), 3 μg/mL Blasticidin (Thermo Fisher Scientific A1113903), and 1 μg/mL Puromycin (Thermo Fisher A1113803). Lentiviral titers were calculated by transducing cells with 5 different volumes of lentivirus and determining viability after a complete selection of 3 days (Joung et al., 2017b).
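One common way to turn the viability measurements into a titer is to treat the surviving fraction (relative to an unselected control) as the transduced fraction and invert the Poisson model, MOI = −ln(1 − f). The sketch below assumes that model and that MOI scales linearly with virus volume; the volumes and the target MOI in the usage are hypothetical, and this is not necessarily the exact calculation in the cited protocol.

```python
import math
from statistics import median

def moi_from_survival(fraction_surviving):
    """Poisson estimate of MOI from the fraction of cells surviving selection."""
    if not 0 <= fraction_surviving < 1:
        raise ValueError("fraction_surviving must be in [0, 1)")
    return -math.log(1.0 - fraction_surviving)

def volume_for_target_moi(volumes_ul, survival_fractions, target_moi):
    """Estimate MOI per uL at each tested volume, take the median, and return the
    volume expected to give target_moi (assumes MOI is linear in volume)."""
    per_ul = [moi_from_survival(f) / v
              for v, f in zip(volumes_ul, survival_fractions) if v > 0 and 0 < f < 1]
    return target_moi / median(per_ul)

# Hypothetical titration: 5 volumes with simulated survival at 0.01 MOI per uL.
volumes = [10, 20, 40, 80, 160]
survival = [1 - math.exp(-0.01 * v) for v in volumes]
print(f"{volume_for_target_moi(volumes, survival, target_moi=0.3):.1f} uL")
```

The median damps outliers from saturating doses, where survival approaches 1 and the log transform becomes unstable.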
qPCR quantification of transcript expression. Cells were seeded in 96-well plates and grown to 60-90% confluency before RNA was reverse transcribed for qPCR as described previously (Joung et al., 2017b). TaqMan qPCR was performed with custom or readymade probes (Tables 11 and 12). Significance testing was performed using Student's t-test.
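The text does not state how TaqMan Ct values were converted to relative expression. A widely used option is the 2^−ΔΔCt (Livak) method, sketched here under its usual assumption of near-100% amplification efficiency; replicate Ct values are assumed to be averaged beforehand, and the Ct values in the example are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target transcript relative to a control sample by the
    2^-ddCt method, normalizing each sample to a reference gene."""
    d_ct_sample = ct_target - ct_ref            # e.g. marker gene vs. GAPDH in treated cells
    d_ct_control = ct_target_ctrl - ct_ref_ctrl  # same pair in control cells
    return 2 ** -(d_ct_sample - d_ct_control)

# Hypothetical Ct values: target amplifies 3 cycles earlier after normalization.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```

Significance testing, as stated in the text, would then be applied to the per-replicate ΔCt or fold-change values.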
Western blot. Protein lysates were harvested with RIPA lysis buffer (Cell Signaling Technologies 9806S) containing protease inhibitor cocktail (MilliporeSigma 05892791001). Samples were standardized for protein concentration using the Pierce BCA protein assay (VWR 23227), and 20 μg or 40 μg of the samples were incubated at 70° C. for 10 mins under reducing conditions. After denaturation, samples were separated by Bolt 4-12% Bis-Tris Plus Gels (Thermo Fisher Scientific NW04125BOX) and transferred onto a PVDF membrane using iBlot Transfer Stacks (Thermo Fisher Scientific IB401001).
For NEUROD1 and V5, blots were blocked with Odyssey Blocking Buffer (TBS; LiCOr 927-50000) for 1 h at room temperature. Blots were then probed with different primary antibodies [anti-NEUROD1 (Abcam ab60704, 1:1,000 dilution), anti-GAPDH (Cell Signaling Technologies 2118L, 1:1,000 dilution), anti-V5 (Cell Signaling Technologies 13202S, 1:1,000 dilution), anti-ACTB (MilliporeSigma A5441, 1:5,000 dilution)] in Odyssey Blocking Buffer overnight at 4° C. Blots were washed with TBST before incubation with secondary antibodies IRDye 680RD Donkey anti-Mouse IgG (LiCOr 925-68072) and IRDye 800CW Donkey anti-Rabbit IgG (LiCOr 925-32213) at 1:20,000 dilution in Odyssey Blocking Buffer for 1 h at room temperature. Blots were washed with TBST and imaged using the Odyssey CLx (LiCOr).
For DYRK1A, blots were blocked with 5% BLOT-QuickBlocker (G Biosciences 786-011) in TBST for 1 h at room temperature. Blots were then probed with different primary antibodies [anti-DYRK1A (Novus Biologicals H00001859-M01, 1:250 dilution) or anti-ACTB (Cell Signaling Technologies 4967L, 1:1,000 dilution)] in 2.5% BLOT-QuickBlocker (G Biosciences 786-011) in TBST overnight at 4° C. Blots were washed with TBST before incubation with secondary antibodies anti-mouse IgG, HRP-linked antibody (Cell Signaling Technologies 7076S) and anti-rabbit IgG, HRP-linked antibody (Cell Signaling Technologies 7074S) at 1:5,000 dilution in 2.5% BLOT-QuickBlocker (G Biosciences 786-011) in TBST for 1 h at room temperature. Blots were washed with TBST, developed with Pierce ECL Western Blotting Substrate (Thermo Fisher Scientific 32209), and imaged on the ChemiDoc XRS+ (Bio-Rad).
Immunofluorescence and imaging. Cells were cultured on poly-D-lysine/laminin coated glass coverslips (VWR 354087) in 24-well plates as described above. Prior to staining, cells were washed with 1 mL PBS and fixed with 4% paraformaldehyde (VWR 15710) in PBS for 30 mins at room temperature. Cells were washed with PBS and blocked in PBS with 2.5% goat serum (Cell Signaling Technologies 5425S) and 0.1% Triton X-100 (MilliporeSigma 93443) for 1 h at room temperature. Cells were then stained with different primary antibodies [anti-MAP2 (MilliporeSigma M1406, 1:500 dilution), anti-PAX6 (Abcam ab5790, 1:500 dilution), anti-Nestin (MilliporeSigma MAB5326, 1:200 dilution), anti-VIM (Proteintech 10366-1-AP, 1:200 dilution), anti-GFAP (Abcam ab4674, 1:500 dilution), anti-NG2 (MilliporeSigma AB5320, 1:200 dilution), anti-PDGFRA (Cell Signaling Technologies 3164S, 1:200 dilution), or anti-FOXG1 (Abcam ab18259, 1:500 dilution)] in PBS with 1.25% goat serum (Cell Signaling Technologies 5425S) and 0.1% Triton X-100 (MilliporeSigma 93443) overnight at 4° C. Cells were washed in PBS with 0.1% Triton X-100 (MilliporeSigma 93443) before staining with the appropriate secondary antibodies [goat anti-mouse IgG (Alexa Fluor 568, Thermo Fisher Scientific A-11031, 1:1,000 dilution), goat anti-chicken IgY (Alexa Fluor 488, Thermo Fisher Scientific A-11039, 1:1,000 dilution), goat anti-rabbit IgG (Alexa Fluor 647, Thermo Fisher Scientific A-21244, 1:1,000 dilution), or goat anti-rabbit IgG (Alexa Fluor 488, Thermo Fisher Scientific A-11008, 1:1,000 dilution)] in PBS with 1.25% goat serum (Cell Signaling Technologies 5425S) and 0.1% Triton X-100 (MilliporeSigma 93443) for 1 h at room temperature. Cells were washed in PBS with 0.1% Triton X-100 (MilliporeSigma 93443), mounted onto slides using ProLong Gold Antifade Mountant with DAPI (Thermo Fisher Scientific P36941), and sealed with nail polish (VWR 100491-940).
Immunostained coverslips were imaged on a Zeiss Axio Observer with a Hamamatsu camera using a Plan-Apochromat 20× objective and a 1.6× Optovar.
Image quantification. Images were taken from randomly selected regions using fixed exposure times. The MeasureImageIntensity module in CellProfiler 3.1.8 was used to analyze grayscale 577 nm images (MAP2) for mean intensity units. For induced neurons, mean intensity units were normalized by the number of nuclei in each image. The IdentifyPrimaryObjects module in CellProfiler was used to identify and count nuclei in the grayscale 353 nm (DAPI) images with the following settings modified from default: Typical diameter of objects, in pixel units (Min, Max): 25, 70; Threshold strategy: Adaptive; Threshold smoothing scale: 1.5; Lower and upper bounds on threshold: 0.06, 1.0. Significance testing was performed using Student's t-test.
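The per-image normalization and significance test described above can be sketched as follows. This is a minimal sketch, assuming the CellProfiler outputs (mean MAP2 intensity and nuclei count per image) have already been exported; the function name and the example values are hypothetical, and `scipy.stats.ttest_ind` with its default equal-variance setting stands in for the Student's t-test.

```python
import numpy as np
from scipy import stats


def intensity_per_nucleus(mean_intensities, nuclei_counts):
    """Divide each image's mean MAP2 intensity by the number of nuclei detected in it."""
    mean_intensities = np.asarray(mean_intensities, dtype=float)
    nuclei_counts = np.asarray(nuclei_counts, dtype=float)
    if np.any(nuclei_counts <= 0):
        raise ValueError("every image needs at least one detected nucleus")
    return mean_intensities / nuclei_counts


# Hypothetical per-image values for one TF condition and one control condition
tf_norm = intensity_per_nucleus([0.30, 0.28, 0.33], [120, 110, 130])
ctrl_norm = intensity_per_nucleus([0.10, 0.12, 0.11], [125, 118, 122])

# Student's t-test (two-sided; equal variances assumed, which is the scipy default)
t_stat, p_value = stats.ttest_ind(tf_norm, ctrl_norm)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```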
Design and cloning of TF ORF libraries. The barcoded human TF library (MORF) consisted of 1,836 genes that were selected based on AnimalTFDB (Zhang et al., 2015) and UniProt (UniProt, 2015) annotations and included histone modifiers. The library included 3,548 isoforms that overlapped between RefSeq and Gencode annotations, as well as 2 control vectors expressing GFP and mCherry. Of the 3,548 isoforms, 593 were obtained from the Broad Genetic Perturbation Platform and sequence verified. Table 3 lists the sequences of TFs in MORF.
To design a targeted TF ORF library for NP differentiation, single-cell or bulk RNA-seq datasets of human or mouse radial glia, neural stem cells, differentiated neural progenitors from 2D cultures or brain organoids, and fetal astrocytes were used to select TFs that were shown to be specifically expressed in these cell types (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016). TFs that were identified in 2 or more datasets (out of 8) were included in the library. Then, bulk RNA-seq data of human fetal astrocytes (Zhang et al., 2016) was used to identify TF isoforms annotated in RefSeq that comprised >25% of the TF gene transcripts. These criteria selected 90 TF isoforms covering 70 TF genes (Table 1).
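The two selection criteria for the targeted library can be expressed as short filters. This is a minimal sketch, not the authors' script (the Code availability section points to that on GitHub); it assumes each dataset has already been reduced to a list of cell type-specific TF gene symbols and that the >25% criterion is evaluated on TPM summed over all annotated isoforms of a gene. All function names, TF names, and values are hypothetical.

```python
from collections import Counter, defaultdict


def select_tfs(dataset_tf_lists, min_datasets=2):
    """Keep TFs reported as cell type-specific in at least `min_datasets` datasets."""
    counts = Counter()
    for tfs in dataset_tf_lists:
        counts.update(set(tfs))  # a TF counts at most once per dataset
    return sorted(tf for tf, n in counts.items() if n >= min_datasets)


def select_isoforms(isoform_tpm, isoform_to_gene, min_fraction=0.25):
    """Keep isoforms contributing more than `min_fraction` of their gene's summed TPM."""
    gene_total = defaultdict(float)
    for isoform, tpm in isoform_tpm.items():
        gene_total[isoform_to_gene[isoform]] += tpm
    return sorted(
        isoform
        for isoform, tpm in isoform_tpm.items()
        if gene_total[isoform_to_gene[isoform]] > 0
        and tpm / gene_total[isoform_to_gene[isoform]] > min_fraction
    )


# Hypothetical inputs: three datasets, and one gene with three isoforms
datasets = [["TF_A", "TF_B", "TF_C"], ["TF_A", "TF_D"], ["TF_B", "TF_A"]]
selected_tfs = select_tfs(datasets)  # TF_A (3 datasets) and TF_B (2 datasets)

isoform_tpm = {"ISO_1": 80.0, "ISO_2": 15.0, "ISO_3": 5.0}
isoform_to_gene = {"ISO_1": "TF_A", "ISO_2": "TF_A", "ISO_3": "TF_A"}
selected_isoforms = select_isoforms(isoform_tpm, isoform_to_gene)  # only ISO_1 exceeds 25%
```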
TF ORF isoforms that were not available from the Broad Genomic Perturbation Platform were synthesized with 24-bp barcodes (Genewiz) and cloned in an arrayed format into pLX_TRC317 (MORF; Broad Genetic Perturbation Platform) or pLX_TRC209 (targeted NP library; Broad Genetic Perturbation Platform) for expression under the EF1a promoter. Barcodes for each TF were selected to have a Hamming distance of at least 3 compared to all other barcodes.
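A minimum pairwise Hamming distance of 3 means that a single substitution error leaves a read at distance 1 from its true barcode and at least 2 from every other barcode, so it cannot be mistaken for a different TF. The source does not say how the barcodes were generated; the greedy acceptance below is an assumed, illustrative approach, and the 4-nt toy barcodes stand in for the 24-bp library barcodes.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs of barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def greedy_pick(candidates, min_dist=3):
    """Accept candidates in order if they are >= min_dist from every accepted barcode."""
    chosen = []
    for bc in candidates:
        if all(hamming(bc, c) >= min_dist for c in chosen):
            chosen.append(bc)
    return chosen


# Toy candidates: AAAT and AATT are too close to AAAA and are rejected
chosen = greedy_pick(["AAAA", "AAAT", "TTTA", "CCCC", "AATT"])
```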
Reporter cell line screen. To generate reporter cell lines, EGFP from pLX_TRC209 (Broad Genetic Perturbation Platform) followed by a T2A (GGCAGTGGAGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCCA (SEQ ID NO: 10809)) self-cleaving peptide was inserted at the N-terminus of endogenous SLC1A3 and VIM genomic sequences. Clonal reporter cell lines were generated using CRISPR-Cas9-mediated HDR. To construct the HDR plasmid for each gene, HDR templates consisting of the 850-1,000 bp genomic regions flanking the sgRNA cleavage sites were PCR amplified from HUES66 genomic DNA using KAPA HiFi HotStart ReadyMix (KAPA Biosystems KK2602). EGFP-T2A flanked by the HDR templates was then cloned into pUC19 (Addgene 50005). HUES66 cells were nucleofected with 10 μg of sgRNA and Cas9 plasmid (Addgene 52961) and 6 μg of HDR plasmid using the P3 Primary Cell 4D-Nucleofector X Kit (Lonza V4XP-3024) according to the manufacturer's instructions. Cells were then seeded sparsely (2 electroporation reactions per 10-cm cell culture dish) to form single-cell clones. After 18 h, cells were selected for Cas9 expression with 0.5 μg/mL puromycin for 2 days and expanded until colonies could be picked (~1 week).
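As a quick sanity check on the T2A sequence above, translating it in frame should give the GSG-linked T2A peptide ending in the NPGP cleavage motif. This is a minimal sketch using the standard genetic code; the helper names are hypothetical, and it checks only the 63-nt T2A segment, not the full EGFP-T2A insert or the HDR arms.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (stop codons as "*")
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate an in-frame DNA sequence (whitespace is ignored)."""
    seq = "".join(seq.upper().split())
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


T2A_DNA = "GGCAGTGGAGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCCA"
peptide = translate(T2A_DNA)
print(peptide)
```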
Cell colonies were detached by replacing the media with PBS and incubating at room temperature for 15 mins. Each cell colony was removed from the Petri dish using a 200 μL pipette tip and transferred to a well in a 96-well plate for expansion. Clones with EGFP insertions were identified by 2-round PCR amplification (Table 13), first with primers amplifying outside of the HDR template (HDR Fwd 1 and HDR Rev, 15 cycles) and then with primers amplifying the region of insertion (HDR Fwd 2 and HDR Rev, 15 cycles), to avoid detecting the HDR template plasmid as a false positive. Products were run on a gel to identify clones with insertions, and Sanger sequencing confirmed that EGFP had been inserted at the intended site without mutations. For each reporter cell line, 3 clones with EGFP inserted into one of the two alleles were selected for further expansion and characterization.
For TF ORF screening using reporter hESC lines, SLC1A3 or VIM reporter HUES66 cell lines were transduced with the pooled TF ORF library at MOI <0.3 and differentiated into iNPs as described above. After 7 days of differentiation, 5-10×10⁶ cells were sorted for EGFP expression using the Sony SH800S Cell Sorter. For each clonal line, the percentage of cells sorted for the control condition was matched to the percentage of cells expressing EGFP (~15-20%). After sorting, TF barcodes from each population were amplified (Table 13) and deep-sequenced on the Illumina MiSeq platform as previously described (>0.5 million reads per cell population) (Joung et al., 2017b). NGS reads that perfectly matched each barcode were counted and normalized to the total number of perfectly matched NGS reads for each condition. Enrichment of each TF was calculated as the normalized barcode count in the high population divided by the normalized barcode count in the low population.
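The barcode counting and enrichment calculation (which the flow-FISH screen reuses) can be sketched as below. This is a minimal sketch under stated assumptions: reads are taken to be already trimmed to the barcode region, no pseudocount is added because none is described, and the library, reads, and function names are hypothetical.

```python
from collections import Counter

import numpy as np


def count_perfect_matches(barcode_reads, barcode_to_tf):
    """Tally reads whose barcode sequence exactly matches a library barcode; others are ignored."""
    counts = Counter()
    for bc in barcode_reads:
        tf = barcode_to_tf.get(bc)
        if tf is not None:
            counts[tf] += 1
    return counts


def enrichment(high_counts, low_counts, tfs):
    """Normalized barcode count in the high population divided by that in the low population."""
    high = np.array([high_counts.get(tf, 0) for tf in tfs], dtype=float)
    low = np.array([low_counts.get(tf, 0) for tf in tfs], dtype=float)
    high /= high.sum()  # fraction of all perfectly matched reads in this population
    low /= low.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = high / low  # inf if a TF is absent from the low population
    return dict(zip(tfs, ratio.tolist()))


# Hypothetical 4-nt barcodes; GGGG matches no library barcode and is dropped
library = {"AAAA": "TF_A", "CCCC": "TF_B"}
high_reads = ["AAAA"] * 6 + ["CCCC"] * 2 + ["GGGG"]
low_reads = ["AAAA"] * 2 + ["CCCC"] * 6
scores = enrichment(count_perfect_matches(high_reads, library),
                    count_perfect_matches(low_reads, library), ["TF_A", "TF_B"])
```

Because each population is normalized to its own total, the score compares relative barcode abundance and is insensitive to differences in sequencing depth between the sorted bins.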
Flow-FISH screen. For TF ORF screening using flow-FISH, HUES66 cells were transduced with the pooled TF ORF library at MOI <0.3 and differentiated into iNPs as described above. After 7 days of differentiation, cells were labeled with the appropriate FISH probes (Table 14) using the PrimeFlow RNA assay kit (Thermo Fisher Scientific 88-18005-204) with 20 million cells in 4 reactions per biological replicate. FISH probes targeting transcripts with similar expression levels were pooled together. Once the cells were labeled, the entire cell population was sorted for high or low fluorescence (15% of cells per bin), indicating an aggregate expression level of the transcripts labeled with the pooled FISH probes for the particular wavelength. After sorting, TF barcodes from each population were amplified (Table 13) using a modified ChIP reverse cross-linking protocol as described previously (Fulco et al., 2019) and deep-sequenced on the Illumina NextSeq platform (>4 million reads per cell population). Enrichment of each TF was calculated as described above for the reporter cell line screen.
Single-cell RNA sequencing (scRNA-seq) and data analysis. Cells were dissociated with Accutase (STEMCELL Technologies 07920) for 10 mins (NP) or 50 mins (spontaneously differentiated cells) at 37° C. and filtered using a 70 μm cell strainer (MilliporeSigma CLS431751) to obtain single cells. Cells were resuspended in PBS containing 0.04% BSA, counted, and loaded in the 10× Genomics Chromium Controller. 10,000 cells were used as input for each channel of a 10× Chromium Chip. For cells from the scRNA-seq pooled screen and spontaneous differentiation of four candidate TFs, scRNA-seq libraries were prepared using the Chromium Single Cell 3′ Library & Gel Bead Kit v2 (10× Genomics 120237) according to the manufacturer's instructions. Libraries were sequenced on the NextSeq platform, aiming for a minimum coverage of 20,000 reads per single cell (paired-end; read 1: 26 cycles; i7 index: 8 cycles, i5 index: 0 cycles; read 2: 55 cycles). For cells from the NP method comparison and spontaneous differentiation of RFX4-DS-iNPs, scRNA-seq libraries were prepared using the Chromium Single Cell 3′ Library & Gel Bead Kit v3 (10× Genomics 1000075) and sequenced on the HiSeq X platform (paired-end; read 1: 28 cycles; i7 index: 8 cycles, i5 index: 0 cycles; read 2: 96 cycles).
Sequencing data were aligned and quantified using the Cell Ranger Single-Cell Software Suite v3.1.0 (10× Genomics) (Zheng et al., 2017) against the GRCh38 human reference genome provided by Cell Ranger. The Python package Scanpy v1.4.4 (Wolf et al., 2018) was used to cluster and visualize cells. Cells with 400-7,000 detected genes and less than 5% total mitochondrial gene expression were retained for analysis. Genes that were detected in fewer than 3 cells were removed. Scanpy was used to log-normalize, scale, and center the data, and unwanted variation was removed by regressing out the number of UMIs and the percentage of mitochondrial reads. Next, highly variable genes were identified and used as input for dimensionality reduction via principal component analysis (PCA). The resulting principal components were then used to cluster the cells, which were visualized using uniform manifold approximation and projection (UMAP). To identify clusters, a neighborhood graph of observations was computed from the top 50 principal components with a local neighborhood size of 20 using the scanpy.pp.neighbors function, and cells were then partitioned into subgroups using the Louvain algorithm as implemented in the scanpy.tl.louvain function. Cluster marker genes and associated p-values were identified using the scanpy.tl.rank_genes_groups function.
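The cell and gene filters above reduce to simple thresholds on the count matrix. This is a minimal NumPy sketch of those thresholds only, assuming a dense cells × genes UMI matrix, inclusive gene-count bounds, and that mitochondrial genes carry the "MT-" prefix used for GRCh38 gene symbols. In Scanpy the same filters would typically be applied with `sc.pp.filter_cells` (`min_genes`, `max_genes`) and `sc.pp.filter_genes` (`min_cells`); the normalization, regression, neighbor-graph, and Louvain steps are not reproduced here. The thresholds in the usage lines are shrunk so the toy example stays small.

```python
import numpy as np


def qc_filter(counts, gene_names, min_genes=400, max_genes=7000,
              max_mito_pct=5.0, min_cells=3, mito_prefix="MT-"):
    """Filter a dense cells x genes UMI matrix with the thresholds described above.

    Returns the filtered matrix, the retained gene names, and a boolean mask of retained cells.
    """
    counts = np.asarray(counts)
    gene_names = np.asarray(gene_names, dtype=str)
    genes_per_cell = (counts > 0).sum(axis=1)
    total_umis = counts.sum(axis=1)
    is_mito = np.char.startswith(gene_names, mito_prefix)
    mito_pct = np.divide(100.0 * counts[:, is_mito].sum(axis=1), total_umis,
                         out=np.zeros(len(total_umis)), where=total_umis > 0)
    keep_cells = ((genes_per_cell >= min_genes) & (genes_per_cell <= max_genes)
                  & (mito_pct < max_mito_pct))
    filtered = counts[keep_cells]
    # Gene filter applied after cell filtering: keep genes detected in >= min_cells retained cells
    keep_genes = (filtered > 0).sum(axis=0) >= min_cells
    return filtered[:, keep_genes], gene_names[keep_genes], keep_cells


genes = ["MT-CO1", "GENE1", "GENE2", "GENE3"]
toy = np.array([[0, 5, 5, 0],    # 2 genes, 0% mitochondrial -> kept
                [9, 1, 0, 0],    # 90% mitochondrial         -> removed
                [0, 1, 1, 1],    # 3 genes                   -> kept
                [0, 0, 0, 7]])   # only 1 gene               -> removed
mat, kept_genes, kept_cells = qc_filter(toy, genes, min_genes=2, max_genes=3,
                                        max_mito_pct=50.0, min_cells=2)
```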
For scRNA-seq analysis of the pooled 90 TF screen for NP differentiation, distance between cells with different TF perturbations was calculated using the scipy.spatial.distance.cdist function from the SciPy Python library. For each TF perturbation, the pairwise distances between cells with the TF perturbation and cells without the TF perturbation were calculated, and the median of these distances was determined. The 939 highly variable genes were used in the distance calculation. To identify TFs that produced transcriptome profiles similar to radial glia from human fetal cortex or brain organoids, TF scRNA-seq signatures were correlated to available scRNA-seq datasets (Nowakowski et al., 2017; Pollen et al., 2015; Quadrato et al., 2017). The 218 most variable genes in the scRNA-seq data, which were identified using the scanpy.pp.highly_variable_genes function with the parameters “min_mean=0.075, max_mean=8 and min_disp=1.5”, were used for the correlation analysis. The Spearman correlations between expression of these genes in each TF-perturbed single cell and the average expression in radial glia scRNA-seq from human fetal cortex or organoids were calculated. Then, the average correlation of each TF was determined by taking the average of the corresponding TF-perturbed single-cell correlations. Candidate TFs were ranked based on the z-score of the average correlation across all datasets. To compare TF transcriptome signatures to other cell types from the mouse organogenesis cell atlas (Cao et al., 2019), the average expression of the top 30 marker genes (ranked by p-value) for each cell type was used to assess similarity. The z-score of the average marker gene expression for cells perturbed by each TF was used to identify the TF perturbations that were most similar to each cell type.
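The distance and correlation steps above can be sketched with SciPy as follows. This is a minimal sketch under stated assumptions: `X` is a cells × genes matrix already restricted to the selected highly variable genes, `labels` holds each cell's assigned TF, distances use the `cdist` default (Euclidean) because the metric is not specified, and "z-score of the average correlation across all datasets" is interpreted here as z-scoring each TF's mean Spearman correlation across TFs within each reference dataset and then averaging over datasets. Function names and toy data are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import spearmanr


def median_distance_to_others(X, labels, tf):
    """Median distance between cells carrying `tf` and all cells without it."""
    labels = np.asarray(labels)
    return float(np.median(cdist(X[labels == tf], X[labels != tf])))


def rank_tfs_by_reference(X, labels, references):
    """Rank TFs by mean per-cell Spearman correlation to reference profiles (z-scored across TFs)."""
    labels = np.asarray(labels)
    tfs = sorted(set(labels.tolist()))
    mean_rho = np.zeros((len(tfs), len(references)))
    for i, tf in enumerate(tfs):
        for j, ref in enumerate(references):
            rhos = []
            for cell in X[labels == tf]:
                rho, _ = spearmanr(cell, ref)
                rhos.append(rho)
            mean_rho[i, j] = np.mean(rhos)  # average over the TF's single cells
    z = (mean_rho - mean_rho.mean(axis=0)) / mean_rho.std(axis=0)  # z-score across TFs per dataset
    score = z.mean(axis=1)  # combine reference datasets
    order = np.argsort(-score)
    return [(tfs[k], float(score[k])) for k in order]


# Toy data: 6 cells x 4 genes and one averaged radial glia reference profile
X = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [4, 3, 2, 1],
              [5, 4, 3, 2], [1, 3, 2, 4], [2, 1, 3, 4]], dtype=float)
labels = ["TF_A", "TF_A", "TF_B", "TF_B", "TF_C", "TF_C"]
reference = np.array([1, 2, 3, 4], dtype=float)
ranking = rank_tfs_by_reference(X, labels, [reference])
```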
For determining consistency within batch replicates of different iNP differentiation methods, the cluster of spontaneously differentiated neurons was excluded from the analysis. Distance between cells within the same batch replicate was calculated using the scipy.spatial.distance.pdist function from the SciPy Python library. The 2,305 highly variable genes were used in the distance calculation. For determining consistency between batch replicates, distance between cells in different batch replicates of the same method was calculated using the scipy.spatial.distance.cdist function.
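The within- and between-replicate comparisons differ only in which SciPy function is used: `pdist` returns every pairwise distance inside one matrix, whereas `cdist` returns every distance between rows of two matrices. A minimal sketch, assuming each replicate is a cells × genes matrix of the highly variable genes with the spontaneously differentiated neuron cluster already removed, and assuming the default Euclidean metric and a median summary (neither is specified in the text); the toy matrices are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist


def within_batch_median(X):
    """Median Euclidean distance over all pairs of cells in one batch replicate."""
    return float(np.median(pdist(X)))


def between_batch_median(X_a, X_b):
    """Median Euclidean distance between every cell in batch a and every cell in batch b."""
    return float(np.median(cdist(X_a, X_b)))


# Hypothetical cells x genes matrices for two batch replicates of the same method
batch_1 = np.array([[0.0, 0.0], [0.0, 1.0]])
batch_2 = np.array([[0.0, 0.0], [3.0, 4.0]])
within_1 = within_batch_median(batch_1)
between_12 = between_batch_median(batch_1, batch_2)
```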
ScRNA-seq screen. For TF ORF screening using scRNA-seq, HUES66 cells were transduced with the pooled TF ORF library at MOI <0.3 and differentiated into iNPs. Then, iNPs were dissociated for scRNA-seq analysis as described above. To pair TF barcodes with cell barcodes, TF and cell barcodes were PCR amplified from cDNA retained following the whole transcriptome amplification step of the 10× Genomics scRNA-seq library preparation protocol (Table 13). The resulting amplicon was sequenced on the Illumina NextSeq platform, aiming for a minimum coverage of 20,000 reads per single cell (paired-end; read 1: 16 cycles; read 2: 72 cycles). For each cell, the TF whose barcode had the highest number of perfectly matching NGS reads was paired with the cell if that barcode had at least 2 reads and >25% more reads than the barcode of the second most abundant TF. Otherwise, the cell was excluded from the scRNA-seq analysis.
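The cell-to-TF assignment rule can be written as a small function. This is a minimal sketch, assuming the perfectly matching barcode reads have already been tallied per cell barcode; the function name and the toy tallies are hypothetical. Ties between the top two TFs fail the >25% margin and are therefore excluded.

```python
from collections import Counter


def assign_tf(tf_read_counts, min_reads=2, margin=0.25):
    """Return the TF to pair with a cell, or None if the cell should be excluded.

    tf_read_counts maps TF -> number of perfectly matching barcode reads for one cell barcode.
    """
    ranked = Counter(tf_read_counts).most_common(2)
    if not ranked:
        return None
    top_tf, top_reads = ranked[0]
    second_reads = ranked[1][1] if len(ranked) > 1 else 0
    if top_reads >= min_reads and top_reads > (1 + margin) * second_reads:
        return top_tf
    return None


# Hypothetical per-cell barcode tallies
cells = {
    "CELL_1": {"TF_A": 10, "TF_B": 7},  # 10 > 1.25 * 7 -> TF_A
    "CELL_2": {"TF_A": 10, "TF_B": 8},  # 10 is not > 1.25 * 8 -> excluded
    "CELL_3": {"TF_A": 1},              # fewer than 2 reads -> excluded
    "CELL_4": {"TF_C": 2},              # 2 reads, no competitor -> TF_C
}
assignments = {cell: assign_tf(counts) for cell, counts in cells.items()}
```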
Arrayed screen. For TF ORF screening in an arrayed format, individual TF ORF isoforms were packaged into lentivirus as described above. Cells were transduced at MOI <0.5 by seeding 1.6×10⁴ cells in 96-well plates and adding the appropriate volume of lentivirus. Cells were differentiated into NP and harvested for qPCR 7 days after transduction as described above.
Bulk RNA sequencing (RNA-seq) and data analysis. RNA from cells plated in 24-well plates and grown to 60-90% confluency was harvested using the RNeasy Plus Mini Kit (Qiagen 74134). RNA-seq libraries were prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB E7530S) and deep-sequenced on the Illumina NextSeq platform (>9 million reads per biological replicate). A Bowtie (Langmead et al., 2009) index was created based on the human hg38 UCSC genome and RefSeq transcriptome. Next, RSEM v1.3.1 (Li and Dewey, 2011) was run with the command line options “--estimate-rspd --bowtie-chunkmbs 512 --paired-end” to align paired-end reads directly to this index using Bowtie and to estimate expression levels in transcripts per million (TPM) based on the alignments.
To correlate TF ORF RNA-seq signatures to those from human fetal cortex or brain organoids (Nowakowski et al., 2017; Pollen et al., 2015; Quadrato et al., 2017), transcript measurements from each available dataset were converted to TPM. For each cell type, TPM measurements from single cells were averaged to obtain average TPM values of genes for the cell type. The 2,000 genes with the highest fold change in the TF ORF expression condition relative to the GFP control condition (stem cells overexpressing GFP that were cultured in mTeSR1 stem cell media) were used to define the TF ORF RNA-seq signature. Expression of these genes in TPM was used to calculate the Pearson correlation between the TF ORF and the cell type of interest from available datasets.
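The signature-correlation procedure can be sketched as follows. This is a minimal sketch under stated assumptions: fold change is computed on TPM with a pseudocount of 1 (the text does not say how zeros are handled), a single TF ORF profile is compared with a single cell-type average, and the five-gene example keeps the top 3 genes instead of 2,000. Function names and values are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def celltype_average(single_cell_tpm):
    """Average TPM across the single cells (rows) annotated as one cell type."""
    return np.asarray(single_cell_tpm, dtype=float).mean(axis=0)


def signature_correlation(tf_tpm, gfp_tpm, celltype_tpm, n_top=2000, pseudocount=1.0):
    """Pearson r between TF ORF and cell-type TPM over the n_top genes with highest fold change vs GFP."""
    tf_tpm = np.asarray(tf_tpm, dtype=float)
    gfp_tpm = np.asarray(gfp_tpm, dtype=float)
    celltype_tpm = np.asarray(celltype_tpm, dtype=float)
    fold_change = (tf_tpm + pseudocount) / (gfp_tpm + pseudocount)  # pseudocount avoids division by zero
    top = np.argsort(-fold_change)[:n_top]
    r, _ = pearsonr(tf_tpm[top], celltype_tpm[top])
    return float(r)


# Hypothetical five-gene example
tf = [20.0, 50.0, 5.0, 100.0, 1.0]
gfp = [10.0, 5.0, 5.0, 10.0, 1.0]
radial_glia = celltype_average([[2.0, 5.0, 0.0, 10.0, 0.0], [4.0, 5.0, 0.0, 10.0, 0.0]])
r = signature_correlation(tf, gfp, radial_glia, n_top=3)
```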
To identify genes that were differentially expressed as a result of TF ORF expression, RSEM's TPM estimates for each transcript were transformed to log-space by taking log2(TPM+1). Transcripts were considered detected if their transformed expression level was equal to or above 1 (in log2(TPM+1) scale). All genes detected in at least three libraries were used to find differentially expressed genes. Student's t-test was performed on the TF ORF overexpression condition against the GFP control condition. Only genes with FDR-corrected p-values <0.05 were reported.
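The transformation, detection filter, and test can be sketched as below. This is a minimal sketch under stated assumptions: the three-library detection rule is counted across the libraries of both conditions, the test is an unpaired equal-variance t-test on log2(TPM+1) values, and the Benjamini-Hochberg procedure is used for the FDR correction (the text says only "FDR correction"). Function names and data are hypothetical.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity from the largest p down
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def differential_expression(tf_tpm, gfp_tpm, alpha=0.05, min_libraries=3):
    """tf_tpm, gfp_tpm: genes x libraries TPM matrices. Returns significant gene indices and q-values."""
    log_tf = np.log2(np.asarray(tf_tpm, dtype=float) + 1)
    log_gfp = np.log2(np.asarray(gfp_tpm, dtype=float) + 1)
    detected = (np.hstack([log_tf, log_gfp]) >= 1).sum(axis=1) >= min_libraries
    _, p = stats.ttest_ind(log_tf[detected], log_gfp[detected], axis=1)
    q = benjamini_hochberg(p)  # q-values for detected genes only
    return np.flatnonzero(detected)[q < alpha], q


# Hypothetical 4 genes x 3 replicate libraries per condition
tf_tpm = [[100, 110, 105], [5, 6, 5], [0, 0, 0], [20, 22, 21]]
gfp_tpm = [[10, 11, 9], [5, 6, 5], [0, 0, 0], [20, 21, 22]]
genes, q = differential_expression(tf_tpm, gfp_tpm)
```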
For analysis of transcriptome changes resulting from DYRK1A perturbation, transcripts were considered detected if the average TPM of either the perturbed or control condition was greater than 1. For the DYRK1A knockout perturbations, Student's t-test was performed on the DYRK1A-targeting sgRNA condition against both non-targeting sgRNA conditions. For the DYRK1A overexpression perturbation, Student's t-test was performed on the DYRK1A ORF condition against the GFP control condition. Volcano plots showed genes with FDR-corrected p-values <0.01 and a fold change greater or less than 1. The heat map of genes with DYRK1A dosage-dependent expression changes showed genes with FDR-corrected p-values <0.05.
Chromatin immunoprecipitation with sequencing (ChIP-seq). Cells were plated in 10-cm cell culture dishes and grown to 60-80% confluency. For each condition, two biological replicates were harvested for ChIP-seq. Formaldehyde (MilliporeSigma 252549) was added directly to the growth media for a final concentration of 1% and cells were incubated at 37° C. for 10 mins to initiate chromatin fixation. Fixation was quenched by adding 2.5 M glycine (MilliporeSigma G7126) in PBS for a final concentration of 125 mM glycine and incubated at room temperature for 5 mins. Cells were then washed with ice-cold PBS, scraped, and pelleted at 1,000×g for 5 mins.
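The glycine quench is a 20-fold dilution (2.5 M to 125 mM), so one volume of stock is added per 19 volumes of liquid already in the dish. A minimal sketch of that arithmetic; the 10 mL medium volume is an assumption (the text gives only the dish size), and the function name is hypothetical.

```python
def volume_to_add(current_volume_ml, stock_conc, target_conc):
    """Volume of stock to add to current_volume_ml so that the mixture reaches target_conc.

    Solves stock * v = target * (V + v), i.e. v = target * V / (stock - target);
    stock and target must be in the same units (here, molar).
    """
    if not 0 < target_conc < stock_conc:
        raise ValueError("target concentration must be positive and below the stock concentration")
    return target_conc * current_volume_ml / (stock_conc - target_conc)


media_ml = 10.0  # assumed volume of medium in a 10-cm dish
glycine_ml = volume_to_add(media_ml, stock_conc=2.5, target_conc=0.125)  # 2.5 M -> 125 mM
print(f"add {glycine_ml:.3f} mL of 2.5 M glycine")
```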
Cell pellets were prepared for ChIP-seq using the Epigenomics Alternative Mag Bead ChIP Protocol v2.0 (ENCODE Project Consortium, 2004). Briefly, cell pellets were resuspended in 100 μL of lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl pH 8.1) containing protease inhibitor cocktail (MilliporeSigma 05892791001) and incubated for 10 mins at 4° C. Then 400 μL of dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8.1, and 167 mM NaCl) containing protease inhibitor cocktail (MilliporeSigma 05892791001) was added. Samples were pulse sonicated with 2 rounds of 10 mins (30 s on/30 s off cycles, high frequency) in a rotating water bath sonicator (Diagenode Bioruptor) with 5 mins on ice between rounds. 10 μL of sonicated sample was set aside as input control. Then 500 μL of dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8.1, and 167 mM NaCl) containing protease inhibitor cocktail (MilliporeSigma 05892791001) and 1 μL of anti-V5 (Thermo Fisher Scientific R960-25) were added to the sonicated sample. ChIP samples were rotated end over end overnight at 4° C.
For each ChIP, 50 μL of Protein A/G Magnetic Beads (Thermo Fisher Scientific 88802) was washed twice with 1 mL of blocking buffer (0.5% TWEEN and 0.5% BSA in PBS) containing protease inhibitor cocktail (MilliporeSigma 05892791001) before resuspending in 100 μL of blocking buffer. ChIP samples were transferred to the beads and rotated end over end for 1 h at 4° C. The ChIP supernatant was then removed and the beads were washed twice with 200 μL of RIPA low salt buffer (0.1% SDS, 1% Triton X-100, 1 mM EDTA, 20 mM Tris-HCl pH 8.1, 140 mM NaCl, 0.1% DOC), twice with 200 μL of RIPA high salt buffer (0.1% SDS, 1% Triton X-100, 1 mM EDTA, 20 mM Tris-HCl pH 8.1, 500 mM NaCl, 0.1% DOC), twice with 200 μL of LiCl wash buffer (250 mM LiCl, 1% NP40, 1% DOC, 1 mM EDTA, 10 mM Tris-HCl pH 8.1), and twice with 200 μL of TE (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0). ChIP samples were eluted with 50 μL of elution buffer (10 mM Tris-HCl pH 8.0, 5 mM EDTA, 300 mM NaCl, 0.1% SDS). 40 μL of water was added to the input control samples. 8 μL of reverse cross-linking buffer (250 mM Tris-HCl pH 6.5, 62.5 mM EDTA pH 8.0, 1.25 M NaCl, 5 mg/ml Proteinase K, 62.5 μg/ml RNase A) was added to the ChIP and input control samples, which were then incubated at 65° C. for 5 h. After reverse cross-linking, samples were purified using 116 μL of SPRIselect Reagent (Beckman Coulter B23318).
ChIP samples were prepared for NGS with the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB E7645S) and deep-sequenced on the Illumina NextSeq platform (>60 million reads per condition). Bowtie (Langmead et al., 2009) was used to align paired-end reads to the human hg38 UCSC genome with the command line options “-q -X 300 --sam --chunkmbs 512”. Next, biological replicates were merged and Model-based Analysis of ChIP-seq (MACS) (Feng et al., 2012) was run with the command line options “-g hs -B -S --mfold 6,30” to identify TF peaks. HOMER (Heinz et al., 2010) was used to discover motifs in the TF peak regions identified by MACS. The findMotifsGenome.pl program from HOMER was run with the command line options “-size 200 -mask”, and the top 3 known and de novo motifs were presented. TFs were considered potential regulators of a candidate gene if a TF peak region identified by MACS overlapped the 20 kb region centered on the transcription start site of the candidate gene based on RefSeq annotations.
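The regulator criterion is an interval-overlap test between each peak and a 20 kb window (TSS ± 10 kb). A minimal pure-Python sketch, assuming half-open BED-style coordinates and one RefSeq TSS per gene; in practice a genome-arithmetic tool would be used for genome-wide peak sets. Function names, genes, and coordinates are hypothetical.

```python
def peak_near_tss(peak, tss, window_bp=20_000):
    """True if a peak overlaps the window of width window_bp centred on a TSS.

    peak: (chrom, start, end) in half-open, BED-style coordinates.
    tss:  (chrom, position).
    """
    peak_chrom, peak_start, peak_end = peak
    tss_chrom, tss_pos = tss
    if peak_chrom != tss_chrom:
        return False
    half = window_bp // 2
    win_start, win_end = tss_pos - half, tss_pos + half
    return peak_start < win_end and peak_end > win_start


def candidate_targets(peaks, tss_by_gene, window_bp=20_000):
    """Genes whose TSS window overlaps at least one peak of a given TF."""
    return sorted(
        gene for gene, tss in tss_by_gene.items()
        if any(peak_near_tss(p, tss, window_bp) for p in peaks)
    )


peaks = [("chr1", 1_000, 1_200), ("chr2", 50_000, 50_300)]
tss_by_gene = {"GENE_A": ("chr1", 10_500), "GENE_B": ("chr1", 40_000), "GENE_C": ("chr2", 45_000)}
targets = candidate_targets(peaks, tss_by_gene)  # GENE_B's window (30-50 kb) misses both peaks
```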
Indel analysis. Cells plated in 96-well plates were grown to 60-80% confluency and assessed for indel rates as previously described (Joung et al., 2017b). Genomic DNA was harvested from cells using the QuickExtract DNA Extraction kit (Lucigen QE09050). The genomic region flanking the site of interest was amplified using NEBNext High-Fidelity 2X PCR Master Mix (New England BioLabs M0541L), first with region-specific primers (Table 13) for 15 cycles and then with barcoded primers for 15 cycles as previously described. PCR products were sequenced on the Illumina MiSeq platform (>10,000 reads per condition), and indel analysis was performed as previously described (Joung et al., 2017b).
Click-iT EdU flow cytometry assay. Cells plated in 24-well plates were differentiated and EdU incorporation was measured using the Click-iT EdU Alexa Fluor 488 Flow Cytometry Assay Kit (Thermo Fisher Scientific C10420) according to a modified version of the manufacturer's instructions. EdU was added to the culture medium to a final concentration of 10 μM for 2 h before cells were dissociated with Accutase (STEMCELL Technologies 07920) for 15-45 mins at 37° C. Cells were transferred to a 96-well plate, pelleted at 200×g for 5 mins, and washed once with 200 μL of 1% BSA (MilliporeSigma A9418) in PBS. Cells were resuspended in 100 μL of Click-iT fixative and incubated for 15 mins at room temperature in the dark. After fixing, cells were washed with 200 μL of 1% BSA (MilliporeSigma A9418) in PBS twice, resuspended in 100 μL of Click-iT saponin-based permeabilization and wash reagent, and incubated for 15 mins in the dark. To each sample, 500 μL of Click-iT reaction cocktail was added and the reaction mixture was incubated for 30 mins at room temperature in the dark. Cells were washed with 200 μL of Click-iT saponin-based permeabilization and wash reagent twice and resuspended in 200 μL of 1% BSA (MilliporeSigma A9418) in PBS before analysis on a CytoFLEX Flow Cytometer (Beckman Coulter). For each sample, 10,000 cells were analyzed with FlowJo (FlowJo). Significance testing was performed using Student's t-test.
Electrophysiology. Whole-cell patch-clamp recordings were performed as described (doi: 10.1016/j.celrep.2018.04.066). Recording pipettes were pulled from thin-walled borosilicate glass capillary tubing (KG33, King Precision Glass, CA, USA) on a P-97 puller (Sutter Instrument, CA, USA) and had resistances of 3-5 MΩ when filled with internal solution (in mM: 128 K-gluconate, 10 HEPES, 10 phosphocreatine sodium salt, 1.1 EGTA, 5 ATP magnesium salt and 0.4 GTP sodium salt, pH 7.3, 300-305 mOsm). The cultured cells were constantly perfused at 3 ml/min with extracellular solution (119 mM NaCl, 2.3 mM KCl, 2 mM CaCl2, 1 mM MgCl2, 15 mM HEPES, 5 mM glucose, pH 7.3-7.4; osmolarity adjusted to 325 mOsm with sucrose). All experiments were performed at room temperature unless otherwise specified.
Cells were visualized with a 40× water-immersion objective on an upright microscope (Olympus, Japan) equipped with IR-DIC. Recordings were made using a Multiclamp 700B amplifier (Molecular Devices, CA, USA) and Clampex 10.7 software (Molecular Devices, CA, USA). In current-clamp mode, membrane potential was held at −65 mV, and step currents were then injected to elicit action potentials. Subsequent analysis was performed using Clampfit 10.7 software (Molecular Devices, CA, USA). Spontaneous AMPA receptor-mediated excitatory postsynaptic currents (sEPSCs) were recorded for at least 3 min after entering whole-cell patch-clamp recording mode. The data were stored on a computer for subsequent off-line analysis. Cells in which the series resistance (Rs) changed by >20% were excluded from data analysis. In addition, cells with Rs greater than 20 MΩ at any time during the recording were discarded.
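The two series-resistance exclusion rules can be applied per cell as sketched below. This is a minimal sketch, assuming Rs is sampled repeatedly during each recording and that the ">20% change" is measured relative to the first reading; the function name and values are hypothetical.

```python
def keep_recording(rs_trace_mohm, max_rel_change=0.20, max_rs_mohm=20.0):
    """Apply both series-resistance exclusion rules to one cell.

    rs_trace_mohm: Rs readings (MOhm) taken through the recording; the first is the baseline.
    """
    if not rs_trace_mohm:
        return False
    baseline = rs_trace_mohm[0]
    if any(rs > max_rs_mohm for rs in rs_trace_mohm):
        return False  # Rs above 20 MOhm at any time
    if any(abs(rs - baseline) / baseline > max_rel_change for rs in rs_trace_mohm):
        return False  # Rs changed by more than 20% from baseline
    return True


stable = keep_recording([10.0, 11.0, 11.5])     # at most 15% change, always <= 20 MOhm
drifting = keep_recording([10.0, 12.5])         # 25% change
too_high = keep_recording([18.0, 19.0, 21.0])   # exceeds 20 MOhm
```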
Reagent availability. The pooled and arrayed versions of MORF have been deposited at Addgene for distribution to the scientific community.
Code availability. Applicants have provided a Python script for aggregating gene lists from different datasets and selecting marker genes and TFs from MORF on the Feng Zhang lab GitHub page (github.com/fengzhanglab/TF_screen_manuscript).
REFERENCES
- 1 Cohen, D. E. & Melton, D. Turning straw into gold: directing cell fate for regenerative medicine. Nat Rev Genet 12, 243-252, doi:10.1038/nrg2938 (2011).
- 2 Colman, A. & Dreesen, O. Pluripotent stem cells and disease modeling. Cell Stem Cell 5, 244-247, doi:10.1016/j.stem.2009.08.010 (2009).
- 3 Keller, G. Embryonic stem cell differentiation: emergence of a new era in biology and medicine. Genes Dev 19, 1129-1155, doi:10.1101/gad.1303605 (2005).
- 4 Kiskinis, E. & Eggan, K. Progress toward the clinical application of patient-specific pluripotent stem cells. J Clin Invest 120, 51-59, doi:10.1172/JCI40553 (2010).
- 5 Robinton, D. A. & Daley, G. Q. The promise of induced pluripotent stem cells in research and therapy. Nature 481, 295-305, doi:10.1038/nature10761 (2012).
- 6 Furuyama, K. et al. Diabetes relief in mice by glucose-sensing insulin-secreting human alpha-cells. Nature, doi:10.1038/s41586-019-0942-8 (2019).
- 7 Pang, Z. P. et al. Induction of human neuronal cells by defined transcription factors. Nature 476, 220-223, doi:10.1038/nature10202 (2011).
- 8 Song, K. et al. Heart repair by reprogramming non-myocytes with cardiac transcription factors. Nature 485, 599-604, doi:10.1038/nature11139 (2012).
- 9 Sugimura, R. et al. Haematopoietic stem and progenitor cells from human pluripotent stem cells. Nature 545, 432-438, doi:10.1038/nature22370 (2017).
- 10 Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861-872, doi:10.1016/j.cell.2007.11.019 (2007).
- 11 Weintraub, H. et al. Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD. Proc Natl Acad Sci USA 86, 5434-5438 (1989).
- 12 Zhang, Y. et al. Rapid single-step induction of functional neurons from human pluripotent stem cells. Neuron 78, 785-798, doi:10.1016/j.neuron.2013.05.029 (2013).
- 13 Zhang, S. C., Wernig, M., Duncan, I. D., Brustle, O. & Thomson, J. A. In vitro differentiation of transplantable neural precursors from human embryonic stem cells. Nat Biotechnol 19, 1129-1133, doi:10.1038/nbt1201-1129 (2001).
- 14 Chambers, S. M. et al. Highly efficient neural conversion of human ES and iPS cells by dual inhibition of SMAD signaling. Nat Biotechnol 27, 275-280, doi:10.1038/nbt.1529 (2009).
- 15 Hu, B. Y. et al. Neural differentiation of human induced pluripotent stem cells follows developmental principles but with variable potency. Proc Natl Acad Sci USA 107, 4335-4340, doi:10.1073/pnas.0910012107 (2010).
- 16 Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583-588, doi:10.1038/nature14136 (2015).
- 17 Camp, J. G. et al. Human cerebral organoids recapitulate gene expression programs of fetal neocortex development. Proc Natl Acad Sci USA 112, 15672-15677, doi:10.1073/pnas.1520760112 (2015).
- 18 Johnson, M. B. et al. Single-cell analysis reveals transcriptional heterogeneity of neural progenitors in human cortex. Nat Neurosci 18, 637-646, doi:10.1038/nn.3980 (2015).
- 19 Llorens-Bobadilla, E. et al. Single-Cell Transcriptomics Reveals a Population of Dormant Neural Stem Cells that Become Activated upon Brain Injury. Cell Stem Cell 17, 329-340, doi:10.1016/j.stem.2015.07.002 (2015).
- 20 Pollen, A. A. et al. Molecular identity of human outer radial glia during cortical development. Cell 163, 55-67, doi:10.1016/j.cell.2015.09.004 (2015).
- 21 Shin, J. et al. Single-Cell RNA-Seq with Waterfall Reveals Molecular Cascades underlying Adult Neurogenesis. Cell Stem Cell 17, 360-372, doi:10.1016/j.stem.2015.07.013 (2015).
- 22 Thomsen, E. R. et al. Fixed single-cell transcriptomic characterization of human radial glial diversity. Nat Methods 13, 87-93, doi:10.1038/nmeth.3629 (2016).
- 23 Wu, J. Q. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc Natl Acad Sci USA 107, 5254-5259, doi:10.1073/pnas.0914114107 (2010).
- 24 Zhang, Y. et al. Purification and Characterization of Progenitor and Mature Human Astrocytes Reveals Transcriptional and Functional Differences with Mouse. Neuron 89, 37-53, doi:10.1016/j.neuron.2015.11.013 (2016).
- 25 Quadrato, G. et al. Cell diversity and network dynamics in photosensitive human brain organoids. Nature 545, 48-53, doi:10.1038/nature22047 (2017).
- 26 Nowakowski, T. J. et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science 358, 1318-1323, doi:10.1126/science.aap8809 (2017).
- 27 Casarosa, S., Fode, C. & Guillemot, F. Mash1 regulates neurogenesis in the ventral telencephalon. Development 126, 525-534 (1999).
- 28 Zhang, X. et al. Pax6 is a human neuroectoderm cell fate determinant. Cell Stem Cell 7, 90-100, doi:10.1016/j.stem.2010.04.017 (2010).
- 29 Murre, C. et al. Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence. Cell 58, 537-544 (1989).
- 30 Morotomi-Yano, K. et al. Human regulatory factor X 4 (RFX4) is a testis-specific dimeric DNA-binding protein that cooperates with other human RFX members. J Biol Chem 277, 836-842, doi:10.1074/jbc.M108638200 (2002).
- 31 O'Roak, B. J. et al. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 338, 1619-1622, doi:10.1126/science.1227764 (2012).
- 32 Smith, D. J. et al. Functional screening of 2 Mb of human chromosome 21q22.2 in transgenic mice implicates minibrain in learning defects associated with Down syndrome. Nat Genet 16, 28-36, doi:10.1038/ng0597-28 (1997).
- 33 Fotaki, V. et al. Dyrk1A haploinsufficiency affects viability and causes developmental delay and abnormal brain morphology in mice. Mol Cell Biol 22, 6636-6647 (2002).
- 34 Hammerle, B. et al. Transient expression of Mnb/Dyrk1a couples cell cycle exit and differentiation of neuronal precursors by inducing p27KIP1 expression and suppressing NOTCH signaling. Development 138, 2543-2554, doi:10.1242/dev.066167 (2011).
- 35 Park, J. et al. Dyrk1A phosphorylates p53 and inhibits proliferation of embryonic neuronal cells. J Biol Chem 285, 31895-31906, doi:10.1074/jbc.M110.147520 (2010).
- 36 Yabut, O., Domogauer, J. & D'Arcangelo, G. Dyrk1A overexpression inhibits proliferation and induces premature neuronal differentiation of neural progenitor cells. J Neurosci 30, 4004-4014, doi:10.1523/JNEUROSCI.4711-09.2010 (2010).
- 37 Soppa, U. et al. The Down syndrome-related protein kinase DYRK1A phosphorylates p27(Kip1) and Cyclin D1 and induces cell cycle exit and neuronal differentiation. Cell Cycle 13, 2084-2100, doi:10.4161/cc.29104 (2014).
- 38 Ashique, A. M. et al. The Rfx4 transcription factor modulates Shh signaling by regional control of ciliogenesis. Sci Signal 2, ra70, doi:10.1126/scisignal.2000602 (2009).
- 39 Blackshear, P. J. et al. Graded phenotypic response to partial and complete deficiency of a brain-specific transcript variant of the winged helix transcription factor RFX4. Development 130, 4539-4552, doi:10.1242/dev.00661 (2003).
- 40 Joung, J. et al. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening. Nat Protoc 12, 828-863, doi:10.1038/nprot.2017.016 (2017).
- 41 Fulco, C. P. et al. Activity-by-Contact model of enhancer specificity from thousands of CRISPR perturbations. bioRxiv, 529990, doi:10.1101/529990 (2019).
- 42 Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049, doi:10.1038/ncomms14049 (2017).
- 43 Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33, 495-502, doi:10.1038/nbt.3192 (2015).
- 44 Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25, doi:10.1186/gb-2009-10-3-r25 (2009).
- 45 Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323, doi:10.1186/1471-2105-12-323 (2011).
- 46 ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636-640, doi:10.1126/science.1105136 (2004).
- 47 Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nat Protoc 7, 1728-1740, doi:10.1038/nprot.2012.101 (2012).
- 48 Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589, doi:10.1016/j.molcel.2010.05.004 (2010).
- 49 Campisi, J. Cellular senescence as a tumor-suppressor mechanism. Trends Cell Biol 11, S27-31 (2001).
- 50 Cao, J., Spielmann, M., Qiu, X., Huang, X., Ibrahim, D. M., Hill, A. J., Zhang, F., Mundlos, S., Christiansen, L., Steemers, F. J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496-502 (2019).
- 51 Darnell, J. E., Jr. Transcription factors as targets for cancer therapy. Nat Rev Cancer 2, 740-749 (2002).
- 52 De Rubeis, S., He, X., Goldberg, A. P., Poultney, C. S., Samocha, K., Cicek, A. E., Kou, Y., Liu, L., Fromer, M., Walker, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209-215 (2014).
- 53 Englund, C., Fink, A., Lau, C., Pham, D., Daza, R. A., Bulfone, A., Kowalczyk, T. & Hevner, R. F. Pax6, Tbr2, and Tbr1 are expressed sequentially by radial glia, intermediate progenitor cells, and postmitotic neurons in developing neocortex. J Neurosci 25, 247-251 (2005).
- 54 Frantz, G. D., Weimann, J. M., Levin, M. E. & McConnell, S. K. Otx1 and Otx2 define layers and regions in developing cerebral cortex and cerebellum. J Neurosci 14, 5725-5740 (1994).
- 55 Gilbert, L. A., Horlbeck, M. A., Adamson, B., Villalta, J. E., Chen, Y., Whitehead, E. H., Guimaraes, C., Panning, B., Ploegh, H. L., Bassik, M. C. et al. Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647-661 (2014).
- 56 Götz, M., Stoykova, A. & Gruss, P. Pax6 controls radial glia differentiation in the cerebral cortex. Neuron 21, 1031-1044 (1998).
- 57 Liu, Y., Yu, C., Daley, T. P., Wang, F., Cao, W. S., Bhate, S., Lin, X., Still, C., 2nd, Liu, H., Zhao, D. et al. CRISPR Activation Screens Systematically Identify Factors that Drive Neuronal Fate and Reprogramming. Cell Stem Cell 23, 758-771 e758 (2018).
- 58 Matsunaga, E., Nambu, S., Oka, M. & Iriki, A. Complex and dynamic expression of cadherins in the embryonic marmoset cerebral cortex. Dev Growth Differ 57, 474-483 (2015).
- 59 Reinchisi, G., Ijichi, K., Glidden, N., Jakovcevski, I. & Zecevic, N. COUP-TFII expressing interneurons in human fetal forebrain. Cereb Cortex 22, 2820-2830 (2012).
- 60 Schafer, S. T., Paquola, A. C. M., Stern, S., Gosselin, D., Ku, M., Pena, M., Kuret, T. J. M., Liyanage, M., Mansour, A. A., Jaeger, B. N. et al. Pathological priming causes developmental gene network heterochronicity in autistic subject-derived neurons. Nat Neurosci 22, 243-255 (2019).
- 61 Shi, Y., Kirwan, P. & Livesey, F. J. Directed differentiation of human pluripotent stem cells to cerebral cortex neurons and neural networks. Nat Protoc 7, 1836-1846 (2012).
- 62 Shi, Y., Kirwan, P., Smith, J., Robinson, H. P. & Livesey, F. J. Human cerebral cortex development from pluripotent stem cells to functional excitatory synapses. Nat Neurosci 15, 477-486, S471 (2012).
- 63 Steele-Perkins, G., Plachez, C., Butz, K. G., Yang, G., Bachurski, C. J., Kinsman, S. L., Litwack, E. D., Richards, L. J. & Gronostajski, R. M. The transcription factor gene Nfib is essential for both lung maturation and brain development. Mol Cell Biol 25, 685-698 (2005).
- 64 UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res 43, D204-212 (2015).
- 65 Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018).
- 66 Zhang, H. M., Chen, H., Liu, W., Liu, H., Gong, J., Wang, H. & Guo, A. Y. AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res 40, D144-149 (2012).
- 67 Zhang, H. M., Liu, T., Liu, C. J., Song, S., Zhang, X., Liu, W., Jia, H., Xue, Y. & Guo, A. Y. AnimalTFDB 2.0: a resource for expression, prediction and functional study of animal transcription factors. Nucleic Acids Res 43, D76-81 (2015).
Tables