US Patent Application for METHODS FOR DIFFERENTIATING AND SCREENING STEM CELLS (Application #20240309320, published September 19, 2024)

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Nos. 63/219,705, filed Jul. 8, 2021 and 63/313,842, filed Feb. 25, 2022. The entire contents of the above-identified applications are hereby fully incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with government support under Grant Nos. MH117886, HG009761, MH110049, and HL141201 awarded by the National Institutes of Health. The government has certain rights in the invention.

REFERENCE TO AN ELECTRONIC SEQUENCE LISTING

The contents of the electronic sequence listing ("BROD-5420WP_ST26.xml"; size: 23,452,824 bytes (23.5 MB on disk); created on Jul. 8, 2022) are herein incorporated by reference in their entirety.

TECHNICAL FIELD

The subject matter disclosed herein is generally directed to methods of differentiating stem cells into target cell types and screening platforms for systematically identifying transcription factors (TFs) that drive differentiation of stem cells into target cell types.

BACKGROUND

Directed differentiation of human pluripotent stem cells into diverse cell types has the potential to realize a broad array of cellular replacement therapies. It also provides a tractable model that can be perturbed, genetically or chemically, to assess effects in a cell type-specific context [1-5]. Despite the utility of cellular engineering, it remains challenging or impossible to generate many cell types [1-5]. The best differentiation methods are often labor-intensive and can require months to produce even heterogeneous or immature cell populations. Many of these methods rely on exogenous growth factors or small molecules, which are often dosage-sensitive and difficult to identify in a scalable manner. Alternatively, overexpression of transcription factors (TFs) has been shown to rapidly and efficiently generate many different cell types, including neurons and skeletal muscle cells [6-12]. Because TFs use endogenous regulatory pathways to drive differentiation, mimicking natural development, this approach to engineering cell fate may produce higher-fidelity models while illuminating aspects of cellular development. However, discovering TFs for directed differentiation relies on time-intensive, low-throughput arrayed screens. Arrayed screens, in which each perturbation must be performed and tested individually, are inherently limited in scale and typically test 5-25 TFs [6-12]. By contrast, pooled screening approaches use barcodes to test multiple perturbations in parallel and are dramatically more scalable in both time and cost.

In vitro models of the human brain enable high-throughput genetic and chemical screens that can advance our understanding of complex neurodevelopmental and neurodegenerative diseases. To simultaneously assess thousands of different perturbations and ensure unbiased results, such models should be homogeneous, robust, and scalable. Current methods for generating models of the brain generally involve differentiating human embryonic stem cells (hESCs) into neural cells using exogenous factors or small molecules. This process is labor-intensive and time-consuming, and it produces heterogeneous cell populations (Douvaras P, et al., Efficient generation of myelinating oligodendrocytes from primary progressive multiple sclerosis patients by induced pluripotent stem cells. Stem Cell Reports. 2014; 3(2):250-9; Krencik R, et al., Specification of transplantable astroglial subtypes from human pluripotent stem cells. Nat Biotechnol. 2011; 29(6):528-34; Li X J, et al., Specification of motoneurons from human embryonic stem cells. Nat Biotechnol. 2005; 23(2):215-21; Perrier A L, et al., Derivation of midbrain dopamine neurons from human embryonic stem cells. Proc Natl Acad Sci USA. 2004; and Muffat J, et al., Efficient derivation of microglia-like cells from human pluripotent stem cells. Nat Med. 2016; 22(11):1358-67). Furthermore, many cell types in the brain cannot be derived from hESCs. Methods exist for differentiating neural progenitors and some neuronal subtypes, but none efficiently generates glial cells (astrocytes, oligodendrocytes, and microglia) that resemble their in vivo counterparts without transplantation (Douvaras P, et al., 2014; Krencik R, et al., 2011; and Muffat J, et al., 2016). Glia have been shown to play critical roles in neural development and disease, so including them in models is critical to the success of this approach for studying the brain (Chung W S, et al., Do glia drive synaptic and cognitive impairment in disease? Nat Neurosci. 2015; 18(11):1539-45; and Hong S, Stevens B. Microglia: Phagocytosing to Clear, Sculpt, and Eliminate. Dev Cell. 2016; 38(2):126-8).

Thus, there is a need to develop an efficient method that can generate more complete in vitro models of the human brain. Additionally, there is a need for in vitro models of other cell types that can advance our understanding of development and disease.

SUMMARY

In certain example embodiments, the present invention provides for screening platforms for systematically identifying transcription factors (TFs) that drive differentiation of pluripotent stem cells into target cell types. In certain example embodiments, the present invention provides for differentiation methods based on overexpression of TFs to generate specific cell types. Applicants provide examples of the screening methods to identify transcription factors that are capable of differentiating stem cells into all cell types, including neural progenitors/radial glia in the developing central nervous system that are capable of differentiating into neurons, astrocytes, and oligodendrocytes. In certain embodiments, the neural progenitors are referred to as induced neural progenitors (iNPs). Some, but not all, of the iNPs become radial glial cells. Thus, “neural progenitors” as used herein may be referred to as “induced neural progenitors” or “radial glia”. Applicants further identify TFs that are capable of differentiating stem cells into cardiomyocytes.

In one aspect, the present invention provides for a method of differentiating a pluripotent cell population into a target cell type of interest comprising overexpressing one or more transcription factors (TFs) from Table 1 or Table 3 in a pluripotent cell population, and selecting cells expressing one or more target cell markers. In certain embodiments, the target cell is a neural progenitor and selecting cells comprises selecting cells expressing one or more radial glial cell markers. In certain embodiments, the one or more transcription factors are selected from the group consisting of RFX4, NFIB, ASCL1, PAX6, EOMES, FOS, OTX1, NFIC, LHX2, FANCD2, NOTCH1, SMARCC1, ESR2, ESR1, MESP1, RCOR2, GLI3, NOTCH2, HELLS, BCL11A, HES1, SOX9, FEZF2, and TCF7L2, or TFs that are ranked in the top 10% of any screening method in Table 1 (e.g., RFX4, NFIB, ASCL1, PAX6, EOMES, FOS, OTX1, NFIC, LHX2, RCOR2, GLI3, NOTCH2, HELLS, BCL11A, HES1, FANCD2, SOX9, FEZF2, TCF7L2). In certain embodiments, the one or more transcription factors are RFX4, NFIB, ASCL1, PAX6, or a combination thereof. In preferred embodiments, RFX4 is overexpressed to produce the neural progenitors. In certain embodiments, the method further comprises producing RFX4 neural progenitor cells in media comprising dual SMAD inhibitors. In certain embodiments, the one or more radial glial cell markers are selected from Table 2. In certain embodiments, the one or more radial glial cell markers are selected from the group consisting of NES, VIM, SLC1A3, and PAX6. In certain embodiments, the method further comprises inducing differentiation of the neural progenitors into neurons, astrocytes and/or oligodendrocytes. In certain embodiments, differentiation comprises spontaneous differentiation of the neural progenitors. In certain embodiments, differentiation comprises directed differentiation of the neural progenitors.

In certain embodiments, selecting further comprises selecting cells enriched for expression of one or more gene signatures expressed in in vivo radial glia cells. The one or more gene signatures may be any in vivo gene signature known in the art (see, e.g., Pollen et al., Molecular identity of human outer radial glia during cortical development. Cell. 2015; 163(1):55-67). In certain embodiments, selecting cells enriched for expression of one or more gene signatures expressed in in vivo radial glia cells comprises identifying gene signatures for each TF by identifying differentially expressed genes between cells overexpressing a transcription factor and control cells; and selecting cells having a signature that is enriched in an in vivo radial glia cell type. Differentially expressed genes may be identified by comparing expression of genes in cells overexpressing a transcription factor and control cells overexpressing only the reporter gene (e.g., GFP). In certain embodiments, the signature may encompass the top differentially expressed genes (e.g., top 10, 100, 1000 or more most differentially expressed genes). In certain embodiments, the gene signatures are compared to in vivo cells and the gene signatures from cells having an overexpressed transcription factor that are most enriched in the in vivo cell types are selected.
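The signature-enrichment selection described above can be sketched as follows. Genes are ranked by log2 fold change between TF-overexpressing cells and control (e.g., GFP) cells, and the top N genes are kept as the TF's signature. Each in vivo cell type is then scored by the mean z-score of those genes across cell types. The pseudocount, the restriction to the most up-regulated genes, and the z-score statistic are assumptions made for illustration, and the expression values are hypothetical toy data.

```python
import numpy as np

def up_signature(tf_expr, ctrl_expr, genes, top_n=100, pseudocount=1.0):
    """Return the `top_n` genes with the largest log2 fold change (TF vs. control).

    tf_expr, ctrl_expr: (cells x genes) arrays of non-negative expression values.
    """
    log2fc = np.log2((tf_expr.mean(axis=0) + pseudocount) /
                     (ctrl_expr.mean(axis=0) + pseudocount))
    order = np.argsort(-log2fc)[:top_n]  # largest fold changes first
    return [genes[i] for i in order]

def score_in_vivo(signature, invivo_profiles, genes):
    """Mean z-score of the signature genes in each in vivo cell type.

    invivo_profiles maps cell type -> 1-D array of mean expression per gene, in the
    same gene order as `genes`; each gene is z-scored across cell types.
    """
    types = list(invivo_profiles)
    mat = np.vstack([invivo_profiles[t] for t in types])       # (types x genes)
    sd = mat.std(axis=0)
    z = (mat - mat.mean(axis=0)) / np.where(sd == 0, 1.0, sd)  # guard flat genes
    idx = [genes.index(g) for g in signature]
    return {t: float(z[i, idx].mean()) for i, t in enumerate(types)}

# Toy data (hypothetical values, for illustration only)
genes = ["NES", "VIM", "SLC1A3", "MAP2"]
ctrl = np.array([[1, 1, 1, 5], [1, 1, 1, 5]], dtype=float)  # e.g., GFP control cells
tf = np.array([[8, 6, 7, 5], [8, 6, 7, 5]], dtype=float)    # cells overexpressing one TF
sig = up_signature(tf, ctrl, genes, top_n=3)
scores = score_in_vivo(sig, {"RG": np.array([10, 10, 10, 1.0]),
                             "Neuron": np.array([1, 1, 1, 10.0])}, genes)
best = max(scores, key=scores.get)  # in vivo type in which the signature is most enriched
```

In this toy case the signature is NES, SLC1A3, and VIM, and it scores highest in the radial glia ("RG") profile.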

In another aspect, the present invention provides for an isolated neural progenitor cell produced by the method of any embodiment herein. In certain embodiments, the present invention provides for a therapeutic composition comprising the isolated neural progenitor cell. In certain embodiments, the present invention provides for an ex vivo system comprising the isolated neural progenitor cell.

In another aspect, the present invention provides for a method of producing neurons, astrocytes and/or oligodendrocytes comprising expressing one or more transcription factors from Table 1 in the isolated neural progenitor cell of any embodiment herein and inducing spontaneous differentiation of the isolated neural progenitor cells. In another aspect, the present invention provides for a method of producing neurons, astrocytes and/or oligodendrocytes comprising expressing one or more transcription factors from Table 1 in the isolated neural progenitor cell of any embodiment herein and inducing directed differentiation of the isolated neural progenitor cells. In preferred embodiments, the neural progenitor cell was produced by overexpression of RFX4. In certain embodiments, the method further comprises differentiating RFX4 neural progenitor cells in media comprising dual SMAD inhibitors. In certain embodiments, the RFX4 neural progenitor cells are differentiated for 7 days. In certain embodiments, the RFX4 neural progenitor cells are differentiated into CNS cell types, radial glia, and neurons. In certain embodiments, the neurons are GABAergic neurons.

In another aspect, the present invention provides for an isolated neuron, astrocyte, or oligodendrocyte produced according to any method described herein. In certain embodiments, the present invention provides for a therapeutic composition comprising the isolated neuron, astrocyte, or oligodendrocyte. In certain embodiments, the present invention provides for an ex vivo system comprising the isolated neurons, astrocytes, and/or oligodendrocytes. In preferred embodiments, the neuron is a GABAergic neuron. In certain embodiments, the GABAergic neuron can be used in a model of autism, schizophrenia, epilepsy, dementia, Alzheimer's disease, anxiety disorders, or depression.

In another aspect, the present invention provides for a non-naturally occurring population of stem cells comprising a reporter gene integrated into an endogenous locus of each stem cell in the population, wherein the endogenous locus is associated with a marker gene for a cell type of interest; the reporter gene is under control of the promoter for the marker gene; and the reporter gene and marker gene are expressed as separate proteins, whereby the marker gene and reporter gene are co-expressed upon differentiation of the stem cells into the cell type of interest. The non-naturally occurring population of stem cells may further comprise a second reporter gene integrated into a second endogenous locus of the stem cell, wherein the locus is associated with a marker gene for a second cell type of interest, and wherein the second cell type of interest is more differentiated than the first cell type of interest. The reporter gene and marker gene (e.g., first and/or second) may be separated by a ribosomal skipping site. The ribosomal skipping site may be a P2A sequence. The reporter gene may be a fluorescent protein as described herein. The cell type of interest may be any differentiated cell (i.e., any cell more differentiated than a stem cell, including but not limited to a progenitor cell). The cell type of interest may be a neural progenitor or mature neural cell type.
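The co-expression logic of a marker-P2A-reporter knock-in can be sketched at the protein level. The sketch assumes the reporter is placed in frame downstream of the marker, with the marker stop codon removed, and is preceded by a GSG linker. Under those assumptions, ribosomal skipping at the P2A site yields a marker protein carrying the 2A tail and a separate reporter protein beginning with proline. The construct order, the linker, and the toy sequences are assumptions; they are not a design taken from this application.

```python
# P2A peptide (porcine teschovirus-1 2A). Ribosomal skipping occurs between the
# glycine and the final proline of its C-terminal "NPGP" motif.
P2A = "ATNFSLLKQAGDVEENPGP"
GSG = "GSG"  # short linker often placed immediately upstream of a 2A peptide

def knock_in_products(marker_aa, reporter_aa):
    """Predict the two proteins made from an in-frame marker-GSG-P2A-reporter fusion.

    marker_aa: marker protein sequence (a trailing '*' stop symbol is removed, since the
    reporter must be translated in frame after it). reporter_aa: reporter protein sequence.
    """
    marker_aa = marker_aa.rstrip("*")
    polyprotein = marker_aa + GSG + P2A + reporter_aa
    # Skip site = position of the final proline of the first NPGP after the marker.
    skip = polyprotein.find("NPGP", len(marker_aa)) + 3
    return polyprotein[:skip], polyprotein[skip:]

# Toy sequences (hypothetical and truncated, for illustration only)
marker_protein, reporter_protein = knock_in_products("MSTESTSEQ*", "MVSKGEE")
```

Because the two products are released from one transcript driven by the marker promoter, reporter fluorescence can stand in for marker expression when sorting cells.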

In certain embodiments, the cell type of interest is a radial glia cell. The marker gene may be selected from Table 2. The marker gene may be selected from the group consisting of NES, VIM, SLC1A3, and PAX6.

In certain embodiments, the cell type of interest is an astrocyte. The marker gene may be selected from the group consisting of ALDH1L1 and GFAP.

In another aspect, the present invention provides for a pooled transcription factor screening system comprising a transcription factor library comprising one or more vectors encoding a transcription factor and a barcode identifying said transcription factor; and a population of pluripotent cells. In certain embodiments, the transcription factors encoded by the vectors are selected from Table 1 and/or Table 3. In certain embodiments, the population of pluripotent cells is a population of stem cells. In certain embodiments, the system further comprises one or more fluorescent probes configured for detecting one or more target cell marker gene transcripts (e.g., Flow-FISH probes).

In another aspect, the present invention provides for a method of screening for transcription factors capable of differentiating pluripotent cells into a cell type of interest comprising: a) introducing a transcription factor library comprising one or more vectors to a population of pluripotent cells, wherein each vector encodes: a transcription factor selected from Table 1 and/or Table 3 or an agent capable of modulating said transcription factor, and a barcode identifying each transcription factor; b) culturing the cells to allow differentiation of the cells (e.g., 2-10 days, or 2-7 days, or 5-7 days); c) selecting cells expressing one or more marker genes for the cell type of interest; and d) determining barcodes enriched in cells expressing the one or more marker genes, thereby identifying transcription factors capable of differentiating pluripotent cells into a cell type of interest. In certain embodiments, the population of pluripotent cells is a population of human embryonic stem cells (hESCs). In certain embodiments, each transcription factor is inducible. In certain embodiments, the transcription factors selected are normally expressed by the cell type of interest.
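Step d) can be illustrated with a minimal sketch. Each barcode is scored by the log2 ratio of its normalized read frequency in the marker-high population to its frequency in the marker-low sorted population, in the spirit of the high and low bins described for the reporter and flow-FISH selections. The pseudocount, the high-versus-low statistic, and the barcode-to-TF table are assumptions for illustration; they are not the specific statistic used by Applicants. The counts are hypothetical.

```python
import math

def barcode_enrichment(high_counts, low_counts, pseudocount=1.0):
    """log2(frequency in marker-high bin / frequency in marker-low bin) per barcode,
    returned as a dict sorted from most to least enriched."""
    barcodes = set(high_counts) | set(low_counts)
    n_high = sum(high_counts.values()) + pseudocount * len(barcodes)
    n_low = sum(low_counts.values()) + pseudocount * len(barcodes)
    scores = {}
    for bc in barcodes:
        f_high = (high_counts.get(bc, 0) + pseudocount) / n_high
        f_low = (low_counts.get(bc, 0) + pseudocount) / n_low
        scores[bc] = math.log2(f_high / f_low)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical read counts per barcode in each sorted bin
high = {"BC1": 99, "BC2": 9, "BC3": 49}
low = {"BC1": 9, "BC2": 99, "BC3": 49}
scores = barcode_enrichment(high, low)
bc_to_tf = {"BC1": "TF_A", "BC2": "TF_B", "BC3": "TF_C"}  # hypothetical library map
ranked_tfs = [bc_to_tf[bc] for bc in scores]  # candidate TFs, most enriched first
```

Here TF_A is enriched about 10-fold in the marker-high bin, TF_C is neutral, and TF_B is depleted.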

In certain embodiments, selecting cells expressing one or more marker genes for the cell type of interest comprises Flow-FISH using probes targeting one or more marker genes. In certain embodiments, selecting cells expressing one or more marker genes for the cell type of interest comprises single cell RNA-seq. In certain embodiments, selecting cells further comprises inferring a pseudotime for each cell from single cell RNA-seq expression profiles and comparing cells overexpressing one or more of the transcription factors to cells overexpressing controls (e.g., green fluorescent protein). In such embodiments, transcription factors that increase pseudotime relative to controls are identified as directing differentiation. In certain embodiments, selecting cells further comprises grouping one or more of the transcription factors into modules that alter expression of the same gene programs, wherein transcription factors in the same module are co-functional.
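The pseudotime comparison can be sketched as follows. The sketch assumes that a pseudotime value has already been inferred for each cell by a separate trajectory method, and that each cell has been assigned to a single TF through its barcode. The mean-difference statistic and the 0.1 cutoff are hypothetical choices, and a rank-based test could be used instead. The values are toy data.

```python
from statistics import mean

def pseudotime_shift(pseudotime, tf_labels, control="GFP"):
    """Mean pseudotime of each TF's cells minus the mean pseudotime of control cells."""
    by_tf = {}
    for t, tf in zip(pseudotime, tf_labels):
        by_tf.setdefault(tf, []).append(t)
    ctrl = mean(by_tf.pop(control))  # control cells (e.g., GFP-overexpressing)
    return {tf: mean(ts) - ctrl for tf, ts in by_tf.items()}

# Toy per-cell pseudotimes and barcode-derived TF assignments (hypothetical)
pt = [0.1, 0.2, 0.6, 0.8, 0.15, 0.1]
labels = ["GFP", "GFP", "TF_A", "TF_A", "TF_B", "TF_B"]
shifts = pseudotime_shift(pt, labels)
candidates = [tf for tf, d in shifts.items() if d > 0.1]  # hypothetical cutoff
```

TF_A shifts cells about 0.55 pseudotime units beyond the control, whereas TF_B does not, so only TF_A would be called as directing differentiation.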

In certain embodiments, the one or more populations of pluripotent cells are stem cells. In certain embodiments, selecting cells expressing one or more marker genes for the cell type of interest comprises detecting the reporter gene. In certain embodiments, selecting cells comprises FACS.

In certain embodiments, determining barcodes comprises sequencing the DNA barcode or transcript comprising the barcode. In certain embodiments, determining barcodes comprises amplification of barcode sequences (e.g., PCR).
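Barcode determination by sequencing can be sketched as locating a fixed sequence that flanks the barcode in each read and counting the barcode that follows it. The flank `TTGACC`, the 4-nt barcode length, and exact matching are hypothetical simplifications. Actual library designs, barcode lengths, and error-tolerant matching will differ.

```python
from collections import Counter

def count_barcodes(reads, upstream_flank, bc_len, whitelist):
    """Count library barcodes found immediately downstream of a fixed flank sequence."""
    counts = Counter()
    for read in reads:
        pos = read.find(upstream_flank)
        if pos == -1:
            continue  # flank not found: read is skipped
        start = pos + len(upstream_flank)
        bc = read[start:start + bc_len]
        if bc in whitelist:  # keep only barcodes present in the TF library
            counts[bc] += 1
    return counts

# Hypothetical reads, flank, and barcode whitelist
reads = ["NNTTGACCAAAAGG", "TTGACCCCCCTT", "GGGGGGGG", "TTGACCGGGGAA"]
counts = count_barcodes(reads, "TTGACC", 4, {"AAAA", "CCCC"})
```

The third read lacks the flank, and the fourth carries a barcode outside the library, so neither is counted.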

In certain embodiments, the method further comprises introducing the transcription factor library at a low cell density, such that the cells multiply into small colonies; and inducing expression of the transcription factors or agents encoded by the vectors. In certain embodiments, the method further comprises introducing the vector library at a low MOI, such that most cells receive no more than one vector. In certain embodiments, the method further comprises introducing the vector library at a high MOI, such that most cells receive one or more vectors.
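A low MOI yields mostly single-vector cells under the common approximation that vector integrations per cell are Poisson-distributed with mean equal to the MOI, so that P(k) = e^(-MOI) MOI^k / k!. The sketch below evaluates that model. The MOI values in the usage are illustrative assumptions.

```python
import math

def integration_fractions(moi):
    """Fractions of cells receiving 0, 1, or >=2 vectors under a Poisson model."""
    p0 = math.exp(-moi)        # cells receiving no vector
    p1 = moi * math.exp(-moi)  # cells receiving exactly one vector
    p_multi = 1.0 - p0 - p1    # cells receiving two or more vectors
    return {"none": p0, "single": p1, "multiple": p_multi,
            # share of transduced cells that carry more than one vector
            "multiple_among_transduced": p_multi / (1.0 - p0)}

low = integration_fractions(0.3)   # low-MOI regime
high = integration_fractions(3.0)  # high-MOI regime
```

At an MOI of 0.3, roughly 96% of cells carry at most one vector, and about 14% of transduced cells carry more than one. At an MOI of 3, about 80% of cells carry two or more vectors.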

In certain embodiments, the transcription factor library comprises viral vectors. In certain embodiments, the viral vectors are lentivirus, adenovirus, or adeno-associated virus (AAV) vectors.

In certain embodiments, the transcription factor library further encodes a protein tag in frame with the transcription factor coding sequence.

In certain embodiments, the population of stem cells expresses a CRISPR system and the transcription factor library comprises vectors encoding one or more CRISPR guide sequences targeting one of the transcription factors. In certain embodiments, the guide sequences comprise one or more aptamer sequences specific for binding an adaptor protein and the CRISPR system comprises an enzymatically inactive CRISPR enzyme and the adaptor protein comprises a functional domain. In certain embodiments, the CRISPR system comprises an enzymatically inactive CRISPR enzyme and a functional domain. In certain embodiments, the functional domain is a transcription activation or repression domain.

In certain embodiments, the transcription factor library comprises vectors encoding an shRNA for one of the transcription factors.

In certain embodiments, identifying transcription factors further comprises determining gene signatures for each identified TF, wherein the gene signature comprises differentially expressed genes between cells overexpressing each transcription factor and control cells; and selecting transcription factors inducing a gene signature that is enriched in an in vivo cell type.

In another aspect, the present invention provides for a method of producing cardiomyocytes comprising overexpressing a transcription factor selected from the group consisting of MESP1, EOMES and ESR1 in a pluripotent cell population, and selecting cells expressing one or more cardiomyocyte markers. In certain embodiments, the transcription factor is EOMES. In certain embodiments, the amino acid sequence of EOMES is SEQ ID NO: 10807 or SEQ ID NO: 10808. In certain embodiments, the transcription factor is induced for about 2 days. In certain embodiments, the transcription factor is induced when the cell density is about 500,000 cells/ml. In certain embodiments, the one or more cardiomyocyte markers comprises TNNT2. In certain embodiments, selecting further comprises selecting cells enriched for expression of one or more gene signatures expressed in in vivo cardiomyocytes.

In another aspect, the present invention provides for an isolated cardiomyocyte produced by the method according to any embodiment herein. In certain embodiments, the present invention provides for a therapeutic composition comprising the isolated cardiomyocyte. In certain embodiments, the present invention provides for an ex vivo system comprising the isolated cardiomyocyte.

In certain embodiments, the pluripotent cell according to any embodiment herein is an embryonic stem cell (ES) or induced pluripotent stem cell. In certain embodiments, the stem cell is a human embryonic stem cell (ES). In certain embodiments, the human embryonic stem cell is selected from the group consisting of HUES66, HUES64, HUES3, HUES8, HUES53, HUES28, HUES49, HUES9, HUES48, HUES45, HUES1, HUES44, HUES6, H1, HUES62, HUES65, H7, HUES13, H9, and HUES63. In certain embodiments, the stem cell is a human induced pluripotent stem cell (iPSC). In certain embodiments, the human iPSC is selected from the group consisting of 11a, PGP1, GM08330 (also known as GM8330-8), and Mito 210.

In another aspect, the present invention provides for a stem cell comprising an exogenous nucleotide sequence capable of inducible expression of one or more transcription factors selected from the group consisting of RFX4, NFIB, ASCL1 and PAX6.

In another aspect, the present invention provides for a stem cell comprising an exogenous nucleotide sequence capable of inducible expression of one or more transcription factors selected from the group consisting of MESP1, EOMES and ESR1.

In another aspect, the present invention provides for a method of predicting transcription factor combinations for differentiating a stem cell into a cell type of interest comprising determining the average gene expression of one or more genes for two or more stem cells each expressing a single transcription factor and comparing the average expression to a gene signature specific for the cell type of interest. In certain embodiments, the method further comprises differentiating a stem cell into the cell type of interest by expressing in the stem cell a double or triple combination of transcription factors whose average gene expression is most similar to a gene signature specific for the cell type of interest.
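The prediction method can be sketched in two steps. First, the expression profiles of cells expressing each single TF are averaged for every double or triple combination. Second, each combination is ranked by how well its averaged profile matches a target cell-type signature. Pearson correlation is used here as the similarity measure, which is an assumption. The profiles and TF names are hypothetical toy values.

```python
from itertools import combinations
import numpy as np

def rank_combinations(single_tf_profiles, target_signature, sizes=(2, 3)):
    """Rank TF combinations by Pearson correlation between the averaged single-TF
    expression profiles and a target signature (all over the same genes)."""
    ranked = []
    tfs = sorted(single_tf_profiles)
    for k in sizes:
        for combo in combinations(tfs, k):
            avg = np.mean([single_tf_profiles[tf] for tf in combo], axis=0)
            if np.isclose(avg.std(), 0.0):
                continue  # correlation is undefined for a flat profile
            r = np.corrcoef(avg, target_signature)[0, 1]
            ranked.append((combo, float(r)))
    return sorted(ranked, key=lambda x: x[1], reverse=True)

# Hypothetical mean profiles over three genes for cells expressing one TF each
profiles = {"TF_A": np.array([1.0, 0.0, 0.0]),
            "TF_B": np.array([0.0, 1.0, 0.0]),
            "TF_C": np.array([0.0, 0.0, 1.0])}
target = np.array([1.0, 1.0, 0.0])  # hypothetical target cell-type signature
ranked = rank_combinations(profiles, target)
```

The TF_A plus TF_B pair reproduces the target pattern and ranks first. The triple is skipped because its averaged profile is flat.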

In another aspect, the present invention provides for a method of differentiating a stem cell into a cell type of interest comprising expressing in the stem cell a double or triple combination of transcription factors selected from the clusters in Table 19.

These and other aspects, objects, features, and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of illustrated example embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

An understanding of the features and advantages of the present invention can be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention may be utilized, and the accompanying drawings of which:

FIG. 1—Targeted arrayed TF screen. (A), Screening schematic. (B), Expression of radial glia marker genes after ASCL1 overexpression. (C), Image of differentiated cells after 4 days of ASCL1 overexpression. Scale bar, 100 μm.

FIG. 2—Gene expression signature of differentiated radial glia. Heat map of Z-scores indicating enrichment of TF candidate gene expression signatures in each cell type in vivo.

FIG. 3—Immunostaining of radial glia differentiated from candidate TFs. (A), Immunostaining of radial glia markers (VIM and NES) after 12 days of TF overexpression. (B), Immunostaining of neurons (MAP2), astrocytes (GFAP), and oligodendrocytes (NG2) after 4 weeks of spontaneous differentiation from radial glia induced by candidate TF overexpression. Scale bar, 50 μm.

FIG. 4—Immunostaining of neurons and astrocytes differentiated from ASCL1. Immunostaining for markers identifying neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursors (NG2 and PDGFRA) at indicated time points after induction of the TF (7 days, 14 days, 28 days).

FIG. 5—Immunostaining of neurons and astrocytes differentiated from NFIB. Immunostaining for markers identifying neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursors (NG2 and PDGFRA) at indicated time points after induction of the TF (7 days, 14 days, 28 days).

FIG. 6—Immunostaining of neurons and astrocytes differentiated from PAX6. Immunostaining for markers identifying neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursors (NG2 and PDGFRA) at indicated time points after induction of the TF (7 days, 14 days, 28 days).

FIG. 7—Immunostaining of neurons and astrocytes differentiated from RFX4. Immunostaining for markers identifying neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursors (NG2 and PDGFRA) at indicated time points after induction of the TF (7 days, 14 days, 28 days).

FIG. 8—Pooled TF screen. (A), Screening schematic. (B), Heat map of Z-scores representing median enrichment of each TF from 3 screens of 90 transcription factors performed in different clonal cell lines.

FIG. 9—Scatter Plot. Results of pooled screening of 1,387 transcription factors.

FIG. 10—Genome-wide astrocyte differentiation screen. Screening schematic.

FIG. 11—Cardiomyocyte differentiation. Bar graph showing the percentage of TNNT2 positive cells after cardiomyocyte differentiation of human embryonic stem cells under different conditions for inducing expression of two isoforms of EOMES.

FIG. 12—Cardiomyocyte differentiation. Bar graph showing the percentage of TNNT2 positive cells after cardiomyocyte differentiation of human embryonic stem cells under different conditions for inducing expression of two isoforms of EOMES or a small molecule differentiation method.

FIG. 13—Development of a pooled TF screening platform for directed differentiation. (A) Schematic of pooled TF screening. Barcoded TF ORFs are pooled and packaged into lentivirus for delivery into hESCs. TFs that can differentiate hESCs into the cell type of interest are identified using a reporter cell line, flow-FISH, or single-cell RNA sequencing, followed by deep sequencing of TF barcodes. MOI, multiplicity of infection. (B) Scatterplot showing enrichment of candidate TFs identified by flow-FISH with pooled FISH probes targeting 2 or 10 NP marker genes from n=3 infection replicates. (C) Same as (B) highlighting different isoforms of candidate TFs. (D) Comparison of TFs that ranked in the top 10% from the 4 different screens.

FIG. 14—Validation of candidate TFs for iNP differentiation. (A) Expression of NP marker genes VIM and NES in iNPs produced by candidate TFs after 7 days of overexpression. Cell culture media used for each ORF is indicated in parentheses. Scale bar, 50 μm. (B) Heat map of bulk RNA sequencing (RNA-seq) signature correlation between iNPs and human fetal cortex cell types from the Pollen 2015 dataset [20]. D7 and D12 indicate the number of days that the ORF was overexpressed. RG, radial glia; IPC, intermediate progenitor cell; N, neuron; IN, interneuron.

FIG. 15—Candidate TFs produce iNPs that can spontaneously differentiate into cell types in the central nervous system. (A) Schematic of spontaneous differentiation. Dox-inducible candidate TFs are transiently overexpressed for 1 week to differentiate hESCs into iNPs. The iNPs are then spontaneously differentiated for 8 weeks by withdrawing dox and growth factors. Spontaneously differentiated cells were characterized by immunostaining and single-cell RNA sequencing. rtTA, reverse tetracycline-controlled transactivator; dox, doxycycline; EGF, epidermal growth factor; FGF, fibroblast growth factor. (B) Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (PDGFRA) after 1, 2, 4, or 8 weeks of spontaneous differentiation for 4 candidate TFs. Scale bar, 100 μm.

FIG. 16—Single-cell RNA sequencing of spontaneously differentiated cells from iNPs demonstrates development of a broad range of cell types. (A)-(C), t-distributed stochastic neighbor embedding (tSNE) visualization of single-cell RNA sequencing data from cells that have been spontaneously differentiated from iNPs for 8 weeks. iNPs were derived using RFX4, NFIB, ASCL1, or PAX6. A total of 52,364 cells from n=2 bioreps per TF were analyzed. (A) Cells are grouped into 31 clusters, and cluster 5 is further divided into 3 subclusters. Colors indicate cell type or state. (B) Clusters that represent central nervous system (CNS) cell types are highlighted. Percentage of total cells that contribute to the specified CNS cell type is indicated. (C) Cells spontaneously differentiated from each candidate TF are highlighted. Colors indicate bioreps, S1 and S2. (D) Quantification of spontaneously differentiated cells. Left, percentage of cells from each biorep that were grouped into each cluster. Right, overall distribution of general cell types. RP, retinal progenitors; RPE, retinal pigment epithelium; RGC, retinal ganglion cells; PR, photoreceptors; DNP, dorsal neural progenitors; RG, radial glia; Astro, astrocytes; CN, cortical neurons; HB&SCN, hindbrain and spinal cord neurons; IN, interneurons; EPD&CPE, ependyma and choroid plexus epithelium; EP, epithelial progenitors; BE, bronchial epithelium; CE, cranial epithelium; NC, neural crest; CNC, cranial neural crest; Pro, uncommitted progenitors; (P), proliferative cells; (S), structural cell types such as bone and cartilage.

FIG. 17—Modeling neurodevelopmental disorders using RFX4-iNPs with DYRK1A perturbation. (A) Schematic of disease modeling by perturbing DYRK1A expression. hESCs are transduced with Cas9 and DYRK1A KO sgRNAs to knock out DYRK1A, or with a DYRK1A ORF to overexpress it. RFX4 is then transiently overexpressed for 1 week to differentiate hESCs into iNPs. The iNPs are then spontaneously differentiated for 8 weeks by withdrawing dox and growth factors. Effects of DYRK1A perturbation were characterized by bulk RNA sequencing, EdU labeling, and immunostaining. rtTA, reverse tetracycline-controlled transactivator; dox, doxycycline; EGF, epidermal growth factor; FGF, fibroblast growth factor. (B)-(C), Expression of DYRK1A at 7 days after transduction with Cas9 and DYRK1A KO sgRNAs (B) or DYRK1A ORF (C). (D) Heat map of genes that were significantly differentially expressed (T-test q-value<0.05 with FDR correction) depending on the dosage of DYRK1A. Genes are annotated with broad categories of gene function relevant to neural development. (E)-(F), Percentage of EdU labeled cells at 0, 2, or 4 weeks of spontaneous differentiation for DYRK1A knockout (E) or overexpression (F). Values represent mean±SEM from n=3 bioreps. 10,000 cells were analyzed per biorep. (G)-(H), Intensity of MAP2 staining for neurons at 0, 1, 2, 4, or 8 weeks of spontaneous differentiation for DYRK1A knockout (G) or overexpression (H). Values represent mean±SEM from n=2 bioreps with 6 images per biorep. KO, knockout; NT, non-targeting. ****P<0.0001; ***P<0.001; **P<0.01; *P<0.05. ns, not significant.
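The q-value threshold in panel (D) can be illustrated with the Benjamini-Hochberg procedure, one common FDR correction. The correction actually applied is not specified above, so using Benjamini-Hochberg here is an assumption. The p-values in the usage are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        # Enforce monotonicity: q never exceeds the q of a larger p-value.
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical per-gene T-test p-values
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [i for i, v in enumerate(q) if v < 0.05]  # genes passing q < 0.05
```

Only the first gene passes q < 0.05 here, even though three raw p-values fall below 0.05.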

FIG. 18—Comparison of TF overexpression methods for neuronal differentiation. (A) Schematic of ORF and CRISPR-Cas9 activator comparison. hESCs are transduced with ORF, ORF with UTRs, or SAM CRISPR-Cas9 activator to overexpress NEUROD1 or NEUROG2 for directed differentiation into induced neurons. (B) Expression of NEUROD1 mRNA and protein after NEUROD1 overexpression from n=4 bioreps. (C) Expression of marker genes for neurons (MAP2) and NPs (PAX6) after NEUROD1 overexpression. (D) Expression of NEUROG2 mRNA after NEUROG2 overexpression from n=4 bioreps. (E) Expression of marker genes for neurons (MAP2) and NPs (PAX6) after NEUROG2 overexpression. (F) Intensity of MAP2 staining from n=6 images per condition. All values are mean±SEM. Scale bar, 100 μm. ****P<0.0001; ***P<0.001. ns=not significant. UTR, untranslated region; NT, nontargeting.

FIG. 19—Arrayed TF ORF screen for iNP differentiation. (A) 90 TF ORFs included in the library for the arrayed screen (Table 1). (B) Schematic for arrayed screening (e.g., wells). TF ORFs were individually synthesized, cloned, and packaged into lentivirus for delivery into hESCs. After 4 or 7 days of differentiation, expression of NP marker genes SLC1A3 and VIM was measured to identify candidate TFs. (C) Timeline for arrayed screening. mTeSR stem cell media was incrementally changed to NP media during differentiation, and expression of NP marker genes was measured after 4 and 7 days of differentiation. (D)-(G), Expression of VIM and SLC1A3 mRNA relative to control hESCs overexpressing GFP in NP media from n=3 infection replicates at 4 (D,E) or 7 (F,G) days of differentiation. Candidate TFs (D,F) and other isoforms of candidate TFs (E,G) are indicated.

FIG. 20—A pooled TF ORF screening platform for iNP differentiation. (A) Design of lentiviral vectors for expression of barcoded TFs. WPRE, Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element. (B) Schematic of pooled TF screening with 3 different methods for selecting cell types of interest. For the reporter cell line method, reporter cell lines transduced with the TF library are differentiated and sorted into high or low marker gene-expressing cell populations. For the flow-FISH method, differentiated cells are labeled with FISH probes targeting 2-10 marker genes and sorted based on marker gene expression. For the single-cell RNA sequencing method, differentiated cells can be analyzed using single-cell RNA-seq. In all selection methods, sequencing of TF barcodes enables identification of candidate TFs. (C) FACS plots showing distribution of EGFP expression in SLC1A3 and VIM reporter cell lines with or without the TF library. High and low bins sorted for sequencing of TF barcodes are indicated. (D)-(E), Enrichment of candidate TFs (D) or other isoforms of candidate TFs (E) in the high EGFP-expressing bin relative to the low bin from n=3 infection replicates per reporter cell line. (F) Representative FACS plot showing expression of RPL13A control or SLC1A3 and VIM mRNA labeled by FISH probes from n=3 infection replicates. High and low bins sorted for sequencing of TF barcodes are indicated. (G) Same as (F), showing expression of mRNAs from 10 marker genes labeled by FISH probes. (H) Comparison of candidate TF enrichment in screens using reporter cell lines and flow-FISH.
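The enrichment in panels (D)-(E) compares TF barcode abundance between the high and low EGFP bins. The following is a minimal Python sketch of one way such a score could be computed, assuming barcode read counts per TF isoform for each sorted bin, counts-per-million normalization with a pseudocount of 1, and a log2 high/low ratio summarized by the median across infection replicates. The function names, the pseudocount, and the normalization are illustrative assumptions, not the procedure used to generate the figure.

```python
import math
import statistics

def cpm(counts, tfs, pseudocount):
    """Counts per million for every TF barcode; the pseudocount keeps absent barcodes finite."""
    total = sum(counts.values())
    return {tf: (counts.get(tf, 0) + pseudocount) / total * 1e6 for tf in tfs}

def log2_enrichment(high, low, pseudocount=1.0):
    """log2(high-bin CPM / low-bin CPM) per TF barcode for one infection replicate."""
    tfs = set(high) | set(low)
    h = cpm(high, tfs, pseudocount)
    lo = cpm(low, tfs, pseudocount)
    return {tf: math.log2(h[tf] / lo[tf]) for tf in tfs}

def median_enrichment(replicates, pseudocount=1.0):
    """Median log2 enrichment across replicates; each replicate is a (high_counts, low_counts) pair."""
    per_rep = [log2_enrichment(high, low, pseudocount) for high, low in replicates]
    tfs = set().union(*per_rep)
    return {tf: statistics.median([rep[tf] for rep in per_rep if tf in rep]) for tf in tfs}

# Toy read counts for two hypothetical barcodes in three infection replicates.
reps = [({"TF_A": 900, "GFP": 100}, {"TF_A": 100, "GFP": 900})] * 3
scores = median_enrichment(reps)
```

Taking the median rather than the mean makes the summary less sensitive to a single replicate with unusual barcode dropout.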

FIG. 21—Selection of candidate TFs using single-cell RNA sequencing. (A) Number of cells analyzed using single-cell RNA sequencing (RNA-seq) for each TF isoform out of 59,640 cells. (B) t-distributed stochastic neighbor embedding (tSNE) clustering of single-cell RNA-seq data from hESCs transduced with the TF library. Cells grouped into 18 clusters. (C) Same as (B) highlighting cells expressing a TF of interest. (D) Candidate TFs identified using single-cell RNA-seq. Top, correlations between TF transcriptome signatures and radial glia from human fetal cortex or brain organoid datasets20,25,26. Values represent mean correlation of cells expressing each TF as z-scores. Dashed line indicates cutoff for identifying candidate TFs. Bottom, heat map indicating percentage of cells overexpressing each TF isoform that was grouped into a particular cluster. Candidate TFs selected using single-cell RNA-seq are indicated in blue.
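Panel (D) scores each TF by how closely the transcriptomes of cells carrying it resemble radial glia. A minimal sketch follows. It assumes that each cell's expression vector and a radial glia reference signature are defined over the same genes, that Pearson correlation is the similarity measure, and that the per-TF means are converted to population z-scores across TFs. The reference vector, TF labels, and values are hypothetical toy data.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def tf_signature_zscores(cells_by_tf, reference):
    """Mean cell-to-reference correlation per TF, expressed as a z-score across all TFs."""
    mean_corr = {tf: statistics.fmean([pearson(c, reference) for c in cells]) for tf, cells in cells_by_tf.items()}
    values = list(mean_corr.values())
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return {tf: (r - mu) / sd for tf, r in mean_corr.items()}

radial_glia_ref = [5.0, 1.0, 0.0, 3.0]  # toy reference signature over four genes
cells = {
    "TF_A": [[5.0, 1.0, 0.0, 3.0], [4.0, 1.0, 0.0, 3.0]],
    "TF_B": [[5.0, 0.0, 1.0, 3.0], [3.0, 1.0, 1.0, 2.0]],
    "GFP": [[0.0, 5.0, 3.0, 1.0], [1.0, 4.0, 3.0, 0.0]],
}
z = tf_signature_zscores(cells, radial_glia_ref)
```

A cutoff on these z-scores, like the dashed line in panel (D), would then nominate candidate TFs.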

FIG. 22—Validation of candidate TFs for iNP differentiation. (A) Expression of candidate TFs measured using the V5 epitope tag after 7 days of differentiation. (B) Expression of NP marker genes PAX6 and NES in iNPs produced by candidate TFs after 7 days of overexpression. Cell culture media used for each ORF is indicated in parentheses. Scale bar, 50 μm. (C)-(D), Heat map of bulk RNA sequencing (RNA-seq) signature correlation between iNPs and human fetal brain cell types from the Nowakowski 2017 dataset26 (C) or human brain organoids from the Quadrato 2017 dataset25 (D). D7 and D12 indicate whether the ORF was overexpressed for 7 or 12 days, respectively. RG, radial glia; div, dividing; oRG, outer radial glia; tRG, truncated radial glia; vRG, ventricular radial glia; MGE, medial ganglionic eminence; IPC, intermediate progenitor cell; nEN, newborn excitatory neurons; EN, excitatory neurons; PFC, prefrontal cortex; V1, primary visual cortex; nIN, newborn interneurons; IN, interneurons; CTX, cortex; CGE, caudal ganglionic eminence; STR, striatum; OPC, oligodendrocyte precursor cells; Glyc, cells expressing glycolysis genes; Pro, proliferating progenitors; NE, neuroepithelium; DN, dopaminergic neurons; CLN, callosal neurons; CFN, corticofugal neurons; Meso, mesodermal progenitors.

FIG. 23—Characterization of spontaneously differentiated cells produced by candidate TFs in HUES66. Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (NG2) after 1, 2, 4, or 8 weeks of spontaneous differentiation for 4 candidate TFs. Scale bar, 100 μm.

FIG. 24—Characterization of iNPs and spontaneously differentiated cells produced by candidate TFs in iPSC11a and H1 pluripotent stem cell lines. (A)-(B), Expression of NP marker genes in iPSC11a iNPs (A) or H1 iNPs (B) after 1 week of TF overexpression. (C)-(D), Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (NG2 and PDGFRA) in cells spontaneously differentiated from iPSC11a iNPs (C) or H1 iNPs (D) for 8 weeks. Scale bar, 100 μm.

FIG. 25—Single-cell RNA sequencing profiling of spontaneously differentiated cells produced by candidate TFs. (A) Heat map showing the z-score of the mean log-transformed, normalized counts for each cluster of selected marker genes used to annotate clusters. For a more extensive set of genes, see Table 8. RP, retinal progenitors; RPE, retinal pigment epithelium; RGC, retinal ganglion cells; PR, photoreceptors; DNP, dorsal neural progenitors; RG, radial glia; Astro, astrocytes; CN, cortical neurons; HB&SCN, hindbrain and spinal cord neurons; IN, interneurons; EPD&CPE, ependyma and choroid plexus epithelium; EP, epithelial progenitors; BE, bronchial epithelium; CE, cranial epithelium; NC, neural crest; CNC, cranial neural crest; Pro, uncommitted progenitors; (P), proliferative cells; (S), structural cell types such as bone and cartilage. (B) Distribution of cell types generated in human brain organoids at 6 months from the Quadrato 2017 dataset25.

FIG. 26—ChIP-seq analysis of candidate TFs. (A) Top 3 de novo or known motifs identified using HOMER motif analysis. The names of the TFs with the closest matching motifs, indicating potential cofactors of candidate TFs, are listed. The percentages of ChIP peaks that contained each motif relative to the background, and the associated P-values of enrichment, are also listed. (B)-(C), Example NP marker gene loci with significant ChIP peaks from all 4 candidate TFs for HES1 (B) and BMPR1B (C). (D) Heat map showing percentage of NP-specific TFs or genes that had candidate TF ChIP peaks within 10 kb of the annotated transcriptional start site (TSS). (E) Overlap of NP-specific genes that had candidate TF ChIP peaks within 10 kb of the TSS and were differentially expressed (t-test q-value<0.05 with FDR correction) upon candidate TF overexpression. Blue regions indicate overlap.
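Panel (D) reports the percentage of NP-specific genes with a candidate TF ChIP peak within 10 kb of the annotated TSS. A minimal sketch, assuming peaks given as (chromosome, start, end) intervals, one TSS coordinate per gene, and that "within 10 kb" means the distance from the TSS to the nearest edge of a peak (zero when the TSS lies inside the peak). Gene names and coordinates are toy placeholders, not annotations from the study.

```python
def genes_with_nearby_peak(tss, peaks, window=10_000):
    """tss maps gene -> (chromosome, TSS position); peaks is a list of (chromosome, start, end)."""
    hits = set()
    for gene, (chrom, pos) in tss.items():
        for p_chrom, start, end in peaks:
            # Distance from the TSS to the closest edge of the peak; 0 if the TSS is inside it.
            if p_chrom == chrom and max(start - pos, pos - end, 0) <= window:
                hits.add(gene)
                break
    return hits

def percent_with_peak(genes, tss, peaks, window=10_000):
    """Percentage of `genes` (e.g., an NP-specific gene set) with a peak within `window` bp of the TSS."""
    subset = {g: tss[g] for g in genes if g in tss}
    return 100.0 * len(genes_with_nearby_peak(subset, peaks, window)) / len(genes)

toy_tss = {"GENE_A": ("chr1", 50_000), "GENE_B": ("chr1", 200_000), "GENE_C": ("chr2", 50_000)}
toy_peaks = [("chr1", 55_000, 55_500), ("chr2", 100_000, 100_400)]
pct = percent_with_peak(["GENE_A", "GENE_B", "GENE_C"], toy_tss, toy_peaks)
```

For genome-scale peak sets, an interval index or a sorted sweep would replace the nested loop. The linear scan here is kept only for readability.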

FIG. 27—DYRK1A perturbation in RFX4-iNPs to model neurological disorders. (A) Percent indels in RFX4-derived iNPs transduced with DYRK1A KO sgRNAs. Values represent mean±SEM from n=3 bioreps. (B) DYRK1A mRNA expression measured using qPCR probes targeting the endogenous sequence or the codon-optimized ORF sequence. Values represent mean±SEM from n=4 bioreps with 4 technical replicates per biorep. *P<0.05; ND, not detected. (C) Venn diagram showing the number of genes that were significantly differentially expressed (t-test q-value<0.05 with FDR correction) and had an absolute log2 fold change relative to control greater than 1. The KO sgRNAs 1 and 2 conditions were compared to both NT sgRNAs 1 and 2 controls. The ORF condition was compared to GFP control. (D)-(F) Volcano plots showing the number of genes that were significantly differentially expressed (t-test q-value<0.05 with FDR correction) and had an absolute log2 fold change relative to control greater than 1 for DYRK1A KO sgRNA 1 (D), KO sgRNA 2 (E), and ORF (F) conditions. For a full list of genes, see Table 9. (G) Representative images of MAP2 staining during spontaneous differentiation for NT sgRNA 1 and DYRK1A KO sgRNA 2. Scale bar, 100 μm. KO, knockout; NT, non-targeting.
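Panels (C)-(F) apply a two-part filter: an FDR-corrected t-test q-value below 0.05 and an absolute log2 fold change greater than 1. The caption does not name the FDR procedure, so the sketch below assumes Benjamini-Hochberg correction and takes per-gene t-test p-values and log2 fold changes as already computed. The function names and toy values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

def call_de_genes(results, q_cutoff=0.05, lfc_cutoff=1.0):
    """results maps gene -> (t-test p-value, log2 fold change vs. control); returns (up, down)."""
    genes = list(results)
    qvals = benjamini_hochberg([results[g][0] for g in genes])
    up, down = [], []
    for gene, qv in zip(genes, qvals):
        lfc = results[gene][1]
        if qv < q_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return sorted(up), sorted(down)

toy = {"g1": (0.01, 2.0), "g2": (0.02, -1.5), "g3": (0.03, 0.5), "g4": (0.005, -3.0), "g5": (0.9, 4.0)}
up, down = call_de_genes(toy)
```

In this toy input, g3 passes the q-value cutoff but is excluded by the fold-change threshold, and g5 is excluded by its q-value.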

FIG. 28—A barcoded human TF library for directed differentiation. Schematic showing how the TF library can be used to produce differentiated cell types for cellular models and therapies. Puro, puromycin. WPRE, Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element. MOI, multiplicity of infection.

FIG. 29—Development of a multiplexed TF screening platform for directed differentiation. (A) Schematic of multiplexed TF screening. Barcoded TF ORFs are pooled and packaged into lentivirus for delivery into hESCs. TFs that can differentiate hESCs into the cell type of interest are identified using reporter cell line, flow-FISH, or single-cell RNA sequencing (scRNA-seq), followed by deep sequencing of TF barcodes. MOI, multiplicity of infection. (B) Scatterplot showing median enrichment of candidate TFs identified using SLC1A3 or VIM reporter cell lines from n=3 infection replicates. (C) Scatterplot showing average enrichment of candidate TFs identified by flow-FISH with pooled FISH probes targeting 2 or 10 NP marker genes from n=3 infection replicates. (D) Uniform manifold approximation and projection (UMAP) clustering of scRNA-seq data from 53,560 hESCs transduced with the TF library. (E) Heatmap indicating correlations between TF transcriptome signatures and radial glia from human fetal cortex or brain organoid datasets. Values represent mean correlation of cells overexpressing each TF as z-scores. (F) Comparison of TFs that ranked in the top 10% from the 4 different screens.

FIG. 30—Validation of candidate TFs driving iNP differentiation. Top, expression of NP marker genes VIM and NES in iNPs produced by candidate TFs after 7 days of overexpression. Cell culture media used for each ORF is indicated in parentheses. Scale bar, 50 μm. Bottom, heat map of bulk RNA sequencing (RNA-seq) signature correlation between iNPs and human fetal cortex cell types from the Pollen 2015 dataset (Pollen et al., 2015). D7 and D12 indicate the number of days that the ORF was overexpressed. RG, radial glia; IPC, intermediate progenitor cell; N, neuron; IN, interneuron.

FIG. 31—Candidate TFs produce iNPs that can spontaneously differentiate into cell types in the central nervous system. (A) Schematic of spontaneous differentiation. Dox-inducible candidate TFs are transiently overexpressed for 1 week to differentiate hESCs into iNPs, which then spontaneously differentiate for 8 weeks following withdrawal of dox and growth factors. Spontaneously differentiated cells were characterized by immunostaining and single-cell RNA sequencing. rtTA, reverse tetracycline-controlled transactivator; dox, doxycycline; EGF, epidermal growth factor; FGF, fibroblast growth factor. (B) Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (PDGFRA) after 1, 2, 4, or 8 weeks of spontaneous differentiation for 4 candidate TFs. Scale bar, 100 μm.

FIG. 32—Single-cell RNA sequencing of spontaneously differentiated cells from iNPs reveals a broad array of cell types. (A) UMAP clustering of scRNA-seq data from 53,113 cells that have been spontaneously differentiated from iNPs for 8 weeks. iNPs were derived using RFX4, NFIB, ASCL1, or PAX6 with n=2 biological replicates per TF. Colors indicate cell type or state. (B) Data as in (A), with clusters representing central nervous system (CNS) cell types highlighted. Percentage of total cells that contribute to the specified CNS cell type is indicated. (C) Dot plot showing marker genes for each cluster. Circle size indicates percentage of cells expressing the gene in the given cluster and color indicates the mean gene expression value. Horizontal lines distinguish between retinal, CNS, epithelial, and CNC cell types. (D) Cells spontaneously differentiated from each candidate TF are highlighted. Colors indicate biological replicates, S1 and S2. (E) Heatmap showing the percentage of cells from each biological replicate that were grouped into each cluster. (F) Distribution of general cell types produced by each biological replicate. Pro, uncommitted progenitors; RP, retinal progenitors; RPE, retinal pigment epithelium; PR, photoreceptors; RGC, retinal ganglion cells; DNP, dorsal neural progenitors; RG, radial glia; Astro, astrocytes; CN, CNS neurons; EPD, ependyma; EP, epithelial progenitors; BE, bronchial epithelium; CE, cranial epithelium; CNC, cranial neural crest; CNCP, cranial neural crest progenitors; (P), proliferative cells.

FIG. 33—Combining RFX4 with dual SMAD inhibition produces homogeneous NPs that generate predominantly GABAergic neurons. (A) UMAP clustering of scRNA-seq data from iNPs derived using different iNP differentiation methods. RFX4-DS-iNPs were produced by combining RFX4 overexpression with dual SMAD inhibition, EB-iNPs were produced using the embryoid body protocol (Schafer et al., 2019), and DS-iNPs were produced using the dual SMAD inhibition protocol (Shi et al., 2012a). Data represents n=2 batch replicates per method with 15,211 RFX4-DS-iNPs, 11,148 EB-iNPs, and 16,421 DS-iNPs. Colors indicate cell type or state. (B) Dot plot showing marker genes for each cluster. Circle size indicates percentage of cells expressing the gene in the given cluster and color indicates the mean expression value. (C) Box plots showing distributions of Euclidean distances between cells within the same batch replicate. Whiskers indicate the 5th and 95th percentiles. (D) Same as (C), for cells between different batch replicates. (E) Data as in (A), highlighting cells derived from each differentiation method. Colors indicate batch replicates, S1 and S2. (F) Heatmap showing the percentage of cells from each batch replicate that were grouped into each cluster. (G) Data as in (A), colored by marker gene expression. (H) UMAP clustering of scRNA-seq data from 26,111 cells that have been spontaneously differentiated from iNPs. iNPs were produced by combining RFX4 overexpression with dual SMAD inhibition and spontaneously differentiated for 4 or 8 weeks. Data represents n=2 biological replicates per timepoint. Colors indicate cell type or state. (I) Dot plot showing marker genes for each cluster. Circle size indicates percentage of cells expressing the gene in the given cluster and color indicates the mean expression value. (J) Data as in (H), colored by marker gene expression. (K) Cells from each time point are highlighted. Colors indicate biological replicates, S1 and S2. (L) Heatmap showing the percentage of cells from each biological replicate that were grouped into each cluster. (M) Distribution of general cell types produced by each biological replicate. NP, neural progenitors; CN, CNS neurons; CNC, cranial neural crest; RG, radial glia; MNG, meninges; P, proliferative cells.

FIG. 34—Modeling neurodevelopmental disorders using RFX4-iNPs with DYRK1A perturbation. (A) Schematic of disease modeling by perturbing DYRK1A expression. Human induced pluripotent stem cells (iPSCs) are transduced with Cas9 and sgRNAs or ORF to knock out or overexpress DYRK1A, respectively. RFX4 is then transiently overexpressed for 1 week to differentiate iPSCs into iNPs, which then spontaneously differentiate for 8 weeks following withdrawal of dox and growth factors. Effects of DYRK1A perturbation were characterized using bulk RNA sequencing, EdU labeling, immunostaining, or electrophysiology. rtTA, reverse tetracycline-controlled transactivator; dox, doxycycline; EGF, epidermal growth factor; FGF, fibroblast growth factor. (B-D) Volcano plots showing the number of genes that were significantly differentially expressed (t-test q-value<0.05 with FDR correction) and had an absolute log2 fold change relative to control greater than 1 for DYRK1A KO sgRNA 1 (B), KO sgRNA 2 (C), and ORF (D) conditions. For a full list of genes, see Table S3. The KO sgRNAs 1 and 2 conditions were compared to both NT sgRNAs. The ORF condition was compared to GFP control. (E) Venn diagram summarizing the significantly differentially expressed genes in (B-D). (F) Heatmap of genes that were significantly differentially expressed (t-test q-value<0.05 with FDR correction) depending on the dosage of DYRK1A. Genes are annotated with broad categories of gene function relevant to neural development. Average gene expression measurements across n=3 biological replicates are shown. (G-H) Percentage of EdU-labeled cells at 0, 2, or 4 weeks of spontaneous differentiation for DYRK1A knockout (G) or overexpression (H). Values represent mean±SEM from n=3 biological replicates. 10,000 cells were analyzed per biological replicate. (I-J) Intensity of MAP2 staining for neurons at 0, 1, 2, 4, or 8 weeks of spontaneous differentiation for DYRK1A knockout (I) or overexpression (J). Values represent mean±SEM from n=2 biological replicates with 6 images per biological replicate. KO, knockout; NT, non-targeting. ****P<0.0001; ***P<0.001; **P<0.01; *P<0.05. ns, not significant.

FIG. 35—Comparison of TF overexpression methods for neuronal differentiation. (A) Schematic of ORF and CRISPR-Cas9 activator comparison. hESCs are transduced with ORF, ORF with UTRs, or SAM CRISPR-Cas9 activator to overexpress NEUROD1 or NEUROG2 for directed differentiation into induced neurons. (B) Expression of NEUROD1 mRNA and protein after NEUROD1 overexpression from n=4 biological replicates. (C) Expression of marker genes for neurons (MAP2) and NPs (PAX6) after NEUROD1 overexpression. (D) Expression of NEUROG2 mRNA after NEUROG2 overexpression from n=4 biological replicates. (E) Expression of marker genes for neurons (MAP2) and NPs (PAX6) after NEUROG2 overexpression. (F) Intensity of MAP2 staining from n=6 images per condition. All values are mean±SEM. Scale bar, 100 μm. ****P<0.0001; ***P<0.001. ns=not significant. UTR, untranslated region; NT, nontargeting.

FIG. 36—A multiplexed TF ORF screening platform for iNP differentiation. (A) Timeline for screening. mTeSR stem cell media was incrementally changed to NP media during differentiation, and cells were harvested after 7 days of differentiation. (B) FACS histograms showing distribution of EGFP expression in SLC1A3 and VIM reporter cell lines with or without the TF library. High and low bins sorted for sequencing of TF barcodes are indicated. (C) Scatterplot showing enrichment of alternative isoforms of candidate TFs identified using SLC1A3 or VIM reporter cell lines from n=3 infection replicates. (D) Representative FACS plot showing expression of RPL13A control or SLC1A3 and VIM mRNA labeled by FISH probes from n=3 infection replicates. High and low bins sorted for sequencing of TF barcodes are indicated. (E) Same as (D), showing expression of mRNAs from 10 marker genes labeled by FISH probes. (F) Scatterplot showing enrichment of alternative isoforms of candidate TFs identified by flow-FISH with pooled FISH probes targeting 2 or 10 NP marker genes from n=3 infection replicates. (G) Comparison of candidate TF enrichment in screens using reporter cell lines and flow-FISH. (H) Number of cells analyzed using single-cell RNA sequencing (RNA-seq) that were assigned to each TF isoform out of 53,560 cells. (I) Uniform manifold approximation and projection (UMAP) clustering of single-cell RNA-seq data from hESCs transduced with the TF library. Cells expressing TFs of interest are highlighted. (J) Z-score of median Euclidean distances between cells expressing a TF and the rest of the cells. Distances were calculated using 939 highly variable genes. (K) Heatmap showing relative marker gene expression of cell types from the mouse organogenesis cell atlas (Cao Nature 2019) in cells overexpressing each TF isoform. The top 30 marker genes for each cell type were used to determine marker gene enrichment as z-scores. Candidate TFs selected using single-cell RNA-seq are indicated in blue.
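Panel (J) summarizes how far cells expressing each TF sit from all other cells. A minimal sketch, assuming expression vectors already restricted to the highly variable genes (the selection of the 939 genes is not shown), the median taken over all pairwise distances between cells with the TF and all remaining cells, and population z-scoring across TFs. The TF labels and coordinates are hypothetical toy data.

```python
import math
import statistics

def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_zscores(cells):
    """cells: list of (tf_label, expression vector). Returns TF -> z-score of its median distance to other cells."""
    medians = {}
    for tf in sorted({label for label, _ in cells}):
        inside = [v for label, v in cells if label == tf]
        outside = [v for label, v in cells if label != tf]
        medians[tf] = statistics.median([euclidean(a, b) for a in inside for b in outside])
    values = list(medians.values())
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return {tf: (m - mu) / sd for tf, m in medians.items()}

toy_cells = [
    ("GFP", [0.0, 0.0]), ("GFP", [0.1, 0.0]),
    ("TF_B", [0.0, 0.1]), ("TF_B", [0.1, 0.1]),
    ("TF_A", [5.0, 5.0]), ("TF_A", [5.1, 5.0]),
]
z = distance_zscores(toy_cells)
```

A TF whose cells move far from the undifferentiated population, like the hypothetical TF_A here, receives the highest z-score.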

FIG. 37—Validation of candidate TFs identified by pooled screens for iNP differentiation. (A) Schematic for arrayed screening. TF ORFs were individually synthesized, cloned, and packaged into lentivirus for delivery into hESCs. After 7 days of differentiation, expression of NP marker genes SLC1A3 and VIM was measured to identify candidate TFs. (B-C) Expression of VIM and SLC1A3 mRNA relative to control hESCs overexpressing GFP in NP media from n=3 infection replicates. Candidate TFs (B) and alternative isoforms of candidate TFs (C) are indicated. (D) Western blot showing expression of candidate TFs measured using the V5 epitope tag after 7 days of differentiation. (E) Top, expression of NP marker genes PAX6 and NES in iNPs produced by candidate TFs after 7 days of overexpression. Cell culture media used for each ORF is indicated in parentheses. Scale bar, 50 μm. Middle and bottom, heatmaps of bulk RNA sequencing (RNA-seq) signature correlation between iNPs and human fetal brain cell types from the Nowakowski 2017 dataset (middle) or human brain organoids from the Quadrato 2017 dataset (bottom). D7 and D12 indicate whether the ORF was overexpressed for 7 or 12 days, respectively. RG, radial glia; div, dividing; oRG, outer radial glia; tRG, truncated radial glia; vRG, ventricular radial glia; MGE, medial ganglionic eminence; IPC, intermediate progenitor cell; nEN, newborn excitatory neurons; EN, excitatory neurons; PFC, prefrontal cortex; V1, primary visual cortex; nIN, newborn interneurons; IN, interneurons; CTX, cortex; CGE, caudal ganglionic eminence; STR, striatum; OPC, oligodendrocyte precursor cells; Glyc, cells expressing glycolysis genes; Pro, proliferating progenitors; NE, neuroepithelium; DN, dopaminergic neurons; CLN, callosal neurons; CFN, corticofugal neurons; Meso, mesodermal progenitors.

FIG. 38—Characterization of iNPs and spontaneously differentiated cells produced by candidate TFs in different stem cell lines. (A) Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (NG2) in cells spontaneously differentiated for 1, 2, 4, or 8 weeks from HUES66 iNPs produced by 4 candidate TFs. (B-C) Expression of NP marker genes in iPSC11a iNPs (B) or H1 iNPs (C) after 1 week of TF overexpression. (D-E) Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (NG2 and PDGFRA) in cells spontaneously differentiated from iPSC11a iNPs (D) or H1 iNPs (E) for 8 weeks. Scale bar, 100 μm.

FIG. 39—Profiling spontaneously differentiated neurons from iNPs by single-cell RNA sequencing and target genes of candidate TFs by ChIP-seq. (A-E) UMAP clustering of single-cell RNA-seq data from 4,162 neurons that have been spontaneously differentiated from iNPs for 8 weeks. iNPs were derived using RFX4, NFIB, ASCL1, or PAX6 with n=2 biological replicates per TF. (A-D) Marker genes for general regions of the central nervous system (A), newborn cortical excitatory neurons (B), neuronal subtypes (C), and cortical projection neurons (D) are shown. Colors indicate gene expression. (E) Neurons spontaneously differentiated from each candidate TF are highlighted. Colors indicate biological replicates, S1 and S2. (F) Top 3 de novo or known motifs identified using HOMER motif analysis. The names of the TFs with the closest matching motifs, indicating potential cofactors of candidate TFs, and the associated P-values of enrichment are listed. (G) Heatmap showing percentage of NP-specific TFs or genes that had candidate TF ChIP peaks within 10 kb of the annotated transcriptional start site (TSS). (H-I) Overlap of NP-specific genes that had candidate TF ChIP peaks within 10 kb of the TSS and were differentially expressed (t-test q-value<0.05 with FDR correction) upon candidate TF overexpression. Genes that were shared between candidate TFs are shown in (H), with blue regions indicating overlap, and genes unique to each candidate TF are shown in (I).

FIG. 40—Characterization of iNPs produced by combining RFX4 with dual SMAD inhibition. (A) Schematic for different media conditions (M1-M8) tested. SMAD inhibitors dorsomorphin (DM) and SB-431542 (SB) were added to the media at the indicated concentrations. mTeSR stem cell media was changed to different NP media (NP, EB, and DS; see Methods) over 7 days of differentiation. (B) Heatmaps showing expression of neuron marker genes TUJ1 and MAP2 relative to GAPDH control in cells from iNPs that have undergone spontaneous neurogenesis for 2 or 4 weeks. iNPs were differentiated for 5 or 7 days using each of the media conditions in (A) and seeded at low or high densities prior to spontaneous neurogenesis. Colors represent mean expression from n=4 biological replicates. (C) Same as (A), for additional media conditions tested. (D) Same as (B), for the media conditions shown in (C). (E) UMAP clustering of scRNA-seq data from iNPs derived using different iNP differentiation methods. Marker genes for the telencephalon are shown. Data represents n=2 batch replicates per method with 15,211 RFX4-DS-iNPs, 11,148 EB-iNPs, and 16,421 DS-iNPs. Colors indicate gene expression. (F) Expression of NP marker genes NES and FOXG1 in iNPs produced by different NP differentiation methods. RFX4-DS-iNPs were produced by combining RFX4 overexpression with dual SMAD inhibition, EB-iNPs were produced using the embryoid body protocol (Schafer et al., 2019), and DS-iNPs were produced using the dual SMAD inhibition protocol (Shi et al., 2012a). Scale bar, 50 μm. (G-J) UMAP clustering of single-cell RNA-seq data from 26,111 cells that have been spontaneously differentiated from iNPs. iNPs were produced by combining RFX4 overexpression with dual SMAD inhibition and spontaneously differentiated for 4 or 8 weeks. Data represents n=2 biological replicates per timepoint. Marker genes for general regions of the central nervous system (G), radial glia subtypes (H), neuronal subtypes (I), and GABAergic interneuron subtypes (J) are shown. Colors indicate gene expression.

FIG. 41—Perturbations of DYRK1A in RFX4-iNPs for modeling neurological disorders. (A) Percent indels in RFX4-iNPs transduced with DYRK1A KO sgRNAs. Values represent mean±SEM from n=3 biological replicates. (B) DYRK1A mRNA expression measured using qPCR probes targeting the endogenous sequence or the codon-optimized ORF sequence. Values represent mean±SEM from n=4 biological replicates with 4 technical replicates per biological replicate. ND, not detected. (C-D) Western blot of DYRK1A at 7 days after transduction with Cas9 and DYRK1A KO sgRNAs (C) or DYRK1A ORF (D). (E) Representative images of MAP2 staining during spontaneous differentiation for NT sgRNA 1 and DYRK1A KO sgRNA 2. Scale bar, 100 μm. (F) Representative electrophysiology traces for neurons with or without evoked action potentials (AP) and spontaneous excitatory postsynaptic currents (EPSCs). (G) Proportion of neurons with or without AP and EPSCs for different DYRK1A perturbations from n=31-45 neurons. (H-I) Intrinsic membrane (H) and action potential (I) properties measured using electrophysiology for different DYRK1A perturbations from n=12-36 neurons with evoked action potentials. Mean±SEM indicated on graph. *P<0.05.

FIG. 42—Building a TF Atlas of directed differentiation. (A) Schematic of TF Atlas setup. All 3,550 barcoded TF ORFs from the MORF library were packaged into lentivirus for delivery into human embryonic stem cells (hESCs) at a low multiplicity of infection (MOI). After 7 days of TF ORF overexpression, cells were profiled using single-cell RNA sequencing (scRNA-seq) to map TF ORFs to expression changes. (B-D) Uniform manifold approximation and projection (UMAP) of scRNA-seq data from 671,453 cells overexpressing 3,266 TF isoforms. Colors indicate Louvain clusters (B), gene expression (C), and diffusion pseudotime (D). (E) Smoothed heat map of the top 1,000 upregulated and downregulated genes over diffusion pseudotime. Gene expression in each row is represented as z-scores. Genes are ordered based on the slope of expression change over pseudotime fitted using linear regression. (F-G) Most enriched pathways among the top 100 upregulated (F) and downregulated (G) genes. (H) Heat map showing significance of the difference between assigned pseudotimes of cells expressing each TF isoform and those expressing controls. TF isoforms are grouped by gene. Only the 320 TF genes with multiple isoforms, at least one of which induces a pseudotime significantly different from control, are included.
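Panel (E) orders genes by the slope of their expression over diffusion pseudotime, fitted by linear regression. A minimal sketch, assuming ordinary least squares of per-cell expression on pseudotime for each gene. The smoothing, row z-scoring, and top-1,000 selection used for display are not shown, and the gene names and values are toy data.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def order_genes_by_slope(pseudotime, expression):
    """expression maps gene -> per-cell values aligned with `pseudotime`; returns genes from steepest rise to steepest fall."""
    slopes = {gene: ols_slope(pseudotime, values) for gene, values in expression.items()}
    return sorted(slopes, key=slopes.get, reverse=True), slopes

pt = [0.0, 1.0, 2.0, 3.0]
order, slopes = order_genes_by_slope(pt, {"down": [3, 2, 1, 0], "flat": [1, 1, 1, 1], "up": [0, 1, 2, 3]})
```

Sorting by the signed slope places genes that rise along the trajectory at the top and genes that fall at the bottom, which matches the layout described for the heat map.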

FIG. 43—Unbiased grouping of TFs based on gene programs. (A) Heat maps showing pairwise Pearson correlation (top) and enrichment of 100 gene programs (bottom) identified using non-negative matrix factorization (NMF) on mean expression profiles of 3,266 TF ORFs. TFs are ordered by hierarchical clustering. Each TF ORF is annotated by TF family and average diffusion pseudotime relative to control. Some TF groups are labeled and annotated based on known relationships. Numbers in parentheses indicate the number of TF isoforms that were found in the same group. (B-C) Zoomed-in subsets of (A) with top enriched pathway annotated for each gene program. (D) UMAP of scRNA-seq data highlighting enrichment of each gene program.
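Panel (A) combines pairwise Pearson correlation between TF ORF mean profiles with gene programs from NMF. The caption does not specify the NMF algorithm, so the sketch below assumes Lee-Seung multiplicative updates for the Frobenius-norm objective, applied to a nonnegative matrix with TF ORFs as rows and genes as columns. Rows of W then give each TF's loading on each program. The hierarchical clustering used to order TFs is not shown, and the input matrix is random toy data.

```python
import numpy as np

def nmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """Factor nonnegative X (TF ORFs x genes) as W @ H with W, H >= 0.

    Lee-Seung multiplicative updates for the Frobenius-norm objective.
    W[i, j] is the loading of TF ORF i on gene program j; H[j, :] holds program j's gene weights.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy stand-in for mean expression profiles: 6 hypothetical TF ORFs x 8 genes.
rng = np.random.default_rng(1)
mean_profiles = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(mean_profiles, k=2, n_iter=2000)
tf_corr = np.corrcoef(mean_profiles)  # pairwise Pearson correlation between TF ORFs (rows)
```

For production use, scikit-learn's `sklearn.decomposition.NMF` provides tested solvers and initializations. The loop above only illustrates the factorization.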

FIG. 44—Mapping TF ORFs in differentiated cells to reference cell types. (A-B) UMAP of scRNA-seq data from 28,825 differentiated cells. Cells from clusters 6-8 of the TF Atlas shown in FIG. 42B were reclustered for further characterization. Colors indicate Louvain clusters (A) and nominated cell type from the human fetal cell atlas (Cao Science 2020) (B). Cell type matches with score >0.3 are highlighted. (C-D) Heat maps showing percentage of cells with the indicated TF ORF that were assigned to each cluster (C) or nominated cell type (D). Numbers after TF gene names indicate the isoform. Percentages are determined by normalizing to the total number of cells overexpressing the indicated TF in the entire TF Atlas. Only the 5 most enriched TF ORFs that are greater than 5% are shown. EMT, epithelial-mesenchymal transition; ENS, enteric nervous system.
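Panels (C)-(D) normalize each percentage to all cells carrying that TF ORF in the entire TF Atlas, not just the reclustered subset. Only the 5 most enriched TF ORFs above 5% are then kept for each cluster. A minimal sketch, assuming one (TF ORF, cluster) pair per atlas cell, with cluster `None` for cells outside the reclustered subset. The TF labels and counts are hypothetical.

```python
from collections import Counter

def top_tfs_per_cluster(assignments, n_top=5, min_pct=5.0):
    """assignments: one (tf_orf, cluster) pair per cell in the full atlas; cluster is None outside the subset."""
    totals = Counter(tf for tf, _ in assignments)  # denominator: all atlas cells with each TF ORF
    table = {}
    for (tf, cluster), n in Counter(assignments).items():
        if cluster is None:  # counted in the denominator only
            continue
        pct = 100.0 * n / totals[tf]
        if pct > min_pct:
            table.setdefault(cluster, []).append((tf, pct))
    return {c: sorted(rows, key=lambda r: r[1], reverse=True)[:n_top] for c, rows in table.items()}

toy = ([("TF_A", 0)] * 8 + [("TF_A", None)] * 2
       + [("TF_B", 0)] * 1 + [("TF_B", None)] * 99
       + [("TF_C", 1)] * 3 + [("TF_C", None)] * 1)
summary = top_tfs_per_cluster(toy)
```

Using the atlas-wide denominator keeps a TF ORF with many cells elsewhere, like the hypothetical TF_B, from appearing enriched on the strength of a single cell.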

FIG. 45—Validation of candidate TFs for differentiation towards nominated cell types. (A) Expression of marker genes for each nominated cell type in H1 hESCs after 7 days of candidate TF or GFP overexpression. Numbers after TF gene names indicate the isoform. n=4. (B-C) Scatterplot comparing expression of 205 marker genes in H1 hESCs to H9 hESCs (B) or 11a iPSCs (C). Expression is measured as average fold change in cells overexpressing candidate TF relative to GFP. (D-K) Left, expression of marker genes in H1 hESCs after 7 days of candidate TF overexpression. Right, intensity of marker gene staining from n=6 images per condition. Mean intensity per cell is normalized to cells overexpressing the GFP control. Scale bar, 25 μm. Marker genes for neuron (D), EMT smooth muscle (E), endothelial (F), smooth muscle (G), metanephric (H), intestinal epithelial (I), lung ciliated epithelial (J), and trophoblast (K) cells are shown. EMT, epithelial-mesenchymal transition. Values represent mean±SEM. ****P<0.0001; ***P<0.001; **P<0.01; *P<0.05.

FIG. 46—Targeted TF overexpression screening platform for directed differentiation. (A) Schematic of targeted TF screening. A subset of TFs is pooled from the MORF library and packaged into lentivirus for delivery into hESCs. TFs that can differentiate hESCs into the cell type of interest are identified using reporter cell line, flow-FISH, or scRNA-seq, followed by deep sequencing of TF barcodes. MOI, multiplicity of infection. (B) Comparison of TFs that ranked in the top 10% from the 4 different screens for induced neural progenitor (iNP) differentiation. (C) Expression of markers for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (PDGFRA) after 1, 2, 4, or 8 weeks of spontaneous differentiation from RFX4-iNPs. Scale bar, 100 μm. (D-F) scRNA-seq data from 26,111 cells that have been spontaneously differentiated from iNPs for 4 or 8 weeks. The iNPs were RFX4-DS-iNPs, produced by combining RFX4 overexpression with dual SMAD inhibition. Data represents n=2 biological replicates per timepoint. NP, neural progenitors; CN, CNS neurons; CNC, cranial neural crest; RG, radial glia; MNG, meninges; (P), proliferative cells. (D) UMAP clustering results with colors indicating Louvain clusters. (E) Dot plot showing marker genes for each cluster. Circle size indicates percentage of cells expressing the gene in the given cluster and color indicates the mean expression value. (F) Distribution of general cell types produced by each biological replicate. (G-J) Disease modeling by knocking out or overexpressing DYRK1A in human induced pluripotent stem cells (iPSCs) and differentiating into neural progenitors using RFX4. (G-H) Percentage of EdU-labeled cells at 0, 2, or 4 weeks of spontaneous differentiation for DYRK1A knockout (G) or overexpression (H). n=3 biological replicates. (I-J) Intensity of MAP2 staining for neurons at 0, 1, 2, 4, or 8 weeks of spontaneous differentiation for DYRK1A knockout (I) or overexpression (J). n=12 images. Values represent mean±SEM. KO, knockout; NT, non-targeting; sg, single guide RNA. ****P<0.0001; ***P<0.001; **P<0.01; *P<0.05; ns, not significant.

FIG. 47—Regulatory networks by joint profiling of chromatin accessibility and gene expression under TF overexpression. (A) Weighted nearest neighbor (WNN) UMAP of joint chromatin accessibility and gene expression measured by scATAC- and scRNA-seq, respectively, from 69,085 cells overexpressing 198 TF isoforms for 4 or 7 days. Colors indicate clusters identified by the smart local moving (SLM) algorithm. (B) Dot plot showing marker genes for each cluster. Color indicates expression and circle size indicates chromatin accessibility. Values represent average fold change relative to other clusters. (C-E) Example marker gene chromatin accessibility (left) and expression (right) for different clusters compared to the undifferentiated cluster 0. Genes that show strong (C), weak (D), and no (E) correlation between ATAC and RNA profiles are included. (F) Heat maps showing the top TF ORFs (left) and nominated regulators (right) for each cluster. Left, percentage of cells with the indicated TF ORF is shown. Numbers after TF gene names indicate the isoform. Percentages are determined by normalizing to the total number of cells with the TF ORF in the joint scATAC- and scRNA-seq dataset. Only the 6 most enriched TF ORFs with percentages greater than 5% are shown. Right, average AUC (area under the ROC curve) of TF motif enrichment and RNA expression is shown. TFs with significantly enriched (FDR<0.05) motif and expression in each cluster are included. TFs that were identified as top ORFs and regulators are labeled in blue.
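The AUC values in FIG. 47F summarize how well a per-cell feature (a TF motif score or the TF's RNA expression) separates cells of one cluster from all other cells. The document does not state how the motif scores were computed or which software produced the AUC, so the following is only a minimal sketch, assuming one numeric value per cell and a boolean cluster-membership mask. It uses the standard Mann-Whitney identity AUC = U / (n_pos × n_neg), with tied values receiving average ranks.

```python
import numpy as np

def cluster_auc(values, in_cluster):
    """ROC AUC for separating cluster cells from all other cells by one feature.

    Illustrative sketch: AUC = U / (n_pos * n_neg), where U is the
    Mann-Whitney statistic computed from average ranks (ties share a rank).
    """
    values = np.asarray(values, dtype=float)
    in_cluster = np.asarray(in_cluster, dtype=bool)
    n_pos = int(in_cluster.sum())
    n_neg = in_cluster.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need cells both inside and outside the cluster")

    # Assign 1-based average ranks, handling runs of tied values.
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size, dtype=float)
    i = 0
    while i < values.size:
        j = i
        while j + 1 < values.size and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1

    # U statistic for the in-cluster cells, then normalize to [0, 1].
    u = ranks[in_cluster].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

An AUC of 1.0 means every in-cluster cell scores higher than every other cell, and 0.5 means no separation. One plausible reading of "average AUC of TF motif enrichment and RNA expression" is the mean of the motif AUC and the expression AUC for the same TF, but that is an interpretation rather than something the document specifies.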

FIG. 48—Combinatorial TF screening and prediction. (A) UMAP of scRNA-seq profiles from the combinatorial TF screen in hESCs. Each circle represents the mean expression profile of cells overexpressing the indicated TF ORF(s). The screen included 10 TF ORFs in combinations, including 44 doubles and 3 triples, as well as 10 singles. Example single TF profiles with associated grouping of TF combinations (CDX1, FLI1, and KLF4) are indicated with black borders. (B-C) Percent accuracy for different approaches to predict TFs for measured double (B) or triple (C) TF expression profiles. Single TF profiles were averaged or fitted with linear regression models against double or triple TF profiles. Combinations of single TF profiles were ranked by similarity to the measured combinatorial TF profile. The nominated combinations were compared to the known TF combinations of the measured combinatorial TF profiles to assess accuracy. Kernel ridge and random forest regression algorithms did not significantly outperform random selection for triplet prediction and were excluded. (D-I) Cell type prediction results for double TF profiles. Known combinations (D) or predicted combinations for hepatoblasts (E), bronchiolar and alveolar epithelial cells (F), metanephric cells (G), vascular endothelial cells (H), and trophoblast giant cells (I) are shown. TF combinations were ranked by the gene signature scores for each respective cell type. As gene signature scores were discrete, the percentile ranks were reported as ranges. For predicted combinations, TFs that are part of known combinations, developmentally critical, or specifically expressed in the target cell types are indicated in blue.
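The "averaged" prediction approach in FIG. 48B-C can be illustrated with a short sketch. It assumes that each single-TF profile is a mean expression vector over a shared gene set, that the predicted profile for a combination is the plain average of its single-TF profiles, and that similarity is Pearson correlation. The document says only that combinations "were ranked by similarity", so the similarity metric and the top-1 accuracy definition used here are assumptions, and the function names are hypothetical.

```python
import itertools
import numpy as np

def rank_combinations(single_profiles, measured, k=2):
    """Rank k-TF combinations by Pearson correlation between the average of
    their single-TF profiles and a measured combinatorial profile."""
    scored = []
    for combo in itertools.combinations(sorted(single_profiles), k):
        predicted = np.mean([single_profiles[tf] for tf in combo], axis=0)
        r = np.corrcoef(predicted, measured)[0, 1]
        scored.append((r, combo))
    scored.sort(key=lambda item: item[0], reverse=True)  # best match first
    return [combo for _, combo in scored]

def top1_accuracy(single_profiles, measured_profiles, k=2):
    """Fraction of measured combinations whose true TF set is ranked first."""
    hits = 0
    for true_combo, profile in measured_profiles.items():
        best = rank_combinations(single_profiles, profile, k)[0]
        hits += set(best) == set(true_combo)
    return hits / len(measured_profiles)
```

Setting `k=3` gives the triple-TF case. The number of candidate combinations grows combinatorially with the number of single TFs, which is consistent with the clustering of TFs into groups described for FIG. 76 to reduce the search space.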

FIG. 49—Comparison of TF overexpression methods for neuronal differentiation. (A) Schematic of ORF and CRISPR activator (CRISPRa) comparison. hESCs are transduced with ORF, ORF with UTRs, or SAM CRISPRa to upregulate NEUROD1 or NEUROG2 for directed differentiation into induced neurons. (B) Expression of NEUROD1 mRNA and protein after NEUROD1 upregulation. n=4. (C) Expression of marker genes for neurons (MAP2) and neural progenitors (PAX6) after NEUROD1 upregulation. (D) Expression of NEUROG2 mRNA after NEUROG2 upregulation. n=4. (E) Expression of marker genes for neurons (MAP2) and NPs (PAX6) after NEUROG2 upregulation. (F) Intensity of MAP2 staining normalized to nuclei count. n=6. All values are mean±SEM. Scale bar, 100 μm. ****P<0.0001; ***P<0.001; ns, not significant. UTR, untranslated region; NT, nontargeting; sg, single guide RNA.

FIG. 50—Bulk TF screening in different cell culture media. (A) Design of barcoded TF ORF lentiviral vectors. WPRE, Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element. (B) Schematic of bulk TF screening. All 3,550 barcoded TF ORFs from the MORF library were packaged into lentivirus for delivery into hESCs at a low multiplicity of infection (MOI). After 7 days of TF ORF overexpression in 7 different cell culture media, cells were stained for stem cell markers (TRA-1-60 and SSEA4) and sorted to enrich for stem and differentiated cells. Deep sequencing of TF barcodes profiled changes in TF distribution. (C) Scatterplots comparing the TF barcode distribution for the initial plasmid and lentiviral libraries to the unsorted cells cultured in 7 different media (M1-M7, see methods) after 7 days of TF ORF overexpression. BR1 and BR2 indicate the two biological replicates. Skew represents the ratio between the 90th and 10th percentile barcode counts. (D) Heat map showing the fold change in TF barcodes in each media condition relative to the initial lentivirus library. The top 10 most enriched and depleted TF barcodes are labeled. Numbers after the TF gene name indicate the isoform. (E) Heat map showing pairwise Pearson correlation between each of the conditions in (D). Conditions are ordered by hierarchical clustering. (F-G) Scatterplots showing the relationship between TF barcode counts and ORF length for the lentivirus library (F) and the average unsorted cells after 7 days of overexpression (G).
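The skew metric defined for FIG. 50C is a simple uniformity measure for a barcoded library: a perfectly even library has a skew of 1, and larger values indicate that a minority of barcodes dominate the counts. A minimal sketch follows, assuming a vector of raw read counts with one entry per TF barcode. The document does not specify a percentile interpolation rule, so NumPy's default (linear) interpolation is an assumption.

```python
import numpy as np

def barcode_skew(counts):
    """Ratio of the 90th- to the 10th-percentile barcode count.

    Uses NumPy's default linear percentile interpolation (an assumption).
    Returns infinity when the 10th percentile is zero (many dropouts).
    """
    counts = np.asarray(counts, dtype=float)
    p10, p90 = np.percentile(counts, [10, 90])
    if p10 == 0:
        return float("inf")
    return p90 / p10
```

For example, `barcode_skew(np.arange(1, 102))` is 91/11, or about 8.3, while a library with identical counts for every barcode has a skew of exactly 1.0.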

FIG. 51—Bulk TF screening to evaluate effects of media on TF-induced differentiation outcome. (A) Scatterplots showing the fold change in TF barcodes in the sorted differentiated cells relative to stem cells for each media condition (M1-M7, see methods). BR1 and BR2 indicate the two biological replicates. TFs with known roles in development or differentiation are labeled. (B) Heat map summarizing the fold changes in (A) for each TF isoform. The top 50 most enriched TFs are labeled. Numbers after the TF gene name indicate the isoform. (C) Data as in (B), highlighting the TFs with known roles in development or differentiation. (D) Heat map showing the pairwise Pearson correlation between each of the conditions in (B). The top 5% of TFs with the highest average fold change were evaluated. Conditions are ordered by hierarchical clustering. (E) Box plots showing fold enrichment of 67 developmentally critical TFs (Parekh Cell Systems 2018 and this study) for each media condition. Whiskers indicate the 10th and 90th percentiles.
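The correlation analysis in FIG. 51D restricts the comparison to the top 5% of TFs by average fold change before correlating conditions. A minimal sketch is shown below, assuming a matrix of per-TF fold changes with one column per media condition or replicate. The exact fold-change transform (for example, log2) is not specified in the document, and the function name is hypothetical.

```python
import numpy as np

def condition_correlation(fold_changes, top_fraction=0.05):
    """Pearson correlation between conditions using only the top TFs.

    fold_changes: array of shape (n_tfs, n_conditions), e.g. the fold change
    of each TF barcode in differentiated vs. stem cells for each medium.
    Returns (condition-by-condition correlation matrix, indices of top TFs).
    """
    fc = np.asarray(fold_changes, dtype=float)
    # Keep at least two TFs so that a correlation is defined.
    n_top = max(2, int(round(top_fraction * fc.shape[0])))
    top = np.argsort(fc.mean(axis=1))[::-1][:n_top]
    # np.corrcoef treats rows as variables, so transpose to (conditions, TFs).
    return np.corrcoef(fc[top].T), top
```

Ordering the conditions by hierarchical clustering, as in the figure, could then be done on the returned matrix, for instance with SciPy's `scipy.cluster.hierarchy.linkage` and `leaves_list`; the linkage method used in the study is not stated.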

FIG. 52—Data quality control for the TF Atlas. (A) Violin plots showing distribution of genes, unique molecular identifiers (UMIs), and percent mitochondrial counts per cell in the TF Atlas. (B) Comparison of TF ORF distributions between the bulk TF screen and the TF Atlas scRNA-seq. For each TF ORF, barcode counts per million (CPM) from the bulk screen are compared to the number of cells per TF in the TF Atlas. (C) Distribution of cells overexpressing each TF isoform. Cells were subsampled or filtered by TF ORF such that each TF had between 3 and 1,000 cells in the TF Atlas. (D) Scatterplot showing the relationship between average expression of the TF ORF per cell and the TF ORF length. (E) Density scatterplot showing, for each cell, expression of the TF ORF and the corresponding endogenous TF. TF ORF expression is measured using barcode counts and endogenous TF expression is measured using scRNA-seq counts. (F) UMAP of TF Atlas scRNA-seq data highlighting cells with indicated ORF. Numbers after TF gene names indicate the isoform. (G) Heat maps showing percentage of cells with the indicated TF ORF that were assigned to each cluster.
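Two processing steps described for FIG. 52B-C, normalizing bulk barcode counts to counts per million (CPM) and capping each TF at between 3 and 1,000 cells, can be sketched as follows. This is an illustration under stated assumptions: cells are represented by a per-cell TF label array, TFs with fewer than 3 cells are dropped, and TFs with more than 1,000 cells are randomly subsampled without replacement. The document does not state the random seed or the sampling scheme, and the function names are hypothetical.

```python
import numpy as np

def counts_per_million(counts):
    """Scale raw barcode counts so that they sum to one million."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() * 1e6

def cap_cells_per_tf(cell_tf, min_cells=3, max_cells=1000, seed=0):
    """Return sorted indices of cells kept after per-TF filtering/subsampling."""
    rng = np.random.default_rng(seed)
    cell_tf = np.asarray(cell_tf)
    keep = []
    for tf in np.unique(cell_tf):
        idx = np.flatnonzero(cell_tf == tf)
        if idx.size < min_cells:
            continue  # too few cells for a stable profile; drop this TF
        if idx.size > max_cells:
            idx = rng.choice(idx, size=max_cells, replace=False)
        keep.extend(idx.tolist())
    return np.sort(np.array(keep, dtype=int))
```

The per-TF cell counts produced this way could then be compared against the bulk CPM values, as in FIG. 52B.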

FIG. 53—Pseudotime analysis for ordering cells in differentiation trajectories. (A-B) Force-directed graph (FDG) representation of TF Atlas scRNA-seq data. Colors indicate Louvain clusters (A) and diffusion pseudotime (B). (C) Stream plot of velocities shown on the UMAP of TF Atlas scRNA-seq data from 671,453 cells overexpressing 3,266 TF isoforms. Colors indicate Louvain clusters. (D) UMAP of TF Atlas scRNA-seq data. Colors indicate RNA velocity pseudotimes. (E) FDG representation of (C). (F) FDG representation of (D). (G) Density scatterplots comparing the diffusion pseudotimes to RNA velocity for each cell. (H-J) Density scatterplots showing the number of genes (H), UMIs (I), and TF barcode counts (J) over diffusion pseudotime for each cell. (K) Comparison of the average Euclidean distance and pseudotime for cells overexpressing TFs relative to those overexpressing controls.

FIG. 54—Differentially expressed genes across pseudotime. (A) Smoothed heat map of the top 1,000 upregulated and downregulated genes over RNA velocity pseudotime. Gene expression in each row is represented as z-scores. Genes are ordered based on the slope of expression change over pseudotime fitted using linear regression. (B) Gene expression along trajectories calculated with diffusion (left) or RNA velocity (right). (C) Scatterplot comparing the differentiation results of the scRNA-seq pseudotime analysis to the bulk TF screen. For the scRNA-seq screen, the average pseudotime of cells overexpressing TFs relative to those overexpressing GFP or mCherry controls is shown. For the bulk TF screen, the average fold change in the corresponding TF barcodes in the sorted differentiated cells relative to stem cells is shown. (D) Significance of the difference between assigned pseudotimes of cells expressing each TF isoform and those expressing controls. Subset of TF isoforms from FIG. 42H are included. Dashed line indicates the threshold above which FDR<0.05.
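The gene ordering in FIG. 54A (by the slope of a linear fit of expression against pseudotime, with each row shown as z-scores) can be sketched as below. The sketch assumes a dense genes-by-cells expression matrix and one pseudotime value per cell. The smoothing applied to the displayed heat map is not described in the document and is omitted here, and the function name is hypothetical.

```python
import numpy as np

def order_genes_by_pseudotime_slope(expr, pseudotime):
    """Order genes by their least-squares slope over pseudotime.

    expr: array (n_genes, n_cells); pseudotime: array (n_cells,).
    Returns (gene order, most increasing first; slopes; row z-scores with
    cells sorted by pseudotime).
    """
    expr = np.asarray(expr, dtype=float)
    t = np.asarray(pseudotime, dtype=float)
    tc = t - t.mean()
    centered = expr - expr.mean(axis=1, keepdims=True)
    # Ordinary least-squares slope for every gene at once: cov(t, y) / var(t).
    slopes = centered @ tc / (tc @ tc)
    order = np.argsort(slopes)[::-1]
    # Row-wise z-scores; constant genes get a standard deviation of 1 to avoid /0.
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    z = centered / sd
    return order, slopes, z[:, np.argsort(t)]
```

Genes at the start of the returned order increase most steeply along the trajectory and those at the end decrease most steeply, which matches the top-upregulated and top-downregulated split in the heat map.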

FIG. 55—Unbiased clustering of TFs based on Pearson correlation of gene expression. (A) Heat map showing pairwise Pearson correlation for mean expression profiles of 3,266 TF ORFs. TFs are ordered by hierarchical clustering. Each TF is annotated by TF family and average pseudotime relative to control. Some TF groups are labeled and annotated based on known relationships. (B-C) Zoomed-in subsets of (A).

FIG. 56—Differential gene expression analysis and cell type mapping for differentiated cells. (A) Smoothed heat map showing expression of marker genes for each cluster of differentiated cells from FIG. 44A. Cells are sorted by cluster followed by diffusion pseudotime. Gene expression in each column is represented as z-scores. (B) Heat map showing percentage of cells from each cluster that mapped to the indicated reference cell type. EMT, epithelial-mesenchymal transition; ENS, enteric nervous system. (C) Heat map showing enrichment of Gene Ontology (GO) biological process terms in differentially expressed genes for each cluster. CNS, central nervous system; diff., differentiation; reg., regulation; dev., development; migr., migration.

FIG. 57—Expression of marker genes across stem cell lines and in additional nominated cell types. (A) Heat map showing expression of marker genes in H1 hESCs (left), H9 hESCs (middle), or 11a iPSCs (right) after 7 days of candidate TF or GFP overexpression. Expression is shown as average fold change in cells overexpressing candidate TF relative to GFP. Numbers after TF gene names indicate the isoform. (B) Expression of marker genes for each nominated cell type in H1 hESCs after 7 days of candidate TF or GFP overexpression. n=4. Values represent mean±SEM. ****P<0.0001; ***P<0.001; **P<0.01; *P<0.05; ns, not significant.

FIG. 58—Validation of candidate TFs in other stem cell lines for differentiation towards nominated cell types. (A-B) Expression of marker genes for each nominated cell type in H9 hESCs (A) or 11a iPSCs (B) after 7 days of candidate TF or GFP overexpression. Numbers after TF gene names indicate the isoform. n=4. Values represent mean±SEM. ****P<0.0001; ***P<0.001; **P<0.01; *P<0.05; ns, not significant; ND, not detected.

FIG. 59—Immunostaining of marker genes to validate candidate TFs for inducing differentiation of nominated cell types. (A-C) Left, expression of marker genes in H1 hESCs after 7 days of candidate TF or GFP overexpression. Right, intensity of marker gene staining from n=6 images per condition. Numbers after TF gene names indicate the isoform. Mean intensity per cell is normalized to cells overexpressing the GFP control. Scale bar, 25 μm. Marker genes for stromal (A), intestinal epithelial (B), and lung ciliated epithelial (C) cells are shown. Values represent mean±SEM. **P<0.01; *P<0.05; ns, not significant. (D) Expression of marker genes in H1 hESCs after 7 days of GFP overexpression. Controls for data in FIG. 45D-K.

FIG. 60—A targeted TF ORF screening platform for iNP differentiation. (A) Timeline for screening. mTeSR stem cell media was incrementally changed to neural progenitor media during differentiation, and cells were harvested after 7 days of differentiation. (B) FACS histograms showing distribution of EGFP expression in SLC1A3 and VIM reporter cell lines with or without the TF library. High and low bins sorted for sequencing of TF barcodes are indicated. (C-D) Scatterplots showing enrichment of candidate TFs (C) and alternative isoforms (D) identified using SLC1A3 or VIM reporter cell lines. n=3 replicates per reporter cell line. (E-F) Representative FACS plots showing expression of 2 (E) or 10 (F) NP marker genes labeled by pooled FISH probes. High and low bins sorted for sequencing of TF barcodes are indicated. (G-H) Scatterplots showing enrichment of candidate TFs (G) and alternative isoforms (H) identified by flow-FISH with pooled FISH probes targeting 2 or 10 NP marker genes. n=3 replicates per flow-FISH screen. (I) Comparison of candidate TF enrichment in screens using reporter cell lines and flow-FISH.
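Enrichment of a TF in a sort-based screen such as FIG. 60 is typically scored by comparing its barcode abundance in the high-expression bin to the low-expression bin. The document does not give the exact formula, so the following is a minimal sketch under common assumptions: counts are depth-normalized to CPM within each bin, a pseudocount of 1 CPM avoids division by zero, and enrichment is reported as a log2 ratio. The function name and pseudocount are hypothetical choices.

```python
import numpy as np

def bin_enrichment(high_counts, low_counts, pseudocount=1.0):
    """log2 enrichment of each TF barcode in the high bin vs. the low bin.

    Counts are normalized to counts per million within each bin first, so
    differing sequencing depths between the two bins cancel out.
    """
    high = np.asarray(high_counts, dtype=float)
    low = np.asarray(low_counts, dtype=float)
    high_cpm = high / high.sum() * 1e6
    low_cpm = low / low.sum() * 1e6
    return np.log2((high_cpm + pseudocount) / (low_cpm + pseudocount))
```

With the n=3 replicates described for each reporter line or flow-FISH screen, one simple summary would be the mean across replicates, e.g. `np.mean([bin_enrichment(h, l) for h, l in replicate_pairs], axis=0)`, where `replicate_pairs` is a hypothetical list of (high, low) count arrays.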

FIG. 61—TF ORF screening with single-cell RNA-sequencing and in an arrayed format. (A-G) TF ORF screening using single-cell RNA sequencing (scRNA-seq) on 60,997 cells as readout. (A) Violin plots showing distribution of genes, unique molecular identifiers (UMIs), and percent mitochondrial counts per cell. (B) Distribution of cells overexpressing each TF isoform. (C) Comparison of TF ORF expression per cell measured by TF barcode counts and TF ORF length. Data represent mean±SEM. (D-E) Uniform manifold approximation and projection (UMAP) clustering of scRNA-seq data. Colors indicate Louvain clusters (D) or cells expressing TFs of interest (E). (F) Z-score of mean Euclidean distances between cells expressing a TF and the rest of the cells. (G) Heatmap indicating correlations between mean expression profiles of cells overexpressing each TF and human radial glia from published datasets (14, 22-25). Values represent z-scores of Pearson correlation. (H-I) Scatterplots showing enrichment of candidate TFs (H) and alternative isoforms (I) identified using arrayed screening format. TF ORFs were individually packaged into lentivirus for delivery into hESCs. Expression of marker genes SLC1A3 and VIM was measured to identify candidate TFs. n=3 screening replicates.
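The distance metric in FIG. 61F can be read as: for each TF, average the Euclidean distances between its cells and all remaining cells, then z-score those averages across TFs. A minimal sketch follows. It assumes the distances are computed in a low-dimensional embedding (for example, principal components), which the document does not specify, and the function name is hypothetical. It builds the full pairwise difference array for clarity, so a dataset of this size would need chunking.

```python
import numpy as np

def tf_distance_zscores(embedding, cell_tf):
    """Z-scored mean Euclidean distance from each TF's cells to all other cells.

    embedding: array (n_cells, n_dims), e.g. principal components (assumed).
    cell_tf: per-cell TF labels. Returns {tf: z-score}.
    """
    X = np.asarray(embedding, dtype=float)
    labels = np.asarray(cell_tf)
    tfs = np.unique(labels)
    mean_dists = []
    for tf in tfs:
        inside = labels == tf
        # Pairwise differences between this TF's cells and every other cell.
        diff = X[inside][:, None, :] - X[~inside][None, :, :]
        mean_dists.append(np.sqrt((diff ** 2).sum(axis=-1)).mean())
    mean_dists = np.array(mean_dists)
    z = (mean_dists - mean_dists.mean()) / mean_dists.std()
    return dict(zip(tfs.tolist(), z))
```

A high z-score marks a TF whose cells sit far from the bulk of the population, which in this screen would suggest a stronger departure from the starting hESC state.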

FIG. 62—Validation of candidate TFs driving iNP differentiation. (A) Western blot showing expression of candidate TFs measured using the V5 epitope tag after 7 days of differentiation. (B) Top, expression of NP markers VIM and NES in iNPs produced by candidate TFs after 7 days of overexpression. Cell culture media used for each ORF is indicated in parentheses. Scale bar, 50 μm. Bottom, heat maps showing correlation between expression profiles of iNPs and human fetal cortex or brain organoid cell types from 3 datasets (14, 23, 24). D7 and D12 indicate the number of days that the ORF was overexpressed. RG, radial glia; IPC, intermediate progenitor cell; N, neuron; IN, interneuron; div, dividing; oRG, outer radial glia; tRG, truncated radial glia; vRG, ventricular radial glia; MGE, medial ganglionic eminence; nEN, newborn excitatory neurons; EN, excitatory neurons; PFC, prefrontal cortex; V1, primary visual cortex; nIN, newborn interneurons; CTX, cortex; CGE, caudal ganglionic eminence; STR, striatum; OPC, oligodendrocyte precursor cells; Glyc, cells expressing glycolysis genes; Pro, proliferating progenitors; NE, neuroepithelium; DN, dopaminergic neurons; CLN, callosal neurons; CFN, corticofugal neurons; Meso, mesodermal progenitors.

FIG. 63—Characterization of cells spontaneously differentiated from iNPs generated by candidate TFs. (A) Schematic of spontaneous differentiation. Dox-inducible candidate TFs are transiently overexpressed for 1 week to differentiate hESCs into iNPs, which then spontaneously differentiate for 8 weeks following withdrawal of dox and growth factors. Spontaneously differentiated cells were characterized by immunostaining and single-cell RNA sequencing. dox, doxycycline; EGF, epidermal growth factor; FGF, fibroblast growth factor. (B-C) Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells [PDGFRA (B) or NG2 (C)] in cells spontaneously differentiated for 1, 2, 4, or 8 weeks from iNPs produced by candidate TFs. Scale bar, 100 μm.

FIG. 64—Validation of candidate TFs in other stem cell lines for iNP differentiation. (A-B) Expression of NP marker genes in iNPs generated using 11a iPSC (A) or H1 hESC (B) lines after 1 week of TF overexpression. (C-D) Expression of marker genes for neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursor cells (NG2 and PDGFRA) in cells spontaneously differentiated from 11a iPSC iNPs (C) or H1 hESC iNPs (D) for 8 weeks. Scale bar, 100 μm.

FIG. 65—Differentiation of cardiomyocytes from EOMES-derived progenitors. (A) Percent of cells that stained for TNNT2 at day 10 for different EOMES induction times and seeding densities pooled from n=3 biological replicates. (B) Percent of cells that stained for TNNT2 at day 10 after 2 days of EOMES induction or GSK and Wnt inhibition pooled from n=3 biological replicates. (C) Expression of cardiomyocyte markers TNNT2 and NKX2.5 at day 30 after 2 days of EOMES induction or GSK and Wnt inhibition. Scale bar, 100 μm. (D) UMAP clustering of single-cell RNA-seq data from 16,698 cells that were spontaneously differentiated for 4 weeks after EOMES induction or GSK and Wnt inhibition. Data represent n=2 biological replicates per differentiation method. Colors indicate cell types or states. (E) Dot plot showing marker genes for each cluster. Circle size indicates percentage of cells expressing the gene in the given cluster and color indicates the mean expression value. (F) Data as in (D), colored by marker gene expression. (G) Heatmap showing the percentage of cells from each biological replicate that were grouped into each cluster. (H) Distribution of general cell types produced by each biological replicate. (I) Cells derived using each differentiation method are highlighted. Colors indicate biological replicates, S1 and S2. (J) Data as in (D), highlighting expression of marker genes for mature cardiomyocytes. (K) Violin plots showing expression of marker genes for mature cardiomyocytes for each biological replicate. VCM, ventricular cardiomyocytes; ACM, atrial cardiomyocytes; SMM, smooth muscle cells; SKM, skeletal muscle cells; EPTH, epithelial cells; (P), proliferative cells.

FIG. 66—Profiling cells spontaneously differentiated from iNPs using single-cell RNA sequencing. (A) UMAP clustering of scRNA-seq data from 53,113 cells that have been spontaneously differentiated from iNPs for 8 weeks. iNPs were derived using RFX4, NFIB, ASCL1, or PAX6 with n=2 biological replicates per TF. Colors indicate Louvain clusters. (B) Dot plot showing marker genes for each cluster. Circle size indicates percentage of cells expressing the gene in the given cluster and color indicates the mean expression value. Horizontal lines distinguish between major cell types. Pro, uncommitted progenitors; RP, retinal progenitors; RPE, retinal pigment epithelium; PR, photoreceptors; RGC, retinal ganglion cells; DNP, dorsal neural progenitors; RG, radial glia; Astro, astrocytes; CN, CNS neurons; EPD, ependyma; EP, epithelial progenitors; BE, bronchial epithelium; CE, cranial epithelium; CNC, cranial neural crest; CNCP, cranial neural crest progenitors; (P), proliferative cells.

FIG. 67—Single-cell RNA sequencing comparison of spontaneously differentiated cells produced by candidate TF iNPs. (A-B) UMAP clustering of scRNA-seq data from 53,113 cells that have been spontaneously differentiated from iNPs for 8 weeks. iNPs were derived using RFX4, NFIB, ASCL1, or PAX6 with n=2 biological replicates per TF. (A) Clusters representing central nervous system (CNS) cell types highlighted. Percentage of cells that contribute to the specified CNS cell type is indicated. (B) Cells spontaneously differentiated from each candidate TF are highlighted. Colors indicate biological replicates, S1 and S2. (C) Heatmap showing the percentage of cells from each replicate that were grouped into each cluster. (D) Distribution of general cell types produced by each biological replicate. Pro, uncommitted progenitors; RP, retinal progenitors; RPE, retinal pigment epithelium; PR, photoreceptors; RGC, retinal ganglion cells; DNP, dorsal neural progenitors; RG, radial glia; Astro, astrocytes; CN, CNS neurons; EPD, ependyma; EP, epithelial progenitors; BE, bronchial epithelium; CE, cranial epithelium; CNC, cranial neural crest; CNCP, cranial neural crest progenitors; (P), proliferative cells.

FIG. 68—Profiling spontaneously differentiated neurons from iNPs by single-cell RNA sequencing and target genes of candidate TFs by ChIP-seq. (A-E) UMAP reclustering of 4,162 neurons from clusters CN 1-3 of FIG. 66A. (A-D) Marker genes for general regions of the central nervous system (A), newborn cortical excitatory neurons (B), neuronal subtypes (C), and cortical projection neurons (D) are shown. Colors indicate gene expression. (E) Neurons spontaneously differentiated from each candidate TF are highlighted. Colors indicate biological replicates, S1 and S2. (F) Top 3 de novo or known motifs identified using HOMER motif analysis. The names of the TFs with the closest matching motifs, indicating potential cofactors of candidate TFs, and the associated P-values of enrichment are listed. (G) Heatmap showing percentage of NP-specific TFs or genes that had candidate TF ChIP peaks within 10 kb of the annotated transcriptional start site (TSS). (H-I) Overlap of NP-specific genes that had candidate TF ChIP peaks within 10 kb of the TSS and were differentially expressed (t-test q-value<0.05 with FDR correction) upon candidate TF overexpression. Genes that were shared between candidate TFs are shown in (H), with blue regions indicating overlap, and genes unique to each candidate TF are shown in (I).
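The "ChIP peak within 10 kb of the TSS" criterion in FIG. 68G-I amounts to an interval lookup. The sketch below assumes that each peak is represented by a single summit coordinate and that each gene has one annotated TSS. The document does not say whether distances were measured from summits or from peak boundaries, or how genes with multiple TSSs were handled, so those choices are assumptions. Sorting the summits per chromosome lets each TSS be checked with one binary search.

```python
from bisect import bisect_left

def genes_with_nearby_peak(peaks, tss, window=10_000):
    """Return the set of genes whose TSS lies within `window` bp of a peak summit.

    peaks: {chrom: [summit positions]}; tss: {gene: (chrom, position)}.
    """
    sorted_peaks = {chrom: sorted(pos) for chrom, pos in peaks.items()}
    hits = set()
    for gene, (chrom, pos) in tss.items():
        summits = sorted_peaks.get(chrom, [])
        # First summit at or after the left edge of the window.
        i = bisect_left(summits, pos - window)
        if i < len(summits) and summits[i] <= pos + window:
            hits.add(gene)
    return hits
```

The percentage plotted in FIG. 68G would then correspond to something like `100 * len(hits & np_specific) / len(np_specific)`, where `np_specific` is a hypothetical set of NP-specific genes.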

FIG. 69—Combining RFX4 with dual SMAD inhibition produces homogeneous iNPs. (A) Schematic for different media conditions (M1-M8) tested. SMAD inhibitors dorsomorphin (DM) and SB-431542 (SB) were added to the media at the indicated concentrations. mTeSR stem cell media was changed to different NP media (NP, EB, and DS; see Methods) over 7 days of differentiation. (B) Heatmaps showing expression of neuron marker genes TUJ1 and MAP2 relative to GAPDH control in cells from iNPs that have undergone spontaneous neurogenesis for 2 or 4 weeks. iNPs were differentiated for 5 or 7 days using each of the media conditions in (A) and seeded at low or high densities prior to spontaneous neurogenesis. Colors represent mean expression from n=4 biological replicates. (C) Same as (A), for additional media conditions tested. (D) Same as (B), for the media conditions shown in (C). (E-K) Profiling of iNPs derived using different iNP differentiation methods by scRNA-seq. RFX4-DS-iNPs were produced by combining RFX4 overexpression with dual SMAD inhibition, EB-iNPs were produced using the embryoid body protocol (8), and DS-iNPs were produced using the dual SMAD inhibition protocol (7). Data represent n=2 batch replicates per method with 15,211 RFX4-DS-iNPs, 11,148 EB-iNPs, and 16,421 DS-iNPs. (E) UMAP clustering of scRNA-seq data with colors indicating Louvain clusters. (F) Dot plot showing marker genes for each cluster. Circle size indicates percentage of cells expressing the gene in the given cluster and color indicates the mean expression value. (G-H) Box plots showing intra- (G) or inter- (H) replicate Euclidean distances between cells. Whiskers indicate the 5th and 95th percentiles. (I) Data as in (E), highlighting cells derived from each differentiation method. Colors indicate batch replicates, S1 and S2. (J) Heatmap showing the percentage of cells from each batch replicate that were grouped into each cluster. (K) Data as in (E), colored by marker gene expression. NP, neural progenitors; CN, CNS neurons; CNC, cranial neural crest.

FIG. 70—Characterization of iNPs produced by combining RFX4 with dual SMAD inhibition. ScRNA-seq profiling of 26,111 cells that were spontaneously differentiated from iNPs. iNPs were produced by combining RFX4 overexpression with dual SMAD inhibition and spontaneously differentiated for 4 or 8 weeks. Data represent n=2 biological replicates per timepoint. (A-B) UMAP clustering of scRNA-seq data. (A) Colors indicate expression of marker genes for major cell types. (B) Cells from each time point are highlighted. Colors indicate biological replicates, S1 and S2. (C) Heatmap showing the percentage of cells from each biological replicate that were grouped into each cluster from FIG. 5D. (D-G) UMAP clustering of scRNA-seq data. Marker genes for general regions of the central nervous system (D), radial glia subtypes (E), neuronal subtypes (F), and GABAergic interneuron subtypes (G) are shown. Colors indicate gene expression. CN, CNS neurons; RG, radial glia; MNG, meninges; (P), proliferative cells.

FIG. 71—Modeling neurodevelopmental disorders using RFX4-iNPs with DYRK1A perturbation. (A) Schematic of disease modeling by perturbing DYRK1A expression. Human induced pluripotent stem cells (iPSCs) are transduced with Cas9 and sgRNAs or with a DYRK1A ORF to knock out or overexpress DYRK1A, respectively. RFX4 is then transiently overexpressed for 1 week to differentiate iPSCs into iNPs, which then spontaneously differentiate for 8 weeks following withdrawal of dox and growth factors. Effects of DYRK1A perturbation were characterized using bulk RNA sequencing, EdU labeling, immunostaining, or electrophysiology. dox, doxycycline; EGF, epidermal growth factor; FGF, fibroblast growth factor. (B) Percent indels in RFX4-iNPs transduced with DYRK1A KO sgRNAs. n=3. (C) DYRK1A expression measured using qPCR probes targeting the endogenous sequence or the codon-optimized ORF sequence. n=4. (D-E) Western blot of DYRK1A at 7 days after transduction with Cas9 and DYRK1A KO sgRNAs (D) or DYRK1A ORF (E). (F-H) Volcano plots showing the number of genes that were significantly differentially expressed (t-test q-value<0.05 with FDR correction) and had an absolute log2 fold change relative to control greater than 1 for DYRK1A KO sgRNA 1 (F), KO sgRNA 2 (G), and ORF (H) conditions. For a full list of genes, see Table 17. The KO sgRNA 1 and 2 conditions were compared to both NT sgRNAs. The ORF condition was compared to the GFP control. (I) Venn diagram summarizing the significantly differentially expressed genes in (F-H). (J) Heatmap of genes that were significantly differentially expressed (t-test q-value<0.5 with FDR correction) depending on the dosage of DYRK1A. Genes are annotated with broad categories of gene function relevant to neural development. n=3. (K) Representative images of MAP2 staining during spontaneous differentiation for NT sg1 and DYRK1A KO sg2. Scale bar, 100 μm. Values represent mean±SEM. sg, single guide RNA; KO, knockout; NT, nontargeting. *P<0.05; ND, not detected.
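The significance criteria in FIG. 71F-H combine an FDR-corrected q-value threshold with an absolute log2 fold-change threshold. The document says only "t-test q-value<0.05 with FDR correction" and does not name the correction procedure, so the sketch below uses the Benjamini-Hochberg procedure as one common choice. It takes per-gene p-values and log2 fold changes as given; the function names are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downward, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q

def significant_genes(pvalues, log2fc, q_cutoff=0.05, lfc_cutoff=1.0):
    """Indices of genes passing both the q-value and |log2 fold change| cutoffs."""
    q = benjamini_hochberg(pvalues)
    return np.flatnonzero((q < q_cutoff) & (np.abs(log2fc) > lfc_cutoff))
```

For example, p-values of 0.01, 0.04, 0.03, and 0.5 give q-values of 0.04, about 0.053, about 0.053, and 0.5, so only the first gene passes the 0.05 cutoff regardless of its fold change.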

FIG. 72—Characterization of DYRK1A perturbations in RFX4-iNP differentiated neurons by electrophysiology. (A) Representative electrophysiology traces for neurons with or without evoked action potentials (AP) and spontaneous excitatory postsynaptic currents (EPSCs). (B) Proportion of neurons with or without AP and EPSCs for different DYRK1A perturbations from n=31-45 neurons. (C-D) Intrinsic membrane (C) and action potential (D) properties measured using electrophysiology for different DYRK1A perturbations from n=12-36 neurons with evoked action potentials. Values represent mean±SEM. *P<0.05.

FIG. 73—Joint profiling of chromatin accessibility and gene expression on a subset of TF ORFs. (A) Violin plots showing distribution of UMIs and genes per cell for scRNA-seq from the joint profiling dataset. (B) Violin plots showing distribution of UMIs and fraction of reads in the top 500,000 peaks per cell for scATAC-seq from the joint profiling dataset. (C) Representative fragment size histogram for scATAC-seq data using the first two megabases of chromosome 1. (D) Transcriptional start site (TSS) enrichment score for scATAC-seq data. (E) RNA (left) and ATAC (right) UMAP of 69,085 cells overexpressing 198 TF isoforms. Colors indicate clusters identified by the smart local moving (SLM) algorithm. (F) Distribution of cells from day 4 or day 7 of TF overexpression in each of the clusters from FIG. 5A. Clusters with >30% cells from either time point are indicated with asterisks. (G) Weighted nearest neighbor (WNN) UMAP of joint profiling data from FIG. 47A, colored by diffusion pseudotime. (H) Violin plots comparing diffusion pseudotimes of each time point. (I) Heat map showing significance of the top nominated regulators for each cluster. Top regulators were nominated by evaluating motif enrichment in ATAC peaks with significant peak-gene associations in each cluster. TFs that were identified as top ORFs and regulators are labeled in blue.

FIG. 74—Combinatorial TF screening identifies TF combinations with similar expression profiles. (A) UMAP of scRNA-seq profiles from hESCs overexpressing 57 combinations of 10 TF ORFs for 7 days. Colors indicate Louvain clusters. (B) Heat map showing percentage of cells with the indicated TF combination for each cluster. Percentages are determined by normalizing to the total number of cells with the TF ORF in the combinatorial dataset. (C) Heat map showing pairwise Pearson correlation between mean expression profiles of each TF combination. TF combinations are ordered by hierarchical clustering.

FIG. 75—Fitting expression profiles of TF combinations with linear regression. (A-C) Heat maps showing the coefficient weights (A-B) and score (C) for linear regression. Single TF expression profiles were fitted to model each measured double TF profile by performing linear regression with an interaction term on the mean expression profiles. (D) Annotated relationships for each TF combination based on the fitted linear regression coefficients. (E) Heat maps comparing the average expression profiles of double TFs with those of the respective single TFs for example combinations with annotated relationships.
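The regression described for FIG. 75 models a measured double-TF profile y (one value per gene) from the two single-TF profiles a and b plus their product, i.e. y ≈ β0 + β1·a + β2·b + β3·(a·b). A minimal sketch is shown below, using ordinary least squares and reporting R² as the "score". The document does not specify the solver, any regularization, or the scoring metric, so those are assumptions, and the function name is hypothetical.

```python
import numpy as np

def fit_double_profile(single_a, single_b, double):
    """Fit double ~ b0 + b1*A + b2*B + b3*(A*B) across genes by least squares.

    Returns (coefficients [b0, b1, b2, b3], coefficient of determination R^2).
    """
    a = np.asarray(single_a, dtype=float)
    b = np.asarray(single_b, dtype=float)
    y = np.asarray(double, dtype=float)
    # Design matrix: intercept, each single-TF profile, and their interaction.
    X = np.column_stack([np.ones_like(a), a, b, a * b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return coef, r2
```

As an illustrative reading only, a large β1 with a near-zero β2 would suggest that TF A dominates the combined profile, while a sizable β3 would point to a non-additive interaction; the criteria actually used to annotate relationships in FIG. 75D are not given in the document.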

FIG. 76—Predicting TF combinations using the TF Atlas. (A-F) Percent accuracy for different approaches to predict TFs for double (A-C) or triple (D-F) TF combinations. Single TF expression profiles from the TF Atlas were averaged or fitted with linear regression models against measured double or triple TF expression profiles. TF combinations were ranked by the fit to the measured combinatorial TF profile. The top combinations were evaluated for accuracy. For comparison to the single TF profiles from the combinatorial TF screen dataset, prediction accuracy for the 10 corresponding TFs from the TF Atlas is shown (A,D). To reduce the number of possible combinations, TFs were grouped into 30 (B,E) or 51 (C,F) clusters based on expression profile similarity. (G-L) Prediction results for triple TF profiles. Known combinations (G) or predicted combinations for hepatoblasts (H), bronchiolar and alveolar epithelial cells (I), metanephric cells (J), vascular endothelial cells (K), and trophoblast giant cells (L) are shown. To expand the number of known combinations, parts of known combinations with more than 3 TFs were included for ENS neurons and cardiomyocytes. TF combinations were ranked by the gene signature scores for each respective cell type. As gene signature scores were discrete, the percentile ranks were reported as ranges. For predicted combinations, TFs that are part of known combinations, developmentally critical, or specifically expressed in the target cell types are indicated in blue.

The figures herein are for illustrative purposes only and are not necessarily drawn to scale.

DETAILED DESCRIPTION OF THE EXAMPLE EMBODIMENTS

General Definitions

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. Definitions of common terms and techniques in molecular biology may be found in Molecular Cloning: A Laboratory Manual, 2nd edition (1989) (Sambrook, Fritsch, and Maniatis); Molecular Cloning: A Laboratory Manual, 4th edition (2012) (Green and Sambrook); Current Protocols in Molecular Biology (1987) (F. M. Ausubel et al. eds.); the series Methods in Enzymology (Academic Press, Inc.); PCR 2: A Practical Approach (1995) (M. J. MacPherson, B. D. Hames, and G. R. Taylor eds.); Antibodies, A Laboratory Manual (1988) (Harlow and Lane, eds.); Antibodies A Laboratory Manual, 2nd edition 2013 (E. A. Greenfield ed.); Animal Cell Culture (1987) (R. I. Freshney, ed.); Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994); March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992); and Marten H. Hofker and Jan van Deursen, Transgenic Mouse Methods and Protocols, 2nd edition (2011).

As used herein, the singular forms “a”, “an”, and “the” include both singular and plural referents unless the context clearly dictates otherwise.

The term “optional” or “optionally” means that the subsequently described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.

The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within the respective ranges, as well as the recited endpoints.

The terms “about” or “approximately” as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, are meant to encompass variations of and from the specified value, such as variations of +/−10% or less, +/−5% or less, +/−1% or less, and +/−0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. It is to be understood that the value to which the modifier “about” or “approximately” refers is itself also specifically, and preferably, disclosed.

As used herein, a “biological sample” may contain whole cells and/or live cells and/or cell debris. The biological sample may contain (or be derived from) a “bodily fluid”. The present invention encompasses embodiments wherein the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof. Biological samples include cell cultures, bodily fluids, and cell cultures from bodily fluids. Bodily fluids may be obtained from a mammalian organism, for example by puncture or other collecting or sampling procedures.

The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.

The term “multiplicity of infection” (MOI) as used herein refers to the ratio of agents (e.g. vector, transcription factors) introduced to target cells (e.g. stem cell, radial glia). In certain embodiments, MOI can refer to viral vectors used to introduce an agent.
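The definition above treats MOI as a ratio of agents to cells. A common simplifying assumption, not stated in this disclosure, is that vector entry events are independent, so the number of vectors per cell follows a Poisson distribution whose mean equals the MOI. The minimal sketch below (function names are hypothetical) estimates the fractions of uninfected, singly infected, and multiply infected cells under that assumption.

```python
import math


def poisson_fraction(moi: float, k: int) -> float:
    """Fraction of cells expected to receive exactly k vectors at a given MOI,
    assuming independent entry events (Poisson model)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def infection_summary(moi: float) -> dict:
    """Hypothetical helper summarizing Poisson expectations at a single MOI."""
    if moi <= 0:
        raise ValueError("MOI must be positive")
    uninfected = poisson_fraction(moi, 0)
    single = poisson_fraction(moi, 1)
    infected = 1.0 - uninfected
    return {
        "uninfected": uninfected,
        "single": single,
        "multiple": infected - single,
        # Among transduced cells only: the share carrying exactly one vector.
        "single_among_infected": single / infected,
    }


if __name__ == "__main__":
    for moi in (0.1, 0.3, 1.0):
        s = infection_summary(moi)
        print(f"MOI {moi}: {s['single_among_infected']:.3f} of transduced cells carry one vector")
```

Under this model, lowering the MOI raises the share of transduced cells that carry a single vector, at the cost of transducing fewer cells overall; real transductions may deviate from the Poisson assumption.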

Various embodiments are described hereinafter. It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.

All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.

Overview

The ability to engineer cell types of interest has advanced basic research and has therapeutic potential, but is currently limited to a small number of cell types. Transcription factors (TFs) regulate gene programs, thereby controlling diverse cellular processes and cell states. Although overexpression of TFs has been shown to efficiently convert one cell type to another, the process of discovering the right TFs is time-intensive and low-throughput.

The ability to engineer any cell type of interest has the potential to advance our understanding of biological processes and capability to treat disease1-5. Despite this, currently only a few cell types can be generated efficiently and consistently1,2,4,5. Overexpression of transcription factors (TFs) can be used to engineer cell fates, and TFs have been shown to rapidly and efficiently generate many different cell types, including neurons and skeletal muscle cells6-12. For example, overexpressing either NEUROD1 or NEUROG2 can efficiently and rapidly differentiate hESCs into cortical neurons (Zhang Y, et al., Rapid single-step induction of functional neurons from human pluripotent stem cells. Neuron. 2013; 78(5):785-98). As TFs use endogenous regulatory pathways to drive differentiation, mimicking natural development, this approach may produce higher fidelity models while illuminating aspects of cellular development. Although TF overexpression has been shown to efficiently convert one cell type to another, the process of discovering TFs that can direct differentiation into a desired cell type (cellular engineering) is time-intensive and low-throughput, limiting the number of transformative TFs that have been identified. Typically, candidate TFs are overexpressed individually or in specific combinations. Cells produced from independent perturbations are evaluated for similarity with the target cell type using discrete assays. This costly and time-consuming process has restricted the TFs tested per cell type to those predicted from prior studies (5-25 TFs on average), thus limiting the number of novel TFs that have been identified for cellular engineering.

To achieve a comprehensive understanding of TFs and their respective programs, Applicants developed a platform for high-throughput, systematic TF ORF overexpression that leverages barcodes for pooled screening. Applicants created a library of all annotated human TF splice isoforms (1,836 genes encoding 3,548 isoforms) and applied it to build a TF Atlas charting expression profiles in human embryonic stem cells (hESCs) overexpressing each TF. The comprehensive TF Atlas allowed systematic investigation and generalized observations, showing that 27% of TF genes could function as “master regulators” that induce differentiation when overexpressed in hESCs. Applicants mapped TF-induced expression profiles to reference cell types and validated candidate TFs for generation of diverse cell types, spanning all three germ layers and trophoblasts. Further targeted screens with a subset of the library allowed Applicants to create a tailored cellular disease model and integrate mRNA expression and chromatin accessibility data to identify downstream regulators. Finally, Applicants predicted the effects of TF combinations, demonstrated the validity of the predictions in a combinatorial TF overexpression dataset, and showed how to predict combinations of TFs that could produce target profiles of reference cell types, reducing the combinatorial search space for experiments. The TF Atlas provides a comprehensive overview of gene regulatory networks and a roadmap for further understanding developmental trajectories and guiding cellular engineering efforts.
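One way to use single-TF profiles to shrink the combinatorial search space, in the spirit of the averaging approach described for FIG. 76, is to score every candidate combination by how well the mean of its single-TF profiles matches a target cell-type profile. The minimal sketch below assumes Pearson correlation as the fit metric and an additive (mean) model for combinations; both are illustrative assumptions, and `rank_combinations` is a hypothetical name.

```python
from itertools import combinations

import numpy as np


def rank_combinations(single_profiles, target, k=2):
    """Rank k-TF combinations by Pearson correlation between the averaged
    single-TF profiles and a target expression profile.

    single_profiles: dict mapping TF name -> 1-D profile (same gene order as target).
    """
    target = np.asarray(target, dtype=float)
    ranked = []
    for combo in combinations(sorted(single_profiles), k):
        # Additive assumption: a combination behaves like the mean of its parts.
        predicted = np.mean(
            [np.asarray(single_profiles[tf], dtype=float) for tf in combo], axis=0
        )
        if np.std(predicted) == 0:
            continue  # correlation is undefined for a flat profile
        r = float(np.corrcoef(predicted, target)[0, 1])
        ranked.append((combo, r))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Only the top-ranked combinations would then need to be tested experimentally; the number of candidates still grows combinatorially with k, which is why grouping TFs into clusters (as in FIG. 76) can further reduce the search.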

Applicants also provide different selection methods to enrich for expression of different numbers of marker genes that define the target cell type (reporter assay, Flow-FISH, and scRNA-seq).

Applicants applied the library to differentiation of human embryonic stem cells (hESCs) into neural progenitors (NPs). Applicants identified four TFs (RFX4, NFIB, PAX6, and ASCL1) that produced induced NPs (iNPs) that spontaneously differentiate into an array of central nervous system (CNS) cell types. RFX4-iNPs gave rise to the highest proportion of CNS cell types and, when combined with dual SMAD inhibition, produced iNPs at >98% purity that differentiated into predominantly GABAergic neurons, opening up new avenues for studying this cell type.

In an exemplary case, 90 TF isoforms specifically expressed in a selected target cell type, neural progenitors (NPs), were chosen for screening using available expression data (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016). Applicants chose to differentiate hESCs into induced NPs (iNPs) because NPs arise early during development, and therefore overexpression of a single TF may be sufficient to differentiate NPs from hESCs, although no such TF had previously been identified. In addition, the current methods for producing NPs, embryoid body formation13 and dual SMAD inhibition14, are respectively low-throughput or prone to variable differentiation results depending on the cell line15. Through pooled screening of 90 TF isoforms, Applicants found four novel TFs (RFX4, NFIB, PAX6, and ASCL1), each of which can produce functional iNPs within 1 week. The iNPs resemble the morphology, transcriptome signature, and functional capabilities of human fetal NPs. Applicants then applied the iNPs to model neurodevelopmental disorders. These results collectively demonstrate the feasibility of using pooled TF screening to produce a diverse array of cell types that could be tailored for specific applications.

Notably, although RFX4 has not been extensively studied in neural development, RFX4-derived iNPs spontaneously differentiated into the highest proportion of cell types in the central nervous system (CNS), highlighting the importance of performing unbiased TF screens. Applicants demonstrated that RFX4-derived iNPs can be used to model neurodevelopmental disorders. Applicants also identified transcription factors capable of differentiating stem cells into cardiomyocytes. The TF screening platform provides a generalizable approach for cellular programming that could expand our ability to generate desired cell types and elucidate the complex TF regulatory networks that govern cell type specification.

Embodiments disclosed herein provide for a screening platform and methods of screening for transcription factors (TFs) that drive differentiation of stem cells into target cell types. The stem cells may be induced pluripotent stem cells (also known as iPS cells or iPSCs). The iPSCs may be patient derived.

Embodiments disclosed herein also provide for a screening platform and methods of screening for transcription factors that drive transdifferentiation of cells into target cell types. In certain embodiments, transcription factors that differentiate stem cells into a target cell (e.g., progenitor cell) can be used to transdifferentiate cells of a different lineage to target cells. In certain embodiments, TFs that are expressed in progenitor cells can be used to transdifferentiate cells of one lineage into a target cell of a different lineage.

Embodiments disclosed herein also provide for high throughput screening methods for identifying transcription factors that enhance or suppress tumor growth. In certain embodiments, a barcoded transcription factor library is introduced to a cancer cell line. After the cancer cell line is grown (e.g., for 2 weeks), the barcodes are sequenced, and enriched and depleted barcodes are identified relative to the barcodes present in the initial library. Enriched barcodes may indicate transcription factors that enhance tumor growth and depleted barcodes may indicate transcription factors that suppress tumor growth.
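The comparison of final against initial barcode abundance can be sketched as a log2 fold change of each barcode's relative frequency. This is a minimal sketch only: the pseudocount, the ±1 log2 threshold, and the helper names are assumptions, and a real analysis would typically add replicates and a statistical test.

```python
import math


def barcode_log2fc(initial, final, pseudocount=1.0):
    """Log2 fold change of each barcode's relative abundance, final vs. initial library.

    initial, final: dict mapping barcode -> read count.
    """
    total_initial = sum(initial.values())
    total_final = sum(final.values())
    lfc = {}
    for bc, n_initial in initial.items():
        freq_initial = (n_initial + pseudocount) / total_initial
        freq_final = (final.get(bc, 0) + pseudocount) / total_final
        lfc[bc] = math.log2(freq_final / freq_initial)
    return lfc


def classify_barcodes(lfc, threshold=1.0):
    """Split barcodes into enriched (candidate growth enhancers) and depleted
    (candidate growth suppressors) sets using a symmetric log2 threshold."""
    enriched = sorted(bc for bc, v in lfc.items() if v >= threshold)
    depleted = sorted(bc for bc, v in lfc.items() if v <= -threshold)
    return enriched, depleted
```

Normalizing to relative frequency matters: a barcode whose raw count is unchanged can still appear depleted when other barcodes expand.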

In certain embodiments, the screening platform is a high-throughput multiplex screening platform.

Embodiments disclosed herein also provide for methods of using transcription factors to drive differentiation of stem cells (e.g., iPSCs or hESCs) into target cell types (e.g., neural cell types, cardiomyocytes), providing a road map for the development of an array of in vitro human models (e.g., brain) that can be tailored for specific applications. Embodiments disclosed herein also provide for in vitro models of in vivo cell types for use in modelling development and disease. In certain embodiments, target cell types can be transferred to a subject in need thereof to regenerate a diseased or damaged tissue.

Embodiments disclosed herein also provide for differentiating or transdifferentiating cells into target cells in vivo by targeted modulation of transcription factors or downstream targets. In certain embodiments, the targeted modulation of transcription factors can be used to regenerate, replenish or replace damaged or diseased cells in a subject in need thereof (e.g., heart cells, pancreatic β cells, eye cells, nervous system cells).

Embodiments disclosed herein also provide for modulating transcription factors that enhance tumor growth or that suppress tumor growth. In certain embodiments, transcription factors are modulated in a treatment regimen in a subject suffering from cancer. In certain embodiments, the treatment is targeted to tumors or sites of tumors.

Many methods of modulating transcription factors may be used. In certain embodiments, the activity of transcription factors can be enhanced (e.g., by modulation of TF phosphorylation sites). In certain embodiments, TFs are overexpressed. In certain embodiments, agents capable of enhancing expression or activity of transcription factors are used. In certain embodiments, agents capable of reducing expression or activity of transcription factors are used.

Applicants provide further examples of the screening methods to identify transcription factors required for differentiation of hESCs into radial glia, neural progenitors in the developing central nervous system that are capable of differentiating into neurons, astrocytes, and oligodendrocytes. Applicants further identify TFs required for differentiation of hESCs into cardiomyocytes. The present invention also advantageously provides for high-throughput methods of screening.

Applicants identified TFs that can differentiate hESCs into radial glia. Additionally, these candidate TFs can advantageously be applied to a high-throughput screening platform for identifying TFs that direct differentiation into specific cell types of interest (e.g., interneurons, pyramidal neurons, and oligodendrocytes). The screen can advantageously be used to identify TFs that differentiate radial glia into astrocytes. The screening platform can advance understanding of gene regulation in neural development and provide robust, scalable cellular models for studying the brain.

Finally, the methods of differentiation using the identified transcription factors can advantageously produce homogeneous populations of target cells (e.g., neural progenitor cell populations).

Screening Platforms

In certain embodiments, the present invention provides a screening platform for systematically identifying transcription factors (TFs) that drive differentiation of cells (e.g., pluripotent cells, stem cells, progenitor cells) into target cell types (e.g., neural cells, muscle cells, endocrine cells). In certain embodiments, the screening platform comprises pluripotent cells that are differentiated into target cells by overexpressing a plurality of transcription factors in the pluripotent cells. Overexpression of transcription factors may be performed according to any method known in the art (e.g., introducing a vector encoding the transcription factor, or introducing an agent capable of inducing expression of the endogenous gene, as described further herein). The screening platforms can provide a framework for the development of an array of in vitro human models that can be tailored for specific applications described herein. Further, the screening platform can be used to generate a transcription factor atlas that identifies the differential gene expression in cells differentiated using each individual transcription factor. Thus, the atlas can be used to group TFs based on gene expression and to identify TFs for each target cell type. The gene expression profiles generated by overexpressing single TFs in the TF Atlas can be used to predict expression profiles produced by overexpressing TF combinations (discussed further herein).
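FIG. 75 describes fitting each measured double-TF profile from the two single-TF profiles by linear regression with an interaction term. The disclosure does not spell out the exact parameterization, so the minimal sketch below assumes the model w_a·A + w_b·B + w_ab·(A·B) + intercept, fitted by ordinary least squares across genes on mean profiles, with R² as the fit score; `fit_double_profile` is a hypothetical name.

```python
import numpy as np


def fit_double_profile(profile_a, profile_b, profile_ab):
    """Fit a measured double-TF profile as
    w_a * A + w_b * B + w_ab * (A * B) + intercept
    by ordinary least squares over genes, and report R^2 as the fit score."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    y = np.asarray(profile_ab, dtype=float)
    # One row per gene; columns are the two single-TF profiles, their product, and a constant.
    design = np.column_stack([a, b, a * b, np.ones_like(a)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    r2 = 1.0 - float(residual @ residual) / float(((y - y.mean()) ** 2).sum())
    names = ("w_a", "w_b", "w_ab", "intercept")
    return dict(zip(names, coef.tolist())), r2
```

Under this parameterization, a large w_a with small w_b and w_ab would suggest that TF A dominates the pair, while a sizeable w_ab would point to a non-additive interaction; how relationships were actually annotated in FIG. 75 is not specified here.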

In certain embodiments, transcription factors may be selected for screening based on expression of the transcription factors in the target cell types or in progenitor cells for the target cell types. Non-limiting examples of transcription factors may be found in Tables 1, 3, 4 and 5. Cell type specific transcription factors are known in the art. Additionally, expression of transcription factors in a target cell type can be determined experimentally (e.g., by RNA sequencing).
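Selecting candidate TFs from expression data (for example, RNA sequencing across several cell types) can be sketched as a specificity ranking. The scoring rule below, a log2 ratio of expression in the target cell type over the mean of all other cell types, is an illustrative assumption; the disclosure does not state how the exemplary 90 isoforms were scored, and `select_specific_tfs` is a hypothetical name.

```python
import numpy as np


def select_specific_tfs(expression, cell_types, target, top_n=10, pseudocount=1.0):
    """Rank TFs by log2(target expression / mean expression in other cell types)
    and return the top_n TF names.

    expression: dict mapping TF name -> mean expression per cell type,
                ordered as in cell_types (at least two cell types).
    """
    t = cell_types.index(target)
    scores = {}
    for tf, values in expression.items():
        v = np.asarray(values, dtype=float)
        others = np.delete(v, t)  # expression in every non-target cell type
        scores[tf] = float(np.log2((v[t] + pseudocount) / (others.mean() + pseudocount)))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

The pseudocount keeps TFs with near-zero background expression from receiving infinite scores.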

An exemplary screening platform comprises one or more populations of pluripotent cells, a means to overexpress one or more transcription factors in the one or more populations of cells, and a means to identify target cells after differentiation of the cells. Each population of pluripotent cells may express a different transcription factor.

Pooled Screening Platforms

In certain embodiments, TFs are screened for differentiation of stem cells into a target cell in a pooled screen, in which a library of transcription factors is introduced to a single population of stem cells and the transcription factors able to differentiate the stem cells are identified. In certain embodiments, transcription factors are introduced such that each cell receives no more than one transcription factor, or such that single cells receive one or more transcription factors (e.g., 2, 3, 4, or 5 transcription factors). In certain embodiments, the pooled screening platform can be used to identify combinations of transcription factors required for differentiation into a target cell type.

An exemplary pooled screening platform comprises a single population of pluripotent cells, a means to overexpress one or more transcription factors in one or more cells in the population of cells, and a high-throughput means to identify both the target cells (e.g., microscopy, FACS, Flow-FISH, single-cell RNA-seq, or reporter gene) and the overexpressed transcription factor introduced to generate the target cells (e.g., barcode). Each pluripotent cell in the pool may express a different transcription factor or combination of transcription factors.

In certain embodiments, barcodes are used to identify the transcription factor or modulating agent for the transcription factor introduced to a cell or population of cells. In certain embodiments, stem cells differentiated into target cells are enriched (e.g., sorted) and the barcodes identified in the enriched cells indicate the transcription factors introduced. Thus, transcription factors may be identified by determining the enrichment of barcodes in cells differentiated into target cells compared to barcodes in the starting library.
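Because a library may contain several barcodes (or several isoforms) per TF, barcode counts from the enriched target cells and from the starting library can first be aggregated to TF identity and then compared. The minimal sketch below assumes read counts per barcode, a barcode-to-TF lookup table, and an enrichment score defined as relative abundance in sorted cells divided by relative abundance in the library; the names and the pseudocount are hypothetical.

```python
from collections import Counter


def tf_enrichment(barcode_to_tf, library_counts, sorted_counts, pseudocount=0.5):
    """Aggregate barcode read counts to TFs and score each TF by its relative
    abundance in sorted target cells divided by its relative abundance in the library."""
    library, selected = Counter(), Counter()
    for bc, n in library_counts.items():
        if bc in barcode_to_tf:
            library[barcode_to_tf[bc]] += n
    for bc, n in sorted_counts.items():
        if bc in barcode_to_tf:  # reads from unmapped barcodes are ignored
            selected[barcode_to_tf[bc]] += n
    lib_total = sum(library.values())
    sel_total = sum(selected.values())
    if lib_total == 0 or sel_total == 0:
        raise ValueError("both the library and the sorted sample need mapped reads")
    scores = {
        tf: ((selected[tf] + pseudocount) / sel_total)
        / ((library[tf] + pseudocount) / lib_total)
        for tf in library
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Scores above 1 indicate TFs over-represented among cells sorted as the target cell type relative to their starting representation.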

The terms “nucleic acid barcode” and “barcode” refer to a short sequence of nucleotides (for example, DNA or RNA) that is used as an identifier for an associated molecule, such as a target molecule and/or target nucleic acid (e.g., transcription factor). A nucleic acid barcode can have a length of at least, for example, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, or 100 nucleotides and can be in single- or double-stranded form. In certain embodiments, the barcode is configured for amplification and subsequent sequencing. In certain embodiments, the barcode is expressed as a transcript (e.g., poly A tailed transcript) that can be identified using a method of RNA sequencing as described further herein. In certain embodiments, barcoding uses an error correcting scheme (T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms (Wiley, New York, ed. 1, 2005)).
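A simple error-correcting scheme assigns each sequenced barcode to the unique known barcode within a small Hamming distance. By standard coding theory, a barcode set with minimum pairwise Hamming distance d can correct up to ⌊(d−1)/2⌋ substitutions unambiguously, so single-mismatch correction needs d ≥ 3. The minimal sketch below handles substitutions only (not insertions or deletions); the function names are hypothetical and this is not presented as the scheme used by Applicants.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(whitelist) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(whitelist, 2))


def correct_barcode(read: str, whitelist, max_dist: int = 1):
    """Return the unique whitelisted barcode within max_dist mismatches of the read,
    or None when no barcode, or more than one barcode, qualifies."""
    if read in whitelist:
        return read
    hits = [
        bc for bc in whitelist
        if len(bc) == len(read) and hamming(bc, read) <= max_dist
    ]
    return hits[0] if len(hits) == 1 else None
```

Returning None for ambiguous reads, rather than guessing, avoids assigning a read to the wrong TF.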

Pluripotent Cells/Stem Cells

Pluripotent cells may include any mammalian stem cell. As used herein, the term “stem cell” refers to a multipotent cell having the capacity to self-renew and to differentiate into multiple cell lineages. Mammalian stem cells may include, but are not limited to, embryonic stem cells of various types, such as murine embryonic stem cells, e.g., as described by Evans & Kaufman 1981 (Nature 292: 154-6) and Martin 1981 (PNAS 78: 7634-8); rat pluripotent stem cells, e.g., as described by Iannaccone et al. 1994 (Dev Biol 163: 288-292); hamster embryonic stem cells, e.g., as described by Doetschman et al. 1988 (Dev Biol 127: 224-227); rabbit embryonic stem cells, e.g., as described by Graves et al. 1993 (Mol Reprod Dev 36: 424-433); porcine pluripotent stem cells, e.g., as described by Notarianni et al. 1991 (J Reprod Fertil Suppl 43: 255-60) and Wheeler 1994 (Reprod Fertil Dev 6: 563-8); sheep embryonic stem cells, e.g., as described by Notarianni et al. 1991 (supra); bovine embryonic stem cells, e.g., as described by Roach et al. 2006 (Methods Enzymol 418: 21-37); human embryonic stem (hES) cells, e.g., as described by Thomson et al. 1998 (Science 282: 1145-1147); human embryonic germ (hEG) cells, e.g., as described by Shamblott et al. 1998 (PNAS 95: 13726); embryonic stem cells from other primates such as Rhesus stem cells, e.g., as described by Thomson et al. 1995 (PNAS 92:7844-7848) or marmoset stem cells, e.g., as described by Thomson et al. 1996 (Biol Reprod 55: 254-259). In certain embodiments, the pluripotent cells may include, but are not limited to, lymphoid stem cells, myeloid stem cells, neural stem cells, skeletal muscle satellite cells, epithelial stem cells, endodermal and neuroectodermal stem cells, germ cells, extraembryonic and embryonic stem cells, mesenchymal stem cells, intestinal stem cells, embryonic stem cells, and induced pluripotent stem cells (iPSCs).

As noted, prototype “human ES cells” are described by Thomson et al. 1998 (supra) and in U.S. Pat. No. 6,200,806. The scope of the term covers pluripotent stem cells that are derived from a human embryo at the blastocyst stage, or before substantial differentiation of the cells into the three germ layers. ES cells, in particular hES cells, are typically derived from the inner cell mass of blastocysts or from whole blastocysts. Derivation of hES cell lines from the morula stage has been documented and ES cells so obtained can also be used in the invention (Strelchenko et al. 2004. Reproductive BioMedicine Online 9: 623-629). As noted, prototype “human EG cells” are described by Shamblott et al. 1998 (supra). Such cells may be derived, e.g., from gonadal ridges and mesenteries containing primordial germ cells from fetuses. In humans, the fetuses may be typically 5-11 weeks post-fertilization.

In certain embodiments, mouse embryonic stem cells are used. In certain embodiments, mouse embryonic stem cells differentiated into a target cell may be transferred to a mouse to perform in vivo functional studies.

Human embryonic stem cells may include, but are not limited to the HUES66, HUES64, HUES3, HUES8, HUES53, HUES28, HUES49, HUES9, HUES48, HUES45, HUES1, HUES44, HUES6, H1, HUES62, HUES65, H7, HUES13, H9, and HUES63 cell lines. In certain embodiments, the stem cell is a human induced pluripotent stem cell (iPSC). In certain embodiments, the human iPSC is selected from the group consisting of 11a, PGP1, GM08330 (also known as GM8330-8), and Mito 210.

General techniques useful in the practice of this invention in cell culture and media uses are known in the art (e.g., Large Scale Mammalian Cell Culture (Hu et al. 1997. Curr Opin Biotechnol 8: 148); Serum-free Media (K. Kitano. 1991. Biotechnology 17: 73); or Large Scale Mammalian Cell Culture (Curr Opin Biotechnol 2: 375, 1991)). The terms “culturing” or “cell culture” are common in the art and broadly refer to maintenance of cells and potentially expansion (proliferation, propagation) of cells in vitro. Typically, animal cells, such as mammalian cells, such as human cells, are cultured by exposing them to (i.e., contacting them with) a suitable cell culture medium in a vessel or container adequate for the purpose (e.g., a 96-, 24-, or 6-well plate, a T-25, T-75, T-150 or T-225 flask, or a cell factory), at art-known conditions conducive to in vitro cell culture, such as temperature of 37° C., 5% v/v CO2 and >95% humidity.

Methods related to culturing stem cells are also useful in the practice of this invention (see, e.g., “Teratocarcinomas and embryonic stem cells: A practical approach” (E. J. Robertson, ed., IRL Press Ltd. 1987); “Guide to Techniques in Mouse Development” (P. M. Wassarman et al. eds., Academic Press 1993); “Embryonic Stem Cells: Methods and Protocols” (Kursad Turksen, ed., Humana Press, Totowa N.J., 2001); “Embryonic Stem Cell Differentiation in vitro” (M. V. Wiles, Meth. Enzymol. 225: 900, 1993); “Properties and uses of Embryonic Stem Cells: Prospects for Application to Human Biology and Gene Therapy” (P. D. Rathjen et al., 1993)). Differentiation of stem cells is reviewed, e.g., in Robertson. 1997. Meth Cell Biol 75: 173; Roach and McNeish. 2002. Methods Mol Biol 185: 1-16; and Pedersen. 1998. Reprod Fertil Dev 10: 31. For further elaboration of general techniques useful in the practice of this invention, the practitioner can refer to standard textbooks and reviews in cell biology, tissue culture, and embryology (see, e.g., Culture of Human Stem Cells (R. Ian Freshney, Glyn N. Stacey, Jonathan M. Auerbach—2007); Protocols for Neural Cell Culture (Laurie C. Doering—2009); Neural Stem Cell Assays (Navjot Kaur, Mohan C. Vemuri—2015); Working with Stem Cells (Henning Ulrich, Priscilla Davidson Negraes—2016); and Biomaterials as Stem Cell Niche (Krishnendu Roy—2010)). In certain embodiments, stem cells are spontaneously differentiated or directed to differentiate (see, e.g., Amit and Itskovitz-Eldor, Derivation and spontaneous differentiation of human embryonic stem cells, J Anat. 2002 March; 200(3): 225-232). For further methods of cell culture solutions and systems, see International Patent Publication No. WO 2014/159356A1.

Induced Pluripotent Cells

In certain embodiments, iPSCs or iPSC cell lines are used to identify transcription factors for differentiation of target cells. iPSCs advantageously can be used to generate patient specific models and cell types. iPSCs are a type of pluripotent stem cell that can be generated directly from adult cells. Further, because embryonic stem cells can only be derived from embryos, it has so far not been feasible to create patient-matched embryonic stem cell lines.

Various strategies can be used to induce pluripotency, or increase potency, in cells (Takahashi, K., and Yamanaka, S., Cell 126, 663-676 (2006); Takahashi et al., Cell 131, 861-872 (2007); Yu et al., Science 318, 1917-1920 (2007); Zhou et al., Cell Stem Cell 4, 381-384 (2009); Kim et al., Cell Stem Cell 4, 472-476 (2009); Yamanaka et al., 2009; Saha, K., Jaenisch, R., Cell Stem Cell 5, 584-595 (2009)), and improve the efficiency of reprogramming (Shi et al., Cell Stem Cell 2, 525-528 (2008a); Shi et al., Cell Stem Cell 3, 568-574 (2008b); Huangfu et al., Nat Biotechnol 26, 795-797 (2008a); Huangfu et al., Nat Biotechnol 26, 1269-1275 (2008b); Silva et al., Plos Bio 6, e253. doi: 10.1371/journal.pbio.0060253 (2008); Lyssiotis et al., PNAS 106, 8912-8917 (2009); Ichida et al., Cell Stem Cell 5, 491-503 (2009); Maherali, N., Hochedlinger, K., Curr Biol 19, 1718-1723 (2009b); Esteban et al., Cell Stem Cell 6, 71-79 (2010); and Feng et al., Cell Stem Cell 4, 301-312 (2009)).

Generally, techniques for reprogramming involve modulation of specific cellular pathways, either directly or indirectly, using polynucleotide-, polypeptide-, and/or small molecule-based approaches (see, e.g., International Patent Publication No. WO 2012/087965A2). The developmental potency of a cell may be increased, for example, by contacting a cell with one or more pluripotency factors. “Contacting”, as used herein, can involve culturing cells in the presence of a pluripotency factor (such as, for example, small molecules, proteins, peptides, etc.) or introducing pluripotency factors into the cell. Pluripotency factors can be introduced into cells by culturing the cells in the presence of the factor, including transcription factors such as proteins, under conditions that allow for introduction of the transcription factor into the cell. See, e.g., Zhou H et al., Cell Stem Cell. 2009 May 8; 4(5):381-4; International Patent Publication No. WO 2009/117439. Introduction into the cell may be facilitated, for example, using transient methods, e.g., protein transduction, microinjection, non-integrating gene delivery, mRNA transduction, etc., or any other suitable technique. In some embodiments, the transcription factors are introduced into the cells by expression from a recombinant vector that has been introduced into the cell, or by incubating the cells in the presence of exogenous transcription factor polypeptides such that the polypeptides enter the cell. In particular embodiments, the pluripotency factor is a transcription factor.
Exemplary transcription factors that are associated with increasing, establishing, or maintaining the potency of a cell include, but are not limited to, Oct-3/4, Cdx-2, Gbx2, Gsh1, HesX1, HoxA10, HoxA11, HoxB1, Irx2, Isl1, Meis1, Meox2, Nanog, Nkx2.2, Onecut, Otx1, Otx2, Pax5, Pax6, Pdx1, Tcf1, Tcf2, Zfhx1b, Klf-4, Atbf1, Esrb, Gcnf, Jarid2, Jmjd1a, Jmjd2c, Klf-3, Klf-5, Mel-18, Myst3, Nac1, REST, Rex-1, Rybp, Sall4, Sall1, Tif1, YY1, Zeb2, Zfp281, Zfp57, Zic3, Coup-Tf1, Coup-Tf2, Bmi1, Rnf2, Mta1, Pias1, Pias2, Pias3, Piasy, Sox2, Lef1, Sox15, Sox6, Tcf-7, Tcf7l1, c-Myc, L-Myc, N-Myc, Hand1, Mad1, Mad3, Mad4, Mxi1, Myf5, Neurog2, Ngn3, Olig2, Tcf3, Tcf4, Foxc1, Foxd3, BAF155, C/EBPβ, mafa, Eomes, Tbx-3, Rfx4, Stat3, Stella, and UTF-1. Exemplary transcription factors include Oct4, Sox2, Klf4, c-Myc, and Nanog.

Small molecule reprogramming agents are also pluripotency factors and may be employed in the methods of the invention for inducing reprogramming and maintaining or increasing cell potency. In some embodiments of the invention, one or more small molecule reprogramming agents are used to induce pluripotency of a somatic cell, increase or maintain the potency of a cell, or improve the efficiency of reprogramming. In some embodiments, small molecule reprogramming agents are employed in the methods of the invention to improve the efficiency of reprogramming. Improvements in efficiency of reprogramming can be measured by (1) a decrease in the time required for reprogramming and generation of pluripotent cells (e.g., by shortening the time to generate pluripotent cells by at least a day compared to a similar or same process without the small molecule), and/or (2) an increase in the number of pluripotent cells generated by a particular process (e.g., increasing the number of cells reprogrammed in a given time period by at least 10%, 30%, 50%, 100%, 200%, 500%, etc. compared to a similar or same process without the small molecule). In some embodiments, a 2-fold to 20-fold improvement in reprogramming efficiency is observed. In some embodiments, reprogramming efficiency is improved by more than 20-fold. In some embodiments, a more than 100-fold improvement in efficiency is observed over the method without the small molecule reprogramming agent (e.g., a more than 100-fold increase in the number of pluripotent cells generated). Several classes of small molecule reprogramming agents may be important to increasing, establishing, and/or maintaining the potency of a cell.
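The two efficiency measures described above, (1) time saved and (2) increase in pluripotent cell number, reduce to simple arithmetic against a matched control run. The following is a minimal sketch under that reading; the record type `ReprogrammingRun`, the function `efficiency_gain`, and the numbers in the usage lines are hypothetical and illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ReprogrammingRun:
    """Outcome of one reprogramming run (hypothetical record type)."""
    days_to_pluripotency: float  # days until pluripotent cells are first obtained
    pluripotent_cells: int       # pluripotent cells obtained in a fixed time window


def efficiency_gain(treated: ReprogrammingRun, control: ReprogrammingRun) -> dict:
    """Compare a small-molecule-treated run with an otherwise identical control run."""
    if control.pluripotent_cells <= 0:
        raise ValueError("control run must yield at least one pluripotent cell")
    fold = treated.pluripotent_cells / control.pluripotent_cells
    return {
        # Measure (1): reduction in time to generate pluripotent cells, in days.
        "days_saved": control.days_to_pluripotency - treated.days_to_pluripotency,
        # Measure (2): increase in the number of pluripotent cells.
        "fold_change": fold,
        "percent_increase": (fold - 1.0) * 100.0,
    }


control = ReprogrammingRun(days_to_pluripotency=21, pluripotent_cells=40)
treated = ReprogrammingRun(days_to_pluripotency=18, pluripotent_cells=480)
print(efficiency_gain(treated, control))
# {'days_saved': 3, 'fold_change': 12.0, 'percent_increase': 1100.0}
```

In this illustrative case the 12-fold change falls within the 2-fold to 20-fold range recited above, and the three-day reduction satisfies the "at least a day" example for measure (1).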
Exemplary small molecule reprogramming agents include, but are not limited to: agents that inhibit H3K9 methylation or promote H3K9 demethylation; agents that inhibit H3K4 demethylation or promote H3K4 methylation; agents that inhibit histone deacetylation or promote histone acetylation; L-type Ca channel agonists; activators of the cAMP pathway; DNA methyltransferase (DNMT) inhibitors; nuclear receptor ligands; GSK3 inhibitors; MEK inhibitors; TGFβ receptor/ALK5 inhibitors; HDAC inhibitors; Erk inhibitors; ROCK inhibitors; FGFR inhibitors; and PARP inhibitors. Exemplary small molecule reprogramming agents include GSK3 inhibitors; MEK inhibitors; TGFβ receptor/ALK5 inhibitors; HDAC inhibitors; Erk inhibitors; and ROCK inhibitors.

In some embodiments of the invention, small molecule reprogramming agents are used to replace one or more transcription factors in the methods of the invention to induce pluripotency, improve the efficiency of reprogramming, and/or increase or maintain the potency of a cell. For example, in some embodiments, a cell is contacted with one or more small molecule reprogramming agents, wherein the agents are included in an amount sufficient to improve the efficiency of reprogramming. In other embodiments, one or more small molecule reprogramming agents are used in addition to transcription factors in the methods of the invention. In one embodiment, a cell is contacted with at least one pluripotency transcription factor and at least one small molecule reprogramming agent under conditions to increase, establish, and/or maintain the potency of the cell or improve the efficiency of the reprogramming process. In another embodiment, a cell is contacted with at least one pluripotency transcription factor and at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine, or at least ten small molecule reprogramming agents under conditions and for a time sufficient to increase, establish, and/or maintain the potency of the cell or improve the efficiency of reprogramming. The state of potency or differentiation of cells can be assessed by monitoring pluripotency characteristics (e.g., expression of markers including, but not limited to, SSEA-3, SSEA-4, TRA-1-60, TRA-1-81, TRA-2-49/6E, Oct-3/4, Sox2, Nanog, GDF3, REX1, FGF4, ESG1, DPPA2, DPPA4, and hTERT).

Introducing Transcription Factors

In certain embodiments, the screening platform may comprise an open reading frame (ORF) or cDNA encoding each transcription factor used in the screen (as used herein, the terms cDNA and ORF are used interchangeably). A cDNA may be synthesized and cloned into a vector. A plurality of cDNAs may be cloned into a library of vectors, such that each transcription factor is represented in the library. Representative transcription factor libraries are known in the art (see, e.g., Yang et al., 2011, “A public genome-scale lentiviral expression library of human ORFs” Nature Methods 8, 659-66; and portals.broadinstitute.org/gpp/public/).
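Whether "each transcription factor is represented in the library" can be checked by sequencing the pooled library and counting reads per ORF. The sketch below assumes reads have already been mapped to TF names upstream; the function `library_coverage` and the read counts are hypothetical. Skew is summarized here as the ratio of the most- to least-abundant represented TF, which is one simple choice among several.

```python
from collections import Counter


def library_coverage(expected_tfs, observed_reads):
    """Summarize how completely and evenly a pooled ORF library represents each TF.

    expected_tfs: iterable of TF names that should be present in the library.
    observed_reads: iterable of TF names, one per sequenced read already mapped
        back to an ORF (hypothetical pre-processed input).
    """
    counts = Counter(observed_reads)
    expected = set(expected_tfs)
    # TFs with no supporting reads are treated as missing from the library.
    missing = sorted(tf for tf in expected if counts[tf] == 0)
    present = [counts[tf] for tf in expected if counts[tf] > 0]
    # Skew: ratio of the most- to the least-abundant represented TF.
    skew = max(present) / min(present) if present else float("inf")
    return {"missing": missing, "skew": skew}


reads = ["NEUROG1"] * 50 + ["NEUROG2"] * 25 + ["ASCL1"] * 100
print(library_coverage(["NEUROG1", "NEUROG2", "ASCL1", "MYOD1"], reads))
# {'missing': ['MYOD1'], 'skew': 4.0}
```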

In certain embodiments, the screening platform may comprise an agent capable of overexpressing or modulating activity of endogenous transcription factors. In certain embodiments, the agent may be a CRISPR system. In certain embodiments, pluripotent cells are differentiated into target cells by introducing a CRISPR system targeting the endogenous loci encoding the transcription factors. In certain embodiments, the CRISPR system comprises a functional domain that is targeted to the endogenous loci encoding the transcription factors. The functional domain may be a transcriptional activator or repressor (see, e.g., Konermann et al. “Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex” Nature. 2014 Dec. 10. doi: 10.1038/nature14136; Qi, L. S., et al. (2013). “Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression”. Cell. 152 (5): 1173-83; and Gilbert, L. A., et al., (2013). “CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes”. Cell. 154 (2): 442-51). In certain embodiments, a functional domain is targeted to a genomic locus encoding a transcription factor using a guide sequence that includes one or more aptamer sequences. In particular embodiments, recruitment of the functional domain to the aptamer is achieved using adaptor protein/aptamer combinations drawn from the diversity of bacteriophage coat proteins. Examples of such coat proteins include but are not limited to: MS2, PP7, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ϕCb5, ϕCb8r, ϕCb12r, ϕCb23r, 7s and PRR1. In particular embodiments, the aptamer is a minimal hairpin aptamer which selectively binds dimerized MS2 bacteriophage coat proteins in mammalian cells and is introduced into the guide molecule, such as in the stemloop and/or in a tetraloop. In these embodiments, the functional domain is fused to MS2 (see, e.g., Konermann et al., Nature 2015, 517(7536): 583-588).

In certain embodiments, the arrayed screening platform can utilize multiwell plates to introduce individual transcription factors or an agent capable of modulating said transcription factors to populations of pluripotent cells. As used throughout the specification, reference to introducing transcription factors can refer to overexpressing the transcription factor from a vector or introducing an agent capable of modulating said transcription factor (e.g., CRISPR system targeting the transcription factor). Thus, each well of the multiwell plate may be configured for overexpression of a single transcription factor or combination of multiple transcription factors.
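An arrayed layout in which each well receives a single transcription factor or a combination can be represented as a mapping from well coordinates to TF tuples. This is a minimal sketch assuming a standard 8 × 12 (96-well) plate filled row-major; the function `plate_layout`, the TF choices, and the use of an empty tuple as an empty-vector control are hypothetical conventions.

```python
from itertools import product
from string import ascii_uppercase


def plate_layout(conditions, rows=8, cols=12):
    """Assign each condition (a tuple of one or more TFs) to a well, row-major.

    Returns a dict such as {"A1": ("NEUROG2",), "A2": ("ASCL1", "MYT1L")}.
    """
    # Wells in row-major order: A1..A12, B1..B12, ...
    wells = [f"{r}{c}" for r, c in product(ascii_uppercase[:rows], range(1, cols + 1))]
    if len(conditions) > len(wells):
        raise ValueError(f"{len(conditions)} conditions exceed {len(wells)} wells")
    return dict(zip(wells, (tuple(c) for c in conditions)))


# () stands for an empty-vector control well (hypothetical convention).
layout = plate_layout([("NEUROG2",), ("ASCL1", "MYT1L"), ()])
print(layout)
# {'A1': ('NEUROG2',), 'A2': ('ASCL1', 'MYT1L'), 'A3': ()}
```

Other plate formats (e.g., 384-well) only require different `rows` and `cols` values.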

In certain embodiments, transcription factors may be introduced to individual cells by nanowires (see, e.g., Shalek et al., Vertical silicon nanowires as a universal platform for delivering biomolecules into living cells, PNAS, 107(5):1870-1875, February 2010). This modality enables one to assess the phenotypic consequences of introducing a broad range of biological effectors (DNAs, RNAs, peptides, proteins, and small molecules) into almost any cell type. In certain embodiments, the nanowires may be configured in a microarray format. In certain embodiments, the microarray may be configured for overexpressing transcription factors in a site-specific fashion. In certain embodiments, the array may be coupled with live-cell imaging.

Vectors

In certain embodiments, vectors are used to overexpress or modulate expression of transcription factors. Vectors for introducing CRISPR systems are described further herein.

The term “vector” generally denotes a tool that allows or facilitates the transfer of an entity from one environment to another. More particularly, the term “vector” as used throughout this specification refers to nucleic acid molecules to which nucleic acid fragments (cDNA) may be inserted and cloned, i.e., propagated. Hence, a vector is typically a replicon, into which another nucleic acid segment may be inserted, such as to bring about the replication of the inserted segment in a defined host cell or vehicle organism.

A vector thus typically contains an origin of replication and other entities necessary for replication and/or maintenance in a host cell. A vector may typically contain one or more unique restriction sites allowing for insertion of nucleic acid fragments. A vector may also preferably contain a selection marker, such as, e.g., an antibiotic resistance gene or auxotrophic gene (e.g., URA3, which encodes an enzyme necessary for uracil biosynthesis or TRP1, which encodes an enzyme required for tryptophan biosynthesis), to allow selection of recipient cells that contain the vector. Vectors include, but are not limited to, nucleic acid molecules that are single-stranded, double-stranded, or partially double-stranded; nucleic acid molecules that comprise one or more free ends, no free ends (e.g., circular); nucleic acid molecules that comprise DNA, RNA, or both; and other varieties of polynucleotides known in the art.

Expression vectors are generally configured to allow for and/or effect the expression of nucleic acids (e.g., cDNA, CRISPR system) introduced thereto in a desired expression system, e.g., in vitro, in a host cell, host organ and/or host organism. For example, the vector can express nucleic acids functionally or operatively linked to regulatory element(s), such that the regulatory element(s) drive expression. The promoter(s) can be constitutive promoter(s) and/or conditional promoter(s) and/or inducible promoter(s). In certain embodiments, the vectors comprise regulatory sequences for inducible expression of cDNAs encoding transcription factors. Thus, expression of the transcription factors in cells can be induced at particular time points after introducing the vectors. Inducible expression systems are known in the art and may include, for example, Tet on/off systems (see, e.g., Gossen et al., Transcriptional activation by tetracyclines in mammalian cells. Science. 1995 Jun. 23; 268(5218):1766-9).

In certain example embodiments, the vectors disclosed herein may further encode an epitope tag in frame with the transcription factors for use in downstream assessment of protein expression and TF abundance in cell populations. Epitope tags provide high sensitivity and specificity in detection by specific antigen binding molecules (e.g., antibodies, aptamers). Exemplary epitope tags include, but are not limited to, Flag, CBP, GST, HA, HBH, MBP, Myc, polyHis, S-tag, SUMO, TAP, TRX, or V5.
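Keeping an epitope tag "in frame" means the fused coding sequence must remain a multiple of three nucleotides with no premature stop codon. The sketch below assumes a C-terminal tag (the text does not specify tag position), drops the ORF's native stop codon, appends the tag and a new TAA stop, and rejects any in-frame stop before the end. The function `c_terminal_fusion` and the toy ORF are hypothetical; the FLAG nucleotide sequence shown is one valid encoding of DYKDDDDK.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets starting at position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def c_terminal_fusion(orf, tag):
    """Fuse an epitope tag in frame to the C-terminus of an ORF (a sketch)."""
    orf, tag = orf.upper(), tag.upper()
    if not orf.startswith("ATG") or len(orf) % 3 or orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must start with ATG, end with a stop codon, and be a multiple of 3")
    if len(tag) % 3:
        raise ValueError("tag length must be a multiple of 3 to stay in frame")
    # Drop the native stop codon, append the tag, then add a new stop codon.
    fusion = orf[:-3] + tag + "TAA"
    internal = [i for i, c in enumerate(codons(fusion)[:-1]) if c in STOP_CODONS]
    if internal:
        raise ValueError(f"in-frame stop codon(s) at codon index {internal}")
    return fusion


FLAG = "GATTACAAGGATGACGACGATAAG"  # encodes DYKDDDDK
print(c_terminal_fusion("ATGGCTGCATAG", FLAG))
# ATGGCTGCAGATTACAAGGATGACGACGATAAGTAA
```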

Vectors may include, without limitation, plasmids (which refer to circular double stranded DNA loops which, in their vector form are not bound to the chromosome), episomes, phagemids, bacteriophages, bacteriophage-derived vectors, bacterial artificial chromosomes (BAC), yeast artificial chromosomes (YAC), P1-derived artificial chromosomes (PAC), transposons, cosmids, linear nucleic acids, viral vectors, etc., as appropriate. A vector can be a DNA or RNA vector. A vector can be a self-replicating extrachromosomal vector or a vector which integrates into a host genome, hence, vectors can be autonomous or integrative.

The term “viral vectors” refers to the use of viruses, or virus-associated vectors, as carriers of the nucleic acid construct into the cell. Constructs may be integrated and packaged into non-replicating, defective viral genomes like adenovirus, adeno-associated virus (AAV), or herpes simplex virus (HSV) or others, including retroviral and lentiviral vectors, for infection or transduction into cells. The vector may or may not be incorporated into the cell's genome. The constructs may include viral sequences for transfection, if desired. Alternatively, the construct may be incorporated into vectors capable of episomal replication, e.g., EPV and EBV vectors.

Methods for introducing nucleic acids, including vectors, expression cassettes and expression vectors, into cells (e.g., transfection, transduction or transformation) are known to the person skilled in the art, and may include calcium phosphate co-precipitation, electroporation, microinjection, protoplast fusion, lipofection, exosome-mediated transfection, transfection employing polyamine transfection reagents, bombardment of cells by nucleic acid-coated tungsten microprojectiles, viral particle delivery, etc.

Identification of Target Cells

In certain embodiments, differentiation of pluripotent cells is monitored. In certain embodiments, differentiation of pluripotent cells is monitored by microscopy. The screening method may further be combined with live cell imaging to monitor differentiation upon overexpression of transcription factors. The screening method may also be combined with FACS or ELISA assays to identify cells expressing markers specific for differentiated cell types. Additionally, methods of detecting target cell specific markers may include detecting reporter genes linked to marker genes, FISH, Flow-FISH, RNA sequencing, single cell RNA sequencing, quantitative RT-PCR, or western blot. In preferred embodiments, a pooled screen uses three different selection methods to enrich for cells that express one or more marker genes that define the target cell type: a reporter assay, Flow-FISH, and scRNA-seq. In preferred embodiments, each transcription factor is associated with a unique barcode sequence that can be detected using sequencing.
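Because each transcription factor carries a unique barcode, sequenced barcodes must be mapped back to TFs, and sequencing errors make exact matching alone too strict. A common approach, sketched here as an assumption rather than the method of this disclosure, accepts a read barcode within a small Hamming distance of exactly one known barcode and discards ambiguous reads. The whitelist contents and the function `assign_barcode` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read_bc, whitelist, max_mismatches=1):
    """Map an observed barcode to a TF, tolerating a few sequencing errors.

    whitelist: dict of barcode sequence -> TF name (hypothetical).
    Returns the TF name, or None if there is no unique match within
    max_mismatches.
    """
    if read_bc in whitelist:  # exact match
        return whitelist[read_bc]
    hits = [tf for bc, tf in whitelist.items()
            if len(bc) == len(read_bc) and hamming(bc, read_bc) <= max_mismatches]
    # Ambiguous (several hits) or unmatched reads are discarded.
    return hits[0] if len(hits) == 1 else None


whitelist = {"ACGTACGT": "NEUROG2", "TTGCAAGC": "ASCL1"}
print(assign_barcode("ACGTACGA", whitelist))  # one mismatch -> NEUROG2
print(assign_barcode("GGGGGGGG", whitelist))  # no close match -> None
```

Barcode sets are typically designed with a minimum pairwise distance so that single errors remain unambiguous; that design constraint is assumed here, not taken from the text.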

Reporter Genes

In certain embodiments, differentiated target cells can be identified and enriched from a pool of cells using a detectable marker (i.e., high throughput means to identify target cells). In certain embodiments, the pooled screening platform uses detectable markers associated with marker genes specific to target cells to identify transcription factors.
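Enriching reporter-positive cells usually requires a fluorescence gate that separates true signal from background. One simple convention, assumed here for illustration, places the gate at a high percentile of a reporter-negative control population (e.g., undifferentiated cells carrying the same reporter) and calls sample cells above it positive. The functions `nearest_rank_percentile` and `reporter_gate` and the simulated intensities are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of `values`, with pct in (0, 100]."""
    ordered = sorted(values)
    k = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(k, 1) - 1]


def reporter_gate(negative_control, sample, pct=99.0):
    """Set a gate from reporter-negative control cells and score a sample.

    Returns (gate, fraction of sample cells with fluorescence above the gate).
    """
    gate = nearest_rank_percentile(negative_control, pct)
    positive = [x for x in sample if x > gate]
    return gate, len(positive) / len(sample)


control = [float(i) for i in range(1, 101)]  # simulated background fluorescence
sample = [10.0, 50.0, 150.0, 200.0]          # simulated reporter fluorescence
print(reporter_gate(control, sample))
# (99.0, 0.5)
```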

In certain embodiments, the detectable marker is integrated into a genomic locus in the pool of cells such that the detectable marker is under control of the regulatory sequences for a target cell specific marker gene. In other words, a polynucleotide sequence encoding a detectable marker is integrated into a genomic locus encoding a marker gene, such that the marker gene and detectable marker are under control of the regulatory sequences for the marker gene and, upon activation of the marker gene, the detectable marker is co-expressed. In certain embodiments, the marker gene and detectable marker are expressed as separate proteins to prevent the detectable marker from interfering with proper protein folding and function of the marker gene. Thus, the detectable marker can be used to monitor activation of the marker gene to indicate differentiation into a target cell type. Accordingly, the present invention also provides for a population of pluripotent cells comprising a detectable marker integrated into an endogenous marker gene specific for a target cell.

Integration of the detectable marker gene at a genomic locus can be performed using known methods in the art. In certain embodiments, a donor construct is used to integrate a polynucleotide sequence encoding the detectable marker. In certain embodiments, the donor construct may comprise a nucleotide sequence encoding: a detectable marker, and optionally, a resistance gene operably linked to a separate regulatory sequence. Cells having the donor construct integrated can be selected based on fluorescence of the detectable marker. Cells having the donor construct integrated can be selected based on selection of cells expressing the resistance gene. The cells can be further selected by determining the integration site of the donor construct.

Selectable markers are known in the art and enable screening for targeted integrations. Examples of selectable markers include, but are not limited to, antibiotic resistance genes, such as beta-lactamase, neo, FabI, URA3, cam, tet, blasticidin, hyg, puromycin and the like. A selectable marker useful in accordance with the invention may be any selectable marker appropriate for use in a eukaryotic cell, such as a mammalian cell, or more specifically a human cell. One of skill in the art will understand and be able to identify and use selectable markers in accordance with the invention.

In certain embodiments, the donor construct is a plasmid, vector, PCR product, or synthesized polynucleotide sequence. In certain embodiments, the donor construct is modified to increase stability or to increase efficiency of integration into a genomic locus. In certain embodiments, the donor construct is modified by a 5′ and/or 3′ phosphorylation modification. In certain embodiments, the donor construct is modified by one or more internal or terminal PTO modifications. Phosphorothioate (PTO) modifications are used to generate nuclease resistant oligonucleotides. In PTO oligonucleotides, a non-bridging oxygen is replaced by a sulfur atom. Therefore, PTOs are also known as “S-oligos”. Phosphorothioate can be introduced to an oligonucleotide at the 5′- or 3′-end to inhibit exonuclease degradation and internally to limit attack by endonucleases. In certain embodiments, the donor construct is obtained using PCR amplification and the 5′ phosphorylation is introduced using 5′ phosphorylated primers.

In certain embodiments, a genetic modifying agent is used to target the donor construct sequence to the correct genomic location (e.g., CRISPR, TALEN, Zinc finger protein, meganuclease).

In certain embodiments, a method of tagging genes in cells uses a donor template having homology arms that can be integrated at a target locus in the genome of a cell using homology-dependent repair mechanisms. In certain embodiments, a method of tagging genes in cells uses a generic donor template that can be integrated at any target locus in the genome of a cell using homology-independent repair mechanisms. In certain embodiments, gene tagging uses a CRISPR system. In certain embodiments, gene tagging uses a system that alleviates the need for homology templates. Previous reports using zinc-finger nucleases, TALE effector nucleases or CRISPR-Cas9 technology have shown that plasmids containing an endonuclease cleavage site can be integrated in a homology-independent manner, and any of these methods may be used for constructing the tagged pluripotent population of cells of the present invention (see, e.g., Lackner, D. H. et al. A generic strategy for CRISPR-Cas9-mediated gene tagging. Nat. Commun. 6:10237 doi: 10.1038/ncomms10237 (2015); Auer, et al., Highly efficient CRISPR/Cas9-mediated knock-in in zebrafish by homology-independent DNA repair. Genome Res. 24, 142-153 (2014); Maresca, et al., Obligate ligation-gated recombination (ObLiGaRe): custom-designed nuclease-mediated targeted integration through nonhomologous end joining. Genome Res. 23, 539-546 (2013); and Cristea, S. et al., In vivo cleavage of transgene donors promotes nuclease-mediated targeted integration. Biotechnol. Bioeng. 110, 871-880 (2013)).
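For the homology-dependent route, a donor template places the payload (e.g., a 2A-linked detectable marker) between sequences copied from either side of the intended insertion site. The sketch below shows only that assembly step; the function `hdr_donor`, the toy locus, and the default arm length of 40 nt are hypothetical placeholders, as suitable arm lengths depend on donor format and are not specified in the text.

```python
def hdr_donor(locus, insert_at, payload, arm_len=40):
    """Build a donor: left homology arm + payload + right homology arm.

    locus: genomic sequence surrounding the target site (hypothetical input).
    insert_at: 0-based position in `locus` where the payload should be placed,
        e.g. immediately 5' of the marker gene's stop codon for a C-terminal
        knock-in.
    arm_len: length of each homology arm (placeholder value).
    """
    if insert_at - arm_len < 0 or insert_at + arm_len > len(locus):
        raise ValueError("locus too short for the requested homology arm length")
    left = locus[insert_at - arm_len:insert_at]   # sequence 5' of the insertion site
    right = locus[insert_at:insert_at + arm_len]  # sequence 3' of the insertion site
    return left + payload + right


print(hdr_donor("AAAACCCCGGGGTTTT", insert_at=8, payload="NNNN", arm_len=4))
# CCCCNNNNGGGG
```

A homology-independent donor, by contrast, would carry a nuclease cleavage site instead of locus-specific arms, so the same generic donor could be used at any target locus.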

In certain embodiments, cells are tagged by introducing a ribonucleoprotein complex (RNP) comprising a donor sequence, guide sequences targeting a genomic locus and a CRISPR system. Delivery of CRISPR RNP complexes is described further herein. For example, the RNP complexes may be delivered to a population of cells by transfection.

In certain embodiments, the detectable marker is integrated downstream of the marker gene. In certain embodiments, the detectable marker is integrated upstream of the marker gene.

In certain embodiments, the detectable marker is separated from the marker gene by a ribosomal skipping site. Ribosomal ‘skipping’ refers to generating more than one protein during translation where a specific sequence in the nascent peptide chain prevents the ribosome from creating the peptide bond with the next proline. Translation continues and gives rise to a second chain. This mechanism results in apparent co-translational cleavage of the polyprotein. This process is induced by a ‘2A-like’, or CHYSEL (cis-acting hydrolase element) sequence. In other words, a normal peptide bond is impaired at the site, resulting in two discontinuous protein fragments from one translation event.
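At the sequence level, ribosomal skipping at a 2A-like element can be modeled as splitting the translated polyprotein between the glycine and the proline of the conserved C-terminal NPG|P motif: the upstream protein keeps the 2A residues up to the glycine, and the downstream protein begins with proline. The sketch below assumes complete skipping (real efficiency is below 100%) and uses the porcine teschovirus-1 P2A peptide with a GSG linker as an example; the function `ribosomal_skip` and the short upstream and downstream peptides are hypothetical placeholders.

```python
def ribosomal_skip(polyprotein, motif="NPGP"):
    """Split a translated polyprotein at each 2A-like NPG|P skip site.

    Assumes every skip site is used; in practice some read-through occurs.
    """
    pieces, start = [], 0
    i = polyprotein.find(motif)
    while i != -1:
        cut = i + 3  # the skip falls between the glycine and the proline
        pieces.append(polyprotein[start:cut])
        start = cut
        i = polyprotein.find(motif, cut)
    pieces.append(polyprotein[start:])
    return pieces


P2A = "ATNFSLLKQAGDVEENPGP"
# Placeholder marker-gene fragment + GSG linker + P2A + placeholder reporter fragment.
polyprotein = "MKTAYIAK" + "GSG" + P2A + "MVSKGEE"
print(ribosomal_skip(polyprotein))
# ['MKTAYIAKGSGATNFSLLKQAGDVEENPG', 'PMVSKGEE']
```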

In certain embodiments, the detectable marker is a fluorescent protein such as green fluorescent protein (GFP), enhanced green fluorescent protein (EGFP), red fluorescent protein (RFP), blue fluorescent protein (BFP), cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), miRFP (e.g., miRFP670, see, Shcherbakova, et al., Nat Commun. 2016; 7: 12405), mCherry, tdTomato, DsRed-Monomer, DsRed-Express, DsRed-Express2, DsRed2, AsRed2, mStrawberry, mPlum, mRaspberry, HcRed1, E2-Crimson, mOrange, mOrange2, mBanana, ZsYellow1, TagBFP, mTagBFP2, Azurite, EBFP2, mKalama1, Sirius, Sapphire, T-Sapphire, ECFP, Cerulean, SCFP3A, mTurquoise, mTurquoise2, monomeric Midoriishi-Cyan, TagCFP, mTFP1, Emerald, Superfolder GFP, Monomeric Azami Green, TagGFP2, mUKG, mWasabi, Clover, mNeonGreen, Citrine, Venus, SYFP2, TagYFP, Monomeric Kusabira-Orange, mKOk, mKO2, mTangerine, mApple, mRuby, mRuby2, HcRed-Tandem, mKate2, mNeptune, NiFP, mKeima Red, LSS-mKate1, LSS-mKate2, mBeRFP, PA-GFP, PAmCherry1, PATagRFP, TagRFP657, IFP1.2, iRFP, Kaede (green), Kaede (red), KikGR1 (green), KikGR1 (red), PS-CFP2, mEos2 (green), mEos2 (red), mEos3.2 (green), mEos3.2 (red), PSmOrange, Dronpa, Dendra2, Timer, AmCyan1, or a combination thereof. In certain embodiments, the detectable marker is a cell surface marker. In other instances, the cell surface marker is a marker not normally expressed on the cells, such as a truncated nerve growth factor receptor (tNGFR), a truncated epidermal growth factor receptor (tEGFR), CD8, truncated CD8, CD19, truncated CD19, a variant thereof, a fragment thereof, a derivative thereof, or a combination thereof.

In certain embodiments, the signal of the detectable marker may be enhanced by using a fluorescently labeled antibody, antibody fragment, nanobody, or aptamer. The binding agent may be specific to the detectable marker.

Flow-FISH

In certain embodiments, Flow-FISH (flow cytometry combined with fluorescent in situ hybridization) is used to identify target cells in transcription factor screens. Flow-FISH is a cytogenetic technique to quantify the copy number of RNA or specific repetitive elements in genomic DNA of whole cell populations via the combination of flow cytometry with cytogenetic fluorescent in situ hybridization staining protocols (see, e.g., C. P. Fulco et al., Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat Genet 51, 1664-1669 (2019); and Coillard A, Segura E. Visualization of RNA at the Single Cell Level by Fluorescent in situ Hybridization Coupled to Flow Cytometry. Bio Protoc. 2018; 8(12):e2892). The method provides for detecting marker genes indicative of differentiation into target cells using gene-specific FISH probes and sorting the cells. In certain embodiments, multiple markers are used to increase specificity. Selecting for multiple reporter genes at the same time can narrow the selection to the intended target cell type because, depending on the target cell type, a single marker gene may not be sufficiently specific. Additionally, the assay is versatile in that reporter genes can be added or changed by applying different probes. Flow-FISH combines FISH, to fluorescently label mRNA of reporter genes, with flow cytometry (see, e.g., Arrigucci et al., FISH-Flow, a protocol for the concurrent detection of mRNA and protein in single cells using fluorescence in situ hybridization and flow cytometry, Nat Protoc. 2017 June; 12(6):1245-1260. doi:10.1038/nprot.2017.039). In certain embodiments, the mRNA of reporter genes is fluorescently labeled; target cells are selected by flow cytometry; and TF barcodes are sequenced (e.g., amplified and then sequenced) to identify TFs enriched in the target cells. In certain embodiments, the marker genes are selected such that they are specifically expressed only in the target cell.
In this way, false positive selection or background is avoided. In certain embodiments, the assay is optimized to remove background fluorescence and to select for true positive cells.
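Identifying "TFs enriched in the target cells" amounts to comparing each TF barcode's frequency in the sorted population with its frequency in the unsorted input. The sketch below computes a log2 fold enrichment with a pseudocount to avoid division by zero; the pseudocount of 1, the function `log2_enrichment`, and the counts are illustrative assumptions rather than the statistical procedure of this disclosure.

```python
import math


def log2_enrichment(sorted_counts, input_counts, pseudocount=1.0):
    """Log2 fold enrichment of each TF barcode in sorted vs. input cells.

    sorted_counts / input_counts: {tf_name: barcode read count}.
    Counts are converted to frequencies after adding a pseudocount to every TF.
    """
    tfs = sorted(set(sorted_counts) | set(input_counts))
    n = len(tfs)
    s_total = sum(sorted_counts.values()) + pseudocount * n
    i_total = sum(input_counts.values()) + pseudocount * n
    return {
        tf: math.log2(((sorted_counts.get(tf, 0) + pseudocount) / s_total)
                      / ((input_counts.get(tf, 0) + pseudocount) / i_total))
        for tf in tfs
    }


sorted_cells = {"TF_A": 199, "TF_B": 49, "TF_C": 49}  # marker-positive sort
input_cells = {"TF_A": 99, "TF_B": 99, "TF_C": 99}    # unsorted population
print(log2_enrichment(sorted_cells, input_cells))
```

With these illustrative counts, TF_A comes out about two-fold enriched (log2 of roughly +1) and the other two TFs about two-fold depleted; in practice replicate sorts would be used to judge whether such differences are reproducible.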

Single Cell RNA-seq

In certain embodiments, the invention provides for identifying transcription factors whose overexpression can differentiate stem cells or progenitor cells into target cells by using single cell sequencing methods. In certain embodiments, transcription factors are introduced to a population of cells and single cells are analyzed by single cell sequencing. The population of cells may be analyzed with or without an integrated detectable marker. The introduced transcription factors can be identified in cells having a gene signature or biological program of interest (e.g., signature characteristic of the target cell). As used herein a “signature” may encompass any gene or genes, protein or proteins, or epigenetic element(s) whose expression profile or whose occurrence is associated with a specific cell type, subtype, or cell state of a specific cell type or subtype within a population of cells. A gene signature as used herein, may thus refer to any set of up- and down-regulated genes that are representative of a cell type or subtype or cell state. In certain embodiments, transcription factors are introduced at a high MOI to identify combinations of transcription factors capable of inducing a signature or biological program characteristic of the target cell of interest.
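Since a gene signature is a set of up- and down-regulated genes, a simple way to ask whether a cell carries the target signature is to score it as the mean expression of the up-regulated genes minus the mean of the down-regulated genes. This is a minimal sketch of that one scoring choice; the function `signature_score`, the marker genes, and the expression values are hypothetical, and published scoring methods often add background correction.

```python
from statistics import mean


def signature_score(expr, up_genes, down_genes=()):
    """Score one cell against a gene signature: mean(up) - mean(down).

    expr: {gene: normalized expression} for a single cell; genes absent
        from `expr` are treated as not detected (0.0).
    """
    up = mean(expr.get(g, 0.0) for g in up_genes)
    down = mean(expr.get(g, 0.0) for g in down_genes) if down_genes else 0.0
    return up - down


# Illustrative neuronal-like cell: neuronal markers up, a pluripotency marker down.
cell = {"MAP2": 3.0, "TUBB3": 5.0, "POU5F1": 1.0}
print(signature_score(cell, up_genes=["MAP2", "TUBB3"], down_genes=["POU5F1"]))
# 3.0
```

Cells scoring above a chosen threshold could then be grouped, and the TFs detected in them compared with the TFs detected in low-scoring cells.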

The transcription factors introduced may be identified by a barcode associated with each transcription factor. The barcode may be expressed on a transcript capable of identification by RNA-seq (e.g., a poly-A tailed transcript including the barcode sequence). In certain embodiments, single cells can be analyzed for a target cell phenotype or target cell subtypes after introducing transcription factors identified by the screening methods described herein. Thus, single cell sequencing may be used for identification of transcription factors and for analysis of cells differentiated by overexpressing transcription factors.
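Once barcode transcripts are captured alongside each cell's transcriptome, each cell must be assigned the TF or TFs it received; at high MOI a cell may carry several. The sketch below calls a TF for a cell when its barcode has at least a minimum number of UMIs and at least a minimum fraction of that cell's barcode UMIs. Both thresholds, the input layout, and the function `call_tfs` are hypothetical choices for illustration.

```python
def call_tfs(cell_barcode_umis, min_umis=3, min_fraction=0.1):
    """Assign TFs to each cell from TF-barcode UMI counts.

    cell_barcode_umis: {cell_id: {tf_name: umi_count}} (hypothetical input).
    A cell may receive several TFs, as expected at high MOI.
    """
    calls = {}
    for cell, umis in cell_barcode_umis.items():
        total = sum(umis.values())
        calls[cell] = sorted(
            tf for tf, n in umis.items()
            # Require both an absolute UMI floor and a share of the cell's UMIs.
            if n >= min_umis and total and n / total >= min_fraction
        )
    return calls


counts = {
    "cell1": {"NEUROG2": 20, "ASCL1": 1},  # ASCL1 below the UMI floor
    "cell2": {"NEUROG2": 8, "ASCL1": 6},   # two TFs called (high MOI)
    "cell3": {},                           # no barcode detected
}
print(call_tfs(counts))
# {'cell1': ['NEUROG2'], 'cell2': ['ASCL1', 'NEUROG2'], 'cell3': []}
```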

In certain embodiments, the invention involves single cell RNA sequencing (see, e.g., Kalisky, T., Blainey, P. & Quake, S. R. Genomic Analysis at the Single-Cell Level. Annual review of genetics 45, 431-445, (2011); Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314 (2011); Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, (2011); Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nature Protocols 5, 516-535, (2010); Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods 6, 377-382, (2009); Ramskold, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nature Biotechnology 30, 777-782, (2012); and Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports, Volume 2, Issue 3, p 666-673, 2012).

In certain embodiments, the invention involves plate based single cell RNA sequencing (see, e.g., Picelli, S. et al., 2014, “Full-length RNA-seq from single cells using Smart-seq2” Nature protocols 9, 171-181, doi:10.1038/nprot.2014.006).

In certain embodiments, the invention involves high-throughput single-cell RNA-seq. In this regard reference is made to Macosko et al., 2015, “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets” Cell 161, 1202-1214; International Patent Application No. PCT/US2015/049178, published as International Patent Publication No. WO 2016/040476 on Mar. 17, 2016; Klein et al., 2015, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells” Cell 161, 1187-1201; International Patent Application No. PCT/US2016/027734, published as International Patent Publication No. WO 2016/168584A1 on Oct. 20, 2016; Zheng, et al., 2016, “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotechnology 34, 303-311; Zheng, et al., 2017, “Massively parallel digital transcriptional profiling of single cells” Nat. Commun. 8, 14049 doi: 10.1038/ncomms14049; International Patent Publication No. WO 2014/210353A2; Zilionis, et al., 2017, “Single-cell barcoding and sequencing using droplet microfluidics” Nat Protoc. January; 12(1):44-73; Cao et al., 2017, “Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/104844; Rosenberg et al., 2017, “Scaling single cell transcriptomics through split pool barcoding” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/105163; Rosenberg et al., “Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding” Science 15 Mar. 2018; Vitak, et al., “Sequencing thousands of single-cell genomes with combinatorial indexing” Nature Methods, 14(3):302-308, 2017; Cao, et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science, 357(6352):661-667, 2017; Gierahn et al., “Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput” Nature Methods 14, 395-398 (2017); and Hughes, et al., “Highly Efficient, Massively-Parallel Single-Cell RNA-Seq Reveals Cellular States and Molecular Features of Human Skin Pathology” bioRxiv 689273; doi: doi.org/10.1101/689273, all the contents and disclosure of each of which are herein incorporated by reference in their entirety.

In certain embodiments, the invention involves single nucleus RNA sequencing. In this regard reference is made to Swiech et al., 2014, “In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9” Nature Biotechnology Vol. 33, pp. 102-106; Habib et al., 2016, “Div-Seq: Single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons” Science, Vol. 353, Issue 6302, pp. 925-928; Habib et al., 2017, “Massively parallel single-nucleus RNA-seq with DroNc-seq” Nat Methods. 2017 October; 14(10):955-958; International Patent Application No. PCT/US2016/059239, published as WO 2017/164936 on Sep. 28, 2017; Patent Application No. PCT/US2018/060860, published as WO 2019/094984 on May 16, 2019; Patent Application No. PCT/US2019/055894, published as WO 2020/077236 on Apr. 16, 2020; and Drokhlyansky, et al., “The enteric nervous system of the human and mouse colon at a single-cell resolution,” bioRxiv 746743; doi: doi.org/10.1101/746743, which are herein incorporated by reference in their entirety.

In certain embodiments, the invention involves the Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq) (see, e.g., Buenrostro, et al., Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods 2013; 10 (12): 1213-1218; Buenrostro et al., Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490 (2015); Cusanovich, D. A., Daza, R., Adey, A., Pliner, H., Christiansen, L., Gunderson, K. L., Steemers, F. J., Trapnell, C. & Shendure, J. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science. 2015 May 22; 348(6237):910-4. doi: 10.1126/science.aab1601. Epub 2015 May 7; US20160208323A1; US20160060691A1; and WO2017156336A1).

In certain embodiments, the invention involves single cell multimodal data. For a review of single-cell multiomic technologies, see, e.g., Lee J, Hyeon D Y, Hwang D. Single-cell multiomics: technologies and data analysis methods. Exp Mol Med. 2020; 52(9):1428-1442. doi:10.1038/s12276-020-0420-2. In certain embodiments, SHARE-Seq (Ma, S. et al. Chromatin potential identified by shared single cell profiling of RNA and chromatin. bioRxiv 2020.06.17.156943 (2020) doi:10.1101/2020.06.17.156943) is used to generate single cell RNA-seq and chromatin accessibility data. In certain embodiments, CITE-seq (Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865-868 (2017)) is used to generate single cell RNA-seq and cellular protein (epitope) data. In certain embodiments, Patch-seq (Cadwell, C. R. et al. Electrophysiological, transcriptomic and morphologic profiling of single neurons using Patch-seq. Nat. Biotechnol. 34, 199-203 (2016)) is used to generate, from single neurons, RNA-seq data together with patch-clamp electrophysiological recordings and morphological data (see, e.g., van den Hurk, et al., Patch-Seq Protocol to Analyze the Electrophysiology, Morphology and Transcriptome of Whole Single Neurons Derived From Human Pluripotent Stem Cells, Front Mol Neurosci. 2018; 11: 261).

Transcription Factor Modules

In example embodiments, the invention provides for identifying transcription factors whose overexpression can differentiate stem cells or progenitor cells into target cells by using single cell sequencing methods. In example embodiments, selecting cells further comprises grouping one or more of the transcription factors into modules that alter expression of the same gene programs, such that transcription factors in the same modules are co-functional (i.e., function in similar pathways or have similar functions). As used herein the term “gene program” or “program” can be used interchangeably with “biological program”, “expression program”, “transcriptional program”, or “expression profile” and may refer to a set of genes that share a role in a biological function (e.g., an activation program, cell differentiation program, proliferation program). Biological programs can include a pattern of gene expression that results in a corresponding physiological event or phenotypic trait. Biological programs can include up to several hundred genes that are expressed in a spatially and temporally controlled fashion. Expression of individual genes can be shared between biological programs. Expression of individual genes can be shared among different single cell types; however, expression of a biological program may be cell type specific or temporally specific (e.g., the biological program is expressed in a cell type at a specific time). Multiple biological programs may include the same gene, reflecting the gene's roles in different processes. Expression of a biological program may be regulated by a master switch, such as a transcription factor or chromatin modifier. As used herein, the term “topic” refers to a biological program. The biological program can be modeled as a distribution over expressed genes.
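By way of non-limiting illustration, the grouping of transcription factors into modules can be sketched as follows, assuming each TF has already been summarized by a vector of effect sizes on gene programs (e.g., the mean change in program usage in cells overexpressing that TF relative to control cells). TFs whose effect vectors are strongly correlated are linked, and linked TFs form a module. The Pearson correlation metric, the 0.8 threshold, the single-linkage grouping, and the TF names `TF_A` to `TF_D` are all hypothetical choices for this sketch, not a module-calling procedure specified herein.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def tf_modules(effects, threshold=0.8):
    """Group TFs whose program-effect vectors correlate at or above `threshold`.

    effects: dict mapping TF name -> list of effect sizes, one per gene program.
    Single-linkage via union-find: modules are connected components of the
    "correlated above threshold" graph.
    """
    tfs = sorted(effects)
    parent = {tf: tf for tf in tfs}

    def find(tf):
        while parent[tf] != tf:
            parent[tf] = parent[parent[tf]]  # path halving
            tf = parent[tf]
        return tf

    for i, a in enumerate(tfs):
        for b in tfs[i + 1:]:
            if pearson(effects[a], effects[b]) >= threshold:
                parent[find(a)] = find(b)

    modules = {}
    for tf in tfs:
        modules.setdefault(find(tf), []).append(tf)
    return sorted(modules.values())


# Hypothetical effect sizes on four gene programs.
effects = {
    "TF_A": [1.0, 2.0, 0.0, 0.0],
    "TF_B": [2.0, 4.0, 0.1, 0.0],
    "TF_C": [0.0, 0.0, 3.0, 1.0],
    "TF_D": [0.0, 0.1, 2.5, 1.0],
}
print(tf_modules(effects))  # [['TF_A', 'TF_B'], ['TF_C', 'TF_D']]
```

Single linkage keeps the sketch short; other clustering schemes (e.g., hierarchical clustering with average linkage) may equally be used.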

One method to identify cell programs is non-negative matrix factorization (NMF) (see, e.g., Lee D D and Seung H S, Learning the parts of objects by non-negative matrix factorization, Nature. 1999 Oct. 21; 401(6755):788-91). As an alternative, a generative model based on latent Dirichlet allocation (LDA) (Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. J Mach Learn Res 3, 993-1022), also referred to as “topic modeling”, may be created. Topic modeling is a statistical data mining approach for discovering the abstract topics that explain the words occurring in a collection of text documents. Originally developed to discover key semantic topics reflected by the words used in a corpus of documents (Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407), topic modeling can be used to explore gene programs (“topics”) in each cell (“document”) based on the distribution of genes (“words”) expressed in the cell. A gene can belong to multiple programs, and its relative relevance in a topic is reflected by a weight. A cell is then represented as a weighted mixture of topics, where the weights reflect the importance of the corresponding gene program in the cell. Topic modeling using LDA has recently been applied to scRNA-seq data (see, e.g., Bielecki, Riesenfeld, Kowalczyk, et al., 2018 Skin inflammation driven by differentiation of quiescent tissue-resident ILCs into a spectrum of pathogenic effectors. bioRxiv 461228; and duVerle, D. A., Yotsukura, S., Nomura, S., Aburatani, H., and Tsuda, K. (2016). CellTree: an R/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data. BMC Bioinformatics 17, 363). Other approaches include word embeddings. Because each cell is represented as a mixture of shared programs rather than assigned to a single discrete cluster, identifying cell programs can recover cell states and relate cells that differ only in the relative usage of those programs. Indeed, single cell types may span a range of continuous cell states (see, e.g., Shekhar et al., Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics Cell. 2016 Aug. 25; 166(5):1308-1323.e30; and Bielecki, et al., 2018).
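By way of non-limiting illustration, NMF-based program discovery can be sketched with the multiplicative update rules of Lee and Seung for the squared (Frobenius) reconstruction error, assuming NumPy is available and that the input is a non-negative cells x genes expression matrix. The factor `W` gives the usage of each program in each cell and `H` gives the weight of each gene in each program; normalizing the rows of `W` yields a topic-like mixture per cell. The function name, iteration count, and toy data are hypothetical; production analyses would typically also normalize counts and select the number of programs. An LDA model (e.g., scikit-learn's `LatentDirichletAllocation`) could be substituted where a probabilistic topic model is preferred.

```python
import numpy as np


def nmf_programs(counts, n_programs, n_iter=500, seed=0, eps=1e-10):
    """Approximate a non-negative cells x genes matrix X as W @ H.

    W (cells x programs): usage of each program in each cell.
    H (programs x genes): weight of each gene in each program.
    Lee & Seung multiplicative updates minimizing ||X - W H||_F^2.
    """
    X = np.asarray(counts, dtype=float)
    if (X < 0).any():
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_programs)) + eps
    H = rng.random((n_programs, X.shape[1])) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each cell's program usage sums to 1 (a topic-like mixture).
    usage = W / (W.sum(axis=1, keepdims=True) + eps)
    return W, H, usage


# Toy data: 30 cells x 20 genes built from 2 hypothetical programs.
rng = np.random.default_rng(1)
X = rng.random((30, 2)) @ rng.random((2, 20))
W, H, usage = nmf_programs(X, n_programs=2, n_iter=2000)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```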

Pseudotime

In example embodiments, the invention provides for identifying transcription factors whose overexpression can differentiate stem cells or progenitor cells into target cell types by using single cell sequencing methods. In example embodiments, selecting cells further comprises inferring pseudotime distribution of cells by comparing expression profiles of single cells overexpressing one or more of the transcription factors to those overexpressing controls (e.g., empty vector not expressing a transcription factor or a vector overexpressing a control protein), wherein transcription factors that increase pseudotimes direct differentiation. The methods of the invention can use any trajectory inference (TI) method (see, e.g., Cao J, Spielmann M, Qiu X, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019; 566(7745):496-502; Chen H, Albergante L, Hsu J Y, et al. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nat Commun. 2019; 10(1):1903; and Van den Berge K, Roux de Bézieux H, Street K, et al. Trajectory-based differential expression analysis for single-cell sequencing data. Nat Commun. 2020; 11(1):1201).
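By way of non-limiting illustration, once a pseudotime has been inferred for each cell, the comparison between cells overexpressing a transcription factor and cells overexpressing a control can be summarized by the probability that a TF cell lies later in pseudotime than a control cell (the Mann-Whitney U statistic scaled to [0, 1], where 0.5 indicates no shift) and by the shift in median pseudotime. The function name and pseudotime values below are hypothetical, and a significance test (e.g., `scipy.stats.mannwhitneyu` or a permutation test) would typically accompany the effect size before concluding that a TF increases pseudotime.

```python
from statistics import median


def pseudotime_shift(tf_pseudotimes, control_pseudotimes):
    """Compare pseudotimes of TF-overexpressing cells with control cells.

    Returns (auc, median_shift):
      auc          = P(TF cell > control cell) + 0.5 * P(tie), i.e. the
                     Mann-Whitney U statistic divided by n_tf * n_control;
                     values above 0.5 indicate later pseudotimes for the TF.
      median_shift = median(TF pseudotimes) - median(control pseudotimes).
    """
    if not tf_pseudotimes or not control_pseudotimes:
        raise ValueError("both groups need at least one cell")
    wins = 0.0
    for x in tf_pseudotimes:
        for y in control_pseudotimes:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    auc = wins / (len(tf_pseudotimes) * len(control_pseudotimes))
    return auc, median(tf_pseudotimes) - median(control_pseudotimes)


# Hypothetical pseudotimes for one TF and the control vector.
auc, shift = pseudotime_shift([0.6, 0.7, 0.8], [0.1, 0.2, 0.3])
print(auc, round(shift, 3))  # 1.0 0.5
```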

Cellular processes, such as cell differentiation and cell maturation, are dynamic in nature and not always well described by discrete analysis like clustering. Therefore, other methods such as single-cell trajectory inference and pseudotime estimation have emerged. These methods allow one to study cellular dynamics, delineate cell developmental lineages, and characterize the transition between different cell states. Briefly, single cells are ordered along deterministic or probabilistic trajectories and a numeric value referred to as pseudotime is assigned to each cell to indicate how far it has progressed along a dynamic process of interest. Cell trajectory analysis, also known as pseudo-time series (pseudotime) analysis, uses single cell gene expression profiles to order individual cells in pseudotime, placing each cell at the trajectory position corresponding to its progress through a biological process, such as cell differentiation, by exploiting the fact that individual cells progress through such processes asynchronously. Most TI methods share a common workflow: dimensionality reduction followed by inference of lineages and pseudotimes in the reduced dimensional space. In that reduced dimensional space, a cell's pseudotime for a given lineage is the distance, along the lineage, between the cell and the origin of the lineage. For cells overexpressing TFs, the origin is defined using cells overexpressing controls.
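By way of non-limiting illustration, the simplest instance of this workflow treats a single lineage as a straight line in the reduced dimensional space, running from the centroid of control-overexpressing cells (the origin) to the centroid of a hypothetical terminal cell population. A cell's pseudotime is then the distance of its projection along that line from the origin. The TI methods cited above fit curved or branching trajectories; the straight-line lineage, the clipping of negative values to zero, the function names, and the toy coordinates are assumptions made for this sketch only.

```python
import math


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def linear_pseudotime(cells, control_cells, terminal_cells):
    """Pseudotime as distance along a straight lineage from the control origin.

    All inputs are coordinates in a reduced space (e.g., the first few
    principal components). Cells projecting behind the origin get 0.
    """
    origin = centroid(control_cells)
    end = centroid(terminal_cells)
    direction = [e - o for e, o in zip(end, origin)]
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0:
        raise ValueError("origin and terminal centroids coincide")
    unit = [d / length for d in direction]
    return [
        max(0.0, sum((c - o) * u for c, o, u in zip(cell, origin, unit)))
        for cell in cells
    ]


controls = [(-0.1, 0.0), (0.1, 0.0)]   # origin centroid (0, 0)
terminal = [(2.0, 0.5), (2.0, -0.5)]   # terminal centroid (2, 0)
print(linear_pseudotime([(1.0, 1.0), (-1.0, 0.0), (2.0, 0.0)], controls, terminal))
# [1.0, 0.0, 2.0]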

Target Cell Types

Target cell types may include, but are not limited to, an immune cell, intestinal cell, liver cell, kidney cell, lung cell, brain cell, epithelial cell, endoderm cell, neuron, ectoderm cell, islet cell, acinar cell, hematopoietic cell, hepatocyte, skin/keratinocyte, melanocyte, bone/osteocyte, hair/dermal papilla cell, cartilage/chondrocyte, fat cell/adipocyte, skeletal muscular cell, endothelial cell, cardiac muscle/cardiomyocyte, and trophoblast. Target cells may also include progenitor cells associated with target cell types. Markers specific to target cell types are well known in the art.

In certain embodiments, target cell types are neural progenitors. In preferred embodiments, neural progenitors are differentiated to obtain a target cell type that is a neuron, astrocyte and/or oligodendrocyte. In more preferred embodiments, the target cell type is a neuron. In more preferred embodiments, the neuron is a GABAergic neuron. Neurons that produce GABA as their output are called GABAergic neurons, and have chiefly inhibitory action at receptors in the adult vertebrate (Rudy, et al., Three Groups of Interneurons Account for Nearly 100% of Neocortical GABAergic Neurons, Dev Neurobiol. 2011 Jan. 1; 71(1): 45-61). Malfunction of GABAergic neurons has been implicated in a number of diseases ranging from epilepsy to schizophrenia, anxiety disorders and autism. Id.

In certain embodiments, cells differentiated by overexpression of specific transcription factors can be further analyzed. Differentiated target cells can be analyzed for expression of biomarkers specific to the target cells or specific to a phenotype associated with the target cells.

The term “biomarker” is widespread in the art and commonly broadly denotes a biological molecule, more particularly an endogenous biological molecule, and/or a detectable portion thereof, whose qualitative and/or quantitative evaluation in a tested object (e.g., in or on a cell, cell population, tissue, organ, or organism) is predictive or informative with respect to one or more aspects of the tested object's phenotype and/or genotype. The terms “marker” and “biomarker” may be used interchangeably throughout this specification. Biomarkers as intended herein may be nucleic acid-based or peptide-, polypeptide- and/or protein-based. For example, a marker may be comprised of peptide(s), polypeptide(s) and/or protein(s) encoded by a given gene, or of detectable portions thereof. Further, whereas the term “nucleic acid” generally encompasses DNA, RNA and DNA/RNA hybrid molecules, in the context of markers the term may typically refer to heterogeneous nuclear RNA (hnRNA), pre-mRNA, messenger RNA (mRNA), or complementary DNA (cDNA), or detectable portions thereof. Such nucleic acid species are particularly useful as markers, since they contain qualitative and/or quantitative information about the expression of the gene. Particularly preferably, a nucleic acid-based marker may encompass mRNA of a given gene, or cDNA made of the mRNA, or detectable portions thereof. Any such nucleic acid(s), peptide(s), polypeptide(s) and/or protein(s) encoded by or produced from a given gene are encompassed by the term “gene product(s)”.

Preferably, markers as intended herein may be extracellular or cell surface markers, as methods to measure extracellular or cell surface marker(s) need not disturb the integrity of the cell membrane and may not require fixation/permeabilization of the cells.

Unless otherwise apparent from the context, reference herein to any marker, such as a peptide, polypeptide, protein, or nucleic acid, may generally also encompass modified forms of said marker, such as bearing post-expression modifications including, for example, phosphorylation, glycosylation, lipidation, methylation, cysteinylation, sulphonation, glutathionylation, acetylation, oxidation of methionine to methionine sulphoxide or methionine sulphone, and the like.

The term “peptide” as used throughout this specification preferably refers to a polypeptide as used herein consisting essentially of 50 amino acids or less, e.g., 45 amino acids or less, preferably 40 amino acids or less, e.g., 35 amino acids or less, more preferably 30 amino acids or less, e.g., 25 or less, 20 or less, 15 or less, 10 or less or 5 or less amino acids.

The term “polypeptide” as used throughout this specification generally encompasses polymeric chains of amino acid residues linked by peptide bonds. Hence, insofar as a protein is composed of only a single polypeptide chain, the terms “protein” and “polypeptide” may be used interchangeably herein to denote such a protein. The term is not limited to any minimum length of the polypeptide chain. The term may encompass naturally, recombinantly, semi-synthetically or synthetically produced polypeptides. The term also encompasses polypeptides that carry one or more co- or post-expression-type modifications of the polypeptide chain, such as, without limitation, glycosylation, acetylation, phosphorylation, sulfonation, methylation, ubiquitination, signal peptide removal, N-terminal Met removal, conversion of pro-enzymes or pre-hormones into active forms, etc. The term further also includes polypeptide variants or mutants which carry amino acid sequence variations vis-à-vis a corresponding native polypeptide, such as, e.g., amino acid deletions, additions and/or substitutions. The term contemplates both full-length polypeptides and polypeptide parts or fragments, e.g., naturally-occurring polypeptide parts that ensue from processing of such full-length polypeptides.

The term “protein” as used throughout this specification generally encompasses macromolecules comprising one or more polypeptide chains, i.e., polymeric chains of amino acid residues linked by peptide bonds. The term may encompass naturally, recombinantly, semi-synthetically or synthetically produced proteins. The term also encompasses proteins that carry one or more co- or post-expression-type modifications of the polypeptide chain(s), such as, without limitation, glycosylation, acetylation, phosphorylation, sulfonation, methylation, ubiquitination, signal peptide removal, N-terminal Met removal, conversion of pro-enzymes or pre-hormones into active forms, etc. The term further also includes protein variants or mutants which carry amino acid sequence variations vis-à-vis a corresponding native protein, such as, e.g., amino acid deletions, additions and/or substitutions. The term contemplates both full-length proteins and protein parts or fragments, e.g., naturally-occurring protein parts that ensue from processing of such full-length proteins.

The reference to any marker, including any peptide, polypeptide, protein, or nucleic acid, corresponds to the marker commonly known under the respective designations in the art. The terms encompass such markers of any organism where found, and particularly of animals, preferably warm-blooded animals, more preferably vertebrates, yet more preferably mammals, including humans and non-human mammals, still more preferably of humans.

The terms particularly encompass such markers, including any peptides, polypeptides, proteins, or nucleic acids, with a native sequence, i.e., ones of which the primary sequence is the same as that of the markers found in or derived from nature. A skilled person understands that native sequences may differ between different species due to genetic divergence between such species. Moreover, native sequences may differ between or within different individuals of the same species due to normal genetic diversity (variation) within a given species. Also, native sequences may differ between or even within different individuals of the same species due to somatic mutations, or post-transcriptional or post-translational modifications. Any such variants or isoforms of markers are intended herein. Accordingly, all sequences of markers found in or derived from nature are considered “native”. The terms encompass the markers when forming a part of a living organism, organ, tissue or cell, when forming a part of a biological sample, as well as when at least partly isolated from such sources. The terms also encompass markers when produced by recombinant or synthetic means.

In certain embodiments, markers, including any peptides, polypeptides, proteins, or nucleic acids, may be human, i.e., their primary sequence may be the same as a corresponding primary sequence of or present in a naturally occurring human marker. Hence, the qualifier “human” in this connection relates to the primary sequence of the respective markers, rather than to their origin or source. For example, such markers may be present in or isolated from samples of human subjects or may be obtained by other means (e.g., by recombinant expression, cell-free transcription or translation, or non-biological nucleic acid or peptide synthesis).

The reference herein to any marker, including any peptide, polypeptide, protein, or nucleic acid, also encompasses fragments thereof. Hence, the reference herein to measuring (or measuring the quantity of) any one marker may encompass measuring the marker and/or measuring one or more fragments thereof.

For example, any marker and/or one or more fragments thereof may be measured collectively, such that the measured quantity corresponds to the sum of the amounts of the collectively measured species. In another example, any marker and/or one or more fragments thereof may be measured each individually. The terms encompass fragments arising by any mechanism, in vivo and/or in vitro, such as, without limitation, by alternative transcription or translation, exo- and/or endo-proteolysis, exo- and/or endo-nucleolysis, or degradation of the peptide, polypeptide, protein, or nucleic acid, such as, for example, by physical, chemical and/or enzymatic proteolysis or nucleolysis.

The term “fragment” as used throughout this specification with reference to a peptide, polypeptide, or protein generally denotes a portion of the peptide, polypeptide, or protein, such as typically an N- and/or C-terminally truncated form of the peptide, polypeptide, or protein. Preferably, a fragment may comprise at least about 30%, e.g., at least about 50% or at least about 70%, preferably at least about 80%, e.g., at least about 85%, more preferably at least about 90%, and yet more preferably at least about 95% or even about 99% of the amino acid sequence length of said peptide, polypeptide, or protein. For example, insofar as it does not exceed the length of the full-length peptide, polypeptide, or protein, a fragment may include a sequence of ≥5 consecutive amino acids, or ≥10 consecutive amino acids, or ≥20 consecutive amino acids, or ≥30 consecutive amino acids, e.g., ≥40 consecutive amino acids, such as for example ≥50 consecutive amino acids, e.g., ≥60, ≥70, ≥80, ≥90, ≥100, ≥200, ≥300, ≥400, ≥500 or ≥600 consecutive amino acids of the corresponding full-length peptide, polypeptide, or protein.

The term “fragment” as used throughout this specification with reference to a nucleic acid (polynucleotide) generally denotes a 5′- and/or 3′-truncated form of a nucleic acid. Preferably, a fragment may comprise at least about 30%, e.g., at least about 50% or at least about 70%, preferably at least about 80%, e.g., at least about 85%, more preferably at least about 90%, and yet more preferably at least about 95% or even about 99% of the nucleic acid sequence length of said nucleic acid. For example, insofar as it does not exceed the length of the full-length nucleic acid, a fragment may include a sequence of ≥5 consecutive nucleotides, or ≥10 consecutive nucleotides, or ≥20 consecutive nucleotides, or ≥30 consecutive nucleotides, e.g., ≥40 consecutive nucleotides, such as for example ≥50 consecutive nucleotides, e.g., ≥60, ≥70, ≥80, ≥90, ≥100, ≥200, ≥300, ≥400, ≥500 or ≥600 consecutive nucleotides of the corresponding full-length nucleic acid.

Cells such as target cells as disclosed herein may in the context of the present specification be said to “comprise the expression” or conversely to “not express” one or more markers, such as one or more genes or gene products; or be described as “positive” or conversely as “negative” for one or more markers, such as one or more genes or gene products; or be said to “comprise” a defined “gene or gene product signature”.

Such terms are commonplace and well-understood by the skilled person when characterizing cell phenotypes. By means of additional guidance, when a cell is said to be positive for or to express or comprise expression of a given marker, such as a given gene or gene product, a skilled person would conclude the presence or evidence of a distinct signal for the marker when carrying out a measurement capable of detecting or quantifying the marker in or on the cell. Suitably, the presence or evidence of the distinct signal for the marker would be concluded based on a comparison of the measurement result obtained for the cell to a result of the same measurement carried out for a negative control (for example, a cell known to not express the marker) and/or a positive control (for example, a cell known to express the marker). Where the measurement method allows for a quantitative assessment of the marker, a positive cell may generate a signal for the marker that is at least 1.5-fold higher than a signal generated for the marker by a negative control cell or than an average signal generated for the marker by a population of negative control cells, e.g., at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold higher or even higher. Further, a positive cell may generate a signal for the marker that is 3.0 or more standard deviations, e.g., 3.5 or more, 4.0 or more, 4.5 or more, or 5.0 or more standard deviations, higher than an average signal generated for the marker by a population of negative control cells.
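By way of non-limiting illustration, the two quantitative criteria above can be sketched as a single test against a population of negative control cells: a cell is called positive if its signal is at least 1.5-fold the negative-control mean, or at least 3.0 standard deviations above that mean. Combining the two criteria with a logical OR, using the sample standard deviation, and the function name and example signals are assumptions of this sketch; either criterion may be used alone, and the thresholds may be set to any of the higher values recited above.

```python
from statistics import mean, stdev


def is_positive(signal, negative_control_signals, min_fold=1.5, min_sd=3.0):
    """Call a cell marker-positive relative to negative control cells.

    Positive if the signal is >= min_fold x the negative-control mean, OR
    >= min_sd sample standard deviations above that mean.
    Requires at least two negative-control values.
    """
    mu = mean(negative_control_signals)
    sd = stdev(negative_control_signals)
    fold_ok = mu > 0 and signal >= min_fold * mu
    sd_ok = signal >= mu + min_sd * sd
    return fold_ok or sd_ok


negatives = [10, 12, 8, 10]  # hypothetical negative-control signals
print(is_positive(16, negatives), is_positive(12, negatives))  # True False
```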

A marker, for example a gene or gene product, for example a peptide, polypeptide, protein, or nucleic acid, or a group of two or more markers, is “detected” or “measured” in a tested object (e.g., in or on a cell, cell population, tissue, organ, or organism) when the presence or absence and/or quantity of said marker or said group of markers is detected or determined in the tested object, preferably substantially to the exclusion of other molecules and analytes, e.g., other genes or gene products.

The terms “increased” or “increase” or “upregulated” or “upregulate” as used herein generally mean an increase by a statistically significant amount. For avoidance of doubt, “increased” means a statistically significant increase of at least 10% as compared to a reference level, including an increase of at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100% or more, including, for example at least 2-fold, at least 3-fold, at least 4-fold, at least 5-fold, at least 10-fold increase or greater as compared to a reference level, as that term is defined herein.

The term “reduced” or “reduce” or “decrease” or “decreased” or “downregulate” or “downregulated” as used herein generally means a decrease by a statistically significant amount relative to a reference. For avoidance of doubt, “reduced” means a statistically significant decrease of at least 10% as compared to a reference level, for example a decrease by at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90% or more, up to and including a 100% decrease (i.e., absent level as compared to a reference sample), or any decrease between 10-100% as compared to a reference level, as that term is defined herein.
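By way of non-limiting illustration, the two definitions above can be applied to a measured level as follows, assuming statistical significance has already been established separately (e.g., by a suitable statistical test) and is passed in as a boolean. The 10% minimum corresponds to the lower bound recited above; the function names and example levels are hypothetical.

```python
def percent_change(value, reference):
    """Percent change of `value` relative to a non-zero reference level."""
    if reference == 0:
        raise ValueError("reference level must be non-zero")
    return (value - reference) / reference * 100.0


def classify_change(value, reference, is_significant, min_percent=10.0):
    """Return 'increased', 'decreased' or 'unchanged'.

    A change counts only if it is statistically significant (determined
    elsewhere) AND at least `min_percent` in magnitude.
    """
    pct = percent_change(value, reference)
    if not is_significant or abs(pct) < min_percent:
        return "unchanged"
    return "increased" if pct > 0 else "decreased"


print(classify_change(115, 100, True))   # increased
print(classify_change(85, 100, True))    # decreased
print(classify_change(105, 100, True))   # unchanged (below 10%)
print(classify_change(150, 100, False))  # unchanged (not significant)
```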

The terms “quantity”, “amount” and “level” are synonymous and generally well-understood in the art. The terms as used throughout this specification may particularly refer to an absolute quantification of a marker in a tested object (e.g., in or on a cell, cell population, tissue, organ, or organism, e.g., in a biological sample of a subject), or to a relative quantification of a marker in a tested object, i.e., relative to another value such as relative to a reference value, or to a range of values indicating a base-line of the marker. Such values or ranges may be obtained as conventionally known.

An absolute quantity of a marker may be advantageously expressed as weight or as molar amount, or more commonly as a concentration, e.g., weight per volume or mol per volume. A relative quantity of a marker may be advantageously expressed as an increase or decrease or as a fold-increase or fold-decrease relative to said another value, such as relative to a reference value. Performing a relative comparison between first and second variables (e.g., first and second quantities) may but need not require determining first the absolute values of said first and second variables. For example, a measurement method may produce quantifiable readouts (such as, e.g., signal intensities) for said first and second variables, wherein said readouts are a function of the value of said variables, and wherein said readouts may be directly compared to produce a relative value for the first variable vs. the second variable, without the actual need to first convert the readouts to absolute values of the respective variables.

Reference values may be established according to known procedures previously employed for other cell populations, biomarkers and gene or gene product signatures. For example, a reference value may be established in an individual or a population of individuals characterized by a particular diagnosis, prediction and/or prognosis of a disease or condition (i.e., for whom said diagnosis, prediction and/or prognosis of the disease or condition holds true). Such population may comprise without limitation 2 or more, 10 or more, 100 or more, or even several hundred or more individuals.

A “deviation” of a first value from a second value may generally encompass any direction (e.g., increase: first value>second value; or decrease: first value<second value) and any extent of alteration.

For example, a deviation may encompass a decrease in a first value by, without limitation, at least about 10% (about 0.9-fold or less), or by at least about 20% (about 0.8-fold or less), or by at least about 30% (about 0.7-fold or less), or by at least about 40% (about 0.6-fold or less), or by at least about 50% (about 0.5-fold or less), or by at least about 60% (about 0.4-fold or less), or by at least about 70% (about 0.3-fold or less), or by at least about 80% (about 0.2-fold or less), or by at least about 90% (about 0.1-fold or less), relative to a second value with which a comparison is being made.

For example, a deviation may encompass an increase of a first value by, without limitation, at least about 10% (about 1.1-fold or more), or by at least about 20% (about 1.2-fold or more), or by at least about 30% (about 1.3-fold or more), or by at least about 40% (about 1.4-fold or more), or by at least about 50% (about 1.5-fold or more), or by at least about 60% (about 1.6-fold or more), or by at least about 70% (about 1.7-fold or more), or by at least about 80% (about 1.8-fold or more), or by at least about 90% (about 1.9-fold or more), or by at least about 100% (about 2-fold or more), or by at least about 150% (about 2.5-fold or more), or by at least about 200% (about 3-fold or more), or by at least about 500% (about 6-fold or more), or by at least about 700% (about 8-fold or more), or the like, relative to a second value with which a comparison is being made.
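The percent-to-fold correspondences in the two preceding paragraphs follow from fold = 1 + percent/100 for an increase and fold = 1 − percent/100 for a decrease. A minimal sketch of this arithmetic is given below; the function name is hypothetical.

```python
def fold_from_percent(percent, direction):
    """Convert a percent change into a fold relative to the second value.

    increase: fold = 1 + percent / 100  (e.g., 500% -> 6-fold)
    decrease: fold = 1 - percent / 100  (e.g., 90%  -> 0.1-fold)
    """
    if direction == "increase":
        return 1.0 + percent / 100.0
    if direction == "decrease":
        if not 0 <= percent <= 100:
            raise ValueError("a decrease must lie between 0% and 100%")
        return 1.0 - percent / 100.0
    raise ValueError("direction must be 'increase' or 'decrease'")


for pct in (10, 150, 500, 700):
    print(f"+{pct}% -> {fold_from_percent(pct, 'increase'):.1f}-fold")
print(f"-90% -> {fold_from_percent(90, 'decrease'):.1f}-fold")
```

Note that the percent scale is asymmetric: a 50% decrease (0.5-fold) is not the inverse of a 50% increase (1.5-fold); the inverse of 1.5-fold is about 0.67-fold, i.e., a decrease of about 33%.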

Preferably, a deviation may refer to a statistically significant observed alteration. For example, a deviation may refer to an observed alteration which falls outside of error margins of reference values in a given population (as expressed, for example, by standard deviation or standard error, or by a predetermined multiple thereof, e.g., ±1×SD or ±2×SD or ±3×SD, or ±1×SE or ±2×SE or ±3×SE). Deviation may also refer to a value falling outside of a reference range defined by values in a given population (for example, outside of a range which comprises ≥40%, ≥50%, ≥60%, ≥70%, ≥75% or ≥80% or ≥85% or ≥90% or ≥95% or even ≥100% of values in said population).
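By way of non-limiting illustration, the error-margin criterion can be sketched as a check of whether an observed value falls outside the reference mean ± k×SD (or ± k×SE, where SE = SD/√n). The function name, the default k = 2, the use of the sample standard deviation, and the example reference values are assumptions of this sketch.

```python
import math
from statistics import mean, stdev


def deviates(value, reference_values, k=2.0, use_se=False):
    """True if `value` lies outside mean +/- k * SD (or k * SE) of the references.

    SD is the sample standard deviation; SE = SD / sqrt(n).
    Requires at least two reference values.
    """
    mu = mean(reference_values)
    spread = stdev(reference_values)
    if use_se:
        spread /= math.sqrt(len(reference_values))
    return abs(value - mu) > k * spread


reference = [9, 10, 11, 10, 10]  # hypothetical reference population
print(deviates(12, reference), deviates(10.5, reference))  # True False
print(deviates(11, reference), deviates(11, reference, use_se=True))  # False True
```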

In a further embodiment, a deviation may be concluded if an observed alteration is beyond a given threshold or cut-off. Such threshold or cut-off may be selected as generally known in the art to provide for a chosen sensitivity and/or specificity of the prediction methods, e.g., sensitivity and/or specificity of at least 50%, or at least 60%, or at least 70%, or at least 80%, or at least 85%, or at least 90%, or at least 95%.

For example, receiver-operating characteristic (ROC) curve analysis can be used to select an optimal cut-off value of the quantity of a given immune cell population, biomarker or gene or gene product signatures, for clinical use of the present diagnostic tests, based on acceptable sensitivity and specificity, or related performance measures which are well-known per se, such as positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR−), Youden index, or similar.
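By way of non-limiting illustration, a cut-off maximizing the Youden index (J = sensitivity + specificity − 1) can be selected by scanning every observed score as a candidate threshold, calling a sample positive when its score is at or above the threshold. The function name, the "score ≥ cut-off" convention, the first-best tie-breaking, and the example scores and labels are hypothetical; other performance measures (PPV, NPV, likelihood ratios) could be optimized in the same loop.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    scores: quantitative readouts; labels: 1 = condition present, 0 = absent.
    A sample is called positive when score >= cutoff. Ties keep the
    lowest qualifying cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best


scores = [0.1, 0.3, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(youden_cutoff(scores, labels))  # (0.35, 1.0, 1.0, 1.0)
```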

In certain embodiments, the target cells may be detected, quantified, sorted or isolated using a technique selected from the group consisting of flow cytometry, mass cytometry, fluorescence activated cell sorting (FACS), fluorescence microscopy, affinity separation, magnetic cell separation, microfluidic separation, RNA-seq (e.g., bulk or single cell), quantitative PCR, MERFISH (multiplexed error-robust fluorescence in situ hybridization) and combinations thereof. The technique may employ one or more agents capable of specifically binding to one or more gene products expressed or not expressed by the target cells, preferably on the cell surface of the target cells. The one or more agents may be one or more antibodies. Other methods including absorbance assays and colorimetric assays are known in the art and may be used herein.

In other example embodiments, detection of a marker may include immunological assay methods, wherein the ability of an assay to separate, detect and/or quantify a marker (such as, preferably, peptide, polypeptide, or protein) is conferred by specific binding between a separable, detectable and/or quantifiable immunological binding agent (antibody) and the marker. Immunological assay methods include without limitation immunohistochemistry, immunocytochemistry, flow cytometry, mass cytometry, fluorescence activated cell sorting (FACS), fluorescence microscopy, fluorescence based cell sorting using microfluidic systems, immunoaffinity adsorption based techniques such as affinity chromatography, magnetic particle separation, magnetic activated cell sorting or bead based cell sorting using microfluidic systems, enzyme-linked immunosorbent assay (ELISA) and ELISPOT based techniques, radioimmunoassay (RIA), western blot, etc.

In certain example embodiments, detection of a marker or signature may include biochemical assay methods, including inter alia assays of enzymatic activity, membrane channel activity, substance-binding activity, gene regulatory activity, or cell signaling activity of a marker, e.g., peptide, polypeptide, protein, or nucleic acid.

In other example embodiments, detection of a marker may include mass spectrometry analysis methods. Generally, any mass spectrometric (MS) techniques that are capable of obtaining precise information on the mass of peptides, and preferably also on fragmentation and/or (partial) amino acid sequence of selected peptides (e.g., in tandem mass spectrometry, MS/MS; or in post source decay, TOF MS), may be useful herein for separation, detection and/or quantification of markers (such as, preferably, peptides, polypeptides, or proteins). Suitable peptide MS and MS/MS techniques and systems are well-known per se (see, e.g., Methods in Molecular Biology, vol. 146: “Mass Spectrometry of Proteins and Peptides”, by Chapman, ed., Humana Press 2000, ISBN 089603609x; Biemann 1990. Methods Enzymol 193: 455-79; or Methods in Enzymology, vol. 402: “Biological Mass Spectrometry”, by Burlingame, ed., Academic Press 2005, ISBN 9780121828073) and may be used herein. MS arrangements, instruments and systems suitable for biomarker peptide analysis may include, without limitation, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS; MALDI-TOF post-source-decay (PSD); MALDI-TOF/TOF; surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF) MS; electrospray ionization mass spectrometry (ESI-MS); ESI-MS/MS; ESI-MS/(MS)n (n is an integer greater than zero); ESI 3D or linear (2D) ion trap MS; ESI triple quadrupole MS; ESI quadrupole orthogonal TOF (Q-TOF); ESI Fourier transform MS systems; desorption/ionization on silicon (DIOS); secondary ion mass spectrometry (SIMS); atmospheric pressure chemical ionization mass spectrometry (APCI-MS); APCI-MS/MS; APCI-(MS)n; atmospheric pressure photoionization mass spectrometry (APPI-MS); APPI-MS/MS; and APPI-(MS)n. Peptide ion fragmentation in tandem MS (MS/MS) arrangements may be achieved using manners established in the art, such as, e.g., collision induced dissociation (CID). 
Detection and quantification of markers by mass spectrometry may involve multiple reaction monitoring (MRM), such as described among others by Kuhn et al. 2004 (Proteomics 4: 1175-86). MS peptide analysis methods may be advantageously combined with upstream peptide or protein separation or fractionation methods, such as the chromatographic and other separation methods described herein.
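Because a mass spectrometer measures mass-to-charge ratio (m/z) rather than mass directly, the peptide mass information referred to above is typically recovered from ions that may carry several charges. A minimal sketch, assuming positive-mode protonated ions of the form [M + zH]^z+ and a proton mass of 1.007276 Da, relating a peptide's neutral mass M to its observed m/z; the function names and the example mass are hypothetical and are not part of any MS software package.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """m/z of a protonated ion [M + zH]^z+ (positive-mode assumption)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS_DA) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover the neutral mass M."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS_DA


if __name__ == "__main__":
    # A hypothetical 1500.0 Da peptide observed as 1+, 2+ and 3+ ions.
    for z in (1, 2, 3):
        mz = mz_from_neutral_mass(1500.0, z)
        print(z, round(mz, 4), round(neutral_mass_from_mz(mz, z), 4))
```

The same neutral mass thus appears at several m/z values, one per charge state, which is why charge assignment precedes mass determination in such analyses.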

In other example embodiments, detection of a marker may include chromatography methods. In one example embodiment, chromatography refers to a process in which a mixture of substances (analytes) carried by a moving stream of liquid or gas (“mobile phase”) is separated into components as a result of differential distribution of the analytes between the mobile phase and a stationary liquid or solid phase (“stationary phase”) as they flow around or over the stationary phase. The stationary phase is usually a finely divided solid, a sheet of filter material, or a thin film of a liquid on the surface of a solid, or the like. Chromatography may be columnar. While particulars of chromatography are well known in the art, for further guidance see, e.g., Meyer M., 1998, ISBN: 047198373X, and “Practical HPLC Methodology and Applications”, Bidlingmeyer, B. A., John Wiley & Sons Inc., 1993. Exemplary types of chromatography include, without limitation, high-performance liquid chromatography (HPLC), normal phase HPLC (NP-HPLC), reversed phase HPLC (RP-HPLC), ion exchange chromatography (IEC), such as cation or anion exchange chromatography, hydrophilic interaction chromatography (HILIC), hydrophobic interaction chromatography (HIC), size exclusion chromatography (SEC) including gel filtration chromatography or gel permeation chromatography, chromatofocusing, affinity chromatography such as immunoaffinity, immobilized metal affinity chromatography, and the like.
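The differential distribution described above is commonly quantified with the textbook retention relations below, offered only as an illustration; the symbols are standard chromatography notation rather than terms defined in this application. Here K is the partition (distribution) coefficient of an analyte, c_S and c_M are its concentrations in the stationary and mobile phases, V_S and V_M are the phase volumes, t_R is the analyte's retention time, and t_M is the hold-up (dead) time of an unretained species:

```latex
K = \frac{c_S}{c_M}, \qquad
k = \frac{t_R - t_M}{t_M} = K\,\frac{V_S}{V_M}, \qquad
t_R = t_M\,(1 + k)
```

For example, an analyte eluting at t_R = 4.0 min on a column with t_M = 1.0 min has a retention factor k = 3. Analytes with different K values therefore elute at different times, which is the basis of their separation.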

In certain embodiments, further techniques for separating, detecting and/or quantifying markers may be used in conjunction with any of the above-described detection methods. Such methods include, without limitation, chemical extraction partitioning, isoelectric focusing (IEF) including capillary isoelectric focusing (CIEF), capillary isotachophoresis (CITP), capillary electrochromatography (CEC), one-dimensional polyacrylamide gel electrophoresis (PAGE), two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), capillary gel electrophoresis (CGE), capillary zone electrophoresis (CZE), micellar electrokinetic chromatography (MEKC), free flow electrophoresis (FFE), etc.

In certain examples, such methods may include separating, detecting and/or quantifying markers at the nucleic acid level, more particularly RNA level, e.g., at the level of hnRNA, pre-mRNA, mRNA, or cDNA. Standard quantitative RNA or cDNA measurement tools known in the art may be used. Non-limiting examples include hybridization-based analysis, microarray expression analysis, digital gene expression profiling (DGE), RNA-in-situ hybridization (RISH), Northern-blot analysis and the like; PCR, RT-PCR, RT-qPCR, end-point PCR, digital PCR or the like; supported oligonucleotide detection, pyrosequencing, polony cyclic sequencing by synthesis, simultaneous bi-directional sequencing, single-molecule sequencing, single molecule real time sequencing, true single molecule sequencing, hybridization-assisted nanopore sequencing, sequencing by synthesis, single-cell RNA sequencing (sc-RNA seq), or the like.
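Relative quantification by RT-qPCR, one of the tools listed above, is often performed with the comparative Ct (2^-ΔΔCt) method. A minimal sketch, assuming approximately 100% amplification efficiency (one doubling per cycle, i.e. `efficiency=2.0`) and a single reference gene; the function name and the Ct values in the example are hypothetical.

```python
def relative_expression_ddct(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
    efficiency: float = 2.0,
) -> float:
    """Fold change of a target transcript in a sample vs. a control sample.

    Each Ct is first normalized to a reference (housekeeping) gene measured
    in the same sample; the difference of the normalized values (ddCt) is
    converted to a fold change assuming `efficiency`-fold amplification per
    cycle (2.0 corresponds to perfect doubling).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return efficiency ** (-delta_delta_ct)


# Hypothetical example: after normalization, a marker crosses threshold
# 3 cycles earlier in TF-overexpressing cells than in control cells.
fold = relative_expression_ddct(22.0, 18.0, 25.0, 18.0)
print(fold)  # 8.0
```

If amplification efficiency deviates substantially from doubling, `efficiency` would need to be set from a standard curve rather than left at its default.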

The present invention is also directed to signatures and uses thereof. In certain embodiments, a homogeneous population of a target cell type (e.g., radial glia) may allow identification of specific signatures (e.g., rare signatures). As used herein, a “signature” may encompass any gene or genes, protein or proteins, or epigenetic element(s) whose expression profile or whose occurrence is associated with a specific cell type, subtype, or cell state of a specific cell type or subtype within a population of cells (e.g., radial glia). In certain embodiments, the expression of the target cell signatures is dependent on epigenetic modification of the genes or regulatory elements associated with the genes. Thus, in certain embodiments, use of signature genes includes epigenetic modifications that may be detected or modulated. For ease of discussion, when discussing gene expression, any gene or genes, protein or proteins, or epigenetic element(s) may be substituted. Reference to a gene name throughout the specification encompasses the human gene, mouse gene and all other orthologues as known in the art in other organisms. As used herein, the terms “signature”, “expression profile”, and “expression program” may be used interchangeably. It is to be understood that proteins (e.g., differentially expressed proteins) may also fall within the definition of a “gene” signature. Levels of expression or activity or prevalence may be compared between different cells in order to characterize or identify, for instance, signatures specific for cell (sub)populations. Increased or decreased expression or activity of signature genes may be compared between different cells in order to characterize or identify, for instance, specific cell (sub)populations. The detection of a signature in single cells may be used to identify and quantify, for instance, specific cell (sub)populations.
A signature may include a gene or genes, protein or proteins, or epigenetic element(s) whose expression or occurrence is specific to a cell (sub)population, such that expression or occurrence is exclusive to the cell (sub)population. A gene signature, as used herein, may thus refer to any set of up- and down-regulated genes that are representative of a cell type or subtype. A gene signature, as used herein, may also refer to any set of up- and down-regulated genes between different cells or cell (sub)populations derived from a gene-expression profile. For example, a gene signature may comprise a list of genes differentially expressed in a distinction of interest.
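One common way to detect a signature of up- and down-regulated genes in single cells is to score each cell by the mean standardized (z-scored) expression of the signature's up-regulated genes minus that of its down-regulated genes. The sketch below shows that scoring scheme under stated assumptions: expression is supplied as a hypothetical mapping from gene name to per-cell values, z-scores are computed across the cells supplied, and the gene names are placeholders. It is one possible scoring approach, not one required by this application.

```python
from statistics import mean, pstdev


def zscore_by_gene(expr):
    """Standardize each gene across cells (population standard deviation)."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return z


def signature_scores(expr, up_genes, down_genes=()):
    """Per-cell score: mean z of up genes minus mean z of down genes.

    expr maps a gene name to a list of per-cell expression values, with
    cells in the same order for every gene.
    """
    z = zscore_by_gene(expr)
    n_cells = len(next(iter(expr.values())))
    scores = []
    for i in range(n_cells):
        up = mean(z[g][i] for g in up_genes) if up_genes else 0.0
        down = mean(z[g][i] for g in down_genes) if down_genes else 0.0
        scores.append(up - down)
    return scores


# Hypothetical three-cell example with two up genes and one down gene.
expr = {
    "GENE_UP1": [5.0, 0.0, 0.0],
    "GENE_UP2": [4.0, 1.0, 0.0],
    "GENE_DOWN": [0.0, 3.0, 6.0],
}
scores = signature_scores(expr, up_genes=["GENE_UP1", "GENE_UP2"],
                          down_genes=["GENE_DOWN"])
print([round(s, 2) for s in scores])
```

Cells with the highest scores are those that most strongly express the signature; a cutoff on this score could then be used to count cells belonging to the corresponding (sub)population.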

The signature as defined herein (whether a gene signature, protein signature, or other genetic or epigenetic signature) can be used to indicate the presence of a cell type, a subtype of the cell type, the state of the microenvironment of a population of cells, a particular cell type population or subpopulation, and/or the overall status of the entire cell (sub)population. Furthermore, the signature may be indicative of cells within a population of cells in vivo.

The signature according to certain embodiments of the present invention may comprise or consist of one or more genes, proteins and/or epigenetic elements, such as for instance 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of two or more genes, proteins and/or epigenetic elements, such as for instance 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of three or more genes, proteins and/or epigenetic elements, such as for instance 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of four or more genes, proteins and/or epigenetic elements, such as for instance 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of five or more genes, proteins and/or epigenetic elements, such as for instance 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of six or more genes, proteins and/or epigenetic elements, such as for instance 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of seven or more genes, proteins and/or epigenetic elements, such as for instance 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of eight or more genes, proteins and/or epigenetic elements, such as for instance 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of nine or more genes, proteins and/or epigenetic elements, such as for instance 9, 10 or more. In certain embodiments, the signature may comprise or consist of ten or more genes, proteins and/or epigenetic elements, such as for instance 10, 11, 12, 13, 14, 15, or more. It is to be understood that a signature according to the invention may for instance also include genes or proteins as well as epigenetic elements combined.

In certain embodiments, a signature is characterized as being specific for a particular target cell or target cell (sub)population if it is upregulated or only present, detected or detectable in that particular target cell or target cell (sub)population or, alternatively, if it is downregulated or only absent or undetectable in that particular target cell or target cell (sub)population. In this context, a signature consists of one or more differentially expressed genes/proteins or differential epigenetic elements when comparing different cells or cell (sub)populations, including comparing different target cell or target cell (sub)populations, as well as comparing target cell or target cell (sub)populations with non-target cell or non-target cell (sub)populations. It is to be understood that “differentially expressed” genes/proteins include genes/proteins which are up- or down-regulated as well as genes/proteins which are turned on or off. When referring to up- or down-regulation, in certain embodiments, such up- or down-regulation is preferably at least two-fold, such as two-fold, three-fold, four-fold, five-fold, or more, such as for instance at least ten-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold, or more. Alternatively, or in addition, differential expression may be determined based on common statistical tests, as is known in the art.
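The fold-change criterion above can be sketched as a simple per-gene classification. This is a minimal illustration, assuming mean expression values per (sub)population and a pseudocount of 1 to avoid division by zero (both assumptions, as are the function and gene names). It applies only the fold-change threshold and does not perform the statistical testing that may be used alternatively or in addition.

```python
def classify_fold_change(
    mean_target: float,
    mean_other: float,
    min_fold: float = 2.0,
    pseudocount: float = 1.0,
) -> str:
    """Label a gene as 'up', 'down' or 'unchanged' in a target population.

    The ratio is computed on pseudocount-shifted means so that genes absent
    from one population ("turned off") still yield a finite ratio.
    """
    ratio = (mean_target + pseudocount) / (mean_other + pseudocount)
    if ratio >= min_fold:
        return "up"
    if ratio <= 1.0 / min_fold:
        return "down"
    return "unchanged"


def differential_genes(target_means, other_means, min_fold=2.0):
    """Apply classify_fold_change to every gene present in both inputs."""
    return {
        gene: classify_fold_change(target_means[gene], other_means[gene],
                                   min_fold)
        for gene in target_means
        if gene in other_means
    }


# Hypothetical mean expression values in target vs. non-target cells.
target = {"GENE_A": 19.0, "GENE_B": 1.0, "GENE_C": 3.0}
other = {"GENE_A": 4.0, "GENE_B": 9.0, "GENE_C": 2.0}
print(differential_genes(target, other))
# {'GENE_A': 'up', 'GENE_B': 'down', 'GENE_C': 'unchanged'}
```

Raising `min_fold` to 10, 20 or 50 corresponds to the more stringent thresholds listed above.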

As discussed herein, differentially expressed genes/proteins, or differential epigenetic elements, may be differentially expressed on a single-cell level or on a cell population level. Preferably, when referring to the cell population or subpopulation level, the differentially expressed genes/proteins or epigenetic elements discussed herein, such as those constituting the gene signatures discussed herein, are genes that are differentially expressed in all or substantially all cells of the population or subpopulation (such as at least 80%, preferably at least 90%, such as at least 95% of the individual cells). This allows one to define a particular subpopulation of target cells. As referred to herein, a “subpopulation” of cells preferably refers to a particular subset of cells of a particular cell type that can be distinguished or uniquely identified and set apart from other cells of this cell type. The cell subpopulation may be phenotypically characterized, and is preferably characterized by the signature as discussed herein. A cell (sub)population as referred to herein may consist of a (sub)population of cells of a particular cell type characterized by a specific cell state.
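Whether a gene meets the "all or substantially all cells" criterion can be checked per gene from single-cell measurements. A minimal sketch, assuming a cell counts as expressing the gene when its value exceeds a detection threshold (0 by default); the threshold, function names and example values are hypothetical.

```python
def fraction_expressing(values, min_expr=0.0):
    """Fraction of cells whose expression value exceeds `min_expr`."""
    if not values:
        raise ValueError("at least one cell is required")
    return sum(1 for v in values if v > min_expr) / len(values)


def expressed_in_substantially_all(values, min_fraction=0.8, min_expr=0.0):
    """True if the gene is detected in at least `min_fraction` of cells."""
    return fraction_expressing(values, min_expr) >= min_fraction


# Hypothetical per-cell counts for one gene across ten cells.
counts = [3, 2, 0, 5, 4, 1, 2, 6, 3, 2]
for cutoff in (0.8, 0.9, 0.95):
    print(cutoff, expressed_in_substantially_all(counts, min_fraction=cutoff))
```

Here the gene is detected in 9 of 10 cells, so it passes the 80% and 90% cutoffs but not the 95% cutoff.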

When referring to induction or, alternatively, suppression of a particular signature, what is preferably meant is induction or, alternatively, suppression (or upregulation or downregulation) of at least one gene/protein and/or epigenetic element of the signature, such as for instance at least two, at least three, at least four, at least five, at least six, or all genes/proteins and/or epigenetic elements of the signature.

In certain embodiments, cells overexpressing transcription factors may be analyzed for the ability to further differentiate (e.g., radial glia can be differentiated into astrocytes, oligodendrocytes and neurons). The cells may be analyzed using spontaneous or directed differentiation methods. In certain embodiments, cells are analyzed by performing xenografts in immunocompromised animal models. In certain embodiments, the cells are analyzed for the ability to repair or regenerate diseased tissue.

Oncology Screening

In certain embodiments, the barcoded transcription factor library can be used in a method of pooled screening for transcription factors that enhance or suppress tumor growth. Expression of tumor suppressors has been shown to suppress tumor growth (see, e.g., Wang et al., Restoring expression of wild-type p53 suppresses tumor growth but does not cause tumor regression in mice with a p53 missense mutation. J Clin Invest. 2011 March; 121(3):893-904). In certain embodiments, the method is used to identify therapeutic targets for treating specific cancers. Cancer cell lines for any cancer type may be used. Cancer cell lines may be obtained from a patient. In certain embodiments, the barcoded transcription factor library is introduced into a cancer cell line in vitro, the cells are grown (e.g., for 1 to 3 weeks), and the enrichment and depletion of barcodes in the cells is determined relative to the barcodes present in the original library. In certain embodiments, the barcoded transcription factor library is introduced into a cancer cell line in vitro, the cells are transferred to an in vivo model (e.g., nude mice) and grown in vivo (e.g., for 1 to 8 weeks), tumor cells are removed (e.g., by excising the tumor), and the enrichment and depletion of barcodes in the cells is determined relative to the barcodes present in the original library. Barcodes that are enriched represent transcription factors that enhance tumor growth; these transcription factors may be targeted for inhibition to suppress tumor growth. Barcodes that are depleted represent transcription factors that suppress tumor growth; these transcription factors may be overexpressed or activated to suppress tumor growth.
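The enrichment and depletion comparison described above can be sketched as a log2 ratio of each barcode's relative abundance at the endpoint versus in the original library. A minimal illustration, assuming raw sequencing read counts per barcode, total-count normalization, a pseudocount of 1 so that barcodes that drop out entirely still give a finite value, and a hypothetical two-fold (|log2| ≥ 1) cutoff for labeling; the function name, barcode names and counts are hypothetical, and a real analysis would typically add replicates and statistical testing.

```python
import math


def barcode_log2_enrichment(library_counts, endpoint_counts, pseudocount=1.0):
    """log2(endpoint frequency / library frequency) for each library barcode.

    Positive values suggest enrichment (candidate growth-enhancing TFs);
    negative values suggest depletion (candidate growth-suppressing TFs).
    Barcodes not present in the library are ignored.
    """
    n = len(library_counts)
    lib_total = sum(library_counts.values()) + pseudocount * n
    end_total = (
        sum(endpoint_counts.get(bc, 0) for bc in library_counts)
        + pseudocount * n
    )
    enrichment = {}
    for bc, lib_count in library_counts.items():
        lib_freq = (lib_count + pseudocount) / lib_total
        end_freq = (endpoint_counts.get(bc, 0) + pseudocount) / end_total
        enrichment[bc] = math.log2(end_freq / lib_freq)
    return enrichment


# Hypothetical read counts for three TF barcodes.
library = {"TF_A": 99, "TF_B": 99, "TF_C": 99}
endpoint = {"TF_A": 299, "TF_B": 99, "TF_C": 0}
for bc, lfc in sorted(barcode_log2_enrichment(library, endpoint).items()):
    if lfc >= 1:
        label = "enriched"
    elif lfc <= -1:
        label = "depleted"
    else:
        label = "unchanged"
    print(f"{bc}: log2 enrichment {lfc:+.2f} ({label})")
```

Normalizing to relative frequency before taking the ratio keeps the comparison independent of how deeply each sample was sequenced.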

Combinatorial TF Screening and Prediction

In example embodiments, the genes and gene programs expressed in cells screened by overexpression of single transcription factors are used to identify transcription factor combinations that differentiate stem cells into a target cell type. In example embodiments, single cells overexpressing single transcription factors are used to identify one or more genes that are differentially expressed compared to cells not expressing a transcription factor. In one embodiment, a transcription factor atlas as described herein is used. The differentially expressed genes can be used to determine combinations of transcription factors for directing differentiation of stem cells into target cells that more faithfully recapitulate the in vivo target cells, thus providing improved cellular models and therapeutics. In one example embodiment, the average expression of the differentially expressed genes across two or more transcription factors is compared to the expression of the same differentially expressed genes in the target cell. The combination of transcription factors whose average expression most closely recapitulates the expression in the target cell can be used to differentiate stem cells into the target cells. In example embodiments, the average is taken over 2, 3, 4, or more transcription factors, preferably 2, 3, or 4 transcription factors. In example embodiments, more than one gene is averaged, for example, more than 10, 100, 1,000, 5,000, or 10,000 genes. In example embodiments, the genes are part of a gene program, expression program, or pathway as described herein.
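The averaging-and-comparison step above can be sketched as an exhaustive search over TF combinations. In this minimal illustration, each TF is represented by a vector of expression values (e.g., means across cells overexpressing that TF) over the same ordered set of differentially expressed genes, and similarity to the target cell profile is scored with the Pearson correlation. The choice of Pearson correlation rather than, e.g., a distance metric, the function names, and the toy profiles are all assumptions, not the method specified here. Because the number of candidate combinations grows combinatorially with library size, the exhaustive loop is practical only for modest TF sets or small combination sizes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def average_profile(profiles):
    """Gene-wise mean of several expression profiles."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def rank_tf_combinations(tf_profiles, target_profile, sizes=(2, 3, 4)):
    """Score every TF combination of the given sizes against the target.

    tf_profiles maps a TF name to its expression over the differentially
    expressed genes (same gene order as target_profile). Returns
    (similarity, combination) pairs, best match first.
    """
    ranked = []
    for k in sizes:
        for combo in combinations(sorted(tf_profiles), k):
            averaged = average_profile([tf_profiles[tf] for tf in combo])
            ranked.append((pearson(averaged, target_profile), combo))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Toy example: three hypothetical TFs measured over four genes.
toy_profiles = {
    "TF_A": [4.0, 0.0, 0.0, 0.0],
    "TF_B": [0.0, 4.0, 0.0, 0.0],
    "TF_C": [0.0, 0.0, 4.0, 0.0],
}
toy_target = [2.0, 2.0, 0.0, 0.0]
best_score, best_combo = rank_tf_combinations(toy_profiles, toy_target)[0]
print(best_combo, best_score)  # ('TF_A', 'TF_B') 1.0
```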

In example embodiments, combinations of TFs can be screened using the methods and libraries described herein. For example, a library of 4, 5, 6, 7, 8, 9, 10, 20 or more transcription factors can be introduced into stem cells. In preferred embodiments, the TF library is introduced at high MOI (e.g., greater than 1, 2, 3, 4, 5 or more vectors per cell). In example embodiments, the cells are profiled by single-cell RNA-seq. Using the pooled screening methods described herein, the combination of TFs overexpressed in each single cell can be identified.
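How the MOI shapes the mix of single and combinatorial perturbations can be estimated with the common approximation that the number of vectors delivered per cell follows a Poisson distribution whose mean equals the MOI. This is a sketch under that assumption (it ignores, e.g., cell-to-cell variation in transducibility and transgene silencing); the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k vectors at a given MOI."""
    return exp(-moi) * moi**k / factorial(k)


def fraction_with_at_least(k_min: int, moi: float) -> float:
    """Expected fraction of cells carrying at least k_min vectors."""
    return 1.0 - sum(poisson_pmf(k, moi) for k in range(k_min))


for moi in (0.3, 1.0, 3.0, 5.0):
    single = poisson_pmf(1, moi)
    multi = fraction_with_at_least(2, moi)
    print(f"MOI {moi}: exactly one vector {single:.3f}, "
          f"two or more {multi:.3f}")
```

Under this approximation, roughly 80% of cells at an MOI of 3 would carry two or more vectors, compared with about 26% at an MOI of 1, which is why a high MOI favors combinatorial perturbations.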

Use of Target Cells and Transcription Factors In Vitro Models

In certain embodiments, the present invention provides methods of generating target cell types in vitro. In vitro models may be obtained by overexpressing transcription factors identified through screening as described herein. In certain embodiments, the methods advantageously produce homogeneous cell types. The methods also provide target cells with reduced labor, time and cost.

In certain embodiments, the in vitro models of the present invention may be used to study development, cell biology and disease. In certain embodiments, the in vitro models of the present invention may be used to screen for drugs capable of modulating the target cells or for determining toxicity of drugs (e.g., toxic to cardiomyocytes). In certain embodiments, the in vitro models of the present invention may be used to identify specific cell states and/or subtypes.

In certain embodiments, the in vitro models of the present invention may be used in perturbation studies. Perturbations may include conditions, substances or agents. Agents may be of physical, chemical, biochemical and/or biological nature. Perturbations may include treatment with a small molecule, protein, RNAi, CRISPR system, TALE system, Zinc finger system, meganuclease, pathogen, allergen, biomolecule, or environmental stress. Such methods may be performed in any manner appropriate for the particular application.

In certain embodiments, the in vitro models are configured for performing perturb-seq. Methods and tools for genome-scale screening of perturbations in single cells using CRISPR have been described, herein referred to as perturb-seq (see e.g., Dixit et al., “Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens” 2016, Cell 167, 1853-1866; Adamson et al., “A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response” 2016, Cell 167, 1867-1882; Feldman et al., Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens, bioRxiv 262121, doi: doi.org/10.1101/262121; Datlinger, et al., 2017, Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods. Vol. 14 No. 3 DOI: 10.1038/nmeth.4177; Hill et al., On the design of CRISPR-based single cell molecular screens, Nat Methods. 2018 April; 15(4): 271-274; Replogle, et al., “Combinatorial single-cell CRISPR screens by direct guide RNA capture and targeted sequencing” Nat Biotechnol (2020). doi.org/10.1038/s41587-020-0470-y; and International Patent Publication No. WO 2017/075294). In certain embodiments, stem cells are configured for expression of a CRISPR enzyme, such that the cells can be induced to differentiate by overexpressing a transcription factor and barcoded guide sequences can be introduced to the cells.
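Whether the barcodes are guide sequences, as in perturb-seq, or TF barcodes, as in the pooled screens described herein, each cell's perturbation identity must be called from barcode reads captured alongside its transcriptome. A minimal sketch, assuming barcode UMI counts per cell are already available and using a simple count-and-fraction threshold; the thresholds, function name and IDs are hypothetical, and published pipelines use a variety of calling strategies.

```python
def assign_barcodes(cell_umis, min_umis=3, min_fraction=0.1):
    """Call the perturbation barcodes present in each cell.

    cell_umis maps a cell ID to {barcode: UMI count}. A barcode is called
    when it has at least `min_umis` UMIs and accounts for at least
    `min_fraction` of that cell's barcode UMIs. Cells with no called
    barcode get an empty tuple (unassigned).
    """
    assignments = {}
    for cell, counts in cell_umis.items():
        total = sum(counts.values())
        called = sorted(
            barcode
            for barcode, n in counts.items()
            if total > 0 and n >= min_umis and n / total >= min_fraction
        )
        assignments[cell] = tuple(called)
    return assignments


# Hypothetical barcode UMI counts for three cells.
umis = {
    "cell_1": {"sg_1": 40, "sg_2": 1},
    "cell_2": {"sg_1": 12, "sg_3": 15},
    "cell_3": {"sg_2": 2},
}
print(assign_barcodes(umis))
# {'cell_1': ('sg_1',), 'cell_2': ('sg_1', 'sg_3'), 'cell_3': ()}
```

Cells assigned more than one barcode correspond to combinatorial perturbations, while unassigned cells would typically be excluded from perturbation-specific analyses.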

Differentiation of Progenitor Cells

In certain embodiments, target cells are further differentiated. In certain embodiments, cells are differentiated by spontaneous differentiation. In certain embodiments, cells are differentiated by directed differentiation.

As used herein the term “spontaneous differentiation” refers to a process where progenitor cells spontaneously differentiate into a target cell and usually involves removal of growth factors from the media. In certain embodiments, the process of spontaneous differentiation can be accelerated by suboptimal culture conditions, such as cultivation to high density for extended periods (4-7 weeks) without replacement of a feeder layer. In certain embodiments, neural progenitor cells obtained by overexpressing transcription factors are spontaneously differentiated into neurons, astrocytes and oligodendrocytes by removal of growth factors from the media (see, e.g., Examples 1-2).

As used herein the term “directed differentiation” refers to exposing the stem cells or pluripotent cells to specific signaling pathway modulators and manipulating cell culture conditions (environmental or exogenous) to mimic the natural sequence of developmental decisions to produce a given cell type/tissue. In certain embodiments, pluripotent stem cells (PSCs) are cultured in controlled conditions involving specific substrate or extracellular matrices promoting cell adhesion and differentiation, and defined culture media compositions. A limited number of signaling factors, such as growth factors or small molecules, controlling cell differentiation is applied sequentially or in a combinatorial manner, at varying dosage and exposure time (Cohen D E, Melton D, 2011 “Turning straw into gold: directing cell fate for regenerative medicine”. Nature Reviews Genetics. 12 (4): 243-252). In certain embodiments, radial glia produced using the TF overexpression method as described herein can also be differentiated by directed differentiation into neurons, astrocytes, oligodendrocytes, or organoids.

As used herein, the term “organoid” or “epithelial organoid” refers to a cell cluster or aggregate that resembles an organ, or part of an organ, and possesses cell types relevant to that particular organ. Organoid systems have been described previously, for example, for brain, retinal, stomach, lung, thyroid, small intestine, colon, liver, kidney, pancreas, prostate, mammary gland, fallopian tube, taste buds, salivary glands, and esophagus (see, e.g., Clevers, Modeling Development and Disease with Organoids, Cell. 2016 Jun. 16; 165(7):1586-1597).

In certain embodiments, directed differentiation may include the use of hormones, cytokines, growth factors, mitogens or any other differentiation promoting agents.

In certain embodiments, dual SMAD inhibition (Chambers et al., 2009; Shi et al., 2012a) is used to differentiate RFX4 neural progenitor cells towards CNS cell types, radial glia, and neurons. In certain embodiments, the neurons are GABAergic neurons. Dual SMAD inhibition may include two inhibitors of SMAD signaling. One inhibitor may be a BMP inhibitor. BMP inhibitors include chordin, follistatin, and noggin (Chambers et al., 2009). The two inhibitors may be Noggin and SB431542. SB431542 inhibits the Lefty/Activin/TGFβ pathways by blocking phosphorylation of ALK4, ALK5, ALK7 receptors. Id.

Non-limiting examples of hormones include growth hormone (GH), adrenocorticotropic hormone (ACTH), dehydroepiandrosterone (DHEA), cortisol, epinephrine, thyroid hormone, estrogen, progesterone, testosterone, or combinations thereof.

Non-limiting examples of cytokines include lymphokines (e.g., interferon-γ, IL-2, IL-3, IL-4, IL-6, granulocyte-macrophage colony-stimulating factor (GM-CSF), leukocyte migration inhibitory factors (T-LIF, B-LIF), lymphotoxin-alpha, macrophage-activating factor (MAF), macrophage migration-inhibitory factor (MIF), neuroleukin, immunologic suppressor factors, transfer factors, or combinations thereof), monokines (e.g., IL-1, TNF-alpha, interferon-α, interferon-β, colony stimulating factors, e.g., CSF2, CSF3, macrophage CSF or GM-CSF, or combinations thereof), chemokines (e.g., beta-thromboglobulin, C chemokines, CC chemokines, CXC chemokines, CX3C chemokines, macrophage inflammatory protein (MIP), or combinations thereof), interleukins (e.g., IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14, IL-15, IL-17, IL-18, IL-19, IL-20, IL-21, IL-22, IL-23, IL-24, IL-25, IL-26, IL-27, IL-28, IL-29, IL-30, IL-31, IL-32, IL-33, IL-34, IL-35, IL-36, or combinations thereof), and several related signaling molecules, such as tumor necrosis factor (TNF) and interferons (e.g., interferon-α, interferon-β, interferon-γ, interferon-λ, or combinations thereof).

Non-limiting examples of growth factors include those of fibroblast growth factor (FGF) family, bone morphogenic protein (BMP) family, platelet derived growth factor (PDGF) family, transforming growth factor beta (TGFbeta) family, nerve growth factor (NGF) family, epidermal growth factor (EGF) family, insulin related growth factor (IGF) family, hepatocyte growth factor (HGF) family, hematopoietic growth factors (HeGFs), platelet-derived endothelial cell growth factor (PD-ECGF), angiopoietin, vascular endothelial growth factor (VEGF) family, glucocorticoids, or combinations thereof.

Non-limiting examples of mitogens include phytohaemagglutinin (PHA), concanavalin A (conA), lipopolysaccharide (LPS), pokeweed mitogen (PWM), phorbol ester such as phorbol myristate acetate (PMA) with or without ionomycin, or combinations thereof.

Non-limiting examples of cell surface receptors the ligands of which may act as immunomodulants include Toll-like receptors (TLRs) (e.g., TLR1, TLR2, TLR3, TLR4, TLR5, TLR6, TLR7, TLR8, TLR9, TLR10, TLR11, TLR12 or TLR13), CD80, CD86, CD40, CCR7, or C-type lectin receptors.

In certain embodiments, differentiation promoting agents may be used to obtain particular types of target cells. Differentiation promoting agents include anticoagulants, chelating agents, and antibiotics. Examples of such agents may be one or more of the following: vitamins and minerals or derivatives thereof, such as A (retinol), B3, C (ascorbate), ascorbate 2-phosphate, D such as D2 or D3, K, retinoic acid, nicotinamide, zinc or zinc compounds, and calcium or calcium compounds; natural or synthetic hormones such as hydrocortisone and dexamethasone; amino acids or derivatives thereof, such as L-glutamine (L-glu), ethylene glycol tetraacetic acid (EGTA), proline, and non-essential amino acids (NEAA); compounds or derivatives thereof, such as β-mercaptoethanol, dibutyryl cyclic adenosine monophosphate (db-cAMP), monothioglycerol (MTG), putrescine, dimethyl sulfoxide (DMSO), hypoxanthine, adenine, forskolin, cilostamide, and 3-isobutyl-1-methylxanthine; nucleosides and analogues thereof, such as 5-azacytidine; acids or salts thereof, such as ascorbic acid, pyruvate, okadaic acid, linoleic acid, ethylenediaminetetraacetic acid (EDTA), anticoagulant citrate dextrose formula A (ACDA), disodium EDTA, sodium butyrate, and glycerophosphate; antibiotics or drugs, such as G418, gentamicin, Pentoxifylline (1-(5-oxohexyl)-3,7-dimethylxanthine), and indomethacin; and proteins such as tissue plasminogen activator (TPA).

Transdifferentiation

In certain embodiments, the screening platform and methods of screening are used for identifying transcription factors that drive transdifferentiation of cells into target cell types. As used herein, the terms “transdifferentiation” and “lineage reprogramming” refer to the process by which a committed cell of a first cell lineage is changed into another cell of a different cell type or a process in which one mature somatic cell transforms into another mature somatic cell without undergoing an intermediate pluripotent state or progenitor cell type. In some embodiments, transdifferentiation may be a combination of retrodifferentiation and redifferentiation. A “transdifferentiated cell” is a cell that results from transdifferentiation of a committed cell. For example, a committed cell such as a blood cell or glial cell may be transdifferentiated into a neuron; or a fibroblast may be transdifferentiated into a myocyte. As used herein, “retrodifferentiation” is the process by which a committed cell, i.e., mature, specialized cell, reverts back to a more primitive cell stage. A “retrodifferentiated cell” is a cell that results from retrodifferentiation of a committed cell. As used herein, “redifferentiation” refers to the process by which an uncommitted cell or a retrodifferentiated cell differentiates into a more mature, specialized cell. A “redifferentiated cell” refers to a cell that results from redifferentiation of an uncommitted cell or a retrodifferentiated cell. If a redifferentiated cell is obtained through redifferentiation of a retrodifferentiated cell, the redifferentiated cell may be of the same or different lineage as the committed cell that had undergone retrodifferentiation. 
For example, a committed cell such as a white blood cell may be retrodifferentiated to form a retrodifferentiated cell such as a pluripotent stem cell, and then the retrodifferentiated cell may be redifferentiated to form a lymphocyte, which is of the same lineage as the white blood cell (committed cell), or redifferentiated to form a neuron, which is of a different lineage than the white blood cell (committed cell).

In certain embodiments, transcription factors are used to transdifferentiate cells of one lineage into a target cell of a different lineage. In certain embodiments, target cell types can be transferred to a subject in need thereof to regenerate a diseased or damaged tissue. One study showed that islet α-cells can be lineage-traced and reprogrammed by the transcription factors PDX1 and MAFA to produce and secrete insulin in response to glucose, and that these cells are capable of reversing diabetes in mice (see, e.g., Furuyama, K. et al., 2019 Diabetes relief in mice by glucose-sensing insulin-secreting human α-cells Nature 567, 43-48). Another study showed that functional cardiomyocytes can be directly reprogrammed from differentiated somatic cells using three developmental transcription factors (i.e., Gata4, Mef2c and Tbx5) (see, e.g., Ieda, et al. (2010). “Direct Reprogramming of Fibroblasts into Functional Cardiomyocytes by Defined Factors”. Cell. 142 (3): 375-386). Another study found that a combination of three factors, Ascl1, Brn2 and Myt1l, sufficed to convert mouse embryonic and postnatal fibroblasts into functional neurons in vitro (see, e.g., Vierbuchen, et al., (2010). “Direct conversion of fibroblasts to functional neurons by defined factors”. Nature. 463 (7284): 1035-1041). In certain embodiments, transcription factors that differentiate stem cells into a target cell (e.g., progenitor cell) can be used to transdifferentiate cells of one lineage into a target cell of a different lineage. In certain embodiments, TFs that are expressed in progenitor cells can be used to transdifferentiate cells of one lineage into a target cell of a different lineage (see, e.g., Graf, T.; Enver, T. (2009). “Forcing cells to change lineages”. Nature. 462 (7273): 587-594). In this approach, transcription factors from progenitor cells of the target cell type are transfected into a somatic cell to induce transdifferentiation.
Determining the unique set of cellular factors that must be manipulated for each cell conversion is a long and costly process that involves much trial and error. Previous methods required narrowing down factors one by one. As a result, this first step of identifying the key set of cellular factors for cell conversion is the major obstacle researchers face in the field of cell reprogramming. In certain embodiments, the pooled screening methods described herein are used to determine which transcription factors to use.

In certain embodiments, cells can be transdifferentiated to target cells in vivo by targeted modulation of transcription factors or downstream targets. In certain embodiments, the targeted modulation of transcription factors can be used to regenerate, replenish or replace damaged or diseased cells in a subject in need thereof (e.g., heart cells, pancreatic β cells, eye cells, nervous system cells).

In certain embodiments, modulation of one or more of the transcription factors RFX4, NFIB, ASCL1 and PAX6 is used to transdifferentiate glial cells into neurons, astrocytes, or oligodendrocytes. For example, oligodendrocytes may be produced to regenerate the myelin sheath on axons.

In certain embodiments, modulation of one or more of the transcription factors MESP1, EOMES and ESR1 is used to transdifferentiate cardiac fibroblasts into cardiomyocytes. For example, cardiomyocytes may be produced to regenerate a damaged heart.

Cell State Transitions

In certain embodiments, the screening platform and methods of screening are used for identifying transcription factors that modify the cell state or cell state transitions of target cell types. In example embodiments, cell state reflects the fact that cells of a particular type can exhibit variability with regard to one or more features and/or can exist in a variety of different conditions, while retaining the features of their particular cell type and not gaining features that would cause them to be classified as a different cell type. The different states or conditions in which a cell can exist may be characteristic of a particular cell type (e.g., they may involve properties or characteristics exhibited only by that cell type and/or involve functions performed only or primarily by that cell type) or may occur in multiple different cell types. Sometimes a cell state reflects the capability of a cell to respond to a particular stimulus or environmental condition (e.g., whether or not the cell will respond, or the type of response that will be elicited) or is a condition of the cell brought about by a stimulus or environmental condition. Cells in different cell states may be distinguished from one another in a variety of ways. For example, they may express, produce, or secrete one or more different genes, proteins, or other molecules (“markers”), exhibit differences in protein modifications such as phosphorylation, acetylation, etc., or may exhibit differences in appearance. Thus, a cell state may be a condition of the cell in which the cell expresses, produces, or secretes one or more markers, exhibits particular protein modification(s), has a particular appearance, and/or will or will not exhibit one or more biological response(s) to a stimulus or environmental condition.

In example embodiments, a transcription factor or combination of TFs can transition a cell from one cell program (e.g., a biological program, signature, or expression program as described herein) to another while the cell type remains the same. For example, a cell may transition from an “old cell signature” to a “young cell signature” for rejuvenation (e.g., transitioning an “old neuron” to a “young neuron”). Another example is enhancing certain cell functions, such as increasing the efficiency of T cell killing by transitioning a T cell from an “exhausted T cell signature” to an “active or naïve T cell signature.”

Another example of a cell state is an “activated” state, as compared with a “resting” or “non-activated” state. Many cell types in the body can respond to a stimulus by shifting to an activated state. The particular alterations in state may differ depending on the cell type and/or the particular stimulus. A stimulus could be any biological, chemical, or physical agent to which a cell may be exposed.

Another example of a cell state reflects the condition of a cell (e.g., a muscle cell or adipose cell) as either sensitive or resistant to insulin. Insulin-resistant cells exhibit a decreased response to circulating insulin; for example, insulin-resistant skeletal muscle cells exhibit markedly reduced insulin-stimulated glucose uptake and a variety of other metabolic abnormalities that distinguish them from cells with normal insulin sensitivity.

In an example embodiment, the cell state is an immune cell state. The term “immune cell” as used throughout this specification generally encompasses any cell derived from a hematopoietic stem cell that plays a role in the immune response. The term is intended to encompass immune cells of both the innate and adaptive immune systems. The immune cell as referred to herein may be a leukocyte at any stage of differentiation (e.g., a stem cell, a progenitor cell, a mature cell) or at any activation stage. Immune cells include lymphocytes (such as natural killer cells, T-cells (including, e.g., thymocytes, Th or Tc; Th1, Th2, Th17, Thαβ, CD4+, CD8+, effector Th, memory Th, regulatory Th, CD4+/CD8+ thymocytes, CD4−/CD8− thymocytes, γδ T cells, etc.) or B-cells (including, e.g., pro-B cells, early pro-B cells, late pro-B cells, pre-B cells, large pre-B cells, small pre-B cells, immature or mature B-cells, producing antibodies of any isotype, T1 B-cells, T2 B-cells, naïve B-cells, GC B-cells, plasmablasts, memory B-cells, plasma cells, follicular B-cells, marginal zone B-cells, B-1 cells, B-2 cells, regulatory B cells, etc.)), monocytes (including, e.g., classical, non-classical, or intermediate monocytes), (segmented or banded) neutrophils, eosinophils, basophils, mast cells, histiocytes, microglia, including various subtypes and maturation, differentiation, or activation stages, such as hematopoietic stem cells, myeloid progenitors, lymphoid progenitors, myeloblasts, promyelocytes, myelocytes, metamyelocytes, monoblasts, promonocytes, lymphoblasts, prolymphocytes, small lymphocytes, macrophages (including, e.g., Kupffer cells, stellate macrophages, M1 or M2 macrophages), (myeloid or lymphoid) dendritic cells (including, e.g., Langerhans cells, conventional or myeloid dendritic cells, plasmacytoid dendritic cells, mDC-1, mDC-2, Mo-DC, HP-DC, veiled cells), granulocytes, polymorphonuclear cells, antigen-presenting cells (APC), etc.

As used throughout this specification, “immune response” refers to a response by a cell of the immune system, such as a B cell, T cell (CD4+ or CD8+), regulatory T cell, antigen-presenting cell, dendritic cell, monocyte, macrophage, NKT cell, NK cell, basophil, eosinophil, or neutrophil, to a stimulus. In some embodiments, the response is specific for a particular antigen (an “antigen-specific response”) and refers to a response by a CD4 T cell, CD8 T cell, or B cell via its antigen-specific receptor. In some embodiments, an immune response is a T cell response, such as a CD4+ response or a CD8+ response. Such responses can include, for example, cytotoxicity, proliferation, cytokine or chemokine production, trafficking, or phagocytosis, and can depend on the nature of the immune cell undergoing the response.

T cell response refers more specifically to an immune response in which T cells directly or indirectly mediate or otherwise contribute to an immune response in a subject. A T cell-mediated response may be associated with cell-mediated effects, cytokine-mediated effects, and even effects associated with B cells if the B cells are stimulated, for example, by cytokines secreted by T cells. By way of example but without limitation, effector functions of MHC class I-restricted cytotoxic T lymphocytes (CTLs) may include cytokine and/or cytolytic capabilities, such as lysis of target cells presenting an antigen peptide recognized by the T cell receptor (a naturally occurring or genetically engineered TCR, or a chimeric antigen receptor (CAR)), secretion of cytokines, preferably IFN gamma, TNF alpha, and/or one or more immunostimulatory cytokines, such as IL-2, and/or antigen peptide-induced secretion of cytotoxic effector molecules, such as granzymes, perforins, or granulysin. By way of example but without limitation, for MHC class II-restricted T helper (Th) cells, effector functions may be antigen peptide-induced secretion of cytokines, preferably IFN gamma, TNF alpha, IL-4, IL-5, IL-10, and/or IL-2. By way of example but without limitation, for T regulatory (Treg) cells, effector functions may be antigen peptide-induced secretion of cytokines, preferably IL-10, IL-35, and/or TGF-beta. B cell response refers more specifically to an immune response in which B cells directly or indirectly mediate or otherwise contribute to an immune response in a subject. Effector functions of B cells may include, in particular, production and secretion of antigen-specific antibodies (e.g., a polyclonal B cell response to a plurality of the epitopes of an antigen, i.e., an antigen-specific antibody response), antigen presentation, and/or cytokine secretion.

During persistent immune activation, such as during uncontrolled tumor growth or chronic infections, subpopulations of immune cells, particularly CD8+ or CD4+ T cells, become compromised to different extents with respect to their cytokine and/or cytolytic capabilities. Such immune cells, particularly CD8+ or CD4+ T cells, are commonly referred to as “dysfunctional,” “functionally exhausted,” or “exhausted.” As used herein, the terms “dysfunctional” and “functional exhaustion” refer to a state of a cell in which the cell does not perform its usual function or activity in response to normal input signals, and include refractoriness of immune cells to stimulation, such as stimulation via an activating receptor or a cytokine. Such a function or activity includes, but is not limited to, proliferation (e.g., in response to a cytokine, such as IFN-gamma) or cell division, entrance into the cell cycle, cytokine production, cytotoxicity, migration and trafficking, phagocytic activity, or any combination thereof. Normal input signals can include, but are not limited to, stimulation via a receptor (e.g., T cell receptor, B cell receptor, co-stimulatory receptor). Unresponsive immune cells can have a reduction of at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or even 100% in cytotoxic activity, cytokine production, proliferation, trafficking, phagocytic activity, or any combination thereof, relative to a corresponding control immune cell of the same type. In some particular embodiments of the aspects described herein, a dysfunctional cell is a CD8+ T cell that expresses the CD8 cell surface marker. Such CD8+ cells normally proliferate and produce cell-killing enzymes; e.g., they can release the cytotoxins perforin, granzymes, and granulysin. However, exhausted/dysfunctional T cells do not respond adequately to TCR stimulation and display poor effector function, sustained expression of inhibitory receptors, and a transcriptional state distinct from that of functional effector or memory T cells. Dysfunction/exhaustion of T cells thus prevents optimal control of infection and tumors. Exhausted/dysfunctional immune cells, such as T cells (e.g., CD8+ T cells), may produce reduced amounts of IFN-gamma, TNF-alpha, and/or one or more immunostimulatory cytokines, such as IL-2, compared to functional immune cells. Exhausted/dysfunctional immune cells, such as T cells (e.g., CD8+ T cells), may further produce increased amounts of one or more immunosuppressive transcription factors or cytokines, such as IL-10 and/or Foxp3, compared to functional immune cells, thereby contributing to local immunosuppression. Dysfunctional CD8+ T cells can be both protective and detrimental with respect to disease control. As used herein, a “dysfunctional immune state” refers to an overall suppressive immune state in a subject or a microenvironment of the subject (e.g., the tumor microenvironment). For example, increased IL-10 production leads to suppression of other immune cells in a population of immune cells.
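The percentage thresholds above compare a functional readout from a test cell population with the same readout from a control population of the same cell type. The sketch below shows that arithmetic, assuming one non-negative scalar readout per population (for example, a cytokine concentration or a specific-lysis value); the function names are hypothetical and are not part of the disclosed methods.

```python
def percent_reduction(test_value: float, control_value: float) -> float:
    """Reduction of a test readout relative to a control readout, in percent.

    Hypothetical helper: 0 means no reduction, 100 means the readout is abolished.
    A test value above the control gives a negative result (an increase).
    """
    if control_value <= 0:
        raise ValueError("control_value must be positive")
    if test_value < 0:
        raise ValueError("test_value must be non-negative")
    return 100.0 * (control_value - test_value) / control_value


def is_unresponsive(test_value: float, control_value: float, threshold_pct: float) -> bool:
    """True if the readout is reduced by at least threshold_pct percent (e.g., 10, 50, 90)."""
    return percent_reduction(test_value, control_value) >= threshold_pct


if __name__ == "__main__":
    # Illustrative numbers only: test cells secrete 300 units of a cytokine, controls 1000.
    print(f"{percent_reduction(300.0, 1000.0):.1f}% reduction")  # 70.0% reduction
    print(is_unresponsive(300.0, 1000.0, 50.0))  # True
    print(is_unresponsive(300.0, 1000.0, 80.0))  # False
```

When several readouts are measured (cytotoxicity, cytokine production, proliferation), each can be compared against its own control in the same way, matching the "or any combination thereof" language.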

CD8+ T cell function is associated with their cytokine profiles. It has been reported that effector CD8+ T cells able to produce multiple cytokines simultaneously (polyfunctional CD8+ T cells) are associated with protective immunity in patients with controlled chronic viral infections, as well as in cancer patients responsive to immune therapy (Spranger et al., 2014, J. Immunother. Cancer, vol. 2, 3). In the presence of persistent antigen, CD8+ T cells were found to lose cytolytic activity completely over time (Moskophidis et al., 1993, Nature, vol. 362, 758-761). It was subsequently found that dysfunctional T cells progressively lose the capacity to produce IL-2, TNF-α, and IFN-γ in a hierarchical order (Wherry et al., 2003, J. Virol., vol. 77, 4911-4927). Decoupled dysfunctional and activated CD8+ cell states have also been described (see, e.g., Singer et al. (2016). A Distinct Gene Module for Dysfunction Uncoupled from Activation in Tumor-Infiltrating T Cells. Cell 166, 1500-1511 e1509; WO/2017/075478; and WO/2018/049025).
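Polyfunctionality, as described above, is usually summarized by counting how many cytokines in a panel each cell produces at the same time. A minimal sketch, assuming each cell has already been called positive or negative for IL-2, TNF-α, and IFN-γ (for example, after gating intracellular cytokine staining data); the panel, the two-cytokine cutoff for "polyfunctional," and the function names are illustrative assumptions.

```python
from collections import Counter

# Illustrative cytokine panel matching the three cytokines named above.
PANEL = ("IL-2", "TNF-alpha", "IFN-gamma")


def cytokine_count(cell: dict[str, bool]) -> int:
    """Number of panel cytokines a single cell is positive for (missing keys count as negative)."""
    return sum(1 for cytokine in PANEL if cell.get(cytokine, False))


def polyfunctional_fraction(cells: list[dict[str, bool]], min_cytokines: int = 2) -> float:
    """Fraction of cells producing at least min_cytokines panel cytokines simultaneously."""
    if not cells:
        raise ValueError("cells must be non-empty")
    return sum(cytokine_count(c) >= min_cytokines for c in cells) / len(cells)


def count_distribution(cells: list[dict[str, bool]]) -> Counter:
    """How many cells produce 0, 1, 2, ... panel cytokines."""
    return Counter(cytokine_count(c) for c in cells)


cells = [
    {"IL-2": True, "TNF-alpha": True, "IFN-gamma": True},  # triple producer
    {"TNF-alpha": True, "IFN-gamma": True},                # double producer
    {"IFN-gamma": True},                                   # single producer
    {},                                                    # non-producer
]
print(polyfunctional_fraction(cells))  # 0.5
print(sorted(count_distribution(cells).items()))  # [(0, 1), (1, 1), (2, 1), (3, 1)]
```

Keeping the full count distribution, rather than only the polyfunctional fraction, shows how cells are spread across single, double, and triple producers.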

As used herein, terms such as “Th17 cell” and/or “Th17 phenotype” and all grammatical variations thereof refer to a differentiated T helper cell that expresses one or more cytokines selected from the group consisting of interleukin 17A (IL-17A), interleukin 17F (IL-17F), and interleukin 17A/F heterodimer (IL-17A/F). As used herein, terms such as “Th1 cell” and/or “Th1 phenotype” and all grammatical variations thereof refer to a differentiated T helper cell that expresses interferon gamma (IFNγ). As used herein, terms such as “Th2 cell” and/or “Th2 phenotype” and all grammatical variations thereof refer to a differentiated T helper cell that expresses one or more cytokines selected from the group consisting of interleukin 4 (IL-4), interleukin 5 (IL-5), and interleukin 13 (IL-13). As used herein, terms such as “Treg cell” and/or “Treg phenotype” and all grammatical variations thereof refer to a differentiated T cell that expresses Foxp3.
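The phenotype definitions above are marker-based rules, so they can be written as a lookup table. The sketch below assumes that each cell's expressed cytokines and transcription factors have already been called (for example, by flow cytometry or scRNA-seq) and that the cell is already known to be a differentiated T cell; the marker labels and the function name are illustrative. Because the definitions are not mutually exclusive, a cell can match more than one phenotype.

```python
# Marker sets transcribed from the definitions above; a cell matches a phenotype
# if it expresses at least one marker listed for that phenotype.
PHENOTYPE_MARKERS: dict[str, frozenset[str]] = {
    "Th17": frozenset({"IL-17A", "IL-17F", "IL-17A/F"}),
    "Th1": frozenset({"IFN-gamma"}),
    "Th2": frozenset({"IL-4", "IL-5", "IL-13"}),
    "Treg": frozenset({"Foxp3"}),
}


def assign_phenotypes(expressed: set[str]) -> set[str]:
    """Return every phenotype whose marker set overlaps the expressed markers."""
    return {name for name, markers in PHENOTYPE_MARKERS.items() if markers & expressed}


print(sorted(assign_phenotypes({"IL-17F"})))               # ['Th17']
print(sorted(assign_phenotypes({"IL-17A", "IFN-gamma"})))  # ['Th1', 'Th17']
print(assign_phenotypes({"IL-2"}))                         # set()
```

Returning a set rather than a single label keeps mixed phenotypes (for example, cells expressing both IL-17A and IFNγ) visible instead of forcing an arbitrary choice.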

Depending on the cytokines used for differentiation, in vitro polarized Th17 cells can either cause severe autoimmune responses upon adoptive transfer (‘pathogenic Th17 cell state’) or have little or no effect in inducing autoimmune disease (‘non-pathogenic cell state’) (Ghoreschi et al., 2010; and Lee et al., 2012 “Induction and molecular signature of pathogenic Th17 cells,” Nature Immunology, vol. 13(10): 991-999). A dynamic regulatory network controls Th17 differentiation (See e.g., Yosef et al., Dynamic regulatory network controlling Th17 cell differentiation, Nature, vol. 496: 461-468 (2013); Wang et al., CD5L/AIM Regulates Lipid Biosynthesis and Restrains Th17 Cell Pathogenicity, Cell Volume 163, Issue 6, p 1413-1427, 3 Dec. 2015; Gaublomme et al., Single-Cell Genomics Unveils Critical Regulators of Th17 Cell Pathogenicity, Cell Volume 163, Issue 6, p 1400-1412, 3 Dec. 2015; and International publication numbers WO2016138488A2, WO2015130968, WO/2012/048265, WO/2014/145631 and WO/2014/134351, the contents of which are hereby incorporated by reference in their entirety).

Markers specific for the cell state can be determined for each TF as described previously (e.g., activated, quiescent, exhausted cell state markers). Markers can be determined, for example, by scRNA-seq (e.g., entire programs), flow FISH, reporters, etc.
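When a cell state is defined by a program of marker genes rather than a single marker, one simple way to summarize it per cell is a signature score: the mean of the marker genes' expression after standardizing each gene across cells. The sketch below is one such score, not the scoring method of this disclosure; it assumes a small, already normalized expression table stored as a dictionary mapping gene names to per-cell values, and the gene names in the example are illustrative.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Z-score one gene across cells; a gene with zero variance scores 0 in every cell."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def signature_scores(expr: dict[str, list[float]], markers: list[str]) -> list[float]:
    """Per-cell mean of standardized expression over the marker genes present in expr."""
    present = [g for g in markers if g in expr]
    if not present:
        raise ValueError("none of the marker genes are in the expression table")
    n_cells = len(expr[present[0]])
    if any(len(expr[g]) != n_cells for g in present):
        raise ValueError("all genes must have one value per cell")
    z = {g: standardize(expr[g]) for g in present}
    return [mean(z[g][i] for g in present) for i in range(n_cells)]


# Three cells; the two marker genes rise together from cell 0 to cell 2.
expr = {
    "GZMB": [0.0, 2.0, 4.0],
    "PRF1": [1.0, 1.0, 4.0],
    "ACTB": [5.0, 5.0, 5.0],  # not a marker here; ignored by the score
}
scores = signature_scores(expr, ["GZMB", "PRF1", "NOT_MEASURED"])
print([round(s, 3) for s in scores])
```

Marker genes missing from the table are skipped rather than treated as zero. Tools such as Scanpy's `score_genes` take a different approach, subtracting the average expression of a randomly sampled reference gene set instead of z-scoring.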

Therapeutic Compositions and Uses

In certain embodiments, the cells produced according to the present invention are used for treatment, to model a disease, or to screen for therapeutic agents. In certain embodiments, target cells obtained or transdifferentiated according to the methods described herein may be used to treat a subject in need thereof. In certain embodiments, target cells are transferred to a subject to repair, regenerate, replace, or replenish a target tissue or cell type. In certain embodiments, transcription factors, or agents capable of modulating the expression or activity of the transcription factors or their downstream pathways, are introduced in vivo to generate target cells. In certain embodiments, the TFs or agents are introduced into a specific target region requiring the target cells.

As used herein, a “subject” is a vertebrate, including any member of the class Mammalia. As used herein, a “mammal” refers to any mammal, including but not limited to a human, mouse, rat, sheep, monkey, goat, rabbit, hamster, horse, cow, or pig.

In certain embodiments, a cell-based therapeutic includes engraftment of the cells of the present invention. As used herein, the term “engraft” or “engraftment” refers to the process of cell incorporation into a tissue of interest in vivo through contact with existing cells of the tissue.

In certain embodiments, the cell-based therapy may comprise adoptive cell transfer (ACT). As used herein, adoptive cell transfer and adoptive cell therapy are used interchangeably. In certain embodiments, the target cells differentiated according to the methods described herein may be transferred to a subject in need thereof. Where possible, the use of autologous cells benefits the recipient by minimizing graft-versus-host disease (GVHD). In certain embodiments, autologous stem cells are harvested from a subject and modulated to overexpress the transcription factor(s), thereby differentiating the stem cells into target cells.

In certain embodiments, the target cells are used as a cell-based therapy to treat a subject suffering from a disease. In certain embodiments, the disease may be treated by infusion of target cell types (see, e.g., US Patent Publication No. 20110091433A1 and Table 2 of application). In certain embodiments, a disease may be treated by inducing target cells in vivo. Target cells may be induced by expressing transcription factors at a specific site of the disease, and the transcription factors may be provided to specific cells at that location. In certain embodiments, the transcription factors are provided as mRNA. In certain embodiments, transdifferentiation into target cells is performed in vivo.

Diseases

In certain embodiments, the cells produced according to the present invention are used to treat or model a disease, or to screen for therapeutic agents against it, wherein the disease may be selected from the group consisting of bone marrow failure, hematological conditions, aplastic anemia, beta-thalassemia, diabetes, neuron disease, motor neuron disease, Parkinson's disease, spinal cord injury, muscular dystrophy, kidney disease, liver disease, multiple sclerosis, congestive heart failure, head trauma, lung disease, psoriasis, liver cirrhosis, vision loss, cystic fibrosis, hepatitis C virus infection, human immunodeficiency virus (HIV) infection, inflammatory bowel disease (IBD), and any disorder associated with tissue degeneration.

In certain embodiments, the neuron disease may be a disease in which GABAergic neurons are implicated. In certain embodiments, the disease may be autism, schizophrenia, epilepsy, dementia, Alzheimer's disease, anxiety disorders, or depression (Rudy et al., Three Groups of Interneurons Account for Nearly 100% of Neocortical GABAergic Neurons, Dev Neurobiol. 2011 Jan. 1; 71(1): 45-61; Xu and Wong, GABAergic Inhibitory Neurons as Therapeutic Targets for Cognitive Impairment in Schizophrenia, Acta Pharmacol Sin. 2018 May; 39(5): 733-753; Fogaça and Duman, Cortical GABAergic Dysfunction in Stress and Depression: New Insights for Therapeutic Interventions, Front Cell Neurosci. 2019; 13: 87; Choi et al., Pathology of nNOS expressing GABAergic neurons in mouse model of Alzheimer's disease, Neuroscience. 2018 Aug. 1; 384: 41-53; Treiman, GABAergic Mechanisms in Epilepsy, Epilepsia. 2001; 42 Suppl 3:8-12; and Coghlan et al., GABA System Dysfunction in Autism and Related Disorders: From Synapse to Symptoms, Neurosci Biobehav Rev. 2012 October; 36(9): 2044-2055).

Aplastic anemia is a rare but potentially fatal bone marrow disorder, marked by pancytopenia and hypocellular bone marrow (Young et al. Blood 2006, 108: 2509-2519). The disorder may be caused by an immune-mediated pathophysiology in which activated type I cytotoxic T cells expressing Th1 cytokines, especially γ-interferon, target the hematopoietic stem cell compartment, leading to bone marrow failure and hence loss of hematopoiesis (Bacigalupo et al. Hematology 2007, 23-28). The majority of aplastic anemia patients can be treated with stem cell transplantation from HLA-matched siblings (Locasciulli et al. Haematologica. 2007; 92:11-18.).

Thalassemia is an inherited autosomal recessive blood disease marked by a reduced rate of synthesis of one of the globin chains that make up hemoglobin. The resulting underproduction of normal globin proteins, often due to mutations in regulatory genes, leads to formation of abnormal hemoglobin molecules and causes anemia. Different types of thalassemia include alpha thalassemia, beta thalassemia, and delta thalassemia, which affect production of alpha globin, beta globin, and delta globin, respectively.

Diabetes is a syndrome resulting in abnormally high blood sugar levels (hyperglycemia). Diabetes refers to a group of diseases that lead to high blood glucose levels due to defects in either insulin secretion or insulin action in the body. Diabetes is typically separated into two types: type 1 diabetes, marked by diminished production of insulin, and type 2 diabetes, marked by resistance to the effects of insulin. Both types lead to hyperglycemia, which largely causes the symptoms generally associated with diabetes, e.g., excessive urine production, resulting compensatory thirst and increased fluid intake, blurred vision, unexplained weight loss, lethargy, and changes in energy metabolism.

Motor neuron diseases refer to a group of neurological disorders that affect motor neurons. Such diseases include amyotrophic lateral sclerosis (ALS), primary lateral sclerosis (PLS), and progressive muscular atrophy (PMA). ALS is marked by degeneration of both the upper and lower motor neurons, which stops signals from reaching the muscles and results in their weakening and eventual atrophy. PLS is a rare motor neuron disease affecting only upper motor neurons, which causes difficulties with balance, weakness and stiffness in the legs, spasticity, and speech problems. PMA is a subtype of ALS that affects only the lower motor neurons, which can cause muscular atrophy, fasciculations, and weakness.

Parkinson's disease (PD) is a neurodegenerative disorder marked by loss of the nigrostriatal pathway, resulting from degeneration of dopaminergic neurons within the substantia nigra. The cause of PD is not known, but the disease is associated with the progressive death of dopaminergic (tyrosine hydroxylase (TH)-positive) mesencephalic neurons, which induces motor impairment. Hence, PD is characterized by muscle rigidity, tremor, bradykinesia, and potentially akinesia.

Spinal cord injury is characterized by damage to the spinal cord, in particular the nerve fibers, resulting in impairment of some or all of the muscles or nerves below the injury site. Such damage may occur through trauma to the spine that fractures, dislocates, crushes, or compresses one or more of the vertebrae, or through nontraumatic injuries caused by arthritis, cancer, inflammation, or disk degeneration.

Muscular dystrophy (MD) refers to a set of hereditary muscle diseases that weaken skeletal muscles. MD may be characterized by progressive muscle weakness, defects in muscle proteins, muscle cell apoptosis, and tissue atrophy. There are over 100 diseases that exhibit MD characteristics, although nine diseases in particular (Duchenne, Becker, limb girdle, congenital, facioscapulohumeral, myotonic, oculopharyngeal, distal, and Emery-Dreifuss) are classified as MD.

Kidney disease refers to conditions that damage the kidneys and decrease their ability to perform their functions, which include removal of wastes and excess water from the blood; regulation of electrolytes, blood pressure, and acid-base balance; and reabsorption of glucose and amino acids. The two main causes of kidney disease are diabetes and high blood pressure; other causes include glomerulonephritis, lupus, and malformations and obstructions in the kidney.

Multiple sclerosis (MS) is an autoimmune condition in which the immune system attacks the central nervous system, leading to demyelination. MS affects the ability of nerve cells in the brain and spinal cord to communicate with each other, as the body's own immune system attacks and damages the myelin that enwraps neuronal axons. When myelin is lost, the axons can no longer effectively conduct signals. This can lead to various neurological symptoms that usually progress to physical and cognitive disability. In certain embodiments, target cells may include oligodendrocytes.

Congestive heart failure refers to a condition in which the heart cannot pump enough blood to the body's other organs. This condition can result from coronary artery disease, scar tissue on the heart caused by myocardial infarction, high blood pressure, heart valve disease, heart defects, and heart valve infection. Treatment programs typically consist of rest, proper diet, modified daily activities, and drugs such as angiotensin-converting enzyme (ACE) inhibitors, beta blockers, digitalis, diuretics, and vasodilators. However, such treatment programs do not reverse the damage to the heart or the underlying condition.

Hepatitis C is an infectious disease of the liver caused by the hepatitis C virus. Hepatitis C can progress to scarring (fibrosis) and advanced scarring (cirrhosis). Cirrhosis can lead to liver failure and other complications such as liver cancer.

Head trauma refers to an injury of the head that may or may not cause injury to the brain. Common causes of head trauma include traffic accidents, home and occupational accidents, falls, and assaults. Various types of problems may result from head trauma, including skull fracture, lacerations of the scalp, subdural hematoma (bleeding below the dura mater), epidural hematoma (bleeding between the dura mater and the skull), cerebral contusion (brain bruise), concussion (temporary loss of function due to trauma), coma, or even death.

Lung disease is a broad term for diseases of the respiratory system, which includes the lung, pleural cavity, bronchial tubes, trachea, upper respiratory tract, and nerves and muscles for breathing. Examples of lung diseases include obstructive lung diseases, in which the bronchial tubes become narrowed; restrictive or fibrotic lung diseases, in which the lung loses compliance and causes incomplete lung expansion and increased lung stiffness; respiratory tract infections, which can be caused by the common cold or pneumonia; respiratory tumors, such as those caused by cancer; pleural cavity diseases; and pulmonary vascular diseases, which affect pulmonary circulation.

Pharmaceutical Compositions

Target cells of the present invention may be combined with various components to produce compositions of the invention. The compositions may be combined with one or more pharmaceutically acceptable carriers or diluents to produce a pharmaceutical composition (which may be for human or animal use). Suitable carriers and diluents include, but are not limited to, isotonic saline solutions, for example phosphate-buffered saline. The composition of the invention may be administered by direct injection. The composition may be formulated for parenteral, intramuscular, intravenous, subcutaneous, intraocular, oral, transdermal administration, or injection into the spinal fluid.

Compositions comprising target cells may be delivered by injection or implantation. Cells may be delivered in suspension or embedded in a support matrix such as natural and/or synthetic biodegradable matrices. Natural matrices include, but are not limited to, collagen matrices. Synthetic biodegradable matrices include, but are not limited to, polyanhydrides and polylactic acid. These matrices may provide support for fragile cells in vivo.

The compositions may also comprise the target cells of the present invention, and at least one pharmaceutically acceptable excipient, carrier, or vehicle.

Delivery may also be controlled, i.e., carried out over a period of time that may range from several minutes to several hours or days. Delivery may be systemic (for example, by intravenous injection) or directed to a particular site of interest. Cells may be introduced in vivo using liposomal transfer.

Target cells may be administered in doses of from 1×10⁵ to 1×10⁷ cells per kg. For example, a 70 kg patient may be administered 1.4×10⁶ cells for reconstitution of tissues. The dosages may be any combination of the target cells listed in this application.
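The per-kilogram range above scales linearly with body weight. A minimal sketch of that arithmetic; the constants restate the stated bounds, and the function names are hypothetical.

```python
LOW_CELLS_PER_KG = 1e5   # lower bound stated above (cells per kg)
HIGH_CELLS_PER_KG = 1e7  # upper bound stated above (cells per kg)


def total_dose_range(body_weight_kg: float) -> tuple[float, float]:
    """Total number of cells spanned by the per-kg range for a given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return LOW_CELLS_PER_KG * body_weight_kg, HIGH_CELLS_PER_KG * body_weight_kg


def cells_per_kg(total_cells: float, body_weight_kg: float) -> float:
    """Convert a total dose back to cells per kg for comparison with the stated range."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return total_cells / body_weight_kg


low, high = total_dose_range(70.0)
print(f"{low:.1e} to {high:.1e} cells")  # 7.0e+06 to 7.0e+08 cells
```

For a 70 kg patient, the stated per-kilogram bounds correspond to 7×10⁶ to 7×10⁸ cells in total; dividing any proposed total dose by body weight gives the figure to compare against that range.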

Genetic Modifying Agents

In certain embodiments, the one or more modulating agents (e.g., for overexpressing transcription factors, silencing transcription factors or tagging cells with a detectable marker) may be a genetic modifying agent. The genetic modifying agent may comprise a CRISPR system, a zinc finger nuclease system, a TALEN, a meganuclease, or RNAi.

CRISPR

In certain embodiments, a CRISPR system is used to enhance expression or activity of transcription factors. In certain embodiments, the transcription factor expression or activity is enhanced temporarily, such that the enhancement is not permanent. In certain embodiments, expression of the transcription factor from its endogenous gene is enhanced (e.g., by directing an activator to the gene).

In certain embodiments, modification of transcription factor mRNA by a Cas13-deaminase system can be used to modulate transcription factor activity in order to generate target cells (see, e.g., International Patent Publication No. WO 2019/084062). In certain embodiments, the modification prevents ubiquitination, methylation, acetylation, succinylation, glycosylation, O-GlcNAc modification, O-linked glycosylation, iodination, nitrosylation, sulfation, carboxyglutamation, phosphorylation, or a combination thereof. In some embodiments, the modification increases the half-life of a target TF. In certain embodiments, transcriptional activity is enhanced by modifying a phosphorylation site on the transcription factor (see, e.g., Hunter and Karin, 1992, The regulation of Transcription by Phosphorylation. Cell, Vol. 70, 375-387; and Whitmarsh and Davis, 2000, Regulation of transcription factor function by phosphorylation. CMLS, Cell. Mol. Life Sci. 57: 1172).
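Modifying a phosphorylation site through the mRNA depends on whether an available base edit actually recodes the phospho-acceptor residue. The sketch below assumes an adenosine deaminase fused to Cas13 that converts A to inosine (I), which the ribosome reads as G, and uses the standard genetic code; it only asks which single A-to-G changes in a codon replace a serine, threonine, or tyrosine with a non-acceptor residue. Guide design, editing efficiency, and whether a given edit changes TF activity are outside its scope, and the function names are hypothetical.

```python
# Standard genetic code, indexed 16*i + 4*j + k with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
PHOSPHO_ACCEPTORS = {"S", "T", "Y"}


def translate_codon(codon: str) -> str:
    """One-letter amino acid for a codon ('*' = stop); accepts DNA or RNA letters."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(b not in BASES for b in codon):
        raise ValueError(f"invalid codon: {codon!r}")
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def a_to_g_variants(codon: str) -> dict[int, tuple[str, str]]:
    """For each A in the codon, the edited codon and its amino acid after A-to-I (read as G)."""
    codon = codon.upper().replace("U", "T")
    variants = {}
    for pos, base in enumerate(codon):
        if base == "A":
            edited = codon[:pos] + "G" + codon[pos + 1:]
            variants[pos] = (edited, translate_codon(edited))
    return variants


def removes_phosphosite(codon: str) -> bool:
    """True if the codon encodes S/T/Y and one A-to-G edit gives a non-acceptor, non-stop residue."""
    if translate_codon(codon) not in PHOSPHO_ACCEPTORS:
        return False
    return any(aa not in PHOSPHO_ACCEPTORS and aa != "*"
               for _, aa in a_to_g_variants(codon).values())


for codon in ("ACC", "AGC", "TAC", "TCA"):
    print(codon, translate_codon(codon), a_to_g_variants(codon), removes_phosphosite(codon))
```

In this sketch, threonine (ACC to GCC, alanine), serine (AGC to GGC, glycine), and tyrosine (TAC to TGC, cysteine) codons can be recoded, whereas a serine codon such as TCA cannot, because its only A-to-G edit is synonymous.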

In general, a CRISPR-Cas or CRISPR system, as used herein and in documents such as International Patent Publication No. WO 2014/093622 (PCT/US2013/074667), refers collectively to transcripts and other elements involved in the expression of, or directing the activity of, CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is used herein (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)), or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). See, e.g., Shmakov et al. (2015) “Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems”, Molecular Cell, DOI: dx.doi.org/10.1016/j.molcel.2015.10.008.
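The guide sequence directs the complex to a protospacer, and for many Cas proteins the target must also sit next to a protospacer adjacent motif (PAM). As an illustration only, the sketch below assumes Streptococcus pyogenes Cas9 with a 20-nt spacer and a 5'-NGG-3' PAM immediately 3' of the protospacer, and scans both strands of a DNA sequence for candidate sites. It ignores off-target and efficiency scoring, other Cas proteins recognize different PAMs, and the function name is hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Find spacer_len-nt protospacers followed by an NGG PAM on either strand.

    Returns (strand, start, protospacer): start is the 0-based leftmost coordinate
    of the protospacer on the input (top) strand, and protospacer is written 5'->3'
    on its own strand, i.e., the sequence a matching guide would carry (DNA letters).
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    sites = [("+", m.start(), m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Map the reverse-strand position back to top-strand coordinates.
        sites.append(("-", len(seq) - m.start() - spacer_len, m.group(1)))
    return sites


print(find_spcas9_sites("A" * 20 + "TGG"))  # [('+', 0, 'AAAAAAAAAAAAAAAAAAAA')]
print(find_spcas9_sites("CCA" + "T" * 20))  # [('-', 3, 'AAAAAAAAAAAAAAAAAAAA')]
```

Scanning the reverse complement with the same pattern, rather than writing a separate CCN pattern for the top strand, keeps the PAM rule in one place.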

CRISPR-Cas systems generally fall into two classes based on the architecture of their effector modules, and each class is further subdivided by type and subtype. The two classes are Class 1 and Class 2. Class 1 CRISPR-Cas systems have effector modules composed of multiple Cas proteins, some of which form crRNA-binding complexes, while Class 2 CRISPR-Cas systems include a single, multidomain crRNA-binding protein.

In some embodiments, the CRISPR-Cas system used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. In other embodiments, it can be a Class 2 CRISPR-Cas system.

Transcription factors whose expression or activity may be enhanced with a CRISPR system include, e.g., RFX4, NFIB, ASCL1, and PAX6. In certain embodiments, genes are instead targeted for downregulation. In certain embodiments, genes are targeted for editing.


Class 1 CRISPR-Cas Systems

In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. Class 1 CRISPR-Cas systems are divided into types I, III, and IV (Makarova et al. 2020. Nat. Rev. Microbiol. 18: 67-83, particularly as described in FIG. 1). Type I CRISPR-Cas systems are divided into 9 subtypes (I-A, I-B, I-C, I-D, I-E, I-F1, I-F2, I-F3, and I-G) (Makarova et al., 2020). Class 1, Type I CRISPR-Cas systems can contain a Cas3 protein that can have helicase activity. Type III CRISPR-Cas systems are divided into 6 subtypes (III-A, III-B, III-C, III-D, III-E, and III-F). Type III CRISPR-Cas systems can contain a Cas10 protein that can include an RNA recognition motif-like domain called Palm, which can have cyclase activity, and a nuclease domain that can cleave polynucleotides (Makarova et al., 2020). Type IV CRISPR-Cas systems are divided into 3 subtypes (IV-A, IV-B, and IV-C) (Makarova et al., 2020). Class 1 systems also include CRISPR-Cas variants, including Type I-A, I-B, I-E, I-F, and I-U variants, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems (Peters et al., PNAS 114 (35) (2017); DOI: 10.1073/pnas.1709035114; see also Makarova et al. 2018. The CRISPR Journal, v. 1, n 5, FIG. 5).
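The Class 1 hierarchy above (class, type, subtype) can be held in a small lookup table, for example when annotating which kind of system a construct is based on. The sketch below transcribes only the types and subtypes listed in this section; variant systems such as I-U and the Class 2 types are deliberately left out, and the names are illustrative.

```python
# Types and subtypes of Class 1 CRISPR-Cas systems as listed above (after Makarova et al., 2020).
CLASS1_SUBTYPES: dict[str, tuple[str, ...]] = {
    "I": ("I-A", "I-B", "I-C", "I-D", "I-E", "I-F1", "I-F2", "I-F3", "I-G"),
    "III": ("III-A", "III-B", "III-C", "III-D", "III-E", "III-F"),
    "IV": ("IV-A", "IV-B", "IV-C"),
}

# Proteins named in the text for each type; Type IV is left unspecified here.
NAMED_PROTEIN: dict[str, str] = {"I": "Cas3", "III": "Cas10"}


def type_of_subtype(subtype: str) -> str:
    """Return the Class 1 type ("I", "III", or "IV") for a listed subtype."""
    for type_name, subtypes in CLASS1_SUBTYPES.items():
        if subtype in subtypes:
            return type_name
    raise KeyError(f"{subtype!r} is not a Class 1 subtype listed here")


for name, subtypes in CLASS1_SUBTYPES.items():
    print(name, len(subtypes), NAMED_PROTEIN.get(name, "unspecified"))
```

Raising an error for unlisted subtypes, rather than guessing, keeps Class 2 subtypes (for example, those of Type II) from being silently misfiled as Class 1.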

The Class 1 systems typically use a multi-protein effector complex, which can, in some embodiments, include ancillary proteins, such as one or more proteins in a complex referred to as a CRISPR-associated complex for antiviral defense (Cascade), one or more adaptation proteins (e.g., Cas1, Cas2, RNA nuclease), and/or one or more accessory proteins (e.g., Cas4, DNA nuclease), CRISPR-associated Rossmann fold (CARF) domain-containing proteins, and/or RNA transcriptase.

The backbone of the Class 1 CRISPR-Cas system effector complexes can be formed by RNA recognition motif domain-containing protein(s) of the repeat-associated mysterious proteins (RAMPs) family subunits, e.g., Cas5, Cas6, and/or Cas7. RAMP proteins are characterized by having one or more RNA recognition motif domains. In some embodiments, multiple copies of RAMPs can be present. In some embodiments, the Class 1 CRISPR-Cas system can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more Cas5, Cas6, and/or Cas7 proteins. In some embodiments, the Cas6 protein is an RNase, which can be responsible for pre-crRNA processing. When present in a Class 1 CRISPR-Cas system, Cas6 can optionally be physically associated with the effector complex.

Class 1 CRISPR-Cas system effector complexes can, in some embodiments, also include a large subunit. The large subunit can be composed of or include a Cas8 and/or Cas10 protein. See, e.g., FIGS. 1 and 2. Koonin E V, Makarova K S. 2019. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087 and Makarova et al. 2020.

Class 1 CRISPR-Cas system effector complexes can, in some embodiments, include a small subunit (for example, Cas11). See, e.g., FIGS. 1 and 2. Koonin E V, Makarova K S. 2019 Origins and evolution of CRISPR-Cas systems. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087.

In some embodiments, the Class 1 CRISPR-Cas system can be a Type I CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-A CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-B CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-C CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-D CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-E CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F1 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F2 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F3 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-G CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a CRISPR-Cas variant, such as a Type I-A, I-B, I-E, I-F, or I-U variant, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems, as previously described.

In some embodiments, the Class 1 CRISPR-Cas system can be a Type III CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-A CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-B CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-C CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-D CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-E CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-F CRISPR-Cas system.

In some embodiments, the Class 1 CRISPR-Cas system can be a Type IV CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-A CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-B CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-C CRISPR-Cas system.
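
The Class 1 types and subtypes enumerated above can be captured in a small lookup table. The following Python sketch is illustrative only: it encodes just the subtypes named in this section (after Makarova et al. 2020), does not model the transposon- and plasmid-borne variants, and the names `CLASS1_SUBTYPES` and `class1_type_of` are hypothetical.

```python
from typing import Dict, List

# Class 1 types and the subtypes named in the text (after Makarova et al. 2020).
# Transposon- and plasmid-borne variants are deliberately not modeled.
CLASS1_SUBTYPES: Dict[str, List[str]] = {
    "I": ["I-A", "I-B", "I-C", "I-D", "I-E", "I-F1", "I-F2", "I-F3", "I-G"],
    "III": ["III-A", "III-B", "III-C", "III-D", "III-E", "III-F"],
    "IV": ["IV-A", "IV-B", "IV-C"],
}


def class1_type_of(subtype: str) -> str:
    """Return the Class 1 type (I, III, or IV) that contains `subtype`."""
    for type_name, subtypes in CLASS1_SUBTYPES.items():
        if subtype in subtypes:
            return type_name
    raise KeyError(f"{subtype!r} is not a Class 1 subtype listed in the text")


for type_name, subtypes in CLASS1_SUBTYPES.items():
    print(f"Type {type_name}: {len(subtypes)} subtypes")
```

A plain dictionary keeps the table easy to audit against the text; looking up a Class 2 subtype such as "II-A" raises a `KeyError` rather than silently returning a wrong class.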

The effector complex of a Class 1 CRISPR-Cas system can, in some embodiments, include a Cas3 protein that is optionally fused to a Cas2 protein, a Cas4, a Cas5, a Cas6, a Cas7, a Cas8, a Cas10, a Cas11, or a combination thereof. In some embodiments, the effector complex of a Class 1 CRISPR-Cas system can have multiple copies, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14, of any one or more Cas proteins.

Class 2 CRISPR-Cas Systems

The compositions, systems, and methods described in greater detail elsewhere herein can be designed and adapted for use with Class 2 CRISPR-Cas systems. Thus, in some embodiments, the CRISPR-Cas system is a Class 2 CRISPR-Cas system. Class 2 systems are distinguished from Class 1 systems in that they have a single, large, multi-domain effector protein. In certain example embodiments, the Class 2 system can be a Type II, Type V, or Type VI system, which are described in Makarova et al. “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants” Nature Reviews Microbiology, 18:67-83 (February 2020), incorporated herein by reference. Each type of Class 2 system is further divided into subtypes. See Makarova et al. 2020, particularly at Figure 2. Class 2, Type II systems can be divided into 4 subtypes: II-A, II-B, II-C1, and II-C2. Class 2, Type V systems can be divided into 17 subtypes: V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F1 (V-U3), V-F2, V-F3, V-G, V-H, V-I, V-K (V-U5), V-U1, V-U2, and V-U4. Class 2, Type VI systems can be divided into 5 subtypes: VI-A, VI-B1, VI-B2, VI-C, and VI-D.

The distinguishing feature of these types is that their effector complexes consist of a single, large, multi-domain protein. Type V systems differ from Type II systems in their nuclease domains. Type II effectors (e.g., Cas9) contain two nuclease domains that are each responsible for the cleavage of one strand of the target DNA, with the HNH nuclease inserted inside the RuvC-like nuclease domain sequence. Type V effectors (e.g., Cas12) contain only a RuvC-like nuclease domain that cleaves both strands. Type VI effectors (Cas13) are unrelated to the effectors of Type II and V systems, contain two HEPN domains, and target RNA. Cas13 proteins also display collateral activity that is triggered by target recognition. Some Type V systems have also been found to possess this collateral activity toward single-stranded DNA in vitro.

In some embodiments, the Class 2 system is a Type II system. In some embodiments, the Type II CRISPR-Cas system is a II-A CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-B CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C1 CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C2 CRISPR-Cas system. In some embodiments, the Type II system is a Cas9 system. In some embodiments, the Type II system includes a Cas9.

In some embodiments, the Class 2 system is a Type V system. In some embodiments, the Type V CRISPR-Cas system is a V-A CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-C CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-D CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-E CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 (V-U3) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F3 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-G CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-H CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-I CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-K (V-U5) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U4 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system includes a Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), CasX, and/or Cas14.

In some embodiments, the Class 2 system is a Type VI system. In some embodiments, the Type VI CRISPR-Cas system is a VI-A CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B1 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B2 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-C CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-D CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system includes a Cas13a (C2c2), Cas13b (Group 29/30), Cas13c, and/or Cas13d.
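
Similarly, the Class 2 types, their subtypes, and the effector features described above (nuclease domains and target nucleic acid) can be summarized in a minimal Python sketch. The data structure and function names are hypothetical conveniences rather than part of any published tool, and the per-type domain summary simplifies the preceding paragraphs to one entry per type.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Class2Type:
    subtypes: Tuple[str, ...]
    nuclease_domains: Tuple[str, ...]
    target: str  # nucleic acid recognized and cleaved by the effector


# Summary of the Class 2 types described in the text (after Makarova et al. 2020).
CLASS2: Dict[str, Class2Type] = {
    "II": Class2Type(("II-A", "II-B", "II-C1", "II-C2"),
                     ("RuvC-like", "HNH"), "DNA"),
    "V": Class2Type(("V-A", "V-B1", "V-B2", "V-C", "V-D", "V-E", "V-F1",
                     "V-F1 (V-U3)", "V-F2", "V-F3", "V-G", "V-H", "V-I",
                     "V-K (V-U5)", "V-U1", "V-U2", "V-U4"),
                    ("RuvC-like",), "DNA"),
    "VI": Class2Type(("VI-A", "VI-B1", "VI-B2", "VI-C", "VI-D"),
                     ("HEPN", "HEPN"), "RNA"),
}


def types_targeting(nucleic_acid: str) -> List[str]:
    """List the Class 2 types whose effectors target the given nucleic acid."""
    return [name for name, info in CLASS2.items() if info.target == nucleic_acid]


print(types_targeting("RNA"))  # ['VI']
```

Keeping the domain architecture next to the subtype list makes the contrast drawn in the text (two nuclease domains for Type II, one RuvC-like domain for Type V, two HEPN domains and an RNA target for Type VI) explicit in one place.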

Specialized Cas-Based Systems

In some embodiments, the system is a Cas-based system that is capable of performing a specialized function or activity. For example, the Cas protein may be fused to, operably coupled to, or otherwise associated with one or more functional domains. In certain example embodiments, the Cas protein may be a catalytically dead Cas protein (“dCas”) and/or have nickase activity. A nickase is a Cas protein that cuts only one strand of a double-stranded target. In such embodiments, the dCas or nickase provides a sequence-specific targeting functionality that delivers the functional domain to or near a target sequence. Example functional domains that may be fused to, operably coupled to, or otherwise associated with a Cas protein include, but are not limited to, a nuclear localization signal (NLS) domain, a nuclear export signal (NES) domain, a translational activation domain, a transcriptional activation domain (e.g., VP64, p65, MyoD1, HSF1, RTA, and SET7/9), a translation initiation domain, a transcriptional repression domain (e.g., a KRAB domain, NuE domain, NcoR domain, and a SID domain such as a SID4X domain), a nuclease domain (e.g., FokI), a histone modification domain (e.g., a histone acetyltransferase), a light inducible/controllable domain, a chemically inducible/controllable domain, a transposase domain, a homologous recombination machinery domain, a recombinase domain, an integrase domain, and combinations thereof. Methods for generating catalytically dead Cas9 or a nickase Cas9 (WO 2014/204725, Ran et al. Cell. 2013 Sep. 12; 154(6):1380-1389), Cas12 (Liu et al. Nature Communications, 8, 2095 (2017)), and Cas13 (International Patent Publication Nos. WO 2019/005884, WO 2019/060746) are known in the art and incorporated herein by reference.

In some embodiments, the functional domains can have one or more of the following activities: methylase activity, demethylase activity, translation activation activity, translation initiation activity, translation repression activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, molecular switch activity, chemical inducibility, light inducibility, and nucleic acid binding activity. In some embodiments, the one or more functional domains may comprise epitope tags or reporters. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporters include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and auto-fluorescent proteins including blue fluorescent protein (BFP).

The one or more functional domains may be positioned at, near, and/or in proximity to a terminus of the effector protein (e.g., a Cas protein). In embodiments having two or more functional domains, each can be positioned at, near, or in proximity to a terminus of the effector protein (e.g., a Cas protein). In some embodiments, such as those where the functional domain is operably coupled to the effector protein, the one or more functional domains can be tethered or linked to the effector protein (e.g., a Cas protein) via a suitable linker (including, but not limited to, GlySer linkers). When there is more than one functional domain, the functional domains can be the same or different. In some embodiments, all the functional domains are the same. In some embodiments, all of the functional domains are different from each other. In some embodiments, at least two of the functional domains are different from each other. In some embodiments, at least two of the functional domains are the same as each other.
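
As an illustration of terminal fusions joined by GlySer linkers, the sketch below concatenates domain sequences around an effector. Assumptions: the (GGGGS)n repeat and the default n = 3 are illustrative choices; `PKKKRKV` is the SV40 NLS; `[dCas]` and `[VP64]` are placeholder tokens rather than real protein sequences; and the function names are hypothetical.

```python
from typing import List, Optional


def glyser_linker(repeats: int = 3) -> str:
    """Return a (GGGGS)n flexible linker; n = 3 is an illustrative default."""
    return "GGGGS" * repeats


def assemble_fusion(effector: str,
                    n_term: Optional[List[str]] = None,
                    c_term: Optional[List[str]] = None,
                    linker_repeats: int = 3) -> str:
    """Place functional domains at the N- and/or C-terminus of an effector.

    Domains in `n_term` precede the effector and domains in `c_term` follow it,
    each in the order given. A GlySer linker is inserted at every junction.
    """
    parts = [*(n_term or []), effector, *(c_term or [])]
    return glyser_linker(linker_repeats).join(parts)


# Hypothetical construct: SV40 NLS - dCas - VP64 (bracketed names are placeholders).
print(assemble_fusion("[dCas]", n_term=["PKKKRKV"], c_term=["[VP64]"], linker_repeats=1))
# -> PKKKRKVGGGGS[dCas]GGGGS[VP64]
```

Passing the same domain in both lists, or different domains, covers the "same or different" functional-domain embodiments without changing the function.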

Other suitable functional domains can be found, for example, in International Patent Publication No. WO 2019/018423.

Split CRISPR-Cas Systems

In some embodiments, the CRISPR-Cas system is a split CRISPR-Cas system. See e.g., Zetsche et al., 2015. Nat. Biotechnol. 33(2): 139-142, the compositions and techniques of which can be used in and/or adapted for use with the present invention. Split CRISPR-Cas proteins are set forth in further detail herein and in documents incorporated herein by reference. In certain embodiments, each part of a split CRISPR protein is attached to a member of a specific binding pair, and when bound with each other, the members of the specific binding pair maintain the parts of the CRISPR protein in proximity. In certain embodiments, each part of a split CRISPR protein is associated with an inducible binding pair. An inducible binding pair is one which is capable of being switched “on” or “off” by a protein or small molecule that binds to both members of the inducible binding pair. In some embodiments, CRISPR proteins may preferably be split between domains, leaving domains intact. In particular embodiments, said Cas split domains (e.g., RuvC and HNH domains in the case of Cas9) can be simultaneously or sequentially introduced into the cell such that said split Cas domain(s) process the target nucleic acid sequence in the cell. The reduced size of the split Cas compared to the wild-type Cas allows other methods of delivery of the systems to the cells, such as the use of cell-penetrating peptides as described herein.
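
Choosing where to split a CRISPR protein between domains, leaving domains intact, can be sketched as a search for residue positions that do not fall inside any annotated domain. The domain coordinates in the usage example are hypothetical, and the function names are illustrative; in practice, split sites are also chosen using structural information, which this sketch does not model.

```python
from typing import List, Tuple

# (name, first residue, last residue); 1-based, inclusive coordinates.
Domain = Tuple[str, int, int]


def split_sites_between_domains(domains: List[Domain], protein_length: int) -> List[int]:
    """Return every k such that splitting after residue k cuts no annotated domain.

    Splitting after residue k yields fragments 1..k and k+1..protein_length;
    a domain spanning start..end is cut when start <= k < end.
    """
    return [k for k in range(1, protein_length)
            if not any(start <= k < end for _, start, end in domains)]


def split_protein(sequence: str, k: int) -> Tuple[str, str]:
    """Split a sequence after residue k (1-based) into N- and C-terminal parts."""
    return sequence[:k], sequence[k:]


# Hypothetical 10-residue protein with two annotated domains.
sites = split_sites_between_domains([("A", 1, 3), ("B", 6, 8)], 10)
print(sites)  # [3, 4, 5, 8, 9]
```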

Base Editing

In some embodiments, a polynucleotide of the present invention described elsewhere herein (e.g., RFX4, NFIB, ASCL1, PAX6) can be modified using a base editing system. In some embodiments, a Cas protein is connected or fused to a nucleotide deaminase. Thus, in some embodiments the Cas-based system can be a base editing system. As used herein, “base editing” refers generally to the process of polynucleotide modification via a CRISPR-Cas-based or Cas-based system that does not include excising nucleotides to make the modification. Base editing can convert base pairs at precise locations without generating the excess undesired editing byproducts that traditional CRISPR-Cas systems can produce.

In certain example embodiments, the nucleotide deaminase may be a DNA base editor used in combination with a DNA-binding Cas protein such as, but not limited to, Class 2 Type II and Type V systems. Two classes of DNA base editors are generally known: cytosine base editors (CBEs) and adenine base editors (ABEs). CBEs convert a C·G base pair into a T·A base pair (Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Li et al. Nat. Biotech. 36:324-327) and ABEs convert an A·T base pair to a G·C base pair. Collectively, CBEs and ABEs can mediate all four possible transition mutations (C to T, A to G, T to C, and G to A). Rees and Liu. 2018. Nat. Rev. Genet. 19(12): 770-788, particularly at FIGS. 1b, 2a-2c, 3a-3f, and Table 1. In some embodiments, the base editing system includes a CBE and/or an ABE. In some embodiments, a polynucleotide of the present invention described elsewhere herein can be modified using a base editing system. Rees and Liu. 2018. Nat. Rev. Genet. 19(12):770-788. Base editors also generally do not need a DNA donor template or rely on homology-directed repair. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471. Upon binding to a target locus in the DNA, base pairing between the guide RNA of the system and the target DNA strand leads to displacement of a small segment of ssDNA in an “R-loop”. Nishimasu et al. Cell. 156:935-949. DNA bases within the ssDNA bubble are modified by the enzyme component, such as a deaminase. In some systems, the catalytically disabled Cas protein can be a variant or modified Cas that has nickase functionality and generates a nick in the non-edited DNA strand, inducing cells to repair the non-edited strand using the edited strand as a template. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471.
Base editors may be further engineered to optimize conversion of nucleotides (e.g., A:T to G:C). Richter et al. 2020. Nature Biotechnology. doi.org/10.1038/s41587-020-0453-z.
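
The CBE and ABE conversions described above can be illustrated with a toy model that edits every target base inside an activity window of the protospacer. Assumptions: positions are counted 1 to 20 from the PAM-distal end; the default window of positions 4 to 8 approximates what is commonly reported for BE3-type editors and differs between editors; every editable base in the window is converted with 100% efficiency, which real editors do not achieve; and the function name is hypothetical.

```python
from typing import Tuple

# Base conversions read on the protospacer (non-target) strand.
CONVERSIONS = {"CBE": ("C", "T"),   # C·G -> T·A
               "ABE": ("A", "G")}   # A·T -> G·C


def base_edit(protospacer: str, editor: str, window: Tuple[int, int] = (4, 8)) -> str:
    """Toy model: convert every target base inside the activity window.

    Positions are 1-based and counted from the PAM-distal end of the
    protospacer. All in-window targets are converted (100% efficiency assumed).
    """
    if editor not in CONVERSIONS:
        raise ValueError("editor must be 'CBE' or 'ABE'")
    old, new = CONVERSIONS[editor]
    first, last = window
    bases = list(protospacer.upper())
    for pos in range(first, min(last, len(bases)) + 1):
        if bases[pos - 1] == old:
            bases[pos - 1] = new
    return "".join(bases)


print(base_edit("AACCCCCCAAAAAAAAAAAA", "CBE"))  # AACTTTTTAAAAAAAAAAAA
```

Note that the C at position 3 is left untouched because it lies outside the assumed window, which is the basic reason window position matters when choosing a protospacer for a base edit.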

Other example Type V base editing systems are described in International Patent Publication Nos. WO 2018/213708 and WO 2018/213726, and International Patent Application Nos. PCT/US2018/067207, PCT/US2018/067225, and PCT/US2018/067307, which are incorporated herein by reference.

In certain example embodiments, the base editing system may be an RNA base editing system. As with DNA base editors, a nucleotide deaminase capable of converting nucleotide bases may be fused to a Cas protein. However, in these embodiments, the Cas protein must be capable of binding RNA. Example RNA-binding Cas proteins include, but are not limited to, RNA-binding Cas9s such as Francisella novicida Cas9 (“FnCas9”), and Class 2 Type VI Cas systems. The nucleotide deaminase may be a cytidine deaminase, an adenosine deaminase, or an adenosine deaminase engineered to have cytidine deaminase activity. In certain example embodiments, the RNA base editor may be used to delete or introduce a post-translational modification site in the protein encoded by the expressed mRNA. In contrast to DNA base editors, whose edits are permanent in the modified cell, RNA base editors can provide edits where finer temporal control may be needed, for example in modulating a particular immune response. Example Type VI RNA base editing systems are described in Cox et al. 2017. Science 358: 1019-1027, International Patent Publication Nos. WO 2019/005884, WO 2019/005886, and WO 2019/071048, and International Patent Application Nos. PCT/US20018/05179 and PCT/US2018/067207, which are incorporated herein by reference. An example FnCas9 system that may be adapted for RNA base editing purposes is described in International Patent Publication No. WO 2016/106236, which is incorporated herein by reference.
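
To show how an RNA base edit can remove a potential post-translational modification site, the sketch below models an adenosine deaminase edit (A to inosine, which is read as guanosine during translation) and translates a short open reading frame before and after the edit. The ORF is a hypothetical example, the function names are illustrative, and whether any particular threonine or serine is actually phosphorylated is outside the scope of this model.

```python
# Standard genetic code, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate complete codons of an ORF; inosine (I) is read as G."""
    seq = mrna.upper().replace("T", "U").replace("I", "G")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(0, usable, 3))


def adenosine_to_inosine(mrna: str, position: int) -> str:
    """Model an adenosine deaminase edit at a 0-based position (A -> I)."""
    if mrna[position].upper() != "A":
        raise ValueError(f"position {position} is not an adenosine")
    return mrna[:position] + "I" + mrna[position + 1:]


orf = "AUGACCGCA"                         # hypothetical ORF: Met-Thr-Ala
print(translate(orf))                      # MTA
print(translate(adenosine_to_inosine(orf, 3)))  # MAA (Thr -> Ala)
```

In this toy case the edited codon no longer encodes a hydroxyl-bearing residue, which is the kind of change that could remove a phosphorylation site.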

An example method for delivery of base-editing systems, including use of a split-intein approach to divide CBE and ABE into reconstitutable halves, is described in Levy et al. Nature Biomedical Engineering doi.org/10.1038/s41441-019-0505-5 (2019), which is incorporated herein by reference.

Prime Editing

In some embodiments, a polynucleotide of the present invention described elsewhere herein (e.g., RFX4, NFIB, ASCL1, PAX6) can be modified using a prime editing system (See e.g., Anzalone et al. 2019. Nature. 576: 149-157). Like base editing systems, prime editing systems can be capable of targeted modification of a polynucleotide without generating double-stranded breaks and do not require donor templates. Prime editing can operate via a “search-and-replace” methodology and can mediate targeted insertions, deletions, all 12 possible base-to-base conversions, and combinations thereof. Generally, a prime editing system, as exemplified by PE1, PE2, and PE3 (Id.), can include a reverse transcriptase fused or otherwise coupled or associated with an RNA-programmable nickase, and a prime editing extended guide RNA (pegRNA) to facilitate direct copying of genetic information from the extension on the pegRNA into the target polynucleotide. Embodiments that can be used with the present invention include these and variants thereof. Prime editing can have the advantage of lower off-target activity than traditional CRISPR-Cas systems, along with few byproducts and greater or similar efficiency as compared to traditional CRISPR-Cas systems.

In some embodiments, the prime editing guide molecule can both specify the target polynucleotide information (e.g., sequence) and contain a new polynucleotide cargo that replaces target polynucleotides. To initiate transfer from the guide molecule to the target polynucleotide, the PE system can nick the target polynucleotide at a target site to expose a 3′ hydroxyl group, which can prime reverse transcription of an edit-encoding extension region of the guide molecule (e.g., a prime editing guide molecule or peg guide molecule) directly into the target site in the target polynucleotide. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at FIGS. 1b, 1c, related discussion, and Supplementary discussion.
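
The nick-and-prime mechanism determines how the pegRNA extension is laid out: its 3′ primer binding site (PBS) pairs with the 3′ end of the nicked strand, and the reverse transcription template (RTT) immediately 5′ of the PBS encodes the edited sequence. The sketch below derives the extension for a single substitution. Assumptions: an SpCas9-type nickase that nicks the PAM-containing strand 3 nt upstream of the PAM; PBS and RTT lengths of 13 nt are illustrative defaults rather than optimized values; the example target is an arbitrary sequence; and the function names are hypothetical.

```python
# Complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (case-insensitive input)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def peg_extension(pam_strand: str, nick: int, edit_pos: int, new_base: str,
                  pbs_len: int = 13, rtt_len: int = 13) -> str:
    """Return the pegRNA 3' extension (RTT then PBS, 5'->3') as RNA.

    pam_strand: target region on the nicked, PAM-containing strand, 5'->3'.
    nick: 0-based index of the first nucleotide 3' of the nick.
    edit_pos: 0-based index of the substitution; must lie in the RTT region.
    """
    if not nick <= edit_pos < nick + rtt_len:
        raise ValueError("edit must lie within the reverse transcription template")
    if nick - pbs_len < 0 or nick + rtt_len > len(pam_strand):
        raise ValueError("pam_strand is too short for the requested PBS/RTT lengths")
    edited = pam_strand[:edit_pos] + new_base + pam_strand[edit_pos + 1:]
    primer = edited[nick - pbs_len:nick]      # nicked-strand 3' end; pairs with the PBS
    new_flap = edited[nick:nick + rtt_len]    # edited DNA to be reverse-transcribed
    # The extension is antiparallel to primer + new_flap: reverse complement, DNA -> RNA.
    return reverse_complement(primer + new_flap).replace("T", "U")


# Arbitrary example: 20-nt protospacer, TGG PAM, downstream sequence.
# With the protospacer starting at index 0, an SpCas9-type nick falls before index 17.
target = "ACGTACGTACGTACGTACGT" + "TGG" + "CATGCATGCATG"
print(peg_extension(target, nick=17, edit_pos=21, new_base="C"))  # PAM G -> C
```

Computing the extension as a single reverse complement of "primer plus new flap" keeps the RTT-then-PBS order correct by construction.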

In some embodiments, a prime editing system can be composed of a Cas polypeptide having nickase activity, a reverse transcriptase, and a guide molecule. The Cas polypeptide can lack nuclease activity. The guide molecule can include a target binding sequence as well as a primer binding sequence and a template containing the edited polynucleotide sequence. The guide molecule, Cas polypeptide, and/or reverse transcriptase can be coupled together or otherwise associate with each other to form an effector complex and edit a target sequence. In some embodiments, the Cas polypeptide is a Class 2, Type V Cas polypeptide. In some embodiments, the Cas polypeptide is a Cas9 polypeptide (e.g., is a Cas9 nickase). In some embodiments, the Cas polypeptide is fused to the reverse transcriptase. In some embodiments, the Cas polypeptide is linked to the reverse transcriptase.

In some embodiments, the prime editing system can be a PE1 system or variant thereof, a PE2 system or variant thereof, or a PE3 (e.g., PE3, PE3b) system. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at pgs. 2-3, FIGS. 2a, 3a-3f, 4a-4b, Extended data FIGS. 3a-3b, 4,

The peg guide molecule can be about 10 to about 200 or more nucleotides in length, such as 10, 11, 12, 13, 14, 15, ..., 196, 197, 198, 199, or 200 or more nucleotides in length (i.e., any integer length from 10 to 200 nucleotides, or longer). Optimization of the peg guide molecule can be accomplished as described in Anzalone et al. 2019. Nature. 576: 149-157, particularly at pg. 3, FIGS. 2a-2b, and Extended Data FIGS. 5a-5c.

CAST Systems

In some embodiments, a polynucleotide of the present invention described elsewhere herein (e.g., RFX4, NFIB, ASCL1, PAX6) can be modified using a CRISPR-Associated Transposase (“CAST”) system, such as any of those described in PCT/US2019/066835. A CAST system can include a Cas protein that is catalytically inactive, or engineered to be catalytically active, and can further comprise a transposase (or subunits thereof) that catalyzes RNA-guided DNA transposition. Such systems are able to insert DNA sequences at a target site in a DNA molecule without relying on host cell repair machinery. CAST systems can be Class 1 or Class 2 CAST systems. An example Class 1 system is described in Klompe et al. Nature, doi:10.1038/s41586-019-1323, which is incorporated herein by reference. An example Class 2 system is described in Strecker et al. Science. doi:10.1126/science.aax9181 (2019), and International Patent Application No. PCT/US2019/066835, which are incorporated herein by reference.

Guide Molecules

The CRISPR-Cas or Cas-based system described herein can, in some embodiments, include one or more guide molecules. The terms guide molecule, guide sequence, and guide polynucleotide refer to polynucleotides capable of guiding Cas to a target genomic locus and are used interchangeably as in foregoing cited documents such as WO 2014/093622 (PCT/US2013/074667). In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. The guide molecule can be a polynucleotide.

The ability of a guide sequence (within a nucleic acid-targeting guide RNA) to direct sequence-specific binding of a nucleic acid-targeting complex to a target nucleic acid sequence may be assessed by any suitable assay. For example, the components of a nucleic acid-targeting CRISPR system sufficient to form a nucleic acid-targeting complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target nucleic acid sequence, such as by transfection with vectors encoding the components of the nucleic acid-targeting complex, followed by an assessment of preferential targeting (e.g., cleavage) within the target nucleic acid sequence, such as by Surveyor assay (Qiu et al. 2004. BioTechniques. 36(4):702-707). Similarly, cleavage of a target nucleic acid sequence may be evaluated in a test tube by providing the target nucleic acid sequence and components of a nucleic acid-targeting complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible and will occur to those skilled in the art.
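
Surveyor readouts are commonly quantified from gel band intensities with the estimate indel (%) = 100 × (1 − sqrt(1 − (b + c)/(a + b + c))), where a is the intensity of the uncleaved band and b and c are the cleavage products; the square root reflects random reannealing of edited and unedited strands into heteroduplexes. This formula is not stated in the text above and is offered as a commonly used approximation; the function name is hypothetical.

```python
import math


def indel_frequency(uncut: float, cleaved_1: float, cleaved_2: float) -> float:
    """Estimate percent indels from Surveyor band intensities.

    uncut: intensity of the parental (uncleaved) band, a.
    cleaved_1, cleaved_2: intensities of the two cleavage products, b and c.
    """
    total = uncut + cleaved_1 + cleaved_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (cleaved_1 + cleaved_2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


print(round(indel_frequency(50, 25, 25), 2))  # 29.29
```

When half of the signal is in cleavage products, the estimated indel frequency is about 29%, not 50%, because only heteroduplexes are cleaved.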

In some embodiments, the guide molecule is an RNA. The guide molecule(s) (also referred to interchangeably herein as guide polynucleotide and guide sequence) that are included in the CRISPR-Cas or Cas-based system can be any polynucleotide sequence having sufficient complementarity with a target nucleic acid sequence to hybridize with the target nucleic acid sequence and direct sequence-specific binding of a nucleic acid-targeting complex to the target nucleic acid sequence. In some embodiments, the degree of complementarity, when optimally aligned using a suitable alignment algorithm, can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
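
The degree of complementarity described above can be computed by globally aligning the guide with the reverse complement of the target strand using the Needleman-Wunsch algorithm, one of the methods named above. In this sketch the scoring scheme (match +1, mismatch -1, gap -2) is an assumption, percent complementarity is defined as matched columns divided by alignment length, and all function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of DNA or RNA input, returned as DNA."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq.upper().replace("U", "T")))


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Percent of alignment columns in which the guide pairs with the target strand."""
    g = guide.upper().replace("U", "T")
    aligned_guide, aligned_rc = global_align(g, reverse_complement(target_strand))
    matches = sum(x == y != "-" for x, y in zip(aligned_guide, aligned_rc))
    return 100.0 * matches / len(aligned_guide)
```

Aligning against the reverse complement turns "complementarity to the target strand" into ordinary sequence identity, so any identity-based aligner could be substituted.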

A guide sequence, and hence a nucleic acid-targeting guide, may be selected to target any target nucleic acid sequence. The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmic RNA (scRNA). In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.

In some embodiments, a nucleic acid-targeting guide is selected to reduce the degree of secondary structure within the nucleic acid-targeting guide. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the nucleic acid-targeting guide participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and G M Church, 2009, Nature Biotechnology 27(12): 1151-62).
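
As a crude stand-in for the free-energy folding performed by mFold or RNAfold, the Nussinov dynamic-programming algorithm maximizes the number of nested base pairs, which gives a rough upper estimate of the fraction of guide nucleotides that could engage in self-complementary pairing. This substitution is deliberate and far less accurate than energy-based folding; the minimum hairpin loop of 3 nt, the inclusion of G·U wobble pairs, and the function names are assumptions.

```python
# Allowed pairs: Watson-Crick plus G·U wobble (an assumption of this sketch).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov algorithm: maximum number of nested base pairs in `seq`."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span; spans <= min_loop cannot contain a pair.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])          # i or j unpaired
            if (s[i], s[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):                        # bifurcation
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def fraction_self_paired(seq: str, min_loop: int = 3) -> float:
    """Fraction of nucleotides that are paired in the maximum-pairing fold."""
    return 2 * max_base_pairs(seq, min_loop) / len(seq) if seq else 0.0


print(max_base_pairs("GGGGAAAACCCC"))  # 4
```

Because Nussinov ignores stacking energies, it tends to overstate pairing; it is useful only for ranking candidate guides coarsely, not for reproducing mFold or RNAfold output.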

In certain embodiments, a guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a guide sequence or spacer sequence. In certain embodiments, the guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat sequence fused or linked to a guide sequence or spacer sequence. In certain embodiments, the direct repeat sequence may be located upstream (i.e., 5′) from the guide sequence or spacer sequence. In other embodiments, the direct repeat sequence may be located downstream (i.e., 3′) from the guide sequence or spacer sequence.

In certain embodiments, the crRNA comprises a stem loop, preferably a single stem loop. In certain embodiments, the direct repeat sequence forms a stem loop, preferably a single stem loop.

In certain embodiments, the spacer length of the guide RNA is from 15 to 35 nt. In certain embodiments, the spacer length of the guide RNA is at least 15 nucleotides. In certain embodiments, the spacer length is from 15 to 17 nt, e.g., 15, 16, or 17 nt, from 17 to 20 nt, e.g., 17, 18, 19, or 20 nt, from 20 to 24 nt, e.g., 20, 21, 22, 23, or 24 nt, from 23 to 25 nt, e.g., 23, 24, or 25 nt, from 24 to 27 nt, e.g., 24, 25, 26, or 27 nt, from 27 to 30 nt, e.g., 27, 28, 29, or 30 nt, from 30 to 35 nt, e.g., 30, 31, 32, 33, 34, or 35 nt, or 35 nt or longer.
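
The direct repeat and spacer arrangements described above can be sketched as a simple assembly function with a spacer-length check. The 15 to 35 nt bounds mirror the range stated above but are adjustable, since some embodiments permit longer spacers; the example direct repeat is a hypothetical placeholder, not a sequence from any particular Cas ortholog, and the function name is illustrative.

```python
def assemble_crrna(direct_repeat: str, spacer: str, dr_upstream: bool = True,
                   min_spacer: int = 15, max_spacer: int = 35) -> str:
    """Join a direct repeat (DR) and a spacer into one crRNA, written 5'->3'.

    dr_upstream=True places the DR 5' of the spacer; False places it 3'.
    Inputs may be given as DNA or RNA; the result is returned as RNA.
    """
    if not min_spacer <= len(spacer) <= max_spacer:
        raise ValueError(
            f"spacer is {len(spacer)} nt; expected {min_spacer}-{max_spacer} nt")
    dr = direct_repeat.upper().replace("T", "U")
    sp = spacer.upper().replace("T", "U")
    return dr + sp if dr_upstream else sp + dr


# Hypothetical DR placed 3' of a 20-nt spacer.
print(assemble_crrna("GAUUU", "ACGT" * 5, dr_upstream=False))
```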

The “tracrRNA” sequence or analogous terms includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize. In some embodiments, the degree of complementarity between the tracrRNA sequence and crRNA sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and crRNA sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin.

In general, degree of complementarity is with reference to the optimal alignment of the sca sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the sca sequence or tracr sequence. In some embodiments, the degree of complementarity between the tracr sequence and sca sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.

In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%; a guide RNA or sgRNA can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length; or a guide RNA or sgRNA can be less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length; and tracr RNA can be 30 or 50 nucleotides in length. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5% or 95% or 95.5% or 96% or 96.5% or 97% or 97.5% or 98% or 98.5% or 99% or 99.5% or 99.9%, or 100%. Off target is less than 100% or 99.9% or 99.5% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% or 94% or 93% or 92% or 91% or 90% or 89% or 88% or 87% or 86% or 85% or 84% or 83% or 82% or 81% or 80% complementarity between the sequence and the guide, with it advantageous that off target is 100% or 99.9% or 99.5% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% complementarity between the sequence and the guide.
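The percentages above describe how many positions of a guide base-pair with its target. As a minimal sketch, assuming an ungapped, antiparallel alignment of equal-length sequences (guide written 5′ to 3′, target strand also written 5′ to 3′, so the guide's first base pairs with the target's last base), the degree of complementarity can be computed as below. The function name `percent_complementarity` is hypothetical, and the sketch ignores gaps, bulges, and G-U wobble pairs.

```python
# Watson-Crick pairs as (guide RNA base, target DNA base).
PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of positions at which the guide pairs with the target strand.

    Assumes equal-length, ungapped sequences in antiparallel orientation:
    guide[i] pairs with target[-1 - i].
    """
    guide = guide.upper().replace("T", "U")    # treat the guide as RNA
    target = target.upper().replace("U", "T")  # treat the target as DNA
    if len(guide) != len(target):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    matches = sum((g, t) in PAIRS for g, t in zip(guide, reversed(target)))
    return 100.0 * matches / len(guide)
```

For example, the guide ACGU is fully complementary to the target strand ACGT (100%), while a single mismatch against ACGA gives 75%.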

In some embodiments according to the invention, the guide RNA (capable of guiding Cas to a target locus) may comprise (1) a guide sequence capable of hybridizing to a genomic target locus in the eukaryotic cell; (2) a tracr sequence; and (3) a tracr mate sequence. All of (1) to (3) may reside in a single RNA, i.e., an sgRNA (arranged in a 5′ to 3′ orientation), or the tracr RNA may be a different RNA than the RNA containing the guide and tracr mate sequences. The tracr hybridizes to the tracr mate sequence and directs the CRISPR/Cas complex to the target sequence. Where the tracr RNA is on a different RNA than the RNA containing the guide and tracr mate sequences, the length of each RNA may be optimized to be shortened from its respective native length, and each may be independently chemically modified to protect from degradation by cellular RNase or otherwise increase stability.

Many modifications to guide sequences are known in the art and are further contemplated within the context of this invention. Various modifications may be used to increase the specificity of binding to the target sequence and/or increase the activity of the Cas protein and/or reduce off-target effects. Example guide sequence modifications are described in International Patent Application No. PCT/US2019/045582, specifically paragraphs [0178]-[0333], which is incorporated herein by reference.

Target Sequences, PAMs, and PFSs

Target Sequences

In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise RNA polynucleotides. The term “target RNA” refers to an RNA polynucleotide being or comprising the target sequence. In other words, the target polynucleotide can be a polynucleotide or a part of a polynucleotide to which a part of the guide sequence is designed to be complementary and to which the effector function mediated by the complex comprising the CRISPR effector protein and a guide molecule is to be directed. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell.

The guide sequence can specifically bind a target sequence in a target polynucleotide. The target polynucleotide may be DNA. The target polynucleotide may be RNA. The target polynucleotide can have one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) target sequences. The target polynucleotide can be on a vector. The target polynucleotide can be genomic DNA. The target polynucleotide can be episomal. Other forms of the target polynucleotide are described elsewhere herein.

The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmic RNA (scRNA). In some preferred embodiments, the target sequence (also referred to herein as a target polynucleotide) may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.

PAM and PFS Elements

PAM elements are sequences that can be recognized and bound by Cas proteins. Cas proteins/effector complexes can then unwind the dsDNA at a position adjacent to the PAM element. It will be appreciated that RNA-targeting Cas proteins, and systems that include them, do not require PAM sequences (Marraffini et al. 2010. Nature. 463:568-571). Instead, many rely on PFSs, which are discussed elsewhere herein. In certain embodiments, the target sequence should be associated with a PAM (protospacer adjacent motif) or PFS (protospacer flanking sequence or site), that is, a short sequence recognized by the CRISPR complex. Depending on the nature of the CRISPR-Cas protein, the target sequence should be selected such that its complementary sequence in the DNA duplex (also referred to herein as the non-target sequence) is upstream or downstream of the PAM. In some embodiments, the complementary sequence of the target sequence is downstream or 3′ of the PAM; in other embodiments, it is upstream or 5′ of the PAM. The precise sequence and length requirements for the PAM differ depending on the Cas protein used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (that is, the target sequence). Examples of natural PAM sequences for different Cas proteins are provided herein below, and the skilled person will be able to identify further PAM sequences for use with a given Cas protein.

The ability to recognize different PAM sequences depends on the Cas polypeptide(s) included in the system. See e.g. Gleditzsch et al. 2019. RNA Biology. 16(4):504-517. Table 15 below shows several Cas polypeptides and the PAM sequence they recognize.

TABLE 15
Example PAM Sequences

Cas Protein | PAM Sequence
SpCas9 | NGG/NRG
SaCas9 | NGRRT or NGRRN
NmeCas9 | NNNNGATT
CjCas9 | NNNNRYAC
StCas9 | NNAGAAW
Cas12a (Cpf1) (including LbCpf1 and AsCpf1) | TTTV
Cas12b (C2c1) | TTT, TTA, and TTC
Cas12c (C2c3) | TA
Cas12d (CasY) | TA
Cas12e (CasX) | 5′-TTCN-3′
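The PAMs in Table 15 use IUPAC ambiguity codes (N = any base, R = A/G, Y = C/T, W = A/T, V = A/C/G). As an illustrative sketch only, the following Python scans one strand of a DNA sequence for a PAM and returns the adjacent protospacer, placing it 5′ of a 3′ PAM (SpCas9-like, e.g., NGG) or 3′ of a 5′ PAM (Cas12a-like, e.g., TTTV). The 20-nt protospacer length, the single-strand scan, and the names `pam_regex` and `find_pam_sites` are assumptions for illustration; dedicated design tools also scan the reverse complement and score off-targets.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expressed as regex classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM string (e.g. 'NGG') into a regular expression."""
    return "".join(IUPAC[b] for b in pam.upper())


def find_pam_sites(seq, pam="NGG", protospacer_len=20, five_prime=False):
    """Return (protospacer_start, protospacer, pam_match) tuples on this strand only."""
    seq = seq.upper()
    k = len(pam)
    # A lookahead lets overlapping PAM occurrences all be reported.
    pattern = re.compile("(?=(" + pam_regex(pam) + "))")
    sites = []
    for m in pattern.finditer(seq):
        p = m.start()
        if five_prime:  # e.g. Cas12a TTTV: protospacer lies 3' of the PAM
            start = p + k
            if start + protospacer_len <= len(seq):
                sites.append((start, seq[start:start + protospacer_len], m.group(1)))
        else:  # e.g. SpCas9 NGG: protospacer lies 5' of the PAM
            start = p - protospacer_len
            if start >= 0:
                sites.append((start, seq[start:p], m.group(1)))
    return sites
```

For example, `find_pam_sites("A" * 20 + "TGG")` reports one site whose protospacer is the 20 A residues preceding the TGG PAM.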

In a preferred embodiment, the CRISPR effector protein may recognize a 3′ PAM. In certain embodiments, the CRISPR effector protein may recognize a 3′ PAM which is 5′H, wherein H is A, C or U.

Further, engineering of the PAM Interacting (PI) domain on the Cas protein may allow programming of PAM specificity, improve target site recognition fidelity, and increase the versatility of the CRISPR-Cas protein, for example as described for Cas9 in Kleinstiver B P et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015 Jul. 23; 523(7561):481-5. doi: 10.1038/nature14592. As further detailed herein, the skilled person will understand that Cas13 proteins may be modified analogously. See also Gao et al., “Engineered Cpf1 Enzymes with Altered PAM Specificities,” bioRxiv 091611; doi: http://dx.doi.org/10.1101/091611 (Dec. 4, 2016). Doench et al. created a pool of sgRNAs tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes, and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. The authors showed that optimization of the PAM improved activity and also provided an online tool for designing sgRNAs.

PAM sequences can be identified in a polynucleotide using an appropriate design tool; such tools are available commercially as well as online. Such freely available tools include, but are not limited to, CRISPRFinder and CRISPRTarget. Mojica et al. 2009. Microbiol. 155(Pt. 3):733-740; Altschul et al. 1990. J. Mol. Biol. 215:403-410; Biswas et al. 2013 RNA Biol. 10:817-827; and Grissa et al. 2007. Nucleic Acids Res. 35:W52-57. Experimental approaches to PAM identification can include, but are not limited to, plasmid depletion assays (Jiang et al. 2013. Nat. Biotechnol. 31:233-239; Esvelt et al. 2013. Nat. Methods. 10:1116-1121; Kleinstiver et al. 2015. Nature. 523:481-485), screening with a high-throughput in vivo model called PAM-SCANR (Pattanayak et al. 2013. Nat. Biotechnol. 31:839-843 and Leenay et al. 2016. Mol. Cell. 16:253), and negative screening (Zetsche et al. 2015. Cell. 163:759-771).

As previously mentioned, CRISPR-Cas systems that target RNA do not typically rely on PAM sequences. Instead, such systems, including Type VI CRISPR-Cas systems, typically recognize protospacer flanking sites (PFSs), which represent an analogue to PAMs for RNA targets. Type VI CRISPR-Cas systems employ a Cas13. Some Cas13 proteins analyzed to date, such as Cas13a (C2c2) identified from Leptotrichia shahii (LshCas13a), have a specific discrimination against G at the 3′ end of the target RNA. The presence of a C at the corresponding crRNA repeat site can indicate that nucleotide pairing at this position is rejected. However, some Cas13 proteins (e.g., LwaCas13a and PspCas13b) do not seem to have a PFS preference. See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.

Some Type VI proteins, such as subtype B, have 5′-recognition of D (G, T, A) and a 3′-motif requirement of NAN or NNA. One example is the Cas13b protein identified in Bergeyella zoohelcum (BzCas13b). See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.
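The PFS rules above can be expressed as simple checks on the bases flanking a protospacer in the target RNA. The sketch below assumes the target RNA is written 5′ to 3′, the protospacer occupies `target[start:start + length]`, and U replaces T (so D means A, G, or U). It encodes only the two rules stated in this section (LshCas13a: the 3′ flanking base is not G; BzCas13b: 5′ flank D and 3′ flank NAN or NNA); the function names are hypothetical and the checks are no substitute for empirical characterization.

```python
def lsh_cas13a_pfs_ok(target: str, start: int, length: int) -> bool:
    """True if the base immediately 3' of the protospacer is not G (A, C or U)."""
    end = start + length
    if end >= len(target):
        return False  # no 3' flanking base available to evaluate
    return target[end].upper() != "G"


def bz_cas13b_pfs_ok(target: str, start: int, length: int) -> bool:
    """True if the 5' flank is D (A, G or U) and the 3' flank matches NAN or NNA."""
    end = start + length
    if start < 1 or end + 3 > len(target):
        return False  # flanking bases missing on one side
    five_flank = target[start - 1].upper()
    three_flank = target[end:end + 3].upper()
    return five_flank in {"A", "G", "U"} and (three_flank[1] == "A" or three_flank[2] == "A")
```

For instance, a protospacer followed by G fails the LshCas13a check, whereas one followed by A passes.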

Overall, Type VI CRISPR-Cas systems appear to have less restrictive rules for substrate (e.g., target sequence) recognition than those that target DNA (e.g., Type V and Type II).

Zinc Finger Nucleases

In some embodiments, the polynucleotide is modified using a Zinc Finger nuclease or system thereof. One type of programmable DNA-binding domain is provided by artificial zinc-finger (ZF) technology, which involves arrays of ZF modules to target new DNA-binding sites in the genome. Each finger module in a ZF array targets three DNA bases. A customized array of individual zinc finger domains is assembled into a ZF protein (ZFP).

ZFPs can comprise a functional domain. The first synthetic zinc finger nucleases (ZFNs) were developed by fusing a ZF protein to the catalytic domain of the Type IIS restriction enzyme FokI. (Kim, Y. G. et al., 1994, Chimeric restriction endonuclease, Proc. Natl. Acad. Sci. U.S.A. 91, 883-887; Kim, Y. G. et al., 1996, Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. U.S.A. 93, 1156-1160). Cleavage specificity can be increased, and off-target activity decreased, by using paired ZFN heterodimers, each targeting a different nucleotide sequence, with the two target sequences separated by a short spacer. (Doyon, Y. et al., 2011, Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nat. Methods 8, 74-79). ZFPs can also be designed as transcription activators and repressors and have been used to target many genes in a wide variety of organisms. Exemplary methods of genome editing using ZFNs can be found for example in U.S. Pat. Nos. 6,534,261, 6,607,882, 6,746,838, 6,794,136, 6,824,978, 6,866,997, 6,933,113, 6,979,539, 7,013,219, 7,030,215, 7,220,719, 7,241,573, 7,241,574, 7,585,849, 7,595,376, 6,903,185, and 6,479,626, all of which are specifically incorporated by reference.
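Because each finger module targets three DNA bases, the footprint of a ZF array and of a paired ZFN site follows from simple arithmetic. The sketch below assumes exactly 3 bp per finger, ignores cross-strand contacts and the finger-to-triplet ordering convention, and treats the spacer length as a free parameter (the text says only that it is "short"). The function names and the example half-site are hypothetical.

```python
def zf_triplets(half_site: str) -> list[str]:
    """Split a zinc-finger half-site into 3-bp subsites, one per finger module."""
    if len(half_site) % 3:
        raise ValueError("half-site length must be a multiple of 3")
    return [half_site[i:i + 3] for i in range(0, len(half_site), 3)]


def paired_zfn_site_length(left_fingers: int, right_fingers: int, spacer_bp: int) -> int:
    """Total bp spanned by a ZFN heterodimer pair: two half-sites plus the spacer."""
    return 3 * left_fingers + spacer_bp + 3 * right_fingers
```

With three fingers on each ZFN and a hypothetical 6-bp spacer, the pair spans 3 × 3 + 6 + 3 × 3 = 24 bp.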

TALE Nucleases

In some embodiments, a TALE nuclease or TALE nuclease system can be used to modify a polynucleotide. In some embodiments, the methods provided herein use isolated, non-naturally occurring, recombinant or engineered DNA binding proteins that comprise TALE monomers, or TALE monomers and half monomers, as part of their organizational structure, enabling the targeting of nucleic acid sequences with improved efficiency and expanded specificity.

Naturally occurring TALEs or “wild type TALEs” are nucleic acid binding proteins secreted by numerous species of proteobacteria. TALE polypeptides contain a nucleic acid binding domain composed of tandem repeats of highly conserved monomer polypeptides that are predominantly 33, 34 or 35 amino acids in length and that differ from each other mainly in amino acid positions 12 and 13. In advantageous embodiments the nucleic acid is DNA. As used herein, the term “polypeptide monomers”, “TALE monomers” or “monomers” will be used to refer to the highly conserved repetitive polypeptide sequences within the TALE nucleic acid binding domain and the term “repeat variable di-residues” or “RVD” will be used to refer to the highly variable amino acids at positions 12 and 13 of the polypeptide monomers. As provided throughout the disclosure, the amino acid residues of the RVD are depicted using the IUPAC single letter code for amino acids. A general representation of a TALE monomer which is comprised within the DNA binding domain is $X_{1\text{-}11}\text{-}(X_{12}X_{13})\text{-}X_{14\text{-}33\text{ or }34\text{ or }35}$, where the subscript indicates the amino acid position and X represents any amino acid. $X_{12}X_{13}$ indicate the RVDs. In some polypeptide monomers, the variable amino acid at position 13 is missing or absent and in such monomers, the RVD consists of a single amino acid. In such cases the RVD may be alternatively represented as X*, where X represents $X_{12}$ and (*) indicates that $X_{13}$ is absent. The DNA binding domain comprises several repeats of TALE monomers and this may be represented as $(X_{1\text{-}11}\text{-}(X_{12}X_{13})\text{-}X_{14\text{-}33\text{ or }34\text{ or }35})_z$, where in an advantageous embodiment, z is at least 5 to 40. In a further advantageous embodiment, z is at least 10 to 26.

The TALE monomers can have a nucleotide binding affinity that is determined by the identity of the amino acids in their RVD. For example, polypeptide monomers with an RVD of NI can preferentially bind to adenine (A), monomers with an RVD of NG can preferentially bind to thymine (T), monomers with an RVD of HD can preferentially bind to cytosine (C) and monomers with an RVD of NN can preferentially bind to both adenine (A) and guanine (G). In some embodiments, monomers with an RVD of IG can preferentially bind to T. Thus, the number and order of the polypeptide monomer repeats in the nucleic acid binding domain of a TALE determines its nucleic acid target specificity. In some embodiments, monomers with an RVD of NS can recognize all four base pairs and can bind to A, T, G or C. The structure and function of TALEs is further described in, for example, Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011).
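Since the ordered RVDs determine target specificity, an array of monomers can be read out as a predicted target pattern. This sketch encodes only the RVD-to-base preferences listed in the preceding paragraph (NN written as IUPAC R for A/G, NS as N for any base). It covers the full monomers alone, not repeat 0 or the half monomer, and it treats preferences as yes/no rules rather than affinities. The names `RVD_TO_IUPAC`, `rvds_to_target`, and `rvds_bind` are hypothetical.

```python
# RVD -> IUPAC code of the base(s) it preferentially binds, per the text.
RVD_TO_IUPAC = {"NI": "A", "NG": "T", "IG": "T", "HD": "C", "NN": "R", "NS": "N"}
# IUPAC code -> the concrete bases it allows.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def rvds_to_target(rvds):
    """Predicted target pattern for monomers listed in N- to C-terminal order."""
    return "".join(RVD_TO_IUPAC[r] for r in rvds)


def rvds_bind(rvds, dna):
    """True if every base of dna is among those preferred by the matching RVD."""
    pattern = rvds_to_target(rvds)
    dna = dna.upper()
    return len(pattern) == len(dna) and all(
        b in IUPAC_BASES[p] for p, b in zip(pattern, dna)
    )
```

For example, the array NI-HD-NG-NN predicts the pattern ACTR, which accepts ACTG or ACTA but not ACTC.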

The polypeptides used in methods of the invention can be isolated, non-naturally occurring, recombinant or engineered nucleic acid-binding proteins that have nucleic acid or DNA binding regions containing polypeptide monomer repeats that are designed to target specific nucleic acid sequences.

As described herein, polypeptide monomers having an RVD of HN or NH preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, polypeptide monomers having RVDs RN, NN, NK, SN, NH, KN, HN, NQ, HH, RG, KH, RH and SS can preferentially bind to guanine. In some embodiments, polypeptide monomers having RVDs RN, NK, NQ, HH, KH, RH, SS and SN can preferentially bind to guanine and can thus allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, polypeptide monomers having RVDs HH, KH, NH, NK, NQ, RH, RN and SS can preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, the RVDs that have high binding specificity for guanine are RN, NH, RH and KH. Furthermore, polypeptide monomers having an RVD of NV can preferentially bind to adenine and guanine. In some embodiments, monomers having RVDs of H*, HA, KA, N*, NA, NC, NS, RA, and S* bind to adenine, guanine, cytosine and thymine with comparable affinity.

The predetermined N-terminal to C-terminal order of the one or more polypeptide monomers of the nucleic acid or DNA binding domain determines the corresponding predetermined target nucleic acid sequence to which the polypeptides of the invention will bind. As used herein the monomers and at least one or more half monomers are “specifically ordered to target” the genomic locus or gene of interest. In plant genomes, the natural TALE-binding sites always begin with a thymine (T), which may be specified by a cryptic signal within the non-repetitive N-terminus of the TALE polypeptide; in some cases, this region may be referred to as repeat 0. In animal genomes, TALE binding sites do not necessarily have to begin with a thymine (T) and polypeptides of the invention may target DNA sequences that begin with T, A, G or C. The tandem repeat of TALE monomers always ends with a half-length repeat or a stretch of sequence that may share identity with only the first 20 amino acids of a repetitive full-length TALE monomer and this half repeat may be referred to as a half-monomer. Therefore, it follows that the length of the nucleic acid or DNA being targeted is equal to the number of full monomers plus two.
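The length rule above (target length = full monomers + 2) follows from the first base being specified by the N-terminal region (repeat 0) and the last base by the C-terminal half monomer. A minimal design sketch under that reading: it assigns NI, HD, and NG for A, C, and T, and uses NN for G by default, with NH as an alternative that the text describes as more guanine-specific. The leading-T check is optional because, per the text, only natural plant TALE sites always begin with T. The function name is hypothetical.

```python
def design_tale_repeats(target: str, g_rvd: str = "NN", require_leading_t: bool = False):
    """Return (repeat-0 base, full-monomer RVDs, half-monomer RVD) for a target.

    Base 0 is read by the N-terminal region (repeat 0), bases 1..n-2 by full
    monomers, and the last base by the C-terminal half monomer, so
    len(target) == number of full monomers + 2.
    """
    base_to_rvd = {"A": "NI", "C": "HD", "G": g_rvd, "T": "NG"}
    target = target.upper()
    if len(target) < 3:
        raise ValueError("target must cover repeat 0, at least one full monomer and the half monomer")
    if require_leading_t and target[0] != "T":
        raise ValueError("a leading T is expected, as for natural plant TALE sites")
    full = [base_to_rvd[b] for b in target[1:-1]]
    half = base_to_rvd[target[-1]]
    return target[0], full, half
```

For the target TACGT this yields three full monomers (NI, HD, NN) plus a half monomer with NG, consistent with 3 + 2 = 5 targeted bases.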

As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), TALE polypeptide binding efficiency may be increased by including amino acid sequences from the “capping regions” that are directly N-terminal or C-terminal of the DNA binding region of naturally occurring TALEs into the engineered TALEs at positions N-terminal or C-terminal of the engineered TALE DNA binding region. Thus, in certain embodiments, the TALE polypeptides described herein further comprise an N-terminal capping region and/or a C-terminal capping region.

An exemplary amino acid sequence of a N-terminal capping region is:

(SEQ ID NO: 10788) M D P I R S R T P S P A R E L L S G P Q P D G V Q P T A D R G V S P P A G G P L D G L P A R R T M S R T R L P S P P A P S P A F S A D S F S D L L R Q F D P S L F N T S L F D S L P P F G A H H T E A A T G E W D E V Q S G L R A A D A P P P T M R V A V T A A R P P R A K P A P R R R A A Q P S D A S P A A Q V D L R T L G Y S Q Q Q Q E K I K P K V R S T V A Q H H E A L V G H G F T H A H I V A L S Q H P A A L G T V A V K Y Q D M I A A L P E A T H E A I V G V G K Q W S G A R A L E A L L T V A G E L R G P P L Q L D T G Q L L K I A K R G G V T A V E A V H A W R N A L T G A P L N

An exemplary amino acid sequence of a C-terminal capping region is:

(SEQ ID NO: 10789) R P A L E S I V A Q L S R P D P A L A A L T N D H L V A L A C L G G R P A L D A V K K G L P H A P A L I K R T N R R I P E R T S H R V A D H A Q V V R V L G F F Q C H S H P A Q A F D D A M T Q F G M S R H G L L Q L F R R V G V T E L E A R S G T L P P A S Q R W D R I L Q A S G M K R A K P S P T S T Q T P D Q A S L H A F A D S L E R D L D A P S P M H E G D Q T R A S

As used herein, the predetermined “N-terminus” to “C terminus” orientation of the N-terminal capping region, the DNA binding domain comprising the repeat TALE monomers, and the C-terminal capping region provides the structural basis for the organization of different domains in the d-TALEs or polypeptides of the invention.

The entire N-terminal and/or C-terminal capping regions are not necessary to enhance the binding activity of the DNA binding region. Therefore, in certain embodiments, fragments of the N-terminal and/or C-terminal capping regions are included in the TALE polypeptides described herein.

In certain embodiments, the TALE polypeptides described herein contain an N-terminal capping region fragment that includes at least 10, 20, 30, 40, 50, 54, 60, 70, 80, 87, 90, 94, 100, 102, 110, 117, 120, 130, 140, 147, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260 or 270 amino acids of an N-terminal capping region. In certain embodiments, the N-terminal capping region fragment amino acids are of the C-terminus (the DNA-binding region proximal end) of an N-terminal capping region. As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), N-terminal capping region fragments that include the C-terminal 240 amino acids enhance binding activity equal to the full length capping region, while fragments that include the C-terminal 147 amino acids retain greater than 80% of the efficacy of the full length capping region, and fragments that include the C-terminal 117 amino acids retain greater than 50% of the activity of the full-length capping region.

In some embodiments, the TALE polypeptides described herein contain a C-terminal capping region fragment that includes at least 6, 10, 20, 30, 37, 40, 50, 60, 68, 70, 80, 90, 100, 110, 120, 127, 130, 140, 150, 155, 160, 170, 180 amino acids of a C-terminal capping region. In certain embodiments, the C-terminal capping region fragment amino acids are of the N-terminus (the DNA-binding region proximal end) of a C-terminal capping region. As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), C-terminal capping region fragments that include the N-terminal 68 amino acids enhance binding activity equal to the full-length capping region, while fragments that include the N-terminal 20 amino acids retain greater than 50% of the efficacy of the full-length capping region.
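The two paragraphs above keep the DNA-binding-proximal end of each capping region: the C-terminal residues of an N-terminal cap and the N-terminal residues of a C-terminal cap. As an illustrative sketch, assuming the capping sequences (such as SEQ ID NO: 10788 and 10789 above) are supplied as one-letter strings with the display spaces removed, fragments of length k can be cut as follows. The helper names are hypothetical, and the toy strings in the tests are not the real capping sequences.

```python
def clean_seq(seq: str) -> str:
    """Remove the spaces used when a sequence is printed one residue at a time."""
    return "".join(seq.split())


def n_cap_fragment(n_cap: str, k: int) -> str:
    """Keep the k residues at the C-terminal (DNA-binding-proximal) end of an N-cap."""
    n_cap = clean_seq(n_cap)
    if not 0 < k <= len(n_cap):
        raise ValueError("k must be between 1 and the capping-region length")
    return n_cap[-k:]


def c_cap_fragment(c_cap: str, k: int) -> str:
    """Keep the k residues at the N-terminal (DNA-binding-proximal) end of a C-cap."""
    c_cap = clean_seq(c_cap)
    if not 0 < k <= len(c_cap):
        raise ValueError("k must be between 1 and the capping-region length")
    return c_cap[:k]
```

Under this convention, a 147-residue N-cap fragment would be `n_cap_fragment(seq_10788, 147)` and a 68-residue C-cap fragment `c_cap_fragment(seq_10789, 68)`, where `seq_10788` and `seq_10789` are placeholder variable names.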

In certain embodiments, the capping regions of the TALE polypeptides described herein do not need to have sequences identical to the capping region sequences provided herein. Thus, in some embodiments, the capping regions of the TALE polypeptides described herein have sequences that are at least 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical or share identity to the capping region amino acid sequences provided herein. Sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences. In some preferred embodiments, the capping regions of the TALE polypeptides described herein have sequences that are at least 95% identical or share identity to the capping region amino acid sequences provided herein.

Sequence homologies can be generated by any of a number of computer programs known in the art, which include but are not limited to BLAST or FASTA. Suitable computer programs for carrying out alignments like the GCG Wisconsin Bestfit package may also be used. Once the software has produced an optimal alignment, it is possible to calculate % homology, preferably % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result.
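Once an alignment exists, % sequence identity is a count over aligned columns. The sketch below assumes the alignment has already been produced (e.g., by BLAST, FASTA, or Bestfit) and is given as two equal-length strings with '-' for gaps. It divides identical residues by the number of aligned columns, skipping columns where both sequences are gapped. This is only one common convention; programs may instead normalize by the shorter sequence or by ungapped length, so the numbers can differ. The function name is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap; they carry no information.
    cols = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
            if not (a == "-" and b == "-")]
    if not cols:
        raise ValueError("alignment has no informative columns")
    identical = sum(a == b and a != "-" for a, b in cols)
    return 100.0 * identical / len(cols)
```

For example, ACGT-A aligned against ACGTTA has 5 identical residues over 6 columns, about 83.3% identity under this convention.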

In some embodiments described herein, the TALE polypeptides of the invention include a nucleic acid binding domain linked to the one or more effector domains. The terms “effector domain” or “regulatory and functional domain” refer to a polypeptide sequence that has an activity other than binding to the nucleic acid sequence recognized by the nucleic acid binding domain. By combining a nucleic acid binding domain with one or more effector domains, the polypeptides of the invention may be used to target the one or more functions or activities mediated by the effector domain to a particular target DNA sequence to which the nucleic acid binding domain specifically binds.

In some embodiments of the TALE polypeptides described herein, the activity mediated by the effector domain is a biological activity. For example, in some embodiments the effector domain is a transcriptional inhibitor (i.e., a repressor domain), such as an mSin interaction domain (SID), SID4X domain, or a Krüppel-associated box (KRAB) or fragments of the KRAB domain. In some embodiments the effector domain is an enhancer of transcription (i.e., an activation domain), such as the VP16, VP64 or p65 activation domain. In some embodiments, the nucleic acid binding domain is linked, for example, with an effector domain that includes but is not limited to a transposase, integrase, recombinase, resolvase, invertase, protease, DNA methyltransferase, DNA demethylase, histone acetylase, histone deacetylase, nuclease, transcriptional repressor, transcriptional activator, transcription factor recruiting, protein nuclear-localization signal or cellular uptake signal.

In some embodiments, the effector domain is a protein domain which exhibits activities which include but are not limited to transposase activity, integrase activity, recombinase activity, resolvase activity, invertase activity, protease activity, DNA methyltransferase activity, DNA demethylase activity, histone acetylase activity, histone deacetylase activity, nuclease activity, nuclear-localization signaling activity, transcriptional repressor activity, transcriptional activator activity, transcription factor recruiting activity, or cellular uptake signaling activity. Other preferred embodiments of the invention may include any combination of the activities described herein.

Meganucleases

In some embodiments, a meganuclease or system thereof can be used to modify a polynucleotide. Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs). Exemplary methods for using meganucleases can be found in U.S. Pat. Nos. 8,163,514, 8,133,697, 8,021,867, 8,119,361, 8,119,381, 8,124,369, and 8,129,134, which are specifically incorporated by reference.

Sequences Related to Nucleus Targeting and Transportation

In some embodiments, one or more components (e.g., the Cas protein and/or deaminase, Zn Finger protein, TALE, or meganuclease) in the composition for engineering cells may comprise one or more sequences related to nucleus targeting and transportation. Such sequences may facilitate the one or more components in the composition for targeting a sequence within a cell. In order to improve targeting of the CRISPR-Cas protein and/or the nucleotide deaminase protein or catalytic domain thereof used in the methods of the present disclosure to the nucleus, it may be advantageous to provide one or both of these components with one or more nuclear localization sequences (NLSs).

In some embodiments, the NLSs used in the context of the present disclosure are heterologous to the proteins. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 10790) or PKKKRKVEAS (SEQ ID NO: 10791); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 10792)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 10793) or RQRRNELKRSP (SEQ ID NO: 10794); the hnRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 10795); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 10796) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 10797) and PPKKARED (SEQ ID NO: 10798) of the polyoma T protein; the sequence PQPKKKPL (SEQ ID NO: 10799) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 10800) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 10801) and PKQKKRK (SEQ ID NO: 10802) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 10803) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 10804) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 10805) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 10806) of the human glucocorticoid (steroid hormone) receptor. In general, the one or more NLSs are of sufficient strength to drive accumulation of the DNA-targeting Cas protein in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs in the CRISPR-Cas protein, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the nucleic acid-targeting protein, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g., a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of nucleic acid-targeting complex formation (e.g., an assay for deaminase activity) at the target sequence, or an assay for altered gene expression activity affected by DNA-targeting complex formation and/or DNA targeting, as compared to a control not exposed to the CRISPR-Cas protein and deaminase protein, or exposed to a CRISPR-Cas and/or deaminase protein lacking the one or more NLSs.

The CRISPR-Cas and/or nucleotide deaminase proteins may be provided with one or more, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, or more, heterologous NLSs. In some embodiments, the proteins comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g., zero or at least one NLS at the amino-terminus and zero or at least one NLS at the carboxy-terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. In preferred embodiments of the CRISPR-Cas proteins, an NLS is attached to the C-terminus of the protein.
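The "at or near the terminus" criterion above is a distance test along the polypeptide chain. As a hedged sketch, the code below finds every occurrence of a given NLS in a protein sequence and labels each as near the N-terminus, the C-terminus, both, or neither. Distance is taken as the number of residues between the terminal residue and the nearest NLS residue (0 = the NLS starts or ends exactly at the terminus). The 10-residue window is just one of the values listed in the text and is a hypothetical default, as are the function names. The tests use the SV40 NLS PKKKRKV (SEQ ID NO: 10790) flanking a placeholder poly-A linker.

```python
from typing import List, Optional


def nls_positions(protein: str, nls: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of nls."""
    hits = []
    i = protein.find(nls)
    while i != -1:
        hits.append(i)
        i = protein.find(nls, i + 1)
    return hits


def nls_terminus(protein: str, start: int, nls_len: int, window: int = 10) -> Optional[str]:
    """Return 'N', 'C', 'both' or None depending on proximity to each terminus."""
    end = start + nls_len - 1                       # index of the last NLS residue
    near_n = start <= window                        # residues before the NLS
    near_c = (len(protein) - 1 - end) <= window     # residues after the NLS
    if near_n and near_c:
        return "both"
    if near_n:
        return "N"
    if near_c:
        return "C"
    return None
```

In a 64-residue construct with PKKKRKV at each end, the first copy is classified as N-terminal and the second as C-terminal.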

In certain embodiments, the CRISPR-Cas protein and the deaminase protein are delivered to the cell or expressed within the cell as separate proteins. In these embodiments, each of the CRISPR-Cas and deaminase protein can be provided with one or more NLSs as described herein. In certain embodiments, the CRISPR-Cas and deaminase proteins are delivered to the cell or expressed within the cell as a fusion protein. In these embodiments one or both of the CRISPR-Cas and deaminase protein is provided with one or more NLSs. Where the nucleotide deaminase is fused to an adaptor protein (such as MS2) as described above, the one or more NLS can be provided on the adaptor protein, provided that this does not interfere with aptamer binding. In particular embodiments, the one or more NLS sequences may also function as linker sequences between the nucleotide deaminase and the CRISPR-Cas protein.

In certain embodiments, guides of the disclosure comprise specific binding sites (e.g., aptamers) for adapter proteins, which may be linked or fused to a nucleotide deaminase or catalytic domain thereof. When such a guide forms a CRISPR complex (e.g., CRISPR-Cas protein binding to guide and target), the adapter proteins bind, and the nucleotide deaminase or catalytic domain thereof associated with the adapter protein is positioned in a spatial orientation that is advantageous for the attributed function to be effective.

The skilled person will understand that modifications to the guide which allow binding of the adapter+nucleotide deaminase, but not proper positioning of the adapter+nucleotide deaminase (e.g., due to steric hindrance within the three-dimensional structure of the CRISPR complex), are not intended. The one or more modified guides may be modified at the tetraloop, stem loop 1, stem loop 2, or stem loop 3, as described herein, preferably at either the tetraloop or stem loop 2, and in some cases at both the tetraloop and stem loop 2.

In some embodiments, a component (e.g., the dead Cas protein, the nucleotide deaminase protein or catalytic domain thereof, or a combination thereof) in the systems may comprise one or more nuclear export signals (NESs), one or more nuclear localization signals (NLSs), or any combination thereof. In some cases, the NES may be an HIV Rev NES. In certain cases, the NES may be a MAPK NES. When the component is a protein, the NES or NLS may be at the C-terminus of the component. Alternatively or additionally, the NES or NLS may be at the N-terminus of the component. In some examples, the Cas protein and optionally the nucleotide deaminase protein or catalytic domain thereof comprise one or more heterologous NESs or NLSs, preferably an HIV Rev NES or MAPK NES, preferably C-terminal.

Templates

In some embodiments, the composition for engineering cells comprises a template, e.g., a recombination template. A template may be a component of another vector as described herein, contained in a separate vector, or provided as a separate polynucleotide. In some embodiments, a recombination template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-targeting effector protein as a part of a nucleic acid-targeting complex.

In an embodiment, the template nucleic acid alters the sequence of the target position. In an embodiment, the template nucleic acid results in the incorporation of a modified, or non-naturally occurring base into the target nucleic acid.

The template sequence may undergo breakage-mediated or breakage-catalyzed recombination with the target sequence. In an embodiment, the template nucleic acid may include sequence that corresponds to a site on the target sequence that is cleaved by a Cas protein-mediated cleavage event. In an embodiment, the template nucleic acid may include sequence that corresponds to both a first site on the target sequence that is cleaved in a first Cas protein-mediated event and a second site on the target sequence that is cleaved in a second Cas protein-mediated event.

In certain embodiments, the template nucleic acid can include sequence which results in an alteration in the coding sequence of a translated sequence, e.g., one which results in the substitution of one amino acid for another in a protein product, e.g., transforming a mutant allele into a wild type allele, transforming a wild type allele into a mutant allele, and/or introducing a stop codon, insertion of an amino acid residue, deletion of an amino acid residue, or a nonsense mutation. In certain embodiments, the template nucleic acid can include sequence which results in an alteration in a non-coding sequence, e.g., an alteration in an exon or in a 5′ or 3′ non-translated or non-transcribed region. Such alterations include an alteration in a control element, e.g., a promoter, enhancer, and an alteration in a cis-acting or trans-acting control element.

A template nucleic acid having homology with a target position in a target gene may be used to alter the structure of a target sequence. The template sequence may be used to alter an unwanted structure, e.g., an unwanted or mutant nucleotide. The template nucleic acid may include sequence which, when integrated, results in: decreasing the activity of a positive control element; increasing the activity of a positive control element; decreasing the activity of a negative control element; increasing the activity of a negative control element; decreasing the expression of a gene; increasing the expression of a gene; increasing resistance to a disorder or disease; increasing resistance to viral entry; correcting a mutation or altering an unwanted amino acid residue; or conferring, increasing, abolishing or decreasing a biological property of a gene product, e.g., increasing the enzymatic activity of an enzyme or increasing the ability of a gene product to interact with another molecule.

The template nucleic acid may include sequence which results in a change in sequence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides of the target sequence.

A template polynucleotide may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length. In an embodiment, the template nucleic acid may be 20+/−10, 30+/−10, 40+/−10, 50+/−10, 60+/−10, 70+/−10, 80+/−10, 90+/−10, 100+/−10, 110+/−10, 120+/−10, 130+/−10, 140+/−10, 150+/−10, 160+/−10, 170+/−10, 180+/−10, 190+/−10, 200+/−10, 210+/−10, or 220+/−10 nucleotides in length. In an embodiment, the template nucleic acid may be 30+/−20, 40+/−20, 50+/−20, 60+/−20, 70+/−20, 80+/−20, 90+/−20, 100+/−20, 110+/−20, 120+/−20, 130+/−20, 140+/−20, 150+/−20, 160+/−20, 170+/−20, 180+/−20, 190+/−20, 200+/−20, 210+/−20, or 220+/−20 nucleotides in length. In an embodiment, the template nucleic acid is 10 to 1,000, 20 to 900, 30 to 800, 40 to 700, 50 to 600, 50 to 500, 50 to 400, 50 to 300, 50 to 200, or 50 to 100 nucleotides in length.

In some embodiments, the template polynucleotide is complementary to a portion of a polynucleotide comprising the target sequence. When optimally aligned, a template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g., about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more nucleotides). In some embodiments, when a template polynucleotide and a polynucleotide comprising a target sequence are optimally aligned, the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.

The exogenous polynucleotide template comprises a sequence to be integrated (e.g., a mutated gene). The sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a non-coding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function.

An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequences have about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.
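To make these arm-length ranges concrete, the sketch below assembles a donor from a reference sequence as upstream arm + insert + downstream arm. It is a hypothetical illustration: `build_donor` and its `cut_index` convention (0-based index of the first base 3′ of the break) are assumptions, the default 800 bp arm is simply one value inside the 700-1000 bp range above, and real donor design involves further considerations not modeled here.

```python
def build_donor(reference: str, cut_index: int, insert: str,
                arm_length: int = 800) -> str:
    """Return upstream arm + insert + downstream arm taken from `reference`.

    Hypothetical helper. `cut_index` is the 0-based index of the first base
    3' of the break, so reference[:cut_index] lies upstream of the break.
    """
    if not 20 <= arm_length <= 2500:
        raise ValueError("arm length outside the ~20-2500 bp range")
    if cut_index - arm_length < 0 or cut_index + arm_length > len(reference):
        raise ValueError("not enough flanking sequence for arms of this length")
    upstream_arm = reference[cut_index - arm_length:cut_index]
    downstream_arm = reference[cut_index:cut_index + arm_length]
    return upstream_arm + insert + downstream_arm


# Example with a toy 2 kb reference and a 6 bp insert at position 1000.
donor = build_donor("A" * 1000 + "C" * 1000, 1000, "GGATCC")
print(len(donor))
```

The two arms flank the insert directly, so the donor length is simply twice the arm length plus the insert length.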

In certain embodiments, one or both homology arms may be shortened to avoid including certain sequence repeat elements. For example, a 5′ homology arm may be shortened to avoid a sequence repeat element. In other embodiments, a 3′ homology arm may be shortened to avoid a sequence repeat element. In some embodiments, both the 5′ and the 3′ homology arms may be shortened to avoid including certain sequence repeat elements.
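One way to implement arm shortening is sketched below. It assumes repeats are soft-masked in lowercase (the convention used in RepeatMasker-annotated genome assemblies) and that each arm is trimmed from its end distal to the break, so the sequence adjacent to the break is kept; `trim_arms` and its `min_length` cutoff are hypothetical.

```python
def trim_arms(upstream_arm: str, downstream_arm: str,
              min_length: int = 20) -> tuple[str, str]:
    """Shorten homology arms so they exclude soft-masked (lowercase) repeats.

    Hypothetical helper. Both arms are written 5'->3' on the same strand, with
    the break between them. The 5' arm keeps only the bases after its last
    repeat base; the 3' arm keeps only the bases before its first repeat base.
    """
    # Index of the last lowercase base in the 5' arm (-1 if there is none).
    last_repeat = max((i for i, b in enumerate(upstream_arm) if b.islower()),
                      default=-1)
    new_upstream = upstream_arm[last_repeat + 1:]
    # Index of the first lowercase base in the 3' arm (arm length if none).
    first_repeat = next((i for i, b in enumerate(downstream_arm) if b.islower()),
                        len(downstream_arm))
    new_downstream = downstream_arm[:first_repeat]
    if len(new_upstream) < min_length or len(new_downstream) < min_length:
        raise ValueError("a repeat lies too close to the break to trim around")
    return new_upstream, new_downstream


up, down = trim_arms("acgt" + "A" * 30, "C" * 25 + "ttt" + "G" * 5)
print(len(up), len(down))
```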

In some methods, the exogenous polynucleotide template may further comprise a marker. Such a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The exogenous polynucleotide template of the disclosure can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996).

In certain embodiments, a template nucleic acid for correcting a mutation may be designed for use as a single-stranded oligonucleotide. When using a single-stranded oligonucleotide, 5′ and 3′ homology arms may range up to about 200 base pairs (bp) in length, e.g., at least 25, 50, 75, 100, 125, 150, 175, or 200 bp in length.

In certain embodiments, a template nucleic acid for correcting a mutation may be designed for use with a homology-independent targeted integration system. Suzuki et al. describe in vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration (2016, Nature 540:144-149). Schmid-Burgk, et al. describe use of the CRISPR-Cas9 system to introduce a double-strand break (DSB) at a user-defined genomic location and insertion of a universal donor DNA (Nat Commun. 2016 Jul. 28; 7:12338). Gao, et al. describe “Plug-and-Play Protein Modification Using Homology-Independent Universal Genome Engineering” (Neuron. 2019 Aug. 21; 103(4):583-597).

RNAi

In certain embodiments, the genetic modifying agent is RNAi (e.g., shRNA). As used herein, “gene silencing” or “gene silenced”, in reference to an activity of an RNAi molecule, for example an siRNA or miRNA, refers to a decrease in the mRNA level of a target gene in a cell by at least about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 99%, or about 100% of the mRNA level found in the cell without the presence of the miRNA or RNA interference molecule. In one preferred embodiment, the mRNA levels are decreased by at least about 70%, about 80%, about 90%, about 95%, about 99%, or about 100%.
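The silencing definition is a relative comparison against cells lacking the RNAi molecule. As a worked illustration (hypothetical helper names; mRNA levels in any consistent unit, e.g., normalized qPCR expression), a treated level of 0.25 against a control of 1.0 is a 75% decrease, which meets the preferred threshold of at least about 70%.

```python
def percent_silencing(treated_level: float, control_level: float) -> float:
    """Percent decrease in target mRNA relative to cells lacking the RNAi molecule."""
    if control_level <= 0:
        raise ValueError("control mRNA level must be positive")
    return 100.0 * (1.0 - treated_level / control_level)


def is_silenced(treated_level: float, control_level: float,
                threshold_percent: float = 70.0) -> bool:
    """True when silencing reaches the chosen threshold (70% by default)."""
    return percent_silencing(treated_level, control_level) >= threshold_percent


print(percent_silencing(0.25, 1.0), is_silenced(0.25, 1.0))
```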

As used herein, the term “RNAi” refers to any type of interfering RNA, including but not limited to siRNA, shRNA, endogenous microRNA, and artificial microRNA. For instance, it includes sequences previously identified as siRNA, regardless of the mechanism of downstream processing of the RNA (i.e., although siRNAs are believed to have a specific method of in vivo processing resulting in the cleavage of mRNA, such sequences can be incorporated into the vectors in the context of the flanking sequences described herein). The term “RNAi” can include both gene-silencing RNAi molecules and RNAi effector molecules that activate the expression of a gene.

As used herein, an “siRNA” refers to a nucleic acid that forms a double-stranded RNA, which double-stranded RNA has the ability to reduce or inhibit expression of a gene or target gene when the siRNA is present or expressed in the same cell as the target gene. The double-stranded siRNA can be formed by the complementary strands. In one embodiment, an siRNA refers to a nucleic acid that can form a double-stranded siRNA. The sequence of the siRNA can correspond to the full-length target gene, or a subsequence thereof. Typically, the siRNA is about 15-50 nucleotides in length (e.g., each complementary sequence of the double-stranded siRNA is about 15-50 nucleotides in length, and the double-stranded siRNA is about 15-50 base pairs in length, preferably about 19-30 nucleotides, more preferably about 20-25 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length).

As used herein, “shRNA” or “small hairpin RNA” (also called stem loop) is a type of siRNA. In one embodiment, these shRNAs are composed of a short (e.g., about 19 to about 25 nucleotide) antisense strand, followed by a nucleotide loop of about 5 to about 9 nucleotides, and the analogous sense strand. Alternatively, the sense strand can precede the nucleotide loop structure and the antisense strand can follow.
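The hairpin layout just described (antisense strand, loop, sense strand) can be sketched as simple string assembly. This is a minimal illustration, not a design tool: `design_shrna` is hypothetical, the input is assumed to be the 19-25 nt target site written 5′→3′ as it appears in the mRNA, and the 9-nt loop UUCAAGAGA is used only as an example of a commonly used loop; no targeting or thermodynamic rules are applied.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (DNA input is converted T -> U)."""
    rna = seq.upper().replace("T", "U")
    return rna.translate(COMPLEMENT)[::-1]


def design_shrna(target_site: str, loop: str = "UUCAAGAGA",
                 antisense_first: bool = True) -> str:
    """Assemble antisense + loop + sense (or sense + loop + antisense)."""
    sense = target_site.upper().replace("T", "U")
    loop = loop.upper().replace("T", "U")
    if not 19 <= len(sense) <= 25:
        raise ValueError("stem strand should be about 19-25 nt")
    if not 5 <= len(loop) <= 9:
        raise ValueError("loop should be about 5-9 nt")
    # The antisense strand pairs with the mRNA target site.
    antisense = reverse_complement_rna(sense)
    if antisense_first:
        return antisense + loop + sense
    return sense + loop + antisense


# Hypothetical 19-nt target site, for illustration only.
print(design_shrna("GACGUAAACGGCCACAAGU"))
```

Because the sense strand is the exact reverse complement of the antisense strand, the two halves can base-pair across the loop to form the stem.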

The terms “microRNA” and “miRNA” are used interchangeably herein and refer to endogenous RNAs, some of which are known to regulate the expression of protein-coding genes at the posttranscriptional level. Endogenous microRNAs are small RNAs naturally present in the genome that are capable of modulating the productive utilization of mRNA. The term artificial microRNA includes any type of RNA sequence, other than endogenous microRNA, which is capable of modulating the productive utilization of mRNA. MicroRNA sequences have been described in publications such as Lim, et al., Genes & Development, 17, p. 991-1008 (2003), Lim et al., Science 299, 1540 (2003), Lee and Ambros, Science, 294, 862 (2001), Lau et al., Science 294, 858-861 (2001), Lagos-Quintana et al., Current Biology, 12, 735-739 (2002), Lagos-Quintana et al., Science 294, 853-857 (2001), and Lagos-Quintana et al., RNA, 9, 175-179 (2003), which are incorporated by reference. Multiple microRNAs can also be incorporated into a precursor molecule. Furthermore, miRNA-like stem-loops can be expressed in cells as a vehicle to deliver artificial miRNAs and short interfering RNAs (siRNAs) for the purpose of modulating the expression of endogenous genes through the miRNA and/or RNAi pathways.

As used herein, “double stranded RNA” or “dsRNA” refers to RNA molecules that are comprised of two strands. Double-stranded molecules include those comprised of a single RNA molecule that doubles back on itself to form a two-stranded structure. For example, the stem-loop structure of the progenitor molecules from which the single-stranded miRNA is derived, called the pre-miRNA (Bartel et al. 2004. Cell 116:281-297), comprises a dsRNA molecule.

Delivery

The programmable nucleic acid modifying agents and other modulating agents, or components thereof, or nucleic acid molecules thereof (including, for instance, an HDR template), or nucleic acid molecules encoding or providing components thereof, may be delivered by a delivery system described herein.

Vector delivery (e.g., plasmid or viral delivery): the modulating agents can be delivered using any suitable vector, e.g., plasmid or viral vectors, such as adeno-associated virus (AAV), lentivirus, adenovirus, or other viral vector types, or combinations thereof. In some embodiments, the vector, e.g., a plasmid or viral vector, is delivered to the tissue of interest by, for example, an intramuscular injection, while in other embodiments the delivery is via intravenous, transdermal, intranasal, oral, mucosal, or other delivery methods. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choice, the target cell, organism, or tissue, the general condition of the subject to be treated, the degree of transformation/modification sought, the administration route, the administration mode, the type of transformation/modification sought, etc.

In certain embodiments, mRNA encoding the transcription factors is delivered to a subject in need thereof. In certain embodiments, the mRNA is modified mRNA (see, e.g., U.S. Pat. No. 9,428,535 B2).

In certain embodiments, proteins, mRNA, or cells are administered via targeted injection (e.g., into the tissue to be repaired), intravenous administration, infusion, or other delivery methods. Such delivery may be either via a single dose or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the target cell or tissue, the general condition of the subject to be treated, the degree of modification sought, the administration route, the administration mode, the type of modification sought, etc.

In certain embodiments, transcription factors are expressed in target tissue cells transiently. In certain embodiments, transcription factors are expressed or enhanced only for the time required to differentiate or transdifferentiate cells into target cells. In certain embodiments, transcription factors are expressed or enhanced for 1 to 14 days, preferably about 2 days. In certain embodiments, the means of delivery does not result in integration of a sequence encoding the transcription factors into the genome of target cells.

The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.

EXAMPLES

Example 1—Identification of Transcription Factors that Differentiate hESCs into Radial Glia

Radial glia are neural progenitors of the developing mammalian brain capable of generating neurons, astrocytes, and oligodendrocytes. The two most established methods for producing neural progenitors, embryoid body formation and dual SMAD inhibition, are not high-throughput and produce heterogeneous neural progenitor populations (Chambers S M, et al., Highly efficient neural conversion of human ES and iPS cells by dual inhibition of SMAD signaling. Nat Biotechnol. 2009; 27(3):275-80; and Pankratz M T, et al., Directed neural differentiation of human embryonic stem cells via an obligated primitive anterior stage. Stem Cells. 2007; 25(6):1511-20). Applicants developed a stepwise method for differentiating hESCs into neural progenitors. Although previous studies have shown that overexpression of the TFs ASCL1 and PAX6 can drive differentiation of embryonic stem cells into neural progenitors and neurons, the TFs that direct human radial glia differentiation remain unknown (Chanda S, et al., Generation of induced neuronal cells by the single reprogramming factor ASCL1. Stem Cell Reports. 2014; 3(2):282-96; and Zhang X, et al., Pax6 is a human neuroectoderm cell fate determinant. Cell Stem Cell. 2010; 7(1):90-100). Applicants individually overexpressed candidate TFs that are specifically expressed in radial glia based on available RNA-sequencing (RNA-seq) datasets, and selected those that generated cells expressing radial glia-specific marker genes and presenting associated morphology. Identification of novel TFs that direct radial glia differentiation can enable better understanding of neural development and provide positive controls for establishing a TF screening platform.

To establish a system for TF-directed differentiation, Applicants compared two overexpression methods, cDNA and CRISPR activation (Konermann S, et al., Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature. 2015; 517(7536):583-8), to upregulate NEUROD1 and NEUROG2, TFs known to direct differentiation of hESCs to neurons, in the HUES66 hESC line (Zhang Y, et al., 2013). Applicants chose the HUES66 line because of its ability to generate brain organoids efficiently and maintain karyotype stability (Quadrato G, et al., Cell diversity and network dynamics in photosensitive human brain organoids. Nature. 2017; 545(7652):48-53). By immunostaining for the neuronal marker MAP2, Applicants found that in this system only cDNA overexpression (specifically, the TF ORF without UTRs, as described further herein) successfully and efficiently differentiated hESCs into neurons. Based on the results of the comparison, Applicants used cDNA to overexpress TFs individually in a targeted arrayed screen to identify those that could differentiate hESCs into radial glia (FIG. 1a). Applicants selected a set of 73 TFs shown to be specifically expressed in radial glia or neural progenitors in 6 published RNA-seq datasets (Camp J G, et al., Human cerebral organoids recapitulate gene expression programs of fetal neocortex development. Proc Natl Acad Sci USA. 2015; 112(51):15672-7; Johnson M B, et al., Single-cell analysis reveals transcriptional heterogeneity of neural progenitors in human cortex. Nat Neurosci. 2015; 18(5):637-46; Pollen A A, et al., Molecular identity of human outer radial glia during cortical development. Cell. 2015; 163(1):55-67; Thomsen E R, et al., Fixed single-cell transcriptomic characterization of human radial glial diversity. Nat Methods. 2016; 13(1):87-93; Wu J Q, et al., Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc Natl Acad Sci USA. 2010; 107(11):5254-9; and Zhang Y, et al., Purification and Characterization of Progenitor and Mature Human Astrocytes Reveals Transcriptional and Functional Differences with Mouse. Neuron. 2016; 89(1):37-53). For each TF, Applicants included isoforms that comprised >25% of the expressed transcript, resulting in a total of 90 TF isoforms (Table 1). Applicants chose to synthesize the targeted TF library to avoid potential sequence errors commonly found in existing cDNA libraries, and cloned the library into a vector with a constitutive EF1a promoter. Applicants included a V5 epitope tag and a unique 24-nucleotide DNA barcode on each TF to facilitate downstream assessment of protein expression and of TF abundance in the cell population, respectively (SEQ ID NO: 1-90). Applicants packaged the targeted library into lentivirus for delivery into hESCs and screened the targeted library in an arrayed format, where hESCs in each well of a 96-well plate express a different TF (FIG. 1a). The barcode is transcribed but not translated (i.e., because it is not part of the ORF). The barcode is lentivirally integrated together with the cDNA into the genomic DNA, and Applicants PCR-amplify the barcode from the genomic DNA to identify which cDNA constructs were integrated. At 4 and 7 days after transduction, Applicants evaluated the TFs using imaging for radial glia-like morphology and qPCR for two radial glia marker genes, SLC1A3 and VIM, identified in published RNA-seq datasets (Id). Applicants identified 7 candidate TFs: ASCL1, EOMES, FOS, NFIB, OTX1, PAX6, and RFX4 (FIG. 1b, c).
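Because each TF construct carries a unique 24-nucleotide barcode, identifying integrated constructs from genomic-DNA amplicons reduces to extracting and looking up barcodes. The sketch below assumes, hypothetically, that each read contains a known constant sequence immediately 5′ of the barcode and that only exact matches are accepted; `count_barcodes`, the flank, and the barcode strings are placeholders, not the sequences of SEQ ID NO: 1-90.

```python
from collections import Counter

BARCODE_LENGTH = 24


def count_barcodes(reads, barcode_to_tf, flank):
    """Tally TF constructs from amplicon reads by exact barcode lookup.

    Hypothetical helper. `flank` is a constant sequence assumed to sit
    immediately 5' of the barcode; reads lacking it, or whose barcode is
    not in `barcode_to_tf`, are counted as unassigned.
    """
    counts = Counter()
    unassigned = 0
    for read in reads:
        pos = read.find(flank)
        if pos == -1:
            unassigned += 1
            continue
        start = pos + len(flank)
        barcode = read[start:start + BARCODE_LENGTH]
        tf = barcode_to_tf.get(barcode)
        if tf is None:
            unassigned += 1
        else:
            counts[tf] += 1
    return counts, unassigned


# Placeholder flank and barcodes, for illustration only.
FLANK = "TCTAGA"
barcodes = {"AAAACCCCGGGGTTTTAAAACCCC": "NFIB",
            "TTTTGGGGCCCCAAAATTTTGGGG": "RFX4"}
reads = ["NN" + FLANK + "AAAACCCCGGGGTTTTAAAACCCC" + "AC",
         "GG" + FLANK + "TTTTGGGGCCCCAAAATTTTGGGG",
         "ACGTACGT"]
print(count_barcodes(reads, barcodes, FLANK))
```

Exact matching keeps the sketch simple; tolerating sequencing errors would require barcodes designed with sufficient pairwise distance and a mismatch-aware lookup.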

Applicants next evaluated the fidelity of radial glia differentiated from each candidate. First, Applicants performed RNA-seq on radial glia derived from overexpressing each candidate for 7 and 12 days. Gene signature analysis of the RNA-seq data suggested similarities (e.g., EOMES and RFX4) and differences (e.g., NFIB and ASCL1) in the transcriptomes between the candidates. To determine how closely the differentiated radial glia resembled their in vivo counterparts, Applicants computationally generated gene expression signatures based on the 1,000 most differentially expressed genes compared to the GFP overexpression control and quantified enrichment of these signatures in human fetal radial glia and other neural cell types from the Pollen et al. dataset (FIG. 2) (Pollen A A, et al., 2015; and Barbie D A, et al., Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009; 462(7269): 108-12). Applicants found that candidates NFIB and RFX4 produced radial glia that were most similar to radial glia in vivo. Second, Applicants immunostained for radial glia markers NES and VIM, and found that all of the radial glia differentiated by these candidates expressed these markers (FIG. 3a). Finally, Applicants spontaneously differentiated the radial glia to determine if they could produce neurons, astrocytes, and oligodendrocytes. Applicants cloned the candidates into a different vector under a dox-inducible promoter and induced expression of the candidates for 5, 7, and 12 days and then withdrew growth factors EGF and bFGF from the media (which maintain the progenitor state) and allowed cells to differentiate for 1, 2, and 4 weeks. Applicants immunostained for markers identifying neurons (MAP2), astrocytes (GFAP), and oligodendrocyte precursors (NG2 and PDGFRA). Similar to neural development in vivo, Applicants observed that neurogenesis occurred before gliogenesis. 
By 4 weeks of differentiation, radial glia differentiated from 3 of the 7 candidates (FOS, NFIB, and OTX1) produced only neurons, and radial glia from 3 others (ASCL1, PAX6, and RFX4) produced both neurons and astrocytes (FIG. 3b). Applicants further show that the 4 candidates (ASCL1, NFIB, PAX6, and RFX4) differentiate into both neurons and astrocytes by week 4 (FIGS. 4-7). Transcription factors were induced with doxycycline for 6 days and cells were stained at the indicated time points.
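The gene-signature comparison described above can be approximated as follows. This is a simplified stand-in, not the method used: Barbie et al. describe single-sample gene set enrichment analysis, whereas the hypothetical `signature_scores` below averages per-gene z-scores. It also assumes that the 1,000 signature genes are those most upregulated relative to the GFP control and that the reference data are a cells × genes matrix of normalized expression.

```python
import numpy as np


def top_upregulated(log2_fold_change: dict, n: int = 1000) -> list:
    """Genes with the largest log2 fold change versus the GFP control."""
    return sorted(log2_fold_change, key=log2_fold_change.get, reverse=True)[:n]


def signature_scores(expression: np.ndarray, gene_names: list,
                     signature: list) -> np.ndarray:
    """Mean z-score of the signature genes for each reference cell (row)."""
    column = {gene: i for i, gene in enumerate(gene_names)}
    idx = [column[g] for g in signature if g in column]
    if not idx:
        raise ValueError("no signature genes found in the reference data")
    sub = expression[:, idx].astype(float)
    mean = sub.mean(axis=0)
    std = sub.std(axis=0)
    std[std == 0] = 1.0  # genes constant across cells contribute z = 0
    return ((sub - mean) / std).mean(axis=1)


# Toy reference: 2 cells x 3 genes; the signature is genes A and B.
expr = np.array([[5.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
print(signature_scores(expr, ["A", "B", "C"], ["A", "B"]))
```

Averaging the per-cell scores within each annotated reference cell type (for example, radial glia versus neurons) would then indicate which cell type the differentiated population most resembles.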

Discussion of Methods for Selection and Characterization of TFs Driving Optimal Radial Glia Differentiation

Applicants can continue to validate the candidate TFs. Applicants have already identified and selected the most promising TFs for further characterization to understand their role in radial glia differentiation. In particular, because some of the candidates did not produce neurons until after 4 weeks of differentiation, Applicants can spontaneously differentiate radial glia derived by candidate TF overexpression for a total of 6-8 weeks to observe additional astrocytes and oligodendrocytes. Applicants can immunostain the cells that have been differentiated for 6 and 8 weeks to determine which candidates generate radial glia that can differentiate into all 3 cell types at these time points. After pinpointing the ideal TF induction and differentiation timeline, Applicants can perform single-cell RNA-seq on the cells spontaneously differentiated from the top 4 candidates to more precisely characterize the types of differentiated cells. Due to the morphology of neural cells and the difficulty of dissociating single neural cells, single nuclei can be isolated from neural cells and sequenced as previously described (see, e.g., WO/2017/164936). Applicants can compare the in vivo anatomical locations of the cell types to which the differentiated cells correspond with the expression pattern of each TF in the human brain using the Allen Human Brain Atlas (Sunkin S M, et al., Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res. 2013; 41(Database issue):D996-D1008). To better understand the regulatory pathways through which the TFs drive differentiation, Applicants can also perform chromatin immunoprecipitation followed by sequencing (ChIP-seq) using the epitope tag (e.g., V5) on the TF cDNA constructs and identify target genes for the top 4 candidates. 
Applicants can integrate differentially expressed genes and TF target genes from the RNA-seq and ChIP-seq results respectively to better understand potential pathway similarities and differences between the top 4 TFs. Finally, Applicants can combine 2 or 3 of the top 4 candidates and assess any potential synergistic improvement in radial glia fidelity using RNA-seq and spontaneous differentiation.
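The size of the combination space is easy to enumerate. In the sketch below, `tf_combinations` and the candidate names are hypothetical placeholders; pairs and triples of four candidates give 6 + 4 = 10 combinations, and the same calculation shows how quickly the space grows for larger candidate sets.

```python
from itertools import combinations
from math import comb


def tf_combinations(candidates, sizes=(2, 3)):
    """All unordered combinations of the candidates with the given sizes."""
    return [combo for k in sizes for combo in combinations(candidates, k)]


top_four = ["TF_A", "TF_B", "TF_C", "TF_D"]  # hypothetical placeholders
print(len(tf_combinations(top_four)))   # 10 (6 pairs + 4 triples)
print(comb(10, 2) + comb(10, 3))        # 165 (45 pairs + 120 triples)
```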

Given the data described herein, Applicants expect to find several candidate TFs whose overexpression can differentiate hESCs into radial glia that closely resemble primary cells. Applicants may also uncover multiple candidate TFs that each produce different subtypes of radial glia. Some of these candidates might upregulate the radial glia marker genes without exhibiting other properties associated with radial glia, such as the ability to differentiate into different neural cell types. Since the candidate TFs likely have different downstream gene targets, the radial glia produced may have different transcriptome signatures and spontaneously differentiate into varying proportions of different downstream neural cell types. Applicants expect that the downstream cell types identified by single-nucleus RNA-seq will correlate with the expression pattern of the TF in the human brain.

A number of directed differentiation protocols require overexpression of two or more TFs for successful cell type conversion. It is possible that a single TF is insufficient for generating radial glia that maintain multipotency and spontaneously differentiate into neurons, astrocytes, and oligodendrocytes. In this case, Applicants can select 5-10 candidates that produce cell types with transcriptome signatures most similar to human fetal radial glia and overexpress different combinations of these candidates. Applicants can also combine the top 5-10 TFs that are most specifically and highly expressed in radial glia based on available RNA-seq datasets (Camp J G, et al., 2015; Johnson M B, et al., 2015; Pollen A A, et al., 2015; Thomsen E R, et al., 2016; Wu J Q, et al., 2010; and Zhang Y, et al., 2016).

Example 2—Arrayed TF Screen for iNP Differentiation

As described in Example 1, Applicants compared two methods for overexpressing TFs to direct differentiation: ORF (open reading frame, cDNA) overexpression and synergistic activation mediator (SAM) CRISPR-Cas9 activation16. Applicants used these methods to stably upregulate NEUROD1 or NEUROG2, two TFs that have been previously shown to induce neuronal differentiation, in the HUES66 hESC line (FIG. 18a)12. For both TFs, Applicants found that expression of the TF ORF effectively induced neuronal differentiation (FIG. 18b-f). However, overexpression of the TFs using the ORF with endogenous UTRs or the SAM CRISPR-Cas9 activator did not efficiently differentiate hESCs into neurons despite robust transcriptional upregulation, potentially due to endogenous post-transcriptional regulatory mechanisms that limit protein expression (FIG. 18b-f). The results suggest that cell fate pathways are tightly regulated and that the most artificial overexpression method, TF ORF, would be advantageous for cellular engineering.

Based on the results of the comparison, Applicants used TF ORF overexpression to screen for TFs that could differentiate hESCs into iNPs, first in an arrayed format, to identify optimal parameters and candidate TFs that could guide the development of pooled TF screens (FIG. 19a,b). To select a subset of TFs for the arrayed screen, Applicants examined eight RNA-seq datasets17-24 that were available at the time and identified 70 TFs that were shown to be specifically expressed in NPs. For each TF, Applicants included isoforms that comprised >25% of the expressed transcript, resulting in a total of 90 TF isoforms (FIG. 19a and Table 1). Applicants synthesized the TF isoforms and packaged each TF individually into lentivirus for delivery into hESCs in an arrayed format (FIG. 19b). During the screen, Applicants incrementally shifted the stem cell culture media to NP media (FIG. 19c) and measured expression of two NP marker genes selected using published RNA-seq datasets, SLC1A3 and VIM, at 4 and 7 days after transduction (FIG. 19b)17-24. The arrayed TF screen identified eight candidate TFs whose isoforms ranked in the top 10% for SLC1A3 and VIM upregulation (FIG. 19d-g; Table 1).
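The ranking step can be sketched as below. The source does not state how the qPCR data were quantified, so the comparative ΔΔCt method (which assumes near-equal amplification efficiencies and a reference housekeeping gene) is an assumption here, as is reading "top 10% for SLC1A3 and VIM" as requiring both markers; `fold_change` and `top_percent_hits` are hypothetical.

```python
import math


def fold_change(ct_target: float, ct_reference: float,
                ct_target_control: float, ct_reference_control: float) -> float:
    """Relative expression by the comparative Ct (2^-ddCt) method."""
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


def top_percent_hits(fold_changes: dict, markers=("SLC1A3", "VIM"),
                     top_percent: int = 10) -> list:
    """Isoforms ranked in the top `top_percent` for every marker.

    `fold_changes` maps isoform -> {marker: fold change vs. control}.
    """
    k = max(1, math.ceil(len(fold_changes) * top_percent / 100))
    hits = None
    for marker in markers:
        ranked = sorted(fold_changes,
                        key=lambda iso: fold_changes[iso][marker], reverse=True)
        top = set(ranked[:k])
        # Keep only isoforms that are in the top set for every marker so far.
        hits = top if hits is None else hits & top
    return sorted(hits)


# A target Ct 3 cycles earlier than control (same reference Ct) is 8-fold higher.
print(fold_change(20.0, 15.0, 23.0, 15.0))
```

With 90 isoforms, the top 10% corresponds to the 9 highest-ranked isoforms per marker.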

Example 3—Development of a Pooled TF Screening Platform

Pooled screens are less expensive and less time-intensive than arrayed screens because they do not require individually preparing each perturbation (e.g., overexpression of a TF) in the library. Pooled screening involves transducing pooled lentiviral libraries at a low multiplicity of infection (MOI) to ensure that most cells receive only one stably integrated construct. At the end of the screen, deep sequencing of the DNA barcodes contained in the constructs integrated in the bulk genomic DNA can be used to identify changes in the construct distribution resulting from the applied selection pressure. In certain embodiments, cells having characteristic markers of the cell type of interest (e.g., radial glia) are sorted and the DNA barcodes corresponding to TFs are determined, thus identifying TFs whose overexpression drives differentiation into the cell type of interest.
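Why a low MOI yields mostly single integrations follows from the standard assumption that the number of integrations per cell is Poisson-distributed with mean equal to the MOI. The sketch below is illustrative; `integration_fractions` is hypothetical, and MOI 0.3 is an example value, not one specified here.

```python
import math


def integration_fractions(moi: float) -> dict:
    """Poisson-model fractions of cells by number of integrated constructs."""
    if moi <= 0:
        raise ValueError("MOI must be positive")
    p0 = math.exp(-moi)           # no integration
    p1 = moi * math.exp(-moi)     # exactly one integration
    return {
        "untransduced": p0,
        "single": p1,
        "multiple": 1.0 - p0 - p1,
        # Of the transduced cells, the fraction carrying exactly one construct.
        "single_among_transduced": p1 / (1.0 - p0),
    }


for key, value in integration_fractions(0.3).items():
    print(f"{key}: {value:.3f}")
```

At an MOI of 0.3, about 74% of cells remain untransduced, but roughly 86% of transduced cells carry exactly one construct; lowering the MOI further raises that fraction at the cost of transducing fewer cells.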

Applicants provide a generalizable TF screening platform based on pooled screening for further identification of regulators driving cellular differentiation (FIG. 8a). Applicants can build the pooled screen on the findings from differentiating hESCs into radial glia. The pooled screening platform further comprises engineered hESC reporter lines that fluoresce upon differentiation into radial glia, generated by genetically tagging radial glia marker genes with GFP. This reporter-based platform provides a more cost-effective, versatile, and reliable approach than antibody staining. In addition, using reporter lines for marker genes identified through RNA-seq of target cell types makes the platform broadly applicable: for any cell type of interest, one can collect RNA-seq data, identify marker genes, and screen for TFs that upregulate those marker genes. Applicants can overexpress pooled TF libraries in the hESC reporter lines and select candidates using flow cytometry followed by deep sequencing of the barcodes associated with the cDNAs (FIG. 8a). Applicants can validate the pooled screening approach by pooling the 90 TFs from Examples 1-2 and performing a pooled screen with this targeted TF library. To develop a generalizable platform for differentiating hESCs into any desired cell type, Applicants can scale up the pooled screen first with an available >1300 TF library from the Broad Genomics Perturbations Platform (GPP) and then with a synthesized >3500 TF library consisting of all annotated TFs. The genome-scale TF library could serve as a valuable community resource for constructing a directed differentiation cell atlas.

Applicants have engineered two different HUES66 hESC reporter lines that express the fluorescent protein EGFP upon upregulation of an endogenous radial glia marker gene, either VIM or SLC1A3. Screening in two different marker gene reporter lines can more specifically pinpoint TFs that direct radial glia differentiation, rather than TFs that merely upregulate a single gene that may also be expressed in other cell types. For each marker gene, Applicants used CRISPR-Cas9 to precisely edit the endogenous locus such that EGFP is expressed under the same promoter as the marker gene, followed by a P2A ribosomal skipping site and the marker gene (Cong L, et al., Multiplex genome engineering using CRISPR/Cas systems. Science. 2013; 339(6121):819-23; and Mali P, et al., RNA-guided human genome engineering via Cas9. Science. 2013; 339(6121):823-6). Applicants chose to insert EGFP at the N-terminus of the proteins because the N-terminus was consistent across the isoforms. The P2A ribosomal skipping site separates the EGFP and marker gene proteins, reducing the risk that the EGFP insertion interferes with folding of the endogenous protein. For each marker gene, Applicants generated three clonal hESC lines to reduce the possibility that identified candidate TFs have an effect only in a particular clonal line. Applicants evaluated the ability of the reporter lines to fluoresce upon marker gene upregulation both by targeting CRISPR activators to the marker gene promoter and by overexpressing a candidate TF from Example 1 to differentiate the hESCs into radial glia (Konermann S, et al., 2015). In both cases, Applicants detected EGFP fluorescence in both marker lines by imaging. For TF overexpression, Applicants also observed morphological changes consistent with radial glia differentiation.

Applicants validated the pooled screening system by pooling the targeted 90-TF library from Examples 1-2 and performing a targeted pooled screen (FIG. 8a). Applicants amplified and packaged the pooled library into lentiviral vectors and transduced each hESC reporter line at MOI<0.3. After 7 days, Applicants used flow cytometry to isolate live cells expressing EGFP, indicating upregulation of the radial glia marker gene, and live cells with the lowest 15% fluorescence to establish the baseline TF distribution. Applicants isolated genomic DNA from each population, PCR amplified the DNA barcodes associated with the TFs, and deep sequenced the barcodes to identify TFs that were more enriched in the fluorescent population than in the control in the VIM and SLC1A3 reporter cell lines (FIG. 8b). The candidates identified in Example 1 were significantly enriched in the pooled screens: six of the 7 TFs ranked among the top 15 candidates. ASCL1 was not enriched in the fluorescent population in the pooled screens, potentially because ASCL1-driven differentiation relies on early formation of neural rosettes (radial arrangements of neural stem cells; FIG. 1c), which are less likely to form in a pooled screen because neighboring cells may overexpress TFs other than ASCL1. FIG. 9 is a scatterplot of the 1,387 TF screening results; it shows that the 7 TF candidates (ASCL1, EOMES, FOS, NFIB, OTX1, PAX6, and RFX4) are enriched and also reveals additional candidates for differentiating stem cells into radial glia (FANCD2, NOTCH1, SMARCC1, ESR2, ESR1, and MESP1).
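The comparison between the sorted fluorescent population and the lowest-15% control can be sketched as a depth-normalized log2 fold change of barcode counts. This is one simple scoring choice assumed for illustration; the text does not specify the statistic used, and the TF names and counts below are hypothetical.

```python
import math

def barcode_enrichment(high_counts, low_counts, pseudocount=1.0):
    """Rank TFs by log2 fold change of depth-normalized barcode counts (high vs. low sort)."""
    tfs = sorted(set(high_counts) | set(low_counts))
    high_total = sum(high_counts.values())
    low_total = sum(low_counts.values())
    scores = {}
    for tf in tfs:
        # The pseudocount avoids log(0) for barcodes absent from one population;
        # dividing by each sample's total read count corrects for sequencing depth.
        high = (high_counts.get(tf, 0) + pseudocount) / high_total
        low = (low_counts.get(tf, 0) + pseudocount) / low_total
        scores[tf] = math.log2(high / low)
    # Most enriched in the fluorescent (high) population first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def top_fraction(ranked, fraction=0.10):
    """Return TF names in the top `fraction` of a ranked list (at least one)."""
    n = max(1, math.ceil(len(ranked) * fraction))
    return [tf for tf, _ in ranked[:n]]

# Hypothetical read counts for illustration only.
high = {"TF_A": 900, "TF_B": 50, "TF_C": 50}
low = {"TF_A": 100, "TF_B": 450, "TF_C": 450}
ranked = barcode_enrichment(high, low)
print(top_fraction(ranked))  # ['TF_A']
```

The same ranking helper applies to any "top 10%" cutoff; replicate reporter lines could be combined by averaging scores, though that aggregation step is also an assumption.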

Development of a Versatile Genome-Scale TF Screen

To scale up the pooled TF screen to include all annotated TF isoforms, Applicants can use the >1,300 TF library from the Broad GPP and then synthesize a >3,500 genome-scale TF library that includes all annotated TFs (see, e.g., Table 3). The Broad GPP library is a convenient intermediate because it is readily available at a lower cost. Applicants added the candidates identified in Examples 1-2 to the Broad GPP library as positive controls. Applicants amplified the pooled Broad GPP library and verified even distribution of the TFs with deep sequencing. Applicants can package the Broad GPP library into lentivirus for transducing the hESC radial glia reporter lines. As in the targeted pooled screen, Applicants can isolate the fluorescent and control cell populations and deep sequence the barcodes to compare the TF distribution between the two populations. Applicants can evaluate the results of the Broad GPP library using the candidates identified in Examples 1-2. If the TF screen using the Broad GPP library is successful, Applicants can synthesize the complete >3,500 genome-scale TF library and screen for radial glia differentiation using the genome-scale library.
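Evenness of an amplified pooled library is often summarized as the ratio of the 90th to the 10th percentile of per-construct read counts, where a value near 1 indicates even representation. The text does not state which metric was used to verify the Broad GPP library, so this sketch is an assumed, illustrative choice using only the Python standard library.

```python
import statistics

def skew_ratio(counts):
    """90th/10th percentile ratio of per-construct read counts (1.0 = perfectly even)."""
    deciles = statistics.quantiles(counts, n=10)  # nine cut points: 10th ... 90th percentile
    return deciles[-1] / deciles[0]

# A perfectly even hypothetical library of 90 constructs.
even_library = [1000] * 90
print(skew_ratio(even_library))
```

A rising ratio after amplification would flag constructs that are under-represented and may need extra screening coverage.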

Validation of Novel TFs

Applicants can validate any additional TFs identified in the pooled screens using the arrayed methods described in Examples 1-2. If any of the candidate TFs produce radial glia that are comparable to those from the top 3 candidates identified in Examples 1-2, Applicants can combine the TF(s) from the pooled screens with those from the arrayed screens to potentially improve radial glia fidelity.

Discussion

By starting with a targeted pooled library and incrementally scaling up to a genome-scale library using known positive controls, Applicants can establish a generalizable TF screening platform. As Applicants increase the TF library size, Applicants expect that the proportion of fluorescent cells in the screening population will decrease. Applicants can adjust the screening parameters, such as increasing flow cytometry sorting time and the number of PCR cycles for barcode amplification, to detect the rarer positive population. Performing the pooled screen with the genome-scale TF library may identify additional novel TFs that can drive radial glia differentiation.

As shown in Examples 1-2, it is possible that radial glia differentiation requires upregulation of multiple TFs. To screen for combinations of TFs, Applicants can transduce the TF libraries at high MOI such that each cell potentially overexpresses multiple TFs. Applicants can validate the candidates most enriched for radial glia marker gene expression both individually and combinatorially. Multiple barcodes in single cells can be determined by any single-cell sequencing method described herein.
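The scale of a combinatorial screen can be estimated with simple arithmetic: the number of unordered TF pairs grows quadratically with library size. The sketch below assumes a uniformly represented library, Poisson-distributed integrations, and counts only cells carrying exactly two constructs; the coverage value and MOI are hypothetical parameters, not values from the text.

```python
import math

def cells_for_pair_coverage(n_tfs, coverage, moi):
    """Estimate cells to transduce so every TF pair is represented ~coverage times.

    Assumes a uniform library and Poisson-distributed integrations, and counts
    only cells that carry exactly two constructs.
    """
    n_pairs = math.comb(n_tfs, 2)              # unordered TF pairs
    p_two = (moi ** 2) * math.exp(-moi) / 2    # Poisson P(k = 2)
    return n_pairs, math.ceil(n_pairs * coverage / p_two)

for n in (90, 1300):
    pairs, cells = cells_for_pair_coverage(n, coverage=100, moi=2.0)
    print(f"{n} TFs: {pairs:,} pairs, ~{cells:,} cells")
```

Under this model P(k = 2) peaks at MOI 2, and the 90-TF library already yields 4,005 pairs, which suggests why pairwise screening is most tractable with targeted libraries.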

Since current neural progenitor differentiation protocols often require formation of neural rosettes, pooled screening may fail to recover some candidates found in the arrayed screen in Examples 1-2. Applicants can recover these candidates by constructing an inducible TF library (e.g., dox inducible), transducing the library at low cell density, allowing each transduced cell to expand into a small colony, and then inducing TF overexpression.

Compared to short hairpin RNAs and guide RNAs, cDNAs contain longer variable sequences, which can increase the skew in the distribution of pooled cDNA libraries. If the pooled cDNA libraries are significantly more skewed, Applicants can increase the screening coverage so that more cells express each cDNA.
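Increasing screening coverage translates directly into more cells to transduce. A minimal sketch, assuming an evenly represented library and the Poisson fraction of cells receiving at least one integration; the coverage levels shown are hypothetical, not values from the text.

```python
import math

def cells_to_transduce(library_size, coverage, moi):
    """Cells to transduce so each construct is integrated in ~`coverage` cells.

    Uses the Poisson fraction of cells receiving at least one integration and
    assumes an evenly represented library.
    """
    transduced_fraction = 1.0 - math.exp(-moi)
    return math.ceil(library_size * coverage / transduced_fraction)

# Hypothetical settings: the 90-TF library at MOI 0.3, at 500x and 1,000x coverage.
for coverage in (500, 1000):
    print(coverage, cells_to_transduce(90, coverage, 0.3))
```

Because the requirement scales linearly with coverage, compensating for a skewed cDNA library by doubling coverage roughly doubles the number of cells needed.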

Example 4—Development of a Pooled TF Screening Platform Using Flow-FISH

Applicants have further developed a pooled transcription factor screening platform that does not require generating clonal reporter cell lines for a marker gene. Applicants have used flow-FISH to read out transcription factor screens. The method uses gene-specific probes to detect transcripts of marker genes that indicate differentiation into target cells and then sorts the cells accordingly. In certain embodiments, multiple markers are used to increase specificity. Selecting for multiple reporter genes at the same time can narrow down target cell types because, depending on the target cell type, a single gene is often not specific enough. Additionally, the assay is versatile in that reporter genes can be added or changed simply by applying different probes. Flow-FISH combines FISH, which fluorescently labels the mRNA of reporter genes, with flow cytometry (see, e.g., Arrigucci et al., FISH-Flow, a protocol for the concurrent detection of mRNA and protein in single cells using fluorescence in situ hybridization and flow cytometry, Nat Protoc. 2017 June; 12(6):1245-1260. doi:10.1038/nprot.2017.039). Specifically, Applicants fluorescently label mRNA of reporter genes, select for target cell types by flow cytometry, and then amplify TF barcodes to identify TFs enriched in the target cells. In certain embodiments, the marker genes are selected such that they are specifically expressed only in the target cell, which minimizes false-positive selection and background. The assay is also optimized to remove background fluorescence and to select for true positive cells.

Applicants used the 90 TF library to screen for TFs that differentiate hESCs into radial glia by combining SLC1A3 and VIM probes for those reporter genes (Table 4). The data show that Applicants were able to selectively enrich for TFs that were identified in the arrayed and reporter gene screens for radial glia differentiation described in Examples 1-3.

Example 5—Identification of Candidate TFs Using the Pooled TF Screening Platform

Having optimized parameters and identified candidate TFs in the arrayed screen, Applicants generated a pooled TF screening approach, as described herein. The pooled screening platform is less expensive and less laborious than arrayed screening, and therefore higher-throughput. Applicants simplified TF identification in pooled screens by pairing a unique DNA barcode with each of the 90 TF ORF isoforms synthesized for the arrayed screen (FIG. 20a; Table 1). Applicants pooled the barcoded TFs and packaged them into a pooled lentiviral library for delivery (FIG. 13a). To determine the ideal strategy for selecting TFs that drive iNP differentiation, Applicants explored three different methods that can simultaneously assay different numbers of marker genes to select for target cell types: reporter cell line (1 gene), flow-FISH (up to 10 genes), and single-cell RNA-seq (scRNA-seq; up to ~2,000 genes; FIG. 13a and FIG. 20b).

For the reporter cell line method, Applicants generated clonal reporter cell lines with EGFP knocked into the endogenous locus of an NP marker gene, either SLC1A3 or VIM, as described above. Applicants transduced the SLC1A3 or VIM reporter cell line with the pooled TF library, differentiated the cells for 7 days, and sorted for high and low EGFP-expressing cells (FIG. 13a and FIG. 20b, c). Deep sequencing of the TF barcodes in each population identified nine candidate TFs that ranked in the top 10% for enrichment in the high EGFP-expressing cell population, indicating upregulation of SLC1A3 or VIM (FIG. 20d, e and Table 1). Five of the nine candidate TFs were also identified in the arrayed screen (FIG. 20d, e and Table 1).

For the flow-FISH method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and labeled 2 or 10 NP marker gene transcripts using pooled FISH probes (FIG. 13a and FIG. 20b). By pooling the FISH probes, Applicants could sort for cells expressing high or low levels of 2-10 marker genes at the same time (FIG. 20f, g). Similar to the reporter cell line method, Applicants deep sequenced the TF barcodes and identified eight candidate TFs whose isoforms ranked in the top 10% for enrichment in cells expressing higher levels of marker genes (FIG. 13b, c and Table 1). Applicants found that for some TFs, such as EOMES and RFX4, the choice of TF isoform can produce very different differentiation results (FIG. 13c). Six of the eight candidate TFs from the flow-FISH screen overlapped with those from the arrayed screen (FIG. 20d, e and Table 1).
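Sorting on pooled FISH probes can be sketched as ranking cells by a combined marker signal and taking the top and bottom bins. The sketch assumes that pooled probes share one fluorophore, so the measured signal approximates the sum over labeled marker transcripts; the input format, function name, and default bin fraction are hypothetical.

```python
def gate_high_low(cells, fraction=0.15):
    """Return (high_ids, low_ids): top and bottom `fraction` of cells by summed signal.

    cells: dict of cell_id -> list of per-marker FISH intensities (hypothetical input).
    """
    # Assumption: pooled probes share one fluorophore, so the measured signal
    # approximates the sum over all labeled marker transcripts.
    scores = {cid: sum(values) for cid, values in cells.items()}
    ranked = sorted(scores, key=scores.get)      # lowest signal first
    k = max(1, int(len(ranked) * fraction))      # cells per bin
    return ranked[-k:], ranked[:k]

# Toy data: 20 cells with two marker intensities each.
cells = {f"cell{i}": [float(i), float(i)] for i in range(20)}
high_ids, low_ids = gate_high_low(cells)
print(high_ids, low_ids)
```

Barcodes from the two bins would then be compared as in the reporter-line screens; on a real sorter the gates are set on fluorescence histograms rather than computed offline.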

For the scRNA-seq method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and performed scRNA-seq to profile 59,640 single cells (FIG. 13a and FIG. 20b). In the barcoded TF ORF vector design, the TF barcode is expressed in the TF mRNA, which is captured by scRNA-seq and can be mapped to cell barcodes (FIG. 20a). After assigning TFs to cells, Applicants found that the number of cells overexpressing each TF was highly skewed, with the top 10% of TFs having 92 times more cells than the bottom 10% of TFs, potentially due to TF-dependent effects on cell death and proliferation (FIG. 21a). Cluster analysis of the scRNA-seq results suggested that overexpression of several TFs, for instance ASCL1 and FEZF2, generated distinct transcriptome signatures that clustered together, whereas overexpression of most TFs did not produce distinct transcriptome signatures (FIG. 21b-d). By correlating the TF transcriptome signatures with those of radial glia from published datasets20,25,26, which represent NPs in the developing cortex, Applicants identified eight candidate TFs whose isoforms ranked in the top 10% for highest correlation (FIG. 21d and Table 1). Only three of the eight candidate TFs had also been identified in the arrayed screen, potentially because scRNA-seq selects cells based on the expression of many more genes than the two marker genes used in the arrayed screen (FIG. 21d and Table 1).
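Ranking TFs by similarity to a reference radial glia signature can be sketched as a Pearson correlation between each TF's pseudobulk profile and the reference over shared genes, followed by a top-10% cutoff. The pseudobulk averaging, Pearson metric, and all names and values here are assumptions for illustration; the text does not specify the exact correlation procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_by_reference(tf_profiles, reference, top_fraction=0.10):
    """Rank TFs by correlation of their pseudobulk profile with a reference signature."""
    scores = {tf: pearson(profile, reference) for tf, profile in tf_profiles.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:n_top], scores

# Hypothetical mean expression of four shared genes per TF, and a reference signature.
reference = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "TF_A": [1.1, 2.0, 2.9, 4.2],
    "TF_B": [4.0, 3.0, 2.0, 1.0],
    "TF_C": [2.0, 2.0, 2.5, 2.0],
}
top, scores = rank_by_reference(profiles, reference)
print(top)
```

In practice the reference profiles would come from the published fetal cortex datasets, and expression would typically be normalized and log-transformed before correlation.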

Overall, the arrayed and pooled screens nominated overlapping sets of candidate TFs for iNP differentiation (FIG. 13d, and Table 1). Among the pooled screening methods, flow-FISH identified the highest number (6 out of 8) of candidate TFs that overlapped with other screens (FIG. 13d and FIG. 20h). Flow-FISH is also more versatile than reporter cell lines and more accessible than scRNA-seq, suggesting that it may be the preferred screening method for other cell types.

Example 6—Validation of Candidate TFs

To validate the screening results, Applicants chose to focus on the eight candidate TFs from the flow-FISH screen as well as two additional candidates that were enriched in the other screens and previously suggested to mediate iNP differentiation, ASCL1 (ref. 27) and PAX6 (ref. 28) (FIG. 13d). Applicants individually overexpressed the top isoform of each TF in hESCs and verified TF expression (FIG. 22a). Immunostaining the iNPs for NP markers showed that all iNPs expressed higher levels of VIM (a gene used to select target cells in the pooled screen) than hESCs did, and that the iNPs exhibited diverse morphologies (FIG. 14a and FIG. 22b). Five candidate TFs (OTX1, EOMES, RFX4, PAX6, and ASCL1) produced iNPs that were morphologically distinct from hESCs overexpressing GFP control, two candidate TFs (HES1 and LHX2) produced iNPs with similar morphologies to hESCs, and three candidate TFs (NFIC, FOS, and NFIB) produced iNPs with morphologies that were in between the two groups. Applicants then compared bulk RNA-seq signatures of iNPs to different cell types in the human fetal cortex or brain organoids20,25,26. Applicants found that transcriptome signatures of iNPs derived using RFX4, ASCL1, and PAX6 were the most similar to NPs, whereas those produced by EOMES and FOS were the most different (FIG. 14b and FIG. 22d, e). The validation results suggest that although overexpression of all candidate TFs upregulated NP marker genes, not all candidate TFs generated cells with transcriptome signatures that resembled those of NPs.

Example 7—Spontaneous Differentiation of iNPs

Next, Applicants functionally validated the candidate TFs by spontaneously differentiating the iNPs produced by each candidate. Applicants transiently overexpressed candidate TFs for 1 week to produce iNPs and removed growth factors from the media to allow the iNPs to spontaneously differentiate (FIG. 15a). Functional iNPs, like NPs, should spontaneously differentiate into cell types of the central nervous system (CNS), such as neurons and astrocytes. Out of the ten candidate TFs, four (RFX4, NFIB, PAX6, and ASCL1) produced iNPs that spontaneously differentiated into neurons, astrocytes, and, more rarely, oligodendrocyte precursor cells (FIG. 15b and FIG. 23). Spontaneous differentiation of iNPs generated by these four TFs followed the natural developmental progression, with neurogenesis starting at week 1 and gliogenesis at week 4 (FIG. 15b and FIG. 23). RFX4 iNPs patterned into neural rosettes prior to neurogenesis (FIG. 15b and FIG. 23).

Applicants validated these four TFs in two additional pluripotent stem cell lines, iPSC11a and H1. For both cell lines, overexpression of the four TFs produced iNPs that expressed higher levels of NP marker genes relative to GFP control (FIG. 24a, b). Using spontaneous differentiation to functionally characterize iNPs, Applicants found that RFX4 and NFIB consistently produced functional iNPs in iPSC11a (FIG. 24c), and RFX4 produced functional iNPs in H1 (FIG. 24d). These results suggest that the effects of some TFs are cell line-dependent, while others, like RFX4, are cell line-independent and may be more likely to play critical roles in NP specification during development.

Applicants further characterized the cells spontaneously differentiated from iNPs produced by these four TFs using scRNA-seq. Cluster analysis of 52,364 cells revealed that the iNPs generated a broad range of cell types that are produced by NPs during development, such as cell types from the retina, CNS, epithelium, and neural crest (FIG. 16a, b, FIG. 25a, and Tables 5 and 6). Applicants found that the spontaneously differentiated cell types were generally consistent between biological replicates and distinct between TFs (FIG. 16c, d). RFX4 produced more CNS cell types; NFIB produced more epithelium and neural crest cell types; PAX6 generated cell types in all regions; and ASCL1 produced more retina cell types (FIG. 16c, d). The distributions of cell types generated by TF-iNPs are similar to those of human brain organoids generated with NP embryoid bodies (FIG. 16d and FIG. 25b). Together, the spontaneous differentiation results show that four of the candidate TFs produce functional iNPs.

Applicants sought to better understand the transcriptional networks that lead to iNP production by profiling the transcriptional targets of the four TFs using chromatin immunoprecipitation with sequencing (ChIP-seq). Motif analysis identified distinct motifs for each TF and suggested potential transcriptional coregulators, some of which have been previously shown to interact with the TF (FIG. 26a)29,30. Applicants assigned a TF as a potential regulator of an NP marker gene if the TF had a ChIP-seq peak within 10 kb of the gene's transcriptional start site (FIG. 26b-d). For each TF, Applicants identified NP marker genes with TF ChIP-seq peaks that were also differentially expressed upon TF overexpression. Comparison of these NP marker genes between TFs suggested candidate genes that could contribute to the potential mechanisms by which each TF produced iNPs (FIG. 26e). In addition, Applicants found that each of the four TFs had ChIP-seq peaks proximal to its own promoter, suggesting that each TF may positively regulate its own expression to sustain the high expression levels required for differentiation (FIG. 26e).
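The peak-to-gene assignment rule (a ChIP-seq peak within 10 kb of the transcriptional start site) and the intersection with differentially expressed NP marker genes can be sketched as below. The sketch assumes peaks are represented by single summit positions; measuring distance from peak edges instead would be an equally reasonable reading. All gene names, coordinates, and sets are hypothetical.

```python
from bisect import bisect_left, bisect_right

def genes_near_peaks(peaks, tss, window=10_000):
    """Return genes with at least one peak summit within `window` bp of their TSS.

    peaks: list of (chrom, summit_position); tss: dict gene -> (chrom, tss_position).
    """
    by_chrom = {}
    for chrom, pos in peaks:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()  # sorted summits allow binary search per gene
    hits = set()
    for gene, (chrom, pos) in tss.items():
        positions = by_chrom.get(chrom, [])
        # Count summits falling in [pos - window, pos + window].
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        if hi > lo:
            hits.add(gene)
    return hits

# Hypothetical peaks, TSS coordinates, and gene sets.
peaks = [("chr1", 1_000), ("chr2", 50_000)]
tss = {"GENE_1": ("chr1", 9_000), "GENE_2": ("chr1", 20_000), "GENE_3": ("chr2", 61_000)}
de_genes = {"GENE_1", "GENE_3"}       # differentially expressed upon TF overexpression
np_markers = {"GENE_1", "GENE_2"}     # NP marker genes
candidates = genes_near_peaks(peaks, tss) & de_genes & np_markers
print(candidates)
```

Checking whether a TF's own gene appears in its `genes_near_peaks` output is the same test used for the autoregulation observation.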

Example 8—Modeling Neurodevelopmental Disorders Using iNPs

To demonstrate that iNPs can be used to model neurological disorders, Applicants knocked out and overexpressed DYRK1A in iPSC11a, perturbations that have been implicated in autism spectrum disorder31 and Down syndrome32, respectively (FIG. 17a-c and FIG. 27a, b). Applicants transiently overexpressed RFX4 to differentiate the iPSCs into iNPs to study the effects of DYRK1A perturbation on NPs during neural development. Applicants characterized iNPs using bulk RNA-seq and identified genes that were significantly differentially expressed as a result of DYRK1A perturbation (FIG. 17d, FIG. 27c-f, and Table 7). Applicants identified 42 genes that showed DYRK1A dosage-dependent expression changes, some of which are known to be involved in cellular proliferation, neuronal migration, and synapse formation (FIG. 17d).
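One way to call "dosage-dependent" genes is to require that knockout and overexpression shift expression in opposite directions relative to control. The text does not define the criterion, so this sketch, its effect-size threshold, and the omission of significance testing are assumptions; gene names and fold changes are hypothetical.

```python
def dosage_dependent(log2fc_ko, log2fc_oe, min_abs=0.5):
    """Genes whose expression moves in opposite directions upon knockout vs. overexpression.

    log2fc_ko / log2fc_oe: dict gene -> log2 fold change vs. control (hypothetical input;
    a real analysis would also filter on adjusted p-values).
    """
    result = {}
    for gene in set(log2fc_ko) & set(log2fc_oe):
        ko, oe = log2fc_ko[gene], log2fc_oe[gene]
        if abs(ko) >= min_abs and abs(oe) >= min_abs and ko * oe < 0:
            # oe > 0 (and ko < 0): expression rises with DYRK1A dosage.
            result[gene] = "increases_with_dosage" if oe > 0 else "decreases_with_dosage"
    return result

ko = {"G1": -1.0, "G2": 1.2, "G3": 0.1}
oe = {"G1": 1.0, "G2": -0.8, "G3": 2.0}
print(dosage_dependent(ko, oe))
```

G3 is excluded because its knockout effect falls below the hypothetical threshold, even though its overexpression effect is large.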

Applicants then spontaneously differentiated the iNPs to further profile the effects of DYRK1A perturbation on neurogenesis and neural development. Applicants found that knockout of DYRK1A increased, whereas overexpression of DYRK1A decreased, the proportion of proliferating iNPs (FIG. 17e, f), consistent with results from previous studies of DYRK1A perturbation in different model systems33-37. At week 0 of spontaneous differentiation, DYRK1A knockout iNPs showed reduced proliferation, potentially due to toxicity of DNA double-strand breaks introduced by Cas9 (FIG. 17e). However, at weeks 2 and 4, DYRK1A knockout iNPs showed significantly increased proportions of proliferating cells, indicating that more iNPs were actively dividing instead of undergoing neurogenesis (FIG. 17e). As a result, at weeks 2 and 4, Applicants observed a significant reduction in neuronal MAP2 staining (FIG. 17g and FIG. 27g). In contrast, at weeks 0 and 2, DYRK1A overexpression iNPs showed lower proportions of proliferating cells (FIG. 17f). Because lower initial proliferation resulted in fewer iNPs, Applicants observed significant reductions in neuronal MAP2 staining at weeks 0 and 1 (FIG. 17h). Collectively, the DYRK1A perturbation experiments demonstrate that RFX4-iNPs can be used to model effects of perturbations on neural development and neurogenesis, advancing our understanding of complex neurological disorders.

Example 9—Genome-Scale TF Screen to Identify Drivers of Astrocyte Differentiation

Astrocytes are the most abundant cell type in the vertebrate central nervous system. Although previously thought to be passive responders to neuronal damage, growing evidence suggests that astrocytes actively signal to neurons to influence synaptic development, transmission, and plasticity through secreted and contact-dependent signals (Chung W S, et al., 2015). Current protocols to differentiate astrocytes from hESCs are labor-intensive, requiring the production of embryoid bodies, and take several months to produce mature astrocytes (Krencik R, et al., 2011). Identification of TFs that direct astrocyte differentiation can enable better understanding of astrocyte development and contribute to more complete models of the brain amenable to high-throughput studies. Therefore, Applicants can apply the genome-scale TF screens described herein to identify candidates that can differentiate radial glia into astrocytes (FIG. 10). In addition, performing the astrocyte differentiation screen using the radial glia developed in Examples 1-4 can validate the radial glia as a robust model for high-throughput screening.

Using the methods described in Example 2, Applicants have engineered two different HUES66 hESC reporter lines that express the fluorescent protein EGFP upon upregulation of an astrocyte marker gene, either ALDH1L1 or GFAP. For each reporter line, Applicants generated three clonal lines and verified fluorescence upon marker gene upregulation using CRISPR activation. Flow-FISH using astrocyte markers and scRNA-seq may also be used as described.

Genome-Scale TF Screen for Astrocyte Differentiation

Applicants can differentiate both the GFAP and ALDH1L1 hESC reporter lines or hESCs into radial glia using dox-inducible overexpression of the top radial glia candidate TF(s) found in Examples 1-9. Once the hESCs have differentiated into radial glia, Applicants can withdraw dox to turn off overexpression and transduce the cells with the genome-scale TF library. Since neurogenesis precedes gliogenesis in the developing brain, Applicants hypothesize that astrocyte differentiation might require signaling from neurons. Applicants can thus perform the TF screen in the presence of neurons differentiated through NEUROG2 overexpression (Zhang Y, et al., 2013). Astrocyte differentiation might also require more time than radial glia differentiation, so Applicants can perform small-scale screens to determine the optimal time point. After 1, 2, and 4 weeks of differentiation, Applicants can use flow cytometry to quantify the percentage of fluorescent cells. Applicants can then perform the genome-scale screen and, at the time point with the highest percentage of fluorescent cells, isolate fluorescent cells indicating upregulation of the marker gene and cells with the lowest 15% of fluorescence as controls. Applicants can deep sequence the TF barcodes in both populations to identify TFs enriched in the fluorescent population.

Validation of Candidate TFs

After identifying candidate TFs for astrocyte differentiation, Applicants can evaluate the fidelity of astrocytes differentiated from these candidates using RNA-seq, immunostaining, and functional studies on synapse formation and elimination. Applicants can perform RNA-seq on the differentiated astrocytes at two different time points determined by enrichment of fluorescent cells during the screen. Applicants can compare the RNA-seq results from differentiated astrocytes to those from human astrocytes using methods described in Examples 1-2. Applicants can also immunostain the differentiated astrocytes for astrocyte markers SOX9, AQP4, and GFAP. Finally, Applicants can assess the ability of differentiated astrocytes to promote synapse formation and elimination. For synapse formation, Applicants can culture isolated mouse neurons or differentiated human neurons with and without the differentiated astrocytes and quantify the number of synapses in each condition by immunostaining for the pre- and post-synaptic markers bassoon and homer1, respectively, and imaging. Applicants can quantify synapse elimination with an in vitro assay used in previous studies, in which a pH-sensitive fluorescent dye (pHrodo) is conjugated to isolated synaptosomes, which fluoresce upon incorporation into lysosomes through phagocytosis (Chung W S, et al., Astrocytes mediate synapse elimination through MEGF10 and MERTK pathways. Nature. 2013; 504(7480):394-400).

Discussion

Like radial glia, astrocytes in the human brain are very diverse, and Applicants therefore expect to find multiple TFs that direct differentiation into different subtypes of astrocytes. These TFs are likely to regulate cellular pathways that are important for astrocyte function. Like in vivo astrocytes, the differentiated astrocytes can potentially increase synapse formation and phagocytose synaptosomes.

Since astrocytes arise at a later time point than radial glia during development, Applicants may extend the differentiation time of the pooled screen accordingly. In addition, it is possible that astrocyte differentiation requires exogenous factors beyond those provided by NEUROG2-differentiated neurons. Applicants can screen in the presence of isolated mouse neurons or mouse cortical brain slices to provide additional factors. If astrocyte differentiation requires upregulation of more than one TF, Applicants can transduce the TF library at high MOI. Applicants can also combine TF upregulation with downregulation by generating a TF CRISPR knockdown library and transducing cells with both the cDNA and CRISPR knockdown libraries.

Example 10—Discussion

In summary, Applicants have developed a systematic method to identify TFs for iNP differentiation that could be applied to any cell type of interest. Applicants showed that NP RNA-seq data can serve as a starting point for selecting TFs and marker genes for unbiased pooled screening. Applicants demonstrated the feasibility of using reporter cell line, flow-FISH, or scRNA-seq methods to select candidate TFs. Applicants found four novel TFs that could individually differentiate hESCs and iPSCs into iNPs that resemble the morphology, transcriptome signature, and functionality of human fetal radial glia. Out of the four candidate TFs, RFX4-derived iNPs spontaneously differentiated into the highest proportion of CNS cell types, even though, relative to the other candidates, RFX4 has not been extensively studied in CNS development38,39. The findings thus highlight the importance of performing unbiased TF screens. By knocking out and overexpressing DYRK1A in iNPs to model neurodevelopmental disorders, Applicants demonstrated the potential of iNPs to advance our understanding of complex processes in development and disease.

The screening approach could be extended to generate other cell types that may require more than one TF. To identify combinations of TFs, Applicants could screen TFs at a higher MOI to increase the probability of introducing more than one TF in the same cell. Iterative TF screens, for instance performing TF screens in iNPs for differentiation into neurons or glia, may more closely mimic the natural developmental trajectory and facilitate generation of mature cell types. Other factors, such as mechanical stress or signaling from other cell types that are naturally present during development, may also be necessary in TF screens for some cell types.

Beyond cellular programming, TF screening enables identification of factors involved in cellular reprogramming and trans-differentiation, as well as cancer progression and senescence. The demonstration that barcoding of ORFs allows for a variety of screening selection methods could also apply to pooled ORF screening of other protein families of interest. Future application of this TF screening platform for cellular engineering has the potential to expand the number of available cellular models that will help elucidate complex regulatory mechanisms behind development and disease.

Example 11—TF Screen to Identify Drivers of Cardiomyocyte Differentiation

Using the described screens, Applicants have identified the transcription factor EOMES as a driver of cardiomyocyte differentiation. Overexpression of EOMES for 2 days differentiates stem cells into beating cardiomyocytes by 8 days. This differentiation method produces much higher percentages of cardiomyocytes (~75% vs. ~30%) than the published mouse method (see, e.g., Van den Ameele J, Tiberi L, Bondue A, et al. Eomesodermin induces Mesp1 expression and cardiac differentiation from embryonic stem cells in the absence of Activin. EMBO Reports. 2012; 13(4):355-362. doi:10.1038/embor.2012.23; and WO2013010965A1). The present invention demonstrates the use of human EOMES for differentiating human stem cells. For the cardiomyocytes, Applicants have observed the cells beating after 2 weeks of differentiation and have made a video recording. Applicants have also identified MESP1 and ESR1 as candidates that drive cardiomyocyte differentiation. In certain embodiments, the cardiomyocytes generated according to the present invention may be used for transplant into patients suffering from heart disease. The present methods also allow for generating cardiomyocytes by expressing a single transcription factor, as opposed to previous methods that require expressing three transcription factors to convert fibroblasts into cardiomyocytes. In certain embodiments, the cardiomyocytes of the present invention may be used for screening drugs. For example, drugs can be screened for toxicity to cardiomyocytes.

Conditions for generating cardiomyocytes according to the present invention include the following. ES cells are cultured in RPMI + 1× B27 (without insulin) + 50 µg/mL ascorbic acid and switched to RPMI + 1× B27 at day 7. The seeding density is high (about 500,000 cells/mL). Dox (about 500 ng/mL) is added during days 0-2 to induce expression of the transcription factor (e.g., EOMES). This method results in about 75% of the cells expressing the cardiomyocyte marker TNNT2.
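For laboratories that automate feeding schedules, the conditions above can be encoded as a simple day-by-day lookup. This is a minimal sketch, not the Applicants' protocol file: it assumes that "days 0-2" means dox is present on days 0 and 1 and withdrawn at day 2, and the names `conditions_on_day` and `DayConditions` are hypothetical.

```python
from dataclasses import dataclass

# Values from the protocol text. ASSUMPTION: "days 0-2" is read as the
# half-open window [0, 2), i.e. dox is present on day 0 and day 1 only.
SEEDING_DENSITY_CELLS_PER_ML = 500_000  # recorded for reference only
DOX_NG_PER_ML = 500
DOX_START_DAY, DOX_END_DAY = 0, 2
MEDIA_SWITCH_DAY = 7

EARLY_MEDIA = "RPMI + 1x B27 (without insulin) + 50 ug/mL ascorbic acid"
LATE_MEDIA = "RPMI + 1x B27"


@dataclass(frozen=True)
class DayConditions:
    day: int
    media: str
    dox_ng_per_ml: int


def conditions_on_day(day: int) -> DayConditions:
    """Return the (hypothetical) culture conditions for a given protocol day."""
    if day < 0:
        raise ValueError("protocol days start at 0")
    media = EARLY_MEDIA if day < MEDIA_SWITCH_DAY else LATE_MEDIA
    dox = DOX_NG_PER_ML if DOX_START_DAY <= day < DOX_END_DAY else 0
    return DayConditions(day=day, media=media, dox_ng_per_ml=dox)


if __name__ == "__main__":
    # Print the schedule through day 10, the TNNT2 readout day used in FIG. 11.
    for d in range(11):
        c = conditions_on_day(d)
        print(f"day {c.day:2d}  dox {c.dox_ng_per_ml:3d} ng/mL  {c.media}")
```

Keeping the dox window as a half-open interval makes the "2 days of induction" explicit; a different reading of "days 0-2" only requires changing `DOX_END_DAY`.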

FIG. 11 shows an experiment in which cardiomyocytes were differentiated using different concentrations of dox to express two different EOMES isoforms. Applicants measured the percentage of cells expressing TNNT2 (cardiac troponin T, a cardiomyocyte marker) by fixing cells, staining with TNNT2 antibodies, and quantifying by flow cytometry 10 days after the start of dox induction. As used herein, 263 refers to EOMES isoform NM_005442 (SEQ ID NO: 10807) and 312 refers to EOMES isoform NM_001278182 (SEQ ID NO: 10808). As used herein, d2, d4, and d6 refer to 2 days, 4 days, and 6 days of dox induction, respectively. As used herein, and refer to cell seeding density at 300,000 cells/mL and 500,000 cells/mL. In conclusion, FIG. 11 shows that 2 days of dox induction at a seeding density of 500,000 cells/mL is required for high-efficiency differentiation of cardiomyocytes with both the 263 and 312 isoforms.
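The TNNT2 percentages above are flow cytometry readouts. One common way to call positive events, shown here only as a sketch, is to place a gate at a high percentile of a negative-control stain and count sample events above it. The 99th-percentile gate, the nearest-rank percentile, and the function names are assumptions; the source does not state its gating strategy.

```python
import math
from typing import Sequence


def gate_threshold(negative_control: Sequence[float], percentile: float = 99.0) -> float:
    """Nearest-rank percentile of negative-control intensities (assumed gate)."""
    if not negative_control:
        raise ValueError("negative control has no events")
    ordered = sorted(negative_control)
    # Multiply before dividing so the rank is exact for whole-number inputs.
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def percent_positive(sample: Sequence[float], negative_control: Sequence[float],
                     percentile: float = 99.0) -> float:
    """Percentage of sample events brighter than the negative-control gate."""
    if not sample:
        raise ValueError("sample has no events")
    threshold = gate_threshold(negative_control, percentile)
    positives = sum(1 for value in sample if value > threshold)
    return 100.0 * positives / len(sample)


# Toy intensities in arbitrary units, not data from FIG. 11.
control = list(range(1, 101))
stained = [50, 100, 150, 200]
print(percent_positive(stained, control))  # 75.0
```

Real analyses usually gate on scatter and viability first; this sketch covers only the final marker gate.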

(SEQ ID NO: 10807)
MQLGEQLLVSSVNLPGAHFYPLESARGGSGGSAGHLPSAAPSPQK LDLDKASKKFSGSLSCEAVSGEPAAASAGAPAAMLSDTDAGDAFA SAAAVAKPGPPDGRKGSPCGEEELPSAAAAAAAAAAAAAATARYS MDSLSSERYYLQSPGPQGSELAAPCSLFPYQAAAGAPHGPVYPAP NGARYPYGSMLPPGGFPAAVCPPGRAQFGPGAGAGSGAGGSSGGG GGPGTYQYSQGAPLYGPYPGAAAAGSCGGLGGLGVPGSGFRAHVY LCNRPLWLKFHRHQTEMIITKQGRRMFPFLSFNINGLNPTAHYNV FVEVVLADPNHWRFQGGKWVTCGKADNNMQGNKMYVHPESPNTGS HWMRQEISFGKLKLTNNKGANNNNTQMIVLQSLHKYQPRLHIVEV TEDGVEDLNEPSKTQTFTFSETQFIAVTAYQNTDITQLKIDHNPF AKGFRDNYDSSHQIVPGGRYGVQSFFPEPFVNTLPQARYYNGERT VPQTNGLLSPQQSEEVANPPQRWLVTPVQQPGTNKLDISSYESEY TSSTLLPYGIKSLPLQTSHALGYYPDPTFPAMAGWGGRGSYQRKM AAGLPWTSRTSPTVFSEDQLSKEKVKEEIGSSWIETPPSIKSLDS NDSGVYTSACKRRRLSPSNSSNENSPSIKCEDINAEEYSKDTSKG MGGYYAFYTTP

(SEQ ID NO: 10808)
MQLGEQLLVSSVNLPGAHFYPLESARGGSGGSAGHLPSAAPSPQK LDLDKASKKFSGSLSCEAVSGEPAAASAGAPAAMLSDTDAGDAFA SAAAVAKPGPPDGRKGSPCGEEELPSAAAAAAAAAAAAAATARYS MDSLSSERYYLQSPGPQGSELAAPCSLFPYQAAAGAPHGPVYPAP NGARYPYGSMLPPGGFPAAVCPPGRAQFGPGAGAGSGAGGSSGGG GGPGTYQYSQGAPLYGPYPGAAAAGSCGGLGGLGVPGSGFRAHVY LCNRPLWLKFHRHQTEMIITKQGRRMFPFLSFNINGLNPTAHYNV FVEVVLADPNHWRFQGGKWVTCGKADNNMQGNKMYVHPESPNTGS HWMRQEISFGKLKLTNNKGANNNNTQMIVLQSLHKYQPRLHIVEV TEDGVEDLNEPSKTQTFTFSETQFIAVTAYQNTDITQLKIDHNPF AKGFRDNYDSMYTASENDRLTPSPTDSPRSHQIVPGGRYGVQSFF PEPFVNTLPQARYYNGERTVPQTNGLLSPQQSEEVANPPQRWLVT PVQQPGTNKLDISSYESEYTSSTLLPYGIKSLPLQTSHALGYYPD PTFPAMAGWGGRGSYQRKMAAGLPWTSRTSPTVFSEDQLSKEKVK EEIGSSWIETPPSIKSLDSNDSGVYTSACKRRRLSPSNSSNENSP SIKCEDINAEEYSKDTSKGMGGYYAFYTTP
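The two EOMES isoforms are identical except for a single internal region. A minimal sketch for locating such a region trims the shared prefix and suffix and reports what remains. The function name is hypothetical, it assumes the sequences differ in at most one contiguous block (anything more complex needs a real pairwise alignment such as Needleman-Wunsch), and the excerpts are copied from SEQ ID NO: 10807 and SEQ ID NO: 10808.

```python
from typing import Tuple


def contiguous_difference(a: str, b: str) -> Tuple[int, str, str]:
    """Return (offset, segment_in_a, segment_in_b) for the one region where a and b differ.

    Assumes at most one contiguous difference between the two sequences.
    """
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Stop the suffix scan before it overlaps the shared prefix.
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, a[prefix:len(a) - suffix], b[prefix:len(b) - suffix]


# Excerpts around the divergent region of SEQ ID NO: 10807 and SEQ ID NO: 10808.
ISO_263_EXCERPT = "AKGFRDNYDSSHQIVPGG"
ISO_312_EXCERPT = "AKGFRDNYDSMYTASENDRLTPSPTDSPRSHQIVPGG"

offset, seg_263, seg_312 = contiguous_difference(ISO_263_EXCERPT, ISO_312_EXCERPT)
print(offset, repr(seg_263), seg_312, len(seg_312))
# 10 '' MYTASENDRLTPSPTDSPR 19
```

On these excerpts the 312 isoform carries a 19-residue segment absent from the 263 isoform. Where flanking residues repeat (here the serines), the exact boundary of an insertion is inherently ambiguous; this greedy scan places it as early as possible.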

FIG. 12 shows an experiment comparing cardiomyocyte differentiation by the method according to the present invention with differentiation by a small molecule method. Applicants measured the percentage of cells expressing TNNT2 by fixing cells, antibody staining, and quantifying by flow cytometry 10 days after the start of dox induction. TF refers to adding dox and overexpressing the transcription factor EOMES for 2 days. SM refers to an optimized version of a published small molecule differentiation method (Lian et al., Directed cardiomyocyte differentiation from human pluripotent stem cells by modulating Wnt/β-catenin signaling under fully defined conditions, Nature Protocols volume 8, pages 162-175 (2013) doi:10.1038/nprot.2012.150). Applicants determined that the method according to the present invention using the 263 TF conditions is comparable to the 263 SM method. Further studies also show differentiation of human pluripotent stem cells (hPSCs) to cardiomyocytes using small molecules (see, e.g., Karakikes, et al., Small molecule-mediated directed differentiation of human embryonic stem cells toward ventricular cardiomyocytes, Stem Cells Transl Med. (2014); Sharma, et al., Derivation of highly purified cardiomyocytes from human induced pluripotent stem cells using small molecule-modulated differentiation and subsequent glucose starvation, J Vis Exp. (2015); and Burridge, et al., Chemically Defined Culture and Cardiomyocyte Differentiation of Human Pluripotent Stem Cells. Curr Protoc Hum Genet. (2015)).

Example 12—A Multiplexed Transcription Factor Screening Platform for Directed Differentiation

Directed differentiation of human pluripotent stem cells into diverse cell types has the potential to realize a broad array of cellular replacement therapies and provides a tractable model that can be perturbed, genetically or chemically, to assess effects in a cell type-specific context (Cohen and Melton, 2011; Colman and Dreesen, 2009; Keller, 2005; Kiskinis and Eggan, 2010; Robinton and Daley, 2012). However, it remains challenging or impossible to generate many cell types (Cohen and Melton, 2011; Colman and Dreesen, 2009; Keller, 2005; Kiskinis and Eggan, 2010; Robinton and Daley, 2012). The best differentiation methods are often labor-intensive and can require months to produce even heterogenous or immature cell populations. Many of these methods rely on exogenous growth factors or small molecules, which are often dosage-sensitive and difficult to identify in a scalable manner. Alternatively, overexpression of transcription factors (TFs) has been shown to rapidly and efficiently generate many different cell types, including neurons and skeletal muscle cells (Furuyama et al., 2019; Pang et al., 2011; Song et al., 2012; Sugimura et al., 2017; Takahashi and Yamanaka, 2006; Weintraub et al., 1989; Zhang et al., 2013). As TFs use endogenous regulatory pathways to drive differentiation, mimicking natural development, this approach to engineering cell fate may produce higher fidelity models while illuminating aspects of development. However, the process of discovering TFs for directed differentiation relies on time-intensive and low-throughput arrayed screens. Arrayed screens, in which each perturbation must be performed and tested individually, are challenging to carry out at large scale, typically limited to 5-25 TFs (Furuyama et al., 2019; Pang et al., 2011; Song et al., 2012; Sugimura et al., 2017; Takahashi and Yamanaka, 2006; Weintraub et al., 1989; Zhang et al., 2013). 
By contrast, pooled screening approaches, which make use of barcodes to enable multiple perturbations to be tested in parallel, are more scalable, both in terms of time and cost.

To unlock the potential of this promising approach, Applicants sought to develop a multiplexed TF screening platform to identify TFs that can drive specific cell fates in a high-throughput manner. Applicants considered two requirements for pooled screening to identify TFs that drive differentiation. First, each perturbation must be introduced into cells at a single copy while still driving TF expression sufficient to induce cellular programming. Second, target cell types must be enriched from a diverse cell population, and the TF perturbations that produce them must be identifiable.

Applicants first compared different TF overexpression methods and found that ORF overexpression most effectively differentiated human embryonic stem cells (hESCs) into neurons. To establish a generalizable platform for systematic identification of TFs for cellular programming, Applicants created a barcoded human TF library, which Applicants named Multiplexed Overexpression of Regulatory Factors (MORF). The MORF library consists of all known TFs from the human genome, with 3,548 isoforms covering 1,836 genes. Applicants used this library to assay 90 TF isoforms for differentiation of hESCs into neural progenitors (NPs). Applicants chose NPs as the target cell type because induced NPs (iNPs) offer a tractable model for studying complex disorders of the central nervous system (CNS), but current methods for producing iNPs, namely embryoid body formation (Schafer et al., 2019; Zhang et al., 2001) or dual SMAD inhibition (Chambers et al., 2009; Shi et al., 2012a), are low-throughput or produce variable differentiation results depending on the cell line (Hu et al., 2010), respectively. Applicants selected for TFs that drive iNP differentiation using various methods to enrich for target cell types based on marker gene combinations. The pooled screens identified four TFs (RFX4, NFIB, PAX6, and ASCL1), each of which produced multipotent iNPs that could spontaneously differentiate into CNS cell types. Addition of dual SMAD inhibitors to RFX4-overexpressing cells produced homogenous iNPs that preferentially differentiated into GABAergic neurons. RFX4-iNPs can be used to model neurodevelopmental disorders. Using iNPs as a demonstration, Applicants show that pooled TF screening is a scalable and generalizable approach for systematically identifying TFs that drive differentiation of desired cell types.

Example 13—TF ORF Overexpression Effectively Drives Differentiation

Recently, the microbial CRISPR-Cas9 system has been adapted for large-scale gene activation screening, which provides a rapid and efficient method for elucidating complex biology at the genome scale (Gilbert et al., 2014; Konermann et al., 2015). Applicants therefore first sought to leverage the ease and scalability of CRISPR activation (CRISPRa) to screen 1,965 annotated TF genes (Zhang et al., 2012) for their ability to drive differentiation of HUES66 hESCs toward NP cell fates. However, the initial screen did not lead to significant differentiation (data not shown), in contrast to previous observations in mouse embryonic stem cells (Liu et al., 2018).

Although CRISPRa has been used in a range of biological contexts (Gilbert et al., 2014; Joung et al., 2017a; Konermann et al., 2015), the particular regulatory environment of hESCs may be uniquely buffered against TF overexpression. Therefore, Applicants next compared the ability of CRISPRa and ORF-based methods to overexpress NEUROD1 or NEUROG2, two TFs that have been previously shown to induce neuronal differentiation (Zhang et al., 2013), at single copy in HUES66 hESCs (FIG. 35A). In order to pinpoint whether expression level or endogenous UTRs were responsible for limiting TF expression, Applicants included ORFs with endogenous UTRs in the comparison with CRISPRa. For both NEUROD1 and NEUROG2, Applicants found that expression of the TF ORF effectively induced neuronal differentiation (FIGS. 35B-F). Surprisingly, Applicants found that overexpression of the TFs using the ORF with endogenous UTRs did not efficiently differentiate hESCs into neurons, despite robust transcriptional upregulation. As Applicants had observed for the large-scale screen, CRISPRa upregulation of NEUROD1 and NEUROG2 did not effectively induce differentiation. These results suggest that there may be endogenous post-transcriptional regulatory mechanisms in hESCs that buffer against TF protein expression (FIGS. 35B-F). Applicants therefore proceeded with TF ORF overexpression for screening.

Example 14—A Barcoded Human TF Library for Directed Differentiation

To enable high-throughput, systematic identification of TFs for directed differentiation of any desired cell type, Applicants created a barcoded human TF library, MORF (FIG. 28 and Table 3). The library consists of 1,836 genes, including histone modifiers, and covers 3,548 isoforms that overlap between the RefSeq and GENCODE annotations. Applicants also included two control vectors in the library. All vectors in the library contain unique barcodes that facilitate pooled screening. MORF is provided in an arrayed format that can be readily subpooled for targeted TF screens, followed by characterization of individual candidate TFs. MORF enables a generalizable approach for TF screening that will expand the ability to generate desired cell types.
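Because every vector carries a unique barcode, a pooled readout reduces to mapping sequenced barcodes back to library members. A minimal sketch of that bookkeeping follows; the 10-nt barcodes and member names are hypothetical (the MORF barcodes are not reproduced here), and the minimum Hamming distance of 3, which lets a single sequencing error be corrected unambiguously, is an illustrative assumption.

```python
from itertools import combinations
from typing import Dict, Optional

# Hypothetical barcodes for illustration; not the actual MORF barcodes.
LIBRARY: Dict[str, str] = {
    "TF_A": "ACGTACGTAC",
    "TF_B": "TGCATGCATG",
    "CONTROL": "GGCCTTAAGC",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def validate_library(library: Dict[str, str], min_distance: int = 3) -> None:
    """Reject duplicate barcodes and pairs closer than min_distance."""
    barcodes = list(library.values())
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("duplicate barcode in library")
    for (n1, b1), (n2, b2) in combinations(library.items(), 2):
        if hamming(b1, b2) < min_distance:
            raise ValueError(f"{n1} and {n2} barcodes are too similar")


def decode(read: str, library: Dict[str, str], max_mismatches: int = 1) -> Optional[str]:
    """Map a read (trimmed to barcode length) to a unique library member, else None."""
    hits = [name for name, bc in library.items() if hamming(read, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


validate_library(LIBRARY)
print(decode("ACGTACGTAA", LIBRARY))  # one mismatch from TF_A
```

Reads that match no member, or more than one, are discarded rather than guessed, which keeps barcode counts conservative.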

Example 15—Development of a Pooled TF ORF Screening Platform for iNP Differentiation

As a demonstration, Applicants performed a targeted TF screen for differentiation of hESCs into iNPs. To select a subset of TFs for the screen, Applicants examined eight RNA-sequencing (RNA-seq) datasets (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016) and identified 70 TFs that were found to be specifically expressed in NPs in at least two datasets. For each TF, Applicants included isoforms that comprised >25% of the expressed transcript in NPs, resulting in a total of 90 TF isoforms (see Methods; Table 1). Applicants pooled the barcoded TFs and packaged them into a lentiviral library for delivery in hESCs (FIG. 29A). Applicants differentiated the cells for 7 days before selecting TFs that drive iNP differentiation (FIG. 36A). To determine the ideal strategy for selecting TFs, Applicants explored three different methods that can simultaneously assay different numbers of marker genes: reporter cell line (1 gene), flow-FISH (2-10 genes), and single-cell RNA-sequencing (scRNA-seq; 10-2,000 genes; FIG. 29A).
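The two selection rules above (NP-specific in at least two datasets, then isoforms making up more than 25% of that TF's expression in NPs) can be sketched as follows. This is an illustration under stated assumptions, not the code referenced in Methods: the inputs are made up, expression is assumed to be an isoform-level estimate such as TPM, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def select_tfs(np_specific: Dict[str, Iterable[str]], min_datasets: int = 2) -> Set[str]:
    """Keep TFs called NP-specific in at least `min_datasets` datasets."""
    counts = Counter(tf for tfs in np_specific.values() for tf in set(tfs))
    return {tf for tf, n in counts.items() if n >= min_datasets}


def select_isoforms(isoform_expr: Dict[str, Dict[str, float]], tfs: Set[str],
                    min_fraction: float = 0.25) -> List[Tuple[str, str]]:
    """Keep isoforms making up more than `min_fraction` of a TF's NP expression."""
    chosen = []
    for tf in sorted(tfs):
        isoforms = isoform_expr.get(tf, {})
        total = sum(isoforms.values())
        if total <= 0:
            continue  # no expression estimate available for this TF
        for isoform, value in sorted(isoforms.items()):
            if value / total > min_fraction:
                chosen.append((tf, isoform))
    return chosen


# Made-up inputs for illustration only (not the eight published datasets).
datasets = {
    "ds1": ["PAX6", "RFX4", "FOS"],
    "ds2": ["PAX6", "NFIB"],
    "ds3": ["RFX4", "HES1"],
}
expression = {  # hypothetical isoform-level NP expression, e.g., TPM
    "PAX6": {"PAX6-a": 60.0, "PAX6-b": 30.0, "PAX6-c": 10.0},
    "RFX4": {"RFX4-a": 80.0, "RFX4-b": 20.0},
}
tfs = select_tfs(datasets)
print(sorted(tfs))  # ['PAX6', 'RFX4']
print(select_isoforms(expression, tfs))
# [('PAX6', 'PAX6-a'), ('PAX6', 'PAX6-b'), ('RFX4', 'RFX4-a')]
```

Because a TF can pass with more than one isoform, the isoform count exceeds the TF count, which is consistent with 70 TFs yielding 90 isoforms.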

For the reporter cell line method, Applicants generated clonal reporter cell lines with EGFP inserted downstream of an endogenous NP marker gene, either SLC1A3 or VIM, which were selected based on convergence across published RNA-seq datasets and high expression levels (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016). Applicants transduced the SLC1A3 or VIM reporter cell line with the pooled TF library, differentiated the cells for 7 days, and sorted for high and low EGFP-expressing cells (FIGS. 29A and 36B). Deep sequencing of the TF barcodes in each population identified candidate TFs that were enriched in the high EGFP-expressing cell population, indicating upregulation of SLC1A3 or VIM (FIGS. 29B and 36C; Table 1).
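Enrichment of a TF barcode in the EGFP-high versus EGFP-low population is commonly scored as a log2 ratio of library-size-normalized counts. The sketch below makes that concrete under assumptions not stated in the source: a pseudocount of 1 (which avoids division by zero for barcodes missing from one bin), simple total-count normalization, and hypothetical barcode counts.

```python
import math
from typing import Dict


def _normalize(counts: Dict[str, int], pseudocount: float) -> Dict[str, float]:
    """Convert raw barcode counts to frequencies after adding a pseudocount."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {k: (v + pseudocount) / total for k, v in counts.items()}


def log2_enrichment(high: Dict[str, int], low: Dict[str, int],
                    pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(frequency in EGFP-high / frequency in EGFP-low) for each barcode."""
    keys = sorted(set(high) | set(low))
    hi = _normalize({k: high.get(k, 0) for k in keys}, pseudocount)
    lo = _normalize({k: low.get(k, 0) for k in keys}, pseudocount)
    return {k: math.log2(hi[k] / lo[k]) for k in keys}


# Toy barcode counts, not screen data.
high_bin = {"TF_A": 300, "TF_B": 100, "CONTROL": 100}
low_bin = {"TF_A": 100, "TF_B": 100, "CONTROL": 300}
scores = log2_enrichment(high_bin, low_bin)
for tf, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tf}\t{s:+.2f}")
```

Candidate TFs would be those with the highest positive scores; in practice replicate bins and a significance test are used alongside the raw ratio.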

For the flow-FISH method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and labeled either 2 or 10 NP marker gene transcripts using pooled FISH probes (FIG. 29A). By pooling the FISH probes, Applicants could sort for cells expressing high or low levels of 2-10 marker genes at the same time (FIGS. 36D and 36E). Similar to the reporter cell line method, Applicants deep sequenced the TF barcodes and identified candidate TFs that were enriched in cells expressing higher levels of marker genes (FIGS. 29C and 36F; Table 1). Applicants found that for some TFs, such as EOMES and RFX4, the choice of TF isoform can produce very different differentiation results (FIGS. 29C and 36F). Both the flow-FISH and reporter cell line methods to assay SLC1A3 and VIM expression produced comparable TF enrichment profiles (FIG. 36G).
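One way to quantify whether two selection methods "produced comparable TF enrichment profiles" is a rank correlation of per-TF scores. The sketch below computes Spearman correlation as the Pearson correlation of average ranks. The choice of Spearman is an assumption, as is the toy data. It raises ZeroDivisionError if every value in one profile is tied.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 enrichment scores for the same five TFs from two methods.
reporter = [1.6, 0.2, -0.4, 2.1, -1.0]
flow_fish = [1.2, 0.5, -0.2, 1.9, -0.8]
print(round(spearman(reporter, flow_fish), 3))
```

A rank-based measure is less sensitive than Pearson correlation to the different dynamic ranges of EGFP and FISH signals.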

For the scRNA-seq method, Applicants transduced hESCs with the pooled TF library, differentiated the cells for 7 days, and performed scRNA-seq to profile 53,560 single cells (FIG. 29A). In the barcoded TF ORF vector design, the TF barcode is expressed in the TF mRNA, which is captured by scRNA-seq and can be mapped to cell barcodes (FIG. 28). After assigning TFs to cells, Applicants found that the number of cells overexpressing each TF was highly skewed, with the top 10% of TFs having 92 times more cells than the bottom 10% of TFs, potentially due to TF-dependent effects on cell death and proliferation (FIG. 36H). Cluster analysis of the scRNA-seq results suggested that overexpression of several TFs, for instance ASCL1 and EOMES, generated distinct transcriptome signatures that clustered together and were more distant from those of other TFs, while overexpression of most TFs did not produce distinct transcriptome signatures (FIGS. 29D, 36I, and 36J). By comparing the TF transcriptome signatures with those published for radial glia (Nowakowski et al., 2017; Pollen et al., 2015; Quadrato et al., 2017), which represent NPs in the developing cortex, Applicants identified candidate TFs with the highest transcriptome signature correlation (FIG. 29E and Table 1). Applicants also compared TF transcriptome signatures to other cell types from the mouse organogenesis cell atlas (Cao et al., 2019) to nominate TFs for additional cell types, such as FOXN4 for early mesenchyme or SOX9 for Schwann cell precursors (FIG. 36K).
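Assigning TFs to cells and measuring the skew in cells per TF can be sketched as below. The thresholds (at least 2 UMIs, with the top TF barcode accounting for at least 80% of TF-barcode UMIs in the cell) and the skew definition (mean cells per TF in the top 10% of TFs divided by that in the bottom 10%, among TFs with at least one assigned cell) are assumptions for illustration; the source does not specify them.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def assign_tfs(records: Iterable[Tuple[str, str, int]], min_umis: int = 2,
               min_fraction: float = 0.8) -> Dict[str, str]:
    """Map cell barcode -> TF when one TF barcode clearly dominates that cell.

    `records` holds (cell_barcode, tf_barcode, umi_count) tuples.
    """
    per_cell: Dict[str, Counter] = defaultdict(Counter)
    for cell, tf, umis in records:
        per_cell[cell][tf] += umis
    assignments = {}
    for cell, counts in per_cell.items():
        tf, top = counts.most_common(1)[0]
        # min_umis is checked first, so a zero-UMI cell never divides by zero.
        if top >= min_umis and top / sum(counts.values()) >= min_fraction:
            assignments[cell] = tf
    return assignments


def decile_skew(assignments: Dict[str, str]) -> float:
    """Mean cells per TF in the top 10% of TFs divided by that in the bottom 10%."""
    cells_per_tf = sorted(Counter(assignments.values()).values(), reverse=True)
    if not cells_per_tf:
        raise ValueError("no assigned cells")
    k = max(1, len(cells_per_tf) // 10)
    return (sum(cells_per_tf[:k]) / k) / (sum(cells_per_tf[-k:]) / k)


# Toy records: c3 has too few UMIs and c4 is ambiguous, so neither is assigned.
records = [("c1", "TF_A", 5), ("c1", "TF_B", 1), ("c2", "TF_A", 3),
           ("c3", "TF_B", 1), ("c4", "TF_B", 4), ("c4", "TF_A", 4)]
print(assign_tfs(records))  # {'c1': 'TF_A', 'c2': 'TF_A'}
```

Discarding ambiguous cells reduces the number of usable cells but avoids attributing a transcriptome to the wrong TF, which matters when cell counts per TF are as uneven as reported here.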

To verify the results from the pooled screen, Applicants performed an arrayed screen on the same 90 TF isoforms, packaging each TF individually into lentivirus for delivery into hESCs (FIGS. 37A-C). The arrayed and pooled screens nominated overlapping sets of candidate TFs for iNP differentiation (FIG. 29F and Table 1), some of which (NFIB (Steele-Perkins et al., 2005), OTX1 (Frantz et al., 1994), PAX6 (Englund et al., 2005; Gotz et al., 1998), EOMES (Bulfone et al., 1999; Englund et al., 2005), and ASCL1 (Casarosa et al., 1999)) are known to be involved in neural development, further supporting the screening results. Of the pooled screening methods, flow-FISH identified the highest number (6 out of 8) of candidate TFs that overlapped with other screens (FIG. 29F). Compared to reporter cell lines, flow-FISH is more versatile, because marker gene combinations can be easily exchanged or combined without generating another clonal reporter cell line. Flow-FISH is also more accessible than scRNA-seq and can measure a greater dynamic range of transcript expression. Together, these results suggest that flow-FISH may be an ideal screening method for other cell types.

Example 16—Validation of Candidate TFs for iNP Differentiation

For downstream analysis, Applicants chose to focus on the eight candidate TFs from the flow-FISH screen as well as two additional candidates that were enriched in the other screens and previously suggested to mediate iNP differentiation, ASCL1 (Casarosa et al., 1999) and PAX6 (Zhang et al., 2010) (FIG. 29F). Applicants individually overexpressed the top isoform of each TF in hESCs and verified TF expression (FIG. 37D). Immunostaining the iNPs for NP markers showed that, compared to hESCs, all iNPs expressed higher levels of VIM, a marker used to select target cells in the pooled screen, and exhibited diverse morphologies (FIGS. 30 and 37E). Five candidate TFs (OTX1, EOMES, RFX4, PAX6, and ASCL1) produced iNPs that appeared morphologically distinct from hESCs overexpressing the GFP control, two candidate TFs (HES1 and LHX2) produced iNPs with morphologies similar to those of hESCs, and three candidate TFs (NFIC, FOS, and NFIB) produced iNPs with morphologies intermediate between the two groups. Applicants then compared bulk RNA-seq signatures of iNPs to different cell types in the human fetal cortex and in brain organoids (Nowakowski et al., 2017; Pollen et al., 2015; Quadrato et al., 2017). Applicants found that transcriptome signatures of iNPs derived using RFX4, ASCL1, and PAX6 were the most similar to NPs, whereas those produced by EOMES and FOS were the most different (FIGS. 30 and 37E; Table 7). Thus, Applicants have validated the pooled screening approach by confirming that overexpression of all candidate TFs upregulated marker genes that are used to enrich for NPs.

Example 17—Functional Evaluation of iNP Multipotency Using Spontaneous Differentiation

Next, Applicants evaluated the multipotency of iNPs produced by each candidate TF by spontaneously differentiating the iNPs. Applicants transiently overexpressed candidate TFs for 1 week to produce iNPs and then removed growth factors from the media to allow the iNPs to spontaneously differentiate for 8 weeks (FIG. 31A). Like NPs, iNPs should spontaneously differentiate into cell types in the CNS such as neurons and astrocytes. Out of the ten candidate TFs, four (RFX4, NFIB, PAX6, and ASCL1) produced iNPs that spontaneously differentiated into neurons, astrocytes, and, more rarely, oligodendrocyte precursor cells (FIGS. 31B and 38A). Spontaneous differentiation of iNPs generated by these four TFs followed the natural developmental progression of neurogenesis starting at week 1 and proceeding to gliogenesis at week 4 (FIGS. 31B and 38A). RFX4-iNPs patterned into neural rosettes prior to neurogenesis (FIGS. 31B and 38A).

Applicants validated these four TFs in two additional pluripotent stem cell lines, iPSC11a and H1. For both cell lines, overexpression of the four TFs produced iNPs that expressed higher levels of NP marker genes relative to GFP control (FIGS. 38B and 38C). Following spontaneous differentiation, Applicants found that RFX4 and NFIB consistently produced functional iNPs in iPSC11a (FIG. 38D), and RFX4 produced functional iNPs in H1 (FIG. 38E). These results indicate that the effects of some TFs are cell line-dependent, while others, like RFX4, are cell line-independent, which may point to a more critical role for cell line-independent TFs in NP specification during development.

Applicants further characterized the cells spontaneously differentiated from iNPs produced by these four TFs using scRNA-seq. Cluster analysis of 53,113 cells revealed that the iNPs generated a broad range of cell types, such as cell types from the retina, CNS, epithelium, and neural crest (FIGS. 32A-C and Table 6). For the CNS, iNPs spontaneously produced different regionally-restricted progenitors, such as radial glia and dorsal neural progenitors, as well as neurons, astrocytes, and ependyma (FIGS. 32B and 32C). Applicants found that the spontaneously differentiated cell types were generally consistent between biological replicates of the same TF, except for those from RFX4-iNPs, and distinct between TFs (FIGS. 32D-F). RFX4-iNPs produced more CNS cell types; NFIB-iNPs produced more epithelium and neural crest cell types; PAX6-iNPs generated diverse cell types; and ASCL1-iNPs produced more retina cell types (FIGS. 32D-F). Further analysis of CNS neurons spontaneously differentiated from iNPs showed that the neurons expressed marker genes representative of diverse brain regions as well as neurotransmitters and included newborn cortical excitatory neurons and cortical projection neurons (FIGS. 39A-D). RFX4-iNPs generated diverse neurons, NFIB-iNPs produced more cortical projection and excitatory neurons, PAX6-iNPs produced more forebrain neurons, and ASCL1-iNPs generated more forebrain GABAergic neurons (FIG. 39E). Together, the spontaneous differentiation results show that four of the candidate TFs produce functional iNPs.

To better understand the transcriptional networks that lead to iNP production, Applicants profiled the four TFs using chromatin immunoprecipitation with sequencing (ChIP-seq). Motif analysis generated distinct motifs for each TF and suggested potential transcriptional coregulators, some of which have been found in previous studies (FIG. 39F) (Morotomi-Yano et al., 2002; Murre et al., 1989). Applicants identified candidate genes that could contribute to the potential mechanisms behind directed iNP differentiation by examining NP marker genes with TF ChIP-seq peaks that were also differentially expressed upon TF overexpression (FIGS. 39G-I and Table 8). In addition, Applicants found that each of the four TFs had ChIP-seq peaks proximal to its own promoter, suggesting a positive feedback mechanism that may contribute to the high expression levels required for driving differentiation (FIGS. 39H and 39I).

Example 18—Combining RFX4 with Dual SMAD Inhibition Produces Homogenous iNPs

Next, Applicants sought to improve the consistency of RFX4-iNPs. Although RFX4-iNPs produced the highest proportion of CNS cell types, the iNPs were less consistent between biological replicates (FIGS. 32D-F). Applicants overexpressed RFX4 in H1 hESCs and tested transition from stem cell media to two alternative NP media used in the embryoid body (EB) (Schafer et al., 2019) and dual SMAD inhibition (DS) (Shi et al., 2012a) NP differentiation methods (FIG. 40A). Applicants also tested addition of dual SMAD inhibitors and two different NP induction times, 5 and 7 days (FIG. 40A). By spontaneously differentiating the iNPs and measuring expression of the neuronal marker genes TUBB3 and MAP2 as a heuristic for the proportion of iNPs that underwent neurogenesis, Applicants could identify conditions that promoted differentiation of CNS iNPs and increased homogeneity of the iNP population. Applicants found that combining RFX4 overexpression with dual SMAD inhibitors in the initial NP media for 7 days produced the most homogenous iNPs (FIGS. 40A-D).

Applicants then compared iNPs generated by the optimized protocol, RFX4-DS, to those from two alternative NP differentiation methods that rely on EB (Schafer et al., 2019) and DS (Shi et al., 2012a). Applicants derived iNPs using the three differentiation methods in two batch replicates and performed scRNA-seq on 42,780 iNPs (15,211 RFX4-DS-iNPs, 11,148 EB-iNPs, and 16,421 DS-iNPs). Cluster analysis showed that, as expected, the majority of the cells were NPs (FIGS. 33A and 33B; Table 6). Applicants also observed immature neurons that have spontaneously differentiated from iNPs and cranial neural crest cells that were off-target products of NP differentiation (FIGS. 33A and 33B). Using distances between cells from the same batch replicate and cells from different batch replicates as metrics for intra- and inter-batch variability respectively, Applicants found that RFX4-DS-iNPs had lower intra- and inter-batch distances compared to EB- and DS-iNPs (FIGS. 33C and 33D). In addition, batch replicates of RFX4-DS-iNPs had more consistent percentages of cells that were grouped into each cluster than those of EB- and DS-iNPs, suggesting that the RFX4-DS protocol produces more consistent iNPs than alternative protocols (FIGS. 33E and 33F). All three protocols generated iNPs that expressed telencephalon markers such as SIX3 and LHX2, although RFX4-DS-iNPs did not express FOXG1, suggesting that there may be potential differences between RFX4-DS-iNPs and iNPs generated by existing methods that could contribute to differences in downstream cell types derived from iNPs (FIG. 40E). Applicants confirmed this observation by immunostaining iNPs for FOXG1 (FIG. 40F). Further analysis of genes that were differentially expressed between iNP differentiation methods showed that RFX4-DS-iNPs expressed higher levels of CRABP1, NR2F2, and CDH6, whereas EB- and DS-iNPs expressed EMX2, PAX6, and CNTNAP2 (FIG. 33G). 
These results suggest that RFX4-DS-iNPs resemble NPs of the deep-layer neocortex rather than those of the ventricular zone (Cadwell et al., 2019; Matsunaga et al., 2015).
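The intra- and inter-batch variability metrics described above can be sketched as mean pairwise distances between cells. The source does not specify the metric or space, so this sketch assumes Euclidean distance in a low-dimensional embedding (for example, principal components); the function name and toy coordinates are hypothetical. The all-pairs loop grows quadratically with cell number, so for tens of thousands of cells one would subsample.

```python
import math
from itertools import combinations
from statistics import mean
from typing import List, Sequence, Tuple


def batch_distances(cells: List[Tuple[str, Sequence[float]]]) -> Tuple[float, float]:
    """Return (mean intra-batch distance, mean inter-batch distance).

    `cells` holds (batch_label, embedding_coordinates) pairs.
    """
    intra, inter = [], []
    for (batch1, x1), (batch2, x2) in combinations(cells, 2):
        d = math.dist(x1, x2)  # Euclidean distance in the embedding
        (intra if batch1 == batch2 else inter).append(d)
    if not intra or not inter:
        raise ValueError("need >= 2 cells in some batch and >= 2 batches")
    return mean(intra), mean(inter)


# Toy 2-D embedding for two batch replicates of one protocol.
cells = [("rep1", (0.0, 0.0)), ("rep1", (0.0, 1.0)),
         ("rep2", (0.0, 0.0)), ("rep2", (0.0, 1.0))]
print(batch_distances(cells))  # (1.0, 0.5)
```

Running the same function separately on RFX4-DS-, EB-, and DS-iNPs would give one (intra, inter) pair per protocol, with lower values indicating a more homogeneous and reproducible population.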

To characterize the cells spontaneously differentiated from RFX4-DS-iNPs, Applicants performed scRNA-seq on 26,111 cells at 4 and 8 weeks of spontaneous differentiation. Cluster analysis showed that RFX4-DS-iNPs differentiated predominantly into CNS cell types, namely radial glia and neurons, with a small subset differentiating into meningeal cells (FIGS. 33H-J and Table 6). At each differentiation time point, the spontaneously differentiated cell types were remarkably consistent between biological replicates (FIGS. 33K and 33L). RFX4-DS-iNPs produced 98% CNS cell types at 4 weeks and 94% at 8 weeks (FIG. 33M). Because differentiated neurons do not divide whereas meningeal cells do, the CNS fraction measured at later time points likely underestimates the initial fraction, suggesting that initially >98% of iNPs were capable of spontaneously differentiating into CNS cell types. Similar to RFX4-DS-iNPs, most of the radial glia differentiated from RFX4-DS-iNPs expressed the telencephalon marker genes SIX3 and LHX2, but not FOXG1 (FIG. 40G). By contrast, differentiated neurons expressed all three marker genes (FIG. 40G). The radial glia were diverse, with some expressing markers indicative of more restricted precursors for astrocytes (CD44) and ependymal cells (FOXJ1; FIG. 40H). RFX4-DS-iNPs produced predominantly GABAergic neurons (GAD2 and SLC32A1) that expressed markers indicative of different GABAergic interneuron subtypes, such as SST, CALB1, CALB2, and PVALB (FIGS. 40I and 40J). The propensity for RFX4-DS-iNPs to spontaneously differentiate into GABAergic neurons, rather than into glutamatergic neurons as previously shown for iNPs produced by alternative methods (Schafer et al., 2019; Shi et al., 2012b), may stem from initial differences observed between the iNPs (FIGS. 33G, 40E, and 40F). Specifically, RFX4-DS-iNPs expressed higher levels of NR2F2, a marker gene for cortical GABAergic interneurons originating from the ganglionic eminence and neocortex in the human fetal forebrain (Reinchisi et al., 2012).
RFX4 ChIP-seq and bulk RNA-seq data further suggest that RFX4 directly regulates NR2F2, as RFX4 had a ChIP-seq peak within 5 kb of all four annotated transcriptional start sites of NR2F2 isoforms and RFX4 overexpression robustly upregulated expression of NR2F2 (Tables 7 and 8). Overall, the results suggest that RFX4 overexpression can be combined with dual SMAD inhibition to produce homogenous iNPs that spontaneously differentiate into GABAergic neurons.
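The "peak within 5 kb of every annotated TSS" criterion can be checked with a simple interval-distance test. This sketch assumes BED-style 0-based, half-open peak coordinates and uses hypothetical coordinates and function names; it is not the Applicants' pipeline, and genome-scale use would call for an interval index rather than a linear scan over peaks.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def distance_to_peak(pos: int, peak: Peak) -> int:
    """Base-pair distance from a position to the nearest base of a peak (0 if inside)."""
    if peak.start <= pos < peak.end:
        return 0
    return peak.start - pos if pos < peak.start else pos - (peak.end - 1)


def tss_near_peaks(tss: List[Tuple[str, int]], peaks: List[Peak],
                   window: int = 5_000) -> List[bool]:
    """For each (chrom, position) TSS, report whether any peak lies within `window` bp."""
    return [
        any(p.chrom == chrom and distance_to_peak(pos, p) <= window for p in peaks)
        for chrom, pos in tss
    ]


# Hypothetical coordinates, not the real NR2F2 locus.
peaks = [Peak("chr1", 1_000_000, 1_000_400)]
tss = [("chr1", 1_004_000), ("chr1", 996_000), ("chr1", 1_020_000), ("chr2", 1_000_100)]
flags = tss_near_peaks(tss, peaks)
print(flags, all(flags[:2]))  # [True, True, False, False] True
```

Applying `all(...)` to the flags for a gene's isoform TSSs expresses the "all annotated transcriptional start sites" condition described above.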

Example 19—RFX4-iNPs Accurately Model Effects of DYRK1A Perturbations on Neural Development

To explore the utility of the differentiation protocol Applicants developed, Applicants transiently overexpressed RFX4 to differentiate iPSC11a into iNPs to study the effects of DYRK1A perturbation on NPs during neural development (FIGS. 34A and 41A-D). DYRK1A knockout has been implicated in autism spectrum disorder (De Rubeis et al., 2014; Iossifov et al., 2014), whereas overexpression of DYRK1A has been linked to Down syndrome (Smith et al., 1997). Applicants characterized iNPs using bulk RNA-seq and identified 42 genes that were significantly differentially expressed in a DYRK1A dosage-dependent manner, some of which are known to be involved in cellular proliferation, neuronal migration, and synapse formation (FIGS. 34B-F; Table 7). Applicants spontaneously differentiated the RFX4-derived iNPs to profile the effects of DYRK1A perturbation on neurogenesis and neural development. DYRK1A knockout iNPs initially showed reduced proliferation, potentially due to toxicity of DNA double-strand breaks introduced by Cas9, but at weeks 2 and 4 of spontaneous differentiation, DYRK1A knockout iNPs showed significantly increased proportions of proliferating cells, indicating that more iNPs were actively dividing instead of undergoing neurogenesis (FIG. 34G). By contrast, DYRK1A overexpressing iNPs showed lower proportions of proliferating cells at weeks 0 and 2 (FIG. 34H). Because increased iNP proliferation deters neurogenesis, Applicants immunostained spontaneously differentiating iNPs for expression of the neuronal marker MAP2. For the DYRK1A knockout iNPs, Applicants observed a significant reduction in neuronal MAP2 staining at weeks 2 and 4 (FIGS. 34I and 41E). For the DYRK1A overexpression iNPs, as there were fewer iNPs due to lower initial proliferation, Applicants observed significant reductions in neuronal MAP2 staining at weeks 0 and 1 (FIG. 34J).

Applicants further characterized neurons spontaneously differentiated from DYRK1A-perturbed iNPs using electrophysiology. Whole-cell patch-clamp recording of neurons after 12-14 weeks of spontaneous differentiation confirmed that neurons derived from unperturbed iNPs were electrophysiologically functional (FIGS. 41F and 41G). Both DYRK1A knockout and overexpression iNPs exhibited reduced proportions of neurons with properties indicative of maturation, such as presence of evoked action potentials and spontaneous excitatory postsynaptic activity (FIGS. 41F and 41G). In addition, neurons produced by DYRK1A knockout iNPs had higher resting membrane potential and membrane resistance (FIG. 41H). Applicants did not observe any significant differences in action potential properties (FIG. 41I). Together, these electrophysiology results suggest that neurons spontaneously differentiated from DYRK1A knockout and overexpression iNPs are less mature. The DYRK1A perturbation results are consistent with previous studies in other model systems (Fotaki et al., 2002; Hammerle et al., 2011; Park et al., 2010; Soppa et al., 2014; Yabut et al., 2010) and provide additional insight for how different DYRK1A expression levels can affect neural development. Thus, RFX4-iNPs can be used to model effects of perturbations on neural development and neurogenesis and may serve as a tractable system for studying complex neurological disorders.

Example 20—Discussion

By screening TF ORFs, Applicants were able to identify four TFs that could individually differentiate hESCs and induced pluripotent stem cells into iNPs that resemble the morphology, transcriptome signature, and multipotency of NPs. Of the four candidate TFs, overexpression of RFX4, which has not been extensively studied in CNS development, resulted in the highest proportion of CNS cell types, highlighting the importance of performing large-scale, unbiased TF screens (Ashique et al., 2009; Blackshear et al., 2003). Combining RFX4 overexpression with dual SMAD inhibition produced homogeneous iNPs that spontaneously differentiated into predominantly GABAergic neurons. Notably, the differentiation method produced iNPs within 7 days, compared to 11-16 days for existing differentiation methods, and is more scalable than the embryoid body method (Chambers et al., 2009; Schafer et al., 2019; Shi et al., 2012a; Zhang et al., 2001). By perturbing DYRK1A in iNPs to model neurodevelopmental disorders, Applicants found that DYRK1A modulates iNP proliferation to disrupt neurogenesis, confirming results from previous studies in other model systems (Fotaki et al., 2002; Hammerle et al., 2011; Park et al., 2010; Soppa et al., 2014; Yabut et al., 2010) and suggesting candidate genes that mediate the effect of DYRK1A on neural development.

Although Applicants focused here on 90 TF isoforms highly expressed in the target cell type (~23% of TFs expressed in NPs and ~2.5% of all TF isoforms), the accessibility and low cost of the multiplexed screening approach lend themselves to scalable extensions of the technology to additional cell types of interest. For some of these cell types, Applicants have recommended lists of marker genes and TFs based on published RNA-seq datasets (Table 9). Applicants have also provided code for aggregating gene lists from different datasets and selecting marker genes and a subset of TFs from the TF library for targeted screening (see Methods). Moreover, the approach may be applied to identify combinations of TFs by screening at a higher MOI to increase the probability of introducing more than one TF in the same cell. Iterative TF screens may also expand the landscape of cell types it is possible to generate with this platform. For instance, performing TF screens in iNPs for differentiation into neurons or glia may facilitate generation of mature cell types, as iterative overexpression of TFs may mimic the natural developmental trajectory.

Beyond directed differentiation, TF screening enables identification of factors involved in cellular reprogramming (Takahashi and Yamanaka, 2006) and trans-differentiation (Pang et al., 2011; Song et al., 2012), as well as cancer progression (Darnell, 2002) and senescence (Campisi, 2001). The ORF barcoding approach allows for a variety of screening selection methods and could also be extended to pooled ORF screening of other protein families of interest. Future application of the multiplexed TF screening platform for cellular engineering has the potential to expand the number of available cellular models that will help elucidate complex regulatory mechanisms behind development and disease.

Example 21—Methods for Examples 1-21

Sequences and cloning. The plasmids lentiMPHv2 (Addgene 89308) and lentiSAMv2 (Addgene 75112) were used for CRISPR activation. LentiCRISPRv2 (Addgene 52961) was used for CRISPR-Cas9 mediated homology-directed repair (HDR). The Puromycin resistance gene in lentiCRISPRv2 was replaced with a Blasticidin resistance gene (Addgene 75112) for CRISPR-Cas9 knockout of DYRK1A. Single guide RNA (sgRNA) spacer sequences used in this study are listed in Table 10 and were cloned into the respective vectors as previously described (Joung et al., 2017b). For spontaneous differentiation using a dox-inducible gene expression system, the plasmid pUltra-puro-RTTA3 (Addgene 58750) was used for rtTA, and the EF1a promoter in pLX_TRC209 (Broad Genetic Perturbation Platform) was replaced with the pTight promoter (Addgene 31877). For DYRK1A overexpression, the codon-optimized DYRK1A sequence (NM_001396) was cloned into pLX_TRC209 (Broad Genetic Perturbation Platform) for expression under EF1a, and the Hygromycin resistance gene was replaced with a Blasticidin resistance gene (Addgene 75112).

Cell culture and differentiation. HEK293FT cells (Thermo Fisher Scientific R70007) were maintained in high-glucose DMEM with GlutaMax and pyruvate (Thermo Fisher Scientific 10569010) supplemented with 10% fetal bovine serum (VWR 97068-085) and 1% penicillin/streptomycin (Thermo Fisher Scientific 15140122). Cells were passaged every other day at a ratio of 1:4 or 1:5 using TrypLE Express (Thermo Fisher Scientific 12604021).

Unless otherwise specified, human embryonic stem cells (hESCs) used in these experiments were from the HUES66 cell line (Harvard Stem Cell Institute iPS Core Facility). Other stem cell lines used in this study include human induced pluripotent stem cell (iPSC) 11a (gift from the Arlotta laboratory, Harvard University) and hESC H1 (WiCell). hESCs and iPSCs were maintained in cell culture dishes coated with 1% Geltrex membrane matrix (Thermo Fisher Scientific A1413202) in mTeSR1 medium (STEMCELL Technologies 85850). For routine maintenance, stem cells were passaged 1:10-1:20 using ReLeSR (STEMCELL Technologies 05873) and seeded in mTeSR with 10 μM ROCK Inhibitor Y27632 (Enzo Life Sciences ALX-270-333-M025). For lentivirus transduction and differentiation, cells were dissociated using Accutase (STEMCELL Technologies 07920). All stem cells were maintained below passage 30 and confirmed to be karyotypically normal and negative for mycoplasma within 5 passages before differentiation.

During neuronal differentiation, stem cell media was incrementally shifted towards neuronal media, consisting of Neurobasal medium (Thermo Fisher Scientific 21103049) supplemented with B-27 (Thermo Fisher Scientific 17504044), GlutaMAX (Thermo Fisher Scientific 35050061), and Normocin (Invivogen ant-nr-1). 1 day after the start of differentiation (day 1), media was changed to stem cell media with the appropriate antibiotic. Antibiotic was included in the media for a total of 5 days of selection. On day 2, media was changed to 75% stem cell media and 25% neuronal media. On day 3, media was changed to 50% stem cell media and 50% neuronal media. On day 4, media was changed to 25% stem cell media and 75% neuronal media. On day 5, media was changed to neuronal media.
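The stepwise media shift can be tabulated day by day. The sketch below is illustrative only: the helper `media_schedule` and its parameters are hypothetical, and it assumes that antibiotic selection runs on days 1 through 5 and that the target-media fraction rises by 25 percentage points per day until it reaches 100% on day 5. Passing `"NP"` as the target label gives the analogous schedule for the NP media shift.

```python
from typing import Dict, List


def media_schedule(target_media: str, last_day: int = 7, selection_start: int = 1,
                   selection_days: int = 5) -> List[Dict[str, object]]:
    """Day-by-day media composition for the stepwise media shift (hypothetical helper).

    Day 1 is 100% stem cell media; the target media fraction then rises by
    25 percentage points per day and stays at 100% from day 5 onward.
    """
    schedule = []
    for day in range(1, last_day + 1):
        target_pct = min(100, 25 * (day - 1))
        schedule.append({
            "day": day,
            "stem_cell_pct": 100 - target_pct,
            f"{target_media}_pct": target_pct,
            # antibiotic is present on each selection day (days 1-5 by default)
            "antibiotic": selection_start <= day < selection_start + selection_days,
        })
    return schedule


for row in media_schedule("neuronal"):
    print(row)
```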

During TF-driven neural progenitor (NP) differentiation, stem cell media was gradually shifted towards NP media, consisting of DMEM/F-12 with HEPES (Thermo Fisher Scientific 11330057) supplemented with B-27 (Thermo Fisher Scientific 17504044), 20 ng/mL EGF (MilliporeSigma E9644), 20 ng/mL bFGF (STEMCELL Technologies 78003), 2 μg/mL heparin (STEMCELL Technologies 07980), and Normocin (Invivogen ant-nr-1). As in neuronal differentiation, the stem cell media was shifted by increasing the proportion of NP media in 25% increments from day 2 to day 5. When cells were selected with an antibiotic, they were passaged on day 4. For spontaneous differentiation, 2 μg/mL doxycycline (MilliporeSigma D9891) was added to the media starting from day 0 for 7 days. After 7 days, cells were maintained in NP media for 3 days before the media was changed to differentiation media, which had the same components as NP media but without EGF and bFGF. During spontaneous differentiation, 40-60% of the differentiation media was refreshed every other day.

For comparison to other NP differentiation methods, embryoid body (EB) (Schafer et al., 2019) and dual SMAD inhibition (DS) (Shi et al., 2012a) methods were used to differentiate hESCs into NPs as previously described. To provide the best comparison between the methods, the differentiation timelines for the three methods were aligned such that all three differentiations ended at approximately the same time, and the NPs produced by the three methods were dissociated for scRNA-seq at the same time. During the RFX4-iNP protocol optimization, base media from the DS and EB protocols were tested. DS media is a 1:1 mix of N-2- and B-27-containing media. N-2 medium consists of DMEM/F12 with HEPES (Thermo Fisher Scientific 11330057) supplemented with N-2 (Thermo Fisher Scientific 17502048), 5 μg/mL insulin (MilliporeSigma I9278), 100 μM nonessential amino acids (Thermo Fisher Scientific 11140050), 100 μM 2-mercaptoethanol (MilliporeSigma M6250), and Normocin (Invivogen ant-nr-1). B-27 medium is the same as the neuronal medium described above. EB media consists of DMEM/F12 with HEPES (Thermo Fisher Scientific 11330057) supplemented with N-2 (Thermo Fisher Scientific 17502048), B-27 minus vitamin A (Thermo Fisher Scientific 12587010), and Normocin (Invivogen ant-nr-1). SMAD inhibitors dorsomorphin (MilliporeSigma P5499) and SB-431542 (R&D Systems 1614) were added where indicated.

Lentivirus production. HEK293FT cells (Thermo Fisher Scientific R70007) were cultured as described above. 1 day prior to transfection, cells were seeded at ~40% confluency in T25, T75, or T225 flasks (Thermo Fisher Scientific 156367, 156499, or 159934). Cells were transfected the next day at ~90-99% confluency. For each T25 flask, 3.4 μg of plasmid containing the vector of interest, 2.6 μg of psPAX2 (Addgene 12260), and 1.7 μg of pMD2.G (Addgene 12259) were transfected using 17.5 μL of Lipofectamine 3000 (Thermo Fisher Scientific L3000150), 15 μL of P3000 Enhancer (Thermo Fisher Scientific L3000150), and 1.25 mL of Opti-MEM (Thermo Fisher Scientific 31985070). Transfection parameters were scaled up linearly with flask area for T75 and T225 flasks. Media was changed 5 h after transfection. Virus supernatant was harvested 48 h post-transfection, filtered with a 0.45 μm PVDF filter (MilliporeSigma SLHV013SL), aliquoted, and stored at −80° C.
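Because transfection parameters were scaled linearly with flask area, the T25 amounts can be converted for larger flasks. This minimal sketch assumes nominal growth areas of 25, 75, and 225 cm² for T25, T75, and T225 flasks; the dictionary keys and the function name are hypothetical.

```python
# Amounts for one T25 flask, taken from the protocol above.
T25_RECIPE = {
    "transfer_plasmid_ug": 3.4,
    "psPAX2_ug": 2.6,
    "pMD2G_ug": 1.7,
    "lipofectamine3000_uL": 17.5,
    "P3000_enhancer_uL": 15.0,
    "opti_mem_mL": 1.25,
}

# Nominal growth areas in cm^2 (assumed to match the flask names).
FLASK_AREA_CM2 = {"T25": 25.0, "T75": 75.0, "T225": 225.0}


def scale_recipe(flask: str) -> dict:
    """Scale every T25 transfection component linearly with flask growth area."""
    factor = FLASK_AREA_CM2[flask] / FLASK_AREA_CM2["T25"]
    return {component: amount * factor for component, amount in T25_RECIPE.items()}


for flask in FLASK_AREA_CM2:
    print(flask, {k: round(v, 2) for k, v in scale_recipe(flask).items()})
```

Under these assumptions a T75 flask receives 3× and a T225 flask 9× the T25 amounts (for example, 10.2 μg and 30.6 μg of transfer plasmid).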

Lentivirus transduction. For transduction, 3×10⁶ hESCs or iPSCs were seeded in 10-cm cell culture dishes with 10 μM ROCK Inhibitor Y27632 (Enzo Life Sciences ALX-270-333-M025) and an appropriate volume of lentivirus in mTeSR. After 24 h, media was refreshed with the appropriate antibiotic. For 5 days, media with the appropriate antibiotic was refreshed every day, and cells were passaged after 3 days of selection. Concentrations for selection agents were determined using a kill curve: 150 μg/mL Hygromycin (Thermo Fisher Scientific 10687010), 3 μg/mL Blasticidin (Thermo Fisher Scientific A1113903), and 1 μg/mL Puromycin (Thermo Fisher Scientific A1113803). Lentiviral titers were calculated by transducing cells with 5 different volumes of lentivirus and determining viability after a complete selection of 3 days (Joung et al., 2017b).
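The titer calculation is cited to Joung et al. (2017b) rather than spelled out. One common way to convert survival after selection into an MOI, sketched here as an assumption rather than the cited procedure, treats integrations as Poisson-distributed, so the fraction of cells surviving selection (relative to an unselected control) is 1 − e^(−MOI). The function names are hypothetical.

```python
import math


def moi_from_survival(fraction_surviving: float) -> float:
    """Poisson MOI from the fraction of cells surviving selection.

    P(at least one integration) = 1 - exp(-MOI), so MOI = -ln(1 - fraction).
    """
    if not 0.0 <= fraction_surviving < 1.0:
        raise ValueError("fraction_surviving must be in [0, 1)")
    return -math.log(1.0 - fraction_surviving)


def volume_for_target_moi(volumes_ul, fractions_surviving, target_moi):
    """Fit MOI = slope * volume through the origin (least squares) and invert it."""
    pairs = [(v, moi_from_survival(f))
             for v, f in zip(volumes_ul, fractions_surviving)
             if 0.0 < f < 1.0]  # skip uninformative points (no or complete survival)
    if not pairs:
        raise ValueError("no informative titration points")
    slope = sum(v * m for v, m in pairs) / sum(v * v for v, _ in pairs)
    return target_moi / slope


def multiple_integration_fraction(moi: float) -> float:
    """Among transduced cells, the fraction carrying two or more integrations."""
    p0 = math.exp(-moi)
    p1 = moi * p0
    return (1.0 - p0 - p1) / (1.0 - p0)


print(f"{multiple_integration_fraction(0.3):.1%} of transduced cells have >1 integration")
```

Under this model, at the MOI <0.3 used for the pooled screens, at most about 14% of transduced cells would be expected to carry more than one ORF.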

qPCR quantification of transcript expression. Cells were seeded in 96-well plates and grown to 60-90% confluency before RNA was reverse transcribed for qPCR as described previously (Joung et al., 2017b). TaqMan qPCR was performed with custom or readymade probes (Tables 11 and 12). Significance testing was performed using Student's t-test.
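The source specifies TaqMan chemistry and Student's t-test but not the normalization. A minimal sketch, assuming the widely used 2^(−ΔΔCt) method with a reference (housekeeping) gene and a t-test on per-replicate ΔCt values; the function names are hypothetical and the Ct values in the usage line are made up for illustration.

```python
import numpy as np
from scipy import stats


def delta_ct(ct_target, ct_reference):
    """Per-replicate delta Ct: target gene Ct minus reference gene Ct."""
    return np.asarray(ct_target, dtype=float) - np.asarray(ct_reference, dtype=float)


def fold_change_and_pvalue(ct_t_treat, ct_r_treat, ct_t_ctrl, ct_r_ctrl):
    """2^-ddCt fold change of treated vs. control, with a Student's t-test on dCt values."""
    dct_treat = delta_ct(ct_t_treat, ct_r_treat)
    dct_ctrl = delta_ct(ct_t_ctrl, ct_r_ctrl)
    ddct = dct_treat.mean() - dct_ctrl.mean()
    _, p_value = stats.ttest_ind(dct_treat, dct_ctrl)  # equal-variance Student's t-test
    return 2.0 ** (-ddct), float(p_value)


fc, p = fold_change_and_pvalue([22.0, 22.1, 21.9], [18.0, 18.0, 18.0],
                               [25.0, 25.1, 24.9], [18.0, 18.0, 18.0])
print(f"fold change = {fc:.2f}, p = {p:.2g}")
```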

Western blot. Protein lysates were harvested with RIPA lysis buffer (Cell Signaling Technologies 9806S) containing protease inhibitor cocktail (MilliporeSigma 05892791001). Samples were standardized for protein concentration using the Pierce BCA protein assay (VWR 23227), and 20 μg or 40 μg of the samples were incubated at 70° C. for 10 mins under reducing conditions. After denaturation, samples were separated by Bolt 4-12% Bis-Tris Plus Gels (Thermo Fisher Scientific NW04125BOX) and transferred onto a PVDF membrane using iBlot Transfer Stacks (Thermo Fisher Scientific IB401001).

For NEUROD1 and V5, blots were blocked with Odyssey Blocking Buffer (TBS; LiCOr 927-50000) for 1 h at room temperature. Blots were then probed with different primary antibodies [anti-NEUROD1 (Abcam ab60704, 1:1,000 dilution), anti-GAPDH (Cell Signaling Technologies 2118L, 1:1,000 dilution), anti-V5 (Cell Signaling Technologies 13202S, 1:1,000 dilution), anti-ACTB (MilliporeSigma A5441, 1:5,000 dilution)] in Odyssey Blocking Buffer overnight at 4° C. Blots were washed with TBST before incubation with secondary antibodies IRDye 680RD Donkey anti-Mouse IgG (LiCOr 925-68072) and IRDye 800CW Donkey anti-Rabbit IgG (LiCOr 925-32213) at 1:20,000 dilution in Odyssey Blocking Buffer for 1 h at room temperature. Blots were washed with TBST and imaged using the Odyssey CLx (LiCOr).

For DYRK1A, blots were blocked with 5% BLOT-QuickBlocker (G Biosciences 786-011) in TBST for 1 h at room temperature. Blots were then probed with different primary antibodies [anti-DYRK1A (Novus Biologicals H00001859-M01, 1:250 dilution) or anti-ACTB (Cell Signaling Technologies 4967L, 1:1,000 dilution)] in 2.5% BLOT-QuickBlocker (G Biosciences 786-011) in TBST overnight at 4° C. Blots were washed with TBST before incubation with secondary antibodies anti-mouse IgG, HRP-linked antibody (Cell Signaling Technologies 7076S) and anti-rabbit IgG, HRP-linked antibody (Cell Signaling Technologies 7074S) at 1:5,000 dilution in 2.5% BLOT-QuickBlocker (G Biosciences 786-011) in TBST for 1 h at room temperature. Blots were washed with TBST, developed with Pierce ECL Western Blotting Substrate (Thermo Fisher Scientific 32209), and imaged on the ChemiDoc XRS+ (Bio-Rad).

Immunofluorescence and imaging. Cells were cultured on poly-D-lysine/laminin coated glass coverslips (VWR 354087) in 24-well plates as described above. Prior to staining, cells were washed with 1 mL PBS and fixed with 4% paraformaldehyde (VWR 15710) in PBS for 30 mins at room temperature. Cells were washed with PBS and blocked in PBS with 2.5% goat serum (Cell Signaling Technologies 5425S) and 0.1% Triton X-100 (MilliporeSigma 93443) for 1 h at room temperature. Cells were then stained with different primary antibodies [anti-MAP2 (MilliporeSigma M1406, 1:500 dilution), anti-PAX6 (Abcam ab5790, 1:500 dilution), anti-Nestin (MilliporeSigma MAB5326, 1:200 dilution), anti-VIM (Proteintech 10366-1-AP, 1:200 dilution), anti-GFAP (Abcam ab4674, 1:500 dilution), anti-NG2 (MilliporeSigma AB5320, 1:200 dilution), anti-PDGFRA (Cell Signaling Technologies 3164S, 1:200 dilution), or anti-FOXG1 (Abcam ab18259, 1:500 dilution)] in PBS with 1.25% goat serum (Cell Signaling Technologies 5425S) and 0.1% Triton X-100 (MilliporeSigma 93443) overnight at 4° C. Cells were washed in PBS with 0.1% Triton X-100 (MilliporeSigma 93443) before staining with the appropriate secondary antibodies [goat anti-mouse IgG (Alexa Fluor 568, Thermo Fisher Scientific A-11031, 1:1,000 dilution), goat anti-chicken IgY (Alexa Fluor 488, Thermo Fisher Scientific A-11039, 1:1,000 dilution), goat anti-rabbit IgG (Alexa Fluor 647, Thermo Fisher Scientific A-21244, 1:1,000 dilution), or goat anti-rabbit IgG (Alexa Fluor 488, Thermo Fisher Scientific A-11008, 1:1,000 dilution)] in PBS with 1.25% goat serum (Cell Signaling Technologies 5425S) and 0.1% Triton X-100 (MilliporeSigma 93443) for 1 h at room temperature. Cells were washed in PBS with 0.1% Triton X-100 (MilliporeSigma 93443), mounted onto slides using ProLong Gold Antifade Mountant with DAPI (Thermo Fisher Scientific P36941), and sealed with nail polish (VWR 100491-940).
Immunostained coverslips were imaged on a Zeiss Axio Observer with a Hamamatsu camera using a Plan-Apochromat 20× objective and a 1.6× Optovar.

Image quantification. Images were taken from randomly selected regions using fixed exposure times. The MeasureImageIntensity module in CellProfiler 3.1.8 was used to analyze grayscale 577 nm images (MAP2) for mean intensity units. For induced neurons, mean intensity units were normalized by the number of nuclei in each image. The IdentifyPrimaryObjects module in CellProfiler was used to identify and count nuclei in the grayscale 353 nm (DAPI) images with the following settings modified from default: Typical diameter of objects, in pixel units (Min, Max): 25, 70; Threshold strategy: Adaptive; Threshold smoothing scale: 1.5; Lower and upper bounds on threshold: 0.06, 1.0. Significance testing was performed using Student's t-test.
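The quantification can be approximated outside CellProfiler for illustration. This sketch is not the IdentifyPrimaryObjects algorithm: it substitutes a single global threshold (the 0.06 lower bound above) for adaptive thresholding, labels connected components with `scipy.ndimage.label`, and keeps objects whose equivalent diameter lies within the 25-70 pixel range. Images are assumed to be scaled to [0, 1], and the function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def count_nuclei(dapi, threshold=0.06, min_diameter=25, max_diameter=70):
    """Count nuclei in a DAPI image scaled to [0, 1] (simplified, global threshold)."""
    mask = dapi >= threshold
    labels, n_objects = ndimage.label(mask)
    if n_objects == 0:
        return 0
    # pixel area of each labeled object
    areas = ndimage.sum(mask, labels, index=np.arange(1, n_objects + 1))
    diameters = 2.0 * np.sqrt(areas / np.pi)  # diameter of a circle with the same area
    in_range = (diameters >= min_diameter) & (diameters <= max_diameter)
    return int(np.count_nonzero(in_range))


def map2_intensity_per_nucleus(map2, dapi, **nuclei_kwargs):
    """Mean MAP2 intensity of the image divided by the number of nuclei."""
    n_nuclei = count_nuclei(dapi, **nuclei_kwargs)
    return float(np.mean(map2)) / n_nuclei if n_nuclei else float("nan")
```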

Design and cloning of TF ORF libraries. The barcoded human TF library (MORF) consisted of 1,836 genes that were selected based on AnimalTFDB (Zhang et al., 2015) and Uniprot (UniProt, 2015) annotations and included histone modifiers. The library included 3,548 isoforms that overlapped between RefSeq and Gencode annotations, as well as 2 control vectors expressing GFP and mCherry. 593 of the 3,548 isoforms were obtained from the Broad Genetic Perturbation Platform and sequence verified. Table 3 lists the sequences of TFs in MORF.

To design a targeted TF ORF library for NP differentiation, single-cell or bulk RNA-seq datasets of human or mouse radial glia, neural stem cells, differentiated neural progenitors from 2D cultures or brain organoids, and fetal astrocytes were used to select TFs that were shown to be specifically expressed in these cell types (Camp et al., 2015; Johnson et al., 2015; Llorens-Bobadilla et al., 2015; Pollen et al., 2015; Shin et al., 2015; Thomsen et al., 2016; Wu et al., 2010; Zhang et al., 2016). TFs that were identified in 2 or more datasets (out of 8) were included in the library. Then, bulk RNA-seq data of human fetal astrocytes (Zhang et al., 2016) was used to identify TF isoforms annotated in RefSeq that comprised >25% of the TF gene transcripts. These criteria selected 90 TF isoforms covering 70 TF genes (Table 1).
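The two selection criteria can be expressed compactly. In this sketch (hypothetical function names; inputs are assumed to be per-dataset sets of cell type-specific TFs and per-isoform TPM values from the fetal astrocyte data), a TF is kept if it appears in at least 2 datasets, and an isoform is kept if it accounts for more than 25% of its gene's total TPM.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def select_tfs(dataset_hits: Dict[str, Set[str]], min_datasets: int = 2) -> Set[str]:
    """Keep TFs reported as cell type-specific in at least `min_datasets` datasets."""
    counts = Counter(tf for hits in dataset_hits.values() for tf in set(hits))
    return {tf for tf, n in counts.items() if n >= min_datasets}


def select_isoforms(isoform_tpm: Dict[str, Dict[str, float]], genes: Set[str],
                    min_fraction: float = 0.25) -> List[Tuple[str, str]]:
    """Keep isoforms contributing more than `min_fraction` of their gene's total TPM."""
    selected = []
    for gene in sorted(genes):
        isoforms = isoform_tpm.get(gene, {})
        total = sum(isoforms.values())
        if total <= 0:
            continue  # gene not expressed in the reference data
        for isoform, tpm in sorted(isoforms.items()):
            if tpm / total > min_fraction:
                selected.append((gene, isoform))
    return selected
```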

TF ORF isoforms that were not available from the Broad Genetic Perturbation Platform were synthesized with 24-bp barcodes (Genewiz) and cloned in an arrayed format into pLX_TRC317 (MORF; Broad Genetic Perturbation Platform) or pLX_TRC209 (targeted NP library; Broad Genetic Perturbation Platform) for expression under the EF1a promoter. Barcodes for each TF were selected to have a Hamming distance of at least 3 compared to all other barcodes.
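The barcode constraint can be met by rejection sampling. This is a minimal sketch under the assumption that barcodes are drawn uniformly at random; it enforces only the stated 24-bp length and minimum pairwise Hamming distance of 3, and `design_barcodes` is a hypothetical name. A minimum distance of 3 means a read carrying a single sequencing error is still closer to its true barcode than to any other.

```python
import random
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(n: int, length: int = 24, min_distance: int = 3,
                    seed: int = 0, max_attempts: int = 1_000_000) -> list:
    """Draw random barcodes, keeping each only if it is >= min_distance from all kept ones."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(max_attempts):
        if len(chosen) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if all(hamming(candidate, kept) >= min_distance for kept in chosen):
            chosen.append(candidate)
    if len(chosen) < n:
        raise RuntimeError("not enough barcodes found; relax constraints or raise max_attempts")
    return chosen


barcodes = design_barcodes(100)
closest = min(hamming(a, b) for a, b in combinations(barcodes, 2))
print(len(barcodes), "barcodes; closest pair differs at", closest, "positions")
```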

Reporter cell line screen. To generate reporter cell lines, EGFP from pLX_TRC209 (Broad Genetic Perturbation Platform) followed by a T2A (GGCAGTGGAGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCCA (SEQ ID NO: 10809)) self-cleaving peptide was inserted at the N-terminus of endogenous SLC1A3 and VIM genomic sequences. Clonal reporter cell lines were generated using CRISPR-Cas9 mediated HDR. To construct the HDR plasmids for each gene, the HDR templates that consisted of the 850-1,000 bp genomic regions flanking the sgRNA cleavage sites were PCR amplified from HUES66 genomic DNA using KAPA HiFi HotStart Readymix (KAPA Biosystems KK2602). Then EGFP-T2A flanked by the HDR templates was cloned into pUC19 (Addgene 50005). HUES66 cells were nucleofected with 10 μg of sgRNA and Cas9 plasmid (Addgene 52961) and 6 μg of HDR plasmid using the P3 Primary Cell 4D-Nucleofector X Kit (Lonza V4XP-3024) according to the manufacturer's instructions. Cells were then seeded sparsely (2 electroporation reactions per 10-cm cell culture dish) to form single-cell clones. After 18 h, cells were selected for Cas9 expression with 0.5 μg/mL Puromycin for 2 days and expanded until colonies could be picked (~1 week).
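The 63-nt T2A coding sequence (SEQ ID NO: 10809) can be checked by in-frame translation with the standard genetic code. This minimal sketch translates it to GSGEGRGSLLTCGDVEENPGP, i.e. a GSG linker followed by the T2A peptide EGRGSLLTCGDVEENPGP; the constant and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

T2A_DNA = "GGCAGTGGAGAGGGCAGAGGAAGTCTGCTAACATGCGGTGACGTCGAGGAGAATCCTGGCCCA"


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence (length divisible by 3) into protein."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate(T2A_DNA))
```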

Cell colonies were detached by replacing the media with PBS and incubating at room temperature for 15 mins. Each cell colony was removed from the Petri dish using a 200 μL pipette tip and transferred to a well in a 96-well plate for expansion. Clones with EGFP insertions were identified by 2-round PCR amplification (Table 13), first with primers amplifying outside of the HDR template (HDR Fwd 1 and HDR Rev, 15 cycles) and then with primers amplifying the region of insertion (HDR Fwd 2 and HDR Rev, 15 cycles) to avoid detecting the HDR template plasmid as a false positive. Products were run on a gel to identify clones with insertions, and Sanger sequencing confirmed that EGFP had been inserted at the intended site without mutations. For each reporter cell line, 3 clones with EGFP inserted into one of the two alleles were selected for further expansion and characterization.

For TF ORF screening using reporter hESC lines, SLC1A3 or VIM reporter HUES66 cell lines were transduced with the pooled TF ORF library at MOI <0.3 and differentiated into iNPs as described above. After 7 days of differentiation, 5-10×10⁶ cells were sorted for EGFP expression using the Sony SH800S Cell Sorter. For each clonal line, the percentage of cells sorted for the control condition was matched to the percentage expressing EGFP (~15-20%). After sorting, TF barcodes from each population were amplified (Table 13) and deep-sequenced on the Illumina MiSeq platform as previously described (>0.5 million reads per cell population) (Joung et al., 2017b). NGS reads that perfectly matched each barcode were counted and normalized to the total number of perfectly matched NGS reads for each condition. Enrichment of each TF was calculated as the normalized barcode count in the high population divided by the count in the low population.
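The counting and enrichment steps can be sketched as follows, assuming reads have already been trimmed to the 24-bp barcode region. Function names are hypothetical, and the optional `pseudocount` (default 0, matching the calculation described above) is an addition for handling barcodes absent from the low population.

```python
from collections import Counter


def count_perfect_matches(reads, barcodes):
    """Count reads whose sequence exactly equals a library barcode."""
    library = set(barcodes)
    counts = Counter(read for read in reads if read in library)
    return {bc: counts.get(bc, 0) for bc in barcodes}


def normalize(counts):
    """Divide each barcode count by the total number of perfectly matched reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no perfectly matched reads")
    return {bc: n / total for bc, n in counts.items()}


def enrichment(high_counts, low_counts, pseudocount=0.0):
    """Normalized count in the high population divided by that in the low population."""
    high, low = normalize(high_counts), normalize(low_counts)
    scores = {}
    for bc in high:
        numerator, denominator = high[bc] + pseudocount, low[bc] + pseudocount
        if denominator > 0:
            scores[bc] = numerator / denominator
        else:
            scores[bc] = float("inf") if numerator > 0 else float("nan")
    return scores
```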

Flow-FISH screen. For TF ORF screening using flow-FISH, HUES66 cells were transduced with the pooled TF ORF library at MOI <0.3 and differentiated into iNPs as described above. After 7 days of differentiation, cells were labeled with the appropriate FISH probes (Table 14) using the PrimeFlow RNA assay kit (Thermo Fisher Scientific 88-18005-204) with 20 million cells in 4 reactions per biological replicate. FISH probes targeting transcripts with similar expression levels were pooled together. Once the cells were labeled, the entire cell population was sorted for high or low fluorescence (15% of cells per bin), indicating an aggregate expression level of the transcripts labeled with the pooled FISH probes for the particular wavelength. After sorting, TF barcodes from each population were amplified (Table 13) using a modified ChIP reverse cross-linking protocol as described previously (Fulco et al., 2019) and deep-sequenced on the Illumina NextSeq platform (>4 million reads per cell population). Enrichment of each TF was calculated as described above for the reporter cell line screen.
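On the sorter, gates are drawn interactively; as an illustration only, the high and low bins (15% of cells each) correspond to fluorescence quantiles. This sketch assumes a per-cell vector of FISH fluorescence values, and `sort_gates` is a hypothetical name. Barcode counts from the two resulting populations would then feed the same enrichment calculation as the reporter cell line screen.

```python
import numpy as np


def sort_gates(fluorescence, bin_fraction=0.15):
    """Boolean masks for the lowest and highest `bin_fraction` of cells by fluorescence."""
    values = np.asarray(fluorescence, dtype=float)
    low_cut = np.quantile(values, bin_fraction)
    high_cut = np.quantile(values, 1.0 - bin_fraction)
    return values <= low_cut, values >= high_cut


low_bin, high_bin = sort_gates(np.random.default_rng(0).lognormal(size=10_000))
print(low_bin.sum(), high_bin.sum())
```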

Single-cell RNA sequencing (scRNA-seq) and data analysis. Cells were dissociated with Accutase (STEMCELL Technologies 07920) for 10 mins (NP) or 50 mins (spontaneously differentiated cells) at 37° C. and filtered using a 70 μm cell strainer (MilliporeSigma CLS431751) to obtain single cells. Cells were resuspended in PBS containing 0.04% BSA, counted, and loaded in the 10× Genomics Chromium Controller. 10,000 cells were used as input for each channel of a 10× Chromium Chip. For cells from the scRNA-seq pooled screen and spontaneous differentiation of four candidate TFs, scRNA-seq libraries were prepared using the Chromium Single Cell 3′ Library & Gel Bead Kit v2 (10× Genomics 120237) according to the manufacturer's instructions. Libraries were sequenced on the NextSeq platform, aiming for a minimum coverage of 20,000 reads per single cell (paired-end; read 1: 26 cycles; i7 index: 8 cycles, i5 index: 0 cycles; read 2: 55 cycles). For cells from the NP method comparison and spontaneous differentiation of RFX4-DS-iNPs, scRNA-seq libraries were prepared using the Chromium Single Cell 3′ Library & Gel Bead Kit v3 (10× Genomics 1000075) and sequenced on the HiSeq X platform (paired-end; read 1: 28 cycles; i7 index: 8 cycles, i5 index: 0 cycles; read 2: 96 cycles).

Sequencing data were aligned and quantified using the Cell Ranger Single-Cell Software Suite v3.1.0 (10× Genomics) (Zheng et al., 2017) against the GRCh38 human reference genome provided by Cell Ranger. The Python package Scanpy v1.4.4 (Wolf et al., 2018) was used to cluster and visualize cells. Cells with 400-7,000 detected genes and less than 5% total mitochondrial gene expression were retained for analysis. Genes that were detected in fewer than 3 cells were removed. Scanpy was used to log normalize, scale, and center the data, and unwanted variation was removed by regressing out the number of UMIs and the percentage of mitochondrial reads. Next, highly variable genes were identified and used as input for dimensionality reduction via principal component analysis (PCA). A neighborhood graph of observations was computed from the top 50 principal components with a local neighborhood size of 20 using the scanpy.pp.neighbors function, and cells were clustered into subgroups with the Louvain algorithm as implemented in the scanpy.tl.louvain function. Clusters were visualized using Uniform Manifold Approximation and Projection (UMAP). Cluster marker genes and associated p-values were identified using the scanpy.tl.rank_genes_groups function.
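The cell- and gene-level QC thresholds can be made explicit with NumPy. Scanpy exposes them through `sc.pp.filter_cells` (with `min_genes` and `max_genes`) and `sc.pp.filter_genes` (with `min_cells`); the sketch below reproduces the same logic on a raw cells × genes count matrix, assuming mitochondrial genes are identified by the `MT-` prefix used in GRCh38 gene symbols. The function name is hypothetical.

```python
import numpy as np


def qc_filter(counts, gene_names, min_genes=400, max_genes=7000,
              max_pct_mito=5.0, min_cells=3, mito_prefix="MT-"):
    """Filter cells by detected genes and mitochondrial fraction, then genes by cell count.

    counts: cells x genes raw count matrix. Returns (filtered matrix, kept gene names,
    boolean mask of kept cells).
    """
    counts = np.asarray(counts)
    names = np.asarray(gene_names, dtype=str)
    is_mito = np.char.startswith(names, mito_prefix)
    genes_per_cell = np.count_nonzero(counts, axis=1)
    total_counts = counts.sum(axis=1)
    pct_mito = 100.0 * counts[:, is_mito].sum(axis=1) / np.maximum(total_counts, 1)
    keep_cells = ((genes_per_cell >= min_genes) & (genes_per_cell <= max_genes)
                  & (pct_mito < max_pct_mito))
    kept = counts[keep_cells]
    # drop genes detected in fewer than min_cells of the retained cells
    keep_genes = np.count_nonzero(kept, axis=0) >= min_cells
    return kept[:, keep_genes], names[keep_genes], keep_cells
```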

For scRNA-seq analysis of the pooled 90 TF screen for NP differentiation, distance between cells with different TF perturbations was calculated using the scipy.spatial.distance.cdist function from the SciPy Python library. For each TF perturbation, the pairwise distance between cells with the TF perturbation and cells without the TF perturbation was calculated and the median of the distances was determined. The 939 highly variable genes were used in the distance calculation. To identify TFs that produced transcriptome profiles similar to radial glia from human fetal cortex or brain organoid, TF scRNA-seq signatures were correlated to available scRNA-seq datasets (Nowakowski et al., 2017; Pollen et al., 2015; Quadrato et al., 2017). The 218 most variable genes in the scRNA-seq data, which were identified using the scanpy.pp.highly_variable_genes function with the parameters “min_mean=0.075, max_mean=8 and min_disp=1.5”, were used for the correlation analysis. The Spearman correlations between expression of these genes in each TF-perturbed single cell and the average expression in radial glia scRNA-seq from human fetal cortex or organoid were calculated. Then, the average correlation of each TF was determined by taking the average of the corresponding TF-perturbed single cell correlations. Candidate TFs were ranked based on the z-score of the average correlation across all datasets. For comparing TF transcriptome signatures to other cell types from the mouse organogenesis cell atlas (Cao et al., 2019), average expression of the top 30 marker genes (ranked by p-value) for each cell type was used to assess similarity. The z-score of the average marker gene expression for cells perturbed by each TF was used to identify TF perturbations that were most similar to each cell type.
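The correlation-based ranking can be sketched with `scipy.stats.spearmanr`. The phrase "z-score of the average correlation across all datasets" admits more than one reading; this sketch assumes the per-TF averages are z-scored within each reference dataset and the z-scores are then averaged across datasets. Inputs are assumed to be restricted to the selected variable genes, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr


def tf_average_correlation(cell_expr, cell_tfs, reference_profile):
    """Per-TF mean of single-cell Spearman correlations with a reference profile.

    cell_expr: cells x genes matrix restricted to the selected variable genes.
    cell_tfs: TF assigned to each cell (same order as the rows of cell_expr).
    reference_profile: average expression of the same genes in reference radial glia.
    """
    rhos = []
    for row in np.asarray(cell_expr, dtype=float):
        rho, _ = spearmanr(row, reference_profile)
        rhos.append(rho)
    rhos = np.array(rhos)
    labels = np.asarray(cell_tfs)
    return {tf: float(rhos[labels == tf].mean()) for tf in sorted(set(cell_tfs))}


def rank_by_mean_zscore(avg_corr_by_dataset):
    """Rank TFs by z-scored average correlation, averaged over reference datasets.

    avg_corr_by_dataset: mapping dataset -> {TF: average correlation}; every dataset
    is assumed to contain the same TFs.
    """
    tfs = sorted(next(iter(avg_corr_by_dataset.values())))
    z_sum = np.zeros(len(tfs))
    for scores in avg_corr_by_dataset.values():
        values = np.array([scores[tf] for tf in tfs])
        z_sum += (values - values.mean()) / values.std()  # z-score within one dataset
    mean_z = z_sum / len(avg_corr_by_dataset)
    order = np.argsort(-mean_z)
    return [(tfs[i], float(mean_z[i])) for i in order]
```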

For determining consistency within batch replicates of different iNP differentiation methods, the cluster of spontaneously differentiated neurons was excluded from the analysis. Distance between cells within the same batch replicate was calculated using the scipy.spatial.distance.pdist function from the SciPy Python library. The 2,305 highly variable genes were used in the distance calculation. For determining consistency between batch replicates, distance between cells in different batch replicates of the same method was calculated using the scipy.spatial.distance.cdist function.
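The within- and between-replicate distances map directly onto SciPy's `pdist` and `cdist`. The source does not name a metric, so this sketch uses SciPy's default Euclidean distance; the same `cdist` pattern gives the median distance between TF-perturbed and unperturbed cells used in the pooled screen analysis. Function names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist


def within_batch_distances(expr):
    """All pairwise distances among cells of one batch (condensed vector)."""
    return pdist(np.asarray(expr, dtype=float))


def between_batch_distances(expr_a, expr_b):
    """Distances from every cell in batch A to every cell in batch B, flattened."""
    return cdist(np.asarray(expr_a, dtype=float), np.asarray(expr_b, dtype=float)).ravel()


def median_distance_to_unperturbed(expr, is_perturbed):
    """Median distance between cells carrying a given TF and cells without it."""
    expr = np.asarray(expr, dtype=float)
    mask = np.asarray(is_perturbed, dtype=bool)
    return float(np.median(cdist(expr[mask], expr[~mask])))
```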

ScRNA-seq screen. For TF ORF screening using scRNA-seq, HUES66 cells were transduced with the pooled TF ORF library at MOI <0.3 and differentiated into iNPs. Then, iNPs were dissociated for scRNA-seq analysis as described above. To pair TF barcodes with cell barcodes, TF and cell barcodes were PCR amplified from cDNA retained following the whole transcriptome amplification step of the 10× Genomics scRNA-seq library preparation protocol (Table 13). The resulting amplicon was sequenced on the Illumina NextSeq platform, aiming for a minimum coverage of 20,000 reads per single cell (paired-end; read 1: 16 cycles; read 2: 72 cycles). For each cell, the TF whose corresponding barcode had the highest number of perfectly matching NGS reads was paired with the cell if the TF barcode had at least 2 reads and >25% more reads than the second highest TF. Otherwise, the cell was excluded from the scRNA-seq analysis.
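The barcode assignment rule can be written as a small function. This sketch assumes per-cell TF read counts have already been tallied from perfectly matching reads and interprets ">25% more reads" as top > 1.25 × second; `assign_tf` and the cell names are hypothetical.

```python
def assign_tf(tf_read_counts, min_reads=2, min_margin=0.25):
    """Return the TF assigned to a cell, or None if the cell should be excluded.

    tf_read_counts: mapping TF -> number of perfectly matching barcode reads in one cell.
    """
    if not tf_read_counts:
        return None
    ranked = sorted(tf_read_counts.items(), key=lambda item: item[1], reverse=True)
    top_tf, top_reads = ranked[0]
    second_reads = ranked[1][1] if len(ranked) > 1 else 0
    # require a minimum read count and a >25% margin over the runner-up TF
    if top_reads >= min_reads and top_reads > (1.0 + min_margin) * second_reads:
        return top_tf
    return None


cells = {
    "cell_1": {"RFX4": 10, "NEUROD1": 7},
    "cell_2": {"RFX4": 5, "NEUROD1": 4},
    "cell_3": {"RFX4": 1},
}
assignments = {cell: assign_tf(counts) for cell, counts in cells.items()}
print(assignments)
```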

Arrayed screen. For TF ORF screening in an arrayed format, individual TF ORF isoforms were packaged into lentivirus as described above. Cells were transduced at MOI <0.5 by seeding 1.6×10⁴ cells in 96-well plates and adding the appropriate volume of lentivirus. Cells were differentiated into NPs and harvested for qPCR at 7 days after transduction as described above.

Bulk RNA sequencing (RNA-seq) and data analysis. RNA from cells plated in 24-well plates and grown to 60-90% confluency was harvested using the RNeasy Plus Mini Kit (Qiagen 74134). RNA-seq libraries were prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB E7530S) and deep sequenced on the Illumina NextSeq platform (>9 million reads per biological replicate). A Bowtie (Langmead et al., 2009) index was created based on the human hg38 UCSC genome and RefSeq transcriptome. Next, RSEM v1.3.1 (Li and Dewey, 2011) was run with command line options “--estimate-rspd --bowtie-chunkmbs 512 --paired-end” to align paired-end reads directly to this index using Bowtie and estimate expression levels in transcripts per million (TPM) based on the alignments.

To correlate TF ORF RNA-seq signatures to those from human fetal cortex or brain organoid (Nowakowski et al., 2017; Pollen et al., 2015; Quadrato et al., 2017), transcript measurements from each available dataset were converted to TPM. For each cell type, TPM measurements from single cells were averaged to obtain average TPM values of genes for the cell type. The 2,000 genes with the highest fold change in the TF ORF expression condition relative to the GFP control condition (stem cells overexpressing GFP that were cultured in mTeSR1 stem cell media) were used to define the TF ORF RNA-seq signature. Expression of these genes in TPM was used to calculate the Pearson correlation between the TF ORF and the cell type of interest from available datasets.
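The signature correlation can be sketched in NumPy, assuming all TPM vectors share the same gene order. How the fold change handled zero TPM values is not stated, so the `pseudocount` of 1 is an assumption; function names are hypothetical.

```python
import numpy as np


def average_celltype_tpm(single_cell_tpm):
    """Average TPM per gene across the single cells of one cell type (cells x genes)."""
    return np.asarray(single_cell_tpm, dtype=float).mean(axis=0)


def signature_genes(tpm_orf, tpm_gfp, n_top=2000, pseudocount=1.0):
    """Indices of the n_top genes with the largest ORF/GFP fold change."""
    fold_change = ((np.asarray(tpm_orf, dtype=float) + pseudocount)
                   / (np.asarray(tpm_gfp, dtype=float) + pseudocount))
    return np.argsort(fold_change)[::-1][:n_top]


def signature_correlation(tpm_orf, tpm_celltype, gene_idx):
    """Pearson correlation between ORF and cell-type TPM over the signature genes."""
    x = np.asarray(tpm_orf, dtype=float)[gene_idx]
    y = np.asarray(tpm_celltype, dtype=float)[gene_idx]
    return float(np.corrcoef(x, y)[0, 1])
```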

To identify genes that were differentially expressed as a result of TF ORF expression, RSEM's TPM estimates for each transcript were transformed to log space by taking log2(TPM+1). Transcripts were considered detected if their transformed expression level was equal to or above 1 (in log2(TPM+1) scale). All genes detected in at least three libraries were used to find differentially expressed genes. The Student's t-test was performed on the TF ORF overexpression condition against the GFP control condition. Only genes that remained significant after FDR correction (FDR < 0.05) were reported.
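The detection filter, t-test, and FDR step can be sketched with SciPy. The FDR procedure is not named in the source; this sketch assumes Benjamini-Hochberg and assumes that detection in at least three libraries is counted across both conditions. Inputs are genes × replicate TPM matrices, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def differential_expression(tpm_orf, tpm_gfp, fdr=0.05, min_libraries=3):
    """Indices of genes significant after FDR correction, plus the q-values of tested genes.

    tpm_orf, tpm_gfp: genes x replicate-library TPM matrices with the same gene order.
    """
    log_orf = np.log2(np.asarray(tpm_orf, dtype=float) + 1.0)
    log_gfp = np.log2(np.asarray(tpm_gfp, dtype=float) + 1.0)
    detected_in = (np.hstack([log_orf, log_gfp]) >= 1.0).sum(axis=1)
    tested = np.flatnonzero(detected_in >= min_libraries)
    _, p_values = stats.ttest_ind(log_orf[tested], log_gfp[tested], axis=1)
    q_values = benjamini_hochberg(p_values)
    return tested[q_values < fdr], q_values
```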

For analysis of transcriptome changes as a result of DYRK1A perturbation, transcripts were considered detected if the average TPM of either the perturbed or control conditions was greater than 1. In the DYRK1A knockout perturbations, the Student's t-test was performed on the DYRK1A-targeting sgRNA condition against both non-targeting sgRNA conditions. In the DYRK1A overexpression perturbation, the Student's t-test was performed on the DYRK1A ORF condition against the GFP control condition. Volcano plots showed genes that remained significant after FDR correction (FDR < 0.01) with fold change that was greater or less than 1. The heat map of genes with DYRK1A dosage-dependent expression changes showed genes that remained significant after FDR correction (FDR < 0.05).

Chromatin immunoprecipitation with sequencing (ChIP-seq). Cells were plated in 10-cm cell culture dishes and grown to 60-80% confluency. For each condition, two biological replicates were harvested for ChIP-seq. Formaldehyde (MilliporeSigma 252549) was added directly to the growth media for a final concentration of 1% and cells were incubated at 37° C. for 10 mins to initiate chromatin fixation. Fixation was quenched by adding 2.5 M glycine (MilliporeSigma G7126) in PBS for a final concentration of 125 mM glycine and incubated at room temperature for 5 mins. Cells were then washed with ice-cold PBS, scraped, and pelleted at 1,000×g for 5 mins.
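The quench follows the standard C₁V₁ dilution arithmetic for adding a concentrated stock to an existing volume: 2.5 M glycine must be added at 1/19 of the current volume to reach 125 mM. In this sketch the 10 mL media volume and the 37% formaldehyde stock concentration are assumptions, and `volume_to_add` is a hypothetical name.

```python
def volume_to_add(current_volume, stock_conc, final_conc):
    """Volume of stock to add so the mixture reaches final_conc.

    Solves stock_conc * v / (current_volume + v) = final_conc for v; both
    concentrations must be in the same units.
    """
    if not 0 <= final_conc < stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return current_volume * final_conc / (stock_conc - final_conc)


media_ml = 10.0  # hypothetical media volume in a 10-cm dish
formaldehyde_ml = volume_to_add(media_ml, stock_conc=37.0, final_conc=1.0)  # % (assumed stock)
glycine_ml = volume_to_add(media_ml + formaldehyde_ml, stock_conc=2.5, final_conc=0.125)  # M
print(f"formaldehyde: {formaldehyde_ml:.3f} mL, glycine: {glycine_ml:.3f} mL")
```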

Cell pellets were prepared for ChIP-seq using the Epigenomics Alternative Mag Bead ChIP Protocol v2.0 (Consortium, 2004). Briefly, cell pellets were resuspended in 100 μL of lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl pH 8.1) containing protease inhibitor cocktail (MilliporeSigma 05892791001) and incubated for 10 mins at 4° C. Then 400 μL of dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8.1, and 167 mM NaCl) containing protease inhibitor cocktail (MilliporeSigma 05892791001) was added. Samples were pulse sonicated with 2 rounds of 10 mins (30 s on/30 s off cycles, high frequency) in a rotating water bath sonicator (Diagenode Bioruptor) with 5 mins on ice between each round. 10 μL of sonicated sample was set aside as input control. Then 500 μL of dilution buffer (0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl pH 8.1, and 167 mM NaCl) containing protease inhibitor cocktail (MilliporeSigma 05892791001) and 1 μL of anti-V5 (Thermo Fisher Scientific R960-25) was added to the sonicated sample. ChIP samples were rotated end over end overnight at 4° C.

For each ChIP, 50 μL of Protein A/G Magnetic Beads (Thermo Fisher Scientific 88802) were washed twice with 1 mL of blocking buffer (0.5% Tween and 0.5% BSA in PBS) containing protease inhibitor cocktail (MilliporeSigma 05892791001) and then resuspended in 100 μL of blocking buffer. ChIP samples were transferred to the beads and rotated end over end for 1 h at 4° C. The ChIP supernatant was then removed, and the beads were washed twice with 200 μL of RIPA low-salt buffer (0.1% SDS, 1% Triton X-100, 1 mM EDTA, 20 mM Tris-HCl pH 8.1, 140 mM NaCl, 0.1% DOC), twice with 200 μL of RIPA high-salt buffer (0.1% SDS, 1% Triton X-100, 1 mM EDTA, 20 mM Tris-HCl pH 8.1, 500 mM NaCl, 0.1% DOC), twice with 200 μL of LiCl wash buffer (250 mM LiCl, 1% NP-40, 1% DOC, 1 mM EDTA, 10 mM Tris-HCl pH 8.1), and twice with 200 μL of TE (10 mM Tris-HCl pH 8.0, 1 mM EDTA pH 8.0). ChIP samples were eluted with 50 μL of elution buffer (10 mM Tris-HCl pH 8.0, 5 mM EDTA, 300 mM NaCl, 0.1% SDS). To the input control samples, 40 μL of water was added. Then 8 μL of reverse cross-linking buffer (250 mM Tris-HCl pH 6.5, 62.5 mM EDTA pH 8.0, 1.25 M NaCl, 5 mg/ml Proteinase K, 62.5 μg/ml RNase A) was added to the ChIP and input control samples, which were incubated at 65° C. for 5 h. After reverse cross-linking, samples were purified using 116 μL of SPRIselect Reagent (Beckman Coulter B23318).
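SPRI cleanups are usually described by the bead-to-sample volume ratio. Summing the volumes stated above shows that the ChIP and input samples both reach 58 μL before cleanup, so 116 μL of SPRIselect corresponds to a 2.0x ratio in each case. The helper name below is illustrative only.

```python
def spri_ratio(bead_volume_ul: float, sample_volume_ul: float) -> float:
    """SPRI bead-to-sample volume ratio (e.g. 2.0 means a 2.0x cleanup)."""
    return bead_volume_ul / sample_volume_ul


chip_ul = 50 + 8        # elution buffer + reverse cross-linking buffer
input_ul = 10 + 40 + 8  # input aliquot + water + reverse cross-linking buffer
for label, volume in (("ChIP", chip_ul), ("input", input_ul)):
    print(f"{label}: {volume} uL sample, {spri_ratio(116, volume):.1f}x SPRIselect")
# ChIP: 58 uL sample, 2.0x SPRIselect
# input: 58 uL sample, 2.0x SPRIselect
```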

ChIP samples were prepared for NGS with the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB E7645S) and deep-sequenced on the Illumina NextSeq platform (>60 million reads per condition). Bowtie (Langmead et al., 2009) was used to align paired-end reads to the UCSC hg38 human genome with the command line options “-q -X 300 --sam --chunkmbs 512”. Next, biological replicates were merged, and Model-based Analysis of ChIP-seq (MACS) (Feng et al., 2012) was run with the command line options “-g hs -B -S --mfold 6,30” to identify TF peaks. HOMER (Heinz et al., 2010) was used to discover motifs in the TF peak regions identified by MACS: the findMotifsGenome.pl program was run with the command line options “-size 200 -mask”, and the top 3 known and de novo motifs were reported. A TF was considered a potential regulator of a candidate gene if a TF peak region identified by MACS overlapped the 20 kb region centered on the transcription start site of the candidate gene, based on RefSeq annotations.
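The peak-to-gene assignment rule can be sketched as an interval-overlap test. This is a minimal illustration under stated assumptions: peaks are in 0-based, half-open (BED-style) coordinates, each gene is represented by a single RefSeq TSS, and the gene names and coordinates are hypothetical. Genes with several annotated TSSs would need each one checked.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Peak:
    """A MACS peak in 0-based, half-open [start, end) coordinates (assumed)."""
    chrom: str
    start: int
    end: int


def tss_window(tss: int, width: int = 20_000) -> Tuple[int, int]:
    """Window of `width` bp centered on the TSS, clipped at the chromosome start."""
    half = width // 2
    return max(0, tss - half), tss + half


def intervals_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def candidate_targets(peaks: List[Peak],
                      tss_by_gene: Dict[str, Tuple[str, int]]) -> List[str]:
    """Genes whose TSS-centered 20 kb window overlaps at least one TF peak."""
    hits = []
    for gene, (chrom, tss) in tss_by_gene.items():
        w_start, w_end = tss_window(tss)
        if any(p.chrom == chrom and intervals_overlap(p.start, p.end, w_start, w_end)
               for p in peaks):
            hits.append(gene)
    return hits


peaks = [Peak("chr1", 1_009_500, 1_009_900), Peak("chr2", 500, 800)]
tss = {"GENE_A": ("chr1", 1_000_000), "GENE_B": ("chr1", 1_050_000),
       "GENE_C": ("chr2", 5_000)}
print(candidate_targets(peaks, tss))  # ['GENE_A', 'GENE_C']
```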

Indel analysis. Cells plated in 96-well plates were grown to 60-80% confluency and assessed for indel rates as previously described (Joung et al., 2017b). Genomic DNA was harvested from cells using the QuickExtract DNA Extraction kit (Lucigen QE09050). The genomic region flanking the site of interest was amplified using NEBNext High-Fidelity 2X PCR Master Mix (New England BioLabs M0541L), first with region-specific primers (Table 13) for 15 cycles and then with barcoded primers for 15 cycles. PCR products were sequenced on the Illumina MiSeq platform (>10,000 reads per condition), and indels were quantified following the same published method (Joung et al., 2017b).
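As a simplified illustration of how an indel rate can be read out from aligned amplicon reads, the sketch below counts reads whose SAM CIGAR string contains an insertion (I) or deletion (D) operation. It is not the published pipeline: it assumes reads are already aligned to the amplicon reference and does not restrict indels to a window around the predicted cut site, which a real analysis may do. The CIGAR strings are hypothetical.

```python
import re
from typing import Iterable

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar: str) -> bool:
    """True if the aligned read carries an insertion (I) or deletion (D)."""
    return any(op in ("I", "D") for _, op in CIGAR_OP.findall(cigar))


def indel_rate(cigars: Iterable[str]) -> float:
    """Fraction of aligned reads that contain at least one indel."""
    cigars = list(cigars)
    if not cigars:
        raise ValueError("no aligned reads")
    return sum(has_indel(c) for c in cigars) / len(cigars)


reads = ["150M", "70M3D80M", "60M2I88M", "10S140M"]
print(indel_rate(reads))  # 0.5
```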

Click-iT EdU flow cytometry assay. Cells plated in 24-well plates were differentiated, and EdU incorporation was measured using the Click-iT EdU Alexa Fluor 488 Flow Cytometry Assay Kit (Thermo Fisher Scientific C10420) according to a modified version of the manufacturer's instructions. EdU was added to the culture medium to a final concentration of 10 μM for 2 h before cells were dissociated with Accutase (STEMCELL Technologies 07920) for 15-45 mins at 37° C. Cells were transferred to a 96-well plate, pelleted at 200×g for 5 mins, and washed once with 200 μL of 1% BSA (MilliporeSigma A9418) in PBS. Cells were resuspended in 100 μL of Click-iT fixative and incubated for 15 mins at room temperature in the dark. After fixation, cells were washed twice with 200 μL of 1% BSA (MilliporeSigma A9418) in PBS, resuspended in 100 μL of Click-iT saponin-based permeabilization and wash reagent, and incubated for 15 mins in the dark. To each sample, 500 μL of Click-iT reaction cocktail was added, and the reaction mixture was incubated for 30 mins at room temperature in the dark. Cells were washed twice with 200 μL of Click-iT saponin-based permeabilization and wash reagent and resuspended in 200 μL of 1% BSA (MilliporeSigma A9418) in PBS before analysis on a CytoFLEX Flow Cytometer (Beckman Coulter). For each sample, 10,000 cells were analyzed with FlowJo software. Significance was assessed using Student's t-test.
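A minimal sketch of the quantification and significance test is shown below, assuming each sample is summarized as the percentage of events above an EdU (Alexa Fluor 488) gate and that conditions are compared with a two-sample, equal-variance Student's t statistic. The gate value and replicate percentages are hypothetical; in practice the gate is typically set on a control stained without EdU. The p-value would then come from the t distribution with na + nb - 2 degrees of freedom, for example via scipy.stats.ttest_ind.

```python
import math
from statistics import mean, variance
from typing import Sequence


def percent_positive(intensities: Sequence[float], gate: float) -> float:
    """Percentage of events whose EdU signal is above the gate."""
    return 100.0 * sum(x > gate for x in intensities) / len(intensities)


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic using the pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical % EdU+ values for three replicate wells per condition
condition = [30.0, 32.0, 34.0]
control = [20.0, 22.0, 24.0]
t = student_t(condition, control)
print(round(t, 3))  # 6.124, with 4 degrees of freedom
```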

Electrophysiology. Whole-cell patch-clamp recordings were performed as described (doi: 10.1016/j.celrep.2018.04.066). Recording pipettes were pulled from thin-walled borosilicate glass capillary tubing (KG33, King Precision Glass, CA, USA) on a P-97 puller (Sutter Instrument, CA, USA) and had resistances of 3-5 MΩ when filled with internal solution (in mM: 128 K-gluconate, 10 HEPES, 10 phosphocreatine sodium salt, 1.1 EGTA, 5 ATP magnesium salt, and 0.4 GTP sodium salt; pH 7.3; 300-305 mOsm). The cultured cells were continuously perfused at 3 ml/min with extracellular solution (119 mM NaCl, 2.3 mM KCl, 2 mM CaCl2, 1 mM MgCl2, 15 mM HEPES, 5 mM glucose; pH 7.3-7.4; osmolarity adjusted to 325 mOsm with sucrose). All experiments were performed at room temperature unless otherwise specified.

Cells were visualized with a 40X water-immersion objective on an upright microscope (Olympus, Japan) equipped with IR-DIC optics. Recordings were made using a Multiclamp 700B amplifier (Molecular Devices, CA, USA) and Clampex 10.7 software (Molecular Devices, CA, USA). In current-clamp mode, the membrane potential was held at −65 mV, and step currents were then injected to elicit action potentials. Subsequent analysis was performed using Clampfit 10.7 software (Molecular Devices, CA, USA). Spontaneous AMPA receptor-mediated excitatory postsynaptic currents (sEPSCs) were recorded after entering whole-cell patch-clamp recording mode for at least 3 min. The data were stored on a computer for subsequent off-line analysis. Cells in which the series resistance (Rs) changed by >20% were excluded from data analysis. In addition, cells with Rs greater than 20 MΩ at any time during the recording were discarded.
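The two series-resistance exclusion criteria can be expressed as a per-cell filter, sketched below. It assumes Rs is sampled periodically during each recording and that a ">20% change" is measured relative to the first (baseline) measurement; the text does not specify the reference point. The Rs traces are hypothetical.

```python
from typing import Sequence


def keep_cell(rs_trace_mohm: Sequence[float], max_change: float = 0.20,
              max_rs_mohm: float = 20.0) -> bool:
    """Apply both series-resistance exclusion criteria to one recording.

    rs_trace_mohm: Rs values (MOhm) sampled over the recording; the first
    value is assumed to be the baseline for the percent-change criterion.
    """
    baseline = rs_trace_mohm[0]
    for rs in rs_trace_mohm:
        if rs > max_rs_mohm:
            return False  # Rs above 20 MOhm at any time
        if abs(rs - baseline) / baseline > max_change:
            return False  # Rs changed by more than 20% from baseline
    return True


print(keep_cell([10.0, 11.0, 11.5]))  # True: at most 15% change
print(keep_cell([10.0, 12.5]))        # False: 25% change
print(keep_cell([18.0, 21.0]))        # False: above 20 MOhm
```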

Reagent availability. The pooled and arrayed versions of MORF have been deposited at Addgene for distribution to the scientific community.

Code availability. Applicants have provided a Python script for aggregating gene lists from different datasets and selecting marker genes and TFs from MORF on the Feng Zhang lab GitHub page (github.com/fengzhanglab/TF_screen_manuscript).

REFERENCES

  • 1 Cohen, D. E. & Melton, D. Turning straw into gold: directing cell fate for regenerative medicine. Nat Rev Genet 12, 243-252, doi: 10.1038/nrg2938 (2011).
  • 2 Colman, A. & Dreesen, O. Pluripotent stem cells and disease modeling. Cell Stem Cell 5, 244-247, doi: 10.1016/j.stem.2009.08.010 (2009).
  • 3 Keller, G. Embryonic stem cell differentiation: emergence of a new era in biology and medicine. Genes Dev 19, 1129-1155, doi: 10.1101/gad.1303605 (2005).
  • 4 Kiskinis, E. & Eggan, K. Progress toward the clinical application of patient-specific pluripotent stem cells. J Clin Invest 120, 51-59, doi: 10.1172/JCI40553 (2010).
  • 5 Robinton, D. A. & Daley, G. Q. The promise of induced pluripotent stem cells in research and therapy. Nature 481, 295-305, doi: 10.1038/nature10761 (2012).
  • 6 Furuyama, K. et al. Diabetes relief in mice by glucose-sensing insulin-secreting human alpha-cells. Nature, doi: 10.1038/s41586-019-0942-8 (2019).
  • 7 Pang, Z. P. et al. Induction of human neuronal cells by defined transcription factors. Nature 476, 220-223, doi:10.1038/nature10202 (2011).
  • 8 Song, K. et al. Heart repair by reprogramming non-myocytes with cardiac transcription factors. Nature 485, 599-604, doi: 10.1038/nature11139 (2012).
  • 9 Sugimura, R. et al. Haematopoietic stem and progenitor cells from human pluripotent stem cells. Nature 545, 432-438, doi: 10.1038/nature22370 (2017).
  • 10 Takahashi, K. et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131, 861-872, doi:10.1016/j.cell.2007.11.019 (2007).
  • 11 Weintraub, H. et al. Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD. Proc Natl Acad Sci USA 86, 5434-5438 (1989).
  • 12 Zhang, Y. et al. Rapid single-step induction of functional neurons from human pluripotent stem cells. Neuron 78, 785-798, doi:10.1016/j.neuron.2013.05.029 (2013).
  • 13 Zhang, S. C., Wernig, M., Duncan, I. D., Brustle, O. & Thomson, J. A. In vitro differentiation of transplantable neural precursors from human embryonic stem cells. Nat Biotechnol 19, 1129-1133, doi: 10.1038/nbt1201-1129 (2001).
  • 14 Chambers, S. M. et al. Highly efficient neural conversion of human ES and iPS cells by dual inhibition of SMAD signaling. Nat Biotechnol 27, 275-280, doi:10.1038/nbt.1529 (2009).
  • 15 Hu, B. Y. et al. Neural differentiation of human induced pluripotent stem cells follows developmental principles but with variable potency. Proc Natl Acad Sci USA 107, 4335-4340, doi:10.1073/pnas.0910012107 (2010).
  • 16 Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583-588, doi: 10.1038/nature14136 (2015).
  • 17 Camp, J. G. et al. Human cerebral organoids recapitulate gene expression programs of fetal neocortex development. Proc Natl Acad Sci USA 112, 15672-15677, doi:10.1073/pnas.1520760112 (2015).
  • 18 Johnson, M. B. et al. Single-cell analysis reveals transcriptional heterogeneity of neural progenitors in human cortex. Nat Neurosci 18, 637-646, doi:10.1038/nn.3980 (2015).
  • 19 Llorens-Bobadilla, E. et al. Single-Cell Transcriptomics Reveals a Population of Dormant Neural Stem Cells that Become Activated upon Brain Injury. Cell Stem Cell 17, 329-340, doi:10.1016/j.stem.2015.07.002 (2015).
  • 20 Pollen, A. A. et al. Molecular identity of human outer radial glia during cortical development. Cell 163, 55-67, doi:10.1016/j.cell.2015.09.004 (2015).
  • 21 Shin, J. et al. Single-Cell RNA-Seq with Waterfall Reveals Molecular Cascades underlying Adult Neurogenesis. Cell Stem Cell 17, 360-372, doi:10.1016/j.stem.2015.07.013 (2015).
  • 22 Thomsen, E. R. et al. Fixed single-cell transcriptomic characterization of human radial glial diversity. Nat Methods 13, 87-93, doi:10.1038/nmeth.3629 (2016).
  • 23 Wu, J. Q. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc Natl Acad Sci USA 107, 5254-5259, doi:10.1073/pnas.0914114107 (2010).
  • 24 Zhang, Y. et al. Purification and Characterization of Progenitor and Mature Human Astrocytes Reveals Transcriptional and Functional Differences with Mouse. Neuron 89, 37-53, doi:10.1016/j.neuron.2015.11.013 (2016).
  • 25 Quadrato, G. et al. Cell diversity and network dynamics in photosensitive human brain organoids. Nature 545, 48-53, doi: 10.1038/nature22047 (2017).
  • 26 Nowakowski, T. J. et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science 358, 1318-1323, doi:10.1126/science.aap8809 (2017).
  • 27 Casarosa, S., Fode, C. & Guillemot, F. Mash1 regulates neurogenesis in the ventral telencephalon. Development 126, 525-534 (1999).
  • 28 Zhang, X. et al. Pax6 is a human neuroectoderm cell fate determinant. Cell Stem Cell 7, 90-100, doi:10.1016/j.stem.2010.04.017 (2010).
  • 29 Murre, C. et al. Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence. Cell 58, 537-544 (1989).
  • 30 Morotomi-Yano, K. et al. Human regulatory factor X 4 (RFX4) is a testis-specific dimeric DNA-binding protein that cooperates with other human RFX members. J Biol Chem 277, 836-842, doi:10.1074/jbc.M108638200 (2002).
  • 31 O'Roak, B. J. et al. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 338, 1619-1622, doi:10.1126/science.1227764 (2012).
  • 32 Smith, D. J. et al. Functional screening of 2 Mb of human chromosome 21q22.2 in transgenic mice implicates minibrain in learning defects associated with Down syndrome. Nat Genet 16, 28-36, doi:10.1038/ng0597-28 (1997).
  • 33 Fotaki, V. et al. Dyrk1A haploinsufficiency affects viability and causes developmental delay and abnormal brain morphology in mice. Mol Cell Biol 22, 6636-6647 (2002).
  • 34 Hammerle, B. et al. Transient expression of Mnb/Dyrk1a couples cell cycle exit and differentiation of neuronal precursors by inducing p27KIP1 expression and suppressing NOTCH signaling. Development 138, 2543-2554, doi:10.1242/dev.066167 (2011).
  • 35 Park, J. et al. Dyrk1A phosphorylates p53 and inhibits proliferation of embryonic neuronal cells. J Biol Chem 285, 31895-31906, doi:10.1074/jbc.M110.147520 (2010).
  • 36 Yabut, O., Domogauer, J. & D'Arcangelo, G. Dyrk1A overexpression inhibits proliferation and induces premature neuronal differentiation of neural progenitor cells. J Neurosci 30, 4004-4014, doi: 10.1523/JNEUROSCI.4711-09.2010 (2010).
  • 37 Soppa, U. et al. The Down syndrome-related protein kinase DYRK1A phosphorylates p27(Kip1) and Cyclin D1 and induces cell cycle exit and neuronal differentiation. Cell Cycle 13, 2084-2100, doi: 10.4161/cc.29104 (2014).
  • 38 Ashique, A. M. et al. The Rfx4 transcription factor modulates Shh signaling by regional control of ciliogenesis. Sci Signal 2, ra70, doi: 10.1126/scisignal.2000602 (2009).
  • 39 Blackshear, P. J. et al. Graded phenotypic response to partial and complete deficiency of a brain-specific transcript variant of the winged helix transcription factor RFX4. Development 130, 4539-4552, doi: 10.1242/dev.00661 (2003).
  • 40 Joung, J. et al. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening. Nat Protoc 12, 828-863, doi:10.1038/nprot.2017.016 (2017).
  • 41 Fulco, C. P. et al. Activity-by-Contact model of enhancer specificity from thousands of CRISPR perturbations. bioRxiv, 529990, doi: 10.1101/529990 (2019).
  • 42 Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049, doi: 10.1038/ncomms14049 (2017).
  • 43 Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33, 495-502, doi:10.1038/nbt.3192 (2015).
  • 44 Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25, doi:10.1186/gb-2009-10-3-r25 (2009).
  • 45 Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323, doi: 10.1186/1471-2105-12-323 (2011).
  • 46 ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636-640, doi: 10.1126/science.1105136 (2004).
  • 47 Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nat Protoc 7, 1728-1740, doi:10.1038/nprot.2012.101 (2012).
  • 48 Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589, doi:10.1016/j.molcel.2010.05.004 (2010).
  • 49 Campisi, J. (2001). Cellular senescence as a tumor-suppressor mechanism. Trends Cell Biol 11, S27-31.
  • 50 Cao, J., Spielmann, M., Qiu, X., Huang, X., Ibrahim, D. M., Hill, A. J., Zhang, F., Mundlos, S., Christiansen, L., Steemers, F. J., et al. (2019). The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496-502.
  • 51 Darnell, J. E., Jr. (2002). Transcription factors as targets for cancer therapy. Nat Rev Cancer 2, 740-749.
  • 52 De Rubeis, S., He, X., Goldberg, A. P., Poultney, C. S., Samocha, K., Cicek, A. E., Kou, Y., Liu, L., Fromer, M., Walker, S., et al. (2014). Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209-215.
  • 53 Englund, C., Fink, A., Lau, C., Pham, D., Daza, R. A., Bulfone, A., Kowalczyk, T., and Hevner, R. F. (2005). Pax6, Tbr2, and Tbr1 are expressed sequentially by radial glia, intermediate progenitor cells, and postmitotic neurons in developing neocortex. J Neurosci 25, 247-251.
  • 54 Frantz, G. D., Weimann, J. M., Levin, M. E., and McConnell, S. K. (1994). Otx1 and Otx2 define layers and regions in developing cerebral cortex and cerebellum. J Neurosci 14, 5725-5740.
  • 55 Gilbert, L. A., Horlbeck, M. A., Adamson, B., Villalta, J. E., Chen, Y., Whitehead, E. H., Guimaraes, C., Panning, B., Ploegh, H. L., Bassik, M. C., et al. (2014). Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation. Cell 159, 647-661.
  • 56 Gotz, M., Stoykova, A., and Gruss, P. (1998). Pax6 controls radial glia differentiation in the cerebral cortex. Neuron 21, 1031-1044.
  • 57 Liu, Y., Yu, C., Daley, T. P., Wang, F., Cao, W. S., Bhate, S., Lin, X., Still, C., 2nd, Liu, H., Zhao, D., et al. (2018). CRISPR Activation Screens Systematically Identify Factors that Drive Neuronal Fate and Reprogramming. Cell Stem Cell 23, 758-771 e758.
  • 58 Matsunaga, E., Nambu, S., Oka, M., and Iriki, A. (2015). Complex and dynamic expression of cadherins in the embryonic marmoset cerebral cortex. Dev Growth Differ 57, 474-483.
  • 59 Reinchisi, G., Ijichi, K., Glidden, N., Jakovcevski, I., and Zecevic, N. (2012). COUP-TFII expressing interneurons in human fetal forebrain. Cereb Cortex 22, 2820-2830.
  • 60 Schafer, S. T., Paquola, A. C. M., Stern, S., Gosselin, D., Ku, M., Pena, M., Kuret, T. J. M., Liyanage, M., Mansour, A. A., Jaeger, B. N., et al. (2019). Pathological priming causes developmental gene network heterochronicity in autistic subject-derived neurons. Nat Neurosci 22, 243-255.
  • 61 Shi, Y., Kirwan, P., and Livesey, F. J. (2012a). Directed differentiation of human pluripotent stem cells to cerebral cortex neurons and neural networks. Nat Protoc 7, 1836-1846.
  • 62 Shi, Y., Kirwan, P., Smith, J., Robinson, H. P., and Livesey, F. J. (2012b). Human cerebral cortex development from pluripotent stem cells to functional excitatory synapses. Nat Neurosci 15, 477-486, S471.
  • 63 Steele-Perkins, G., Plachez, C., Butz, K. G., Yang, G., Bachurski, C. J., Kinsman, S. L., Litwack, E. D., Richards, L. J., and Gronostajski, R. M. (2005). The transcription factor gene Nfib is essential for both lung maturation and brain development. Mol Cell Biol 25, 685-698.
  • 64 UniProt Consortium (2015). UniProt: a hub for protein information. Nucleic Acids Res 43, D204-212.
  • 65 Wolf, F. A., Angerer, P., and Theis, F. J. (2018). SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15.
  • 66 Zhang, H. M., Chen, H., Liu, W., Liu, H., Gong, J., Wang, H., and Guo, A. Y. (2012). AnimalTFDB: a comprehensive animal transcription factor database. Nucleic Acids Res 40, D144-149.
  • 67 Zhang, H. M., Liu, T., Liu, C. J., Song, S., Zhang, X., Liu, W., Jia, H., Xue, Y., and Guo, A. Y. (2015). AnimalTFDB 2.0: a resource for expression, prediction and functional study of animal transcription factors. Nucleic Acids Res 43, D76-81.

Tables

TABLE 1 TF ORF isoforms and respective ranks in the screens. The TF ORF library consisted of 90 TF isoforms of 70 TF genes that were synthesized with a 24-bp barcode (SEQ ID NO: 1-90). Ranks of TF ORF isoforms in each NP screening method are reported. For the arrayed screen, the median values of the SLC1A3 and VIM mRNA expression fold changes at day 7 were averaged, and the TFs were ranked based on the average. “NA” ranks represent cases where the SLC1A3 or VIM mRNA expression levels could not be detected. For the reporter cell line screen, the median values of the TF barcode enrichment in the SLC1A3 and VIM screens were averaged to determine the TF rank. For the Flow-FISH screens, the TF ranks were determined based on the average TF barcode enrichment. For the single-cell RNA-seq screen, TFs were ranked based on average correlation with radial glia from human fetal cortex and organoids in existing datasets. “NA” indicates that the TF barcode was not detected in the single-cell RNA-seq data.
Gene Name | RefSeq Isoform | Barcode (SEQ ID NO) | Average Rank | Arrayed Rank | Reporter Rank | Flow-FISH Rank (2 genes) | Flow-FISH Rank (10 genes) | scRNA-seq Rank | Perturb-seq Rank
ARX | NM_139058 | 1 | NA | NA | 88 | 39 | 83 | 37 | 78
ASCL1 | NM_004316 | 2 | 25 | 2 | 62 | 22 | 27 | 12 | 1
BCL11A | NM_022893 | 3 | NA | NA | 9 | 61 | 38 | 21 | 17
BCL11A | NM_138559 | 4 | 47.2 | 17 | 53 | 58 | 63 | 45 | 43
BRIP1 | NM_032043 | 5 | NA | NA | 33 | 12 | 14 | 23 | 5
CDK1 | NM_001170407 | 6 | 53.8 | 27 | 41 | 63 | 71 | 67 | 42
CDK1 | NM_001786 | 7 | 45.8 | 22 | 30 | 56 | 56 | 65 | 54
CENPA | NM_001042426 | 8 | 57.4 | 19 | 48 | 66 | 74 | 80 | 32
CHAF1A | NM_005483 | 9 | 48.4 | 57 | 87 | 33 | 30 | 35 | 9
CXXC1 | NM_014593 | 10 | NA | NA | 14 | 53 | 48 | 58 | 69
E2F1 | NM_005225 | 11 | NA | NA | 86 | 90 | 90 | 26 | 29
E2F2 | NM_004091 | 12 | NA | NA | 83 | 84 | 87 | 17 | 22
E2F7 | NM_203394 | 13 | NA | NA | 15 | 27 | 41 | 24 | 39
E2F8 | NM_001256372 | 14 | NA | NA | 84 | 14 | 15 | 42 | 12
EGR1 | NM_001964 | 15 | 68.6 | 60 | 65 | 80 | 69 | 69 | 74
EMX2 | NM_001165924 | 16 | 67.8 | 44 | 59 | 81 | 81 | 74 | 62
EMX2 | NM_004098 | 17 | NA | NA | 46 | 50 | 88 | 43 | 8
ENO1 | NM_001201483 | 18 | 53.6 | 40 | 39 | 67 | 60 | 62 | 25
ENO1 | NM_001428 | 19 | 61 | 33 | 64 | 71 | 77 | 60 | 59
EOMES | NM_001278183 | 20 | 61.8 | 41 | 67 | 68 | 61 | 72 | 76
EOMES | NM_005442 | 21 | 11.2 | 3 | 19 | 2 | 10 | 22 | 15
FANCD2 | NM_001018115 | 22 | NA | NA | 57 | 10 | 17 | NA | NA
FEZF2 | NM_018008 | 23 | 54.4 | 12 | 79 | 88 | 84 | 9 | 30
FOS | NM_005252 | 24 | 20.8 | 6 | 4 | 7 | 9 | 78 | 86
FOXG1 | NM_005249 | 25 | 59.6 | 32 | 85 | 65 | 59 | 57 | 77
FOXM1 | NM_001243089 | 26 | 73.8 | 54 | 71 | 85 | 89 | 70 | 23
FOXM1 | NM_202003 | 27 | 71.6 | 51 | 72 | 87 | 85 | 63 | 18
FOXN4 | NM_213596 | 28 | NA | NA | 28 | 41 | 46 | 36 | 45
GLI3 | NM_000168 | 29 | 42.8 | 46 | 1 | 75 | 42 | 50 | 84
H2AFX | NM_002105 | 30 | 62.4 | 26 | 61 | 72 | 76 | 77 | 67
HELLS | NM_001289068 | 31 | 29.4 | 39 | 7 | 25 | 24 | 52 | 37
HELLS | NM_001289073 | 32 | 36.8 | 36 | 32 | 40 | 36 | 40 | 24
HES1 | NM_005524 | 33 | NA | NA | 24 | 6 | 7 | 20 | 63
HES5 | NM_001010926 | 34 | 63.4 | 53 | 77 | 79 | 52 | 56 | 41
HMGB1 | NM_002128 | 35 | 46 | 45 | 40 | 57 | 54 | 34 | 36
HMGB2 | NM_002129 | 36 | 62 | 52 | 54 | 64 | 67 | 73 | 53
HOPX | NM_139212 | 37 | 67 | 30 | 63 | 78 | 80 | 84 | 52
ID3 | NM_002167 | 38 | 42.2 | 20 | 45 | 43 | 39 | 64 | 81
ID4 | NM_001546 | 39 | 31.4 | 14 | 47 | 35 | 20 | 41 | 50
INSM1 | NM_002196 | 40 | 50 | 35 | 13 | 86 | 78 | 38 | 82
KLF15 | NM_014079 | 41 | NA | NA | 34 | 26 | 28 | 47 | 80
LHX2 | NM_004789 | 42 | 12.6 | 13 | 25 | 9 | 6 | 10 | 55
MAZ | NM_001276275 | 43 | NA | NA | 70 | 82 | 73 | 51 | 31
MAZ | NM_001276276 | 44 | 69.4 | 29 | 66 | 83 | 82 | 87 | 65
MEIS1 | NM_002398 | 45 | 26.4 | 23 | 26 | 34 | 34 | 15 | 19
MXD3 | NM_001142935 | 46 | 52 | 59 | 44 | 48 | 55 | 54 | 75
MXD3 | NM_031300 | 47 | 44.4 | 56 | 43 | 55 | 43 | 25 | 64
MYBL2 | NM_001278610 | 48 | 68.6 | 49 | 82 | 70 | 66 | 76 | 79
NFATC4 | NM_001198966 | 49 | 48.4 | 15 | 78 | 89 | 44 | 16 | 14
NFATC4 | NM_001288802 | 50 | 66 | 55 | 90 | 51 | 75 | 59 | 83
NFIB | NM_001190737 | 51 | 5 | 1 | 5 | 8 | 8 | 3 | 16
NFIB | NM_005596 | 52 | 11.6 | 5 | 18 | 17 | 12 | 6 | 11
NFIC | NM_001245004 | 53 | 11.4 | 10 | 35 | 3 | 5 | 4 | 3
NFIC | NM_005597 | 54 | 7.6 | 7 | 21 | 4 | 4 | 2 | 6
NOTCH1 | NM_017617 | 55 | NA | NA | 75 | 28 | 31 | NA | NA
NOTCH2 | NM_024408 | 56 | NA | NA | 6 | 37 | 1 | NA | NA
NR1D1 | NM_021724 | 57 | 61.2 | 61 | 23 | 76 | 64 | 82 | 66
OTX1 | NM_014562 | 58 | 12.2 | 9 | 2 | 1 | 3 | 46 | 27
PAX6 | NM_000280 | 59 | 13.2 | 8 | 20 | 15 | 18 | 5 | 38
PAX6 | NM_001310159 | 60 | 13.6 | 18 | 8 | 16 | 19 | 7 | 26
PAX6 | NM_001310160 | 61 | 65.8 | 48 | 56 | 69 | 70 | 86 | 56
PLK4 | NM_001190799 | 62 | NA | NA | 42 | 31 | 29 | 27 | 85
PLK4 | NM_001190801 | 63 | NA | NA | 74 | 18 | 33 | 14 | 2
POU3F2 | NM_005604 | 64 | 54.8 | 58 | 55 | 62 | 86 | 13 | 13
PRMT7 | NM_001184824 | 65 | 50.6 | 37 | 51 | 44 | 68 | 53 | 21
RAD54L | NM_001142548 | 66 | 32 | 16 | 31 | 24 | 21 | 68 | 58
RCOR2 | NM_173587 | 67 | 39.2 | 11 | 58 | 47 | 50 | 30 | 61
RFX2 | NM_000635 | 68 | 39.8 | 31 | 50 | 45 | 45 | 28 | 70
RFX2 | NM_134433 | 69 | NA | NA | 11 | 46 | 35 | 71 | 34
RFX4 | NM_001206691 | 70 | 6.2 | 4 | 3 | 5 | 11 | 8 | 4
RFX4 | NM_032491 | 71 | 42.6 | 21 | 22 | 59 | 62 | 49 | 40
SMAD1 | NM_001003688 | 72 | 56.4 | 34 | 38 | 60 | 65 | 85 | 57
SMARCC1 | NM_003074 | 73 | NA | NA | 76 | 20 | 22 | 39 | 73
SOX1 | NM_005986 | 74 | NA | NA | 52 | 36 | 26 | 33 | 48
SOX11 | NM_003108 | 75 | NA | NA | 89 | 54 | 51 | 32 | 68
SOX2 | NM_003106 | 76 | NA | NA | 12 | 13 | 13 | 44 | 44
SOX4 | NM_003107 | 77 | NA | NA | 81 | 21 | 32 | 19 | 87
SOX9 | NM_000346 | 78 | NA | NA | 37 | 11 | 2 | 1 | 10
SSRP1 | NM_003146 | 79 | 40.2 | 38 | 17 | 38 | 47 | 61 | 49
STAT3 | NM_003150 | 80 | 43 | 24 | 16 | 42 | 58 | 75 | 33
STAT3 | NM_213662 | 81 | 52.2 | 43 | 68 | 32 | 37 | 81 | 47
TCF12 | NM_207037 | 82 | NA | NA | 10 | 23 | 16 | 31 | 20
TCF12 | NM_207038 | 83 | NA | NA | 60 | 19 | 23 | 29 | 72
TCF7L2 | NM_001146286 | 84 | NA | NA | 49 | 49 | 79 | 11 | 7
TEAD2 | NM_001256662 | 85 | 69.6 | 47 | 73 | 77 | 72 | 79 | 46
TEAD2 | NM_003598 | 86 | 45.8 | 25 | 69 | 29 | 40 | 66 | 51
TRIM28 | NM_005762 | 87 | 53.6 | 50 | 36 | 74 | 53 | 55 | 35
UHRF1 | NM_001290052 | 88 | 43.8 | 42 | 29 | 73 | 57 | 18 | 71
YAP1 | NM_001195045 | 89 | 47.8 | 28 | 27 | 52 | 49 | 83 | 60
ZFP36L1 | NM_001244701 | 90 | NA | NA | 80 | 30 | 25 | 48 | 28

SEQ ID NO | Barcode
1 | GACGATAGGTTGTGAGACGTAGAA
2 | ATCCCTCTGGATTGCGCATAGGAC
3 | TCGACGACGCTCAGCGGCCTATCA
4 | GGTTGGATCTTGTAGGGAGCTCGT
5 | CTTTCTTGCAGTCGCACCGAGCTC
6 | GGTTAGAAATTCACGTTCCTATAC
7 | GACTTGGGAACTCCGCAGAGACTG
8 | GACTAGTATCCTTCATAGCTCCTC
9 | ACTCCCTCCAAGGTGAGCGTCCTT
10 | ACGCTGAAATGACAATTTCCGGAA
11 | CGTATTTCGCACCCGGAAAGGGTG
12 | TCGGCCCTGCCATCACTTCATAGT
13 | TGCAAGAGCCACGGAGTTTATGTC
14 | GCTAATAAGTGTCCGAGCACTCAC
15 | CAGTGGAAACCTGACATTTACTTA
16 | AGCGCTCGCACGCGCGTATCCATC
17 | TTCCCTGCTGTCATGCTCACCATG
18 | ACTCAGATACTGGGCCTTCCTTCT
19 | GGTAATTATTGGGCGTTATCTCCA
20 | GACTTGTTACATCGTTCCATAGGC
21 | TACTGGAGCCGGCTACAGATGCTC
22 | TTCCTTATACGAACGCCGATTGAT
23 | GCCGCCACTTTCAGCCTCCGACGC
24 | ATCACATTGAAGTCATTTACACCT
25 | CGTGAACCTCAAGTCAAGCCAAAC
26 | CTAGCCCGAAGGACACCTATTCCC
27 | TGGGTTGTTCCGTTACTCATGAAG
28 | TCCGAAAGCCTTGTTATTGAAATT
29 | TAGCGGAAGTCTAACTGACTTCAT
30 | CGTTTCTTATACCTCCGGCCATGG
31 | GGCGCCGGGTTTGCAATTTCCCTG
32 | CGCATATCGATCCTGATCCCGAGC
33 | ATCTCCTGGGCCCGAAGTTGACGA
34 | AGTGTCGAGCTATATGCAGCCGAA
35 | TTCATGGTGCTCATGTCGTATTTC
36 | AGGCCATTATTGGGTCCCGTGATT
37 | TGCCTGAGTTAAAGACGCAGCTGC
38 | AGAAGTCCGAACTATACTTGGATC
39 | ATGGCTTCGTGACTGTGATTATCG
40 | CCTTAATGATGCGAAGGGAAATAT
41 | AGCACCAATGATCGCCGCTAGATG
42 | ACCGTAGGAGCACGTTGCGATGAA
43 | ACATGAACTATTAAACGAAACGCA
44 | TCCAGGTCACCGGCGCGGCAGAAA
45 | GTGGTGCTAACATTTAGTCGCTAG
46 | GCTCACGCGGGATTAATGCGTCGT
47 | TCGCACTTCTTATTCTGAGGCCCG
48 | GCTCTTTCATTTCCTAGCTCCCGA
49 | GTTATTTGGAAGACATTCACCTTG
50 | TTCAATCCGTTCCCTCGCACAGTG
51 | ACTGGTTCACGCAGAAACTGGTTC
52 | AACGGCGAACGAGACAGACCAACG
53 | ATAGGCAGAATCTACGAGAGGCTA
54 | CTAGCCATCTTCCGCAATGGGACC
55 | CGAATTATATTTGATTAACTGTCT
56 | TCGAGCAGAAGGTGAGTGTCGGAC
57 | ACGCTTGGACTGCACACGGAGCGT
58 | GACGAATGAAACGCAGGACCGAGA
59 | GATACGTGGTCACGTCGTCCCACA
60 | GGACGCCAACACCTACTGTCATAG
61 | GCTTCATAAACTAGTATCGGCAGA
62 | GACCGTTGCTGGTTGTACATGCGC
63 | CTGAGACACGTACTCCGGGTTCAC
64 | AGTAACTTCAGTGATTTAACATTC
65 | TCACAAACCGGAGACTGAGATCTT
66 | CCATCAGATCCTACGCCCTCGCTC
67 | AGCGCGGCGCACAACATGCTAACT
68 | AGCATGACATAGTACCGGAAGCCA
69 | ATTATAGCACATCCGCTACTTAGT
70 | GAGTCCTCGTAATGCGACACAGTC
71 | CTTATTATGCTATTCGCCTGAAAG
72 | CGTAGGCAGTTAGGGCTGTCTAGA
73 | AGAGTTGTTGAGCTGTCATGTGCA
74 | GTGCGATACTACATGTACCGTATA
75 | AGTAGTCATCTTATCTGTTCAGTC
76 | ACAGAAGGTCCGCCATCGTGCATA
77 | GGTCCGTTACAAATCCGATCTCCC
78 | TGTGACGTTTAATCGTAGAAGACA
79 | CTATCGAGTGTTGGTGATGACTTG
80 | GATCTAGCGCGAGACTCCGAACAT
81 | CGTGGGACCAGGGATATAGGGCCA
82 | AACCTTAGGTAAAGGGCCTACAGC
83 | TAAACAGGGAAAGTAATTCGGTGT
84 | GTAAGGTAGTTGGATCCAAATGAG
85 | AATCGAGGTAGCCAACGTATCGGG
86 | AGGAGACCTCCACTGCCATGCATT
87 | ACCTTGGCAGACCTATGGTGAGAG
88 | ATTCGGTTTGGTGGATCTCCGGAC
89 | ATTATCTCCGCGAGGACCGAATGG
90 | CGAGGTTGTGACCTCCTGGGAACA
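The caption of Table 1 does not state how the Average Rank column is computed, but the listed values match the mean of the arrayed, reporter, two Flow-FISH and scRNA-seq ranks (the Perturb-seq rank is not included), with NA whenever the arrayed rank is NA. The check below reproduces a few rows taken directly from the table; the function name is illustrative only.

```python
from typing import Optional


def average_rank(arrayed: Optional[float], reporter: float, fish2: float,
                 fish10: float, scrna: Optional[float]) -> Optional[float]:
    """Mean of five per-screen ranks, rounded to one decimal; None ('NA') if any is missing."""
    ranks = [arrayed, reporter, fish2, fish10, scrna]
    if any(r is None for r in ranks):
        return None
    return round(sum(ranks) / len(ranks), 1)


# Rows from Table 1: (isoform, arrayed, reporter, FISH-2, FISH-10, scRNA-seq, reported average)
rows = [
    ("ASCL1 NM_004316", 2, 62, 22, 27, 12, 25.0),
    ("NFIB NM_001190737", 1, 5, 8, 8, 3, 5.0),
    ("RFX4 NM_001206691", 4, 3, 5, 11, 8, 6.2),
    ("ARX NM_139058", None, 88, 39, 83, 37, None),
]
for name, *ranks, reported in rows:
    assert average_rank(*ranks) == reported, name
print("all rows match")
```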

TABLE 2 Radial glial cell markers
Gene Name 1-20 | Gene Name 21-40
VIM | FKBP10
PTN | LFNG
SLC1A3 | FADS2
CLU | EDNRB
CKB | PTPRZ1
NR2E1 | FAM107A
NCAN | OAF
TTYH1 | TNC
MLC1 | SFXN5
F3 | PDLIM4
FABP7 | PDLIM3
ATP1A2 | HES5
DBI | SPARCL1
GJA1 | COL11A1
FJX1 | ATP1B2
RGMA | DDAH1
SLC1A2 | MT3
PHGDH | ALDOC
GPR98 | NES
SCARA3 | PAX6

TABLE 3 TF isoforms in the barcoded human TF library. The TF library consisted of 1,836 genes covering 3,548 isoforms that overlapped between RefSeq and Gencode annotations, as well as 2 control vectors expressing GFP and mCherry. 593 of the 3,548 isoforms were obtained from the Broad Genomic Perturbation Platform (Broad GPP) and sequence verified. The rest of the isoforms were synthesized by Genewiz. Some of the Broad GPP TF ORFs contained V5 epitope tags. Each TF has a unique 24-bp barcode that facilitates identification in pooled screens.
Source | Name | Gene Name | RefSeq and Gencode ID | RefSeq Insert SEQ ID | ORF sequence SEQ ID | Barcode SEQ ID | Epitope Tag
Genewiz | TFORF0001 | HIF3A | NM_022462, ENST00000244303 | 91 | 3641 | 7191 | None
Genewiz | TFORF0002 | HIF3A | NM_152796, ENST00000472815 | 92 | 3642 | 7192 | None
Genewiz | TFORF0003 | HIF3A | NM_152794, ENST00000300862 | 93 | 3643 | 7193 | None
Genewiz | TFORF0004 | HIF3A | XM_005259153, ENST00000600383 | 94 | 3644 | 7194 | None
Genewiz | TFORF0005 | HIF3A | NM_152795, ENST00000377670 | 95 | 3645 | 7195 | None
Genewiz | TFORF0006 | TULP4 | XM_017011069, XM_017011070, NM_020245, ENST00000367097 | 96 | 3646 | 7196 | None
Genewiz | TFORF0007 | TULP4 | XM_017011071, NM_001007466, ENST00000367094 | 97 | 3647 | 7197 | None
Genewiz | TFORF0008 | ZNF709 | NM_152601, ENST00000397732 | 98 | 3648 | 7198 | None
Genewiz | TFORF0009 | ZNF708 | NM_021269, ENST00000356929 | 99 | 3649 | 7199 | None
Genewiz | TFORF0010 | ZNF879 | XM_011534550, XM_011534551, NM_001136116, XM_005265908, ENST00000444149 | 100 | 3650 | 7200 | None
Genewiz | TFORF0011 | ZNF878 | NM_001080404, ENST00000547628 | 101 | 3651 | 7201 | None
Genewiz | TFORF0012 | ZNF700 | NM_001271848, ENST00000622593 | 102 | 3652 | 7202 | None
Genewiz | TFORF0013 | ZNF700 | NM_144566, ENST00000254321 | 103 | 3653 | 7203 | None
Genewiz | TFORF0014 | ZNF707 | NM_001100599, NM_001100598, NM_173831, NM_001288806, NM_001288805, XM_011516977, ENST00000358656, ENST00000532158, ENST00000532205, ENST00000418203 | 104 | 3654 | 7204 | None
Genewiz | TFORF0015 | DRAP1 | NM_006442, ENST00000312515 | 105 | 3655 | 7205 | None
Genewiz | TFORF0016 | IRX5 | NM_005853, ENST00000394636 | 106 | 3656 | 7206 | None
Genewiz | TFORF0017 | IRX5 | NM_001252197, ENST00000320990 | 107 | 3657 | 7207 | None
Genewiz | TFORF0018 | IRX4 | NM_001278635, NM_001278633, ENST00000613726, ENST00000622814 | 108 | 3658 | 7208 | None
Genewiz | TFORF0019 | IRX4 | NM_016358, NM_001278634, NM_001278632, ENST00000513692, ENST00000231357, ENST00000505790 | 109 | 3659 | 7209 | None
Genewiz | TFORF0020 | IRX6 | NM_024335, ENST00000290552 | 110 | 3660 | 7210 | None
Genewiz | TFORF0021 | IRX1 | NM_024337, ENST00000302006 | 111 | 3661 | 7211 | None
Genewiz | TFORF0022 | IRX3 | NM_024336, ENST00000329734 | 112 | 3662 | 7212 | None
Genewiz | TFORF0023 | IRX2 | XM_011513979, NM_033267, NM_001134222, ENST00000382611, ENST00000302057 | 113 | 3663 | 7213 | None
Genewiz | TFORF0024 | FOXQ1 | NM_033260, ENST00000296839 | 114 | 3664 | 7214 | None
Genewiz | TFORF0025 | DHX9 | NM_001357, ENST00000367549 | 115 | 3665 | 7215 | None
Genewiz | TFORF0026 | ZNF45 | NM_003425, XM_017027224, XM_017027221, XM_017027217, XM_017027222, XM_011527267, XM_017027219, XM_017027218, XM_011527269, XM_011527273, XM_017027223, XM_017027220, XM_017027225, XM_017027226, XM_017027227, XM_011527271, ENST00000269973, ENST00000615985, ENST00000589703 | 116 | 3666 | 7216 | None
Genewiz | TFORF0027 | ZNF44 | NM_001164276, ENST00000356109 | 117 | 3667 | 7217 | None
Genewiz | TFORF0028 | ZNF44 | NM_016264, ENST00000355684 | 118 | 3668 | 7218 | None
Genewiz | TFORF0029 | ZNF43 | NM_001256653, ENST00000357491 | 119 | 3669 | 7219 | None
Genewiz | TFORF0030 | ZNF43 | NM_003423, ENST00000354959 | 120 | 3670 | 7220 | None
Genewiz | TFORF0031 | ZNF43 | XM_011528259, XM_011528257, XM_017027214, NM_001256648, NM_001256649, NM_001256650, XM_017027209, XM_017027210, XM_017027208, XM_017027216, XM_017027207, XM_017027212, XM_017027213, XM_017027211, XM_017027215, ENST00000594012, ENST00000595461, ENST00000598381 | 121 | 3671 | 7221 | None
Genewiz | TFORF0032 | GSC2 | NM_005315, ENST00000086933 | 122 | 3672 | 7222 | None
Genewiz | TFORF0033 | SP1 | NM_003109, XM_011538696, ENST00000426431 | 123 | 3673 | 7223 | None
Genewiz | TFORF0034 | SP2 | NM_003110, ENST00000376741 | 124 | 3674 | 7224 | None
Genewiz | TFORF0035 | SP3 | NM_003111, ENST00000310015 | 125 | 3675 | 7225 | None
Genewiz | TFORF0036 | SP3 | NM_001017371, ENST00000418194, ENST00000640958 | 126 | 3676 | 7226 | None
Genewiz | TFORF0037 | SP5 | NM_001003845, ENST00000375281 | 127 | 3677 | 7227 | None
Genewiz | TFORF0038 | SP7 | NM_001300837, XM_011537900, ENST00000537210 | 128 | 3678 | 7228 | None
Genewiz | TFORF0039 | SP7 | NM_152860, NM_001173467, ENST00000536324, ENST00000303846 | 129 | 3679 | 7229 | None
Genewiz | TFORF0040 | ZNF676 | NM_001001411, ENST00000397121 | 130 | 3680 | 7230 | None
Genewiz | TFORF0041 | ZNF675 | NM_138330, ENST00000359788 | 131 | 3681 | 7231 | None
Genewiz | TFORF0042 | ZNF674 | NM_001146291, ENST00000414387 | 132 | 3682 | 7232 | None
Genewiz | TFORF0043 | ZNF674 | NM_001039891, XM_011543941, ENST00000523374 | 133 | 3683 | 7233 | None
Genewiz | TFORF0044 | ZNF672 | NM_024836, XM_005270336, ENST00000306562 | 134 | 3684 | 7234 | None
Genewiz | TFORF0045 | ZNF671 | NM_024833, ENST00000317398 | 135 | 3685 | 7235 | None
Genewiz | TFORF0046 | MAX | NM_145113, ENST00000618858, ENST00000394606, ENST00000553928, ENST00000556979 | 136 | 3686 | 7236 | None
Genewiz | TFORF0047 | MAX | NM_001320415, XM_017021313, XM_017021312, ENST00000557277 | 137 | 3687 | 7237 | None
Genewiz | TFORF0048 | MAX | NM_197957, ENST00000341653 | 138 | 3688 | 7238 | None
Genewiz | TFORF0049 | MAX | NM_145112, ENST00000358402 | 139 | 3689 | 7239 | None
Genewiz | TFORF0050 | MAZ | NM_001276276, ENST00000562337 | 140 | 3690 | 7240 | None
Genewiz | TFORF0051 | MAZ | NM_002383, ENST00000322945 | 141 | 3691 | 7241 | None
Genewiz | TFORF0052 | MAZ | NM_001276275, ENST00000545521 | 142 | 3692 | 7242 | None
Genewiz | TFORF0053 | MAZ | NM_001042539, ENST00000219782 | 143 | 3693 | 7243 | None
Genewiz | TFORF0054 | ZNF679 | NM_153363, XM_017011797, ENST00000421025, ENST00000255746 | 144 | 3694 | 7244 | None
Genewiz | TFORF0055 | ZNF678 | NM_178549 | 145 | 3695 | 7245 | None
Genewiz | TFORF0056 | MAF | XM_017023234, XM_017023233, XM_017023235, ENST00000569649 | 146 | 3696 | 7246 | None
Genewiz | TFORF0057 | MAF | NM_001031804, ENST00000393350 | 147 | 3697 | 7247 | None
Genewiz | TFORF0058 | MAF | NM_005360, ENST00000326043 | 148 | 3698 | 7248 | None
Genewiz | TFORF0059 | CTBP1 | NM_001328, ENST00000290921 | 149 | 3699 | 7249 | None
Genewiz | TFORF0060 | CTBP2 | NM_001321014, NM_001290215, NM_001321013, NM_001321012, NM_001290214, NM_001329, NM_001083914, XM_017015757, XM_011539355, XM_005269567, XM_005269564, XM_005269571, XM_017015756, XM_011539354, XM_011539351, XM_006717642, XM_011539353, XM_005269561, XM_005269569, XM_005269568, XM_005269572, ENST00000337195, ENST00000531469, ENST00000494626, ENST00000411419 | 150 | 3700 | 7250 | None
Genewiz | TFORF0061 | CTBP2 | XM_011539349, ENST00000334808 | 151 | 3701 | 7251 | None
Genewiz | TFORF0062 | CTBP2 | NM_022802, ENST00000309035 | 152 | 3702 | 7252 | None
Genewiz | TFORF0063 | GTF3C2 | NM_001521, NM_001035521, ENST00000359541, ENST00000264720 | 153 | 3703 | 7253 | None
Genewiz | TFORF0064 | DENND4A | NM_005848, XM_017021863, XM_005254121, ENST00000431932 | 154 | 3704 | 7254 | None
Genewiz | TFORF0065 | DENND4A | NM_001144823, ENST00000443035 | 155 | 3705 | 7255 | None
Genewiz | TFORF0066 | ZNF451 | NM_001257273, ENST00000370708 | 156 | 3706 | 7256 | None
Genewiz | TFORF0067 | ZNF451 | NM_015555, ENST00000357489 | 157 | 3707 | 7257 | None
Genewiz | TFORF0068 | ZNF451 | NM_001031623, ENST00000370706 | 158 | 3708 | 7258 | None
Genewiz | TFORF0069 | CSDE1 | NM_007158, NM_001242893, ENST00000339438, ENST00000261443 | 159 | 3709 | 7259 | None
Genewiz | TFORF0070 | CSDE1 | NM_001242891, ENST00000610726, ENST00000438362 | 160 | 3710 | 7260 | None
Genewiz | TFORF0071 | CSDE1 | NM_001130523, ENST00000369530 | 161 | 3711 | 7261 | None
Genewiz | TFORF0072 | CSDE1 | NM_001007553, NM_001242892, ENST00000358528, ENST00000534699 | 162 | 3712 | 7262 | None
Genewiz | TFORF0073 | ZNF454 | NM_001178090, NM_182594, NM_001323306, NM_001178089, ENST00000320129, ENST00000519564 | 163 | 3713 | 7263 | None
Genewiz | TFORF0074 | BACH2 | NM_021813, NM_001170794, XM_017011166, XM_011536040, XM_017011165, XM_011536039, XM_005248759, ENST00000257749, ENST00000537989, ENST00000343122 | 164 | 3714 | 7264 | None
Genewiz | TFORF0075 | KCNIP4 | XM_017008653, NM_001035004, NM_147182, ENST00000359001, ENST00000509207 | 165 | 3715 | 7265 | None
Genewiz | TFORF0076 | KCNIP4 | NM_001035003, ENST00000382148 | 166 | 3716 | 7266 | None
Genewiz | TFORF0077 | KCNIP4 | NM_147183, ENST00000382150 | 167 | 3717 | 7267 | None
Genewiz | TFORF0078 | KCNIP4 | NM_147181, ENST00000447367 | 168 | 3718 | 7268 | None
Genewiz | TFORF0079 | KCNIP3 | NM_013434, ENST00000295225 | 169 | 3719 | 7269
None Genewiz TFORF0080 KCNIP3 NM_001034914, ENST00000468529 170 3720 7270 None Genewiz TFORF0081 KCNIP2 NM_014591, ENST00000461105 171 3721 7271 None Genewiz TFORF0082 KCNIP2 NM_173194, ENST00000348850 172 3722 7272 None Genewiz TFORF0083 KCNIP2 NM_173193, ENST00000353068 173 3723 7273 None Genewiz TFORF0084 KCNIP2 NM_173195, ENST00000343195 174 3724 7274 None Genewiz TFORF0085 KCNIP2 NM_173191, ENST00000356640 175 3725 7275 None Genewiz TFORF0086 LIN28B XM_011535818, ENST00000637759 176 3726 7276 None Genewiz TFORF0087 LIN28A NM_024674, XM_011542148, ENST00000326279, 177 3727 7277 None ENST00000254231 Genewiz TFORF0088 ZNF408 NM_024741, ENST00000311764 178 3728 7278 None Genewiz TFORF0089 ZEB1 NM_001128128, ENST00000446923 179 3729 7279 None Genewiz TFORF0090 ZEB1 NM_001174093, ENST00000560721 180 3730 7280 None Genewiz TFORF0091 ZEB1 NM_001174096, ENST00000361642 181 3731 7281 None Genewiz TFORF0092 ZEB1 NM_001174095, ENST00000542815 182 3732 7282 None Genewiz TFORF0093 ZEB1 NM_030751, ENST00000320985 183 3733 7283 None Genewiz TFORF0094 ZEB2 XM_017005414, XM_017005415, ENST00000638087, 184 3734 7284 None ENST00000638007, ENST00000636413, ENST00000637045, ENST00000637304 Genewiz TFORF0095 ZEB2 NM_001171653, ENST00000539609 185 3735 7285 None Genewiz TFORF0096 ZEB2 NM_014795, XM_006712881, XM_006712882, 186 3736 7286 None ENST00000627532, ENST00000409487, ENST00000558170 Genewiz TFORF0097 PSMD14 NM_005805, ENST00000409682 187 3737 7287 None Genewiz TFORF0098 MSGN1 NM_001105569, ENST00000281047 188 3738 7288 None Genewiz TFORF0099 HLF XM_017024556, ENST00000430986, ENST00000573945, 189 3739 7289 None ENST00000575345 Genewiz TFORF0100 NOV NM_002514, ENST00000259526 190 3740 7290 None Genewiz TFORF0101 FOXC2 NM_005251, ENST00000320354 191 3741 7291 None Genewiz TFORF0102 FOXC1 NM_001453, ENST00000380874 192 3742 7292 None Genewiz TFORF0103 LEUTX NM_001143832, ENST00000396841 193 3743 7293 None Genewiz TFORF0104 SMAD9 NM_001127217, ENST00000399275, ENST00000379826 194 
3744 7294 None Genewiz TFORF0105 TRERF1 NM_001297573, XM_017011048, ENST00000541110 195 3745 7295 None Genewiz TFORF0106 TRERF1 XM_017011053, XM_017011052, ENST00000340840 196 3746 7296 None Genewiz TFORF0107 TRERF1 NM_033502, XM_017011049, ENST00000372922 197 3747 7297 None Genewiz TFORF0108 TRERF1 XM_017011054, ENST00000354325 198 3748 7298 None Genewiz TFORF0109 SMAD6 NM_005585, ENST00000288840 199 3749 7299 None Genewiz TFORF0110 SMAD7 NM_001190821, ENST00000589634 200 3750 7300 None Genewiz TFORF0111 SMAD7 NM_001190822, ENST00000591805 201 3751 7301 None Genewiz TFORF0112 SMAD2 NM_001135937, XM_017025746, ENST00000356825, 202 3752 7302 None ENST00000586040 Genewiz TFORF0113 SMAD3 NM_001145104, ENST00000537194 203 3753 7303 None Genewiz TFORF0114 SMAD3 NM_001145102, ENST00000540846 204 3754 7304 None Genewiz TFORF0115 SMAD3 NM_001145103, ENST00000439724 205 3755 7305 None Genewiz TFORF0116 FOXM1 NM_001243088, ENST00000627656 206 3756 7306 None Genewiz TFORF0117 FOXM1 NM_202003, ENST00000361953 207 3757 7307 None Genewiz TFORF0118 FOXM1 NM_202002, ENST00000342628 208 3758 7308 None Genewiz TFORF0119 ELMSAN1 XM_005268204, XM_005268205, XM_005268206, 209 3759 7309 None ENST00000423556 Genewiz TFORF0120 ELMSAN1 NM_194278, NM_001043318, ENST00000394071, 210 3760 7310 None ENST00000286523 Genewiz TFORF0121 BARX2 NM_003658, ENST00000281437 211 3761 7311 None Genewiz TFORF0122 BARX1 NM_021570, ENST00000253968 212 3762 7312 None Genewiz TFORF0123 OLIG2 NM_005806, XM_005260908, ENST00000382357, 213 3763 7313 None ENST00000333337 Genewiz TFORF0124 OLIG1 NM_138983, ENST00000382348 214 3764 7314 None Genewiz TFORF0125 RLF NM_012421, ENST00000372771 215 3765 7315 None Genewiz TFORF0126 CXXC1 NM_014593, XM_017025718, ENST00000285106 216 3766 7316 None Genewiz TFORF0127 SP8 NM_198956, ENST00000361443 217 3767 7317 None Genewiz TFORF0128 SP8 NM_182700, ENST00000418710 218 3768 7318 None Genewiz TFORF0129 SP9 NM_001145250, ENST00000394967 219 3769 7319 None Genewiz TFORF0130 
HMGB3 NM_001301229, NM_005342, NM_001301228, 220 3770 7320 None ENST00000325307, ENST00000448905 Genewiz TFORF0131 HMGB4 NM_145205, ENST00000519684, ENST00000522796 221 3771 7321 None Genewiz TFORF0132 FIZ1 NM_032836, XM_005259352, ENST00000221665 222 3772 7322 None Genewiz TFORF0133 ZNF780A NM_001142579, ENST00000414720 223 3773 7323 None Genewiz TFORF0134 ZNF780A XM_011526772, XM_011526771, XM_005258773, 224 3774 7324 None XM_006723150, NM_001142577, ENST00000455521, ENST00000594395 Genewiz TFORF0135 ZNF780A XM_017026617, XM_017026618, XM_017026619, 225 3775 7325 None NM_001010880, NM_001142578, ENST00000340963, ENST00000595687 Genewiz TFORF0136 ZNF780B NM_001005851, XM_005258593, XM_017026427, 226 3776 7326 None XM_017026426, XM_017026425, ENST00000617676, ENST00000434248 Genewiz TFORF0137 NFX1 NM_002504, ENST00000379540 227 3777 7327 None Genewiz TFORF0138 NFX1 NM_147134, ENST00000318524 228 3778 7328 None Genewiz TFORF0139 UBTF NM_001076684, NM_001076683, XM_017025004, 229 3779 7329 None XM_017025003, ENST00000343638, ENST00000533177, ENST00000393606, ENST00000526094 Genewiz TFORF0140 UBTF NM_014233, XM_006722059, XM_006722061, 230 3780 7330 None XM_006722060, ENST00000302904, ENST00000436088, ENST00000529383 Genewiz TFORF0141 POU5F1B NM_001159542, ENST00000465342 231 3781 7331 None Genewiz TFORF0142 FXN NM_001161706, ENST00000396364 232 3782 7332 None Genewiz TFORF0143 FXN NM_181425, ENST00000396366 233 3783 7333 None Genewiz TFORF0144 PHOX2A NM_005169, ENST00000298231 234 3784 7334 None Genewiz TFORF0145 PHOX2B NM_003924, ENST00000226382 235 3785 7335 None Genewiz TFORF0146 THAP1 NM_199003, ENST00000345117 236 3786 7336 None Genewiz TFORF0147 SNW1 NM_001318844, ENST00000555761 237 3787 7337 None Genewiz TFORF0148 THAP4 NM_001164356, ENST00000402136 238 3788 7338 None Genewiz TFORF0149 THAP4 NM_015963, ENST00000407315 239 3789 7339 None Genewiz TFORF0150 THAPS NM_182529, ENST00000313516 240 3790 7340 None Genewiz TFORF0151 THAP5 NM_001130475, ENST00000415914 
241 3791 7341 None Genewiz TFORF0152 THAP6 NM_001317791, XM_017007800, ENST00000380837 242 3792 7342 None Genewiz TFORF0153 THAP6 XM_005262774, ENST00000507556 243 3793 7343 None Genewiz TFORF0154 THAP7 NM_030573, NM_001008695, ENST00000215742, 244 3794 7344 None ENST00000399133 Genewiz TFORF0155 THAP9 NM_024672, ENST00000302236 245 3795 7345 None Genewiz TFORF0156 EWSR1 NM_013986, ENST00000414183 246 3796 7346 None Genewiz TFORF0157 EWSR1 NM_001163286, ENST00000332035 247 3797 7347 None Genewiz TFORF0158 EWSR1 NM_005243, ENST00000397938 248 3798 7348 None Genewiz TFORF0159 EWSR1 NM_001163287, ENST00000333395 249 3799 7349 None Genewiz TFORF0160 EWSR1 NM_001163285, ENST00000406548 250 3800 7350 None Genewiz TFORF0161 MLLT1 NM_005934, ENST00000252674 251 3801 7351 None Genewiz TFORF0162 PES1 NM_001282328, NM_001282327, XM_017028678, 252 3802 7352 None ENST00000402281, ENST00000405677 Genewiz TFORF0163 PES1 NM_001243225, ENST00000335214 253 3803 7353 None Genewiz TFORF0164 SIX4 NM_017420, ENST00000216513 254 3804 7354 None Genewiz TFORF0165 SIX5 NM_175875, ENST00000317578 255 3805 7355 None Genewiz TFORF0166 SIX6 NM_007374, ENST00000327720 256 3806 7356 None Genewiz TFORF0167 SIX1 NM_005982, ENST00000247182 257 3807 7357 None Genewiz TFORF0168 SIX2 NM_016932, ENST00000303077 258 3808 7358 None Genewiz TFORF0169 SIX3 NM_005413, ENST00000260653 259 3809 7359 None Genewiz TFORF0170 ZNF587B NM_001204818, ENST00000442832 260 3810 7360 None Genewiz TFORF0171 ZFP69 XM_006710606, NM_198494, NM_001320179, 261 3811 7361 None ENST00000372706, ENST00000372705 Genewiz TFORF0172 ONECUT2 NM_004852, ENST00000491143 262 3812 7362 None Genewiz TFORF0173 RELB XM_005259127, ENST00000505236 263 3813 7363 None Genewiz TFORF0174 RELA NM_021975, ENST00000406246 264 3814 7364 None Genewiz TFORF0175 RELA NM_001243985, ENST00000612991 265 3815 7365 None Genewiz TFORF0176 RELA NM_001145138, ENST00000308639 266 3816 7366 None Genewiz TFORF0177 HDGF NM_001126051, ENST00000368209 267 3817 7367 
None Genewiz TFORF0178 HDGF NM_001126050, ENST00000368206 268 3818 7368 None Genewiz TFORF0179 ZFP62 NM_152283, XM_017009714, XM_017009712, 269 3819 7369 None XM_017009710, XM_017009715, XM_017009713, XM_017009711, ENST00000512132 Genewiz TFORF0180 ZFP62 NM_001172638, XM_017009717, ENST00000502412 270 3820 7370 None Genewiz TFORF0181 ESX1 NM_153448, ENST00000372588 271 3821 7371 None Genewiz TFORF0182 SHOX2 NM_006884, ENST00000441443 272 3822 7372 None Genewiz TFORF0183 SHOX2 NM_001163678, ENST00000483851 273 3823 7373 None Genewiz TFORF0184 SHOX2 XM_006713728, XM_017007055, ENST00000425436 274 3824 7374 None Genewiz TFORF0185 SHOX2 NM_003030, ENST00000389589 275 3825 7375 None Genewiz TFORF0186 DMRTC1 XM_017029725, XM_005262288, XM_005262287, 276 3826 7376 None NM_033053, ENST00000595412 Genewiz TFORF0187 DMRTC2 XM_017027128, XM_017027126, XM_017027125, 277 3827 7377 None XM_017027127, ENST00000596827 Genewiz TFORF0188 DMRTC2 NM_001040283, ENST00000269945 278 3828 7378 None Genewiz TFORF0189 RFX8 NM_001145664, ENST00000428343 279 3829 7379 None Genewiz TFORF0190 RFX4 NM_213594, ENST00000392842 280 3830 7380 None Genewiz TFORF0191 RFX4 NM_032491, ENST00000229387 281 3831 7381 None Genewiz TFORF0192 RFX4 NM_001206691, ENST00000357881 282 3832 7382 None Genewiz TFORF0193 RFX5 NM_000449, NM_001025603, XM_011509850, 283 3833 7383 None XM_011509848, XM_005245405, XM_011509849, XM_005245406, XM_011509847, ENST0290524, ENST00000368870, ENST00000452671 Genewiz TFORF0194 RFX6 NM_173560, ENST00000332958 284 3834 7384 None Genewiz TFORF0195 RFX7 XM_017022508, XM_011521925, XM_017022507, 285 3835 7385 None ENST00000559447 Genewiz TFORF0196 ZNF844 NM_001136501, ENST00000439326 286 3836 7386 None Genewiz TFORF0197 RFX1 NM_002918, ENST00000254325 287 3837 7387 None Genewiz TFORF0198 RFX2 NM_134433, ENST00000592546 288 3838 7388 None Genewiz TFORF0199 RFX2 NM_000635, XM_011528171, ENST00000303657, 289 3839 7389 None ENST00000359161 Genewiz TFORF0200 RFX3 NM_002919, ENST00000358730 
290 3840 7390 None Genewiz TFORF0201 RFX3 NM_001282117, ENST00000302303 291 3841 7391 None Genewiz TFORF0202 RFX3 NM_134428, NM_001282116, ENST00000382004, 292 3842 7392 None ENST00000617270 Genewiz TFORF0203 ELK1 NM_001257168, ENST00000343894 293 3843 7393 None Genewiz TFORF0204 ELK4 NM_001973, XM_005244950, XM_005244951, 294 3844 7394 None ENST00000616704, ENST00000357992 Genewiz TFORF0205 DBP NM_001352, ENST00000222122 295 3845 7395 None Genewiz TFORF0206 ETV3L NM_001004341, ENST00000454449 296 3846 7396 None Genewiz TFORF0207 FOXP2 NM_148899, ENST00000360232 297 3847 7397 None Genewiz TFORF0208 FOXP2 NM_014491, ENST00000393494, ENST00000350908 298 3848 7398 None Genewiz TFORF0209 FOXP2 XM_017012801, NM_148898, ENST00000408937 299 3849 7399 None Genewiz TFORF0210 FOXP2 NM_148900, ENST00000403559 300 3850 7400 None Genewiz TFORF0211 FOXP3 XM_006724533, ENST00000455775 301 3851 7401 None Genewiz TFORF0212 FOXP3 NM_001114377, ENST00000376199, ENST00000518685 302 3852 7402 None Genewiz TFORF0213 FOXP1 NM_001244813, XM_011533588, ENST00000614176 303 3853 7403 None Genewiz TFORF0214 FOXP1 NM_001244816, NM_032682, XM_011533585, 304 3854 7404 None XM_006713102, XM_017006165, XM_011533584, XM_006713103, XM_006713104, XM_017006166, NM_001244814, ENST00000318789, ENST00000475937, ENST00000498215 Genewiz TFORF0215 FOXP1 NM_001244808, ENST00000493089 305 3855 7405 None Genewiz TFORF0216 FOXP1 NM_001244812, ENST00000484350 306 3856 7406 None Genewiz TFORF0217 FOXP1 NM_001244815, ENST00000491238 307 3857 7407 None Genewiz TFORF0218 FOXP1 NM_001244810, ENST00000615603 308 3858 7408 None Genewiz TFORF0219 ADNP NM_001282532, NM_001282531, NM_015339, 309 3859 7409 None NM_181442, XM_017027758, XM_017027759, XM_011528747, XM_017027757, ENST00000371602, ENST00000621696, ENST00000349014, ENST00000396029, ENST00000396032 Genewiz TFORF0220 FOXP4 NM_138457, ENST00000373063 310 3860 7410 None Genewiz TFORF0221 FOXP4 NM_001012427, ENST00000373057 311 3861 7411 None Genewiz TFORF0222 FOXP4 
NM_001012426, ENST00000373060, ENST00000307972 312 3862 7412 None Genewiz TFORF0223 ZNF479 XM_017012777, XM_011515604, XM_011515608, 313 3863 7413 None NM_033273, ENST00000331162, ENST00000319636 Genewiz TFORF0224 ZNF592 XM_005254996, XM_017022734, NM_014630, 314 3864 7414 None XM_011522247, XM_011522246, ENST00000560079, ENST00000299927 Genewiz TFORF0225 ZNF593 XM_017001398, ENST00000270812 315 3865 7415 None Genewiz TFORF0226 ZNF593 NM_015871, ENST00000374266 316 3866 7416 None Genewiz TFORF0227 ZNF596 NM_001287256, NM_001287255, NM_173539, 317 3867 7417 None NM_001042416, NM_001042415, NM_001287254, XM_017013166, ENST00000320552, ENST00000308811, ENST00000398612 Genewiz TFORF0228 ZNF594 NM_032530, XM_005256827, ENST00000399604, 318 3868 7418 None ENST00000575779 Genewiz TFORF0229 ZNF595 NM_001286052, ENST00000609518 319 3869 7419 None Genewiz TFORF0230 ZNF595 NM_001286053, NM_001286054, ENST00000608255 320 3870 7420 None Genewiz TFORF0231 ZNF595 NM_182524, ENST00000610261 321 3871 7421 None Genewiz TFORF0232 ZNF599 NM_001007248, ENST00000329285 322 3872 7422 None Genewiz TFORF0233 MNX1 NM_005515, ENST00000252971 323 3873 7423 None Genewiz TFORF0234 MNX1 NM_001165255, ENST00000543409 324 3874 7424 None Genewiz TFORF0235 MYCN NM_005378, NM_001293228, XM_017004168, 325 3875 7425 None ENST00000281043 Genewiz TFORF0236 MYCN NM_001293231, ENST00000638417 326 3876 7426 None Genewiz TFORF0237 ZNF600 NM_198457, ENST00000338230 327 3877 7427 None Genewiz TFORF0238 MYCL NM_001033082, ENST00000397332 328 3878 7428 None Genewiz TFORF0239 MYCL NM_001033081, ENST00000372816 329 3879 7429 None Genewiz TFORF0240 MYCL NM_005376, ENST00000372815 330 3880 7430 None Genewiz TFORF0241 ZNF606 NM_025027, XM_005259276, ENST00000341164 331 3881 7431 None Genewiz TFORF0242 ZNF607 NM_032689, XM_006723435, ENST00000355202 332 3882 7432 None Genewiz TFORF0243 ZNF607 NM_001172677, XM_006723436, ENST00000395835 333 3883 7433 None Genewiz TFORF0244 SUB1 XM_017008986, XM_017008987, NM_006713, 
334 3884 7434 None XM_011513944, ENST00000506237, ENST00000512913, ENST00000265073, ENST00000515355, ENST00000502897 Genewiz TFORF0245 CAMTAZ NM_001171166, ENST00000381311 335 3885 7435 None Genewiz TFORF0246 CAMTA2 NM_001171167, ENST00000414043 336 3886 7436 None Genewiz TFORF0247 CAMTA2 XM_006721478, ENST00000572543 337 3887 7437 None Genewiz TFORF0248 CAMTA2 NM_015099, ENST00000348066 338 3888 7438 None Genewiz TFORF0249 CAMTAZ NM_001171168, ENST00000361571 339 3889 7439 None Genewiz TFORF0250 VEZF1 NM_007146, ENST00000581208 340 3890 7440 None Genewiz TFORF0251 VEZF1 XM_017025018, XM_017025017, ENST00000584396 341 3891 7441 None Genewiz TFORF0252 CAMTA1 NM_001242701, ENST00000557126 342 3892 7442 None Genewiz TFORF0253 CAMTA1 NM_015215, ENST00000303635 343 3893 7443 None Genewiz TFORF0254 CAMTA1 NM_001195563, ENST00000473578 344 3894 7444 None Genewiz TFORF0255 TCF4 NM_001243228, ENST00000564403 345 3895 7445 None Genewiz TFORF0256 TCF4 NM_001243233, XM_017025956, ENST00000537856, 346 3896 7446 None ENST00000561992, ENST00000570177 Genewiz TFORF0257 TCF4 NM_001306208, ENST00000564228 347 3897 7447 None Genewiz TFORF0258 TCF4 XM_017025949, XM_017025937, XM_017025948, 348 3898 7448 None ENST00000568740 Genewiz TFORF0259 TCF4 NM_001243236, ENST00000561831 349 3899 7449 None Genewiz TFORF0260 TCF4 NM_001243226, ENST00000398339 350 3900 7450 None Genewiz TFORF0261 TCF4 NM_001243234, ENST00000457482 351 3901 7451 None Genewiz TFORF0262 TCF4 NM_001306207, XM_017025947, XM_017025936, 352 3902 7452 None XM_017025946, ENST00000540999 Genewiz TFORF0263 TCF4 XM_011526160, XM_005266754, XM_005266755, 353 3903 7453 None XM_005266752, ENST00000636822 Genewiz TFORF0264 TCF4 NM_001243227, XM_005266744, XM_005266743, 354 3904 7454 None XM_005266739, ENST00000636400, ENST00000568673, ENST00000537578 Genewiz TFORF0265 TCF4 NM_001243232, ENST00000544241 355 3905 7455 None Genewiz TFORF0266 TCF4 NM_001243231, ENST00000543082 356 3906 7456 None Genewiz TFORF0267 TCF4 XM_017025939, 
ENST00000638154 357 3907 7457 None Genewiz TFORF0268 TCF4 NM_001243235, ENST00000570287 358 3908 7458 None Genewiz TFORF0269 TCF4 NM_001243230, ENST00000566286 359 3909 7459 None Genewiz TFORF0270 TCF7 NM_201632, NM_213648, ENST00000520958, 360 3910 7460 None ENST00000395023 Genewiz TFORF0271 TCF7 NM_001134851, ENST00000518915 361 3911 7461 None Genewiz TFORF0272 TCF7 NM_201634, ENST00000378560 362 3912 7462 None Genewiz TFORF0273 TCF3 XM_011528221, XM_006722857, XM_006722858, 363 3913 7463 None ENST00000453954 Genewiz TFORF0274 TCF3 XM_017027180, ENST00000395423 364 3914 7464 None Genewiz TFORF0275 TCF3 NM_003200, XM_017027181, ENST00000262965, 365 3915 7465 None ENST00000611869 Genewiz TFORF0276 TCF3 NM_001136139, XM_017027182, ENST00000588136 366 3916 7466 None Genewiz TFORF0277 TCF3 XM_011528227, ENST00000344749 367 3917 7467 None Genewiz TFORF0278 ARID2 NM_152641, ENST00000334344 368 3918 7468 None Genewiz TFORF0279 ZNF776 NM_173632, ENST00000317178 369 3919 7469 None Genewiz TFORF0280 IGFBP1 NM_000596, ENST00000275525 370 3920 7470 None Genewiz TFORF0281 ZNF90 NM_007138, ENST00000418063 371 3921 7471 None Genewiz TFORF0282 MEF2C NM_002397, NM_001193350, XM_005248511, 372 3922 7472 None XM_011543396, XM_006714619, ENST00000504921, ENST00000437473, ENST00000636294 Genewiz TFORF0283 MEF2C XM_011543401, XM_017009482, ENST00000510942 373 3923 7473 None Genewiz TFORF0284 MEF2C XM_017009483, ENST00000627659 374 3924 7474 None Genewiz TFORF0285 MEF2C NM_001131005, XM_017009478, ENST00000424173, 375 3925 7475 None ENST00000625674 Genewiz TFORF0286 MEF2C NM_001193348, ENST00000628656 376 3926 7476 None Genewiz TFORF0287 MEF2C XM_011543397, XM_017009475, ENST00000625585 377 3927 7477 None Genewiz TFORF0288 MEF2C NM_001308002, XM_017009477, XM_017009476, 378 3928 7478 None ENST00000629612, ENST00000508569 Genewiz TFORF0289 MEF2C NM_001193349, ENST00000626391 379 3929 7479 None Genewiz TFORF0290 MEF2C XM_011543400, XM_017009479, XM_017009481, 380 3930 7480 None 
XM_017009480, ENST00000636998, ENST00000514028, ENST00000637732, ENST00000514015 Genewiz TFORF0291 MEF2C NM_001193347, XM_006714625, ENST00000340208 381 3931 7481 None Genewiz TFORF0292 MEF2B NM_001145785, ENST00000424583 382 3932 7482 None Genewiz TFORF0293 MEF2A XM_011521586, NM_001130927, ENST00000558812 383 3933 7483 None Genewiz TFORF0294 MEF2A XM_005254915, NM_001130926, NM_001171894, 384 3934 7484 None ENST00000338042, ENST00000557785 Genewiz TFORF0295 MEF2A NM_001319206, XM_011521583, ENST00000557942 385 3935 7485 None Genewiz TFORF0296 MEF2A XM_011521587, NM_001130928, ENST00000449277 386 3936 7486 None Genewiz TFORF0297 CREM NM_182720, ENST00000356917 387 3937 7487 None Genewiz TFORF0298 CREM NM_182717, ENST00000473940 388 3938 7488 None Genewiz TFORF0299 CREM NM_001267570, ENST00000468236 389 3939 7489 None Genewiz TFORF0300 CREM XM_017015722, NM_183060, ENST00000374728, 390 3940 7490 None ENST00000348787 Genewiz TFORF0301 CREM XM_006717378, NM_181571, ENST00000345491 391 3941 7491 None Genewiz TFORF0302 CREM NM_182718, ENST00000488328 392 3942 7492 None Genewiz TFORF0303 CREM NM_182769, ENST00000342105 393 3943 7493 None Genewiz TFORF0304 CREM NM_182719, ENST00000487763 394 3944 7494 None Genewiz TFORF0305 CREM NM_182721, ENST00000474931 395 3945 7495 None Genewiz TFORF0306 CREM NM_001267564, ENST00000395887 396 3946 7496 None Genewiz TFORF0307 CREM NM_182771, ENST00000361599 397 3947 7497 None Genewiz TFORF0308 CREM XM_006717383, NM_183012, ENST00000374734 398 3948 7498 None Genewiz TFORF0309 CREM NM_182724, ENST00000490511 399 3949 7499 None Genewiz TFORF0310 CREM NM_183013, ENST00000354759, ENST00000439705 400 3950 7500 None Genewiz TFORF0311 CREM XM_011519324, XM_011519325, ENST00000479070 401 3951 7501 None Genewiz TFORF0312 CREM XM_006717379, NM_183011, ENST00000337656 402 3952 7502 None Genewiz TFORF0313 CREM NM_001267567, ENST00000463314 403 3953 7503 None Genewiz TFORF0314 CREM NM_001881, ENST00000374726, ENST00000489321 404 3954 7504 None 
Genewiz TFORF0315 CREM NM_182723, ENST00000488741 405 3955 7505 None
Genewiz TFORF0316 MEF2D XM_006711333, NM_001271629, XM_017001314, XM_017001315, ENST00000360595 406 3956 7506 None
Genewiz TFORF0317 MEF2D XM_006711334, ENST00000464356 407 3957 7507 None
Genewiz TFORF0318 ZNF117 NM_015852, ENST00000282869, ENST00000620222 408 3958 7508 None
Genewiz TFORF0319 MSLN XM_017022857, XM_011522348, NM_013404, ENST00000382862 409 3959 7509 None
Genewiz TFORF0320 MSLN NM_001177355, NM_005823, ENST00000563941, ENST00000545450, ENST00000566549 410 3960 7510 None
Genewiz TFORF0321 ZNF112 NM_001083335, ENST00000337401 411 3961 7511 None
Genewiz TFORF0322 ZNF112 NM_013380, ENST00000354340 412 3962 7512 None
Genewiz TFORF0323 LCORL XM_011513822, ENST00000635767 413 3963 7513 None
Genewiz TFORF0324 LCORL NM_153686, ENST00000326877 414 3964 7514 None
Genewiz TFORF0325 LCORL NM_001166139, ENST00000382226 415 3965 7515 None
Genewiz TFORF0326 ZNF337 NM_015655, XM_006723558, XM_011529219, NM_001290261, ENST00000376436, ENST00000252979 416 3966 7516 None
Genewiz TFORF0327 ZNF334 XM_011528892, XM_017027934, ENST00000593880 417 3967 7517 None
Genewiz TFORF0328 ZNF334 XM_017027938, NM_018102, XM_017027936, ENST00000347606 418 3968 7518 None
Genewiz TFORF0329 ZNF334 XM_017027944, ENST00000457685 419 3969 7519 None
Genewiz TFORF0330 ZNF334 NM_001270497, XM_011528897, XM_017027941, XM_017027939, XM_017027940, ENST00000615481, ENST00000625284 420 3970 7520 None
Genewiz TFORF0331 ZNF335 NM_022095, ENST00000322927 421 3971 7521 None
Genewiz TFORF0332 ZNF333 NM_001300912, ENST00000540689 422 3972 7522 None
Genewiz TFORF0333 ZNF333 XM_011528362, NM_032433, XM_017027367, ENST00000292530 423 3973 7523 None
Genewiz TFORF0334 ZNF331 XM_017026937, XM_017026936, XM_011527078, XM_011527076, NM_001317120, NM_018555, XM_017026939, XM_017026938, NM_001317114, NM_001317113, NM_001079906, NM_001317115, NM_001253799, NM_001253798, XM_017026940, NM_001253800, NM_001253801, NM_001317117, NM_001317116, NM_001317118, NM_001317121, NM_001317119, NM_001079907, ENST00000253144, ENST00000511593, ENST00000449416, ENST00000411977, ENST00000511154, ENST0513999, ENST00000512387 424 3974 7524 None
Genewiz TFORF0335 ZNF91 NM_001300951, ENST00000397082 425 3975 7525 None
Genewiz TFORF0336 ZNF91 NM_003430, ENST00000300619 426 3976 7526 None
Genewiz TFORF0337 MECOM NM_001205194, NM_005241, NM_001105078, XM_005247223, ENST00000628990, ENST00000468789 427 3977 7527 None
Genewiz TFORF0338 MECOM NM_001105077, XM_017005874, ENST00000264674 428 3978 7528 None
Genewiz TFORF0339 MECOM NM_001164000, XM_017005875, ENST00000464456 429 3979 7529 None
Genewiz TFORF0340 MECOM XM_005247220, XM_005247221, XM_005247219, ENST00000472280, ENST00000433243 430 3980 7530 None
Genewiz TFORF0341 MECOM XM_005247215, ENST00000494292 431 3981 7531 None
Genewiz TFORF0342 ETF1 NM_001256302, NM_001291975, NM_001291974, XM_005271921, ENST00000499810 432 3982 7532 None
Genewiz TFORF0343 ETF1 NM_004730, ENST00000360541 433 3983 7533 None
Genewiz TFORF0344 ETF1 NM_001282185, ENST00000503014 434 3984 7534 None
Genewiz TFORF0345 MYT1L NM_015025, XM_017003614, XM_017003613, ENST00000428368 435 3985 7535 None
Genewiz TFORF0346 MYT1L NM_001303052, XM_017003609, XM_017003610, ENST00000399161 436 3986 7536 None
Genewiz TFORF0347 FOXB1 NM_012182, ENST00000396057 437 3987 7537 None
Genewiz TFORF0348 FOXB2 NM_001013735, ENST00000376708 438 3988 7538 None
Genewiz TFORF0349 ZNF48 NM_001214907, ENST00000622647 439 3989 7539 None
Genewiz TFORF0350 ZNF48 NM_001214906, NM_001214909, NM_152652, ENST00000613509, ENST00000320159 440 3990 7540 None
Genewiz TFORF0351 FOXL2 NM_023067, ENST00000330315 441 3991 7541 None
Genewiz TFORF0352 FOXL1 NM_005250, ENST00000320241 442 3992 7542 None
Genewiz TFORF0353 ZNF33B NM_006955, ENST00000359467 443 3993 7543 None
Genewiz TFORF0354 ZNF33A NM_001278170, ENST00000628825 444 3994 7544 None
Genewiz TFORF0355 ZNF33A XM_011519650, NM_006954, ENST00000432900 445 3995 7545 None
Genewiz TFORF0356 ZNF33A XM_011519651, NM_006974, ENST00000458705 446 3996 7546 None
Genewiz TFORF0357 ZNF33A NM_001278173, ENST00000307441 447 3997 7547 None
Genewiz TFORF0358 ZNF33A NM_001324175, NM_001278174, NM_001278179, NM_001278171, NM_001324176, NM_001278178, NM_001324177, ENST00000374618 448 3998 7548 None
Genewiz TFORF0359 TUB NM_003320, ENST00000305253 449 3999 7549 None
Genewiz TFORF0360 TUB NM_177972, ENST00000299506 450 4000 7550 None
Genewiz TFORF0361 TCEB3 NM_003198, ENST00000418390, ENST00000613537 451 4001 7551 None
Genewiz TFORF0362 TCEB2 NM_207013, ENST00000262306 452 4002 7552 None
Genewiz TFORF0363 TCEB2 NM_007108, ENST00000409906 453 4003 7553 None
Genewiz TFORF0364 TCEB1 NM_001204861, NM_001204862, NM_001204858, NM_001204859, NM_005648, NM_001204860, NM_001204857, XM_011517580, XM_011517581, ENST00000518127, ENST00000622804, ENST00000520242, ENST00000519487, ENST00000284811, ENST00000522337, ENST00000523815 454 4004 7554 None
Genewiz TFORF0365 TCEB1 NM_001204863, NM_001204864, ENST00000520210 455 4005 7555 None
Genewiz TFORF0366 NFYC XM_017001365, XM_005270894, XM_011541516, XM_005270895, NM_001308114, ENST00000308733 456 4006 7556 None
Genewiz TFORF0367 NFYC XM_006710661, NM_001308115, XM_006710658, ENST00000372652 457 4007 7557 None
Genewiz TFORF0368 NFYC NM_001142589, ENST00000427410 458 4008 7558 None
Genewiz TFORF0369 NFYC NM_001142587, ENST00000456393 459 4009 7559 None
Genewiz TFORF0370 NFYC NM_001142588, XM_017001367, XM_006710660, ENST00000425457 460 4010 7560 None
Genewiz TFORF0371 NFYC NM_001142590, ENST00000372653 461 4011 7561 None
Genewiz TFORF0372 NFYB NM_006166, ENST00000240055, ENST00000551727 462 4012 7562 None
Genewiz TFORF0373 NFYA NM_002505, ENST00000341376 463 4013 7563 None
Genewiz TFORF0374 NFYA NM_021705, ENST00000353205 464 4014 7564 None
Genewiz TFORF0375 MYBL1 NM_001080416, ENST00000522677 465 4015 7565 None
Genewiz TFORF0376 SMARCA1 NM_001282875, ENST00000371123 466 4016 7566 None
Genewiz TFORF0377 SMARCA1 NM_001282874, ENST00000371121 467 4017 7567 None
Genewiz TFORF0378 SMARCA1 NM_003069, XM_005262461, ENST00000371122 468 4018 7568 None
Genewiz TFORF0379 SMARCA2 NM_001289400, ENST00000302401 469 4019 7569 None
Genewiz TFORF0380 SMARCA2 NM_001289399, ENST00000382185, ENST00000417599 470 4020 7570 None
Genewiz TFORF0381 SMARCA2 NM_139045, ENST00000357248, ENST00000382194 471 4021 7571 None
Genewiz TFORF0382 SMARCA2 NM_001289397, ENST00000450198 472 4022 7572 None
Genewiz TFORF0383 SMARCA2 NM_001289398, ENST00000324954 473 4023 7573 None
Genewiz TFORF0384 SMARCA2 NM_003070, NM_001289396, ENST00000349721, ENST00000382203 474 4024 7574 None
Genewiz TFORF0385 SMARCA4 XM_017027167, NM_001128847, ENST00000590574 475 4025 7575 None
Genewiz TFORF0386 SMARCA4 XM_017027168, NM_001128848, ENST00000444061 476 4026 7576 None
Genewiz TFORF0387 SMARCA4 XM_017027165, NM_001128845, ENST00000541122 477 4027 7577 None
Genewiz TFORF0388 SMARCA4 XM_006722845, XM_011528198, NM_001128849, XM_006722846, ENST00000450717 478 4028 7578 None
Genewiz TFORF0389 SMARCA4 NM_003072, NM_001128844, ENST00000344626, ENST00000429416 479 4029 7579 None
Genewiz TFORF0390 SMARCA4 XM_017027166, NM_001128846, ENST00000589677 480 4030 7580 None
Genewiz TFORF0391 FERD3L NM_152898, ENST00000275461 481 4031 7581 None
Genewiz TFORF0392 ZNF7 NM_003416, XM_011517293, XM_011517294, ENST00000528372 482 4032 7582 None
Genewiz TFORF0393 ZNF7 NM_001282795, XM_011517292, ENST00000446747 483 4033 7583 None
Genewiz TFORF0394 ZNF7 XM_017013817, ENST00000325217 484 4034 7584 None
Genewiz TFORF0395 ZNF7 NM_001282796, ENST00000525266 485 4035 7585 None
Genewiz TFORF0396 ZNF7 NM_001282797, XM_011517297, XM_006716654, XM_006716656, ENST00000544249 486 4036 7586 None
Genewiz TFORF0397 ZNF2 NM_001291604, ENST00000611463 487 4037 7587 None
Genewiz TFORF0398 ZNF2 NM_001291605, ENST00000611147 488 4038 7588 None
Genewiz TFORF0399 ZNF2 NM_001017396, ENST00000617923 489 4039 7589 None
Genewiz TFORF0400 ZNF2 NM_001282398, ENST00000622059 490 4040 7590 None
Genewiz TFORF0401 ZNF2 NM_021088, ENST00000614034 491 4041 7591 None
Genewiz TFORF0402 ZNF3 NM_017715, ENST00000413658 492 4042 7592 None
Genewiz TFORF0403 ZNF3 NM_032924, NM_001318135, NM_001278290, NM_001278284, NM_001278287, ENST00000424697, ENST00000303915, ENST00000299667 493 4043 7593 None
Genewiz TFORF0404 CTCFL NM_001269043, ENST00000423479 494 4044 7594 None
Genewiz TFORF0405 CTCFL NM_001269041, NM_080618, NM_001269040, ENST00000608263, ENST00000609232, ENST00000243914, ENST00000371196 495 4045 7595 None
Genewiz TFORF0406 CTCFL NM_001269052, NM_001269051, ENST00000608158, ENST00000481655 496 4046 7596 None
Genewiz TFORF0407 CTCFL NM_001269049, ENST00000433949 497 4047 7597 None
Genewiz TFORF0408 CTCFL NM_001269048, ENST00000432255 498 4048 7598 None
Genewiz TFORF0409 CTCFL NM_001269046, ENST00000429804 499 4049 7599 None
Genewiz TFORF0410 CTCFL NM_001269055, ENST00000608903 500 4050 7600 None
Genewiz TFORF0411 CTCFL NM_001269054, ENST00000502686 501 4051 7601 None
Genewiz TFORF0412 CTCFL NM_001269044, ENST00000608440 502 4052 7602 None
Genewiz TFORF0413 CTCFL NM_001269045, ENST00000608425 503 4053 7603 None
Genewiz TFORF0414 CTCFL NM_001269050, ENST00000539382 504 4054 7604 None
Genewiz TFORF0415 CTCFL NM_001269047, ENST00000422869 505 4055 7605 None
Genewiz TFORF0416 ZNF559-ZNF177 NM_001202425, ENST00000446085, ENST00000602856 506 4056 7606 None
Genewiz TFORF0417 SULT2A1 NM_003167, ENST00000222002 507 4057 7607 None
Genewiz TFORF0418 CDC5L NM_001253, ENST00000371477 508 4058 7608 None
Genewiz TFORF0419 SKIL NM_001145098, ENST00000426052 509 4059 7609 None
Genewiz TFORF0420 SKIL NM_001145097, ENST00000413427 510 4060 7610 None
Genewiz TFORF0421 SKIL XM_006713735, NM_005414, XM_005247721, NM_001248008, ENST00000259119, ENST00000458537 511 4061 7611 None
Genewiz TFORF0422 PBX2 NM_002586, ENST00000375050 512 4062 7612 None
Genewiz TFORF0423 PBX3 NM_001134778, ENST00000447726 513 4063 7613 None
Genewiz TFORF0424 PBX3 NM_006195, ENST00000373489 514 4064 7614 None
Genewiz TFORF0425 PBX3 XM_006717132, ENST00000342287 515 4065 7615 None
Genewiz TFORF0426 PBX3 XM_006717130, ENST00000373487 516 4066 7616 None
Genewiz TFORF0427 PBX1 NM_002585, XM_005245229, ENST00000420696 517 4067 7617 None
Genewiz TFORF0428 PBX1 NM_001204963, ENST00000627490 518 4068 7618 None
Genewiz TFORF0429 PBX1 NM_001204961, XM_017001396, ENST00000367897 519 4069 7619 None
Genewiz TFORF0430 ZNF883 NM_001101338, ENST00000619044 520 4070 7620 None
Genewiz TFORF0431 AIRE NM_000383, ENST00000291582 521 4071 7621 None
Genewiz TFORF0432 ARHGAP35 XM_017026714, NM_004491, ENST00000614079, ENST00000404338 522 4072 7622 None
Genewiz TFORF0433 ZSCAN32 NM_001324340, NM_001324342, NM_001324344, NM_001324341, NM_001324345, NM_017810, ENST00000304926 523 4073 7623 None
Genewiz TFORF0434 ZSCAN32 NM_001284527, XM_011522555, XM_017023371, ENST00000396846, ENST00000396852 524 4074 7624 None
Genewiz TFORF0435 ZSCAN32 NM_001284528, NM_001284529, ENST00000618425, ENST00000439568 525 4075 7625 None
Genewiz TFORF0436 ZSCAN31 NM_001135216, NM_030899, NM_001243241, NM_145909, NM_001135215, XM_011514809, XM_011514811, XM_011514813, XM_011514812, XM_005249295, XM_011514807, XM_017011196, XM_005249296, XM_011514808, ENST00000439158, ENST00000396838, ENST00000414429, ENST00000344279 526 4076 7626 None
Genewiz TFORF0437 ZSCAN31 NM_001243243, NM_001243242, NM_001243244, ENST00000611469, ENST00000446474 527 4077 7627 None
Genewiz TFORF0438 ZSCAN30 NM_001166012, NM_001112734, XM_006722371, XM_017025515, XM_005258183, ENST00000333206, ENST00000420878 528 4078 7628 None
Genewiz TFORF0439 ZSCAN30 NM_001288711, XM_011525789, XM_017025522, XM_017025519, XM_017025520, XM_017025521, ENST00000610712 529 4079 7629 None
Genewiz TFORF0440 ZFY NM_001145276, ENST00000625061 530 4080 7630 None
Genewiz TFORF0441 ZFY XM_005262570, XM_017030075, NM_003411, ENST00000383052, ENST00000155093 531 4081 7631 None
Genewiz TFORF0442 ZFX XM_011545578, XM_017029792, XM_017029791, XM_017029790, XM_006724513, XM_017029789, XM_017029788, XM_017029793, ENST00000379188 532 4082 7632 None
Genewiz TFORF0443 ZFX NM_001178086, ENST00000539115 533 4083 7633 None
Genewiz TFORF0444 ZFX XM_005274592, XM_011545581, XM_005274591, NM_001178084, NM_001178085, NM_003410, XM_017029794, XM_011545579, XM_017029795, ENST00000379177, ENST00000304543 534 4084 7634 None
Genewiz TFORF0445 ZNF805 NM_001145078, ENST00000354309 535 4085 7635 None
Genewiz TFORF0446 ZNF805 NM_001023563, ENST00000414468 536 4086 7636 None
Genewiz TFORF0447 TOPORS NM_001195622, ENST00000379858 537 4087 7637 None
Genewiz TFORF0448 TOPORS NM_005802, ENST00000360538 538 4088 7638 None
Genewiz TFORF0449 DNMT1 NM_001130823, ENST00000359526 539 4089 7639 None
Genewiz TFORF0450 DNMT1 NM_001379, ENST00000340748 540 4090 7640 None
Genewiz TFORF0451 THRA NM_199334, ENST00000450525, ENST00000546243 541 4091 7641 None
Genewiz TFORF0452 THRA NM_001190918, ENST00000584985 542 4092 7642 None
Genewiz TFORF0453 THRA NM_001190919, NM_003250, ENST00000394121, ENST00000264637 543 4093 7643 None
Genewiz TFORF0454 TAF4B NM_005640, ENST00000269142 544 4094 7644 None
Genewiz TFORF0455 TAF4B NM_001293725, ENST00000578121 545 4095 7645 None
Genewiz TFORF0456 HOXD9 NM_014213, ENST00000249499 546 4096 7646 None
Genewiz TFORF0457 TRIM24 NM_015905, ENST00000343526 547 4097 7647 None
Genewiz TFORF0458 TRIM24 NM_003852, ENST00000415680 548 4098 7648 None
Genewiz TFORF0459 CREB5 NM_182899, ENST00000396299, ENST00000409603 549 4099 7649 None
Genewiz TFORF0460 CREB5 XM_017012808, NM_182898, ENST00000357727 550 4100 7650 None
Genewiz TFORF0461 CREB5 XM_005249906, XM_017012806, NM_004904, ENST00000396300 551 4101 7651 None
Genewiz TFORF0462 CREB5 NM_001011666, ENST00000396298 552 4102 7652 None
Genewiz TFORF0463 HOXD1 NM_024501, ENST00000331462 553 4103 7653 None
Genewiz TFORF0464 TRIM22 NM_006074, ENST00000379965 554 4104 7654 None
Genewiz TFORF0465 NKX1-1 NM_001290079 555 4105 7655 None
Genewiz TFORF0466 SLC15A1 NM_005073, ENST00000376503 556
4106 7656 None Genewiz TFORF0467 CARHSP1 NM_001278265, NM_001278262, NM_001278266, 557 4107 7657 None NM_001278264, NM_001278263, NM_001042476, NM_001278261, NM_001278260, NM_014316, XM_005255229, XM_011522444, ENST00000396593, ENST0610831, ENST00000614449, ENST00000619881, ENST00000618335, ENST00000611932, ENST00000311052, ENST00000561530, ENST00000567554 Genewiz TFORF0468 KAT7 NM_001199158, ENST00000510819 558 4108 7658 None Genewiz TFORF0469 KAT7 NM_001199157, ENST00000509773 559 4109 7659 None Genewiz TFORF0470 KAT7 NM_001199155, ENST00000424009 560 4110 7660 None Genewiz TFORF0471 KAT7 NM_001199156, ENST00000454930 561 4111 7661 None Genewiz TFORF0472 ZNF221 XM_017027232, NM_001297588, NM_001297589, 562 4112 7662 None NM_013359, ENST00000587682, ENST00000251269, ENST00000592350, ENST00000622072 Genewiz TFORF0473 SP140 NM_001278452, ENST00000343805 563 4113 7663 None Genewiz TFORF0474 SP140 NM_007237, ENST00000392045 564 4114 7664 None Genewiz TFORF0475 SP140 NM_001005176, ENST00000373645 565 4115 7665 None Genewiz TFORF0476 SP140 NM_001278453, ENST00000417495 566 4116 7666 None Genewiz TFORF0477 SP140 NM_001278451, ENST00000420434 567 4117 7667 None Genewiz TFORF0478 ZNF222 NM_013360, ENST00000187879 568 4118 7668 None Genewiz TFORF0479 ZNF222 NM_001129996, ENST00000391960 569 4119 7669 None Genewiz TFORF0480 HIF1A NM_001243084, ENST00000539097 570 4120 7670 None Genewiz TFORF0481 HIF1A NM_181054, ENST00000323441 571 4121 7671 None Genewiz TFORF0482 DMRTB1 NM_033067, ENST00000371445 572 4122 7672 None Genewiz TFORF0483 ZNF223 XM_017027258, NM_013361, XM_017027259, 573 4123 7673 None ENST00000434772 Genewiz TFORF0484 ZFP92 XM_011531115, XM_005274652, NM_001136273, 574 4124 7674 None ENST00000338647 Genewiz TFORF0485 ZNF852 NM_001287349 575 4125 7675 None Genewiz TFORF0486 ZFP90 NM_001305204, ENST00000611381 576 4126 7676 None Genewiz TFORF0487 ZFP90 NM_133458, NM_001305203, XM_005255804, 577 4127 7677 None ENST00000570495, ENST00000563169, ENST00000398253 
Genewiz TFORF0488 ZFP90 NM_001305208, NM_001305206, ENST00000564323 578 4128 7678 None
Genewiz TFORF0489 ZFP91 NM_053023, ENST00000316059 579 4129 7679 None
Genewiz TFORF0490 ZSCAN4 XM_011526607, XM_017026458, NM_152677, ENST00000318203, ENST00000612521 580 4130 7680 None
Genewiz TFORF0491 ZSCAN1 NM_182572, ENST00000282326 581 4131 7681 None
Genewiz TFORF0492 ZSCAN2 NM_001007072, ENST00000379358 582 4132 7682 None
Genewiz TFORF0493 ZSCAN2 NM_181877, ENST00000448803, ENST00000546148, ENST00000540894 583 4133 7683 None
Genewiz TFORF0494 ZSCAN2 NM_017894, ENST00000334141 584 4134 7684 None
Genewiz TFORF0495 EGR2 NM_000399, NM_001136177, NM_001136178, ENST00000242480, ENST00000439032 585 4135 7685 None
Genewiz TFORF0496 EGR2 NM_001136179, NM_001321037, ENST00000411732 586 4136 7686 None
Genewiz TFORF0497 EGR3 NM_004430, ENST00000317216 587 4137 7687 None
Genewiz TFORF0498 EGR3 NM_001199880, XM_005273426, ENST00000522910 588 4138 7688 None
Genewiz TFORF0499 PPARGC1A XM_011513770, XM_011513771, ENST00000613098 589 4139 7689 None
Genewiz TFORF0500 PPARGC1A NM_013261, ENST00000264867 590 4140 7690 None
Genewiz TFORF0501 EGR4 NM_001965, ENST00000545030 591 4141 7691 None
Genewiz TFORF0502 NFE4 NM_001085386, ENST00000638942 592 4142 7692 None
Genewiz TFORF0503 RNF138 XM_005258286, NM_198128, ENST00000257190 593 4143 7693 None
Genewiz TFORF0504 ZFP30 NM_001320666, NM_001320667, NM_014898, NM_001320669, NM_001320668, ENST00000351218, ENST00000392144, ENST00000514101 594 4144 7694 None
Genewiz TFORF0505 CDK2 NM_001290230, ENST00000440311 595 4145 7695 None
Genewiz TFORF0506 CDK2 NM_052827, ENST00000354056 596 4146 7696 None
Genewiz TFORF0507 CDK2 XM_011537732, ENST00000553376 597 4147 7697 None
Genewiz TFORF0508 TSHZ2 NM_001193421, ENST00000603338 598 4148 7698 None
Genewiz TFORF0509 TSHZ2 NM_173485, XM_017027640, ENST00000371497 599 4149 7699 None
Genewiz TFORF0510 TSHZ3 NM_020856, ENST00000240587 600 4150 7700 None
Genewiz TFORF0511 TSHZ1 NM_005786, XM_005266641, ENST00000322038 601 4151 7701 None
Genewiz TFORF0512 TSHZ1 NM_001308210, ENST00000580243 602 4152 7702 None
Genewiz TFORF0513 PAX5 NM_001280550, ENST00000520154 603 4153 7703 None
Genewiz TFORF0514 PAX5 NM_001280553, ENST00000520281 604 4154 7704 None
Genewiz TFORF0515 PAX5 NM_001280548, ENST00000377853 605 4155 7705 None
Genewiz TFORF0516 PAX5 NM_001280549, ENST00000523241 606 4156 7706 None
Genewiz TFORF0517 PAX5 NM_016734, ENST00000358127 607 4157 7707 None
Genewiz TFORF0518 PAX5 NM_001280547, ENST00000377852 608 4158 7708 None
Genewiz TFORF0519 PAX5 NM_001280554, ENST00000414447 609 4159 7709 None
Genewiz TFORF0520 PAX5 NM_001280555, ENST00000446742 610 4160 7710 None
Genewiz TFORF0521 PAX5 NM_001280556, ENST00000522003 611 4161 7711 None
Genewiz TFORF0522 PAX5 NM_001280551, ENST00000523145 612 4162 7712 None
Genewiz TFORF0523 PAX5 NM_001280552, ENST00000377847 613 4163 7713 None
Genewiz TFORF0524 PAX4 XM_011516276, ENST00000639438 614 4164 7714 None
Genewiz TFORF0525 PAX4 NM_006193, ENST00000341640 615 4165 7715 None
Genewiz TFORF0526 PAX7 NM_001135254, ENST00000420770 616 4166 7716 None
Genewiz TFORF0527 PAX7 NM_002584, ENST00000375375 617 4167 7717 None
Genewiz TFORF0528 PAX6 NM_001604, NM_001258463, NM_001310158, NM_001258462, ENST00000606377, ENST00000640368, ENST00000419022, ENST00000379129, ENST00000379107, ENST00000638903, ENST00000639409, ENST00000640975 618 4168 7718 None
Genewiz TFORF0529 PAX6 NM_001310160, NM_001310161, ENST00000638629, ENST00000639548, ENST00000640125, ENST00000481563, ENST00000639386 619 4169 7719 None
Genewiz TFORF0530 PAX6 NM_001310159, ENST00000639034 620 4170 7720 None
Genewiz TFORF0531 PAX6 NM_001258465, NM_000280, NM_001258464, NM_001127612, ENST00000638914, ENST00000640610, ENST00000379132, ENST00000379109, ENST00000639916, ENST00000241001 621 4171 7721 None
Genewiz TFORF0532 PAX1 NM_006192, ENST00000398485 622 4172 7722 None
Genewiz TFORF0533 PAX1 NM_001257096, ENST00000613128 623 4173 7723 None
Genewiz TFORF0534 ZNF584 NM_001318002, ENST00000322834 624 4174 7724 None
Genewiz TFORF0535 ZNF584 NM_173548, ENST00000306910 625 4175 7725 None
Genewiz TFORF0536 PAX2 NM_003988, ENST00000370296 626 4176 7726 None
Genewiz TFORF0537 PAX2 NM_003987, ENST00000428433 627 4177 7727 None
Genewiz TFORF0538 PAX2 NM_000278, ENST00000355243 628 4178 7728 None
Genewiz TFORF0539 ZNF589 NM_016089, ENST00000354698, ENST00000448461 629 4179 7729 None
Genewiz TFORF0540 GCFC2 NM_003203, ENST00000321027 630 4180 7730 None
Genewiz TFORF0541 GCFC2 NM_001201335, ENST00000470503 631 4181 7731 None
Genewiz TFORF0542 ZFAT NM_001029939, NM_001167583, NM_001289394, XM_011517203, ENST00000520727, ENST00000520214 632 4182 7732 None
Genewiz TFORF0543 ZFAT NM_001174158, ENST00000520356 633 4183 7733 None
Genewiz TFORF0544 ZFAT XM_011517206, ENST00000429442 634 4184 7734 None
Genewiz TFORF0545 ZFAT NM_020863, ENST00000377838 635 4185 7735 None
Genewiz TFORF0546 ZFAT NM_001174157, ENST00000523399 636 4186 7736 None
Genewiz TFORF0547 PAX9 NM_006194, ENST00000402703, ENST00000361487 637 4187 7737 None
Genewiz TFORF0548 PAX8 NM_013952, ENST00000348715 638 4188 7738 None
Genewiz TFORF0549 PAX8 NM_013992, ENST00000397647 639 4189 7739 None
Genewiz TFORF0550 PAX8 NM_013953, ENST00000263335 640 4190 7740 None
Genewiz TFORF0551 ZNF611 NM_030972, NM_001161499, NM_001161500, ENST00000540744, ENST00000543227, ENST00000319783 641 4191 7741 None
Genewiz TFORF0552 ZNF611 NM_001161501, ENST00000453741, ENST00000602162, ENST00000595798 642 4192 7742 None
Genewiz TFORF0553 ZNF610 NM_001161427, ENST00000601151, ENST00000613461 643 4193 7743 None
Genewiz TFORF0554 ZNF610 NM_001161426, NM_001161425, NM_173530, ENST00000321287, ENST00000403906, ENST00000327920, ENST00000616431 644 4194 7744 None
Genewiz TFORF0555 ZNF613 XM_017027315, NM_024840, ENST00000391794 645 4195 7745 None
Genewiz TFORF0556 ZNF613 XM_011527333, XM_005259269, NM_001031721, ENST00000293471 646 4196 7746 None
Genewiz TFORF0557 ZNF615 XM_011526825, XM_011526824, XM_011526826, NM_001199324, NM_001321319, NM_001321321, ENST00000594083, ENST00000598071, ENST00000618487 647 4197 7747 None
Genewiz TFORF0558 ZNF615 NM_001321317, ENST00000391795 648 4198 7748 None
Genewiz TFORF0559 ZNF615 XM_017026649, XM_017026648, NM_198480, NM_001321320, ENST00000602063, ENST00000376716 649 4199 7749 None
Genewiz TFORF0560 ZNF614 NM_025040, ENST00000270649 650 4200 7750 None
Genewiz TFORF0561 ZNF616 NM_178523, ENST00000600228 651 4201 7751 None
Genewiz TFORF0562 ZNF619 NM_001145083, ENST00000456778 652 4202 7752 None
Genewiz TFORF0563 ZNF619 NM_001145093, ENST00000432264 653 4203 7753 None
Genewiz TFORF0564 ZNF619 XM_017006225, XM_011533608, ENST00000429348, ENST00000522736 654 4204 7754 None
Genewiz TFORF0565 ZNF619 NM_001145082, ENST00000447116, ENST00000521353 655 4205 7755 None
Genewiz TFORF0566 ZNF618 NM_001318040, ENST00000615615 656 4206 7756 None
Genewiz TFORF0567 ZNF618 NM_133374, ENST00000288466 657 4207 7757 None
Genewiz TFORF0568 ZNF618 NM_001318042, ENST00000374126 658 4208 7758 None
Genewiz TFORF0569 SRCAP NM_006662, ENST00000262518 659 4209 7759 None
Genewiz TFORF0570 ZNF502 NM_001134442, NM_001134440, NM_001282880, NM_001134441, NM_033210, XM_017007440, ENST00000449836, ENST00000436624, ENST00000296091 660 4210 7760 None
Genewiz TFORF0571 MEOX1 NM_001040002, ENST00000393661 661 4211 7761 None
Genewiz TFORF0572 MEOX1 NM_013999, ENST00000329168 662 4212 7762 None
Genewiz TFORF0573 MEOX2 NM_005924, ENST00000262041 663 4213 7763 None
Genewiz TFORF0574 HOXA13 NM_000522, ENST00000222753 664 4214 7764 None
Genewiz TFORF0575 HOXA11 NM_005523, ENST00000006015 665 4215 7765 None
Genewiz TFORF0576 HOXA10 NM_018951, ENST00000283921 666 4216 7766 None
Genewiz TFORF0577 GTF2A1L NM_001193487, ENST00000430487 667 4217 7767 None
Genewiz TFORF0578 ZNF107 XM_017012284, XM_017012283, XM_017012285, NM_001013746, NM_016220, XM_017012286, ENST00000344930, ENST00000423627, ENST00000395391 668 4218 7768 None
Genewiz TFORF0579 ZNF107 NM_001282359, ENST00000620827 669 4219 7769 None
Genewiz TFORF0580 ZNF107 NM_001282360, ENST00000613690 670 4220 7770 None
Genewiz TFORF0581 ZNF101 NM_033204, ENST00000318110, ENST00000592502 671 4221 7771 None
Genewiz TFORF0582 ZNF100 NM_173531, ENST00000358296 672 4222 7772 None
Genewiz TFORF0583 ZNF324 NM_014347, ENST00000536459, ENST00000196482 673 4223 7773 None
Genewiz TFORF0584 SRF NM_003131, ENST00000265354 674 4224 7774 None
Genewiz TFORF0585 ZNF329 NM_024620, XM_006723382, XM_011527316, XM_006723381, XM_017027311, XM_017027310, XM_017027307, XM_017027308, XM_006723383, XM_011527315, XM_006723384, XM_017027309, ENST00000597186, ENST00000598312, ENST00000358067, ENST00000500161 675 4225 7775 None
Genewiz TFORF0586 SRY NM_003140, ENST00000383070 676 4226 7776 None
Genewiz TFORF0587 NMI XM_017005247, XM_005246941, NM_004688, ENST00000243346 677 4227 7777 None
Genewiz TFORF0588 FOXO3 XM_017010586, XM_011535628, XM_011535629, XM_005266868, ENST00000540898 678 4228 7778 None
Genewiz TFORF0589 FOXO1 NM_002015, ENST00000379561 679 4229 7779 None
Genewiz TFORF0590 FOXO6 NM_001291281 680 4230 7780 None
Genewiz TFORF0591 FOXO4 NM_005938, ENST00000374259 681 4231 7781 None
Genewiz TFORF0592 FOXO4 NM_001170931, ENST00000341558 682 4232 7782 None
Genewiz TFORF0593 UNCX NM_001080461, ENST00000316333 683 4233 7783 None
Genewiz TFORF0594 ANKRD30A NM_052997, ENST00000361713, ENST00000602533 684 4234 7784 None
Genewiz TFORF0595 PLAG1 NM_002655, NM_001114634, XM_017013576, ENST00000316981, ENST00000429357 685 4235 7785 None
Genewiz TFORF0596 PLAG1 NM_001114635, XM_011517544, XM_017013577, ENST00000423799 686 4236 7786 None
Genewiz TFORF0597 SMARCB1 NM_001317946, ENST00000344921 687 4237 7787 None
Genewiz TFORF0598 SMARCB1 NM_003073, ENST00000263121 688 4238 7788 None
Genewiz TFORF0599 ZNF891 XM_017018666, XM_017018667, XM_017018668, NM_001277291, ENST00000537226 689 4239 7789 None
Genewiz TFORF0600 MZF1 NM_001267033, ENST00000594234 690 4240 7790 None
Genewiz TFORF0601 ZNF726 NM_001244038, ENST00000594466 691 4241 7791 None
Genewiz TFORF0602 ZNF728 NM_001267716, ENST00000594710 692 4242 7792 None
Genewiz TFORF0603 ZNF292 NM_015021, ENST00000369577 693 4243 7793 None
Genewiz TFORF0604 PUF60 NM_001271099, ENST00000456095 694 4244 7794 None
Genewiz TFORF0605 PUF60 NM_001271097, ENST00000527197 695 4245 7795 None
Genewiz TFORF0606 PUF60 NM_078480, ENST00000526683 696 4246 7796 None
Genewiz TFORF0607 CREBL2 NM_001310, ENST00000228865 697 4247 7797 None
Genewiz TFORF0608 MAFA NM_201589, ENST00000333480 698 4248 7798 None
Genewiz TFORF0609 MAFF NM_001161574, ENST00000538999 699 4249 7799 None
Genewiz TFORF0610 RBPJ XM_005248161, XM_017008175, XM_017008174, NM_203284, XM_011513840, XM_017008173, ENST00000355476, ENST00000342320 700 4250 7800 None
Genewiz TFORF0611 RBPJ XM_017008172, NM_005349, ENST00000342295, ENST00000361572 701 4251 7801 None
Genewiz TFORF0612 RBPJ NM_015874, ENST00000348160 702 4252 7802 None
Genewiz TFORF0613 ZSCAN29 NM_152455, XM_006720401, XM_011521266, ENST00000396976 703 4253 7803 None
Genewiz TFORF0614 ZSCAN25 XM_011515906, XM_011515907, XM_011515905, NM_145115, XM_005250194, XM_017011824, ENST00000394152, ENST00000334715 704 4254 7804 None
Genewiz TFORF0615 ZSCAN26 NM_001287422, ENST00000611552 705 4255 7805 None
Genewiz TFORF0616 ZSCAN26 NM_152736, NM_001287421, ENST00000316606, ENST00000614088 706 4256 7806 None
Genewiz TFORF0617 ZSCAN26 NM_001023560, XM_011514862, ENST00000421553, ENST00000619937 707 4257 7807 None
Genewiz TFORF0618 ZSCAN26 NM_001111039, XM_017011264, ENST00000623276 708 4258 7808 None
Genewiz TFORF0619 ZSCAN20 XM_017002237, NM_145238, XM_006710874, ENST00000361328 709 4259 7809 None
Genewiz TFORF0620 ZSCAN21 XM_017012585, ENST00000456748 710 4260 7810 None
Genewiz TFORF0621 ZSCAN22 NM_001321116, NM_181846, XM_006723192, XM_011526917, ENST00000329665 711 4261 7811 None
Genewiz TFORF0622 ZSCAN23 XM_017010479, XM_011514406, XM_011514408, XM_011514410, XM_005248950, NM_001012455, ENST00000289788 712 4262 7812 None
Genewiz TFORF0623 DDIT3 NM_001195056, NM_001195053, NM_001195054, NM_001195055, ENST00000551116, ENST00000552740 713 4263 7813 None
Genewiz TFORF0624 NKRF NM_001173487, ENST00000542113 714 4264 7814 None
Genewiz TFORF0625 MYSM1 NM_001085487, ENST00000472487 715 4265 7815 None
Genewiz TFORF0626 CEBPZ NM_005760, ENST00000234170 716 4266 7816 None
Genewiz TFORF0627 TERF2 NM_005652, ENST00000254942 717 4267 7817 None
Genewiz TFORF0628 NR1I2 NM_033013, ENST00000466380, ENST00000638727 718 4268 7818 None
Genewiz TFORF0629 NR1I2 NM_022002, ENST00000337940 719 4269 7819 None
Genewiz TFORF0630 NR1I2 NM_003889, ENST00000393716, ENST00000640105 720 4270 7820 None
Genewiz TFORF0631 TERF1 NM_017489, ENST00000276603 721 4271 7821 None
Genewiz TFORF0632 VPS72 NM_005997, ENST00000368892 722 4272 7822 None
Genewiz TFORF0633 VPS72 NM_001271087, ENST00000354473 723 4273 7823 None
Genewiz TFORF0634 TRIM33 NM_033020, ENST00000369543 724 4274 7824 None
Genewiz TFORF0635 TRIM33 NM_015906, ENST00000358465 725 4275 7825 None
Genewiz TFORF0636 HOXC5 NM_018953, ENST00000312492 726 4276 7826 None
Genewiz TFORF0637 HOXC4 NM_014620, NM_153633, ENST00000303406, ENST00000430889 727 4277 7827 None
Genewiz TFORF0638 HOXC6 NM_153693, ENST00000394331 728 4278 7828 None
Genewiz TFORF0639 HOXC6 NM_004503, ENST00000243108 729 4279 7829 None
Genewiz TFORF0640 NHP2 NM_001034833, ENST00000314397 730 4280 7830 None
Genewiz TFORF0641 THAP11 NM_020457, ENST00000303596 731 4281 7831 None
Genewiz TFORF0642 NR1I3 NM_005122, ENST00000367983 732 4282 7832 None
Genewiz TFORF0643 NR1I3 NM_001077469, ENST00000428574 733 4283 7833 None
Genewiz TFORF0644 NR1I3 NM_001077471, ENST00000367984 734 4284 7834 None
Genewiz TFORF0645 NR1I3 NM_001077481, ENST00000367985 735 4285 7835 None
Genewiz TFORF0646 NR1I3 NM_001077477, ENST00000437437 736 4286 7836 None
Genewiz TFORF0647 NR1I3 NM_001077478, ENST00000442691 737 4287 7837 None
Genewiz TFORF0648 NR1I3 NM_001077475, ENST00000512372 738 4288 7838 None
Genewiz TFORF0649 NR1I3 NM_001077476, ENST00000508740 739 4289 7839 None
Genewiz TFORF0650 NR1I3 NM_001077473, ENST00000412844 740 4290 7840 None
Genewiz TFORF0651 NR1I3 NM_001077482, ENST00000367980, ENST00000367979 741 4291 7841 None
Genewiz TFORF0652 NR1I3 NM_001077470, ENST00000504010 742 4292 7842 None
Genewiz TFORF0653 NR1I3 NM_001077479, ENST00000511676 743 4293 7843 None
Genewiz TFORF0654 NR1I3 NM_001077472, ENST00000367981 744 4294 7844 None
Genewiz TFORF0655 NR1I3 NM_001077474, ENST00000505005 745 4295 7845 None
Genewiz TFORF0656 CHD3 NM_001005273, ENST00000330494 746 4296 7846 None
Genewiz TFORF0657 CHD3 NM_005852, ENST00000358181 747 4297 7847 None
Genewiz TFORF0658 CHD3 NM_001005271, ENST00000380358 748 4298 7848 None
Genewiz TFORF0659 CHD5 NM_015557, ENST00000262450 749 4299 7849 None
Genewiz TFORF0660 CHD7 NM_001316690, ENST00000524602 750 4300 7850 None
Genewiz TFORF0661 CHD7 NM_017780, ENST00000423902 751 4301 7851 None
Genewiz TFORF0662 CHD6 NM_032221, ENST00000373233 752 4302 7852 None
Genewiz TFORF0663 MNT NM_020310, ENST00000174618 753 4303 7853 None
Genewiz TFORF0664 NKX1-2 NM_001146340, ENST00000451024 754 4304 7854 None
Genewiz TFORF0665 SCX XM_006716616, NM_001080514, ENST00000567180 755 4305 7855 None
Genewiz TFORF0666 TGFB1I1 NM_001164719, NM_015927, ENST00000361773, ENST00000394858, ENST00000567607 756 4306 7856 None
Genewiz TFORF0667 TGFB1I1 NM_001042454, ENST00000394863 757 4307 7857 None
Genewiz TFORF0668 ZNF823 NM_001080493, ENST00000341191 758 4308 7858 None
Genewiz TFORF0669 ZNF821 NM_001201556, ENST00000611294 759 4309 7859 None
Genewiz TFORF0670 ZNF821 NM_017530, NM_001201554, XM_017023414, ENST00000313565, ENST00000446827 760 4310 7860 None
Genewiz TFORF0671 ZNF821 NM_001201553, NM_001201552, XM_017023409, XM_006721233, XM_017023411, XM_017023410, XM_011523211, XM_005256033, XM_017023412, ENST00000425432, ENST00000565601 761 4311 7861 None
Genewiz TFORF0672 ZNF827 NM_001306215, ENST00000508784 762 4312 7862 None
Genewiz TFORF0673 ZNF827 NM_178835, XM_017007772, XM_017007774, XM_017007773, XM_017007775, ENST00000379448 763 4313 7863 None
Genewiz TFORF0674 ZNF829 NM_001171979, ENST00000520965 764 4314 7864 None
Genewiz TFORF0675 PRDM16 NM_022114, ENST00000270722 765 4315 7865 None
Genewiz TFORF0676 PRDM16 XM_005244774, ENST00000511072 766 4316 7866 None
Genewiz TFORF0677 PRDM16 NM_199454, ENST00000378391 767 4317 7867 None
Genewiz TFORF0678 PRDM14 NM_024504, ENST00000276594 768 4318 7868 None
Genewiz TFORF0679 PRDM15 NM_001282934, ENST00000422911 769 4319 7869 None
Genewiz TFORF0680 PRDM15 NM_001040424, ENST00000398548 770 4320 7870 None
Genewiz TFORF0681 PRDM15 NM_022115, ENST00000433067, ENST00000269844 771 4321 7871 None
Genewiz TFORF0682 PRDM15 XM_017028426, XM_017028425, ENST00000447016, ENST00000447207 772 4322 7872 None
Genewiz TFORF0683 PRDM12 NM_021619, ENST00000253008 773 4323 7873 None
Genewiz TFORF0684 PRDM13 NM_021620, ENST00000369215 774 4324 7874 None
Genewiz TFORF0685 PRDM10 NM_020228, ENST00000358825 775 4325 7875 None
Genewiz TFORF0686 PRDM10 NM_199438, ENST00000423662 776 4326 7876 None
Genewiz TFORF0687 PRDM10 NM_199437, ENST00000360871 777 4327 7877 None
Genewiz TFORF0688 PRDM10 NM_199439, ENST00000304538 778 4328 7878 None
Genewiz TFORF0689 HNF4G XM_017013374, XM_011517516, XM_017013373, XM_017013375, XM_017013376, ENST00000354370 779 4329 7879 None
Genewiz TFORF0690 HNF4G NM_004133, ENST00000396423 780 4330 7880 None
Genewiz TFORF0691 HNF4A NM_178849, ENST00000415691 781 4331 7881 None
Genewiz TFORF0692 HNF4A NM_001030003, ENST00000457232 782 4332 7882 None
Genewiz TFORF0693 HNF4A NM_001287183, ENST00000619550 783 4333 7883 None
Genewiz TFORF0694 HNF4A NM_175914, ENST00000316673 784 4334 7884 None
Genewiz TFORF0695 HNF4A NM_001030004, ENST00000609795 785 4335 7885 None
Genewiz TFORF0696 GTF3A NM_002097, ENST00000381140, ENST00000640289 786 4336 7886 None
Genewiz TFORF0697 BATF2 NM_138456, ENST00000301887 787 4337 7887 None
Genewiz TFORF0698 BATF2 NM_001300808, ENST00000435842 788 4338 7888 None
Genewiz TFORF0699 BATF2 NM_001300807, ENST00000527716 789 4339 7889 None
Genewiz TFORF0700 PAXBP1 NM_013329, ENST00000290178 790 4340 7890 None
Genewiz TFORF0701 PAXBP1 NM_016631, ENST00000331923 791 4341 7891 None
Genewiz TFORF0702 NCOR2 NM_006312, ENST00000405201 792 4342 7892 None
Genewiz TFORF0703 NCOR2 NM_001077261, ENST00000404621 793 4343 7893 None
Genewiz TFORF0704 NCOR2 NM_001206654, ENST00000429285 794 4344 7894 None
Genewiz TFORF0705 NCOR1 NM_006311, ENST00000268712 795 4345 7895 None
Genewiz TFORF0706 NCOR1 NM_001190438, ENST00000395848 796 4346 7896 None
Genewiz TFORF0707 NCOR1 NM_001190440, XM_011524086, ENST00000395851 797 4347 7897 None
Genewiz TFORF0708 ZNF628 NM_033113, ENST00000598519 798 4348 7898 None
Genewiz TFORF0709 ZNF629 NM_001080417, ENST00000262525 799 4349 7899 None
Genewiz TFORF0710 ZNF624 NM_020787, XM_006721562, ENST00000311331 800 4350 7900 None
Genewiz TFORF0711 ZNF625 NM_145233, ENST00000439556 801 4351 7901 None
Genewiz TFORF0712 ZNF626 NM_001076675, ENST00000601440 802 4352 7902 None
Genewiz TFORF0713 ZNF627 NM_145295, ENST00000361113 803 4353 7903 None
Genewiz TFORF0714 ZNF620 XM_017006069, NM_001256167, NM_001256168, ENST00000418905 804 4354 7904 None
Genewiz TFORF0715 ZNF620 XM_005265012, XM_005265011, NM_175888, ENST00000314529 805 4355 7905 None
Genewiz TFORF0716 ZNF621 NM_001287245, ENST00000310898 806 4356 7906 None
Genewiz TFORF0717 TEF NM_001145398, ENST00000406644 807 4357 7907 None
Genewiz TFORF0718 TEF NM_003216, ENST00000266304 808 4358 7908 None
Genewiz TFORF0719 ZNF623 NM_001261843, NM_001082480, XM_006716708, ENST00000526926, ENST00000458270 809 4359 7909 None
Genewiz TFORF0720 ZNF623 NM_014789, ENST00000501748 810 4360 7910 None
Genewiz TFORF0721 ZNF727 NM_001159522, ENST00000456806 811 4361 7911 None
Genewiz TFORF0722 NFKB1 NM_001165412, NM_001319226, ENST00000394820, ENST00000505458 812 4362 7912 None
Genewiz TFORF0723 NFKB1 NM_003998, ENST00000226574 813 4363 7913 None
Genewiz TFORF0724 NFKB2 NM_001288724, NM_002502, NM_001261403, ENST00000428099, ENST00000189444 814 4364 7914 None
Genewiz TFORF0725 NFKB2 NM_001077494, NM_001322934, ENST00000369966 815 4365 7915 None
Genewiz TFORF0726 ASH1L NM_018489, XM_006711451, XM_006711450, XM_017001784, XM_017001785, ENST00000392403 816 4366 7916 None
Genewiz TFORF0727 ZFP36L2 NM_006887, ENST00000282388 817 4367 7917 None
Genewiz TFORF0728 MLXIP NM_014938, ENST00000319080 818 4368 7918 None
Genewiz TFORF0729 ZNF518B NM_053042, XM_017008786, XM_017008784, XM_017008785, XM_005248193, ENST00000326756 819 4369 7919 None
Genewiz TFORF0730 ZNF518A XM_017016994, XM_011540413, NM_001278524, NM_014803, XM_011540415, XM_011540412, XM_011540419, XM_017016993, XM_011540406, XM_011540410, XM_011540418, XM_011540420, XM_017016989, XM_011540408, NM_001278525, XM_017016992, XM_017016991, XM_017016986, XM_017016995, XM_017016990, XM_017016987, XM_017016996, XM_017016988, XM_017016997, XM_017016999, XM_017016998, ENST00000316045, ENST00000624776, ENST00000614149 820 4370 7920 None
Genewiz TFORF0731 PROX1 NM_001270616, NM_002763, XM_017001833, ENST00000498508, ENST00000366958, ENST00000261454, ENST00000435016 821 4371 7921 None
Genewiz TFORF0732 PROX2 NM_001243007, ENST00000556489 822 4372 7922 None
Genewiz TFORF0733 PROX2 NM_001080408, ENST00000556084 823 4373 7923 None
Genewiz TFORF0734 ZNF488 XM_017015642, XM_017015643, XM_011539244, XM_006717617, NM_153034, ENST00000585316 824 4374 7924 None
Genewiz TFORF0735 ZNF480 NM_001297625, ENST00000335090 825 4375 7925 None
Genewiz TFORF0736 ZNF480 NM_001297624, ENST00000334564 826 4376 7926 None
Genewiz TFORF0737 ZNF480 NM_144684, XM_011526465, ENST00000468240, ENST00000595962 827 4377 7927 None
Genewiz TFORF0738 ZNF155 XM_005259215, NM_001260487, NM_198089, NM_003445, NM_001260486, ENST00000611002, ENST00000270014, ENST00000590615 828 4378 7928 None
Genewiz TFORF0739 ZNF155 XM_011527279, NM_001260488, XM_017027248, XM_011527278, ENST00000407951 829 4379 7929 None
Genewiz TFORF0740 JDP2 XM_017020975, XM_017020973, NM_001135047, XM_017020974, NM_001135048, NM_130469, XM_005267332, ENST00000419727, ENST00000437176, ENST00000435893 830 4380 7930 None
Genewiz TFORF0741 JDP2 XM_017020972, NM_001135049, ENST00000267569 831 4381 7931 None
Genewiz TFORF0742 ZNF485 NM_001318141, NM_145312, NM_001318140, ENST00000361807, ENST00000374435 832 4382 7932 None
Genewiz TFORF0743 ZNF486 NM_052852, ENST00000335117 833 4383 7933 None
Genewiz TFORF0744 PIAS2 NM_004671, ENST00000585916 834 4384 7934 None
Genewiz TFORF0745 PIAS2 NM_173206, ENST00000324794 835 4385 7935 None
Genewiz TFORF0746 PIAS3 NM_006099, ENST00000393045 836 4386 7936 None
Genewiz TFORF0747 PIAS1 NM_001320687, XM_017022688, ENST00000545237 837 4387 7937 None
Genewiz TFORF0748 PIAS4 NM_015897, ENST00000262971 838 4388 7938 None
Genewiz TFORF0749 ZNF131 XM_005248359, XM_005248362, NM_001297548, XM_005248360, XM_017009830, XM_005248361, XM_005248363, ENST00000509156 839 4389 7939 None
Genewiz TFORF0750 ZNF131 XM_005248365, XM_017009834, NM_003432, XM_017009835, XM_017009832, XM_017009833, ENST00000306938, ENST00000505606, ENST00000509634 840 4390 7940 None
Genewiz TFORF0751 HIVEP1 XM_011514551, XM_011514552, NM_002114, XM_011514553, ENST00000379388 841 4391 7941 None
Genewiz TFORF0752 HIVEP3 NM_001127714, XM_017001994, ENST00000372584 842 4392 7942 None
Genewiz TFORF0753 HIVEP3 XM_011541884, NM_024503, XM_017001993, XM_017001992, ENST00000372583 843 4393 7943 None
Genewiz TFORF0754 ZNF317 NM_001190791, ENST00000360385 844 4394 7944 None
Genewiz TFORF0755 ZNF319 NM_020807, XM_005256069, ENST00000299237 845 4395 7945 None
Genewiz TFORF0756 JUND NM_005354, ENST00000252818 846 4396 7946 None
Genewiz TFORF0757 NOTCH1 NM_017617, ENST00000277541 847 4397 7947 None
Genewiz TFORF0758 OVOL3 NM_001302757, ENST00000633214 848 4398 7948 None
Genewiz TFORF0759 TLX2 NM_016170, ENST00000233638 849 4399 7949 None
Genewiz TFORF0760 OVOL1 XM_011545067, XM_017017837, XM_005274018, XM_011545068, XM_017017838, ENST00000532448 850 4400 7950 None
Genewiz TFORF0761 ZNF556 NM_024967, ENST00000307635 851 4401 7951 None
Genewiz TFORF0762 ZNF557 NM_001044388, ENST00000414706 852 4402 7952 None
Genewiz TFORF0763 ZNF557 NM_001044387, NM_024341, ENST00000252840 853 4403 7953 None
Genewiz TFORF0764 ZNF555 NM_001172775, ENST00000591539 854 4404 7954 None
Genewiz TFORF0765 ZNF555 NM_152791, ENST00000334241 855 4405 7955 None
Genewiz TFORF0766 ZNF552 NM_024762, ENST00000391701 856 4406 7956 None
Genewiz TFORF0767 ZNF550 NM_001277090, NM_001277091, NM_001277092, XM_011526567, XM_017026401, ENST00000457177, ENST00000325134, ENST00000376230, ENST00000447310 857 4407 7957 None
Genewiz TFORF0768 ZNF551 NM_138347, ENST00000282296 858 4408 7958 None
Genewiz TFORF0769 MSX1 NM_002448, ENST00000382723 859 4409 7959 None
Genewiz TFORF0770 MSX2 NM_002449, ENST00000239243 860 4410 7960 None
Genewiz TFORF0771 MSX2 XM_017009489, ENST00000507785 861 4411 7961 None
Genewiz TFORF0772 SIN3A NM_015477, NM_001145358, NM_001145357, XM_006720466, XM_006720467, XM_006720465, ENST00000394947, ENST00000360439, ENST00000394949 862 4412 7962 None
Genewiz TFORF0773 ZNF92 NM_001287532, ENST00000357512 863 4413 7963 None
Genewiz TFORF0774 ZNF92 NM_001287534, NM_001287533, ENST00000431504 864 4414 7964 None
Genewiz TFORF0775 ZNF92 NM_007139, ENST00000450302 865 4415 7965 None
Genewiz TFORF0776 ZNF92 NM_152626, ENST00000328747 866 4416 7966 None
Genewiz TFORF0777 SHOX NM_000451, NM_000451, ENST00000381578, ENST00000554971, ENST00000381578, ENST00000554971 867 4417 7967 None
Genewiz TFORF0778 SHOX NM_006883, NM_006883, ENST00000334060, ENST00000381575, ENST00000334060, ENST00000381575 868 4418 7968 None
Genewiz TFORF0779 GTF2H2 NM_001515, XM_017009404, XM_017009405, XM_017009406, XM_017009403, ENST00000274400, ENST00000330280 869 4419 7969 None
Genewiz TFORF0780 GTF2H1 NM_005316, NM_001142307, XM_006718208, ENST00000453096, ENST00000265963 870 4420 7970 None
Genewiz TFORF0781 ZNF93 NM_031218, ENST00000343769 871 4421 7971 None
Genewiz TFORF0782 FOXN4 NM_213596, ENST00000299162 872 4422 7972 None
Genewiz TFORF0783 SEBOX NM_001080837, ENST00000536498 873 4423 7973 None
Genewiz TFORF0784 ZNF585A NM_001288800, ENST00000292841 874 4424 7974 None
Genewiz TFORF0785 ZNF585A NM_199126, NM_152655, ENST00000356958, ENST00000392157 875 4425 7975 None
Genewiz TFORF0786 FOXN1 XM_005258046, NM_003593, ENST00000579795, ENST00000226247 876 4426 7976 None
Genewiz TFORF0787 REPIN1 XM_006715949, NM_001099695, XM_006715947, XM_006715948, ENST00000489432 877 4427 7977 None
Genewiz TFORF0788 REPIN1 XM_005249985, XM_006715952, NM_001099696, NM_013400, XM_006715953, XM_017012082, XM_017012081, XM_011516112, NM_014374, ENST00000397281, ENST00000444957, ENST00000425389 878 4428 7978 None
Genewiz TFORF0789 DBX2 NM_001004329, ENST00000332700 879 4429 7979 None
Genewiz TFORF0790 DBX1 NM_001029865, ENST00000524983 880 4430 7980 None
Genewiz TFORF0791 ZNF99 NM_001080409, ENST00000596209 881 4431 7981 None
Genewiz TFORF0792 TARBP2 NM_004178, XM_005269114, XM_005269115, NM_134324, ENST00000456234, ENST00000394357 882 4432 7982 None
Genewiz TFORF0793 ATF5 NM_012068, NM_001193646, XM_011526629, NM_001290746, ENST00000595125, ENST00000423777 883 4433 7983 None
Genewiz TFORF0794 ATF7 XM_005268587, XM_017018722, NM_006856, ENST00000420353, ENST00000456903 884 4434 7984 None
Genewiz TFORF0795 ATF7 NM_001206683, NM_001206682, ENST00000548118, ENST00000591397 885 4435 7985 None
Genewiz TFORF0796 ATF6 NM_007348, ENST00000367942 886 4436 7986 None
Genewiz TFORF0797 ATF1 XM_011538387, NM_005171, XM_017019334, XM_017019333, XM_011538386, XM_017019332, ENST00000262053 887 4437 7987 None
Genewiz TFORF0798 ATF3 NM_001206486, ENST00000336937 888 4438 7988 None
Genewiz TFORF0799 ATF3 NM_001040619, ENST00000366983, ENST00000464547 889 4439 7989 None
Genewiz TFORF0800 ATF3 NM_001206488, NM_001206484, ENST00000613954, ENST00000613104 890 4440 7990 None
Genewiz TFORF0801 ATF3 NM_001030287, XM_011509579, NM_001674, XM_005273146, ENST00000366987, ENST00000341491 891 4441 7991 None
Genewiz TFORF0802 ATF2 NM_001256091, ENST00000426833 892 4442 7992 None
Genewiz TFORF0803 ATF2 NM_001256092, ENST00000345739, ENST00000409635 893 4443 7993 None
Genewiz TFORF0804 SMARCC2 XM_005269101, ENST00000550164 894 4444 7994 None
Genewiz TFORF0805 SMARCC2 NM_001130420, ENST00000394023 895 4445 7995 None
Genewiz TFORF0806 SMARCC2 NM_139067, ENST00000347471 896 4446 7996 None
Genewiz TFORF0807 SMARCC2 NM_003075, ENST00000267064 897 4447 7997 None
Genewiz TFORF0808 SMARCC1 NM_003074, ENST00000254480 898 4448 7998 None
Genewiz TFORF0809 SOHLH1 NM_001012415, ENST00000298466 899 4449 7999 None
Genewiz TFORF0810 SOHLH1 NM_001101677, ENST00000425225 900 4450 8000 None
Genewiz TFORF0811 CDCA7L NM_018719, ENST00000406877 901 4451 8001 None
Genewiz TFORF0812 CDCA7L NM_001127370, ENST00000356195 902 4452 8002 None
Genewiz TFORF0813 CDCA7L NM_001127371, ENST00000373934 903 4453 8003 None
Genewiz TFORF0814 LHX1 NM_005568, ENST00000614239 904 4454 8004 None
Genewiz TFORF0815 LHX2 NM_004789, ENST00000373615 905 4455 8005 None
Genewiz TFORF0816 LHX3 NM_014564, ENST00000371746 906 4456 8006 None
Genewiz TFORF0817 LHX3 NM_178138, ENST00000371748 907 4457 8007 None
Genewiz TFORF0818 LHX3 XM_005263410, ENST00000619587 908 4458 8008 None
Genewiz TFORF0819 LHX5 NM_022363, ENST00000261731 909 4459 8009 None
Genewiz TFORF0820 LHX6 NM_014368, ENST00000394319 910 4460 8010 None
Genewiz TFORF0821 LHX6 NM_001242333, ENST00000541397 911 4461 8011 None
Genewiz TFORF0822 LHX6 NM_199160, ENST00000340587 912 4462 8012 None
Genewiz TFORF0823 LHX6 NM_001242335, ENST00000559895 913 4463 8013 None
Genewiz TFORF0824 LHX8 NM_001256114, ENST00000356261 914 4464 8014 None
Genewiz TFORF0825 LHX8 NM_001001933, ENST00000294638 915 4465 8015 None
Genewiz TFORF0826 LHX9 XM_005245350, ENST00000561173 916 4466 8016 None
Genewiz TFORF0827 LHX9 NM_001014434, ENST00000367390 917 4467 8017 None
Genewiz TFORF0828 ZNF585B NM_152279, ENST00000532828 918 4468 8018 None
Genewiz TFORF0829 SLC45A2 NM_016180, ENST00000296589 919 4469 8019 None
Genewiz TFORF0830 SLC45A2 NM_001297417, ENST00000509381 920 4470 8020 None
Genewiz TFORF0831 ZBTB33 NM_006777, NM_001184742, ENST00000326624, ENST00000557385 921 4471 8021 None
Genewiz TFORF0832 GTF2F1 NM_002096, ENST00000394456 922 4472 8022 None
Genewiz TFORF0833 GTF2F2 NM_004128, ENST00000340473 923 4473 8023 None
Genewiz TFORF0834 HSFY1 NM_152584, ENST00000309834 924 4474 8024 None
Genewiz TFORF0835 HSFY1 NM_033108, ENST00000307393 925 4475 8025 None
Genewiz TFORF0836 ZSCAN18 XM_005259174, XM_011527238, XM_006723335, XM_011527239, NM_023926, NM_001145543, ENST00000240727, ENST00000601144 926 4476 8026 None
Genewiz TFORF0837 ZSCAN18 NM_001145544, ENST00000421612 927 4477 8027 None
Genewiz TFORF0838 ZSCAN18 NM_001145542, ENST00000600404 928 4478 8028 None
Genewiz TFORF0839 HSFY2 NM_001001877, ENST00000344884 929 4479 8029 None
Genewiz TFORF0840 HSFY2 NM_153716, ENST00000304790 930 4480 8030 None
Genewiz TFORF0841 ZSCAN10 NM_032805, XM_017023791, ENST00000576985 931 4481 8031 None
Genewiz TFORF0842 ZSCAN10 NM_001282415, ENST00000575108 932 4482 8032 None
Genewiz TFORF0843 ZSCAN10 NM_001282416, ENST00000538082 933 4483 8033 None
Genewiz TFORF0844 ZSCAN12 XM_011515014, XM_017011528, XM_011515015, NM_001163391 934 4484 8034 None
Genewiz TFORF0845 ZSCAN12 XM_011515017 935 4485 8035 None
Genewiz TFORF0846 YBX1 NM_004559, ENST00000321358 936 4486 8036 None
Genewiz TFORF0847 YBX3 NM_001145426, ENST00000279550 937 4487 8037 None
Genewiz TFORF0848 YBX3 NM_003651, ENST00000228251 938 4488 8038 None
Genewiz TFORF0849 ZIC4 NM_001168379, ENST00000425731 939 4489 8039 None
Genewiz TFORF0850 ZIC4 NM_032153, ENST00000383075, ENST00000484399, ENST00000473123 940 4490 8040 None
Genewiz TFORF0851 ZIC4 NM_001168378, ENST00000525172 941 4491 8041 None
Genewiz TFORF0852 ZIC4 NM_001243256, ENST00000491672 942 4492 8042 None
Genewiz TFORF0853 ZIC5 NM_033132, ENST00000267294 943 4493 8043 None
Genewiz TFORF0854 VSX1 NM_001256272, ENST00000429762 944 4494 8044 None
Genewiz TFORF0855 VSX1 XM_017027837, ENST00000409285 945 4495 8045 None
Genewiz TFORF0856 VSX1 NM_199425, ENST00000376707 946 4496 8046 None
Genewiz TFORF0857 VSX1 XM_017027838, ENST00000409958 947 4497 8047 None
Genewiz TFORF0858 VSX1 NM_001256271, ENST00000444511 948 4498 8048 None
Genewiz TFORF0859 ZIC3 XM_017029802, ENST00000370606 949 4499 8049 None
Genewiz TFORF0860 NR1H4 NM_001206978, ENST00000549996 950 4500 8050 None
Genewiz TFORF0861 NR1H4 NM_001206993, ENST00000551379 951 4501 8051 None
Genewiz TFORF0862 NR1H4 NM_001206992, ENST00000188403 952 4502 8052 None
Genewiz TFORF0863 GTF2I NM_032999, ENST00000573035 953 4503 8053 None
Genewiz TFORF0864 GTF2I NM_033001, ENST00000621734 954 4504 8054 None
Genewiz TFORF0865 GTF2I NM_033000, ENST00000614986 955 4505 8055 None
Genewiz TFORF0866 GTF2I NM_001518, ENST00000620879 956 4506 8056 None
Genewiz TFORF0867 NR1H2 NM_001256647, ENST00000411902 957 4507 8057 None
Genewiz TFORF0868 NR1H3 NM_001251934, NM_001251935, ENST00000616973 958 4508 8058 None
Genewiz TFORF0869 NR1H3 XM_011519805, XM_006718113, XM_005252706, XM_005252705, XM_006718112, XM_005252707, NM_005693, ENST00000441012, ENST00000467728 959 4509 8059 None
Genewiz TFORF0870 NR1H3 XM_005252713, NM_001130101, ENST00000407404, ENST00000405853 960 4510 8060 None
Genewiz TFORF0871 NR1H3 XM_011519806, ENST00000405576 961 4511 8061 None
Genewiz TFORF0872 HOXB8 XM_005257286, ENST00000576562 962 4512 8062 None
Genewiz TFORF0873 HOXB8 NM_024016, XM_017024564, ENST00000239144 963 4513 8063 None
Genewiz TFORF0874 HOXB9 NM_024017, ENST00000311177 964 4514 8064 None
Genewiz TFORF0875 GLMP NM_001256609, ENST00000614643 965 4515 8065 None
Genewiz TFORF0876 GLMP NM_001256608, ENST00000612353 966 4516 8066 None
Genewiz TFORF0877 GLMP NM_001256605, ENST00000622703 967 4517 8067 None
Genewiz TFORF0878 HOXB2 NM_002145, ENST00000330070 968 4518 8068 None
Genewiz TFORF0879 HOXB3 NM_002146, XM_011524719, XM_011524720, XM_011524710, XM_006721854, XM_005257277, ENST00000470495, ENST00000311626, ENST00000498678, ENST00000476342 969 4519 8069 None
Genewiz TFORF0880 HOXB3 XM_011524726, XM_005257280, ENST00000472863, ENST00000489475 970 4520 8070 None
Genewiz TFORF0881 HOXB3 XM_005257282, ENST00000460160 971 4521 8071 None
Genewiz TFORF0882 HOXB1 NM_002144, ENST00000239174 972 4522 8072 None
Genewiz TFORF0883 HOXB7 NM_004502, ENST00000239165 973 4523 8073 None
Genewiz TFORF0884 HOXB4 NM_024015, ENST00000332503 974 4524 8074 None
Genewiz TFORF0885 PPARG XM_011533844, ENST00000397000 975 4525 8075 None
Genewiz TFORF0886 CUX2 NM_015267, ENST00000261726 976 4526 8076 None
Genewiz TFORF0887 ZNF384 NM_001039920, ENST00000355772 977 4527 8077 None
Genewiz TFORF0888 ZNF384 NM_001135734, XM_017018942, XM_017018943, XM_017018941, ENST00000396801, ENST00000361959 978 4528 8078 None
Genewiz TFORF0889 ZNF384 NM_133476, XM_017018950, XM_017018949, ENST00000319770 979 4529 8079 None
Genewiz TFORF0890 ING4 NM_001127585, ENST00000444704 980 4530 8080 None
Genewiz TFORF0891 ING4 NM_001127582, ENST00000396807 981 4531 8081 None
Genewiz TFORF0892 ING4 NM_001127584, ENST00000446105 982 4532 8082 None
Genewiz TFORF0893 ING4 NM_016162, ENST00000341550 983 4533 8083 None
Genewiz TFORF0894 ING4 NM_001127586, ENST00000423703 984 4534 8084 None
Genewiz TFORF0895 ING4 NM_001127583, ENST00000412586 985 4535 8085 None
Genewiz TFORF0896 ZNF383 XM_017026423, XM_011526587, XM_017026422, XM_005258585, XM_005258587, XM_005258588, XM_011526586, XM_011526590, NM_152604, XM_017026424, XM_011526588, XM_011526589, ENST00000590503, ENST00000589413, ENST00000352998 986 4536 8086 None
Genewiz TFORF0897 ING1 NM_198217, ENST00000338450 987 4537 8087 None
Genewiz TFORF0898 ING1 NM_005537, ENST00000375774 988 4538 8088 None
Genewiz TFORF0899 ING1 NM_198218, ENST00000375775 989 4539 8089 None
Genewiz TFORF0900 ZNF382 NM_001256838, ENST00000439428 990 4540 8090 None
Genewiz TFORF0901 ZNF382 NM_032825, ENST00000292928 991 4541 8091 None
Genewiz TFORF0902 PTF1A NM_178161, ENST00000376504 992 4542 8092 None
Genewiz TFORF0903 SPDEF NM_001252294, ENST00000544425 993 4543 8093 None
Genewiz TFORF0904 DNAJC1 NM_022365, ENST00000376980 994 4544 8094 None
Genewiz TFORF0905 DNAJC2 NM_014377, ENST00000379263 995 4545 8095 None
Genewiz TFORF0906 DNAJC2 NM_001129887, ENST00000249270 996 4546 8096 None
Genewiz TFORF0907 BCLAF1 NM_001077441, ENST00000530767 997 4547 8097 None
Genewiz TFORF0908 BCLAF1 NM_001077440, ENST00000353331, ENST00000392348 998 4548 8098 None
Genewiz TFORF0909 BCLAF1 NM_001301038, ENST00000527759 999 4549 8099 None
Genewiz TFORF0910 BCLAF1 NM_014739, ENST00000531224 1000 4550 8100 None
Genewiz TFORF0911 BCLAF1 XM_005267237, ENST00000527536 1001 4551 8101 None
Genewiz TFORF0912 ZNF831 XM_006723698, XM_017027643, XM_017027644, XM_011528534, XM_017027642, XM_005260272, XM_011528537, XM_011528536, XM_005260273, XM_011528538, NM_178457, ENST00000637017, ENST00000371030 1002 4552 8102 None
Genewiz TFORF0913 ZNF835 NM_001005850, XM_005259383, XM_005259382, ENST00000537055 1003 4553 8103 None
Genewiz TFORF0914 ZNF836 XM_011526558, XM_011526559, NM_001102657, ENST00000597252 1004 4554 8104 None
Genewiz TFORF0915 NCOA2 NM_001321707, NM_006540, NM_001321703, ENST00000452400 1005 4555 8105 None
Genewiz TFORF0916 NCOA3 NM_001174088, ENST00000371997 1006 4556 8106 None
Genewiz TFORF0917 NCOA3 NM_181659, ENST00000371998 1007 4557 8107 None
Genewiz TFORF0918 NCOA1 XM_017005169, XM_017005168, XM_005264628, NM_147223, ENST00000405141, ENST00000288599 1008 4558 8108 None
Genewiz TFORF0919 NCOA1 XM_005264625, NM_003743, ENST00000406961, ENST00000348332 1009 4559 8109 None
Genewiz TFORF0920 NCOA1 XM_005264626, NM_147233, ENST00000395856 1010 4560 8110 None
Genewiz TFORF0921 BSX NM_001098169, ENST00000343035 1011 4561 8111 None
Genewiz TFORF0922 NR2F1 NM_005654, ENST00000327111 1012 4562 8112 None
Genewiz TFORF0923 NR2F2 NM_001145155, ENST00000421109 1013 4563 8113 None
Genewiz TFORF0924 NR2F2 NM_021005, ENST00000394166 1014 4564 8114 None
Genewiz TFORF0925 NR2F2 NM_001145156, NM_001145157, ENST00000394171, ENST00000453270 1015 4565 8115 None
Genewiz TFORF0926 ATOH7 NM_145178, ENST00000373673 1016 4566 8116 None
Genewiz TFORF0927 DDB2 NM_001300734, ENST00000378600 1017 4567 8117 None
Genewiz TFORF0928 DDB1 NM_001923, ENST00000301764 1018 4568 8118 None
Genewiz TFORF0929 ATOH8 NM_032827, ENST00000306279 1019 4569 8119 None
Genewiz TFORF0930 ZNF83 NM_001277951, NM_001105549, NM_001105551, NM_001277952, NM_001105550, NM_001105552, NM_018300, NM_001277947, NM_001277945, NM_001277949, NM_001277946, NM_001277948, XM_017026951, ENST00000301096, ENST00000536937, ENST00000597597, ENST00000541777, ENST00000545872 1020 4570 8120 None
Genewiz TFORF0931 MECP2 NM_001110792, ENST00000453960 1021 4571 8121 None
Genewiz TFORF0932 ZNF80 XM_017007133, XM_017007134, XM_017007136, XM_017007132, XM_017007135, NM_007136, ENST00000482457, ENST00000308095, ENST00000619534 1022 4572 8122 None
Genewiz TFORF0933 ZNF85 NM_001256172, ENST00000300540 1023 4573 8123 None
Genewiz TFORF0934 ZNF85 NM_003429, ENST00000328178 1024 4574 8124 None
Genewiz TFORF0935 ZNF84 NM_003428, NM_001289971, NM_001127372, XM_005266185, XM_005266186, XM_011534832, NM_001289972, ENST00000392319, ENST00000539354, ENST00000327668 1025 4575 8125 None
Genewiz TFORF0936 SS18 NM_005637, ENST00000269137 1026 4576 8126 None
Genewiz TFORF0937 SS18 NM_001308201, XM_011526145, ENST00000542420 1027 4577 8127 None
Genewiz TFORF0938 SS18 NM_001007559, ENST00000415083 1028 4578 8128 None
Genewiz TFORF0939 ZNF473 NM_001308424, ENST00000445728 1029 4579 8129 None
Genewiz TFORF0940 ZNF639 XM_017006551, NM_001303426, NM_001303425, XM_017006552, XM_017006550, XM_017006553, NM_016331, ENST00000496856, ENST0326361, ENST00000484866, ENST00000621687 1030 4580 8130 None
Genewiz TFORF0941 LMO1 XM_011520099, XM_011520098, ENST00000534484 1031 4581 8131 None
Genewiz TFORF0942 LMO1 NM_001270428, ENST00000428101 1032 4582 8132 None
Genewiz TFORF0943 LMO2 NM_005574, ENST00000257818 1033 4583 8133 None
Genewiz TFORF0944 LMO3 NM_001243612, ENST00000541295 1034 4584 8134 None
Genewiz TFORF0945 LMO3 NM_001243613, ENST00000540445 1035 4585 8135 None
Genewiz TFORF0946 LMO3 NM_001243611, ENST00000261169 1036 4586 8136 None
Genewiz TFORF0947 EMX2 NM_001165924, ENST00000442245 1037 4587 8137 None
Genewiz TFORF0948 EMX2 NM_004098, ENST00000553456 1038 4588 8138 None
Genewiz TFORF0949 EMX1 NM_004097, ENST00000258106 1039 4589 8139 None
Genewiz TFORF0950 PBRM1 XM_017006765, ENST00000337303 1040 4590 8140 None
Genewiz TFORF0951 PBRM1 XM_017006748, XM_017006749, XM_017006750, ENST00000296302 1041 4591 8141 None
Genewiz TFORF0952 PBRM1 NM_018313, ENST00000394830 1042 4592 8142 None
Genewiz TFORF0953 PBRM1 XM_017006758, XM_017006757, ENST00000409057 1043 4593 8143 None
Genewiz TFORF0954 CDIP1 NM_001199055, ENST00000563507 1044 4594 8144 None
Genewiz TFORF0955 CDIP1 NM_001199056, ENST00000562334 1045 4595 8145 None
Genewiz TFORF0956 MYF5 NM_005593, ENST00000228644 1046 4596 8146 None
Genewiz TFORF0957 ZNF497 NM_198458, NM_001207009, ENST00000311044, ENST00000425453 1047 4597 8147 None
Genewiz TFORF0958 ZNF496 XM_005273330, NM_032752, ENST00000294753 1048 4598 8148 None
Genewiz TFORF0959 ERF NM_001308402, NM_001301035, NM_001312656, XM_017026468, XM_017026469, ENST00000440177 1049 4599 8149 None
Genewiz TFORF0960 ERG NM_001243428, NM_001136154, ENST00000417133, ENST00000398919 1050 4600 8150 None
Genewiz TFORF0961 ERG NM_001243429, ENST00000398897 1051 4601 8151 None
Genewiz TFORF0962 ERG NM_004449, ENST00000398911, ENST00000442448 1052 4602 8152 None
Genewiz TFORF0963 ERG NM_001136155, ENST00000453032 1053 4603 8153 None
Genewiz TFORF0964 ERG XM_017028288, ENST00000398905 1054 4604 8154 None
Genewiz TFORF0965 ZNF493 NM_175910, ENST00000355504 1055 4605 8155 None
Genewiz TFORF0966 ZNF493 NM_145326, ENST00000339914 1056 4606 8156 None
Genewiz TFORF0967 ZNF493 NM_001076678, ENST00000392288 1057 4607 8157 None
Genewiz TFORF0968 HAND1 NM_004821, ENST00000231121 1058 4608 8158 None
Genewiz TFORF0969 HAND2 NM_021973, ENST00000359562 1059 4609 8159 None
Genewiz TFORF0970 ZNF124 NM_001243740, ENST00000472531 1060 4610 8160 None
Genewiz TFORF0971 ZNF124 NM_003431, ENST00000340684 1061 4611 8161 None
Genewiz TFORF0972 ZNF124 NM_001297567, ENST00000491356 1062 4612 8162 None
Genewiz TFORF0973 ZNF124 NM_001297568, ENST00000543802 1063 4613 8163 None
Genewiz TFORF0974 GFI1B XM_011519068, XM_017015175, XM_011519069, XM_006717297, ENST00000636137 1064 4614 8164 None
Genewiz TFORF0975 GFI1B NM_004188, XM_011519070, ENST00000339463, ENST00000372122 1065 4615 8165 None
Genewiz TFORF0976 GFI1B XM_017015176, NM_001135031, ENST00000372123 1066 4616 8166 None
Genewiz TFORF0977 ZNF121 XM_017027239, NM_001308269, NM_001008727, ENST00000586602, ENST00000320451 1067 4617 8167 None
Genewiz TFORF0978 KLF7 NM_001270944, ENST00000412414 1068 4618 8168 None
Genewiz TFORF0979 KLF7 NM_003709, XM_017005161, ENST00000309446 1069 4619 8169 None
Genewiz TFORF0980 KLF7 NM_001270943, ENST00000421199 1070 4620 8170 None
Genewiz TFORF0981 KLF6 NM_001300, ENST00000497571 1071 4621 8171 None
Genewiz TFORF0982 KLF6 NM_001160125, ENST00000542957 1072 4622 8172 None
Genewiz TFORF0983 KLF5 NM_001730, ENST00000377687 1073 4623 8173 None
Genewiz TFORF0984 KLF5 NM_001286818, ENST00000539231 1074 4624 8174 None
Genewiz TFORF0985 KLF4 NM_004235, ENST00000374672 1075 4625 8175 None
Genewiz TFORF0986 GZF1 XM_011529321, XM_011529322, NM_022482, NM_001317012, ENST00000338121, ENST00000377051 1076 4626 8176 None
Genewiz TFORF0987 KLF2 NM_016270, ENST00000248071 1077 4627 8177 None
Genewiz TFORF0988 KLF1 NM_006563, ENST00000264834 1078 4628 8178 None
Genewiz TFORF0989 KLF9 NM_001206, ENST00000377126 1079 4629 8179 None
Genewiz TFORF0990 KLF8 XM_006724576, ENST00000358094 1080 4630 8180 None
Genewiz TFORF0991 KLF8 XM_005261979, XM_005261977, NM_001324102, NM_007250, ENST00000468660 1081 4631 8181 None
Genewiz TFORF0992 KLF8 NM_001324099, ENST00000640927 1082 4632 8182 None
Genewiz TFORF0993 KLF8 NM_001159296, XM_006724575, ENST00000374928 1083 4633 8183 None
Genewiz TFORF0994 ZNF304 NM_020657, ENST00000391705, ENST00000282286 1084 4634 8184 None
Genewiz TFORF0995 ZNF304 NM_001290318, ENST00000443917 1085 4635 8185 None
Genewiz TFORF0996 ZNF304 NM_001290319, ENST00000598744 1086 4636 8186 None
Genewiz TFORF0997 ZNF302 XM_011527111, NM_001289189, NM_001289188, XM_017026986, ENST00000507959 1087 4637 8187 None
Genewiz TFORF0998 ZNF302 NM_001289190, NM_001289191, NM_001289192, NM_018675, ENST00000505365 1088 4638 8188 None
Genewiz TFORF0999 ZNF302 NM_001289185, NM_001289184, NM_001289182, XM_017026982, ENST00000613363 1089 4639 8189 None
Genewiz TFORF1000 ZNF302 NM_001289181, XM_017026979, XM_017026981, XM_017026980, ENST00000446502 1090 4640 8190 None
Genewiz TFORF1001 REST NM_005612, NM_001193508, XM_011534401, ENST00000309042, ENST00000619101 1091 4641 8191 None
Genewiz TFORF1002 ZNF300 NM_001172832, ENST00000418587 1092 4642 8192 None
Genewiz TFORF1003 ZNF300 NM_001172831, ENST00000446148 1093 4643 8193 None
Genewiz TFORF1004 POU1F1 NM_001122757, ENST00000344265 1094 4644 8194 None
Genewiz TFORF1005 POU1F1 NM_000306, ENST00000350375 1095 4645 8195 None
Genewiz TFORF1006 ZNF564 NM_144976, ENST00000339282 1096 4646 8196 None
Genewiz TFORF1007 ZNF544 NM_001320771, NM_001320770, NM_001320773, ENST00000600044, ENST00000600220 1097 4647 8197 None
Genewiz TFORF1008 ZNF544 NM_001320788, NM_001320792, NM_001320791, NM_001320789, ENST00000599227, ENST00000594384, ENST00000596825 1098 4648 8198 None
Genewiz TFORF1009 ZNF544 NM_001320782, ENST00000596929 1099 4649 8199 None
Genewiz TFORF1010 ZNF547 NM_173631, ENST00000282282 1100 4650 8200 None
Genewiz TFORF1011 ZNF546 NM_178544, XM_011526899, ENST00000347077 1101 4651 8201 None
Genewiz TFORF1012 ZNF546 NM_001297763, ENST00000600094 1102 4652 8202 None
Genewiz TFORF1013 NCL NM_005381, ENST00000322723 1103 4653 8203 None
Genewiz TFORF1014 ZNF540 NM_001172226, ENST00000589117 1104 4654 8204 None
Genewiz TFORF1015 ZNF540 NM_152606, NM_001172225, ENST00000592533, ENST00000316433, ENST00000343599 1105 4655 8205 None
Genewiz TFORF1016 ZNF549 NM_153263, ENST00000240719 1106 4656 8206 None
Genewiz TFORF1017 ZNF548 NM_001172773, ENST00000336128 1107 4657 8207 None
Genewiz TFORF1018 ZNF548 NM_152909, ENST00000366197 1108 4658 8208 None
Genewiz TFORF1019 SCRT1 NM_031309, ENST00000569446 1109 4659 8209 None
Genewiz TFORF1020 ZXDC NM_025112, ENST00000389709 1110 4660 8210 None
Genewiz TFORF1021 ZXDC NM_001040653, ENST00000336332 1111 4661 8211 None
Genewiz TFORF1022 SCRT2 NM_033129, ENST00000246104 1112 4662 8212 None
Genewiz TFORF1023 FOXI1 NM_012188, ENST00000306268 1113 4663 8213 None
Genewiz TFORF1024 FOXI3 NM_001135649, ENST00000428390 1114 4664 8214 None
Genewiz TFORF1025 FOXI2 NM_207426, ENST00000388920 1115 4665 8215 None
Genewiz TFORF1026 RARA NM_001024809, ENST00000394081 1116 4666 8216 None
Genewiz TFORF1027 RARA XM_005257552, ENST00000394086 1117 4667 8217 None
Genewiz TFORF1028 RARA NM_001145302, ENST00000425707 1118 4668 8218 None
Genewiz TFORF1029 RARB NM_016152, NM_001290276, NM_001290217, ENST00000437042, ENST00000458646 1119 4669 8219 None
Genewiz TFORF1030 RARG NM_001243732, ENST00000543726 1120 4670 8220 None
Genewiz TFORF1031 RARG NM_001042728, ENST00000338561 1121 4671 8221 None
Genewiz TFORF1032 RARG NM_001243730, ENST00000394426 1122 4672 8222 None
Genewiz TFORF1033 MYT1 NM_004535, ENST00000328439 1123 4673 8223 None
Genewiz TFORF1034 WT1 NM_000378, ENST00000452863, ENST00000639563 1124 4674 8224 None
Genewiz TFORF1035 WT1 NM_001198552, ENST00000530998 1125 4675 8225 None
Genewiz TFORF1036 WT1 NM_024426, ENST00000332351, ENST00000640146 1126 4676 8226 None
Genewiz TFORF1037 WT1 NM_024424, ENST00000448076, ENST00000639907 1127 4677 8227 None
Genewiz TFORF1038 WT1 NM_001198551, ENST00000379079 1128 4678 8228 None
Genewiz TFORF1039 TSC22D1 NM_001243799, ENST00000501704 1129 4679 8229 None
Genewiz TFORF1040 TSC22D1 NM_006022, ENST00000261489 1130 4680 8230 None
Genewiz TFORF1041 TSC22D1 NM_183422, ENST00000458659 1131 4681 8231 None
Genewiz TFORF1042 TSC22D1 NM_001243797, NM_001243798, ENST00000622051, ENST00000611198 1132 4682 8232 None
Genewiz TFORF1043 TSC22D3 NM_004089, ENST00000372397 1133 4683 8233 None
Genewiz TFORF1044 TSC22D3 NM_198057, NM_001318468, NM_001318470, XM_017029335, XM_005262100, XM_005262102, XM_005262103, XM_011530884, XM_005262099, ENST00000315660, ENST00000372383, ENST00000372384, ENST00000506081 1134 4684 8234 None
Genewiz TFORF1045 TSC22D2 NM_014779, ENST00000361875 1135 4685 8235 None
Genewiz TFORF1046 BAZ1B NM_032408, XM_017012773, ENST00000339594, ENST00000404251 1136 4686 8236 None
Genewiz TFORF1047 SNAI2 NM_003068, ENST00000020945 1137 4687 8237 None
Genewiz TFORF1048 PLAGL2 NM_002657, XM_005260436, XM_011528864, XM_011528863, ENST00000246229 1138 4688 8238 None
Genewiz TFORF1049 HKR1 XM_017026677, XM_017026676, ENST00000589392 1139 4689 8239 None
Genewiz TFORF1050 HKR1 XM_017026686, XM_017026689, XM_017026690, XM_017026687, XM_017026688, ENST00000541583 1140 4690 8240 None
Genewiz TFORF1051 HKR1 NM_181786, ENST00000324411 1141 4691 8241 None
Genewiz TFORF1052 HKR1 XM_017026679, XM_017026678, ENST00000392153 1142 4692 8242 None
Genewiz TFORF1053 HKR1 XM_017026698, XM_017026702, XM_017026701, XM_017026695, XM_017026692, XM_017026693, XM_017026700, XM_017026694, XM_017026704, XM_017026703, XM_017026697, XM_017026696, XM_017026691, XM_017026699, XM_017026705, ENST00000591471, ENST00000544914 1143 4693 8243 None
Genewiz TFORF1054 PLAGL1 NM_001317159, NM_001080951, NM_001289043, NM_001289044, NM_001289042, NM_001317157, NM_001317156, NM_001289046, NM_001317162, NM_001080954, NM_001080952, NM_006718, NM_001289048, NM_001289045, NM_001317161, NM_001080953, NM_001289049, NM_001289047, ENST00000360537, ENST00000354765, ENST0416623, ENST00000444202, ENST00000625622, ENST00000367571 1144 4694 8244 None
Genewiz TFORF1055 PLAGL1 NM_001289039, NM_001289040, NM_001080955, NM_001289037, NM_001289038, NM_001317158, NM_001080956, NM_001289041, NM_001317160, ENST00000437412, ENST00000367572, ENST00000417959 1145 4695 8245 None
Genewiz TFORF1056 NFATC1 NM_001278673, ENST00000545796 1146 4696 8246 None
Genewiz TFORF1057 NFATC1 NM_001278675, ENST00000592223 1147 4697 8247 None
Genewiz TFORF1058 NFATC1 NM_172389, ENST00000318065 1148 4698 8248 None
Genewiz TFORF1059 NFATC1 NM_006162, ENST00000253506 1149 4699 8249 None
Genewiz TFORF1060 NFATC1 NM_001278672, ENST00000586434 1150 4700 8250 None
Genewiz TFORF1061 NFATC1 NM_001278670, ENST00000542384 1151 4701 8251 None
Genewiz TFORF1062 NFATC1 NM_001278669, ENST00000427363 1152 4702 8252 None
Genewiz TFORF1063 NFATC1 NM_172388, ENST00000397790 1153 4703 8253 None
Genewiz TFORF1064 NFATC1 NM_172387, ENST00000329101 1154 4704 8254 None
Genewiz TFORF1065 NFATC2 NM_001258296, NM_001258294, ENST00000610033, ENST00000609507 1155 4705 8255 None
Genewiz TFORF1066 NFATC2 NM_173091, ENST00000396009 1156 4706 8256 None
Genewiz TFORF1067 NFATC2 NM_001258292, ENST00000609943 1157 4707 8257 None
Genewiz TFORF1068 NFATC2 NM_001136021, ENST00000414705 1158 4708 8258 None
Genewiz TFORF1069 NFATC2 NM_012340, ENST00000371564 1159 4709 8259 None
Genewiz TFORF1070 NFATC3 NM_004555, ENST00000329524 1160 4710 8260 None
Genewiz TFORF1071 NFATC3 NM_173163, ENST00000349223 1161 4711 8261 None
Genewiz TFORF1072 NFATC4 NM_001136022, ENST00000413692 1162 4712 8262 None
Genewiz TFORF1073 NFATC4 XM_011536797, ENST00000555453 1163 4713 8263 None
Genewiz TFORF1074 NFATC4 NM_001198966, ENST00000553879, ENST00000554344 1164 4714 8264 None
Genewiz TFORF1075 NFATC4 NM_001198965, ENST00000554050 1165 4715 8265 None
Genewiz TFORF1076 NFATC4 NM_001288802, ENST00000422617 1166 4716 8266 None
Genewiz TFORF1077 NFATC4 XM_011536799, ENST00000556169 1167 4717 8267 None
Genewiz TFORF1078 NFATC4 NM_001198967, ENST00000554591 1168 4718 8268 None
Genewiz TFORF1079 NFATC4 NM_004554, ENST00000250373 1169 4719 8269 None
Genewiz TFORF1080 NOBOX NM_001080413, ENST00000467773 1170 4720 8270 None
Genewiz TFORF1081 NOBOX XM_017011742, ENST00000483238 1171 4721 8271 None
Genewiz TFORF1082 FUBP1 NM_003902, ENST00000370768 1172 4722 8272 None
Genewiz TFORF1083 FUBP1 XM_017002743, ENST00000294623 1173 4723 8273 None
Genewiz TFORF1084 SPZ1 NM_032567, ENST00000296739 1174 4724 8274 None
Genewiz TFORF1085 AP2B1 NM_001282, XM_017024284, ENST00000621914 1175 4725 8275 None
Genewiz TFORF1086 GTF2E1 NM_005513, XM_011512745, XM_011512744, ENST00000283875 1176 4726 8276 None
Genewiz TFORF1087 TULP3 NM_001160408, ENST00000397132 1177 4727 8277 None
Genewiz TFORF1088 TULP3 NM_003324, ENST00000448120 1178 4728 8278 None
Genewiz TFORF1089 TULP1 NM_003322, ENST00000229771 1179 4729 8279 None
Genewiz TFORF1090 TULP1 NM_001289395, ENST00000322263 1180 4730 8280 None
Genewiz TFORF1091 ZBTB8A NM_001291496, ENST00000316459 1181 4731 8281 None
Genewiz TFORF1092 ZBTB8A NM_001040441, ENST00000373510 1182 4732 8282 None
Genewiz TFORF1093 ZBTB8B NM_001145720, ENST00000609129 1183 4733 8283 None
Genewiz TFORF1094 HSFX2 NM_001164415, ENST00000598963 1184 4734 8284 None
Genewiz TFORF1095 EN1 NM_001426, ENST00000295206 1185 4735 8285 None
Genewiz TFORF1096 EN2 NM_001427, ENST00000297375 1186 4736 8286 None
Genewiz TFORF1097 HSFX1 NM_016153, ENST00000370416 1187 4737 8287 None
Genewiz TFORF1098 UBP1 NM_001128160, ENST00000447368 1188 4738 8288 None
Genewiz TFORF1099 UBP1 NM_014517, NM_001128161, ENST00000283629, ENST00000283628 1189 4739 8289 None
Genewiz TFORF1100 GTF2IRD1 XM_006716182, XM_006716183, ENST00000476977 1190 4740 8290 None
Genewiz TFORF1101 GTF2IRD1 NM_005685, XM_017012804, ENST00000424337 1191 4741 8291 None
Genewiz TFORF1102 GTF2IRD1 NM_016328, ENST00000265755 1192 4742 8292 None
Genewiz TFORF1103 GTF2IRD1 NM_001199207, ENST00000455841 1193 4743 8293 None
Genewiz TFORF1104 GTF2IRD2 NM_173537, ENST00000451013 1194 4744 8294 None
Genewiz TFORF1105 GTF2IRD2 NM_001281447, ENST00000614386 1195 4745 8295 None
Genewiz TFORF1106 EPAS1 NM_001430, ENST00000263734 1196 4746 8296 None
Genewiz TFORF1107 ZNF37A NM_001324258, NM_001324256, NM_001324257, XM_017016621, ENST00000638053 1197 4747 8297 None
Genewiz TFORF1108 CREBBP NM_004380, ENST00000262367 1198 4748 8298 None
Genewiz TFORF1109 CREBBP NM_001079846, ENST00000382070 1199 4749 8299 None
Genewiz TFORF1110 ISX NM_001303508, ENST00000404699, ENST00000308700 1200 4750 8300 None
Genewiz TFORF1111 RCOR3 NM_001136225, ENST00000452621 1201 4751 8301 None
Genewiz TFORF1112 RCOR3 NM_018254, ENST00000367005 1202 4752 8302 None
Genewiz TFORF1113 RCOR3 NM_001136223, ENST00000419091 1203 4753 8303 None
Genewiz TFORF1114 RCOR3 NM_001136224, ENST00000367006 1204 4754 8304 None
Genewiz TFORF1115 RCOR2 NM_173587, ENST00000301459 1205 4755 8305 None
Genewiz TFORF1116 RCOR1 NM_015156, ENST00000262241 1206 4756 8306 None
Genewiz TFORF1117 BRIP1 NM_032043, ENST00000259008 1207 4757 8307 None
Genewiz TFORF1118 SKP2 NM_001243120, ENST00000620197 1208 4758 8308 None
Genewiz TFORF1119 ZNF714 NM_182515, ENST00000456283, ENST00000610902 1209 4759 8309 None
Genewiz TFORF1120 IFI16 NM_001206567, ENST00000359709 1210 4760 8310 None
Genewiz TFORF1121 IFI16 NM_005531, ENST00000368131, ENST00000368132 1211 4761 8311 None
Genewiz TFORF1122 HNRNPAB NM_004499, ENST00000355836, ENST00000506259 1212 4762 8312 None
Genewiz TFORF1123 HNRNPAB NM_031266, ENST00000358344, ENST00000504898 1213 4763 8313 None
Genewiz TFORF1124 E2F7 NM_203394, ENST00000322886 1214 4764 8314 None
Genewiz TFORF1125 ENO1 NM_001428, ENST00000234590 1215 4765 8315 None
Genewiz TFORF1126 THAP3 XM_005263532, NM_001195752, ENST00000307896 1216 4766 8316 None
Genewiz TFORF1127 THAP3 NM_138350, ENST00000377627 1217 4767 8317 None
Genewiz TFORF1128 THAP3 NM_001195753, ENST00000054650 1218 4768 8318 None
Genewiz TFORF1129 ZNF808 XM_005258909, ENST00000487863 1219 4769 8319 None
Genewiz TFORF1130 ZNF808 NM_001321425, NM_001039886, NM_001321424, ENST00000359798 1220 4770 8320 None
Genewiz TFORF1131 BNC1 NM_001717, ENST00000345382 1221 4771 8321 None
Genewiz TFORF1132 BNC1 NM_001301206, ENST00000569704 1222 4772 8322 None
Genewiz TFORF1133 CARF NM_001104586, NM_024744, NM_001322427, XM_011511867, XM_005246859, XM_005246858, XM_017004961, NM_001322428, ENST0402905, ENST00000438828 1223 4773 8323 None
Genewiz TFORF1134 CARF XM_017004964, XM_017004965, ENST00000414439 1224 4774 8324 None
Genewiz TFORF1135 CARF NM_001282910, NM_001282911, ENST00000320443, ENST00000428585 1225 4775 8325 None
Genewiz TFORF1136 LRRFIP1 NM_001137552, ENST00000392000 1226 4776 8326 None
Genewiz TFORF1137 LRRFIP1 NM_001137550, ENST00000308482 1227 4777 8327 None
Genewiz TFORF1138 LRRFIP1 NM_001137553, ENST00000289175 1228 4778 8328 None
Genewiz TFORF1139 LRRFIP1 NM_004735, ENST00000244815 1229 4779 8329 None
Genewiz TFORF1140 HMBOX1 NM_001324391, NM_001324392, ENST00000558662 1230 4780 8330 None
Genewiz TFORF1141 HMBOX1 NM_001135726, NM_024567, NM_001324382, ENST00000287701, ENST00000397358 1231 4781 8331 None
Genewiz TFORF1142 HMBOX1 XM_005273635, ENST00000524238 1232 4782 8332 None
Genewiz TFORF1143 HMBOX1 XM_017013825, ENST00000521516 1233 4783 8333 None
Genewiz TFORF1144 HOXA7 NM_006896, ENST00000242159 1234 4784 8334 None
Genewiz TFORF1145 CDK1 NM_033379, ENST00000316629, ENST00000373809 1235 4785 8335 None
Genewiz TFORF1146 HOXA4 NM_002141, ENST00000610970, ENST00000360046, ENST00000428284 1236 4786 8336 None
Genewiz TFORF1147 HOXA3 XM_011515343, XM_006715715, XM_005249730, NM_030661, NM_153631, XM_005249731, XM_005249732, ENST00000396352, ENST012286, ENST00000317201 1237 4787 8337 None
Genewiz TFORF1148 HOXA2 NM_006735, ENST00000222718 1238 4788 8338 None
Genewiz TFORF1149 HOXA1 NM_153620, ENST00000355633 1239 4789 8339 None
Genewiz TFORF1150 CDK7 NM_001324072, NM_001324077, NM_001324078, NM_001324074, NM_001324075, ENST00000502604 1240 4790 8340 None
Genewiz TFORF1151 MYOCD NM_153604, ENST00000343344 1241 4791 8341 None
Genewiz TFORF1152 MYOCD NM_001146312, ENST00000425538 1242 4792 8342 None
Genewiz TFORF1153 RERE NM_012102, NM_001042681, XM_005263464, XM_017001359, XM_017001358, ENST00000337907, ENST00000400908 1243 4793 8343 None
Genewiz TFORF1154 RERE NM_001042682, ENST00000476556 1244 4794 8344 None
Genewiz TFORF1155 RERE XM_005263466, ENST00000377464 1245 4795 8345 None
Genewiz TFORF1156 DMBX1 NM_147192, ENST00000371956 1246 4796 8346 None
Genewiz TFORF1157 DMBX1 XM_017000289, XM_011540668, NM_172225, ENST00000360032 1247 4797 8347 None
Genewiz TFORF1158 EOMES NM_001278182, ENST00000449599 1248 4798 8348 None
Genewiz TFORF1159 EOMES NM_005442, ENST00000295743 1249 4799 8349 None
Genewiz TFORF1160 EOMES NM_001278183, ENST00000461503 1250 4800 8350 None
Genewiz TFORF1161 NFKBIL1 NM_001144962, ENST00000376146 1251 4801 8351 None
Genewiz TFORF1162 NFKBIL1 NM_005007, ENST00000376148 1252 4802 8352 None
Genewiz TFORF1163 NFKBIL1 NM_001144961, ENST00000376145 1253 4803 8353 None
Genewiz TFORF1164 NFXL1 NM_001278624, NM_152995, NM_001278623, ENST00000329043, ENST00000381538, ENST00000507489 1254 4804 8354 None
Genewiz TFORF1165 PRKCD NM_212539, NM_006254, XM_006713259, NM_001316327, XM_017006856, XM_017006855, ENST00000394729, ENST00000330452 1255 4805 8355 None
Genewiz TFORF1166 EBF4 NM_001110514, ENST00000380648 1256 4806 8356 None
Genewiz TFORF1167 EBF3 XM_005252668, ENST00000355311 1257 4807 8357 None
Genewiz TFORF1168 EBF2 NM_022659, ENST00000520164 1258 4808 8358 None
Genewiz TFORF1169 EBF1 NM_182708, ENST00000380654 1259 4809 8359 None
Genewiz TFORF1170 MLX NM_198205, ENST00000346833 1260 4810 8360 None
Genewiz TFORF1171 MLX NM_170607, ENST00000246912 1261 4811 8361 None
Genewiz TFORF1172 ZNF157 NM_003446, ENST00000377073 1262 4812 8362 None
Genewiz TFORF1173 ZNF154 NM_001085384, ENST00000512439, ENST00000451275 1263 4813 8363 None
Genewiz TFORF1174 NRL XM_011536804, XM_011536805, XM_005267709, XM_011536802, XM_005267710, XM_005267708, NM_006177, ENST00000397002, ENST0561028, ENST00000396997 1264 4814 8364 None
Genewiz TFORF1175 ZNF790 XM_011526950, XM_005258903, NM_001242802, NM_206894, NM_001242801, NM_001242800, ENST00000356725, ENST00000615484, ENST00000614179, ENST00000613249 1265 4815 8365 None
Genewiz TFORF1176 ZNF792 NM_175872, ENST00000404801 1266 4816 8366 None
Genewiz TFORF1177 ZNF793 NM_001013659, XM_006723213, XM_005258927, ENST00000587143, ENST00000445217 1267 4817 8367 None
Genewiz TFORF1178 ZNF98 NM_001098626, ENST00000357774 1268 4818 8368 None
Genewiz TFORF1179 ZNF799 NM_001080821, ENST00000430385 1269 4819 8369 None
Genewiz TFORF1180 ZNF799 NM_001322497, NM_001322498, ENST00000419318 1270 4820 8370 None
Genewiz TFORF1181 TBX21 NM_013351, ENST00000177694 1271 4821 8371 None
Genewiz TFORF1182 TBX20 NM_001077653, ENST00000408931 1272 4822 8372 None
Genewiz TFORF1183 DEAF1 NM_021008, ENST00000382409 1273 4823 8373 None
Genewiz TFORF1184 MAP3K7 NM_145331, ENST00000369329 1274 4824 8374 None
Genewiz TFORF1185 MAP3K7 NM_145332, ENST00000369325 1275 4825 8375 None
Genewiz TFORF1186 MAP3K7 NM_003188, ENST00000369332 1276 4826 8376 None
Genewiz TFORF1187 ZNF570 NM_001300993, ENST00000586475 1277 4827 8377 None
Genewiz TFORF1188 ZNF570 NM_144694, NM_001321991, XM_011526544, ENST00000330173 1278 4828 8378 None
Genewiz TFORF1189 ZNF571 NM_001321272, NM_001290314, NM_016536, ENST00000328550, ENST00000593133, ENST00000451802, ENST00000358744 1279 4829 8379 None
Genewiz TFORF1190 ZNF573 NM_001172690, ENST00000590414, ENST00000536220 1280 4830 8380 None
Genewiz TFORF1191 ZNF573 XM_017026281, XM_017026280, NM_001172689, NM_001172692, ENST00000357309 1281 4831 8381 None
Genewiz TFORF1192 ZNF573 NM_152360, ENST00000339503 1282 4832 8382 None
Genewiz TFORF1193 ZNF574 XM_017027149, ENST00000222339 1283 4833 8383 None
Genewiz TFORF1194 ZNF575 XM_011526793, XM_005258783, NM_174945, ENST00000314228, ENST00000601282 1284 4834 8384 None
Genewiz TFORF1195 ZNF576 NM_024327, NM_001145347, ENST00000391965, ENST00000525771, ENST00000533118, ENST00000528387, ENST00000529930, ENST00000336564 1285 4835 8385 None
Genewiz TFORF1196 ZNF577 NM_032679, XM_006723432, XM_006723433, XM_011527408, ENST00000301399 1286 4836 8386 None
Genewiz TFORF1197 ZNF577 NM_001135590, XM_017027387, ENST00000639636, ENST00000451628 1287 4837 8387 None
Genewiz TFORF1198 ZNF578 XM_017026302, NM_001099694, ENST00000421239 1288 4838 8388 None
Genewiz TFORF1199 ZNF579 NM_152600, ENST00000325421 1289 4839 8389 None
Genewiz TFORF1200 SETDB1 NM_001243491, XM_017002955, ENST00000368962 1290 4840 8390 None
Genewiz TFORF1201 SETDB1 NM_001145415, ENST00000271640 1291 4841 8391 None
Genewiz TFORF1202 SETDB1 NM_012432, ENST00000368969 1292 4842 8392 None
Genewiz TFORF1203 SETDB2 NM_031915, ENST00000354234 1293 4843 8393 None
Genewiz TFORF1204 SETDB2 NM_001160308, ENST00000317257 1294 4844 8394 None
Genewiz TFORF1205 ZNF205 NM_001042428, NM_003456, NM_001278158, XM_005255558, ENST00000382192, ENST00000219091, ENST00000620094 1295 4845 8395 None
Genewiz TFORF1206 ZNF202 NM_001301779, NM_003455, NM_001301780, XM_006718901, XM_011542973, XM_011542975, XM_011542972, XM_005271660, XM_005271661, XM_005271659, XM_017018268, ENST00000336139, ENST0530393, ENST00000529691 1296 4846 8396 None
Genewiz TFORF1207 CASZ1 NM_001079843, ENST00000377022 1297 4847 8397 None
Genewiz TFORF1208 CASZ1 NM_017766, ENST00000344008 1298 4848 8398 None
Genewiz TFORF1209 ZNF200 NM_001145447, NM_001145448, NM_001145446, ENST00000396871, ENST00000396870, ENST00000575948 1299 4849 8399 None
Genewiz TFORF1210 ZNF200 NM_198087, ENST00000396868 1300 4850 8400 None
Genewiz TFORF1211 EVX1 NM_001989, ENST00000496902 1301 4851 8401 None
Genewiz TFORF1212 ZNF208 NM_007153, ENST00000397126 1302 4852 8402 None
Genewiz TFORF1213 EVX2 NM_001080458, ENST00000308618 1303 4853 8403 None
Genewiz TFORF1214 FOXH1 NM_003923, ENST00000377317 1304 4854 8404 None
Genewiz TFORF1215 NME2 NM_001198682, ENST00000393183 1305 4855 8405 None
Genewiz TFORF1216 NHLH1 NM_005598, ENST00000302101 1306 4856 8406 None
Genewiz TFORF1217 PRRX1 NM_022716, ENST00000239461 1307 4857 8407 None
Genewiz TFORF1218 PRRX1 NM_006902, ENST00000367760 1308 4858 8408 None
Genewiz TFORF1219 PRRX2 NM_016307, ENST00000372469 1309 4859 8409 None
Genewiz TFORF1220 HELT NM_001300782, ENST00000505610 1310 4860 8410 None
Genewiz TFORF1221 HELT NM_001300781, ENST00000515777 1311 4861 8411 None
Genewiz TFORF1222 ZNF483 NM_001007169, ENST00000358151 1312 4862 8412 None
Genewiz TFORF1223 ZNF483 NM_133464, XM_017014337, XM_017014338, XM_011518300, ENST00000309235 1313 4863 8413 None
Genewiz TFORF1224 AKNA XM_006717295, ENST00000312033 1314 4864 8414 None
Genewiz TFORF1225 AKNA NM_001317950, NM_030767, XM_005252247, XM_005252245, XM_006717294, XM_005252244, ENST00000307564, ENST00000374088 1315 4865 8415 None
Genewiz TFORF1226 DUX4 XM_011531514, ENST00000570263 1316 4866 8416 None
Genewiz TFORF1227 DUX4 NM_001293798, NM_001306068, ENST00000616166, ENST00000565211, ENST00000569241 1317 4867 8417 None
Genewiz TFORF1228 DAXX NM_001254717, ENST00000414083 1318 4868 8418 None
Genewiz TFORF1229 AFF4 NM_014423, XM_005271963, ENST00000265343 1319 4869 8419 None
Genewiz TFORF1230 AFF3 XM_011511174, XM_011511173, NM_001025108, ENST00000409579 1320 4870 8420 None
Genewiz TFORF1231 AFF3 XM_005263943, XM_011511176, XM_011511175, XM_011511177, NM_002285, ENST00000409236, ENST00000317233 1321 4871 8421 None
Genewiz TFORF1232 AFF2 NM_001169124, ENST00000370457 1322 4872 8422 None
Genewiz TFORF1233 AFF2 NM_001169122, ENST00000342251 1323 4873 8423 None
Genewiz TFORF1234 AFF2 NM_002025, ENST00000370460 1324 4874 8424 None
Genewiz TFORF1235 AFF2 NM_001170628, ENST00000286437 1325 4875 8425 None
Genewiz TFORF1236 AFF1 NM_001166693, XM_011531973, XM_005263007, ENST00000395146 1326 4876 8426 None
Genewiz TFORF1237 AFF1 NM_005935, ENST00000307808 1327 4877 8427 None
Genewiz TFORF1238 HOMEZ NM_020834, ENST00000357460 1328 4878 8428 None
Genewiz TFORF1239 ZNF431 NM_133473, ENST00000311048 1329 4879 8429 None
Genewiz TFORF1240
LBX2 NM_001009812, ENST00000460508 1330 4880 8430 None Genewiz TFORF1241 LBX2 NM_001282430, ENST00000377566 1331 4881 8431 None Genewiz TFORF1242 LBX1 NM_006562, ENST00000370193 1332 4882 8432 None Genewiz TFORF1243 SFPQ XM_017002053, XM_017002054, XM_005271112, 1333 4883 8433 None XM_005271113, NM_005066, ENST00000357214 Genewiz TFORF1244 PHF20 XM_017027864, NM_016436, XM_017027867, 1334 4884 8434 None XM_017027866, XM_017027865, ENST00000374012 Genewiz TFORF1245 PKNOX2 XM_017018110, NM_022062, XM_011542945, 1335 4885 8435 None XM_005271642, XM_011542944, ENST00000298282 Genewiz TFORF1246 ZNF133 NM_001283004, NM_001283003, XM_017028055, 1336 4886 8436 None XM_017028057, XM_017028056, ENST00000402618 Genewiz TFORF1247 ZNF133 NM_001282998, NM_001282999, NM_001282997, 1337 4887 8437 None NM_001283001, NM_001283000, XM_011529338, XM_011529337, XM_017028046, XM017028044, XM_017028045, XM_011529336, XM_005260820, XM_011529339, XM_017028048, XM_017028047, XM_005260819, ENST001790, ENST00000622607, ENST00000316358 Genewiz TFORF1248 ZNF133 NM_001283005, ENST00000538547 1338 4888 8438 None Genewiz TFORF1249 ZNF133 NM_001283002, XM_017028041, XM_017028042, 1339 4889 8439 None ENST00000628216 Genewiz TFORF1250 ZNF133 NM_001283008, ENST00000630056 1340 4890 8440 None Genewiz TFORF1251 ZNF133 NM_001283007, ENST00000535822 1341 4891 8441 None Genewiz TFORF1252 TP53 NM_001276696, ENST00000622645 1342 4892 8442 None Genewiz TFORF1253 TP53 NM_001126118, NM_001276760, NM_001276761, 1343 4893 8443 None ENST00000610292, ENST00000620739, ENST00000619485 Genewiz TFORF1254 TP53 NM_001276697, ENST00000619186 1344 4894 8444 None Genewiz TFORF1255 TP53 NM_001126117, ENST00000504290 1345 4895 8445 None Genewiz TFORF1256 TP53 NM_001126113, ENST00000455263 1346 4896 8446 None Genewiz TFORF1257 TP53 NM_000546, NM_001126112, ENST00000269305, 1347 4897 8447 None ENST00000445888 Genewiz TFORF1258 TP53 NM_001276695, ENST00000610538 1348 4898 8448 None Genewiz TFORF1259 TP53 NM_001126115, 
ENST00000504937 1349 4899 8449 None Genewiz TFORF1260 TP53 NM_001126114, ENST00000617185, ENST00000420246 1350 4900 8450 None Genewiz TFORF1261 TP53 NM_001126116, ENST00000510385 1351 4901 8451 None Genewiz TFORF1262 TP53 NM_001276698, ENST00000618944 1352 4902 8452 None Genewiz TFORF1263 TP53 NM_001276699, ENST00000610623 1353 4903 8453 None Genewiz TFORF1264 ZNF135 NM_001164530, ENST00000359978 1354 4904 8454 None Genewiz TFORF1265 ZNF135 XM_006723362, NM_003436, XM_006723363, 1355 4905 8455 None ENST00000511556 Genewiz TFORF1266 ZNF135 NM_001289401, XM_017027240, ENST00000313434 1356 4906 8456 None Genewiz TFORF1267 ZNF135 NM_001289402, ENST00000506786 1357 4907 8457 None Genewiz TFORF1268 ZNF135 NM_007134, ENST00000401053 1358 4908 8458 None Genewiz TFORF1269 HIVEP2 NM_006734, XM_017010805, ENST00000367603, 1359 4909 8459 None ENST00000012134, ENST00000367604 Genewiz TFORF1270 HLTF XM_017007079, NM_001318934, ENST00000465259 1360 4910 8460 None Genewiz TFORF1271 HLTF NM_001318935, NM_003071, NM_139048, 1361 4911 8461 None ENST00000310053, ENST0392912, ENST00000494055 Genewiz TFORF1272 DUXA NM_001012729, ENST00000554048 1362 4912 8462 None Genewiz TFORF1273 DPF2 XM_005274149, ENST00000252268 1363 4913 8463 None Genewiz TFORF1274 PKNOX1 NM_001286258, ENST00000432907 1364 4914 8464 None Genewiz TFORF1275 TPRX1 NM_198479, ENST00000322175 1365 4915 8465 None Genewiz TFORF1276 ZNF253 NM_021047, ENST00000589717 1366 4916 8466 None Genewiz TFORF1277 ZNF587 NM_001204817, ENST00000423137 1367 4917 8467 None Genewiz TFORF1278 ZNF587 NM_032828, ENST00000339656 1368 4918 8468 None Genewiz TFORF1279 ZNF254 NM_001278663, XM_011528448, NM_001278678, 1369 4919 8469 None ENST00000616028 Genewiz TFORF1280 ZNF254 XM_017027518, XM_017027519, NM_001278664, 1370 4920 8470 None ENST00000611359 Genewiz TFORF1281 ZNF254 NM_001278661, NM_001278677, XM_011528443, 1371 4921 8471 None XM_017027515, XM_011528444, XM_017027514, NM_001278662, XM_017027516, ENST00000613065 Genewiz TFORF1282 
SALL4 NM_001318031, ENST00000395997 1372 4922 8472 None Genewiz TFORF1283 SALL1 NM_001127892, ENST00000440970 1373 4923 8473 None Genewiz TFORF1284 SALL1 NM_002968, XM_011523254, XM_006721241, 1374 4924 8474 None ENST00000251020 Genewiz TFORF1285 ZNF257 NM_033468, ENST00000594947 1375 4925 8475 None Genewiz TFORF1286 SALL3 NM_171999, ENST00000537592 1376 4926 8476 None Genewiz TFORF1287 SALL2 NM_005407, ENST00000614342 1377 4927 8477 None Genewiz TFORF1288 SALL2 NM_001291447, ENST00000450879 1378 4928 8478 None Genewiz TFORF1289 GMEB2 NM_012384, XM_005260202, ENST00000266068, 1379 4929 8479 None ENST00000370077 Genewiz TFORF1290 GMEB1 NM_006582, XM_011540519, XM_011540518, 1380 4930 8480 None XM_017000087, ENST00000294409 Genewiz TFORF1291 SP140L NM_001308163, ENST00000396563 1381 4931 8481 None Genewiz TFORF1292 SP140L NM_138402, ENST00000415673 1382 4932 8482 None Genewiz TFORF1293 SP140L NM_001308162, ENST00000243810 1383 4933 8483 None Genewiz TFORF1294 ZFP64 NM_022088, ENST00000346617 1384 4934 8484 None Genewiz TFORF1295 ZFP64 NM_001319146, ENST00000371523 1385 4935 8485 None Genewiz TFORF1296 ZFP64 NM_199427, ENST00000361387 1386 4936 8486 None Genewiz TFORF1297 ZFP64 NM_199426, ENST00000371515 1387 4937 8487 None Genewiz TFORF1298 ZFP64 NM_018197, ENST00000216923 1388 4938 8488 None Genewiz TFORF1299 MAEL XM_017002602, NM_001286378, XM_017002603, 1389 4939 8489 None ENST00000622874 Genewiz TFORF1300 MAEL NM_001286377, ENST00000367870 1390 4940 8490 None Genewiz TFORF1301 HMG20A NM_001304504, NM_018200, XM_011521158, 1391 4941 8491 None ENST00000336216, ENST00000381714 Genewiz TFORF1302 ZMIZ2 XM_005249872, NM_174929, ENST00000265346 1392 4942 8492 None Genewiz TFORF1303 ZMIZ2 XM_005249873, ENST00000433667 1393 4943 8493 None Genewiz TFORF1304 ZMIZ2 XM_005249869, NM_031449, ENST00000309315, 1394 4944 8494 None ENST00000441627 Genewiz TFORF1305 ZMIZ1 NM_020338, ENST00000334512 1395 4945 8495 None Genewiz TFORF1306 ZSCAN9 NM_001199479, ENST00000425468 1396 4946 
8496 None Genewiz TFORF1307 JARID2 NM_004973, ENST00000341776 1397 4947 8497 None Genewiz TFORF1308 JARID2 XM_017010834, NM_001267040, XM_005249089, 1398 4948 8498 None XM_017010835, ENST00000397311 Genewiz TFORF1309 TTF1 NM_007344, XM_006717273, ENST00000334270 1399 4949 8499 None Genewiz TFORF1310 TTF1 NM_001205296, ENST00000612514 1400 4950 8500 None Genewiz TFORF1311 RBMS1 NM_016836, ENST00000348849 1401 4951 8501 None Genewiz TFORF1312 AEBP2 NM_001267043, ENST00000360995 1402 4952 8502 None Genewiz TFORF1313 AEBP2 NM_153207, ENST00000266508 1403 4953 8503 None Genewiz TFORF1314 AEBP2 NM_001114176, ENST00000398864 1404 4954 8504 None Genewiz TFORF1315 ZNF813 NM_001004301, ENST00000396403 1405 4955 8505 None Genewiz TFORF1316 RB1CC1 NM_001083617, XM_017014107, ENST00000435644 1406 4956 8506 None Genewiz TFORF1317 RB1CC1 NM_014781, XM_011517643, ENST00000025008 1407 4957 8507 None Genewiz TFORF1318 MKX NM_173576, NM_001242702, XM_017016106, 1408 4958 8508 None XM_017016105, ENST00000419761, ENST00000375790 Genewiz TFORF1319 KLF13 NM_015995, ENST00000307145 1409 4959 8509 None Genewiz TFORF1320 KLF11 NM_001177716, NM_001177718, ENST00000540845, 1410 4960 8510 None ENST00000535335 Genewiz TFORF1321 KLF11 NM_003597, ENST00000305883 1411 4961 8511 None Genewiz TFORF1322 KLF10 NM_005655, ENST00000285407 1412 4962 8512 None Genewiz TFORF1323 KLF10 NM_001032282, ENST00000395884 1413 4963 8513 None Genewiz TFORF1324 KLF17 NM_173484, ENST00000372299 1414 4964 8514 None Genewiz TFORF1325 KLF16 NM_031918, ENST00000541015, ENST00000250916, 1415 4965 8515 None ENST00000617223 Genewiz TFORF1326 KLF15 NM_014079, XM_005247400, ENST00000296233 1416 4966 8516 None Genewiz TFORF1327 KLF14 NM_138693, ENST00000583337 1417 4967 8517 None Genewiz TFORF1328 TFAP2A NM_001042425, ENST00000319516 1418 4968 8518 None Genewiz TFORF1329 TFAP2A NM_003220, ENST00000482890 1419 4969 8519 None Genewiz TFORF1330 TFAP2C NM_003222, ENST00000201031 1420 4970 8520 None Genewiz TFORF1331 TFAP2B 
NM_003221, ENST00000393655 1421 4971 8521 None Genewiz TFORF1332 TFAP2E NM_178548, ENST00000373235 1422 4972 8522 None Genewiz TFORF1333 TFAP2D NM_172238, ENST00000008391 1423 4973 8523 None Genewiz TFORF1334 POU3F3 NM_006236, ENST00000361360 1424 4974 8524 None Genewiz TFORF1335 POU3F2 NM_005604, ENST00000328345 1425 4975 8525 None Genewiz TFORF1336 POU3F1 NM_002699, ENST00000373012 1426 4976 8526 None Genewiz TFORF1337 ZBTB7A NM_001317990, NM_015898, XM_005259571, 1427 4977 8527 None ENST00000322357, ENST00000601588 Genewiz TFORF1338 ZBTB7C XM_017025609, XM_011525870, XM_005258229, 1428 4978 8528 None NM_001318841, NM_001039360, XM_011525871, ENST00000535628, ENST00000586438, ENST00000588982, ENST00000590800 Genewiz TFORF1339 POU3F4 NM_000307, ENST00000373200 1429 4979 8529 None Genewiz TFORF1340 POU5F1 NM_001285986, ENST00000441888, ENST00000471529, 1430 4980 8530 None ENST00000512818, ENST00000513407 Genewiz TFORF1341 POU5F1 NM_203289, NM_001173531, ENST00000606567, 1431 4981 8531 None ENST00000620031 Genewiz TFORF1342 POU5F2 NM_153216, ENST00000606183 1432 4982 8532 None Genewiz TFORF1343 FEZF1 XM_011516202, NM_001160264, ENST00000427185 1433 4983 8533 None Genewiz TFORF1344 FEZF2 NM_018008, ENST00000486811, ENST00000283268, 1434 4984 8534 None ENST00000475839 Genewiz TFORF1345 RPA1 NM_002945, ENST00000254719 1435 4985 8535 None Genewiz TFORF1346 ZNF541 NM_001277075, XM_005259311, ENST00000391901 1436 4986 8536 None Genewiz TFORF1347 RPA3 NM_002947, ENST00000223129, ENST00000396682 1437 4987 8537 None Genewiz TFORF1348 RPA2 NM_001297558, ENST00000373909 1438 4988 8538 None Genewiz TFORF1349 POGZ NM_207171, XM_017000749, XM_005245006, 1439 4989 8539 None XM_017000748, XM_005245005, ENST00000392723 Genewiz TFORF1350 POGZ NM_001194937, XM_017000745, XM_017000746, 1440 4990 8540 None ENST00000409503 Genewiz TFORF1351 POGZ NM_145796, ENST00000368863 1441 4991 8541 None Genewiz TFORF1352 POGZ NM_015100, XM_005244999, XM_005245000, 1442 4992 8542 None XM_005245001, 
ENST00000271715 Genewiz TFORF1353 POGZ NM_001194938, ENST00000531094 1443 4993 8543 None Genewiz TFORF1354 ILF3 NM_017620, XM_017026763, XM_011527984, 1444 4994 8544 None ENST00000449870, ENST00000588657 Genewiz TFORF1355 ILF3 NM_012218, ENST00000590261 1445 4995 8545 None Genewiz TFORF1356 ILF3 NM_004516, ENST00000589998 1446 4996 8546 None Genewiz TFORF1357 ILF3 NM_153464, ENST00000250241 1447 4997 8547 None Genewiz TFORF1358 SP100 NM_003113, ENST00000264052 1448 4998 8548 None Genewiz TFORF1359 SP100 NM_001206701, ENST00000409112 1449 4999 8549 None Genewiz TFORF1360 SP100 NM_001206703, ENST00000427101 1450 5000 8550 None Genewiz TFORF1361 SP100 NM_001080391, ENST00000340126 1451 5001 8551 None Genewiz TFORF1362 SP100 NM_001206704, ENST00000409897 1452 5002 8552 None Genewiz TFORF1363 ELOF1 XM_017027356, ENST00000587806 1453 5003 8553 None Genewiz TFORF1364 ELOF1 XM_017027357, ENST00000591674 1454 5004 8554 None Genewiz TFORF1365 ISL1 NM_002202, ENST00000230658 1455 5005 8555 None Genewiz TFORF1366 ISL2 NM_145805, ENST00000290759 1456 5006 8556 None Genewiz TFORF1367 RREB1 NM_001003699, XM_006715157, ENST00000379938 1457 5007 8557 None Genewiz TFORF1368 RREB1 NM_001003700, ENST00000334984 1458 5008 8558 None Genewiz TFORF1369 RREB1 NM_001168344, NM_001003698, ENST00000379933, 1459 5009 8559 None ENST00000349384 Genewiz TFORF1370 ZNF141 NM_003441, ENST00000240499 1460 5010 8560 None Genewiz TFORF1371 ZNF140 XM_017019925, NM_001300776, NM_001300778, 1461 5011 8561 None XM_011534840, XM_017019924, ENST00000544426 Genewiz TFORF1372 ZNF143 NM_001282657, ENST00000396597 1462 5012 8562 None Genewiz TFORF1373 ZNF143 NM_001282656, XM_017018254, XM_017018255, 1463 5013 8563 None ENST00000396604, ENST00000530463 Genewiz TFORF1374 ZNF143 NM_003442, XM_011520349, ENST00000396602 1464 5014 8564 None Genewiz TFORF1375 ZNF142 XM_017004872, NM_001105537, XM_011511789, 1465 5015 8565 None ENST00000449707, ENST00000411696 Genewiz TFORF1376 ZNF146 NM_001099639, NM_001099638, 
NM_007145, 1466 5016 8566 None XM_005259214, XM_017027247, XM_017027245, XM_017027246, XM_017027244, ENST0456324, ENST00000443387 Genewiz TFORF1377 ZNF148 NM_021964, ENST00000360647, ENST00000484491, 1467 5017 8567 None ENST00000492394, ENST00000485866 Genewiz TFORF1378 ZNF783 NM_001195220, ENST00000434415 1468 5018 8568 None Genewiz TFORF1379 ZNF782 XM_011518315, XM_005251742, XM_011518318, 1469 5019 8569 None NM_001001662, ENST00000481138, ENST00000535338 Genewiz TFORF1380 ZNF787 NM_001002836, XM_011526445, ENST00000610935 1470 5020 8570 None Genewiz TFORF1381 ZNF789 NM_001013258, ENST00000379724 1471 5021 8571 None Genewiz TFORF1382 ZNF789 XM_017012018, NM_213603, ENST00000331410 1472 5022 8572 None Genewiz TFORF1383 ZNF788 ENST00000430298 1473 5023 8573 None Genewiz TFORF1384 ZNF788 ENST00000596883 1474 5024 8574 None Genewiz TFORF1385 TMF NM_007114, ENST00000398559 1475 5025 8575 None Genewiz TFORF1386 ZNF362 NM_152493, XM_017000415, ENST00000539719, 1476 5026 8576 None ENST00000373428 Genewiz TFORF1387 ZNF367 NM_153695, ENST00000375256 1477 5027 8577 None Genewiz TFORF1388 PMS1 NM_001128144, XM_017004348, ENST00000447232 1478 5028 8578 None Genewiz TFORF1389 PMS1 NM_001321051, ENST00000374826 1479 5029 8579 None Genewiz TFORF1390 PMS1 NM_001289409, NM_001289408, XM_011511356, 1480 5030 8580 None ENST00000432292, ENST00000624204 Genewiz TFORF1391 PMS1 NM_001321048, NM_000534, NM_001321045, 1481 5031 8581 None NM_001321047, ENST00000441310 Genewiz TFORF1392 PMS1 NM_001128143, XM_017004344, ENST00000409823 1482 5032 8582 None Genewiz TFORF1393 PMS1 NM_001321049, ENST00000409985 1483 5033 8583 None Genewiz TFORF1394 ZNF569 XM_017026381, XM_017026380, XM_017026379, 1484 5034 8584 None XM_017026382, ENST00000392150 Genewiz TFORF1395 ZNF569 XM_011526539, XM_017026376, XM_017026377, 1485 5035 8585 None NM_152484, ENST00000392149, ENST00000316950 Genewiz TFORF1396 ZNF568 NM_198539, ENST00000333987, ENST00000619231 1486 5036 8586 None Genewiz TFORF1397 ZNF568 
NM_001204837, NM_001204836, XM_017026773, 1487 5037 8587 None XM_017026774, ENST00000415168 Genewiz TFORF1398 ZNF568 NM_001204839, XM_017026775, ENST00000455427 1488 5038 8588 None Genewiz TFORF1399 ZNF568 NM_001204838, XM_017026772, ENST00000617745 1489 5039 8589 None Genewiz TFORF1400 ZNF567 XM_017026420, NM_001322911, NM_152603, 1490 5040 8590 None NM_001322912, XM_017026421, XM_011526584, ENST00000360729, ENST00000585696 Genewiz TFORF1401 ZNF567 XM_017026417, NM_001322916, NM_001322915, 1491 5041 8591 None NM_001322919, NM_001322917, NM_001322918, NM_001322914, NM_001300979, NM_001322920, NM_001322913, XM_017026418, ENST00000536254 Genewiz TFORF1402 ZNF567 XM_011526585, ENST00000588311 1492 5042 8592 None Genewiz TFORF1403 ZNF566 NM_001300970, XM_017027400, XM_005259356, 1493 5043 8593 None XM_017027399, XM_011527428, ENST00000493391 Genewiz TFORF1404 ZNF566 NM_001145343, XM_006723447, ENST00000392170 1494 5044 8594 None Genewiz TFORF1405 ZNF566 NM_032838, NM_001145345, NM_001145344, 1495 5045 8595 None ENST00000434377, ENST00000424129 Genewiz TFORF1406 ZNF565 NM_152477, NM_001042474, XM_011526514, 1496 5046 8596 None XM_011526512, XM_017026341, ENST00000304116, ENST00000392173 Genewiz TFORF1407 ZNF559 NM_001202409, ENST00000592896 1497 5047 8597 None Genewiz TFORF1408 ZNF559 NM_001202406, ENST00000587557 1498 5048 8598 None Genewiz TFORF1409 ZNF559 NM_001202410, NM_001202411, NM_001202412, 1499 5049 8599 None ENST00000585352, ENST00000317221 Genewiz TFORF1410 RNF2 NM_007212, XM_011509852, XM_011509851, 1500 5050 8600 None ENST00000367510 Genewiz TFORF1411 ZNF562 XM_017026898, NM_001300885, ENST00000590155 1501 5051 8601 None Genewiz TFORF1412 ZNF562 NM_017656, ENST00000293648 1502 5052 8602 None Genewiz TFORF1413 ZNF561 XM_017027481, XM_017027479, XM_017027480, 1503 5053 8603 None ENST00000424629 Genewiz TFORF1414 ZNF561 NM_152289, XM_005260150, ENST00000302851 1504 5054 8604 None Genewiz TFORF1415 ZNF560 XM_011527696, XM_017026327, XM_017026329, 1505 5055 
8605 None XM_017026328, XM_011527697, NM_152476, ENST00000301480 Genewiz TFORF1416 ZNF215 NM_013250, XM_006718311, ENST00000278319, 1506 5056 8606 None ENST00000414517 Genewiz TFORF1417 ZNF214 XM_005253128, XM_006718308, NM_013249, 1507 5057 8607 None ENST00000278314, ENST00000536068 Genewiz TFORF1418 ZNF217 NM_006526, XM_017028059, ENST00000371471, 1508 5058 8608 None ENST00000302342 Genewiz TFORF1419 VDR NM_001017536, ENST00000550325 1509 5059 8609 None Genewiz TFORF1420 ZNF211 NM_006385, ENST00000240731 1510 5060 8610 None Genewiz TFORF1421 ZNF211 NM_001265599, ENST00000254182 1511 5061 8611 None Genewiz TFORF1422 ZNF211 NM_001265600, ENST00000391703 1512 5062 8612 None Genewiz TFORF1423 ZNF211 NM_001265597, ENST00000299871 1513 5063 8613 None Genewiz TFORF1424 ZNF211 NM_001265598, ENST00000541801 1514 5064 8614 None Genewiz TFORF1425 ZNF211 NM_198855, ENST00000347302 1515 5065 8615 None Genewiz TFORF1426 FOXK2 NM_004514, ENST00000335255 1516 5066 8616 None Genewiz TFORF1427 FOXK1 NM_001037165, ENST00000328914 1517 5067 8617 None Genewiz TFORF1428 NR3C2 NM_001166104, ENST00000512865 1518 5068 8618 None Genewiz TFORF1429 NR3C2 XM_011531975, XM_011531976, XM_011531977, 1519 5069 8619 None ENST00000625323, ENST00000511528 Genewiz TFORF1430 NFKBIB NM_001243116, ENST00000392079 1520 5070 8620 None Genewiz TFORF1431 ZNF219 NM_001102454, XM_006720164, NM_016423, 1521 5071 8621 None NM_001101672, XM_017021354, XM_006720163, XM_017021355, ENST00000360947, ENST00000451119, ENST00000421093 Genewiz TFORF1432 NR3C1 NM_001024094, XM_005268420, XM_005268423, 1522 5072 8622 None XM_005268419, XM_005268422, ENST00000504572, ENST00000394466, ENST00000231509 Genewiz TFORF1433 NR3C1 NM_001020825, ENST00000415690 1523 5073 8623 None Genewiz TFORF1434 NFKBID XM_011527419, ENST00000606253 1524 5074 8624 None Genewiz TFORF1435 NFKBIE NM_004556, ENST00000275015 1525 5075 8625 None Genewiz TFORF1436 NFKBIZ NM_031419, ENST00000326172 1526 5076 8626 None Genewiz TFORF1437 NFKBIZ 
NM_001005474, ENST00000394054 1527 5077 8627 None Genewiz TFORF1438 EP400 NM_015409, ENST00000389561, ENST00000389562 1528 5078 8628 None Genewiz TFORF1439 ZNF846 XM_011527717, XM_005259772, NM_001077624, 1529 5079 8629 None ENST00000397902 Genewiz TFORF1440 ARID4A NM_002892, ENST00000355431 1530 5080 8630 None Genewiz TFORF1441 ARID4A NM_023001, ENST00000348476, ENST00000431317 1531 5081 8631 None Genewiz TFORF1442 ARID4A NM_023000, ENST00000395168 1532 5082 8632 None Genewiz TFORF1443 ARID4B NM_031371, ENST00000349213 1533 5083 8633 None Genewiz TFORF1444 ARID4B NM_001206794, NM_016374, XM_011544212, 1534 5084 8634 None ENST00000366603, ENST00000264183 Genewiz TFORF1445 GTF2H3 NM_001271868, XM_017019228, ENST00000618160 1535 5085 8635 None Genewiz TFORF1446 GTF2H3 NM_001271867, ENST00000228955 1536 5086 8636 None Genewiz TFORF1447 NR5A2 XM_011509382, XM_005245062, NM_001276464, 1537 5087 8637 None ENST00000544748 Genewiz TFORF1448 NR5A2 NM_205860, ENST00000367362 1538 5088 8638 None Genewiz TFORF1449 DLX4 XM_017024291, NM_001934, ENST00000411890 1539 5089 8639 None Genewiz TFORF1450 DLX6 NM_005222, ENST00000518156 1540 5090 8640 None Genewiz TFORF1451 DLX1 NM_178120, ENST00000361725 1541 5091 8641 None Genewiz TFORF1452 DLX1 NM_001038493, ENST00000341900 1542 5092 8642 None Genewiz TFORF1453 DLX2 NM_004405, ENST00000234198 1543 5093 8643 None Genewiz TFORF1454 TFEC NM_001018058, ENST00000320239 1544 5094 8644 None Genewiz TFORF1455 TFEC NM_012252, XM_017011875, ENST00000265440 1545 5095 8645 None Genewiz TFORF1456 TFEC NM_001244583, ENST00000457268 1546 5096 8646 None Genewiz TFORF1457 TFEB NM_001271943, ENST00000420312 1547 5097 8647 None Genewiz TFORF1458 TFEB NM_001167827, ENST00000358871 1548 5098 8648 None Genewiz TFORF1459 LMX1A NM_001174069, NM_177398, ENST00000342310, 1549 5099 8649 None ENST00000294816, ENST00000367893 Genewiz TFORF1460 LMX1B NM_001174147, ENST00000373474 1550 5100 8650 None Genewiz TFORF1461 LMX1B NM_001174146, ENST00000355497 1551 5101 
8651 None Genewiz TFORF1462 LMX1B NM_002316, ENST00000526117 1552 5102 8652 None Genewiz TFORF1463 VAV1 NM_001258206, ENST00000304076 1553 5103 8653 None Genewiz TFORF1464 VAV1 NM_001258207, ENST00000596764 1554 5104 8654 None Genewiz TFORF1465 VAV1 NM_005428, ENST00000602142 1555 5105 8655 None Genewiz TFORF1466 HOXD12 NM_021193, ENST00000406506 1556 5106 8656 None Genewiz TFORF1467 HOXD13 NM_000523, ENST00000392539 1557 5107 8657 None Genewiz TFORF1468 HOXD11 NM_021192, ENST00000249504 1558 5108 8658 None Genewiz TFORF1469 TOX3 NM_001080430, ENST00000219746 1559 5109 8659 None Genewiz TFORF1470 TOX3 NM_001146188, ENST00000407228 1560 5110 8660 None Genewiz TFORF1471 TOX2 NM_001098797, ENST00000341197 1561 5111 8661 None Genewiz TFORF1472 TOX2 NM_001098798, ENST00000358131 1562 5112 8662 None Genewiz TFORF1473 LIN54 NM_001115008, NM_001288997, NM_001115007, 1563 5113 8663 None ENST00000442461, ENST00000446851, ENST00000510557 Genewiz TFORF1474 LIN54 NM_194282, XM_006714081, XM_005262750, 1564 5114 8664 None ENST00000340417, ENST00000505397 Genewiz TFORF1475 LIN54 NM_001288996, XM_017007728, ENST00000506560 1565 5115 8665 None Genewiz TFORF1476 VAX1 NM_199131, ENST00000277905 1566 5116 8666 None Genewiz TFORF1477 VAX1 NM_001112704, ENST00000369206 1567 5117 8667 None Genewiz TFORF1478 TP63 XM_017007387, ENST00000456148 1568 5118 8668 None Genewiz TFORF1479 TP63 NM_001114980, ENST00000354600 1569 5119 8669 None Genewiz TFORF1480 TP63 NM_001114978, ENST00000392460 1570 5120 8670 None Genewiz TFORF1481 TP63 NM_001114982, ENST00000437221 1571 5121 8671 None Genewiz TFORF1482 TP63 NM_001114979, ENST00000418709 1572 5122 8672 None Genewiz TFORF1483 TP63 NM_001114981, ENST00000392463 1573 5123 8673 None Genewiz TFORF1484 TP63 XM_005247843, ENST00000440651 1574 5124 8674 None Genewiz TFORF1485 NANOG NM_024865, ENST00000229307 1575 5125 8675 None Genewiz TFORF1486 NANOG NM_001297698, ENST00000526286 1576 5126 8676 None Genewiz TFORF1487 NR6A1 XM_005251917, ENST00000344523 
1577 5127 8677 None Genewiz TFORF1488 NR6A1 NM_001278546, ENST00000373584 1578 5128 8678 None Genewiz TFORF1489 NR6A1 NM_033334, ENST00000487099 1579 5129 8679 None Genewiz TFORF1490 NR6A1 NM_001489, ENST00000416460 1580 5130 8680 None Genewiz TFORF1491 ZNF57 NM_001319083, XM_011527682, ENST00000523428 1581 5131 8681 None Genewiz TFORF1492 FOSL1 NM_001300855, ENST00000532401 1582 5132 8682 None Genewiz TFORF1493 FOSL1 NM_001300844, ENST00000531493 1583 5133 8683 None Genewiz TFORF1494 FOSL1 NM_001300857, ENST00000448083 1584 5134 8684 None Genewiz TFORF1495 FOXN3 NM_001085471, ENST00000345097, ENST00000261302 1585 5135 8685 None Genewiz TFORF1496 HHEX NM_002729, ENST00000282728 1586 5136 8686 None Genewiz TFORF1497 LYL1 NM_005583, ENST00000264824 1587 5137 8687 None Genewiz TFORF1498 RBL1 NM_002895, ENST00000373664 1588 5138 8688 None Genewiz TFORF1499 RBL1 NM_183404, ENST00000344359 1589 5139 8689 None Genewiz TFORF1500 RBL2 NM_001323608, NM_005611, ENST00000262133 1590 5140 8690 None Genewiz TFORF1501 STOX2 NM_020225, ENST00000308497 1591 5141 8691 None Genewiz TFORF1502 GABPB1 NM_016654, XM_017022053, ENST00000380877 1592 5142 8692 None Genewiz TFORF1503 GABPB1 NM_181427, NM_016655, ENST00000396464, 1593 5143 8693 None ENST00000359031 Genewiz TFORF1504 GABPB1 NM_002041, ENST00000429662 1594 5144 8694 None Genewiz TFORF1505 ZNF696 NM_030895, ENST00000330143 1595 5145 8695 None Genewiz TFORF1506 HEY2 XM_017010629, XM_017010628, XM_017010627, 1596 5146 8696 None ENST00000368365 Genewiz TFORF1507 HEY1 NM_001282851, ENST00000523976 1597 5147 8697 None Genewiz TFORF1508 MTF1 NM_005955, XM_011541491, ENST00000373036 1598 5148 8698 None Genewiz TFORF1509 STAT5B NM_012448, ENST00000293328 1599 5149 8699 None Genewiz TFORF1510 STAT5A NM_001288719, ENST00000546010 1600 5150 8700 None Genewiz TFORF1511 STAT5A NM_001288720, ENST00000588868 1601 5151 8701 None Genewiz TFORF1512 NKX3-2 NM_001189, ENST00000382438 1602 5152 8702 None Genewiz TFORF1513 NKX3-1 NM_006167, 
ENST00000380871 1603 5153 8703 None Genewiz TFORF1514 NKX3-1 NM_001256339, ENST00000523261 1604 5154 8704 None Genewiz TFORF1515 NANOGNB NM_001145465, ENST00000382119 1605 5155 8705 None Genewiz TFORF1516 ZNF737 NM_001159293, ENST00000427401 1606 5156 8706 None Genewiz TFORF1517 FOXD4L4 NM_199244, ENST00000377413 1607 5157 8707 None Genewiz TFORF1518 FOXD4L5 NM_001126334, ENST00000377420 1608 5158 8708 None Genewiz TFORF1519 FOXD4L6 NM_001085476, ENST00000622588 1609 5159 8709 None Genewiz TFORF1520 NR2E1 NM_001286102, ENST00000368983 1610 5160 8710 None Genewiz TFORF1521 NR2E3 NM_016346, ENST00000621098 1611 5161 8711 None Genewiz TFORF1522 NR2E3 NM_014249, ENST00000617575 1612 5162 8712 None Genewiz TFORF1523 NFRKB NM_006165, XM_011542852, XM_011542851, 1613 5163 8713 None ENST00000524794 Genewiz TFORF1524 NFRKB NM_001143835, XM_017017796, ENST00000446488, 1614 5164 8714 None ENST00000524746 Genewiz TFORF1525 GFI1 XM_005270749, XM_011541245, XM_011541246, 1615 5165 8715 None NM_001127215, NM_001127216, NM_005263, ENST00000370332, ENST00000427103, ENST00000294702 Genewiz TFORF1526 MXI1 NM_005962, ENST00000239007 1616 5166 8716 None Genewiz TFORF1527 MXI1 NM_001008541, ENST00000361248 1617 5167 8717 None Genewiz TFORF1528 MXI1 NM_130439, ENST00000332674 1618 5168 8718 None Genewiz TFORF1529 ZNF605 NM_001164715, ENST00000392321 1619 5169 8719 None Genewiz TFORF1530 ZNF605 NM_183238, ENST00000360187 1620 5170 8720 None Genewiz TFORF1531 ZBTB40 XM_011542499, NM_014870, NM_001083621, 1621 5171 8721 None ENST00000404138, ENST00000375647 Genewiz TFORF1532 ZBTB40 XM_017003003, ENST00000374651 1622 5172 8722 None Genewiz TFORF1533 ZBTB41 NM_194314, ENST00000367405 1623 5173 8723 None Genewiz TFORF1534 ZBTB42 XM_017020911, NM_001137601, ENST00000555360, 1624 5174 8724 None ENST00000342537 Genewiz TFORF1535 ZBTB44 NM_001301098, ENST00000397753, ENST00000357899 1625 5175 8725 None Genewiz TFORF1536 ZBTB44 NM_001301099, XM_017017623, ENST00000530205 1626 5176 8726 None Genewiz 
TFORF1537 ZBTB45 NM_001316978, NM_032792, NM_001316981, 1627 5177 8727 None NM_001316979, NM_001316982, NM_001316980, XM_006723445, ENST00000354590, ENST00000594051, ENST00000600990 Genewiz TFORF1538 ZBTB46 XM_005260197, XM_011528548, NM_025224, 1628 5178 8728 None XM_005260195, XM_006723700, XM_005260198, XM_005260196, ENST00000245663, ENST0395104, ENST00000302995 Genewiz TFORF1539 ZBTB47 NM_145166, ENST00000232974 1629 5179 8729 None Genewiz TFORF1540 ZBTB49 NM_145291, XM_006713864, ENST00000337872 1630 5180 8730 None Genewiz TFORF1541 ZBTB49 XM_017007835, ENST00000515012 1631 5181 8731 None Genewiz TFORF1542 ZNF286B NM_001145045, ENST00000545289 1632 5182 8732 None Genewiz TFORF1543 ZNF160 XM_017027446, NM_001322136, NM_001322130, 1633 5183 8733 None NM_001322131, NM_001322129, NM_001322134, NM_001322133, NM_001102603, NM_001322132, NM_198893, NM_001322128, NM_033288, NM_001322135, ENST00000599056, ENST00000418871, ENST00000429604 Genewiz TFORF1544 ZNF160 XM_017027448, NM_001322126, NM_001322125, 1634 5184 8734 None ENST00000355147 Genewiz TFORF1545 ZNF160 NM_001322139, NM_001322137, NM_001322138, 1635 5185 8735 None ENST00000601421 Genewiz TFORF1546 BHMG1 NM_001310124, ENST00000457052 1636 5186 8736 None Genewiz TFORF1547 SUPT5H NM_001319991, NM_001130825, ENST00000402194, 1637 5187 8737 None ENST00000359191 Genewiz TFORF1548 TAZ NM_181311, ENST00000612460 1638 5188 8738 None Genewiz TFORF1549 TAZ NM_181313, ENST00000613002 1639 5189 8739 None Genewiz TFORF1550 TAZ NM_181312, ENST00000475699 1640 5190 8740 None Genewiz TFORF1551 TAZ NM_000116, ENST00000601016 1641 5191 8741 None Genewiz TFORF1552 SP110 NM_001185015, ENST00000540870 1642 5192 8742 None Genewiz TFORF1553 SP110 NM_004510, ENST00000258382 1643 5193 8743 None Genewiz TFORF1554 SP110 NM_004509, ENST00000358662 1644 5194 8744 None Genewiz TFORF1555 SP110 NM_080424, ENST00000258381 1645 5195 8745 None Genewiz TFORF1556 YAF2 XM_017018670, ENST00000552928 1646 5196 8746 None Genewiz TFORF1557 YAF2 
NM_005748, ENST00000534854 1647 5197 8747 None Genewiz TFORF1558 YAF2 NM_001190980, ENST00000555248 1648 5198 8748 None Genewiz TFORF1559 YAF2 NM_001190977, ENST00000380790 1649 5199 8749 None Genewiz TFORF1560 IL18 NM_001243211, XM_011542805, ENST00000524595 1650 5200 8750 None Genewiz TFORF1561 ZNF688 NM_145271, ENST00000223459 1651 5201 8751 None Genewiz TFORF1562 ZNF688 NM_001024683, ENST00000563276 1652 5202 8752 None Genewiz TFORF1563 ZNF682 XM_017027455, ENST00000597972 1653 5203 8753 None Genewiz TFORF1564 ZNF682 NM_033196, ENST00000397165 1654 5204 8754 None Genewiz TFORF1565 ZNF682 XM_017027456, ENST00000595736 1655 5205 8755 None Genewiz TFORF1566 ZNF682 NM_001077349, ENST00000397162, ENST00000358523 1656 5206 8756 None Genewiz TFORF1567 ZNF683 XM_005245832, XM_011541198, XM_005245830, 1657 5207 8757 None NM_001307925, ENST00000403843, ENST00000436292 Genewiz TFORF1568 ZNF683 NM_001114759, NM_173574, ENST00000349618 1658 5208 8758 None Genewiz TFORF1569 ZNF680 NM_001130022, ENST00000447137 1659 5209 8759 None Genewiz TFORF1570 ZNF680 NM_178558, ENST00000309683 1660 5210 8760 None Genewiz TFORF1571 ZNF681 NM_138286, ENST00000402377 1661 5211 8761 None Genewiz TFORF1572 ZNF687 NM_001304764, XM_011509812, XM_011509813, 1662 5212 8762 None NM_020832, NM_001304763, ENST00000336715, ENST00000324048 Genewiz TFORF1573 ZNF684 XM_011540672, XM_011540671, XM_017000290, 1663 5213 8763 None NM_152373, XM_011540673, ENST00000372699 Genewiz TFORF1574 ZNF174 NM_001032292, ENST00000575752, ENST00000344823 1664 5214 8764 None Genewiz TFORF1575 ZNF174 NM_003450, ENST00000571936, ENST00000268655 1665 5215 8765 None Genewiz TFORF1576 ZNF177 NM_001172651, ENST00000589262 1666 5216 8766 None Genewiz TFORF1577 AHR NM_001621, ENST00000242057, ENST00000463496 1667 5217 8767 None Genewiz TFORF1578 ZNF778 XM_011522940, NM_001201407, XM_005256288, 1668 5218 8768 None ENST00000433976 Genewiz TFORF1579 ZNF778 XM_017023015, ENST00000306502 1669 5219 8769 None Genewiz TFORF1580 ZNF778 
NM_182531, ENST00000620195 1670 5220 8770 None Genewiz TFORF1581 ATMIN NM_015251, ENST00000299575 1671 5221 8771 None Genewiz TFORF1582 ZNF777 NM_015694, XM_011516055, ENST00000247930 1672 5222 8772 None Genewiz TFORF1583 ZNF774 NM_001004309, ENST00000354377 1673 5223 8773 None Genewiz TFORF1584 ZNF775 NM_173680, ENST00000329630 1674 5224 8774 None Genewiz TFORF1585 ZNF772 NM_001144068, ENST00000356584 1675 5225 8775 None Genewiz TFORF1586 ZNF772 NM_001024596, ENST00000343280 1676 5226 8776 None Genewiz TFORF1587 ZNF772 XM_005258943, XM_005258944, ENST00000427512 1677 5227 8777 None Genewiz TFORF1588 ZNF773 NM_198542, ENST00000282292 1678 5228 8778 None Genewiz TFORF1589 ZNF773 NM_001304334, ENST00000598770 1679 5229 8779 None Genewiz TFORF1590 ZNF773 NM_001304337, ENST00000593916 1680 5230 8780 None Genewiz TFORF1591 CTCF NM_001191022, ENST00000401394 1681 5231 8781 None Genewiz TFORF1592 ZNF771 NM_001142305, NM_016643, ENST00000319296, 1682 5232 8782 None ENST00000434417 Genewiz TFORF1593 ZNF197 NM_001024855, XM_017005495, ENST00000383744, 1683 5233 8783 None ENST00000383745 Genewiz TFORF1594 ZNF197 XM_005264783, NM_006991, ENST00000344387, 1684 5234 8784 None ENST00000396058 Genewiz TFORF1595 ZNF195 NM_001130519, ENST00000005082 1685 5235 8785 None Genewiz TFORF1596 ZNF195 NM_007152, ENST00000354599 1686 5236 8786 None Genewiz TFORF1597 ZNF195 NM_001256825, NM_001242843, ENST00000343338, 1687 5237 8787 None ENST00000429541 Genewiz TFORF1598 ZNF195 NM_001242841, XM_017018263, ENST00000526601 1688 5238 8788 None Genewiz TFORF1599 ZNF195 NM_001130520, ENST00000399602 1689 5239 8789 None Genewiz TFORF1600 ZNF195 NM_001256823, ENST00000438262, ENST00000528218, 1690 5240 8790 None ENST00000618467 Genewiz TFORF1601 ZNF195 XM_011520350, XM_017018261, XM_011520351, 1691 5241 8791 None ENST00000620374 Genewiz TFORF1602 ZNF256 NM_005773, ENST00000282308 1692 5242 8792 None Genewiz TFORF1603 HINFP NM_198971, NM_015517, XM_011542745, 1693 5243 8793 None ENST00000350777 
Genewiz TFORF1604 HINFP NM_001243259, ENST00000527410 1694 5244 8794 None
Genewiz TFORF1605 BAZ2A NM_013449, ENST00000551812 1695 5245 8795 None
Genewiz TFORF1606 BAZ2A NM_001300905, ENST00000549884 1696 5246 8796 None
Genewiz TFORF1607 BAZ2B NM_001289975, ENST00000392782 1697 5247 8797 None
Genewiz TFORF1608 BAZ2B NM_013450, ENST00000392783 1698 5248 8798 None
Genewiz TFORF1609 ZZZ3 XM_005270729, NM_001308237, ENST00000370798 1699 5249 8799 None
Genewiz TFORF1610 ZZZ3 XM_005270725, NM_015534, XM_005270726, ENST00000370801 1700 5250 8800 None
Genewiz TFORF1611 GSX2 NM_133267, ENST00000611459, ENST00000326902 1701 5251 8801 None
Genewiz TFORF1612 TADA2A NM_001291918, NM_133439, ENST00000620367 1702 5252 8802 None
Genewiz TFORF1613 TADA2A NM_001166105, NM_001488, ENST00000615182, ENST00000620628, ENST00000612272 1703 5253 8803 None
Genewiz TFORF1614 TADA2B NM_152293, ENST00000310074 1704 5254 8804 None
Genewiz TFORF1615 GSX1 NM_145657, ENST00000302945 1705 5255 8805 None
Genewiz TFORF1616 ZNF519 XM_017025562, XM_017025563, XM_017025565, XM_017025564, NM_145287, ENST00000590202 1706 5256 8806 None
Genewiz TFORF1617 NF1 NM_001128147, ENST00000431387 1707 5257 8807 None
Genewiz TFORF1618 NF1 NM_001042492, ENST00000358273 1708 5258 8808 None
Genewiz TFORF1619 NF1 NM_000267, ENST00000356175 1709 5259 8809 None
Genewiz TFORF1620 ZNF512 NM_032434, ENST00000355467 1710 5260 8810 None
Genewiz TFORF1621 ZNF512 NM_001271286, ENST00000416005 1711 5261 8811 None
Genewiz TFORF1622 ZNF513 NM_001201459, XM_005264142, ENST00000407879 1712 5262 8812 None
Genewiz TFORF1623 ZNF510 NM_001314059, NM_014930, XM_017014483, ENST00000223428, ENST00000375231 1713 5263 8813 None
Genewiz TFORF1624 ZNF516 NM_014643, XM_011526271, XM_011526270, XM_011526272, XM_011526269, XM_011526275, XM_017026097, XM_011526274, XM_011526273, ENST00000443185 1714 5264 8814 None
Genewiz TFORF1625 ZNF517 NM_213605, XM_011517014, XM_011517015, ENST00000359971, ENST00000533965, ENST00000531720 1715 5265 8815 None
Genewiz TFORF1626 ZNF514 NM_032788, XM_006712806, ENST00000295208, ENST00000411425 1716 5266 8816 None
Genewiz TFORF1627 FOXJ3 NM_001198851, NM_014947, NM_001198850, XM_011541026, ENST00000372572, ENST00000372573, ENST00000545068, ENST00000361346 1717 5267 8817 None
Genewiz TFORF1628 FOXJ3 NM_001198852, ENST00000361776 1718 5268 8818 None
Genewiz TFORF1629 ZNF224 NM_013398, NM_001321645, XM_017027261, ENST00000336976 1719 5269 8819 None
Genewiz TFORF1630 ZNF226 NM_001146220, NM_015919, NM_001032374, XM_017027265, ENST00000413984, ENST00000588742, ENST00000300823 1720 5270 8820 None
Genewiz TFORF1631 ZNF226 NM_001319089, NM_001319088, NM_001319090, XM_006723368, XM_006723367, NM_001032373, XM_006723369, XM_005259227, XM_017027262, NM_016444, NM_001032372, ENST00000337433, ENST0590089, ENST00000454662 1721 5271 8821 None
Genewiz TFORF1632 ZNF227 XM_017027270, XM_017027271, NM_001289168, NM_001289167, NM_001289169, XM_017027268, XM_017027269, ENST00000391961, ENST00000589005 1722 5272 8822 None
Genewiz TFORF1633 ZNF229 NM_001278510, XM_011527292, ENST00000613197 1723 5273 8823 None
Genewiz TFORF1634 ZNF229 NM_014518, ENST00000614049 1724 5274 8824 None
Genewiz TFORF1635 HOXC11 NM_014212, ENST00000546378 1725 5275 8825 None
Genewiz TFORF1636 HOXC13 NM_017410, ENST00000243056 1726 5276 8826 None
Genewiz TFORF1637 HOXC12 NM_173860, ENST00000243103 1727 5277 8827 None
Genewiz TFORF1638 BNIP3 NM_004052, ENST00000368636 1728 5278 8828 None
Genewiz TFORF1639 PSIP1 NM_033222, NM_001128217, ENST00000380738, ENST00000380733 1729 5279 8829 None
Genewiz TFORF1640 PSIP1 NM_021144, ENST00000380716, ENST00000397519 1730 5280 8830 None
Genewiz TFORF1641 PSIP1 NM_001317898, ENST00000380715 1731 5281 8831 None
Genewiz TFORF1642 ZNF426 NM_001300883, XM_017027293, ENST00000593003 1732 5282 8832 None
Genewiz TFORF1643 ZNF426 NM_001318056, ENST00000589289 1733 5283 8833 None
Genewiz TFORF1644 ZNF425 NM_001001661, ENST00000378061 1734 5284 8834 None
Genewiz TFORF1645 ZNF423 XM_005255857, ENST00000535559, ENST00000567169 1735 5285 8835 None
Genewiz TFORF1646 ZNF423 NM_001271620, XM_017023078, XM_005255856, XM_017023077, ENST00000563137, ENST00000562871, ENST00000562520 1736 5286 8836 None
Genewiz TFORF1647 ZNF358 XM_005272460, NM_018083, XM_011527695, ENST00000597229 1737 5287 8837 None
Genewiz TFORF1648 ZNF429 NM_001001415, ENST00000358491 1738 5288 8838 None
Genewiz TFORF1649 REL NM_002908, ENST00000295025 1739 5289 8839 None
Genewiz TFORF1650 ZSCAN5A NM_001322075, NM_001322076, NM_001322073, NM_001322074, XM_017027299, ENST00000592355 1740 5290 8840 None
Genewiz TFORF1651 ZSCAN5B NM_001080456, XM_006723189, ENST00000586855, ENST00000358992 1741 5291 8841 None
Genewiz TFORF1652 SPI1 NM_001080547, ENST00000227163 1742 5292 8842 None
Genewiz TFORF1653 SPI1 NM_003120, ENST00000378538 1743 5293 8843 None
Genewiz TFORF1654 TCFL5 NM_006602, ENST00000335351 1744 5294 8844 None
Genewiz TFORF1655 SPIB NM_001243999, ENST00000270632 1745 5295 8845 None
Genewiz TFORF1656 SPIB NM_003121, ENST00000595883 1746 5296 8846 None
Genewiz TFORF1657 SPIB NM_001243998, ENST00000439922 1747 5297 8847 None
Genewiz TFORF1658 LEF1 NM_001166119, ENST00000510624 1748 5298 8848 None
Genewiz TFORF1659 LEF1 NM_001130714, ENST00000379951 1749 5299 8849 None
Genewiz TFORF1660 LEF1 NM_001130713, ENST00000438313 1750 5300 8850 None
Genewiz TFORF1661 ARID5B NM_001244638, ENST00000309334 1751 5301 8851 None
Genewiz TFORF1662 ARID5B NM_032199, ENST00000279873 1752 5302 8852 None
Genewiz TFORF1663 ARID5A NM_001319092, ENST00000454558 1753 5303 8853 None
Genewiz TFORF1664 ARID5A NM_212481, ENST00000357485 1754 5304 8854 None
Genewiz TFORF1665 TP73 NM_005427, ENST00000378295 1755 5305 8855 None
Genewiz TFORF1666 TP73 NM_001126242, ENST00000378280 1756 5306 8856 None
Genewiz TFORF1667 TP73 NM_001204184, ENST00000354437 1757 5307 8857 None
Genewiz TFORF1668 TP73 NM_001204192, ENST00000378290 1758 5308 8858 None
Genewiz TFORF1669 TP73 NM_001204186, ENST00000604074 1759 5309 8859 None
Genewiz TFORF1670 TP73 NM_001204187, ENST00000357733, ENST00000603362 1760 5310 8860 None
Genewiz TFORF1671 TP73 NM_001126241, ENST00000378285 1761 5311 8861 None
Genewiz TFORF1672 TP73 NM_001126240, ENST00000378288 1762 5312 8862 None
Genewiz TFORF1673 TP73 NM_001204188, ENST00000346387, ENST00000604479 1763 5313 8863 None
Genewiz TFORF1674 RUNX1T1 NM_001198679, ENST00000436581 1764 5314 8864 None
Genewiz TFORF1675 RUNX1T1 NM_001198633, ENST00000615601 1765 5315 8865 None
Genewiz TFORF1676 RUNX1T1 NM_001198630, NM_001198629, NM_001198627, NM_001198626, NM_001198631, NM_001198628, NM_175634, XM_017013931, XM_011517351, ENST00000613302, ENST00000614812, ENST00000617740, ENST00000265814, ENST00000523629 1766 5316 8866 None
Genewiz TFORF1677 RUNX1T1 NM_001198634, ENST00000520724 1767 5317 8867 None
Genewiz TFORF1678 RUNX1T1 NM_004349, NM_001198632, NM_001198625, XM_017013932, XM_017013933, XM_011517352, ENST00000613886, ENST00000396218, ENST00000518844 1768 5318 8868 None
Genewiz TFORF1679 RUNX1T1 NM_175636, NM_175635, XM_011517353, XM_017013935, XM_017013937, XM_017013934, XM_006716676, XM_017013936, ENST00000360348, ENST00000422361 1769 5319 8869 None
Genewiz TFORF1680 GTF2IRD2B NM_001003795, ENST00000472837 1770 5320 8870 None
Genewiz TFORF1681 SIM2 NM_005069, ENST00000290399 1771 5321 8871 None
Genewiz TFORF1682 SIM1 XM_017011197, XM_005267100, NM_005068, ENST00000369208, ENST00000262901 1772 5322 8872 None
Genewiz TFORF1683 TWIST1 XM_011515496, NM_000474, ENST00000242261 1773 5323 8873 None
Genewiz TFORF1684 HNF1B NM_000458, ENST00000617811 1774 5324 8874 None
Genewiz TFORF1685 HNF1B XM_011525160, ENST00000614313 1775 5325 8875 None
Genewiz TFORF1686 HNF1B NM_001165923, ENST00000621123 1776 5326 8876 None
Genewiz TFORF1687 HNF1B NM_001304286, ENST00000613727 1777 5327 8877 None
Genewiz TFORF1688 HNF1A NM_001306179, ENST00000544413 1778 5328 8878 None
Genewiz TFORF1689 HNF1A XM_005253931, ENST00000541395 1779 5329 8879 None
Genewiz TFORF1690 HNF1A NM_000545, ENST00000257555 1780 5330 8880 None
Genewiz TFORF1691 PGR NM_001271162, ENST00000534013 1781 5331 8881 None
Genewiz TFORF1692 ZNF814 NM_001144989, ENST00000435989 1782 5332 8882 None
Genewiz TFORF1693 E4F1 NM_004424, ENST00000301727 1783 5333 8883 None
Genewiz TFORF1694 E4F1 NM_001288778, ENST00000565090 1784 5334 8884 None
Genewiz TFORF1695 ZNF114 NM_153608, XM_017026415, ENST00000595607, ENST00000600687, ENST00000315849 1785 5335 8885 None
Genewiz TFORF1696 SOX21 NM_007084, ENST00000376945 1786 5336 8886 None
Genewiz TFORF1697 E2F6 NM_001278277, NM_001278276, NM_001278278, ENST00000542100, ENST00000546212 1787 5337 8887 None
Genewiz TFORF1698 E2F6 NM_001278275, ENST00000307236 1788 5338 8888 None
Genewiz TFORF1699 E2F5 NM_001083588, ENST00000418930 1789 5339 8889 None
Genewiz TFORF1700 E2F5 NM_001951, ENST00000416274 1790 5340 8890 None
Genewiz TFORF1701 E2F5 NM_001083589, ENST00000517476 1791 5341 8891 None
Genewiz TFORF1702 E2F4 NM_001950, ENST00000379378 1792 5342 8892 None
Genewiz TFORF1703 E2F3 NM_001949, ENST00000346618 1793 5343 8893 None
Genewiz TFORF1704 E2F3 NM_001243076, ENST00000535432 1794 5344 8894 None
Genewiz TFORF1705 E2F2 NM_004091, ENST00000361729 1795 5345 8895 None
Genewiz TFORF1706 E2F1 NM_005225, ENST00000343380 1796 5346 8896 None
Genewiz TFORF1707 BNC2 NM_017637, ENST00000380672 1797 5347 8897 None
Genewiz TFORF1708 PPP1R13L NM_006663, NM_001142502, XM_017026177, XM_017026178, ENST00000360957, ENST00000418234 1798 5348 8898 None
Genewiz TFORF1709 ZFP41 NM_173832, NM_001271156, ENST00000520584, ENST00000330701 1799 5349 8899 None
Genewiz TFORF1710 CBFB NM_022845, ENST00000412916 1800 5350 8900 None
Genewiz TFORF1711 FOXE1 NM_004473, ENST00000375123 1801 5351 8901 None
Genewiz TFORF1712 DPRX XM_011527011, XM_011527012, NM_001012728, ENST00000376650 1802 5352 8902 None
Genewiz TFORF1713 LDB2 NM_001304434, ENST00000515064 1803 5353 8903 None
Genewiz TFORF1714 LDB2 NM_001130834, ENST00000441778 1804 5354 8904 None
Genewiz TFORF1715 LDB2 NM_001290, ENST00000304523 1805 5355 8905 None
Genewiz TFORF1716 LDB2 NM_001304435, ENST00000502640 1806 5356 8906 None
Genewiz TFORF1717 LDB1 NM_003893, ENST00000361198 1807 5357 8907 None
Genewiz TFORF1718 LDB1 NM_001113407, ENST00000425280 1808 5358 8908 None
Genewiz TFORF1719 ZNF699 NM_198535, ENST00000591998, ENST00000308650 1809 5359 8909 None
Genewiz TFORF1720 DMRT2 XM_011517690, XM_017014213, XM_011517687, XM_017014215, XM_017014214, NM_181872, ENST00000382251, ENST00000358146 1810 5360 8910 None
Genewiz TFORF1721 DMRT2 XM_017014216, NM_006557, NM_001130865, ENST00000635183, ENST00000382255, ENST00000412350, ENST00000259622 1811 5361 8911 None
Genewiz TFORF1722 DMRT3 NM_021240, ENST00000190165 1812 5362 8912 None
Genewiz TFORF1723 ZNF691 NM_001242739, XM_017001400, XM_017001401, ENST00000372502 1813 5363 8913 None
Genewiz TFORF1724 ZNF691 XM_017001402, XM_017001403, ENST00000372504, ENST00000630961 1814 5364 8914 None
Genewiz TFORF1725 ZNF691 NM_015911, ENST00000372508, ENST00000372507, ENST00000372506 1815 5365 8915 None
Genewiz TFORF1726 ZNF692 NM_001136036, ENST00000451251 1816 5366 8916 None
Genewiz TFORF1727 ZNF692 NM_001193328, ENST00000366471 1817 5367 8917 None
Genewiz TFORF1728 ZNF695 NM_001204221, ENST00000487338 1818 5368 8918 None
Genewiz TFORF1729 ZNF695 NM_020394, ENST00000339986 1819 5369 8919 None
Genewiz TFORF1730 CRAMP1 NM_020825, ENST00000397412, ENST00000293925 1820 5370 8920 None
Genewiz TFORF1731 ZNF697 NM_001080470, XM_011542416, XM_005271315, ENST00000421812 1821 5371 8921 None
Genewiz TFORF1732 HDX NM_001177478, XM_006724619, ENST00000506585 1822 5372 8922 None
Genewiz TFORF1733 ZNF169 NM_194320, ENST00000395395 1823 5373 8923 None
Genewiz TFORF1734 RUVBL1 NM_001319086, ENST00000464873 1824 5374 8924 None
Genewiz TFORF1735 ADAR NM_001111, ENST00000368474 1825 5375 8925 None
Genewiz TFORF1736 ADAR NM_001193495, NM_001025107, XM_006711111, XM_006711113, XM_006711112, ENST00000368471 1826 5376 8926 None
Genewiz TFORF1737 ZBTB7B XM_006711359, NM_001256455, XM_006711354, XM_006711353, XM_006711349, XM_011509599, XM_006711357, XM_006711358, XM_006711356, ENST00000535420, ENST00000368426, ENST00000292176 1827 5377 8927 None
Genewiz TFORF1738 ZBTB7B NM_001252406, XM_011509598, ENST00000417934 1828 5378 8928 None
Genewiz TFORF1739 ZNF165 NM_003447, XM_017011258, XM_017011260, XM_017011259, ENST00000377325 1829 5379 8929 None
Genewiz TFORF1740 ZNF768 XM_017023666, ENST00000562803 1830 5380 8930 None
Genewiz TFORF1741 ZNF768 NM_024671, ENST00000380412 1831 5381 8931 None
Genewiz TFORF1742 ZNF765 NM_001040185, ENST00000396408 1832 5382 8932 None
Genewiz TFORF1743 ZNF764 NM_001172679, ENST00000395091 1833 5383 8933 None
Genewiz TFORF1744 ZNF764 NM_033410, ENST00000252797 1834 5384 8934 None
Genewiz TFORF1745 ZNF766 NM_001010851, ENST00000439461 1835 5385 8935 None
Genewiz TFORF1746 ZNF761 NM_001289951, NM_001008401, NM_001289952, ENST00000432094, ENST00000454407 1836 5386 8936 None
Genewiz TFORF1747 ZNF761 NM_001289953, ENST00000613950 1837 5387 8937 None
Genewiz TFORF1748 ZNF763 NM_001012753, ENST00000343949 1838 5388 8938 None
Genewiz TFORF1749 ZNF184 NM_001318892, NM_001318891, NM_007149, ENST00000211936, ENST00000377419 1839 5389 8939 None
Genewiz TFORF1750 ZNF181 XM_017026739, XM_005258850, NM_001145665, ENST00000459757 1840 5390 8940 None
Genewiz TFORF1751 ZNF180 NM_001278508, ENST00000391956 1841 5391 8941 None
Genewiz TFORF1752 ZNF180 NM_013256, ENST00000221327 1842 5392 8942 None
Genewiz TFORF1753 ZNF180 NM_001278509, NM_001291633, ENST00000592529 1843 5393 8943 None
Genewiz TFORF1754 ZNF182 NM_001007088, ENST00000376943 1844 5394 8944 None
Genewiz TFORF1755 ZNF182 NM_006962, NM_001178099, ENST00000396965 1845 5395 8945 None
Genewiz TFORF1756 ZNF189 NM_001278240, ENST00000615466 1846 5396 8946 None
Genewiz TFORF1757 ZNF189 XM_006717281, XM_011518998, XM_006717280, NM_197977, XM_017015121, ENST00000259395 1847 5397 8947 None
Genewiz TFORF1758 ZNF189 NM_003452, ENST00000339664 1848 5398 8948 None
Genewiz TFORF1759 ALYREF NM_005782, ENST00000505490 1849 5399 8949 None
Genewiz TFORF1760 ZNF25 XM_005252385, XM_005252386, XM_005252384, NM_145011, ENST00000302609 1850 5400 8950 None
Genewiz TFORF1761 ZNF24 NM_006965, XM_005258341, ENST00000261332, ENST00000399061 1851 5401 8951 None
Genewiz TFORF1762 ZNF24 NM_001308123, ENST00000589881 1852 5402 8952 None
Genewiz TFORF1763 ZNF26 XM_011534829, XM_005266182, XM_017019923, XM_005266183, XM_017019922, ENST00000534834 1853 5403 8953 None
Genewiz TFORF1764 ZNF20 NM_021143, ENST00000334213 1854 5404 8954 None
Genewiz TFORF1765 ZNF23 NM_145911, NM_001304492, ENST00000393539, ENST00000357254, ENST00000428724 1855 5405 8955 None
Genewiz TFORF1766 ZNF23 NM_001304493, NM_001304494, ENST00000564528 1856 5406 8956 None
Genewiz TFORF1767 ZNF22 NM_006963, ENST00000298299 1857 5407 8957 None
Genewiz TFORF1768 ZNF28 NM_006969, XM_011527262, ENST00000457749 1858 5408 8958 None
Genewiz TFORF1769 MGA NM_001080541, ENST00000545763, ENST00000566586 1859 5409 8959 None
Genewiz TFORF1770 MGA NM_001164273, ENST00000219905, ENST00000570161 1860 5410 8960 None
Genewiz TFORF1771 TBP NM_003194, ENST00000392092, ENST00000230354 1861 5411 8961 None
Genewiz TFORF1772 TBP NM_001172085, ENST00000540980 1862 5412 8962 None
Genewiz TFORF1773 ZNF501 NM_145044, NM_001258280, ENST00000396048, ENST00000620116 1863 5413 8963 None
Genewiz TFORF1774 ZNF500 NM_021646, ENST00000219478 1864 5414 8964 None
Genewiz TFORF1775 ZNF500 NM_001303450, ENST00000545009 1865 5415 8965 None
Genewiz TFORF1776 WHSC1 NM_007331, ENST00000514045, ENST00000420906 1866 5416 8966 None
Genewiz TFORF1777 WHSC1 NM_001042424, XM_011513557, NM_133331, NM_133330, NM_133335, XM_005248001, ENST00000508803, ENST00000382892, ENST082891, ENST00000382895 1867 5417 8967 None
Genewiz TFORF1778 WHSC1 XM_005248005, XM_006713914, NM_133334, ENST00000353275, ENST00000312087, ENST00000503128, ENST00000398261 1868 5418 8968 None
Genewiz TFORF1779 ZNF507 NM_014910, NM_001136156, ENST00000311921, ENST00000355898 1869 5419 8969 None
Genewiz TFORF1780 ZNF506 NM_001099269, ENST00000591639, ENST00000443905, ENST00000540806 1870 5420 8970 None
Genewiz TFORF1781 ZNF506 NM_001145404, ENST00000450683 1871 5421 8971 None
Genewiz TFORF1782 ZNF239 XM_011540238, XM_005271832, XM_006718003, NM_005674, NM_001099282, NM_001324349, NM_001324350, NM_001324351, NM_001324348, NM_001324347, NM_001099284, NM_001099283, ENST00000306006, ENST00000374446, ENST00000426961, ENST00000535642 1872 5422 8972 None
Genewiz TFORF1783 ZNF236 NM_001306089, ENST00000320610 1873 5423 8973 None
Genewiz TFORF1784 ZNF236 XM_011526165, NM_007345, ENST00000253159 1874 5424 8974 None
Genewiz TFORF1785 ZNF235 NM_004234, ENST00000291182 1875 5425 8975 None
Genewiz TFORF1786 ZNF234 NM_001144824, NM_006630, XM_017026149, XM_006722974, ENST00000592437, ENST00000426739 1876 5426 8976 None
Genewiz TFORF1787 TCERG1 NM_006706, ENST00000296702 1877 5427 8977 None
Genewiz TFORF1788 TCERG1 NM_001040006, ENST00000394421 1878 5428 8978 None
Genewiz TFORF1789 ZNF232 NM_014519, XM_017025021, ENST00000250076 1879 5429 8979 None
Genewiz TFORF1790 ZNF232 NM_001320952, ENST00000575898 1880 5430 8980 None
Genewiz TFORF1791 FOXE3 NM_012186, ENST00000335071 1881 5431 8981 None
Genewiz TFORF1792 CNBP NM_001127194, ENST00000446936 1882 5432 8982 None
Genewiz TFORF1793 CNBP NM_001127193, ENST00000451728 1883 5433 8983 None
Genewiz TFORF1794 CNBP NM_001127192, ENST00000441626 1884 5434 8984 None
Genewiz TFORF1795 CNBP NM_003418, ENST00000422453 1885 5435 8985 None
Genewiz TFORF1796 WDHD1 NM_001008396, ENST00000420358 1886 5436 8986 None
Genewiz TFORF1797 WDHD1 NM_007086, ENST00000360586 1887 5437 8987 None
Genewiz TFORF1798 USF2 NM_003367, ENST00000222305 1888 5438 8988 None
Genewiz TFORF1799 USF2 NM_207291, ENST00000343550 1889 5439 8989 None
Genewiz TFORF1800 USF2 NM_001321150, ENST00000379134 1890 5440 8990 None
Genewiz TFORF1801 TBX10 NM_005995, ENST00000335385 1891 5441 8991 None
Genewiz TFORF1802 USF3 NM_001009899, XM_017005871, ENST00000316407, ENST00000478658 1892 5442 8992 None
Genewiz TFORF1803 USF1 NM_001276373, NM_007122, ENST00000368020, ENST00000368021 1893 5443 8993 None
Genewiz TFORF1804 ZNF347 XM_017027384, XM_005259335, NM_032584, ENST00000334197 1894 5444 8994 None
Genewiz TFORF1805 ZNF347 NM_001172674, NM_001172675, ENST00000452676, ENST00000601469 1895 5445 8995 None
Genewiz TFORF1806 TBX15 XM_005271162, ENST00000369429 1896 5446 8996 None
Genewiz TFORF1807 ZNF432 NM_001322285, NM_001322284, NM_014650, ENST00000594154, ENST00000221315 1897 5447 8997 None
Genewiz TFORF1808 TBX18 NM_001080508, ENST00000369663 1898 5448 8998 None
Genewiz TFORF1809 TBX19 NM_005149, ENST00000367821 1899 5449 8999 None
Genewiz TFORF1810 ZNF439 NM_152262, ENST00000304030 1900 5450 9000 None
Genewiz TFORF1811 ZNF438 XM_017015877, XM_017015875, XM_017015881, XM_017015879, XM_017015878, XM_017015880, XM_017015882, XM_017015876, ENST00000375311 1901 5451 9001 None
Genewiz TFORF1812 ZNF438 NM_001143770, NM_001143771, XM_017015873, XM_011519376, XM_011519377, XM_006717399, XM_006717398, XM_017015874, ENST00000331737, ENST00000452305 1902 5452 9002 None
Genewiz TFORF1813 ZNF438 NM_001143769, ENST00000538351 1903 5453 9003 None
Genewiz TFORF1814 ZNF430 NM_025189, ENST00000261560 1904 5454 9004 None
Genewiz TFORF1815 ZNF433 NM_001080411, ENST00000344980 1905 5455 9005 None
Genewiz TFORF1816 SATB1 NM_001195470, NM_001322871, XM_011533988, XM_011533989, ENST00000417717 1906 5456 9006 None
Genewiz TFORF1817 UHRF1 NM_013282, ENST00000622802 1907 5457 9007 None
Genewiz TFORF1818 UHRF1 NM_001290050, NM_001048201, NM_001290051, NM_001290052, ENST00000612630, ENST00000624301, ENST00000615884, ENST00000616255 1908 5458 9008 None
Genewiz TFORF1819 ZNF138 NM_006524, ENST00000440155 1909 5459 9009 None
Genewiz TFORF1820 ZNF138 NM_001271637, ENST00000440598 1910 5460 9010 None
Genewiz TFORF1821 ZNF138 NM_001271640, ENST00000359735 1911 5461 9011 None
Genewiz TFORF1822 ZNF138 NM_001160183, ENST00000494380 1912 5462 9012 None
Genewiz TFORF1823 ZNF138 NM_001271638, ENST00000437743 1913 5463 9013 None
Genewiz TFORF1824 ZNF138 NM_001271639, ENST00000307355 1914 5464 9014 None
Genewiz TFORF1825 ARNT NM_178427, ENST00000505755 1915 5465 9015 None
Genewiz TFORF1826 ARNT NM_001286036, ENST00000354396 1916 5466 9016 None
Genewiz TFORF1827 ARNT NM_001668, ENST00000358595 1917 5467 9017 None
Genewiz TFORF1828 ARNT NM_001286035, XM_017001289, ENST00000515192 1918 5468 9018 None
Genewiz TFORF1829 DMRTA2 XM_011541937, NM_032110, ENST00000418121, ENST00000404795 1919 5469 9019 None
Genewiz TFORF1830 ZNF717 NM_001324026, NM_001290210, XM_017005487, ENST00000477374 1920 5470 9020 None
Genewiz TFORF1831 ZNF717 NM_001290209, ENST00000478296 1921 5471 9021 None
Genewiz TFORF1832 MTERF1 NM_006980, XM_005250593, XM_006716126, ENST00000351870 1922 5472 9022 None
Genewiz TFORF1833 MTERF1 NM_001301135, NM_001301134, XM_017012620, ENST00000419292, ENST00000406735 1923 5473 9023 None
Genewiz TFORF1834 EZH1 NM_001321081, ENST00000415827 1924 5474 9024 None
Genewiz TFORF1835 EZH1 NM_001321082, ENST00000585893 1925 5475 9025 None
Genewiz TFORF1836 EZH2 NM_152998, ENST00000350995 1926 5476 9026 None
Genewiz TFORF1837 EZH2 NM_001203249, ENST00000478654, ENST00000476773 1927 5477 9027 None
Genewiz TFORF1838 EZH2 NM_001203247, ENST00000460911 1928 5478 9028 None
Genewiz TFORF1839 EZH2 NM_001203248, ENST00000483967 1929 5479 9029 None
Genewiz TFORF1840 GTF2A1 NM_201595, ENST00000434192 1930 5480 9030 None
Genewiz TFORF1841 HBP1 XM_017011967, NM_012257, XM_005250266, XM_005250267, ENST00000468410, ENST00000222574, ENST00000485846 1931 5481 9031 None
Genewiz TFORF1842 YWHAZ NM_001135702, NM_001135701, NM_001135700, NM_145690, NM_001135699, NM_003406, XM_005251063, XM_017013811, XM_017013810, XM_005251061, ENST00000395957, ENST00000395958, ENST057309, ENST00000395956, ENST00000353245, ENST00000395953, ENST00000395951, ENST00000419477 1932 5482 9032 None
Genewiz TFORF1843 ALX1 NM_006982, ENST00000316824 1933 5483 9033 None
Genewiz TFORF1844 RB1 NM_000321, ENST00000267163 1934 5484 9034 None
Genewiz TFORF1845 ALX4 NM_021926, ENST00000329255 1935 5485 9035 None
Genewiz TFORF1846 ETS1 NM_005238, ENST00000319397 1936 5486 9036 None
Genewiz TFORF1847 ETS1 NM_001143820, XM_017017314, ENST00000392668 1937 5487 9037 None
Genewiz TFORF1848 ETS1 XM_011542652, ENST00000526145 1938 5488 9038 None
Genewiz TFORF1849 ETS1 NM_001162422, ENST00000535549 1939 5489 9039 None
Genewiz TFORF1850 YWHAE NM_006761, ENST00000264335 1940 5490 9040 None
Genewiz TFORF1851 ZBTB3 NM_024784, ENST00000394807 1941 5491 9041 None
Genewiz TFORF1852 SOX18 NM_018419, ENST00000340356 1942 5492 9042 None
Genewiz TFORF1853 ZBTB1 NM_014950, XM_011536568, ENST00000358738 1943 5493 9043 None
Genewiz TFORF1854 ZBTB5 NM_014872, XM_005251634, ENST00000307750 1944 5494 9044 None
Genewiz TFORF1855 ZBTB4 NM_001128833, NM_020899, XM_006721563, XM_006721564, XM_011523972, ENST00000380599, ENST00000311403 1945 5495 9045 None
Genewiz TFORF1856 NR4A2 NM_006186, XM_017004219, ENST00000339562, ENST00000409572 1946 5496 9046 None
Genewiz TFORF1857 NR4A2 XM_005246622, ENST00000426264 1947 5497 9047 None
Genewiz TFORF1858 NR4A3 NM_006981, XM_017015162, ENST00000395097 1948 5498 9048 None
Genewiz TFORF1859 NR4A3 NM_173199, ENST00000338488 1949 5499 9049 None
Genewiz TFORF1860 NR4A3 NM_173200, ENST00000618101, ENST00000330847 1950 5500 9050 None
Genewiz TFORF1861 DRGX NM_001276451, ENST00000374139, ENST00000434016 1951 5501 9051 None
Genewiz TFORF1862 OTP NM_032109, ENST00000306422 1952 5502 9052 None
Genewiz TFORF1863 POU2F2 XM_005259010, ENST00000342301 1953 5503 9053 None
Genewiz TFORF1864 POU2F2 NM_001207025, ENST00000526816 1954 5504 9054 None
Genewiz TFORF1865 POU2F2 XM_011527042, ENST00000560398 1955 5505 9055 None
Genewiz TFORF1866 POU2F2 NM_002698, ENST00000389341 1956 5506 9056 None
Genewiz TFORF1867 POU2F2 NM_001207026, ENST00000529952 1957 5507 9057 None
Genewiz TFORF1868 MMP3 NM_002422, ENST00000299855 1958 5508 9058 None
Genewiz TFORF1869 POU2F1 NM_001198783, ENST00000367862 1959 5509 9059 None
Genewiz TFORF1870 POU2F1 NM_001198786, ENST00000429375 1960 5510 9060 None
Genewiz TFORF1871 POU2F1 NM_002697, ENST00000367866 1961 5511 9061 None
Genewiz TFORF1872 BPTF NM_182641, ENST00000306378 1962 5512 9062 None
Genewiz TFORF1873 BBX NM_020235, XM_017006882, XM_011513004, ENST00000415149, ENST00000406780 1963 5513 9063 None
Genewiz TFORF1874 BBX NM_001142568, XM_005247644, XM_011513001, XM_011513000, XM_017006881, XM_005247642, XM_005247643, ENST00000325805 1964 5514 9064 None
Genewiz TFORF1875 CCNT1 NM_001277842, ENST00000618666, ENST00000417344 1965 5515 9065 None
Genewiz TFORF1876 CCNT1 NM_001240, ENST00000261900 1966 5516 9066 None
Genewiz TFORF1877 CCNT2 NM_001241, ENST00000295238 1967 5517 9067 None
Genewiz TFORF1878 CCNT2 NM_058241, ENST00000264157 1968 5518 9068 None
Genewiz TFORF1879 CREBZF NM_001039618, XM_017018089, XM_017018092, XM_017018091, XM_017018088, XM_011545195, XM_017018090, ENST00000490820, ENST00000527447 1969 5519 9069 None
Genewiz TFORF1880 CREBZF XM_017018087, XM_006718642, XM_017018086, ENST00000525639 1970 5520 9070 None
Genewiz TFORF1881 SOX30 NM_178424, ENST00000265007 1971 5521 9071 None
Genewiz TFORF1882 SOX30 NM_007017, ENST00000311371 1972 5522 9072 None
Genewiz TFORF1883 SOX30 XM_011534420, XM_005265803, NM_001308165, ENST00000519442 1973 5523 9073 None
Genewiz TFORF1884 ZIC2 NM_007129, ENST00000376335 1974 5524 9074 None
Genewiz TFORF1885 BRCA1 NM_007294, ENST00000357654 1975 5525 9075 None
Genewiz TFORF1886 BRCA1 NM_007300, ENST00000471181 1976 5526 9076 None
Genewiz TFORF1887 BRCA1 NM_007298, ENST00000491747 1977 5527 9077 None
Genewiz TFORF1888 BRCA1 NM_007297, ENST00000493795 1978 5528 9078 None
Genewiz TFORF1889 WNK1 NM_001184985, ENST00000537687 1979 5529 9079 None
Genewiz TFORF1890 WNK1 NM_014823, ENST00000535572 1980 5530 9080 None
Genewiz TFORF1891 WNK1 NM_018979, ENST00000315939 1981 5531 9081 None
Genewiz TFORF1892 WNK1 NM_213655, ENST00000340908 1982 5532 9082 None
Genewiz TFORF1893 AHRR NM_001242412, ENST00000505113 1983 5533 9083 None
Genewiz TFORF1894 AHRR NM_020731, ENST00000316418 1984 5534 9084 None
Genewiz TFORF1895 NR2C2 NM_001291694, XM_017007120, ENST00000425241, ENST00000393102, ENST00000406272 1985 5535 9085 None
Genewiz TFORF1896 NR2C1 NM_003297, ENST00000333003 1986 5536 9086 None
Genewiz TFORF1897 NR2C1 NM_001127362, ENST00000330677 1987 5537 9087 None
Genewiz TFORF1898 PATZ1 NM_014323, ENST00000266269 1988 5538 9088 None
Genewiz TFORF1899 PATZ1 NM_032050, ENST00000351933 1989 5539 9089 None
Genewiz TFORF1900 PATZ1 NM_032052, ENST00000405309 1990 5540 9090 None
Genewiz TFORF1901 TCF7L1 NM_031283, ENST00000282111 1991 5541 9091 None
Genewiz TFORF1902 TCF7L2 NM_001198528, ENST00000352065 1992 5542 9092 None
Genewiz TFORF1903 TCF7L2 XM_005270096, ENST00000538897 1993 5543 9093 None
Genewiz TFORF1904 TCF7L2 NM_001198530, ENST00000534894 1994 5544 9094 None
Genewiz TFORF1905 TCF7L2 XM_005270084, ENST00000355995 1995 5545 9095 None
Genewiz TFORF1906 TCF7L2 NM_001146274, ENST00000627217 1996 5546 9096 None
Genewiz TFORF1907 TCF7L2 NM_030756, ENST00000369397 1997 5547 9097 None
Genewiz TFORF1908 TCF7L2 NM_001146283, ENST00000355717 1998 5548 9098 None
Genewiz TFORF1909 TCF7L2 NM_001146285, ENST00000536810 1999 5549 9099 None
Genewiz TFORF1910 TCF7L2 XM_005270089, ENST00000629706 2000 5550 9100 None
Genewiz TFORF1911 TCF7L2 NM_001198527, ENST00000369395 2001 5551 9101 None
Genewiz TFORF1912 TCF7L2 NM_001198525, ENST00000545257 2002 5552 9102 None
Genewiz TFORF1913 TCF7L2 XM_005270085, ENST00000543371 2003 5553 9103 None
Genewiz TFORF1914 RUNX2 NM_001024630, ENST00000465038, ENST00000371438 2004 5554 9104 None
Genewiz TFORF1915 RUNX2 XM_017011396, ENST00000478660 2005 5555 9105 None
Genewiz TFORF1916 RUNX2 NM_001278478, ENST00000625924 2006 5556 9106 None