1. Species with a reference genome: using gene IDs
1. Prepare the differential gene file
Only two columns are needed:
ENTREZ_GENE_ID
logFC
geneNames	ENTREZ_GENE_ID	normalAve	tumorAve	logFC	pValue	qValue
CCL23	6368	95.05964624	5.566645819	-4.066608903	2.07E-31	5.99E-29
COLEC10	10584	1459.366228	83.66298626	-4.122671832	2.11E-31	6.00E-29
FAM189B	10712	383.9435808	1289.852064	1.747953745	2.17E-31	6.08E-29
CDC45	8318	12.20616678	258.9248256	4.38682126	3.59E-31	9.94E-29
RCAN1	1827	11046.97758	2309.590455	-2.257915165	3.90E-31	1.07E-28
N4BP2L1	90634	2644.65753	734.73331	-1.847750259	4.57E-31	1.23E-28
FCN3	8547	6777.184345	389.412555	-4.120767162	5.41E-31	1.44E-28
UHRF1	29128	15.89471347	327.8659692	4.353192433	5.73E-31	1.50E-28
HMMR	3161	25.23294528	407.9486624	4.008655285	8.18E-31	2.12E-28
NEK2	4751	18.88655007	390.7591103	4.36024922	9.48E-31	2.43E-28
Use the gene ID column as the input, one ID per line:
6368
10584
10712
8318
1827
90634
8547
29128
3161
4751
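The ID list can also be produced in R instead of copying the column by hand. This is a minimal sketch, assuming the differential table is tab-delimited with the header shown above; the helper name `extract_gene_ids` and the output file name are made up for illustration.

```r
# Hypothetical helper: pull the Entrez IDs out of a differential-expression table.
extract_gene_ids <- function(diff_table) {
  ids <- as.character(diff_table$ENTREZ_GENE_ID)
  ids <- ids[!is.na(ids) & nzchar(ids)]  # drop missing or empty IDs
  unique(ids)                            # keep each ID once, in original order
}

# Small in-memory example built from three rows of the table above.
diff_table <- read.table(
  text = "geneNames\tENTREZ_GENE_ID\tlogFC
CCL23\t6368\t-4.066608903
COLEC10\t10584\t-4.122671832
FAM189B\t10712\t1.747953745",
  sep = "\t", header = TRUE, check.names = FALSE
)

gene_ids <- extract_gene_ids(diff_table)
cat(gene_ids, sep = "\n")
# To save the list for pasting into KOBAS (file name is an assumption):
# writeLines(gene_ids, "gene_ids.txt")
```

Converting the IDs to character early avoids them being treated as numbers later on.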
2. Open the KOBAS database
Website: http://kobas.cbi.pku.edu.cn/
Go to Gene-list Enrichment:
http://kobas.cbi.pku.edu.cn/anno_iden.php
Input data types:
Fasta Protein Sequence: protein sequences
Fasta Nucleotide Sequence: nucleotide sequences
Tabular BLAST Output: tabular output from BLAST
Entrez Gene ID: gene IDs
UniProtKB AC
Refseq Protein ID
Ensembl Gene ID
3. Select options
1. Input type: Entrez Gene ID
2. Species: Homo sapiens (human)
3. Paste the Gene ID list
4. Databases: click Clear All to deselect every Pathway, Disease, and GO option, then select only KEGG Pathway
Click Run
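4. The online analysis finishes and returns the results
5. Output file description
Statistical test: hypergeometric test / Fisher's exact test
FDR correction method: Benjamini and Hochberg (a description of this method still needs to be added to these notes)
The header of the output file states the same:
##Statistical test method: hypergeometric test / Fisher's exact test
##FDR correction method: Benjamini and Hochberg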
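As a rough illustration of what these two steps compute, the sketch below runs an upper-tail hypergeometric test and a Benjamini-Hochberg correction in base R. The helper name `enrichment_pvalue` is hypothetical, and the counts are borrowed from the clusterProfiler result table shown later in this post. KOBAS's own implementation may differ in detail, and because only four pathways are corrected here, the adjusted values will not match any table in this post.

```r
# Hypothetical helper: upper-tail hypergeometric P-value for one pathway.
# k: input genes in the pathway      n: input genes with an annotation
# K: background genes in the pathway N: all annotated background genes
enrichment_pvalue <- function(k, n, K, N) {
  # P(X >= k) when drawing n genes from N, of which K belong to the pathway
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Counts from the example result table (GeneRatio k/199, BgRatio K/7431).
k <- c(19, 11, 8, 11)
K <- c(124, 36, 41, 93)
p_raw <- enrichment_pvalue(k, n = 199, K = K, N = 7431)

# Benjamini-Hochberg correction across the tested pathways.
p_bh <- p.adjust(p_raw, method = "BH")

print(data.frame(k = k, K = K, p_raw = p_raw, p_bh = p_bh))
```

The correction only makes sense over the full set of pathways that were tested, so in practice `p.adjust` is applied to every P-value, not just the significant ones.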
Output table columns:
Term: the KEGG annotation term
Database: database type
ID: the ID of the Term
Input number: number of input genes assigned to this Term
Background number: total number of genes in the database assigned to this pathway
P-value: raw P-value
Corrected P-Value: P-value after correction
Input: the input Gene IDs; multiple IDs are separated by |
Hyperlink: web link
Example link:
http://www.genome.jp/kegg-bin/show_pathway?hsa04512/hsa:3161%09red
The pathway image marks the corresponding gene in red.
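A link of this shape can be rebuilt from the ID and Input columns. The sketch below is only an assumption-based illustration: the helper name `kegg_highlight_url` is made up, joining several genes with `/` follows the pattern of the example link and is not taken from KOBAS itself, and the second gene ID is there only to show how the `|` separator is handled.

```r
# Hypothetical helper: build a KEGG show_pathway link that colours genes,
# following the pattern of the example link above.
kegg_highlight_url <- function(pathway_id, input_field,
                               org = "hsa", colour = "red") {
  # The Input column separates multiple gene IDs with "|".
  gene_ids <- strsplit(input_field, "|", fixed = TRUE)[[1]]
  # "%09" is a URL-encoded tab between the gene and its colour.
  marks <- paste0(org, ":", gene_ids, "%09", colour)
  paste0("http://www.genome.jp/kegg-bin/show_pathway?",
         pathway_id, "/", paste(marks, collapse = "/"))
}

url_one <- kegg_highlight_url("hsa04512", "3161")
url_two <- kegg_highlight_url("hsa04512", "3161|4751")
cat(url_one, url_two, sep = "\n")
```

`fixed = TRUE` matters here, because `|` would otherwise be read as a regular-expression alternation.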
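6. Software installation
Bioconductor downloads are slow from here, so install with conda instead, which also pulls in the dependencies (the package comes from the bioconda channel):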
conda install bioconductor-clusterprofiler
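7. Plotting
# Initialize the environment
rm(list = ls())
# Install the packages (legacy biocLite installer, left commented out)
#source("https://bioconductor.org/biocLite.R")
#biocLite()
#biocLite("clusterProfiler")
#biocLite("pathview")
# Set the working directory
setwd("/home/toucan/Project/001.kegg_map")
# Load the library
library("clusterProfiler")
# Read the input file without altering the column names
rt <- read.table("input.txt", sep = "\t", header = TRUE, check.names = FALSE)
rt
# Build a logFC vector named by gene ID (IDs as character strings)
gene <- as.character(rt$ENTREZ_GENE_ID)
gene
geneFC <- rt$logFC
names(geneFC) <- gene
geneFC
# KEGG enrichment
# organism takes the KEGG organism code ("hsa" for human); only terms passing
# both 0.05 cutoffs are reported. enrichKEGG has no readable argument, so the
# geneID column keeps Entrez IDs rather than gene names.
kk <- enrichKEGG(gene = gene, organism = "hsa", pvalueCutoff = 0.05, qvalueCutoff = 0.05)
class(kk)
kk
as.data.frame(kk)
# Save the output table (tab-delimited text despite the .xls extension)
write.table(as.data.frame(kk), file = "KEGG.xls", sep = "\t", quote = FALSE, row.names = FALSE)
# Draw the barplot; print() and dev.off() make sure the PDF is written and closed
pdf(file = "KEGG.barplot.pdf")
print(barplot(kk, drop = TRUE, showCategory = 12))
dev.off()
# Network plot (disabled); it needs the pathway description and input gene ID columns
#pdf(file = "KEGG.cnetplot.pdf")
#cnetplot(kk, categorySize = "geneNum", foldChange = geneFC)
#dev.off()
library("pathview")
keggxls <- read.table("KEGG.xls", sep = "\t", header = TRUE, quote = "", stringsAsFactors = FALSE)
# Requires internet access: download and colour the map for each enriched pathway
for (i in keggxls$ID) {
  pv.out <- pathview(gene.data = geneFC, pathway.id = i, species = "hsa", out.suffix = "pathview")
}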
8. Output results
Enrichment table:
ID	Description	GeneRatio	BgRatio	pvalue	p.adjust	qvalue	geneID	Count
hsa04110	Cell cycle	19/199	124/7431	5.54E-10	1.37E-07	1.31E-07	8318/7272/890/1870/701/4085/4998/4171/4175/898/23594/1031/4172/4616/8317/4176/4174/9134/993	19
hsa03030	DNA replication	11/199	36/7431	1.29E-09	1.60E-07	1.53E-07	2237/4171/4175/10535/5984/4172/5558/5424/23649/4176/4174	11
hsa03440	Homologous recombination	8/199	41/7431	1.03E-05	0.000849457	0.000811238	146956/8438/5888/7517/5424/641/7516/25788	8
hsa05222	Small cell lung cancer	11/199	93/7431	3.44E-05	0.002135572	0.002039489	1870/898/3910/4616/1282/3655/1284/9134/5743/3915/1163	11
Three files are generated for each pathway:
hsa03030.pathview.png
hsa03030.png
hsa03030.xml
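Enrichment plot:

Downloaded map for an enriched pathway:
Differentially expressed genes are highlighted: red for a positive logFC (up-regulated), green for a negative logFC (down-regulated)

The original map downloaded from the website is also saved, without colour annotation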

2. Non-model organisms
Use sequences as the input file:
>seq1
CTAATTTTGATGTAACAATAAGCAAATCCATCTCATTGACATGTCAACTTACCTTAATCTTTAATAAGTGATAAAGTCATATGTATGCCAAAAATTGCCTTAGCATTGCGTTATGACCTACCGTTAGTAGATGTCTGATT
>seq2
AGTCTCGAATACAACTTGTTGCTGCGCGGACGCGAATCGCTCAGTACGGACGTCTTGAGCTCGAATCCTCGGCCATATCTGTGCTCTCGATCGCAGCGTTTGCTAATTCGAAGATCGTGCTAATCGAAGTACCGAGAAAT
Note: choose KO as the species; this is relatively slow

The page shows this message:
The input file should not exceed 200 lines
If you choose KO, Please input no more than 200 lines at one time.
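A larger FASTA file therefore has to be submitted in pieces. The sketch below groups records into batches of at most 200 lines without splitting a record across batches. The helper name `split_fasta`, the toy input, and the output file names are assumptions for illustration; how KOBAS itself counts lines is not documented here.

```r
# Hypothetical helper: group FASTA records into batches of at most
# max_lines lines, never splitting one record across two batches.
# A single record longer than max_lines still ends up alone in its own batch.
split_fasta <- function(lines, max_lines = 200) {
  lines <- lines[nzchar(lines)]                  # ignore empty lines
  record_id <- cumsum(startsWith(lines, ">"))    # new record at each header
  records <- split(lines, record_id)
  batches <- list()
  current <- character(0)
  for (rec in records) {
    # Start a new batch when adding this record would exceed the limit.
    if (length(current) > 0 && length(current) + length(rec) > max_lines) {
      batches[[length(batches) + 1]] <- current
      current <- character(0)
    }
    current <- c(current, rec)
  }
  if (length(current) > 0) batches[[length(batches) + 1]] <- current
  batches
}

# Toy input: 150 two-line records, 300 lines in total.
toy <- as.vector(rbind(paste0(">seq", 1:150), "ACGT"))
batches <- split_fasta(toy)
print(lengths(batches))

# To write each batch to its own file (file names are an assumption):
# for (i in seq_along(batches)) writeLines(batches[[i]], sprintf("batch_%02d.fa", i))
```

Each batch can then be pasted or uploaded as a separate task, and the result tables combined afterwards.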
While the task is running:
http://kobas.cbi.pku.edu.cn/wait_kobas.php?taskid=180629456069220
Your task is still running, your task id is 180629456069220, you can get the results automatically when the task is finished.
Also you can use the task id to fetch results at the result retrive page in the future.