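Basic overview
Motifs are sequence features that recur at high frequency within a set of sequences. Motif discovery and enrichment analysis are common techniques in bioinformatics, and the underlying principles, the commonly used tools, the characteristics of each tool, and how to choose a suitable one are all worth understanding in depth. This post is a placeholder for now; I hope to fill in the whole thing eventually.
Motifs are usually shown as a seq logo, which comes in two forms depending on the y-axis: probability and entropy. Probability is easy to understand: it is the proportion of each state at each position, for example the percentages of A, C, G, and T. Entropy involves the concept of information entropy. There is a Q&A post on this that is worth reading closely.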
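To make the two y-axes concrete, here is a minimal sketch that computes both from a handful of aligned sites. It assumes a DNA alphabet and that the entropy-style logo plots information content in bits (log2(4) minus the Shannon entropy of each column), with no small-sample correction. The example sites and the function names are made up for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA; letters outside this set are ignored


def column_probabilities(sites):
    """Return, per position, the fraction of each letter across aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    probs = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        probs.append({b: counts.get(b, 0) / len(sites) for b in ALPHABET})
    return probs


def information_content(column):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(ALPHABET)) - entropy


sites = ["ACGT", "ACGA", "ACCT", "AAGT"]  # hypothetical aligned binding sites
probs = column_probabilities(sites)
bits = [information_content(col) for col in probs]

for i, (col, ic) in enumerate(zip(probs, bits), start=1):
    print(i, col, round(ic, 3))
```

A fully conserved column (position 1 here) reaches the maximum of 2 bits, while a column with all four letters at 25% each would score 0 bits. In a probability logo every column has the same total height.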
Tools
MEME Suite
Motif Discovery
MEME
discovers novel, ungapped motifs (recurring, fixed-length patterns).
input: no more than 50 sequences, fixed length (otherwise STREME is a better choice)
STREME
ungapped motif
note: works with DNA sequences even if you specify the RNA alphabet
XSTREME
comprehensive motif analysis (including motif discovery)
input: any length, and length may vary
note:
- find motifs (with STREME and MEME)
- determine which motifs are most enriched (with SEA)
- assess similarity to known motifs (with Tomtom)
- group motifs by similarity
- create a GFF file for viewing each motif's predicted sites
MEME-ChIP
comprehensive motif analysis (including motif discovery), aimed at motifs that tend to be centrally located in the sequences, such as in ChIP-seq peaks.
input: sequences of about 500 letters in which the central 100 characters are likely to contain the motif.
will do:
- find motifs in the central region (by default the centered 100 characters)
- determine which motifs are most centrally enriched (with CentriMo)
- assess similarity to known motifs (with Tomtom)
- group motifs by similarity
- perform a motif spacing analysis (with SpaMo)
- create a GFF file
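The "centered 100 characters" idea can be illustrated with a small helper that trims each sequence to its central window. This is only a sketch of the concept, not how MEME-ChIP is implemented. The function name `central_window` and the example sequence are hypothetical.

```python
def central_window(seq, width=100):
    """Return the central `width` characters of seq (all of it if shorter)."""
    if len(seq) <= width:
        return seq
    start = (len(seq) - width) // 2
    return seq[start:start + width]


# Hypothetical 500-letter peak sequence whose middle 100 letters differ.
peak = "A" * 200 + "C" * 100 + "A" * 200
center = central_window(peak)
print(len(center), set(center))
```

For odd leftovers the window is shifted one character to the left of the exact midpoint, which is an arbitrary choice made for this sketch.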
GLAM2(Gapped Local Alignment of Motifs)
discovers novel, gapped motifs (variable-length patterns) in DNA or protein sequences
- allows insertions and deletions in motifs
- simulated annealing algorithm, with a temperature parameter.
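To show what the temperature parameter does in simulated annealing, here is a generic sketch of the technique applied to a toy problem. It is not GLAM2's actual implementation: the Metropolis acceptance rule, the geometric cooling schedule, and all function and parameter names are assumptions chosen for illustration.

```python
import math
import random


def accept(delta_score, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_score >= 0:
        return True
    return rng.random() < math.exp(delta_score / temperature)


def anneal(score, propose, state, t_start=1.0, t_end=0.01, steps=2000, seed=0):
    """Maximize `score` by random proposals while the temperature cools."""
    rng = random.Random(seed)
    best = state
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(state, rng)
        if accept(score(candidate) - score(state), t, rng):
            state = candidate
        if score(state) > score(best):
            best = state
    return best


# Toy problem: find the integer x that maximizes -(x - 3)^2, starting from 0.
result = anneal(
    score=lambda x: -(x - 3) ** 2,
    propose=lambda x, rng: x + rng.choice([-1, 1]),
    state=0,
)
print(result)
```

At a high temperature worse moves are accepted fairly often, which helps the search escape local optima. As the temperature drops the search becomes increasingly greedy.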
MoMo(Modification motifs)
discover sequence motifs associated with PTMs (post-translational modifications)
Motif Enrichment
SEA(Simple Enrichment Analysis)
CentriMo(Central Motif)
AME(Analysis of Motif Enrichment)
AME identifies known
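In its simplest form, motif enrichment asks whether a motif occurs in more of the sequences of interest than in a control set. A minimal sketch of that idea uses a one-sided Fisher's exact test on the number of sequences containing the motif. The counts below are made up, the function name is hypothetical, and the statistics actually used by SEA, CentriMo, and AME may differ.

```python
from math import comb


def enrichment_pvalue(hits_primary, n_primary, hits_control, n_control):
    """One-sided Fisher's exact test (upper tail of the hypergeometric).

    Probability of seeing at least `hits_primary` motif-containing sequences
    in the primary set if hits were spread at random across both sets.
    """
    total = n_primary + n_control
    total_hits = hits_primary + hits_control
    upper = min(total_hits, n_primary)
    numerator = sum(
        comb(total_hits, k) * comb(total - total_hits, n_primary - k)
        for k in range(hits_primary, upper + 1)
    )
    return numerator / comb(total, n_primary)


# Hypothetical counts: the motif is found in 40 of 100 primary sequences
# and in 10 of 100 control sequences.
p = enrichment_pvalue(40, 100, 10, 100)
print(p)
```

A small p-value suggests the motif is over-represented in the primary set. With many motifs tested at once, the p-values would also need a multiple-testing correction.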
homer
HOMER Motif Analysis
findMotifs.pl
findMotifsGenome.pl
homer2
chromVAR (for scATAC-seq)
inferring transcription-factor-associated accessibility from single-cell epigenomic data.
weblogo3
Web tool
Other tools
database
Cis-BP
the online library of transcription factors and their DNA binding motifs.