Web address: https://www.ncbi.nlm.nih.gov/biocollections<ref name=":0" /><ref>[https://www.ncbi.nlm.nih.gov/biocollections/docs/query/ Biocollections Query Tips]</ref><ref>[[/www.cnblogs.com/yahengwang/p/9550410.html|Introduction to biological databases: NCBI]]</ref>
=== BioProject database (formerly Genome Project) ===
A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record gives users a single place to find links to the diverse data generated for that project and deposited in the archival databases maintained by INSDC members. Typical examples include a multi-isolate project for sequencing multiple strains of a bacterial species, or a mono-isolate project for the genome and transcriptome of a particular organism. The description that submitters supply about the research effort is important for giving context to the experimental data. Taken as a whole, the BioProject database is a collection of genomics, functional genomics, and genetics studies with links to their resulting datasets. It describes each project's scope, material, and objectives, and provides a mechanism for retrieving datasets that are otherwise hard to find because of inconsistent annotation, multiple independent submissions, and the varied nature of data types that are often stored in different databases.
* The 1000 Genomes Project (human) [https://www.ncbi.nlm.nih.gov/bioproject/28889 (ID: 28889)]
* The human ENCODE (ENCyclopedia Of DNA Elements) project [https://www.ncbi.nlm.nih.gov/bioproject/30707 (ID: 30707)]
* NIH Human Microbiome Project (HMP) Roadmap Project [https://www.ncbi.nlm.nih.gov/bioproject/43021 (ID: 43021)]
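The example records above all follow the same address pattern: the numeric BioProject ID appended to the <code>/bioproject/</code> path. A minimal Python sketch of that mapping follows; the function name <code>bioproject_url</code> and the dictionary of short labels are illustrative choices, not part of any NCBI API.

```python
# Base address taken from the example links in this section.
BIOPROJECT_BASE = "https://www.ncbi.nlm.nih.gov/bioproject/"

# Short labels are hypothetical; the IDs are the ones listed above.
EXAMPLE_PROJECTS = {
    "1000 Genomes Project": 28889,
    "ENCODE": 30707,
    "Human Microbiome Project": 43021,
}


def bioproject_url(project_id: int) -> str:
    """Return the record URL for a numeric BioProject ID."""
    if project_id <= 0:
        raise ValueError("BioProject IDs are positive integers")
    return f"{BIOPROJECT_BASE}{project_id}"


for name, pid in EXAMPLE_PROJECTS.items():
    print(f"{name}: {bioproject_url(pid)}")
```

The sketch only builds addresses; it does not check that a record with the given ID exists.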
= UCSC Genome Browser =
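The '''UCSC Genome Browser''' is an online, downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is a graphical viewer that provides genome sequence data from a variety of vertebrate and invertebrate species and from the major model organisms.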
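* GENCODE, shown in blue
* RefSeq, shown in dark blue
* OMIM (Online Mendelian Inheritance in Man), shown in green, which collects mutations associated with human genetic diseases
* ENCODE, shown in yellow
* and so on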
[[File:UCSC 3.png|left|thumb|RefSeq in UCSC]]
=== BLAT ===
== UniProt: a unified protein sequence database ==

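* Origin: formed by merging the data of three databases, TrEMBL, Swiss-Prot, and PIR-PSD.
* First level: UniParc (UniProt Archive): a direct merge of all sequences from the three source databases; the information is relatively coarse and redundant.
* Second level: UniRef (UniProt Reference Clusters): duplicate sequences are removed. UniRef100 is what remains after removing fully identical sequences, UniRef90 is what remains after removing sequences that are more than 90% similar, and so on.
* Third level: UniProtKB (UniProt Knowledgebase): a high-quality, richly annotated database with links to the literature and to other databases. It has two parts, UniProtKB/TrEMBL (automatically annotated) and UniProtKB/Swiss-Prot (manually annotated).
* There is also the Proteomes database, which covers proteomes.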
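
The UniRef100 idea described in the list above (keep one representative for each set of fully identical sequences) can be illustrated with a small Python sketch. It covers exact duplicates only. The similarity-based clustering behind UniRef90 needs a sequence comparison tool and is not attempted here. The function name and the accessions in the usage example are hypothetical.

```python
def collapse_identical(records):
    """Group (accession, sequence) pairs whose sequences are identical.

    Returns a list of (representative_accession, member_accessions),
    where the representative is the first accession seen for a sequence.
    Comparison is case-insensitive; this is an illustrative simplification.
    """
    clusters = {}
    for accession, sequence in records:
        # dict preserves insertion order, so the first member stays first
        clusters.setdefault(sequence.upper(), []).append(accession)
    return [(members[0], members) for members in clusters.values()]


# Hypothetical accessions and toy sequences, for illustration only.
toy_records = [
    ("ACC_A", "MKTAYIAK"),
    ("ACC_B", "MKTAYIAKQR"),
    ("ACC_C", "mktayiak"),
]
for representative, members in collapse_identical(toy_records):
    print(representative, members)
```

Using the sequence string itself as the dictionary key keeps the sketch short; for large inputs a hash of the sequence would serve the same purpose with less memory.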
NCBI (National Center for Biotechnology Information)

The late Senator Claude Pepper recognized the importance of computerized information processing methods for biomedical research and sponsored the legislation that established the National Center for Biotechnology Information (NCBI) on November 4, 1988, as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). NLM was chosen for its experience in creating and maintaining biomedical databases and because, as part of NIH, it could establish an intramural research program in computational molecular biology. Together, the research components of NIH make up the largest biomedical research facility in the world.[1]





The Assembly database provides information on the structure of assembled genomes, assembly names and other metadata, statistical reports, and links to genomic sequence data.

The Assembly database holds information about the structure of assembled genomes, as represented in an AGP file or as a collection of completely sequenced chromosomes. It provides a versioned Assembly accession number that tracks changes to an assembly as submitting groups update it over time. The web resource provides metadata about assemblies, such as assembly names (and alternate names), simple statistical reports (type and number of contigs and scaffolds; N50 values), and a history view of updates. It also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Collaboration (INSDC), that is, DDBJ, ENA, or GenBank, and the assembly represented in the NCBI Reference Sequence (RefSeq) project.
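
The N50 statistic mentioned among the assembly reports can be made concrete with a short calculation. The Python sketch below uses the common definition: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the summed length. The function name and the toy lengths are illustrative, and NCBI's own reports may treat edge cases differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that takes the running sum to half the total
        if running >= half_total:
            return length


# Toy contig lengths: the total is 290, so half is 145.
# 80 + 70 = 150 reaches the halfway point, giving an N50 of 70.
print(n50([80, 70, 50, 40, 30, 20]))
```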






BioCollections is a curated dataset of metadata for culture collections, museums, herbaria, and other natural history collections. It includes Darwin Core institution and collection codes, and URL formulae for mapping specimen IDs to web pages at the collection site. BioCollections also stores the acronyms used in "structured vouchers" for sequence entries submitted to the International Nucleotide Sequence Database Collaboration (INSDC: GenBank, the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ)) and to NCBI's BioSample.



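{| class="wikitable"
! Code !! Meaning !! Note
|-
| [icode] || Search by institution code (partial code) ||
|-
| [uicode] || Search by unique institution code ||
|-
| [ccode] || Search by collection code || In /specimen_voucher="UAM:Mamm:24119", Mamm is the collection code (mammals)
|-
| [iname] || Search by institution name ||
|-
| [cname] || Search by collection name ||
|-
| [all] || All of the above ||
|-
! colspan="3" | Search by category
|-
| collection type museum[prop] || Search museums ||
|-
| collection type herbarium[prop] || Search herbaria ||
|-
| collection type culture collection[prop] || Search culture collections ||
|}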
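
A structured voucher such as "UAM:Mamm:24119" can be split into its parts and turned into field-tagged search terms of the kind listed above. The Python sketch below assumes the colon-separated layout institution code, optional collection code, specimen ID. The names <code>Voucher</code>, <code>parse_voucher</code>, and <code>field_term</code> are illustrative, and the sketch only builds query strings without sending them anywhere.

```python
from typing import NamedTuple, Optional

# Field tags taken from the search-field list in this section.
KNOWN_FIELDS = {"icode", "uicode", "ccode", "iname", "cname", "all"}


class Voucher(NamedTuple):
    institution: str
    collection: Optional[str]
    specimen_id: str


def parse_voucher(value: str) -> Voucher:
    """Split 'INST:COLL:ID' or 'INST:ID' into its components."""
    parts = value.split(":", 2)
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    raise ValueError(f"not a structured voucher: {value!r}")


def field_term(term: str, field: str) -> str:
    """Attach a field tag to a search term, e.g. 'UAM[icode]'."""
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown field tag: {field!r}")
    return f"{term}[{field}]"


voucher = parse_voucher("UAM:Mamm:24119")
print(field_term(voucher.institution, "icode"))
if voucher.collection is not None:
    print(field_term(voucher.collection, "ccode"))
```

Terms built this way can be pasted into the BioCollections search box; how several terms should be combined in one query is left to the Query Tips page cited earlier.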







