This section introduces bioinformatics databases and tools.

NCBI (National Center for Biotechnology Information)

The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research and sponsored legislation that established the National Center for Biotechnology Information (NCBI) on November 4, 1988, as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). NLM was chosen for its experience in creating and maintaining biomedical databases, and because, as part of NIH, it could establish an intramural research program in computational molecular biology. The collective research components of NIH make up the largest biomedical research facility in the world.^[1]

Databases

Assembly database

Tags: comprehensive genome database

A database providing information on the structure of assembled genomes, assembly names and other metadata, statistical reports, and links to genomic sequence data.

The Assembly database has information about the structure of assembled genomes as represented in an AGP file or as a collection of completely sequenced chromosomes. The database provides a versioned Assembly accession number that tracks changes to assemblies as they are updated by submitting groups over time. The web resource provides metadata about assemblies such as assembly names (and alternate names), simple statistical reports of the assembly (type and number of contigs and scaffolds; N50s) and a history view of updates. It also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Collaboration (INSDC), i.e. DDBJ, ENA or GenBank, and the assembly represented in the NCBI Reference Sequence (RefSeq) project.
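The N50 statistic mentioned in the assembly reports can be illustrated with a short sketch. It uses the common definition of N50 (the length of the shortest contig or scaffold in the smallest set of longest sequences that together cover at least half of the total assembly length); the function name and the sample lengths are made up for illustration and are not taken from the Assembly database itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (total 300, so the halfway point is 150).
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths))  # 80 + 70 = 150 reaches the halfway point, so N50 is 70
```

A larger N50 generally indicates a more contiguous assembly, which is why it is reported alongside the contig and scaffold counts.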

Web page view:

Web address: https://www.ncbi.nlm.nih.gov/assembly^[2]^[3]

BioCollections database

Tags: specimen database

BioCollections is a curated dataset of metadata for culture collections, museums, herbaria and other natural history collections, including Darwin Core institution and collection codes, and URL formulae for mapping specimen IDs to web pages at the collection site. BioCollections stores the acronyms used in "structured vouchers" (institution code:optional collection code:specimen ID, e.g. /culture_collection="ISBC:CMF:1866") for sequence entries submitted to the International Nucleotide Sequence Database Collaboration (INSDC) databases (GenBank, the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ)) and to NCBI's BioSample.

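Note: this database does not include biological specimen collections held by individuals; it only points to the databases of the individual collections.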
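The structured voucher layout described above (institution code, optional collection code, specimen ID, separated by colons) can be split apart with a few lines of Python. This is only a sketch: the `Voucher` type and `parse_voucher` function are hypothetical names, and the sketch assumes that a two-part value has no collection code.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: str
    collection: Optional[str]  # None when the voucher has no collection code
    specimen_id: str


def parse_voucher(value: str) -> Voucher:
    """Split 'institution:collection:specimen' or 'institution:specimen'."""
    # Drop surrounding whitespace and the double quotes used in qualifiers.
    parts = value.strip().strip('"').split(":", 2)
    if any(not part for part in parts):
        raise ValueError(f"empty component in voucher: {value!r}")
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        # Assumption: with only two parts, the collection code was omitted.
        return Voucher(parts[0], None, parts[1])
    raise ValueError(f"not a structured voucher: {value!r}")


print(parse_voucher('"ISBC:CMF:1866"'))
```

For the example value from the text, the institution code is `ISBC`, the collection code is `CMF`, and the specimen ID is `1866`.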

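Queries:

Code		Meaning	Notes
[icode]		Search institution codes (partial codes)
[uicode]		Search unique institution codes
[ccode]		Search the corresponding collection code (/specimen_voucher="UAM:Mamm:24119")	Mammals: Mamm; fishes: Fish; insects: Ento
[iname]		Search institution names
[cname]		Search collection types
[all]		All of the above
Search by category	collection type museum[prop]	Search museums
	collection type herbarium[prop]	Search herbaria
	collection type culture collection[prop]	Search culture collections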
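The field tags in the table can be combined into search strings programmatically. The sketch below is a hedged illustration: the helper names are hypothetical, the `AND` operator follows general Entrez search syntax, and the `/?term=` query parameter on the web address is an assumption about how the site accepts search terms rather than something stated in this article.

```python
from urllib.parse import urlencode

# Web address of the BioCollections database, as listed in this article.
BASE_URL = "https://www.ncbi.nlm.nih.gov/biocollections"

# Field tags taken from the query table above.
FIELD_TAGS = {"icode", "uicode", "ccode", "iname", "cname", "all"}


def field_term(value: str, field: str) -> str:
    """Return a search term such as 'UAM[icode]'."""
    if field not in FIELD_TAGS:
        raise ValueError(f"unknown field tag: {field!r}")
    return f"{value}[{field}]"


def all_of(*terms: str) -> str:
    """Require every term to match by joining them with AND."""
    return " AND ".join(terms)


def search_url(term: str) -> str:
    """Build a search URL; the '?term=' parameter is an assumption."""
    return f"{BASE_URL}/?{urlencode({'term': term})}"


query = all_of(field_term("UAM", "icode"), field_term("Mamm", "ccode"))
print(query)
print(search_url(query))
print(search_url("collection type museum[prop]"))
```

Keeping the term construction separate from the URL construction means the same query string can also be pasted directly into the search box on the web page.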

Web page view:

Web address: https://www.ncbi.nlm.nih.gov/biocollections^[2]^[4]^[5]

BioProject database (formerly: Genome Project)

Tags: project database

A BioProject is a collection of biological data related to a single initiative originating from a single organization or from a consortium. A BioProject record gives users a single place to find links to the diverse data generated for that project and deposited into the archival databases maintained by members of the INSDC. Typical examples of a BioProject include a multi-isolate project for sequencing multiple strains of a bacterial species, or a mono-isolate project for the genome and transcriptome of a particular organism. The description that submitters supply about the research effort is important for providing context to the experimental data.

The database is a collection of genomics, functional genomics, and genetics studies, with links to their resulting datasets. It describes project scope, material, and objectives, and provides a mechanism to retrieve datasets that are often difficult to find because of inconsistent annotation, multiple independent submissions, and the varied nature of diverse data types, which are often stored in different databases.

Large-scale projects:

The 1000 Genomes Project (human) (ID: 28889)