RCSB PDB Help

Annotations

● Introduction

○ What are Annotations?

○ Why look at Annotations?

● Documentation

○ Gene Product Annotation*

○ Protein Sequence Annotations

■ Protein Family Annotation

■ InterPro Annotations*

○ Protein Structural Domain Annotations

■ SCOP/SCOPe

■ CATH

■ ECOD

■ SCOP2

○ Membrane Protein Annotations

■ OPM

■ PDBTM

■ MemProtMD

■ mpstruc

○ Sequence, Structure and Function Annotations

■ Antibody Annotations

□ IMGT

□ SAbDab

□ Thera-SAbDab

■ Antimicrobial Resistance Annotations (from CARD)

■ Pharos Disease Annotations*

■ Mechanism and Catalytic Site Atlas (M-CSA) Annotations

Introduction

The Annotations tab aggregates information from various bioinformatics data resources pertaining to all or parts of a structure. While these data and analyses are not directly part of the PDB entries, the information they present can be useful in learning more about the protein(s) of interest. Annotation types marked with an asterisk (*) are included for both experimental structures and computed structure models (CSMs); the others are available only for experimental structures.

What are Annotations?

Annotations are notes, comments, and classifications based on analyses that present different perspectives and information about the subject (in this case the biological molecules in the PDB entry). Some of these annotations are based on identifying and organizing conserved regions in polymer sequences, structural domains, locations of proteins in cells or in membranes, and protein functions.

Why look at Annotations?

Information from annotations can help us develop new hypotheses about the function and interactions of the molecule(s) of interest. They can provide a foundation for creating new knowledge about the molecule(s) being studied.

Documentation

Several types of annotations are presented in PDB entries. Some of these annotations are based on analyses and classifications performed using PDB data; others are integrated from a variety of external data resources.

The annotations documented here can be grouped into the following types:

Gene product annotations (Gene Ontology or GO)
Protein sequence annotations - Protein Family (Pfam) and InterPro
Protein structural domains - SCOPe, CATH, ECOD, SCOP2
Membrane protein annotations - OPM, PDBTM, MemProtMD, mpstruc
Sequence, Structure and Function annotations (for specific molecular classes) - e.g., Antibody (IMGT, SAbDab, and Thera-SAbDab), Antimicrobial Resistance (CARD)

All annotations described here are provided at the level of the PDB entry, the polymer entity, or the polymer entity instance (chain). For each annotation, the Chain IDs of the polymers link back to the Sequence View tab, where the residue ranges for the annotations are marked (where available).
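The same entity-level annotations can also be retrieved programmatically. The sketch below assumes the RCSB Data API GraphQL endpoint (`https://data.rcsb.org/graphql`) and assumes that polymer-entity annotations are exposed under `rcsb_polymer_entity_annotation` with `annotation_id`, `name`, `type`, and `provenance_source` fields; check these names against the current Data API schema before relying on them. The helper `group_by_type` is hypothetical and only reorganizes the JSON response so that each polymer entity's annotations are grouped by source type (e.g., GO, Pfam, CATH).

```python
"""Sketch: fetch and group polymer-entity annotations from the RCSB Data API.

Assumptions (verify against the current schema): the GraphQL endpoint below,
and the `rcsb_polymer_entity_annotation` field with the sub-fields queried.
"""
import json
import urllib.request
from collections import defaultdict

GRAPHQL_URL = "https://data.rcsb.org/graphql"  # assumed endpoint

QUERY = """
query EntityAnnotations($id: String!) {
  entry(entry_id: $id) {
    polymer_entities {
      rcsb_id
      rcsb_polymer_entity_annotation {
        annotation_id
        name
        type
        provenance_source
      }
    }
  }
}
"""


def fetch_entry_annotations(entry_id: str) -> dict:
    """POST the GraphQL query for one entry and return the decoded JSON."""
    payload = json.dumps({"query": QUERY, "variables": {"id": entry_id}}).encode()
    request = urllib.request.Request(
        GRAPHQL_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def group_by_type(response: dict) -> dict[str, dict[str, list[str]]]:
    """Map polymer entity ID -> annotation type -> list of annotation IDs."""
    grouped: dict[str, dict[str, list[str]]] = {}
    entry = (response.get("data") or {}).get("entry") or {}
    for entity in entry.get("polymer_entities") or []:
        by_type: dict[str, list[str]] = defaultdict(list)
        # Entities without annotations may return null instead of a list.
        for ann in entity.get("rcsb_polymer_entity_annotation") or []:
            by_type[ann["type"]].append(ann["annotation_id"])
        grouped[entity["rcsb_id"]] = dict(by_type)
    return grouped
```

For example, `group_by_type(fetch_entry_annotations("4HHB"))` would return one dictionary per polymer entity; the network call is kept separate from the grouping so the grouping logic can be reused on saved responses.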

Gene Product Annotation*

The Gene Ontology resource provides a compilation of controlled vocabularies (ontologies) about the functions of gene products from a variety of organisms, ranging from bacteria to humans. Various research groups have used this ontology to annotate the functions of gene products in the PDB to describe the Molecular Function, Cellular Component, and Biological Process. Learn more about the Gene Ontology project and Annotations.

The Interface

The Gene Product (protein and/or RNA), or GO, annotation table groups all three types of annotation, with each row dedicated to a specific polymer entity (and all of its instances within the structure). Examples are presented in Figure A1a, for an experimentally determined structure, and Figure A1b, for a computed structure model (CSM).

Figure A1a: Tabular representation of the Gene Product Annotations of T4 Lysozyme for an experimental structure (PDB ID 102l).

Figure A1b: Tabular representation of the Gene Product Annotations of Evasin P546 for a computed structure model (CSM ID AF_AFA0A023FBW7F1).

Learning about the Structure
Examining the tables in Figures A1a and A1b allows you to learn the following:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (GO Annotations).
A direct link to the Gene Ontology Resource is available in the top right corner (above the table).
The Molecular Function column lists all the functions of the polymer entity (i.e., protein chain), such as catalytic and hydrolase activity in Figure A1a and protein binding and chemokine binding in Figure A1b.
The Biological Process column lists the processes the protein participates in, such as catabolic process and cell wall organization in Figure A1a, and regulation of cell motility and chemotaxis in Figure A1b.
The Cellular Component column lists the location(s) of the protein within the cell, such as intracellular and cytoplasmic in Figure A1a, and extracellular in Figure A1b.

Exploring other structures

Each of the hyperlinked words displayed in the Gene Product Annotation table can be used to launch a search for other structures in the archive that have the same GO annotation.
Note that Gene Product annotations are available for both experimentally determined structures and computed structure models (CSMs). When a search for other structures with the same Gene Product annotation is launched from the Annotations page of an experimental structure, only experimental structures are included in the search results. When the search is launched from the Annotations page of a CSM, CSMs are included in the search results. The inclusion or exclusion of CSMs can always be changed using the toggle switch next to the search icon in the Advanced Search Query Builder (at the top of the Query Results page).
The same set of PDB structures can also be identified using three different browse options: Biological Process, Cellular Component, and Molecular Function. Learn more about these browse options by following the links included here.
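The same kind of annotation search can be expressed as a request to the RCSB Search API (v2). The sketch below only builds the JSON request body; the attribute names `rcsb_polymer_entity_annotation.annotation_id` and `rcsb_polymer_entity_annotation.type` are assumptions to verify against the Search API schema, and `build_annotation_query` and `run_search` are hypothetical helper names. The `include_csm` flag mirrors the CSM toggle described above by setting `request_options.results_content_type`. GO:0003796 (lysozyme activity) is used only as an illustrative identifier.

```python
"""Sketch: build an RCSB Search API v2 request for entries sharing an annotation.

Attribute names are assumptions; check them against the Search API schema.
"""
import json
import urllib.request

SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def _term(attribute: str, value: str) -> dict:
    """One exact-match text-service clause."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": "exact_match", "value": value},
    }


def build_annotation_query(annotation_id: str, annotation_type: str,
                           include_csm: bool = False) -> dict:
    """Return a request body matching entries with the given annotation."""
    # Mirrors the experimental/CSM toggle in the Advanced Search Query Builder.
    content_types = ["experimental", "computational"] if include_csm else ["experimental"]
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [
                _term("rcsb_polymer_entity_annotation.annotation_id", annotation_id),
                _term("rcsb_polymer_entity_annotation.type", annotation_type),
            ],
        },
        "return_type": "entry",
        "request_options": {"results_content_type": content_types},
    }


def run_search(request_body: dict) -> dict:
    """POST the request; returns an empty dict if the response has no body."""
    data = json.dumps(request_body).encode()
    req = urllib.request.Request(
        SEARCH_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else {}
```

For example, `run_search(build_annotation_query("GO:0003796", "GO", include_csm=True))` would ask for both experimental structures and CSMs. The same builder could be reused for other annotation types in this document, such as Pfam or InterPro, by changing the identifier and type.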

Protein Sequence Annotations

Comparing protein sequences is the most common way to group and organize protein structures into families. Protein sequences also provide a convenient framework for integrating information about sequence conservation, sites of modification, etc.

Protein Family Annotation

Protein Family (Pfam) classifies proteins using multiple sequence alignments and presents annotations for these families. Each Pfam entry is classified as one of the following six types:

Family - a protein region
Domain - a stable structural unit
Repeat - a short unit that may be unstable in isolation but forms a stable structure when multiple copies are present
Motif - a short unit outside globular domains
Coiled-Coil - regions that predominantly contain coiled-coil motifs
Disordered - regions that do not have a specific shape and may be intrinsically disordered

Learn more about Pfam here.

The Interface

Annotations from Pfam for protein entities in the PDB are presented as in Figure B1.

Figure B1: Tabular representation of the Pfam annotations for PDB ID 6pa1.

Learning about the Structure

Examining the table in Figure B1 allows you to learn the following:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (Pfam).
A direct link to Pfam is available in the top right corner (above the table).
Chains: The Chain IDs in this column indicate the polymers that were annotated with information from Pfam. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the Pfam domain annotations can be seen there.
Domain annotations for several chains are displayed in the table with descriptions and comments.

Exploring other structures

Each of the hyperlinked words displayed in the Accession and Identifier columns of the annotation table can be used to launch a search for other structures in the archive that have the same Pfam classification.

InterPro Annotations*

InterPro integrates information from 13 different data resources and maps it onto protein sequences to classify proteins into families, predict the presence of domains, and provide functional annotations of other important sites (e.g., conserved sites, active sites, and binding sites). These annotations may be of the following types:

Domain - distinct functional, structural or sequence units that may exist in a variety of biological contexts. Examples include PH domain, Immunoglobulin domain or the classical C2H2 zinc finger.
Family - group of proteins that share a common evolutionary origin reflected by their similarities in sequence; similar primary, secondary or tertiary structure; or related functions.
Homologous Superfamily - a group of proteins that share a common evolutionary origin, reflected by similarity in their structure. Superfamily members often display very low similarity at the sequence level. Instead of being based on a single signature sequence, these entries are usually based on a collection of underlying hidden Markov models from the SUPERFAMILY and CATH-Gene3D databases.
Repeat - a short sequence (<50 amino acids in length) that is repeated within a protein. Examples include Leucine Rich Repeats or WD40 repeats.
Site - a few different types of sites are annotated:

Active site - A short sequence that contains one or more conserved residues, which allow the protein to bind to a ligand and carry out a catalytic activity.
Binding site - A short sequence that contains one or more conserved residues, which form a protein interaction site.
Conserved site - A short sequence that contains one or more conserved residues.
PTM site - A short sequence that contains one or more conserved residues some of which are the site of a post-translational modification (PTM).

Unintegrated - sequence signatures from member databases that are "unintegrated" in InterPro. These signatures might not yet be curated or might not reach InterPro's standards for integration, but may still provide important information about a protein of interest.

Learn more about InterPro here.

The Interface

Annotations from InterPro for protein entities in the PDB are presented as in Figures B2a, for an experimentally determined structure, and B2b for a computed structure model (CSM).

Figure B2a: Tabular representation of the InterPro annotations for PDB ID 1b17.

Figure B2b: Tabular representation of the InterPro annotations for CSM ID AF_AFA0A009IHW8F1.

Learning about the Structure

Examining the tables in Figures B2a and B2b allows you to learn the following:

The orange banner (color of the header row in the tables shown) indicates that the information presented here was integrated from an external resource (InterPro).
A direct link to InterPro is available in the top right corner (above the tables).
Family, Domain, and specific Site annotations for each of the polymer chains are displayed in the Type column of the table.
The Accession column lists the InterPro accession identifier for each annotation of each polymer chain, with links to InterPro to learn more about the annotation.

Exploring other structures

Each of the hyperlinked words displayed in the Name column of the annotation table can be used to launch a search for other structures in the archive that have the same InterPro classification.
Note that the InterPro annotations are available for both experimentally determined structures and computed structure models (CSMs). When the search for other structures available from RCSB.org with the same InterPro annotations is launched from an experimental structure Annotations page, only experimental structures are included in the search results. However, when the search is launched from the Annotations of a CSM structure, CSMs are included in the search results. The inclusion and exclusion of CSMs can always be changed using the toggle switch located next to the search icon in the Advanced Search Query Builder (in the top part of the Query Results page).

Protein Structural Domain Annotations

Domains are structurally and functionally stable regions of the protein that can fold and function independently from the rest of the protein. While some proteins are composed of a single domain, there are many proteins that have multiple domains, each with specific shapes, interactions, and functions.

Both shapes and functions of protein domains are conserved in nature and suggest evolutionary relationships. Several algorithms were developed to identify structural domains in the PDB and organize them into databases such as SCOP/SCOPe, CATH, and ECOD. Annotations from these databases are integrated to allow PDB users to learn about a protein’s structure, functions, and evolution.

SCOP/SCOPe

The Structural Classification of Proteins — extended (SCOPe) uses a combination of manual curation and rigorously validated automated methods to classify PDB structures based on structural features and similarities as well as homology and evolution. Learn more about SCOPe classification.

The Interface

Classification information from SCOPe is mapped to PDB structural domains (Figure C1).

Figure C1: Tabular representation of the SCOP/SCOPe classification of hemoglobin alpha and beta chains for PDB ID 4hhb.

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (SCOPe).
A direct link to SCOP/SCOPe is available in the top right corner (above the table).
The Chains column lists the protein chains for the classified domains.
The SCOPe classification for each of the protein chains is listed in the Class, Fold, Superfamily, Family, Domain, and Species columns.
The classification presented here was based on SCOPe version 2.08.

Learning About the Structure

The structure-based classification of the hemoglobin alpha and beta chains indicates that the proteins have domains that are all alpha helical and contain a globin-like fold.
The species- and evolution-based classifications indicate that these proteins belong to the globin family and are of human origin.
To see the sequence ranges of the SCOPe domains, open the SCOPe track (one of the two SCOP tracks) in the Sequence tab of the Structure Summary page.
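SCOPe encodes the Class, Fold, Superfamily, and Family levels in a single dotted identifier (the sccs), where each level extends the previous one. The sketch below splits such an identifier into its nested levels; the globins are expected at `a.1.1.2`, but treat that value as an illustrative assumption and confirm it against the table. The class-name lookup covers only the classes a to g; other SCOPe class letters exist and fall through to a generic label. `parse_sccs` and `describe_class` are hypothetical helper names.

```python
"""Sketch: split a SCOPe sccs identifier into its hierarchical levels."""
from typing import NamedTuple

# Main SCOPe classes a-g; other class letters exist but are not listed here.
SCOPE_CLASSES = {
    "a": "All alpha proteins",
    "b": "All beta proteins",
    "c": "Alpha and beta proteins (a/b)",
    "d": "Alpha and beta proteins (a+b)",
    "e": "Multi-domain proteins (alpha and beta)",
    "f": "Membrane and cell surface proteins and peptides",
    "g": "Small proteins",
}


class Sccs(NamedTuple):
    class_: str
    fold: str
    superfamily: str
    family: str


def parse_sccs(sccs: str) -> Sccs:
    """Return the cumulative levels of an sccs such as 'a.1.1.2'."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {sccs!r}")
    return Sccs(
        class_=parts[0],
        fold=".".join(parts[:2]),
        superfamily=".".join(parts[:3]),
        family=sccs,
    )


def describe_class(sccs: str) -> str:
    """Human-readable class name, or a generic label for unlisted classes."""
    return SCOPE_CLASSES.get(parse_sccs(sccs).class_, "other/unlisted class")
```

Because each level is a prefix of the next, two domains with the same `fold` value share a fold even if their superfamilies differ.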

Exploring other structures

Each of the hyperlinked words displayed in the SCOPe annotation table can be used to launch a search for other structures in the archive that have the same SCOPe classification.
The same set of PDB structures can also be identified using the SCOPe Browse options. Learn more about SCOPe browsing options.

CATH

The CATH database classifies protein domains based on evolutionary relationships using a combination of automated and manual procedures. The classification groups protein domains at four levels - Class, Architecture, Topology (fold family), and Homologous superfamily. Learn more about the CATH classification.

The Interface

Classification information from CATH is mapped to the PDB structural domains (Figure C2).

Figure C2: Tabular representation of the CATH classification of hemoglobin alpha and beta chains for PDB ID 4hhb.

Learning About the Structure

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (CATH).
A direct link to the CATH database is available in the top right corner (above the table).
The Chains column lists the protein chains for the classified domains. Clicking on the hyperlinked chain IDs will open the Sequence tab showing the polymer entity. The residue ranges for the CATH domain annotations can be seen there.
The CATH classification for each of the protein chains is listed in the Class, Architecture, Topology, and Homology columns.
The classification presented here is based on CATH version 4.2.0.
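A CATH code such as `1.10.490.10` (expected for the globins; treat it as an illustrative assumption) lists Class, Architecture, Topology, and Homologous superfamily from left to right. Comparing two codes level by level therefore tells you how closely two domains are related in the CATH hierarchy. The sketch below finds the most specific level two domains share; `cath_levels` and `deepest_shared_level` are hypothetical helper names.

```python
"""Sketch: compare two CATH codes level by level."""
from typing import List, Optional

CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_levels(code: str) -> List[str]:
    """Cumulative prefixes of a four-level CATH code, e.g. '1.10.490.10'."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH code: {code!r}")
    return [".".join(parts[: i + 1]) for i in range(4)]


def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Name of the most specific level at which both codes agree, or None."""
    shared = None
    for name, a, b in zip(CATH_LEVELS, cath_levels(code_a), cath_levels(code_b)):
        if a != b:
            break  # once a level differs, all deeper levels differ too
        shared = name
    return shared
```

For example, `deepest_shared_level("1.10.490.10", "1.10.490.20")` returns `"Topology"`: the two codes agree down to the topology level but belong to different homologous superfamilies.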

Exploring other structures

Each of the hyperlinked words displayed in the CATH annotation table can be used to launch a search for other structures in the archive that have the same CATH classification.
The same set of PDB structures can also be identified using the CATH Browse options. Learn more about CATH browsing options.

ECOD

Evolutionary Classification of protein Domains (ECOD) is a hierarchical classification of protein domains organized according to their evolutionary relationships. The domains are organized into the following five levels:

(A) architecture
(X) possible homology
(H) homology
(T) topology
(F) family

Learn more about ECOD classifications.

The Interface

Annotations from ECOD mapped to PDB structures are presented as in Figure C3.

Figure C3: Tabular representation of the ECOD annotations for PDB ID 6xzl.

Learning About the Structure

Examining the table in Figure C3 allows you to learn the following:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (ECOD).
A direct link to ECOD is available in the top right corner (above the table).
Chains: The Chain IDs in this column indicate the polymers that were annotated with information from ECOD. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the ECOD domain annotations can be seen there.
The Domain Identifier column provides a link to the page in ECOD with additional information and a graphic to view the domain within the context of the full protein.
Annotations of the protein domain include family, topology, homology, possible homology, and architecture.

Exploring other structures

Explore other structures in the archive that have the same ECOD classification using the ECOD Browse options. Learn more about ECOD browsing options.

SCOP2

The SCOP2 database classifies representative structures with unique protein domains and extends the classification to related entries using SIFTS. Learn more about SCOP2 classification.

The Interface

Classification information from SCOP2 is mapped to PDB structural domains (Figure C4).

Figure C4: Tabular representation of the SCOP2 classification of hemoglobin alpha and beta chains for PDB ID 4hhb.

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (SCOP2).
A direct link to SCOP2 is available in the top right corner (above the table).
The Chains column lists the protein chains for the classified domains.
The SCOP2 classification for each of the protein chains is listed in the Family Name column, while the Domain and Family identifiers are linked to the corresponding pages in the SCOP2 database.
The classification presented here was based on SCOP2B (dated 2022-02-25).

Learning About the Structure

Examining the table in Figure C4 allows you to learn the following:

The structure-based classification of the hemoglobin alpha and beta chains indicates that the proteins have domains that contain a globin-like fold.

Exploring other structures

The same set of PDB structures can also be identified using the SCOP2 Browse options. Learn more about SCOP2 browsing options.

Membrane Protein Annotations

Membrane proteins differ from soluble proteins because parts of their structure either exist within the interior of a membrane or are associated with its surface. Several approaches have been used to organize these proteins into groups to study their membrane association as well as their overall structure and functions. Information from a few of these classifications has been mapped to PDB structures, and these annotations are described here. Learn more about Membrane Protein Resources in the PDB.

OPM

The Orientations of Proteins in Membranes (OPM) database classifies membrane proteins based on their transmembrane or membrane-associated domains. Learn more about OPM.

The Interface

The OPM classification was built using SCOP and TCDB but has some unique features. It has four levels of hierarchy (Figure D1):

Type classifies a protein as transmembrane, monotopic/peripheral, or membrane-active.
Class groups the proteins by secondary structure, either all-α, all-β, α+β, α/β, or nonregular.
Superfamily groups evolutionarily related proteins with superimposable 3D structures.
Family includes proteins with detectable sequence homology.

Figure D1: Tabular representation of OPM Annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

Learning About the Structure

Examining the table in Figure D1 allows you to learn the following about OmpK36:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (OPM).
A direct link to the OPM database is available in the top right corner (above the table).
Chains A, B, and C are all instances of the OmpK36 protein in this structure. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the OPM domain annotations can be seen there.
An external link provides access to a page in the OPM resource where you can learn more about this protein and see a graphical representation of the membrane position relative to the protein structure.
The protein is a transmembrane, beta-barrel protein that is trimeric and has a general bacterial porin structure.

Exploring other structures

Each of the hyperlinked words displayed in the OPM annotation table can be used to launch a search for other structures in the archive that have the same OPM classification.
The same set of PDB structures can also be identified using the OPM Browse options. Learn more about OPM browsing options.

PDBTM

The Protein Data Bank of Transmembrane Proteins (PDBTM) classifies transmembrane proteins using the TMDET algorithm. Learn more about PDBTM.

The Interface

Information from PDBTM is used to identify a structure as a membrane protein (Figure D2).

Figure D2: Tabular representation of PDBTM Annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

Learning About the Structure

Examining the table in Figure D2 allows you to learn the following about OmpK36:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (PDBTM).
A direct link to the PDBTM database is available in the top right corner (above the table).
Chains A, B, and C are all instances of the OmpK36 protein in this structure. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. The residue ranges for the PDBTM domain annotations can be seen there.
An external link provides access to a page in PDBTM and a graphical representation of the membrane position relative to the protein structure.

Exploring other structures

The hyperlink “Annotated as Membrane Protein by PDBTM” displayed in the PDBTM annotation table can be used to launch a search for other structures in the archive that have the same annotation.

MemProtMD

MemProtMD is a database of intrinsic membrane protein structures identified in the Protein Data Bank and studied using molecular dynamics after insertion into simulated lipid bilayers. A coarse-grained self-assembly approach is used for the molecular dynamics simulations. Learn more about MemProtMD.

The Interface

Information from MemProtMD is used to identify a structure as a membrane protein (Figure D3).

Figure D3: Tabular representation of MemProtMD Annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

Learning About the Structure

Examining the table in Figure D3 allows you to learn the following about OmpK36:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (MemProtMD).
A direct link to the MemProtMD database is available in the top right corner (above the table).
Chains A, B, and C are all instances of the OmpK36 protein in this structure. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity.
An external link provides access to a page in MemProtMD and a graphical representation of the membrane position around the protein structure. The simulations in the lipid bilayer are also available there.

Exploring other structures

The hyperlink “Annotated as Membrane Protein by MemProtMD” displayed in the MemProtMD annotation table can be used to launch a search for other structures in the archive that have the same annotation.

mpstruc

Membrane Proteins of Known 3D Structure (mpstruc) is a manually curated database that organizes membrane proteins by secondary structure and by their interactions with the membrane (transmembrane or monotopic). Learn more about mpstruc.

The Interface

Information from mpstruc is used to identify a structure as a membrane protein (Figure D4).

Figure D4: Tabular representation of mpstruc annotations of Klebsiella pneumoniae OmpK36 for PDB ID 5o79.

The three main groups in the mpstruc classification are:

Monotopic Membrane Proteins
Transmembrane Proteins: Beta-Barrel
Transmembrane Proteins: Alpha-helical

Learning About the Structure

Examining the table in Figure D4 allows you to learn the following about OmpK36:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (mpstruc).
A direct link to the mpstruc database is available in the top right corner (above the table).
Chains A, B, and C are all instances of the OmpK36 protein in this structure.
An external link provides access to the home page for mpstruc. You can expand the groups shown on that page by clicking on the “+++” signs to open up the entire classification. As an exercise, you can do a search for a PDB ID of interest on the mpstruc page to see its classification.
The mpstruc classification of the OmpK36 protein places it in the “Transmembrane Proteins: Beta-Barrel” group and the “Beta-Barrel Membrane Proteins: Porins and Relatives” subgroup.

Exploring other structures

Each of the hyperlinks displayed in the mpstruc annotation table can be used to launch a search for other structures in the archive that have the same mpstruc classification.
The same set of PDB structures can also be identified using the mpstruc Browse options. Learn more about the mpstruc browsing options.

Sequence, Structure and Function Annotations

There are several classes of molecules in the PDB that have specific structural compositions and functional roles in biology. Their sequences and structures may show too wide a range of variations to be meaningfully classified and studied by sequence or structure, so several projects have grouped these molecules by their functions. Examples of these classifications are included here.

Antibody Annotations

Antibodies are products of the adaptive immune system, and understanding their structures and functions has enabled scientists to design molecules that represent the functionally important regions of antibodies and to produce them for diagnostics, therapeutics, and research. Information from antibody databases (IMGT, SAbDab, and Thera-SAbDab) has been mapped to PDB data, and the resulting antibody annotations are presented in a tabular format with provenance and version information.

IMGT

The international ImMunoGeneTics information system (IMGT) is a resource that provides access to sequence, genome, and structure immunogenetics data, and to web-based interactive tools to explore them. Learn more about IMGT annotations.

The Interface

Antibody information integrated from IMGT is mapped on polymer entities in the structure (Figure E1).

Figure E1: Tabular representation of the Antibody Annotations of cetuximab from IMGT for PDB ID 6ayn.

Learning About the Structure

The orange banner (color of the header row in the table shown) indicates that information presented here was integrated from the IMGT resources.
A direct link to the IMGT databases is available in the top right corner (above the table).
Chains: This column lists the protein chains that were used to run the sequence matches against IMGT data. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity. Residue ranges for the Immunoglobulin domain annotations can be seen there.
Various features such as protein name, source organism, domain names and descriptions listed in this table show that this structure has two copies of a Fab fragment. Each Fab fragment of the chimeric therapeutic antibody cetuximab has a heavy and a light chain.
The Description column provides a link to the page in IMGT with additional information about this chain in the context of the full protein.

SAbDab

The Structural Antibody Database (SAbDab) annotates all antibody structures in the PDB, including experimental details, antibody nomenclature (e.g. heavy-light chain pairings), curated affinity data, and sequence annotations.

The Interface

Antibody information integrated from SAbDab is mapped onto polymer entities in the structure (Figure E2).

Figure E2: Tabular representation of the Antibody Annotations from SAbDab for the antibody present in PDB ID 4jn2.

Learning About the Structure

The orange banner (color of the header row in the table shown) indicates that information presented here was integrated from the SAbDab resource.
A direct link to the SAbDab database is available in the top right corner (above the table).
Chains: This column lists the protein chains that were used to run the sequence matches. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity and various annotations.
Chain Subclass: This column lists the names of the polymer chain entities forming the antibody. Clicking on the hyperlinked names will open the entry-specific page in the SAbDab database.
Chain Type: Where available, this column lists the name of the heavy or light chain type present in the antibody.
Antigen Name: This column lists the name of the antigen that this antibody binds.

Thera-SAbDab

The Therapeutic Structural Antibody Database (Thera-SAbDab) is a database of immunotherapeutic variable domain sequences and their representatives in SAbDab. It also includes close sequence matches (e.g., 95-98% and 99% sequence identity). Learn more about Thera-SAbDab annotations.

The Interface

Antibody information integrated from Thera-SAbDab is mapped onto polymer entities in the structure (Figure E3).

Figure E3: Tabular representation of the Antibody Annotations of Idarucizumab from Thera-SAbDab for PDB ID 4jn2.

Learning About the Structure

The orange banner (color of the header row in the table shown) indicates that information presented here was integrated from the Thera-SAbDab resource.
A direct link to the Thera-SAbDab database (search interface) is available in the top right corner (above the table).
Name: This structure has two copies of a Fab fragment of the therapeutic antibody Idarucizumab. Clicking on the hyperlinked name opens the antibody-specific page in Thera-SAbDab.
Target: This antibody targets the molecule Dabigatran.

Antimicrobial Resistance Annotations (from CARD)

The Comprehensive Antibiotic Resistance Database (CARD) provides curated reference sequences of antibiotic resistance genes, proteins, and their phenotypes, organized by the Antibiotic Resistance Ontology ("ARO"). Proteins in PDB structures that perfectly or closely match the reference antibiotic resistance genes (>95% sequence identity and 80% sequence coverage) are identified and linked to CARD annotations. The matched protein's gene name, ARO identifier, description, impacted drug classes, and resistance mechanism are listed on the Annotations page.
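The identity and coverage thresholds above can be made concrete with a small calculation on one pairwise alignment. This is a minimal sketch under stated assumptions: identity is computed over aligned (non-gap) columns, coverage as the fraction of reference residues that are aligned, and "80% sequence coverage" is read as "at least 80%". The exact alignment method and definitions used for the CARD mapping are not given here, and other tools define these quantities differently. The function names are hypothetical.

```python
"""Sketch: percent identity and coverage from one pairwise alignment."""
from typing import Tuple


def identity_and_coverage(query_aln: str, ref_aln: str) -> Tuple[float, float]:
    """Return (percent identity, percent coverage of the reference).

    Both strings are rows of the same alignment, with '-' marking gaps.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("alignment rows must have equal length")
    aligned = matches = 0
    for q, r in zip(query_aln, ref_aln):
        if q != "-" and r != "-":  # count only columns without gaps
            aligned += 1
            if q == r:
                matches += 1
    ref_length = len(ref_aln.replace("-", ""))
    identity = 100.0 * matches / aligned if aligned else 0.0
    coverage = 100.0 * aligned / ref_length if ref_length else 0.0
    return identity, coverage


def meets_card_threshold(identity: float, coverage: float) -> bool:
    # ">95% identity" from the text; coverage read as ">= 80%" (assumption).
    return identity > 95.0 and coverage >= 80.0
```

For example, the query row `MKT-LV` aligned to the reference `MKTALV` gives 100% identity over the five aligned columns and about 83.3% coverage of the six-residue reference, so it passes both checks, whereas `MRTALV` against the same reference has about 83.3% identity and fails.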

The Interface

When the sequences of the PDB protein and the reference Antimicrobial Resistance Gene match exactly, information from CARD is mapped to the PDB entry in two tables (Figure E4).

Figure E4: Tabular representation of gene and gene family annotations from the CARD for IMP-1 beta lactamase from Pseudomonas aeruginosa, PDB ID 1jje.

When the sequences of the PDB protein and the reference antimicrobial resistance gene share <95% sequence identity, only the gene family annotations are included on the Annotations page (Figure E5).

Figure E5: Tabular representation of gene family annotations from CARD for the aminoglycoside antibiotic-inactivating protein AAC(6') from Escherichia coli, PDB ID 2bue.

Learning About the Structure

Examining the tables in Figures E4 and E5 tells you the following about the antibiotic resistance determinants present in these PDB entries:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (CARD).
A direct link to the CARD database is available in the top right corner (above the table marked by the red-outlined oval shape).
Chains: This column lists the protein chains that were used to run the sequence matches against CARD data.
Accession: This column lists the gene or gene family ARO identifier, mapped to the proteins listed. The ARO identifier is linked to the CARD page that lists additional information about the gene or gene family.
The AMR Gene table lists the gene name and a description of the gene, including common organisms known to carry it.
The AMR Gene family table lists the gene family name, drugs impacted by members of this family, and the antibiotic resistance mechanism.
Provenance Source (Version): The CARD classification presented here is based on version 3.2.6.

Exploring other structures

The tables in Figures E4 and E5 provide options to launch a PDB query by example or a PDB Advanced Search using the gene or gene family name(s) (shown with black-outlined rectangles).

Pharos Disease Annotations*

Pharos is a comprehensive knowledge base for the Druggable Genome (DG). Its information about disease target proteins is obtained from the Target Central Resource Database (TCRD), which in turn integrates data from various sources, including the Harmonizome, Jensen Lab datasets, EBI datasets (such as ChEMBL), and the Drug Target Ontology (DTO) from the University of Miami.

Learn more about Pharos here.

The Interface

Annotations about diseases and possible drug targets from Pharos are mapped to the experimental structures and computed structure models available from RCSB.org via UniProt identifiers, as seen in Figures E6a and E6b.

Figure E6a: Tabular representation of Disease and possible Drug Target annotations from Pharos for PDB ID 121p.

Figure E6b: Tabular representation of Disease and possible Drug Target annotations from Pharos for CSM ID AF_AFP05108F1.

Learning about the Structure

Examining the tables in Figures E6a and E6b allows you to learn the following:

The orange banner (color of the header row in the tables shown) indicates that the information presented here was integrated from an external resource (Pharos).
A direct link to Pharos is available in the top right corner (above the tables).
Disease annotations for each polymer chain in the entry are listed in the Associated Disease column. UniProt identifiers preceded by the word Pharos (in an orange box) link to Pharos, where you can learn more about the polymer as a drug target, its expression, approved drugs, and more.

Exploring other structures

Each of the hyperlinked words displayed in the Associated Disease column of the annotation table can be used to launch a search for other structures in the archive that have the same Pharos classification.
Note that Pharos annotations are available for both experimentally determined structures and computed structure models (CSMs). When a search for other structures with the same Pharos disease annotation is launched from the Annotations page of an experimental structure, only experimental structures are returned. When the search is launched from the Annotations page of a CSM, CSMs are included in the results. CSMs can be included or excluded at any time using the toggle switch next to the search icon in the Advanced Search Query Builder (at the top of the Query Results page).
The same set of PDB structures can also be identified using the MONDO Disease Ontology browser. Learn more about Disease Ontology browsing options.

Mechanism and Catalytic Site Atlas (M-CSA) Annotations

The Mechanism and Catalytic Site Atlas (M-CSA) resource is a database of enzyme reaction mechanisms. Each of the ~1,000 entries included in this resource is linked to at least one experimentally determined (PDB) structure. These PDB structures are annotated in RCSB.org with details from the M-CSA.

Learn more about the M-CSA here.

The Interface

Annotations from the M-CSA about the enzyme name, biological function, and catalytic residues are available for the experimentally determined structures in RCSB.org that are linked to the ~1,000 M-CSA entries, as seen in Figure E7.

In addition to listing the enzyme name, biological function, and catalytic site residues, the table provides options to view the catalytic residues in the context of the enzyme's 3D structure and to search for additional examples of this structural motif (the catalytic site residues) in other 3D structures in the archive.

Beyond describing the reaction mechanism, M-CSA annotations are valuable when studying enzyme activity, e.g., the activity associated with a particular Enzyme Commission (EC) number.

Figure E7: Tabular representation of Mechanism and Catalytic Site Atlas (M-CSA) annotations for PDB ID 1b73.

Learning about the Structure

Examining the table in Figure E7 allows you to learn the following:

The orange banner (color of the header row in the table shown) indicates that the information presented here was integrated from an external resource (M-CSA).
A direct link to the M-CSA is available in the top right corner (above the table marked by the red-outlined oval shape).
Chains: This column lists the protein chains that are annotated with details from the M-CSA. Clicking on a hyperlinked chain ID will open a Sequence tab showing the polymer entity and various annotations.
Enzyme Name: This column lists the name of the enzyme and an outgoing link in an orange-colored box (with the M-CSA ID) to the corresponding M-CSA page.
Description: This column provides a summary of the biological function or relevance of the enzyme.
Catalytic Residues: This column identifies the residues in this entry that play a role in catalysis. It gives the number of residues in the motif (catalytic site) and lists each residue in the format ${label_comp_id}:${label_asym_id}-${label_seq_id}, i.e., the three-letter code of the amino acid, its chain identifier, and its sequence position. An additional suffix may follow the chain identifier (e.g., GLU:A_2-147); it uniquely identifies a chain instance when multiple copies of that chain are present. The "Explore in 3D: M-CSA Motif Definition" link (marked with the red arrow labeled with the number 1) displays the motif residues in Mol*.

Exploring other structures

Additional options in the Catalytic Residues column enable you to search for other structures in the PDB that have the same catalytic site residues and/or the same enzyme classification. These search options are described below:

The "Search M-CSA Motif" link (marked with the red arrow labeled with the number 2) launches a structure motif query based on this motif definition, making it easy to find similar occurrences of this structural motif (the catalytic residues) across the PDB archive. To include Computed Structure Models (CSMs) in this search, turn on the "Include Computed Structure Models (CSM)" toggle switch in the query results browser.
All Enzyme Commission (EC) numbers assigned in the PDB to the polymer chain are listed along with provenance information. For example, in Figure E7, the PDB entry includes EC number 5.1.1.3. Clicking the hyperlinked EC number (e.g., 5.1.1.3, marked with the red arrow labeled with the number 3) launches a search for other PDB structures with the same EC number, helping you identify other enzyme structures with the same classification. CSMs can be included in the search by turning on the toggle switch.
Finally, the "Search M-CSA Motif + EC 5.1.1.3" (marked with the red arrow labeled with the number 4) launches a query that represents the intersection of both criteria: a structure motif query looking for similar arrangements of catalytic residues AND entries with the same EC number. CSM structures may be included in the search by turning on the toggle switch.
All entries with M-CSA annotations can be requested with this query. It can also be combined with an Enzyme Commission (EC) number filter; e.g., this query returns all M-CSA-annotated entries that describe EC 1.1.1.1 (alcohol dehydrogenase).
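The "Search M-CSA Motif + EC" option combines two criteria with a logical AND. A minimal sketch of how such a query might be expressed as JSON for the RCSB Search API (POSTed to https://search.rcsb.org/rcsbsearch/v2/query) follows. The `strucmotif` parameter layout, the attribute `rcsb_polymer_entity.rcsb_ec_lineage.id`, and the `results_content_type` option reflect our reading of the public Search API and should be checked against its documentation; the query the website actually generates may differ (e.g., extra tolerance settings). The residue positions and the `struct_oper_id` value of "1" are placeholders, not the real catalytic residues of 1b73.

```python
import json


def motif_node(entry_id, residues):
    """Structure motif node; residues are (label_asym_id, struct_oper_id, label_seq_id)."""
    return {
        "type": "terminal",
        "service": "strucmotif",
        "parameters": {
            "value": {"entry_id": entry_id},
            "residue_ids": [
                {"label_asym_id": asym, "struct_oper_id": oper, "label_seq_id": seq}
                for asym, oper, seq in residues
            ],
        },
    }


def ec_node(ec_number):
    """Attribute node matching polymer entities annotated with the given EC number."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": "rcsb_polymer_entity.rcsb_ec_lineage.id",
            "operator": "exact_match",
            "value": ec_number,
        },
    }


def motif_and_ec_query(entry_id, residues, ec_number, include_csm=False):
    # include_csm mirrors the "Include Computed Structure Models (CSM)" toggle
    content = ["experimental", "computational"] if include_csm else ["experimental"]
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",  # intersection of both criteria
            "nodes": [motif_node(entry_id, residues), ec_node(ec_number)],
        },
        "return_type": "assembly",
        "request_options": {"results_content_type": content},
    }


# Placeholder residues for illustration only; use the residues listed in the
# Catalytic Residues column of the entry's Annotations page.
query = motif_and_ec_query("1B73", [("A", "1", 10), ("A", "1", 100)], "5.1.1.3")
print(json.dumps(query, indent=2))
```

Building the query as plain dictionaries keeps each criterion reusable: the motif node alone corresponds to "Search M-CSA Motif", and the EC node alone to the EC number link.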

Limitations

Structure motif searching requires at least two and at most ten residues; smaller or larger motifs cannot be used for structure motif searching. Furthermore, motifs must be reasonably compact: no pair of residues may be more than 20 Å apart.
In rare cases, M-CSA annotations do not include any residue positions, making visualization and structure motif searching unavailable. To avoid misleading results, both visualization and structure motif searching are also disabled if the motif contains any unmodeled residues, which are known at the sequence level but lack atomic coordinates.
A descriptive warning is shown whenever visualization or structure motif searching is unavailable for any reason.
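The limits above can be expressed as a simple pre-check. This is a minimal sketch, not RCSB's implementation: it assumes one representative coordinate per residue (for example, the C-alpha atom; the text does not say which atoms the 20 Å limit is measured between) and uses `None` for an unmodeled residue. The function name `motif_search_problem` is hypothetical.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]

MIN_RESIDUES = 2
MAX_RESIDUES = 10
MAX_PAIR_DISTANCE = 20.0  # Å


def motif_search_problem(coords: Sequence[Optional[Coord]]) -> Optional[str]:
    """Return a warning message if the motif cannot be searched, else None.

    coords holds one representative position per motif residue (assumed, e.g. C-alpha);
    None marks an unmodeled residue that is known in sequence but lacks coordinates.
    """
    if not coords:
        return "no residue positions annotated"
    if any(c is None for c in coords):
        return "motif contains unmodeled residues"
    if not MIN_RESIDUES <= len(coords) <= MAX_RESIDUES:
        return f"motif has {len(coords)} residues; {MIN_RESIDUES}-{MAX_RESIDUES} required"
    # Every pair of residues must lie within the distance limit
    for a, b in combinations(coords, 2):
        distance = math.dist(a, b)
        if distance > MAX_PAIR_DISTANCE:
            return f"residue pair is {distance:.1f} Å apart (limit {MAX_PAIR_DISTANCE:.0f} Å)"
    return None


print(motif_search_problem([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]))   # prints: None
print(motif_search_problem([(0.0, 0.0, 0.0), (0.0, 0.0, 25.0)]))  # prints: residue pair is 25.0 Å apart (limit 20 Å)
```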

Please report any encountered broken links to [email protected]

Last updated: 2/29/2024