Crosslinking-MS_class

Crosslinking Mass Spec Analysis Exercises.

We will spend some time playing with some crosslinking datasets to get a feel for some different types of experiments and how well behaved data should look.

These examples make use of the Protein Prospector crosslinking search feature. Protein Prospector was the first crosslinking search engine to describe a mass modification strategy to linearize the crosslinking search space:
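The general idea behind a mass modification approach can be sketched in a few lines: each candidate peptide is treated as if it carried a single modification whose mass is whatever remains of the precursor, and that remainder must equal the linker plus a partner peptide. The toy code below shows only this mass bookkeeping. The peptide names, masses, tolerance, and the `find_pairs` function are all made up for illustration, and it is not a description of how Protein Prospector implements or scores the search (a real search ranks candidates by their fragment ion matches).

```python
from bisect import bisect_left, bisect_right

# Hypothetical monoisotopic masses (Da) for a toy peptide database.
PEPTIDES = {
    "PEPA": 1000.50,
    "PEPB": 1200.60,
    "PEPC": 1500.75,
    "PEPD": 1841.89,
}
LINKER_MASS = 158.0038  # mass added by an intact DSSO bridge (Da)


def find_pairs(precursor_mass, peptides, linker_mass, tol_da=0.02):
    """Return peptide pairs whose masses plus the linker match the precursor.

    Each peptide is visited once. Its "modification" mass is
    precursor - peptide, which must equal linker + partner peptide.
    """
    by_mass = sorted((mass, name) for name, mass in peptides.items())
    masses = [mass for mass, _ in by_mass]
    pairs = set()
    for name, mass in peptides.items():
        mod_mass = precursor_mass - mass        # modification carried by this peptide
        partner_mass = mod_mass - linker_mass   # mass the partner peptide must have
        lo = bisect_left(masses, partner_mass - tol_da)
        hi = bisect_right(masses, partner_mass + tol_da)
        for _, partner in by_mass[lo:hi]:
            pairs.add(tuple(sorted((name, partner))))
    return sorted(pairs)


# Toy precursor built from PEPA + PEPC + linker.
precursor = PEPTIDES["PEPA"] + PEPTIDES["PEPC"] + LINKER_MASS
print(find_pairs(precursor, PEPTIDES, LINKER_MASS))
```

The loop visits each peptide once and looks its partner up by mass, rather than enumerating every possible peptide pair, which is the sense in which the search space becomes linear.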

To save time, the datasets have already been searched using Protein Prospector. We will go over how to set the search parameters in class. We will then explore and classify the CLMS datasets using the Touchstone package, which works with Prospector data. The Touchstone package builds a scoring function (“SVM.score”) using various features of the spectral match. These include MS1-level features such as the mass accuracy (ppm) and charge state (z), as well as MS2-level features such as the Prospector score contributed by the lower scoring peptide (“Score Difference”). Additionally, some features that are not related to the quality of the spectral match are included. For instance, intra-protein crosslinks are scored differently than inter-protein crosslinks in most cases.

Touchstone builds the SVM score and attempts to classify the data at a desired False Discovery Rate (FDR). By default, intra-protein and inter-protein links are classified as separate pools. The user can choose to classify the dataset at different summarization levels:

CSMs - Crosslinked Spectral Matches. These are MS2 spectra that match to crosslinked peptides regardless of redundancy. Redundancy can be due to the same peptides in a different charge state, or with a different modification state (e.g., methionine oxidation). It also includes repeat matches of the same precursor to the same sequence (either in different MS fractions or in cases where the precursor was not excluded from being re-analyzed by the control software).
URPs - Unique Residue Pairs. The most relevant summarization level for most crosslinking experiments. Residue pairs are defined just by the position number of the adducted amino acid residue and the protein, without regard to charge state, peptide sequence, etc. The best scoring spectrum per URP is retained as an illustrative example.
PPIs - Protein-Protein Interactions. Mostly relevant in large scale studies of complex systems. PPI level summarization retains only the two proteins involved in the crosslink.
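The three summarization levels can be illustrated with a small sketch. The records, field names, and scores below are invented and are not Touchstone's column names. The FDR estimate uses the (TD - DD) / TT formula that is common for crosslinking target-decoy searches, which is an assumption here rather than a statement of how Touchstone computes it. The sketch also pools all links together, whereas Touchstone by default treats intra-protein and inter-protein links separately.

```python
from collections import namedtuple

# Hypothetical CSM record. "decoys" counts decoy peptides in the pair:
# 0 = target-target (TT), 1 = target-decoy (TD), 2 = decoy-decoy (DD).
CSM = namedtuple("CSM", "prot1 res1 prot2 res2 score decoys")

csms = [
    CSM("uL2", 45, "uL3", 120, 8.1, 0),
    CSM("uL2", 45, "uL3", 120, 6.4, 0),  # same residue pair, e.g. other charge state
    CSM("uL3", 120, "uL2", 45, 5.0, 0),  # same residue pair, reported in reverse order
    CSM("uL2", 60, "uL3", 200, 4.2, 0),
    CSM("uS4", 10, "uL3", 77, 1.3, 1),
    CSM("uS4", 12, "eS6", 30, 0.9, 0),
]


def summarize(csms, level):
    """Keep the best-scoring CSM per residue pair ("URP") or protein pair ("PPI")."""
    best = {}
    for c in csms:
        if level == "URP":
            ends = [(c.prot1, c.res1), (c.prot2, c.res2)]
        else:  # "PPI": ignore the residue positions
            ends = [(c.prot1,), (c.prot2,)]
        key = tuple(sorted(ends))  # order of the two ends does not matter
        if key not in best or c.score > best[key].score:
            best[key] = c
    return list(best.values())


def fdr(hits, threshold):
    """Estimate FDR as (TD - DD) / TT among hits at or above the threshold."""
    kept = [h for h in hits if h.score >= threshold]
    tt = sum(h.decoys == 0 for h in kept)
    td = sum(h.decoys == 1 for h in kept)
    dd = sum(h.decoys == 2 for h in kept)
    return max(td - dd, 0) / tt if tt else 0.0


for level_name, hits in [("CSM", csms),
                         ("URP", summarize(csms, "URP")),
                         ("PPI", summarize(csms, "PPI"))]:
    print(level_name, len(hits), round(fdr(hits, 0.0), 3))
```

Because correct matches tend to be redundant while random matches are not, collapsing to URPs or PPIs removes more targets than decoys, so the same score threshold usually corresponds to a higher FDR at the more summarized levels.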

Touchstone additionally allows you to pre-filter the data by mass accuracy, score difference, length of the peptides, or minimum number of backbone ion cleavages matched per peptide. This allows the user to adjust the stringency of the classification to account for common reasons for error, such as matches to very short peptides.
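A pre-filter of this kind amounts to a few simple cut-offs applied before classification. The sketch below is only an illustration: the dictionary keys, the default thresholds, and the function names are hypothetical and do not correspond to Touchstone's actual settings.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def passes_prefilter(hit, max_abs_ppm=10.0, min_score_diff=5.0,
                     min_length=4, min_ions=3):
    """Apply the four cut-offs; the shorter/weaker peptide of the pair must pass."""
    return (abs(hit["ppm"]) <= max_abs_ppm
            and hit["score_diff"] >= min_score_diff
            and min(hit["len_1"], hit["len_2"]) >= min_length
            and min(hit["ions_1"], hit["ions_2"]) >= min_ions)


# Invented example hits.
hits = [
    {"id": "a", "ppm": 1.2, "score_diff": 12.0,
     "len_1": 9, "len_2": 7, "ions_1": 8, "ions_2": 5},
    {"id": "b", "ppm": -14.0, "score_diff": 15.0,   # mass error too large
     "len_1": 10, "len_2": 8, "ions_1": 9, "ions_2": 6},
    {"id": "c", "ppm": 0.4, "score_diff": 6.0,      # second peptide too short
     "len_1": 11, "len_2": 3, "ions_1": 7, "ions_2": 3},
]

kept = [h["id"] for h in hits if passes_prefilter(h)]
print(kept)
```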

Touchstone has some nice visualizations, such as links to the spectral matches in Prospector, as well as links to xiNET, software developed by the Rappsilber group for visualization of crosslinked network maps. An optional module file can be provided that allows the user to assign proteins (or particular regions of proteins) to biologically relevant “modules”, as well as to a high-res structure file (pdb or cif) for assessing the agreement of the data with known structures.

Touchstone is currently limited in distribution, but will be more widely available soon. Touchstone hasn’t been optimized for scalability and our use in this class will undoubtedly tax the server’s ability to load and analyze multiple datasets simultaneously. I anticipate it will take a few minutes to load the datasets. Please be patient with it. If it seems that the server can’t handle the class load, we will try to split up into groups.

I’ve provided four datasets for us to have a look at:

dataset	name	organism	crosslinker	msStrategy	fractionation
1	ribosome	rabbit	DSSO	ms2.hcd	4 SEC Fractions
2	ribosome	rabbit	DSSO	ms2.cid-ms3.cid	4 SEC Fractions
3	translocon	human	DSS	ms2.ethcd	4 SEC Fractions
4	axon initial segment	rat	DSSO	ms2.hcd	4 SEC Fractions x 8 high pH C18 Fractions = 32 MS samples

Datasets #1-3 were acquired on an Orbitrap Lumos MS, while dataset #4 was acquired using an Orbitrap Exploris with a FAIMS source. Each MS acquisition was 3-4 hours long.

Datasets 1-2.

The 80S ribosome was produced using a rabbit reticulocyte cell-free expression system. 80S ribosomes were crosslinked with the cleavable reagent DSSO, which can cleave on either side of the sulfoxide. Each peptide that carries DSSO can therefore produce either a thiol-modified or an alkene-modified fragment. In Protein Prospector these are annotated with the symbols * and #. For details see the original publication: Kao et al., 2011.
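To make the cleavage products concrete, the sketch below computes the m/z of the two forms of a released peptide. The remnant masses are the standard monoisotopic values for the DSSO alkene (C3H2O) and thiol (C3H2OS) stubs. The peptide mass, charge, and function name are arbitrary, chosen only for illustration.

```python
PROTON = 1.007276  # Da

# Monoisotopic mass added to the linked residue by each DSSO remnant (Da).
DSSO_REMNANTS = {
    "alkene": 54.01056,  # C3H2O
    "thiol": 85.98264,   # C3H2OS
}


def dsso_fragment_mz(peptide_mass, charge):
    """m/z of a released peptide carrying each DSSO remnant."""
    return {
        name: (peptide_mass + delta + charge * PROTON) / charge
        for name, delta in DSSO_REMNANTS.items()
    }


# Hypothetical peptide of neutral mass 1500.75 Da observed at 2+.
mz = dsso_fragment_mz(1500.75, 2)
for name, value in mz.items():
    print(f"{name}: {value:.4f}")
print(f"doublet spacing: {mz['thiol'] - mz['alkene']:.4f}")
```

The two forms differ by the mass of one sulfur atom (about 31.972 Da, divided by the charge in m/z), so each peptide of a crosslinked pair can appear as a characteristic doublet in the MS2 spectrum.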

I often use this ribosomal system for method development and optimization of CLMS workflows. There is a high-res EM structure of the complex (pdb:6HCJ), which can be helpful in determining whether the crosslinks are assigned correctly or not.
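Checking a crosslink against a structure comes down to measuring the distance between the two linked residues and comparing it with what the crosslinker can plausibly span. A minimal sketch follows. The coordinates are invented, and the 35 Å Cα-Cα cut-off is an assumed, commonly quoted value for DSS/DSSO-type reagents, not a setting taken from Touchstone.

```python
import math

MAX_CA_DISTANCE = 35.0  # Angstroms; assumed cut-off, adjust to taste


def ca_distance(coord_a, coord_b):
    """Euclidean distance between two (x, y, z) coordinates in Angstroms."""
    return math.dist(coord_a, coord_b)  # requires Python 3.8+


def is_violation(coord_a, coord_b, max_distance=MAX_CA_DISTANCE):
    """True if the residue pair is farther apart than the crosslinker should span."""
    return ca_distance(coord_a, coord_b) > max_distance


# Invented C-alpha coordinates for two crosslinked lysines.
lys_a = (12.0, 40.5, -3.2)
lys_b = (25.0, 52.5, 1.8)
print(round(ca_distance(lys_a, lys_b), 1), is_violation(lys_a, lys_b))
```

A distance above the cut-off does not by itself prove the match is wrong, since flexible regions, alternative conformations, or links between copies of a complex can also produce long distances.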

I’ve included the 80S sample analyzed using both an MS2-level stepped-HCD acquisition cycle (dataset #1) and an ms2-CID-ms3-CID acquisition on the same Lumos instrument (dataset #2). Note that the MS3 capabilities in Touchstone are pretty basic, and I’ve included this data mostly so you can familiarize yourselves with what the linked spectra look like… the Crosslink Table has both the MS2 and MS3-level spectral annotations linked.

The module file for this exercise assigns the 80 or so ribosomal proteins to either the large (60S) or small (40S) subunit.

cryoEM structure of rabbit 80S ribosome

Dataset 3.

The translocon dataset is a published work: McGilvray et al, 2020. My collaborators at the University of Chicago are interested in the biogenesis of membrane proteins. They discovered a novel translocon complex that acts co-translationally to assemble multi-pass transmembrane proteins. Newly synthesized proteins are passed from the ribosome to the Sec61 complex in the endoplasmic reticulum (ER). Sec61 interacts with the novel translocon members (including TMCO1, TMEM147, Nicalin, NOMO, CCDC47) to fold the emerging protein across the lipid bilayer.

The collaborators had a cryo-electron microscopy (EM) reconstruction of the translocon embedded in a micelle, in which the ribosome was clearly identified. The CLMS data was useful in helping to map the interactions of the translocon accessory factors with the ribosome near the lipid phase. Additionally, the EM resolution was very poor on the lumenal side of the ER membrane, and we hoped the CLMS data would provide more information on that compartment.

Figure 2 - cryoEM + CLMS model of novel translocon

The sample was crosslinked with the membrane-permeable, non-cleavable reagent DSS, and the data was acquired using an ms2.EThcD dissociation strategy. For non-cleavable crosslinkers, electron transfer dissociation (ETD) with supplemental collisional activation (EThcD) provides better fragmentation of both crosslinked peptides than collisional activation alone (e.g., HCD). ETD dissociates the peptide backbone by a radical mechanism that cleaves each residue between the amide nitrogen and the alpha carbon to give c- and z-ions. Since this data was acquired with the hybrid EThcD method, we expect both c- and z-ions as well as the typical b- and y-ions (from cleavage at the peptide bond).
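The relationship between the four ion series is easy to see numerically. The sketch below computes singly charged b-, c-, y-, and z-ions for a plain, unmodified linear peptide (no crosslink), using standard monoisotopic residue masses. Reporting z-ions in their radical z• ("z+1") form is a convention assumed here, and the example sequence and function name are arbitrary.

```python
# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
AMMONIA = 17.026549
HYDROGEN = 1.007825
PROTON = 1.007276


def fragment_ions(sequence):
    """Singly charged b, c, y and z-dot ion m/z values for a linear peptide."""
    masses = [MONO[aa] for aa in sequence]
    ions = {"b": [], "c": [], "y": [], "z": []}
    for i in range(1, len(masses)):
        b = sum(masses[:i]) + PROTON            # N-terminal, peptide-bond cleavage
        y = sum(masses[-i:]) + WATER + PROTON   # C-terminal, peptide-bond cleavage
        ions["b"].append(b)
        ions["c"].append(b + AMMONIA)           # N-terminal, N-C(alpha) cleavage
        ions["y"].append(y)
        ions["z"].append(y - AMMONIA + HYDROGEN)  # C-terminal radical z-dot
    return ions


for series, values in fragment_ions("SAMPLEK").items():
    print(series, [round(v, 4) for v in values])
```

Each c-ion sits about 17.03 Da above the corresponding b-ion and each z•-ion about 16.02 Da below the corresponding y-ion, which is a quick way to recognize the extra series when looking at EThcD spectra.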

The module file for this dataset divides the protein components into 5 modules: 60S Ribosome, 40S Ribosome, Sec61 complex, “lumenal”, and “200220”. The lumenal components are proteins that we identified by mass spectrometry and that are annotated as having ER-lumen subcellular localization. “200220” are a few additional proteins that we discovered in the dataset and wanted to explore. The name reflects the date I added them to the analysis. Mapping interactions between lumenal components and either the ribosome or Sec61 was the main aim of this study.

Dataset 4.

The Axon Initial Segment (AIS) is a structure that defines the boundary between the somatodendritic part of the neuron and the axon. It is a tightly intertwined network of 1000s of different proteins and macromolecules that spans different cellular compartments: extracellular, membrane, cytosolic, and microtubular/cytoskeletal. In addition to maintaining the polarity of the neuron, the AIS is responsible for clustering a high concentration of voltage-gated ion channels, and action potentials are initiated in the AIS. Dysfunction of AIS-specific proteins is implicated in many neurological conditions such as bipolar disorder, epilepsy, and schizophrenia.

The aim of the study is to map the topology of the AIS. We are working with a preparation of AIS that is derived from primary rat hippocampal neurons. The neurons are treated with Triton X-100 detergent, which is supposed to remove non-AIS parts of the neuron while the densely networked AIS is retained. Since the degree of complexity in these samples is an order of magnitude higher than in the earlier samples, we perform more fractionation prior to mass spectrometry. For instance, instead of simply taking 4 size exclusion (SEC) fractions as in the translocon or ribosome studies with 100-200 proteins, we now take 4 SEC fractions and subject them to a second dimension of chromatography, in this case high pH reverse phase. In total, 32 fractions were collected and analyzed on an Orbitrap Exploris 480 instrument.

This dataset is included to illustrate some of the challenges involved in working with highly complex CLMS experiments while showing that the fundamental technique of matching MS spectra to crosslinked peptide sequences while controlling the FDR is the same as in the smaller scale studies.

The module file provided with this dataset does a rough classification based on subcellular compartment annotations pulled from UniProt.

Axon Initial Segments

In Class Exercise

You can access the Touchstone instance with the class datasets here: https://prospts.shinyapps.io/tstoneapp/

Please allow a minute or two for a dataset to load. Load the example data by clicking the “Browse” button under the “Search Compare Output” heading. To load the rRibo_DSSO_ms3CID data you will need to first select “ms3” before specifying the search compare file. When multiple users are trying to load datasets at the same time things might get sluggish, so it would be best to try to space this out a bit.

Once you’ve loaded a dataset, you can open the module file that is saved in the same directory as the data. In case there is confusion, the module file should have either “module” or “modFile” in the name and the search results are the other file in the project directory.

The goals of the exercise are to learn to recognize a well-classified dataset, to tell correctly assigned spectra from incorrectly assigned ones, and to get a better understanding of the power and limitations of CLMS experiments. My hope is that you will just play around with the datasets and see if you find anything interesting. Some suggested activities:

Open one of the ms2 datasets. Manually drag the SVM thresholds to the far left. Look at the graphs, the FDR, and the numbers of crosslinks. Try summarizing the data on URPs. What happens to the FDR? Summarize on PPIs; what happens? How do the indicator plots look as the SVM threshold is raised? Look at the mass accuracy plot, the euclidean distance output, and the protein-pair and module-pair dot plots.
Examine some of the spectra of high scoring and low scoring matches. Raise and lower the SVM threshold again and be sure to look at some bad hits. In the annotated MS-Product spectrum, look at the lists of product ion matches for both peptides.
Look at some spectra from decoy hits (in the “Decoy Hits” tab). To what extent can you tell that these hits are wrong?
Look at some spectra that correspond to very large distance measurements… eg, “violations” that aren’t consistent with the Ribosome high-res structure. Do all of these spectra look unreliable and do they represent random matches?
How are the spectra different between ms2-hcd, ms2-ethcd, and the ms2-cid-ms3-cid datasets?
Look at the list of crosslinks, or select a group of crosslinks that corresponds to a particularly interesting module-pair by clicking on the dot in the module-pair plot.

CSHL_class_exercise

Mike Trnka

2025-05-07
