.. _examples: examples ============ We have designed scab to be used primarily in interactive notebook-like programming environments like Jupyter. Although it may have a steeper learning curve than a GUI-based tool, we believe that the gains in flexibility and customizability are more than worth the tradeoff. The scab API is quite similar to that of scanpy_ [Wolf18]_. This is by design, as we are big fans of the scanpy API and are also striving minimize the learning curve for users already familiar with scanpy. Additionally, scab builds on the ``AnnData`` object at the core of scanpy to integrate BCR/TCR and antigen specificity data. Below are a few hypothetical use cases, with functional code examples. The `scab Github repository`_ includes interactive examples with sample datasets so that users can take scab for a more comprehensive test drive. example #1 ------------ Our first example is relatively simple. We're starting with two single cell libraries (cell hashes and B cell VDJ) generated from a set of multiplexed samples of enriched B cells on a single 10x Genomics Chromium Controller reaction. We have two primary outputs from CellRanger: 1) a counts matrix, which includes only cell hashes; and 2) assembled BCR contigs with associated summary annotations. With this dataset, we'd like to do the following: - read, annotate and integrate the input data (cell hashes and BCR sequences) - demultiplex the samples using cell hashes and rename the samples - filter out any cells without paired heavy/light chains - assign BCR clonal lineages - make a lineage donut plot for each sample, colored by VH gene use .. code-block:: python import scab # read, integrate and annotate the input data adata = scab.read_10x_mtx( mtx_path = '/path/to/filtered_bc_matrix', bcr_file = '/path/to/filtered_contigs.fasta', bcr_annot = '/path/to/filtered_summary.csv' ) # demultiplex the samples using cell hashes and rename the samples sample_names = { 'control1': 'CellHash1', 'control2': 'CellHash2', 'test1': 'CellHash3', 'test2': 'CellHash4' } adata = scab.tl.demultiplex(adata, rename=sample_names) # filter out any cells that don't contain a single BCR pair adata = adata[adata.obs.bcr_pairing == "single pair"] # assign BCR clonal lineages adata = scab.vdj.clonify(adata) # make a lineage donut plot for each sample, colored by VH gene use for sample in adata.obs.sample.unique(): a = adata[adata.obs.sample == sample] scab.pl.lineage_donut(a, hue='v_gene', chain='heavy') | | example #2 ------------ Next, we have a more complex set of libraries, generated from multiplexed peripheral blood mononuclear cell (PBMC) samples. The PBMCs were labeled with a panel of CITE-seq antibodies and we recovered BCR and TCR sequences, to produce the following CellRanger outputs: 1) a counts matrix, including GEX, cell hash and CITE-seq (feature barcode) UMI counts; 2) assembled BCR contigs with associated summary annotations; and 3) assembled TCR contigs with associated summary annotations. With this dataset, we'd like to: - read, annotate and integrate all of the input data - demultiplex the samples using cell hashes, and rename the samples using a dictionary mapping sample names to cell hash names - preprocess the GEX data, including leiden clustering and UMAP embedding - for each CITE-seq antibody, make a pair of plots comparing transcription and cell surface abundance - group TCR sequences into clonotypes - select cells expressing a clonally expanded TCR .. code-block:: python import scab # read, integrate and annotate the input data adata = scab.read_10x_mtx( mtx_path = '/path/to/filtered_bc_matrix', bcr_file = '/path/to/BCR/filtered_contigs.fasta', bcr_annot = '/path/to/BCR/filtered_summary.csv', tcr_file = '/path/to/TCR/filtered_contigs.fasta', tcr_annot = '/path/to/TCR/filtered_summary.csv' ) # demultiplex the samples using cell hashes and rename the samples sample_names = { 'donor123': 'CellHash1', 'donor456': 'CellHash2', 'donor789': 'CellHash3' } adata = adata.tl.demultiplex(adata, rename=sample_names) # preprocess the GEX data and compute the UMAP embedding adata = scab.pp.filter_and_normalize(adata) adata = scab.tl.umap(adata) # for each CITE-seq antibody, make a pair of plots comparing transcription and expression gene2citeseq = { 'gene_name1': 'citeseq_name1', ... 'gene_nameN': 'citeseq_nameN' } for gene, citeseq in gene2citeseq.items(): scab.pl.umap(adata, colors=[gene, citeseq]) # group TCR sequences into clonotypes adata = scab.vdj.group_clonotypes(adata) # select cells expressing a clonally expanded TCR expanded = adata[adata.obs.clonotype_size > 1] .. _scanpy: https://github.com/scverse/scanpy .. _abutils: https://github.com/briney/abutils .. _scab Github repository: htts://github.com/briney/scab