preprocessing: pp
Functions:
scab.pp.filter_and_normalize | Performs quality filtering and normalization of 10x Genomics count data.
scab.pp.scrublet | Predicts doublets using scrublet [Wolock19].
scab.pp.doubletdetection | Predicts doublets using doubletdetection [Gayoso20].
scab.pp.remove_doublets | Removes doublets.
- scab.pp.filter_and_normalize(adata: anndata.AnnData, make_var_names_unique: bool = True, min_genes: int = 200, min_cells: float | None = None, n_genes_by_counts: int = 2500, percent_mito: int | float = 10, percent_ig: int | float = 100, hvg_batch_key: str | None = None, ig_regex_pattern: str = 'IG[HKL][VDJ][1-9].+|TR[ABDG][VDJ][1-9]', regress_out_mt: bool = False, regress_out_ig: bool = False, target_sum: int | None = None, n_top_genes: int | None = None, normalization_flavor: str = 'cell_ranger', log: bool = True, scale_max_value: float | None = None, save_raw: bool = True, verbose: bool = True) -> anndata.AnnData
Performs quality filtering and normalization of 10x Genomics count data.
- Parameters:
adata (anndata.AnnData) – AnnData object containing gene count data.
make_var_names_unique (bool, default=True) – If True, adata.var_names_make_unique() will be called before filtering and normalization.
min_genes (int, default=200) – Minimum number of identified genes for a droplet to be considered a valid cell. Passed to sc.pp.filter_cells() as the min_genes parameter.
min_cells (int, optional) – Minimum number of cells in which a gene has been identified. Genes below this threshold will be filtered. If not provided, a dynamic threshold equal to 0.1% of the total number of cells in the dataset will be used.
n_genes_by_counts (int, default=2500) – Threshold for filtering cells based on the number of genes by counts.
percent_mito (int or float, default=10) – Threshold for filtering cells based on the percentage of mitochondrial genes.
hvg_batch_key (str, optional) – When processing an AnnData object containing multiple samples that may require integration and batch correction, hvg_batch_key will be passed to sc.pp.highly_variable_genes() to force separate identification of highly variable genes for each batch. If not provided, variable genes will be computed on the entire dataset.
ig_regex_pattern (str, default='IG[HKL][VDJ][1-9].+|TR[ABDG][VDJ][1-9]') – Regular expression pattern used to identify immunoglobulin genes. The default is designed to match all immunoglobulin germline gene segments (V, D and J). Constant region genes are not matched.
target_sum (int, optional) – Target read count for normalization, passed to sc.pp.normalize_total(). If not provided, the median count of all cells (pre-normalization) is used.
n_top_genes (int, optional) – The number of top highly variable genes to retain. If not provided, the default number of genes for the selected normalization flavor is used.
normalization_flavor (str, default='cell_ranger') – Options are 'cell_ranger', 'seurat' or 'seurat_v3'.
log (bool, default=True) – If True, counts will be log-plus-1 transformed.
scale_max_value (float, optional) – Value at which normalized count values will be clipped. Default is no clipping.
save_raw (bool, default=True) – If True, normalized and filtered data will be saved to adata.raw prior to scaling and regressing out mitochondrial/immunoglobulin genes.
verbose (bool, default=True) – If True, progress updates will be printed.
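Two of the defaults described above can be checked by hand. The following is a minimal sketch, not scab's implementation: the helper names dynamic_min_cells and is_ig_gene are hypothetical, leaving the 0.1% threshold unrounded is an assumption, and applying the pattern with re.match (anchored at the start of the gene name) is an assumption about how the pattern is used.

```python
import re

# Default pattern copied from the ig_regex_pattern parameter above.
IG_REGEX_PATTERN = r"IG[HKL][VDJ][1-9].+|TR[ABDG][VDJ][1-9]"


def dynamic_min_cells(n_cells: int) -> float:
    """Return 0.1% of the number of cells (hypothetical helper, no rounding)."""
    return n_cells / 1000


def is_ig_gene(gene_name: str, pattern: str = IG_REGEX_PATTERN) -> bool:
    """Return True if the pattern matches at the start of the gene name (assumed usage)."""
    return re.match(pattern, gene_name) is not None


genes = ["IGHV1-69", "IGKV3-20", "TRBV7-9", "IGHM", "IGKC", "CD19"]
matched = [gene for gene in genes if is_ig_gene(gene)]
print(matched)
print(dynamic_min_cells(10_000))
```

In this sketch the constant region genes IGHM and IGKC are not matched, in line with the parameter description. The second alternative of the default pattern also matches T cell receptor gene segments such as TRBV7-9.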
- scab.pp.scrublet(adata: anndata.AnnData, verbose: bool = True) -> anndata.AnnData
Predicts doublets using scrublet [Wolock19].
See also
Samuel L. Wolock, Romain Lopez, Allon M. Klein. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Systems 2019. https://doi.org/10.1016/j.cels.2018.11.005
- Parameters:
adata (anndata.AnnData) – AnnData object containing gene count data.
verbose (bool, default=True) – If True, progress updates will be printed.
- Return type:
Returns an updated adata object with doublet predictions found at adata.obs.is_doublet and doublet scores at adata.obs.doublet_score.
- scab.pp.doubletdetection(adata: anndata.AnnData, n_iters: int = 25, use_phenograph: bool = False, standard_scaling: bool = True, p_thresh: float = 1e-16, voter_thresh: float = 0.5, verbose: bool = False) -> anndata.AnnData
Predicts doublets using doubletdetection [Gayoso20].
See also
Adam Gayoso, Jonathan Shor, Ambrose J Carr, Roshan Sharma, Dana Pe’er. DoubletDetection (Version v3.0). Zenodo 2020. http://doi.org/10.5281/zenodo.2678041
- Parameters:
adata (anndata.AnnData) – AnnData object containing gene count data.
n_iters (int, default=25) – Iterations of doubletdetection to perform.
use_phenograph (bool, default=False) – Passed directly to doubletdetection.BoostClassifier().
standard_scaling (bool, default=True) – Passed directly to doubletdetection.BoostClassifier().
p_thresh (float, default=1e-16) – P-value threshold for doublet classification.
voter_thresh (float, default=0.5) – Voter threshold, as a fraction of all voters.
verbose (bool, default=False) – If True, progress updates will be printed.
- Return type:
Returns an updated adata object with doublet predictions found at adata.obs.is_doublet and doublet scores at adata.obs.doublet_score. Note that adata.obs.is_doublet values are 0.0 and 1.0, not True and False. This is the default output of doubletdetection and is useful for plotting doublets using scanpy.pl.umap(), which cannot handle boolean color values.
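How p_thresh and voter_thresh interact can be sketched in plain Python. This is a simplified illustration based on the parameter descriptions above, not the doubletdetection code: the function call_doublets and the input layout (one list of per-iteration p-values for each cell) are hypothetical. It assumes that a cell is called a doublet when the fraction of iterations with a p-value at or below p_thresh reaches voter_thresh, and it reports the result as 0.0 or 1.0 to mirror the float flags described above.

```python
from typing import List


def call_doublets(
    p_values_per_cell: List[List[float]],
    p_thresh: float = 1e-16,
    voter_thresh: float = 0.5,
) -> List[float]:
    """Return 1.0 for cells called doublets and 0.0 otherwise (hypothetical helper)."""
    labels = []
    for p_values in p_values_per_cell:
        # Each iteration "votes" for a doublet when its p-value is at or below p_thresh.
        votes = sum(1 for p in p_values if p <= p_thresh)
        fraction = votes / len(p_values)
        labels.append(1.0 if fraction >= voter_thresh else 0.0)
    return labels


p_values = [
    [1e-20, 1e-18, 1e-3, 1e-30],  # 3 of 4 iterations vote doublet
    [1e-2, 1e-20, 0.5, 0.1],      # 1 of 4 iterations votes doublet
]
is_doublet = call_doublets(p_values)

# Float flags convert to a boolean mask with a simple comparison.
doublet_mask = [flag == 1.0 for flag in is_doublet]
print(is_doublet, doublet_mask)
```

When a boolean mask is needed downstream, comparing the float flags to 1.0 as in the last step avoids relying on implicit truthiness.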
- scab.pp.remove_doublets(adata: anndata.AnnData, doublet_identification_method: str | None = None, verbose: bool = True) -> anndata.AnnData
Removes doublets. If not already performed, doublet identification is performed using either doubletdetection or scrublet.
- Parameters:
adata (anndata.AnnData) – AnnData object containing gene count data.
doublet_identification_method (str, default='doubletdetection') – Method for identifying doublets. Only used if adata.obs.is_doublet does not already exist. Options are 'doubletdetection' and 'scrublet'.
verbose (bool, default=True) – If True, progress updates will be printed.
- Return type:
An updated adata object that does not contain observations that were identified as doublets.
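The control flow described for remove_doublets (identify doublets only when adata.obs.is_doublet is missing, then drop the flagged observations) can be sketched without any dependencies. Everything in this example is a stand-in rather than scab's code: a plain dictionary of columns plays the role of adata.obs, and remove_doublets_sketch and its identify callback are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Obs = Dict[str, list]  # column name -> per-cell values, standing in for adata.obs


def remove_doublets_sketch(
    obs: Obs,
    identify: Optional[Callable[[Obs], List[float]]] = None,
) -> Obs:
    """Drop cells flagged as doublets, identifying them first if needed."""
    if "is_doublet" not in obs:
        # Doublet identification only runs when no prior call exists.
        if identify is None:
            raise ValueError("no is_doublet column and no identification method given")
        obs = {**obs, "is_doublet": identify(obs)}
    # Flags may be floats (0.0/1.0) or booleans; both work with bool().
    keep = [not bool(flag) for flag in obs["is_doublet"]]
    return {
        column: [value for value, k in zip(values, keep) if k]
        for column, values in obs.items()
    }


obs = {"barcode": ["AAAC", "AAAG", "AAAT"], "is_doublet": [0.0, 1.0, 0.0]}
filtered = remove_doublets_sketch(obs)
print(filtered["barcode"])
```

Because the flags are passed through bool(), the same filter works for the 0.0/1.0 values that doubletdetection produces and for boolean flags.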