Package bedshift Documentation

Class Bedshift

The bedshift object with methods to perturb regions

def __init__(self, bedfile_path, chrom_sizes=None, delimiter='\t')

Read a BED file into a pandas DataFrame

Parameters:

  • bedfile_path (str): the path to the BED file
  • chrom_sizes (str): the path to the chrom.sizes file (optional, defaults to None)
  • delimiter (str): the delimiter used in the BED file

def add(self, addrate, addmean, addstdev, valid_bed=None, delimiter='\t')

Add regions

Parameters:

  • addrate (float): the rate to add regions
  • addmean (float): the mean length of added regions
  • addstdev (float): the standard deviation of the length of added regions
  • valid_bed (str): the file with valid regions where new regions can be added
  • delimiter (str): the delimiter used in valid_bed

Returns:

  • int: the number of regions added
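As a rough illustration of how addrate, addmean, and addstdev interact, the sketch below draws the lengths of new regions from a normal distribution. It is not bedshift's implementation: the helper add_random_regions, the (chrom, start, end) tuple layout, and the chrom_lens dictionary are assumptions made for this example only.

```python
import random

def add_random_regions(regions, addrate, addmean, addstdev, chrom_lens, seed=None):
    """Hypothetical helper (not part of bedshift): return (new_regions, n_added)."""
    rng = random.Random(seed)
    # The rate is taken as a proportion of the existing number of regions.
    n_add = int(len(regions) * addrate)
    chroms = list(chrom_lens)
    added = []
    for _ in range(n_add):
        chrom = rng.choice(chroms)
        # Draw a length from N(addmean, addstdev); force at least 1 bp and
        # never exceed the chromosome length.
        length = max(1, int(rng.gauss(addmean, addstdev)))
        length = min(length, chrom_lens[chrom])
        start = rng.randint(0, chrom_lens[chrom] - length)
        added.append((chrom, start, start + length))
    return regions + added, n_add

regions = [("chr1", 100, 400), ("chr1", 900, 1200),
           ("chr2", 50, 350), ("chr2", 700, 1000)]
chrom_lens = {"chr1": 10_000, "chr2": 8_000}  # made-up chromosome sizes
new_regions, n_added = add_random_regions(regions, 0.5, 320.0, 30.0, chrom_lens, seed=1)
```

With four input regions and a rate of 0.5, two regions are added. Restricting placement to a file of valid regions (the valid_bed parameter) is not covered by this sketch.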
def add_from_file(self, fp, addrate, delimiter='\t')

Add regions from another bedfile to this perturbed bedfile

Parameters:

  • fp (str): the filepath to the other bedfile
  • addrate (float): the rate to add regions
  • delimiter (str): the delimiter used in fp

Returns:

  • int: the number of regions added

def all_perturbations(self, addrate=0.0, addmean=320.0, addstdev=30.0, addfile=None, valid_regions=None, shiftrate=0.0, shiftmean=0.0, shiftstdev=150.0, shiftfile=None, cutrate=0.0, mergerate=0.0, droprate=0.0, dropfile=None, yaml=None, seed=None)

Perform all five perturbations in the order of shift, add, cut, merge, drop.

Parameters:

  • addrate (float): the rate (as a proportion of the total number of regions) to add regions
  • addmean (float): the mean length of added regions
  • addstdev (float): the standard deviation of the length of added regions
  • addfile (str): the file containing regions to be added
  • valid_regions (str): the file containing regions where new regions can be added
  • shiftrate (float): the rate to shift regions (both the start and end are shifted by the same amount)
  • shiftmean (float): the mean shift distance
  • shiftstdev (float): the standard deviation of the shift distance
  • shiftfile (str): the file containing regions to be shifted
  • cutrate (float): the rate to cut regions into two separate regions
  • mergerate (float): the rate to merge two regions into one
  • droprate (float): the rate to drop/remove regions
  • dropfile (str): the file containing regions to be dropped
  • yaml (str): the path to the YAML config file
  • seed (int): a seed for allowing reproducible perturbations

Returns:

  • int: the number of total regions perturbed
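A minimal usage sketch follows, built only from the signatures documented on this page. The wrapper perturb_bed, the rates chosen, and the file names in the trailing comment are hypothetical. The bedshift package must be installed for the function to run.

```python
def perturb_bed(bed_path, chrom_sizes_path, out_path, seed=42):
    """Hypothetical wrapper: perturb a BED file and write the result."""
    import bedshift  # imported lazily so defining this function has no dependency

    bedshifter = bedshift.Bedshift(bed_path, chrom_sizes=chrom_sizes_path)
    # Perturbations are applied in the order shift, add, cut, merge, drop.
    n_perturbed = bedshifter.all_perturbations(
        addrate=0.1, addmean=320.0, addstdev=30.0,
        shiftrate=0.2, shiftmean=0.0, shiftstdev=150.0,
        cutrate=0.05, mergerate=0.05, droprate=0.1,
        seed=seed,  # fixing the seed makes the run reproducible
    )
    bedshifter.to_bed(out_path)
    return n_perturbed

# Example call with placeholder file names:
# perturb_bed("example.bed", "hg38.chrom.sizes", "example_perturbed.bed")
```

Passing the same seed on a second run should reproduce the same perturbed output, which is the purpose of the seed parameter described above.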
def cut(self, cutrate)

Cut regions to create two new regions

Parameters:

  • cutrate (float): the rate to cut regions into two separate regions

Returns:

  • int: the number of regions cut
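The idea of cutting can be sketched as follows: a proportion of regions is chosen at random, and each is split at a random interior point into two adjacent regions. The helper cut_regions and the tuple layout are assumptions for illustration, not bedshift's own code.

```python
import random

def cut_regions(regions, cutrate, seed=None):
    """Hypothetical helper (not part of bedshift): return (new_regions, n_cut)."""
    rng = random.Random(seed)
    # Only regions at least 2 bp long can be split into two non-empty parts.
    candidates = [i for i, (_, s, e) in enumerate(regions) if e - s >= 2]
    n_cut = min(int(len(regions) * cutrate), len(candidates))
    chosen = set(rng.sample(candidates, n_cut))
    result = []
    for i, (chrom, start, end) in enumerate(regions):
        if i in chosen:
            point = rng.randint(start + 1, end - 1)  # strictly inside the region
            result.extend([(chrom, start, point), (chrom, point, end)])
        else:
            result.append((chrom, start, end))
    return result, n_cut

regions = [("chr1", 100, 400), ("chr1", 900, 1200),
           ("chr2", 50, 350), ("chr2", 700, 1000)]
cut, n_cut = cut_regions(regions, 0.5, seed=1)
```

Each cut replaces one region with two, so the region count grows by n_cut while the total number of covered bases stays the same.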
def drop(self, droprate)

Drop regions

Parameters:

  • droprate (float): the rate to drop/remove regions

Returns:

  • int: the number of rows dropped
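Dropping can be pictured as removing a random sample of rows. The sketch below assumes the rate is a proportion of the current number of regions. The helper drop_regions is hypothetical and is not taken from bedshift.

```python
import random

def drop_regions(regions, droprate, seed=None):
    """Hypothetical helper (not part of bedshift): return (kept_regions, n_dropped)."""
    rng = random.Random(seed)
    n_drop = int(len(regions) * droprate)
    # Sample row indices without replacement, then keep everything else.
    dropped = set(rng.sample(range(len(regions)), n_drop))
    kept = [r for i, r in enumerate(regions) if i not in dropped]
    return kept, n_drop

regions = [("chr1", 100, 400), ("chr1", 900, 1200),
           ("chr2", 50, 350), ("chr2", 700, 1000)]
kept, n_dropped = drop_regions(regions, 0.5, seed=1)
```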
def drop_from_file(self, fp, droprate, delimiter='\t')

Drop regions that overlap between the reference bedfile and the provided bedfile.

Parameters:

  • fp (str): the filepath to the other bedfile containing regions to be dropped
  • droprate (float): the rate to drop regions
  • delimiter (str): the delimiter used in fp

Returns:

  • int: the number of regions dropped
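The file-based variants depend on finding which regions overlap those in a second file. A minimal sketch of that overlap test is shown below, assuming half-open (chrom, start, end) intervals. The helpers overlaps and overlapping_indices are hypothetical, and how bedshift applies the rate to the overlapping set is not shown here.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def overlapping_indices(regions, other):
    """Indices of regions that overlap any region in `other` (quadratic, for clarity)."""
    return [i for i, r in enumerate(regions) if any(overlaps(r, o) for o in other)]

regions = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
other = [("chr1", 150, 160), ("chr2", 200, 300)]
candidates = overlapping_indices(regions, other)  # [0]
```

Only the first region qualifies: the chr2 intervals merely touch at position 200, which does not count as an overlap under half-open coordinates.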
def merge(self, mergerate)

Merge two regions into one new region

Parameters:

  • mergerate (float): the rate to merge two regions into one

Returns:

  • int: the number of regions merged
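One way to picture merging is to join a region with its next neighbor on the same chromosome. The sketch below makes that assumption explicitly; bedshift may select merge partners differently. The helper merge_regions is hypothetical, and it counts merge operations rather than individual regions.

```python
import random

def merge_regions(regions, mergerate, seed=None):
    """Hypothetical helper (not part of bedshift): return (new_regions, n_merges)."""
    rng = random.Random(seed)
    regions = sorted(regions)
    # A region can be merged with its successor only if both share a chromosome.
    candidates = [i for i in range(len(regions) - 1)
                  if regions[i][0] == regions[i + 1][0]]
    rng.shuffle(candidates)
    n_target = int(len(regions) * mergerate)
    used, pairs = set(), []
    for i in candidates:
        if len(pairs) == n_target:
            break
        if i in used or i + 1 in used:
            continue  # each region takes part in at most one merge
        used.update((i, i + 1))
        pairs.append(i)
    merged = []
    for i, (chrom, start, end) in enumerate(regions):
        if i in pairs:
            # Span from the first region's start to the furthest end of the pair.
            merged.append((chrom, start, max(end, regions[i + 1][2])))
        elif i not in used:
            merged.append((chrom, start, end))
    return sorted(merged), len(pairs)

regions = [("chr1", 100, 200), ("chr1", 300, 400),
           ("chr2", 100, 200), ("chr2", 500, 600)]
merged, n_merges = merge_regions(regions, 0.5, seed=1)
```

In this example both same-chromosome pairs are merged, leaving one region per chromosome.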
def pick_random_chroms(self, n)

Utility function to pick random chromosomes

Parameters:

  • n (int): the number of random chromosomes to pick

Returns:

  • str, float: chrom_str, chrom_len, the chromosome name and its length

def read_bed(self, bedfile_path, delimiter='\t')

Read a BED file into a pandas DataFrame

Parameters:

  • bedfile_path (str): the path to the BED file
  • delimiter (str): the delimiter used in the BED file

def reset_bed(self)

Reset the stored bedfile to the state before perturbations

def set_seed(self, seednum)

def shift(self, shiftrate, shiftmean, shiftstdev, shift_rows=[])

Shift regions

Parameters:

  • shiftrate (float): the rate to shift regions (both the start and end are shifted by the same amount)
  • shiftmean (float): the mean shift distance
  • shiftstdev (float): the standard deviation of the shift distance

Returns:

  • int: the number of regions shifted
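The shift parameters can be illustrated with a small sketch: a proportion of regions is selected, and the start and end of each are moved by the same normally distributed offset, so the region length is preserved. The helper shift_regions, the chrom_lens dictionary, and the clamping at chromosome boundaries are assumptions of this example. bedshift may handle out-of-bounds shifts differently.

```python
import random

def shift_regions(regions, shiftrate, shiftmean, shiftstdev, chrom_lens, seed=None):
    """Hypothetical helper (not part of bedshift): return (new_regions, n_shifted)."""
    rng = random.Random(seed)
    n_shift = int(len(regions) * shiftrate)
    chosen = set(rng.sample(range(len(regions)), n_shift))
    result = []
    for i, (chrom, start, end) in enumerate(regions):
        if i in chosen:
            offset = int(rng.gauss(shiftmean, shiftstdev))
            # Clamp the offset so the region keeps its length and stays
            # within [0, chromosome length].
            offset = max(-start, min(offset, chrom_lens[chrom] - end))
            start, end = start + offset, end + offset
        result.append((chrom, start, end))
    return result, n_shift

regions = [("chr1", 100, 400), ("chr1", 900, 1200),
           ("chr2", 50, 350), ("chr2", 700, 1000)]
chrom_lens = {"chr1": 10_000, "chr2": 8_000}  # made-up chromosome sizes
shifted, n_shifted = shift_regions(regions, 0.5, 0.0, 150.0, chrom_lens, seed=1)
```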
def shift_from_file(self, fp, shiftrate, shiftmean, shiftstdev, delimiter='\t')

Shift regions that overlap the specified file's regions

Parameters:

  • fp (str): the file on which to find overlaps
  • shiftrate (float): the rate to shift regions (both the start and end are shifted by the same amount)
  • shiftmean (float): the mean shift distance
  • shiftstdev (float): the standard deviation of the shift distance
  • delimiter (str): the delimiter used in fp

Returns:

  • int: the number of regions shifted

def to_bed(self, outfile_name)

Write the pandas DataFrame back to a file in BED format

Parameters:

  • outfile_name (str): The name of the output BED file
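For readers unfamiliar with the format, the sketch below round-trips three-column BED rows (chromosome, start, end) through tab-delimited text. It uses the standard-library csv module on an in-memory buffer instead of pandas and a file on disk, and the helpers write_bed_rows and read_bed_rows are hypothetical names, not bedshift methods.

```python
import csv
import io

def write_bed_rows(regions, handle, delimiter="\t"):
    """Write (chrom, start, end) tuples as delimited text, one region per line."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerows(regions)

def read_bed_rows(handle, delimiter="\t"):
    """Read three-column BED rows back into (chrom, start, end) tuples."""
    return [(chrom, int(start), int(end))
            for chrom, start, end in csv.reader(handle, delimiter=delimiter)]

regions = [("chr1", 100, 400), ("chr2", 50, 350)]
buffer = io.StringIO()
write_bed_rows(regions, buffer)
buffer.seek(0)
round_tripped = read_bed_rows(buffer)
```

BED files with additional columns would need a reader that keeps the extra fields. This sketch handles only the first three.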

Version Information: bedshift v1.1.1, generated by lucidoc v0.4.2