Package bedshift Documentation
Class Bedshift
The bedshift object with methods to perturb regions
def __init__(self, bedfile_path, chrom_sizes=None, delimiter='\t')
Read in a .bed file to pandas DataFrame format
Parameters:
bedfile_path (str): the path to the BED file
chrom_sizes (str): the path to the chrom.sizes file
delimiter (str): the delimiter used in the BED file
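To illustrate what reading a delimited BED file into tabular form involves, here is a minimal sketch using only the standard library. It is not bedshift's implementation (which loads the file into a pandas DataFrame); the `parse_bed` helper and the sample data are hypothetical.

```python
import csv
import io

def parse_bed(handle, delimiter="\t"):
    """Parse BED lines into (chrom, start, end) tuples; hypothetical helper."""
    regions = []
    for row in csv.reader(handle, delimiter=delimiter):
        # Skip blank lines and common BED header lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        regions.append((row[0], int(row[1]), int(row[2])))
    return regions

# In-memory stand-in for a small tab-delimited BED file.
sample = io.StringIO("chr1\t100\t200\nchr1\t500\t650\nchr2\t50\t90\n")
regions = parse_bed(sample)
```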
def add(self, addrate, addmean, addstdev, valid_bed=None, delimiter='\t')
Add regions
Parameters:
addrate (float): the rate to add regions
addmean (float): the mean length of added regions
addstdev (float): the standard deviation of the length of added regions
valid_bed (str): the file with valid regions where new regions can be added
delimiter (str): the delimiter used in valid_bed
Returns:
int: the number of regions added
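The role of the three parameters can be sketched as follows. This is an illustration under stated assumptions, not bedshift's code: the `add_regions` helper is hypothetical, the rate is treated as a fraction of the current region count, lengths are drawn from a normal distribution, and start positions are chosen uniformly within a chromosome.

```python
import random

def add_regions(regions, chrom_sizes, addrate, addmean, addstdev, rng):
    """Append random regions in place and return how many were added."""
    # Assumption: the rate is a proportion of the current number of regions.
    num_add = int(len(regions) * addrate)
    chroms = list(chrom_sizes)
    for _ in range(num_add):
        chrom = rng.choice(chroms)
        # Region length drawn from a normal distribution, at least 1 bp.
        length = max(1, int(rng.gauss(addmean, addstdev)))
        start = rng.randint(0, max(0, chrom_sizes[chrom] - length))
        end = min(start + length, chrom_sizes[chrom])
        regions.append((chrom, start, end))
    return num_add

rng = random.Random(42)
sizes = {"chr1": 10_000, "chr2": 8_000}
regions = [("chr1", 100, 200), ("chr1", 500, 650),
           ("chr2", 50, 90), ("chr2", 300, 400)]
added = add_regions(regions, sizes, 0.5, 320.0, 30.0, rng)
```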
def add_from_file(self, fp, addrate, delimiter='\t')
Add regions from another bedfile to this perturbed bedfile
Parameters:
fp (str): the filepath to the other bedfile
addrate (float): the rate to add regions
delimiter (str): the delimiter used in fp
Returns:
int: the number of regions added
def all_perturbations(self, addrate=0.0, addmean=320.0, addstdev=30.0, addfile=None, valid_regions=None, shiftrate=0.0, shiftmean=0.0, shiftstdev=150.0, shiftfile=None, cutrate=0.0, mergerate=0.0, droprate=0.0, dropfile=None, yaml=None, seed=None)
Perform all five perturbations in the order of shift, add, cut, merge, drop.
Parameters:
addrate (float): the rate (as a proportion of the total number of regions) to add regions
addmean (float): the mean length of added regions
addstdev (float): the standard deviation of the length of added regions
addfile (str): the file containing regions to be added
valid_regions (str): the file containing regions where new regions can be added
shiftrate (float): the rate to shift regions (both the start and end are shifted by the same amount)
shiftmean (float): the mean shift distance
shiftstdev (float): the standard deviation of the shift distance
shiftfile (str): the file containing regions to be shifted
cutrate (float): the rate to cut regions into two separate regions
mergerate (float): the rate to merge two regions into one
droprate (float): the rate to drop/remove regions
dropfile (str): the file containing regions to be dropped
yaml (str): the yaml_config filepath
bedshifter (bedshift.Bedshift): Bedshift instance
seed (int): a seed for allowing reproducible perturbations
Returns:
int: the total number of regions perturbed
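The effect of the seed parameter can be illustrated with a small standard-library sketch: when every random draw comes from one seeded generator, repeating the run reproduces the same perturbed regions. The `perturb` helper is hypothetical and covers only a shift followed by a drop; it selects regions with a per-region coin flip, which is an assumption and may differ from how bedshift applies its rates.

```python
import random

def perturb(regions, shiftrate, droprate, seed=None):
    """Shift, then drop, drawing all randomness from one seeded generator."""
    rng = random.Random(seed)
    shifted = []
    for chrom, start, end in regions:
        if rng.random() < shiftrate:
            # Same offset for start and end, so the width is preserved.
            delta = int(rng.gauss(0.0, 150.0))
            start, end = max(0, start + delta), max(1, end + delta)
        shifted.append((chrom, start, end))
    # Drop step runs after the shift step.
    return [r for r in shifted if rng.random() >= droprate]

regions = [("chr1", 1000 * i, 1000 * i + 300) for i in range(1, 51)]
run_a = perturb(regions, 0.5, 0.2, seed=7)
run_b = perturb(regions, 0.5, 0.2, seed=7)
```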
def cut(self, cutrate)
Cut regions to create two new regions
Parameters:
cutrate (float): the rate to cut regions into two separate regions
Returns:
int: the number of regions cut
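What cutting a single region means can be sketched like this. The `cut_region` helper is hypothetical, and choosing the cut point uniformly inside the region is an assumption; bedshift may pick the cut point differently.

```python
import random

def cut_region(region, rng):
    """Split one region at a random interior point into two regions."""
    chrom, start, end = region
    if end - start < 2:
        return [region]  # too short to split into two non-empty parts
    point = rng.randint(start + 1, end - 1)
    return [(chrom, start, point), (chrom, point, end)]

rng = random.Random(0)
left, right = cut_region(("chr1", 100, 200), rng)
```

The two pieces share the cut point as a boundary, so together they cover exactly the original interval.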
def drop(self, droprate)
Drop regions
Parameters:
droprate (float): the rate to drop/remove regions
Returns:
int: the number of rows dropped
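A minimal sketch of dropping a fraction of regions at random, assuming the rate is a proportion of the current region count. The `drop_regions` helper is hypothetical and is not taken from bedshift.

```python
import random

def drop_regions(regions, droprate, rng):
    """Return the kept regions and the number of regions dropped."""
    num_drop = int(len(regions) * droprate)
    # Sample distinct row indices to remove.
    drop_idx = set(rng.sample(range(len(regions)), num_drop))
    kept = [r for i, r in enumerate(regions) if i not in drop_idx]
    return kept, num_drop

rng = random.Random(1)
regions = [("chr1", 100 * i, 100 * i + 50) for i in range(10)]
kept, dropped = drop_regions(regions, 0.3, rng)
```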
def drop_from_file(self, fp, droprate, delimiter='\t')
Drop regions that overlap between the reference bedfile and the provided bedfile.
Parameters:
fp (str): the filepath to the other bedfile containing regions to be dropped
droprate (float): the rate to drop regions
delimiter (str): the delimiter used in fp
Returns:
int: the number of regions dropped
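The overlap test that a file-based drop relies on can be sketched as follows, using the half-open coordinates of the BED format. The `overlaps` and `overlapping` helpers are hypothetical; this brute-force comparison is for illustration and is not how bedshift computes overlaps.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) regions share at least one base."""
    # BED intervals are half-open, so touching ends do not overlap.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def overlapping(regions, other):
    """Regions that overlap at least one region in `other`."""
    return [r for r in regions if any(overlaps(r, o) for o in other)]

regions = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 90)]
other = [("chr1", 150, 160), ("chr2", 90, 120)]
candidates = overlapping(regions, other)
```

Only the candidates found this way would be eligible for dropping at the given rate.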
def merge(self, mergerate)
Merge two regions into one new region
Parameters:
mergerate (float): the rate to merge two regions into one
Returns:
int: the number of regions merged
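One plausible reading of merging two regions into one is sketched below: the new region spans from the smaller start to the larger end. The `merge_pair` helper and this spanning rule are assumptions for illustration; the documentation does not state how bedshift picks the pair or builds the merged region.

```python
def merge_pair(a, b):
    """Merge two regions on the same chromosome into one spanning both."""
    if a[0] != b[0]:
        raise ValueError("regions must be on the same chromosome")
    return (a[0], min(a[1], b[1]), max(a[2], b[2]))

merged = merge_pair(("chr1", 100, 200), ("chr1", 500, 650))
```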
def pick_random_chroms(self, n)
Utility function to pick random chromosomes
Parameters:
n (int): the number of random chromosomes to pick
Returns:
str, float: chrom_str, chrom_len, the chromosome number and length
def read_bed(self, bedfile_path, delimiter='\t')
Read a BED file into pandas dataframe
Parameters:
bedfile_path (str): The path to the BED file
def reset_bed(self)
Reset the stored bedfile to the state before perturbations
def set_seed(self, seednum)
def shift(self, shiftrate, shiftmean, shiftstdev, shift_rows=[])
Shift regions
Parameters:
shiftrate (float): the rate to shift regions (both the start and end are shifted by the same amount)
shiftmean (float): the mean shift distance
shiftstdev (float): the standard deviation of the shift distance
Returns:
int: the number of regions shifted
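Shifting the start and end by the same amount preserves the region width, as this minimal sketch shows. The `shift_region` helper is hypothetical, and clamping the offset so the region stays inside the chromosome is an assumption; bedshift may handle out-of-bounds shifts differently.

```python
import random

def shift_region(region, shiftmean, shiftstdev, chrom_len, rng):
    """Shift start and end by one normally distributed offset."""
    chrom, start, end = region
    delta = int(rng.gauss(shiftmean, shiftstdev))
    # Assumption: clamp the offset so the region stays on the chromosome.
    delta = max(-start, min(delta, chrom_len - end))
    return (chrom, start + delta, end + delta)

rng = random.Random(3)
original = ("chr1", 1000, 1300)
shifted = shift_region(original, 0.0, 150.0, 10_000, rng)
```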
def shift_from_file(self, fp, shiftrate, shiftmean, shiftstdev, delimiter='\t')
Shift regions that overlap the specified file's regions
Parameters:
fp (str): the file on which to find overlaps
shiftrate (float): the rate to shift regions (both the start and end are shifted by the same amount)
shiftmean (float): the mean shift distance
shiftstdev (float): the standard deviation of the shift distance
delimiter (str): the delimiter used in fp
Returns:
int: the number of regions shifted
def to_bed(self, outfile_name)
Write a pandas dataframe back into BED file format
Parameters:
outfile_name (str): The name of the output BED file
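For reference, serializing regions back to tab-delimited BED text can be sketched with the standard library. The `write_bed` helper is hypothetical and writes to an in-memory buffer rather than a named file; bedshift itself writes from its pandas DataFrame.

```python
import csv
import io

def write_bed(regions, handle, delimiter="\t"):
    """Write (chrom, start, end) tuples as delimited BED lines."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerows(regions)

buffer = io.StringIO()
write_bed([("chr1", 100, 200), ("chr2", 50, 90)], buffer)
bed_text = buffer.getvalue()
```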
Version Information: bedshift v1.1.1, generated by lucidoc v0.4.2