| Title: | Stratified Sampling and Labeling of Data in R |
|---|---|
| Description: | Provides functions for stratified sampling and assigning custom labels to data, ensuring randomness within groups. The package supports various sampling methods such as stratified, cluster, and systematic sampling. It allows users to apply transformations and customize the sampling process. This package can be useful for statistical analysis and data preparation tasks. |
| Authors: | Duan Yuanheng [aut, cre] |
| Maintainer: | Duan Yuanheng <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-12 06:24:06 UTC |
| Source: | https://github.com/cran/stratifiedyh |

`cluster_labels()` performs cluster sampling on a data frame. It randomly selects a percentage of the clusters defined by `group_col`, labels every row in a selected cluster "Yes", and labels all other rows "No".
```r
cluster_labels(df, group_col, yes_percentage)
```

| Argument | Description |
|---|---|
| `df` | A data frame containing the data. |
| `group_col` | A character string specifying the column to use for clustering. |
| `yes_percentage` | A numeric value between 0 and 100 indicating the percentage of clusters to label as "Yes". |

Returns a data frame with an additional column `Clustered_Yes_No` containing the cluster-sampled "Yes"/"No" labels.
```r
library(stratifiedyh)
# Cluster-sample the iris species, labeling 50% of the clusters "Yes"
result <- cluster_labels(iris, group_col = "Species", yes_percentage = 50)
```
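To illustrate what cluster sampling means here, the following is a minimal base-R sketch, not the package's actual implementation. The function name `cluster_labels_sketch` is hypothetical, and converting `yes_percentage` into a whole number of clusters with R's `round()` is an assumption. Whole clusters are drawn at random, and every row in a drawn cluster is labeled "Yes".

```r
# Hypothetical sketch of cluster sampling in base R; not the package's code.
cluster_labels_sketch <- function(df, group_col, yes_percentage) {
  stopifnot(group_col %in% names(df), yes_percentage >= 0, yes_percentage <= 100)
  # Each distinct value of group_col defines one cluster.
  groups <- as.character(df[[group_col]])
  clusters <- unique(groups)
  # Number of whole clusters to select (rounding rule is an assumption).
  n_yes <- round(length(clusters) * yes_percentage / 100)
  chosen <- clusters[sample.int(length(clusters), n_yes)]
  # Every row in a chosen cluster is "Yes"; all other rows are "No".
  df$Clustered_Yes_No <- ifelse(groups %in% chosen, "Yes", "No")
  df
}

set.seed(1)
res <- cluster_labels_sketch(iris, group_col = "Species", yes_percentage = 50)
table(res$Species, res$Clustered_Yes_No)
```

Because entire clusters are selected, the sketch gives all rows of a given species the same label: with three species and 50%, two whole species are labeled "Yes" and one is labeled "No".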
`custom_transform()` applies a transformation (scaling, normalization, log transform, or a user-supplied function) to a specified numeric column.
```r
custom_transform(df, selected_column, transformation_type)
```

| Argument | Description |
|---|---|
| `df` | A data frame containing the data. |
| `selected_column` | A character string naming the numeric column to transform. |
| `transformation_type` | Either a character string naming a built-in transformation (`"scale"`, `"normalize"`, or `"log"`) or a custom R function to apply to the column. |

Returns a data frame containing the transformed column.
```r
library(stratifiedyh)
# Apply the "scale" transformation to Sepal.Length
result <- custom_transform(iris, selected_column = "Sepal.Length", transformation_type = "scale")
```
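The named transformations can be sketched in base R as below. This `custom_transform_sketch` is hypothetical and is not the package's code; in particular, reading "scale" as a z-score, "normalize" as min-max rescaling to [0, 1], and "log" as the natural logarithm are assumptions about what each option computes.

```r
# Hypothetical sketch; the formulas behind each option are assumptions.
custom_transform_sketch <- function(df, selected_column, transformation_type) {
  x <- df[[selected_column]]
  stopifnot(is.numeric(x))
  df[[selected_column]] <- if (is.function(transformation_type)) {
    # A user-supplied function is applied to the whole column.
    transformation_type(x)
  } else {
    switch(transformation_type,
      # z-score: mean 0, standard deviation 1
      scale = (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE),
      # min-max rescaling to the interval [0, 1]
      normalize = (x - min(x, na.rm = TRUE)) /
        (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)),
      # natural logarithm; values must be positive
      log = log(x),
      stop("unknown transformation_type: ", transformation_type)
    )
  }
  df
}

head(custom_transform_sketch(iris, "Sepal.Length", "normalize"))
head(custom_transform_sketch(iris, "Sepal.Length", function(x) x^2))
```

Base R's `scale()` returns a one-column matrix, so the sketch writes the z-score formula out directly to keep the column a plain numeric vector.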
`stratified_custom_labels()` stratifies the data by a grouping column and, within each group, assigns `label1` to the specified percentage of rows and `label2` to the remaining rows.
```r
stratified_custom_labels(df, group_col, label_percentage, label1, label2)
```

| Argument | Description |
|---|---|
| `df` | A data frame to be stratified. |
| `group_col` | A character string specifying the column name to group by. |
| `label_percentage` | A numeric value between 0 and 100 giving the percentage of rows within each group that receive `label1`. |
| `label1` | A character string: the first label, assigned to `label_percentage` percent of each group. |
| `label2` | A character string: the second label, assigned to the remaining rows. |

Returns a data frame with an additional column `Custom_Labels` containing the stratified custom labels.
```r
library(stratifiedyh)
# Within each species, label 50% of rows "High" and the rest "Low"
result <- stratified_custom_labels(iris, group_col = "Species", label_percentage = 50, label1 = "High", label2 = "Low")
```
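Stratified labeling draws independently inside each group, so every stratum receives (approximately) the same share of `label1`. A minimal base-R sketch follows; `stratified_custom_labels_sketch` is a hypothetical name, and rounding each group's count with `round()` is an assumption about how fractional counts are handled.

```r
# Hypothetical sketch of stratified labeling; not the package's code.
stratified_custom_labels_sketch <- function(df, group_col, label_percentage,
                                            label1, label2) {
  stopifnot(group_col %in% names(df),
            label_percentage >= 0, label_percentage <= 100)
  labels <- rep(label2, nrow(df))
  # Row indices split by group, so each stratum is sampled independently.
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  for (idx in idx_by_group) {
    # Rows in this group that receive label1 (rounding rule is an assumption).
    n_first <- round(length(idx) * label_percentage / 100)
    chosen <- idx[sample.int(length(idx), n_first)]
    labels[chosen] <- label1
  }
  df$Custom_Labels <- labels
  df
}

set.seed(1)
res <- stratified_custom_labels_sketch(iris, "Species", 50, "High", "Low")
table(res$Species, res$Custom_Labels)
```

With iris (50 rows per species) and 50%, each species gets exactly 25 "High" and 25 "Low" rows in this sketch. Indexing with `sample.int()` rather than `sample(idx, ...)` avoids R's special case where `sample()` on a single number samples from `1:n`.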
`stratified_labels()` stratifies the data by a grouping column and, within each group, labels the specified percentage of rows "Yes" and the remaining rows "No".
```r
stratified_labels(df, group_col, yes_percentage)
```

| Argument | Description |
|---|---|
| `df` | A data frame to be stratified. |
| `group_col` | A character string specifying the column name to group by. |
| `yes_percentage` | A numeric value between 0 and 100 indicating the percentage of "Yes" labels to assign within each group. |

Returns a data frame with an additional column `Sampled_Yes_No` containing the stratified "Yes"/"No" labels.
```r
library(stratifiedyh)
# Example with the iris dataset
result <- stratified_labels(iris, group_col = "Species", yes_percentage = 50)
```
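Because the labels are assigned at random, a quick way to inspect a result is to compute the share of "Yes" labels in each group. The helper below, `label_share_by_group`, is hypothetical and not part of the package; it uses only base R's `tapply()`, and here it is demonstrated on a small toy data frame.

```r
# Hypothetical helper, not part of the package: share of rows in each group
# whose label equals `value`.
label_share_by_group <- function(df, group_col, label_col, value = "Yes") {
  tapply(df[[label_col]] == value, df[[group_col]], mean)
}

# Toy data frame standing in for a labeled result.
toy <- data.frame(
  grp = c("a", "a", "b", "b", "b", "b"),
  lab = c("Yes", "No", "Yes", "No", "No", "No")
)
label_share_by_group(toy, "grp", "lab")  # a = 0.50, b = 0.25
```

Applied to the output of the `stratified_labels()` example, `label_share_by_group(result, "Species", "Sampled_Yes_No")` would be expected to report values at or near 0.5 for each species.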
`systematic_labels()` performs systematic sampling on a data frame, selecting rows at a fixed interval, and labels the selected rows "Yes" and all other rows "No".
```r
systematic_labels(df, group_col, sampling_interval)
```

| Argument | Description |
|---|---|
| `df` | A data frame containing the data. |
| `group_col` | A character string specifying the column to use for grouping. |
| `sampling_interval` | A numeric value giving the interval for systematic sampling (a value of k selects every k-th row). |

Returns a data frame with an additional column `Systematic_Yes_No` containing the systematically sampled "Yes"/"No" labels.
```r
library(stratifiedyh)
# Systematic sampling with an interval of 2, grouped by Species
result <- systematic_labels(iris, group_col = "Species", sampling_interval = 2)
```
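Systematic sampling picks every k-th row after a starting point. The sketch below applies it separately within each group; `systematic_labels_sketch` is hypothetical, and both the per-group application and the random start within the first interval are assumptions rather than documented behavior of `systematic_labels()`.

```r
# Hypothetical sketch of within-group systematic sampling; not the package's code.
systematic_labels_sketch <- function(df, group_col, sampling_interval) {
  k <- as.integer(sampling_interval)
  stopifnot(group_col %in% names(df), k >= 1)
  labels <- rep("No", nrow(df))
  for (idx in split(seq_len(nrow(df)), df[[group_col]])) {
    if (length(idx) == 0) next
    # Random start within the first interval (an assumption), then every k-th row.
    start <- sample.int(min(k, length(idx)), 1)
    labels[idx[seq(start, length(idx), by = k)]] <- "Yes"
  }
  df$Systematic_Yes_No <- labels
  df
}

set.seed(1)
res <- systematic_labels_sketch(iris, "Species", sampling_interval = 2)
table(res$Species, res$Systematic_Yes_No)
```

With `sampling_interval = 2` and 50 rows per species, the sketch labels exactly 25 rows "Yes" in each species, whichever start (1 or 2) is drawn.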