Variations Of Random Forest In R

14/07/2017

Model fitting is one of the most important steps in using analytics to generate insights. Typical projects also involve a lot of data cleaning so that the model achieves high accuracy when it is applied, and competitions are largely about data cleaning and models. Many models can be fitted to data under different conditions, and one of the most intuitive is the decision tree. Decision trees classify data into buckets through a series of “decisions” on the feature values. Most competitions start by benchmarking against results from an ensemble of trees, known as a random decision forest. Random forests, as they are usually called, combine many trees, each fitted on a bootstrap sample of the data, and are among the best-known examples of ‘bagging’ techniques. R, a popular language for model fitting, offers a variety of random forest packages. Let’s discuss a few of them (this list is by no means exhaustive).
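
To make the bagging idea concrete, here is a minimal sketch in plain Python rather than R, so that it runs without any of the packages discussed in this article. It fits one-split trees ("stumps") on bootstrap samples and combines them by majority vote. The toy data and the names `fit_stump`, `fit_bagged_stumps` and `predict` are assumptions made up for illustration. A real random forest grows deeper trees and also samples a random subset of features at each split, which this sketch leaves out.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows):
    """Fit a one-split tree: pick the (feature, threshold) with fewest training errors.

    rows is a list of (features, label) pairs with numeric features.
    """
    fallback = majority([y for _, y in rows])
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every feature is constant in this sample
        return lambda x: fallback
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def fit_bagged_stumps(rows, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        stumps.append(fit_stump(sample))
    return stumps

def predict(stumps, x):
    """Ensemble prediction: majority vote over all stumps."""
    return majority([stump(x) for stump in stumps])

# Hypothetical toy data: two well separated classes in two features.
DATA = [
    ((1.0, 1.2), "a"), ((1.5, 0.8), "a"), ((0.8, 1.0), "a"),
    ((1.2, 1.5), "a"), ((1.3, 1.1), "a"),
    ((5.0, 4.8), "b"), ((4.5, 5.2), "b"), ((5.2, 5.0), "b"),
    ((4.8, 4.5), "b"), ((5.5, 5.1), "b"),
]

forest = fit_bagged_stumps(DATA, n_trees=25, seed=0)
print(predict(forest, (1.1, 0.9)), predict(forest, (5.1, 4.9)))
```

Each stump sees a slightly different sample, so single trees can disagree, and the vote is what makes the ensemble stable.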

randomForest: The ‘classic’ package in R, which implements the basic random forest logic and is really robust. The package is very user friendly and lets the user tune parameters such as the number of trees (ntree) and the size of the trees (nodesize, maxnodes). It can optionally compute feature importance and proximity measures. Feature importance is based on the increase in error when the values of one feature are permuted in the OOB (out-of-bag) data while everything else is kept the same. The proximity measure, on the other hand, is a matrix in which the (i, j) element is the fraction of trees in which observations i and j fall in the same terminal node. The package can be used for classification or regression problems and can be learnt with ease.
cforest: This function, from the party package, is computationally more expensive than the randomForest package but often better in terms of accuracy. cforest can use OOB data when making predictions, and it combines the trees through a weighted average to get the final ensemble. It is slower, however, and can handle less data for the same memory. The main reason cforest gives more reliable predictions is that it produces unbiased trees. randomForest has the drawback that its simple split selection is biased towards features with many possible cut points, so features that are continuous or have many categories tend to be preferred. Whenever you have large computational resources at your disposal, do use cforest for accuracy.
obliqueRF: “Oblique” forests are an underrated, advanced yet useful concept based on splitting the nodes of a tree with hyperplanes (linear combinations of features) instead of thresholds on single features. They can easily outperform randomForest, especially when the features are numerical rather than discrete or when we have spectral data. Just like randomForest, oblique forests are governed by the subspace dimension (the number of features) and the ensemble size (the number of trees). However, since they make oblique cuts rather than orthogonal ones, the recursive binary splits are found with models such as ridge regression. I have seen a cool implementation of oblique random forests as the prize winning code in a Kaggle competition! Hence oblique random forests sure pack a punch. obliqueRF does end up having a higher bias and lower variance than randomForest.
ParallelForest: ParallelForest is an implementation that fits random forests using parallel computing. The package provides functions such as grow.forest. It’s pretty handy when there are millions of rows in the training set. A data set which took days for the randomForest package to fit was handled by ParallelForest in under an hour. However, there are still doubts about whether the accuracy is the same for both packages under all conditions, and about whether classification can be implemented using parallel processing. (Another package, bigrf, also uses multi-threading and caching for very large data, but it was built to handle very large data rather than to speed up processing.)
randomUniformForest: This package produces unpruned trees and is useful for regression, classification and unsupervised learning. If cforest is the slower but more accurate alternative to randomForest, then randomUniformForest falls at the other end as the faster but slightly less accurate version. The trees have lower correlation, resulting in lower bias but higher variance. Moreover, the cut points of the splits are drawn from a uniform distribution. Since we don’t care much about bias, as perfectly randomized trees will cancel it out, randomUniformForest is useful in situations where the features themselves follow specified distributions.
randomForestSRC: Survival, Regression and Classification (SRC) are the three types of models for which this package provides a unified function. Additionally, there are multivariate and unsupervised extensions, as well as parallel processing through OpenMP. I have come to use this package whenever there is doubt about the best approach to model fitting. Coupled with missing value imputation, the package provides a first-look model that is useful for further exploration and deep-dive analysis.
ranger: ranger comes to the rescue when you have high dimensional data and want a fast yet memory efficient implementation of random forests. The name ranger comes from RANdom forest GEneRator. I have mainly used ranger to build models quickly and to find optimal parameter values through parameter tuning.
Rborist: Rborist is a high performance implementation of random forests. Compared to the original randomForest, this package optimizes the algorithm so that model fitting is performed with less data movement within memory, which creates opportunities for scaling up performance. Hence, as the number of features increases, the processing time increases only linearly (as opposed to the exponential increase expected for randomForest). The package also supports missing value imputation. Hence, in projects where we ourselves generate a lot of features, this package becomes more suitable.
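
The proximity measure described under randomForest above can be illustrated with a short sketch. It is written in plain Python rather than R and does not use any package's API: it simply assumes we already know which terminal node each observation reaches in each tree. The function name `proximity_matrix` and the `LEAF_IDS` table are hypothetical and exist only for this example.

```python
def proximity_matrix(leaf_ids):
    """Fraction of trees in which observations i and j share a terminal node.

    leaf_ids[t][i] is the terminal node that observation i reaches in tree t.
    """
    n_trees = len(leaf_ids)
    n_obs = len(leaf_ids[0])
    counts = [[0] * n_obs for _ in range(n_obs)]
    for leaves in leaf_ids:
        for i in range(n_obs):
            for j in range(n_obs):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    # Convert co-occurrence counts into fractions of the ensemble.
    return [[c / n_trees for c in row] for row in counts]

# Hypothetical terminal-node assignments: 4 trees (rows) x 4 observations (columns).
LEAF_IDS = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [3, 3, 3, 4],
    [1, 1, 2, 3],
]

P = proximity_matrix(LEAF_IDS)
for row in P:
    print(row)
```

Observations 0 and 1 share a terminal node in three of the four trees, so their proximity is 0.75, while observations 0 and 3 never share one and get 0. The matrix is symmetric with ones on the diagonal, which is why it is often used as a similarity measure for clustering or outlier detection.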

Since the idea was first suggested in the 1990s, random forests have become a popular method of model fitting and are used in various forms. There are even more implementations, such as rotationForest (based on fitting trees over principal components of the features), xgboost (extreme gradient boosting, a clever tree based technique that uses boosting), rFerns (useful for comparing images) and regularized random forests. This article will be useful for those who have gone through decision tree and basic random forest concepts and are willing to learn their different variations in R.


