[Discussion] Implementation of AdaBoost #33
We can start with the basic Algorithm 1.
Will get back to you on this. Until then, if your doubt is resolved, let me know.
You can propose the API for Algorithm 1 here in this issue.
This is how we plan to store each T, but then each T will not be the same. So the question is: how can we store all the T's together?
Can you please share some fake function calls? That is, how would the end user call the function to run the AdaBoost algorithm, and what would the output be in return?
For the data points:
The output, storing all the stumps together:
Hence the fake function call will be like so:
Then there can be another function which knows how to interpret that output.
Further discussion of representing the stump:
As I mentioned above, the issue of storing all the stumps together is that each stump is made using a particular feature. For any categorical feature (such as the colour of an object) we can use one-hot encoding (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/). We will clearly need to remove all string-type variables and make them numerical, so some kind of pre-processing of the data is required. But again, numerical features themselves come in various data types.
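As a purely illustrative sketch (none of these names are part of the proposed library), a categorical colour feature could be expanded into three 0/1 columns like this:

#include <array>
#include <string>

// Illustrative only: expand a categorical "color" value into a one-hot vector.
// Column order: {red, green, blue}. The real pre-processing API is not decided here.
std::array<double, 3> one_hot_color(const std::string& color)
{
    if (color == "red")   return {1.0, 0.0, 0.0};
    if (color == "green") return {0.0, 1.0, 0.0};
    return {0.0, 0.0, 1.0};  // "blue" (or any unrecognised value)
}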
Why can't we use …?
Ok, great, that solves many of our doubts. We will be implementing Algorithm 1 from https://web.stanford.edu/~hastie/Papers/samme.pdf as follows. Here is the proposed directory structure:
Each stump (weak classifier) will be represented as a struct:
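A minimal sketch of what such a stump struct could look like (the field names here are illustrative assumptions, not the final design from this thread):

template <class data_type>
struct Stump
{
    unsigned  feature_index;  // feature the stump splits on
    data_type threshold;      // split value: predict left_class if x[feature_index] < threshold
    unsigned  left_class;     // class predicted on the "less than" side
    unsigned  right_class;    // class predicted on the other side
    data_type alpha;          // weight of this stump in the final weighted vote
};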
Please let us know what you think. We are ready to split this work amongst ourselves and begin coding it.
I think the pipeline of training the algorithm and then using it for inference should be made clearer. How will you fit the weak classifiers while training? The paper's algorithm doesn't specify that clearly. For the rest, I have a design in mind and will comment here tomorrow.
No hurry. What I am thinking is that training and predicting should require only the data from the user, and the rest of the implementation details should be hidden.

template <class data_type>
class AdaBoost
{
private:
    Matrix<data_type> data;
    Vector<unsigned> classes;
    Vector<Classifier> T;
    Vector<data_type> alpha;

    AdaBoost(Matrix<data_type> data, Vector<unsigned> classes);

public:
    static AdaBoost* createModel(Matrix<data_type> data, Vector<unsigned> classes);
    void train();
    unsigned predict(Vector<data_type> input);
    Vector<unsigned> predict(Matrix<data_type> input);
    Vector<Classifier> get_classifier();
    Vector<data_type> get_weights();
};
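For concreteness, a rough sketch of how an end user might drive this interface; only createModel, train and predict come from the draft above, while the load_* helpers are made up for illustration:

int main()
{
    // Hypothetical usage of the proposed interface.
    Matrix<double> X = load_training_features();   // assumed helper, not part of the API
    Vector<unsigned> y = load_training_labels();   // assumed helper, not part of the API

    AdaBoost<double>* model = AdaBoost<double>::createModel(X, y);
    model->train();

    Vector<double> sample = load_single_example(); // assumed helper
    unsigned predicted_class = model->predict(sample);
    return 0;
}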
Now, the thing to be decided upon is the interface of the classifier. The algorithm has a step: fit a classifier T^(m)(x) to the training data using the weights w_i. How will we actually do this step?
So, in order to do this, we will need to implement fitting a weak classifier. Here is our plan in pseudocode:
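A rough outline of what that fitting loop could look like, purely illustrative: score_split stands in for whichever criterion is chosen below, and Stump is the sketch from earlier.

#include <limits>

template <class data_type>
Stump<data_type> fit_stump(const Matrix<data_type>& data,
                           const Vector<unsigned>& classes,
                           unsigned n_features,
                           const Vector<data_type>& thresholds)
{
    // Try every (feature, threshold) pair and keep the split with the best score.
    Stump<data_type> best{};
    double best_score = std::numeric_limits<double>::infinity();
    for (unsigned f = 0; f < n_features; ++f)
    {
        for (unsigned t = 0; t < thresholds.size(); ++t)                  // size() assumed on Vector
        {
            double s = score_split(data, classes, f, thresholds[t]);      // criterion to be decided
            if (s < best_score)
            {
                best_score = s;
                best.feature_index = f;
                best.threshold = thresholds[t];
            }
        }
    }
    return best;
}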
How to find a weak classifier for a given feature? Method 1: re-sample the data points according to their weights and then fit the stump on the re-sampled data, using the Gini index as usual.
Method 2: no re-sampling. Directly proceed with step two of the above plan, only now, to find the best possible classifier, instead of using the Gini index we use another criterion that also takes the weights of the data points into account. I have not yet been able to find an appropriate formula which takes the weights of individual examples into account when measuring the performance of a decision tree, but if we are able to find one we can implement it (one weight-aware option is sketched below). Which method should we choose?
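One weight-aware score that is already used inside Algorithm 1 of the SAMME paper is the weighted misclassification error; a rough sketch (not a committed design) of evaluating one candidate stump with it:

// Sketch only: weighted misclassification error of a candidate stump,
// err = sum_i w_i * I(predict(x_i) != c_i) / sum_i w_i  (Algorithm 1 of the paper).
// Matrix element access data(i, j) and Vector operator[] / size() are assumed.
template <class data_type>
double weighted_error(const Stump<data_type>& stump,
                      const Matrix<data_type>& data,
                      const Vector<unsigned>& classes,
                      const Vector<data_type>& weights)
{
    double wrong = 0.0, total = 0.0;
    for (unsigned i = 0; i < weights.size(); ++i)
    {
        data_type value = data(i, stump.feature_index);
        unsigned prediction = (value < stump.threshold) ? stump.left_class
                                                        : stump.right_class;
        if (prediction != classes[i])
            wrong += weights[i];
        total += weights[i];
    }
    return wrong / total;  // choose the candidate stump with the smallest value
}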
As suggested in this paper, there are multiple ways to find a weak classifier that classifies examples better than random guessing. So, the first thing to take care of is to allow the end user to define their own weak learning algorithm and use it inside AdaBoost training.

struct BaseProperties
{
    // Base type for whatever parameters a particular weak classifier learns;
    // user-defined classifiers keep their information in a derived struct.
};

template <class data_type>
class WeakClassifier
{
private:
    Matrix<data_type>* data;
    Vector<unsigned>* classes;
    BaseProperties* classifier_information;

    WeakClassifier(Matrix<data_type> data, Vector<unsigned> classes);

public:
    static WeakClassifier* createWeakClassifier(Matrix<data_type> data, Vector<unsigned> classes);
    void train(Vector<data_type> example_weights);
    unsigned predict(Vector<data_type> input);
    Vector<unsigned> predict(Matrix<data_type> input);
};

Now, inside the train() method of AdaBoost, these weak classifiers will be created and trained with the current example weights (a sketch of that loop follows below).

Now, coming to implementing a default weak classifier, I had thought of using a linear weak classifier, which simply uses a linearly weighted sum of the features of the input. While training the weak classifier, simple gradient descent can be applied on the binary cross-entropy loss, and after every update we can check whether it works better than random guessing (see the function on page 18, step 1 of http://people.csail.mit.edu/dsontag/courses/ml12/slides/lecture13.pdf); if it does, stop the training and return to the main algorithm. I don't know if it will work, but since the task of finding weak classifiers is pretty open, any method should be acceptable.
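To make the training pipeline concrete, here is a rough sketch of how AdaBoost::train() could drive such a WeakClassifier, following Algorithm 1 (SAMME) from the paper. The members data, classes, T and alpha come from the drafts above; n_examples, n_classes, M, row access and push_back are assumptions made only for the sake of the sketch:

#include <cmath>

template <class data_type>
void AdaBoost<data_type>::train()
{
    // 1. Initialise the example weights uniformly (constructor form assumed).
    Vector<data_type> w(n_examples, data_type(1) / n_examples);

    for (unsigned m = 0; m < M; ++m)        // M = number of boosting rounds
    {
        // 2(a). Fit a weak classifier to the data using the current weights.
        WeakClassifier<data_type>* clf =
            WeakClassifier<data_type>::createWeakClassifier(data, classes);
        clf->train(w);

        // 2(b). Weighted training error of this classifier.
        data_type err = 0, total = 0;
        for (unsigned i = 0; i < n_examples; ++i)
        {
            if (clf->predict(data.row(i)) != classes[i])   // row() assumed
                err += w[i];
            total += w[i];
        }
        err /= total;

        // 2(c). Classifier weight; the log(K - 1) term is the multi-class (SAMME) correction.
        data_type a = std::log((1 - err) / err) + std::log(data_type(n_classes - 1));

        // 2(d). Boost the weights of misclassified examples, then 2(e). re-normalise.
        data_type sum = 0;
        for (unsigned i = 0; i < n_examples; ++i)
        {
            if (clf->predict(data.row(i)) != classes[i])
                w[i] *= std::exp(a);
            sum += w[i];
        }
        for (unsigned i = 0; i < n_examples; ++i)
            w[i] /= sum;

        T.push_back(clf);       // store the trained classifier; how Classifier and
        alpha.push_back(a);     // WeakClassifier relate is still an open point above
    }
}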
Ok, we will go ahead with a user-defined weak classifier interface then. However, every source we looked at online mentions using decision stumps as the weak classifier. It seems this is the traditionally used weak classifier, so we feel it is the best choice for the default weak classifier.
Well, what loss function do these decision stumps optimise? Or, in other words, how will it be ensured that these stumps are better than random guesses? Also, what will go inside the …?
These stumps will definitely be better than random guesses. As we mentioned above, the measure we would use to compare them is the Gini index (Gini impurity). There are some variations of how to calculate it; we have followed the one illustrated here: https://stats.stackexchange.com/questions/308885/a-simple-clear-explanation-of-the-gini-impurity. How do we ensure that the stump is better than a random classifier? For a random classifier each value is p_i = 1/2, since there are two classes. Plugging this into the formula gives the Gini impurity of a random classifier as 1 - (1/2)^2 - (1/2)^2 = 0.5, so any stump whose impurity is below 0.5 is better than random guessing.
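For reference, a small sketch of that computation for a two-class node, given its class counts (illustrative only):

// Gini impurity of a node with n0 examples of class 0 and n1 of class 1:
// G = 1 - p0^2 - p1^2. A random two-class guess gives G = 0.5, so a stump
// is useful whenever the impurity of its splits is below 0.5.
double gini_impurity(double n0, double n1)
{
    double total = n0 + n1;
    double p0 = n0 / total;
    double p1 = n1 / total;
    return 1.0 - p0 * p0 - p1 * p1;
}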
Please see the explanation we gave in our initial design proposal: #33 (comment)
I see. You can proceed with the implementation. Make a new folder for it.
Description of the problem
The task is to discuss the implementation of both Multi-Class and Two-Class AdaBoost. The paper we have referred to is: https://web.stanford.edu/~hastie/Papers/samme.pdf
Current Thoughts
SAMME.R
Initial Doubts
where the RHS is expanded as:
We are not able to understand what the f and the I symbols mean.