Use the outline of code we discussed in class to create a decision tree for the IrisDataSet that predicts the Type column from the other attributes. Create three versions of this tree: one using entropy, one using the Gini index, and one using classification error as the splitting criterion. Use the first half of the data set as the training data and the second half as the test data. Provide the error rate for each tree.
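The class code outline is not reproduced here. The following is a minimal pure-Python sketch under that gap. It grows binary splits on numeric attributes and takes the impurity measure as a parameter, so the same code produces the entropy, Gini and classification-error trees. The function names, the midpoint thresholds and the `max_depth=5` default are assumptions for illustration; they are not the class outline. One caveat applies if the data file is sorted by Type, as the common UCI copy of Iris is (50 rows of each species in order). In that case a first-half/second-half split leaves one species out of training entirely, which will show up as a high error rate.

```python
import math
from collections import Counter

# Impurity measures: each takes a list of class labels and returns a number >= 0.
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def classification_error(labels):
    n = len(labels)
    return 1.0 - max(Counter(labels).values()) / n

def best_split(rows, labels, impurity):
    """Return (feature index, threshold) that most lowers weighted impurity, or None."""
    best, best_score = None, impurity(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values (assumption).
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, impurity, depth=0, max_depth=5):
    """Grow a tree: a leaf is a class label, an internal node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, impurity)
    if split is None:  # no split reduces impurity, so stop here
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], impurity, depth + 1, max_depth),
            build([r for r, _ in right], [y for _, y in right], impurity, depth + 1, max_depth))

def predict(tree, row):
    # Walk down internal nodes until a leaf (a label) is reached.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def error_rate(tree, rows, labels):
    wrong = sum(predict(tree, r) != y for r, y in zip(rows, labels))
    return wrong / len(labels)

def half_split(rows, labels):
    """First half for training, second half for testing, as the brief specifies."""
    mid = len(rows) // 2
    return rows[:mid], labels[:mid], rows[mid:], labels[mid:]
```

For example, with `rows` (lists of numeric attributes) and `labels` (the Type values) loaded from the Iris file, `train_x, train_y, test_x, test_y = half_split(rows, labels)` followed by `error_rate(build(train_x, train_y, gini), test_x, test_y)` gives the test error of the Gini tree. Passing `entropy` or `classification_error` instead gives the other two.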
An office supply store tested a telemarketing campaign aimed at its existing business customers. The company targeted approximately 16,000 customers for the campaign. Assume you are a consultant brought on board to help the company use the findings from the tests to its advantage. Refer to the accompanying spreadsheet, which contains the results of the tests.
The detailed requirements and expected deliverables are given in Capstone Assignment.docx.
Three sample presentations are attached for reference. The data to be used are in the attached Excel file.
I have completed 80% of the work; I still need to fulfil some functional requirements.
1. The algorithm needs to be changed so that the attributes are read dynamically from the uploaded CSV file. I can share the details later.
2. Once the algorithm has been modified, I need to display the final result graph/table in the Tkinter GUI, inside a canvas frame.
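Step 1 can be sketched by reading the attributes from the CSV header rather than hard-coding them. The sketch uses only Python's standard `csv` module. It assumes that the file has a header row and that the class column is the last column unless a name is passed. The existing algorithm is not shown, so `read_dataset` and `_to_value` are hypothetical names.

```python
import csv

def _to_value(text):
    """Convert a cell to float when possible; otherwise keep it as text."""
    try:
        return float(text)
    except ValueError:
        return text

def read_dataset(lines, target=None):
    """Read CSV lines (e.g. an open file) and return (feature_names, rows, labels).

    Attribute names come from the header row, so nothing is hard-coded.
    target: name of the class column; defaults to the last header field (assumption).
    """
    reader = csv.DictReader(lines)
    fields = reader.fieldnames
    if not fields:
        raise ValueError("CSV has no header row")
    target = target or fields[-1]
    if target not in fields:
        raise ValueError(f"unknown target column: {target}")
    feature_names = [f for f in fields if f != target]
    rows, labels = [], []
    for record in reader:
        rows.append([_to_value(record[f]) for f in feature_names])
        labels.append(record[target])
    return feature_names, rows, labels
```

With an uploaded file at a hypothetical `path`, `with open(path, newline="") as fh: names, rows, labels = read_dataset(fh)` reads whatever columns the file contains. The returned `names` can then label the table columns or plot axes shown in the GUI.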
Research Report - Twitter Analysis and Presentation
The aim of this assignment is to collect Twitter data, summarise the data using a spreadsheet or another tool, and then write a report about that data. The purpose of the report is to investigate and discuss the use of Twitter analysis by researchers, brands or journalists (depending on your major). The report is not meant to be written as a public-facing report or feature, but rather as an internal research report that might be used in a professional context or to inform your own practice.
You can choose to follow a group of people or one or more hashtags over a period of time that will yield a reasonably sized data set (between a few thousand and a maximum of about 250,000 tweets is the right size for this task; any bigger and Excel will struggle to open the file). Suitable targets could be hashtags for a TV show or media event, a new or defective product, a group of journalists attending a conference or the conference
Produce an illustrated report that uses the analysis techniques covered in lectures and practicals to examine the distribution, variation and relationships between at least two variables from the following London data:
UK Census
Air Quality
Roads and Parks
Airbnb
Another dataset for London as agreed with your lecturers
The following specific requirements apply (over and above the official Coursework Submission Requirements):
Students are expected to present and interpret a mix of descriptive statistics, maps, tables and other visualisations to provide an evidential base for describing spatial patterns and relationships. Literature should be used to support the analysis of the patterns and relationships observed, including a discussion of their possible underlying drivers or causes. Analysis could be at the neighbourhood, borough or city scale.
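Below is a minimal sketch of the descriptive statistics this asks for. It summarises the distribution and variation of a numeric variable and computes the Pearson correlation between two variables measured for the same areas (for example, one value per borough). The function names and the choice of Pearson's r are assumptions for illustration; the brief does not prescribe a method. Real values would come from the chosen London datasets.

```python
import math
import statistics

def describe(values):
    """Distribution and variation of one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (needs n >= 2)
        "min": min(values),
        "max": max(values),
    }

def pearson_r(x, y):
    """Pearson correlation between two equal-length variables (e.g. one value per area)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length variables with at least two observations")
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)
```

A value of r near +1 or -1 indicates a strong linear relationship between the two variables across areas. It does not by itself explain the underlying drivers, which is why the brief asks for supporting literature.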
You are free to develop a topic that speaks to your research and study interests, but some possible topics include: the