Analytics Assignment
Please follow the attached assignment. Write at least 600 words, in addition to your responses to these questions, and upload both the R code file and the Word document.

You are working as a Senior Data Analyst for Northeastern Bank.
One of the initiatives currently being discussed is to implement a new customer evaluation process, which will allow the Bank to set certain Preferred Customer Interest Rates to attract more customers who will potentially have larger savings account balances.
For any new customers who want to open a Savings Account, the Bank Senior Leadership would like to be able to predict what the Savings Account balances would be, given certain data that is being gathered about each customer, from a newly developed DataMart.
This DataMart includes third-party credit score ratings, as well as customers' credit card account limits. The Bank also gathers internal data from the Savings Account Application form, which includes each customer's income, age, education level, and student status.
The CFO wants to know which customer attributes are most favorable to a larger Savings account Balance.

Using the provided dataset, please answer the following questions:

1.    Create a set of simple linear regressions of each covariate against the outcome variable.
Based on your simple linear models, rank order the covariates by the strongest R^2 association to the outcome.  What are the top two covariates?

2.    Generate the Full model (Hint: In your code, list the covariate with the strongest association first, the second strongest association second, etc.)
Which of the covariates have significant association with the outcome in the full model?
Are these the same covariates you would have expected based on your simple linear models?

3.    Senior Leadership states that the acquisition of third-party credit scores is a significant expense for the Bank, and is asking whether this data is valuable.

Explain your data-driven recommendation to Senior Leadership as to whether or not the Credit Ratings add significant value to the model's predictive power.
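
The assignment itself calls for R. Purely as an illustration of the ranking step in Question 1, here is a minimal Python sketch that fits each covariate separately and sorts by R^2. The covariate names (rating, age) and every number in it are made up for illustration; they are NOT the assignment dataset, whose column names are not given here. Categorical covariates such as student status would need numeric coding before a calculation like this applies.

```python
from statistics import mean

def r_squared(x, y):
    """R^2 of the simple linear regression of y on x.

    For a one-covariate model this equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def rank_covariates(covariates, outcome):
    """Return (name, R^2) pairs sorted from strongest to weakest association."""
    scores = {name: r_squared(values, outcome) for name, values in covariates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy values, not the real data.
toy_balance = [200, 450, 610, 800, 1010]
toy_covariates = {
    "rating": [210, 330, 420, 500, 610],
    "age": [45, 30, 62, 38, 51],
}

ranking = rank_covariates(toy_covariates, toy_balance)
for name, r2 in ranking:
    print(f"{name}: R^2 = {r2:.3f}")
```

The order of this ranking is what the hint in Question 2 asks you to follow when listing covariates in the full model.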

Assignment:
Question 1:
Find the mean, median and mode of the following:
Presto Printing Company sales for the month of February.
(NOTE: The company is closed on Sundays.)
February 2: $4,794
February 3: $5,954
February 4: $3,309
February 5: $3,106
February 6: $7,124
February 7: $2,349
February 9: $3,123
February 10: $4,128

February 11: $3,198
February 12: $2,198
February 13: $7,287
February 14: $1,323
February 16: $4,598
February 17: $3,987
February 18: $3,099
February 19: $3,098

February 20: $5,950
February 21: $3,209
February 23: $6,531
February 24: $3,098
February 25: $4,598
February 26: $4,873
February 27: $9,976
February 28: $5,878

Explain which measure of central tendency is best in this scenario and why.
Question 2:
a.    Set up a frequency distribution table and calculate the relative frequency for the following quiz scores that Matt achieved in his Algebra course.  Explain the difference between frequencies and relative frequencies.
Matt's Quiz Scores: 87, 88, 92, 79, 100, 76, 61, 60, 79, 99, 100, 69, 98, 87, 92, 79, 81, 85, 100, and 100.
Use the following frequency intervals:
60 – 70
70 – 80
80 – 90
90 – 100
b. Calculate the variance and standard deviation for Matt's quiz scores.
Suggestion: set the problem up in steps within a table like the textbook illustrates.
c. Explain standard deviation and how it relates to the data.
Question 3:
Ben's Average Sales:
Monday $5,099
Tuesday $6,900
Wednesday $0
Thursday $10,500
Friday $9,407
Saturday $9,766
Sunday $0
Is it better for Ben to earn 7.5% based on his average daily sales or a flat $4,000 for the week? Explain the pros and cons of each pay plan.
Submitting Your Assessment:
When you are ready to submit your Assessment, click on the Course Assessment Dropbox and complete the steps below:
        Click the link that says Add Attachments.
        Click on the Upload
        Click the Add Attachments
        Locate your Course Assessment and click on Open.
        To view your graded work, come back to the Dropbox or go to the Gradebook after your instructor has evaluated it. Click the Dropbox to access it.
        Make sure that you save a copy of your submitted Assignment.

Comments from assigner:
In problem 1, the mean is incorrect. The same thing happened with the variance: you had the wrong mean, so each calculation in that table ended up being off. Each calculation needs to be verified.

Submitted answers

Question 1:
Find the mean, median and mode of the following:
Presto Printing Company sales for the month of February.
(NOTE: The company is closed on Sundays.)
February 2: $4,794
February 3: $5,954
February 4: $3,309
February 5: $3,106
February 6: $7,124
February 7: $2,349
February 9: $3,123
February 10: $4,128

February 11: $3,198
February 12: $2,198
February 13: $7,287
February 14: $1,323
February 16: $4,598
February 17: $3,987
February 18: $3,099
February 19: $3,098

February 20: $5,950
February 21: $3,209
February 23: $6,531
February 24: $3,098
February 25: $4,598
February 26: $4,873
February 27: $9,976
February 28: $5,878

The median is $4057.5
$1123, $2149, $2198, $3098, $3098, $3099, $3106, $3123, $3198, $3209, $3309, $3987, $4128, $4598, $4598, $4694, $4873, $5878, $5954, $5980, $6531, $7124, $7287, $9876
($3987 + $4128)/2 = $8115/2  = $4057.5
The mean is 4425.75
$1123 + $2149 + $2198 + $3098 + $3098 + $3099 + $3106 + $3123 + $3198 + $3209 + $3309 + $3987 + $4128 + $4598 + $4598 +$4694 + $4873 + $5878 + $5954 + $5980 + $6531 + $7124 + $7287 + $9876 = $106218
$106218/24 = 4425.75
The mode is $3098 and $4598 which both occur twice.
Explain which measure of central tendency is best in this scenario and why.
The mean is the best measure of central tendency owing to the fact that it includes all the values in the data set. Moreover, any alteration in any of the values will lead to a change in the mean.
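
As a check on the arithmetic, the three measures can be recomputed directly from the 24 daily figures listed in the question. Several values in the sorted list above differ from those figures (for example $1123 versus $1,323, $2149 versus $2,349, and $9876 versus $9,976), which is enough to shift the total. The sketch below uses only Python's standard statistics module; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Daily sales exactly as listed in the question (24 business days).
sales = [
    4794, 5954, 3309, 3106, 7124, 2349, 3123, 4128,
    3198, 2198, 7287, 1323, 4598, 3987, 3099, 3098,
    5950, 3209, 6531, 3098, 4598, 4873, 9976, 5878,
]

total = sum(sales)
sales_mean = mean(sales)
sales_median = median(sales)
sales_modes = sorted(multimode(sales))  # every value tied for most frequent

print(f"n = {len(sales)}, total = {total}")
print(f"mean = {sales_mean}, median = {sales_median}, modes = {sales_modes}")
```

With the question's figures the total is $106,788 and the mean is $4,449.50; the median ($4,057.50) and the two modes ($3,098 and $4,598) are unaffected by the transcription differences.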

Question 2:
a.    Set up a frequency distribution table and calculate the relative frequency for the following quiz scores that Matt achieved in his Algebra course.  Explain the difference between frequencies and relative frequencies.
Matt's Quiz Scores: 87, 88, 92, 79, 100, 76, 61, 60, 79, 99, 100, 69, 98, 87, 92, 79, 81, 85, 100, and 100.
Use the following frequency intervals:
60 – 70
70 – 80
80 – 90
90 – 100
Interval    Frequency    Relative frequency
60 – 70     3            3 × (100/20) = 15.0%
70 – 80     4            4 × (100/20) = 20.0%
80 – 90     5            5 × (100/20) = 25.0%
90 – 100    10           10 × (100/20) = 40.0%
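
A frequency is a raw count per interval, while a relative frequency is that count divided by the total number of observations. The sketch below tallies both from the 20 quiz scores in the question. The boundary rule is an assumption, since the assignment does not say which interval a score of exactly 70, 80, or 90 belongs to: each interval includes its lower bound and excludes its upper bound, except that the last interval also includes 100.

```python
scores = [87, 88, 92, 79, 100, 76, 61, 60, 79, 99,
          100, 69, 98, 87, 92, 79, 81, 85, 100, 100]

# Assumed boundaries: [60, 70), [70, 80), [80, 90), [90, 100].
bins = [(60, 70), (70, 80), (80, 90), (90, 100)]

def frequency_table(values, bins):
    """Return (label, frequency, relative frequency in percent) per interval."""
    rows = []
    last = bins[-1]
    for low, high in bins:
        if (low, high) == last:
            count = sum(1 for v in values if low <= v <= high)  # top class keeps 100
        else:
            count = sum(1 for v in values if low <= v < high)
        rows.append((f"{low} - {high}", count, 100 * count / len(values)))
    return rows

table = frequency_table(scores, bins)
for label, count, pct in table:
    print(f"{label:<9} {count:>3} {pct:5.1f}%")
```

Under that boundary rule the counts are 3, 4, 5, and 8, which sum to the 20 scores and give relative frequencies of 15%, 20%, 25%, and 40%.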

b. Calculate the variance and standard deviation for Matt's quiz scores.
Suggestion: set the problem up in steps within a table like the textbook illustrates.
Data value    Data value - Mean value    Variation
60            60 - 85.15 = -25.15        (-25.15)^2 = 632.52
61            61 - 85.15 = -24.15        (-24.15)^2 = 583.22
69            69 - 85.15 = -16.15        (-16.15)^2 = 260.82
72            72 - 85.15 = -13.15        (-13.15)^2 = 172.92
78            78 - 85.15 = -7.15         (-7.15)^2 = 51.12
79            79 - 85.15 = -6.15         (-6.15)^2 = 37.82
79            79 - 85.15 = -6.15         (-6.15)^2 = 37.82
81            81 - 85.15 = -4.15         (-4.15)^2 = 17.22
85            85 - 85.15 = -0.15         (-0.15)^2 = 0.02
85            85 - 85.15 = -0.15         (-0.15)^2 = 0.02
85            85 - 85.15 = -0.15         (-0.15)^2 = 0.02
88            88 - 85.15 = 2.85          (2.85)^2 = 8.12
92            92 - 85.15 = 6.85          (6.85)^2 = 46.92
92            92 - 85.15 = 6.85          (6.85)^2 = 46.92
98            98 - 85.15 = 12.85         (12.85)^2 = 165.12
99            99 - 85.15 = 13.85         (13.85)^2 = 191.82
100           100 - 85.15 = 14.85        (14.85)^2 = 220.52
100           100 - 85.15 = 14.85        (14.85)^2 = 220.52
100           100 - 85.15 = 14.85        (14.85)^2 = 220.52
100           100 - 85.15 = 14.85        (14.85)^2 = 220.52

Sum of variation = 3087.58

Variance = sum of variation / (n - 1) = 3087.58 / 19 = 162.50
Standard deviation = square root of 162.50 = 12.75
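
The table above includes data values (72, 78, and three 85s) that do not appear in the list of quiz scores given in the question, so its mean and every squared deviation derived from it are worth rechecking. A minimal sketch of the same step-by-step calculation, using the 20 scores as listed in the question and the sample (n - 1) form of the variance used above:

```python
from statistics import mean, variance

scores = [87, 88, 92, 79, 100, 76, 61, 60, 79, 99,
          100, 69, 98, 87, 92, 79, 81, 85, 100, 100]

score_mean = mean(scores)

# One row per score: (value, deviation from the mean, squared deviation).
rows = [(x, x - score_mean, (x - score_mean) ** 2) for x in sorted(scores)]
sum_sq = sum(sq for _, _, sq in rows)

sample_variance = sum_sq / (len(scores) - 1)
sample_sd = sample_variance ** 0.5

for x, dev, sq in rows:
    print(f"{x:>4} {dev:>8.2f} {sq:>9.2f}")
print(f"mean = {score_mean:.2f}, sum of squares = {sum_sq:.2f}")
print(f"variance = {sample_variance:.2f}, standard deviation = {sample_sd:.2f}")
```

With the question's scores the mean is 85.6, the sum of squared deviations is about 3034.80, the sample variance is about 159.73, and the sample standard deviation is about 12.64. If the textbook uses the population form, divide by n = 20 instead of 19.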
c. Explain standard deviation and how it relates to the data.
The standard deviation measures the dispersion of the dataset. In this case, it is important because it shows how spread out Matt's scores are around their mean.

Question 3:
Ben's Average Sales:
Monday $5,099
Tuesday $6,900
Wednesday $0
Thursday $10,500
Friday $9,407
Saturday $9,766
Sunday $0
Is it better for Ben to earn 7.5% based on his average daily sales or a flat $4,000 for the week? Explain the pros and cons of each pay plan.
Add all the average daily sales:
Sum of the sales = $41,672
0.075 × $41,672 = $3,125.40
Based on the fact that the amount to be earned from the daily sales is lower than the proposed flat salary, I believe that Ben should settle for the $4,000 pay plan. The advantage is that the payment is consistent. However, it may lead to reluctance to work hard to push more sales. On the other hand, the payment based on the daily sales is advantageous because it is based on the daily sales hence the more one sells the higher one gets paid. It may only be limited when the sales drop significantly hence leading to a low payout.
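
The comparison above can be reproduced in a few lines. This sketch assumes, as the answer does, that the 7.5% commission applies to the week's total of the seven average daily figures; the break-even line is an added illustration showing how much Ben would need to sell in a week for the two plans to pay the same.

```python
daily_sales = {
    "Monday": 5099, "Tuesday": 6900, "Wednesday": 0, "Thursday": 10500,
    "Friday": 9407, "Saturday": 9766, "Sunday": 0,
}
COMMISSION_RATE = 0.075
FLAT_WEEKLY_PAY = 4000

weekly_total = sum(daily_sales.values())
average_daily = weekly_total / len(daily_sales)
commission_pay = COMMISSION_RATE * weekly_total

# Weekly sales at which commission pay equals the flat pay.
break_even_sales = FLAT_WEEKLY_PAY / COMMISSION_RATE

print(f"weekly total = ${weekly_total:,}")
print(f"average per day = ${average_daily:,.2f}")
print(f"commission pay = ${commission_pay:,.2f} vs flat ${FLAT_WEEKLY_PAY:,}")
print(f"break-even weekly sales = ${break_even_sales:,.2f}")
```

At these sales levels the commission plan pays $3,125.40, so the flat $4,000 is higher; commission only overtakes it once weekly sales exceed roughly $53,333.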

One of the required readings for this week was about how PayPal uses Big Data.
https://www.dezyre.com/article/big-data-use-cases-how-paypal-leverages-big-data-analytics/231
Answer the following questions as a post:
1. What enabled PayPal to see benefit from their data – was it the technology or the business that allowed them to gain insight?
2. What future investments (people, technology, process) should PayPal make in their approach to Big Data that will allow their business to thrive?

Read The Role of Data Analytics in Predictive Policing
https://www.govtech.com/data/Role-of-Data-Analytics-in-Predictive-Policing.html

and at least 2 other related scholarly articles that you find. Then write a post that answers the following questions:
How might data analytics improve policing?
How might it hurt policing? What ethical issues might be raised by predictive policing?
Be sure to cite all three articles in your post.

Prepare a report on the status of research in the Arab world (UAE specifically), in terms of the challenges that researchers face, initiatives that tackle these challenges, funding opportunities (e.g., UAE ICT Fund), research priorities according to countries' needs, and other aspects that you see as relevant.
At least three references.
References page.
Grading scheme: Content depth and analysis (65%), Content organization (10%), References (15%), Language accuracy (10%)
Additional materials:

A useful link: http://unesdoc.unesco.org/images/0018/001899/189958e.pdf

Objectives
Survey the publicly available scientific literature as it relates to your drug class.
Determine the availability of publications that describe PK and PD of one or more drugs from your group’s drug class.
Begin to distinguish published data as describing PK vs. PD properties, in vitro vs. in vivo studies, and preclinical vs. clinical studies.
Directions

First, you will need to find at least five articles that you believe present PK and PD data related to your chosen drug class and/or drug. Note that not all articles need to have both PK and PD, but among the 5 articles, both PK and PD must be represented.
For each of the 5 articles, write a few sentences, in your own words, that describe the main PK and/or PD finding(s).
As examples,
“Publication #1 was a study that included assessment of the PK of drugX in mice that received 10 mg/kg oral doses. The results indicated DrugX is extensively metabolized into active metaboliteX, which is primarily excreted in urine. The authors concluded that a majority of the activity of DrugX resides within active metaboliteX.”

“Publication #2 evaluated the clinical efficacy of DrugX in a randomized phase 3 clinical trial. A total of 430 patients were accrued to the study, in which patients received either drugX at a daily dose of 50 mg orally or the standard of care therapy, drugY, at a dose of 10 mg twice daily. The results indicated patients receiving drugX had a higher response rate (54%, 95% CI 40% – 68%) compared to those receiving drugY (37%, 95% CI 21% – 51%). This study enabled FDA approval of drugX in the disease population evaluated.”

Next, select one of the articles that includes PK data and one that includes PD data (or if one article includes both PK and PD, you may use a single article).

For each, write one brief paragraph summarizing the PK and PD data in more detail.

For example, was the PK data from in vitro studies (i.e. in cell culture or cell-free systems), in animals or in humans? What PK parameters were reported? What pharmacodynamic parameter was being studied? How did the investigators measure drug or metabolite concentrations (for PK) or the efficacy or toxicity endpoints (for PD)? Was the PD endpoint a molecular marker (e.g. gene or protein expression) or a clinical endpoint (heart rate, blood pressure, glucose levels, pain, fatigue, etc.)?
NOTES: You will use peer-reviewed, scientific journal articles as your primary source of information in this course (roughly 70 – 95%).
The remaining material (roughly 5 – 30%) will come from FDA reports or other, official scientific documents from companies or regulatory agencies. Realize that marketing websites do not represent official scientific documentation.

Also, since this assignment is early in the course, we do not expect you to fully understand all PK/PD parameters, how they are measured or what they may mean in the context of the studies. Most importantly, we want you to begin exploring these and begin to familiarize yourself with the various concepts, which by the end of the course, we expect you will have mastered. 

What to Include
Submit your work as a single word document with the following:

5 brief summaries – one for each article – which are two to three sentences in length, for articles that relate to your chosen drug class.

Two paragraphs describing PK and PD data from the articles.

Each article properly cited in a “References” or “Bibliography” page. Please note that we will not be critical of the reference style you use.
To make it easy for now, simply adopt one of the styles within the References section of one of the published scientific articles that you review.
Ultimately, your group project report should include a References section with citations that are consistent throughout the report document.
As long as the reference style used is consistent with one or more peer reviewed scientific journals and includes all appropriate details (author names, article title, journal name, volume/issue, and publication date), it will be fine.

Resources
RESOURCE: PubMed Search Tips
RESOURCE: FDA Search Tips
RESOURCE: Scientific Communication and Writing

An introduction and reflection need to be written based on the full paper.

The Introduction is a summary of the management problem and research problem ( 1/2 page )

Reflection: How hard it can be to write a paper and put together work done by each partner so that it makes sense. Why write a research paper? (1/2 page)

Both the introduction and the reflection are based on the 2 attached documents!


Discussion 1:
When we roll one die, we have a 1 in 6 probability of getting any particular number on the die.  When we roll a pair of dice, there are 36 different pairs that can be produced, yet only 11 actual distinct values.

Explain how the probability associated with the roll of each individual die in the pair explains the higher variability in the total outcome of the roll of each pair.  How do the concepts of permutations and combinations apply to this example? Discuss how the notion of degree of freedom can be used to illustrate the accumulating results of a set of dice rolls.
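
The counts quoted in this prompt can be checked by brute-force enumeration. The sketch below lists all ordered pairs (where order matters, as with permutations), the unordered pairs (where it does not, as with combinations), and how many ordered pairs produce each total. All names in it are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement, product

faces = range(1, 7)

ordered_pairs = list(product(faces, repeat=2))                    # (1, 2) and (2, 1) are distinct
unordered_pairs = list(combinations_with_replacement(faces, 2))   # {1, 2} counted once

# Number of ordered pairs that produce each total, and the resulting probability.
ways = Counter(a + b for a, b in ordered_pairs)
probabilities = {t: Fraction(n, len(ordered_pairs)) for t, n in sorted(ways.items())}

print(f"{len(ordered_pairs)} ordered pairs, {len(unordered_pairs)} unordered pairs, "
      f"{len(ways)} distinct totals")
for t, p in probabilities.items():
    print(f"total {t:>2}: {ways[t]} ways, probability {p}")
```

Each individual pair is equally likely at 1/36, but the totals are not: 7 can be made six ways while 2 and 12 can each be made only one way.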

Discussion 2:

Bayes' Theorem deals with the calculation of posterior probabilities, which isn't always a natural thing to do. We're used to forward-chaining our probabilities (e.g., if we roll a 3 on a die, what's the probability the second roll will give us a total of 8?).  Backward-chaining is less intuitive (e.g., if our total on the dice was an 8, what's the probability that the first roll was a 3?).  Since the rules of probability involve simple addition and multiplication, they work fine in both directions.  The thing that makes posterior probability more difficult is that we simply aren't used to thinking about things that way.
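
Both directions of the dice example can be worked out by counting outcomes. This is a minimal sketch assuming two fair six-sided dice; the helper prob and the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # (first roll, second roll)

def prob(event, given=lambda r: True):
    """P(event | given), computed by counting equally likely outcomes."""
    pool = [r for r in rolls if given(r)]
    return Fraction(sum(1 for r in pool if event(r)), len(pool))

first_is_3 = lambda r: r[0] == 3
total_is_8 = lambda r: sum(r) == 8

forward = prob(total_is_8, given=first_is_3)    # P(total 8 | first roll 3)
backward = prob(first_is_3, given=total_is_8)   # P(first roll 3 | total 8)

# Bayes' Theorem recovers the backward probability from the forward one.
bayes = forward * prob(first_is_3) / prob(total_is_8)

print(f"forward = {forward}, backward = {backward}, via Bayes = {bayes}")
```

The forward probability is 1/6, the backward probability is 1/5, and applying Bayes' Theorem to the forward value gives the same 1/5.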

Our chapter reading provides an example of a diagnostic test for a rare disease.  The resulting confidence in a positive test result is surprisingly low.  Discuss why that is so.  What is happening in the interaction of the various probabilities that leads to this outcome?
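
The chapter's actual figures are not reproduced here, so the numbers in this sketch are hypothetical: a disease affecting 1 person in 1,000, a test that catches 99% of true cases, and a 5% false positive rate. It is meant only to show how the probabilities interact, not to restate the reading.

```python
from fractions import Fraction

# Hypothetical inputs, chosen for illustration only.
prevalence = Fraction(1, 1000)           # P(disease)
sensitivity = Fraction(99, 100)          # P(positive | disease)
false_positive_rate = Fraction(5, 100)   # P(positive | no disease)

true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * false_positive_rate

# Bayes' Theorem: P(disease | positive).
posterior = true_positives / (true_positives + false_positives)

print(f"P(disease | positive) = {posterior} (about {float(posterior):.1%})")
```

With these assumed inputs, fewer than 2% of positive results come from people who have the disease, because the healthy group is so much larger that its small false positive rate still produces far more positives than the sick group does.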

Discussion 3:
The Monty Hall Problem in one of this week’s readings (in Wheelan) offers a perspective on why it is so important to understand and trust probability theory. Its conclusion is counter-intuitive to some people, so people will sometimes strongly resist changing their minds to obtain a better result. It illustrates how we tend to commit to an opinion or a choice we’ve made even when we should be willing to change our minds in the face of new data.

Explain whether you agree or disagree with the idea of switching as described in the readings, and offer a reason why your opinion should be correct. (Note: Switching is a good idea, so if you disagree you’d better offer a good reason.) Discuss how this scenario might be seen in other decisions that we have to make in engineering, or that we might observe being made by management. Do you think probability theory enters into people’s thinking sufficiently?
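
The claim that switching is the better strategy can be verified by enumerating every case exactly. The sketch assumes the standard rules: three doors, one car, the host always opens a goat door that the contestant did not pick, and chooses at random when two such doors are available.

```python
from fractions import Fraction
from itertools import product

doors = (0, 1, 2)

def win_probability(switch):
    """Exact probability of winning the car for the stay or switch strategy."""
    wins = Fraction(0)
    # Car position and first pick are each uniform over the three doors.
    for car, pick in product(doors, repeat=2):
        weight = Fraction(1, 9)
        # Doors the host may open: not the pick and not the car.
        options = [d for d in doors if d != pick and d != car]
        for opened in options:
            branch = weight / len(options)
            if switch:
                final = next(d for d in doors if d not in (pick, opened))
            else:
                final = pick
            if final == car:
                wins += branch
    return wins

print(f"stay: {win_probability(False)}, switch: {win_probability(True)}")
```

Staying wins 1/3 of the time and switching wins 2/3 of the time, since switching loses only when the first pick was already the car.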

Answer each discussion in about 100 words. No calculations; discussion only. Keep it simple, please.