Category: Information Systems

Data Science_W3

 

First, summarize the questions you attempted.

Q1: ISLR Textbook chapter review questions

Review the following sections in Python

Q2: Textbook Theory Questions (Section 3.7 Exercises)

1. Describe the null hypotheses to which the p-values given in Table 3.4 correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.
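For reference, Table 3.4 reports the multiple regression of sales on all three advertising budgets, so each p-value tests one coefficient while the other two media are held fixed. The null hypotheses can be written as follows, using the usual ISLR coefficient labels (the estimates and p-values themselves must still be read from the table):

```latex
\begin{aligned}
\text{sales} &= \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{radio} + \beta_3\,\text{newspaper} + \epsilon \\
H_0^{(0)} &: \beta_0 = 0 \quad \text{(expected sales are zero when all three budgets are zero)} \\
H_0^{(\text{TV})} &: \beta_1 = 0 \quad \text{(TV spending is not associated with sales, given radio and newspaper)} \\
H_0^{(\text{radio})} &: \beta_2 = 0 \quad \text{(radio spending is not associated with sales, given TV and newspaper)} \\
H_0^{(\text{newspaper})} &: \beta_3 = 0 \quad \text{(newspaper spending is not associated with sales, given TV and radio)}
\end{aligned}
```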

Q3: Applied Textbook Questions with Python (Section 3.7 Exercises)

Hint: several GitHub sites have complete solutions in Python, e.g.

8. This question involves the use of simple linear regression on the Auto data set.

(a) Perform a simple linear regression with mpg as the response and horsepower as the predictor. Print the results and comment on the output. For example:

i. Is there a relationship between the predictor and the response?

ii. How strong is the relationship between the predictor and the response?

iii. Is the relationship between the predictor and the response positive or negative?

iv. Predictions

  1. What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
  2. Plot the response against the predictor and display the least squares regression line.
  3. Produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit. 
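
A minimal sketch of the computations behind part (a), written with NumPy and SciPy rather than a regression package so that each quantity is visible. The helper names simple_ols and intervals_at are hypothetical; the formulas are the standard least squares ones, with the residual standard error estimated on n - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats


def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares and return the usual summary quantities."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = y - (b0 + b1 * x)
    rss = np.sum(resid ** 2)
    tss = np.sum((y - y_bar) ** 2)
    rse = np.sqrt(rss / (n - 2))                 # residual standard error
    se_b1 = rse / np.sqrt(sxx)
    t_b1 = b1 / se_b1
    p_b1 = 2 * stats.t.sf(abs(t_b1), df=n - 2)   # two-sided p-value for H0: b1 = 0
    return {"b0": b0, "b1": b1, "rse": rse, "r2": 1 - rss / tss,
            "t_b1": t_b1, "p_b1": p_b1, "n": n, "x_bar": x_bar, "sxx": sxx}


def intervals_at(fit, x0, level=0.95):
    """Point prediction, confidence interval for the mean, and prediction interval at x0."""
    t_crit = stats.t.ppf(0.5 + level / 2, df=fit["n"] - 2)
    y_hat = fit["b0"] + fit["b1"] * x0
    lever = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    se_mean = fit["rse"] * np.sqrt(lever)        # uncertainty in the fitted line only
    se_pred = fit["rse"] * np.sqrt(1 + lever)    # adds the irreducible error of a new car
    return (y_hat,
            (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean),
            (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred))
```

Assuming a local copy of Auto.csv (in the ISLR file, missing horsepower values are coded as "?"), the data could be loaded with pandas.read_csv("Auto.csv", na_values="?").dropna() and passed as intervals_at(simple_ols(auto.horsepower, auto.mpg), 98). The prediction interval is always wider than the confidence interval because it includes the residual variance of an individual car.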

9. This question involves the use of multiple linear regression on the Auto data set.

(a) Produce a scatterplot matrix which includes all of the variables in the data set.

(b) Compute the matrix of correlations between the variables.

(c) Perform a multiple linear regression with mpg as the response and all other variables except name as the predictors and print the results. Comment on the output. For instance:

i. Is there a relationship between the predictors and the response?

ii. Which predictors appear to have a statistically significant relationship to the response?

iii. What does the coefficient for the year variable suggest?

(d) Produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?

(e) Fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?

(f) Try a few different transformations of the variables, such as $\log(X)$, $\sqrt{X}$, and $X^2$. Comment on your findings.
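
For parts (c) to (f), a sketch of a multiple regression fit that reports per-coefficient p-values and the overall F-test, again using only NumPy and SciPy. The function name ols and its return layout are hypothetical choices for illustration, not an existing library API.

```python
import numpy as np
from scipy import stats


def ols(X, y, names):
    """Multiple linear regression with an intercept; returns a coefficient table and the overall F-test."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, p1 = X.shape                               # p1 = number of predictors + 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    sigma2 = rss / (n - p1)                       # estimate of the error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), df=n - p1)      # H0: beta_j = 0 for each coefficient
    # F-statistic for H0: all slope coefficients are zero, which answers (c) i.
    f = ((tss - rss) / (p1 - 1)) / (rss / (n - p1))
    f_p = stats.f.sf(f, p1 - 1, n - p1)
    table = {name: (b, s, pv) for name, b, s, pv in zip(["intercept", *names], beta, se, p)}
    return table, f, f_p
```

With the data in a pandas DataFrame named auto, pd.plotting.scatter_matrix(auto) and auto.drop(columns="name").corr() cover (a) and (b). For (e), an interaction is just an extra product column, e.g. ols(np.column_stack([hp, wt, hp * wt]), mpg, ["horsepower", "weight", "horsepower:weight"]); for (f), swap a column for np.log(hp), np.sqrt(hp), or hp ** 2 and compare R² and the residual plots.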

Optional

15. This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.

(a) For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

(b) Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis $H_0: \beta_j = 0$?

(c) How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

(d) Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$.
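
A sketch for parts (c) and (d), assuming the Boston predictors are in a NumPy array X (one column per predictor, with the response crim removed) and the response is in y. The helper names compare_coefficients and cubic_f_test are hypothetical. For (d) it uses a nested-model F-test of the linear fit against the cubic fit, which is one reasonable way to judge non-linearity; inspecting the individual p-values of the X^2 and X^3 terms is another.

```python
import numpy as np
from scipy import stats


def rss_poly(x, y, degree):
    """Least squares polynomial fit of the given degree and its residual sum of squares."""
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return coefs, resid @ resid


def compare_coefficients(X, y):
    """(c): slope from each simple regression vs. coefficient in the full multiple regression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns the highest power first, so index 0 is the slope.
    simple = np.array([np.polyfit(X[:, j], y, 1)[0] for j in range(X.shape[1])])
    design = np.column_stack([np.ones(len(y)), X])
    multiple = np.linalg.lstsq(design, y, rcond=None)[0][1:]   # drop the intercept
    return simple, multiple


def cubic_f_test(x, y):
    """(d): F-test of Y = b0 + b1 X (reduced) against Y = b0 + b1 X + b2 X^2 + b3 X^3 (full)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    _, rss_lin = rss_poly(x, y, 1)
    _, rss_cub = rss_poly(x, y, 3)
    df_full = len(y) - 4                          # four coefficients in the cubic model
    f = ((rss_lin - rss_cub) / 2) / (rss_cub / df_full)
    return f, stats.f.sf(f, 2, df_full)
```

Passing the two arrays from compare_coefficients to matplotlib's plt.scatter(simple, multiple) gives the plot requested in (c). The binary chas column should be skipped in (d), since a 0/1 variable has no distinct square or cube.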

Potential topics and titles for a journal article

 

Last week you started researching and writing an annotated bibliography. As you do so, you are starting to think about a topic, preferably related to your dissertation expertise, on which to write a journal article for publication. In this week's discussion, I would like everyone to start sharing their potential topics and potential article titles.

You must do the following:

1) Create a new thread. As indicated above, please provide potential topics for your journal article and potential titles.

Introduction to data mining

 

1) Rule-based classification refers to any classification scheme that makes use of IF-THEN rules for class prediction. Discuss rule-based classification schemes. What is rule pruning in data mining?

2) Bayesian classification is based on Bayes’ Theorem; Bayesian classifiers are statistical classifiers. What is Bayesian classification in data mining? How do Bayesian networks work? What do Bayesian networks predict?
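
For question 1, a minimal sketch of rule-based classification and rule pruning, assuming rules are kept as an ordered decision list of (attribute, value) tests. The pruning step uses the FOIL_Prune score (pos - neg) / (pos + neg), computed on a held-out pruning set, as in FOIL/RIPPER-style rule learners: the last conjunct is dropped while doing so raises the score. Function names and the record layout (a dict with a "class" key) are hypothetical.

```python
from typing import List, Tuple

Conditions = List[Tuple[str, str]]   # ordered (attribute, value) tests in the IF part


def covers(conditions: Conditions, record: dict) -> bool:
    """A rule fires when every attribute test in its IF part holds."""
    return all(record.get(attr) == value for attr, value in conditions)


def classify(rules: List[Tuple[Conditions, str]], record: dict, default: str) -> str:
    """Decision-list strategy: the first rule whose IF part matches decides the class."""
    for conditions, label in rules:
        if covers(conditions, record):
            return label
    return default


def foil_prune(conditions: Conditions, label: str, prune_set: List[dict]) -> float:
    """(pos - neg) / (pos + neg) over the pruning tuples the rule covers."""
    covered = [r for r in prune_set if covers(conditions, r)]
    if not covered:
        return -1.0                  # a rule that covers nothing is treated as worst
    pos = sum(1 for r in covered if r["class"] == label)
    neg = len(covered) - pos
    return (pos - neg) / (pos + neg)


def prune_rule(rule: Tuple[Conditions, str], prune_set: List[dict]) -> Tuple[Conditions, str]:
    """Remove trailing conjuncts while the shorter rule scores strictly higher."""
    conditions, label = list(rule[0]), rule[1]
    while len(conditions) > 1:
        shorter = conditions[:-1]
        if foil_prune(shorter, label, prune_set) > foil_prune(conditions, label, prune_set):
            conditions = shorter
        else:
            break
    return conditions, label
```

Pruning trades a little accuracy on the training data for a shorter, more general rule that tends to overfit less; the held-out set keeps that decision honest.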
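
For question 2, a sketch of the simplest Bayesian classifier, naive Bayes, on categorical attributes. It applies Bayes' theorem, P(C | X) proportional to P(X | C) P(C), and assumes attributes are independent given the class, with add-one (Laplace) smoothing. Bayesian networks generalize this by encoding conditional dependencies between attributes in a directed acyclic graph; the code below covers only the naive case. Function names and the record format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records, target):
    """Count class priors and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in records)
    value_counts = defaultdict(Counter)      # (class, attribute) -> Counter of values
    domains = defaultdict(set)               # attribute -> set of observed values
    for r in records:
        for attr, value in r.items():
            if attr != target:
                value_counts[(r[target], attr)][value] += 1
                domains[attr].add(value)
    return class_counts, value_counts, domains


def posterior(model, record):
    """P(C | X) for each class, assuming attributes are independent given the class."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)                          # log prior P(C)
        for attr, value in record.items():
            counts = value_counts[(c, attr)]
            # Laplace smoothing so an unseen value does not zero out the product
            score += math.log((counts[value] + 1) / (n_c + len(domains[attr])))
        log_scores[c] = score
    # Normalize in a numerically stable way: P(C|X) = P(X|C)P(C) / sum over classes
    m = max(log_scores.values())
    weights = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}
```

The classifier predicts the class with the largest posterior probability.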

Research Paper

  

Self-Driving Cars: Autonomous vehicles require enormous data processing capabilities and the system speeds needed to mimic the timing of human reflexes. Companies like Ouster are developing light imaging, detection, and ranging (lidar) applications that are key to standard improvements like lane control and adaptive cruise control, but that are also turning fully autonomous vehicles into mobile data centers, allowing driverless cars to make complex decisions in real time. Is this a good thing? Will society ultimately benefit from these technologies? Or will this trend put hundreds of thousands of Americans who drive for a living out of work?

Format

  

Components: Each Residency Project Paper will require that several parts be submitted in whole. 

PART 1: INTRODUCTION – Your team must write an introduction section that introduces your topic. 500 to 800 words

PART 2: HISTORY – Your team must write a history section. How did your research topic come to be over time? When was it introduced? What social shortcoming did it resolve? Why did it become popular or trendy? 750 to 1000 words

PART 3: ADVANCEMENTS and FAILURES – How did the first versions or early adopters perform? What portions of your subject did well? What failed? What aspects have been added over time? What has been eliminated? 750 to 1000 words

PART 4: FINAL CONCLUSION – Do you think your subject topic will grow? Do you think it will become obsolete? Why? What do you see as the future of your topic? Do you have any other recommendations? Will it be replaced? 500 to 800 words

PART 5: ACCOMPANYING POWERPOINT PRESENTATION – Your team will have a maximum of 20 minutes to present a PowerPoint presentation of your project. You must have a minimum of 5 slides but no more than 10 slides. The presentation should be interesting, brief, and informative: just the facts, no fluff or extra verbiage. Think of your presentation as a movie trailer: short and to the point.

APA FORMAT & REFERENCES PAGE – Include a minimum of FIVE solid references. Your team's research paper is to be written in complete and clean APA format. Your references do NOT have to be scholarly; they can be commercial media references, but they should come from professional organizations or magazine articles. NO WIKIPEDIA.

The INTRO, HISTORY, and ADVANCEMENTS/FAILURES sections will be due throughout the course BEFORE the Residency. Your Team will collectively assemble all of the written sections, adding the Conclusion and the References Page as ONE RESEARCH PAPER for the Team and submit it on SUNDAY of the residency weekend.

Business Intelligence-Discussion

 

Data mining is a complex subject dominated by emerging technologies and privacy regulations. Consumers gained better control over their personal data when the General Data Protection Regulation (GDPR) became enforceable on May 25, 2018. Under GDPR, profiling is defined as any form of automated processing of personal data that analyzes or predicts aspects of an individual's behavior, socioeconomic situation, movements, preferences, health, and so forth.

In your initial post, describe two major impacts that GDPR has on the process and practice of data mining.

Respond substantively to at least two other students’ posts. Comment on how GDPR has changed the way in which every business stores, processes, transfers, and analyzes its data based on the impacts discussed in your classmate’s initial post.

*please remember to include at least one credible scholarly reference with your initial post!

Data Analyzing & Visualization

 

Reflect on your most recent visualisation project and try to sketch or write out the approach you took. What stages of activity did you undertake and in what sequence? Did it feel efficient or chaotic? Was it interrupted by changes, uncertainty or a sense of too much choice? Before you can seek to improve your ongoing approach it is worth unpicking what you currently do and how you do it.

Assignment Link: http://book.visualisingdata.com/chapter/chapter-2

Paper

Write an essay of 500 words or more discussing sqlmap, an automated tool for SQL injection and database takeover. Why do we need an automated tool for SQL injection?

Do not copy without providing proper attribution. This paper will be evaluated through SafeAssign. 

Write in essay format, not in outline, bulleted, numbered, or other list format.

Use the five paragraph format. Each paragraph must have at least five sentences. 

Include an interesting meaningful title.

Include at least one quote from each of 3 different articles. Place the words you copied (do not alter or paraphrase the words) in quotation marks and cite them in-line (as all work copied from another should be handled). Each quote should be one full sentence (no more, no less) and should be incorporated in your discussion (quotes do not replace your discussion) to illustrate or emphasize your ideas. Each quote must be cited in-line and at the end.

Cite your sources in a clickable reference list at the end. Do not copy without providing proper attribution (quotation marks and in-line citations). Write in essay format, not in bulleted, numbered, or other list format.

It is important that you use your own words, that you cite your sources, and that you comply with the instructions regarding the length of your submission. Do not use Spinbot or other word replacement software. It usually results in nonsense and is not a good way to learn anything. I will not spend a lot of my time trying to decipher nonsense. Proofread your work or have it edited. Find something interesting and/or relevant to your work to write about. Please do not submit attachments unless requested.

Should be in APA format with references and citations.
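
To make the "why automate" question concrete, the following sketch shows the class of flaw that sqlmap probes for, using Python's built-in sqlite3 module and a hypothetical users table. It does not use sqlmap itself; it only contrasts a query built by string concatenation with a parameterized one.

```python
import sqlite3


def make_db():
    """In-memory demo database; plaintext passwords only to keep the example short."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "s3cret"), ("bob", "hunter2")])
    return conn


def login_vulnerable(conn, name, password):
    # String formatting lets attacker-controlled input change the SQL statement itself.
    query = f"SELECT name FROM users WHERE name = '{name}' AND password = '{password}'"
    return conn.execute(query).fetchall()


def login_safe(conn, name, password):
    # Placeholders pass the values separately, so they are never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()
```

With the password "' OR '1'='1", the vulnerable query becomes ... AND password = '' OR '1'='1', which is true for every row, while the parameterized version simply finds no matching password. A tool like sqlmap automates the trial and error of discovering such inputs across many parameters, injection techniques, and database back ends, which is tedious and error-prone to do by hand.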

Assig M

Please make sure to read Chapter 3, "The Quality of Social Simulation: An Example from Research Policy Modelling," by Petra Ahrweiler and Nigel Gilbert.
Chapter 3 discusses methods to assess the quality of simulations. You learned about three different views of simulation quality.

Suppose you lead a task force that is developing a simulation to provide strategic planning recommendations for property use zoning for a county of 750,000 residents. The zoning board and county commissioners want a simulation that allows them to assess the impact of various zoning decisions based on a variety of dynamic factors, including age, race, education, and income status. Which of the three views discussed would provide the best quality assessment for this type of simulation? How would you ensure the highest level of accuracy with your simulation, and how would you go about determining accuracy?

As indicated above, identify which of the three views discussed in the chapter would provide the best quality assessment for the situation described above, and explain your decision. How would you ensure the highest level of accuracy with your simulation, and how would you go about determining accuracy?

Anonymous

 

The development of a database requires a thorough methodology that ensures quality within the solution. Imagine you have been contracted to develop a finance database that will help an organization track monthly expenditures by department. Using the Database Life Cycle (DBLC), discuss the activities you would have to complete in each phase. Assess possible challenges that may exist within each phase, and suggest actions that can be taken to overcome them.
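
As an illustration of what the design and implementation phases might produce for this scenario, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical. Amounts are stored as integer cents to avoid floating-point rounding, and constraints (NOT NULL, CHECK, REFERENCES) catch bad data at entry rather than in later reports.

```python
import sqlite3

SCHEMA = """
CREATE TABLE department (
    dept_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE expenditure (
    exp_id       INTEGER PRIMARY KEY,
    dept_id      INTEGER NOT NULL REFERENCES department(dept_id),
    spent_on     TEXT NOT NULL,                      -- ISO date, e.g. 2024-03-15
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
    description  TEXT
);
CREATE INDEX idx_exp_dept_date ON expenditure(dept_id, spent_on);
"""

# Monthly spending per department: the core report this database exists to serve.
MONTHLY_TOTALS = """
SELECT d.name, strftime('%Y-%m', e.spent_on) AS month, SUM(e.amount_cents) AS total_cents
FROM expenditure AS e JOIN department AS d USING (dept_id)
GROUP BY d.name, month
ORDER BY d.name, month
"""


def build(conn):
    """Create the schema; SQLite enforces REFERENCES only when foreign_keys is on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

Testing the report query against sample data during the testing and evaluation phase is a cheap way to confirm the design actually answers the organization's question.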

In order to mitigate risks associated with a database, it is essential to consider common sources of database failures. Describe at least two possible database failures that may occur once a database is placed into operation. Suggest actions that may be performed in order to avoid or mitigate these possible failures.
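
Two common failure sources are partially applied updates (for example, a batch load that fails halfway) and loss or corruption of the database file itself. A sketch of one mitigation for each, using Python's sqlite3 module: a transaction so a batch is applied all-or-nothing, and the Connection.backup method (Python 3.7+) for an online copy. The simplified table layout and function names are hypothetical.

```python
import sqlite3


def make_db():
    """Simplified, hypothetical expenditure table with a data-quality constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE expenditure (
        dept TEXT NOT NULL,
        spent_on TEXT NOT NULL,
        amount_cents INTEGER NOT NULL CHECK (amount_cents > 0))""")
    return conn


def load_batch(conn, rows):
    """Insert a batch all-or-nothing; one bad row rolls back the whole batch."""
    try:
        with conn:   # commits on success, rolls back if an exception escapes the block
            conn.executemany(
                "INSERT INTO expenditure (dept, spent_on, amount_cents) VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


def backup_to(conn, path):
    """Copy the live database to another file (or ':memory:') without taking it offline."""
    target = sqlite3.connect(path)
    conn.backup(target)
    return target
```

In production, backups would be scheduled, stored off the primary machine, and periodically restored as a test, since an untested backup is not a reliable mitigation.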

Cryptography Keys

 

Cryptography provides confidentiality, integrity, authentication, and nonrepudiation for sensitive information while it is stored (at rest), traveling across a network (in transit), and existing in memory (in use). Cryptographic keys play a central role in data security and are an extremely important security technology embedded in many of the security controls used to protect information from unauthorized visibility and use.
Let's say you work in one of the following industries:

  • Manufacturing 
  • Government 
  • Research 
  • Service 
  • Consulting 

After you choose one of the above, consider the three types of algorithms commonly used today. Which do you find to be the most secure? Which is the most complex? Which did you struggle to understand? What do you think you need to know as a manager in order to choose the right security systems for your company?  Be sure to fully develop your responses and support your opinion with reasons from your study this week.
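
A sketch contrasting the algorithm families, assuming the three types meant here are the usual hash, symmetric-key, and asymmetric (public-key) algorithms. Python's standard library has no block cipher such as AES (that needs a third-party package, e.g. the cryptography library's Fernet recipe), so HMAC stands in for the symmetric-key family: it shows shared-key integrity, not confidentiality. The RSA part uses the well-known textbook parameters p = 61, q = 53, e = 17 and is insecure by design, purely for illustration.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hash: no key, fixed-size digest, infeasible to reverse to recover the input."""
    return hashlib.sha256(data).hexdigest()


def mac_tag(key: bytes, msg: bytes) -> str:
    """Symmetric-key primitive: both sides share one secret key to create and check tags."""
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, msg: bytes, tag: str) -> bool:
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(mac_tag(key, msg), tag)


def toy_rsa_keypair(p: int = 61, q: int = 53, e: int = 17):
    """Textbook RSA with tiny primes; real keys use 2048+ bit moduli and padding."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)              # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)            # (public key, private key)


def rsa_apply(key, m: int) -> int:
    """Encrypt with the public key or decrypt with the private key: m^exponent mod n."""
    n, exponent = key
    return pow(m, exponent, n)
```

Here the public key (3233, 17) turns the message 65 into 2790, and only the private exponent 2753 turns it back. For a manager, the practical takeaway is that hashes protect integrity of stored values such as passwords, symmetric keys are fast but must be shared securely, and public-key algorithms solve key distribution at a higher computational cost.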