You have been approached by the airline Qantas to help them make a decision on the type of service they should introduce for their Brisbane customers wanting to fly to New York via Los Angeles.
Qantas already operates one daily return service between Brisbane (BNE) and Los Angeles (LAX). QF15 (BNE to LAX) lands in Los Angeles at 6:25am local time, and QF16 (LAX to BNE) departs Los Angeles at 11:45pm local time. On average, QF15 lands 13 minutes early and QF16 departs 29 minutes late.
Qantas has two options for introducing a service to New York. First, it can introduce its own Qantas-operated service between New York and LAX. Second, it can recommend to passengers a connecting airline that serves New York, which they would book through Qantas. Whichever option is selected, Qantas would like you to ensure that there is a gap of two to four hours between the average arrival/departure time of the Qantas flight and the relevant service to/from New York.
Important facts about flights between New York and Los Angeles:
- The flight time between New York and Los Angeles is 6 hours in both directions.
- New York's time zone is 3 hours ahead of Los Angeles.
- Both cities are within the United States, so passengers do not have to pass through customs when travelling between them.
- New York has three airports, which are served by many carriers.
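The connection windows implied by these facts can be worked out before looking at the data. The sketch below assumes the two-to-four-hour gap is measured from QF15's average arrival (6:25am minus 13 minutes) and QF16's average departure (11:45pm plus 29 minutes), that the 6-hour flight time is gate to gate, and that the calendar date is arbitrary; all names are illustrative.

```python
from datetime import datetime, timedelta

# Facts from the brief; every clock time below is Los Angeles local time.
# The calendar date is arbitrary and only lets times roll past midnight.
QF15_SCHED_ARRIVAL = datetime(2000, 1, 1, 6, 25)
QF15_AVG_EARLY = timedelta(minutes=13)
QF16_SCHED_DEPARTURE = datetime(2000, 1, 1, 23, 45)
QF16_AVG_LATE = timedelta(minutes=29)
FLIGHT_TIME = timedelta(hours=6)      # New York <-> Los Angeles, both ways
NY_AHEAD_OF_LA = timedelta(hours=3)   # New York clock = LA clock + 3 h
MIN_GAP, MAX_GAP = timedelta(hours=2), timedelta(hours=4)


def outbound_window():
    """LAX -> New York leg after QF15.

    Returns ((earliest, latest) LAX departure in LA time,
             (earliest, latest) New York arrival in NY time).
    """
    avg_arrival = QF15_SCHED_ARRIVAL - QF15_AVG_EARLY
    dep_lax = (avg_arrival + MIN_GAP, avg_arrival + MAX_GAP)
    arr_ny = tuple(t + FLIGHT_TIME + NY_AHEAD_OF_LA for t in dep_lax)
    return dep_lax, arr_ny


def inbound_window():
    """New York -> LAX leg before QF16.

    Returns ((earliest, latest) New York departure in NY time,
             (earliest, latest) LAX arrival in LA time).
    """
    avg_departure = QF16_SCHED_DEPARTURE + QF16_AVG_LATE
    arr_lax = (avg_departure - MAX_GAP, avg_departure - MIN_GAP)
    # Subtract the flight time (still LA time), then convert to NY time.
    dep_ny = tuple(t - FLIGHT_TIME + NY_AHEAD_OF_LA for t in arr_lax)
    return dep_ny, arr_lax


def hhmm(pair):
    """Format a (start, end) pair of datetimes as 'HH:MM to HH:MM'."""
    return " to ".join(t.strftime("%H:%M") for t in pair)


if __name__ == "__main__":
    dep_lax, arr_ny = outbound_window()
    print("Outbound: leave LAX", hhmm(dep_lax), "(LA), arrive NY", hhmm(arr_ny), "(NY)")
    dep_ny, arr_lax = inbound_window()
    print("Inbound: leave NY", hhmm(dep_ny), "(NY), arrive LAX", hhmm(arr_lax), "(LA)")
```

Under these assumptions, the window that matters for the supplied data (flights leaving New York) is a New York departure between 17:14 and 19:14 local time, for a flight that leaves on time; each airline's own average delay from the data would shift this window.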
Qantas has supplied you with six datasets in CSV format containing information about the flights leaving three New York City airports on a specified day. The CSV files also contain some additional information that may be useful for your analysis.
Qantas would like you to analyze only this data and, at a minimum, report on:
- What services currently exist between each New York City airport and Los Angeles (LAX)?
- How does the performance of each New York airport compare?
- How does the performance of each airline compare?
- What factors may affect the performance of particular airlines and New York airports, and is there any evidence of association?
- What suitable options are available to Qantas based on the information available, and what would you recommend to Qantas?
- What further data would help them better inform their decision?
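The first three questions reduce to filtering and grouping the flight records. The sketch below assumes hypothetical column names (origin, dest, carrier, dep_delay), because the brief does not give the CSV headers, and uses mean departure delay in minutes as one possible performance measure; on-time percentage or arrival delay would be equally valid choices.

```python
import csv
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical column names: the real CSV headers are not given in the brief.
ORIGIN, DEST, CARRIER, DEP_DELAY = "origin", "dest", "carrier", "dep_delay"


def load_rows(path):
    """Read one CSV file into a list of dicts (one dict per flight)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def lax_services(rows):
    """Count flights to LAX for each (New York airport, airline) pair."""
    return Counter((r[ORIGIN], r[CARRIER]) for r in rows if r[DEST] == "LAX")


def mean_delay_by(rows, key):
    """Average departure delay (minutes) grouped by one column, e.g. ORIGIN or CARRIER.

    Rows with a blank or non-numeric delay are skipped rather than treated as 0.
    """
    groups = defaultdict(list)
    for r in rows:
        try:
            delay = float(r[DEP_DELAY])
        except (ValueError, TypeError):
            continue
        groups[r[key]].append(delay)
    return {k: mean(v) for k, v in groups.items()}
```

Calling `mean_delay_by(rows, ORIGIN)` compares airports and `mean_delay_by(rows, CARRIER)` compares airlines; restricting `rows` to LAX flights first keeps the comparison on the route Qantas cares about.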
How to approach the assignment
You have to work only within the data provided; in other words, you cannot consider any information other than that given in the CSV files. If you need to make an assumption while analyzing the data, it should be meaningful.
You will be required to explore the data provided, improve its quality, and process it for reporting and mining purposes. Once the data is ready for analysis (i.e. integrated and cleaned), you should set the roles of the variables in your chosen software. The variable roles should also match the type of analysis you choose, such as decision tree, clustering or association analysis. Perform the relevant data analytics operations and report the meaningful results.
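Integrating the six files usually means aligning headers, trimming stray whitespace, making codes consistent and dropping duplicate rows. A minimal sketch using only Python's standard library, assuming the files share the same columns; the file pattern `flights*.csv` and the code columns listed are hypothetical.

```python
import csv
import glob

# Hypothetical columns holding airport/airline codes; adjust to the real files.
CODE_COLUMNS = ("origin", "dest", "carrier")


def normalise_row(row):
    """Trim whitespace, lower-case header names and upper-case code columns."""
    # DictReader stores surplus fields under the key None; drop them.
    clean = {k.strip().lower(): (v or "").strip() for k, v in row.items() if k is not None}
    for col in CODE_COLUMNS:
        if col in clean:
            clean[col] = clean[col].upper()
    return clean


def integrate(paths):
    """Merge several CSV files into one list of rows, dropping exact duplicates."""
    seen, merged = set(), []
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                clean = normalise_row(row)
                key = tuple(sorted(clean.items()))
                if key not in seen:
                    seen.add(key)
                    merged.append(clean)
    return merged


if __name__ == "__main__":
    rows = integrate(sorted(glob.glob("flights*.csv")))  # hypothetical file names
    print(len(rows), "unique rows after integration")
```

Exact-duplicate removal only catches rows that are identical after normalisation; near-duplicates (for example the same flight with a differently formatted time) still need a rule based on the real columns.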
Overall, the aim of the assignment is to produce a 1,750-word report that addresses the questions Qantas would like answered and identifies any trends in the data. To do this you will need to:
- Clean the data
- Reformat the data to remove redundancy and inconsistency
- Remove outliers from the data (this is easiest to do in Excel using the Z-score method)
- Determine the variables and their roles (e.g. categorical or numerical, input or target)
- Build a cube, perform multivariate analysis and discuss the outcome
- Import the data into a data analytics package, such as Orange
- Use software to manipulate the data, calculate descriptive statistics, make tables and graphs, and use data-mining tools to identify patterns and trends in the data
- Make recommendations to the company based on what you find in the data
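The Z-score method flags a value x as an outlier when |x - mean| / standard deviation exceeds a chosen threshold. In Excel this is typically done with STANDARDIZE together with AVERAGE and STDEV.P; an equivalent Python sketch follows, assuming a threshold of 3 (a common convention, not one the brief specifies) and a numeric column such as a hypothetical `dep_delay`.

```python
from statistics import mean, pstdev


def split_outliers(rows, column, threshold=3.0):
    """Split rows into (kept, outliers) using the Z-score method.

    z = (x - mean) / population standard deviation (as Excel's STDEV.P);
    |z| > threshold marks an outlier. Rows whose value is missing or
    non-numeric are kept unchanged so they can be handled during cleaning.
    """
    values = []
    for r in rows:
        try:
            values.append(float(r[column]))
        except (ValueError, TypeError, KeyError):
            pass
    if len(values) < 2:
        return list(rows), []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values identical: nothing can be an outlier
        return list(rows), []
    kept, outliers = [], []
    for r in rows:
        try:
            z = (float(r[column]) - mu) / sigma
        except (ValueError, TypeError, KeyError):
            kept.append(r)
            continue
        (outliers if abs(z) > threshold else kept).append(r)
    return kept, outliers
```

The returned `outliers` list corresponds to the rows that would go into outliers.csv, and `kept` to the rows carried forward into the import files. Swapping `pstdev` for `statistics.stdev` matches Excel's STDEV.S instead.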
You are to submit the following files, compressed as assignment2.zip, through Blackboard:
- CSV files containing the clean data you used for analysis (import1.csv, import2.csv, etc.)
- A CSV file containing only the data you identified as outliers and removed from your dataset (outliers.csv)
- A 1,750-word report (±20%) containing the following sections (report.pdf):
  - Data Summary (quality improvement: errors and outliers removed, etc.)
  - Existing Services
  - Airport Performance
  - Airline Performance
  - Cube-based Multivariate Analysis and Data-mining-based Trends Influencing Performance
  - Business Options and Recommendations
  - Further Data Required
  - Description of How the Software Was Used
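For the cube-based multivariate analysis, a data cube aggregates a measure over every combination of dimensions, including the roll-ups where a dimension is collapsed to "ALL" (as SQL's CUBE operator does). Excel PivotTables or Orange would normally be used; the sketch below is a hypothetical standard-library equivalent that works with any column names passed in.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def build_cube(rows, dimensions, measure):
    """Mean of `measure` for every combination of `dimensions`, including roll-ups.

    A rolled-up dimension is shown as "ALL". Rows with a missing or
    non-numeric measure are skipped.
    Returns {(value_or_ALL, ...): (count, mean)}.
    """
    cells = defaultdict(list)
    for r in rows:
        try:
            x = float(r[measure])
        except (ValueError, TypeError, KeyError):
            continue
        # Each row contributes to 2**len(dimensions) cells: one per subset kept.
        for k in range(len(dimensions) + 1):
            for kept in combinations(range(len(dimensions)), k):
                key = tuple(r[d] if i in kept else "ALL" for i, d in enumerate(dimensions))
                cells[key].append(x)
    return {key: (len(v), mean(v)) for key, v in cells.items()}
```

For example, with hypothetical columns, `build_cube(rows, ("origin", "carrier"), "dep_delay")` gives per-airport figures under keys like `("JFK", "ALL")`, per-airline figures under `("ALL", "AA")`, and the overall figure under `("ALL", "ALL")`; adding a third dimension such as departure hour extends the cube without changing the code.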