Abstract
With the world economy recovering and activity in the second-hand market increasing, the second-hand sailboat market shows considerable potential. To help a Hong Kong second-hand sailboat broker better understand the market and predict prices accurately, we supplemented the given data with additional relevant data on second-hand sailboats and built a multiple linear regression model, along with other models, on the combined data.
In Task 1, we first supplemented the existing data by adding six variables. We then cleaned and encoded the data and tested for multicollinearity, which we found to be present. We used stepwise regression for feature selection to reduce the multicollinearity, then fitted a multiple linear regression model and calculated the mean absolute error for different variants of sailboats. We used the Shapiro-Wilk test to check whether the errors were normally distributed and found only a slight deviation from normality, which we judged acceptable for the regression model. The results showed that Length, Make, and Year had the greatest impact on the listing price of each sailboat, and that the model estimated the prices of different sailboat variants with high accuracy.
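The multicollinearity check and error metric described for Task 1 can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the feature names (`length`, `beam`, `age`) are assumptions made for illustration, and dropping the collinear column by hand stands in for the stepwise selection step. The Shapiro-Wilk test is not covered here.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2),
    where R^2 comes from regressing that column on the others."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        target = X[:, j]
        resid = target - predict(fit_ols(others, target), others)
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic listings (hypothetical values, not the paper's data).
rng = np.random.default_rng(0)
n = 200
length = rng.uniform(35, 55, n)                # feet
beam = 0.3 * length + rng.normal(0, 0.2, n)    # nearly collinear with length
age = rng.uniform(1, 20, n)                    # years
price = 5000 * length - 3000 * age + rng.normal(0, 5000, n)

# VIFs before and after removing the collinear feature.
vifs_full = vif(np.column_stack([length, beam, age]))
X_sel = np.column_stack([length, age])
vifs_sel = vif(X_sel)

# Fit the reduced model and compute the mean absolute error.
coef = fit_ols(X_sel, price)
mae = np.mean(np.abs(price - predict(coef, X_sel)))
```

A common rule of thumb treats a VIF above about 10 as a sign of problematic multicollinearity; in this sketch the `length` and `beam` columns exceed that threshold until one of them is removed.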
In Task 2, we conducted one-way ANOVA on the regions, calculated the between-group differences and total variation of each feature, and then grouped and summarized the data to obtain the average price for each feature. We found that region had an impact on price and that the regional effect was not consistent across sailboat variants. In addition, we found that the lengths of second-hand sailboats differed between countries, with most falling between 40 ft and 50 ft.
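The one-way ANOVA used in Task 2 compares between-group variation with within-group variation. The following is a minimal sketch of that calculation, assuming three hypothetical regions with made-up prices (in thousands of USD); the function name `one_way_anova_f` is illustrative and not from the paper.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group SS, within-group SS)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group means vs. the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: observations vs. their group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between, ss_within

# Hypothetical prices for three regions (illustrative only).
region_a = [10, 12, 14]
region_b = [20, 22, 24]
region_c = [30, 32, 34]

f_stat, ss_between, ss_within = one_way_anova_f([region_a, region_b, region_c])
ss_total = ss_between + ss_within
print(f_stat, ss_between, ss_within, ss_total)  # 75.0 600.0 24.0 624.0
```

A large F statistic means the regional means differ by more than the spread within each region would explain, which is the sense in which region affects price.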
In Task 3, we first made box plots of the average prices in different geographical regions, then made radar charts of each feature in the three regions, and finally re-fitted the multiple linear regression model with a 0-1 variable indicating whether each sailboat type was sold in Hong Kong. We found that the second-hand sailboat markets in the United States and Hong Kong were similar. Sailboat prices in Hong Kong were generally higher than those in other regions, and the regional effect differed between monohulled sailboats and catamarans.
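The use of a 0-1 variable in Task 3 can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: the data are synthetic, `in_hk` is a hypothetical indicator column, and a single covariate (`length`) stands in for the full feature set.

```python
import numpy as np

# Synthetic listings (hypothetical values, not the paper's data).
rng = np.random.default_rng(1)
n = 300
length = rng.uniform(35, 55, n)       # feet
in_hk = rng.integers(0, 2, n)         # 1 = sold in Hong Kong, 0 = elsewhere
price = (20000 + 4000 * length + 15000 * in_hk
         + rng.normal(0, 3000, n))

# Design matrix: intercept, length, Hong Kong indicator.
A = np.column_stack([np.ones(n), length, in_hk])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)

# The indicator's coefficient estimates the average price difference
# for Hong Kong listings at a given length.
hk_premium = coef[2]
print(round(hk_premium))
```

Fitting such a model separately for monohulled sailboats and catamarans, or adding an interaction term between the indicator and hull type, is one way to examine whether the effect differs between the two.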
In Task 4, we calculated the average prices of second-hand sailboats in each country, visualized the data, and established a map model. We found that GDP per capita (USD) and GDP (USD billion) differed significantly in their effect on sailboat length.
In Task 5, using second-hand sailboat transaction data that we collected for Hong Kong, we established a model for the transaction price and frequency of sailboats according to their single or double type. We concluded that brokers should pay attention to the main features of second-hand sailboats, to successful cases in the US second-hand sailboat market, and to the price range of sailboats.
To improve our model, we adopted a random forest model, which can handle nonlinear relationships, is more robust, reduces overfitting, and can increase prediction accuracy.
Publisher
Darcy & Roy Press Co. Ltd.