This work proposes an algorithm that determines optimal search keyword combinations for querying online product data sources, with the goal of minimizing identification errors during product feature extraction. Data-driven product design methodologies that acquire and mine online product-feature-related data face two fundamental challenges: (1) determining which search keywords return relevant product-related data and (2) determining how many search keywords are sufficient to minimize identification errors during product feature extraction. These challenges arise because online data, which is primarily textual, may violate the statistical assumption that the samples returned for a query are independent and identically distributed. Existing design methodologies rely on predetermined search terms to acquire textual data online, which makes the quality of the acquired data a function of the quality of the search terms themselves. The lack of independence and identical distribution in online text further degrades the quality of the acquired data. For example, a designer may search for a product feature using the term “screen,” which may return relevant results such as “the screen size is just perfect” but also irrelevant noise such as “researchers should really screen for this type of error.” A text mining algorithm is introduced that determines, without labeled training data, the search terms that maximize the veracity of the acquired data so that valid conclusions can be drawn. A case study involving real-world smartphones is used to validate the proposed methodology.
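
The ambiguity of the term “screen” can be made concrete with a minimal sketch. This is not the authors' algorithm: the toy corpus (apart from the two sentences quoted above), the conjunctive matching rule, and the function names `retrieve` and `keyword_combinations` are all assumptions made for illustration. It only shows why the result set depends on which keywords, and how many, are combined in a query.

```python
from itertools import combinations

# Toy corpus (hypothetical). The first two sentences are the examples quoted
# in the abstract; the last two are invented for illustration.
corpus = [
    "the screen size is just perfect",
    "researchers should really screen for this type of error",
    "the phone screen has a bright display",
    "we screen every applicant for errors",
]


def retrieve(keywords, documents):
    """Return the documents that contain every keyword as a whole word."""
    return [d for d in documents if all(k in d.split() for k in keywords)]


def keyword_combinations(candidates, max_size):
    """Enumerate all keyword combinations of size 1 up to max_size."""
    for size in range(1, max_size + 1):
        yield from combinations(candidates, size)


# A single ambiguous keyword matches relevant and irrelevant text alike.
single = retrieve(("screen",), corpus)

# Adding a second keyword narrows the result set.
paired = retrieve(("screen", "size"), corpus)

# Candidate queries built from three hypothetical terms, up to two per query.
queries = list(keyword_combinations(["screen", "size", "phone"], 2))
```

Here the query `("screen",)` matches all four sentences, including the two unrelated to a product feature, while `("screen", "size")` matches only the first. Scoring each candidate combination without labeled data is the part the proposed algorithm addresses and is not attempted in this sketch.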
