HCI Bibliography : Search Results
Database updated: 2016-05-10 Searches since 2006-12-01: 32,270,337
Hosted by ACM SIGCHI
The HCI Bibliography was moved to a new server on 2015-05-12 and again on 2016-01-05, substantially degrading the environment for making updates.
There are no plans to add to the database.
Please send questions or comments to director@hcibib.org.
Query: ma_w* Results: 70 Sorted by: Date
Records: 1 to 25 of 70
[1] Highly Successful Projects Inhibit Coordination on Crowdfunding Sites Backstage of Crowdsourcing Legitimacy, Performance and Crowd Support / Solomon, Jacob / Ma, Wenjuan / Wash, Rick Proceedings of the ACM CHI'16 Conference on Human Factors in Computing Systems 2016-05-07 v.1 p.4568-4572
ACM Digital Library Link
Summary: Donors on crowdfunding sites must coordinate their actions to identify and collectively fund projects before their deadlines. Some projects receive vast support immediately upon launch, while other seemingly worthwhile projects have only modest success, or none, at raising funds. We examine how the presence of high-performing "superstar" projects on a crowdfunding site affects donors' ability to coordinate their actions and fund other less popular but still worthwhile projects on the site. In a lab experiment in which users simulate the dynamics of a crowdfunding site, we found that superstar projects reduce the likelihood that other projects are funded by the crowd, even when the superstar project has no opportunity to steal away donations from other projects. We argue that this is because superstar projects set too high a standard for what a "fundable" project looks like, leading donors to underestimate the amount of support within a crowd for less exceptional projects.

[2] Automatic Image Dataset Construction from Click-through Logs Using Deep Neural Network Session 9: Deep Learning and Multimedia / Bai, Yalong / Yang, Kuiyuan / Yu, Wei / Xu, Chang / Ma, Wei-Ying / Zhao, Tiejun Proceedings of the 2015 ACM International Conference on Multimedia 2015-10-26 p.441-450
ACM Digital Library Link
Summary: Labelled image datasets are the backbone of high-level image understanding tasks with wide application scenarios, and continuously drive and evaluate progress in feature design and supervised learning models. Recently, million-scale labelled image datasets have further contributed to the rebirth of deep convolutional neural networks, which bypass manually designed handcrafted features. However, the construction of an image dataset remains mainly manual and quite labor-intensive, often taking years of effort to build a million-scale dataset of high quality. In this paper, we propose a deep-learning-based method to construct large-scale image datasets automatically. Specifically, word representations and image representations are learned in a deep neural network from a large volume of click-through logs, and are further used to define word-word similarity and image-word similarity. These two similarities automate the two labor-intensive steps of manual image dataset construction: query formation and noisy-image removal. With a newly proposed cross-convolutional-filter regularizer, we can construct a million-scale image dataset in one week. Finally, two image datasets are constructed to verify the effectiveness of the method. In addition to its scale, the automatically constructed dataset has accuracy, diversity, and cross-dataset generalization comparable to manually labelled image datasets.
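The word-word similarity step the abstract describes can be illustrated with a toy sketch (hypothetical embeddings and function names, not the authors' code): cosine similarity over learned word vectors ranks candidate words for automatic query formation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(word, embeddings, k=2):
    """Rank the other vocabulary words by word-word similarity to `word`."""
    scores = [(w, cosine(embeddings[word], v))
              for w, v in embeddings.items() if w != word]
    scores.sort(key=lambda t: t[1], reverse=True)
    return [w for w, _ in scores[:k]]

# Toy vectors standing in for embeddings learned from click-through logs.
emb = {
    "cat":    [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "car":    [0.0, 0.1, 0.9],
}
print(expand_query("cat", emb, k=1))  # "kitten" is the nearest neighbour
```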

[3] LightLDA: Big Topic Models on Modest Computer Clusters Technical Papers 2 / Yuan, Jinhui / Gao, Fei / Ho, Qirong / Dai, Wei / Wei, Jinliang / Zheng, Xun / Xing, Eric Po / Liu, Tie-Yan / Ma, Wei-Ying Proceedings of the 2015 International Conference on the World Wide Web 2015-05-18 v.1 p.1351-1361
ACM Digital Library Link
Summary: When building large-scale machine learning (ML) programs, such as massive topic models or deep neural networks with up to trillions of parameters and training examples, one usually assumes that such massive tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners and academic researchers. We consider this challenge in the context of topic modeling on web-scale corpora, and show that with a modest cluster of as few as 8 machines, we can train a topic model with 1 million topics and a 1-million-word vocabulary (for a total of 1 trillion parameters), on a document collection with 200 billion tokens -- a scale not yet reported even with thousands of machines. Our major contributions include: 1) a new, highly-efficient O(1) Metropolis-Hastings sampling algorithm, whose running cost is (surprisingly) agnostic of model size, and empirically converges nearly an order of magnitude more quickly than current state-of-the-art Gibbs samplers; 2) a model-scheduling scheme to handle the big model challenge, where each worker machine schedules the fetch/use of sub-models as needed, resulting in a frugal use of limited memory capacity and network bandwidth; 3) a differential data-structure for model storage, which uses separate data structures for high- and low-frequency words to allow extremely large models to fit in memory, while maintaining high inference speed. These contributions are built on top of the Petuum open-source distributed ML framework, and we provide experimental evidence showing how this development puts massive data and models within reach on a small cluster, while still enjoying proportional time cost reductions with increasing cluster size.
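The O(1) Metropolis-Hastings idea can be illustrated on a toy discrete distribution (a minimal sketch, not the LightLDA sampler): with a symmetric uniform proposal, each step costs O(1) regardless of the number of states, and the chain's occupancy still converges to the target distribution.

```python
import random

def mh_discrete(target_weights, steps=20000, seed=0):
    """Metropolis-Hastings over a discrete target given by unnormalised
    weights, using a uniform proposal so each step is O(1) in the number
    of states. Returns the empirical occupancy frequencies."""
    rng = random.Random(seed)
    K = len(target_weights)
    state = 0
    counts = [0] * K
    for _ in range(steps):
        proposal = rng.randrange(K)                # O(1) symmetric proposal
        ratio = target_weights[proposal] / target_weights[state]
        if rng.random() < min(1.0, ratio):         # accept/reject
            state = proposal
        counts[state] += 1
    return [c / steps for c in counts]

freqs = mh_discrete([1.0, 2.0, 7.0])
print(freqs)  # empirical frequencies approach [0.1, 0.2, 0.7]
```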

[4] Don't Wait!: How Timing Affects Coordination of Crowdfunding Donations Studies of Coordination / Solomon, Jacob / Ma, Wenjuan / Wash, Rick Proceedings of ACM CSCW 2015 Conference on Computer-Supported Cooperative Work and Social Computing 2015-02-28 v.1 p.547-556
ACM Digital Library Link
Summary: Crowdfunding sites often impose deadlines for projects to receive their requested funds. This deadline structure creates a difficult decision for potential donors. Donors can donate early to a project to help it reach its goal and to signal to other donors that the project is worthwhile. But donors may also want to wait for a similar signal from others. We conduct an experimental simulation of a crowdfunding website to explore how potential donors to projects make this decision. We find evidence for both strategies in our experiment; some donate early while others wait till the last second. However, we also find that making an early donation is usually a better strategy for donors because the amount of donations made early in a project's campaign is often the only difference between that project being funded or not. This finding suggests that crowdfunding sites need to develop designs, policies and incentives that encourage people to make immediate donations so that the site can most efficiently fund projects.

[5] Bag-of-Words Based Deep Neural Network for Image Retrieval Multimedia Grand Challenge / Bai, Yalong / Yu, Wei / Xiao, Tianjun / Xu, Chang / Yang, Kuiyuan / Ma, Wei-Ying / Zhao, Tiejun Proceedings of the 2014 ACM International Conference on Multimedia 2014-11-03 p.229-232
ACM Digital Library Link
Summary: This work targets the image retrieval task posed by the MSR-Bing Grand Challenge. Image retrieval is considered a challenging task because of the gap between low-level image representations and high-level textual query representations. Recent advances in deep neural networks shed light on narrowing this gap by learning high-level image representations from raw pixels. In this paper, we propose a bag-of-words based deep neural network for the image retrieval task, which learns a high-level image representation and maps images into a bag-of-words space. The DNN model is trained on large-scale clickthrough data. The relevance between a query and an image is measured by the cosine similarity of the query's bag-of-words representation and the image's bag-of-words representation predicted by the DNN; the visual similarity between images is likewise computed from the high-level image representations extracted via the DNN model. Finally, the PageRank algorithm is used to further improve the ranking list for each query by considering the visual similarity of images. The experimental results achieved state-of-the-art performance and verified the effectiveness of the proposed method.
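The final PageRank step over image visual similarity can be sketched as a plain power iteration (an illustrative toy, not the challenge system's code; the similarity matrix below is invented).

```python
def pagerank(sim, d=0.85, iters=50):
    """Power-iteration PageRank on a visual-similarity graph. `sim` is a
    square matrix of non-negative similarities with non-zero row sums."""
    n = len(sim)
    # Normalise rows so each image distributes its score proportionally.
    rows = [[s / sum(row) for s in row] for row in sim]
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [(1 - d) / n + d * sum(r[j] * rows[j][i] for j in range(n))
             for i in range(n)]
    return r

# Image 0 is visually similar to both others, so it should rank highest.
sim = [[0.0, 0.9, 0.8],
       [0.9, 0.0, 0.1],
       [0.8, 0.1, 0.0]]
scores = pagerank(sim)
print(max(range(3), key=lambda i: scores[i]))  # image 0 is most central
```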

[6] Knowledge sharing and social media: Altruism, perceived online attachment motivation, and perceived online relationship commitment / Ma, Will W. K. / Chan, Albert Computers in Human Behavior 2014-10 v.39 n.0 p.51-58
Keywords: Knowledge sharing
Keywords: Perceived online attachment motivation
Keywords: Perceived online relationship commitment
Keywords: Altruism
Keywords: Social media
Link to Article at sciencedirect
Summary: Social media, such as Facebook and Twitter, have become extremely popular. Facebook, for example, has more than a billion registered users, and billions of units of information are shared every day, including short phrases, articles, photos, and audio and video clips. However, only a tiny proportion of these shared units trigger any type of knowledge exchange that is ultimately beneficial to users. This study draws on the theory of belonging and the intrinsic motivation of altruism to explore the factors contributing to knowledge sharing behavior. Using a survey of 299 high school students applying for university after the release of public examination results, we find that perceived online attachment motivation (β = 0.31, p < 0.001) and perceived online relationship commitment (β = 0.49, p < 0.001) have positive, direct, and significant effects on online knowledge sharing (R² = 0.568). Moreover, when introduced into the model, altruism has a direct and significant effect on online knowledge sharing (β = 0.46, p < 0.001), and the total variance explained by the extended model increases to 64.9%. The implications of the findings are discussed.

[7] Indoor air quality monitoring system for smart buildings Energy & environment / Chen, Xuxu / Zheng, Yu / Chen, Yubiao / Jin, Qiwei / Sun, Weiwei / Chang, Eric / Ma, Wei-Ying Proceedings of the 2014 International Joint Conference on Pervasive and Ubiquitous Computing 2014-09-13 v.1 p.471-475
ACM Digital Library Link
Summary: Many developing countries are suffering from air pollution, especially particulate matter with a diameter of 2.5 micrometers or less (PM2.5). While quite a few air quality monitoring stations have been built by governments in cities' public areas, indoor PM2.5 has not yet been monitored and dealt with effectively. Though many office buildings have an HVAC (heating, ventilation, and air conditioning) system, PM2.5 is not considered as a factor when the system circulates fresh air from outdoors. This paper introduces a real system that we have deployed in the offices of four Microsoft campuses in China. This system instantly monitors indoor air quality on different floors of a building (including office areas, gyms, garages, and restaurants), enabling Microsoft employees to query the air quality of a place using a mobile phone or a website. The information can guide a user's decision making, e.g., finding the right time to work out in the gym or turning on the individual air filter in her own office. By analyzing the indoor and outdoor air quality data collected over a long period, our system can even offer actionable and energy-efficient suggestions to HVAC systems, e.g., automatically turning on the system a few hours earlier than usual on a heavily polluted day, or identifying the filters in the HVAC system that should be replaced.

[8] Comparison of Enhanced Visual and Haptic Features in a Virtual Reality-Based Haptic Simulation Haptic Interaction / Clamann, Michael / Ma, Wenqi / Kaber, David B. HCI International 2013: 15th International Conference on HCI, Part IV: Interaction Modalities and Techniques 2013-07-21 v.4 p.551-560
Keywords: haptics; virtual reality; rehabilitation
Link to Digital Content at Springer
Summary: An experiment was conducted to compare the learning effects following motor skill training using three types of virtual reality simulations. Training and testing were presented using virtual reality (VR) and standardized forms of existing psychomotor tests, respectively. The VR training simulations included haptic, visual and a combination of haptic and visual assistance designed to accelerate training. A comparison of performance test results prior to and following training revealed conditions providing haptic assistance to yield lower scores related to fine motor skill training than the visual-only aiding condition. Similarly, training in the visual condition resulted in comparatively lower cognitive skill scores. The present investigation incorporating healthy subjects was designed as part of an ongoing research effort to provide insight on the design of VR simulations for rehabilitation of motor skills in patients with a history of mTBI.

[9] Impact of restrictive composition policy on user password choices / Campbell, John / Ma, Wanli / Kleeman, Dale Behaviour and Information Technology 2011-05-01 v.30 n.3 p.379-388
Link to Article at Taylor & Francis
Summary: This study investigates the efficacy of using a restrictive password composition policy. The primary function of access controls is to restrict the use of information systems and other computer resources to authorised users only. Although more secure alternatives exist, password-based systems remain the predominant method of user authentication. Prior research shows that password security is often compromised by users who adopt inadequate password composition and management practices. One particularly under-researched area is whether restrictive password composition policies actually change user behaviours in significant ways. The results of this study show that a password composition policy reduces the similarity of passwords to dictionary words. However, in this case the regime did not reduce the use of meaningful information in passwords such as names and birth dates, nor did it reduce password recycling.
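The "similarity of passwords to dictionary words" that the study measures can be approximated with edit distance; the sketch below is a hypothetical policy check for illustration, not the study's instrument.

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                           # deletion
                         cur[j - 1] + 1,                        # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        prev = cur
    return prev[n]

def too_close_to_dictionary(password, words, max_dist=2):
    """Flag passwords within `max_dist` edits of any dictionary word."""
    return any(edit_distance(password.lower(), w) <= max_dist for w in words)

words = ["password", "dragon", "monkey"]
print(too_close_to_dictionary("Passw0rd", words))   # True: one edit away
print(too_close_to_dictionary("k9!vT#2q", words))   # False
```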

[10] Recommending friends and locations based on individual location history / Zheng, Yu / Zhang, Lizhu / Ma, Zhengxin / Xie, Xing / Ma, Wei-Ying ACM Transactions on The Web 2011-02 v.5 n.1 p.5
ACM Digital Library Link
Summary: The increasing availability of location-acquisition technologies (GPS, GSM networks, etc.) enables people to log their location histories as spatio-temporal data. Such real-world location histories imply, to some extent, users' interests in places, and give us opportunities to understand the correlation between users and locations. In this article, we move in this direction and report on a personalized friend and location recommender for geographical information systems (GIS) on the Web. First, in this recommender system, a particular individual's visits to a geospatial region in the real world are used as implicit ratings of that region. Second, we measure the similarity between users in terms of their location histories and recommend to each user a group of potential friends in a GIS community. Third, we estimate an individual's interest in a set of unvisited regions from his or her location history and those of other users; unvisited locations that might match the individual's tastes can then be recommended. A framework, referred to as hierarchical-graph-based similarity measurement (HGSM), is proposed to uniformly model each individual's location history and effectively measure the similarity among users. In this framework, we take into account three factors: 1) the sequence property of people's outdoor movements, 2) the visited popularity of a geospatial region, and 3) the hierarchical property of geographic spaces. Further, we incorporate a content-based method into a user-based collaborative filtering algorithm, which uses HGSM as the user similarity measure, to estimate a user's rating of an item. We evaluated this recommender system on GPS data collected by 75 subjects over a period of one year in the real world. HGSM outperforms related similarity measures, namely the similarity-by-count, cosine similarity, and Pearson similarity measures. Moreover, compared with the item-based CF method and random recommendations, our system provides users with more attractive locations and a better recommendation experience.

[11] Pricing guaranteed contracts in online display advertising KM track: large-scale statistical techniques / Bharadwaj, Vijay / Ma, Wenjing / Schwarz, Michael / Shanmugasundaram, Jayavel / Vee, Erik / Xie, Jack / Yang, Jian Proceedings of the 2010 ACM Conference on Information and Knowledge Management 2010-10-26 p.399-408
ACM Digital Library Link
Summary: We consider the problem of pricing guaranteed contracts in online display advertising. This problem has two key characteristics that when taken together distinguish it from related offline and online pricing problems: (1) the guaranteed contracts are sold months in advance, and at various points in time, and (2) the inventory that is sold to guaranteed contracts -- user visits -- is very high-dimensional, having hundreds of possible attributes, and advertisers can potentially buy any of the very large number (many trillions) of combinations of these attributes. Consequently, traditional pricing methods such as real-time or combinatorial auctions, or optimization-based pricing based on self- and cross-elasticities are not directly applicable to this problem. We hence propose a new pricing method, whereby the price of a guaranteed contract is computed based on the prices of the individual user visits that the contract is expected to get. The price of each individual user visit is in turn computed using historical sales prices that are negotiated between a sales person and an advertiser, and we propose two different variants in this context. Our evaluation using real guaranteed contracts shows that the proposed pricing method is accurate in the sense that it can effectively predict the prices of other (out-of-sample) historical contracts.

[12] Mining adjacent markets from a large-scale ads video collection for image advertising Poster presentations / Feng, Guwen / Wang, Xin-Jing / Zhang, Lei / Ma, Wei-Ying Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2010-07-19 p.893-894
Keywords: adjacent marketing, image advertising, video retrieval
ACM Digital Library Link
Summary: The research on image advertising is still in its infancy. Most previous approaches suggest ads by directly matching an ad to a query image, which lacks the power to identify ads from adjacent markets. In this paper, we tackle the problem by mining knowledge of adjacent markets from ads videos with a novel Multi-Modal Dirichlet Process Mixture Sets model, which is a unified model of (video frame) clustering and (ads) ranking. Our approach is not only capable of discovering relevant ads (e.g. car ads for a query car image), but also of suggesting ads from adjacent markets (e.g. tyre ads). Experimental results show that our proposed approach is fairly effective.

[13] A large-scale study on map search logs / Xiao, Xiangye / Luo, Qiong / Li, Zhisheng / Xie, Xing / Ma, Wei-Ying ACM Transactions on The Web 2010-07 v.4 n.3 p.8
ACM Digital Library Link
Summary: Map search engines, such as Google Maps, Yahoo! Maps, and Microsoft Live Maps, allow users to explicitly specify a target geographic location, either in keywords or on the map, and to search businesses, people, and other information of that location. In this article, we report a first study on a million-entry map search log. We identify three key attributes of a map search record -- the keyword query, the target location and the user location, and examine the characteristics of these three dimensions separately as well as the associations between them. Comparing our results with those previously reported on logs of general search engines and mobile search engines, including those for geographic queries, we discover the following unique features of map search: (1) People use longer queries and modify queries more frequently in a session than in general search and mobile search; People view fewer result pages per query than in general search; (2) The popular query topics in map search are different from those in general search and mobile search; (3) The target locations in a session change within 50 kilometers for almost 80% of the sessions; (4) Queries, search target locations and user locations (both at the city level) all follow the power law distribution; (5) One third of queries are issued for target locations within 50 kilometers from the user locations; (6) The distribution of a query over target locations appears to follow the geographic location of the queried entity.

[14] Back to the future: bleeding-edge IVR Human interfaces / Bouzid, Ahmed / Ma, Weiye interactions 2010-05 v.17 n.3 p.18-20
ACM Digital Library Link

[15] Diversifying landmark image search results by learning interested views from community photos WWW 2010 demos / Ren, Yuheng / Yu, Mo / Wang, Xin-Jing / Zhang, Lei / Ma, Wei-Ying Proceedings of the 2010 International Conference on the World Wide Web 2010-04-26 v.1 p.1289-1292
Keywords: landmark image search, set-based ranking, user interest modeling
ACM Digital Library Link
Summary: In this paper, we demonstrate a novel landmark photo search and browsing system, Agate, which ranks landmark image search results considering their relevance, diversity and quality. Agate learns from community photos the aspects and related activities of a landmark that interest users most, and adaptively generates a Table of Contents (TOC) as a summary of the attractions to facilitate user browsing. Image search results are then re-ranked with the TOC so as to ensure a quick overview of the attractions of the landmark. A novel non-parametric TOC generation and set-based ranking algorithm, MoM-DPM Sets, is proposed as the key technology of Agate. Experimental results based on human evaluation show the effectiveness of our model and users' preference for Agate.

[16] Understanding transportation modes based on GPS data for web applications / Zheng, Yu / Chen, Yukun / Li, Quannan / Xie, Xing / Ma, Wei-Ying ACM Transactions on The Web 2010-01 v.4 n.1 p.1
ACM Digital Library Link
Summary: User mobility has given rise to a variety of Web applications, in which the global positioning system (GPS) plays many important roles in bridging between these applications and end users. As a kind of human behavior, transportation modes, such as walking and driving, can provide pervasive computing systems with more contextual information and enrich a user's mobility with informative knowledge. In this article, we report on an approach based on supervised learning to automatically infer users' transportation modes, including driving, walking, taking a bus and riding a bike, from raw GPS logs. Our approach consists of three parts: a change point-based segmentation method, an inference model and a graph-based post-processing algorithm. First, we propose a change point-based segmentation method to partition each GPS trajectory into separate segments of different transportation modes. Second, from each segment, we identify a set of sophisticated features, which are not affected by differing traffic conditions (e.g., a person's direction when in a car is constrained more by the road than any change in traffic conditions). Later, these features are fed to a generative inference model to classify the segments of different modes. Third, we conduct graph-based postprocessing to further improve the inference performance. This postprocessing algorithm considers both the commonsense constraints of the real world and typical user behaviors based on locations in a probabilistic manner. The advantages of our method over the related works include three aspects. (1) Our approach can effectively segment trajectories containing multiple transportation modes. (2) Our work mined the location constraints from user-generated GPS logs, while being independent of additional sensor data and map information like road networks and bus stops. (3) The model learned from the dataset of some users can be applied to infer GPS data from others. Using the GPS logs collected by 65 people over a period of 10 months, we evaluated our approach via a set of experiments. As a result, based on the change-point-based segmentation method and Decision Tree-based inference model, we achieved prediction accuracy greater than 71 percent. Further, using the graph-based post-processing algorithm, the performance attained a 4-percent enhancement.
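The change-point-based segmentation step can be illustrated with a deliberately crude heuristic (a toy under simplifying assumptions, not the paper's method): split a speed sequence wherever consecutive readings jump by more than a threshold, separating, say, a walking segment from a bus segment.

```python
def segment_by_change_points(speeds, threshold=3.0):
    """Split a speed sequence (m/s) at points where consecutive readings
    differ by more than `threshold`, a crude change-point heuristic."""
    segments, current = [], [speeds[0]]
    for prev, cur in zip(speeds, speeds[1:]):
        if abs(cur - prev) > threshold:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments

# Walking (~1.5 m/s) followed by a bus ride (~10 m/s).
trace = [1.4, 1.6, 1.5, 9.8, 10.2, 10.0]
print(segment_by_change_points(trace))
# → [[1.4, 1.6, 1.5], [9.8, 10.2, 10.0]]
```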

[17] Incorporating site-level knowledge to extract structured data from web forums Data mining/session: learning / Yang, Jiang-Ming / Cai, Rui / Wang, Yida / Zhu, Jun / Zhang, Lei / Ma, Wei-Ying Proceedings of the 2009 International Conference on the World Wide Web 2009-04-20 p.181-190
Keywords: Markov logic networks (MLNs), information extraction, site-level knowledge, structured data, web forums
ACM Digital Library Link
Summary: Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to both complex page layout designs and unrestricted user created posts. In this paper, we study the problem of structured data extraction from various web forum sites. Our target is to find a solution as general as possible to extract structured data, such as post title, post author, post time, and post content from any forum site. In contrast to most existing information extraction methods, which only leverage the knowledge inside an individual page, we incorporate both page-level and site-level knowledge and employ Markov logic networks (MLNs) to effectively integrate all useful evidence by learning their importance automatically. Site-level knowledge includes (1) the linkages among different object pages, such as list pages and post pages, and (2) the interrelationships of pages belonging to the same object. The experimental results on 20 forums show a very encouraging information extraction performance, and demonstrate the ability of the proposed approach on various forums. We also show that the performance is limited if only page-level knowledge is used, while when incorporating the site-level knowledge both precision and recall can be significantly improved.

[18] Mining interesting locations and travel sequences from GPS trajectories User interfaces and mobile web/session: mobile web / Zheng, Yu / Zhang, Lizhu / Xie, Xing / Ma, Wei-Ying Proceedings of the 2009 International Conference on the World Wide Web 2009-04-20 p.791-800
Keywords: GPS trajectories, location recommendation, spatial data mining, user travel experience
ACM Digital Library Link
Summary: The increasing availability of GPS-enabled devices is changing the way people interact with the Web, and brings us a large amount of GPS trajectories representing people's location histories. In this paper, based on multiple users' GPS trajectories, we aim to mine interesting locations and classical travel sequences in a given geospatial region. Here, interesting locations mean the culturally important places, such as Tiananmen Square in Beijing, and frequented public areas, like shopping malls and restaurants, etc. Such information can help users understand surrounding locations, and would enable travel recommendation. In this work, we first model multiple individuals' location histories with a tree-based hierarchical graph (TBHG). Second, based on the TBHG, we propose a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual's access on a location as a directed link from the user to that location. This model infers the interest of a location by taking into account the following three factors. 1) The interest of a location depends on not only the number of users visiting this location but also these users' travel experiences. 2) Users' travel experiences and location interests have a mutual reinforcement relationship. 3) The interest of a location and the travel experience of a user are relative values and are region-related. Third, we mine the classical travel sequences among locations considering the interests of these locations and users' travel experiences. We evaluated our system using a large GPS dataset collected by 107 users over a period of one year in the real world. As a result, our HITS-based inference model outperformed baseline approaches like rank-by-count and rank-by-frequency. Meanwhile, when considering the users' travel experiences and location interests, we achieved a better performance beyond baselines, such as rank-by-count and rank-by-interest, etc.
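The mutual-reinforcement idea between user travel experience and location interest can be sketched in the style of HITS (a toy, not the paper's TBHG-based model; the visit data below are invented): experiences play the role of hub scores and interests the role of authority scores.

```python
def hits_interest(visits, iters=30):
    """HITS-style mutual reinforcement on (user, location) visit links:
    a location's interest sums its visitors' experiences, and a user's
    experience sums the interests of the locations they visited."""
    users = sorted({u for u, _ in visits})
    locs = sorted({l for _, l in visits})
    experience = {u: 1.0 for u in users}
    interest = {l: 1.0 for l in locs}
    for _ in range(iters):
        interest = {l: sum(experience[u] for u, l2 in visits if l2 == l)
                    for l in locs}
        zi = sum(interest.values())
        interest = {l: v / zi for l, v in interest.items()}
        experience = {u: sum(interest[l] for u2, l in visits if u2 == u)
                      for u in users}
        ze = sum(experience.values())
        experience = {u: v / ze for u, v in experience.items()}
    return interest, experience

visits = [("alice", "square"), ("bob", "square"),
          ("carol", "square"), ("carol", "mall")]
interest, experience = hits_interest(visits)
print(max(interest, key=interest.get))  # "square": visited by more users
```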

[19] Browsing on small displays by transforming Web pages into hierarchically structured subpages / Xiao, Xiangye / Luo, Qiong / Hong, Dan / Fu, Hongbo / Xie, Xing / Ma, Wei-Ying ACM Transactions on The Web 2009-01 v.3 n.1 p.4
ACM Digital Library Link
Summary: We propose a new Web page transformation method to facilitate Web browsing on handheld devices such as Personal Digital Assistants (PDAs). In our approach, an original Web page that does not fit on the screen is transformed into a set of subpages, each of which fits on the screen. This transformation is done through slicing the original page into page blocks iteratively, with several factors considered. These factors include the size of the screen, the size of each page block, the number of blocks in each transformed page, the depth of the tree hierarchy that the transformed pages form, as well as the semantic coherence between blocks. We call the tree hierarchy of the transformed pages an SP-tree. In an SP-tree, an internal node consists of a textually enhanced thumbnail image with hyperlinks, and a leaf node is a block extracted from a subpage of the original Web page. We adaptively adjust the fanout and the height of the SP-tree so that each thumbnail image is clear enough for users to read, while at the same time, the number of clicks needed to reach a leaf page is few. Through this transformation algorithm, we preserve the contextual information in the original Web page and reduce scrolling. We have implemented this transformation module on a proxy server and have conducted usability studies on its performance. Our system achieved a shorter task completion time compared with that of transformations from the Opera browser in nine of ten tasks. The average improvement on familiar pages was 44%. The average improvement on unfamiliar pages was 37%. Subjective responses were positive.
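The greedy core of slicing a page's blocks into screen-sized subpages can be sketched as follows (a one-level toy assuming block heights are known; the actual method also builds the SP-tree hierarchy and weighs semantic coherence between blocks).

```python
def slice_into_subpages(block_heights, screen_height):
    """Greedy slicing: pack consecutive page blocks into subpages so that
    each subpage's total height fits on the screen. A block taller than
    the screen gets a subpage of its own."""
    pages, cur, used = [], [], 0
    for h in block_heights:
        if cur and used + h > screen_height:
            pages.append(cur)
            cur, used = [], 0
        cur.append(h)
        used += h
    if cur:
        pages.append(cur)
    return pages

# Five blocks (pixel heights) sliced for a 400-pixel-tall PDA screen.
print(slice_into_subpages([120, 200, 150, 90, 300], 400))
# → [[120, 200], [150, 90], [300]]
```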

[20] Search-based query suggestion Poster session 2/information retrieval / Yang, Jiang-Ming / Cai, Rui / Jing, Feng / Wang, Shuo / Zhang, Lei / Ma, Wei-Ying Proceedings of the 2008 ACM Conference on Information and Knowledge Management 2008-10-26 p.1439-1440
ACM Digital Library Link
Summary: In this paper, we propose a unified strategy that combines the query log and search results for query suggestion. In this way, we leverage both users' search intentions for popular queries and the power of search engines for unpopular queries. The suggested queries are also ranked according to their relevance and quality, and each suggestion is described with a rich snippet including a photo and a related description.

[21] Understanding mobility based on GPS data Location-aware applications / Zheng, Yu / Li, Quannan / Chen, Yukun / Xie, Xing / Ma, Wei-Ying Proceedings of the 2008 International Conference on Ubiquitous Computing 2008-09-21 p.312-321
Keywords: GPS, GeoLife, infer transportation mode, machine learning, recognize human behavior
ACM Digital Library Link
Summary: Recognizing human behavior and understanding a user's mobility from sensor data are both critical issues in ubiquitous computing systems. As a kind of user behavior, the transportation modes a user takes, such as walking or driving, can enrich the user's mobility data with informative knowledge and provide pervasive computing systems with more context information. In this paper, we propose an approach based on supervised learning to infer people's transportation modes from their GPS logs. The contribution of this work lies in two aspects. On one hand, we identify a set of sophisticated features that are more robust to traffic conditions than those used in previous work. On the other hand, we propose a graph-based post-processing algorithm to further improve inference performance; it considers both commonsense constraints of the real world and typical location-based user behavior in a probabilistic manner. Using GPS logs collected from 65 people over a period of 10 months, we evaluated our approach in a set of experiments. Based on a change-point segmentation method and a Decision Tree inference model, the new features brought an eight percent improvement in inference accuracy over previous results, and the graph-based post-processing achieved a further four percent improvement.
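To make the feature-extraction step concrete, here is a toy sketch of segment-level features of the kind used for mode inference (speed and heading-change statistics over a GPS segment). The exact feature definitions, the flat-earth projection, and the point format `(timestamp_s, lat, lon)` are all assumptions for illustration, not the paper's feature set.

```python
import math

# Toy segment features for transportation-mode inference from GPS logs.
# A flat-earth approximation is used; fine for short segments only.

EARTH_M_PER_DEG = 111_320.0  # rough metres per degree of latitude

def to_xy(lat, lon, lat0):
    """Project (lat, lon) to local metres around reference latitude lat0."""
    return (lon * EARTH_M_PER_DEG * math.cos(math.radians(lat0)),
            lat * EARTH_M_PER_DEG)

def segment_features(points):
    """points: list of (timestamp_s, lat, lon). Returns per-segment
    speed and heading-change features a classifier could consume."""
    lat0 = points[0][1]
    xy = [to_xy(lat, lon, lat0) for _, lat, lon in points]
    speeds, headings = [], []
    for (t0, *_), (t1, *_), (x0, y0), (x1, y1) in zip(
            points, points[1:], xy, xy[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        speeds.append(d / max(t1 - t0, 1e-9))
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = [abs(b - a) for a, b in zip(headings, headings[1:])]
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "max_speed": max(speeds),
        "heading_change_rate": (sum(turns) / len(turns)) if turns else 0.0,
    }
```

A feature vector like this would then feed the Decision Tree classifier, with the graph-based post-processing smoothing the per-segment predictions afterwards.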

[22] Directly optimizing evaluation measures in learning to rank Learning to rank: 1 / Xu, Jun / Liu, Tie-Yan / Lu, Min / Li, Hang / Ma, Wei-Ying Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008-07-20 p.107-114
ACM Digital Library Link
Summary: One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures used in information retrieval such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG). Several such algorithms including SVMmap and AdaRank have been proposed and their effectiveness has been verified. However, the relationships between the algorithms are not clear, and furthermore no comparisons have been conducted between them. In this paper, we conduct a study on the approach of directly optimizing evaluation measures in learning to rank for Information Retrieval (IR). We focus on the methods that minimize loss functions upper bounding the basic loss function defined on the IR measures. We first provide a general framework for the study and analyze the existing algorithms of SVMmap and AdaRank within the framework. The framework is based on upper bound analysis and two types of upper bounds are discussed. Moreover, we show that we can derive new algorithms on the basis of this analysis and create one example algorithm called PermuRank. We have also conducted comparisons between SVMmap, AdaRank, PermuRank, and conventional methods of Ranking SVM and RankBoost, using benchmark datasets. Experimental results show that the methods based on direct optimization of evaluation measures can always outperform conventional methods of Ranking SVM and RankBoost. However, no significant difference exists among the performances of the direct optimization methods themselves.
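For readers unfamiliar with the measures being optimized, here are minimal reference implementations of the two the summary names: Average Precision (averaged over queries it gives MAP) and NDCG@k. The `rels` convention, a ranked list of graded relevance labels with 0 meaning non-relevant, is an assumption of this sketch.

```python
import math

# Minimal implementations of the IR evaluation measures the paper
# optimizes directly: Average Precision and NDCG@k.

def average_precision(rels):
    """rels: ranked relevance labels; precision is averaged at the
    rank of each relevant document."""
    hits, precisions = 0, []
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / hits if hits else 0.0

def dcg(rels, k):
    """Discounted cumulative gain with the (2^rel - 1) gain form."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:k], start=1))

def ndcg(rels, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal else 0.0
```

Both measures depend on the rank order, not on raw scores, which is exactly why they are non-smooth and why the paper studies surrogate upper bounds instead of optimizing them directly.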

[23] Exploring traversal strategy for web forum crawling Analysis of social networks / Wang, Yida / Yang, Jiang-Ming / Lai, Wei / Cai, Rui / Zhang, Lei / Ma, Wei-Ying Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008-07-20 p.459-466
ACM Digital Library Link
Summary: In this paper, we study the problem of Web forum crawling. Web forums have become an important data source for many Web applications, yet forum crawling remains a challenging task due to the complex in-site link structures and login controls of most forum sites. Without carefully selecting the traversal path, a generic crawler downloads many duplicate and invalid pages from forums, wasting both precious bandwidth and limited storage space. To crawl forum data more effectively and efficiently, we propose an automatic approach to discovering an appropriate traversal strategy to direct the crawling of a given target forum. Specifically, the traversal strategy consists of identifying skeleton links and detecting page-flipping links. The skeleton links instruct the crawler to crawl only valuable pages and avoid duplicate and uninformative ones, while the page-flipping links tell the crawler how to completely download a long discussion thread, which is usually spread over multiple pages in Web forums. Extensive experimental results on several forums show encouraging performance: following the discovered traversal strategy, our forum crawler archives more informative pages than previous related work and a commercial generic crawler.
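As a rough illustration of what "page-flipping link detection" means, the sketch below uses a hand-written heuristic (numeric anchor text plus a page-like URL parameter). The paper learns this from the site's structure; the regex, URL patterns, and example forum URLs here are purely hypothetical.

```python
import re

# Heuristic sketch (not the paper's learned method): flag links that
# look like thread pagination, i.e. short numeric anchor text pointing
# at a URL that differs only by a trailing page number.

PAGE_PARAM = re.compile(r"([?&]page=|/page-|[?&]start=)(\d+)", re.I)

def page_flipping_links(links):
    """links: list of (anchor_text, url) pairs. Returns the URLs that
    match the pagination heuristic."""
    return [url for text, url in links
            if text.strip().isdigit() and PAGE_PARAM.search(url)]

links = [
    ("2", "http://forum.example.com/thread/42?page=2"),
    ("3", "http://forum.example.com/thread/42?page=3"),
    ("Reply", "http://forum.example.com/reply?thread=42"),
]
```

Given `links`, only the two numbered pagination URLs would be returned; the "Reply" link is filtered out, which is precisely the duplicate/invalid traffic a skeleton-aware crawler avoids.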

[24] Rich media and web 2.0 Panels / Chang, Edward / Ong, Ken / Boll, Susanne / Ma, Wei-Ying Proceedings of the 2008 International Conference on the World Wide Web 2008-04-21 p.1259-1260
Keywords: rich media, web 2.0
ACM Digital Library Link
Summary: Rich media data, such as video, imagery, music, and gaming, no longer play just a supporting role to text data on the World Wide Web. Thanks to Web 2.0, rich media is the primary content on sites such as Flickr, PicasaWeb, YouTube, and QQ. Because of massive user-generated content, the volume of rich media transmitted on the Internet has surpassed that of text. It is vital to properly manage these data to ensure efficient bandwidth utilization, to support effective indexing and search, and to safeguard copyrights, to name just a few concerns. This panel invites both researchers and practitioners to discuss the challenges of Web-scale media-data management. In particular, the panelists will address issues such as leveraging rich media and Web 2.0, indexing, search, and scalability.

[25] FRank: a ranking method with fidelity loss Learning to rank II / Tsai, Ming-Feng / Liu, Tie-Yan / Qin, Tao / Chen, Hsin-Hsi / Ma, Wei-Ying Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007-07-23 p.383-390
ACM Digital Library Link
Summary: The ranking problem is becoming important in many fields, especially information retrieval (IR). Many machine learning techniques have been proposed for it, such as RankSVM, RankBoost, and RankNet. Among them, RankNet, which is based on a probabilistic ranking framework, has produced promising results and has been applied in a commercial Web search engine. In this paper we further study the probabilistic ranking framework and provide a novel loss function, named fidelity loss, for measuring ranking loss. The fidelity loss not only inherits the effective properties of the probabilistic ranking framework in RankNet but also possesses new properties helpful for ranking: it attains zero for a correctly ordered document pair and has a finite upper bound, which is necessary for conducting query-level normalization. We also propose an algorithm named FRank, based on a generalized additive model, that minimizes the fidelity loss to learn an effective ranking function. We evaluated the proposed algorithm on two datasets: a TREC dataset and a real Web search dataset. The experimental results show that the proposed FRank algorithm outperforms other learning-based ranking methods on both the conventional IR problem and Web search.
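A minimal sketch of the pairwise fidelity loss can make the "zero for a correct pair, finite upper bound" properties tangible. This assumes the commonly stated form of the loss, with target pair probability p_star and modeled probability p = sigmoid(s_i - s_j); treat the exact formula as an assumption of this sketch rather than a quotation from the paper.

```python
import math

# Sketch of a fidelity-style pairwise loss:
#   loss = 1 - (sqrt(p_star * p) + sqrt((1 - p_star) * (1 - p)))
# It is 0 when p equals p_star exactly and is bounded above by 1,
# unlike the cross-entropy pair loss in RankNet, which is unbounded.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fidelity_loss(score_i, score_j, p_star=1.0):
    """p_star is the target probability that document i should rank
    above document j (1.0 for a definitely-ordered pair)."""
    p = sigmoid(score_i - score_j)
    return 1.0 - (math.sqrt(p_star * p)
                  + math.sqrt((1.0 - p_star) * (1.0 - p)))
```

Because the loss is bounded in [0, 1] per pair, per-query sums can be normalized by the number of pairs, which is the query-level normalization the summary refers to.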