Sunday, October 30, 2016

Optimal interface for specifying a mapping between known field set and fields discovered in CSV file?


We are building a data import feature in which our users will open an arbitrary CSV file and define a one-to-one mapping of its fields to a list of known fields in our database. The list of fields is very long (around 30 known fields and an arbitrary number in the CSV [though likely between 20 and 50 in the CSV).


A naive implementation would be simply laying out all of the discovered fields in a vertical list paired with combo boxes for selecting the known-field mapping.


[discovered 1] [combo]
[discovered 2] [combo]
etc

Or horizontally,



[discovered 1] [discovered 2] [etc...]
[combo] [combo] [combo]

The issues here are:



  • Long list makes it impossible to view the entire mapping without scrolling, for example to check if a known field has already been mapped.

  • Preventing multiple selection of known field or allowing an assignment to be changed.


Possibly a better approach would be similar to the above, except with draggable boxes for the known fields that can be positioned adjacent to the discovered field.


Another issue to consider is CSV files without header rows: how to identify each discovered field?



Are there existing examples out there of a good solution to this interaction?



Answer



A lot hinges on what "The list of fields is very long" actually means. Can you clarify your question by giving an actual number? both for number of fields in the CSV and number of fields in the database. Also are the field names relatively short or are some descriptive long names?


Grids can display more information easily than lists. Have a grid for the fields found from the CSV and a grid for the fields in the database, both shown at the same time. Drag and drop to match them up. There is then visual feedback on what has been used and what hasn't been used, from both points of view, and there's a vary natural way to make changes.



To have more than about 200 fields in the database you will need ways to hide some of the fields and only show fields that are of interest. A tab control with 8 tabs, based on alphabetic sorting, is one straightforward way to fit more on.



Another issue to consider is CSV files without header rows: how to identify each discovered field?



There is nothing wrong with calling the discovered fields 'Field 1', 'Field 2' etc...



To be nicer - helping the user to actually identify the field - there should be an info-box on the screen that has more information for one field, for example 'Field 1: Appears to be a date; values between 9/Jul/1066 and 01/Apr/2035; 17 blank entries; 2 entries not in a valid date format; View'. This can show information for the first unmapped field, or the information for a field that has just been clicked on or dragged over.


This will help whether the user chooses to drag from discovered CSV field to database field or from database field to discovered CSV field.



Something to consider, you asked for optimal interface, in the case where you do have field headers for the discovered field. If your software is able to make an educated guess about what field from the database matches based on name and type, that can be shown as 'suggested match' in the info-box. Checkbox to confirm. You could also have a screen that shows all suggestions and allows them to be reviewed/accepted in bulk.


No comments:

Post a Comment

technique - How credible is wikipedia?

I understand that this question relates more to wikipedia than it does writing but... If I was going to use wikipedia for a source for a res...